By default, R uses only one core for its computations. This can slow things down considerably when you run extensive analyses (e.g. bootstrapping over many groups).
However, you can set R up for parallel computing and boost speed considerably. Several R packages enable R to use multiple cores, but if you are new to the field, choosing the right one can be painful. Here, I show how a parallel R analysis can be set up in five easy steps.
The following code uses 'socket' mode, which works on all stand-alone operating systems. If you want to set it up on Windows, make sure your firewall does not block R. Your analysis must follow a split-apply-combine strategy; this simply means it must include some form of iteration, e.g. a model fitted for each region, species, etc.
Five steps to set up parallel R computation
1. Install and load packages into your workspace
doParallel and foreach register the cores and set up parallel computing; plyr provides the split-apply-combine functions that perform the analysis.
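A minimal sketch of this step, using the three packages named above:

```r
# install.packages(c("doParallel", "foreach", "plyr"))  # run once, if not yet installed

library(doParallel)  # parallel backend for foreach
library(foreach)     # looping construct used under the hood
library(plyr)        # split-apply-combine functions (e.g. dlply)
```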
2. Load your data
In my case, I am using Edgar Anderson's iris dataset. You might want to import your dataset via functions like read.table.
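For the iris example this is a single line; the read.table call below uses a hypothetical file name as a placeholder for your own data:

```r
data(iris)  # Edgar Anderson's iris data, shipped with base R
str(iris)   # 150 observations of 5 variables, 3 species

# For your own data, something like:
# mydata <- read.table("mydata.txt", header = TRUE)
```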
3. Specify the number of clusters to be used
detectCores() is not obligatory; it checks how many cores your PC/notebook/server has (in case you don't know). makeCluster() and registerDoParallel() set up the cores (also called registering). I am using six out of eight cores, keeping two free for my remaining applications, such as an e-mail client.
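A sketch of this step; the six-of-eight split follows the text, so adjust the number to your own hardware:

```r
library(doParallel)

detectCores()         # how many cores are available (not obligatory)
cl <- makeCluster(6)  # start six worker processes ('socket' cluster)
registerDoParallel(cl)
getDoParWorkers()     # confirms how many workers are registered
```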
4. Start your analysis
Here, I am using the iris dataset as input to produce a list with model parameters for each species. Via the .paropts argument, you pass all necessary data (and packages) into the workspace of each core. In my case, I pass only the iris dataset; no special packages are needed. Wrapping the call in proc.time() records how long the process runs (not obligatory).
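A sketch of the analysis step with plyr's dlply. The model formula is my own illustrative choice, not prescribed by the text, and the setup lines repeat steps 1 and 3 so the snippet runs on its own (with two cores for brevity):

```r
library(doParallel)
library(plyr)

cl <- makeCluster(2)          # cluster setup from step 3
registerDoParallel(cl)

ptm <- proc.time()            # start the timer (not obligatory)
models <- dlply(iris, .(Species), function(d) {
  lm(Sepal.Length ~ Petal.Length, data = d)   # one model per species
}, .parallel = TRUE,
   .paropts = list(.export = "iris"))          # pass needed objects to each core
proc.time() - ptm             # elapsed time of the parallel run

ldply(models, coef)           # combine: one row of coefficients per species
```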
5. Unregister your clusters
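Assuming the cluster object `cl` from step 3 (recreated here so the snippet is self-contained):

```r
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)   # (from step 3)

stopCluster(cl)   # shut down the worker processes
registerDoSEQ()   # fall back to sequential execution so foreach keeps working
```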
I am a plant ecologist and post-doctoral fellow at Masaryk University.