I learned Python before getting to know R, but I now use R more often than Python. When trying to loop lots of lots of items for processing, I always wanted to just
import multiprocessing as mp, but I haven’t found something like that in R until very recently.
If you have some familiarity in R, especially experiences of dealing with data, you probably have used the
tidyverse collection, especially how easy to use the
map_* functions in the
purrr package. But do you also know that there is the multiprocessing version
furrr package and its
future_map_* family of function? Basically, every single
map function in
purrr will have a counterpart in
But I always find the default
future package not as easy to work with as I would like. Until I came across
Let’s have an example.
## Loading required package: future
plan(callr, workers = 10) # choose the number of workers wisely # first try the serial version tictoc::tic() purrr::walk(1:10, function(x) Sys.sleep(1)) tictoc::toc()
## 10.039 sec elapsed
# then the multiprocessing via callr tictoc::tic() furrr::future_walk(1:10, function(x) Sys.sleep(1)) tictoc::toc()
## 2.383 sec elapsed
There are, in fact, some costs when using multiprocessing, and this will be compensated if you apply it on many many things. You code will run faster, with multiple CPU cores carrying out the jobs in the same time.