Beyond Single Core: Parallel Analysis in R
R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it’s not clear what to do next.
This is the material for a short overview of scalable data analysis in R. The slides can be viewed at https://ljdursi.github.io/beyond-single-core-R . The talk covers:
- How to think about parallelism and scalability in data analysis
- The standard parallel package, which incorporates the facilities of the former snow and multicore packages, using airline data as an example
- The foreach package, using airline data and simple stock data
- A summary of best practices.
Included in the materials, though not in the talk, are some more advanced methods:
- The bigmemory package for out-of-core computation on large data matrices, with a simple physical sciences example
- The Rdsm package for shared memory
- A brief introduction to the powerful pbdR packages for extremely large-scale computation
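
As a small taste of the standard parallel package mentioned above, the sketch below runs the same per-group computation serially, with a snow-style cluster, and with multicore-style forking. It is only an illustration under stated assumptions: the data are synthetic rather than the airline data used in the talk, `slow_mean` is a hypothetical stand-in for an expensive analysis step, and the worker count of 2 is arbitrary.

```r
library(parallel)

# Hypothetical stand-in for an expensive per-group analysis step
slow_mean <- function(x) {
  Sys.sleep(0.1)
  mean(x)
}

# Synthetic data (an assumption for illustration; not the airline data)
set.seed(42)
groups <- split(rnorm(800), rep(1:8, each = 100))

# Serial baseline
serial <- lapply(groups, slow_mean)

# snow-style: an explicit cluster of worker processes
cl <- makeCluster(2)
snow_style <- parLapply(cl, groups, slow_mean)
stopCluster(cl)

# multicore-style: forked workers; forking is unavailable on Windows,
# so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
fork_style <- mclapply(groups, slow_mean, mc.cores = n_cores)

# All three approaches should give the same per-group means
same_results <- isTRUE(all.equal(serial, snow_style)) &&
  isTRUE(all.equal(serial, fork_style))
```

The cluster approach works on every platform but must ship data to separate processes, while forking shares the parent's memory and needs no setup. Which is preferable depends on the operating system and on how much data each task touches.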