1 March 2019

Main source

Why R is slow

R favours flexibility over performance:

  • (+) minimal upfront planning

  • (−) hard to optimise

Less flexibility => faster code.

Tailored functions

Win speed by being lean (doing only what the task needs):

  • General-purpose functions are slow: they check their inputs and handle many cases.

  • 'lm.fit()' instead of 'lm()'

  • 'vapply()' instead of 'sapply()'

  • Look on CRAN for existing solutions.
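A minimal sketch of both swaps on simulated data (the data and variable names are illustrative, not from the talk). 'lm()' parses a formula and builds a model frame before fitting, while 'lm.fit()' takes a ready-made design matrix. 'vapply()' is told the type and length of each result through 'FUN.VALUE', so it does not have to inspect the results to decide how to simplify them.

```r
# Simulated data (illustrative only)
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# lm(): parses the formula, builds a model frame, then fits
fit_general <- lm(y ~ x)

# lm.fit(): skips the formula machinery, needs the design matrix directly
X <- cbind(1, x)                 # intercept column plus predictor
fit_fast <- lm.fit(X, y)

# Same coefficient estimates either way
same_coef <- all.equal(unname(coef(fit_general)), unname(fit_fast$coefficients))

# Rough timing comparison (results depend on the machine)
system.time(for (k in 1:200) lm(y ~ x))
system.time(for (k in 1:200) lm.fit(X, y))

# vapply(): output type declared up front (one double per element)
lst <- list(a = 1:10, b = 11:20, c = 21:30)
means_s <- sapply(lst, mean)
means_v <- vapply(lst, mean, FUN.VALUE = numeric(1))
identical(means_s, means_v)
```

The price of 'lm.fit()' is that you lose the conveniences of 'lm()' (formula interface, 'summary()' method), so it pays off mainly inside loops or simulations.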

Specific options

Win speed by being precise:

  • Arguments left unspecified can make R do extra work or guess (e.g. column types).

  • 'unlist(x, use.names = FALSE)' instead of 'unlist(x)'

  • Use the 'colClasses' argument of 'read.csv()'.
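A small sketch of both options (the toy list and CSV file are made up for illustration). By default 'unlist()' builds a name for every element of the result; 'use.names = FALSE' skips that work. With 'colClasses', 'read.csv()' is told the column types instead of inferring them from the data.

```r
# unlist(): names are built by default, skipped with use.names = FALSE
x <- split(1:6, rep(c("a", "b"), each = 3))   # list(a = 1:3, b = 4:6)
with_names <- unlist(x)                       # names a1 a2 a3 b1 b2 b3
no_names   <- unlist(x, use.names = FALSE)    # plain integer vector

# read.csv(): declare the column types instead of letting R guess them
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5), grp = c("a", "b", "a")),
          tmp, row.names = FALSE)
dat <- read.csv(tmp, colClasses = c("integer", "numeric", "character"))
str(dat)
```

Using 'FALSE' rather than 'F' is also safer: 'F' is an ordinary variable that can be overwritten, 'FALSE' cannot.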

Other methods

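  • Byte-code compilation ('compiler' package; since R 3.4.0 functions are compiled just-in-time by default).

  • Parallelisation ('parallel' package).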

'Parallel R' by Q. Ethan McCallum and Stephen Weston
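A minimal sketch of both methods with the base packages 'compiler' and 'parallel'. The function 'sum_squares()' is a hypothetical toy example. Because recent R versions already byte-compile functions on first use, an explicit 'cmpfun()' may bring little extra speed. 'mclapply()' forks the R process, which is not supported on Windows, so the sketch falls back to one core there.

```r
library(compiler)
library(parallel)

# Hypothetical loop-heavy function used for illustration
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# cmpfun() returns a byte-compiled copy of the function
sum_squares_bc <- cmpfun(sum_squares)
res_bc <- sum_squares_bc(1000)

# mclapply() runs independent jobs on several cores (forking);
# forking is unavailable on Windows, so use a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- mclapply(c(10, 100, 1000), sum_squares, mc.cores = cores)
```

Parallelisation only helps when the jobs are independent and each one is large enough to outweigh the cost of starting workers and moving data.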

Why loops are even slower

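  • Growing an object inside a loop copies it at every iteration: pre-allocate instead.

  • The apply functions are not the solution! They still call the R function once per element, so they are rarely much faster than a well-written loop.

  • Vectorise when possible: vectorised functions loop in compiled C code.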
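A sketch comparing the three approaches on the same toy task (squaring 1 to n; the function names are hypothetical). Timings depend on the machine, but growing with 'c()' is typically by far the slowest and the vectorised version the fastest.

```r
n <- 1e4

# 1. Growing: c() creates a new, longer copy of the vector every iteration
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# 2. Pre-allocating: the vector is created once, then filled
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# 3. Vectorised: the loop runs inside compiled C code
vectorised <- function(n) seq_len(n)^2

system.time(r1 <- grow(n))
system.time(r2 <- prealloc(n))
system.time(r3 <- vectorised(n))
```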

Rcpp

  • Package on CRAN that makes it very simple to connect C++ to R.

  • More flexible, and often faster, than the R-level optimisations above!

  • Good free tutorial for C++:

http://www.learncpp.com/

  • Further reference:

'Seamless R and C++ integration with Rcpp' by Dirk Eddelbuettel
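To see how little glue is needed, a sketch using Rcpp's 'cppFunction()', which compiles C++ code given as a string and makes it callable from R. The function 'add()' is just an example.

```r
library(Rcpp)

# cppFunction() compiles the C++ code and binds add() in the R session
cppFunction('int add(int x, int y, int z) {
  int sum = x + y + z;
  return sum;
}')

add(1, 2, 3)
```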

Getting started

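  1. Install a working C++ compiler (e.g. Rtools on Windows, Xcode command-line tools on macOS).

  2. Install and load Rcpp.

  3. Open a new C++ file in RStudio.

  4. Put '#include <Rcpp.h>' and 'using namespace Rcpp;' at the top of the file.

  5. Add '// [[Rcpp::export]]' on the line before each function you want to call from R.

  6. Write your Rcpp functions.

  7. Load the file with 'sourceCpp()' wherever the functions are needed.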
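The steps above can be sketched in a single R script. As a stand-in for the RStudio file of step 3, the C++ source is written to a temporary '.cpp' file; the function 'sumC()' is a hypothetical example.

```r
# install.packages("Rcpp")   # step 2, needed once
library(Rcpp)

# Steps 3-6: a C++ file with the header, namespace and export attribute
cpp_file <- tempfile(fileext = ".cpp")
writeLines('
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double sumC(NumericVector x) {
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total;
}
', cpp_file)

# Step 7: compile the file and make sumC() available in R
sourceCpp(cpp_file)
sumC(c(1, 2, 3.5))
```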

Gibbs sampler case study

  • Some objects are created by declaration rather than assignment (e.g. a 'NumericMatrix' declared with its dimensions).

  • = instead of <- to assign values.

  • Declare the type of every variable.

  • End each statement with a semicolon.

  • for loops: for (init; check; increment)

  • Vector indices start at 0!

  • Index matrices with ( instead of [.

  • An explicit 'return' statement is needed.
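A minimal sketch of a two-variable Gibbs sampler written both ways, to show the differences listed above side by side. The conditional distributions (gamma for x, normal for y) are chosen for illustration and are not necessarily those of the talk. Note one trap: R's 'rgamma()' takes the rate as its third positional argument, while Rcpp's sugar 'rgamma()' takes the scale, hence the '1 / (...)' in the C++ version.

```r
library(Rcpp)

# R version: matrix() creates the result, 1-based loops, [ ] indexing
gibbs_r <- function(N, thin) {
  mat <- matrix(nrow = N, ncol = 2)
  x <- y <- 0
  for (i in 1:N) {
    for (j in 1:thin) {
      x <- rgamma(1, 3, y * y + 4)                     # third argument: rate
      y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
    }
    mat[i, ] <- c(x, y)
  }
  mat
}

# C++ version: declared types, semicolons, 0-based loops, ( ) indexing
cppFunction('
NumericMatrix gibbs_cpp(int N, int thin) {
  NumericMatrix mat(N, 2);            // created by declaration
  double x = 0, y = 0;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < thin; j++) {
      x = rgamma(1, 3, 1 / (y * y + 4))[0];           // third argument: scale
      y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];
    }
    mat(i, 0) = x;
    mat(i, 1) = y;
  }
  return mat;                         // explicit return
}')

set.seed(42)
res_r   <- gibbs_r(100, 10)
res_cpp <- gibbs_cpp(100, 10)
system.time(gibbs_r(1000, 10))
system.time(gibbs_cpp(1000, 10))
```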

Type of variables

  • Scalar: double, int, String, bool.

  • Vector: NumericVector, IntegerVector, CharacterVector, LogicalVector.

  • Matrix: NumericMatrix, IntegerMatrix, CharacterMatrix, LogicalMatrix.

  • Equivalents of lists and data frames also exist: List, DataFrame.

  • And more (e.g. Function, Environment).
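A sketch that uses several of these types in one function. The function 'describe()' and its outputs are hypothetical; it returns its results as a named R list built with 'List::create()' and 'Named()'.

```r
library(Rcpp)

cppFunction('
List describe(NumericVector x, String label) {
  int n = x.size();                  // scalar: int
  double total = 0;                  // scalar: double
  bool all_positive = true;          // scalar: bool
  LogicalVector positive(n);         // vector: LogicalVector
  NumericMatrix m(n, 2);             // matrix: NumericMatrix (starts at 0)
  for (int i = 0; i < n; i++) {
    total += x[i];
    positive[i] = (x[i] > 0);
    if (x[i] <= 0) all_positive = false;
    m(i, 0) = x[i];
    m(i, 1) = x[i] * x[i];
  }
  // List::create() with Named() builds a named R list
  return List::create(
    Named("label") = label,
    Named("n") = n,
    Named("total") = total,
    Named("all_positive") = all_positive,
    Named("positive") = positive,
    Named("matrix") = m
  );
}')

out <- describe(c(1, -2, 3), "demo")
str(out)
```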

Summary (1)

When stuck with a slow piece of code:

  • Look for alternative implementations in well-known packages.

  • Choose a function specifically tailored for the task.

  • Be explicit with function arguments.

Summary (2)

When considering writing a loop in R:

  • Think about a vectorised alternative.

  • If there is none, code it in Rcpp!