Hadley Wickham talks about R performance:
http://adv-r.had.co.nz/Performance.html
dplyr, tidyr & ggplot2
https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
A set of functions with fast implementations that replace the base R ones.
1 March 2019
R favours flexibility over performance:
\(+\) minimal upfront planning
\(-\) hard to optimise
Less flexibility => faster.
Win speed by being 'sparse' (do only the work that is needed):
General-purpose functions are slow.
'lm.fit()' instead of 'lm()'
'vapply()' instead of 'sapply()'
Look on CRAN for existing solutions.
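A minimal sketch of the two substitutions above, on simulated data that is purely illustrative. It assumes a simple one-predictor regression; the variable names are hypothetical.

```r
# Simulated data (hypothetical, for illustration only).
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# lm() parses a formula, builds a model frame and returns a rich object.
fit_general <- lm(y ~ x)

# lm.fit() takes the design matrix directly and skips that overhead.
X <- cbind(1, x)            # intercept column plus the predictor
fit_sparse <- lm.fit(X, y)

# Both give the same coefficients.
same_coef <- all.equal(unname(coef(fit_general)),
                       unname(fit_sparse$coefficients))

# sapply() inspects the results to decide what to return;
# vapply() is told the type and length of each result in advance.
values <- list(a = 1:10, b = 11:20, c = 21:30)
means_guess  <- sapply(values, mean)
means_strict <- vapply(values, mean, FUN.VALUE = numeric(1))
```

The price of the leaner call is convenience: 'lm.fit()' returns a plain list, so helpers such as 'summary()' are not available for it.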
Win speed by being precise:
R has to guess when an argument is not supplied.
'unlist(x, use.names = FALSE)' instead of 'unlist(x)'
Use the 'colClasses' argument of 'read.csv()'.
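As a rough illustration of being explicit, the sketch below uses both options. The CSV file and its columns are hypothetical and are written to a temporary file so the example is self-contained.

```r
# use.names = FALSE: skip building names for the result.
x <- list(a = 1:3, b = 4:6)
with_names    <- unlist(x)                     # carries names a1, a2, ...
without_names <- unlist(x, use.names = FALSE)  # plain integer vector

# colClasses: state the column types instead of letting read.csv() guess.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3,
                     score = c(1.5, 2.5, 3.5),
                     label = c("a", "b", "c")),
          path, row.names = FALSE)
df <- read.csv(path, colClasses = c("integer", "numeric", "character"))
str(df)
unlink(path)
```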
Byte code compilation.
Parallelisation.
'Parallel R' by Q. Ethan McCallum and Stephen Weston
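One possible way to try both ideas, using the 'compiler' and 'parallel' packages that ship with R. The toy function and the choice of two workers are assumptions made for the example.

```r
library(compiler)
library(parallel)

# A deliberately slow toy function (hypothetical).
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# Byte-compile the function explicitly. Recent versions of R already
# byte-compile functions just in time, so the gain may be small.
slow_square_c <- cmpfun(slow_square)

# Spread the calls over a small cluster of worker processes.
cl <- makeCluster(2)
res <- parLapply(cl, 1:20, slow_square_c)
stopCluster(cl)

squares <- unlist(res)
```

Starting workers and shipping data to them has a cost, so parallelisation tends to pay off only when each task is reasonably expensive.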
An object is copied each time it is grown.
The apply functions are not the solution!
Vectorise when possible (the loop then runs in C).
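The three points above can be compared with a small sketch. The function names are hypothetical, and timings will vary by machine, so none are quoted here.

```r
n <- 1e4

# Growing: the vector is extended (and copied) on every iteration.
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating: the result is created once and filled in place.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorised: the loop over elements happens in C.
vectorised <- function(n) seq_len(n)^2

system.time(grow(n))
system.time(prealloc(n))
system.time(vectorised(n))
```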
Rcpp: a package on CRAN that makes it very simple to connect C++ to R.
More flexible and often faster than functional optimisation!
Good free tutorial for C++:
'Seamless R and C++ integration with Rcpp' by Dirk Eddelbuettel
Need a working C++ compiler.
Install and load Rcpp.
Open a C++ file in RStudio.
'#include <Rcpp.h>' and 'using namespace Rcpp;' are needed at the top.
Put '// [[Rcpp::export]]' before each function that R should see.
Write your Rcpp functions.
'sourceCpp()' the file where needed.
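The steps above can be sketched as follows, assuming Rcpp and a working C++ compiler are installed. The function 'sum_cpp' is a hypothetical example. The C++ text would normally live in its own file, and is passed through the 'code' argument of 'sourceCpp()' here only to keep the sketch in one piece.

```r
library(Rcpp)

# In practice: save the C++ text in a .cpp file and call sourceCpp() on it.
sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double sum_cpp(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) {
    total += x[i];
  }
  return total;
}
')

# The exported function is now callable from R like any other.
total <- sum_cpp(c(1, 2, 3.5))
```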
Some objects are created with a constructor rather than by assignment.
= instead of <- to assign values.
Declare type of all variables.
End each statement with a semicolon.
for(init; check; increment)
Vector indices start at 0!
Use ( ) instead of [ ] to index a matrix.
Explicit 'return' statement.
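A small sketch that touches each of the differences listed above. The function 'row_max' is hypothetical and, as an assumption of the example, expects a matrix with at least one column. It is compiled with 'cppFunction()', which adds the include and export lines itself.

```r
library(Rcpp)

cppFunction('
NumericVector row_max(NumericMatrix m) {  // types declared for input and output
  int n = m.nrow();                       // = assigns; the statement ends with ;
  NumericVector out(n);                   // created by a constructor, not assignment
  for (int i = 0; i < n; ++i) {           // for(init; check; increment), from 0
    double best = m(i, 0);                // ( ) indexes the matrix
    for (int j = 1; j < m.ncol(); ++j) {
      if (m(i, j) > best) best = m(i, j);
    }
    out[i] = best;                        // [ ] indexes the vector
  }
  return out;                             // explicit return
}
')

m <- matrix(c(1, 5, 3, 4, 2, 6), nrow = 2)
res <- row_max(m)   # same values as apply(m, 1, max)
```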
Scalar: double, int, String, bool.
Vector: NumericVector, IntegerVector, CharacterVector, LogicalVector.
Matrix: NumericMatrix, IntegerMatrix, CharacterMatrix, LogicalMatrix.
Equivalents of list and data frame also exist ('List', 'DataFrame').
Etc.
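To show a few of these types working together, here is a rough sketch of a function that takes a vector and two scalars and returns a list. The function 'describe' and its arguments are hypothetical.

```r
library(Rcpp)

cppFunction('
List describe(NumericVector x, String label, bool positive_only) {
  int count = 0;                          // scalar int
  LogicalVector keep(x.size());           // logical vector of the same length as x
  for (int i = 0; i < x.size(); ++i) {
    bool ok = !positive_only || x[i] > 0; // scalar bool
    keep[i] = ok;
    if (ok) ++count;
  }
  // List is the Rcpp equivalent of an R list.
  return List::create(Named("label") = label,
                      Named("keep")  = keep,
                      Named("count") = count);
}
')

res <- describe(c(-1, 2, 3), "demo", TRUE)
str(res)
```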
When stuck with a slow piece of code:
Look for alternative implementations in well-known packages.
Choose a function specifically tailored for the task.
Be explicit with the options.
When considering writing a loop in R:
Think about a vectorised alternative.
Code it in Rcpp!