If there is one thing you learn soon as a data scientist, it is that problem solving gets an extra dimension as the data volume grows. One typical example is building recommendation engines.
A very basic form of a recommendation engine can be built using just a simple matrix algebra. But the situation quickly changes when analyzing data about many thousands customers, who are buying or rating several hundreds of products, which generates large and so called sparse data sets (where a lot of customer-item combinations do not have any value assigned). There are two main problems to overcome – how to store these large sparse matrices and how to run quick calculations over them.
In the following post, I will describe how to approach this problem in R using the package Matrix. The package allows to store large matrices in R’s virtual memory, supports standard matrix operations (transpose, matrix multiplication, element-wise multiplication etc.) and also provides a nice toolkit to develop new custom functions needed for recommendation engines as well as for other applications (here or here) where sparse matrices are used.