Our world is generating more and more data, which people and businesses want to turn into something useful. This naturally attracts many data scientists – or sometimes called data analysts, data miners, and many other fancier names – who aim to help with this extraction of information from data.
A lot of data scientists around me graduated in statistics, mathematics, physics or biology. During their studies they focused on individual modelling techniques or nice visualizations for the papers they wrote. Nobody had ever taken a proper computer science course that would help them tame the programming language completely and allow them to produce a nice and professional code that is easy to read, can be re-used, runs fast and with reasonable memory requirements, is easy to collaborate on and most importantly gives reliable results.
I am no exception to this. During my studies we used R and Matlab to get a hands-on experience with various machine learning techniques. We obviously focused on choosing the best model, tuning its parameters, solving for violated model assumptions and other rather theoretical concepts. So when I started my professional career I had to learn how to deal with imperfect input data, how to create a script that can run daily, how to fit the best model and store a predictions in a database. Or even to use them directly in some online client facing point.
To do this I took the standard path. Reading books, papers, blogs, trying new stuff working on hobby projects, googling, stack-overflowing and asking colleagues. But again mainly focusing on overcoming small ad hoc problems.
Luckily for me, I’ve met a few smart computer scientists on the way who showed me how to develop code that is more professional. Or at least less amateurish. What follows is a list of the most important points I had to learn since I left the university. These points allowed me to work on more complex problems both theoretically and technically. I must admit that making your coding skills better is a never ending story that restarts with every new project.