There are many situations where we find that our code runs too slow and we don’t know the apparent reason. For such situations it comes very handy to use the python cProfile module. The module enables us to see the time individual steps in our code take, as well as the number of times certain functions are being called. In the following paragraphs, I will explore it’s capabilities a bit more.
However first let’s remember the quote by Donald Knuth: “premature optimization is the root of all evil (or at least most of it) in programming”. So make sure that you don’t start optimizing before you even have a working code! In many cases you will not be able to determine the bottlenecks beforehand and might spend a lot of extra effort in the wrong places.
Profiling with cProfile
The easiest way of using the cProfile module from within a Python script can look as follows
pr = cProfile.Profile()
In the code we create a Profile object and enable it, then execute the code that is of interest to us, disable the profiling and view the results.
Our world is generating more and more data, which people and businesses want to turn into something useful. This naturally attracts many data scientists – or sometimes called data analysts, data miners, and many other fancier names – who aim to help with this extraction of information from data.
A lot of data scientists around me graduated in statistics, mathematics, physics or biology. During their studies they focused on individual modelling techniques or nice visualizations for the papers they wrote. Nobody had ever taken a proper computer science course that would help them tame the programming language completely and allow them to produce a nice and professional code that is easy to read, can be re-used, runs fast and with reasonable memory requirements, is easy to collaborate on and most importantly gives reliable results.
I am no exception to this. During my studies we used R and Matlab to get a hands-on experience with various machine learning techniques. We obviously focused on choosing the best model, tuning its parameters, solving for violated model assumptions and other rather theoretical concepts. So when I started my professional career I had to learn how to deal with imperfect input data, how to create a script that can run daily, how to fit the best model and store a predictions in a database. Or even to use them directly in some online client facing point.
To do this I took the standard path. Reading books, papers, blogs, trying new stuff working on hobby projects, googling, stack-overflowing and asking colleagues. But again mainly focusing on overcoming small ad hoc problems.
Luckily for me, I’ve met a few smart computer scientists on the way who showed me how to develop code that is more professional. Or at least less amateurish. What follows is a list of the most important points I had to learn since I left the university. These points allowed me to work on more complex problems both theoretically and technically. I must admit that making your coding skills better is a never ending story that restarts with every new project.