For the last few years everyone has been talking about the importance of advanced analytics for manufacturing as the next step after lean and Six Sigma programs, and about the great potential it can unleash. So when we faced our first project, we were naturally very excited and curious about what could be done. The outcome of the project exceeded our expectations both in terms of data modelling and, more importantly, business results for our client.
It’s been almost 3 years since I started aLook, first as a one-man show, later joined by friends and family. During this time we have worked on more than 60 projects with many partners for clients all over the world. Judging by the customers who keep returning and the partners who recommend us to their clients, we have mostly done a good job. And now we’re hiring!
Trying to motivate the team to work during our first hackathon. 1994 Sid Meier’s Colonization on a phone shared via Apple TV is hard to beat…
For businesses where clients generate revenues over time, knowing who your most valuable clients will be in the future is very handy information, especially if you want to optimise your service models.
For our client, an international start-up company (South Africa, Great Britain, Switzerland…), we are currently looking for (1) a behavioral data scientist and (2) a client delivery analyst.
Table Mountain, Cape Town (SA)
The client you would be working for is a company that provides big corporations with employee behavioral analytics. Our team is responsible for building and maintaining their analytical platform as well as for supporting their internal team of behavioral scientists in developing measurements.
The positions we are offering are demanding but come with their own unique advantages. Firstly, we don’t mind when or where you work as long as you deliver what you are supposed to. Secondly, you will have a huge opportunity to grow in data science and related fields, supported by our experienced team. And thirdly, you will be in direct contact with an international start-up environment.
The Monte Carlo method is a handy tool for turning problems of a probabilistic nature into straightforward computations: you simulate the random process many times and rely on the law of large numbers to make the averaged results converge. Imagine that you want to assess the future value of your investments and see what the worst-case scenario is for a given level of probability. Or that you want to plan the production of your factory, given the past daily performance of individual workers, to ensure that you will meet a tough delivery plan with high enough probability. For these and many more real-life tasks you can use the Monte Carlo method.
Monte Carlo approximation of Pi
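The Pi approximation can be sketched in a few lines of Python. This is only a minimal illustration, assuming the common "points in a quarter circle" setup: draw random points in the unit square and count the share that falls within distance 1 of the origin, which approaches Pi/4 as the number of points grows. The function name `estimate_pi` and the `seed` parameter are made up for this example.

```python
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate Pi from the share of random points inside the unit quarter circle."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps runs reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:           # inside the quarter circle of radius 1
            inside += 1
    # The quarter circle covers Pi/4 of the unit square.
    return 4.0 * inside / n_samples


# More samples should, on average, bring the estimate closer to Pi.
for n in (1_000, 10_000, 100_000):
    print(n, estimate_pi(n))
```

The error of such an estimate shrinks roughly with the square root of the number of samples, so each extra digit of precision costs about a hundred times more draws.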
When building predictive models, you obviously need to pay close attention to their performance. That is essentially what it is all about: getting the prediction right. Especially if you are working for paying clients, you need to prove that the performance of your models is good enough for their business. Fortunately, there is a whole bunch of statistical metrics and tools at hand for assessing a model’s performance.
In my experience, performance metrics for classification tasks (especially binary ones), such as the confusion matrix and the metrics derived from it, are naturally understood by almost anyone. The situation is a bit more problematic for regression and time series. For example, when you want to predict future sales or derive income from other parameters, you need to show how close your prediction is to the observed reality.
I will not write about (adjusted) R-squared, the F-test and other statistical measures. Instead, I want to focus on performance metrics that represent a more intuitive concept of performance, as I believe they can help you sell your work much better. These are:
- mean absolute error
- median absolute deviation
- root mean squared error
- mean absolute percentage error
- mean percentage error
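As a rough sketch, the five metrics can be computed with nothing but the Python standard library. The helper name `regression_metrics` is made up for this example, and two conventions are assumptions rather than something stated above: the error is taken as actual minus predicted, and the median absolute deviation is computed as the median of the absolute errors (another convention measures the deviation around the median error instead).

```python
import math
from statistics import mean, median


def regression_metrics(actual, predicted):
    """Return MAE, MAD, RMSE, MAPE and MPE for two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Assumed sign convention: error = actual - predicted.
    errors = [a - p for a, p in zip(actual, predicted)]
    # Relative errors are undefined when an actual value is zero.
    relative = [e / a for e, a in zip(errors, actual)]
    return {
        "mae": mean(abs(e) for e in errors),
        # Assumption: median of the absolute errors.
        "mad": median(abs(e) for e in errors),
        "rmse": math.sqrt(mean(e * e for e in errors)),
        "mape": 100 * mean(abs(r) for r in relative),
        "mpe": 100 * mean(relative),
    }


# Illustrative numbers only, not real sales data.
metrics = regression_metrics([100, 200, 400], [110, 190, 400])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this sign convention a negative mean percentage error means the model over-predicts on average, which is the kind of statement a client can act on directly.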
Our world is generating more and more data, which people and businesses want to turn into something useful. This naturally attracts many data scientists (sometimes also called data analysts, data miners, and many other fancier names) who aim to help with this extraction of information from data.
A lot of the data scientists around me graduated in statistics, mathematics, physics or biology. During their studies they focused on individual modelling techniques or on nice visualizations for the papers they wrote. None of them had ever taken a proper computer science course that would help them tame a programming language completely and allow them to produce clean, professional code that is easy to read, can be re-used, runs fast with reasonable memory requirements, is easy to collaborate on and, most importantly, gives reliable results.
I am no exception to this. During my studies we used R and Matlab to get hands-on experience with various machine learning techniques. We obviously focused on choosing the best model, tuning its parameters, dealing with violated model assumptions and other rather theoretical concepts. So when I started my professional career I had to learn how to deal with imperfect input data, how to create a script that can run daily, and how to fit the best model and store its predictions in a database, or even use them directly in some online client-facing touchpoint.
To do this I took the standard path: reading books, papers and blogs, trying new stuff, working on hobby projects, googling, stack-overflowing and asking colleagues. But again, I was mainly focusing on overcoming small ad hoc problems.
Luckily for me, I met a few smart computer scientists along the way who showed me how to develop code that is more professional. Or at least less amateurish. What follows is a list of the most important points I have had to learn since I left university. These points allowed me to work on problems that are more complex both theoretically and technically. I must admit that improving your coding skills is a never-ending story that restarts with every new project.