Author Archives: adamvotava

We are looking for new colleagues! Behavioral data scientist & client delivery analyst to join our team

For our client, an international start-up company (South Africa, Great Britain, Switzerland…), we are currently looking for (1) a behavioral data scientist and (2) a client delivery analyst.

Table Mountain, Cape Town (SA)

The client you would be working for is a company that provides big corporations with employee behavioral analytics. Our team is responsible for building and maintaining their analytical platform, as well as for supporting their internal team of behavioral scientists in developing measurements.

The positions we are offering are demanding but come with their own unique advantages. Firstly, we don’t mind when or where you work, as long as you deliver what you are supposed to. Secondly, you will have a huge opportunity to grow in data science and related fields, supported by our experienced team. And thirdly, you will be in direct contact with an international start-up environment. Continue reading

Monte Carlo Method in R (with worked examples)

The Monte Carlo method is a handy tool for transforming problems of a probabilistic nature into deterministic computations, relying on the law of large numbers. Imagine that you want to assess the future value of your investments and see what the worst-case scenario is for a given level of probability. Or that you want to plan your factory’s production, given the past daily performance of individual workers, to ensure that you meet a tough delivery plan with high enough probability. For these and many other real-life tasks you can use the Monte Carlo method.

Monte Carlo approximation of Pi
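
To give a flavour of the worked examples, here is a minimal R sketch of the pi approximation above: sample points uniformly in the unit square and count the share that lands inside the quarter circle of radius 1, whose area is pi/4.

```r
# Monte Carlo approximation of pi: by the law of large numbers, the
# fraction of uniform random points landing inside the quarter circle
# converges to its area, pi/4.
set.seed(42)
n <- 1e6
x <- runif(n)
y <- runif(n)
pi_hat <- 4 * mean(x^2 + y^2 <= 1)
pi_hat  # close to 3.1416; the error shrinks as n grows
```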

Continue reading

Understanding the behavior of regression performance metrics

When building predictive models, you obviously need to pay close attention to their performance. That is essentially what it is all about – getting the prediction right. Especially if you are working for paying clients, you need to prove that the performance of your models is good enough for their business. Fortunately, there is a whole bunch of statistical metrics and tools at hand for assessing a model’s performance.

In my experience, performance metrics for (especially binary) classification tasks, such as the confusion matrix and the metrics derived from it, are naturally understood by almost anyone. The situation is a bit more problematic for regression and time series. For example, when you want to predict future sales or derive income from other parameters, you need to show how close your prediction is to the observed reality.

I will not write about (adjusted) R-squared, the F-test and other statistical measures. Instead, I want to focus on performance metrics that represent a more intuitive concept of performance, as I believe they can help you sell your work much more effectively. These are (a small worked example in R follows the list):

  • mean absolute error
  • median absolute deviation
  • root mean squared error
  • mean absolute percentage error
  • mean percentage error

Continue reading

8 simple ways to boost your coding skills (not just) in R

Our world is generating more and more data, which people and businesses want to turn into something useful. This naturally attracts many data scientists – sometimes called data analysts, data miners, or many other fancier names – who aim to help with this extraction of information from data.

A lot of the data scientists around me graduated in statistics, mathematics, physics or biology. During their studies they focused on individual modelling techniques or on nice visualizations for the papers they wrote. Hardly anybody had taken a proper computer science course that would help them fully tame a programming language and allow them to produce nice, professional code that is easy to read, can be re-used, runs fast with reasonable memory requirements, is easy to collaborate on and, most importantly, gives reliable results.

I am no exception to this. During my studies we used R and Matlab to get hands-on experience with various machine learning techniques. We obviously focused on choosing the best model, tuning its parameters, dealing with violated model assumptions and other rather theoretical concepts. So when I started my professional career, I had to learn how to deal with imperfect input data, how to create a script that can run daily, and how to fit the best model and store its predictions in a database, or even serve them directly at some online client-facing point.

To do this I took the standard path: reading books, papers and blogs, trying new stuff on hobby projects, googling, stack-overflowing and asking colleagues. But again, I was mainly focused on overcoming small ad hoc problems.

Luckily for me, I’ve met a few smart computer scientists along the way who showed me how to develop code that is more professional – or at least less amateurish. What follows is a list of the most important points I have had to learn since I left university. These points have allowed me to work on problems that are more complex both theoretically and technically. I must admit that improving your coding skills is a never-ending story that restarts with every new project.

Continue reading

Propensity modelling and how it is relevant for modern marketing

In the last few years, the obvious fact that for successful marketing you need to “contact the right customers with the right offer through the right channel at the right time” has become something of a mantra. While there is nothing to disagree with here, it is a pity that for the most part the saying stays in words and only rarely gets put into practice. The issue is that while many can repeat the mantra, only a few actually know what is needed to make it happen. In this post, I am going to talk about the first part – how to target the right customers for your marketing actions.

There are many approaches to solving this great puzzle. One extreme is having a team of marketing experts who rely solely on their gut feeling, projecting their opinions onto customers without any proof, not even evaluating or testing the campaigns – because that’s what they did in their previous job. It might sound ridiculous in today’s digital era, but surprisingly it is often the case.


The other extreme is building complex AI engines and letting them make all the decisions. This is typically a proposition by some geeky start-up run by fresh PhD holders. In my opinion, this approach is also wrong. First, you have absolutely no assurance that the available data truly reflect reality, that the algorithm works flawlessly, or simply that the world is not too random to predict. After all, even companies running algorithmic trading have human dealers overseeing their algorithms, focusing on addressing the algorithms’ weaknesses and generally on preventing internal disasters.

As always, I think the solution lies somewhere in between. An experienced marketer whose opinion is backed by information extracted from the available data can truly hit the mark. Imagine that you have to run a campaign to increase sales of a savings account (or a road bike, a new robot, a holiday in the Caribbean…). The long-proven data extraction technique one should consider is called propensity to buy (or to purchase, or to use).
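
To make the idea concrete, here is a minimal sketch in R. It assumes a hypothetical customer table (the column names and values are illustrative only) and uses a simple logistic regression – one common way to estimate propensities – to score each customer’s probability of buying, so the campaign can target the highest scores first.

```r
# Hypothetical customer data; all columns and numbers are made up.
customers <- data.frame(
  age     = c(25, 34, 45, 52, 61, 38, 29, 48),
  balance = c(1200, 5400, 300, 8800, 15000, 700, 2500, 9100),
  bought  = c(0, 1, 1, 0, 1, 0, 0, 1)  # response in a past campaign
)

# Logistic regression estimates each customer's probability of buying.
model <- glm(bought ~ age + balance, data = customers, family = binomial)
customers$propensity <- predict(model, type = "response")

# Contact the customers with the highest propensity first.
customers[order(-customers$propensity), ]
```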

Continue reading

How to plot your own bike/jogging route using Python and Google Maps API

Apart from being a data scientist, I also spend a lot of time on my bike. It is therefore no surprise that I am a huge fan of all kinds of wearable devices. A lot of the time, though, I get quite frustrated with the data processing and data visualization software that the major providers of wearable devices offer. That’s why I have been trying to take things into my own hands. Recently I started playing around with plotting my bike routes from Python using the Google Maps API. My novice’s guide to all this follows in the post.


Continue reading

Data Science in Marketing – An Efficient Alternative to Attribution Modelling

Driving a marketing budget sometimes seems to be a mysterious art where decisions are based on the ideas of a few enlightened people who know what’s right. But you should not fool yourself: the times are changing, and so is the way successful marketing is managed. As in other fields, experienced marketing managers use information hidden in data to help them. With the amount of data and methods available, however, it is often tricky not to get lost and to distinguish the signal from the noise. A typical example is marketing attribution models – a tool that is widely used but, in my experience, rarely maximizes the value leveraged from the data.

Typically, in marketing attribution, marketers want to know which part of a business KPI (typically site visits, sales, new customers, new revenues, etc.) results from which marketing activity. The mainstream approach is to use attribution models that are often very simplistic – like single-source attribution (last click, first click) or fractional attribution (where the contribution is distributed among multiple touch points by some simple rule). These methods provide marketers with the importance of each marketing channel or campaign with respect to their KPI, and based on this historical information marketing managers decide how to allocate the budget. This approach, however, puts a great deal of pressure on tedious and demanding data detective work to make sure all client touch points are measured correctly. More importantly, there is no way of knowing whether this work has been done correctly, which of course significantly impacts the credibility of the attribution models.

Knowing these difficulties, we opted for an alternative approach. We thought: why should we dig into the individual touch points? Shouldn’t we rather focus on the marketing investments themselves and model the ultimate business output? And that is exactly what we did. We took the investments into individual marketing channels over time and used time series analysis to predict our client’s business goal (the number of sales). On top of that, we also added seasonality, the marketing investments of competitors and some other simple parameters.
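
A minimal sketch of the idea in R, on simulated data (the client’s actual model, inputs and figures are of course not shown here): regress sales on per-channel spend plus a seasonal term, and read each channel’s sales contribution off the fitted coefficients.

```r
# Two years of simulated weekly data; every number here is made up.
set.seed(1)
week         <- 1:104
spend_tv     <- runif(104, 0, 100)
spend_online <- runif(104, 0, 80)
season       <- sin(2 * pi * week / 52)  # yearly seasonality
sales        <- 50 + 0.8 * spend_tv + 1.2 * spend_online +
                20 * season + rnorm(104, sd = 10)

# A linear model estimates how much each channel's spend drives sales.
fit <- lm(sales ~ spend_tv + spend_online + season)
coef(fit)
```

The fitted coefficients estimate each channel’s contribution per unit of spend – the very information attribution models try to recover from touch-point data, obtained here without the touch-point detective work.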


“Even though we are using data to drive marketing decisions on a daily basis, most of the tools that we have used up until now focus on describing the past. Recently we decided to work together with aLook Analytics to change that. Thanks to their modelling approach to marketing investments we now have accurate information about the expected future developments as well.

Using the interactive Shiny application that is built in Keboola Connection, we want to make informed decisions on the fly, which will help us to reach our sales goals in the most cost efficient way.”

Daniel Gorol, BNP Paribas Personal Finance SA / Cetelem


Continue reading