Monthly Archives: April 2016

Data Scientists toolbox: Cloud set-up using Ansible

Given the size of data and complexity of processing, many Data Science projects require scalability that can be provided by cloud environments. Clouds combine high performance and cost-efficiency and are therefore very much sought after. 

Setting up a cloud environment can be quite tedious. Fortunately, the required infrastructure installations are often similar across projects, so tools that automate them can be used to minimize the manual workload. This blog post covers setting up a cloud environment using Ansible, which is one such tool.

We will focus on UNIX-based systems because:

  • most people already have a working knowledge of Windows-based systems but are much less familiar with UNIX
  • most UNIX-based systems are open-source under free licenses, so they are generally cheaper to run
  • some handy products (e.g. RStudio Server) are available only for UNIX-based systems, so a situation where basic knowledge is necessary can arise

Continue reading

Brighttalk webinar: Data Science as a profit booster for small businesses

Data science or its alter egos – predictive analytics, machine learning or AI – can, if implemented correctly, offer an impressive ROI. Moreover, as new technologies and products are developed, the data-driven approach becomes more accessible every day. Even medium and small businesses can now leverage the value of data science. In our experience, the proposition is even more rewarding for them than for large corporations.

Why is that? What are some typical applications of data science for small companies?

Watch our Brighttalk webinar to hear more.

Data Science as a profit booster for SMB (2/4): Barriers and who can overcome them?

In the previous post, I explained how I view data science and its value added in the business context. Although it clearly offers huge potential for many companies, I think data science is not used as successfully or as widely as it should be. One important barrier is that it stays very abstract. Surprisingly, it is the small and medium businesses that can overcome the barriers to its implementation most easily.

Why is Data Science so difficult to grasp?

It really does sound quite glamorous – being able to read and predict from data, having automated dashboards and apps… All the fancy buzzwords that go together with data science, such as big data, artificial intelligence or deep learning, show that it is something that attracts attention. But as one nice quote put it, many of these words are like teenage sex – everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…

So why is it so difficult to make Data Science happen when it is so great?

Continue reading

Data Science as a profit booster for SMB (1/4): What is Data Science?

Data science is far from being a new field. Yet for a substantial portion of mostly smaller businesses, the term data science stays obscure and difficult to grasp. Many of them keep hearing about it, but they are not really sure if or how they should use it. In my practice, I have come to the conclusion that it is above all the SMBs for whom data science represents a great opportunity.

This is the first in a series of blog posts in which I would like to share my thoughts on the value added by data science and on how businesses can make use of it.

Question #1 – What the heck actually is Data Science?

Many people consider the term “data science” just a new buzzword, all the more so because understanding what it means ideally requires at least some technical awareness. It is not necessary to understand data science to be able to make use of it, but unfortunately people tend not to like things they don’t understand.

Data science enables us to get verified conclusions that matter.

When I need to explain what data science is, I use the very general saying that ‘data science is about extracting knowledge and insights from large amounts of data’. Sure, it can seem that this is nothing new – after all, analyzing data in Excel has been possible since 1985. But let’s not make the same mistake as many famous consultancy companies by believing that applying business intuition over Excel and PowerPoint visualizations is a data-driven approach that can give anyone an edge over the competition in the 21st century.

Data science is much more than that. It enables us not just to describe and form an impression, but to get verified conclusions based on data in various forms and locations and seamlessly present business-relevant results via lucid, easy-to-distribute, scalable data visualizations and automated products.

In my opinion, the extreme power of data science solutions lies especially in their ability to predict future outcomes – the famous “from descriptive to predictive” – and to deliver the results to the end user on an automated basis.

…and how can it even be important for businesses?

Continue reading

Data Scientists toolbox – Intro

The following series of blog posts will cover some essentials in terms of tools used by data scientists. The individual posts should serve as a guide to setting up a proper environment for all types of data science tasks. The tools will therefore be described from a technical perspective without paying attention to individual libraries or algorithms.


The posts will cover tools that are either not that common or a little bit tricky to set up. The aim is to provide information about interesting new tools that help broaden the analytical skills of anyone who deals with data analysis.

Continue reading