Data Scientist's toolbox: Cloud set-up using Ansible

Given the size of the data and the complexity of the processing involved, many Data Science projects need the scalability that cloud environments provide. Clouds combine high performance with cost-efficiency, which makes them very much sought after.

The set-up of a cloud environment can be quite tedious. Fortunately, the required infrastructure installations are often similar across projects, so tools that automate them can minimize the manual workload. This blog post covers setting up a cloud environment using Ansible, which is one such tool.

We will focus on UNIX-based systems because:

  • most people already have a working knowledge of Windows-based systems, but are much less familiar with UNIX
  • most UNIX-based systems are open-source under free licenses, so they are generally cheaper to run
  • some handy products (e.g. RStudio Server) are available only for UNIX-based systems, so situations where basic UNIX knowledge is necessary can arise

