This year we started working on advanced analytics projects in manufacturing. The boom in IoT sensors, the never-ending pressure to increase yields and output quality, the diminishing marginal returns of lean and Six Sigma activities, and the general trend towards analytics meant that we quickly ran out of capacity. The projects are intriguing, the data sets are large, we are fun to work with and the demand is enormous. Honestly, I don’t see any reason not to join us!
We are looking for a person to build machine learning models for our manufacturing clients. As in other industries, we do not specialise in one type of analytics but rather seek a high variety of projects. So far we have worked with images to create an automated quality control algorithm, with sensor data to optimise machine settings, and with laboratory measurements when performing a design of experiments.
We have learned that one needs to be familiar with both supervised and unsupervised learning methods. The overall model is also very often shaped by expert input, which makes it a form of semi-supervised learning. All our solutions need to be self-learning so that they fit the manufacturers’ continuous improvement DNA.
A typical project might involve image processing, dimensionality reduction followed by clustering or anomaly detection, or a binary classification or regression problem. Sometimes it even involves forecasting, as in predictive maintenance.
Not long ago we published a blog post about hiring data scientists and data engineers. Now we are looking specifically for machine learning experts. What I’ve learned so far is that the true data guys are motivated by the task they will be asked to solve. So let me give you an example of a problem you could be working on – predictive maintenance. This problem will also serve as the test task for the job.
Predictive maintenance in general is about determining the condition of a machine or a piece of equipment by estimating its time to failure. Such models are extremely valuable because they reduce repair costs while minimising maintenance costs.
Because we cannot use data from our clients, let us use a data set released in 2008 by the Prognostics CoE at NASA Ames. The Turbofan Engine Degradation Simulation data set contains data from an engine degradation simulation carried out using C-MAPSS. Four different sets were simulated under different combinations of operational conditions and fault modes. Each set includes data from multiple sensor channels that characterise how the fault evolves. Below is a simple example of the data.
The data can be downloaded here. The original paper is part of the data download but there are many other publications using this data set. The data consist of multiple multivariate time series for individual engines operating under four different simulated scenarios. For each scenario a training and test set is available.
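To make the "multiple multivariate time series" structure concrete, here is a minimal sketch of how the rows could be grouped into one series per engine. It assumes each row is whitespace-separated and starts with an engine id and a cycle number, followed by the operational settings and sensor readings; please check that layout against the readme. The function name `parse_engine_series` and the sample rows are made up for illustration and are not part of the data set.

```python
from collections import defaultdict


def parse_engine_series(lines):
    """Group whitespace-separated rows into one time series per engine.

    Assumed row layout (check the readme): engine id, cycle number,
    then operational settings and sensor readings as numbers.
    """
    series = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        engine_id = int(fields[0])
        cycle = int(fields[1])
        values = [float(x) for x in fields[2:]]
        series[engine_id].append((cycle, values))
    # Make sure every engine's rows are ordered by cycle.
    for rows in series.values():
        rows.sort(key=lambda row: row[0])
    return dict(series)


# Invented sample rows, much narrower than the real files.
sample_rows = [
    "1 1 0.5 641.8",
    "1 2 0.6 642.1",
    "2 1 0.4 641.9",
]
engines = parse_engine_series(sample_rows)
```

With a real file you would pass the open file object instead of `sample_rows`, since iterating over a file yields its lines.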
As stated in the data set’s readme file, the engine is operating normally at the start of each time series and develops a fault at some point during the series (note that in the plot above the time series are shifted to match the time of fault). In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate.
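Since every training series runs until failure, a regression target can be derived directly from the cycle counter: the remaining cycles at any point are the engine's last recorded cycle minus the current cycle. The sketch below shows one possible way to build such labels. The function name `remaining_cycles` and the toy input are hypothetical, and treating the final recorded cycle as zero cycles remaining is an assumption you may want to revisit.

```python
def remaining_cycles(cycles_by_engine):
    """Map each engine's observed cycles to the cycles left before failure.

    Assumes each training series ends at failure, so the last recorded
    cycle gets a label of 0.
    """
    labels = {}
    for engine_id, cycles in cycles_by_engine.items():
        last_cycle = max(cycles)  # final recorded cycle, taken as the failure point
        labels[engine_id] = [last_cycle - cycle for cycle in cycles]
    return labels


# Toy input: engine 1 fails after 4 cycles, engine 2 after 2.
toy_cycles = {1: [1, 2, 3, 4], 2: [1, 2]}
rul = remaining_cycles(toy_cycles)
# rul[1] is [3, 2, 1, 0] and rul[2] is [1, 0]
```

This only works for the training set. In the test set the series stops before failure, so the last cycle is not the failure point and the remaining cycles are exactly what the model has to predict.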
Your submission should contain a brief summary of the results and the main steps, as well as the code that runs the model. Python is preferred but not a must.
If you are interested, please send your solution to me at firstname.lastname@example.org.
And stay tuned for the post with the best approaches and results!