Are you into cluster computing with Apache Spark? This year’s SAIS 2018 conference covered great data engineering and data science best practices for productionizing AI. In a nutshell: keep your training data fresh with stream processing, monitor data quality, and test and serve models — at massive scale, since we are talking about Spark. The conference also offered deep-dive sessions on Spark integration with popular machine learning frameworks such as TensorFlow, Scikit-learn, Keras, PyTorch, Deeplearning4j, BigDL and Deep Learning Pipelines.
Here is a list of several interesting topics, in case you couldn’t join ;-):
This year we started working on advanced analytics projects in manufacturing. The boom of IoT sensors, never-ending pressure to increase yields and output quality, the decreasing marginal effect of lean and Six Sigma activities, and the broader analytics trend meant that we quickly ran out of existing capacity. The projects are intriguing, the data are large, we are fun to work with, and the demand is enormous. Honestly, I don’t see any reason not to join us!
I’m a data scientist, not a public speaker, so when the Keboola guys asked me to give a talk with them in London I was excited. We chose transactional data analysis as the topic for two main reasons: first, transactional data can be used to solve so many business issues, and second, it is everywhere.
Coming from a classical IT background in software development, it took us a while to arrive at an architecture capable of fulfilling our needs for Data Science projects. Be aware that treating these two disciplines in a similar manner is not a good idea, as you might seriously lower the productivity of your Data Science team.