Are you into cluster computing with Apache Spark? This year’s SAIS 2018 conference covered data engineering and data science best practices for productionizing AI. In a nutshell, you should keep your training data fresh with stream processing, monitor quality, and test and serve models (at massive scale, since we are talking about Spark). The conference also offered deep-dive sessions on Spark integration with popular machine learning frameworks, such as the well-known TensorFlow, Scikit-learn, Keras, PyTorch, Deeplearning4j, BigDL and Deep Learning Pipelines.
Here is a list of several interesting topics (in case you couldn’t join ;-):
It’s no secret that a good pricing strategy is one of the most important aspects of every business. There are various ways to determine the prices that maximize your overall profit. However, to maximize profit and engage price-sensitive customers at the same time, you have to make sure you are heading in the right direction.
Dynamic pricing, based on real-time market changes, is the latest pricing trend dominating the e-commerce industry. Before deploying our dynamic pricing solution, we faced the problem of how to test the model’s performance and compare it to the client’s current solution.
As you may agree, adopting any pricing model without thorough testing would be extremely risky (especially when it controls thousands of products!).
The question was how to design an appropriate test that would help us validate a new pricing strategy and gain confidence in the change we were making.
Standard A/B Testing…
The basic idea behind A/B testing is to compare two variants: A (the currently used control version) and B (the modified test version). Customers are typically split in half at random, and the two groups need to be as similar as possible. Each customer is assigned to either the control group or the test group without being told which one. The goal is to determine which variant performs better.
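The random split described above can be sketched in a few lines of Python. This is only a minimal illustration, not the setup used in the project: the function names `split_ab` and `mean_uplift`, the fixed seed, and the choice of average value per customer as the comparison metric are all assumptions made for the example.

```python
import random


def split_ab(customer_ids, seed=0):
    """Randomly split customers into a control half (A) and a test half (B)."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_uplift(control_values, test_values):
    """Relative difference of the test group's mean versus the control group's mean."""
    control_mean = sum(control_values) / len(control_values)
    test_mean = sum(test_values) / len(test_values)
    return (test_mean - control_mean) / control_mean


# Hypothetical customer IDs, used only to show the call.
control, test = split_ab(range(1000), seed=42)
```

A raw difference in means says nothing about statistical significance, so a real evaluation would add a significance test on top of this.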
This year’s ECCV 2018 conference saw unprecedented growth of the community and brought to light the most recent advances in computer vision. As expected, all the sessions were dominated by Deep Learning with Convolutional Neural Networks (CNNs).
For those who couldn’t join, I picked a few interesting topics that caught my attention. Here is the list:
One of the main topics at ECCV 2018 was autonomous driving. Can you compete against LIDAR? Can you detect and reconstruct cars as 3D objects from video? Check out some of the ECCV challenges!
Python programming is our daily bread. We develop frameworks that are then deployed on customers’ infrastructure. In some cases there is an emphasis on performance, as in a recent project with a recommender engine that had to load an individual recommendation in less than 30 ms. Faster computation can help in such cases, especially since switching to a specific distribution requires no changes to the underlying Python code.
Two weeks ago, Martin found that an Intel distribution for Python exists, so I decided to have a look. Intel claims that this distribution is faster in every way, and shares its benchmark. So rather than running only the Intel benchmark, I decided to test the distributions with my own benchmark to determine their performance on typical tasks in a Data Science pipeline.
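To give an idea of how such a comparison can be run, here is a minimal timing harness built on the standard `timeit` module. It is a sketch under stated assumptions, not the benchmark from the post: the `workload` function is a placeholder (sorting and summing a list of floats), and the `benchmark` helper and its parameters are hypothetical names chosen for the example. The same script would be run once under each Python distribution and the timings compared.

```python
import sys
import timeit


def workload():
    # Placeholder workload (an assumption): build, sort and sum a list of floats.
    data = [((i * 7919) % 10007) / 10007 for i in range(50_000)]
    return sum(sorted(data))


def benchmark(fn, repeat=5, number=3):
    """Return the best average time per call, in seconds."""
    times = timeit.repeat(fn, repeat=repeat, number=number)
    return min(times) / number


# Print the interpreter version so results from different distributions can be told apart.
print(sys.version)
print(f"workload: {benchmark(workload):.4f} s per call")
```

Taking the minimum over several repeats reduces the influence of background noise on the machine. For a fair comparison, the workload should be replaced with the numerical operations the pipeline actually performs.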
The current landscape of IoT (Internet of Things), low-cost GNSS (global navigation satellite system) receivers, and omnipresent wireless networks produces large amounts of data containing geospatial information. This blog post introduces the basics of the geolocated k-nearest neighbors (k-NN) model and its applications in product campaign targeting.
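As a rough illustration of the core idea, the sketch below finds the k nearest geolocated points by brute force, using the haversine great-circle distance. It is an assumption-laden example rather than the model from the post: the function names, the spherical Earth radius of 6371 km, and the sample city coordinates are all chosen just for the demonstration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def k_nearest(query, points, k):
    """Return the k points closest to `query`; each point is (name, lat, lon)."""
    q_lat, q_lon = query
    return sorted(points, key=lambda p: haversine_km(q_lat, q_lon, p[1], p[2]))[:k]


# Illustrative data only: a query location in Prague and three other cities.
cities = [
    ("Brno", 49.1951, 16.6068),
    ("Vienna", 48.2082, 16.3738),
    ("Berlin", 52.5200, 13.4050),
]
neighbors = k_nearest((50.0755, 14.4378), cities, k=2)
```

The brute-force sort is fine for small datasets. For larger ones, a spatial index such as a ball tree with a haversine metric would usually be preferred.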
This year we started to work on advanced analytical projects in manufacturing. The boom of IoT sensors, the never-ending pressure to increase yields and output quality, the decreasing marginal effect of lean and Six Sigma activities, and the big trend of analytics meant that we quickly ran out of capacity. The projects are intriguing, the data are large, we are fun to work with and the demand is enormous. Honestly, I don’t see any reason not to join us!