A Look at the Space Application Hackathon: ML applied to satellite imagery in agriculture

At the beginning of November, my colleague Honza and I attended the Space Application Hackathon, where our team won the Earth observation category. Specifically, we worked out two use cases: classification of agricultural land use and crop yield prediction. Here is our write-up from the event.


Classification of soybeans and wheat. The upper image is our prediction; the bottom image is the reality.

Continue reading


Highlights from Spark + AI Summit 2018 (SAIS 2018)

Are you into cluster computing with Apache Spark? This year’s SAIS 2018 conference covered great data engineering and data science best practices for productionizing AI. In a nutshell, you should keep your training data fresh with stream processing, monitor quality, and test and serve models (at massive scale when talking about Spark). The conference also provided some deep-dive sessions on Spark integration with popular machine learning frameworks, such as the well-known TensorFlow, Scikit-learn, Keras, PyTorch, Deeplearning4j, BigDL and Deep Learning Pipelines.

Here is the list of several interesting topics (in case you couldn’t join;-):

Spark Experience and Use Cases

CERN’s Next Generation Data Analysis Platform with Apache Spark

Great talk about using Spark for HEP (high energy physics) data processing and analysis as a complementary tool to the current grid computing at CERN.

Continue reading

Highlights from IEEE International Conference on Image Processing 2018

“Imaging beyond imagination”

That is this year’s ICIP 2018 conference theme. I attended the world’s largest and most comprehensive technical conference focused on image and video processing and computer vision.

Here are my special picks:

Image Representation and Modeling

Reducing Anomaly Detection in Images to Detection in Noise

Smart approach to anomaly detection by removing self-similar content of the image – ready to use for detecting material defects, tumors and others!

Continue reading

Real life testing of dynamic pricing model in e-commerce

It’s no secret that a good pricing strategy is one of the most important aspects of every business. There are various ways to determine the prices at which you can maximize your overall profit. However, to maximize profit and engage price-sensitive customers at the same time, you have to make sure you are going in the right direction.

Dynamic pricing, based on real-time market changes, is the latest pricing trend dominating the e-commerce industry. Before deploying our dynamic pricing solution, we faced the problem of how to test the performance of the model and compare it to the client’s current solution.

As you may agree, adopting any pricing model without thorough testing would be extremely risky (especially when it controls thousands of products!).

The question was how to design an appropriate test that would help us validate a new pricing strategy and gain confidence in the change we were making.

Standard A/B Testing…

The basic idea behind A/B testing is to compare two variants: A (the currently used control version) and B (the modified test version). Customers are typically split in half at random, and the two groups need to be as similar as possible. Without being told, each customer is assigned to either the control group or the test group. The goal is to determine which variant performs better.
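To make the split concrete, here is a minimal sketch of a random 50/50 assignment followed by a naive comparison of average revenue per customer. The function names, the `revenue_by_customer` mapping and the choice of mean revenue as the metric are assumptions made for illustration; they are not taken from the pricing project described above.

```python
import random
from statistics import mean


def split_customers(customer_ids, seed=0):
    """Randomly split customers into a control half (A) and a test half (B)."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def compare_variants(revenue_by_customer, control, test):
    """Return mean revenue for A, mean revenue for B, and the difference B - A."""
    mean_a = mean(revenue_by_customer[c] for c in control)
    mean_b = mean(revenue_by_customer[c] for c in test)
    return mean_a, mean_b, mean_b - mean_a


if __name__ == "__main__":
    control, test = split_customers(range(1000), seed=42)
    # Hypothetical revenues: every customer in B spends slightly more.
    revenue = {c: 10.0 for c in control}
    revenue.update({c: 10.5 for c in test})
    print(compare_variants(revenue, control, test))
```

In practice the raw difference would still need a significance test before drawing any conclusion; the sketch only shows the assignment and the comparison step.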

Continue reading

Highlights from the European Conference on Computer Vision 2018

This year’s ECCV 2018 conference saw unprecedented growth in its community and brought to light the most recent advances in computer vision. As expected, all the sessions were dominated by Deep Learning with Convolutional Neural Networks (CNNs).

For those who couldn’t join, I picked out a few interesting topics that caught my attention. Here is the list:

Autonomous Driving

Self-localization on-the-fly

One of the main topics at ECCV 2018 was autonomous driving. Can you compete against LIDAR? Can you detect and reconstruct cars as 3D objects from video? Check out some of ECCV’s challenges!


Continue reading

The performance of Intel vs. Anaconda vs. vanilla Python – my personal benchmark

Python programming is our daily bread. We develop frameworks, which are afterwards deployed on customers’ infrastructure. In some cases, there is an emphasis on performance, such as a recent case with a recommender engine that has to load an individual recommendation in less than 30 ms. Faster computation can help here, especially since using a specific distribution requires no changes to the underlying Python code.

Two weeks ago, Martin found that an Intel distribution for Python exists, so I decided to have a look. Intel claims that this distribution is faster in every way, and shares its benchmark. So rather than running only the Intel benchmark, I decided to test the distributions using my own benchmark to determine the performance on typical tasks in a Data Science pipeline.
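As an illustration of how such a comparison can be set up, here is a minimal timing harness built on the standard library. It is a sketch, not the benchmark used in this post: the `benchmark` helper and the `sort_workload` function are hypothetical stand-ins, and a real comparison would time the numerical operations your own pipeline relies on. The idea is to run the same unchanged script under each interpreter and compare the reported times.

```python
import statistics
import timeit


def benchmark(func, repeat=5, number=10):
    """Return the best and the median per-call time of func, in seconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in timings]
    return min(per_call), statistics.median(per_call)


def sort_workload():
    """Hypothetical stand-in workload: build and sort a pseudo-shuffled list."""
    data = [(i * 7919) % 10007 for i in range(10_000)]
    return sorted(data)


if __name__ == "__main__":
    best, median = benchmark(sort_workload)
    print(f"best: {best * 1e3:.3f} ms, median: {median * 1e3:.3f} ms")
```

Reporting the best time alongside the median helps separate the raw speed of the distribution from noise caused by other processes on the machine.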

Intel Distribution for Python

Continue reading

Geolocated nearest neighbors in product campaign targeting

The current landscape of IoT (internet of things), low-cost GNSS (satellite navigation system) receivers, and omnipresent wireless networks produces large amounts of data containing geospatial information. This blog post introduces the basics of the geolocated k-nearest neighbors (k-NN) model and its applications in product campaign targeting.
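As a rough idea of what a geolocated nearest-neighbor lookup involves, here is a minimal brute-force sketch that ranks points by great-circle (haversine) distance. The function names, the `(label, lat, lon)` tuple layout and the example cities are assumptions for illustration only; the model discussed in the full post may be built differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def k_nearest(query, points, k=3):
    """Return the k points closest to query.

    query is (lat, lon); points is a list of (label, lat, lon) tuples.
    """
    qlat, qlon = query
    ranked = sorted(points, key=lambda p: haversine_km(qlat, qlon, p[1], p[2]))
    return ranked[:k]


if __name__ == "__main__":
    cities = [
        ("Prague", 50.0755, 14.4378),
        ("Brno", 49.1951, 16.6068),
        ("Vienna", 48.2082, 16.3738),
        ("Berlin", 52.5200, 13.4050),
    ]
    print(k_nearest((50.0, 14.5), cities, k=2))
```

Sorting every point is fine for small data sets; for large ones a spatial index (for example a ball tree with a haversine metric) would typically replace the brute-force scan.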

Continue reading