**At the beginning of November, my colleague Honza and I attended the Space Application Hackathon, where our team won the Earth observation category. Specifically, we worked on two use cases – classification of agricultural land use and crop yield prediction. Here is our write-up from the event.**

# Category Archives: Tech

# Highlights from IEEE International Conference on Image Processing 2018

“Imaging beyond imagination”

**That is the theme of this year’s ICIP 2018 conference. I attended the world’s largest and most comprehensive technical conference focused on image and video processing and computer vision.**

Here are my special picks:

# Image Representation and Modeling

### Reducing Anomaly Detection in Images to Detection in Noise

A smart approach to anomaly detection that removes the self-similar content of an image – ready to use for detecting material defects, tumors, and more!
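To give a toy illustration of the idea (not the authors’ actual method): model the self-similar background with a local median, subtract it, and flag samples whose residual stands out from the remaining noise. A minimal 1-D sketch in Python:

```python
import statistics

def detect_anomalies(signal, window=5, threshold=3.0):
    """Flag indices whose residual (signal minus local median) exceeds
    `threshold` times the standard deviation of all residuals."""
    half = window // 2
    residuals = []
    for i, x in enumerate(signal):
        # the local median estimates the self-similar background
        neighborhood = signal[max(0, i - half):i + half + 1]
        residuals.append(x - statistics.median(neighborhood))
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * sd]

# Flat "texture" with a single defect at index 7
signal = [1.0] * 7 + [9.0] + [1.0] * 7
print(detect_anomalies(signal))  # → [7]
```

On a real image the same idea runs in 2-D, with a more sophisticated model of the self-similar content.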

# Real life testing of dynamic pricing model in e-commerce

**It’s no secret that a good pricing strategy is one of the most important aspects of any business. There are various ways to determine the prices at which you can maximize your overall profit. However, to maximize profit and engage price-sensitive customers at the same time, you have to make sure you are going in the right direction.**

**Dynamic pricing, based on real-time market changes, is the latest pricing trend dominating the e-commerce industry. Before deploying our dynamic pricing solution, we faced the problem of how to test the model’s performance and compare it to the client’s current solution.**

**As you may agree, adopting any pricing model without thorough testing would be extremely risky (especially when it controls thousands of products!).**

**The question was how to design an appropriate test that would help us validate the new pricing strategy and gain confidence in the change we were making.**

## Standard A/B Testing…

The basic idea behind A/B testing is to compare two variants: A (the currently used control version) and B (the modified test version). Customers are randomly split into two groups that should be as similar as possible; without being told, each customer is assigned to either the control group or the test group. The goal is to determine which variant performs better.
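In practice the random split is often made deterministic by hashing customer IDs, so the same customer always lands in the same group across visits without any stored state. A minimal sketch (the function and experiment names are illustrative, not our production code):

```python
import hashlib

def assign_variant(customer_id: str, experiment: str = "pricing-test") -> str:
    """Deterministically assign a customer to variant A (control) or B (test).

    Hashing the id keeps the assignment stable across visits and gives an
    approximately 50/50 split over many customers."""
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

groups = [assign_variant(f"customer-{i}") for i in range(10_000)]
print(groups.count("A"), groups.count("B"))  # roughly a 50/50 split
```

Salting the hash with the experiment name means a new experiment reshuffles customers independently of previous tests.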

# Highlights from the European Conference on Computer Vision 2018

**This year’s ECCV 2018 conference saw unprecedented growth of its community and brought to light the most recent advances in computer vision. As expected, the sessions were dominated by Deep Learning with Convolutional Neural Networks (CNNs).**

For those who couldn’t join, I picked a few interesting topics that caught my attention. Here is the list:

# Autonomous Driving

### Self-localization on-the-fly

One of the main topics at ECCV 2018 was autonomous driving. Can cameras compete against LIDAR? Can you detect and reconstruct cars as 3D objects from video? Check out some of ECCV’s challenges!

# The performance of Intel vs. Anaconda vs. vanilla Python – my personal benchmark

**Python programming is our daily bread. We develop frameworks that are then deployed on our customers’ infrastructure. In some cases there is an emphasis on performance, such as the recent case of a recommender engine that had to serve an individual recommendation in less than 30 ms. Faster computation might be helpful there, especially since using a specific Python distribution requires no changes to the underlying code.**

**Two weeks ago, Martin found that an Intel distribution for Python exists, so I decided to have a look. Intel claims that this distribution is faster in every way, and shares its benchmarks. So apart from running Intel’s own benchmark, I decided to test the distributions with my own benchmark to measure performance on typical tasks from a Data Science pipeline.**
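A micro-benchmark of this kind can be set up with the standard library’s `timeit`; the workload below is only an illustrative stand-in for the real pipeline steps, but the same harness runs unchanged under any of the three distributions:

```python
import timeit

def benchmark(stmt: str, setup: str, repeat: int = 5, number: int = 10) -> float:
    """Return the best wall-clock time (seconds) over several repeats.

    Taking the minimum reduces noise from other processes, which matters
    when comparing Python distributions on identical code."""
    return min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))

# Hypothetical workload: a dot product, typical of data-science pipelines
setup = "import random; a = [random.random() for _ in range(10_000)]; b = a[:]"
stmt = "sum(x * y for x, y in zip(a, b))"
print(f"best of 5: {benchmark(stmt, setup):.4f} s")
```

Running the identical script under each interpreter and comparing the printed times is the whole experiment.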

# Geolocated nearest neighbors in product campaign targeting

**The current landscape of IoT (Internet of Things), low-cost GNSS (satellite navigation) receivers, and omnipresent wireless networks produces large amounts of data containing geospatial information. This blog post introduces the basics of the geolocated k-nearest neighbors (k-NN) model and its applications in product campaign targeting.**
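To give a flavour of what the post covers, here is a minimal brute-force geolocated k-NN in Python, using the haversine great-circle distance (the store locations are made-up sample data):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def k_nearest(query, points, k=3):
    """Brute-force geolocated k-NN: the k points closest to `query`."""
    return sorted(points, key=lambda p: haversine_km(*query, *p[:2]))[:k]

# Hypothetical store locations as (lat, lon, name); the query is Prague centre
stores = [(50.08, 14.43, "Prague A"), (49.19, 16.61, "Brno"), (48.21, 16.37, "Vienna")]
print(k_nearest((50.09, 14.42), stores, k=2))
```

For production-sized point sets you would replace the brute-force sort with a spatial index such as a k-d tree or ball tree.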

# Infrastructure and Development for Data Science

**Coming from a classical IT background in software development, it took us a while to arrive at an architecture capable of fulfilling our needs for Data Science projects. Be aware that treating the two in a similar manner is not a good idea, as you might seriously lower the productivity of your Data Science team.**

# Automate your Machine Learning in Python – TPOT and Genetic Algorithms

**Automatic Machine Learning (AML) is a pipeline that enables you to automate the repetitive steps in your Machine Learning (ML) problems and thus save time to focus on the parts where your expertise has higher value. Better still, it is not just a vague idea – there are working packages that build on standard Python ML packages such as scikit-learn.**

**Anyone familiar with Machine Learning will most probably recall the term grid search in this context, and they will be entirely right to do so. AML is in fact an extension of grid search as applied in scikit-learn; however, instead of iterating over a predefined set of values and their combinations, it searches for optimal solutions across methods, features, transformations, and parameter values. An AML “grid search” therefore does not have to be an exhaustive search over the space of possible configurations. One great application of AML is a package called TPOT, which uses, for example, genetic algorithms to mix the individual parameters within a configuration and arrive at the optimal setting.**
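To build intuition for what such a genetic search does, here is a toy genetic algorithm over two made-up hyperparameters. The fitness function is a stand-in for a cross-validated model score (this is not TPOT’s actual code, just the underlying idea):

```python
import random

random.seed(0)

def fitness(params):
    """Hypothetical model score: peaks at depth=8, learning_rate=0.1."""
    depth, lr = params
    return -((depth - 8) ** 2) - 100 * (lr - 0.1) ** 2

def mutate(params):
    """Randomly nudge each 'gene' a little."""
    depth, lr = params
    return (max(1, depth + random.choice([-1, 0, 1])),
            max(0.01, lr + random.uniform(-0.02, 0.02)))

def crossover(a, b):
    return (a[0], b[1])  # take one gene from each parent

population = [(random.randint(1, 15), random.uniform(0.01, 0.5)) for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # selection: keep the fittest half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children

print(max(population, key=fitness))  # best candidate found
```

TPOT applies the same select–crossover–mutate loop, but its individuals are whole scikit-learn pipelines rather than two numbers.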

**In this post I will briefly present some basics of AML and then dive into applications using the TPOT package, including its genetic algorithm optimization.**

## Basic concepts

The basic concept is very simple: once we receive our raw data, we start with the standard ML pipeline.

# Monte Carlo Method in R (with worked examples)

**The Monte Carlo method is a handy tool for transforming problems of a probabilistic nature into deterministic computations using the law of large numbers. Imagine that you want to assess the future value of your investments and see the worst-case scenario for a given level of probability. Or that you want to plan your factory’s production, given the past daily performance of individual workers, to ensure that you will meet a tough delivery plan with high enough probability. For these and many more real-life tasks you can use the Monte Carlo method.**
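The post works its examples in R, but the idea fits in a few lines of any language. A Python sketch of the investment example, with made-up return parameters: simulate many random futures, then read the worst-case outcome straight off the empirical distribution:

```python
import random
import statistics

random.seed(42)

def simulate_final_value(initial=10_000, years=10, mean=0.07, sd=0.15):
    """One Monte Carlo path: compound random yearly returns."""
    value = initial
    for _ in range(years):
        value *= 1 + random.gauss(mean, sd)
    return value

runs = sorted(simulate_final_value() for _ in range(100_000))
worst_case_5pct = runs[int(0.05 * len(runs))]  # empirical 5th percentile
print(f"median outcome: {statistics.median(runs):,.0f}")
print(f"5% worst case: {worst_case_5pct:,.0f}")
```

By the law of large numbers, adding more simulation runs makes these empirical quantiles converge to the true ones.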

# Intuition vs Unsupervised Learning – Agglomerative Clustering in practice

**Clustering is a hugely important step of exploratory data analysis and has plenty of great applications. Typically, a clustering technique identifies different groups of observations in your data. For example, if you need to perform market segmentation, cluster analysis will help you label each segment so that you can evaluate its potential and target the attractive ones. Your marketing program and positioning strategy therefore rely heavily on this very fundamental step – grouping your observations into meaningful segments. There are many more use cases in computer science, biology, medicine, and social science. However, it often turns out to be quite difficult to define properly what a well-separated cluster looks like.**

Today, I will discuss some technical aspects of hierarchical cluster analysis, namely Agglomerative Clustering. One great advantage of this hierarchical approach is the fully automatic selection of the appropriate number of clusters – in a genuine unsupervised learning problem, we have no idea how many clusters we should look for! In my view, this clever clustering technique also resolves some of the ambiguity in the vague definition of a cluster and is thus well suited for automatic detection of such structures. Moreover, the agglomerative clustering process employs standard metrics for describing clustering quality, so it is fairly easy to observe what is going on.
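As a taste of the bottom-up idea, here is a stdlib-only single-linkage sketch on made-up 2-D data (a real project would use scipy or scikit-learn instead): every point starts as its own cluster, and the two closest clusters are merged repeatedly.

```python
import math

def agglomerative(points, n_clusters=2):
    """Bottom-up single-linkage clustering on 2-D points (illustrative only)."""
    clusters = [[p] for p in points]

    def dist(a, b):
        # single linkage: distance between the closest pair of members
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # find and merge the two closest clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

# Two hypothetical, well-separated customer groups (e.g. spend vs. visits)
data = [(1, 1), (1.2, 0.8), (0.9, 1.1), (5, 5), (5.1, 4.9), (4.8, 5.2)]
for cluster in agglomerative(data, n_clusters=2):
    print(cluster)
```

Recording the distance at each merge yields the dendrogram, which is exactly what lets the hierarchical approach pick the number of clusters automatically: a large jump in merge distance signals a natural cut. Continue reading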