At the beginning of November, my colleague Honza and I attended the Space Application Hackathon, where our team won the Earth observation category. Specifically, we worked on two use cases – classification of agricultural land use and crop yield prediction. Here is our write-up from the event.
Are you into cluster computing with Apache Spark? This year’s SAIS 2018 conference covered great data engineering and data science best practices for productionizing AI. In a nutshell: keep your training data fresh with stream processing, monitor quality, and test and serve models (at massive scale when talking about Spark). The conference also offered deep-dive sessions on Spark integration with popular machine learning frameworks such as TensorFlow, Scikit-learn, Keras, PyTorch, DeepLearning4j, BigDL and Deep Learning Pipelines.
Here is a list of several interesting topics (in case you couldn’t join ;-):
Spark Experience and Use Cases
A great talk about using Spark for HEP (high-energy physics) data processing and analysis as a complementary tool to the current grid computing at CERN.
“Imaging beyond imagination”
That is this year’s ICIP 2018 conference theme. I attended the world’s largest and most comprehensive technical conference focused on image and video processing and computer vision.
Here are my special picks:
Image Representation and Modeling
Reducing Anomaly Detection in Images to Detection in Noise
A smart approach to anomaly detection that removes the self-similar content of the image – ready to use for detecting material defects, tumors and more!
It’s no secret that a good pricing strategy is one of the most important aspects of every business. There are various ways to determine prices at which you can maximize your overall profit. However, to maximize profit and engage price-sensitive customers at the same time, you have to make sure you are going in the right direction.
Dynamic pricing, based on real-time market changes, is the latest pricing trend dominating the e-commerce industry. Before deploying our dynamic pricing solution, we faced the problem of how to test the model’s performance and compare it to the client’s current solution.
As you may agree, adopting any pricing model without thorough testing would be extremely risky (especially when it controls thousands of products!).
The question was how to design an appropriate test that would help us validate the new pricing strategy and gain confidence in the change we were making.
Standard A/B Testing…
The basic idea behind A/B testing is to compare two variants: A (the currently used control version) and B (the modified test version). Customers are typically split in half at random, with the two groups kept as similar as possible; the customers are not told whether they belong to the control group or the test group. The goal is to determine which variant performs better.
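The random-but-stable split described above can be sketched in a few lines. This is a minimal illustration, not our production assignment logic; hashing a (hypothetical) customer ID is one common way to get a roughly 50/50 split that stays consistent across a customer’s visits:

```python
import hashlib

def assign_group(customer_id: str) -> str:
    """Deterministically assign a customer to variant 'A' (control) or 'B' (test).

    Hashing the customer ID keeps the assignment stable across visits while
    splitting the population roughly in half at random.
    """
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Split 10,000 hypothetical customers and check that the halves are balanced.
groups = [assign_group(str(cid)) for cid in range(10_000)]
print("share of control group:", groups.count("A") / len(groups))
```

Because the assignment depends only on the ID, a returning customer always sees the same variant, which keeps the two groups cleanly separated for the duration of the test.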
This year’s ECCV 2018 conference saw unprecedented growth of the community and brought to light the most recent advances in computer vision. As expected, all the sessions were dominated by Deep Learning with Convolutional Neural Networks (CNNs).
For those who couldn’t join, I picked out a few interesting topics that caught my attention. Here is the list:
Python programming is our daily bread. We develop frameworks that are then deployed on our customers’ infrastructure. In some cases there is an emphasis on performance, such as a recent case with a recommender engine that had to serve an individual recommendation in less than 30 ms. Faster computation can help here, especially since using a specific distribution requires no changes to the underlying Python code.
Two weeks ago, Martin discovered that an Intel distribution for Python exists, so I decided to take a look. Intel claims that this distribution is faster in every way, and shares its benchmarks. Apart from running Intel’s own benchmark, I decided to test the distributions with my own benchmark to measure performance on cases typically performed in a Data Science pipeline.
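To give a flavour of such a comparison, here is a minimal benchmark sketch (the workloads and sizes are my own illustrative choices, not the actual benchmark from the post). The idea is to run the same script once under stock CPython and once under the Intel distribution and compare the timings:

```python
import time
import numpy as np

def best_time(fn, repeats=5):
    """Return the best wall-clock time in seconds over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical workloads resembling typical Data Science pipeline steps.
rng = np.random.default_rng(0)
a = rng.random((500, 500))

workloads = {
    "matmul": lambda: a @ a,
    "svd": lambda: np.linalg.svd(a, compute_uv=False),
    "sort": lambda: np.sort(a, axis=None),
}
results = {name: best_time(fn) for name, fn in workloads.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")
```

Taking the best of several repeats filters out one-off noise from the OS scheduler, which matters when the difference between distributions is in the tens of percent.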
The current landscape of IoT (Internet of Things) devices, low-cost GNSS (satellite navigation) receivers, and omnipresent wireless networks produces large amounts of data containing geospatial information. This blogpost introduces the basics of the geolocated k-nearest neighbors (k-NN) model and its applications in product campaign targeting.
This year we started to work on advanced analytical projects in manufacturing. The boom of IoT sensors, the never-ending pressure to increase yields and output quality, the decreasing marginal effect of lean and Six Sigma activities, and the big analytics trend meant that we quickly ran out of our existing capacity. The projects are intriguing, the data are large, we are fun to work with, and the demand is enormous. Honestly, I don’t see any reason not to join us!
I’m a data scientist, not a public speaker, so when the Keboola guys asked me to do a talk with them in London, I was excited. We chose transactional data analysis as the topic for two main reasons – first, transactional data can be used to solve so many business issues, and second, they are everywhere.
A great opportunity to meet Adam and learn about real-life applications of transactional data analytics. You can sign up here. Thanks to Keboola for organizing!
A short preview is in the video below (content starts at 5:19). The live event will be much better, though 🙂