
Studied Mathematics, graduated in Cryptanalysis, working as a Data Scientist. Interested in algorithms, probability theory, and machine learning. Python user.

Giving an intuition for why Bagging algorithms actually work and demonstrating their effects in an easy and approachable way

There exists a vast number of great articles describing how Bagging methods like Random Forests work on an algorithmic level and why Bagging is a good thing to do. Usually, the essence is the following:

“You train a lot of Decision Trees on different parts of the training set and average their predictions into a final prediction. The prediction gets better, because the variance of the Random Forest is smaller compared to the variance of a single Decision Tree. (dartboard.png)”

— some article

Of course, I am paraphrasing here. The articles include great pictures, code, and many more thoughts. But…
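To make the variance argument tangible, here is a minimal sketch comparing a single decision tree with a bagged ensemble of trees. The toy data and hyperparameters are purely illustrative and not taken from any of those articles:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Toy regression data, purely illustrative.
X, y = make_regression(n_samples=1000, n_features=10, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single, fully grown decision tree vs. an average of 100 bagged trees.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
bagging = BaggingRegressor(
    DecisionTreeRegressor(), n_estimators=100, random_state=0
).fit(X_train, y_train)

# Averaging many trees' predictions typically lowers the test error,
# reflecting the reduced variance of the ensemble.
print("single tree :", mean_squared_error(y_test, tree.predict(X_test)))
print("bagged trees:", mean_squared_error(y_test, bagging.predict(X_test)))
```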


Recognize linear programming problems and solve them in Python with CVXPY

Motivation

Imagine that, for whatever reason, you want to do a diet consisting of apples and strawberries only. You don’t really favor one fruit over the other, but you want to make sure that you…

  1. get enough vitamin C and
  2. get enough calories.

In addition to containing different amounts of vitamin C and calories, apples and strawberries also come with different price tags. So it’s natural to ask for the cheapest diet that fulfills both of your constraints.

Numerical example

Let’s say that you want to consume at least 300 mg of vitamin C (which is way too much, but it…
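To give a flavor of how such a diet problem turns into a linear program, here is a minimal CVXPY sketch. Only the 300 mg vitamin C requirement comes from the text; the calorie target, nutrition values, and prices are made up for illustration:

```python
import cvxpy as cp

# Decision variables: kilograms of apples and strawberries to buy.
apples = cp.Variable(nonneg=True)
strawberries = cp.Variable(nonneg=True)

# Hypothetical data per kilogram (vitamin C in mg, energy in kcal, price in €).
vitamin_c = 40 * apples + 600 * strawberries
calories = 520 * apples + 320 * strawberries
cost = 2.0 * apples + 6.0 * strawberries

# Minimize the cost subject to the two nutritional constraints.
problem = cp.Problem(
    cp.Minimize(cost),
    [vitamin_c >= 300, calories >= 2000],
)
problem.solve()

print("apples      :", apples.value)
print("strawberries:", strawberries.value)
print("total cost  :", problem.value)
```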


Hands-on Tutorials

Ditch the p-values and embrace more intuitive probabilities

From time to time, you have to choose between two options. This might be in completely uninformed situations where you can pick the better option with a mere probability of 50%. In some of these uninformed cases, you can even boost this probability with a simple trick, as demonstrated in my other article.

However, usually you are able to gather some information that helps you pick the better option with high probability. One easy yet smart method to do this is A/B testing, which you have probably heard of or even used already.
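As a small taste of the probability-based view, here is a minimal sketch of a Bayesian A/B comparison of two conversion rates. The click and impression counts are invented, and the Beta-Binomial model is one common choice rather than necessarily the exact setup used later:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical results: clicks out of impressions for variants A and B.
clicks_a, views_a = 130, 1000
clicks_b, views_b = 160, 1000

# With a flat Beta(1, 1) prior, the posterior of each conversion rate is
# Beta(1 + clicks, 1 + non-clicks); we draw samples from both posteriors.
samples_a = rng.beta(1 + clicks_a, 1 + views_a - clicks_a, size=100_000)
samples_b = rng.beta(1 + clicks_b, 1 + views_b - clicks_b, size=100_000)

# Probability that variant B has the higher conversion rate.
print((samples_b > samples_a).mean())
```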


Hands-on Tutorials

Teach your regressor how to output zero

Zero-Inflated Data

When working on regression problems, you often have target values that are continuously and evenly distributed in some range. Let me illustrate what I mean by this. Consider the following 2-dimensional dataset:
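To preview one common way of handling such zero-inflated targets (a sketch of the general two-part idea, not necessarily the exact model built later): combine a classifier that decides whether the target is zero with a regressor that is fitted only on the non-zero samples.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import LinearRegression, LogisticRegression


class ZeroInflatedRegressor(BaseEstimator, RegressorMixin):
    """First decide 'zero or not', then predict the magnitude of the non-zeros."""

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        nonzero = y != 0
        self.classifier_ = LogisticRegression().fit(X, nonzero)
        self.regressor_ = LinearRegression().fit(X[nonzero], y[nonzero])
        return self

    def predict(self, X):
        X = np.asarray(X)
        predictions = np.zeros(len(X))
        nonzero = self.classifier_.predict(X).astype(bool)
        if nonzero.any():
            # Only samples classified as non-zero get a regression value.
            predictions[nonzero] = self.regressor_.predict(X[nonzero])
        return predictions
```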


Model Interpretability

As accurate as gradient boosting, as interpretable as linear regression.

The Interpretability-Accuracy Trade-Off

In the machine learning community, I often hear and read about the notions of interpretability and accuracy, and how there is a trade-off between them. Usually, it is depicted somewhat like this:


Making Sense of Big Data

Be smart and do not manually label hundreds or even thousands of data points yourself.

Introduction

A classic task for us data scientists is building a classification model for some problem. In a perfect world, data samples — including their corresponding labels — are handed to us on a silver platter. We then apply our machine learning tricks and mathemagic to derive useful insights from the data. So far so good.

However, what often happens in our imperfect yet beautiful world is one of the following:

  1. We get an extremely small dataset that is at least completely labeled. In this case, building a model can be extremely tricky. We have to…


Two envelopes with different amounts of money in them. Choose the better one with a higher chance than fifty-fifty!

In this article, I want to introduce you to a simple problem with an easy-to-apply, yet awfully unintuitive solution. It is a kind of envelope problem and goes like this:

There are two envelopes with some different amounts x and y of money in them. The envelopes look exactly the same and are randomly shuffled before they reach your hands.
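One well-known strategy that achieves a winning chance above fifty-fifty is the randomized-threshold trick: open one envelope, draw a random threshold, and keep that envelope only if its amount beats the threshold. The simulation below is a sketch with made-up amount and threshold distributions, which may differ from how the article presents it:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, wins = 100_000, 0

for _ in range(n_trials):
    # Two distinct, hypothetical amounts of money; the strategy does not
    # require knowing their distribution.
    x, y = rng.uniform(0, 100, size=2)
    envelopes = rng.permutation([x, y])

    observed = envelopes[0]                # open the first envelope
    threshold = rng.exponential(scale=50)  # any density positive on (0, inf) works
    keep_first = observed > threshold      # keep only if the amount beats the threshold

    chosen = envelopes[0] if keep_first else envelopes[1]
    wins += chosen == max(x, y)

print(wins / n_trials)  # noticeably above 0.5
```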


Learn about recursion and merge sort with code in Python

Recursion is an important concept in mathematics and computer science that comes in many flavors. The essence of all of them is the following:

There is an object that consists of smaller versions of itself. Usually there is a smallest, atomic object — this is where the recursion ends.

We are especially interested in solving problems using recursions. For example, sorting numbers or other elements, i.e. turning an input array like
[1, 4, 5, 2, 6, 3, 6] into [1, 2, 3, 4, 5, 6, 6] .

This is a fundamental problem in computer science and has been extensively studied by many…
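To make the idea concrete, here is a minimal recursive merge sort in Python; it is a sketch of the standard algorithm rather than necessarily the exact code developed later:

```python
def merge_sort(arr):
    """Sort a list by recursively sorting both halves and merging them."""
    if len(arr) <= 1:  # atomic case: a list with at most one element is sorted
        return arr

    mid = len(arr) // 2
    left = merge_sort(arr[:mid])   # smaller version of the same problem
    right = merge_sort(arr[mid:])

    # Merge the two sorted halves into one sorted list.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([1, 4, 5, 2, 6, 3, 6]))  # [1, 2, 3, 4, 5, 6, 6]
```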


Maybe the most interesting proof method

Introduction

In mathematics, there are thousands of theorems to be proven. Often, we tailor a unique proof to each of these theorems — this can be beautiful, but extremely difficult at the same time. Think about proofs of theorems that involve constructing a desired object.

As a small example, consider the following “theorem”:


Learn how the k-Nearest Neighbors Classifier works and implement it in Python

Another day, another classic algorithm: k-nearest neighbors. Like the naive Bayes classifier, it is a rather simple method for solving classification problems. The algorithm is intuitive and has an unbeatable training time, which makes it a great candidate to learn when you are just starting your machine learning career. Having said this, making predictions is painfully slow, especially for large datasets. The performance on datasets with many features might also not be overwhelming, due to the curse of dimensionality.

In this article, you will learn

  • how the k-nearest neighbors classifier works
  • why it was designed like this
  • why it has these…
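As a preview of the first point, here is a minimal from-scratch sketch of the classifier; the variable names and toy data are mine, not from the article:

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(y_train[nearest])                    # count their labels
    return votes.most_common(1)[0][0]                    # most frequent label wins


# Tiny toy example with two clusters.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9])))  # -> 1
```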

Dr. Robert Kübler
