Getting Started

When to apply which dimensionality reduction technique

Photo by KERBSTONE on Pixabay and edited by me :D

Introduction

In my last blog post about Gap-Statistics, I explained that data usually has a lot of features (hundreds or sometimes even thousands). We consider a dataset with 100 features to be 100-dimensional, so each feature represents one dimension of the dataset. That’s far too many dimensions to visualize. We can’t just cut away all features except for three, can we? Of course not! One exception would be if 98 of the 100 features are highly correlated with each other (a heatmap reveals this): we could then omit 97 of them and keep only a single one. We would end up with a three-dimensional dataset. …
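
To make the correlation argument concrete, here is a minimal sketch and not code from the post itself. The function name drop_correlated, the 0.95 threshold, and the synthetic data are all assumptions chosen for illustration. It builds 98 near-duplicate features plus 2 independent ones and keeps only one representative of each correlated group.

```python
import numpy as np


def drop_correlated(X, threshold=0.95):
    """Greedily pick column indices so that no kept pair is highly correlated.

    The threshold of 0.95 is an illustrative assumption, not a recommended value.
    """
    # Absolute Pearson correlation between all pairs of columns
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[1]):
        # Keep column j only if it is not highly correlated with a kept column
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


# Synthetic data (assumption): 98 noisy copies of one signal + 2 independent features
rng = np.random.default_rng(0)
n_samples = 500
base = rng.normal(size=n_samples)
correlated = base[:, None] + 0.01 * rng.normal(size=(n_samples, 98))
independent = rng.normal(size=(n_samples, 2))
X = np.hstack([correlated, independent])

kept = drop_correlated(X)
X_reduced = X[:, kept]
print(X.shape, "->", X_reduced.shape)
```

On real data the correlated group is rarely this clean, so the threshold and the choice of which feature to keep deserve a closer look than this sketch gives them.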


Closing knowledge Gaps with Gap-Statistics

Photo by Suserl just me on Freeimages and edited by me

A lot of code runs under the hood. That’s why I link my GitHub repository at the end of this post and show only a small part of the K-Means code here.

Introduction

Clustering is an important technique in Pattern Analysis for identifying distinct groups in data. Because most data has more than three dimensions, we apply dimensionality reduction methods like PCA or Laplacian Eigenmaps before applying a clustering technique. The data is then available in 2D or 3D, which lets us visualize the clusters we find in a way humans can easily read. …
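
The reduce-then-cluster pipeline described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation from the post or its repository: the helper names pca_project and kmeans, the synthetic two-blob data, and the deterministic farthest-first initialization (used here instead of the usual random start) are all my own choices.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project X onto its top principal components (PCA via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm with a deterministic farthest-first start."""
    # Assumption: start at the first point, then add the point farthest
    # from all centers chosen so far (simpler than a random initialization)
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign every point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic data (assumption): two well-separated blobs in 10 dimensions
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, size=(100, 10))
blob_b = rng.normal(loc=10.0, size=(100, 10))
X = np.vstack([blob_a, blob_b])

X_2d = pca_project(X)                 # 10D -> 2D, ready for a scatter plot
labels, centers = kmeans(X_2d, k=2)   # cluster in the reduced space
print(X_2d.shape, np.bincount(labels))
```

The 2D coordinates in X_2d can be passed straight to a scatter plot colored by labels, which is the kind of visualization the paragraph above refers to.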


Implementation in Python with hmmlearn

Photo by TheDigitalArtist on Pixabay

Digital signatures are on the rise. Since many of us now work from home, a lot of confidential company e-mails need to be signed online.
Ian Goodfellow’s invention of Generative Adversarial Networks (GANs) showed how easy it is nowadays to generate fake handwritten digits that resemble those in the MNIST dataset. From there it is only a tiny step to generating imitated signatures in the handwriting style of any person. But isn’t that dangerous?

Can we use Machine Learning to distinguish between an original and an artificially crafted signature? Indeed we can! We don’t even need one of those fancy neural network approaches; we can go totally classic with Hidden Markov Models (HMM). …


What to and what not to expect from a Hackathon

Introduction

Last month I participated in my first Hackathon, after a random promotional e-mail from my university advertised a very cool-sounding one. I clicked on it and saw that they were looking for teams with different skills, including Data Scientists. Awesome, I thought, I’m in!

I signed up, applied and got accepted. Bingo!

A little bit about my background could be helpful. I am currently in the first year of my Master’s in Computer Science in Germany, and I work part-time as a Machine Learning Engineer. …


How can the effectiveness of marketing be improved?

This blog post shows how Airbnb’s marketing effectiveness can be improved by analyzing a dataset from 2016. To improve the marketing, the four Ps of the marketing mix should be addressed. The dataset contains listings of rented apartments and their attributes.

Marketing Mix: The four Ps

Product

Marketing is about fulfilling customer needs and expectations. Within product policy, the aim is to understand one’s market and figure out which needs and wants the customers have. In general, the main need of travelers is to find accommodation, but nowadays it is less about finding any accommodation and more about discovering the right one. …


When I enrolled in the Udacity Data Science Nanodegree, I didn’t know where my journey would end. That time has now arrived: I chose the dataset provided by Arvato for the final project to graduate from the Nanodegree.

The dataset consists of four files:

  • Azdias (891221 rows, 366 columns)
  • Customers (191652 rows, 369 columns)
  • Mailout Train (42962 rows, 367 columns)
  • Mailout Test ()

Project Definition

This project is provided by Arvato Bertelsmann, a supply chain management solutions company located in Germany: https://www.kaggle.com/c/udacity-arvato-identify-customers

The project is about finding the people in the German population who are most likely to respond to a targeted marketing campaign. …

About

Tim Löhr

Machine Learning Engineer @Siemens and Computer Science Master Student @University of Erlangen-Nürnberg. https://www.linkedin.com/in/tim-l%C3%B6hr-821ba8188/
