
Customer Segmentation Report for Arvato Financial Services

Project Definition

Problem Statement

Metric

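Model quality in the final Kaggle competition is judged with the area under the ROC curve (ROC AUC), a threshold-independent score that also behaves well when the positive class is rare. A minimal sketch of computing it with scikit-learn (the label and score arrays are illustrative placeholders):

from sklearn.metrics import roc_auc_score

# Illustrative placeholder values: true binary labels and the model's
# predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

# ROC AUC: probability that a random positive is ranked above a random negative
print(roc_auc_score(y_true, y_score))  # 0.888...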

Methodology

Data

Missing Features

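A minimal sketch of how such a missing-value analysis can be reproduced with pandas (the file name and the 30% drop threshold are assumptions for illustration):

import pandas as pd

# Load the general population data (assumed file name, ';'-separated)
azdias = pd.read_csv('Udacity_AZDIAS_052018.csv', sep=';')

# Share of missing values per feature, highest first
missing_ratio = azdias.isnull().mean().sort_values(ascending=False)
print(missing_ratio.head(10))

# Drop features that are missing in more than 30% of the rows
# (the threshold is an illustrative assumption)
azdias = azdias.drop(columns=missing_ratio[missing_ratio > 0.3].index)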

Features Description

[Image: The three CUSTOMERS-specific features]
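The CUSTOMERS table carries three columns that do not exist in the general population data, so they have to be set aside before the two datasets can be compared. A short sketch, assuming this project's column names (CUSTOMER_GROUP, ONLINE_PURCHASE, PRODUCT_GROUP) and DataFrames named customers and azdias:

# The three features that only exist in the CUSTOMERS dataset
customer_only = ['CUSTOMER_GROUP', 'ONLINE_PURCHASE', 'PRODUCT_GROUP']

# Keep them for the customer-behaviour analysis, then drop them so both
# datasets share the same feature space
customer_extra = customers[customer_only].copy()
customers = customers.drop(columns=customer_only)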

Customer Segmentation Report

Person Topic

Household Topic

125m x 125m Grid Topic

Specific Customer Behaviour

Heatmaps to evaluate the Correlation

Finding Correlations

Find differences in the Datasets

Heatmap for the Customer dataset

[Image: Customer dataset]

Heatmap for the Azdias (general population) dataset

[Image: Azdias dataset]
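Both heatmaps can be produced the same way; a minimal sketch with pandas and seaborn, assuming cleaned DataFrames named customers and azdias:

import matplotlib.pyplot as plt
import seaborn as sns

def correlation_heatmap(df, title):
    # Pearson correlation across all numeric features
    corr = df.select_dtypes('number').corr()
    plt.figure(figsize=(12, 10))
    sns.heatmap(corr, cmap='coolwarm', center=0)
    plt.title(title)
    plt.show()

correlation_heatmap(customers, 'Customer dataset')
correlation_heatmap(azdias, 'Azdias dataset')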

Customer Finance Features


Azdias Finance Features


Customer Mind Affinity Features


Azdias Mind Affinity Features

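The bar charts in these four sections contrast how the finance and mind-affinity features are distributed in the two populations. A minimal sketch of such a side-by-side comparison for a single feature (FINANZ_SPARER, the "money saver" typology, is just an example pick):

import matplotlib.pyplot as plt

feature = 'FINANZ_SPARER'  # example finance feature ("money saver")

# Normalized value counts make the differently sized datasets comparable
cust = customers[feature].value_counts(normalize=True).sort_index()
pop = azdias[feature].value_counts(normalize=True).sort_index()

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
cust.plot.bar(ax=axes[0], title='Customers')
pop.plot.bar(ax=axes[1], title='Azdias')
fig.suptitle(feature)
plt.show()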

So, the typical Arvato customer is…

The typical Arvato customer is between 60 and 70 years old. They are financially very stable and have saved a considerable amount of money. They made provisions for the old age they are now in, because they grew up in a time when saving money was essential and pure consumption was not the norm. They earned good money from the golden 1970s through the 1990s and have now discovered online shopping. Since 91% of all purchases were made online, the customers are still young enough to handle a computer well. Most customers have an affinity for traditional thinking, are rather rational, and are probably more academically educated than average. For these reasons, the most important distinguishing features are the finance and mind-affinity attributes compared above.

Supervised Learning Model

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

# Pipeline with a single, swappable estimator step
pipe = Pipeline([('classifier', RandomForestClassifier())])

# Create space of candidate learning algorithms and their hyperparameters
search_space = [{'classifier': [LogisticRegression()],
                 'classifier__penalty': ['l2'],
                 'classifier__C': np.logspace(0, 5)},
                {'classifier': [RandomForestClassifier()],
                 'classifier__n_estimators': [10, 100],
                 'classifier__max_features': [1, 3]},
                {'classifier': [GradientBoostingRegressor(random_state=42)],
                 'classifier__n_estimators': [50, 100, 200],
                 'classifier__min_samples_split': [2, 3, 4]}]
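The search space is then handed to a cross-validated grid search, which fits every candidate and keeps the best one; a minimal usage sketch, assuming X_train and y_train hold the prepared training data:

from sklearn.model_selection import GridSearchCV

# 5-fold cross-validated search over all candidate models.
# Note: with a regressor among the candidates, the default score mixes
# accuracy (classifiers) with R^2 (regressor), so it is only a rough ranking.
grid = GridSearchCV(pipe, search_space, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)

# Winning estimator; here the tuned GradientBoostingRegressor shown below
print(grid.best_estimator_.get_params()['classifier'])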

Classifier Parametrization

GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
                          init=None, learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=50,
                          n_iter_no_change=None, presort='deprecated',
                          random_state=42, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)

Kaggle Competition

Conclusion

Written by

Machine Learning Engineer @Siemens and Computer Science Master Student @University of Erlangen-Nürnberg. https://www.linkedin.com/in/tim-l%C3%B6hr-821ba8188/
