Biometric Signal Verification of Handwriting with Hidden Markov Models

Implementation in Python with hmmlearn

Tim Loehr
Towards Data Science

--

Digital signatures are on the rise. Since many of us are now working from home, a lot of confidential company emails need to be signed online.
Ian Goodfellow's invention of Generative Adversarial Networks (GANs) showed how easy it has become to generate fake digits on the MNIST dataset. From there it is only a tiny step to generating imitated signatures in the handwriting style of any person. But isn't that dangerous?

Can we distinguish with Machine Learning between an original and an artificially crafted signature? Indeed we can! We don't even need one of those fancy neural network approaches; we can go totally classic with Hidden Markov Models (HMMs). In this post I will show how we can use HMMs to classify whether a signature is original or imitated.

This project is loosely inspired by the 2007 paper by Julian Fierrez et al., HMM-Based On-Line Signature Verification [1].

Introduction

Even though Hidden Markov Models are not state of the art, as the 2007 publication date of the paper above already suggests, they are still a fundamental concept every Data Scientist should at least have heard of. Understanding how an HMM works can be enlightening when you want to understand more recent technologies like Recurrent Neural Networks, because many techniques have evolved out of the HMM's basic idea.

Hidden Markov Models

Hidden Markov Models can include time dependency in their computations. In Figure 1 below we can see that from each hidden state (Rainy, Sunny) we can transition to either Rainy or Sunny at the next time step, and each state has a certain probability of emitting one of the three possible observations (Walk, Shop, Clean) at every time step. The start probability (60% for Rainy and 40% for Sunny) also needs to be provided to start the computational chain.

Figure 1 from Wikipedia: Hidden Markov Model

In our case this means that a signature is written from left to right, one letter after another. So the time dependency involves the speed, pressure, and coordinates of the pen moving around to form a letter. All of this information can be taken into account to estimate how likely it is that a signature was written by the original person or that it was faked.

Figure 1 suggests that we need prior knowledge about three quantities:

  • the start probability vector (Rainy 60%, Sunny 40%)
  • the transition matrix (e.g. from Rainy to Rainy 70%)
  • the emission matrix (from Rainy to either Walk 10%, Shop 40% or Clean 50%)

Notice that the start probabilities sum up to 100%, and that each row of the transition and emission matrices also sums up to 100%.
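To make this concrete, here is how the three quantities from Figure 1 can be written down as plain NumPy arrays. The numbers are the ones from the weather example (the values not explicitly quoted above are filled in from the Wikipedia figure), and the variable names are mine:

```python
import numpy as np

# Hidden states: 0 = Rainy, 1 = Sunny; observations: 0 = Walk, 1 = Shop, 2 = Clean
startprob = np.array([0.6, 0.4])        # start probability vector

transmat = np.array([[0.7, 0.3],        # Rainy -> Rainy / Sunny
                     [0.4, 0.6]])       # Sunny -> Rainy / Sunny

emission = np.array([[0.1, 0.4, 0.5],   # Rainy -> Walk / Shop / Clean
                     [0.6, 0.3, 0.1]])  # Sunny -> Walk / Shop / Clean

# Sanity check: the start vector and every matrix row must sum to 1 (100%)
assert np.isclose(startprob.sum(), 1.0)
assert np.allclose(transmat.sum(axis=1), 1.0)
assert np.allclose(emission.sum(axis=1), 1.0)
```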

Gaussian Mixture Models (GMM)

First we need time-dependent data and compute features from it; then we can fit a GMM to find the parameters of our HMM. Once we have the parameters of the HMM, we can compute scores for the original and the imitated signatures. Based on these scores, we can estimate how likely it is that a signature is genuine or not.

Why should we even use a Gaussian Mixture Model for training the HMM parameters in the first place? Well, basically hmmlearn offers us three possibilities to choose from:

  • GMM-HMM
  • Gaussian HMM
  • Multinomial HMM

The GMM simply works best here, because the density of our signature features is most likely not a single Gaussian or any other single distribution; the ground truth is multi-modal, which a GMM can approximate far better.
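For reference, these three options correspond to the following classes in hmmlearn (shown only to illustrate where they live; we will only use the GMM variant later, and depending on your hmmlearn version the discrete model may be named or behave slightly differently):

```python
from hmmlearn.hmm import GMMHMM, GaussianHMM, MultinomialHMM

# Each state emits from a mixture of Gaussians (our choice for the signatures)
gmm_hmm = GMMHMM(n_components=2, n_mix=3, covariance_type="diag")

# Each state emits from a single Gaussian
gauss_hmm = GaussianHMM(n_components=2, covariance_type="diag")

# Each state emits one of a finite set of symbols (like Walk / Shop / Clean)
disc_hmm = MultinomialHMM(n_components=2)
```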

We can fit GMMs with the so-called EM (Expectation-Maximization) algorithm, which computes those hidden parameters for us automatically. What does the EM algorithm expect in order to compute the hidden parameters?

The answer is: features.

The GMM uses the EM Algorithm to compute its parameters:

  • E-step: Compute the component occupation probabilities P(M|x) using the current estimates of the GMM parameters (means, variances)
  • M-step: Compute the GMM parameters using the current estimates of the component occupation probabilities.
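To see what EM hands back, here is a tiny illustration with scikit-learn's GaussianMixture, which runs exactly this E/M loop internally (the bimodal toy data is made up for the example):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data with two "bumps", i.e. a density that a single Gaussian cannot capture
x = np.concatenate([rng.normal(-2.0, 0.5, 500),
                    rng.normal(3.0, 1.0, 500)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)

print(gmm.weights_)      # mixing proportions estimated by EM
print(gmm.means_)        # component means
print(gmm.covariances_)  # component variances
```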

On top of that, we have to choose the hyperparameters of the GMM-HMM ourselves, which are:

  • M: the number of Gaussian mixture components
  • S: the number of HMM states (like Rainy and Sunny above)

In reality, we can't know what the best set of hyperparameters is, but we could, for example, use k-fold cross-validation to find out.

Wait, what? Hyperparameters, HMM parameters?

Ok, hyperparameters are values that we as humans need to choose. This is also called the model selection problem. The model parameters are the parameters that the GMM computes for us with the EM algorithm.

Dataset

For measuring those kinds of features (pressure, speed, etc.), we need a device like the one shown in the left-most picture below:

Figure 1 by [1]: Architecture of the proposed on-line signature verification system.

I am using the MOBISIG dataset. It contains 83 people in total (49 male, 34 female), which we will later split into a train, test, and imitation set. The signatures were collected by Margit Antal et al. and published in 2017 in their research paper: Online Signature Verification on MOBISIG Finger-Drawn Signature Corpus [2].

This paper provides one example signature.

Figure 2 by [2]: Pseudosignatures from the database. The first column contains genuine signatures and the second column contains forgeries.

It can be seen that the forged signature contains more densely connected pressure points. The paper explains how the forgery was created; you can read it if you want, but this post focuses on how we can detect a forgery, not on how it was generated.

Package Import

I am going to use the hmmlearn package for Python, so I don't have to reinvent the wheel and build HMMs from scratch.

Let’s start by importing the default packages for this project. I set the random seed, so you will have exactly the same plots as I have.
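The import cell itself is embedded as a gist in the original article; a minimal equivalent (the exact package list is my assumption) looks roughly like this:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from hmmlearn.hmm import GMMHMM

np.random.seed(42)  # fix the seed so the plots are reproducible
```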

File Loading

We also want to make sure to load the dataset properly, as can be seen in the code snippet above. The data is stored in dictionaries, where each person has two entries ("original" and "imitated"), which are further split into multiple signatures each. Each signature has a few hundred to a few thousand rows, indexed by a timestamp id that progresses in milliseconds.
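The loading code is also in an embedded gist; the sketch below only illustrates the idea, and the folder layout, file names, and column names are placeholders rather than the real MOBISIG structure:

```python
import os
import pandas as pd

def load_signatures(root_dir):
    """Build a dict: person -> {"original": [...], "imitated": [...]},
    where every list entry is one signature as a DataFrame of pen samples."""
    data = {}
    for person in sorted(os.listdir(root_dir)):
        person_dir = os.path.join(root_dir, person)
        if not os.path.isdir(person_dir):
            continue
        data[person] = {"original": [], "imitated": []}
        for fname in sorted(os.listdir(person_dir)):
            if not fname.endswith(".csv"):
                continue
            df = pd.read_csv(os.path.join(person_dir, fname))
            # Placeholder convention: forged signatures are marked in the file name
            key = "imitated" if "forg" in fname.lower() else "original"
            data[person][key].append(df)
    return data
```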

Feature Extraction

Looking back at Figure 1 (the system architecture), the first step is feature extraction. Fitting Gaussians, especially Gaussian mixtures, is far more effective and efficient if the data points are zero-centered and have a normalized standard deviation. From paper [1] we can obtain the following normalization step:

Figure 3, Step 1: Position Normalisation

Next, we extract four features after the normalization step. The formulas (Step 2) are obtained from [3]:

Step 2: Feature Extraction

The normalization from Step 1 is implemented in the first function of the code (lines 1–2). The formula in Figure 3 just means summing over all x's and y's and dividing by the number of samples N, i.e. taking the mean. Furthermore, a tiny epsilon (from np.finfo) is added in several places (lines 2, 21, 22, 24, …) to protect against division-by-zero errors.

The four features from Step 2 are straightforward to compute. The dot over x and y indicates that we use the normed x and y we just computed. Apart from that, theta is calculated exactly as the formula suggests in line 24 of the code. The path velocity is calculated in line 27 in the same way; it is just a square root!

Special attention goes to the log curvature radius, because if you look closely, you can see that the authors want theta to be normalized as well. Therefore, in line 30, where ρ (rho) is computed, we also call the norm function on θ (theta).

The last feature I called alpha in the code, because it sounds nicer than just a. It is calculated as straightforwardly as the rest, in line 33 of the code.

I appended all of the features to their corresponding lists, because we need them in that form to feed the features into the hidden Markov learning function.
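My own compact reading of Steps 1 and 2 is sketched below. It is not the gist from the article: I read the dotted quantities in [3] as discrete time derivatives of the normalized coordinates (via np.gradient), and eps guards every division, as described above:

```python
import numpy as np

eps = np.finfo(np.float64).eps  # tiny constant to protect against division by zero

def norm(v):
    """Step 1: zero-center and scale to unit standard deviation."""
    return (v - np.mean(v)) / (np.std(v) + eps)

def extract_features(x, y):
    """Step 2: path-tangent angle, path velocity, log curvature radius, acceleration."""
    xn, yn = norm(x), norm(y)
    dx, dy = np.gradient(xn), np.gradient(yn)

    theta = np.arctan2(dy, dx)                        # path-tangent angle
    v = np.sqrt(dx ** 2 + dy ** 2)                    # path velocity ("just the square root")
    dtheta = np.gradient(norm(theta))                 # theta is normalized as well
    rho = np.log((v + eps) / (np.abs(dtheta) + eps))  # log curvature radius
    dv = np.gradient(v)
    alpha = np.sqrt(dv ** 2 + (v * dtheta) ** 2)      # total acceleration, "a" in [3]

    return theta, v, rho, alpha
```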

So, we are done with the feature extraction step. Now let's build the model.
Oh no!
First we need to split the data into train, test, and imitation datasets. Of the people mentioned above, I am using the first 25 as training examples, roughly the minimum needed to prevent overfitting; one entry corresponds to multiple signatures by one person. So the trainset contains 25 different people with their corresponding genuine and fake signatures. The testset contains another 20 different people, and the validation set (X_val) holds all the imitated signatures.

X_train: (3940, 8)
X_test: (3140, 8)
X_val: (10338, 8)

Since, as I mentioned, the number of rows differs from signature to signature, I always needed to keep track of the current length and append it to the new data frame. That was quite a hassle to program properly. In line 51, I renamed the features to the names from the feature extraction step.

It is important to note that I am calling the feature extraction function in lines 47–49 to compute the features.
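In sketch form, the bookkeeping could look like this (the column names and helper functions come from my earlier sketches, not from the original gist):

```python
import pandas as pd

def build_set(people, data, key="original"):
    """Stack the signatures of the given people into one DataFrame and record
    each signature's length, because hmmlearn needs per-sequence lengths later."""
    frames, lengths = [], []
    for person in people:
        for sig in data[person][key]:
            theta, v, rho, alpha = extract_features(sig["x"].values, sig["y"].values)
            df = pd.DataFrame({
                "timestamp": sig["timestamp"].values,
                "x": sig["x"].values,
                "y": sig["y"].values,
                "pressure": sig["pressure"].values,
                "theta": theta, "v": v, "rho": rho, "alpha": alpha,
            })
            frames.append(df)
            lengths.append(len(df))
    return pd.concat(frames, ignore_index=True), lengths

people = sorted(data)  # assuming `data` comes from the loading sketch above
X_train, len_train = build_set(people[:25], data)
X_test, len_test = build_set(people[25:45], data)
X_val, len_val = build_set(people, data, key="imitated")
```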

Build the GMM-HMM Model

To train the GMM-HMM, we need to have a look at the hmmlearn documentation.

We need two essential variables:

  • transmat (the transition matrix between the hidden states, e.g. Rainy and Sunny)
  • startprob (the initial probability vector, e.g. 60%/40%, with one entry per HMM state S)

The transition matrix is parameterized such that its diagonal is filled with a 50% probability (0.5), except for the first and the last entry; this is suggested in the hmmlearn documentation. For the startprob vector, I chose hmmlearn's default initialization.
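One plausible reading of that initialization for S HMM states is sketched below; the exact values in the original gist may differ:

```python
import numpy as np

def init_hmm_matrices(n_states):
    """Left-to-right style initialization: each state either stays or moves one ahead."""
    transmat = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        transmat[i, i] = 0.5        # 50% probability to stay in the current state
        transmat[i, i + 1] = 0.5    # 50% probability to move to the next state
    transmat[-1, -1] = 1.0          # the last state keeps all its probability mass

    # Uniform start probabilities, similar to hmmlearn's default initialization
    startprob = np.full(n_states, 1.0 / n_states)
    return startprob, transmat
```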

The feature variable X_train now contains one timestamp and seven features per row:

  • X and Y location
  • The pressure
  • The four extracted features

Now we can create the model in line 20 of the code below. The number of HMM states and the number of mixture components play an essential role in how well imitated signatures are detected, but, as already said, we can't know the best values in advance. For that reason, I tried different combinations and compared how these two hyperparameters behave against each other.

The score function in lines 35, 39 and 43 evaluates the scores for the three different sets. These scores are the log-likelihoods of the data under the fitted model, which is what hmmlearn's score function returns.
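A sketch of such a fit_hmm function is shown below; it reuses the init_hmm_matrices sketch from above and assumes X_train, X_test and X_val are NumPy arrays with their per-signature lengths (the real gist may structure this differently):

```python
from hmmlearn.hmm import GMMHMM

def fit_hmm(n_states, n_mix,
            X_train, len_train, X_test, len_test, X_val, len_val):
    """Fit a GMM-HMM on the training signatures and score all three sets."""
    startprob, transmat = init_hmm_matrices(n_states)

    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=100,
                   init_params="mcw")   # "mcw": let hmmlearn initialize only the
                                        # means, covariances and mixture weights
    model.startprob_ = startprob
    model.transmat_ = transmat

    model.fit(X_train, lengths=len_train)

    # score() returns the log-likelihood of the data under the fitted model
    return (model.score(X_train, lengths=len_train),
            model.score(X_test, lengths=len_test),
            model.score(X_val, lengths=len_val))
```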

Alright, let's call the fit_hmm function with different numbers of HMM states and GMM mixture components:
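The grid of values below is just an example range, not necessarily the one used for the GIFs:

```python
import itertools

results = {}
for n_states, n_mix in itertools.product([3, 5, 7, 10], [1, 2, 4]):
    scores = fit_hmm(n_states, n_mix,
                     X_train.values, len_train,
                     X_test.values, len_test,
                     X_val.values, len_val)
    results[(n_states, n_mix)] = scores
    print(f"S={n_states}, M={n_mix}: train/test/val scores = {scores}")
```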

Plotting the GIFs

We call it a success when the imitated signatures receive clearly different, i.e. lower, scores than the original (blue and green) signatures.

Created by me with Matplotlib: User 3
Created by me with Matplotlib: User 7

We can see that the number of HMM states has the highest influence on the scores. We want scores that separate the red dots (imitations) as far as possible from the original blue and green dots. Seven or fewer HMM states appear to separate the red dots quite well from the rest, and the number of GMM components apparently has little impact on the output.

Evaluation

If we draw a linear classification line between the colored dots,

Classify if the signature is fake

we can see how well the HMM is able to detect whether the signature is fake or not. We would, for example, set a threshold value at -2200 and then score a new signature. If the score is less than -2200 we claim the signature to be fake; if it is greater, we say it is an original signature.
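As a tiny sketch of that decision rule (the two example scores are made up):

```python
THRESHOLD = -2200  # read off from the score plots above

def is_forged(score, threshold=THRESHOLD):
    """Flag a signature as forged when its score falls below the threshold."""
    return score < threshold

print(is_forged(-2750))  # True  -> likely a forgery
print(is_forged(-1900))  # False -> likely genuine
```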

Extension

What we could do now is save the scores and train, for example, a Logistic Regression to predict the probability (0–100%) that a signature is original, based on its score. The predicted probabilities give us an estimate of the model's uncertainty, and the classifier itself can be evaluated with the accuracy score from sklearn.
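A sketch of that extension with scikit-learn; the scores and labels below are placeholders for the values we would collect from the fitted HMMs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# One HMM score per signature and its true label (1 = original, 0 = imitated)
scores = np.array([-1900, -2050, -2100, -2400, -2700, -2900]).reshape(-1, 1)
labels = np.array([1, 1, 1, 0, 0, 0])

clf = LogisticRegression().fit(scores, labels)

# Probability that a new signature with score -2300 is original
print(clf.predict_proba([[-2300]])[0, 1])

# Accuracy of the classifier on the (placeholder) data
print(accuracy_score(labels, clf.predict(scores)))
```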

Conclusion

Success! We can reliably distinguish between imitated and original signatures. State-of-the-art approaches that also capture the time dependencies are Recurrent Neural Networks, but as we saw, this is also possible with a very classical approach like the HMM.

For more information, you can visit my website and connect with me on the social media accounts linked there. I would be happy if you gave this blog post a clap :).

References

[1] Julian Fierrez et al., HMM-based on-line signature verification: Feature extraction and signature modeling (2007), Science Direct
[2] Margit Antal, László Zsolt Szabó and Tünde Tordai, Online Signature Verification on MOBISIG Finger-Drawn Signature Corpus (2017), Hindawi
[3] Julian Fierrez et al., An On-Line Signature Verification System Based on Fusion of Local and Global Information (2005), Springer

This blog post is based on knowledge gained through the course Pattern Analysis at the Friedrich-Alexander University Erlangen-Nürnberg. I used some parts of the lecture by Dr. Christian Riess to illustrate the examples in this post, so all rights go to Dr. Christian Riess.
