Multimodal Liveness Detection — Methods, Issues and Application

From Antispoofing Wiki

Multimodal liveness detection is a comprehensive approach that detects spoofing attacks by analyzing multiple biometric parameters simultaneously.

Problem Overview

Multimodal liveness detection — as opposed to the unimodal approach — is based on the principle of scanning and analyzing more than one biometric feature of a person. Currently, most detection solutions focus on a single parameter: voice, iris, fingerprints, face, hand geometry, etc. The concept of multimodal detection dates back to at least 2013, when a study covered in Psychology Today showed that people do not recognize others solely by their faces: "non-facial cues" — such as height and build — also play a major role in identifying a person.

Multimodal liveness detection gained further interest and development with the growing threat of presentation attacks (PAs). With the introduction of sophisticated tools such as Generative Adversarial Networks (GANs), it is possible to fabricate highly realistic images and videos, or to clone voices, for malicious purposes.

Currently, many security companies offer practical multimodal detection solutions. Most of them combine two separate liveness detection methods: face + iris, face + heart activity, face + voice, and so on. This combination of techniques makes the multimodal approach more effective than the unimodal one. Some primary advantages of multimodal detection include:

  • Better antispoofing qualities. Analyzing multiple characteristics provides higher accuracy in detecting presentation attacks.
  • Increased fraud difficulty. Perpetrators will have to counterfeit more than one biometric parameter, increasing the cost and complexity of initiating an attack.
  • ‘Compensation’ mechanism. Multimodal detection is often equipped with fault compensation: the system is designed so that it can identify a person from a single biometric feature if some of its sensors fail, or if the subject occludes certain traits with a mask, gloves, sunglasses, etc. This also covers cases where a feature is unavailable altogether, e.g. due to aphasia or amputation.
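The compensation mechanism above can be sketched as weighted score-level fusion that simply skips unavailable modalities. This is a minimal illustration, not any vendor's actual implementation; the function name, weights and threshold are assumptions.

```python
def fuse_liveness_scores(scores, weights=None, threshold=0.5):
    """Fuse per-modality liveness scores, ignoring unavailable modalities.

    `scores` maps modality name -> score in [0, 1], or None if the
    sensor failed or the trait is occluded/missing entirely.
    Returns (fused score, accept decision).
    """
    available = {m: s for m, s in scores.items() if s is not None}
    if not available:
        raise ValueError("no modality available for a liveness decision")
    if weights is None:
        weights = {m: 1.0 for m in available}
    total = sum(weights[m] for m in available)
    fused = sum(weights[m] * available[m] for m in available) / total
    return fused, fused >= threshold
```

With the voice sensor down, `fuse_liveness_scores({"face": 0.9, "voice": None, "fingerprint": 0.7})` still produces a decision from the two remaining modalities.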

A number of multimodal solutions have been proposed, including cross-modal fusion, combined fingerprint and iris modalities, and others.

Face & Voice Fusion

The idea behind this method is that joint analysis of speech-related signals — voice, facial expressions, tongue and lip movement — produces a synergistic effect, mirroring the way the human brain processes audiovisual data. The method relies on a fuzzy fusion technique that employs mutual dependency models. These models, in turn, allow extracting and analyzing spatio-temporal parallels between a person's speech and their facial expressions.

The face and voice fusion method is largely based on phoneme and viseme correlation. A phoneme is a unit of sound produced by a person, while a viseme is the lip movement and mouth shape that accompany that sound. It should be noted that the mapping is many-to-one: several phonemes can share a single viseme — /k, ɡ, ŋ/, for example.
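The phoneme–viseme correlation check can be sketched as a lookup against a many-to-one map, scoring how often the observed mouth shape matches the one expected for the recognized sound. The map below is a small illustrative subset and the viseme class names are assumptions, not a standard inventory.

```python
# Illustrative many-to-one phoneme -> viseme map (small subset):
# /k/, /g/ and the velar nasal share one viseme class, as do /p/, /b/, /m/.
PHONEME_TO_VISEME = {
    "k": "velar", "g": "velar", "ng": "velar",
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
}

def av_consistency(phonemes, visemes):
    """Fraction of aligned frames where the observed viseme matches
    the viseme class expected for the recognized phoneme.
    A low score suggests the audio and video streams do not belong
    together (e.g. a replayed voice over a still face)."""
    matches = sum(
        1 for p, v in zip(phonemes, visemes)
        if PHONEME_TO_VISEME.get(p) == v
    )
    return matches / max(len(phonemes), 1)
```

A genuine talking face should score close to 1.0, while a dubbed or replayed attack drifts toward chance level.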

Lip motion features are extracted with kernel Canonical Correlation Analysis (kCCA) applied to Mel Frequency Cepstral Coefficients (MFCCs), while the Multi-Channel Gradient Model (MCGM) algorithm is employed for analyzing lip sequences. To test this technique, both bona fide and fake media were used, including videos created from still images of people. The evaluation of fuzzy fusion yielded satisfactory results: the Equal Error Rate (EER) on the VidTIMIT test was 16.2%.
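The EER figure quoted above is the operating point where the false-accept and false-reject rates coincide. A minimal sketch of how an EER is estimated from two score lists (higher score = more likely genuine; the function name is illustrative):

```python
def equal_error_rate(genuine, spoof):
    """Estimate the EER by sweeping every candidate threshold and
    taking the point where the false-accept rate (spoofs passing)
    and false-reject rate (genuine samples failing) are closest."""
    best = None
    for t in sorted(set(genuine) | set(spoof)):
        far = sum(s >= t for s in spoof) / len(spoof)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]  # (EER estimate, threshold)
```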

Face & Fingerprint Fusion

Face and fingerprint multimodal solutions also provide more reliable detection results. Suggested techniques utilize 1-median filtering and bagging-based fingerprint liveness detection. The 1-median filter is more robust to outliers and can therefore resist spoofing attacks more successfully. Moreover, this technique can be applied to fuse liveness and recognition scores.
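The 1-median of a set of score points (for example, per-frame (liveness, match) pairs) is the point minimizing the sum of distances to all samples; unlike the mean, a few wild scores injected by a spoofed modality barely move it. A minimal sketch using Weiszfeld's iteration — an assumed, simplified stand-in for the paper's filtering step:

```python
import math

def geometric_median(points, iters=100, eps=1e-9):
    """1-median of 2-D score points via Weiszfeld's algorithm.
    Starts from the centroid and iteratively re-weights each point
    by the inverse of its distance to the current estimate."""
    x = [sum(p[0] for p in points) / len(points),
         sum(p[1] for p in points) / len(points)]
    for _ in range(iters):
        num = [0.0, 0.0]
        den = 0.0
        for p in points:
            d = math.dist(p, x)
            if d < eps:          # estimate has landed on a data point
                return tuple(x)
            num[0] += p[0] / d
            num[1] += p[1] / d
            den += 1.0 / d
        x = [num[0] / den, num[1] / den]
    return tuple(x)
```

With three honest score points at the origin and one outlier at (10, 10), the mean is dragged to (2.5, 2.5) but the 1-median stays at the origin.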

The bagging-based method handles fingerprint liveness detection. It incorporates 2D Gabor filters, the gray-level co-occurrence matrix (GLCM), Fourier Transform (FT) based features, as well as the Bootstrap AGGregatING method (known as bagging). This fusion method was tested on a chimeric dataset with samples borrowed from various datasets, including LivDet2013. It showed 85% accuracy when exposed to replay and other attacks.
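Bagging itself is generic: train many weak classifiers on bootstrap resamples of the training set, then majority-vote their decisions. A toy sketch with a one-feature threshold "stump" standing in for the real Gabor/GLCM/FT feature classifiers (all names and the higher-score-means-live convention are assumptions):

```python
import random

def train_stump(samples):
    """Fit a 1-D threshold: midpoint between class means.
    `samples` is a list of (feature_value, label), label 0 = fake, 1 = live."""
    live = [x for x, y in samples if y == 1]
    fake = [x for x, y in samples if y == 0]
    return (sum(live) / len(live) + sum(fake) / len(fake)) / 2

def bagging_predict(samples, x, n_models=25, seed=0):
    """Bootstrap AGGregatING: train each stump on a bootstrap
    resample of the training set and majority-vote the decisions."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        boot = [rng.choice(samples) for _ in samples]
        # a degenerate resample may miss one class; redraw if so
        while len({y for _, y in boot}) < 2:
            boot = [rng.choice(samples) for _ in samples]
        votes += x >= train_stump(boot)   # assume higher value = live
    return 1 if votes > n_models // 2 else 0
```

Averaging over resamples reduces the variance of any single unstable classifier, which is exactly what makes the ensemble harder to fool with borderline spoof samples.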

Multimodal Biometric Recognition Based on Multifusion

Multifusion detection combines face, fingerprint and finger vein recognition with the subsequent fusion of their features. Increasing the number of detection techniques enhances the chance of accurate detection.

Multifusion method involves three stages:

  • Fingerprint recognition. It includes variance method, fingerprint thinning, minutia extraction (ridge ending, short ridge, bifurcation), and other procedures.
  • Finger vein recognition. Similar to the previous step, vein recognition involves segmentation, feature extraction, normalization, as well as light source and sensor selection.
  • Face recognition. It is based on standard steps: face detection, feature extraction and subject’s identification.

The final step is to ensure liveness. For this purpose, an algorithm is employed that comprises the Discrete Cosine Transform (DCT) for removing pixel correlation in the image, an HSV color spatial histogram, and an SVD- and HSV-histogram-based detection technique.
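The HSV histogram component can be sketched with the standard-library `colorsys` module: convert each RGB pixel to HSV and bin the hue channel. This is a deliberately simplified, non-spatial version of the histogram described above — printed or replayed spoofs often shift the color distribution, which such a histogram can expose.

```python
import colorsys

def hsv_histogram(pixels, bins=8):
    """Coarse, normalised hue histogram from RGB pixels (0-255 values)."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(h * bins), bins - 1)] += 1
    total = sum(hist)
    return [c / total for c in hist]
```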

This solution was tested on a dataset processed with Fisher vectors, covering 7 scenarios that combine fake and real samples. The results show that detecting and removing false features before feature fusion is vital, as they negatively affect accuracy.

Electrocardiogram in Multimodal Systems

Heart activity is one of the main vital signs of a human and has therefore been utilized for liveness recognition in the past. For instance, blood oxygenation and pulse analysis have been suggested as effective ways of revealing deepfake videos and digital face manipulations. Electrocardiogram (ECG) analysis is a somewhat similar method. It focuses on the electric signals produced in the sinoatrial node, located in the right atrium of the heart. The technique involves isolating fiducial points for every pulse segment. ECG signals are aperiodic, which makes them unique to each individual and, therefore, a reliable recognition parameter.

Electrocardiogram analysis can be made multimodal and more failproof by combining it with ear and iris liveness detection. Ear and iris images go through pre-processing — normalization, feature extraction via the 1D-LBP method, feature fusion via concatenation by the union principle, etc. — which helps reduce the EER.
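The 1D-LBP step can be sketched directly: for each sample of a signal, compare its neighbours against the centre value and pack the comparison bits into a code. The concatenation fusion is then just joining the per-modality feature vectors. Both function names are illustrative.

```python
def one_d_lbp(signal, radius=2):
    """1D Local Binary Pattern: for each interior sample, compare
    `radius` neighbours on each side with the centre value and pack
    the comparison bits into one code per sample."""
    codes = []
    for i in range(radius, len(signal) - radius):
        code = 0
        neighbours = signal[i - radius:i] + signal[i + 1:i + 1 + radius]
        for bit, n in enumerate(neighbours):
            if n >= signal[i]:
                code |= 1 << bit
        codes.append(code)
    return codes

def fuse_by_concatenation(*feature_vectors):
    """Feature-level fusion: concatenate per-modality vectors."""
    fused = []
    for v in feature_vectors:
        fused.extend(v)
    return fused
```

A histogram of the resulting codes is what typically serves as the final texture descriptor for each modality.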

Password & Selfie Image Verification

Multimodal detection also encompasses combining liveness techniques with passwords and other security measures. One such method, developed by ID R&D, includes three steps. First, the user is prompted to type in a password to unlock their device or validate a transaction. Second, entering the password triggers the system to capture the user’s image as a selfie. Third, the selfie undergoes a liveness check in a specialized system. The system then correlates data from both sources for complete verification.

Fingerprint & Iris Liveness Detection Method

In this method, multimodal liveness detection is achieved through the analysis of fingerprint and iris textural characteristics. Fingerprint analysis involves the extraction of micro- and macro-textural parameters, which are combined into a larger feature vector.

The input data first undergoes enlargement and denoising stages. Features are then extracted with the help of spatial analysis, the Neighboring Gray Tone Difference Matrix (NGTDM), Gray-Level Co-Occurrence Matrix (GLCM) calculation, estimation of six Haralick features, etc. Finally, a support vector machine is used to classify the fingerprint images.
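The GLCM counts how often pairs of gray levels co-occur at a fixed pixel offset, and Haralick features are statistics of that matrix. A minimal sketch for the horizontal offset with two of the Haralick features named above (spoof materials such as gelatin or latex tend to shift these texture statistics):

```python
def glcm(image, levels):
    """Normalised gray-level co-occurrence matrix for the
    horizontal neighbour offset (dx=1, dy=0).
    `image` is a list of rows of integer gray levels in [0, levels)."""
    m = [[0.0] * levels for _ in range(levels)]
    count = 0
    for row in image:
        for a, b in zip(row, row[1:]):
            m[a][b] += 1
            count += 1
    return [[v / count for v in row] for row in m]

def haralick_contrast(p):
    """Haralick contrast: sum over (i - j)^2 * P(i, j)."""
    return sum((i - j) ** 2 * p[i][j]
               for i in range(len(p)) for j in range(len(p)))

def haralick_energy(p):
    """Haralick energy (angular second moment): sum of P(i, j)^2."""
    return sum(v * v for row in p for v in row)
```

In practice the GLCM is computed for several offsets and directions and the features are averaged; the fingerprint feature vector then feeds the SVM classifier.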

The iris textural parameters are obtained with the help of the Local Binary Hexagonal Extrema Pattern (LBHXEP), segmentation conducted with the integro-differential operator, image normalization achieved with Daugman’s rubber-sheet model, an SVM classifier, and other techniques.

Spoofing Detection Algorithm and Fusion Mechanisms

A study conducted at the University of Naples and Clarkson University simulated spoofing attacks against the fingerprint modality of a multimodal system, in which liveness is detected with perspiration and morphology analysis algorithms. (The performance of the system was evaluated with the Spoof False Accept Rate, SFAR.) The study revealed that multimodal biometric systems can be made more robust and secure if their matchers follow trained, rather than fixed, fusion rules when exposed to a spoofing attack.
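The SFAR metric itself is straightforward: the fraction of presentation-attack attempts whose fused score clears the system's acceptance threshold. A minimal sketch (function name is illustrative):

```python
def spoof_false_accept_rate(spoof_scores, threshold):
    """SFAR: fraction of presentation-attack attempts accepted
    by the system at the given decision threshold."""
    accepted = sum(s >= threshold for s in spoof_scores)
    return accepted / len(spoof_scores)
```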

