
Multimodal Liveness Detection — Methods, Issues and Application

Multimodal liveness detection is a security approach that combines two or more biometric modalities to achieve higher accuracy, robustness, and reliability.

Problem Overview

Multimodal liveness detection, as opposed to the unimodal approach, is based on scanning and analyzing more than one biometric feature of a person. Currently, most detection solutions focus on a single parameter: voice, iris, fingerprints, face, hand geometry, etc. The concept of multimodal detection can be dated back to 2013. A study published in Psychology Today showed that people do not recognize others solely by their faces, and that "non-facial cues" such as height and build also play a major role in identifying a person.

Interest in multimodal liveness detection grew with the rising threat of presentation attacks (PAs). Sophisticated tools such as Generative Adversarial Networks (GANs) make it possible to fabricate highly realistic images and videos or to clone voices for malicious purposes.

A multimodal biometric system designed by Fujitsu Laboratories

Currently, many security companies offer practical solutions in multimodal biometric antispoofing. Most of them combine two separate liveness detection methods: face + iris, face + heart activity, face + voice, and so on. Combining techniques makes the multimodal approach more effective than the unimodal one. The primary advantages of multimodal detection include:

  • Better antispoofing qualities. Analyzing multiple characteristics provides higher accuracy in detecting presentation attacks.
  • Increased fraud difficulty. Perpetrators will have to counterfeit more than one biometric parameter, increasing the cost and complexity of initiating an attack.
  • ‘Compensation’ mechanism. Multimodal detection is often equipped with fault compensation: the system is designed so that it can still identify a person from a single biometric feature if some of its sensors fail or if the user occludes some traits with a mask, gloves, sunglasses, etc. This also covers cases in which a trait is entirely unavailable, for example due to aphasia or amputation.
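The compensation mechanism can be illustrated with a minimal score-level fusion sketch. The function name `fuse_scores`, the modality weights, and the scores are all hypothetical; the only idea shown is that a modality without a score is dropped and the remaining weights are renormalized, which is one possible way to implement fault compensation and not necessarily what any particular product does.

```python
from typing import Dict, Optional


def fuse_scores(scores: Dict[str, Optional[float]],
                weights: Dict[str, float]) -> float:
    """Weighted average over the modalities that actually returned a score.

    A score of None marks a modality that is unavailable (failed sensor,
    occluded trait). Its weight is dropped and the rest are renormalized.
    """
    available = {m: s for m, s in scores.items() if s is not None}
    if not available:
        raise ValueError("no modality produced a score")
    total_weight = sum(weights[m] for m in available)
    return sum(weights[m] * s for m, s in available.items()) / total_weight


# Hypothetical weights and liveness scores in [0, 1]
WEIGHTS = {"face": 0.5, "voice": 0.3, "fingerprint": 0.2}

all_sensors = fuse_scores(
    {"face": 0.9, "voice": 0.8, "fingerprint": 0.7}, WEIGHTS)
gloved_user = fuse_scores(
    {"face": 0.9, "voice": 0.8, "fingerprint": None}, WEIGHTS)
print(all_sensors, gloved_user)
```

Renormalizing keeps the fused score on the same scale, so a single acceptance threshold can still be applied when a sensor is missing.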

A number of multimodal solutions have been proposed, including cross-modal fusion, combined fingerprint and iris modalities, etc.

Face & Voice Fusion

The idea behind this method is that joint analysis of speech signals (voice, facial expressions, tongue or lip movement) produces a synergistic effect, the same effect that helps the human brain process audiovisual data. The method relies on a fuzzy fusion technique, which employs mutual dependency models. These models, in turn, make it possible to extract and analyze spatio-temporal parallels between a person's speech and their facial expressions.

Face and voice liveness detection technique 'fuzzy fusion' represented schematically
Fuzzy Fusion schematic

The face and voice fusion method is largely based on phoneme and viseme correlation. A phoneme is a sound made by a person, while a viseme is the lip movement and mouth shape that accompany that sound. Note that the mapping is not one-to-one: several phonemes can share a single viseme, for example /k, ɡ, ŋ/.

The McGurk effect is a vivid example of cross-modal audiovisual perception

Kernel Canonical Correlation Analysis (kCCA) on Mel Frequency Cepstral Coefficients (MFCC) is used to extract lip motion features, while the Multi Channel Gradient Model (MCGM) algorithm is employed for analyzing lip sequences. The technique was tested on both bona fide and fake media, including videos created from still images of people. The evaluation of fuzzy fusion produced satisfactory results: the Equal Error Rate (EER) on the VidTIMIT test was 16.2%.
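The Equal Error Rate is the operating point at which the false accept rate (spoofs accepted) equals the false reject rate (genuine users rejected). The sketch below shows how such a figure can be approximated from two lists of scores. The function name and the sample scores are hypothetical and are not taken from the VidTIMIT evaluation.

```python
from typing import List, Tuple


def equal_error_rate(genuine: List[float],
                     spoof: List[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for scores where higher means 'more live'.

    FAR: share of spoof samples accepted (score >= threshold).
    FRR: share of genuine samples rejected (score < threshold).
    The EER is approximated at the threshold where |FAR - FRR| is smallest.
    """
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(genuine) | set(spoof)):
        far = sum(s >= thr for s in spoof) / len(spoof)
        frr = sum(g < thr for g in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Hypothetical liveness scores
genuine_scores = [0.91, 0.85, 0.78, 0.66, 0.40]
spoof_scores = [0.10, 0.22, 0.35, 0.48, 0.70]
eer, thr = equal_error_rate(genuine_scores, spoof_scores)
print(f"EER = {eer:.1%} at threshold {thr}")
```

With this toy data one spoof in five is accepted and one genuine sample in five is rejected at the chosen threshold, so the EER is 20%.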

Anti-spoofing technique dubbed 'fuzzy fusion' with its error rates
Fuzzy fusion technique test results

Face & Fingerprint Fusion

Face and fingerprint multimodal solutions also provide more reliable detection. The suggested techniques use 1-median filtering and bagging-based fingerprint liveness detection. The 1-median filter is more "immune" to outliers and can therefore resist spoofing attacks more successfully. Moreover, it can be applied to combine liveness scores with recognition scores.
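The outlier resistance of a median-based rule can be seen in a small comparison with the plain mean (sum rule). This is a simplified sketch, assuming scalar match scores, in which case the 1-median reduces to the ordinary median. The five matcher scores and the threshold are hypothetical, and the example does not reproduce the cited method in full.

```python
from statistics import mean, median

# Hypothetical normalized match scores from five matchers for an impostor;
# the last matcher has been fooled by a spoof and reports a high score.
scores = [0.12, 0.18, 0.15, 0.20, 0.97]

sum_rule = mean(scores)       # pulled upward by the spoofed matcher
median_rule = median(scores)  # stays with the majority of matchers

THRESHOLD = 0.3  # hypothetical acceptance threshold
accepted_by_mean = sum_rule >= THRESHOLD
accepted_by_median = median_rule >= THRESHOLD
print(accepted_by_mean, accepted_by_median)
```

A single spoofed matcher is enough to push the mean over the threshold, while the median ignores it.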

The bagging-based method provides the fingerprint liveness detection. It incorporates 2D Gabor filters, gray level co-occurrence matrix (GLCM) features, Fourier Transform (FT) based features, as well as the Bootstrap AGGregatING method (also known as bagging). This fusion method was tested on a chimeric dataset with samples borrowed from various datasets, including LivDet2013. It showed 85% accuracy when exposed to replay and other attacks. The method has strong potential in mobile security and anti-spoofing for IoT.
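Bootstrap aggregating itself is simple to sketch: several weak classifiers are trained on bootstrap samples (drawn with replacement) and their votes are combined. The example below uses one-dimensional threshold "stumps" on a synthetic texture feature. The data, function names, and the choice of stumps are assumptions made for illustration only; the method described above works on Gabor, GLCM, and FT features.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label: 1 = live, 0 = spoof)


def fit_stump(data: List[Sample]) -> float:
    """Pick the threshold that best separates live (>= thr) from spoof."""
    best_thr, best_acc = 0.0, -1.0
    for thr, _ in data:
        acc = sum((x >= thr) == bool(y) for x, y in data) / len(data)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr


def fit_bagging(data: List[Sample], n_models: int,
                rng: random.Random) -> List[float]:
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    return [fit_stump([rng.choice(data) for _ in data])
            for _ in range(n_models)]


def predict(thresholds: List[float], x: float) -> int:
    """Aggregate by majority vote over all stumps."""
    votes = sum(x >= thr for thr in thresholds)
    return int(votes * 2 > len(thresholds))


rng = random.Random(0)
# Synthetic 1-D feature: spoofs cluster low, live samples cluster high
train = ([(rng.gauss(0.3, 0.1), 0) for _ in range(50)]
         + [(rng.gauss(0.7, 0.1), 1) for _ in range(50)])
ensemble = fit_bagging(train, n_models=15, rng=rng)
print(predict(ensemble, 0.9), predict(ensemble, 0.1))
```

Because each stump sees a slightly different resampling of the training set, the vote is less sensitive to individual noisy samples than any single stump.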

Combining face and fingerprint detection for higher identification accuracy
Face & fingerprint fusion scheme

Multimodal Biometric Recognition Based on Multifusion

Multifusion detection combines face, fingerprint, and finger vein recognition with the subsequent fusion of their features. Increasing the number of detection techniques improves the chance of accurate detection.

Face and fingerprint samples used in detection study
Samples used in the study

The multifusion method involves three stages:

  • Fingerprint recognition. It includes the variance method, fingerprint thinning, minutia extraction (ridge ending, short ridge, bifurcation), and other procedures.
  • Finger vein recognition. Similar to the previous step, vein recognition involves segmentation, feature extraction, normalization, as well as light source and sensor selection.
  • Face recognition. It is based on the standard steps: face detection, feature extraction, and identification of the subject.

The final step is to ensure liveness. The algorithm employed for this purpose comprises the Discrete Cosine Transform (DCT) for removing pixel correlation in the image, an HSV color spatial histogram, and a detection technique based on SVD (singular value decomposition) and the HSV histogram.
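As an illustration of the HSV histogram component only, the sketch below converts RGB pixels to HSV with the standard-library `colorsys` module and builds normalized hue and saturation histograms. The function name, bin count, and pixel values are hypothetical. It computes one global histogram, whereas a spatial histogram would repeat this per image region, and the DCT and SVD steps are not shown.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB


def hsv_histogram(pixels: List[Pixel], bins: int = 8) -> List[float]:
    """Normalized hue histogram concatenated with a saturation histogram."""
    hue = [0] * bins
    sat = [0] * bins
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # h and s lie in [0, 1]; clamp 1.0 into the last bin
        hue[min(int(h * bins), bins - 1)] += 1
        sat[min(int(s * bins), bins - 1)] += 1
    n = len(pixels)
    return [c / n for c in hue + sat]


# Hypothetical 4-pixel patch: two skin-like tones, one pure red, one grey
patch = [(224, 172, 105), (198, 134, 66), (255, 0, 0), (128, 128, 128)]
features = hsv_histogram(patch)
print(features)
```

The resulting vector could then be passed to a classifier as a color-based liveness cue.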

Scheme of face, fingerprint and finger vein fusion method

This solution was tested on a dataset processed with Fisher vectors, covering 7 scenarios that combine fake and real samples. The results show that detecting and removing false features before feature fusion is vital, as they negatively affect accuracy.

Electrocardiogram in Multimodal Systems

Heart activity is one of the main vital signs of a human, and it has featured repeatedly in liveness taxonomies for years. For instance, blood oxygenation and pulse analysis have been suggested as effective ways of revealing deepfake videos and digital face manipulations. Electrocardiogram (ECG) analysis is a somewhat similar method. It focuses on the electrical signals produced in the sinoatrial node, which is located in the right atrium of the heart. The technique involves isolating fiducial points for every pulse segment. ECG signals are described as aperiodic and unique to each individual, which makes them a reliable recognition parameter.

Combining ECG with ear and iris liveness detection
ECG method schematic

Electrocardiogram analysis can be made multimodal and more fail-safe by combining it with ear and iris liveness detection. Ear and iris images go through pre-processing, which helps reduce the EER, followed by normalization, feature extraction via the 1D-LBP method, feature fusion via the concatenation-by-union principle, etc.
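The two named steps, 1D-LBP feature extraction and fusion by concatenation, can be sketched as follows. The neighbourhood radius, the bit ordering, and the sample profiles are assumptions. The example treats each image as an already flattened intensity profile and uses plain concatenation, which may differ in detail from the "concatenation by union" scheme mentioned above.

```python
from typing import List


def lbp_1d_histogram(signal: List[float], radius: int = 2) -> List[float]:
    """Normalized histogram of 1-D local binary pattern codes.

    Each sample is compared with `radius` neighbours on either side; a
    neighbour >= the centre contributes a 1 bit to the code.
    """
    n_codes = 2 ** (2 * radius)
    hist = [0] * n_codes
    for i in range(radius, len(signal) - radius):
        neighbours = signal[i - radius:i] + signal[i + 1:i + radius + 1]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= signal[i]:
                code |= 1 << bit
        hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_codes


def fuse_by_concatenation(*feature_vectors: List[float]) -> List[float]:
    """Feature-level fusion: join the per-modality vectors end to end."""
    fused: List[float] = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


# Hypothetical flattened intensity profiles from an ear and an iris image
ear_profile = [3, 5, 4, 8, 6, 7, 2, 9, 1, 5]
iris_profile = [7, 7, 6, 5, 9, 3, 4, 8, 2, 6]
fused = fuse_by_concatenation(lbp_1d_histogram(ear_profile),
                              lbp_1d_histogram(iris_profile))
print(len(fused))
```

With a radius of 2 each modality yields a 16-bin histogram, so the fused vector has 32 entries.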

Fusion scheme of ECG with ear and iris liveness detection
Fusion schematic proposed for the ECG, ear and iris -based detection method

Password & Selfie Image Verification

Multimodal detection also encompasses the combination of liveness techniques with passwords and other security measures. One such method was developed by ID R&D and includes three steps. First, the user is prompted to type in a password to unlock their device or validate a transaction. Second, entering the password triggers the system, which captures the user’s image in the form of a selfie. Third, the selfie undergoes a liveness check in a specialized system. The system correlates data from both sources for complete verification.

Fingerprint & Iris Liveness Detection Method

In this method, multimodal liveness detection is achieved through the analysis of fingerprint and iris textural characteristics. Fingerprint analysis involves extracting micro- and macro-textural parameters, which are combined into a larger feature vector.

A support vector machine (SVM) is used to classify the fingerprint images. Before classification, the input data undergoes enlarging and denoising stages. Features are then extracted with the help of spatial analysis, Neighboring Gray Tone Difference Matrix (NGTDM) and Gray-Level Co-Occurrence Matrix (GLCM) calculation, estimation of six Haralick features, etc.
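To make the GLCM and Haralick steps concrete, here is a minimal pure-Python sketch. It builds a symmetric, normalized co-occurrence matrix for a single horizontal offset and derives three common Haralick descriptors. The text above mentions six features without listing them, so the three chosen here (contrast, energy, homogeneity) are an assumption, as are the function names and the sample patch.

```python
from typing import Dict, List


def glcm(image: List[List[int]], levels: int,
         dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized, symmetric co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                a, b = image[y][x], image[ny][nx]
                counts[a][b] += 1
                counts[b][a] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def haralick_subset(p: List[List[float]]) -> Dict[str, float]:
    """Three of the classic Haralick texture descriptors."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Hypothetical 4x4 patch quantized to 4 grey levels
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = haralick_subset(glcm(patch, levels=4))
print(features)
```

In a full pipeline these values would be computed for several offsets and appended to the feature vector that the SVM classifies.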

Iris liveness detection and fingerprint recognition multimodal system
Framework of the proposed multimodal, fingerprint & iris detection solution

The iris textural parameters are obtained with the help of the Local Binary Hexagonal Extrema Pattern (LBHXEP), segmentation with the integro-differential operator, image normalization with Daugman’s rubber-sheet model, an SVM classifier, and other techniques.

Iris liveness detection framework

Spoofing Detection Algorithm and Fusion Mechanisms

A study conducted at the University of Naples and Clarkson University simulated spoofing attacks against the fingerprint modality of a system that can detect liveness with perspiration and morphology analysis algorithms. The performance of the system was evaluated with the Spoof False Accept Rate (SFAR). The study revealed that multimodal biometric systems can be made more robust and secure if their matchers follow trained rather than fixed rules when exposed to a spoofing attack.
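The SFAR metric is the share of spoof attempts that the fused system accepts. The sketch below computes it for a fixed averaging rule, with and without a liveness decision feeding into the fusion. The attempt data, threshold, and the rule of discarding a suspected-spoof score are hypothetical, and the example does not reproduce the trained fusion rules evaluated in the study.

```python
from typing import List, Tuple

# (face match score, fingerprint match score, fingerprint flagged live?)
Attempt = Tuple[float, float, bool]


def sfar(attempts: List[Attempt], threshold: float,
         use_liveness: bool) -> float:
    """Spoof False Accept Rate: share of spoof attempts that are accepted."""
    accepted = 0
    for face, finger, finger_live in attempts:
        if use_liveness and not finger_live:
            finger = 0.0  # discard a match score from a suspected spoof
        if (face + finger) / 2 >= threshold:
            accepted += 1
    return accepted / len(attempts)


# Hypothetical attacks: the face is the impostor's own (low score), the
# fingerprint is a spoof of the victim (high score). The liveness detector
# misses the last spoof.
spoof_attempts = [
    (0.30, 0.95, False),
    (0.25, 0.90, False),
    (0.40, 0.92, False),
    (0.35, 0.88, True),
]
sfar_sum_rule = sfar(spoof_attempts, threshold=0.6, use_liveness=False)
sfar_with_liveness = sfar(spoof_attempts, threshold=0.6, use_liveness=True)
print(sfar_sum_rule, sfar_with_liveness)
```

On this toy data the plain averaging rule accepts three of the four attacks, while the liveness-aware variant accepts only the one that the liveness detector missed.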


Multimodal liveness detection — Definition

Multimodal liveness detection relies on analyzing multiple biometric characteristics.

In multimodal liveness detection, several liveness signals are detected and analyzed at the same time. For instance, a person can be verified by scanning their retina and fingerprints, checking their voice and facial features, validating their handwriting and electrocardiogram, or any other combination.

Multimodal systems are generally considered more effective than unimodal detection. They are harder to spoof, since it is more challenging and time-consuming to counterfeit more than one biometric trait. Moreover, they prove more reliable when one of the system's sensors temporarily fails.

Unimodal liveness detection — Definition

Unimodal liveness detection includes analysis of a single biometric parameter.

Unimodal liveness detection accepts a single biometric modality as input, such as voice, retina, fingerprints, etc. This method is utilized in the majority of today's detection solutions, in both industry and home-grade systems.

The unimodal approach has become popular because of its cost-effectiveness, production simplicity, and ease of maintenance. However, some experts highlight that unimodal systems are more vulnerable to spoofing attacks due to their ‘lopsided’ approach. They are less accurate at detecting attacks, easier to defraud, and cannot compensate if the single sensor in use is out of commission.

What are the types of multimodal liveness detection?

Three primary types of multimodal liveness detection are currently in use.

Researchers highlight three principal types of multimodal systems:

  1. Face & voice fusion. Based on fuzzy fusion technique, the approach enables a system to analyze lip movement, voice attributes and facial expressions together.
  2. Face & fingerprint fusion. This approach is based on 1-median filtering and bagging-based fingerprint liveness detection, which are highly resistant to spoofing.
  3. Multifusion. The method incorporates fingerprint, finger vein, and facial liveness check.

It should be noted that some researchers also propose electrocardiogram to be included in multimodal antispoofing approaches.

What is the difference between Multimodal & Unimodal approaches?

Multimodal and unimodal solutions differ in how many biometric traits they analyze.

The unimodal approach is the most widespread concept and is based on analyzing just a single biometric parameter of a person. Unimodal solutions are designed to focus solely on one parameter at a time, such as fingerprints, voice, ear shape, etc.

The multimodal approach, in comparison, focuses on more than one biometric feature or vitality signal observed in the human body. It can analyze two or more parameters simultaneously to verify a person: voice and face, pulse activity and fingerprints, retina and face geometry, and so on.

While a multimodal system is more costly compared to unimodal systems, it is also believed to be far more effective and spoof-resistant.

What are the main sources of biometric information in multimodal detection?

At least three groups of biometric signals are used in modern multimodal systems.

Multimodal systems analyze the same biometric signals as the unimodal ones. In the multimodal approach, these signals are fused into combinations of two or more biometric features: eye retina + fingerprints, facial features + voice, finger vein + fingerprint, and so on.

It has also been suggested that blood oxygenation and heart activity (electrocardiogram) can be used in multimodal antispoofing solutions. Electrodermal activity, body constitution, temperature, handwritten signatures, and other similar parameters could potentially be used in multimodal systems as well.

Are multimodal systems robust to spoofing attacks?

Multimodal systems generally show superior antispoofing performance, though they retain some vulnerabilities.

The multimodal detection approach is not completely foolproof, as it does yield a certain error percentage. However, it is seen as superior to unimodal antispoofing. The chief advantage is that such a system can analyze multiple vitality signals simultaneously and is therefore much harder to fool.

Replicating two or more physical parameters is arduous and costly, which on its own poses a preliminary barrier to malicious actors. Moreover, a multimodal approach displays higher accuracy at detecting liveness: the fuzzy fusion technique showed a 16.2% EER when tested on a VidTIMIT male dataset.


  1. Fujitsu unveils multimodal biometric authentication technology for contactless retail
  2. Facial Recognition is More Accurate in Photos Showing Whole Person
  3. What is aphasia?
  4. Biometric Liveness Checking Using Multimodal Fuzzy Fusion
  5. Kernel canonical correlation analysis and its applications to nonlinear measures of association and test of independence
  6. Mel Frequency Cepstral Coefficients
  7. Multiple Object Tracking Using Edge Multi-Channel Gradient Model With ORB Feature
  8. Robust multimodal face and fingerprint fusion in the presence of spoofing attacks
  9. Multimodal Feature-Level Fusion for Biometrics Identification System on IoMT Platform
  10. Fisher Vectors: Beyond Bag-of-Visual-Words Image Representations
  11. ECG method schematic
  12. IDLive® Face. Passive Facial Liveness Detection
  13. A multimodal liveness detection using statistical texture features and spatial analysis
  14. Neighboring Gray Tone Difference Matrix
  15. Haralick texture features from apparent diffusion coefficient (ADC) MRI images depend on imaging and pre-processing parameters
  16. Local binary hexagonal extrema pattern (LBHXEP): a new feature descriptor for fake iris detection
  17. Increase the security of multibiometric systems by incorporating a spoofing detection algorithm in the fusion mechanism


Editors at Antispoofing Wiki thoroughly review all featured materials before publishing to ensure accuracy and relevance.