

Human Performance in Face Liveness Detection

Human deepfake detection refers to spotting flaws in deepfake imagery that are visible to the naked eye, as well as being aware of the information's background.

People and Machines in Face Recognition

Machine-operated systems are quicker and more productive at performing repetitive tasks. Facial anti-spoofing is exactly such a task: a system in which a vast number of faces must be recognized almost instantly. For instance, American Airlines introduced face scanners at its terminals to speed up the boarding process and serve up to 500,000 passengers daily.

Face blending technique used for making a fake ID

Human performance in face liveness detection has been a topic of controversy, prompting the question of whether people should partake in face liveness recognition at all. As a result, a series of tests and experiments were conducted in which humans and machines were put in competition. The standard performance metrics APCER and BPCER were used to assess human performance in one such test, and the results showed a higher error rate for humans. At the same time, it is argued that the human observer cannot be excluded from decision-making entirely. A number of companies and bodies today rely on face verification, among them Airbnb, MasterCard, Australia Bank, and Interpol. Therefore, accurate face liveness recognition, and the question of human performance in it, is a pressing issue in the field.
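
An aside on the metrics: APCER and BPCER are the standard presentation attack detection measures defined in ISO/IEC 30107-3. APCER is the share of attack presentations wrongly accepted as bona fide, while BPCER is the share of bona fide presentations wrongly rejected as attacks. A minimal sketch of how these rates can be computed from labeled decisions (the variable names are illustrative, not taken from the studies cited here):

```python
def apcer(attack_decisions):
    """APCER: fraction of attack presentations classified as bona fide."""
    return sum(d == "bona_fide" for d in attack_decisions) / len(attack_decisions)

def bpcer(bona_fide_decisions):
    """BPCER: fraction of bona fide presentations classified as attacks."""
    return sum(d == "attack" for d in bona_fide_decisions) / len(bona_fide_decisions)

# Example: 2 of 100 printed-photo attacks accepted, 5 of 100 real faces rejected
print(apcer(["bona_fide"] * 2 + ["attack"] * 98))   # 0.02 -> 2% APCER
print(bpcer(["attack"] * 5 + ["bona_fide"] * 95))   # 0.05 -> 5% BPCER
```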

Humans vs. Computers in Liveness Detection

Face manipulations are becoming more elaborate, posing a greater threat to face anti-spoofing. Moreover, producing deepfakes is now possible even for people with no technical background. Therefore, the number of face fraud attempts is expected to grow considerably in the foreseeable future. Experts highlight the strong and weak aspects of both human and computer face liveness detection. According to a study, fake images and synthesized media often have "limited naturalness," which makes them easier to spot with the naked eye. At the same time, some face manipulations can be highly complex, even exceeding the quality of genuine images. In such cases, human inspection fails to detect them and only machine algorithms can expose the fake media.

Human Detection

Researchers point to four main aspects that make it easier for a human to detect a spoof image:

  • Segmentation. In most cases, a human will reject a photo/video sample as fake if it lacks quality. Heavy pixelation, poor lighting, and visual noise all create a doubt factor.
  • Face blending. Morphing multiple faces into one is a type of face manipulation often used to spoof border control. While difficult to detect, such media, if poorly produced, still retains a number of artifacts and blunders visible to the human eye. Additionally, it can fail to meet standard ID photo requirements.
  • Fake faces. Synthesized faces made with a Generative Adversarial Network (GAN) can look extremely realistic. At the same time, the computing power and knowledge required for this operation may be unavailable to low-level con artists. As a result, poorly produced media often exposes itself.
  • Poor synchronization. Temporal inconsistencies in fake videos quickly attract attention. Poor lip-syncing, audio-video mismatches, and strange facial expressions can easily reveal a fake video.
Demo of a face liveness detection process

Interestingly, one of the ways to spot a fake video is by paying attention to the presented person's behavior. Observing the so-called accompanying behavior (facial expressions, gestures, and body language) is a way in which humans inherently identify each other. If there is a certain degree of unnatural behavior in the footage, the human brain can discern it, especially if situational context is provided. This is an exclusively human skill that neither passive nor active liveness detection systems can be trained to develop.

A poor quality fake produced in Photoshop

Computer Detection

Computer detection remains superior to human performance in fake face detection, especially in regard to passive liveness solutions. Still, computers can sometimes show high error rates as well: one study notes that "several computer algorithms performed with high error rates" when trying to detect morphed images.

Computer detection relies on a number of techniques:

  • Data modality detection. This method analyzes parameters such as audio spectrograms, spatio-temporal video features, and audio-video inconsistencies. It can detect artifacts left by processing tools, phoneme-viseme mismatches, and so on.
  • Temporal sequential analysis. This method is a powerful tool for detecting deepfakes. It employs OpenCL and the temporal C-LSTM model, which together extract frames from the source video to check which of them were used for face swapping.
  • Heart rate estimation. This method involves a detector capable of analyzing the heart rate of the person presented in a video. It registers face illumination and oxygen levels to analyze slight variations in skin coloration invisible to the naked eye. As a result, the system can spot a "synthetic person" (a minimal sketch of this idea follows the list).
  • Illumination-based analysis. This method relies on a simple technique: flashing randomly generated colors and verifying the light reflected off the subject's face. By employing linear regression models and a system of cameras and screens, the method then verifies the timing. Among other things, it helps to identify a person's face shape.
  • MAD. Morphing Attack Detection (MAD) is based on three descriptor types: gradient-based, texture-based, and descriptors learned by a deep neural network. They serve to extract features from the image in question and spot clues left by the morphing procedure.
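
To make the heart rate idea concrete, below is a minimal remote-photoplethysmography (rPPG) sketch. It assumes the mean green-channel value of the detected face region has already been extracted for every frame, limits the signal to the plausible human pulse band, and reads off the dominant frequency. Real detectors are considerably more sophisticated; the function and thresholds here are illustrative assumptions, not the method used in the cited work.

```python
import numpy as np

def estimate_heart_rate(green_means, fps):
    """Estimate heart rate (BPM) from per-frame mean green values of a face crop.

    A "synthetic person" often lacks the subtle periodic skin-color changes
    caused by blood flow, so no clear peak appears in the 0.7-4 Hz band.
    """
    signal = np.asarray(green_means, dtype=float)
    signal -= signal.mean()                        # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))         # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)         # ~42-240 BPM
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0                        # Hz -> beats per minute

# Example: a synthetic 72 BPM pulse sampled at 30 fps for 10 seconds
t = np.arange(300) / 30.0
pulse = 0.5 * np.sin(2 * np.pi * 1.2 * t) + np.random.normal(0, 0.1, 300)
print(round(estimate_heart_rate(pulse, fps=30)))   # ~72
```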

Since anti-spoofing is a new and constantly evolving area, many techniques and technologies beyond the methods discussed above are under active research.

Human vs. Machine Performance Experiments

A fake face generated with Thispersondoesntexist.com

ID R&D’s Experiment

An experiment conducted at ID R&D Inc. aimed to determine how efficiently people can handle face liveness recognition. During the test, standard Presentation Attack Instruments (PAIs) were used: 2D and 3D masks, printed photos, cutouts, and displayed images. The experiment revealed that people generally perform much worse than machines at face liveness detection.

APCER-based performance of the human examinees

Attack type    2D mask    3D mask    Printed cutout    Printed photo    Display
APCER          2.04%      2.35%      2.04%             30.34%           15.04%

The BPCER-based results, i.e. pristine images wrongly rejected as attacks, also showed a considerable error rate of 18.28%.

Real and fake faces mingled for the ID R&D test

The second part of the experiment focused on collective feedback. A group of 17 people was challenged to identify fake images, with the final decision made by majority vote. In this case, the error rate was lower; however, computer-based detection still outperformed human detection.
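
Why a majority vote helps can be shown with a simple probabilistic sketch: if one assumes (unrealistically) that each of the 17 observers errs independently with the same probability, the chance that the majority reaches a wrong verdict falls far below the individual error rate. Real observers are correlated, so the actual gain is smaller; the numbers below are illustrative, not the experiment's data.

```python
from math import comb

def majority_error(p, n=17):
    """Probability that a majority of n independent observers, each wrong
    with probability p, produces a wrong collective verdict."""
    k_needed = n // 2 + 1  # votes required for a wrong majority decision
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_needed, n + 1))

# An 18% individual error rate shrinks to well under 1% for 17 independent voters
print(f"{majority_error(0.18):.4%}")
```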

A similar experiment was conducted collaboratively by several institutions, including the University of California, Berkeley. A dataset of human faces was used, only half of which were real; the other half was synthesized with the StyleGAN2 Generative Adversarial Network.

Real and GAN-synthesized faces used in the Berkeley test

The test results were unsatisfactory: the examinees' average performance was 50%, which is close to pure chance. In the second stage of the test, participants were trained and showed a better result of 60%. However, the accuracy of human detection was still low.

Experiment with Synthesized Faces in Adobe Photoshop

Another study involved the popular Adobe Photoshop application, which was used to create a number of fake faces. Additionally, the stimuli were diversified with 50 real photos altered by a professional artist. As in the previous experiments, the results were not in favor of the human participants, who showed only a 53.5% success rate. The authors of the study later proposed the Dilated Residual Network (DRN), which achieved an Average Precision of 99.8%.
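
For reference, Average Precision (the metric behind the DRN's reported 99.8%) summarizes the precision-recall curve of a scoring detector. A minimal sketch using scikit-learn's standard implementation, with made-up scores rather than the study's data:

```python
from sklearn.metrics import average_precision_score

# 1 = manipulated face, 0 = authentic; scores are the detector's confidence
y_true   = [1, 1, 1, 0, 0, 0, 1, 0]
y_scores = [0.95, 0.90, 0.70, 0.40, 0.30, 0.65, 0.85, 0.10]

# AP approximates the area under the precision-recall curve
print(f"AP = {average_precision_score(y_true, y_scores):.3f}")
```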

Experiments Conducted at the School of Psychology, University of Lincoln

The University of Lincoln conducted an experiment in which volunteers were tasked with telling real images apart from facial morphs. Prior to the test, participants received brief training. The results were quite low, with the control group scoring a 56% success rate and the trained group only 52.1%. Three more tests were conducted, also showing low human performance.

Morph examples used during the test

Experiments with Human Crowds, Machines, and Machine-informed Crowds

A study held at MIT showed somewhat more optimistic results regarding human ability to perceive face liveness. In Experiment 1, 82% of examinees managed to outperform the leading deepfake detection model. In Experiment 2, however, only 13-37% of participants were capable of doing the same, showing low repeatability of the results.

Human Performance vs. Model Performance distribution

The study also indicated that, for successful detection, a human observer needs to be informed about the context of the deepfake media. (Political videos are mentioned as one of the best contextual examples.)

FAQ

What factors influence deepfake detection?

There is a group of factors that can help a person to detect deepfakes.

Though machine detection is generally preferable, a human observer can sometimes reveal a deepfake as well. Quality plays a tremendous role: low-quality deepfakes are easily identifiable. Producing a nearly spotless forgery requires a lot of computational power and time, which most impostors cannot access.

Moreover, antispoofing guidelines mention that deepfakes often reveal themselves through visual noise, background pixelation, lip movement inconsistencies, unnatural physiology and body language of the target, and so on. Additionally, real-time deepfake technologies cannot keep up with fast movement in a video and therefore produce warping artifacts.

Are there any experiments that allow people to try deepfake detection?

A number of experiments have been conducted to assess human capability at deepfake detection.

A series of experiments compared human and machine performance at deepfake and liveness detection, among them experiments by ID R&D, UC Berkeley, the University of Lincoln, and MIT. Their goal was to see whether human participants could outperform machine algorithms. Virtually every case showed machine performance to be far superior to human detection.

For example, the ID R&D test revealed an APCER of 30.34% for printed photo detection and 15.04% for display attacks, which is highly unsatisfactory. Interestingly, the crowd wisdom challenge showed somewhat better antispoofing results.

References

  1. American Airlines
  2. Facial recognition scanners are already at some US airports. Here's what to know
  3. Human or machine: AI proves best at spotting biometric attacks
  4. Future Trends in Digital Face Manipulation and Detection
  5. Interpol
  6. Deepfakes ranked as most serious AI crime threat
  7. Face morphing attacks: Investigating detection with humans and computers
  8. Demo of a face liveness detection process
  9. How can you tell if another person, animal or thing is conscious? Try these 3 tests
  10. A poor quality fake produced in Photoshop
  11. A C-LSTM Neural Network for Text Classification
  12. DeepFakes Detection Based on Heart Rate Estimation: Single- and Multi-frame
  13. Face Flashing: a Secure Liveness Detection Protocol based on Light Reflections
  14. A fake face generated with Thispersondoesntexist.com
  15. Real and fake faces mingled for the ID R&D test
  16. Synthetic Faces: how perceptually convincing are they?
  17. Real and GAN-synthesized faces used in the Berkeley test
  18. Detecting Photoshopped Faces by Scripting Photoshop
  19. Deepfake Detection by Human Crowds, Machines, and Machine-informed Crowds