Facial Antispoofing: Types, Countermeasures and Challenges

From Antispoofing Wiki

Problem Overview

Facial recognition is one of the earliest techniques used to identify a person using Artificial Intelligence (AI). The first instance dates back to 1964, when Woodrow Bledsoe conducted a series of experiments using a primitive scanner to capture a human face and a computer to identify it.

The first attempt to put facial recognition to practical use was made in the early 2000s, when FERET (Face Recognition Technology), developed by NIST, was used in the Face Recognition Vendor Tests. The goal was to eventually help the government employ this technology in practical applications.

Since then, facial recognition has become part of numerous applications such as banking, border checks, social media, video games, controlled gambling, education and sporting events. As facial recognition technologies have advanced over the years, so have the tactics and methods used by impostors to defeat recognition systems.

These hacking methods range from rudimentary to complex attack types: from employing simple 2D photos to constructing elaborate 3D silicone masks, which can sometimes even "emulate" the warmth produced by a live human face.

Terminology

Anti-spoofing terminology is largely defined by the International Organization for Standardization (ISO), notably in the ISO/IEC 30107 standard on presentation attack detection.

Commonly used terms include:

  • Spoofing attack. An attempt by an impostor to be identified by a recognition system as someone else. Such attacks are carried out for various purposes, most often illegal ones.
  • Anti-spoofing. A set of countermeasures designed to mitigate or prevent spoofing attacks.
  • Presentation Attack (PA). An attack in which a falsified biometric presentation (a target's photo, deepfake video, mask, etc.) is presented to the recognition system.
  • Indirect attack. A method that exploits a technical weakness of the recognition system, such as flawed code, in order to compromise it.
  • Presentation Attack Detection (PAD). A technology that analyzes the biometric signals of a presented object to determine whether it is a legitimate live person or an imitation.
  • Presentation attack instrument. An item used to carry out a presentation attack: a mask, a picture of the target, a pre-recorded video, fake fingerprints, etc.
  • Half-Total Error Rate (HTER). A metric used to assess the performance of a recognition system. It is the average of the False Acceptance Rate (FAR) and the False Rejection Rate (FRR): HTER = (FAR + FRR) / 2.
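
The HTER formula above can be illustrated with a short sketch. The scores, the threshold and the function names below are made up for illustration and do not come from any particular system; the sketch assumes that higher scores mean "more likely genuine" and that a presentation is accepted when its score reaches the threshold.

```python
def error_rates(genuine_scores, attack_scores, threshold):
    """Return (FAR, FRR), accepting any score >= threshold."""
    # An attack that reaches the threshold is a false acceptance.
    false_accepts = sum(1 for s in attack_scores if s >= threshold)
    # A genuine presentation below the threshold is a false rejection.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(attack_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


def hter(far, frr):
    """Half-Total Error Rate: the mean of FAR and FRR."""
    return (far + frr) / 2


# Hypothetical scores, invented for this example only.
genuine = [0.91, 0.85, 0.40, 0.77]  # bona fide presentations
attacks = [0.10, 0.65, 0.30, 0.20]  # spoofing attempts

far, frr = error_rates(genuine, attacks, threshold=0.5)
print(far, frr, hter(far, frr))  # prints: 0.25 0.25 0.25
```

Because FAR and FRR both depend on the chosen threshold, an HTER value is only meaningful together with the threshold at which it was measured.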

Liveness detection and anti-spoofing are relatively new technologies; therefore, their terminology and standards are still in development.

Types of Attacks

Facial attacks can be classified as Presentation Attacks and Injection Attacks.

Presentation attacks

Presentation Attacks (PAs) involve manipulating or falsifying biometric features and then presenting them physically to a biometric system's sensor, in this case a camera. Not every PA is adversarial in nature. For example, a face recognition system can produce a false rejection if a subject has undergone plastic surgery or is wearing cosmetics.



Malicious facial attacks, in turn, can be divided into:

  • Impostor Attacks. They focus on impersonating the target's facial parameters or synthesizing them completely from scratch.
  • Concealment Attacks. These are used to hide the attacker's true identity and avoid identification.



Facial PAs encompass a range of techniques that vary in quality and creativity. Impostors are now able to use complex and advanced technologies to bypass facial recognition. Some of the common attack types and tools are listed below:

  • Printed photo. The simplest type of facial spoofing attack involves presenting a printed photo of the targeted person to a facial recognition system. Because the method is so primitive, it poses a low threat. (Figure: an example of a printed photo attack.)
  • Display attack. The screen of a mobile device is used to show an image or video of the target to the system's camera.
  • 2D mask. Fraudsters also use printed 2D masks, at times with holes cut in them, which imitate a human face. The facial proportions and image size are meticulously reproduced to create a realistic facial image. (Figure: an example of a 2D mask attack.)
  • 3D mask. A more sophisticated method involves sculpting a realistic-looking 3D mask that mimics the face of the target. In 2017, the Vietnamese security company Bkav demonstrated the successful unlocking of an iPhone X using a 3D-printed mask. (Figure: the mask used by Bkav to unlock an iPhone X [19].)
  • Silicone mask. Another form of attack uses masks made of silicone. Thanks to their enhanced elasticity, such masks can achieve a highly believable likeness of a target.
  • Deepfake. Deepfakes are artfully doctored videos generated with the help of AI. A deepfake can "attach" a target's face to another body, generate a video clip from a static image, or mimic a voice.
  • Pose/expression variation. To conceal their identity, an attacker can depart from a neutral facial expression or change their head pose.



PAs belong to the analogue domain: they happen outside the solution's operating system and memory, even though they target the system's algorithms.

Injection attacks

As opposed to PAs, injection attacks happen in the digital domain: they intercept bona fide biometric data inside the system and replace it with a forgery. They come in software-based and hardware-based types.

The software type implies that a target's device is infected with a malicious application that grants access to its memory, internal communication channels, biometric templates, etc. The attacker can then modify the genuine data, for example by manipulating the feature extraction module.

The hardware type requires a module that converts an HDMI stream to MIPI CSI, together with an LCD controller board. This setup can replace the genuine video stream with a fake one coming from another device. The hardware-based approach is more dangerous, as it offers lower latency and removes other cues that could expose the interference.


Facial Spoofing Prevention

Modern biometric-based security systems offer a range of solutions to tackle facial spoofing.

Hardware-based solutions

Active flash is one of the most effective countermeasures against facial spoofing. By analyzing how light reflects from an object, the system can determine whether it lacks the shape, depth and detail of a real human face. The method relies on differences between a real face and a mask that appear at higher frequencies and can be detected by the system.

Another potential anti-spoofing method uses a camera equipped with infrared sensors. These gather thermal data from the target and analyze how heat is emitted and distributed across the face, patterns that are very hard to falsify.

Pupil dilation can serve as another signal for detecting PAs. Pupils dilate and contract in reaction to changes in lighting. A deliberate but safe (nontraumatic) increase or decrease of light by the detection system will make the pupils react, and a 2D camera can detect this reaction together with blinking.

3D cameras are also a promising solution for facial anti-spoofing. Thanks to their "binocular vision", they capture a stereo image and can assess whether the presented object has enough depth for a real human face and whether its size is plausible for the target.
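
As a rough illustration of the depth check, the sketch below uses the standard pinhole stereo relation, depth = focal length × baseline / disparity, to convert disparities measured at a few facial landmarks into distances and then tests whether they span enough depth. The function names, the camera parameters and the 2 cm relief threshold are assumptions chosen for the example, not values taken from any real product.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation: depth = focal length * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def has_facial_relief(disparities_px, focal_px, baseline_m, min_relief_m=0.02):
    """True if the depth spread across the landmarks reaches min_relief_m."""
    depths = [depth_from_disparity(d, focal_px, baseline_m) for d in disparities_px]
    return (max(depths) - min(depths)) >= min_relief_m


# Hypothetical camera: 800 px focal length, 6 cm baseline.
FOCAL_PX, BASELINE_M = 800, 0.06

# Hypothetical disparities (in pixels) at the nose tip, a cheek and an ear.
live_face = [120.0, 112.0, 107.0]    # nose is clearly closer than the ear
flat_photo = [120.0, 119.5, 120.2]   # all landmarks at nearly the same depth

print(has_facial_relief(live_face, FOCAL_PX, BASELINE_M))   # True
print(has_facial_relief(flat_photo, FOCAL_PX, BASELINE_M))  # False
```

A check this simple would be fooled by a photo that is bent or tilted towards the camera, so it is best read as one signal among several rather than a complete detector.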

LIDAR (light detection and ranging) can also be used to detect facial spoofing attacks. The technology measures distance and depth to assess whether a presented target is fake. A LIDAR scanner is built into the iPhone 12 Pro models, although it sits next to the rear camera, while facial unlocking relies on the separate front-facing TrueDepth sensor.


Reto RETO3D Classic — an example of a 3D camera

Software-based solutions

Software-based (SW) solutions utilize existing camera systems to detect liveness. Regular cameras provide only RGB texture information and do not support infrared, depth map or other 3D features. Therefore, software-based solutions involve complicated post-processing techniques to provide an accurate liveness decision.

There are two widely used SW-based methods: Active liveness and Passive liveness.

Active liveness methods involve challenge-and-response techniques in which a person is asked to perform a task: smile, blink repeatedly, turn their head, close their eyes, etc. By analyzing the movement, the system can determine whether it is dealing with a real person. Some active liveness methods reconstruct a 3D map using footage of the target recorded from different angles and distances.
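
The challenge-and-response idea can be sketched as follows. This is a minimal illustration, not a description of any specific product: the challenge names and both functions are hypothetical, and the list of observed actions is assumed to come from a separate video-analysis component that is not shown.

```python
import secrets

# Hypothetical set of actions the system may ask for.
CHALLENGES = ("smile", "blink", "turn_head_left", "turn_head_right", "close_eyes")


def issue_challenges(count=2):
    """Pick `count` distinct challenges at random, so a pre-recorded video is unlikely to match."""
    if count > len(CHALLENGES):
        raise ValueError("not enough distinct challenges")
    pool = list(CHALLENGES)
    chosen = []
    for _ in range(count):
        # secrets gives unpredictable choices, unlike a seeded pseudo-random generator.
        chosen.append(pool.pop(secrets.randbelow(len(pool))))
    return chosen


def verify_response(expected, observed):
    """Pass only if the observed actions match the issued challenges, in the same order."""
    return list(observed) == list(expected)


challenges = issue_challenges(2)
# In a real system `observed` would be produced by analyzing the camera footage.
observed = list(challenges)
print(verify_response(challenges, observed))  # True
```

Drawing the challenges at random for every session is what makes a replayed recording ineffective: the attacker cannot know in advance which actions will be requested or in which order.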

Passive liveness includes mostly software-based techniques that run in the background without intruding on the user. They are usually divided into two groups:

  • Passive techniques detect liveness using data from the device sensors. In this case, the detection software requires access to the device sensors to control parameters such as the color or brightness of the display and to analyze the resulting changes in the color spectrum of the face. Another example is controlling the camera focus to reconstruct a 3D face map.
  • Fully passive techniques are those in which both the user experience and the software integration are passive. These methods are based on texture analysis or optical flow analysis of a single RGB image, a set of images or a recorded video.

Texture-based analysis is an effective and widely used method of passive detection. It examines the skin texture in the target image or video by analyzing its reflective properties. Since human skin and its imitations (especially printed photos) have different surface properties and texture patterns, telling a fake image from an actual face with texture-based analysis is relatively easy.
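
The text does not name a specific texture descriptor. As one commonly used example, the sketch below computes a Local Binary Pattern (LBP) histogram for a grayscale image given as a list of rows. The function names are illustrative, and the classifier that would compare such histograms for live faces and spoofs is assumed and not shown.

```python
def lbp_code(img, r, c):
    """8-neighbour Local Binary Pattern code for the pixel at (r, c)."""
    center = img[r][c]
    # Neighbours visited clockwise, starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # Set the bit when the neighbour is at least as bright as the centre.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    rows, cols = len(img), len(img[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist]


# A perfectly uniform patch: every neighbour equals the centre, so every code is 255.
flat_patch = [[10] * 4 for _ in range(4)]
print(lbp_histogram(flat_patch)[255])  # 1.0
```

The histogram summarizes how often each local brightness pattern occurs, which is the kind of texture signature in which printed or displayed faces tend to differ from real skin.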

Deep convolutional neural networks (CNNs) are also effective at passive anti-spoofing. CNNs are specifically designed to analyze visual images. They work by learning known attack patterns, in particular spatial and temporal cues, from which aligned feature maps are built. Most CNN-based approaches follow this principle.

Another proposed passive tactic relies on anisotropic diffusion. The underlying principle is that light diffuses more slowly when reflected from a 2D surface than from a 3D one. A 3D object, such as a human face, allows faster anisotropic diffusion due to its nonuniform surface, and this difference is detected by a Specialized Convolutional Neural Network (SCNN). The same architecture can be applied to video sequences to prevent replay attacks.


FAQ

What is image spoofing?

Image spoofing is a presentation attack type which utilizes a 2D image of a targeted person.

Image spoofing is the most commonly used type of spoofing attack. In this attack, malicious actors print a 2D photo of the target and then present it to the sensors of a recognition system (hence the name "presentation attack"). This attack is popular because it is cheap, quick and easy to execute.

The simplicity of image spoofing also makes it easy to detect. Therefore, it is not successful against most advanced recognition systems. However, it can still succeed against relatively inexpensive smartphones, whose manufacturers do not invest heavily in security tools. Unlike pricier devices, these phones do not have sophisticated equipment to differentiate a 2D object (a presented image) from a 3D one (a live face).

What is CNN (Convolutional Neural Network)?

A convolutional neural network is a type of artificial neural network (ANN) used extensively in biometric security.

Convolutional Neural Networks (CNNs) are a subtype of Artificial Neural Networks (ANNs). Their name comes from the specific architecture, which involves three elements:

  • Convolutional layer. This is where the main computation occurs. A filter is applied to the input data to produce a feature map.
  • Pooling layer. It is responsible for dimensionality reduction and applies an aggregation function to the values in the receptive field. There are two types of pooling: max and average.
  • Fully connected layer. It combines the extracted features and typically uses a softmax activation function to classify the input.
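
The three building blocks can be illustrated with a minimal pure-Python sketch. The function names are illustrative and the numbers are toy values; a real CNN would use a deep learning framework, learned filter weights and a weighted fully connected layer in front of the softmax.

```python
import math


def conv2d(image, kernel):
    """Slide a filter over the image (no padding, stride 1) to produce a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]            # toy filter, not a learned one
feature_map = conv2d(image, kernel)  # [[6, 8], [12, 14]]
pooled = max_pool(feature_map)       # [[14]]
print(feature_map, pooled, softmax([0.0, 0.0]))  # last value: [0.5, 0.5]
```

In an anti-spoofing setting the two softmax outputs could stand for the classes "live" and "spoof", which is an assumption made here for illustration.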

CNNs show highly accurate results in image and video recognition as well as in voice and facial liveness detection. They are therefore widely employed in antispoofing.

What is 3D face recognition?

3D face recognition is considered the most accurate method in facial recognition technology.

3D recognition is a technique employed as part of facial antispoofing measures. It involves a stereoscopic 3D camera that is capable of range imaging. This feature is critical in analyzing the reflection of light from a presented face. As a result, the system can detect the geometry of rigid facial features. If an object presented as a human face lacks the depth, shape and anatomical detail of a real face, the system will reject it as a fake. An alternative 3D face recognition technique works with non-specialized cameras. This approach requires the user to perform head movements so that their image can be captured from different angles.

What is 2D face recognition?

2D face recognition is a comparatively simple method of identifying a human face.

2D face recognition is less complicated than 3D recognition. It does not involve sophisticated equipment and techniques such as 3D stereoscopic and infrared cameras, flood illuminators, sensors and in-depth facial maps. 2D facial recognition simply compares a stored image of a user with the face presented to its cameras. This technique is especially popular as part of the "Phone Unlock" feature in less expensive phone models. Although it is simpler, cheaper and more accessible, 2D face recognition is also more vulnerable to threats. It can fail to detect a presentation attack in the form of a printed photo. Therefore, liveness detection modules are commonly used in conjunction with 2D face recognition for more accurate results.

Is 3D face recognition better than 2D?

3D face recognition is widely considered to be superior to 2D in terms of accuracy. 2D face recognition is better from the point of view of hardware requirements.

Face recognition systems that use tools designed for identifying a 3D object show higher accuracy. A 3D camera is a popular type of equipment used for 3D face recognition. Using stereoscopic vision, a 3D camera can tell whether a presented face lacks the depth and volume intrinsic to a live human face or head. Infrared sensors are another vital component used in 3D face recognition. They detect the warmth emitted by the target's face and can therefore differentiate between a lifeless photo or mask and a real face.

A 3D recognition system also analyzes light distribution on the facial surface as well as the frequency spectrum. Because it usually combines face recognition with 3D liveness detection, a 3D facial recognition system is extremely hard to trick. 2D face recognition, on the other hand, does not require any specific hardware and, in conjunction with 2D liveness detection, can be used on a much broader range of devices.

How does liveness work with face biometrics?

Liveness detection uses data provided by facial biometrics to prevent presentation attacks.

Biometric recognition systems use liveness detection techniques to determine whether a presented person is real. Liveness detection relies on multiple features and behaviors of the human face, including pupil dilation, skin texture and temperature, face coloring, lip movement, breathing patterns and blinking. Using this data, it employs techniques such as optical flow analysis, residual neural networks, Fourier spectral analysis and stereoscopic cameras. Based on the results, a security system can grant or deny authorization.

References

  1. Facial recognition system, Wikipedia
  2. Woody Bledsoe, Wikipedia
  3. Face Recognition Technology (FERET), NIST
  4. Where is facial recognition used? Thales
  5. Hackers just broke the iPhone X's Face ID using a 3D-printed mask, Wired
  6. Information technology — Biometric presentation attack detection — Part 3: Testing and reporting
  7. A Survey in Presentation Attack and Presentation Attack Detection
  8. ID.me gathers lots of data besides face scans, including locations. Scammers still have found a way around it
  9. What are deepfakes? TechTalks
  10. Presentation Attack Detection — ISO/IEC 30107
  11. The man in the latex mask: BLACK serial armed robber disguised himself as a WHITE man to rob betting shops
  12. Analogue domain
  13. TC358779XBG Peripheral Datasheet PDF
  14. An Overview Of Face Liveness Detection
  15. Stereo Camera, Wikipedia
  16. Lidar is one of the iPhone and iPad's coolest tricks. Here's what else it can do, Cnet
  17. Face Spoof Attack Recognition Using Discriminative Image Patches
  18. Enhanced Deep Learning Architectures for Face Liveness Detection for Static and Video Sequences
  19. Deepfake de Tom Cruise: pas pour le premier venu
  20. Máscara De Homem Velho Látex Realista De Halloween
  21. Face recognition - presentation attack detection
  22. Enhanced Deep Learning Architectures for Face Liveness Detection for Static and Video Sequences