Facial Antispoofing: Types, Countermeasures and Challenges

From Antispoofing Wiki

Problem Overview

Facial recognition is one of the earliest techniques used to identify a person using Artificial Intelligence (AI). The first instance can be dated back to 1964 when Woodrow Bledsoe conducted a series of experiments using a primitive scanner to capture a human face and a computer to identify it.

The first attempt to put facial recognition to practical use came in the early 2000s, when FERET — Face Recognition Technology developed by NIST — was used in the Face Recognition Vendor Tests (FRVT). The program's goal was to eventually help the government employ this technology for practical applications.

Since then, facial recognition has become part of numerous applications such as banking, border checks, social media, video games, controlled gambling, education, sporting events, etc. As facial recognition technologies have advanced over the years, so have the tactics and methods used by impostors to cheat/hack the recognition systems.

These hacking methods range from rudimentary to complex attack types: from employing simple 2D photos to constructing elaborate 3D silicone masks, which can sometimes even "emulate" the warmth produced by a live human face.

Terminology

Anti-spoofing terminology is chiefly defined by the International Organization for Standardization (ISO).

Commonly used terms include:

  • Spoofing attack. An attempt by an impostor to be identified by a recognition system as someone else. The attack is carried out for various, most often illegal, purposes.
  • Anti-spoofing. A set of countermeasures designed to mitigate or prevent spoofing attacks.
  • Presentation Attack (PA). An attack in which a biometric presentation — a target's photo, deepfake video, mask, etc. — is presented to the recognition system.
  • Indirect attack. A method that exploits a technical weakness of the recognition system, such as flawed code, in order to hack it.
  • Presentation Attack Detection (PAD). A technology that scans the biometric signals of a presented object to determine whether it is a legitimate person or an imitation.
  • Presentation attack instrument. An item used to carry out a presentation attack: a mask, a picture of the target, a pre-recorded video, fake fingerprints, etc.
  • Half-Total Error Rate (HTER). A metric used to assess the performance of a recognition system. HTER is the mean of the False Acceptance Rate (FAR) and the False Rejection Rate (FRR): HTER = (FAR + FRR) / 2.
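The HTER formula above can be sketched in a few lines. This is a minimal illustration; the threshold convention (accept when the score is at or above it) and the sample data are assumptions, not part of any standard:

```python
def error_rates(scores, labels, threshold):
    """Compute FAR and FRR from comparison scores.

    labels: True for genuine attempts, False for impostor attempts.
    A score at or above the threshold means the system accepts."""
    genuine = [s for s, ok in zip(scores, labels) if ok]
    impostor = [s for s, ok in zip(scores, labels) if not ok]
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors wrongly accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users wrongly rejected
    return far, frr

def hter(far, frr):
    """Half-Total Error Rate: the mean of FAR and FRR."""
    return (far + frr) / 2.0
```

Because HTER averages the two error types, it penalizes a system that trades one kind of mistake for the other by moving its threshold.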

Liveness detection and anti-spoofing are relatively new technologies, so their terminology and standards are still in development.

Types of Attacks

Facial attacks can be classified as Presentation Attacks and Injection Attacks.

Presentation attacks

Presentation Attacks (PAs) involve manipulating or falsifying existing biometric features and then presenting them physically to a biometric system's sensor — in this case, a camera. Not every PA is adversarial in nature: for example, a face recognition system may falsely reject a subject who has undergone plastic surgery, applies cosmetics, etc.



Malicious facial attacks, in turn, pursue nefarious goals and can be divided into:

  • Impostor Attacks. These focus on impersonating the target's facial parameters or synthesizing them entirely from scratch.
  • Concealment Attacks. These are used to camouflage the attacker's true identity to avoid identification.



Facial PAs encompass a range of techniques that vary in quality and creativity. Impostors are now able to use complex and advanced technologies to bypass facial recognition. Some of the common attack types and tools are described below:

Printed Photo


The simplest type of facial spoofing attack involves presenting a printed photo of the targeted person to a facial recognition system. Due to its crudeness, however, the method poses a low threat.

An example of a printed photo attack
Display Attack


In a display attack, the screen of a mobile device is used to reproduce an image or video of the target and present it to the system's camera.

2D mask


Fraudsters also use printed 2D masks, at times with eye or mouth holes cut out, that imitate a human face. In this attack, the facial proportions and image size are meticulously reproduced to create a realistic facial image.

An example of a 2D mask attack
3D mask


A more sophisticated method involves sculpting a realistic-looking 3D mask that mimics the face of the target. In 2017, the Vietnamese security company Bkav demonstrated the successful unlocking of an iPhone X using a 3D-printed mask.

The mask used by Bkav to unlock the iPhone X [19]
Silicone mask


Another form of attack uses masks made of silicone. Thanks to their elasticity, such masks can produce a highly believable likeness of a target.

Deepfake


Deepfakes are artfully doctored videos generated with the help of AI. A deepfake can "attach" a target's face to another body, generate a video clip from a single static image, or mimic a voice.

Pose/expression variation


Additionally, to conceal their identity, an attacker can depart from a neutral facial expression or change their head pose.



PAs belong to the analogue domain — they happen outside the solution's operating system and memory, even though they target the system's algorithms.

Injection attacks

As opposed to PAs, injection attacks happen in the digital domain — they are executed to intercept bona fide biometric data and replace it with a forgery inside the system. They come in software-based and hardware-based variants.

The software-based variant implies that a target's device is infected with a malicious application that grants access to its memory, internal communication channels, biometric templates, etc. The attacker can then modify genuine data, for example by manipulating the feature extraction module.

The hardware-based variant requires a module that converts an HDMI stream to MIPI CSI and an LCD controller board. This contraption can replace the genuine video stream with a fake one coming from another device. The hardware-based approach is more sinister, as it offers lower latency and removes other cues that could expose the interference.


Facial Spoofing Prevention

Modern biometric-based security systems offer a range of solutions to tackle facial spoofing.

Hardware-based solutions

Active flash is one of the most effective countermeasures against facial spoofing. By analyzing how light reflects from an object, the system can conclude whether it lacks the shape, depth and detail of a real human face. The method exploits the difference between a real face and a mask, which shows up as a discrepancy at higher spatial frequencies that the system can detect.
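As a rough illustration of the frequency cue, one can compare the fraction of spectral energy above a cutoff radius between captures: flat spoof media such as prints tend to carry less high-frequency detail than a live face lit by the flash. This is a hand-rolled sketch, not a production detector, and the cutoff value is an arbitrary assumption:

```python
import numpy as np

def high_frequency_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a cutoff radius (in normalized
    frequency units). Lower values suggest an image dominated by coarse,
    low-frequency content, as is typical of flat spoof media."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Radial distance of each bin from the spectrum centre, normalized
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    return float(spectrum[r > cutoff].sum() / spectrum.sum())
```

In practice the decision threshold for such a ratio would have to be chosen empirically per camera and illumination setup.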

Another potential anti-spoofing method is using a camera equipped with infrared sensors, which gather thermal data from the target to analyze its heat emission and distribution patterns — signals that are incredibly hard to falsify.

Pupil dilation can be used as another measure to detect PAs. Pupils dilate and constrict in reaction to changes in lighting. An intentional but safe (non-traumatic) increase or decrease of light by the detection system makes the pupils react, which can be detected by a 2D camera, along with blinking.

3D cameras are also a promising solution for facial anti-spoofing. Thanks to their "binocular vision", 3D cameras can capture a stereo image, assess whether the presented object has the depth of a real human face, and check whether its size is adequate for the target.

LIDAR (light detection and ranging) technology can also be used to detect facial spoofing attacks. The technology measures distance and depth to assess whether a presented target is fake. LIDAR is implemented in the iPhone 12 to improve the security of its facial unlocking features.


Reto RETO3D Classic — an example of a 3D camera

Software-based solutions

Software-based (SW) solutions utilize existing camera systems to detect liveness. Regular cameras provide only RGB texture information and do not support infrared, depth map or other 3D features. Therefore, software-based solutions involve complicated post-processing techniques to provide an accurate liveness decision.

There are two widely used SW-based methods: Active liveness and Passive liveness.

Active liveness methods involve challenge-and-response techniques in which a person is asked to perform a task: smile, blink repeatedly, turn their head, close their eyes, etc. By analyzing the movement, the system can identify whether it is a real person or not. Some active liveness methods reconstruct a 3D map using footage of the target recorded from different angles and distances.
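Blink challenges of this kind are commonly checked with the eye aspect ratio (EAR) computed over eye landmarks. The sketch below assumes a separate landmark detector (e.g., dlib or MediaPipe) supplies six points per eye, and the 0.2 blink threshold is an illustrative assumption, not a universal constant:

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """Eye Aspect Ratio from six 2D eye landmarks ordered as in the
    common 68-point scheme: horizontal corners at indices 0 and 3,
    vertical pairs at (1, 5) and (2, 4). EAR drops sharply when the
    eye closes, making blinks easy to spot in a frame sequence."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_series, threshold=0.2):
    """Count downward crossings of the EAR below the blink threshold."""
    blinks, below = 0, False
    for ear in ear_series:
        if ear < threshold and not below:
            blinks += 1
            below = True
        elif ear >= threshold:
            below = False
    return blinks
```

A liveness check can then compare the number of blinks detected against the number the challenge requested within the allotted time window.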

Passive liveness includes mostly software-based techniques that function unintrusively in the background. They are usually divided into two groups:

  • Passive techniques detect liveness using data from the device sensors. In this case, the detection software requires access to the device sensors so that it can control parameters such as the color and brightness of the display and analyze the resulting changes in the color spectrum of the face. Another example is controlling the camera focus to reconstruct a 3D face map.
  • Fully passive techniques are those in which both the user experience and the software integration are passive. These methods are based on texture analysis or optical-flow analysis of a single RGB image, a set of images, or a recorded video.

Texture-based analysis is an effective and widely used method of passive detection. It examines the skin texture in the target image or video by analyzing its reflective properties. Since human skin and its fake analogues — especially printed photos — have different surface properties and texture patterns, texture-based analysis can differentiate a false image from an actual face relatively easily.
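A classic texture descriptor used in face anti-spoofing is the Local Binary Pattern (LBP). The sketch below computes a basic 3x3 LBP histogram in plain NumPy; in a real pipeline such histograms would be fed to a trained classifier (e.g., an SVM), which is outside the scope of this illustration:

```python
import numpy as np

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Basic 3x3 Local Binary Pattern histogram. Each interior pixel is
    encoded by comparing its 8 neighbours with the centre value; the
    normalized 256-bin histogram of codes summarizes the micro-texture,
    which differs between live skin and print/replay media."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Neighbour offsets in clockwise order, each contributing one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = gray[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neigh >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()
```

Because the histogram depends only on local intensity orderings, it is fairly robust to uniform illumination changes, which is one reason LBP variants became a staple of early anti-spoofing work.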

Deep convolutional neural networks (CNNs) are also effective at passive anti-spoofing. CNNs are designed specifically for examining visual imagery. They learn known attack patterns from spatial and temporal features, from which aligned feature maps are built, and most CNN-based approaches rest on these principles.

Another proposed passive tactic integrates anisotropic diffusion. The principle is that image intensity diffuses more slowly across a flat 2D surface than across a genuinely 3D one: a 3D object such as a human face allows faster anisotropic diffusion due to its non-uniform surface, which a Specialized Convolutional Neural Network (SCNN) can detect. The architecture can also be applied to video sequences to prevent replay attacks.
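The diffusion step itself can be sketched with a minimal Perona-Malik filter. This is the classical anisotropic diffusion scheme, not the SCNN described above, and the kappa and step values are illustrative assumptions:

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=10, kappa=0.5, step=0.2):
    """Minimal Perona-Malik anisotropic diffusion. The conduction
    coefficient shrinks near strong gradients, so smoothing is fast in
    flat regions and slow across edges — the behaviour whose difference
    between 2D media and 3D faces the SCNN approach exploits."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite differences toward the four neighbours (periodic border)
        n = np.roll(u, -1, axis=0) - u
        s = np.roll(u, 1, axis=0) - u
        e = np.roll(u, -1, axis=1) - u
        w = np.roll(u, 1, axis=1) - u
        # Edge-stopping conduction coefficients (Perona-Malik g1)
        cn = np.exp(-(n / kappa) ** 2)
        cs = np.exp(-(s / kappa) ** 2)
        ce = np.exp(-(e / kappa) ** 2)
        cw = np.exp(-(w / kappa) ** 2)
        u += step * (cn * n + cs * s + ce * e + cw * w)
    return u
```

A detector in this family would diffuse the input for a fixed number of iterations and pass the diffused image (or the difference from the original) to the network for classification.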


References

  1. Facial recognition system, Wikipedia
  2. Woody Bledsoe, Wikipedia
  3. Face Recognition Technology (FERET), NIST
  4. Where is facial recognition used? Thales
  5. Hackers just broke the iPhone X's Face ID using a 3D-printed mask, Wired
  6. Information technology — Biometric presentation attack detection — Part 3: Testing and reporting
  7. A Survey in Presentation Attack and Presentation Attack Detection
  8. ID.me gathers lots of data besides face scans, including locations. Scammers still have found a way around it
  9. What are deepfakes? TechTalks
  10. Presentation Attack Detection — ISO/IEC 30107
  11. The man in the latex mask: BLACK serial armed robber disguised himself as a WHITE man to rob betting shops
  12. Analogue domain
  13. TC358779XBG Peripheral Datasheet PDF
  14. An Overview Of Face Liveness Detection
  15. Stereo Camera, Wikipedia
  16. Lidar is one of the iPhone and iPad's coolest tricks. Here's what else it can do, Cnet
  17. Face Spoof Attack Recognition Using Discriminative Image Patches
  18. Enhanced Deep Learning Architectures for Face Liveness Detection for Static and Video Sequences
  19. Deepfake de Tom Cruise: pas pour le premier venu (Tom Cruise deepfake: not for just anyone)
  20. Máscara De Homem Velho Látex Realista De Halloween (Realistic latex old-man Halloween mask)
  21. Face recognition - presentation attack detection