Facial Anti-Spoofing: Types, Countermeasures and Challenges

From Antispoofing Wiki

Facial recognition is one of the earliest techniques used to identify a person with Artificial Intelligence (AI). The first instance dates back to 1964, when Woodrow Bledsoe conducted a series of experiments using a primitive scanner to capture a human face and a computer to identify it.

The first attempt to put facial recognition to practical use came in the early 2000s, when FERET (Face Recognition Technology), developed by NIST, was used in the Face Recognition Vendor Tests. Their goal was to eventually help the government employ this technology for practical applications.

Since then, facial recognition has become part of numerous applications, including banking, border checks, social media, video games, controlled gambling, education, and sporting events. As facial recognition technologies have advanced over the years, so have the tactics and methods impostors use to cheat the recognition systems.

These hacking methods range from the rudimentary to the complex: from simple 2D photos to elaborate 3D silicone masks, which can sometimes even “emulate” the warmth produced by a live human face.

Terminology

Anti-spoofing terminology is chiefly defined by the International Organization for Standardization (ISO).

Commonly used terms include:

  • Spoofing attack. An attempt by an impostor to be identified by a recognition system as someone else, most often for illegal purposes.
  • Anti-spoofing. A set of countermeasures designed to mitigate or prevent spoofing attacks.
  • Presentation Attack (PA). An attack in which a biometric presentation — a target’s photo, a deepfake video, a mask, etc. — is shown to the recognition system.
  • Indirect attack. A method that exploits a technical weakness of the recognition system, such as flawed code, in order to hack it.
  • Presentation Attack Detection (PAD). A technology that scans the biometric signals of a presented object to determine whether it is a legitimate person or an imitation.
  • Presentation attack instrument. An item used to present an attack: a mask, a picture of the target, a pre-recorded video, fake fingerprints, etc.
  • Half-Total Error Rate (HTER). A metric used to assess the performance of a recognition system, calculated from the False Acceptance Rate (FAR) and False Rejection Rate (FRR): HTER = (FAR + FRR) / 2.
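The HTER formula above takes only a few lines to compute; the error counts below are purely illustrative, not real benchmark figures.

```python
def hter(far: float, frr: float) -> float:
    """Half-Total Error Rate: the mean of FAR and FRR."""
    return (far + frr) / 2

# Illustrative counts, not taken from any real evaluation:
far = 10 / 1000   # 10 impostor attempts wrongly accepted out of 1000
frr = 50 / 1000   # 50 genuine attempts wrongly rejected out of 1000
print(hter(far, frr))
```

A low HTER requires both error rates to be low; a system that accepts everyone has FRR = 0 but FAR = 1, and still scores a poor HTER of 0.5.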

Liveness detection and anti-spoofing are relatively new technologies; their terminologies and standards are therefore still in development.

Types of Attacks

Facial recognition attacks encompass a range of techniques that vary in quality and creativity. Starting from relatively humble techniques, impostors can now use complex and advanced technologies to mount attacks. Some common attack types are described below:

Printed Photo


The simplest type of facial spoofing attack involves presenting a printed photo of the targeted person to a facial recognition system. Because the technique is so primitive, it poses little threat: modern security systems can instantly identify a fake image by the lack of depth and other intrinsic features of a live human face.

Example of printed attack
Display Attack


In a display attack, the screen of a mobile device (phone or tablet) is used to reproduce an image of the target — a photo or video — which is then presented to the system's camera. This attack is mostly ineffective against advanced recognition systems.

2D Mask


Fraudsters are also known to use printed 2D masks that imitate a human face. In this attack, the facial proportions and the size of the image are meticulously reproduced to create a realistic likeness of the target.

The attack has a higher chance of success if an original, high-definition photo of the target is used to produce the mask. According to one study, effective masks can be made by printing a photo on copper paper. Paper masks with holes cut in them are also used.

Example of 2D mask attack
3D Mask


A more sophisticated method involves sculpting a realistic-looking 3D mask that mimics the target's face. In 2017, the Vietnamese security company Bkav demonstrated the successful unlocking of an iPhone X using a 3D-printed mask. The mask, constructed from plastic, paper, and silicone at a cost of only $150, successfully tricked the system into believing it was the phone's actual owner.

The mask used by Bkav to unlock iPhone X [19]
Silicone Mask


Another form of attack uses masks made of silicone. Highly elastic silicone masks can achieve a remarkably believable likeness of the target. Cognitive research conducted in 2017 shows that most people have a hard time distinguishing an actual face from a silicone mask.

Moreover, silicone can also simulate the physical properties of actual human skin, making it a common choice for higher-level attacks.


Other research indicates that while making a silicone or latex mask is difficult, expensive, and time-consuming, prices have decreased over time. More presentation attacks involving such masks are therefore expected in the future.

Deepfake


Deepfakes, generated with the help of AI, are artfully doctored videos. A deepfake can “attach” a target’s face to another body, or generate a video clip from a single static image of the target.


As a result, a deepfake is a highly dangerous attack to present to a security system. A deepfake attack is even more effective when the video is coupled with an audio deepfake that imitates the target's voice.

Facial Spoofing Prevention

Modern biometric-based security systems offer a range of solutions to tackle facial spoofing.

Hardware-based solutions

Active flash is one of the most effective countermeasures to facial spoofing. By analyzing how light reflects from the presented object, the system can conclude whether it lacks the shape, depth, and detail of a real human face. This method is based on frequency-range analysis: the difference between a real face and a mask produces a discrepancy at higher frequencies, which the system can detect.
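The frequency-range idea can be sketched with a 2D Fourier transform: compare how much spectral energy sits above some cutoff frequency. This is a minimal illustration assuming the face region is available as a grayscale NumPy array; the cutoff value and any decision threshold are illustrative assumptions, not parameters from a real system.

```python
import numpy as np

def high_freq_energy_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Share of spectral energy above a normalized frequency cutoff."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    # normalized distance of each frequency bin from the DC component
    r = np.sqrt(((yy - h // 2) / h) ** 2 + ((xx - w // 2) / w) ** 2)
    return float(spectrum[r > cutoff].sum() / spectrum.sum())

# toy check: a flat, featureless surface vs. a high-frequency texture
rng = np.random.default_rng(0)
flat = np.ones((64, 64))
textured = rng.standard_normal((64, 64))
print(high_freq_energy_ratio(flat), high_freq_energy_ratio(textured))
```

In a real pipeline, frames captured with and without the active flash would be compared, since the flash is what exposes the reflective difference between skin and print or screen material.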

Another potential anti-spoofing method is using a camera equipped with infrared sensors — they can gather thermal data of the target to analyze its emanation and distribution patterns, which are incredibly hard to falsify.

Pupil dilation can serve as another measure for detecting presentation attacks. Pupils dilate and contract in reaction to changes in lighting. An intentional but safe (non-traumatic) increase or decrease of light by the detection system makes the pupils react, which a 2D camera can then detect, together with blinking.

3D cameras are also a promising solution for facial anti-spoofing. Thanks to their “binocular vision”, they capture a stereo image and can assess whether the presented object has enough depth for a real human face and whether its size is adequate for the target.
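The depth check a 3D camera enables can be illustrated with a toy example: a flat print or screen yields an almost constant depth map over the face region, while a live face spans centimetres of relief. The 20 mm threshold and the synthetic depth maps below are assumed, illustrative values only.

```python
import numpy as np

def has_facial_relief(depth_map_mm: np.ndarray, min_relief_mm: float = 20.0) -> bool:
    """True if the depth map shows enough relief for a real face.

    Percentiles are used instead of min/max to tolerate sensor noise.
    """
    relief = np.percentile(depth_map_mm, 95) - np.percentile(depth_map_mm, 5)
    return bool(relief > min_relief_mm)

# synthetic depth maps: a flat print vs. a face-like surface with relief
flat_print = np.full((64, 64), 400.0)            # everything ~40 cm away
rng = np.random.default_rng(3)
live_face = 400.0 + 40.0 * rng.random((64, 64))  # up to ~4 cm of relief
print(has_facial_relief(flat_print), has_facial_relief(live_face))
```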

LIDAR (light detection and ranging) technology can also be used to detect facial spoofing attacks, measuring distance and depth to assess whether a presented target is fake. LIDAR is implemented in the iPhone 12 to improve the security of its facial unlocking features.


Reto RETO3D Classic — an example of a 3D camera

Software-based solutions

Software-based (SW) solutions use existing camera systems (mobile or web) to detect liveness. Regular cameras provide only RGB texture information, with no infrared, depth-map, or other 3D features. Software-based solutions therefore rely on sophisticated post-processing techniques to produce an accurate liveness decision.

There are two widely used SW-based methods: Active liveness and Passive liveness.

Active liveness methods involve challenge-and-response techniques in which a person is asked to perform certain motions: smile, blink repeatedly, turn the head, raise the eyebrows, close the eyes, etc. By analyzing the movement, the system can determine whether it is a real person. Some active liveness methods reconstruct a 3D face map from a set of images or videos of the target recorded from different angles and distances.
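A blink challenge of this kind is often scored with the eye aspect ratio (EAR), which drops sharply when the eyelids close. The sketch below assumes an upstream landmark detector already supplies six (x, y) points per eye (the layout used by common 68-point landmark models); the coordinates and the 0.2 threshold are illustrative assumptions.

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) for six eye landmarks."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])   # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return float((v1 + v2) / (2.0 * h))

def blinked(ear_sequence, threshold: float = 0.2) -> bool:
    """A 'please blink' challenge passes if EAR dips below the threshold."""
    return any(e < threshold for e in ear_sequence)

# hypothetical landmark coordinates for an open and a closed eye
open_eye = np.array([[0, 0], [2, 2], [4, 2], [6, 0], [4, -2], [2, -2]], float)
closed_eye = np.array([[0, 0], [2, 0.3], [4, 0.3], [6, 0], [4, -0.3], [2, -0.3]], float)
print(eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye))
```

A replayed video can defeat a naive blink check, which is why active methods usually randomize the challenge (which motion, and when, to perform).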

Passive liveness methods are usually divided in two groups:

  • Passive user-experience techniques detect liveness using data from the device's sensors. The detection software requires access to the device in order to control parameters such as the color and brightness of the display, and then analyzes the resulting changes in the color spectrum of the face. Other examples include controlling camera focus to reconstruct a 3D face map.
  • Fully passive techniques are those in which both the user experience and the software integration are passive. These methods rely on texture analysis, optical-flow analysis, etc., applied to a single RGB image, a set of images, or a recorded video.

Texture-based analysis is an effective and widely used passive detection method. It examines the skin texture of the target image or video by analyzing its reflective properties. Since human skin and its fake analogues — especially printed photos — have different surface properties and texture patterns, texture-based analysis can distinguish a false image from an actual face relatively easily.
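A classic texture descriptor used for this purpose is the Local Binary Pattern (LBP): each pixel is encoded by comparing it to its eight neighbours, and the histogram of the resulting codes serves as a feature vector for a real-vs-spoof classifier. Below is a minimal NumPy sketch of the basic 3x3 variant, without the interpolated or multi-scale refinements used in practice.

```python
import numpy as np

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of basic 3x3 LBP codes."""
    h, w = gray.shape
    centre = gray[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # 8 neighbours in clockwise order, each contributing one bit
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (32, 32)).astype(np.uint8)
features = lbp_histogram(img)
print(features.shape)  # (256,)
```

In an anti-spoofing pipeline, such histograms from real and spoof face images would train a conventional classifier (e.g. an SVM), since prints and screens leave characteristic micro-texture patterns.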

Deep convolutional neural network (CNN) techniques are also effective passive anti-spoofing methods. CNNs are specifically designed to analyze visual imagery. They learn known attack patterns, in particular temporal and spatial cues from which aligned feature maps are built; most CNN-based approaches follow this principle.

Another proposed passive tactic integrates anisotropic diffusion. The principle is that light diffuses more slowly when reflected from a flat 2D surface than from a 3D one: a 3D object like a human face allows faster anisotropic diffusion due to its non-uniform surface. Once the system captures images containing diffusion data, they can be “fed” to a specialized convolutional neural network (SCNN) to assess their authenticity.
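The diffusion step itself is commonly implemented as Perona-Malik anisotropic diffusion, which smooths flat regions while preserving edges. The NumPy sketch below shows the iteration; the parameter values (kappa, gamma, iteration count) are illustrative assumptions, not taken from the cited work, and the np.roll boundary handling wraps around for simplicity.

```python
import numpy as np

def anisotropic_diffusion(img, iterations=10, kappa=30.0, gamma=0.2):
    """Perona-Malik diffusion: smooth within regions, preserve edges."""
    u = img.astype(float).copy()
    for _ in range(iterations):
        # finite-difference gradients toward the four neighbours
        n = np.roll(u, -1, axis=0) - u
        s = np.roll(u, 1, axis=0) - u
        e = np.roll(u, -1, axis=1) - u
        w = np.roll(u, 1, axis=1) - u
        # edge-stopping function: diffuse less across strong gradients
        cn, cs = np.exp(-(n / kappa) ** 2), np.exp(-(s / kappa) ** 2)
        ce, cw = np.exp(-(e / kappa) ** 2), np.exp(-(w / kappa) ** 2)
        u += gamma * (cn * n + cs * s + ce * e + cw * w)
    return u

rng = np.random.default_rng(2)
noisy = 128 + 20 * rng.standard_normal((32, 32))
smoothed = anisotropic_diffusion(noisy)
print(noisy.std(), smoothed.std())  # variance shrinks as diffusion smooths
```

How quickly the image smooths under this process is the signal the network consumes: the diffused image (or its difference from the original) differs between flat spoof media and a genuine 3D face.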



Based on the diffusion data extracted from the image, the SCNN decides whether the face belongs to a real human. SCNN-based detection is also applicable to video sequences (to prevent replay attacks).

References

  1. Facial recognition system, Wikipedia
  2. Woody Bledsoe, Wikipedia
  3. Face Recognition Technology (FERET), NIST
  4. Where is facial recognition used? Thales
  5. Hackers just broke the iPhone X's Face ID using a 3D-printed mask, Wired
  6. Information technology — Biometric presentation attack detection — Part 3: Testing and reporting, ISO/IEC 30107-3
  7. Robust 2D/3D face mask presentation attack detection scheme by exploring multiple features and comparison score level fusion, ResearchGate
  8. Materials used to simulate physical properties of human skin, Wiley Online Library
  9. What are deepfakes? TechTalks
  10. What are Deepfakes? Are They a Security Threat? Tessian
  11. An Overview of Face Liveness Detection
  12. Stereo Camera, Wikipedia
  13. Lidar is one of the iPhone and iPad's coolest tricks. Here's what else it can do, Cnet
  14. Face Spoof Attack Recognition Using Discriminative Image Patches
  15. Enhanced Deep Learning Architectures for Face Liveness Detection for Static and Video Sequences
  16. Hyper-realistic face masks: a new challenge in person identification
  17. Face Anti-Spoofing Starter Kit, Medium
  18. Spoofing 2D face recognition systems with 3D masks, IEEE Xplore
  19. How Bkav tricked iPhone X's Face ID with a mask, YouTube