Deepfake Attacks in Remote Identification and Countermeasures
While remote identification is a convenient way to verify a person digitally and mitigate online fraud, it remains vulnerable to certain attack types.
Definition & Problem Overview
Remote identity proofing (RIDP) is the procedure of collecting and processing information about someone’s identity online. It’s vital for digital onboarding, Know Your Client compliance, and so on. Among other things, it depends heavily on analyzing biometric data: facial features, fingerprints, and liveness parameters.
At the same time, RIDP suffers from a number of vulnerabilities. The biggest threat it faces is the Presentation Attack (PA). This type of attack relies on specific tools capable of imitating a target person: they are literally presented to the sensors of a verification system.
PAs involve a wide range of Presentation Attack Instruments (PAIs): silicone masks, printed cutouts, replayed videos, as well as digital deepfake manipulations. Therefore, it’s essential to provide fast and highly accurate liveness detection for an RIDP system.
According to a report by the European Union Agency for Cybersecurity (ENISA), deepfakes and replay attacks made with high-definition screens represent the biggest threat to modern RIDP solutions.
According to a survey conducted by iProov, 51.9% of the respondents are worried that deepfakes will be used to steal their identity and fraudulently set up credit cards in their name. At the same time, 81.3% of the respondents believe that biometrics will be used for identity proofing in the future.
ENISA isn’t the only official body concerned with identity theft performed with fabricated media. In 2021, the U.S. Department of Homeland Security released a detailed report on existing deepfake and cheapfake technologies, as well as the threats they pose.
Attack Types Aimed at RIDP
ENISA’s report outlines 4 primary attack types that can compromise remote identity proofing.
Photo attacks
Possibly the most primitive technique, a photo attack means that a photo of the target person (printed or shown on a high-resolution screen) is presented to the camera of an RIDP system. It’s relatively easy to spot, as photos lack the necessary properties of a living person’s face: depth, natural shadows, skin texture, retina light reflection, etc.
Video replay attacks
A video featuring either the attacker’s or the target’s face is replayed to the RIDP system on a high-resolution screen. The video can be stolen, pre-recorded, or synthesized with deepfake tools.
3D mask attacks
The 3D masks used for attacks range in quality from cheap-to-make cutouts to elaborate examples produced with a 3D printer. One Japanese company, for example, offers a highly realistic mask for less than $1,000.
Deepfake attacks
According to a report, the number of deepfakes is growing almost exponentially, doubling roughly every six months. Considering that some deepfake tools are easy to obtain, it’s hard to estimate how many synthetic videos exist at the moment.
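As a rough illustration of what doubling every six months implies, the sketch below projects a count forward in time. The starting figure of 1,000 is a hypothetical placeholder, not a number from the report, and the function name is an assumption made for this example.

```python
def projected_count(initial: float, months: float, doubling_months: float = 6.0) -> float:
    """Project a quantity that doubles every `doubling_months` months."""
    return initial * 2 ** (months / doubling_months)


# Hypothetical baseline of 1,000 items (illustrative only, not a reported figure).
for months in (6, 12, 24):
    print(months, projected_count(1_000, months))
```

Under this assumption the count grows fourfold in one year and sixteenfold in two, which is why a point-in-time estimate goes stale quickly.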
Typically, all deepfake attacks follow the same scenario:
- Harvesting. In this phase, source images or videos of a target victim are collected. Mostly, they are harvested from publicly accessible platforms: Instagram, Facebook, Telegram, YouTube, LinkedIn, and others.
- Training. The obtained visual samples (dataset) are "fed" to specialized software. This can be a simple application like Reface or a sophisticated neural network that requires specific skills and knowledge to operate. Upon "feeding", these tools learn the fundamental traits of a target person’s appearance: facial features, expressions, mannerisms, and so on.
- Altering. The original images will be processed with various digital manipulation techniques: identity swapping, face morphing, attribute manipulation, etc.
- Attack. Finally, fraudsters have two ways of carrying out an attack. The first one is a basic PA, in which synthetic media is presented to an evidence-based proofing system. The second one involves injecting the fabricated video directly into the camera’s stream.
It’s worth noting that while the second attack method requires extra effort (popularly referred to as "hacking"), it also gives the fraudsters more room to maneuver. For example, they can employ a puppeteering technique in real time, which allows fooling a challenge-based proofing system that prompts a user to blink, nod, or smile to get verified.
Face2Face, an application presented in 2016 by researchers from the University of Erlangen-Nuremberg, the Max Planck Institute for Informatics, and Stanford University, is a prominent example of such a tool. It tracks the facial expressions of a source actor with a model-based approach and allows a "facial reenactment of a monocular video sequence". As a result, the tool can re-render a photorealistic output video with real-time face alterations such as lip-syncing.
Nonetheless, video deepfakes aren’t the only relevant attack instrument. Audio deepfakes appear to be even more successful at achieving criminal goals. The first known case of an audio deepfake successfully deceiving a human operator took place in 2019, when £200,000 was ordered to be wired to a fraudulent bank account.
Just like a video deepfake, its audio counterpart requires a large dataset of audio samples to train a deep learning model. The samples can be obtained from recorded phone calls, voice messages, or applications like Clubhouse, Twitter Spaces, and others that provide "audio rooms".
Liveness detection is seen as the most effective remedy against deepfake attacks on remote identity proofing systems. It is usually separated into two types: active (challenge-based) and passive.
An active, challenge-based system prompts an applicant to perform a random action: follow a dot on the screen with their eyes, wave a hand rapidly, turn their head, nod, blink, and so on. Rapid movement helps to reveal a deepfake, as the software cannot keep up with quick motion. This results in blurring, warping, distortion, and other unnatural artifacts.
Occlusion, when one object overlaps another, is a further weak point of deepfake tools. Therefore, it is recommended to prompt an applicant to "place a hand or an object in front of the face", as this results in heavy visible distortion in the case of a puppeteered deepfake.
However, active liveness detection has its own disadvantages: increased customer friction, suspicion and misunderstanding on the user’s side, and some insight for fraudsters into how the system works.
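The challenge-based flow described above can be sketched as follows. This is a minimal illustration, not a description of any real product: the challenge catalogue, the `issue_challenge` and `verify_response` names, and the ten-second lifetime are all assumptions made for this example, and the component that recognizes actions in the video is out of scope. The sketch only shows how to make the prompt unpredictable and short-lived, so that a pre-recorded or pre-rendered response is less likely to match.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical catalogue built from the actions named in the text.
CHALLENGES = ("follow_dot", "wave_hand", "turn_head", "nod", "blink", "cover_face_with_hand")


@dataclass(frozen=True)
class Challenge:
    nonce: str                 # ties the response to this specific session
    actions: Tuple[str, ...]   # ordered actions the applicant must perform
    expires_at: float          # UNIX timestamp after which the challenge is void


def issue_challenge(n_actions: int = 2, ttl_seconds: float = 10.0,
                    now: Optional[float] = None) -> Challenge:
    """Pick distinct actions with a cryptographically secure random source."""
    now = time.time() if now is None else now
    pool = list(CHALLENGES)
    actions = [pool.pop(secrets.randbelow(len(pool))) for _ in range(n_actions)]
    return Challenge(secrets.token_hex(16), tuple(actions), now + ttl_seconds)


def verify_response(challenge: Challenge, nonce: str, observed_actions: Sequence[str],
                    now: Optional[float] = None) -> bool:
    """Accept only a timely response that repeats the exact action sequence.

    `observed_actions` is assumed to come from a separate video-analysis step.
    """
    now = time.time() if now is None else now
    if now > challenge.expires_at:
        return False
    if not secrets.compare_digest(nonce, challenge.nonce):
        return False
    return tuple(observed_actions) == challenge.actions


challenge = issue_challenge()
print(challenge.actions)
```

Drawing the actions from a secure random source and expiring the challenge quickly limits how much an attacker can prepare in advance, although it does not remove the friction noted above.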
Passive detection is regarded by many experts as a more favorable solution. It covers a wide range of detection techniques that run without the user even noticing them. Mostly, passive detection focuses on the main life indicators: blinking, light reflection properties intrinsic to the human eye or skin, and so on.
For example, it’s possible to detect an impersonation by analyzing the wavelength of the light reflected off the retina, an effect known as the "bright pupil feature". Another method measures facial depth by analyzing its low-frequency spectrum, something that a deepfake or mask wouldn’t have. Both can be assisted by a simple smartphone flash.
Heart rate estimation is also a promising passive method. It detects subtle color variations in human skin caused by oxygen saturation, heart activity, and blood circulation. Deepfakes typically lack such characteristics.
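To make the heart rate idea more concrete, here is a simplified frequency-analysis sketch. It is not the method of any cited paper. It assumes that an upstream step (not shown) has already located the face and produced one average green-channel value per video frame; the function name, the 42 to 180 bpm search band, and the synthetic test signal are all assumptions made for this example.

```python
import math
from typing import Sequence, Tuple


def estimate_heart_rate_bpm(green_means: Sequence[float], fps: float,
                            min_bpm: float = 42.0, max_bpm: float = 180.0,
                            step_bpm: float = 1.0) -> Tuple[float, float]:
    """Return (dominant bpm, share of band power held by that bpm).

    Scans candidate heart rates and measures how strongly the per-frame
    skin color signal oscillates at each one (a direct Fourier projection).
    """
    n = len(green_means)
    if n < 2:
        raise ValueError("need at least two frames")
    mean = sum(green_means) / n
    centered = [g - mean for g in green_means]  # remove the constant skin tone

    best_bpm, best_power, total_power = 0.0, 0.0, 0.0
    bpm = min_bpm
    while bpm <= max_bpm:
        omega = 2.0 * math.pi * (bpm / 60.0) / fps  # radians per frame
        re = sum(x * math.cos(omega * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(omega * i) for i, x in enumerate(centered))
        power = re * re + im * im
        total_power += power
        if power > best_power:
            best_bpm, best_power = bpm, power
        bpm += step_bpm
    ratio = best_power / total_power if total_power > 0 else 0.0
    return best_bpm, ratio


# Synthetic 10-second clip at 30 fps with a faint 72 bpm (1.2 Hz) pulse.
fps = 30.0
pulse = [100.0 + 0.5 * math.sin(2.0 * math.pi * 1.2 * i / fps) for i in range(300)]
flat = [100.0] * 300  # no periodic color change at all

print(estimate_heart_rate_bpm(pulse, fps))
print(estimate_heart_rate_bpm(flat, fps))
```

A clip with no plausible pulse peak would be treated as one suspicious signal among several rather than as proof of a deepfake; real footage also needs motion and lighting compensation, which this sketch omits.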
Liveness Detection Criticism & Safe Ecosystem
Ann-Kathrin Freiberg of BioID mentions that biometric detection isn’t always successful at detecting deepfakes. According to her, the ISO/IEC 30107-3 standard doesn’t specifically address deepfake attacks. She claims that fake videos injected into the video stream (an application-level attack) pose a far more serious threat than regular manipulated media.
ENISA’s report points to the urgency of creating a global ecosystem that will guarantee safe and secure remote identity proofing. It includes five key elements: environment controls, identity document controls, presentation attack detection, organizational controls, and process controls. Together they can mitigate deepfake usage and document forgery.
- Remote Identity Proofing - Attacks & Countermeasures
- The Threat of Deepfakes
- Increasing Threats of Deepfake Identities
- Japanese Company Now Offers Ultra Realistic 3D-Printed Masks of Human Faces
- Realistic 3D mask produced by Kamenya Omoto
- Report: number of expert-crafted video deepfakes double every six months
- Face2Face: Real-time Face Capture and Reenactment of RGB Videos
- FaceRig app employed a similar puppet technique as Face2Face
- Listen carefully: The growing threat of audio deepfake scams
- Exploiting Visual Artifacts to Expose Deepfakes and Face Manipulations
- Insight on face liveness detection: A systematic literature review
- Deepfakes can be detected by analyzing light reflections in eyes, scientists say
- An overview of face liveness detection
- Example of the blinking analysis
- DeepFakes Detection Based on Heart Rate Estimation: Single- and Multi-frame