Deepfake Detection Competitions

From Antispoofing Wiki

Deepfakes: Harmful Potential & Detection Competitions

Deepfakes are a type of falsified media — audio or visual — produced with deep learning. They employ techniques such as face swapping to mimic a target person’s appearance or voice. The technology can be traced back to 1997, when the Interval Research Corporation presented Video Rewrite, the first digital face manipulation tool. In 2017, deepfake videos emerged as a mainstream threat once their production tools became widely available to the general public.

Currently, deepfakes are considered a serious threat built on AI and machine learning technologies. Crime Science reports that deepfakes are capable of producing devastating societal harm: from political slander to petty money theft via realistic impersonations.

As a countermeasure, deepfake detection competitions were proposed to engage anti-spoofing professionals and amateurs in battling this growing threat. The Deepfake Detection Challenge 2020 is considered the biggest global event targeting deepfakes, with a $1 million prize pool, over 2,000 competing teams, and an extensive deepfake dataset featuring 3,500 paid actors.

Its goal is to find an effective, easy-to-deploy, and highly accurate tool that will detect falsified media quickly, while also providing a full-scale evidence report.


The emergence of deepfakes and their associated harms has spurred regional and international alarm. For instance, the European Parliament published a study, Tackling Deepfakes in European Policy, which lists the risk categories brought on by the technology, including bullying, extortion, identity theft, election manipulation, and so on.

Deepfake Detection: Competitions, Goals & Results

In order to spread awareness of deepfake harms and find creative countermeasures against them, deepfake competitions have been organized in the USA, China, Singapore, and Europe. At the Irish BT Young Scientist & Technology Exhibition (BTYSTE), for instance, a school student proposed a novel deepfake detection method that debunked the fake Queen’s speech with a "94.3% confidence".



A number of leading challenges and competitions share the goal of successful deepfake detection.

FaceForensics++

FaceForensics++ is a large independent deepfake dataset, which allows training various deepfake detectors. At the moment, it contains 1,000 video sequences with 509,914 images used as bona fide visual data.

As for digital face manipulations, it employs four methods:

  • FaceSwap. The method superimposes face regions from a source video to the destination video by using facial landmarks detection, 3D templates, blendshaping, and color correction.
  • Face2Face. It involves facial reenactment by translating the facial expressions from a source video to the destination video. This is done using video input streams and manual keyframe selection.
  • Deepfake. Any media, static or live, that was doctored using a deepfake app such as Face Swapper, DeepAR, FaceSwap, FakeApp, etc.
  • Neural Textures. It involves tracked facial geometry obtained with photometric reconstruction loss coupled with an adversarial loss, and other techniques.

The FaceForensics++ challenge invites authors and researchers to test their detection methods against its extensive database. Results must be submitted to the evaluation server. All FaceForensics++ test data and evaluation results are publicly available.

DFDC

The Deepfake Detection Challenge, or DFDC, is by far the biggest deepfake detection contest. Hosted by Kaggle in partnership with AWS, Facebook, and The Partnership on AI, it attracted the highest number of participants (2,256 teams).



The competition ran from March to June 2020, offering a $1 million prize pool; the winning team or researcher received half of the total sum.

DFDC has a massive dataset of 128,154 consented videos, divided into four parts:

  • Training set. It is available for download outside the event.
  • Public Validation set. It comprises 400 select videos.
  • Public Test set. A sort of leaderboard where submitted code is ranked.
  • Private Test set. It contains organic videos for the final testing.

The competition had its own application requirements for code, as well as an evaluation algorithm.


[math]\displaystyle{ LogLoss=-\frac{1}{n}\sum_{i=1}^n[y_i\log\bigl(\hat{y}_i\bigr)+\bigl(1-y_i\bigr)\log\bigl(1-\hat{y}_i\bigr)] }[/math]


Where:

  • n — the number of videos being predicted.
  • ŷi — the predicted probability that video i is fake.
  • yi — 1 if the video is fake, 0 if real.
  • log() — the natural (base e) logarithm.
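As an illustration, the metric can be computed with a few lines of NumPy (a minimal sketch, not the official Kaggle scoring code):

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy as used for DFDC scoring.

    y_true: 1 for fake videos, 0 for real ones.
    y_pred: predicted probability that each video is fake.
    Predictions are clipped away from 0 and 1 so the logarithm stays finite.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Confident correct predictions score near 0; confident wrong ones
# are penalized heavily, which discourages overconfident submissions.
score = log_loss([1, 0], [0.9, 0.1])
```

Lower scores are better, so the leaderboard rewards well-calibrated probabilities rather than hard 0/1 guesses.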

Deeper Forensics Challenge 2020

Deeper Forensics Challenge 2020 focuses on "real-world face forgery detection". As the groundwork of the competition, it employs the massive DeeperForensics-1.0 dataset with 60,000 videos comprising 17.6 million frames.

The challenge lasted nine weeks, eight of which were dedicated to the development phase; the final week was devoted to testing and evaluation. Like DFDC, the Deeper Forensics challenge applied the binary cross-entropy loss (BCELoss) to evaluate the competing solutions:


[math]\displaystyle{ BCELoss=-\frac{1}{N}\sum_{i=1}^N[y_i\cdot\log\bigl(p\bigl(y_i\bigr)\bigr)+\bigl(1-y_i\bigr)\cdot\log\bigl(1-p\bigl(y_i\bigr)\bigr)] }[/math]


Where:

  • N — the number of videos in the hidden test set.
  • yi — 1 for a fake video, 0 for a real one.
  • p(yi) — the predicted probability that video i is fake.

A total of 25 teams took part in the contest. The winning team’s detection strategy includes three steps:

  • Face Extraction. It involves extraction of 15 frames with VideoCapture. Then the frame-by-frame face region detection comes into play with MTCNN.
  • Classification. Determines the probability of a video being fake. Three EfficientNet models — B0, B1, B2 — play a major part.
  • Output. The average of the face scores across the extracted frames is computed, yielding the predicted probability that the video is falsified.
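The three steps above can be sketched as follows; the face detector and the EfficientNet-B0/B1/B2 ensemble are stubbed out as plain callables, since the real pipeline relies on OpenCV's VideoCapture, MTCNN, and trained weights:

```python
import numpy as np

def sample_frame_indices(n_frames, n_samples=15):
    """Pick n_samples frame indices spread evenly across the video,
    mirroring the winners' extraction of 15 frames per clip."""
    return np.linspace(0, n_frames - 1, n_samples).round().astype(int)

def video_fake_probability(frames, detect_face, classifiers):
    """Average per-face fake scores from an ensemble of classifiers.

    detect_face and classifiers are stand-ins for MTCNN and the three
    EfficientNet models; each classifier maps a face crop to a
    fake-probability in [0, 1].
    """
    scores = []
    for frame in frames:
        face = detect_face(frame)
        if face is None:          # no face found in this frame
            continue
        # ensemble: mean score over the model variants
        scores.append(np.mean([clf(face) for clf in classifiers]))
    # fall back to 0.5 (maximally uncertain) if no face was detected
    return float(np.mean(scores)) if scores else 0.5

# Toy demo with stubbed components.
frames = [np.zeros((8, 8)) for _ in range(300)]
idx = sample_frame_indices(len(frames))            # 15 indices in [0, 299]
picked = [frames[i] for i in idx]
stub_models = [lambda f: 0.8, lambda f: 0.6, lambda f: 0.7]
p = video_fake_probability(picked, lambda f: f, stub_models)
```

The fallback value of 0.5 for face-free clips is an assumption for this sketch; it is a common choice because it minimizes worst-case log loss when the model has no evidence either way.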


DFGC 2021

The Deepfake Game Competition 2021 is a two-track contest comprising a Deepfake Creation and a Deepfake Detection track. In the Creation track, contestants were tasked with creating 1,000 face-swap images; in the Detection track, with building detection solutions based on the Celeb-DF dataset.

This dataset contains 5,639 high-quality deepfake videos. While the created deepfakes were judged with similarity metrics, the detection solutions were evaluated with the Area Under the Receiver Operating Characteristic curve (AUC-ROC).
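AUC-ROC has a direct pairwise interpretation: the probability that a randomly chosen fake receives a higher score than a randomly chosen real sample. A minimal NumPy sketch of this computation (libraries such as scikit-learn provide an optimized equivalent):

```python
import numpy as np

def auc_roc(y_true, y_score):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (fake, real) pairs in which the fake sample
    outscores the real one, counting ties as half a win."""
    y_true = np.asarray(y_true)
    pos = np.asarray(y_score)[y_true == 1]   # scores of fake samples
    neg = np.asarray(y_score)[y_true == 0]   # scores of real samples
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# A detector that ranks every fake above every real clip scores 1.0;
# random guessing hovers around 0.5.
perfect = auc_roc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2])  # → 1.0
```

Because AUC depends only on the ranking of scores, it is insensitive to monotone rescaling of a detector's outputs, which makes it a convenient metric for comparing heterogeneous submissions.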

Winners of the Creation track applied FaceShifter and adversarial noise to enhance anti-detection qualities. In the Detection track, the winning solution involved EfficientNet-B3 and MTCNN.


OpenMFC

Open Media Forensics Challenge is hosted by the National Institute of Standards and Technology (NIST).

Its primary goals are:

  • IMDL. Image Manipulation Detection and Localization to detect any possible doctoring of the test image.
  • VMD. Video Manipulation Detection to discover video manipulations.
  • IGMDL. Image GAN Manipulation Detection and Localization for detecting Generative Adversarial Network intervention.
  • VGMD. Video GAN Manipulation Detection uses the previous approach for videos.



It is based on a system of leaderboards that are updated as soon as a new submission is made, which makes comparing existing methods easier.

ForgeryNet

ForgeryNet is based on the eponymous dataset with 2.9 million images and 221,247 videos. It covers 7 image-level and 8 video-level manipulation approaches, which makes it a great source for training AI-based solutions.

The AUC-ROC curve is used for evaluation, built from True Positive Rate (TPR) and False Positive Rate (FPR) values. Accordingly, two classes are defined: Positive for falsified media and Negative for bona fide images and videos.
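At a fixed decision threshold, TPR and FPR can be computed as below (a small illustrative sketch; the challenge's own evaluation code may differ):

```python
import numpy as np

def tpr_fpr(y_true, y_score, threshold=0.5):
    """True/False Positive Rates at a given decision threshold.
    Positive = falsified media, Negative = bona fide, as in ForgeryNet."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_score) >= threshold
    tpr = (y_pred & (y_true == 1)).sum() / (y_true == 1).sum()
    fpr = (y_pred & (y_true == 0)).sum() / (y_true == 0).sum()
    return float(tpr), float(fpr)

# Sweeping the threshold from 1 down to 0 traces out (FPR, TPR) points;
# the ROC curve connects them, and its area is the reported AUC.
tpr, fpr = tpr_fpr([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], threshold=0.5)
```

Each threshold trades missed fakes (lower TPR) against false alarms on bona fide media (higher FPR), which is why the full curve is more informative than any single operating point.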

The winning deepfake detection solution included the following elements:

  • DFQ. The Dynamic Feature Queue provides a more comprehensive metric space for metric learning, which helps keep model training stable.
  • Global Patchwise Consistency. It is employed as a cue to detect fake media without face detection.
  • Video Track Processing. Provides rapid forgery verification, as well as temporal localization by sampling the video in question.

The solution is based on ResNeSt-50. However, face detection is omitted to save resources and time: instead the whole frame or image is analyzed.

References

  1. Video Rewrite: Driving Visual Speech with Audio
  2. ‘Deepfakes’ ranked as most serious AI crime threat
  3. Deepfake detection contest winner still guesses wrong a third of the time
  4. Celeb-DF: A New Dataset for DeepFake Forensics
  5. Tackling deepfakes in European policy
  6. Young Scientist: Cork student wins with programme to detect ‘deepfakes’
  7. Channel 4’s ‘deepfake’ Queen’s speech sparks hundreds of complaints to Ofcom
  8. FaceForensics++: Learning to Detect Manipulated Facial Images
  9. How To Install FakeApp
  10. Photometric reconstruction loss
  11. DeepFake Detection on paperswithcode.com
  12. The Partnership on AI Steering Committee on AI and Media Integrity
  13. Sample from the DFDC deepfake dataset
  14. Deepfake Detection Challenge
  15. Handbook of Digital Face Manipulation and Detection From DeepFakes to Morphing Attacks
  16. DeeperForensics-1.0
  17. Simple Neural Network with BCELoss for Binary classification for a custom Dataset
  18. Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics
  19. AUC ROC
  20. DFGC 2021: A DeepFake Game Competition
  21. Open Media Forensics Challenge
  22. National Institute of Standards and Technology
  23. Deepfake data used in OpenMFC
  24. ForgeryNet - Face Forgery Analysis Challenge 2021: Methods and Results
  25. ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
  26. ResNeSt-50