Combating Deepfake Videos with Digital Watermarking

Understanding Digital Watermarking

Digital watermarking is a technique for protecting the authenticity of digital media: pictures, video, audio, and mixed media. The main idea is to insert a hidden signature into the media, such as news footage, that is invisible to the viewer yet identifiable by a detection system.

The idea is inspired by traditional watermarking, which dates back to 13th-century Italy and has been used ever since to mark correspondence, money, royal decrees, and other essential documents as genuine.

However, in the digital domain, watermarks can also be generated when authentic media is manipulated (in other words, when its “digital fingerprint” is altered), thereby flagging it as false, altered, or even dangerous, while also alerting viewers. This approach is mentioned by the researcher Nicholas Gardiner.

At the same time, watermarking is still admittedly vulnerable to hacking or spoofing, like many other defense mechanisms. Watermarks can be tampered with and put back inside the content, making it appear legitimate. This insidious technique is especially feasible in white-box attack scenarios, where bad actors know how a security system functions from the inside.

Similar ideas underpin the Content Authenticity Initiative, launched by Adobe in partnership with other companies, news agencies, and creative studios. The initiative calls for a digital environment in which creators can easily protect their authorship and the true origin of a piece of media can be traced within about a minute.

Digital watermarking draws its basis from an 800-year-old technology

Exploring Watermarking Techniques

So far, digital watermarking hasn’t received as much attention as other methods of recognizing and exposing deepfakes: facial antispoofing, liveness cue detection, analysis of artifacts and anomalies in various types of media, deployment of state-of-the-art deep learning architectures, and so on. These techniques are often deployed after the fact to identify tampered media, whereas digital watermarking is a proactive measure that can discourage spoofing in the first place.

Application areas where digital watermarking can prevent deepfakes

While digital watermarking has potential applications in a range of fields, it is especially promising for digital forensics.

In order to be effective, a usable watermarking solution should meet the following criteria:

Resilience. A watermark must withstand tampering and processing such as compression, recoloring, cropping, spatial filtering, rotation, blurring, and scaling. There should be a system in place to alert viewers when tampering has been detected.

Security. The “backend” of the watermark must be hidden from both the regular audience and malicious actors, since exposing it would allow them to analyze its anatomy and launch a reverse-engineering attack.

Data capacity. This denotes how much watermark data can be embedded and how much of the media quality, such as bitrate or resolution, is lost when the watermark is encoded. Ideally, the quality loss should be minimal and imperceptible to the naked eye.

Cost-efficiency. The “cost” of digital watermarking comes from the computational power spent on creating it. It should be kept minimal to avoid overloading the system and slowing down processing.

Fragility. Whenever a fragile watermark is tampered with, it “vanishes” without a trace, which can signal to the system that an intrusion has occurred and that the audience must be warned.
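The resilience and fragility criteria above can be made concrete with a bit error rate check. What follows is a minimal sketch, assuming the detector already knows which bit sequence was embedded. The function names and the 10% tolerance are illustrative assumptions, not values taken from any standard or from the sources cited in this article.

```python
def bit_error_rate(expected, extracted):
    """Fraction of watermark bits that differ between embedding and read-back."""
    if len(expected) != len(extracted) or not expected:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(1 for a, b in zip(expected, extracted) if a != b)
    return errors / len(expected)


def check_watermark(expected, extracted, fragile=False, tolerance=0.10):
    """Return a coarse verdict that could drive a viewer-facing alert.

    fragile=True:  any flipped bit counts as tampering.
    fragile=False: a robust watermark may lose up to `tolerance` of its bits
                   (an assumed threshold) and still be accepted.
    """
    ber = bit_error_rate(expected, extracted)
    limit = 0.0 if fragile else tolerance
    return "authentic" if ber <= limit else "tampering suspected"


embedded = [1, 0, 1, 1, 0, 0, 1, 0] * 2
read_back = list(embedded)
read_back[-1] ^= 1  # one of 16 bits flipped, e.g. by recompression

print(check_watermark(embedded, read_back))                # robust policy
print(check_watermark(embedded, read_back, fragile=True))  # fragile policy
```

Under the robust policy a single flipped bit is tolerated, while the fragile policy treats the same read-back as a sign of tampering. That is the trade-off between the two criteria.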

Specific metrics have been suggested to standardize the evaluation of digital watermarking. These include Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Squared Error (MSE), which help measure how imperceptible a watermark is.
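As a rough illustration of how two of these metrics are computed, here is a minimal sketch in plain Python. It assumes 8-bit grayscale images stored as lists of rows, and the sample pixel values are made up for the example. SSIM is left out because it relies on windowed local statistics and is usually taken from an image-processing library.

```python
import math


def mse(original, watermarked):
    """Mean squared error between two equally sized grayscale images."""
    total = 0
    count = 0
    for row_a, row_b in zip(original, watermarked):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(original, watermarked, max_value=255):
    """Peak signal-to-noise ratio in dB; higher means a less visible change."""
    error = mse(original, watermarked)
    if error == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / error)


# Hypothetical 4x4 host image and a copy in which six pixels changed by 1.
host = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
marked = [[53, 55, 60, 66], [70, 60, 64, 73], [63, 59, 54, 90], [66, 61, 68, 105]]

print("MSE: ", mse(host, marked))   # 0.375
print("PSNR:", round(psnr(host, marked), 2), "dB")
```

A lower MSE and a higher PSNR both indicate that the watermarked image stays closer to the original, which is the goal of the imperceptibility requirement.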

Digital Watermarking Strategies Against Deepfakes

Currently, there are several approaches to digital watermarking.

Spatial domain watermarking (SDW)

SDW replaces the least significant bit (LSB) of a pixel value in the host image with a watermark bit. The LSB carries the lowest weight in the binary place-value system, so it can be “sacrificed” with little visible effect. Overall, this approach is simple and cost-efficient, and it allows numerous watermarks to be added. That lowers the success rate of intruders, because it raises the chances of some original watermarks “surviving” an attack.

Representation of LSB used in watermarking models
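The LSB substitution described above can be sketched in a few lines. This is a minimal illustration, assuming a flat list of 8-bit pixel values and a watermark given as a list of 0/1 bits. The helper names `embed_lsb` and `extract_lsb` are hypothetical and not taken from any particular library.

```python
def embed_lsb(pixels, bits):
    """Return a copy of `pixels` whose first len(bits) values carry the bits in their LSB."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the host signal")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(pixels, n_bits):
    """Read the watermark back from the lowest bit of the first n_bits values."""
    return [p & 1 for p in pixels[:n_bits]]


host = [154, 201, 37, 90, 255, 0, 128, 63]
watermark = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_lsb(host, watermark)
print(marked)                    # [155, 200, 37, 91, 254, 0, 129, 62]
print(extract_lsb(marked, 8))    # [1, 0, 1, 1, 0, 0, 1, 0]
```

Each pixel changes by at most 1 out of 255, which is why the modification is hard to see. It also means that any operation touching the lowest bit, such as lossy compression, can erase the watermark.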

This technique can be improved by replacing every rightmost bit with a watermarking bit, making the watermarks quite challenging to tamper with. An alternative approach uses the local binary pattern (LBP) method: the initial picture is divided into square non-overlapping blocks, the spatial relation between the middle pixel and its neighbors is analyzed, and watermarks are inserted or extracted according to the pixel data.

Transform domain watermarking (TDW)

TDW is reportedly superior to the previous method, as it is more resilient to attacks. Watermarks are inserted into the coefficients produced by a frequency transform, and the characteristics of human vision are also taken into consideration.

One of the methods is based on a multi-resolution watermarking algorithm. It decomposes the host picture and the watermark into detail bands, which are then blended together. It offers decent imperceptibility but remains vulnerable to geometric attacks: changes of aspect ratio, scaling, cropping, warping, and so on.

Geometric attack examples aimed at image watermarking

Most geometric attacks can reportedly be mitigated by inserting the watermark in the middle band of the Discrete Cosine Transform (DCT) coefficients. However, this technique isn’t immune to cropping and remains costly in terms of computational power.
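To make mid-band DCT embedding more tangible, here is a minimal sketch that hides one bit in a single 8x8 block by forcing an ordering between two mid-band coefficients. The coefficient pair (4,1)/(3,2), the embedding strength, and all function names are assumptions chosen for illustration. The sketch does not reproduce any specific published scheme, and it demonstrates only embedding and extraction, not resistance to geometric attacks.

```python
import math

N = 8  # block size

# Orthonormal DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / (2N))
C = [
    [
        (math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        for x in range(N)
    ]
    for u in range(N)
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D DCT of an 8x8 block: C * block * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C (C is orthogonal)."""
    return matmul(matmul(transpose(C), coeffs), C)


POS_A, POS_B = (4, 1), (3, 2)  # assumed mid-band coefficient pair


def embed_bit(block, bit, strength=20.0):
    """Encode `bit` as the sign of coeff[POS_A] - coeff[POS_B], then return 8-bit pixels."""
    coeffs = dct2(block)
    (ua, va), (ub, vb) = POS_A, POS_B
    mean = (coeffs[ua][va] + coeffs[ub][vb]) / 2
    half = strength / 2 if bit else -strength / 2
    coeffs[ua][va] = mean + half
    coeffs[ub][vb] = mean - half
    pixels = idct2(coeffs)
    return [[min(255, max(0, round(v))) for v in row] for row in pixels]


def extract_bit(block):
    coeffs = dct2(block)
    return 1 if coeffs[POS_A[0]][POS_A[1]] > coeffs[POS_B[0]][POS_B[1]] else 0


# Hypothetical smooth host block with values between 120 and 155.
host = [[120 + 3 * x + 2 * y for y in range(N)] for x in range(N)]
for bit in (0, 1):
    marked = embed_bit(host, bit)
    print("embedded", bit, "-> extracted", extract_bit(marked))
```

The gap between the two coefficients (`strength`) trades visibility for robustness. A larger gap survives more rounding and compression error but changes the pixels more.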

Other suggested techniques include inserting the watermark at low frequencies; combining a wavelet-based watermarking algorithm with a pixel-wise masking model to make watermarks harder to detect; and using a substitution box together with the Discrete Fourier Transform (DFT) and a chaotic map to create semi-fragile watermarks.

Pivotal Tuning Watermarking (PTW)

PTW for pre-trained generators is presented as a watermarking method that outperforms both query monitoring, as performed by companies like OpenAI, and passive liveness detection. It is also dramatically faster than “watermarking from scratch”: the cost drops from 1 GPU-month to merely 1 hour. Additionally, it requires no training data.

Given an n-bit watermarking message, a specific number of iterations, a regularization parameter, and a learning rate α, it is capable of returning a high-fidelity watermarked generator. It’s done before the watermarking for the latent codes is complete.

AI-generated images in pre- and after-watermarking stages with the PTW method

PTW demonstrated resilience in black-box scenarios, in which attackers don’t know how the solution actually works. In the white-box scenario, in which malicious actors have direct access to the system’s backend, PTW led to quality degradation of the tampered media, which makes it easier to expose.

Scheme of deepfake attribution by watermarking the generator

Audio

A 2023 report states that audio watermarking, while useful in music copyright protection, may run into pressing issues when used to detect synthetic voices, including those employed by scammers. The problem is that the human voice lacks spectral richness compared to a piece of music, so there is little room in which to place a watermark.

Another problem is the gradual quality degradation that an audio signal undergoes when traveling through phone channels: downsampling, compression, and loss of spectral information all make detecting a watermark even harder.

Degradation channels that make audio watermarking challenging

As a result, audio watermarks will be easier to manipulate, even if all providers of voice-synthesizing tools agree to embed them. Attackers could also use open-source or custom-built models that feature no watermarking at all.
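The effect of channel degradation on a watermark can be illustrated with a deliberately naive experiment. The sketch below embeds bits in the LSB of 16-bit samples and then passes them through a crude stand-in for a phone channel, which keeps every second sample and requantizes to 8-bit precision. The sine tone, the channel model, and all function names are simplifying assumptions. Practical audio watermarks are more robust than plain LSB embedding, so this only shows the direction of the problem.

```python
import math


def embed_lsb(samples, bits):
    """Write one watermark bit into the lowest bit of each sample."""
    if len(bits) != len(samples):
        raise ValueError("this sketch expects one bit per sample")
    return [(s & ~1) | b for s, b in zip(samples, bits)]


def extract_lsb(samples, n_bits):
    return [s & 1 for s in samples[:n_bits]]


def phone_channel(samples, factor=2, keep_bits=8):
    """Assumed channel model: downsample by `factor`, keep only the top `keep_bits` bits."""
    shift = 16 - keep_bits
    return [(s >> shift) << shift for s in samples[::factor]]


def bit_error_rate(expected, extracted):
    """Compare the bits that were read back against the start of the expected sequence."""
    errors = sum(1 for a, b in zip(expected, extracted) if a != b)
    return errors / len(extracted)


# 64 samples of a 16-bit sine tone standing in for a voice recording.
samples = [int(12000 * math.sin(2 * math.pi * 5 * n / 64)) for n in range(64)]
watermark = [1, 0, 1, 1, 0, 0, 1, 0] * 8

marked = embed_lsb(samples, watermark)
received = phone_channel(marked)

ber_clean = bit_error_rate(watermark, extract_lsb(marked, 64))
ber_channel = bit_error_rate(watermark, extract_lsb(received, 64))
print(ber_clean)    # 0.0
print(ber_channel)  # 0.5
```

A bit error rate of 0.5 is what random guessing would produce, so in this toy setup the watermark carries no usable information after the channel.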

Challenges and Critiques

Example of adversarial purification with diffusion to ‘clean’ deepfakes

According to a study co-authored by Soheil Feizi, director of the Reliable AI Lab, digital watermarking technology has not yet reached a point of sufficient usability. The experiment showed that watermarks could be successfully removed from pictures with the help of adversarial diffusion.

This approach “dissolves” an image and then restores it with a reverse generative process, yielding a picture without any hidden signatures. Theoretically, the same method could be applied to other types of media, so creating an “invincible” watermark model remains an open challenge.

Digital watermarking is one of many approaches that have been suggested to thwart spoofing efforts. To learn more about other standards suggested for deepfake detection, read our next article here.
