
Spoofing Attacks on AI Text Detectors and Defense against Them

AI text detector spoofing is the malicious practice of presenting synthesized text as human-written, or vice versa.

What Is a Spoofing Attack on an AI-Text Detector?

Typical vulnerabilities of an AI text detector

AI text detector spoofing is a deliberate attempt to present a text created with a Large Language Model (LLM) to a detection solution as human-written, or vice versa. Typically, this is achieved by inserting small perturbations into the writing, which confuse the AI detector: it is trained to notice statistical patterns and cannot comprehend the essence of the content on a deeper, human-like level.
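As an illustration of what such small perturbations can look like, the sketch below swaps a few Latin characters for visually similar Cyrillic homoglyphs, shifting the character statistics a detector sees while the text still reads the same to a human. The homoglyph table, substitution rate, and function names are illustrative assumptions, not a documented attack recipe.

```python
# Minimal sketch of small perturbations: replace a fraction of characters
# with visually similar Unicode homoglyphs. Illustrative assumption only.
import random

HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic look-alikes

def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace a small fraction of eligible characters with homoglyphs."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)

generated = "the model produced a coherent explanation of the results"
print(perturb(generated))  # looks identical on screen, but the bytes differ
```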

Types of Spoofing Attacks on AI Text Detectors

The green list for the definite article “the”

There are several methods to successfully spoof a text detector:

  1. Soft watermark attack

Soft watermarking biases an LLM toward a secret “green list” of tokens, so watermarked content naturally incorporates an unusually high share of these tokens, and the detector flags text with too many green-list hits. If malicious actors gain access to the green lists, they can deliberately compose human-authored content rich in green-list tokens, which will then be mistakenly identified as “AI-generated”.
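The sketch below illustrates the mechanics under simplified assumptions: a toy word-level green list derived by hashing the previous token, the standard z-score test used by soft-watermark detectors, and an attacker who, knowing the hash, stuffs a passage with green tokens until it looks watermarked. The vocabulary, hashing scheme, and function names are illustrative, not the original implementation.

```python
# Toy soft-watermark z-score detector and a green-list stuffing attack.
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary placed on the green list

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by `prev_token`."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA

def z_score(tokens: list[str]) -> float:
    """z-test on the count of green tokens (soft-watermark detection)."""
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    t = len(tokens) - 1
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))

def spoof(prompt: str, candidates: list[str], length: int) -> list[str]:
    """Attacker with green-list access: always pick a green continuation."""
    tokens = prompt.split()
    for _ in range(length):
        green = [w for w in candidates if is_green(tokens[-1], w)]
        tokens.append((green or candidates)[0])
    return tokens

human_text = "the committee reviewed the proposal and requested minor changes".split()
spoofed = spoof("the committee", human_text, length=30)
print("honest human text z-score:", round(z_score(human_text), 2))
print("green-stuffed text z-score:", round(z_score(spoofed), 2))  # high -> flagged as AI
```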

  2. Spoofing Retrieval-Based Detectors

This type of detection solution works by retrieving semantically similar generations that could potentially have been produced by the same AI. This is possible by scanning a database of sequences previously generated through a certain API and looking for close matches.

A method dubbed recursive paraphrasing can be used to repeatedly rephrase the text without affecting its original meaning. For that purpose, a duo of the DIPPER and LLaMA-2-7B-Chat models was used, diminishing the detector’s accuracy from 99.3% to 9.7%.

Overview of the recursive paraphrasing
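A minimal sketch of the attack loop is shown below. The paraphraser and the detector are stand-in stubs (a real attack would call a neural paraphraser such as DIPPER and query the target detector), so only the recursive control flow should be taken from it.

```python
# Recursive paraphrasing loop: keep feeding the paraphrase back in
# until the detector is fooled. Paraphraser and detector are stubs.
def paraphrase(text: str) -> str:
    """Stand-in paraphraser using a toy synonym table; a real attack would
    call a neural paraphraser such as DIPPER here."""
    synonyms = {"quick": "rapid", "answer": "response", "shows": "demonstrates"}
    return " ".join(synonyms.get(w, w) for w in text.split())

def detector_score(text: str) -> float:
    """Stand-in detector returning P(AI-generated); replace with a real API."""
    return 0.9 if "rapid" not in text else 0.05

def recursive_paraphrase(text: str, threshold: float = 0.5, max_rounds: int = 5) -> str:
    """Re-paraphrase the current output until the detector score drops."""
    for round_idx in range(max_rounds):
        score = detector_score(text)
        print(f"round {round_idx}: detector score = {score:.2f}")
        if score < threshold:
            break
        text = paraphrase(text)  # feed the paraphrase back in (recursion)
    return text

print(recursive_paraphrase("the quick answer shows the model output"))
```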
  3. Spoofing Zero-Shot and Neural Network-Based Detectors

The previously mentioned research states that zero-shot detectors can be tricked with elaborate paraphrasing and as few as five queries. Even though some detectors demonstrate resilience on the text-generation datasets they were trained on, researchers observed that their performance drops significantly on other, possibly unfamiliar, datasets.

Defense against Spoofing Attacks on AI Text Detectors

DetectGPT’s sample model

  1. Enhanced watermarking

A cryptography-based solution proposed to improve defense against paraphrasing is dubbed Bileve (short for Bi-level). It incorporates two main components:

  • Integrity check. It employs fine-grained signature bits to confirm the integrity of a text by embedding message-signature pairs.
  • Source tracking. This is made possible by a coarse-grained signal enhanced with Weighted Rank Addition (WRA).

According to the authors of the solution, Bileve shows excellent performance even when as much as 10% of the text tokens have been altered.
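The fragment below illustrates only the integrity-check idea that the fine-grained signature bits rely on, using a plain keyed HMAC over the whole text as a stand-in; Bileve’s actual construction embeds the signature into the generated tokens themselves, which is not reproduced here.

```python
# Illustrative message-signature integrity check (not Bileve's embedding):
# any edit to the signed text makes verification fail.
import hmac
import hashlib

SECRET_KEY = b"watermarking-key"  # assumed shared between generator and verifier

def sign(text: str) -> str:
    """Produce the signature that would accompany the message."""
    return hmac.new(SECRET_KEY, text.encode(), hashlib.sha256).hexdigest()

def verify(text: str, signature: str) -> bool:
    """Integrity check: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(text), signature)

original = "the model generated this sentence"
tag = sign(original)

print(verify(original, tag))                       # True: text untouched
print(verify(original.replace("this", "a"), tag))  # False: any edit breaks integrity
```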

  2. Contrastive Domain Adaptation Framework

ConDA system overview

The Contrastive Domain Adaptation framework, or ConDA, fuses standard domain adaptation techniques with the representational power of contrastive learning. This allows the solution to learn domain-invariant representations, which in turn help it perform unsupervised detection tasks.
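As a rough illustration of the contrastive component, the sketch below implements an InfoNCE-style loss that pulls two embeddings of the same text together and pushes other texts in the batch apart. The tensor shapes, temperature, and random stand-in “encoder outputs” are assumptions for the example, not ConDA’s published configuration.

```python
# InfoNCE / NT-Xent style contrastive loss over paired text embeddings.
import torch
import torch.nn.functional as F

def info_nce_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of two views of the same texts."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(z_a.size(0))   # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors standing in for encoder outputs.
batch, dim = 8, 128
anchor = torch.randn(batch, dim)
positive = anchor + 0.05 * torch.randn(batch, dim)  # slightly perturbed view
print(info_nce_loss(anchor, positive).item())
```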

  3. Retrieval-Based Defense

Overview of the retrieval-based detection

Retrieval-based methods analyze previous generations produced by a GenAI system with similar semantic content. The idea behind the solution is that, to achieve satisfactory similarity scores, a text generator has to follow the same algorithm over and over, which leads to self-repetition.

For detection purposes, a database of previously generated content is used: the detector searches through it to find similar pieces of synthetic writing and thereby classify the input text. However, the technique can cause two serious issues:

a) Retrieval-based detectors will not be able to correctly classify a text synthesized by an unfamiliar model.
b) Scaling the approach requires millions of generated texts to be stored, which is a rather costly procedure.
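A minimal sketch of the retrieval step is given below: the input is embedded, compared against a database of earlier generations, and flagged when the best cosine similarity crosses a threshold. The hashed bag-of-words embedding, threshold value, and example database are toy assumptions standing in for a real sentence encoder and corpus.

```python
# Retrieval-based detection: flag input text that is very similar to any
# previously stored generation. Toy embeddings stand in for a real encoder.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashed bag-of-words embedding, L2-normalized."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def is_ai_generated(candidate: str, generation_db: list[str], threshold: float = 0.8) -> bool:
    """Flag the candidate if it is close enough to any stored generation."""
    db_matrix = np.stack([embed(g) for g in generation_db])
    similarities = db_matrix @ embed(candidate)  # cosine similarity (unit vectors)
    return float(similarities.max()) >= threshold

db = [
    "the quarterly report highlights strong revenue growth",
    "our model summarizes documents into short abstracts",
]
print(is_ai_generated("the quarterly report highlights strong revenue growth", db))  # True
print(is_ai_generated("completely unrelated handwritten note about gardening", db))  # False
```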

