What Are the Main Algorithms for Detecting AI-Generated Texts?

There are three main approaches to detecting synthesized writing:
- Watermarking. A hidden signal is embedded in the text at generation time, marking it as AI-generated. The modification is imperceptible to a human reader.
- Statistical methods. These detectors look for statistical artifacts that a generative model can leave in a text, focusing on measures such as entropy, perplexity, and n-gram frequencies.
- Classification. Classifiers are trained to distinguish human-written from synthetic content using datasets that contain numerous samples of both.
Virtually all existing detectors rely on these approaches.
Is It Necessary to Distinguish AI-Generated and Human-Written Texts?
The ability to accurately detect synthetic writing is widely acknowledged as crucial. Large Language Models (LLMs) can be used to spawn fake news, create manipulative political commentary, produce phishing materials such as emails, or write code with embedded viruses and backdoors. In addition, generated texts contribute to academic dishonesty and can dilute the quality of scientific writing with inaccurate facts, fabricated citations, and outright plagiarism.
The Main Algorithms for Detecting AI-Generated Text
Algorithms for AI-text detection widely discussed in the research literature include:
- Watermarking
Watermarking is a method of inserting “invisible” signals into a text that only a detector can recognize. Watermarks can rely on metadata, semantics, or steganographic approaches. Proposed methods include using a hash function to generate bit sequences, constructing a specific succession of sentences or paragraphs, converting an image-based watermark into a text string, employing adversarial training to embed a secret message, and others. A minimal hash-based sketch is given below.
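To make the hash-based idea concrete, here is a minimal sketch of a “green list” detector in the spirit of such schemes. Everything in it is an illustrative assumption: the whitespace tokenizer, the green-list fraction, and the seeding of the hash on the preceding token.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # assumed share of the vocabulary marked "green" at each step

def is_green(prev_token: str, token: str) -> bool:
    # Pseudo-randomly assign `token` to the green list, seeded by the preceding
    # token, mimicking a generator that favors green tokens while writing.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def watermark_z_score(tokens: list[str]) -> float:
    # z-score of the observed green-token count against the unwatermarked
    # expectation; a large positive score suggests the watermark is present.
    n = len(tokens) - 1
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std

print(watermark_z_score("the quick brown fox jumps over the lazy dog".split()))
```

On unwatermarked text the score hovers near zero; text produced by a generator that biased its sampling toward green tokens would score well above it.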
- Statistical Outlier Detection Methods

A solution dubbed the Giant Language Model Test Room (GLTR) is based on statistical outlier analysis, which allows it to detect synthetic writing. It highlights the vocabulary and word sequences that an AI would typically use, giving a human observer insight into the true nature of the text in question. Another technique examines the log probability function of an LLM: generated text tends to lie in regions of negative curvature of that function, while human writing does not. A toy perplexity scorer, the core signal behind such methods, is sketched below.
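As a minimal sketch of the perplexity signal, the snippet below scores a text with GPT-2 via the Hugging Face transformers library; both the model and the library are illustrative choices, not the tooling the cited works ship.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Exponentiated average negative log-likelihood of the tokens under the
    # model; generated text often scores noticeably lower than human prose.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```

A detector would compare this score against a threshold calibrated on known human and generated samples.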

- Classifier Methods

Classifier models are discriminators trained to differentiate synthetic from human writing. Proposed solutions include the controllable text generator dubbed Grover, energy-based models trained for classification purposes, combinations of term frequency-inverse document frequency (TF-IDF) features with deep learning architectures, and others. A TF-IDF-based sketch follows.
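The sketch below illustrates the TF-IDF route, substituting a plain logistic regression for the deep architectures paired with it in the literature; the four-sentence corpus and its labels are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: label 1 = AI-generated, 0 = human-written.
texts = [
    "The results demonstrate a significant improvement over the baseline.",
    "honestly no clue why it broke again, rebooted twice already",
    "In conclusion, the proposed framework offers a robust solution.",
    "grabbed coffee, train was late, typical monday",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # word unigrams and bigrams as features
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# Estimated probability that a new passage is AI-generated, per this toy model.
print(clf.predict_proba(["The proposed method achieves robust results."])[0][1])
```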
- Retrieval-Based Detection Methods

Retrieval-based detectors are so called because they match the target text against a database of previously synthesized writing, so that recurring sentences and passages produced by an AI can be identified. This approach is reported to successfully fend off paraphrasing attacks. A toy retrieval sketch follows.
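The snippet below is a toy version of the retrieval idea, matching a query against a small in-memory “database” of prior generations via character n-gram Jaccard similarity; a production system would rely on scalable indexing and semantic matching instead.

```python
def ngrams(text: str, n: int = 5) -> set[str]:
    # Character n-grams survive small edits, so light paraphrasing still
    # leaves many shingles shared with the stored generation.
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy database of texts the model is known to have emitted earlier.
database = [
    "The committee unanimously approved the proposal after a brief discussion.",
    "Climate change poses significant challenges to coastal infrastructure.",
]

def best_match(query: str) -> tuple[float, str]:
    # Retrieve the stored generation most similar to the query text.
    return max((jaccard(ngrams(query), ngrams(doc)), doc) for doc in database)

score, match = best_match("The committee approved the proposal after a brief discussion.")
print(f"similarity={score:.2f}")  # a high score flags reuse of a prior generation
```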
Effectiveness of Algorithms for Detecting AI-Generated Texts
A rather pessimistic view holds that sooner or later AI detectors will no longer be able to identify synthetic texts, as generative models grow sophisticated enough to produce extremely human-like writing.
However, the authors of the retrieval-based approach believe that the quality of a generated text becomes secondary: their technique relies on other, more subtle clues that can expose generated content no matter how “humanized” it may seem.