Manual Detection of AI-Generated Texts 

The human ability to detect GenAI texts without special equipment remains a debated topic.

Can People Tell If a Text Is Created by AI?

Explanations of why volunteers classify a text as “AI” or “human-made”

Human ability to detect GenAI content has been a topic of discussion since at least 2017, when the first generative models capable of producing realistic images became publicly available. 

With the rise of other advanced generative models such as GPT, Bark, and Stable Diffusion, identifying synthetic media has become quite challenging, if not impossible at times. According to a report, at least 63.5% of respondents couldn't spot a text written by ChatGPT-4.

Human Evaluation and Behavior Analysis in Generated Text Detection

An experiment in which volunteers rated both manually written and generated content on a scale from "definitely human-written" to "definitely machine-generated" showed that they identified the correct authorship with a probability close to chance: slightly above or below 50%.

An observation made by RoFT, a gamified platform where players are invited to guess AI-written content, suggests that generative models tend to make genre-specific mistakes in News, Stories, and other categories. If a person is warned beforehand about the common errors to look for, such as contradictory statements or a lack of named entities in the text, they have a higher chance of detecting GenAI.

Manual Detection Experiments

A number of AI-text detection experiments have been conducted:

  1. Detecting Poems Written by GPT-2

For this challenge, a mixed collection of poem pairs written by people and by AI was presented to human judges. The results showed that correct authorship was guessed with just 50.21% accuracy. However, the judges preferred the human authors in 1,091 out of 1,915 pairs. This may be due to the use of GPT-2, a model weaker at conveying emotion than more recent GPT versions.

An improvised AI-poem in the style of Edgar Allan Poe
  2. Experiments with Partly AI-Generated Texts

The game of Real or Fake Text (RoFT) asks players to guess where a genuine text transitions into a synthetic one. A study by the game's authors revealed that human players detect this boundary with 23.4% accuracy, well above the 10% chance level for this task.

The boundary detection task overview
  3. Experiments with Self-Presentations

In a series of six experiments, volunteers were asked to guess which self-presentations, including job resumes, dating profiles, and Airbnb profiles, were of human origin. The results showed a maximum accuracy of 52%.

  4. Experiments with Research Abstracts

A collection of research paper abstracts, written in equal proportion by humans and AI, was assessed by a group of experienced science editors. The reviewers achieved only 50% accuracy overall, and only 44.1% of human-written abstracts were identified correctly. This suggests that even professional expertise doesn't always help detect the presence of GenAI.

Common reasons to attest a sentence as AI-generated according to human readers
  5. General Experiments with Texts

A 2023 survey revealed that GenAI can produce highly believable content about entertainment for younger audiences, travel, and health: this category convinced 53.1% of reviewers that it was human-made.

Perception of the GenAI content by the broad public

General Tips for Manual Detection of AI-Written Content

There are some techniques that can increase the chances of identifying artificial writing manually.

  1. Overview of Methods of Manual Detection of AI-Written Content

According to research, linguistic analysis can help detect GenAI writing. It tends to use longer sentences, more conjunctions, and fewer adjectives than human writing. Additionally, a language model prefers a neutral tone and displays a lower perplexity score, which implies that its output is more monotonous than human-written text.
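As a rough illustration of this kind of linguistic analysis (not the exact metrics used in the research cited above), the sketch below computes average sentence length, conjunction frequency, and a crude unigram-based stand-in for perplexity. The conjunction list and the tokenizer are simplifying assumptions made for this example:

```python
import math
import re

# A small illustrative set of conjunctions; a real analysis would use a
# POS tagger and a full function-word list.
CONJUNCTIONS = {"and", "or", "but", "because", "although", "while",
                "however", "moreover", "therefore", "furthermore"}

def linguistic_features(text):
    """Return surface features that research associates with GenAI writing."""
    # Crude sentence and word tokenization, sufficient for illustration.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())

    avg_sentence_len = len(words) / max(len(sentences), 1)
    conj_ratio = sum(w in CONJUNCTIONS for w in words) / max(len(words), 1)

    # Unigram "self-perplexity": 2 ** entropy of the word distribution.
    # A low value means the text reuses a small, predictable vocabulary,
    # a rough proxy for the model-based perplexity real detectors compute.
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    n = max(len(words), 1)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "avg_sentence_len": avg_sentence_len,
        "conj_ratio": conj_ratio,
        "unigram_perplexity": 2 ** entropy,
    }
```

On its own, none of these numbers proves anything; the idea is to compare a suspect text against known human-written samples of the same genre.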

  2. Training Evaluators of AI-Generated Texts

It is recommended to train human reviewers before they evaluate a text. Training should include: a) providing GenAI writing samples that stylistically match the text under assessment, and b) analyzing the reasons why human reviewers mark content as generated or human-authored.

  3. Recommendations for Manual Detection of AI-Generated Text

The "red flags" of a synthetic text include repetitiveness, inconsistent storytelling or bizarre events that conflict with or don't relate to the plot, a blunt sense of humour, lack of detail, errors or made-up facts, use of outdated information, mention of non-existent sources, and so on.
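One of these red flags, repetitiveness, lends itself to a simple automated pre-check before a human reviewer reads the text. The helper below (an illustrative sketch, not a tool from any of the studies above) surfaces verbatim phrase repeats for the reviewer to inspect:

```python
import re
from collections import Counter

def repeated_ngrams(text, n=3, min_count=2):
    """Return n-grams (as phrases) that occur at least `min_count` times.

    Heavy verbatim phrase repetition is one of the "red flags" of
    synthetic text; this flags candidate repeats for manual review.
    """
    words = re.findall(r"[a-zA-Z']+", text.lower())
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {" ".join(g): c for g, c in grams.items() if c >= min_count}
```

For example, a text that opens two consecutive paragraphs with "it is important to note that" would have those trigrams flagged with a count of 2. Repetition alone is not proof of AI authorship, so the output is best treated as a pointer, not a verdict.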

Even AI-powered GenAI text detectors have their own issues, which we examine in our next article.


Editors at Antispoofing Wiki thoroughly review all featured materials before publishing to ensure accuracy and relevance.
