Can People Tell If a Text Is Created by AI?

Human ability to detect GenAI content has been a topic of discussion since at least 2017, when the first generative models capable of producing realistic images became publicly available.
With the rise of other advanced generative models, such as GPT, Bark, and Stable Diffusion, identifying synthetic media has become quite challenging, if not impossible at times. According to one report, at least 63.5% of respondents could not identify a text written by ChatGPT-4.
Human Evaluation and Behavior Analysis in Generated Text Detection
An evaluation experiment featuring both manually written and generated content, rated on a scale from “definitely human-written” to “definitely machine-generated”, demonstrated that volunteers identify the correct authorship with an accuracy close to chance: slightly above or below 50%.
An observation from RoFT, a gamified platform where players are invited to spot AI-written content, is that generative models tend to make mistakes specific to each genre, such as News or Stories. If a person is warned beforehand about the common errors to look for, for example contradictory statements or a lack of named entities in the text, they have a higher chance of detecting GenAI content.
Manual Detection Experiments
A number of AI-text detection experiments have been conducted:
- Detecting Poems Written by GPT-2
For this challenge, a mixed collection of poem pairs written by people and AI was presented to human judges. The results showed that correct authorship was guessed with just 50.21% accuracy. However, the judges preferred the human-written poem in 1,091 out of 1,915 pairs. This may be due to the use of GPT-2, a model weaker at conveying emotion than more recent GPT versions.

- Experiments with Partly AI-Generated Texts
The game Real or Fake Text (RoFT) asks players to guess where a genuine text transitions into a synthetic one. A study by the game’s authors revealed that human players detect this boundary with 23.4% accuracy, well above the 10% chance accuracy in this setup.

- Experiments with Self-Presentations
In a series of six experiments, volunteers were asked to guess which self-presentations, including job resumes and dating and Airbnb profiles, were of human origin. The results showed a maximum accuracy of 52%.
- Experiments with Research Abstracts
A collection of research-paper abstracts, written partly by humans and partly by AI, was assessed by a group of experienced science editors. The reviewers achieved only 50% accuracy at rating the abstracts, and only 44.1% of human-written abstracts were identified correctly. This suggests that even professional expertise does not always help detect the presence of GenAI.

- General Experiments with Texts
A 2023 survey revealed that GenAI can produce highly believable content on youth entertainment, travel, and health: content in these categories convinced 53.1% of reviewers that it was human-made.

General Tips for Manual Detection of AI-Written Content
There are some techniques that can increase the chances of identifying artificial writing manually.
- Overview of Methods of Manual Detection of AI-Written Content
According to research, linguistic analysis can help detect GenAI writing, which tends to use longer sentences, more conjunctions, and fewer adjectives than human writing. Additionally, a language model prefers a neutral tone and produces text with a lower perplexity score, which implies that it is more predictable and monotonous than human-written text.
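As a rough illustration of this kind of linguistic analysis, the sketch below computes average sentence length and the rates of conjunctions and adjectives in a text. The word lists are tiny illustrative assumptions, not the vocabularies used in actual studies; a real analysis would rely on a full part-of-speech tagger.

```python
import re

# Illustrative word lists only -- real analyses use proper POS taggers.
CONJUNCTIONS = {"and", "but", "or", "so", "because", "although",
                "however", "moreover", "furthermore", "therefore"}
ADJECTIVES = {"good", "bad", "big", "small", "new", "old",
              "important", "different", "significant", "various"}

def linguistic_features(text: str) -> dict:
    """Compute rough per-text statistics sometimes used as detection cues."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    n_words = len(words) or 1
    return {
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "conjunction_rate": sum(w in CONJUNCTIONS for w in words) / n_words,
        "adjective_rate": sum(w in ADJECTIVES for w in words) / n_words,
    }

sample = ("The results were significant because the model was new. "
          "However, the approach had limitations, and the data was old.")
print(linguistic_features(sample))
```

Comparing these numbers between a text of unknown origin and a known human-written sample in the same genre gives the reviewer a concrete, if crude, signal to weigh.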
- Training Evaluators of AI-Generated Texts
It is recommended to train human reviewers before they begin evaluating a text. Training should include: a) providing GenAI writing samples that stylistically match the text under assessment, and b) analyzing the reasons why human reviewers mark content as generated or human-authored.
- Recommendations for Manual Detection of AI-Generated Text
The “red flags” of a synthetic text include repetitiveness, inconsistent storytelling, bizarre events that conflict with or are unrelated to the plot, flat humour, lack of detail, factual errors or made-up facts, outdated information, references to non-existent sources, and so on.
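The first red flag above, repetitiveness, is one of the few that lends itself to a simple automated check. The sketch below, a minimal heuristic of our own rather than a method from the cited studies, flags word n-grams that recur within a text:

```python
from collections import Counter
import re

def repeated_ngrams(text: str, n: int = 3, min_count: int = 2) -> dict:
    """Return word n-grams occurring at least min_count times --
    a rough proxy for the 'repetitiveness' red flag."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {g: c for g, c in Counter(ngrams).items() if c >= min_count}

text = ("The project was a great success. The team worked hard, "
        "and the project was a great success overall.")
print(repeated_ngrams(text))
```

A human editor would still need to judge whether a repeated phrase is deliberate emphasis or the mechanical looping typical of generated text; the script only points out where to look.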
Even AI-powered GenAI text detectors have issues of their own. To read more, check out our next article.