Can People Tell If a Text Is Created by AI?

Human ability to detect GenAI content has been a topic of discussion since at least 2017, when the first generative models capable of producing realistic images became publicly available.
With the rise of other advanced generative models, such as GPT, Bark, and Stable Diffusion, identifying synthetic media has become quite challenging, if not impossible at times. According to one report, at least 63.5% of respondents could not identify a text written by ChatGPT-4.
Human Evaluation and Behavior Analysis in Generated Text Detection
An evaluation experiment featuring both manually written and generated content, rated on a scale from “definitely human-written” to “definitely machine-generated”, demonstrated that volunteers identify the correct authorship with an accuracy close to chance: slightly above or below 50%.
An observation from RoFT, a gamified platform where players are invited to spot AI-written content, is that generative models tend to make mistakes specific to each genre, such as News or Stories. If a person is warned beforehand about the common errors to look for, for example contradictory statements or a lack of named entities in the text, they have a higher chance of detecting GenAI content.
Manual Detection Experiments
A number of AI-text detection experiments have been conducted:
- Detecting Poems Written by GPT-2
For this challenge, a mixed collection of poem pairs written by people and AI was presented to human judges. The results showed that correct authorship was guessed with just 50.21% accuracy. However, the judges preferred the human-written poem in 1,091 out of 1,915 pairs. This may be due to the use of GPT-2, a model weaker at conveying emotion than more recent GPT versions.

- Experiments with Partly AI-Generated Texts
The game Real or Fake Text (RoFT) asks players to guess where a genuine text transitions into a synthetic one. A study by the game’s authors revealed that human players detect this boundary with 23.4% accuracy, well above the 10% chance accuracy in this setup.

- Experiments with Self-Presentations
In a series of six experiments, volunteers were asked to guess which self-presentations, including job resumes and dating and Airbnb profiles, were of human origin. The results showed a maximum accuracy of 52%.
- Experiments with Research Abstracts
A collection of research-paper abstracts, written partly by humans and partly by AI, was assessed by a group of experienced science editors. The reviewers achieved only 50% accuracy at rating the abstracts, and only 44.1% of human-written abstracts were identified correctly. This suggests that even professional expertise does not always help detect the presence of GenAI.

- General Experiments with Texts
A 2023 survey revealed that GenAI can produce highly believable content on youth entertainment, travel, and health: content in these categories convinced 53.1% of reviewers that it was human-made.

General Tips for Manual Detection of AI-Written Content
There are some techniques that can increase the chances of identifying artificial writing manually.
- Overview of Methods of Manual Detection of AI-Written Content
According to research, linguistic analysis can help detect GenAI writing, which tends to use longer sentences, more conjunctions, and fewer adjectives than human writing. Additionally, a language model prefers a neutral tone and produces text with a lower perplexity score, which implies that it is more predictable and monotonous than human-written text.
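As a rough illustration of this kind of linguistic analysis, the sketch below computes average sentence length and the rates of conjunctions and adjectives in a text. The word lists are tiny illustrative assumptions, not the vocabularies used in actual studies; a real analysis would rely on a full part-of-speech tagger.

```python
import re

# Illustrative word lists only -- real analyses use proper POS taggers.
CONJUNCTIONS = {"and", "but", "or", "so", "because", "although",
                "however", "moreover", "furthermore", "therefore"}
ADJECTIVES = {"good", "bad", "big", "small", "new", "old",
              "important", "different", "significant", "various"}

def linguistic_features(text: str) -> dict:
    """Compute rough per-text statistics sometimes used as detection cues."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    n_words = len(words) or 1
    return {
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "conjunction_rate": sum(w in CONJUNCTIONS for w in words) / n_words,
        "adjective_rate": sum(w in ADJECTIVES for w in words) / n_words,
    }

sample = ("The results were significant because the model was new. "
          "However, the approach had limitations, and the data was old.")
print(linguistic_features(sample))
```

Comparing these numbers between a text of unknown origin and a known human-written sample in the same genre gives the reviewer a concrete, if crude, signal to weigh.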
- Training Evaluators of AI-Generated Texts
It is recommended to train human reviewers before they begin evaluating a text. Training should include: a) providing GenAI writing samples that stylistically match the text under assessment, and b) analyzing the reasons why human reviewers mark content as generated or human-authored.
- Recommendations for Manual Detection of AI-Generated Text
The “red flags” of a synthetic text include repetitiveness, inconsistent storytelling, bizarre events that conflict with or are unrelated to the plot, flat humour, lack of detail, factual errors or made-up facts, outdated information, references to non-existent sources, and so on.
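The first red flag above, repetitiveness, is one of the few that lends itself to a simple automated check. The sketch below, a minimal heuristic of our own rather than a method from the cited studies, flags word n-grams that recur within a text:

```python
from collections import Counter
import re

def repeated_ngrams(text: str, n: int = 3, min_count: int = 2) -> dict:
    """Return word n-grams occurring at least min_count times --
    a rough proxy for the 'repetitiveness' red flag."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {g: c for g, c in Counter(ngrams).items() if c >= min_count}

text = ("The project was a great success. The team worked hard, "
        "and the project was a great success overall.")
print(repeated_ngrams(text))
```

A human editor would still need to judge whether a repeated phrase is deliberate emphasis or the mechanical looping typical of generated text; the script only points out where to look.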
Even AI-powered GenAI text detectors have issues of their own. To read more, check out our next article.