Malicious Generative AI: Threats and Countermeasures

Definition and Problem Overview

The term “black hat” in cybersecurity originates from Western movies

Generative Artificial Intelligence (GenAI) emerged as a way to automate numerous tasks: drawing pictures, generating large volumes of text, writing programming code, assessing its quality, and so on. Natural Language Processing (NLP) benefits especially from GenAI’s development, as generative models can deliver grammatically impeccable content in various genres and styles. Some examples of NLP you’re probably familiar with are search autocomplete, spell check, and translation apps.

Another, more recent example of NLP that took the world by storm is ChatGPT. With ChatGPT as the primary tool, AI tools have been quickly repurposed for criminal activities in cyberspace. Typical fraud tactics include:

Email scams. Business Email Compromise (BEC) is the most prevalent tactic. Usually, it is an email masquerading as an invoice that requests an urgent payment, or a message with instructions from someone claiming to be a superior. The latter is also known as CEO fraud.
Phishing. Emails can be disguised as legitimate messages from a specific entity (a school, lottery agency, or employment office) to illegally procure sensitive data.
Exploitation. Generative models can ‘sniff out’ and exploit vulnerabilities found in code. This technique is known as automated hacking.
Payload attacks. In essence, a payload is the part of an attack that performs the unauthorized action itself: gathering, extracting, or erasing highly sensitive data. Additionally, it can be used for spotting vulnerabilities such as backdoors.

Malicious GenAI continues to evolve as new models based on the Generative Pre-trained Transformer (GPT) emerge that are aimed specifically at cybercrime. According to a report, ChatGPT has been mentioned 27,000 times on dark web forums and in Telegram channels dedicated to cyber fraud.

The Rise of Malicious GenAI Copycats

Only the first two GPT models are open-source, while the weights of later iterations have not been publicly released. This is why it’s hard to identify how the main malicious counterparts of ChatGPT, WormGPT and FraudGPT, came about. There are two possible scenarios:

Scenario 1: The authors used an early version of the GPT models as a foundation for their own chatbot models.
Scenario 2: An open-source solution was used instead, with “GPT” being a label added for promotional purposes. At the moment, there is a host of open-source models designed to imitate ChatGPT: GPT-J, GPT4All, OpenChatKit, Vicuna, and others.

WormGPT and FraudGPT stirred controversy after their introduction in July 2023. They are advertised as easy-to-use solutions that allow users to automate fraud (for a subscription fee, of course). The training datasets and algorithms behind these tools are undisclosed.

FraudGPT was described by its creator as a “cutting-edge tool” for online fraud, while Dazed magazine reported that it enabled hackers to “perform cyberattacks on a never-before-seen scale”. At the same time, some experts were rather skeptical about the tools’ efficiency, suggesting that both models are decoys designed to attract users who can’t create malware themselves; in essence, scamming would-be scammers.

While neither of them is likely to be an unrivaled tool for automated hacking, the threats posed by these two models remain real. A simulated BEC attack, generated at the request of the SlashNext team, showed that WormGPT can compose a persuasive text, as shown in the illustration below.

What further aggravates the matter is that neither of the cybercrime tools has any ethical limitations whatsoever, unlike ChatGPT, which has to be tricked with a jailbreak maneuver into creating harmful content.

WormGPT

WormGPT is a black hat tool designed for generating malicious content in natural language and code. It was allegedly launched on July 13, 2023, roughly seven and a half months after ChatGPT’s public release in November 2022. It is available in two versions: V1 with a monthly fee of €100 and V2 priced at €550.

Seemingly, it is based on GPT-J-6B, developed by EleutherAI. GPT-J-6B is an autoregressive, decoder-only Generative Pre-trained Transformer model with 28 layers, trained with Mesh Transformer JAX, a library built on JAX and Haiku. The model has 6 billion parameters and learns language representations that can be used for feature extraction in downstream tasks.
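To see how the 28 layers and the 6 billion parameters quoted above relate to each other, here is a rough back-of-the-envelope estimate in Python. It is only a sketch: the hidden size of 4096 and the vocabulary size of 50,400 are assumptions taken from the publicly listed GPT-J hyperparameters, not from this article, and the formula ignores biases and layer-norm weights.

```python
# Rough parameter count for a GPT-J-sized decoder-only transformer.
N_LAYERS = 28        # stated in the article
D_MODEL = 4096       # assumption: hidden size from the public GPT-J model card
VOCAB_SIZE = 50400   # assumption: rows in the embedding matrix


def estimate_parameters(n_layers: int, d_model: int, vocab_size: int,
                        tied_embeddings: bool = False) -> int:
    """Approximate weight count, ignoring biases and layer norms."""
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    feed_forward = 8 * d_model * d_model   # d -> 4d and 4d -> d projections
    embeddings = vocab_size * d_model      # input token embeddings
    if not tied_embeddings:
        embeddings *= 2                    # separate output projection
    return n_layers * (attention + feed_forward) + embeddings


total = estimate_parameters(N_LAYERS, D_MODEL, VOCAB_SIZE)
print(f"{total / 1e9:.2f} billion parameters")  # prints "6.05 billion parameters"
```

Under these assumptions the estimate lands very close to the advertised 6 billion, with nearly all of the weights sitting in the 28 transformer layers rather than in the embeddings.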

Its core function is to analyze a string of text and predict the next token. While this technique is efficient for composing coherent text, GPT-J’s developers note that their model is not the best choice for writing factually accurate texts.
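The idea of predicting the next token can be illustrated with a toy example. The sketch below is not GPT-J: it is a simple bigram counter that looks only at the last word, whereas a transformer conditions on the whole preceding context. The corpus and the function names are made up for illustration. What it shares with the real model is the autoregressive loop: predict one token, append it, and repeat.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count which token follows which in a whitespace-tokenized corpus."""
    tokens = corpus.split()
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: dict, prompt: str, max_new_tokens: int = 5) -> str:
    """Greedy autoregressive decoding: always append the likeliest next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:          # token never seen with a successor
            break
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical miniature training corpus.
corpus = ("the invoice is due today . the invoice is due now . "
          "please pay the invoice")
model = train_bigram(corpus)
print(generate(model, "please pay", max_new_tokens=4))
# prints "please pay the invoice is due"
```

The output is fluent but says nothing about whether an invoice actually exists, which is the same gap between coherence and factual accuracy that GPT-J’s developers point out.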

Since the model was trained on the Pile, an 825 GiB dataset that is predominantly English, WormGPT is poorly suited to creating content in other languages. The author advises using the regular ChatGPT to translate the generated text if necessary. Interestingly, GPT-J’s developers warn that their model is not intended for deployment without fine-tuning, as it can be prompted into creating harmful content, which is, perhaps, the main selling point for WormGPT’s creator.

Among other things, WormGPT was successfully tested by the security firm SlashNext to compose a Business Email Compromise (BEC) attack: a phishing email that deceives a company into paying a fake invoice or taking another damaging action. As the company’s representative Daniel Kelley said, the “results were unsettling”.

WormGPT’s author, who calls himself Rafael Morais, states that his creation was never meant to be a “blackhat” solution, and that its sole purpose is to provide an LLM with no character limit or censorship. However, a number of restrictions have been introduced recently, banning prompts related to murder, drug trafficking, child pornography, and others.

At the same time, WormGPT is still actively advertised on dark web forums, presumably by its author, who guarantees it can code fully undetectable (FUD) malware that stays invisible to almost all antivirus software: “It will 99% sure be FUD against most AVs”.

FraudGPT

FraudGPT is another instance of malevolent GenAI, first noticed by the Netenrich research team in late July 2023. Pricing for the chatbot ranges from $200 to $1,700. Among other things, it is promoted as a tool for:

Spear phishing.
Designing malware.
Locating non-VBV BINs.
Searching for cardable websites.
Performing Business Email Compromise, and more.

It is unknown which architecture FraudGPT is built on. According to The Times of India, it somehow uses GPT-3 as its primary framework, even though the GPT-3 model is not open-source. Just like its predecessor, FraudGPT exploits the following elements:

Social engineering. By using a specific tone (authority, reliability, sympathy), the model can evoke trust.
Personalization. All content meant for a human addressee is highly personalized, taking into account tone, style, professional occupation and position, dialectal nuances, personal interests, and so on.
Emotionality. The model can add emotional triggers (a feeling of urgency, curiosity, or greed) to manipulate a target into performing a certain action.

However, fraudulent messages can often be detected through a common indicator: they urge the addressee to perform a specific action right away, such as clicking a link or transferring a sum of money. By instilling a sense of urgency, the fraudster can bypass the critical thinking the victim would normally apply to such situations and trick them into acting.

DarkBART

DarkBART is the third Large Language Model (LLM) designed for nefarious purposes. Supposedly, it’s still in development, as it doesn’t seem to be available at the moment. It derives its name from Google Bard, a generative chatbot based on LaMDA, and is promised to be compatible with Google Lens. This means that the newer iteration of malicious GenAI will be able to integrate images into its content.

DarkBART should not be confused with DarkBERT, a completely different model designed for tackling cybercrime by discovering new scam techniques early on. However, DarkBART’s author claimed to have access to DarkBERT, which can be obtained only through a specialized academic email.

An academic email can be easily bought for cybercrime purposes

Countermeasures

One of the dark web threads detected by DarkBERT

Apart from using AI text detectors, such as OpenAI Text Classifier and GPTZero, it is also important to raise awareness of GenAI attacks. This includes providing training and guidelines to both corporate staff and public servants that explain how unknown emails and other text messages should be approached. In the long run, it will help mitigate the risk of scams.

Another anti-spoofing measure is employing email verification techniques. These can involve examining the lexical content of an unknown email and scanning it for keywords such as “urgent”, “money transfer”, “payment”, and so on. It’s also important to scrutinize the sender’s email address, IP-related data, etc.
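As a rough illustration of the checks described above, the following Python sketch scans a message body for the keywords mentioned in the text and compares the sender’s domain against a list of trusted domains. The function name `screen_email`, the trusted-domain list, and the sample message are all hypothetical, and a real filter would combine many more signals (authentication results, IP reputation, and so on).

```python
import re
from email.utils import parseaddr

# Keywords taken from the text above; a real list would be much longer.
URGENCY_KEYWORDS = ("urgent", "money transfer", "payment")
# Hypothetical allowlist of domains the organization trusts.
TRUSTED_DOMAINS = {"example.com"}


def screen_email(sender: str, body: str) -> dict:
    """Flag a message that uses urgency keywords and comes from an unknown domain."""
    _, address = parseaddr(sender)                 # "Name <a@b.c>" -> "a@b.c"
    domain = address.rpartition("@")[2].lower()
    text = body.lower()
    hits = [kw for kw in URGENCY_KEYWORDS
            if re.search(rf"\b{re.escape(kw)}\b", text)]
    unknown_sender = domain not in TRUSTED_DOMAINS
    return {
        "keyword_hits": hits,
        "unknown_sender": unknown_sender,
        "suspicious": bool(hits) and unknown_sender,
    }


report = screen_email(
    "CEO <ceo@examp1e-corp.net>",   # made-up lookalike domain
    "URGENT: please complete the money transfer before noon.",
)
print(report)
```

A keyword match alone is a weak signal, since legitimate invoices also mention payments; this is why the sketch only marks a message as suspicious when the sender is also unknown.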

A promising solution for fighting cybercrime in general is the DarkBERT model. It’s partly based on RoBERTa, which serves as its base initialization model.

Pretrained on text crawled from the dark web and evaluated on the public dark web datasets DUTA and CoDA, DarkBERT can scan multiple darknet forums, something that requires a lot of time and effort if done manually. As a result, DarkBERT can help defenders stay ahead of malicious actors by detecting the newest fraud techniques and reporting them to security specialists.