Natural Language Watermarking: General Principles, Techniques, Applications, Challenges

What Is Natural Language Watermarking (NLW)?

Noam Chomsky formulated the Transformational Grammar Theory, which can be used for NLW

Natural Language Watermarking (NLW) is a technique for labeling a text with a hidden signature that is invisible to the human eye. It helps prevent plagiarism and data leakage (“traitor tracing”), makes it possible to track down tampered information, and can even support text anti-spoofing.

NLW is, however, challenging to implement, because a text consists of discrete tokens governed by syntax, which leaves little room for imperceptible changes. Both syntax and semantics have to be taken into account, since the changes introduced by watermarking can damage the structure or grammar of the text.

Several NLW methods have been proposed so far. Their shared goal is metadata binding: attaching metadata such as authorship, release date, and publishing rights to the document, so that further reuse of the information can be monitored.

Representation of traitor tracing algorithm featuring ciphertexts

General Principles of Text Watermarking

It has been argued that the best strategy for NLW is to focus on the global semantic and syntactic structure of a sentence, also known as its underlying form, a concept from Noam Chomsky’s transformational grammar theory. In this view, a sentence has two fundamental levels:

Deep structure. This part conveys the core meaning of a sentence.
Surface structure. The actual wording, which can be altered through synonyms, changes in grammatical form, or other means without changing the core meaning.

The sentences “She raised her kids alone” and “Her kids were raised by her as a single mother” share the same underlying meaning, but their surface structures are completely different.

Another approach treats NLW as a watermark embedding/extraction process. The embedding model takes three inputs: the text, a secret key, and the watermark data. The watermark can later be extracted by anyone who knows the key, while staying “invisible” to everyone else, which makes it hard for an intruder to find and erase.
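The keyed embedding/extraction idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method from the literature: the carriers are a hypothetical table of synonym pairs, the text is split on whitespace (no punctuation handling), and the secret key decides, via HMAC-SHA256, which word of each pair stands for 0 and which for 1. Without the key, an observer has no direct way to map word choices to bits.

```python
import hashlib
import hmac

# Hypothetical carrier vocabulary: each pair can encode one bit.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start"), ("help", "assist")]
WORD_TO_PAIR = {word: pair for pair in SYNONYM_PAIRS for word in pair}


def keyed_order(pair, key):
    """Order a synonym pair by HMAC-SHA256 so the word-to-bit mapping depends on the key."""
    return sorted(pair, key=lambda word: hmac.new(key, word.encode(), hashlib.sha256).digest())


def embed(text, key, bits):
    """Rewrite carrier words so that the i-th carrier encodes bits[i]."""
    out, i = [], 0
    for token in text.split():
        pair = WORD_TO_PAIR.get(token.lower())
        if pair is not None and i < len(bits):
            token = keyed_order(pair, key)[bits[i]]  # position 0 means bit 0, position 1 means bit 1
            i += 1
        out.append(token)
    if i < len(bits):
        raise ValueError("not enough carrier words for the watermark")
    return " ".join(out)


def extract(text, key, n_bits):
    """Recover up to n_bits bits using the same keyed ordering."""
    bits = []
    for token in text.split():
        pair = WORD_TO_PAIR.get(token.lower())
        if pair is not None and len(bits) < n_bits:
            bits.append(keyed_order(pair, key).index(token.lower()))
    return bits


key = b"secret-key"
cover = "we begin with a big idea and a quick test to help readers"
marked = embed(cover, key, [1, 0, 1, 1])
print(extract(marked, key, 4))  # [1, 0, 1, 1]
```

The key only hides the mapping between words and bits; an attacker who rewrites every carrier word can still wipe the mark out.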

General Principles of Adversarial Attacks on Watermarked Texts

Attackers trying to destroy a language watermark typically can:

Apply modifications that preserve the original meaning, such as translation into another language or rephrasing.
Minimally alter the meaning of a text at the sentence level. This can only be done to a few sentences; otherwise the text loses its original value.
Restructure the text by shuffling its sentences.
Add a limited number of new sentences.
Crop the text.
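The structural attacks in this list (shuffling, adding sentences, cropping) are easy to simulate when testing a scheme. The sketch below is an illustrative harness, not an established benchmark; meaning-preserving attacks such as translation or paraphrasing would need a language model and are left out. The `surviving_fraction` helper is a hypothetical proxy: it assumes each sentence carries its own watermark evidence and that detection ignores sentence order.

```python
import random


def shuffle_sentences(sentences, rng):
    """Attack: reorder the sentences of the text."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def add_sentences(sentences, extra, rng):
    """Attack: insert new sentences at random positions."""
    result = list(sentences)
    for sentence in extra:
        result.insert(rng.randrange(len(result) + 1), sentence)
    return result


def crop(sentences, keep_fraction):
    """Attack: keep only the leading part of the text."""
    n = max(1, int(len(sentences) * keep_fraction))
    return sentences[:n]


def surviving_fraction(original, attacked):
    """Share of original sentences still present verbatim, ignoring order."""
    remaining = set(attacked)
    return sum(s in remaining for s in original) / len(original)


rng = random.Random(0)
doc = [f"Sentence {i}." for i in range(10)]
attacked = crop(add_sentences(shuffle_sentences(doc, rng), ["Injected."], rng), 0.5)
print(f"{surviving_fraction(doc, attacked):.0%} of the original sentences survive")
```

Under these assumptions, shuffling alone costs nothing, while cropping removes evidence in proportion to how much text is cut.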

Lexical ambiguity, where a word can be interpreted in several ways, could also be exploited in an adversarial attack. For instance, the word “shoot” can refer both to filming footage and to firing a gun. Through recontextualization, such a word can be used to produce false information (even though this may completely change the original meaning of the text).

Lexical and syntactic ambiguity examples

Natural Language Watermarking Techniques

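Proposed NLW methods are divided into two classes, depending on whether the watermark is meant to break when the text is modified (fragile) or to survive modifications (robust):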

Fragile Natural Language Watermarking Techniques

This category includes Probabilistic Context-Free Grammar (PCFG) methods, which parse a text to recover the grammatical rules it is built from; Synonym Substitution, in which a subset of words is replaced with suitable synonyms; the translation-based approach, which “disguises” the inserted watermark as the noise that machine translation naturally produces; and other techniques.

Robust Natural Language Watermarking Techniques

Robust NLW techniques include Quadratic Residue-based Synonym Substitution, which relies on the ASCII values of the words; Sentence Tree Structure, which alters the deep sentence structure, hiding the watermark in the parsed representation of the sentence; and Sentence Level Transformation Analysis, which relies on transformations such as verb particle movement, adjective reordering, and adjunct movement within a sentence.
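How the quadratic residue is used is not spelled out here, so the following is one plausible reading, stated as an assumption: a word’s value is the sum of its ASCII codes, and the word carries bit 1 when that value is a nonzero quadratic residue modulo a secret odd prime p (checked with Euler’s criterion, a^((p-1)/2) ≡ 1 mod p), otherwise bit 0. The synonym sets and p = 101 are hypothetical. Positions whose synonym set offers only one bit value are skipped by both the embedder and the extractor.

```python
# Hypothetical synonym sets; p below is a hypothetical secret odd prime.
SYNONYM_SETS = [["big", "large", "huge"], ["quick", "fast", "rapid"], ["begin", "start", "commence"]]
WORD_TO_SET = {word: s for s in SYNONYM_SETS for word in s}


def qr_bit(word, p):
    """1 if the word's ASCII sum is a nonzero quadratic residue mod p, else 0 (Euler's criterion)."""
    a = sum(ord(c) for c in word) % p
    return 1 if a != 0 and pow(a, (p - 1) // 2, p) == 1 else 0


def usable(syn_set, p):
    """A position can carry a bit only if its synonym set offers both bit values."""
    return {qr_bit(w, p) for w in syn_set} == {0, 1}


def embed_qr(text, bits, p):
    """Swap synonyms so usable carriers encode bits; return the text and the number of bits embedded."""
    out, i = [], 0
    for token in text.lower().split():
        syn_set = WORD_TO_SET.get(token)
        if syn_set is not None and usable(syn_set, p) and i < len(bits):
            if qr_bit(token, p) != bits[i]:
                token = next(w for w in syn_set if qr_bit(w, p) == bits[i])
            i += 1
        out.append(token)
    return " ".join(out), i


def extract_qr(text, p, n_bits):
    """Read the quadratic-residue bit of the first n_bits usable carriers."""
    bits = []
    for token in text.lower().split():
        syn_set = WORD_TO_SET.get(token)
        if syn_set is not None and usable(syn_set, p) and len(bits) < n_bits:
            bits.append(qr_bit(token, p))
    return bits


p = 101
marked, n = embed_qr("we begin with a big idea and a quick test", [1, 0, 1], p)
print(marked)                    # we commence with a big idea and a quick test
print(extract_qr(marked, p, n))  # [1, 0, 1]
```

Because the bit depends on the word itself rather than its position, the extractor needs only p and the synonym table, not the original text.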

Challenges in Natural Language Watermarking

Word ambiguity again plays a troublesome role, as it hampers automatic natural language analysis. Solving this issue requires a very large amount of training and testing data to build a model capable of spotting and removing ambiguity collisions. At the same time, watermarking must remain stealthy, blending in with the author’s style and lexicon and with the grammar of the language, which is still a difficult task.