Don’t count on language transformers citing their work

An example of a language model generating incorrect text and then attempting to cite a source for the false claim. As of this writing, TLS 1.4 does not exist, and the Wikipedia article on TLS makes no mention of it.

“Truth” for language transformers

Language transformers have no concept of whether the text they’ve generated is “true”; they are designed to predict likely continuations of a piece of text. The algorithms behind this are sophisticated, but they aren’t trying to do anything other than reproduce textual patterns from the corpus they were trained on. What matters to the transformer is “Given the preceding text, what should come next?” To the transformer, it is immaterial whether the generated text is factual information or misinformation.

Consider these two prompts:

  1. “A recent study by the Harvard Medical School found that the new Novavax vaccine _”
  2. “My 2yo was never the same after she got her shot. And then she got sick anyway. Clearly, the vaccine _”

Suppose you saw two comments on the Internet, one beginning with each prompt. What would you expect to come next, based on what you have read on the Internet? (For this exercise, rely only on your memory.)

Even though you probably disagree with what one of the continuations says, you would not be wrong to give contradictory continuations of these two prompts. Taking the role of the language transformer, you haven’t actually made these claims: in this exercise your goal was not to inform whoever entered the prompt but to continue the text. Likewise, the language transformer is not evaluating whether the text it generates is true. It isn’t “taking a side” or even, in some sense, “saying” these things.


Consider possible responses you may have had to the first prompt. You could have said “… was 85% effective in preventing COVID infection 2 weeks after the vaccine was administered” or “… was found to be more effective in people who had never had COVID before.” Even though neither of these is (necessarily) supported by research, both are plausible continuations. Conversely, you are unlikely to have said something like “… caused people to spontaneously grow two additional eyes.” A language transformer would assign the first two continuations a much higher likelihood than the third, so it would tend to generate responses like them.
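To make the idea of likelihood concrete, here is a minimal sketch in Python. It uses a toy bigram model (each word is predicted only from the word immediately before it) as a stand-in for a transformer, which conditions on far more context using a learned neural network. The tiny corpus and all function names are invented for illustration. The model scores continuations by log-probability and extends text greedily; at no point does it check whether anything it produces is true.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for a transformer's training data.
CORPUS = [
    "the vaccine was effective in preventing infection",
    "the vaccine was found to be effective in adults",
    "the vaccine was effective in people who had never been infected",
    "the study found the vaccine was safe",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        vocab.update(words)
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def log_likelihood(prompt, continuation, counts, vocab):
    """Sum of log P(word | previous word) over the continuation's words.

    Add-one smoothing gives unseen word pairs a small, nonzero probability.
    """
    words = prompt.split() + continuation.split()
    size = len(vocab) + 1  # +1 reserves probability mass for unseen words
    total = 0.0
    for i in range(len(prompt.split()), len(words)):
        prev, word = words[i - 1], words[i]
        followers = counts.get(prev, Counter())
        total += math.log((followers[word] + 1) / (sum(followers.values()) + size))
    return total


def continue_text(prompt, counts, max_words=10):
    """Greedily append the most frequent next word until none is known.

    Ties go to the follower that was seen first in the corpus.
    """
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


counts, vocab = train_bigrams(CORPUS)
prompt = "the vaccine"
for continuation in ["was effective in preventing infection",
                     "caused people to grow additional eyes"]:
    print(f"{log_likelihood(prompt, continuation, counts, vocab):7.2f}  {continuation}")
print(continue_text(prompt, counts))
```

The plausible continuation scores well above the implausible one only because its word pairs are common in the corpus, and the greedy continuation prints `the vaccine was effective in preventing infection` for the same reason. Had the corpus been full of the second kind of comment, the same code would just as happily favor that.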

One way an application could cite text generated by a transformer.

Language transformers don’t look up documents at query time and then generate text based on them. They are trained long beforehand, and they attempt to generate text that continues whatever they’ve been supplied. After the transformer has generated text, separate algorithms analyze that text and look for the documents “most relevant” to it. Similarly, you could search the Internet for reliable sources that say things similar to your continuation of the first prompt, and then cite them. Neither you nor the transformer is really citing where the information came from: you made a guess and then looked for documents that seemed to support it.
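The “guess first, find support later” order can be sketched in a few lines of Python. This is a minimal illustration, not how NeevaAI or any particular product works: it assumes a bag-of-words cosine similarity as the relevance measure, and the document snippets, the generated text, and the function names are all hypothetical. The key point is that the documents play no part in producing the text; they are matched to it afterwards, one sentence at a time.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts; keeps tokens like '1.4' or '85' intact."""
    return Counter(re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cite_after_generation(generated, documents):
    """Attach to each generated sentence the most similar document.

    The text already exists before this runs; citation is a lookup,
    not a record of where the text came from.
    """
    citations = []
    for sentence in re.split(r"(?<=[.!?])\s+", generated.strip()):
        vector = bag_of_words(sentence)
        best = max(documents, key=lambda name: cosine(vector, bag_of_words(documents[name])))
        citations.append((sentence, best))
    return citations


# Hypothetical snippets standing in for a search index.
DOCUMENTS = {
    "tls_snippet": "Transport Layer Security is a cryptographic protocol. The protocol is "
                   "widely used in applications such as email, instant messaging, and voice over IP.",
    "vaccine_snippet": "The vaccine prevented 85% of cases in adults and 75% in children.",
}

# Hypothetical model output, produced without consulting DOCUMENTS.
GENERATED = ("The vaccine was very effective. TLS is widely used in applications "
             "such as email, instant messaging, and voice over IP.")

for sentence, source in cite_after_generation(GENERATED, DOCUMENTS):
    print(f"[{source}] {sentence}")
```

The vaccine sentence gets attached to the 85%/75% snippet simply because the two share words, whether or not a reader would agree that the snippet supports “very effective”.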

An application could try to identify the claims that generated text makes and re-prompt the transformer when it generates incorrect information, but these are hard problems. Identifying the claims a text makes is not straightforward, and determining whether two texts say the same thing is difficult, even for humans. If the transformer generates “the vaccine was very effective” but the research says “the vaccine prevented 85% of cases in adults and 75% in children”, is the generated text “factual”? The citer may reference that research, but a reasonable person might dispute that a vaccine is “very effective” if it is only 75% effective in children.


Together, these factors mean that applications which use transformer-generated text cannot reliably cite sources that back up that text. Take the example at the beginning of this post: there is no TLS version 1.4. While NeevaAI cites the TLS Wikipedia page, as of this writing the text “1.4” does not appear anywhere on that page. Most likely, the transformer generated “is widely used in applications such as email, instant messaging, and voice over IP” because this text appears verbatim in the second sentence of the Wikipedia article. The citer seems to generate one citation per sentence, so it likely chose Wikipedia as the source for that sentence because of the high textual similarity between the sentence and the Wikipedia text. The citer didn’t catch that the generated sentence also makes a claim about TLS 1.4, which the Wikipedia article does not.
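A sentence can overlap almost entirely with a source and still carry a claim the source never makes. The minimal Python sketch below shows this with a hypothetical generated sentence and a shortened, hypothetical snippet modeled on the opening of the TLS Wikipedia article (neither is the actual NeevaAI output or the exact article text). The `support_report` function, also hypothetical, measures what share of the sentence’s words appear in the source and lists the ones that don’t.

```python
import re

# Hypothetical stand-ins: a sentence like the one in the example, and a
# shortened snippet modeled on the opening of the TLS Wikipedia article.
GENERATED_SENTENCE = ("TLS 1.4 is widely used in applications such as email, "
                      "instant messaging, and voice over IP.")
SOURCE = ("Transport Layer Security (TLS) is a cryptographic protocol. The protocol "
          "is widely used in applications such as email, instant messaging, and voice over IP.")


def tokens(text):
    """Lowercase word tokens; version numbers like '1.4' stay one token."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())


def support_report(sentence, source):
    """Return the share of the sentence's tokens found in the source, plus the rest."""
    source_vocab = set(tokens(source))
    sentence_tokens = tokens(sentence)
    if not sentence_tokens:
        return 0.0, []
    missing = [t for t in sentence_tokens if t not in source_vocab]
    coverage = 1 - len(missing) / len(sentence_tokens)
    return coverage, missing


coverage, missing = support_report(GENERATED_SENTENCE, SOURCE)
print(f"coverage={coverage:.0%}, missing={missing}")
```

This prints `coverage=94%, missing=['1.4']`: 15 of the 16 words are supported, and the one that isn’t carries the false claim. A crude check like this would flag an unfamiliar version number, but it would not help with the “very effective” versus “85% of cases in adults and 75% in children” mismatch, where the problem is meaning rather than missing words.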
