The Trouble with Synthetic Language

Large language models are rightly getting a lot of attention at the moment with the launch of ChatGPT and GPT-4. We are also seeing a backlash against digital images generated from natural language prompts, particularly as these models are trained on the works of others without their consent or acknowledgement. We’re a little concerned by the vitriol present in these debates as AI artists and non-AI artists have more in common than not (in our opinion). Perhaps the tenor of the discourse is more a product of the platforms on which these ‘debates’ are held than the actual stakes involved.

We highly recommend an article by Benjamin Bratton and Blaise Agüera y Arcas at Noema that provides a compelling call to develop new languages for exploring artificial intelligence and its philosophy. Rather than summarizing the points included therein, I want to take up their call to “construct more nuanced vocabularies of analysis, critique, and speculation based on the weirdness right in front of us.”

Specifically, we believe that our framework for representing language in AI is flawed. We seem to be falling back into the old habit of preferring approaches that facilitate abstraction and analysis over the actual, embodied existence of the subject under consideration. We prefer studying the butterfly pinned to the board to the one fluttering over the pond.

Finding other approaches to think about and talk about language and AI was a focus of our work in 2023.

Today, we want to lay out some initial sketches of where we plan to spend our time going forward and to help clarify our own thinking on this topic. We would love your thoughts and feedback.

The Nature of Language

James C. Scott, in Seeing Like a State, describes language as the “joint historical creation of millions of speakers”. This is hardly an egalitarian process, he offers, with ‘experts’, often backed by the power of the state, having a disproportionate influence on a language’s trajectory. However, he adds, language is not particularly “amenable to a dictatorship, either.” Despite the efforts at order that a state or other organized group might make to fix and control how language is used, it “stubbornly tends to go on its own rich, multivalent, colorful way.”

Synthetic language draws on statistical associations present in very large data sets to make predictions about what a coherent and appropriate response to a prompt might be. While models are frequently updated with new data, synthetic language is always looking backwards, to past patterns and connections, in order to predict and create the future.
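To make this backward-looking quality concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT or GPT-4 work internally; it assumes a toy "bigram" model that only counts which word followed which in a made-up corpus. The function names (`train_bigrams`, `predict_next`) and the corpus are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words followed it in the past text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent past follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical corpus, standing in for the "very large data sets".
corpus = "some weather we are having . some weather indeed . some day"
model = train_bigrams(corpus)

print(predict_next(model, "some"))   # weather
print(predict_next(model, "madam"))  # None
```

The prediction for "some" is whatever most often followed it before. A word the corpus has never seen yields nothing at all, however obvious the living context might make it.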

With language in the “wild”, a human being is presented with a particular context to which they must respond. This context is potentially massive, including questions of relationality, status, environment, the properties of the dialogue to that point, history, past encounters, and much more. Sometimes the response is fairly unreflective, and we offer up phrases with very little intention involved.

“Some weather, heh?”

“You bet”

Regardless of the degree of improvisation involved, language acts respond to the ever-becoming world around us and look forward, anticipating how a receiver will make meaning from them. Every utterance is a contribution to a long thread of dialogue that is socially produced. Meaning is unendingly renewed through the participation of ‘others’ who attempt to assemble our communications into a coherent whole, replete with meaning and intention.

“Why did they ask that question in that way?”

“The way they said that made me worry.”

As Mikhail Bakhtin offers, there is a “fundamental differentiation between language in its repeatable aspect (the topic for linguistics), and the particular linguistic utterance which carries and enacts relationships between actual people”. The former is much easier for cognitive technologies to work with and so the latter is often omitted or pushed to the margins. But this is a fundamental misunderstanding of what language is and how it operates in the world.

The direct meaning of a word matters far less than the “actual and always self-interested use to which this meaning is put and the way it is expressed by the speaker, a use determined by the speaker's position (profession, social class, etc.) and by the concrete situation. Who speaks and under what conditions he speaks: this is what determines the word's actual meaning.”

Language, in short, is a means to get things done. And when we produce language, we make choices about how best to realize our intentions.

We do this by casting ourselves out into the consciousness of others in order to assess how a particular set of choices might be interpreted and constituted by that ‘other’.

Word choice, intonation, and stylization, even in the most banal of encounters, are done with intention in order to shape the ever-emerging dialogue in one direction or another.

By looking at language as patterns that can be extracted from the lived context and fluid evolution of dialogue over time, we are trying to make sense of a forest by cutting out a single plant and bringing it back to our lab. There is much to learn from that dead plant, yes, but mistaking it for the forest is a fundamental misunderstanding of what language is and what it does.

A dialogic approach puts the epistemological focus on intersubjectivity. Things do not exist in themselves, but rather in the relations they establish with each other. Meaning therefore incorporates characteristics of the immediate context as well as historical and social contexts of performance or social action.

Double Voicedness

A specific case discussed at length by Mikhail Bakhtin (almost a century ago) is the idea of double (or multi) voicedness. His application to the novel, particularly the work of Dostoevsky, is incredible, but I will look at its application in more quotidian encounters.

In speech, in writing, in texts, and elsewhere, we are constantly reusing fragments of speech from others and inflecting them with different valences or refracting them in subtle ways. Parody, as an example, is taking the language of others (the actual words, or the style, or the intonation, etc.) and re-presenting it with an inflection that suggests a ‘bringing down’ or another potential perspective that complicates the intentions of the original speech act. We often do this unreflectively and in service to humour or resistance to official ideologies and systems.

If I open the door for my wife and remark “Madam”, I am reusing and performing someone else’s speech (a polite French-speaking gentleman in this case) in an effort to inflect the situation with particular meaning. I am “performing” as a gentleman but in a double voiced way.

I use phrasal verbs (give up) more frequently in informal contexts and Latinate verbs (surrender) in more formal contexts. But then I might use more formal language in an informal situation to deprecate myself, to acknowledge that I am putting on ‘airs’, to suggest an intersection with more formal concerns, or a combination thereof.

For a large language model to be able to interpret this double voicedness would require access to experiences of the world nearly infinite in their potential scope. It may produce examples of parody or the like, if they are captured in its training set, but the double voiced quality is lost. The system is responding to a very small sliver of context. Any hope for nuance (or humour) relies on accident.

To be clear, my worry here is not that AI doesn’t “get” situated and double voiced speech. Rather, my worry is that as the volume of synthetic language surpasses the amount produced by actual humans in actual contexts, we will begin surrendering our capacity for doubled speech (parody, stylization) in order to be more legible to these systems.

Forcing accounts on Twitter to put the word “parody” in their name or face punishment is a hilarious and childish example of a potential future. Algorithms are bad at interpreting doubled (or multi) voicedness and so we must avoid confusing them (or their tech overlords). My sense is that parody, obfuscation, and multi-voicedness will be critical capacities for dealing with the world we’re in and the one that’s coming.

