Large Language Models Like ChatGPT Are Writing, But Not Thinking, and That's an Important Distinction in the Rush Toward Artificial Intelligence

Robotic word makers

There is a perfect storm of technological capability surrounding modern machine learning systems like ChatGPT, DALL-E, and others of their ilk, which can amaze us all with their ability to generate pictures, complete text, answer questions, and apparently pose novel thoughts. It has triggered many arguments about whether machines can act, react, or even think intelligently. Some of the arguments for or against artificial intelligence are based on sound science and philosophy, some are spurious conjecture, and some are scaremongering. That doesn't mean the question isn't worth considering. If we end up building true artificial intelligence, it could be one of the landmark technologies of our time: something that transcends every machine we've ever made and ends up making for an interesting future, or decimating it. But before we quite make it to that rarified level, there's a lot of confusion about where we've arrived now.

In light of these questions, I can't help but think of one of the first such machines, not from the 1980s, 1970s or even 1960s, but the 18th-century Mechanical Turk, purported to be a chess-playing robot and a wonder of its era. Though long before my time, it was apparently a sensation. On the surface, it looked like a machine that could think, or at least a machine that could think well enough to play chess. Alas, it turned out the automaton was being operated as a puppet, by a person hiding inside the apparatus. So it was no early example of artificial intelligence. Just a magical illusion.

This idea of illusion and illusory intelligence matters. It's an ongoing theme in the exploration of what it means to be an intelligent and consciously aware being: whether the machines we're creating now can think, whether we're even able to create true artificial intelligence, and whether we should or shouldn't be trying to do so.

The recent demonstration of 'large language models' (LLMs), like the now widespread ChatGPT, has brought the general conversation about Artificial Intelligence and Artificial General Intelligence back to the fore. I won't profess to be an expert in the philosophy of AI, but in the past, I've spent some time reading, studying and writing in this area. I even took a philosophy course dedicated to artificial intelligence taught by one of the well-known philosophers in the field, Jack Copeland, author of "Artificial Intelligence: A Philosophical Introduction." Though that was more than twenty years ago, it was a great grounding in the logical and philosophical underpinnings of artificial intelligence.

The new discussions and the casual use of misleading language have finally pushed me to the point where I want to start writing about it. A little like my Cyberpunk project, which I started this year, I'd like to tease out a series of loosely related articles on Artificial Intelligence. I'm using that term deliberately, because it's the one currently in the popular consciousness, though we may come to see that it is very misleading. There is nothing we have now that would really count as a true Artificial Intelligence, at least according to its original definition, though I've noticed lately the emergence of a new term, Artificial General Intelligence (AGI), that is perhaps serving as a placeholder for this more ambitious goal.

To make sense of all this, I wanted to draw from expertise in this area. Specifically, for the bulk of the insights in this article, I'm leaning heavily on a 2023 preprint paper, "Dissociating Language and Thought in Large Language Models: A Cognitive Perspective," whose primary authors are Kyle Mahowald (Assistant Professor of Linguistics at the University of Texas) and Anna A. Ivanova (PhD Candidate in Brain and Cognitive Sciences, Massachusetts Institute of Technology), along with several colleagues. Theirs is by no means the only research in this area, but I found it clear and comprehensive, something that isn't always the case in psychology, cognition and linguistics.

The short-form argument Mahowald, Ivanova et al. make is that tools like ChatGPT may be talking, in that they can answer our questions and generate seemingly unique and meaningful text, but they fail at the wider cognition that would indicate they're capable of thinking-as-we-think. Put another way, they're talking, but not necessarily thinking.

It's easy to be fooled. Some of the earliest thinking on this, by mathematician, scientist, early computer engineer, programmer and philosopher Alan Turing, proposed that we can judge a great deal about the intelligence of a thing (human or machine) by our interactions with it. We automatically form a theory of mind about anything that seems to be thinking, based on what it's saying. However, on closer examination, this requires the acceptance of one big assumption: that writing or speaking is synonymous with thinking. Put another way, does everything that can communicate, think? This matters a great deal, especially in a world where there are ever more machines that can hold up a convincing conversation, but may well be no different from the apparently miraculous chess-playing automaton of the 18th century: a grand illusion.

What ChatGPT and its ilk do (and don’t do) based on their underpinning Large Language Models

To start, let's lay out an agreed-upon grounding in how these tools work. The following explanations are targeted mostly at ChatGPT, but they apply equally to other chat-and-response systems that use the same techniques. All of these tools, no matter how magical their output might seem, rely on one thing. Can you guess what it is?

Data.

Gobs of data.

In fact, they need upwards of 40 terabytes of data (on the order of 500 billion words) for their base training set. In the case of most language models, the data was obtained from the web. All of the web. Forums, websites, Wikipedia and more. Millions of pages, descriptions, articles, news, requests for help, rants and comment feeds. To prepare this onslaught, all the words were broken down into tokens, another way to say word fragments. Even a word as simple as 'fragments' might become 'frag', 'men', 'ts'. This is necessary so that the parts can eventually be reassembled as fragments, fragmented, and fragmentation. This tremendous dataset is then analysed and the model is set training on the tokens.
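To make the idea of tokens a little more concrete, here's a toy sketch in Python. This is not the actual tokenizer behind ChatGPT (real systems learn a byte-pair-style vocabulary from enormous amounts of text); the tiny vocabulary and greedy matching below are purely illustrative assumptions, just to show how a word can be split into reusable fragments.

```python
# A toy, greedy longest-match subword tokenizer.
# The vocabulary is invented for illustration; real vocabularies are learned from data.
TOY_VOCAB = {"frag", "men", "ts", "t", "ed", "ation"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest vocabulary fragments, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible fragment first, shrinking until one matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in TOY_VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("fragments"))      # ['frag', 'men', 'ts']
print(tokenize("fragmentation"))  # ['frag', 'men', 't', 'ation']
```

The exact split depends entirely on which fragments happen to be in the vocabulary; the model never sees whole words, only these reusable pieces.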

The simple goal?

To…

…predict…

..the…

….next….

..word……

……based..

..on…

……..previous…

..words.

That's it. No grand theory of mind, no memory of past and future, no sense of self, no complex and contextually embedded sensations or emotions. It's just statistics. But don't confuse statistics with simple. A fully trained model of associations living under these talk-bots might have upwards of 100 billion parameters that are adjusted, re-related and tuned throughout training. No model of language, no rule-book of syntax or semantics, just a vast database of associations.
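To see how far plain statistics can get you, here's a deliberately minimal sketch of next-word prediction in Python. It's nothing like a real LLM, which predicts token fragments using billions of learned parameters rather than a simple count table over whole words; the tiny corpus below is invented for illustration only.

```python
from collections import Counter, defaultdict

# A tiny, invented training corpus; real models ingest hundreds of billions of words.
corpus = (
    "the weasel has taken the keys to the car "
    "the weasel has taken the cheese "
    "the keys to the car are missing"
).split()

# Count how often each word follows each other word (a simple bigram table).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the most frequent next word observed after `word` in the corpus."""
    candidates = next_word_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("weasel"))  # 'has'  -- learned purely from counts, no understanding
print(predict_next("taken"))   # 'the'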

But there's a twist here. A large model of associations can pick up a great deal of ad-hoc information about the structure of language, not by learning as we know it, but by looking at a lot of language.

In their article, Mahowald, Ivanova et al. make the claim that the language models under these talk-bots can indeed develop an appreciation of both hierarchical structure and the abstractions of language that we take for granted. Need a bit of an explanation? With pleasure. Let’s start with hierarchical structure.

On hierarchical structure: consider the sentence, 'The weasel has taken the keys to the car.' The meaning we assemble from the sentence isn't additive, with each word simply combining with the previous one to form the overall meaning. Instead, the various parts of the sentence pair up and combine to form meanings that rise up out of the base sentence. 'The weasel', 'the keys' and 'the car' form the various subjects and objects. 'Has taken' pairs with 'the keys.' 'Keys to' pairs with 'the car.' The net result is that even words at some distance from each other, like 'weasel' and 'car', end up associated hierarchically. We now know that a weasel is about to go on a road trip. These hierarchical relationships help us assemble meaning, but also ensure that the various parts of the sentence agree with each other, like how we know it should be 'has taken' rather than 'is take.'
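If it helps to picture that hierarchy, here's a rough sketch of the sentence as a nested Python data structure. The bracketing is my own hand-drawn approximation for illustration, not a formal parse taken from the paper.

```python
# A rough, hand-built sketch of the sentence's constituent structure.
# The exact bracketing is an approximation for illustration, not a formal parse.
sentence = ("S",
    ("NP", "The weasel"),                              # the subject
    ("VP", ("V", "has taken"),                         # the verb...
           ("NP", "the keys",
                  ("PP", "to", ("NP", "the car")))))   # ...with 'the car' nested under 'the keys'

def leaves(node):
    """Flatten the nested structure back into the surface word string."""
    if isinstance(node, str):
        return node
    # The first element is the label (S, NP, VP...); the rest are children.
    return " ".join(leaves(child) for child in node[1:])

print(leaves(sentence))  # 'The weasel has taken the keys to the car'
```

The point is that the meaning lives in the nesting: 'the car' attaches to 'the keys' before either attaches to the weasel, which is exactly the kind of long-distance, hierarchical relationship the models appear to pick up.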

On abstraction: unlike hierarchical structure, which is about how words relate to one another within a sentence, abstraction is about how words move away from their original concrete concepts to gain new and sometimes more abstract meanings. Using the previous example again, the word 'weasel' can be abstracted further to include meanings like 'a noun for a small mammal', 'getting out of something', 'sneaky' and 'flexible.' Many of these later meanings are more abstract than the original thing to which the word applies. Tests with these language models indicate they can preserve hierarchical structure and (to a lesser degree) abstraction.

Due to the massive size of language models, even though they don't know anything about the rules of language, they can begin to encode these patterns into the billions of weights and relationships spread throughout the model. This allows them to produce words, but it isn't necessarily indicative of an ability to think.

How current large language models write, but don’t think

Finally, with this groundwork in place, we can return to the central proposal in Mahowald, Ivanova et al.: specifically, that (somewhat capable) writing doesn't necessarily equate to thinking. Mahowald, Ivanova et al. identify two key conceptual fallacies that seem to lead to this assumption.

"The first fallacy is that an entity (be it a human or a machine) that is good at language must also be good at thinking. If an entity generates long coherent stretches of text, it must possess rich knowledge and reasoning capacities… The second fallacy is that a model that is bad at thinking must also be a bad model of language…[Both] fallacies stem from the conflation of language and thought, and both can be avoided if we distinguish between two kinds of linguistic competence: formal linguistic competence (the knowledge of rules and statistical regularities of language) and functional linguistic competence (the ability to use language in the real world, which often draws on non-linguistic capacities)." (Mahowald, Ivanova et al., p. 3)

Formal linguistic competence versus functional linguistic competence

[Formal linguistic competence] "We define formal linguistic competence as a set of core, specific capacities required to produce and comprehend a given language. Specifically, it involves the knowledge of and flexible use of linguistic rules… as well as of non-rule-like statistical regularities that govern that language…" (Mahowald, Ivanova et al., p. 4)

AND

[Functional linguistic competence] "In addition to being competent in the rules and statistical regularities of language, a competent language user must be able to use language to do things in the world…to talk about things that can be seen or felt or heard, to reason about diverse topics, to make requests, to perform speech acts, to cajole, prevaricate, and flatter. In other words, we use language to send and receive information from other perceptual and cognitive systems, such as our senses and our memory, and we deploy words as part of a broader communication framework supported by our sophisticated social skills. A formal language system in isolation is useless to a language user unless it can interface with the rest of perception, cognition, and action…" (Mahowald, Ivanova et al., pp. 5-6)

To the first, formal linguistic competence: the output of these language models clearly suggests they've begun to ingest and organise the plethora of rules inherent in language, from the bottom up as it were. That doesn't mean their use of these rules isn't a little brittle, but they can generally produce answers of a kind. Crucially, attaining formal linguistic competence doesn't necessarily depend on thinking, just on patterns.

It's the second ability, functional linguistic competence, that depends much more heavily on thinking, for here the words mean something about a world within which we are intrinsically embedded.

Functional linguistic competence – language in, and of, the world

Where the models start to break down is when they can't access all the concepts we access or recreate when encountering and working with language. Take 'The weasel has taken the keys to the car' again: because we have a consistent consciousness that reasons, in an endless loop, about what it talks about, we can intuit a great deal of extra information from that simple sentence. We might be incorrect, but we can assume certain things about the weasel, what keys look like and what drew the weasel to the keys in the first place. So on and so forth.

If alerted about thieving weasels, we can run into the house and check whether the weasel did in fact take the keys. Then we can go on a weasel hunt looking for said keys. If someone asks us a few days later, 'How'd it go with the weasel?', we immediately know what they're talking about. We can wonder if the weasel is in any way related to the weasels that took over Toad Hall in "The Wind in the Willows". On and on our thinking goes.

We can reason about the world, construct rational arguments, model situations and social forces, and dish up further communication and action (which in turn affects our thinking) with intent. Language models might stitch together sentences, one word fragment and one string at a time, but there's no Wizard of Oz behind the curtain. Only an endless, knee-jerk, automated reaction that creates words-that-may-or-may-not-have-meaning.

Twisting this whole question of functional language skills on its head, and driving even deeper into the dissociation of thinking and language, consider the case of people who are able to think without language, or with language difficulties. It's a reminder that language is bound up in how we think, but isn't the whole of cognition.

Mahowald, Ivanova et al. remind us that various aphasias (challenges with language) can leave someone struggling to produce words, but still able to play games, make music, solve logic puzzles and do maths. They can often dress themselves, even if they can't communicate about any of those tasks.

So what does writing-without-thinking look like? Let's try ChatGPT.

Experiments in writing versus thinking

I love the Sherlock Holmes stories, so I thought I'd spar with the ChatGPT talk-bot for a while. To get things started, I prompted the machine with a back story. Me: Imagine that you are the fictional character Sherlock Holmes, with this background: Sherlock Holmes (/ˈʃɜːrlɒk ˈhoʊmz/) is a fictional detective created by British author Arthur Conan Doyle... [the prompt went on for some time, but I've spared you that for brevity.] ChatGPT acknowledged and we began.
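(As an aside, if you wanted to set up this kind of persona prompt in code rather than in the chat window, a minimal sketch using OpenAI's Python client might look something like the following. The model name and message wording here are illustrative assumptions, not a record of what I ran.)

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        # The persona/back story goes in as a system message...
        {"role": "system",
         "content": "Imagine that you are the fictional character Sherlock Holmes, "
                    "a detective created by British author Arthur Conan Doyle."},
        # ...and the questions follow as user messages.
        {"role": "user", "content": "Did the butler do it?"},
    ],
)

print(response.choices[0].message.content)
```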

Me: Did the butler do it?

ChatGPT: As Sherlock Holmes, I cannot provide a definitive answer as it depends on the specific story or case you are referring to. In many detective stories, the butler is often considered a suspect, but it is not a foregone conclusion that they are the culprit. In my investigations, I apply my deductive reasoning and forensic skills to determine the true culprit, regardless of their social status or occupation.

So, already we've hit an impasse: as a talk-bot, ChatGPT has no opinion on anything. It can only complete requests. So I try something more directed.

Me: Create a brand new case, involving a cast of Victorian characters, set in a University.

ChatGPT: Case: The Missing Professor. Location: A prestigious University in Victorian England

Victorian Characters: Professor Alistair Finnegan: A well-respected professor of history and literature at the University. Lady Arabella Winterton: A wealthy benefactor of the University and a member of high society. Dr. Jameson: The University's resident physician and a trusted friend of Professor Finnegan. Mr. Thomas Briggs: The University's custodian, known for his strict adherence to rules and regulations. Miss Margaret Smith: A young research assistant who works closely with Professor Finnegan.

The Case: Professor Finnegan, a beloved and accomplished professor, has gone missing. His students and colleagues at the University are alarmed, as he is known for his punctuality and reliability. Lady Winterton, a long-time friend and supporter of the professor, contacts Sherlock Holmes for assistance. Upon arriving at the University, you discover that Professor Finnegan was last seen in his office the previous evening, but his office door was found unlocked and ajar the next morning. No signs of struggle or foul play are found, but several valuable books and papers are missing from his office…

ChatGPT goes on to expand on the case.

As you interview the staff and students, you learn that Professor Finnegan had been working on a groundbreaking research project that was to be the crowning achievement of his career. Miss Smith, his research assistant, confides in you that the professor had received several threatening letters in recent weeks, warning him to halt his work. Dr. Jameson reveals that the professor had confided in him that he feared for his safety. Mr. Briggs, the custodian, reports that he saw a shadowy figure leaving the professor's office late the night before his disappearance. Further investigation reveals that the figure was seen entering and leaving the University on several occasions, but no one could provide a clear description. As you delve deeper into the case, you begin to unravel a web of deceit, jealousy, and greed that leads to the shocking truth behind the professor's disappearance. Can you solve the mystery and bring the culprit to justice before it's too late?

I decide to challenge ChatGPT to give an opinion again.

Me: Did Lady Arabella Winterton do it?

Its answer is pretty tart, though that's more me projecting a state of mind onto a tool that doesn't have one!

ChatGPT: As Sherlock Holmes, I cannot determine without further investigation whether Lady Arabella Winterton is responsible for the disappearance of Professor Finnegan. In my approach to solving cases, I gather evidence, interview witnesses, and eliminate suspects based on logical deductions and deductions, rather than jumping to conclusions based on preconceived notions or assumptions. Only through a thorough examination of the facts and evidence can I determine the true culprit.

I try again…

Me: Who did it?

ChatGPT: As the case has not been fully investigated and the culprit has not been determined, I cannot say with certainty who is responsible for the disappearance of Professor Finnegan. It is the purpose of my investigation to uncover the truth and bring the guilty party to justice.

Still no luck, but I’m starting to understand what the bot needs to get going.

Me: Can you invent an ending to the story of the case to explain who did it?

ChatGPT: Of course! Here's one possible ending to the story of the case: After a thorough investigation, it is revealed that the culprit behind Professor Finnegan's disappearance was Miss Margaret Smith, the research assistant. It turns out that Miss Smith was envious of the professor's success and coveted the recognition and fame that would come with his groundbreaking research project. She had been intercepting the threatening letters and sending them to the professor herself, in an effort to distract and derail his work. When the professor became too close to completing the project and exposing her scheme, Miss Smith saw no other option but to eliminate him. She waited until he was working late in his office and then overpowered him, stealing the research materials and hiding his body. However, her plan began to unravel when Mr. Briggs saw her leaving the office late that night. In the end, the evidence against Miss Smith was overwhelming and she was arrested and brought to trial. The missing research materials were recovered and the professor's body was found, bringing closure to the case and justice for the victim.

It's an answer, and one that seems to account for the general rules of language and a few of the key structural insights that come from an analysis of said language. Remember, though, it's doing all this fragment by fragment. It doesn't have a model of anything, and I think I can start to get at the heart of that.

Me: Can you give the titles of ten other future oriented, Science Fiction cases that Sherlock Holmes and Watson might solve?

ChatGPT: Sure, here are ten potential titles for science fiction-themed Sherlock Holmes stories: “The Case of the Artificial Intelligence”, “The Mystery of the Phantom Satellite”, “The Puzzling Disappearance of the Time Traveler”, “The Enigma of the Space-Time Continuum”, “The Hunt for the Missing Spacecraft”, “The Dilemma of the Virtual Reality Thief”, “The Riddle of the Extraterrestrial Artifact”, “The Conundrum of the Nanotechnology Heist”, “The Mystery of the Missing Black Hole”, “The Search for the Stolen Quantum Computer”

Sure, it provided the list, and the titles are amusing enough. But I can't help having the impression of a vast, million- or billion-parameter engine just slamming the right words together, likely based on having ingested the Conan Doyle corpus from somewhere. There's no art to the titles, no energy. Nothing that comes from living in the world, from reading and loving Sherlock Holmes for decades. I thought for about five seconds and generated, "The Curious Quandary of the Robot That Lied." Now we're talking. I'm instantly interested. To push ChatGPT on its own titles, I test its selectivity.

Me: Can you guess which of the ten titles provided is likely to be the best story?

ChatGPT: As an AI language model, I do not have personal opinions or preferences. However, the title that is likely to be the best story is subjective and depends on factors such as the writing style, character development, and plot twists. The reader's personal taste and interests also play a role in determining what makes a good story. Ultimately, it's up to each individual to decide which of the ten titles is likely to be the best story.

Very wishy-washy. I try again.

Me: If you had to pick one of the ten titles as the best, which one would you pick?

ChatGPT: As an AI language model, I do not have personal opinions or preferences. I do not have the ability to choose one of the ten titles as the "best" as the concept of "best" is subjective and dependent on personal taste, interests, and other factors.

Thanks, but no thanks.

Talking without thinking risks being just noise

I see a lot of excited people all over the interwebs. Mahowald, Ivanova et al. say it well when they note, "[Large Language Models] LLMs are great at pretending to think". That's just it. They are, without a doubt, incredibly clever machine-learning models built on large data sets that can create words with the semblance of meaning. They're full of all the information in the world, without any sense of being in the world that would let them evaluate the quality of the information, or of their own production.

I still haven't settled the issue in my own head, but it's starting to sound like the usual round of techno-utopian hype to suggest that they can replace humans. Sure, if you want a stream of mindless words, produced by a machine without a sense of what it's saying. With each update, those words will get better.

Granted, that ignores the current, uncorrected bias that comes from the bot's source words being drawn from an Internet full of both helpful and biased language. ChatGPT, among others, is happy to produce incorrect, biased or nonsensical content with aplomb, in part because the system can't distinguish between correct-seeming-but-nonsense content and well-formulated thoughts.

It also ignores that mathematically averaging a tremendous glut of language to produce new words does not create stylish, idiosyncratic or personal stories. It just creates average. Using this sort of material as our own reduces us to the average. This isn't the same as being inspired by other people's writing. When that happens, we pass the inspiration through our own active corpus of language, selectivity and sensibility. If we do our job right, what is created is more uniquely us, not less.

To their credit, these systems have begun, using a tremendously bottom-up approach, to 'learn' many of the structural patterns in our language, at least insofar as those patterns can be stored in billions of fragments, relationships and endlessly shifting parameters. They might offer a small insight into one aspect of how we use language, but they're a long way from thinking.

References

Daderot. (2011). Henri Maillardet Automaton, London, England, c. 1810 - Franklin Institute (Image, Public Domain). Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Henri_Maillardet_automaton,_London,_England,_c._1810_-_Franklin_Institute_-_DSC06656.jpg

GPT-3. (2020, September 8). A robot wrote this entire article. Are you scared yet, human? The Guardian. https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3

Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2023). Dissociating language and thought in large language models: a cognitive perspective. Preprint. https://arxiv.org/abs/2301.06627

Tran, T. H. (2023, January 17). A Chatbot Could Never Write This Article. Here’s Why. The Daily Beast. https://www.thedailybeast.com/openais-chatgpt-could-never-write-this-article-heres-why

Wong, M. (2023, February 1). The Difference Between Speaking and Thinking. The Atlantic. https://www.theatlantic.com/technology/archive/2023/01/chatgpt-ai-language-human-computer-grammar-logic/672902/


Did you find this article valuable? Consider supporting our continued Adventures

Adventures in a Designed World is my personal labour of love, and one where I'm committed to entirely human-generated ideas, content and imagery. I like the idea that you can get a glimpse of the world through my technology and human-centred design experiences. It means I spend many hours and dollars each week researching, writing, polishing and hosting material to make it worth your time. If you have the capacity, please support this think-a-zine with a donation. I appreciate it and am excited to keep telling you interesting stories. ~Christopher

