If Noam Chomsky and ChatGPT sat in a bar to chat about language, the first thing I would want to know is if they were arguing or agreeing. Initially I assumed they would argue, but now I am not so sure.
I am sure that folks who specialize in computational linguistics will have opinions on this. If you know someone, please share this newsletter with them and ask them to chime in!
I recently spoke with Ted Gibson, an expert on these things, on Merging Minds. We dove into a number of things including Chomsky, LLMs, and much more. Click here to listen to our conversation.
Chomsky vs. ChatGPT
Chomsky theorizes that human language acquisition is based on innate structures that allow us to learn language rapidly. Whether “innate structures” are the reason or not, we do know that humans acquire language from far fewer words than Machine Language Learning requires. Humans need a few tens of millions of words, whereas machines need billions to trillions.
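A rough back-of-the-envelope calculation makes the gap vivid. The figures below are ballpark assumptions for illustration, not precise measurements:

```python
# Ballpark figures only: estimates of human lifetime exposure vary,
# and training-set sizes differ from model to model.
human_words = 50_000_000        # rough lifetime word exposure by adulthood
llm_words = 1_000_000_000_000   # ~1 trillion tokens, a common frontier-model scale

ratio = llm_words / human_words
print(f"The LLM sees roughly {ratio:,.0f}x more words than a human learner.")
# → The LLM sees roughly 20,000x more words than a human learner.
```

Even if either estimate is off by an order of magnitude, the efficiency gap remains enormous.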
The question is whether human language learning (according to Chomsky) and Machine Language Learning are different in terms of structure, or simply different in terms of scale.
It is probably a little bit of both.
The Brain is a Super-Transformer
At first, I thought that the difference was structural. Chomsky argued that humans have an innate understanding of language structures that allows them to rapidly acquire language skills. Basically, that the brain is hardwired to process the language inputs it reads and hears into an unconscious understanding of how language works. This seemed different from how LLMs are trained.
But then I started to think about the transformer architecture used in LLMs. A transformer is a type of neural network that turns input sequences into output sequences by tracking the relationships between the elements of those sequences (a mechanism called attention). In the simplest terms, this means it reads, tracks the relationships between words and sentences, then it writes. Or it hears, tracks relationships, and then speaks.
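To make the “reads, tracks relationships, writes” description concrete, here is a toy sketch of the attention step at the heart of a transformer. Everything here is illustrative: the vectors are made-up numbers, and real models add learned projections, many attention heads, and thousands of dimensions.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each token's output is a weighted
    mix of every token's value vector, where the weights reflect how
    strongly the tokens relate to one another."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise relationship strengths
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V

# Three "tokens", each represented by a made-up 4-dimensional vector.
x = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])

out = attention(x, x, x)   # self-attention: every token attends to every token
print(out.shape)           # one updated 4-dimensional vector per token
```

The key intuition is in the `scores` line: the model computes how much each word relates to every other word, and uses those relationships to build its output.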
In terms of abstract process and function, it does exactly what the brain does:
Takes data in, learns, and puts data out.
The only difference is that the brain is much better at it than the transformer.
A Brain is Worth A Trillion Words
The transformer structure is what has made it possible to process billions or even trillions of words in order to train the most advanced LLMs. In a way, the transformer is the LLM's “brain” that has the ability to learn language structures.
It is the part of the model that makes advanced learning possible.
The interesting contrast is that LLMs are so inefficient at language learning when compared to humans. The transformer has made it possible for LLMs to process and learn from massive amounts of data in order to achieve more advanced language acquisition. Humans can’t process that much data, but they also don’t need to.
A human will achieve advanced language fluency after tens or hundreds of millions of words, whereas the LLM needs billions to trillions. The transformer makes it possible for the LLM to process that many words, but it is remarkable that so many words are needed. Regardless of whether Chomsky is correct about how humans are able to learn so quickly, the comparison shows that the human brain is indeed unique.
Chomsky refers to the human ability to learn language from so few words as the phenomenon of the “poverty of the stimulus”. Basically, the input we receive seems too sparse to explain how quickly we learn language, yet we learn it anyway. By trying to replicate human learning with machine learning, we see this part of Chomsky’s theory borne out: a poverty of stimulus does lead to poor results in models less magnificent than the human mind.
This, for now, includes all large language models.
The fact that we learn languages after only processing millions of words is a testament to how special the human brain is. When compared to the transformer, our brains are worth a trillion words.
Effect on Low-Resource Languages
English is the perfect language to study the dichotomy between mind and machine because there are many native speakers to compare against and seemingly endless data on which to train the models. In English, we see over and over again that humans learn language faster (in terms of data input) than LLMs do.
But how do LLMs relate to low-resource languages, like indigenous languages with only a few thousand speakers?
Well, they reinforce the idea that the human mind is a more efficient language student than the LLM. The speakers of these languages have learned them purely through close contact with their communities, not through any exposure to broader data sets like books or movies in that language. No media has helped them increase their language stimuli above the “poverty line”. Yet, they learn the language anyway.
If we were to try to train a model on these languages, we would not have nearly enough spoken or written words to make it work. These languages are cases where the human brain has been able to excel with an amount of data that wouldn’t even get an LLM started.
This seems like a spiritual victory for the human mind, but it is also a tragedy, because LLMs cannot play more of a part in preserving these languages. Theoretically, the training of language models could help preserve languages that have been threatened by cultural and economic forces since long before AI. But without sufficient data to build the models, the point is moot.
We would need much more efficient models if we were going to build them for languages that don’t have the endless data that English provides. Our models would need to be closer to being as good as the human brain.
They aren't nearly there yet, even if their English language output sometimes makes it seem as though they are.
To Infinity? And Beyond?
The other interesting point of comparison between what Chomsky has theorized and what LLMs have shown us lies in the famous line, “Language makes infinite use of finite means.” Chomsky adopted the phrase from Wilhelm von Humboldt and espoused it in his own work.
What is interesting about this is the difference between mathematical infinity and “hyperbolic infinity”. Mathematical infinity is a concrete concept of something greater than an assignable value, or something never-ending. “Hyperbolic infinity” is when we use the term “infinity” to describe something so large it feels infinite.
Language is a little bit of both.
Theoretically, language is infinite. You can always add one more word onto the end of a string. But practically, language is only hyperbolically infinite. It is constrained by time, and then further constrained by being sensical and avoiding redundancy.
Nonsensical or redundant language is still language, of course, but in practice you cannot keep adding words ad infinitum without becoming nonsensical or redundant. What this means is that within its practical constraints, language is not mathematically infinite. It is merely hyperbolically infinite, because it has always felt larger than we could comprehend.
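The “always one more word” point can be made concrete with a little combinatorics. Assuming a hypothetical vocabulary of 10,000 words, the number of distinct strings of length n is 10,000 to the power n, and the total across all lengths is unbounded, which is the theoretical infinity, even though almost none of those strings are sensible language:

```python
# Hypothetical vocabulary size; real vocabularies vary widely.
V = 10_000

# Count the distinct word strings of each length: V**n grows without
# bound as n grows, so the set of possible strings is infinite in
# principle -- even though practical language uses a vanishing sliver of it.
for n in range(1, 6):
    print(f"strings of length {n}: {V**n:,}")
```

Practical constraints (time, sense, redundancy) carve a finite, if vast, region out of this unbounded space, which is the article’s “hyperbolic infinity”.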
Until now.
Now we have a numerical sense of how large language is, because we have captured much of the English language inside statistics. As language models improve, we are statistically reproducing something that we thought was too big to comprehend; we are counting what we once thought to be uncountable.
We are realizing that the ways in which language practically operates are quantifiable, and we are closing in on quantifying almost all of these practical applications. So, the infinitude of language is more hyperbolic than mathematical. Language may be larger than the human mind can comprehend, but it isn’t infinite. Large Language Models are showing that.
So what does this mean for the writer or the poet who says something in a new way? Don’t they prove that language is infinite?
No.
They prove that our ability to dig to the depths of language is finite and that we have not yet reached “the end of language”. We will be able to keep digging “forever” because our “forever” is finite, not because language isn’t. It is only our constraints that make language appear infinite, and they are the reason we will never reach the end of language, although there theoretically is one.
However, we don’t need language to be infinite in order to keep us enamored and entranced forever.
Because, after all, we aren’t infinite either.