The Unreasonable Effectiveness of Pronouns
Adding Sapience to the Sapient Paradox
If self-awareness emerged recently, this should show up in comparative linguistics. The ability to introspect is a big deal, and it would require the invention of new words to communicate new experiences. The most obvious addition is I, the first person singular.
Put a bit more formally, as the primordial pronoun postulate: we have had pronouns for as long as we have been self-aware. This suggests a corollary: we can track consciousness by tracking the spread of the first-person singular (1sg). If self-awareness is ancient, this is a useless idea. Imagine if it evolved 200 million, 2 million, or even 50,000 years ago. By now, the original form of “I” would be dust in the wind, no traces of its outline detectable in speech.
And yet, this is not at all what we find when we look at pronouns throughout the world. There are too many similarities to be explained by chance, convergent evolution, or diffusion from the Out of Africa event. The simplest explanation is their diffusion ~15,000 years ago.
This is part of the ongoing series on consciousness. The Eve Theory of Consciousness proposed that consciousness is recent, women were self-aware first, and agriculture was the result. The Snake Cult of Consciousness argued that snake venom could function as a psychedelic that would help initiates become self-aware. If you haven’t read either, I recommend Snake Cult as it is more accessible and fun. Or, enjoy this piece as a standalone tour of linguistic enigmas.
Worldwide pronoun similarities. Most 1sg contain the consonant “n”. Not possible by chance.
Case studies of pronoun diffusion in Papua New Guinea and Australia.
A linguist’s grand theory of language that places the origin of full language as proto-Basque-Dennean, a family held together by pronouns.
The Sapient Paradox asks why fully human behavior is only widespread starting 12,000 years ago, given that anatomically modern humans have been around for 200,000 years. This paradox falls short of its name if it is merely concepts like money or religion that are absent before the Holocene. The diffusion of pronouns implies that subjective self-awareness is new as well.
Julian Jaynes should have fixed his date for the origin of consciousness to the origin of pronouns.
Worldwide pronoun similarities
“In general, the pronouns and the numerals are least affected by the obscuring changes [of time], and are therefore the best criteria of relationship among languages.” ~ Roland G. Kent, 1932
Understanding the origin of pronouns requires some terminology. A language family is defined by genetic relationships. That is, two languages that both descend from a parent language are said to be in the same family. For example, consider Uto-Aztecan shown below. By designating it as a family, linguists are positing that all these languages, which range from present-day Idaho to Nicaragua, descend from a single language spoken thousands of years ago.
There is much room for debate about the classification of languages. For example, above Uto-Aztecan there is the controversial Amerind language superfamily, which includes all Indigenous languages in North and South America that do not belong to the Eskimo–Aleut or Na–Dene language families (whose ancestors came in later waves of migration). The main argument is the similarity of the first and second person pronouns across the languages; 1sg often contains “n” and the 2sg contains “m”. Wikipedia helpfully lists a few examples:
But, for the most part, this language family is rejected. Wikipedia summarizes the situation: “Due to a large number of methodological flaws in the 1987 book Language in the Americas, the relationships he [Joseph Greenberg] proposed between these languages have been rejected by the majority of historical linguists as spurious.” That sentence is followed by 11 references (vendetta territory, as far as Wikipedia is concerned).
This is mostly a debate about how far back the linguistics toolkit can peer. Linguists who reject the family also believe that there was a group that crossed the Bering Strait ~15,000 years ago and populated the Americas. This group would have spoken the same language, from which the extant languages descend. But, given all this time, it’s not clear whether similarities, even striking ones, are chance or real signal. By analogy to genetics, is it still helpful to call your 5th cousin family? You likely share minimal family lore or DNA. If you do find similarities, is it more than with a stranger? (“Oh! You also call your dad pappers? What are the odds?”) Hard to tell.
We’ll come back to the Amerind debate in order to understand the different camps within the linguistic community. But our main interest is not language families per se. We want to track down the invention of “I”. It piques our interest that supporters of the ancient Amerind family make their case with reference to pronoun similarities, and that it has received so much pushback despite there being a known proto-group 15,000 years ago.
The knights who say ni, na, ŋay, etc…
I am going to teach you a word so powerful it can make both linguists and certain portrayals of King Arthur cower in fear. Ni!!! Or, variously, na, ŋay, ’a(ŋ)kƏn, hinu. These are simply the 1sg in different language families, but they contain a great mystery. Why are they so similar if the languages are so far apart? The answer can only be chance, convergent evolution, or a genetic relationship. We will cover each of these possibilities in turn after establishing the degree of worldwide similarity.
To do so, we refer to the work of linguist Merritt Ruhlen. Ruhlen was a lecturer in Anthropological Sciences and Human Biology at Stanford and co-director, along with Murray Gell-Mann, of the Santa Fe Institute Program on the Evolution of Human Languages. He is not at all fringe, though some of his ideas remain controversial. His book On the Origin of Languages: Studies in Linguistic Taxonomy argues that all languages came from a single source, and that we can know some of the attributes of this proto-Sapiens by comparing existing languages. To support this claim he gathers all known forms of the first- and second-person singular pronouns within language families. In a language family, the proto-form of a word is what was originally spoken before the family branched into separate languages. Experts spend years triangulating proto-forms of words in a single language family by comparing many of the extant forms. Ruhlen collates the results from these disparate efforts. The similarity is astounding. Of the 44 language families he identifies, 30 contain the consonant n or ŋ1. These are:
Caucasian: *nI
Central Amerind: nV
Here are the 14 that do not:
From the list of rejects, 6 contain the related consonant “m”, Yeniseian has been reconstructed with an n2, and Hurrian can arguably be combined with Urartian into a single family. I could go on. The super-family Trans New Guinea is reconstructed as na, and contains 60 language families not on the list. We could add them to run up the score. But Ruhlen was a lecturer at Stanford and devoted years to this question. It is best to accept this list as complete, and proceed with the claim that 30/44 contain the consonant ‘n’. You don’t want to give a guy like me too many researcher degrees of freedom.
Astute readers may have noticed that “I” does not contain “n”. It is something of an irony that I am writing this in English, one of the minority of languages that break the pattern. Such is life, I suppose. If only the Basque had built bigger ships.
The distribution of pronouns can only be explained by chance, convergent evolution, or diffusion. First, let’s address chance. How lucky would we have to be to get 30/44 if this was random? Most languages have about 20 consonant sounds, so the chance that any one family’s 1sg contains a given consonant is roughly 1/20. Maybe we say that ‘n’ is a common sound in general, we also included ŋ, and some pronouns have multiple consonants (though some like “I” have none). These all make the odds of inclusion higher, so we can use 1/10 instead of 1/20. The discount we apply doesn’t really matter. Even with the odds doubled to 1/10, and treating the families as independent draws, seeing 30 or more out of 44 is a 1 in 36,127,314,938,069,163,114 event.
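The arithmetic can be checked with an exact binomial tail. The sketch below is only an illustration under a simplifying assumption: each of the 44 families is treated as an independent draw with the same probability of having n or ŋ in its 1sg. The helper name `tail_at_least` and the constants are mine, not Ruhlen's.

```python
from fractions import Fraction
from math import comb

def tail_at_least(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), using rational arithmetic."""
    p = Fraction(p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

FAMILIES = 44  # language families in Ruhlen's list
WITH_N = 30    # families whose 1sg contains n or ŋ

# Assumed per-family odds: the naive 1/20 and the generous 1/10 from the text.
for odds in (Fraction(1, 20), Fraction(1, 10)):
    tail = tail_at_least(WITH_N, FAMILIES, odds)
    print(f"per-family odds {odds}: about 1 in {float(1 / tail):.3g}")
```

Exact fractions are used so that the very small probabilities are not lost to floating-point rounding before the final conversion for display.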
We can apply the same reasoning to test convergent evolution. Say there is some force which unerringly moves pronouns towards ‘n’. How strong would that force have to be for us to observe 30/44 families independently adopt 1sg with ‘n’? Let’s say the force moves the odds from 1/20 to ½, a whole order of magnitude. Even that scenario can be rejected (p<0.05), given we observe 30/44. The force towards convergence would have to be exceedingly strong.
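The same kind of check applies to the convergence scenario. This is a rough sketch, again assuming the 44 families are independent, that asks how likely 30 or more hits are for several hypothetical strengths of the convergent force. The probabilities other than 1/20 and 1/2 are illustrative values I picked, not figures from any source.

```python
from math import comb

def tail_at_least(k, n, p):
    """One-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

FAMILIES = 44  # language families in Ruhlen's list
WITH_N = 30    # families whose 1sg contains n or ŋ

# Hypothetical per-family odds of landing on 'n' under a convergent force.
for p in (0.05, 0.5, 0.6, 0.7):
    p_value = tail_at_least(WITH_N, FAMILIES, p)
    verdict = "rejected" if p_value < 0.05 else "not rejected"
    print(f"p = {p:.2f}: P(>= {WITH_N} of {FAMILIES}) = {p_value:.3g} ({verdict} at 0.05)")
```

Scanning a few values gives a feel for how far above 1/2 the per-family odds must climb before 30/44 stops looking surprising.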
There are non-statistical reasons to reject such a force. The first is that the mechanism is suspect. Mama and papa mean the same thing in many of the world’s languages. These are roughly the first two words a baby learns and therefore plausibly require the easiest sounds. The similarity is produced by the constraints of babies learning to talk. This should not apply with the same strength to pronouns, which are learned later, after the child is already using many other consonants. (I is a much more complex idea than kitty or ball.)
Further, in language families themselves, the pronouns tend to diverge from the proto-form of na. At this point it makes sense to take a deeper dive into a single language family.
Papua New Guinea
Papua New Guinea (PNG) has been inhabited for roughly 50,000 years. Deep time, steep mountains and impenetrable jungle have made the region the most linguistically diverse in the world, with roughly 1000 languages and 60 language families. Within families, there are languages as distinct as English and Sanskrit. And between families, as distinct as Finnish and Mandarin. Despite the extraordinary differences, pronouns are cognate in most of the languages. Nestled in the 1000-page tome New Guinea Area Languages and Language Study, Vol. 1 is a reconstruction of the route these pronouns likely took when entering the island.
These pronouns are estimated to have entered the island around 8,000 BC. The widespread adoption is interesting to linguists because it is understood to be mostly memetic; there was no massive population turnover at this time. Consider how difficult it is to supplant a pronoun in another language. Pronouns are held in place by all sorts of grammar. Yet it happened across the whole island, affecting (what is now) dozens of language families, and a thousand languages. Notice that the location of impact is from the direction of Eurasia.
Reconstructing language groups from pronouns
In Pronouns as a preliminary diagnostic for grouping Papuan languages, Malcolm Ross aims to define PNG language families by their pronouns, rather than by a broader set of considerations: other cognate words, grammar and phonemes. The scientific interest is that pronouns can do so in a parsimonious manner. The resulting groupings turn out to be geographically sensible and support plausible historical theories about the various language families within Trans New Guinea (TNG). That is the argument at least, and by citation count others in the field agree. Below are the reconstructed proto-pronouns for the 14 language families he identifies. Remember what I said about being able to run up the score by adding language families to Ruhlen’s list of 44 proto-pronouns? Look at all the 1sg na!
This paper should be a home run. When you can simplify a model and still make predictions, that is progress in science. However, it contradicts the claim of convergent evolution used to explain why na shows up all over the world. Like many other families, the 1sg in Proto-TNG is reconstructed as na. Divergence from that form is used to define individual families in TNG with great success. Are we then to explain the similarity of this form with proto-forms in Eurasia, the Americas, Australia and Africa by convergent evolution? (Especially considering that the pronoun came to the island from the direction of Eurasia.)
To his credit, Ross is aware of the same set of pronouns in Chadic (a branch of Afroasiatic) as well as Algonquian (a branch of the Algic Amerind languages). He spends 11 pages justifying the use of pronouns to group languages, including the convergence vs divergence conundrum. To answer that he defers to another linguist, Lyle Campbell:
Campbell's answer to the question—and I think he is correct—is that each block is a language family, but that the distribution of the three blocks [TNG, Algonquian, Chadic] across the world is the product of chance.
But recall the impossible odds chance requires among the larger set of 44 language families. The author doesn’t realize how much chance must explain. To get a differing perspective, it is worth quoting our very own Merritt Ruhlen at length3:
Some linguists, of course, are simply unaware that other language families often have roots similar to those in the family they are interested in, and I suspect that this is the case with Dixon. Other linguists, however, are aware of such roots but choose to ignore them. One of the most cogent pieces of evidence that Greenberg (1987) offered in support of the Amerind phylum was the presence of first-person n and second-person m in all eleven branches. As noted in Chapter 12, the first- and second-person pronouns are known to be among the most stable meanings over time. Dolgopolsky (1964) found that the first-person pronoun is the most stable item, and the second-person pronoun ranked third in stability (following the number 2). It is also well known that initial nasal consonants are among the most stable sounds, and the conjunction of stable sounds with stable meanings has meant that even after 12,000 years these pronouns have been preserved in every branch of the Amerind phylum. Greenberg did not claim to be the first to notice the broad distribution of these two pronouns in North and South America. Swadesh (1954) had underscored their distribution in an article containing additional evidence for Amerind (not yet so named), and a year later Greenberg, unaware of Swadesh's article, discovered the same distribution. Greenberg observes, "That two scholars should independently make the same basic observation is an interesting sidelight in the argument for the Amerind grouping as I have defined it" (1987: 54).
Lyle Campbell, an Amerindian scholar and one of Greenberg's chief critics, sees things differently: "The widespread first-person n and less widespread second-person m markers. . . have been recognized from the beginning without significant impact on classification" (Campbell 1986: 488). Lamentably, Campbell is correct, but that such crucial evidence has been overlooked-or, worse, scorned-is not something to take pride in. Were a biologist to remark smugly, "That group of animals you keep mentioning, the ones with a backbone, has been recognized for a long time and I am not impressed," his colleagues would chuckle and move on to other business. Here we see perhaps one measure of the difference between biology and linguistics, especially as they present themselves today.
So everyone agrees there are similarities among the hundreds of languages in the Americas, and between many proto-languages worldwide. The naysayers simply say it is chance4. But remember the odds we started with: globally, it can’t be chance!
This isn’t just coming from me. A vocal minority of linguists have been beating this drum for decades. Take for example the much more exhaustive case made in Once Again on the Comparison of Personal Pronouns in Proto-Languages:
“[It is] incorrect to claim that “chance resemblance” can play an important part in pronominal comparison between languages of different families. There are absolutely no coincidences in paradigm patterns between the languages which are not thought to be genetically related by modern long-range comparativists.”
Why have they done so poorly in convincing the linguistics community at large of these relationships? They have compelling data that has been collected by many independent researchers. If I may be so bold, it could be that this data puts linguists between a rock and a hard place as demonstrated by the paper Where Do Personal Pronouns Come From? The paper aims to explain why pronouns “converge so massively towards a handful of stem consonants, whatever the language family they belong to, while very few seem to have been innovated in the last 10 to 15 ky?”
To this end, the paper notes that over and over again the proto-languages have similar pronouns about 15,000 years ago. But, if Homo sapiens left Africa with full language, they would have had pronouns. Therefore, the paper roots the explanation 50-100k years in the past. This asks linguists to believe that the cognates have been preserved not 15k, but 100k years. Writing has existed for roughly 5,000 years, so we know how fast language can change. Try to read Beowulf, written just 1,000 years ago. There is good reason that linguists are highly skeptical of any genetic relationships posited to go back more than 8,000 years.
That ceiling can be bent but not broken. Afroasiatic is universally accepted as a language family, which is estimated to be 10-18k years old. And of all words, pronouns are accepted as the most durable. Maybe they can persist some time longer than 8,000 years; at least back to the age of Afroasiatic. Still, it boggles the mind to believe pronominal cognates can survive 100k years, all the way back to when Homo sapiens only lived in Africa. That is just not how language works. So to answer the origin of these similarities, a linguist must either posit that humans left Africa without pronouns or reject the most basic linguistic rules. Unsurprisingly, these papers don't get much traction5.
Lest you think this is a one-off in Papua New Guinea (or a 60-off if you count by language families, or a 1000-off if you count by languages), there are similar mysteries in Australia. We will draw from a 2020 book chapter, Time, diversification, and dispersal on the Australian continent: Three enigmas of linguistic prehistory.
Humans first made it to Australia ~50,000 years ago, the same time as PNG. In fact, for 85% of human history on the landmasses, they were combined as Sahul. It was not until ~8,000 years ago, when the sea levels rose, that they separated. There are 27 language families in Australia, mapped below. You will notice that one family, Pama-Nyungan, takes up 7/8ths of the land mass, and every other language family is clustered in the North. This is a curious distribution for an ancient language family.
As we have come to expect, the 1sg in these language families is some variation of na as seen in the table below (once again, we can find many proto-forms to run up the score for na):
The three enigmas the paper deals with are:
Time: Australian language families bear similarities that should not be apparent if they split apart 50,000 years ago, as the populations did.
Diversification: compared to PNG, there is not much diversity (pronouns, but also grammar and phonemes) in Australian languages. This is odd considering they have been part of the same land mass for 85% of their existence. Further, Australian languages are not at all similar to those in PNG (pronouns excluded), despite being neighbors for so long. Why would they separate so quickly?
Dispersal: what caused Pama-Nyungan to spread across the whole of Australia?
On the question of time, the author offers a way out: Australian languages may change much more slowly than other languages. This isn’t very satisfying as it’s not clear why the laws of linguistics do not apply in that rarefied air. It’s not as if Aboriginals lived in stasis, as the expansion of Pama-Nyungan makes so evident. Further, if one accepts that the Australian pronouns are genetically related to variants of na in other parts of the world, then pronouns must have survived in both places for the same amount of time; Australia can’t be unique.
On the question of dispersal, there is a helpful paper in Nature: The origin and expansion of Pama–Nyungan languages across Australia. The abstract reads:
It remains a mystery how Pama–Nyungan, the world’s largest hunter-gatherer language family, came to dominate the Australian continent. Some argue that social or technological advantages allowed rapid language replacement from the Gulf Plains region during the mid-Holocene. Others have proposed expansions from refugia linked to climatic changes after the last ice age or, more controversially, during the initial colonization of Australia. Here, we combine basic vocabulary data from 306 Pama–Nyungan languages with Bayesian phylogeographic methods to explicitly model the expansion of the family across Australia and test between these origin scenarios. We find strong and robust support for a Pama–Nyungan origin in the Gulf Plains region during the mid-Holocene [6,000 BP], implying rapid replacement of non-Pama–Nyungan languages. Concomitant changes in the archaeological record, together with a lack of strong genetic evidence for Holocene population expansion, suggests that Pama–Nyungan languages were carried as part of an expanding package of cultural innovations that probably facilitated the absorption and assimilation of existing hunter-gatherer groups.
When would you have guessed the main Australian language family established its territory? It is extraordinary that the language family that covers 7/8ths of the land only expanded ~6,000 years ago, and they believe it was the result of cultural innovations. What did they have that the other cultures did not? One dissertation argues that initiation rituals and rock art contributed to the Pama-Nyungan expansion6. Notably, this corresponds to the origins of the Rainbow Serpent. As explained in Birth of the Rainbow Serpent in Arnhem Land rock art and oral history: “The Rainbow Serpent has been used to define the nature of human existence for at least 4000–6000 years.” (See the myth wiki for examples of those myths.) They also discuss an outlier Rainbow Serpent dated to 10k years ago.
In a future post, I’ll take a deeper look at the archeological changes that accompanied na in PNG and Australia. So far, the linguistic and archeological facts are quite a good fit for the Snake Cult of Consciousness. Self-awareness (and therefore pronouns) could have spread with rituals involving snake venom. First into PNG, from the direction of Eurasia. Then Australia from the direction of PNG. This is why PNG and Australian languages are so different, because their most dynamic phase occurred after they separated. This is why Pama-Nyungan was able to take Australia by storm, because the people they culturally absorbed were not self-aware. And this is why all Australian language families have similar pronoun forms which they share with PNG and many other languages.
It’s not the only explanation, but something has to give and this seems to be about what you would expect if the Snake Cult of Consciousness precipitated the Holocene. This view makes for a surprisingly good bedfellow with one linguist's grand theory on the origins of language.
Morris Swadesh earned his PhD in linguistics from Yale in 1933. He spent much of the next decade doing field work in Native American languages. During the red scare he was accused of communist sympathies (well, he was a card-carrying member) and was blacklisted from American universities.
He was a man of conviction, acquainted with controversy. Just before his death7, he completed The Origin and Diversification of Language where he puts forth his theory. Below is a temporal-geographic sketch of the familial relationships during the Ice Age:
He believed that language started with proto-Basque-Dennean, and radiated out to the rest of the world. The astute reader will notice that this isn’t placed in Africa as Swadesh predated the acceptance of the Out of Africa theory. He is going off what the language patterns look like, before we knew so much about genetics.
So, what is Basque-Dennean? It is an ambitious proposal named for the languages at its extremities: Basque, an isolate in present-day Spain, and Na-Dene which is spoken among Native Americans stretching all the way down to present-day Mexico. It also includes Caucasian, Burushaski, Sino-Tibetan, Yeniseian, Salishan, Algic, and Sumerian. Their modern range is pictured below (shed a tear for the extinct Sumerians).
Geographically, these languages have no right being relatives. But many notable linguists go to bat for it8. Like many other families, it is held together by pronouns of the expected form:
Animate and Inanimate (the two genders)
One of the first things that humans would have to grapple with once they became self-aware would be agency. (It is hard to be as the gods, knowing good and evil.) Consider moments when you are completely absorbed in a task. This could be something as mundane as driving to work on autopilot, hardly aware at all. Or, if you play sports or an instrument, flow states where you are completely in the moment. This would have been all of existence before the “self” produced a mind separate from the body. Once the self comes into being, it makes sense that grammar would reflect agency.
In many languages, nouns are classified as either male or female. For example, mesa (table) is feminine in Spanish. Some languages in Basque-Dennean employ animate vs inanimate gender systems. Instead of gender, this classifies nouns (or verbs) by whether they are animate or not. For example, in Basque, the verb “run” is animate, but the verb “trip” is not, because accomplishing it does not require agency9. This reflects your own agency, but we also have Theory of Mind, where we project that agency onto others. Some nouns are gendered by animacy10, such as in Sumerian and languages in the Caucasian and Algic families. A rock therefore would be classified differently than a lion. The grammar depends on whether Theory of Mind is used for an object.
This point is not so strong as evidence that proto-Basque-Dennean was the first full language (as Swadesh put forth). All peoples are agents, and the fact that volition is grammaticalized in some languages does not mean those people were agents first. I include it because it’s an interesting exercise to think about how language would change with self-awareness, be that change 15,000 or 150,000 years ago. Entirely new grammars are on the table. Once Homo became sapient, that would hit the language ecosystem like a meteor. If that happened 15,000 or 30,000 years ago, the ripples would look something like Swadesh’s reconstruction.
The Sapient Paradox asks why fully human behavior is regional until about 12,000 years ago, at which point it appears worldwide. The actual paper is a bit softer on the extent of the change. It discusses two recent behaviors we now consider fundamental: intrinsic value (e.g. putting value on something like gold) and the power of the sacred (e.g. imputing spiritual powers to an object). But that doesn’t quite reach sapience; one can imagine full-fledged humans got by without money and religion (at least John Lennon can).
My object with the pronouns is to add self-awareness to the list. Even limiting the observations to Sahul, it is an enigma that na was able to sweep through Papua New Guinea and Australia during the Holocene. More broadly, na is the proto-1sg in language families as far-flung as: Sino-Tibetan, Andean, Khoisan, Niger-Congo and Australian. This is not the situation you would expect if humans left Africa with pronouns and an inner life. One linguist even put Basque-Dennean (proto-form: ni) in Eurasia as the first language. And such theories are not relegated to the past. As recently as 2021 a linguist argued that full grammatical language only emerged 20,000 years ago11.
Now, it’s not clear that the inhabitants of Papua New Guinea or Australia did not have pronouns before the advent of na. But in that case, why was it so effective at displacing those native pronouns? And, as argued in the Sapient Paradox, why was the culture so bare before?
Or perhaps na was present before the Out of Africa event. Why then is it so well preserved now? Why do linguists think that it entered Papua New Guinea 10,000 years ago instead of with the first people? Why do we see it diverge from the proto-form even in the last 10,000 years if it had lasted 50,000 years in pristine condition before that? It almost seems like that would require positing a world-changing psychological break about 10,000 years back. Maybe call it the Sapient Paradox?
Breakdown of the Bicameral Mind
Julian Jaynes is perhaps the only other scientist to put forth a theory that consciousness is recent and spread memetically. The Origin of Consciousness in the Breakdown of the Bicameral Mind posits that before humans were conscious, they had a “bicameral” mind: half the brain would produce action plans. These were communicated as auditory hallucinations to the other half, which would execute them. There was no interior space in which one could ruminate. Humans then were “noble automatons who knew not what they did.”
He writes at length about the “analog I” and the discovery of the interior via bicameral breakdown. This, he said, occurred just 3,200 years ago, in the Near East. His book was cited 5,000 times but, as far as I can tell, neither he nor anyone else bothered to ask how old the 1sg is. In fact, Jaynes put forth an entire theory for the evolution of language12. Pronouns are mentioned just once in passing in the section “Other Developments”:
As for the other parts of speech, pronouns being redundant with names, would develop very late, and even in some older languages never get much beyond verb endings first differentiated on the basis of intensity.
Redundant with names! And yet pronouns are the backbone of many a language family. I asked the still-active Julian Jaynes Society why pronouns are evidenced before Jaynes’ date. The admin, who was one of Jaynes’ students, said that J-consciousness does not affect pronouns. The same question was asked a decade ago on their forum and the only direct response was: “How do you know that ancients used ‘I’?”
Now, if you believe that the origin of consciousness will not change the words we use to demonstrate the mind-body problem (“I think therefore I am”), that is your prerogative13. But, if you take pronouns seriously, the introduction of na in many places corresponds to what archeologists call the Sapient Paradox. That is a very good sign for a theory about the memetic spread of sapience.
J-consciousness doesn’t seem to do very much. Jaynes claims that writing was invented and pyramids constructed without consciousness. It doesn’t touch pronouns, nor does it correspond to any Paradox or Revolution that perplexes other fields. Another issue is that he doesn’t provide a mechanism of self-discovery. If humans had functioned without consciousness for so long, what could cause that change? For Jaynes it was simply that life got more complex in the Near East during the Bronze Age. People realized that the godlike commands in their head were different from those of neighboring cities. The “analog I” rose from the wreckage of that epiphany. Okay, how did that spread to Australia? I can point to the same set of pronouns diffusing to (or at least in) Papua New Guinea and Australia (and the Americas, and Africa, and Oceania). I can also point to snake worship in all of those places, the association of snakes with creation and knowledge, and the efficacy of their venom as a psychedelic. Even if snakes were not involved, it boggles the mind to think that bicameral breakdown would not be the object of ritual and the subject of creation myths worldwide.
I had a similar idea to Jaynes: what if self-awareness was a realization? Our different approaches, I think, highlight the strengths of an engineering mindset. I didn’t spend so long philosophizing (which indeed may be counted as a hole thus far). Instead I asked what could cause self-awareness and what evidence the transition would leave. On the other hand, Jaynes asks people to believe consciousness emerged during the historic period but has now been forgotten, detectable only via linguistic wrinkles in Greek epics.
This is not idly ragging on an idea that has come and passed. Erik Hoel writes The Intrinsic Perspective here on Substack. Previously, he studied neuroscience and consciousness at Columbia and Princeton, and was an assistant professor at Tufts. He won the 2022 ACX Book Review Contest with an essay that argued the Sapient Paradox could be explained by the human tendency to gossip. This summer he will release his new book The World Behind the World, which promises to update Jaynes’ Origin of Consciousness, among other things. Jaynes is still taken seriously by serious people. A non-genetic answer to the origins of consciousness is philosophically compelling. And it fits the data, as long as the date is moved back to when the “analog I” and other accouterments of sapience are first evidenced.
Self-awareness emerging 3,200 years ago creates mysteries. What is consciousness, if we can accomplish so much without it? How did it spread? Emerging ~15,000 years ago resolves them. For example, it is thought that mental time travel—imagining oneself in the future—requires self-awareness. Certainly agriculture also requires flexibly planning for the future. It was invented independently in the Fertile Crescent, China, Mexico, and Peru in the early Holocene. The global transition happening all at once is a great mystery, despite decades of study. The same logic applies to art, which requires mental travel to the imaginary and only shows up after Out of Africa. The Snake Cult can explain both.
Ultraconserved words in Eurasiatic
To understand the linguistic debate, it is worth exploring one last paper, which tries to find cognates between seven language families in Eurasia. These, pictured below, have no overlap with Basque-Dennean.
At this point, I hope readers expect pronouns to rise to the top. “Thou” and “I” are found to be cognate in 7/7 and 6/7 families respectively. This paper was picked up by WaPo et al., which raised the ire of linguists looking to put the kibosh on any project peering too deep into the past. One linguist responded by calling the whole project a statistical method to see “faces in fire”. Along with the normal complaints about the 8,000 year ceiling, he is not satisfied with Eurasiatic’s proposed precipitating event: the end of the Ice Age.
Eurasiatic’s supposed “fit” with the usual suspect, the retreat of the glaciers, is only in (their) chronology. It is no explanation of why Eurasiatic should exist at all. Why should changing climate have favored just one language lineage, out of a single homeland, to dominate Eurasia, rather than a generalized advance of multiple, independent groups right across the continent?
I suggest it wasn’t the ice so much as discovering consciousness.
If self-awareness is recent, then we can track its diffusion by tracking the diffusion of the 1sg.
There are striking global similarities in the 1sg which can only be explained by chance, convergent evolution, or diffusion.
Chance is statistically impossible.
In language families we observe divergence from the common proto-forms, not convergence. Additionally, to explain the level of similarity the force of convergence would have to be unreasonably strong.
We are left with diffusion, but it is hard to believe the proto-forms are rooted >50k years in the past. Language changes too much over that time period.
Linguists interested in proto-Sapiens (a controversial research area) recognize the global pronoun similarities. Mainstream linguists who work on specific regions also recognize these similarities (at least in their region), though they tend not to engage with the implications.
In line with the Sapient Paradox, the introduction of na coincides with a religious, artistic and technological transformation in PNG and Australia.
Together, this tends to support the idea that self-awareness spread at roughly the start of the Holocene.
Those that find Jaynes’ work compelling should follow the pronouns.
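To make the chance step in the summary above concrete, here is a minimal sketch of the kind of calculation it implies: a binomial tail probability. Every number in it is a hypothetical placeholder, not a count taken from any dataset or paper discussed here, and the function name `binomial_tail` is my own. It also assumes each language family picks its 1sg consonant independently, which is exactly the assumption a critic would dispute.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical placeholders, chosen only to illustrate the arithmetic:
n_families = 40   # assumed number of independent language families
n_with_n = 25     # assumed number whose 1sg contains "n"
p_chance = 0.15   # assumed chance that an unrelated 1sg contains "n"

probability = binomial_tail(n_families, n_with_n, p_chance)
print(f"P(at least {n_with_n} of {n_families} by chance) = {probability:.3e}")
```

The result depends heavily on `p_chance` and on how many families really count as independent, so the sketch shows the shape of the argument rather than settling it.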
I am not a linguist or an archeologist. Thankfully, they have said much of this explicitly. It says a lot about a field that it will write papers about enigmas, rather than just what it can solve. From an entertainment perspective, there are not many fields as given to grand theories or food fights, and it has been a joy to learn some lesser-known ones in this research. My contributions thus far are:
An epistemic argument that if self-awareness is recent, we should be able to understand that transition from descriptions in creation myths, which can survive at least 12,000 years.
From there, I argue that snakes’ prevalence in creation myths is not due to a Jungian or evo-psych archetype, but rather diffusion. Their venom was an active ingredient in the process.
The primordial pronoun postulate: we have had pronouns for as long as we have been self-aware.
Its corollary: we can track consciousness by tracking the spread of 1sg.
My hope is that pronouns can show that the Snake Cult has legs. This gives a mechanism to at least two grand theories: Jaynes and the bicameral mind, and Swadesh on the origin of language as Basque-Dennean. Jaynes consciously divorced psychology from genetics. For Swadesh, this happened as later scientists discovered more about our genetic roots in Africa.
Genetically speaking, there are deep forks in the human family tree. The Khoisan lineage split 100-150k years ago. It is remarkable, then, that their 1sg is similar to that of the Basques, Incas, Sino-Tibetans, Caucasians, Chads, and Austronesians.
Linguists are trained that their methods only extend ~8,000 years into the past. Even so, there are inconsistencies in this application. Afro-Asiatic is accepted even though it is as old as the proposed Eurasiatic (which linguists huff at). The Australian language family is, by some accounts, 50,000 years old. Still, looking back ~10,000 years to proto-pronouns, linguists have found that languages around the world converge to just a few forms. How can this be explained?
There is a rag-tag band of linguists that argue these pronouns are echoes of proto-Sapiens, the original tongue. This group does not have much overlap with those that study biological evolution. As such, they usually assume our fundamental wiring has not changed since humans left Africa. This requires the existence of global cognates that have survived for at least 50,000 years.
Defending that notion is not an enviable position for an academic. Even when proposing genetic relationships in Amerind, where it is believed that everyone was related 15,000 years ago, there is intense pushback. Then, what to do with the global pronoun similarities? Imagine if a linguist tried to resolve the conflict by saying pronouns (or grammar, or anything else fundamental) were invented after humans left Africa.
My position is different. This exploration was designed to falsify the Snake Cult and Eve Theory of Consciousness. I thought that if self-awareness was recent then “I” must be too, and that can be tested fairly easily. The distribution of 1sg words should have been random, and it would have been case closed. Instead I found that dozens of language families use similar pronoun forms and linguists have been debating the enigma for decades. Not only that, but the introduction of the Rainbow Serpent corresponds with the memetic, but not genetic, expansion of Pama-Nyungan over the whole of Australia. Curious, right? How does one introduce a language and a creation myth without widespread slaughter? There must have been a massive advantage. It gets back to the question of why snake worship is so common. If it’s due to instinctual fear of snakes, why is it so often associated with knowledge and creation?
While not a smoking gun, pronouns move the needle on how seriously to take the Sapient Paradox. It may actually include sapience.
ŋ is the ‘n’ used in thanks, anger, rung. Similar to: ng
“Thus, when functioning as the 1st person sg. subject verbal suffixal marker in Kott, it is ŋ (as in igejaŋ 'I am born'), but when functioning as a 1st person sg. subject verbal prefixal marker in Ket, it is b(a), as in ba-kissāl 'I spend the night'. Starostin assumes — presumably correctly — that ŋ is here the primary form, having shifted to m and then to b- already in Proto-Yeniseian due to the language's low tolerance on word-initial resonants.”
The only other non-genetic option is convergent evolution where the human tongue maps the 1sg to na (or some similar version) with incredible regularity. PNG gives evidence that is not the case. Pronouns drift from their original forms just as any other word (save perhaps mama and papa). Ross understood this and so explained similarities to two other regions by chance. My understanding is that other regional experts tend towards this explanation because they can see divergence in their own language.
Where do personal pronouns come from? has one citation. The comments from experts say that paper does a good job in describing the problem (pronouns super similar, over and over go back 10-15k years), but pan the solution.
In fact, the book was only mostly complete. The last chapter, which contains this plot, was not finished.
From Wikipedia: “Classifications similar to Dené–Caucasian were put forward in the 20th century by Alfredo Trombetti, Edward Sapir, Robert Bleichsteiner, Karl Bouda, E. J. Furnée, René Lafon, Robert Shafer, Olivier Guy Tailleur, Morris Swadesh, Vladimir N. Toporov, and other scholars.”
As explained to me by a native Basque-speaking linguist I happened to meet in Mexico
George Poulos, Emeritus professor of linguistics:
“The indication is that human language was a fairly late acquisition of Homo sapiens. It is argued in this study that language, as we know it today, probably began to emerge about 20,000 years ago.”
The study is mostly on the evolution of the vocal tract, and comparative linguistics between click and non-click languages. He thinks grammar emerged in South Africa (his area of expertise), and there was either some (undocumented) additional Out of Africa event, or there were multiple independent inventions of grammar throughout the world. For me, the theory doesn’t have nearly enough snakes.
Steelmanning their position, “I” could simply refer to the first-person actor, and not the mind of the actor. Say we are hunting a mammoth and I say “mindvector go this way”. That could be abstracted into 1sg, which need not include the mind. This is the situation put forth in Where Do Personal Pronouns Come From?, where instead of mindvector (a name) the actor-word started out as a title, such as “father” or “mother”. Therefore “Nana go this way” eventually became our worldwide friend na.
Even in this case, both Descartes and Jaynes find the word “I” to be useful in communicating the mind-body divide. My argument is that linguistic utility was required once we became self-aware. If other languages do not include the mind-body distinction in the 1sg that would be interesting to me, and be evidence against the Primordial Pronoun Postulate.