3 Persuasion as a Major Transition in Evolution
This chapter situates persuasion within deep evolutionary history. The capacity to influence the beliefs and actions of others is not merely a cognitive or cultural phenomenon: the biological record shows it has repeatedly restructured life at the level of the organism, the colony, and the species. The chapter surveys the evidence from eusocial insects, cooperative mammals, and primate societies before turning to human language as the most radical persuasive technology yet evolved.
Somewhere in France, roughly 17,000 years ago, a group of people entered a cave and by torchlight painted the walls with horses, bison, and deer. They mixed pigments, coordinated their effort, and returned across multiple sessions. Those images are still legible to us today. The people who made them were not the first large-brained humans: anatomically modern Homo sapiens had existed for at least 60,000 years before those paintings appeared. What changed around 40,000 years ago was not the brain. It was something in how that brain was being used: art appeared, along with burial with grave goods, long-distance trade, and an explosion of tool types. The question this chapter tries to answer is what changed, and why it took so long.
Language is the obvious candidate — but its role only becomes legible once we understand the problem it solved. That problem is cooperation: how do independent agents, each pursuing its own interests, act as a unit? Evolution has been working on this question for hundreds of millions of years. Bacteria cooperate. Ants cooperate. Chimpanzees cooperate. Each time, the mechanism is the same at its core: one organism changes the behaviour of another through signals. Stripped of culture, rhetoric, and technology, persuasion reduces to exactly that: the use of signals to alter the behaviour of another agent. Non-human species provide the clearest view of what this must accomplish at the most basic level, before language and social convention complicate the picture.

The connection between brain evolution and social life sharpens the argument. Dunbar [10] measured the ratio of neocortex volume to total brain volume across 36 primate genera and compared these ratios with typical group sizes for each species. The correlation is r = 0.77: species with proportionally more neocortex live in larger social groups. Since neocortex size varies heritably across primate lineages, this relationship is most naturally read as causal. Species that faced the cognitive demands of managing relationships in larger groups were selected for larger neocortices. The brain appears to have expanded not primarily to handle tools or difficult terrain, but to handle other individuals.
Applying this regression to humans, Aiello and Dunbar [1] found that our neocortex ratio predicts a social group of roughly 150 people. That figure, now called Dunbar’s number, turns up with notable consistency: in hunter-gatherer band sizes, Neolithic village remains, military company structures, and people’s functional social networks today. Maintaining 150 relationships takes continuous effort. Other primates manage this through physical grooming, which occupies roughly 20 per cent of waking time and proceeds strictly one pair at a time. For a group of 150, the arithmetic breaks down. There is not enough time in the day to groom all the partners such a group requires. Language fills the gap. Vocal exchange can be directed at several people simultaneously, requires far less time per bond maintained, and occurs while doing other things. It scales where grooming cannot.
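The arithmetic behind these two claims can be sketched in a few lines. The coefficients below are not given in this chapter. They are the values commonly quoted from Dunbar's published regressions (group size on neocortex ratio, and grooming time on group size), and both they and the human neocortex ratio of 4.1 should be treated as assumptions used for illustration only.

```python
import math

# ASSUMPTION: coefficients as commonly quoted from Dunbar's primate regressions.
# They are not stated in this chapter and are used here only for illustration.
SIZE_INTERCEPT, SIZE_SLOPE = 0.093, 3.389     # log10(group size) = a + b * log10(neocortex ratio)
GROOM_INTERCEPT, GROOM_SLOPE = -0.772, 0.287  # grooming time (% of day) = a + b * group size
HUMAN_NEOCORTEX_RATIO = 4.1                   # ASSUMPTION: commonly quoted human value


def predicted_group_size(neocortex_ratio: float) -> float:
    """Group size predicted from the neocortex ratio (log-log regression)."""
    return 10 ** (SIZE_INTERCEPT + SIZE_SLOPE * math.log10(neocortex_ratio))


def predicted_grooming_percent(group_size: float) -> float:
    """Share of the day spent grooming that the linear regression predicts."""
    return GROOM_INTERCEPT + GROOM_SLOPE * group_size


group = predicted_group_size(HUMAN_NEOCORTEX_RATIO)
grooming = predicted_grooming_percent(group)
print(f"predicted human group size: {group:.0f}")
print(f"grooming time needed at that size: {grooming:.0f}% of the day")
```

Under these assumed coefficients the predicted group size lands close to 150, and the grooming time that size would demand is roughly double the 20 per cent ceiling observed in other primates, which is the gap the paragraph above attributes to language.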
3.1 Persuasion, Communication, and Coordination
Three concepts recur throughout this chapter and are often conflated in ways that obscure the biology.
Communication is the broader category: any process in which a signal produced by one agent reaches another and influences that agent’s state. A pheromone trail laid by an ant is communication. So is a waggle dance, a dominance display, a tweet. What distinguishes communication from simple causation is that the signal is functional: it exists because receivers respond to it in ways that have historically benefited the sender or the relationship between them.
Persuasion is a subset of communication in which the signal aims to alter the receiver’s beliefs, preferences, or actions. Not all communication is persuasive in this sense. A meerkat’s alarm call does not try to convince companions that a predator is present; it triggers a hardwired escape response. The call is communication, and it coordinates behaviour, but there is no inference happening at the receiver end that could be redirected or resisted. Persuasion enters when the receiver has options and the signal is designed to push among them — when, in other words, the receiver could be unpersuaded. By this criterion, human language is the most powerful persuasion technology in the biological record, because it can address receivers who have not merely escape-or-stay responses but entire systems of beliefs and preferences that can be updated by argument, narrative, or social pressure.
Coordination is what persuasion and communication make possible at the group level. Coordination is the state in which multiple agents act in ways that are mutually consistent and collectively beneficial — not because they share an instinct, but because each has been influenced to expect and rely on the behaviour of the others. Cooperation requires coordination, but coordination is harder than cooperation: it requires not just shared goals but shared models of what each party will do. Language is what made coordination possible beyond the limits of direct observation and shared routine, because language can represent plans, obligations, and the expected behaviour of parties who are not present.
The three concepts stack: communication is the mechanism, persuasion is its influence-directed form, and coordination is the social outcome it enables. What this chapter traces is how evolution built progressively more powerful communication systems, each capable of sustaining coordination at a larger scale, until language made coordination among millions of strangers not just possible but routine.
3.2 The Major Transitions Framework
Life has repeatedly solved a particular problem: how to get formerly independent entities to work as a single unit. Each time this happened, evolution did not simply add new behaviours to an existing system. It created a new level of organisation. Genes that once competed for replication started cooperating inside chromosomes. Cells that once competed for nutrients started cooperating inside bodies. Organisms that once competed for territory started cooperating inside colonies. At every step, the unit of evolutionary competition shifted upward: what was previously the arena of conflict became the platform for a higher-level competitor.
Maynard Smith and Szathmáry [22] catalogued eight of these moments and named them major transitions. Starting from the origin of self-replicating molecules in compartments, the sequence runs through the emergence of chromosomes (coalitions of cooperating genes held together by shared replication machinery), the origin of the DNA–protein translation system (which locked in the separation of heritable information from functional machinery), the emergence of eukaryotic cells (formerly independent bacteria now cooperating as organelles), sexual reproduction (which reshuffles genetic information between lineages), the origin of multicellular organisms (formerly independent cells now cooperating as tissues and organs), the emergence of eusocial insect colonies (colonies so integrated that most individuals forgo reproduction entirely), and finally the emergence of human language societies. At each step, formerly independent entities gave up some autonomy to become part of a more powerful collective. And at each step, the transition required a mechanism for suppressing defection by the lower-level units: a chromosome holds genes in line because they share a single replication event; a eusocial colony keeps workers in line through pheromones and policing. Communication is the glue that holds each level together.
Queen pheromones in eusocial insect colonies illustrate the logic, on a timescale of weeks rather than billions of years. A honeybee queen produces a blend of fatty acids and aromatics, the queen substance, that is distributed through the colony by physical contact and grooming. Workers who receive this signal have their ovarian development chemically suppressed. They do not reproduce. They forage, build comb, and die at the entrance defending the hive. From the worker’s perspective this looks like a catastrophic sacrifice of individual fitness, and from the gene’s perspective it sometimes is. What makes it evolutionarily stable is that the workers share most of their genes with the queen’s offspring and with each other: the queen’s argument is not dishonest. It says, in chemical terms, “your genes propagate more effectively through this queen than through your own eggs,” and the arithmetic of haplodiploidy makes that claim true. The pheromone is not a trick, and workers are built to respond accordingly.
The human language transition followed the same pattern, but the communication system that enabled it was orders of magnitude more flexible than any of these predecessors. Grooming could maintain alliances in groups of 50. Alarm calls could coordinate vigilance across a troop. What language made possible was the transmission of cooperative obligations across time, distance, and individuals who had never met. A promise made today can bind behaviour next month. A reputation formed over years in one location travels to the next. A norm that emerged in one generation can be taught explicitly to the next, along with the reasons for it. None of the earlier transitions created a communication system capable of that. The jump from grooming to language was not merely a quantitative increase in signal complexity. It was a change in what signals could be about.
Chemical gradients push cells to differentiate rather than proliferate. Queen pheromones persuade workers to suppress their own ovarian development. Dominance displays persuade subordinates to defer rather than contest. Before language, these systems were innate and inflexible: the signal triggered a fixed response, and neither sender nor receiver had any capacity to revise the exchange. Language changed this entirely. Flexible, explicit persuasion became possible, persuasion capable of representing futures and obligations, of proposing hypothetical arrangements, of invoking norms that do not yet exist in the immediate environment.
The framework makes a prediction. Each time cooperation scaled up, a new communication technology enabled it. The communication technology was not merely a consequence of the new cooperative arrangement; it was its precondition. The rest of this chapter examines those technologies in order of complexity, beginning with the chemical and mechanical signals of insect colonies and ending with the recursive grammar that allowed human societies to reach scales no other primate has approached.
3.3 From Genes to Culture: Bio-cultural Co-evolution
Each of those transitions shared a common feature: the communication system that enabled it co-evolved with the biological machinery that executed it. In humans, this co-evolution took an unusual form, one in which culture itself became a selective pressure on the genome.
This account situates the evolution of language within a broader bio-cultural co-evolution. The genetic capacity for language (specialised cortical regions, precise articulatory control, sensitivity to syntactic structure) was shaped by selection pressures that were themselves cultural: the demands of hunting coordination, pantomimic storytelling, and finally conversational argumentation. Each cultural advance in communication created new selection pressure on the biological substrate; each biological advance in communicative capacity opened new cultural possibilities. The result, visible in the archaeological record at 40,000 years ago, was a communication system of unprecedented power, one that could coordinate the construction not just of hunting parties but of shared fictions, institutions, and normative systems.
The mechanism connecting cultural practice to genetic change is sometimes called the Baldwin effect. When a behaviour is culturally learned and repeatedly beneficial, individuals who learn it faster or more reliably have social advantages: they coordinate better, secure more alliance partners, and survive more successfully. Over many generations, this selective advantage favours genetic variants that facilitate the learning. Applied to language: children who could acquire grammar more efficiently were better cooperators, and so better survivors. Over time, the biological machinery for grammar acquisition (the neural architecture of Broca’s and Wernicke’s areas, the sensitivity to phonemic contrasts, the timing of the critical period) became more precisely tuned. The innate grammar-acquisition device was itself shaped by the cultural practice of using language. There is no clean separation between “innate” and “learned” here: the innate is what the learned, repeated over thousands of generations, selected into the genome.
The division of labour between genome and culture follows from this logic. As Tooby and Cosmides [33] argued, the genome may as well store the vocabulary in the “cultural environment”: the words themselves are learned from the surrounding community, not inherited through DNA. This is precisely the right division of labour given that cultural evolution is far faster than genetic evolution. The grammar provides a stable generative engine; the vocabulary, which must track cultural innovations such as new tools, new social roles, new institutions, can evolve and diversify at the speed of culture.
The palaeontological record shows exactly when these co-evolutionary pressures produced their decisive outcome.
3.4 Persuasion in Non-Human Animals
A forager bee returns to the hive after finding a rich patch of flowers two kilometres to the north-east. She does not report her discovery; she performs it. On the vertical surface of the comb she runs a waggle dance: a figure-eight pattern in which the angle from vertical encodes the bearing from the sun, and the duration of the waggle run encodes the distance. Within minutes, hundreds of nestmates that have never visited the patch take off and fly directly to it. Nothing was said. No map was drawn. The information was transmitted through movement, and it worked.
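The encoding described above can be written out as a small sketch. The calibration of one second of waggling per kilometre and the sun position are illustrative assumptions, not figures from this chapter, and the function names are hypothetical; real calibrations vary between colonies and landscapes.

```python
# ASSUMPTION: an illustrative linear calibration of about 1 s of waggling per
# kilometre. The chapter gives no calibration; real values vary.
SECONDS_PER_KM = 1.0


def encode_waggle(bearing_deg: float, distance_km: float, sun_azimuth_deg: float):
    """Forager's side: turn a compass bearing and distance into a dance."""
    # "Straight up" on the comb stands for the sun's direction, so the dance
    # angle is the bearing measured relative to the sun's azimuth.
    angle_from_vertical = (bearing_deg - sun_azimuth_deg) % 360
    waggle_seconds = distance_km * SECONDS_PER_KM
    return angle_from_vertical, waggle_seconds


def decode_waggle(angle_from_vertical_deg: float, waggle_seconds: float,
                  sun_azimuth_deg: float):
    """Nestmate's side: recover compass bearing and distance from a dance."""
    bearing = (sun_azimuth_deg + angle_from_vertical_deg) % 360
    distance_km = waggle_seconds / SECONDS_PER_KM
    return bearing, distance_km


# The forager's patch: north-east (45 degrees), two kilometres away.
# ASSUMPTION: the sun is due south (azimuth 180 degrees) at the time of the dance.
angle, seconds = encode_waggle(45.0, 2.0, sun_azimuth_deg=180.0)
bearing, distance = decode_waggle(angle, seconds, sun_azimuth_deg=180.0)
print(f"dance: {angle:.0f} degrees from vertical, {seconds:.1f} s waggle run")
print(f"decoded: bearing {bearing:.0f} degrees, distance {distance:.1f} km")
```

The point of the round trip is that the dance carries exactly two numbers. Nothing in the signal can say when the flowers were found or whether they will still be there, a limit the chapter returns to below.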
Non-human animals show what persuasion looks like when culture has not yet touched it. In humans, almost every act of influence is wrapped in learned convention: language, etiquette, institutional role, narrative tradition. Those wrappings hide the underlying structure. Other species never developed them, so the biological core of persuasion is exposed and readable.
Three things become clear from studying it. First, persuasion is not cultural in origin. The waggle dance existed long before the first hominin, probably by tens of millions of years. Alarm calls, pheromone trails, dominance displays: these are biological solutions to the cooperation problem, products of natural selection operating without any cultural mediation. Whatever is universal in human persuasion, whatever remains when convention is removed, is a variation on machinery that was already ancient.
Second, each non-human system reveals a structural constraint that language later transcended. The waggle dance is extraordinarily precise about location but cannot refer to time: there is no waggle dance for “yesterday’s flowers” or “the patch that will bloom next week.” Queen pheromones suppress worker reproduction but cannot negotiate exceptions or propose novel arrangements. Every non-human system is frozen at a specific design solution, capable of solving one class of coordination problem and nothing else. Language, by contrast, is not frozen. It can represent anything. Seeing the constraints of non-language systems is the clearest way to understand what language added.
Third, each case is a natural experiment. Eusociality has evolved independently in insects, naked mole rats, and snapping shrimp; coalition politics has evolved independently in several primate lineages; graded alarm calls have evolved independently in birds, primates, and other mammals. These convergences reveal the recurrent logic of cooperation under selection. When different lineages facing similar problems arrive at similar solutions, the solution is probably close to optimal for those conditions.
The cases below trace how evolution built, over hundreds of millions of years, increasingly sophisticated persuasion machinery, beginning with insect colonies and ending with the coalition politics of chimpanzees.
3.4.2 Primate Social Persuasion: Grooming, Coalitions, and Politics
Among mammals, the primates provide the richest evidence for sophisticated social persuasion — communication aimed not merely at coordinating immediate behaviour but at managing long-term relationships and social standing.
3.4.2.1 Chimpanzees: The Emergence of Political Persuasion
De Waal’s [35] detailed observations at Arnhem Zoo remain the canonical account of chimpanzee social politics. Male chimpanzees compete for alpha status not primarily through brute force, because an outright fight risks injury to both parties, but through a sustained process of coalition building that is recognisably persuasive. An ambitious male will systematically groom potential allies, sharing food and providing social support in return for backing in future confrontations. The grooming sessions function as negotiations: the sender offers a costly signal (time, grooming effort, food) that communicates commitment and creates a social debt. The receiver is expected to reciprocate with support in future conflicts. The channel is tactile (grooming) and gestural; the message is an offer of alliance; the effect is the creation of a coalition that shifts the balance of power.

De Waal documented that the most successful alpha males at Arnhem were not the largest or most aggressive, but those best at coalition management: grooming many partners, being reliably supportive to allies, and repairing relationships quickly after conflicts. Two males who had fought were often observed, within an hour, presenting themselves to each other for reconciliation. Uninvolved third parties would also approach the loser of a fight and groom or embrace them, a behaviour de Waal interpreted as post-conflict consolation. The consoler needs to recognise the loser’s distress and respond to it, which requires something like empathy, the capacity to model another individual’s emotional state and be moved to act on that model.
Chimpanzees also demonstrate strategic deception, a form of persuasion that involves deliberately misrepresenting information to influence a receiver’s behaviour. Subordinate males hide erections from dominant individuals to avoid aggression, and individuals have been observed leading competitors away from food sources they have located, then circling back alone. Byrne and Whiten [6] catalogued dozens of such cases and proposed the “Machiavellian intelligence” hypothesis: that the demands of managing complex social relationships (tracking alliances, reading intentions, deceiving rivals, building coalitions) drove the expansion of the primate neocortex. Social intelligence, in this view, is persuasive intelligence.
3.4.2.2 Baboons and Cooperative Sentinels
Baboons (Papio spp.) organise in large troops of 20 to over 100 individuals with multi-level dominance hierarchies that are reproduced daily through communication rather than continuous physical contest. Subordinates signal submission through vocalisation, postural appeasement displays, and grooming directed upward in the hierarchy; dominants signal status through gait, gaze, and priority-of-access behaviours. The social order is reproduced through the ongoing persuasive acts of its participants, not merely imposed by force.
The system goes considerably further than status display. Males use specific vocalisations (“grunts”) directed at females before approaching them or their infants. The probability that a female will tolerate an approach is measurably higher after such grunts than in their absence, and the effect is stronger for males with a consistent history of non-aggressive behaviour toward that female. This is the full signal-model-decision sequence in operation: the receiver tracks the sender’s past behaviour, updates her model of his intentions, and regulates her response accordingly.
Females also engage in strategic alliance formation across matrilineal kin groups. A female facing a rival will selectively groom a high-ranking male ally in the days preceding likely conflict, then benefit from his proximity during the confrontation itself. The investment in grooming functions as a deposit in a social account, and the subsequent conflict is the withdrawal. Tracking these relationships across dozens of individuals, over weeks and months, requires a social memory that begins to look less like the fixed-response systems of insects and more like genuine political cognition.
Cooperative sentinel behaviour in meerkats (Suricata suricatta), studied extensively by Clutton-Brock and colleagues [7], illustrates how honest alarm signalling can evolve in the absence of close kinship. Sentinel individuals at elevated positions produce graded alarm calls: the call type encodes both the type of predator and the level of urgency, allowing group members to calibrate their response. The sender (sentinel) invests in costly vigilance; the message (alarm call) is graded and specific; the channel is acoustic; the effect is coordinated predator avoidance across the group. The system is maintained by reputation: sentinels who produce false alarms are less likely to receive reciprocal sentinel behaviour from groupmates.
3.4.2.3 Lions: Cooperative Hunting as Persuasion in Action
Lion (Panthera leo) cooperative hunting provides an example in a non-primate predator. Stander [31] documented that lionesses take on specialised roles in group hunts: “wings” circle and flank the prey while “centres” wait for it to be driven toward them, with the role division emerging from the spatial positions individuals adopt before the hunt begins. No explicit negotiation is observed; the channel is purely visual (positional and postural signals); the message is the individual’s chosen role; the effect is coordinated group action that captures prey no individual could take alone.
This coordination is accomplished without anything resembling symbolic communication. The information necessary to divide labour and time attacks is transmitted entirely through movement and position in space. The channel and message complexity need not be high to produce sophisticated collective outcomes. What matters is the reliability of the signal-response relationship.

3.4.3 Semantic Signals: The Vervet Monkey
The examples above (chemical trails, waggle dances, alarm calls, cooperative hunts) all involve signals that influence behaviour. But are any of them semantic in the sense that a word is semantic: referring to a specific object or category in the world, independently of the immediate context? This question is sharpest in the case of the East African vervet monkey (Chlorocebus pygerythrus).
Vervets produce three acoustically distinct alarm calls, one for each of their major predators: a call for pythons, a call for martial eagles, and a call for leopards. Each call triggers a predator-appropriate response in listening monkeys. The leopard call causes them to climb higher into a tree (useless against an eagle); the eagle call causes them to look upward and move into dense undergrowth (useless against a leopard); the snake call causes them to stand upright and scan the ground. The responses are specific to the call, not merely to some general level of danger.
One interpretation is that the calls convey only degree of danger, and that the listening monkey looks around, identifies the predator itself, and responds accordingly. Seyfarth, Cheney and Marler [30] ruled out this explanation by playing recorded calls to monkeys in the absence of any actual predator. The monkeys still responded in the predator-appropriate way, climbing, scanning upward, or standing to scan the ground, demonstrating that the call itself, not the predator, drives the response.
The acid test for semantic representation was a habituation study. If an alarm signal is not followed by an objective threat, animals cease to react to it: they habituate. Vervets have two other calls, wrr and chutter, both used to signal the presence of a neighbouring group of monkeys. If a vervet is habituated to the wrr call, does that habituation transfer to the chutter call? If it does, this means the animal has coded both calls as referring to the same category of event, another group of monkeys, a hallmark of semantic representation.
Habituation does transfer between wrr and chutter, but only when both calls come from the same individual, not from different monkeys. Vervets can identify individuals by voice; what appears to be semantic transfer may partly be individual-recognition transfer. Seyfarth and Cheney concluded that the calls do function as semantic representations, but that the system is entangled with individual identity in a way that human language is not.
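One way to make the logic of this result concrete is a toy model in which a listener stores habituation per caller and per referent, rather than per acoustic call type. This is an illustrative sketch only, not the analysis Seyfarth and Cheney performed; the class, the method names, and the monkey labels are hypothetical.

```python
# ASSUMPTION: a toy model for illustration. Habituation is keyed on
# (caller, referent), so it ignores which acoustic call carried the referent.
REFERENT = {
    "wrr": "neighbouring group",
    "chutter": "neighbouring group",
}


class Listener:
    def __init__(self):
        self.habituated = set()  # (caller, referent) pairs no longer trusted

    def hear_unreliable(self, caller: str, call: str) -> None:
        """Record a call that was not followed by the event it normally signals."""
        self.habituated.add((caller, REFERENT[call]))

    def responds(self, caller: str, call: str) -> bool:
        """Does the listener still react to this call from this caller?"""
        return (caller, REFERENT[call]) not in self.habituated


listener = Listener()
listener.hear_unreliable("monkey_A", "wrr")  # repeated wrr calls, no group appears

same_caller = listener.responds("monkey_A", "chutter")
other_caller = listener.responds("monkey_B", "chutter")
print("responds to chutter from the same caller:", same_caller)
print("responds to chutter from a different caller:", other_caller)
```

In this sketch habituation to wrr carries over to chutter only for the same caller, which is the pattern reported above: the listener appears to track what the call is about and who is making it, not how it sounds.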
What, then, are the limits of this system? They are severe, and illuminating by contrast. Vervets do not use calls to refer to objects that are not present: a vervet will not produce the python call to warn a companion about a snake seen yesterday, or at a distant location. The calls are anchored in the here and now; they lack displacement, one of the most important properties that Hockett [19] identified as distinguishing human language from animal communication systems. Vervets also use alarm calls as a means of deception, producing a false call to drive competitors away from a food item, but this deception is conspicuously crude: the deceiving monkey shows no sign of alarm, and bystanders can often see exactly what it is doing. Animals have calls only for objects of immediate biological interest: predators, rivals, food. There is no vervet call for yesterday, or probably, or if you help me now I will help you later. A potentially complete representation system, one capable of referring to anything including hypotheticals and abstractions, was the decisive step. That is the step language made. The next section asks why such a system would evolve at all, given that producing signals that benefit others looks, at first glance, like a losing strategy.
3.4.4 The Prisoner’s Dilemma and Why Cooperative Persuasion Evolves
The vervet case raises a theoretical question that underlies all the examples above: why would any individual evolve to produce honest signals that benefit others, and why would natural selection produce receivers that are persuadable, responding to the signals of others in ways that may benefit the sender? If cooperation is individually costly, why does it not collapse under defection?
The Prisoner’s Dilemma (PD) formalises the problem. Two individuals can each choose to cooperate (C) or defect (D). If both cooperate, each receives a reward R. If one defects while the other cooperates, the defector receives the temptation payoff T (highest), while the cooperator receives the sucker’s payoff S (lowest). If both defect, each receives the punishment payoff P. The payoff ordering T > R > P > S means that regardless of what the other player does, defection yields a higher immediate payoff. In a single interaction, rational actors defect, and cooperation collapses.
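The dominance argument can be checked mechanically. The numeric payoffs below are illustrative assumptions (the values conventionally used in iterated-PD tournaments); any numbers satisfying T > R > P > S lead to the same conclusion.

```python
# ASSUMPTION: illustrative payoff values; the chapter specifies only the
# ordering T > R > P > S, not the numbers.
T, R, P, S = 5, 3, 1, 0
assert T > R > P > S

# PAYOFF[(my_move, their_move)] -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def best_reply(their_move: str) -> str:
    """The move that maximises my payoff against a fixed move by the other player."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, their_move)])


for their_move in "CD":
    print(f"if the other player plays {their_move}, my best reply is {best_reply(their_move)}")

print("mutual cooperation pays more than mutual defection:",
      PAYOFF[("C", "C")] > PAYOFF[("D", "D")])
```

Defection is the best reply to either move, yet both players would do better under mutual cooperation than under mutual defection. That gap between individual and joint interest is the dilemma.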
The iterated Prisoner’s Dilemma changes this dramatically. When the same individuals interact repeatedly, the prospect of future payoffs can outweigh the immediate gain from defection. Axelrod [2] ran computer tournaments in which strategies submitted by game theorists competed in iterated PD. The winner, across two tournaments, was the simplest strategy submitted: tit-for-tat. Cooperate on the first move, then do whatever the other player did last round. Tit-for-tat is nice (it never defects first), retaliatory (it punishes defection immediately), forgiving (it returns to cooperation as soon as the other player does), and transparent (the other player can easily learn its rule). These properties, Axelrod showed, are exactly what is needed to sustain cooperation in populations of self-interested agents.
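A minimal sketch of the iterated game shows why repetition matters. The payoff values and the 200-round match length are illustrative assumptions, and the three strategies below are a small hand-picked set, not a reconstruction of the tournament field.

```python
# ASSUMPTION: illustrative payoffs and match length; not taken from this chapter.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the other player's previous move."""
    return other_history[-1] if other_history else "C"


def always_defect(own_history, other_history):
    return "D"


def always_cooperate(own_history, other_history):
    return "C"


def play(strategy_a, strategy_b, rounds=200):
    """Play one repeated match and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print("tit-for-tat vs tit-for-tat:    ", play(tit_for_tat, tit_for_tat))
print("tit-for-tat vs always-defect:  ", play(tit_for_tat, always_defect))
print("always-defect vs always-defect:", play(always_defect, always_defect))
print("always-cooperate vs always-defect:", play(always_cooperate, always_defect))
```

In a head-to-head match tit-for-tat never outscores its opponent, and it loses narrowly to a pure defector because it is exploited once. Its advantage shows up across a population: two tit-for-tat players earn far more together than two defectors do, while an unconditional cooperator is exploited in every round.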
The evolutionary implication is deep: in any species with repeated interactions and the ability to recognise individuals, the incentive structure shifts towards cooperative signalling. Persuasion, the use of signals to alter another’s behaviour in ways that benefit the sender, becomes evolutionarily stable when receiver and sender interact repeatedly, because receivers who respond to honest signals and withdraw from exploitative relationships outcompete those who do not. Nowak [23] synthesised the evolutionary routes to cooperation (kin selection, direct reciprocity, indirect reciprocity, network reciprocity, and group selection) and showed that in each case, the underlying mechanism is a signalling system that makes cooperative intent legible and defection costly.
This is the deep evolutionary reason why the social world is saturated with persuasion: honest communication is a stable equilibrium in populations of agents who interact repeatedly and track reputations. The evolution of language in humans, examined in the next section, built on this foundation.
The logic plays out in ways that are easy to observe. In humans, the split-or-steal format of televised game shows provides a near-ideal natural experiment: two strangers face a single PD-like choice, with large sums of money at stake, no future interaction, and a live audience. Many players defect. But occasionally a player reframes the game entirely, announcing in advance that they will always steal and promising to share the winnings afterward. The announcement converts a one-shot PD into a credible commitment device and demonstrates, in front of cameras, how reputation and pre-announced strategy can substitute for repeated interaction.
The same dynamic appears in non-human animals, with cooperation maintained not by language but by memory and reciprocity. Vampire bats (Desmodus rotundus) return to the roost after nightly foraging and regurgitate blood meals to roostmates who failed to feed, selectively favouring past donors and withholding from past defectors. The relationship is stable because the bats interact nightly and recognise individuals. David Attenborough’s narration of this system in The Trials of Life makes the iterated-PD structure unusually explicit for a wildlife documentary.
3.5 Kin Selection and the Logic of Cooperative Persuasion
A worker bee will sting an intruder and die doing it. A meerkat sentinel stands exposed on a rock, calling loudly, making itself conspicuous to exactly the predators it is warning others about. A Belding’s ground squirrel produces a loud alarm call when it spots a hawk, drawing the predator’s attention to itself. These are costly acts that benefit the group at the individual’s expense. Why would natural selection favour them? The answer begins with genetics. Hamilton [17] showed that altruistic behaviour can evolve when the recipient is a sufficiently close relative, formalised as Hamilton’s rule: rb > c, where r is genetic relatedness, b is the benefit to the recipient, and c is the cost to the actor. This elegant inequality, derived across two landmark 1964 papers, unified the previously puzzling phenomena of worker sterility, altruistic alarm calls, and cooperative breeding under a single quantitative framework.
In Hymenoptera (bees, ants, wasps), haplodiploidy, the mechanism by which females develop from fertilised eggs and males from unfertilised ones, means that full sisters share three-quarters of their genes (r = 0.75), a relatedness higher than that between a mother and her offspring (r = 0.5) [34]. The arithmetic here is worth pausing on. A worker bee who sacrifices reproduction helps raise sisters who share 75% of her genes. The inclusive fitness gain, the benefit to shared genes flowing through those sisters, can easily exceed the direct fitness cost of forgoing personal reproduction. This helps explain why worker sterility evolves more readily in Hymenoptera than in diploid species: r = 0.75 for full sisters is higher than the r = 0.5 for full siblings in diploid organisms, and Hamilton’s rule requires rb > c. A higher r makes the left-hand side larger, so the same benefit b can offset a larger cost c, including the cost of sterility.
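The arithmetic can be checked directly. The sketch below encodes Hamilton’s rule and the break-even benefit-to-cost ratio it implies (rb > c is equivalent to b/c > 1/r). The function names and the sample values of b and c are illustrative assumptions, and r = 0.75 applies only to full sisters.

```python
def hamilton_favours(r: float, b: float, c: float) -> bool:
    """Hamilton's rule: altruism is favoured when r * b > c."""
    return r * b > c


def min_benefit_cost_ratio(r: float) -> float:
    """Threshold that b / c must exceed for rb > c to hold, namely 1 / r."""
    return 1.0 / r


# Same hypothetical act (b = 1.5, c = 1.0) evaluated at two levels of relatedness.
for label, r in [("haplodiploid full sisters", 0.75), ("diploid full siblings", 0.5)]:
    print(label, hamilton_favours(r, b=1.5, c=1.0), min_benefit_cost_ratio(r))
```

With these sample values the act pays off between haplodiploid sisters but not between diploid siblings: the first needs b/c above 4/3, the second needs it above 2.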
This connection between kinship and cooperability is not merely arithmetic: it suggests that the trustworthiness of a persuasive signal, the probability that a receiver will act on it, is itself subject to evolutionary selection. Signalling systems evolve towards honesty when sender and receiver interests are sufficiently aligned, and towards manipulation when they diverge. Krebs and Dawkins [21] articulated this as the deep tension underlying all animal communication: signals that reliably benefit both parties are maintained by selection; signals that exploit receivers at their expense drive counter-adaptation. The evolutionary tension between honest persuasion and deceptive manipulation is the most ancient form of the propaganda arms race.
3.6 Costly Signals: Why Honest Communication Persists
The problem with honesty, from an evolutionary standpoint, is that it is always vulnerable to cheating. If a signal of high quality can be faked by a low-quality sender, selection favours fakers and the signal loses its informativeness. So why does honest communication persist at all?
Zahavi [37] provided the answer: signals that impose a genuine cost on the sender cannot be cheaply faked. A peacock’s tail is expensive to grow, metabolically costly to carry, and makes the bird more visible to predators. A weak peacock that grew such a tail would pay the cost without the fitness benefits that a strong peacock enjoys. The tail is informative precisely because it is burdensome — its very extravagance is the guarantee of its honesty. Zahavi called this the handicap principle [38].
The principle extends well beyond feathers. Male deer fight with antlers that are costly in bone, time, and injury risk. Bowerbirds construct elaborate structures that signal building ability and aesthetic sense. Meerkats stand on exposed rocks and call loudly, advertising their sentinel role at personal risk. In each case, the signal is credible because a low-quality individual cannot afford it: the cost enforces honesty.
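The condition under which cost enforces honesty can be stated in a few lines. This is a deliberately simplified, differential-cost reading of the handicap principle, not Zahavi’s own formalism: it assumes every sender gains the same `benefit` when receivers respond, that the signal is cheaper for high-quality senders, and that a sender signals only when the net payoff is positive. All names and numbers are hypothetical.

```python
def sends_signal(benefit: float, cost: float) -> bool:
    """A sender produces the signal only if doing so yields a net gain."""
    return benefit - cost > 0


def signal_is_honest(benefit: float,
                     cost_high_quality: float,
                     cost_low_quality: float) -> bool:
    """The signal reliably indicates quality when high-quality senders
    can afford it and low-quality senders cannot."""
    return (sends_signal(benefit, cost_high_quality)
            and not sends_signal(benefit, cost_low_quality))


print(signal_is_honest(benefit=5, cost_high_quality=2, cost_low_quality=8))
print(signal_is_honest(benefit=5, cost_high_quality=1, cost_low_quality=1))
```

The second call is the cheap-signal case: when the display costs everyone little, low-quality senders produce it too and it stops carrying information.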
Human persuasion inherits this logic directly. A speaker who travels far to deliver a message signals commitment. An orator who publicly stakes a prediction signals confidence. A leader who accepts costly obligations — redistributing food, taking on dangerous tasks, absorbing the first risks of collective action — signals alignment of interest with followers. In political science this surfaces as the costly commitment literature; in economics as signalling games. The underlying biology is the same across every domain: costly acts are credible precisely because they are costly.
Reputational commitment extends this further. Publicly announced promises are hard to break without social cost; the public nature of the promise is itself the enforcement mechanism. Much of what we call rhetoric — the choice of a bold claim, the decision to speak on record rather than off — is the deployment of costly signals in the reputational domain.
3.7 The Arms Race: Honest Signals and Their Mimics
Zahavi’s handicap principle predicts honesty. But it predicts something else too: mimicry. Wherever an honest signal becomes reliably responded to, a selection pressure arises for cheap imitations — signals that look honest but are not. Krebs and Dawkins [21] named this the manipulation side of the evolutionary tension. Every signal system, given time, generates an arms race between honest senders and deceptive imitators, and between credulous receivers and skeptical ones.
The evidence is everywhere in nature. Orchids that mimic the appearance and scent of female wasps deceive male wasps into pseudocopulation, achieving pollination without offering any reward. Cuckoos produce eggs that mimic the host’s eggs closely enough to evade rejection. Females of the firefly genus Photuris mimic the flashing patterns of other firefly species to lure and eat males who mistake them for mates. In each case, receivers eventually evolve counter-adaptations: finer discrimination, subtler recognition criteria, resistance to the previously exploited signal.
The arms race between sender manipulation and receiver resistance is the deepest structural feature of communication, older than nervous systems. It predicts something that will become central to the rest of this book: that persuasion and resistance to persuasion co-evolve. Every technique of influence that becomes widely deployed generates, over time, corresponding immunities. The history of mass media is one version of this cycle — from print propaganda to the media literacy it eventually prompted; from television advertising to commercial skepticism; from email spam to spam filters. What changes with AI-generated persuasion is the speed at which new sender strategies can be deployed, potentially outrunning the evolutionary pace at which receiver resistance develops. That asymmetry is examined in Chapter 7.
Hamilton’s rule explains cooperation within kin. Costly signalling explains cooperation among non-kin who can observe each other’s costly acts. Neither reaches the scale of modern human societies. Something else is needed — something that allows cooperation among complete strangers, across time, at continental scale. That gap is where the human record begins.
3.8 The Human Evolutionary Record
To understand language as a genuine major transition, it helps to ground the argument in the palaeontological record. The timeline below reveals a puzzle: the extraordinary delay between the appearance of our lineage and the appearance of our culture [22].
| Period | Species | Brain size | Key marker |
|---|---|---|---|
| 4 Mya | Australopithecines | ~1/3 modern | Upright posture; ape-sized brains |
| 1.5 Mya | Homo erectus | Brain doubled | Hand axe — unchanged for 1 million years |
| 250 kya | Earliest H. sapiens | Near-modern | Slightly more skilled toolmaking |
| 100 kya | Fully modern H. sapiens | Modern | Fully modern large-brained humans, tools still conservative |
| 40 kya | H. sapiens | Modern | Burst of innovation: cave paintings, burials, trade |

At 4 million years ago, the australopithecines walked upright but their brains were ape-sized, roughly one-third the volume of a modern human brain. By 1.5 million years ago, Homo erectus had appeared with a brain that had doubled, and was carrying a remarkable stone tool: the hand axe. What is astonishing is not the hand axe’s appearance, but its extraordinary stability. Virtually identical specimens have been found across Africa, Europe, and Asia, manufactured to the same specification for approximately one million years and across many thousands of generations [22]. Cultural transmission was clearly operating: the design was copied faithfully, but it was copying without cumulative improvement. This is culture, but not yet cumulative culture.
By 250,000 years ago the earliest Homo sapiens were present, with slightly more skilled toolmaking. By 100,000 years ago, fully modern large-brained humans were widespread, yet their tools remained broadly conservative. Then at 40,000 years ago there was a burst of innovation. Cave paintings appeared; the dead were buried with grave goods; shell ornaments were traded across hundreds of kilometres; tool types proliferated rapidly. Language is the obvious candidate for what changed. Only a communication system capable of displacement, of referring to the absent, the past, the hypothetical, could underpin simultaneous innovation in art, ritual, and long-distance trade, all of which require coordination around shared representations that do not exist in the immediate environment.
3.8.2 Pantomime Predating Verbal Language
The sequence reconstructed by Számadó places gestural and pantomimic communication before verbal language in evolutionary time. Ferretti and Adornetti [12] develop this argument in detail: archaic hominins employed pantomime as a primary persuasive medium, a nonverbal, mimetic, non-conventionalised form of communication that represented events and stories through coordinated body movement and relied on shared mental imagery. Unlike the limited gestural repertoires of other apes, pantomime is inherently narrative: it can depict an absent entity, a sequence of events, a causal chain. Experimental evidence shows that gesture dominated over vocalisation in early human communicative acts, and that gesture has significantly greater potential than vocalisation for bootstrapping a shared communicative system from scratch.
The reason gesture comes first is straightforward: mimicry of visible actions is simpler than arbitrary acoustic symbols. You can mime a running antelope with your hands; you cannot easily vocalise it without prior convention. Gesture is transparent to the receiver in a way that sound is not, because the form of the gesture shares properties with what it depicts. A plausible neural substrate for this is the mirror neuron system, first characterised in macaques: cells in the premotor cortex that fire both when an action is performed and when it is observed in another individual. Because gesture is visible and imitable, and because the same neural circuits are activated both in producing and perceiving an action, gesture would naturally be the first medium for communication about the actions of agents. Sound became the dominant channel later, once the conventions were established, because of its advantages in range, darkness, and multitasking.
The emergence of conversational language built on this gestural base, adding the dimension that pantomime alone cannot achieve: turn-taking argumentation.
3.8.3 Conversational Language as Reciprocal Persuasion
Ferretti and Adornetti [12] locate the distinguishing feature of modern Homo sapiens not in more complex signs but in conversation: the turn-taking exchange in which both parties alternately produce and respond to communicative acts, each trying to shift the other’s beliefs or actions. Not a one-way transmission but a negotiation.
What makes conversation cognitively special is turn-taking itself. Each speaker must model what the other person understood from the previous turn and respond to that model, not to what was said but to what was registered. This demands theory of mind operating in real time: I need to track not only my own intention but your current mental state as I produce the next utterance. Pantomime, however elaborate, does not require this because there is no conversational floor to manage, no obligation to respond to the other’s interpretation rather than merely repeating one’s own display.
Conversation, on this account, was the evolutionary trigger for grammar. The demands of reciprocal persuasion, composing novel arguments in real time, responding to objections, specifying precisely which object or action or time-point is at issue, required a combinatorial system capable of generating an unbounded number of distinct messages from a finite vocabulary. Consider what is needed to say “if you bring the spears to the ridge, I will drive the prey toward you from the south.” That sentence requires tense, conditionals, reference to locations currently out of sight, and subject-predicate structure linking specific agents to specific roles. Simple declaratives and requests, the kind pantomime can approximate, are not enough. The pressures driving grammar were argumentative, not aesthetic: exchanging reasons, proposing and rejecting plans, negotiating roles and obligations in real time.
Through this process, human communication became multimodal: integrating both speech and gesture as complementary channels. Speech took on the primary grammatical burden, the combinatorial, recursive system for specifying relations among arguments, while gesture retained its role in grounding reference, expressing emphasis, and conveying spatial and iconic information that resists easy encoding in syntax. The result is a hybrid system in which the full meaning of an utterance is often distributed across both channels, but whose core propositional structure is carried by the spoken word.
The genetic architecture that makes all this possible was not installed in a single evolutionary step. It co-evolved with the cultural practices it enabled.
3.9 Language: The Human Major Transition
The communication systems surveyed so far — chemical gradients, waggle dances, alarm calls, coalition politics — share a fundamental constraint. They all operate in the present. A vervet alarm call signals a leopard here and now. A grooming session cements an alliance with the individual directly in front of you. Pheromone trails point to food that currently exists. No signal in any of these systems can refer to what happened yesterday, what might happen tomorrow, or what would have happened if someone had behaved differently.
Language broke that constraint. It differs from every communication system that came before not in degree but in kind, through three properties absent in all animal communication [19]:
- Combinatorial productivity: a finite set of phonemes combines into a large, open-ended stock of morphemes, which combine into an unbounded number of sentences, each capable of expressing a distinct meaning.
- Displacement: language can refer to entities and events that are not present in the immediate environment, including objects in the past or future, distant locations, and purely hypothetical situations.
- Propositional structure: language encodes not just the identity of referents but the relations between them, including causal, conditional, and normative relations.
These properties together mean that language can coordinate behaviour around representations of the world, including representations of social rules, obligations, and sanctions, rather than merely around present stimuli. A honeybee’s waggle dance communicates the location of a food source with extraordinary precision, but it cannot communicate that a certain flower patch is morally off-limits, or that a nestmate who visited it owes an apology. Language can. This is what makes language the pivot of this book: every mechanism of persuasion examined in the chapters that follow — attitude change, framing, narrative, political rhetoric, AI-generated content — operates through it. Without displacement, without propositional structure, without the ability to say if you do this I will do that or they did something terrible last year, none of those mechanisms exist. Persuasion at the scale that humans practise it is not an application of language; it is one of the primary reasons language evolved.
That connection runs in both directions. Language made large-scale persuasion possible; the selection pressure for large-scale cooperation made language evolutionarily advantageous. The two drove each other. The following sections trace how — starting with the social scaling problem language solved, and ending with the way cultural transmission turned language into a system that persuades not just individuals but entire civilisations.
3.9.2 Language and Thought: Distinct Systems
If language evolved primarily for persuasion rather than reasoning, we should expect language and thought to be partially separable: distinct systems with distinct evolutionary histories that happen to interact. The evidence bears this out. A persistent misconception treats the two as the same thing, as if thinking were simply internal speech. In fact they are biologically distinct systems, and the distinction matters profoundly for understanding what language does as a persuasive technology. Fedorenko, Piantadosi and Gibson [11] review the evidence and conclude that language is primarily a tool for communication between minds, not a medium of private thought.
The clearest evidence comes from patients with severe aphasia — the selective loss of language following damage to the left hemisphere’s language areas. Such patients may lose virtually all ability to produce or comprehend speech, yet retain the ability to perform arithmetic, solve spatial puzzles, follow complex non-verbal instructions, and reason causally about the world. The language system, even when catastrophically damaged, does not take reasoning down with it.
The complementary pattern is equally informative. Certain forms of frontal lobe damage or thought disorder leave linguistic fluency intact — the patient produces grammatical, well-formed sentences — while producing severe impairments in decision-making, planning, and logical inference. A person can speak perfectly and reason very poorly. Together, these two patterns constitute a double dissociation: each system can be selectively damaged while the other is preserved, which is the strongest evidence that they are anatomically and computationally distinct.
The structure of language itself reinforces this conclusion. If language had evolved primarily as a tool for thinking, we would expect it to be optimised for the demands of reasoning: precision, unambiguity, completeness. Instead, the statistical structure of every known human language reflects the pressures of communication between a sender and a receiver [11]. Across languages, the most frequently used words are the shortest — word length is predicted more strongly by contextual predictability than by meaning complexity. Grammatically related elements cluster together within sentences, reducing the listener’s memory load; analyses of large corpora show that actual sentences are consistently shorter in dependency length than random arrangements of the same words would be. And languages tolerate, even exploit, ambiguity: in predictable contexts a shorter ambiguous expression transmits the same information as a longer unambiguous one. Private reasoning, which has no receiver, would gain nothing from ambiguity. Language’s tolerance of it is a signature of its communicative function.
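The dependency-length comparison can be illustrated on a single toy sentence. The sketch below is only a miniature of the corpus analyses mentioned above: the five-word sentence and its dependency parse are hand-assigned assumptions, and the baseline is every possible reordering of the same words, which is cruder than the baselines used in published work.

```python
from itertools import permutations

# ASSUMPTION: an illustrative sentence with a hand-assigned parse.
# HEADS[i] is the index of the word that word i depends on (None for the root).
WORDS = ["the", "dog", "chased", "the", "cat"]
HEADS = [1, 2, None, 4, 2]


def dependency_length(order) -> int:
    """Sum of linear distances between each word and its head.

    `order` lists word indices in the sequence in which they are uttered.
    """
    position = {word: pos for pos, word in enumerate(order)}
    return sum(abs(position[i] - position[h])
               for i, h in enumerate(HEADS) if h is not None)


def mean_length_over_all_orders() -> float:
    """Average dependency length across every reordering of the sentence."""
    orders = list(permutations(range(len(WORDS))))
    return sum(dependency_length(o) for o in orders) / len(orders)


observed = dependency_length(range(len(WORDS)))
print(observed, mean_length_over_all_orders())
```

For this sentence the attested order keeps related words closer together than the average reordering does, which is the pattern the corpus studies report at scale.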
Developmental evidence runs in the same direction. Prelinguistic infants track object permanence, attribute intentions to agents, and compute causal chains before they have the syntactic or lexical resources to describe any of this. Going the other way, children acquire grammatical patterns in domains where their conceptual understanding lags — producing passive constructions or embedded clauses in contexts where they do not fully grasp the logical relationship being expressed. The two systems develop on their own schedules, consistent with distinct biological programmes.
Language enables the persuasive achievements described in the remaining sections of this chapter (shared fictions, institutions, normative systems) not by creating new thoughts in individuals, but by transmitting representations between minds. Persuasion via language is other-directed: a technology for aligning the mental states of distinct agents, not for improving the reasoning of any one of them.
3.9.3 From Signals to Recursion: The Hierarchy of Linguistic Capacity
Language did not emerge as a finished system. It is better understood as a hierarchy of increasing computational power, each level enabling communicative and cognitive capacities inaccessible to the level below it. Mapping this hierarchy clarifies what is shared between humans and other animals, what is uniquely human, and why the uniquely human components were the decisive step for cooperation at scale.
Level 1 — Signals. The most primitive communicative acts are signals: outputs that reliably trigger specific responses in receivers, with the signal-response relationship fixed by biology. The vervet alarm calls (Section 3.4.3) are paradigm cases: three acoustically distinct calls, each eliciting a predator-appropriate flight response, each biologically specified. The honeybee’s waggle dance encodes direction and distance to food with extraordinary precision. Meerkat sentinel calls grade continuously with threat level. In each case, the signal is tied to an immediate, perceptible state of the world — a real predator, a real food source, a real threat. What signals enable: rapid, reliable coordination of behaviour around present stimuli. What they cannot do: refer to the absent, the categorical, or the arbitrary. There is no vervet signal for yesterday’s python.
Level 2 — Symbols. A symbol is an arbitrary sign-referent relationship: the acoustic or gestural form of the symbol bears no iconic resemblance to what it refers to. The English word cat sounds nothing like a cat; the word red is not itself red; the sign for apple in American Sign Language does not look like an apple. This arbitrariness is not a limitation — it is the crucial enabling property. A signal system tied to iconic or indexical resemblance can only refer to things that can be imitated or pointed at. An arbitrary symbol can refer to anything, including things that cannot be perceived, imitated, or pointed at: obligations, possibilities, mathematical objects, the future. Washoe’s acquisition of arbitrary ASL signs demonstrates that the symbol-forming capacity is not unique to humans — trained apes can acquire a limited set. What symbols enable: naming, the assignment of a stable label to a category that can then be communicated across individuals and across time.
Level 3 — Vocabulary expansion. Once the symbol principle is established, the vocabulary can grow without limit. Each new word extends the referential scope of the system without requiring new signal infrastructure. Vocabulary can track cultural innovation: new concepts get new names (algorithm, democracy, copyright), and those names can spread through a community within a generation, far faster than any genetic process. This is why Tooby and Cosmides [33] argued that the genome stores the capacity to learn words rather than the words themselves: cultural evolution generates vocabulary faster than genetic evolution ever could. What vocabulary expansion enables: the categorical mapping of the world, including the shared conceptual maps that underlie coordinated action among strangers who have never met.
Level 4 — Simple combinations. Placing two symbols in relation — big train, tickle Washoe, my milk — produces a qualitatively richer output than either symbol alone. The combination encodes a proposition: an assertion about how two entities stand in relation to one another. This is the level at which the two-word child, the language-trained chimpanzee, and Genie all operate (see Section 3.9.6). The semantic territory covered by two-word combinations is already substantial: attribution of properties (red book), possession (my milk), location (walk street), agent-patient relations (Adam checker). What combinations enable: propositional content — the expression of states of affairs rather than merely the naming of objects. What they cannot express: the difference between the dog bit the man and the man bit the dog, because at this level word-order rules are absent or inconsistent, and embedding is impossible.
Level 5 — Syntax. Syntax is the system of rules that governs how symbols can be combined — which structural roles they can play, in which order, with which agreement relations. The decisive syntactic innovation, present in all known human languages and absent in all animal communication systems, is the subject-predicate distinction (see Section 3.9.4): the separation of the argument slot (what is being talked about) from the predicate slot (what is being said about it). This allows a vocabulary of N nouns and M verbs to generate N × M distinct propositions rather than N + M distinct signals. The expressive capacity grows multiplicatively rather than additively. What syntax enables: unbounded messages from finite means — the ability to say, and understand, sentences never before uttered.
Level 6 — Recursion. Recursion is the embedding of one linguistic structure inside another of the same type, without principled limit. A sentence can contain a relative clause that contains another relative clause: the dog that the man who the woman hit saw ran. A verb of mental state can take a propositional complement that contains another propositional complement: She knew that he believed that they had agreed to leave. Conditional and counterfactual reasoning is structured recursively: if he had known that she would have done what she said she would never do, he would not have…. Recursion is what allows language to represent thoughts about thoughts — the basis of full-blown theory of mind: not merely knowing that someone has a belief, but knowing what they believe someone else believes about what you believe.
What recursion enables: the entire domain of embedded social reasoning — the contractual, legal, narrative, and moral reasoning that human cooperation depends on. A contract (“if you do X, I will do Y, unless Z obtains, in which case…”) is a recursively embedded conditional. A legal argument is a chain of recursively embedded propositions about what others did, intended, agreed to, and were permitted to do. A novel embeds one character’s consciousness inside another’s, nested within a narrator’s, nested within the author’s representation of a world that never existed. None of this is possible without recursion.
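The defining property of recursion, a rule that can apply to its own output, is easy to show in code. The sketch below is an illustration only and makes no claim about how the language faculty is implemented. It wraps a core proposition in one belief clause per holder, and the function name and example sentences are hypothetical.

```python
def embed_beliefs(holders: list, core: str) -> str:
    """Recursively wrap `core` in one 'X believes that ...' clause per holder."""
    if not holders:
        return core  # base case: nothing left to embed
    first, rest = holders[0], holders[1:]
    # Recursive step: the embedded clause is built by the same rule.
    return f"{first} believes that {embed_beliefs(rest, core)}"


def embedding_depth(sentence: str) -> int:
    """Count how many belief clauses a generated sentence contains."""
    return sentence.count("believes that")


sentence = embed_beliefs(["she", "he", "the council"], "they agreed to leave")
print(sentence)
print(embedding_depth(sentence))
```

Nothing in the rule limits the depth: each additional holder adds one more layer of thought about thought, while the vocabulary and the rule itself stay the same.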
Level 7 — Abstract reference and displacement. The final level is the capacity to refer to entities and events that are not present in the immediate perceptual environment — and, beyond that, to entities that may not exist at all. Displacement [19] is the ability to speak of the past and future, of distant locations, of hypotheticals and counterfactuals. Abstract reference extends this to entities that have no spatiotemporal location whatsoever: obligations, rights, probabilities, mathematical objects, moral duties, social roles. A corporation is not a physical object; democracy is not a perceptual category; justice does not exist at any particular location. Yet these abstractions coordinate the behaviour of millions of people who have never met.
What abstract reference enables is precisely the institutional infrastructure that distinguishes human civilisation from the cooperation of every other species: laws, markets, religions, scientific communities, states. Every one of these institutions is, in the analysis of Section 3.9.9, a shared fiction — a collectively held representation of something that has no physical existence but that generates real coordination through belief. The capacity to represent and communicate about non-present, non-perceptible, non-existent entities is not a luxury; it is the communicative foundation of everything that makes human social organisation unique.
The hierarchy as an evolutionary ladder. The levels are not merely descriptive categories; they map onto distinct evolutionary and developmental stages. Natural animal communication systems remain at Level 1; trained apes reach Level 2 (symbols) and, at best, the simple combinations of Level 4. Proto-language, whether in the two-year-old child, the trained ape, or the language-deprived human like Genie, operates at Levels 2–4. Full human language achieves Levels 5–7. The transitions between levels are not smooth gradients; each requires qualitatively new neural architecture. It is the jump from Level 4 to Level 5, from combination to syntax, that constitutes the major transition in human evolution, and it is Levels 6 and 7 that make human persuasion qualitatively unlike anything seen elsewhere in the animal kingdom.
| Level | Capacity | Example | What it enables |
|---|---|---|---|
| 1 | Signals | Vervet alarm calls | Immediate behavioural coordination |
| 2 | Symbols | Washoe’s ASL signs | Naming arbitrary categories |
| 3 | Vocabulary expansion | Cultural words (algorithm) | Tracking cultural innovation |
| 4 | Simple combinations | Big train; Tickle Washoe | Propositional content |
| 5 | Syntax | Subject-predicate structure | Unbounded messages from finite vocabulary |
| 6 | Recursion | She knew that he believed that… | Theory of mind; contracts; narrative |
| 7 | Abstract reference | Obligations, rights, justice | Institutions; law; shared fictions |
3.9.4 Grammar, Syntax, and Semantics
The most fundamental structural feature of human language — and one that has no counterpart in any animal communication system — is the subject-predicate distinction. Consider four concepts: dog-running, dog-sleeping, lion-running, lion-sleeping. A system without grammar would need a separate signal for each of these four combinations. Human language does something more powerful: it provides two nouns (dog, lion) and two verbs (run, sleep), and allows any noun to be combined with any predicate. To say “the dog is sleeping” is a predication — an assertion that a property holds of an entity. Since many properties can be predicated of each entity, and many entities can be referred to by each noun, the range of things that can be communicated with a vocabulary of a given size grows not additively but multiplicatively. The subject-predicate distinction is a universal feature of all known human languages, and it is the basis of our ability to produce and understand an indefinitely large number of sentences from a finite vocabulary. A vervet monkey cannot do this: its alarm calls are fixed signals, not combinations of reusable parts.
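The difference between additive and multiplicative growth can be sketched as follows. The helper names are hypothetical, and the model is deliberately minimal: one noun plus one verb per proposition, with no tense, agreement, or embedding.

```python
from itertools import product


def propositions(nouns, verbs):
    """Every subject-predicate pairing a combinatorial system can express."""
    return [f"{noun} {verb}" for noun, verb in product(nouns, verbs)]


def holistic_signals_needed(n_entities: int, n_properties: int) -> int:
    """Without grammar, each entity-property pairing needs its own fixed signal."""
    return n_entities * n_properties


def words_needed(n_nouns: int, n_verbs: int) -> int:
    """With subject-predicate structure, only the parts must be learned."""
    return n_nouns + n_verbs


print(propositions(["dog", "lion"], ["runs", "sleeps"]))
for n, m in [(2, 2), (10, 10), (100, 50)]:
    print(n, m, words_needed(n, m), holistic_signals_needed(n, m))
```

At the scale of the four-concept example the two systems cost the same (four words versus four signals). The advantage appears as the vocabulary grows: 100 nouns and 50 verbs are 150 items to learn but yield 5,000 distinct propositions.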
Why is only the capacity to learn language innate, and not the vocabulary itself? If language were adaptive, would it not be more efficient to transmit the vocabulary genetically, rather than requiring each child to learn it from scratch? Tooby and Cosmides [33] offer a compelling answer: if the vocabulary is learned, we can acquire names for cultural innovations — screwdriver, constitution, algorithm. Cultural evolution is far faster than genetic evolution; long before any appreciable number of words could be genetically assimilated, dialects and distinct languages were already present and diverging. The genome may as well store the vocabulary in the “cultural environment” — meaning that the capacity to learn words is heritable, while the words themselves are maintained and transmitted culturally. This is a precise instance of the bio-cultural co-evolution described in Section 3.9.8: the biological endowment creates the receptacle; the cultural environment fills it.
The innateness of grammar’s deep structure presents a different puzzle. The surface rules of different languages vary enormously, but their deep structural properties (subject-predicate organisation, recursive embedding, argument structure) are universal. If these structural principles can neither be learned from the input (bottom-up) nor derived from general reasoning principles (top-down), their existence requires explanation. As Bates and colleagues argued, there are then logically only two possibilities: either universal grammar was installed directly by the Creator, or our species underwent a cognitive mutation of unprecedented magnitude, a Big Bang of the language faculty.
This is not an argument that should appeal to an evolutionary biologist. We have been told too often that the eye could not have evolved by natural selection, because any alteration to its structure would destroy its function. Yet we know of many functional intermediates between a simple pigment spot and the vertebrate eye, each fully adequate for the organism’s needs at that stage. Evolutionary biology teaches that complex organs are built incrementally, not installed all at once. The question for language is whether comparable intermediates can be identified. The snag is that the intermediates no longer exist — unlike the eye, for which we have living comparative examples across phyla. But the evidence from cases of partial linguistic capacity — specific language impairment, aphasia, sign languages at different stages of grammaticalisation — suggests that grammatical competence is not all-or-nothing. There can be partial grammar, and partial grammar is far better than none.
3.9.5 Language in the Brain: Genetic and Lesion Evidence
The neurological evidence for language as a biologically specific faculty — not merely a general cognitive capacity applied to communication — comes from two sources: cases of selective brain damage and cases of selective genetic impairment.

Patients with damage to the temporal segment of the left lingual gyrus suffer from colour anomia. They experience colour normally and are in full command of word morphology — they can produce and comprehend sentences — but they are unable to pair colour names with colours: they may pair yellow with grass and green with banana. Given a colour name, they point to the wrong colour. The link between word and concept is selectively impaired. Patients with damage to the anterior and mid-temporal cortices recognise objects correctly but cannot name them; they say, “I know it, but I cannot say the name.” Oddly, this naming difficulty is more pronounced for natural objects than for human-made artefacts, suggesting that different parts of the brain represent categories of objects differently. Damage to the left anterior temporal lobe can selectively impair the ability to retrieve the names of unique persons, while leaving common-noun retrieval intact. Each of these dissociations reveals a distinct sub-component of the language system — a modularity within language itself that is anatomically grounded.
The biological specificity of language is shown most directly by a British family studied by Gopnik [16]. Across three generations, 16 of 30 family members were affected by a peculiar language disorder (dysphasia). The pattern of inheritance — affecting some members of a sibship while sparing others — is consistent with a single dominant autosomal gene. The disorder is not due to imitation of a disordered parent: children affected while one parent is normal still show the impairment. These individuals are not globally impaired: they tell jokes, converse, and some are mathematically capable. There is no general failure to handle hierarchical structures. The impairment is specific to one aspect of morphology: the affected individuals cannot generalise grammatical rules.
The nature of the deficit is beautifully illustrated by Gopnik’s examples. Affected children write sentences such as:
She remembered when she hurts herself the other day. Carol is cry in the church. On Saturday I went to nanny house with nanny and Carol.
In each case the child fails to mark tense or possession using the appropriate morphological change: hurts should be hurt (past), cry should be crying, nanny house should be nanny’s house. When shown a picture of an imaginary creature called a wug and then a picture of several such creatures, a normal child immediately says wugs. The dysphasic child cannot: they can learn individual examples of plurals and past tenses, just as we all learn that the past of go is went, but they cannot generalise the rule to new cases. Grammar, for them, is a collection of memorised facts rather than a productive system.
One child demonstrated this precisely. On a Monday she wrote: On Saturday I watch TV. Her teacher corrected this to watched. The following week she wrote: On Saturday I wash myself and I watched TV and I went to bed. She had learned that the past of watch is watched as a particular fact; she had not extended the rule to wash; and she already knew went as a unique memorised form. The productive morphological rule, the ability to generalise, is the specific thing that is missing.
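The contrast between memorised forms and a productive rule can be made concrete with a small sketch. The code below is only an illustration under simplifying assumptions: the names MEMORISED_PAST, past_by_memory, and past_by_rule are hypothetical, the stored forms are the two mentioned in the example above (went, watched), and the "-ed" rule is a deliberately crude stand-in for English past-tense morphology, not a model of the children Gopnik studied.

```python
from typing import Optional

# Hypothetical store of memorised past forms: the only ones this toy speaker knows.
MEMORISED_PAST = {"go": "went", "watch": "watched"}


def past_by_memory(verb: str) -> Optional[str]:
    """Return a stored past form, or None when the verb was never memorised."""
    return MEMORISED_PAST.get(verb)


def past_by_rule(verb: str) -> str:
    """Use a stored form if one exists; otherwise apply a simplified '-ed' rule."""
    if verb in MEMORISED_PAST:
        return MEMORISED_PAST[verb]
    return verb + ("d" if verb.endswith("e") else "ed")


# The memory-only speaker has nothing to say about an unpractised verb,
# while the rule-using speaker extends the pattern to it.
print(past_by_memory("wash"))  # None
print(past_by_rule("wash"))    # washed
```

On this sketch, a speaker limited to past_by_memory handles watch and go but has no answer for wash, which is the pattern in the child's second piece of writing; a speaker with past_by_rule produces washed without ever having encountered it.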
This case carries several implications for understanding what language is. First, it shows that there can be intermediates between perfect linguistic competence and none: these individuals have substantial language, just not generative morphology. This is evidence against the “all-or-nothing” view of language evolution, and for the possibility that the faculty evolved incrementally. Second, the impairment is specific to language — there is no general cognitive defect — suggesting that at least some grammatical knowledge is instantiated in neural structures that are biologically dedicated to language rather than shared with domain-general reasoning.
Subsequent molecular research identified a gene — FOXP2, a transcription factor that regulates neural development — as implicated in heritable language disorder. The human version of FOXP2 differs from the chimpanzee version by two amino acid substitutions that appear to have arisen recently in our evolutionary history, and disruption of this gene produces impairments specifically in the sequencing and articulation of speech. It is one of the first concrete links identified between a specific genetic variant and a specific component of the human language faculty.
3.9.6 Proto-language: Apes, Children, and Genie
The best way to understand what is distinctive about full human language is to examine what precedes it — what we can call proto-language: a communication system that conveys meanings but lacks the productive grammar that makes human language unbounded. The concept is brought into sharp focus by comparing two sets of two-word utterances from very different sources.
(12) Big train; Red book; Adam checker; Mommy lunch; Walk street; Go store; My milk; Pretty boat; Mama honey; Pig Mommy.
(13) Tickle Washoe; Open blanket; Roger ticket; You drink; Go in; In hat; Clothes Mrs. Gardner; Listen dog; Sign me; Hurry gimme.
These two sets are close to indistinguishable. Both draw on a small vocabulary of noun-like and verb-like terms assembled into pairs. They cover the same semantic territory: the attribution of qualities (big train, red book), possession (my milk, clothes Mrs. Gardner), location of actions (walk street, go in), and agent-patient relations (Adam checker, tickle Washoe). The apparent syntax, to the extent that word order carries meaning, is indistinguishable across the two samples.
Yet (12) is from children at the two-word stage of language acquisition, and (13) is from the chimpanzee Washoe, trained in American Sign Language [14]. The surface similarity masks a fundamental difference in motivation. Children at the two-word stage are in the business of categorising the world for its own sake: a child will say red book with no request implied, simply to predicate a property of an object. Washoe’s utterances, by contrast, are overwhelmingly communicative appeals about objects or actions she wants: requests to be tickled, to have a blanket opened, to receive food or attention. The ape is using proto-language instrumentally, as a tool for getting things. The child is using it representationally, as a tool for mapping reality — even when there is nothing to be gained.
This difference — categorisation for its own sake versus communication about immediate wants — marks the threshold between a communicative system and a representational one. Only a fully representational system can build the shared maps of reality — including maps of obligations, norms, past events, and hypothetical futures — that make complex cooperation possible.
Genie: the human case. In 1970, a 13-year-old girl was discovered in Los Angeles who had been confined and severely isolated since approximately 18 months of age, denied normal language input for over eleven years. The linguist Susan Curtiss [8] documented her linguistic development after her rescue. Genie acquired vocabulary rapidly — her capacity to learn words was intact — but her grammar remained permanently limited. Representative utterances from her early speech: Want milk; Mike paint; Applesauce buy store; I want Curtiss play piano. These are slightly more elaborated than the two-word utterances of (12) or (13): she can string together more than two content words and can express a desire for another person’s action. But tense marking, morphological agreement, and the recursive embedding that characterises adult grammar are absent. There is no generative phrase structure in the linguistic sense.
Genie’s language, like Washoe’s, like the two-word child’s, is what we can call proto-language. It is a communication system that is older, in an evolutionary sense, than full human language — phylogenetically ancient. It is present in very young children before grammar has developed, in language-trained apes who have never been exposed to grammar, and in a human who was denied the critical developmental window during which grammar normally takes hold. There appears to be no critical period for proto-language: Genie acquired it at age 13 without difficulty. There is, however, a critical period for the grammatical component that elevates proto-language into full human language — and that window had closed for Genie before she was found.
Genie’s case thus provides a natural experiment that dovetails with the genetic and lesion evidence reviewed in Section 3.9.5. The morphological generalisation ability damaged in Gopnik’s dysphasia family, the specific naming and categorisation abilities impaired by focal lesions, and the syntactic component that Genie could never acquire — all point to the same conclusion: human language is not a single faculty but a family of biologically specific capacities, only some of which are shared with other primates, and only some of which can develop without normal early exposure.
The proto-language capacity appears to be the shared substrate — the platform on which, in our lineage, a new grammatical architecture was built. The question that follows naturally is: how does that architecture arise? Can it emerge from cultural processes alone, without genetic change? The evidence from pidgins and creoles is the clearest answer available.
3.9.7 Pidgins, Creoles, and the Speed of Cultural Evolution
A pidgin is a contact language that emerges spontaneously when speakers of mutually unintelligible languages must communicate — most commonly under conditions of trade or colonial labour. Pidgins draw their vocabulary primarily from one dominant language, but strip away most of its morphology and syntactic complexity. They have no regular tense system, little or no agreement, minimal embedding, and no native speakers. Pidgin speakers share no common language; the pidgin is their improvised solution to an immediate communicative need. It is, in the technical sense, a proto-language: its utterances resemble those of Washoe and of Genie far more than they resemble the output of a native English or Japanese speaker.
A creole is what happens in the next generation. When children grow up in a community where a pidgin is the primary medium of communication, they do not simply learn the pidgin they are exposed to. They systematically expand it. They add consistent grammatical morphology, stable word order, tense-aspect-modality markers, and recursive embedding — until the resulting language has the full expressive power of any natural human language. The proportion of purely grammatical items (articles, prepositions, tense markers, complementisers, relativising particles) rises from near zero in the pidgin to approximately 50 per cent in established creoles. This is not a modest elaboration; it is the spontaneous creation of a grammatical system from proto-linguistic raw material. And it happens within a single generation, without instruction, driven by the same biological endowment that allows any child to acquire whatever language their community speaks.
Why study pidgins and creoles? Because they constitute a natural experiment in language creation: a case in which we can observe, in historical time, the transition from proto-language to full language. Most of the evidence bearing on the origin of language is indirect — inferred from fossils, comparative anatomy, or computational models. Creolisation gives us a direct window.
The Hawaiian case. The clearest documented instance is Hawaiian Creole English. In the late nineteenth and early twentieth centuries, plantation workers arrived in Hawaii from China, Japan, the Philippines, Korea, Portugal, and Puerto Rico. With no common language, they developed a pidgin — simplified English mixed with fragments of their native tongues — for essential communication. Their children, immersed in this pidgin from birth, spontaneously produced Hawaiian Creole English: a fully grammatical language with consistent syntactic rules, a complete tense-aspect-modality system, and the recursive structures entirely absent from the pidgin input. The jump from proto-language to full language occurred in a single generation [3, 4].
Cultural evolution is faster than genetic evolution by many orders of magnitude. A human generation is approximately 25–30 years, and a single generation is far too short for any significant genetic change. The Hawaiian children inherited the same language-ready brains as their parents. What changed was not the genome but the cultural environment: the children were immersed in communicative interaction, even in a structurally impoverished form, and their language faculty, the biological endowment shared by all Homo sapiens, supplied the grammatical architecture that the input lacked.
This means that the transition from proto-language to full language does not, in principle, require genetic change. It requires children, a community, and enough time for the first creole-speaking generation to emerge. The genetic endowment creates the capacity; the cultural environment triggers its full expression. This is a precise and historically documented instance of what Tooby and Cosmides [33] argued for vocabulary: that cultural transmission is the appropriate vehicle for the content of language, while genetic transmission is the appropriate vehicle for the capacity to acquire it. Creolisation extends this argument from words to grammar itself.
For the study of persuasion, the creolisation finding carries a consequential implication. The most powerful persuasive technology in human evolutionary history, fully grammatical language, is as much a cultural product as a biological one. The capacity is genetic; the grammatical realisation is cultural. Groups that create the conditions for creolisation — dense communicative interaction across language boundaries — will produce, within a generation, the full expressive and persuasive power of human language. The biology is universal; the culture is the trigger. This is the deepest sense in which what follows in the next section on dual inheritance theory applies not just to beliefs and rituals, but to language itself.
3.9.8 From Genetic to Cultural Transmission: A New Mode of Persuasion
The persuasive mechanisms reviewed in the preceding sections (pheromone trails, alarm calls, waggle dances, grooming exchanges, dominance displays) share a fundamental property: the capacity for these behaviours is encoded genetically. A honeybee’s ability to produce and respond to the waggle dance, a meerkat’s alarm call repertoire, a chimpanzee’s disposition to engage in coalition politics: all are heritable in the strict biological sense. Each generation must rebuild the relevant neural architecture from its genetic inheritance.
Language introduces a qualitatively different mode of transmission. The capacity for language is genetically encoded — specialised cortical regions, precise articulatory control, sensitivity to syntactic structure — but the content of what language transmits is culturally inherited. A story, a ritual, a legal code, a religious belief is not passed from parent to child through DNA; it is transmitted through imitation, teaching, and symbolic communication, and can spread horizontally across individuals who share no kinship at all. This distinction between the vehicle (genetic) and the payload (cultural) is the heart of what Boyd and Richerson [5] termed dual inheritance theory: human evolutionary history is the story of two inheritance systems, genetic and cultural, operating simultaneously and shaping one another.
The consequences for persuasion are profound. Boyd and Richerson [5] argued that culturally transmitted traits — including systems of ritual, belief, and social norm — are subject to selection in their own right. A group that is successful because of its system of ritual experiences two simultaneous effects: by cultural evolution, its belief system spreads to neighbouring groups through imitation, prestige, or conquest; and by genetic selection, it favours individuals within the group who are constitutionally more susceptible to those beliefs:
“There is between-group selection for culturally inherited systems of belief that favour the success of groups, and there is individual selection for the genetically inherited ability to be influenced by ritual.”
— Boyd & Richerson, Culture and the Evolutionary Process [5], 1985
This idea — that political authority requires the engineering of shared belief, not merely the exercise of force — has a long history in political philosophy. Plato made it explicit in The Republic [25], arguing that the legislator’s central task is precisely this cultural management of belief:
“All he [the legislator] needs to do is to find out what belief is most beneficial to the state, and then use all the resources at his command to ensure that throughout their lives, in speech, story and song, the people all sing to the same tune.”
The candour here is remarkable: the primary instrument of social order is not law or force but the alignment of belief through narrative, music, and repetition — persuasion systematically deployed across an entire culture. Rousseau [27], writing two millennia later, placed a related insight at the heart of The Social Contract: legitimate political authority cannot rest on force alone, but requires the transformation of natural individuals into citizens who genuinely identify with the general will — a transformation that is, at its core, an act of cultural persuasion through shared institutions, rituals, and civic education.
What Boyd and Richerson formalised in evolutionary terms, Plato and Rousseau had identified through political philosophy: stable large-scale cooperation requires the active management of belief, and the most powerful tool for that management is cultural transmission — narrative, ritual, education, and the arts.
This creates a powerful positive feedback loop. Groups that develop more compelling persuasion systems — narratives that evoke strong emotion, rituals that produce collective effervescence, institutions that generate trust — tend to outcompete groups with weaker ones. Within those successful groups, individuals who are more susceptible to narrative persuasion cooperate more reliably and contribute more consistently to group success, leaving more descendants. Over many generations, this dual process would have shaped the human brain to be exquisitely receptive to story, ritual, and authority — not because susceptibility is always individually advantageous, but because groups containing susceptible individuals tended to outsurvive those that did not.
This coevolutionary logic explains something otherwise puzzling: why human beings are so readily moved by fiction. A novel’s characters do not exist; the events did not happen; yet readers experience genuine emotion, update moral intuitions, and sometimes change behaviour. This is not a cognitive failure — it is the expected output of a brain that evolved to treat culturally transmitted narratives as reliable guides to social reality.
3.9.9 Gossip, Reputation, and the Maintenance of Cooperation
The everyday form of language-as-persuasion is gossip: the informal exchange of information about the cooperative or defecting behaviour of third parties. Dunbar [9] estimated that roughly 65% of human conversation concerns social topics — who did what to whom, who can be trusted, who broke a norm. This is not idle chatter; it is the maintenance mechanism of a reputation-based cooperative system. The group’s collective persuasive acts (praise, censure, ridicule, warning) police free-riding in a manner structurally parallel to worker policing in honeybee colonies — but operating through language rather than chemical signals, and scalable to groups of any size connected by communication networks.
The ethnographic record bears this out. In small-scale forager societies, public shaming and ridicule of individuals who claim more than their share — what anthropologists call “levelling mechanisms” — are among the most frequently observed cooperative enforcement tools. The mechanism is purely linguistic: no physical coercion is required. The threat of being talked about badly is sufficient to suppress many forms of free-riding. Individuals who acquire a reputation for generosity receive more food-sharing partnerships, more alliance support, and better coalition options. The reputational currency is maintained by talk.
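The logic of reputation by talk can be sketched as a toy calculation. Everything in the example is an assumption made for illustration: a four-person band, a single free-rider D who has so far cheated only A, and the hypothetical helpers offers_received and spread_gossip. It is not drawn from any ethnographic dataset; it shows only how pooling information removes a defector's sharing partners without physical coercion.

```python
from typing import Dict, List, Set


def offers_received(members: List[str], known_cheats: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count sharing offers: each member offers to everyone they do not believe is a cheat."""
    offers = {m: 0 for m in members}
    for giver in members:
        for receiver in members:
            if receiver != giver and receiver not in known_cheats.get(giver, set()):
                offers[receiver] += 1
    return offers


def spread_gossip(members: List[str], known_cheats: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Pool what anyone knows (assumes gossip reaches every member of the band)."""
    pooled = set().union(*known_cheats.values())
    return {m: set(pooled) for m in members}


# Hypothetical band: D has cheated A, and so far only A knows it.
members = ["A", "B", "C", "D"]
private_knowledge = {"A": {"D"}}

before = offers_received(members, private_knowledge)
after = offers_received(members, spread_gossip(members, private_knowledge))
print(before["D"], after["D"])  # 2 0
```

Before anyone talks, D still receives offers from B and C; once A's report circulates, D receives none, while the cooperators' offers are unchanged.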
This extends into modern institutions in ways that are sometimes invisible because they are so familiar. Professional references are gossip formalised into a hiring institution. Yelp reviews are gossip about businesses. Academic citation patterns carry reputational information about the credibility of researchers. Social media pile-ons are the digital equivalent of public shaming around the campfire, operating at a scale that no forager band could achieve but following the same structural logic: a defector is identified, talked about widely, and excluded from future cooperative exchanges. The technology changes; the underlying mechanism of reputation management through language does not.
What changes with scale is the symmetry of the process. In a band of 150, gossip is roughly symmetrical — anyone can talk about anyone. Scale the group, add writing, then print, then broadcast, and the reach becomes radically asymmetric: a sender with access to a platform can address millions simultaneously, while no individual can mount an equivalent reputational response. The mechanism is the same as in the forager band. The structural conditions are not. That asymmetry is the starting point for everything in the rest of this book.
3.10 Reason and Ritual: Two Modes of Cooperative Persuasion
Large-scale cooperation takes two distinct forms, and placing two examples side by side makes the contrast vivid.
A termite mound (Macrotermes bellicosus) can stand four metres tall, house several million individuals, maintain internal temperature to within one degree Celsius, and circulate air through a ventilation system that rivals modern mechanical engineering. No individual termite has a blueprint for this structure. No architect reviewed the plans. The mound emerges from millions of local interactions: each termite responding to chemical and tactile signals from its immediate neighbours, with no individual possessing knowledge of the whole. This is cooperation through ritual: decentralised, unplanned, with order emerging from the accumulated effect of local signalling acts and not from any intention to produce that order.

The Large Hadron Collider at CERN tells a different story. Over ten thousand scientists and engineers from more than one hundred countries coordinated the design, construction, and operation of a 27-kilometre ring of superconducting magnets buried beneath the Franco-Swiss border. Every component was specified in advance; every interface was explicitly agreed upon; every experiment was designed, pre-registered, and reviewed by committees. Large Language Models are a comparably planned achievement: teams of thousands, enormous investment, explicit architectural decisions at every level. This is cooperation through reason: explicit argumentation, centralised planning, a detailed blueprint agreed upon before any physical construction begins.

The contrast between these two modes runs through the entire evolutionary history of cooperation. Eusocial insects cooperate almost entirely through ritual: chemical gradients, probabilistic responses to local signals, collective intelligence without any planning at any level. Human societies use both simultaneously. Gellner [15] argued that, historically, ritual has been the primary mechanism for creating large cooperative groups: communal, emotionally charged performance that stamps shared beliefs onto participants without requiring explicit argument. Reason, the explicit construction and exchange of arguments, is a more recent and more fragile achievement. It depends on specific institutional supports: writing, formal education, deliberative institutions.
Both depend on persuasion. They differ in whether the logic driving the outcome is accessible to the participants, and whether the result was intended by any of them. Language is the technology that made both possible at scale: without it, neither institutional reasoning nor shared ritual can bind strangers.
3.11 Persuasion Across Scales
Life kept finding the same solution to the same problem. Every time independent organisms needed to act as one — cells within a body, workers within a colony, hunters within a band, citizens within a state — they needed a way to make cooperative intent legible and defection costly. Each time the cooperation problem got harder, the communication system had to get more powerful. Chemicals. Dances. Grooming. Alarm calls. Coalition politics. Pantomime. Grammar. Shared fiction.
The table below traces that progression.
| Scale | Organism | Communication Channel | Key Persuasive Function | Transition Enabled |
|---|---|---|---|---|
| Cell → organism | Multicellular animals | Chemical (hormones, receptors) | Differentiation, division of labour | Multicellularity |
| Individual → colony | Eusocial insects, naked mole rats | Pheromones, mechanical signals | Recruitment, suppression of defection | Superorganism |
| Individual → group | Primates, lions, meerkats | Tactile, acoustic, gestural | Alliance building, alarm, coordination | Cooperative foraging and defence |
| Band → tribe | Early Homo | Proto-language, gesture | Reputation management, norm transmission | Coordinated hunting, territory |
| Tribe → civilisation | Homo sapiens | Symbolic language, writing | Shared fictions, institutions | States, markets, religions |
| Nation → globe | Modern humans | Print, broadcast media, internet | Mass persuasion, public opinion | Global coordination |
| Human → AI | Emerging | Large language models | Personalised persuasion at scale | To be determined |
The final row is still being written. Large Language Models represent a new kind of communicative agent: one capable of generating persuasive messages at industrial scale, tailored to individual recipients, across every channel simultaneously.
This returns us to where the chapter began: seventeen thousand years ago, in a cave in the Vézère valley, someone mixed iron oxide with animal fat and pressed a hand to the rock. The question posed at the opening was why — given that anatomically modern humans had existed for 200,000 years — the cultural explosion of symbolic art, long-distance trade, and shared ritual happened so recently and so suddenly. The answer the chapter has been building toward: not anatomy, not brain size, but the accumulation of enough shared fiction. Language gave humans displacement and propositional structure. Culture gave them the ability to build on what previous generations had worked out. At some threshold, those compounding layers produced communities that could hold the same story in their heads long enough to coordinate around it — to paint the same animals on the same walls season after season, to trade the same ochre across hundreds of kilometres, to bury their dead with the same rituals. The cave was not the cause of that threshold. It was the evidence that it had been crossed.
Whether the arrival of AI-generated persuasion at scale represents an analogous threshold, and what the consequences will be, is among the central questions this book tries to address.
The connection between language and persuasion is developed further in a later chapter, where different academic disciplines (linguistics, social psychology, economics, neuroscience) are examined for the specific mechanisms by which messages change minds. The evolutionary framing developed here provides the deepest context for why those mechanisms exist at all: they are the accumulated solutions to the problem of cooperation across scale, refined over hundreds of millions of years of selection.

