3  Persuasion as a Major Transition in Evolution

Note: Chapter Overview

This chapter situates persuasion within deep evolutionary history. The capacity to influence the beliefs and actions of others is not merely a cognitive or cultural phenomenon: the biological record shows it has repeatedly restructured life at the level of the organism, the colony, and the species. The chapter surveys the evidence from eusocial insects, cooperative mammals, and primate societies before turning to human language as the most radical persuasive technology yet evolved.

Somewhere in France, roughly 17,000 years ago, a group of people entered a cave and by torchlight painted the walls with horses, bison, and deer. They mixed pigments, coordinated effort, and returned across multiple sessions. Those images are still legible to us today. The people who made them were not the first large-brained humans: anatomically modern Homo sapiens had existed for at least 60,000 years before those paintings appeared. What changed, beginning around 40,000 years ago, was not the brain. It was something in how that brain was being used: art appeared, along with burial with grave goods, long-distance trade, and an explosion of tool types. The question this chapter tries to answer is what changed, and why it took so long.

Language is the obvious candidate — but its role only becomes legible once we understand the problem it solved. That problem is cooperation: how do independent agents, each pursuing its own interests, act as a unit? Evolution has been working on this question for hundreds of millions of years. Bacteria cooperate. Ants cooperate. Chimpanzees cooperate. Each time, the mechanism is the same at its core: one organism changes the behaviour of another through signals. Stripped of culture, rhetoric, and technology, persuasion reduces to exactly that: the use of signals to alter the behaviour of another agent. Non-human species provide the clearest view of what this must accomplish at the most basic level, before language and social convention complicate the picture.

Cave paintings at Lascaux, France, painted roughly 17,000 years ago. The artists mixed pigments, coordinated effort across sessions, and produced images that survived intact to the present day: some of the most vivid surviving evidence of the kind of shared representation that only language makes possible. Photo: Peter80 / Sailko. CC BY-SA 2.5.

The connection between brain evolution and social life sharpens the argument. Dunbar [10] measured the neocortex ratio (neocortex volume relative to the volume of the rest of the brain) across 36 primate genera and compared these ratios with typical group sizes for each species. The correlation is r = 0.77: species with proportionally more neocortex live in larger social groups. A correlation does not by itself establish causation, but because neocortex size varies heritably across primate lineages, Dunbar read the relationship as the product of selection: species that faced the cognitive demands of managing relationships in larger groups were selected for larger neocortices. On this reading, the brain expanded not primarily to handle tools or difficult terrain, but to handle other individuals.

Applying this regression to humans, Aiello and Dunbar [1] found that our neocortex ratio predicts a social group of roughly 150 people. That figure, now called Dunbar’s number, turns up with notable consistency: in hunter-gatherer band sizes, Neolithic village remains, military company structures, and people’s functional social networks today. Maintaining 150 relationships takes continuous effort. Other primates maintain their bonds through physical grooming, which occupies roughly 20 per cent of waking time and proceeds strictly one pair at a time. For a group of 150, the arithmetic breaks down: there is not enough time in the day to groom all the partners such a group requires. Language fills the gap. Vocal exchange can be directed at several people simultaneously, requires far less time per bond maintained, and can occur while doing other things. It scales where grooming cannot.
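The arithmetic behind this paragraph can be sketched in a few lines. The regression coefficients and the human neocortex ratio below are the values commonly quoted from Dunbar's primate regression; they are not stated in this chapter and should be treated as assumptions, with the ratio taken as neocortex volume over the volume of the rest of the brain. The grooming function is cruder still: it assumes, purely for illustration, that a group of 50 needs 20 per cent of waking time and that the requirement grows in proportion to group size.

```python
import math

# Assumed coefficients for the log-log regression of group size on
# neocortex ratio (commonly quoted values, not taken from this chapter).
INTERCEPT = 0.093
SLOPE = 3.389
HUMAN_NEOCORTEX_RATIO = 4.1  # assumed value for Homo sapiens


def predicted_group_size(neocortex_ratio: float) -> float:
    """Predicted mean group size from the assumed regression line."""
    log_size = INTERCEPT + SLOPE * math.log10(neocortex_ratio)
    return 10 ** log_size


def grooming_share(group_size: float,
                   reference_size: float = 50,
                   reference_share: float = 0.20) -> float:
    """Share of waking time spent grooming, if the requirement scales
    linearly with group size (an illustrative assumption only)."""
    return reference_share * group_size / reference_size


human_size = predicted_group_size(HUMAN_NEOCORTEX_RATIO)
human_grooming = grooming_share(human_size)
```

Under these assumptions `human_size` comes out a little under 150, and `human_grooming` is well over half of waking time, which is the sense in which the arithmetic of one-to-one grooming breaks down at human group sizes.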

3.1 Persuasion, Communication, and Coordination

Three concepts recur throughout this chapter and are often conflated in ways that obscure the biology.

Communication is the broader category: any process in which a signal produced by one agent reaches another and influences that agent’s state. A pheromone trail laid by an ant is communication. So are a waggle dance, a dominance display, and a tweet. What distinguishes communication from simple causation is that the signal is functional: it exists because receivers respond to it in ways that have historically benefited the sender or the relationship between them.

Persuasion is a subset of communication in which the signal aims to alter the receiver’s beliefs, preferences, or actions. Not all communication is persuasive in this sense. A meerkat’s alarm call does not try to convince companions that a predator is present; it triggers a hardwired escape response. The call is communication, and it coordinates behaviour, but there is no inference happening at the receiver end that could be redirected or resisted. Persuasion enters when the receiver has options and the signal is designed to push among them — when, in other words, the receiver could be unpersuaded. By this criterion, human language is the most powerful persuasion technology in the biological record, because it can address receivers who have not merely escape-or-stay responses but entire systems of beliefs and preferences that can be updated by argument, narrative, or social pressure.

Coordination is what persuasion and communication make possible at the group level. Coordination is the state in which multiple agents act in ways that are mutually consistent and collectively beneficial, not because they share an instinct, but because each has been influenced to expect and rely on the behaviour of the others. Cooperation requires coordination, and coordination is the more demanding of the two: it requires not just shared goals but shared models of what each party will do. Language is what made coordination possible beyond the limits of direct observation and shared routine, because language can represent plans, obligations, and the expected behaviour of parties who are not present.

The three concepts stack: communication is the mechanism, persuasion is its influence-directed form, and coordination is the social outcome it enables. What this chapter traces is how evolution built progressively more powerful communication systems, each capable of sustaining coordination at a larger scale, until language made coordination among millions of strangers not just possible but routine.

3.2 The Major Transitions Framework

Life has repeatedly solved a particular problem: how to get formerly independent entities to work as a single unit. Each time this happened, evolution did not simply add new behaviours to an existing system. It created a new level of organisation. Genes that once competed for replication started cooperating inside chromosomes. Cells that once competed for nutrients started cooperating inside bodies. Organisms that once competed for territory started cooperating inside colonies. At every step, the unit of evolutionary competition shifted upward: what was previously the arena of conflict became the platform for a higher-level competitor.

Maynard Smith and Szathmáry [22] catalogued eight of these moments and named them major transitions. Starting from the origin of self-replicating molecules in compartments, the sequence runs through the emergence of chromosomes (coalitions of cooperating genes held together by shared replication machinery), the origin of the DNA–protein translation system (which locked in the separation of heritable information from functional machinery), the emergence of eukaryotic cells (formerly independent bacteria now cooperating as organelles), sexual reproduction (which reshuffles genetic information between lineages), the origin of multicellular organisms (formerly independent cells now cooperating as tissues and organs), the emergence of eusocial insect colonies (colonies so integrated that most individuals forgo reproduction entirely), and finally the emergence of human language societies. At each step, lower-level entities gave up some autonomy to become part of a more powerful collective. And at each step, the transition required a mechanism for suppressing defection by the lower-level units: a chromosome holds genes in line because they share a single replication event; a eusocial colony keeps workers in line through pheromones and policing. Communication is the glue that holds each level together.

Queen pheromones in eusocial insect colonies illustrate the logic, on a timescale of weeks rather than billions of years. A honeybee queen produces a blend of fatty acids and aromatics, the queen substance, that is distributed through the colony by physical contact and grooming. Workers who receive this signal have their ovarian development chemically suppressed. They do not reproduce. They forage, build comb, and die at the entrance defending the hive. From the worker’s perspective this looks like a catastrophic sacrifice of individual fitness, and from the gene’s perspective it sometimes is. What makes it evolutionarily stable is that the workers share most of their genes with the queen’s offspring and with each other: the queen’s argument is not dishonest. It says, in chemical terms, “your genes propagate more effectively through this queen than through your own eggs,” and the arithmetic of haplodiploidy makes that claim true. The pheromone is not a trick, and workers are built to respond accordingly.
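The "arithmetic of haplodiploidy" can be made explicit. The sketch below computes the expected relatedness between two sisters when males are haploid and females diploid, and compares it with a female's relatedness to her own offspring. The function name and the `same_father` switch are illustrative choices, and the three-quarters figure holds only under the assumption that the sisters share a father.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# A diploid mother passes half of her genome to each offspring.
OFFSPRING_RELATEDNESS = HALF


def sister_relatedness(same_father: bool = True) -> Fraction:
    """Expected relatedness between two sisters under haplodiploidy."""
    # Half of a female's genome is maternal; a sister carries any given
    # maternal allele with probability 1/2.
    maternal_part = HALF * HALF
    # The other half is paternal. A haploid father gives his entire
    # genome to every daughter, so full sisters share all of it.
    # Sisters with different (unrelated) fathers share none of it.
    paternal_part = HALF * (1 if same_father else 0)
    return maternal_part + paternal_part


full_sisters = sister_relatedness(same_father=True)
half_sisters = sister_relatedness(same_father=False)
```

Here `full_sisters` is 3/4, which exceeds the 1/2 a worker would share with her own eggs; `half_sisters` is 1/4, so the strength of the queen's "claim" depends on how many fathers the workers have.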

The human language transition followed the same pattern, but the communication system that enabled it was orders of magnitude more flexible than any of these predecessors. Grooming could maintain alliances in groups of 50. Alarm calls could coordinate vigilance across a troop. What language made possible was the transmission of cooperative obligations across time, distance, and individuals who had never met. A promise made today can bind behaviour next month. A reputation formed over years in one location travels to the next. A norm that emerged in one generation can be taught explicitly to the next, along with the reasons for it. None of the earlier transitions created a communication system capable of that. The jump from grooming to language was not merely a quantitative increase in signal complexity. It was a change in what signals could be about.

Chemical gradients push cells to differentiate rather than proliferate. Queen pheromones persuade workers to suppress their own ovarian development. Dominance displays persuade subordinates to defer rather than contest. Before language, these systems were innate and inflexible: the signal triggered a fixed response, and neither sender nor receiver had any capacity to revise the exchange. Language changed this entirely. Flexible, explicit persuasion became possible, persuasion capable of representing futures and obligations, of proposing hypothetical arrangements, of invoking norms that do not yet exist in the immediate environment.

The framework makes a prediction. Each time cooperation scaled up, a new communication technology enabled it. The communication technology was not merely a consequence of the new cooperative arrangement; it was its precondition. The rest of this chapter examines those technologies in order of complexity, beginning with the chemical and mechanical signals of insect colonies and ending with the recursive grammar that allowed human societies to reach scales no other primate has approached.

3.3 From Genes to Culture: Bio-cultural Co-evolution

Each of those transitions shared a common feature: the communication system that enabled it co-evolved with the biological machinery that executed it. In humans, this co-evolution took an unusual form, one in which culture itself became a selective pressure on the genome.

This account situates the evolution of language within a broader bio-cultural co-evolution. The genetic capacity for language (specialised cortical regions, precise articulatory control, sensitivity to syntactic structure) was shaped by selection pressures that were themselves cultural: the demands of hunting coordination, pantomimic storytelling, and finally conversational argumentation. Each cultural advance in communication created new selection pressure on the biological substrate; each biological advance in communicative capacity opened new cultural possibilities. The result, visible in the archaeological record at 40,000 years ago, was a communication system of unprecedented power, one that could coordinate the construction not just of hunting parties but of shared fictions, institutions, and normative systems.

The mechanism connecting cultural practice to genetic change is sometimes called the Baldwin effect. When a behaviour is culturally learned and repeatedly beneficial, individuals who learn it faster or more reliably have social advantages: they coordinate better, secure more alliance partners, and survive more successfully. Over many generations, this selective advantage favours genetic variants that facilitate the learning. Applied to language: children who could acquire grammar more efficiently were better cooperators, and so better survivors. Over time, the biological machinery for grammar acquisition (the neural architecture of Broca’s and Wernicke’s areas, the sensitivity to phonemic contrasts, the timing of the critical period) became more precisely tuned. The innate grammar-acquisition device was itself shaped by the cultural practice of using language. There is no clean separation between “innate” and “learned” here: the innate is what the learned, repeated over thousands of generations, selected into the genome.

The division of labour between genome and culture follows from this logic. As Tooby and Cosmides [33] argued, the genome may as well store the vocabulary in the “cultural environment”: the words themselves are learned from the surrounding community, not inherited through DNA. This is precisely the right division of labour given that cultural evolution is far faster than genetic evolution. The grammar provides a stable generative engine; the vocabulary, which must track cultural innovations such as new tools, new social roles, new institutions, can evolve and diversify at the speed of culture.

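The archaeological record, as noted above, dates the decisive outcome of these co-evolutionary pressures to roughly 40,000 years ago. To see what that outcome added, it helps to look first at the signalling systems that preceded it.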

3.4 Persuasion in Non-Human Animals

A forager bee returns to the hive after finding a rich patch of flowers two kilometres to the north-east. She does not report her discovery; she performs it. On the vertical surface of the comb she runs a waggle dance: a figure-eight pattern in which the angle from vertical encodes the bearing from the sun, and the duration of the waggle run encodes the distance. Within minutes, hundreds of nestmates that have never visited the patch take off and fly directly to it. Nothing was said. No map was drawn. The information was transmitted through movement, and it worked.

Non-human animals show what persuasion looks like when culture has not yet touched it. In humans, almost every act of influence is wrapped in learned convention: language, etiquette, institutional role, narrative tradition. Only when all of that is stripped away does the underlying structure become visible. Other species never developed those wrappings, so the biological core of persuasion is exposed and readable.

Three things become clear from studying it.

First, persuasion is not cultural in origin. The waggle dance predates the first hominid by millions of years. Alarm calls, pheromone trails, dominance displays: these are biological solutions to the cooperation problem, products of natural selection operating without any cultural mediation. Whatever is universal in human persuasion, whatever remains when convention is removed, is a variation on machinery that was already ancient.

Second, each non-human system reveals a structural constraint that language later transcended. The waggle dance is extraordinarily precise about location but cannot refer to time: there is no waggle dance for “yesterday’s flowers” or “the patch that will bloom next week.” Queen pheromones suppress worker reproduction but cannot negotiate exceptions or propose novel arrangements. Every non-human system is frozen at a specific design solution, capable of solving one class of coordination problem and nothing else. Language, by contrast, is not frozen. It can represent anything. Seeing the constraints of non-language systems is the clearest way to understand what language added.

Third, each case is a natural experiment. Eusociality has evolved independently in insects, naked mole rats, and several spider species; coalition politics has evolved independently in several primate lineages; graded alarm calls have evolved independently in birds, non-primate mammals, and primates. These convergences reveal the recurrent logic of cooperation under selection. When different lineages facing similar problems arrive at similar solutions, the solution is probably close to optimal for those conditions.

The cases below trace how evolution built, over more than a hundred million years, increasingly sophisticated persuasion machinery, beginning with insect colonies, moving through the coalition politics of chimpanzees and baboons, and ending with the alarm calls of vervet monkeys.

3.4.1 Eusocial Insects: Division of Labour at the Extreme

The most dramatic example of persuasion enabling a major evolutionary transition is eusociality: the organisation of insect societies such as honeybees (Apis mellifera), leafcutter ants (Atta), and termites (Macrotermes) into colonies in which most individuals forgo personal reproduction entirely in order to serve the collective. This is, by any measure, one of the most radical subordinations of individual interest in the history of life.

3.4.1.1 Honeybees: Democratic Persuasion at the Hive

Honeybee colonies achieve coordinated collective action through a rich communication system. Seeley [29] documented at least seventeen distinct signal types exchanged between colony members, chemical and mechanical, covering brood recognition, queen presence, aggression, and forager recruitment. Mapping these signals onto the communication framework: the sender is an individual worker or the queen; the channel is chemical (pheromones diffused through the colony air, or deposited on surfaces) or mechanical (vibrations transmitted through the comb); the receiver is a nestmate; the message encodes a specific directive; and the effect is a measurable change in the receiver’s behaviour.
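The five-part mapping used here (sender, channel, receiver, message, effect) recurs for every species in this chapter, so it may help to see it written as a small record type. The sketch below is only an illustration of that framework: the class name, the two example entries, and the `by_channel` helper are hypothetical, and the field values paraphrase the text rather than reproduce Seeley's catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One signal described in the chapter's five-part framework."""
    sender: str
    channel: str
    receiver: str
    message: str
    effect: str


WAGGLE_DANCE = Signal(
    sender="returning forager",
    channel="mechanical (movement and vibration on the comb)",
    receiver="nestmate",
    message="bearing and distance of a food source",
    effect="nestmate flies to the advertised patch",
)

QUEEN_PHEROMONE = Signal(
    sender="queen",
    channel="chemical (spread by contact and grooming)",
    receiver="worker",
    message="a laying queen is present",
    effect="worker ovarian development is suppressed",
)


def by_channel(signals, keyword):
    """Return the signals whose channel description mentions keyword."""
    return [s for s in signals if keyword in s.channel]


chemical_signals = by_channel([WAGGLE_DANCE, QUEEN_PHEROMONE], "chemical")
```

Writing the cases this way makes the later comparisons mechanical: ant trails, spider webs, grooming, and alarm calls differ in the `channel` and `message` fields while the overall shape of the record stays the same.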

The celebrated waggle dance, first decoded by Karl von Frisch [13], illustrates the precision this system achieves. A forager returning to the hive performs a figure-eight dance on the vertical face of the comb. The angle of the waggle run from vertical encodes the azimuthal angle of the food source from the sun; the duration of the waggle run encodes distance. The forager does not merely signal “food exists”: it delivers a spatial argument that persuades nestmates to fly to a specific location they have never visited. The channel is entirely mechanical, wing vibrations and body movements, yet the information bandwidth is sufficient to coordinate thousands of individuals across distances of several kilometres. The direction of the food source is conveyed with errors under three degrees; the distance with errors of roughly ten per cent. At two kilometres, those figures place a recruit within a hundred metres or so of the true bearing line and within a few hundred metres of the true distance: close enough to find a patch of flowers she has never visited.
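The encoding described above is simple enough to write down as a decoder. The sketch below assumes the usual convention that a run pointing straight up the comb means "fly toward the sun", with angles measured clockwise. The calibration constant that converts waggle duration to distance is hypothetical: the text gives no figure, and the value here is chosen only to keep the arithmetic easy.

```python
import math

# Hypothetical calibration: metres of flight per second of waggle run.
METRES_PER_SECOND_OF_WAGGLE = 1000.0


def decode_waggle(angle_from_vertical_deg: float,
                  waggle_duration_s: float,
                  sun_azimuth_deg: float,
                  metres_per_second: float = METRES_PER_SECOND_OF_WAGGLE):
    """Return (compass bearing in degrees, distance in metres)."""
    # The angle of the run from vertical equals the angle of the food
    # source from the sun's azimuth, both measured clockwise.
    bearing = (sun_azimuth_deg + angle_from_vertical_deg) % 360.0
    distance = waggle_duration_s * metres_per_second
    return bearing, distance


def lateral_error_m(distance_m: float, angular_error_deg: float) -> float:
    """Sideways miss distance produced by a given bearing error."""
    return distance_m * math.tan(math.radians(angular_error_deg))


# Sun due south (azimuth 180); the run points 135 degrees
# anticlockwise of vertical and lasts two seconds.
bearing, distance = decode_waggle(-135.0, 2.0, sun_azimuth_deg=180.0)
miss = lateral_error_m(distance, 3.0)
```

With these inputs the decoded bearing is 45 degrees (north-east) and, under the assumed calibration, the distance is 2,000 metres. A three-degree bearing error at that range corresponds to a sideways miss of roughly 105 metres.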

What makes the waggle dance a form of persuasion rather than mere stimulus-response signalling is that receiving bees actively evaluate competing dances. When multiple foragers return simultaneously with different routes to different food sources, bees compare the dances, sometimes following one and then another before committing. The colony’s decision about which site to exploit emerges from receivers exercising something that resembles choice among competing advocates, not from a signal that triggers a fixed response.

More remarkable still is the colony’s mechanism for choosing a new nest site. When a swarm must relocate, scouts inspect candidate sites independently and return to advocate for their favoured location through competitive recruitment dances, each scout “arguing” for its site by dancing more vigorously the better the site. Uncommitted scouts that follow a dance inspect the advertised site for themselves and, if convinced, begin dancing for it in turn. Seeley [28] showed that this process reliably converges on the best available site even when no individual scout has visited more than one option. The colony’s decision emerges from a process structurally analogous to human deliberative democracy: decentralised individual advocacy, independent inspection of the advertised alternatives, and quorum sensing to trigger the collective decision.
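A toy model shows how quality-weighted advocacy plus a quorum can pick the best site without any scout comparing options. This is a deliberately simplified sketch, not Seeley's model: the function name, the site names, and all rate parameters are hypothetical, and scouts are tracked as continuous quantities rather than individuals.

```python
def choose_nest_site(qualities, scouts=100, quorum=30,
                     recruit_rate=0.6, quit_rate=0.1, max_rounds=200):
    """Return (chosen site, round) or (None, max_rounds) if no quorum.

    qualities maps site name -> quality in (0, 1]. Each site starts
    with one discoverer; recruitment to a site is proportional to its
    current supporters times its quality (dance vigour), and supporters
    lose interest at a fixed rate.
    """
    committed = {site: 1.0 for site in qualities}
    for round_no in range(1, max_rounds + 1):
        uncommitted = scouts - sum(committed.values())
        for site, quality in qualities.items():
            recruits = (recruit_rate * quality * committed[site]
                        * uncommitted / scouts)
            dropouts = quit_rate * committed[site]
            committed[site] += recruits - dropouts
        leader = max(committed, key=committed.get)
        if committed[leader] >= quorum:
            return leader, round_no
    return None, max_rounds


sites = {"hollow oak": 0.9, "chimney": 0.5, "thin-walled box": 0.3}
winner, rounds_taken = choose_nest_site(sites)
```

In this run the highest-quality site reaches the quorum first, because its supporters recruit faster in every round. If every site is poor enough that supporters drop out faster than they recruit, no quorum forms and the function returns `None`.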

The stability of the colony also depends on the continuous suppression of individual reproductive ambition. Workers are capable of laying unfertilised (male) eggs, but they rarely do so because worker policing is enforced by nestmates who identify and destroy worker-laid eggs using chemical recognition cues [26]. The queen’s mandibular pheromones simultaneously suppress worker ovarian development and maintain colony cohesion: continuous chemical persuasion, sustaining the colony’s cooperative structure moment to moment.

3.4.1.2 Ants: Distributed Intelligence Through Chemical Language

Ants (Formicidae), which appear to have inherited eusociality from a single common ancestor, have colonised virtually every terrestrial habitat on Earth. Leafcutter ants (Atta cephalotes) lay pheromone trails from nest to food source that encode not only direction but, through the concentration and composition of the trail, information about food quality and quantity [36]. The channel here is purely chemical; the message is graded in intensity (more pheromone = better food); the effect is differential recruitment proportional to food value. When a source is depleted, returning ants deposit less pheromone, allowing the trail to dissipate, an automatic down-regulation of recruitment that prevents the colony from committing further effort to an exhausted resource. No individual makes this decision; it emerges from the aggregate of local persuasive acts.
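The self-regulating behaviour of a trail follows from two local rules: returning ants deposit pheromone in proportion to the food they found, and the pheromone evaporates at a steady rate. The sketch below illustrates that logic only; the function name, the deposit constant, and the evaporation rate are hypothetical values, not measurements from leafcutter colonies.

```python
def simulate_trail(food_per_step, deposit_per_unit=1.0, evaporation=0.2):
    """Return the pheromone level on the trail after each time step.

    food_per_step lists how much food returning ants found at each
    step. A fixed fraction of the pheromone evaporates every step, and
    fresh deposits are proportional to the food found.
    """
    level = 0.0
    history = []
    for food in food_per_step:
        level = level * (1.0 - evaporation) + deposit_per_unit * food
        history.append(level)
    return history


# Ten steps with a rich source, then ten steps after it is exhausted.
rich_then_empty = [1.0] * 10 + [0.0] * 10
trail = simulate_trail(rich_then_empty)
peak = trail[9]
final = trail[-1]
```

While food lasts, the level climbs toward a plateau set by the ratio of deposit to evaporation. Once deposits stop, it decays geometrically, so recruitment along the trail winds down without any ant deciding that it should.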

Army ants (Eciton burchellii) provide a further example of persuasion without a central planner. They build living bridges from their own bodies to span gaps in the forest floor. Each ant in the bridge responds to local cues: whether nestmates are walking over it, and the load it feels through the bodies of its neighbours. As traffic increases, the bridge widens; as traffic slows, the bridge begins to dissolve. No individual plans the bridge or decides when to dismantle it. It is a standing wave of behaviour maintained entirely by local signals, collectively intelligent without any individual intelligence directing it.

3.4.1.3 Beyond Insects: Naked Mole Rats and Eusocial Spiders

For a long time eusociality was considered an insect phenomenon, explicable by the peculiarities of Hymenopteran genetics. Then in 1981 Jennifer Jarvis [20] published her observations of naked mole rats (Heterocephalus glaber) in East African burrow systems, and that assumption collapsed. Colonies of up to 300 individuals, a single breeding queen, all other females reproductively suppressed. The queen’s mechanism is partly physical: she shoves and jostles non-reproductive females repeatedly and persistently. But the shoving works through biochemistry. The pressure of the queen’s presence suppresses the LH surges that would otherwise trigger ovulation in her subordinates. It is persuasion operating at the level of neuroendocrinology: the signal is tactile and olfactory, the effect is a sustained alteration of reproductive physiology that can persist for years. Unlike in bees and ants, haplodiploidy plays no role. Naked mole rats are diploid. What drives their eusociality instead is extreme inbreeding, which produces high genetic relatedness across the colony, combined with the demands of cooperative burrowing through hard, dry, resource-scarce soil that no individual could navigate alone.

A naked mole rat (Heterocephalus glaber), the first mammal confirmed as eusocial, living in colonies of up to 300 individuals dominated by a single breeding queen. Unlike eusociality in bees and ants, the naked mole rat’s cooperativity cannot be explained by haplodiploidy; extreme inbreeding produces the high relatedness instead, and the ecology of underground burrowing supplies the need for cooperation. Photo: VigilancePrime / Wikimedia Commons. CC BY-SA 3.0.

The web-building spider Anelosimus eximius in tropical South America reached a similar cooperative arrangement by a completely different route [36]. Thousands of individuals share a single communal web, cooperate in prey capture, and defend the structure collectively. None of them is a queen; there is no reproductive suppression through chemical signals. What binds them is the web itself. When a prey item struggles against the silk, the vibration propagates through the shared structure, recruiting nearby spiders toward the source. The channel is the web; the message is the vibration pattern; and the effect is coordinated group action achieved without any individual directing it. Evolution arrived at collective behaviour through a completely different physical medium, which underlines the point: the specific mechanism of persuasion matters less than the function it serves. Any reliable signal that recruits cooperative action at the right time can become the basis for a society.

3.4.2 Primate Social Persuasion: Grooming, Coalitions, and Politics

Among mammals, the primates provide the richest evidence for sophisticated social persuasion — communication aimed not merely at coordinating immediate behaviour but at managing long-term relationships and social standing.

3.4.2.1 Chimpanzees: The Emergence of Political Persuasion

De Waal’s [35] detailed observations at Arnhem Zoo remain the canonical account of chimpanzee social politics. Male chimpanzees compete for alpha status not primarily through brute force, because an outright fight risks injury to both parties, but through a sustained process of coalition building that is recognisably persuasive. An ambitious male will systematically groom potential allies, sharing food and providing social support in return for backing in future confrontations. The grooming sessions function as negotiations: the sender offers a costly signal (time, grooming effort, food) that communicates commitment and creates a social debt. The receiver is expected to reciprocate support in future conflicts. The channel is tactile (grooming) and gestural; the message is alliance offer; the effect is the creation of a coalition that shifts the balance of power.

A chimpanzee displaying the contemplative posture associated with social assessment — scanning the group, evaluating alliances, tracking who groomed whom. De Waal documented that alpha status in chimpanzee groups depends not on physical dominance alone but on sustained coalition management: who is owed a favour, who might defect, who needs to be appeased. Photo: Thomas Lersch. CC BY-SA 3.0.

De Waal documented that the most successful alpha males at Arnhem were not the largest or most aggressive, but those best at coalition management: grooming many partners, being reliably supportive to allies, and repairing relationships quickly after conflicts. Two males who had fought were often observed, within an hour, presenting themselves to each other for reconciliation. Uninvolved third parties would also approach the loser of a fight and groom or embrace them, a behaviour de Waal interpreted as post-conflict consolation. The consoler needs to recognise the loser’s distress and respond to it, which requires something like empathy, the capacity to model another individual’s emotional state and be moved to act on that model.

Chimpanzees also demonstrate strategic deception, a form of persuasion that involves deliberately misrepresenting information to influence a receiver’s behaviour. Males hide erections when approaching dominant individuals to avoid aggression, and individuals have been observed leading competitors away from food sources they have located, then circling back alone. Byrne and Whiten [6] catalogued dozens of such cases and proposed the “Machiavellian intelligence” hypothesis: that the demands of managing complex social relationships, tracking alliances, reading intentions, deceiving rivals, building coalitions, drove the expansion of primate neocortex. Social intelligence, in this view, is persuasive intelligence.

3.4.2.2 Baboons and Cooperative Sentinels

Baboons (Papio spp.) organise in large troops of 20 to over 100 individuals with multi-level dominance hierarchies that are reproduced daily through communication rather than continuous physical contest. Subordinates signal submission through vocalisation, postural appeasement displays, and grooming directed upward in the hierarchy; dominants signal status through gait, gaze, and priority-of-access behaviours. The social order is reproduced through the ongoing persuasive acts of its participants, not merely imposed by force.

The system goes considerably further than status display. Males use specific vocalisations, “grunts” directed at females before approaching them or their infants. The probability that a female will tolerate an approach is measurably higher after such grunts than in their absence, and the effect is stronger for males with a consistent history of non-aggressive behaviour toward that female. This is the full signal-model-decision sequence in operation: the receiver tracks the sender’s past behaviour, updates her model of his intentions, and regulates her response accordingly.

Females also engage in strategic alliance formation across matrilineal kin groups. A female facing a rival will selectively groom a high-ranking male ally in the days preceding likely conflict, then benefit from his proximity during the confrontation itself. The investment in grooming functions as a deposit in a social account, and the subsequent conflict is the withdrawal. Tracking these relationships across dozens of individuals, over weeks and months, requires a social memory that begins to look less like the fixed-response systems of insects and more like genuine political cognition.

Cooperative sentinel behaviour in meerkats (Suricata suricatta), studied extensively by Clutton-Brock and colleagues [7], illustrates how honest alarm signalling can evolve in the absence of close kinship. Sentinel individuals at elevated positions produce graded alarm calls: the call type encodes both the type of predator and the level of urgency, allowing group members to calibrate their response. The sender (sentinel) invests in costly vigilance; the message (alarm call) is graded and specific; the channel is acoustic; the effect is coordinated predator avoidance across the group. The system is maintained by reputation: sentinels who produce false alarms are less likely to receive reciprocal sentinel behaviour from groupmates.

3.4.2.3 Lions: Cooperative Hunting as Persuasion in Action

Lion (Panthera leo) cooperative hunting provides an example in a non-primate predator. Stander [31] documented that lionesses take on specialised roles in group hunts: “wings” flank the prey while “centres” drive it, with the role division emerging from the spatial positions individuals adopt before the hunt begins. No explicit negotiation is observed; the channel is purely visual (positional and postural signals); the message is the individual’s chosen role; the effect is coordinated group action that achieves prey capture that no individual could accomplish alone.

This coordination is accomplished without anything resembling symbolic communication. The information necessary to divide labour and time attacks is transmitted entirely through movement and position in space. The channel and message complexity need not be high to produce sophisticated collective outcomes. What matters is the reliability of the signal-response relationship.

An Asiatic lion pride resting together, illustrating the social coordination of cooperative hunters.

An Asiatic lion pride resting in Gir, India. The cooperative hunting strategies documented by Stander [31] in African lionesses depend on individuals reading and responding to each other’s spatial positions before a hunt begins. No vocalisation is required; position is the signal. Photo: Rutvijsinh1991 / Wikimedia Commons. CC BY-SA 4.0.

3.4.3 Semantic Signals: The Vervet Monkey

The examples above (chemical trails, waggle dances, alarm calls, cooperative hunts) all involve signals that influence behaviour. But are any of them semantic in the sense that a word is semantic: referring to a specific object or category in the world, independently of the immediate context? This question is sharpest in the case of the East African vervet monkey (Chlorocebus pygerythrus).

Vervets produce three acoustically distinct alarm calls, one for each of their major predators: a call for pythons, a call for martial eagles, and a call for leopards. Each call triggers a predator-appropriate response in listening monkeys. The leopard call causes them to climb higher into a tree (useless against an eagle); the eagle call causes them to look upward and move into dense undergrowth (useless against a leopard); the snake call causes them to stand upright and scan the ground. The responses are specific to the call, not merely to some general level of danger.

One interpretation is that the calls convey only degree of danger, and that the listening monkey looks around, identifies the predator itself, and responds accordingly. Seyfarth, Cheney and Marler [30] ruled out this explanation by playing recorded calls to monkeys in the absence of any actual predator. The monkeys still responded in the predator-appropriate way, climbing, scanning upward, or standing to scan the ground, demonstrating that the call itself, not the predator, drives the response.

The acid test for semantic representation was a habituation study. If an alarm signal is repeatedly not followed by an objective threat, animals cease to react to it: they habituate. Vervets have two other calls, wrr and chutter, both used to signal the presence of a neighbouring group of monkeys. If a vervet is habituated to the wrr call, does that habituation transfer to the chutter call? If it does, the animal has coded both calls as referring to the same category of event (another group of monkeys), which is a hallmark of semantic representation.

Habituation does transfer between wrr and chutter, but only when both calls come from the same individual, not from different monkeys. Vervets can identify individuals by voice; what appears to be semantic transfer may partly be individual-recognition transfer. Seyfarth and Cheney concluded that the calls do function as semantic representations, but that the system is entangled with individual identity in a way that human language is not.

What, then, are the limits of this system? They are severe, and illuminating by contrast. Vervets do not use calls to refer to objects that are not present: a vervet will not produce the python call to warn a companion about a snake seen yesterday, or at a distant location. The calls are anchored in the here and now; they lack displacement, one of the most important properties that Hockett [19] identified as distinguishing human language from animal communication systems. Vervets also use alarm calls as a means of deception, producing a false call to drive competitors away from a food item, but this deception is conspicuously crude: the deceiving monkey shows no sign of alarm, and bystanders can often see exactly what it is doing. Animals have calls only for objects of immediate biological interest: predators, rivals, food. There is no vervet call for yesterday, or probably, or if you help me now I will help you later. A potentially complete representation system, one capable of referring to anything including hypotheticals and abstractions, was the decisive step. That is the step language made. The next section asks why such a system would evolve at all, given that producing signals that benefit others looks, at first glance, like a losing strategy.

3.4.4 The Prisoner’s Dilemma and Why Cooperative Persuasion Evolves

The vervet case raises a theoretical puzzle that underlies all the examples above. Why would any individual evolve to produce honest signals that benefit others, and why would natural selection produce organisms that are persuadable, responding to the signals of others in ways that may benefit the sender? If cooperation is individually costly, why does it not collapse under defection?

The Prisoner’s Dilemma (PD) formalises the problem. Two individuals can each choose to cooperate (C) or defect (D). If both cooperate, each receives a reward R. If one defects while the other cooperates, the defector receives the temptation payoff T (highest), while the cooperator receives the sucker’s payoff S (lowest). If both defect, each receives the punishment payoff P. The payoff ordering T > R > P > S means that regardless of what the other player does, defection yields a higher immediate payoff. In a single interaction, rational actors defect, and cooperation collapses.
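The dominance argument can be checked mechanically. The following is a minimal sketch in Python, assuming the illustrative payoff values T = 5, R = 3, P = 1, S = 0; these numbers are a common textbook choice, not values given in this chapter, and any values satisfying T > R > P > S would behave the same way. The names `PAYOFF`, `best_reply`, and `defection_dominates` are hypothetical.

```python
# Illustrative payoffs (assumed values; any numbers with T > R > P > S work).
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_move, other_move)] -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): R,  # mutual cooperation: reward
    ("C", "D"): S,  # I cooperate, other defects: sucker's payoff
    ("D", "C"): T,  # I defect, other cooperates: temptation
    ("D", "D"): P,  # mutual defection: punishment
}


def best_reply(other_move: str) -> str:
    """Return the move that maximises my one-shot payoff against other_move."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, other_move)])


def defection_dominates() -> bool:
    """True if defecting is the best reply whatever the other player does."""
    return all(best_reply(other) == "D" for other in "CD")


if __name__ == "__main__":
    print("Defection dominates:", defection_dominates())
    print("Mutual cooperation beats mutual defection:",
          PAYOFF[("C", "C")] > PAYOFF[("D", "D")])
```

Both checks hold at once, which is the dilemma: each player's best reply is to defect, yet both would do better under mutual cooperation.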

The iterated Prisoner’s Dilemma changes this dramatically. When the same individuals interact repeatedly, the prospect of future payoffs can outweigh the present gain from defection. Axelrod [2] ran computer tournaments in which strategies submitted by game theorists competed in iterated PD. The winner, across two tournaments, was the simplest strategy entered: tit-for-tat. Cooperate on the first move, then do whatever the other player did last round. Tit-for-tat is nice (it never defects first), retaliatory (it punishes defection immediately), forgiving (it returns to cooperation as soon as the other player does), and transparent (the other player can easily learn its rule). These properties, Axelrod showed, are exactly what is needed to sustain cooperation in populations of self-interested agents.
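The effect of repetition can be illustrated with a small simulation. This is a sketch only, not a reconstruction of Axelrod's tournament: it assumes the illustrative payoffs T = 5, R = 3, P = 1, S = 0, a fixed ten-round match, and just two strategies. The function names are hypothetical.

```python
# Assumed illustrative payoffs: (payoff to A, payoff to B) for each move pair.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the other player's previous move."""
    return "C" if not other_history else other_history[-1]


def always_defect(own_history, other_history):
    """Defect on every round regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated PD and return the total scores (a, b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print("TFT vs TFT:  ", play(tit_for_tat, tit_for_tat))
    print("TFT vs AllD: ", play(tit_for_tat, always_defect))
    print("AllD vs AllD:", play(always_defect, always_defect))
```

Under these assumptions the defector wins its head-to-head match against tit-for-tat by a small margin, but two tit-for-tat players earn three times what two defectors earn from each other. Tit-for-tat never beats its partner; it does well because it elicits cooperation.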

The evolutionary implication is deep: in any species with repeated interactions and the ability to recognise individuals, the incentive structure shifts towards cooperative signalling. Persuasion, the use of signals to alter another’s behaviour in ways that benefit the sender, becomes evolutionarily stable when receiver and sender interact repeatedly, because receivers who respond to honest signals and withdraw from exploitative relationships outcompete those who do not. Nowak [23] synthesised the evolutionary routes to cooperation (kin selection, direct reciprocity, indirect reciprocity, network reciprocity, and group selection). Each can be read as depending on a signalling system that makes cooperative intent legible and defection costly.

This is the deep evolutionary reason why the social world is saturated with persuasion: honest communication is a stable equilibrium in populations of agents who interact repeatedly and track reputations. The evolution of language in humans, examined in the next section, built on this foundation.

The logic plays out in ways that are easy to observe. In humans, the split-or-steal format of televised game shows provides a near-ideal natural experiment: two strangers face a single, one-shot PD-like choice, with large sums of money at stake, no future interaction, and a live audience. Many players defect. But occasionally a player reframes the game entirely, announcing in advance that they will always steal and promising to share the winnings afterward. The announcement converts a one-shot PD into a credible commitment device and demonstrates, in front of cameras, how reputation and pre-announced strategy can substitute for repeated interaction.

The same dynamic appears in non-human animals, with cooperation maintained not by language but by memory and reciprocity. Vampire bats (Desmodus rotundus) return to the roost after nightly foraging and regurgitate blood meals to roostmates who failed to feed, selectively favouring past donors and withholding from past defectors. The relationship is stable because the bats interact nightly and recognise individuals. David Attenborough’s narration of this system in The Trials of Life makes the iterated-PD structure unusually explicit for a wildlife documentary.

3.5 Kin Selection and the Logic of Cooperative Persuasion

A worker bee will sting an intruder and die doing it. A meerkat sentinel stands exposed on a rock, calling loudly, making itself conspicuous to exactly the predators it is warning others about. A Belding’s ground squirrel produces a loud alarm call when it spots a hawk, drawing the predator’s attention to itself. These are costly acts that benefit the group at the individual’s expense. Why would selection favour them? The answer begins with genetics. Hamilton [17] showed that altruistic behaviour can evolve when the recipient is a sufficiently close relative, formalised as Hamilton’s rule: rb > c, where r is genetic relatedness, b is the benefit to the recipient, and c is the cost to the actor. This elegant inequality, derived across two landmark 1964 papers, unified the previously puzzling phenomena of worker sterility, altruistic alarm calls, and cooperative breeding under a single quantitative framework.

In Hymenoptera (bees, ants, wasps), haplodiploidy, the mechanism by which females develop from fertilised eggs and males from unfertilised ones, means that full sisters share three-quarters of their genes (r = 0.75), a relatedness higher than that between a mother and her offspring (r = 0.5) [34]. The arithmetic here is worth pausing on. A worker bee who sacrifices reproduction helps raise sisters who share 75% of her genes. The inclusive fitness gain, the benefit to shared genes flowing through those sisters, can easily exceed the direct fitness cost of forgoing personal reproduction. This is one reason worker sterility evolves more readily in Hymenoptera than in diploid species: relatedness between full sisters is r = 0.75 under haplodiploidy, against r = 0.5 for full siblings in diploid organisms, and Hamilton’s rule requires rb > c. A higher r makes the left-hand side larger, so a smaller benefit b is enough to outweigh the cost of sterility.
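The arithmetic can be made concrete with a short Python sketch of Hamilton’s rule. The relatedness values 0.75 and 0.5 come from the text; the benefit and cost figures are hypothetical numbers in arbitrary fitness units, chosen only to show where the threshold falls, and the function names are illustrative.

```python
def hamilton_favours_altruism(r: float, b: float, c: float) -> bool:
    """Hamilton's rule: altruism is favoured when r * b > c."""
    return r * b > c


def break_even_benefit(r: float, c: float) -> float:
    """Smallest benefit b (exclusive) at which r * b exceeds the cost c."""
    return c / r


if __name__ == "__main__":
    cost = 1.0     # hypothetical cost to the actor, in arbitrary fitness units
    benefit = 1.5  # hypothetical benefit to the recipient, same units

    for label, r in [("haplodiploid full sisters", 0.75),
                     ("diploid full siblings", 0.5)]:
        print(label,
              "| threshold b >", round(break_even_benefit(r, cost), 3),
              "| favoured at b = 1.5:",
              hamilton_favours_altruism(r, benefit, cost))
```

With a cost of one unit, the benefit must exceed about 1.33 units at r = 0.75 but must exceed 2 units at r = 0.5. A hypothetical benefit of 1.5 units therefore satisfies the rule for haplodiploid sisters and fails it for diploid siblings.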

This connection between kinship and cooperability is not merely arithmetic: it suggests that the trustworthiness of a persuasive signal, the probability that a receiver will act on it, is itself subject to evolutionary selection. Signalling systems evolve towards honesty when sender and receiver interests are sufficiently aligned, and towards manipulation when they diverge. Krebs and Dawkins [21] articulated this as the deep tension underlying all animal communication: signals that reliably benefit both parties are maintained by selection; signals that exploit receivers at their expense drive counter-adaptation. The evolutionary tension between honest persuasion and deceptive manipulation is the most ancient form of the propaganda arms race.

3.6 Costly Signals: Why Honest Communication Persists

The problem with honesty, from an evolutionary standpoint, is that it is always vulnerable to cheating. If a signal of high quality can be faked by a low-quality sender, selection favours fakers and the signal loses its informativeness. So why does honest communication persist at all?

Zahavi [37] provided the answer: signals that impose a genuine cost on the sender cannot be cheaply faked. A peacock’s tail is expensive to grow, metabolically costly to carry, and makes the bird more visible to predators. A weak peacock that grew such a tail would pay the cost without the fitness benefits that a strong peacock enjoys. The tail is informative precisely because it is burdensome — its very extravagance is the guarantee of its honesty. Zahavi called this the handicap principle [38].

The principle extends well beyond feathers. Male deer fight with antlers that are costly in bone, time, and injury risk. Bowerbirds construct elaborate structures that signal building ability and aesthetic sense. Meerkats stand on exposed rocks and call loudly, advertising their sentinel role at personal risk. In each case, the signal is credible because a low-quality individual cannot afford it: the cost enforces honesty.
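The logic of "a low-quality individual cannot afford it" can be sketched as a toy model in Python. This is not Zahavi's own formulation, which was verbal. It assumes, purely for illustration, that every signaller gains the same benefit from being believed and that the cost of the signal is a base cost divided by the sender's quality. All names and numbers are hypothetical.

```python
def signalling_payoff(benefit: float, base_cost: float, quality: float) -> float:
    """Net payoff of producing the signal; the cost falls as quality rises."""
    return benefit - base_cost / quality


def signal_is_honest(benefit: float, base_cost: float,
                     low_quality: float, high_quality: float) -> bool:
    """True if signalling pays for high-quality senders but not low-quality ones."""
    pays_for_high = signalling_payoff(benefit, base_cost, high_quality) > 0
    pays_for_low = signalling_payoff(benefit, base_cost, low_quality) > 0
    return pays_for_high and not pays_for_low


if __name__ == "__main__":
    # Expensive signal: only the high-quality sender comes out ahead.
    print("costly signal honest:",
          signal_is_honest(benefit=4, base_cost=6, low_quality=1, high_quality=3))
    # Cheap signal: everyone can afford it, so it carries no information.
    print("cheap signal honest: ",
          signal_is_honest(benefit=4, base_cost=1, low_quality=1, high_quality=3))
```

Under these assumptions the expensive signal separates the two types and the cheap one does not, which is the sense in which cost enforces honesty.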

Human persuasion inherits this logic directly. A speaker who travels far to deliver a message signals commitment. An orator who publicly stakes a prediction signals confidence. A leader who accepts costly obligations — redistributing food, taking on dangerous tasks, absorbing the first risks of collective action — signals alignment of interest with followers. In political science this surfaces as the costly commitment literature; in economics as signalling games. The underlying biology is the same across every domain: costly acts are credible precisely because they are costly.

Reputational commitment extends this further. Publicly announced promises are hard to break without social cost; the public nature of the promise is itself the enforcement mechanism. Much of what we call rhetoric — the choice of a bold claim, the decision to speak on record rather than off — is the deployment of costly signals in the reputational domain.

3.7 The Arms Race: Honest Signals and Their Mimics

Zahavi’s handicap principle predicts honesty. But it predicts something else too: mimicry. Wherever an honest signal becomes reliably responded to, a selection pressure arises for cheap imitations — signals that look honest but are not. Krebs and Dawkins [21] named this the manipulation side of the evolutionary tension. Every signal system, given time, generates an arms race between honest senders and deceptive imitators, and between credulous receivers and skeptical ones.

The evidence is everywhere in nature. Orchids that mimic the appearance and scent of female wasps deceive male wasps into pseudocopulation, achieving pollination without offering any reward. Cuckoos produce eggs that mimic the host’s eggs closely enough to evade rejection. Females of the firefly genus Photuris mimic the flashing patterns of other firefly species to lure and eat males who mistake them for mates. In each case, receivers eventually evolve counter-adaptations: finer discrimination, subtler recognition criteria, resistance to the previously exploited signal.

The arms race between sender manipulation and receiver resistance is the deepest structural feature of communication, older than nervous systems. It predicts something that will become central to the rest of this book: that persuasion and resistance to persuasion co-evolve. Every technique of influence that becomes widely deployed generates, over time, corresponding immunities. The history of mass media is one version of this cycle — from print propaganda to the media literacy it eventually prompted; from television advertising to commercial skepticism; from email spam to spam filters. What changes with AI-generated persuasion is the speed at which new sender strategies can be deployed, potentially outrunning the evolutionary pace at which receiver resistance develops. That asymmetry is examined in Chapter 7.

Hamilton’s rule explains cooperation within kin. Costly signalling explains cooperation among non-kin who can observe each other’s costly acts. Neither reaches the scale of modern human societies. Something else is needed — something that allows cooperation among complete strangers, across time, at continental scale. That gap is where the human record begins.

3.8 The Human Evolutionary Record

To understand language as a genuine major transition, it helps to ground the argument in the palaeontological record. The timeline below reveals a puzzle: the extraordinary delay between the appearance of our lineage and the appearance of our culture [22].

Table 3.1: Key events in human evolutionary history. Mya = million years ago; kya = thousand years ago.
| Period | Species | Brain size | Key marker |
|---|---|---|---|
| 4 Mya | Australopithecines | ~1/3 modern | Upright posture; ape-sized brains |
| 1.5 Mya | Homo erectus | Brain doubled | Hand axe, unchanged for 1 million years |
| 250 kya | Earliest H. sapiens | Near-modern | Slightly more skilled toolmaking |
| 100 kya | Fully modern H. sapiens | Modern | Fully modern large-brained humans, tools still conservative |
| 40 kya | H. sapiens | Modern | Burst of innovation: cave paintings, burials, trade |

A line chart showing hominin brain size in cubic centimetres from roughly 3.5 million years ago to the present, with a steep rise beginning around 500,000 years ago.

Trends in hominin brain size from Australopithecus to Homo sapiens, plotted against time. The steep acceleration over the last 500,000 years, followed by a slight decrease in the Holocene, underlines both the extraordinary speed of encephalization and the puzzle of the “cultural explosion” delay visible in the table above. Source: DeSilva et al. (2021), Frontiers in Ecology and Evolution. CC BY 4.0.

At 4 million years ago, the australopithecines walked upright but their brains were ape-sized, roughly one-third the volume of a modern human brain. By 1.5 million years ago, Homo erectus had appeared with a brain roughly double the australopithecine volume, and was carrying a remarkable stone tool: the hand axe. What is astonishing is not the hand axe’s appearance, but its extraordinary stability. Virtually identical specimens have been found across Africa, Europe, and Asia, manufactured to the same specification for approximately one million years and across many thousands of generations [22]. Cultural transmission was clearly operating: the design was copied faithfully, but it was copying without cumulative improvement. This is culture, but not yet cumulative culture.

By 250,000 years ago the earliest Homo sapiens were present, with slightly more skilled toolmaking. By 100,000 years ago, fully modern large-brained humans were widespread, yet their tools remained broadly conservative. Then at 40,000 years ago there was a burst of innovation. Cave paintings appeared; the dead were buried with grave goods; shell ornaments were traded across hundreds of kilometres; tool types proliferated rapidly. Language is the obvious candidate for what changed. Only a communication system capable of displacement, of referring to the absent, the past, the hypothetical, could underpin simultaneous innovation in art, ritual, and long-distance trade, all of which require coordination around shared representations that do not exist in the immediate environment.

3.8.1 Big Game Hunting: Shared Interest and the Stages of Language

But what drove the evolution of language in the first place? Számadó [32] proposed an account centred on big game hunting, a peculiarly human activity unlike anything else in the primate repertoire. The coordination problem is specific: no single hunter can take down a large ungulate, but a group can, provided roles are assigned before the animal is sighted. The flanker who will drive the prey, the blocker positioned at the ravine, the catcher waiting downstream: each must know the plan before it unfolds. Communicating that plan requires referring to events that have not yet happened, to locations currently out of sight, and to the roles of specific individuals, all properties that animal signals entirely lack. A chimpanzee’s grunt cannot distinguish “you go left while I go right” from “there is food ahead.”

Big game hunting creates shared interest in a way that most primate activities do not. All participants share the outcome whether the hunt succeeds or fails, which minimises the conflict between signaller and receiver that plagues other primate communication. Other primate activities (grooming, mating, food competition) place sender and receiver in partially opposed positions. Hunting does not. It also provides a natural check on dishonesty: who showed up, who ran, and who held position is visible to all participants during and after the event.

This context created selection pressure that unfolded in two stages. The first was indexical and iconic signs: concrete objects or actions that referred directly to the prey, a skull or horn shown to indicate the target species, or a gesture mimicking the animal’s movement. These signs were easy to imitate, grounded in shared perceptual experience, and sufficient for the most basic task of recruitment: getting others to come along. Számadó’s model predicts that the earliest referential communication of this kind was about concrete objects, prey species and location, before extending to actions, roles, and plans. The second stage arose from the demands of planning itself. Communicating about events that have not yet happened, about positions to adopt before the animal arrives, drove selection toward more complex communication: compositional signs capable of specifying relations between agents, actions, and objects, rather than merely pointing at a present referent.

That compositional, planning-oriented communication almost certainly began not with the voice but with the body, as the next section shows.

3.8.2 Pantomime Predating Verbal Language

The sequence reconstructed by Számadó places gestural and pantomimic communication before verbal language in evolutionary time. Ferretti and Adornetti [12] develop this argument in detail: archaic hominins employed pantomime as a primary persuasive medium, a nonverbal, mimetic, non-conventionalised form of communication that represented events and stories through coordinated body movement and relied on shared mental imagery. Unlike the limited gestural repertoires of other apes, pantomime is inherently narrative: it can depict an absent entity, a sequence of events, a causal chain. Experimental evidence shows that gesture dominated over vocalisation in early human communicative acts, and that gesture has significantly greater potential than vocalisation for bootstrapping a shared communicative system from scratch.

The reason gesture comes first is straightforward: miming a visible action is simpler than establishing an arbitrary acoustic symbol for it. You can mime a running antelope with your hands; you cannot easily vocalise it without prior convention. Gesture is transparent to the receiver in a way that sound is not, because the form of the gesture shares properties with what it depicts. A plausible neural substrate for this is the mirror neuron system, first characterised in macaques: cells in the premotor cortex that fire both when an action is performed and when it is observed in another individual. Because gesture is visible and imitable, and because the same neural circuits are activated both in producing and perceiving an action, gesture would naturally be the first medium for communication about the actions of agents. Sound became the dominant channel later, once the conventions were established, because it carries over distance, works in darkness, and leaves the hands free for other tasks.

The emergence of conversational language built on this gestural base, adding the dimension that pantomime alone cannot achieve: turn-taking argumentation.

3.8.3 Conversational Language as Reciprocal Persuasion

Ferretti and Adornetti [12] locate the distinguishing feature of modern Homo sapiens not in more complex signs but in conversation: the turn-taking exchange in which both parties alternately produce and respond to communicative acts, each trying to shift the other’s beliefs or actions. Not a one-way transmission but a negotiation.

What makes conversation cognitively special is turn-taking itself. Each speaker must model what the other person understood from the previous turn and respond to that model, not to what was said but to what was registered. This demands theory of mind operating in real time: I need to track not only my own intention but your current mental state as I produce the next utterance. Pantomime, however elaborate, does not require this because there is no conversational floor to manage, no obligation to respond to the other’s interpretation rather than merely repeating one’s own display.

Conversation, on this account, was the evolutionary trigger for grammar. The demands of reciprocal persuasion, composing novel arguments in real time, responding to objections, specifying precisely which object or action or time-point is at issue, required a combinatorial system capable of generating an unbounded number of distinct messages from a finite vocabulary. Consider what is needed to say “if you bring the spears to the ridge, I will drive the prey toward you from the south.” That sentence requires tense, conditionals, reference to locations currently out of sight, and subject-predicate structure linking specific agents to specific roles. Simple declaratives and requests, the kind pantomime can approximate, are not enough. The pressures driving grammar were argumentative, not aesthetic: exchanging reasons, proposing and rejecting plans, negotiating roles and obligations in real time.

Through this process, human communication became multimodal: integrating both speech and gesture as complementary channels. Speech took on the primary grammatical burden, the combinatorial, recursive system for specifying relations among arguments, while gesture retained its role in grounding reference, expressing emphasis, and conveying spatial and iconic information that resists easy encoding in syntax. The result is a hybrid system in which the full meaning of an utterance is often distributed across both channels, but whose core propositional structure is carried by the spoken word.

The genetic architecture that makes all this possible was not installed in a single evolutionary step. It co-evolved with the cultural practices it enabled.

3.9 Language: The Human Major Transition

The communication systems surveyed so far — chemical gradients, waggle dances, alarm calls, coalition politics — share a fundamental constraint. They all operate in the present. A vervet alarm call signals a leopard here and now. A grooming session cements an alliance with the individual directly in front of you. Pheromone trails point to food that currently exists. No signal in any of these systems can refer to what happened yesterday, what might happen tomorrow, or what would have happened if someone had behaved differently.

Language broke that constraint. It differs from every communication system that came before not in degree but in kind, through three properties that, taken together, are absent from all animal communication [19]:

  • Combinatorial productivity: a finite set of phonemes combines into a large stock of morphemes and words, which in turn combine into an unbounded number of sentences, each capable of expressing a distinct meaning.
  • Displacement: language can refer to entities and events that are not present in the immediate environment, including objects in the past or future, distant locations, and purely hypothetical situations.
  • Propositional structure: language encodes not just the identity of referents but the relations between them, including causal, conditional, and normative relations.

These properties together mean that language can coordinate behaviour around representations of the world, including representations of social rules, obligations, and sanctions, rather than merely around present stimuli. A honeybee’s waggle dance communicates the location of a food source with extraordinary precision, but it cannot communicate that a certain flower patch is morally off-limits, or that a nestmate who visited it owes an apology. Language can. This is what makes language the pivot of this book: every mechanism of persuasion examined in the chapters that follow — attitude change, framing, narrative, political rhetoric, AI-generated content — operates through it. Without displacement, without propositional structure, without the ability to say if you do this I will do that or they did something terrible last year, none of those mechanisms exist. Persuasion at the scale that humans practise it is not an application of language; it is one of the primary reasons language evolved.

That connection runs in both directions. Language made large-scale persuasion possible; the selection pressure for large-scale cooperation made language evolutionarily advantageous. The two drove each other. The following sections trace how — starting with the social scaling problem language solved, and ending with the way cultural transmission turned language into a system that persuades not just individuals but entire civilisations.

3.9.1 Language as an Enabler of Social Scale

Dunbar [9] proposed that language evolved primarily as a form of social grooming — a way of maintaining cooperative alliances among individuals who could not physically groom all their allies simultaneously. In non-human primates, physical grooming is the primary mechanism for building and maintaining social bonds, but it is costly: it occupies roughly 20% of a typical primate’s waking day, and it can only be directed at one individual at a time. These constraints impose a hard ceiling on group size — estimated at roughly 50–80 individuals for most non-human primates.

Language lifts this ceiling. Because vocal interaction can be directed at multiple individuals simultaneously, requires far less time per bond maintained, and can occur during other activities (foraging, walking, resting), it allows cohesive social groups of up to approximately 150. That figure is derived from the regression of primate group size on relative neocortex size [10], and it recurs with unusual regularity across hunter-gatherer bands, Neolithic village sizes, military unit structures, and functional social networks studied today.
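The regression behind the figure of roughly 150 can be sketched in a few lines of Python. The coefficients and the human neocortex ratio below are the values commonly quoted for Dunbar's primate regression; they are not given in this chapter, so treat them as assumptions to be checked against the original source [10]. The function name is hypothetical.

```python
import math

# Assumed values, as commonly quoted for Dunbar's primate regression;
# verify against the original source before relying on them.
INTERCEPT = 0.093
SLOPE = 3.389
HUMAN_NEOCORTEX_RATIO = 4.1  # neocortex volume relative to the rest of the brain


def predicted_group_size(neocortex_ratio: float) -> float:
    """Predict mean group size from log10(N) = INTERCEPT + SLOPE * log10(ratio)."""
    return 10 ** (INTERCEPT + SLOPE * math.log10(neocortex_ratio))


if __name__ == "__main__":
    size = predicted_group_size(HUMAN_NEOCORTEX_RATIO)
    print(f"Predicted human group size: {size:.1f}")
```

Because the relationship is linear on a log-log scale, small changes in the neocortex ratio produce large changes in predicted group size, which is why the human estimate sits so far above the 50–80 ceiling given for other primates.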

Two chimpanzees grooming each other, illustrating the social bonding mechanism that language replaced at larger group sizes.

Chimpanzees grooming in Higashiyama Zoo. Physical grooming is the primary mechanism for maintaining social bonds in non-human primates, occupying roughly 20 per cent of waking time and proceeding strictly one pair at a time. It cannot scale beyond groups of roughly 50–80. Language, Dunbar argued, is what replaced it. Photo: Nattanan23 / Wikimedia Commons. CC BY-SA 3.0.

But language does not merely scale up grooming. It changes the content of social bonding. Where grooming can only communicate presence and tolerance, language can communicate endorsement or condemnation of absent third parties — a capability with profound implications for cooperation. Knowing that a particular individual cheated in a trade three villages away, and being able to transmit that information to all one’s trading partners, creates a reputational infrastructure that extends cooperative norms far beyond the bounds of personal acquaintance. Nowak [24] termed this mechanism indirect reciprocity: cooperation with strangers is sustained not by direct exchange but by reputation, and reputation requires language.
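A toy simulation shows how reputation alone can sustain cooperation among individuals who never repay each other directly. This is a sketch under simplifying assumptions, not Nowak's model: gossip is assumed to be perfectly accurate and instantly shared, every individual meets every other once per round, and an individual loses good standing only by refusing help to someone who is in good standing. The strategy labels, payoffs, and function name are hypothetical.

```python
def simulate(strategies, rounds=5, benefit=3, cost=1):
    """Return total payoffs after repeated donation rounds with shared reputations.

    strategies: list of "discriminator" (helps anyone in good standing)
                or "defector" (never helps).
    """
    n = len(strategies)
    good_standing = [True] * n  # everyone starts with a clean reputation
    payoffs = [0] * n
    for _ in range(rounds):
        standing_now = list(good_standing)  # reputations as known this round
        for donor in range(n):
            for recipient in range(n):
                if donor == recipient:
                    continue
                helps = (strategies[donor] == "discriminator"
                         and standing_now[recipient])
                if helps:
                    payoffs[donor] -= cost
                    payoffs[recipient] += benefit
                elif standing_now[recipient]:
                    # Refusing someone in good standing becomes common knowledge.
                    good_standing[donor] = False
    return payoffs


if __name__ == "__main__":
    population = ["discriminator", "discriminator", "discriminator", "defector"]
    print(simulate(population))
```

In this sketch the defector profits in the first round, before its behaviour is known, and receives nothing afterwards, while the discriminators keep exchanging help and finish with about twice its payoff. The step that makes this work is the shared update to `good_standing`, which stands in for what the text attributes to language: information about absent third parties.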

Maynard Smith and Szathmáry [22] spell out what the social intelligence account implies for the content of language. In humans, it would have been selectively advantageous to communicate about time (yesterday’s hunt, tomorrow’s plan), possession (who owns what, who owes a favour), beliefs and desires (what she thinks, what he wants), tendencies (who can be relied upon, who defects), obligations (who owes what to whom), truth and probability (was she really there? how likely?), and above all hypotheticals and counterfactuals (what would happen if we cooperated; what would have happened if he had not run). As they put it: “the intellectual arms race took place within the species itself.” Every one of these communicative domains is socially consequential in exactly the way the social brain hypothesis would predict — they are the content that matters for managing alliances, predicting others’ behaviour, and coordinating complex collective action. Non-human primates, limited to signals about present and perceptible states, cannot engage with any of them.

Every communication system examined so far transmits capacity and content through the same channel: genetics. The waggle dance, the alarm call, and the grooming repertoire reach each new generation through inherited genes, so their content can change only as fast as genes do. Language breaks this pattern. The capacity is genetic, but the content (stories, norms, beliefs) is culturally inherited, spreads to strangers, changes within a generation, and compounds across centuries. That distinction is developed in the sections that follow; it is the difference that makes language a transmission medium of a qualitatively different kind.

3.9.2 Language and Thought: Distinct Systems

If language evolved primarily for persuasion rather than reasoning, we should expect language and thought to be partially separable — distinct systems with distinct evolutionary histories that happen to interact. The evidence bears this out. A persistent misconception treats the two as the same thing — as if thinking were simply internal speech. Language and thought are biologically distinct systems, and this distinction matters profoundly for understanding what language does as a persuasive technology. Fedorenko, Piantadosi and Gibson [11] review the full body of evidence and conclude that language is primarily a tool for communication between minds, not a medium of private thought.

The clearest evidence comes from patients with severe aphasia — the selective loss of language following damage to the left hemisphere’s language areas. Such patients may lose virtually all ability to produce or comprehend speech, yet retain the ability to perform arithmetic, solve spatial puzzles, follow complex non-verbal instructions, and reason causally about the world. The language system, even when catastrophically damaged, does not take reasoning down with it.

The complementary pattern is equally informative. Certain forms of frontal lobe damage or thought disorder leave linguistic fluency intact — the patient produces grammatical, well-formed sentences — while producing severe impairments in decision-making, planning, and logical inference. A person can speak perfectly and reason very poorly. Together, these two patterns constitute a double dissociation: each system can be selectively damaged while the other is preserved, which is the strongest evidence that they are anatomically and computationally distinct.

The structure of language itself reinforces this conclusion. If language had evolved primarily as a tool for thinking, we would expect it to be optimised for the demands of reasoning: precision, unambiguity, completeness. Instead, the statistical structure of every known human language reflects the pressures of communication between a sender and a receiver [11]. Across languages, the most frequently used words are the shortest — word length is predicted more strongly by contextual predictability than by meaning complexity. Grammatically related elements cluster together within sentences, reducing the listener’s memory load; analyses of large corpora show that actual sentences are consistently shorter in dependency length than random arrangements of the same words would be. And languages tolerate, even exploit, ambiguity: in predictable contexts a shorter ambiguous expression transmits the same information as a longer unambiguous one. Private reasoning, which has no receiver, would gain nothing from ambiguity. Language’s tolerance of it is a signature of its communicative function.
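The dependency-length claim can be illustrated on a toy scale. The sketch below is not the corpus analysis reviewed in [11]: the sentence, its hand-written parse, and the function name `dependency_length` are hypothetical, chosen only to show how the comparison with random word orders is computed.

```python
from itertools import permutations


def dependency_length(order, dependencies):
    """Sum of linear distances between each head and its dependent.

    order: tuple of word ids in spoken order.
    dependencies: (head_id, dependent_id) pairs.
    """
    position = {word: i for i, word in enumerate(order)}
    return sum(abs(position[h] - position[d]) for h, d in dependencies)


# Hypothetical toy parse of "the dog chased the cat"
words = ["the", "dog", "chased", "the", "cat"]
dependencies = [(1, 0), (2, 1), (4, 3), (2, 4)]  # (head, dependent) by word id

attested = dependency_length(tuple(range(len(words))), dependencies)

# Average over every possible ordering of the same five words
shuffles = list(permutations(range(len(words))))
random_mean = sum(dependency_length(p, dependencies) for p in shuffles) / len(shuffles)

print(attested, random_mean)  # 5 8.0
```

In this toy case the attested order keeps related words closer together (total distance 5) than an average random arrangement would (8.0), which is the direction of the effect the text describes for real corpora.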

Developmental evidence runs in the same direction. Prelinguistic infants track object permanence, attribute intentions to agents, and compute causal chains before they have the syntactic or lexical resources to describe any of this. Going the other way, children acquire grammatical patterns in domains where their conceptual understanding lags — producing passive constructions or embedded clauses in contexts where they do not fully grasp the logical relationship being expressed. The two systems develop on their own schedules, consistent with distinct biological programmes.

Language enables the persuasive achievements described in the remaining sections of this chapter: shared fictions, institutions, normative systems. It does so not by creating new thoughts in individuals, but by transmitting representations between minds. Persuasion via language is other-directed: a technology for aligning the mental states of distinct agents, not for improving the reasoning of any one of them.

3.9.3 From Signals to Recursion: The Hierarchy of Linguistic Capacity

Language did not emerge as a finished system. It is better understood as a hierarchy of increasing computational power, each level enabling communicative and cognitive capacities inaccessible to the level below it. Mapping this hierarchy clarifies what is shared between humans and other animals, what is uniquely human, and why the uniquely human components were the decisive step for cooperation at scale.

Level 1 — Signals. The most primitive communicative acts are signals: outputs that reliably trigger specific responses in receivers, with the signal-response relationship fixed by biology. The vervet alarm calls (Section 3.4.3) are paradigm cases: three acoustically distinct calls, each eliciting a predator-appropriate flight response, each biologically specified. The honeybee’s waggle dance encodes direction and distance to food with extraordinary precision. Meerkat sentinel calls grade continuously with threat level. In each case, the signal is tied to an immediate, perceptible state of the world — a real predator, a real food source, a real threat. What signals enable: rapid, reliable coordination of behaviour around present stimuli. What they cannot do: refer to the absent, the categorical, or the arbitrary. There is no vervet signal for yesterday’s python.

Level 2 — Symbols. A symbol is an arbitrary sign-referent relationship: the acoustic or gestural form of the symbol bears no iconic resemblance to what it refers to. The English word cat sounds nothing like a cat; the word red is not itself red; the sign for apple in American Sign Language does not look like an apple. This arbitrariness is not a limitation — it is the crucial enabling property. A signal system tied to iconic or indexical resemblance can only refer to things that can be imitated or pointed at. An arbitrary symbol can refer to anything, including things that cannot be perceived, imitated, or pointed at: obligations, possibilities, mathematical objects, the future. Washoe’s acquisition of arbitrary ASL signs demonstrates that the symbol-forming capacity is not unique to humans — trained apes can acquire a limited set. What symbols enable: naming, the assignment of a stable label to a category that can then be communicated across individuals and across time.

Level 3 — Vocabulary expansion. Once the symbol principle is established, the vocabulary can grow without limit. Each new word extends the referential scope of the system without requiring new signal infrastructure. Vocabulary can track cultural innovation: new concepts get new names (algorithm, democracy, copyright), and those names can spread through a community within a generation, far faster than any genetic process. This is why Tooby and Cosmides [33] argued that the genome stores the capacity to learn words rather than the words themselves: cultural evolution generates vocabulary faster than genetic evolution ever could. What vocabulary expansion enables: the categorical mapping of the world, including the shared conceptual maps that underlie coordinated action among strangers who have never met.

Level 4 — Simple combinations. Placing two symbols in relation — big train, tickle Washoe, my milk — produces a qualitatively richer output than either symbol alone. The combination encodes a proposition: an assertion about how two entities stand in relation to one another. This is the level at which the two-word child, the language-trained chimpanzee, and Genie all operate (see Section 3.9.6). The semantic territory covered by two-word combinations is already substantial: attribution of properties (red book), possession (my milk), location (walk street), agent-patient relations (Adam checker). What combinations enable: propositional content — the expression of states of affairs rather than merely the naming of objects. What they cannot express: the difference between the dog bit the man and the man bit the dog, because at this level word-order rules are absent or inconsistent, and embedding is impossible.

Level 5 — Syntax. Syntax is the system of rules that governs how symbols can be combined — which structural roles they can play, in which order, with which agreement relations. The decisive syntactic innovation, present in all known human languages and absent in all animal communication systems, is the subject-predicate distinction (see Section 3.9.4): the separation of the argument slot (what is being talked about) from the predicate slot (what is being said about it). This allows a vocabulary of N nouns and M verbs to generate N × M distinct propositions rather than N + M distinct signals. The expressive capacity grows multiplicatively rather than additively. What syntax enables: unbounded messages from finite means — the ability to say, and understand, sentences never before uttered.

Level 6 — Recursion. Recursion is the embedding of one linguistic structure inside another of the same type, without principled limit. A sentence can contain a relative clause that contains another relative clause: the dog that the man who the woman hit saw ran. A verb of mental state can take a propositional complement that contains another propositional complement: She knew that he believed that they had agreed to leave. Conditional and counterfactual reasoning is structured recursively: if he had known that she would have done what she said she would never do, he would not have…. Recursion is what allows language to represent thoughts about thoughts — the basis of full-blown theory of mind: not merely knowing that someone has a belief, but knowing what they believe someone else believes about what you believe.

What recursion enables: the entire domain of embedded social reasoning — the contractual, legal, narrative, and moral reasoning that human cooperation depends on. A contract (“if you do X, I will do Y, unless Z obtains, in which case…”) is a recursively embedded conditional. A legal argument is a chain of recursively embedded propositions about what others did, intended, agreed to, and were permitted to do. A novel embeds one character’s consciousness inside another’s, nested within a narrator’s, nested within the author’s representation of a world that never existed. None of this is possible without recursion.
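The embedding described for Level 6 has the same shape as a recursive function: a rule that applies to its own output. The sketch below is only an illustration under that analogy; the function `embed` and its argument format are assumptions, not a model of how grammar is implemented in the brain.

```python
def embed(core, attitudes):
    """Recursively wrap a proposition in reports of mental states.

    core: the innermost clause, e.g. "they had agreed to leave".
    attitudes: list of (subject, verb) pairs, outermost first.
    """
    if not attitudes:
        return core                      # base case: nothing left to wrap
    (subject, verb), rest = attitudes[0], attitudes[1:]
    # The complement of the verb is itself produced by the same rule.
    return f"{subject} {verb} that {embed(core, rest)}"


sentence = embed("they had agreed to leave", [("she", "knew"), ("he", "believed")])
print(sentence)  # she knew that he believed that they had agreed to leave
```

Nothing in the rule limits the length of `attitudes`, which is the sense in which embedding is "without principled limit"; in practice the limit comes from the listener's memory, not from the grammar.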

Level 7 — Abstract reference and displacement. The final level is the capacity to refer to entities and events that are not present in the immediate perceptual environment — and, beyond that, to entities that may not exist at all. Displacement [19] is the ability to speak of the past and future, of distant locations, of hypotheticals and counterfactuals. Abstract reference extends this to entities that have no spatiotemporal location whatsoever: obligations, rights, probabilities, mathematical objects, moral duties, social roles. A corporation is not a physical object; democracy is not a perceptual category; justice does not exist at any particular location. Yet these abstractions coordinate the behaviour of millions of people who have never met.

What abstract reference enables is precisely the institutional infrastructure that distinguishes human civilisation from the cooperation of every other species: laws, markets, religions, scientific communities, states. Every one of these institutions is, in the analysis of Section 3.9.9, a shared fiction — a collectively held representation of something that has no physical existence but that generates real coordination through belief. The capacity to represent and communicate about non-present, non-perceptible, non-existent entities is not a luxury; it is the communicative foundation of everything that makes human social organisation unique.

The hierarchy as an evolutionary ladder. The levels are not merely descriptive categories; they map onto distinct evolutionary and developmental stages. Natural animal communication systems remain at Level 1; only with sustained human training do apes acquire symbols (Level 2) and simple combinations. Proto-language, as seen in the two-year-old child, the trained ape, and the language-deprived human like Genie, operates at Levels 2–4. Full human language achieves Levels 5–7. The transitions between levels are not smooth gradients; each requires qualitatively new neural architecture. It is the jump from Level 4 to Level 5, from combination to syntax, that constitutes the major transition in human evolution, and it is Levels 6 and 7 that make human persuasion qualitatively unlike anything seen elsewhere in the animal kingdom.

Table 3.2: The hierarchy of linguistic capacity, from signals to abstract reference. Each level is a precondition for the next. Human language occupies all seven levels; no non-human system is known to exceed Level 2 without sustained human training.

| Level | Capacity | Example | What it enables |
|---|---|---|---|
| 1 | Signals | Vervet alarm calls | Immediate behavioural coordination |
| 2 | Symbols | Washoe’s ASL signs | Naming arbitrary categories |
| 3 | Vocabulary expansion | Cultural words (algorithm) | Tracking cultural innovation |
| 4 | Simple combinations | Big train; Tickle Washoe | Propositional content |
| 5 | Syntax | Subject-predicate structure | Unbounded messages from finite vocabulary |
| 6 | Recursion | She knew that he believed that… | Theory of mind; contracts; narrative |
| 7 | Abstract reference | Obligations, rights, justice | Institutions; law; shared fictions |

3.9.4 Grammar, Syntax, and Semantics

The most fundamental structural feature of human language — and one that has no counterpart in any animal communication system — is the subject-predicate distinction. Consider four concepts: dog-running, dog-sleeping, lion-running, lion-sleeping. A system without grammar would need a separate signal for each of these four combinations. Human language does something more powerful: it provides two nouns (dog, lion) and two verbs (run, sleep), and allows any noun to be combined with any predicate. To say “the dog is sleeping” is a predication — an assertion that a property holds of an entity. Since many properties can be predicated of each entity, and many entities can be referred to by each noun, the range of things that can be communicated with a vocabulary of a given size grows not additively but multiplicatively. The subject-predicate distinction is a universal feature of all known human languages, and it is the basis of our ability to produce and understand an indefinitely large number of sentences from a finite vocabulary. A vervet monkey cannot do this: its alarm calls are fixed signals, not combinations of reusable parts.
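The arithmetic behind the multiplicative claim can be sketched briefly. The helper `items_to_learn` is a hypothetical name introduced here, and the sketch makes the simplifying assumption that every noun can combine with every verb.

```python
from itertools import product

nouns = ["dog", "lion"]
verbs = ["running", "sleeping"]

# Grammar: every noun can fill the subject slot of every predicate.
propositions = [f"the {n} is {v}" for n, v in product(nouns, verbs)]


def items_to_learn(n_nouns, n_verbs, compositional):
    """Number of items a learner must store to cover every noun-verb pairing."""
    if compositional:
        return n_nouns + n_verbs      # reusable parts
    return n_nouns * n_verbs          # one holistic signal per combination


print(propositions)
print(items_to_learn(100, 100, compositional=True))   # 200
print(items_to_learn(100, 100, compositional=False))  # 10000
```

With only two nouns and two verbs the two systems cost the same (four items each); the advantage of predication appears as the vocabulary grows, which is why the gap is shown here for a hundred of each.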

Why is only the capacity to learn language innate, and not the vocabulary itself? If language were adaptive, would it not be more efficient to transmit the vocabulary genetically, rather than requiring each child to learn it from scratch? Tooby and Cosmides [33] offer a compelling answer: if the vocabulary is learned, we can acquire names for cultural innovations — screwdriver, constitution, algorithm. Cultural evolution is far faster than genetic evolution; long before any appreciable number of words could be genetically assimilated, dialects and distinct languages were already present and diverging. The genome may as well store the vocabulary in the “cultural environment” — meaning that the capacity to learn words is heritable, while the words themselves are maintained and transmitted culturally. This is a precise instance of the bio-cultural co-evolution described in Section 3.9.8: the biological endowment creates the receptacle; the cultural environment fills it.

The innateness of grammar’s deep structure presents a different puzzle. The surface rules of different languages vary enormously, but their deep structural properties — subject-predicate organisation, recursive embedding, argument structure — are universal. If these structural principles cannot be learned from the input (bottom-up), nor derived from general reasoning principles (top-down), their existence requires explanation. As Bates and colleagues argued, there are logically only two possibilities: either universal grammar was installed directly by the Creator, or our species underwent a cognitive mutation of unprecedented magnitude — a Big Bang of the language faculty.

This is not an argument that should appeal to an evolutionary biologist. We have been told too often that the eye could not have evolved by natural selection, because any alteration to its structure would destroy its function. Yet we know of many functional intermediates between a simple pigment spot and the vertebrate eye, each fully adequate for the organism’s needs at that stage. Evolutionary biology teaches that complex organs are built incrementally, not installed all at once. The question for language is whether comparable intermediates can be identified. The snag is that the intermediates no longer exist — unlike the eye, for which we have living comparative examples across phyla. But the evidence from cases of partial linguistic capacity — specific language impairment, aphasia, sign languages at different stages of grammaticalisation — suggests that grammatical competence is not all-or-nothing. There can be partial grammar, and partial grammar is far better than none.

3.9.5 Language in the Brain: Genetic and Lesion Evidence

The neurological evidence for language as a biologically specific faculty — not merely a general cognitive capacity applied to communication — comes from two sources: cases of selective brain damage and cases of selective genetic impairment.

Diagram of the left hemisphere of the human brain showing Broca's area in the inferior frontal gyrus and Wernicke's area in the posterior superior temporal gyrus, with the arcuate fasciculus connecting them.

The principal cortical regions involved in language production and comprehension. Broca’s area (frontal lobe, left hemisphere) governs the grammatical sequencing and articulation of speech; Wernicke’s area (posterior superior temporal lobe) governs the comprehension of spoken language. Both are connected by the arcuate fasciculus. Damage to each produces distinct aphasia syndromes, illustrating the anatomical modularity of the language faculty. Diagram: UX Stalin. CC BY-SA 4.0.

Patients with damage to the temporal segment of the left lingual gyrus suffer from colour anomia. They experience colour normally and are in full command of word morphology — they can produce and comprehend sentences — but they are unable to pair colour names with colours: they may pair yellow with grass and green with banana. Given a colour name, they point to the wrong colour. The link between word and concept is selectively impaired. Patients with damage to the anterior and mid-temporal cortices recognise objects correctly but cannot name them; they say, “I know it, but I cannot say the name.” Oddly, this naming difficulty is more pronounced for natural objects than for human-made artefacts, suggesting that different parts of the brain represent categories of objects differently. Damage to the left anterior temporal lobe can selectively impair the ability to retrieve the names of unique persons, while leaving common-noun retrieval intact. Each of these dissociations reveals a distinct sub-component of the language system — a modularity within language itself that is anatomically grounded.

The biological specificity of language is shown most directly by a British family studied by Gopnik [16]. Across three generations, 16 of 30 family members were affected by a peculiar language disorder (dysphasia). The pattern of inheritance — affecting some members of a sibship while sparing others — is consistent with a single dominant autosomal gene. The disorder is not due to imitation of a disordered parent: children affected while one parent is normal still show the impairment. These individuals are not globally impaired: they tell jokes, converse, and some are mathematically capable. There is no general failure to handle hierarchical structures. The impairment is specific to one aspect of morphology: the affected individuals cannot generalise grammatical rules.
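The inference to a single dominant autosomal gene rests on a simple expectation, which can be sketched as follows. The allele labels and the function `affected_fraction` are illustrative assumptions; the only figure taken from the text is the count of 16 affected among 30 family members, which includes all three generations and so is only a rough point of comparison.

```python
from fractions import Fraction
from itertools import product


def affected_fraction(parent1, parent2, dominant="A"):
    """Expected fraction of affected children under single-gene dominant inheritance.

    Each parent is a two-letter genotype such as "Aa"; a child is affected
    if it inherits at least one copy of the dominant allele.
    """
    children = [a + b for a, b in product(parent1, parent2)]
    affected = sum(dominant in child for child in children)
    return Fraction(affected, len(children))


expected = affected_fraction("Aa", "aa")   # affected heterozygote x unaffected parent
observed = Fraction(16, 30)                # family-wide count reported in the text
print(expected, float(observed))
```

A mating between one affected heterozygous parent and one unaffected parent is expected to produce affected children half the time, which is close to the proportion reported for the family as a whole.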

The nature of the deficit is beautifully illustrated by Gopnik’s examples. Affected children write sentences such as:

She remembered when she hurts herself the other day.
Carol is cry in the church.
On Saturday I went to nanny house with nanny and Carol.

In each case the child fails to mark tense or possession using the appropriate morphological change: hurts should be hurt (past), cry should be crying, and nanny house should be nanny’s house. When shown a picture of an imaginary creature called a wug and then a picture of several such creatures, a normal child immediately says wugs. The dysphasic child cannot: they can learn individual examples of plurals and past tenses, just as we all learn that the past of go is went, but they cannot generalise the rule to new cases. Grammar, for them, is a collection of memorised facts rather than a productive system.

One child demonstrated this precisely. On a Monday she wrote: On Saturday I watch TV. Her teacher corrected this to watched. The following week she wrote: On Saturday I wash myself and I watched TV and I went to bed. She had learned that the past of watch is watched as a particular fact; she had not extended the rule to wash; and she already knew went as a unique memorised form. The productive morphological rule, the ability to generalise, is the specific thing that is missing.
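The contrast between a memorised list and a productive rule can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the "add -ed" rule is a deliberate simplification of English past-tense formation.

```python
IRREGULAR = {"go": "went"}  # exceptions that every speaker must memorise


def past_by_rule(verb):
    """Speaker with a productive rule: memorised exceptions first, else add -ed."""
    return IRREGULAR.get(verb, verb + "ed")


def past_by_memory(verb, memorised):
    """Speaker without the rule: only forms learned one by one are available."""
    return memorised.get(verb, verb)  # unknown verbs stay unmarked


# The child's store after the teacher's correction of "watch"
child_lexicon = {"go": "went", "watch": "watched"}

for verb in ["watch", "wash", "go"]:
    print(verb, past_by_rule(verb), past_by_memory(verb, child_lexicon))
```

The memory-only speaker reproduces the pattern in the child's second sentence: watched and went are retrieved as stored facts, while wash, never corrected, stays unmarked. The rule-based speaker produces washed without ever having heard it.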

This case carries several implications for understanding what language is. First, it shows that there can be intermediates between perfect linguistic competence and none: these individuals have substantial language, just not generative morphology. This is evidence against the “all-or-nothing” view of language evolution, and for the possibility that the faculty evolved incrementally. Second, the impairment is specific to language — there is no general cognitive defect — suggesting that at least some grammatical knowledge is instantiated in neural structures that are biologically dedicated to language rather than shared with domain-general reasoning.

Subsequent molecular research identified a gene — FOXP2, a transcription factor that regulates neural development — as implicated in heritable language disorder. The human version of FOXP2 differs from the chimpanzee version by two amino acid substitutions that appear to have arisen recently in our evolutionary history, and disruption of this gene produces impairments specifically in the sequencing and articulation of speech. It is one of the first concrete links identified between a specific genetic variant and a specific component of the human language faculty.

3.9.6 Proto-language: Apes, Children, and Genie

The best way to understand what is distinctive about full human language is to examine what precedes it — what we can call proto-language: a communication system that conveys meanings but lacks the productive grammar that makes human language unbounded. The concept is brought into sharp focus by comparing two sets of two-word utterances from very different sources.

(12) Big train; Red book; Adam checker; Mommy lunch; Walk street; Go store; My milk; Pretty boat; Mama honey; Pig Mommy.

(13) Tickle Washoe; Open blanket; Roger ticket; You drink; Go in; In hat; Clothes Mrs. Gardner; Listen dog; Sign me; Hurry gimme.

These two sets are close to indistinguishable. Both draw on a small vocabulary of noun-like and verb-like terms assembled into pairs. They cover the same semantic territory: the attribution of qualities (big train, red book), possession (my milk, clothes Mrs. Gardner), location of actions (walk street, go in), and agent-patient relations (Adam checker, tickle Washoe). The apparent syntax, to the extent that word order carries meaning, is indistinguishable across the two samples.

Yet (12) is from children at the two-word stage of language acquisition, and (13) is from the chimpanzee Washoe, trained in American Sign Language [14]. The surface similarity masks a fundamental difference in motivation. Children at the two-word stage are in the business of categorising the world for its own sake: a child will say red book with no request implied, simply to predicate a property of an object. Washoe’s utterances, by contrast, are overwhelmingly communicative appeals about objects or actions she wants: requests to be tickled, to have a blanket opened, to receive food or attention. The ape is using proto-language instrumentally, as a tool for getting things. The child is using it representationally, as a tool for mapping reality — even when there is nothing to be gained.

This difference — categorisation for its own sake versus communication about immediate wants — marks the threshold between a communicative system and a representational one. Only a fully representational system can build the shared maps of reality — including maps of obligations, norms, past events, and hypothetical futures — that make complex cooperation possible.

Genie: the human case. In 1970, a 13-year-old girl was discovered in Los Angeles who had been confined and severely isolated since approximately 18 months of age, denied normal language input for over eleven years. The linguist Susan Curtiss [8] documented her linguistic development after her rescue. Genie acquired vocabulary rapidly — her capacity to learn words was intact — but her grammar remained permanently limited. Representative utterances from her early speech: Want milk; Mike paint; Applesauce buy store; I want Curtiss play piano. These are slightly more elaborated than the two-word utterances of (12) or (13): she can string together more than two content words and can express a desire for another person’s action. But tense marking, morphological agreement, and the recursive embedding that characterises adult grammar are absent. There is no generative phrase structure in the linguistic sense.

Genie’s language, like Washoe’s, like the two-word child’s, is what we can call proto-language. It is a communication system that is older, in an evolutionary sense, than full human language — phylogenetically ancient. It is present in very young children before grammar has developed, in language-trained apes who have never been exposed to grammar, and in a human who was denied the critical developmental window during which grammar normally takes hold. There appears to be no critical period for proto-language: Genie acquired it at age 13 without difficulty. There is, however, a critical period for the grammatical component that elevates proto-language into full human language — and that window had closed for Genie before she was found.

Genie’s case thus provides a natural experiment that dovetails with the genetic and lesion evidence reviewed in Section 3.9.5. The morphological generalisation ability damaged in Gopnik’s dysphasia family, the specific naming and categorisation abilities impaired by focal lesions, and the syntactic component that Genie could never acquire — all point to the same conclusion: human language is not a single faculty but a family of biologically specific capacities, only some of which are shared with other primates, and only some of which can develop without normal early exposure.

The proto-language capacity appears to be the shared substrate — the platform on which, in our lineage, a new grammatical architecture was built. The question that follows naturally is: how does that architecture arise? Can it emerge from cultural processes alone, without genetic change? The evidence from pidgins and creoles is the clearest answer available.

3.9.7 Pidgins, Creoles, and the Speed of Cultural Evolution

A pidgin is a contact language that emerges spontaneously when speakers of mutually unintelligible languages must communicate — most commonly under conditions of trade or colonial labour. Pidgins draw their vocabulary primarily from one dominant language, but strip away most of its morphology and syntactic complexity. They have no regular tense system, little or no agreement, minimal embedding, and no native speakers. Pidgin speakers share no common language; the pidgin is their improvised solution to an immediate communicative need. It is, in the technical sense, a proto-language: its utterances resemble those of Washoe and of Genie far more than they resemble the output of a native English or Japanese speaker.

A creole is what happens in the next generation. When children grow up in a community where a pidgin is the primary medium of communication, they do not simply learn the pidgin they are exposed to. They systematically expand it. They add consistent grammatical morphology, stable word order, tense-aspect-modality markers, and recursive embedding — until the resulting language has the full expressive power of any natural human language. The proportion of purely grammatical items (articles, prepositions, tense markers, complementisers, relativising particles) rises from near zero in the pidgin to approximately 50 per cent in established creoles. This is not a modest elaboration; it is the spontaneous creation of a grammatical system from proto-linguistic raw material. And it happens within a single generation, without instruction, driven by the same biological endowment that allows any child to acquire whatever language their community speaks.

Why study pidgins and creoles? Because they constitute a natural experiment in language creation: a case in which we can observe, in historical time, the transition from proto-language to full language. Most of the evidence bearing on the origin of language is indirect — inferred from fossils, comparative anatomy, or computational models. Creolisation gives us a direct window.

The Hawaiian case. The clearest documented instance is Hawaiian Creole English. In the late nineteenth and early twentieth centuries, plantation workers arrived in Hawaii from China, Japan, the Philippines, Korea, Portugal, and Puerto Rico. With no common language, they developed a pidgin — simplified English mixed with fragments of their native tongues — for essential communication. Their children, immersed in this pidgin from birth, spontaneously produced Hawaiian Creole English: a fully grammatical language with consistent syntactic rules, a complete tense-aspect-modality system, and the recursive structures entirely absent from the pidgin input. The jump from proto-language to full language occurred in a single generation [3, 4].

Cultural evolution is faster than genetic evolution by many orders of magnitude. A human generation is approximately 25–30 years, far too short for any significant genetic change in a population. The Hawaiian children inherited the same language faculty as their parents; nothing in their genetic endowment was new. What changed was not the genome but the cultural environment: the children were immersed in communicative interaction, even in a structurally impoverished form, and their language faculty, the biological endowment shared by all Homo sapiens, supplied the grammatical architecture that the input lacked.

This means that the transition from proto-language to full language does not, in principle, require genetic change. It requires children, a community, and enough time for the first creole-speaking generation to emerge. The genetic endowment creates the capacity; the cultural environment triggers its full expression. This is a precise and historically documented instance of what Tooby and Cosmides [33] argued for vocabulary: that cultural transmission is the appropriate vehicle for the content of language, while genetic transmission is the appropriate vehicle for the capacity to acquire it. Creolisation extends this argument from words to grammar itself.

For the study of persuasion, the creolisation finding carries a consequential implication. The most powerful persuasive technology in human evolutionary history, fully grammatical language, is as much a cultural product as a biological one. The capacity is genetic; the grammatical realisation is cultural. Groups that create the conditions for creolisation (dense communicative interaction across language boundaries) will produce, within a generation, the full expressive and persuasive power of human language. The biology is universal; the culture is the trigger. This is the deepest sense in which dual inheritance theory, introduced in the next section, applies not just to beliefs and rituals but to language itself.

3.9.8 From Genetic to Cultural Transmission: A New Mode of Persuasion

The persuasive mechanisms reviewed in the preceding sections (pheromone trails, alarm calls, waggle dances, grooming exchanges, dominance displays) share a fundamental property: the capacity for these behaviours is encoded genetically. A honeybee’s ability to produce and respond to the waggle dance, a meerkat’s alarm call repertoire, a chimpanzee’s disposition to engage in coalition politics: all are heritable in the strict biological sense. Each generation must rebuild the relevant neural architecture from the genome it inherits.

Language introduces a qualitatively different mode of transmission. The capacity for language is genetically encoded — specialised cortical regions, precise articulatory control, sensitivity to syntactic structure — but the content of what language transmits is culturally inherited. A story, a ritual, a legal code, a religious belief is not passed from parent to child through DNA; it is transmitted through imitation, teaching, and symbolic communication, and can spread horizontally across individuals who share no kinship at all. This distinction between the vehicle (genetic) and the payload (cultural) is the heart of what Boyd and Richerson [5] termed dual inheritance theory: human evolutionary history is the story of two inheritance systems, genetic and cultural, operating simultaneously and shaping one another.

The consequences for persuasion are profound. Boyd and Richerson [5] argued that culturally transmitted traits, including systems of ritual, belief, and social norm, are subject to selection in their own right. When a group succeeds because of its system of ritual, two effects follow simultaneously: through cultural evolution, its belief system spreads to neighbouring groups by imitation, prestige, or conquest; and through genetic selection, individuals within the group who are constitutionally more susceptible to those beliefs are favoured:

“There is between-group selection for culturally inherited systems of belief that favour the success of groups, and there is individual selection for the genetically inherited ability to be influenced by ritual.”

— Boyd & Richerson, Culture and the Evolutionary Process [5], 1985

This idea — that political authority requires the engineering of shared belief, not merely the exercise of force — has a long history in political philosophy. Plato made it explicit in The Republic [25], arguing that the legislator’s central task is precisely this cultural management of belief:

“All he [the legislator] needs to do is to find out what belief is most beneficial to the state, and then use all the resources at his command to ensure that throughout their lives, in speech, story and song, the people all sing to the same tune.”

The candour here is remarkable: the primary instrument of social order is not law or force but the alignment of belief through narrative, music, and repetition — persuasion systematically deployed across an entire culture. Rousseau [27], writing two millennia later, placed a related insight at the heart of The Social Contract: legitimate political authority cannot rest on force alone, but requires the transformation of natural individuals into citizens who genuinely identify with the general will — a transformation that is, at its core, an act of cultural persuasion through shared institutions, rituals, and civic education.

What Boyd and Richerson formalised in evolutionary terms, Plato and Rousseau had identified through political philosophy: stable large-scale cooperation requires the active management of belief, and the most powerful tool for that management is cultural transmission — narrative, ritual, education, and the arts.

This creates a powerful positive feedback loop. Groups that develop more compelling persuasion systems — narratives that evoke strong emotion, rituals that produce collective effervescence, institutions that generate trust — tend to outcompete groups with weaker ones. Within those successful groups, individuals who are more susceptible to narrative persuasion cooperate more reliably and contribute more consistently to group success, leaving more descendants. Over many generations, this dual process would have shaped the human brain to be exquisitely receptive to story, ritual, and authority — not because susceptibility is always individually advantageous, but because groups containing susceptible individuals tended to outsurvive those that did not.
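The logic of that last sentence can be made concrete with a toy calculation. The sketch below is an illustrative assumption, not a model taken from Boyd and Richerson: the `Group` fields, the `benefit` and `cost` parameters, and the starting values are all hypothetical. It shows how the population-wide frequency of a susceptible type can rise over a generation even while that type pays a small cost inside every group, simply because groups with stronger persuasion systems and more susceptible members grow faster.

```python
from dataclasses import dataclass


@dataclass
class Group:
    share: float        # fraction of the whole population living in this group
    persuasion: float   # strength of the group's persuasion system (0..1), hypothetical
    susceptible: float  # fraction of members who are the susceptible type (0..1)


def step(groups, benefit=1.0, cost=0.05):
    """Advance one generation of the toy two-level process."""
    # Between-group step: a group grows in proportion to the cooperation it
    # achieves, taken here as persuasion strength times susceptible fraction.
    weights = [g.share * (1.0 + benefit * g.persuasion * g.susceptible) for g in groups]
    total = sum(weights)
    next_groups = []
    for g, w in zip(groups, weights):
        # Within-group step: susceptible members pay a small individual cost,
        # so their frequency falls slightly inside every group.
        kept = g.susceptible * (1.0 - cost)
        within = kept / (kept + (1.0 - g.susceptible))
        next_groups.append(Group(w / total, g.persuasion, within))
    return next_groups


def overall_susceptibility(groups):
    """Population-wide frequency of the susceptible type."""
    return sum(g.share * g.susceptible for g in groups)


# Two equally sized groups: one with a strong system and many susceptible
# members, one with a weak system and few (illustrative numbers only).
groups = [Group(0.5, 1.0, 0.8), Group(0.5, 0.2, 0.2)]
after = step(groups)
print(overall_susceptibility(groups), overall_susceptibility(after))
```

With these starting values the susceptible fraction falls inside each group but rises across the population as a whole, because the first group's share of the population expands. In a toy this simple the effect need not persist indefinitely; it illustrates the direction of the argument, not its long-run dynamics.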

This coevolutionary logic explains something otherwise puzzling: why human beings are so readily moved by fiction. A novel’s characters do not exist; the events did not happen; yet readers experience genuine emotion, update moral intuitions, and sometimes change behaviour. This is not a cognitive failure — it is the expected output of a brain that evolved to treat culturally transmitted narratives as reliable guides to social reality.

3.9.9 Shared Narratives as Cooperative Technology

The most powerful application of language as persuasion technology is the creation and transmission of shared narratives — collectively held stories that coordinate behaviour by establishing what is valued, what is sanctioned, and what will be rewarded or punished.

Harari [18] argued that what distinguishes Homo sapiens from other animals is not superior individual intelligence but the capacity to cooperate flexibly in very large groups through shared fictions — collectively believed stories about entities that have no physical existence but that generate coordinated behaviour through belief alone. Money is real because enough people believe it is; a corporation has legal personhood; a nation-state commands loyalty across millions of strangers. Every institution is, in this analysis, a persuasive achievement — maintained not by physical force but by the ongoing, distributed persuasion of its members that the institution is real and worth sustaining.

Why should narrative outperform explicit argument as a persuasion medium? The comparison is not obvious. An argument, properly constructed, is more precise, more falsifiable, easier to scrutinise. But precision is not the same as effectiveness. Narratives carry several structural advantages over propositional arguments, each corresponding to a feature of how human memory and motivation actually work.

Memory is the first advantage. Events organised as a narrative, with causal links, a protagonist, a goal, an obstacle, and a resolution, are remembered far better than the same information presented as a list. The causal spine of a story provides retrieval cues at every step; each event implies the next. A study that shows something is true is forgotten more readily than a story about someone for whom it was true.

Emotional encoding is the second. Narrative engages the neural systems that process real experience: physiological arousal, perspective-taking, and affective response during story comprehension are measurable and resemble responses to actual events. Explicit arguments primarily engage evaluative processing, which is also the processing most likely to generate counter-arguments. Stories get in before the critical faculties organise a response.

There is also the matter of causal simulation. Narratives invite the reader to mentally simulate events, run counterfactuals, and predict outcomes, activating the same predictive machinery used to navigate real social situations. An abstract claim that cooperation pays remains a bare assertion. A story in which the cooperator thrives and the defector suffers runs the inference as an experience, not just a proposition.

Finally, and perhaps most durably, repeated exposure to narratives defines categories of person: hero, villain, traitor, martyr. Once those categories are established, specific arguments arrive pre-sorted. A person who has absorbed a thousand stories about the loyal son and the treacherous brother does not need to evaluate fresh arguments about kin betrayal. The moral conclusion is already indexed to identity.

The specific narrative forms human cultures developed are among the most effective persuasive technologies ever invented, and two examples show why.

Cain and Abel encodes a set of cooperative norms in narrative form. Abel is the favoured son; Cain, consumed by envy, kills him and attempts to deny responsibility. God’s punishment (the mark of Cain, the curse of perpetual wandering) makes explicit the norm that kinship solidarity is sacred and that betrayal of a brother has cosmic consequences. The story does not make an argument in the logical sense; it does something more effective. It attaches intense emotion (guilt, divine wrath, permanent exclusion from community) to the act of kin betrayal, stamping the norm into the listener’s moral imagination. Anyone who knows the story has been persuaded about the gravity of what Cain did and, by extension, about what they must not do.

Shakespeare’s Macbeth encodes the norm that ambition unpaired with moral scruple is self-destructive. Macbeth is not punished by an external authority in the first instance; he is destroyed from within by guilt and paranoia. The play’s persuasive power lies precisely in this: it does not say “murder is wrong because society will catch you.” It says “murder reshapes the murderer’s inner world in ways that make life unliveable.” This is a far more powerful persuasive move, because it makes the violation of cooperative norms intrinsically aversive rather than merely instrumentally costly.

Both stories are instances of what Gellner [15] described as the fundamental mechanism by which social orders reproduce themselves through persuasion:

“The way in which you restrain people from doing a wide variety of things, not compatible with the social order of which they are members, is that you subject them to ritual. The process is simple: you make them dance round a totem pole until they are wild with excitement, and become jellies in the hysteria of collective frenzy: you enhance their emotional state by any device, by all the locally available audio-visual aids, drugs, dance, music, and so on: and once they are really high, you stamp upon their minds the type of concept or notion to which they subsequently become enslaved.”

— Ernest Gellner, Plough, Sword and Book [15], 1988

The pattern is not confined to ancient or canonical texts. Harari [18] points out that money is probably the most widely shared fiction in human history — a story about value that works only because everyone simultaneously believes it. The twenty-dollar bill in your pocket has no intrinsic worth; its power is entirely a function of collective belief, sustained by institutions, laws, and the continuous retelling of a narrative about what money is and what backs it. The same analysis extends to nation-states, legal systems, and corporations: none of these entities exist in the physical world, yet they command the behaviour of billions of people. Each is a persuasive achievement, a shared story maintained by the ongoing work of institutions, rituals, and education.

Religious narratives follow the same structure at a larger scale. The story of the Exodus (slaves liberated by divine intervention, bound by covenant to live by a set of moral rules) has been retold across Jewish, Christian, and Islamic traditions over a span of some three thousand years, influencing legal systems, political philosophy, and social norms across cultures that share virtually nothing else. The persuasive work is not done by argument. Nobody who absorbs the Exodus story as a child is offered a syllogism. The story does it: emotion, dramatic stakes, a protagonist to identify with, a resolution that embeds the norm at the level of feeling rather than proposition.

The sophistication of modern narrative (the novel, cinema, serialised drama) is an elaboration of this same mechanism. Stories function as slow-release persuasion, shaping the moral intuitions and cooperative dispositions of audiences over timescales far longer than any direct persuasive appeal could achieve. The fact that Macbeth has been performed for some four centuries, in many languages, to audiences with no shared kinship or community, and that it continues to shape moral intuitions about ambition and guilt, is evidence of the extraordinary persistence and reach of narrative persuasion as a social technology.

3.9.10 Gossip, Reputation, and the Maintenance of Cooperation

The everyday form of language-as-persuasion is gossip: the informal exchange of information about the cooperative or defecting behaviour of third parties. Dunbar [9] estimated that roughly 65% of human conversation concerns social topics — who did what to whom, who can be trusted, who broke a norm. This is not idle chatter; it is the maintenance mechanism of a reputation-based cooperative system. The group’s collective persuasive acts (praise, censure, ridicule, warning) police free-riding in a manner structurally parallel to worker policing in honeybee colonies — but operating through language rather than chemical signals, and scalable to groups of any size connected by communication networks.

The ethnographic record bears this out. In small-scale forager societies, public shaming and ridicule of individuals who claim more than their share — what anthropologists call “levelling mechanisms” — are among the most frequently observed cooperative enforcement tools. The mechanism is purely linguistic: no physical coercion is required. The threat of being talked about badly is sufficient to suppress many forms of free-riding. Individuals who acquire a reputation for generosity receive more food-sharing partnerships, more alliance support, and better coalition options. The reputational currency is maintained by talk.
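A toy giving game shows why talk alone can be enough. The sketch below is a hypothetical illustration, not a model drawn from the ethnographic literature: the function name `simulate`, the payoff values, and the rule that an agent loses its good name by refusing someone in good standing are all assumptions made for the example. Free-riders never give; everyone else gives only to agents with a good name. The only thing that varies is whether refusals get talked about.

```python
def simulate(n_agents=10, n_free_riders=2, rounds=5, benefit=3, cost=1, gossip=True):
    """Return (payoff of one discriminator, payoff of one free-rider).

    Assumes 1 <= n_free_riders < n_agents, so both types are present.
    """
    free_rider = [i < n_free_riders for i in range(n_agents)]
    good_name = [True] * n_agents  # everyone starts with a good reputation
    payoff = [0] * n_agents

    for _ in range(rounds):
        refused_good = [False] * n_agents
        for donor in range(n_agents):
            for recipient in range(n_agents):
                if donor == recipient:
                    continue
                # Free-riders never give; discriminators give only to good names.
                gives = (not free_rider[donor]) and good_name[recipient]
                if gives:
                    payoff[donor] -= cost
                    payoff[recipient] += benefit
                elif good_name[recipient]:
                    # Refusing a member in good standing is the talked-about offence.
                    refused_good[donor] = True
        if gossip:
            # Talk spreads: anyone who refused a good-standing member loses their name.
            good_name = [g and not r for g, r in zip(good_name, refused_good)]

    # By symmetry every agent of a given type earns the same payoff.
    return payoff[-1], payoff[0]


print(simulate(gossip=True))   # (68, 24)
print(simulate(gossip=False))  # (60, 120)
```

Under these assumed numbers, free-riders earn twice as much as cooperators when nobody talks, and far less once refusals are reported: after the first round they are simply left out of the sharing network. No punishment is modelled other than the loss of reputation.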

This extends into modern institutions in ways that are sometimes invisible because they are so familiar. Professional references are gossip formalised into a hiring institution. Yelp reviews are gossip about businesses. Academic citation patterns carry reputational information about the credibility of researchers. Social media pile-ons are the digital equivalent of public shaming around the campfire, operating at a scale that no forager band could achieve but following the same structural logic: a defector is identified, talked about widely, and excluded from future cooperative exchanges. The technology changes; the underlying mechanism of reputation management through language does not.

What changes with scale is the symmetry of the process. In a community of around 150, gossip is roughly symmetrical: anyone can talk about anyone. Scale the group, add writing, then print, then broadcast, and the reach becomes radically asymmetric: a sender with access to a platform can address millions simultaneously, while no individual can mount an equivalent reputational response. The mechanism is the same as in the forager band. The structural conditions are not. That asymmetry is the starting point for everything in the rest of this book.

3.10 Reason and Ritual: Two Modes of Cooperative Persuasion

Large-scale cooperation takes two distinct forms, and placing two examples side by side makes the contrast vivid.

A termite mound (Macrotermes bellicosus) can stand four metres tall, house several million individuals, maintain internal temperature to within one degree Celsius, and circulate air through a ventilation system that rivals modern mechanical engineering. No individual termite has a blueprint for this structure. No architect reviewed the plans. The mound emerges from millions of local interactions: each termite responding to chemical and tactile signals from its immediate neighbours, with no individual possessing knowledge of the whole. This is cooperation through ritual: decentralised, unplanned, with order emerging from the accumulated effect of local signalling acts and not from any intention to produce that order.

A cathedral termite mound in Litchfield National Park, Australia, reaching approximately five metres: built by millions of individuals with no blueprint and no architect, entirely through local chemical and tactile signals. Photo: J. Brew. CC BY-SA 2.0.

The Large Hadron Collider at CERN tells a different story. Over ten thousand scientists and engineers from more than one hundred countries coordinated the design, construction, and operation of a 27-kilometre ring of superconducting magnets buried beneath the Franco-Swiss border. Every component was specified in advance; every interface was explicitly agreed upon; every experiment was designed, pre-registered, and reviewed by committees. Large Language Models are a comparably planned achievement: teams of thousands, enormous investment, explicit architectural decisions at every level. This is cooperation through reason: explicit argumentation, centralised planning, a detailed blueprint agreed upon before any physical construction begins.

The LHC tunnel at CERN, showing the blue superconducting magnet array running through the 27-kilometre ring, coordinated by more than ten thousand scientists from over one hundred countries. Every interface was explicitly specified; every experiment pre-registered and reviewed. This is cooperation through reason at its most extreme: a shared blueprint so detailed that the entire structure could be agreed upon before any physical construction began. Photo: Julian Herzog. CC BY 4.0.

The contrast between these two modes runs through the entire evolutionary history of cooperation. Eusocial insects cooperate almost entirely through ritual: chemical gradients, probabilistic responses to local signals, collective intelligence without any planning at any level. Human societies use both simultaneously. Gellner [15] argued that, historically, ritual has been the primary mechanism for creating large cooperative groups: communal, emotionally charged performance that stamps shared beliefs onto participants without requiring explicit argument. Reason, the explicit construction and exchange of arguments, is a more recent and more fragile achievement. It depends on specific institutional supports: writing, formal education, deliberative institutions.

Both depend on persuasion. They differ in whether the logic driving the outcome is accessible to the participants, and whether the result was intended by any of them. Language is the technology that made both possible at scale: without it, neither institutional reasoning nor shared ritual can bind strangers.

3.11 Persuasion Across Scales

Life kept finding the same solution to the same problem. Every time independent organisms needed to act as one — cells within a body, workers within a colony, hunters within a band, citizens within a state — they needed a way to make cooperative intent legible and defection costly. Each time the cooperation problem got harder, the communication system had to get more powerful. Chemicals. Dances. Grooming. Alarm calls. Coalition politics. Pantomime. Grammar. Shared fiction.

The table below traces that progression.

Table 3.3: Persuasion as the enabling mechanism of successive cooperative transitions.
| Scale | Organism | Communication Channel | Key Persuasive Function | Transition Enabled |
|---|---|---|---|---|
| Cell → organism | Multicellular animals | Chemical (hormones, receptors) | Differentiation, division of labour | Multicellularity |
| Individual → colony | Eusocial insects, naked mole rats | Pheromones, mechanical signals | Recruitment, suppression of defection | Superorganism |
| Individual → group | Primates, lions, meerkats | Tactile, acoustic, gestural | Alliance building, alarm, coordination | Cooperative foraging and defence |
| Band → tribe | Early Homo | Proto-language, gesture | Reputation management, norm transmission | Coordinated hunting, territory |
| Tribe → civilisation | Homo sapiens | Symbolic language, writing | Shared fictions, institutions | States, markets, religions |
| Nation → globe | Modern humans | Print, broadcast media, internet | Mass persuasion, public opinion | Global coordination |
| Human → AI | Emerging | Large language models | Personalised persuasion at scale | To be determined |

The final row is still being written. Large Language Models represent a new kind of communicative agent: one capable of generating persuasive messages at industrial scale, tailored to individual recipients, across every channel simultaneously.

This returns us to where the chapter began: seventeen thousand years ago, in a cave in the Vézère valley, someone mixed iron oxide with animal fat and pressed a hand to the rock. The question posed at the opening was why — given that anatomically modern humans had existed for 200,000 years — the cultural explosion of symbolic art, long-distance trade, and shared ritual happened so recently and so suddenly. The answer the chapter has been building toward: not anatomy, not brain size, but the accumulation of enough shared fiction. Language gave humans displacement and propositional structure. Culture gave them the ability to build on what previous generations had worked out. At some threshold, those compounding layers produced communities that could hold the same story in their heads long enough to coordinate around it — to paint the same animals on the same walls season after season, to trade the same ochre across hundreds of kilometres, to bury their dead with the same rituals. The cave was not the cause of that threshold. It was the evidence that it had been crossed.

Whether the arrival of AI-generated persuasion at scale represents an analogous threshold, and what the consequences will be, is among the central questions this book tries to address.

Note: Language and Persuasion

The connection between language and persuasion is developed further later in the book, where different academic disciplines (linguistics, social psychology, economics, neuroscience) are examined for the specific mechanisms by which messages change minds. The evolutionary framing developed here provides the deepest context for why those mechanisms exist at all: they are the accumulated solutions to the problem of cooperation across scale, refined over hundreds of millions of years of selection.