5 Resources for Persuasion Research
This chapter catalogues the datasets, benchmarks, simulation environments, and computational tools available to persuasion researchers. Resources are grouped by type: text corpora and argument quality benchmarks; multimodal advertising and emotion datasets; eye-tracking and attention corpora; social media and behavioural datasets; pre-trained models relevant to persuasion tasks; and simulation environments. Each entry includes the resource’s scope, size where known, the primary task it was designed for, and known limitations for persuasion research.
Entries are added as the field evolves. To suggest a resource, open an issue on GitHub.
Computational persuasion research sits at the intersection of natural language processing, computer vision, social psychology, and behavioural economics. Each of those fields has produced its own data infrastructure, and those datasets were rarely designed with persuasion as the primary target. The result is a fragmented resource landscape: strong coverage of some phenomena (argument structure, sentiment, eye movements during reading) and almost none for others (longitudinal attitude change, cross-cultural message effectiveness, behavioural outcomes of AI-generated content). This chapter maps what exists and where the gaps are.
5.1 Text Corpora and Argument Quality
5.1.1 Argument Mining Datasets
What makes a convincing argument [28] is the foundational dataset for computational argument quality research. Habernal and Gurevych collected 16,000 argument pairs from online debate platforms (ConvinceMe.net, iDebate.org, CreatedDebate.com), crowd-annotated for relative convincingness. Each pair presents two arguments on the same topic; annotators judged which was more convincing and why. The corpus also includes 9,111 argumentative sentences annotated for 15 attributes of convincingness, including specificity, emotional appeal, and use of evidence. Limitation: convincingness ratings are from crowdworkers reading in an artificial comparison context, not from people who were actually persuaded to change a position.
Persuasion of the Undecided [45] focuses on a practically important population: individuals who report no strong position on a debate topic. The dataset contains crowd-annotated argument pairs from the same debate platforms, filtered for pairs involving undecided evaluators. Because persuasion of the already-committed is structurally different from persuasion of the undecided, this resource targets a distinct phenomenon.
What makes a convincing argument: Cross-Lingual [6] extends the Habernal and Gurevych framework to German, providing one of the few argument quality datasets outside English. The corpus enables, at least in principle, cross-lingual comparisons of argument strength, though the translational mapping of persuasive norms across languages is itself an open research question.
CMV (Change My View) corpus from Reddit [67] contains 3,461 OP-reply pairs from the r/ChangeMyView subreddit, where the original poster (OP) explicitly asked to have their view challenged. The key annotation is whether the OP subsequently awarded a delta (acknowledging their view was changed). This makes it one of the few large-scale datasets where actual attitude change, rather than rated convincingness, is the outcome variable. Tan and colleagues showed that stylistic features — use of hedges, argument length, lexical diversity, and specific discourse markers — predict delta awards. Limitation: the CMV population is self-selected for willingness to be persuaded, and delta awards measure acknowledgement of a good argument rather than durable belief change.
ChangeMyView Threads [25] extend the basic CMV resource by including full thread structure, enabling analysis of how the social dynamics of a discussion thread (who responds, in what order, with what framing) affect persuasive outcomes beyond argument content alone.
Argument quality annotations [65, 66] from Stab and Gurevych provide a 402-essay corpus with fine-grained annotation of argument components (major claim, claim, premise) and argumentative relations (support, attack). The corpus enables training and evaluation of systems that identify argument structure rather than just argument quality. Limitation: essays were written by students and crowdworkers under controlled conditions; the argument structure of spontaneous persuasive speech or social media is substantially messier.
Debate speech persuasion — the IBM Project Debater datasets [29, 69] provide argumentative essays and structured debate transcripts with quality and persuasiveness annotations. Several of these datasets include listening audience persuasion scores, making them closer to a ground-truth persuasion signal than crowdworker quality ratings.
5.1.3 Language Generation and Quality
BooksCorpus and Common Crawl (used in pre-training BERT [23] and GPT-3 [16]) are not persuasion datasets, but the pre-trained models derived from them are the backbone of most contemporary computational persuasion systems. Understanding what these corpora contain — and what they over-represent — is relevant for understanding the biases in downstream persuasion models.
Social Chemistry 101 [26] contains 292,000 rules of thumb describing social norms (“it’s rude to interrupt someone”) with 4.5 million human judgements on situational variation. This resource enables modelling of norm-based persuasion: arguments that appeal to social expectations rather than factual evidence.
5.2 Multimodal Advertising and Emotion
5.2.1 Advertising Datasets
Hussain et al. (2017) Advertisement Dataset [34] contains 64,832 image advertisements from Ads of the World, annotated for sentiment, topic, and persuasion strategy. It is the largest publicly available image advertising corpus, and the annotations include persuasion strategy labels derived from Aristotle’s three appeals (ethos, pathos, logos), making it directly relevant to classical persuasion theory. Limitation: images only, no video; annotations were produced by a small expert panel.
ADVISE (Symbolism and External Knowledge) [79] provides 3,587 advertisements annotated for symbolic content — the non-literal visual associations that carry persuasive meaning (a car advertisement showing a mountain road evokes freedom, not geography). Symbolic annotation requires cultural knowledge; the dataset reflects primarily Western advertising norms.
Visual Rhetoric in Advertisements [78] annotates a subset of advertisements for rhetorical figures — metaphor, hyperbole, irony — in visual form. A visual metaphor argues analogically through juxtaposition of two unlike images; the dataset enables training systems to recognise this fundamentally persuasive structure.
M2P2: Multimodal Persuasion Prediction [3] pairs image, text, and audio features from advertisements with crowd-rated persuasiveness scores. The dataset covers 535 distinct advertisements across a range of product categories, with ratings from Amazon Mechanical Turk workers. M2P2 is currently the most complete multimodal persuasion benchmark available, enabling systematic comparison of unimodal versus multimodal persuasion models. Limitation: ratings measure predicted persuasiveness rather than actual behaviour change; the rater pool skews towards WEIRD (Western, educated, industrialised, rich, democratic) populations.
Image and Text Persuasion [80] provides experimental data on how the relationship between image and text in an advertisement affects overall persuasive impact. The key finding — that redundant image-text pairs are less persuasive than complementary ones — is supported by the corpus and has direct implications for automated advertisement generation.
Audio Persuasion [64] extends persuasion analysis to the acoustic channel, examining how prosodic features, speaking rate, and vocal affect modulate the persuasive impact of spoken messages.
A Video Is Worth 4096 Tokens [10] addresses the bottleneck of video analysis at scale by verbalising advertisement videos into coherent text narratives using a pipeline of keyframe captioning (BLIP-2), OCR, automatic speech recognition, and brand metadata retrieval, followed by LLM-based story synthesis. The resulting text stories are evaluated on five benchmark datasets across fifteen video understanding tasks, including emotion classification, topic classification, and persuasion strategy identification. The paper also releases the first annotated dataset of persuasion strategies in video advertisements. The zero-shot approach outperforms supervised video understanding baselines on four of the five datasets.
5.2.2 Emotion Datasets
International Affective Picture System (IAPS) [49] contains 1,182 standardised emotional images rated on valence, arousal, and dominance by US undergraduate samples. IAPS is the oldest and most widely cited emotional image database; it has been used as a reference standard for affective computing for over three decades. Limitation: images are dated (many from the 1990s), the sample is exclusively American, and the valence/arousal/dominance model captures only a fraction of emotional complexity.
EmoSet [75] is a large-scale visual emotion dataset containing 3.3 million images annotated with eight discrete emotion categories (amusement, awe, contentment, excitement, anger, disgust, fear, sadness), brightness, colorfulness, scene type, and object presence. The dataset was constructed to enable models that connect low-level visual features to high-level emotional responses, with a scope three orders of magnitude larger than IAPS.
eMotions [73] focuses on short video clips, providing frame-level and clip-level emotion annotations across a large corpus of social media video. For persuasion research, short-form video is increasingly the dominant format for political advertising, public health messaging, and commercial persuasion.
BAM! (Behance Artistic Media) [72] provides semantic and emotional annotations of artistic images, enabling the study of persuasive communication in non-photographic, stylised visual content. It is particularly relevant for branded content and design-driven advertising.
Affect in images and text [47] provides colour and texture feature annotations linked to emotional response, grounding affective computing models in low-level visual properties.
5.2.3 Behaviour-Signal Advertising Datasets
The datasets above primarily annotate content features of persuasive material. A complementary line of work uses real behavioural outcomes — clicks, likes, engagement — as labels, connecting message features to audience response.
Persuasion Strategies in Advertisements [41] provides a taxonomy of 21 persuasion strategies used by brands — appeals to authority, social proof, scarcity, reciprocity, analogical reasoning, among others — together with a large image advertisement dataset annotated with those strategy labels. The annotation scheme is derived from marketing and rhetoric literature and applied to advertisements collected from public platforms. The dataset is used as a ground-truth resource in several downstream works on automatic persuasion strategy identification in video [10].
BoigBench [38] is a benchmark for behaviour-optimised image generation, containing advertisement images paired with real engagement signals (likes, shares, click-through rates) collected from social media. It supports evaluation of generative models on the task of producing images predicted to drive higher user engagement. Two baseline models accompany the benchmark: BoigLLM, which conditions a language model on engagement history to select among candidate images, and BoigSD, a Stable Diffusion variant fine-tuned with engagement as the reward signal.
ALPHA / ALPHA50M [11] aligns LLMs to advertisement engagement data. ALPHA is trained on engagement signals — likes, comments, shares — collected from real social media ad campaigns, producing a model that predicts and generates content optimised for behavioural response rather than human preference ratings. ALPHA50M extends this to 50 million ad-engagement pairs. The approach introduces behavioural alignment as distinct from standard RLHF: the reward signal comes from observed audience behaviour rather than human annotator preferences.
SPRO (Self-Play Reward Optimisation) [36] applies self-play reinforcement learning to diffusion-based image generation, using ad engagement data as the reward. The method alternates between generating candidate images and selecting higher-performing outputs via a reward model trained on behavioural data, progressively improving the engagement profile of generated advertising images over standard supervised fine-tuning baselines.
MEMENTO [50] uses web-scale data (web pages, their associated advertisements, and implicit engagement signals from web interactions) as a learning signal for low-data advertising domains. The approach does not require manually annotated training sets for each new domain; instead, the model learns domain-appropriate content features from the distribution of web content encountered in natural browsing. It is particularly relevant for advertisers in specialised verticals where labelled ad-engagement data is scarce.
5.3 Eye-Tracking and Attention
Eye movements during reading and scene viewing provide a window into cognitive processing that self-reports cannot. For persuasion research, eye-tracking data reveals what receivers actually attend to, distinct from what they report attending to.
5.3.1 Reading Corpora
Dundee Corpus contains eye movements from 10 participants reading 20 newspaper articles (~51,000 words), recorded with full-paragraph context. It is a standard benchmark for models predicting reading time as a function of linguistic features (frequency, surprisal, syntactic complexity). Reading time is a proxy for processing effort, which in turn predicts encoding depth.
Provo Corpus [46] provides eye-tracking and self-paced reading data for 55 texts (~2,700 words in total), with cloze probability norms for each word position. Its combination of oculomotor data and predictability norms makes it the richest available resource for studying how expectation violation (a core mechanism of persuasive surprise) affects processing.
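The link between cloze norms and expectation violation is commonly made through surprisal, the negative log probability of a word given its context. The following is a minimal sketch of that conversion, assuming cloze probabilities are available as plain floats; the example words, their probabilities, and the smoothing floor are hypothetical and are not taken from the Provo data.

```python
import math


def surprisal_bits(cloze_probability: float, floor: float = 1e-3) -> float:
    """Surprisal in bits: -log2(p).

    Cloze probabilities of zero are raised to `floor` (an assumed
    smoothing choice) so that the result stays finite.
    """
    if not 0.0 <= cloze_probability <= 1.0:
        raise ValueError("cloze probability must lie in [0, 1]")
    return -math.log2(max(cloze_probability, floor))


# Hypothetical cloze norms for three word positions in one sentence.
cloze = {"the": 0.9, "doctor": 0.5, "juggled": 0.0}
surprisals = {word: surprisal_bits(p) for word, p in cloze.items()}
```

Words that no cloze respondent produced receive the largest surprisal, which is the quantity a reading-time model would then relate to fixation durations.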
CELER [9] contains eye-tracking data from 365 participants, comprising both native English speakers and L2 English learners, reading 5,000 sentences. Its scale enables statistical modelling of individual differences in reading, which is relevant for personalised persuasion research that needs to account for varying processing depths across recipients.
CMCL Shared Task Corpora [32, 33] provide eye-tracking data integrated with NLP benchmarks, enabling direct comparison of human reading patterns with model attention patterns.
5.3.2 Scene Viewing and Advertisement Attention
Human Attention in Image Captioning [30] links eye-tracking data during image captioning to the saliency maps produced by neural captioning models. For persuasion, the dataset enables analysis of whether model attention aligns with human attention during the processing of complex visual scenes.
Scanpath datasets: ScanGAN360 [48], ScanpathNet [8], and ScanPathApp1 provide scanpath data (the sequence of fixations over time) recorded during scene viewing, enabling models that predict not just where people look but in what order and for how long. Scanpath prediction is relevant to understanding the temporal unfolding of attention during advertisement processing.
Eye-tracking in NLP benchmarks — the ZuCo [31] and related datasets provide simultaneous EEG and eye-tracking during reading of sentences with various complexity levels. These enable study of neural processing correlates of linguistic difficulty, with implications for message design.
Gaze embeddings for zero-shot classification — Karessli et al. [37] demonstrate that eye-tracking patterns during image viewing carry semantic information sufficient for zero-shot object recognition — evidence that gaze data encodes meaningful representations of visual attention that go beyond simple saliency.
5.5 Pre-Trained Models and APIs
The following models are not datasets but are the primary computational infrastructure for persuasion research.
5.5.1 Language Models
GPT-3 / GPT-4 [16, 51] — the GPT family from OpenAI is currently the most widely used infrastructure for LLM-based persuasion experiments, simulation studies, and automated message generation. GPT-4 specifically has been used by [15] to predict persuasion experiment outcomes and by multiple groups to generate personalised persuasive messages for controlled experiments. Accessible via the OpenAI API.
LLaMA / Vicuna [18, 68] — open-weight alternatives to the GPT family, enabling local deployment and fine-tuning on domain-specific persuasion corpora without API dependency. The LLaMA-family models have been used in persuasion research where full access to model internals (activations, logit distributions) is required.
BERT [23] — the standard encoder model for classification tasks in NLP, including argument quality classification, stance detection, and persuasion strategy identification. Fine-tuned BERT variants remain competitive on most persuasion classification benchmarks.
T5 / Flan-T5 [19, 58] — the text-to-text framework enables unified treatment of persuasion classification and generation tasks. Instruction-tuned Flan-T5 models perform well on argument quality benchmarks without task-specific fine-tuning.
LCBM (Large Content and Behavior Models) [42] — a framework proposed specifically for modelling the relationship between content and downstream behavioural response. Unlike the general-purpose models above, LCBM explicitly targets the link between message features and audience behaviour, making it the closest currently available model to a persuasion-specific foundation model.
LaMP (Language Model Personalization benchmark) [59] provides seven personalised NLP tasks spanning classification (citation identification, news categorisation, product rating) and generation (headline writing, scholarly title generation, email subject generation, tweet paraphrasing). Each task pairs task inputs with a user profile — that user’s prior outputs — and evaluates how well LLMs adapt to individual style and preference. LaMP is the current standard benchmark for personalised generation, using retrieval augmentation as the personalisation mechanism and reporting user-based and time-based data splits.
FSPO (Few-Shot Preference Optimization) [61] frames LLM personalisation as a meta-learning problem: given a small set of labelled preference pairs from a new user, the model rapidly constructs a personalised reward function and generates responses aligned with that user’s preferences. FSPO is trained on over one million synthetic personalised preferences spanning three domains and achieves strong performance on held-out user evaluation. It addresses a structural weakness of standard RLHF — that aggregating preferences across all users loses individual variation — which is directly relevant to personalised persuasive message generation.
Transsuasion / PersuasionBench [63] introduces the task of transsuasion: given a low-performing tweet, rewrite it to achieve the engagement level of a high-performing semantically equivalent tweet by the same author, transferring persuasive impact while preserving content. PersuasionBench pairs tweets written by the same user on the same topic where one version received significantly more engagement. The companion PersuasionArena evaluates LLMs on this and related persuasiveness tasks, demonstrating that persuasive ability scales with model size and that smaller LLMs can be brought to parity with larger ones through targeted fine-tuning.
Behavior-LLaVA [62] fine-tunes a vision-language model on human behavioural signals — video replay graphs, likes, and comments — rather than on human-annotated captions or preference labels. Training on behavioural outcomes teaches the model which visual and semantic features drive viewer engagement, producing representations that transfer to downstream content understanding tasks (emotion recognition, persuasion strategy detection) with improved accuracy over caption-supervised baselines.
5.5.2 Vision-Language Models
BLIP-2 [43] supports natural-language querying of image content, enabling structured annotation of visual persuasive elements (what is shown, what emotion is evoked, what claim is implied) at scale. It is relevant for large-scale analysis of advertising corpora.
VideoChat [44] extends BLIP-style vision-language interaction to video, enabling natural-language querying of video advertisement content. This is relevant for the emerging literature on video persuasion, where manual annotation at scale has been the bottleneck.
Segment Anything (SAM) [39] and Track Anything [76] provide universal segmentation for images and video, enabling identification of objects, faces, and text regions within advertising content without task-specific training.
5.5.3 Multimodal Processing Tools
PySceneDetect [14] — a Python library for automated detection of scene cuts and transitions in video. For persuasion research, scene-cut detection is a prerequisite for analysing video advertisement structure (number of cuts per second is a standard measure of production intensity linked to emotional engagement).
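The cuts-per-second measure can be computed from any list of scene boundaries. The sketch below assumes the boundaries have already been exported as (start, end) times in seconds, for instance from a scene detector; the function name and the example timings are hypothetical.

```python
def cuts_per_second(scenes: list[tuple[float, float]]) -> float:
    """Number of cuts divided by total duration.

    `scenes` is an ordered list of (start_s, end_s) pairs. A video with
    N scenes contains N - 1 cuts.
    """
    if not scenes:
        raise ValueError("need at least one scene")
    duration = scenes[-1][1] - scenes[0][0]
    if duration <= 0:
        raise ValueError("video duration must be positive")
    return (len(scenes) - 1) / duration


# Hypothetical 30-second advertisement with four scenes (three cuts).
ad_scenes = [(0.0, 5.0), (5.0, 12.0), (12.0, 20.0), (20.0, 30.0)]
rate = cuts_per_second(ad_scenes)
```

Keeping the metric separate from the detector makes it straightforward to compare cut rates across advertisements processed with different detection settings.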
GMFlow [74] — optical flow estimation for video, enabling quantification of motion dynamics within scenes. High optical flow correlates with arousal and attention capture in advertising research.
PP-OCR [24] — lightweight text detection and recognition for images and video. Advertising content frequently embeds text overlays; OCR enables this text to be extracted and analysed alongside the visual channel.
U2-Net [57] — salient object detection, identifying the primary visual subject of an image. Saliency and persuasive intent are related: advertisements are designed to direct attention to specific elements.
5.6 Simulation Environments
5.6.1 LLM-Based Population Simulation
Silicon sampling [1] — rather than a packaged tool, this is a methodology: conditioning LLMs on demographic profiles and using the resulting outputs as synthetic survey responses. Argyle and colleagues demonstrate the approach using GPT-3 conditioned on ANES respondent profiles, validating against real survey distributions. The methodology is available for replication; the key parameter choices (conditioning prompt structure, model temperature, validation procedure) are documented in the paper.
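The general shape of the methodology can be sketched without committing to a particular model. The example below is an illustration under stated assumptions: the prompt template, the profile fields, and the use of total variation distance as the validation measure are all hypothetical choices, not the procedure documented in [1], and the model call itself is replaced by placeholder responses.

```python
from collections import Counter


def conditioning_prompt(profile: dict[str, str], question: str) -> str:
    """First-person backstory followed by the survey item (hypothetical template)."""
    backstory = " ".join(f"My {key} is {value}." for key, value in profile.items())
    return f"{backstory} {question}"


def distribution(responses: list[str]) -> dict[str, float]:
    """Relative frequency of each response option."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {option: n / total for option, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two response distributions (0 = identical)."""
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


prompt = conditioning_prompt({"age": "45", "state": "Ohio"}, "In 2016 I voted for")
synthetic = ["A", "A", "B", "A"]  # placeholder for sampled model completions
human = ["A", "B", "B", "A"]      # placeholder for real survey answers
gap = total_variation(distribution(synthetic), distribution(human))
```

In practice the comparison would be run per demographic subgroup, since a small aggregate gap can conceal large subgroup-level mismatches.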
Generative Agents [53] — a simulation environment in which 25 LLM-powered agents live in a small-town setting, with persistent memory, daily schedules, and social relationships. Agents produce emergent collective behaviours (organising a party, running an election) without explicit programming. For persuasion research, the environment enables study of how information spreads through a social network, how agents update beliefs based on peer communication, and how persuasive interventions propagate through a simulated community. Code is open-source on GitHub.
OASIS [77] is a scalable open-source social media simulator supporting up to one million LLM-based agents, designed to replicate the dynamics of platforms such as X (formerly Twitter) and Reddit. Agents have persistent profiles, a dynamically updated information environment, and diverse action spaces (post, like, share, follow). The simulator has been used to study information spreading, group polarisation, and herd effects at a scale impossible with human participants. For persuasion research, OASIS provides the closest available approximation to a full social media information environment: a controlled setting where the researcher can inject a message, observe cascade dynamics, and measure attitude shifts across a large synthetic population.
Social Agents: Collective Intelligence [12] — a multi-agent framework in which a diverse population of LLM personas, each instantiated with systematically varied demographic and psychographic profiles, independently responds to a stimulus (advertisement, message, policy proposal), with aggregate predictions outperforming any single model on behavioural outcome tasks. As a simulation environment for persuasion, it functions as a heterogeneous audience simulator: a sender can test a message against a synthetic population before deployment, observing how predicted engagement and attitude-shift vary across demographic subgroups.
AI Psychometrics simulation [54] — Pellert and colleagues demonstrate that standard psychological inventories (Big Five, moral foundations, social value orientation) can be administered to LLMs, producing stable psychological profiles that vary by model and by prompt conditioning. The framework enables systematic exploration of how LLM “personality” interacts with persuasive message design.
Pressman et al. simulacra [56] — a framework for using LLMs as proxies for specific demographic groups in persuasion experiments, with explicit attention to the validity conditions under which LLM responses correspond to human responses.
5.6.2 Agent-Based Models
Axelrod’s tournament [2] — the original computer tournament for iterated Prisoner’s Dilemma strategies is reproducible with standard software (Python axelrod library). While not a persuasion environment per se, it is the foundational simulation environment for studying the evolution of cooperative signalling, the theoretical basis of honest communication.
Agent-based social simulation [5] — the broader class of agent-based models (ABMs) used in computational social science, implemented in platforms such as NetLogo and Mesa (Python). For persuasion, ABMs enable study of opinion dynamics, information cascade, and norm propagation in networks with controlled structure. Standard models include the bounded confidence model and the DeGroot model of opinion averaging.
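Both standard models reduce to a few lines of code. The sketch below uses plain Python rather than NetLogo or Mesa; the three-agent trust matrix, the initial opinions, and the confidence threshold are hypothetical values chosen for illustration, and the bounded confidence update shown is the synchronous Hegselmann-Krause variant.

```python
def degroot_step(opinions: list[float], weights: list[list[float]]) -> list[float]:
    """One DeGroot update: each agent adopts a weighted average of all opinions.

    `weights` must be row-stochastic (each row sums to 1).
    """
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def bounded_confidence_step(opinions: list[float], epsilon: float) -> list[float]:
    """One synchronous Hegselmann-Krause update.

    Each agent averages only the opinions within `epsilon` of its own;
    its own opinion always qualifies.
    """
    updated = []
    for own in opinions:
        peers = [x for x in opinions if abs(x - own) <= epsilon]
        updated.append(sum(peers) / len(peers))
    return updated


# Hypothetical three-agent network in which the middle agent listens to both neighbours.
weights = [[0.5, 0.5, 0.0],
           [0.25, 0.5, 0.25],
           [0.0, 0.5, 0.5]]
opinions = [0.0, 0.5, 1.0]
for _ in range(50):
    opinions = degroot_step(opinions, weights)

# Two opinion camps further apart than epsilon never influence each other.
clusters = bounded_confidence_step([0.0, 0.1, 0.9, 1.0], epsilon=0.2)
```

The contrast is the point of interest for persuasion: under DeGroot averaging on a connected network the population converges to consensus, whereas under bounded confidence a message far from a recipient's current position has no effect at all.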
5.7 Evaluation Frameworks and Metrics
5.7.1 Argument Quality Metrics
Standard NLP evaluation metrics — BLEU [52], METEOR [4], and BERTScore — are used for argument generation evaluation but are poorly calibrated to persuasive quality. A generated argument can be fluent and similar to a reference argument while being entirely unpersuasive, and vice versa. Habernal and Gurevych’s convincingness rankings [28] and the delta rate on CMV [67] are the most widely used task-specific metrics for argument persuasiveness.
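The calibration problem is easy to reproduce with a toy overlap metric. The sketch below implements clipped unigram precision, a simplified stand-in for the 1-gram component of BLEU (no brevity penalty, no higher-order n-grams); the three sentences are hypothetical examples written for this illustration.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: share of candidate tokens found in the reference."""
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    if not cand:
        return 0.0
    matched = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    return matched / len(cand)


reference = "remote work improves productivity because employees avoid long commutes"
opposite = "remote work harms productivity because employees avoid long commutes"
paraphrase = "staff get more done at home without daily travel"

score_opposite = unigram_precision(opposite, reference)
score_paraphrase = unigram_precision(paraphrase, reference)
```

The candidate arguing the opposite position scores far higher than the faithful paraphrase, which illustrates why surface overlap with a reference says little about argumentative content or persuasive force.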
5.7.2 Dialogue and Persuasion Metrics
LLM-as-judge [81] — using a strong LLM (GPT-4) to rate the quality of outputs from weaker models has become a standard evaluation approach. For persuasion, LLM judges can assess message fluency, coherence, and apparent persuasive intent, but there is no validated evidence that LLM-rated persuasiveness correlates with actual human attitude change.
Human preference ratings [17] — pairwise preference rating by human annotators is the most valid available proxy for persuasive effectiveness, short of measuring actual attitude or behaviour change. The Chatbot Arena methodology is the current standard for large-scale human preference evaluation of LLM outputs.
5.7.3 Long-Term Memorability
Video memorability [20, 35] datasets annotate videos for long-term memory retention: how likely is a viewer to remember this clip at a 1-week delay? Memorability is related to, but distinct from, persuasiveness: a memorable advertisement is more likely to influence future purchase decisions, but a highly memorable message may be memorable precisely because it is surprising or disturbing rather than because it changed a belief.
LAMBDA (Long-term Ad Memorability Dataset) [60] provides long-term memorability scores for 2,205 multimodal advertisements from 276 brands, collected from 1,749 participants in two-stage sessions with at least a one-day gap between exposure and recall. The dataset distinguishes brand recall from ad recognition, separating short-term recognisability from the long-term memory traces relevant to purchase-funnel effects. The accompanying model, Henry, integrates visual, cognitive, and world-knowledge representations to predict memorability. A companion dataset, UltraLAMBDA, scales to 5 million ads with automatically assigned memorability scores.
ToT2Mem [13] collects memorability signal at scale from Tip-of-the-Tongue retrieval queries on Reddit: posts where users describe content they half-remember but cannot identify. Over 470,000 content-recall pairs spanning multiple modalities are extracted from this unsupervised source, removing the scalability bottleneck of laboratory memorability studies. ToT2Mem-Video is an 82,500-pair video-recall subset. Fine-tuned VLMs on this dataset outperform GPT-4o on descriptive recall generation.
5.7.4 Behaviour-Based Engagement Metrics
Click-through rate (CTR) and engagement rate are the primary outcome variables in digital advertising research. CTR measures the fraction of content impressions resulting in a user click; engagement rate aggregates downstream interactions (likes, shares, comments, saves). CREATER [71] demonstrates a contrastive learning approach that trains content generation models directly on A/B test data, using CTR differences between content variants as the training signal rather than human preference labels. The approach treats behavioural A/B data as implicit pairwise preference data and constructs a loss that pushes the model toward generating content similar to high-CTR variants and away from low-CTR variants. For persuasion research, CREATER introduces a blueprint for converting the vast stores of platform A/B test data into training signal for persuasive content generation.
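The conversion from A/B test records to pairwise training signal can be sketched as follows. This is an illustration only: the `Variant` class, the example copy and counts, and the logistic pairwise loss are assumptions made here, and the loss is a generic stand-in rather than the contrastive objective used in CREATER [71].

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    text: str
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks divided by impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0


def to_preference_pair(a: Variant, b: Variant, min_gap: float = 0.0):
    """Return (preferred, rejected) by CTR, or None if the gap is at most min_gap."""
    if abs(a.ctr - b.ctr) <= min_gap:
        return None
    return (a, b) if a.ctr > b.ctr else (b, a)


def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    """Logistic pairwise loss: small when the high-CTR variant is scored higher."""
    return math.log1p(math.exp(-(score_preferred - score_rejected)))


# Hypothetical A/B test with two copy variants.
variant_a = Variant("Free shipping today", impressions=10_000, clicks=320)
variant_b = Variant("Shop our new range", impressions=10_000, clicks=250)
pair = to_preference_pair(variant_a, variant_b)
```

A `min_gap` threshold above zero would discard pairs whose CTR difference is too small to distinguish from noise; a significance test on the two click counts would be a more principled filter.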
5.8 Summary: Coverage and Gaps
The table below maps available resources against the major research questions in computational persuasion. A tick indicates at least one well-validated resource; a dash indicates a significant gap.
| Research question | Available resources | Gap |
|---|---|---|
| Argument quality (text) | CMV, Habernal, Stab | ✓ adequate for English |
| Attitude change (real-world) | CMV deltas, GOTV field exps | Gap: no large-scale randomised dataset |
| Multimodal persuasion | M2P2, Hussain ads, IAPS | Gap: no behavioural outcomes |
| Longitudinal effects | ANES (coarse), deep canvassing | Gap: no content-linked panel |
| Cross-cultural | Habernal cross-lingual (German), CELER L2 | Gap: no matched cross-cultural benchmark |
| LLM-mediated persuasion | Chatbot Arena, LaMP, FSPO, PersuasionBench | PersuasionBench nascent; no cross-platform benchmark |
| Simulation validity | Argyle, Pellert, Park, Social Agents | Gap: validation against real behaviour |
| Detection of AI content | — | Gap: no large labelled corpus |
| Video persuasion | M2P2, eMotions, LAMBDA, BoigBench, A Video Is Worth 4096 Tokens | Partial: engagement signals available; no attitude-change measurement |
| Eye-tracking + persuasion | ZuCo, Provo, CELER | Gap: ad-specific attention corpora |
The two most consequential gaps are the absence of a large-scale dataset linking persuasive content to real behavioural outcomes, and the absence of a provenance-labelled corpus of AI-generated persuasive content. Both are prerequisites for the research frontiers described in ?sec-measurement and Section 6.5 of Chapter 7.
5.1.2 Social Media and Propagation
Reddit corpus [7] via Pushshift provides a comprehensive snapshot of Reddit (posts, comments, scores, metadata) across all subreddits from Reddit’s founding through the collection date. At petabyte scale, this is the largest publicly available corpus of human discussion online. For persuasion research it enables analysis of upvote dynamics, comment framing, community norm transmission, and longitudinal linguistic patterns across communities. Limitation: Reddit’s demographic skew (predominantly male, English-speaking, Western) limits generalisability.
Enron email corpus [40] contains approximately 600,000 emails from 158 employees of the Enron Corporation, including internal communications that span normal business correspondence, persuasive exchanges, and crisis communication under legal and commercial pressure. It remains a standard benchmark for email classification and, more broadly, for studying persuasion in professional contexts. Limitation: data is from a single organisation under extraordinary circumstances (a corporate fraud case); the communication patterns may not generalise.
Twitter retweet prediction [55, 70] datasets link tweet content and social features to propagation outcomes. Retweet count is a noisy proxy for persuasiveness (content spreads for reasons unrelated to belief change), but propagation datasets are among the few resources that connect message content to real-world dissemination behaviour at scale.
ChangeMyView with linguistic cues [21, 22] — Danescu-Niculescu-Mizil and colleagues studied how linguistic accommodation and memorable phrasing predict conversational outcomes. These corpora, while not exclusively about persuasion, are standard references for the study of how language form (not just content) shapes social influence.