Explore the Ideological Landscape
Standard political analysis places voters on a single liberal-conservative axis. The survey data tells a different story. A meaningful share of the 2024 electorate holds views that cut across those lines in ways that defy simple labels.
Some clusters favor abortion rights but also want strict immigration enforcement and voted for Trump. Others support a government jobs guarantee (a traditionally left position) while opposing transgender protections (a traditionally right one). These are not contradictions or errors — they are the actual policy combinations that exist in the electorate, and they matter for understanding who voted how and why.
The chart below maps all 15 clusters across any three policy dimensions simultaneously. Rotating it reveals which clusters align on some axes but split on others. Use the preset views to jump straight to the most revealing combinations. Dot size = population share. Color = party ID (blue = Democratic lean, red = Republican lean). Hover over any dot for vote shares and demographics.
Find Your Cluster
The 2024 US electorate is not just "red" or "blue." A clustering analysis of 49 policy questions from the ANES 2024 survey reveals 15 distinct ideological groups with combinations of views that go far beyond a simple left-right spectrum. Some align with traditional party lines; others cut across them in unexpected ways.
The 10 questions below were selected by machine learning as the strongest predictors of which group someone belongs to. They are not about party identity or ideology directly. Answering them reveals which of the 15 clusters fits your particular mix of policy views. From there, chat with an AI persona representing that cluster, or watch personas from opposing clusters deliberate a political topic in real time.
Answer these 10 questions to discover which ideological cluster you belong to.
Chat with a Cluster Persona
These are simulated, fictional characters representing each cluster's statistical profile. Ask them about political issues.
Each persona's opinions come directly from real survey responses — the AI does not make them up. Before answering, it looks at which of the persona's actual policy positions are relevant to your question, considers how strongly each one applies, and responds as a typical member of that group would — not as an extreme case. It will not soften disagreements or pretend to agree. See how the instructions are written.
View reasoning: below each response you can click "View reasoning" to see the thinking behind it — which positions the AI considered and how it weighed them before answering.
💡 LLM chat provides dynamic, conversational responses powered by Anthropic Claude Sonnet 4.5. Uncheck to use rule-based keyword matching instead.
Deliberation
Select 2 or 3 voter personas and enter a topic. The personas will debate their differences across four structured phases, then a mediator will identify agreements, disagreements, and common ground.
Each response is grounded in the persona's stored survey positions.
Each persona states their view
They challenge each other
They propose middle ground
A mediator synthesizes
Enter a topic above to enable the button.
⏱ Takes about 30–60 seconds. Each response is grounded in the persona's actual survey stances.
About This Project
Last updated: March 24, 2026
Overview
K-means clustering applied to ~50 policy attitude questions from the ANES 2024 Time Series Study reveals 15 distinct ideological groups in the American electorate. The analysis covers how voters combine positions across abortion, immigration, fiscal policy, environment, healthcare, crime, Israel/Palestine, and foreign affairs.
Rather than forcing voters into simple "liberal" or "conservative" boxes, the approach discovers natural groupings based on actual policy combinations. Some clusters align closely with traditional party platforms. Others represent cross-cutting coalitions: voters who support abortion rights but favor strict immigration enforcement, or who want a government jobs guarantee while opposing transgender protections. Each of the 15 clusters is named by what distinguishes it from neighboring clusters, not by its absolute position on a left-right axis.
The site also includes a Deliberation feature: select any two or three personas and a political topic, and an AI-orchestrated structured debate runs automatically across four phases (positions, mutual challenges, compromise proposals, and a mediator synthesis). Every argument is grounded in each persona's actual survey stances, not invented opinions.
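The four-phase flow described above can be sketched as a simple orchestration loop. Everything here is illustrative, not the project's actual code: the function names, the phase prompts, and the `ask_persona` stub (which stands in for an LLM call grounded in the persona's stored survey stances) are all assumptions.

```python
# Hypothetical sketch of the four-phase deliberation loop. Phase names and
# prompts are illustrative; ask_persona is a stub for a stance-grounded LLM call.

PHASES = [
    ("positions", "State your view on: {topic}"),
    ("challenges", "Challenge the other personas' views on: {topic}"),
    ("compromise", "Propose a middle ground on: {topic}"),
]

def ask_persona(persona, prompt, transcript):
    # Placeholder: a real implementation would call the LLM with the
    # persona's survey stances and the transcript so far as context.
    return f"[{persona['name']}] responding to: {prompt}"

def deliberate(personas, topic):
    transcript = []
    for phase, template in PHASES:           # phases 1-3: each persona speaks
        for p in personas:
            reply = ask_persona(p, template.format(topic=topic), transcript)
            transcript.append((phase, p["name"], reply))
    # phase 4: a mediator synthesizes agreements, disagreements, common ground
    summary = ask_persona({"name": "Mediator"},
                          f"Synthesize the debate on: {topic}", transcript)
    transcript.append(("synthesis", "Mediator", summary))
    return transcript

log = deliberate([{"name": "Cluster 3"}, {"name": "Cluster 11"}], "immigration")
```

With two personas this yields six persona turns plus one mediator synthesis; adding a third persona extends each phase by one turn.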
Methodology
- Universe: "Likely voters" (respondents who said they would definitely or probably vote in 2024)
- Features: 49 pre-election core policy variables (V241xxx) spanning abortion/gender, immigration, fiscal policy, environment, healthcare, crime, Israel/Palestine, education, trust in government, and political rights
- Feature Selection: Variables with >25% missing data excluded; all scales preserved in original direction (no flipping)
- Preprocessing: Median imputation for missing values, z-score standardization to ensure equal weighting across different scales
- Distance Metric: Variance-weighted Euclidean distance (features with higher variance receive more weight: weight = √variance)
- Algorithm: K-means clustering with k-means++ initialization, 50 random restarts
- K Selection: K=15 selected from the range [8, 20] using silhouette score with a small penalty on distance from a target of K=15 (a design choice favoring granularity over parsimony; see technical note for details)
- Cluster separation: Silhouette score is 0.045 and stability ARI is 0.54, indicating soft, overlapping clusters. These are best understood as fuzzy ideological prototypes, not hard-edged voter "types"
- Quiz: 10 features selected by Random Forest feature importance to predict cluster membership (61% accuracy vs 79% with all 49 features)
- Persona Stances: Cluster-level means computed directly from survey data; LLM chat uses explicit directional stances to ensure consistency (see example prompt)
- Cluster Names: Each cluster was named using a differential z-score approach. For each cluster, the variables with the largest z-score deviation relative to its nearest neighbors (not absolute position) were identified, then passed to an LLM (Claude Sonnet) to propose a short descriptive label. Names reflect what makes a cluster distinct, not what all clusters share.
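The preprocessing and weighting bullets above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's pipeline: the exact ordering of steps and the `lam` penalty value are illustrative. It uses the fact that the weighted distance d_w(x, y)² = Σ_j w_j (x_j − y_j)² can be reproduced by ordinary Euclidean k-means if each column is rescaled by √w_j.

```python
import numpy as np

def prepare_features(X):
    """Median-impute, z-score, then rescale columns so that plain Euclidean
    k-means reproduces the variance-weighted distance with w_j = sqrt(var_j).
    Assumption: weights come from each column's pre-imputation variance."""
    X = X.astype(float).copy()
    raw_var = np.nanvar(X, axis=0)            # variance before imputation
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]                 # median imputation
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score standardization
    w = np.sqrt(raw_var)                      # stated rule: weight = sqrt(variance)
    return Z * np.sqrt(w)                     # sqrt(w_j) scaling embeds the weights

def k_selection_score(silhouette, k, target=15, lam=0.01):
    """Silhouette with a small penalty toward the target K (the design choice
    noted above); lam is an illustrative value, not the project's."""
    return silhouette - lam * abs(k - target)
```

After `prepare_features`, an off-the-shelf clusterer such as scikit-learn's `KMeans(n_clusters=15, init="k-means++", n_init=50)` can be run on the weighted space directly, matching the algorithm bullet above.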
LLM Validation Experiment
To validate the persona chat approach, a held-out experiment tested whether Claude Sonnet 4.6, using the same prompt framing as the live chat (roleplay as this voter, a 4-step reasoning process, stay true to the data), can predict crime policy positions (urban unrest response, death penalty, federal crime spending) from the remaining 46 policy variables. Four methods were compared against a random-guessing benchmark:
- (A) Cluster-LLM (clean combo): LLM given each cluster's average policy profile with visible chain-of-thought and modal prediction (15 API calls, applied to all 4,670 valid respondents)
- (B) Cluster-LLM (modal only): Same cluster profiles but asking only for the modal response, no CoT (15 API calls)
- (C) Individual-LLM (ideology only): LLM given each respondent's own 46 policy responses (200 respondents)
- (D) Individual-LLM (+ demographics): Same as (C) plus gender, age, education, and race
Results (95% bootstrap CIs on C and D):
All cells show exact-match / within-±1 accuracy; bracketed ranges are 95% bootstrap CIs.

| Question | Random Benchmark | (A) Cluster Clean Combo | (B) Cluster Modal | (C) Indiv. Ideology | (D) Indiv. + Demog. |
|---|---|---|---|---|---|
| Urban Unrest (1-7) | 14% / 37% | 28% / 67% | 36% / 65% | 30.5% / 67% [24-37% / 61-73%] | 32% / 67% [26-39% / 60-74%] |
| Death Penalty (1-4) | 25% / 61% | 47% / 81% | 46% / 72% | 46.5% / 85% [40-53% / 80-90%] | 47.5% / 85% [41-55% / 80-90%] |
| Crime Spending (1-5) | 20% / 51% | 23% / 83% | 26% / 57% | 44.5% / 90% [38-52% / 86-94%] | 41% / 90% [34-49% / 86-94%] |
Takeaway: All four methods beat random guessing. The clean combo cluster-LLM (A) achieves 67-83% within-±1 accuracy, competitive with individual-level methods at a fraction of the cost. Exact-match accuracy tops out around 47%, meaning more than half of individual responses are still wrong. These results confirm that the chat feature captures real ideological signal, but they also show it should not be treated as ground truth for any individual voter.

Scaling individual prediction to the full ANES sample (4,670 respondents) would require ~4,670 API calls, over 300× more expensive than the cluster approach, which needs exactly 15. Demographics add modest value, with overlapping CIs, so the effect is not conclusive. I am currently working on incorporating external datasets (voter files, precinct returns, validated panel data) to improve individual-level prediction accuracy and enrich the chat and deliberation experience.
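The bootstrap evaluation behind the bracketed intervals can be sketched generically. This is not the project's evaluation code; `n_boot` and `seed` are illustrative, and the sketch assumes percentile-bootstrap CIs over resampled respondents.

```python
import random

def bootstrap_ci(preds, truth, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% CIs for exact-match and within-+/-1 accuracy.
    Generic sketch of the evaluation described above, with assumed defaults."""
    rng = random.Random(seed)
    n = len(preds)
    exact_samples, within_samples = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample respondents
        exact_samples.append(sum(preds[i] == truth[i] for i in idx) / n)
        within_samples.append(sum(abs(preds[i] - truth[i]) <= 1 for i in idx) / n)
    def pct_ci(xs):                                  # 2.5th / 97.5th percentiles
        xs = sorted(xs)
        return xs[int(0.025 * n_boot)], xs[int(0.975 * n_boot) - 1]
    return pct_ci(exact_samples), pct_ci(within_samples)
```

On perfectly matching predictions both intervals collapse to [1.0, 1.0]; on real predictions the width reflects the sample size, which is why the (C) and (D) intervals over 200 respondents are several points wide.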
Technical Details
For comprehensive documentation covering the complete analysis pipeline (data universe definition, feature selection criteria, variance weighting methodology, clustering algorithm details, stability analysis, and persona generation), see:
Feedback & Suggestions
This is an ongoing research project. If you have feedback, find errors, or have suggestions for improvement, please reach out via:
- GitHub: Submit an issue
- LinkedIn: Send a message
Data & Code
Data Source: ANES 2024 Time Series Study
Code Repository: GitHub (Python clustering pipeline + static site generator)
License: MIT (code) / ANES data subject to ANES terms of use
Acknowledgments
This project uses data from the American National Election Studies (ANES). The ANES is a collaboration of Stanford University and the University of Michigan, funded by the National Science Foundation.
Disclaimer: This is an independent educational project. Any opinions, findings, and conclusions or recommendations expressed here are those of the author and do not necessarily reflect the views of ANES or the NSF.