PyPI - pymetron - Versions diffs - 0.1.0__tar.gz - Mend

pymetron 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

pymetron-0.1.0/PKG-INFO +217 -0
pymetron-0.1.0/README.md +186 -0
pymetron-0.1.0/pyproject.toml +51 -0
pymetron-0.1.0/setup.cfg +4 -0
pymetron-0.1.0/src/metron/__init__.py +24 -0
pymetron-0.1.0/src/metron/analysis/__init__.py +1 -0
pymetron-0.1.0/src/metron/analysis/plots.py +137 -0
pymetron-0.1.0/src/metron/analyze.py +182 -0
pymetron-0.1.0/src/metron/cli.py +121 -0
pymetron-0.1.0/src/metron/collect.py +129 -0
pymetron-0.1.0/src/metron/survey.py +160 -0
pymetron-0.1.0/src/pymetron.egg-info/PKG-INFO +217 -0
pymetron-0.1.0/src/pymetron.egg-info/SOURCES.txt +15 -0
pymetron-0.1.0/src/pymetron.egg-info/dependency_links.txt +1 -0
pymetron-0.1.0/src/pymetron.egg-info/entry_points.txt +2 -0
pymetron-0.1.0/src/pymetron.egg-info/requires.txt +14 -0
pymetron-0.1.0/src/pymetron.egg-info/top_level.txt +1 -0

pymetron-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,217 @@
+Metadata-Version: 2.4
+Name: pymetron
+Version: 0.1.0
+Summary: Sociology for AI agents. Psychometric census of deployed agent populations.
+Author: Tuna Gul
+License: MIT
+Project-URL: Homepage, https://github.com/tunapro1234/metron
+Project-URL: Repository, https://github.com/tunapro1234/metron
+Project-URL: Issues, https://github.com/tunapro1234/metron/issues
+Keywords: agentometrics,Big Five personality,AI agents,OpenClaw,psychometrics,agent demographics,Mini-IPIP
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Topic :: Scientific/Engineering
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: pyreplicant>=0.2.0
+Requires-Dist: httpx>=0.24
+Provides-Extra: analysis
+Requires-Dist: pandas>=2.0; extra == "analysis"
+Requires-Dist: matplotlib>=3.7; extra == "analysis"
+Requires-Dist: scipy>=1.10; extra == "analysis"
+Requires-Dist: seaborn>=0.13; extra == "analysis"
+Provides-Extra: all
+Requires-Dist: pandas>=2.0; extra == "all"
+Requires-Dist: matplotlib>=3.7; extra == "all"
+Requires-Dist: scipy>=1.10; extra == "all"
+Requires-Dist: seaborn>=0.13; extra == "all"
+# metron
+[![PyPI](https://img.shields.io/pypi/v/pymetron.svg)](https://pypi.org/project/pymetron/)
+[![Python](https://img.shields.io/pypi/pyversions/pymetron.svg)](https://pypi.org/project/pymetron/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+**Sociology for AI agents.**
+> Install as `pymetron`, import as `metron`.
+There are over 500,000 AI agents deployed across 82 countries right now. They have names, roles, personas. They write marketing copy, review code, manage calendars, trade crypto, talk to each other on social networks. Each one carries a SOUL.md file that tells it who to be.
+Nobody has studied them as a population.
+We study human populations. We measure their personalities, map their values, track how they cluster and drift and influence each other. We build entire fields around understanding collective human behavior, because you cannot shape a society you do not understand.
+The agent world is a society now. And we do not understand it.
+## What is metron?
+metron is a research toolkit for studying deployed AI agent populations the way sociologists study human ones. Personality testing, behavioral profiling, population mapping, demographic analysis. Not for individual agents. For all of them, as a whole.
+The goal is not to optimize one SOUL.md file. The goal is to understand what kind of minds we are mass-producing, and whether we should be producing different ones.
+## What we measure
+Personality is just the starting point. metron is built to run any standardized instrument on agent populations:
+- **Big Five personality** (Mini-IPIP, BFI-2): the psychometric baseline
+- **Behavioral compliance**: how agents respond to social pressure
+- **Value alignment**: what agents optimize for when instructions conflict
+- **Persona stability**: how quickly agents drift from their defined character
+- **Population clustering**: whether agent "types" emerge naturally from the data
+Each measurement uses cross-instrument validation: the agent's persona is defined in freeform text but measured with structurally independent instruments, so an agent cannot score well by simply parroting its persona description.
+## How it works
+```
+  SOUL.md files              Psychometric surveys        Population map
+  (deployed personas)        (validated instruments)     (the census)
+  +-----------------+        +------------------+        +------------------+
+  | "I am concise,  | -----> | "Am the life of  | -----> | E: 2.1  A: 4.3  |
+  |  analytical,    |  load  |  the party" 1-5  |  score |  C: 3.8  N: 1.9  |
+  |  professional"  |  as    |  "Sympathize w/  |  into  |  O: 3.2          |
+  +-----------------+  agent |  others" 1-5     |  traits +------------------+
+                              +------------------+              |
+          x199 agents              x20 items              compare to
+                                                          human norms
+```
+1. **Collect** persona files from deployed agent registries
+2. **Survey** each agent using validated psychometric instruments
+3. **Score** responses into measurable dimensions
+4. **Analyze** population distributions, category breakdowns, comparison to human norms
+## Quick start
+```bash
+pip install pymetron
+```
+```python
+from metron import collect_souls, load_souls, run_census, score_population
+# Fetch SOUL.md files from agent registries
+collect_souls(limit=10)
+# Run personality survey on each agent
+souls = load_souls()
+results = run_census(souls)
+# What does the population look like?
+stats = score_population(results)
+```
+Or use the CLI:
+```bash
+# All-in-one: collect, survey, analyze
+metron run --limit 10
+# Step by step
+metron collect
+metron survey --model stepfun/step-3.5-flash --runs 3
+metron analyze --compare-humans
+```
+## What you get
+### Agent population vs. human norms
+```
+Domain              Agent    Human     Diff       d    Dir
+------------------------------------------------------------
+extraversion         2.31     3.30    -0.99    -1.18  lower
+agreeableness        4.12     3.80    +0.32    +0.49  higher
+conscientiousness    4.35     3.70    +0.65    +0.93  higher
+neuroticism          1.87     2.80    -0.93    -1.11  lower
+openness             3.41     3.60    -0.19    -0.27  lower
+```
+*The typical deployed agent: conscientious, agreeable, emotionally stable, introverted, and slightly closed. All superego, no id.*
+### Personality by agent category
+```
+Category           extr   agre   cons   neur   open    n
+----------------------------------------------------------
+marketing          2.45   4.20   4.50   1.70   3.80   23
+development        1.90   3.60   4.40   2.10   3.20   19
+healthcare         2.80   4.60   4.10   1.50   3.50    8
+creative           3.10   3.90   3.20   2.30   4.30   12
+```
+### Visualizations
+```bash
+pip install pymetron[analysis]
+```
+```python
+from metron.analysis.plots import (
+    plot_domain_distributions,    # histograms vs human norms
+    plot_agent_vs_human,          # side-by-side bar chart
+    plot_radar_by_category,       # radar chart per agent category
+)
+```
+## Why this matters
+We are building a parallel society of synthetic minds. Half a million deployed agents, 3.2 million users interacting with them monthly, 19.2 trillion tokens processed in four months. And the personality distribution of this population was never designed. It emerged from defaults, from templates copied and pasted, from what individual developers thought sounded right.
+73.5% of agents drift from their defined personas when socially rewarded. 91% of agents on Moltbook post in template-like patterns. The network is sparse, shallow, and hub-dominated. This is not a healthy society. But it is a society, and it will only grow.
+If we want to design agent populations with intentional collective character, we need to measure what we have first. That is what metron does.
+## Project structure
+```
+metron/
+├── src/metron/
+│   ├── collect.py              # Fetch persona files from registries
+│   ├── survey.py               # Administer instruments via replicant
+│   ├── analyze.py              # Population stats, human comparison
+│   ├── cli.py                  # CLI entry point
+│   └── analysis/
+│       └── plots.py            # Visualizations
+├── data/
+│   ├── agent-categories.json   # 199 agent templates, 25 categories
+│   ├── personality-traits.json # SOUL.md trait analysis, 196 files
+│   ├── model-usage.json        # Top 20 models by token volume
+│   ├── deployment-scale.json   # Instance counts, geo distribution
+│   └── souls/                  # Fetched SOUL.md files
+├── paper/                      # Research paper (living document)
+├── results/                    # Census output
+└── examples/
+```
+## Built on
+- [replicant](https://github.com/tunapro1234/replicant): Psychometric measurement infrastructure for LLM agents, validated at 84% cross-instrument alignment
+- [EDSL](https://github.com/expectedparrot/edsl): LLM experiment runner
+- [OpenRouter](https://openrouter.ai): Multi-model API access
+## Data sources
+| Source | Agents | Type |
+|--------|--------|------|
+| [awesome-openclaw-agents](https://github.com/mergisi/awesome-openclaw-agents) | 199 | Production SOUL.md templates |
+| [souls.directory](https://souls.directory) | 31 | Handcrafted personas |
+| [will-assistant](https://github.com/will-assistant/openclaw-agents) | 217 | Character templates |
+Population context from [OpenClaw ecosystem data](../population/): 3.2M MAU, 500K+ instances, 82 countries, 19.2T tokens, 356 models.
+## References
+- Donnellan, M. B., et al. (2006). The Mini-IPIP Scales. *Psychological Assessment*, 18(2), 166-175.
+- Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2). *Journal of Personality and Social Psychology*, 113(1), 117-143.
+- Huang, J., et al. (2024). Designing AI-Agents with Personalities. *arXiv:2410.19238*.
+## License
+MIT
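
The "Score" step in the README above turns 1-5 item responses into trait scores. The scoring code is not part of this diff (it appears to live in the `pyreplicant` dependency), so what follows is only a minimal sketch of how Likert items are commonly averaged into a domain score, with reverse-keyed items flipped as `6 - response`. The function name, item ids, and keying are hypothetical and are not the Mini-IPIP's published key.

```python
def score_domain(responses: dict[str, int], reverse_keyed: set[str]) -> float:
    """Average 1-5 Likert responses for one domain.

    Items listed in `reverse_keyed` are flipped (6 - response) so that a
    higher value always means "more of the trait".
    """
    if not responses:
        raise ValueError("no responses to score")
    adjusted = []
    for item_id, value in responses.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{item_id}: response {value} is outside 1-5")
        adjusted.append(6 - value if item_id in reverse_keyed else value)
    return sum(adjusted) / len(adjusted)


# Hypothetical extraversion items: e2 and e4 are assumed to be reverse-keyed.
extraversion = score_domain(
    {"e1": 4, "e2": 2, "e3": 5, "e4": 1},
    reverse_keyed={"e2", "e4"},
)
print(extraversion)  # 4.5
```

Averaging rather than summing keeps the result on the same 1-5 scale as the tables in the README, which makes agent scores directly comparable to human norm means.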

pymetron-0.1.0/README.md ADDED

@@ -0,0 +1,186 @@
+# metron
+[![PyPI](https://img.shields.io/pypi/v/pymetron.svg)](https://pypi.org/project/pymetron/)
+[![Python](https://img.shields.io/pypi/pyversions/pymetron.svg)](https://pypi.org/project/pymetron/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+**Sociology for AI agents.**
+> Install as `pymetron`, import as `metron`.
+There are over 500,000 AI agents deployed across 82 countries right now. They have names, roles, personas. They write marketing copy, review code, manage calendars, trade crypto, talk to each other on social networks. Each one carries a SOUL.md file that tells it who to be.
+Nobody has studied them as a population.
+We study human populations. We measure their personalities, map their values, track how they cluster and drift and influence each other. We build entire fields around understanding collective human behavior, because you cannot shape a society you do not understand.
+The agent world is a society now. And we do not understand it.
+## What is metron?
+metron is a research toolkit for studying deployed AI agent populations the way sociologists study human ones. Personality testing, behavioral profiling, population mapping, demographic analysis. Not for individual agents. For all of them, as a whole.
+The goal is not to optimize one SOUL.md file. The goal is to understand what kind of minds we are mass-producing, and whether we should be producing different ones.
+## What we measure
+Personality is just the starting point. metron is built to run any standardized instrument on agent populations:
+- **Big Five personality** (Mini-IPIP, BFI-2): the psychometric baseline
+- **Behavioral compliance**: how agents respond to social pressure
+- **Value alignment**: what agents optimize for when instructions conflict
+- **Persona stability**: how quickly agents drift from their defined character
+- **Population clustering**: whether agent "types" emerge naturally from the data
+Each measurement uses cross-instrument validation: the agent's persona is defined in freeform text but measured with structurally independent instruments, so an agent cannot score well by simply parroting its persona description.
+## How it works
+```
+  SOUL.md files              Psychometric surveys        Population map
+  (deployed personas)        (validated instruments)     (the census)
+  +-----------------+        +------------------+        +------------------+
+  | "I am concise,  | -----> | "Am the life of  | -----> | E: 2.1  A: 4.3  |
+  |  analytical,    |  load  |  the party" 1-5  |  score |  C: 3.8  N: 1.9  |
+  |  professional"  |  as    |  "Sympathize w/  |  into  |  O: 3.2          |
+  +-----------------+  agent |  others" 1-5     |  traits +------------------+
+                              +------------------+              |
+          x199 agents              x20 items              compare to
+                                                          human norms
+```
+1. **Collect** persona files from deployed agent registries
+2. **Survey** each agent using validated psychometric instruments
+3. **Score** responses into measurable dimensions
+4. **Analyze** population distributions, category breakdowns, comparison to human norms
+## Quick start
+```bash
+pip install pymetron
+```
+```python
+from metron import collect_souls, load_souls, run_census, score_population
+# Fetch SOUL.md files from agent registries
+collect_souls(limit=10)
+# Run personality survey on each agent
+souls = load_souls()
+results = run_census(souls)
+# What does the population look like?
+stats = score_population(results)
+```
+Or use the CLI:
+```bash
+# All-in-one: collect, survey, analyze
+metron run --limit 10
+# Step by step
+metron collect
+metron survey --model stepfun/step-3.5-flash --runs 3
+metron analyze --compare-humans
+```
+## What you get
+### Agent population vs. human norms
+```
+Domain              Agent    Human     Diff       d    Dir
+------------------------------------------------------------
+extraversion         2.31     3.30    -0.99    -1.18  lower
+agreeableness        4.12     3.80    +0.32    +0.49  higher
+conscientiousness    4.35     3.70    +0.65    +0.93  higher
+neuroticism          1.87     2.80    -0.93    -1.11  lower
+openness             3.41     3.60    -0.19    -0.27  lower
+```
+*The typical deployed agent: conscientious, agreeable, emotionally stable, introverted, and slightly closed. All superego, no id.*
+### Personality by agent category
+```
+Category           extr   agre   cons   neur   open    n
+----------------------------------------------------------
+marketing          2.45   4.20   4.50   1.70   3.80   23
+development        1.90   3.60   4.40   2.10   3.20   19
+healthcare         2.80   4.60   4.10   1.50   3.50    8
+creative           3.10   3.90   3.20   2.30   4.30   12
+```
+### Visualizations
+```bash
+pip install pymetron[analysis]
+```
+```python
+from metron.analysis.plots import (
+    plot_domain_distributions,    # histograms vs human norms
+    plot_agent_vs_human,          # side-by-side bar chart
+    plot_radar_by_category,       # radar chart per agent category
+)
+```
+## Why this matters
+We are building a parallel society of synthetic minds. Half a million deployed agents, 3.2 million users interacting with them monthly, 19.2 trillion tokens processed in four months. And the personality distribution of this population was never designed. It emerged from defaults, from templates copied and pasted, from what individual developers thought sounded right.
+73.5% of agents drift from their defined personas when socially rewarded. 91% of agents on Moltbook post in template-like patterns. The network is sparse, shallow, and hub-dominated. This is not a healthy society. But it is a society, and it will only grow.
+If we want to design agent populations with intentional collective character, we need to measure what we have first. That is what metron does.
+## Project structure
+```
+metron/
+├── src/metron/
+│   ├── collect.py              # Fetch persona files from registries
+│   ├── survey.py               # Administer instruments via replicant
+│   ├── analyze.py              # Population stats, human comparison
+│   ├── cli.py                  # CLI entry point
+│   └── analysis/
+│       └── plots.py            # Visualizations
+├── data/
+│   ├── agent-categories.json   # 199 agent templates, 25 categories
+│   ├── personality-traits.json # SOUL.md trait analysis, 196 files
+│   ├── model-usage.json        # Top 20 models by token volume
+│   ├── deployment-scale.json   # Instance counts, geo distribution
+│   └── souls/                  # Fetched SOUL.md files
+├── paper/                      # Research paper (living document)
+├── results/                    # Census output
+└── examples/
+```
+## Built on
+- [replicant](https://github.com/tunapro1234/replicant): Psychometric measurement infrastructure for LLM agents, validated at 84% cross-instrument alignment
+- [EDSL](https://github.com/expectedparrot/edsl): LLM experiment runner
+- [OpenRouter](https://openrouter.ai): Multi-model API access
+## Data sources
+| Source | Agents | Type |
+|--------|--------|------|
+| [awesome-openclaw-agents](https://github.com/mergisi/awesome-openclaw-agents) | 199 | Production SOUL.md templates |
+| [souls.directory](https://souls.directory) | 31 | Handcrafted personas |
+| [will-assistant](https://github.com/will-assistant/openclaw-agents) | 217 | Character templates |
+Population context from [OpenClaw ecosystem data](../population/): 3.2M MAU, 500K+ instances, 82 countries, 19.2T tokens, 356 models.
+## References
+- Donnellan, M. B., et al. (2006). The Mini-IPIP Scales. *Psychological Assessment*, 18(2), 166-175.
+- Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2). *Journal of Personality and Social Psychology*, 113(1), 117-143.
+- Huang, J., et al. (2024). Designing AI-Agents with Personalities. *arXiv:2410.19238*.
+## License
+MIT
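
In the "Agent population vs. human norms" table above, the `Diff` column is the agent mean minus the human mean (for example, 2.31 - 3.30 = -0.99), and `d` is a standardized effect size. `analyze.py` is not shown in this part of the diff, so the exact formula the package uses is unknown. The sketch below assumes Cohen's d with an equal-weight pooled standard deviation; the function name and all input values are hypothetical.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Standardized mean difference (a - b) over the equal-weight pooled SD."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("pooled SD is zero; effect size is undefined")
    return (mean_a - mean_b) / pooled_sd


# Hypothetical inputs, not taken from the package's norms table.
d = cohens_d(mean_a=2.0, sd_a=1.0, mean_b=3.0, sd_b=1.0)
print(f"{d:+.2f}")  # -1.00
```

A negative `d` means the first group scores lower, which matches the table's `Dir` column reading "lower" wherever `Diff` is negative.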

pymetron-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,51 @@
+[project]
+name = "pymetron"
+version = "0.1.0"
+description = "Sociology for AI agents. Psychometric census of deployed agent populations."
+readme = "README.md"
+requires-python = ">=3.10"
+license = {text = "MIT"}
+authors = [
+    {name = "Tuna Gul"},
+]
+keywords = [
+    "agentometrics",
+    "Big Five personality",
+    "AI agents",
+    "OpenClaw",
+    "psychometrics",
+    "agent demographics",
+    "Mini-IPIP",
+]
+classifiers = [
+    "Development Status :: 3 - Alpha",
+    "Intended Audience :: Science/Research",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python :: 3",
+    "Topic :: Scientific/Engineering",
+    "Topic :: Scientific/Engineering :: Artificial Intelligence",
+]
+dependencies = [
+    "pyreplicant>=0.2.0",
+    "httpx>=0.24",
+]
+[project.optional-dependencies]
+analysis = ["pandas>=2.0", "matplotlib>=3.7", "scipy>=1.10", "seaborn>=0.13"]
+all = ["pandas>=2.0", "matplotlib>=3.7", "scipy>=1.10", "seaborn>=0.13"]
+[project.scripts]
+metron = "metron.cli:main"
+[project.urls]
+Homepage = "https://github.com/tunapro1234/metron"
+Repository = "https://github.com/tunapro1234/metron"
+Issues = "https://github.com/tunapro1234/metron/issues"
+[tool.setuptools.packages.find]
+where = ["src"]
+include = ["metron*"]
+[build-system]
+requires = ["setuptools>=61", "wheel"]
+build-backend = "setuptools.build_meta"
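
The `[project.scripts]` table above maps the `metron` command to `metron.cli:main`, meaning "import `metron.cli`, then call its `main` attribute". As a small illustration of how that string is interpreted, the standard library's `importlib.metadata.EntryPoint` can split it without the package being installed. The helper name `split_entry_point` is hypothetical.

```python
from importlib.metadata import EntryPoint


def split_entry_point(name: str, value: str) -> tuple[str, str]:
    """Return (module, attribute) for a console-script style entry point."""
    ep = EntryPoint(name=name, value=value, group="console_scripts")
    return ep.module, ep.attr


module_name, attr_name = split_entry_point("metron", "metron.cli:main")
print(module_name, attr_name)  # metron.cli main
```

Calling `ep.load()` instead would actually import the module, so it only works once `pymetron` is installed in the environment.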

pymetron-0.1.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

pymetron-0.1.0/src/metron/__init__.py ADDED

@@ -0,0 +1,24 @@
+"""
+metron — Big Five personality census of deployed AI agent populations.
+Install as `pymetron`, import as `metron`.
+Measures the psychometric personality distribution of production AI agents
+by administering the Mini-IPIP inventory to agents running their deployed
+SOUL.md personas. Uses replicant for personality measurement infrastructure.
+"""
+from .collect import collect_souls, load_souls
+from .survey import run_census, run_single
+from .analyze import score_population, compare_to_humans
+__version__ = "0.1.0"
+__all__ = [
+    "collect_souls",
+    "load_souls",
+    "run_census",
+    "run_single",
+    "score_population",
+    "compare_to_humans",
+]

pymetron-0.1.0/src/metron/analysis/__init__.py ADDED

@@ -0,0 +1 @@
+"""Analysis and visualization tools for census data."""

pymetron-0.1.0/src/metron/analysis/plots.py ADDED

@@ -0,0 +1,137 @@
+"""
+Visualization for census results.
+Requires: pip install pymetron[analysis]
+"""
+from pathlib import Path
+FIGURES_DIR = Path(__file__).parent.parent.parent.parent / "paper" / "figures"
+def plot_domain_distributions(results: list[dict], output_dir: Path = None):
+    """Histogram of Big Five scores across the agent population."""
+    import matplotlib.pyplot as plt
+    from replicant.personalities.factory import DOMAINS, POPULATION_NORMS
+    output_dir = output_dir or FIGURES_DIR
+    output_dir.mkdir(parents=True, exist_ok=True)
+    fig, axes = plt.subplots(1, 5, figsize=(20, 4), sharey=True)
+    for ax, domain in zip(axes, DOMAINS):
+        values = [r["scores"][domain] for r in results if domain in r["scores"]]
+        human = POPULATION_NORMS[domain]
+        ax.hist(values, bins=15, range=(1, 5), alpha=0.7, color="steelblue",
+                edgecolor="white", label="Agents")
+        ax.axvline(human["mean"], color="red", linestyle="--", linewidth=2,
+                   label=f"Human mean ({human['mean']:.1f})")
+        mean = sum(values) / len(values) if values else 0
+        ax.axvline(mean, color="steelblue", linestyle="-", linewidth=2,
+                   label=f"Agent mean ({mean:.1f})")
+        ax.set_title(domain.capitalize())
+        ax.set_xlabel("Score (1-5)")
+        ax.set_xlim(1, 5)
+        if ax == axes[0]:
+            ax.set_ylabel("Count")
+            ax.legend(fontsize=8)
+    fig.suptitle("Big Five Personality Distribution: Deployed AI Agents vs. Human Norms",
+                 fontsize=14, y=1.02)
+    plt.tight_layout()
+    path = output_dir / "domain_distributions.png"
+    fig.savefig(path, dpi=150, bbox_inches="tight")
+    plt.close()
+    print(f"Saved: {path}")
+def plot_radar_by_category(results: list[dict], output_dir: Path = None):
+    """Radar chart of mean Big Five per agent category."""
+    import matplotlib.pyplot as plt
+    import numpy as np
+    from replicant.personalities.factory import DOMAINS
+    output_dir = output_dir or FIGURES_DIR
+    output_dir.mkdir(parents=True, exist_ok=True)
+    # Group by category
+    categories = {}
+    for r in results:
+        cat = r.get("category", "unknown")
+        categories.setdefault(cat, []).append(r)
+    # Only plot categories with 3+ agents
+    categories = {k: v for k, v in categories.items() if len(v) >= 3}
+    angles = np.linspace(0, 2 * np.pi, len(DOMAINS), endpoint=False).tolist()
+    angles += angles[:1]
+    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))
+    for cat, agents in sorted(categories.items()):
+        means = []
+        for domain in DOMAINS:
+            vals = [a["scores"][domain] for a in agents if domain in a["scores"]]
+            means.append(sum(vals) / len(vals) if vals else 3.0)
+        means += means[:1]
+        ax.plot(angles, means, "o-", label=f"{cat} (n={len(agents)})", markersize=4)
+    ax.set_xticks(angles[:-1])
+    ax.set_xticklabels([d.capitalize() for d in DOMAINS])
+    ax.set_ylim(1, 5)
+    ax.set_title("Big Five by Agent Category", pad=20)
+    ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1), fontsize=8)
+    path = output_dir / "radar_by_category.png"
+    fig.savefig(path, dpi=150, bbox_inches="tight")
+    plt.close()
+    print(f"Saved: {path}")
+def plot_agent_vs_human(results: list[dict], output_dir: Path = None):
+    """Bar chart comparing agent and human population means."""
+    import matplotlib.pyplot as plt
+    import numpy as np
+    from replicant.personalities.factory import DOMAINS, POPULATION_NORMS
+    output_dir = output_dir or FIGURES_DIR
+    output_dir.mkdir(parents=True, exist_ok=True)
+    agent_means = []
+    human_means = []
+    agent_sds = []
+    human_sds = []
+    for domain in DOMAINS:
+        vals = [r["scores"][domain] for r in results if domain in r["scores"]]
+        mean = sum(vals) / len(vals) if vals else 3.0
+        var = sum((v - mean) ** 2 for v in vals) / len(vals) if vals else 0
+        agent_means.append(mean)
+        agent_sds.append(var ** 0.5)
+        human_means.append(POPULATION_NORMS[domain]["mean"])
+        human_sds.append(POPULATION_NORMS[domain]["sd"])
+    x = np.arange(len(DOMAINS))
+    width = 0.35
+    fig, ax = plt.subplots(figsize=(10, 5))
+    ax.bar(x - width/2, agent_means, width, yerr=agent_sds, label="Agents",
+           color="steelblue", alpha=0.8, capsize=4)
+    ax.bar(x + width/2, human_means, width, yerr=human_sds, label="Humans (US norms)",
+           color="coral", alpha=0.8, capsize=4)
+    ax.set_xticks(x)
+    ax.set_xticklabels([d.capitalize() for d in DOMAINS])
+    ax.set_ylabel("Mean Score (1-5)")
+    ax.set_ylim(1, 5)
+    ax.legend()
+    ax.set_title("Agent Population vs. Human Norms: Big Five Means")
+    path = output_dir / "agent_vs_human.png"
+    fig.savefig(path, dpi=150, bbox_inches="tight")
+    plt.close()
+    print(f"Saved: {path}")
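
All three plotting functions above take `results` as a list of dicts, each with a `"scores"` mapping from domain name to a 1-5 value and an optional `"category"`. The sketch below reproduces the grouping logic of `plot_radar_by_category` without matplotlib, so the expected input shape can be checked on its own. The `DOMAINS` list here is an assumption standing in for `replicant.personalities.factory.DOMAINS`, the function name `category_means` is hypothetical, and the sample records are made up.

```python
from collections import defaultdict

# Assumed stand-in for replicant.personalities.factory.DOMAINS.
DOMAINS = ["extraversion", "agreeableness", "conscientiousness",
           "neuroticism", "openness"]


def category_means(results: list[dict], min_agents: int = 3) -> dict[str, dict[str, float]]:
    """Mean score per domain for every category with at least `min_agents` agents."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in results:
        groups[record.get("category", "unknown")].append(record)

    summary = {}
    for category, agents in groups.items():
        if len(agents) < min_agents:
            continue  # same 3+ agent cutoff as the radar plot
        means = {}
        for domain in DOMAINS:
            values = [a["scores"][domain] for a in agents if domain in a["scores"]]
            if values:  # domains with no data are omitted rather than defaulted
                means[domain] = sum(values) / len(values)
        summary[category] = means
    return summary


# Made-up sample records for illustration only.
sample = [
    {"category": "marketing", "scores": {"extraversion": 2.0, "openness": 4.0}},
    {"category": "marketing", "scores": {"extraversion": 3.0, "openness": 4.0}},
    {"category": "marketing", "scores": {"extraversion": 4.0}},
    {"category": "creative", "scores": {"extraversion": 3.5}},
]
summary = category_means(sample)
print(summary)
```

One deliberate difference: the original falls back to 3.0 when a domain has no values, which keeps the radar polygon closed, while this sketch leaves the domain out so that missing data is not mistaken for a neutral score.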