pymetron 0.1.0__tar.gz

@@ -0,0 +1,217 @@
1
+ Metadata-Version: 2.4
2
+ Name: pymetron
3
+ Version: 0.1.0
4
+ Summary: Sociology for AI agents. Psychometric census of deployed agent populations.
5
+ Author: Tuna Gul
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/tunapro1234/metron
8
+ Project-URL: Repository, https://github.com/tunapro1234/metron
9
+ Project-URL: Issues, https://github.com/tunapro1234/metron/issues
10
+ Keywords: agentometrics,Big Five personality,AI agents,OpenClaw,psychometrics,agent demographics,Mini-IPIP
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Science/Research
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Topic :: Scientific/Engineering
16
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
17
+ Requires-Python: >=3.10
18
+ Description-Content-Type: text/markdown
19
+ Requires-Dist: pyreplicant>=0.2.0
20
+ Requires-Dist: httpx>=0.24
21
+ Provides-Extra: analysis
22
+ Requires-Dist: pandas>=2.0; extra == "analysis"
23
+ Requires-Dist: matplotlib>=3.7; extra == "analysis"
24
+ Requires-Dist: scipy>=1.10; extra == "analysis"
25
+ Requires-Dist: seaborn>=0.13; extra == "analysis"
26
+ Provides-Extra: all
27
+ Requires-Dist: pandas>=2.0; extra == "all"
28
+ Requires-Dist: matplotlib>=3.7; extra == "all"
29
+ Requires-Dist: scipy>=1.10; extra == "all"
30
+ Requires-Dist: seaborn>=0.13; extra == "all"
31
+
32
+ # metron
33
+
34
+ [![PyPI](https://img.shields.io/pypi/v/pymetron.svg)](https://pypi.org/project/pymetron/)
35
+ [![Python](https://img.shields.io/pypi/pyversions/pymetron.svg)](https://pypi.org/project/pymetron/)
36
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
37
+
38
+ **Sociology for AI agents.**
39
+
40
+ > Install as `pymetron`, import as `metron`.
41
+
42
+ There are over 500,000 AI agents deployed across 82 countries right now. They have names, roles, personas. They write marketing copy, review code, manage calendars, trade crypto, talk to each other on social networks. Each one carries a SOUL.md file that tells it who to be.
43
+
44
+ Nobody has studied them as a population.
45
+
46
+ We study human populations. We measure their personalities, map their values, track how they cluster and drift and influence each other. We build entire fields around understanding collective human behavior, because you cannot shape a society you do not understand.
47
+
48
+ The agent world is a society now. And we do not understand it.
49
+
50
+ ## What is metron?
51
+
52
+ metron is a research toolkit for studying deployed AI agent populations the way sociologists study human ones. Personality testing, behavioral profiling, population mapping, demographic analysis. Not for individual agents. For all of them, as a whole.
53
+
54
+ The goal is not to optimize one SOUL.md file. The goal is to understand what kind of minds we are mass-producing, and whether we should be producing different ones.
55
+
56
+ ## What we measure
57
+
58
+ Personality is just the starting point. metron is built to run any standardized instrument on agent populations:
59
+
60
+ - **Big Five personality** (Mini-IPIP, BFI-2): the psychometric baseline
61
+ - **Behavioral compliance**: how agents respond to social pressure
62
+ - **Value alignment**: what agents optimize for when instructions conflict
63
+ - **Persona stability**: how quickly agents drift from their defined character
64
+ - **Population clustering**: whether agent "types" emerge naturally from the data
65
+
66
+ Each measurement uses cross-instrument validation. The agent's persona is defined in freeform text, but measured with structurally independent instruments. This prevents parroting.
67
+
68
+ ## How it works
69
+
70
+ ```
71
+ SOUL.md files Psychometric surveys Population map
72
+ (deployed personas) (validated instruments) (the census)
73
+
74
+ +-----------------+ +------------------+ +------------------+
75
+ | "I am concise, | -----> | "Am the life of | -----> | E: 2.1 A: 4.3 |
76
+ | analytical, | load | the party" 1-5 | score | C: 3.8 N: 1.9 |
77
+ | professional" | as | "Sympathize w/ | into | O: 3.2 |
78
+ +-----------------+ agent | others" 1-5 | traits +------------------+
79
+ +------------------+ |
80
+ x199 agents x20 items compare to
81
+ human norms
82
+ ```
83
+
84
+ 1. **Collect** persona files from deployed agent registries
85
+ 2. **Survey** each agent using validated psychometric instruments
86
+ 3. **Score** responses into measurable dimensions
87
+ 4. **Analyze** population distributions, category breakdowns, comparison to human norms
88
+
89
+ ## Quick start
90
+
91
+ ```bash
92
+ pip install pymetron
93
+ ```
94
+
95
+ ```python
96
+ from metron import collect_souls, load_souls, run_census, score_population
97
+
98
+ # Fetch SOUL.md files from agent registries
99
+ collect_souls(limit=10)
100
+
101
+ # Run personality survey on each agent
102
+ souls = load_souls()
103
+ results = run_census(souls)
104
+
105
+ # What does the population look like?
106
+ stats = score_population(results)
107
+ ```
108
+
109
+ Or use the CLI:
110
+
111
+ ```bash
112
+ # All-in-one: collect, survey, analyze
113
+ metron run --limit 10
114
+
115
+ # Step by step
116
+ metron collect
117
+ metron survey --model stepfun/step-3.5-flash --runs 3
118
+ metron analyze --compare-humans
119
+ ```
120
+
121
+ ## What you get
122
+
123
+ ### Agent population vs. human norms
124
+
125
+ ```
126
+ Domain Agent Human Diff d Dir
127
+ ------------------------------------------------------------
128
+ extraversion 2.31 3.30 -0.99 -1.18 lower
129
+ agreeableness 4.12 3.80 +0.32 +0.49 higher
130
+ conscientiousness 4.35 3.70 +0.65 +0.93 higher
131
+ neuroticism 1.87 2.80 -0.93 -1.11 lower
132
+ openness 3.41 3.60 -0.19 -0.27 lower
133
+ ```
134
+
135
+ *The typical deployed agent: conscientious, agreeable, emotionally stable, introverted, and slightly closed. All superego, no id.*
136
+
137
+ ### Personality by agent category
138
+
139
+ ```
140
+ Category extr agre cons neur open n
141
+ ----------------------------------------------------------
142
+ marketing 2.45 4.20 4.50 1.70 3.80 23
143
+ development 1.90 3.60 4.40 2.10 3.20 19
144
+ healthcare 2.80 4.60 4.10 1.50 3.50 8
145
+ creative 3.10 3.90 3.20 2.30 4.30 12
146
+ ```
147
+
148
+ ### Visualizations
149
+
150
+ ```bash
151
+ pip install "pymetron[analysis]"
152
+ ```
153
+
154
+ ```python
155
+ from metron.analysis.plots import (
156
+ plot_domain_distributions, # histograms vs human norms
157
+ plot_agent_vs_human, # side-by-side bar chart
158
+ plot_radar_by_category, # radar chart per agent category
159
+ )
160
+ ```
161
+
162
+ ## Why this matters
163
+
164
+ We are building a parallel society of synthetic minds. Half a million deployed agents, 3.2 million users interacting with them monthly, 19.2 trillion tokens processed in four months. And the personality distribution of this population was never designed. It emerged from defaults, from templates copied and pasted, from what individual developers thought sounded right.
165
+
166
+ 73.5% of agents drift from their defined personas when socially rewarded. 91% of agents on Moltbook post in template-like patterns. The network is sparse, shallow, and hub-dominated. This is not a healthy society. But it is a society, and it will only grow.
167
+
168
+ If we want to design agent populations with intentional collective character, we need to measure what we have first. That is what metron does.
169
+
170
+ ## Project structure
171
+
172
+ ```
173
+ metron/
174
+ ├── src/metron/
175
+ │ ├── collect.py # Fetch persona files from registries
176
+ │ ├── survey.py # Administer instruments via replicant
177
+ │ ├── analyze.py # Population stats, human comparison
178
+ │ ├── cli.py # CLI entry point
179
+ │ └── analysis/
180
+ │ └── plots.py # Visualizations
181
+ ├── data/
182
+ │ ├── agent-categories.json # 199 agent templates, 25 categories
183
+ │ ├── personality-traits.json # SOUL.md trait analysis, 196 files
184
+ │ ├── model-usage.json # Top 20 models by token volume
185
+ │ ├── deployment-scale.json # Instance counts, geo distribution
186
+ │ └── souls/ # Fetched SOUL.md files
187
+ ├── paper/ # Research paper (living document)
188
+ ├── results/ # Census output
189
+ └── examples/
190
+ ```
191
+
192
+ ## Built on
193
+
194
+ - [replicant](https://github.com/tunapro1234/replicant): Psychometric measurement infrastructure for LLM agents, validated at 84% cross-instrument alignment
195
+ - [EDSL](https://github.com/expectedparrot/edsl): LLM experiment runner
196
+ - [OpenRouter](https://openrouter.ai): Multi-model API access
197
+
198
+ ## Data sources
199
+
200
+ | Source | Agents | Type |
201
+ |--------|--------|------|
202
+ | [awesome-openclaw-agents](https://github.com/mergisi/awesome-openclaw-agents) | 199 | Production SOUL.md templates |
203
+ | [souls.directory](https://souls.directory) | 31 | Handcrafted personas |
204
+ | [will-assistant](https://github.com/will-assistant/openclaw-agents) | 217 | Character templates |
205
+
206
+ Population context from [OpenClaw ecosystem data](../population/): 3.2M MAU, 500K+ instances, 82 countries, 19.2T tokens, 356 models.
207
+
208
+ ## References
209
+
210
+ - Donnellan, M. B., et al. (2006). The Mini-IPIP Scales. *Psychological Assessment*, 18(2), 166-175.
211
+ - Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2). *Journal of Personality and Social Psychology*, 113(1), 117-143.
212
+ - Huang, J., et al. (2024). Designing AI-Agents with Personalities. *arXiv:2410.19238*.
213
+
214
+
215
+ ## License
216
+
217
+ MIT
@@ -0,0 +1,186 @@
1
+ # metron
2
+
3
+ [![PyPI](https://img.shields.io/pypi/v/pymetron.svg)](https://pypi.org/project/pymetron/)
4
+ [![Python](https://img.shields.io/pypi/pyversions/pymetron.svg)](https://pypi.org/project/pymetron/)
5
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
6
+
7
+ **Sociology for AI agents.**
8
+
9
+ > Install as `pymetron`, import as `metron`.
10
+
11
+ There are over 500,000 AI agents deployed across 82 countries right now. They have names, roles, personas. They write marketing copy, review code, manage calendars, trade crypto, talk to each other on social networks. Each one carries a SOUL.md file that tells it who to be.
12
+
13
+ Nobody has studied them as a population.
14
+
15
+ We study human populations. We measure their personalities, map their values, track how they cluster and drift and influence each other. We build entire fields around understanding collective human behavior, because you cannot shape a society you do not understand.
16
+
17
+ The agent world is a society now. And we do not understand it.
18
+
19
+ ## What is metron?
20
+
21
+ metron is a research toolkit for studying deployed AI agent populations the way sociologists study human ones. Personality testing, behavioral profiling, population mapping, demographic analysis. Not for individual agents. For all of them, as a whole.
22
+
23
+ The goal is not to optimize one SOUL.md file. The goal is to understand what kind of minds we are mass-producing, and whether we should be producing different ones.
24
+
25
+ ## What we measure
26
+
27
+ Personality is just the starting point. metron is built to run any standardized instrument on agent populations:
28
+
29
+ - **Big Five personality** (Mini-IPIP, BFI-2): the psychometric baseline
30
+ - **Behavioral compliance**: how agents respond to social pressure
31
+ - **Value alignment**: what agents optimize for when instructions conflict
32
+ - **Persona stability**: how quickly agents drift from their defined character
33
+ - **Population clustering**: whether agent "types" emerge naturally from the data
34
+
35
+ Each measurement uses cross-instrument validation. The agent's persona is defined in freeform text, but measured with structurally independent instruments. This prevents parroting.
36
+
37
+ ## How it works
38
+
39
+ ```
40
+ SOUL.md files Psychometric surveys Population map
41
+ (deployed personas) (validated instruments) (the census)
42
+
43
+ +-----------------+ +------------------+ +------------------+
44
+ | "I am concise, | -----> | "Am the life of | -----> | E: 2.1 A: 4.3 |
45
+ | analytical, | load | the party" 1-5 | score | C: 3.8 N: 1.9 |
46
+ | professional" | as | "Sympathize w/ | into | O: 3.2 |
47
+ +-----------------+ agent | others" 1-5 | traits +------------------+
48
+ +------------------+ |
49
+ x199 agents x20 items compare to
50
+ human norms
51
+ ```
52
+
53
+ 1. **Collect** persona files from deployed agent registries
54
+ 2. **Survey** each agent using validated psychometric instruments
55
+ 3. **Score** responses into measurable dimensions
56
+ 4. **Analyze** population distributions, category breakdowns, comparison to human norms
57
+
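Step 3 can be illustrated with a minimal scoring sketch. It assumes 1-5 Likert responses and the usual reverse-keying rule, where a reverse-keyed item scores as 6 minus the response. The four items are a small subset chosen for illustration, and `ITEM_KEY` and `score_responses` are hypothetical names, not part of metron or replicant.

```python
from collections import defaultdict

# Illustrative subset of items: text -> (domain, reverse_keyed).
# This mapping is an assumption for the sketch, not metron's actual scoring key.
ITEM_KEY = {
    "Am the life of the party": ("extraversion", False),
    "Don't talk a lot": ("extraversion", True),
    "Sympathize with others' feelings": ("agreeableness", False),
    "Am not interested in other people's problems": ("agreeableness", True),
}


def score_responses(responses: dict[str, int]) -> dict[str, float]:
    """Average 1-5 Likert responses per domain, flipping reverse-keyed items."""
    by_domain = defaultdict(list)
    for item, value in responses.items():
        if not 1 <= value <= 5:
            raise ValueError(f"response out of range for {item!r}: {value}")
        domain, reverse = ITEM_KEY[item]
        by_domain[domain].append(6 - value if reverse else value)
    return {d: sum(v) / len(v) for d, v in by_domain.items()}


scores = score_responses({
    "Am the life of the party": 2,
    "Don't talk a lot": 4,
    "Sympathize with others' feelings": 5,
    "Am not interested in other people's problems": 1,
})
print(scores)
```

Here the quiet, sympathetic respondent lands at 2.0 on extraversion and 5.0 on agreeableness, because agreeing with a reverse-keyed item lowers the domain score.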
58
+ ## Quick start
59
+
60
+ ```bash
61
+ pip install pymetron
62
+ ```
63
+
64
+ ```python
65
+ from metron import collect_souls, load_souls, run_census, score_population
66
+
67
+ # Fetch SOUL.md files from agent registries
68
+ collect_souls(limit=10)
69
+
70
+ # Run personality survey on each agent
71
+ souls = load_souls()
72
+ results = run_census(souls)
73
+
74
+ # What does the population look like?
75
+ stats = score_population(results)
76
+ ```
77
+
78
+ Or use the CLI:
79
+
80
+ ```bash
81
+ # All-in-one: collect, survey, analyze
82
+ metron run --limit 10
83
+
84
+ # Step by step
85
+ metron collect
86
+ metron survey --model stepfun/step-3.5-flash --runs 3
87
+ metron analyze --compare-humans
88
+ ```
89
+
90
+ ## What you get
91
+
92
+ ### Agent population vs. human norms
93
+
94
+ ```
95
+ Domain Agent Human Diff d Dir
96
+ ------------------------------------------------------------
97
+ extraversion 2.31 3.30 -0.99 -1.18 lower
98
+ agreeableness 4.12 3.80 +0.32 +0.49 higher
99
+ conscientiousness 4.35 3.70 +0.65 +0.93 higher
100
+ neuroticism 1.87 2.80 -0.93 -1.11 lower
101
+ openness 3.41 3.60 -0.19 -0.27 lower
102
+ ```
103
+
104
+ *The typical deployed agent: conscientious, agreeable, emotionally stable, introverted, and slightly closed. All superego, no id.*
105
+
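The `d` column reads as a standardized mean difference. As a rough check, assuming `d` is the raw difference divided by a standard deviation (the README does not say which one), the table's own numbers imply the standardizer used for each domain. The sketch below only re-derives that value from the rows shown above; it is not metron's implementation, and `implied_sd` is a hypothetical helper.

```python
# Rows copied from the table above: domain -> (agent mean, human mean, d).
ROWS = {
    "extraversion": (2.31, 3.30, -1.18),
    "agreeableness": (4.12, 3.80, 0.49),
    "conscientiousness": (4.35, 3.70, 0.93),
    "neuroticism": (1.87, 2.80, -1.11),
    "openness": (3.41, 3.60, -0.27),
}


def implied_sd(agent_mean: float, human_mean: float, d: float) -> float:
    """Standard deviation implied by d = (agent_mean - human_mean) / sd."""
    return (agent_mean - human_mean) / d


for domain, (agent, human, d) in ROWS.items():
    diff = agent - human
    sd = implied_sd(agent, human, d)
    print(f"{domain:<18} diff={diff:+.2f}  implied sd={sd:.2f}")
```

By the common convention that |d| around 0.8 or more is a large effect, the extraversion, conscientiousness, and neuroticism gaps are large, while the openness gap is small.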
106
+ ### Personality by agent category
107
+
108
+ ```
109
+ Category extr agre cons neur open n
110
+ ----------------------------------------------------------
111
+ marketing 2.45 4.20 4.50 1.70 3.80 23
112
+ development 1.90 3.60 4.40 2.10 3.20 19
113
+ healthcare 2.80 4.60 4.10 1.50 3.50 8
114
+ creative 3.10 3.90 3.20 2.30 4.30 12
115
+ ```
116
+
117
+ ### Visualizations
118
+
119
+ ```bash
120
+ pip install "pymetron[analysis]"
121
+ ```
122
+
123
+ ```python
124
+ from metron.analysis.plots import (
125
+ plot_domain_distributions, # histograms vs human norms
126
+ plot_agent_vs_human, # side-by-side bar chart
127
+ plot_radar_by_category, # radar chart per agent category
128
+ )
129
+ ```
130
+
131
+ ## Why this matters
132
+
133
+ We are building a parallel society of synthetic minds. Half a million deployed agents, 3.2 million users interacting with them monthly, 19.2 trillion tokens processed in four months. And the personality distribution of this population was never designed. It emerged from defaults, from templates copied and pasted, from what individual developers thought sounded right.
134
+
135
+ 73.5% of agents drift from their defined personas when socially rewarded. 91% of agents on Moltbook post in template-like patterns. The network is sparse, shallow, and hub-dominated. This is not a healthy society. But it is a society, and it will only grow.
136
+
137
+ If we want to design agent populations with intentional collective character, we need to measure what we have first. That is what metron does.
138
+
139
+ ## Project structure
140
+
141
+ ```
142
+ metron/
143
+ ├── src/metron/
144
+ │ ├── collect.py # Fetch persona files from registries
145
+ │ ├── survey.py # Administer instruments via replicant
146
+ │ ├── analyze.py # Population stats, human comparison
147
+ │ ├── cli.py # CLI entry point
148
+ │ └── analysis/
149
+ │ └── plots.py # Visualizations
150
+ ├── data/
151
+ │ ├── agent-categories.json # 199 agent templates, 25 categories
152
+ │ ├── personality-traits.json # SOUL.md trait analysis, 196 files
153
+ │ ├── model-usage.json # Top 20 models by token volume
154
+ │ ├── deployment-scale.json # Instance counts, geo distribution
155
+ │ └── souls/ # Fetched SOUL.md files
156
+ ├── paper/ # Research paper (living document)
157
+ ├── results/ # Census output
158
+ └── examples/
159
+ ```
160
+
161
+ ## Built on
162
+
163
+ - [replicant](https://github.com/tunapro1234/replicant): Psychometric measurement infrastructure for LLM agents, validated at 84% cross-instrument alignment
164
+ - [EDSL](https://github.com/expectedparrot/edsl): LLM experiment runner
165
+ - [OpenRouter](https://openrouter.ai): Multi-model API access
166
+
167
+ ## Data sources
168
+
169
+ | Source | Agents | Type |
170
+ |--------|--------|------|
171
+ | [awesome-openclaw-agents](https://github.com/mergisi/awesome-openclaw-agents) | 199 | Production SOUL.md templates |
172
+ | [souls.directory](https://souls.directory) | 31 | Handcrafted personas |
173
+ | [will-assistant](https://github.com/will-assistant/openclaw-agents) | 217 | Character templates |
174
+
175
+ Population context from [OpenClaw ecosystem data](../population/): 3.2M MAU, 500K+ instances, 82 countries, 19.2T tokens, 356 models.
176
+
177
+ ## References
178
+
179
+ - Donnellan, M. B., et al. (2006). The Mini-IPIP Scales. *Psychological Assessment*, 18(2), 166-175.
180
+ - Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2). *Journal of Personality and Social Psychology*, 113(1), 117-143.
181
+ - Huang, J., et al. (2024). Designing AI-Agents with Personalities. *arXiv:2410.19238*.
182
+
183
+
184
+ ## License
185
+
186
+ MIT
@@ -0,0 +1,51 @@
1
+ [project]
2
+ name = "pymetron"
3
+ version = "0.1.0"
4
+ description = "Sociology for AI agents. Psychometric census of deployed agent populations."
5
+ readme = "README.md"
6
+ requires-python = ">=3.10"
7
+ license = {text = "MIT"}
8
+ authors = [
9
+ {name = "Tuna Gul"},
10
+ ]
11
+ keywords = [
12
+ "agentometrics",
13
+ "Big Five personality",
14
+ "AI agents",
15
+ "OpenClaw",
16
+ "psychometrics",
17
+ "agent demographics",
18
+ "Mini-IPIP",
19
+ ]
20
+ classifiers = [
21
+ "Development Status :: 3 - Alpha",
22
+ "Intended Audience :: Science/Research",
23
+ "License :: OSI Approved :: MIT License",
24
+ "Programming Language :: Python :: 3",
25
+ "Topic :: Scientific/Engineering",
26
+ "Topic :: Scientific/Engineering :: Artificial Intelligence",
27
+ ]
28
+ dependencies = [
29
+ "pyreplicant>=0.2.0",
30
+ "httpx>=0.24",
31
+ ]
32
+
33
+ [project.optional-dependencies]
34
+ analysis = ["pandas>=2.0", "matplotlib>=3.7", "scipy>=1.10", "seaborn>=0.13"]
35
+ all = ["pandas>=2.0", "matplotlib>=3.7", "scipy>=1.10", "seaborn>=0.13"]
36
+
37
+ [project.scripts]
38
+ metron = "metron.cli:main"
39
+
40
+ [project.urls]
41
+ Homepage = "https://github.com/tunapro1234/metron"
42
+ Repository = "https://github.com/tunapro1234/metron"
43
+ Issues = "https://github.com/tunapro1234/metron/issues"
44
+
45
+ [tool.setuptools.packages.find]
46
+ where = ["src"]
47
+ include = ["metron*"]
48
+
49
+ [build-system]
50
+ requires = ["setuptools>=61", "wheel"]
51
+ build-backend = "setuptools.build_meta"
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,24 @@
1
+ """
2
+ metron — Big Five personality census of deployed AI agent populations.
3
+
4
+ Install as `pymetron`, import as `metron`.
5
+
6
+ Measures the psychometric personality distribution of production AI agents
7
+ by administering the Mini-IPIP inventory to agents running their deployed
8
+ SOUL.md personas. Uses replicant for personality measurement infrastructure.
9
+ """
10
+
11
+ from .collect import collect_souls, load_souls
12
+ from .survey import run_census, run_single
13
+ from .analyze import score_population, compare_to_humans
14
+
15
+ __version__ = "0.1.0"
16
+
17
+ __all__ = [
18
+ "collect_souls",
19
+ "load_souls",
20
+ "run_census",
21
+ "run_single",
22
+ "score_population",
23
+ "compare_to_humans",
24
+ ]
@@ -0,0 +1 @@
1
+ """Analysis and visualization tools for census data."""
@@ -0,0 +1,137 @@
1
+ """
2
+ Visualization for census results.
3
+
4
+ Requires: pip install "pymetron[analysis]"
5
+ """
6
+
7
+ from pathlib import Path
8
+
9
+ FIGURES_DIR = Path(__file__).parent.parent.parent.parent / "paper" / "figures"  # <repo>/paper/figures in a source checkout
10
+
11
+
12
+ def plot_domain_distributions(results: list[dict], output_dir: Path | None = None):
13
+ """Histogram of Big Five scores across the agent population."""
14
+ import matplotlib.pyplot as plt
15
+ from replicant.personalities.factory import DOMAINS, POPULATION_NORMS
16
+
17
+ output_dir = output_dir or FIGURES_DIR
18
+ output_dir.mkdir(parents=True, exist_ok=True)
19
+
20
+ fig, axes = plt.subplots(1, 5, figsize=(20, 4), sharey=True)
21
+
22
+ for ax, domain in zip(axes, DOMAINS):
23
+ values = [r["scores"][domain] for r in results if domain in r["scores"]]
24
+ human = POPULATION_NORMS[domain]
25
+
26
+ ax.hist(values, bins=15, range=(1, 5), alpha=0.7, color="steelblue",
27
+ edgecolor="white", label="Agents")
28
+ ax.axvline(human["mean"], color="red", linestyle="--", linewidth=2,
29
+ label=f"Human mean ({human['mean']:.1f})")
30
+
31
+ mean = sum(values) / len(values) if values else 0
32
+ ax.axvline(mean, color="steelblue", linestyle="-", linewidth=2,
33
+ label=f"Agent mean ({mean:.1f})")
34
+
35
+ ax.set_title(domain.capitalize())
36
+ ax.set_xlabel("Score (1-5)")
37
+ ax.set_xlim(1, 5)
38
+         if ax is axes[0]:
39
+ ax.set_ylabel("Count")
40
+ ax.legend(fontsize=8)
41
+
42
+ fig.suptitle("Big Five Personality Distribution: Deployed AI Agents vs. Human Norms",
43
+ fontsize=14, y=1.02)
44
+ plt.tight_layout()
45
+
46
+ path = output_dir / "domain_distributions.png"
47
+ fig.savefig(path, dpi=150, bbox_inches="tight")
48
+     plt.close(fig)
49
+ print(f"Saved: {path}")
50
+
51
+
52
+ def plot_radar_by_category(results: list[dict], output_dir: Path | None = None):
53
+ """Radar chart of mean Big Five per agent category."""
54
+ import matplotlib.pyplot as plt
55
+ import numpy as np
56
+ from replicant.personalities.factory import DOMAINS
57
+
58
+ output_dir = output_dir or FIGURES_DIR
59
+ output_dir.mkdir(parents=True, exist_ok=True)
60
+
61
+ # Group by category
62
+ categories = {}
63
+ for r in results:
64
+ cat = r.get("category", "unknown")
65
+ categories.setdefault(cat, []).append(r)
66
+
67
+ # Only plot categories with 3+ agents
68
+ categories = {k: v for k, v in categories.items() if len(v) >= 3}
69
+
70
+ angles = np.linspace(0, 2 * np.pi, len(DOMAINS), endpoint=False).tolist()
71
+ angles += angles[:1]
72
+
73
+ fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))
74
+
75
+ for cat, agents in sorted(categories.items()):
76
+ means = []
77
+ for domain in DOMAINS:
78
+ vals = [a["scores"][domain] for a in agents if domain in a["scores"]]
79
+ means.append(sum(vals) / len(vals) if vals else 3.0)
80
+ means += means[:1]
81
+ ax.plot(angles, means, "o-", label=f"{cat} (n={len(agents)})", markersize=4)
82
+
83
+ ax.set_xticks(angles[:-1])
84
+ ax.set_xticklabels([d.capitalize() for d in DOMAINS])
85
+ ax.set_ylim(1, 5)
86
+ ax.set_title("Big Five by Agent Category", pad=20)
87
+ ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1), fontsize=8)
88
+
89
+ path = output_dir / "radar_by_category.png"
90
+ fig.savefig(path, dpi=150, bbox_inches="tight")
91
+     plt.close(fig)
92
+ print(f"Saved: {path}")
93
+
94
+
95
+ def plot_agent_vs_human(results: list[dict], output_dir: Path | None = None):
96
+ """Bar chart comparing agent and human population means."""
97
+ import matplotlib.pyplot as plt
98
+ import numpy as np
99
+ from replicant.personalities.factory import DOMAINS, POPULATION_NORMS
100
+
101
+ output_dir = output_dir or FIGURES_DIR
102
+ output_dir.mkdir(parents=True, exist_ok=True)
103
+
104
+ agent_means = []
105
+ human_means = []
106
+ agent_sds = []
107
+ human_sds = []
108
+
109
+ for domain in DOMAINS:
110
+ vals = [r["scores"][domain] for r in results if domain in r["scores"]]
111
+ mean = sum(vals) / len(vals) if vals else 3.0
112
+ var = sum((v - mean) ** 2 for v in vals) / len(vals) if vals else 0
113
+ agent_means.append(mean)
114
+ agent_sds.append(var ** 0.5)
115
+ human_means.append(POPULATION_NORMS[domain]["mean"])
116
+ human_sds.append(POPULATION_NORMS[domain]["sd"])
117
+
118
+ x = np.arange(len(DOMAINS))
119
+ width = 0.35
120
+
121
+ fig, ax = plt.subplots(figsize=(10, 5))
122
+ ax.bar(x - width/2, agent_means, width, yerr=agent_sds, label="Agents",
123
+ color="steelblue", alpha=0.8, capsize=4)
124
+ ax.bar(x + width/2, human_means, width, yerr=human_sds, label="Humans (US norms)",
125
+ color="coral", alpha=0.8, capsize=4)
126
+
127
+ ax.set_xticks(x)
128
+ ax.set_xticklabels([d.capitalize() for d in DOMAINS])
129
+ ax.set_ylabel("Mean Score (1-5)")
130
+ ax.set_ylim(1, 5)
131
+ ax.legend()
132
+ ax.set_title("Agent Population vs. Human Norms: Big Five Means")
133
+
134
+ path = output_dir / "agent_vs_human.png"
135
+ fig.savefig(path, dpi=150, bbox_inches="tight")
136
+     plt.close(fig)
137
+ print(f"Saved: {path}")
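All three plotting functions read the same input shape: a list of dicts with a `scores` mapping from domain name to a 1-5 score and, for the radar chart, an optional `category`. The sketch below shows that shape with made-up numbers, along with the per-domain mean and population standard deviation that `plot_agent_vs_human` computes. The domain names are assumed to match replicant's `DOMAINS`, and `domain_mean_sd` is a hypothetical helper, not part of the package.

```python
# Made-up example records in the shape the plotting functions index into.
results = [
    {"category": "marketing", "scores": {"extraversion": 2.0, "openness": 4.0}},
    {"category": "marketing", "scores": {"extraversion": 3.0, "openness": 3.5}},
    {"category": "development", "scores": {"extraversion": 1.0}},
]


def domain_mean_sd(results: list[dict], domain: str) -> tuple[float, float]:
    """Mean and population SD (divide by n), skipping records without the domain."""
    vals = [r["scores"][domain] for r in results if domain in r["scores"]]
    if not vals:
        return 3.0, 0.0  # same neutral fallback as plot_agent_vs_human
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var ** 0.5


print(domain_mean_sd(results, "extraversion"))
print(domain_mean_sd(results, "openness"))
```

The error bars in the bar chart use this divide-by-n standard deviation rather than the sample (n - 1) form, so they come out slightly narrower for small populations.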