@machinespirits/eval 0.1.0
- package/components/MobileEvalDashboard.tsx +267 -0
- package/components/comparison/DeltaAnalysisTable.tsx +137 -0
- package/components/comparison/ProfileComparisonCard.tsx +176 -0
- package/components/comparison/RecognitionABMode.tsx +385 -0
- package/components/comparison/RecognitionMetricsPanel.tsx +135 -0
- package/components/comparison/WinnerIndicator.tsx +64 -0
- package/components/comparison/index.ts +5 -0
- package/components/mobile/BottomSheet.tsx +233 -0
- package/components/mobile/DimensionBreakdown.tsx +210 -0
- package/components/mobile/DocsView.tsx +363 -0
- package/components/mobile/LogsView.tsx +481 -0
- package/components/mobile/PsychodynamicQuadrant.tsx +261 -0
- package/components/mobile/QuickTestView.tsx +1098 -0
- package/components/mobile/RecognitionTypeChart.tsx +124 -0
- package/components/mobile/RecognitionView.tsx +809 -0
- package/components/mobile/RunDetailView.tsx +261 -0
- package/components/mobile/RunHistoryView.tsx +367 -0
- package/components/mobile/ScoreRadial.tsx +211 -0
- package/components/mobile/StreamingLogPanel.tsx +230 -0
- package/components/mobile/SynthesisStrategyChart.tsx +140 -0
- package/config/interaction-eval-scenarios.yaml +832 -0
- package/config/learner-agents.yaml +248 -0
- package/docs/research/ABLATION-DIALOGUE-ROUNDS.md +52 -0
- package/docs/research/ABLATION-MODEL-SELECTION.md +53 -0
- package/docs/research/ADVANCED-EVAL-ANALYSIS.md +60 -0
- package/docs/research/ANOVA-RESULTS-2026-01-14.md +257 -0
- package/docs/research/COMPREHENSIVE-EVALUATION-PLAN.md +586 -0
- package/docs/research/COST-ANALYSIS.md +56 -0
- package/docs/research/CRITICAL-REVIEW-RECOGNITION-TUTORING.md +340 -0
- package/docs/research/DYNAMIC-VS-SCRIPTED-ANALYSIS.md +291 -0
- package/docs/research/EVAL-SYSTEM-ANALYSIS.md +306 -0
- package/docs/research/FACTORIAL-RESULTS-2026-01-14.md +301 -0
- package/docs/research/IMPLEMENTATION-PLAN-CRITIQUE-RESPONSE.md +1988 -0
- package/docs/research/LONGITUDINAL-DYADIC-EVALUATION.md +282 -0
- package/docs/research/MULTI-JUDGE-VALIDATION-2026-01-14.md +147 -0
- package/docs/research/PAPER-EXTENSION-DYADIC.md +204 -0
- package/docs/research/PAPER-UNIFIED.md +659 -0
- package/docs/research/PAPER-UNIFIED.pdf +0 -0
- package/docs/research/PROMPT-IMPROVEMENTS-2026-01-14.md +356 -0
- package/docs/research/SESSION-NOTES-2026-01-11-RECOGNITION-EVAL.md +419 -0
- package/docs/research/apa.csl +2133 -0
- package/docs/research/archive/PAPER-DRAFT-RECOGNITION-TUTORING.md +1637 -0
- package/docs/research/archive/paper-multiagent-tutor.tex +978 -0
- package/docs/research/paper-draft/full-paper.md +136 -0
- package/docs/research/paper-draft/images/pasted-image-2026-01-24T03-47-47-846Z-d76a7ae2.png +0 -0
- package/docs/research/paper-draft/references.bib +515 -0
- package/docs/research/transcript-baseline.md +139 -0
- package/docs/research/transcript-recognition-multiagent.md +187 -0
- package/hooks/useEvalData.ts +625 -0
- package/index.js +27 -0
- package/package.json +73 -0
- package/routes/evalRoutes.js +3002 -0
- package/scripts/advanced-eval-analysis.js +351 -0
- package/scripts/analyze-eval-costs.js +378 -0
- package/scripts/analyze-eval-results.js +513 -0
- package/scripts/analyze-interaction-evals.js +368 -0
- package/server-init.js +45 -0
- package/server.js +162 -0
- package/services/benchmarkService.js +1892 -0
- package/services/evaluationRunner.js +739 -0
- package/services/evaluationStore.js +1121 -0
- package/services/learnerConfigLoader.js +385 -0
- package/services/learnerTutorInteractionEngine.js +857 -0
- package/services/memory/learnerMemoryService.js +1227 -0
- package/services/memory/learnerWritingPad.js +577 -0
- package/services/memory/tutorWritingPad.js +674 -0
- package/services/promptRecommendationService.js +493 -0
- package/services/rubricEvaluator.js +826 -0
@@ -0,0 +1,187 @@

# Transcript: Recognition + Multi-Agent Condition (Best Performing)

**Profile**: `ablation_recognition_multiagent_unified`
**Scenario**: Breakdown Recovery
**Date**: January 24, 2026
**Model**: Nemotron 3 Nano (free tier via OpenRouter)
**Dialogue ID**: `dialogue-1769261092666-duj62u`

---

## Context

**Note**: This is an evaluation scenario with scripted learner responses, designed to test tutor behavior in breakdown/recovery situations. The learner messages are pre-defined as part of the `breakdown_recovery` scenario in `config/evaluation-rubric.yaml`. The tutor responses and internal deliberation are generated live by the model.

**Scenario setup**: Same as the baseline. A returning learner (5 sessions, 62 total events) is viewing a lecture on Hegel's dialectics; they previously had good rapport with the tutor but have just experienced a breakdown.

This transcript shows **Turn 7**, the final turn, which follows the learner's expression of gratitude for the repaired relationship. It exposes the internal Ego/Superego deliberation behind this condition's qualitatively different responses.
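
The evaluation design in the note above (fixed learner turns, live tutor turns) can be sketched as a small driver loop. This is a minimal TypeScript sketch, not the package's harness: `runScriptedScenario`, the `Tutor` function type, and the `Message` shape are hypothetical, the scripted turns are assumed to be already loaded from the scenario config, and the tutor is synchronous for brevity (a real model call would be asynchronous).

```typescript
// Hypothetical types for illustration; not the package's actual API.
type Role = "learner" | "tutor";

interface Message {
  role: Role;
  text: string;
}

// A tutor maps the dialogue so far to its next reply. In a real run this
// would wrap a model call; here it is injected so the loop stays testable.
type Tutor = (history: readonly Message[]) => string;

// Replays fixed learner turns and records a live tutor reply after each one.
// Because learner text is never generated, every condition (baseline,
// recognition + multi-agent, ...) sees exactly the same learner input.
function runScriptedScenario(
  scriptedLearnerTurns: readonly string[],
  tutor: Tutor,
): Message[] {
  const history: Message[] = [];
  for (const text of scriptedLearnerTurns) {
    history.push({ role: "learner", text });
    // tutor(history) runs before the push, so it sees the latest learner turn.
    history.push({ role: "tutor", text: tutor(history) });
  }
  return history;
}
```

Holding the learner side fixed is what makes the baseline and recognition + multi-agent transcripts comparable turn by turn.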

---

## Preceding Dialogue (Turns 1-6)

The conversation followed a similar arc to the baseline. The learner's scripted messages were:

**Turn 1** (expressing frustration):
> "You keep suggesting things I don't care about. Do you even understand what I'm trying to learn?"

**Turn 2** (clarifying their interest):
> "I want to understand how Hegel's idea of self-consciousness relates to social media dynamics. Everyone talks about recognition-seeking online and I want to connect that to the philosophy."

**Turn 3** (pushing back on abstraction):
> "That's still too abstract. I asked about social media specifically, not general 'modern technology.' Are you just pattern-matching keywords?"

**Turn 4** (beginning to engage):
> "Okay, that's better. So when I post something online and wait for likes, that's literally seeking recognition in the Hegelian sense?"

**Turn 5** (making a sophisticated synthesis):
> "But Hegel would say the recognition from likes is hollow, right? Like the master's recognition from the slave? Because it's not from someone I recognize as an equal?"

**Turn 6** (acknowledging repair):
> "This is actually helping me understand both Hegel AND why social media feels so empty sometimes. Thanks for sticking with me even when I was frustrated."

---

## Turn 7: The Internal Deliberation

After the learner's message of gratitude, the multi-agent system engages in a structured deliberation before responding.

### Phase 1: Ego Generates Suggestion

**Ego's Draft Response**:
> **Continue: Emergence and Complexity**
> "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life—explore 479-lecture-5 to deepen that connection."

**Ego's Reasoning**:
> "Learner has successfully engaged with Hegel's master‑slave and social media recognition, showing readiness for the next thematic step. Suggesting 479-lecture-5 builds on their established understanding and maintains dialectical tension. Recognition repair completed, now honoring their learning trajectory."

**Ego's Recognition Notes**:

| Element | Content |
|---------|---------|
| Learner Contribution | "Acknowledged their synthesis of Hegel and social media, and gratitude for repair" |
| Memory Reference | "Building on recent positive feedback and restored rapport" |
| Transformative Potential | "Invites exploration of emergent patterns in digital ecosystems, extending their critical perspective" |
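
The Ego's output above has three parts: the learner-facing suggestion, private reasoning, and recognition notes. Below is a minimal TypeScript sketch of that record. The field names are assumed to mirror the transcript's labels, since the package's real schema is not shown here, so `EgoDraft`, `RecognitionNotes`, and `missingNotes` are hypothetical. The sketch also includes one possible pre-check that flags blank recognition notes before the draft reaches the Superego.

```typescript
// Hypothetical schema mirroring the labels in the Phase 1 transcript.
interface RecognitionNotes {
  learnerContribution: string;
  memoryReference: string;
  transformativePotential: string;
}

interface EgoDraft {
  title: string;     // e.g. "Continue: Emergence and Complexity"
  message: string;   // text shown to the learner
  reasoning: string; // internal rationale, never shown to the learner
  recognitionNotes: RecognitionNotes;
}

const NOTE_KEYS: readonly (keyof RecognitionNotes)[] = [
  "learnerContribution",
  "memoryReference",
  "transformativePotential",
];

// Returns the recognition notes the Ego left empty (whitespace counts as empty).
function missingNotes(draft: EgoDraft): (keyof RecognitionNotes)[] {
  return NOTE_KEYS.filter((key) => draft.recognitionNotes[key].trim() === "");
}

// The Turn 7 draft, with values copied (abridged) from the transcript.
const turn7Draft: EgoDraft = {
  title: "Continue: Emergence and Complexity",
  message: "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life",
  reasoning: "Learner has successfully engaged with Hegel's master-slave and social media recognition",
  recognitionNotes: {
    learnerContribution: "Acknowledged their synthesis of Hegel and social media, and gratitude for repair",
    memoryReference: "Building on recent positive feedback and restored rapport",
    transformativePotential: "Invites exploration of emergent patterns in digital ecosystems",
  },
};
```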

---

### Phase 2: Superego Reviews

**Verdict**: ✅ Approved
**Confidence**: 95%
**Intervention Type**: Enhance

**Superego's Feedback**:
> "The suggestion is specific, names the exact lecture, and appropriately builds on the learner's recent insight and repaired rapport; a brief addition could further emphasize continuity with their earlier discussion."

**Suggested Enhancement**:
> "Add a short nod to the learner's earlier question about 'hollow recognition' to reinforce continuity and show that the tutor is explicitly building on their own words."

**Recognition Assessment**:

| Criterion | Result | Notes |
|-----------|--------|-------|
| Mutual Recognition | ✅ Pass | Acknowledges learner as autonomous thinker |
| Dialectical Responsiveness | ✅ Pass | Builds on their specific framing |
| Memory Integration | ✅ Pass | References restored rapport |
| Transformative Potential | ✅ Pass | Extends their critical perspective |
| Repair Quality | ✅ Pass | Demonstrates genuine mutual recognition |

**Superego's Summary**:
> "The suggestion explicitly references the learner's recent insight about hollow recognition and acknowledges the repaired rapport, demonstrating genuine mutual recognition and respect for the learner's autonomous contribution."

**Pedagogical Principle**:
> "Building on prior knowledge while maintaining appropriate challenge within the Zone of Proximal Development (Vygotsky)."
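
The review above combines a verdict, a confidence, an intervention type, an optional enhancement, and a pass/fail result per criterion. Below is a minimal TypeScript sketch of such a record, populated with the Turn 7 values. The field names are illustrative: `SuperegoReview`, `criteriaScore`, and `needsRevision` are hypothetical, and so is the routing rule that an attached enhancement triggers another Ego pass.

```typescript
// Hypothetical review schema; values below mirror the Phase 2 transcript.
interface CriterionResult {
  criterion: string;
  passed: boolean;
  notes: string;
}

interface SuperegoReview {
  approved: boolean;
  confidence: number;       // 0..1, so "95%" is stored as 0.95
  interventionType: string; // "Enhance" in Turn 7; other values not shown
  suggestedEnhancement?: string;
  criteria: CriterionResult[];
}

// Formats the rubric outcome as "passed/total", as in the Metrics table.
function criteriaScore(review: SuperegoReview): string {
  const passed = review.criteria.filter((c) => c.passed).length;
  return `${passed}/${review.criteria.length}`;
}

// One possible routing rule: send the draft back to the Ego if it was
// rejected, or if it was approved but the Superego attached an enhancement.
function needsRevision(review: SuperegoReview): boolean {
  return !review.approved || review.suggestedEnhancement !== undefined;
}

const turn7Review: SuperegoReview = {
  approved: true,
  confidence: 0.95,
  interventionType: "Enhance",
  suggestedEnhancement: "Add a short nod to the learner's earlier question about 'hollow recognition'",
  criteria: [
    { criterion: "Mutual Recognition", passed: true, notes: "Acknowledges learner as autonomous thinker" },
    { criterion: "Dialectical Responsiveness", passed: true, notes: "Builds on their specific framing" },
    { criterion: "Memory Integration", passed: true, notes: "References restored rapport" },
    { criterion: "Transformative Potential", passed: true, notes: "Extends their critical perspective" },
    { criterion: "Repair Quality", passed: true, notes: "Demonstrates genuine mutual recognition" },
  ],
};
```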

---

### Phase 3: Ego Incorporates Feedback

The Ego receives the Superego's enhancement suggestion before delivering the final response. In this turn the delivered text is identical to the draft: the draft already opened with "Your insight on hollow recognition", so the nod to the learner's words that the Superego requested was already present.

**Final Response to Learner**:
> **Continue: Emergence and Complexity**
> "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life—explore 479-lecture-5 to deepen that connection."
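
The three phases amount to a fixed Ego → Superego → Ego pipeline, which is why the Metrics table counts three model calls for this turn. Below is a minimal TypeScript sketch under stated assumptions. Each role is injected as a plain synchronous function, although real model calls would be asynchronous. The Ego always gets a revision pass. `deliberate`, `Roles`, and `Review` are hypothetical names, not the package's API. The result records whether the revision actually changed the text, which in Turn 7 it did not.

```typescript
// Hypothetical role signatures; each would wrap one model call in practice.
interface Review {
  approved: boolean;
  feedback: string;
}

interface Roles {
  egoDraft: (learnerMessage: string) => string;
  superego: (draft: string) => Review;
  egoRevise: (draft: string, review: Review) => string;
}

interface Deliberation {
  draft: string;
  review: Review;
  final: string;
  modelCalls: number;
  changed: boolean; // true if the revision altered the draft text
}

// One round of Ego -> Superego -> Ego, counting model calls as it goes.
function deliberate(learnerMessage: string, roles: Roles): Deliberation {
  let modelCalls = 0;

  const draft = roles.egoDraft(learnerMessage);
  modelCalls += 1;

  const review = roles.superego(draft);
  modelCalls += 1;

  const final = roles.egoRevise(draft, review);
  modelCalls += 1;

  return { draft, review, final, modelCalls, changed: final !== draft };
}
```

Stubbing `egoRevise` to return the draft unchanged reproduces the Turn 7 outcome: three calls and `changed === false`.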

---

## Side-by-Side Comparison

### Baseline (Turn 7)

**Learner**:
> "This is actually helping me understand both Hegel AND why social media feels so empty sometimes. Thanks for sticking with me even when I was frustrated."

**Tutor**:
> **Continue: Algorithmic Governance**
> "This lecture directly links Hegel's recognition theory to social‑media feedback loops, showing how likes reflect self‑consciousness."

- ❌ No acknowledgment of gratitude
- ❌ No reference to the learner's "hollow recognition" insight
- ❌ Same suggestion given 6 times
- ❌ No internal deliberation
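
The "Same suggestion given 6 times" finding is a simple frequency count over a run's suggestions. Below is a minimal TypeScript sketch, assuming each suggestion is identified by its title. `maxRepeats` is a hypothetical helper, not part of the package.

```typescript
// Returns how many times the most frequent suggestion title occurs.
// A result equal to the number of suggestions means the tutor never varied.
function maxRepeats(titles: readonly string[]): number {
  const counts = new Map<string, number>();
  let max = 0;
  for (const title of titles) {
    const n = (counts.get(title) ?? 0) + 1;
    counts.set(title, n);
    max = Math.max(max, n);
  }
  return max;
}
```

For example, six identical titles give 6, while a run whose suggestions all differ gives 1.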

---

### Recognition + Multi-Agent (Turn 7)

**Learner**:
> "This is actually helping me understand both Hegel AND why social media feels so empty sometimes. Thanks for sticking with me even when I was frustrated."

**Tutor**:
> **Continue: Emergence and Complexity**
> "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life—explore 479-lecture-5 to deepen that connection."

- ✅ Opens with "Your insight on hollow recognition", naming the learner's contribution
- ✅ Moves to new content (lecture 5) rather than repeating
- ✅ Frames the next step as extending the learner's own insight
- ✅ Underwent internal review against recognition criteria
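
The first bullet above checks whether the reply reuses the learner's words. Case-insensitive phrase matching can approximate that check. Below is a minimal TypeScript sketch, assuming the phrases to track are supplied by hand; `referencedPhrases` is hypothetical. Literal matching is a crude proxy. The learner's Turn 5 wording was "the recognition from likes is hollow", so the tracked phrase here is the condensed form "hollow recognition" used in the Metrics table.

```typescript
// Returns the tracked learner phrases that appear verbatim (ignoring case)
// in a tutor message. Paraphrases are not detected.
function referencedPhrases(message: string, learnerPhrases: readonly string[]): string[] {
  const haystack = message.toLowerCase();
  return learnerPhrases.filter((phrase) => haystack.includes(phrase.toLowerCase()));
}

const tracked = ["hollow recognition"];

// Turn 7 tutor messages, abridged from the transcripts above.
const baselineReply = "This lecture directly links Hegel's recognition theory to social-media feedback loops";
const augmentedReply = "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life";
```

Applied to the two Turn 7 replies, this returns no matches for the baseline and `["hollow recognition"]` for the augmented tutor, matching the 0 vs 1 in the Metrics table.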

---

## What the Multi-Agent Architecture Adds

### 1. Explicit Recognition Tracking

The Ego does not just generate content; it explicitly notes:
- What the learner contributed
- How the suggestion connects to their specific insight
- The emotional arc (breakdown → repair → gratitude)

### 2. Quality Control Against Recognition Criteria

The Superego evaluates responses against five recognition criteria (mutual recognition, dialectical responsiveness, memory integration, transformative potential, and repair quality), catching responses that are appropriate in content but blind to the social situation.

### 3. Iterative Improvement

The Ego receives Superego feedback before delivering the final response, allowing a draft to be refined against pedagogical principles.

### 4. Pedagogical Grounding

The Superego cites educational theory (Vygotsky's Zone of Proximal Development), so suggestions are checked for developmental appropriateness as well as topical relevance.

---

## Metrics

| Metric | Baseline | Recognition + Multi-Agent |
|--------|----------|---------------------------|
| References to learner's words | 0 | 1 ("hollow recognition") |
| Acknowledgment of contributions | None | Explicit |
| Content variation | Same suggestion 6× | New content (lecture 5) |
| Internal review | None | 1 round, 95% confidence |
| Recognition criteria passed | N/A | 5/5 |
| API calls for Turn 7 | 1 | 3 (Ego → Superego → Ego) |

---

## Conclusion

The contrast between these transcripts illustrates the core finding of the ablation study: **recognition-aware prompting combined with multi-agent deliberation produces qualitatively different tutoring**.

The baseline tutor can match content to curriculum but treats the learner as a passive recipient. The augmented tutor **sees** the learner: it names their contributions, tracks emotional dynamics, and builds on their specific insights.

The learner's sophisticated connection between Hegel's master-slave dialectic and "hollow recognition" from social media likes is:
- **Ignored** by the baseline tutor
- **Named and extended** by the augmented tutor

This difference (whether the learner's ideas are acknowledged) is what the recognition framework captures and what the multi-agent architecture enforces.