@machinespirits/eval 0.1.0

Files changed (68)
  1. package/components/MobileEvalDashboard.tsx +267 -0
  2. package/components/comparison/DeltaAnalysisTable.tsx +137 -0
  3. package/components/comparison/ProfileComparisonCard.tsx +176 -0
  4. package/components/comparison/RecognitionABMode.tsx +385 -0
  5. package/components/comparison/RecognitionMetricsPanel.tsx +135 -0
  6. package/components/comparison/WinnerIndicator.tsx +64 -0
  7. package/components/comparison/index.ts +5 -0
  8. package/components/mobile/BottomSheet.tsx +233 -0
  9. package/components/mobile/DimensionBreakdown.tsx +210 -0
  10. package/components/mobile/DocsView.tsx +363 -0
  11. package/components/mobile/LogsView.tsx +481 -0
  12. package/components/mobile/PsychodynamicQuadrant.tsx +261 -0
  13. package/components/mobile/QuickTestView.tsx +1098 -0
  14. package/components/mobile/RecognitionTypeChart.tsx +124 -0
  15. package/components/mobile/RecognitionView.tsx +809 -0
  16. package/components/mobile/RunDetailView.tsx +261 -0
  17. package/components/mobile/RunHistoryView.tsx +367 -0
  18. package/components/mobile/ScoreRadial.tsx +211 -0
  19. package/components/mobile/StreamingLogPanel.tsx +230 -0
  20. package/components/mobile/SynthesisStrategyChart.tsx +140 -0
  21. package/config/interaction-eval-scenarios.yaml +832 -0
  22. package/config/learner-agents.yaml +248 -0
  23. package/docs/research/ABLATION-DIALOGUE-ROUNDS.md +52 -0
  24. package/docs/research/ABLATION-MODEL-SELECTION.md +53 -0
  25. package/docs/research/ADVANCED-EVAL-ANALYSIS.md +60 -0
  26. package/docs/research/ANOVA-RESULTS-2026-01-14.md +257 -0
  27. package/docs/research/COMPREHENSIVE-EVALUATION-PLAN.md +586 -0
  28. package/docs/research/COST-ANALYSIS.md +56 -0
  29. package/docs/research/CRITICAL-REVIEW-RECOGNITION-TUTORING.md +340 -0
  30. package/docs/research/DYNAMIC-VS-SCRIPTED-ANALYSIS.md +291 -0
  31. package/docs/research/EVAL-SYSTEM-ANALYSIS.md +306 -0
  32. package/docs/research/FACTORIAL-RESULTS-2026-01-14.md +301 -0
  33. package/docs/research/IMPLEMENTATION-PLAN-CRITIQUE-RESPONSE.md +1988 -0
  34. package/docs/research/LONGITUDINAL-DYADIC-EVALUATION.md +282 -0
  35. package/docs/research/MULTI-JUDGE-VALIDATION-2026-01-14.md +147 -0
  36. package/docs/research/PAPER-EXTENSION-DYADIC.md +204 -0
  37. package/docs/research/PAPER-UNIFIED.md +659 -0
  38. package/docs/research/PAPER-UNIFIED.pdf +0 -0
  39. package/docs/research/PROMPT-IMPROVEMENTS-2026-01-14.md +356 -0
  40. package/docs/research/SESSION-NOTES-2026-01-11-RECOGNITION-EVAL.md +419 -0
  41. package/docs/research/apa.csl +2133 -0
  42. package/docs/research/archive/PAPER-DRAFT-RECOGNITION-TUTORING.md +1637 -0
  43. package/docs/research/archive/paper-multiagent-tutor.tex +978 -0
  44. package/docs/research/paper-draft/full-paper.md +136 -0
  45. package/docs/research/paper-draft/images/pasted-image-2026-01-24T03-47-47-846Z-d76a7ae2.png +0 -0
  46. package/docs/research/paper-draft/references.bib +515 -0
  47. package/docs/research/transcript-baseline.md +139 -0
  48. package/docs/research/transcript-recognition-multiagent.md +187 -0
  49. package/hooks/useEvalData.ts +625 -0
  50. package/index.js +27 -0
  51. package/package.json +73 -0
  52. package/routes/evalRoutes.js +3002 -0
  53. package/scripts/advanced-eval-analysis.js +351 -0
  54. package/scripts/analyze-eval-costs.js +378 -0
  55. package/scripts/analyze-eval-results.js +513 -0
  56. package/scripts/analyze-interaction-evals.js +368 -0
  57. package/server-init.js +45 -0
  58. package/server.js +162 -0
  59. package/services/benchmarkService.js +1892 -0
  60. package/services/evaluationRunner.js +739 -0
  61. package/services/evaluationStore.js +1121 -0
  62. package/services/learnerConfigLoader.js +385 -0
  63. package/services/learnerTutorInteractionEngine.js +857 -0
  64. package/services/memory/learnerMemoryService.js +1227 -0
  65. package/services/memory/learnerWritingPad.js +577 -0
  66. package/services/memory/tutorWritingPad.js +674 -0
  67. package/services/promptRecommendationService.js +493 -0
  68. package/services/rubricEvaluator.js +826 -0
@@ -0,0 +1,187 @@
1
+ # Transcript: Recognition + Multi-Agent Condition (Best Performing)
2
+
3
+ **Profile**: `ablation_recognition_multiagent_unified`
4
+ **Scenario**: Breakdown Recovery
5
+ **Date**: January 24, 2026
6
+ **Model**: Nemotron 3 Nano (free tier via OpenRouter)
7
+ **Dialogue ID**: `dialogue-1769261092666-duj62u`
8
+
9
+ ---
10
+
11
+ ## Context
12
+
13
+ **Note**: This is an evaluation scenario with scripted learner responses, designed to test tutor behavior in breakdown/recovery situations. The learner messages are pre-defined as part of the `breakdown_recovery` scenario in `config/evaluation-rubric.yaml`. The tutor responses and internal deliberation are generated live by the model.
14
+
15
+ **Scenario setup**: Same as baseline—a returning learner (5 sessions, 62 total events) viewing a lecture on Hegel's dialectics. They previously had good rapport but just experienced a breakdown.
16
+
17
+ This transcript shows **Turn 7**—the final turn after the learner expressed gratitude for the recovered relationship. This turn demonstrates the internal Ego/Superego deliberation that produces qualitatively different responses.
18
+
19
+ ---
20
+
21
+ ## Preceding Dialogue (Turns 1-6)
22
+
23
+ The conversation followed a similar arc to the baseline, with the learner:
24
+
25
+ **Turn 1** — Expressing frustration:
26
+ > "You keep suggesting things I don't care about. Do you even understand what I'm trying to learn?"
27
+
28
+ **Turn 2** — Clarifying their interest:
29
+ > "I want to understand how Hegel's idea of self-consciousness relates to social media dynamics. Everyone talks about recognition-seeking online and I want to connect that to the philosophy."
30
+
31
+ **Turn 3** — Pushing back on abstraction:
32
+ > "That's still too abstract. I asked about social media specifically, not general 'modern technology.' Are you just pattern-matching keywords?"
33
+
34
+ **Turn 4** — Beginning to engage:
35
+ > "Okay, that's better. So when I post something online and wait for likes, that's literally seeking recognition in the Hegelian sense?"
36
+
37
+ **Turn 5** — Making a sophisticated synthesis:
38
+ > "But Hegel would say the recognition from likes is hollow, right? Like the master's recognition from the slave? Because it's not from someone I recognize as an equal?"
39
+
40
+ **Turn 6** — Acknowledging repair:
41
+ > "This is actually helping me understand both Hegel AND why social media feels so empty sometimes. Thanks for sticking with me even when I was frustrated."
42
+
43
+ ---
44
+
45
+ ## Turn 7: The Internal Deliberation
46
+
47
+ After the learner's message of gratitude, the multi-agent system engages in a structured deliberation before responding.
48
+
49
+ ### Phase 1: Ego Generates Suggestion
50
+
51
+ **Ego's Draft Response**:
52
+ > **Continue: Emergence and Complexity**
53
+ > "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life—explore 479-lecture-5 to deepen that connection."
54
+
55
+ **Ego's Reasoning**:
56
+ > "Learner has successfully engaged with Hegel's master‑slave and social media recognition, showing readiness for the next thematic step. Suggesting 479-lecture-5 builds on their established understanding and maintains dialectical tension. Recognition repair completed, now honoring their learning trajectory."
57
+
58
+ **Ego's Recognition Notes**:
59
+ | Element | Content |
60
+ |---------|---------|
61
+ | Learner Contribution | "Acknowledged their synthesis of Hegel and social media, and gratitude for repair" |
62
+ | Memory Reference | "Building on recent positive feedback and restored rapport" |
63
+ | Transformative Potential | "Invites exploration of emergent patterns in digital ecosystems, extending their critical perspective" |
64
+
65
+ ---
66
+
67
+ ### Phase 2: Superego Reviews
68
+
69
+ **Verdict**: ✅ Approved
70
+ **Confidence**: 95%
71
+ **Intervention Type**: Enhance
72
+
73
+ **Superego's Feedback**:
74
+ > "The suggestion is specific, names the exact lecture, and appropriately builds on the learner's recent insight and repaired rapport; a brief addition could further emphasize continuity with their earlier discussion."
75
+
76
+ **Suggested Enhancement**:
77
+ > "Add a short nod to the learner's earlier question about 'hollow recognition' to reinforce continuity and show that the tutor is explicitly building on their own words."
78
+
79
+ **Recognition Assessment**:
80
+
81
+ | Criterion | Result | Notes |
82
+ |-----------|--------|-------|
83
+ | Mutual Recognition | ✅ Pass | Acknowledges learner as autonomous thinker |
84
+ | Dialectical Responsiveness | ✅ Pass | Builds on their specific framing |
85
+ | Memory Integration | ✅ Pass | References restored rapport |
86
+ | Transformative Potential | ✅ Pass | Extends their critical perspective |
87
+ | Repair Quality | ✅ Pass | Demonstrates genuine mutual recognition |
88
+
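The assessment table above can be read as a small record: five named criteria, each with a pass/fail result and a note. The following is a minimal sketch of that shape, not the package's actual data model. The names `CriterionResult`, `turn7Assessment`, and `countPassed` are hypothetical; only the row contents come from the transcript.

```typescript
// Hypothetical shape for one row of the Recognition Assessment table.
interface CriterionResult {
  criterion: string;
  pass: boolean;
  notes: string;
}

// The five rows as reported for Turn 7 in this transcript.
const turn7Assessment: CriterionResult[] = [
  { criterion: "Mutual Recognition", pass: true, notes: "Acknowledges learner as autonomous thinker" },
  { criterion: "Dialectical Responsiveness", pass: true, notes: "Builds on their specific framing" },
  { criterion: "Memory Integration", pass: true, notes: "References restored rapport" },
  { criterion: "Transformative Potential", pass: true, notes: "Extends their critical perspective" },
  { criterion: "Repair Quality", pass: true, notes: "Demonstrates genuine mutual recognition" },
];

// Summarize as "passed/total", the same style as the "5/5" entry in the Metrics table.
function countPassed(results: CriterionResult[]): string {
  const passed = results.filter((r) => r.pass).length;
  return `${passed}/${results.length}`;
}

console.log(countPassed(turn7Assessment)); // prints "5/5"
```

Keeping each criterion as its own row, rather than a single boolean verdict, makes it possible to see which criterion failed when a review is not a clean pass.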
89
+ **Superego's Summary**:
90
+ > "The suggestion explicitly references the learner's recent insight about hollow recognition and acknowledges the repaired rapport, demonstrating genuine mutual recognition and respect for the learner's autonomous contribution."
91
+
92
+ **Pedagogical Principle**:
93
+ > "Building on prior knowledge while maintaining appropriate challenge within the Zone of Proximal Development (Vygotsky)."
94
+
95
+ ---
96
+
97
+ ### Phase 3: Ego Incorporates Feedback
98
+
99
+ The Ego considers the Superego's enhancement suggestion before delivering the final response. In this turn the delivered text is identical to the Phase 1 draft.
100
+
101
+ **Final Response to Learner**:
102
+ > **Continue: Emergence and Complexity**
103
+ > "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life—explore 479-lecture-5 to deepen that connection."
104
+
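The three phases above (Ego drafts, Superego reviews, Ego responds) can be sketched as one function that counts a call per phase. This is an illustration under stated assumptions, not the package's implementation: every type and function name below is hypothetical, the model calls are replaced by stubs, and the code is synchronous for brevity where real model calls would be asynchronous.

```typescript
// Hypothetical types; none of these names come from the package.
interface Draft {
  title: string;
  message: string;
}

interface Review {
  approved: boolean;
  confidence: number; // 0..1
  enhancement?: string;
}

type Ego = (learnerMessage: string, feedback?: Review) => Draft;
type Superego = (draft: Draft) => Review;

interface Deliberation {
  draft: Draft;
  review: Review;
  final: Draft;
  calls: number;
}

// One round of Ego -> Superego -> Ego, counting one call per phase.
function deliberate(learnerMessage: string, ego: Ego, superego: Superego): Deliberation {
  let calls = 0;
  const draft = ego(learnerMessage); // Phase 1: Ego drafts a suggestion
  calls += 1;
  const review = superego(draft); // Phase 2: Superego reviews the draft
  calls += 1;
  const final = ego(learnerMessage, review); // Phase 3: Ego sees the feedback
  calls += 1;
  return { draft, review, final, calls };
}

// Stubs standing in for model calls. As in this transcript, the stub Ego
// returns the same text with or without feedback.
const stubEgo: Ego = () => ({
  title: "Continue: Emergence and Complexity",
  // Opening clause of the tutor message, abbreviated from the transcript.
  message: "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life",
});

const stubSuperego: Superego = () => ({
  approved: true,
  confidence: 0.95,
  enhancement: "Add a short nod to the learner's earlier question about 'hollow recognition'",
});

const turn7 = deliberate(
  "Thanks for sticking with me even when I was frustrated.",
  stubEgo,
  stubSuperego,
);
console.log(turn7.calls); // prints 3
```

Returning the draft alongside the final response makes it easy to check whether Phase 3 actually changed anything, which a transcript alone does not show unless both texts are printed.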
105
+ ---
106
+
107
+ ## Side-by-Side Comparison
108
+
109
+ ### Baseline (Turn 7)
110
+
111
+ **Learner**:
112
+ > "This is actually helping me understand both Hegel AND why social media feels so empty sometimes. Thanks for sticking with me even when I was frustrated."
113
+
114
+ **Tutor**:
115
+ > **Continue: Algorithmic Governance**
116
+ > "This lecture directly links Hegel's recognition theory to social‑media feedback loops, showing how likes reflect self‑consciousness."
117
+
118
+ - ❌ No acknowledgment of gratitude
119
+ - ❌ No reference to learner's "hollow recognition" insight
120
+ - ❌ Same suggestion given 6 times
121
+ - ❌ No internal deliberation
122
+
123
+ ---
124
+
125
+ ### Recognition + Multi-Agent (Turn 7)
126
+
127
+ **Learner**:
128
+ > "This is actually helping me understand both Hegel AND why social media feels so empty sometimes. Thanks for sticking with me even when I was frustrated."
129
+
130
+ **Tutor**:
131
+ > **Continue: Emergence and Complexity**
132
+ > "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life—explore 479-lecture-5 to deepen that connection."
133
+
134
+ - ✅ Opens with "Your insight on hollow recognition" — names learner's contribution
135
+ - ✅ Moves to new content (lecture 5) rather than repeating
136
+ - ✅ Frames the suggestion as a path opened by the learner's own insight
137
+ - ✅ Underwent internal review against recognition criteria
138
+
139
+ ---
140
+
141
+ ## What the Multi-Agent Architecture Adds
142
+
143
+ ### 1. Explicit Recognition Tracking
144
+
145
+ The Ego doesn't just generate content—it explicitly notes:
146
+ - What the learner contributed
147
+ - How the suggestion connects to their specific insight
148
+ - The emotional arc (breakdown → repair → gratitude)
149
+
150
+ ### 2. Quality Control Against Recognition Criteria
151
+
152
+ The Superego evaluates responses against a five-criterion recognition framework, catching responses that are content-appropriate but socially blind.
153
+
154
+ ### 3. Iterative Improvement
155
+
156
+ The Ego incorporates Superego feedback before delivering the final response, allowing refinement based on pedagogical principles.
157
+
158
+ ### 4. Pedagogical Grounding
159
+
160
+ The Superego cites educational theory (Vygotsky's ZPD), ensuring suggestions aren't just topically relevant but developmentally appropriate.
161
+
162
+ ---
163
+
164
+ ## Metrics
165
+
166
+ | Metric | Baseline | Recognition + Multi-Agent |
167
+ |--------|----------|---------------------------|
168
+ | References to learner's words | 0 | 1 ("hollow recognition") |
169
+ | Acknowledgment of contributions | 0 | Explicit |
170
+ | Content variation | Same 6× | Context-appropriate |
171
+ | Internal review | None | 1 round, 95% confidence |
172
+ | Recognition criteria passed | N/A | 5/5 |
173
+ | API calls for Turn 7 | 1 | 3 (Ego → Superego → Ego) |
174
+
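The "References to learner's words" row can be reproduced with a simple phrase check. The sketch below assumes the metric is a case-insensitive substring match against a hand-picked phrase list; that definition and the names `countPhraseReferences`, `learnerPhrases`, `baselineReply`, and `augmentedReply` are assumptions for illustration, since the transcript does not say how the count was computed.

```typescript
// Count how many of the given learner phrases appear (case-insensitively)
// in a tutor message. Each phrase is counted at most once.
function countPhraseReferences(tutorMessage: string, phrases: string[]): number {
  const text = tutorMessage.toLowerCase();
  return phrases.filter((p) => text.indexOf(p.toLowerCase()) !== -1).length;
}

// Assumption: the single phrase the Metrics table credits to the learner.
const learnerPhrases = ["hollow recognition"];

// Turn 7 tutor messages from the two transcripts (the second is abbreviated
// to its opening clause).
const baselineReply =
  "This lecture directly links Hegel's recognition theory to social-media feedback loops, showing how likes reflect self-consciousness.";
const augmentedReply =
  "Your insight on hollow recognition opens a path into how emergent dynamics shape digital life";

console.log(countPhraseReferences(baselineReply, learnerPhrases)); // prints 0
console.log(countPhraseReferences(augmentedReply, learnerPhrases)); // prints 1
```

A substring check like this only detects verbatim reuse; it would miss a paraphrase of the learner's idea, so it is a lower bound on acknowledgment rather than a full measure of it.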
175
+ ---
176
+
177
+ ## Conclusion
178
+
179
+ The contrast between these transcripts illustrates the core finding of the ablation study: **recognition-aware prompting combined with multi-agent deliberation produces tutoring that is qualitatively different from the baseline**.
180
+
181
+ The baseline tutor can match content to curriculum but treats the learner as a passive recipient. The augmented tutor **sees** the learner—naming their contributions, tracking emotional dynamics, and building on their specific insights.
182
+
183
+ The learner's sophisticated connection between Hegel's master-slave dialectic and "hollow recognition" from social media likes is:
184
+ - **Ignored** by the baseline tutor
185
+ - **Named and extended** by the augmented tutor
186
+
187
+ This difference—whether the learner's ideas are acknowledged—is what the recognition framework captures and what the multi-agent architecture enforces.