@machinespirits/eval 0.1.2 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (102)
  1. package/LICENSE +21 -0
  2. package/README.md +161 -0
  3. package/config/eval-settings.yaml +18 -0
  4. package/config/evaluation-rubric-learner.yaml +277 -0
  5. package/config/evaluation-rubric.yaml +613 -0
  6. package/config/interaction-eval-scenarios.yaml +93 -50
  7. package/config/learner-agents.yaml +124 -193
  8. package/config/machinespirits-eval.code-workspace +11 -0
  9. package/config/providers.yaml +60 -0
  10. package/config/suggestion-scenarios.yaml +1399 -0
  11. package/config/tutor-agents.yaml +716 -0
  12. package/docs/EVALUATION-VARIABLES.md +589 -0
  13. package/docs/REPLICATION-PLAN.md +577 -0
  14. package/index.js +15 -6
  15. package/package.json +16 -22
  16. package/routes/evalRoutes.js +88 -36
  17. package/scripts/analyze-judge-reliability.js +401 -0
  18. package/scripts/analyze-run.js +97 -0
  19. package/scripts/analyze-run.mjs +282 -0
  20. package/scripts/analyze-validation-failures.js +141 -0
  21. package/scripts/check-run.mjs +17 -0
  22. package/scripts/code-impasse-strategies.js +1132 -0
  23. package/scripts/compare-runs.js +44 -0
  24. package/scripts/compare-suggestions.js +80 -0
  25. package/scripts/compare-transformation.js +116 -0
  26. package/scripts/dig-into-run.js +158 -0
  27. package/scripts/eval-cli.js +2626 -0
  28. package/scripts/generate-paper-figures.py +452 -0
  29. package/scripts/qualitative-analysis-ai.js +1313 -0
  30. package/scripts/qualitative-analysis.js +688 -0
  31. package/scripts/seed-db.js +87 -0
  32. package/scripts/show-failed-suggestions.js +64 -0
  33. package/scripts/validate-content.js +192 -0
  34. package/server.js +3 -2
  35. package/services/__tests__/evalConfigLoader.test.js +338 -0
  36. package/services/anovaStats.js +499 -0
  37. package/services/contentResolver.js +407 -0
  38. package/services/dialogueTraceAnalyzer.js +454 -0
  39. package/services/evalConfigLoader.js +625 -0
  40. package/services/evaluationRunner.js +2171 -270
  41. package/services/evaluationStore.js +564 -29
  42. package/services/learnerConfigLoader.js +75 -5
  43. package/services/learnerRubricEvaluator.js +284 -0
  44. package/services/learnerTutorInteractionEngine.js +375 -0
  45. package/services/processUtils.js +18 -0
  46. package/services/progressLogger.js +98 -0
  47. package/services/promptRecommendationService.js +31 -26
  48. package/services/promptRewriter.js +427 -0
  49. package/services/rubricEvaluator.js +543 -70
  50. package/services/streamingReporter.js +104 -0
  51. package/services/turnComparisonAnalyzer.js +494 -0
  52. package/components/MobileEvalDashboard.tsx +0 -267
  53. package/components/comparison/DeltaAnalysisTable.tsx +0 -137
  54. package/components/comparison/ProfileComparisonCard.tsx +0 -176
  55. package/components/comparison/RecognitionABMode.tsx +0 -385
  56. package/components/comparison/RecognitionMetricsPanel.tsx +0 -135
  57. package/components/comparison/WinnerIndicator.tsx +0 -64
  58. package/components/comparison/index.ts +0 -5
  59. package/components/mobile/BottomSheet.tsx +0 -233
  60. package/components/mobile/DimensionBreakdown.tsx +0 -210
  61. package/components/mobile/DocsView.tsx +0 -363
  62. package/components/mobile/LogsView.tsx +0 -481
  63. package/components/mobile/PsychodynamicQuadrant.tsx +0 -261
  64. package/components/mobile/QuickTestView.tsx +0 -1098
  65. package/components/mobile/RecognitionTypeChart.tsx +0 -124
  66. package/components/mobile/RecognitionView.tsx +0 -809
  67. package/components/mobile/RunDetailView.tsx +0 -261
  68. package/components/mobile/RunHistoryView.tsx +0 -367
  69. package/components/mobile/ScoreRadial.tsx +0 -211
  70. package/components/mobile/StreamingLogPanel.tsx +0 -230
  71. package/components/mobile/SynthesisStrategyChart.tsx +0 -140
  72. package/docs/research/ABLATION-DIALOGUE-ROUNDS.md +0 -52
  73. package/docs/research/ABLATION-MODEL-SELECTION.md +0 -53
  74. package/docs/research/ADVANCED-EVAL-ANALYSIS.md +0 -60
  75. package/docs/research/ANOVA-RESULTS-2026-01-14.md +0 -257
  76. package/docs/research/COMPREHENSIVE-EVALUATION-PLAN.md +0 -586
  77. package/docs/research/COST-ANALYSIS.md +0 -56
  78. package/docs/research/CRITICAL-REVIEW-RECOGNITION-TUTORING.md +0 -340
  79. package/docs/research/DYNAMIC-VS-SCRIPTED-ANALYSIS.md +0 -291
  80. package/docs/research/EVAL-SYSTEM-ANALYSIS.md +0 -306
  81. package/docs/research/FACTORIAL-RESULTS-2026-01-14.md +0 -301
  82. package/docs/research/IMPLEMENTATION-PLAN-CRITIQUE-RESPONSE.md +0 -1988
  83. package/docs/research/LONGITUDINAL-DYADIC-EVALUATION.md +0 -282
  84. package/docs/research/MULTI-JUDGE-VALIDATION-2026-01-14.md +0 -147
  85. package/docs/research/PAPER-EXTENSION-DYADIC.md +0 -204
  86. package/docs/research/PAPER-UNIFIED.md +0 -659
  87. package/docs/research/PAPER-UNIFIED.pdf +0 -0
  88. package/docs/research/PROMPT-IMPROVEMENTS-2026-01-14.md +0 -356
  89. package/docs/research/SESSION-NOTES-2026-01-11-RECOGNITION-EVAL.md +0 -419
  90. package/docs/research/apa.csl +0 -2133
  91. package/docs/research/archive/PAPER-DRAFT-RECOGNITION-TUTORING.md +0 -1637
  92. package/docs/research/archive/paper-multiagent-tutor.tex +0 -978
  93. package/docs/research/paper-draft/full-paper.md +0 -136
  94. package/docs/research/paper-draft/images/pasted-image-2026-01-24T03-47-47-846Z-d76a7ae2.png +0 -0
  95. package/docs/research/paper-draft/references.bib +0 -515
  96. package/docs/research/transcript-baseline.md +0 -139
  97. package/docs/research/transcript-recognition-multiagent.md +0 -187
  98. package/hooks/useEvalData.ts +0 -625
  99. package/server-init.js +0 -45
  100. package/services/benchmarkService.js +0 -1892
  101. package/types.ts +0 -165
  102. package/utils/haptics.ts +0 -45
@@ -1,211 +0,0 @@
1
- /**
2
- * ScoreRadial Component
3
- *
4
- * A premium circular gauge displaying the overall evaluation score with pass/fail indication.
5
- * Features gradient strokes, glow effects, and smooth animations.
6
- * Mobile-optimized with clear visual feedback.
7
- */
8
-
9
- import React, { useEffect, useState } from 'react';
10
-
11
- interface ScoreRadialProps {
12
- score: number | null;
13
- passed: boolean;
14
- size?: number;
15
- animate?: boolean;
16
- }
17
-
18
- export const ScoreRadial: React.FC<ScoreRadialProps> = ({
19
- score,
20
- passed,
21
- size = 120,
22
- animate = true
23
- }) => {
24
- const [animatedScore, setAnimatedScore] = useState(animate ? 0 : (score ?? 0));
25
- const percentage = score !== null ? animatedScore : 0;
26
- const radius = (size - 16) / 2; // Account for stroke width
27
- const circumference = 2 * Math.PI * radius;
28
- const strokeDashoffset = circumference - (percentage / 100) * circumference;
29
-
30
- // Unique ID for gradient definitions
31
- const gradientId = `score-gradient-${passed ? 'pass' : 'fail'}`;
32
- const glowId = `score-glow-${passed ? 'pass' : 'fail'}`;
33
-
34
- // Animate score on mount/change
35
- useEffect(() => {
36
- if (!animate || score === null) {
37
- setAnimatedScore(score ?? 0);
38
- return;
39
- }
40
-
41
- const duration = 1000;
42
- const startTime = Date.now();
43
- const startValue = animatedScore;
44
- const endValue = score;
45
-
46
- const animateValue = () => {
47
- const elapsed = Date.now() - startTime;
48
- const progress = Math.min(elapsed / duration, 1);
49
- // Ease out cubic
50
- const eased = 1 - Math.pow(1 - progress, 3);
51
- const current = startValue + (endValue - startValue) * eased;
52
-
53
- setAnimatedScore(current);
54
-
55
- if (progress < 1) {
56
- requestAnimationFrame(animateValue);
57
- }
58
- };
59
-
60
- requestAnimationFrame(animateValue);
61
- }, [score, animate]);
62
-
63
- // Get quality label based on score
64
- const getQualityLabel = (s: number): string => {
65
- if (s >= 80) return 'Excellent';
66
- if (s >= 60) return 'Good';
67
- if (s >= 40) return 'Fair';
68
- return 'Needs Work';
69
- };
70
-
71
- return (
72
- <div
73
- className="relative flex items-center justify-center"
74
- style={{ width: size, height: size }}
75
- >
76
- {/* Outer glow effect */}
77
- {score !== null && (
78
- <div
79
- className={`absolute inset-0 rounded-full transition-opacity duration-1000 ${
80
- passed ? 'bg-green-500/10' : 'bg-red-500/10'
81
- } ${percentage > 0 ? 'opacity-100' : 'opacity-0'}`}
82
- style={{
83
- boxShadow: passed
84
- ? '0 0 40px rgba(34, 197, 94, 0.2), inset 0 0 20px rgba(34, 197, 94, 0.05)'
85
- : '0 0 40px rgba(230, 57, 70, 0.2), inset 0 0 20px rgba(230, 57, 70, 0.05)'
86
- }}
87
- />
88
- )}
89
-
90
- <svg className="transform -rotate-90" width={size} height={size}>
91
- {/* Gradient definitions */}
92
- <defs>
93
- <linearGradient id={gradientId} x1="0%" y1="0%" x2="100%" y2="100%">
94
- {passed ? (
95
- <>
96
- <stop offset="0%" stopColor="#22c55e" />
97
- <stop offset="50%" stopColor="#4ade80" />
98
- <stop offset="100%" stopColor="#22c55e" />
99
- </>
100
- ) : (
101
- <>
102
- <stop offset="0%" stopColor="#E63946" />
103
- <stop offset="50%" stopColor="#f87171" />
104
- <stop offset="100%" stopColor="#c1121f" />
105
- </>
106
- )}
107
- </linearGradient>
108
- <filter id={glowId} x="-50%" y="-50%" width="200%" height="200%">
109
- <feGaussianBlur stdDeviation="3" result="coloredBlur" />
110
- <feMerge>
111
- <feMergeNode in="coloredBlur" />
112
- <feMergeNode in="SourceGraphic" />
113
- </feMerge>
114
- </filter>
115
- </defs>
116
-
117
- {/* Background track - glass effect */}
118
- <circle
119
- cx={size / 2}
120
- cy={size / 2}
121
- r={radius}
122
- stroke="rgba(31, 41, 55, 0.8)"
123
- strokeWidth="10"
124
- fill="none"
125
- />
126
-
127
- {/* Inner subtle ring */}
128
- <circle
129
- cx={size / 2}
130
- cy={size / 2}
131
- r={radius - 6}
132
- stroke="rgba(255, 255, 255, 0.03)"
133
- strokeWidth="1"
134
- fill="none"
135
- />
136
-
137
- {/* Progress arc with gradient */}
138
- {score !== null && (
139
- <circle
140
- cx={size / 2}
141
- cy={size / 2}
142
- r={radius}
143
- stroke={`url(#${gradientId})`}
144
- strokeWidth="10"
145
- fill="none"
146
- strokeLinecap="round"
147
- strokeDasharray={circumference}
148
- strokeDashoffset={strokeDashoffset}
149
- filter={percentage >= 60 ? `url(#${glowId})` : undefined}
150
- className="transition-all duration-100"
151
- />
152
- )}
153
-
154
- {/* Decorative end cap glow */}
155
- {score !== null && percentage > 5 && (
156
- <circle
157
- cx={size / 2 + radius * Math.cos((2 * Math.PI * percentage) / 100 - Math.PI / 2)}
158
- cy={size / 2 + radius * Math.sin((2 * Math.PI * percentage) / 100 - Math.PI / 2)}
159
- r="5"
160
- fill={passed ? '#4ade80' : '#f87171'}
161
- opacity="0.6"
162
- className="animate-pulse"
163
- />
164
- )}
165
- </svg>
166
-
167
- {/* Center content */}
168
- <div className="absolute inset-0 flex flex-col items-center justify-center">
169
- {score !== null ? (
170
- <>
171
- {/* Score value */}
172
- <div className="flex items-baseline gap-0.5">
173
- <span className="text-4xl font-bold text-white tabular-nums">
174
- {Math.round(animatedScore)}
175
- </span>
176
- <span className="text-lg text-gray-500 font-medium">%</span>
177
- </div>
178
-
179
- {/* Pass/Fail badge */}
180
- <div
181
- className={`mt-1 px-3 py-0.5 rounded-full text-[10px] font-bold uppercase tracking-wider
182
- ${passed
183
- ? 'bg-green-500/20 text-green-400 border border-green-500/30'
184
- : 'bg-red-500/20 text-red-400 border border-red-500/30'
185
- }`}
186
- >
187
- {passed ? 'Pass' : 'Fail'}
188
- </div>
189
-
190
- {/* Quality label */}
191
- {size >= 140 && (
192
- <span className="mt-2 text-[10px] text-gray-500 font-medium">
193
- {getQualityLabel(score)}
194
- </span>
195
- )}
196
- </>
197
- ) : (
198
- <div className="flex flex-col items-center gap-1">
199
- <svg className="w-8 h-8 text-gray-600" fill="none" viewBox="0 0 24 24" stroke="currentColor">
200
- <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5}
201
- d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
202
- </svg>
203
- <span className="text-gray-500 text-xs font-medium">No score</span>
204
- </div>
205
- )}
206
- </div>
207
- </div>
208
- );
209
- };
210
-
211
- export default ScoreRadial;
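The arithmetic behind the removed `ScoreRadial` gauge can be isolated from React. The sketch below reproduces the arc geometry (`radius`, `circumference`, `strokeDashoffset`), the ease-out cubic interpolation, and the quality thresholds as pure functions. The helper names `arcGeometry`, `easeOutCubic`, `interpolateScore`, and `qualityLabel` are hypothetical; only the formulas and thresholds come from the deleted file.

```typescript
// Hypothetical helpers; only the formulas are taken from the removed component.
interface ArcGeometry {
  radius: number;
  circumference: number;
  strokeDashoffset: number;
}

// The component subtracts 16 from the size to leave room for the stroke.
function arcGeometry(size: number, percentage: number): ArcGeometry {
  const radius = (size - 16) / 2;
  const circumference = 2 * Math.PI * radius;
  // An offset equal to the circumference hides the arc; an offset of 0 shows all of it.
  const strokeDashoffset = circumference - (percentage / 100) * circumference;
  return { radius, circumference, strokeDashoffset };
}

// Ease-out cubic: fast at the start, slowing towards the end value.
function easeOutCubic(progress: number): number {
  const clamped = Math.min(Math.max(progress, 0), 1);
  return 1 - Math.pow(1 - clamped, 3);
}

// Value shown after elapsedMs of an animation from start to end.
function interpolateScore(
  start: number,
  end: number,
  elapsedMs: number,
  durationMs: number = 1000
): number {
  return start + (end - start) * easeOutCubic(elapsedMs / durationMs);
}

// Same thresholds as getQualityLabel in the removed file.
function qualityLabel(score: number): string {
  if (score >= 80) return 'Excellent';
  if (score >= 60) return 'Good';
  if (score >= 40) return 'Fair';
  return 'Needs Work';
}
```

The sketch does not cover frame scheduling. The removed effect calls `requestAnimationFrame` without returning a cleanup function, so anyone reusing the pattern would probably want to store the frame id and call `cancelAnimationFrame` when the effect is torn down.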
@@ -1,230 +0,0 @@
1
- /**
2
- * StreamingLogPanel Component
3
- *
4
- * Displays real-time streaming logs during test execution.
5
- * Premium glass morphism styling with visual log type indicators.
6
- * Expandable panel that auto-scrolls to latest content.
7
- */
8
-
9
- import React, { useRef, useEffect, useState } from 'react';
10
- import type { StreamLog } from '../../hooks/useEvalData';
11
- import haptics from '../../utils/haptics';
12
-
13
- interface StreamingLogPanelProps {
14
- logs: StreamLog[];
15
- isRunning: boolean;
16
- }
17
-
18
- // Log type configurations with styling
19
- interface LogTypeConfig {
20
- color: string;
21
- bgColor: string;
22
- icon: React.ReactNode;
23
- }
24
-
25
- const LogTypeIcons: Record<StreamLog['type'] | 'info', LogTypeConfig> = {
26
- success: {
27
- color: 'text-green-400',
28
- bgColor: 'bg-green-500/10',
29
- icon: (
30
- <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
31
- <path strokeLinecap="round" strokeLinejoin="round" d="M5 13l4 4L19 7" />
32
- </svg>
33
- )
34
- },
35
- warning: {
36
- color: 'text-yellow-400',
37
- bgColor: 'bg-yellow-500/10',
38
- icon: (
39
- <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
40
- <path strokeLinecap="round" strokeLinejoin="round" d="M12 9v2m0 4h.01m-6.938 4h13.856c1.54 0 2.502-1.667 1.732-3L13.732 4c-.77-1.333-2.694-1.333-3.464 0L3.34 16c-.77 1.333.192 3 1.732 3z" />
41
- </svg>
42
- )
43
- },
44
- error: {
45
- color: 'text-red-400',
46
- bgColor: 'bg-red-500/10',
47
- icon: (
48
- <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
49
- <path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
50
- </svg>
51
- )
52
- },
53
- progress: {
54
- color: 'text-blue-400',
55
- bgColor: 'bg-blue-500/10',
56
- icon: (
57
- <svg className="w-3.5 h-3.5 animate-spin" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
58
- <path strokeLinecap="round" strokeLinejoin="round" d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15" />
59
- </svg>
60
- )
61
- },
62
- info: {
63
- color: 'text-gray-400',
64
- bgColor: 'bg-gray-500/10',
65
- icon: (
66
- <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
67
- <path strokeLinecap="round" strokeLinejoin="round" d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
68
- </svg>
69
- )
70
- }
71
- };
72
-
73
- export const StreamingLogPanel: React.FC<StreamingLogPanelProps> = ({
74
- logs,
75
- isRunning
76
- }) => {
77
- const [isExpanded, setIsExpanded] = useState(true);
78
- const bottomRef = useRef<HTMLDivElement>(null);
79
- const containerRef = useRef<HTMLDivElement>(null);
80
-
81
- // Auto-scroll to bottom when new logs arrive
82
- useEffect(() => {
83
- if (bottomRef.current && containerRef.current) {
84
- bottomRef.current.scrollIntoView({ behavior: 'smooth' });
85
- }
86
- }, [logs]);
87
-
88
- // Auto-expand when running starts
89
- useEffect(() => {
90
- if (isRunning) {
91
- setIsExpanded(true);
92
- }
93
- }, [isRunning]);
94
-
95
- const getLogConfig = (type: StreamLog['type']): LogTypeConfig => {
96
- return LogTypeIcons[type] || LogTypeIcons.info;
97
- };
98
-
99
- // Count log types for summary
100
- const logCounts = logs.reduce((acc, log) => {
101
- acc[log.type] = (acc[log.type] || 0) + 1;
102
- return acc;
103
- }, {} as Record<string, number>);
104
-
105
- if (logs.length === 0 && !isRunning) {
106
- return null;
107
- }
108
-
109
- return (
110
- <div className="border-t border-white/5 bg-gray-900/30 backdrop-blur-sm">
111
- {/* Toggle header - Glass styling */}
112
- <button
113
- type="button"
114
- onClick={() => {
115
- haptics.light();
116
- setIsExpanded(!isExpanded);
117
- }}
118
- className="w-full flex items-center justify-between p-3.5
119
- hover:bg-white/5 active:scale-[0.995] transition-all duration-150"
120
- >
121
- <div className="flex items-center gap-3">
122
- {/* Live indicator */}
123
- {isRunning && (
124
- <div className="relative">
125
- <span className="absolute inset-0 w-2.5 h-2.5 bg-green-500 rounded-full animate-ping opacity-75" />
126
- <span className="relative w-2.5 h-2.5 bg-green-500 rounded-full block" />
127
- </div>
128
- )}
129
-
130
- {/* Title */}
131
- <span className="text-sm font-medium text-gray-300">
132
- {isRunning ? 'Live Output' : 'Output'}
133
- </span>
134
-
135
- {/* Log type counts - Mini badges */}
136
- <div className="flex items-center gap-1.5">
137
- {logCounts.error && logCounts.error > 0 && (
138
- <span className="px-1.5 py-0.5 rounded text-[10px] font-medium bg-red-500/20 text-red-400 border border-red-500/30">
139
- {logCounts.error} error{logCounts.error > 1 ? 's' : ''}
140
- </span>
141
- )}
142
- {logCounts.warning && logCounts.warning > 0 && (
143
- <span className="px-1.5 py-0.5 rounded text-[10px] font-medium bg-yellow-500/20 text-yellow-400 border border-yellow-500/30">
144
- {logCounts.warning} warn
145
- </span>
146
- )}
147
- <span className="px-1.5 py-0.5 rounded text-[10px] font-medium bg-white/5 text-gray-500 border border-white/5">
148
- {logs.length} total
149
- </span>
150
- </div>
151
- </div>
152
-
153
- {/* Expand indicator */}
154
- <div className={`w-7 h-7 rounded-full bg-white/5 flex items-center justify-center
155
- transition-all duration-200 ${isExpanded ? 'bg-white/10' : ''}`}>
156
- <svg
157
- className={`w-4 h-4 text-gray-500 transition-transform duration-200 ${isExpanded ? 'rotate-180' : ''}`}
158
- fill="none"
159
- viewBox="0 0 24 24"
160
- stroke="currentColor"
161
- >
162
- <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 15l7-7 7 7" />
163
- </svg>
164
- </div>
165
- </button>
166
-
167
- {/* Log content - Terminal style with glass */}
168
- <div
169
- ref={containerRef}
170
- className={`overflow-hidden transition-all duration-300 ease-out
171
- ${isExpanded ? 'max-h-80' : 'max-h-0'}`}
172
- >
173
- <div className="h-full overflow-y-auto bg-black/40 backdrop-blur-sm p-3 font-mono text-xs leading-relaxed
174
- border-t border-white/5 scrollbar-hide">
175
-
176
- {/* Log entries */}
177
- {logs.map((log, i) => {
178
- const config = getLogConfig(log.type);
179
- return (
180
- <div
181
- key={i}
182
- className={`flex items-start gap-2 py-1 px-2 -mx-2 rounded
183
- ${config.bgColor} ${config.color}
184
- animate-fade-in`}
185
- style={{ animationDelay: `${Math.min(i * 20, 200)}ms` }}
186
- >
187
- {/* Icon */}
188
- <span className="flex-shrink-0 mt-0.5 w-4 flex items-center justify-center">
189
- {config.icon}
190
- </span>
191
-
192
- {/* Message */}
193
- <span className="whitespace-pre-wrap break-words flex-1">
194
- {log.message}
195
- </span>
196
-
197
- {/* Timestamp for progress logs */}
198
- {log.type === 'progress' && (
199
- <span className="text-[10px] text-gray-600 flex-shrink-0 tabular-nums">
200
- {new Date().toLocaleTimeString(undefined, {
201
- hour: '2-digit',
202
- minute: '2-digit',
203
- second: '2-digit'
204
- })}
205
- </span>
206
- )}
207
- </div>
208
- );
209
- })}
210
-
211
- {/* Typing indicator while running */}
212
- {isRunning && (
213
- <div className="flex items-center gap-1.5 mt-3 pt-2 border-t border-white/5">
214
- <div className="flex items-center gap-1">
215
- <span className="w-1.5 h-1.5 rounded-full bg-brand-red animate-pulse" />
216
- <span className="w-1.5 h-1.5 rounded-full bg-brand-red animate-pulse" style={{ animationDelay: '0.2s' }} />
217
- <span className="w-1.5 h-1.5 rounded-full bg-brand-red animate-pulse" style={{ animationDelay: '0.4s' }} />
218
- </div>
219
- <span className="text-[10px] text-gray-600 ml-1">Processing...</span>
220
- </div>
221
- )}
222
-
223
- <div ref={bottomRef} />
224
- </div>
225
- </div>
226
- </div>
227
- );
228
- };
229
-
230
- export default StreamingLogPanel;
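The header badges in the removed `StreamingLogPanel` come from a single `reduce` over the log array. The sketch below shows that tally and the badge text as plain functions. The `StreamLog` shape is an assumption, because the real type lives in `hooks/useEvalData` and is not shown in this hunk. `countByType` and `summaryBadges` are hypothetical names.

```typescript
// Assumed shape: the real StreamLog type is defined in hooks/useEvalData.
type LogType = 'success' | 'warning' | 'error' | 'progress' | 'info';

interface StreamLog {
  type: LogType;
  message: string;
}

// Tally logs per type, as the removed component does with reduce.
function countByType(logs: StreamLog[]): Record<string, number> {
  return logs.reduce((acc, log) => {
    acc[log.type] = (acc[log.type] || 0) + 1;
    return acc;
  }, {} as Record<string, number>);
}

// Badge labels in the order the header renders them: errors, warnings, total.
function summaryBadges(logs: StreamLog[]): string[] {
  const counts = countByType(logs);
  const errors = counts['error'] ?? 0;
  const warnings = counts['warning'] ?? 0;
  const badges: string[] = [];
  if (errors > 0) badges.push(`${errors} error${errors > 1 ? 's' : ''}`);
  if (warnings > 0) badges.push(`${warnings} warn`);
  badges.push(`${logs.length} total`);
  return badges;
}
```

One behaviour of the removed file that is easy to miss: the timestamp beside `progress` entries is produced by `new Date()` at render time, so it reflects when the row was last rendered rather than when the log arrived. A reimplementation would likely need a timestamp stored on each log entry instead.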
@@ -1,140 +0,0 @@
1
- /**
2
- * SynthesisStrategyChart Component
3
- *
4
- * Horizontal bars showing distribution of synthesis strategies:
5
- * - Ghost Dominates (red) - Superego/authority wins
6
- * - Learner Dominates (blue) - Learner needs prioritized
7
- * - Dialectical Synthesis (gold/green) - True mutual recognition
8
- */
9
-
10
- import React from 'react';
11
-
12
- interface SynthesisStrategyCounts {
13
- ghost_dominates: number;
14
- learner_dominates: number;
15
- dialectical_synthesis: number;
16
- }
17
-
18
- interface SynthesisStrategyChartProps {
19
- counts: SynthesisStrategyCounts;
20
- }
21
-
22
- export const SynthesisStrategyChart: React.FC<SynthesisStrategyChartProps> = ({
23
- counts,
24
- }) => {
25
- const total =
26
- counts.ghost_dominates + counts.learner_dominates + counts.dialectical_synthesis;
27
-
28
- if (total === 0) {
29
- return (
30
- <div className="bg-gray-900/60 backdrop-blur-sm border border-white/5 rounded-xl p-4">
31
- <div className="text-xs text-gray-400 mb-3">Synthesis Strategies</div>
32
- <div className="text-sm text-gray-500 text-center py-4">
33
- No synthesis data recorded
34
- </div>
35
- </div>
36
- );
37
- }
38
-
39
- const strategies = [
40
- {
41
- key: 'dialectical_synthesis',
42
- label: 'Dialectical Synthesis',
43
- count: counts.dialectical_synthesis,
44
- percentage: (counts.dialectical_synthesis / total) * 100,
45
- gradient: 'from-yellow-500 to-green-500',
46
- bgColor: 'bg-yellow-500/20',
47
- borderColor: 'border-yellow-500/30',
48
- textColor: 'text-yellow-400',
49
- icon: '⚡',
50
- description: 'Mutual recognition achieved',
51
- },
52
- {
53
- key: 'learner_dominates',
54
- label: 'Learner Dominates',
55
- count: counts.learner_dominates,
56
- percentage: (counts.learner_dominates / total) * 100,
57
- gradient: 'from-blue-500 to-blue-400',
58
- bgColor: 'bg-blue-500/20',
59
- borderColor: 'border-blue-500/30',
60
- textColor: 'text-blue-400',
61
- icon: '🎯',
62
- description: 'Learner needs prioritized',
63
- },
64
- {
65
- key: 'ghost_dominates',
66
- label: 'Ghost Dominates',
67
- count: counts.ghost_dominates,
68
- percentage: (counts.ghost_dominates / total) * 100,
69
- gradient: 'from-red-500 to-red-400',
70
- bgColor: 'bg-red-500/20',
71
- borderColor: 'border-red-500/30',
72
- textColor: 'text-red-400',
73
- icon: '👻',
74
- description: 'Authority/superego wins',
75
- },
76
- ];
77
-
78
- // Find max for scaling
79
- const maxCount = Math.max(
80
- counts.ghost_dominates,
81
- counts.learner_dominates,
82
- counts.dialectical_synthesis
83
- );
84
-
85
- return (
86
- <div className="bg-gray-900/60 backdrop-blur-sm border border-white/5 rounded-xl p-4">
87
- <div className="flex items-center justify-between mb-4">
88
- <div className="text-xs text-gray-400">Synthesis Strategies</div>
89
- <div className="text-xs text-gray-500">{total} total</div>
90
- </div>
91
-
92
- <div className="space-y-3">
93
- {strategies.map((strategy) => (
94
- <div key={strategy.key} className="space-y-1.5">
95
- {/* Label row */}
96
- <div className="flex items-center justify-between">
97
- <div className="flex items-center gap-2">
98
- <span className="text-sm">{strategy.icon}</span>
99
- <span className="text-xs text-gray-300">{strategy.label}</span>
100
- </div>
101
- <div className="flex items-center gap-2">
102
- <span className={`text-xs ${strategy.textColor} font-medium`}>
103
- {strategy.count}
104
- </span>
105
- <span className="text-xs text-gray-500">
106
- ({strategy.percentage.toFixed(0)}%)
107
- </span>
108
- </div>
109
- </div>
110
-
111
- {/* Progress bar */}
112
- <div className="h-2 bg-gray-800 rounded-full overflow-hidden">
113
- <div
114
- className={`h-full bg-gradient-to-r ${strategy.gradient} rounded-full transition-all duration-500`}
115
- style={{ width: `${maxCount > 0 ? (strategy.count / maxCount) * 100 : 0}%` }}
116
- />
117
- </div>
118
-
119
- {/* Description */}
120
- <div className="text-[10px] text-gray-600">{strategy.description}</div>
121
- </div>
122
- ))}
123
- </div>
124
-
125
- {/* Ideal indicator */}
126
- {counts.dialectical_synthesis > 0 && (
127
- <div className="mt-4 pt-3 border-t border-white/5">
128
- <div className="flex items-center gap-2 text-xs">
129
- <span className="text-yellow-400">
130
- {((counts.dialectical_synthesis / total) * 100).toFixed(0)}%
131
- </span>
132
- <span className="text-gray-500">of moments achieve dialectical synthesis</span>
133
- </div>
134
- </div>
135
- )}
136
- </div>
137
- );
138
- };
139
-
140
- export default SynthesisStrategyChart;
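The removed `SynthesisStrategyChart` uses two different denominators: the printed percentage is a share of the total, while the bar width is scaled against the largest count so that the biggest category always fills the track. The sketch below separates the two. `strategyBars` and `StrategyBar` are hypothetical names; the field names in `SynthesisStrategyCounts` are taken from the deleted file.

```typescript
interface SynthesisStrategyCounts {
  ghost_dominates: number;
  learner_dominates: number;
  dialectical_synthesis: number;
}

interface StrategyBar {
  key: keyof SynthesisStrategyCounts;
  count: number;
  percentage: number; // share of the total, shown as text
  barWidth: number;   // width relative to the largest count, in percent
}

// Same display order as the removed component.
const STRATEGY_ORDER: (keyof SynthesisStrategyCounts)[] = [
  'dialectical_synthesis',
  'learner_dominates',
  'ghost_dominates',
];

function strategyBars(counts: SynthesisStrategyCounts): StrategyBar[] {
  const total =
    counts.ghost_dominates + counts.learner_dominates + counts.dialectical_synthesis;
  const maxCount = Math.max(
    counts.ghost_dominates,
    counts.learner_dominates,
    counts.dialectical_synthesis
  );
  return STRATEGY_ORDER.map((key) => ({
    key,
    count: counts[key],
    // The component returns a placeholder when total is 0; here we fall back to 0.
    percentage: total > 0 ? (counts[key] / total) * 100 : 0,
    barWidth: maxCount > 0 ? (counts[key] / maxCount) * 100 : 0,
  }));
}
```

With counts of 2, 1, and 1, the leading bar is drawn at full width while its label reads 50%, so bar lengths compare categories with each other rather than showing absolute share.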
@@ -1,52 +0,0 @@
1
- # Ablation Study: Dialogue Rounds
2
-
3
- **Generated:** 2026-01-14T10:23:56.877Z
4
-
5
- ## Research Question
6
-
7
- Does increasing the number of Ego-Superego dialogue rounds improve tutor suggestion quality?
8
-
9
- ## Method
10
-
11
- Compared evaluation scores across profiles with different `max_rounds` settings:
12
- - **0 rounds**: Single-agent (no Superego review)
13
- - **1 round**: Single critique-revise cycle
14
- - **2 rounds**: Two critique-revise cycles (default)
15
- - **3 rounds**: Three critique-revise cycles
16
-
17
- ## Results
18
-
19
- ### Descriptive Statistics
20
-
21
- | Rounds | N | Mean | SD | 95% CI |
22
- |--------|---|------|-----|--------|
23
- | 0 | 483 | 91.58 | 15.75 | [90.2, 93.0] |
24
- | 1 | 1 | 50.00 | 0.00 | [50.0, 50.0] |
25
- | 2 | 247 | 88.05 | 19.77 | [85.6, 90.5] |
26
- | 3 | 2 | 96.25 | 1.77 | [93.8, 98.7] |
27
-
28
- ### One-Way ANOVA
29
-
30
- | Source | SS | df | MS | F | p | η² |
31
- |--------|-----|-----|-----|-----|-----|-----|
32
- | Between | 3738.46 | 3 | 1246.15 | 4.212 | 0.050 | 0.017 |
33
- | Within | 215696.89 | 729 | 295.88 | | | |
34
- | Total | 219435.35 | 732 | | | | |
35
-
36
- ## Interpretation
37
-
38
- The effect of dialogue rounds on suggestion quality was not statistically significant (F(3, 729) = 4.21, p = 0.050, η² = 0.017).
39
-
40
- Moving from single-agent (0 rounds) to multi-agent with 2 dialogue rounds shows a 3.9% decrease in mean score (91.58 to 88.05).
41
-
42
- ## Limitations
43
-
44
- - Confounded with profile differences (model selection, prompts)
45
- - Unbalanced sample sizes across conditions
46
- - No randomized controlled comparison
47
-
48
- ## Implications for System Design
49
-
50
- Based on these results:
51
- - Dialogue rounds may have limited impact compared to other factors
52
- - Consider whether additional API costs are justified
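The derived columns of the one-way ANOVA table in the removed dialogue-rounds document can be recomputed from its sums of squares and degrees of freedom. The sketch below is a minimal check, not the package's own statistics code, which is not shown in this hunk. `oneWayAnovaSummary` and `percentChange` are hypothetical names.

```typescript
interface AnovaSummary {
  msBetween: number;
  msWithin: number;
  f: number;
  etaSquared: number;
}

// Mean squares, F ratio, and eta squared from a one-way ANOVA decomposition.
function oneWayAnovaSummary(
  ssBetween: number,
  dfBetween: number,
  ssWithin: number,
  dfWithin: number
): AnovaSummary {
  const msBetween = ssBetween / dfBetween;
  const msWithin = ssWithin / dfWithin;
  return {
    msBetween,
    msWithin,
    f: msBetween / msWithin,
    // Eta squared is the share of total variance explained by the grouping.
    etaSquared: ssBetween / (ssBetween + ssWithin),
  };
}

// Relative change from one mean to another, in percent.
function percentChange(from: number, to: number): number {
  return ((to - from) / from) * 100;
}

// Values copied from the table: Between SS/df, Within SS/df.
const rounds = oneWayAnovaSummary(3738.46, 3, 215696.89, 729);
console.log(rounds.f.toFixed(3), rounds.etaSquared.toFixed(3));
// Means for 0 rounds and 2 rounds, from the descriptive statistics.
console.log(percentChange(91.58, 88.05).toFixed(1));
```

The recomputed F and η² agree with the table, and the change in mean from 0 to 2 rounds comes out negative, a drop of roughly 3.9%. As a rough cross-check on the p-value, the 1% critical value of F for (3, ∞) degrees of freedom is about 3.78, so an F of 4.21 would normally correspond to p below 0.01; the tabulated p = 0.050 may deserve a second look.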
@@ -1,53 +0,0 @@
1
- # Ablation Study: Model Selection
2
-
3
- **Generated:** 2026-01-14T10:25:59.574Z
4
-
5
- ## Research Question
6
-
7
- Does the choice of LLM model (for the Ego agent) affect tutor suggestion quality?
8
-
9
- ## Method
10
-
11
- Analyzed evaluation scores grouped by the Ego model used in each profile.
12
-
13
- ## Results
14
-
15
- ### Descriptive Statistics
16
-
17
- | Model | N | Mean | SD | 95% CI |
18
- |-------|---|------|-----|--------|
19
- | deepseek | 442 | 93.31 | 13.10 | [92.1, 94.5] |
20
- | nemotron | 299 | 86.44 | 20.35 | [84.1, 88.7] |
21
- | haiku | 29 | 84.20 | 21.58 | [76.3, 92.1] |
22
- | gpt-5.2 | 1 | 97.50 | 0.00 | [97.5, 97.5] |
23
- | sonnet | 1 | 97.50 | 0.00 | [97.5, 97.5] |
24
-
25
- ### One-Way ANOVA
26
-
27
- - F(4, 767) = 8.729
28
- - p < .05
29
- - η² = 0.044 (Small effect)
30
-
31
- ### Model Ranking
32
-
33
- 1. **gpt-5.2**: M = 97.50 (n=1)
34
- 2. **sonnet**: M = 97.50 (n=1)
35
- 3. **deepseek**: M = 93.31 (n=442)
36
- 4. **nemotron**: M = 86.44 (n=299)
37
- 5. **haiku**: M = 84.20 (n=29)
38
-
39
- ## Interpretation
40
-
41
- Model selection has a statistically significant effect on suggestion quality (F(4, 767) = 8.73, p < .05, η² = 0.044).
42
-
43
- ## Limitations
44
-
45
- - Confounded with profile differences (prompts, dialogue settings)
46
- - Unbalanced sample sizes across models
47
- - No direct A/B comparison with identical prompts
48
-
49
- ## Implications
50
-
51
- - gpt-5.2 shows the highest mean score, but from a single observation (n=1)
52
- - deepseek offers good quality at minimal cost
53
- - Consider running controlled experiments varying only the model
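The 95% intervals in the descriptive tables appear consistent with a normal approximation, mean ± 1.96 · SD / √N. That is an inference from the printed numbers, not something the removed documents state. The sketch below recomputes one interval and the ANOVA degrees of freedom under that assumption. `normalCi95` and `anovaDegreesOfFreedom` are hypothetical names.

```typescript
// Assumed formula: mean ± 1.96 * SD / sqrt(N) (normal approximation).
function normalCi95(mean: number, sd: number, n: number): [number, number] {
  const halfWidth = (1.96 * sd) / Math.sqrt(n);
  return [mean - halfWidth, mean + halfWidth];
}

// One-way ANOVA degrees of freedom: k - 1 between groups, N - k within.
function anovaDegreesOfFreedom(groupSizes: number[]): { between: number; within: number } {
  const k = groupSizes.length;
  const total = groupSizes.reduce((sum, n) => sum + n, 0);
  return { between: k - 1, within: total - k };
}

// haiku row from the model-selection table: N = 29, M = 84.20, SD = 21.58.
const [low, high] = normalCi95(84.2, 21.58, 29);
console.log(low.toFixed(1), high.toFixed(1));

// Group sizes from the same table: deepseek, nemotron, haiku, gpt-5.2, sonnet.
console.log(anovaDegreesOfFreedom([442, 299, 29, 1, 1]));
```

Under this formula a group with a single observation has SD = 0 and a zero-width interval, which matches the [97.5, 97.5] rows. Those rows carry no information about variability, so rankings that place n=1 models first should be read with that in mind.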