@machinespirits/eval 0.1.2 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (102)

package/LICENSE +21 -0
package/README.md +161 -0
package/config/eval-settings.yaml +18 -0
package/config/evaluation-rubric-learner.yaml +277 -0
package/config/evaluation-rubric.yaml +613 -0
package/config/interaction-eval-scenarios.yaml +93 -50
package/config/learner-agents.yaml +124 -193
package/config/machinespirits-eval.code-workspace +11 -0
package/config/providers.yaml +60 -0
package/config/suggestion-scenarios.yaml +1399 -0
package/config/tutor-agents.yaml +716 -0
package/docs/EVALUATION-VARIABLES.md +589 -0
package/docs/REPLICATION-PLAN.md +577 -0
package/index.js +15 -6
package/package.json +16 -22
package/routes/evalRoutes.js +88 -36
package/scripts/analyze-judge-reliability.js +401 -0
package/scripts/analyze-run.js +97 -0
package/scripts/analyze-run.mjs +282 -0
package/scripts/analyze-validation-failures.js +141 -0
package/scripts/check-run.mjs +17 -0
package/scripts/code-impasse-strategies.js +1132 -0
package/scripts/compare-runs.js +44 -0
package/scripts/compare-suggestions.js +80 -0
package/scripts/compare-transformation.js +116 -0
package/scripts/dig-into-run.js +158 -0
package/scripts/eval-cli.js +2626 -0
package/scripts/generate-paper-figures.py +452 -0
package/scripts/qualitative-analysis-ai.js +1313 -0
package/scripts/qualitative-analysis.js +688 -0
package/scripts/seed-db.js +87 -0
package/scripts/show-failed-suggestions.js +64 -0
package/scripts/validate-content.js +192 -0
package/server.js +3 -2
package/services/__tests__/evalConfigLoader.test.js +338 -0
package/services/anovaStats.js +499 -0
package/services/contentResolver.js +407 -0
package/services/dialogueTraceAnalyzer.js +454 -0
package/services/evalConfigLoader.js +625 -0
package/services/evaluationRunner.js +2171 -270
package/services/evaluationStore.js +564 -29
package/services/learnerConfigLoader.js +75 -5
package/services/learnerRubricEvaluator.js +284 -0
package/services/learnerTutorInteractionEngine.js +375 -0
package/services/processUtils.js +18 -0
package/services/progressLogger.js +98 -0
package/services/promptRecommendationService.js +31 -26
package/services/promptRewriter.js +427 -0
package/services/rubricEvaluator.js +543 -70
package/services/streamingReporter.js +104 -0
package/services/turnComparisonAnalyzer.js +494 -0
package/components/MobileEvalDashboard.tsx +0 -267
package/components/comparison/DeltaAnalysisTable.tsx +0 -137
package/components/comparison/ProfileComparisonCard.tsx +0 -176
package/components/comparison/RecognitionABMode.tsx +0 -385
package/components/comparison/RecognitionMetricsPanel.tsx +0 -135
package/components/comparison/WinnerIndicator.tsx +0 -64
package/components/comparison/index.ts +0 -5
package/components/mobile/BottomSheet.tsx +0 -233
package/components/mobile/DimensionBreakdown.tsx +0 -210
package/components/mobile/DocsView.tsx +0 -363
package/components/mobile/LogsView.tsx +0 -481
package/components/mobile/PsychodynamicQuadrant.tsx +0 -261
package/components/mobile/QuickTestView.tsx +0 -1098
package/components/mobile/RecognitionTypeChart.tsx +0 -124
package/components/mobile/RecognitionView.tsx +0 -809
package/components/mobile/RunDetailView.tsx +0 -261
package/components/mobile/RunHistoryView.tsx +0 -367
package/components/mobile/ScoreRadial.tsx +0 -211
package/components/mobile/StreamingLogPanel.tsx +0 -230
package/components/mobile/SynthesisStrategyChart.tsx +0 -140
package/docs/research/ABLATION-DIALOGUE-ROUNDS.md +0 -52
package/docs/research/ABLATION-MODEL-SELECTION.md +0 -53
package/docs/research/ADVANCED-EVAL-ANALYSIS.md +0 -60
package/docs/research/ANOVA-RESULTS-2026-01-14.md +0 -257
package/docs/research/COMPREHENSIVE-EVALUATION-PLAN.md +0 -586
package/docs/research/COST-ANALYSIS.md +0 -56
package/docs/research/CRITICAL-REVIEW-RECOGNITION-TUTORING.md +0 -340
package/docs/research/DYNAMIC-VS-SCRIPTED-ANALYSIS.md +0 -291
package/docs/research/EVAL-SYSTEM-ANALYSIS.md +0 -306
package/docs/research/FACTORIAL-RESULTS-2026-01-14.md +0 -301
package/docs/research/IMPLEMENTATION-PLAN-CRITIQUE-RESPONSE.md +0 -1988
package/docs/research/LONGITUDINAL-DYADIC-EVALUATION.md +0 -282
package/docs/research/MULTI-JUDGE-VALIDATION-2026-01-14.md +0 -147
package/docs/research/PAPER-EXTENSION-DYADIC.md +0 -204
package/docs/research/PAPER-UNIFIED.md +0 -659
package/docs/research/PAPER-UNIFIED.pdf +0 -0
package/docs/research/PROMPT-IMPROVEMENTS-2026-01-14.md +0 -356
package/docs/research/SESSION-NOTES-2026-01-11-RECOGNITION-EVAL.md +0 -419
package/docs/research/apa.csl +0 -2133
package/docs/research/archive/PAPER-DRAFT-RECOGNITION-TUTORING.md +0 -1637
package/docs/research/archive/paper-multiagent-tutor.tex +0 -978
package/docs/research/paper-draft/full-paper.md +0 -136
package/docs/research/paper-draft/images/pasted-image-2026-01-24T03-47-47-846Z-d76a7ae2.png +0 -0
package/docs/research/paper-draft/references.bib +0 -515
package/docs/research/transcript-baseline.md +0 -139
package/docs/research/transcript-recognition-multiagent.md +0 -187
package/hooks/useEvalData.ts +0 -625
package/server-init.js +0 -45
package/services/benchmarkService.js +0 -1892
package/types.ts +0 -165
package/utils/haptics.ts +0 -45

package/components/mobile/ScoreRadial.tsx DELETED

@@ -1,211 +0,0 @@
-/**
- * ScoreRadial Component
- *
- * A premium circular gauge displaying the overall evaluation score with pass/fail indication.
- * Features gradient strokes, glow effects, and smooth animations.
- * Mobile-optimized with clear visual feedback.
- */
-import React, { useEffect, useState } from 'react';
-interface ScoreRadialProps {
-  score: number | null;
-  passed: boolean;
-  size?: number;
-  animate?: boolean;
-}
-export const ScoreRadial: React.FC<ScoreRadialProps> = ({
-  score,
-  passed,
-  size = 120,
-  animate = true
-}) => {
-  const [animatedScore, setAnimatedScore] = useState(animate ? 0 : (score ?? 0));
-  const percentage = score !== null ? animatedScore : 0;
-  const radius = (size - 16) / 2; // Account for stroke width
-  const circumference = 2 * Math.PI * radius;
-  const strokeDashoffset = circumference - (percentage / 100) * circumference;
-  // Unique ID for gradient definitions
-  const gradientId = `score-gradient-${passed ? 'pass' : 'fail'}`;
-  const glowId = `score-glow-${passed ? 'pass' : 'fail'}`;
-  // Animate score on mount/change
-  useEffect(() => {
-    if (!animate || score === null) {
-      setAnimatedScore(score ?? 0);
-      return;
-    }
-    const duration = 1000;
-    const startTime = Date.now();
-    const startValue = animatedScore;
-    const endValue = score;
-    const animateValue = () => {
-      const elapsed = Date.now() - startTime;
-      const progress = Math.min(elapsed / duration, 1);
-      // Ease out cubic
-      const eased = 1 - Math.pow(1 - progress, 3);
-      const current = startValue + (endValue - startValue) * eased;
-      setAnimatedScore(current);
-      if (progress < 1) {
-        requestAnimationFrame(animateValue);
-      }
-    };
-    requestAnimationFrame(animateValue);
-  }, [score, animate]);
-  // Get quality label based on score
-  const getQualityLabel = (s: number): string => {
-    if (s >= 80) return 'Excellent';
-    if (s >= 60) return 'Good';
-    if (s >= 40) return 'Fair';
-    return 'Needs Work';
-  };
-  return (
-    <div
-      className="relative flex items-center justify-center"
-      style={{ width: size, height: size }}
-    >
-      {/* Outer glow effect */}
-      {score !== null && (
-        <div
-          className={`absolute inset-0 rounded-full transition-opacity duration-1000 ${
-            passed ? 'bg-green-500/10' : 'bg-red-500/10'
-          } ${percentage > 0 ? 'opacity-100' : 'opacity-0'}`}
-          style={{
-            boxShadow: passed
-              ? '0 0 40px rgba(34, 197, 94, 0.2), inset 0 0 20px rgba(34, 197, 94, 0.05)'
-              : '0 0 40px rgba(230, 57, 70, 0.2), inset 0 0 20px rgba(230, 57, 70, 0.05)'
-          }}
-        />
-      )}
-      <svg className="transform -rotate-90" width={size} height={size}>
-        {/* Gradient definitions */}
-        <defs>
-          <linearGradient id={gradientId} x1="0%" y1="0%" x2="100%" y2="100%">
-            {passed ? (
-              <>
-                <stop offset="0%" stopColor="#22c55e" />
-                <stop offset="50%" stopColor="#4ade80" />
-                <stop offset="100%" stopColor="#22c55e" />
-              </>
-            ) : (
-              <>
-                <stop offset="0%" stopColor="#E63946" />
-                <stop offset="50%" stopColor="#f87171" />
-                <stop offset="100%" stopColor="#c1121f" />
-              </>
-            )}
-          </linearGradient>
-          <filter id={glowId} x="-50%" y="-50%" width="200%" height="200%">
-            <feGaussianBlur stdDeviation="3" result="coloredBlur" />
-            <feMerge>
-              <feMergeNode in="coloredBlur" />
-              <feMergeNode in="SourceGraphic" />
-            </feMerge>
-          </filter>
-        </defs>
-        {/* Background track - glass effect */}
-        <circle
-          cx={size / 2}
-          cy={size / 2}
-          r={radius}
-          stroke="rgba(31, 41, 55, 0.8)"
-          strokeWidth="10"
-          fill="none"
-        />
-        {/* Inner subtle ring */}
-        <circle
-          cx={size / 2}
-          cy={size / 2}
-          r={radius - 6}
-          stroke="rgba(255, 255, 255, 0.03)"
-          strokeWidth="1"
-          fill="none"
-        />
-        {/* Progress arc with gradient */}
-        {score !== null && (
-          <circle
-            cx={size / 2}
-            cy={size / 2}
-            r={radius}
-            stroke={`url(#${gradientId})`}
-            strokeWidth="10"
-            fill="none"
-            strokeLinecap="round"
-            strokeDasharray={circumference}
-            strokeDashoffset={strokeDashoffset}
-            filter={percentage >= 60 ? `url(#${glowId})` : undefined}
-            className="transition-all duration-100"
-          />
-        )}
-        {/* Decorative end cap glow */}
-        {score !== null && percentage > 5 && (
-          <circle
-            cx={size / 2 + radius * Math.cos((2 * Math.PI * percentage) / 100 - Math.PI / 2)}
-            cy={size / 2 + radius * Math.sin((2 * Math.PI * percentage) / 100 - Math.PI / 2)}
-            r="5"
-            fill={passed ? '#4ade80' : '#f87171'}
-            opacity="0.6"
-            className="animate-pulse"
-          />
-        )}
-      </svg>
-      {/* Center content */}
-      <div className="absolute inset-0 flex flex-col items-center justify-center">
-        {score !== null ? (
-          <>
-            {/* Score value */}
-            <div className="flex items-baseline gap-0.5">
-              <span className="text-4xl font-bold text-white tabular-nums">
-                {Math.round(animatedScore)}
-              </span>
-              <span className="text-lg text-gray-500 font-medium">%</span>
-            </div>
-            {/* Pass/Fail badge */}
-            <div
-              className={`mt-1 px-3 py-0.5 rounded-full text-[10px] font-bold uppercase tracking-wider
-                ${passed
-                  ? 'bg-green-500/20 text-green-400 border border-green-500/30'
-                  : 'bg-red-500/20 text-red-400 border border-red-500/30'
-                }`}
-            >
-              {passed ? 'Pass' : 'Fail'}
-            </div>
-            {/* Quality label */}
-            {size >= 140 && (
-              <span className="mt-2 text-[10px] text-gray-500 font-medium">
-                {getQualityLabel(score)}
-              </span>
-            )}
-          </>
-        ) : (
-          <div className="flex flex-col items-center gap-1">
-            <svg className="w-8 h-8 text-gray-600" fill="none" viewBox="0 0 24 24" stroke="currentColor">
-              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={1.5}
-                d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
-            </svg>
-            <span className="text-gray-500 text-xs font-medium">No score</span>
-          </div>
-        )}
-      </div>
-    </div>
-  );
-};
-export default ScoreRadial;
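
The arc geometry and easing used by the deleted component can be isolated as pure functions. The following is a minimal sketch, not code from the package: the names `easeOutCubic`, `arcDashOffset`, and `scoreAt` are hypothetical, and the lower clamp on `progress` is an addition (the component only caps progress at 1).

```typescript
// Hypothetical helper names; the formulas mirror the deleted ScoreRadial component.
const STROKE_ALLOWANCE = 16; // same allowance the component subtracts from `size`

/** Ease-out cubic: fast start, slow finish; maps [0, 1] onto [0, 1]. */
function easeOutCubic(progress: number): number {
  // Clamping below 0 is an assumption added here for safety.
  const t = Math.min(Math.max(progress, 0), 1);
  return 1 - Math.pow(1 - t, 3);
}

/** strokeDashoffset that leaves `percentage` percent of the circle stroked. */
function arcDashOffset(percentage: number, size: number): number {
  const radius = (size - STROKE_ALLOWANCE) / 2;
  const circumference = 2 * Math.PI * radius;
  return circumference - (percentage / 100) * circumference;
}

/** Interpolated score `elapsedMs` into an animation lasting `durationMs`. */
function scoreAt(elapsedMs: number, durationMs: number, from: number, to: number): number {
  return from + (to - from) * easeOutCubic(elapsedMs / durationMs);
}
```

With the component's defaults (`size = 120`, a 1000 ms duration), a score of 100 gives a dash offset of 0 (full circle), and halfway through the animation the displayed value has already covered 87.5% of the distance to its target.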

package/components/mobile/StreamingLogPanel.tsx DELETED

@@ -1,230 +0,0 @@
-/**
- * StreamingLogPanel Component
- *
- * Displays real-time streaming logs during test execution.
- * Premium glass morphism styling with visual log type indicators.
- * Expandable panel that auto-scrolls to latest content.
- */
-import React, { useRef, useEffect, useState } from 'react';
-import type { StreamLog } from '../../hooks/useEvalData';
-import haptics from '../../utils/haptics';
-interface StreamingLogPanelProps {
-  logs: StreamLog[];
-  isRunning: boolean;
-}
-// Log type configurations with styling
-interface LogTypeConfig {
-  color: string;
-  bgColor: string;
-  icon: React.ReactNode;
-}
-const LogTypeIcons: Record<StreamLog['type'] | 'info', LogTypeConfig> = {
-  success: {
-    color: 'text-green-400',
-    bgColor: 'bg-green-500/10',
-    icon: (
-      <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
-        <path strokeLinecap="round" strokeLinejoin="round" d="M5 13l4 4L19 7" />
-      </svg>
-    )
-  },
-  warning: {
-    color: 'text-yellow-400',
-    bgColor: 'bg-yellow-500/10',
-    icon: (
-      <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
-        <path strokeLinecap="round" strokeLinejoin="round" d="M12 9v2m0 4h.01m-6.938 4h13.856c1.54 0 2.502-1.667 1.732-3L13.732 4c-.77-1.333-2.694-1.333-3.464 0L3.34 16c-.77 1.333.192 3 1.732 3z" />
-      </svg>
-    )
-  },
-  error: {
-    color: 'text-red-400',
-    bgColor: 'bg-red-500/10',
-    icon: (
-      <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2.5}>
-        <path strokeLinecap="round" strokeLinejoin="round" d="M6 18L18 6M6 6l12 12" />
-      </svg>
-    )
-  },
-  progress: {
-    color: 'text-blue-400',
-    bgColor: 'bg-blue-500/10',
-    icon: (
-      <svg className="w-3.5 h-3.5 animate-spin" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
-        <path strokeLinecap="round" strokeLinejoin="round" d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15" />
-      </svg>
-    )
-  },
-  info: {
-    color: 'text-gray-400',
-    bgColor: 'bg-gray-500/10',
-    icon: (
-      <svg className="w-3.5 h-3.5" fill="none" viewBox="0 0 24 24" stroke="currentColor" strokeWidth={2}>
-        <path strokeLinecap="round" strokeLinejoin="round" d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
-      </svg>
-    )
-  }
-};
-export const StreamingLogPanel: React.FC<StreamingLogPanelProps> = ({
-  logs,
-  isRunning
-}) => {
-  const [isExpanded, setIsExpanded] = useState(true);
-  const bottomRef = useRef<HTMLDivElement>(null);
-  const containerRef = useRef<HTMLDivElement>(null);
-  // Auto-scroll to bottom when new logs arrive
-  useEffect(() => {
-    if (bottomRef.current && containerRef.current) {
-      bottomRef.current.scrollIntoView({ behavior: 'smooth' });
-    }
-  }, [logs]);
-  // Auto-expand when running starts
-  useEffect(() => {
-    if (isRunning) {
-      setIsExpanded(true);
-    }
-  }, [isRunning]);
-  const getLogConfig = (type: StreamLog['type']): LogTypeConfig => {
-    return LogTypeIcons[type] || LogTypeIcons.info;
-  };
-  // Count log types for summary
-  const logCounts = logs.reduce((acc, log) => {
-    acc[log.type] = (acc[log.type] || 0) + 1;
-    return acc;
-  }, {} as Record<string, number>);
-  if (logs.length === 0 && !isRunning) {
-    return null;
-  }
-  return (
-    <div className="border-t border-white/5 bg-gray-900/30 backdrop-blur-sm">
-      {/* Toggle header - Glass styling */}
-      <button
-        type="button"
-        onClick={() => {
-          haptics.light();
-          setIsExpanded(!isExpanded);
-        }}
-        className="w-full flex items-center justify-between p-3.5
-          hover:bg-white/5 active:scale-[0.995] transition-all duration-150"
-      >
-        <div className="flex items-center gap-3">
-          {/* Live indicator */}
-          {isRunning && (
-            <div className="relative">
-              <span className="absolute inset-0 w-2.5 h-2.5 bg-green-500 rounded-full animate-ping opacity-75" />
-              <span className="relative w-2.5 h-2.5 bg-green-500 rounded-full block" />
-            </div>
-          )}
-          {/* Title */}
-          <span className="text-sm font-medium text-gray-300">
-            {isRunning ? 'Live Output' : 'Output'}
-          </span>
-          {/* Log type counts - Mini badges */}
-          <div className="flex items-center gap-1.5">
-            {logCounts.error && logCounts.error > 0 && (
-              <span className="px-1.5 py-0.5 rounded text-[10px] font-medium bg-red-500/20 text-red-400 border border-red-500/30">
-                {logCounts.error} error{logCounts.error > 1 ? 's' : ''}
-              </span>
-            )}
-            {logCounts.warning && logCounts.warning > 0 && (
-              <span className="px-1.5 py-0.5 rounded text-[10px] font-medium bg-yellow-500/20 text-yellow-400 border border-yellow-500/30">
-                {logCounts.warning} warn
-              </span>
-            )}
-            <span className="px-1.5 py-0.5 rounded text-[10px] font-medium bg-white/5 text-gray-500 border border-white/5">
-              {logs.length} total
-            </span>
-          </div>
-        </div>
-        {/* Expand indicator */}
-        <div className={`w-7 h-7 rounded-full bg-white/5 flex items-center justify-center
-          transition-all duration-200 ${isExpanded ? 'bg-white/10' : ''}`}>
-          <svg
-            className={`w-4 h-4 text-gray-500 transition-transform duration-200 ${isExpanded ? 'rotate-180' : ''}`}
-            fill="none"
-            viewBox="0 0 24 24"
-            stroke="currentColor"
-          >
-            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 15l7-7 7 7" />
-          </svg>
-        </div>
-      </button>
-      {/* Log content - Terminal style with glass */}
-      <div
-        ref={containerRef}
-        className={`overflow-hidden transition-all duration-300 ease-out
-          ${isExpanded ? 'max-h-80' : 'max-h-0'}`}
-      >
-        <div className="h-full overflow-y-auto bg-black/40 backdrop-blur-sm p-3 font-mono text-xs leading-relaxed
-          border-t border-white/5 scrollbar-hide">
-          {/* Log entries */}
-          {logs.map((log, i) => {
-            const config = getLogConfig(log.type);
-            return (
-              <div
-                key={i}
-                className={`flex items-start gap-2 py-1 px-2 -mx-2 rounded
-                  ${config.bgColor} ${config.color}
-                  animate-fade-in`}
-                style={{ animationDelay: `${Math.min(i * 20, 200)}ms` }}
-              >
-                {/* Icon */}
-                <span className="flex-shrink-0 mt-0.5 w-4 flex items-center justify-center">
-                  {config.icon}
-                </span>
-                {/* Message */}
-                <span className="whitespace-pre-wrap break-words flex-1">
-                  {log.message}
-                </span>
-                {/* Timestamp for progress logs */}
-                {log.type === 'progress' && (
-                  <span className="text-[10px] text-gray-600 flex-shrink-0 tabular-nums">
-                    {new Date().toLocaleTimeString(undefined, {
-                      hour: '2-digit',
-                      minute: '2-digit',
-                      second: '2-digit'
-                    })}
-                  </span>
-                )}
-              </div>
-            );
-          })}
-          {/* Typing indicator while running */}
-          {isRunning && (
-            <div className="flex items-center gap-1.5 mt-3 pt-2 border-t border-white/5">
-              <div className="flex items-center gap-1">
-                <span className="w-1.5 h-1.5 rounded-full bg-brand-red animate-pulse" />
-                <span className="w-1.5 h-1.5 rounded-full bg-brand-red animate-pulse" style={{ animationDelay: '0.2s' }} />
-                <span className="w-1.5 h-1.5 rounded-full bg-brand-red animate-pulse" style={{ animationDelay: '0.4s' }} />
-              </div>
-              <span className="text-[10px] text-gray-600 ml-1">Processing...</span>
-            </div>
-          )}
-          <div ref={bottomRef} />
-        </div>
-      </div>
-    </div>
-  );
-};
-export default StreamingLogPanel;

package/components/mobile/SynthesisStrategyChart.tsx DELETED

@@ -1,140 +0,0 @@
-/**
- * SynthesisStrategyChart Component
- *
- * Horizontal bars showing distribution of synthesis strategies:
- * - Ghost Dominates (red) - Superego/authority wins
- * - Learner Dominates (blue) - Learner needs prioritized
- * - Dialectical Synthesis (gold/green) - True mutual recognition
- */
-import React from 'react';
-interface SynthesisStrategyCounts {
-  ghost_dominates: number;
-  learner_dominates: number;
-  dialectical_synthesis: number;
-}
-interface SynthesisStrategyChartProps {
-  counts: SynthesisStrategyCounts;
-}
-export const SynthesisStrategyChart: React.FC<SynthesisStrategyChartProps> = ({
-  counts,
-}) => {
-  const total =
-    counts.ghost_dominates + counts.learner_dominates + counts.dialectical_synthesis;
-  if (total === 0) {
-    return (
-      <div className="bg-gray-900/60 backdrop-blur-sm border border-white/5 rounded-xl p-4">
-        <div className="text-xs text-gray-400 mb-3">Synthesis Strategies</div>
-        <div className="text-sm text-gray-500 text-center py-4">
-          No synthesis data recorded
-        </div>
-      </div>
-    );
-  }
-  const strategies = [
-    {
-      key: 'dialectical_synthesis',
-      label: 'Dialectical Synthesis',
-      count: counts.dialectical_synthesis,
-      percentage: (counts.dialectical_synthesis / total) * 100,
-      gradient: 'from-yellow-500 to-green-500',
-      bgColor: 'bg-yellow-500/20',
-      borderColor: 'border-yellow-500/30',
-      textColor: 'text-yellow-400',
-      icon: '⚡',
-      description: 'Mutual recognition achieved',
-    },
-    {
-      key: 'learner_dominates',
-      label: 'Learner Dominates',
-      count: counts.learner_dominates,
-      percentage: (counts.learner_dominates / total) * 100,
-      gradient: 'from-blue-500 to-blue-400',
-      bgColor: 'bg-blue-500/20',
-      borderColor: 'border-blue-500/30',
-      textColor: 'text-blue-400',
-      icon: '🎯',
-      description: 'Learner needs prioritized',
-    },
-    {
-      key: 'ghost_dominates',
-      label: 'Ghost Dominates',
-      count: counts.ghost_dominates,
-      percentage: (counts.ghost_dominates / total) * 100,
-      gradient: 'from-red-500 to-red-400',
-      bgColor: 'bg-red-500/20',
-      borderColor: 'border-red-500/30',
-      textColor: 'text-red-400',
-      icon: '👻',
-      description: 'Authority/superego wins',
-    },
-  ];
-  // Find max for scaling
-  const maxCount = Math.max(
-    counts.ghost_dominates,
-    counts.learner_dominates,
-    counts.dialectical_synthesis
-  );
-  return (
-    <div className="bg-gray-900/60 backdrop-blur-sm border border-white/5 rounded-xl p-4">
-      <div className="flex items-center justify-between mb-4">
-        <div className="text-xs text-gray-400">Synthesis Strategies</div>
-        <div className="text-xs text-gray-500">{total} total</div>
-      </div>
-      <div className="space-y-3">
-        {strategies.map((strategy) => (
-          <div key={strategy.key} className="space-y-1.5">
-            {/* Label row */}
-            <div className="flex items-center justify-between">
-              <div className="flex items-center gap-2">
-                <span className="text-sm">{strategy.icon}</span>
-                <span className="text-xs text-gray-300">{strategy.label}</span>
-              </div>
-              <div className="flex items-center gap-2">
-                <span className={`text-xs ${strategy.textColor} font-medium`}>
-                  {strategy.count}
-                </span>
-                <span className="text-xs text-gray-500">
-                  ({strategy.percentage.toFixed(0)}%)
-                </span>
-              </div>
-            </div>
-            {/* Progress bar */}
-            <div className="h-2 bg-gray-800 rounded-full overflow-hidden">
-              <div
-                className={`h-full bg-gradient-to-r ${strategy.gradient} rounded-full transition-all duration-500`}
-                style={{ width: `${maxCount > 0 ? (strategy.count / maxCount) * 100 : 0}%` }}
-              />
-            </div>
-            {/* Description */}
-            <div className="text-[10px] text-gray-600">{strategy.description}</div>
-          </div>
-        ))}
-      </div>
-      {/* Ideal indicator */}
-      {counts.dialectical_synthesis > 0 && (
-        <div className="mt-4 pt-3 border-t border-white/5">
-          <div className="flex items-center gap-2 text-xs">
-            <span className="text-yellow-400">
-              {((counts.dialectical_synthesis / total) * 100).toFixed(0)}%
-            </span>
-            <span className="text-gray-500">of moments achieve dialectical synthesis</span>
-          </div>
-        </div>
-      )}
-    </div>
-  );
-};
-export default SynthesisStrategyChart;

package/docs/research/ABLATION-DIALOGUE-ROUNDS.md DELETED

@@ -1,52 +0,0 @@
-# Ablation Study: Dialogue Rounds
-**Generated:** 2026-01-14T10:23:56.877Z
-## Research Question
-Does increasing the number of Ego-Superego dialogue rounds improve tutor suggestion quality?
-## Method
-Compared evaluation scores across profiles with different `max_rounds` settings:
-- **0 rounds**: Single-agent (no Superego review)
-- **1 round**: Single critique-revise cycle
-- **2 rounds**: Two critique-revise cycles (default)
-- **3 rounds**: Three critique-revise cycles
-## Results
-### Descriptive Statistics
-| Rounds | N | Mean | SD | 95% CI |
-|--------|---|------|-----|--------|
-| 0 | 483 | 91.58 | 15.75 | [90.2, 93.0] |
-| 1 | 1 | 50.00 | 0.00 | [50.0, 50.0] |
-| 2 | 247 | 88.05 | 19.77 | [85.6, 90.5] |
-| 3 | 2 | 96.25 | 1.77 | [93.8, 98.7] |
-### One-Way ANOVA
-| Source | SS | df | MS | F | p | η² |
-|--------|-----|-----|-----|-----|-----|-----|
-| Between | 3738.46 | 3 | 1246.15 | 4.212 | 0.050 | 0.017 |
-| Within | 215696.89 | 729 | 295.88 | | | |
-| Total | 219435.35 | 732 | | | | |
-## Interpretation
-The effect of dialogue rounds on suggestion quality was not statistically significant (F(3, 729) = 4.21, p = 0.050, η² = 0.017).
-Moving from single-agent (0 rounds) to multi-agent with 2 dialogue rounds shows a 3.9% decrease in mean score (91.58 to 88.05).
-## Limitations
-- Confounded with profile differences (model selection, prompts)
-- Unbalanced sample sizes across conditions
-- No randomized controlled comparison
-## Implications for System Design
-Based on these results:
-- Dialogue rounds may have limited impact compared to other factors
-- Consider whether additional API costs are justified
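
The sums of squares, F ratio, and η² in the ANOVA table can be recomputed from the descriptive statistics alone. The sketch below assumes the reported SDs are sample standard deviations (n − 1 denominator); `oneWayAnovaFromSummary` and the interface names are hypothetical and are not taken from the package's `anovaStats.js`.

```typescript
// Hypothetical names; a sketch of one-way ANOVA from per-group summary statistics.
interface GroupSummary {
  label: string;
  n: number;    // observations in the group
  mean: number; // group mean
  sd: number;   // assumed to be the sample SD (n - 1 denominator)
}

interface AnovaSummary {
  ssBetween: number;
  ssWithin: number;
  dfBetween: number;
  dfWithin: number;
  f: number;
  etaSquared: number;
}

function oneWayAnovaFromSummary(groups: GroupSummary[]): AnovaSummary {
  const totalN = groups.reduce((sum, g) => sum + g.n, 0);
  const grandMean = groups.reduce((sum, g) => sum + g.n * g.mean, 0) / totalN;

  // Between-group variation: weighted squared distance of each mean from the grand mean.
  const ssBetween = groups.reduce(
    (sum, g) => sum + g.n * (g.mean - grandMean) * (g.mean - grandMean), 0);
  // Within-group variation recovered from each group's sample variance.
  const ssWithin = groups.reduce((sum, g) => sum + (g.n - 1) * g.sd * g.sd, 0);

  const dfBetween = groups.length - 1;
  const dfWithin = totalN - groups.length;
  const f = (ssBetween / dfBetween) / (ssWithin / dfWithin);
  const etaSquared = ssBetween / (ssBetween + ssWithin);
  return { ssBetween, ssWithin, dfBetween, dfWithin, f, etaSquared };
}

// Values copied from the Descriptive Statistics table above.
const dialogueRoundGroups: GroupSummary[] = [
  { label: "0 rounds", n: 483, mean: 91.58, sd: 15.75 },
  { label: "1 round", n: 1, mean: 50.0, sd: 0.0 },
  { label: "2 rounds", n: 247, mean: 88.05, sd: 19.77 },
  { label: "3 rounds", n: 2, mean: 96.25, sd: 1.77 },
];

const dialogueRoundAnova = oneWayAnovaFromSummary(dialogueRoundGroups);
console.log(dialogueRoundAnova.f.toFixed(2), dialogueRoundAnova.etaSquared.toFixed(3));
```

Run on the table's rounded inputs, this gives degrees of freedom (3, 729), F ≈ 4.21, and η² ≈ 0.017, in line with the table; the sums of squares differ slightly from the reported ones because the means and SDs are rounded to two decimals. The sketch does not compute a p-value.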

package/docs/research/ABLATION-MODEL-SELECTION.md DELETED

@@ -1,53 +0,0 @@
-# Ablation Study: Model Selection
-**Generated:** 2026-01-14T10:25:59.574Z
-## Research Question
-Does the choice of LLM model (for the Ego agent) affect tutor suggestion quality?
-## Method
-Analyzed evaluation scores grouped by the Ego model used in each profile.
-## Results
-### Descriptive Statistics
-| Model | N | Mean | SD | 95% CI |
-|-------|---|------|-----|--------|
-| deepseek | 442 | 93.31 | 13.10 | [92.1, 94.5] |
-| nemotron | 299 | 86.44 | 20.35 | [84.1, 88.7] |
-| haiku | 29 | 84.20 | 21.58 | [76.3, 92.1] |
-| gpt-5.2 | 1 | 97.50 | 0.00 | [97.5, 97.5] |
-| sonnet | 1 | 97.50 | 0.00 | [97.5, 97.5] |
-### One-Way ANOVA
-- F(4, 767) = 8.729
-- p < .05
-- η² = 0.044 (Small effect)
-### Model Ranking
-1. **gpt-5.2**: M = 97.50 (n=1)
-2. **sonnet**: M = 97.50 (n=1)
-3. **deepseek**: M = 93.31 (n=442)
-4. **nemotron**: M = 86.44 (n=299)
-5. **haiku**: M = 84.20 (n=29)
-## Interpretation
-Model selection has a statistically significant effect on suggestion quality (F(4, 767) = 8.73, p < .05, η² = 0.044).
-## Limitations
-- Confounded with profile differences (prompts, dialogue settings)
-- Unbalanced sample sizes across models
-- No direct A/B comparison with identical prompts
-## Implications
-- gpt-5.2 and sonnet share the highest mean score (97.50), but each rests on a single observation (n=1)
-- deepseek offers good quality at minimal cost
-- Consider running controlled experiments varying only the model
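
The 95% CI column in the descriptive table is consistent with a normal-approximation interval, mean ± 1.96 · SD / √n. That is an inference from the reported numbers rather than something the file states, and `normalCi95` is a hypothetical name used only for this sketch.

```typescript
// Hypothetical helper; assumes the table uses a normal approximation (z = 1.96).
function normalCi95(mean: number, sd: number, n: number): [number, number] {
  const halfWidth = (1.96 * sd) / Math.sqrt(n);
  return [mean - halfWidth, mean + halfWidth];
}

// Rows copied from the Descriptive Statistics table above.
const modelRows = [
  { model: "deepseek", n: 442, mean: 93.31, sd: 13.1 },
  { model: "nemotron", n: 299, mean: 86.44, sd: 20.35 },
  { model: "haiku", n: 29, mean: 84.2, sd: 21.58 },
];

for (const row of modelRows) {
  const [lower, upper] = normalCi95(row.mean, row.sd, row.n);
  console.log(`${row.model}: [${lower.toFixed(1)}, ${upper.toFixed(1)}]`);
}
```

For groups with n = 1 the SD is 0, so the interval collapses to a single point and says nothing about uncertainty, which is why the gpt-5.2 and sonnet rows cannot be compared meaningfully with the larger groups.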