PyPI - mirofish-simulator - Versions diffs - 0.8.0__tar.gz - Mend

mirofish-simulator 0.8.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (52)

mirofish_simulator-0.8.0/.gitignore ADDED

@@ -0,0 +1,60 @@
+# OS
+.DS_Store
+Thumbs.db
+# Environment variables (protect sensitive information)
+.env
+.env.local
+.env.*.local
+.env.development
+.env.test
+.env.production
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+.venv/
+venv/
+ENV/
+.eggs/
+*.egg-info/
+dist/
+build/
+# Node.js
+node_modules/
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+# Cursor
+.cursor/
+.claude/
+# Documentation and test programs
+mydoc/
+mytest/
+# Log files
+backend/logs/
+*.log
+# Uploaded files
+backend/uploads/
+# Docker data
+data/

mirofish_simulator-0.8.0/CHANGELOG.md ADDED

@@ -0,0 +1,67 @@
+# Changelog
+All notable changes to mirofish-simulator will be documented in this file.
+## [0.8.0] - 2024-03-16
+### Added
+- **Agentic Misconception-Matching Architecture** - New approach that produces realistic wrong answers
+  - `AgenticOrchestrator` - Main entry point for student simulation
+  - `DistractorAgent` - Maps each wrong answer to the misconception that leads to it
+  - `StudentModelAgent` - Models what a student believes (correct and incorrect)
+  - `SelectorAgent` - Matches student misconceptions to appropriate answers
+- **Factual vs Conceptual Question Handling**
+  - Distinguishes questions requiring specific knowledge (numbers, dates) from conceptual questions
+  - Factual questions require exact knowledge; vague beliefs aren't enough
+- **Batch Simulation** - `simulate_batch()` method for efficient multi-student simulation
+  - Distractor analysis done once and reused for all students
+### Changed
+- Default architecture is now agentic misconception-matching (previous approach moved to legacy)
+- Improved student differentiation by grade and archetype
+- Better handling of high-familiarity students who have correct beliefs
+### Fixed
+- LLM "cheating" problem - the new architecture doesn't try to make the LLM "not know" things
+- Inconsistency between reasoning and selection in SelectorAgent
+### Deprecated
+- `StudentSimulator` (v2 legacy) - Still available but AgenticOrchestrator is recommended
+- `simulate_student()` / `simulate_classroom()` - Use AgenticOrchestrator instead
+## [0.7.0] - 2024-03-15
+### Added
+- Multi-agent architecture with verification
+- KnowledgeAgent, PerceptionAgent, AnswerAgent, VerifierAgent
+- Catches LLM "cheating" via consistency verification
+## [0.6.0] - 2024-03-14
+### Added
+- Agent-based simulation (single agent approach)
+- Accessibility analysis module
+## [0.5.0] - 2024-03-13
+### Added
+- Misconception analysis
+- Comparative quiz analysis
+- Subject taxonomies (AP Government, Mathematics)
+## [0.4.0] - 2024-03-12
+### Added
+- Initial student simulation
+- Student archetypes
+- Basic accessibility analysis

mirofish_simulator-0.8.0/INCEPTBENCH_INTEGRATION.md ADDED

@@ -0,0 +1,162 @@
+# MiroFish + InceptBench Integration
+## What MiroFish Provides
+InceptBench already has LLM-based distractor analysis. MiroFish adds:
+### 1. Grade-Level Accessibility Analysis (Reliable)
+**Not duplicated by InceptBench**: Deterministic analysis of whether content is readable at the target grade.
+```python
+from mirofish_simulator.accessibility import AccessibilityAnalyzer
+analyzer = AccessibilityAnalyzer()
+result = await analyzer.analyze(question, target_grade=5)
+# result.to_dict() returns something like:
+{
+    "reading_level": {
+        "flesch_kincaid_grade": 8.3,
+        "target_grade": 5,
+        "verdict": "too_advanced"  # 3+ grades above
+    },
+    "vocabulary": {
+        "issues": [
+            {"word": "ratify", "grade_required": 9, "suggestions": ["approve", "accept"]},
+            {"word": "electoral", "grade_required": 8, "suggestions": ["voting", "election"]}
+        ]
+    },
+    "recommendations": [
+        {"fix": "Replace 'ratify' with 'approve' or 'agree to'"},
+        {"fix": "Simplify sentence structure (current: 17 words avg)"}
+    ]
+}
+```
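The reading-level figure above is a Flesch-Kincaid grade, computed with the standard formula 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59. The sketch below shows how that grade and the `too_advanced` verdict could be derived deterministically. It is not the package's `flesch_kincaid_grade` implementation: the helper names (`count_syllables`, `fk_grade`, `verdict`), the vowel-group syllable heuristic, and the `"ok"` label are all assumptions. Only the 3-grade threshold comes from the example output.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "vote" -> 1, but keep the syllable in "-le" endings such as "table"
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

def verdict(grade: float, target_grade: int) -> str:
    # "too_advanced" mirrors the example output (3+ grades above target);
    # "ok" is a hypothetical label for everything else
    return "too_advanced" if grade - target_grade >= 3 else "ok"
```

Dictionary-based syllable counts, as used by mature readability tools, will give somewhat different numbers than this heuristic. The verdict logic does not depend on how the grade was obtained, though.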
+**InceptBench Integration Point**: Add `accessibility` field to question evaluation output.
+### 2. Vocabulary Database (5000+ words)
+Word-to-grade mapping drawn from the Dolch, Fry, and Academic Word List (AWL) word lists.
+```python
+from mirofish_simulator.accessibility import get_word_grade_level
+get_word_grade_level("ratify")  # → 9
+get_word_grade_level("vote")    # → 4
+get_word_grade_level("dog")     # → 1
+```
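The vocabulary issues in the accessibility output can be pictured as a lookup of each word against this table, flagging any word whose grade exceeds the target. This sketch hard-codes only the four grades quoted in this document. `WORD_GRADES`, `find_vocabulary_issues`, and the choice to skip unknown words are hypothetical; they stand in for the package's 5000+ word database and are not its API.

```python
import re

# Hypothetical stand-in for the word-to-grade database; values are the ones quoted above
WORD_GRADES = {"ratify": 9, "electoral": 8, "vote": 4, "dog": 1}

def find_vocabulary_issues(text: str, target_grade: int) -> list[dict]:
    """Return one issue per distinct known word whose grade is above target_grade."""
    issues = []
    seen = set()
    for word in re.findall(r"[a-z']+", text.lower()):
        grade = WORD_GRADES.get(word)
        # Unknown words are skipped here; a real analyzer might estimate or flag them
        if grade is not None and grade > target_grade and word not in seen:
            seen.add(word)
            issues.append({"word": word, "grade_required": grade})
    return issues
```

For example, `find_vocabulary_issues("Should states ratify the electoral vote?", 5)` flags `ratify` and `electoral` but not `vote`.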
+### 3. Prior Knowledge Requirements
+The concepts a student needs in order to answer the question.
+```python
+result.prior_knowledge.required_concepts
+# → ["Electoral College mechanics", "How presidents are elected", "State-level voting"]
+```
+## What MiroFish Does NOT Provide (Reliably)
+### ❌ Accuracy Prediction
+LLMs cannot constrain their knowledge through prompting. Agent-based simulation will show artificially high accuracy because the underlying model "knows" correct answers.
+### ❌ IRT Parameters
+We don't have calibration data. Any difficulty/discrimination values would be fabricated.
+### ❌ Response Distribution Prediction
+Can't reliably predict which wrong answers students would choose.
+## Recommended Integration
+### Option A: Accessibility as Evaluation Dimension
+Add to existing InceptBench evaluators:
+```python
+# In InceptBench evaluation pipeline
+from mirofish_simulator.accessibility import AccessibilityAnalyzer
+class InceptBenchEvaluator:
+    def __init__(self):
+        self.accessibility = AccessibilityAnalyzer()
+        # ... other evaluators
+    async def evaluate(self, question, target_grade=8):
+        result = {
+            # Existing dimensions
+            "distractor_quality": await self.eval_distractors(question),
+            "factual_accuracy": await self.eval_accuracy(question),
+            # NEW: Accessibility dimension
+            "accessibility": await self.accessibility.analyze(question, target_grade)
+        }
+        return result
+```
+### Option B: Standalone Accessibility Check
+Separate endpoint for accessibility validation:
+```python
+# POST /api/inceptbench/accessibility
+{
+    "question": {...},
+    "target_grade": 5
+}
+# Response
+{
+    "readable": False,
+    "reading_level": 8.3,
+    "target_grade": 5,
+    "issues": [...],
+    "recommendations": [...]
+}
+```
+### Option C: CLI Flag
+```bash
+inceptbench evaluate question.json --check-accessibility --grade 5
+```
+## File Changes for Integration
+### Backend
+```python
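+# backend/app/api/inceptbench.py
+# Assumes FastAPI; QuestionModel is a hypothetical stand-in for InceptBench's own model
+from fastapi import APIRouter
+from pydantic import BaseModel
+from mirofish_simulator.accessibility import AccessibilityAnalyzer
+router = APIRouter(prefix="/api/inceptbench")
+class QuestionModel(BaseModel):
+    text: str
+    options: list[str]
+@router.post("/accessibility")
+async def check_accessibility(
+    question: QuestionModel,
+    target_grade: int = 8,
+):
+    analyzer = AccessibilityAnalyzer()
+    # Pydantic v2 (required by this package): model_dump() replaces the deprecated .dict()
+    result = await analyzer.analyze(question.model_dump(), target_grade)
+    return result.to_dict()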
+```
+### Package Export
+```python
+# mirofish_simulator/__init__.py
+from .accessibility import (
+    AccessibilityAnalyzer,
+    AccessibilityResult,
+    flesch_kincaid_grade,
+    get_word_grade_level,
+)
+```
+## Value Proposition
+| What InceptBench Has | What MiroFish Adds |
+|---------------------|-------------------|
+| LLM distractor analysis | Deterministic readability scores |
+| Factual accuracy check | Vocabulary-to-grade mapping |
+| Subject/topic tagging | Prior knowledge requirements |
+| - | Actionable simplification recommendations |
+MiroFish is NOT a replacement for InceptBench evaluation. It's a complementary tool for content accessibility validation.

mirofish_simulator-0.8.0/PKG-INFO ADDED

@@ -0,0 +1,290 @@
+Metadata-Version: 2.4
+Name: mirofish-simulator
+Version: 0.8.0
+Summary: Agentic student simulation using misconception matching - realistic wrong answers without LLM cheating
+Project-URL: Homepage, https://github.com/StanHus/MiroFish
+Project-URL: Documentation, https://github.com/StanHus/MiroFish#readme
+Project-URL: Repository, https://github.com/StanHus/MiroFish
+Author: MiroFish Team
+License: MIT
+Keywords: ai,education,evaluation,llm,misconceptions,simulation,student-modeling
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Education
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Education
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.10
+Requires-Dist: openai>=1.0.0
+Requires-Dist: pydantic>=2.0.0
+Provides-Extra: dev
+Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
+Requires-Dist: pytest>=8.0.0; extra == 'dev'
+Description-Content-Type: text/markdown
+# MiroFish Simulator
+**Agentic student simulation** using misconception matching - produces realistic wrong answers without LLM "cheating".
+## The Problem
+Traditional LLM-based student simulation doesn't work: **LLMs know everything**. When you ask GPT-4 to "act like a 5th grader who doesn't know about electoral votes", it still picks 270 because it can't actually "not know" things.
+## The Solution: Misconception Matching
+Instead of trying to suppress LLM knowledge (impossible), we use a **multi-agent pipeline** that matches student misconceptions to wrong answers:
+```
+┌──────────────────────────────────────────────────────────────────┐
+│                       AgenticOrchestrator                        │
+│                                                                  │
+│  ┌────────────────┐    ┌────────────────┐    ┌────────────────┐  │
+│  │   DISTRACTOR   │    │    STUDENT     │    │    SELECTOR    │  │
+│  │     AGENT      │───▶│     MODEL      │───▶│     AGENT      │  │
+│  │                │    │     AGENT      │    │                │  │
+│  │ "What error    │    │ "What does     │    │ "Match         │  │
+│  │  leads to      │    │  this student  │    │  misconception │  │
+│  │  each wrong    │    │  believe?"     │    │  to answer"    │  │
+│  │  answer?"      │    │                │    │                │  │
+│  └────────────────┘    └────────────────┘    └────────────────┘  │
+└──────────────────────────────────────────────────────────────────┘
+```
+### How It Works
+1. **DistractorAgent** - Analyzes each wrong answer to identify what misconception leads to it
+   - "Option A (218) catches students who confuse electoral votes with House majority"
+   - "Option D (435) catches students who confuse electoral votes with the size of the House of Representatives"
+2. **StudentModelAgent** - Models what a specific student believes and misconceives
+   - Grade 5 class_clown: Low familiarity, vague beliefs, common misconceptions
+   - Grade 11 honors: High familiarity, specific knowledge, few misconceptions
+3. **SelectorAgent** - Matches the student's misconceptions to the appropriate answer
+   - If student has misconception matching a distractor → pick that distractor
+   - If student has correct belief with high familiarity → pick correct answer
+## Quick Start
+```python
+import asyncio
+from mirofish_simulator import AgenticOrchestrator
+orchestrator = AgenticOrchestrator()
+question = {
+    "text": "How many electoral votes are needed to win the presidency?",
+    "options": ["218", "270", "300", "435"],
+}
+async def main():
+    # Single student
+    result = await orchestrator.simulate(
+        question=question,
+        correct_answer="B",
+        grade=5,
+        archetype="class_clown",
+    )
+    print(f"Selected: {result.selected}")           # "C" (wrong!)
+    print(f"Correct: {result.is_correct}")          # False
+    print(f"Familiarity: {result.student_model.topic_familiarity:.0%}")  # 40%
+    print(f"Misconception: {result.selection_result.misconception_matched}")
+asyncio.run(main())
+```
+### Batch Simulation
+```python
+import asyncio
+# Reuses `orchestrator` and `question` from the Quick Start above.
+# Efficient: distractor analysis is done once and reused for all students
+async def main():
+    results = await orchestrator.simulate_batch(
+        question=question,
+        correct_answer="B",
+        students=[
+            {"grade": 5, "archetype": "class_clown"},
+            {"grade": 8, "archetype": "average_student"},
+            {"grade": 11, "archetype": "honors_overachiever"},
+        ],
+    )
+    for r in results:
+        status = "✓" if r.is_correct else "✗"
+        print(f"Grade {r.grade} {r.archetype}: {r.selected}) {status}")
+asyncio.run(main())
+```
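The efficiency claim rests on a simple pattern: do the per-question work once, then fan out the per-student work over the shared result. The sketch below shows that pattern with toy stand-ins for the agents. `analyze_distractors`, `model_and_select`, and `CALLS` are hypothetical helpers, and the toy selection rule exists only to make the output checkable. The package does not say whether it runs students concurrently, so `asyncio.gather` is this sketch's own choice.

```python
import asyncio

# Counts how often the per-question step runs, to show it happens once per batch
CALLS = {"distractor_analysis": 0}

async def analyze_distractors(question: dict) -> dict[str, str]:
    """Hypothetical stand-in for DistractorAgent: option -> misconception."""
    CALLS["distractor_analysis"] += 1
    await asyncio.sleep(0)  # placeholder for an LLM call
    return {"A": "house_majority", "D": "house_size"}

async def model_and_select(analysis: dict[str, str], student: dict) -> str:
    """Hypothetical stand-in for StudentModelAgent + SelectorAgent."""
    await asyncio.sleep(0)  # placeholder for LLM calls
    # Toy rule for illustration only: grade 11+ answers correctly,
    # younger students fall into the first mapped misconception
    return "B" if student["grade"] >= 11 else next(iter(analysis))

async def simulate_batch(question: dict, students: list[dict]) -> list[str]:
    # Per-question work: done once for the whole batch
    analysis = await analyze_distractors(question)
    # Per-student work reuses that analysis; gather runs the students concurrently
    return list(await asyncio.gather(*(model_and_select(analysis, s) for s in students)))
```

With three students, the distractor step runs once instead of three times. That is where the savings in LLM calls come from.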
+## Realistic Results
+The system produces realistic differentiation:
+| Student | Electoral Votes (hard) | Branches of Gov (easy) |
+|---------|----------------------|----------------------|
+| Grade 5 class_clown | ✗ | ✓ |
+| Grade 8 average | ✗ | ✓ |
+| Grade 11 honors | ✓ | ✓ |
+- **Easy questions**: All students get them right (basic civics)
+- **Hard factual questions**: Only students with specific knowledge get them right
+- **Archetypes matter**: class_clown (low familiarity) gets more wrong than honors
+## Agent Details
+### DistractorAgent
+Maps each wrong answer to the misconception that leads to it.
+```python
+from mirofish_simulator import DistractorAgent
+agent = DistractorAgent()
+analysis = await agent.analyze(question, correct_answer="B")
+for mapping in analysis.mappings:
+    if not mapping.is_correct:
+        print(f"{mapping.option}) {mapping.option_text}")
+        print(f"   Misconception: {mapping.leads_from_misconception}")
+        print(f"   Grade appeal: {mapping.grade_level_appeal}")
+```
+### StudentModelAgent
+Models what a student believes (correct and incorrect).
+```python
+from mirofish_simulator import StudentModelAgent
+agent = StudentModelAgent()
+student = await agent.model_student(question, grade=5, archetype="class_clown")
+print(f"Beliefs: {student.beliefs}")
+print(f"Misconceptions: {student.misconceptions}")
+print(f"Topic familiarity: {student.topic_familiarity:.0%}")
+print(f"Guesses when unsure: {student.guesses_when_unsure}")
+```
+### SelectorAgent
+Matches student misconceptions to answers.
+```python
+from mirofish_simulator import SelectorAgent
+agent = SelectorAgent()
+# `analysis` and `student` come from the DistractorAgent and StudentModelAgent examples above
+selection = await agent.select(question, analysis, student)
+print(f"Selected: {selection.selected}")
+print(f"Reason: {selection.selection_reason}")
+print(f"Misconception matched: {selection.misconception_matched}")
+```
+## Archetypes
+| Archetype | Familiarity | Behavior |
+|-----------|-------------|----------|
+| `honors_overachiever` | High (80%+) | Specific knowledge, confident |
+| `average_student` | Medium (60-70%) | Taught content, some gaps |
+| `class_clown` | Low (40%) | Minimal attention, guesses |
+| `esl_student` | Medium | Core concepts solid, vocabulary issues |
+| `disengaged_but_smart` | Variable | Has ability, inconsistent |
+| `quiet_thinker` | Medium | Second-guesses self |
+| `debate_club_kid` | High in areas of interest | Good at arguments |
+## Installation
+```bash
+pip install mirofish-simulator
+```
+Or from source:
+```bash
+cd packages/mirofish-simulator
+pip install -e .
+```
+## Environment
+```bash
+export OPENAI_API_KEY="sk-..."
+```
+## API Reference
+### AgenticOrchestrator
+```python
+class AgenticOrchestrator:
+    def __init__(
+        self,
+        api_key: str | None = None,   # Uses OPENAI_API_KEY env var if not provided
+        base_url: str | None = None,  # Custom API base URL
+        model: str = "gpt-4o-mini",
+    ) -> None: ...
+    # Single simulation
+    async def simulate(
+        self,
+        question: dict,           # {"text": "...", "options": [...]}
+        correct_answer: str,      # "A", "B", "C", or "D"
+        grade: int,               # 1-12
+        archetype: str,           # See archetypes above
+    ) -> "AgenticSimulationResult": ...
+    # Batch simulation (efficient)
+    async def simulate_batch(
+        self,
+        question: dict,
+        correct_answer: str,
+        students: list[dict],     # [{"grade": 5, "archetype": "..."}, ...]
+    ) -> "list[AgenticSimulationResult]": ...
+```
+### AgenticSimulationResult
+```python
+result.selected              # "A", "B", "C", "D"
+result.selected_text         # The full answer text
+result.is_correct            # True/False
+result.grade                 # Student grade
+result.archetype             # Student archetype
+# Agent outputs
+result.distractor_analysis   # DistractorAnalysis
+result.student_model         # StudentModel
+result.selection_result      # SelectionResult
+# Methods
+result.to_dict()             # Full dict representation
+result.summary()             # Human-readable summary
+```
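A batch can be rolled up using only the fields documented above. In the sketch below, `summarize` and `StubResult` are hypothetical helpers. The stub mimics the documented `selected`, `is_correct`, `grade`, and `archetype` fields so the snippet runs without an API key. Given the caveat in INCEPTBENCH_INTEGRATION.md about response-distribution prediction, these counts describe the simulation, not real students.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class StubResult:
    # Hypothetical stand-in carrying the documented AgenticSimulationResult fields
    selected: str
    is_correct: bool
    grade: int
    archetype: str

def summarize(results) -> dict:
    """Percent correct and how often each option was chosen across a batch."""
    n = len(results)
    counts = Counter(r.selected for r in results)
    return {
        "percent_correct": 100 * sum(r.is_correct for r in results) / n if n else 0.0,
        "option_counts": dict(sorted(counts.items())),
    }
```

Because it only reads attributes, `summarize` should accept the list returned by `simulate_batch()` as well as the stubs.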
+## Accessibility Analysis (Static)
+For deterministic content analysis without an LLM:
+```python
+from mirofish_simulator import AccessibilityAnalyzer
+analyzer = AccessibilityAnalyzer()
+result = await analyzer.analyze(content, target_grade=5)
+print(f"Reading Level: Grade {result.reading_level.flesch_kincaid_grade}")
+print(f"Vocabulary Issues: {len(result.vocabulary.issues)}")
+```
+## Version History
+### v0.8.0 (Current)
+- **Agentic misconception-matching architecture**
+- DistractorAgent, StudentModelAgent, SelectorAgent
+- Factual vs conceptual question handling
+- Realistic wrong answers without LLM cheating
+### v0.7.0
+- Multi-agent with verification (deprecated approach)
+### v0.6.0
+- Single agent simulation
+## License
+MIT