evalvault 1.62.0.tar.gz → 1.62.1.tar.gz
- {evalvault-1.62.0 → evalvault-1.62.1}/PKG-INFO +1 -1
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/INDEX.md +4 -0
- evalvault-1.62.1/docs/guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md +426 -0
- evalvault-1.62.1/docs/guides/LENA_RAGAS_CALIBRATION_DEV_PLAN.md +428 -0
- evalvault-1.62.1/docs/guides/PRD_LENA.md +637 -0
- evalvault-1.62.1/docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md +171 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/package-lock.json +14 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/package.json +1 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/AnalysisNodeOutputs.tsx +26 -20
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/PrioritySummaryPanel.tsx +4 -3
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/SpacePlot2D.tsx +27 -21
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/SpacePlot3D.tsx +4 -2
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/StatusBadge.tsx +1 -1
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/hooks/useInsightSpace.ts +6 -8
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/AnalysisCompareView.tsx +7 -3
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/AnalysisLab.tsx +25 -22
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/AnalysisResultView.tsx +3 -3
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/CompareRuns.tsx +8 -2
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/ComprehensiveAnalysis.tsx +24 -28
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/CustomerReport.tsx +6 -2
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/Dashboard.tsx +6 -2
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/EvaluationStudio.tsx +3 -3
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/RunDetails.tsx +15 -9
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/Settings.tsx +3 -2
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/Visualization.tsx +28 -9
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/services/api.ts +3 -3
- {evalvault-1.62.0 → evalvault-1.62.1}/mkdocs.yml +1 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/pyproject.toml +1 -1
- {evalvault-1.62.0 → evalvault-1.62.1}/uv.lock +1 -1
- evalvault-1.62.0/docs/guides/RAG_HUMAN_FEEDBACK_CALIBRATION.md +0 -298
- {evalvault-1.62.0 → evalvault-1.62.1}/.cursor/worktrees.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.dockerignore +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.env.example +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/ISSUE_TEMPLATE/question.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/dependabot.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/pull_request_template.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/stale.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/workflows/ci.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/workflows/release.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.github/workflows/stale.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.gitignore +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.pre-commit-config.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/.python-version +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/AGENTS.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/CHANGELOG.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/CLAUDE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/CODE_OF_CONDUCT.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/CONTRIBUTING.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/Dockerfile +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/LICENSE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/README.en.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/SECURITY.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/agent.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/client.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/main.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/shared/decisions.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/shared/dependencies.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/templates/coordinator_guide.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/templates/work_log_template.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory_integration.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/progress.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/app_spec.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/baseline.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/coding_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/existing_project_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/improvement/architecture_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/improvement/base_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/improvement/coordinator_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/improvement/observability_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/initializer_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/prompt_manifest.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/system.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/requirements.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/agent/security.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/domains/insurance/memory.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/domains/insurance/terms_dictionary_en.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/domains/insurance/terms_dictionary_ko.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/methods.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/models.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/regressions/default.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/regressions/ux.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/stage_metric_playbook.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/config/stage_metric_thresholds.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/dummy_test_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/insurance_qa_korean.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/insurance_qa_korean.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/insurance_qa_korean_2.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/insurance_qa_korean_3.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/sample.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/visualization_20q_cluster_map.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/visualization_20q_korean.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/visualization_2q_cluster_map.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/visualization_2q_korean.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/kg/knowledge_graph.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/raw/The Complete Guide to Mastering Suno Advanced Strategies for Professional Music Generation.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/raw/edge_cases.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/raw/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/data/raw/sample_rag_knowledge.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/dataset_templates/dataset_template.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/dataset_templates/dataset_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/dataset_templates/dataset_template.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/dataset_templates/method_input_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docker-compose.langfuse.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docker-compose.phoenix.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docker-compose.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/README.ko.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/ROADMAP.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/STATUS.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/adapters/inbound.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/adapters/outbound.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/config.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/domain/entities.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/domain/metrics.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/domain/services.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/ports/inbound.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/ports/outbound.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/architecture/open-rag-trace-collector.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/architecture/open-rag-trace-spec.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/getting-started/INSTALLATION.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/AGENTS_SYSTEM_GUIDE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/CLI_MCP_PLAN.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/DEV_GUIDE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/OPEN_RAG_TRACE_INTERNAL_ADAPTER.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/OPEN_RAG_TRACE_SAMPLES.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/RELEASE_CHECKLIST.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/USER_GUIDE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/rag_human_feedback_calibration_implementation_plan.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/mapping/component-to-whitepaper.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/00_frontmatter.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/01_overview.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/02_architecture.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/03_data_flow.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/04_components.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/05_expert_lenses.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/06_implementation.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/07_advanced.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/08_customization.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/09_quality.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/10_performance.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/11_security.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/12_operations.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/13_standards.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/14_roadmap.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/INDEX.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/STYLE_GUIDE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/stylesheets/extra.css +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/dataset_template.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/dataset_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/dataset_template.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/kg_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/retriever_docs_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/tools/generate-whitepaper.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/docs/web_ui_analysis_migration_plan.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/dummy_test_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/korean_rag/faithfulness_test.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/korean_rag/insurance_qa_100.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/korean_rag/keyword_extraction_test.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/korean_rag/retrieval_test.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/comparison.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/full_results.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/leaderboard.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/results_mteb.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/retrieval_result.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/run_korean_benchmark.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/kg_generator_demo.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/method_plugin_template/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/method_plugin_template/pyproject.toml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/method_plugin_template/src/method_plugin_template/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/method_plugin_template/src/method_plugin_template/methods.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/usecase/comprehensive_workflow_test.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/usecase/insurance_eval_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/examples/usecase/output/comprehensive_report.html +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/.env.example +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/.gitignore +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/analysis-compare.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/analysis-lab.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/compare-runs.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/dashboard.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/domain-memory.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/evaluation-studio.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/knowledge-base.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/mocks/intents.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/mocks/run_details.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/mocks/runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/run-details.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/eslint.config.js +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/index.html +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/playwright.config.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/public/vite.svg +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/App.css +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/App.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/assets/react.svg +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/InsightSpacePanel.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/Layout.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/MarkdownContent.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/SpaceLegend.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/VirtualizedText.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/config/ui.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/config.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/index.css +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/main.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/DomainMemory.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/KnowledgeBase.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/VisualizationHome.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/types/plotly.d.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/format.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/phoenix.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/runAnalytics.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/score.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/summaryMetrics.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/tailwind.config.js +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/tsconfig.app.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/tsconfig.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/tsconfig.node.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/frontend/vite.config.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/package-lock.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/.gitkeep +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_0aa9fab0-6c2c-4c1c-b228-202a38a2f00c.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_0aa9fab0-6c2c-4c1c-b228-202a38a2f00c.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_2163f844-ee2c-4630-9ba8-35cd9954d92e.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_2163f844-ee2c-4630-9ba8-35cd9954d92e.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_4516d358-2797-4c46-9f14-c1d975588025.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_4516d358-2797-4c46-9f14-c1d975588025.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_9fbf4776-9f5b-4c4b-ba08-c556032cee86.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_9fbf4776-9f5b-4c4b-ba08-c556032cee86.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/causal_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/diagnostic.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/load_data.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/low_samples.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/nlp_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/pattern_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/priority_summary.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/ragas_eval.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/root_cause.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/statistics.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/time_series.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/trend_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/causal_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/diagnostic.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/load_data.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/low_samples.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/nlp_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/pattern_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/priority_summary.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/ragas_eval.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/root_cause.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/statistics.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/time_series.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/trend_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/causal_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/diagnostic.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/load_data.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/low_samples.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/nlp_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/pattern_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/priority_summary.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/ragas_eval.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/root_cause.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/statistics.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/time_series.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/trend_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/causal_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/diagnostic.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/load_data.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/low_samples.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/nlp_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/pattern_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/priority_summary.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/ragas_eval.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/root_cause.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/statistics.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/time_series.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/trend_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/causal_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/diagnostic.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/load_data.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/low_samples.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/nlp_analysis.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/pattern_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/priority_summary.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/ragas_eval.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/root_cause.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/statistics.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/time_series.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/trend_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_change_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_metric_comparison.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_change_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_metric_comparison.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_change_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_metric_comparison.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_0aa9fab0_9fbf4776.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_0aa9fab0_9fbf4776.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_0aa9fab0_f1287e90.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_0aa9fab0_f1287e90.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_8f825b22_4516d358.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_8f825b22_4516d358.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_9fbf4776_a491fa0e.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_9fbf4776_a491fa0e.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_f1287e90_8f825b22.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_f1287e90_8f825b22.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r1_smoke.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r2_graphrag.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r2_graphrag_openai.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r3_bm25.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r3_bm25_langfuse3.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r3_dense_faiss.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/improvement_1d91a667-4288-4742-be3a-a8f5310c5140.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r2_graphrag_openai_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r2_graphrag_openai_stage_report.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r2_graphrag_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r2_graphrag_stage_report.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_langfuse2_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_langfuse3_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_langfuse_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_phoenix_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_stage_report.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_dense_faiss_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_dense_faiss_stage_report.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/retrieval_benchmark_smoke_precision.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/retrieval_benchmark_smoke_precision_graphrag.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/reports/retrieval_benchmark_smoke_precision_multi.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/benchmark/download_kmmlu.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/open_rag_trace_demo.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/open_rag_trace_integration_template.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/otel-collector-config.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/start_web_ui_with_phoenix.sh +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/validate_open_rag_trace.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev_seed_pipeline_results.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/ast_scanner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/confidence_scorer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/graph_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/side_effect_detector.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/generate_api_docs.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/models/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/models/schema.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/renderer/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/renderer/html_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/ops/phoenix_watch.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/backfill_langfuse_trace_url.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_dense_smoke.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_evalvault_run_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_retriever_docs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_smoke_real.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_stage_events_sample.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/pipeline_template_inspect.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/reports/generate_release_notes.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/run_with_timeout.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/test_full_evaluation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/tests/run_regressions.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/tests/run_retriever_stage_report_smoke.sh +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/validate_tutorials.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/verify_ragas_compliance.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/scripts/verify_workflows.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/main.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/benchmark.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/domain.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/knowledge.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/pipeline.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/runs.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/app.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/agent.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/analyze.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/api.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/benchmark.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/calibrate.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/debug.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/domain.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/experiment.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/gate.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/generate.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/history.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/init.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/kg.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/langfuse.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/method.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/phoenix.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/pipeline.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/prompts.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/run.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/run_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/stage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/analysis_io.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/console.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/errors.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/formatters.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/options.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/presets.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/progress.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/validators.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/mcp/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/mcp/schemas.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/mcp/tools.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/analysis_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/base_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/bm25_searcher_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/causal_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/common.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/comparison_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/data_loader_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/detailed_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/embedding_distribution_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/embedding_searcher_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/hybrid_rrf_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/hybrid_weighted_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/llm_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/low_performer_extractor_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/model_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/network_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/nlp_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/pattern_detector_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/pipeline_factory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/pipeline_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/priority_summary_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/ragas_evaluator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/retrieval_benchmark_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_change_detector_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_comparator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_loader_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/search_comparator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/statistical_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/statistical_comparator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/summary_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/trend_detector_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/verification_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/benchmark/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/benchmark/lm_eval_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/cache/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/cache/hybrid_cache.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/cache/memory_cache.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/base.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/csv_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/excel_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/json_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/loader_factory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/method_input_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/streaming_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/templates.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/thresholds.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/debug/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/debug/report_renderer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/ocr/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/ocr/paddleocr_backend.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/pdf_extractor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/versioned_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/domain_memory/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/domain_memory/domain_memory_schema.sql +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/domain_memory/sqlite_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/insight_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/pattern_detector.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/playbook_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/stage_metric_playbook_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/graph_rag_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/networkx_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/parallel_kg_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/query_strategies.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/anthropic_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/azure_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/base.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/instructor_factory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/llm_relation_augmenter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/ollama_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/openai_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/token_aware_chat.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/vllm_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/methods/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/methods/baseline_oracle.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/methods/external_command.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/methods/registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/bm25_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/dense_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/document_chunker.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/hybrid_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/kiwi_tokenizer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/korean_evaluation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/korean_stopwords.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/toolkit.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/phoenix/sync_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/report/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/report/dashboard_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/report/llm_report_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/report/markdown_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/base_sql.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/benchmark_storage_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/postgres_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/postgres_schema.sql +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/schema.sql +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/sqlite_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/open_rag_log_handler.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_decorators.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/phoenix_tracer_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracker/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracker/langfuse_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracker/mlflow_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracker/phoenix_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/agent_types.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/domain_config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/instrumentation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/langfuse_support.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/model_config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/phoenix_support.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/playbooks/improvement_playbook.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/settings.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/debug_ragas.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/debug_ragas_real.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/analysis.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/analysis_pipeline.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/benchmark.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/benchmark_run.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/dataset.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/debug.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/experiment.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/feedback.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/improvement.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/kg.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/memory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/method.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/prompt.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/rag_trace.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/result.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/stage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/analysis_registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/confidence.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/contextual_relevancy.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/entity_preservation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/insurance.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/no_answer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/retrieval_rank.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/terms_dictionary.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/text_match.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/analysis_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/async_batch_executor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/batch_executor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/benchmark_report_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/benchmark_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/benchmark_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/cache_metrics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/cluster_map_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/dataset_preprocessor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/debug_report_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/document_chunker.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/document_versioning.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/domain_learning_hook.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/embedding_overlay.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/entity_extractor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/evaluator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_comparator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_manager.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_reporter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_repository.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_statistics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/improvement_guide_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/intent_classifier.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/kg_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/memory_aware_evaluator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/memory_based_analysis.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/method_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/pipeline_orchestrator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/pipeline_template_registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/prompt_manifest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/prompt_registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/prompt_status.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/ragas_prompt_overrides.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/retrieval_metrics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/retriever_context.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/satisfaction_calibration_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/stage_event_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/stage_metric_guide_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/stage_metric_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/stage_summary_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/synthetic_qa_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/testset_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/threshold_profiles.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/unified_report_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/visual_space_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/mkdocs_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/analysis_pipeline_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/evaluator_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/learning_hook_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/web_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/analysis_cache_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/analysis_module_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/analysis_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/benchmark_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/causal_analysis_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/dataset_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/domain_memory_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/embedding_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/improvement_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/intent_classifier_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/korean_nlp_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/llm_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/method_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/nlp_analysis_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/relation_augmenter_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/report_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/stage_storage_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/storage_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/tracer_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/tracker_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/reports/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/reports/release_notes.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/scripts/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/scripts/regression_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/conftest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/benchmark/retrieval_ground_truth_min.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/benchmark/retrieval_ground_truth_multi.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/auto_insurance_qa_korean_full.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/comprehensive_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/edge_cases.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/edge_cases.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/evaluation_test_sample.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/graphrag_retriever_docs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/graphrag_smoke.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_document.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_english.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_english.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_english.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_korean.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_korean.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_korean.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_korean_versioned_pdf.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/run_mode_simple.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/summary_eval_minimal.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/kg/minimal_graph.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/sample_dataset.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/sample_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/sample_dataset.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/benchmark/test_benchmark_service_integration.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/conftest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_cli_integration.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_data_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_e2e_scenarios.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_evaluation_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_full_workflow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_langfuse_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_phoenix_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_pipeline_api_contracts.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_storage_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_summary_eval_fixture.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/optional_deps.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/inbound/mcp/test_execute_tools.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/inbound/mcp/test_read_tools.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/documents/test_pdf_extractor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/documents/test_versioned_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/test_insight_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/test_pattern_detector.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/test_playbook_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/test_stage_metric_playbook_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/kg/test_graph_rag_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/kg/test_parallel_kg_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/storage/test_benchmark_storage_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/config/test_phoenix_support.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/conftest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_analysis_metric_registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_confidence.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_contextual_relevancy.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_entity_preservation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_metric_registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_no_answer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_retrieval_rank.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_text_match.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_cache_metrics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_claim_level.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_dataset_preprocessor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_document_versioning.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_evaluator_comprehensive.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_improvement_guide_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_retrieval_metrics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_retriever_context.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_stage_event_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_stage_metric_guide_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_synthetic_qa_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/test_embedding_overlay.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/test_prompt_manifest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/test_prompt_status.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/reports/test_release_notes.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/scripts/test_regression_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_agent_types.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_analysis_entities.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_analysis_modules.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_analysis_pipeline.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_analysis_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_anthropic_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_async_batch_executor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_azure_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_benchmark_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_benchmark_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_causal_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli_domain.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli_init.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli_progress.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli_utils.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_data_loaders.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_domain_config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_domain_memory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_entities.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_entities_kg.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_entity_extractor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_evaluator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_experiment.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_hybrid_cache.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_instrumentation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_insurance_metric.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_intent_classifier.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_kg_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_kg_networkx.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_kiwi_tokenizer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_kiwi_warning_suppression.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_korean_dense.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_korean_evaluation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_korean_retrieval.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_langfuse_tracker.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_llm_relation_augmenter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_lm_eval_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_markdown_report.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_memory_cache.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_memory_services.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_method_plugins.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_mlflow_tracker.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_model_config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_nlp_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_nlp_entities.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_ollama_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_openai_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_phoenix_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_pipeline_orchestrator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_ports.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_postgres_storage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_rag_trace_entities.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_run_memory_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_run_mode_fixtures.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_settings.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_sqlite_storage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_stage_cli.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_stage_metric_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_stage_storage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_stage_summary_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_statistical_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_streaming_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_summary_eval_fixture.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_testset_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_web_adapter.py +0 -0
|
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: evalvault
-Version: 1.62.0
+Version: 1.62.1
 Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
 Project-URL: Homepage, https://github.com/ntts9990/EvalVault
 Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
@@ -16,6 +16,8 @@
 - User guide (including operations): `guides/USER_GUIDE.md`
 - Development/contributing: `guides/DEV_GUIDE.md`
 - CLI→MCP porting plan: `guides/CLI_MCP_PLAN.md`
+- RAGAS human-feedback calibration: `guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md`
+- Diagnostic playbook: `guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md` (problem → analysis → interpretation → action flow)
 - Release checklist: `guides/RELEASE_CHECKLIST.md`
 - Status summary: `STATUS.md`
 - Roadmap: `ROADMAP.md`
@@ -36,6 +38,8 @@ docs/
 │ ├── USER_GUIDE.md # Comprehensive usage/operations guide
 │ ├── DEV_GUIDE.md # Development routine/testing/quality
 │ ├── CLI_MCP_PLAN.md # CLI→MCP porting plan (Living Doc)
+│ ├── RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md # RAGAS calibration methodology
+│ ├── EVALVAULT_DIAGNOSTIC_PLAYBOOK.md # Diagnostic playbook
 │ ├── RELEASE_CHECKLIST.md # Deployment checklist
 │ ├── OPEN_RAG_TRACE_*.md # Open RAG Trace samples/internal wrappers
 │ └── OPEN_RAG_TRACE_*.md
@@ -0,0 +1,426 @@
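+# EvalVault Diagnostic Playbook
+> A diagnostic runbook (Korean-language original) that connects EvalVault's Analysis features into a "problem → analysis selection → execution → artifact interpretation → improvement experiment" flow
+
+---
+
+## 1) Purpose / Scope
+
+### Purpose
+- Go beyond whether an evaluation score is "good or bad" and lay out **why it is so (cause) and what to change (action)** as a reproducible flow.
+- Standardize diagnosis **around run_id**, on the premise that the CLI and the Web UI share the same DB.
+
+### Scope (included)
+- Single-run (run_id) diagnosis: statistics/NLP/causal/playbook/hypothesis/network/time series
+- A/B comparison (run_id A vs B): statistical comparison + change detection + comparison report
+- Retrieval/embedding/morpheme "verification" routines (checking data quality, preprocessing, and retriever quality)
+- Structure of the outputs (reports/artifacts) and interpretation criteria
+- Iterative improvement loop and quality checklist
+
+### Scope (excluded)
+- This document does not cover code changes or new feature design.
+- External links/URLs are not referenced.
+
+### Expert-perspective checks (document structure criteria)
+- **Cognitive scientist**: Reduce ambiguity/bias in the evaluation guide and keep a flow that lowers cognitive load during diagnosis (decision tree → execution → interpretation).
+- **Editor**: Check that the report/artifact interpretation order is consistent (summary → evidence → action).
+- **Korean-language scholar**: Judge Korean expression/tone issues together with the `verify_morpheme` results, and check that the style criteria are clear.
+- **Software developer**: Check that artifact paths and run_id reproducibility are always recorded, and that the cause can be traced on failure.
+- **Architect**: Check that inter-module dependencies during diagnosis are not excessive (keep the single-axis change principle).
+- **UI/UX expert**: Fix the order in which key artifacts/conclusions are shown so that users immediately understand the "next action".
+
+---
+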
+## 2) Prerequisites (required) / What you need
+
+### Execution environment (Extras)
+- `analysis`: statistics/analysis modules (e.g., scikit-learn, xgboost)
+- `timeseries`: advanced time-series analysis (e.g., aeon, numba)
+- `dashboard`: dashboard output (e.g., matplotlib)
+- (recommended) `korean`: Korean morpheme analysis/retrieval (e.g., kiwipiepy, rank-bm25, sentence-transformers)
+
+### Required inputs/identifiers
+- `DB path`: default `data/db/evalvault.db`, or the `EVALVAULT_DB_PATH` environment variable
+- `run_id`: the unit identifier that ties together evaluation, analysis, and artifacts
+- `metrics`: e.g., `faithfulness`, `answer_relevancy`, `context_precision`, `context_recall`, `factual_correctness`, `semantic_similarity`
+
+### Key output paths (fixed pattern)
+- Single-run automatic analysis:
+  - `reports/analysis/analysis_<RUN_ID>.json`
+  - `reports/analysis/analysis_<RUN_ID>.md`
+  - `reports/analysis/artifacts/analysis_<RUN_ID>/index.json`
+  - `reports/analysis/artifacts/analysis_<RUN_ID>/<node_id>.json`
+- A/B comparison:
+  - `reports/comparison/comparison_<RUN_A>_<RUN_B>.json`
+  - `reports/comparison/comparison_<RUN_A>_<RUN_B>.md`
+
+---
+
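The fixed output-path patterns listed in section 2 of the playbook can be sketched as a small helper. This is only an illustration and not part of EvalVault: the function names `analysis_paths`, `node_artifact`, and `comparison_paths` and the `reports_root` parameter are hypothetical, and only the path patterns themselves come from the playbook.

```python
from pathlib import Path
from typing import Dict


def analysis_paths(run_id: str, reports_root: str = "reports") -> Dict[str, Path]:
    """Build the single-run analysis output paths for a given run_id (hypothetical helper)."""
    base = Path(reports_root) / "analysis"
    artifacts_dir = base / "artifacts" / f"analysis_{run_id}"
    return {
        "json": base / f"analysis_{run_id}.json",
        "markdown": base / f"analysis_{run_id}.md",
        "index": artifacts_dir / "index.json",
        "artifacts_dir": artifacts_dir,
    }


def node_artifact(run_id: str, node_id: str, reports_root: str = "reports") -> Path:
    """Path of one node's artifact, following the <node_id>.json pattern."""
    return analysis_paths(run_id, reports_root)["artifacts_dir"] / f"{node_id}.json"


def comparison_paths(run_a: str, run_b: str, reports_root: str = "reports") -> Dict[str, Path]:
    """Build the A/B comparison output paths for two run ids (hypothetical helper)."""
    base = Path(reports_root) / "comparison"
    stem = f"comparison_{run_a}_{run_b}"
    return {"json": base / f"{stem}.json", "markdown": base / f"{stem}.md"}
```

For example, `node_artifact("<RUN_ID>", "diagnostic_playbook")` yields the location of the `diagnostic_playbook.json` artifact that the scenarios refer to.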
+## 3) Map of analysis tools (classification)
+
+### 3.1 Classification by "intent (AnalysisIntent)" (the basis for choosing what to run)
+Intent enum: `src/evalvault/domain/entities/analysis_pipeline.py`
+
+| Category | Intent value | One-line purpose | Core nodes of the default template (summary) |
+|---|---|---|---|
+| Verification | `verify_morpheme` | Check Korean morpheme processing quality | `data_loader → morpheme_analyzer → morpheme_quality_checker → verification_report` |
+| Verification | `verify_embedding` | Check embedding distribution/quality | `data_loader → embedding_analyzer → embedding_distribution → verification_report` |
+| Verification | `verify_retrieval` | Check retrieved-context quality | `data_loader → retrieval_analyzer → retrieval_quality_checker → verification_report` |
+| Comparison | `compare_search` | Compare BM25/Dense/Hybrid retrieval | `data_loader → (bm25/embedding/hybrid) → search_comparator → comparison_report` |
+| Comparison | `compare_models` | Compare performance by model | `run_loader → model_analyzer → statistical_comparator → comparison_report` |
+| Comparison | `compare_runs` | Compare at the run level | `run_loader → run_analyzer → statistical_comparator → comparison_report` |
+| Analysis | `analyze_low_metrics` | Root-cause analysis of low scores (comprehensive) | `ragas_evaluator/low_performer_extractor/diagnostic_playbook/causal/root_cause/priority_summary/llm_report` |
+| Analysis | `analyze_patterns` | Pattern-focused (keywords/question types) | `data_loader → nlp_analyzer → pattern_detector → (priority/llm_report)` |
+| Analysis | `analyze_trends` | Trends based on run history | `run_loader → time_series_analyzer → trend_detector → llm_report` |
+| Analysis | `analyze_statistical` | Statistical summary/correlation | `data_loader → statistical_analyzer` |
+| Analysis | `analyze_nlp` | NLP summary (keywords/types) | `data_loader → nlp_analyzer` |
+| Analysis | `analyze_causal` | Causal clues (including correlation-based hints) | `data_loader → statistical_analyzer → causal_analyzer` |
+| Analysis | `analyze_network` | Metric correlation network | `data_loader → statistical_analyzer → network_analyzer` |
+| Analysis | `analyze_playbook` | Playbook-based diagnosis (simple) | `data_loader → diagnostic_playbook` |
+| Analysis | `detect_anomalies` | Anomaly detection (time series) | `run_loader → timeseries_advanced(mode=anomaly)` |
+| Analysis | `forecast_performance` | Performance forecasting (time series) | `run_loader → timeseries_advanced(mode=forecast)` |
+| Analysis | `generate_hypotheses` | Hypothesis generation (automatic) | `data_loader → statistical_analyzer → ragas_evaluator → low_performer_extractor → hypothesis_generator` |
+| Benchmark | `benchmark_retrieval` | Retrieval benchmark | `retrieval_benchmark` |
+| Report | `generate_summary` | Summary report | `data_loader/statistical/priority_summary → llm_report(report_type=summary)` |
+| Report | `generate_detailed` | Detailed report (comprehensive) | `statistics+RAGAS+diagnosis+NLP+patterns+causal+root cause+trends → llm_report(report_type=analysis)` |
+| Report | `generate_comparison` | Comparison report (comprehensive) | `run_loader → run_metric_comparator + run_change_detector → llm_report(report_type=comparison)` |
+
+> Source for template definitions: `src/evalvault/domain/services/pipeline_template_registry.py`
+> Source for module registration: `src/evalvault/adapters/outbound/analysis/pipeline_factory.py`
+
+---
+
+### 3.2 Map of "modules (module_id)" (the actual units of execution)
+Modules are registered in `pipeline_factory.py`, and each module has a `module_id`.
+
+| module_id | File (source) | Role | Typical intents/situations |
+|---|---|---|---|
+| `data_loader` | `src/evalvault/adapters/outbound/analysis/data_loader_module.py` | Load data/runs | Most single-run analyses |
+| `run_loader` | `src/evalvault/adapters/outbound/analysis/run_loader_module.py` | Load runs from the DB | Comparison/trends/time series |
+| `statistical_analyzer` | `src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py` | Mean/variance/correlation/pass rate | "Starting point for root-cause analysis" |
+| `nlp_analyzer` | `src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py` | Keyword/question-type summary | Checking patterns and distributions |
+| `causal_analyzer` | `src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py` | Generate causal clues | Strengthening cause hypotheses |
+| `diagnostic_playbook` | `src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py` | Per-metric diagnosis/improvement hints | Rapid response to low scores |
+| `root_cause_analyzer` | `src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py` | Aggregate diagnosis + causal clues into causes | Organizing action candidates |
+| `pattern_detector` | `src/evalvault/adapters/outbound/analysis/pattern_detector_module.py` | Keyword/type pattern summary | Finding reproduction conditions |
+| `retrieval_analyzer` | `src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py` | Retrieval summary statistics | Retrieval quality verification |
+| `retrieval_quality_checker` | `src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py` | Retrieval quality checks (heuristic) | Quick call on "is retrieval the problem?" |
+| `embedding_analyzer` | `src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py` | Embedding distribution statistics | Embedding drift/quality |
+| `morpheme_analyzer` | `src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py` | Morpheme analysis | Korean queries/tokenization |
+| `morpheme_quality_checker` | `src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py` | Morpheme quality checks | Reliability of Korean analysis |
+| `time_series_analyzer` | `src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py` | Run-history summary | Understanding trends |
+| `timeseries_advanced` | `src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py` | Anomaly detection/forecasting | Detecting release regressions |
+| `trend_detector` | `src/evalvault/adapters/outbound/analysis/trend_detector_module.py` | Trend detection | Locating the regression point |
+| `network_analyzer` | `src/evalvault/adapters/outbound/analysis/network_analyzer_module.py` | Correlation network | Understanding metric structure |
+| `hypothesis_generator` | `src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py` | Automatic hypothesis generation | Designing the next experiment |
+| `run_metric_comparator` | `src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py` | Detailed A/B metric comparison | Comparison scorecard |
+| `run_change_detector` | `src/evalvault/adapters/outbound/analysis/run_change_detector_module.py` | Detect data/config/prompt changes | "What changed?" |
+| `llm_report` | `src/evalvault/adapters/outbound/analysis/llm_report_module.py` | Summary/detailed/comparison reports | Human-readable conclusions |
+
+---
+
+## 4) Diagnostic decision tree (problem → analysis selection)
+
+### 4.1 Step 0: "Diagnosable state" check (eliminate causes of failure)
+- [ ] Is `--db` or `EVALVAULT_DB_PATH` correct?
+- [ ] Does the target `run_id` exist in the DB? (check with `evalvault history`)
+- [ ] Does the dataset include `thresholds` (or do you know the default criteria)?
+- [ ] Are the metric execution requirements met (e.g., metrics that need embeddings)?
+
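The first item of the step-0 checklist (is the DB location right?) can be pre-checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `resolve_db_path` and `preflight` are hypothetical, and the precedence used here (explicit `--db` value, then `EVALVAULT_DB_PATH`, then the default `data/db/evalvault.db`) is an assumption rather than documented EvalVault behavior. Whether the `run_id` exists is still best confirmed with `evalvault history`, as the checklist says.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

DEFAULT_DB_PATH = "data/db/evalvault.db"  # default named in the playbook


def resolve_db_path(cli_db: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the DB path. The precedence (CLI > env var > default) is an assumption."""
    env = os.environ if env is None else env
    return Path(cli_db or env.get("EVALVAULT_DB_PATH") or DEFAULT_DB_PATH)


def preflight(cli_db: Optional[str] = None,
              env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means the DB file was found."""
    problems = []
    db_path = resolve_db_path(cli_db, env)
    if not db_path.is_file():
        problems.append(f"DB file not found: {db_path}")
    return problems
```

Passing `env` explicitly keeps the check easy to test without touching the real environment.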
+### 4.2 Step 1: Symptom-based selection (the fastest branch)
+Pick the "current problem" from the table below and go straight to the recommended Intent/command.
+
+| Problem (symptom) | Primary goal | Recommended Intent (or command) | Artifacts to check first |
+|---|---|---|---|
+| A specific metric is below its threshold (generally low scores) | Build candidate causes for "why is it low?" | `analyze_low_metrics` or `analyze_playbook` | `analysis_<RUN_ID>.md`, `diagnostic_playbook.json`, `root_cause_analyzer.json` |
+| Low `faithfulness` | Separate evidence/citation/context-consistency issues | `analyze_low_metrics` + `verify_retrieval` | `retrieval_quality_checker.json`, `ragas_evaluator.json` |
+| Low `answer_relevancy` | Check intent understanding/prompt alignment | `analyze_low_metrics` + `analyze_patterns` | `nlp_analyzer.json`, `pattern_detector.json` |
+| Low `context_precision` | Unnecessary context/noise | `verify_retrieval` + (if needed) `compare_search` | `retrieval_analyzer.json`, `search_comparator.json` |
+| Low `context_recall` | top_k/query expansion/chunking issues | `verify_retrieval` + `benchmark_retrieval` | `retrieval_benchmark.json`, `retrieval_quality_checker.json` |
+| Embedding-based metrics are unstable/anomalous | Check the embedding backend/distribution | `verify_embedding` | `embedding_analyzer.json`, `embedding_distribution.json` |
+| Tokenization/retrieval behaves oddly in Korean | Check the morpheme-based pipeline | `verify_morpheme` | `morpheme_quality_checker.json` |
+| Scores drop suddenly after a release | Trace the drop point and the change that caused it | `analyze_trends` + `generate_comparison` | `trend_detector.json`, `run_change_detector.json` |
+| Improvement is ambiguous in an A/B comparison | Summarize significance/changes | `evalvault analyze-compare` or `generate_comparison` | `comparison_<A>_<B>.md`, `run_metric_comparator.json` |
+| Stage bottlenecks/latency/missing traces | Separate execution-stage problems from tracing problems | `evalvault stage` + `evalvault debug report` | debug report output, stage metrics list |
+| Correlation structure has changed (coupling between metrics) | Check the correlation/network structure | `analyze_network` | `network_analyzer.json` |
+| Cannot design the next experiment | Automatic hypothesis generation | `generate_hypotheses` | `hypothesis_generator.json` |
+
+---
+
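The metric rows of the symptom table in section 4.2 amount to a lookup from a failing metric to recommended intents. The sketch below encodes just those rows. The names `RECOMMENDED_INTENTS`, `FALLBACK_INTENTS`, and `recommend_intents` are hypothetical and this is not an EvalVault API; only the metric-to-intent pairs are taken from the table.

```python
from typing import Dict, Iterable, List

# Metric-specific rows from the symptom table (section 4.2 of the playbook).
RECOMMENDED_INTENTS: Dict[str, List[str]] = {
    "faithfulness": ["analyze_low_metrics", "verify_retrieval"],
    "answer_relevancy": ["analyze_low_metrics", "analyze_patterns"],
    "context_precision": ["verify_retrieval", "compare_search"],
    "context_recall": ["verify_retrieval", "benchmark_retrieval"],
}

# Generic row: "a specific metric is below its threshold".
FALLBACK_INTENTS: List[str] = ["analyze_low_metrics", "analyze_playbook"]


def recommend_intents(failing_metrics: Iterable[str]) -> List[str]:
    """Collect recommended intents for the failing metrics, de-duplicated in first-seen order."""
    ordered: List[str] = []
    for metric in failing_metrics:
        for intent in RECOMMENDED_INTENTS.get(metric, FALLBACK_INTENTS):
            if intent not in ordered:
                ordered.append(intent)
    return ordered
```

De-duplicating while preserving order keeps shared intents such as `verify_retrieval` from being scheduled twice when several retrieval metrics fail together.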
+## 5) Core playbook scenarios (at least 8)
+
+> Common principle: handle each scenario in the order **(1) problem definition → (2) execution path → (3) artifact reading → (4) next experiment**.
+
+### Scenario 1) "Overall pass rate is low" (quickly build candidate causes)
+- Trigger: the overall pass rate in `analysis_<RUN_ID>.md` is low (e.g., below 0.7)
+- Run (CLI):
+  - `uv run evalvault analyze <RUN_ID> --db data/db/evalvault.db --nlp --causal --playbook --enable-llm`
+- Run (Web UI):
+  - Select a run in Reports → run "Analysis (statistics/NLP/causal/playbook)" in Analysis Lab
+- Artifacts to check:
+  - `reports/analysis/artifacts/analysis_<RUN_ID>/diagnostic_playbook.json`
+  - `reports/analysis/artifacts/analysis_<RUN_ID>/root_cause_analyzer.json`
+  - `reports/analysis/artifacts/analysis_<RUN_ID>/priority_summary.json`
+- Interpretation criteria:
+  - Prioritize the metrics with the largest `gap (threshold - score)` in `diagnostics`.
+  - Among the `root_cause_analyzer` recommendations, adopt the "actions that appear repeatedly" as first-round experiment candidates.
+- Next action:
+  - Change **only one axis** among prompt / retrieval (top_k, reranking) / data cleaning, re-run, and compare.
+
+---
+
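The "largest gap first" rule from Scenario 1 can be illustrated with plain dictionaries. This is a sketch only: `rank_by_gap` is a hypothetical helper, and the input shape (one score and one threshold per metric) is an assumption, not the actual schema of `diagnostic_playbook.json`.

```python
from typing import Dict, List, Tuple


def rank_by_gap(scores: Dict[str, float],
                thresholds: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (metric, gap) pairs for metrics below threshold, largest gap first.

    gap = threshold - score, as defined in the playbook. Metrics without a
    threshold, or at/above it, are skipped.
    """
    gaps = [
        (metric, round(thresholds[metric] - score, 6))  # round away float noise
        for metric, score in scores.items()
        if metric in thresholds and score < thresholds[metric]
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    scores = {"faithfulness": 0.55, "answer_relevancy": 0.80, "context_recall": 0.60}
    thresholds = {"faithfulness": 0.7, "answer_relevancy": 0.7, "context_recall": 0.7}
    for metric, gap in rank_by_gap(scores, thresholds):
        print(f"{metric}: gap={gap}")
```

With these illustrative numbers, `faithfulness` (gap 0.15) would be addressed before `context_recall` (gap 0.1), and `answer_relevancy` is omitted because it passes.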
+### Scenario 2) "faithfulness is low" (separate retrieval problems from generation problems)
+- Trigger: `faithfulness` is below its threshold
+- Run (recommended flow):
+  1) Verify retrieval quality → 2) Analyze causes of low scores (comprehensive)
+- Run (pipeline Intent):
+  - 1) `verify_retrieval`
+  - 2) `analyze_low_metrics`
+- Run (CLI example):
+  - `uv run evalvault pipeline analyze "검색 품질 검증" --run-id <RUN_ID> --db data/db/evalvault.db` (query text: "verify retrieval quality")
+  - `uv run evalvault pipeline analyze "낮은 메트릭 원인 분석" --run-id <RUN_ID> --db data/db/evalvault.db` (query text: "analyze causes of low metrics")
+- Artifacts to check (key):
+  - `passed` and the check items in `retrieval_quality_checker.json` (empty-context ratio / average context tokens / keyword overlap / ground_truth hit)
+  - `faithfulness`-related diagnoses/recommendations in `diagnostic_playbook.json`
+- Interpretation criteria:
+  - If the `verify_retrieval` checks fail: **retrieval/context construction**, rather than generation (LLM), is the primary bottleneck.
+  - If the checks pass but only faithfulness is low: check **evidence citation/source alignment in the answer (prompt/post-processing)** first.
+- Next action:
+  - If the retrieval checks fail: start with reranking / noise filtering / securing a minimum number of context tokens.
+  - If the retrieval checks pass: strengthen alignment in the system prompt, e.g., "cite evidence / no claims outside the context".
+
+---
+
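The interpretation rule of Scenario 2 is a two-way branch on the retrieval check result. A minimal sketch follows; `triage_faithfulness` and `NEXT_ACTION` are hypothetical names, and reading a top-level `passed` key from the loaded `retrieval_quality_checker.json` content is an assumption about that file's layout.

```python
from typing import Any, Dict, Mapping

# Next actions paraphrased from the playbook's Scenario 2.
NEXT_ACTION: Dict[str, str] = {
    "ok": "no action: faithfulness meets the threshold",
    "retrieval": "reranking / noise filtering / minimum context tokens",
    "generation": "tighten the system prompt: cite evidence, no claims outside the context",
}


def triage_faithfulness(faithfulness: float,
                        threshold: float,
                        retrieval_check: Mapping[str, Any]) -> str:
    """Classify a low faithfulness score as a retrieval-side or generation-side problem.

    `retrieval_check` stands for the parsed retrieval_quality_checker.json;
    the top-level `passed` key is an assumed layout.
    """
    if faithfulness >= threshold:
        return "ok"
    if not retrieval_check.get("passed", False):
        return "retrieval"   # checks failed: context construction is the first suspect
    return "generation"      # checks passed: look at citation/prompt alignment
```

A missing `passed` key is treated as a failed check here, which errs on the side of inspecting retrieval first.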
|
|
193
|
+
### 시나리오 3) “answer_relevancy가 낮다” (의도/질문유형 패턴으로 좁힌다)
|
|
194
|
+
- 트리거: `answer_relevancy`가 임계치 미달
|
|
195
|
+
- 실행(Intent): `analyze_patterns` + (필요 시) `analyze_low_metrics`
|
|
196
|
+
- 확인 아티팩트:
|
|
197
|
+
- `nlp_analyzer.json` (top_keywords, question_types)
|
|
198
|
+
- `pattern_detector.json` (상위 키워드/질문유형 요약)
|
|
199
|
+
- 해석 기준:
|
|
200
|
+
- 특정 질문유형(예: 절차형/정의형/비교형)이 과대표집되어 있고 해당 유형에서 점수가 낮으면 → **유형별 프롬프트 분기** 후보.
|
|
201
|
+
- 다음 액션:
|
|
202
|
+
- 질문유형별 템플릿/가드레일을 분리한 뒤 동일 데이터셋으로 재평가.
|
|
203
|
+
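
One way to check the "overrepresented and low-scoring" condition is to group per-item scores by question type. The sketch below is a minimal illustration: the `(question_type, score)` record shape, the function name, and the `min_share` cutoff are assumptions, not the schema of `nlp_analyzer.json`. Types returned by it are the candidates for a dedicated prompt template.

```python
from collections import defaultdict


def low_scoring_types(records, threshold, min_share):
    """Flag question types that are both frequent and below threshold.

    records: list of (question_type, score) pairs (hypothetical shape).
    Returns (question_type, share, mean_score) tuples, lowest mean first.
    """
    by_type = defaultdict(list)
    for qtype, score in records:
        by_type[qtype].append(score)

    total = len(records)
    flagged = []
    for qtype, values in by_type.items():
        share = len(values) / total
        mean = sum(values) / len(values)
        if share >= min_share and mean < threshold:
            flagged.append((qtype, share, mean))
    return sorted(flagged, key=lambda row: row[2])


# Hypothetical per-item answer_relevancy scores.
records = [
    ("procedural", 0.50), ("procedural", 0.60), ("procedural", 0.55),
    ("definitional", 0.90), ("comparative", 0.40),
]
flagged = low_scoring_types(records, threshold=0.7, min_share=0.4)
print(flagged)
```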
---

### Scenario 4) “context_precision is low” (reduce noisy context)
- Trigger: `context_precision` is low and the contexts are long or numerous
- Execution (intent): `verify_retrieval` → (to compare alternatives) `compare_search`
- Artifacts to check:
  - `retrieval_analyzer.json` summary (context count, tokens, empty-context rate, keyword overlap)
  - `compare_search` results (when comparing hybrid approaches)
- Interpretation criteria:
  - If keyword overlap is low and the context token count is high: a “long but irrelevant” pattern → prioritize **reranking/filtering**.
- Next action:
  - Rather than blindly increasing top_k, first remove unnecessary context (to secure precision), then re-evaluate.

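
The "long but irrelevant" pattern can be approximated as follows. This is a rough sketch under stated assumptions: tokens are split on whitespace (Korean text would need a morpheme tokenizer instead), overlap is the share of question tokens found in the context, and the cutoffs `min_overlap` and `max_tokens` are hypothetical. How EvalVault itself computes keyword overlap is not shown in this guide.

```python
def keyword_overlap(question, context):
    """Share of distinct question tokens that also appear in the context."""
    q_tokens = set(question.lower().split())
    c_tokens = set(context.lower().split())
    return len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0


def is_long_but_irrelevant(question, contexts, min_overlap=0.2, max_tokens=200):
    """True when the retrieved contexts are large but share few question keywords."""
    joined = " ".join(contexts)
    token_count = len(joined.split())
    return keyword_overlap(question, joined) < min_overlap and token_count > max_tokens


# Hypothetical examples.
noisy = ["shipping " * 300]
focused = ["the refund policy deadline is 30 days"]
print(is_long_but_irrelevant("refund policy deadline", noisy))
print(is_long_but_irrelevant("refund policy deadline", focused))
```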
---

### Scenario 5) “context_recall is low” (the evidence that should be found is not retrieved)
- Trigger: `context_recall` is low and the dataset includes ground_truth
- Execution (intent): `verify_retrieval` + `benchmark_retrieval`
- Artifacts to check:
  - `ground_truth_hit_rate` in `retrieval_quality_checker.json`
  - `retrieval_benchmark.json` (benchmark results)
- Interpretation criteria:
  - If `ground_truth_hit_rate` is low: a recall bottleneck in the query, chunking, or indexing stage is likely.
- Next action:
  - Run chunking-strategy, query-expansion, and search-method (hybrid) experiments separately, one at a time.

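
To make the idea of a ground-truth hit rate concrete, the sketch below counts a test case as a hit when its ground-truth string appears in at least one retrieved context. The substring match and the function name are simplifying assumptions; the guide does not state how `ground_truth_hit_rate` is actually computed.

```python
def ground_truth_hit_rate(cases):
    """Fraction of cases whose ground truth appears in any retrieved context.

    cases: list of (ground_truth, contexts) pairs (hypothetical shape).
    A case-insensitive substring match stands in for the real hit criterion.
    """
    if not cases:
        return 0.0
    hits = sum(
        1
        for ground_truth, contexts in cases
        if any(ground_truth.lower() in ctx.lower() for ctx in contexts)
    )
    return hits / len(cases)


# Hypothetical cases.
cases = [
    ("30 days", ["Refunds are accepted within 30 days."]),
    ("free shipping", ["Orders ship in 2 days."]),
]
rate = ground_truth_hit_rate(cases)
print(rate)
```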
---

### Scenario 6) “Embedding-based metrics are unstable or look wrong” (check the backend/distribution)
- Trigger: embedding-based results such as `semantic_similarity` are unstable, or NaN values/failures are frequent
- Execution (intent): `verify_embedding`
- Artifacts to check:
  - `embedding_analyzer.json` summary (backend/model/dimension/avg_norm/norm_std/mean_cosine_to_centroid)
  - `embedding_distribution.json` (distribution check results)
- Interpretation criteria:
  - If `norm_std` is excessively low or `mean_cosine_to_centroid` is excessively high: the embeddings may have collapsed toward a single direction or clustered tightly.
  - If there are backend errors: stabilize the environment/model before interpreting embedding metrics.
- Next action:
  - Pin the embedding backend/model (via profile/settings) and re-evaluate, to eliminate variability first.

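
For intuition about the summary fields above, the sketch below computes them from a small set of vectors using the usual definitions (L2 norm, population standard deviation of norms, cosine similarity to the mean vector). Whether EvalVault computes `avg_norm`, `norm_std`, and `mean_cosine_to_centroid` in exactly this way is an assumption. A mean cosine close to 1.0 means nearly all vectors point the same way, which is the collapse symptom described above.

```python
import math
import statistics


def embedding_stats(vectors):
    """Summarize norm spread and directional concentration of embeddings."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    centroid_norm = math.sqrt(sum(x * x for x in centroid))

    # Skip zero-length vectors (and a zero centroid) to avoid division by zero.
    cosines = [
        sum(a * b for a, b in zip(v, centroid)) / (n * centroid_norm)
        for v, n in zip(vectors, norms)
        if n > 0 and centroid_norm > 0
    ]
    return {
        "avg_norm": statistics.fmean(norms),
        "norm_std": statistics.pstdev(norms),
        "mean_cosine_to_centroid": statistics.fmean(cosines) if cosines else float("nan"),
    }


# Hypothetical vectors: one nearly collapsed set, one spread-out set.
collapsed = embedding_stats([[1.0, 0.0], [1.0, 0.01], [1.0, -0.01]])
spread = embedding_stats([[1.0, 0.0], [0.0, 1.0]])
print(collapsed["mean_cosine_to_centroid"], spread["mean_cosine_to_centroid"])
```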
---

### Scenario 7) “The diagnosis itself is hard to trust for Korean” (morpheme/tokenizer verification)
- Trigger: keywords or retrieval results for Korean questions/contexts look unnatural, or analysis quality is suspected to be low
- Execution (intent): `verify_morpheme`
- Artifacts to check:
  - `tokenizer_backend` (e.g. kiwi) and the token/vocabulary size checks in `morpheme_quality_checker.json`
- Interpretation criteria:
  - If the morpheme quality check fails: the reliability of keyword-, retrieval-, and classification-based analysis results may drop along with it.
- Next action:
  - First restore the Korean extra and the tokenizer backend to a working state, then rerun the NLP/retrieval analyses.

---

### Scenario 8) “Performance regressed after a release” (narrow down by time and change point)
- Trigger: the performance of recent runs is trending downward
- Execution (intents):
  - 1) `analyze_trends` (find when the decline started)
  - 2) `generate_comparison` (select representative runs A/B, then produce a change and comparison report)
- Execution (CLI example):
  - `uv run evalvault pipeline analyze "추세 분석" --db data/db/evalvault.db` (query: "trend analysis")
  - `uv run evalvault analyze-compare <RUN_A> <RUN_B> --db data/db/evalvault.db --test t-test|mann-whitney`
- Artifacts to check:
  - `trend_detector.json` (trend detection results)
  - `run_change_detector.json` (dataset/config/prompt changes)
  - `comparison_<A>_<B>.md` (comparison report)
- Interpretation criteria:
  - If change detection shows that the dataset changed, the comparison can be distorted, so secure **identical dataset conditions** first.
- Next action:
  - Realign the experiment design so that changes converge to a single axis (prompt/model/retrieval).

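
Choosing the representative A/B pair can be as simple as finding the steepest decline between consecutive runs. The sketch below illustrates this on a hypothetical score history; the function name and the `(run_id, score)` shape are assumptions and do not reflect the format of `trend_detector.json`. The two run IDs it returns are natural candidates for `<RUN_A>` and `<RUN_B>` in `analyze-compare`.

```python
def largest_drop(history):
    """Find the steepest decline between consecutive runs.

    history: list of (run_id, score) in chronological order (hypothetical shape).
    Returns (previous_run, next_run, delta) or None if the score never declines.
    """
    drops = [
        (history[i - 1][0], history[i][0], history[i][1] - history[i - 1][1])
        for i in range(1, len(history))
    ]
    drops = [d for d in drops if d[2] < 0]
    return min(drops, key=lambda d: d[2]) if drops else None


# Hypothetical run history.
history = [("r1", 0.82), ("r2", 0.81), ("r3", 0.70), ("r4", 0.69)]
drop = largest_drop(history)
print(drop)
```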
---

### Scenario 9) “Stage bottleneck / missing traces” (execution-stage diagnosis)
- Trigger: response latency/timeouts, missing stage metrics, no Phoenix/Langfuse link
- Execution (CLI):
  - `uv run evalvault stage compute-metrics <RUN_ID> --db data/db/evalvault.db`
  - `uv run evalvault debug report <RUN_ID> --db data/db/evalvault.db`
- Artifacts to check:
  - The stage summary, bottlenecks, recommendations, and failing metrics in the debug report
  - If a trace link (phoenix/langfuse) exists, inspect the span flow for that run
- Interpretation criteria:
  - If bottlenecks concentrate in a particular stage, prioritize improving that stage (retrieval/generation/post-processing)
  - If there is no trace link, check the tracing configuration/environment variables first
- Next action:
  - Check `PHOENIX_ENABLED`, `PHOENIX_ENDPOINT`, and the Open RAG Trace instrumentation (adapters/decorators)

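
A quick way to rule out the simplest cause of missing trace links is to confirm that the two variables named above are set at all. The sketch below only checks for presence; which values EvalVault accepts for `PHOENIX_ENABLED` is not stated in this guide, so no value validation is attempted, and the helper name is hypothetical.

```python
import os


def tracing_env_report(env=None):
    """Report which tracing variables are set to a non-empty value.

    env: mapping to inspect; defaults to the current process environment.
    """
    env = os.environ if env is None else env
    names = ("PHOENIX_ENABLED", "PHOENIX_ENDPOINT")
    return {name: bool(env.get(name, "").strip()) for name in names}


for name, is_set in tracing_env_report().items():
    print(f"{name}: {'set' if is_set else 'MISSING'}")
```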
---

### Scenario 10) “The A/B improvement is ambiguous” (strengthen the judgment from a significance/noise perspective)
- Trigger: there is a mean difference, but the conclusion is unstable (few samples or high variance)
- Execution (currently provided flow):
  - `uv run evalvault analyze-compare <RUN_A> <RUN_B> --db data/db/evalvault.db --test t-test|mann-whitney`
  - Pipeline comparison report: `AnalysisIntent.GENERATE_COMPARISON` (used internally)
- Reinforcement (reliability diagnosis frame):
  - Apply the noise decomposition, confidence interval, and sample-size (N, K) recommendation concepts from `docs/guides/PRD_LENA.md` to judge whether additional samples are needed.
- Interpretation criteria (operating rule):
  - If the effect is small and the sample is small: expanding **N (number of items) or K (number of repetitions)** takes priority over drawing a conclusion.
- Next action:
  - Keep identical dataset conditions, and plan to increase N/K in the direction that yields the most benefit relative to evaluation cost.

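
The "do we need more samples?" question can be illustrated with a paired bootstrap confidence interval on the per-item score difference. This is a generic statistical sketch, not the method defined in `docs/guides/PRD_LENA.md` (which is not reproduced here); the function names, the resample count, and the sample scores are hypothetical. When the interval contains zero, the operating rule above says to enlarge N or K rather than declare a winner.

```python
import random
import statistics


def bootstrap_diff_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(B - A) over paired per-item scores.

    Both lists must come from the same dataset in the same item order.
    """
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


def needs_more_samples(ci):
    """True when the interval contains zero, i.e. the direction is not settled."""
    lower, upper = ci
    return lower <= 0.0 <= upper


# Hypothetical per-item scores for runs A and B.
clear_ci = bootstrap_diff_ci([0.5] * 20, [0.7] * 20)
noisy_ci = bootstrap_diff_ci([0.5, 0.6, 0.7, 0.4], [0.6, 0.5, 0.6, 0.5])
print(clear_ci, needs_more_samples(clear_ci))
print(noisy_ci, needs_more_samples(noisy_ci))
```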
---

### Scenario 11) “Automatic metrics diverge from user satisfaction” (human-feedback calibration loop)
- Trigger: stakeholders do not trust RAGAS scores as a KPI
- Applied frame:
  - Connect the procedure in `docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md` (representative sampling → human evaluation → calibration model → apply to all → iterative improvement) as an operating loop.
- Operational interpretation criteria:
  - Keep automatic metrics as a “reproducible signal”, and manage alignment with satisfaction through the calibration loop.
- Next action:
  - Prioritize “mismatch cases” (automatic metric high but satisfaction low, or the reverse) as labeling targets.

---

## 6) CLI / Web UI execution paths (cheat sheet)

### 6.1 CLI (fastest start)
- Evaluation + automatic analysis:
  - `uv run evalvault run <DATASET> --metrics <M1,M2,...> --db data/db/evalvault.db --auto-analyze`
- Detailed single-run analysis (combinable options):
  - `uv run evalvault analyze <RUN_ID> --db data/db/evalvault.db --nlp --causal --playbook --enable-llm`
  - (optional) `--dashboard`, `--anomaly-detect`, `--forecast`, `--network`, `--generate-hypothesis`
- A/B comparison (pass one of `t-test` or `mann-whitney` to `--test`):
  - `uv run evalvault analyze-compare <RUN_A> <RUN_B> --db data/db/evalvault.db --test t-test|mann-whitney`
- Stage/debug diagnosis:
  - `uv run evalvault stage compute-metrics <RUN_ID> --db data/db/evalvault.db`
  - `uv run evalvault debug report <RUN_ID> --db data/db/evalvault.db`
- Query-based pipeline:
  - `uv run evalvault pipeline analyze "<natural-language query>" --run-id <RUN_ID> --db data/db/evalvault.db`
- Pipeline visibility:
  - `uv run evalvault pipeline intents`
  - `uv run evalvault pipeline templates`

### 6.2 Web UI (menu-driven operation)
- Run:
  - API: `uv run evalvault serve-api --reload`
  - Frontend: `cd frontend && npm install && npm run dev`
- Menu structure (per the migration plan):
  - Basic statistics / Time series (anomaly, forecast) / Structure and cause (causal, network) / Intelligent (hypothesis, playbook) / Comparison

---

## 7) Outputs/artifacts (what to look at, and where)

### 7.1 “Summary report” vs “artifact”
- Summary reports (`analysis_<RUN_ID>.md`, `comparison_<A>_<B>.md`): conclusions and summaries for decision-making
- Artifacts (`artifacts/.../<node_id>.json`): the **raw evidence** (needed for reproduction, debugging, and automation)

### 7.2 Using the artifact index
- `reports/analysis/artifacts/analysis_<RUN_ID>/index.json` contains the per-node result file paths in structured form.
- Operating principle: fix the order “read the conclusion in the report → find the supporting node via the index → verify in the node JSON”.

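
The "index → node JSON" step can be scripted. Because the exact schema of `index.json` is not documented in this guide, the sketch below makes no assumption about its keys: it walks whatever structure it finds and collects every string value ending in `.json`, then filters by file stem. The helper names are hypothetical. For example, `find_node_artifact(index_path, "root_cause_analyzer")` would return the path(s) to open next, given a real index file.

```python
import json
from pathlib import Path


def collect_json_paths(node):
    """Recursively collect string values ending in .json from parsed JSON."""
    if isinstance(node, str):
        return [node] if node.endswith(".json") else []
    if isinstance(node, dict):
        return [p for value in node.values() for p in collect_json_paths(value)]
    if isinstance(node, list):
        return [p for value in node for p in collect_json_paths(value)]
    return []


def find_node_artifact(index_path, node_id):
    """Return artifact paths from the index whose file stem equals node_id."""
    index = json.loads(Path(index_path).read_text(encoding="utf-8"))
    return [p for p in collect_json_paths(index) if Path(p).stem == node_id]


# Hypothetical index content, used only to exercise the traversal.
sample_index = {
    "dir": "artifacts/analysis_demo",
    "nodes": [
        {"id": "root_cause_analyzer", "path": "artifacts/analysis_demo/root_cause_analyzer.json"},
        {"id": "diagnostic_playbook", "path": "artifacts/analysis_demo/diagnostic_playbook.json"},
    ],
}
paths = collect_json_paths(sample_index)
print(paths)
```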
---

## 8) Interpretation criteria / cautions (preventing misjudgment)

### 8.1 Cautions for comparison analysis
- A/B comparisons are safe to interpret only when performed under **identical dataset conditions**.
- If `run_change_detector` finds many dataset/config/prompt changes, reduce the number of changed axes before drawing a conclusion.

### 8.2 Cautions for metric interpretation
- `thresholds` are part of the dataset; do not assume a single universal criterion such as “a score of 0.8 always passes”.
- Embedding-based metrics are sensitive to the state of the embedding backend/model, so confirm environment stability with `verify_embedding` first.

### 8.3 Korean-specific cautions
- If morpheme/tokenization quality is low, keyword- and retrieval-based analyses can be distorted.
- Do not over-trust NLP/retrieval results while the `verify_morpheme` result is failing.

---

## 9) Iterative improvement loop (operating standard)

### 9.1 Loop (fixed procedure)
1. **Establish a baseline run**: `evalvault run ... --db ... --auto-analyze`
2. **Classify the problem**: select an intent with the decision tree (first-pass diagnosis)
3. **Check the evidence**: artifact index → inspect the key node JSON
4. **Choose one hypothesis/action**: change only one axis at a time
5. **Rerun (run)**: keep the same dataset/metrics
6. **Compare (analyze-compare)**: check the direction of change, significance, and change points
7. **Record/share**: keep the comparison report as a “decision record”

### 9.2 The “one at a time” principle (experiment design)
- Changing several factors at once (prompt + model + retrieval) makes the cause impossible to trace.
- If the goal is root-cause analysis, minimize changes; if the goal is improvement, make the changes but “document the change points in the comparison report”.

---

## 10) Quality checklist (diagnosis completion criteria)

### 10.1 Completeness of the diagnosis
- [ ] Is the problem (symptom) clearly defined in terms of metric/segment/scope?
- [ ] Is the selected intent/module directly connected to the problem (does a supporting node exist)?
- [ ] Can the report's conclusions be traced to artifacts (node JSON)?

### 10.2 Reproducibility
- [ ] Are the `DB path`, `run_id`, `metrics`, and `profile` recorded?
- [ ] Are the output paths (`reports/...`) organized by run_id?
- [ ] Were identical dataset conditions confirmed for comparisons?

### 10.3 Execution stability (environment)
- [ ] Are the `analysis/timeseries/dashboard` extras installed so that the required features can run?
- [ ] Has the embedding/Korean tokenizer environment been verified with `verify_embedding/verify_morpheme`?

### 10.4 Action quality
- [ ] Is the next experiment defined as “a single change axis”?
- [ ] Are the success/failure criteria defined by thresholds and the comparison report?

---

## Appendix A) Quick mapping (“what to use for which problem”)

| Intent | Typical question (asked by the operator) | Key modules (module_id) |
|---|---|---|
| `analyze_low_metrics` | “Why are the scores low? What should I change right now?” | `ragas_evaluator`, `diagnostic_playbook`, `root_cause_analyzer`, `llm_report` |
| `verify_retrieval` | “Is retrieval the problem?” | `retrieval_analyzer`, `retrieval_quality_checker` |
| `verify_embedding` | “Are the embeddings healthy?” | `embedding_analyzer`, `embedding_distribution` |
| `verify_morpheme` | “Is Korean tokenization working correctly?” | `morpheme_analyzer`, `morpheme_quality_checker` |
| `generate_comparison` | “What changed between A and B, and what is significant?” | `run_metric_comparator`, `run_change_detector`, `llm_report` |
| `analyze_trends` | “When did it start getting worse?” | `time_series_analyzer`, `trend_detector` |
| `generate_hypotheses` | “Can hypotheses for the next experiment be generated automatically?” | `hypothesis_generator` |
| `analyze_network` | “How did the metric structure (coupling) change?” | `network_analyzer` |

> Stage/debug diagnosis runs via `evalvault stage` and `evalvault debug report`, without intent classification.
|