evalvault 1.62.0__tar.gz → 1.62.1__tar.gz

Files changed (853)
  1. {evalvault-1.62.0 → evalvault-1.62.1}/PKG-INFO +1 -1
  2. {evalvault-1.62.0 → evalvault-1.62.1}/docs/INDEX.md +4 -0
  3. evalvault-1.62.1/docs/guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md +426 -0
  4. evalvault-1.62.1/docs/guides/LENA_RAGAS_CALIBRATION_DEV_PLAN.md +428 -0
  5. evalvault-1.62.1/docs/guides/PRD_LENA.md +637 -0
  6. evalvault-1.62.1/docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md +171 -0
  7. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/package-lock.json +14 -0
  8. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/package.json +1 -0
  9. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/AnalysisNodeOutputs.tsx +26 -20
  10. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/PrioritySummaryPanel.tsx +4 -3
  11. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/SpacePlot2D.tsx +27 -21
  12. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/SpacePlot3D.tsx +4 -2
  13. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/StatusBadge.tsx +1 -1
  14. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/hooks/useInsightSpace.ts +6 -8
  15. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/AnalysisCompareView.tsx +7 -3
  16. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/AnalysisLab.tsx +25 -22
  17. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/AnalysisResultView.tsx +3 -3
  18. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/CompareRuns.tsx +8 -2
  19. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/ComprehensiveAnalysis.tsx +24 -28
  20. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/CustomerReport.tsx +6 -2
  21. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/Dashboard.tsx +6 -2
  22. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/EvaluationStudio.tsx +3 -3
  23. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/RunDetails.tsx +15 -9
  24. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/Settings.tsx +3 -2
  25. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/Visualization.tsx +28 -9
  26. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/services/api.ts +3 -3
  27. {evalvault-1.62.0 → evalvault-1.62.1}/mkdocs.yml +1 -0
  28. {evalvault-1.62.0 → evalvault-1.62.1}/pyproject.toml +1 -1
  29. {evalvault-1.62.0 → evalvault-1.62.1}/uv.lock +1 -1
  30. evalvault-1.62.0/docs/guides/RAG_HUMAN_FEEDBACK_CALIBRATION.md +0 -298
  31. {evalvault-1.62.0 → evalvault-1.62.1}/.cursor/worktrees.json +0 -0
  32. {evalvault-1.62.0 → evalvault-1.62.1}/.dockerignore +0 -0
  33. {evalvault-1.62.0 → evalvault-1.62.1}/.env.example +0 -0
  34. {evalvault-1.62.0 → evalvault-1.62.1}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  35. {evalvault-1.62.0 → evalvault-1.62.1}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  36. {evalvault-1.62.0 → evalvault-1.62.1}/.github/ISSUE_TEMPLATE/question.md +0 -0
  37. {evalvault-1.62.0 → evalvault-1.62.1}/.github/dependabot.yml +0 -0
  38. {evalvault-1.62.0 → evalvault-1.62.1}/.github/pull_request_template.md +0 -0
  39. {evalvault-1.62.0 → evalvault-1.62.1}/.github/stale.yml +0 -0
  40. {evalvault-1.62.0 → evalvault-1.62.1}/.github/workflows/ci.yml +0 -0
  41. {evalvault-1.62.0 → evalvault-1.62.1}/.github/workflows/release.yml +0 -0
  42. {evalvault-1.62.0 → evalvault-1.62.1}/.github/workflows/stale.yml +0 -0
  43. {evalvault-1.62.0 → evalvault-1.62.1}/.gitignore +0 -0
  44. {evalvault-1.62.0 → evalvault-1.62.1}/.pre-commit-config.yaml +0 -0
  45. {evalvault-1.62.0 → evalvault-1.62.1}/.python-version +0 -0
  46. {evalvault-1.62.0 → evalvault-1.62.1}/AGENTS.md +0 -0
  47. {evalvault-1.62.0 → evalvault-1.62.1}/CHANGELOG.md +0 -0
  48. {evalvault-1.62.0 → evalvault-1.62.1}/CLAUDE.md +0 -0
  49. {evalvault-1.62.0 → evalvault-1.62.1}/CODE_OF_CONDUCT.md +0 -0
  50. {evalvault-1.62.0 → evalvault-1.62.1}/CONTRIBUTING.md +0 -0
  51. {evalvault-1.62.0 → evalvault-1.62.1}/Dockerfile +0 -0
  52. {evalvault-1.62.0 → evalvault-1.62.1}/LICENSE.md +0 -0
  53. {evalvault-1.62.0 → evalvault-1.62.1}/README.en.md +0 -0
  54. {evalvault-1.62.0 → evalvault-1.62.1}/README.md +0 -0
  55. {evalvault-1.62.0 → evalvault-1.62.1}/SECURITY.md +0 -0
  56. {evalvault-1.62.0 → evalvault-1.62.1}/agent/README.md +0 -0
  57. {evalvault-1.62.0 → evalvault-1.62.1}/agent/agent.py +0 -0
  58. {evalvault-1.62.0 → evalvault-1.62.1}/agent/client.py +0 -0
  59. {evalvault-1.62.0 → evalvault-1.62.1}/agent/config.py +0 -0
  60. {evalvault-1.62.0 → evalvault-1.62.1}/agent/main.py +0 -0
  61. {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/README.md +0 -0
  62. {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/shared/decisions.md +0 -0
  63. {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/shared/dependencies.md +0 -0
  64. {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/templates/coordinator_guide.md +0 -0
  65. {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory/templates/work_log_template.md +0 -0
  66. {evalvault-1.62.0 → evalvault-1.62.1}/agent/memory_integration.py +0 -0
  67. {evalvault-1.62.0 → evalvault-1.62.1}/agent/progress.py +0 -0
  68. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/app_spec.txt +0 -0
  69. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/baseline.txt +0 -0
  70. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/coding_prompt.md +0 -0
  71. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/existing_project_prompt.md +0 -0
  72. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/improvement/architecture_prompt.md +0 -0
  73. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/improvement/base_prompt.md +0 -0
  74. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/improvement/coordinator_prompt.md +0 -0
  75. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/improvement/observability_prompt.md +0 -0
  76. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/initializer_prompt.md +0 -0
  77. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/prompt_manifest.json +0 -0
  78. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts/system.txt +0 -0
  79. {evalvault-1.62.0 → evalvault-1.62.1}/agent/prompts.py +0 -0
  80. {evalvault-1.62.0 → evalvault-1.62.1}/agent/requirements.txt +0 -0
  81. {evalvault-1.62.0 → evalvault-1.62.1}/agent/security.py +0 -0
  82. {evalvault-1.62.0 → evalvault-1.62.1}/config/domains/insurance/memory.yaml +0 -0
  83. {evalvault-1.62.0 → evalvault-1.62.1}/config/domains/insurance/terms_dictionary_en.json +0 -0
  84. {evalvault-1.62.0 → evalvault-1.62.1}/config/domains/insurance/terms_dictionary_ko.json +0 -0
  85. {evalvault-1.62.0 → evalvault-1.62.1}/config/methods.yaml +0 -0
  86. {evalvault-1.62.0 → evalvault-1.62.1}/config/models.yaml +0 -0
  87. {evalvault-1.62.0 → evalvault-1.62.1}/config/regressions/default.json +0 -0
  88. {evalvault-1.62.0 → evalvault-1.62.1}/config/regressions/ux.json +0 -0
  89. {evalvault-1.62.0 → evalvault-1.62.1}/config/stage_metric_playbook.yaml +0 -0
  90. {evalvault-1.62.0 → evalvault-1.62.1}/config/stage_metric_thresholds.json +0 -0
  91. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/dummy_test_dataset.json +0 -0
  92. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/insurance_qa_korean.csv +0 -0
  93. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/insurance_qa_korean.json +0 -0
  94. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/insurance_qa_korean_2.json +0 -0
  95. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/insurance_qa_korean_3.json +0 -0
  96. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/sample.json +0 -0
  97. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/visualization_20q_cluster_map.csv +0 -0
  98. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/visualization_20q_korean.json +0 -0
  99. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/visualization_2q_cluster_map.csv +0 -0
  100. {evalvault-1.62.0 → evalvault-1.62.1}/data/datasets/visualization_2q_korean.json +0 -0
  101. {evalvault-1.62.0 → evalvault-1.62.1}/data/kg/knowledge_graph.json +0 -0
  102. {evalvault-1.62.0 → evalvault-1.62.1}/data/raw/The Complete Guide to Mastering Suno Advanced Strategies for Professional Music Generation.md +0 -0
  103. {evalvault-1.62.0 → evalvault-1.62.1}/data/raw/edge_cases.json +0 -0
  104. {evalvault-1.62.0 → evalvault-1.62.1}/data/raw/run_mode_full_domain_memory.json +0 -0
  105. {evalvault-1.62.0 → evalvault-1.62.1}/data/raw/sample_rag_knowledge.txt +0 -0
  106. {evalvault-1.62.0 → evalvault-1.62.1}/dataset_templates/dataset_template.csv +0 -0
  107. {evalvault-1.62.0 → evalvault-1.62.1}/dataset_templates/dataset_template.json +0 -0
  108. {evalvault-1.62.0 → evalvault-1.62.1}/dataset_templates/dataset_template.xlsx +0 -0
  109. {evalvault-1.62.0 → evalvault-1.62.1}/dataset_templates/method_input_template.json +0 -0
  110. {evalvault-1.62.0 → evalvault-1.62.1}/docker-compose.langfuse.yml +0 -0
  111. {evalvault-1.62.0 → evalvault-1.62.1}/docker-compose.phoenix.yaml +0 -0
  112. {evalvault-1.62.0 → evalvault-1.62.1}/docker-compose.yml +0 -0
  113. {evalvault-1.62.0 → evalvault-1.62.1}/docs/README.ko.md +0 -0
  114. {evalvault-1.62.0 → evalvault-1.62.1}/docs/ROADMAP.md +0 -0
  115. {evalvault-1.62.0 → evalvault-1.62.1}/docs/STATUS.md +0 -0
  116. {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/adapters/inbound.md +0 -0
  117. {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/adapters/outbound.md +0 -0
  118. {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/config.md +0 -0
  119. {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/domain/entities.md +0 -0
  120. {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/domain/metrics.md +0 -0
  121. {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/domain/services.md +0 -0
  122. {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/ports/inbound.md +0 -0
  123. {evalvault-1.62.0 → evalvault-1.62.1}/docs/api/ports/outbound.md +0 -0
  124. {evalvault-1.62.0 → evalvault-1.62.1}/docs/architecture/open-rag-trace-collector.md +0 -0
  125. {evalvault-1.62.0 → evalvault-1.62.1}/docs/architecture/open-rag-trace-spec.md +0 -0
  126. {evalvault-1.62.0 → evalvault-1.62.1}/docs/getting-started/INSTALLATION.md +0 -0
  127. {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/AGENTS_SYSTEM_GUIDE.md +0 -0
  128. {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/CLI_MCP_PLAN.md +0 -0
  129. {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/DEV_GUIDE.md +0 -0
  130. {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/OPEN_RAG_TRACE_INTERNAL_ADAPTER.md +0 -0
  131. {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/OPEN_RAG_TRACE_SAMPLES.md +0 -0
  132. {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/RELEASE_CHECKLIST.md +0 -0
  133. {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/USER_GUIDE.md +0 -0
  134. {evalvault-1.62.0 → evalvault-1.62.1}/docs/guides/rag_human_feedback_calibration_implementation_plan.md +0 -0
  135. {evalvault-1.62.0 → evalvault-1.62.1}/docs/mapping/component-to-whitepaper.yaml +0 -0
  136. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/00_frontmatter.md +0 -0
  137. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/01_overview.md +0 -0
  138. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/02_architecture.md +0 -0
  139. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/03_data_flow.md +0 -0
  140. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/04_components.md +0 -0
  141. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/05_expert_lenses.md +0 -0
  142. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/06_implementation.md +0 -0
  143. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/07_advanced.md +0 -0
  144. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/08_customization.md +0 -0
  145. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/09_quality.md +0 -0
  146. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/10_performance.md +0 -0
  147. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/11_security.md +0 -0
  148. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/12_operations.md +0 -0
  149. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/13_standards.md +0 -0
  150. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/14_roadmap.md +0 -0
  151. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/INDEX.md +0 -0
  152. {evalvault-1.62.0 → evalvault-1.62.1}/docs/new_whitepaper/STYLE_GUIDE.md +0 -0
  153. {evalvault-1.62.0 → evalvault-1.62.1}/docs/stylesheets/extra.css +0 -0
  154. {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/dataset_template.csv +0 -0
  155. {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/dataset_template.json +0 -0
  156. {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/dataset_template.xlsx +0 -0
  157. {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/kg_template.json +0 -0
  158. {evalvault-1.62.0 → evalvault-1.62.1}/docs/templates/retriever_docs_template.json +0 -0
  159. {evalvault-1.62.0 → evalvault-1.62.1}/docs/tools/generate-whitepaper.py +0 -0
  160. {evalvault-1.62.0 → evalvault-1.62.1}/docs/web_ui_analysis_migration_plan.md +0 -0
  161. {evalvault-1.62.0 → evalvault-1.62.1}/dummy_test_dataset.json +0 -0
  162. {evalvault-1.62.0 → evalvault-1.62.1}/examples/README.md +0 -0
  163. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/README.md +0 -0
  164. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/korean_rag/faithfulness_test.json +0 -0
  165. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/korean_rag/insurance_qa_100.json +0 -0
  166. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/korean_rag/keyword_extraction_test.json +0 -0
  167. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/korean_rag/retrieval_test.json +0 -0
  168. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/comparison.json +0 -0
  169. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/full_results.json +0 -0
  170. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/leaderboard.json +0 -0
  171. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/results_mteb.json +0 -0
  172. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/output/retrieval_result.json +0 -0
  173. {evalvault-1.62.0 → evalvault-1.62.1}/examples/benchmarks/run_korean_benchmark.py +0 -0
  174. {evalvault-1.62.0 → evalvault-1.62.1}/examples/kg_generator_demo.py +0 -0
  175. {evalvault-1.62.0 → evalvault-1.62.1}/examples/method_plugin_template/README.md +0 -0
  176. {evalvault-1.62.0 → evalvault-1.62.1}/examples/method_plugin_template/pyproject.toml +0 -0
  177. {evalvault-1.62.0 → evalvault-1.62.1}/examples/method_plugin_template/src/method_plugin_template/__init__.py +0 -0
  178. {evalvault-1.62.0 → evalvault-1.62.1}/examples/method_plugin_template/src/method_plugin_template/methods.py +0 -0
  179. {evalvault-1.62.0 → evalvault-1.62.1}/examples/stage_events.jsonl +0 -0
  180. {evalvault-1.62.0 → evalvault-1.62.1}/examples/usecase/comprehensive_workflow_test.py +0 -0
  181. {evalvault-1.62.0 → evalvault-1.62.1}/examples/usecase/insurance_eval_dataset.json +0 -0
  182. {evalvault-1.62.0 → evalvault-1.62.1}/examples/usecase/output/comprehensive_report.html +0 -0
  183. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/.env.example +0 -0
  184. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/.gitignore +0 -0
  185. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/README.md +0 -0
  186. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/analysis-compare.spec.ts +0 -0
  187. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/analysis-lab.spec.ts +0 -0
  188. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/compare-runs.spec.ts +0 -0
  189. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/dashboard.spec.ts +0 -0
  190. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/domain-memory.spec.ts +0 -0
  191. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/evaluation-studio.spec.ts +0 -0
  192. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/knowledge-base.spec.ts +0 -0
  193. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/mocks/intents.json +0 -0
  194. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/mocks/run_details.json +0 -0
  195. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/mocks/runs.json +0 -0
  196. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/e2e/run-details.spec.ts +0 -0
  197. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/eslint.config.js +0 -0
  198. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/index.html +0 -0
  199. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/playwright.config.ts +0 -0
  200. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/public/vite.svg +0 -0
  201. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/App.css +0 -0
  202. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/App.tsx +0 -0
  203. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/assets/react.svg +0 -0
  204. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/InsightSpacePanel.tsx +0 -0
  205. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/Layout.tsx +0 -0
  206. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/MarkdownContent.tsx +0 -0
  207. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/SpaceLegend.tsx +0 -0
  208. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/components/VirtualizedText.tsx +0 -0
  209. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/config/ui.ts +0 -0
  210. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/config.ts +0 -0
  211. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/index.css +0 -0
  212. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/main.tsx +0 -0
  213. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/DomainMemory.tsx +0 -0
  214. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/KnowledgeBase.tsx +0 -0
  215. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/pages/VisualizationHome.tsx +0 -0
  216. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/types/plotly.d.ts +0 -0
  217. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/format.ts +0 -0
  218. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/phoenix.ts +0 -0
  219. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/runAnalytics.ts +0 -0
  220. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/score.ts +0 -0
  221. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/src/utils/summaryMetrics.ts +0 -0
  222. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/tailwind.config.js +0 -0
  223. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/tsconfig.app.json +0 -0
  224. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/tsconfig.json +0 -0
  225. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/tsconfig.node.json +0 -0
  226. {evalvault-1.62.0 → evalvault-1.62.1}/frontend/vite.config.ts +0 -0
  227. {evalvault-1.62.0 → evalvault-1.62.1}/package-lock.json +0 -0
  228. {evalvault-1.62.0 → evalvault-1.62.1}/reports/.gitkeep +0 -0
  229. {evalvault-1.62.0 → evalvault-1.62.1}/reports/README.md +0 -0
  230. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_0aa9fab0-6c2c-4c1c-b228-202a38a2f00c.json +0 -0
  231. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_0aa9fab0-6c2c-4c1c-b228-202a38a2f00c.md +0 -0
  232. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_2163f844-ee2c-4630-9ba8-35cd9954d92e.json +0 -0
  233. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_2163f844-ee2c-4630-9ba8-35cd9954d92e.md +0 -0
  234. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_4516d358-2797-4c46-9f14-c1d975588025.json +0 -0
  235. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_4516d358-2797-4c46-9f14-c1d975588025.md +0 -0
  236. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb.json +0 -0
  237. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb.md +0 -0
  238. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5.json +0 -0
  239. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5.md +0 -0
  240. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_9fbf4776-9f5b-4c4b-ba08-c556032cee86.json +0 -0
  241. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_9fbf4776-9f5b-4c4b-ba08-c556032cee86.md +0 -0
  242. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775.json +0 -0
  243. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775.md +0 -0
  244. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e.json +0 -0
  245. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e.md +0 -0
  246. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/causal_analysis.json +0 -0
  247. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/diagnostic.json +0 -0
  248. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/final_output.json +0 -0
  249. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/index.json +0 -0
  250. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/load_data.json +0 -0
  251. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/load_runs.json +0 -0
  252. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/low_samples.json +0 -0
  253. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/nlp_analysis.json +0 -0
  254. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/pattern_detection.json +0 -0
  255. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/priority_summary.json +0 -0
  256. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/ragas_eval.json +0 -0
  257. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/report.json +0 -0
  258. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/root_cause.json +0 -0
  259. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/statistics.json +0 -0
  260. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/time_series.json +0 -0
  261. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/trend_detection.json +0 -0
  262. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/causal_analysis.json +0 -0
  263. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/diagnostic.json +0 -0
  264. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/final_output.json +0 -0
  265. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/index.json +0 -0
  266. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/load_data.json +0 -0
  267. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/load_runs.json +0 -0
  268. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/low_samples.json +0 -0
  269. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/nlp_analysis.json +0 -0
  270. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/pattern_detection.json +0 -0
  271. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/priority_summary.json +0 -0
  272. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/ragas_eval.json +0 -0
  273. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/report.json +0 -0
  274. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/root_cause.json +0 -0
  275. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/statistics.json +0 -0
  276. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/time_series.json +0 -0
  277. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/trend_detection.json +0 -0
  278. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/causal_analysis.json +0 -0
  279. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/diagnostic.json +0 -0
  280. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/final_output.json +0 -0
  281. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/index.json +0 -0
  282. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/load_data.json +0 -0
  283. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/load_runs.json +0 -0
  284. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/low_samples.json +0 -0
  285. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/nlp_analysis.json +0 -0
  286. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/pattern_detection.json +0 -0
  287. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/priority_summary.json +0 -0
  288. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/ragas_eval.json +0 -0
  289. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/report.json +0 -0
  290. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/root_cause.json +0 -0
  291. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/statistics.json +0 -0
  292. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/time_series.json +0 -0
  293. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/trend_detection.json +0 -0
  294. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/causal_analysis.json +0 -0
  295. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/diagnostic.json +0 -0
  296. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/final_output.json +0 -0
  297. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/index.json +0 -0
  298. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/load_data.json +0 -0
  299. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/load_runs.json +0 -0
  300. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/low_samples.json +0 -0
  301. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/nlp_analysis.json +0 -0
  302. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/pattern_detection.json +0 -0
  303. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/priority_summary.json +0 -0
  304. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/ragas_eval.json +0 -0
  305. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/report.json +0 -0
  306. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/root_cause.json +0 -0
  307. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/statistics.json +0 -0
  308. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/time_series.json +0 -0
  309. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/trend_detection.json +0 -0
  310. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/causal_analysis.json +0 -0
  311. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/diagnostic.json +0 -0
  312. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/final_output.json +0 -0
  313. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/index.json +0 -0
  314. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/load_data.json +0 -0
  315. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/load_runs.json +0 -0
  316. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/low_samples.json +0 -0
  317. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/nlp_analysis.json +0 -0
  318. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/pattern_detection.json +0 -0
  319. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/priority_summary.json +0 -0
  320. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/ragas_eval.json +0 -0
  321. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/report.json +0 -0
  322. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/root_cause.json +0 -0
  323. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/statistics.json +0 -0
  324. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/time_series.json +0 -0
  325. {evalvault-1.62.0 → evalvault-1.62.1}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/trend_detection.json +0 -0
  326. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/final_output.json +0 -0
  327. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/index.json +0 -0
  328. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/load_runs.json +0 -0
  329. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/report.json +0 -0
  330. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_change_detection.json +0 -0
  331. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_metric_comparison.json +0 -0
  332. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/final_output.json +0 -0
  333. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/index.json +0 -0
  334. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/load_runs.json +0 -0
  335. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/report.json +0 -0
  336. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_change_detection.json +0 -0
  337. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_metric_comparison.json +0 -0
  338. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/final_output.json +0 -0
  339. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/index.json +0 -0
  340. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/load_runs.json +0 -0
  341. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/report.json +0 -0
  342. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_change_detection.json +0 -0
  343. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_metric_comparison.json +0 -0
  344. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_0aa9fab0_9fbf4776.json +0 -0
  345. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_0aa9fab0_9fbf4776.md +0 -0
  346. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_0aa9fab0_f1287e90.json +0 -0
  347. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_0aa9fab0_f1287e90.md +0 -0
  348. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_8f825b22_4516d358.json +0 -0
  349. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_8f825b22_4516d358.md +0 -0
  350. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_9fbf4776_a491fa0e.json +0 -0
  351. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_9fbf4776_a491fa0e.md +0 -0
  352. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_f1287e90_8f825b22.json +0 -0
  353. {evalvault-1.62.0 → evalvault-1.62.1}/reports/comparison/comparison_f1287e90_8f825b22.md +0 -0
  354. {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r1_smoke.md +0 -0
  355. {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r2_graphrag.md +0 -0
  356. {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r2_graphrag_openai.md +0 -0
  357. {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r3_bm25.md +0 -0
  358. {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r3_bm25_langfuse3.md +0 -0
  359. {evalvault-1.62.0 → evalvault-1.62.1}/reports/debug_report_r3_dense_faiss.md +0 -0
  360. {evalvault-1.62.0 → evalvault-1.62.1}/reports/improvement_1d91a667-4288-4742-be3a-a8f5310c5140.md +0 -0
  361. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r2_graphrag_openai_stage_events.jsonl +0 -0
  362. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r2_graphrag_openai_stage_report.txt +0 -0
  363. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r2_graphrag_stage_events.jsonl +0 -0
  364. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r2_graphrag_stage_report.txt +0 -0
  365. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_langfuse2_stage_events.jsonl +0 -0
  366. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_langfuse3_stage_events.jsonl +0 -0
  367. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_langfuse_stage_events.jsonl +0 -0
  368. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_phoenix_stage_events.jsonl +0 -0
  369. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_stage_events.jsonl +0 -0
  370. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_bm25_stage_report.txt +0 -0
  371. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_dense_faiss_stage_events.jsonl +0 -0
  372. {evalvault-1.62.0 → evalvault-1.62.1}/reports/r3_dense_faiss_stage_report.txt +0 -0
  373. {evalvault-1.62.0 → evalvault-1.62.1}/reports/retrieval_benchmark_smoke_precision.csv +0 -0
  374. {evalvault-1.62.0 → evalvault-1.62.1}/reports/retrieval_benchmark_smoke_precision_graphrag.csv +0 -0
  375. {evalvault-1.62.0 → evalvault-1.62.1}/reports/retrieval_benchmark_smoke_precision_multi.csv +0 -0
  376. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/benchmark/download_kmmlu.py +0 -0
  377. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/open_rag_trace_demo.py +0 -0
  378. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/open_rag_trace_integration_template.py +0 -0
  379. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/otel-collector-config.yaml +0 -0
  380. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/start_web_ui_with_phoenix.sh +0 -0
  381. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev/validate_open_rag_trace.py +0 -0
  382. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/dev_seed_pipeline_results.py +0 -0
  383. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/__init__.py +0 -0
  384. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/__init__.py +0 -0
  385. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/ast_scanner.py +0 -0
  386. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/confidence_scorer.py +0 -0
  387. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/graph_builder.py +0 -0
  388. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/analyzer/side_effect_detector.py +0 -0
  389. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/generate_api_docs.py +0 -0
  390. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/models/__init__.py +0 -0
  391. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/models/schema.py +0 -0
  392. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/renderer/__init__.py +0 -0
  393. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/docs/renderer/html_generator.py +0 -0
  394. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/ops/phoenix_watch.py +0 -0
  395. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/backfill_langfuse_trace_url.py +0 -0
  396. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_dense_smoke.py +0 -0
  397. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_evalvault_run_dataset.json +0 -0
  398. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_retriever_docs.json +0 -0
  399. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_smoke_real.jsonl +0 -0
  400. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/perf/r3_stage_events_sample.jsonl +0 -0
  401. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/pipeline_template_inspect.py +0 -0
  402. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/reports/generate_release_notes.py +0 -0
  403. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/run_with_timeout.py +0 -0
  404. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/test_full_evaluation.py +0 -0
  405. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/tests/run_regressions.py +0 -0
  406. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/tests/run_retriever_stage_report_smoke.sh +0 -0
  407. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/validate_tutorials.py +0 -0
  408. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/verify_ragas_compliance.py +0 -0
  409. {evalvault-1.62.0 → evalvault-1.62.1}/scripts/verify_workflows.py +0 -0
  410. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/__init__.py +0 -0
  411. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/__init__.py +0 -0
  412. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/__init__.py +0 -0
  413. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/__init__.py +0 -0
  414. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/adapter.py +0 -0
  415. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/main.py +0 -0
  416. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/__init__.py +0 -0
  417. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/benchmark.py +0 -0
  418. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/config.py +0 -0
  419. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/domain.py +0 -0
  420. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/knowledge.py +0 -0
  421. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/pipeline.py +0 -0
  422. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/api/routers/runs.py +0 -0
  423. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/__init__.py +0 -0
  424. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/app.py +0 -0
  425. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/__init__.py +0 -0
  426. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/agent.py +0 -0
  427. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/analyze.py +0 -0
  428. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/api.py +0 -0
  429. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/benchmark.py +0 -0
  430. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/calibrate.py +0 -0
  431. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/config.py +0 -0
  432. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/debug.py +0 -0
  433. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/domain.py +0 -0
  434. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/experiment.py +0 -0
  435. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/gate.py +0 -0
  436. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/generate.py +0 -0
  437. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/history.py +0 -0
  438. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/init.py +0 -0
  439. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/kg.py +0 -0
  440. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/langfuse.py +0 -0
  441. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/method.py +0 -0
  442. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/phoenix.py +0 -0
  443. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/pipeline.py +0 -0
  444. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/prompts.py +0 -0
  445. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/run.py +0 -0
  446. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/run_helpers.py +0 -0
  447. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/commands/stage.py +0 -0
  448. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/__init__.py +0 -0
  449. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/analysis_io.py +0 -0
  450. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/console.py +0 -0
  451. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/errors.py +0 -0
  452. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/formatters.py +0 -0
  453. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/options.py +0 -0
  454. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/presets.py +0 -0
  455. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/progress.py +0 -0
  456. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/cli/utils/validators.py +0 -0
  457. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/mcp/__init__.py +0 -0
  458. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/mcp/schemas.py +0 -0
  459. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/inbound/mcp/tools.py +0 -0
  460. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/__init__.py +0 -0
  461. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/__init__.py +0 -0
  462. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/analysis_report_module.py +0 -0
  463. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/base_module.py +0 -0
  464. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/bm25_searcher_module.py +0 -0
  465. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/causal_adapter.py +0 -0
  466. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py +0 -0
  467. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/common.py +0 -0
  468. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/comparison_report_module.py +0 -0
  469. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/data_loader_module.py +0 -0
  470. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/detailed_report_module.py +0 -0
  471. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py +0 -0
  472. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py +0 -0
  473. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/embedding_distribution_module.py +0 -0
  474. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/embedding_searcher_module.py +0 -0
  475. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/hybrid_rrf_module.py +0 -0
  476. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/hybrid_weighted_module.py +0 -0
  477. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py +0 -0
  478. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/llm_report_module.py +0 -0
  479. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/low_performer_extractor_module.py +0 -0
  480. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/model_analyzer_module.py +0 -0
  481. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py +0 -0
  482. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py +0 -0
  483. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/network_analyzer_module.py +0 -0
  484. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/nlp_adapter.py +0 -0
  485. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py +0 -0
  486. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/pattern_detector_module.py +0 -0
  487. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/pipeline_factory.py +0 -0
  488. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/pipeline_helpers.py +0 -0
  489. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/priority_summary_module.py +0 -0
  490. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/ragas_evaluator_module.py +0 -0
  491. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py +0 -0
  492. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/retrieval_benchmark_module.py +0 -0
  493. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py +0 -0
  494. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py +0 -0
  495. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_analyzer_module.py +0 -0
  496. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_change_detector_module.py +0 -0
  497. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_comparator_module.py +0 -0
  498. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_loader_module.py +0 -0
  499. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py +0 -0
  500. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/search_comparator_module.py +0 -0
  501. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/statistical_adapter.py +0 -0
  502. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py +0 -0
  503. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/statistical_comparator_module.py +0 -0
  504. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/summary_report_module.py +0 -0
  505. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py +0 -0
  506. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py +0 -0
  507. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/trend_detector_module.py +0 -0
  508. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/analysis/verification_report_module.py +0 -0
  509. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/benchmark/__init__.py +0 -0
  510. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/benchmark/lm_eval_adapter.py +0 -0
  511. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/cache/__init__.py +0 -0
  512. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/cache/hybrid_cache.py +0 -0
  513. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/cache/memory_cache.py +0 -0
  514. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/__init__.py +0 -0
  515. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/base.py +0 -0
  516. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/csv_loader.py +0 -0
  517. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/excel_loader.py +0 -0
  518. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/json_loader.py +0 -0
  519. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/loader_factory.py +0 -0
  520. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/method_input_loader.py +0 -0
  521. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/streaming_loader.py +0 -0
  522. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/templates.py +0 -0
  523. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/dataset/thresholds.py +0 -0
  524. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/debug/__init__.py +0 -0
  525. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/debug/report_renderer.py +0 -0
  526. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/__init__.py +0 -0
  527. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/ocr/__init__.py +0 -0
  528. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/ocr/paddleocr_backend.py +0 -0
  529. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/pdf_extractor.py +0 -0
  530. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/documents/versioned_loader.py +0 -0
  531. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/domain_memory/__init__.py +0 -0
  532. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/domain_memory/domain_memory_schema.sql +0 -0
  533. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/domain_memory/sqlite_adapter.py +0 -0
  534. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/__init__.py +0 -0
  535. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/insight_generator.py +0 -0
  536. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/pattern_detector.py +0 -0
  537. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/playbook_loader.py +0 -0
  538. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/improvement/stage_metric_playbook_loader.py +0 -0
  539. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/__init__.py +0 -0
  540. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/graph_rag_retriever.py +0 -0
  541. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/networkx_adapter.py +0 -0
  542. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/parallel_kg_builder.py +0 -0
  543. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/kg/query_strategies.py +0 -0
  544. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/__init__.py +0 -0
  545. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/anthropic_adapter.py +0 -0
  546. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/azure_adapter.py +0 -0
  547. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/base.py +0 -0
  548. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/instructor_factory.py +0 -0
  549. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/llm_relation_augmenter.py +0 -0
  550. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/ollama_adapter.py +0 -0
  551. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/openai_adapter.py +0 -0
  552. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/token_aware_chat.py +0 -0
  553. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/llm/vllm_adapter.py +0 -0
  554. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/methods/__init__.py +0 -0
  555. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/methods/baseline_oracle.py +0 -0
  556. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/methods/external_command.py +0 -0
  557. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/methods/registry.py +0 -0
  558. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/__init__.py +0 -0
  559. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/__init__.py +0 -0
  560. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/bm25_retriever.py +0 -0
  561. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/dense_retriever.py +0 -0
  562. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/document_chunker.py +0 -0
  563. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/hybrid_retriever.py +0 -0
  564. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/kiwi_tokenizer.py +0 -0
  565. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/korean_evaluation.py +0 -0
  566. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/korean_stopwords.py +0 -0
  567. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/nlp/korean/toolkit.py +0 -0
  568. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/phoenix/sync_service.py +0 -0
  569. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/report/__init__.py +0 -0
  570. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/report/dashboard_generator.py +0 -0
  571. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/report/llm_report_generator.py +0 -0
  572. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/report/markdown_adapter.py +0 -0
  573. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/__init__.py +0 -0
  574. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/base_sql.py +0 -0
  575. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/benchmark_storage_adapter.py +0 -0
  576. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/postgres_adapter.py +0 -0
  577. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/postgres_schema.sql +0 -0
  578. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/schema.sql +0 -0
  579. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/storage/sqlite_adapter.py +0 -0
  580. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/__init__.py +0 -0
  581. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/open_rag_log_handler.py +0 -0
  582. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_adapter.py +0 -0
  583. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_decorators.py +0 -0
  584. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_helpers.py +0 -0
  585. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracer/phoenix_tracer_adapter.py +0 -0
  586. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracker/__init__.py +0 -0
  587. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracker/langfuse_adapter.py +0 -0
  588. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracker/mlflow_adapter.py +0 -0
  589. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/adapters/outbound/tracker/phoenix_adapter.py +0 -0
  590. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/__init__.py +0 -0
  591. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/agent_types.py +0 -0
  592. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/domain_config.py +0 -0
  593. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/instrumentation.py +0 -0
  594. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/langfuse_support.py +0 -0
  595. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/model_config.py +0 -0
  596. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/phoenix_support.py +0 -0
  597. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/playbooks/improvement_playbook.yaml +0 -0
  598. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/config/settings.py +0 -0
  599. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/debug_ragas.py +0 -0
  600. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/debug_ragas_real.py +0 -0
  601. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/__init__.py +0 -0
  602. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/__init__.py +0 -0
  603. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/analysis.py +0 -0
  604. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/analysis_pipeline.py +0 -0
  605. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/benchmark.py +0 -0
  606. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/benchmark_run.py +0 -0
  607. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/dataset.py +0 -0
  608. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/debug.py +0 -0
  609. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/experiment.py +0 -0
  610. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/feedback.py +0 -0
  611. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/improvement.py +0 -0
  612. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/kg.py +0 -0
  613. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/memory.py +0 -0
  614. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/method.py +0 -0
  615. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/prompt.py +0 -0
  616. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/rag_trace.py +0 -0
  617. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/result.py +0 -0
  618. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/entities/stage.py +0 -0
  619. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/__init__.py +0 -0
  620. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/analysis_registry.py +0 -0
  621. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/confidence.py +0 -0
  622. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/contextual_relevancy.py +0 -0
  623. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/entity_preservation.py +0 -0
  624. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/insurance.py +0 -0
  625. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/no_answer.py +0 -0
  626. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/registry.py +0 -0
  627. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/retrieval_rank.py +0 -0
  628. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/terms_dictionary.json +0 -0
  629. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/metrics/text_match.py +0 -0
  630. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/__init__.py +0 -0
  631. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/analysis_service.py +0 -0
  632. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/async_batch_executor.py +0 -0
  633. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/batch_executor.py +0 -0
  634. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/benchmark_report_service.py +0 -0
  635. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/benchmark_runner.py +0 -0
  636. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/benchmark_service.py +0 -0
  637. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/cache_metrics.py +0 -0
  638. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/cluster_map_builder.py +0 -0
  639. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/dataset_preprocessor.py +0 -0
  640. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/debug_report_service.py +0 -0
  641. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/document_chunker.py +0 -0
  642. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/document_versioning.py +0 -0
  643. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/domain_learning_hook.py +0 -0
  644. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/embedding_overlay.py +0 -0
  645. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/entity_extractor.py +0 -0
  646. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/evaluator.py +0 -0
  647. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_comparator.py +0 -0
  648. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_manager.py +0 -0
  649. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_reporter.py +0 -0
  650. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_repository.py +0 -0
  651. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/experiment_statistics.py +0 -0
  652. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/improvement_guide_service.py +0 -0
  653. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/intent_classifier.py +0 -0
  654. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/kg_generator.py +0 -0
  655. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/memory_aware_evaluator.py +0 -0
  656. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/memory_based_analysis.py +0 -0
  657. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/method_runner.py +0 -0
  658. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/pipeline_orchestrator.py +0 -0
  659. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/pipeline_template_registry.py +0 -0
  660. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/prompt_manifest.py +0 -0
  661. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/prompt_registry.py +0 -0
  662. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/prompt_status.py +0 -0
  663. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/ragas_prompt_overrides.py +0 -0
  664. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/retrieval_metrics.py +0 -0
  665. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/retriever_context.py +0 -0
  666. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/satisfaction_calibration_service.py +0 -0
  667. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/stage_event_builder.py +0 -0
  668. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/stage_metric_guide_service.py +0 -0
  669. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/stage_metric_service.py +0 -0
  670. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/stage_summary_service.py +0 -0
  671. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/synthetic_qa_generator.py +0 -0
  672. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/testset_generator.py +0 -0
  673. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/threshold_profiles.py +0 -0
  674. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/unified_report_service.py +0 -0
  675. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/domain/services/visual_space_service.py +0 -0
  676. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/mkdocs_helpers.py +0 -0
  677. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/__init__.py +0 -0
  678. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/__init__.py +0 -0
  679. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/analysis_pipeline_port.py +0 -0
  680. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/evaluator_port.py +0 -0
  681. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/learning_hook_port.py +0 -0
  682. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/inbound/web_port.py +0 -0
  683. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/__init__.py +0 -0
  684. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/analysis_cache_port.py +0 -0
  685. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/analysis_module_port.py +0 -0
  686. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/analysis_port.py +0 -0
  687. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/benchmark_port.py +0 -0
  688. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/causal_analysis_port.py +0 -0
  689. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/dataset_port.py +0 -0
  690. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/domain_memory_port.py +0 -0
  691. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/embedding_port.py +0 -0
  692. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/improvement_port.py +0 -0
  693. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/intent_classifier_port.py +0 -0
  694. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/korean_nlp_port.py +0 -0
  695. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/llm_port.py +0 -0
  696. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/method_port.py +0 -0
  697. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/nlp_analysis_port.py +0 -0
  698. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/relation_augmenter_port.py +0 -0
  699. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/report_port.py +0 -0
  700. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/stage_storage_port.py +0 -0
  701. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/storage_port.py +0 -0
  702. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/tracer_port.py +0 -0
  703. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/ports/outbound/tracker_port.py +0 -0
  704. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/reports/__init__.py +0 -0
  705. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/reports/release_notes.py +0 -0
  706. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/scripts/__init__.py +0 -0
  707. {evalvault-1.62.0 → evalvault-1.62.1}/src/evalvault/scripts/regression_runner.py +0 -0
  708. {evalvault-1.62.0 → evalvault-1.62.1}/tests/__init__.py +0 -0
  709. {evalvault-1.62.0 → evalvault-1.62.1}/tests/conftest.py +0 -0
  710. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/README.md +0 -0
  711. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/benchmark/retrieval_ground_truth_min.json +0 -0
  712. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/benchmark/retrieval_ground_truth_multi.json +0 -0
  713. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/auto_insurance_qa_korean_full.json +0 -0
  714. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/comprehensive_dataset.json +0 -0
  715. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/edge_cases.json +0 -0
  716. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/edge_cases.xlsx +0 -0
  717. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/evaluation_test_sample.json +0 -0
  718. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/graphrag_retriever_docs.json +0 -0
  719. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/graphrag_smoke.json +0 -0
  720. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_document.txt +0 -0
  721. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_english.csv +0 -0
  722. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_english.json +0 -0
  723. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_english.xlsx +0 -0
  724. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_korean.csv +0 -0
  725. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_korean.json +0 -0
  726. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_korean.xlsx +0 -0
  727. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/insurance_qa_korean_versioned_pdf.json +0 -0
  728. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/run_mode_full_domain_memory.json +0 -0
  729. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/run_mode_simple.json +0 -0
  730. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/e2e/summary_eval_minimal.json +0 -0
  731. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/kg/minimal_graph.json +0 -0
  732. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/sample_dataset.csv +0 -0
  733. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/sample_dataset.json +0 -0
  734. {evalvault-1.62.0 → evalvault-1.62.1}/tests/fixtures/sample_dataset.xlsx +0 -0
  735. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/__init__.py +0 -0
  736. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/benchmark/test_benchmark_service_integration.py +0 -0
  737. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/conftest.py +0 -0
  738. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_cli_integration.py +0 -0
  739. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_data_flow.py +0 -0
  740. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_e2e_scenarios.py +0 -0
  741. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_evaluation_flow.py +0 -0
  742. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_full_workflow.py +0 -0
  743. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_langfuse_flow.py +0 -0
  744. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_phoenix_flow.py +0 -0
  745. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_pipeline_api_contracts.py +0 -0
  746. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_storage_flow.py +0 -0
  747. {evalvault-1.62.0 → evalvault-1.62.1}/tests/integration/test_summary_eval_fixture.py +0 -0
  748. {evalvault-1.62.0 → evalvault-1.62.1}/tests/optional_deps.py +0 -0
  749. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/__init__.py +0 -0
  750. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/inbound/mcp/test_execute_tools.py +0 -0
  751. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/inbound/mcp/test_read_tools.py +0 -0
  752. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/documents/test_pdf_extractor.py +0 -0
  753. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/documents/test_versioned_loader.py +0 -0
  754. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/__init__.py +0 -0
  755. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/test_insight_generator.py +0 -0
  756. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/test_pattern_detector.py +0 -0
  757. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/test_playbook_loader.py +0 -0
  758. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/improvement/test_stage_metric_playbook_loader.py +0 -0
  759. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/kg/test_graph_rag_retriever.py +0 -0
  760. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/kg/test_parallel_kg_builder.py +0 -0
  761. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/adapters/outbound/storage/test_benchmark_storage_adapter.py +0 -0
  762. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/config/test_phoenix_support.py +0 -0
  763. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/conftest.py +0 -0
  764. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_analysis_metric_registry.py +0 -0
  765. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_confidence.py +0 -0
  766. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_contextual_relevancy.py +0 -0
  767. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_entity_preservation.py +0 -0
  768. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_metric_registry.py +0 -0
  769. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_no_answer.py +0 -0
  770. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_retrieval_rank.py +0 -0
  771. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/metrics/test_text_match.py +0 -0
  772. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_cache_metrics.py +0 -0
  773. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_claim_level.py +0 -0
  774. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_dataset_preprocessor.py +0 -0
  775. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_document_versioning.py +0 -0
  776. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_evaluator_comprehensive.py +0 -0
  777. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_improvement_guide_service.py +0 -0
  778. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_retrieval_metrics.py +0 -0
  779. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_retriever_context.py +0 -0
  780. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_stage_event_builder.py +0 -0
  781. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_stage_metric_guide_service.py +0 -0
  782. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/services/test_synthetic_qa_generator.py +0 -0
  783. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/test_embedding_overlay.py +0 -0
  784. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/test_prompt_manifest.py +0 -0
  785. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/domain/test_prompt_status.py +0 -0
  786. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/reports/test_release_notes.py +0 -0
  787. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/scripts/test_regression_runner.py +0 -0
  788. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_agent_types.py +0 -0
  789. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_analysis_entities.py +0 -0
  790. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_analysis_modules.py +0 -0
  791. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_analysis_pipeline.py +0 -0
  792. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_analysis_service.py +0 -0
  793. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_anthropic_adapter.py +0 -0
  794. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_async_batch_executor.py +0 -0
  795. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_azure_adapter.py +0 -0
  796. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_benchmark_helpers.py +0 -0
  797. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_benchmark_runner.py +0 -0
  798. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_causal_adapter.py +0 -0
  799. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli.py +0 -0
  800. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli_domain.py +0 -0
  801. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli_init.py +0 -0
  802. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli_progress.py +0 -0
  803. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_cli_utils.py +0 -0
  804. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_data_loaders.py +0 -0
  805. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_domain_config.py +0 -0
  806. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_domain_memory.py +0 -0
  807. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_entities.py +0 -0
  808. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_entities_kg.py +0 -0
  809. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_entity_extractor.py +0 -0
  810. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_evaluator.py +0 -0
  811. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_experiment.py +0 -0
  812. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_hybrid_cache.py +0 -0
  813. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_instrumentation.py +0 -0
  814. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_insurance_metric.py +0 -0
  815. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_intent_classifier.py +0 -0
  816. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_kg_generator.py +0 -0
  817. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_kg_networkx.py +0 -0
  818. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_kiwi_tokenizer.py +0 -0
  819. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_kiwi_warning_suppression.py +0 -0
  820. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_korean_dense.py +0 -0
  821. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_korean_evaluation.py +0 -0
  822. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_korean_retrieval.py +0 -0
  823. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_langfuse_tracker.py +0 -0
  824. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_llm_relation_augmenter.py +0 -0
  825. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_lm_eval_adapter.py +0 -0
  826. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_markdown_report.py +0 -0
  827. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_memory_cache.py +0 -0
  828. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_memory_services.py +0 -0
  829. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_method_plugins.py +0 -0
  830. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_mlflow_tracker.py +0 -0
  831. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_model_config.py +0 -0
  832. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_nlp_adapter.py +0 -0
  833. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_nlp_entities.py +0 -0
  834. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_ollama_adapter.py +0 -0
  835. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_openai_adapter.py +0 -0
  836. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_phoenix_adapter.py +0 -0
  837. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_pipeline_orchestrator.py +0 -0
  838. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_ports.py +0 -0
  839. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_postgres_storage.py +0 -0
  840. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_rag_trace_entities.py +0 -0
  841. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_run_memory_helpers.py +0 -0
  842. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_run_mode_fixtures.py +0 -0
  843. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_settings.py +0 -0
  844. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_sqlite_storage.py +0 -0
  845. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_stage_cli.py +0 -0
  846. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_stage_metric_service.py +0 -0
  847. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_stage_storage.py +0 -0
  848. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_stage_summary_service.py +0 -0
  849. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_statistical_adapter.py +0 -0
  850. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_streaming_loader.py +0 -0
  851. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_summary_eval_fixture.py +0 -0
  852. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_testset_generator.py +0 -0
  853. {evalvault-1.62.0 → evalvault-1.62.1}/tests/unit/test_web_adapter.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: evalvault
3
- Version: 1.62.0
3
+ Version: 1.62.1
4
4
  Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
5
5
  Project-URL: Homepage, https://github.com/ntts9990/EvalVault
6
6
  Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
@@ -16,6 +16,8 @@
16
16
  - User guide (including operations): `guides/USER_GUIDE.md`
17
17
  - Development/contributing: `guides/DEV_GUIDE.md`
18
18
  - CLI→MCP porting plan: `guides/CLI_MCP_PLAN.md`
19
+ - RAGAS human-feedback calibration: `guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md`
20
+ - Diagnostic playbook: `guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md` (problem → analysis → interpretation → action flow)
19
21
  - Release checklist: `guides/RELEASE_CHECKLIST.md`
20
22
  - Status summary: `STATUS.md`
21
23
  - Roadmap: `ROADMAP.md`
@@ -36,6 +38,8 @@ docs/
36
38
  │ ├── USER_GUIDE.md # Comprehensive usage/operations guide
37
39
  │ ├── DEV_GUIDE.md # Development routine/testing/quality
38
40
  │ ├── CLI_MCP_PLAN.md # CLI→MCP porting plan (Living Doc)
41
+ │ ├── RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md # RAGAS calibration methodology
42
+ │ ├── EVALVAULT_DIAGNOSTIC_PLAYBOOK.md # Diagnostic playbook
39
43
  │ ├── RELEASE_CHECKLIST.md # Deployment checklist
40
44
  │ ├── OPEN_RAG_TRACE_*.md # Open RAG Trace samples/internal wrapper
41
45
  │ └── OPEN_RAG_TRACE_*.md
@@ -0,0 +1,426 @@
1
+ # EvalVault Diagnostic Playbook
2
+ > A Korean-language diagnostic runbook that connects EvalVault's Analysis features into a “problem → analysis selection → execution → artifact interpretation → improvement experiment” flow
3
+
4
+ ---
5
+
6
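+ ## 1) Purpose / Scope
7
+
8
+ ### Purpose
9
+ - Go beyond whether an evaluation score is “good or bad” and lay out **why it is so (cause) and what to change (action)** as a reproducible flow.
10
+ - Standardize diagnosis **around run_id**, on the premise that the CLI and Web UI share the same DB.
11
+
12
+ ### Scope (included)
13
+ - Single-run (run_id) diagnosis: statistics/NLP/causal/playbook/hypotheses/network/time series
14
+ - A/B comparison (run_id A vs B): statistical comparison + change detection + comparison report
15
+ - Retrieval/embedding/morpheme “verification” routines (checking data quality, preprocessing, and retriever quality)
16
+ - Structure of outputs (reports/artifacts) and interpretation criteria
17
+ - Iterative improvement loop and quality checklist
18
+
19
+ ### Scope (excluded)
20
+ - This document does not cover code changes or new feature design.
21
+ - No external links/URLs are referenced.
22
+
23
+ ### Expert-perspective checks (document structure criteria)
24
+ - **Cognitive scientist**: Reduce ambiguity/bias in the evaluation guide, and keep a flow that lowers cognitive load during diagnosis (decision tree → execution → interpretation).
25
+ - **Editor**: Check that the order for interpreting reports/artifacts is consistent (summary → evidence → action).
26
+ - **Korean language scholar**: Judge Korean expression/tone issues together with the `verify_morpheme` results, and check that the style criteria are clear.
27
+ - **Software developer**: Check that artifact paths and run_id reproducibility are always recorded, and that causes can be traced on failure.
28
+ - **Architect**: Check that inter-module dependencies in the diagnosis stage are not excessive (keep the single-axis change principle).
29
+ - **UI/UX specialist**: Fix the display order of key artifacts/conclusions so that users immediately understand the “next action”.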
30
+
31
+ ---
32
+
33
+ ## 2) Prerequisites (required) / What to prepare
34
+
35
+ ### Runtime environment (Extras)
36
+ - `analysis`: basis for the statistics/analysis modules (e.g., scikit-learn, xgboost)
37
+ - `timeseries`: basis for advanced time-series analysis (e.g., aeon, numba)
38
+ - `dashboard`: basis for dashboard output (e.g., matplotlib)
39
+ - (Recommended) `korean`: basis for Korean morpheme analysis/retrieval (e.g., kiwipiepy, rank-bm25, sentence-transformers)
40
+
41
+ ### Required inputs/identifiers
42
+ - `DB path`: defaults to `data/db/evalvault.db`, or the environment variable `EVALVAULT_DB_PATH`
43
+ - `run_id`: the unit identifier that ties together evaluations/analyses/artifacts
44
+ - `metrics`: e.g., `faithfulness`, `answer_relevancy`, `context_precision`, `context_recall`, `factual_correctness`, `semantic_similarity`
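
To illustrate the DB path rule listed above, here is a minimal Python sketch. It is not part of the packaged file. The helper name `resolve_db_path` and the precedence order (explicit argument, then `EVALVAULT_DB_PATH`, then the default) are assumptions made for illustration; EvalVault's own resolution logic may differ.

```python
import os
from pathlib import Path
from typing import Optional

# Default path as stated in the playbook.
DEFAULT_DB_PATH = "data/db/evalvault.db"


def resolve_db_path(explicit: Optional[str] = None) -> Path:
    """Hypothetical helper: pick the DB path to use.

    Assumed precedence: explicit value > EVALVAULT_DB_PATH > default.
    """
    candidate = explicit or os.environ.get("EVALVAULT_DB_PATH") or DEFAULT_DB_PATH
    return Path(candidate)
```

A value passed on the command line (for example via `--db`) would be supplied as `explicit`; with neither that nor the environment variable set, the sketch falls back to the documented default.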
45
+
46
+ ### Key output paths (fixed patterns)
47
+ - Automatic single-run analysis:
48
+ - `reports/analysis/analysis_<RUN_ID>.json`
49
+ - `reports/analysis/analysis_<RUN_ID>.md`
50
+ - `reports/analysis/artifacts/analysis_<RUN_ID>/index.json`
51
+ - `reports/analysis/artifacts/analysis_<RUN_ID>/<node_id>.json`
52
+ - A/B comparison:
53
+ - `reports/comparison/comparison_<RUN_A>_<RUN_B>.json`
54
+ - `reports/comparison/comparison_<RUN_A>_<RUN_B>.md`
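
The fixed path patterns above can be expressed as small path builders. The sketch below is illustrative only and not part of the packaged file: the function names are hypothetical, and it assumes `reports/` is resolved relative to the current working directory.

```python
from __future__ import annotations

from pathlib import Path

REPORTS_ROOT = Path("reports")  # assumption: relative to the working directory


def analysis_paths(run_id: str, root: Path = REPORTS_ROOT) -> dict[str, Path]:
    """Build the single-run report and artifact index paths for a run_id."""
    base = root / "analysis"
    artifacts_dir = base / "artifacts" / f"analysis_{run_id}"
    return {
        "json": base / f"analysis_{run_id}.json",
        "markdown": base / f"analysis_{run_id}.md",
        "artifacts_dir": artifacts_dir,
        "index": artifacts_dir / "index.json",
    }


def node_artifact_path(run_id: str, node_id: str, root: Path = REPORTS_ROOT) -> Path:
    """Build the path of one node's artifact, e.g. node_id='diagnostic_playbook'."""
    return analysis_paths(run_id, root)["artifacts_dir"] / f"{node_id}.json"


def comparison_paths(run_a: str, run_b: str, root: Path = REPORTS_ROOT) -> dict[str, Path]:
    """Build the A/B comparison report paths for two run ids."""
    stem = root / "comparison" / f"comparison_{run_a}_{run_b}"
    return {"json": stem.with_suffix(".json"), "markdown": stem.with_suffix(".md")}
```

Note that `with_suffix` would misbehave if a run id contained a dot; in that case, appending the extension to the file name directly is the safer choice.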
55
+
56
+ ---
57
+
58
+ ## 3) Analysis tool landscape (classification)
59
+
60
+ ### 3.1 “Intent (AnalysisIntent)” classification (the basis for choosing what to run)
61
+ Intent enum: `src/evalvault/domain/entities/analysis_pipeline.py`
62
+
63
+ | Category | Intent value | One-line purpose | Core nodes of the default template (summary) |
64
+ |---|---|---|---|
65
+ | Verification | `verify_morpheme` | Check Korean morpheme-processing quality | `data_loader → morpheme_analyzer → morpheme_quality_checker → verification_report` |
66
+ | Verification | `verify_embedding` | Check embedding distribution/quality | `data_loader → embedding_analyzer → embedding_distribution → verification_report` |
67
+ | Verification | `verify_retrieval` | Check retrieved-context quality | `data_loader → retrieval_analyzer → retrieval_quality_checker → verification_report` |
68
+ | Comparison | `compare_search` | Compare BM25/Dense/Hybrid search | `data_loader → (bm25/embedding/hybrid) → search_comparator → comparison_report` |
69
+ | Comparison | `compare_models` | Compare performance by model | `run_loader → model_analyzer → statistical_comparator → comparison_report` |
70
+ | Comparison | `compare_runs` | Compare at the run level | `run_loader → run_analyzer → statistical_comparator → comparison_report` |
71
+ | Analysis | `analyze_low_metrics` | Root-cause analysis of low scores (comprehensive) | `ragas_evaluator/low_performer_extractor/diagnostic_playbook/causal/root_cause/priority_summary/llm_report` |
72
+ | Analysis | `analyze_patterns` | Pattern-centered (keywords/question types) | `data_loader → nlp_analyzer → pattern_detector → (priority/llm_report)` |
73
+ | Analysis | `analyze_trends` | Trends based on run history | `run_loader → time_series_analyzer → trend_detector → llm_report` |
74
+ | Analysis | `analyze_statistical` | Statistical summary/correlation | `data_loader → statistical_analyzer` |
75
+ | Analysis | `analyze_nlp` | NLP summary (keywords/types) | `data_loader → nlp_analyzer` |
76
+ | Analysis | `analyze_causal` | Causal clues (including correlation-based hints) | `data_loader → statistical_analyzer → causal_analyzer` |
77
+ | Analysis | `analyze_network` | Metric correlation network | `data_loader → statistical_analyzer → network_analyzer` |
78
+ | Analysis | `analyze_playbook` | Playbook-based diagnosis (simple) | `data_loader → diagnostic_playbook` |
79
+ | Analysis | `detect_anomalies` | Anomaly detection (time series) | `run_loader → timeseries_advanced(mode=anomaly)` |
80
+ | Analysis | `forecast_performance` | Performance forecasting (time series) | `run_loader → timeseries_advanced(mode=forecast)` |
81
+ | Analysis | `generate_hypotheses` | Hypothesis generation (automatic) | `data_loader → statistical_analyzer → ragas_evaluator → low_performer_extractor → hypothesis_generator` |
82
+ | Benchmark | `benchmark_retrieval` | Retrieval benchmark | `retrieval_benchmark` |
83
+ | Report | `generate_summary` | Summary report | `data_loader/statistical/priority_summary → llm_report(report_type=summary)` |
84
+ | Report | `generate_detailed` | Detailed report (comprehensive) | `statistics+RAGAS+diagnosis+NLP+patterns+causal+root cause+trends → llm_report(report_type=analysis)` |
85
+ | Report | `generate_comparison` | Comparison report (comprehensive) | `run_loader → run_metric_comparator + run_change_detector → llm_report(report_type=comparison)` |
86
+
87
+ > Source for template definitions: `src/evalvault/domain/services/pipeline_template_registry.py`
88
+ > Source for module registration: `src/evalvault/adapters/outbound/analysis/pipeline_factory.py`
89
+
90
+ ---
91
+
92
+ ### 3.2 “Module (module_id)” map (the actual execution units)
93
+ Modules are registered in `pipeline_factory.py`, and each module has a `module_id`.
94
+
95
+ | module_id | File (source) | Role | Typical intents/situations |
96
+ |---|---|---|---|
97
+ | `data_loader` | `src/evalvault/adapters/outbound/analysis/data_loader_module.py` | Load data/runs | Most single-run cases |
98
+ | `run_loader` | `src/evalvault/adapters/outbound/analysis/run_loader_module.py` | Load runs from the DB | Comparison/trends/time series |
99
+ | `statistical_analyzer` | `src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py` | Mean/variance/correlation/pass rate | “The starting point of root-cause analysis” |
100
+ | `nlp_analyzer` | `src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py` | Keyword/question-type summary | Checking patterns and distributions |
101
+ | `causal_analyzer` | `src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py` | Generate causal clues | Strengthening cause hypotheses |
102
+ | `diagnostic_playbook` | `src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py` | Per-metric diagnosis/improvement hints | Rapid response to low scores |
103
+ | `root_cause_analyzer` | `src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py` | Aggregate diagnostic and causal clues into causes | Organizing action candidates |
104
+ | `pattern_detector` | `src/evalvault/adapters/outbound/analysis/pattern_detector_module.py` | Keyword/type pattern summary | Finding reproduction conditions |
105
+ | `retrieval_analyzer` | `src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py` | Retrieval summary statistics | Verifying retrieval quality |
106
+ | `retrieval_quality_checker` | `src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py` | Retrieval quality check (heuristic) | Quick call on “whether retrieval is the problem” |
107
+ | `embedding_analyzer` | `src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py` | Embedding distribution statistics | Embedding drift/quality |
108
+ | `morpheme_analyzer` | `src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py` | Morpheme analysis | Korean queries/tokenization |
109
+ | `morpheme_quality_checker` | `src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py` | Morpheme quality check | Reliability of Korean analysis |
110
+ | `time_series_analyzer` | `src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py` | Run-history summary | Understanding trends |
111
+ | `timeseries_advanced` | `src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py` | Anomaly detection/forecasting | Detecting release regressions |
112
+ | `trend_detector` | `src/evalvault/adapters/outbound/analysis/trend_detector_module.py` | Trend detection | Locating when a regression began |
113
+ | `network_analyzer` | `src/evalvault/adapters/outbound/analysis/network_analyzer_module.py` | Correlation network | Understanding metric structure |
114
+ | `hypothesis_generator` | `src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py` | Automatic hypothesis generation | Designing the next experiment |
115
+ | `run_metric_comparator` | `src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py` | Detailed A/B metric comparison | Comparison scorecard |
116
+ | `run_change_detector` | `src/evalvault/adapters/outbound/analysis/run_change_detector_module.py` | Detect data/config/prompt changes | “What changed” |
117
+ | `llm_report` | `src/evalvault/adapters/outbound/analysis/llm_report_module.py` | Summary/detailed/comparison reports | Human-readable conclusions |
118
+
119
+ ---
120
+
121
+ ## 4) Diagnostic decision tree (problem → analysis selection)
122
+
123
+ ### 4.1 Step 0: “Diagnosable state” check (eliminate causes of failure)
124
+ - [ ] Is `--db` or `EVALVAULT_DB_PATH` correct?
125
+ - [ ] Does the target `run_id` exist in the DB? (check with `evalvault history`)
126
+ - [ ] Does the dataset include `thresholds` (or do you know the default criteria)?
127
+ - [ ] Are the metric execution conditions met (e.g., metrics that require embeddings)?
128
+
129
+ ### 4.2 Step 1: Symptom-based selection (the fastest branch)
130
+ In the table below, pick the “current problem” and go straight to the recommended Intent/command.
131
+
132
+ | Problem (symptom) | Primary goal | Recommended Intent (or command) | Artifacts to check first |
133
+ |---|---|---|---|
134
+ | A specific metric is below threshold (low scores overall) | Build candidate causes for “why it is low” | `analyze_low_metrics` or `analyze_playbook` | `analysis_<RUN_ID>.md`, `diagnostic_playbook.json`, `root_cause_analyzer.json` |
135
+ | Low `faithfulness` | Separate evidence/citation/context-consistency problems | `analyze_low_metrics` + `verify_retrieval` | `retrieval_quality_checker.json`, `ragas_evaluator.json` |
136
+ | Low `answer_relevancy` | Check intent understanding/prompt alignment | `analyze_low_metrics` + `analyze_patterns` | `nlp_analyzer.json`, `pattern_detector.json` |
137
+ | Low `context_precision` | Unnecessary context/noise | `verify_retrieval` + (if needed) `compare_search` | `retrieval_analyzer.json`, `search_comparator.json` |
138
+ | Low `context_recall` | top_k/query expansion/chunking issues | `verify_retrieval` + `benchmark_retrieval` | `retrieval_benchmark.json`, `retrieval_quality_checker.json` |
139
+ | Embedding-based metrics are unstable/abnormal | Check the embedding backend/distribution | `verify_embedding` | `embedding_analyzer.json`, `embedding_distribution.json` |
140
+ | Tokenization/retrieval behaves oddly in Korean | Check the morpheme-based pipeline | `verify_morpheme` | `morpheme_quality_checker.json` |
141
+ | Scores drop suddenly after a release | Trace when the drop began and which change caused it | `analyze_trends` + `generate_comparison` | `trend_detector.json`, `run_change_detector.json` |
142
+ | Improvement is ambiguous in an A/B comparison | Sort out significance/changes | `evalvault analyze-compare` or `generate_comparison` | `comparison_<A>_<B>.md`, `run_metric_comparator.json` |
143
+ | Stage bottlenecks/latency/missing traces | Separate execution-stage problems from tracing problems | `evalvault stage` + `evalvault debug report` | debug report output, list of stage metrics |
144
+ | Correlation structure has changed (coupling between metrics) | Check the correlation/network structure | `analyze_network` | `network_analyzer.json` |
145
+ | Cannot decide how to design the next experiment | Automatic hypothesis generation | `generate_hypotheses` | `hypothesis_generator.json` |
146
+
147
+ ---
148
+
149
+ ## 5) Core playbook scenarios (at least 8)
150
+
151
+ > Common principle: handle every scenario in the order **(1) define the problem → (2) run path → (3) read the artifacts → (4) next experiment**.
152
+
153
+ ### Scenario 1) “The overall pass rate is low” (quickly build candidate causes)
154
+ - Trigger: the overall pass rate in `analysis_<RUN_ID>.md` is low (for example, below 0.7)
155
+ - Run (CLI):
156
+   - `uv run evalvault analyze <RUN_ID> --db data/db/evalvault.db --nlp --causal --playbook --enable-llm`
157
+ - Run (Web UI):
158
+   - Select the run in Reports → run “Analysis (statistics/NLP/causal/playbook)” in Analysis Lab
159
+ - Artifacts to check:
160
+   - `reports/analysis/artifacts/analysis_<RUN_ID>/diagnostic_playbook.json`
161
+   - `reports/analysis/artifacts/analysis_<RUN_ID>/root_cause_analyzer.json`
162
+   - `reports/analysis/artifacts/analysis_<RUN_ID>/priority_summary.json`
163
+ - How to read it:
164
+   - Prioritize the metrics with the largest `gap` (threshold minus score) in `diagnostics`.
165
+   - Among the recommendations in `root_cause_analyzer`, adopt the actions that appear repeatedly as the first experiment candidates.
166
+ - Next action:
167
+   - Change **only one axis** among prompt, retrieval (top_k, reranking), and data cleaning, then rerun and compare.
168
+
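The gap-based prioritization above can be illustrated in a few lines. This is a minimal sketch under stated assumptions: the function `rank_by_gap` and the plain score/threshold dictionaries are hypothetical and do not reflect the actual schema of `diagnostic_playbook.json`; only the rule "gap = threshold minus score, largest first" comes from the scenario.

```python
from typing import Dict, List, Tuple


def rank_by_gap(scores: Dict[str, float],
                thresholds: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return metrics below threshold, sorted by gap = threshold - score (largest first)."""
    gaps = [
        (name, round(thresholds[name] - score, 6))
        for name, score in scores.items()
        if name in thresholds and score < thresholds[name]
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# Illustrative numbers only; they are not taken from a real run.
example_scores = {"faithfulness": 0.55, "answer_relevancy": 0.72, "context_recall": 0.81}
example_thresholds = {"faithfulness": 0.8, "answer_relevancy": 0.75, "context_recall": 0.8}
ranking = rank_by_gap(example_scores, example_thresholds)
```

With these illustrative values, `faithfulness` ranks first and `context_recall` is left out because it already meets its threshold.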
169
+ ---
170
+
171
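+ ### Scenario 2) “faithfulness is low” (separate retrieval problems from generation problems)
172
+ - Trigger: `faithfulness` is below its threshold
173
+ - Run (recommended flow):
174
+   1) Verify retrieval quality → 2) analyze the causes of low scores (combined)
175
+ - Run (pipeline Intent):
176
+   - 1) `verify_retrieval`
177
+   - 2) `analyze_low_metrics`
178
+ - Run (CLI example; the Korean queries mean “verify retrieval quality” and “analyze causes of low metrics”):
179
+   - `uv run evalvault pipeline analyze "검색 품질 검증" --run-id <RUN_ID> --db data/db/evalvault.db`
180
+   - `uv run evalvault pipeline analyze "낮은 메트릭 원인 분석" --run-id <RUN_ID> --db data/db/evalvault.db`
181
+ - Artifacts to check (key items):
182
+   - `passed` and the individual checks in `retrieval_quality_checker.json` (empty-context ratio, average context tokens, keyword overlap, ground_truth hit)
183
+   - The `faithfulness` diagnoses and recommendations in `diagnostic_playbook.json`
184
+ - How to read it:
185
+   - If the `verify_retrieval` checks fail: **retrieval/context construction**, not generation (LLM), is the primary bottleneck.
186
+   - If the checks pass but only faithfulness is low: inspect **evidence citation and source alignment in the answer (prompt/post-processing)** first.
187
+ - Next action:
188
+   - If the retrieval checks fail: start with reranking, noise filtering, and securing a minimum number of context tokens.
189
+   - If the retrieval checks pass: strengthen alignment in the system prompt, for example “cite evidence” and “make no claims outside the context”.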
190
+
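The interpretation rule of this scenario is a two-way branch, which can be written down explicitly. The sketch below is illustrative only: `triage_faithfulness` and its return labels are hypothetical names, and the inputs are assumed to have been read by hand from `retrieval_quality_checker.json` (`passed`) and from the run's `faithfulness` score and threshold.

```python
def triage_faithfulness(retrieval_passed: bool, faithfulness: float, threshold: float) -> str:
    """Map the scenario's reading rules to a first action (labels are hypothetical)."""
    if faithfulness >= threshold:
        return "ok"  # nothing to triage for this metric
    if not retrieval_passed:
        # Retrieval checks failed: fix retrieval/context construction before the prompt.
        return "fix_retrieval_first"
    # Retrieval looks healthy, so the grounding of the generated answer is suspect.
    return "fix_generation_grounding"
```

Keeping the branch explicit makes it harder to jump to prompt changes while the retrieval checks are still failing.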
191
+ ---
192
+
193
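+ ### Scenario 3) “answer_relevancy is low” (narrow down by intent/question-type patterns)
194
+ - Trigger: `answer_relevancy` is below its threshold
195
+ - Run (Intent): `analyze_patterns` + (if needed) `analyze_low_metrics`
196
+ - Artifacts to check:
197
+   - `nlp_analyzer.json` (top_keywords, question_types)
198
+   - `pattern_detector.json` (summary of top keywords and question types)
199
+ - How to read it:
200
+   - If a particular question type (for example procedural, definitional, or comparative) is over-represented and scores are low for that type → candidate for **per-type prompt branching**.
201
+ - Next action:
202
+   - Split templates/guardrails by question type, then re-evaluate on the same dataset.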
203
+
204
+ ---
205
+
206
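+ ### Scenario 4) “context_precision is low” (reduce noisy context)
207
+ - Trigger: `context_precision` is low and the contexts are long or numerous
208
+ - Run (Intent): `verify_retrieval` → (to compare alternatives) `compare_search`
209
+ - Artifacts to check:
210
+   - The `retrieval_analyzer.json` summary (context count, tokens, empty-context ratio, keyword overlap)
211
+   - The `compare_search` results (when comparing hybrid search methods)
212
+ - How to read it:
213
+   - Low keyword overlap with a large context token count is the “long but irrelevant” pattern → prioritize **reranking/filtering**.
214
+ - Next action:
215
+   - Rather than simply increasing top_k, first remove unnecessary context (secure precision), then re-evaluate.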
216
+
217
+ ---
218
+
219
+ ### Scenario 5) “context_recall is low” (the needed evidence is not being retrieved)
220
+ - Trigger: `context_recall` is low and the dataset has ground_truth
221
+ - Run (Intent): `verify_retrieval` + `benchmark_retrieval`
222
+ - Artifacts to check:
223
+   - `ground_truth_hit_rate` in `retrieval_quality_checker.json`
224
+   - `retrieval_benchmark.json` (benchmark results)
225
+ - How to read it:
226
+   - If `ground_truth_hit_rate` is low: a recall bottleneck in the query, chunking, or indexing stage is likely.
227
+ - Next action:
228
+   - Run experiments on chunking strategy, query expansion, and search method (hybrid) separately, one at a time.
229
+
230
+ ---
231
+
232
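+ ### Scenario 6) “Embedding-based metrics fluctuate or look wrong” (check backend and distribution)
233
+ - Trigger: embedding-based results such as `semantic_similarity` are unstable, or NaN values/failures are frequent
234
+ - Run (Intent): `verify_embedding`
235
+ - Artifacts to check:
236
+   - The `embedding_analyzer.json` summary (backend/model/dimension/avg_norm/norm_std/mean_cosine_to_centroid)
237
+   - `embedding_distribution.json` (distribution check results)
238
+ - How to read it:
239
+   - If `norm_std` is extremely low or `mean_cosine_to_centroid` is extremely high: the embeddings may have collapsed into one direction or a single cluster.
240
+   - If there is a backend error: stabilize the environment/model before interpreting any embedding metric.
241
+ - Next action:
242
+   - Pin the embedding backend/model (via profile/settings), then re-evaluate to remove the variability first.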
243
+
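To make the two collapse signals concrete, the sketch below computes statistics with the same names from raw vectors. It is an illustration under assumptions, not evalvault's implementation: the function `embedding_health` is hypothetical, the population standard deviation is an arbitrary choice, and the guide does not say how `embedding_analyzer` defines these fields.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence


def embedding_health(vectors: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Compute norm statistics and mean cosine similarity to the centroid."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    dim = len(vectors[0])
    centroid = [mean(v[i] for v in vectors) for i in range(dim)]
    c_norm = math.sqrt(sum(x * x for x in centroid))
    # Skip zero-length vectors (and a zero centroid) to avoid division by zero.
    cosines = [
        sum(a * b for a, b in zip(v, centroid)) / (n * c_norm)
        for v, n in zip(vectors, norms)
        if n > 0 and c_norm > 0
    ]
    return {
        "avg_norm": mean(norms),
        "norm_std": pstdev(norms),
        "mean_cosine_to_centroid": mean(cosines) if cosines else float("nan"),
    }
```

Identical vectors give `norm_std` of 0 and a mean cosine of 1, which is the extreme form of the collapse pattern described above. Unit-normalized embeddings also have a near-zero `norm_std`, so that signal is best read together with the cosine value.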
244
+ ---
245
+
246
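+ ### Scenario 7) “The diagnosis itself is hard to trust for Korean” (verify morphemes/tokenizer)
247
+ - Trigger: for Korean questions/contexts, keywords or search results look unnatural, or analysis quality is suspected to be low
248
+ - Run (Intent): `verify_morpheme`
249
+ - Artifacts to check:
250
+   - `tokenizer_backend` (for example, kiwi) and the token/vocabulary size checks in `morpheme_quality_checker.json`
251
+ - How to read it:
252
+   - If the morpheme quality check fails: the reliability of keyword-, search-, and classification-based analysis results may drop along with it.
253
+ - Next action:
254
+   - Get the Korean extra and the tokenizer backend working first, then rerun the NLP/retrieval analyses.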
255
+
256
+ ---
257
+
258
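+ ### Scenario 8) “Performance regressed after a release” (narrow down by time point and changes)
259
+ - Trigger: the performance of recent runs is trending downward
260
+ - Run (Intent):
261
+   - 1) `analyze_trends` (find when the drop began)
262
+   - 2) `generate_comparison` (pick representative runs A/B, then produce the change list and comparison report)
263
+ - Run (CLI example; the Korean query means “trend analysis”):
264
+   - `uv run evalvault pipeline analyze "추세 분석" --db data/db/evalvault.db`
265
+   - `uv run evalvault analyze-compare <RUN_A> <RUN_B> --db data/db/evalvault.db --test t-test` (or `--test mann-whitney`)
266
+ - Artifacts to check:
267
+   - `trend_detector.json` (trend detection results)
268
+   - `run_change_detector.json` (dataset/config/prompt changes)
269
+   - `comparison_<A>_<B>.md` (comparison report)
270
+ - How to read it:
271
+   - If change detection shows that the dataset changed, the comparison can be distorted, so secure the **same-dataset condition** first.
272
+ - Next action:
273
+   - Realign the experiment design so that the change converges on a single axis (prompt/model/retrieval).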
274
+
275
+ ---
276
+
277
+ ### Scenario 9) “Stage bottleneck/missing traces” (diagnose the execution stages)
278
+ - Trigger: response latency/timeouts, missing stage metrics, no Phoenix/Langfuse link
279
+ - Run (CLI):
280
+   - `uv run evalvault stage compute-metrics <RUN_ID> --db data/db/evalvault.db`
281
+   - `uv run evalvault debug report <RUN_ID> --db data/db/evalvault.db`
282
+ - Artifacts to check:
283
+   - The stage summary/bottlenecks/recommendations/failing metrics in the debug report
284
+   - If a trace link (phoenix/langfuse) exists, follow the span flow for that run
285
+ - How to read it:
286
+   - If the bottleneck concentrates in one stage, improve that stage (retrieval/generation/post-processing) first.
287
+   - If there is no trace link, check the tracing configuration and environment variables first.
288
+ - Next action:
289
+   - Check `PHOENIX_ENABLED`, `PHOENIX_ENDPOINT`, and the Open RAG Trace instrumentation (adapters/decorators).
290
+
291
+ ---
292
+
293
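+ ### Scenario 10) “The A/B improvement is ambiguous” (strengthen the judgment with significance/noise)
294
+ - Trigger: the means differ, but the conclusion is shaky (few samples or high variance)
295
+ - Run (currently available flow):
296
+   - `uv run evalvault analyze-compare <RUN_A> <RUN_B> --db data/db/evalvault.db --test t-test` (or `--test mann-whitney`)
297
+   - Pipeline comparison report: `AnalysisIntent.GENERATE_COMPARISON` (used internally)
298
+ - Reinforcement (reliability diagnosis frame):
299
+   - Apply the noise decomposition, confidence interval, and sample size (N, K) recommendation concepts from `docs/guides/PRD_LENA.md` to decide whether more samples are needed.
300
+ - How to read it (operating rule):
301
+   - If the effect is small and the sample is small: expand **N (number of items) or K (number of repetitions)** before drawing a conclusion.
302
+ - Next action:
303
+   - Keep the same-dataset condition and plan to increase N/K in the direction that gives the most benefit per unit of evaluation cost.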
304
+
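The operating rule above ("small effect and small sample → collect more before concluding") can be approximated with a confidence interval on the mean difference. The sketch below is a deliberately simple stand-in, not what `analyze-compare` or `PRD_LENA.md` prescribes: it uses a normal-approximation 95% interval with unpooled variances, and the names `mean_diff_ci` and `needs_more_samples` are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def mean_diff_ci(a: Sequence[float], b: Sequence[float],
                 z: float = 1.96) -> Tuple[float, float, float]:
    """Return (diff, low, high) for mean(b) - mean(a) using a normal approximation."""
    diff = mean(b) - mean(a)
    # Standard error with unpooled sample variances (each run needs >= 2 scores).
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, diff - z * se, diff + z * se


def needs_more_samples(a: Sequence[float], b: Sequence[float]) -> bool:
    """True when the interval still contains 0, i.e. the direction is not settled."""
    _, low, high = mean_diff_ci(a, b)
    return low <= 0.0 <= high
```

For very small samples the normal approximation is optimistic, so a result of `False` here is weaker evidence than the t-test or Mann-Whitney output of the CLI; the sketch only illustrates why a wide interval argues for increasing N or K.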
305
+ ---
306
+
307
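+ ### Scenario 11) “Automatic metrics disagree with user satisfaction” (human-feedback calibration loop)
308
+ - Trigger: stakeholders do not trust RAGAS scores as a KPI
309
+ - Frame to apply:
310
+   - Connect the procedure in `docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md` (representative sampling → human evaluation → calibration model → apply to all cases → iterate) as an operating loop.
311
+ - How to read it (operations):
312
+   - Keep the automatic metrics as a “reproducible signal”, and manage alignment with satisfaction through the calibration loop.
313
+ - Next action:
314
+   - Select “disagreement cases (automatic metric high but satisfaction low, or the reverse)” as the first labeling targets.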
315
+
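Selecting disagreement cases for labeling can be sketched as a simple filter. Everything specific here is an assumption for illustration: the field names `auto_score` and `human_score`, the cut-offs 0.8 and 0.5, and the function `pick_disagreements` are hypothetical and are not defined by the calibration guide.

```python
from typing import Dict, List, Union

Case = Dict[str, Union[str, float]]


def pick_disagreements(cases: List[Case],
                       auto_cut: float = 0.8,
                       human_cut: float = 0.5) -> List[str]:
    """Return ids where the automatic metric and the human rating fall on opposite sides."""
    picked = []
    for case in cases:
        auto_high = case["auto_score"] >= auto_cut
        human_high = case["human_score"] >= human_cut
        if auto_high != human_high:  # high/low or low/high
            picked.append(case["id"])
    return picked
```

In practice the cut-offs would come from the dataset's thresholds and the rating scale actually used for human evaluation.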
316
+ ---
317
+
318
+ ## 6) CLI / Web UI run paths (cheat sheet)
319
+
320
+ ### 6.1 CLI (the fastest start)
321
+ - Evaluation + automatic analysis:
322
+   - `uv run evalvault run <DATASET> --metrics <M1,M2,...> --db data/db/evalvault.db --auto-analyze`
323
+ - Detailed analysis of a single run (combinable options):
324
+   - `uv run evalvault analyze <RUN_ID> --db data/db/evalvault.db --nlp --causal --playbook --enable-llm`
325
+   - (optional) `--dashboard`, `--anomaly-detect`, `--forecast`, `--network`, `--generate-hypothesis`
326
+ - A/B comparison:
327
+   - `uv run evalvault analyze-compare <RUN_A> <RUN_B> --db data/db/evalvault.db --test t-test` (or `--test mann-whitney`)
328
+ - Stage/debug diagnosis:
329
+   - `uv run evalvault stage compute-metrics <RUN_ID> --db data/db/evalvault.db`
330
+   - `uv run evalvault debug report <RUN_ID> --db data/db/evalvault.db`
331
+ - Query-based pipeline:
332
+   - `uv run evalvault pipeline analyze "<natural-language query>" --run-id <RUN_ID> --db data/db/evalvault.db`
333
+ - Pipeline introspection:
334
+   - `uv run evalvault pipeline intents`
335
+   - `uv run evalvault pipeline templates`
336
+
337
+ ### 6.2 Web UI (menu-driven operation)
338
+ - Start:
339
+   - API: `uv run evalvault serve-api --reload`
340
+   - Frontend: `cd frontend && npm install && npm run dev`
341
+ - Menu structure (per the migration plan):
342
+   - Basic statistics / Time series (anomaly, forecast) / Structure and cause (causal, network) / Intelligent (hypotheses, playbook) / Comparison
343
+
344
+ ---
345
+
346
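+ ## 7) Outputs/artifacts (what to look at and where)
347
+
348
+ ### 7.1 “Summary report” vs “artifact”
349
+ - Summary reports (`analysis_<RUN_ID>.md`, `comparison_<A>_<B>.md`): conclusions and summaries for decision-making
350
+ - Artifacts (`artifacts/.../<node_id>.json`): the **raw evidence** (needed for reproduction, debugging, and automation)
351
+
352
+ ### 7.2 Using the artifact index
353
+ - `reports/analysis/artifacts/analysis_<RUN_ID>/index.json` lists the result file path of each node in structured form.
354
+ - Operating principle: always follow the order “read the conclusion in the report → find the evidence node through the index → confirm it in the node JSON”.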
355
+
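When the schema of `index.json` is not at hand, the node files can still be enumerated directly from the run's artifact directory. The sketch below assumes only the directory layout quoted in this guide (`reports/analysis/artifacts/analysis_<RUN_ID>/<node_id>.json`); the helper `list_node_artifacts` is hypothetical and does not parse `index.json`, whose structure is not described here.

```python
from pathlib import Path
from typing import List


def list_node_artifacts(run_id: str, root: str = "reports/analysis/artifacts") -> List[str]:
    """Return the node ids (file stems) of the JSON artifacts for one run."""
    run_dir = Path(root) / f"analysis_{run_id}"
    if not run_dir.is_dir():
        return []  # no artifacts were written for this run
    return sorted(
        path.stem
        for path in run_dir.glob("*.json")
        if path.name != "index.json"  # the index itself is not a node result
    )
```

The returned ids can then be matched against the nodes named in the report, for example `diagnostic_playbook` or `root_cause_analyzer`.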
356
+ ---
357
+
358
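+ ## 8) Interpretation rules / cautions (avoiding misjudgment)
359
+
360
+ ### 8.1 Cautions for comparison analysis
361
+ - An A/B comparison is only safe to interpret when it is run under the **same-dataset condition**.
362
+ - If `run_change_detector` finds many dataset/config/prompt changes, reduce the number of changed axes before drawing a conclusion.
363
+
364
+ ### 8.2 Cautions for metric interpretation
365
+ - `thresholds` are part of the dataset; do not assume a single universal criterion such as “a score of 0.8 always passes”.
366
+ - Embedding-based metrics are sensitive to the state of the embedding backend/model, so confirm environment stability with `verify_embedding` first.
367
+
368
+ ### 8.3 Korean-specific cautions
369
+ - Low morpheme/tokenization quality can distort keyword- and search-based analysis.
370
+ - Do not over-trust NLP/retrieval results while `verify_morpheme` is failing.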
371
+
372
+ ---
373
+
374
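+ ## 9) Iterative improvement loop (operating standard)
375
+
376
+ ### 9.1 Loop (fixed procedure)
377
+ 1. **Secure a baseline run**: `evalvault run ... --db ... --auto-analyze`
378
+ 2. **Classify the problem**: choose an Intent with the decision tree (first-pass diagnosis)
379
+ 3. **Check the evidence**: artifact index → key node JSON
380
+ 4. **Choose one hypothesis/action**: change only one axis at a time
381
+ 5. **Rerun**: keep the same dataset and metrics
382
+ 6. **Compare (analyze-compare)**: check the direction of change, its significance, and what changed
383
+ 7. **Record/share**: keep the comparison report as the “decision record”
384
+
385
+ ### 9.2 The “one at a time” principle (experiment design)
386
+ - Changing several elements at once (prompt + model + retrieval) makes the cause impossible to trace.
387
+ - If the goal is root-cause analysis, minimize the changes; if the goal is improvement, changes are fine, but document them in the comparison report.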
388
+
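The “one at a time” principle can be enforced mechanically before a rerun is accepted as a valid comparison. This is a minimal sketch with hypothetical names (`changed_axes`, `is_single_axis_change`) and a hypothetical flat config dictionary; it is independent of `run_change_detector`, whose output format is not described in this guide.

```python
from typing import Any, Dict, List


def changed_axes(baseline: Dict[str, Any], candidate: Dict[str, Any]) -> List[str]:
    """List the config keys whose values differ between two runs."""
    keys = sorted(set(baseline) | set(candidate))
    return [key for key in keys if baseline.get(key) != candidate.get(key)]


def is_single_axis_change(baseline: Dict[str, Any], candidate: Dict[str, Any]) -> bool:
    """True only when exactly one axis was changed, as the principle requires."""
    return len(changed_axes(baseline, candidate)) == 1
```

A run pair that fails this check can still be compared, but the comparison report should then list every changed axis explicitly.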
389
+ ---
390
+
391
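+ ## 10) Quality checklist (diagnosis completion criteria)
392
+
393
+ ### 10.1 Completeness of the diagnosis
394
+ - [ ] Is the problem (symptom) clearly defined in terms of metric, interval, and scope?
395
+ - [ ] Is the chosen Intent/module directly connected to the problem (does an evidence node exist)?
396
+ - [ ] Can the report's conclusion be traced to artifacts (node JSON)?
397
+
398
+ ### 10.2 Reproducibility
399
+ - [ ] Are the DB path, `run_id`, `metrics`, and `profile` recorded?
400
+ - [ ] Are the output paths (`reports/...`) organized by run_id?
401
+ - [ ] Was the same-dataset condition confirmed for comparisons?
402
+
403
+ ### 10.3 Execution stability (environment)
404
+ - [ ] Are the `analysis/timeseries/dashboard` extras installed so that the required features can run?
405
+ - [ ] Has the embedding/Korean tokenizer environment been verified with `verify_embedding`/`verify_morpheme`?
406
+
407
+ ### 10.4 Action quality
408
+ - [ ] Is the next experiment defined as a change on a single axis?
409
+ - [ ] Are the success/failure criteria defined by thresholds and the comparison report?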
410
+
411
+ ---
412
+
413
+ ## Appendix A) Quick mapping (“what to use for which problem”)
414
+
415
+ | Intent | Typical question (asked by the operator) | Core modules (module_id) |
416
+ |---|---|---|
417
+ | `analyze_low_metrics` | “Why is the score low? What should I change right now?” | `ragas_evaluator`, `diagnostic_playbook`, `root_cause_analyzer`, `llm_report` |
418
+ | `verify_retrieval` | “Is retrieval the problem?” | `retrieval_analyzer`, `retrieval_quality_checker` |
419
+ | `verify_embedding` | “Are the embeddings healthy?” | `embedding_analyzer`, `embedding_distribution` |
420
+ | `verify_morpheme` | “Is Korean tokenization working correctly?” | `morpheme_analyzer`, `morpheme_quality_checker` |
421
+ | `generate_comparison` | “What changed between A and B, and what is significant?” | `run_metric_comparator`, `run_change_detector`, `llm_report` |
422
+ | `analyze_trends` | “When did it start getting worse?” | `time_series_analyzer`, `trend_detector` |
423
+ | `generate_hypotheses` | “Can hypotheses for the next experiment be generated automatically?” | `hypothesis_generator` |
424
+ | `analyze_network` | “How did the metric structure (coupling) change?” | `network_analyzer` |
425
+
426
+ > Stage/debug diagnosis is run with `evalvault stage` and `evalvault debug report`, without Intent classification.