evalvault 1.62.0__tar.gz → 1.63.0__tar.gz
This diff shows the changes between publicly released package versions as they appear in their respective public registries. It is provided for informational purposes only.
- {evalvault-1.62.0 → evalvault-1.63.0}/.gitignore +5 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/PKG-INFO +228 -4
- {evalvault-1.62.0 → evalvault-1.63.0}/README.en.md +10 -0
- evalvault-1.63.0/README.md +434 -0
- evalvault-1.63.0/config/ragas_prompts_override.yaml +4 -0
- evalvault-1.63.0/data/datasets/ragas_ko90_en10.json +222 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/INDEX.md +8 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/README.ko.md +6 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/guides/DEV_GUIDE.md +1 -0
- evalvault-1.63.0/docs/guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md +426 -0
- evalvault-1.63.0/docs/guides/EVALVAULT_RUN_EXCEL_SHEETS.md +186 -0
- evalvault-1.63.0/docs/guides/EVALVAULT_WORK_PLAN.md +110 -0
- evalvault-1.63.0/docs/guides/EXTERNAL_TRACE_API_SPEC.md +144 -0
- evalvault-1.63.0/docs/guides/LENA_MVP_IMPLEMENTATION_PLAN.md +763 -0
- evalvault-1.63.0/docs/guides/LENA_RAGAS_CALIBRATION_DEV_PLAN.md +428 -0
- evalvault-1.63.0/docs/guides/PRD_LENA.md +637 -0
- evalvault-1.63.0/docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md +171 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/guides/USER_GUIDE.md +20 -0
- evalvault-1.63.0/docs/guides/WEBUI_CLI_ROLLOUT_PLAN.md +185 -0
- evalvault-1.63.0/docs/guides/prompt_suggestions_design.md +269 -0
- evalvault-1.63.0/docs/templates/otel_openinference_trace_example.json +48 -0
- evalvault-1.63.0/docs/templates/ragas_dataset_example_ko90_en10.json +82 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/package-lock.json +14 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/package.json +1 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/AnalysisNodeOutputs.tsx +26 -20
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/PrioritySummaryPanel.tsx +4 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/SpacePlot2D.tsx +27 -21
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/SpacePlot3D.tsx +4 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/StatusBadge.tsx +1 -1
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/hooks/useInsightSpace.ts +6 -8
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/AnalysisCompareView.tsx +190 -4
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/AnalysisLab.tsx +64 -22
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/AnalysisResultView.tsx +3 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/CompareRuns.tsx +166 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/ComprehensiveAnalysis.tsx +24 -28
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/CustomerReport.tsx +198 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/Dashboard.tsx +6 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/EvaluationStudio.tsx +3 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/RunDetails.tsx +712 -43
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/Settings.tsx +109 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/Visualization.tsx +28 -9
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/services/api.ts +102 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/mkdocs.yml +2 -0
- evalvault-1.63.0/prompts/system_override.txt +1 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/pyproject.toml +2 -1
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/adapter.py +190 -19
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/routers/runs.py +66 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/method.py +5 -2
- evalvault-1.63.0/src/evalvault/adapters/inbound/cli/commands/prompts.py +765 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/run.py +43 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/run_helpers.py +10 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/mcp/tools.py +5 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/ragas_evaluator_module.py +13 -9
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/__init__.py +5 -43
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/anthropic_adapter.py +27 -7
- evalvault-1.63.0/src/evalvault/adapters/outbound/llm/factory.py +103 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/llm_relation_augmenter.py +39 -14
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/ollama_adapter.py +34 -10
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/openai_adapter.py +41 -8
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/token_aware_chat.py +21 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/vllm_adapter.py +39 -8
- evalvault-1.63.0/src/evalvault/adapters/outbound/nlp/korean/toolkit_factory.py +20 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/report/llm_report_generator.py +90 -6
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/storage/base_sql.py +527 -21
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/storage/postgres_adapter.py +209 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/storage/postgres_schema.sql +38 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/storage/sqlite_adapter.py +86 -5
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/debug_ragas.py +7 -1
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/debug_ragas_real.py +5 -1
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/__init__.py +10 -0
- evalvault-1.63.0/src/evalvault/domain/entities/prompt_suggestion.py +50 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/__init__.py +6 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/evaluator.py +191 -103
- evalvault-1.63.0/src/evalvault/domain/services/holdout_splitter.py +67 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/intent_classifier.py +73 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/pipeline_template_registry.py +3 -0
- evalvault-1.63.0/src/evalvault/domain/services/prompt_candidate_service.py +117 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/prompt_registry.py +40 -2
- evalvault-1.63.0/src/evalvault/domain/services/prompt_scoring_service.py +286 -0
- evalvault-1.63.0/src/evalvault/domain/services/prompt_suggestion_reporter.py +277 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/synthetic_qa_generator.py +4 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/inbound/learning_hook_port.py +4 -1
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/__init__.py +2 -0
- evalvault-1.63.0/src/evalvault/ports/outbound/llm_factory_port.py +13 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/llm_port.py +34 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/storage_port.py +38 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_e2e_scenarios.py +13 -6
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_evaluation_flow.py +3 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_full_workflow.py +3 -3
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_evaluator_comprehensive.py +5 -5
- evalvault-1.63.0/tests/unit/domain/services/test_holdout_splitter.py +53 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_analysis_pipeline.py +2 -2
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_evaluator.py +10 -9
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_markdown_report.py +31 -31
- evalvault-1.63.0/tests/unit/test_prompt_candidate_service.py +65 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_sqlite_storage.py +36 -1
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_web_adapter.py +2 -1
- {evalvault-1.62.0 → evalvault-1.63.0}/uv.lock +12 -1
- evalvault-1.62.0/README.md +0 -211
- evalvault-1.62.0/docs/guides/RAG_HUMAN_FEEDBACK_CALIBRATION.md +0 -298
- evalvault-1.62.0/reports/analysis/analysis_0aa9fab0-6c2c-4c1c-b228-202a38a2f00c.json +0 -2246
- evalvault-1.62.0/reports/analysis/analysis_0aa9fab0-6c2c-4c1c-b228-202a38a2f00c.md +0 -80
- evalvault-1.62.0/reports/analysis/analysis_2163f844-ee2c-4630-9ba8-35cd9954d92e.json +0 -111
- evalvault-1.62.0/reports/analysis/analysis_2163f844-ee2c-4630-9ba8-35cd9954d92e.md +0 -52
- evalvault-1.62.0/reports/analysis/analysis_4516d358-2797-4c46-9f14-c1d975588025.json +0 -5774
- evalvault-1.62.0/reports/analysis/analysis_4516d358-2797-4c46-9f14-c1d975588025.md +0 -129
- evalvault-1.62.0/reports/analysis/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb.json +0 -1657
- evalvault-1.62.0/reports/analysis/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb.md +0 -61
- evalvault-1.62.0/reports/analysis/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5.json +0 -4439
- evalvault-1.62.0/reports/analysis/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5.md +0 -110
- evalvault-1.62.0/reports/analysis/analysis_9fbf4776-9f5b-4c4b-ba08-c556032cee86.json +0 -2329
- evalvault-1.62.0/reports/analysis/analysis_9fbf4776-9f5b-4c4b-ba08-c556032cee86.md +0 -16
- evalvault-1.62.0/reports/analysis/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775.json +0 -1909
- evalvault-1.62.0/reports/analysis/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775.md +0 -66
- evalvault-1.62.0/reports/analysis/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e.json +0 -2209
- evalvault-1.62.0/reports/analysis/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e.md +0 -106
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/causal_analysis.json +0 -856
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/diagnostic.json +0 -21
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/final_output.json +0 -135
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/index.json +0 -108
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/load_data.json +0 -1231
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/load_runs.json +0 -1810
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/low_samples.json +0 -39
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/nlp_analysis.json +0 -514
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/pattern_detection.json +0 -42
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/priority_summary.json +0 -204
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/ragas_eval.json +0 -182
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/report.json +0 -137
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/root_cause.json +0 -18
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/statistics.json +0 -511
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/time_series.json +0 -52
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/trend_detection.json +0 -23
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/causal_analysis.json +0 -37
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/diagnostic.json +0 -21
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/final_output.json +0 -112
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/index.json +0 -108
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/load_data.json +0 -462
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/load_runs.json +0 -249
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/low_samples.json +0 -11
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/nlp_analysis.json +0 -143
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/pattern_detection.json +0 -22
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/priority_summary.json +0 -165
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/ragas_eval.json +0 -65
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/report.json +0 -114
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/root_cause.json +0 -18
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/statistics.json +0 -205
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/time_series.json +0 -22
- evalvault-1.62.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/trend_detection.json +0 -12
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/causal_analysis.json +0 -856
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/diagnostic.json +0 -21
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/final_output.json +0 -134
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/index.json +0 -108
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/load_data.json +0 -658
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/load_runs.json +0 -1275
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/low_samples.json +0 -39
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/nlp_analysis.json +0 -514
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/pattern_detection.json +0 -42
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/priority_summary.json +0 -199
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/ragas_eval.json +0 -182
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/report.json +0 -136
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/root_cause.json +0 -18
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/statistics.json +0 -291
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/time_series.json +0 -52
- evalvault-1.62.0/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/trend_detection.json +0 -23
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/causal_analysis.json +0 -37
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/diagnostic.json +0 -21
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/final_output.json +0 -112
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/index.json +0 -108
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/load_data.json +0 -462
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/load_runs.json +0 -484
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/low_samples.json +0 -11
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/nlp_analysis.json +0 -143
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/pattern_detection.json +0 -22
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/priority_summary.json +0 -165
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/ragas_eval.json +0 -65
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/report.json +0 -114
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/root_cause.json +0 -18
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/statistics.json +0 -205
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/time_series.json +0 -28
- evalvault-1.62.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/trend_detection.json +0 -23
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/causal_analysis.json +0 -37
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/diagnostic.json +0 -21
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/final_output.json +0 -91
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/index.json +0 -108
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/load_data.json +0 -203
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/load_runs.json +0 -743
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/low_samples.json +0 -30
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/nlp_analysis.json +0 -487
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/pattern_detection.json +0 -38
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/priority_summary.json +0 -149
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/ragas_eval.json +0 -56
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/report.json +0 -93
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/root_cause.json +0 -18
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/statistics.json +0 -169
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/time_series.json +0 -52
- evalvault-1.62.0/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/trend_detection.json +0 -23
- evalvault-1.62.0/src/evalvault/adapters/inbound/cli/commands/prompts.py +0 -157
- {evalvault-1.62.0 → evalvault-1.63.0}/.cursor/worktrees.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.dockerignore +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.env.example +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/ISSUE_TEMPLATE/question.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/dependabot.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/pull_request_template.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/stale.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/workflows/ci.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/workflows/release.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.github/workflows/stale.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.pre-commit-config.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/.python-version +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/AGENTS.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/CHANGELOG.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/CLAUDE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/CODE_OF_CONDUCT.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/CONTRIBUTING.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/Dockerfile +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/LICENSE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/SECURITY.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/agent.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/client.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/main.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/memory/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/memory/shared/decisions.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/memory/shared/dependencies.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/memory/templates/coordinator_guide.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/memory/templates/work_log_template.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/memory_integration.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/progress.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/app_spec.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/baseline.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/coding_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/existing_project_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/improvement/architecture_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/improvement/base_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/improvement/coordinator_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/improvement/observability_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/initializer_prompt.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/prompt_manifest.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts/system.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/prompts.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/requirements.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/agent/security.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/domains/insurance/memory.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/domains/insurance/terms_dictionary_en.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/domains/insurance/terms_dictionary_ko.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/methods.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/models.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/regressions/default.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/regressions/ux.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/stage_metric_playbook.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/config/stage_metric_thresholds.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/dummy_test_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/insurance_qa_korean.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/insurance_qa_korean.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/insurance_qa_korean_2.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/insurance_qa_korean_3.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/sample.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/visualization_20q_cluster_map.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/visualization_20q_korean.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/visualization_2q_cluster_map.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/datasets/visualization_2q_korean.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/kg/knowledge_graph.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/raw/The Complete Guide to Mastering Suno Advanced Strategies for Professional Music Generation.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/raw/edge_cases.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/raw/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/data/raw/sample_rag_knowledge.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/dataset_templates/dataset_template.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/dataset_templates/dataset_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/dataset_templates/dataset_template.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/dataset_templates/method_input_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docker-compose.langfuse.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docker-compose.phoenix.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docker-compose.yml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/ROADMAP.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/STATUS.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/api/adapters/inbound.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/api/adapters/outbound.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/api/config.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/api/domain/entities.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/api/domain/metrics.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/api/domain/services.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/api/ports/inbound.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/api/ports/outbound.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/architecture/open-rag-trace-collector.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/architecture/open-rag-trace-spec.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/getting-started/INSTALLATION.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/guides/AGENTS_SYSTEM_GUIDE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/guides/CLI_MCP_PLAN.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/guides/OPEN_RAG_TRACE_INTERNAL_ADAPTER.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/guides/OPEN_RAG_TRACE_SAMPLES.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/guides/RELEASE_CHECKLIST.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/guides/rag_human_feedback_calibration_implementation_plan.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/mapping/component-to-whitepaper.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/00_frontmatter.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/01_overview.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/02_architecture.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/03_data_flow.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/04_components.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/05_expert_lenses.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/06_implementation.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/07_advanced.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/08_customization.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/09_quality.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/10_performance.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/11_security.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/12_operations.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/13_standards.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/14_roadmap.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/INDEX.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/new_whitepaper/STYLE_GUIDE.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/stylesheets/extra.css +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/templates/dataset_template.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/templates/dataset_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/templates/dataset_template.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/templates/kg_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/templates/retriever_docs_template.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/tools/generate-whitepaper.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/docs/web_ui_analysis_migration_plan.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/dummy_test_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/korean_rag/faithfulness_test.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/korean_rag/insurance_qa_100.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/korean_rag/keyword_extraction_test.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/korean_rag/retrieval_test.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/output/comparison.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/output/full_results.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/output/leaderboard.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/output/results_mteb.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/output/retrieval_result.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/benchmarks/run_korean_benchmark.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/kg_generator_demo.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/method_plugin_template/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/method_plugin_template/pyproject.toml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/method_plugin_template/src/method_plugin_template/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/method_plugin_template/src/method_plugin_template/methods.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/usecase/comprehensive_workflow_test.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/usecase/insurance_eval_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/examples/usecase/output/comprehensive_report.html +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/.env.example +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/.gitignore +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/analysis-compare.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/analysis-lab.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/compare-runs.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/dashboard.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/domain-memory.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/evaluation-studio.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/knowledge-base.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/mocks/intents.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/mocks/run_details.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/mocks/runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/e2e/run-details.spec.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/eslint.config.js +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/index.html +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/playwright.config.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/public/vite.svg +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/App.css +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/App.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/assets/react.svg +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/InsightSpacePanel.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/Layout.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/MarkdownContent.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/SpaceLegend.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/components/VirtualizedText.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/config/ui.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/config.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/index.css +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/main.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/DomainMemory.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/KnowledgeBase.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/pages/VisualizationHome.tsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/types/plotly.d.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/utils/format.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/utils/phoenix.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/utils/runAnalytics.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/utils/score.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/src/utils/summaryMetrics.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/tailwind.config.js +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/tsconfig.app.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/tsconfig.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/tsconfig.node.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/frontend/vite.config.ts +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/package-lock.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/.gitkeep +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_change_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_metric_comparison.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_change_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_metric_comparison.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/final_output.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/index.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/load_runs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/report.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_change_detection.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_metric_comparison.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_0aa9fab0_f1287e90.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_0aa9fab0_f1287e90.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_8f825b22_4516d358.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_8f825b22_4516d358.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_9fbf4776_a491fa0e.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_9fbf4776_a491fa0e.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_f1287e90_8f825b22.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/comparison/comparison_f1287e90_8f825b22.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/debug_report_r1_smoke.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/debug_report_r2_graphrag.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/debug_report_r2_graphrag_openai.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/debug_report_r3_bm25.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/debug_report_r3_bm25_langfuse3.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/debug_report_r3_dense_faiss.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/improvement_1d91a667-4288-4742-be3a-a8f5310c5140.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r2_graphrag_openai_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r2_graphrag_openai_stage_report.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r2_graphrag_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r2_graphrag_stage_report.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r3_bm25_langfuse2_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r3_bm25_langfuse3_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r3_bm25_langfuse_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r3_bm25_phoenix_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r3_bm25_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r3_bm25_stage_report.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r3_dense_faiss_stage_events.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/r3_dense_faiss_stage_report.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/retrieval_benchmark_smoke_precision.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/retrieval_benchmark_smoke_precision_graphrag.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/reports/retrieval_benchmark_smoke_precision_multi.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/benchmark/download_kmmlu.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/dev/open_rag_trace_demo.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/dev/open_rag_trace_integration_template.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/dev/otel-collector-config.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/dev/start_web_ui_with_phoenix.sh +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/dev/validate_open_rag_trace.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/dev_seed_pipeline_results.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/analyzer/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/analyzer/ast_scanner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/analyzer/confidence_scorer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/analyzer/graph_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/analyzer/side_effect_detector.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/generate_api_docs.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/models/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/models/schema.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/renderer/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/docs/renderer/html_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/ops/phoenix_watch.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/perf/backfill_langfuse_trace_url.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/perf/r3_dense_smoke.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/perf/r3_evalvault_run_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/perf/r3_retriever_docs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/perf/r3_smoke_real.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/perf/r3_stage_events_sample.jsonl +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/pipeline_template_inspect.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/reports/generate_release_notes.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/run_with_timeout.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/test_full_evaluation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/tests/run_regressions.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/tests/run_retriever_stage_report_smoke.sh +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/validate_tutorials.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/verify_ragas_compliance.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/scripts/verify_workflows.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/main.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/routers/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/routers/benchmark.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/routers/config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/routers/domain.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/routers/knowledge.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/api/routers/pipeline.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/app.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/agent.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/analyze.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/api.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/benchmark.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/calibrate.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/debug.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/domain.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/experiment.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/gate.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/generate.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/history.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/init.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/kg.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/langfuse.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/phoenix.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/pipeline.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/commands/stage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/analysis_io.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/console.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/errors.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/formatters.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/options.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/presets.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/progress.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/cli/utils/validators.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/mcp/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/inbound/mcp/schemas.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/analysis_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/base_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/bm25_searcher_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/causal_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/common.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/comparison_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/data_loader_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/detailed_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/embedding_distribution_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/embedding_searcher_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/hybrid_rrf_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/hybrid_weighted_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/llm_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/low_performer_extractor_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/model_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/network_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/nlp_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/pattern_detector_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/pipeline_factory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/pipeline_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/priority_summary_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/retrieval_benchmark_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/run_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/run_change_detector_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/run_comparator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/run_loader_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/search_comparator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/statistical_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/statistical_comparator_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/summary_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/trend_detector_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/analysis/verification_report_module.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/benchmark/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/benchmark/lm_eval_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/cache/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/cache/hybrid_cache.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/cache/memory_cache.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/base.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/csv_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/excel_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/json_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/loader_factory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/method_input_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/streaming_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/templates.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/dataset/thresholds.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/debug/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/debug/report_renderer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/documents/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/documents/ocr/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/documents/ocr/paddleocr_backend.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/documents/pdf_extractor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/documents/versioned_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/domain_memory/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/domain_memory/domain_memory_schema.sql +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/domain_memory/sqlite_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/improvement/insight_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/improvement/pattern_detector.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/improvement/playbook_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/improvement/stage_metric_playbook_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/kg/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/kg/graph_rag_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/kg/networkx_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/kg/parallel_kg_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/kg/query_strategies.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/azure_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/base.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/llm/instructor_factory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/methods/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/methods/baseline_oracle.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/methods/external_command.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/methods/registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/bm25_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/dense_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/document_chunker.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/hybrid_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/kiwi_tokenizer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/korean_evaluation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/korean_stopwords.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/nlp/korean/toolkit.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/phoenix/sync_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/report/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/report/dashboard_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/report/markdown_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/storage/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/storage/benchmark_storage_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/storage/schema.sql +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracer/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracer/open_rag_log_handler.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_decorators.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracer/phoenix_tracer_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracker/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracker/langfuse_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracker/mlflow_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/adapters/outbound/tracker/phoenix_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/agent_types.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/domain_config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/instrumentation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/langfuse_support.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/model_config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/phoenix_support.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/playbooks/improvement_playbook.yaml +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/config/settings.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/analysis.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/analysis_pipeline.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/benchmark.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/benchmark_run.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/dataset.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/debug.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/experiment.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/feedback.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/improvement.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/kg.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/memory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/method.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/prompt.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/rag_trace.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/result.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/entities/stage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/analysis_registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/confidence.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/contextual_relevancy.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/entity_preservation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/insurance.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/no_answer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/retrieval_rank.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/terms_dictionary.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/metrics/text_match.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/analysis_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/async_batch_executor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/batch_executor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/benchmark_report_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/benchmark_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/benchmark_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/cache_metrics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/cluster_map_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/dataset_preprocessor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/debug_report_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/document_chunker.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/document_versioning.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/domain_learning_hook.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/embedding_overlay.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/entity_extractor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/experiment_comparator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/experiment_manager.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/experiment_reporter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/experiment_repository.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/experiment_statistics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/improvement_guide_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/kg_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/memory_aware_evaluator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/memory_based_analysis.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/method_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/pipeline_orchestrator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/prompt_manifest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/prompt_status.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/ragas_prompt_overrides.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/retrieval_metrics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/retriever_context.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/satisfaction_calibration_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/stage_event_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/stage_metric_guide_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/stage_metric_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/stage_summary_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/testset_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/threshold_profiles.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/unified_report_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/domain/services/visual_space_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/mkdocs_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/inbound/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/inbound/analysis_pipeline_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/inbound/evaluator_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/inbound/web_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/analysis_cache_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/analysis_module_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/analysis_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/benchmark_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/causal_analysis_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/dataset_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/domain_memory_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/embedding_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/improvement_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/intent_classifier_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/korean_nlp_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/method_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/nlp_analysis_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/relation_augmenter_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/report_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/stage_storage_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/tracer_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/ports/outbound/tracker_port.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/reports/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/reports/release_notes.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/scripts/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/src/evalvault/scripts/regression_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/conftest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/README.md +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/benchmark/retrieval_ground_truth_min.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/benchmark/retrieval_ground_truth_multi.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/auto_insurance_qa_korean_full.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/comprehensive_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/edge_cases.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/edge_cases.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/evaluation_test_sample.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/graphrag_retriever_docs.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/graphrag_smoke.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/insurance_document.txt +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/insurance_qa_english.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/insurance_qa_english.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/insurance_qa_english.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/insurance_qa_korean.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/insurance_qa_korean.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/insurance_qa_korean.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/insurance_qa_korean_versioned_pdf.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/run_mode_simple.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/e2e/summary_eval_minimal.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/kg/minimal_graph.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/sample_dataset.csv +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/sample_dataset.json +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/fixtures/sample_dataset.xlsx +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/benchmark/test_benchmark_service_integration.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/conftest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_cli_integration.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_data_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_langfuse_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_phoenix_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_pipeline_api_contracts.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_storage_flow.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/integration/test_summary_eval_fixture.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/optional_deps.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/inbound/mcp/test_execute_tools.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/inbound/mcp/test_read_tools.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/documents/test_pdf_extractor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/documents/test_versioned_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/improvement/test_insight_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/improvement/test_pattern_detector.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/improvement/test_playbook_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/improvement/test_stage_metric_playbook_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/kg/test_graph_rag_retriever.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/kg/test_parallel_kg_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/adapters/outbound/storage/test_benchmark_storage_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/config/test_phoenix_support.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/conftest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/metrics/test_analysis_metric_registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/metrics/test_confidence.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/metrics/test_contextual_relevancy.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/metrics/test_entity_preservation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/metrics/test_metric_registry.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/metrics/test_no_answer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/metrics/test_retrieval_rank.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/metrics/test_text_match.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_cache_metrics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_claim_level.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_dataset_preprocessor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_document_versioning.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_improvement_guide_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_retrieval_metrics.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_retriever_context.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_stage_event_builder.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_stage_metric_guide_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/services/test_synthetic_qa_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/test_embedding_overlay.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/test_prompt_manifest.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/domain/test_prompt_status.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/reports/test_release_notes.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/scripts/test_regression_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_agent_types.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_analysis_entities.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_analysis_modules.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_analysis_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_anthropic_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_async_batch_executor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_azure_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_benchmark_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_benchmark_runner.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_causal_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_cli.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_cli_domain.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_cli_init.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_cli_progress.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_cli_utils.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_data_loaders.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_domain_config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_domain_memory.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_entities.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_entities_kg.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_entity_extractor.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_experiment.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_hybrid_cache.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_instrumentation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_insurance_metric.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_intent_classifier.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_kg_generator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_kg_networkx.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_kiwi_tokenizer.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_kiwi_warning_suppression.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_korean_dense.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_korean_evaluation.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_korean_retrieval.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_langfuse_tracker.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_llm_relation_augmenter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_lm_eval_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_memory_cache.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_memory_services.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_method_plugins.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_mlflow_tracker.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_model_config.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_nlp_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_nlp_entities.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_ollama_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_openai_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_phoenix_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_pipeline_orchestrator.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_ports.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_postgres_storage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_rag_trace_entities.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_run_memory_helpers.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_run_mode_fixtures.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_settings.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_stage_cli.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_stage_metric_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_stage_storage.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_stage_summary_service.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_statistical_adapter.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_streaming_loader.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_summary_eval_fixture.py +0 -0
- {evalvault-1.62.0 → evalvault-1.63.0}/tests/unit/test_testset_generator.py +0 -0
|
@@ -54,6 +54,8 @@ coverage.xml
|
|
|
54
54
|
reports/*.html
|
|
55
55
|
reports/*.xml
|
|
56
56
|
reports/*.json
|
|
57
|
+
reports/analysis/
|
|
58
|
+
reports/analysis/**
|
|
57
59
|
reports/assets/
|
|
58
60
|
reports/api-docs/
|
|
59
61
|
!reports/.gitkeep
|
|
@@ -66,6 +68,8 @@ data/e2e_results/
|
|
|
66
68
|
data/tokenizers/
|
|
67
69
|
|
|
68
70
|
# Local state data (should not be versioned)
|
|
71
|
+
data/cache/
|
|
72
|
+
data/db/
|
|
69
73
|
data/evaluations.db
|
|
70
74
|
*.db
|
|
71
75
|
!tests/**/*.db
|
|
@@ -148,3 +152,4 @@ dmypy.json
|
|
|
148
152
|
.AppleDouble
|
|
149
153
|
.LSOverride
|
|
150
154
|
scratch/
|
|
155
|
+
.sisyphus/
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: evalvault
|
|
3
|
-
Version: 1.
|
|
3
|
+
Version: 1.63.0
|
|
4
4
|
Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
|
|
5
5
|
Project-URL: Homepage, https://github.com/ntts9990/EvalVault
|
|
6
6
|
Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
|
|
@@ -37,6 +37,7 @@ Requires-Dist: openpyxl
|
|
|
37
37
|
Requires-Dist: pandas
|
|
38
38
|
Requires-Dist: pydantic
|
|
39
39
|
Requires-Dist: pydantic-settings
|
|
40
|
+
Requires-Dist: pypdf>=4.3.0
|
|
40
41
|
Requires-Dist: python-multipart
|
|
41
42
|
Requires-Dist: ragas==0.4.2
|
|
42
43
|
Requires-Dist: rich
|
|
@@ -174,11 +175,229 @@ uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
|
|
|
174
175
|
--auto-analyze
|
|
175
176
|
```
|
|
176
177
|
|
|
177
|
-
- The results
|
|
178
|
+
- Results are stored in the default DB (`data/db/evalvault.db`) and reused by `history`, the Web UI, and comparison analysis.
|
|
179
|
+
- Even if `--db` is omitted, results are saved to the default path, and all data is automatically exported to Excel.
|
|
178
180
|
- `--auto-analyze` generates a summary report together with per-module artifacts.
|
|
179
181
|
|
|
180
182
|
---
|
|
181
183
|
|
|
184
|
+
## Prompt Overrides (RAGAS / System)
|
|
185
|
+
|
|
186
|
+
**Editor's view**: Keep the default behavior and override only the items you need via YAML or files.
|
|
187
|
+
**Developer's view**: Inject overrides through CLI options or Web API fields.
|
|
188
|
+
|
|
189
|
+
### CLI
|
|
190
|
+
- Per-metric RAGAS overrides: `--ragas-prompts`
|
|
191
|
+
- Apply a system prompt: `--system-prompt` or `--system-prompt-file`
|
|
192
|
+
|
|
193
|
+
```bash
|
|
194
|
+
uv run evalvault run --mode full tests/fixtures/e2e/insurance_qa_korean.json \
|
|
195
|
+
--metrics faithfulness,answer_relevancy \
|
|
196
|
+
--ragas-prompts config/ragas_prompts.yaml
|
|
197
|
+
|
|
198
|
+
uv run evalvault run --mode full tests/fixtures/e2e/insurance_qa_korean.json \
|
|
199
|
+
--system-prompt-file prompts/system.txt
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
Example `config/ragas_prompts.yaml`:
|
|
203
|
+
```yaml
|
|
204
|
+
faithfulness: |
|
|
205
|
+
# custom prompt...
|
|
206
|
+
answer_relevancy: |
|
|
207
|
+
# custom prompt...
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
### Web UI / API
|
|
211
|
+
- `EvalRequest` fields:
|
|
212
|
+
- `system_prompt`, `system_prompt_name`
|
|
213
|
+
- `ragas_prompt_overrides` (metric name → prompt string)
|
|
214
|
+
- `prompt_set_name`, `prompt_set_description`
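
As a rough illustration of how the fields listed above might be assembled on the client side, here is a minimal Python sketch. It only builds a dictionary with the prompt-related field names from this README; the helper name `build_prompt_fields`, the example values, and the idea of serializing the result as a JSON body are assumptions, and the endpoint that accepts `EvalRequest` is not shown in this diff.

```python
import json


def build_prompt_fields(system_prompt, ragas_prompt_overrides,
                        prompt_set_name=None, prompt_set_description=None):
    """Collect only the prompt-related EvalRequest fields (hypothetical helper)."""
    fields = {
        "system_prompt": system_prompt,
        # Mapping of metric name -> prompt string, as described in the list above.
        "ragas_prompt_overrides": dict(ragas_prompt_overrides),
    }
    # Optional fields are added only when the caller provides them.
    if prompt_set_name is not None:
        fields["prompt_set_name"] = prompt_set_name
    if prompt_set_description is not None:
        fields["prompt_set_description"] = prompt_set_description
    return fields


# Illustrative values only; none of these come from the project.
body = build_prompt_fields(
    system_prompt="Answer only from the provided context.",
    ragas_prompt_overrides={"faithfulness": "# custom prompt..."},
    prompt_set_name="insurance-v1",
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the overrides in a plain mapping mirrors the YAML layout used by `--ragas-prompts`, so the same content could feed either entry point.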
|
|
215
|
+
|
|
216
|
+
---
|
|
217
|
+
|
|
218
|
+
## Prompt Candidate Suggestions (`evalvault prompts suggest`)
|
|
219
|
+
|
|
220
|
+
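Starting from the prompt snapshot of a given `run_id`, this command collects **automatic and manual candidate prompts**, evaluates Ragas metrics on a **held-out split of the data**, and recommends the top candidates by **weighted sum score**.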
|
|
221
|
+
|
|
222
|
+
- Prerequisite: the `run_id` must be stored in the `--db` database, and the run must have a **prompt snapshot** attached (use `--db` when running `evalvault run`).
|
|
223
|
+
- Automatic candidates: template-based variants are generated from the base prompt (`--candidates`, `--auto/--no-auto`).
|
|
224
|
+
- Manual candidates: add candidates with `--prompt` (repeatable) or `--prompt-file` (repeatable). Manual candidates are required when `--no-auto` is used.
|
|
225
|
+
- Holdout split: `--holdout-ratio` (default 0.2) splits the data into dev and holdout sets, and the **ranking is computed from holdout scores**. Set `--seed` if you need reproducible results.
|
|
226
|
+
- Weights: pass them as `--weights faithfulness=0.7,answer_relevancy=0.3`; they are normalized internally to sum to 1. If omitted, all metrics are weighted equally.
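
The weight handling described in the last bullet can be sketched in a few lines of Python. This is an illustration of the stated rule (normalize to sum 1, equal weights when nothing is given), not EvalVault's actual implementation; the function names `parse_weights` and `weighted_score` and the sample scores are assumptions.

```python
def parse_weights(spec, metrics):
    """Parse 'name=value,...' and normalize the weights so they sum to 1."""
    if not spec:
        # No weights supplied: every metric gets the same weight.
        return {metric: 1.0 / len(metrics) for metric in metrics}
    raw = {}
    for item in spec.split(","):
        name, _, value = item.partition("=")
        raw[name.strip()] = float(value)
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: value / total for name, value in raw.items()}


def weighted_score(scores, weights):
    """Combine per-metric scores into a single weighted sum."""
    return sum(weights[name] * scores[name] for name in weights)


metrics = ["faithfulness", "answer_relevancy"]
# 7 and 3 are deliberately unnormalized; they become 0.7 and 0.3.
weights = parse_weights("faithfulness=7,answer_relevancy=3", metrics)
score = weighted_score({"faithfulness": 0.9, "answer_relevancy": 0.6}, weights)
print(weights, round(score, 4))
```

Normalizing first means users can pass any positive numbers and still get a score on the same scale as the underlying metrics.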
|
|
227
|
+
|
|
228
|
+
### Usage examples
|
|
229
|
+
|
|
230
|
+
```bash
|
|
231
|
+
# Basic usage (automatic candidates + manual candidate file)
|
|
232
|
+
uv run evalvault prompts suggest <RUN_ID> --db data/db/evalvault.db \
|
|
233
|
+
--role system \
|
|
234
|
+
--metrics faithfulness,answer_relevancy \
|
|
235
|
+
--weights faithfulness=0.7,answer_relevancy=0.3 \
|
|
236
|
+
--candidates 5 \
|
|
237
|
+
--prompt-file prompts/candidates.txt
|
|
238
|
+
|
|
239
|
+
# Summary evaluation (multiple metrics) + weights
|
|
240
|
+
uv run evalvault prompts suggest <RUN_ID> --db data/db/evalvault.db \
|
|
241
|
+
--metrics summary_score,summary_faithfulness,entity_preservation \
|
|
242
|
+
--weights summary_score=0.5,summary_faithfulness=0.3,entity_preservation=0.2 \
|
|
243
|
+
--candidates 3
|
|
244
|
+
|
|
245
|
+
# Generate 2 samples per candidate and select one by index
|
|
246
|
+
uv run evalvault prompts suggest <RUN_ID> --db data/db/evalvault.db \
|
|
247
|
+
--generation-n 2 \
|
|
248
|
+
--selection-policy index \
|
|
249
|
+
--selection-index 1
|
|
250
|
+
```
|
|
251
|
+
|
|
252
|
+
- `--prompt-file` reads **one candidate prompt per line** (blank lines are skipped).
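
To make the file format concrete, the following Python sketch reads a candidate file the way the bullet above describes: one prompt per line, blank lines ignored. The function name `read_prompt_candidates` is hypothetical, and stripping surrounding whitespace is an assumption rather than documented behavior.

```python
import tempfile
from pathlib import Path


def read_prompt_candidates(path):
    """Return one candidate prompt per non-blank line (hypothetical helper)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Assumption: surrounding whitespace is not significant, so it is stripped.
    return [line.strip() for line in lines if line.strip()]


# Demonstrate with a throwaway file containing two candidates and a blank line.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "candidates.txt"
    sample.write_text(
        "Answer concisely.\n\nCite the policy clause.\n", encoding="utf-8"
    )
    candidates = read_prompt_candidates(sample)

print(candidates)
```

One consequence of this format is that a multi-line prompt cannot be expressed in a single candidate file entry; `--prompt` may be the better fit for such cases.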
|
|
253
|
+
|
|
254
|
+
### Key options
|
|
255
|
+
- `--role`: role of the prompt to improve (default: system)
|
|
256
|
+
- `--metrics`: metrics to evaluate (default: the metrics used in the run)
|
|
257
|
+
- `--weights`: metric weights (normalized to sum to 1)
|
|
258
|
+
- `--candidates`: number of automatic candidates (default: 5)
|
|
259
|
+
- `--auto/--no-auto`: turn automatic candidate generation on or off
|
|
260
|
+
- `--holdout-ratio`: dev/holdout split ratio (default: 0.2)
|
|
261
|
+
- `--seed`: seed for reproducible splitting and sampling
|
|
262
|
+
- `--generation-n`: number of samples per candidate
|
|
263
|
+
- `--selection-policy`: sample selection policy (`best`|`index`)
|
|
264
|
+
- `--selection-index`: sample index to select when `selection-policy=index`
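
The interplay of `--holdout-ratio` and `--seed` can be pictured with a small seeded split. This is a sketch of the general technique under stated assumptions, not the project's code: the function name `split_dev_holdout`, the shuffle-then-slice strategy, and the rounding rule are all illustrative choices.

```python
import random


def split_dev_holdout(items, holdout_ratio=0.2, seed=None):
    """Shuffle with a seeded RNG, then slice off the holdout portion."""
    rng = random.Random(seed)  # a private RNG keeps the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Assumption: keep at least one holdout item when the input is non-empty.
    holdout_size = max(1, round(len(shuffled) * holdout_ratio)) if shuffled else 0
    return shuffled[holdout_size:], shuffled[:holdout_size]


cases = [f"tc-{i}" for i in range(10)]
dev, holdout = split_dev_holdout(cases, holdout_ratio=0.2, seed=42)
dev_again, holdout_again = split_dev_holdout(cases, holdout_ratio=0.2, seed=42)
print(len(dev), len(holdout), holdout == holdout_again)
```

Ranking on the holdout portion only, as the README describes, keeps the reported scores separate from whatever data was used while generating or tuning candidates.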
|
|
265
|
+
|
|
266
|
+
### Output (default paths)
|
|
267
|
+
|
|
268
|
+
- Summary JSON: `reports/analysis/prompt_suggestions_<RUN_ID>.json`
|
|
269
|
+
- Report (Markdown): `reports/analysis/prompt_suggestions_<RUN_ID>.md`
|
|
270
|
+
- Artifact directory: `reports/analysis/artifacts/prompt_suggestions_<RUN_ID>/`
|
|
271
|
+
- Candidate list: `reports/analysis/artifacts/prompt_suggestions_<RUN_ID>/candidates.json`
|
|
272
|
+
- Candidate and sample scores: `reports/analysis/artifacts/prompt_suggestions_<RUN_ID>/scores.json`
|
|
273
|
+
- Final ranking: `reports/analysis/artifacts/prompt_suggestions_<RUN_ID>/ranking.json`
|
|
274
|
+
- Index: `reports/analysis/artifacts/prompt_suggestions_<RUN_ID>/index.json`
|
|
275
|
+
|
|
276
|
+
To change these paths, use `--analysis-dir` or `--output`/`--report`. For the design background, see `docs/guides/prompt_suggestions_design.md`.
|
|
277
|
+
|
|
278
|
+
### FAQ
|
|
279
|
+
- Q. I get the error "프롬프트 스냅샷이 없습니다" ("no prompt snapshot found").
|
|
280
|
+
- A. Check that the run was saved with `--db`, and pass `--db` when running `evalvault run`.
|
|
281
|
+
- Q. What happens if I turn off automatic candidates?
|
|
282
|
+
- A. With `--no-auto`, you must supply manual candidates via `--prompt` or `--prompt-file`.
|
|
283
|
+
- Q. How is the score computed?
|
|
284
|
+
- A. Ragas metrics are evaluated on the holdout data and combined into a weighted sum using `--weights`.
|
|
285
|
+
|
|
286
|
+
---
|
|
287
|
+
|
|
288
|
+
## Excel Export (Automatic)
|
|
289
|
+
|
|
290
|
+
**Editor's view**: An Excel file is generated automatically whenever a run is saved to the DB.
|
|
291
|
+
**Developer's view**: The save logic calls `export_run_to_excel` automatically.
|
|
292
|
+
|
|
293
|
+
- Default DB path: `data/db/evalvault.db`
|
|
294
|
+
- Excel path: `data/db/evalvault_run_<RUN_ID>.xlsx`
|
|
295
|
+
|
|
296
|
+
**Sheet layout (summary → detail)**
|
|
297
|
+
- `Summary`, `Run`, `TestCases`, `MetricScores`, `MetricsSummary`
|
|
298
|
+
- `RunPromptSets`, `PromptSets`, `PromptSetItems`, `Prompts`
|
|
299
|
+
- `Feedback`, `ClusterMaps`, `StageEvents`, `StageMetrics`
|
|
300
|
+
- `AnalysisReports`, `PipelineResults`
|
|
301
|
+
- Per-sheet column descriptions: `docs/guides/EVALVAULT_RUN_EXCEL_SHEETS.md`
|
|
302
|
+
|
|
303
|
+
---
|
|
304
|
+
|
|
305
|
+
## External System Log Integration (intent analysis, retrieval, reranking, etc.)
|
|
306
|
+
|
|
307
|
+
**Editor's view**: External systems should be able to connect through a standard format (OTel/JSON/JSONL).
|
|
308
|
+
**Developer's view**: Connect via OpenTelemetry + OpenInference or Stage Events.
|
|
309
|
+
|
|
310
|
+
### 1) Open RAG Trace (recommended)
|
|
311
|
+
- Standard schema based on OpenTelemetry + OpenInference
|
|
312
|
+
- Spec: `docs/architecture/open-rag-trace-spec.md`
|
|
313
|
+
- Integration spec: `docs/guides/EXTERNAL_TRACE_API_SPEC.md`
|
|
314
|
+
- Samples: `docs/guides/OPEN_RAG_TRACE_SAMPLES.md`
|
|
315
|
+
|
|
316
|
+
**OTLP HTTP transport (recommended)**
|
|
317
|
+
- Endpoint: `http://<host>:6006/v1/traces`
|
|
318
|
+
- With a Collector: `http://<collector-host>:4318/v1/traces`
|
|
319
|
+
|
|
320
|
+
**Required OpenInference keys (summary)**
|
|
321
|
+
- `rag.module`, `spec.version`
|
|
322
|
+
- Recommended: `input.value`, `output.value`, `llm.model_name`, `retrieval.documents_json`
|
|
323
|
+
|
|
324
|
+
### 2) Direct Ingest into EvalVault (Draft)
|
|
325
|
+
- `POST /api/v1/ingest/otel-traces` (OTLP JSON)
|
|
326
|
+
- `POST /api/v1/ingest/stage-events` (JSONL)
|
|
327
|
+
- Example: `docs/templates/otel_openinference_trace_example.json`
|
|
328
|
+
|
|
329
|
+
**OTLP JSON example (abridged)**
|
|
330
|
+
```json
|
|
331
|
+
{
|
|
332
|
+
"resourceSpans": [
|
|
333
|
+
{
|
|
334
|
+
"resource": {
|
|
335
|
+
"attributes": [
|
|
336
|
+
{ "key": "service.name", "value": { "stringValue": "rag-service" } }
|
|
337
|
+
]
|
|
338
|
+
},
|
|
339
|
+
"scopeSpans": [
|
|
340
|
+
{
|
|
341
|
+
"spans": [
|
|
342
|
+
{
|
|
343
|
+
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
|
|
344
|
+
"spanId": "00f067aa0ba902b7",
|
|
345
|
+
"name": "retrieve",
|
|
346
|
+
"startTimeUnixNano": 1730000000000000000,
|
|
347
|
+
"endTimeUnixNano": 1730000000500000000,
|
|
348
|
+
"attributes": [
|
|
349
|
+
{ "key": "rag.module", "value": { "stringValue": "retrieve" } },
|
|
350
|
+
{ "key": "spec.version", "value": { "stringValue": "0.1" } },
|
|
351
|
+
{ "key": "input.value", "value": { "stringValue": "보험금 지급 조건" } }
|
|
352
|
+
]
|
|
353
|
+
}
|
|
354
|
+
]
|
|
355
|
+
}
|
|
356
|
+
]
|
|
357
|
+
}
|
|
358
|
+
]
|
|
359
|
+
}
|
|
360
|
+
```
|
|
361
|
+
|
|
362
|
+
**Response example (abridged)**
|
|
363
|
+
```json
|
|
364
|
+
{
|
|
365
|
+
"status": "ok",
|
|
366
|
+
"ingested": 12,
|
|
367
|
+
"trace_ids": ["4bf92f3577b34da6a3ce929d0e0e4736"]
|
|
368
|
+
}
|
|
369
|
+
```
|
|
370
|
+
|
|
371
|
+
**HTTP status codes (summary)**
|
|
372
|
+
- `200 OK`: ingested successfully
|
|
373
|
+
- `400 Bad Request`: failed to parse JSON/JSONL
|
|
374
|
+
- `422 Unprocessable Entity`: missing required fields or schema mismatch
|
|
375
|
+
- `500 Internal Server Error`: internal storage or pipeline error
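
Before sending a payload to the draft ingest endpoint, a client could check the required OpenInference keys locally. The Python sketch below builds an OTLP JSON payload shaped like the example above and reports spans that lack `rag.module` or `spec.version`. The helper names (`make_span`, `wrap_spans`, `missing_required_keys`), the second span ID, and the `rerank` module value are hypothetical; how the server itself validates requests is not shown in this diff.

```python
import json

REQUIRED_KEYS = ("rag.module", "spec.version")


def make_span(trace_id, span_id, name, start_ns, end_ns, attributes):
    """Build one OTLP JSON span from a plain dict of string attributes."""
    return {
        "traceId": trace_id,
        "spanId": span_id,
        "name": name,
        "startTimeUnixNano": start_ns,
        "endTimeUnixNano": end_ns,
        "attributes": [
            {"key": key, "value": {"stringValue": value}}
            for key, value in attributes.items()
        ],
    }


def wrap_spans(service_name, spans):
    """Wrap spans in the resourceSpans/scopeSpans envelope used above."""
    resource = {
        "attributes": [
            {"key": "service.name", "value": {"stringValue": service_name}}
        ]
    }
    return {"resourceSpans": [{"resource": resource,
                               "scopeSpans": [{"spans": list(spans)}]}]}


def missing_required_keys(payload):
    """Return (spanId, missing keys) for every span lacking a required key."""
    problems = []
    for resource_span in payload.get("resourceSpans", []):
        for scope_span in resource_span.get("scopeSpans", []):
            for span in scope_span.get("spans", []):
                present = {attr["key"] for attr in span.get("attributes", [])}
                missing = [key for key in REQUIRED_KEYS if key not in present]
                if missing:
                    problems.append((span.get("spanId"), missing))
    return problems


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
start, end = 1730000000000000000, 1730000000500000000
good = wrap_spans("rag-service", [make_span(
    trace_id, "00f067aa0ba902b7", "retrieve", start, end,
    {"rag.module": "retrieve", "spec.version": "0.1",
     "input.value": "보험금 지급 조건"})])
# Hypothetical second span that omits spec.version on purpose.
bad = wrap_spans("rag-service", [make_span(
    trace_id, "00f067aa0ba902b8", "rerank", start, end,
    {"rag.module": "rerank"})])

print(missing_required_keys(good))
print(missing_required_keys(bad))
print(len(json.dumps(good, ensure_ascii=False)) > 0)
```

A local check like this would surface the kind of problem the `422` status describes before any request is made.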
|
|
376
|
+
|
|
377
|
+
### 3) Loading Stage Events / Metrics
|
|
378
|
+
- Save external pipeline logs as JSON/JSONL, then ingest them into the DB
|
|
379
|
+
|
|
380
|
+
```bash
|
|
381
|
+
uv run evalvault stage ingest path/to/stage_events.jsonl --db data/db/evalvault.db
|
|
382
|
+
uv run evalvault stage summary <RUN_ID> --db data/db/evalvault.db
|
|
383
|
+
```
|
|
384
|
+
|
|
385
|
+
**Stage Event JSONL example (abridged)**
|
|
386
|
+
```jsonl
|
|
387
|
+
{"run_id":"run_20260103_001","stage_id":"stg_sys_01","stage_type":"system_prompt","stage_name":"system_prompt_v1","duration_ms":18,"attributes":{"prompt_id":"sys-01"}}
|
|
388
|
+
{"run_id":"run_20260103_001","stage_id":"stg_input_01","parent_stage_id":"stg_sys_01","stage_type":"input","stage_name":"user_query","duration_ms":6,"attributes":{"query":"보험금 지급 조건","language":"ko"}}
|
|
389
|
+
```
|
|
390
|
+
|
|
391
|
+
- Stage Events carry the inputs, outputs, parameters, and results of **intent analysis, retrieval, and reranking**.
|
|
392
|
+
- With `--stage-store`, EvalVault's internal execution logs are also saved automatically.
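
An external pipeline could produce a file for `stage ingest` along the following lines. This Python sketch serializes the two events from the JSONL example above, one JSON object per line; the helper name `to_jsonl` is hypothetical, and the full set of accepted fields is defined by the project's documentation rather than by this example.

```python
import json


def to_jsonl(events):
    """Serialize events as JSON Lines: one compact JSON object per line."""
    lines = [
        json.dumps(event, ensure_ascii=False, separators=(",", ":"))
        for event in events
    ]
    return "\n".join(lines) + "\n"


# The same two events as in the README example above.
events = [
    {
        "run_id": "run_20260103_001",
        "stage_id": "stg_sys_01",
        "stage_type": "system_prompt",
        "stage_name": "system_prompt_v1",
        "duration_ms": 18,
        "attributes": {"prompt_id": "sys-01"},
    },
    {
        "run_id": "run_20260103_001",
        "stage_id": "stg_input_01",
        "parent_stage_id": "stg_sys_01",  # links this stage to its parent
        "stage_type": "input",
        "stage_name": "user_query",
        "duration_ms": 6,
        "attributes": {"query": "보험금 지급 조건", "language": "ko"},
    },
]

text = to_jsonl(events)
parsed = [json.loads(line) for line in text.splitlines()]
print(text, end="")
```

Passing `ensure_ascii=False` keeps Korean text readable in the file instead of escaping it to `\uXXXX` sequences.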
|
|
393
|
+
|
|
394
|
+
### 4) Analysis routing rules (summary)
|
|
395
|
+
- A **RAGAS-format dataset** goes through `evalvault run` for evaluation and analysis
|
|
396
|
+
- **OTel/OpenInference traces** are connected to Phoenix for tracing
|
|
397
|
+
- **Unstructured logs (Stage Events)** go through `stage ingest` → `stage summary` → analysis modules
|
|
398
|
+
|
|
399
|
+
---
|
|
400
|
+
|
|
182
401
|
## Web UI (FastAPI + React)
|
|
183
402
|
|
|
184
403
|
```bash
|
|
@@ -193,6 +412,11 @@ npm run dev
|
|
|
193
412
|
|
|
194
413
|
Open `http://localhost:5173` in a browser, then use Evaluation Studio to review runs, history, and reports.
|
|
195
414
|
|
|
415
|
+
- LLM report language: `/api/v1/runs/{run_id}/report?language=en` (default: ko)
|
|
416
|
+
- Details: `docs/guides/USER_GUIDE.md#보고서-언어-옵션`
|
|
417
|
+
- Feedback aggregation: only the latest value per `rater_id` + `test_case_id` is counted; cancelled feedback is excluded
|
|
418
|
+
- Details: `docs/guides/USER_GUIDE.md#피드백-집계-규칙`
|
|
419
|
+
|
|
196
420
|
---
|
|
197
421
|
|
|
198
422
|
## Artifact Paths
|
|
@@ -256,12 +480,12 @@ npm run dev
|
|
|
256
480
|
|
|
257
481
|
### 2) Korean / Non-English Support (Prompt Language Alignment)
|
|
258
482
|
- **Korean datasets are detected automatically**, and Korean prompts are applied by default to `answer_relevancy` and `factual_correctness`. (`src/evalvault/domain/services/evaluator.py`)
|
|
483
|
+
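- **Korean by default for summary and candidate evaluation prompts**: the summary faithfulness judgment, prompt candidate evaluation, and knowledge graph relation augmentation prompts run in `ko` by default.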
|
|
484
|
+
- If you need English, set `language="en"` or `prompt_language="en"` in the API/SDK.
|
|
259
485
|
- **User prompt overrides**: when needed, per-metric prompts can be overridden with YAML. (`src/evalvault/domain/services/ragas_prompt_overrides.py`)
|
|
260
486
|
- **External evidence (non-English issues)**:
|
|
261
487
|
- https://github.com/explodinggradients/ragas/issues/1829
|
|
262
488
|
- https://github.com/explodinggradients/ragas/issues/402
|
|
263
|
-
- **Official documentation (mentions the language issue directly)**:
|
|
264
|
-
- https://docs.ragas.io/en/stable/howtos/customizations/metrics/_metrics_language_adaptation/
|
|
265
489
|
|
|
266
490
|
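**Reason**: If the question generation and judgment prompts are fixed to English, non-English inputs can suffer score distortion from the language mismatch, so this alignment minimizes that risk.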
|
|
267
491
|
|
|
@@ -42,6 +42,11 @@ Open `http://localhost:5173`, run an evaluation in Evaluation Studio (for exampl
|
|
|
42
42
|
`tests/fixtures/e2e/insurance_qa_korean.json`), then check Analysis Lab/Reports for scores
|
|
43
43
|
and insights.
|
|
44
44
|
|
|
45
|
+
- LLM report language: `/api/v1/runs/{run_id}/report?language=en` (default: ko)
|
|
46
|
+
- Details: `docs/guides/USER_GUIDE.md#보고서-언어-옵션`
|
|
47
|
+
- Feedback aggregation: latest value per `rater_id` + `test_case_id` (cancellations excluded)
|
|
48
|
+
- Details: `docs/guides/USER_GUIDE.md#피드백-집계-규칙`
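
The feedback aggregation rule above can be illustrated with a short Python sketch. It keeps the most recent record per (`rater_id`, `test_case_id`) pair and then drops pairs whose latest record is a cancellation. This is one reading of the rule, not the project's implementation: the function name `aggregate_latest_feedback` and the `created_at`, `cancelled`, and `score` field names are assumptions made for the example.

```python
def aggregate_latest_feedback(records):
    """Keep the latest record per (rater_id, test_case_id), minus cancellations."""
    latest = {}
    # Process oldest first so that later records overwrite earlier ones.
    for record in sorted(records, key=lambda r: r["created_at"]):
        latest[(record["rater_id"], record["test_case_id"])] = record
    # Assumption: a pair whose latest record is cancelled is excluded entirely.
    return [r for r in latest.values() if not r.get("cancelled", False)]


# Hypothetical records; created_at is a simple integer timestamp here.
records = [
    {"rater_id": "r1", "test_case_id": "tc1", "created_at": 1, "score": 3},
    {"rater_id": "r1", "test_case_id": "tc1", "created_at": 2, "score": 5},
    {"rater_id": "r2", "test_case_id": "tc1", "created_at": 1, "score": 4},
    {"rater_id": "r2", "test_case_id": "tc1", "created_at": 3, "cancelled": True},
    {"rater_id": "r1", "test_case_id": "tc2", "created_at": 1, "score": 2},
]

kept = aggregate_latest_feedback(records)
summary = {(r["rater_id"], r["test_case_id"]): r["score"] for r in kept}
print(summary)
```

Under this reading, a rater who revises a rating contributes only the revised value, and a rater who cancels contributes nothing for that test case.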
|
|
49
|
+
|
|
45
50
|
**CLI (terminal view)**
|
|
46
51
|
```bash
|
|
47
52
|
uv run evalvault run tests/fixtures/e2e/insurance_qa_korean.json \
|
|
@@ -334,6 +339,11 @@ uv run evalvault run-full tests/fixtures/e2e/insurance_qa_korean.json \
|
|
|
334
339
|
|
|
335
340
|
---
|
|
336
341
|
|
|
342
|
+
## Prompt Language Defaults (RAGAS)
|
|
343
|
+
|
|
344
|
+
- Korean is the default for summary faithfulness judgment, prompt candidate scoring, and KG relation augmentation.
|
|
345
|
+
- Use `language="en"` or `prompt_language="en"` in API/SDK when English is required.
|
|
346
|
+
|
|
337
347
|
## Supported Metrics
|
|
338
348
|
|
|
339
349
|
EvalVault ships with a set of RAG-focused metrics, including the Ragas 0.4.x family,
|