evalvault 1.57.1__tar.gz → 1.59.0__tar.gz
This diff shows the content of publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/workflows/ci.yml +46 -1
- evalvault-1.59.0/PKG-INFO +327 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/README.en.md +4 -1
- evalvault-1.59.0/README.md +211 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/README.md +6 -6
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/memory/shared/decisions.md +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/memory/shared/dependencies.md +4 -4
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/improvement/architecture_prompt.md +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/improvement/coordinator_prompt.md +6 -5
- evalvault-1.59.0/data/datasets/insurance_qa_korean.json +61 -0
- evalvault-1.59.0/docs/INDEX.md +58 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/README.ko.md +3 -2
- evalvault-1.59.0/docs/ROADMAP.md +30 -0
- evalvault-1.59.0/docs/STATUS.md +27 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/api/adapters/inbound.md +2 -2
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/api/ports/outbound.md +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/getting-started/INSTALLATION.md +2 -1
- evalvault-1.59.0/docs/guides/CLI_MCP_PLAN.md +243 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/guides/DEV_GUIDE.md +6 -10
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/guides/USER_GUIDE.md +7 -10
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/00_frontmatter.md +4 -4
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/01_overview.md +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/02_architecture.md +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/04_components.md +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/05_expert_lenses.md +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/09_quality.md +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/14_roadmap.md +10 -19
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/INDEX.md +3 -3
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/STYLE_GUIDE.md +2 -3
- evalvault-1.59.0/docs/web_ui_analysis_migration_plan.md +91 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/README.md +3 -2
- evalvault-1.59.0/frontend/src/pages/ComprehensiveAnalysis.tsx +695 -0
- evalvault-1.59.0/mkdocs.yml +149 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/pyproject.toml +18 -8
- evalvault-1.59.0/reports/analysis/analysis_2163f844-ee2c-4630-9ba8-35cd9954d92e.json +111 -0
- evalvault-1.59.0/reports/analysis/analysis_2163f844-ee2c-4630-9ba8-35cd9954d92e.md +52 -0
- evalvault-1.59.0/reports/analysis/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb.json +1657 -0
- evalvault-1.59.0/reports/analysis/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb.md +61 -0
- evalvault-1.59.0/reports/analysis/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775.json +1909 -0
- evalvault-1.59.0/reports/analysis/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775.md +66 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/causal_analysis.json +37 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/diagnostic.json +21 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/final_output.json +112 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/index.json +108 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/load_data.json +462 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/load_runs.json +249 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/low_samples.json +11 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/nlp_analysis.json +143 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/pattern_detection.json +22 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/priority_summary.json +165 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/ragas_eval.json +65 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/report.json +114 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/root_cause.json +18 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/statistics.json +205 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/time_series.json +22 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_4792d785-a8ea-4fd3-8a0c-dcbf1889f5fb/trend_detection.json +12 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/causal_analysis.json +37 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/diagnostic.json +21 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/final_output.json +112 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/index.json +108 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/load_data.json +462 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/load_runs.json +484 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/low_samples.json +11 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/nlp_analysis.json +143 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/pattern_detection.json +22 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/priority_summary.json +165 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/ragas_eval.json +65 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/report.json +114 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/root_cause.json +18 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/statistics.json +205 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/time_series.json +28 -0
- evalvault-1.59.0/reports/analysis/artifacts/analysis_e2f7e6bb-a86e-4f6a-8002-0c6f1a831775/trend_detection.json +23 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/routers/pipeline.py +48 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/analyze.py +434 -179
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/pipeline.py +5 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/run.py +628 -183
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/run_helpers.py +29 -30
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/analysis_io.py +2 -2
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/progress.py +2 -2
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/__init__.py +13 -3
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py +2 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/embedding_searcher_module.py +2 -1
- evalvault-1.59.0/src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py +359 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/llm_report_module.py +9 -9
- evalvault-1.59.0/src/evalvault/adapters/outbound/analysis/network_analyzer_module.py +250 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/pipeline_factory.py +3 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/pipeline_helpers.py +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/priority_summary_module.py +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/retrieval_benchmark_module.py +3 -2
- evalvault-1.59.0/src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py +349 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/benchmark/lm_eval_adapter.py +1 -1
- evalvault-1.59.0/src/evalvault/adapters/outbound/documents/__init__.py +4 -0
- evalvault-1.59.0/src/evalvault/adapters/outbound/documents/ocr/__init__.py +3 -0
- evalvault-1.59.0/src/evalvault/adapters/outbound/documents/ocr/paddleocr_backend.py +112 -0
- evalvault-1.59.0/src/evalvault/adapters/outbound/documents/pdf_extractor.py +50 -0
- evalvault-1.59.0/src/evalvault/adapters/outbound/documents/versioned_loader.py +244 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/improvement/insight_generator.py +23 -12
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/improvement/pattern_detector.py +16 -10
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/improvement/playbook_loader.py +21 -13
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/kg/graph_rag_retriever.py +2 -1
- evalvault-1.59.0/src/evalvault/adapters/outbound/llm/__init__.py +128 -0
- evalvault-1.59.0/src/evalvault/adapters/outbound/llm/instructor_factory.py +133 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/llm/ollama_adapter.py +27 -27
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/llm/token_aware_chat.py +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/report/__init__.py +2 -0
- evalvault-1.59.0/src/evalvault/adapters/outbound/report/dashboard_generator.py +197 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/report/llm_report_generator.py +4 -4
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/report/markdown_adapter.py +61 -63
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/storage/postgres_adapter.py +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracer/open_rag_log_handler.py +3 -3
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_adapter.py +3 -3
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_helpers.py +4 -4
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/settings.py +10 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/analysis_pipeline.py +13 -3
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/analysis_service.py +3 -3
- evalvault-1.59.0/src/evalvault/domain/services/document_versioning.py +119 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/evaluator.py +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/pipeline_template_registry.py +197 -127
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/retriever_context.py +56 -2
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/visual_space_service.py +1 -1
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/analysis_port.py +2 -2
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/improvement_port.py +4 -0
- evalvault-1.59.0/tests/fixtures/e2e/insurance_qa_korean_versioned_pdf.json +45 -0
- evalvault-1.59.0/tests/unit/adapters/outbound/documents/test_pdf_extractor.py +60 -0
- evalvault-1.59.0/tests/unit/adapters/outbound/documents/test_versioned_loader.py +48 -0
- evalvault-1.59.0/tests/unit/domain/services/test_document_versioning.py +74 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_cli.py +8 -4
- {evalvault-1.57.1 → evalvault-1.59.0}/uv.lock +356 -748
- evalvault-1.57.1/PKG-INFO +0 -683
- evalvault-1.57.1/README.md +0 -573
- evalvault-1.57.1/docs/DOCS_HUB.md +0 -8
- evalvault-1.57.1/docs/FUTURE_TESTS.md +0 -5
- evalvault-1.57.1/docs/INDEX.md +0 -177
- evalvault-1.57.1/docs/TEST_COVERAGE_IMPROVEMENT_PLAN.md +0 -380
- evalvault-1.57.1/docs/architecture/ARCHITECTURE.md +0 -2504
- evalvault-1.57.1/docs/assets/psf-supporting-member.png +0 -0
- evalvault-1.57.1/docs/assets/structure-methods/evaluator-port-deps.svg +0 -13
- evalvault-1.57.1/docs/enterprise/ENTERPRISE_READINESS.md +0 -328
- evalvault-1.57.1/docs/enterprise/IMPLEMENTATION_PLAN.md +0 -1488
- evalvault-1.57.1/docs/guides/structure-methods/01-folder-topology.md +0 -67
- evalvault-1.57.1/docs/guides/structure-methods/02-hexagonal-layer-map.md +0 -66
- evalvault-1.57.1/docs/guides/structure-methods/03-entrypoint-flow.md +0 -71
- evalvault-1.57.1/docs/guides/structure-methods/04-c4-component-view.md +0 -78
- evalvault-1.57.1/docs/guides/structure-methods/05-dependency-graph.md +0 -100
- evalvault-1.57.1/docs/guides/structure-methods/06-data-config-flow.md +0 -67
- evalvault-1.57.1/docs/guides/structure-methods/07-test-driven-map.md +0 -63
- evalvault-1.57.1/docs/internal/DEVELOPMENT_WHITEPAPER_PLAN.md +0 -148
- evalvault-1.57.1/docs/internal/FRONTEND_GAP_ANALYSIS_AND_PLAN.md +0 -211
- evalvault-1.57.1/docs/internal/Tasks_Metrics_Summarization.md +0 -75
- evalvault-1.57.1/docs/internal/WHITEPAPER_UPDATE_STRATEGY.md +0 -844
- evalvault-1.57.1/docs/internal/archive/ARCHITECTURE_AUDIT.md +0 -100
- evalvault-1.57.1/docs/internal/archive/COMPLETED.md +0 -1786
- evalvault-1.57.1/docs/internal/archive/D1_debug_layer_r2_r3_update.md +0 -127
- evalvault-1.57.1/docs/internal/archive/DOMAIN_MEMORY_USAGE.md +0 -407
- evalvault-1.57.1/docs/internal/archive/IMPROVEMENT_PLAN.md +0 -1474
- evalvault-1.57.1/docs/internal/archive/KOREAN_RAG_OPTIMIZATION.md +0 -1107
- evalvault-1.57.1/docs/internal/archive/PARALLEL_STATUS.md +0 -55
- evalvault-1.57.1/docs/internal/archive/PARALLEL_WORK_PLAN.md +0 -172
- evalvault-1.57.1/docs/internal/archive/QWEN3_EMBEDDING_INTEGRATION.md +0 -825
- evalvault-1.57.1/docs/internal/archive/RAG_PERFORMANCE_DATA_STRATEGY_FINAL.md +0 -1380
- evalvault-1.57.1/docs/internal/archive/TICKET_STREAMLIT_PROMPT_PREVIEW.md +0 -17
- evalvault-1.57.1/docs/internal/archive/whitepaper-20260111/NEW_WHITEPAPER.md +0 -836
- evalvault-1.57.1/docs/internal/archive/whitepaper-20260111/whitepaper/00-frontmatter.md +0 -111
- evalvault-1.57.1/docs/internal/archive/whitepaper-20260111/whitepaper/01-project-overview.md +0 -433
- evalvault-1.57.1/docs/internal/archive/whitepaper-20260111/whitepaper/02-architecture.md +0 -767
- evalvault-1.57.1/docs/internal/archive/whitepaper-20260111/whitepaper/03-data-flow.md +0 -901
- evalvault-1.57.1/docs/internal/archive/whitepaper-20260111/whitepaper/04-components.md +0 -1547
- evalvault-1.57.1/docs/internal/archive/whitepaper-20260111/whitepaper/05-expert-perspectives.md +0 -813
- evalvault-1.57.1/docs/internal/guides/D1_DEBUG_LAYER_AGENT_GUIDE.md +0 -147
- evalvault-1.57.1/docs/internal/logs/AGENT_REVIEW_QUEST_LOG.md +0 -170
- evalvault-1.57.1/docs/internal/plans/AB_TEST_COMPARISON_PLAN.md +0 -1003
- evalvault-1.57.1/docs/internal/plans/AGENT_DOCS_SYNC_PLAN.md +0 -123
- evalvault-1.57.1/docs/internal/plans/CLI_DEVELOPMENT_PLAN.md +0 -464
- evalvault-1.57.1/docs/internal/plans/CLI_WEB_UI_CATCHUP_PLAN.md +0 -153
- evalvault-1.57.1/docs/internal/plans/DEBUG_TOOL_PLAN.md +0 -307
- evalvault-1.57.1/docs/internal/plans/DOCS_REFACTOR_PLAN.md +0 -87
- evalvault-1.57.1/docs/internal/plans/ENTERPRISE_METHOD_TESTBED_PLAN.md +0 -202
- evalvault-1.57.1/docs/internal/plans/INSURANCE_RAG_EVALUATION_FRAMEWORK.md +0 -522
- evalvault-1.57.1/docs/internal/plans/INSURANCE_SUMMARY_EVAL_PARALLEL_PLAN.md +0 -295
- evalvault-1.57.1/docs/internal/plans/P2_AGENT_TASKS.md +0 -111
- evalvault-1.57.1/docs/internal/plans/PARALLEL_WORK_PLAN.md +0 -571
- evalvault-1.57.1/docs/internal/plans/PROGRESS_PROMPT_PLAN.md +0 -178
- evalvault-1.57.1/docs/internal/plans/RAGAS_042_ANALYSIS_SUMMARY.md +0 -107
- evalvault-1.57.1/docs/internal/plans/RAGAS_042_COMPLIANCE_ANALYSIS_PLAN.md +0 -426
- evalvault-1.57.1/docs/internal/plans/RAG_TRACING_API_PLAN.md +0 -573
- evalvault-1.57.1/docs/internal/plans/REACT_DEV.md +0 -372
- evalvault-1.57.1/docs/internal/plans/USER_SCENARIO_EXECUTION_PLAN.md +0 -95
- evalvault-1.57.1/docs/internal/plans/VISUALIZATION_IMPLEMENTATION_PLAN.md +0 -611
- evalvault-1.57.1/docs/internal/plans/WEB_UI_FOLLOWUP_PLAN.md +0 -82
- evalvault-1.57.1/docs/internal/plans/WHITEPAPER_PREPARATION_GUIDE.md +0 -84
- evalvault-1.57.1/docs/internal/reference/AGENT_STRATEGY.md +0 -846
- evalvault-1.57.1/docs/internal/reference/ARCHITECTURE_C4.md +0 -1275
- evalvault-1.57.1/docs/internal/reference/CLASS_CATALOG.md +0 -908
- evalvault-1.57.1/docs/internal/reference/CLI_OUTPUT_SCHEMA_SNAPSHOT.md +0 -272
- evalvault-1.57.1/docs/internal/reference/CLI_UI_OPTION_MAPPING.md +0 -181
- evalvault-1.57.1/docs/internal/reference/DESIGN_TOKENS.md +0 -161
- evalvault-1.57.1/docs/internal/reference/DEVELOPMENT_GUIDE.md +0 -1424
- evalvault-1.57.1/docs/internal/reference/FEATURE_SPECS.md +0 -442
- evalvault-1.57.1/docs/internal/reference/IA_STRUCTURE.md +0 -156
- evalvault-1.57.1/docs/internal/reference/LIGHTRAG_OPERATIONS_GUIDE.md +0 -236
- evalvault-1.57.1/docs/internal/reference/PROJECT_MAP.md +0 -1366
- evalvault-1.57.1/docs/internal/reference/QUERY_BASED_ANALYSIS_PIPELINE.md +0 -473
- evalvault-1.57.1/docs/internal/reference/RAG_EVALUATION_REQUIREMENTS.md +0 -509
- evalvault-1.57.1/docs/internal/reference/STYLE_GUIDE.md +0 -517
- evalvault-1.57.1/docs/internal/reference/USER_SCENARIO_REQUIREMENTS.md +0 -154
- evalvault-1.57.1/docs/internal/reference/WEB_UI_E2E_SCENARIOS.md +0 -184
- evalvault-1.57.1/docs/internal/reference/WEB_UI_IMPROVEMENT_BACKLOG.md +0 -48
- evalvault-1.57.1/docs/internal/reference/WEB_UI_MANUAL_CHECKLIST.md +0 -60
- evalvault-1.57.1/docs/internal/reference/WEB_UI_VERIFICATION_METHOD.md +0 -91
- evalvault-1.57.1/docs/internal/reference/benchmark_integration_notes.md +0 -440
- evalvault-1.57.1/docs/internal/reports/R1_COMPLETION_REPORT.md +0 -70
- evalvault-1.57.1/docs/internal/reports/R2_COMPLETION_REPORT.md +0 -79
- evalvault-1.57.1/docs/internal/reports/R3_PROGRESS_REPORT.md +0 -158
- evalvault-1.57.1/docs/internal/reports/R4_PROGRESS_REPORT.md +0 -149
- evalvault-1.57.1/docs/internal/reports/RAGAS_042_VERIFICATION_REPORT.md +0 -259
- evalvault-1.57.1/docs/internal/reports/TEMPERATURE_SEED_ANALYSIS.md +0 -152
- evalvault-1.57.1/docs/internal/status/O1_D1_DEBUG_REPORT_SUMMARY.md +0 -85
- evalvault-1.57.1/docs/internal/status/O1_PARALLEL_STATUS.md +0 -45
- evalvault-1.57.1/docs/internal/status/STATUS.md +0 -198
- evalvault-1.57.1/docs/status/ROADMAP.md +0 -1359
- evalvault-1.57.1/docs/status/STATUS.md +0 -100
- evalvault-1.57.1/docs/test.md +0 -403
- evalvault-1.57.1/docs/tutorials/01-quickstart.md +0 -220
- evalvault-1.57.1/docs/tutorials/02-basic-evaluation.md +0 -391
- evalvault-1.57.1/docs/tutorials/04-phoenix-integration.md +0 -495
- evalvault-1.57.1/docs/tutorials/07-domain-memory.md +0 -392
- evalvault-1.57.1/mkdocs.yml +0 -205
- evalvault-1.57.1/src/evalvault/adapters/outbound/llm/__init__.py +0 -128
- evalvault-1.57.1/src/evalvault/adapters/outbound/llm/instructor_factory.py +0 -39
- {evalvault-1.57.1 → evalvault-1.59.0}/.cursor/worktrees.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.dockerignore +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.env.example +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/ISSUE_TEMPLATE/question.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/dependabot.yml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/pull_request_template.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/stale.yml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/workflows/release.yml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.github/workflows/stale.yml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.gitignore +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.pre-commit-config.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/.python-version +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/AGENTS.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/CHANGELOG.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/CLAUDE.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/CODE_OF_CONDUCT.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/CONTRIBUTING.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/Dockerfile +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/LICENSE.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/SECURITY.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/agent.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/client.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/config.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/main.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/memory/README.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/memory/templates/coordinator_guide.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/memory/templates/work_log_template.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/memory_integration.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/progress.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/app_spec.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/baseline.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/coding_prompt.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/existing_project_prompt.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/improvement/base_prompt.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/improvement/observability_prompt.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/initializer_prompt.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/prompt_manifest.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts/system.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/prompts.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/requirements.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/agent/security.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/domains/insurance/memory.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/domains/insurance/terms_dictionary_en.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/domains/insurance/terms_dictionary_ko.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/methods.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/models.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/regressions/default.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/regressions/ux.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/stage_metric_playbook.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/config/stage_metric_thresholds.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/datasets/dummy_test_dataset.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/datasets/insurance_qa_korean.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/datasets/sample.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/datasets/visualization_20q_cluster_map.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/datasets/visualization_20q_korean.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/datasets/visualization_2q_cluster_map.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/datasets/visualization_2q_korean.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/kg/knowledge_graph.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/raw/The Complete Guide to Mastering Suno Advanced Strategies for Professional Music Generation.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/raw/edge_cases.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/raw/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/data/raw/sample_rag_knowledge.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/dataset_templates/dataset_template.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/dataset_templates/dataset_template.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/dataset_templates/dataset_template.xlsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/dataset_templates/method_input_template.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docker-compose.langfuse.yml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docker-compose.phoenix.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docker-compose.yml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/api/adapters/outbound.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/api/config.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/api/domain/entities.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/api/domain/metrics.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/api/domain/services.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/api/ports/inbound.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/architecture/open-rag-trace-collector.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/architecture/open-rag-trace-spec.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/guides/RELEASE_CHECKLIST.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/guides/open-rag-trace-internal-adapter.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/guides/open-rag-trace-samples.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/mapping/component-to-whitepaper.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/03_data_flow.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/06_implementation.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/07_advanced.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/08_customization.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/10_performance.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/11_security.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/12_operations.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/new_whitepaper/13_standards.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/stylesheets/extra.css +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/templates/dataset_template.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/templates/dataset_template.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/templates/dataset_template.xlsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/templates/kg_template.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/templates/retriever_docs_template.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/docs/tools/generate-whitepaper.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/dummy_test_dataset.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/README.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/korean_rag/faithfulness_test.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/korean_rag/insurance_qa_100.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/korean_rag/keyword_extraction_test.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/korean_rag/retrieval_test.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/output/comparison.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/output/full_results.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/output/leaderboard.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/output/results_mteb.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/output/retrieval_result.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/benchmarks/run_korean_benchmark.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/kg_generator_demo.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/method_plugin_template/README.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/method_plugin_template/pyproject.toml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/method_plugin_template/src/method_plugin_template/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/method_plugin_template/src/method_plugin_template/methods.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/usecase/comprehensive_workflow_test.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/usecase/insurance_eval_dataset.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/examples/usecase/output/comprehensive_report.html +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/.env.example +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/.gitignore +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/README.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/analysis-compare.spec.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/analysis-lab.spec.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/compare-runs.spec.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/dashboard.spec.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/domain-memory.spec.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/evaluation-studio.spec.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/knowledge-base.spec.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/mocks/intents.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/mocks/run_details.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/mocks/runs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/e2e/run-details.spec.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/eslint.config.js +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/index.html +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/package-lock.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/package.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/playwright.config.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/public/vite.svg +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/App.css +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/App.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/assets/react.svg +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/AnalysisNodeOutputs.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/InsightSpacePanel.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/Layout.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/MarkdownContent.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/PrioritySummaryPanel.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/SpaceLegend.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/SpacePlot2D.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/SpacePlot3D.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/StatusBadge.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/components/VirtualizedText.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/config/ui.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/config.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/hooks/useInsightSpace.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/index.css +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/main.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/AnalysisCompareView.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/AnalysisLab.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/AnalysisResultView.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/CompareRuns.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/CustomerReport.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/Dashboard.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/DomainMemory.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/EvaluationStudio.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/KnowledgeBase.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/RunDetails.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/Settings.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/Visualization.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/pages/VisualizationHome.tsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/services/api.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/types/plotly.d.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/utils/format.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/utils/phoenix.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/utils/runAnalytics.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/utils/score.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/src/utils/summaryMetrics.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/tailwind.config.js +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/tsconfig.app.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/tsconfig.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/tsconfig.node.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/frontend/vite.config.ts +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/package-lock.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/.gitkeep +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/README.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_0aa9fab0-6c2c-4c1c-b228-202a38a2f00c.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_0aa9fab0-6c2c-4c1c-b228-202a38a2f00c.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_4516d358-2797-4c46-9f14-c1d975588025.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_4516d358-2797-4c46-9f14-c1d975588025.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_9fbf4776-9f5b-4c4b-ba08-c556032cee86.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_9fbf4776-9f5b-4c4b-ba08-c556032cee86.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/causal_analysis.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/diagnostic.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/final_output.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/index.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/load_data.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/load_runs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/low_samples.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/nlp_analysis.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/pattern_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/priority_summary.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/ragas_eval.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/report.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/root_cause.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/statistics.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/time_series.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_4516d358-2797-4c46-9f14-c1d975588025/trend_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/causal_analysis.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/diagnostic.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/final_output.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/index.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/load_data.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/load_runs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/low_samples.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/nlp_analysis.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/pattern_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/priority_summary.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/ragas_eval.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/report.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/root_cause.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/statistics.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/time_series.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_8f825b22-87f1-4d9b-b3a0-8ff65dbec2c5/trend_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/causal_analysis.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/diagnostic.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/final_output.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/index.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/load_data.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/load_runs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/low_samples.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/nlp_analysis.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/pattern_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/priority_summary.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/ragas_eval.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/report.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/root_cause.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/statistics.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/time_series.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/analysis/artifacts/analysis_f1287e90-43b6-42c8-b3ac-e6cb3e06a71e/trend_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/final_output.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/index.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/load_runs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/report.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_change_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_metric_comparison.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/final_output.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/index.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/load_runs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/report.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_change_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_metric_comparison.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/final_output.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/index.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/load_runs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/report.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_change_detection.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_metric_comparison.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_0aa9fab0_f1287e90.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_0aa9fab0_f1287e90.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_8f825b22_4516d358.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_8f825b22_4516d358.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_9fbf4776_a491fa0e.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_9fbf4776_a491fa0e.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_f1287e90_8f825b22.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/comparison/comparison_f1287e90_8f825b22.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/debug_report_r1_smoke.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/debug_report_r2_graphrag.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/debug_report_r2_graphrag_openai.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/debug_report_r3_bm25.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/debug_report_r3_bm25_langfuse3.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/debug_report_r3_dense_faiss.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/improvement_1d91a667-4288-4742-be3a-a8f5310c5140.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r2_graphrag_openai_stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r2_graphrag_openai_stage_report.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r2_graphrag_stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r2_graphrag_stage_report.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r3_bm25_langfuse2_stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r3_bm25_langfuse3_stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r3_bm25_langfuse_stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r3_bm25_phoenix_stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r3_bm25_stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r3_bm25_stage_report.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r3_dense_faiss_stage_events.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/r3_dense_faiss_stage_report.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/retrieval_benchmark_smoke_precision.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/retrieval_benchmark_smoke_precision_graphrag.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/reports/retrieval_benchmark_smoke_precision_multi.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/benchmark/download_kmmlu.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/dev/open_rag_trace_demo.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/dev/open_rag_trace_integration_template.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/dev/otel-collector-config.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/dev/start_web_ui_with_phoenix.sh +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/dev/validate_open_rag_trace.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/dev_seed_pipeline_results.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/analyzer/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/analyzer/ast_scanner.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/analyzer/confidence_scorer.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/analyzer/graph_builder.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/analyzer/side_effect_detector.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/generate_api_docs.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/models/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/models/schema.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/renderer/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/docs/renderer/html_generator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/ops/phoenix_watch.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/perf/backfill_langfuse_trace_url.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/perf/r3_dense_smoke.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/perf/r3_evalvault_run_dataset.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/perf/r3_retriever_docs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/perf/r3_smoke_real.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/perf/r3_stage_events_sample.jsonl +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/pipeline_template_inspect.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/reports/generate_release_notes.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/run_with_timeout.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/test_full_evaluation.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/tests/run_regressions.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/tests/run_retriever_stage_report_smoke.sh +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/validate_tutorials.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/verify_ragas_compliance.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/scripts/verify_workflows.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/main.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/routers/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/routers/benchmark.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/routers/config.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/routers/domain.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/routers/knowledge.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/api/routers/runs.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/app.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/agent.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/api.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/benchmark.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/config.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/debug.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/domain.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/experiment.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/gate.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/generate.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/history.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/init.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/kg.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/langfuse.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/method.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/phoenix.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/prompts.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/commands/stage.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/console.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/errors.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/formatters.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/options.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/presets.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/inbound/cli/utils/validators.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/analysis_report_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/base_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/bm25_searcher_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/causal_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/common.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/comparison_report_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/data_loader_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/detailed_report_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/embedding_distribution_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/hybrid_rrf_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/hybrid_weighted_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/low_performer_extractor_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/model_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/nlp_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/pattern_detector_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/ragas_evaluator_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/run_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/run_change_detector_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/run_comparator_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/run_loader_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/search_comparator_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/statistical_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/statistical_comparator_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/summary_report_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/trend_detector_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/analysis/verification_report_module.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/benchmark/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/cache/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/cache/hybrid_cache.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/cache/memory_cache.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/base.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/csv_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/excel_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/json_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/loader_factory.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/method_input_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/streaming_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/templates.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/dataset/thresholds.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/debug/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/debug/report_renderer.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/domain_memory/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/domain_memory/domain_memory_schema.sql +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/domain_memory/sqlite_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/improvement/stage_metric_playbook_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/kg/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/kg/networkx_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/kg/parallel_kg_builder.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/kg/query_strategies.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/llm/anthropic_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/llm/azure_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/llm/base.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/llm/llm_relation_augmenter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/llm/openai_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/llm/vllm_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/methods/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/methods/baseline_oracle.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/methods/external_command.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/methods/registry.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/bm25_retriever.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/dense_retriever.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/document_chunker.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/hybrid_retriever.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/kiwi_tokenizer.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/korean_evaluation.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/korean_stopwords.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/nlp/korean/toolkit.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/phoenix/sync_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/storage/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/storage/base_sql.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/storage/benchmark_storage_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/storage/postgres_schema.sql +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/storage/schema.sql +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/storage/sqlite_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracer/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_decorators.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracer/phoenix_tracer_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracker/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracker/langfuse_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracker/mlflow_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/adapters/outbound/tracker/phoenix_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/agent_types.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/domain_config.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/instrumentation.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/langfuse_support.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/model_config.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/phoenix_support.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/config/playbooks/improvement_playbook.yaml +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/debug_ragas.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/debug_ragas_real.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/analysis.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/benchmark.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/benchmark_run.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/dataset.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/debug.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/experiment.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/improvement.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/kg.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/memory.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/method.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/prompt.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/rag_trace.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/result.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/entities/stage.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/confidence.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/contextual_relevancy.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/entity_preservation.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/insurance.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/no_answer.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/retrieval_rank.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/terms_dictionary.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/metrics/text_match.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/async_batch_executor.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/batch_executor.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/benchmark_report_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/benchmark_runner.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/benchmark_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/cache_metrics.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/cluster_map_builder.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/dataset_preprocessor.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/debug_report_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/document_chunker.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/domain_learning_hook.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/embedding_overlay.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/entity_extractor.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/experiment_comparator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/experiment_manager.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/experiment_reporter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/experiment_repository.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/experiment_statistics.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/improvement_guide_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/intent_classifier.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/kg_generator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/memory_aware_evaluator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/memory_based_analysis.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/method_runner.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/pipeline_orchestrator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/prompt_manifest.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/prompt_registry.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/prompt_status.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/ragas_prompt_overrides.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/retrieval_metrics.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/stage_event_builder.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/stage_metric_guide_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/stage_metric_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/stage_summary_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/synthetic_qa_generator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/testset_generator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/threshold_profiles.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/domain/services/unified_report_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/mkdocs_helpers.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/inbound/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/inbound/analysis_pipeline_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/inbound/evaluator_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/inbound/learning_hook_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/inbound/web_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/analysis_cache_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/analysis_module_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/benchmark_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/causal_analysis_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/dataset_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/domain_memory_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/embedding_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/intent_classifier_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/korean_nlp_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/llm_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/method_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/nlp_analysis_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/relation_augmenter_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/report_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/stage_storage_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/storage_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/tracer_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/ports/outbound/tracker_port.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/reports/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/reports/release_notes.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/scripts/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/src/evalvault/scripts/regression_runner.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/conftest.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/README.md +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/benchmark/retrieval_ground_truth_min.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/benchmark/retrieval_ground_truth_multi.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/auto_insurance_qa_korean_full.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/comprehensive_dataset.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/edge_cases.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/edge_cases.xlsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/evaluation_test_sample.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/graphrag_retriever_docs.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/graphrag_smoke.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/insurance_document.txt +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/insurance_qa_english.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/insurance_qa_english.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/insurance_qa_english.xlsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/insurance_qa_korean.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/insurance_qa_korean.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/insurance_qa_korean.xlsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/run_mode_simple.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/e2e/summary_eval_minimal.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/kg/minimal_graph.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/sample_dataset.csv +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/sample_dataset.json +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/fixtures/sample_dataset.xlsx +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/benchmark/test_benchmark_service_integration.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/conftest.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_cli_integration.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_data_flow.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_e2e_scenarios.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_evaluation_flow.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_full_workflow.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_langfuse_flow.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_phoenix_flow.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_pipeline_api_contracts.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_storage_flow.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/integration/test_summary_eval_fixture.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/optional_deps.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/adapters/outbound/improvement/test_insight_generator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/adapters/outbound/improvement/test_pattern_detector.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/adapters/outbound/improvement/test_playbook_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/adapters/outbound/improvement/test_stage_metric_playbook_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/adapters/outbound/kg/test_graph_rag_retriever.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/adapters/outbound/kg/test_parallel_kg_builder.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/adapters/outbound/storage/test_benchmark_storage_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/config/test_phoenix_support.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/conftest.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/metrics/test_confidence.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/metrics/test_contextual_relevancy.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/metrics/test_entity_preservation.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/metrics/test_no_answer.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/metrics/test_retrieval_rank.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/metrics/test_text_match.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_cache_metrics.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_claim_level.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_dataset_preprocessor.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_evaluator_comprehensive.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_improvement_guide_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_retrieval_metrics.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_retriever_context.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_stage_event_builder.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_stage_metric_guide_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/services/test_synthetic_qa_generator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/test_embedding_overlay.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/test_prompt_manifest.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/domain/test_prompt_status.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/reports/test_release_notes.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/scripts/test_regression_runner.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_agent_types.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_analysis_entities.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_analysis_modules.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_analysis_pipeline.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_analysis_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_anthropic_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_async_batch_executor.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_azure_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_benchmark_helpers.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_benchmark_runner.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_causal_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_cli_domain.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_cli_init.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_cli_progress.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_cli_utils.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_data_loaders.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_domain_config.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_domain_memory.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_entities.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_entities_kg.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_entity_extractor.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_evaluator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_experiment.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_hybrid_cache.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_instrumentation.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_insurance_metric.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_intent_classifier.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_kg_generator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_kg_networkx.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_kiwi_tokenizer.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_kiwi_warning_suppression.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_korean_dense.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_korean_evaluation.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_korean_retrieval.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_langfuse_tracker.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_llm_relation_augmenter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_lm_eval_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_markdown_report.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_memory_cache.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_memory_services.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_method_plugins.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_mlflow_tracker.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_model_config.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_nlp_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_nlp_entities.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_ollama_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_openai_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_phoenix_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_pipeline_orchestrator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_ports.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_postgres_storage.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_rag_trace_entities.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_run_memory_helpers.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_run_mode_fixtures.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_settings.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_sqlite_storage.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_stage_cli.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_stage_metric_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_stage_storage.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_stage_summary_service.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_statistical_adapter.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_streaming_loader.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_summary_eval_fixture.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_testset_generator.py +0 -0
- {evalvault-1.57.1 → evalvault-1.59.0}/tests/unit/test_web_adapter.py +0 -0
@@ -82,7 +82,52 @@ jobs:
         run: uv python install 3.12
 
       - name: Install dependencies
-        run: uv sync --extra dev --extra web --extra korean
+        run: uv sync --extra dev --extra web --extra korean --extra docs
+
+      - name: Docs build
+        run: uv run mkdocs build -q
+
+      - name: Docs link check
+        run: |
+          uv run python - <<'PY'
+          import re
+          from pathlib import Path
+
+          def check_file(p: Path) -> list[str]:
+              text = p.read_text(encoding='utf-8', errors='ignore')
+              link_re = re.compile(r"\[[^\]]*\]\(([^)]+)\)")
+              missing: list[str] = []
+              for target in link_re.findall(text):
+                  target = target.strip()
+                  if not target or target.startswith('#'):
+                      continue
+                  if target.startswith(('http://', 'https://', 'mailto:')):
+                      continue
+                  if ' ' in target:
+                      target = target.split(' ')[0]
+                  target = target.split('#', 1)[0]
+                  if not target or target.startswith('/'):
+                      continue
+                  resolved = (p.parent / target).resolve()
+                  if not resolved.exists():
+                      missing.append(f"{p}: {target}")
+              return missing
+
+          missing: list[str] = []
+          docs_root = Path('docs')
+          for md in docs_root.rglob('*.md'):
+              missing.extend(check_file(md))
+          for md in [Path('README.md'), Path('README.en.md')]:
+              if md.exists():
+                  missing.extend(check_file(md))
+
+          if missing:
+              print('Missing local link targets:')
+              for item in missing[:100]:
+                  print('-', item)
+              raise SystemExit(1)
+          print('Docs links OK')
+          PY
 
       - name: Check formatting
         run: uv run ruff format --check src/ tests/
@@ -0,0 +1,327 @@
+Metadata-Version: 2.4
+Name: evalvault
+Version: 1.59.0
+Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
+Project-URL: Homepage, https://github.com/ntts9990/EvalVault
+Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
+Project-URL: Repository, https://github.com/ntts9990/EvalVault.git
+Project-URL: Issues, https://github.com/ntts9990/EvalVault/issues
+Project-URL: Changelog, https://github.com/ntts9990/EvalVault/releases
+Author: EvalVault Contributors
+Maintainer: EvalVault Contributors
+License: Apache-2.0
+License-File: LICENSE.md
+Keywords: ai,evaluation,langfuse,llm,machine-learning,nlp,observability,opentelemetry,phoenix,rag,ragas,retrieval-augmented-generation,testing
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: Apache Software License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Software Development :: Quality Assurance
+Classifier: Topic :: Software Development :: Testing
+Classifier: Typing :: Typed
+Requires-Python: >=3.12
+Requires-Dist: chardet
+Requires-Dist: fastapi>=0.128.0
+Requires-Dist: instructor
+Requires-Dist: langchain-openai
+Requires-Dist: langfuse
+Requires-Dist: matplotlib<3.9.0,>=3.8.0
+Requires-Dist: networkx
+Requires-Dist: openai
+Requires-Dist: openpyxl
+Requires-Dist: pandas
+Requires-Dist: pydantic
+Requires-Dist: pydantic-settings
+Requires-Dist: python-multipart
+Requires-Dist: ragas==0.4.2
+Requires-Dist: rich
+Requires-Dist: truststore>=0.10.4
+Requires-Dist: typer
+Requires-Dist: uvicorn>=0.40.0
+Requires-Dist: xlrd
+Provides-Extra: analysis
+Requires-Dist: scikit-learn>=1.3.0; extra == 'analysis'
+Provides-Extra: anthropic
+Requires-Dist: anthropic; extra == 'anthropic'
+Requires-Dist: langchain-anthropic; extra == 'anthropic'
+Provides-Extra: benchmark
+Requires-Dist: datasets>=2.0.0; extra == 'benchmark'
+Requires-Dist: lm-eval[api]>=0.4.0; extra == 'benchmark'
+Provides-Extra: dashboard
+Requires-Dist: matplotlib<3.9.0,>=3.8.0; extra == 'dashboard'
+Provides-Extra: dev
+Requires-Dist: anthropic; extra == 'dev'
+Requires-Dist: arize-phoenix>=8.0.0; extra == 'dev'
+Requires-Dist: datasets>=2.0.0; extra == 'dev'
+Requires-Dist: faiss-cpu>=1.8.0; extra == 'dev'
+Requires-Dist: ijson>=3.3.0; extra == 'dev'
+Requires-Dist: kiwipiepy>=0.18.0; extra == 'dev'
+Requires-Dist: langchain-anthropic; extra == 'dev'
+Requires-Dist: lm-eval[api]>=0.4.0; extra == 'dev'
+Requires-Dist: mkdocs-material>=9.5.0; extra == 'dev'
+Requires-Dist: mkdocs>=1.5.0; extra == 'dev'
+Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'dev'
+Requires-Dist: mlflow>=2.0.0; extra == 'dev'
+Requires-Dist: openinference-instrumentation-langchain>=0.1.0; extra == 'dev'
+Requires-Dist: opentelemetry-api>=1.20.0; extra == 'dev'
+Requires-Dist: opentelemetry-exporter-otlp>=1.20.0; extra == 'dev'
+Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'dev'
+Requires-Dist: psycopg[binary]>=3.0.0; extra == 'dev'
+Requires-Dist: pydeps>=3.0.1; extra == 'dev'
+Requires-Dist: pymdown-extensions>=10.7.0; extra == 'dev'
+Requires-Dist: pytest; extra == 'dev'
+Requires-Dist: pytest-asyncio; extra == 'dev'
+Requires-Dist: pytest-cov; extra == 'dev'
+Requires-Dist: pytest-html; extra == 'dev'
+Requires-Dist: pytest-mock; extra == 'dev'
+Requires-Dist: pytest-rerunfailures; extra == 'dev'
+Requires-Dist: pytest-xdist; extra == 'dev'
+Requires-Dist: python-multipart; extra == 'dev'
+Requires-Dist: rank-bm25>=0.2.2; extra == 'dev'
+Requires-Dist: ruff; extra == 'dev'
+Requires-Dist: scikit-learn<1.4.0,>=1.3.0; extra == 'dev'
+Requires-Dist: sentence-transformers>=5.2.0; extra == 'dev'
+Provides-Extra: docs
+Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
+Requires-Dist: mkdocs>=1.5.0; extra == 'docs'
+Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'docs'
+Requires-Dist: pymdown-extensions>=10.7.0; extra == 'docs'
+Provides-Extra: korean
+Requires-Dist: kiwipiepy>=0.18.0; extra == 'korean'
+Requires-Dist: rank-bm25>=0.2.2; extra == 'korean'
+Requires-Dist: sentence-transformers>=5.2.0; extra == 'korean'
+Provides-Extra: mlflow
+Requires-Dist: mlflow>=2.0.0; extra == 'mlflow'
+Provides-Extra: perf
+Requires-Dist: faiss-cpu>=1.8.0; extra == 'perf'
+Requires-Dist: ijson>=3.3.0; extra == 'perf'
+Provides-Extra: phoenix
+Requires-Dist: arize-phoenix>=8.0.0; extra == 'phoenix'
+Requires-Dist: openinference-instrumentation-langchain>=0.1.0; extra == 'phoenix'
+Requires-Dist: opentelemetry-api>=1.20.0; extra == 'phoenix'
+Requires-Dist: opentelemetry-exporter-otlp>=1.20.0; extra == 'phoenix'
+Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'phoenix'
+Provides-Extra: postgres
+Requires-Dist: psycopg[binary]>=3.0.0; extra == 'postgres'
+Provides-Extra: timeseries
+Requires-Dist: aeon>=1.3.0; extra == 'timeseries'
+Requires-Dist: numba>=0.55.0; extra == 'timeseries'
+Provides-Extra: web
+Description-Content-Type: text/markdown
+
+# EvalVault
+
+A CLI + Web UI platform for RAG (Retrieval-Augmented Generation) systems that ties the **evaluation (Eval) → analysis → tracing → improvement loop** into a single workflow.
+
+[](https://pypi.org/project/evalvault/)
+[](https://www.python.org/downloads/)
+[](https://github.com/ntts9990/EvalVault/actions/workflows/ci.yml)
+[](LICENSE.md)
+
+English version? See `README.en.md`.
+
+---
+
+## Quick Links
+
+- Documentation hub: `docs/INDEX.md`
+- User guide: `docs/guides/USER_GUIDE.md`
+- Developer guide: `docs/guides/DEV_GUIDE.md`
+- Status/roadmap: `docs/STATUS.md`, `docs/ROADMAP.md`
+- Development whitepaper (design/operations/quality standards): `docs/new_whitepaper/INDEX.md`
+- Open RAG Trace: `docs/architecture/open-rag-trace-spec.md`
+
+---
+
+## The problem EvalVault solves
+
+Operating a RAG system eventually comes down to the following questions.
+
+- “We changed the model/prompt/retriever; **did it really get better?**”
+- “If it improved, **why** did it improve, and if it got worse, **where** did it break?”
+- “Can the team and CI keep verifying this conclusion **reproducibly**?”
+
+EvalVault is designed to answer these questions in one pass from the perspective of **dataset + metrics + (optional) tracing**.
+
+---
+
+## Core concepts
+
+- **Run as the unit**: evaluation, analysis, artifacts, and traces are tied together under a single `run_id`.
+- **Dataset-centric**: thresholds (pass criteria) are stored in the dataset, which preserves “per-domain pass criteria.”
+- **Artifacts-first**: in addition to the report (summary), the raw results (artifacts) of each analysis module are preserved in a structured directory.
+- **Optional observability**: Phoenix/Langfuse/MLflow are “turned on when needed,” which keeps the execution path as simple as possible.
+
+---
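Note on the dataset-centric threshold idea above: the following is a minimal sketch of how per-metric pass criteria stored in a dataset could be applied to the scores of one run. The function name `evaluate_thresholds`, the `>=` comparison, and the sample scores are assumptions made for illustration; the diff does not show how EvalVault itself performs this check.

```python
def evaluate_thresholds(scores: dict[str, float], thresholds: dict[str, float]) -> dict[str, bool]:
    """Return {metric: passed} for every metric that has a threshold.

    Hypothetical helper: a metric passes when its score is at least the
    threshold; a metric with no recorded score is treated as failing.
    """
    return {
        metric: scores.get(metric, 0.0) >= limit
        for metric, limit in thresholds.items()
    }


# Thresholds travel with the dataset; scores come from one evaluation run.
dataset = {"thresholds": {"faithfulness": 0.8}}
run_scores = {"faithfulness": 0.85, "answer_relevancy": 0.6}

verdict = evaluate_thresholds(run_scores, dataset["thresholds"])
print(verdict)
```

Metrics without a threshold in the dataset (here `answer_relevancy`) are simply not judged, which is one way to keep pass criteria specific to each domain dataset.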
|
|
161
|
+
|
|
162
|
+
## 3분 Quickstart (CLI)
|
|
163
|
+
|
|
164
|
+
```bash
|
|
165
|
+
uv sync --extra dev
|
|
166
|
+
cp .env.example .env
|
|
167
|
+
|
|
168
|
+
uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
|
|
169
|
+
--metrics faithfulness,answer_relevancy \
|
|
170
|
+
--profile dev \
|
|
171
|
+
--db data/db/evalvault.db \
|
|
172
|
+
--auto-analyze
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
- Results are stored in the `--db` database and reused by `history`, the Web UI, and comparison analysis.
|
|
176
|
+
- `--auto-analyze` generates the summary report together with per-module artifacts.
|
|
177
|
+
|
|
178
|
+
---
|
|
179
|
+
|
|
180
|
+
## Web UI (FastAPI + React)
|
|
181
|
+
|
|
182
|
+
```bash
|
|
183
|
+
# API
|
|
184
|
+
uv run evalvault serve-api --reload
|
|
185
|
+
|
|
186
|
+
# Frontend
|
|
187
|
+
cd frontend
|
|
188
|
+
npm install
|
|
189
|
+
npm run dev
|
|
190
|
+
```
|
|
191
|
+
|
|
192
|
+
Open `http://localhost:5173` in a browser, then check runs, history, and reports in Evaluation Studio.
|
|
193
|
+
|
|
194
|
+
---
|
|
195
|
+
|
|
196
|
+
## Artifact Paths
|
|
197
|
+
|
|
198
|
+
- Single-run automatic analysis:
|
|
199
|
+
- Summary JSON: `reports/analysis/analysis_<RUN_ID>.json`
|
|
200
|
+
- Report: `reports/analysis/analysis_<RUN_ID>.md`
|
|
201
|
+
- Artifact index: `reports/analysis/artifacts/analysis_<RUN_ID>/index.json`
|
|
202
|
+
- Per-node results: `reports/analysis/artifacts/analysis_<RUN_ID>/<node_id>.json`
|
|
203
|
+
|
|
204
|
+
- A/B comparison analysis:
|
|
205
|
+
- Summary JSON: `reports/comparison/comparison_<RUN_A>_<RUN_B>.json`
|
|
206
|
+
- Report: `reports/comparison/comparison_<RUN_A>_<RUN_B>.md`
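
The artifact layout above can be sketched as a small path helper. This is only an illustration, not EvalVault's own code: the function names `analysis_artifact_paths` and `comparison_paths` and the `base` parameter are hypothetical, and only the path patterns are taken from the list above.

```python
from pathlib import Path


def analysis_artifact_paths(run_id: str, base: Path = Path("reports")) -> dict[str, Path]:
    """Build the single-run analysis paths listed above (hypothetical helper)."""
    root = base / "analysis"
    artifact_dir = root / "artifacts" / f"analysis_{run_id}"
    return {
        "summary_json": root / f"analysis_{run_id}.json",
        "report_md": root / f"analysis_{run_id}.md",
        "artifact_index": artifact_dir / "index.json",
        # Per-node results are stored in this directory as <node_id>.json.
        "artifact_dir": artifact_dir,
    }


def comparison_paths(run_a: str, run_b: str, base: Path = Path("reports")) -> dict[str, Path]:
    """Build the A/B comparison paths listed above (hypothetical helper)."""
    root = base / "comparison"
    name = f"comparison_{run_a}_{run_b}"
    return {
        "summary_json": root / f"{name}.json",
        "report_md": root / f"{name}.md",
    }


paths = analysis_artifact_paths("run-123")
print(paths["artifact_index"].as_posix())
```

File names are built with f-strings rather than `Path.with_suffix`, so run IDs that contain dots are not truncated.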
|
|
207
|
+
|
|
208
|
+
---
|
|
209
|
+
|
|
210
|
+
## Dataset Format (Summary)
|
|
211
|
+
|
|
212
|
+
```json
|
|
213
|
+
{
|
|
214
|
+
"name": "insurance-qa",
|
|
215
|
+
"version": "1.0.0",
|
|
216
|
+
"thresholds": { "faithfulness": 0.8 },
|
|
217
|
+
"test_cases": [
|
|
218
|
+
{
|
|
219
|
+
"id": "tc-001",
|
|
220
|
+
"question": "...",
|
|
221
|
+
"answer": "...",
|
|
222
|
+
"contexts": ["..."]
|
|
223
|
+
}
|
|
224
|
+
]
|
|
225
|
+
}
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
- Required fields: `id`, `question`, `answer`, `contexts`
|
|
229
|
+
- `ground_truth` is required by some metrics.
|
|
230
|
+
- Templates: `docs/templates/dataset_template.json`, `docs/templates/dataset_template.csv`, `docs/templates/dataset_template.xlsx`
|
|
231
|
+
- Related docs: `docs/guides/USER_GUIDE.md`
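
As a rough illustration of the required-field rule above, the following sketch checks each test case before a dataset is handed to an evaluation run. The helper `find_missing_fields` is hypothetical and is not part of EvalVault; only the field names and the JSON shape come from the example above.

```python
import json

# Required per-test-case fields, as listed in the README.
REQUIRED_FIELDS = ("id", "question", "answer", "contexts")


def find_missing_fields(dataset: dict) -> dict[str, list[str]]:
    """Map each test case id to the required fields it is missing."""
    problems: dict[str, list[str]] = {}
    for index, case in enumerate(dataset.get("test_cases", [])):
        missing = [name for name in REQUIRED_FIELDS if name not in case]
        if missing:
            # Fall back to the list position when the id itself is missing.
            problems[str(case.get("id", f"#{index}"))] = missing
    return problems


sample = json.loads("""
{
  "name": "insurance-qa",
  "version": "1.0.0",
  "thresholds": {"faithfulness": 0.8},
  "test_cases": [
    {"id": "tc-001", "question": "q", "answer": "a", "contexts": ["c"]},
    {"id": "tc-002", "question": "q", "answer": "a"}
  ]
}
""")

report = find_missing_fields(sample)
print(report)  # {'tc-002': ['contexts']}
```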
|
|
232
|
+
|
|
233
|
+
---
|
|
234
|
+
|
|
235
|
+
## Supported Metrics (Representative)
|
|
236
|
+
|
|
237
|
+
- Ragas family: `faithfulness`, `answer_relevancy`, `context_precision`, `context_recall`, `factual_correctness`, `semantic_similarity`
|
|
238
|
+
- Custom example (domain-specific): `insurance_term_accuracy`
|
|
239
|
+
|
|
240
|
+
Exact options and operational recipes are kept up to date in `docs/guides/USER_GUIDE.md`.
|
|
241
|
+
|
|
242
|
+
---
|
|
243
|
+
|
|
244
|
+
## RAGAS 0.4.2 Data Preprocessing/Postprocessing (Important)
|
|
245
|
+
|
|
246
|
+
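The items below are the processing steps EvalVault performs, **as of RAGAS 0.4.2**, to stabilize data and scores. All of them are deliberate design choices to ensure reproducibility and prevent quality degradation.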
|
|
247
|
+
|
|
248
|
+
### 1) Data Preprocessing (Input Stabilization)
|
|
249
|
+
- **Remove empty questions/answers/contexts**: cases that cannot be evaluated are removed up front. (`src/evalvault/domain/services/dataset_preprocessor.py`)
|
|
250
|
+
- **Context normalization**: standardizes context quality through whitespace cleanup, deduplication, and length limits. (`src/evalvault/domain/services/dataset_preprocessor.py`)
|
|
251
|
+
- **Reference completion**: when a metric requires a reference and it is missing, one is filled in from the question/answer/contexts. (`src/evalvault/domain/services/dataset_preprocessor.py`)
|
|
252
|
+
|
|
253
|
+
**Why**: prevents variation in input quality from widening the variance of RAGAS scores, and reduces metric execution failures and distortion.
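
The context normalization step described above (whitespace cleanup, deduplication, length limit) might look roughly like this. It is a minimal sketch under stated assumptions, not the code in `dataset_preprocessor.py`: the function name `normalize_contexts` and the `max_chars` default are hypothetical.

```python
def normalize_contexts(contexts: list[str], max_chars: int = 2000) -> list[str]:
    """Collapse whitespace, cap length, and drop empty or duplicate contexts.

    `max_chars` is an illustrative limit, not a documented EvalVault setting.
    """
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in contexts:
        # str.split() with no argument splits on any run of whitespace.
        text = " ".join(raw.split())[:max_chars]
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


print(normalize_contexts(["  a   b ", "a b", "", "   ", "cdef"], max_chars=3))
# ['a b', 'cde']
```

Order is preserved, so the first occurrence of a duplicated context is the one that is kept.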
|
|
254
|
+
|
|
255
|
+
### 2) Korean/Non-English Support (Prompt Language Alignment)
|
|
256
|
+
- **Korean datasets are detected automatically**, after which Korean prompts are applied by default to `answer_relevancy` and `factual_correctness`. (`src/evalvault/domain/services/evaluator.py`)
|
|
257
|
+
- **User prompt overrides**: per-metric prompts can be overridden with YAML when needed. (`src/evalvault/domain/services/ragas_prompt_overrides.py`)
|
|
258
|
+
- **External evidence (non-English issues)**:
|
|
259
|
+
- https://github.com/explodinggradients/ragas/issues/1829
|
|
260
|
+
- https://github.com/explodinggradients/ragas/issues/402
|
|
261
|
+
- **Official documentation (addresses the language issue directly)**:
|
|
262
|
+
- https://docs.ragas.io/en/stable/howtos/customizations/metrics/_metrics_language_adaptation/
|
|
263
|
+
|
|
264
|
+
**Why**: if the question-generation/judging prompts are fixed to English, a language mismatch on non-English input can distort scores; aligning the prompt language minimizes that.
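
One simple way to detect a Korean dataset is to measure the share of Hangul syllables among alphabetic characters. The sketch below is an assumption about how such detection could work, not the logic in `evaluator.py`: `hangul_ratio`, `looks_korean`, and the 0.3 cutoff are all hypothetical.

```python
from collections.abc import Iterable


def hangul_ratio(text: str) -> float:
    """Share of alphabetic characters in the Hangul Syllables block (U+AC00..U+D7A3)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hangul = sum(1 for ch in letters if "\uac00" <= ch <= "\ud7a3")
    return hangul / len(letters)


def looks_korean(texts: Iterable[str], cutoff: float = 0.3) -> bool:
    """Heuristic: treat the texts as Korean when enough letters are Hangul.

    The cutoff is an illustrative value chosen for this sketch.
    """
    return hangul_ratio(" ".join(texts)) >= cutoff


print(looks_korean(["보험금 청구 방법은?"]))      # True
print(looks_korean(["How do I file a claim?"]))  # False
```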
|
|
265
|
+
|
|
266
|
+
### 3) Score Postprocessing (Stability)
|
|
267
|
+
- **Non-numeric/NaN scores are set to 0.0**: guards against a metric failure halting the whole pipeline. (`src/evalvault/domain/services/evaluator.py`)
|
|
268
|
+
- **Faithfulness fallback**: when RAGAS fails or is unstable on Korean text, the score is reconstructed with a Korean-specific claim-level analysis. (`src/evalvault/domain/services/evaluator.py`)
|
|
269
|
+
|
|
270
|
+
**Why**: prevents results from being cut off by LLM/embedding failures or NaN, and secures a minimum level of reliability for Korean.
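
The NaN/non-numeric rule above can be illustrated with a small coercion function. This is a sketch of the idea only: `sanitize_score` is a hypothetical name, and treating booleans as non-numeric is a choice made for this example rather than documented EvalVault behavior.

```python
import math


def sanitize_score(value: object) -> float:
    """Return a usable float score, mapping non-numeric and NaN values to 0.0."""
    # bool is a subclass of int in Python; this sketch treats it as non-numeric.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 0.0
    score = float(value)
    return 0.0 if math.isnan(score) else score


raw_scores = {"faithfulness": float("nan"), "answer_relevancy": 0.75, "context_recall": None}
clean_scores = {name: sanitize_score(value) for name, value in raw_scores.items()}
print(clean_scores)
# {'faithfulness': 0.0, 'answer_relevancy': 0.75, 'context_recall': 0.0}
```

Mapping failures to 0.0 keeps the run going, but a failed metric is then indistinguishable from a genuinely low score unless failures are also logged separately.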
|
|
271
|
+
|
|
272
|
+
### 4) Summary/Visualization Postprocessing (Comparability)
|
|
273
|
+
- **Threshold-based normalization**: normalizes scores so that the threshold is the zero point, which makes improvement and regression easy to read. (`src/evalvault/domain/services/visual_space_service.py`)
|
|
274
|
+
- **Weighted sum**: combines `faithfulness`, `factual_correctness`, `answer_relevancy`, and others with weights to summarize them as axes/indicators. (`src/evalvault/domain/services/visual_space_service.py`)
|
|
275
|
+
|
|
276
|
+
**Why**: a single metric is often hard to interpret on its own, so results are provided as comparable summary scores alongside the policy criteria (thresholds).
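
To make the two steps above concrete, here is one possible reading of them: subtract the threshold so that it becomes the zero point, then take a weighted mean. The exact formula and weights in `visual_space_service.py` are not given in this README, so the functions and weight values below are assumptions for illustration.

```python
def center_on_threshold(score: float, threshold: float) -> float:
    """Shift a score so the threshold maps to 0 (positive means above the bar)."""
    return score - threshold


def weighted_summary(
    scores: dict[str, float],
    thresholds: dict[str, float],
    weights: dict[str, float],
) -> float:
    """Weighted mean of threshold-centered scores over metrics present in all three maps."""
    names = [n for n in scores if n in thresholds and n in weights]
    total_weight = sum(weights[n] for n in names)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[n] * center_on_threshold(scores[n], thresholds[n]) for n in names)
    return weighted / total_weight


summary = weighted_summary(
    scores={"faithfulness": 0.9, "answer_relevancy": 0.6},
    thresholds={"faithfulness": 0.8, "answer_relevancy": 0.7},
    weights={"faithfulness": 2.0, "answer_relevancy": 1.0},  # hypothetical weights
)
print(round(summary, 4))
```

A positive summary means the weighted metrics sit above their thresholds on balance; a negative one means they fall short.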
|
|
277
|
+
|
|
278
|
+
---
|
|
279
|
+
|
|
280
|
+
## Model/Profile Configuration (Summary)
|
|
281
|
+
|
|
282
|
+
- Profile definitions: `config/models.yaml`
|
|
283
|
+
- Common environment variables (examples):
|
|
284
|
+
- `EVALVAULT_PROFILE`
|
|
285
|
+
- `EVALVAULT_DB_PATH`
|
|
286
|
+
- `OPENAI_API_KEY` or `OLLAMA_BASE_URL`, etc.
|
|
287
|
+
- Related docs: `docs/guides/USER_GUIDE.md`, `docs/guides/DEV_GUIDE.md`, `config/models.yaml`
|
|
288
|
+
|
|
289
|
+
---
|
|
290
|
+
|
|
291
|
+
## Open RAG Trace (Integrating External RAG Systems)
|
|
292
|
+
|
|
293
|
+
EvalVault provides the **Open RAG Trace** schema, built on OpenTelemetry + OpenInference, so that external and internal RAG systems can be instrumented, collected, and analyzed in the same way.
|
|
294
|
+
|
|
295
|
+
- Spec: `docs/architecture/open-rag-trace-spec.md`
|
|
296
|
+
- Collector: `docs/architecture/open-rag-trace-collector.md`
|
|
297
|
+
- Samples/internal wrapper: `docs/guides/open-rag-trace-samples.md`, `docs/guides/open-rag-trace-internal-adapter.md`
|
|
298
|
+
- Related docs: `docs/INDEX.md`, `docs/architecture/open-rag-trace-collector.md`
|
|
299
|
+
|
|
300
|
+
---
|
|
301
|
+
|
|
302
|
+
## Development/Contributing
|
|
303
|
+
|
|
304
|
+
```bash
|
|
305
|
+
uv run ruff check src/ tests/
|
|
306
|
+
uv run ruff format src/ tests/
|
|
307
|
+
uv run pytest tests -v
|
|
308
|
+
```
|
|
309
|
+
|
|
310
|
+
- Contributing guide: `CONTRIBUTING.md`
|
|
311
|
+
- Development routine: `docs/guides/DEV_GUIDE.md`
|
|
312
|
+
- Related docs: `docs/STATUS.md`, `docs/ROADMAP.md`
|
|
313
|
+
|
|
314
|
+
---
|
|
315
|
+
|
|
316
|
+
## Documentation
|
|
317
|
+
|
|
318
|
+
- `docs/INDEX.md`: docs hub
|
|
319
|
+
- `docs/STATUS.md`, `docs/ROADMAP.md`: current status/direction
|
|
320
|
+
- `docs/guides/USER_GUIDE.md`: comprehensive usage/operations guide
|
|
321
|
+
- `docs/new_whitepaper/INDEX.md`: design/operations/quality standards (expert perspective)
|
|
322
|
+
|
|
323
|
+
---
|
|
324
|
+
|
|
325
|
+
## License
|
|
326
|
+
|
|
327
|
+
EvalVault is licensed under the [Apache 2.0](LICENSE.md) license.
|
|
@@ -361,8 +361,11 @@ On top of these, `StageMetricService` derives **pipeline-stage metrics** such as
|
|
|
361
361
|
---
|
|
362
362
|
|
|
363
363
|
## Documentation
|
|
364
|
+
- [Docs Index](docs/INDEX.md): documentation hub.
|
|
364
365
|
- [User Guide](docs/guides/USER_GUIDE.md): installation, configuration, CLI recipes, Web UI, Phoenix, automation.
|
|
365
|
-
- [
|
|
366
|
+
- [Dev Guide](docs/guides/DEV_GUIDE.md): local dev/test/lint routines.
|
|
367
|
+
- [Developer Whitepaper](docs/new_whitepaper/INDEX.md): architecture, operations, and engineering standards.
|
|
368
|
+
- [Open RAG Trace Spec](docs/architecture/open-rag-trace-spec.md): tracing schema and integration guide.
|
|
366
369
|
- [CHANGELOG](CHANGELOG.md) for release history.
|
|
367
370
|
|
|
368
371
|
---
|
|
@@ -0,0 +1,211 @@
|
|
|
1
|
+
# EvalVault
|
|
2
|
+
|
|
3
|
+
EvalVault is a CLI + Web UI platform that ties **evaluation (Eval) → analysis (Analysis) → tracing (Tracing) → improvement loop** into a single workflow for RAG (Retrieval-Augmented Generation) systems.
|
|
4
|
+
|
|
5
|
+
[](https://pypi.org/project/evalvault/)
|
|
6
|
+
[](https://www.python.org/downloads/)
|
|
7
|
+
[](https://github.com/ntts9990/EvalVault/actions/workflows/ci.yml)
|
|
8
|
+
[](LICENSE.md)
|
|
9
|
+
|
|
10
|
+
English version? See `README.en.md`.
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## Quick Links
|
|
15
|
+
|
|
16
|
+
- Docs hub: `docs/INDEX.md`
|
|
17
|
+
- User guide: `docs/guides/USER_GUIDE.md`
|
|
18
|
+
- Developer guide: `docs/guides/DEV_GUIDE.md`
|
|
19
|
+
- Status/roadmap: `docs/STATUS.md`, `docs/ROADMAP.md`
|
|
20
|
+
- Developer whitepaper (design/operations/quality standards): `docs/new_whitepaper/INDEX.md`
|
|
21
|
+
- Open RAG Trace: `docs/architecture/open-rag-trace-spec.md`
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
## Problems EvalVault Solves
|
|
26
|
+
|
|
27
|
+
Operating a RAG system eventually comes down to the following questions.
|
|
28
|
+
|
|
29
|
+
- “I changed the model/prompt/retriever, but **did it really get better?**”
|
|
30
|
+
- “If it improved, **why** did it improve, and if it got worse, **where** did it break?”
|
|
31
|
+
- “Can the team and CI keep verifying this conclusion **reproducibly**?”
|
|
32
|
+
|
|
33
|
+
EvalVault is designed to answer these questions in one pass from the perspective of **dataset + metrics + (optional) tracing**.
|
|
34
|
+
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
## Core Concepts
|
|
38
|
+
|
|
39
|
+
- **Run as the unit**: evaluation, analysis, artifacts, and traces are tied together under a single `run_id`.
|
|
40
|
+
- **Dataset-centric**: thresholds (pass criteria) are stored in the dataset, so each domain keeps its own pass criteria.
|
|
41
|
+
- **Artifacts-first**: beyond the summary report, the raw results (artifacts) of each analysis module are preserved in a structured directory.
|
|
42
|
+
- **Optional observability**: Phoenix/Langfuse/MLflow are enabled only when needed, which keeps the execution path as simple as possible.
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
## 3-Minute Quickstart (CLI)
|
|
47
|
+
|
|
48
|
+
```bash
|
|
49
|
+
uv sync --extra dev
|
|
50
|
+
cp .env.example .env
|
|
51
|
+
|
|
52
|
+
uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
|
|
53
|
+
--metrics faithfulness,answer_relevancy \
|
|
54
|
+
--profile dev \
|
|
55
|
+
--db data/db/evalvault.db \
|
|
56
|
+
--auto-analyze
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
- Results are stored in the `--db` database and reused by `history`, the Web UI, and comparison analysis.
|
|
60
|
+
- `--auto-analyze` generates the summary report together with per-module artifacts.
|
|
61
|
+
|
|
62
|
+
---
|
|
63
|
+
|
|
64
|
+
## Web UI (FastAPI + React)
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
# API
|
|
68
|
+
uv run evalvault serve-api --reload
|
|
69
|
+
|
|
70
|
+
# Frontend
|
|
71
|
+
cd frontend
|
|
72
|
+
npm install
|
|
73
|
+
npm run dev
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
Open `http://localhost:5173` in a browser, then check runs, history, and reports in Evaluation Studio.
|
|
77
|
+
|
|
78
|
+
---
|
|
79
|
+
|
|
80
|
+
## Artifact Paths
|
|
81
|
+
|
|
82
|
+
- Single-run automatic analysis:
|
|
83
|
+
- Summary JSON: `reports/analysis/analysis_<RUN_ID>.json`
|
|
84
|
+
- Report: `reports/analysis/analysis_<RUN_ID>.md`
|
|
85
|
+
- Artifact index: `reports/analysis/artifacts/analysis_<RUN_ID>/index.json`
|
|
86
|
+
- Per-node results: `reports/analysis/artifacts/analysis_<RUN_ID>/<node_id>.json`
|
|
87
|
+
|
|
88
|
+
- A/B comparison analysis:
|
|
89
|
+
- Summary JSON: `reports/comparison/comparison_<RUN_A>_<RUN_B>.json`
|
|
90
|
+
- Report: `reports/comparison/comparison_<RUN_A>_<RUN_B>.md`
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
## Dataset Format (Summary)
|
|
95
|
+
|
|
96
|
+
```json
|
|
97
|
+
{
|
|
98
|
+
"name": "insurance-qa",
|
|
99
|
+
"version": "1.0.0",
|
|
100
|
+
"thresholds": { "faithfulness": 0.8 },
|
|
101
|
+
"test_cases": [
|
|
102
|
+
{
|
|
103
|
+
"id": "tc-001",
|
|
104
|
+
"question": "...",
|
|
105
|
+
"answer": "...",
|
|
106
|
+
"contexts": ["..."]
|
|
107
|
+
}
|
|
108
|
+
]
|
|
109
|
+
}
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
- Required fields: `id`, `question`, `answer`, `contexts`
|
|
113
|
+
- `ground_truth` is required by some metrics.
|
|
114
|
+
- Templates: `docs/templates/dataset_template.json`, `docs/templates/dataset_template.csv`, `docs/templates/dataset_template.xlsx`
|
|
115
|
+
- Related docs: `docs/guides/USER_GUIDE.md`
|
|
116
|
+
|
|
117
|
+
---
|
|
118
|
+
|
|
119
|
+
## Supported Metrics (Representative)
|
|
120
|
+
|
|
121
|
+
- Ragas family: `faithfulness`, `answer_relevancy`, `context_precision`, `context_recall`, `factual_correctness`, `semantic_similarity`
|
|
122
|
+
- Custom example (domain-specific): `insurance_term_accuracy`
|
|
123
|
+
|
|
124
|
+
Exact options and operational recipes are kept up to date in `docs/guides/USER_GUIDE.md`.
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## RAGAS 0.4.2 Data Preprocessing/Postprocessing (Important)
|
|
129
|
+
|
|
130
|
+
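The items below are the processing steps EvalVault performs, **as of RAGAS 0.4.2**, to stabilize data and scores. All of them are deliberate design choices to ensure reproducibility and prevent quality degradation.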
|
|
131
|
+
|
|
132
|
+
### 1) Data Preprocessing (Input Stabilization)
|
|
133
|
+
- **Remove empty questions/answers/contexts**: cases that cannot be evaluated are removed up front. (`src/evalvault/domain/services/dataset_preprocessor.py`)
|
|
134
|
+
- **Context normalization**: standardizes context quality through whitespace cleanup, deduplication, and length limits. (`src/evalvault/domain/services/dataset_preprocessor.py`)
|
|
135
|
+
- **Reference completion**: when a metric requires a reference and it is missing, one is filled in from the question/answer/contexts. (`src/evalvault/domain/services/dataset_preprocessor.py`)
|
|
136
|
+
|
|
137
|
+
**Why**: prevents variation in input quality from widening the variance of RAGAS scores, and reduces metric execution failures and distortion.
|
|
138
|
+
|
|
139
|
+
### 2) Korean/Non-English Support (Prompt Language Alignment)
|
|
140
|
+
- **Korean datasets are detected automatically**, after which Korean prompts are applied by default to `answer_relevancy` and `factual_correctness`. (`src/evalvault/domain/services/evaluator.py`)
|
|
141
|
+
- **User prompt overrides**: per-metric prompts can be overridden with YAML when needed. (`src/evalvault/domain/services/ragas_prompt_overrides.py`)
|
|
142
|
+
- **External evidence (non-English issues)**:
|
|
143
|
+
- https://github.com/explodinggradients/ragas/issues/1829
|
|
144
|
+
- https://github.com/explodinggradients/ragas/issues/402
|
|
145
|
+
- **Official documentation (addresses the language issue directly)**:
|
|
146
|
+
- https://docs.ragas.io/en/stable/howtos/customizations/metrics/_metrics_language_adaptation/
|
|
147
|
+
|
|
148
|
+
**Why**: if the question-generation/judging prompts are fixed to English, a language mismatch on non-English input can distort scores; aligning the prompt language minimizes that.
|
|
149
|
+
|
|
150
|
+
### 3) Score Postprocessing (Stability)
|
|
151
|
+
- **Non-numeric/NaN scores are set to 0.0**: guards against a metric failure halting the whole pipeline. (`src/evalvault/domain/services/evaluator.py`)
|
|
152
|
+
- **Faithfulness fallback**: when RAGAS fails or is unstable on Korean text, the score is reconstructed with a Korean-specific claim-level analysis. (`src/evalvault/domain/services/evaluator.py`)
|
|
153
|
+
|
|
154
|
+
**Why**: prevents results from being cut off by LLM/embedding failures or NaN, and secures a minimum level of reliability for Korean.
|
|
155
|
+
|
|
156
|
+
### 4) Summary/Visualization Postprocessing (Comparability)
|
|
157
|
+
- **Threshold-based normalization**: normalizes scores so that the threshold is the zero point, which makes improvement and regression easy to read. (`src/evalvault/domain/services/visual_space_service.py`)
|
|
158
|
+
- **Weighted sum**: combines `faithfulness`, `factual_correctness`, `answer_relevancy`, and others with weights to summarize them as axes/indicators. (`src/evalvault/domain/services/visual_space_service.py`)
|
|
159
|
+
|
|
160
|
+
**Why**: a single metric is often hard to interpret on its own, so results are provided as comparable summary scores alongside the policy criteria (thresholds).
|
|
161
|
+
|
|
162
|
+
---
|
|
163
|
+
|
|
164
|
+
## Model/Profile Configuration (Summary)
|
|
165
|
+
|
|
166
|
+
- Profile definitions: `config/models.yaml`
|
|
167
|
+
- Common environment variables (examples):
|
|
168
|
+
- `EVALVAULT_PROFILE`
|
|
169
|
+
- `EVALVAULT_DB_PATH`
|
|
170
|
+
- `OPENAI_API_KEY` or `OLLAMA_BASE_URL`, etc.
|
|
171
|
+
- Related docs: `docs/guides/USER_GUIDE.md`, `docs/guides/DEV_GUIDE.md`, `config/models.yaml`
|
|
172
|
+
|
|
173
|
+
---
|
|
174
|
+
|
|
175
|
+
## Open RAG Trace (Integrating External RAG Systems)
|
|
176
|
+
|
|
177
|
+
EvalVault provides the **Open RAG Trace** schema, built on OpenTelemetry + OpenInference, so that external and internal RAG systems can be instrumented, collected, and analyzed in the same way.
|
|
178
|
+
|
|
179
|
+
- Spec: `docs/architecture/open-rag-trace-spec.md`
|
|
180
|
+
- Collector: `docs/architecture/open-rag-trace-collector.md`
|
|
181
|
+
- Samples/internal wrapper: `docs/guides/open-rag-trace-samples.md`, `docs/guides/open-rag-trace-internal-adapter.md`
|
|
182
|
+
- Related docs: `docs/INDEX.md`, `docs/architecture/open-rag-trace-collector.md`
|
|
183
|
+
|
|
184
|
+
---
|
|
185
|
+
|
|
186
|
+
## Development/Contributing
|
|
187
|
+
|
|
188
|
+
```bash
|
|
189
|
+
uv run ruff check src/ tests/
|
|
190
|
+
uv run ruff format src/ tests/
|
|
191
|
+
uv run pytest tests -v
|
|
192
|
+
```
|
|
193
|
+
|
|
194
|
+
- Contributing guide: `CONTRIBUTING.md`
|
|
195
|
+
- Development routine: `docs/guides/DEV_GUIDE.md`
|
|
196
|
+
- Related docs: `docs/STATUS.md`, `docs/ROADMAP.md`
|
|
197
|
+
|
|
198
|
+
---
|
|
199
|
+
|
|
200
|
+
## Documentation
|
|
201
|
+
|
|
202
|
+
- `docs/INDEX.md`: docs hub
|
|
203
|
+
- `docs/STATUS.md`, `docs/ROADMAP.md`: current status/direction
|
|
204
|
+
- `docs/guides/USER_GUIDE.md`: comprehensive usage/operations guide
|
|
205
|
+
- `docs/new_whitepaper/INDEX.md`: design/operations/quality standards (expert perspective)
|
|
206
|
+
|
|
207
|
+
---
|
|
208
|
+
|
|
209
|
+
## License
|
|
210
|
+
|
|
211
|
+
EvalVault is licensed under the [Apache 2.0](LICENSE.md) license.
|
|
@@ -27,7 +27,7 @@ evalvault (PyPI Package) agent/ (Development Only)
|
|
|
27
27
|
|
|
28
28
|
### Development Mode (This Folder)
|
|
29
29
|
|
|
30
|
-
Agents for improving EvalVault codebase based on
|
|
30
|
+
Agents for improving the EvalVault codebase based on the current roadmap and engineering standards (see `docs/ROADMAP.md`, `docs/new_whitepaper/INDEX.md`):
|
|
31
31
|
|
|
32
32
|
| Agent Type | Focus | P-Levels |
|
|
33
33
|
|------------|-------|----------|
|
|
@@ -280,9 +280,9 @@ cat agent/memory/shared/decisions.md | tail -50
|
|
|
280
280
|
uv run python main.py --project-dir .. --agent-type coordinator
|
|
281
281
|
```
|
|
282
282
|
|
|
283
|
-
## Integration with
|
|
283
|
+
## Integration with Project Docs
|
|
284
284
|
|
|
285
|
-
The agent system follows the
|
|
285
|
+
The agent system follows the project documentation and current engineering standards (see `docs/INDEX.md`):
|
|
286
286
|
|
|
287
287
|
| Priority | Agent | Tasks |
|
|
288
288
|
|----------|-------|-------|
|
|
@@ -298,10 +298,10 @@ The agent system follows the improvement plan in `docs/IMPROVEMENT_PLAN.md`:
|
|
|
298
298
|
|
|
299
299
|
- [Claude Agent SDK Docs](https://platform.claude.com/docs/en/agent-sdk/overview)
|
|
300
300
|
- [Effective Harnesses](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
|
|
301
|
-
- [
|
|
301
|
+
- [Docs Index](../docs/INDEX.md)
|
|
302
|
+
- [Developer Whitepaper](../docs/new_whitepaper/INDEX.md)
|
|
303
|
+
- [Open RAG Trace Spec](../docs/architecture/open-rag-trace-spec.md)
|
|
302
304
|
- [Agent Types Configuration](../src/evalvault/config/agent_types.py)
|
|
303
|
-
- [Agent Strategy](../docs/AGENT_STRATEGY.md)
|
|
304
|
-
- [Improvement Plan](../docs/IMPROVEMENT_PLAN.md)
|
|
305
305
|
- [nonstop-agent](https://github.com/seolcoding/nonstop-agent)
|
|
306
306
|
|
|
307
307
|
---
|
|
@@ -7,8 +7,8 @@
|
|
|
7
7
|
|
|
8
8
|
| Document | Purpose |
|
|
9
9
|
|------|------|
|
|
10
|
-
| [
|
|
11
|
-
| [
|
|
10
|
+
| [docs/INDEX.md](../../../docs/INDEX.md) | Project docs hub (latest links) |
|
|
11
|
+
| [Developer Whitepaper](../../../docs/new_whitepaper/INDEX.md) | Design/operations/quality standards |
|
|
12
12
|
| [agent/README.md](../../README.md) | How to use the agent system |
|
|
13
13
|
|
|
14
14
|
---
|
|
@@ -137,8 +137,8 @@ architecture (Storage Adapter)┘
|
|
|
137
137
|
| `src/evalvault/config/settings.py` | `architecture` | - | Affects all agents |
|
|
138
138
|
| `src/evalvault/ports/outbound/tracker_port.py` | `observability` | `rag-data` | Share schema changes |
|
|
139
139
|
| `src/evalvault/domain/entities/result.py` | `architecture` | - | Check test impact |
|
|
140
|
-
| `docs/
|
|
141
|
-
| `docs/
|
|
140
|
+
| `docs/INDEX.md` | `coordinator` | All | Sync when docs structure/links change |
|
|
141
|
+
| `docs/new_whitepaper/INDEX.md` | `coordinator` | All | Sync when design/operations standards change |
|
|
142
142
|
| `agent/memory/shared/decisions.md` | All | - | Follow ADR format |
|
|
143
143
|
|
|
144
144
|
### Shared Namespaces
|