evalvault 1.72.0__tar.gz → 1.72.1__tar.gz
- {evalvault-1.72.0 → evalvault-1.72.1}/.gitignore +7 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/PKG-INFO +1 -1
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/INDEX.md +1 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/00_overview.md +106 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/01_architecture.md +50 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/02_data_and_metrics.md +74 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/03_workflows.md +58 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/04_operations.md +67 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/05_security.md +46 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/06_quality_and_testing.md +45 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/07_ux_and_product.md +56 -0
- evalvault-1.72.1/docs/handbook/CHAPTERS/08_roadmap.md +25 -0
- evalvault-1.72.1/docs/handbook/EXTERNAL.md +22 -0
- evalvault-1.72.1/docs/handbook/INDEX.md +26 -0
- evalvault-1.72.1/docs/handbook/appendix-coverage-matrix.md +1818 -0
- evalvault-1.72.1/docs/handbook/appendix-file-inventory.md +3703 -0
- evalvault-1.72.1/docs/handbook/appendix-roadmap.md +66 -0
- evalvault-1.72.1/docs/handbook/appendix-taxonomy.md +116 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/mkdocs.yml +17 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/pyproject.toml +1 -1
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/settings.py +2 -1
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_model_config.py +1 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/uv.lock +1 -1
- evalvault-1.72.0/data/raw/The Complete Guide to Mastering Suno Advanced Strategies for Professional Music Generation.md +0 -1184
- evalvault-1.72.0/data/raw/sample_rag_knowledge.txt +0 -11
- evalvault-1.72.0/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/final_output.json +0 -146
- evalvault-1.72.0/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/index.json +0 -38
- evalvault-1.72.0/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/load_runs.json +0 -362
- evalvault-1.72.0/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/report.json +0 -148
- evalvault-1.72.0/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_change_detection.json +0 -31
- evalvault-1.72.0/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_metric_comparison.json +0 -76
- evalvault-1.72.0/reports/comparison/artifacts/comparison_8f825b22_4516d358/final_output.json +0 -155
- evalvault-1.72.0/reports/comparison/artifacts/comparison_8f825b22_4516d358/index.json +0 -38
- evalvault-1.72.0/reports/comparison/artifacts/comparison_8f825b22_4516d358/load_runs.json +0 -1219
- evalvault-1.72.0/reports/comparison/artifacts/comparison_8f825b22_4516d358/report.json +0 -157
- evalvault-1.72.0/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_change_detection.json +0 -37
- evalvault-1.72.0/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_metric_comparison.json +0 -76
- evalvault-1.72.0/reports/comparison/artifacts/comparison_f1287e90_8f825b22/final_output.json +0 -150
- evalvault-1.72.0/reports/comparison/artifacts/comparison_f1287e90_8f825b22/index.json +0 -38
- evalvault-1.72.0/reports/comparison/artifacts/comparison_f1287e90_8f825b22/load_runs.json +0 -789
- evalvault-1.72.0/reports/comparison/artifacts/comparison_f1287e90_8f825b22/report.json +0 -152
- evalvault-1.72.0/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_change_detection.json +0 -42
- evalvault-1.72.0/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_metric_comparison.json +0 -76
- evalvault-1.72.0/reports/comparison/comparison_0aa9fab0_9fbf4776.json +0 -787
- evalvault-1.72.0/reports/comparison/comparison_0aa9fab0_9fbf4776.md +0 -82
- evalvault-1.72.0/reports/comparison/comparison_0aa9fab0_f1287e90.json +0 -775
- evalvault-1.72.0/reports/comparison/comparison_0aa9fab0_f1287e90.md +0 -96
- evalvault-1.72.0/reports/comparison/comparison_8f825b22_4516d358.json +0 -1656
- evalvault-1.72.0/reports/comparison/comparison_8f825b22_4516d358.md +0 -107
- evalvault-1.72.0/reports/comparison/comparison_9fbf4776_a491fa0e.json +0 -595
- evalvault-1.72.0/reports/comparison/comparison_9fbf4776_a491fa0e.md +0 -16
- evalvault-1.72.0/reports/comparison/comparison_f1287e90_8f825b22.json +0 -1221
- evalvault-1.72.0/reports/comparison/comparison_f1287e90_8f825b22.md +0 -88
- evalvault-1.72.0/tests/fixtures/e2e/edge_cases.json +0 -83
- evalvault-1.72.0/tests/fixtures/e2e/run_mode_full_domain_memory.json +0 -37
- {evalvault-1.72.0 → evalvault-1.72.1}/.dockerignore +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/.env.example +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/.env.offline.example +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/.github/workflows/ci.yml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/.github/workflows/regression-gate.yml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/.github/workflows/release.yml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/.github/workflows/stale.yml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/.pre-commit-config.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/.python-version +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/AGENTS.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/CHANGELOG.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/CLAUDE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/CODE_OF_CONDUCT.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/CONTRIBUTING.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/Dockerfile +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/LICENSE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/README.en.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/SECURITY.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/agent.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/client.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/config.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/main.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/memory/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/memory/shared/decisions.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/memory/shared/dependencies.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/memory/templates/coordinator_guide.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/memory/templates/work_log_template.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/memory_integration.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/progress.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/app_spec.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/baseline.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/coding_prompt.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/existing_project_prompt.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/improvement/architecture_prompt.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/improvement/base_prompt.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/improvement/coordinator_prompt.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/improvement/observability_prompt.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/initializer_prompt.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/prompt_manifest.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts/system.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/prompts.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/requirements.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/agent/security.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/domains/insurance/memory.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/domains/insurance/terms_dictionary_en.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/domains/insurance/terms_dictionary_ko.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/methods.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/models.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/ragas_prompts_override.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/regressions/ci.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/regressions/default.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/regressions/ux.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/stage_metric_playbook.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/config/stage_metric_thresholds.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/dummy_test_dataset.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/insurance_qa_korean.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/insurance_qa_korean.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/insurance_qa_korean_2.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/insurance_qa_korean_3.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/ragas_ko90_en10.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/sample.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/visualization_20q_cluster_map.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/visualization_20q_korean.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/visualization_2q_cluster_map.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/datasets/visualization_2q_korean.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/kg/knowledge_graph.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/data/rag/user_guide_bm25.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/dataset_templates/dataset_template.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/dataset_templates/dataset_template.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/dataset_templates/dataset_template.xlsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/dataset_templates/method_input_template.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docker-compose.langfuse.yml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docker-compose.offline.yml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docker-compose.phoenix.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docker-compose.yml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/README.ko.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/ROADMAP.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/STATUS.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/api/adapters/inbound.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/api/adapters/outbound.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/api/config.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/api/domain/entities.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/api/domain/metrics.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/api/domain/services.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/api/ports/inbound.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/api/ports/outbound.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/architecture/open-rag-trace-collector.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/architecture/open-rag-trace-spec.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/getting-started/INSTALLATION.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/AGENTS_SYSTEM_GUIDE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/CHAINLIT_INTEGRATION_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/CI_REGRESSION_GATE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/CLI_MCP_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/CLI_PARALLEL_FEATURES_SPEC.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/CLI_UX_REDESIGN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/DEV_GUIDE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/DOCS_REFRESH_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/EVALVAULT_RUN_EXCEL_SHEETS.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/EVALVAULT_WORK_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/EXTERNAL_TRACE_API_SPEC.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/Extension_2.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/Extension_Data_Difficulty_Profiling_Custom_Judge_Model.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/INSURANCE_SUMMARY_METRICS_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/LENA_MVP_IMPLEMENTATION_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/LENA_RAGAS_CALIBRATION_DEV_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/MULTITURN_EVAL_GUIDE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/NEXT_STEPS_EXECUTION_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/OFFLINE_DOCKER.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/OPEN_RAG_TRACE_INTERNAL_ADAPTER.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/OPEN_RAG_TRACE_SAMPLES.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/P0_P3_EXECUTION_REPORT.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/P1_P4_WORK_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/PARALLEL_WORK_APPROVAL_RULES.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/PRD_LENA.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/PROJECT_STATUS_AND_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/RAG_CLI_WORKFLOW_TEMPLATES.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/RAG_NOISE_REDUCTION_GUIDE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/RAG_PERFORMANCE_IMPLEMENTATION_LOG.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/RELEASE_CHECKLIST.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/USER_GUIDE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/WEBUI_CLI_ROLLOUT_PLAN.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/WORKLOG_LAST_2_DAYS.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/cli_process.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/prompt_suggestions_design.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/rag_human_feedback_calibration_implementation_plan.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/refactoring_strategy.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/guides/repeat_query.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/mapping/component-to-whitepaper.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/00_frontmatter.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/01_overview.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/02_architecture.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/03_data_flow.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/04_components.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/05_expert_lenses.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/06_implementation.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/07_advanced.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/08_customization.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/09_quality.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/10_performance.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/11_security.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/12_operations.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/13_standards.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/14_roadmap.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/INDEX.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/new_whitepaper/STYLE_GUIDE.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/REFAC_000_master_plan.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/REFAC_010_agent_playbook.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/REFAC_020_logging_policy.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/REFAC_030_phase0_responsibility_map.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/REFAC_040_wbs_parallel_plan.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/logs/phase-0-baseline.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/logs/phase-1-evaluator.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/logs/phase-2-cli-run.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/refactor/logs/phase-3-analysis.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/security_audit_worklog.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/stylesheets/extra.css +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/templates/dataset_template.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/templates/dataset_template.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/templates/dataset_template.xlsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/templates/eval_report_templates.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/templates/kg_template.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/templates/otel_openinference_trace_example.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/templates/ragas_dataset_example_ko90_en10.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/templates/retriever_docs_template.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/tools/generate-whitepaper.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/docs/web_ui_analysis_migration_plan.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/dummy_test_dataset.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/korean_rag/faithfulness_test.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/korean_rag/insurance_qa_100.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/korean_rag/keyword_extraction_test.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/korean_rag/retrieval_test.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/output/comparison.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/output/full_results.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/output/leaderboard.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/output/results_mteb.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/output/retrieval_result.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/benchmarks/run_korean_benchmark.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/kg_generator_demo.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/method_plugin_template/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/method_plugin_template/pyproject.toml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/method_plugin_template/src/method_plugin_template/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/method_plugin_template/src/method_plugin_template/methods.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/usecase/comprehensive_workflow_test.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/usecase/insurance_eval_dataset.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/examples/usecase/output/comprehensive_report.html +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/.env.example +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/.gitignore +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/Dockerfile +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/analysis-compare.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/analysis-lab.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/compare-runs.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/dashboard.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/domain-memory.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/evaluation-studio.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/judge-calibration.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/knowledge-base.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/mocks/intents.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/mocks/run_details.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/mocks/runs.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/e2e/run-details.spec.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/eslint.config.js +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/index.html +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/nginx.conf +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/package-lock.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/package.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/playwright.config.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/public/vite.svg +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/App.css +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/App.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/assets/react.svg +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/AnalysisNodeOutputs.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/InsightSpacePanel.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/Layout.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/MarkdownContent.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/PrioritySummaryPanel.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/SpaceLegend.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/SpacePlot2D.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/SpacePlot3D.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/StatusBadge.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/VirtualizedText.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/ai-elements/Conversation.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/ai-elements/Message.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/ai-elements/PromptInput.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/ai-elements/Response.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/components/ai-elements/index.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/config/ui.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/config.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/hooks/useInsightSpace.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/index.css +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/main.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/AnalysisCompareView.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/AnalysisLab.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/AnalysisResultView.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/Chat.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/CompareRuns.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/ComprehensiveAnalysis.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/CustomerReport.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/Dashboard.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/DomainMemory.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/EvaluationStudio.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/JudgeCalibration.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/KnowledgeBase.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/RunDetails.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/Settings.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/Visualization.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/pages/VisualizationHome.tsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/services/api.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/types/plotly.d.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/utils/format.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/utils/phoenix.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/utils/runAnalytics.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/utils/score.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/src/utils/summaryMetrics.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/tailwind.config.js +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/tsconfig.app.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/tsconfig.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/tsconfig.node.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/frontend/vite.config.ts +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/package-lock.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/prompts/system_override.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/.gitkeep +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/debug_report_r1_smoke.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/debug_report_r2_graphrag.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/debug_report_r2_graphrag_openai.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/debug_report_r3_bm25.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/debug_report_r3_bm25_langfuse3.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/debug_report_r3_dense_faiss.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/feature_verification_report.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/improvement_1d91a667-4288-4742-be3a-a8f5310c5140.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r2_graphrag_openai_stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r2_graphrag_openai_stage_report.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r2_graphrag_stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r2_graphrag_stage_report.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r3_bm25_langfuse2_stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r3_bm25_langfuse3_stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r3_bm25_langfuse_stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r3_bm25_phoenix_stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r3_bm25_stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r3_bm25_stage_report.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r3_dense_faiss_stage_events.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/r3_dense_faiss_stage_report.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/retrieval_benchmark_smoke_precision.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/retrieval_benchmark_smoke_precision_graphrag.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/reports/retrieval_benchmark_smoke_precision_multi.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/benchmark/download_kmmlu.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/ci/run_regression_gate.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/dev/open_rag_trace_demo.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/dev/open_rag_trace_integration_template.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/dev/otel-collector-config.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/dev/start_web_ui_with_phoenix.sh +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/dev/validate_open_rag_trace.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/dev/verify_dashboard_endpoint.sh +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/dev_seed_pipeline_results.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/analyzer/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/analyzer/ast_scanner.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/analyzer/confidence_scorer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/analyzer/graph_builder.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/analyzer/side_effect_detector.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/generate_api_docs.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/models/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/models/schema.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/renderer/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/docs/renderer/html_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/offline/bundle_datasets.sh +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/offline/export_images.sh +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/offline/import_images.sh +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/offline/restore_datasets.sh +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/offline/smoke_test.sh +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/ops/phoenix_watch.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/perf/backfill_langfuse_trace_url.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/perf/r3_dense_smoke.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/perf/r3_evalvault_run_dataset.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/perf/r3_retriever_docs.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/perf/r3_smoke_real.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/perf/r3_stage_events_sample.jsonl +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/pipeline_template_inspect.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/reports/generate_release_notes.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/run_with_timeout.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/test_full_evaluation.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/tests/run_regressions.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/tests/run_retriever_stage_report_smoke.sh +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/validate_tutorials.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/verify_ragas_compliance.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/scripts/verify_workflows.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/main.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/benchmark.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/calibration.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/chat.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/config.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/domain.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/knowledge.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/mcp.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/pipeline.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/api/routers/runs.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/app.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/agent.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/analyze.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/api.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/artifacts.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/benchmark.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/calibrate.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/calibrate_judge.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/compare.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/config.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/debug.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/domain.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/experiment.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/gate.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/generate.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/graph_rag.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/history.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/init.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/kg.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/langfuse.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/method.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/ops.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/phoenix.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/pipeline.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/profile_difficulty.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/prompts.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/regress.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/run.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/run_helpers.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/commands/stage.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/analysis_io.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/console.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/errors.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/formatters.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/options.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/presets.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/progress.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/cli/utils/validators.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/mcp/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/mcp/schemas.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/inbound/mcp/tools.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/analysis_report_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/base_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/bm25_searcher_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/causal_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/common.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/comparison_pipeline_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/comparison_report_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/data_loader_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/detailed_report_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/embedding_distribution_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/embedding_searcher_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/hybrid_rrf_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/hybrid_weighted_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/llm_report_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/low_performer_extractor_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/model_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/multiturn_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/network_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/nlp_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/pattern_detector_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/pipeline_factory.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/pipeline_helpers.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/priority_summary_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/ragas_evaluator_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/retrieval_benchmark_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/run_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/run_change_detector_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/run_comparator_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/run_loader_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/search_comparator_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/statistical_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/statistical_comparator_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/summary_report_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/trend_detector_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/analysis/verification_report_module.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/artifact_fs.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/benchmark/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/benchmark/lm_eval_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/cache/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/cache/hybrid_cache.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/cache/memory_cache.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/base.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/csv_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/excel_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/json_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/loader_factory.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/method_input_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/multiturn_json_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/streaming_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/templates.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/dataset/thresholds.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/debug/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/debug/report_renderer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/documents/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/documents/ocr/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/documents/ocr/paddleocr_backend.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/documents/pdf_extractor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/documents/versioned_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/domain_memory/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/domain_memory/domain_memory_schema.sql +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/domain_memory/sqlite_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/filesystem/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/filesystem/difficulty_profile_writer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/filesystem/ops_snapshot_writer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/improvement/insight_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/improvement/pattern_detector.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/improvement/playbook_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/improvement/stage_metric_playbook_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/judge_calibration_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/judge_calibration_reporter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/kg/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/kg/graph_rag_retriever.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/kg/networkx_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/kg/parallel_kg_builder.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/kg/query_strategies.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/anthropic_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/azure_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/base.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/factory.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/instructor_factory.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/llm_relation_augmenter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/ollama_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/openai_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/token_aware_chat.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/llm/vllm_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/methods/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/methods/baseline_oracle.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/methods/external_command.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/methods/registry.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/bm25_retriever.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/dense_retriever.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/document_chunker.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/hybrid_retriever.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/kiwi_tokenizer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/korean_evaluation.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/korean_stopwords.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/toolkit.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/nlp/korean/toolkit_factory.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/phoenix/sync_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/report/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/report/ci_report_formatter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/report/dashboard_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/report/llm_report_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/report/markdown_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/report/pr_comment_formatter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/retriever/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/retriever/graph_rag_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/storage/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/storage/base_sql.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/storage/benchmark_storage_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/storage/postgres_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/storage/postgres_schema.sql +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/storage/schema.sql +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/storage/sqlite_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracer/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracer/open_rag_log_handler.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_decorators.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracer/open_rag_trace_helpers.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracer/phoenix_tracer_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracker/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracker/langfuse_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracker/log_sanitizer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracker/mlflow_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/adapters/outbound/tracker/phoenix_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/agent_types.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/domain_config.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/instrumentation.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/langfuse_support.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/model_config.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/phoenix_support.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/playbooks/improvement_playbook.yaml +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/config/secret_manager.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/debug_ragas.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/debug_ragas_real.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/analysis.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/analysis_pipeline.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/benchmark.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/benchmark_run.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/dataset.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/debug.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/experiment.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/feedback.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/graph_rag.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/improvement.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/judge_calibration.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/kg.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/memory.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/method.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/multiturn.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/prompt.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/prompt_suggestion.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/rag_trace.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/result.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/entities/stage.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/analysis_registry.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/confidence.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/contextual_relevancy.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/entity_preservation.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/insurance.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/multiturn_metrics.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/no_answer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/registry.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/retrieval_rank.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/summary_accuracy.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/summary_needs_followup.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/summary_non_definitive.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/summary_risk_coverage.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/terms_dictionary.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/metrics/text_match.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/analysis_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/artifact_lint_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/async_batch_executor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/batch_executor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/benchmark_report_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/benchmark_runner.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/benchmark_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/cache_metrics.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/cluster_map_builder.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/custom_metric_snapshot.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/dataset_preprocessor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/debug_report_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/difficulty_profile_reporter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/difficulty_profiling_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/document_chunker.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/document_versioning.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/domain_learning_hook.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/embedding_overlay.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/entity_extractor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/evaluator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/experiment_comparator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/experiment_manager.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/experiment_reporter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/experiment_repository.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/experiment_statistics.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/graph_rag_experiment.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/holdout_splitter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/improvement_guide_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/intent_classifier.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/judge_calibration_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/kg_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/memory_aware_evaluator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/memory_based_analysis.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/method_runner.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/multiturn_evaluator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/ops_snapshot_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/pipeline_orchestrator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/pipeline_template_registry.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/prompt_candidate_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/prompt_manifest.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/prompt_registry.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/prompt_scoring_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/prompt_status.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/prompt_suggestion_reporter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/ragas_prompt_overrides.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/regression_gate_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/retrieval_metrics.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/retriever_context.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/run_comparison_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/satisfaction_calibration_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/stage_event_builder.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/stage_metric_guide_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/stage_metric_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/stage_summary_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/synthetic_qa_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/testset_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/threshold_profiles.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/unified_report_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/domain/services/visual_space_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/mkdocs_helpers.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/inbound/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/inbound/analysis_pipeline_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/inbound/evaluator_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/inbound/learning_hook_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/inbound/multiturn_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/inbound/web_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/analysis_cache_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/analysis_module_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/analysis_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/artifact_fs_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/benchmark_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/causal_analysis_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/comparison_pipeline_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/dataset_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/difficulty_profile_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/domain_memory_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/embedding_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/graph_retriever_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/improvement_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/intent_classifier_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/judge_calibration_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/korean_nlp_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/llm_factory_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/llm_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/method_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/nlp_analysis_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/ops_snapshot_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/relation_augmenter_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/report_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/stage_storage_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/storage_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/tracer_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/ports/outbound/tracker_port.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/reports/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/reports/release_notes.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/scripts/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/src/evalvault/scripts/regression_runner.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/conftest.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/README.md +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/benchmark/retrieval_ground_truth_min.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/benchmark/retrieval_ground_truth_multi.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/auto_insurance_qa_korean_full.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/callcenter_summary_5cases.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/comprehensive_dataset.json +0 -0
- {evalvault-1.72.0/data/raw → evalvault-1.72.1/tests/fixtures/e2e}/edge_cases.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/edge_cases.xlsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/evaluation_test_sample.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/graphrag_benchmark.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/graphrag_multi_sample.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/graphrag_retriever_docs.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/graphrag_smoke.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/insurance_document.txt +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/insurance_qa_english.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/insurance_qa_english.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/insurance_qa_english.xlsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/insurance_qa_korean.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/insurance_qa_korean.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/insurance_qa_korean.xlsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/insurance_qa_korean_versioned_pdf.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/multiturn_benchmark.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/regression_baseline.json +0 -0
- {evalvault-1.72.0/data/raw → evalvault-1.72.1/tests/fixtures/e2e}/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/run_mode_simple.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/e2e/summary_eval_minimal.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/kg/minimal_graph.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/sample_dataset.csv +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/sample_dataset.json +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/fixtures/sample_dataset.xlsx +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/benchmark/test_benchmark_service_integration.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/conftest.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_cli_integration.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_data_flow.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_e2e_scenarios.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_evaluation_flow.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_full_workflow.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_langfuse_flow.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_phoenix_flow.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_pipeline_api_contracts.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_storage_flow.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/integration/test_summary_eval_fixture.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/optional_deps.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/inbound/mcp/test_execute_tools.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/inbound/mcp/test_read_tools.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/documents/test_pdf_extractor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/documents/test_versioned_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/improvement/test_insight_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/improvement/test_pattern_detector.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/improvement/test_playbook_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/improvement/test_stage_metric_playbook_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/kg/test_graph_rag_retriever.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/kg/test_parallel_kg_builder.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/retriever/test_graph_rag_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/adapters/outbound/storage/test_benchmark_storage_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/config/test_phoenix_support.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/conftest.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_analysis_metric_registry.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_confidence.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_contextual_relevancy.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_entity_preservation.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_metric_registry.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_multiturn_metrics.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_no_answer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_retrieval_rank.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/metrics/test_text_match.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_cache_metrics.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_claim_level.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_dataset_preprocessor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_document_versioning.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_evaluator_comprehensive.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_holdout_splitter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_improvement_guide_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_judge_calibration_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_ops_snapshot_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_regression_gate_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_retrieval_metrics.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_retriever_context.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_stage_event_builder.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_stage_metric_guide_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/services/test_synthetic_qa_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/test_embedding_overlay.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/test_prompt_manifest.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/domain/test_prompt_status.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/reports/test_release_notes.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/scripts/test_regression_runner.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_agent_types.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_analysis_entities.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_analysis_modules.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_analysis_pipeline.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_analysis_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_anthropic_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_artifact_lint_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_async_batch_executor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_azure_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_benchmark_helpers.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_benchmark_runner.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_causal_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_ci_gate_cli.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_cli.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_cli_artifacts.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_cli_calibrate_judge.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_cli_domain.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_cli_init.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_cli_ops.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_cli_progress.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_cli_utils.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_data_loaders.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_difficulty_profiling_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_domain_config.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_domain_memory.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_entities.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_entities_kg.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_entity_extractor.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_evaluator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_experiment.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_hybrid_cache.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_instrumentation.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_insurance_metric.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_intent_classifier.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_kg_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_kg_networkx.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_kiwi_tokenizer.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_kiwi_warning_suppression.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_korean_dense.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_korean_evaluation.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_korean_retrieval.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_langfuse_tracker.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_llm_relation_augmenter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_lm_eval_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_markdown_report.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_memory_cache.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_memory_services.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_method_plugins.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_mlflow_tracker.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_nlp_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_nlp_entities.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_ollama_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_openai_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_phoenix_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_pipeline_orchestrator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_ports.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_postgres_storage.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_pr_comment_formatter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_prompt_candidate_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_rag_trace_entities.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_regress_cli.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_run_comparison_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_run_memory_helpers.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_run_mode_fixtures.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_settings.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_sqlite_storage.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_stage_cli.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_stage_event_schema.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_stage_metric_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_stage_storage.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_stage_summary_service.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_statistical_adapter.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_streaming_loader.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_summary_eval_fixture.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_testset_generator.py +0 -0
- {evalvault-1.72.0 → evalvault-1.72.1}/tests/unit/test_web_adapter.py +0 -0
|
@@ -56,6 +56,10 @@ reports/*.xml
|
|
|
56
56
|
reports/*.json
|
|
57
57
|
reports/analysis/
|
|
58
58
|
reports/analysis/**
|
|
59
|
+
reports/comparison/
|
|
60
|
+
reports/comparison/**
|
|
61
|
+
reports/presentation_materials_*.md
|
|
62
|
+
reports/ralph_loop_briefing.md
|
|
59
63
|
reports/assets/
|
|
60
64
|
reports/api-docs/
|
|
61
65
|
!reports/.gitkeep
|
|
@@ -67,6 +71,9 @@ data/e2e_results/
|
|
|
67
71
|
# HuggingFace tokenizer cache (lm-eval benchmarks)
|
|
68
72
|
data/tokenizers/
|
|
69
73
|
|
|
74
|
+
# Local raw data (should not be versioned)
|
|
75
|
+
data/raw/
|
|
76
|
+
|
|
70
77
|
# Local state data (should not be versioned)
|
|
71
78
|
data/cache/
|
|
72
79
|
data/db/
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: evalvault
|
|
3
|
-
Version: 1.72.0
|
|
3
|
+
Version: 1.72.1
|
|
4
4
|
Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
|
|
5
5
|
Project-URL: Homepage, https://github.com/ntts9990/EvalVault
|
|
6
6
|
Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
|
|
@@ -0,0 +1,106 @@
|
|
|
1
|
+
# 00. Overview
|
|
2
|
+
|
|
3
|
+
> Internal full edition (detailed). The public-facing summary is maintained separately in `docs/handbook/EXTERNAL.md`.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## TL;DR
|
|
8
|
+
|
|
9
|
+
- EvalVault links the **Evaluation → Analysis → Compare → improvement loop** per `run_id`.
|
|
10
|
+
- Run results persist in the DB and as artifacts, so they are reproducible; the Web UI picks them up immediately when it points at the same DB.
|
|
11
|
+
- Observability (Phoenix/Langfuse), standards (Open RAG Trace), learning (Domain Memory), and the analysis pipeline (DAG) are all **optional** and enabled only when needed.
|
|
12
|
+
|
|
13
|
+
## Mission (one sentence)
|
|
14
|
+
|
|
15
|
+
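Verify **reproducibly** whether a change to a RAG system is a **real improvement**, from the perspective of datasets, metrics, and (optionally) tracing, and provide a workflow that can also explain why and where things break.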
|
|
16
|
+
|
|
17
|
+
## Target users (3)
|
|
18
|
+
|
|
19
|
+
1) ML/platform/backend engineers who operate RAG systems
|
|
20
|
+
2) QA/PMs responsible for quality and regressions
|
|
21
|
+
3) External users who need repeated evaluation/benchmarking (consulting, solutions, customer PoCs)
|
|
22
|
+
|
|
23
|
+
## Core values (3)
|
|
24
|
+
|
|
25
|
+
1) Reproducibility: evaluations, analyses, artifacts, and traces are grouped per run and can be compared.
|
|
26
|
+
2) Diagnosability: the cause of a score change can be traced at the module, stage, and metric level.
|
|
27
|
+
3) Optional operations: observability tools such as Phoenix/Langfuse/MLflow are enabled only when needed.
|
|
28
|
+
|
|
29
|
+
## Non-goals(3)
|
|
30
|
+
|
|
31
|
+
1) It does not implement or host the RAG system itself.
|
|
32
|
+
2) It does not replace all quality assessment with a single score (multiple metrics, evidence-based).
|
|
33
|
+
3) It is not tied to a specific vendor or model (OpenAI/Ollama/vLLM, etc. are options).
|
|
34
|
+
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
## Core concepts (shared vocabulary)
|
|
38
|
+
|
|
39
|
+
- **run_id**: the single identifier of an evaluation run. Evaluation, analysis, artifacts, and traces are all keyed by it.
|
|
40
|
+
- **Artifacts**: summary reports and per-module raw results are stored separately.
|
|
41
|
+
- **Stages**: the input/retrieval/output stages are recorded as events and metrics so that root causes can be traced.
|
|
42
|
+
- **Profiles**: models and embeddings are switched via `config/models.yaml` and `.env`.
|
|
43
|
+
- **Analysis Pipeline**: runs analyses that explain the "why" using an intent-based DAG.
|
|
44
|
+
|
|
45
|
+
---
|
|
46
|
+
|
|
47
|
+
## Minimal run scenario (for internal developers)
|
|
48
|
+
|
|
49
|
+
```bash
|
|
50
|
+
uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
|
|
51
|
+
--metrics faithfulness,answer_relevancy \
|
|
52
|
+
--profile dev \
|
|
53
|
+
--db data/db/evalvault.db \
|
|
54
|
+
--auto-analyze
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
Representative outputs produced by this run:
|
|
58
|
+
|
|
59
|
+
- Summary JSON: `reports/analysis/analysis_<RUN_ID>.json`
|
|
60
|
+
- Report (Markdown): `reports/analysis/analysis_<RUN_ID>.md`
|
|
61
|
+
- Artifact index: `reports/analysis/artifacts/analysis_<RUN_ID>/index.json`
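
The three output locations listed above share one naming pattern keyed by the run ID. The following is a minimal Python sketch of that pattern, assuming the default `reports/analysis` base directory shown in the list; the helper name `expected_outputs` is hypothetical and is not part of EvalVault.

```python
from __future__ import annotations

from pathlib import Path


def expected_outputs(run_id: str, base: Path = Path("reports/analysis")) -> dict[str, Path]:
    """Return the report paths listed above for a run ID (illustrative helper only)."""
    stem = f"analysis_{run_id}"
    return {
        "summary_json": base / f"{stem}.json",
        "report_md": base / f"{stem}.md",
        "artifact_index": base / "artifacts" / stem / "index.json",
    }


if __name__ == "__main__":
    for name, path in expected_outputs("RUN123").items():
        # Only reports whether each file is present; nothing is created.
        print(f"{name}: {path} (exists={path.exists()})")
```

A check like this can be handy after `--auto-analyze` to confirm that all three outputs were written for the run in question.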
|
|
62
|
+
|
|
63
|
+
---
|
|
64
|
+
|
|
65
|
+
## CLI ↔ Web UI connection
|
|
66
|
+
|
|
67
|
+
```bash
|
|
68
|
+
# Terminal 1
|
|
69
|
+
uv run evalvault serve-api --reload
|
|
70
|
+
|
|
71
|
+
# Terminal 2
|
|
72
|
+
cd frontend
|
|
73
|
+
npm install
|
|
74
|
+
npm run dev
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
- When the CLI and the Web UI point at the **same DB path**, CLI run results appear in the Web UI right away.
|
|
78
|
+
|
|
79
|
+
---
|
|
80
|
+
|
|
81
|
+
## Document map (where to read next)
|
|
82
|
+
|
|
83
|
+
- Structure/boundaries: `01_architecture.md`
|
|
84
|
+
- Data/metrics: `02_data_and_metrics.md`
|
|
85
|
+
- Execution flows: `03_workflows.md`
|
|
86
|
+
- Operations runbook: `04_operations.md`
|
|
87
|
+
- Security boundaries: `05_security.md`
|
|
88
|
+
- Quality/testing: `06_quality_and_testing.md`
|
|
89
|
+
- UX/product: `07_ux_and_product.md`
|
|
90
|
+
- Roadmap: `08_roadmap.md`
|
|
91
|
+
|
|
92
|
+
## Supporting links (3+)
|
|
93
|
+
|
|
94
|
+
- Project definition/core concepts: `../../README.md`
|
|
95
|
+
- Status/constraints: `../STATUS.md`
|
|
96
|
+
- Roadmap: `../ROADMAP.md`
|
|
97
|
+
- Internal whitepaper (overview): `../new_whitepaper/01_overview.md`
|
|
98
|
+
- Documentation operating principles: `../INDEX.md`
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
## Expert review checklist
|
|
103
|
+
|
|
104
|
+
- [ ] Are run_id, artifacts, and traces explained as a single flow?
|
|
105
|
+
- [ ] Is the minimal run scenario reproducible?
|
|
106
|
+
- [ ] Are optional features (Phoenix/Langfuse/Domain Memory/DAG) described without implying they are required?
|
|
@@ -0,0 +1,50 @@
|
|
|
1
|
+
# 01. Architecture
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Understand EvalVault's hexagonal (Ports & Adapters) structure and lay out which boundaries must be preserved for extensions and replacements to be safe.
|
|
6
|
+
|
|
7
|
+
## Design principles
|
|
8
|
+
|
|
9
|
+
- The SSoT is `docs/new_whitepaper/02_architecture.md`; the implementation follows the document.
|
|
10
|
+
- Keep the domain pure and isolate infrastructure dependencies behind ports/adapters.
|
|
11
|
+
- Adapters must be replaceable as long as they conform to the port (contract).
|
|
12
|
+
- Configuration and runtime selection are handled through profiles/environment variables, not code.
|
|
13
|
+
|
|
14
|
+
## Code map (key paths)
|
|
15
|
+
|
|
16
|
+
- Domain entities/services: `src/evalvault/domain/`
|
|
17
|
+
- Ports (contracts): `src/evalvault/ports/`
|
|
18
|
+
- Adapters (integrations): `src/evalvault/adapters/`
|
|
19
|
+
- Runtime settings/profiles: `src/evalvault/config/`, `config/models.yaml`
|
|
20
|
+
|
|
21
|
+
## Boundaries and dependency rules
|
|
22
|
+
|
|
23
|
+
- Domain -> ports: dependency allowed; ports -> domain: interfaces only
|
|
24
|
+
- Adapters -> ports: depend on them; adapters -> domain: keep direct dependencies to a minimum
|
|
25
|
+
- Configuration/profiles are injected at runtime; no hard-coding
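
The dependency rules above can be sketched in a few lines of Python. Every name here (`LLMPort`, `EchoLLMAdapter`, `AnswerService`) is hypothetical and chosen only for illustration; EvalVault's actual port and adapter interfaces under `src/evalvault/ports/` and `src/evalvault/adapters/` may look different.

```python
from typing import Protocol


class LLMPort(Protocol):
    """Hypothetical outbound port: the contract the domain depends on."""

    def generate(self, prompt: str) -> str: ...


class EchoLLMAdapter:
    """Hypothetical adapter: satisfies the port without the domain knowing about it."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class AnswerService:
    """Hypothetical domain service: depends only on the port, never on a concrete adapter."""

    def __init__(self, llm: LLMPort) -> None:
        self._llm = llm  # injected at runtime rather than hard-coded

    def answer(self, question: str) -> str:
        return self._llm.generate(question)


service = AnswerService(EchoLLMAdapter())
print(service.answer("ping"))
```

Because the service sees only the port, swapping in a different adapter requires no change to domain code, which is the property the rules are meant to protect.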
|
|
26
|
+
|
|
27
|
+
## Extension/replacement guide
|
|
28
|
+
|
|
29
|
+
1) Define the port: `src/evalvault/ports/outbound/` or `src/evalvault/ports/inbound/`
|
|
30
|
+
2) Implement the adapter: `src/evalvault/adapters/outbound/` or `src/evalvault/adapters/inbound/`
|
|
31
|
+
3) Wire up configuration: `src/evalvault/config/` and `config/models.yaml`
|
|
32
|
+
|
|
33
|
+
Example extension points:
|
|
34
|
+
- LLM adapters: `src/evalvault/adapters/outbound/llm/`
|
|
35
|
+
- Trackers/observability: `src/evalvault/adapters/outbound/tracker/`
|
|
36
|
+
- Storage: `src/evalvault/adapters/outbound/storage/`
|
|
37
|
+
- Artifact FS: `src/evalvault/adapters/outbound/artifact_fs.py`
|
|
38
|
+
- Analysis pipeline: `src/evalvault/adapters/outbound/analysis/`
|
|
39
|
+
|
|
40
|
+
## Architecture flow (summary)
|
|
41
|
+
|
|
42
|
+
1) CLI/API input -> domain service call
|
|
43
|
+
2) Domain service -> access to the LLM/storage/tracker through ports
|
|
44
|
+
3) Run results -> stored by run_id and linked to analysis/reports
|
|
45
|
+
|
|
46
|
+
## References (sources)
|
|
47
|
+
|
|
48
|
+
- Internal whitepaper (SSoT): `../new_whitepaper/02_architecture.md`
|
|
49
|
+
- Architecture/standards: `../new_whitepaper/13_standards.md`
|
|
50
|
+
- Port/adapter docs: `../api/ports/inbound.md`, `../api/adapters/inbound.md`, `../api/adapters/outbound.md`
|
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
# 02. Data & Metrics
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Understand how dataset formats, metrics, thresholds, and artifacts connect.
|
|
6
|
+
|
|
7
|
+
## Dataset schema
|
|
8
|
+
|
|
9
|
+
- Standard schema: `../../src/evalvault/domain/entities/dataset.py`
|
|
10
|
+
- Templates: `../templates/` and `../../dataset_templates/`
|
|
11
|
+
|
|
12
|
+
Key fields:
|
|
13
|
+
- `test_cases[].question`, `test_cases[].answer`, `test_cases[].contexts`
|
|
14
|
+
- Optional fields: `test_cases[].ground_truth`, `test_cases[].metadata`
|
|
15
|
+
- Dataset-level `thresholds`: per-metric pass criteria
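
As a rough illustration of the fields listed above, the sketch below builds a one-case dataset as a Python dict and checks that each test case carries the three key fields. The values are made up, the helper `missing_fields` is hypothetical, and the real schema in `dataset.py` may require additional fields that are not listed here.

```python
import json

# Key per-case fields named in the list above.
REQUIRED_CASE_FIELDS = ("question", "answer", "contexts")

dataset = {
    "thresholds": {"faithfulness": 0.8},  # illustrative value, not a recommended default
    "test_cases": [
        {
            "question": "What does the policy cover?",
            "answer": "It covers collision damage.",
            "contexts": ["The policy covers collision damage."],
            "ground_truth": "Collision damage is covered.",  # optional field
            "metadata": {"source": "example"},  # optional field
        }
    ],
}


def missing_fields(data: dict) -> list[str]:
    """List key fields absent from each test case (simple structural check)."""
    problems = []
    for index, case in enumerate(data.get("test_cases", [])):
        for field in REQUIRED_CASE_FIELDS:
            if field not in case:
                problems.append(f"test_cases[{index}].{field}")
    return problems


print(json.dumps(dataset, indent=2))
print(missing_fields(dataset))
```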
|
|
16
|
+
|
|
17
|
+
Sample data:
|
|
18
|
+
- `../../tests/fixtures/sample_dataset.json`
|
|
19
|
+
- `../../tests/fixtures/e2e/insurance_qa_korean.json`
|
|
20
|
+
|
|
21
|
+
Example command:
|
|
22
|
+
- `uv run evalvault run tests/fixtures/e2e/insurance_qa_korean.json --metrics faithfulness --profile dev --db data/db/evalvault.db --auto-analyze`
|
|
23
|
+
|
|
24
|
+
## Threshold handling
|
|
25
|
+
|
|
26
|
+
Precedence (high -> low):
|
|
27
|
+
1) CLI override (`--thresholds`)
|
|
28
|
+
2) Dataset `thresholds`
|
|
29
|
+
3) Profile defaults (`threshold_profiles.py`)
|
|
30
|
+
4) Default fallback (domain service)
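
The precedence order above amounts to a first-match lookup across the sources. The sketch below shows that logic under stated assumptions: the function name `resolve_threshold` and the fallback value `0.7` are hypothetical, and the actual resolution code in EvalVault may be organized differently.

```python
from __future__ import annotations

from collections.abc import Mapping

DEFAULT_FALLBACK = 0.7  # hypothetical value; the real default lives in the domain service


def resolve_threshold(
    metric: str,
    cli: Mapping[str, float] | None = None,
    dataset: Mapping[str, float] | None = None,
    profile: Mapping[str, float] | None = None,
    fallback: float = DEFAULT_FALLBACK,
) -> float:
    """Return the first threshold found, checking sources from highest to lowest priority."""
    for source in (cli, dataset, profile):
        if source is not None and metric in source:
            return source[metric]
    return fallback


# The CLI override wins over the dataset value for the same metric.
print(resolve_threshold("faithfulness", cli={"faithfulness": 0.9}, dataset={"faithfulness": 0.8}))
```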
|
|
31
|
+
|
|
32
|
+
Related files:
|
|
33
|
+
- Profiles: `../../src/evalvault/domain/services/threshold_profiles.py`
|
|
34
|
+
- CSV/Excel threshold column mapping: `../../src/evalvault/adapters/outbound/dataset/thresholds.py`
|
|
35
|
+
- Result entities: `../../src/evalvault/domain/entities/result.py`
|
|
36
|
+
|
|
37
|
+
## Metric system
|
|
38
|
+
|
|
39
|
+
- Registry: `../../src/evalvault/domain/metrics/registry.py`
|
|
40
|
+
- Metric API docs: `../api/domain/metrics.md`
|
|
41
|
+
- Summary/domain metrics: `../../src/evalvault/domain/metrics/`
|
|
42
|
+
|
|
43
|
+
Dimensions:
|
|
44
|
+
- source: ragas/custom
|
|
45
|
+
- category: qa/summary/retrieval/domain
|
|
46
|
+
- requirement: whether ground_truth/embeddings are needed
|
|
47
|
+
|
|
48
|
+
Examples:
|
|
49
|
+
- QA: faithfulness, answer_relevancy, context_precision
|
|
50
|
+
- Summary: summary_score, summary_faithfulness, entity_preservation
|
|
51
|
+
- Retrieval: mrr, ndcg, hit_rate
|
|
52
|
+
|
|
53
|
+
## Artifacts and index.json
|
|
54
|
+
|
|
55
|
+
The analysis pipeline writes per-node JSON files and an `index.json` under `artifacts/`.
|
|
56
|
+
|
|
57
|
+
Related files:
|
|
58
|
+
- Artifact IO: `../../src/evalvault/adapters/inbound/cli/utils/analysis_io.py`
|
|
59
|
+
- FS port: `../../src/evalvault/ports/outbound/artifact_fs_port.py`
|
|
60
|
+
- FS implementation: `../../src/evalvault/adapters/outbound/artifact_fs.py`
|
|
61
|
+
- Artifact lint: `../../src/evalvault/domain/services/artifact_lint_service.py`
|
|
62
|
+
|
|
63
|
+
## Excel/reports
|
|
64
|
+
|
|
65
|
+
- Excel export spec: `../guides/EVALVAULT_RUN_EXCEL_SHEETS.md`
|
|
66
|
+
- DB export implementation: `../../src/evalvault/adapters/outbound/storage/base_sql.py`
|
|
67
|
+
- Report templates: `../templates/eval_report_templates.md`
|
|
68
|
+
|
|
69
|
+
## Reference paths
|
|
70
|
+
|
|
71
|
+
- User guide: `../guides/USER_GUIDE.md`
|
|
72
|
+
- CLI workflows: `../guides/RAG_CLI_WORKFLOW_TEMPLATES.md`
|
|
73
|
+
- Domain entities: `../../src/evalvault/domain/entities/`
|
|
74
|
+
- Metric implementations: `../../src/evalvault/domain/metrics/`
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
# 03. Workflows
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Understand EvalVault's main execution flow (evaluate → analyze → compare → report) from the CLI and Web UI perspectives.
|
|
6
|
+
|
|
7
|
+
## Basic execution flow
|
|
8
|
+
|
|
9
|
+
1) Run an evaluation with `evalvault run`
|
|
10
|
+
2) Store the results in the DB under a run_id
|
|
11
|
+
3) Generate analysis/reports with `--auto-analyze` or `evalvault analyze`
|
|
12
|
+
4) Compare reproducibly with `history`/`compare`/`analyze-compare`
|
|
13
|
+
|
|
14
|
+
Key identifiers:
|
|
15
|
+
- `run_id`: the single identifier that ties evaluation, analysis, and artifacts together
|
|
16
|
+
- `reports/analysis/artifacts/analysis_<RUN_ID>/index.json`: index of analysis evidence
|
|
17
|
+
|
|
18
|
+
## CLI-centric workflow
|
|
19
|
+
|
|
20
|
+
Run an evaluation:
|
|
21
|
+
- `evalvault run <DATASET> --metrics ... --profile dev --db data/db/evalvault.db --auto-analyze`
|
|
22
|
+
|
|
23
|
+
Analyze:
|
|
24
|
+
- `evalvault analyze <RUN_ID> --profile dev --db data/db/evalvault.db --nlp --causal --playbook`
|
|
25
|
+
- `evalvault pipeline analyze "<query>" --run-id <RUN_ID> --profile dev --db data/db/evalvault.db`
|
|
26
|
+
|
|
27
|
+
Compare:
|
|
28
|
+
- `evalvault compare <RUN_A> <RUN_B> --profile dev --db data/db/evalvault.db`
|
|
29
|
+
- `evalvault analyze-compare <RUN_A> <RUN_B> --profile dev --db data/db/evalvault.db --test t-test|mann-whitney`
|
|
30
|
+
|
|
31
|
+
Artifacts/validation:
|
|
32
|
+
- `evalvault artifacts lint reports/analysis/artifacts/analysis_<RUN_ID>`
|
|
33
|
+
|
|
34
|
+
## Analysis pipeline structure
|
|
35
|
+
|
|
36
|
+
- Entities/intents: `../../src/evalvault/domain/entities/analysis_pipeline.py`
|
|
37
|
+
- Template registry: `../../src/evalvault/domain/services/pipeline_template_registry.py`
|
|
38
|
+
- Orchestration: `../../src/evalvault/domain/services/pipeline_orchestrator.py`
|
|
39
|
+
- Module registration: `../../src/evalvault/adapters/outbound/analysis/pipeline_factory.py`
|
|
40
|
+
- CLI entry point: `../../src/evalvault/adapters/inbound/cli/commands/pipeline.py`
|
|
41
|
+
|
|
42
|
+
## Web UI integration flow
|
|
43
|
+
|
|
44
|
+
- The Web UI uses the same DB and stays in sync with the CLI via `run_id`.
|
|
45
|
+
- Key APIs:
|
|
46
|
+
- `GET /api/v1/runs/{run_id}`
|
|
47
|
+
- `GET /api/v1/runs/{run_id}/report`
|
|
48
|
+
- `GET /api/v1/runs/{run_id}/analysis-report`
|
|
49
|
+
- `GET /api/v1/runs/{run_id}/dashboard`
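
Since every endpoint above is keyed by `run_id`, the URLs for a given run can be derived mechanically. A small Python sketch follows; the base URL `http://localhost:8000` and the helper name `run_urls` are assumptions for illustration, so adjust the host and port to wherever `serve-api` actually listens.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumption: not stated in the handbook

# Path templates copied from the endpoint list above.
RUN_ENDPOINTS = {
    "run": "/api/v1/runs/{run_id}",
    "report": "/api/v1/runs/{run_id}/report",
    "analysis_report": "/api/v1/runs/{run_id}/analysis-report",
    "dashboard": "/api/v1/runs/{run_id}/dashboard",
}


def run_urls(run_id: str, base_url: str = BASE_URL) -> dict[str, str]:
    """Build the GET URLs for one run, percent-encoding the run ID for use in a path."""
    safe_id = quote(run_id, safe="")
    return {
        name: base_url.rstrip("/") + template.format(run_id=safe_id)
        for name, template in RUN_ENDPOINTS.items()
    }


for name, url in run_urls("RUN123").items():
    print(f"{name}: {url}")
```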
|
|
50
|
+
|
|
51
|
+
## References
|
|
52
|
+
|
|
53
|
+
- User guide: `../guides/USER_GUIDE.md`
|
|
54
|
+
- Workflow templates: `../guides/RAG_CLI_WORKFLOW_TEMPLATES.md`
|
|
55
|
+
- Diagnostic playbook: `../guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md`
|
|
56
|
+
- Data flow whitepaper: `../new_whitepaper/03_data_flow.md`
|
|
57
|
+
- API/web: `../../src/evalvault/adapters/inbound/api/`
|
|
58
|
+
- CLI: `../../src/evalvault/adapters/inbound/cli/`
|
|
@@ -0,0 +1,67 @@
|
|
|
1
|
+
# 04. Operations
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Summarize profiles/settings, execution environments (local/Docker), and observability options (Phoenix/Langfuse) from an operations perspective.
|
|
6
|
+
|
|
7
|
+
## Profiles/settings
|
|
8
|
+
|
|
9
|
+
- Profile definitions: `../../config/models.yaml`
|
|
10
|
+
- Runtime settings: `../../src/evalvault/config/settings.py`
|
|
11
|
+
- Environment templates: `../../.env.example`, `../../.env.offline.example`
|
|
12
|
+
|
|
13
|
+
Basic operating principles:
|
|
14
|
+
- Keep profiles and secrets separate (model definitions in git, secrets in env).
|
|
15
|
+
- Pin the runtime configuration with `EVALVAULT_PROFILE`.
|
|
16
|
+
|
|
17
|
+
## Execution environments
|
|
18
|
+
|
|
19
|
+
Local:
|
|
20
|
+
- Installation guide: `../getting-started/INSTALLATION.md`
|
|
21
|
+
- API server: `uv run evalvault serve-api --reload`
|
|
22
|
+
- Web UI: run `npm run dev` in `frontend`
|
|
23
|
+
|
|
24
|
+
Example commands:
|
|
25
|
+
- `cp .env.example .env`
|
|
26
|
+
- `uv sync --extra dev`
|
|
27
|
+
- `EVALVAULT_PROFILE=dev uv run evalvault serve-api --reload`
|
|
28
|
+
- `cd frontend && npm install && npm run dev`
|
|
29
|
+
|
|
30
|
+
Docker:
|
|
31
|
+
- Default stack: `../../docker-compose.yml`
|
|
32
|
+
- Offline stack: `../../docker-compose.offline.yml`
|
|
33
|
+
- Langfuse stack: `../../docker-compose.langfuse.yml`
|
|
34
|
+
- Phoenix + OTel: `../../docker-compose.phoenix.yaml`
|
|
35
|
+
|
|
36
|
+
Offline operation:
|
|
37
|
+
- Image export/import: `../../scripts/offline/`
|
|
38
|
+
- Offline guide: `../guides/OFFLINE_DOCKER.md`
|
|
39
|
+
|
|
40
|
+
## Observability/tracing
|
|
41
|
+
|
|
42
|
+
Optional components:
|
|
43
|
+
- Phoenix tracing: `../../src/evalvault/adapters/outbound/tracker/phoenix_adapter.py`
|
|
44
|
+
- Langfuse tracing: `../../src/evalvault/adapters/outbound/tracker/langfuse_adapter.py`
|
|
45
|
+
- MLflow tracking: `../../src/evalvault/adapters/outbound/tracker/mlflow_adapter.py`
|
|
46
|
+
|
|
47
|
+
Spec/collection:
|
|
48
|
+
- Open RAG Trace spec: `../architecture/open-rag-trace-spec.md`
|
|
49
|
+
- Collector: `../architecture/open-rag-trace-collector.md`
|
|
50
|
+
- Samples: `../guides/OPEN_RAG_TRACE_SAMPLES.md`
|
|
51
|
+
|
|
52
|
+
Related scripts:
|
|
53
|
+
- OTel Collector config: `../../scripts/dev/otel-collector-config.yaml`
|
|
54
|
+
- Phoenix monitoring: `../../scripts/ops/phoenix_watch.py`
|
|
55
|
+
|
|
56
|
+
## Operations Checklist
|
|
57
|
+
|
|
58
|
+
- Cross-check the DB, artifacts, and traces by `run_id`.
|
|
59
|
+
- Locate the analysis evidence in `reports/analysis/artifacts/analysis_<RUN_ID>/index.json`.
|
|
60
|
+
- Record the runtime environment with `evalvault ops snapshot`.
|
|
61
|
+
- Confirm that the Web UI and CLI point at the same DB.
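
The artifact lookup in the checklist can be sketched in Python as follows. The path layout `reports/analysis/artifacts/analysis_<RUN_ID>/index.json` comes from the checklist itself; the function names are hypothetical, and the contents of `index.json` are not assumed beyond being valid JSON.

```python
import json
from pathlib import Path
from typing import Any, Optional


def artifact_index_path(run_id: str, base_dir: str = "reports/analysis/artifacts") -> Path:
    """Build the index.json path for a run, following the layout in the checklist."""
    return Path(base_dir) / f"analysis_{run_id}" / "index.json"


def load_artifact_index(run_id: str, base_dir: str = "reports/analysis/artifacts") -> Optional[Any]:
    """Load the artifact index for a run, or return None if it does not exist."""
    path = artifact_index_path(run_id, base_dir)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # "RUN_ID_PLACEHOLDER" is a stand-in, not a real run id.
    print(artifact_index_path("RUN_ID_PLACEHOLDER").as_posix())
```

Returning `None` for a missing index makes it straightforward to flag runs whose artifacts were never written during a cross-check.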
|
|
62
|
+
|
|
63
|
+
## References
|
|
64
|
+
|
|
65
|
+
- Operations runbook (SSoT): `../new_whitepaper/12_operations.md`
|
|
66
|
+
- Offline guide: `../guides/OFFLINE_DOCKER.md`
|
|
67
|
+
- Config API: `../api/config.md`
|
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
# 05. Security
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Fixes the handling principles for secrets, keys, and sensitive data, and the boundary rules for the externally published summary.
|
|
6
|
+
|
|
7
|
+
## Core Principles
|
|
8
|
+
|
|
9
|
+
- Secrets are managed through `.env`/environment variables and are never committed to git.
|
|
10
|
+
- The external summary (`../EXTERNAL.md`) must not contain internal paths, operational procedures, real data, or figures.
|
|
11
|
+
|
|
12
|
+
## Secret Management
|
|
13
|
+
|
|
14
|
+
- `secret://` reference support: `../../src/evalvault/config/secret_manager.py`
|
|
15
|
+
- Runtime resolution/validation: `../../src/evalvault/config/settings.py`
|
|
16
|
+
- Environment template: `../../.env.example`
|
|
17
|
+
|
|
18
|
+
## API Authentication/Tokens
|
|
19
|
+
|
|
20
|
+
- API token authentication: `../../src/evalvault/adapters/inbound/api/main.py`
|
|
21
|
+
- Knowledge API read/write tokens: `../../src/evalvault/adapters/inbound/api/routers/knowledge.py`
|
|
22
|
+
- MCP token: `../../src/evalvault/adapters/inbound/api/routers/mcp.py`
|
|
23
|
+
|
|
24
|
+
## Logging/PII Masking
|
|
25
|
+
|
|
26
|
+
- Log sanitization/PII masking: `../../src/evalvault/adapters/outbound/tracker/log_sanitizer.py`
|
|
27
|
+
- Applied uniformly across the Phoenix/Langfuse/MLflow trackers
|
|
28
|
+
|
|
29
|
+
## Ops Snapshot/Redaction
|
|
30
|
+
|
|
31
|
+
- Environment snapshot redaction: `../../src/evalvault/domain/services/ops_snapshot_service.py`
|
|
32
|
+
- CLI: `evalvault ops snapshot --redact ...`
|
|
33
|
+
|
|
34
|
+
Example command:
|
|
35
|
+
- `uv run evalvault ops snapshot --redact OPENAI_API_KEY --redact LANGFUSE_SECRET_KEY --redact DATABASE_URL`
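
The idea behind `--redact` can be sketched as key-based masking of an environment snapshot. This is a hedged illustration, not the implementation in `ops_snapshot_service.py`: the function `redact_env` and the placeholder string are assumptions made for the example.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical placeholder; the real tool may use a different marker.
REDACTED = "***REDACTED***"


def redact_env(env: Mapping[str, str], keys: Iterable[str]) -> Dict[str, str]:
    """Return a copy of env with the values of the named keys masked."""
    targets = set(keys)
    return {
        name: (REDACTED if name in targets else value)
        for name, value in env.items()
    }


if __name__ == "__main__":
    snapshot = {"OPENAI_API_KEY": "sk-example", "EVALVAULT_PROFILE": "dev"}
    print(redact_env(snapshot, ["OPENAI_API_KEY", "DATABASE_URL"]))
```

Masking by key name rather than by value pattern keeps the behavior predictable: only the variables explicitly listed are hidden, and keys that are absent from the snapshot are ignored.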
|
|
36
|
+
|
|
37
|
+
## Configuration Security
|
|
38
|
+
|
|
39
|
+
- CORS/production validation: `../../src/evalvault/config/settings.py`
|
|
40
|
+
- Replacing the Langfuse compose secrets: `../../docker-compose.langfuse.yml`
|
|
41
|
+
|
|
42
|
+
## References
|
|
43
|
+
|
|
44
|
+
- Security document: `../../SECURITY.md`
|
|
45
|
+
- Internal whitepaper: `../new_whitepaper/11_security.md`
|
|
46
|
+
- Security audit log: `../security_audit_worklog.md`
|
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
# 06. Quality & Testing
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Explains the tests, regression gates, and quality criteria, and summarizes how to verify that a change is a real improvement.
|
|
6
|
+
|
|
7
|
+
## Quality Gate Overview
|
|
8
|
+
|
|
9
|
+
- CI baseline: `../../.github/workflows/ci.yml`
|
|
10
|
+
- Regression gate: `../../.github/workflows/regression-gate.yml`
|
|
11
|
+
- Regression run script: `../../scripts/ci/run_regression_gate.py`
|
|
12
|
+
|
|
13
|
+
## Test Layout
|
|
14
|
+
|
|
15
|
+
- pytest/ruff configuration: `../../pyproject.toml`
|
|
16
|
+
- Unit tests: `../../tests/unit/`
|
|
17
|
+
- Integration tests: `../../tests/integration/`
|
|
18
|
+
- E2E scenarios: `../../tests/integration/test_e2e_scenarios.py`
|
|
19
|
+
|
|
20
|
+
## Regression Gate Configuration
|
|
21
|
+
|
|
22
|
+
- Config files: `../../config/regressions/ci.json`, `../../config/regressions/default.json`, `../../config/regressions/ux.json`
|
|
23
|
+
- Service logic: `../../src/evalvault/domain/services/regression_gate_service.py`
|
|
24
|
+
- Runner: `../../src/evalvault/scripts/regression_runner.py`
|
|
25
|
+
|
|
26
|
+
## Standard Commands
|
|
27
|
+
|
|
28
|
+
Tests:
|
|
29
|
+
- `uv run pytest tests -v`
|
|
30
|
+
- `uv run pytest --cov=src --cov-report=term`
|
|
31
|
+
|
|
32
|
+
Lint/format:
|
|
33
|
+
- `uv run ruff check src/ tests/`
|
|
34
|
+
- `uv run ruff format src/ tests/`
|
|
35
|
+
|
|
36
|
+
Regression gate:
|
|
37
|
+
- `uv run python scripts/ci/run_regression_gate.py --config config/regressions/ci.json --format text`
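
As a rough illustration of what a regression gate checks, the Python sketch below compares candidate metric scores against a baseline and reports drops beyond a tolerance. It is not the logic of `regression_gate_service.py`, and it does not reflect the schema of `config/regressions/ci.json`; the function name, the `max_drop` parameter, and the sample scores are all assumptions made for this example.

```python
from typing import List, Mapping


def find_regressions(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    max_drop: float = 0.02,
) -> List[str]:
    """List metrics whose score dropped by more than max_drop, or went missing."""
    failures: List[str] = []
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is None:
            failures.append(f"{metric}: missing in candidate")
        elif base_score - new_score > max_drop:
            failures.append(f"{metric}: {base_score:.3f} -> {new_score:.3f}")
    return failures


if __name__ == "__main__":
    # Illustrative numbers only, not real evaluation results.
    problems = find_regressions({"faithfulness": 0.8}, {"faithfulness": 0.7})
    print(problems or "gate passed")
```

A gate built this way fails only on meaningful drops, so small run-to-run noise below the tolerance does not block a change.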
|
|
38
|
+
|
|
39
|
+
## References
|
|
40
|
+
|
|
41
|
+
- Developer guide: `../guides/DEV_GUIDE.md`
|
|
42
|
+
- Regression gate: `../guides/CI_REGRESSION_GATE.md`
|
|
43
|
+
- Release checklist: `../guides/RELEASE_CHECKLIST.md`
|
|
44
|
+
- Quality whitepaper: `../new_whitepaper/09_quality.md`
|
|
45
|
+
- Tests: `../../tests/`
|
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
# 07. UX & Product
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Describes the EvalVault experience from the user (product) perspective and unifies the intent and usage flows of the Web UI and CLI.
|
|
6
|
+
|
|
7
|
+
## Product Summary
|
|
8
|
+
|
|
9
|
+
- The default user flow connects evaluation→analysis→comparison around `run_id`.
|
|
10
|
+
- The Web UI presents the core CLI workflows visually.
|
|
11
|
+
|
|
12
|
+
## CLI <-> Web UI Mapping
|
|
13
|
+
|
|
14
|
+
- Run list: `history` -> Web UI run list
|
|
15
|
+
- Analysis lab: `analyze`, `analyze-compare`, `pipeline` -> analysis page
|
|
16
|
+
- Comparison view: `compare`, `analyze-compare` -> comparison page
|
|
17
|
+
- Output review: `artifacts lint`, `report` -> report/artifact view
|
|
18
|
+
|
|
19
|
+
Example flow:
|
|
20
|
+
- CLI run: `uv run evalvault run tests/fixtures/e2e/insurance_qa_korean.json --metrics faithfulness --profile dev --db data/db/evalvault.db --auto-analyze`
|
|
21
|
+
- Check in the Web UI: `http://localhost:5173` -> Dashboard -> Run Details -> Report/Dashboard
|
|
22
|
+
|
|
23
|
+
## Web UI Scope
|
|
24
|
+
|
|
25
|
+
- Plan/rollout: `../guides/WEBUI_CLI_ROLLOUT_PLAN.md`
|
|
26
|
+
- Analysis migration: `../web_ui_analysis_migration_plan.md`
|
|
27
|
+
- Frontend implementation: `../../frontend/src/`
|
|
28
|
+
|
|
29
|
+
Key locations:
|
|
30
|
+
- Pages: `../../frontend/src/pages/`
|
|
31
|
+
- Components: `../../frontend/src/components/`
|
|
32
|
+
- API integration: `../../frontend/src/services/api.ts`
|
|
33
|
+
|
|
34
|
+
Main pages:
|
|
35
|
+
- Dashboard: run list/filters/summary
|
|
36
|
+
- Evaluation Studio: run configuration/presets
|
|
37
|
+
- Analysis Lab: intent-based analysis runs
|
|
38
|
+
- Compare Runs: A/B comparison and metric changes
|
|
39
|
+
- Settings: profile/DB path configuration
|
|
40
|
+
|
|
41
|
+
## CLI-Only Features (Current State)
|
|
42
|
+
|
|
43
|
+
- The Web UI uses a fixed `top_k`; advanced tuning requires the CLI/API
|
|
44
|
+
- Dataset/experiment uploads and prompt manifests are CLI-first
|
|
45
|
+
|
|
46
|
+
## CLI UX Improvement Points
|
|
47
|
+
|
|
48
|
+
- Consolidating duplicate comparison commands: `../guides/CLI_UX_REDESIGN.md`
|
|
49
|
+
- Help/alias cleanup: `compare`/`analyze-compare`
|
|
50
|
+
|
|
51
|
+
## References
|
|
52
|
+
|
|
53
|
+
- Web UI plan/expansion: `../guides/WEBUI_CLI_ROLLOUT_PLAN.md`
|
|
54
|
+
- CLI UX improvements: `../guides/CLI_UX_REDESIGN.md`
|
|
55
|
+
- User guide: `../guides/USER_GUIDE.md`
|
|
56
|
+
- Frontend: `../../frontend/src/`
|
|
@@ -0,0 +1,25 @@
|
|
|
1
|
+
# 08. Roadmap
|
|
2
|
+
|
|
3
|
+
## Goal
|
|
4
|
+
|
|
5
|
+
Summarizes the current roadmap and links to the rationale for the purpose/priority realignment that emerged while writing the handbook.
|
|
6
|
+
|
|
7
|
+
## Priority Summary (P0-P3)
|
|
8
|
+
|
|
9
|
+
- P0 (stability/operations): profile validation, CI/test stabilization, baseline operational safeguards
|
|
10
|
+
- P1 (usability): core Web UI workflows, shared CLI/web conventions
|
|
11
|
+
- P2 (observability/standards): Open RAG Trace, Stage Events/Collector
|
|
12
|
+
- P3 (performance): retrieval/reranking/GraphRAG improvements
|
|
13
|
+
|
|
14
|
+
## Basis for Execution
|
|
15
|
+
|
|
16
|
+
- Status summary: `../STATUS.md`
|
|
17
|
+
- Official roadmap: `../ROADMAP.md`
|
|
18
|
+
- Whitepaper roadmap: `../new_whitepaper/14_roadmap.md`
|
|
19
|
+
- Realignment rationale (appendix): `../appendix-roadmap.md`
|
|
20
|
+
|
|
21
|
+
## Execution Records/Plans
|
|
22
|
+
|
|
23
|
+
- Execution report: `../guides/P0_P3_EXECUTION_REPORT.md`
|
|
24
|
+
- Work plan: `../guides/P1_P4_WORK_PLAN.md`
|
|
25
|
+
- Next steps: `../guides/NEXT_STEPS_EXECUTION_PLAN.md`
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
# EvalVault Handbook (External Summary)
|
|
2
|
+
|
|
3
|
+
> This is the summary for external publication. It does not include internal paths, operational procedures, real data, figures, or secrets.
|
|
4
|
+
|
|
5
|
+
## What EvalVault Does
|
|
6
|
+
|
|
7
|
+
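EvalVault is an evaluation and analysis workflow tool for RAG (Retrieval-Augmented Generation) systems:
|
|
8
|
+
it reproducibly verifies, with datasets and metrics, "whether a change is a real improvement",
|
|
9
|
+
and helps you understand, compare, and share the results.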
|
|
10
|
+
|
|
11
|
+
## Core Flow
|
|
12
|
+
|
|
13
|
+
1) Prepare the dataset
|
|
14
|
+
2) Run the metric evaluation
|
|
15
|
+
3) Summarize and compare the results
|
|
16
|
+
4) Analyze root causes (optional)
|
|
17
|
+
|
|
18
|
+
## Documentation
|
|
19
|
+
|
|
20
|
+
- Detailed internal handbook: `INDEX.md`
|
|
21
|
+
- User/operations guide: `../guides/USER_GUIDE.md`
|
|
22
|
+
- Status/roadmap: `../STATUS.md`, `../ROADMAP.md`
|
|
@@ -0,0 +1,26 @@
|
|
|
1
|
+
# EvalVault Handbook (Textbook-Style Compendium)
|
|
2
|
+
|
|
3
|
+
> The main text is written in detail for internal readers; the external summary is kept separately in `docs/handbook/EXTERNAL.md`.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Table of Contents
|
|
8
|
+
|
|
9
|
+
### Main Text (CHAPTERS)
|
|
10
|
+
|
|
11
|
+
1) `CHAPTERS/00_overview.md`
|
|
12
|
+
2) `CHAPTERS/01_architecture.md`
|
|
13
|
+
3) `CHAPTERS/02_data_and_metrics.md`
|
|
14
|
+
4) `CHAPTERS/03_workflows.md`
|
|
15
|
+
5) `CHAPTERS/04_operations.md`
|
|
16
|
+
6) `CHAPTERS/05_security.md`
|
|
17
|
+
7) `CHAPTERS/06_quality_and_testing.md`
|
|
18
|
+
8) `CHAPTERS/07_ux_and_product.md`
|
|
19
|
+
9) `CHAPTERS/08_roadmap.md`
|
|
20
|
+
|
|
21
|
+
### Appendices
|
|
22
|
+
|
|
23
|
+
- `appendix-file-inventory.md` (full inventory + close-reading evidence)
|
|
24
|
+
- `appendix-taxonomy.md` (document/whitepaper taxonomy + duplication/gap audit)
|
|
25
|
+
- `appendix-roadmap.md` (rationale for purpose/roadmap realignment)
|
|
26
|
+
- `appendix-coverage-matrix.md` (file-to-chapter mapping)
|