evalvault 1.64.0.tar.gz → 1.65.0.tar.gz
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their public registries.
- {evalvault-1.64.0 → evalvault-1.65.0}/PKG-INFO +1 -1
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/INDEX.md +3 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/ROADMAP.md +5 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/STATUS.md +1 -0
- evalvault-1.65.0/docs/guides/CLI_PARALLEL_FEATURES_SPEC.md +315 -0
- evalvault-1.65.0/docs/guides/Extension_2.md +114 -0
- evalvault-1.65.0/docs/guides/Extension_Data_Difficulty_Profiling_Custom_Judge_Model.md +1412 -0
- evalvault-1.65.0/docs/guides/PARALLEL_WORK_APPROVAL_RULES.md +51 -0
- evalvault-1.65.0/docs/guides/PROJECT_STATUS_AND_PLAN.md +291 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md +77 -2
- evalvault-1.65.0/docs/guides/RAG_NOISE_REDUCTION_GUIDE.md +284 -0
- evalvault-1.65.0/docs/guides/RAG_PERFORMANCE_IMPLEMENTATION_LOG.md +179 -0
- evalvault-1.65.0/docs/guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md +477 -0
- evalvault-1.65.0/docs/guides/refactoring_strategy.md +63 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/01_overview.md +1 -0
- evalvault-1.65.0/docs/refactor/REFAC_000_master_plan.md +161 -0
- evalvault-1.65.0/docs/refactor/REFAC_010_agent_playbook.md +83 -0
- evalvault-1.65.0/docs/refactor/REFAC_020_logging_policy.md +61 -0
- evalvault-1.65.0/docs/refactor/REFAC_030_phase0_responsibility_map.md +82 -0
- evalvault-1.65.0/docs/refactor/REFAC_040_wbs_parallel_plan.md +117 -0
- evalvault-1.65.0/docs/refactor/logs/phase-0-baseline.md +26 -0
- evalvault-1.65.0/docs/refactor/logs/phase-1-evaluator.md +26 -0
- evalvault-1.65.0/docs/refactor/logs/phase-2-cli-run.md +26 -0
- evalvault-1.65.0/docs/refactor/logs/phase-3-analysis.md +26 -0
- evalvault-1.65.0/docs/templates/eval_report_templates.md +117 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/AnalysisLab.tsx +36 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/CompareRuns.tsx +42 -1
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/RunDetails.tsx +55 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/pyproject.toml +1 -1
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/__init__.py +14 -7
- evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/artifacts.py +107 -0
- evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/calibrate_judge.py +283 -0
- evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/compare.py +290 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/history.py +13 -85
- evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/ops.py +110 -0
- evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/profile_difficulty.py +160 -0
- evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/regress.py +251 -0
- evalvault-1.65.0/src/evalvault/adapters/outbound/analysis/comparison_pipeline_adapter.py +49 -0
- evalvault-1.65.0/src/evalvault/adapters/outbound/artifact_fs.py +16 -0
- evalvault-1.65.0/src/evalvault/adapters/outbound/filesystem/__init__.py +3 -0
- evalvault-1.65.0/src/evalvault/adapters/outbound/filesystem/difficulty_profile_writer.py +50 -0
- evalvault-1.65.0/src/evalvault/adapters/outbound/filesystem/ops_snapshot_writer.py +13 -0
- evalvault-1.65.0/src/evalvault/adapters/outbound/judge_calibration_adapter.py +36 -0
- evalvault-1.65.0/src/evalvault/adapters/outbound/judge_calibration_reporter.py +57 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/langfuse_adapter.py +12 -7
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/phoenix_adapter.py +39 -12
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/__init__.py +10 -0
- evalvault-1.65.0/src/evalvault/domain/entities/judge_calibration.py +50 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/stage.py +11 -3
- evalvault-1.65.0/src/evalvault/domain/services/artifact_lint_service.py +268 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/benchmark_runner.py +1 -6
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/dataset_preprocessor.py +26 -0
- evalvault-1.65.0/src/evalvault/domain/services/difficulty_profile_reporter.py +25 -0
- evalvault-1.65.0/src/evalvault/domain/services/difficulty_profiling_service.py +304 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/evaluator.py +2 -0
- evalvault-1.65.0/src/evalvault/domain/services/judge_calibration_service.py +495 -0
- evalvault-1.65.0/src/evalvault/domain/services/ops_snapshot_service.py +159 -0
- evalvault-1.65.0/src/evalvault/domain/services/regression_gate_service.py +199 -0
- evalvault-1.65.0/src/evalvault/domain/services/run_comparison_service.py +159 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/stage_event_builder.py +6 -1
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/stage_metric_service.py +83 -18
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/__init__.py +4 -0
- evalvault-1.65.0/src/evalvault/ports/outbound/artifact_fs_port.py +12 -0
- evalvault-1.65.0/src/evalvault/ports/outbound/comparison_pipeline_port.py +22 -0
- evalvault-1.65.0/src/evalvault/ports/outbound/difficulty_profile_port.py +15 -0
- evalvault-1.65.0/src/evalvault/ports/outbound/judge_calibration_port.py +22 -0
- evalvault-1.65.0/src/evalvault/ports/outbound/ops_snapshot_port.py +8 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_dataset_preprocessor.py +45 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_evaluator_comprehensive.py +34 -2
- evalvault-1.65.0/tests/unit/domain/services/test_judge_calibration_service.py +95 -0
- evalvault-1.65.0/tests/unit/domain/services/test_ops_snapshot_service.py +52 -0
- evalvault-1.65.0/tests/unit/domain/services/test_regression_gate_service.py +68 -0
- evalvault-1.65.0/tests/unit/test_artifact_lint_service.py +68 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_cli.py +110 -20
- evalvault-1.65.0/tests/unit/test_cli_artifacts.py +37 -0
- evalvault-1.65.0/tests/unit/test_cli_calibrate_judge.py +111 -0
- evalvault-1.65.0/tests/unit/test_cli_ops.py +14 -0
- evalvault-1.65.0/tests/unit/test_difficulty_profiling_service.py +120 -0
- evalvault-1.65.0/tests/unit/test_regress_cli.py +104 -0
- evalvault-1.65.0/tests/unit/test_run_comparison_service.py +86 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_stage_metric_service.py +58 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/uv.lock +1 -1
- {evalvault-1.64.0 → evalvault-1.65.0}/.dockerignore +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/.env.example +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/.github/workflows/ci.yml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/.github/workflows/release.yml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/.github/workflows/stale.yml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/.gitignore +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/.pre-commit-config.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/.python-version +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/AGENTS.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/CHANGELOG.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/CLAUDE.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/CODE_OF_CONDUCT.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/CONTRIBUTING.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/Dockerfile +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/LICENSE.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/README.en.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/SECURITY.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/agent.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/client.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/config.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/main.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/memory/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/memory/shared/decisions.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/memory/shared/dependencies.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/memory/templates/coordinator_guide.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/memory/templates/work_log_template.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/memory_integration.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/progress.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/app_spec.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/baseline.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/coding_prompt.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/existing_project_prompt.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/improvement/architecture_prompt.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/improvement/base_prompt.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/improvement/coordinator_prompt.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/improvement/observability_prompt.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/initializer_prompt.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/prompt_manifest.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts/system.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/prompts.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/requirements.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/agent/security.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/domains/insurance/memory.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/domains/insurance/terms_dictionary_en.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/domains/insurance/terms_dictionary_ko.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/methods.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/models.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/ragas_prompts_override.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/regressions/default.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/regressions/ux.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/stage_metric_playbook.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/config/stage_metric_thresholds.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/dummy_test_dataset.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/insurance_qa_korean.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/insurance_qa_korean.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/insurance_qa_korean_2.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/insurance_qa_korean_3.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/ragas_ko90_en10.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/sample.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/visualization_20q_cluster_map.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/visualization_20q_korean.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/visualization_2q_cluster_map.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/datasets/visualization_2q_korean.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/kg/knowledge_graph.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/raw/The Complete Guide to Mastering Suno Advanced Strategies for Professional Music Generation.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/raw/edge_cases.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/raw/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/data/raw/sample_rag_knowledge.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/dataset_templates/dataset_template.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/dataset_templates/dataset_template.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/dataset_templates/dataset_template.xlsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/dataset_templates/method_input_template.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docker-compose.langfuse.yml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docker-compose.phoenix.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docker-compose.yml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/README.ko.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/api/adapters/inbound.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/api/adapters/outbound.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/api/config.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/api/domain/entities.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/api/domain/metrics.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/api/domain/services.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/api/ports/inbound.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/api/ports/outbound.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/architecture/open-rag-trace-collector.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/architecture/open-rag-trace-spec.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/getting-started/INSTALLATION.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/AGENTS_SYSTEM_GUIDE.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/CLI_MCP_PLAN.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/DEV_GUIDE.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/EVALVAULT_RUN_EXCEL_SHEETS.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/EVALVAULT_WORK_PLAN.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/EXTERNAL_TRACE_API_SPEC.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/LENA_MVP_IMPLEMENTATION_PLAN.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/LENA_RAGAS_CALIBRATION_DEV_PLAN.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/OPEN_RAG_TRACE_INTERNAL_ADAPTER.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/OPEN_RAG_TRACE_SAMPLES.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/PRD_LENA.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/RELEASE_CHECKLIST.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/USER_GUIDE.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/WEBUI_CLI_ROLLOUT_PLAN.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/prompt_suggestions_design.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/guides/rag_human_feedback_calibration_implementation_plan.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/mapping/component-to-whitepaper.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/00_frontmatter.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/02_architecture.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/03_data_flow.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/04_components.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/05_expert_lenses.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/06_implementation.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/07_advanced.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/08_customization.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/09_quality.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/10_performance.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/11_security.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/12_operations.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/13_standards.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/14_roadmap.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/INDEX.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/new_whitepaper/STYLE_GUIDE.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/security_audit_worklog.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/stylesheets/extra.css +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/templates/dataset_template.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/templates/dataset_template.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/templates/dataset_template.xlsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/templates/kg_template.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/templates/otel_openinference_trace_example.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/templates/ragas_dataset_example_ko90_en10.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/templates/retriever_docs_template.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/tools/generate-whitepaper.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/docs/web_ui_analysis_migration_plan.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/dummy_test_dataset.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/korean_rag/faithfulness_test.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/korean_rag/insurance_qa_100.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/korean_rag/keyword_extraction_test.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/korean_rag/retrieval_test.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/output/comparison.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/output/full_results.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/output/leaderboard.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/output/results_mteb.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/output/retrieval_result.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/benchmarks/run_korean_benchmark.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/kg_generator_demo.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/method_plugin_template/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/method_plugin_template/pyproject.toml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/method_plugin_template/src/method_plugin_template/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/method_plugin_template/src/method_plugin_template/methods.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/usecase/comprehensive_workflow_test.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/usecase/insurance_eval_dataset.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/examples/usecase/output/comprehensive_report.html +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/.env.example +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/.gitignore +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/analysis-compare.spec.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/analysis-lab.spec.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/compare-runs.spec.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/dashboard.spec.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/domain-memory.spec.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/evaluation-studio.spec.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/knowledge-base.spec.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/mocks/intents.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/mocks/run_details.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/mocks/runs.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/e2e/run-details.spec.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/eslint.config.js +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/index.html +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/package-lock.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/package.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/playwright.config.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/public/vite.svg +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/App.css +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/App.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/assets/react.svg +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/AnalysisNodeOutputs.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/InsightSpacePanel.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/Layout.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/MarkdownContent.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/PrioritySummaryPanel.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/SpaceLegend.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/SpacePlot2D.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/SpacePlot3D.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/StatusBadge.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/components/VirtualizedText.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/config/ui.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/config.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/hooks/useInsightSpace.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/index.css +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/main.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/AnalysisCompareView.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/AnalysisResultView.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/ComprehensiveAnalysis.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/CustomerReport.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/Dashboard.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/DomainMemory.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/EvaluationStudio.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/KnowledgeBase.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/Settings.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/Visualization.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/pages/VisualizationHome.tsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/services/api.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/types/plotly.d.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/utils/format.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/utils/phoenix.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/utils/runAnalytics.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/utils/score.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/src/utils/summaryMetrics.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/tailwind.config.js +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/tsconfig.app.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/tsconfig.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/tsconfig.node.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/frontend/vite.config.ts +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/mkdocs.yml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/package-lock.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/prompts/system_override.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/.gitkeep +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/final_output.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/index.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/load_runs.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/report.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_change_detection.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_metric_comparison.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/final_output.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/index.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/load_runs.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/report.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_change_detection.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_metric_comparison.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/final_output.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/index.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/load_runs.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/report.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_change_detection.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_metric_comparison.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_0aa9fab0_f1287e90.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_0aa9fab0_f1287e90.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_8f825b22_4516d358.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_8f825b22_4516d358.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_9fbf4776_a491fa0e.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_9fbf4776_a491fa0e.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_f1287e90_8f825b22.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/comparison/comparison_f1287e90_8f825b22.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/debug_report_r1_smoke.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/debug_report_r2_graphrag.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/debug_report_r2_graphrag_openai.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/debug_report_r3_bm25.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/debug_report_r3_bm25_langfuse3.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/debug_report_r3_dense_faiss.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/improvement_1d91a667-4288-4742-be3a-a8f5310c5140.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r2_graphrag_openai_stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r2_graphrag_openai_stage_report.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r2_graphrag_stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r2_graphrag_stage_report.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r3_bm25_langfuse2_stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r3_bm25_langfuse3_stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r3_bm25_langfuse_stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r3_bm25_phoenix_stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r3_bm25_stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r3_bm25_stage_report.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r3_dense_faiss_stage_events.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/r3_dense_faiss_stage_report.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/retrieval_benchmark_smoke_precision.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/retrieval_benchmark_smoke_precision_graphrag.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/reports/retrieval_benchmark_smoke_precision_multi.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/benchmark/download_kmmlu.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/dev/open_rag_trace_demo.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/dev/open_rag_trace_integration_template.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/dev/otel-collector-config.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/dev/start_web_ui_with_phoenix.sh +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/dev/validate_open_rag_trace.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/dev_seed_pipeline_results.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/analyzer/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/analyzer/ast_scanner.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/analyzer/confidence_scorer.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/analyzer/graph_builder.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/analyzer/side_effect_detector.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/generate_api_docs.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/models/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/models/schema.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/renderer/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/docs/renderer/html_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/ops/phoenix_watch.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/perf/backfill_langfuse_trace_url.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/perf/r3_dense_smoke.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/perf/r3_evalvault_run_dataset.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/perf/r3_retriever_docs.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/perf/r3_smoke_real.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/perf/r3_stage_events_sample.jsonl +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/pipeline_template_inspect.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/reports/generate_release_notes.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/run_with_timeout.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/test_full_evaluation.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/tests/run_regressions.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/tests/run_retriever_stage_report_smoke.sh +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/validate_tutorials.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/verify_ragas_compliance.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/scripts/verify_workflows.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/main.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/benchmark.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/config.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/domain.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/knowledge.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/pipeline.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/runs.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/app.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/agent.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/analyze.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/api.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/benchmark.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/calibrate.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/config.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/debug.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/domain.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/experiment.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/gate.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/generate.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/init.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/kg.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/langfuse.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/method.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/phoenix.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/pipeline.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/prompts.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/run.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/run_helpers.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/stage.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/analysis_io.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/console.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/errors.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/formatters.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/options.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/presets.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/progress.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/validators.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/mcp/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/mcp/schemas.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/inbound/mcp/tools.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/analysis_report_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/base_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/bm25_searcher_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/causal_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/common.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/comparison_report_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/data_loader_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/detailed_report_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/embedding_distribution_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/embedding_searcher_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/hybrid_rrf_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/hybrid_weighted_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/llm_report_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/low_performer_extractor_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/model_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/network_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/nlp_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/pattern_detector_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/pipeline_factory.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/pipeline_helpers.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/priority_summary_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/ragas_evaluator_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/retrieval_benchmark_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_change_detector_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_comparator_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_loader_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/search_comparator_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/statistical_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/statistical_comparator_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/summary_report_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/trend_detector_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/verification_report_module.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/benchmark/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/benchmark/lm_eval_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/cache/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/cache/hybrid_cache.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/cache/memory_cache.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/base.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/csv_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/excel_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/json_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/loader_factory.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/method_input_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/streaming_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/templates.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/thresholds.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/debug/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/debug/report_renderer.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/ocr/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/ocr/paddleocr_backend.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/pdf_extractor.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/versioned_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/domain_memory/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/domain_memory/domain_memory_schema.sql +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/domain_memory/sqlite_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/insight_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/pattern_detector.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/playbook_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/stage_metric_playbook_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/graph_rag_retriever.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/networkx_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/parallel_kg_builder.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/query_strategies.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/anthropic_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/azure_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/base.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/factory.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/instructor_factory.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/llm_relation_augmenter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/ollama_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/openai_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/token_aware_chat.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/vllm_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/methods/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/methods/baseline_oracle.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/methods/external_command.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/methods/registry.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/bm25_retriever.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/dense_retriever.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/document_chunker.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/hybrid_retriever.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/kiwi_tokenizer.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/korean_evaluation.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/korean_stopwords.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/toolkit.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/toolkit_factory.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/phoenix/sync_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/report/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/report/dashboard_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/report/llm_report_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/report/markdown_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/base_sql.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/benchmark_storage_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/postgres_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/postgres_schema.sql +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/schema.sql +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/sqlite_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/open_rag_log_handler.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_decorators.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_helpers.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/phoenix_tracer_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/log_sanitizer.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/mlflow_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/agent_types.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/domain_config.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/instrumentation.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/langfuse_support.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/model_config.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/phoenix_support.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/playbooks/improvement_playbook.yaml +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/secret_manager.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/config/settings.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/debug_ragas.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/debug_ragas_real.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/analysis.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/analysis_pipeline.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/benchmark.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/benchmark_run.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/dataset.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/debug.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/experiment.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/feedback.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/improvement.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/kg.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/memory.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/method.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/prompt.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/prompt_suggestion.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/rag_trace.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/entities/result.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/analysis_registry.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/confidence.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/contextual_relevancy.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/entity_preservation.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/insurance.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/no_answer.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/registry.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/retrieval_rank.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/terms_dictionary.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/metrics/text_match.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/analysis_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/async_batch_executor.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/batch_executor.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/benchmark_report_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/benchmark_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/cache_metrics.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/cluster_map_builder.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/debug_report_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/document_chunker.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/document_versioning.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/domain_learning_hook.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/embedding_overlay.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/entity_extractor.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_comparator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_manager.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_reporter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_repository.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_statistics.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/holdout_splitter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/improvement_guide_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/intent_classifier.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/kg_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/memory_aware_evaluator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/memory_based_analysis.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/method_runner.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/pipeline_orchestrator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/pipeline_template_registry.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_candidate_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_manifest.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_registry.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_scoring_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_status.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_suggestion_reporter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/ragas_prompt_overrides.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/retrieval_metrics.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/retriever_context.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/satisfaction_calibration_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/stage_metric_guide_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/stage_summary_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/synthetic_qa_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/testset_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/threshold_profiles.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/unified_report_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/domain/services/visual_space_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/mkdocs_helpers.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/inbound/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/inbound/analysis_pipeline_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/inbound/evaluator_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/inbound/learning_hook_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/inbound/web_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/analysis_cache_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/analysis_module_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/analysis_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/benchmark_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/causal_analysis_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/dataset_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/domain_memory_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/embedding_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/improvement_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/intent_classifier_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/korean_nlp_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/llm_factory_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/llm_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/method_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/nlp_analysis_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/relation_augmenter_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/report_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/stage_storage_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/storage_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/tracer_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/ports/outbound/tracker_port.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/reports/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/reports/release_notes.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/scripts/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/src/evalvault/scripts/regression_runner.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/conftest.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/README.md +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/benchmark/retrieval_ground_truth_min.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/benchmark/retrieval_ground_truth_multi.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/auto_insurance_qa_korean_full.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/comprehensive_dataset.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/edge_cases.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/edge_cases.xlsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/evaluation_test_sample.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/graphrag_retriever_docs.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/graphrag_smoke.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_document.txt +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_english.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_english.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_english.xlsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_korean.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_korean.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_korean.xlsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_korean_versioned_pdf.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/run_mode_simple.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/e2e/summary_eval_minimal.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/kg/minimal_graph.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/sample_dataset.csv +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/sample_dataset.json +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/fixtures/sample_dataset.xlsx +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/benchmark/test_benchmark_service_integration.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/conftest.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_cli_integration.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_data_flow.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_e2e_scenarios.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_evaluation_flow.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_full_workflow.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_langfuse_flow.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_phoenix_flow.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_pipeline_api_contracts.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_storage_flow.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/integration/test_summary_eval_fixture.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/optional_deps.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/inbound/mcp/test_execute_tools.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/inbound/mcp/test_read_tools.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/documents/test_pdf_extractor.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/documents/test_versioned_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/test_insight_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/test_pattern_detector.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/test_playbook_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/test_stage_metric_playbook_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/kg/test_graph_rag_retriever.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/kg/test_parallel_kg_builder.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/adapters/outbound/storage/test_benchmark_storage_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/config/test_phoenix_support.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/conftest.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/metrics/test_analysis_metric_registry.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/metrics/test_confidence.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/metrics/test_contextual_relevancy.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/metrics/test_entity_preservation.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/metrics/test_metric_registry.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/metrics/test_no_answer.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/metrics/test_retrieval_rank.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/metrics/test_text_match.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_cache_metrics.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_claim_level.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_document_versioning.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_holdout_splitter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_improvement_guide_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_retrieval_metrics.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_retriever_context.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_stage_event_builder.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_stage_metric_guide_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/services/test_synthetic_qa_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/test_embedding_overlay.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/test_prompt_manifest.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/domain/test_prompt_status.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/reports/test_release_notes.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/scripts/test_regression_runner.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_agent_types.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_analysis_entities.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_analysis_modules.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_analysis_pipeline.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_analysis_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_anthropic_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_async_batch_executor.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_azure_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_benchmark_helpers.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_benchmark_runner.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_causal_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_cli_domain.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_cli_init.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_cli_progress.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_cli_utils.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_data_loaders.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_domain_config.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_domain_memory.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_entities.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_entities_kg.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_entity_extractor.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_evaluator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_experiment.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_hybrid_cache.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_instrumentation.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_insurance_metric.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_intent_classifier.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_kg_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_kg_networkx.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_kiwi_tokenizer.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_kiwi_warning_suppression.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_korean_dense.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_korean_evaluation.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_korean_retrieval.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_langfuse_tracker.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_llm_relation_augmenter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_lm_eval_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_markdown_report.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_memory_cache.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_memory_services.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_method_plugins.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_mlflow_tracker.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_model_config.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_nlp_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_nlp_entities.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_ollama_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_openai_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_phoenix_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_pipeline_orchestrator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_ports.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_postgres_storage.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_prompt_candidate_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_rag_trace_entities.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_run_memory_helpers.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_run_mode_fixtures.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_settings.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_sqlite_storage.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_stage_cli.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_stage_storage.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_stage_summary_service.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_statistical_adapter.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_streaming_loader.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_summary_eval_fixture.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_testset_generator.py +0 -0
- {evalvault-1.64.0 → evalvault-1.65.0}/tests/unit/test_web_adapter.py +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: evalvault
|
|
3
|
-
Version: 1.
|
|
3
|
+
Version: 1.65.0
|
|
4
4
|
Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
|
|
5
5
|
Project-URL: Homepage, https://github.com/ntts9990/EvalVault
|
|
6
6
|
Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
|
|
@@ -19,7 +19,10 @@
|
|
|
19
19
|
- Web UI expansion design doc: `guides/WEBUI_CLI_ROLLOUT_PLAN.md` (includes the phase 1 implementation file list)
|
|
20
20
|
- RAGAS human feedback calibration: `guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md`
|
|
21
21
|
- Diagnostic playbook: `guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md` (problem → analysis → interpretation → action flow)
|
|
22
|
+
- RAG performance improvement proposal: `guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md` (purpose/mission, KPIs, roadmap)
|
|
23
|
+
- CLI parallel features spec: `guides/CLI_PARALLEL_FEATURES_SPEC.md`
|
|
22
24
|
- Run result Excel sheet summary: `guides/EVALVAULT_RUN_EXCEL_SHEETS.md`
|
|
25
|
+
- Evaluation report templates: `templates/eval_report_templates.md`
|
|
23
26
|
- Release checklist: `guides/RELEASE_CHECKLIST.md`
|
|
24
27
|
- Status summary: `STATUS.md`
|
|
25
28
|
- Roadmap: `ROADMAP.md`
|
|
@@ -24,6 +24,11 @@
|
|
|
24
24
|
- Incrementally extend the Open RAG Trace spec/samples to match real operational requirements (in compliance with the versioning policy)
|
|
25
25
|
- Strengthen the guides for Collector configuration and data retention (artifact separation, PII masking)
|
|
26
26
|
|
|
27
|
+
### P3 (Performance improvement roadmap)
|
|
28
|
+
- Establish KPIs, the evaluation protocol, and the roadmap based on the RAG performance improvement proposal
|
|
29
|
+
- Integrate retrieval/reranking/GraphRAG experiments with operational metrics
|
|
30
|
+
- Refine the improvement loop based on expert perspectives (cognitive/UX/operations)
|
|
31
|
+
|
|
27
32
|
## Work tracking
|
|
28
33
|
|
|
29
34
|
- Concrete issue- and PR-level plans are managed in GitHub Issues/PRs.
|
|
@@ -12,6 +12,7 @@ EvalVault의 목표는 **RAG 평가/분석/추적을 하나의 Run 단위로 연
|
|
|
12
12
|
- **Observability**: Phoenix (OpenTelemetry/OpenInference) and, optionally, Langfuse/MLflow
|
|
13
13
|
- **Profile-based model switching**: OpenAI/Ollama/vLLM/Anthropic and others via `config/models.yaml` + `.env`
|
|
14
14
|
- **Open RAG Trace standard**: instrument and collect from external/internal RAG systems using a standard schema
|
|
15
|
+
- **Performance improvement framework**: KPIs, evaluation, and roadmap are laid out in `guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md`
|
|
15
16
|
|
|
16
17
|
## Current constraints (disclosed transparently)
|
|
17
18
|
|
|
@@ -0,0 +1,315 @@
|
|
|
1
|
+
# CLI Parallel Features Spec (Draft)
|
|
2
|
+
|
|
3
|
+
> Audience: CLI/Platform contributors
|
|
4
|
+
> Purpose: Future CLI features aligned with SOLID, BDD, hexagonal & clean architecture
|
|
5
|
+
> Last Updated: 2026-01-18
|
|
6
|
+
|
|
7
|
+
## 1. Overview
|
|
8
|
+
|
|
9
|
+
This document specifies new CLI features that are parallel-by-default, deterministic, and cleanly separated by ports/adapters. The scope is design-level documentation with stable JSON outputs and BDD scenarios.
|
|
10
|
+
|
|
11
|
+
Design goals:
|
|
12
|
+
- SOLID: each command = one use-case orchestrator; dependencies injected via ports
|
|
13
|
+
- Clean/Hexagonal: CLI is an inbound adapter; domain services depend on outbound ports only
|
|
14
|
+
- Parallel execution: bounded concurrency with deterministic aggregation
|
|
15
|
+
- BDD: user-visible behavior is defined via Gherkin scenarios
|
|
16
|
+
|
|
17
|
+
Collaboration rules (conflict avoidance):
|
|
18
|
+
- Each stream modifies different files only.
|
|
19
|
+
- Shared schemas or interfaces change only after explicit agreement.
|
|
20
|
+
- Documentation edits are assigned to a single owner to avoid merge conflicts.
|
|
21
|
+
|
|
22
|
+
## 1.1 Parallel Agent Implementation Plan (Execution)
|
|
23
|
+
|
|
24
|
+
Scope:
|
|
25
|
+
- Implement all commands below in parallel (CLI + domain services + ports + adapters).
|
|
26
|
+
- Each command is owned by exactly one agent end-to-end.
|
|
27
|
+
|
|
28
|
+
Ownership:
|
|
29
|
+
- Agent Compare: `evalvault compare`
|
|
30
|
+
- Agent Calibrate: `evalvault calibrate-judge`
|
|
31
|
+
- Agent Difficulty: `evalvault profile-difficulty`
|
|
32
|
+
- Agent Regress: `evalvault regress`
|
|
33
|
+
- Agent Artifacts: `evalvault artifacts lint`
|
|
34
|
+
- Agent Ops: `evalvault ops snapshot`
|
|
35
|
+
|
|
36
|
+
File boundaries (default):
|
|
37
|
+
- CLI command module for the command
|
|
38
|
+
- Domain service (one use-case service per command)
|
|
39
|
+
- Outbound port interfaces needed by that service
|
|
40
|
+
- Outbound adapters for storage/reporting/FS as needed
|
|
41
|
+
- Tests for the command/service
|
|
42
|
+
|
|
43
|
+
Shared files (change only with explicit agreement):
|
|
44
|
+
- `adapters/inbound/cli/app.py`
|
|
45
|
+
- `adapters/inbound/cli/commands/__init__.py`
|
|
46
|
+
- Common JSON envelope schema or report templates
|
|
47
|
+
- `domain/services/async_batch_executor.py`
|
|
48
|
+
|
|
49
|
+
Definition of done (per agent):
|
|
50
|
+
- CLI command registered and functional with `--help` and a basic run path
|
|
51
|
+
- Domain service + ports/adapters implemented for the use-case
|
|
52
|
+
- Tests added for core logic and CLI wiring
|
|
53
|
+
- Tests and lint pass with the standard project commands
|
|
54
|
+
|
|
55
|
+
Test commands (standard project flow):
|
|
56
|
+
- `uv run ruff check src/ tests/`
|
|
57
|
+
- `uv run ruff format src/ tests/`
|
|
58
|
+
- `uv run pytest tests -v`
|
|
59
|
+
|
|
60
|
+
## 2. Command Specs
|
|
61
|
+
|
|
62
|
+
### 2.1 `evalvault compare`
|
|
63
|
+
|
|
64
|
+
Purpose:
|
|
65
|
+
- Compare two runs (metrics, prompts/config diffs, difficulty distribution) and output a unified report.
|
|
66
|
+
|
|
67
|
+
Synopsis:
|
|
68
|
+
```
|
|
69
|
+
uv run evalvault compare RUN_A RUN_B \
|
|
70
|
+
--db data/db/evalvault.db \
|
|
71
|
+
--metrics faithfulness,answer_relevancy \
|
|
72
|
+
--test t-test \
|
|
73
|
+
--format table \
|
|
74
|
+
--output reports/comparison/comparison_RUNA_RUNB.json \
|
|
75
|
+
--report reports/comparison/comparison_RUNA_RUNB.md \
|
|
76
|
+
--output-dir reports/comparison \
|
|
77
|
+
--artifacts-dir reports/comparison/artifacts/comparison_RUNA_RUNB \
|
|
78
|
+
--parallel --concurrency 8
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Options:
|
|
82
|
+
- `--db, -D <path>`: sqlite db path
|
|
83
|
+
- `--metrics, -m <csv>`: allowlist of metrics
|
|
84
|
+
- `--test, -t <t-test|mann-whitney>`
|
|
85
|
+
- `--format, -f <table|json>`
|
|
86
|
+
- `--output, -o <path>`
|
|
87
|
+
- `--report <path>`
|
|
88
|
+
- `--output-dir <path>`
|
|
89
|
+
- `--artifacts-dir <path>`
|
|
90
|
+
- `--parallel/--no-parallel`, `--concurrency <int>`
|
|
91
|
+
|
|
92
|
+
Exit codes:
|
|
93
|
+
- `0`: success
|
|
94
|
+
- `1`: invalid args or missing run
|
|
95
|
+
- `2`: report generation degraded
|
|
96
|
+
|
|
97
|
+
### 2.2 `evalvault calibrate-judge`
|
|
98
|
+
|
|
99
|
+
Purpose:
|
|
100
|
+
- Calibrate judge scores and emit reliability summary.
|
|
101
|
+
|
|
102
|
+
Synopsis:
|
|
103
|
+
```
|
|
104
|
+
uv run evalvault calibrate-judge RUN_ID \
|
|
105
|
+
--db data/db/evalvault.db \
|
|
106
|
+
--labels-source feedback \
|
|
107
|
+
--method isotonic \
|
|
108
|
+
--metric faithfulness \
|
|
109
|
+
--holdout-ratio 0.2 \
|
|
110
|
+
--seed 42 \
|
|
111
|
+
--write-back \
|
|
112
|
+
--output reports/calibration/judge_calibration_RUNID.json \
|
|
113
|
+
--parallel --concurrency 8
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
Options:
|
|
117
|
+
- `--labels-source <feedback|gold|hybrid>`
|
|
118
|
+
- `--method <platt|isotonic|temperature|none>`
|
|
119
|
+
- `--metric <name>` (repeatable)
|
|
120
|
+
- `--holdout-ratio <float>`
|
|
121
|
+
- `--seed <int>`
|
|
122
|
+
- `--write-back`
|
|
123
|
+
- `--output, -o <path>`
|
|
124
|
+
- `--artifacts-dir <path>`
|
|
125
|
+
- `--parallel/--no-parallel`, `--concurrency <int>`
|
|
126
|
+
|
|
127
|
+
Exit codes:
|
|
128
|
+
- `0`: success
|
|
129
|
+
- `1`: labels missing / invalid args
|
|
130
|
+
- `2`: calibration quality below gate
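
The `--holdout-ratio` and `--seed` options suggest a reproducible train/holdout split before fitting a calibrator. The spec does not say how the split is done, so the following is only a minimal sketch under that assumption; `holdout_split` is a hypothetical helper, not part of EvalVault.

```python
import random


def holdout_split(case_ids, holdout_ratio=0.2, seed=42):
    """Split case ids into (train, holdout), reproducibly for a given seed.

    Hypothetical helper: the spec only names --holdout-ratio and --seed.
    """
    if not 0.0 < holdout_ratio < 1.0:
        raise ValueError("holdout_ratio must be between 0 and 1 (exclusive)")
    # Sort first so the result does not depend on the input order.
    ordered = sorted(case_ids)
    random.Random(seed).shuffle(ordered)
    n_holdout = max(1, round(len(ordered) * holdout_ratio))
    return sorted(ordered[n_holdout:]), sorted(ordered[:n_holdout])


train, holdout = holdout_split([f"case_{i}" for i in range(10)])
```

Sorting before the seeded shuffle means two invocations with the same seed see the same split even if storage returns rows in a different order.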
|
|
131
|
+
|
|
132
|
+
### 2.3 `evalvault profile-difficulty`
|
|
133
|
+
|
|
134
|
+
Purpose:
|
|
135
|
+
- Compute difficulty buckets for a dataset or a run.
|
|
136
|
+
|
|
137
|
+
Synopsis:
|
|
138
|
+
```
|
|
139
|
+
uv run evalvault profile-difficulty \
|
|
140
|
+
--db data/db/evalvault.db \
|
|
141
|
+
--dataset-name insurance-qa \
|
|
142
|
+
--limit-runs 50 \
|
|
143
|
+
--metrics faithfulness,answer_relevancy \
|
|
144
|
+
--bucket-count 5 \
|
|
145
|
+
--output reports/difficulty/difficulty_insurance-qa.json \
|
|
146
|
+
--parallel --concurrency 8
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
Options:
|
|
150
|
+
- `--dataset-name <string>` or `--run-id <id>`
|
|
151
|
+
- `--limit-runs <int>`
|
|
152
|
+
- `--metrics, -m <csv>`
|
|
153
|
+
- `--bucket-count <int>`
|
|
154
|
+
- `--min-samples <int>`
|
|
155
|
+
- `--output, -o <path>`
|
|
156
|
+
- `--artifacts-dir <path>`
|
|
157
|
+
- `--parallel/--no-parallel`, `--concurrency <int>`
|
|
158
|
+
|
|
159
|
+
Exit codes:
|
|
160
|
+
- `0`: success
|
|
161
|
+
- `1`: insufficient history or invalid args
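
The spec leaves the bucketing rule open. One possible reading of `--bucket-count` and `--min-samples` is sketched below as equal-frequency buckets over a per-case difficulty score. Both the function name `bucket_by_difficulty` and the definition "difficulty = 1 - mean score" are assumptions made for illustration.

```python
def bucket_by_difficulty(case_scores, bucket_count=5, min_samples=1):
    """Map case_id -> bucket index in 0..bucket_count-1 (0 = easiest).

    Assumption: difficulty is 1 - mean score over the case's history.
    Cases with fewer than `min_samples` scores are skipped.
    """
    if bucket_count < 1 or min_samples < 1:
        raise ValueError("bucket_count and min_samples must be >= 1")
    difficulty = {
        case_id: 1.0 - sum(scores) / len(scores)
        for case_id, scores in case_scores.items()
        if len(scores) >= min_samples
    }
    if not difficulty:
        raise ValueError("insufficient history")
    # Tie-break on case_id so the ranking is deterministic.
    ranked = sorted(difficulty, key=lambda c: (difficulty[c], c))
    n = len(ranked)
    return {c: rank * bucket_count // n for rank, c in enumerate(ranked)}


history = {"a": [0.9, 1.0], "b": [0.5], "c": [0.1, 0.2], "d": [0.7]}
buckets = bucket_by_difficulty(history, bucket_count=2)
```

The `ValueError` for an empty result corresponds to the "insufficient history" case that the spec maps to exit code `1`.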
|
|
162
|
+
|
|
163
|
+
### 2.4 `evalvault regress`
|
|
164
|
+
|
|
165
|
+
Purpose:
|
|
166
|
+
- CI-grade regression gate vs baseline run.
|
|
167
|
+
|
|
168
|
+
Synopsis:
|
|
169
|
+
```
|
|
170
|
+
uv run evalvault regress RUN_CANDIDATE \
|
|
171
|
+
--db data/db/evalvault.db \
|
|
172
|
+
--baseline RUN_BASELINE \
|
|
173
|
+
--fail-on-regression 0.05 \
|
|
174
|
+
--test t-test \
|
|
175
|
+
--metrics faithfulness,answer_relevancy \
|
|
176
|
+
--format github-actions \
|
|
177
|
+
--output reports/regress/regress_RUNCAND.json \
|
|
178
|
+
--parallel --concurrency 8
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
Exit codes:
|
|
182
|
+
- `0`: pass
|
|
183
|
+
- `1`: invalid input
|
|
184
|
+
- `2`: regression detected
|
|
185
|
+
- `3`: internal error
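
As an illustration of how the exit codes above could be produced, here is a minimal sketch of a gate function. It assumes that `--fail-on-regression 0.05` means "fail when a metric's mean drops by more than 0.05", which the spec does not state, and it leaves out the statistical test selected by `--test`. `regression_exit_code` is a hypothetical name.

```python
from statistics import mean

# Exit codes as listed in the spec for `evalvault regress`.
EXIT_PASS, EXIT_INVALID_INPUT, EXIT_REGRESSION, EXIT_INTERNAL_ERROR = 0, 1, 2, 3


def regression_exit_code(baseline, candidate, threshold):
    """Return an exit code for per-metric score lists keyed by metric name.

    Assumption: a metric regresses when its mean drops by more than `threshold`.
    """
    try:
        if threshold < 0 or not baseline or set(baseline) != set(candidate):
            return EXIT_INVALID_INPUT
        if any(not baseline[m] or not candidate[m] for m in baseline):
            return EXIT_INVALID_INPUT
        # Sorted iteration keeps the list of regressed metrics stable.
        regressed = [
            m for m in sorted(baseline)
            if mean(baseline[m]) - mean(candidate[m]) > threshold
        ]
        return EXIT_REGRESSION if regressed else EXIT_PASS
    except Exception:
        return EXIT_INTERNAL_ERROR


base = {"faithfulness": [0.9, 0.8], "answer_relevancy": [0.7, 0.7]}
cand = {"faithfulness": [0.7, 0.7], "answer_relevancy": [0.7, 0.7]}
code = regression_exit_code(base, cand, threshold=0.05)
```

A CLI wrapper would pass the returned value to `sys.exit`, so a CI job fails on code `2` without parsing any output.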
|
|
186
|
+
|
|
187
|
+
### 2.5 `evalvault artifacts lint`
|
|
188
|
+
|
|
189
|
+
Purpose:
|
|
190
|
+
- Validate required artifacts and schema invariants.
|
|
191
|
+
|
|
192
|
+
Synopsis:
|
|
193
|
+
```
|
|
194
|
+
uv run evalvault artifacts lint ARTIFACT_DIR \
|
|
195
|
+
--strict \
|
|
196
|
+
--format json \
|
|
197
|
+
--output reports/artifacts_lint/lint_RUNID.json \
|
|
198
|
+
--parallel --concurrency 16
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
Checks:
|
|
202
|
+
- `index.json` presence
|
|
203
|
+
- required paths exist
|
|
204
|
+
- JSON schema validation
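
The first two checks can be sketched with the standard library alone. The layout of `index.json` is not given in the spec, so the sketch assumes it contains a list of relative paths under a `files` key; that key and the function name `lint_artifacts` are hypothetical. JSON schema validation is omitted because it would need a third-party validator.

```python
import json
from pathlib import Path


def lint_artifacts(artifact_dir, required_key="files"):
    """Return a sorted list of issues; an empty list means the lint passed.

    Assumption: index.json holds a list of relative paths under `required_key`.
    """
    root = Path(artifact_dir)
    index_path = root / "index.json"
    if not index_path.is_file():
        return ["index.json is missing"]
    try:
        index = json.loads(index_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"index.json is not valid JSON: {exc.msg}"]
    paths = index.get(required_key) if isinstance(index, dict) else None
    if not isinstance(paths, list):
        return [f"index.json has no '{required_key}' list"]
    # Sorted output keeps the report identical across runs.
    return sorted(f"missing path: {p}" for p in paths if not (root / str(p)).exists())
```

Returning a list of issues rather than raising lets a caller decide whether `--strict` should turn any issue into a non-zero exit.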
|
|
205
|
+
|
|
206
|
+
### 2.6 `evalvault ops snapshot`
|
|
207
|
+
|
|
208
|
+
Purpose:
|
|
209
|
+
- Collect reproducibility metadata (profile, model config, env redactions).
|
|
210
|
+
|
|
211
|
+
Synopsis:
|
|
212
|
+
```
|
|
213
|
+
uv run evalvault ops snapshot \
|
|
214
|
+
--profile dev \
|
|
215
|
+
--db data/db/evalvault.db \
|
|
216
|
+
--run-id RUN_ID \
|
|
217
|
+
--include-model-config \
|
|
218
|
+
--include-env \
|
|
219
|
+
--redact OPENAI_API_KEY \
|
|
220
|
+
--output reports/ops/snapshot_RUNID.json
|
|
221
|
+
```
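
A minimal sketch of what `--include-env` combined with `--redact` might do: copy the environment and mask the named variables. The function name `snapshot_env` and the mask string are assumptions. A real implementation would probably also want an allowlist, since variables that are not named explicitly are copied as-is here.

```python
import os


def snapshot_env(redact, environ=None, mask="***REDACTED***"):
    """Return a name-sorted copy of the environment with `redact` names masked.

    Hypothetical helper: only the --redact flag itself comes from the spec.
    """
    env = dict(os.environ if environ is None else environ)
    redact = set(redact)
    return {name: (mask if name in redact else env[name]) for name in sorted(env)}


snapshot = snapshot_env(
    ["OPENAI_API_KEY"],
    environ={"OPENAI_API_KEY": "sk-example", "EVALVAULT_PROFILE": "dev"},
)
```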
|
|
222
|
+
|
|
223
|
+
## 3. Architecture Alignment
|
|
224
|
+
|
|
225
|
+
### 3.1 SOLID
|
|
226
|
+
- SRP: each command orchestrates a single use-case service
|
|
227
|
+
- OCP: add new commands via new registrars without modifying core command modules
|
|
228
|
+
- DIP: domain services depend on ports (StoragePort, ReportPort, FileSystemPort)
|
|
229
|
+
|
|
230
|
+
### 3.2 Hexagonal/Clean
|
|
231
|
+
- Inbound adapter: `adapters/inbound/cli/commands/*`
|
|
232
|
+
- Domain services: `domain/services/*` for use-cases
|
|
233
|
+
- Outbound ports: `ports/outbound/*`
|
|
234
|
+
- Outbound adapters: sqlite storage, report writers, LLM providers
|
|
235
|
+
|
|
236
|
+
### 3.3 Proposed Services (Draft)
|
|
237
|
+
- `RunComparisonService`
|
|
238
|
+
- `JudgeCalibrationService`
|
|
239
|
+
- `DifficultyProfilingService`
|
|
240
|
+
- `RegressionGateService`
|
|
241
|
+
- `ArtifactLintService`
|
|
242
|
+
- `OpsSnapshotService`
|
|
243
|
+
|
|
244
|
+
## 4. Parallel Execution Model
|
|
245
|
+
|
|
246
|
+
- Use bounded concurrency (`--concurrency`) and deterministic aggregation.
|
|
247
|
+
- Candidate base utility: `domain/services/async_batch_executor.py`.
|
|
248
|
+
- Parallelize per-metric/per-case computations; merge results with stable sorting.
|
|
249
|
+
- LLM calls default to sequential unless explicitly enabled.
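
The model above can be illustrated with plain `asyncio`. This is a sketch under the assumption that per-case work is an async callable; `run_bounded` and `score_case` are made-up names, and the sketch does not reflect the actual interface of `async_batch_executor.py`.

```python
import asyncio


async def run_bounded(items, worker, concurrency=8):
    """Apply async `worker` to each item with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather() returns results in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(item) for item in items))


async def score_case(case_id):
    # Stand-in for a per-case computation; later ids finish first on purpose.
    await asyncio.sleep(0.01 * (3 - case_id))
    return {"case_id": case_id, "score": case_id / 10}


results = asyncio.run(run_bounded([2, 0, 1], score_case, concurrency=2))
# Sorting on an explicit key makes the merged output deterministic.
merged = sorted(results, key=lambda r: r["case_id"])
```

Because `asyncio.gather` preserves input order and the merge sorts on an explicit key, the output does not depend on which task happens to finish first.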
|
|
250
|
+
|
|
251
|
+
## 5. JSON Output Envelope
|
|
252
|
+
|
|
253
|
+
Common envelope (recommended):
|
|
254
|
+
```
|
|
255
|
+
{
|
|
256
|
+
"command": "compare",
|
|
257
|
+
"version": 1,
|
|
258
|
+
"status": "ok",
|
|
259
|
+
"started_at": "2026-01-18T00:00:00Z",
|
|
260
|
+
"finished_at": "2026-01-18T00:00:05Z",
|
|
261
|
+
"duration_ms": 5000,
|
|
262
|
+
"artifacts": {
|
|
263
|
+
"dir": "reports/.../artifacts/...",
|
|
264
|
+
"index": "reports/.../artifacts/.../index.json"
|
|
265
|
+
},
|
|
266
|
+
"data": {}
|
|
267
|
+
}
|
|
268
|
+
```
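
For illustration, the envelope could be assembled as follows. Field names come from the example above; the helper name `build_envelope` and the rule that `index` is `<dir>/index.json` are assumptions.

```python
import json
from datetime import datetime, timezone


def build_envelope(command, data, started_at, finished_at, artifacts_dir, status="ok"):
    """Wrap command output in the common envelope (hypothetical helper)."""

    def iso(ts):
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "command": command,
        "version": 1,
        "status": status,
        "started_at": iso(started_at),
        "finished_at": iso(finished_at),
        "duration_ms": round((finished_at - started_at).total_seconds() * 1000),
        "artifacts": {"dir": artifacts_dir, "index": f"{artifacts_dir}/index.json"},
        "data": data,
    }


start = datetime(2026, 1, 18, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2026, 1, 18, 0, 0, 5, tzinfo=timezone.utc)
envelope = build_envelope("compare", {}, start, end, "reports/comparison/artifacts/example")
# sort_keys gives byte-stable output for identical inputs.
print(json.dumps(envelope, sort_keys=True, indent=2))
```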
|
|
269
|
+
|
|
270
|
+
## 6. BDD Scenarios (Gherkin)
|
|
271
|
+
|
|
272
|
+
### compare
|
|
273
|
+
```
|
|
274
|
+
Feature: Compare two evaluation runs
|
|
275
|
+
Scenario: Compare two runs with shared metrics
|
|
276
|
+
Given a database with runs "run_a" and "run_b"
|
|
277
|
+
When I run "evalvault compare run_a run_b --format json"
|
|
278
|
+
Then the command exits with code 0
|
|
279
|
+
And the JSON output contains "run_ids" ["run_a", "run_b"]
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
### calibrate-judge
|
|
283
|
+
```
|
|
284
|
+
Feature: Calibrate judge scoring
|
|
285
|
+
Scenario: Calibrate judge scores using feedback labels
|
|
286
|
+
Given a run "run_x" with feedback labels in storage
|
|
287
|
+
When I run "evalvault calibrate-judge run_x --labels-source feedback"
|
|
288
|
+
Then the command exits with code 0
|
|
289
|
+
```
|
|
290
|
+
|
|
291
|
+
### regress
|
|
292
|
+
```
|
|
293
|
+
Feature: Regression gate for CI
|
|
294
|
+
Scenario: Regression detected
|
|
295
|
+
Given a candidate run "run_new" and baseline "run_base"
|
|
296
|
+
When I run "evalvault regress run_new --baseline run_base"
|
|
297
|
+
Then the command exits with code 2
|
|
298
|
+
```
|
|
299
|
+
|
|
300
|
+
## 7. Non-goals
|
|
301
|
+
- No distributed execution or multi-node scheduling
|
|
302
|
+
- No new scoring algorithms; only orchestration and reporting
|
|
303
|
+
- No breaking change to existing CLI
|
|
304
|
+
|
|
305
|
+
## 8. Risks
|
|
306
|
+
- Provider rate limits with parallel LLM calls
|
|
307
|
+
- DB contention under high concurrency
|
|
308
|
+
- Schema drift in artifacts without linting
|
|
309
|
+
|
|
310
|
+
## 9. Mapping to Existing Modules (Evidence)
|
|
311
|
+
- CLI app: `adapters/inbound/cli/app.py`
|
|
312
|
+
- Command registration: `adapters/inbound/cli/commands/__init__.py`
|
|
313
|
+
- Existing compare pipeline: `adapters/inbound/cli/commands/analyze.py`
|
|
314
|
+
- Artifact utilities: `adapters/inbound/cli/utils/analysis_io.py`
|
|
315
|
+
- Async batch executor: `domain/services/async_batch_executor.py`
|
|
@@ -0,0 +1,114 @@
|
|
|
1
|
+
# Data Difficulty Assessment for RAG Systems and a Fine-Tuning Strategy for Evaluation LLMs (a Realistic View)
|
|
2
|
+
|
|
3
|
+
## 1. Data difficulty assessment framework: evidence exists, but the preconditions matter
|
|
4
|
+
|
|
5
|
+
### 1.1 Core premises
|
|
6
|
+
- Difficulty is determined by the interaction among question, context, and response, and is hard to capture with a single metric.
|
|
7
|
+
- There is evidence that Retrieval Complexity (RC), as a measure of question difficulty, correlates with QA performance and expert judgment.
|
|
8
|
+
- However, difficulty is a “proxy metric”, and its correlation with real operational data must be validated first.
|
|
9
|
+
|
|
10
|
+
### 1.2 Difficulty axes (recommended)
|
|
11
|
+
- Question complexity: compound questions, multi-step reasoning, whether temporal/conditional context is involved
|
|
12
|
+
- Retrieval difficulty: whether the required evidence is spread across multiple documents, completeness of the retrieved set
|
|
13
|
+
- Answer quality signals: gold labels/judge scores, faithfulness/answer relevancy
|
|
14
|
+
- Noise/out-of-domain: no retrieval results, low confidence from the domain classifier
|
|
15
|
+
|
|
16
|
+
### 1.3 Phased implementation (realistic)
|
|
17
|
+
1. v0 (heuristics): query length, multi-hop flag, retrieval success/failure, top-k score distribution
|
|
18
|
+
2. v1 (RC-based): estimate RC with an RRCP-style pipeline, validate the correlation between difficulty and error rate
|
|
19
|
+
3. v2 (operationalized difficulty): manage difficulty-distribution drift as a KPI, use separate thresholds per difficulty band
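
The v0 heuristic stage could start from something as small as the sketch below. The function name `difficulty_v0`, the weights, and the 30-token length cap are illustrative assumptions rather than values from this document, and whitespace tokenization is a crude stand-in for Korean text.

```python
def difficulty_v0(query, is_multi_hop, top_k_scores):
    """Heuristic difficulty in [0, 1]; weights are illustrative, not tuned."""
    # Longer queries are treated as harder, capped at 30 whitespace tokens.
    length_signal = min(len(query.split()) / 30.0, 1.0)
    hop_signal = 1.0 if is_multi_hop else 0.0
    if not top_k_scores:
        retrieval_signal = 1.0  # retrieval failed: treat as hardest
    else:
        # A low best similarity score means weak evidence.
        retrieval_signal = 1.0 - max(0.0, min(max(top_k_scores), 1.0))
    return round(0.2 * length_signal + 0.3 * hop_signal + 0.5 * retrieval_signal, 4)


easy = difficulty_v0("What is the eligible age?", False, [0.9, 0.5])
hard = difficulty_v0("Which systems actuate first and why?", True, [])
```

Whatever form the score takes, it only becomes useful after its correlation with the observed error rate has been checked.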
|
|
20
|
+
|
|
21
|
+
### 1.4 Handling noisy/erroneous input
|
|
22
|
+
- Classify as noise: retrieval similarity below a floor, zero results, low-confidence domain classification
|
|
23
|
+
- Separate noise cases with a dedicated tag and handle them downstream with a safe response
|
|
24
|
+
|
|
25
|
+
### 1.5 Integration with EvalVault
|
|
26
|
+
- Store difficulty scores as run_id artifacts so that performance trends can be compared by difficulty.
|
|
27
|
+
- Verify whether shifts in the difficulty distribution track quality degradation, to confirm whether they are the “real cause”.
|
|
28
|
+
|
|
29
|
+
### 1.6 Domain examples (insurance/nuclear power)
|
|
30
|
+
- Insurance
|
|
31
|
+
- Easy: “What is the eligible age for auto insurance enrollment?” (stated explicitly in a single document)
|
|
32
|
+
- Medium: “How does the premium change when the driver coverage scope is changed?” (combination of rules and exceptions)
|
|
33
|
+
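- Hard: “Under indemnity health insurance, what is the scope of coverage when a specific treatment is classified as non-covered?” (multi-document/conditional reasoning)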
|
|
34
|
+
- Nuclear power
|
|
35
|
+
- Easy: “What is the difference between the primary system and the secondary system?” (definitional question)
|
|
36
|
+
- Medium: “What are the step-by-step requirements of the maintenance procedure?” (combination of procedures and conditions)
|
|
37
|
+
- Hard: “In a specific accident scenario, what is the actuation sequence of the safety systems, and what is its basis?” (multi-step reasoning)
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## 2. Fine-tuning an evaluation LLM (LLM-as-a-judge): cost savings are possible, generalization risk exists
|
|
42
|
+
|
|
43
|
+
### 2.1 Basic principles
|
|
44
|
+
- Cost savings are possible, but a small judge is weak in generalization, fairness, and domain transferability.
|
|
45
|
+
- Judge quality depends more on label quality and calibration than on model size.
|
|
46
|
+
|
|
47
|
+
### 2.2 Data composition (required)
|
|
48
|
+
- Human labels: question-context-response triples with a score (1 to 5) or a grade label
|
|
49
|
+
- Preference (pairwise): A/B comparison data (with rationale where possible)
|
|
50
|
+
- Expert answers: assessment of agreement with, and omissions relative to, the reference answer
|
|
51
|
+
- Operational logs: thumbs up/down, re-queries, dissatisfaction signals (weak labels)
|
|
52
|
+
|
|
53
|
+
### 2.3 Training strategy (recommended)
|
|
54
|
+
- Start with SFT, then additionally apply DPO or SLiC-HF once there is enough preference data
|
|
55
|
+
- Fix the output format to a JSON schema to keep judgments stable
|
|
56
|
+
- For validation, check both the agreement rate with a GPT-4-class judge and the correlation with human evaluation
|
|
57
|
+
|
|
58
|
+
### 2.4 Operational guardrails
|
|
59
|
+
- Cascaded evaluation: process the bulk with a small judge and escalate only borderline cases to a stronger model
|
|
60
|
+
- Calibration: correct scores using a small set of human labels and report confidence intervals
|
|
61
|
+
- Bias mitigation: swap/format randomization tests for position, format, and knowledge bias
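
The cascade guardrail can be sketched as a routing function. It assumes both judges are callables returning a score in [0, 1] and that "borderline" means the small judge's score falls inside a band around the decision point. The band limits and the name `cascade_judge` are hypothetical.

```python
def cascade_judge(sample, small_judge, large_judge, low=0.35, high=0.65):
    """Score with the small judge; escalate borderline scores to the large judge.

    Assumption: scores lie in [0, 1] and [low, high] marks the borderline band.
    """
    score = small_judge(sample)
    if low <= score <= high:
        return {"score": large_judge(sample), "judge": "large", "escalated": True}
    return {"score": score, "judge": "small", "escalated": False}


# Stand-in judges for demonstration only.
confident = cascade_judge("sample-1", lambda s: 0.9, lambda s: 0.8)
borderline = cascade_judge("sample-2", lambda s: 0.5, lambda s: 0.8)
```

Recording which judge produced each score keeps the escalation rate visible, which helps when tuning the band against human labels.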
|
|
62
|
+
|
|
63
|
+
---
|
|
64
|
+
|
|
65
|
+
## 3. Recent fine-tuning/efficiency techniques: judge “efficiency” and “evaluation quality” separately
|
|
66
|
+
|
|
67
|
+
### 3.1 Guide to when to apply
|
|
68
|
+
- QLoRA/LoRA+/LoftQ help memory efficiency, but any gain in evaluation quality needs separate validation
|
|
69
|
+
- LongLoRA/Cartridges/MQA help long-context and serving efficiency, but do not guarantee judge performance
|
|
70
|
+
- GaLore offers memory savings and the possibility of full updates, at the cost of higher operational complexity
|
|
71
|
+
|
|
72
|
+
### 3.2 Recommended order of choice
|
|
73
|
+
1. Start with QLoRA + LoRA (or LoRA+)
|
|
74
|
+
2. Consider extension techniques only after calibration and consistency are secured
|
|
75
|
+
3. Apply long-context optimization only when a bottleneck has been confirmed in real long-context workloads
|
|
76
|
+
|
|
77
|
+
---
|
|
78
|
+
|
|
79
|
+
## 4. Conclusion
|
|
80
|
+
- Difficulty profiling is valid, but “correlation validation + turning it into an operational KPI” is a mandatory precondition.
|
|
81
|
+
- A small judge helps reduce cost, but its generalization, bias, and consistency risks are large, so calibration and cascaded operation are mandatory.
|
|
82
|
+
- Recent fine-tuning techniques are tools for improving efficiency; they do not guarantee better evaluation quality.
|
|
83
|
+
|
|
84
|
+
---
|
|
85
|
+
|
|
86
|
+
## 5. Execution checklist
|
|
87
|
+
- Data difficulty
|
|
88
|
+
- Confirm that the v0 difficulty metrics correlate significantly with the error rate
|
|
89
|
+
- Verify that difficulty-distribution drift tracks actual quality degradation
|
|
90
|
+
- judge
|
|
91
|
+
- Obtain 3–5% human labels and generate a calibration report
|
|
92
|
+
- Define cascade escalation conditions (low-confidence/borderline cases)
|
|
93
|
+
- Operations
|
|
94
|
+
- Check whether difficulty and judgment rationale are stored in the run_id artifacts
|
|
95
|
+
- Document per-difficulty thresholds and response policies
|
|
96
|
+
|
|
97
|
+
---
|
|
98
|
+
|
|
99
|
+
## References
|
|
100
|
+
- RC metric: https://aclanthology.org/2024.findings-acl.872/
|
|
101
|
+
- GRADE difficulty matrix: https://arxiv.org/abs/2508.16994
|
|
102
|
+
- QLoRA: https://arxiv.org/abs/2305.14314
|
|
103
|
+
- LoftQ: https://arxiv.org/abs/2310.08659
|
|
104
|
+
- LoRA+: https://arxiv.org/abs/2402.12354
|
|
105
|
+
- LongLoRA: https://arxiv.org/abs/2309.12307
|
|
106
|
+
- DPO: https://arxiv.org/abs/2305.18290
|
|
107
|
+
- SLiC-HF: https://arxiv.org/abs/2305.10425
|
|
108
|
+
- GaLore: https://arxiv.org/abs/2403.03507
|
|
109
|
+
- Cartridges: https://arxiv.org/abs/2506.06266
|
|
110
|
+
- MQA: https://arxiv.org/abs/1911.02150
|
|
111
|
+
- JudgeLM: https://arxiv.org/abs/2310.17631
|
|
112
|
+
- Fine-tuned judge limits: https://aclanthology.org/2025.findings-acl.306/
|
|
113
|
+
- LLM judge reliability: https://arxiv.org/abs/2412.12509
|
|
114
|
+
- LLM judge bias: https://llm-judge-bias.github.io/
|