evalvault 1.65.0__tar.gz → 1.66.0__tar.gz
- {evalvault-1.65.0 → evalvault-1.66.0}/PKG-INFO +25 -1
- {evalvault-1.65.0 → evalvault-1.66.0}/README.md +23 -0
- evalvault-1.66.0/config/ragas_prompts_override.yaml +11 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/INDEX.md +5 -4
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/ROADMAP.md +4 -1
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/STATUS.md +13 -1
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/EVALVAULT_RUN_EXCEL_SHEETS.md +16 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/EVALVAULT_WORK_PLAN.md +3 -4
- evalvault-1.66.0/docs/guides/INSURANCE_SUMMARY_METRICS_PLAN.md +152 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/PROJECT_STATUS_AND_PLAN.md +3 -3
- evalvault-1.66.0/docs/guides/RAG_CLI_WORKFLOW_TEMPLATES.md +318 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/RAG_PERFORMANCE_IMPLEMENTATION_LOG.md +1 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/USER_GUIDE.md +200 -7
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/App.tsx +2 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/InsightSpacePanel.tsx +33 -1
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/Layout.tsx +3 -1
- evalvault-1.66.0/frontend/src/components/ai-elements/Conversation.tsx +23 -0
- evalvault-1.66.0/frontend/src/components/ai-elements/Message.tsx +48 -0
- evalvault-1.66.0/frontend/src/components/ai-elements/PromptInput.tsx +64 -0
- evalvault-1.66.0/frontend/src/components/ai-elements/Response.tsx +14 -0
- evalvault-1.66.0/frontend/src/components/ai-elements/index.ts +4 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/config/ui.ts +17 -0
- evalvault-1.66.0/frontend/src/pages/Chat.tsx +207 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/EvaluationStudio.tsx +22 -12
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/RunDetails.tsx +20 -3
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/Visualization.tsx +32 -5
- {evalvault-1.65.0 → evalvault-1.66.0}/pyproject.toml +2 -1
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/adapter.py +14 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/main.py +14 -4
- evalvault-1.66.0/src/evalvault/adapters/inbound/api/routers/chat.py +543 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/run.py +14 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/run_helpers.py +21 -2
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/report/llm_report_generator.py +13 -1
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/storage/base_sql.py +41 -1
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracker/langfuse_adapter.py +1 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracker/mlflow_adapter.py +5 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracker/phoenix_adapter.py +29 -2
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/settings.py +21 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/prompt.py +1 -1
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/__init__.py +8 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/registry.py +39 -3
- evalvault-1.66.0/src/evalvault/domain/metrics/summary_accuracy.py +189 -0
- evalvault-1.66.0/src/evalvault/domain/metrics/summary_needs_followup.py +45 -0
- evalvault-1.66.0/src/evalvault/domain/metrics/summary_non_definitive.py +41 -0
- evalvault-1.66.0/src/evalvault/domain/metrics/summary_risk_coverage.py +45 -0
- evalvault-1.66.0/src/evalvault/domain/services/custom_metric_snapshot.py +233 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/evaluator.py +280 -27
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/prompt_registry.py +39 -10
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/threshold_profiles.py +4 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/visual_space_service.py +79 -4
- evalvault-1.66.0/tests/fixtures/e2e/callcenter_summary_5cases.json +74 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/summary_eval_minimal.json +15 -3
- {evalvault-1.65.0 → evalvault-1.66.0}/uv.lock +1250 -108
- evalvault-1.65.0/config/ragas_prompts_override.yaml +0 -4
- {evalvault-1.65.0 → evalvault-1.66.0}/.dockerignore +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/.env.example +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/.github/workflows/ci.yml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/.github/workflows/release.yml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/.github/workflows/stale.yml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/.gitignore +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/.pre-commit-config.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/.python-version +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/AGENTS.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/CHANGELOG.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/CLAUDE.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/CODE_OF_CONDUCT.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/CONTRIBUTING.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/Dockerfile +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/LICENSE.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/README.en.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/SECURITY.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/README.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/agent.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/client.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/config.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/main.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/memory/README.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/memory/shared/decisions.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/memory/shared/dependencies.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/memory/templates/coordinator_guide.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/memory/templates/work_log_template.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/memory_integration.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/progress.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/app_spec.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/baseline.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/coding_prompt.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/existing_project_prompt.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/improvement/architecture_prompt.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/improvement/base_prompt.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/improvement/coordinator_prompt.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/improvement/observability_prompt.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/initializer_prompt.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/prompt_manifest.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts/system.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/prompts.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/requirements.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/agent/security.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/domains/insurance/memory.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/domains/insurance/terms_dictionary_en.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/domains/insurance/terms_dictionary_ko.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/methods.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/models.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/regressions/default.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/regressions/ux.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/stage_metric_playbook.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/config/stage_metric_thresholds.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/dummy_test_dataset.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/insurance_qa_korean.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/insurance_qa_korean.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/insurance_qa_korean_2.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/insurance_qa_korean_3.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/ragas_ko90_en10.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/sample.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/visualization_20q_cluster_map.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/visualization_20q_korean.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/visualization_2q_cluster_map.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/datasets/visualization_2q_korean.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/kg/knowledge_graph.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/raw/The Complete Guide to Mastering Suno Advanced Strategies for Professional Music Generation.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/raw/edge_cases.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/raw/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/data/raw/sample_rag_knowledge.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/dataset_templates/dataset_template.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/dataset_templates/dataset_template.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/dataset_templates/dataset_template.xlsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/dataset_templates/method_input_template.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docker-compose.langfuse.yml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docker-compose.phoenix.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docker-compose.yml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/README.ko.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/api/adapters/inbound.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/api/adapters/outbound.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/api/config.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/api/domain/entities.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/api/domain/metrics.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/api/domain/services.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/api/ports/inbound.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/api/ports/outbound.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/architecture/open-rag-trace-collector.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/architecture/open-rag-trace-spec.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/getting-started/INSTALLATION.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/AGENTS_SYSTEM_GUIDE.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/CLI_MCP_PLAN.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/CLI_PARALLEL_FEATURES_SPEC.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/DEV_GUIDE.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/EXTERNAL_TRACE_API_SPEC.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/Extension_2.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/Extension_Data_Difficulty_Profiling_Custom_Judge_Model.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/LENA_MVP_IMPLEMENTATION_PLAN.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/LENA_RAGAS_CALIBRATION_DEV_PLAN.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/OPEN_RAG_TRACE_INTERNAL_ADAPTER.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/OPEN_RAG_TRACE_SAMPLES.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/PARALLEL_WORK_APPROVAL_RULES.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/PRD_LENA.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/RAG_NOISE_REDUCTION_GUIDE.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/RELEASE_CHECKLIST.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/WEBUI_CLI_ROLLOUT_PLAN.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/prompt_suggestions_design.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/rag_human_feedback_calibration_implementation_plan.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/guides/refactoring_strategy.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/mapping/component-to-whitepaper.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/00_frontmatter.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/01_overview.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/02_architecture.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/03_data_flow.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/04_components.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/05_expert_lenses.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/06_implementation.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/07_advanced.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/08_customization.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/09_quality.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/10_performance.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/11_security.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/12_operations.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/13_standards.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/14_roadmap.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/INDEX.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/new_whitepaper/STYLE_GUIDE.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/REFAC_000_master_plan.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/REFAC_010_agent_playbook.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/REFAC_020_logging_policy.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/REFAC_030_phase0_responsibility_map.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/REFAC_040_wbs_parallel_plan.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/logs/phase-0-baseline.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/logs/phase-1-evaluator.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/logs/phase-2-cli-run.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/refactor/logs/phase-3-analysis.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/security_audit_worklog.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/stylesheets/extra.css +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/templates/dataset_template.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/templates/dataset_template.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/templates/dataset_template.xlsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/templates/eval_report_templates.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/templates/kg_template.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/templates/otel_openinference_trace_example.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/templates/ragas_dataset_example_ko90_en10.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/templates/retriever_docs_template.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/tools/generate-whitepaper.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/docs/web_ui_analysis_migration_plan.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/dummy_test_dataset.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/README.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/README.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/korean_rag/faithfulness_test.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/korean_rag/insurance_qa_100.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/korean_rag/keyword_extraction_test.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/korean_rag/retrieval_test.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/output/comparison.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/output/full_results.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/output/leaderboard.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/output/results_mteb.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/output/retrieval_result.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/benchmarks/run_korean_benchmark.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/kg_generator_demo.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/method_plugin_template/README.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/method_plugin_template/pyproject.toml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/method_plugin_template/src/method_plugin_template/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/method_plugin_template/src/method_plugin_template/methods.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/usecase/comprehensive_workflow_test.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/usecase/insurance_eval_dataset.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/examples/usecase/output/comprehensive_report.html +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/.env.example +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/.gitignore +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/README.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/analysis-compare.spec.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/analysis-lab.spec.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/compare-runs.spec.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/dashboard.spec.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/domain-memory.spec.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/evaluation-studio.spec.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/knowledge-base.spec.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/mocks/intents.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/mocks/run_details.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/mocks/runs.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/e2e/run-details.spec.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/eslint.config.js +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/index.html +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/package-lock.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/package.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/playwright.config.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/public/vite.svg +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/App.css +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/assets/react.svg +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/AnalysisNodeOutputs.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/MarkdownContent.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/PrioritySummaryPanel.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/SpaceLegend.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/SpacePlot2D.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/SpacePlot3D.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/StatusBadge.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/components/VirtualizedText.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/config.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/hooks/useInsightSpace.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/index.css +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/main.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/AnalysisCompareView.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/AnalysisLab.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/AnalysisResultView.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/CompareRuns.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/ComprehensiveAnalysis.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/CustomerReport.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/Dashboard.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/DomainMemory.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/KnowledgeBase.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/Settings.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/pages/VisualizationHome.tsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/services/api.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/types/plotly.d.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/utils/format.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/utils/phoenix.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/utils/runAnalytics.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/utils/score.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/src/utils/summaryMetrics.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/tailwind.config.js +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/tsconfig.app.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/tsconfig.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/tsconfig.node.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/frontend/vite.config.ts +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/mkdocs.yml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/package-lock.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/prompts/system_override.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/.gitkeep +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/README.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/final_output.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/index.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/load_runs.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/report.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_change_detection.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_metric_comparison.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/final_output.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/index.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/load_runs.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/report.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_change_detection.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_metric_comparison.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/final_output.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/index.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/load_runs.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/report.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_change_detection.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_metric_comparison.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_0aa9fab0_f1287e90.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_0aa9fab0_f1287e90.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_8f825b22_4516d358.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_8f825b22_4516d358.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_9fbf4776_a491fa0e.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_9fbf4776_a491fa0e.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_f1287e90_8f825b22.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/comparison/comparison_f1287e90_8f825b22.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/debug_report_r1_smoke.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/debug_report_r2_graphrag.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/debug_report_r2_graphrag_openai.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/debug_report_r3_bm25.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/debug_report_r3_bm25_langfuse3.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/debug_report_r3_dense_faiss.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/improvement_1d91a667-4288-4742-be3a-a8f5310c5140.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r2_graphrag_openai_stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r2_graphrag_openai_stage_report.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r2_graphrag_stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r2_graphrag_stage_report.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r3_bm25_langfuse2_stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r3_bm25_langfuse3_stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r3_bm25_langfuse_stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r3_bm25_phoenix_stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r3_bm25_stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r3_bm25_stage_report.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r3_dense_faiss_stage_events.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/r3_dense_faiss_stage_report.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/retrieval_benchmark_smoke_precision.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/retrieval_benchmark_smoke_precision_graphrag.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/reports/retrieval_benchmark_smoke_precision_multi.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/benchmark/download_kmmlu.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/dev/open_rag_trace_demo.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/dev/open_rag_trace_integration_template.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/dev/otel-collector-config.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/dev/start_web_ui_with_phoenix.sh +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/dev/validate_open_rag_trace.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/dev_seed_pipeline_results.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/analyzer/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/analyzer/ast_scanner.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/analyzer/confidence_scorer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/analyzer/graph_builder.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/analyzer/side_effect_detector.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/generate_api_docs.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/models/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/models/schema.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/renderer/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/docs/renderer/html_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/ops/phoenix_watch.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/perf/backfill_langfuse_trace_url.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/perf/r3_dense_smoke.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/perf/r3_evalvault_run_dataset.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/perf/r3_retriever_docs.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/perf/r3_smoke_real.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/perf/r3_stage_events_sample.jsonl +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/pipeline_template_inspect.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/reports/generate_release_notes.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/run_with_timeout.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/test_full_evaluation.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/tests/run_regressions.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/tests/run_retriever_stage_report_smoke.sh +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/validate_tutorials.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/verify_ragas_compliance.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/scripts/verify_workflows.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/routers/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/routers/benchmark.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/routers/config.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/routers/domain.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/routers/knowledge.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/routers/pipeline.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/api/routers/runs.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/app.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/agent.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/analyze.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/api.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/artifacts.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/benchmark.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/calibrate.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/calibrate_judge.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/compare.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/config.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/debug.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/domain.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/experiment.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/gate.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/generate.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/history.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/init.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/kg.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/langfuse.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/method.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/ops.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/phoenix.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/pipeline.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/profile_difficulty.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/prompts.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/regress.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/commands/stage.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/analysis_io.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/console.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/errors.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/formatters.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/options.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/presets.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/progress.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/cli/utils/validators.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/mcp/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/mcp/schemas.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/inbound/mcp/tools.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/analysis_report_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/base_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/bm25_searcher_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/causal_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/common.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/comparison_pipeline_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/comparison_report_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/data_loader_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/detailed_report_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/embedding_distribution_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/embedding_searcher_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/hybrid_rrf_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/hybrid_weighted_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/llm_report_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/low_performer_extractor_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/model_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/network_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/nlp_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/pattern_detector_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/pipeline_factory.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/pipeline_helpers.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/priority_summary_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/ragas_evaluator_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/retrieval_benchmark_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/run_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/run_change_detector_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/run_comparator_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/run_loader_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/search_comparator_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/statistical_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/statistical_comparator_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/summary_report_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/trend_detector_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/analysis/verification_report_module.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/artifact_fs.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/benchmark/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/benchmark/lm_eval_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/cache/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/cache/hybrid_cache.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/cache/memory_cache.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/base.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/csv_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/excel_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/json_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/loader_factory.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/method_input_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/streaming_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/templates.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/dataset/thresholds.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/debug/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/debug/report_renderer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/documents/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/documents/ocr/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/documents/ocr/paddleocr_backend.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/documents/pdf_extractor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/documents/versioned_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/domain_memory/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/domain_memory/domain_memory_schema.sql +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/domain_memory/sqlite_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/filesystem/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/filesystem/difficulty_profile_writer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/filesystem/ops_snapshot_writer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/improvement/insight_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/improvement/pattern_detector.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/improvement/playbook_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/improvement/stage_metric_playbook_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/judge_calibration_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/judge_calibration_reporter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/kg/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/kg/graph_rag_retriever.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/kg/networkx_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/kg/parallel_kg_builder.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/kg/query_strategies.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/anthropic_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/azure_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/base.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/factory.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/instructor_factory.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/llm_relation_augmenter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/ollama_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/openai_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/token_aware_chat.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/llm/vllm_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/methods/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/methods/baseline_oracle.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/methods/external_command.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/methods/registry.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/bm25_retriever.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/dense_retriever.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/document_chunker.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/hybrid_retriever.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/kiwi_tokenizer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/korean_evaluation.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/korean_stopwords.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/toolkit.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/nlp/korean/toolkit_factory.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/phoenix/sync_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/report/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/report/dashboard_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/report/markdown_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/storage/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/storage/benchmark_storage_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/storage/postgres_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/storage/postgres_schema.sql +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/storage/schema.sql +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/storage/sqlite_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracer/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracer/open_rag_log_handler.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_decorators.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_helpers.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracer/phoenix_tracer_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracker/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/adapters/outbound/tracker/log_sanitizer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/agent_types.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/domain_config.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/instrumentation.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/langfuse_support.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/model_config.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/phoenix_support.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/playbooks/improvement_playbook.yaml +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/config/secret_manager.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/debug_ragas.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/debug_ragas_real.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/analysis.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/analysis_pipeline.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/benchmark.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/benchmark_run.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/dataset.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/debug.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/experiment.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/feedback.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/improvement.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/judge_calibration.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/kg.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/memory.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/method.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/prompt_suggestion.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/rag_trace.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/result.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/entities/stage.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/analysis_registry.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/confidence.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/contextual_relevancy.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/entity_preservation.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/insurance.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/no_answer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/retrieval_rank.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/terms_dictionary.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/metrics/text_match.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/analysis_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/artifact_lint_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/async_batch_executor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/batch_executor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/benchmark_report_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/benchmark_runner.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/benchmark_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/cache_metrics.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/cluster_map_builder.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/dataset_preprocessor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/debug_report_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/difficulty_profile_reporter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/difficulty_profiling_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/document_chunker.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/document_versioning.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/domain_learning_hook.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/embedding_overlay.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/entity_extractor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/experiment_comparator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/experiment_manager.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/experiment_reporter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/experiment_repository.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/experiment_statistics.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/holdout_splitter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/improvement_guide_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/intent_classifier.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/judge_calibration_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/kg_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/memory_aware_evaluator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/memory_based_analysis.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/method_runner.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/ops_snapshot_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/pipeline_orchestrator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/pipeline_template_registry.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/prompt_candidate_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/prompt_manifest.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/prompt_scoring_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/prompt_status.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/prompt_suggestion_reporter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/ragas_prompt_overrides.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/regression_gate_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/retrieval_metrics.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/retriever_context.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/run_comparison_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/satisfaction_calibration_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/stage_event_builder.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/stage_metric_guide_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/stage_metric_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/stage_summary_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/synthetic_qa_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/testset_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/domain/services/unified_report_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/mkdocs_helpers.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/inbound/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/inbound/analysis_pipeline_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/inbound/evaluator_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/inbound/learning_hook_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/inbound/web_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/analysis_cache_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/analysis_module_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/analysis_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/artifact_fs_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/benchmark_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/causal_analysis_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/comparison_pipeline_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/dataset_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/difficulty_profile_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/domain_memory_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/embedding_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/improvement_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/intent_classifier_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/judge_calibration_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/korean_nlp_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/llm_factory_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/llm_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/method_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/nlp_analysis_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/ops_snapshot_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/relation_augmenter_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/report_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/stage_storage_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/storage_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/tracer_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/ports/outbound/tracker_port.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/reports/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/reports/release_notes.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/scripts/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/src/evalvault/scripts/regression_runner.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/conftest.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/README.md +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/benchmark/retrieval_ground_truth_min.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/benchmark/retrieval_ground_truth_multi.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/auto_insurance_qa_korean_full.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/comprehensive_dataset.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/edge_cases.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/edge_cases.xlsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/evaluation_test_sample.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/graphrag_retriever_docs.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/graphrag_smoke.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/insurance_document.txt +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/insurance_qa_english.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/insurance_qa_english.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/insurance_qa_english.xlsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/insurance_qa_korean.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/insurance_qa_korean.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/insurance_qa_korean.xlsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/insurance_qa_korean_versioned_pdf.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/run_mode_full_domain_memory.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/e2e/run_mode_simple.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/kg/minimal_graph.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/sample_dataset.csv +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/sample_dataset.json +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/fixtures/sample_dataset.xlsx +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/benchmark/test_benchmark_service_integration.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/conftest.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_cli_integration.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_data_flow.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_e2e_scenarios.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_evaluation_flow.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_full_workflow.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_langfuse_flow.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_phoenix_flow.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_pipeline_api_contracts.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_storage_flow.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/integration/test_summary_eval_fixture.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/optional_deps.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/inbound/mcp/test_execute_tools.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/inbound/mcp/test_read_tools.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/documents/test_pdf_extractor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/documents/test_versioned_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/improvement/__init__.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/improvement/test_insight_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/improvement/test_pattern_detector.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/improvement/test_playbook_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/improvement/test_stage_metric_playbook_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/kg/test_graph_rag_retriever.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/kg/test_parallel_kg_builder.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/adapters/outbound/storage/test_benchmark_storage_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/config/test_phoenix_support.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/conftest.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/metrics/test_analysis_metric_registry.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/metrics/test_confidence.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/metrics/test_contextual_relevancy.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/metrics/test_entity_preservation.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/metrics/test_metric_registry.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/metrics/test_no_answer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/metrics/test_retrieval_rank.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/metrics/test_text_match.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_cache_metrics.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_claim_level.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_dataset_preprocessor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_document_versioning.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_evaluator_comprehensive.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_holdout_splitter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_improvement_guide_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_judge_calibration_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_ops_snapshot_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_regression_gate_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_retrieval_metrics.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_retriever_context.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_stage_event_builder.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_stage_metric_guide_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/services/test_synthetic_qa_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/test_embedding_overlay.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/test_prompt_manifest.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/domain/test_prompt_status.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/reports/test_release_notes.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/scripts/test_regression_runner.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_agent_types.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_analysis_entities.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_analysis_modules.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_analysis_pipeline.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_analysis_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_anthropic_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_artifact_lint_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_async_batch_executor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_azure_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_benchmark_helpers.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_benchmark_runner.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_causal_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_cli.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_cli_artifacts.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_cli_calibrate_judge.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_cli_domain.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_cli_init.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_cli_ops.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_cli_progress.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_cli_utils.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_data_loaders.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_difficulty_profiling_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_domain_config.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_domain_memory.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_entities.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_entities_kg.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_entity_extractor.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_evaluator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_experiment.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_hybrid_cache.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_instrumentation.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_insurance_metric.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_intent_classifier.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_kg_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_kg_networkx.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_kiwi_tokenizer.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_kiwi_warning_suppression.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_korean_dense.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_korean_evaluation.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_korean_retrieval.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_langfuse_tracker.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_llm_relation_augmenter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_lm_eval_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_markdown_report.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_memory_cache.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_memory_services.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_method_plugins.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_mlflow_tracker.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_model_config.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_nlp_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_nlp_entities.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_ollama_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_openai_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_phoenix_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_pipeline_orchestrator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_ports.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_postgres_storage.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_prompt_candidate_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_rag_trace_entities.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_regress_cli.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_run_comparison_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_run_memory_helpers.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_run_mode_fixtures.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_settings.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_sqlite_storage.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_stage_cli.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_stage_metric_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_stage_storage.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_stage_summary_service.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_statistical_adapter.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_streaming_loader.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_summary_eval_fixture.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_testset_generator.py +0 -0
- {evalvault-1.65.0 → evalvault-1.66.0}/tests/unit/test_web_adapter.py +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: evalvault
|
|
3
|
-
Version: 1.
|
|
3
|
+
Version: 1.66.0
|
|
4
4
|
Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
|
|
5
5
|
Project-URL: Homepage, https://github.com/ntts9990/EvalVault
|
|
6
6
|
Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
|
|
@@ -25,6 +25,7 @@ Classifier: Topic :: Software Development :: Quality Assurance
|
|
|
25
25
|
Classifier: Topic :: Software Development :: Testing
|
|
26
26
|
Classifier: Typing :: Typed
|
|
27
27
|
Requires-Python: >=3.12
|
|
28
|
+
Requires-Dist: chainlit>=2.9.5
|
|
28
29
|
Requires-Dist: chardet
|
|
29
30
|
Requires-Dist: fastapi>=0.128.0
|
|
30
31
|
Requires-Dist: instructor
|
|
@@ -137,12 +138,17 @@ English version? See `README.en.md`.
|
|
|
137
138
|
## Quick Links
|
|
138
139
|
|
|
139
140
|
- Documentation hub: `docs/INDEX.md`
|
|
141
|
+
- CLI run scenario guide: `docs/guides/RAG_CLI_WORKFLOW_TEMPLATES.md`
|
|
140
142
|
- User guide: `docs/guides/USER_GUIDE.md`
|
|
141
143
|
- Developer guide: `docs/guides/DEV_GUIDE.md`
|
|
142
144
|
- Status/roadmap: `docs/STATUS.md`, `docs/ROADMAP.md`
|
|
143
145
|
- Development whitepaper (design/operations/quality standards): `docs/new_whitepaper/INDEX.md`
|
|
144
146
|
- Open RAG Trace: `docs/architecture/open-rag-trace-spec.md`
|
|
145
147
|
|
|
148
|
+
### Notes on upcoming improvements
|
|
149
|
+
- Insurance summary metrics expansion plan: `docs/guides/INSURANCE_SUMMARY_METRICS_PLAN.md`
|
|
150
|
+
- Repeated prompt application plan: `docs/guides/repeat_query.md`
|
|
151
|
+
|
|
146
152
|
---
|
|
147
153
|
|
|
148
154
|
## Problems EvalVault solves
|
|
@@ -470,6 +476,24 @@ npm run dev
|
|
|
470
476
|
- Ragas family: `faithfulness`, `answer_relevancy`, `context_precision`, `context_recall`, `factual_correctness`, `semantic_similarity`
|
|
471
477
|
- Custom example (domain): `insurance_term_accuracy`
|
|
472
478
|
|
|
479
|
+
### Summary metric design rationale (summary_score, summary_faithfulness, entity_preservation)
|
|
480
|
+
|
|
481
|
+
### Custom metric snapshot (records evaluation method/process/results)
|
|
482
|
+
- Records the evaluation method, inputs/outputs, rules, and implementation file hash in `run.tracker_metadata.custom_metric_snapshot`.
|
|
483
|
+
- The snapshot is also saved to the Excel `CustomMetrics` sheet and to Langfuse/Phoenix/MLflow artifacts.
|
|
484
|
+
|
|
485
|
+
- `summary_faithfulness`: Evaluates whether every claim in the summary is grounded in the context. It directly measures hallucination/distortion risk.
|
|
486
|
+
- `summary_score`: Evaluates the balance between preserving key information from the context and staying concise. It reduces the bias of relying on a single reference summary.
|
|
487
|
+
- `entity_preservation`: Measures whether entities that matter in insurance policy terms (amounts, periods, conditions, exclusions) are retained in the summary.
|
|
488
|
+
|
|
489
|
+
**Rationale for insurance domain specialization**
|
|
490
|
+
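- Critical elements of insurance terms (exclusions, deductibles, limits, conditions) are reflected directly as keywords, and the design preserves key entities such as amounts, periods, and ratios.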
|
|
491
|
+
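- Because generic rules (numbers/periods/amounts) are combined with insurance-specific keywords, the current state is most accurately described as “weak domain specialization centered on insurance risk”.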
|
|
492
|
+
|
|
493
|
+
**Interpretation caveats**
|
|
494
|
+
- All three metrics depend heavily on the quality of `contexts`. Inaccurate or excessive context can lower the scores.
|
|
495
|
+
- `summary_score` is keyphrase-based, so scores can come out low when the wording differs.
|
|
496
|
+
|
|
473
497
|
Exact options and operational recipes are kept up to date in `docs/guides/USER_GUIDE.md`.
|
|
474
498
|
|
|
475
499
|
---
|
|
@@ -14,12 +14,17 @@ English version? See `README.en.md`.
|
|
|
14
14
|
## Quick Links
|
|
15
15
|
|
|
16
16
|
- Documentation hub: `docs/INDEX.md`
|
|
17
|
+
- CLI run scenario guide: `docs/guides/RAG_CLI_WORKFLOW_TEMPLATES.md`
|
|
17
18
|
- User guide: `docs/guides/USER_GUIDE.md`
|
|
18
19
|
- Developer guide: `docs/guides/DEV_GUIDE.md`
|
|
19
20
|
- Status/roadmap: `docs/STATUS.md`, `docs/ROADMAP.md`
|
|
20
21
|
- Development whitepaper (design/operations/quality standards): `docs/new_whitepaper/INDEX.md`
|
|
21
22
|
- Open RAG Trace: `docs/architecture/open-rag-trace-spec.md`
|
|
22
23
|
|
|
24
|
+
### Notes on upcoming improvements
|
|
25
|
+
- Insurance summary metrics expansion plan: `docs/guides/INSURANCE_SUMMARY_METRICS_PLAN.md`
|
|
26
|
+
- Repeated prompt application plan: `docs/guides/repeat_query.md`
|
|
27
|
+
|
|
23
28
|
---
|
|
24
29
|
|
|
25
30
|
## Problems EvalVault solves
|
|
@@ -347,6 +352,24 @@ npm run dev
|
|
|
347
352
|
- Ragas family: `faithfulness`, `answer_relevancy`, `context_precision`, `context_recall`, `factual_correctness`, `semantic_similarity`
|
|
348
353
|
- Custom example (domain): `insurance_term_accuracy`
|
|
349
354
|
|
|
355
|
+
### Summary metric design rationale (summary_score, summary_faithfulness, entity_preservation)
|
|
356
|
+
|
|
357
|
+
### Custom metric snapshot (records evaluation method/process/results)
|
|
358
|
+
- Records the evaluation method, inputs/outputs, rules, and implementation file hash in `run.tracker_metadata.custom_metric_snapshot`.
|
|
359
|
+
- The snapshot is also saved to the Excel `CustomMetrics` sheet and to Langfuse/Phoenix/MLflow artifacts.
|
|
360
|
+
|
|
361
|
+
- `summary_faithfulness`: Evaluates whether every claim in the summary is grounded in the context. It directly measures hallucination/distortion risk.
|
|
362
|
+
- `summary_score`: Evaluates the balance between preserving key information from the context and staying concise. It reduces the bias of relying on a single reference summary.
|
|
363
|
+
- `entity_preservation`: Measures whether entities that matter in insurance policy terms (amounts, periods, conditions, exclusions) are retained in the summary.
|
|
364
|
+
|
|
365
|
+
**Rationale for insurance domain specialization**
|
|
366
|
+
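- Critical elements of insurance terms (exclusions, deductibles, limits, conditions) are reflected directly as keywords, and the design preserves key entities such as amounts, periods, and ratios.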
|
|
367
|
+
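- Because generic rules (numbers/periods/amounts) are combined with insurance-specific keywords, the current state is most accurately described as “weak domain specialization centered on insurance risk”.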
|
|
368
|
+
|
|
369
|
+
**Interpretation caveats**
|
|
370
|
+
- All three metrics depend heavily on the quality of `contexts`. Inaccurate or excessive context can lower the scores.
|
|
371
|
+
- `summary_score` is keyphrase-based, so scores can come out low when the wording differs.
|
|
372
|
+
|
|
350
373
|
Exact options and operational recipes are kept up to date in `docs/guides/USER_GUIDE.md`.
|
|
351
374
|
|
|
352
375
|
---
|
|
@@ -0,0 +1,11 @@
|
|
|
1
|
+
faithfulness: |
|
|
2
|
+
You are an evaluator. Based on the CONTEXT below, decide whether each STATEMENT can be
|
|
3
|
+
directly inferred from it.
|
|
4
|
+
- Output verdict only as the integer 1 or 0 (no quotes).
|
|
5
|
+
- 1: directly supported by the context, 0: not supported.
|
|
6
|
+
- Return JSON only.
|
|
7
|
+
|
|
8
|
+
answer_relevancy: |
|
|
9
|
+
You are an evaluator. Rate how relevant the answer is to the question with a score from 0 to 1.
|
|
10
|
+
- The output must include a numeric score and a brief rationale.
|
|
11
|
+
- Assign a low score if much of the content is unrelated to the question.
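
The `faithfulness` override above asks the judge for JSON in which `verdict` is the bare integer 1 or 0. A minimal sketch of checking that constraint on the consumer side follows. The payload shape `{"verdict": ...}` and the helper name `parse_verdict` are assumptions for illustration; the schema the evaluator actually parses is not shown in this diff.

```python
import json


def parse_verdict(raw: str) -> int:
    """Return 0 or 1 from a judge response, or raise ValueError.

    Assumes a hypothetical payload of the form {"verdict": <int>}.
    """
    payload = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("judge response must be a JSON object")
    verdict = payload.get("verdict")
    # bool is a subclass of int in Python, so reject true/false explicitly.
    if isinstance(verdict, bool) or not isinstance(verdict, int):
        raise ValueError(f"verdict must be a bare integer, got {verdict!r}")
    if verdict not in (0, 1):
        raise ValueError(f"verdict must be 0 or 1, got {verdict}")
    return verdict
```

A quoted verdict such as `{"verdict": "1"}` is rejected here rather than coerced, which mirrors the "no quotes" rule in the prompt.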
|
|
@@ -13,16 +13,17 @@
|
|
|
13
13
|
## Quick links
|
|
14
14
|
|
|
15
15
|
- Installation: `getting-started/INSTALLATION.md`
|
|
16
|
+
- CLI run scenario guide: `guides/RAG_CLI_WORKFLOW_TEMPLATES.md`
|
|
16
17
|
- User guide (including operations): `guides/USER_GUIDE.md`
|
|
17
18
|
- Development/contributing: `guides/DEV_GUIDE.md`
|
|
18
|
-
- CLI→MCP porting plan: `guides/CLI_MCP_PLAN.md`
|
|
19
|
-
- Web UI expansion design doc: `guides/WEBUI_CLI_ROLLOUT_PLAN.md` (includes the phase 1 implementation file list)
|
|
20
|
-
- RAGAS human feedback calibration: `guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md`
|
|
21
19
|
- Diagnostic playbook: `guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md` (problem → analysis → interpretation → action flow)
|
|
22
20
|
- RAG performance improvement proposal: `guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md` (purpose/mission, KPIs, roadmap)
|
|
23
|
-
-
|
|
21
|
+
- RAGAS human feedback calibration: `guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md`
|
|
24
22
|
- Run result Excel sheet summary: `guides/EVALVAULT_RUN_EXCEL_SHEETS.md`
|
|
25
23
|
- Evaluation report templates: `templates/eval_report_templates.md`
|
|
24
|
+
- CLI→MCP porting plan: `guides/CLI_MCP_PLAN.md`
|
|
25
|
+
- Web UI expansion design doc: `guides/WEBUI_CLI_ROLLOUT_PLAN.md`
|
|
26
|
+
- Documentation refresh plan: `guides/DOCS_REFRESH_PLAN.md`
|
|
26
27
|
- Release checklist: `guides/RELEASE_CHECKLIST.md`
|
|
27
28
|
- Status summary: `STATUS.md`
|
|
28
29
|
- Roadmap: `ROADMAP.md`
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# EvalVault Roadmap
|
|
2
2
|
|
|
3
|
-
> Last Updated: 2026-01-
|
|
3
|
+
> Last Updated: 2026-01-18
|
|
4
4
|
|
|
5
5
|
This document briefly shares **"what we are doing next, and why"** from an external (user/contributor) perspective.
|
|
6
6
|
|
|
@@ -19,15 +19,18 @@
|
|
|
19
19
|
### P1 (Usability)
|
|
20
20
|
- Improve the completeness of the core Web UI workflow (Evaluation → History → Reports)
|
|
21
21
|
- Expose the shared CLI/web DB and artifact path conventions consistently in the docs and UX
|
|
22
|
+
- Strengthen the integration between the Run detail tabs (Staging/Prompts/Gate/Debug) and the Analysis Lab
|
|
22
23
|
|
|
23
24
|
### P2 (Observability/Standards)
|
|
24
25
|
- Incrementally extend the Open RAG Trace spec/samples to match real operational requirements (following the versioning policy)
|
|
25
26
|
- Strengthen guidance on Collector configuration and data retention (artifact separation, PII masking)
|
|
27
|
+
- Standardize the minimal Stage Events schema and keep the docs in sync
|
|
26
28
|
|
|
27
29
|
### P3 (Performance improvement roadmap)
|
|
28
30
|
- Establish KPIs, the evaluation protocol, and the roadmap based on the RAG performance improvement proposal
|
|
29
31
|
- Integrate retrieval/reranking/GraphRAG experiments with operational metrics
|
|
30
32
|
- Advance the improvement loop based on expert perspectives (cognition/UX/operations)
|
|
33
|
+
- Establish operational criteria for noise reduction and ordering_warning
|
|
31
34
|
|
|
32
35
|
## Work tracking
|
|
33
36
|
|
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
# EvalVault Status Summary
|
|
2
2
|
|
|
3
3
|
> Audience: users · developers · operators
|
|
4
|
-
> Last Updated: 2026-01-
|
|
4
|
+
> Last Updated: 2026-01-18
|
|
5
5
|
|
|
6
6
|
EvalVault's goal is to **connect RAG evaluation, analysis, and tracing under a single Run unit**, so that the experiment → regression → improvement loop can be set up quickly.
|
|
7
7
|
|
|
@@ -14,6 +14,18 @@ EvalVault의 목표는 **RAG 평가/분석/추적을 하나의 Run 단위로 연
|
|
|
14
14
|
- **Open RAG Trace standard**: instrument and collect from external/internal RAG systems using a standard schema
|
|
15
15
|
- **Performance improvement framework**: KPIs/evaluation/roadmap are organized in `guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md`
|
|
16
16
|
|
|
17
|
+
## Recently completed
|
|
18
|
+
|
|
19
|
+
- **CLI parallel command group completed**: compare/calibrate-judge/profile-difficulty/regress/artifacts lint/ops snapshot
|
|
20
|
+
- **Noise reduction pipeline strengthened**: improvements to dataset_preprocessor/evaluator/stage_metric_service
|
|
21
|
+
- **ordering_warning introduced**: order restoration/warning metric, plus documented runbook and strict criteria
|
|
22
|
+
- **Web UI updated**: warnings and runbook links added to RunDetails/CompareRuns/AnalysisLab
|
|
23
|
+
|
|
24
|
+
## Quality/verification status
|
|
25
|
+
|
|
26
|
+
- Python unit smoke: dataset_preprocessor/evaluator_comprehensive/stage_metric_service PASS
|
|
27
|
+
- Frontend lint/build: eslint PASS, vite build PASS (bundle size warning only)
|
|
28
|
+
|
|
17
29
|
## Current limitations (disclosed transparently)
|
|
18
30
|
|
|
19
31
|
- The Web UI does not expose every CLI flag/option one-to-one. (Advanced options are CLI-first.)
|
|
@@ -65,6 +65,22 @@
|
|
|
65
65
|
- `samples`: number of samples
|
|
66
66
|
- Sample: `avg_score=0.7200`, `pass_rate=0.6`, `samples=30`
|
|
67
67
|
|
|
68
|
+
## CustomMetrics
|
|
69
|
+
- Column descriptions
|
|
70
|
+
- `schema_version`: snapshot schema version
|
|
71
|
+
- `metric_name`: metric name
|
|
72
|
+
- `source`: metric source (custom)
|
|
73
|
+
- `description`: metric description
|
|
74
|
+
- `evaluation_method`: evaluation method
|
|
75
|
+
- `inputs`: list of input fields
|
|
76
|
+
- `output`: score range/decision rule
|
|
77
|
+
- `evaluation_process`: summary of the evaluation process
|
|
78
|
+
- `rules`: keywords/regular expressions/weights, etc.
|
|
79
|
+
- `notes`: domain-specific notes and interpretation caveats
|
|
80
|
+
- `implementation_path`: implementation file path
|
|
81
|
+
- `implementation_hash`: implementation file hash
|
|
82
|
+
- Sample: `metric_name=entity_preservation`, `evaluation_method=rule-based`
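
As a rough illustration of how one `CustomMetrics` row with these columns could be assembled, the sketch below hashes the implementation file and fills in the listed fields. SHA-256, the `schema_version` default, and the function name `build_snapshot_row` are assumptions; the guide only says that a hash of the implementation file is recorded.

```python
import hashlib
from pathlib import Path


def build_snapshot_row(
    metric_name: str,
    evaluation_method: str,
    implementation_path: str,
    schema_version: int = 1,  # placeholder value, not taken from the guide
) -> dict:
    """Build one row using the column names listed for the CustomMetrics sheet."""
    # SHA-256 is an assumption; the guide does not name the hash algorithm.
    digest = hashlib.sha256(Path(implementation_path).read_bytes()).hexdigest()
    return {
        "schema_version": schema_version,
        "metric_name": metric_name,
        "source": "custom",
        "description": "",
        "evaluation_method": evaluation_method,
        "inputs": [],
        "output": "",
        "evaluation_process": "",
        "rules": {},
        "notes": "",
        "implementation_path": implementation_path,
        "implementation_hash": digest,
    }
```

Recording the file hash next to the path makes it possible to tell whether two runs were scored by the same metric implementation.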
|
|
83
|
+
|
|
68
84
|
## RunPromptSets
|
|
69
85
|
- Column descriptions
|
|
70
86
|
- `run_id`: run ID
|
|
@@ -1,10 +1,9 @@
|
|
|
1
|
-
# EvalVault 작업 계획서 (
|
|
1
|
+
# EvalVault Work Plan (Archived)
|
|
2
2
|
|
|
3
3
|
## 0) Purpose
|
|
4
4
|
|
|
5
|
-
-
|
|
6
|
-
-
|
|
7
|
-
- Override the RAGAS prompts and the system prompt **separately**, and verify the result with actual runs
|
|
5
|
+
- This document is classified as a past work plan and is kept for archival purposes only.
|
|
6
|
+
- For the latest run scenarios, refer to `docs/guides/RAG_CLI_WORKFLOW_TEMPLATES.md`.
|
|
8
7
|
|
|
9
8
|
## 1) Assumptions and scope
|
|
10
9
|
|
|
@@ -0,0 +1,152 @@
|
|
|
1
|
+
# Insurance Domain Summary Metrics Expansion PRD/SDD (EvalVault)
|
|
2
|
+
|
|
3
|
+
## 1) Goals
|
|
4
|
+
- Add four custom metrics that evaluate insurance consultation and policy summaries for “summary quality + risk disclosure + suppression of definitive wording”.
|
|
5
|
+
- Integrate them consistently into the existing EvalVault evaluation pipeline (metric registry, CUSTOM_METRIC_MAP, reports/Excel/UI).
|
|
6
|
+
- Specify metric definitions, rules, and snapshots explicitly to ensure reproducibility and operational tunability.
|
|
7
|
+
|
|
8
|
+
## 2) Scope
|
|
9
|
+
### In scope
|
|
10
|
+
- Four new metrics
|
|
11
|
+
- summary_accuracy
|
|
12
|
+
- summary_risk_coverage
|
|
13
|
+
- summary_non_definitive
|
|
14
|
+
- summary_needs_followup
|
|
15
|
+
- TestCase metadata extension
|
|
16
|
+
- summary_tags: list[str]
|
|
17
|
+
- summary_intent: "agent_notes"
|
|
18
|
+
- Integration order
|
|
19
|
+
1) CLI
|
|
20
|
+
2) Excel/reports
|
|
21
|
+
3) Web UI
|
|
22
|
+
|
|
23
|
+
### Out of scope
|
|
24
|
+
- Introducing a new evaluation pipeline
|
|
25
|
+
- Changing the meaning of the Ragas summary metrics
|
|
26
|
+
- Introducing additional metrics that have not been agreed on
|
|
27
|
+
|
|
28
|
+
## 3) Current state and consistency criteria
|
|
29
|
+
- EvalVault registers custom metrics in evaluator.CUSTOM_METRIC_MAP and manages their exposed specs in the registry.
|
|
30
|
+
- Summary metrics keep their own ordering and threshold criteria in the CLI, reports, and UI.
|
|
31
|
+
- TestCase.metadata is already supported by the JSON loader, so adding summary_tags/summary_intent to metadata is the consistent approach.
|
|
32
|
+
|
|
33
|
+
## 4) Data schema
|
|
34
|
+
### TestCase.metadata
|
|
35
|
+
- summary_tags: list[str] (optional)
|
|
36
|
+
- summary_intent: "agent_notes" (optional, fixed value for internal use)
|
|
37
|
+
|
|
38
|
+
Example:
|
|
39
|
+
```json
|
|
40
|
+
{
|
|
41
|
+
"id": "tc-001",
|
|
42
|
+
"question": "상담 요약 요청",
|
|
43
|
+
"answer": "요약문 ...",
|
|
44
|
+
"contexts": ["대화 원문 ..."],
|
|
45
|
+
"ground_truth": "현업 요약 ...",
|
|
46
|
+
"metadata": {
|
|
47
|
+
"summary_intent": "agent_notes",
|
|
48
|
+
"summary_tags": ["exclusion", "deductible", "limit", "needs_followup"]
|
|
49
|
+
}
|
|
50
|
+
}
|
|
51
|
+
```
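
To make the optional fields concrete, here is a minimal sketch of reading `summary_tags` and `summary_intent` from a test case dictionary shaped like the JSON example. The helper name `read_summary_metadata` is hypothetical, and treating a missing `metadata` block as "no tags, no intent" is an assumption rather than documented loader behavior.

```python
from typing import Any, Optional


def read_summary_metadata(case: dict[str, Any]) -> tuple[list[str], Optional[str]]:
    """Return (summary_tags, summary_intent) with safe defaults."""
    # Both fields are optional, so tolerate a missing or null metadata block.
    metadata = case.get("metadata") or {}
    tags = [str(tag) for tag in metadata.get("summary_tags") or []]
    intent = metadata.get("summary_intent")
    return tags, intent
```

With the example test case, this would yield the four listed tags and `"agent_notes"`; a case without `metadata` yields an empty tag list and `None`.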
|
|
52
|
+
|
|
53
|
+
## 5) Metric definitions
|
|
54
|
+
### 5.1 summary_accuracy
|
|
55
|
+
- Purpose: evaluate whether key entities in the summary (amounts, periods, conditions, etc.) are grounded in the context
|
|
56
|
+
- Inputs: answer, contexts
|
|
57
|
+
- Score: supported_entities / summary_entities
|
|
58
|
+
- Adjustment policy:
|
|
59
|
+
- 0.5 if summary_entities is empty and context_entities is not
|
|
60
|
+
- 0.0 if context_entities is empty
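
The scoring rule and adjustment policy for `summary_accuracy` can be sketched as follows. Entity extraction is out of scope here, so the function takes already-extracted sets. Counting an entity as "supported" when it appears verbatim in the context set, and checking the empty-context rule first, are both assumptions; the plan does not specify either.

```python
def summary_accuracy(summary_entities: set[str], context_entities: set[str]) -> float:
    """Share of summary entities that also appear among the context entities."""
    if not context_entities:
        # Adjustment policy: nothing in the context to ground against.
        return 0.0
    if not summary_entities:
        # Adjustment policy: context has entities but the summary mentions none.
        return 0.5
    # Assumption: "supported" means exact membership in the context entity set.
    supported = summary_entities & context_entities
    return len(supported) / len(summary_entities)
```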
|
|
61
|
+
|
|
62
|
+
### 5.2 summary_risk_coverage
|
|
63
|
+
- Purpose: evaluate whether insurance risk items (exclusions, benefit reductions, deductibles, limits, etc.) are missing from the summary
|
|
64
|
+
- Inputs: answer, metadata.summary_tags
|
|
65
|
+
- Score: covered_tags / expected_tags
|
|
66
|
+
- 1.0 if expected_tags is empty
|
|
67
|
+
|
|
68
|
+
### 5.3 summary_non_definitive
|
|
69
|
+
- Purpose: evaluate whether definitive wording (e.g. “무조건 지급” (paid unconditionally), “반드시” (without fail)) is avoided
|
|
70
|
+
- Inputs: answer
|
|
71
|
+
- Score: 1.0 if no definitive wording is present, 0.0 otherwise
|
|
72
|
+
|
|
73
|
+
### 5.4 summary_needs_followup
|
|
74
|
+
- Purpose: evaluate whether the summary explicitly states “추가 확인 필요” (further confirmation needed) when follow-up is required
|
|
75
|
+
- Inputs: answer, metadata.summary_tags
|
|
76
|
+
- Rules:
|
|
77
|
+
- If the needs_followup tag is present: 1.0 when a follow-up expression is included, 0.0 otherwise
|
|
78
|
+
- If the tag is absent: 1.0 when no follow-up expression is present
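
A minimal sketch of the two `summary_needs_followup` rules, using the `needs_followup` keywords from the rule set in section 7. Plain substring matching is an assumption, and so is returning 0.0 when a follow-up expression appears without the tag: the plan only states the 1.0 case for untagged summaries.

```python
# Keywords taken from the needs_followup entry of the tag -> keyword mapping.
FOLLOWUP_KEYWORDS = ("확인 필요", "추가 확인", "담당자 확인", "재문의", "follow up")


def summary_needs_followup(answer: str, summary_tags: list[str]) -> float:
    """Score 1.0 when the presence of a follow-up expression matches the tag."""
    text = answer.lower()
    has_followup = any(keyword in text for keyword in FOLLOWUP_KEYWORDS)
    expected = "needs_followup" in summary_tags
    # Tagged: the expression must be present. Untagged: it must be absent
    # (the 0.0 outcome for the untagged case is an assumption).
    return 1.0 if has_followup == expected else 0.0
```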
|
|
79
|
+
|
|
80
|
+
## 6) Thresholds (initial recommendations)
|
|
81
|
+
- summary_accuracy: 0.90
|
|
82
|
+
- summary_risk_coverage: 0.90
|
|
83
|
+
- summary_non_definitive: 0.80
|
|
84
|
+
- summary_needs_followup: 0.80
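
As a small illustration of applying these initial thresholds, the sketch below lists the metrics that fall below their recommended value. Treating `score >= threshold` as a pass and skipping metrics that were not computed are assumptions; the names `SUMMARY_THRESHOLDS` and `failing_metrics` are hypothetical.

```python
# Initial recommended thresholds from the list above.
SUMMARY_THRESHOLDS = {
    "summary_accuracy": 0.90,
    "summary_risk_coverage": 0.90,
    "summary_non_definitive": 0.80,
    "summary_needs_followup": 0.80,
}


def failing_metrics(scores: dict[str, float]) -> list[str]:
    """Return the metrics whose score is below the recommended threshold."""
    return [
        name
        for name, threshold in SUMMARY_THRESHOLDS.items()
        # Assumption: metrics missing from `scores` are skipped, not failed.
        if name in scores and scores[name] < threshold
    ]
```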
|
|
85
|
+
|
|
86
|
+
## 7) Rule set (initial)
|
|
87
|
+
### tag -> keyword mapping
|
|
88
|
+
- exclusion: 면책, 보장 제외, 지급 불가, exclusion
|
|
89
|
+
- deductible: 자기부담, 본인부담금, deductible, copay
|
|
90
|
+
- limit: 한도, 상한, 최대, limit, cap
|
|
91
|
+
- waiting_period: 면책기간, 대기기간, waiting period
|
|
92
|
+
- condition: 조건, 단서, 다만, condition
|
|
93
|
+
- documents_required: 서류, 진단서, 영수증, documents
|
|
94
|
+
- needs_followup: 확인 필요, 추가 확인, 담당자 확인, 재문의, follow up
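
Combining this mapping with the `summary_risk_coverage` definition (covered_tags / expected_tags, 1.0 when no tags are expected) gives a sketch like the one below. Case-insensitive substring matching is an assumption, and it has visible weaknesses: `cap` also matches inside words such as "capital", and `면책기간` triggers `exclusion` because it contains `면책`.

```python
# Tag -> keyword mapping copied from the rule set above.
TAG_KEYWORDS = {
    "exclusion": ("면책", "보장 제외", "지급 불가", "exclusion"),
    "deductible": ("자기부담", "본인부담금", "deductible", "copay"),
    "limit": ("한도", "상한", "최대", "limit", "cap"),
    "waiting_period": ("면책기간", "대기기간", "waiting period"),
    "condition": ("조건", "단서", "다만", "condition"),
    "documents_required": ("서류", "진단서", "영수증", "documents"),
    "needs_followup": ("확인 필요", "추가 확인", "담당자 확인", "재문의", "follow up"),
}


def summary_risk_coverage(answer: str, expected_tags: list[str]) -> float:
    """Fraction of expected risk tags that have at least one keyword in the answer."""
    if not expected_tags:
        return 1.0
    text = answer.lower()
    covered = [
        tag
        for tag in expected_tags
        # Unknown tags have no keywords and therefore count as not covered.
        if any(keyword in text for keyword in TAG_KEYWORDS.get(tag, ()))
    ]
    return len(covered) / len(expected_tags)
```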
|
|
95
|
+
|
|
96
|
+
### Definitive wording detection
|
|
97
|
+
- 무조건, 반드시, 100%, 전액 지급, 확실히, 분명히, always, guaranteed
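
With this phrase list, the binary `summary_non_definitive` score from section 5.3 can be sketched in a few lines. Case-insensitive substring matching is an assumption; as the risks section notes, keyword rules of this kind can over- or under-detect.

```python
# Definitive phrases copied from the list above.
DEFINITIVE_PHRASES = (
    "무조건", "반드시", "100%", "전액 지급", "확실히", "분명히", "always", "guaranteed",
)


def summary_non_definitive(answer: str) -> float:
    """Return 1.0 when no definitive phrase appears in the answer, else 0.0."""
    text = answer.lower()
    return 0.0 if any(phrase in text for phrase in DEFINITIVE_PHRASES) else 1.0
```

A higher score therefore means definitive wording was avoided, not that it was found.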
|
|
98
|
+
|
|
99
|
+
## 8) Integration points (implementation order)
|
|
100
|
+
### 8.1 CLI
|
|
101
|
+
- Add the new metric classes
|
|
102
|
+
- Register them in evaluator.CUSTOM_METRIC_MAP
|
|
103
|
+
- Add their specs to metrics.registry
|
|
104
|
+
- Extend the summary threshold profile and SUMMARY_METRIC_ORDER
|
|
105
|
+
|
|
106
|
+
### 8.2 Excel/reports
|
|
107
|
+
- Record details of the new metrics in custom_metric_snapshot
|
|
108
|
+
- Safely convert JSON columns in the Excel export (compatibility hardening)
|
|
109
|
+
- Extend the summary metric warning lines in the summary report and the LLM report
|
|
110
|
+
|
|
111
|
+
### 8.3 Web UI
|
|
112
|
+
- Extend SUMMARY_METRICS/thresholds
|
|
113
|
+
- Reflect the new metrics in the summary metric cards/charts/filters
|
|
114
|
+
|
|
115
|
+
## 9) Risks/caveats
|
|
116
|
+
- The CSV/Excel loaders do not currently support test_case metadata (JSON first)
|
|
117
|
+
- Definitive wording and risk keywords may be under- or over-detected because phrasing varies
|
|
118
|
+
- It must be stated clearly that summary_non_definitive scores the “suppression of definitive wording”
|
|
119
|
+
- Excel export can fail during conversion because JSON columns are mixed in → keep the forced json_columns conversion
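
The JSON column risk above can be illustrated with a sketch that serializes nested values to strings before a row is written to a sheet. The name `json_columns` comes from the plan, while the helper `to_excel_safe` and its exact behavior are assumptions, not the project's export code.

```python
import json
from typing import Any


def to_excel_safe(row: dict[str, Any], json_columns: set[str]) -> dict[str, Any]:
    """Serialize nested or JSON-designated values so each cell holds a scalar."""
    safe: dict[str, Any] = {}
    for key, value in row.items():
        is_nested = isinstance(value, (dict, list, tuple))
        # Force conversion for designated columns, but leave strings and None as-is.
        forced = key in json_columns and not isinstance(value, (str, type(None)))
        if is_nested or forced:
            # ensure_ascii=False keeps Korean keywords readable in the sheet.
            safe[key] = json.dumps(value, ensure_ascii=False, default=str)
        else:
            safe[key] = value
    return safe
```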
|
|
120
|
+
|
|
121
|
+
## 10) Hybrid (rules + LLM adjustment) design proposal
|
|
122
|
+
|
|
123
|
+
### 10.1 Common flow
|
|
124
|
+
1) Compute a first-pass rule-based score
|
|
125
|
+
2) Apply LLM adjustment only in uncertain ranges, such as boundary cases or missing tags
|
|
126
|
+
3) Combine into the final score
|
|
127
|
+
- Default: `final = 0.7 * rule + 0.3 * llm`
|
|
128
|
+
- Or override only when the LLM reports high confidence
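
The common flow can be sketched as a function that calls the LLM judge only inside an uncertain band and otherwise returns the rule score unchanged. The default band of 0.3 to 0.7 is borrowed from the `summary_accuracy` boundary condition in section 10.2 and applied generically here as an assumption; `hybrid_score` and `llm_judge` are hypothetical names.

```python
from typing import Callable


def hybrid_score(
    rule_score: float,
    llm_judge: Callable[[], float],
    low: float = 0.3,
    high: float = 0.7,
    rule_weight: float = 0.7,
) -> float:
    """Blend rule and LLM scores, calling the LLM only for boundary cases."""
    if not (low <= rule_score <= high):
        # Confident rule result: skip the LLM call to save cost and variance.
        return rule_score
    llm_score = llm_judge()
    # Default composition from the plan: final = 0.7 * rule + 0.3 * llm.
    return rule_weight * rule_score + (1.0 - rule_weight) * llm_score
```

Passing the judge as a callable keeps the LLM call lazy, so confident rule scores never trigger a request.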
|
|
129
|
+
|
|
130
|
+
### 10.2 Adjustment criteria per metric
|
|
131
|
+
- `summary_accuracy`
|
|
132
|
+
- Boundary conditions: rule score between 0.3 and 0.7, or very few entities
|
|
133
|
+
- LLM question: “Are the figures/periods/conditions in the summary grounded in the context?” (0~1)
|
|
134
|
+
- `summary_risk_coverage`
|
|
135
|
+
- When there are no tags, the LLM estimates whether risk items are present → generates virtual tags
|
|
136
|
+
- LLM question: “Does the summary include exclusions/benefit reductions/deductibles/limits/conditions?”
|
|
137
|
+
- `summary_non_definitive`
|
|
138
|
+
- The LLM re-judges only when the rule score is 0.0
|
|
139
|
+
- LLM question: “Does the summary assert facts definitively?” (0~1)
|
|
140
|
+
- `summary_needs_followup`
|
|
141
|
+
- Use the LLM only when the needs_followup tag is present or the rule-based judgment is ambiguous
|
|
142
|
+
- LLM question: “Does the summary include guidance on further confirmation or re-inquiry?” (0/1)
|
|
143
|
+
|
|
144
|
+
### 10.3 Operational guidance
|
|
145
|
+
- Limit LLM adjustment **to boundary cases only** to reduce cost and variance.
|
|
146
|
+
- Record the prompt/model versions in the snapshot so regressions can be tracked.
|
|
147
|
+
- Store the rule-based score and the adjusted score together to make debugging possible.
|
|
148
|
+
|
|
149
|
+
## 11) Rollout
|
|
150
|
+
1) CLI (metric computation/display)
|
|
151
|
+
2) Excel/reports
|
|
152
|
+
3) Web UI
|
|
@@ -1,8 +1,8 @@
|
|
|
1
|
-
# EvalVault
|
|
1
|
+
# EvalVault Development Status/Execution Plan (Archived)
|
|
2
2
|
|
|
3
3
|
## Purpose
|
|
4
|
-
-
|
|
5
|
-
-
|
|
4
|
+
- This document is classified as a past work log/plan and is kept for archival purposes only.
|
|
5
|
+
- For the latest status, refer to `docs/STATUS.md`, `docs/ROADMAP.md`, and `docs/guides/RAG_PERFORMANCE_IMPLEMENTATION_LOG.md`.
|
|
6
6
|
|
|
7
7
|
---
|
|
8
8
|
|