evalvault 1.63.1__tar.gz → 1.65.0__tar.gz

Files changed (829)
  1. {evalvault-1.63.1 → evalvault-1.65.0}/.env.example +18 -2
  2. {evalvault-1.63.1 → evalvault-1.65.0}/PKG-INFO +8 -1
  3. {evalvault-1.63.1 → evalvault-1.65.0}/README.md +3 -0
  4. {evalvault-1.63.1 → evalvault-1.65.0}/docker-compose.langfuse.yml +19 -19
  5. {evalvault-1.63.1 → evalvault-1.65.0}/docs/INDEX.md +3 -0
  6. {evalvault-1.63.1 → evalvault-1.65.0}/docs/README.ko.md +4 -0
  7. {evalvault-1.63.1 → evalvault-1.65.0}/docs/ROADMAP.md +5 -0
  8. {evalvault-1.63.1 → evalvault-1.65.0}/docs/STATUS.md +1 -0
  9. evalvault-1.65.0/docs/guides/CLI_PARALLEL_FEATURES_SPEC.md +315 -0
  10. evalvault-1.65.0/docs/guides/Extension_2.md +114 -0
  11. evalvault-1.65.0/docs/guides/Extension_Data_Difficulty_Profiling_Custom_Judge_Model.md +1412 -0
  12. evalvault-1.65.0/docs/guides/PARALLEL_WORK_APPROVAL_RULES.md +51 -0
  13. evalvault-1.65.0/docs/guides/PROJECT_STATUS_AND_PLAN.md +291 -0
  14. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md +77 -2
  15. evalvault-1.65.0/docs/guides/RAG_NOISE_REDUCTION_GUIDE.md +284 -0
  16. evalvault-1.65.0/docs/guides/RAG_PERFORMANCE_IMPLEMENTATION_LOG.md +179 -0
  17. evalvault-1.65.0/docs/guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md +477 -0
  18. evalvault-1.65.0/docs/guides/refactoring_strategy.md +63 -0
  19. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/01_overview.md +1 -0
  20. evalvault-1.65.0/docs/refactor/REFAC_000_master_plan.md +161 -0
  21. evalvault-1.65.0/docs/refactor/REFAC_010_agent_playbook.md +83 -0
  22. evalvault-1.65.0/docs/refactor/REFAC_020_logging_policy.md +61 -0
  23. evalvault-1.65.0/docs/refactor/REFAC_030_phase0_responsibility_map.md +82 -0
  24. evalvault-1.65.0/docs/refactor/REFAC_040_wbs_parallel_plan.md +117 -0
  25. evalvault-1.65.0/docs/refactor/logs/phase-0-baseline.md +26 -0
  26. evalvault-1.65.0/docs/refactor/logs/phase-1-evaluator.md +26 -0
  27. evalvault-1.65.0/docs/refactor/logs/phase-2-cli-run.md +26 -0
  28. evalvault-1.65.0/docs/refactor/logs/phase-3-analysis.md +26 -0
  29. evalvault-1.65.0/docs/security_audit_worklog.md +482 -0
  30. evalvault-1.65.0/docs/templates/eval_report_templates.md +117 -0
  31. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/AnalysisLab.tsx +36 -0
  32. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/CompareRuns.tsx +42 -1
  33. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/RunDetails.tsx +55 -0
  34. {evalvault-1.63.1 → evalvault-1.65.0}/pyproject.toml +6 -1
  35. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/ops/phoenix_watch.py +19 -2
  36. evalvault-1.65.0/src/evalvault/adapters/inbound/api/main.py +222 -0
  37. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/config.py +6 -1
  38. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/knowledge.py +62 -6
  39. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/__init__.py +14 -7
  40. evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/artifacts.py +107 -0
  41. evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/calibrate_judge.py +283 -0
  42. evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/compare.py +290 -0
  43. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/history.py +13 -85
  44. evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/ops.py +110 -0
  45. evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/profile_difficulty.py +160 -0
  46. evalvault-1.65.0/src/evalvault/adapters/inbound/cli/commands/regress.py +251 -0
  47. evalvault-1.65.0/src/evalvault/adapters/outbound/analysis/comparison_pipeline_adapter.py +49 -0
  48. evalvault-1.65.0/src/evalvault/adapters/outbound/artifact_fs.py +16 -0
  49. evalvault-1.65.0/src/evalvault/adapters/outbound/filesystem/__init__.py +3 -0
  50. evalvault-1.65.0/src/evalvault/adapters/outbound/filesystem/difficulty_profile_writer.py +50 -0
  51. evalvault-1.65.0/src/evalvault/adapters/outbound/filesystem/ops_snapshot_writer.py +13 -0
  52. evalvault-1.65.0/src/evalvault/adapters/outbound/judge_calibration_adapter.py +36 -0
  53. evalvault-1.65.0/src/evalvault/adapters/outbound/judge_calibration_reporter.py +57 -0
  54. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/methods/external_command.py +22 -1
  55. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/langfuse_adapter.py +40 -15
  56. evalvault-1.65.0/src/evalvault/adapters/outbound/tracker/log_sanitizer.py +93 -0
  57. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/mlflow_adapter.py +3 -2
  58. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/phoenix_adapter.py +90 -37
  59. evalvault-1.65.0/src/evalvault/config/secret_manager.py +118 -0
  60. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/settings.py +141 -1
  61. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/__init__.py +10 -0
  62. evalvault-1.65.0/src/evalvault/domain/entities/judge_calibration.py +50 -0
  63. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/stage.py +11 -3
  64. evalvault-1.65.0/src/evalvault/domain/services/artifact_lint_service.py +268 -0
  65. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/benchmark_runner.py +1 -6
  66. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/dataset_preprocessor.py +26 -0
  67. evalvault-1.65.0/src/evalvault/domain/services/difficulty_profile_reporter.py +25 -0
  68. evalvault-1.65.0/src/evalvault/domain/services/difficulty_profiling_service.py +304 -0
  69. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/evaluator.py +2 -0
  70. evalvault-1.65.0/src/evalvault/domain/services/judge_calibration_service.py +495 -0
  71. evalvault-1.65.0/src/evalvault/domain/services/ops_snapshot_service.py +159 -0
  72. evalvault-1.65.0/src/evalvault/domain/services/regression_gate_service.py +199 -0
  73. evalvault-1.65.0/src/evalvault/domain/services/run_comparison_service.py +159 -0
  74. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/stage_event_builder.py +6 -1
  75. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/stage_metric_service.py +83 -18
  76. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/__init__.py +4 -0
  77. evalvault-1.65.0/src/evalvault/ports/outbound/artifact_fs_port.py +12 -0
  78. evalvault-1.65.0/src/evalvault/ports/outbound/comparison_pipeline_port.py +22 -0
  79. evalvault-1.65.0/src/evalvault/ports/outbound/difficulty_profile_port.py +15 -0
  80. evalvault-1.65.0/src/evalvault/ports/outbound/judge_calibration_port.py +22 -0
  81. evalvault-1.65.0/src/evalvault/ports/outbound/ops_snapshot_port.py +8 -0
  82. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_cli_integration.py +5 -0
  83. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_pipeline_api_contracts.py +79 -0
  84. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_dataset_preprocessor.py +45 -0
  85. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_evaluator_comprehensive.py +34 -2
  86. evalvault-1.65.0/tests/unit/domain/services/test_judge_calibration_service.py +95 -0
  87. evalvault-1.65.0/tests/unit/domain/services/test_ops_snapshot_service.py +52 -0
  88. evalvault-1.65.0/tests/unit/domain/services/test_regression_gate_service.py +68 -0
  89. evalvault-1.65.0/tests/unit/test_artifact_lint_service.py +68 -0
  90. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_cli.py +130 -23
  91. evalvault-1.65.0/tests/unit/test_cli_artifacts.py +37 -0
  92. evalvault-1.65.0/tests/unit/test_cli_calibrate_judge.py +111 -0
  93. evalvault-1.65.0/tests/unit/test_cli_ops.py +14 -0
  94. evalvault-1.65.0/tests/unit/test_difficulty_profiling_service.py +120 -0
  95. evalvault-1.65.0/tests/unit/test_regress_cli.py +104 -0
  96. evalvault-1.65.0/tests/unit/test_run_comparison_service.py +86 -0
  97. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_settings.py +21 -0
  98. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_stage_metric_service.py +58 -0
  99. {evalvault-1.63.1 → evalvault-1.65.0}/uv.lock +150 -4
  100. evalvault-1.63.1/src/evalvault/adapters/inbound/api/main.py +0 -84
  101. {evalvault-1.63.1 → evalvault-1.65.0}/.dockerignore +0 -0
  102. {evalvault-1.63.1 → evalvault-1.65.0}/.github/workflows/ci.yml +0 -0
  103. {evalvault-1.63.1 → evalvault-1.65.0}/.github/workflows/release.yml +0 -0
  104. {evalvault-1.63.1 → evalvault-1.65.0}/.github/workflows/stale.yml +0 -0
  105. {evalvault-1.63.1 → evalvault-1.65.0}/.gitignore +0 -0
  106. {evalvault-1.63.1 → evalvault-1.65.0}/.pre-commit-config.yaml +0 -0
  107. {evalvault-1.63.1 → evalvault-1.65.0}/.python-version +0 -0
  108. {evalvault-1.63.1 → evalvault-1.65.0}/AGENTS.md +0 -0
  109. {evalvault-1.63.1 → evalvault-1.65.0}/CHANGELOG.md +0 -0
  110. {evalvault-1.63.1 → evalvault-1.65.0}/CLAUDE.md +0 -0
  111. {evalvault-1.63.1 → evalvault-1.65.0}/CODE_OF_CONDUCT.md +0 -0
  112. {evalvault-1.63.1 → evalvault-1.65.0}/CONTRIBUTING.md +0 -0
  113. {evalvault-1.63.1 → evalvault-1.65.0}/Dockerfile +0 -0
  114. {evalvault-1.63.1 → evalvault-1.65.0}/LICENSE.md +0 -0
  115. {evalvault-1.63.1 → evalvault-1.65.0}/README.en.md +0 -0
  116. {evalvault-1.63.1 → evalvault-1.65.0}/SECURITY.md +0 -0
  117. {evalvault-1.63.1 → evalvault-1.65.0}/agent/README.md +0 -0
  118. {evalvault-1.63.1 → evalvault-1.65.0}/agent/agent.py +0 -0
  119. {evalvault-1.63.1 → evalvault-1.65.0}/agent/client.py +0 -0
  120. {evalvault-1.63.1 → evalvault-1.65.0}/agent/config.py +0 -0
  121. {evalvault-1.63.1 → evalvault-1.65.0}/agent/main.py +0 -0
  122. {evalvault-1.63.1 → evalvault-1.65.0}/agent/memory/README.md +0 -0
  123. {evalvault-1.63.1 → evalvault-1.65.0}/agent/memory/shared/decisions.md +0 -0
  124. {evalvault-1.63.1 → evalvault-1.65.0}/agent/memory/shared/dependencies.md +0 -0
  125. {evalvault-1.63.1 → evalvault-1.65.0}/agent/memory/templates/coordinator_guide.md +0 -0
  126. {evalvault-1.63.1 → evalvault-1.65.0}/agent/memory/templates/work_log_template.md +0 -0
  127. {evalvault-1.63.1 → evalvault-1.65.0}/agent/memory_integration.py +0 -0
  128. {evalvault-1.63.1 → evalvault-1.65.0}/agent/progress.py +0 -0
  129. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/app_spec.txt +0 -0
  130. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/baseline.txt +0 -0
  131. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/coding_prompt.md +0 -0
  132. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/existing_project_prompt.md +0 -0
  133. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/improvement/architecture_prompt.md +0 -0
  134. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/improvement/base_prompt.md +0 -0
  135. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/improvement/coordinator_prompt.md +0 -0
  136. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/improvement/observability_prompt.md +0 -0
  137. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/initializer_prompt.md +0 -0
  138. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/prompt_manifest.json +0 -0
  139. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts/system.txt +0 -0
  140. {evalvault-1.63.1 → evalvault-1.65.0}/agent/prompts.py +0 -0
  141. {evalvault-1.63.1 → evalvault-1.65.0}/agent/requirements.txt +0 -0
  142. {evalvault-1.63.1 → evalvault-1.65.0}/agent/security.py +0 -0
  143. {evalvault-1.63.1 → evalvault-1.65.0}/config/domains/insurance/memory.yaml +0 -0
  144. {evalvault-1.63.1 → evalvault-1.65.0}/config/domains/insurance/terms_dictionary_en.json +0 -0
  145. {evalvault-1.63.1 → evalvault-1.65.0}/config/domains/insurance/terms_dictionary_ko.json +0 -0
  146. {evalvault-1.63.1 → evalvault-1.65.0}/config/methods.yaml +0 -0
  147. {evalvault-1.63.1 → evalvault-1.65.0}/config/models.yaml +0 -0
  148. {evalvault-1.63.1 → evalvault-1.65.0}/config/ragas_prompts_override.yaml +0 -0
  149. {evalvault-1.63.1 → evalvault-1.65.0}/config/regressions/default.json +0 -0
  150. {evalvault-1.63.1 → evalvault-1.65.0}/config/regressions/ux.json +0 -0
  151. {evalvault-1.63.1 → evalvault-1.65.0}/config/stage_metric_playbook.yaml +0 -0
  152. {evalvault-1.63.1 → evalvault-1.65.0}/config/stage_metric_thresholds.json +0 -0
  153. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/dummy_test_dataset.json +0 -0
  154. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/insurance_qa_korean.csv +0 -0
  155. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/insurance_qa_korean.json +0 -0
  156. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/insurance_qa_korean_2.json +0 -0
  157. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/insurance_qa_korean_3.json +0 -0
  158. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/ragas_ko90_en10.json +0 -0
  159. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/sample.json +0 -0
  160. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/visualization_20q_cluster_map.csv +0 -0
  161. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/visualization_20q_korean.json +0 -0
  162. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/visualization_2q_cluster_map.csv +0 -0
  163. {evalvault-1.63.1 → evalvault-1.65.0}/data/datasets/visualization_2q_korean.json +0 -0
  164. {evalvault-1.63.1 → evalvault-1.65.0}/data/kg/knowledge_graph.json +0 -0
  165. {evalvault-1.63.1 → evalvault-1.65.0}/data/raw/The Complete Guide to Mastering Suno Advanced Strategies for Professional Music Generation.md +0 -0
  166. {evalvault-1.63.1 → evalvault-1.65.0}/data/raw/edge_cases.json +0 -0
  167. {evalvault-1.63.1 → evalvault-1.65.0}/data/raw/run_mode_full_domain_memory.json +0 -0
  168. {evalvault-1.63.1 → evalvault-1.65.0}/data/raw/sample_rag_knowledge.txt +0 -0
  169. {evalvault-1.63.1 → evalvault-1.65.0}/dataset_templates/dataset_template.csv +0 -0
  170. {evalvault-1.63.1 → evalvault-1.65.0}/dataset_templates/dataset_template.json +0 -0
  171. {evalvault-1.63.1 → evalvault-1.65.0}/dataset_templates/dataset_template.xlsx +0 -0
  172. {evalvault-1.63.1 → evalvault-1.65.0}/dataset_templates/method_input_template.json +0 -0
  173. {evalvault-1.63.1 → evalvault-1.65.0}/docker-compose.phoenix.yaml +0 -0
  174. {evalvault-1.63.1 → evalvault-1.65.0}/docker-compose.yml +0 -0
  175. {evalvault-1.63.1 → evalvault-1.65.0}/docs/api/adapters/inbound.md +0 -0
  176. {evalvault-1.63.1 → evalvault-1.65.0}/docs/api/adapters/outbound.md +0 -0
  177. {evalvault-1.63.1 → evalvault-1.65.0}/docs/api/config.md +0 -0
  178. {evalvault-1.63.1 → evalvault-1.65.0}/docs/api/domain/entities.md +0 -0
  179. {evalvault-1.63.1 → evalvault-1.65.0}/docs/api/domain/metrics.md +0 -0
  180. {evalvault-1.63.1 → evalvault-1.65.0}/docs/api/domain/services.md +0 -0
  181. {evalvault-1.63.1 → evalvault-1.65.0}/docs/api/ports/inbound.md +0 -0
  182. {evalvault-1.63.1 → evalvault-1.65.0}/docs/api/ports/outbound.md +0 -0
  183. {evalvault-1.63.1 → evalvault-1.65.0}/docs/architecture/open-rag-trace-collector.md +0 -0
  184. {evalvault-1.63.1 → evalvault-1.65.0}/docs/architecture/open-rag-trace-spec.md +0 -0
  185. {evalvault-1.63.1 → evalvault-1.65.0}/docs/getting-started/INSTALLATION.md +0 -0
  186. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/AGENTS_SYSTEM_GUIDE.md +0 -0
  187. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/CLI_MCP_PLAN.md +0 -0
  188. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/DEV_GUIDE.md +0 -0
  189. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md +0 -0
  190. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/EVALVAULT_RUN_EXCEL_SHEETS.md +0 -0
  191. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/EVALVAULT_WORK_PLAN.md +0 -0
  192. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/EXTERNAL_TRACE_API_SPEC.md +0 -0
  193. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/LENA_MVP_IMPLEMENTATION_PLAN.md +0 -0
  194. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/LENA_RAGAS_CALIBRATION_DEV_PLAN.md +0 -0
  195. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/OPEN_RAG_TRACE_INTERNAL_ADAPTER.md +0 -0
  196. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/OPEN_RAG_TRACE_SAMPLES.md +0 -0
  197. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/PRD_LENA.md +0 -0
  198. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/RELEASE_CHECKLIST.md +0 -0
  199. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/USER_GUIDE.md +0 -0
  200. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/WEBUI_CLI_ROLLOUT_PLAN.md +0 -0
  201. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/prompt_suggestions_design.md +0 -0
  202. {evalvault-1.63.1 → evalvault-1.65.0}/docs/guides/rag_human_feedback_calibration_implementation_plan.md +0 -0
  203. {evalvault-1.63.1 → evalvault-1.65.0}/docs/mapping/component-to-whitepaper.yaml +0 -0
  204. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/00_frontmatter.md +0 -0
  205. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/02_architecture.md +0 -0
  206. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/03_data_flow.md +0 -0
  207. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/04_components.md +0 -0
  208. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/05_expert_lenses.md +0 -0
  209. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/06_implementation.md +0 -0
  210. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/07_advanced.md +0 -0
  211. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/08_customization.md +0 -0
  212. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/09_quality.md +0 -0
  213. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/10_performance.md +0 -0
  214. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/11_security.md +0 -0
  215. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/12_operations.md +0 -0
  216. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/13_standards.md +0 -0
  217. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/14_roadmap.md +0 -0
  218. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/INDEX.md +0 -0
  219. {evalvault-1.63.1 → evalvault-1.65.0}/docs/new_whitepaper/STYLE_GUIDE.md +0 -0
  220. {evalvault-1.63.1 → evalvault-1.65.0}/docs/stylesheets/extra.css +0 -0
  221. {evalvault-1.63.1 → evalvault-1.65.0}/docs/templates/dataset_template.csv +0 -0
  222. {evalvault-1.63.1 → evalvault-1.65.0}/docs/templates/dataset_template.json +0 -0
  223. {evalvault-1.63.1 → evalvault-1.65.0}/docs/templates/dataset_template.xlsx +0 -0
  224. {evalvault-1.63.1 → evalvault-1.65.0}/docs/templates/kg_template.json +0 -0
  225. {evalvault-1.63.1 → evalvault-1.65.0}/docs/templates/otel_openinference_trace_example.json +0 -0
  226. {evalvault-1.63.1 → evalvault-1.65.0}/docs/templates/ragas_dataset_example_ko90_en10.json +0 -0
  227. {evalvault-1.63.1 → evalvault-1.65.0}/docs/templates/retriever_docs_template.json +0 -0
  228. {evalvault-1.63.1 → evalvault-1.65.0}/docs/tools/generate-whitepaper.py +0 -0
  229. {evalvault-1.63.1 → evalvault-1.65.0}/docs/web_ui_analysis_migration_plan.md +0 -0
  230. {evalvault-1.63.1 → evalvault-1.65.0}/dummy_test_dataset.json +0 -0
  231. {evalvault-1.63.1 → evalvault-1.65.0}/examples/README.md +0 -0
  232. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/README.md +0 -0
  233. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/korean_rag/faithfulness_test.json +0 -0
  234. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/korean_rag/insurance_qa_100.json +0 -0
  235. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/korean_rag/keyword_extraction_test.json +0 -0
  236. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/korean_rag/retrieval_test.json +0 -0
  237. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/output/comparison.json +0 -0
  238. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/output/full_results.json +0 -0
  239. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/output/leaderboard.json +0 -0
  240. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/output/results_mteb.json +0 -0
  241. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/output/retrieval_result.json +0 -0
  242. {evalvault-1.63.1 → evalvault-1.65.0}/examples/benchmarks/run_korean_benchmark.py +0 -0
  243. {evalvault-1.63.1 → evalvault-1.65.0}/examples/kg_generator_demo.py +0 -0
  244. {evalvault-1.63.1 → evalvault-1.65.0}/examples/method_plugin_template/README.md +0 -0
  245. {evalvault-1.63.1 → evalvault-1.65.0}/examples/method_plugin_template/pyproject.toml +0 -0
  246. {evalvault-1.63.1 → evalvault-1.65.0}/examples/method_plugin_template/src/method_plugin_template/__init__.py +0 -0
  247. {evalvault-1.63.1 → evalvault-1.65.0}/examples/method_plugin_template/src/method_plugin_template/methods.py +0 -0
  248. {evalvault-1.63.1 → evalvault-1.65.0}/examples/stage_events.jsonl +0 -0
  249. {evalvault-1.63.1 → evalvault-1.65.0}/examples/usecase/comprehensive_workflow_test.py +0 -0
  250. {evalvault-1.63.1 → evalvault-1.65.0}/examples/usecase/insurance_eval_dataset.json +0 -0
  251. {evalvault-1.63.1 → evalvault-1.65.0}/examples/usecase/output/comprehensive_report.html +0 -0
  252. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/.env.example +0 -0
  253. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/.gitignore +0 -0
  254. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/README.md +0 -0
  255. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/analysis-compare.spec.ts +0 -0
  256. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/analysis-lab.spec.ts +0 -0
  257. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/compare-runs.spec.ts +0 -0
  258. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/dashboard.spec.ts +0 -0
  259. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/domain-memory.spec.ts +0 -0
  260. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/evaluation-studio.spec.ts +0 -0
  261. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/knowledge-base.spec.ts +0 -0
  262. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/mocks/intents.json +0 -0
  263. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/mocks/run_details.json +0 -0
  264. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/mocks/runs.json +0 -0
  265. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/e2e/run-details.spec.ts +0 -0
  266. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/eslint.config.js +0 -0
  267. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/index.html +0 -0
  268. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/package-lock.json +0 -0
  269. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/package.json +0 -0
  270. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/playwright.config.ts +0 -0
  271. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/public/vite.svg +0 -0
  272. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/App.css +0 -0
  273. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/App.tsx +0 -0
  274. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/assets/react.svg +0 -0
  275. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/AnalysisNodeOutputs.tsx +0 -0
  276. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/InsightSpacePanel.tsx +0 -0
  277. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/Layout.tsx +0 -0
  278. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/MarkdownContent.tsx +0 -0
  279. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/PrioritySummaryPanel.tsx +0 -0
  280. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/SpaceLegend.tsx +0 -0
  281. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/SpacePlot2D.tsx +0 -0
  282. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/SpacePlot3D.tsx +0 -0
  283. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/StatusBadge.tsx +0 -0
  284. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/components/VirtualizedText.tsx +0 -0
  285. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/config/ui.ts +0 -0
  286. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/config.ts +0 -0
  287. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/hooks/useInsightSpace.ts +0 -0
  288. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/index.css +0 -0
  289. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/main.tsx +0 -0
  290. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/AnalysisCompareView.tsx +0 -0
  291. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/AnalysisResultView.tsx +0 -0
  292. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/ComprehensiveAnalysis.tsx +0 -0
  293. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/CustomerReport.tsx +0 -0
  294. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/Dashboard.tsx +0 -0
  295. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/DomainMemory.tsx +0 -0
  296. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/EvaluationStudio.tsx +0 -0
  297. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/KnowledgeBase.tsx +0 -0
  298. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/Settings.tsx +0 -0
  299. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/Visualization.tsx +0 -0
  300. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/pages/VisualizationHome.tsx +0 -0
  301. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/services/api.ts +0 -0
  302. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/types/plotly.d.ts +0 -0
  303. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/utils/format.ts +0 -0
  304. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/utils/phoenix.ts +0 -0
  305. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/utils/runAnalytics.ts +0 -0
  306. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/utils/score.ts +0 -0
  307. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/src/utils/summaryMetrics.ts +0 -0
  308. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/tailwind.config.js +0 -0
  309. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/tsconfig.app.json +0 -0
  310. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/tsconfig.json +0 -0
  311. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/tsconfig.node.json +0 -0
  312. {evalvault-1.63.1 → evalvault-1.65.0}/frontend/vite.config.ts +0 -0
  313. {evalvault-1.63.1 → evalvault-1.65.0}/mkdocs.yml +0 -0
  314. {evalvault-1.63.1 → evalvault-1.65.0}/package-lock.json +0 -0
  315. {evalvault-1.63.1 → evalvault-1.65.0}/prompts/system_override.txt +0 -0
  316. {evalvault-1.63.1 → evalvault-1.65.0}/reports/.gitkeep +0 -0
  317. {evalvault-1.63.1 → evalvault-1.65.0}/reports/README.md +0 -0
  318. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/final_output.json +0 -0
  319. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/index.json +0 -0
  320. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/load_runs.json +0 -0
  321. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/report.json +0 -0
  322. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_change_detection.json +0 -0
  323. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_0aa9fab0_f1287e90/run_metric_comparison.json +0 -0
  324. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/final_output.json +0 -0
  325. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/index.json +0 -0
  326. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/load_runs.json +0 -0
  327. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/report.json +0 -0
  328. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_change_detection.json +0 -0
  329. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_8f825b22_4516d358/run_metric_comparison.json +0 -0
  330. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/final_output.json +0 -0
  331. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/index.json +0 -0
  332. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/load_runs.json +0 -0
  333. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/report.json +0 -0
  334. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_change_detection.json +0 -0
  335. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/artifacts/comparison_f1287e90_8f825b22/run_metric_comparison.json +0 -0
  336. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.json +0 -0
  337. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_0aa9fab0_9fbf4776.md +0 -0
  338. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_0aa9fab0_f1287e90.json +0 -0
  339. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_0aa9fab0_f1287e90.md +0 -0
  340. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_8f825b22_4516d358.json +0 -0
  341. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_8f825b22_4516d358.md +0 -0
  342. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_9fbf4776_a491fa0e.json +0 -0
  343. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_9fbf4776_a491fa0e.md +0 -0
  344. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_f1287e90_8f825b22.json +0 -0
  345. {evalvault-1.63.1 → evalvault-1.65.0}/reports/comparison/comparison_f1287e90_8f825b22.md +0 -0
  346. {evalvault-1.63.1 → evalvault-1.65.0}/reports/debug_report_r1_smoke.md +0 -0
  347. {evalvault-1.63.1 → evalvault-1.65.0}/reports/debug_report_r2_graphrag.md +0 -0
  348. {evalvault-1.63.1 → evalvault-1.65.0}/reports/debug_report_r2_graphrag_openai.md +0 -0
  349. {evalvault-1.63.1 → evalvault-1.65.0}/reports/debug_report_r3_bm25.md +0 -0
  350. {evalvault-1.63.1 → evalvault-1.65.0}/reports/debug_report_r3_bm25_langfuse3.md +0 -0
  351. {evalvault-1.63.1 → evalvault-1.65.0}/reports/debug_report_r3_dense_faiss.md +0 -0
  352. {evalvault-1.63.1 → evalvault-1.65.0}/reports/improvement_1d91a667-4288-4742-be3a-a8f5310c5140.md +0 -0
  353. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r2_graphrag_openai_stage_events.jsonl +0 -0
  354. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r2_graphrag_openai_stage_report.txt +0 -0
  355. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r2_graphrag_stage_events.jsonl +0 -0
  356. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r2_graphrag_stage_report.txt +0 -0
  357. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r3_bm25_langfuse2_stage_events.jsonl +0 -0
  358. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r3_bm25_langfuse3_stage_events.jsonl +0 -0
  359. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r3_bm25_langfuse_stage_events.jsonl +0 -0
  360. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r3_bm25_phoenix_stage_events.jsonl +0 -0
  361. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r3_bm25_stage_events.jsonl +0 -0
  362. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r3_bm25_stage_report.txt +0 -0
  363. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r3_dense_faiss_stage_events.jsonl +0 -0
  364. {evalvault-1.63.1 → evalvault-1.65.0}/reports/r3_dense_faiss_stage_report.txt +0 -0
  365. {evalvault-1.63.1 → evalvault-1.65.0}/reports/retrieval_benchmark_smoke_precision.csv +0 -0
  366. {evalvault-1.63.1 → evalvault-1.65.0}/reports/retrieval_benchmark_smoke_precision_graphrag.csv +0 -0
  367. {evalvault-1.63.1 → evalvault-1.65.0}/reports/retrieval_benchmark_smoke_precision_multi.csv +0 -0
  368. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/benchmark/download_kmmlu.py +0 -0
  369. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/dev/open_rag_trace_demo.py +0 -0
  370. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/dev/open_rag_trace_integration_template.py +0 -0
  371. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/dev/otel-collector-config.yaml +0 -0
  372. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/dev/start_web_ui_with_phoenix.sh +0 -0
  373. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/dev/validate_open_rag_trace.py +0 -0
  374. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/dev_seed_pipeline_results.py +0 -0
  375. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/__init__.py +0 -0
  376. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/analyzer/__init__.py +0 -0
  377. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/analyzer/ast_scanner.py +0 -0
  378. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/analyzer/confidence_scorer.py +0 -0
  379. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/analyzer/graph_builder.py +0 -0
  380. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/analyzer/side_effect_detector.py +0 -0
  381. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/generate_api_docs.py +0 -0
  382. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/models/__init__.py +0 -0
  383. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/models/schema.py +0 -0
  384. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/renderer/__init__.py +0 -0
  385. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/docs/renderer/html_generator.py +0 -0
  386. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/perf/backfill_langfuse_trace_url.py +0 -0
  387. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/perf/r3_dense_smoke.py +0 -0
  388. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/perf/r3_evalvault_run_dataset.json +0 -0
  389. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/perf/r3_retriever_docs.json +0 -0
  390. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/perf/r3_smoke_real.jsonl +0 -0
  391. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/perf/r3_stage_events_sample.jsonl +0 -0
  392. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/pipeline_template_inspect.py +0 -0
  393. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/reports/generate_release_notes.py +0 -0
  394. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/run_with_timeout.py +0 -0
  395. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/test_full_evaluation.py +0 -0
  396. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/tests/run_regressions.py +0 -0
  397. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/tests/run_retriever_stage_report_smoke.sh +0 -0
  398. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/validate_tutorials.py +0 -0
  399. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/verify_ragas_compliance.py +0 -0
  400. {evalvault-1.63.1 → evalvault-1.65.0}/scripts/verify_workflows.py +0 -0
  401. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/__init__.py +0 -0
  402. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/__init__.py +0 -0
  403. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/__init__.py +0 -0
  404. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/__init__.py +0 -0
  405. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/adapter.py +0 -0
  406. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/__init__.py +0 -0
  407. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/benchmark.py +0 -0
  408. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/domain.py +0 -0
  409. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/pipeline.py +0 -0
  410. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/api/routers/runs.py +0 -0
  411. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/__init__.py +0 -0
  412. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/app.py +0 -0
  413. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/agent.py +0 -0
  414. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/analyze.py +0 -0
  415. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/api.py +0 -0
  416. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/benchmark.py +0 -0
  417. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/calibrate.py +0 -0
  418. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/config.py +0 -0
  419. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/debug.py +0 -0
  420. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/domain.py +0 -0
  421. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/experiment.py +0 -0
  422. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/gate.py +0 -0
  423. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/generate.py +0 -0
  424. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/init.py +0 -0
  425. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/kg.py +0 -0
  426. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/langfuse.py +0 -0
  427. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/method.py +0 -0
  428. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/phoenix.py +0 -0
  429. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/pipeline.py +0 -0
  430. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/prompts.py +0 -0
  431. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/run.py +0 -0
  432. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/run_helpers.py +0 -0
  433. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/commands/stage.py +0 -0
  434. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/__init__.py +0 -0
  435. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/analysis_io.py +0 -0
  436. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/console.py +0 -0
  437. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/errors.py +0 -0
  438. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/formatters.py +0 -0
  439. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/options.py +0 -0
  440. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/presets.py +0 -0
  441. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/progress.py +0 -0
  442. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/cli/utils/validators.py +0 -0
  443. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/mcp/__init__.py +0 -0
  444. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/mcp/schemas.py +0 -0
  445. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/inbound/mcp/tools.py +0 -0
  446. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/__init__.py +0 -0
  447. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/__init__.py +0 -0
  448. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/analysis_report_module.py +0 -0
  449. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/base_module.py +0 -0
  450. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/bm25_searcher_module.py +0 -0
  451. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/causal_adapter.py +0 -0
  452. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/causal_analyzer_module.py +0 -0
  453. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/common.py +0 -0
  454. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/comparison_report_module.py +0 -0
  455. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/data_loader_module.py +0 -0
  456. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/detailed_report_module.py +0 -0
  457. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/diagnostic_playbook_module.py +0 -0
  458. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/embedding_analyzer_module.py +0 -0
  459. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/embedding_distribution_module.py +0 -0
  460. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/embedding_searcher_module.py +0 -0
  461. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/hybrid_rrf_module.py +0 -0
  462. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/hybrid_weighted_module.py +0 -0
  463. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/hypothesis_generator_module.py +0 -0
  464. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/llm_report_module.py +0 -0
  465. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/low_performer_extractor_module.py +0 -0
  466. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/model_analyzer_module.py +0 -0
  467. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/morpheme_analyzer_module.py +0 -0
  468. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/morpheme_quality_checker_module.py +0 -0
  469. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/network_analyzer_module.py +0 -0
  470. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/nlp_adapter.py +0 -0
  471. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/nlp_analyzer_module.py +0 -0
  472. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/pattern_detector_module.py +0 -0
  473. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/pipeline_factory.py +0 -0
  474. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/pipeline_helpers.py +0 -0
  475. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/priority_summary_module.py +0 -0
  476. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/ragas_evaluator_module.py +0 -0
  477. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/retrieval_analyzer_module.py +0 -0
  478. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/retrieval_benchmark_module.py +0 -0
  479. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/retrieval_quality_checker_module.py +0 -0
  480. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/root_cause_analyzer_module.py +0 -0
  481. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_analyzer_module.py +0 -0
  482. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_change_detector_module.py +0 -0
  483. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_comparator_module.py +0 -0
  484. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_loader_module.py +0 -0
  485. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/run_metric_comparator_module.py +0 -0
  486. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/search_comparator_module.py +0 -0
  487. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/statistical_adapter.py +0 -0
  488. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/statistical_analyzer_module.py +0 -0
  489. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/statistical_comparator_module.py +0 -0
  490. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/summary_report_module.py +0 -0
  491. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/time_series_analyzer_module.py +0 -0
  492. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/timeseries_advanced_module.py +0 -0
  493. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/trend_detector_module.py +0 -0
  494. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/analysis/verification_report_module.py +0 -0
  495. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/benchmark/__init__.py +0 -0
  496. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/benchmark/lm_eval_adapter.py +0 -0
  497. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/cache/__init__.py +0 -0
  498. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/cache/hybrid_cache.py +0 -0
  499. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/cache/memory_cache.py +0 -0
  500. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/__init__.py +0 -0
  501. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/base.py +0 -0
  502. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/csv_loader.py +0 -0
  503. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/excel_loader.py +0 -0
  504. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/json_loader.py +0 -0
  505. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/loader_factory.py +0 -0
  506. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/method_input_loader.py +0 -0
  507. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/streaming_loader.py +0 -0
  508. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/templates.py +0 -0
  509. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/dataset/thresholds.py +0 -0
  510. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/debug/__init__.py +0 -0
  511. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/debug/report_renderer.py +0 -0
  512. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/__init__.py +0 -0
  513. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/ocr/__init__.py +0 -0
  514. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/ocr/paddleocr_backend.py +0 -0
  515. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/pdf_extractor.py +0 -0
  516. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/documents/versioned_loader.py +0 -0
  517. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/domain_memory/__init__.py +0 -0
  518. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/domain_memory/domain_memory_schema.sql +0 -0
  519. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/domain_memory/sqlite_adapter.py +0 -0
  520. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/__init__.py +0 -0
  521. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/insight_generator.py +0 -0
  522. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/pattern_detector.py +0 -0
  523. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/playbook_loader.py +0 -0
  524. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/improvement/stage_metric_playbook_loader.py +0 -0
  525. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/__init__.py +0 -0
  526. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/graph_rag_retriever.py +0 -0
  527. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/networkx_adapter.py +0 -0
  528. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/parallel_kg_builder.py +0 -0
  529. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/kg/query_strategies.py +0 -0
  530. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/__init__.py +0 -0
  531. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/anthropic_adapter.py +0 -0
  532. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/azure_adapter.py +0 -0
  533. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/base.py +0 -0
  534. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/factory.py +0 -0
  535. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/instructor_factory.py +0 -0
  536. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/llm_relation_augmenter.py +0 -0
  537. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/ollama_adapter.py +0 -0
  538. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/openai_adapter.py +0 -0
  539. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/token_aware_chat.py +0 -0
  540. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/llm/vllm_adapter.py +0 -0
  541. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/methods/__init__.py +0 -0
  542. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/methods/baseline_oracle.py +0 -0
  543. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/methods/registry.py +0 -0
  544. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/__init__.py +0 -0
  545. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/__init__.py +0 -0
  546. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/bm25_retriever.py +0 -0
  547. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/dense_retriever.py +0 -0
  548. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/document_chunker.py +0 -0
  549. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/hybrid_retriever.py +0 -0
  550. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/kiwi_tokenizer.py +0 -0
  551. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/korean_evaluation.py +0 -0
  552. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/korean_stopwords.py +0 -0
  553. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/toolkit.py +0 -0
  554. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/nlp/korean/toolkit_factory.py +0 -0
  555. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/phoenix/sync_service.py +0 -0
  556. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/report/__init__.py +0 -0
  557. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/report/dashboard_generator.py +0 -0
  558. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/report/llm_report_generator.py +0 -0
  559. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/report/markdown_adapter.py +0 -0
  560. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/__init__.py +0 -0
  561. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/base_sql.py +0 -0
  562. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/benchmark_storage_adapter.py +0 -0
  563. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/postgres_adapter.py +0 -0
  564. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/postgres_schema.sql +0 -0
  565. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/schema.sql +0 -0
  566. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/storage/sqlite_adapter.py +0 -0
  567. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/__init__.py +0 -0
  568. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/open_rag_log_handler.py +0 -0
  569. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_adapter.py +0 -0
  570. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_decorators.py +0 -0
  571. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/open_rag_trace_helpers.py +0 -0
  572. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracer/phoenix_tracer_adapter.py +0 -0
  573. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/adapters/outbound/tracker/__init__.py +0 -0
  574. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/__init__.py +0 -0
  575. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/agent_types.py +0 -0
  576. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/domain_config.py +0 -0
  577. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/instrumentation.py +0 -0
  578. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/langfuse_support.py +0 -0
  579. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/model_config.py +0 -0
  580. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/phoenix_support.py +0 -0
  581. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/config/playbooks/improvement_playbook.yaml +0 -0
  582. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/debug_ragas.py +0 -0
  583. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/debug_ragas_real.py +0 -0
  584. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/__init__.py +0 -0
  585. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/analysis.py +0 -0
  586. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/analysis_pipeline.py +0 -0
  587. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/benchmark.py +0 -0
  588. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/benchmark_run.py +0 -0
  589. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/dataset.py +0 -0
  590. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/debug.py +0 -0
  591. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/experiment.py +0 -0
  592. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/feedback.py +0 -0
  593. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/improvement.py +0 -0
  594. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/kg.py +0 -0
  595. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/memory.py +0 -0
  596. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/method.py +0 -0
  597. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/prompt.py +0 -0
  598. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/prompt_suggestion.py +0 -0
  599. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/rag_trace.py +0 -0
  600. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/entities/result.py +0 -0
  601. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/__init__.py +0 -0
  602. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/analysis_registry.py +0 -0
  603. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/confidence.py +0 -0
  604. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/contextual_relevancy.py +0 -0
  605. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/entity_preservation.py +0 -0
  606. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/insurance.py +0 -0
  607. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/no_answer.py +0 -0
  608. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/registry.py +0 -0
  609. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/retrieval_rank.py +0 -0
  610. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/terms_dictionary.json +0 -0
  611. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/metrics/text_match.py +0 -0
  612. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/__init__.py +0 -0
  613. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/analysis_service.py +0 -0
  614. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/async_batch_executor.py +0 -0
  615. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/batch_executor.py +0 -0
  616. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/benchmark_report_service.py +0 -0
  617. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/benchmark_service.py +0 -0
  618. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/cache_metrics.py +0 -0
  619. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/cluster_map_builder.py +0 -0
  620. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/debug_report_service.py +0 -0
  621. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/document_chunker.py +0 -0
  622. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/document_versioning.py +0 -0
  623. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/domain_learning_hook.py +0 -0
  624. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/embedding_overlay.py +0 -0
  625. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/entity_extractor.py +0 -0
  626. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_comparator.py +0 -0
  627. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_manager.py +0 -0
  628. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_reporter.py +0 -0
  629. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_repository.py +0 -0
  630. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/experiment_statistics.py +0 -0
  631. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/holdout_splitter.py +0 -0
  632. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/improvement_guide_service.py +0 -0
  633. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/intent_classifier.py +0 -0
  634. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/kg_generator.py +0 -0
  635. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/memory_aware_evaluator.py +0 -0
  636. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/memory_based_analysis.py +0 -0
  637. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/method_runner.py +0 -0
  638. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/pipeline_orchestrator.py +0 -0
  639. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/pipeline_template_registry.py +0 -0
  640. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_candidate_service.py +0 -0
  641. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_manifest.py +0 -0
  642. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_registry.py +0 -0
  643. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_scoring_service.py +0 -0
  644. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_status.py +0 -0
  645. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/prompt_suggestion_reporter.py +0 -0
  646. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/ragas_prompt_overrides.py +0 -0
  647. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/retrieval_metrics.py +0 -0
  648. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/retriever_context.py +0 -0
  649. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/satisfaction_calibration_service.py +0 -0
  650. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/stage_metric_guide_service.py +0 -0
  651. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/stage_summary_service.py +0 -0
  652. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/synthetic_qa_generator.py +0 -0
  653. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/testset_generator.py +0 -0
  654. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/threshold_profiles.py +0 -0
  655. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/unified_report_service.py +0 -0
  656. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/domain/services/visual_space_service.py +0 -0
  657. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/mkdocs_helpers.py +0 -0
  658. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/__init__.py +0 -0
  659. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/inbound/__init__.py +0 -0
  660. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/inbound/analysis_pipeline_port.py +0 -0
  661. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/inbound/evaluator_port.py +0 -0
  662. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/inbound/learning_hook_port.py +0 -0
  663. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/inbound/web_port.py +0 -0
  664. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/analysis_cache_port.py +0 -0
  665. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/analysis_module_port.py +0 -0
  666. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/analysis_port.py +0 -0
  667. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/benchmark_port.py +0 -0
  668. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/causal_analysis_port.py +0 -0
  669. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/dataset_port.py +0 -0
  670. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/domain_memory_port.py +0 -0
  671. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/embedding_port.py +0 -0
  672. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/improvement_port.py +0 -0
  673. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/intent_classifier_port.py +0 -0
  674. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/korean_nlp_port.py +0 -0
  675. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/llm_factory_port.py +0 -0
  676. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/llm_port.py +0 -0
  677. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/method_port.py +0 -0
  678. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/nlp_analysis_port.py +0 -0
  679. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/relation_augmenter_port.py +0 -0
  680. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/report_port.py +0 -0
  681. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/stage_storage_port.py +0 -0
  682. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/storage_port.py +0 -0
  683. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/tracer_port.py +0 -0
  684. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/ports/outbound/tracker_port.py +0 -0
  685. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/reports/__init__.py +0 -0
  686. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/reports/release_notes.py +0 -0
  687. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/scripts/__init__.py +0 -0
  688. {evalvault-1.63.1 → evalvault-1.65.0}/src/evalvault/scripts/regression_runner.py +0 -0
  689. {evalvault-1.63.1 → evalvault-1.65.0}/tests/__init__.py +0 -0
  690. {evalvault-1.63.1 → evalvault-1.65.0}/tests/conftest.py +0 -0
  691. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/README.md +0 -0
  692. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/benchmark/retrieval_ground_truth_min.json +0 -0
  693. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/benchmark/retrieval_ground_truth_multi.json +0 -0
  694. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/auto_insurance_qa_korean_full.json +0 -0
  695. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/comprehensive_dataset.json +0 -0
  696. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/edge_cases.json +0 -0
  697. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/edge_cases.xlsx +0 -0
  698. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/evaluation_test_sample.json +0 -0
  699. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/graphrag_retriever_docs.json +0 -0
  700. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/graphrag_smoke.json +0 -0
  701. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_document.txt +0 -0
  702. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_english.csv +0 -0
  703. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_english.json +0 -0
  704. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_english.xlsx +0 -0
  705. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_korean.csv +0 -0
  706. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_korean.json +0 -0
  707. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_korean.xlsx +0 -0
  708. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/insurance_qa_korean_versioned_pdf.json +0 -0
  709. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/run_mode_full_domain_memory.json +0 -0
  710. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/run_mode_simple.json +0 -0
  711. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/e2e/summary_eval_minimal.json +0 -0
  712. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/kg/minimal_graph.json +0 -0
  713. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/sample_dataset.csv +0 -0
  714. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/sample_dataset.json +0 -0
  715. {evalvault-1.63.1 → evalvault-1.65.0}/tests/fixtures/sample_dataset.xlsx +0 -0
  716. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/__init__.py +0 -0
  717. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/benchmark/test_benchmark_service_integration.py +0 -0
  718. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/conftest.py +0 -0
  719. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_data_flow.py +0 -0
  720. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_e2e_scenarios.py +0 -0
  721. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_evaluation_flow.py +0 -0
  722. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_full_workflow.py +0 -0
  723. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_langfuse_flow.py +0 -0
  724. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_phoenix_flow.py +0 -0
  725. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_storage_flow.py +0 -0
  726. {evalvault-1.63.1 → evalvault-1.65.0}/tests/integration/test_summary_eval_fixture.py +0 -0
  727. {evalvault-1.63.1 → evalvault-1.65.0}/tests/optional_deps.py +0 -0
  728. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/__init__.py +0 -0
  729. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/inbound/mcp/test_execute_tools.py +0 -0
  730. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/inbound/mcp/test_read_tools.py +0 -0
  731. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/documents/test_pdf_extractor.py +0 -0
  732. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/documents/test_versioned_loader.py +0 -0
  733. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/__init__.py +0 -0
  734. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/test_insight_generator.py +0 -0
  735. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/test_pattern_detector.py +0 -0
  736. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/test_playbook_loader.py +0 -0
  737. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/improvement/test_stage_metric_playbook_loader.py +0 -0
  738. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/kg/test_graph_rag_retriever.py +0 -0
  739. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/kg/test_parallel_kg_builder.py +0 -0
  740. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/adapters/outbound/storage/test_benchmark_storage_adapter.py +0 -0
  741. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/config/test_phoenix_support.py +0 -0
  742. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/conftest.py +0 -0
  743. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/metrics/test_analysis_metric_registry.py +0 -0
  744. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/metrics/test_confidence.py +0 -0
  745. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/metrics/test_contextual_relevancy.py +0 -0
  746. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/metrics/test_entity_preservation.py +0 -0
  747. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/metrics/test_metric_registry.py +0 -0
  748. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/metrics/test_no_answer.py +0 -0
  749. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/metrics/test_retrieval_rank.py +0 -0
  750. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/metrics/test_text_match.py +0 -0
  751. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_cache_metrics.py +0 -0
  752. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_claim_level.py +0 -0
  753. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_document_versioning.py +0 -0
  754. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_holdout_splitter.py +0 -0
  755. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_improvement_guide_service.py +0 -0
  756. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_retrieval_metrics.py +0 -0
  757. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_retriever_context.py +0 -0
  758. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_stage_event_builder.py +0 -0
  759. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_stage_metric_guide_service.py +0 -0
  760. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/services/test_synthetic_qa_generator.py +0 -0
  761. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/test_embedding_overlay.py +0 -0
  762. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/test_prompt_manifest.py +0 -0
  763. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/domain/test_prompt_status.py +0 -0
  764. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/reports/test_release_notes.py +0 -0
  765. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/scripts/test_regression_runner.py +0 -0
  766. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_agent_types.py +0 -0
  767. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_analysis_entities.py +0 -0
  768. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_analysis_modules.py +0 -0
  769. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_analysis_pipeline.py +0 -0
  770. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_analysis_service.py +0 -0
  771. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_anthropic_adapter.py +0 -0
  772. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_async_batch_executor.py +0 -0
  773. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_azure_adapter.py +0 -0
  774. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_benchmark_helpers.py +0 -0
  775. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_benchmark_runner.py +0 -0
  776. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_causal_adapter.py +0 -0
  777. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_cli_domain.py +0 -0
  778. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_cli_init.py +0 -0
  779. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_cli_progress.py +0 -0
  780. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_cli_utils.py +0 -0
  781. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_data_loaders.py +0 -0
  782. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_domain_config.py +0 -0
  783. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_domain_memory.py +0 -0
  784. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_entities.py +0 -0
  785. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_entities_kg.py +0 -0
  786. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_entity_extractor.py +0 -0
  787. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_evaluator.py +0 -0
  788. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_experiment.py +0 -0
  789. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_hybrid_cache.py +0 -0
  790. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_instrumentation.py +0 -0
  791. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_insurance_metric.py +0 -0
  792. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_intent_classifier.py +0 -0
  793. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_kg_generator.py +0 -0
  794. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_kg_networkx.py +0 -0
  795. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_kiwi_tokenizer.py +0 -0
  796. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_kiwi_warning_suppression.py +0 -0
  797. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_korean_dense.py +0 -0
  798. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_korean_evaluation.py +0 -0
  799. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_korean_retrieval.py +0 -0
  800. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_langfuse_tracker.py +0 -0
  801. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_llm_relation_augmenter.py +0 -0
  802. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_lm_eval_adapter.py +0 -0
  803. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_markdown_report.py +0 -0
  804. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_memory_cache.py +0 -0
  805. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_memory_services.py +0 -0
  806. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_method_plugins.py +0 -0
  807. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_mlflow_tracker.py +0 -0
  808. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_model_config.py +0 -0
  809. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_nlp_adapter.py +0 -0
  810. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_nlp_entities.py +0 -0
  811. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_ollama_adapter.py +0 -0
  812. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_openai_adapter.py +0 -0
  813. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_phoenix_adapter.py +0 -0
  814. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_pipeline_orchestrator.py +0 -0
  815. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_ports.py +0 -0
  816. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_postgres_storage.py +0 -0
  817. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_prompt_candidate_service.py +0 -0
  818. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_rag_trace_entities.py +0 -0
  819. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_run_memory_helpers.py +0 -0
  820. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_run_mode_fixtures.py +0 -0
  821. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_sqlite_storage.py +0 -0
  822. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_stage_cli.py +0 -0
  823. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_stage_storage.py +0 -0
  824. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_stage_summary_service.py +0 -0
  825. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_statistical_adapter.py +0 -0
  826. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_streaming_loader.py +0 -0
  827. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_summary_eval_fixture.py +0 -0
  828. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_testset_generator.py +0 -0
  829. {evalvault-1.63.1 → evalvault-1.65.0}/tests/unit/test_web_adapter.py +0 -0
@@ -39,6 +39,12 @@ OLLAMA_TIMEOUT=120
39
39
  OPENAI_API_KEY=sk-your-api-key-here
40
40
  # OPENAI_BASE_URL=https://api.openai.com/v1 # custom endpoint (optional)
41
41
 
42
+ # ================================================
43
+ # Secret Manager integration (optional)
44
+ # ================================================
45
+ # SECRET_PROVIDER=env|aws|gcp|vault
46
+ # Example: OPENAI_API_KEY=secret://OPENAI_TOKEN
47
+
42
48
  # ================================================
43
49
  # vLLM settings (OpenAI-compatible)
44
50
  # ================================================
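The hunk above introduces a `secret://` reference convention for `.env` values but does not show how references are resolved. A minimal sketch of one way to resolve them, assuming the text after `secret://` is the secret's name; `resolve_setting` and `env_provider` are hypothetical names, not EvalVault's API:

```python
from typing import Callable, Mapping

SECRET_PREFIX = "secret://"


def resolve_setting(value: str, fetch_secret: Callable[[str], str]) -> str:
    """Return the value unchanged unless it is a secret:// reference."""
    if value.startswith(SECRET_PREFIX):
        # Assumption: everything after the prefix is the secret's name.
        return fetch_secret(value[len(SECRET_PREFIX):])
    return value


def env_provider(environ: Mapping[str, str]) -> Callable[[str], str]:
    """Hypothetical stand-in for SECRET_PROVIDER=env: look names up in a mapping."""

    def fetch(name: str) -> str:
        try:
            return environ[name]
        except KeyError:
            raise KeyError(f"secret {name!r} not found") from None

    return fetch
```

A provider for `aws`, `gcp`, or `vault` would supply a different `fetch` callable with the same shape, which keeps the lookup logic independent of the backend.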
@@ -91,8 +97,18 @@ OPENAI_API_KEY=sk-your-api-key-here
91
97
  # POSTGRES_PASSWORD=your-password
92
98
 
93
99
  # ================================================
94
- # CORS / Frontend settings (React dev)
95
- # ================================================
100
+ # API authentication / CORS / Frontend settings
101
+ # ================================================
102
+ # API tokens (comma-separated). Leave empty to disable authentication
103
+ # API_AUTH_TOKENS=token1,token2
104
+ # Knowledge API read/write tokens (comma-separated). Leave empty to disable the additional access control
105
+ # KNOWLEDGE_READ_TOKENS=read-token
106
+ # KNOWLEDGE_WRITE_TOKENS=write-token
107
+ # Rate limiting (disabled by default)
108
+ # RATE_LIMIT_ENABLED=false
109
+ # RATE_LIMIT_REQUESTS=120
110
+ # RATE_LIMIT_WINDOW_SECONDS=60
111
+ # RATE_LIMIT_BLOCK_THRESHOLD=10
96
112
  # Only needed when the React frontend calls the API directly
97
113
  # CORS_ORIGINS=http://localhost:5173,http://127.0.0.1:5173
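The comments above say that `API_AUTH_TOKENS` is a comma-separated list and that leaving it empty disables authentication. A minimal sketch of that rule follows; the function names are hypothetical and the actual EvalVault check is not part of this hunk:

```python
import hmac
from typing import List, Optional


def parse_tokens(raw: str) -> List[str]:
    """Split a comma-separated token list, dropping blanks and whitespace."""
    return [token.strip() for token in raw.split(",") if token.strip()]


def is_authorized(presented: Optional[str], raw_tokens: str) -> bool:
    """Apply the documented rule: an empty token list disables authentication."""
    tokens = parse_tokens(raw_tokens)
    if not tokens:
        return True  # authentication disabled
    if presented is None:
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return any(
        hmac.compare_digest(presented.encode(), token.encode()) for token in tokens
    )
```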
98
114
 
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: evalvault
3
- Version: 1.63.1
3
+ Version: 1.65.0
4
4
  Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
5
5
  Project-URL: Homepage, https://github.com/ntts9990/EvalVault
6
6
  Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
@@ -111,6 +111,10 @@ Requires-Dist: opentelemetry-exporter-otlp>=1.20.0; extra == 'phoenix'
111
111
  Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'phoenix'
112
112
  Provides-Extra: postgres
113
113
  Requires-Dist: psycopg[binary]>=3.0.0; extra == 'postgres'
114
+ Provides-Extra: secrets
115
+ Requires-Dist: boto3; extra == 'secrets'
116
+ Requires-Dist: google-cloud-secret-manager; extra == 'secrets'
117
+ Requires-Dist: hvac; extra == 'secrets'
114
118
  Provides-Extra: timeseries
115
119
  Requires-Dist: aeon>=1.3.0; extra == 'timeseries'
116
120
  Requires-Dist: numba>=0.55.0; extra == 'timeseries'
@@ -175,6 +179,9 @@ uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
175
179
  --auto-analyze
176
180
  ```
177
181
 
182
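+ - To use API authentication, set `API_AUTH_TOKENS` in `.env`.
183
+ - Using `secret://` references requires `SECRET_PROVIDER` and `--extra secrets`.
184
+ - Rate limiting is enabled via `RATE_LIMIT_ENABLED`.
178
185
  - Results are stored in the default DB (`data/db/evalvault.db`) and reused by `history`, the Web UI, and comparison analysis.
179
186
  - Even if `--db` is omitted, results are saved to the default path, and all data is automatically exported to Excel.
180
187
  - `--auto-analyze` generates a summary report together with per-module artifacts.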
@@ -56,6 +56,9 @@ uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
56
56
  --auto-analyze
57
57
  ```
58
58
 
59
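+ - To use API authentication, set `API_AUTH_TOKENS` in `.env`.
60
+ - Using `secret://` references requires `SECRET_PROVIDER` and `--extra secrets`.
61
+ - Rate limiting is enabled via `RATE_LIMIT_ENABLED`.
59
62
  - Results are stored in the default DB (`data/db/evalvault.db`) and reused by `history`, the Web UI, and comparison analysis.
60
63
  - Even if `--db` is omitted, results are saved to the default path, and all data is automatically exported to Excel.
61
64
  - `--auto-analyze` generates a summary report together with per-module artifacts.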
@@ -20,28 +20,28 @@ services:
20
20
  - 127.0.0.1:3030:3030
21
21
  environment: &langfuse-worker-env
22
22
  NEXTAUTH_URL: ${NEXTAUTH_URL:-http://localhost:3000}
23
- DATABASE_URL: ${DATABASE_URL:-postgresql://postgres:postgres@postgres:5432/postgres} # CHANGEME
24
- SALT: ${SALT:-mysalt} # CHANGEME
25
- ENCRYPTION_KEY: ${ENCRYPTION_KEY:-0000000000000000000000000000000000000000000000000000000000000000} # CHANGEME: generate via `openssl rand -hex 32`
23
+ DATABASE_URL: ${DATABASE_URL?} # CHANGEME
24
+ SALT: ${SALT?} # CHANGEME
25
+ ENCRYPTION_KEY: ${ENCRYPTION_KEY?} # CHANGEME: generate via `openssl rand -hex 32`
26
26
  TELEMETRY_ENABLED: ${TELEMETRY_ENABLED:-true}
27
27
  LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES: ${LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES:-true}
28
28
  CLICKHOUSE_MIGRATION_URL: ${CLICKHOUSE_MIGRATION_URL:-clickhouse://clickhouse:9000}
29
29
  CLICKHOUSE_URL: ${CLICKHOUSE_URL:-http://clickhouse:8123}
30
- CLICKHOUSE_USER: ${CLICKHOUSE_USER:-clickhouse}
31
- CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD:-clickhouse} # CHANGEME
30
+ CLICKHOUSE_USER: ${CLICKHOUSE_USER?}
31
+ CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD?} # CHANGEME
32
32
  CLICKHOUSE_CLUSTER_ENABLED: ${CLICKHOUSE_CLUSTER_ENABLED:-false}
33
33
  LANGFUSE_USE_AZURE_BLOB: ${LANGFUSE_USE_AZURE_BLOB:-false}
34
34
  LANGFUSE_S3_EVENT_UPLOAD_BUCKET: ${LANGFUSE_S3_EVENT_UPLOAD_BUCKET:-langfuse}
35
35
  LANGFUSE_S3_EVENT_UPLOAD_REGION: ${LANGFUSE_S3_EVENT_UPLOAD_REGION:-auto}
36
- LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID: ${LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID:-minio}
37
- LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY: ${LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY:-miniosecret} # CHANGEME
36
+ LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID: ${LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID?}
37
+ LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY: ${LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY?} # CHANGEME
38
38
  LANGFUSE_S3_EVENT_UPLOAD_ENDPOINT: ${LANGFUSE_S3_EVENT_UPLOAD_ENDPOINT:-http://minio:9000}
39
39
  LANGFUSE_S3_EVENT_UPLOAD_FORCE_PATH_STYLE: ${LANGFUSE_S3_EVENT_UPLOAD_FORCE_PATH_STYLE:-true}
40
40
  LANGFUSE_S3_EVENT_UPLOAD_PREFIX: ${LANGFUSE_S3_EVENT_UPLOAD_PREFIX:-events/}
41
41
  LANGFUSE_S3_MEDIA_UPLOAD_BUCKET: ${LANGFUSE_S3_MEDIA_UPLOAD_BUCKET:-langfuse}
42
42
  LANGFUSE_S3_MEDIA_UPLOAD_REGION: ${LANGFUSE_S3_MEDIA_UPLOAD_REGION:-auto}
43
- LANGFUSE_S3_MEDIA_UPLOAD_ACCESS_KEY_ID: ${LANGFUSE_S3_MEDIA_UPLOAD_ACCESS_KEY_ID:-minio}
44
- LANGFUSE_S3_MEDIA_UPLOAD_SECRET_ACCESS_KEY: ${LANGFUSE_S3_MEDIA_UPLOAD_SECRET_ACCESS_KEY:-miniosecret} # CHANGEME
43
+ LANGFUSE_S3_MEDIA_UPLOAD_ACCESS_KEY_ID: ${LANGFUSE_S3_MEDIA_UPLOAD_ACCESS_KEY_ID?}
44
+ LANGFUSE_S3_MEDIA_UPLOAD_SECRET_ACCESS_KEY: ${LANGFUSE_S3_MEDIA_UPLOAD_SECRET_ACCESS_KEY?} # CHANGEME
45
45
  LANGFUSE_S3_MEDIA_UPLOAD_ENDPOINT: ${LANGFUSE_S3_MEDIA_UPLOAD_ENDPOINT:-http://localhost:9090}
46
46
  LANGFUSE_S3_MEDIA_UPLOAD_FORCE_PATH_STYLE: ${LANGFUSE_S3_MEDIA_UPLOAD_FORCE_PATH_STYLE:-true}
47
47
  LANGFUSE_S3_MEDIA_UPLOAD_PREFIX: ${LANGFUSE_S3_MEDIA_UPLOAD_PREFIX:-media/}
@@ -51,14 +51,14 @@ services:
51
51
  LANGFUSE_S3_BATCH_EXPORT_REGION: ${LANGFUSE_S3_BATCH_EXPORT_REGION:-auto}
52
52
  LANGFUSE_S3_BATCH_EXPORT_ENDPOINT: ${LANGFUSE_S3_BATCH_EXPORT_ENDPOINT:-http://minio:9000}
53
53
  LANGFUSE_S3_BATCH_EXPORT_EXTERNAL_ENDPOINT: ${LANGFUSE_S3_BATCH_EXPORT_EXTERNAL_ENDPOINT:-http://localhost:9090}
54
- LANGFUSE_S3_BATCH_EXPORT_ACCESS_KEY_ID: ${LANGFUSE_S3_BATCH_EXPORT_ACCESS_KEY_ID:-minio}
55
- LANGFUSE_S3_BATCH_EXPORT_SECRET_ACCESS_KEY: ${LANGFUSE_S3_BATCH_EXPORT_SECRET_ACCESS_KEY:-miniosecret} # CHANGEME
54
+ LANGFUSE_S3_BATCH_EXPORT_ACCESS_KEY_ID: ${LANGFUSE_S3_BATCH_EXPORT_ACCESS_KEY_ID?}
55
+ LANGFUSE_S3_BATCH_EXPORT_SECRET_ACCESS_KEY: ${LANGFUSE_S3_BATCH_EXPORT_SECRET_ACCESS_KEY?} # CHANGEME
56
56
  LANGFUSE_S3_BATCH_EXPORT_FORCE_PATH_STYLE: ${LANGFUSE_S3_BATCH_EXPORT_FORCE_PATH_STYLE:-true}
57
57
  LANGFUSE_INGESTION_QUEUE_DELAY_MS: ${LANGFUSE_INGESTION_QUEUE_DELAY_MS:-}
58
58
  LANGFUSE_INGESTION_CLICKHOUSE_WRITE_INTERVAL_MS: ${LANGFUSE_INGESTION_CLICKHOUSE_WRITE_INTERVAL_MS:-}
59
59
  REDIS_HOST: ${REDIS_HOST:-redis}
60
60
  REDIS_PORT: ${REDIS_PORT:-6379}
61
- REDIS_AUTH: ${REDIS_AUTH:-myredissecret} # CHANGEME
61
+ REDIS_AUTH: ${REDIS_AUTH?} # CHANGEME
62
62
  REDIS_TLS_ENABLED: ${REDIS_TLS_ENABLED:-false}
63
63
  REDIS_TLS_CA: ${REDIS_TLS_CA:-/certs/ca.crt}
64
64
  REDIS_TLS_CERT: ${REDIS_TLS_CERT:-/certs/redis.crt}
@@ -74,7 +74,7 @@ services:
74
74
  - 3000:3000
75
75
  environment:
76
76
  <<: *langfuse-worker-env
77
- NEXTAUTH_SECRET: ${NEXTAUTH_SECRET:-mysecret} # CHANGEME
77
+ NEXTAUTH_SECRET: ${NEXTAUTH_SECRET?} # CHANGEME
78
78
  LANGFUSE_INIT_ORG_ID: ${LANGFUSE_INIT_ORG_ID:-}
79
79
  LANGFUSE_INIT_ORG_NAME: ${LANGFUSE_INIT_ORG_NAME:-}
80
80
  LANGFUSE_INIT_PROJECT_ID: ${LANGFUSE_INIT_PROJECT_ID:-}
@@ -91,8 +91,8 @@ services:
91
91
  user: "101:101"
92
92
  environment:
93
93
  CLICKHOUSE_DB: default
94
- CLICKHOUSE_USER: ${CLICKHOUSE_USER:-clickhouse}
95
- CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD:-clickhouse} # CHANGEME
94
+ CLICKHOUSE_USER: ${CLICKHOUSE_USER?}
95
+ CLICKHOUSE_PASSWORD: ${CLICKHOUSE_PASSWORD?} # CHANGEME
96
96
  volumes:
97
97
  - langfuse_clickhouse_data:/var/lib/clickhouse
98
98
  - langfuse_clickhouse_logs:/var/log/clickhouse-server
@@ -111,8 +111,8 @@ services:
111
111
  # create the 'langfuse' bucket before starting the service
112
112
  command: -c 'mkdir -p /data/langfuse && minio server --address ":9000" --console-address ":9001" /data'
113
113
  environment:
114
- MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minio}
115
- MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-miniosecret} # CHANGEME
114
+ MINIO_ROOT_USER: ${MINIO_ROOT_USER?}
115
+ MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD?} # CHANGEME
116
116
  ports:
117
117
  - 9090:9000
118
118
  - 127.0.0.1:9091:9001
@@ -130,7 +130,7 @@ services:
130
130
  restart: always
131
131
  # CHANGEME: row below to secure redis password
132
132
  command: >
133
- --requirepass ${REDIS_AUTH:-myredissecret}
133
+ --requirepass ${REDIS_AUTH?}
134
134
  --maxmemory-policy noeviction
135
135
  # ports removed to avoid conflict with local redis
136
136
  healthcheck:
@@ -149,7 +149,7 @@ services:
149
149
  retries: 10
150
150
  environment:
151
151
  POSTGRES_USER: ${POSTGRES_USER:-postgres}
152
- POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres} # CHANGEME
152
+ POSTGRES_PASSWORD: ${POSTGRES_PASSWORD?} # CHANGEME
153
153
  POSTGRES_DB: ${POSTGRES_DB:-postgres}
154
154
  TZ: UTC
155
155
  PGTZ: UTC
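The compose changes above replace insecure defaults (`${VAR:-default}`) with required variables (`${VAR?}`). In Compose interpolation, `${VAR?}` fails when the variable is unset, while an empty value still passes; `${VAR:?}` would also reject empty values. A small pre-flight sketch, with the variable list copied from the hunks above and `check_required` as a hypothetical helper name:

```python
from typing import List, Mapping, Tuple

# Variables that the 1.65.0 compose file marks as required via ${VAR?}.
REQUIRED_VARS = (
    "DATABASE_URL",
    "SALT",
    "ENCRYPTION_KEY",
    "CLICKHOUSE_USER",
    "CLICKHOUSE_PASSWORD",
    "LANGFUSE_S3_EVENT_UPLOAD_ACCESS_KEY_ID",
    "LANGFUSE_S3_EVENT_UPLOAD_SECRET_ACCESS_KEY",
    "LANGFUSE_S3_MEDIA_UPLOAD_ACCESS_KEY_ID",
    "LANGFUSE_S3_MEDIA_UPLOAD_SECRET_ACCESS_KEY",
    "LANGFUSE_S3_BATCH_EXPORT_ACCESS_KEY_ID",
    "LANGFUSE_S3_BATCH_EXPORT_SECRET_ACCESS_KEY",
    "REDIS_AUTH",
    "NEXTAUTH_SECRET",
    "MINIO_ROOT_USER",
    "MINIO_ROOT_PASSWORD",
    "POSTGRES_PASSWORD",
)


def check_required(environ: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Return (unset, empty) names; only unset ones make ${VAR?} fail."""
    unset = [name for name in REQUIRED_VARS if name not in environ]
    empty = [name for name in REQUIRED_VARS if environ.get(name) == ""]
    return unset, empty
```

Passing `os.environ` covers exported variables only; values that Compose would read from a project `.env` file need to be merged into the mapping first.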
@@ -19,7 +19,10 @@
19
19
  - Web UI extension design: `guides/WEBUI_CLI_ROLLOUT_PLAN.md` (includes the phase 1 implementation file list)
20
20
  - RAGAS human feedback calibration: `guides/RAGAS_HUMAN_FEEDBACK_CALIBRATION_GUIDE.md`
21
21
  - Diagnostic playbook: `guides/EVALVAULT_DIAGNOSTIC_PLAYBOOK.md` (problem → analysis → interpretation → action flow)
22
+ - RAG performance improvement proposal: `guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md` (purpose/mission, KPIs, roadmap)
23
+ - CLI parallel features spec: `guides/CLI_PARALLEL_FEATURES_SPEC.md`
22
24
  - Run result Excel sheet summary: `guides/EVALVAULT_RUN_EXCEL_SHEETS.md`
25
+ - Evaluation report templates: `templates/eval_report_templates.md`
23
26
  - Release checklist: `guides/RELEASE_CHECKLIST.md`
24
27
  - Status summary: `STATUS.md`
25
28
  - Roadmap: `ROADMAP.md`
@@ -210,6 +210,7 @@ uv sync --extra dev
210
210
  | `phoenix` | arize-phoenix + OpenTelemetry | Phoenix tracing/dataset/experiment integration |
211
211
  | `anthropic` | anthropic | Anthropic LLM adapter |
212
212
  | `perf` | faiss-cpu, ijson | Performance helpers for large datasets |
213
+ | `secrets` | boto3, google-cloud-secret-manager, hvac | Secret Manager integration |
213
214
 
214
215
  Thanks to `.python-version`, uv downloads Python 3.12 automatically.
215
216
 
@@ -221,6 +222,9 @@ uv sync --extra dev
221
222
  ```bash
222
223
  cp .env.example .env
223
224
  # Fill in OPENAI_API_KEY, OLLAMA_BASE_URL, LANGFUSE_*, PHOENIX_*, and so on.
225
+ # To use API authentication, set API_AUTH_TOKENS.
226
+ # Using secret:// references requires SECRET_PROVIDER to be set and the secrets extra installed.
227
+ # Rate limiting is enabled via RATE_LIMIT_ENABLED.
224
228
  ```
225
229
  To change the SQLite path, add the values below.
226
230
  ```bash
@@ -24,6 +24,11 @@
24
24
  - Incrementally extend the Open RAG Trace spec/samples to match real operational needs (following the versioning policy)
25
25
  - Strengthen the guides on Collector configuration and data retention (artifact separation, PII masking)
26
26
 
27
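+ ### P3 (Performance improvement roadmap)
28
+ - Establish KPIs, the evaluation protocol, and the roadmap based on the RAG performance improvement proposal
29
+ - Integrate retrieval/reranking/GraphRAG experiments with operational metrics
30
+ - Refine the improvement loop based on expert perspectives (cognition/UX/operations)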
31
+
27
32
  ## Work tracking
28
33
 
29
34
  - Concrete issue/PR-level plans are managed in GitHub Issues/PRs.
@@ -12,6 +12,7 @@ EvalVault의 목표는 **RAG 평가/분석/추적을 하나의 Run 단위로 연
12
12
  - **Observability**: Phoenix (OpenTelemetry/OpenInference) and, optionally, Langfuse/MLflow
13
13
  - **Profile-based model switching**: OpenAI/Ollama/vLLM/Anthropic and others via `config/models.yaml` + `.env`
14
14
  - **Open RAG Trace standard**: instrument and collect external/internal RAG systems with a standard schema
15
+ - **Performance improvement framework**: KPIs, evaluation, and roadmap are laid out in `guides/RAG_PERFORMANCE_IMPROVEMENT_PROPOSAL.md`
15
16
 
16
17
  ## Current limitations (transparent disclosure)
17
18
 
@@ -0,0 +1,315 @@
1
+ # CLI Parallel Features Spec (Draft)
2
+
3
+ > Audience: CLI/Platform contributors
4
+ > Purpose: Future CLI features aligned with SOLID, BDD, hexagonal & clean architecture
5
+ > Last Updated: 2026-01-18
6
+
7
+ ## 1. Overview
8
+
9
+ This document specifies new CLI features that are parallel-by-default, deterministic, and cleanly separated by ports/adapters. The scope is design-level documentation with stable JSON outputs and BDD scenarios.
10
+
11
+ Design goals:
12
+ - SOLID: each command = one use-case orchestrator; dependencies injected via ports
13
+ - Clean/Hexagonal: CLI is an inbound adapter; domain services depend on outbound ports only
14
+ - Parallel execution: bounded concurrency with deterministic aggregation
15
+ - BDD: user-visible behavior is defined via Gherkin scenarios
16
+
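The design goal of "bounded concurrency with deterministic aggregation" can be illustrated with a short `asyncio` sketch. This is not the project's `async_batch_executor`; `run_bounded` is a hypothetical helper that only shows the pattern:

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    concurrency: int = 8,
) -> List[R]:
    """Run worker over items with at most `concurrency` tasks in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather() returns results in input order, regardless of completion order,
    # which keeps the aggregated output deterministic.
    return list(await asyncio.gather(*(guarded(item) for item in items)))
```

The semaphore bounds resource use, and the ordering guarantee of `asyncio.gather` means the aggregate does not depend on which task happens to finish first.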
17
+ Collaboration rules (conflict avoidance):
18
+ - Each stream modifies different files only.
19
+ - Shared schemas or interfaces change only after explicit agreement.
20
+ - Documentation edits are assigned to a single owner to avoid merge conflicts.
21
+
22
+ ## 1.1 Parallel Agent Implementation Plan (Execution)
23
+
24
+ Scope:
25
+ - Implement all commands below in parallel (CLI + domain services + ports + adapters).
26
+ - Each command is owned by exactly one agent end-to-end.
27
+
28
+ Ownership:
29
+ - Agent Compare: `evalvault compare`
30
+ - Agent Calibrate: `evalvault calibrate-judge`
31
+ - Agent Difficulty: `evalvault profile-difficulty`
32
+ - Agent Regress: `evalvault regress`
33
+ - Agent Artifacts: `evalvault artifacts lint`
34
+ - Agent Ops: `evalvault ops snapshot`
35
+
36
+ File boundaries (default):
37
+ - CLI command module for the command
38
+ - Domain service (one use-case service per command)
39
+ - Outbound port interfaces needed by that service
40
+ - Outbound adapters for storage/reporting/FS as needed
41
+ - Tests for the command/service
42
+
43
+ Shared files (change only with explicit agreement):
44
+ - `adapters/inbound/cli/app.py`
45
+ - `adapters/inbound/cli/commands/__init__.py`
46
+ - Common JSON envelope schema or report templates
47
+ - `domain/services/async_batch_executor.py`
48
+
49
+ Definition of done (per agent):
50
+ - CLI command registered and functional with `--help` and a basic run path
51
+ - Domain service + ports/adapters implemented for the use-case
52
+ - Tests added for core logic and CLI wiring
53
+ - Tests and lint pass with the standard project commands
54
+
55
+ Test commands (standard project flow):
56
+ - `uv run ruff check src/ tests/`
57
+ - `uv run ruff format src/ tests/`
58
+ - `uv run pytest tests -v`
59
+
60
+ ## 2. Command Specs
61
+
62
+ ### 2.1 `evalvault compare`
63
+
64
+ Purpose:
65
+ - Compare two runs (metrics, prompts/config diffs, difficulty distribution) and output a unified report.
66
+
67
+ Synopsis:
68
+ ```
69
+ uv run evalvault compare RUN_A RUN_B \
70
+ --db data/db/evalvault.db \
71
+ --metrics faithfulness,answer_relevancy \
72
+ --test t-test \
73
+ --format table \
74
+ --output reports/comparison/comparison_RUNA_RUNB.json \
75
+ --report reports/comparison/comparison_RUNA_RUNB.md \
76
+ --output-dir reports/comparison \
77
+ --artifacts-dir reports/comparison/artifacts/comparison_RUNA_RUNB \
78
+ --parallel --concurrency 8
79
+ ```
80
+
81
+ Options:
82
+ - `--db, -D <path>`: sqlite db path
83
+ - `--metrics, -m <csv>`: allowlist of metrics
84
+ - `--test, -t <t-test|mann-whitney>`
85
+ - `--format, -f <table|json>`
86
+ - `--output, -o <path>`
87
+ - `--report <path>`
88
+ - `--output-dir <path>`
89
+ - `--artifacts-dir <path>`
90
+ - `--parallel/--no-parallel`, `--concurrency <int>`
91
+
92
+ Exit codes:
93
+ - `0`: success
94
+ - `1`: invalid args or missing run
95
+ - `2`: report generation degraded
96
+
97
+ ### 2.2 `evalvault calibrate-judge`
98
+
99
+ Purpose:
100
+ - Calibrate judge scores and emit reliability summary.
101
+
102
+ Synopsis:
103
+ ```
104
+ uv run evalvault calibrate-judge RUN_ID \
105
+ --db data/db/evalvault.db \
106
+ --labels-source feedback \
107
+ --method isotonic \
108
+ --metric faithfulness \
109
+ --holdout-ratio 0.2 \
110
+ --seed 42 \
111
+ --write-back \
112
+ --output reports/calibration/judge_calibration_RUNID.json \
113
+ --parallel --concurrency 8
114
+ ```
115
+
116
+ Options:
117
+ - `--labels-source <feedback|gold|hybrid>`
118
+ - `--method <platt|isotonic|temperature|none>`
119
+ - `--metric <name>` (repeatable)
120
+ - `--holdout-ratio <float>`
121
+ - `--seed <int>`
122
+ - `--write-back`
123
+ - `--output, -o <path>`
124
+ - `--artifacts-dir <path>`
125
+ - `--parallel/--no-parallel`, `--concurrency <int>`
126
+
127
+ Exit codes:
128
+ - `0`: success
129
+ - `1`: labels missing / invalid args
130
+ - `2`: calibration quality below gate
131
+
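The `--holdout-ratio` and `--seed` options imply a reproducible split of labeled samples before calibration. A minimal sketch of such a split, assuming a shuffle-then-slice strategy; `holdout_split` is a hypothetical name and the spec does not define the actual procedure:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    items: Sequence[T], holdout_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Return (fit, holdout) lists; the same seed always yields the same split."""
    if not 0.0 <= holdout_ratio < 1.0:
        raise ValueError("holdout_ratio must be in [0, 1)")
    shuffled = list(items)
    # A private Random instance keeps the split independent of global RNG state.
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_ratio))
    return shuffled[n_holdout:], shuffled[:n_holdout]
```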
132
+ ### 2.3 `evalvault profile-difficulty`
133
+
134
+ Purpose:
135
+ - Compute difficulty buckets for a dataset or a run.
136
+
137
+ Synopsis:
138
+ ```
139
+ uv run evalvault profile-difficulty \
140
+ --db data/db/evalvault.db \
141
+ --dataset-name insurance-qa \
142
+ --limit-runs 50 \
143
+ --metrics faithfulness,answer_relevancy \
144
+ --bucket-count 5 \
145
+ --output reports/difficulty/difficulty_insurance-qa.json \
146
+ --parallel --concurrency 8
147
+ ```
148
+
149
+ Options:
150
+ - `--dataset-name <string>` or `--run-id <id>`
151
+ - `--limit-runs <int>`
152
+ - `--metrics, -m <csv>`
153
+ - `--bucket-count <int>`
154
+ - `--min-samples <int>`
155
+ - `--output, -o <path>`
156
+ - `--artifacts-dir <path>`
157
+ - `--parallel/--no-parallel`, `--concurrency <int>`
158
+
159
+ Exit codes:
160
+ - `0`: success
161
+ - `1`: insufficient history or invalid args
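One way `--bucket-count` could be applied is equal-frequency bucketing by rank. The sketch below assumes each case already has an aggregated score and that a lower score means a harder case; both the assumption and the name `assign_buckets` are illustrative, not part of the spec.

```python
def assign_buckets(case_scores: dict[str, float], bucket_count: int = 5) -> dict[str, int]:
    """Map case id to a bucket index: 0 is easiest, bucket_count - 1 is hardest."""
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    # Highest score first; ties are broken by case id so the result is deterministic.
    ordered = sorted(case_scores, key=lambda case_id: (-case_scores[case_id], case_id))
    total = len(ordered)
    return {case_id: rank * bucket_count // total for rank, case_id in enumerate(ordered)}
```

For example, `assign_buckets({"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}, bucket_count=2)` places `a` and `d` in bucket 0 and `c` and `b` in bucket 1.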
162
+
163
+ ### 2.4 `evalvault regress`
164
+
165
+ Purpose:
166
+ - CI-grade regression gate vs baseline run.
167
+
168
+ Synopsis:
169
+ ```
170
+ uv run evalvault regress RUN_CANDIDATE \
171
+ --db data/db/evalvault.db \
172
+ --baseline RUN_BASELINE \
173
+ --fail-on-regression 0.05 \
174
+ --test t-test \
175
+ --metrics faithfulness,answer_relevancy \
176
+ --format github-actions \
177
+ --output reports/regress/regress_RUNCAND.json \
178
+ --parallel --concurrency 8
179
+ ```
180
+
181
+ Exit codes:
182
+ - `0`: pass
183
+ - `1`: invalid input
184
+ - `2`: regression detected
185
+ - `3`: internal error
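The exit-code contract above can be sketched as a small gate function. This sketch assumes the value passed to `--fail-on-regression` is a maximum allowed drop in the mean metric score, which the spec does not state; it also skips the significance test selected by `--test`. The name `regression_exit_code` is hypothetical.

```python
from statistics import mean

EXIT_PASS, EXIT_INVALID, EXIT_REGRESSION = 0, 1, 2


def regression_exit_code(
    baseline: dict[str, list[float]],
    candidate: dict[str, list[float]],
    max_drop: float = 0.05,
) -> int:
    """Compare per-metric means and return the exit code the gate would use."""
    shared = sorted(set(baseline) & set(candidate))
    if not shared:
        return EXIT_INVALID  # nothing to compare
    for metric in shared:
        if not baseline[metric] or not candidate[metric]:
            return EXIT_INVALID  # a metric with no samples is treated as invalid input
        drop = mean(baseline[metric]) - mean(candidate[metric])
        if drop > max_drop:
            return EXIT_REGRESSION
    return EXIT_PASS
```

In CI the return value would be handed to `sys.exit`, so a regression fails the job with code 2 while bad input fails with code 1.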
186
+
187
+ ### 2.5 `evalvault artifacts lint`
188
+
189
+ Purpose:
190
+ - Validate required artifacts and schema invariants.
191
+
192
+ Synopsis:
193
+ ```
194
+ uv run evalvault artifacts lint ARTIFACT_DIR \
195
+ --strict \
196
+ --format json \
197
+ --output reports/artifacts_lint/lint_RUNID.json \
198
+ --parallel --concurrency 16
199
+ ```
200
+
201
+ Checks:
202
+ - `index.json` presence
203
+ - required paths exist
204
+ - JSON schema validation
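The first two checks can be sketched with the standard library alone. The layout of `index.json` is not given in the spec, so the sketch assumes a top-level `"files"` list of relative paths. Full JSON Schema validation is out of scope here; `strict` only checks that `.json` artifacts parse. `lint_artifacts` is a hypothetical name.

```python
import json
from pathlib import Path


def lint_artifacts(artifact_dir: str, strict: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the lint passed."""
    root = Path(artifact_dir)
    index_path = root / "index.json"
    if not index_path.is_file():
        return ["index.json is missing"]
    try:
        index = json.loads(index_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"index.json is not valid JSON: {exc.msg}"]
    # Assumed layout: {"files": ["relative/path.json", ...]}.
    files = index.get("files") if isinstance(index, dict) else None
    if not isinstance(files, list):
        return ["index.json has no 'files' list"]
    problems: list[str] = []
    for rel in files:
        target = root / str(rel)
        if not target.is_file():
            problems.append(f"missing artifact: {rel}")
        elif strict and target.suffix == ".json":
            try:
                json.loads(target.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                problems.append(f"invalid JSON: {rel}")
    return problems
```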
205
+
206
+ ### 2.6 `evalvault ops snapshot`
207
+
208
+ Purpose:
209
+ - Collect reproducibility metadata (profile, model config, env redactions).
210
+
211
+ Synopsis:
212
+ ```
213
+ uv run evalvault ops snapshot \
214
+ --profile dev \
215
+ --db data/db/evalvault.db \
216
+ --run-id RUN_ID \
217
+ --include-model-config \
218
+ --include-env \
219
+ --redact OPENAI_API_KEY \
220
+ --output reports/ops/snapshot_RUNID.json
221
+ ```
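The `--include-env` and `--redact` behavior could look roughly like the sketch below. The prefix allowlist (`OPENAI_`, `EVALVAULT_`) and the placeholder string are assumptions made for the example, and `snapshot_env` is a hypothetical helper.

```python
import os
from typing import Iterable, Mapping, Optional

REDACTED = "***REDACTED***"


def snapshot_env(
    redact: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
    prefixes: tuple[str, ...] = ("OPENAI_", "EVALVAULT_"),
) -> dict[str, str]:
    """Capture allowlisted environment variables, masking the names listed in `redact`."""
    source = os.environ if env is None else env
    redact_names = set(redact)
    snapshot: dict[str, str] = {}
    for key in sorted(source):
        if not key.startswith(prefixes):
            continue  # keep unrelated variables out of the snapshot entirely
        snapshot[key] = REDACTED if key in redact_names else source[key]
    return snapshot
```

Keeping the variable name while masking its value records that the key was set without leaking the secret.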
222
+
223
+ ## 3. Architecture Alignment
224
+
225
+ ### 3.1 SOLID
226
+ - SRP: each command orchestrates a single use-case service
227
+ - OCP: add new commands via new registrars without modifying core command modules
228
+ - DIP: domain services depend on ports (StoragePort, ReportPort, FileSystemPort)
229
+
230
+ ### 3.2 Hexagonal/Clean
231
+ - Inbound adapter: `adapters/inbound/cli/commands/*`
232
+ - Domain services: `domain/services/*` for use-cases
233
+ - Outbound ports: `ports/outbound/*`
234
+ - Outbound adapters: sqlite storage, report writers, LLM providers
235
+
236
+ ### 3.3 Proposed Services (Draft)
237
+ - `RunComparisonService`
238
+ - `JudgeCalibrationService`
239
+ - `DifficultyProfilingService`
240
+ - `RegressionGateService`
241
+ - `ArtifactLintService`
242
+ - `OpsSnapshotService`
243
+
244
+ ## 4. Parallel Execution Model
245
+
246
+ - Use bounded concurrency (`--concurrency`) and deterministic aggregation.
247
+ - Candidate base utility: `domain/services/async_batch_executor.py`.
248
+ - Parallelize per-metric/per-case computations; merge results with stable sorting.
249
+ - LLM calls default to sequential unless explicitly enabled.
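Bounded concurrency with deterministic aggregation can be sketched with `asyncio.Semaphore` and `asyncio.gather`. This is an illustration only and is not the contents of `async_batch_executor.py`; `run_bounded`, `score_metric`, and `compute_all` are hypothetical names, and the sleep stands in for real per-metric work.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    concurrency: int = 8,
) -> list[R]:
    """Run `worker` over `items` with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather() returns results in submission order, regardless of completion order.
    return await asyncio.gather(*(guarded(item) for item in items))


async def score_metric(name: str) -> tuple[str, float]:
    # Placeholder computation: the sleep simulates uneven task durations.
    await asyncio.sleep(0.01 * (len(name) % 3))
    return name, float(len(name))


def compute_all(metrics: list[str], concurrency: int = 8) -> list[tuple[str, float]]:
    results = asyncio.run(run_bounded(metrics, score_metric, concurrency))
    # Stable sort by metric name so the merged output is identical on every run.
    return sorted(results, key=lambda pair: pair[0])
```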
250
+
251
+ ## 5. JSON Output Envelope
252
+
253
+ Common envelope (recommended):
254
+ ```
255
+ {
256
+   "command": "compare",
257
+   "version": 1,
258
+   "status": "ok",
259
+   "started_at": "2026-01-18T00:00:00Z",
260
+   "finished_at": "2026-01-18T00:00:05Z",
261
+   "duration_ms": 5000,
262
+   "artifacts": {
263
+     "dir": "reports/.../artifacts/...",
264
+     "index": "reports/.../artifacts/.../index.json"
265
+   },
266
+   "data": {}
267
+ }
268
+ ```
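A small builder keeps every command's output consistent with this envelope. The sketch below is one possible shape; `build_envelope` is a hypothetical helper, and deriving the `index` path from the artifacts directory is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def _iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC with a trailing 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_envelope(
    command: str,
    started_at: datetime,
    finished_at: datetime,
    data: dict[str, Any],
    artifacts_dir: Optional[str] = None,
    status: str = "ok",
) -> dict[str, Any]:
    envelope: dict[str, Any] = {
        "command": command,
        "version": 1,
        "status": status,
        "started_at": _iso_utc(started_at),
        "finished_at": _iso_utc(finished_at),
        # Integer division by a 1 ms timedelta avoids float rounding.
        "duration_ms": (finished_at - started_at) // timedelta(milliseconds=1),
        "data": data,
    }
    if artifacts_dir is not None:
        envelope["artifacts"] = {"dir": artifacts_dir, "index": f"{artifacts_dir}/index.json"}
    return envelope
```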
269
+
270
+ ## 6. BDD Scenarios (Gherkin)
271
+
272
+ ### compare
273
+ ```
274
+ Feature: Compare two evaluation runs
275
+ Scenario: Compare two runs with shared metrics
276
+ Given a database with runs "run_a" and "run_b"
277
+ When I run "evalvault compare run_a run_b --format json"
278
+ Then the command exits with code 0
279
+ And the JSON output contains "run_ids" ["run_a", "run_b"]
280
+ ```
281
+
282
+ ### calibrate-judge
283
+ ```
284
+ Feature: Calibrate judge scoring
285
+ Scenario: Calibrate judge scores using feedback labels
286
+ Given a run "run_x" with feedback labels in storage
287
+ When I run "evalvault calibrate-judge run_x --labels-source feedback"
288
+ Then the command exits with code 0
289
+ ```
290
+
291
+ ### regress
292
+ ```
293
+ Feature: Regression gate for CI
294
+ Scenario: Regression detected
295
+ Given a candidate run "run_new" and baseline "run_base"
296
+ When I run "evalvault regress run_new --baseline run_base"
297
+ Then the command exits with code 2
298
+ ```
299
+
300
+ ## 7. Non-goals
301
+ - No distributed execution or multi-node scheduling
302
+ - No new scoring algorithms; only orchestration and reporting
303
+ - No breaking change to existing CLI
304
+
305
+ ## 8. Risks
306
+ - Provider rate limits with parallel LLM calls
307
+ - DB contention under high concurrency
308
+ - Schema drift in artifacts without linting
309
+
310
+ ## 9. Mapping to Existing Modules (Evidence)
311
+ - CLI app: `adapters/inbound/cli/app.py`
312
+ - Command registration: `adapters/inbound/cli/commands/__init__.py`
313
+ - Existing compare pipeline: `adapters/inbound/cli/commands/analyze.py`
314
+ - Artifact utilities: `adapters/inbound/cli/utils/analysis_io.py`
315
+ - Async batch executor: `domain/services/async_batch_executor.py`
@@ -0,0 +1,114 @@
1
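+ # Data Difficulty Assessment for RAG Systems and a Fine-Tuning Strategy for Judge LLMs (A Pragmatic View)
2
+
3
+ ## 1. Data Difficulty Assessment Framework: Evidence-Backed, but the Preconditions Matter
4
+
5
+ ### 1.1 Core Premises
6
+ - Difficulty is determined by the interaction among question, context, and response, and is hard to capture with a single metric.
7
+ - There is evidence that Retrieval Complexity (RC) tracks question difficulty and correlates with QA performance and expert judgment.
8
+ - However, difficulty is a "proxy metric", and its correlation with real production data must be validated first.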
9
+
10
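+ ### 1.2 Difficulty Axes (Recommended)
11
+ - Question complexity: compound questions, multi-step reasoning, presence of temporal or conditional context
12
+ - Retrieval difficulty: whether the required evidence is spread across multiple documents; completeness of the retrieved set
13
+ - Answer quality signals: gold labels or judge scores, faithfulness/answer relevancy
14
+ - Noise / out-of-domain: no retrieval results, low confidence from the domain classifier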
15
+
16
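+ ### 1.3 Phased Implementation (Pragmatic)
17
+ 1. v0 (heuristic): query length, multi-hop flag, retrieval success/failure, top-k score distribution
18
+ 2. v1 (RC-based): apply an RRCP-style pipeline to estimate RC; validate the correlation between difficulty and error rate
19
+ 3. v2 (operationalized difficulty): manage difficulty-distribution drift as a KPI; use separate thresholds per difficulty band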
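The v0 heuristic can be sketched as a weighted sum of the listed signals. The weights, the 200-character normalization, and the use of only the best top-k score are assumptions made for this example, not values from the document; `QuerySignals` and `difficulty_v0` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QuerySignals:
    query: str
    is_multi_hop: bool
    top_k_scores: Sequence[float]  # retrieval similarity scores, assumed to lie in [0, 1]


def difficulty_v0(signals: QuerySignals, max_query_chars: int = 200) -> float:
    """Heuristic difficulty in [0, 1]; higher means harder."""
    length_term = min(len(signals.query) / max_query_chars, 1.0)
    multi_hop_term = 1.0 if signals.is_multi_hop else 0.0
    if not signals.top_k_scores:
        retrieval_term = 1.0  # retrieval failure counts as maximally hard
    else:
        best = min(max(max(signals.top_k_scores), 0.0), 1.0)
        retrieval_term = 1.0 - best
    # Illustrative weights; they would need validating against observed error rates.
    return round(0.2 * length_term + 0.3 * multi_hop_term + 0.5 * retrieval_term, 4)
```

Before relying on such a score, the correlation with error rate called for in the v1 step still has to be checked on real data.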
20
+
21
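+ ### 1.4 Handling Noisy or Erroneous Inputs
22
+ - Classify as noise: retrieval similarity below a floor, zero results, or low-confidence domain classification
23
+ - Tag noise cases separately and have downstream components handle them with a safe response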
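The three noise conditions translate directly into a tagging function. The threshold defaults and tag strings below are placeholders chosen for illustration, and `noise_tags` is a hypothetical name.

```python
from typing import Sequence


def noise_tags(
    top_k_scores: Sequence[float],
    domain_confidence: float,
    min_similarity: float = 0.3,
    min_domain_confidence: float = 0.5,
) -> list[str]:
    """Return noise tags for one query; an empty list means the query is not noise."""
    tags: list[str] = []
    if not top_k_scores:
        tags.append("no_results")
    elif max(top_k_scores) < min_similarity:
        tags.append("low_similarity")
    if domain_confidence < min_domain_confidence:
        tags.append("low_domain_confidence")
    return tags
```

A downstream component could route any query with a non-empty tag list to the safe response instead of the normal generation path.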
24
+
25
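+ ### 1.5 EvalVault Integration
26
+ - Store difficulty scores as run_id artifacts so that performance trends can be compared per difficulty level.
27
+ - Verify whether shifts in the difficulty distribution track quality degradation, to confirm whether they are the "real cause".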
28
+
29
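+ ### 1.6 Domain Examples (Insurance / Nuclear Power)
30
+ - Insurance
31
+   - Easy: "What is the eligible age for auto insurance?" (stated in a single document)
32
+   - Medium: "How does the premium change when the covered-driver scope changes?" (rule plus exceptions)
33
+   - Hard: "Under indemnity health insurance, what is covered when a specific treatment is a non-covered (non-benefit) item?" (multi-document / conditional reasoning)
34
+ - Nuclear power
35
+   - Easy: "What is the difference between the primary system and the secondary system?" (definitional question)
36
+   - Medium: "What are the step-by-step requirements of the maintenance procedure?" (procedure plus conditions)
37
+   - Hard: "In a specific accident scenario, what is the actuation sequence of the safety systems, and what is the rationale?" (multi-step reasoning)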
38
+
39
+ ---
40
+
41
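+ ## 2. Fine-Tuning a Judge LLM (LLM-as-a-judge): Cost Savings Are Possible, Generalization Risk Remains
42
+
43
+ ### 2.1 Basic Principles
44
+ - Cost savings are achievable, but small judges are weak on generalization, fairness, and domain transfer.
45
+ - Judge quality depends more on label quality and calibration than on model size.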
46
+
47
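+ ### 2.2 Data Composition (Required)
48
+ - Human labels: question-context-response triples with a score (1~5) or a grade label
49
+ - Preference (pairwise): A/B comparison data (with rationales where possible)
50
+ - Expert reference answers: assessment of agreement with, and omissions from, the reference answer
51
+ - Operational logs: thumbs up/down, re-queries, dissatisfaction signals (weak labels)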
52
+
53
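+ ### 2.3 Training Strategy (Recommended)
54
+ - Start with SFT; once preference data is sufficient, additionally apply DPO or SLiC-HF
55
+ - Fix the output format to a JSON schema to keep verdicts stable
56
+ - For validation, check both the agreement rate with a GPT-4-class judge and the correlation with human evaluation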
57
+
58
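+ ### 2.4 Operational Guardrails
59
+ - Cascaded evaluation: process the bulk with a small judge and escalate only borderline cases to a stronger model
60
+ - Calibration: correct scores with a small set of human labels and report confidence intervals
61
+ - Bias mitigation: swap/format randomization tests for position, format, and knowledge bias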
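Cascaded evaluation can be sketched as a routing function around two judges. The borderline band of 0.4 to 0.6 and the callable signature are assumptions for the example; `cascade_score` is a hypothetical name, and the lambdas in the usage lines stand in for real model calls.

```python
from typing import Callable

Judge = Callable[[str], float]  # takes a serialized sample, returns a score in [0, 1]


def cascade_score(
    sample: str,
    small_judge: Judge,
    large_judge: Judge,
    band: tuple[float, float] = (0.4, 0.6),
) -> tuple[float, str]:
    """Score with the small judge; escalate to the large judge only inside `band`."""
    score = small_judge(sample)
    low, high = band
    if low <= score <= high:
        return large_judge(sample), "large"
    return score, "small"


# Usage with stand-in judges:
confident = cascade_score("q/ctx/answer", lambda s: 0.9, lambda s: 0.7)   # (0.9, "small")
borderline = cascade_score("q/ctx/answer", lambda s: 0.5, lambda s: 0.7)  # (0.7, "large")
```

Recording which judge produced each score makes it possible to audit how often escalation happens and what it costs.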
62
+
63
+ ---
64
+
65
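+ ## 3. Recent Fine-Tuning and Efficiency Techniques: Judge "Efficiency" and "Evaluation Quality" Separately
66
+
67
+ ### 3.1 When to Apply
68
+ - QLoRA/LoRA+/LoftQ help memory efficiency, but any gain in evaluation quality requires separate validation
69
+ - LongLoRA/Cartridges/MQA help long-context and serving efficiency, but do not guarantee judge performance
70
+ - GaLore offers memory savings and the possibility of full-parameter updates, at the cost of higher operational complexity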
71
+
72
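+ ### 3.2 Recommended Order of Adoption
73
+ 1. Start with QLoRA + LoRA (or LoRA+)
74
+ 2. Consider extension techniques only after calibration and consistency are secured
75
+ 3. Apply long-context optimization only when a bottleneck has been confirmed on real long-context workloads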
76
+
77
+ ---
78
+
79
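+ ## 4. Conclusion
80
+ - Difficulty profiling is valid, but "correlation validation + operational KPIs" is a mandatory precondition.
81
+ - Small judges help cut cost but carry significant generalization, bias, and consistency risks, so calibration and cascaded operation are mandatory.
82
+ - Recent fine-tuning techniques are efficiency tools; they do not guarantee better evaluation quality.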
83
+
84
+ ---
85
+
86
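+ ## 5. Execution Checklist
87
+ - Data difficulty
88
+   - Confirm that the v0 difficulty metrics correlate meaningfully with error rate
89
+   - Verify that difficulty-distribution drift tracks actual quality drops
90
+ - judge
91
+   - Obtain human labels for 3–5% of samples and generate a calibration report
92
+   - Define cascade escalation conditions (low confidence / borderline cases)
93
+ - Operations
94
+   - Confirm that difficulty and verdict rationale are stored in run_id artifacts
95
+   - Document per-difficulty thresholds and response policies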
96
+
97
+ ---
98
+
99
+ ## References
100
+ - RC metric: https://aclanthology.org/2024.findings-acl.872/
101
+ - GRADE difficulty matrix: https://arxiv.org/abs/2508.16994
102
+ - QLoRA: https://arxiv.org/abs/2305.14314
103
+ - LoftQ: https://arxiv.org/abs/2310.08659
104
+ - LoRA+: https://arxiv.org/abs/2402.12354
105
+ - LongLoRA: https://arxiv.org/abs/2309.12307
106
+ - DPO: https://arxiv.org/abs/2305.18290
107
+ - SLiC-HF: https://arxiv.org/abs/2305.10425
108
+ - GaLore: https://arxiv.org/abs/2403.03507
109
+ - Cartridges: https://arxiv.org/abs/2506.06266
110
+ - MQA: https://arxiv.org/abs/1911.02150
111
+ - JudgeLM: https://arxiv.org/abs/2310.17631
112
+ - Fine-tuned judge limits: https://aclanthology.org/2025.findings-acl.306/
113
+ - LLM judge reliability: https://arxiv.org/abs/2412.12509
114
+ - LLM judge bias: https://llm-judge-bias.github.io/