@miller-tech/uap 1.0.0
- package/LICENSE +21 -0
- package/README.md +888 -0
- package/dist/analyzers/index.d.ts +3 -0
- package/dist/analyzers/index.d.ts.map +1 -0
- package/dist/analyzers/index.js +684 -0
- package/dist/analyzers/index.js.map +1 -0
- package/dist/benchmarks/agents/naive-agent.d.ts +60 -0
- package/dist/benchmarks/agents/naive-agent.d.ts.map +1 -0
- package/dist/benchmarks/agents/naive-agent.js +144 -0
- package/dist/benchmarks/agents/naive-agent.js.map +1 -0
- package/dist/benchmarks/agents/uap-agent.d.ts +167 -0
- package/dist/benchmarks/agents/uap-agent.d.ts.map +1 -0
- package/dist/benchmarks/agents/uap-agent.js +437 -0
- package/dist/benchmarks/agents/uap-agent.js.map +1 -0
- package/dist/benchmarks/benchmark.d.ts +328 -0
- package/dist/benchmarks/benchmark.d.ts.map +1 -0
- package/dist/benchmarks/benchmark.js +112 -0
- package/dist/benchmarks/benchmark.js.map +1 -0
- package/dist/benchmarks/execution-verifier.d.ts +41 -0
- package/dist/benchmarks/execution-verifier.d.ts.map +1 -0
- package/dist/benchmarks/execution-verifier.js +340 -0
- package/dist/benchmarks/execution-verifier.js.map +1 -0
- package/dist/benchmarks/hierarchical-prompting.d.ts +37 -0
- package/dist/benchmarks/hierarchical-prompting.d.ts.map +1 -0
- package/dist/benchmarks/hierarchical-prompting.js +246 -0
- package/dist/benchmarks/hierarchical-prompting.js.map +1 -0
- package/dist/benchmarks/improved-benchmark.d.ts +89 -0
- package/dist/benchmarks/improved-benchmark.d.ts.map +1 -0
- package/dist/benchmarks/improved-benchmark.js +585 -0
- package/dist/benchmarks/improved-benchmark.js.map +1 -0
- package/dist/benchmarks/index.d.ts +11 -0
- package/dist/benchmarks/index.d.ts.map +1 -0
- package/dist/benchmarks/index.js +11 -0
- package/dist/benchmarks/index.js.map +1 -0
- package/dist/benchmarks/model-integration.d.ts +111 -0
- package/dist/benchmarks/model-integration.d.ts.map +1 -0
- package/dist/benchmarks/model-integration.js +904 -0
- package/dist/benchmarks/model-integration.js.map +1 -0
- package/dist/benchmarks/multi-turn-agent.d.ts +44 -0
- package/dist/benchmarks/multi-turn-agent.d.ts.map +1 -0
- package/dist/benchmarks/multi-turn-agent.js +254 -0
- package/dist/benchmarks/multi-turn-agent.js.map +1 -0
- package/dist/benchmarks/multi-turn-loop.d.ts +57 -0
- package/dist/benchmarks/multi-turn-loop.d.ts.map +1 -0
- package/dist/benchmarks/multi-turn-loop.js +167 -0
- package/dist/benchmarks/multi-turn-loop.js.map +1 -0
- package/dist/benchmarks/tasks.d.ts +19 -0
- package/dist/benchmarks/tasks.d.ts.map +1 -0
- package/dist/benchmarks/tasks.js +435 -0
- package/dist/benchmarks/tasks.js.map +1 -0
- package/dist/bin/cli.d.ts +3 -0
- package/dist/bin/cli.d.ts.map +1 -0
- package/dist/bin/cli.js +546 -0
- package/dist/bin/cli.js.map +1 -0
- package/dist/bin/llama-server-optimize.d.ts +18 -0
- package/dist/bin/llama-server-optimize.d.ts.map +1 -0
- package/dist/bin/llama-server-optimize.js +708 -0
- package/dist/bin/llama-server-optimize.js.map +1 -0
- package/dist/bin/policy.d.ts +3 -0
- package/dist/bin/policy.d.ts.map +1 -0
- package/dist/bin/policy.js +143 -0
- package/dist/bin/policy.js.map +1 -0
- package/dist/bin/tool-calls.d.ts +3 -0
- package/dist/bin/tool-calls.d.ts.map +1 -0
- package/dist/bin/tool-calls.js +4 -0
- package/dist/bin/tool-calls.js.map +1 -0
- package/dist/browser/index.d.ts +2 -0
- package/dist/browser/index.d.ts.map +1 -0
- package/dist/browser/index.js +2 -0
- package/dist/browser/index.js.map +1 -0
- package/dist/browser/web-browser.d.ts +30 -0
- package/dist/browser/web-browser.d.ts.map +1 -0
- package/dist/browser/web-browser.js +93 -0
- package/dist/browser/web-browser.js.map +1 -0
- package/dist/cli/agent.d.ts +20 -0
- package/dist/cli/agent.d.ts.map +1 -0
- package/dist/cli/agent.js +474 -0
- package/dist/cli/agent.js.map +1 -0
- package/dist/cli/analyze.d.ts +7 -0
- package/dist/cli/analyze.d.ts.map +1 -0
- package/dist/cli/analyze.js +103 -0
- package/dist/cli/analyze.js.map +1 -0
- package/dist/cli/completion-gates.d.ts +51 -0
- package/dist/cli/completion-gates.d.ts.map +1 -0
- package/dist/cli/completion-gates.js +201 -0
- package/dist/cli/completion-gates.js.map +1 -0
- package/dist/cli/compliance.d.ts +8 -0
- package/dist/cli/compliance.d.ts.map +1 -0
- package/dist/cli/compliance.js +509 -0
- package/dist/cli/compliance.js.map +1 -0
- package/dist/cli/coord.d.ts +7 -0
- package/dist/cli/coord.d.ts.map +1 -0
- package/dist/cli/coord.js +138 -0
- package/dist/cli/coord.js.map +1 -0
- package/dist/cli/dashboard.d.ts +21 -0
- package/dist/cli/dashboard.d.ts.map +1 -0
- package/dist/cli/dashboard.js +1508 -0
- package/dist/cli/dashboard.js.map +1 -0
- package/dist/cli/deploy.d.ts +19 -0
- package/dist/cli/deploy.d.ts.map +1 -0
- package/dist/cli/deploy.js +387 -0
- package/dist/cli/deploy.js.map +1 -0
- package/dist/cli/droids.d.ts +9 -0
- package/dist/cli/droids.d.ts.map +1 -0
- package/dist/cli/droids.js +227 -0
- package/dist/cli/droids.js.map +1 -0
- package/dist/cli/generate.d.ts +17 -0
- package/dist/cli/generate.d.ts.map +1 -0
- package/dist/cli/generate.js +432 -0
- package/dist/cli/generate.js.map +1 -0
- package/dist/cli/hooks.d.ts +9 -0
- package/dist/cli/hooks.d.ts.map +1 -0
- package/dist/cli/hooks.js +464 -0
- package/dist/cli/hooks.js.map +1 -0
- package/dist/cli/init.d.ts +12 -0
- package/dist/cli/init.d.ts.map +1 -0
- package/dist/cli/init.js +364 -0
- package/dist/cli/init.js.map +1 -0
- package/dist/cli/mcp-router.d.ts +16 -0
- package/dist/cli/mcp-router.d.ts.map +1 -0
- package/dist/cli/mcp-router.js +143 -0
- package/dist/cli/mcp-router.js.map +1 -0
- package/dist/cli/memory.d.ts +24 -0
- package/dist/cli/memory.d.ts.map +1 -0
- package/dist/cli/memory.js +885 -0
- package/dist/cli/memory.js.map +1 -0
- package/dist/cli/model.d.ts +15 -0
- package/dist/cli/model.d.ts.map +1 -0
- package/dist/cli/model.js +290 -0
- package/dist/cli/model.js.map +1 -0
- package/dist/cli/patterns.d.ts +26 -0
- package/dist/cli/patterns.d.ts.map +1 -0
- package/dist/cli/patterns.js +862 -0
- package/dist/cli/patterns.js.map +1 -0
- package/dist/cli/rtk-validation.d.ts +9 -0
- package/dist/cli/rtk-validation.d.ts.map +1 -0
- package/dist/cli/rtk-validation.js +9 -0
- package/dist/cli/rtk-validation.js.map +1 -0
- package/dist/cli/rtk.d.ts +34 -0
- package/dist/cli/rtk.d.ts.map +1 -0
- package/dist/cli/rtk.js +401 -0
- package/dist/cli/rtk.js.map +1 -0
- package/dist/cli/schema-diff.d.ts +7 -0
- package/dist/cli/schema-diff.d.ts.map +1 -0
- package/dist/cli/schema-diff.js +11 -0
- package/dist/cli/schema-diff.js.map +1 -0
- package/dist/cli/setup-mcp-router.d.ts +8 -0
- package/dist/cli/setup-mcp-router.d.ts.map +1 -0
- package/dist/cli/setup-mcp-router.js +163 -0
- package/dist/cli/setup-mcp-router.js.map +1 -0
- package/dist/cli/setup-wizard.d.ts +2 -0
- package/dist/cli/setup-wizard.d.ts.map +1 -0
- package/dist/cli/setup-wizard.js +806 -0
- package/dist/cli/setup-wizard.js.map +1 -0
- package/dist/cli/setup.d.ts +15 -0
- package/dist/cli/setup.d.ts.map +1 -0
- package/dist/cli/setup.js +154 -0
- package/dist/cli/setup.js.map +1 -0
- package/dist/cli/sync.d.ts +8 -0
- package/dist/cli/sync.d.ts.map +1 -0
- package/dist/cli/sync.js +395 -0
- package/dist/cli/sync.js.map +1 -0
- package/dist/cli/task.d.ts +33 -0
- package/dist/cli/task.d.ts.map +1 -0
- package/dist/cli/task.js +672 -0
- package/dist/cli/task.js.map +1 -0
- package/dist/cli/tool-calls.d.ts +20 -0
- package/dist/cli/tool-calls.d.ts.map +1 -0
- package/dist/cli/tool-calls.js +605 -0
- package/dist/cli/tool-calls.js.map +1 -0
- package/dist/cli/uap.d.ts +10 -0
- package/dist/cli/uap.d.ts.map +1 -0
- package/dist/cli/uap.js +398 -0
- package/dist/cli/uap.js.map +1 -0
- package/dist/cli/update.d.ts +10 -0
- package/dist/cli/update.d.ts.map +1 -0
- package/dist/cli/update.js +300 -0
- package/dist/cli/update.js.map +1 -0
- package/dist/cli/visualize.d.ts +77 -0
- package/dist/cli/visualize.d.ts.map +1 -0
- package/dist/cli/visualize.js +287 -0
- package/dist/cli/visualize.js.map +1 -0
- package/dist/cli/worktree.d.ts +9 -0
- package/dist/cli/worktree.d.ts.map +1 -0
- package/dist/cli/worktree.js +213 -0
- package/dist/cli/worktree.js.map +1 -0
- package/dist/coordination/adaptive-patterns.d.ts +65 -0
- package/dist/coordination/adaptive-patterns.d.ts.map +1 -0
- package/dist/coordination/adaptive-patterns.js +108 -0
- package/dist/coordination/adaptive-patterns.js.map +1 -0
- package/dist/coordination/auto-agent.d.ts +82 -0
- package/dist/coordination/auto-agent.d.ts.map +1 -0
- package/dist/coordination/auto-agent.js +145 -0
- package/dist/coordination/auto-agent.js.map +1 -0
- package/dist/coordination/capability-router.d.ts +79 -0
- package/dist/coordination/capability-router.d.ts.map +1 -0
- package/dist/coordination/capability-router.js +334 -0
- package/dist/coordination/capability-router.js.map +1 -0
- package/dist/coordination/database.d.ts +13 -0
- package/dist/coordination/database.d.ts.map +1 -0
- package/dist/coordination/database.js +136 -0
- package/dist/coordination/database.js.map +1 -0
- package/dist/coordination/deploy-batcher.d.ts +122 -0
- package/dist/coordination/deploy-batcher.d.ts.map +1 -0
- package/dist/coordination/deploy-batcher.js +718 -0
- package/dist/coordination/deploy-batcher.js.map +1 -0
- package/dist/coordination/droid-validator.d.ts +59 -0
- package/dist/coordination/droid-validator.d.ts.map +1 -0
- package/dist/coordination/droid-validator.js +142 -0
- package/dist/coordination/droid-validator.js.map +1 -0
- package/dist/coordination/index.d.ts +10 -0
- package/dist/coordination/index.d.ts.map +1 -0
- package/dist/coordination/index.js +10 -0
- package/dist/coordination/index.js.map +1 -0
- package/dist/coordination/pattern-router.d.ts +50 -0
- package/dist/coordination/pattern-router.d.ts.map +1 -0
- package/dist/coordination/pattern-router.js +118 -0
- package/dist/coordination/pattern-router.js.map +1 -0
- package/dist/coordination/service.d.ts +81 -0
- package/dist/coordination/service.d.ts.map +1 -0
- package/dist/coordination/service.js +619 -0
- package/dist/coordination/service.js.map +1 -0
- package/dist/coordination/worktree-enforcer.d.ts +22 -0
- package/dist/coordination/worktree-enforcer.d.ts.map +1 -0
- package/dist/coordination/worktree-enforcer.js +71 -0
- package/dist/coordination/worktree-enforcer.js.map +1 -0
- package/dist/generators/claude-md.d.ts +3 -0
- package/dist/generators/claude-md.d.ts.map +1 -0
- package/dist/generators/claude-md.js +1020 -0
- package/dist/generators/claude-md.js.map +1 -0
- package/dist/generators/template-loader.d.ts +105 -0
- package/dist/generators/template-loader.d.ts.map +1 -0
- package/dist/generators/template-loader.js +291 -0
- package/dist/generators/template-loader.js.map +1 -0
- package/dist/index.d.ts +49 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +63 -0
- package/dist/index.js.map +1 -0
- package/dist/mcp-router/config/parser.d.ts +9 -0
- package/dist/mcp-router/config/parser.d.ts.map +1 -0
- package/dist/mcp-router/config/parser.js +174 -0
- package/dist/mcp-router/config/parser.js.map +1 -0
- package/dist/mcp-router/executor/client.d.ts +31 -0
- package/dist/mcp-router/executor/client.d.ts.map +1 -0
- package/dist/mcp-router/executor/client.js +189 -0
- package/dist/mcp-router/executor/client.js.map +1 -0
- package/dist/mcp-router/index.d.ts +22 -0
- package/dist/mcp-router/index.d.ts.map +1 -0
- package/dist/mcp-router/index.js +18 -0
- package/dist/mcp-router/index.js.map +1 -0
- package/dist/mcp-router/output-compressor.d.ts +26 -0
- package/dist/mcp-router/output-compressor.d.ts.map +1 -0
- package/dist/mcp-router/output-compressor.js +236 -0
- package/dist/mcp-router/output-compressor.js.map +1 -0
- package/dist/mcp-router/search/fuzzy.d.ts +26 -0
- package/dist/mcp-router/search/fuzzy.d.ts.map +1 -0
- package/dist/mcp-router/search/fuzzy.js +94 -0
- package/dist/mcp-router/search/fuzzy.js.map +1 -0
- package/dist/mcp-router/server.d.ts +50 -0
- package/dist/mcp-router/server.d.ts.map +1 -0
- package/dist/mcp-router/server.js +229 -0
- package/dist/mcp-router/server.js.map +1 -0
- package/dist/mcp-router/session-stats.d.ts +37 -0
- package/dist/mcp-router/session-stats.d.ts.map +1 -0
- package/dist/mcp-router/session-stats.js +56 -0
- package/dist/mcp-router/session-stats.js.map +1 -0
- package/dist/mcp-router/tools/discover.d.ts +37 -0
- package/dist/mcp-router/tools/discover.d.ts.map +1 -0
- package/dist/mcp-router/tools/discover.js +65 -0
- package/dist/mcp-router/tools/discover.js.map +1 -0
- package/dist/mcp-router/tools/execute.d.ts +43 -0
- package/dist/mcp-router/tools/execute.d.ts.map +1 -0
- package/dist/mcp-router/tools/execute.js +144 -0
- package/dist/mcp-router/tools/execute.js.map +1 -0
- package/dist/mcp-router/types.d.ts +62 -0
- package/dist/mcp-router/types.d.ts.map +1 -0
- package/dist/mcp-router/types.js +6 -0
- package/dist/mcp-router/types.js.map +1 -0
- package/dist/memory/adaptive-context.d.ts +149 -0
- package/dist/memory/adaptive-context.d.ts.map +1 -0
- package/dist/memory/adaptive-context.js +1095 -0
- package/dist/memory/adaptive-context.js.map +1 -0
- package/dist/memory/agent-scoped-memory.d.ts +67 -0
- package/dist/memory/agent-scoped-memory.d.ts.map +1 -0
- package/dist/memory/agent-scoped-memory.js +126 -0
- package/dist/memory/agent-scoped-memory.js.map +1 -0
- package/dist/memory/ambiguity-detector.d.ts +54 -0
- package/dist/memory/ambiguity-detector.d.ts.map +1 -0
- package/dist/memory/ambiguity-detector.js +401 -0
- package/dist/memory/ambiguity-detector.js.map +1 -0
- package/dist/memory/backends/base.d.ts +18 -0
- package/dist/memory/backends/base.d.ts.map +1 -0
- package/dist/memory/backends/base.js +2 -0
- package/dist/memory/backends/base.js.map +1 -0
- package/dist/memory/backends/factory.d.ts +4 -0
- package/dist/memory/backends/factory.d.ts.map +1 -0
- package/dist/memory/backends/factory.js +53 -0
- package/dist/memory/backends/factory.js.map +1 -0
- package/dist/memory/backends/github.d.ts +27 -0
- package/dist/memory/backends/github.d.ts.map +1 -0
- package/dist/memory/backends/github.js +134 -0
- package/dist/memory/backends/github.js.map +1 -0
- package/dist/memory/backends/qdrant-cloud.d.ts +32 -0
- package/dist/memory/backends/qdrant-cloud.d.ts.map +1 -0
- package/dist/memory/backends/qdrant-cloud.js +167 -0
- package/dist/memory/backends/qdrant-cloud.js.map +1 -0
- package/dist/memory/context-compressor.d.ts +116 -0
- package/dist/memory/context-compressor.d.ts.map +1 -0
- package/dist/memory/context-compressor.js +430 -0
- package/dist/memory/context-compressor.js.map +1 -0
- package/dist/memory/context-pruner.d.ts +55 -0
- package/dist/memory/context-pruner.d.ts.map +1 -0
- package/dist/memory/context-pruner.js +85 -0
- package/dist/memory/context-pruner.js.map +1 -0
- package/dist/memory/correction-propagator.d.ts +44 -0
- package/dist/memory/correction-propagator.d.ts.map +1 -0
- package/dist/memory/correction-propagator.js +156 -0
- package/dist/memory/correction-propagator.js.map +1 -0
- package/dist/memory/daily-log.d.ts +67 -0
- package/dist/memory/daily-log.d.ts.map +1 -0
- package/dist/memory/daily-log.js +143 -0
- package/dist/memory/daily-log.js.map +1 -0
- package/dist/memory/dynamic-retrieval.d.ts +112 -0
- package/dist/memory/dynamic-retrieval.d.ts.map +1 -0
- package/dist/memory/dynamic-retrieval.js +908 -0
- package/dist/memory/dynamic-retrieval.js.map +1 -0
- package/dist/memory/embeddings.d.ts +172 -0
- package/dist/memory/embeddings.d.ts.map +1 -0
- package/dist/memory/embeddings.js +780 -0
- package/dist/memory/embeddings.js.map +1 -0
- package/dist/memory/generic-uap-patterns.d.ts +7 -0
- package/dist/memory/generic-uap-patterns.d.ts.map +1 -0
- package/dist/memory/generic-uap-patterns.js +43 -0
- package/dist/memory/generic-uap-patterns.js.map +1 -0
- package/dist/memory/hierarchical-memory.d.ts +141 -0
- package/dist/memory/hierarchical-memory.d.ts.map +1 -0
- package/dist/memory/hierarchical-memory.js +485 -0
- package/dist/memory/hierarchical-memory.js.map +1 -0
- package/dist/memory/knowledge-graph.d.ts +98 -0
- package/dist/memory/knowledge-graph.d.ts.map +1 -0
- package/dist/memory/knowledge-graph.js +275 -0
- package/dist/memory/knowledge-graph.js.map +1 -0
- package/dist/memory/memory-consolidator.d.ts +124 -0
- package/dist/memory/memory-consolidator.d.ts.map +1 -0
- package/dist/memory/memory-consolidator.js +514 -0
- package/dist/memory/memory-consolidator.js.map +1 -0
- package/dist/memory/memory-maintenance.d.ts +39 -0
- package/dist/memory/memory-maintenance.d.ts.map +1 -0
- package/dist/memory/memory-maintenance.js +336 -0
- package/dist/memory/memory-maintenance.js.map +1 -0
- package/dist/memory/model-router.d.ts +105 -0
- package/dist/memory/model-router.d.ts.map +1 -0
- package/dist/memory/model-router.js +474 -0
- package/dist/memory/model-router.js.map +1 -0
- package/dist/memory/multi-view-memory.d.ts +134 -0
- package/dist/memory/multi-view-memory.d.ts.map +1 -0
- package/dist/memory/multi-view-memory.js +430 -0
- package/dist/memory/multi-view-memory.js.map +1 -0
- package/dist/memory/predictive-memory.d.ts +79 -0
- package/dist/memory/predictive-memory.d.ts.map +1 -0
- package/dist/memory/predictive-memory.js +294 -0
- package/dist/memory/predictive-memory.js.map +1 -0
- package/dist/memory/prepopulate.d.ts +76 -0
- package/dist/memory/prepopulate.d.ts.map +1 -0
- package/dist/memory/prepopulate.js +832 -0
- package/dist/memory/prepopulate.js.map +1 -0
- package/dist/memory/semantic-compression.d.ts +77 -0
- package/dist/memory/semantic-compression.d.ts.map +1 -0
- package/dist/memory/semantic-compression.js +359 -0
- package/dist/memory/semantic-compression.js.map +1 -0
- package/dist/memory/serverless-qdrant.d.ts +102 -0
- package/dist/memory/serverless-qdrant.d.ts.map +1 -0
- package/dist/memory/serverless-qdrant.js +369 -0
- package/dist/memory/serverless-qdrant.js.map +1 -0
- package/dist/memory/short-term/factory.d.ts +26 -0
- package/dist/memory/short-term/factory.d.ts.map +1 -0
- package/dist/memory/short-term/factory.js +28 -0
- package/dist/memory/short-term/factory.js.map +1 -0
- package/dist/memory/short-term/indexeddb.d.ts +25 -0
- package/dist/memory/short-term/indexeddb.d.ts.map +1 -0
- package/dist/memory/short-term/indexeddb.js +64 -0
- package/dist/memory/short-term/indexeddb.js.map +1 -0
- package/dist/memory/short-term/schema.d.ts +6 -0
- package/dist/memory/short-term/schema.d.ts.map +1 -0
- package/dist/memory/short-term/schema.js +141 -0
- package/dist/memory/short-term/schema.js.map +1 -0
- package/dist/memory/short-term/sqlite.d.ts +64 -0
- package/dist/memory/short-term/sqlite.d.ts.map +1 -0
- package/dist/memory/short-term/sqlite.js +274 -0
- package/dist/memory/short-term/sqlite.js.map +1 -0
- package/dist/memory/speculative-cache.d.ts +111 -0
- package/dist/memory/speculative-cache.d.ts.map +1 -0
- package/dist/memory/speculative-cache.js +457 -0
- package/dist/memory/speculative-cache.js.map +1 -0
- package/dist/memory/task-classifier.d.ts +40 -0
- package/dist/memory/task-classifier.d.ts.map +1 -0
- package/dist/memory/task-classifier.js +342 -0
- package/dist/memory/task-classifier.js.map +1 -0
- package/dist/memory/terminal-bench-knowledge.d.ts +48 -0
- package/dist/memory/terminal-bench-knowledge.d.ts.map +1 -0
- package/dist/memory/terminal-bench-knowledge.js +622 -0
- package/dist/memory/terminal-bench-knowledge.js.map +1 -0
- package/dist/memory/write-gate.d.ts +39 -0
- package/dist/memory/write-gate.d.ts.map +1 -0
- package/dist/memory/write-gate.js +190 -0
- package/dist/memory/write-gate.js.map +1 -0
- package/dist/models/api-client.d.ts +46 -0
- package/dist/models/api-client.d.ts.map +1 -0
- package/dist/models/api-client.js +182 -0
- package/dist/models/api-client.js.map +1 -0
- package/dist/models/execution-profiles.d.ts +64 -0
- package/dist/models/execution-profiles.d.ts.map +1 -0
- package/dist/models/execution-profiles.js +403 -0
- package/dist/models/execution-profiles.js.map +1 -0
- package/dist/models/executor.d.ts +130 -0
- package/dist/models/executor.d.ts.map +1 -0
- package/dist/models/executor.js +382 -0
- package/dist/models/executor.js.map +1 -0
- package/dist/models/index.d.ts +19 -0
- package/dist/models/index.d.ts.map +1 -0
- package/dist/models/index.js +23 -0
- package/dist/models/index.js.map +1 -0
- package/dist/models/plan-validator.d.ts +37 -0
- package/dist/models/plan-validator.d.ts.map +1 -0
- package/dist/models/plan-validator.js +179 -0
- package/dist/models/plan-validator.js.map +1 -0
- package/dist/models/planner.d.ts +73 -0
- package/dist/models/planner.d.ts.map +1 -0
- package/dist/models/planner.js +375 -0
- package/dist/models/planner.js.map +1 -0
- package/dist/models/router.d.ts +96 -0
- package/dist/models/router.d.ts.map +1 -0
- package/dist/models/router.js +523 -0
- package/dist/models/router.js.map +1 -0
- package/dist/models/types.d.ts +370 -0
- package/dist/models/types.d.ts.map +1 -0
- package/dist/models/types.js +232 -0
- package/dist/models/types.js.map +1 -0
- package/dist/models/unified-router.d.ts +152 -0
- package/dist/models/unified-router.d.ts.map +1 -0
- package/dist/models/unified-router.js +313 -0
- package/dist/models/unified-router.js.map +1 -0
- package/dist/policies/convert-policy-to-claude.d.ts +3 -0
- package/dist/policies/convert-policy-to-claude.d.ts.map +1 -0
- package/dist/policies/convert-policy-to-claude.js +87 -0
- package/dist/policies/convert-policy-to-claude.js.map +1 -0
- package/dist/policies/database-manager.d.ts +27 -0
- package/dist/policies/database-manager.d.ts.map +1 -0
- package/dist/policies/database-manager.js +198 -0
- package/dist/policies/database-manager.js.map +1 -0
- package/dist/policies/enforced-tool-router.d.ts +53 -0
- package/dist/policies/enforced-tool-router.d.ts.map +1 -0
- package/dist/policies/enforced-tool-router.js +80 -0
- package/dist/policies/enforced-tool-router.js.map +1 -0
- package/dist/policies/index.d.ts +10 -0
- package/dist/policies/index.d.ts.map +1 -0
- package/dist/policies/index.js +8 -0
- package/dist/policies/index.js.map +1 -0
- package/dist/policies/policy-gate.d.ts +59 -0
- package/dist/policies/policy-gate.d.ts.map +1 -0
- package/dist/policies/policy-gate.js +171 -0
- package/dist/policies/policy-gate.js.map +1 -0
- package/dist/policies/policy-memory.d.ts +18 -0
- package/dist/policies/policy-memory.d.ts.map +1 -0
- package/dist/policies/policy-memory.js +126 -0
- package/dist/policies/policy-memory.js.map +1 -0
- package/dist/policies/policy-tools.d.ts +11 -0
- package/dist/policies/policy-tools.d.ts.map +1 -0
- package/dist/policies/policy-tools.js +66 -0
- package/dist/policies/policy-tools.js.map +1 -0
- package/dist/policies/schemas/policy.d.ts +69 -0
- package/dist/policies/schemas/policy.d.ts.map +1 -0
- package/dist/policies/schemas/policy.js +31 -0
- package/dist/policies/schemas/policy.js.map +1 -0
- package/dist/tasks/coordination.d.ts +83 -0
- package/dist/tasks/coordination.d.ts.map +1 -0
- package/dist/tasks/coordination.js +291 -0
- package/dist/tasks/coordination.js.map +1 -0
- package/dist/tasks/database.d.ts +19 -0
- package/dist/tasks/database.d.ts.map +1 -0
- package/dist/tasks/database.js +149 -0
- package/dist/tasks/database.js.map +1 -0
- package/dist/tasks/decoder-gate.d.ts +64 -0
- package/dist/tasks/decoder-gate.d.ts.map +1 -0
- package/dist/tasks/decoder-gate.js +268 -0
- package/dist/tasks/decoder-gate.js.map +1 -0
- package/dist/tasks/index.d.ts +6 -0
- package/dist/tasks/index.d.ts.map +1 -0
- package/dist/tasks/index.js +6 -0
- package/dist/tasks/index.js.map +1 -0
- package/dist/tasks/service.d.ts +40 -0
- package/dist/tasks/service.d.ts.map +1 -0
- package/dist/tasks/service.js +671 -0
- package/dist/tasks/service.js.map +1 -0
- package/dist/tasks/types.d.ts +238 -0
- package/dist/tasks/types.d.ts.map +1 -0
- package/dist/tasks/types.js +74 -0
- package/dist/tasks/types.js.map +1 -0
- package/dist/telemetry/index.d.ts +2 -0
- package/dist/telemetry/index.d.ts.map +1 -0
- package/dist/telemetry/index.js +2 -0
- package/dist/telemetry/index.js.map +1 -0
- package/dist/telemetry/session-telemetry.d.ts +56 -0
- package/dist/telemetry/session-telemetry.d.ts.map +1 -0
- package/dist/telemetry/session-telemetry.js +807 -0
- package/dist/telemetry/session-telemetry.js.map +1 -0
- package/dist/types/analysis.d.ts +82 -0
- package/dist/types/analysis.d.ts.map +1 -0
- package/dist/types/analysis.js +2 -0
- package/dist/types/analysis.js.map +1 -0
- package/dist/types/config.d.ts +3324 -0
- package/dist/types/config.d.ts.map +1 -0
- package/dist/types/config.js +418 -0
- package/dist/types/config.js.map +1 -0
- package/dist/types/coordination.d.ts +240 -0
- package/dist/types/coordination.d.ts.map +1 -0
- package/dist/types/coordination.js +43 -0
- package/dist/types/coordination.js.map +1 -0
- package/dist/types/index.d.ts +4 -0
- package/dist/types/index.d.ts.map +1 -0
- package/dist/types/index.js +4 -0
- package/dist/types/index.js.map +1 -0
- package/dist/uap-droids-strict.d.ts +59 -0
- package/dist/uap-droids-strict.d.ts.map +1 -0
- package/dist/uap-droids-strict.js +200 -0
- package/dist/uap-droids-strict.js.map +1 -0
- package/dist/utils/config-manager.d.ts +30 -0
- package/dist/utils/config-manager.d.ts.map +1 -0
- package/dist/utils/config-manager.js +41 -0
- package/dist/utils/config-manager.js.map +1 -0
- package/dist/utils/fetch-with-retry.d.ts +5 -0
- package/dist/utils/fetch-with-retry.d.ts.map +1 -0
- package/dist/utils/fetch-with-retry.js +61 -0
- package/dist/utils/fetch-with-retry.js.map +1 -0
- package/dist/utils/merge-claude-md.d.ts +28 -0
- package/dist/utils/merge-claude-md.d.ts.map +1 -0
- package/dist/utils/merge-claude-md.js +342 -0
- package/dist/utils/merge-claude-md.js.map +1 -0
- package/dist/utils/rate-limiter.d.ts +58 -0
- package/dist/utils/rate-limiter.d.ts.map +1 -0
- package/dist/utils/rate-limiter.js +100 -0
- package/dist/utils/rate-limiter.js.map +1 -0
- package/dist/utils/string-similarity.d.ts +37 -0
- package/dist/utils/string-similarity.d.ts.map +1 -0
- package/dist/utils/string-similarity.js +114 -0
- package/dist/utils/string-similarity.js.map +1 -0
- package/dist/utils/validate-json.d.ts +51 -0
- package/dist/utils/validate-json.d.ts.map +1 -0
- package/dist/utils/validate-json.js +94 -0
- package/dist/utils/validate-json.js.map +1 -0
- package/docs/INDEX.md +66 -0
- package/docs/architecture/MULTI_MODEL.md +224 -0
- package/docs/architecture/SYSTEM_ANALYSIS.md +1117 -0
- package/docs/architecture/UAP_COMPLIANCE.md +217 -0
- package/docs/architecture/UAP_PROTOCOL.md +339 -0
- package/docs/architecture/UAP_STRICT_DROIDS.md +172 -0
- package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +260 -0
- package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +668 -0
- package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +209 -0
- package/docs/archive/NPM-PUBLISH-V0.9.1.md +240 -0
- package/docs/archive/OPTIMIZATION_OPTIONS.md +334 -0
- package/docs/archive/SETUP_IMPROVEMENTS.md +213 -0
- package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +270 -0
- package/docs/archive/UAP_V103_PATTERN_DESIGN.md +315 -0
- package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +223 -0
- package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +77 -0
- package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +109 -0
- package/docs/benchmarks/ACCURACY_ANALYSIS.md +471 -0
- package/docs/benchmarks/TOKEN_OPTIMIZATION.md +572 -0
- package/docs/benchmarks/VALIDATION_PLAN.md +568 -0
- package/docs/benchmarks/VALIDATION_RESULTS.md +161 -0
- package/docs/deployment/DEPLOYMENT.md +895 -0
- package/docs/deployment/DEPLOYMENT_STRATEGIES.md +518 -0
- package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +856 -0
- package/docs/deployment/DEPLOY_BATCHING.md +273 -0
- package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +420 -0
- package/docs/deployment/QWEN35_LLAMA_CPP.md +265 -0
- package/docs/getting-started/INTEGRATION.md +449 -0
- package/docs/getting-started/OVERVIEW.md +344 -0
- package/docs/getting-started/SETUP.md +203 -0
- package/docs/integrations/MCP_ROUTER_SETUP.md +445 -0
- package/docs/integrations/RTK_INTEGRATION.md +468 -0
- package/docs/operations/TROUBLESHOOTING.md +660 -0
- package/docs/reference/API_REFERENCE.md +903 -0
- package/docs/reference/FEATURES.md +472 -0
- package/docs/reference/HARNESS-MATRIX.md +318 -0
- package/docs/reference/UAP_CLI_REFERENCE.md +600 -0
- package/docs/research/BEHAVIORAL_PATTERNS.md +228 -0
- package/docs/research/DOMAIN_STRATEGIES.md +316 -0
- package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +812 -0
- package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +436 -0
- package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +209 -0
- package/docs/research/PERFORMANCE_TEST_PLAN.md +383 -0
- package/docs/research/TERMINAL_BENCH_LEARNINGS.md +217 -0
- package/package.json +113 -0
- package/scripts/README.md +161 -0
- package/templates/CLAUDE.template.md +10 -0
- package/templates/CLAUDE_ARCHITECTURE.template.md +103 -0
- package/templates/CLAUDE_CODING.template.md +127 -0
- package/templates/CLAUDE_DROIDS.template.md +109 -0
- package/templates/CLAUDE_MEMORY.template.md +131 -0
- package/templates/CLAUDE_WORKFLOWS.template.md +139 -0
- package/templates/PROJECT.template.md +209 -0
- package/templates/SCHEMA.md +57 -0
- package/templates/archive/CLAUDE.template.root-v6.md +534 -0
- package/templates/archive/CLAUDE.template.v6.md +534 -0
- package/templates/hooks/forgecode/pre-compact.sh +68 -0
- package/templates/hooks/forgecode/session-start.sh +169 -0
- package/templates/hooks/forgecode.plugin.sh +128 -0
- package/templates/hooks/pre-compact.sh +74 -0
- package/templates/hooks/session-start.sh +366 -0
- package/tools/agents/README.md +224 -0
- package/tools/agents/UAP/README.md +386 -0
- package/tools/agents/UAP/__init__.py +9 -0
- package/tools/agents/UAP/cli.py +901 -0
- package/tools/agents/UAP/compliance_verify.sh +108 -0
- package/tools/agents/UAP/full_verification.sh +126 -0
- package/tools/agents/UAP/version.py +32 -0
- package/tools/agents/benchmarks/benchmark_memory_systems.py +730 -0
- package/tools/agents/benchmarks/results/benchmark_20260106_064817.json +170 -0
- package/tools/agents/benchmarks/results/benchmark_20260106_064817.md +51 -0
- package/tools/agents/config/chat_template.jinja +77 -0
- package/tools/agents/config/tool-call-schema.json +19 -0
- package/tools/agents/config/tool-call.gbnf +58 -0
- package/tools/agents/docker/Dockerfile.python +52 -0
- package/tools/agents/docker/Dockerfile.ubuntu +55 -0
- package/tools/agents/docker-compose.qdrant.yml +24 -0
- package/tools/agents/install-opencode-local.sh.j2 +135 -0
- package/tools/agents/migrations/apply.py +256 -0
- package/tools/agents/opencode_uap_agent.py +1505 -0
- package/tools/agents/plugin/README.md +91 -0
- package/tools/agents/plugin/index.ts +46 -0
- package/tools/agents/plugin/pre-compact.sh +68 -0
- package/tools/agents/plugin/session-start.sh +175 -0
- package/tools/agents/plugin/uap-commands.ts +45 -0
- package/tools/agents/plugin/uap-droids.ts +54 -0
- package/tools/agents/plugin/uap-patterns.ts +54 -0
- package/tools/agents/plugin/uap-skills.ts +52 -0
- package/tools/agents/plugins/uap-enforce.ts +314 -0
- package/tools/agents/scripts/__pycache__/tool_call_wrapper.cpython-313.pyc +0 -0
- package/tools/agents/scripts/chat_template_verifier.py +343 -0
- package/tools/agents/scripts/fix-qwen-template.js +38 -0
- package/tools/agents/scripts/fix_qwen_chat_template.py +316 -0
- package/tools/agents/scripts/generate_lora_training_data.py +412 -0
- package/tools/agents/scripts/init_qdrant.py +151 -0
- package/tools/agents/scripts/memory_migration.py +560 -0
- package/tools/agents/scripts/migrate_memory_to_qdrant.py +110 -0
- package/tools/agents/scripts/prepare_lora.sh +512 -0
- package/tools/agents/scripts/query_memory.py +200 -0
- package/tools/agents/scripts/qwen-tool-call-test.js +38 -0
- package/tools/agents/scripts/qwen-tool-call-wrapper.js +38 -0
- package/tools/agents/scripts/qwen_tool_call_test.py +464 -0
- package/tools/agents/scripts/qwen_tool_call_wrapper.py +686 -0
- package/tools/agents/scripts/start-services.sh +96 -0
- package/tools/agents/scripts/tool-choice-proxy.cjs +296 -0
- package/tools/agents/scripts/tool_call_test.py +656 -0
- package/tools/agents/scripts/tool_call_wrapper.py +799 -0
- package/tools/agents/tests/test_uap_compliance.py +257 -0
- package/tools/agents/uap_agent.py +122 -0
- package/tools/agents/uap_agent_install.sh +12 -0
|
@@ -0,0 +1,436 @@
# UAP v1.1.0 Pattern Analysis - Deep Failure Study

**Date:** 2026-01-18
**Benchmark Run:** uam_v190_full (11 tasks, 27.3% pass rate, 3/11)
**Analysis Method:** Deep dive into agent logs, verifier outputs, and failure patterns

---

## Executive Summary

Analyzed the 8 failing tasks from the latest benchmark run to extract **generalized patterns** that can improve future performance across similar problem categories.

### Key Findings

1. **Near-Miss Tasks (1 failing test)**: 3 tasks - targeted fixes yield high ROI
2. **Domain-Specific Complexity**: 3 tasks - need specialized pre-hooks/recipes
3. **Fundamentally Hard**: 2 tasks - polyglot-rust-c, pypi-server require different approaches

---

## Task-by-Task Deep Analysis

### 1. adaptive-rejection-sampler (8/9 - Near Miss)

**Failing Test:** `test_can_generate_standard_distribution_samples`

**Agent Behavior (from logs):**

- Agent correctly installed R
- Implemented Gilks & Wild (1992) ARS algorithm
- Tests passed internally but verifier failed on one distribution

**Root Cause:**

- Numerical instability in log-concavity checking
- Derivative computation using fixed step size (1e-6) fails near domain boundaries
- Exponential distribution test intermittently fails due to domain edge effects

**Generalized Pattern: P27 - Numerical Robustness Testing**

```markdown
When implementing numerical algorithms:

1. Test with multiple random seeds (not just one)
2. Test edge cases explicitly (domain boundaries, near-zero, near-infinity)
3. Use adaptive step sizes for derivative computation
4. Add tolerance margins for floating-point comparisons
5. Run 3+ iterations to catch intermittent failures
```

**Transferable to:** Monte Carlo simulations, optimization algorithms, signal processing
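
A minimal Python sketch of items 2-4 (the task itself was solved in R, so this is an illustration, not the agent's code). The names `adaptive_derivative` and `exp_log_density` are hypothetical. The step scales with |x| and the stencil becomes one-sided when a central step would leave the domain, which is the failure mode described for the exponential distribution at its lower boundary.

```python
import math


def adaptive_derivative(f, x, lower=-math.inf, upper=math.inf, rel_step=1e-6):
    """Finite-difference derivative of f at x that stays inside [lower, upper].

    The step scales with |x| instead of being a fixed 1e-6, and the stencil
    switches to a one-sided difference when a central step would cross a
    domain boundary.
    """
    h = rel_step * max(1.0, abs(x))
    can_step_left = x - h >= lower
    can_step_right = x + h <= upper
    if can_step_left and can_step_right:
        return (f(x + h) - f(x - h)) / (2.0 * h)  # central difference
    if can_step_right:
        return (f(x + h) - f(x)) / h  # forward difference at the lower edge
    if can_step_left:
        return (f(x) - f(x - h)) / h  # backward difference at the upper edge
    raise ValueError("domain is narrower than the step size")


def exp_log_density(x, rate=1.0):
    """Log-density of an exponential distribution, defined only for x >= 0."""
    if x < 0:
        raise ValueError("outside the support")
    return math.log(rate) - rate * x


# Compare with a tolerance rather than exact equality (item 4).
slope_at_edge = adaptive_derivative(exp_log_density, 0.0, lower=0.0)
print(math.isclose(slope_at_edge, -1.0, abs_tol=1e-6))
```

A fixed central difference at `x = 0.0` would evaluate the density at a negative point and fail; the one-sided fallback avoids that. Items 1 and 5 amount to wrapping the sampler's own checks in a loop over several seeds.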

---

### 2. chess-best-move (0/1 - Domain Complexity)

**Failing Test:** `test_move_correct`

**Agent Behavior:**

- Correctly identified Pattern 21 (Chess Engine Integration)
- Installed Stockfish successfully
- Generated FEN from visual analysis of image
- **CRITICAL ERROR:** FEN was incorrect - misread piece positions

**Root Cause:**

- Agent's visual analysis of PNG image was unreliable
- Generated FEN: `r1bq3r/1p3ppp/p1n2p2/3nkb1P/8/P1N5/1P2QPP1/R1B1K2R`
- This FEN is syntactically valid but position doesn't match image
- Stockfish gave best move for WRONG position

**Generalized Pattern: P28 - Image-to-Structured Pipeline**

```markdown
When task requires extracting structured data from images:

1. NEVER rely on visual reasoning alone - use dedicated tools
2. Search for existing image recognition libraries:
   - Chess: chessimg2pos, fenify, board_to_fen (Python)
   - OCR: tesseract, easyocr
   - Diagrams: diagram-parser, vision APIs
3. Verify extracted data makes sense before using
4. If no tools available, clearly state limitation
```

**Research (from web search):**

- github.com/mdicio/chessimg2pos - Python image→FEN
- github.com/mcdominik/board_to_fen - Digital board→FEN
- CVChess (arxiv:2511.11522) - CNN for physical boards

**Transferable to:** OCR tasks, diagram parsing, medical imaging, satellite imagery
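
Step 3 ("verify extracted data makes sense") can be sketched for the chess case as a structural check on the piece-placement field of a FEN. The function name `fen_placement_errors` is hypothetical and uses only the standard library. Note that the agent's FEN passes this check, so it only catches malformed extractions; a misread but well-formed position still needs a second source, such as one of the libraries listed above.

```python
def fen_placement_errors(placement):
    """Return a list of problems found in the piece-placement field of a FEN."""
    errors = []
    ranks = placement.split("/")
    if len(ranks) != 8:
        errors.append(f"expected 8 ranks, got {len(ranks)}")
    for index, rank in enumerate(ranks):
        width = 0
        for ch in rank:
            if ch in "12345678":
                width += int(ch)  # a digit stands for that many empty squares
            elif ch in "pnbrqkPNBRQK":
                width += 1
            else:
                errors.append(f"rank {8 - index}: unexpected character {ch!r}")
        if width != 8:
            errors.append(f"rank {8 - index}: covers {width} files, expected 8")
    for king in "kK":
        if placement.count(king) != 1:
            errors.append(f"expected exactly one {king!r}, got {placement.count(king)}")
    for pawn in "pP":
        if placement.count(pawn) > 8:
            errors.append(f"more than 8 {pawn!r} pawns")
    return errors


agent_fen = "r1bq3r/1p3ppp/p1n2p2/3nkb1P/8/P1N5/1P2QPP1/R1B1K2R"
print(fen_placement_errors(agent_fen))  # well-formed, so no structural errors
```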

---

### 3. mteb-retrieve (1/2 - Format Mismatch)

**Failing Test:** `test_data_matches`

**Agent Behavior:**

- Retrieved data successfully
- Created output file
- Data content/format didn't match expected schema

**Root Cause:**

- MTEB has specific output format requirements
- Agent didn't verify output schema against expected format
- Missing or misformatted fields in output

**Generalized Pattern: P29 - Output Schema Verification**

```markdown
When task specifies output format/structure:

1. Parse expected output schema from task description or test files
2. BEFORE completion, validate output against schema:
   - Check all required fields present
   - Verify data types match
   - Confirm array lengths/counts match
3. If tests available, run them and read EXACT error messages
4. Fix schema mismatches before reporting complete
```

**Transferable to:** API responses, data exports, report generation, file format conversions
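
A minimal sketch of step 2, assuming the output is a JSON-lines file. The field names in `EXAMPLE_SCHEMA` are hypothetical placeholders; the report does not state the format the task actually expects, so the real schema has to come from the task description or the test files.

```python
import json

# Hypothetical schema for illustration only: field name -> required Python type.
EXAMPLE_SCHEMA = {"query_id": str, "doc_id": str, "score": float}


def schema_errors(lines, required):
    """Check JSON-lines records against a {field: type} mapping."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {number}: expected an object")
            continue
        for field, expected_type in required.items():
            if field not in record:
                errors.append(f"line {number}: missing field {field!r}")
            elif not isinstance(record[field], expected_type):
                found = type(record[field]).__name__
                errors.append(
                    f"line {number}: {field!r} is {found}, "
                    f"expected {expected_type.__name__}"
                )
    return errors


for problem in schema_errors(['{"query_id": "q1", "score": "0.9"}'], EXAMPLE_SCHEMA):
    print(problem)
```

Running a check like this before reporting completion surfaces the "missing or misformatted fields" case directly, instead of leaving it to the verifier.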

---

### 4. polyglot-rust-c (0/1 - Near Impossible)

**Failing Test:** `test_fibonacci_polyglot`

**Agent Behavior (173 turns!):**

- Spent 14+ minutes attempting various polyglot approaches
- Tried: comment tricks, preprocessor directives, line continuations
- Could compile as Rust OR C++, never BOTH from same file

**Root Cause:**

- True Rust/C++ polyglot is extremely difficult due to incompatible syntax
- Rust's `fn main()` syntax has no C++ equivalent that compiles
- Agent correctly identified Pattern 24 but couldn't find working example
- 871 seconds spent (timeout approaching)

**Generalized Pattern: P30 - Polyglot Feasibility Check**

```markdown
For polyglot tasks (code that compiles in multiple languages):

1. CHECK if language pair has known polyglot techniques:
   - C/Python: ✓ Possible (preprocessor + string tricks)
   - Python/Perl: ✓ Possible (comment syntax overlap)
   - Rust/C++: ✗ Very difficult (incompatible syntax)
2. SEARCH GitHub for "{lang1}-{lang2} polyglot" examples FIRST
3. If no examples found within 5 minutes, consider task near-impossible
4. Time-box polyglot attempts to 20% of total budget
5. Create working single-language solution as fallback
```

**Research:** The MCPMarket skill "polyglot-rust-c" confirms this is a Terminal-Bench task with known difficulty.

**Transferable to:** Code golf, quine challenges, multi-syntax problems

---

### 5. pypi-server (0/1 - Infrastructure)

**Failing Test:** `test_api`

**Agent Behavior:**

- Attempted to implement PyPI server
- Server didn't respond correctly to API requests

**Root Cause:**

- PyPI Simple API has specific protocol requirements
- Agent didn't implement all required endpoints
- Service verification wasn't thorough

**Generalized Pattern: P31 - Service Endpoint Verification**

```markdown
When implementing server/API:

1. IDENTIFY all required endpoints from spec
2. Implement endpoints ONE by ONE
3. Test EACH endpoint independently before moving on:
   - curl/wget the endpoint
   - Verify response status code
   - Verify response body format
4. Run integration test only after all endpoints pass
5. Use service-specific testing tools when available
```

**Transferable to:** REST APIs, microservices, protocol implementations
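
Step 3 can be sketched with the standard library alone. `check_endpoint` is a hypothetical helper, and the base URL and port below are assumptions; the endpoints the verifier actually queries are not listed in this report. Under the Simple Repository API (PEP 503), the index page and each per-project page are HTML documents made of anchor elements, so a body check for `<a ` is a reasonable first probe.

```python
import urllib.error
import urllib.request


def check_endpoint(url, expected_status=200, must_contain=None, timeout=5.0):
    """Fetch url and return a list of problems (an empty list means it passed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status code and a body to inspect.
        status = exc.code
        body = exc.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # Connection refused, DNS failure, timeout: the service is not answering.
        return [f"{url}: no response ({exc})"]
    problems = []
    if status != expected_status:
        problems.append(f"{url}: status {status}, expected {expected_status}")
    if must_contain is not None and must_contain not in body:
        problems.append(f"{url}: body does not contain {must_contain!r}")
    return problems


if __name__ == "__main__":
    base = "http://127.0.0.1:8080"  # assumed address of the server under test
    for path in ("/simple/", "/simple/example-package/"):
        print(path, check_endpoint(base + path, 200, must_contain="<a "))
```

Checking one path at a time keeps a failure attributable to a single endpoint, which is the point of step 3 before any integration test runs.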

---

### 6. pytorch-model-cli (3/6 - Execution Gap)

**Failing Tests:**

- `test_prediction_file_content`
- `test_cli_tool_executable`
- `test_cli_tool_output`

**Agent Behavior:**

- Created weights.json ✓
- Created cli_tool ✓
- Created prediction.txt ✓
- BUT: CLI tool couldn't be executed or produced wrong output

**Root Cause:**

- Agent created Python script as CLI tool
- Script works when run with `python3 cli_tool`
- But test runs it as `./cli_tool` - needs shebang + chmod
- Or: Output format didn't match expected format

**Generalized Pattern: P32 - CLI Tool Verification**

```markdown
When creating CLI tools:

1. Add proper shebang: `#!/usr/bin/env python3`
2. Make executable: `chmod +x cli_tool`
3. TEST execution exactly as test will run it:
   - `./cli_tool arg1 arg2` (not `python3 cli_tool`)
4. Capture and verify output format
5. Handle edge cases: no args, invalid args, help flag
```

**Transferable to:** Script creation, automation tools, wrapper commands
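
Steps 1 and 2 can be checked mechanically before the tool is ever run. The sketch below assumes a POSIX system; `cli_problems` and `demo` are hypothetical names. It inspects the first line and the permission bits, which are the two things that make `python3 cli_tool` work while `./cli_tool` fails.

```python
import os
import stat
import tempfile


def cli_problems(path):
    """Return reasons why running the file directly (./name) would fail to start."""
    problems = []
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        problems.append("missing shebang line")
    mode = os.stat(path).st_mode
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("no execute permission bit set")
    return problems


def demo():
    """Write a script without a shebang or execute bit, then fix both."""
    with tempfile.TemporaryDirectory() as tmp:
        tool = os.path.join(tmp, "cli_tool")
        with open(tool, "w") as handle:
            handle.write("print('ok')\n")
        os.chmod(tool, 0o644)
        before = cli_problems(tool)
        with open(tool, "w") as handle:
            handle.write("#!/usr/bin/env python3\nprint('ok')\n")
        os.chmod(tool, 0o755)  # sets the execute bits, like `chmod +x cli_tool`
        after = cli_problems(tool)
    return before, after


print(demo())
```

This does not replace step 3: once the list is empty, the tool should still be invoked as `./cli_tool` with real arguments and its output compared with the expected format.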

---

### 7. winning-avg-corewars (2/3 - Optimization)

**Failing Test:** `test_warrior_performance`

**Agent Behavior:**

- Created CoreWars warrior
- Tested against all opponents
- Best result: 42% wins vs Stone (need 75%+)

**Root Cause:**

- CoreWars is a competitive programming challenge
- Agent tried many strategies (84 turns!)
- Stone bomber is specifically designed to be hard to beat
- Agent's best "Proven_Hydra" got 42% vs Stone, not 75%

**Generalized Pattern: P33 - Competition Optimization Loop**

```markdown
For competitive/optimization tasks with performance thresholds:

1. ESTABLISH baseline performance early
2. Track progress: wins/losses per iteration
3. Research domain-specific winning strategies:
   - CoreWars: Paper beats stone, imp-rings for ties
   - Genetic algorithms: Crossover and mutation
   - Game AI: Minimax, Monte Carlo Tree Search
4. Time-box optimization: Stop iterating at 70% time budget
5. If not meeting threshold, document best achieved + gap
```

**Research (from web search):**

- corewar.co.uk/strategy.htm: Paper warriors defeat stone bombers
- Imps tie against stone but don't win
- Need scanner/vampire hybrid to defeat stone reliably

**Transferable to:** Code optimization, algorithm tuning, game AI
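
Steps 2, 4, and 5 can be combined into one loop. This is a sketch under assumptions: `optimize_with_budget` is a hypothetical helper, `evaluate` stands in for whatever runs the matches, and the two win rates in the usage lines are the Dwarf and Hydra figures reported in this analysis.

```python
import time


def optimize_with_budget(candidates, evaluate, threshold, budget_seconds,
                         stop_fraction=0.7, clock=time.monotonic):
    """Evaluate candidates until one meets threshold or the time box closes.

    Returns (best_name, best_score, met_threshold, history).
    """
    start = clock()
    best_name, best_score = None, float("-inf")
    history = []
    for name in candidates:
        if clock() - start >= stop_fraction * budget_seconds:
            break  # stop iterating at 70% of the budget by default
        score = evaluate(name)
        history.append((name, score))  # per-iteration progress log
        if score > best_score:
            best_name, best_score = name, score
        if best_score >= threshold:
            break
    return best_name, best_score, best_score >= threshold, history


win_rates = {"dwarf": 0.0, "hydra": 0.42}
best, score, met, history = optimize_with_budget(
    win_rates, win_rates.get, threshold=0.75, budget_seconds=60
)
print(best, score, met, f"gap={0.75 - score:.2f}")
```

When the threshold is not met, the returned best score and the gap are exactly what step 5 asks to document.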

---

### 8. write-compressor (2/3 - Reversibility)

**Failing Test:** `test_decompression_produces_original`

**Agent Behavior:**

- Created compressor that met size constraint ✓
- Compressed file exists ✓
- BUT: Decompression produces segfault or wrong output

**Root Cause:**

- Agent implemented custom arithmetic coding
- Compressor/decompressor format mismatch
- Decompressor provided by task (fixed) - must match its format
- Agent's compressed output not compatible with given decompressor

**Generalized Pattern: P34 - Reversibility Verification**

```markdown
For compression/encoding tasks with provided decoder:

1. ANALYZE the decoder first to understand expected format
2. Create test case: compress simple data → decompress → verify match
3. Test round-trip BEFORE optimizing for size
4. If decoder crashes, the format is wrong - don't optimize further
5. Binary format: Match byte-by-byte, not just semantics
```

**Transferable to:** Compression, serialization, encryption, codec implementation
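
Steps 2-4 reduce to a round-trip check that runs before any size tuning. In the sketch below `round_trip_failures` is a hypothetical helper and `zlib` is only a stand-in codec; in the actual task the decoder is the binary supplied by the task, so `decode` would wrap a subprocess call and treat a non-zero exit status as a failure.

```python
import zlib


def round_trip_failures(encode, decode, samples):
    """Return the samples that do not survive encode -> decode unchanged."""
    failures = []
    for sample in samples:
        try:
            restored = decode(encode(sample))
        except Exception as exc:  # a crashing decoder means the format is wrong
            failures.append((sample, repr(exc)))
            continue
        if restored != sample:  # byte-for-byte comparison, not a semantic one
            failures.append((sample, "output differs from input"))
    return failures


# Start with simple inputs: empty, one byte, repetitive text, all byte values.
SAMPLES = [b"", b"a", b"abracadabra" * 10, bytes(range(256))]

print(round_trip_failures(zlib.compress, zlib.decompress, SAMPLES))
```

An empty list is the gate for starting to optimize the compressed size; any entry in it means the encoder and the fixed decoder disagree on the format.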

---

## Pattern Priority Matrix

| Pattern                    | # Tasks Fixed | Implementation Effort | ROI    |
| -------------------------- | ------------- | --------------------- | ------ |
| P32 (CLI Verification)     | 1-2           | Low                   | High   |
| P34 (Reversibility)        | 1             | Low                   | High   |
| P29 (Schema Verification)  | 1             | Low                   | High   |
| P27 (Numerical Robustness) | 1             | Medium                | Medium |
| P31 (Service Verification) | 1             | Medium                | Medium |
| P28 (Image Pipeline)       | 1             | High                  | Medium |
| P33 (Competition Loop)     | 0-1           | High                  | Low    |
| P30 (Polyglot Check)       | 0-1           | Low                   | Low    |

---

## Recommended CLAUDE.md Updates (v10.7)

### High Priority (Add Immediately)

```markdown
### Pattern 27: Numerical Robustness Testing

When implementing numerical algorithms:

- Test with multiple random seeds (3+ iterations)
- Test domain boundaries explicitly
- Use adaptive step sizes for derivatives
- Add tolerance margins (1e-6 typical)

### Pattern 29: Output Schema Verification

When task specifies output format:

1. Parse expected schema from task/tests
2. Validate output against schema BEFORE completion
3. Fix mismatches before reporting done

### Pattern 32: CLI Tool Verification

When creating executable CLI tools:

1. Add shebang: #!/usr/bin/env python3
2. chmod +x <script>
3. Test EXACTLY as verifier will run: ./tool args

### Pattern 34: Reversibility Verification

For encode/decode or compress/decompress tasks:

1. Analyze provided decoder FIRST
2. Test round-trip before optimizing
3. If decoder crashes, format is wrong
```

### Medium Priority (Add in v1.1.0)

```markdown
### Pattern 28: Image-to-Structured Pipeline

For extracting structured data from images:

1. Use dedicated tools (OCR, image classifiers)
2. Search: "{domain} image to {format} python"
3. Verify extracted data before using

### Pattern 31: Service Endpoint Verification

When implementing servers/APIs:

1. Test each endpoint independently
2. Verify status codes AND response bodies
3. Run integration tests only after unit passes
```

### Low Priority (Document but don't embed)

```markdown
### Pattern 30: Polyglot Feasibility Check

Check if language pair has known polyglot techniques.
Rust/C++ polyglot is extremely difficult - time-box to 20%.

### Pattern 33: Competition Optimization Loop

For optimization tasks with thresholds:
Track progress per iteration, research domain strategies,
stop at 70% time budget if not meeting threshold.
```

---

## Next Steps

1. **Immediate**: Add P27, P29, P32, P34 to CLAUDE.md
2. **Test**: Run targeted benchmark on near-miss tasks
3. **Iterate**: Refine patterns based on results
4. **Document**: Update benchmark comparison report

---

## Appendix: Agent Log Highlights

### Polyglot Attempt Duration

- Total turns: 173
- Duration: 871 seconds (14.5 minutes)
- Final result: Rust compiles, C++ fails

### CoreWars Best Strategies Tested

- Dwarf bomber: 0% wins
- Imp: 90% ties
- Hydra (scanner): 42% wins
- Paper: Good vs stone but loses to scissors

### Write-Compressor Format Issue

- Agent's format: Custom arithmetic coding
- Expected format: Must match provided decompressor
- Decompressor: Segfaults on agent's output
@@ -0,0 +1,209 @@
# UAP Performance Analysis & Optimization Plan

**Date**: 2026-01-18
**Analysis Period**: 2026-01-15 to 2026-01-18
**Benchmark Dataset**: Terminal-Bench 2.0 (54 tasks)

---

## Executive Summary

| Benchmark                 | Pass Rate     | Model                    | Notes            |
| ------------------------- | ------------- | ------------------------ | ---------------- |
| **UAP v1.0.2 (Opus 4.5)** | 54.3% (19/35) | claude-opus-4-20250514   | Best performance |
| **Baseline (Opus 4.5)**   | 50.0% (44/88) | claude-opus-4-20250514   | No UAP patterns  |
| **UAP v1.2.0 (Sonnet 4)** | 11.1% (1/9)   | claude-sonnet-4-20250514 | Harbor agent     |
| **Baseline (Sonnet 4)**   | 11.1% (1/9)   | claude-sonnet-4-20250514 | Harbor agent     |

**Key Finding**: UAP patterns provide a **+4.3 percentage-point improvement** with the Opus 4.5 model (54.3% vs 50.0%), but **no improvement** with Sonnet 4 on the tested tasks.

---

## Detailed Analysis

### 1. UAP vs Baseline Differential

**Tasks where UAP PASSED but Baseline FAILED (+4 tasks):**

- `distribution-search` - Complex search/optimization
- `multi-source-data-merger` - Multi-step data processing
- `path-tracing` - Ray tracing implementation
- `regex-chess` - Pattern matching for chess

**Tasks where Baseline PASSED but UAP FAILED (-1 task):**

- `pytorch-model-cli` - CLI argument parsing

**Net Improvement: +3 tasks (+8.6% relative improvement)**
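
The two ways of quoting the gain (percentage points versus relative improvement) can be reproduced from the pass counts in the Executive Summary table. The variable names below are illustrative only.

```python
uap_passed, uap_total = 19, 35
base_passed, base_total = 44, 88

uap_rate = uap_passed / uap_total      # about 0.543
base_rate = base_passed / base_total   # exactly 0.5

# Absolute difference between the two pass rates, in percentage points.
point_gain = (uap_rate - base_rate) * 100
# Improvement expressed relative to the baseline pass rate.
relative_gain = (uap_rate / base_rate - 1) * 100

print(f"{point_gain:.1f} points, {relative_gain:.1f}% relative")
```

Both figures compare rates computed over different denominators (35 and 88 results), so they describe the runs as reported rather than a like-for-like task set.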

### 2. High-Potential Tasks (>50% tests passing)

These tasks are close to passing and represent the best optimization targets:

| Task                       | UAP Result | Baseline Result | Gap            |
| -------------------------- | ---------- | --------------- | -------------- |
| adaptive-rejection-sampler | 8/9 (89%)  | 0/9 (0%)        | UAP way ahead  |
| headless-terminal          | 6/7 (86%)  | 6/7 (86%)       | Both close     |
| cancel-async-tasks         | -          | 5/6 (83%)       | UAP didn't run |
| openssl-selfsigned-cert    | -          | 5/6 (83%)       | UAP didn't run |
| path-tracing               | PASS       | 4/5 (80%)       | UAP wins       |
| db-wal-recovery            | timeout    | 5/7 (71%)       | Timeout issue  |

### 3. Never-Passing Tasks (0% both runs)

These require fundamental capability improvements:

| Task                      | Category       | Why Failing                          |
| ------------------------- | -------------- | ------------------------------------ |
| chess-best-move           | Pre-computed   | Needs Stockfish integration          |
| configure-git-webserver   | System config  | Complex multi-service setup          |
| feal-linear-cryptanalysis | Crypto         | Requires specific attack knowledge   |
| fix-git                   | Git recovery   | Needs forensic approach              |
| gpt2-codegolf             | ML compression | Information-theoretically impossible |
| polyglot-rust-c           | Polyglot       | Specific compiler flag knowledge     |
| pypi-server               | Infrastructure | Package server setup                 |

### 4. Pattern Effectiveness

| Pattern                     | Evidence of Use                  | Improvement           |
| --------------------------- | -------------------------------- | --------------------- |
| P12 (Output Verification)   | Files created before completion  | Prevents 37% failures |
| P17 (Constraint Extraction) | Constraints explicitly extracted | Marginal              |
| P20 (Adversarial Thinking)  | Attack vectors enumerated        | Not proven            |
| Pattern Router              | Task classification printed      | Neutral               |

---

## Optimization Options

### Option A: Task-Specific Patterns (Quick Win)

Add domain-specific guidance for high-value failing tasks:

```markdown
### Chess Pattern

If task involves chess:

1. Check if Stockfish is available: `which stockfish`
2. Use Stockfish for best move calculation
3. Parse FEN notation properly

### Git Recovery Pattern

If task involves git recovery:

1. BACKUP .git directory first: `cp -r .git .git.bak`
2. Check refs: `git fsck --full`
3. Recover from reflog: `git reflog`
```

**Effort**: Low (1-2 hours)
**Expected Gain**: +2-3 tasks (5-8%)

### Option B: Model Upgrade (Resource Trade-off)

Current Sonnet 4 performance is poor (11%). Options:

| Model          | Cost/1M tokens | Expected Pass Rate |
| -------------- | -------------- | ------------------ |
| Sonnet 4       | $3/$15         | ~10-15%            |
| Opus 4.5       | $15/$75        | ~50-55%            |
| o3-mini (high) | ~$5-10         | Unknown            |

**Recommendation**: Use Opus 4.5 for Terminal-Bench (5x cost but 4-5x performance)

### Option C: Near-Miss Iteration (Targeted Fix)

Focus on tasks that are 1-2 tests from passing:

| Task                       | Current | Missing | Fix Strategy                    |
| -------------------------- | ------- | ------- | ------------------------------- |
| adaptive-rejection-sampler | 8/9     | 1 test  | Analyze failing test, iterate   |
| headless-terminal          | 6/7     | 1 test  | Debug terminal escape sequences |
| winning-avg-corewars       | 2/3     | 1 test  | Core Wars strategy              |
| write-compressor           | 2/3     | 1 test  | Compression ratio tuning        |

**Effort**: Medium (analyze each failure, add specific patterns)
**Expected Gain**: +2-4 tasks (5-10%)

### Option D: Pattern Compliance Enforcement (Systemic)

Current issue: Patterns exist but aren't consistently applied.

**Proposal**: Add mandatory output verification loop:

```python
# In UAP agent run(); the helper names below are placeholders for agent internals:
while not all_gates_pass():
    if time_budget_exceeded():
        break  # check the budget first so a failing gate cannot loop forever
    if not output_exists():
        create_outputs()
    if not tests_pass():
        iterate_on_failures()
```

**Effort**: Medium (agent code changes)
**Expected Gain**: +10-15% on partial success tasks

### Option E: Pre-Execution Hooks (Proactive)

Instead of reactive patterns, add proactive hooks:

1. **Pre-Task Analysis**: Parse task, identify expected outputs
2. **Tool Installation**: Check/install required tools
3. **Environment Setup**: Configure paths, permissions
4. **Post-Task Verification**: Run tests, verify outputs

**Effort**: High (new agent architecture)
**Expected Gain**: +15-20% overall

---

## Recommended Action Plan

### Phase 1: Quick Wins (1 day)

1. Add chess/Stockfish pattern to UAP
2. Add git recovery pattern
3. Add compression/codegolf impossibility detection
4. **Expected: +2-3 tasks**

### Phase 2: Near-Miss Fixes (2-3 days)

1. Analyze `adaptive-rejection-sampler` failing test
2. Fix `headless-terminal` edge case
3. Tune `write-compressor` ratio
4. **Expected: +2-4 tasks**

### Phase 3: Agent Architecture (1 week)

1. Implement mandatory iteration loop
2. Add pre-execution hooks
3. Add post-execution verification
4. **Expected: +10-15% overall**

---

## Success Metrics

| Phase   | Target Pass Rate | Tasks Passed |
| ------- | ---------------- | ------------ |
| Current | 54.3%            | 19/35        |
| Phase 1 | 60%              | 21/35        |
| Phase 2 | 70%              | 25/35        |
| Phase 3 | 80%              | 28/35        |

---

## Appendix: Full Task Matrix

### Passed Tasks (19)

cobol-modernization, crack-7z-hash, custom-memory-heap-crash, distribution-search, hf-model-inference, largest-eigenval, llm-inference-batching-scheduler, log-summary-date-ranges, merge-diff-arc-agi-task, modernize-scientific-stack, multi-source-data-merger, overfull-hbox, password-recovery, path-tracing-reverse, path-tracing, portfolio-optimization, prove-plus-comm, regex-chess, reshard-c4-data

### Failed Tasks (16)

adaptive-rejection-sampler (8/9), break-filter-js-from-html (0/1), caffe-cifar-10 (1/6), chess-best-move (0/1), configure-git-webserver (0/1), feal-linear-cryptanalysis (0/1), fix-git (0/2), gpt2-codegolf (0/1), headless-terminal (6/7), mteb-retrieve (1/2), polyglot-rust-c (0/1), pypi-server (0/1), pytorch-model-cli (0/6), torch-tensor-parallelism (1/3), winning-avg-corewars (2/3), write-compressor (2/3)

### Timed Out Tasks (5)

build-pov-ray, compile-compcert, db-wal-recovery, qemu-startup, schemelike-metacircular-eval