@miller-tech/uap 1.40.0 → 1.40.1
- package/README.md +109 -642
- package/docs/INDEX.md +48 -286
- package/docs/architecture/OVERVIEW.md +328 -0
- package/docs/architecture/PROTOCOL.md +204 -0
- package/docs/benchmarks/README.md +17 -192
- package/docs/getting-started/CONFIGURATION.md +237 -0
- package/docs/getting-started/INSTALLATION.md +125 -0
- package/docs/getting-started/QUICKSTART.md +115 -0
- package/docs/guides/COORDINATION.md +162 -0
- package/docs/guides/DELIVER.md +115 -0
- package/docs/guides/DEPLOY_BATCHING.md +212 -0
- package/docs/guides/DROIDS_AND_SKILLS.md +202 -0
- package/docs/guides/LOCAL_MODELS.md +148 -0
- package/docs/guides/MCP_ROUTER.md +195 -0
- package/docs/guides/MEMORY.md +235 -0
- package/docs/guides/MULTI_MODEL.md +223 -0
- package/docs/guides/POLICIES.md +190 -0
- package/docs/guides/WORKTREE_WORKFLOW.md +185 -0
- package/docs/integrations/MCP_ROUTER.md +147 -0
- package/docs/integrations/RTK.md +102 -0
- package/docs/reference/API.md +485 -0
- package/docs/reference/CLI.md +719 -0
- package/docs/reference/CONFIGURATION.md +90 -193
- package/docs/reference/DATABASE_SCHEMA.md +110 -344
- package/docs/reference/FEATURES.md +176 -472
- package/docs/reference/PATTERNS.md +102 -0
- package/docs/reference/PLATFORMS.md +83 -0
- package/package.json +1 -1
- package/docs/AGENTS.md +0 -423
- package/docs/DOCUMENTATION_AUDIT_REPORT.md +0 -131
- package/docs/GETTING_STARTED.md +0 -288
- package/docs/PROJECT_ANALYSIS_REPORT.md +0 -510
- package/docs/architecture/COMPLETE_ARCHITECTURE.md +0 -748
- package/docs/architecture/EXPERT_STACK.md +0 -137
- package/docs/architecture/MULTI_MODEL.md +0 -224
- package/docs/architecture/PLATFORM_GATING.md +0 -68
- package/docs/architecture/SYSTEM_ANALYSIS.md +0 -334
- package/docs/architecture/UAP_COMPLIANCE.md +0 -217
- package/docs/architecture/UAP_PROTOCOL.md +0 -339
- package/docs/architecture/UAP_STRICT_DROIDS.md +0 -172
- package/docs/archive/BALLS_MODE_SELF_ANALYSIS.md +0 -260
- package/docs/archive/BENCHMARK_GAPS_AND_PLAN.md +0 -146
- package/docs/archive/FAILING_TASKS_SOLUTION_PLAN.md +0 -668
- package/docs/archive/JINJA2-SYSTEM-MESSAGE-FIX.md +0 -209
- package/docs/archive/MODEL_ROUTING_IMPLEMENTATION_SUMMARY.md +0 -281
- package/docs/archive/MODEL_ROUTING_OPTIMIZATION_PLAN.md +0 -320
- package/docs/archive/NPM-PUBLISH-V0.9.1.md +0 -240
- package/docs/archive/OPTIMIZATION_OPTIONS.md +0 -334
- package/docs/archive/PARALLELISM_GAPS_AND_OPTIONS.md +0 -422
- package/docs/archive/POLICY_GATE_IMPLEMENTATION.md +0 -245
- package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
- package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
- package/docs/archive/UAP_OPTIMIZATION_PLAN.md +0 -701
- package/docs/archive/UAP_V103_PATTERN_DESIGN.md +0 -315
- package/docs/archive/UAP_V104_COMPLIANCE_DESIGN.md +0 -223
- package/docs/archive/changelog/2026-03-10_uap-100-compliance.md +0 -77
- package/docs/archive/changelog/2026-03-10_uap-full-system-verification.md +0 -109
- package/docs/archive/opencode-integration-guide.md +0 -740
- package/docs/archive/opencode-integration-quickref.md +0 -180
- package/docs/benchmarks/OVERNIGHT_RUNNER.md +0 -341
- package/docs/benchmarks/SPECULATIVE_DECODING_JOURNEY_2026-03.md +0 -221
- package/docs/benchmarks/VALIDATION_PLAN.md +0 -568
- package/docs/blog/SPECULATIVE_DECODING_PRODUCTION_PLAYBOOK.md +0 -139
- package/docs/blog/local-coding-agents.md +0 -266
- package/docs/blog/x-thread.md +0 -254
- package/docs/deployment/DEPLOYMENT.md +0 -895
- package/docs/deployment/DEPLOYMENT_STRATEGIES.md +0 -518
- package/docs/deployment/DEPLOY_BATCHER_ANALYSIS.md +0 -224
- package/docs/deployment/DEPLOY_BATCHING.md +0 -273
- package/docs/deployment/DEPLOY_BUCKETING_ANALYSIS.md +0 -420
- package/docs/deployment/QWEN35_LLAMA_CPP.md +0 -426
- package/docs/deployment/UAP_LLAMA_ANTHROPIC_PROXY_BOOTSTRAP.md +0 -279
- package/docs/getting-started/INTEGRATION.md +0 -628
- package/docs/getting-started/OVERVIEW.md +0 -324
- package/docs/getting-started/SETUP.md +0 -377
- package/docs/integrations/MCP_ROUTER_SETUP.md +0 -445
- package/docs/integrations/RTK_INTEGRATION.md +0 -468
- package/docs/operations/TROUBLESHOOTING.md +0 -660
- package/docs/pr/PR_SPECULATIVE_DOCS_TEMPLATE.md +0 -146
- package/docs/pr/UPSTREAM_PRS.md +0 -424
- package/docs/reference/API_REFERENCE.md +0 -903
- package/docs/reference/EXPERT_DROIDS.md +0 -219
- package/docs/reference/HARNESS-MATRIX.md +0 -318
- package/docs/reference/PATTERN_LIBRARY.md +0 -636
- package/docs/reference/UAP_CLI_REFERENCE.md +0 -620
- package/docs/research/BEHAVIORAL_PATTERNS.md +0 -228
- package/docs/research/DOMAIN_STRATEGIES.md +0 -316
- package/docs/research/MEMORY_SYSTEMS_COMPARISON.md +0 -812
- package/docs/research/PATTERN_ANALYSIS_2026-01-18.md +0 -436
- package/docs/research/PERFORMANCE_ANALYSIS_2026-01-18.md +0 -209
- package/docs/research/PERFORMANCE_TEST_PLAN.md +0 -383
- package/docs/research/TERMINAL_BENCH_LEARNINGS.md +0 -217
|
package/docs/archive/SETUP_IMPROVEMENTS.md +0 -213
@@ -1,213 +0,0 @@
|
|
|
1
|
-
# Setup Improvements - Summary
|
|
2
|
-
|
|
3
|
-
## Overview
|
|
4
|
-
|
|
5
|
-
Enhanced the UAP setup process to ensure all dependencies are checked, git hooks are configured, and comprehensive documentation is provided.
|
|
6
|
-
|
|
7
|
-
## Changes Made
|
|
8
|
-
|
|
9
|
-
### 1. New Setup Script (`scripts/setup.sh`)
|
|
10
|
-
|
|
11
|
-
A comprehensive setup script that:
|
|
12
|
-
|
|
13
|
-
**Dependency Checking:**
|
|
14
|
-
|
|
15
|
-
- ✅ Checks for required dependencies (Node.js >= 18, npm, git, npx)
|
|
16
|
-
- ✅ Recommends optional dependencies (Docker, Python 3, pre-commit)
|
|
17
|
-
- ✅ Provides platform-specific installation instructions
|
|
18
|
-
- ✅ Shows clear error messages with installation commands
|
|
19
|
-
|
|
20
|
-
**Installation:**
|
|
21
|
-
|
|
22
|
-
- ✅ Installs npm dependencies if not present
|
|
23
|
-
- ✅ Builds TypeScript project
|
|
24
|
-
- ✅ Validates build success before proceeding
|
|
25
|
-
|
|
26
|
-
**Git Hooks Configuration:**
|
|
27
|
-
|
|
28
|
-
- `pre-commit` - Secrets detection, linting enforcement
|
|
29
|
-
- `commit-msg` - Conventional commits validation
|
|
30
|
-
- `pre-push` - Test execution before push
|
|
31
|
-
|
|
32
|
-
**Additional Features:**
|
|
33
|
-
|
|
34
|
-
- ✅ Creates GitHub PR template (if gh CLI available)
|
|
35
|
-
- ✅ Provides clear next steps after setup
|
|
36
|
-
- ✅ Handles missing dependencies gracefully
|
|
37
|
-
|
|
38
|
-
### 2. Updated Installation Scripts
|
|
39
|
-
|
|
40
|
-
**`scripts/install-web.sh`:**
|
|
41
|
-
|
|
42
|
-
- Updated next steps to reference `uap init` instead of `uap init --web`
|
|
43
|
-
- Improved clarity on post-setup actions
|
|
44
|
-
|
|
45
|
-
**`scripts/install-desktop.sh`:**
|
|
46
|
-
|
|
47
|
-
- Updated next steps to reference `uap init` instead of `uap init --desktop`
|
|
48
|
-
- Improved clarity on post-setup actions
|
|
49
|
-
|
|
50
|
-
### 3. Updated Package.json
|
|
51
|
-
|
|
52
|
-
**Added:**
|
|
53
|
-
|
|
54
|
-
- `"setup": "bash scripts/setup.sh"` - Main setup command
|
|
55
|
-
- `"scripts"` directory in `files` array - Ensures scripts are published
|
|
56
|
-
- Updated `postinstall` to recommend `npm run setup`
|
|
57
|
-
|
|
58
|
-
**Removed:**
|
|
59
|
-
|
|
60
|
-
- Duplicate `bin` field (was listed twice)
|
|
61
|
-
|
|
62
|
-
### 4. Enhanced Documentation
|
|
63
|
-
|
|
64
|
-
**`README.md`:**
|
|
65
|
-
|
|
66
|
-
- Added "Complete Setup" section with comprehensive instructions
|
|
67
|
-
- Expanded "Requirements" section with dependency table
|
|
68
|
-
- Added platform-specific installation commands (macOS, Ubuntu, Windows)
|
|
69
|
-
|
|
70
|
-
**`docs/SETUP.md` (NEW):**
|
|
71
|
-
|
|
72
|
-
- Complete setup guide with:
|
|
73
|
-
- Quick start instructions
|
|
74
|
-
- Detailed dependency information
|
|
75
|
-
- Platform-specific setup commands
|
|
76
|
-
- Git hooks documentation
|
|
77
|
-
- Environment variable setup
|
|
78
|
-
- Verification steps
|
|
79
|
-
- Troubleshooting guide
|
|
80
|
-
- Security notes
|
|
81
|
-
|
|
82
|
-
**`scripts/README.md` (NEW):**
|
|
83
|
-
|
|
84
|
-
- Documentation for all setup scripts
|
|
85
|
-
- Git hooks explanation
|
|
86
|
-
- Best practices
|
|
87
|
-
- Security notes
|
|
88
|
-
|
|
89
|
-
### 5. Git Hooks Created
|
|
90
|
-
|
|
91
|
-
**`.git/hooks/pre-commit`:**
|
|
92
|
-
|
|
93
|
-
```bash
|
|
94
|
-
# Checks:
|
|
95
|
-
# - Scans for secrets in TypeScript/JavaScript/JSON files
|
|
96
|
-
# - Runs linter with zero warnings allowed
|
|
97
|
-
# - Prevents accidental commits of sensitive data
|
|
98
|
-
```
|
|
99
|
-
|
|
100
|
-
**`.git/hooks/commit-msg`:**
|
|
101
|
-
|
|
102
|
-
```bash
|
|
103
|
-
# Validates:
|
|
104
|
-
# - Conventional commits format (type(scope): description)
|
|
105
|
-
# - Allowed types: feat, fix, docs, style, refactor, test, chore, perf, ci, build, revert
|
|
106
|
-
# - Allows override with confirmation
|
|
107
|
-
```
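The hook script itself is not reproduced in this summary. As a rough illustration only, the format check described above could be sketched in Python as follows. The regular expression, the treatment of the scope as optional, and the `is_conventional` helper are assumptions made for this example, not the hook's actual code.

```python
import re

# Commit types listed in the commit-msg hook summary above.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "test", "chore", "perf", "ci", "build", "revert",
)

# Assumed shape: type, optional "(scope)", colon, space, non-empty description.
_PATTERN = re.compile(
    r"^(?:%s)(?:\([^()\s]+\))?: \S.*$" % "|".join(ALLOWED_TYPES)
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of a commit message matches the assumed format."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return _PATTERN.match(first_line) is not None
```

For example, `is_conventional("feat(cli): add setup command")` is accepted, while `is_conventional("update stuff")` is rejected.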
|
|
108
|
-
|
|
109
|
-
**`.git/hooks/pre-push`:**
|
|
110
|
-
|
|
111
|
-
```bash
|
|
112
|
-
# Runs:
|
|
113
|
-
# - npm test before pushing
|
|
114
|
-
# - Prevents pushing broken code
|
|
115
|
-
```
|
|
116
|
-
|
|
117
|
-
## Usage
|
|
118
|
-
|
|
119
|
-
### Quick Setup
|
|
120
|
-
|
|
121
|
-
```bash
|
|
122
|
-
# Install UAP globally
|
|
123
|
-
npm install -g universal-agent-protocol
|
|
124
|
-
|
|
125
|
-
# Run comprehensive setup
|
|
126
|
-
npm run setup
|
|
127
|
-
|
|
128
|
-
# Initialize in your project
|
|
129
|
-
uap init
|
|
130
|
-
```
|
|
131
|
-
|
|
132
|
-
### Platform-Specific Setup
|
|
133
|
-
|
|
134
|
-
```bash
|
|
135
|
-
# Web platforms (claude.ai, Factory.AI)
|
|
136
|
-
npm run install:web
|
|
137
|
-
|
|
138
|
-
# Desktop (Claude Code, opencode)
|
|
139
|
-
npm run install:desktop
|
|
140
|
-
```
|
|
141
|
-
|
|
142
|
-
## Testing
|
|
143
|
-
|
|
144
|
-
All changes verified:
|
|
145
|
-
|
|
146
|
-
- ✅ 149 tests pass
|
|
147
|
-
- ✅ Linter passes with no errors
|
|
148
|
-
- ✅ TypeScript builds successfully
|
|
149
|
-
- ✅ Setup script runs without errors
|
|
150
|
-
- ✅ Git hooks created and executable
|
|
151
|
-
|
|
152
|
-
## Benefits
|
|
153
|
-
|
|
154
|
-
1. **Better User Experience:**
|
|
155
|
-
- Clear dependency checking
|
|
156
|
-
- Automatic git hook configuration
|
|
157
|
-
- Comprehensive error messages
|
|
158
|
-
- Platform-specific installation commands
|
|
159
|
-
|
|
160
|
-
2. **Improved Security:**
|
|
161
|
-
- Pre-commit hook detects secrets
|
|
162
|
-
- Linting enforcement prevents bad code
|
|
163
|
-
- Test validation before push
|
|
164
|
-
|
|
165
|
-
3. **Better Documentation:**
|
|
166
|
-
- Setup guide in `docs/SETUP.md`
|
|
167
|
-
- Script documentation in `scripts/README.md`
|
|
168
|
-
- Enhanced README with requirements table
|
|
169
|
-
- Clear next steps after setup
|
|
170
|
-
|
|
171
|
-
4. **Easier Maintenance:**
|
|
172
|
-
- Centralized setup logic in `setup.sh`
|
|
173
|
-
- Consistent configuration across platforms
|
|
174
|
-
- Automated testing of setup process
|
|
175
|
-
|
|
176
|
-
## Next Steps for Users
|
|
177
|
-
|
|
178
|
-
After running `npm run setup`:
|
|
179
|
-
|
|
180
|
-
1. Review the generated CLAUDE.md
|
|
181
|
-
2. Set up cloud memory backends (optional):
|
|
182
|
-
```bash
|
|
183
|
-
export GITHUB_TOKEN=your_token
|
|
184
|
-
export QDRANT_API_KEY=your_key
|
|
185
|
-
export QDRANT_URL=your_url
|
|
186
|
-
```
|
|
187
|
-
3. Start working - your AI assistant will follow the workflows automatically!
|
|
188
|
-
|
|
189
|
-
## Files Modified
|
|
190
|
-
|
|
191
|
-
1. `scripts/setup.sh` - NEW: Comprehensive setup script
|
|
192
|
-
2. `scripts/install-web.sh` - Updated next steps
|
|
193
|
-
3. `scripts/install-desktop.sh` - Updated next steps
|
|
194
|
-
4. `package.json` - Added setup script, updated files array
|
|
195
|
-
5. `README.md` - Enhanced with complete setup instructions
|
|
196
|
-
6. `docs/SETUP.md` - NEW: Complete setup guide
|
|
197
|
-
7. `scripts/README.md` - NEW: Script documentation
|
|
198
|
-
|
|
199
|
-
## Files Created (by setup script)
|
|
200
|
-
|
|
201
|
-
1. `.git/hooks/pre-commit` - Secrets detection, linting
|
|
202
|
-
2. `.git/hooks/commit-msg` - Conventional commits validation
|
|
203
|
-
3. `.git/hooks/pre-push` - Test validation before push
|
|
204
|
-
4. `.github/pull_request_template.md` - PR template (if gh CLI available)
|
|
205
|
-
|
|
206
|
-
## Backwards Compatibility
|
|
207
|
-
|
|
208
|
-
All changes are backwards compatible:
|
|
209
|
-
|
|
210
|
-
- Existing installations continue to work
|
|
211
|
-
- New features are opt-in via `npm run setup`
|
|
212
|
-
- Git hooks are additive (don't break existing workflows)
|
|
213
|
-
- No breaking changes to APIs or configuration
|
|
package/docs/archive/UAP_GENERIC_OPTIMIZATION_PLAN.md +0 -270
@@ -1,270 +0,0 @@
|
|
|
1
|
-
# UAM Generic Optimization Plan
|
|
2
|
-
|
|
3
|
-
**Generated:** 2026-01-17
|
|
4
|
-
**Goal:** Fix failures with GENERIC patterns, not task-specific recipes
|
|
5
|
-
|
|
6
|
-
---
|
|
7
|
-
|
|
8
|
-
## Root Cause Analysis
|
|
9
|
-
|
|
10
|
-
| Failure Pattern | Tasks | % of Failures | Generic? |
|
|
11
|
-
|-----------------|-------|---------------|----------|
|
|
12
|
-
| **Output Not Created** | 7 | 37% | YES |
|
|
13
|
-
| **Wrong Output Value** | 5 | 26% | PARTIALLY |
|
|
14
|
-
| **Partial Success** | 4 | 21% | YES |
|
|
15
|
-
| **Exception/Test Failed** | 3 | 16% | PARTIALLY |
|
|
16
|
-
|
|
17
|
-
**Key Insight:** 37% of failures are because the agent simply didn't create the required output file. This is a GENERIC problem solvable with a GENERIC pattern.
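The percentage column follows from the task counts. A small Python check, using only the counts from the table above (the variable names are illustrative):

```python
# Task counts per failure pattern, copied from the table above.
failure_counts = {
    "Output Not Created": 7,
    "Wrong Output Value": 5,
    "Partial Success": 4,
    "Exception/Test Failed": 3,
}

total_failures = sum(failure_counts.values())

# Share of all failures for each pattern, rounded to a whole percent.
failure_share = {
    pattern: round(100 * count / total_failures)
    for pattern, count in failure_counts.items()
}
```

With 19 failing tasks in total, the rounded shares come out to 37, 26, 21, and 16 percent, matching the table.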
|
|
18
|
-
|
|
19
|
-
---
|
|
20
|
-
|
|
21
|
-
## Generic Solution 1: Output Existence Verification (OEV)
|
|
22
|
-
|
|
23
|
-
**Problem:** Agent completes task but required files don't exist.
|
|
24
|
-
|
|
25
|
-
**Affected tasks (7):** break-filter-js-from-html, caffe-cifar-10, compile-compcert, feal-linear-cryptanalysis, gpt2-codegolf, pytorch-model-cli, write-compressor
|
|
26
|
-
|
|
27
|
-
### Pattern: Mandatory Output Check
|
|
28
|
-
|
|
29
|
-
```markdown
|
|
30
|
-
### Pattern 12: Output Existence Verification (OEV)
|
|
31
|
-
|
|
32
|
-
BEFORE reporting task complete, ALWAYS:
|
|
33
|
-
|
|
34
|
-
1. **Parse expected outputs** from task description
|
|
35
|
-
- Look for: "create file X", "output to Y", "generate Z"
|
|
36
|
-
- Look for paths: /app/*.json, /app/*.py, /app/output.*
|
|
37
|
-
|
|
38
|
-
2. **Verify each output EXISTS**
|
|
39
|
-
```bash
|
|
40
|
-
for file in $EXPECTED_OUTPUTS; do
|
|
41
|
-
[ -f "$file" ] || echo "MISSING: $file"
|
|
42
|
-
done
|
|
43
|
-
```
|
|
44
|
-
|
|
45
|
-
3. **If missing, create it**
|
|
46
|
-
- Don't explain why it's missing
|
|
47
|
-
- Don't ask for clarification
|
|
48
|
-
- CREATE A MINIMAL VERSION that satisfies the test
|
|
49
|
-
|
|
50
|
-
4. **Re-verify before completion**
|
|
51
|
-
|
|
52
|
-
**This pattern is MANDATORY for ALL tasks.**
|
|
53
|
-
```
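A minimal Python sketch of step 2 of the pattern above, assuming the expected paths have already been extracted from the task description. `missing_outputs` is a hypothetical helper introduced for illustration; it is not part of UAP.

```python
from pathlib import Path
from typing import Iterable, List


def missing_outputs(expected: Iterable[str]) -> List[str]:
    """Return the expected output paths that do not exist as regular files."""
    return [path for path in expected if not Path(path).is_file()]
```

An empty return value means every expected output exists. Anything else is the list of files still to be created before the task can be reported complete.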
|
|
54
|
-
|
|
55
|
-
### Implementation in CLAUDE.md
|
|
56
|
-
|
|
57
|
-
Add to "CRITICAL REMINDERS" section at END:
|
|
58
|
-
|
|
59
|
-
```markdown
|
|
60
|
-
9. **OUTPUT VERIFICATION**: Before completing ANY task:
|
|
61
|
-
- List all files the task expects you to create
|
|
62
|
-
- Run `ls -la` on each expected output path
|
|
63
|
-
- If ANY file is missing, CREATE IT before saying "done"
|
|
64
|
-
- A task is NOT complete until all outputs exist
|
|
65
|
-
```
|
|
66
|
-
|
|
67
|
-
---
|
|
68
|
-
|
|
69
|
-
## Generic Solution 2: Iterative Refinement Loop (IRL)
|
|
70
|
-
|
|
71
|
-
**Problem:** Agent gets close but stops before full success.
|
|
72
|
-
|
|
73
|
-
**Affected tasks (4):** adaptive-rejection-sampler (8/9), db-wal-recovery (5/7), headless-terminal (6/7), winning-avg-corewars (2/3)
|
|
74
|
-
|
|
75
|
-
### Pattern: Test-Driven Iteration
|
|
76
|
-
|
|
77
|
-
```markdown
|
|
78
|
-
### Pattern 13: Iterative Refinement Loop (IRL)
|
|
79
|
-
|
|
80
|
-
When tests are available and some pass:
|
|
81
|
-
|
|
82
|
-
1. **Run tests after implementation**
|
|
83
|
-
```bash
|
|
84
|
-
pytest /tests/ -v 2>&1 | tee test_results.txt
|
|
85
|
-
```
|
|
86
|
-
|
|
87
|
-
2. **If partial success (some pass, some fail):**
|
|
88
|
-
- Read the failing test output
|
|
89
|
-
- Identify the specific assertion that failed
|
|
90
|
-
- Fix ONLY that specific issue
|
|
91
|
-
- Re-run tests
|
|
92
|
-
|
|
93
|
-
3. **Repeat until 100% pass OR timeout approaching**
|
|
94
|
-
|
|
95
|
-
4. **Budget rule:** Reserve 20% of time for iteration
|
|
96
|
-
|
|
97
|
-
**This pattern applies when a test framework is available.**
|
|
98
|
-
```
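The loop and the budget rule above can be sketched in Python. This is an illustration under stated assumptions: `run_tests` and `fix_one_failure` are hypothetical callbacks supplied by the caller, and the deadline is expressed as a value of the injected `clock`.

```python
import time
from typing import Callable


def implementation_budget(total_seconds: float, reserve: float = 0.2) -> float:
    """Seconds available for the first implementation pass, keeping `reserve` for iteration."""
    return total_seconds * (1.0 - reserve)


def refine_until_green(
    run_tests: Callable[[], int],
    fix_one_failure: Callable[[], None],
    deadline: float,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Re-run tests, fixing one failure per round, until all pass or time runs out.

    run_tests returns the number of failing tests (hypothetical callback).
    fix_one_failure attempts a single targeted fix (hypothetical callback).
    deadline is the clock() value after which no further fixes are attempted.
    """
    while True:
        if run_tests() == 0:
            return True
        if clock() >= deadline:
            return False
        fix_one_failure()
```

Passing the clock as a parameter keeps the sketch testable without waiting on real time.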
|
|
99
|
-
|
|
100
|
-
### Implementation in CLAUDE.md
|
|
101
|
-
|
|
102
|
-
Add to decision loop:
|
|
103
|
-
|
|
104
|
-
```markdown
|
|
105
|
-
After implementation:
|
|
106
|
-
1. Run available tests
|
|
107
|
-
2. If partial pass: iterate to fix failures
|
|
108
|
-
3. If timeout approaching: ensure outputs exist (Pattern 12)
|
|
109
|
-
4. Only complete when tests pass OR outputs verified
|
|
110
|
-
```
|
|
111
|
-
|
|
112
|
-
---
|
|
113
|
-
|
|
114
|
-
## Generic Solution 3: Output Format Validation (OFV)
|
|
115
|
-
|
|
116
|
-
**Problem:** Output exists but format/content is wrong.
|
|
117
|
-
|
|
118
|
-
**Affected tasks (5):** chess-best-move, fix-git, mteb-retrieve, polyglot-rust-c, pypi-server
|
|
119
|
-
|
|
120
|
-
### Pattern: Spec-Driven Output
|
|
121
|
-
|
|
122
|
-
```markdown
|
|
123
|
-
### Pattern 14: Output Format Validation (OFV)
|
|
124
|
-
|
|
125
|
-
When task specifies exact output format:
|
|
126
|
-
|
|
127
|
-
1. **Extract format specification** from task description
|
|
128
|
-
- "Output should be JSON with fields X, Y"
|
|
129
|
-
- "File must contain exactly one line"
|
|
130
|
-
- "Result must match hash ABC"
|
|
131
|
-
|
|
132
|
-
2. **Validate before completion**
|
|
133
|
-
```python
|
|
134
|
-
# Example validations (assumes `import hashlib, json`; `output` is bytes, since hashlib.md5 rejects str)
|
|
135
|
-
assert len(output.splitlines()) == 1, "Must be one line"
|
|
136
|
-
json.loads(output)  # raises json.JSONDecodeError if output is not valid JSON
|
|
137
|
-
assert hashlib.md5(output).hexdigest() == expected
|
|
138
|
-
```
|
|
139
|
-
|
|
140
|
-
3. **If validation fails, fix output**
|
|
141
|
-
- Don't re-explain the task
|
|
142
|
-
- Modify output to match spec
|
|
143
|
-
- Re-validate
|
|
144
|
-
|
|
145
|
-
**This pattern applies when format is explicitly specified.**
|
|
146
|
-
```
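The three example validations can be gathered into one helper that reports every mismatch instead of stopping at the first failed assertion. The sketch below is illustrative: `format_problems` and its parameters are hypothetical names, it assumes the output has been read as bytes, and a real task would enable only the checks its specification names.

```python
import hashlib
import json
from typing import List, Optional, Sequence


def format_problems(
    output: bytes,
    *,
    one_line: bool = False,
    json_fields: Optional[Sequence[str]] = None,
    md5_hex: Optional[str] = None,
) -> List[str]:
    """Return a list of format problems; an empty list means the output passed."""
    problems: List[str] = []

    if one_line and len(output.splitlines()) != 1:
        problems.append("must be exactly one line")

    if json_fields is not None:
        try:
            data = json.loads(output)
        except ValueError:  # covers JSONDecodeError and invalid UTF-8
            problems.append("must be valid JSON")
        else:
            if isinstance(data, dict):
                missing = [name for name in json_fields if name not in data]
                if missing:
                    problems.append("missing JSON fields: " + ", ".join(missing))
            elif json_fields:
                problems.append("must be a JSON object")

    if md5_hex is not None and hashlib.md5(output).hexdigest() != md5_hex:
        problems.append("MD5 does not match expected value")

    return problems
```

Returning a list rather than raising lets the caller fix all mismatches in one pass and then re-validate.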
|
|
147
|
-
|
|
148
|
-
---
|
|
149
|
-
|
|
150
|
-
## Generic Solution 4: Exception Recovery (ER)
|
|
151
|
-
|
|
152
|
-
**Problem:** Code throws exception instead of producing output.
|
|
153
|
-
|
|
154
|
-
**Affected tasks (3):** configure-git-webserver, schemelike-metacircular-eval, torch-tensor-parallelism
|
|
155
|
-
|
|
156
|
-
### Pattern: Defensive Execution
|
|
157
|
-
|
|
158
|
-
```markdown
|
|
159
|
-
### Pattern 15: Exception Recovery (ER)
|
|
160
|
-
|
|
161
|
-
When running generated code:
|
|
162
|
-
|
|
163
|
-
1. **Wrap execution in try/except**
|
|
164
|
-
```python
|
|
165
|
-
try:
|
|
166
|
-
    result = run_implementation()
|
|
167
|
-
except Exception as e:
|
|
168
|
-
    # Log error but don't stop
|
|
169
|
-
    print(f"Error: {e}")
|
|
170
|
-
    # Try simpler fallback
|
|
171
|
-
    result = run_fallback()
|
|
172
|
-
```
|
|
173
|
-
|
|
174
|
-
2. **If exception occurs:**
|
|
175
|
-
- Read the stack trace
|
|
176
|
-
- Fix the specific error
|
|
177
|
-
- Re-run
|
|
178
|
-
|
|
179
|
-
3. **Common fixes:**
|
|
180
|
-
- ImportError → install package or use stdlib
|
|
181
|
-
- FileNotFoundError → create the file
|
|
182
|
-
- PermissionError → chmod or use different path
|
|
183
|
-
- TypeError → check function signatures
|
|
184
|
-
|
|
185
|
-
**This pattern applies when execution fails with a traceback.**
|
|
186
|
-
```
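A self-contained Python sketch of the pattern above, pairing the fallback wrapper with the "Common fixes" list. The names `COMMON_FIXES`, `suggest_fix`, and `run_with_fallback` are hypothetical and exist only for this illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# First responses per exception type, taken from the "Common fixes" list above.
COMMON_FIXES = {
    ImportError: "install package or use stdlib",
    FileNotFoundError: "create the file",
    PermissionError: "chmod or use different path",
    TypeError: "check function signatures",
}


def suggest_fix(error: BaseException) -> str:
    """Map an exception to the matching hint, or a generic instruction."""
    for error_type, hint in COMMON_FIXES.items():
        if isinstance(error, error_type):
            return hint
    return "read the stack trace and fix the specific error"


def run_with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Run primary(); on any exception, log it with a hint and run fallback()."""
    try:
        return primary()
    except Exception as error:
        # Log the error but do not stop; try the simpler fallback instead.
        print(f"Error: {error} (hint: {suggest_fix(error)})")
        return fallback()
```

Because `isinstance` is used for the lookup, subclasses such as `ModuleNotFoundError` receive the hint of their parent type.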
|
|
187
|
-
|
|
188
|
-
---
|
|
189
|
-
|
|
190
|
-
## Summary: 4 Generic Patterns
|
|
191
|
-
|
|
192
|
-
| Pattern | Problem Solved | Tasks Fixed | % Improvement |
|
|
193
|
-
|---------|----------------|-------------|---------------|
|
|
194
|
-
| **OEV** (Output Existence) | Files not created | 7 | +18% |
|
|
195
|
-
| **IRL** (Iterative Refinement) | Partial success | 4 | +10% |
|
|
196
|
-
| **OFV** (Output Format) | Wrong content | 5 | +13% |
|
|
197
|
-
| **ER** (Exception Recovery) | Runtime errors | 3 | +8% |
|
|
198
|
-
|
|
199
|
-
**Total potential improvement:** +19 tasks → from 53.8% to ~95% (theoretical max)
|
|
200
|
-
|
|
201
|
-
**Realistic expectation:** +8-10 tasks → from 53.8% to ~75%
|
|
202
|
-
|
|
203
|
-
---
|
|
204
|
-
|
|
205
|
-
## Implementation Priority
|
|
206
|
-
|
|
207
|
-
### Phase 1: OEV (Highest ROI)
|
|
208
|
-
- Add output verification to CLAUDE.md critical reminders
|
|
209
|
-
- 37% of failures are "file not created"
|
|
210
|
-
- Simplest to implement, highest impact
|
|
211
|
-
|
|
212
|
-
### Phase 2: IRL (Medium ROI)
|
|
213
|
-
- Add test iteration loop
|
|
214
|
-
- Helps near-miss tasks
|
|
215
|
-
- Requires test framework detection
|
|
216
|
-
|
|
217
|
-
### Phase 3: OFV + ER (Lower ROI)
|
|
218
|
-
- Add format validation
|
|
219
|
-
- Add exception recovery
|
|
220
|
-
- More complex, fewer affected tasks
|
|
221
|
-
|
|
222
|
-
---
|
|
223
|
-
|
|
224
|
-
## CLAUDE.md Changes Required
|
|
225
|
-
|
|
226
|
-
### Add to Critical Reminders (END of file):
|
|
227
|
-
|
|
228
|
-
```markdown
|
|
229
|
-
9. **OUTPUT VERIFICATION (OEV)**: Before completing ANY task:
|
|
230
|
-
- Identify all files the task requires you to create
|
|
231
|
-
- Verify each file exists: `ls -la /app/expected_output.ext`
|
|
232
|
-
- If missing: CREATE IT, don't explain why it's missing
|
|
233
|
-
- Task is NOT complete until ALL outputs exist
|
|
234
|
-
|
|
235
|
-
10. **ITERATIVE REFINEMENT (IRL)**: If tests are available:
|
|
236
|
-
- Run tests after implementation
|
|
237
|
-
- If some pass, some fail: fix failures and re-run
|
|
238
|
-
- Iterate until 100% pass or timeout approaching
|
|
239
|
-
- Reserve 20% of time budget for iteration
|
|
240
|
-
|
|
241
|
-
11. **FORMAT VALIDATION (OFV)**: If task specifies exact format:
|
|
242
|
-
- Parse format requirements from task description
|
|
243
|
-
- Validate output matches spec before completion
|
|
244
|
-
- Fix mismatches, don't re-explain
|
|
245
|
-
```
|
|
246
|
-
|
|
247
|
-
---
|
|
248
|
-
|
|
249
|
-
## Why This Works
|
|
250
|
-
|
|
251
|
-
1. **Generic patterns apply to ALL tasks** - no task-specific knowledge needed
|
|
252
|
-
2. **Addresses root causes** - not symptoms
|
|
253
|
-
3. **Exploits recency bias** - critical reminders at END of context
|
|
254
|
-
4. **Defensive** - assumes agent will forget, adds verification layer
|
|
255
|
-
|
|
256
|
-
---
|
|
257
|
-
|
|
258
|
-
## Comparison: Generic vs Task-Specific
|
|
259
|
-
|
|
260
|
-
| Approach | Pros | Cons |
|
|
261
|
-
|----------|------|------|
|
|
262
|
-
| **Task-specific recipes** | High accuracy per task | Doesn't scale, maintenance burden |
|
|
263
|
-
| **Generic patterns** | Scales to all tasks | May miss domain nuances |
|
|
264
|
-
| **Hybrid (recommended)** | Best of both | More complex |
|
|
265
|
-
|
|
266
|
-
**Recommendation:** Implement generic patterns FIRST, then add task-specific recipes only for persistent failures.
|
|
267
|
-
|
|
268
|
-
---
|
|
269
|
-
|
|
270
|
-
**Plan Generated:** 2026-01-17
|