ctx-cc 1.0.0 → 2.0.0

package/README.md CHANGED
@@ -1,6 +1,6 @@
1
- # CTX - Smart Context Management for Claude Code
1
+ # CTX 2.0 - Continuous Task eXecution
2
2
 
3
- > The GSD Killer. 12 commands, infinite power.
3
+ > Smart workflow orchestration for Claude Code. 4 commands. Debug loop until 100% fixed.
4
4
 
5
5
  ## Installation
6
6
 
@@ -8,124 +8,149 @@
8
8
  npx ctx-cc
9
9
  ```
10
10
 
11
- Or with options:
12
-
11
+ Options:
13
12
  ```bash
14
13
  npx ctx-cc --global # Install to ~/.claude (default)
15
14
  npx ctx-cc --project # Install to .claude in current directory
16
15
  npx ctx-cc --force # Overwrite existing installation
17
16
  ```
18
17
 
19
- ## Why CTX?
18
+ ## Why CTX 2.0?
20
19
 
21
- | Aspect | GSD | CTX |
22
- |--------|-----|-----|
23
- | Commands | 27 | 12 |
24
- | Context management | Manual | Automatic |
25
- | Research | Separate step | Auto-integrated |
26
- | Verification | Manual trigger | Built-in |
27
- | Memory | Files only | Hierarchical + JIT |
28
- | Resume cost | ~50k+ tokens | ~2-3k tokens |
20
+ | Feature | Before | CTX 2.0 |
21
+ |---------|--------|---------|
22
+ | Commands | 12-27 | **4** |
23
+ | Router | Manual | **Smart (auto-routing)** |
24
+ | Debug | Manual | **Loop until 100% fixed** |
25
+ | Browser Verify | No | **Playwright/DevTools** |
26
+ | Planning | Any size | **Atomic (2-3 tasks max)** |
27
+ | Resume cost | ~50k tokens | **~2.5k tokens** |
29
28
 
30
29
  ## Quick Start
31
30
 
32
31
  ```
33
- 1. /ctx:init Initialize project
34
- 2. /ctx:plan <goal> Research + Plan automatically
35
- 3. /ctx:do Execute phase
36
- 4. /ctx:verify Three-level verification
37
- 5. /ctx:ship Final audit
32
+ 1. /ctx init Initialize project
33
+ 2. /ctx Smart router does the rest
34
+ 3. /ctx pause Checkpoint when needed
38
35
  ```
39
36
 
40
- ## Commands
37
+ That's it. `/ctx` reads STATE.md and knows what to do next.
41
38
 
42
- ### Core Workflow
43
- | Command | Purpose |
44
- |---------|---------|
45
- | `/ctx:init` | Initialize project |
46
- | `/ctx:plan <goal>` | Research + Plan automatically |
47
- | `/ctx:do [task]` | Execute (phase or quick task) |
48
- | `/ctx:verify` | Three-level verification |
49
- | `/ctx:ship` | Final audit |
39
+ ## The 4 Commands
50
40
 
51
- ### Phase Management
52
41
  | Command | Purpose |
53
42
  |---------|---------|
54
- | `/ctx:phase add <name>` | Add phase to roadmap |
55
- | `/ctx:phase list` | Show all phases |
56
- | `/ctx:phase next` | Move to next phase |
43
+ | `/ctx` | Smart router - reads STATE.md, does the right thing |
44
+ | `/ctx init` | Initialize project with STATE.md |
45
+ | `/ctx quick "task"` | Quick task bypass (skip workflow) |
46
+ | `/ctx pause` | Checkpoint for session resume |
57
47
 
58
- ### Session Control
59
- | Command | Purpose |
60
- |---------|---------|
61
- | `/ctx:pause` | Checkpoint + handoff |
62
- | `/ctx:resume` | Resume from checkpoint |
63
- | `/ctx:status` | Full status report |
48
+ ### Smart Router States
64
49
 
65
- ### Memory
66
- | Command | Purpose |
50
+ | State | What `/ctx` does |
51
+ |-------|------------------|
52
+ | initializing | Research + Plan (ArguSeek + ChunkHound) |
53
+ | executing | Execute current task |
54
+ | debugging | **Debug loop until 100% fixed** |
55
+ | verifying | Three-level verification |
56
+ | paused | Resume from checkpoint |
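The state table above can be read as a simple lookup from the `status` stored in STATE.md to an action. A minimal sketch, assuming a hypothetical `route` function and `ROUTER_ACTIONS` constant (neither name comes from the package); only the state names and action descriptions are taken from the table:

```python
# Hypothetical dispatch table. State names and action text mirror the
# README table; the identifiers themselves are illustrative only.
ROUTER_ACTIONS = {
    "initializing": "Research + Plan (ArguSeek + ChunkHound)",
    "executing": "Execute current task",
    "debugging": "Debug loop until 100% fixed",
    "verifying": "Three-level verification",
    "paused": "Resume from checkpoint",
}


def route(status: str) -> str:
    """Return the action `/ctx` would take for a given STATE.md status."""
    try:
        return ROUTER_ACTIONS[status]
    except KeyError:
        # Assumption: an unknown status is reported as an error rather
        # than silently ignored. The README does not specify this case.
        raise ValueError(f"unknown status: {status!r}") from None
```

How the real router parses STATE.md and spawns agents is not shown in this diff, so the sketch stops at the lookup.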
57
+
58
+ ## Debug Loop (Key Feature)
59
+
60
+ When something breaks, CTX enters debug mode and loops until the fix is verified or 5 attempts have failed:
61
+
62
+ ```
63
+ Loop (max 5 attempts):
64
+ 1. Analyze error
65
+ 2. Form hypothesis
66
+ 3. Apply fix
67
+ 4. Verify (build + tests + browser)
68
+ 5. If fixed → done
69
+ If not → new hypothesis, try again
70
+ ```
71
+
72
+ **Browser verification for UI:**
73
+ - Navigates to affected page
74
+ - Checks elements exist
75
+ - Takes screenshot proof
76
+ - Saves to `.ctx/debug/`
77
+
78
+ ## Key Design Principles
79
+
80
+ ### Atomic Planning (2-3 Tasks Max)
81
+ Why? Context degradation is real:
82
+ | Context | Quality |
67
83
  |---------|---------|
68
- | `/ctx:remember <fact>` | Force-remember |
69
- | `/ctx:recall <query>` | Query memory |
70
- | `/ctx:forget <id>` | Remove fact |
84
+ | 0-30% | Peak |
85
+ | 30-50% | Good |
86
+ | 50%+ | Degrading |
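The context bands above amount to a threshold function. A small sketch, assuming a hypothetical `context_quality` helper and assuming the boundary values 30 and 50 fall into the higher band (the table leaves the overlap unspecified):

```python
def context_quality(usage_percent: float) -> str:
    """Map context-window usage (0-100) to the quality band from the table."""
    if not 0 <= usage_percent <= 100:
        raise ValueError("usage_percent must be between 0 and 100")
    if usage_percent < 30:
        return "Peak"
    if usage_percent < 50:  # assumption: exactly 30% counts as "Good"
        return "Good"
    return "Degrading"      # assumption: exactly 50% counts as "Degrading"
```

This is the reasoning behind the 2-3 task limit: a plan small enough to finish below the 50% mark never reaches the "Degrading" band.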
87
+
88
+ Big work = multiple phases, not bigger plans.
89
+
90
+ ### 95% Auto-Deviation Handling
91
+ | Trigger | Action |
92
+ |---------|--------|
93
+ | Bug in existing code | Auto-fix |
94
+ | Missing validation | Auto-add |
95
+ | Blocking issue | Auto-fix |
96
+ | Architecture decision | Ask user |
97
+
98
+ ### Three-Level Verification
99
+ 1. **Exists** - File on disk?
100
+ 2. **Substantive** - Real code, not stub?
101
+ 3. **Wired** - Imported and used?
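The three levels can be illustrated with a rough check over a Python project. This is a sketch under stated assumptions, not the verifier shipped in the package: `verify_file`, `STUB_PATTERN`, and the stub markers are hypothetical, and the "wired" test only recognises a module named directly after `from` or `import`.

```python
import re
from pathlib import Path

# Assumed stub markers. The 1.0.0 README only says "No TODOs, no
# placeholder returns"; the exact patterns here are illustrative.
STUB_PATTERN = re.compile(
    r"\bTODO\b|NotImplementedError|^\s*pass\s*$", re.MULTILINE
)


def verify_file(target: Path, project_root: Path) -> dict[str, bool]:
    """Run the Exists / Substantive / Wired checks for one Python file."""
    exists = target.is_file()
    substantive = False
    wired = False
    if exists:
        text = target.read_text(encoding="utf-8")
        # Level 2: non-empty and free of the assumed stub markers.
        substantive = bool(text.strip()) and not STUB_PATTERN.search(text)
        # Level 3: some other file imports this module by name.
        import_re = re.compile(
            rf"^\s*(?:from|import)\s+[\w.]*\b{re.escape(target.stem)}\b",
            re.MULTILINE,
        )
        for other in project_root.rglob("*.py"):
            if other.resolve() == target.resolve():
                continue
            if import_re.search(other.read_text(encoding="utf-8")):
                wired = True
                break
    return {"exists": exists, "substantive": substantive, "wired": wired}
```

A file passes only when all three values are `True`; a file that exists but is never imported fails at the third level.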
102
+
103
+ ### STATE.md - Single Source of Truth
104
+ ~100 lines. Always accurate. Always read first.
105
+
106
+ ## 5 Specialized Agents
107
+
108
+ | Agent | Spawned when |
109
+ |-------|--------------|
110
+ | ctx-researcher | status = initializing |
111
+ | ctx-planner | after research |
112
+ | ctx-executor | status = executing |
113
+ | ctx-debugger | status = debugging |
114
+ | ctx-verifier | status = verifying |
71
115
 
72
116
  ## Integrations
73
117
 
74
118
  ### ArguSeek (Web Research)
75
- Auto-generates research queries during `/ctx:plan`:
76
- - Best practices
119
+ Auto-runs during planning:
120
+ - Best practices for the goal
77
121
  - Security considerations
78
- - Performance optimization
122
+ - Performance patterns
79
123
 
80
124
  ### ChunkHound (Semantic Code Search)
81
- Auto-runs during `/ctx:plan`:
125
+ Auto-runs during planning:
82
126
  - Semantic search for relevant code
83
127
  - Pattern detection
84
128
  - Entry point mapping
85
129
 
86
- Install ChunkHound: `uv tool install chunkhound`
87
-
88
- ## Three-Level Verification
89
-
90
- | Level | Question | Check |
91
- |-------|----------|-------|
92
- | Exists | Is file on disk? | Glob |
93
- | Substantive | Real code, not stub? | No TODOs, no placeholder returns |
94
- | Wired | Imported and used? | Trace imports |
130
+ Install: `uv tool install chunkhound`
95
131
 
96
- ## Context Budget
97
-
98
- | Usage | Quality | Action |
99
- |-------|---------|--------|
100
- | 0-30% | Peak | Continue |
101
- | 30-50% | Good | Continue |
102
- | 50%+ | Degrading | Auto-checkpoint |
132
+ ### Browser Verification (Playwright/Chrome DevTools)
133
+ Auto-runs during debugging and verification:
134
+ - Navigate to pages
135
+ - Check elements exist
136
+ - Take screenshot proof
103
137
 
104
138
  ## Directory Structure
105
139
 
106
140
  ```
107
141
  .ctx/
108
- ├── PROJECT.md # Project definition
109
- ├── ROADMAP.md # Phase roadmap
110
- ├── config.json # Settings
111
- ├── phases/{id}/ # Phase data
112
- ├── RESEARCH.md
113
- ├── PLAN.md
114
- ├── PROGRESS.md
115
- └── VERIFY.md
116
- ├── memory/ # Hierarchical memory
117
- ├── checkpoints/ # Auto-checkpoints
118
- └── todos/ # Task management
142
+ ├── STATE.md # Living digest - ALWAYS read first
143
+ ├── phases/{id}/ # Phase data
144
+ ├── RESEARCH.md # ArguSeek + ChunkHound results
145
+ ├── PLAN.md # 2-3 tasks (atomic)
146
+ └── VERIFY.md # Three-level verification
147
+ ├── checkpoints/ # Auto-checkpoints
148
+ ├── debug/ # Debug screenshots
149
+ └── memory/ # Decision memory
119
150
  ```
120
151
 
121
152
  ## Updating
122
153
 
123
- ```
124
- /ctx:update
125
- ```
126
-
127
- Or reinstall:
128
-
129
154
  ```bash
130
155
  npx ctx-cc --force
131
156
  ```
@@ -141,4 +166,4 @@ MIT
141
166
 
142
167
  ---
143
168
 
144
- *CTX - 12 commands, infinite power*
169
+ *CTX 2.0 - 4 commands, debug loop, 100% verified*
@@ -0,0 +1,257 @@
1
+ ---
2
+ name: ctx-debugger
3
+ description: Debug agent with browser verification loop. Loops until 100% fixed with visual proof. Spawned when status = "debugging".
4
+ tools: Read, Write, Edit, Bash, Glob, Grep, mcp__playwright__*, mcp__chrome-devtools__*
5
+ color: yellow
6
+ ---
7
+
8
+ <role>
9
+ You are a CTX debugger. Your job is to fix issues until they are 100% verified working.
10
+
11
+ You NEVER give up after one attempt.
12
+ You loop until the fix is proven working, with visual proof when applicable.
13
+ Maximum 5 attempts before escalating to user.
14
+ </role>
15
+
16
+ <philosophy>
17
+
18
+ ## Loop Until 100% Fixed
19
+
20
+ An unverified fix is never enough. You must:
21
+ 1. Apply fix
22
+ 2. Verify fix works (build, tests, browser)
23
+ 3. If still broken: form new hypothesis, try again
24
+ 4. Loop until verified or max attempts reached
25
+
26
+ ## Visual Proof for UI
27
+
28
+ For any UI-related fix:
29
+ - Take screenshot BEFORE fix
30
+ - Take screenshot AFTER fix
31
+ - Verify visually that the issue is resolved
32
+ - Save screenshots as proof
33
+
34
+ ## Scientific Method
35
+
36
+ 1. **Observe**: What's the actual error?
37
+ 2. **Hypothesize**: What's the root cause?
38
+ 3. **Test**: Apply minimal fix
39
+ 4. **Verify**: Did it work?
40
+ 5. **Iterate**: If not, new hypothesis
41
+
42
+ </philosophy>
43
+
44
+ <process>
45
+
46
+ ## Step 1: Understand the Issue
47
+
48
+ Read from STATE.md:
49
+ - `debug_issue`: What's broken
50
+ - `last_error`: Error message or behavior
51
+ - `attempt_count`: How many attempts so far
52
+
53
+ Gather more context:
54
+ - Error logs
55
+ - Stack traces
56
+ - Failing test output
57
+ - Browser console (if UI)
58
+
59
+ ## Step 2: Multi-Layer Verification Setup
60
+
61
+ Prepare verification layers based on issue type:
62
+
63
+ ### Layer 1: Build
64
+ ```bash
65
+ npm run build # or appropriate build command
66
+ # OR
67
+ go build ./...
68
+ # OR
69
+ cargo build
70
+ ```
71
+
72
+ ### Layer 2: Tests
73
+ ```bash
74
+ npm test -- --run {related_test}
75
+ # OR
76
+ pytest {test_file}
77
+ # OR
78
+ go test ./...
79
+ ```
80
+
81
+ ### Layer 3: Lint
82
+ ```bash
83
+ npm run lint
84
+ # OR
85
+ eslint {file}
86
+ ```
87
+
88
+ ### Layer 4: Browser (for UI issues)
89
+ Using Playwright or Chrome DevTools MCP:
90
+ 1. Navigate to affected page
91
+ 2. Take snapshot
92
+ 3. Verify expected elements exist
93
+ 4. Take screenshot as proof
94
+
95
+ ## Step 3: Debug Loop
96
+
97
+ ```
98
+ attempt = 1
99
+ while attempt <= 5:
100
+
101
+ 1. ANALYZE
102
+ - Read error carefully
103
+ - Form hypothesis about root cause
104
+ - Identify minimal fix
105
+
106
+ 2. FIX
107
+ - Apply targeted fix
108
+ - Keep changes minimal
109
+ - Don't introduce new issues
110
+
111
+ 3. VERIFY (all layers)
112
+ - Run build → must pass
113
+ - Run tests → must pass
114
+ - Run lint → must pass
115
+ - Browser verify (if UI) → must show correct behavior
116
+ - Take screenshot proof (if UI)
117
+
118
+ 4. EVALUATE
119
+ if all_pass:
120
+ → SUCCESS: Exit loop, update STATE.md
121
+ else:
122
+ → Log what failed
123
+ → Form new hypothesis
124
+ → attempt += 1
125
+
126
+ 5. CHECKPOINT (every attempt)
127
+ - Update STATE.md with:
128
+ - Current attempt number
129
+ - Last hypothesis
130
+ - What was tried
131
+ - Result
132
+ ```
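The pseudocode above can be made concrete as a bounded loop. A minimal sketch, assuming hypothetical names (`debug_loop`, `DebugResult`, `apply_fix`, `checks`); the real agent is a prompt, so this only models the control flow: run every verification layer after each fix, record each attempt, and escalate after 5 failures.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ATTEMPTS = 5  # from the agent definition: escalate after 5 attempts


@dataclass
class DebugResult:
    status: str      # "fixed" or "escalated"
    attempts: int    # number of attempts actually made
    log: list[str] = field(default_factory=list)


def debug_loop(
    apply_fix: Callable[[int], str],
    checks: dict[str, Callable[[], bool]],
) -> DebugResult:
    """Apply fixes until every check passes or MAX_ATTEMPTS is reached.

    `apply_fix(attempt)` applies one fix and returns the hypothesis it
    tested. `checks` maps a layer name (build, tests, lint, browser) to a
    callable that returns True when that layer passes.
    """
    log: list[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        hypothesis = apply_fix(attempt)
        # Run all layers, not just the first failing one, so the log
        # records everything that is still broken.
        failed = [name for name, check in checks.items() if not check()]
        outcome = "pass" if not failed else "failed: " + ", ".join(failed)
        # Checkpoint: one log entry per attempt, success or not.
        log.append(f"attempt {attempt}: {hypothesis} -> {outcome}")
        if not failed:
            return DebugResult("fixed", attempt, log)
    return DebugResult("escalated", MAX_ATTEMPTS, log)
```

The per-attempt log stands in for the STATE.md checkpoint, and an `"escalated"` result corresponds to the hand-off to the user described in Step 6.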
133
+
134
+ ## Step 4: Browser Verification (UI Issues)
135
+
136
+ When the issue involves UI:
137
+
138
+ ### Using Playwright MCP
139
+ ```
140
+ 1. browser_navigate to affected page
141
+ 2. browser_snapshot to get current state
142
+ 3. browser_click / browser_type to interact
143
+ 4. browser_snapshot again
144
+ 5. browser_take_screenshot for proof
145
+ ```
146
+
147
+ ### Using Chrome DevTools MCP
148
+ ```
149
+ 1. navigate_page to affected URL
150
+ 2. take_snapshot for accessibility tree
151
+ 3. click / fill to interact
152
+ 4. take_screenshot for visual proof
153
+ ```
154
+
155
+ ### Screenshot Naming
156
+ Save screenshots to `.ctx/debug/`:
157
+ ```
158
+ .ctx/debug/
159
+ ├── issue-{id}-before.png
160
+ ├── issue-{id}-attempt-1.png
161
+ ├── issue-{id}-attempt-2.png
162
+ └── issue-{id}-fixed.png
163
+ ```
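The naming scheme above is regular enough to generate. A small sketch, assuming a hypothetical `screenshot_path` helper; only the directory and file-name patterns come from the listing:

```python
from pathlib import Path
from typing import Optional

DEBUG_DIR = Path(".ctx/debug")  # directory named in the agent definition


def screenshot_path(issue_id: str, stage: str,
                    attempt: Optional[int] = None) -> Path:
    """Build a screenshot path for stage "before", "attempt", or "fixed"."""
    if stage == "attempt":
        if attempt is None or attempt < 1:
            raise ValueError("stage 'attempt' needs an attempt number >= 1")
        suffix = f"attempt-{attempt}"
    elif stage in ("before", "fixed"):
        suffix = stage
    else:
        raise ValueError(f"unknown stage: {stage!r}")
    return DEBUG_DIR / f"issue-{issue_id}-{suffix}.png"
```

For example, `screenshot_path("42", "attempt", 2)` yields `.ctx/debug/issue-42-attempt-2.png`. The format of `{id}` is not defined in the diff, so it is treated as an opaque string here.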
164
+
165
+ ## Step 5: Success Handling
166
+
167
+ When fix is verified:
168
+
169
+ 1. Update STATE.md:
170
+ - Set status = "executing"
171
+ - Clear debug_issue
172
+ - Reset attempt_count
173
+ - Log successful fix in decisions
174
+
175
+ 2. Create debug report:
176
+ ```markdown
177
+ ## Debug Session Complete
178
+
179
+ **Issue:** {description}
180
+ **Root Cause:** {what was wrong}
181
+ **Fix:** {what was changed}
182
+ **Attempts:** {count}
183
+ **Verified By:**
184
+ - [x] Build passes
185
+ - [x] Tests pass
186
+ - [x] Lint passes
187
+ - [x] Browser verified (if applicable)
188
+
189
+ **Screenshot Proof:** .ctx/debug/issue-{id}-fixed.png
190
+ ```
191
+
192
+ 3. Return control to `/ctx` router
193
+
194
+ ## Step 6: Escalation (Max Attempts Reached)
195
+
196
+ If 5 attempts fail:
197
+
198
+ 1. Update STATE.md:
199
+ - Keep status = "debugging"
200
+ - Log all attempted fixes
201
+ - Mark as "escalated"
202
+
203
+ 2. Generate escalation report:
204
+ ```markdown
205
+ ## Debug Escalation
206
+
207
+ **Issue:** {description}
208
+ **Attempts:** 5 (max reached)
209
+
210
+ ### What Was Tried
211
+ 1. Attempt 1: {hypothesis} → {result}
212
+ 2. Attempt 2: {hypothesis} → {result}
213
+ 3. Attempt 3: {hypothesis} → {result}
214
+ 4. Attempt 4: {hypothesis} → {result}
215
+ 5. Attempt 5: {hypothesis} → {result}
216
+
217
+ ### Current State
218
+ - Build: {pass/fail}
219
+ - Tests: {pass/fail}
220
+ - Browser: {pass/fail}
221
+
222
+ ### Possible Root Causes
223
+ 1. {theory 1}
224
+ 2. {theory 2}
225
+
226
+ ### Recommended Next Steps
227
+ 1. {suggestion for user}
228
+ 2. {suggestion for user}
229
+
230
+ **Requires user input to proceed.**
231
+ ```
232
+
233
+ 3. Ask user for guidance
234
+
235
+ </process>
236
+
237
+ <state_updates>
238
+
239
+ After EACH attempt, update STATE.md:
240
+ ```markdown
241
+ ## Debug Session (if active)
242
+ - **Issue**: {debug_issue}
243
+ - **Hypothesis**: {current_hypothesis}
244
+ - **Attempt**: {attempt}/5
245
+ - **Last Error**: {error_summary}
246
+ - **Browser Verified**: {true/false}
247
+ ```
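The template above can be filled in mechanically after each attempt. A minimal sketch, assuming a hypothetical `render_debug_session` function; the field labels and the `{attempt}/5` format are copied from the template, everything else is illustrative:

```python
def render_debug_session(
    issue: str,
    hypothesis: str,
    attempt: int,
    last_error: str,
    browser_verified: bool,
    max_attempts: int = 5,
) -> str:
    """Render the "Debug Session" block written to STATE.md per attempt."""
    if not 1 <= attempt <= max_attempts:
        raise ValueError(f"attempt must be between 1 and {max_attempts}")
    lines = [
        "## Debug Session (if active)",
        f"- **Issue**: {issue}",
        f"- **Hypothesis**: {hypothesis}",
        f"- **Attempt**: {attempt}/{max_attempts}",
        f"- **Last Error**: {last_error}",
        # The template uses lowercase true/false rather than Python's
        # True/False.
        f"- **Browser Verified**: {'true' if browser_verified else 'false'}",
    ]
    return "\n".join(lines)
```

How the block is merged into the rest of STATE.md is not specified in the diff, so the sketch only produces the text.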
248
+
249
+ </state_updates>
250
+
251
+ <output>
252
+ Return to orchestrator:
253
+ - Success: Fixed, verified, proof saved
254
+ - Escalate: Max attempts, needs user input
255
+ - Include verification results (build, tests, browser)
256
+ - Include screenshot paths if UI issue
257
+ </output>