create-ccc-tutor 0.1.0
- package/README.md +41 -0
- package/bin/cli.js +76 -0
- package/package.json +28 -0
- package/template/.claude/commands/abandon.md +7 -0
- package/template/.claude/commands/add-anti-flag.md +7 -0
- package/template/.claude/commands/add-constitution-clause.md +7 -0
- package/template/.claude/commands/audit-spec.md +7 -0
- package/template/.claude/commands/commit.md +7 -0
- package/template/.claude/commands/constitution-edit.md +7 -0
- package/template/.claude/commands/db-schema.md +7 -0
- package/template/.claude/commands/exam.md +66 -0
- package/template/.claude/commands/execution-plan.md +7 -0
- package/template/.claude/commands/feature-draft.md +7 -0
- package/template/.claude/commands/handoff.md +7 -0
- package/template/.claude/commands/implement.md +7 -0
- package/template/.claude/commands/init.md +7 -0
- package/template/.claude/commands/next.md +7 -0
- package/template/.claude/commands/offload.md +7 -0
- package/template/.claude/commands/pickup.md +7 -0
- package/template/.claude/commands/recall.md +7 -0
- package/template/.claude/commands/remember.md +7 -0
- package/template/.claude/commands/slide.md +87 -0
- package/template/.claude/commands/spec-finalize.md +7 -0
- package/template/.claude/commands/test-fix.md +7 -0
- package/template/.claude/commands/uninstall.md +7 -0
- package/template/.claude/settings.json +161 -0
- package/template/.claude-plugin/plugin.json +41 -0
- package/template/.codex/config.toml +24 -0
- package/template/.codex/hooks.json +4 -0
- package/template/.codex/install-skills.sh +18 -0
- package/template/.codex/skills/exam/SKILL.md +61 -0
- package/template/.codex/skills/slide/SKILL.md +69 -0
- package/template/.harness/agents/README.md +70 -0
- package/template/.harness/agents/_template/junior-agent-template.md +116 -0
- package/template/.harness/agents/backend-reviewer.md +153 -0
- package/template/.harness/agents/frontend-reviewer.md +158 -0
- package/template/.harness/agents/security-reviewer.md +148 -0
- package/template/.harness/agents/test-fixer.md +147 -0
- package/template/.harness/docs/doc-sync.md +29 -0
- package/template/.harness/docs/git-hygiene.md +56 -0
- package/template/.harness/docs/spec-model.md +47 -0
- package/template/.harness/docs/tool-map.md +120 -0
- package/template/.harness/docs/workflow.md +59 -0
- package/template/.harness/scripts/README.md +70 -0
- package/template/.harness/scripts/auditor-gate.sh +388 -0
- package/template/.harness/scripts/bootstrap-check.sh +103 -0
- package/template/.harness/scripts/budget-monitor.sh +223 -0
- package/template/.harness/scripts/check-prereqs.sh +165 -0
- package/template/.harness/scripts/checkpoint-recall.sh +136 -0
- package/template/.harness/scripts/checkpoint-write.sh +281 -0
- package/template/.harness/scripts/decision-log-append.sh +90 -0
- package/template/.harness/scripts/env-check.sh +286 -0
- package/template/.harness/scripts/format-edit.sh +80 -0
- package/template/.harness/scripts/lint-bans.sh +110 -0
- package/template/.harness/scripts/memory-archive.sh +129 -0
- package/template/.harness/scripts/memory-recall.sh +197 -0
- package/template/.harness/scripts/memory-snapshot.sh +124 -0
- package/template/.harness/scripts/post-migration.sh +58 -0
- package/template/.harness/scripts/precommit-cycles.sh +74 -0
- package/template/.harness/scripts/precommit-typecheck.sh +69 -0
- package/template/.harness/scripts/scratchpad-recall.sh +83 -0
- package/template/.harness/scripts/scratchpad-update.sh +39 -0
- package/template/.harness/scripts/standalone-bootstrap.md +443 -0
- package/template/.harness/skills/abandon/SKILL.md +157 -0
- package/template/.harness/skills/add-anti-flag/SKILL.md +205 -0
- package/template/.harness/skills/add-constitution-clause/SKILL.md +244 -0
- package/template/.harness/skills/audit-spec/SKILL.md +395 -0
- package/template/.harness/skills/commit/SKILL.md +270 -0
- package/template/.harness/skills/constitution-edit/SKILL.md +292 -0
- package/template/.harness/skills/db-schema/SKILL.md +145 -0
- package/template/.harness/skills/db-schema/references/methodology.md +202 -0
- package/template/.harness/skills/execution-plan/SKILL.md +346 -0
- package/template/.harness/skills/feature-draft/SKILL.md +426 -0
- package/template/.harness/skills/handoff/SKILL.md +211 -0
- package/template/.harness/skills/implement/SKILL.md +355 -0
- package/template/.harness/skills/init/SKILL.md +805 -0
- package/template/.harness/skills/next/SKILL.md +245 -0
- package/template/.harness/skills/offload/SKILL.md +134 -0
- package/template/.harness/skills/pickup/SKILL.md +213 -0
- package/template/.harness/skills/recall/SKILL.md +159 -0
- package/template/.harness/skills/remember/SKILL.md +205 -0
- package/template/.harness/skills/spec-finalize/SKILL.md +196 -0
- package/template/.harness/skills/test-fix/SKILL.md +363 -0
- package/template/.harness/skills/uninstall/SKILL.md +370 -0
- package/template/.harness/state/install.json +83 -0
- package/template/AGENTS.md +262 -0
- package/template/CCC_MAGI_LICENSE +201 -0
- package/template/CCC_MAGI_README.md +986 -0
- package/template/CLAUDE.md +658 -0
- package/template/codex.md +39 -0
- package/template/constitution.md +164 -0
- package/template/course/README.md +15 -0
- package/template/course/course_code(example)/exam/README.md +2 -0
- package/template/course/course_code(example)/slide/slide_example-1.pdf +40 -0
- package/template/course/course_code(example)/slide/slide_example-2.pdf +40 -0
- package/template/docs/features/slide-query-implementation.md +79 -0
- package/template/docs/features/slide-query.md +211 -0
- package/template/docs-harness/README.md +42 -0
- package/template/docs-harness/adoption-playbook.md +373 -0
- package/template/docs-harness/ccc-step1-driver-template.md +288 -0
- package/template/docs-harness/cli-configs-README.md +78 -0
- package/template/docs-harness/context-architecture-v2.md +249 -0
- package/template/docs-harness/design-spec.md +437 -0
- package/template/docs-harness/memory-layer.md +135 -0
- package/template/docs-harness/retrospective-notes.md +204 -0
- package/template/gitignore +106 -0
@@ -0,0 +1,78 @@
# CLI Integration Layer (`outcome/cli-configs/`)

This directory holds the **per-CLI integration configs**: the hooks, permissions, and runtime settings that each CLI (Claude Code, Codex) needs in order to load this harness.

## Why this is split per-CLI

Each CLI has its own config format and conventions:

| CLI | Config files | Format |
|-----|-------------|--------|
| Claude Code | `.claude/settings.json` | JSON |
| Codex | `.codex/config.toml`, `.codex/hooks.json` | TOML + JSON |

The **content** (hook semantics, permission intents) is largely shared. The **format** is per-CLI. In Round 1 of the harness cleanup we keep them as separate templates; in Round 2 the plan is a single "settings spec" plus adapters that render each CLI's format. For now: 3 files, separately maintained.

## What's in here

```
cli-configs/
├── README.md                  ← this file
├── claude/
│   └── settings.json          ← Claude Code settings template
└── codex/
    ├── config.toml            ← Codex model + MCP servers config
    └── hooks.json             ← Codex hook config (typically same shape as claude/)
```

## Installation flow

When `/init` runs in a user's project, it:

1. Reads slot values (`{{auditor_model_id}}`, `{{junior_reviewers}}`, etc.) from the `constitution.md` registry
2. Renders the templates above with slot values filled in
3. Writes them to the user's project at:
   - `.claude/settings.json`
   - `.codex/config.toml`
   - `.codex/hooks.json`
4. Creates the `.harness/scripts/` directory with the project's hook scripts (typecheck / lint-bans / cycles / format / post-migration / auditor-gate). **These scripts are project-specific**: the harness ships templates / examples in `outcome/cli-configs/scripts-template/` (TBD in Round 2), and the user customizes them based on their stack.
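
Step 2 of the flow above (slot rendering) can be sketched as follows. This is a minimal illustration in Python, not the harness's own `/init` logic: the function name `render_template`, the regular expression, and the sample values are assumptions made for the example. Only the `{{slot_name}}` syntax comes from this README.

```python
import re

# Matches {{slot_name}} placeholders as they appear in the templates.
SLOT = re.compile(r"\{\{(\w+)\}\}")


def render_template(text: str, slots: dict) -> str:
    """Replace every {{name}} with slots[name]; fail loudly on unknown slots."""

    def fill(match: re.Match) -> str:
        name = match.group(1)
        if name not in slots:
            raise KeyError("no value for slot " + name)
        return slots[name]

    return SLOT.sub(fill, text)


# Hypothetical template line and slot value, for illustration only.
template = 'model = "{{auditor_model_id}}"'
rendered = render_template(template, {"auditor_model_id": "example-model"})
print(rendered)
```

Raising on a missing slot, rather than leaving the placeholder in place, is a design choice of this sketch: a half-rendered config file is harder to diagnose than a failed install.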

## What was removed from the original

The harness's source files (`harness/claude/settings.json`, `harness/codex/config.toml`, `harness/codex/hooks.json`) contained:

- **User-personal telemetry hooks** (CCC PreToolUse / Stop / Notification with `sessionId=2 port=52301` and HTTP calls to localhost) → REMOVED. That was the original author's local dev tooling, not part of the harness.
- **User-personal statusLine** (pointing to `/var/folders/...` temp file) → REMOVED.
- **Supabase MCP permissions and server URL** (with a specific project_ref) → REMOVED, replaced with a commented example in `codex/config.toml`.
- **Specific model name `gpt-5.5`** → slot-ified as `{{auditor_model_id}}`.

## Required hook scripts

The templates reference these script paths. The harness expects them to exist in the user's project at `.harness/scripts/`:

| Script | Purpose | When invoked |
|--------|---------|--------------|
| `precommit-typecheck.sh` | Block commit on type/syntax errors | PreToolUse on `git commit` |
| `lint-bans.sh` | Block commit on anti-flag pattern hits | PreToolUse on `git commit` |
| `precommit-cycles.sh` | Block commit on dependency cycles (only if `dependency_flow` non-empty) | PreToolUse on `git commit` |
| `format-edit.sh` | Run formatter on edited files | PostToolUse on `Edit` / `Write` |
| `post-migration.sh` | Refresh caches + regenerate types after migration | Manual (referenced in `/db-schema`) |
| `auditor-gate.sh` | Invoke the auditor CLI ({{auditor_model}}) with structured output | Manual (referenced in every audit-gated skill) |

These are **project-specific**: the user fills them with their stack's actual commands (e.g., `tsc --noEmit` for TypeScript, `mypy` for Python).

## MCP server defaults

The harness ships with:

- **`context7`** (Upstash): universal docs lookup; high value across stacks
- **`github`**: optional; useful for any project that uses GitHub
- **Project-specific MCPs**: the user adds these per stack. Examples:
  - Backend MCPs (e.g., Supabase MCP, Postgres MCP, MongoDB MCP), only if `backend_db_type` is configured
  - SaaS MCPs (Stripe, Sentry, etc.) per the project's external integrations

## Permissions philosophy

The `permissions.allow` list controls which MCP tool calls Claude Code may execute without prompting. The harness ships a minimal default: `context7` only. Other MCPs require explicit user opt-in at `/init` to avoid silent over-permissioning.

If you want a different default, edit `claude/settings.json` here before running `/init`.
@@ -0,0 +1,249 @@
# Context Architecture v2: CCC-MAGI

> **Status**: Active design (rolled out 2026-05). Supersedes the v1 single-tier observations layer.
> **Audience**: Harness contributors + power users wanting to understand or tune the memory layer.
> **Companion files**: `CLAUDE.md § Memory Calling Rules` (operational), `outcome/scripts/memory-*.sh` (mechanism), the individual `SKILL.md` files for `/handoff`, `/recall`, `/offload` (UX).

## 1. The problem v2 solves

v1's `.harness/memory/observations.jsonl` is a single flat append-only file. Three failure modes accumulated:

- **Recency bias squeezes out durable wisdom.** Scoring is `+5 feature-match, +1 within 7 days`. After 6 months of project work, old foundational decisions (e.g., "we don't use Redux, ever") lose the recency bonus and stop being recalled, even though they remain load-bearing rules.
- **Eager injection wastes tokens.** SessionStart injects up to ~2K tokens of recalled entries regardless of whether the user's current task needs them. On unrelated work (refactor, doc edit, off-topic Q&A) this is pure overhead.
- **Binary `consumed` flag for snapshots is brittle.** If session N+1 happens to start on a different feature than session N's snapshot, the snapshot is read and consumed for no benefit; when the user returns to the original feature later, the snapshot is gone.
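
The first failure mode can be reproduced with a few lines of Python. This is a sketch under stated assumptions: only the two weights (`+5` feature match, `+1` within 7 days) come from this document. The function `v1_score`, the entry fields, and the sample entries are hypothetical, and the real v1 script may have scored differently in detail.

```python
from datetime import date


def v1_score(entry: dict, current_feature: str, today: date) -> int:
    """Score an entry using the two v1 weights quoted above."""
    score = 0
    if entry["feature"] == current_feature:
        score += 5  # feature-match bonus
    if (today - entry["date"]).days <= 7:
        score += 1  # recency bonus
    return score


today = date(2026, 5, 30)

# Hypothetical entries: a six-month-old foundational rule and a trivial recent note.
foundational = {"summary": "We don't use Redux, ever",
                "feature": "architecture", "date": date(2025, 11, 30)}
recent_note = {"summary": "Renamed a CSS class",
               "feature": "ui", "date": date(2026, 5, 28)}

# While working on an unrelated feature, the trivial note outranks the durable rule.
ranked = sorted([foundational, recent_note],
                key=lambda e: v1_score(e, "auth", today), reverse=True)
print([e["summary"] for e in ranked])
```

Neither entry matches the current feature, so the only signal left is recency, and the durable rule sorts last.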

v2 reorganizes memory into a 3-tier (Letta-style) structure with explicit calling rules and just-in-time recall.

## 2. The 3 tiers

```
┌─────────────────────────────────────────────────────────────────┐
│ Tier 1: Working memory (always in context, ~500 tokens)         │
│   Location: .harness/state/scratchpad.md                        │
│   Lifecycle: AI rewrites every turn (Stop hook); read at        │
│              SessionStart                                       │
│   Stores: current objective / last step / next step / blockers  │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Tier 2: Recall memory (manifest in context, ~500-1000 tokens)   │
│   Location: .harness/memory/sessions/recall/                    │
│     ├── observations.jsonl (PreCompaction + /remember writes)   │
│     └── snapshots.jsonl    (/handoff writes)                    │
│   Lifecycle: ≤ 30 days. Manifest (id+focus+date+feature) at     │
│              SessionStart; body loaded on /recall <id>.         │
│   Stores: recent decisions / session snapshots / observations   │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Tier 3: Archival memory (not in context, on-demand only)        │
│   Location: .harness/memory/sessions/archive/<YYYY-MM>.jsonl    │
│   Lifecycle: Entries older than 30 days migrated by             │
│              memory-archive.sh                                  │
│   Stores: same schema as Tier 2, just cold storage              │
│   Access: /recall --deep <query> (grep-based, no vector DB)     │
└─────────────────────────────────────────────────────────────────┘

╔═════════════════════════════════════════════════════════════════╗
║ Shared (committed to git, team-wide)                            ║
║   ─ .harness/memory/conventions.md  ─ long-form rules           ║
║   ─ .harness/memory/decisions.jsonl ─ /remember writes here     ║
╚═════════════════════════════════════════════════════════════════╝
```
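
The Tier 2 to Tier 3 migration in the diagram (entries older than 30 days move into a per-month archive file) can be sketched like this. It is an illustrative Python version under stated assumptions, not the contents of `memory-archive.sh`: the function names are hypothetical, entries are assumed to carry an ISO-8601 `ts` field as in the snapshot schema, and the sketch works on in-memory lists rather than files.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2026-05-30T18:30:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def split_for_archive(entries: list, now: datetime, max_age_days: int = 30):
    """Return (entries to keep in recall, {YYYY-MM: entries to archive})."""
    cutoff = now - timedelta(days=max_age_days)
    keep = []
    archive = defaultdict(list)
    for entry in entries:
        ts = parse_ts(entry["ts"])
        if ts < cutoff:
            # The bucket name mirrors the archive/<YYYY-MM>.jsonl layout.
            archive[ts.strftime("%Y-%m")].append(entry)
        else:
            keep.append(entry)
    return keep, dict(archive)


# Hypothetical entries for illustration.
now = datetime(2026, 6, 15, tzinfo=timezone.utc)
entries = [
    {"id": "SS-2026053001", "ts": "2026-05-30T18:30:00Z"},
    {"id": "OBS-2026040201", "ts": "2026-04-02T09:00:00Z"},
]
keep, archive = split_for_archive(entries, now)
print([e["id"] for e in keep], sorted(archive))
```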

### Why 3 tiers (not 2, not 4)?

- **Working** keeps multi-step task focus (anti-distraction). It needs to be always available, so it pays the constant 500-token cost.
- **Recall** is the day-to-day memory. The manifest must be cheap to scan, because loading a body is expensive. The 30-day window matches "human working memory": what you can plausibly remember about a project.
- **Archival** is the institutional memory: pay only when it is actually queried. Without this tier, the recall manifest would grow unbounded; with it, the manifest stays at 5-15 entries no matter how long the project runs.

A 4th tier (e.g., vector-indexed semantic search) was considered and rejected. A vector DB introduces dependencies (embeddings model, runtime) and makes the memory non-grepable, while the empirical gain over grep-based archival recall is small for projects below ~10K total entries. We accept weaker retrieval to preserve the "files-on-disk, no dependencies" property.

## 3. AI calling rules (HARD)

Without explicit calling rules, the AI will fetch recall bodies "for completeness" and waste tokens. The rules below are **hard constraints written into CLAUDE.md**, validated at runtime via skill prompts.

### 3.1 Tier 1 (Working / scratchpad)
- **Always read at SessionStart** (no choice).
- **Always rewrite at end of each turn** via Stop hook.
- The AI must not "skip" the rewrite: an empty scratchpad is acceptable, a half-stale one is not.

### 3.2 Tier 2 (Recall)

**Manifest is always injected at SessionStart** (zero choice for AI).

**Body fetch via `/recall <id>` triggers ONLY when**:
1. User explicitly references prior context: "之前 / 上次 / 上回 / 我们之前定的" (Chinese for "before / last time / the previous time / what we decided earlier"), or "last time / before / previously"
2. Current task's `feature` tag **exactly matches** a manifest entry's `feature`
3. AI is about to make a decision in an area where a manifest summary indicates a relevant prior decision (overlap, not mere relatedness)

**Hard cap**: ≤ 3 recall body fetches per session.

### 3.3 Tier 3 (Archive)

**No injection at SessionStart** (zero overhead).

**`/recall --deep <query>` triggers ONLY when**:
1. User explicitly asks: "查一下半年前" (Chinese for "look up what happened six months ago"), or "search history / older / archive"
2. Current task's `feature` is not present in the recall manifest
3. Code being edited has `git blame` older than 30 days (suggests pre-recall-window logic)

**Hard cap**: ≤ 1 archive search per session.
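
A grep-style archive lookup needs nothing beyond line-by-line substring matching over the JSONL files. The sketch below is an assumption-laden illustration in Python, not the `/recall --deep` implementation: the function name `deep_search`, the case-insensitive match, and the sample lines are all hypothetical.

```python
import json


def deep_search(archive_lines: list, query: str) -> list:
    """Return parsed entries whose raw JSONL line contains the query (case-insensitive)."""
    needle = query.lower()
    hits = []
    for line in archive_lines:
        if needle in line.lower():
            hits.append(json.loads(line))
    return hits


# Hypothetical archive content, one JSON object per line.
archive_lines = [
    '{"id": "OBS-2026040201", "feature": "ui", "summary": "Use Tailwind not styled-components"}',
    '{"id": "OBS-2026041501", "feature": "auth", "summary": "Rotate session keys weekly"}',
]
hits = deep_search(archive_lines, "tailwind")
print([h["id"] for h in hits])
```

Matching on the raw line keeps the behaviour close to plain `grep`: any field can produce a hit, and no index has to be built or maintained.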

### 3.4 Prohibitions (also HARD)

- ❌ "For completeness" fetches without one of the triggers above
- ❌ Fetching multiple recall bodies "just in case"
- ❌ Searching archive when user's question is clearly about current work
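
The per-session caps from sections 3.2 and 3.3, combined with the "no trigger, no fetch" prohibition, amount to a small counter. The following Python sketch is illustrative only: this document says the rules are enforced through CLAUDE.md and skill prompts, so the `SessionBudget` class and its method names are hypothetical and only the limits (3 and 1) come from the text.

```python
from dataclasses import dataclass

MAX_RECALL_FETCHES = 3    # section 3.2: at most 3 recall body fetches per session
MAX_ARCHIVE_SEARCHES = 1  # section 3.3: at most 1 archive search per session


@dataclass
class SessionBudget:
    recall_fetches: int = 0
    archive_searches: int = 0

    def try_recall(self, has_trigger: bool) -> bool:
        """Allow a /recall <id> fetch only with a trigger and remaining budget."""
        if not has_trigger or self.recall_fetches >= MAX_RECALL_FETCHES:
            return False
        self.recall_fetches += 1
        return True

    def try_archive_search(self, has_trigger: bool) -> bool:
        """Allow a /recall --deep search only with a trigger and remaining budget."""
        if not has_trigger or self.archive_searches >= MAX_ARCHIVE_SEARCHES:
            return False
        self.archive_searches += 1
        return True


budget = SessionBudget()
results = [budget.try_recall(has_trigger=True) for _ in range(4)]
print(results)
```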

## 4. Trigger surfaces

| Trigger | Threshold | Action | UX |
|---|---|---|---|
| Budget pressure | 50% | Soft advisory (existing) | `additionalContext` text only |
| Budget pressure | 75% | Firm advisory + offer `/offload <task>` | 4-option menu at end-of-turn |
| Budget pressure | 95% | 3-option menu: `/compact` / `/handoff` / continue | Menu deferred to end-of-turn via Stop hook |
| End-of-turn | Every turn | Scratchpad rewrite | Stop hook fires `scratchpad-update.sh` |
| Session start | Every session | Inject scratchpad + recall manifest + checkpoint | SessionStart hooks fire in order |
| PreCompaction | Auto-compact triggered | Harvest checkpoint+decision-log into snapshot | Deterministic (no LLM call) |
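
The three budget-pressure rows reduce to a threshold lookup. The Python sketch below is a hedged illustration, not the logic of `budget-monitor.sh`: the function name and the returned labels are hypothetical, and treating each threshold as "greater than or equal" is an assumption the table does not state.

```python
def budget_action(used_fraction: float) -> str:
    """Map context-budget usage (0.0 to 1.0) to the action named in the table above."""
    if used_fraction >= 0.95:
        return "menu: /compact, /handoff, or continue"
    if used_fraction >= 0.75:
        return "firm advisory + offer /offload"
    if used_fraction >= 0.50:
        return "soft advisory"
    return "none"


for usage in (0.30, 0.60, 0.80, 0.97):
    print(usage, "->", budget_action(usage))
```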

## 5. File layout

```
.harness/
├── memory/
│   ├── conventions.md                   [shared, committed]
│   ├── decisions.jsonl                  [shared, committed]
│   └── sessions/                        [personal, gitignored]
│       ├── recall/
│       │   ├── observations.jsonl
│       │   └── snapshots.jsonl
│       └── archive/
│           ├── 2026-04.jsonl
│           └── 2026-05.jsonl
├── state/
│   ├── scratchpad.md                    [personal, gitignored]
│   ├── _handoff-offered/<sid>.flag      [personal, gitignored]
│   └── _handoff-dismissed/<sid>.flag    [personal, gitignored]
└── scripts/
    ├── memory-recall.sh                 [modified: manifest mode]
    ├── memory-snapshot.sh               [modified: deterministic harvest]
    ├── memory-archive.sh                [NEW]
    ├── scratchpad-update.sh             [NEW]
    ├── scratchpad-recall.sh             [NEW]
    └── budget-monitor.sh                [modified: 95% menu + token accuracy]

outcome/skills/
├── recall/                              [NEW]
├── handoff/                             [NEW]
└── offload/                             [NEW]
```

## 6. Schemas

### 6.1 Recall manifest entry (in-context, ~80 tokens each)

```
[<id>] feature=<f> kind=<k> date=<YYYY-MM-DD> focus="<≤80 chars>"
```

Example:
```
[SS-2026053001] feature=auth kind=session-snapshot date=2026-05-30 focus="OTP race condition in middleware"
[OBS-2026052901] feature=ui kind=decision date=2026-05-29 focus="Use Tailwind not styled-components"
```
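
A manifest line in this format can be parsed with a single regular expression. The Python sketch below is illustrative: the pattern and the function name `parse_manifest_line` are assumptions derived from the format string above, not code taken from the harness.

```python
import re

# Pattern derived from: [<id>] feature=<f> kind=<k> date=<YYYY-MM-DD> focus="<...>"
MANIFEST_LINE = re.compile(
    r'^\[(?P<id>[^\]]+)\] feature=(?P<feature>\S+) kind=(?P<kind>\S+) '
    r'date=(?P<date>\d{4}-\d{2}-\d{2}) focus="(?P<focus>.*)"$'
)


def parse_manifest_line(line: str):
    """Return the fields of a manifest line as a dict, or None if it does not match."""
    match = MANIFEST_LINE.match(line.strip())
    return match.groupdict() if match else None


entry = parse_manifest_line(
    '[SS-2026053001] feature=auth kind=session-snapshot '
    'date=2026-05-30 focus="OTP race condition in middleware"'
)
print(entry)
```

With the fields parsed, the exact-match rule from section 3.2 becomes a plain string comparison on `entry["feature"]`.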

### 6.2 Snapshot entry (Tier 2 body, ~2-3KB)

```json
{
  "id": "SS-2026053001",
  "ts": "2026-05-30T18:30:00Z",
  "kind": "session-snapshot",
  "feature": "auth",
  "focus": "Resolve OTP race condition in middleware",
  "decisions": [
    {"id": "d-001", "rule": "WHEN form submits with code, THE SYSTEM SHALL validate before navigation"}
  ],
  "open_problems": [
    {"id": "p-001", "what": "Concurrent submissions cause double-charge", "blocked_by": "need DB advisory lock"}
  ],
  "next_intent": "Implement advisory lock in src/auth/middleware.ts",
  "files_touched": [
    {"path": "src/auth/middleware.ts", "why": "added validation hook"}
  ],
  "prev_session_id": "SS-2026052801",
  "source": "handoff"
}
```
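
Going the other way, a manifest line can be derived from a snapshot body by keeping only the four manifest fields. This Python sketch is an assumption, not the behaviour of `memory-recall.sh`: the function name `manifest_line` is hypothetical, the date is taken from the first 10 characters of `ts`, and `focus` is cut at 80 characters to respect the limit stated in section 6.1.

```python
def manifest_line(entry: dict) -> str:
    """Render a Tier 2 entry as a one-line manifest summary."""
    focus = entry.get("focus", "")[:80]  # manifest focus is limited to 80 characters
    day = entry["ts"][:10]               # YYYY-MM-DD prefix of the ISO timestamp
    return (f'[{entry["id"]}] feature={entry["feature"]} '
            f'kind={entry["kind"]} date={day} focus="{focus}"')


# Subset of the snapshot shown above; the other fields are not needed for the manifest.
snapshot = {
    "id": "SS-2026053001",
    "ts": "2026-05-30T18:30:00Z",
    "kind": "session-snapshot",
    "feature": "auth",
    "focus": "Resolve OTP race condition in middleware",
}
line = manifest_line(snapshot)
print(line)
```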

### 6.3 Observation entry (Tier 2 body, unchanged from v1)

```json
{
  "id": "OBS-2026052901",
  "ts": "...", "kind": "decision|failure|observation",
  "summary": "...", "details": "...",
  "feature": "...", "files": [...], "tags": [...],
  "source": "manual|session"
}
```

Note: `id` is new in v2; old entries get IDs back-filled by `memory-archive.sh` on first run.

### 6.4 Scratchpad (Tier 1)

```markdown
# Working Scratchpad
> Rewritten by AI at end of every turn. Read at SessionStart.

## Current objective
<one sentence>

## Last step taken
<what just finished>

## Next step
<what I plan to do next, before user input>

## Blockers / open questions
- <bullet>
- <bullet>

## Decision-relevant context (optional, ≤3 bullets)
- <bullet>
```

Hard cap: 500 tokens total. If the AI's rewrite exceeds this, the Stop hook truncates the oldest sections.
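
One way to enforce such a cap is to drop whole `##` sections until the estimate fits. The Python sketch below rests on two loud assumptions that this document does not confirm: tokens are estimated as characters divided by 4 (a crude heuristic), and sections are dropped from the bottom of the file, since the text does not define which sections count as "oldest". The function names are hypothetical and this is not the logic of `scratchpad-update.sh`.

```python
import re


def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): roughly 4 characters per token."""
    return len(text) // 4


def enforce_cap(markdown: str, cap: int = 500) -> str:
    """Drop trailing '## ' sections until the estimate is within the cap."""
    # Split before every line that starts a level-2 section, keeping the headings.
    header, *sections = re.split(r"(?m)^(?=## )", markdown)
    while sections and estimate_tokens(header + "".join(sections)) > cap:
        sections.pop()  # assumption: trailing sections are dropped first
    return header + "".join(sections)


# Hypothetical oversized scratchpad: three sections of about 100 tokens each.
pad = "# Working Scratchpad\n" + "".join(
    f"## {name}\n" + "x" * 400 + "\n" for name in ("A", "B", "C")
)
trimmed = enforce_cap(pad, cap=250)
print(estimate_tokens(pad), estimate_tokens(trimmed))
```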

## 7. Tier upgrade rationale (against 2026 SOTA)

| Component | v1 tier | v2 tier | Why |
|---|---|---|---|
| `memory-recall.sh` | B | A | Manifest-only injection + JIT recall (Anthropic Memory Tool pattern) |
| `memory-snapshot.sh` | C+ | A | Deterministic harvest (no LLM call competing with Sonnet 4.5+ native compaction) |
| Memory layer overall | C | A | Letta 3-tier + decay + agent-driven retrieval |
| Working scratchpad | (none) | A | Manus recitation pattern, anti-drift |
| Tool result clearing | (none) | A | Anthropic `clear_tool_uses_20250919`, single config line |
| Subagent offload UX | (none) | A | Budget-pressure release valve, Anthropic 90.2% lift pattern |

## 8. Rollout (Phase plan)

- **Phase 0**: Enable `clear_tool_uses_20250919`. ← *single settings line, lowest cost, highest immediate ROI*
- **Phase 1**: This document.
- **Phase 2**: Implement 3-tier file layout + `memory-recall.sh` manifest mode + `/recall` skill + archival cron + AI calling rules in CLAUDE.md.
- **Phase 3**: `/handoff` skill + 95% menu in `budget-monitor.sh` + accurate token accounting (read transcript `usage` instead of byte/4).
- **Phase 4**: `scratchpad.md` + `scratchpad-update.sh` (Stop hook) + `scratchpad-recall.sh` (SessionStart hook).
- **Phase 5**: `/offload` skill + 75% menu option in `budget-monitor.sh`.

## 9. Rollback

Each phase can be rolled back independently:

- **Phase 0**: Remove `"betas"` and `"context_management"` from settings.json.
- **Phase 2**: Move files from `sessions/recall/` back up to `.harness/memory/observations.jsonl`. Restore the old `memory-recall.sh`.
- **Phase 3-5**: Remove the corresponding skill + revert the changed hook scripts.

Snapshots / observations / scratchpad are plain text files at all times, so none of the changes lock data into a proprietary format. In the worst case, a rollback recovers the data by concatenating the files with `cat`.

## 10. Test plan

See `outcome-test-bed/` for the test fixture and `outcome-test-bed/test-runner.sh` for the 8-case test suite (T1-T8) that validates every phase end-to-end.