thought-cabinet 0.2.0 → 0.2.1

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "thought-cabinet",
-   "version": "0.2.0",
+   "version": "0.2.1",
    "description": "Thought Cabinet (thc) — CLI for structured AI coding workflows with filesystem-based memory and context management.",
    "type": "module",
    "main": "dist/index.js",
@@ -144,6 +144,12 @@ After structure approval:
 
  2. **Write plan** using [plan-template.md](plan-template.md)
  - **MUST** Read the template and follow the structure exactly.
+ - **TDD compatibility check**: For every change block, verify:
+   - `Testable Behaviors` appears **before** `Reference Implementation`
+   - Each testable behavior bullet is specific enough to write a failing test from (includes input, condition, and expected output/behavior)
+   - Each bullet maps to exactly one test — split compound behaviors
+   - The code block is labeled "Reference Implementation", not "Code to write"
+ - If a change block has no conditional logic, no data transformation, and is a pure pass-through, it may omit testable behaviors — document why.
 
  3. **Sync thoughts directory**:
  ```bash
@@ -50,12 +50,24 @@
 
  **File**: `path/to/file.ext`
  **Changes**: [Summary of changes]
- **Testable behaviors**: [List the behaviors this change introduces or modifies — these become TDD RED tests during implementation]
+
+ ##### Testable Behaviors (RED tests)
+
+ > Each bullet is one TDD RED test. `implementing-plan` writes each test first, watches it fail, then writes the minimal code to pass it.
+
+ - [Input/condition] → [expected output/behavior]
+ - [Edge case] → [expected behavior]
+ - [Error case] → [expected fallback]
+
+ ##### Reference Implementation
 
  ```[language]
- // Specific code to add/modify
+ // Suggested implementation, consulted only AFTER the RED tests have been written and turned GREEN.
+ // implementing-plan must not read this before writing the failing tests.
  ```
 
+ ---
+
  ### Success Criteria:
 
  #### Automated Verification:
@@ -81,18 +93,11 @@
 
  ---
 
- ## Testing Strategy
-
- ### Unit Tests:
-
- - [What to test]
- - [Key edge cases]
+ ## Integration Testing
 
- ### Integration Tests:
+ [End-to-end scenarios that require multiple components working together — not covered by unit tests above]
 
- - [End-to-end scenarios]
-
- ### Manual Testing Steps:
+ ## Manual Testing Steps
 
  1. [Specific verification step]
  2. [Edge case to test manually]
@@ -126,6 +131,26 @@ Always separate into two categories:
  - Performance under real conditions
  - User acceptance criteria
 
+ ## TDD Compatibility Requirements
+
+ When writing each change block, ask:
+
+ 1. **Are the testable behaviors specific enough to write a failing test from?**
+    - Bad: "handles null input"
+    - Good: "`envCreateTime=null` with cutoff set → returns `false` (safe fallback)"
+
+ 2. **Is the behavior written before the code block?**
+    - The testable behaviors section must appear before the reference implementation.
+    - The implementer reads behaviors first and writes the RED test before reading the code.
+
+ 3. **Does each bullet map to exactly one test?**
+    - Compound behaviors (A and B) → split into two bullets.
+    - Each bullet = one `def "..."()` / `it(...)` / `test(...)`.
+
+ 4. **Is the code block labeled "Reference Implementation"?**
+    - Never label it "Code to write" or "Implementation".
+    - The label signals it is consulted only after RED → GREEN, not before.
+
  ## Common Patterns
 
  ### Database Changes:
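
Reviewer sketch (not part of the package): the four TDD compatibility questions added in the hunk above can be partly checked mechanically. The TypeScript below is a minimal sketch under stated assumptions. `checkChangeBlock` is a hypothetical helper, it relies on the `##### Testable Behaviors` and `##### Reference Implementation` headings from the template, and it treats a bullet without a `→` as too vague, which is only a heuristic for "specific enough".

```typescript
// Hypothetical helper, not shipped in thought-cabinet: returns a list of
// TDD-compatibility problems found in one plan change block (markdown text).
function checkChangeBlock(block: string): string[] {
  const issues: string[] = [];
  const behaviorsAt = block.indexOf("##### Testable Behaviors");
  const referenceAt = block.indexOf("##### Reference Implementation");

  if (behaviorsAt === -1) {
    issues.push("missing Testable Behaviors section");
  }
  if (referenceAt === -1) {
    issues.push("code block is not labeled Reference Implementation");
  }
  if (behaviorsAt !== -1 && referenceAt !== -1 && behaviorsAt > referenceAt) {
    issues.push("Testable Behaviors must appear before Reference Implementation");
  }
  if (behaviorsAt === -1) {
    return issues;
  }

  // Bullets are read from the behaviors heading up to the reference heading,
  // or to the end of the block when that heading is absent or comes first.
  const end = referenceAt > behaviorsAt ? referenceAt : block.length;
  const bullets = block
    .slice(behaviorsAt, end)
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.slice(0, 2) === "- ");

  for (const bullet of bullets) {
    // Heuristic (assumption): a specific behavior reads "condition → expected".
    if (bullet.indexOf("→") === -1) {
      issues.push(`bullet is not in "condition → expected" form: ${bullet}`);
    }
  }
  return issues;
}

const goodBlock = [
  "##### Testable Behaviors (RED tests)",
  "- createTime null + cutoff set → returns false (safe fallback)",
  "",
  "##### Reference Implementation",
  "[code]",
].join("\n");

const vagueBlock = [
  "##### Testable Behaviors (RED tests)",
  "- handles null input",
  "",
  "##### Reference Implementation",
  "[code]",
].join("\n");

const goodIssues = checkChangeBlock(goodBlock); // no issues
const vagueIssues = checkChangeBlock(vagueBlock); // one issue: the vague bullet
```

A check like this cannot judge whether a bullet maps to exactly one test; that question still needs a human or agent reading the plan.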
@@ -90,9 +90,36 @@ How should I proceed?
  Before writing any production code for a phase:
 
  1. Read existing test files for the modules being changed (if not already read in Getting Started)
- 2. Identify the testable behaviors the phase introduces or changes
- 3. Apply the `test-driven-development` RED-GREEN-REFACTOR cycle for each behavior
- 4. Only after all TDD cycles are complete, proceed to the completion checklist below
+ 2. For each change block in the phase, read only the **Testable Behaviors** section — do NOT read the Reference Implementation yet
+ 3. For each testable behavior bullet, execute one RED-GREEN-REFACTOR cycle:
+    - **RED**: Write one failing test for that behavior. Run it. Confirm it fails for the right reason.
+    - **GREEN**: Write the minimal production code to pass it. Run it. Confirm it passes.
+    - **REFACTOR**: Clean up. Run tests. Stay green.
+ 4. Only after all behavior bullets have passing tests, read the Reference Implementation and reconcile — adjust your implementation if it diverges from the plan's intent, but do not delete passing tests.
+ 5. Proceed to the phase completion checklist.
+
+ ### How to Extract Work Items from a Plan Change Block
+
+ A change block looks like:
+
+ ```
+ ##### Testable Behaviors (RED tests)
+ - `cutoff empty` → `isEnvCreatedAfterCutoff` returns `true`
+ - `createTime after cutoff` → returns `true`
+ - `createTime before cutoff` → returns `false`
+ - `createTime null + cutoff set` → returns `false` (safe fallback)
+
+ ##### Reference Implementation
+ [code]
+ ```
+
+ Map this to a work queue:
+ 1. `def "isEnvCreatedAfterCutoff: cutoff empty returns true"()` → RED → GREEN
+ 2. `def "isEnvCreatedAfterCutoff: createTime after cutoff returns true"()` → RED → GREEN
+ 3. `def "isEnvCreatedAfterCutoff: createTime before cutoff returns false"()` → RED → GREEN
+ 4. `def "isEnvCreatedAfterCutoff: null createTime with cutoff set returns false"()` → RED → GREEN
+
+ Each bullet is one test. Complete all cycles for this change block before moving to the next.
 
  ## Phase Completion Checklist
 
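
Reviewer sketch (not part of the package): the work queue in the hunk above names one test per behavior bullet. The TypeScript below shows roughly where the four cycles might end up. The signature of `isEnvCreatedAfterCutoff` is an assumption, since the plan excerpt gives only its name: times are taken to be ISO-8601 strings, an empty cutoff string means no cutoff, and a timestamp equal to the cutoff counts as not after it.

```typescript
// Assumed signature (the plan excerpt names the function but not its types).
function isEnvCreatedAfterCutoff(createTime: string | null, cutoff: string): boolean {
  if (cutoff === "") return true; // bullet 1: cutoff empty
  if (createTime === null) return false; // bullet 4: safe fallback
  return Date.parse(createTime) > Date.parse(cutoff); // bullets 2 and 3
}

interface BehaviorTest {
  name: string;
  run: () => boolean; // true when the behavior holds
}

// One entry per behavior bullet. In the workflow described above, each entry
// is written and seen failing (RED) before the matching line of the function
// exists (GREEN).
const workQueue: BehaviorTest[] = [
  {
    name: "cutoff empty returns true",
    run: () => isEnvCreatedAfterCutoff("2024-01-01T00:00:00Z", "") === true,
  },
  {
    name: "createTime after cutoff returns true",
    run: () => isEnvCreatedAfterCutoff("2024-02-01T00:00:00Z", "2024-01-01T00:00:00Z") === true,
  },
  {
    name: "createTime before cutoff returns false",
    run: () => isEnvCreatedAfterCutoff("2023-12-01T00:00:00Z", "2024-01-01T00:00:00Z") === false,
  },
  {
    name: "null createTime with cutoff set returns false",
    run: () => isEnvCreatedAfterCutoff(null, "2024-01-01T00:00:00Z") === false,
  },
];

// Names of the tests that currently fail; an empty array means all GREEN.
function failingTests(queue: BehaviorTest[]): string[] {
  return queue.filter((t) => !t.run()).map((t) => t.name);
}
```

In a real project each entry would be an `it(...)` or `test(...)` case in the project's test runner; the array form is used here only to keep the sketch free of dependencies.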
@@ -0,0 +1,74 @@
+ ---
+ name: onboard
+ description: Onboard an AI agent to a new project by initializing ThoughtCabinet thoughts repo and bootstrapping agent memory. Use when starting work on a new repository, setting up a fresh project for AI-assisted development, or when the user asks to onboard, bootstrap, or initialize a project.
+ ---
+
+ # Onboarding a New Project
+
+ Set up a new project for AI-assisted development: initialize the thoughts repo and bootstrap agent memory in one workflow.
+
+ ## Workflow Context
+
+ This skill orchestrates two capabilities that are normally run separately:
+ - `thc init` — connects the project to a thoughts repo
+ - `init-agent-memory` skill — creates AGENTS.md and supporting docs
+
+ After onboarding, the project is ready for skills like `creating-plan`, `research-codebase`, and `implementing-plan`.
+
+ ## Workflow Overview
+
+ 1. **Pre-flight + Initialize thoughts** - Run `onboard.sh`: check environment, run `thc init`
+ 2. **Bootstrap agent memory** - Invoke `init-agent-memory` skill (if needed)
+ 3. **Verify** - Run `onboard.sh --verify-only`: confirm everything is wired up
+
+ ## Step 1: Pre-flight and Initialize Thoughts
+
+ Run: `bash onboard.sh`
+
+ **Exit codes determine next action**:
+ - **1** — Fatal error (not a git repo, thc missing, init failed). Stop and report.
+ - **2** — Thoughts ready, AGENTS.md not found. Proceed to Step 2.
+ - **3** — Thoughts + AGENTS.md both exist. Ask user if they want to regenerate memory or skip to Step 3.
+
+ If thoughts was already initialized and user wants to re-initialize:
+ ```bash
+ bash onboard.sh --force
+ ```
+
+ ## Step 2: Bootstrap Agent Memory
+
+ **If AGENTS.md already exists**: Ask the user whether to regenerate or skip.
+
+ **If AGENTS.md does not exist**: Invoke the `init-agent-memory` skill.
+
+ **Note**: `thoughts/CLAUDE.md` (from Step 1) and root `CLAUDE.md` (from this step) serve different purposes:
+ - `thoughts/CLAUDE.md` — thoughts directory usage rules
+ - `./CLAUDE.md` — project memory for the AI agent (symlink to AGENTS.md)
+
+ ## Step 3: Verify and Present
+
+ Run: `bash onboard.sh --verify-only`
+
+ Present results to user:
+
+ ```
+ Project onboarding complete!
+
+ - thoughts/ connected to [thoughts repo path]
+ - AGENTS.md created with project context
+ - Git hooks installed (auto-sync on commit)
+
+ You're ready to use skills like /creating-plan, /research-codebase, and /implementing-plan.
+ ```
+
+ If any step was skipped or failed, note it clearly with suggested remediation.
+
+ ## Guidelines
+
+ **Be incremental**: Re-running the skill should be safe — each step skips if already done.
+
+ **Fail fast**: If a critical step fails, stop and report rather than continuing with a broken setup.
+
+ **Minimal prompting**: Only ask questions when the answer cannot be inferred from the environment.
+
+ **Respect existing work**: Never overwrite AGENTS.md, CLAUDE.md, or thoughts/ without explicit user confirmation.
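
Reviewer sketch (not part of the package): Step 1 of the skill above branches on the exit code of `onboard.sh`. The TypeScript below spells out that branching as a small lookup. `nextStepForExitCode` and the step names are hypothetical, and mapping any other code (including 0, which Step 1 does not list) to `unexpected-exit-code` is an assumption rather than documented behavior.

```typescript
// Hypothetical names for the three documented outcomes plus a catch-all.
type OnboardNextStep =
  | "stop-and-report" // exit 1: fatal error
  | "bootstrap-agent-memory" // exit 2: thoughts ready, AGENTS.md not found
  | "ask-regenerate-or-verify" // exit 3: thoughts and AGENTS.md both exist
  | "unexpected-exit-code"; // anything Step 1 does not list (assumption)

function nextStepForExitCode(code: number): OnboardNextStep {
  switch (code) {
    case 1:
      return "stop-and-report";
    case 2:
      return "bootstrap-agent-memory";
    case 3:
      return "ask-regenerate-or-verify";
    default:
      return "unexpected-exit-code";
  }
}
```

Treating unlisted codes as a distinct outcome keeps the "fail fast" guideline visible: the caller has to decide what to do instead of silently continuing.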
@@ -0,0 +1,118 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+
+ # onboard.sh — Pre-flight checks, thoughts init, and verification for the onboard skill.
+ #
+ # Usage: bash onboard.sh [--force] [--verify-only]
+ #   --force        Re-initialize thoughts even if already set up
+ #   --verify-only  Skip init, only run verification
+ #
+ # Exit codes:
+ #   0  Verification passed (--verify-only)
+ #   1  Fatal error (not a git repo, thc missing, init failed, or verification failed)
+ #   2  Thoughts initialized, AGENTS.md not found (agent memory needed)
+ #   3  Thoughts initialized, AGENTS.md exists (fully onboarded)
+
+ FORCE=false
+ VERIFY_ONLY=false
+ for arg in "$@"; do
+   case "$arg" in
+     --force) FORCE=true ;;
+     --verify-only) VERIFY_ONLY=true ;;
+   esac
+ done
+
+ # --- Helpers ---
+
+ resolve_thc() {
+   if command -v thc > /dev/null 2>&1; then
+     echo "thc"
+   elif command -v thoughtcabinet > /dev/null 2>&1; then
+     echo "thoughtcabinet"
+   else
+     echo ""
+   fi
+ }
+
+ check_status() {
+   [ -L thoughts/shared ] && THOUGHTS=true || THOUGHTS=false
+   [ -f AGENTS.md ] && MEMORY=true || MEMORY=false
+
+   echo "thoughts: $([ "$THOUGHTS" = true ] && echo initialized || echo 'not initialized')"
+   echo "memory: $([ "$MEMORY" = true ] && echo exists || echo 'not found')"
+ }
+
+ verify() {
+   echo "=== Onboarding Status ==="
+   local issues=0
+
+   if [ -L thoughts/shared ] && [ -L thoughts/global ]; then
+     echo "[OK] thoughts/ initialized"
+   else
+     echo "[FAIL] thoughts/ not initialized"
+     issues=$((issues + 1))
+   fi
+
+   [ -f AGENTS.md ] && echo "[OK] AGENTS.md created" || echo "[SKIP] AGENTS.md not created"
+
+   if [ -L CLAUDE.md ]; then echo "[OK] CLAUDE.md symlink"
+   elif [ -f CLAUDE.md ]; then echo "[OK] CLAUDE.md exists"
+   else echo "[SKIP] CLAUDE.md not created"; fi
+
+   local git_dir
+   git_dir=$(git rev-parse --git-common-dir 2>/dev/null)
+   [ -f "$git_dir/hooks/pre-commit" ] && echo "[OK] pre-commit hook" || echo "[WARN] no pre-commit hook"
+   [ -f "$git_dir/hooks/post-commit" ] && echo "[OK] post-commit hook" || echo "[WARN] no post-commit hook"
+
+   echo "=== Done ==="
+   return "$issues"
+ }
+
+ # --- Main ---
+
+ # Pre-flight: git repo
+ if ! git rev-parse --git-dir > /dev/null 2>&1; then
+   echo "FATAL: Not a git repository. Run 'git init' first."
+   exit 1
+ fi
+
+ # Pre-flight: thc availability
+ THC_CMD=$(resolve_thc)
+ if [ -z "$THC_CMD" ]; then
+   echo "FATAL: thc is not installed or not in PATH."
+   exit 1
+ fi
+
+ # Verify-only mode
+ if [ "$VERIFY_ONLY" = true ]; then
+   verify
+   exit $?
+ fi
+
+ # Status check
+ check_status
+
+ # Initialize thoughts
+ if [ "$THOUGHTS" = true ] && [ "$FORCE" = false ]; then
+   echo "SKIP: thoughts/ already initialized."
+   if [ "$MEMORY" = true ]; then
+     exit 3
+   else
+     exit 2
+   fi
+ fi
+
+ INIT_FLAGS=(--directory "$(basename "$(pwd)")")
+ [ "$FORCE" = true ] && INIT_FLAGS+=(--force)
+ "$THC_CMD" init "${INIT_FLAGS[@]}"
+
+ # Verify init succeeded
+ if [ -L thoughts/shared ] && [ -L thoughts/global ]; then
+   echo "OK: thoughts/ initialized."
+ else
+   echo "FATAL: thoughts/ init failed — symlinks not created."
+   exit 1
+ fi
+
+ # Return based on memory status
+ [ -f AGENTS.md ] && exit 3 || exit 2
@@ -0,0 +1,205 @@
+ ---
+ name: test-skill-e2e
+ description: End-to-end smoke test a ThoughtCabinet skill by deploying it to a target agent, running the agent non-interactively against a test project, capturing output, and evaluating results. Use when you want to verify a skill works correctly before shipping.
+ ---
+
+ # End-to-End Skill Testing
+
+ Smoke test a ThoughtCabinet skill by deploying it to a target agent CLI, invoking the agent non-interactively against a real project, capturing all output, and evaluating results.
+
+ ## Workflow Overview
+
+ 1. **Gather inputs** - Determine which skill, which agent, and which project to test against
+ 2. **Deploy skills** - Copy all bundled skills into the agent's project-level skill directory
+ 3. **Prepare test environment** - Clean up artifacts from prior runs
+ 4. **Execute agent** - Run the agent CLI non-interactively with the skill prompt
+ 5. **Evaluate results** - Read captured output and generate a pass/fail summary
+
+ ## Step 1: Gather Inputs
+
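+ Determine three things from the user or surrounding context: the skill to test, the agent CLI, and the test project (for example `/path/to/test-project`). The table gives examples for the first two: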
+
+ | Input | Example |
+ |-------|---------|
+ | Skill to test | `onboard`, or path like `src/agent-assets/skills/onboard/SKILL.md` |
+ | Agent CLI | `codex`, `claude` |
+
+ Once identified, read the target skill's SKILL.md to understand:
+ - What the skill does (description, workflow steps)
+ - What artifacts it creates (files, directories, symlinks)
+ - What verification checks it performs (look for `[OK]`/`[FAIL]` markers)
+
+ ```bash
+ # Example: read the skill under test
+ cat src/agent-assets/skills/<skill-name>/SKILL.md
+ ```
+
+ This understanding is essential for Steps 3 and 5.
+
+ ## Step 2: Deploy Skills
+
+ Copy **all** bundled skills from the ThoughtCabinet source tree into the agent's project-level skill directory. Include all skills (not just the one under test) because skills may reference each other.
+
+ Agent skill directory conventions:
+
+ | Agent | Directory |
+ |-------|-----------|
+ | Codex | `<project>/.codex/skills/<skill-name>/` |
+ | Claude Code | `<project>/.claude/skills/<skill-name>/` |
+ | Cline | `<project>/.cline/skills/<skill-name>/` |
+
+ ```bash
+ # Example: deploy all skills for Codex
+ THC_SRC="/path/to/thought-cabinet/src/agent-assets/skills"
+ PROJECT="/path/to/test-project"
+ AGENT_SKILLS="$PROJECT/.codex/skills"
+
+ for skill_dir in "$THC_SRC"/*/; do
+   skill_name=$(basename "$skill_dir")
+   mkdir -p "$AGENT_SKILLS/$skill_name"
+   cp -R "$skill_dir". "$AGENT_SKILLS/$skill_name/"  # SKILL.md plus any bundled files (e.g. a helper script such as onboard.sh)
+ done
+ ```
+
+ Verify deployment:
+
+ ```bash
+ ls -la "$AGENT_SKILLS"
+ ```
+
+ ## Step 3: Prepare Test Environment
+
+ Clean up artifacts from prior runs so the test starts from a known state. Use the skill's SKILL.md (read in Step 1) to determine what to remove.
+
+ Common cleanup actions by skill type:
+
+ ```bash
+ cd "$PROJECT"
+
+ # For onboard skill
+ thc destroy --force 2>/dev/null || true
+ rm -f AGENTS.md
+ rm -f CLAUDE.md
+ rm -rf docs/architectural-patterns.md
+
+ # For init-agent-memory skill
+ rm -f AGENTS.md
+ rm -f CLAUDE.md
+ rm -rf docs/architectural-patterns.md
+
+ # Generic: remove thoughts artifacts
+ rm -rf thoughts/
+ ```
+
+ Verify the starting state is clean:
+
+ ```bash
+ echo "=== Pre-test state ==="
+ [ ! -d thoughts ] && echo "[OK] no thoughts/" || echo "[WARN] thoughts/ still exists"
+ [ ! -f AGENTS.md ] && echo "[OK] no AGENTS.md" || echo "[WARN] AGENTS.md still exists"
+ [ ! -f CLAUDE.md ] && echo "[OK] no CLAUDE.md" || echo "[WARN] CLAUDE.md still exists"
+ ```
+
+ All checks should show `[OK]` before proceeding.
+
+ ## Step 4: Execute Agent
+
+ Run the agent CLI non-interactively with full permissions, instructing it to invoke the skill. Stream all output to a timestamped log file via `tee`.
+
+ ### Build the prompt
+
+ The prompt should instruct the agent to invoke the target skill:
+
+ ```
+ Invoke the <skill-name> skill to <summary of what the skill does>.
+ ```
+
+ ### Agent invocation patterns
+
+ | Agent | Command |
+ |-------|---------|
+ | Codex | `codex exec --dangerously-bypass-approvals-and-sandbox "<prompt>"` |
+ | Claude Code | `claude -p "<prompt>" --dangerously-skip-permissions` |
+
+ ### Run with output capture
+
+ ```bash
+ LOGFILE="test-$(date +%Y%m%d-%H%M%S)-<skill-name>.log"
+
+ # Codex example
+ codex exec --dangerously-bypass-approvals-and-sandbox \
+   "Invoke the onboard skill to initialize ThoughtCabinet and bootstrap agent memory." \
+   2>&1 | tee "$LOGFILE"
+
+ # Claude Code example
+ claude -p \
+   "Invoke the onboard skill to initialize ThoughtCabinet and bootstrap agent memory." \
+   --dangerously-skip-permissions \
+   2>&1 | tee "$LOGFILE"
+ ```
+
+ Wait for the agent to complete. Do not interrupt it.
+
+ ## Step 5: Evaluate Results
+
+ Read the log file end-to-end and assess:
+
+ ```bash
+ cat "$LOGFILE"
+ ```
+
+ ### Evaluation checklist
+
+ | Check | What to look for |
+ |-------|-----------------|
+ | Skill discovery | Agent found and read the SKILL.md |
+ | Step execution | Each numbered workflow step was attempted |
+ | No errors | No stack traces, permission blocks, or sandbox violations |
+ | Artifact creation | Expected files/directories exist on disk |
+ | Verification markers | `[OK]` markers in output; no `[FAIL]` markers |
+
+ ### Verify artifacts on disk
+
+ ```bash
+ echo "=== Post-test verification ==="
+ # Adapt these checks to the skill under test
+
+ # For onboard skill
+ [ -L thoughts/shared ] && echo "[OK] thoughts/shared symlink" || echo "[FAIL] thoughts/shared missing"
+ [ -L thoughts/global ] && echo "[OK] thoughts/global symlink" || echo "[FAIL] thoughts/global missing"
+ [ -f AGENTS.md ] && echo "[OK] AGENTS.md exists" || echo "[FAIL] AGENTS.md missing"
+ [ -e CLAUDE.md ] && echo "[OK] CLAUDE.md exists" || echo "[FAIL] CLAUDE.md missing"
+ ```
+
+ ### Generate summary
+
+ Produce a clear pass/fail report:
+
+ ```
+ === Test Results: <skill-name> ===
+ Agent: <agent-name>
+ Project: <project-path>
+ Log: <logfile-path>
+
+ Step 1 (Pre-flight): PASS
+ Step 2 (Init thoughts): PASS
+ Step 3 (Bootstrap memory): PASS
+ Step 4 (Verify): PASS
+
+ Overall: PASS (4/4 steps)
+ Notes: <any observations>
+ ```
+
+ If any step failed, include the relevant log excerpt and suggested remediation.
+
+ ## Guidelines
+
+ **Idempotent cleanup**: The prepare step (Step 3) should make the test fully repeatable. Running the test twice in a row should produce the same result.
+
+ **Log everything**: Always capture output to a file. Agent output in terminals can scroll away or be lost.
+
+ **Read the skill first**: Understanding expected outcomes (Step 1) is essential for meaningful evaluation (Step 5). Do not skip this.
+
+ **One skill per run**: Test skills individually for clear signal. If you need to test multiple skills, run this workflow once per skill.
+
+ **Adapt checks to the skill**: The cleanup and verification examples above are for the `onboard` skill. For other skills, derive the appropriate checks from the skill's SKILL.md.
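
Reviewer sketch (not part of the package): the "Verification markers" row of the evaluation checklist above (`[OK]` present, no `[FAIL]`) is simple enough to compute from the captured log text. The TypeScript below is a minimal sketch; `summarizeMarkers` and `MarkerSummary` are hypothetical names, and requiring at least one `[OK]` line for a pass is an assumption drawn from that checklist row.

```typescript
// Hypothetical summary of the verification markers found in a captured log.
interface MarkerSummary {
  ok: number;
  warn: number;
  fail: number;
  passed: boolean;
}

// Counts the log lines that contain the given marker text.
function countLinesWith(log: string, marker: string): number {
  return log.split("\n").filter((line) => line.indexOf(marker) !== -1).length;
}

function summarizeMarkers(log: string): MarkerSummary {
  const ok = countLinesWith(log, "[OK]");
  const warn = countLinesWith(log, "[WARN]");
  const fail = countLinesWith(log, "[FAIL]");
  // Pass rule (assumption): at least one [OK] marker and no [FAIL] marker.
  return { ok, warn, fail, passed: ok > 0 && fail === 0 };
}

const sampleLog = [
  "=== Post-test verification ===",
  "[OK] thoughts/shared symlink",
  "[OK] thoughts/global symlink",
  "[FAIL] AGENTS.md missing",
].join("\n");

const sampleSummary = summarizeMarkers(sampleLog);
// sampleSummary is { ok: 2, warn: 0, fail: 1, passed: false }
```

Marker counts cover only one row of the checklist; skill discovery, step execution, and the absence of stack traces still require reading the log.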