solo-cto-agent 1.0.0 → 1.2.0
- package/CHANGELOG.md +276 -0
- package/README.md +273 -510
- package/bin/auto-setup.js +345 -0
- package/bin/cli.js +1212 -113
- package/bin/consensus-review.js +962 -0
- package/bin/constants.js +150 -0
- package/bin/cowork-engine.js +951 -1091
- package/bin/external-signals.js +1038 -0
- package/bin/i18n.js +229 -0
- package/bin/local-review.js +44 -24
- package/bin/notify-config.js +218 -0
- package/bin/notify.js +97 -1
- package/bin/personalization.js +241 -0
- package/bin/plugin-loader.js +310 -0
- package/bin/plugin-manager.js +334 -0
- package/bin/prompt-utils.js +79 -0
- package/bin/review-parser.js +161 -0
- package/bin/self-evolve/error-collector.js +254 -0
- package/bin/self-evolve/external-trends.js +304 -0
- package/bin/self-evolve/feedback-collector.js +237 -0
- package/bin/self-evolve/quality-analyzer.js +262 -0
- package/bin/self-evolve/rework-learner.js +365 -0
- package/bin/self-evolve/self-evolve-orchestrator.js +350 -0
- package/bin/self-evolve/skill-improver.js +291 -0
- package/bin/self-evolve/skill-scout.js +287 -0
- package/bin/self-evolve/weekly-report.js +250 -0
- package/bin/self-evolve.js +101 -0
- package/bin/sync.js +217 -5
- package/bin/telegram-wizard.js +848 -0
- package/bin/template-audit.js +457 -0
- package/bin/uiux-engine.js +76 -9
- package/bin/watch.js +4 -2
- package/bin/wizard.js +22 -26
- package/completions/solo-cto-agent.bash +100 -0
- package/completions/solo-cto-agent.zsh +122 -0
- package/config.schema.json +126 -0
- package/docs/claude.md +134 -0
- package/docs/codex-main-install.md +236 -0
- package/docs/codex-main-live-validation.md +276 -0
- package/docs/codex-main-validation.svg +56 -0
- package/docs/configuration.md +199 -0
- package/docs/cowork-main-install.md +53 -504
- package/docs/demo.svg +64 -13
- package/docs/feedback-guide.md +1 -1
- package/docs/plugin-api-v2.md +285 -0
- package/docs/telegram-wizard-spec.md +302 -0
- package/package.json +37 -6
- package/skills/build/SKILL.md +4 -0
- package/skills/craft/SKILL.md +2 -2
- package/skills/memory/SKILL.md +33 -0
- package/skills/orchestrate/SKILL.md +1 -1
- package/skills/review/SKILL.md +19 -0
- package/skills/self-evolve/SKILL.md +254 -0
- package/skills/ship/SKILL.md +45 -3
- package/skills/spark/SKILL.md +14 -0
- package/templates/builder-defaults/agent-scores.json +17 -5
- package/templates/orchestrator/.claude/agents/implementer.md +8 -23
- package/templates/orchestrator/.claude/agents/integrator.md +7 -12
- package/templates/orchestrator/.claude/agents/reviewer.md +8 -21
- package/templates/orchestrator/.codex/prompts/implement.md +7 -20
- package/templates/orchestrator/.codex/prompts/integrate.md +7 -11
- package/templates/orchestrator/.codex/prompts/review.md +7 -17
- package/templates/orchestrator/.github/workflows/template-audit.yml +28 -0
- package/templates/orchestrator/agents/README.md +42 -0
- package/templates/orchestrator/agents/implementer.md +34 -0
- package/templates/orchestrator/agents/integrator.md +26 -0
- package/templates/orchestrator/agents/reviewer.md +36 -0
- package/templates/orchestrator/ops/orchestrator/agent-scores.json +15 -15
- package/templates/orchestrator/ops/orchestrator/decision-log.json +5 -0
- package/templates/orchestrator/ops/scripts/consensus-report.js +448 -0
- package/templates/orchestrator/ops/scripts/template-audit.js +278 -0
- package/templates/product-repo/.github/workflows/solo-cto-pipeline.yml +171 -0
- package/templates/workflows/solo-cto-review.yml +184 -0
- package/tiers.json +17 -3
- package/CHANGELOG +0 -82
package/CHANGELOG.md
ADDED
@@ -0,0 +1,276 @@

# Changelog

## v1.2.0 (2026-04-17)

**Theme**: Public release polish + cowork-main Phase 2/3 + dual-agent metrics.

### Highlights
* Terminal demo SVG with animated CLI walkthrough
* cowork-main Phase 2 — orchestrator auto-commits agent-scores + error-patterns post CI
* cowork-main Phase 3 — `session sync` fetches orchestrator data at session start
* Dual-agent metrics population (cross-review rate, decision tracking, rework cycles)
* `collect-metrics.js` fixes: orchestrator repo name, array-aware parsing, rework + cross-repo metrics
* `changelog.yml` CI fix (PAT token, null-safe condition, skip-ci loop prevention)
* npm keywords expanded for better discoverability
* README hero section rewritten for public audience

### Previous (detailed)
* feat: v1.2.0 — metrics fix, Phase 2/3 cowork, terminal demo, changelog CI — PR-G7-subcommands: telegram test/config/status/disable/verify + event filter

**Theme**: closing the telegram wizard loop. The wizard (PR-G7-impl)
gets you wired up; this PR adds the day-2 surface — toggle which event
classes notify you, mute the whole channel without losing creds, run a
non-interactive verify in CI, and tear it all down with one command.

### New: `bin/notify-config.js`
* Persistent event filter at `~/.solo-cto-agent/notify.json` (override
  via `$SOLO_CTO_NOTIFY_CONFIG`).
* Schema matches `docs/telegram-wizard-spec.md` §5: `channels`,
  `events` (review.blocker / review.dual-disagree / ci.failure /
  ci.success / deploy.ready / deploy.error), `format`.
* Fail-open semantics: missing file → defaults; unknown event id →
  enabled; corrupt JSON → defaults + `_error` marker.
* Atomic disk writes via tmp-file rename. `0600` perms.
* Empty `channels[]` is honored verbatim (so `telegram disable` can
  truly mute the channel without the writer re-adding 'telegram').

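The fail-open load and atomic write described for `bin/notify-config.js` could look roughly like the sketch below. This is not the shipped module: the default values in `DEFAULTS` and the function names `loadConfig`, `saveConfig`, and `isEventEnabled` are assumptions made for illustration. Only the file location, the `SOLO_CTO_NOTIFY_CONFIG` override, the three schema keys, and the event ids come from the changelog.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Assumed defaults for illustration; the real ones are defined by the package.
const DEFAULTS = {
  channels: ["telegram"],
  events: {
    "review.blocker": true,
    "review.dual-disagree": true,
    "ci.failure": true,
    "ci.success": true,
    "deploy.ready": true,
    "deploy.error": true,
  },
  format: "compact",
};

const clone = (value) => JSON.parse(JSON.stringify(value));

function configPath() {
  return (
    process.env.SOLO_CTO_NOTIFY_CONFIG ||
    path.join(os.homedir(), ".solo-cto-agent", "notify.json")
  );
}

function loadConfig(file = configPath()) {
  let raw;
  try {
    raw = fs.readFileSync(file, "utf8");
  } catch {
    return clone(DEFAULTS); // missing file -> defaults
  }
  try {
    const parsed = JSON.parse(raw);
    return {
      // An explicit empty array is kept as-is, so a muted channel stays muted.
      channels: Array.isArray(parsed.channels) ? parsed.channels : clone(DEFAULTS.channels),
      events: { ...DEFAULTS.events, ...(parsed.events || {}) },
      format: parsed.format === "detailed" ? "detailed" : "compact",
    };
  } catch (err) {
    // Corrupt JSON -> defaults plus a marker the caller can surface.
    return { ...clone(DEFAULTS), _error: err.message };
  }
}

function isEventEnabled(config, eventId) {
  // Unknown event ids are treated as enabled (fail-open).
  return config.events[eventId] !== false;
}

function saveConfig(config, file = configPath()) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(config, null, 2) + "\n", { mode: 0o600 });
  fs.renameSync(tmp, file); // readers see either the old file or the new one
}
```

Writing to a temporary file in the same directory and renaming it means a crash mid-write leaves the previous `notify.json` untouched, which pairs naturally with the fail-open read path.
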
### New telegram subcommands
* `solo-cto-agent telegram test` — one-shot send with current creds.
  Bypasses the event filter (the whole point is to confirm the pipe).
* `solo-cto-agent telegram verify` — non-interactive `getMe` +
  optional `sendMessage` round-trip. Returns structured exit code for
  CI scripts.
* `solo-cto-agent telegram status` — dump cred sources (env vs `.env`
  block vs shell profile), mask token, list active events.
* `solo-cto-agent telegram disable` — strip `.env` block + shell
  profile block + GitHub secrets (best-effort) + drop 'telegram' from
  notify-config channels. Idempotent.
* `solo-cto-agent telegram config` — toggle events / format. Three
  modes: `--list`, `--event X --on|--off`, `--format compact|detailed`,
  plus an interactive numbered menu when no flags + TTY.

### Wizard updates
* Step 5 now writes the default `notify.json` on first run so users
  don't have to discover `telegram config` separately. Idempotent —
  re-running the wizard never clobbers an existing config.

### `bin/notify.js`
* `sendTelegram` consults notify-config at emit time. If the envelope
  carries `meta.event` and that event is disabled, the send is
  short-circuited (returned as `{ok:true, filtered:true, reason}`).
* `notifyReviewResult` and `notifyApplyResult` now tag envelopes with
  the appropriate event id (`review.blocker` / `review.dual-disagree`
  / `ci.failure` / `ci.success`).
* Lazy-require of notify-config keeps the module usable in
  stripped-down installs that don't ship the new file.

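A minimal sketch of the emit-time filter and the lazy require described for `bin/notify.js`, under stated assumptions: the `deps` parameter, the `transport` callback, the `resolveFilter` helper, and the `isEventEnabled` export are hypothetical names used so the example runs offline. Only the `meta.event` lookup and the `{ok:true, filtered:true, reason}` return shape are taken from the changelog.

```javascript
// Resolve the event filter lazily. If ./notify-config is not shipped,
// fall back to "everything enabled" so notifications still go out.
function resolveFilter() {
  try {
    const mod = require("./notify-config"); // assumed relative path
    return (eventId) => mod.isEventEnabled(eventId); // assumed export name
  } catch (err) {
    if (err.code !== "MODULE_NOT_FOUND") throw err;
    return () => true;
  }
}

// `transport` stands in for the real Telegram HTTP call.
async function sendTelegram(envelope, deps) {
  const event = envelope.meta && envelope.meta.event;
  if (event && !deps.isEventEnabled(event)) {
    // Short-circuit: report success so callers do not treat a mute as a failure.
    return { ok: true, filtered: true, reason: `event ${event} is disabled` };
  }
  await deps.transport(envelope.text);
  return { ok: true };
}
```

Envelopes without `meta.event` skip the filter entirely in this sketch, which is one plausible reading of how `telegram test` could bypass it.
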
### Tests
* `tests/notify-config.test.mjs` — 14 tests. Defaults, partial-merge,
  corrupt-recovery, format normalization, channel + event toggles.
* `tests/telegram-subcommands.test.mjs` — 18 tests covering
  `resolveCreds` / `telegramTest` / `telegramVerify` /
  `telegramStatus` (with token masking assertion) / `telegramDisable`
  / `telegramConfig`. All network calls stubbed via injected
  `httpGetJson` / `httpPostJson`.
* `tests/telegram-wizard.test.mjs` — `runWizard` tests now isolate
  step-5 notify-config writes via `SOLO_CTO_NOTIFY_CONFIG` so the
  suite never touches the real `~/.solo-cto-agent/`.
* Total: 441 tests (up from 399 in PR #64).

### Docs
* `docs/telegram-wizard-spec.md` — status flipped from DRAFT → SHIPPED.

---

## Unreleased — Toolkit upgrade: per-tool entry points + examples/

**Theme**: repositioning from "skill pack" to "toolkit" by splitting the
docs surface along tool boundaries and filling `examples/` with real
usage flows — not feature tours. Each example shows input → agent
behavior → output → pain reduced, so you can recognise which failure
mode an example applies to without reading the skill definitions.

### Docs structure
* **`docs/claude.md`** — primary tool entry point (English, slim).
  Links deeper into `cowork-main-install.md` for install detail.
  Landing for: install, keys, tier choice, loop overview.
* **Per-tool entry-point convention** — README now lists tool entry
  points as a table. Claude is supported today; Cursor / Windsurf /
  Copilot rows are marked "Not yet" and will gain their own docs
  pages when their execution adapters land. The core skills
  (`review`, `build`, `ship`, `memory`, `craft`, `spark`) stay
  tool-agnostic.
* Removed the single-file top-level `Examples` file; replaced with a
  full `examples/` tree.

### examples/ (new)
* `examples/build/add-google-oauth.md` — NextAuth + Supabase wiring
  with env precheck before code gen.
* `examples/build/fix-recurring-build-error.md` — circuit-breaker halt
  on 3rd repeat error + root-cause patch instead of 4th band-aid.
* `examples/ship/pre-deploy-env-lint.md` — service-scan + paste-ready
  `vercel env add` commands before the deploy breaks.
* `examples/ship/release-with-npm-publish.md` — version bump, tag,
  idempotent publish, safe to re-run via workflow_dispatch.
* `examples/review/dual-review-blocker.md` — Claude + Codex disagree
  on a Stripe webhook race; cross-review resolves severity.
* `examples/review/uiux-vision-check.md` — six-axis vision scorecard
  on a preview URL surfaces AI-slop gradients and mobile tap targets.
* `examples/founder-workflow/session-start-briefing.md` — 7-line brief
  on session start instead of 15-minute context reload.
* `examples/founder-workflow/idea-critique.md` — risk-first critique
  surfaces a partnership conflict in 2 minutes.

### Consistency
* `scripts/validate-package.js` — required-file list no longer references
  the removed `.cursorrules` / `.windsurfrules` /
  `.github/copilot-instructions.md`. Now tracks `examples/README.md`
  and `docs/claude.md`.
* `bin/wizard.js` — default editor changed from `Cursor` to
  `Claude Cowork` to match the supported primary surface.
* README — new "Tool entry points" + "Examples" sections; document
  bullet list cleaned up with proper UTF-8 Korean (the block that was
  previously cp949-mojibake). Remaining Korean mojibake elsewhere in
  the README is tracked as a separate encoding-repair pass.

### No behavior change
* No CLI commands changed. No skill specs changed. No API. Anyone who
  had the previous version installed continues to work identically —
  this is a documentation + examples release.

---

## 1.1.0 — Tier-aware reviews, security signals, plugins & telegram

**Theme**: closing the last gaps around signal quality and agent
extensibility. The review loop now reasons about Haiku/Sonnet/Opus
tier-appropriately, surfaces live CVE/GHSA advisories via OSV.dev,
captures screenshots without Playwright, and gains a first-cut
plugin system + experimental telegram setup wizard.

### External signals (PR-G4)
* **T2 Security Advisories (OSV.dev)** — CVE + GHSA scan across
  `dependencies` + `devDependencies`. Severity normalized (DB-specific >
  CVSS numeric > UNKNOWN) and merged into the external-knowledge
  context block. Gate: `COWORK_EXTERNAL_KNOWLEDGE_SECURITY=0` to skip.

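The severity precedence named above (database-specific label first, then a numeric CVSS score, then `UNKNOWN`) can be sketched as follows. The advisory shape is an assumption: `database_specific.severity` mirrors how some OSV records carry a label, while `cvssScore` is a hypothetical pre-computed numeric field, since this changelog does not say how the package derives the number. The score bands follow the standard CVSS v3 qualitative rating scale.

```javascript
const KNOWN_LABELS = new Set(["LOW", "MODERATE", "MEDIUM", "HIGH", "CRITICAL"]);

// CVSS v3 qualitative bands: 0.1-3.9 low, 4.0-6.9 medium, 7.0-8.9 high, 9.0-10.0 critical.
function cvssToLabel(score) {
  if (score >= 9.0) return "CRITICAL";
  if (score >= 7.0) return "HIGH";
  if (score >= 4.0) return "MEDIUM";
  if (score > 0) return "LOW";
  return "UNKNOWN";
}

function normalizeSeverity(advisory) {
  // 1. A label from the source database wins when it is one we recognise.
  const label = String(advisory?.database_specific?.severity ?? "").toUpperCase();
  if (KNOWN_LABELS.has(label)) return label === "MODERATE" ? "MEDIUM" : label;

  // 2. Otherwise fall back to a numeric CVSS score (hypothetical field).
  const score = Number(advisory?.cvssScore);
  if (Number.isFinite(score)) return cvssToLabel(score);

  // 3. Nothing usable.
  return "UNKNOWN";
}
```

Folding `MODERATE` into `MEDIUM` is a choice made for this sketch so that labels from different databases sort on one scale; the package may normalize differently.
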
### Review tiering (PR-G2)
* **Per-tier Claude model resolution** — Haiku (cheap triage) / Sonnet
  (default) / Opus (deep review) selected automatically based on watch
  tier. Overridable via `ANTHROPIC_MODEL_HAIKU|SONNET|OPUS`.

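One way to picture per-tier model resolution with environment overrides is the sketch below. The tier names in `TIER_TO_FAMILY` and the placeholder ids in `DEFAULT_MODELS` are invented for illustration; only the three `ANTHROPIC_MODEL_*` variable names and the Haiku / Sonnet / Opus roles come from the changelog.

```javascript
// Placeholder ids, not real model names; the package ships its own defaults.
const DEFAULT_MODELS = {
  haiku: "haiku-default-placeholder",
  sonnet: "sonnet-default-placeholder",
  opus: "opus-default-placeholder",
};

// Hypothetical watch-tier names mapped to a model family.
const TIER_TO_FAMILY = { triage: "haiku", standard: "sonnet", deep: "opus" };

function resolveModel(tier, env = process.env) {
  // Unknown tiers fall back to Sonnet, which the changelog names as the default.
  const family = TIER_TO_FAMILY[tier] || "sonnet";
  const override = env[`ANTHROPIC_MODEL_${family.toUpperCase()}`];
  return override && override.trim() ? override.trim() : DEFAULT_MODELS[family];
}
```

Passing `env` as a parameter keeps the function testable without mutating `process.env`.
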
### UI/UX loop (PR-G5)
* **Playwright-free screenshot capture** — `uiux vision-review --url`
  and `uiux capture --url` now fall back to thum.io when Playwright is
  unavailable. Viewports: mobile 375x812 / tablet 768x1024 / desktop
  1280x800.

### Plugins & integrations (PR-G6 / G7)
* **`docs/plugin-api-v2.md`** — capability manifest spec
  (env/net/fs/cli/hook/schedule prefixes), contribution points, agent
  targeting (`claude` / `codex` / `cowork` / `headless`).
* **`plugin` subcommand** — filesystem-only manager:
  `solo-cto-agent plugin list|show|add --path <dir>|remove`. Records
  metadata only; does NOT execute plugin code. Runtime loader lands
  in a follow-up behind the capability gate.
* **`telegram wizard`** (experimental — `SOLO_CTO_EXPERIMENTAL=1`)
  — one-command bot token + chat_id capture + `.env` / shell profile
  / GitHub secret writeback + live sendMessage verification.
* **`docs/telegram-wizard-spec.md`** — full spec including failure
  modes and i18n hooks.

### Developer experience
* **375 tests** (up from 247 in 1.0.0) across 28 files — all offline,
  all network calls stubbed via injected `fetchImpl`.
* **Shared `prompt-utils.js`** — `ask` / `askYesNo` / `askChoice` /
  `isTTY` / `createRl` extracted from `wizard.js` for future wizards.
* **npm publish automation** — tag `v*` now triggers full CI +
  `npm publish` + GitHub Release in one workflow.

### Upgrade notes
* No breaking changes. All new features are additive and gated on
  env vars (`COWORK_EXTERNAL_KNOWLEDGE_SECURITY`,
  `SOLO_CTO_EXPERIMENTAL`).
* `solo-cto-agent plugin` and `solo-cto-agent telegram` are new
  commands — existing commands are unchanged.

## 1.0.0 — First stable release

**Why 1.0**: the loop is now closable end-to-end. Previous 0.x releases were
the skill pack alone. 1.0 adds the three-tier external-signal framework,
self-cross-review, inbound feedback, and honest signal reporting — the pieces
needed to trust a single-agent loop for production work.

### External-loop framework (PR-E1 through E5)
* T1 Peer Model — OpenAI Codex cross-check via `dual-review`
* T2 External Knowledge — npm registry package-currency scan surfaces major/minor/deprecated deltas
* T3 Ground Truth — Vercel deployment + Supabase log signals injected into the review prompt
* Self-loop warning — boxed notice when no external signals are active (single-model blind-spot alert)
* Inbound feedback channel — `feedback record` + Slack/GitHub dispatch

### Dogfood-driven fixes (PR-F1, F2)
* default-branch auto-detection (B1) — no more hardcoded `main`, works on `master` / `develop` repos
* `--target <base>` override (B2) — diff against any ref
* `--dry-run` now surfaces the self-loop warning without API spend (B3)
* README flags match reality (B4) — dead examples removed
* `--json | jq` pipe-safety (B5) — `setLogChannel("stderr")` keeps stdout pure JSON
* **honest signal reporting (F2)** — `activeCount` now reflects actual fetch outcome, not just env flags. A tier set-but-silent no longer gets counted as "active", and hints surface `enabled-but-silent: T2 (env set, no data)` for debugging.

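The honest-signal-reporting rule (F2) boils down to counting a tier as active only when its fetch actually returned data. A small sketch, assuming a hypothetical per-tier record of the form `{ id, enabled, items }` and a hypothetical `summarizeSignals` helper; the `activeCount` name and the hint wording are taken from the changelog entry.

```javascript
// tiers: e.g. [{ id: "T1", enabled: true, items: [...] }, { id: "T2", enabled: true, items: [] }]
function summarizeSignals(tiers) {
  const active = tiers.filter((t) => t.enabled && t.items.length > 0);
  const silent = tiers.filter((t) => t.enabled && t.items.length === 0);
  return {
    // Counts fetch outcomes, not env flags.
    activeCount: active.length,
    hints: silent.map((t) => `enabled-but-silent: ${t.id} (env set, no data)`),
    // With zero active tiers the review is a single-model self-loop.
    selfLoopWarning: active.length === 0,
  };
}
```

Under this rule a configuration with every env flag set but no data returned still triggers the self-loop warning, which is the blind spot the entry describes.
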
### Developer experience
* 247 tests (up from ~180 in 0.6.x) covering CLI, engine parser, watch gating, self-loop warning, and new dry-run regressions
* Package-validate + Changelog + Test CI workflows all green

## 0.6.0

* added `solo-cto-agent lint` command — flags skills over 150 lines, missing frontmatter, large code blocks
* added CLI tests (init, status, lint, --force, MISSING state) — 8 new test cases
* added npm pack dry-run test — verifies tarball includes required files and excludes tests/CI
* expanded failure-catalog from 8 to 15 patterns (Next.js types, edge runtime, JWT, peer deps, DB migrations, deploy timeouts)
* added SECURITY.md
* applied references/ pattern to build skill (377→197 lines) and ship skill (283→124 lines)
* improved README architecture diagram (full skill system, not just error flow)

## 0.5.1

* added skill slimming docs (references/ pattern with measured results)
* fixed BOM encoding in CONTRIBUTING
* fixed corrupted FAQ section in README
* cleaned up README: removed duplicate sections, consolidated post-install guide
* updated ROADMAP with v0.5.0 completion and v0.6.0 plan

## 0.5.0

* added CLI init/status commands for npm distribution
* added demo asset, architecture diagram, and updated Quick Start
* expanded CONTRIBUTING and templates

## 0.4.0

* added package.json and basic test tooling
* added failure-catalog.json and schema validation
* added CI test workflow for PRs
* added ROADMAP.md

## 0.3.0

* added .cursorrules for Cursor IDE support
* added .windsurfrules for Windsurf (Cascade) support
* added .github/copilot-instructions.md for GitHub Copilot support
* all three rule files share the same CTO philosophy, adapted to each tool's format

## 0.2.0

* rewrote README to sound more human and less sales-heavy
* improved `setup.sh` toward safer repeat installs and updates
* softened over-strong automation claims in `build`
* clarified `craft` as intentionally opinionated
* tightened `review` wording
* added contribution guidance
* added example files for practical usage

## 0.1.0

* initial public release
* added build, ship, craft, spark, review, and memory skills
* added setup script
* added templates for context and project state