solo-cto-agent 1.0.0 → 1.2.0

Files changed (75)
  1. package/CHANGELOG.md +276 -0
  2. package/README.md +273 -510
  3. package/bin/auto-setup.js +345 -0
  4. package/bin/cli.js +1212 -113
  5. package/bin/consensus-review.js +962 -0
  6. package/bin/constants.js +150 -0
  7. package/bin/cowork-engine.js +951 -1091
  8. package/bin/external-signals.js +1038 -0
  9. package/bin/i18n.js +229 -0
  10. package/bin/local-review.js +44 -24
  11. package/bin/notify-config.js +218 -0
  12. package/bin/notify.js +97 -1
  13. package/bin/personalization.js +241 -0
  14. package/bin/plugin-loader.js +310 -0
  15. package/bin/plugin-manager.js +334 -0
  16. package/bin/prompt-utils.js +79 -0
  17. package/bin/review-parser.js +161 -0
  18. package/bin/self-evolve/error-collector.js +254 -0
  19. package/bin/self-evolve/external-trends.js +304 -0
  20. package/bin/self-evolve/feedback-collector.js +237 -0
  21. package/bin/self-evolve/quality-analyzer.js +262 -0
  22. package/bin/self-evolve/rework-learner.js +365 -0
  23. package/bin/self-evolve/self-evolve-orchestrator.js +350 -0
  24. package/bin/self-evolve/skill-improver.js +291 -0
  25. package/bin/self-evolve/skill-scout.js +287 -0
  26. package/bin/self-evolve/weekly-report.js +250 -0
  27. package/bin/self-evolve.js +101 -0
  28. package/bin/sync.js +217 -5
  29. package/bin/telegram-wizard.js +848 -0
  30. package/bin/template-audit.js +457 -0
  31. package/bin/uiux-engine.js +76 -9
  32. package/bin/watch.js +4 -2
  33. package/bin/wizard.js +22 -26
  34. package/completions/solo-cto-agent.bash +100 -0
  35. package/completions/solo-cto-agent.zsh +122 -0
  36. package/config.schema.json +126 -0
  37. package/docs/claude.md +134 -0
  38. package/docs/codex-main-install.md +236 -0
  39. package/docs/codex-main-live-validation.md +276 -0
  40. package/docs/codex-main-validation.svg +56 -0
  41. package/docs/configuration.md +199 -0
  42. package/docs/cowork-main-install.md +53 -504
  43. package/docs/demo.svg +64 -13
  44. package/docs/feedback-guide.md +1 -1
  45. package/docs/plugin-api-v2.md +285 -0
  46. package/docs/telegram-wizard-spec.md +302 -0
  47. package/package.json +37 -6
  48. package/skills/build/SKILL.md +4 -0
  49. package/skills/craft/SKILL.md +2 -2
  50. package/skills/memory/SKILL.md +33 -0
  51. package/skills/orchestrate/SKILL.md +1 -1
  52. package/skills/review/SKILL.md +19 -0
  53. package/skills/self-evolve/SKILL.md +254 -0
  54. package/skills/ship/SKILL.md +45 -3
  55. package/skills/spark/SKILL.md +14 -0
  56. package/templates/builder-defaults/agent-scores.json +17 -5
  57. package/templates/orchestrator/.claude/agents/implementer.md +8 -23
  58. package/templates/orchestrator/.claude/agents/integrator.md +7 -12
  59. package/templates/orchestrator/.claude/agents/reviewer.md +8 -21
  60. package/templates/orchestrator/.codex/prompts/implement.md +7 -20
  61. package/templates/orchestrator/.codex/prompts/integrate.md +7 -11
  62. package/templates/orchestrator/.codex/prompts/review.md +7 -17
  63. package/templates/orchestrator/.github/workflows/template-audit.yml +28 -0
  64. package/templates/orchestrator/agents/README.md +42 -0
  65. package/templates/orchestrator/agents/implementer.md +34 -0
  66. package/templates/orchestrator/agents/integrator.md +26 -0
  67. package/templates/orchestrator/agents/reviewer.md +36 -0
  68. package/templates/orchestrator/ops/orchestrator/agent-scores.json +15 -15
  69. package/templates/orchestrator/ops/orchestrator/decision-log.json +5 -0
  70. package/templates/orchestrator/ops/scripts/consensus-report.js +448 -0
  71. package/templates/orchestrator/ops/scripts/template-audit.js +278 -0
  72. package/templates/product-repo/.github/workflows/solo-cto-pipeline.yml +171 -0
  73. package/templates/workflows/solo-cto-review.yml +184 -0
  74. package/tiers.json +17 -3
  75. package/CHANGELOG +0 -82
package/CHANGELOG.md ADDED
@@ -0,0 +1,276 @@
1
+ # Changelog
2
+
3
+ ## v1.2.0 (2026-04-17)
4
+
5
+ **Theme**: Public release polish + cowork-main Phase 2/3 + dual-agent metrics.
6
+
7
+ ### Highlights
8
+ * Terminal demo SVG with animated CLI walkthrough
9
+ * cowork-main Phase 2 — orchestrator auto-commits agent-scores + error-patterns post CI
10
+ * cowork-main Phase 3 — `session sync` fetches orchestrator data at session start
11
+ * Dual-agent metrics population (cross-review rate, decision tracking, rework cycles)
12
+ * `collect-metrics.js` fixes: orchestrator repo name, array-aware parsing, rework + cross-repo metrics
13
+ * `changelog.yml` CI fix (PAT token, null-safe condition, skip-ci loop prevention)
14
+ * npm keywords expanded for better discoverability
15
+ * README hero section rewritten for public audience
16
+
17
+ ### Previous (detailed)
18
+ * feat: v1.2.0 — metrics fix, Phase 2/3 cowork, terminal demo, changelog CI — PR-G7-subcommands: telegram test/config/status/disable/verify + event filter
19
+
20
+ **Theme**: closing the telegram wizard loop. The wizard (PR-G7-impl)
21
+ gets you wired up; this PR adds the day-2 surface — toggle which event
22
+ classes notify you, mute the whole channel without losing creds, run a
23
+ non-interactive verify in CI, and tear it all down with one command.
24
+
25
+ ### New: `bin/notify-config.js`
26
+ * Persistent event filter at `~/.solo-cto-agent/notify.json` (override
27
+ via `$SOLO_CTO_NOTIFY_CONFIG`).
28
+ * Schema matches `docs/telegram-wizard-spec.md` §5: `channels`,
29
+ `events` (review.blocker / review.dual-disagree / ci.failure /
30
+ ci.success / deploy.ready / deploy.error), `format`.
31
+ * Fail-open semantics: missing file → defaults; unknown event id →
32
+ enabled; corrupt JSON → defaults + `_error` marker.
33
+ * Atomic disk writes via tmp-file rename. `0600` perms.
34
+ * Empty `channels[]` is honored verbatim (so `telegram disable` can
35
+ truly mute the channel without the writer re-adding 'telegram').
36
+
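The fail-open and atomic-write behavior described for `bin/notify-config.js` could look roughly like the sketch below. It is not the shipped implementation: the `DEFAULTS` values and the function names `loadConfig`, `isEventEnabled`, and `saveConfig` are assumptions made for illustration. Only the file location, the `$SOLO_CTO_NOTIFY_CONFIG` override, the `_error` marker, and the `0600` permissions come from the changelog.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical defaults; the real schema is in docs/telegram-wizard-spec.md §5.
const DEFAULTS = {
  channels: ["telegram"],
  events: { "review.blocker": true, "ci.failure": true, "ci.success": false },
  format: "compact",
};

const cloneDefaults = () => JSON.parse(JSON.stringify(DEFAULTS));

function configPath() {
  return (
    process.env.SOLO_CTO_NOTIFY_CONFIG ||
    path.join(os.homedir(), ".solo-cto-agent", "notify.json")
  );
}

function loadConfig(file = configPath()) {
  let raw;
  try {
    raw = fs.readFileSync(file, "utf8");
  } catch {
    return cloneDefaults(); // missing file: fall back to defaults
  }
  try {
    const parsed = JSON.parse(raw);
    return {
      // An explicit empty array is kept as-is so a muted channel stays muted.
      channels: Array.isArray(parsed.channels) ? parsed.channels : [...DEFAULTS.channels],
      events: { ...DEFAULTS.events, ...(parsed.events || {}) },
      format: parsed.format === "detailed" ? "detailed" : "compact",
    };
  } catch (err) {
    // Corrupt content: defaults plus an error marker for diagnostics.
    return { ...cloneDefaults(), _error: String(err.message) };
  }
}

function isEventEnabled(config, eventId) {
  // Unknown event ids are treated as enabled (fail-open).
  return config.events[eventId] !== false;
}

function saveConfig(config, file = configPath()) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(config, null, 2), { mode: 0o600 });
  fs.renameSync(tmp, file); // rename within one directory replaces the file atomically on POSIX
}
```

Writing to a temporary file in the same directory and renaming it means a reader sees either the old file or the new one, never a half-written file.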
37
+ ### New telegram subcommands
38
+ * `solo-cto-agent telegram test` — one-shot send with current creds.
39
+ Bypasses the event filter (the whole point is to confirm the pipe).
40
+ * `solo-cto-agent telegram verify` — non-interactive `getMe` +
41
+ optional `sendMessage` round-trip. Returns structured exit code for
42
+ CI scripts.
43
+ * `solo-cto-agent telegram status` — dump cred sources (env vs `.env`
44
+ block vs shell profile), mask token, list active events.
45
+ * `solo-cto-agent telegram disable` — strip `.env` block + shell
46
+ profile block + GitHub secrets (best-effort) + drop 'telegram' from
47
+ notify-config channels. Idempotent.
48
+ * `solo-cto-agent telegram config` — toggle events / format. Three
49
+ modes: `--list`, `--event X --on|--off`, `--format compact|detailed`,
50
+ plus an interactive numbered menu when no flags + TTY.
51
+
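The changelog says `telegram status` masks the token but does not say how. One plausible approach, shown here purely as an assumption, keeps only the last few characters visible. The name `maskToken` and the `visible` parameter are hypothetical.

```javascript
// Hypothetical helper: hide all but the last `visible` characters of a secret.
function maskToken(token, visible = 4) {
  if (typeof token !== "string" || token.length === 0) return "(unset)";
  if (token.length <= visible) return "*".repeat(token.length);
  return "*".repeat(token.length - visible) + token.slice(-visible);
}
```

Short tokens are masked entirely so that a misconfigured value is never echoed in full.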
52
+ ### Wizard updates
53
+ * Step 5 now writes the default `notify.json` on first run so users
54
+ don't have to discover `telegram config` separately. Idempotent —
55
+ re-running the wizard never clobbers an existing config.
56
+
57
+ ### `bin/notify.js`
58
+ * `sendTelegram` consults notify-config at emit time. If the envelope
59
+ carries `meta.event` and that event is disabled, the send is
60
+ short-circuited (returned as `{ok:true, filtered:true, reason}`).
61
+ * `notifyReviewResult` and `notifyApplyResult` now tag envelopes with
62
+ the appropriate event id (`review.blocker` / `review.dual-disagree`
63
+ / `ci.failure` / `ci.success`).
64
+ * Lazy-require of notify-config keeps the module usable in
65
+ stripped-down installs that don't ship the new file.
66
+
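The emit-time filter in `sendTelegram` can be pictured with the sketch below. The `deps.isEnabled` and `deps.transport` parameters are hypothetical injection points added for illustration, and the exact `reason` string is an assumption. Only the `meta.event` lookup and the `{ok:true, filtered:true, reason}` shape come from the changelog.

```javascript
// Sketch only: the real sendTelegram reads notify-config itself; here the
// filter and the network call are passed in so the logic is easy to follow.
async function sendTelegram(envelope, deps) {
  const event = envelope.meta && envelope.meta.event;
  if (event && !deps.isEnabled(event)) {
    // Disabled event: report success without touching the network.
    return { ok: true, filtered: true, reason: `event ${event} is disabled` };
  }
  // Envelopes without meta.event are never filtered.
  await deps.transport(envelope.text);
  return { ok: true };
}
```

Returning `ok: true` for a filtered send lets callers treat "muted by the user" as a normal outcome and not as a delivery failure.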
67
+ ### Tests
68
+ * `tests/notify-config.test.mjs` — 14 tests. Defaults, partial-merge,
69
+ corrupt-recovery, format normalization, channel + event toggles.
70
+ * `tests/telegram-subcommands.test.mjs` — 18 tests covering
71
+ `resolveCreds` / `telegramTest` / `telegramVerify` /
72
+ `telegramStatus` (with token masking assertion) / `telegramDisable`
73
+ / `telegramConfig`. All network calls stubbed via injected
74
+ `httpGetJson` / `httpPostJson`.
75
+ * `tests/telegram-wizard.test.mjs` — `runWizard` tests now isolate
76
+ step-5 notify-config writes via `SOLO_CTO_NOTIFY_CONFIG` so the
77
+ suite never touches the real `~/.solo-cto-agent/`.
78
+ * Total: 441 tests (up from 399 in PR #64).
79
+
80
+ ### Docs
81
+ * `docs/telegram-wizard-spec.md` — status flipped from DRAFT → SHIPPED.
82
+
83
+ ---
84
+
85
+ ## Unreleased — Toolkit upgrade: per-tool entry points + examples/
86
+
87
+ **Theme**: repositioning from "skill pack" to "toolkit" by splitting the
88
+ docs surface along tool boundaries and filling `examples/` with real
89
+ usage flows — not feature tours. Each example shows input → agent
90
+ behavior → output → pain reduced, so you can recognise which failure
91
+ mode an example applies to without reading the skill definitions.
92
+
93
+ ### Docs structure
94
+ * **`docs/claude.md`** — primary tool entry point (English, slim).
95
+ Links deeper into `cowork-main-install.md` for install detail.
96
+ Landing for: install, keys, tier choice, loop overview.
97
+ * **Per-tool entry-point convention** — README now lists tool entry
98
+ points as a table. Claude is supported today; Cursor / Windsurf /
99
+ Copilot rows are marked "Not yet" and will gain their own docs
100
+ pages when their execution adapters land. The core skills
101
+ (`review`, `build`, `ship`, `memory`, `craft`, `spark`) stay
102
+ tool-agnostic.
103
+ * Removed the single-file top-level `Examples` file; replaced with a
104
+ full `examples/` tree.
105
+
106
+ ### examples/ (new)
107
+ * `examples/build/add-google-oauth.md` — NextAuth + Supabase wiring
108
+ with env precheck before code gen.
109
+ * `examples/build/fix-recurring-build-error.md` — circuit-breaker halt
110
+ on 3rd repeat error + root-cause patch instead of 4th band-aid.
111
+ * `examples/ship/pre-deploy-env-lint.md` — service-scan + paste-ready
112
+ `vercel env add` commands before the deploy breaks.
113
+ * `examples/ship/release-with-npm-publish.md` — version bump, tag,
114
+ idempotent publish, safe to re-run via workflow_dispatch.
115
+ * `examples/review/dual-review-blocker.md` — Claude + Codex disagree
116
+ on a Stripe webhook race; cross-review resolves severity.
117
+ * `examples/review/uiux-vision-check.md` — six-axis vision scorecard
118
+ on a preview URL surfaces AI-slop gradients and mobile tap targets.
119
+ * `examples/founder-workflow/session-start-briefing.md` — 7-line brief
120
+ on session start instead of 15-minute context reload.
121
+ * `examples/founder-workflow/idea-critique.md` — risk-first critique
122
+ surfaces a partnership conflict in 2 minutes.
123
+
124
+ ### Consistency
125
+ * `scripts/validate-package.js` — required-file list no longer references
126
+ the removed `.cursorrules` / `.windsurfrules` /
127
+ `.github/copilot-instructions.md`. Now tracks `examples/README.md`
128
+ and `docs/claude.md`.
129
+ * `bin/wizard.js` — default editor changed from `Cursor` to
130
+ `Claude Cowork` to match the supported primary surface.
131
+ * README — new "Tool entry points" + "Examples" sections; document
132
+ bullet list cleaned up with proper UTF-8 Korean (the block that was
133
+ previously cp949-mojibake). Remaining Korean mojibake elsewhere in
134
+ the README is tracked as a separate encoding-repair pass.
135
+
136
+ ### No behavior change
137
+ * No CLI commands changed. No skill specs changed. No API. Anyone who
138
+ had the previous version installed continues to work identically —
139
+ this is a documentation + examples release.
140
+
141
+ ---
142
+
143
+ ## 1.1.0 — Tier-aware reviews, security signals, plugins & telegram
144
+
145
+ **Theme**: closing the last gaps around signal quality and agent
146
+ extensibility. The review loop now reasons about Haiku/Sonnet/Opus
147
+ tier-appropriately, surfaces live CVE/GHSA advisories via OSV.dev,
148
+ captures screenshots without Playwright, and gains a first-cut
149
+ plugin system + experimental telegram setup wizard.
150
+
151
+ ### External signals (PR-G4)
152
+ * **T2 Security Advisories (OSV.dev)** — CVE + GHSA scan across
153
+ `dependencies` + `devDependencies`. Severity normalized (DB-specific
154
+ > CVSS numeric > UNKNOWN) and merged into the external-knowledge
155
+ context block. Gate: `COWORK_EXTERNAL_KNOWLEDGE_SECURITY=0` to skip.
156
+
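The severity precedence (DB-specific, then CVSS numeric, then UNKNOWN) might be implemented along these lines. The input shape `{ dbSeverity, cvssScore }` and the function name are assumptions. The numeric bands follow the standard CVSS v3 qualitative rating scale.

```javascript
// Hypothetical input: dbSeverity is a database-provided label (if any),
// cvssScore is an already-computed numeric base score (if any).
function normalizeSeverity({ dbSeverity, cvssScore } = {}) {
  // 1. A database-specific label wins when present.
  if (typeof dbSeverity === "string" && dbSeverity.trim()) {
    return dbSeverity.trim().toUpperCase();
  }
  // 2. Otherwise map the numeric CVSS score onto the v3 rating bands.
  if (typeof cvssScore === "number" && Number.isFinite(cvssScore)) {
    if (cvssScore >= 9.0) return "CRITICAL";
    if (cvssScore >= 7.0) return "HIGH";
    if (cvssScore >= 4.0) return "MEDIUM";
    if (cvssScore > 0) return "LOW";
    return "NONE";
  }
  // 3. Nothing usable.
  return "UNKNOWN";
}
```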
157
+ ### Review tiering (PR-G2)
158
+ * **Per-tier Claude model resolution** — Haiku (cheap triage) / Sonnet
159
+ (default) / Opus (deep review) selected automatically based on watch
160
+ tier. Overridable via `ANTHROPIC_MODEL_HAIKU|SONNET|OPUS`.
161
+
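A minimal sketch of the env-override lookup, assuming the caller has already mapped the watch tier to a model family. The default model ids below are placeholders, not the package's real defaults, and `resolveModel` is a hypothetical name. Only the `ANTHROPIC_MODEL_HAIKU|SONNET|OPUS` variable names come from the changelog.

```javascript
// Placeholder ids for illustration; substitute the real default model ids.
const DEFAULT_MODELS = {
  haiku: "haiku-default",
  sonnet: "sonnet-default",
  opus: "opus-default",
};

function resolveModel(family, env = process.env) {
  const key = String(family).toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, key)) {
    throw new Error(`unknown model family: ${family}`);
  }
  // e.g. ANTHROPIC_MODEL_OPUS overrides the "opus" default when set and non-empty.
  const override = env[`ANTHROPIC_MODEL_${key.toUpperCase()}`];
  return override && override.trim() ? override.trim() : DEFAULT_MODELS[key];
}
```

Passing `env` as a parameter keeps the function testable without mutating `process.env`.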
162
+ ### UI/UX loop (PR-G5)
163
+ * **Playwright-free screenshot capture** — `uiux vision-review --url`
164
+ and `uiux capture --url` now fall back to thum.io when Playwright is
165
+ unavailable. Viewports: mobile 375x812 / tablet 768x1024 / desktop
166
+ 1280x800.
167
+
168
+ ### Plugins & integrations (PR-G6 / G7)
169
+ * **`docs/plugin-api-v2.md`** — capability manifest spec
170
+ (env/net/fs/cli/hook/schedule prefixes), contribution points, agent
171
+ targeting (`claude` / `codex` / `cowork` / `headless`).
172
+ * **`plugin` subcommand** — filesystem-only manager:
173
+ `solo-cto-agent plugin list|show|add --path <dir>|remove`. Records
174
+ metadata only; does NOT execute plugin code. Runtime loader lands
175
+ in a follow-up behind the capability gate.
176
+ * **`telegram wizard`** (experimental — `SOLO_CTO_EXPERIMENTAL=1`)
177
+ — one-command bot token + chat_id capture + `.env` / shell profile
178
+ / GitHub secret writeback + live sendMessage verification.
179
+ * **`docs/telegram-wizard-spec.md`** — full spec including failure
180
+ modes and i18n hooks.
181
+
182
+ ### Developer experience
183
+ * **375 tests** (up from 247 in 1.0.0) across 28 files — all offline,
184
+ all network calls stubbed via injected `fetchImpl`.
185
+ * **Shared `prompt-utils.js`** — `ask` / `askYesNo` / `askChoice` /
186
+ `isTTY` / `createRl` extracted from `wizard.js` for future wizards.
187
+ * **npm publish automation** — tag `v*` now triggers full CI +
188
+ `npm publish` + GitHub Release in one workflow.
189
+
190
+ ### Upgrade notes
191
+ * No breaking changes. All new features are additive and gated on
192
+ env vars (`COWORK_EXTERNAL_KNOWLEDGE_SECURITY`,
193
+ `SOLO_CTO_EXPERIMENTAL`).
194
+ * `solo-cto-agent plugin` and `solo-cto-agent telegram` are new
195
+ commands — existing commands are unchanged.
196
+
197
+ ## 1.0.0 — First stable release
198
+
199
+ **Why 1.0**: the loop is now closable end-to-end. Previous 0.x releases were
200
+ the skill pack alone. 1.0 adds the three-tier external-signal framework,
201
+ self-cross-review, inbound feedback, and honest signal reporting — the pieces
202
+ needed to trust a single-agent loop for production work.
203
+
204
+ ### External-loop framework (PR-E1 through E5)
205
+ * T1 Peer Model — OpenAI Codex cross-check via `dual-review`
206
+ * T2 External Knowledge — npm registry package-currency scan surfaces major/minor/deprecated deltas
207
+ * T3 Ground Truth — Vercel deployment + Supabase log signals injected into the review prompt
208
+ * Self-loop warning — boxed notice when no external signals are active (single-model blind-spot alert)
209
+ * Inbound feedback channel — `feedback record` + Slack/GitHub dispatch
210
+
211
+ ### Dogfood-driven fixes (PR-F1, F2)
212
+ * default-branch auto-detection (B1) — no more hardcoded `main`, works on `master` / `develop` repos
213
+ * `--target <base>` override (B2) — diff against any ref
214
+ * `--dry-run` now surfaces the self-loop warning without API spend (B3)
215
+ * README flags match reality (B4) — dead examples removed
216
+ * `--json | jq` pipe-safety (B5) — `setLogChannel("stderr")` keeps stdout pure JSON
217
+ * **honest signal reporting (F2)** — `activeCount` now reflects actual fetch outcome, not just env flags. A tier set-but-silent no longer gets counted as "active", and hints surface `enabled-but-silent: T2 (env set, no data)` for debugging.
218
+
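The honest-signal-reporting fix (F2) amounts to counting a tier as active only when its fetch actually returned data. The sketch below illustrates that idea. The tier object shape (`id`, `enabled`, `hasData`) and the function name `summarizeSignals` are assumptions. The hint text mirrors the format quoted in the changelog.

```javascript
// Hypothetical tier shape: { id: "T2", enabled: boolean, hasData: boolean }.
function summarizeSignals(tiers) {
  const active = tiers.filter((t) => t.enabled && t.hasData);
  const silent = tiers.filter((t) => t.enabled && !t.hasData);
  return {
    // Only tiers that produced data count, not tiers that merely have env flags set.
    activeCount: active.length,
    hints: silent.map((t) => `enabled-but-silent: ${t.id} (env set, no data)`),
  };
}
```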
219
+ ### Developer experience
220
+ * 247 tests (up from ~180 in 0.6.x) covering CLI, engine parser, watch gating, self-loop warning, and new dry-run regressions
221
+ * Package-validate + Changelog + Test CI workflows all green
222
+
223
+ ## 0.6.0
224
+
225
+ * added `solo-cto-agent lint` command — flags skills over 150 lines, missing frontmatter, large code blocks
226
+ * added CLI tests (init, status, lint, --force, MISSING state) — 8 new test cases
227
+ * added npm pack dry-run test — verifies tarball includes required files and excludes tests/CI
228
+ * expanded failure-catalog from 8 to 15 patterns (Next.js types, edge runtime, JWT, peer deps, DB migrations, deploy timeouts)
229
+ * added SECURITY.md
230
+ * applied references/ pattern to build skill (377→197 lines) and ship skill (283→124 lines)
231
+ * improved README architecture diagram (full skill system, not just error flow)
232
+
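Two of the checks listed for the `lint` command, the 150-line limit and the frontmatter check, can be sketched as follows. This is an illustration under assumptions, not the command's actual code: `lintSkill`, its options, and the issue wording are hypothetical, and the large-code-block check is omitted because its threshold is not stated.

```javascript
// Hypothetical checker for the text of one SKILL.md file.
function lintSkill(text, { maxLines = 150 } = {}) {
  const issues = [];
  const lines = text.split(/\r?\n/);
  if (lines.length > maxLines) {
    issues.push(`over ${maxLines} lines (${lines.length})`);
  }
  // Frontmatter: first line is "---" and a closing "---" appears later.
  const hasFrontmatter =
    lines[0].trim() === "---" && lines.slice(1).some((l) => l.trim() === "---");
  if (!hasFrontmatter) issues.push("missing frontmatter");
  return issues;
}
```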
233
+ ## 0.5.1
234
+
235
+ * added skill slimming docs (references/ pattern with measured results)
236
+ * fixed BOM encoding in CONTRIBUTING
237
+ * fixed corrupted FAQ section in README
238
+ * cleaned up README: removed duplicate sections, consolidated post-install guide
239
+ * updated ROADMAP with v0.5.0 completion and v0.6.0 plan
240
+
241
+ ## 0.5.0
242
+
243
+ * added CLI init/status commands for npm distribution
244
+ * added demo asset, architecture diagram, and updated Quick Start
245
+ * expanded CONTRIBUTING and templates
246
+
247
+ ## 0.4.0
248
+
249
+ * added package.json and basic test tooling
250
+ * added failure-catalog.json and schema validation
251
+ * added CI test workflow for PRs
252
+ * added ROADMAP.md
253
+
254
+ ## 0.3.0
255
+
256
+ * added .cursorrules for Cursor IDE support
257
+ * added .windsurfrules for Windsurf (Cascade) support
258
+ * added .github/copilot-instructions.md for GitHub Copilot support
259
+ * all three rule files share the same CTO philosophy, adapted to each tool's format
260
+
261
+ ## 0.2.0
262
+
263
+ * rewrote README to sound more human and less sales-heavy
264
+ * improved `setup.sh` toward safer repeat installs and updates
265
+ * softened over-strong automation claims in `build`
266
+ * clarified `craft` as intentionally opinionated
267
+ * tightened `review` wording
268
+ * added contribution guidance
269
+ * added example files for practical usage
270
+
271
+ ## 0.1.0
272
+
273
+ * initial public release
274
+ * added build, ship, craft, spark, review, and memory skills
275
+ * added setup script
276
+ * added templates for context and project state