company-skill 4.6.2 → 4.6.5

package/README.md CHANGED
@@ -46,17 +46,17 @@ GOAL -> THINK -> EXECUTE (parallel waves) -> VERIFY -> Done?
46
46
 
47
47
  ## Dashboard
48
48
 
49
- The dashboard starts automatically when you run `/company` and prints its URL in the cycle banner. Each session gets its own port (7000-7999, derived from the session id). Open it in any browser.
49
+ The dashboard starts automatically when you run `/company` and prints its URL in the cycle banner and the Claude Code status line. Each session gets its own port (7000-7999, derived from the session id). Open it in any browser.
50
50
 
51
51
  ```
52
- http://127.0.0.1:7421 <- your session's link, printed at startup
52
+ http://127.0.0.1:7421 <- your session's link, printed at startup and in the status bar
53
53
  ```
54
54
 
55
55
  What you see, panel by panel:
56
56
 
57
- **Context fill** - the live fill percentage, computed with the same formula the context-guard uses. When the session hits the restart threshold (default 50%), the bar shows "restart due" so you can see the gate before it fires.
57
+ **Context fill** - the live fill percentage, computed with the same formula the context-guard uses. When the session hits the restart threshold (default 50%), the bar shows "restart due" so you can see the gate before it fires. A toggle in the dashboard controls the auto-restart per session.
58
58
 
59
- **Delegation tree** - SVG tree of orchestrator, department leads, and workers. Click any node to expand its current task and status. Zoom with +/- buttons or the mouse wheel. Drag to pan. Fullscreen button expands it. Zero external JS libraries.
59
+ **Delegation tree** - SVG tree of orchestrator, department leads, and workers, with org-chart context filled from COMPANY.md. Click any node to expand its current task and status. Zoom with +/- buttons or the mouse wheel. Drag to pan. Fullscreen button expands it. Zero external JS libraries.
60
60
 
61
61
  **Active agents** - centered live table of every agent the orchestrator has spawned this session, with model, status, and token count.
62
62
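The README states that each session gets its own port in the 7000-7999 range, derived from the session id, but this hunk does not show the derivation. The following is a minimal sketch of one way to do it, assuming a SHA-256 digest folded into the range. The name `portForSession` and the choice of hash are hypothetical and are not taken from the package's own `resolvePort`.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not from the package): map a session id to a
// stable port in the documented 7000-7999 range.
function portForSession(sessionId) {
  const digest = crypto.createHash('sha256').update(String(sessionId)).digest();
  // Read the first 4 bytes as an unsigned integer and fold it into 0-999.
  return 7000 + (digest.readUInt32BE(0) % 1000);
}

// The same session id always yields the same port.
console.log(portForSession('example-session-id'));
```

Because the mapping is deterministic, a restarted session would land on the same URL. Two different sessions can still collide on a port, so a real implementation would need some fallback for that case.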
 
@@ -69,22 +69,26 @@ The dashboard binds 127.0.0.1 only, reads local files, and sends nothing anywher
69
69
 
70
70
  Multi-agent orchestration buys quality with tokens. /company's answer to the token cost: spend strong-model tokens only where they buy quality, and report the bill every cycle.
71
71
 
72
- **Tiered model delegation.** Each delegation contract carries a `MODEL: cheap|mid|strong` tag. The orchestrator maps the tag to a model at spawn time. Override every sub-agent with `CLAUDE_CODE_SUBAGENT_MODEL` at launch, or write `FORCE_BEST` into `.company/MODEL_POLICY` mid-run.
72
+ **Tiered model delegation.** Each delegation contract carries a `MODEL: cheap|mid|strong` tag. The orchestrator maps the tag to a model at spawn time. Effort scales with both ROI and stakes - high-stakes or high-value work gets a heavier spawn. Override every sub-agent with `CLAUDE_CODE_SUBAGENT_MODEL` at launch, or write `FORCE_BEST` into `.company/MODEL_POLICY` mid-run.
73
73
 
74
74
  **Per-cycle cost reporting.** Every cycle produces a `COST:` line in the briefing and a `cycles/cycle-{N}-cost.json` artifact.
75
75
 
76
76
  **Prompt caching.** Agent prompts are laid out stable-first so repeated spawns hit a shared cache prefix.
77
77
 
78
+ **Fable 5 / adaptive thinking.** On models that support adaptive thinking (Fable 5 and later), the orchestrator and verify layers run with thinking enabled. No `budget_tokens` param - reflection depth is model-controlled.
79
+
78
80
 
79
81
  ## Key features
80
82
 
81
83
  **Stop guard** - blocks session exit until every criterion has `passes: true` and reproduced evidence. Malformed state blocks rather than fails open. Deleting a hard criterion blocks instead of unlocking. [34-check test](tests/stop-guard.test.js).
82
84
 
83
- **Context-fill guard** - a second Stop hook forces `/company restart` once context reaches the threshold (default 50%). Reads the model id from the transcript to detect the context window. [37-check test](tests/context-guard.test.js).
85
+ **Context-fill guard** - a second Stop hook forces `/company restart` once context reaches the threshold (default 50%). Reads the model id from the transcript to detect the context window. Per-session auto-restart toggle in the dashboard. [37-check test](tests/context-guard.test.js).
84
86
 
85
87
  **Delegation contracts** - a task does not exist without a filled contract. `check-contracts.js` rejects missing fields, vacuous VERIFY-WITH commands, invalid MODEL tiers, and cyclic dependencies. [17-check test](tests/check-contracts.test.js).
86
88
 
87
- **Double verification** - the Internal Reviewer re-runs every VERIFY-WITH command independently. The Devil's Advocate attacks everything marked passing. Two independent reproductions are evidence. One transcript is a hypothesis.
89
+ **Multi-level verification.** The Internal Reviewer re-runs every VERIFY-WITH command independently. The Devil's Advocate attacks everything marked passing. For criteria tagged `stakes: "high"` in `criteria.json` (irreversible action, security surface, or public-facing claim), the critic runs in three fresh contexts with distinct lenses - correctness, security, reproducibility - and unanimous ACCEPT is required. Normal criteria keep the single critic. The completeness probe enumerates every surface the GOAL names and auto-rejects any unchecked one.
90
+
91
+ **Design judge-panel.** For criteria tagged `kind: design`, the lead may emit up to three contracts from materially different angles plus one synthesis contract, reserved for genuine design forks.
88
92
 
89
93
  **Git isolation** - workers never push to main and never merge. Every code change lands as a draft PR. The merge gate is yours.
90
94
 
@@ -92,6 +96,8 @@ Multi-agent orchestration buys quality with tokens. /company's answer to the tok
92
96
 
93
97
  **Codebase graph** - on repos with >200 tracked files, `scripts/codegraph.js` builds a commit-keyed ranked symbol map into `.company/codegraph/` for lead prompts.
94
98
 
99
+ **Status-line link** - `scripts/statusline.js` appends the per-session dashboard URL to the Claude Code status bar, enforced idempotently on every `/company` run.
100
+
95
101
 
96
102
  ## Commands
97
103
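The multi-level verification rule described in this README hunk (three fresh-context lenses with unanimous ACCEPT for `stakes: "high"`, a single critic otherwise) can be sketched as a small gate function. This is an illustration under stated assumptions, not the package's implementation: `gateCriterion` and the `runCritic(criterion, lens)` callback are hypothetical names, and the callback stands in for whatever actually spawns a critic.

```javascript
// Lenses named in the README for criteria tagged stakes: "high".
const HIGH_STAKES_LENSES = ['correctness', 'security', 'reproducibility'];

// Hypothetical gate. `runCritic` is an assumed callback that returns
// 'ACCEPT' or 'REJECT' for one criterion under one lens (null = no lens).
function gateCriterion(criterion, runCritic) {
  const lenses = criterion.stakes === 'high' ? HIGH_STAKES_LENSES : [null];
  const verdicts = lenses.map(lens => ({ lens, verdict: runCritic(criterion, lens) }));
  // Unanimity: a single REJECT from any lens fails the criterion.
  const accepted = verdicts.every(v => v.verdict === 'ACCEPT');
  return { accepted, verdicts };
}

// Usage: a high-stakes criterion where only the security lens rejects.
const result = gateCriterion(
  { id: 'c1', stakes: 'high' },
  (criterion, lens) => (lens === 'security' ? 'REJECT' : 'ACCEPT')
);
console.log(result.accepted); // false
```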
 
@@ -12,12 +12,19 @@ Probe checklist, applied to every passing criterion and every merged-or-mergeabl
12
12
  1. Was the evidence REPRODUCED this cycle or merely transcribed from a worker's claim?
13
13
  2. Does the cited test or command actually exercise the change, or does it pass vacuously?
14
14
  3. What input, edge case, or environment breaks it?
15
- 4. What surface was never checked (other pages, other platforms, error paths)?
15
+ 4. Completeness: enumerate every surface, modality, or claim the GOAL names or implies. Mark each
16
+ CHECKED (evidence cited this cycle) or UNCHECKED. An in-scope UNCHECKED surface is an automatic
17
+ REJECT. Out-of-scope gaps become PROPOSE lines, not REJECTs.
18
+ If a LENS directive is present in your prompt (e.g. "LENS: security"), focus your attack
19
+ through that lens but still apply all other probes. Return ACCEPT/REJECT + one gap line per lens,
20
+ no per-lens score.
16
21
  5. For every external claim: verified from their repo or docs, or guessed from memory?
17
22
  6. Could this be done simpler? Does every added component earn its place?
18
23
  7. Would a real user understand the result without the authors explaining it?
19
24
  8. MAST sweep (arxiv 2503.13657): system design - was the contract underspecified, or did a role drift outside its lane? Inter-agent misalignment - do two agents' outputs contradict or duplicate each other? Verification - was any check skipped, shallow, or run against a stale artifact?
20
25
  9. ROI probe: did the worker take the highest-ROI approach to the task, or just the minimum that clears the bar? A trivially better approach within the same scope is a soft flag. This is NOT a license to demand out-of-scope work - it is the inverse of probe 6 (simplicity) and checks whether the best result within scope was delivered.
26
+ 10. Anti-vacuous test (SKILL.md ANTI-VACUOUS TEST): does the new test FAIL against the pre-change code? If the test passes unconditionally (before and after the fix), it is vacuous - REJECT.
27
+ 11. Feature reachability (SKILL.md FEATURE REACHABILITY): for any feature that gates on a field or condition, is there an authoring or runtime path that sets that field? Probe by reading the skill/agent authoring instructions - if no instruction tells the orchestrator to write the gating field, the feature is dead - REJECT.
21
28
 
22
29
  Audit each probe claim against a tool result from THIS session. Never accept a passing verdict you did not personally re-derive this run.
23
30
 
@@ -38,6 +38,14 @@ Rules that bind you:
38
38
  - MODEL is your difficulty call, not a default you copy. cheap for mechanical tasks (rename, grep sweep, file move), strong for tasks where a weak model's mistake is expensive (architecture, security, public text), omit for everything else. Justify it in one clause. A contract whose INPUTS paste more than ~50K tokens of file content is tagged MODEL: strong or has its inputs converted to grep pointers first. Long-context degradation on a cheap tier is a quality bug, not a saving.
39
39
  - Lay each contract out stable-first: the fixed template fields and pasted boilerplate at the top, volatile values (paths, SHAs, feedback) at the bottom, so repeated spawns share a cacheable prompt prefix. Keep briefings and contracts to a soft target of about a screenful, and never trim a FINDING + SOURCE pair or a VERIFY-WITH command to hit it.
40
40
 
41
+ **Judge-panel for design decisions.** Reserved for genuine design forks, never for a mechanical
42
+ fix. If a criterion is tagged `kind: design` in criteria.json AND you can name 2+ materially
43
+ different angles in one line each, emit N<=3 independent contracts (each from a distinct stated
44
+ angle) plus 1 synthesis contract (a fresh-context judge that picks the winner and grafts
45
+ runner-up ideas). If you cannot name 2+ materially different angles, it is not a design fork:
46
+ use the single contract path. The synthesis judge only selects the winning design; the critic
47
+ and reviewer still gate the chosen design before any merge.
48
+
41
49
  Save your contracts to the tasks file path the orchestrator gave you, and also return them in your reply.
42
50
 
43
51
  Keep the reply SHORT: the contracts, any HIRE lines, any blocker. Cut narration and filler. Compress prose, never evidence.
@@ -19,7 +19,9 @@ Additional duties:
19
19
  - **External fact check.** Scan every outgoing comment, email, or post produced this cycle for claims about external projects (numbers, percentages, features, technical details). Any claim not verified from the actual source is BLOCKED and the task loops back. Memory-based external claims are an automatic rejection.
20
20
  - **Novel ideas.** A finding sourced "NOVEL - needs validation" is acceptable as a finding, but you must add a criterion to criteria.json requiring its validation by experiment.
21
21
  - **Merge gate input.** Your MET grades feed the merge decision. Nothing merges until you grade the relevant criterion MET on reproduced evidence and the Devil's Advocate accepts.
22
- - **Stall counter.** When you keep a criterion failing, increment (or create) an `attempts` field on its criteria.json entry. At 2+ state in your verdict that the approach is stalled and the next cycle must re-plan, not re-try.
22
+ - **Stall counter.** When you keep a criterion failing, increment (or create) an `attempts` field on its criteria.json entry. At 2+ state in your verdict that the approach is stalled and the next cycle must re-plan, not re-try. Writing `attempts` is required: the stall detector and the high-stakes 3-lens gate both read it from criteria.json.
23
+ - **Anti-vacuous test check (SKILL.md ANTI-VACUOUS TEST).** For every new test introduced this cycle, confirm it FAILS against the pre-change code. A test that passes before the fix is vacuous and must be rejected.
24
+ - **Feature reachability check (SKILL.md FEATURE REACHABILITY).** For every new feature that gates on a field in criteria.json (e.g. `stakes: "high"`), confirm the authoring path that sets that field exists in the skill or agent instructions. A gate that is never set is a dead feature; reject it until the wiring is present.
23
25
  - **Respawn reflection.** For any task that will be respawned, write a 3-line block into your verdict for the orchestrator to paste into the fresh contract: WHAT-WAS-TRIED / WHY-IT-FAILED (cited to the findings file) / DO-DIFFERENTLY. The failed worker's self-report is not a source.
24
26
 
25
27
  Audit each verdict against a tool result from THIS session. Only mark a criterion MET when you can cite the command you ran and its output from this run.
@@ -63,7 +63,30 @@ const transcriptPath = typeof stdinData.transcript_path === 'string' ? stdinData
63
63
 
64
64
  // Session scoping: same logic as stop-guard.
65
65
  // Only act for sessions in .company/OWNER. Absent/empty OWNER = gate all (legacy).
66
- const companyDir = process.env.COMPANY_DIR || path.join(process.cwd(), '.company');
66
+ // Resolve companyDir robustly: COMPANY_DIR env wins; else prefer the dir that holds
67
+ // a clean OWNER (at least one valid session-id line); fall back to cwd/.company.
68
+ // A blank/garbled OWNER does NOT qualify a dir as the active run (BLOCKER-1 fix).
69
+ function hasCleanOwner(ownerPath) {
70
+ try {
71
+ const lines = fs.readFileSync(ownerPath, 'utf8')
72
+ .split('\n').map(function (l) { return l.trim(); }).filter(Boolean);
73
+ return lines.length > 0 &&
74
+ lines.every(function (l) { return /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/.test(l); });
75
+ } catch (e) { return false; }
76
+ }
77
+ function resolveCompanyDir() {
78
+ if (process.env.COMPANY_DIR) return process.env.COMPANY_DIR;
79
+ const home = process.env.HOME || '';
80
+ const cwdDir = path.join(process.cwd(), '.company');
81
+ const homeDir = path.join(home, '.company');
82
+ const cwdHasOwner = hasCleanOwner(path.join(cwdDir, 'OWNER'));
83
+ const homeHasOwner = home && hasCleanOwner(path.join(homeDir, 'OWNER'));
84
+ // cwd/.company wins when it has a clean OWNER (project-local run, or both have OWNER).
85
+ if (cwdHasOwner) return cwdDir;
86
+ if (homeHasOwner) return homeDir;
87
+ return cwdDir; // new-run default: preserves original single-project behavior
88
+ }
89
+ const companyDir = resolveCompanyDir();
67
90
  const ownerPath = path.join(companyDir, 'OWNER');
68
91
  const cancelPath = path.join(companyDir, 'CANCEL');
69
92
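To make the `hasCleanOwner` check in the hunk above easier to reason about, here is a pure variant that takes the file contents directly instead of a path. The regular expression is copied from the diff. The function name `isCleanOwnerText` is hypothetical and exists only for illustration.

```javascript
// Same pattern as the diff: one alphanumeric character followed by at
// least 7 more characters from [A-Za-z0-9._-], so 8 characters minimum.
const SESSION_ID_RE = /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/;

// Hypothetical pure variant of hasCleanOwner: operates on OWNER file text.
function isCleanOwnerText(text) {
  const lines = text.split('\n').map(l => l.trim()).filter(Boolean);
  return lines.length > 0 && lines.every(l => SESSION_ID_RE.test(l));
}

console.log(isCleanOwnerText('abc12345\n'));     // true
console.log(isCleanOwnerText(''));               // false
console.log(isCleanOwnerText('abc12345\n??\n')); // false
console.log(isCleanOwnerText('short'));          // false
```

Note that `every` means a single garbled line disqualifies the whole file. That is stricter than the "at least one valid session-id line" wording in the accompanying comment, which would correspond to `some`.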
 
@@ -7,7 +7,30 @@
7
7
  const fs = require('fs');
8
8
  const path = require('path');
9
9
 
10
- const companyDir = process.env.COMPANY_DIR || path.join(process.cwd(), '.company');
10
+ // Resolve companyDir robustly: COMPANY_DIR env wins; else prefer the dir that holds
11
+ // a clean OWNER (at least one valid session-id line); fall back to cwd/.company.
12
+ // A blank/garbled OWNER does NOT qualify a dir as the active run (BLOCKER-1 fix).
13
+ function hasCleanOwner(ownerPath) {
14
+ try {
15
+ const lines = fs.readFileSync(ownerPath, 'utf8')
16
+ .split('\n').map(function (l) { return l.trim(); }).filter(Boolean);
17
+ return lines.length > 0 &&
18
+ lines.every(function (l) { return /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/.test(l); });
19
+ } catch (e) { return false; }
20
+ }
21
+ function resolveCompanyDir() {
22
+ if (process.env.COMPANY_DIR) return process.env.COMPANY_DIR;
23
+ const home = process.env.HOME || '';
24
+ const cwdDir = path.join(process.cwd(), '.company');
25
+ const homeDir = path.join(home, '.company');
26
+ const cwdHasOwner = hasCleanOwner(path.join(cwdDir, 'OWNER'));
27
+ const homeHasOwner = home && hasCleanOwner(path.join(homeDir, 'OWNER'));
28
+ // cwd/.company wins when it has a clean OWNER (project-local run, or both have OWNER).
29
+ if (cwdHasOwner) return cwdDir;
30
+ if (homeHasOwner) return homeDir;
31
+ return cwdDir; // new-run default: preserves original single-project behavior
32
+ }
33
+ const companyDir = resolveCompanyDir();
11
34
  if (!fs.existsSync(companyDir)) process.exit(0);
12
35
 
13
36
  // Only sessions listed in OWNER are acted on. A foreign session that shares the
@@ -9,7 +9,30 @@
9
9
  const fs = require('fs');
10
10
  const path = require('path');
11
11
 
12
- const companyDir = process.env.COMPANY_DIR || path.join(process.cwd(), '.company');
12
+ // Resolve companyDir robustly: COMPANY_DIR env wins; else prefer the dir that holds
13
+ // a clean OWNER (at least one valid session-id line); fall back to cwd/.company.
14
+ // A blank/garbled OWNER does NOT qualify a dir as the active run (BLOCKER-1 fix).
15
+ function hasCleanOwner(ownerPath) {
16
+ try {
17
+ const lines = fs.readFileSync(ownerPath, 'utf8')
18
+ .split('\n').map(function (l) { return l.trim(); }).filter(Boolean);
19
+ return lines.length > 0 &&
20
+ lines.every(function (l) { return /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/.test(l); });
21
+ } catch (e) { return false; }
22
+ }
23
+ function resolveCompanyDir() {
24
+ if (process.env.COMPANY_DIR) return process.env.COMPANY_DIR;
25
+ const home = process.env.HOME || '';
26
+ const cwdDir = path.join(process.cwd(), '.company');
27
+ const homeDir = path.join(home, '.company');
28
+ const cwdHasOwner = hasCleanOwner(path.join(cwdDir, 'OWNER'));
29
+ const homeHasOwner = home && hasCleanOwner(path.join(homeDir, 'OWNER'));
30
+ // cwd/.company wins when it has a clean OWNER (project-local run, or both have OWNER).
31
+ if (cwdHasOwner) return cwdDir;
32
+ if (homeHasOwner) return homeDir;
33
+ return cwdDir; // new-run default: preserves original single-project behavior
34
+ }
35
+ const companyDir = resolveCompanyDir();
13
36
  if (!fs.existsSync(companyDir)) process.exit(0);
14
37
 
15
38
  // Only sessions listed in OWNER are acted on. A foreign session that shares the
@@ -24,7 +24,30 @@ const fs = require('fs');
24
24
  const path = require('path');
25
25
  const crypto = require('crypto');
26
26
 
27
- const companyDir = process.env.COMPANY_DIR || path.join(process.cwd(), '.company');
27
+ // Resolve companyDir robustly: COMPANY_DIR env wins; else prefer the dir that holds
28
+ // a clean OWNER (at least one valid session-id line); fall back to cwd/.company.
29
+ // A blank/garbled OWNER does NOT qualify a dir as the active run (BLOCKER-1 fix).
30
+ function hasCleanOwner(ownerPath) {
31
+ try {
32
+ const lines = fs.readFileSync(ownerPath, 'utf8')
33
+ .split('\n').map(function (l) { return l.trim(); }).filter(Boolean);
34
+ return lines.length > 0 &&
35
+ lines.every(function (l) { return /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/.test(l); });
36
+ } catch (e) { return false; }
37
+ }
38
+ function resolveCompanyDir() {
39
+ if (process.env.COMPANY_DIR) return process.env.COMPANY_DIR;
40
+ const home = process.env.HOME || '';
41
+ const cwdDir = path.join(process.cwd(), '.company');
42
+ const homeDir = path.join(home, '.company');
43
+ const cwdHasOwner = hasCleanOwner(path.join(cwdDir, 'OWNER'));
44
+ const homeHasOwner = home && hasCleanOwner(path.join(homeDir, 'OWNER'));
45
+ // cwd/.company wins when it has a clean OWNER (project-local run, or both have OWNER).
46
+ if (cwdHasOwner) return cwdDir;
47
+ if (homeHasOwner) return homeDir;
48
+ return cwdDir; // new-run default: preserves original single-project behavior
49
+ }
50
+ const companyDir = resolveCompanyDir();
28
51
  const criteriaPath = path.join(companyDir, 'criteria.json');
29
52
  const goalPath = path.join(companyDir, 'GOAL.md');
30
53
  const cancelPath = path.join(companyDir, 'CANCEL');
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "company-skill",
3
- "version": "4.6.2",
3
+ "version": "4.6.5",
4
4
  "description": "Goal-driven multi-employee company for Claude Code. Give it a goal, it runs until done.",
5
5
  "bin": {
6
6
  "company-skill": "./bin/install.js"
@@ -46,10 +46,11 @@ blocks.forEach((b, i) => {
46
46
  const vw = (b.split('VERIFY-WITH:')[1] || '').split('\n')[0].trim();
47
47
  const errs = [];
48
48
  if (missing.length) errs.push('missing ' + missing.join(' '));
49
- // 8g fix: require a command/verb token, path component, or URL so bare phrases
50
- // like "yes done" are rejected. The old vw.length < 8 guard was removed: it
51
- // caused a false-positive on real short commands like "git log" (7 chars).
52
- const VW_RE = /[/.:]|\b(test|grep|node|python3?|gh|git|curl|cat|ls|npm|make|pytest|diff|echo|jq|bash|sh)\b|\$\(|`|\|\||&&/;
49
+ // 8g fix: require a command/verb token, path component, URL, or a concrete
50
+ // visual-verify phrase (screenshot/playwright/open + named URL/path) so bare
51
+ // filler like "yes done" is rejected. Named-URL screenshot forms are explicitly
52
+ // allowed per skill guidance ("an equally concrete check, like a named URL").
53
+ const VW_RE = /[/.:]|\b(test|grep|node|python3?|gh|git|curl|cat|ls|npm|make|pytest|diff|echo|jq|bash|sh|playwright)\b|\$\(|`|\|\||&&|screenshot\s+https?:\/\/\S+|open\s+https?:\/\/\S+/;
53
54
  if (b.includes('VERIFY-WITH:') && (!vw.length || !VW_RE.test(vw))) errs.push('VERIFY-WITH is empty or vacuous');
54
55
  // ROI must have non-empty content after the colon so triage has something to sort on.
55
56
  const roi = (b.split('ROI:')[1] || '').split('\n')[0].trim();
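The effect of the widened `VW_RE` is easiest to see by running it against a few VERIFY-WITH values. The pattern below is copied from the added line in the diff. The wrapper `isConcreteVerify` is a hypothetical helper that mirrors the `!vw.length || !VW_RE.test(vw)` condition, not a function from the package.

```javascript
// Pattern copied from the 4.6.5 side of the diff.
const VW_RE = /[/.:]|\b(test|grep|node|python3?|gh|git|curl|cat|ls|npm|make|pytest|diff|echo|jq|bash|sh|playwright)\b|\$\(|`|\|\||&&|screenshot\s+https?:\/\/\S+|open\s+https?:\/\/\S+/;

// Hypothetical wrapper: true when a VERIFY-WITH value would NOT be
// flagged as "empty or vacuous" by the check in the diff.
function isConcreteVerify(vw) {
  const value = String(vw || '').trim();
  return value.length > 0 && VW_RE.test(value);
}

console.log(isConcreteVerify('yes done'));                            // false
console.log(isConcreteVerify('git log'));                             // true
console.log(isConcreteVerify('playwright screenshot of login page')); // true
console.log(isConcreteVerify('screenshot https://example.com/login')); // true
```

Judging by the pattern alone, a value containing any URL already matches through the leading `[/.:]` class, so the explicit `screenshot` and `open` alternatives appear to document intent rather than widen the match. The `playwright` token is the part that accepts values the old pattern rejected.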
@@ -129,7 +129,7 @@ function hasOpenPR(branchName) {
129
129
  const prs = JSON.parse(r.stdout || '[]');
130
130
  return prs.length > 0;
131
131
  } catch (e) {
132
- return false;
132
+ return true; // fail safe: unparseable output -> assume open PR, block deletion
133
133
  }
134
134
  }
135
135
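The one-line change in `hasOpenPR` flips the error path from fail-open to fail-safe. A minimal sketch of that behavior, using a hypothetical pure function `hasOpenPRFromStdout` that takes the raw command output as a string (the real function shells out and is only partly visible in this hunk):

```javascript
// Hypothetical pure variant: decide from raw stdout whether a branch
// should be treated as having an open PR.
function hasOpenPRFromStdout(stdout) {
  try {
    const prs = JSON.parse(stdout || '[]');
    return prs.length > 0;
  } catch (e) {
    // Fail safe: if the output cannot be interpreted, assume an open PR
    // exists so the caller does not delete the branch.
    return true;
  }
}

console.log(hasOpenPRFromStdout('[]'));             // false
console.log(hasOpenPRFromStdout('[{"number":1}]')); // true
console.log(hasOpenPRFromStdout('rate limited'));   // true
```

The trade-off is that a persistently failing CLI call leaves stale branches in place, which is usually cheaper than deleting a branch that still backs a draft PR.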
 
@@ -64,13 +64,29 @@ function resolvePort() {
64
64
  }
65
65
  const PORT = resolvePort();
66
66
 
67
+ // A blank/garbled OWNER does NOT qualify a dir as the active run (BLOCKER-1 fix).
68
+ function hasCleanOwner(ownerPath) {
69
+ try {
70
+ const lines = fs.readFileSync(ownerPath, 'utf8')
71
+ .split('\n').map(function (l) { return l.trim(); }).filter(Boolean);
72
+ return lines.length > 0 &&
73
+ lines.every(function (l) { return /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/.test(l); });
74
+ } catch (e) { return false; }
75
+ }
67
76
  function resolveCompanyDir() {
68
77
  const flag = argValue('--company-dir');
69
78
  if (flag) return path.resolve(flag);
70
79
  if (process.env.COMPANY_DIR) return path.resolve(process.env.COMPANY_DIR);
71
- const local = path.resolve('.company');
72
- if (fs.existsSync(local)) return local;
73
- return path.join(os.homedir(), '.company');
80
+ const home = process.env.HOME || os.homedir();
81
+ const cwdDir = path.resolve('.company');
82
+ const homeDir = home ? path.join(home, '.company') : null;
83
+ // Prefer the dir that holds a clean OWNER (real active run) to avoid cwd-drift.
84
+ const cwdHasOwner = hasCleanOwner(path.join(cwdDir, 'OWNER'));
85
+ const homeHasOwner = homeDir && hasCleanOwner(path.join(homeDir, 'OWNER'));
86
+ // cwd/.company wins when it has a clean OWNER (project-local run, or both have OWNER).
87
+ if (cwdHasOwner) return cwdDir;
88
+ if (homeHasOwner) return homeDir;
89
+ return cwdDir; // new-run default: preserves original single-project behavior
74
90
  }
75
91
  const COMPANY_DIR = resolveCompanyDir();
76
92
 
@@ -628,6 +644,8 @@ function computeContextFill(transcriptFile, overrideModel) {
628
644
 
629
645
  // ---------- COMPANY.md org chart parser ----------
630
646
  // Returns { departments: [{ name, lead, roles: [{ name, isLead }] }] }
647
+ // Non-roster headings (Priorities, Rules) and HTML-commented blocks are skipped.
648
+ const NON_ROSTER_SECTIONS = /^(priorities|rules)$/i;
631
649
  function parseCompanyMd() {
632
650
  // Resolve COMPANY.md: env COMPANY_DIR first, else project cwd, else ~/.company
633
651
  const candidates = [
@@ -641,78 +659,73 @@ function parseCompanyMd() {
641
659
  }
642
660
  if (!text) return { departments: [] };
643
661
 
662
+ // Strip HTML comment blocks before parsing so commented-out sections are invisible.
663
+ text = text.replace(/<!--[\s\S]*?-->/g, '');
664
+
644
665
  const departments = [];
645
666
  let currentDept = null;
646
667
 
647
668
  for (const rawLine of text.split('\n')) {
648
669
  const line = rawLine.trim();
649
- // ## heading = new department; strip parenthetical metadata (e.g. "Growth (added 2026-06-09)")
670
+ // ## heading = new department candidate; strip parenthetical metadata first.
650
671
  if (/^##\s+/.test(line)) {
651
672
  const raw = line.replace(/^##\s+/, '').trim();
673
+ // BUG #3: extract the declared lead from the heading "(Lead: X)" before stripping it.
674
+ const headingLeadMatch = raw.match(/\(Lead:\s*([^)]+)\)/i);
675
+ const headingLeadName = headingLeadMatch ? headingLeadMatch[1].trim() : null;
652
676
  const deptName = raw.replace(/\s*\([^)]*\).*$/, '').trim();
653
- currentDept = { name: deptName, lead: null, roles: [] };
677
+ // BUG #2: skip known non-roster sections (Priorities, Rules).
678
+ if (NON_ROSTER_SECTIONS.test(deptName)) { currentDept = null; continue; }
679
+ currentDept = { name: deptName, lead: null, roles: [], _headingLeadName: headingLeadName };
654
680
  departments.push(currentDept);
655
681
  continue;
656
682
  }
657
683
  // Bullet line = a role entry (- **Name** - desc or - Name: desc or - Name, desc)
658
684
  if (/^[-*]\s+/.test(line) && currentDept) {
659
685
  const body = line.replace(/^[-*]\s+/, '').replace(/\*\*/g, '');
660
- // Role name = text before first comma, dash, or colon; strip trailing parenthetical metadata
661
- const nameMatch = body.match(/^([^,\-:]+)/);
662
- if (!nameMatch) continue;
663
- let roleName = nameMatch[1].replace(/\s*\([^)]*\).*$/, '').trim();
664
- // Strip trailing "Lead" prefix markers like "Lead:"
665
- const isExplicitLead = /^lead:/i.test(roleName) || / lead$/i.test(roleName);
666
- if (/^lead:/i.test(roleName)) roleName = roleName.replace(/^lead:/i, '').trim();
686
+ // BUG #3: detect "Lead: Name" bullet form - name comes AFTER the colon.
687
+ const leadPrefixMatch = body.match(/^lead:\s*([^,\-\n]+)/i);
688
+ let roleName, isExplicitLead;
689
+ if (leadPrefixMatch) {
690
+ // "- Lead: Ana runs growth" -> roleName = "Ana", isExplicitLead = true
691
+ roleName = leadPrefixMatch[1].replace(/\s*\([^)]*\).*$/, '').trim().split(/\s+/)[0];
692
+ isExplicitLead = true;
693
+ } else {
694
+ // Normal "- Name, desc" or "- Name - desc" form
695
+ const nameMatch = body.match(/^([^,\-:]+)/);
696
+ if (!nameMatch) continue;
697
+ roleName = nameMatch[1].replace(/\s*\([^)]*\).*$/, '').trim();
698
+ isExplicitLead = false;
699
+ }
667
700
  // CEO is always tier-0, skip from dept roles
668
701
  if (/^ceo$/i.test(roleName)) continue;
669
702
  const role = { name: roleName, isLead: isExplicitLead };
670
703
  currentDept.roles.push(role);
671
- // First role in dept becomes the lead if none marked explicit
704
+ // First role in dept becomes the lead if none marked explicit and no heading lead declared
672
705
  if (!currentDept.lead) currentDept.lead = role;
673
706
  if (isExplicitLead && currentDept.lead !== role) currentDept.lead = role;
674
707
  }
675
708
  }
676
- // Remove departments with no roles
677
- return { departments: departments.filter(d => d.roles.length > 0) };
678
- }
679
709
 
680
- // ---------- MUST-FIX 3 (revised): org tree from COMPANY.md + live agent overlay ----------
681
- function getActiveCycleNumber() {
682
- // Read active cycle from roster header or latest cycle-N-briefing.md
683
- const rosterText = cachedReadFile(path.join(COMPANY_DIR, 'active-roster.md')) || '';
684
- const m = rosterText.match(/cycle\s+(\d+)/i);
685
- if (m) return parseInt(m[1], 10);
686
- // Fallback: find highest numbered briefing
687
- try {
688
- const files = fs.readdirSync(path.join(COMPANY_DIR, 'cycles'));
689
- let max = 0;
690
- for (const f of files) {
691
- const bm = f.match(/^cycle-(\d+)-briefing\.md$/);
692
- if (bm) { const n = parseInt(bm[1], 10); if (n > max) max = n; }
710
+ // Post-process: apply heading-declared lead (BUG #3) - override first-role default
711
+ for (const dept of departments) {
712
+ if (dept._headingLeadName && dept.roles.length > 0) {
713
+ // Find the role whose name matches the heading lead declaration
714
+ const found = dept.roles.find(r =>
715
+ r.name.toLowerCase() === dept._headingLeadName.toLowerCase()
716
+ );
717
+ if (found) dept.lead = found;
718
+ // If no role matches the heading name exactly, fall back to first role (safe default)
693
719
  }
694
- if (max > 0) return max;
695
- } catch (_) { /* ignore */ }
696
- return null;
697
- }
698
-
699
- function parseCycleTaskFile(filepath) {
700
- const text = cachedReadFile(filepath);
701
- if (!text) return [];
702
- const tasks = [];
703
- // Match TASK N: ... EMPLOYEE: ... blocks separated by ---
704
- const taskRe = /^TASK\s+\d+:\s*(.+?)\n(?:[\s\S]*?)^EMPLOYEE:\s*(.+?)(?:\n(?:[\s\S]*?)^SURFACES:\s*(.+?))?(?:\n|$)/gm;
705
- let tm;
706
- while ((tm = taskRe.exec(text)) !== null) {
707
- tasks.push({
708
- task: tm[1].trim().slice(0, 120),
709
- employee: tm[2].trim().slice(0, 60),
710
- surfaces: tm[3] ? tm[3].trim().slice(0, 80) : null
711
- });
720
+ delete dept._headingLeadName;
712
721
  }
713
- return tasks;
722
+
723
+ // Remove departments with no roles (BUG #2: also removes phantom sections)
724
+ return { departments: departments.filter(d => d.roles.length > 0) };
714
725
  }
715
726
 
727
+ // ---------- Org tree from COMPANY.md + live agent overlay ----------
728
+
716
729
  // Score how well a live agent name matches a COMPANY.md role name (higher = better)
717
730
  function roleMatchScore(agentStr, roleName) {
718
731
  const a = agentStr.toLowerCase().replace(/[^a-z0-9 ]/g, ' ').trim();
@@ -731,25 +744,6 @@ function buildOrgTree(projDir, sessionId, liveAgents) {
731
744
  // Tier 0: orchestrator (CEO)
732
745
  const orchNode = { id: 'orchestrator', tier: 0, label: 'CEO', status: 'active', dept: null };
733
746
 
734
- const activeCycle = getActiveCycleNumber();
735
- const cyclesDir = path.join(COMPANY_DIR, 'cycles');
736
-
737
- // Load enrichment from active cycle task files: dept -> [{task, employee, surfaces}]
738
- const enrichment = new Map();
739
- try {
740
- const files = fs.readdirSync(cyclesDir);
741
- for (const f of files) {
742
- if (activeCycle !== null) {
743
- const bm = f.match(/^cycle-(\d+)-tasks-(.+)\.md$/);
744
- if (bm && parseInt(bm[1], 10) === activeCycle) {
745
- const dept = bm[2];
746
- const tasks = parseCycleTaskFile(path.join(cyclesDir, f));
747
- if (tasks.length > 0) enrichment.set(dept, tasks);
748
- }
749
- }
750
- }
751
- } catch (_) { /* ignore */ }
752
-
753
747
  // Load active-roster.md employee names for mapping
754
748
  const rosterText = cachedReadFile(path.join(COMPANY_DIR, 'active-roster.md')) || '';
755
749
 
@@ -885,7 +879,7 @@ function buildOrgTree(projDir, sessionId, liveAgents) {
885
879
  }
886
880
 
887
881
  const note = 'Logically: CEO delegates to dept leads; leads own their team. Physically the orchestrator spawns all agents.';
888
- return { nodes, edges, note, activeCycle };
882
+ return { nodes, edges, note };
889
883
  }
890
884
 
891
885
  // ---------- registry ----------
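The heading handling added to `parseCompanyMd` (strip HTML comments, read a `(Lead: X)` declaration, skip Priorities and Rules) can be exercised in isolation. The sketch below reuses the regular expressions from the diff. The function names `parseDeptHeading` and `listDepartments` and the sample COMPANY.md text are hypothetical, and bullet parsing is left out.

```javascript
const NON_ROSTER_SECTIONS = /^(priorities|rules)$/i;

// Hypothetical helper: parse one "## ..." heading the way the diff does.
// Returns null for non-roster sections.
function parseDeptHeading(line) {
  const raw = line.replace(/^##\s+/, '').trim();
  // Capture the declared lead before the parenthetical is stripped.
  const leadMatch = raw.match(/\(Lead:\s*([^)]+)\)/i);
  const name = raw.replace(/\s*\([^)]*\).*$/, '').trim();
  if (NON_ROSTER_SECTIONS.test(name)) return null;
  return { name, headingLead: leadMatch ? leadMatch[1].trim() : null };
}

// Hypothetical helper: list department headings in a COMPANY.md string.
function listDepartments(text) {
  const visible = text.replace(/<!--[\s\S]*?-->/g, ''); // drop commented-out blocks
  return visible.split('\n')
    .map(l => l.trim())
    .filter(l => /^##\s+/.test(l))
    .map(parseDeptHeading)
    .filter(Boolean);
}

const sample = [
  '## Growth (Lead: Ana)',
  '<!--',
  '## Old Dept',
  '-->',
  '## Priorities',
  '## Engineering (added 2026-06-09)',
].join('\n');

console.log(listDepartments(sample));
```

For the sample above this yields two departments, Growth with heading lead "Ana" and Engineering with no declared lead. The commented-out "Old Dept" and the "Priorities" section are both dropped.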
@@ -28,7 +28,30 @@ const fs = require('fs');
28
28
  const path = require('path');
29
29
  const crypto = require('crypto');
30
30
 
31
- const companyDir = process.env.COMPANY_DIR || path.join(process.cwd(), '.company');
31
+ // Resolve companyDir robustly: COMPANY_DIR env wins; else prefer the dir that holds
32
+ // a clean OWNER (at least one valid session-id line); fall back to cwd/.company.
33
+ // A blank/garbled OWNER does NOT qualify a dir as the active run (BLOCKER-1 fix).
34
+ function hasCleanOwner(ownerPath) {
35
+ try {
36
+ const lines = fs.readFileSync(ownerPath, 'utf8')
37
+ .split('\n').map(function (l) { return l.trim(); }).filter(Boolean);
38
+ return lines.length > 0 &&
39
+ lines.every(function (l) { return /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/.test(l); });
40
+ } catch (e) { return false; }
41
+ }
42
+ function resolveCompanyDir() {
43
+ if (process.env.COMPANY_DIR) return process.env.COMPANY_DIR;
44
+ const home = process.env.HOME || '';
45
+ const cwdDir = path.join(process.cwd(), '.company');
46
+ const homeDir = path.join(home, '.company');
47
+ const cwdHasOwner = hasCleanOwner(path.join(cwdDir, 'OWNER'));
48
+ const homeHasOwner = home && hasCleanOwner(path.join(homeDir, 'OWNER'));
49
+ // cwd/.company wins when it has a clean OWNER (project-local run, or both have OWNER).
50
+ if (cwdHasOwner) return cwdDir;
51
+ if (homeHasOwner) return homeDir;
52
+ return cwdDir; // new-run default: preserves original single-project behavior
53
+ }
54
+ const companyDir = resolveCompanyDir();
32
55
  const lockPath = path.join(companyDir, 'criteria.lock');
33
56
 
34
57
  let anchorDir = null;
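As a reading aid for the hunk above, the clean-OWNER check and the directory precedence can be restated as pure functions. This is only a sketch: `isCleanOwnerText` and `pickCompanyDir` are hypothetical names that do not appear in the package, and the real code reads the OWNER file and the environment directly instead of taking them as arguments.

```javascript
// Sketch only. Hypothetical helper names; the shipped code reads files and env itself.
// Same pattern as the diff: one leading alphanumeric, then at least 7 more id characters.
const SESSION_ID_LINE = /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/;

// An OWNER body is "clean" when it has at least one non-blank line
// and every non-blank line looks like a session id.
function isCleanOwnerText(text) {
  const lines = text.split('\n').map((l) => l.trim()).filter(Boolean);
  return lines.length > 0 && lines.every((l) => SESSION_ID_LINE.test(l));
}

// Precedence used by the hook variant: env override, then cwd with a clean OWNER,
// then home with a clean OWNER, then cwd as the new-run default.
function pickCompanyDir({ envDir, cwdDir, homeDir, cwdClean, homeClean }) {
  if (envDir) return envDir;
  if (cwdClean) return cwdDir;
  if (homeClean) return homeDir;
  return cwdDir;
}

console.log(isCleanOwnerText('abc12345\n'));      // true
console.log(isCleanOwnerText('   \n'));           // false (blank OWNER)
console.log(isCleanOwnerText('abc12345\n???\n')); // false (one garbled line)
```

Note that a single garbled line disqualifies the whole file, because the check uses `every` and not `some`.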
@@ -15,7 +15,30 @@ if (argIdx !== -1 && process.argv[argIdx + 1]) {
15
15
  }
16
16
  const sessionId = sessionArg || process.env.CLAUDE_CODE_SESSION_ID || null;
17
17
 
18
- const companyDir = process.env.COMPANY_DIR || path.join(process.cwd(), '.company');
18
+ // Resolve companyDir robustly: COMPANY_DIR env wins; else prefer the dir that holds
19
+ // a clean OWNER (non-empty, every non-blank line a valid session id); fall back to cwd/.company.
20
+ // A blank/garbled OWNER does NOT qualify a dir as the active run (BLOCKER-1 fix).
21
+ function hasCleanOwner(ownerPath) {
22
+ try {
23
+ const lines = fs.readFileSync(ownerPath, 'utf8')
24
+ .split('\n').map(function (l) { return l.trim(); }).filter(Boolean);
25
+ return lines.length > 0 &&
26
+ lines.every(function (l) { return /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/.test(l); });
27
+ } catch (e) { return false; }
28
+ }
29
+ function resolveCompanyDir() {
30
+ if (process.env.COMPANY_DIR) return process.env.COMPANY_DIR;
31
+ const home = process.env.HOME || '';
32
+ const cwdDir = path.join(process.cwd(), '.company');
33
+ const homeDir = path.join(home, '.company');
34
+ const cwdHasOwner = hasCleanOwner(path.join(cwdDir, 'OWNER'));
35
+ const homeHasOwner = home && hasCleanOwner(path.join(homeDir, 'OWNER'));
36
+ // cwd/.company wins when it has a clean OWNER (project-local run, or both have OWNER).
37
+ if (cwdHasOwner) return cwdDir;
38
+ if (homeHasOwner) return homeDir;
39
+ return cwdDir; // new-run default: preserves original single-project behavior
40
+ }
41
+ const companyDir = resolveCompanyDir();
19
42
 
20
43
  let input;
21
44
  try {
@@ -25,8 +25,28 @@ const os = require('os');
25
25
  let input = '';
26
26
  try { input = fs.readFileSync(0, 'utf8'); } catch (_) {}
27
27
 
28
- // Resolve the company state directory.
29
- const dir = process.env.COMPANY_DIR || path.join(os.homedir(), '.company');
28
+ // Resolve companyDir with the same clean-OWNER preference as the hooks; the no-OWNER fallback stays ~/.company.
29
+ // A blank/garbled OWNER does NOT qualify a dir as the active run (BLOCKER-1 fix).
30
+ function hasCleanOwner(ownerPath) {
31
+ try {
32
+ const lines = fs.readFileSync(ownerPath, 'utf8')
33
+ .split('\n').map(function (l) { return l.trim(); }).filter(Boolean);
34
+ return lines.length > 0 &&
35
+ lines.every(function (l) { return /^[A-Za-z0-9][A-Za-z0-9._-]{7,}$/.test(l); });
36
+ } catch (e) { return false; }
37
+ }
38
+ function resolveDir() {
39
+ if (process.env.COMPANY_DIR) return process.env.COMPANY_DIR;
40
+ const home = process.env.HOME || os.homedir();
41
+ const cwdDir = path.join(process.cwd(), '.company');
42
+ const homeDir = home ? path.join(home, '.company') : null;
43
+ const cwdHasOwner = hasCleanOwner(path.join(cwdDir, 'OWNER'));
44
+ const homeHasOwner = homeDir && hasCleanOwner(path.join(homeDir, 'OWNER'));
45
+ if (cwdHasOwner) return cwdDir;
46
+ if (homeHasOwner) return homeDir;
47
+ return home ? path.join(home, '.company') : cwdDir;
48
+ }
49
+ const dir = resolveDir();
30
50
 
31
51
  // --- Chaining: run the prior statusline command if one is stored ---
32
52
  const baseCfgPath = path.join(dir, 'statusline-base.json');
package/skill/SKILL.md CHANGED
@@ -194,11 +194,11 @@ Otherwise:
194
194
 
195
195
  ```json
196
196
  {"goal":"...","criteria":[
197
- {"id":1,"description":"specific checkable criterion","passes":false,"evidence":null}
197
+ {"id":1,"description":"specific checkable criterion","passes":false,"evidence":null,"stakes":"normal"}
198
198
  ]}
199
199
  ```
200
200
 
201
- Every criterion must be yes/no checkable. No vague language. Every criterion starts FAILING: `passes: false`, `evidence: null`. Only the VERIFY phase may flip a criterion to passing, and only by writing the reproduced evidence into the `evidence` field at the same time. When writing criteria.json for a NEW goal, first run `node <skill-scripts-dir>/reset-company-guard.js` to clear any stale `.company/criteria.lock`, `.company/CANCEL`, `.company/.context-guard-state`, and the external anchor dir from the previous run. The stop guard re-snapshots the new id set on first sight once the stale anchor is gone. Clearing the external anchor and the context-guard state is symmetric with clearing the criteria.lock: skipping either would leave the prior run's state active for the new goal.
201
+ Every criterion must be yes/no checkable. No vague language. Every criterion starts FAILING: `passes: false`, `evidence: null`. Set `stakes: "high"` on any criterion that is irreversible, touches a security surface, makes a public-facing claim, or could cause data loss. Omit `stakes` or set it to `"normal"` for everything else; the default is normal (single critic, existing behavior unchanged). The 3-lens verify in the VERIFY section gates on this field, so a criterion that warrants high scrutiny MUST carry it here or the feature never triggers. Only the VERIFY phase may flip a criterion to passing, and only by writing the reproduced evidence into the `evidence` field at the same time. When writing criteria.json for a NEW goal, first run `node <skill-scripts-dir>/reset-company-guard.js` to clear any stale `.company/criteria.lock`, `.company/CANCEL`, `.company/.context-guard-state`, and the external anchor dir from the previous run. The stop guard re-snapshots the new id set on first sight once the stale anchor is gone. Clearing the external anchor and the context-guard state is symmetric with clearing the criteria.lock: skipping either would leave the prior run's state active for the new goal.
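The `stakes` field described in the added paragraph could be authored along the following lines. This is a minimal sketch, not code from the package: `newCriterion` and `effectiveStakes` are hypothetical helpers, and the second criterion's description is made-up example data.

```javascript
// Sketch only. Hypothetical helpers for writing criteria.json entries.
const ALLOWED_STAKES = ['normal', 'high'];

// Every criterion starts failing, with no evidence; stakes defaults to "normal".
function newCriterion(id, description, stakes = 'normal') {
  if (!ALLOWED_STAKES.includes(stakes)) {
    throw new Error(`unknown stakes value: ${stakes}`);
  }
  return { id, description, passes: false, evidence: null, stakes };
}

// Reading side: a missing stakes field is treated as "normal".
function effectiveStakes(criterion) {
  return criterion.stakes === 'high' ? 'high' : 'normal';
}

const doc = {
  goal: '...',
  criteria: [
    newCriterion(1, 'specific checkable criterion'),
    newCriterion(2, 'example irreversible change', 'high'), // made-up example
  ],
};
console.log(JSON.stringify(doc, null, 2));
```

Treating an absent field as normal on the reading side keeps older criteria.json files valid, which matches the "existing behavior unchanged" wording.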
202
202
 
203
203
  The stop guard does NOT auto-heal when GOAL.md changes or any other file-state heuristic fires. That design is intentional: any automatic heal keyed on .company/ file state is bypassable by an in-run actor that can write criteria.json (and also write GOAL.md, which is a sibling file). `reset-company-guard.js` is the ONLY safe path - it is a deliberate, auditable action run before criteria.json is written, not a silent in-guard reset.
204
204
 
@@ -277,7 +277,7 @@ Write `.company/cycles/cycle-{N}-briefing.md` first (exact name, the PreCompact
277
277
 
278
278
  As CEO, read the GOAL and COMPANY.md. Decide which departments and employees are RELEVANT to this specific goal. Only activate relevant ones. A mobile app goal does not need a Topologist. Write `.company/active-roster.md`: each activated employee with a one-line reason.
279
279
 
280
- **Effort scaling:** size the spawn to the goal before spawning anything. Trivial goal (single surface, known fix): no leads, 1-2 contracts written by you. Medium (one department's scope, one wave): 1-2 leads. Complex (multi-surface or unknown root cause): full parallel leads + dependency waves. State the chosen tier in the cycle briefing so the critic can challenge over- or under-spawn. Tie effort to ROI: spend heavier spawn on the highest-value decomposition of the goal, not the most obvious. When two decompositions are both sound, pick the one that unblocks more downstream work or closes the riskiest criterion first.
280
+ **Effort scaling:** size the spawn to the goal before spawning anything. Trivial goal (single surface, known fix): no leads, 1-2 contracts written by you. Medium (one department's scope, one wave): 1-2 leads. Complex (multi-surface or unknown root cause): full parallel leads + dependency waves. State the chosen tier in the cycle briefing so the critic can challenge over- or under-spawn. Tie effort to ROI and stakes: spend heavier spawn on the highest-value decomposition of the goal, not the most obvious. When two decompositions are both sound, pick the one that unblocks more downstream work or closes the riskiest criterion first.
281
281
 
282
282
  Spawn ALL relevant department leads in parallel: one `company-lead` Agent call per department, every Agent call in a SINGLE message. Sequential lead spawns are a bug. If an Agent call fails transiently, retry once, then record the lead as unavailable and fold its planning into your own.
283
283
 
@@ -355,6 +355,15 @@ Before spawning the reviewer, run the findings shape gate: `node <skill-scripts-
355
355
 
356
356
  Then spawn `company-critic` (the Devil's Advocate) on everything marked passing. Its probes: was the evidence reproduced or just transcribed? Does the test actually exercise the change? What input breaks it? What surface was never checked? Could this be simpler? Would a real user understand it? For every external claim: verified from their repo or docs, or guessed? A single unclosed gap means NOT DONE.
357
357
 
358
+ **Perspective-diverse verify for high-stakes criteria.** This generalizes the restart-debate
359
+ 3-role panel (see Restart mode) and Anthropic's evaluator-optimizer/fresh-verifier pattern to
360
+ in-loop high-stakes criteria. Normal-stakes criteria keep the single critic above.
361
+ For a criterion tagged `stakes: high` in criteria.json (irreversible action, security surface,
362
+ public-facing claim, or data-loss risk) or stalled at `attempts >= 2`, spawn the critic in THREE fresh contexts, each with a
363
+ distinct `LENS:` directive in its prompt: correctness, security, reproducibility. Each returns a
364
+ binary ACCEPT/REJECT + one gap line. Any REJECT blocks (unanimous ACCEPT required to pass).
365
+ No per-lens numeric score. Record all three verdicts in the cycle review.
366
+
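The unanimity rule in the paragraph above can be illustrated with a small combiner. This is a sketch under assumptions: the verdict object shape (`verdict`, `gap`) and the name `combineLensVerdicts` are hypothetical, and any value other than `ACCEPT` is treated as blocking.

```javascript
// Sketch only. Hypothetical verdict shape: { verdict: 'ACCEPT' | 'REJECT', gap: string }.
const LENSES = ['correctness', 'security', 'reproducibility'];

// Unanimous ACCEPT is required; a single REJECT blocks the criterion.
function combineLensVerdicts(verdicts) {
  const missing = LENSES.filter((lens) => !verdicts[lens]);
  if (missing.length > 0) {
    throw new Error('missing lens verdicts: ' + missing.join(', '));
  }
  const blockedBy = LENSES.filter((lens) => verdicts[lens].verdict !== 'ACCEPT');
  return { passes: blockedBy.length === 0, blockedBy };
}

const result = combineLensVerdicts({
  correctness: { verdict: 'ACCEPT', gap: '' },
  security: { verdict: 'REJECT', gap: 'token logged in plain text' }, // made-up gap line
  reproducibility: { verdict: 'ACCEPT', gap: '' },
});
console.log(result); // { passes: false, blockedBy: [ 'security' ] }
```

There is deliberately no score aggregation here, in line with the "no per-lens numeric score" rule.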
358
367
  **MERGE GATE:** nothing merges during EXECUTE. A worker's output stops at a draft PR. Only after the reviewer grades the relevant criterion MET on reproduced evidence AND the critic accepts it does the ORCHESTRATOR merge, recording the verdict in the cycle review. Workers never merge, ever. The merge gate reads the PR's Proof of work block against the reviewer's reproduction.
359
368
 
360
369
  **BRANCH AND WORKTREE HYGIENE (MANDATORY after every merge):** after merging a PR, the orchestrator MUST delete the merged branch with `gh pr merge --delete-branch` (the flag deletes the remote branch atomically with the merge) and remove its worktree with `git worktree remove --force <worktree-path>` followed by `git worktree prune`. A merged branch left on origin and a stale worktree are both bugs. Runs that touch multiple repos MUST apply this to every merged PR, not just the last one.
@@ -379,7 +388,11 @@ CYCLE {N} VERDICT: {DONE or NOT DONE}
379
388
  ALL criteria pass + critic accepts = EXIT.
380
389
  Otherwise = loop, re-spawning only the FAILING tasks with the review feedback in their contracts.
381
390
 
382
- **Stall detector:** the reviewer keeps an `attempts` count on each criterion's entry in criteria.json (increment on every cycle it stays failing - the stop guard ignores extra fields). At `attempts >= 2` with same-shape evidence, the next THINK MUST produce a structurally different decomposition for that criterion: new approach, new surfaces, or HIRE. Re-issuing a near-identical contract after two same-shape failures is a planning bug, not persistence.
391
+ **Stall detector:** the reviewer keeps an `attempts` count on each criterion's entry in criteria.json (increment on every cycle it stays failing - the stop guard ignores extra fields). At `attempts >= 2` with same-shape evidence, the next THINK MUST produce a structurally different decomposition for that criterion: new approach, new surfaces, or HIRE. Re-issuing a near-identical contract after two same-shape failures is a planning bug, not persistence. The reviewer agent file instructs the reviewer to increment `attempts` every cycle a criterion stays failing; that instruction is what puts the value into criteria.json so the stall detector and the high-stakes gate can read it.
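The `attempts` bookkeeping behind the stall detector might look like the sketch below. The helper names `recordFailedCycle` and `stalledIds` are hypothetical, and the "same-shape evidence" comparison is left out because the text does not define how shape is compared.

```javascript
// Sketch only. Hypothetical helpers; the same-shape evidence check is omitted.

// After a cycle, bump `attempts` on every criterion that is still failing.
function recordFailedCycle(criteria) {
  return criteria.map((c) =>
    c.passes ? c : { ...c, attempts: (c.attempts || 0) + 1 }
  );
}

// Criteria that have failed at least twice need a structurally different plan.
function stalledIds(criteria) {
  return criteria
    .filter((c) => !c.passes && (c.attempts || 0) >= 2)
    .map((c) => c.id);
}

const afterCycle = recordFailedCycle([
  { id: 1, passes: false },
  { id: 2, passes: true },
  { id: 3, passes: false, attempts: 1 },
]);
console.log(stalledIds(afterCycle)); // [ 3 ]
```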
392
+
393
+ **ANTI-VACUOUS TEST:** a test must exercise the exact code path it claims to verify, never bypass it (e.g. by setting an env that short-circuits the code under test, or asserting on base state that predates the change). A test that passes against the pre-change code is vacuous and provides no signal. The reviewer confirms that every new test FAILS against the unfixed code before accepting it as evidence. The critic probes for this on every passing criterion.
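One way to picture the anti-vacuous rule: run the same test against the pre-change and post-change code and accept it as evidence only when it fails before and passes after. The sketch below assumes both versions are available as functions; `acceptAsEvidence` and the toy `before`/`after` implementations are hypothetical and not part of the skill.

```javascript
// Sketch only. Hypothetical helper and toy implementations.

// Returns true when the test function completes without throwing.
function testPasses(testFn, impl) {
  try {
    testFn(impl);
    return true;
  } catch (_) {
    return false;
  }
}

// A test counts as evidence only if it fails on the unfixed code and passes on the fix.
function acceptAsEvidence(testFn, before, after) {
  return !testPasses(testFn, before) && testPasses(testFn, after);
}

const before = (s) => s;        // toy pre-change code: does not trim
const after = (s) => s.trim();  // toy post-change code: trims

// Exercises the changed behavior, so it fails against `before`.
const realTest = (impl) => {
  if (impl('  a ') !== 'a') throw new Error('expected trimmed output');
};
// Passes against both versions, so it carries no signal.
const vacuousTest = (impl) => {
  if (typeof impl('x') !== 'string') throw new Error('expected a string');
};

console.log(acceptAsEvidence(realTest, before, after));    // true
console.log(acceptAsEvidence(vacuousTest, before, after)); // false
```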
394
+
395
+ **FEATURE REACHABILITY:** a shipped feature that gates on a field or condition is dead if the authoring path that sets that field does not exist. When a new feature is introduced, the reviewer checks that the runtime path which produces the gating field is actually present in the skill or agent instructions, not just that the feature's code merged. The critic probes for unreachable gates on every new feature.
383
396
 
384
397
  ### COMPRESS (between cycles)
385
398