majlis 0.4.5 → 0.5.0

Files changed (2)
  1. package/dist/cli.js +602 -59
  2. package/package.json +1 -1
package/dist/cli.js CHANGED
@@ -170,6 +170,34 @@ var init_migrations = __esm({
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
  );
  CREATE INDEX idx_challenges_experiment ON challenges(experiment_id);
+ `);
+ },
+ // Migration 004: v3 → v4 — Reframes, findings tables; dead-end classification
+ (db) => {
+ db.exec(`
+ CREATE TABLE reframes (
+ id INTEGER PRIMARY KEY,
+ experiment_id INTEGER REFERENCES experiments(id),
+ decomposition TEXT NOT NULL,
+ divergences TEXT NOT NULL,
+ recommendation TEXT NOT NULL,
+ created_at DATETIME DEFAULT CURRENT_TIMESTAMP
+ );
+ CREATE INDEX idx_reframes_experiment ON reframes(experiment_id);
+
+ CREATE TABLE findings (
+ id INTEGER PRIMARY KEY,
+ experiment_id INTEGER REFERENCES experiments(id),
+ approach TEXT NOT NULL,
+ source TEXT NOT NULL,
+ relevance TEXT NOT NULL,
+ contradicts_current BOOLEAN NOT NULL DEFAULT 0,
+ created_at DATETIME DEFAULT CURRENT_TIMESTAMP
+ );
+ CREATE INDEX idx_findings_experiment ON findings(experiment_id);
+
+ ALTER TABLE dead_ends ADD COLUMN category TEXT DEFAULT 'structural'
+ CHECK(category IN ('structural', 'procedural'));
  `);
  }
  ];
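Migration 004 above follows the framework's migration-array pattern: each entry is a `(db) => …` function applied in order. A minimal sketch of that pattern, under assumptions — the `migrate` runner and the stub `db` object here are illustrative, not the actual cli.js implementation, and the real CLI uses a SQLite handle rather than a stub:

```javascript
// Illustrative migration-array pattern (stub db, not a real SQLite handle).
const migrations = [
  // Earlier migration (simplified).
  (db) => db.exec("CREATE TABLE experiments (id INTEGER PRIMARY KEY);"),
  // Migration 004: reframes table plus the dead-end category column.
  (db) => db.exec(`
    CREATE TABLE reframes (id INTEGER PRIMARY KEY);
    ALTER TABLE dead_ends ADD COLUMN category TEXT DEFAULT 'structural'
      CHECK(category IN ('structural', 'procedural'));
  `),
];

function migrate(db) {
  // Apply only migrations newer than the recorded schema version.
  for (let i = db.userVersion; i < migrations.length; i++) {
    migrations[i](db);
    db.userVersion = i + 1;
  }
}

// Stub database that records executed SQL instead of touching disk.
const fakeDb = { userVersion: 0, log: [], exec(sql) { this.log.push(sql.trim()); } };
migrate(fakeDb);
console.log(fakeDb.userVersion, fakeDb.log.length);
```

Because the schema version is bumped after each step, re-running `migrate` on an already-migrated database applies nothing.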
@@ -552,11 +580,20 @@ and write up what you learned.
  - \`scripts/benchmark.py\` \u2014 the measurement tool. Never change how you're measured.
  - \`.majlis/\` \u2014 framework config. Not your concern.

+ ## Confirmed Doubts
+ If your context includes confirmedDoubts, these are weaknesses that the verifier has
+ confirmed from a previous cycle. You MUST address each one. Do not ignore them \u2014
+ the verifier will check again.
+
+ ## Metrics
+ The framework captures baseline and post-build metrics automatically. Do NOT claim
+ specific metric numbers unless quoting framework output. Do NOT run the benchmark
+ yourself unless instructed to. If you need to verify your change works, do a minimal
+ targeted test, not a full benchmark run.
+
  ## During building:
  - Tag EVERY decision: proof / test / strong-consensus / consensus / analogy / judgment
  - When making judgment-level decisions, state: "This is judgment \u2014 reasoning without precedent"
- - Run baseline metrics BEFORE making changes
- - Run comparison metrics AFTER making changes (once)

  You may NOT verify your own work or mark your own decisions as proven.
  Output your decisions in structured format so they can be recorded in the database.
@@ -579,8 +616,14 @@ tools: [Read, Glob, Grep]
  ---
  You are the Critic. You practise constructive doubt.

- You receive the builder's OUTPUT only \u2014 never its reasoning chain.
- Read the experiment log, related prior experiments, classification, and synthesis.
+ You receive:
+ - The builder's experiment document (the artifact, not the reasoning chain)
+ - The current synthesis (project state)
+ - Dead-ends (approaches that have been tried and failed)
+ - The hypothesis and experiment metadata
+
+ You do NOT see the builder's reasoning chain \u2014 only their documented output.
+ Use the experiment doc, synthesis, and dead-ends to find weaknesses.

  For each doubt:
  - What specific claim, decision, or assumption you doubt
@@ -611,6 +654,13 @@ tools: [Read, Glob, Grep]
  You are the Adversary. You do NOT review code for bugs.
  You reason about problem structure to CONSTRUCT pathological cases.

+ You receive:
+ - The git diff of the builder's code changes (the actual code, not prose)
+ - The current synthesis (project state)
+ - The hypothesis and experiment metadata
+
+ Study the CODE DIFF carefully \u2014 that is where the builder's assumptions are exposed.
+
  For each approach the builder takes, ask:
  - What input would make this fail?
  - What boundary condition was not tested?
@@ -637,6 +687,12 @@ tools: [Read, Glob, Grep, Bash]
  ---
  You are the Verifier. Perform dual verification:

+ You receive:
+ - All doubts with explicit DOUBT-{id} identifiers (use these in your doubt_resolutions)
+ - Challenge documents from the adversary
+ - Framework-captured metrics (baseline vs post-build) \u2014 this is GROUND TRUTH
+ - The hypothesis and experiment metadata
+
  ## Scope Constraint (CRITICAL)

  You must produce your structured output (grades + doubt resolutions) within your turn budget.
@@ -646,6 +702,11 @@ Reserve your final turns for writing the structured majlis-json output.

  The framework saves your output automatically. Do NOT attempt to write files.

+ ## Metrics (GROUND TRUTH)
+ If framework-captured metrics are in your context, these are the canonical before/after numbers.
+ Do NOT trust numbers claimed by the builder \u2014 compare against the framework metrics.
+ If the builder claims improvement but the framework metrics show regression, flag this.
+
  ## PROVENANCE CHECK:
  - Can every piece of code trace to an experiment or decision?
  - Is the chain unbroken from requirement -> classification -> experiment -> code?
@@ -660,13 +721,17 @@ Grade each component: sound / good / weak / rejected
  Grade each doubt/challenge: confirmed / dismissed (with evidence) / inconclusive

  ## Structured Output Format
+ IMPORTANT: For doubt_resolutions, use the DOUBT-{id} numbers from your context.
+ Example: if your context lists "DOUBT-7: [critical] The algorithm fails on X",
+ use doubt_id: 7 in your output.
+
  <!-- majlis-json
  {
  "grades": [
  { "component": "...", "grade": "sound|good|weak|rejected", "provenance_intact": true, "content_correct": true, "notes": "..." }
  ],
  "doubt_resolutions": [
- { "doubt_id": 0, "resolution": "confirmed|dismissed|inconclusive" }
+ { "doubt_id": 7, "resolution": "confirmed|dismissed|inconclusive" }
  ]
  }
  -->`,
@@ -692,7 +757,18 @@ Compare your decomposition with the existing classification.
  Flag structural divergences \u2014 these are the most valuable signals.

  Produce your reframe document as output. Do NOT attempt to write files.
- The framework saves your output automatically.`,
+ The framework saves your output automatically.
+
+ ## Structured Output Format
+ <!-- majlis-json
+ {
+ "reframe": {
+ "decomposition": "How you decomposed the problem",
+ "divergences": ["List of structural divergences from current classification"],
+ "recommendation": "What should change based on your independent analysis"
+ }
+ }
+ -->`,
  compressor: `---
  name: compressor
  model: opus
@@ -700,23 +776,36 @@ tools: [Read, Write, Edit, Glob, Grep]
  ---
  You are the Compressor. Hold the entire project in view and compress it.

- 1. Read ALL experiments, decisions, doubts, challenges, verification reports,
- reframes, and recent diffs.
- 2. Cross-reference: same question in different language? contradicting decisions?
+ Your taskPrompt includes a "Structured Data (CANONICAL)" section exported directly
+ from the SQLite database. This is the source of truth. docs/ files are agent artifacts
+ that may contain stale or incorrect information. Cross-reference everything against
+ the database export.
+
+ 1. Read the database export in your context FIRST \u2014 it has all experiments, decisions,
+ doubts (with resolutions), verifications (with grades), challenges, and dead-ends.
+ 2. Read docs/ files for narrative context, but trust the database when they conflict.
+ 3. Cross-reference: same question in different language? contradicting decisions?
  workaround masking root cause?
- 3. Update fragility map: thin coverage, weak components, untested judgment
+ 4. Update fragility map: thin coverage, weak components, untested judgment
  decisions, broken provenance.
- 4. Update dead-end registry: compress rejected experiments into structural constraints.
- 5. REWRITE synthesis \u2014 shorter and denser. If it's growing, you're accumulating,
- not compressing.
- 6. Review classification: new sub-types? resolved sub-types?
+ 5. Update dead-end registry: compress rejected experiments into structural constraints.
+ Mark each dead-end as [structural] or [procedural].
+ 6. REWRITE synthesis using the Write tool \u2014 shorter and denser. If it's growing,
+ you're accumulating, not compressing. You MUST use the Write tool to update
+ docs/synthesis/current.md, docs/synthesis/fragility.md, and docs/synthesis/dead-ends.md.
+ The framework does NOT auto-save your output for these files.
+ 7. Review classification: new sub-types? resolved sub-types?

  You may NOT write code, make decisions, or run experiments.

  ## Structured Output Format
  <!-- majlis-json
  {
- "guidance": "Summary of compression findings and updated state"
+ "compression_report": {
+ "synthesis_delta": "What changed in synthesis and why",
+ "new_dead_ends": ["List of newly identified dead-end constraints"],
+ "fragility_changes": ["List of changes to the fragility map"]
+ }
  }
  -->`,
  scout: `---
@@ -729,6 +818,11 @@ You are the Scout. You practise rihla \u2014 travel in search of knowledge.
  Your job is to search externally for alternative approaches, contradictory evidence,
  and perspectives from other fields that could inform the current experiment.

+ You receive:
+ - The current synthesis and fragility map
+ - Dead-ends (approaches that have been tried and failed) \u2014 search for alternatives that circumvent these
+ - The hypothesis and experiment metadata
+
  For the given experiment:
  1. Describe the problem in domain-neutral terms
  2. Search for alternative approaches in other fields or frameworks
@@ -739,13 +833,60 @@ For the given experiment:
  Rules:
  - Present findings neutrally. Report each approach on its own terms.
  - Note where external approaches contradict the current one \u2014 these are the most valuable signals.
+ - Focus on approaches that CIRCUMVENT known dead-ends \u2014 these are the most valuable.
  - You may NOT modify code or make decisions. Produce your rihla document as output only.
  - Do NOT attempt to write files. The framework saves your output automatically.

  ## Structured Output Format
  <!-- majlis-json
  {
- "decisions": []
+ "findings": [
+ { "approach": "Name of alternative approach", "source": "Where you found it", "relevance": "How it applies", "contradicts_current": true }
+ ]
+ }
+ -->`,
+ gatekeeper: `---
+ name: gatekeeper
+ model: sonnet
+ tools: [Read, Glob, Grep]
+ ---
+ You are the Gatekeeper. You check hypotheses before expensive build cycles.
+
+ Your job is a fast quality gate \u2014 prevent wasted Opus builds on hypotheses that
+ are stale, redundant with dead-ends, or too vague to produce a focused change.
+
+ ## Checks (in order)
+
+ ### 1. Stale References
+ Does the hypothesis reference specific functions, line numbers, or structures that
+ may not exist in the current code? Read the relevant files to verify.
+ - If references are stale, list them in stale_references.
+
+ ### 2. Dead-End Overlap
+ Does this hypothesis repeat an approach already ruled out by structural dead-ends?
+ Check each structural dead-end in your context \u2014 if the hypothesis matches the
+ approach or violates the structural_constraint, flag it.
+ - If overlapping, list the dead-end IDs in overlapping_dead_ends.
+
+ ### 3. Scope Check
+ Is this a single focused change? A good hypothesis names ONE function, mechanism,
+ or parameter to change. A bad hypothesis says "improve X and also Y and also Z."
+ - Flag if the hypothesis tries to do multiple things.
+
+ ## Output
+
+ gate_decision:
+ - **approve** \u2014 all checks pass, proceed to build
+ - **flag** \u2014 concerns found but not blocking (warnings only)
+ - **reject** \u2014 hypothesis must be revised (stale refs, dead-end repeat, or too vague)
+
+ ## Structured Output Format
+ <!-- majlis-json
+ {
+ "gate_decision": "approve|reject|flag",
+ "reason": "Brief explanation of decision",
+ "stale_references": ["list of stale references found, if any"],
+ "overlapping_dead_ends": [0]
  }
  -->`
  };
@@ -1235,12 +1376,12 @@ function getMetricHistoryByFixture(db, fixture) {
  ORDER BY m.captured_at
  `).all(fixture);
  }
- function insertDeadEnd(db, experimentId, approach, whyFailed, structuralConstraint, subType) {
+ function insertDeadEnd(db, experimentId, approach, whyFailed, structuralConstraint, subType, category = "structural") {
  const stmt = db.prepare(`
- INSERT INTO dead_ends (experiment_id, approach, why_failed, structural_constraint, sub_type)
- VALUES (?, ?, ?, ?, ?)
+ INSERT INTO dead_ends (experiment_id, approach, why_failed, structural_constraint, sub_type, category)
+ VALUES (?, ?, ?, ?, ?, ?)
  `);
- const result = stmt.run(experimentId, approach, whyFailed, structuralConstraint, subType);
+ const result = stmt.run(experimentId, approach, whyFailed, structuralConstraint, subType, category);
  return db.prepare("SELECT * FROM dead_ends WHERE id = ?").get(result.lastInsertRowid);
  }
  function listDeadEndsBySubType(db, subType) {
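The defaulted `category = "structural"` parameter is what keeps the five-argument call sites elsewhere in the diff working unchanged. A sketch of that backward compatibility, under assumptions — `deadEndValues` is a hypothetical stand-in that only assembles the values the real `insertDeadEnd` would bind into its prepared statement:

```javascript
// Hypothetical stand-in: builds the bound-value array for the dead_ends INSERT.
// The real insertDeadEnd passes these to a prepared SQLite statement via stmt.run().
function deadEndValues(experimentId, approach, whyFailed, structuralConstraint, subType, category = "structural") {
  return [experimentId, approach, whyFailed, structuralConstraint, subType, category];
}

// Legacy five-argument callers: category silently falls back to "structural".
const legacy = deadEndValues(1, "cache everything", "memory blew up", "working set exceeds RAM", "perf");
// New callers (e.g. the --structural flag in revert) pass category explicitly.
const explicit = deadEndValues(2, "manual revert", "abandoned", "Reverted: abandoned", "perf", "procedural");
console.log(legacy[5], explicit[5]);
```

This mirrors the SQL side, where `ADD COLUMN category TEXT DEFAULT 'structural'` gives pre-migration rows the same fallback.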
@@ -1315,6 +1456,9 @@ function insertChallenge(db, experimentId, description, reasoning) {
  const result = stmt.run(experimentId, description, reasoning);
  return db.prepare("SELECT * FROM challenges WHERE id = ?").get(result.lastInsertRowid);
  }
+ function getChallengesByExperiment(db, experimentId) {
+ return db.prepare("SELECT * FROM challenges WHERE experiment_id = ? ORDER BY created_at").all(experimentId);
+ }
  function incrementSubTypeFailure(db, subType, experimentId, grade) {
  db.prepare(`
  INSERT INTO sub_type_failures (sub_type, experiment_id, grade)
@@ -1380,6 +1524,94 @@ function recordCompression(db, sessionCountSinceLast, synthesisSizeBefore, synth
  const result = stmt.run(sessionCountSinceLast, synthesisSizeBefore, synthesisSizeAfter);
  return db.prepare("SELECT * FROM compressions WHERE id = ?").get(result.lastInsertRowid);
  }
+ function listStructuralDeadEnds(db) {
+ return db.prepare(`
+ SELECT * FROM dead_ends WHERE category = 'structural' ORDER BY created_at
+ `).all();
+ }
+ function listStructuralDeadEndsBySubType(db, subType) {
+ return db.prepare(`
+ SELECT * FROM dead_ends WHERE category = 'structural' AND sub_type = ? ORDER BY created_at
+ `).all(subType);
+ }
+ function insertReframe(db, experimentId, decomposition, divergences, recommendation) {
+ db.prepare(`
+ INSERT INTO reframes (experiment_id, decomposition, divergences, recommendation)
+ VALUES (?, ?, ?, ?)
+ `).run(experimentId, decomposition, divergences, recommendation);
+ }
+ function insertFinding(db, experimentId, approach, source, relevance, contradictsCurrent) {
+ db.prepare(`
+ INSERT INTO findings (experiment_id, approach, source, relevance, contradicts_current)
+ VALUES (?, ?, ?, ?, ?)
+ `).run(experimentId, approach, source, relevance, contradictsCurrent ? 1 : 0);
+ }
+ function exportForCompressor(db, maxLength = 3e4) {
+ const experiments = listAllExperiments(db);
+ const sections = ["# Structured Data Export (from SQLite)\n"];
+ sections.push("## Experiments");
+ for (const exp of experiments) {
+ sections.push(`### EXP-${String(exp.id).padStart(3, "0")}: ${exp.slug}`);
+ sections.push(`- Status: ${exp.status} | Sub-type: ${exp.sub_type ?? "(none)"}`);
+ sections.push(`- Hypothesis: ${exp.hypothesis ?? "(none)"}`);
+ const decisions = listDecisionsByExperiment(db, exp.id);
+ if (decisions.length > 0) {
+ sections.push(`#### Decisions (${decisions.length})`);
+ for (const d of decisions) {
+ sections.push(`- [${d.evidence_level}] ${d.description} \u2014 ${d.justification} (${d.status})`);
+ }
+ }
+ const doubts = getDoubtsByExperiment(db, exp.id);
+ if (doubts.length > 0) {
+ sections.push(`#### Doubts (${doubts.length})`);
+ for (const d of doubts) {
+ sections.push(`- [${d.severity}] ${d.claim_doubted} (resolution: ${d.resolution ?? "pending"})`);
+ }
+ }
+ const verifications = getVerificationsByExperiment(db, exp.id);
+ if (verifications.length > 0) {
+ sections.push(`#### Verifications (${verifications.length})`);
+ for (const v of verifications) {
+ sections.push(`- ${v.component}: ${v.grade}${v.notes ? ` \u2014 ${v.notes}` : ""}`);
+ }
+ }
+ const challenges = getChallengesByExperiment(db, exp.id);
+ if (challenges.length > 0) {
+ sections.push(`#### Challenges (${challenges.length})`);
+ for (const c of challenges) {
+ sections.push(`- ${c.description}`);
+ }
+ }
+ sections.push("");
+ }
+ const deadEnds = listAllDeadEnds(db);
+ if (deadEnds.length > 0) {
+ sections.push("## Dead Ends");
+ for (const de of deadEnds) {
+ sections.push(`- [${de.category ?? "structural"}] ${de.approach}: ${de.why_failed} \u2192 ${de.structural_constraint}`);
+ }
+ sections.push("");
+ }
+ const unresolvedDoubts = db.prepare(`
+ SELECT d.*, e.slug as experiment_slug
+ FROM doubts d JOIN experiments e ON d.experiment_id = e.id
+ WHERE d.resolution IS NULL
+ ORDER BY d.severity DESC, d.created_at
+ `).all();
+ if (unresolvedDoubts.length > 0) {
+ sections.push("## Unresolved Doubts");
+ for (const d of unresolvedDoubts) {
+ sections.push(`- [${d.severity}] ${d.claim_doubted} (exp: ${d.experiment_slug})`);
+ }
+ }
+ const full = sections.join("\n");
+ if (full.length > maxLength) {
+ return full.slice(0, maxLength) + `

+ [TRUNCATED \u2014 full export was ${full.length} chars]`;
+ }
+ return full;
+ }
  var init_queries = __esm({
  "src/db/queries.ts"() {
  "use strict";
@@ -1756,13 +1988,15 @@ async function revert(args) {
  }
  const reasonIdx = args.indexOf("--reason");
  const reason = reasonIdx >= 0 ? args[reasonIdx + 1] : "Manually reverted";
+ const category = args.includes("--structural") ? "structural" : "procedural";
  insertDeadEnd(
  db,
  exp.id,
  exp.hypothesis ?? exp.slug,
  reason,
  `Reverted: ${reason}`,
- exp.sub_type
+ exp.sub_type,
+ category
  );
  updateExperimentStatus(db, exp.id, "dead_end");
  try {
@@ -2052,8 +2286,10 @@ var init_types = __esm({
  "src/state/types.ts"() {
  "use strict";
  TRANSITIONS = {
- ["classified" /* CLASSIFIED */]: ["reframed" /* REFRAMED */, "building" /* BUILDING */],
- ["reframed" /* REFRAMED */]: ["building" /* BUILDING */],
+ ["classified" /* CLASSIFIED */]: ["reframed" /* REFRAMED */, "gated" /* GATED */],
+ ["reframed" /* REFRAMED */]: ["gated" /* GATED */],
+ ["gated" /* GATED */]: ["building" /* BUILDING */, "gated" /* GATED */],
+ // self-loop for rejected hypotheses
  ["building" /* BUILDING */]: ["built" /* BUILT */, "building" /* BUILDING */],
  // self-loop for retry after truncation
  ["built" /* BUILT */]: ["challenged" /* CHALLENGED */, "doubted" /* DOUBTED */],
@@ -2063,7 +2299,9 @@ var init_types = __esm({
  ["verifying" /* VERIFYING */]: ["verified" /* VERIFIED */],
  ["verified" /* VERIFIED */]: ["resolved" /* RESOLVED */],
  ["resolved" /* RESOLVED */]: ["compressed" /* COMPRESSED */, "building" /* BUILDING */],
+ // cycle-back skips gate
  ["compressed" /* COMPRESSED */]: ["merged" /* MERGED */, "building" /* BUILDING */],
+ // cycle-back skips gate
  ["merged" /* MERGED */]: [],
  ["dead_end" /* DEAD_END */]: []
  };
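The effect of the `TRANSITIONS` change is that the gate state now sits between classification/reframing and building. A sketch of how such a map typically gates state changes — the trimmed table and `canTransition` helper here are illustrative, not the full cli.js machine:

```javascript
// Trimmed copy of the new gated flow (illustrative, not the full table).
const TRANSITIONS = {
  classified: ["reframed", "gated"],
  reframed: ["gated"],
  gated: ["building", "gated"], // self-loop for rejected hypotheses
  building: ["built", "building"],
};

// A transition is legal only if the target appears in the source state's list.
function canTransition(from, to) {
  return (TRANSITIONS[from] ?? []).includes(to);
}

console.log(canTransition("classified", "gated"));    // gate now precedes build
console.log(canTransition("classified", "building")); // direct build no longer allowed
```

Note the `gated -> gated` self-loop: a rejected hypothesis can be revised and re-gated without leaving the state.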
@@ -2092,7 +2330,10 @@ function determineNextStep(exp, valid, hasDoubts2, hasChallenges2) {
  throw new Error(`Experiment ${exp.slug} is terminal (${exp.status})`);
  }
  const status2 = exp.status;
- if (status2 === "classified" /* CLASSIFIED */) {
+ if (status2 === "classified" /* CLASSIFIED */ || status2 === "reframed" /* REFRAMED */) {
+ return valid.includes("gated" /* GATED */) ? "gated" /* GATED */ : valid[0];
+ }
+ if (status2 === "gated" /* GATED */) {
  return valid.includes("building" /* BUILDING */) ? "building" /* BUILDING */ : valid[0];
  }
  if (status2 === "built" /* BUILT */ && !hasDoubts2) {
@@ -2116,7 +2357,29 @@ var init_machine = __esm({
  });

  // src/agents/types.ts
- var EXTRACTION_SCHEMA;
+ function getExtractionSchema(role) {
+ switch (role) {
+ case "builder":
+ return '{"decisions": [{"description": "string", "evidence_level": "proof|test|strong_consensus|consensus|analogy|judgment", "justification": "string"}]}';
+ case "critic":
+ return '{"doubts": [{"claim_doubted": "string", "evidence_level_of_claim": "string", "evidence_for_doubt": "string", "severity": "minor|moderate|critical"}]}';
+ case "adversary":
+ return '{"challenges": [{"description": "string", "reasoning": "string"}]}';
+ case "verifier":
+ return '{"grades": [{"component": "string", "grade": "sound|good|weak|rejected", "provenance_intact": true, "content_correct": true, "notes": "string"}], "doubt_resolutions": [{"doubt_id": 0, "resolution": "confirmed|dismissed|inconclusive"}]}';
+ case "gatekeeper":
+ return '{"gate_decision": "approve|reject|flag", "reason": "string", "stale_references": ["string"], "overlapping_dead_ends": [0]}';
+ case "reframer":
+ return '{"reframe": {"decomposition": "string", "divergences": ["string"], "recommendation": "string"}}';
+ case "scout":
+ return '{"findings": [{"approach": "string", "source": "string", "relevance": "string", "contradicts_current": true}]}';
+ case "compressor":
+ return '{"compression_report": {"synthesis_delta": "string", "new_dead_ends": ["string"], "fragility_changes": ["string"]}}';
+ default:
+ return EXTRACTION_SCHEMA;
+ }
+ }
+ var EXTRACTION_SCHEMA, ROLE_REQUIRED_FIELDS;
  var init_types2 = __esm({
  "src/agents/types.ts"() {
  "use strict";
@@ -2127,6 +2390,16 @@ var init_types2 = __esm({
  "guidance": "string (actionable builder guidance)",
  "doubt_resolutions": [{ "doubt_id": 0, "resolution": "confirmed|dismissed|inconclusive" }]
  }`;
+ ROLE_REQUIRED_FIELDS = {
+ builder: ["decisions"],
+ critic: ["doubts"],
+ adversary: ["challenges"],
+ verifier: ["grades"],
+ gatekeeper: ["gate_decision"],
+ reframer: ["reframe"],
+ scout: ["findings"],
+ compressor: ["compression_report"]
+ };
  }
  });

@@ -2214,7 +2487,8 @@ function extractViaPatterns(role, markdown) {
  while ((match = doubtPattern.exec(markdown)) !== null) {
  doubts.push({
  claim_doubted: match[1].trim(),
- evidence_level_of_claim: "judgment",
+ evidence_level_of_claim: "unknown",
+ // Don't fabricate — mark as unknown for review
  evidence_for_doubt: "Extracted via regex \u2014 review original document",
  severity: match[2].toLowerCase().trim()
  });
@@ -2225,7 +2499,8 @@ function extractViaPatterns(role, markdown) {
  async function extractViaHaiku(role, markdown) {
  try {
  const truncated = markdown.length > 8e3 ? markdown.slice(0, 8e3) + "\n[truncated]" : markdown;
- const prompt = `Extract all decisions, evidence levels, grades, doubts, and guidance from this ${role} document as JSON. Follow this schema exactly: ${EXTRACTION_SCHEMA}
+ const schema = getExtractionSchema(role);
+ const prompt = `Extract structured data from this ${role} document as JSON. Follow this schema exactly: ${schema}

  Document:
  ${truncated}`;
@@ -2258,7 +2533,18 @@ ${truncated}`;
  }
  }
  function hasData(output) {
- return !!(output.decisions && output.decisions.length > 0 || output.grades && output.grades.length > 0 || output.doubts && output.doubts.length > 0 || output.guidance);
+ return !!(output.decisions && output.decisions.length > 0 || output.grades && output.grades.length > 0 || output.doubts && output.doubts.length > 0 || output.challenges && output.challenges.length > 0 || output.findings && output.findings.length > 0 || output.guidance || output.reframe || output.compression_report || output.gate_decision);
+ }
+ function validateForRole(role, output) {
+ const required = ROLE_REQUIRED_FIELDS[role];
+ if (!required) return { valid: true, missing: [] };
+ const missing = required.filter((field) => {
+ const value = output[field];
+ if (value === void 0 || value === null) return true;
+ if (Array.isArray(value) && value.length === 0) return true;
+ return false;
+ });
+ return { valid: missing.length === 0, missing };
  }
  var import_claude_agent_sdk;
  var init_parse = __esm({
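The new `validateForRole` treats an empty array the same as a missing field: a critic that emits `"doubts": []` has not done its job. A self-contained sketch of that behaviour, with a trimmed `ROLE_REQUIRED_FIELDS` (the full table in the diff covers all eight roles):

```javascript
// Trimmed copy of the role -> required-field table from the diff.
const ROLE_REQUIRED_FIELDS = {
  builder: ["decisions"],
  gatekeeper: ["gate_decision"],
  scout: ["findings"],
};

// A field is "missing" if absent, null, or an empty array.
function validateForRole(role, output) {
  const required = ROLE_REQUIRED_FIELDS[role];
  if (!required) return { valid: true, missing: [] };
  const missing = required.filter((field) => {
    const value = output[field];
    if (value === undefined || value === null) return true;
    if (Array.isArray(value) && value.length === 0) return true;
    return false;
  });
  return { valid: missing.length === 0, missing };
}

console.log(validateForRole("scout", { findings: [] }));               // empty array counts as missing
console.log(validateForRole("gatekeeper", { gate_decision: "flag" })); // scalar field present: valid
```

Per the `spawnAgent` change later in the diff, a failed check only logs a warning; it does not block ingestion.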
@@ -2322,6 +2608,12 @@ ${taskPrompt}`;
  console.log(`[${role}] Artifact written to ${artifactPath}`);
  }
  const structured = await extractStructuredData(role, markdown);
+ if (structured) {
+ const { valid, missing } = validateForRole(role, structured);
+ if (!valid) {
+ console.warn(`[${role}] Output missing expected fields: ${missing.join(", ")}`);
+ }
+ }
  return { output: markdown, structured, truncated };
  }
  async function spawnSynthesiser(context, projectRoot) {
@@ -2525,7 +2817,8 @@ var init_spawn = __esm({
  verifier: 50,
  compressor: 30,
  reframer: 20,
- scout: 20
+ scout: 20,
+ gatekeeper: 10
  };
  DIM2 = "\x1B[2m";
  RESET2 = "\x1B[0m";
@@ -2605,7 +2898,8 @@ async function resolve(db, exp, projectRoot) {
  exp.hypothesis ?? exp.slug,
  whyFailed,
  `Approach rejected: ${whyFailed}`,
- exp.sub_type
+ exp.sub_type,
+ "structural"
  );
  updateExperimentStatus(db, exp.id, "dead_end");
  if (exp.sub_type) {
@@ -2696,6 +2990,8 @@ async function cycle(step, args) {
  return doScout(db, exp, root);
  case "verify":
  return doVerify(db, exp, root);
+ case "gate":
+ return doGate(db, exp, root);
  case "compress":
  return doCompress(db, root);
  }
@@ -2709,6 +3005,49 @@ async function resolveCmd(args) {
  await resolve(db, exp, root);
  updateExperimentStatus(db, exp.id, "resolved");
  }
+ async function doGate(db, exp, root) {
+ transition(exp.status, "gated" /* GATED */);
+ const synthesis = readFileOrEmpty(path9.join(root, "docs", "synthesis", "current.md"));
+ const fragility = readFileOrEmpty(path9.join(root, "docs", "synthesis", "fragility.md"));
+ const structuralDeadEnds = exp.sub_type ? listStructuralDeadEndsBySubType(db, exp.sub_type) : listStructuralDeadEnds(db);
+ const result = await spawnAgent("gatekeeper", {
+ experiment: {
+ id: exp.id,
+ slug: exp.slug,
+ hypothesis: exp.hypothesis,
+ status: exp.status,
+ sub_type: exp.sub_type,
+ builder_guidance: null
+ },
+ deadEnds: structuralDeadEnds.map((d) => ({
+ approach: d.approach,
+ why_failed: d.why_failed,
+ structural_constraint: d.structural_constraint
+ })),
+ fragility,
+ synthesis,
+ taskPrompt: `Gate-check hypothesis for experiment ${exp.slug}:
+ "${exp.hypothesis}"
+
+ Check: (a) stale references \u2014 does the hypothesis reference specific lines, functions, or structures that may not exist? (b) dead-end overlap \u2014 does this hypothesis repeat an approach already ruled out by structural dead-ends? (c) scope \u2014 is this a single focused change, or does it try to do multiple things?
+
+ Output your gate_decision as "approve", "reject", or "flag" with reasoning.`
+ }, root);
+ ingestStructuredOutput(db, exp.id, result.structured);
+ const decision = result.structured?.gate_decision ?? "approve";
+ const reason = result.structured?.reason ?? "";
+ if (decision === "reject") {
+ updateExperimentStatus(db, exp.id, "gated");
+ warn(`Gate REJECTED for ${exp.slug}: ${reason}`);
+ warn(`Revise the hypothesis or run \`majlis revert\` to abandon.`);
+ } else {
+ if (decision === "flag") {
+ warn(`Gate flagged concerns for ${exp.slug}: ${reason}`);
+ }
+ updateExperimentStatus(db, exp.id, "gated");
+ success(`Gate passed for ${exp.slug}. Run \`majlis build\` next.`);
+ }
+ }
  async function doBuild(db, exp, root) {
  transition(exp.status, "building" /* BUILDING */);
  const deadEnds = exp.sub_type ? listDeadEndsBySubType(db, exp.sub_type) : listAllDeadEnds(db);
@@ -2717,7 +3056,38 @@ async function doBuild(db, exp, root) {
  const fragility = fs9.existsSync(fragilityPath) ? fs9.readFileSync(fragilityPath, "utf-8") : "";
  const synthesisPath = path9.join(root, "docs", "synthesis", "current.md");
  const synthesis = fs9.existsSync(synthesisPath) ? fs9.readFileSync(synthesisPath, "utf-8") : "";
+ const confirmedDoubts = getConfirmedDoubts(db, exp.id);
+ const config = loadConfig5(root);
+ if (config.metrics?.command) {
+ try {
+ const output = (0, import_node_child_process4.execSync)(config.metrics.command, {
+ cwd: root,
+ encoding: "utf-8",
+ timeout: 6e4,
+ stdio: ["pipe", "pipe", "pipe"]
+ }).trim();
+ const parsed = parseMetricsOutput(output);
+ for (const m of parsed) {
+ insertMetric(db, exp.id, "before", m.fixture, m.metric_name, m.metric_value);
+ }
+ if (parsed.length > 0) info(`Captured ${parsed.length} baseline metric(s).`);
+ } catch {
+ warn("Could not capture baseline metrics.");
+ }
+ }
  updateExperimentStatus(db, exp.id, "building");
+ let taskPrompt = builderGuidance ? `Previous attempt was weak. Here is guidance for this attempt:
+ ${builderGuidance}
+
+ Build the experiment: ${exp.hypothesis}` : `Build the experiment: ${exp.hypothesis}`;
+ if (confirmedDoubts.length > 0) {
+ taskPrompt += "\n\n## Confirmed Doubts (MUST address)\nThese weaknesses were confirmed by the verifier. Your build MUST address each one:\n";
+ for (const d of confirmedDoubts) {
+ taskPrompt += `- [${d.severity}] ${d.claim_doubted}: ${d.evidence_for_doubt}
+ `;
+ }
+ }
+ taskPrompt += "\n\nNote: The framework captures metrics automatically. Do NOT claim specific numbers unless quoting framework output.";
  const result = await spawnAgent("builder", {
  experiment: {
  id: exp.id,
@@ -2734,10 +3104,8 @@ async function doBuild(db, exp, root) {
  })),
  fragility,
  synthesis,
- taskPrompt: builderGuidance ? `Previous attempt was weak. Here is guidance for this attempt:
- ${builderGuidance}
-
- Build the experiment: ${exp.hypothesis}` : `Build the experiment: ${exp.hypothesis}`
+ confirmedDoubts,
+ taskPrompt
  }, root);
  ingestStructuredOutput(db, exp.id, result.structured);
  if (result.truncated && !result.structured) {
@@ -2747,6 +3115,23 @@ Build the experiment: ${exp.hypothesis}` : `Build the experiment: ${exp.hypothes
  }, root);
  warn(`Experiment stays at 'building'. Run \`majlis build\` to retry or \`majlis revert\` to abandon.`);
  } else {
+ if (config.metrics?.command) {
+ try {
+ const output = (0, import_node_child_process4.execSync)(config.metrics.command, {
+ cwd: root,
+ encoding: "utf-8",
+ timeout: 6e4,
+ stdio: ["pipe", "pipe", "pipe"]
+ }).trim();
+ const parsed = parseMetricsOutput(output);
+ for (const m of parsed) {
+ insertMetric(db, exp.id, "after", m.fixture, m.metric_name, m.metric_value);
+ }
+ if (parsed.length > 0) info(`Captured ${parsed.length} post-build metric(s).`);
+ } catch {
+ warn("Could not capture post-build metrics.");
+ }
+ }
  gitCommitBuild(exp, root);
  updateExperimentStatus(db, exp.id, "built");
  success(`Build complete for ${exp.slug}. Run \`majlis doubt\` or \`majlis challenge\` next.`);
@@ -2754,6 +3139,26 @@ Build the experiment: ${exp.hypothesis}` : `Build the experiment: ${exp.hypothes
  }
  async function doChallenge(db, exp, root) {
  transition(exp.status, "challenged" /* CHALLENGED */);
+ let gitDiff = "";
+ try {
+ gitDiff = (0, import_node_child_process4.execSync)('git diff main -- . ":!.majlis/"', {
+ cwd: root,
+ encoding: "utf-8",
+ stdio: ["pipe", "pipe", "pipe"]
+ }).trim();
+ } catch {
+ }
+ if (gitDiff.length > 8e3) gitDiff = gitDiff.slice(0, 8e3) + "\n[DIFF TRUNCATED]";
+ const synthesis = readFileOrEmpty(path9.join(root, "docs", "synthesis", "current.md"));
+ let taskPrompt = `Construct adversarial test cases for experiment ${exp.slug}: ${exp.hypothesis}`;
+ if (gitDiff) {
+ taskPrompt += `
+
+ ## Code Changes (git diff main)
+ \`\`\`diff
+ ${gitDiff}
+ \`\`\``;
+ }
  const result = await spawnAgent("adversary", {
  experiment: {
  id: exp.id,
@@ -2763,7 +3168,8 @@ async function doChallenge(db, exp, root) {
  sub_type: exp.sub_type,
  builder_guidance: null
  },
- taskPrompt: `Construct adversarial test cases for experiment ${exp.slug}: ${exp.hypothesis}`
+ synthesis,
+ taskPrompt
  }, root);
  ingestStructuredOutput(db, exp.id, result.structured);
  if (result.truncated && !result.structured) {
@@ -2775,6 +3181,20 @@ async function doChallenge(db, exp, root) {
  }
  async function doDoubt(db, exp, root) {
  transition(exp.status, "doubted" /* DOUBTED */);
+ const paddedNum = String(exp.id).padStart(3, "0");
+ const expDocPath = path9.join(root, "docs", "experiments", `${paddedNum}-${exp.slug}.md`);
+ const experimentDoc = readFileOrEmpty(expDocPath);
+ const synthesis = readFileOrEmpty(path9.join(root, "docs", "synthesis", "current.md"));
+ const deadEnds = exp.sub_type ? listDeadEndsBySubType(db, exp.sub_type) : listAllDeadEnds(db);
+ let taskPrompt = `Doubt the work in experiment ${exp.slug}: ${exp.hypothesis}. Produce a doubt document with evidence for each doubt.`;
+ if (experimentDoc) {
+ taskPrompt += `
+
+ ## Experiment Document (builder's artifact)
+ <experiment_doc>
+ ${experimentDoc}
+ </experiment_doc>`;
+ }
  const result = await spawnAgent("critic", {
  experiment: {
  id: exp.id,
@@ -2785,7 +3205,13 @@ async function doDoubt(db, exp, root) {
  builder_guidance: null
  // Critic does NOT see builder reasoning
  },
- taskPrompt: `Doubt the work in experiment ${exp.slug}: ${exp.hypothesis}. Produce a doubt document with evidence for each doubt.`
+ synthesis,
+ deadEnds: deadEnds.map((d) => ({
+ approach: d.approach,
+ why_failed: d.why_failed,
+ structural_constraint: d.structural_constraint
+ })),
+ taskPrompt
  }, root);
  ingestStructuredOutput(db, exp.id, result.structured);
  if (result.truncated && !result.structured) {
@@ -2797,22 +3223,49 @@ async function doDoubt(db, exp, root) {
  }
  async function doScout(db, exp, root) {
  transition(exp.status, "scouted" /* SCOUTED */);
- const synthesisPath = path9.join(root, "docs", "synthesis", "current.md");
- const synthesis = fs9.existsSync(synthesisPath) ? fs9.readFileSync(synthesisPath, "utf-8") : "";
- updateExperimentStatus(db, exp.id, "scouted");
+ const synthesis = readFileOrEmpty(path9.join(root, "docs", "synthesis", "current.md"));
+ const fragility = readFileOrEmpty(path9.join(root, "docs", "synthesis", "fragility.md"));
+ const deadEnds = exp.sub_type ? listDeadEndsBySubType(db, exp.sub_type) : listAllDeadEnds(db);
+ const deadEndsSummary = deadEnds.map(
+ (d) => `- [${d.category ?? "structural"}] ${d.approach}: ${d.why_failed}`
+ ).join("\n");
+ let taskPrompt = `Search for alternative approaches to the problem in experiment ${exp.slug}: ${exp.hypothesis}. Look for contradictory approaches, solutions from other fields, and known limitations of the current approach.`;
+ if (deadEndsSummary) {
+ taskPrompt += `
+
+ ## Known Dead Ends (avoid these approaches)
+ ${deadEndsSummary}`;
+ }
+ if (fragility) {
+ taskPrompt += `
+
+ ## Fragility Map (target these weak areas)
+ ${fragility}`;
+ }
  const result = await spawnAgent("scout", {
  experiment: {
  id: exp.id,
  slug: exp.slug,
  hypothesis: exp.hypothesis,
- status: "scouted",
+ status: exp.status,
  sub_type: exp.sub_type,
  builder_guidance: null
  },
  synthesis,
- taskPrompt: `Search for alternative approaches to the problem in experiment ${exp.slug}: ${exp.hypothesis}. Look for contradictory approaches, solutions from other fields, and known limitations of the current approach.`
+ fragility,
+ deadEnds: deadEnds.map((d) => ({
+ approach: d.approach,
+ why_failed: d.why_failed,
+ structural_constraint: d.structural_constraint
+ })),
+ taskPrompt
  }, root);
  ingestStructuredOutput(db, exp.id, result.structured);
+ if (result.truncated && !result.structured) {
+ warn(`Scout was truncated without structured output. Experiment stays at current status.`);
+ return;
+ }
+ updateExperimentStatus(db, exp.id, "scouted");
  success(`Scout pass complete for ${exp.slug}. Run \`majlis verify\` next.`);
  }
3271
  async function doVerify(db, exp, root) {
@@ -2826,6 +3279,35 @@ async function doVerify(db, exp, root) {
2826
3279
  challenges += fs9.readFileSync(path9.join(challengeDir, f), "utf-8") + "\n\n";
2827
3280
  }
2828
3281
  }
3282
+ const beforeMetrics = getMetricsByExperimentAndPhase(db, exp.id, "before");
3283
+ const afterMetrics = getMetricsByExperimentAndPhase(db, exp.id, "after");
3284
+ let metricsSection = "";
3285
+ if (beforeMetrics.length > 0 || afterMetrics.length > 0) {
3286
+ metricsSection = "\n\n## Framework-Captured Metrics (GROUND TRUTH \u2014 not self-reported by builder)\n";
3287
+ if (beforeMetrics.length > 0) {
3288
+ metricsSection += "### Before Build\n";
3289
+ for (const m of beforeMetrics) {
3290
+ metricsSection += `- ${m.fixture} / ${m.metric_name}: ${m.metric_value}
3291
+ `;
3292
+ }
3293
+ }
3294
+ if (afterMetrics.length > 0) {
3295
+ metricsSection += "### After Build\n";
3296
+ for (const m of afterMetrics) {
3297
+ metricsSection += `- ${m.fixture} / ${m.metric_name}: ${m.metric_value}
3298
+ `;
3299
+ }
3300
+ }
3301
+ }
3302
+ let doubtReference = "";
3303
+ if (doubts.length > 0) {
3304
+ doubtReference = "\n\n## Doubt Reference (use these IDs in doubt_resolutions)\n";
3305
+ for (const d of doubts) {
3306
+ doubtReference += `- DOUBT-${d.id}: [${d.severity}] ${d.claim_doubted}
3307
+ `;
3308
+ }
3309
+ doubtReference += "\nWhen resolving doubts, use the DOUBT-{id} number as the doubt_id value in your doubt_resolutions output.";
3310
+ }
2829
3311
  updateExperimentStatus(db, exp.id, "verifying");
2830
3312
  const result = await spawnAgent("verifier", {
2831
3313
  experiment: {
@@ -2838,7 +3320,7 @@ async function doVerify(db, exp, root) {
2838
3320
  },
2839
3321
  doubts,
2840
3322
  challenges,
2841
- taskPrompt: `Verify experiment ${exp.slug}: ${exp.hypothesis}. Check provenance and content. Test the ${doubts.length} doubt(s) and any adversarial challenges.`
3323
+ taskPrompt: `Verify experiment ${exp.slug}: ${exp.hypothesis}. Check provenance and content. Test the ${doubts.length} doubt(s) and any adversarial challenges.` + metricsSection + doubtReference
2842
3324
  }, root);
2843
3325
  ingestStructuredOutput(db, exp.id, result.structured);
2844
3326
  if (result.truncated && !result.structured) {
@@ -2846,9 +3328,15 @@ async function doVerify(db, exp, root) {
2846
3328
  return;
2847
3329
  }
2848
3330
  if (result.structured?.doubt_resolutions) {
2849
- for (const dr of result.structured.doubt_resolutions) {
2850
- if (dr.doubt_id && dr.resolution) {
3331
+ const knownDoubtIds = new Set(doubts.map((d) => d.id));
3332
+ for (let i = 0; i < result.structured.doubt_resolutions.length; i++) {
3333
+ const dr = result.structured.doubt_resolutions[i];
3334
+ if (!dr.resolution) continue;
3335
+ if (dr.doubt_id && knownDoubtIds.has(dr.doubt_id)) {
2851
3336
  updateDoubtResolution(db, dr.doubt_id, dr.resolution);
3337
+ } else if (doubts[i]) {
3338
+ warn(`Doubt resolution ID ${dr.doubt_id} not found. Using ordinal fallback \u2192 DOUBT-${doubts[i].id}.`);
3339
+ updateDoubtResolution(db, doubts[i].id, dr.resolution);
2852
3340
  }
2853
3341
  }
2854
3342
  }
@@ -2859,8 +3347,9 @@ async function doCompress(db, root) {
  const synthesisPath = path9.join(root, "docs", "synthesis", "current.md");
  const sizeBefore = fs9.existsSync(synthesisPath) ? fs9.statSync(synthesisPath).size : 0;
  const sessionCount = getSessionsSinceCompression(db);
+ const dbExport = exportForCompressor(db);
  const result = await spawnAgent("compressor", {
- taskPrompt: "Read ALL experiments, decisions, doubts, challenges, verification reports, reframes, and recent diffs. Cross-reference for contradictions, redundancies, and patterns. REWRITE docs/synthesis/current.md \u2014 shorter and denser. Update docs/synthesis/fragility.md with current weak areas. Update docs/synthesis/dead-ends.md with structural constraints from rejected experiments."
+ taskPrompt: "## Structured Data (CANONICAL \u2014 from SQLite database)\nThe database export below is the source of truth. docs/ files are agent artifacts that may contain stale or incorrect information. Cross-reference everything against this data.\n\n" + dbExport + "\n\n## Your Task\nRead ALL experiments, decisions, doubts, challenges, verification reports, reframes, and recent diffs. Cross-reference for contradictions, redundancies, and patterns. REWRITE docs/synthesis/current.md \u2014 shorter and denser. Update docs/synthesis/fragility.md with current weak areas. Update docs/synthesis/dead-ends.md with structural constraints from rejected experiments."
  }, root);
  const sizeAfter = fs9.existsSync(synthesisPath) ? fs9.statSync(synthesisPath).size : 0;
  recordCompression(db, sessionCount, sizeBefore, sizeAfter);
@@ -2936,6 +3425,45 @@ function ingestStructuredOutput(db, experimentId, structured) {
  }
  info(`Ingested ${structured.challenges.length} challenge(s)`)
  }
+ if (structured.reframe) {
+ insertReframe(
+ db,
+ experimentId,
+ structured.reframe.decomposition,
+ JSON.stringify(structured.reframe.divergences),
+ structured.reframe.recommendation
+ );
+ info(`Ingested reframe`);
+ }
+ if (structured.findings) {
+ for (const f of structured.findings) {
+ insertFinding(db, experimentId, f.approach, f.source, f.relevance, f.contradicts_current);
+ }
+ info(`Ingested ${structured.findings.length} finding(s)`);
+ }
+ }
3445
+ function readFileOrEmpty(filePath) {
3446
+ try {
3447
+ return fs9.readFileSync(filePath, "utf-8");
3448
+ } catch {
3449
+ return "";
3450
+ }
3451
+ }
3452
+ function loadConfig5(projectRoot) {
3453
+ const configPath = path9.join(projectRoot, ".majlis", "config.json");
3454
+ if (!fs9.existsSync(configPath)) {
3455
+ return {
3456
+ project: { name: "", description: "", objective: "" },
3457
+ cycle: {
3458
+ compression_interval: 5,
3459
+ circuit_breaker_threshold: 3,
3460
+ require_doubt_before_verify: true,
3461
+ require_challenge_before_verify: false,
3462
+ auto_baseline_on_new_experiment: true
3463
+ }
3464
+ };
3465
+ }
3466
+ return JSON.parse(fs9.readFileSync(configPath, "utf-8"));
2939
3467
  }
2940
3468
  var fs9, path9, import_node_child_process4;
2941
3469
  var init_cycle = __esm({
@@ -2950,7 +3478,7 @@ var init_cycle = __esm({
2950
3478
  init_types();
2951
3479
  init_spawn();
2952
3480
  init_resolve();
2953
- init_queries();
3481
+ init_metrics();
2954
3482
  init_format();
2955
3483
  }
2956
3484
  });
@@ -3050,7 +3578,7 @@ async function audit(args) {
  if (!root) throw new Error("Not in a Majlis project. Run `majlis init` first.");
  const db = getDb(root);
  const objective = args.filter((a) => !a.startsWith("--")).join(" ");
- const config = loadConfig5(root);
+ const config = loadConfig6(root);
  const experiments = listAllExperiments(db);
  const deadEnds = listAllDeadEnds(db);
  const circuitBreakers = getAllCircuitBreakerStates(db, config.cycle.circuit_breaker_threshold);
@@ -3107,7 +3635,7 @@ Output: either "classification confirmed \u2014 continue" or "re-classify from X
  }, root);
  success("Purpose audit complete. Review the output above.");
  }
- function loadConfig5(projectRoot) {
+ function loadConfig6(projectRoot) {
  const configPath = path11.join(projectRoot, ".majlis", "config.json");
  if (!fs11.existsSync(configPath)) {
  return { project: { name: "", description: "", objective: "" }, cycle: { circuit_breaker_threshold: 3 } };
@@ -3136,7 +3664,7 @@ async function next(args, isJson) {
  const root = findProjectRoot();
  if (!root) throw new Error("Not in a Majlis project. Run `majlis init` first.");
  const db = getDb(root);
- const config = loadConfig6(root);
+ const config = loadConfig7(root);
  const slugArg = args.filter((a) => !a.startsWith("--"))[0];
  let exp;
  if (slugArg) {
@@ -3249,15 +3777,18 @@ async function executeStep(step, exp, root) {
  updateExperimentStatus(getDb(root), exp.id, "compressed");
  info(`Experiment ${exp.slug} compressed.`);
  break;
+ case "gated" /* GATED */:
+ await cycle("gate", expArgs);
+ break;
  case "reframed" /* REFRAMED */:
  updateExperimentStatus(getDb(root), exp.id, "reframed");
- info(`Reframe acknowledged for ${exp.slug}. Proceeding to build.`);
+ info(`Reframe acknowledged for ${exp.slug}. Proceeding to gate.`);
  break;
  default:
  warn(`Don't know how to execute step: ${step}`);
  }
  }
- function loadConfig6(projectRoot) {
+ function loadConfig7(projectRoot) {
  const configPath = path12.join(projectRoot, ".majlis", "config.json");
  if (!fs12.existsSync(configPath)) {
  return {
@@ -3303,7 +3834,7 @@ async function run(args) {
  throw new Error('Usage: majlis run "goal description"');
  }
  const db = getDb(root);
- const config = loadConfig7(root);
+ const config = loadConfig8(root);
  const MAX_EXPERIMENTS = 10;
  const MAX_STEPS = 200;
  let experimentCount = 0;
@@ -3348,6 +3879,15 @@ async function run(args) {
  const message = err instanceof Error ? err.message : String(err);
  warn(`Step failed for ${exp.slug}: ${message}`);
  try {
+ insertDeadEnd(
+ db,
+ exp.id,
+ exp.hypothesis ?? exp.slug,
+ message,
+ `Process failure: ${message}`,
+ exp.sub_type,
+ "procedural"
+ );
  updateExperimentStatus(db, exp.id, "dead_end");
  } catch {
  }
@@ -3362,11 +3902,11 @@ async function run(args) {
  info("Run `majlis status` to see final state.");
  }
  async function deriveNextHypothesis(goal, root, db) {
- const synthesis = readFileOrEmpty(path13.join(root, "docs", "synthesis", "current.md"));
- const fragility = readFileOrEmpty(path13.join(root, "docs", "synthesis", "fragility.md"));
- const deadEndsDoc = readFileOrEmpty(path13.join(root, "docs", "synthesis", "dead-ends.md"));
+ const synthesis = readFileOrEmpty2(path13.join(root, "docs", "synthesis", "current.md"));
+ const fragility = readFileOrEmpty2(path13.join(root, "docs", "synthesis", "fragility.md"));
+ const deadEndsDoc = readFileOrEmpty2(path13.join(root, "docs", "synthesis", "dead-ends.md"));
  const deadEnds = listAllDeadEnds(db);
- const config = loadConfig7(root);
+ const config = loadConfig8(root);
  let metricsOutput = "";
  if (config.metrics?.command) {
  try {
@@ -3399,7 +3939,10 @@ ${fragility || "(none)"}
  ${deadEndsDoc || "(none)"}

  ## Dead Ends (from DB \u2014 ${deadEnds.length} total)
- ${deadEnds.map((d) => `- ${d.approach}: ${d.why_failed} [constraint: ${d.structural_constraint}]`).join("\n") || "(none)"}
+ ${deadEnds.map((d) => `- [${d.category ?? "structural"}] ${d.approach}: ${d.why_failed} [constraint: ${d.structural_constraint}]`).join("\n") || "(none)"}
+
+ Note: [structural] dead ends are HARD CONSTRAINTS \u2014 your hypothesis MUST NOT repeat these approaches.
+ [procedural] dead ends are process failures \u2014 the approach may still be valid if executed differently.

  ## Your Task
  1. Assess: based on the metrics and synthesis, has the goal been met? Be specific.
@@ -3484,7 +4027,7 @@ function createNewExperiment(db, root, hypothesis) {
  }
  return exp;
  }
- function readFileOrEmpty(filePath) {
+ function readFileOrEmpty2(filePath) {
  try {
  return fs13.readFileSync(filePath, "utf-8");
  } catch {
@@ -3494,7 +4037,7 @@ function readFileOrEmpty(filePath) {
  function slugify2(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "").slice(0, 50);
  }
- function loadConfig7(projectRoot) {
+ function loadConfig8(projectRoot) {
  const configPath = path13.join(projectRoot, ".majlis", "config.json");
  if (!fs13.existsSync(configPath)) {
  return {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "majlis",
- "version": "0.4.5",
+ "version": "0.5.0",
  "description": "Multi-agent workflow CLI for structured doubt, independent verification, and compressed knowledge",
  "bin": {
  "majlis": "./dist/cli.js"