create-majlis 0.4.4 → 0.5.0

This diff shows the changes between two publicly released versions of the package as they appear in the public registry. It is provided for informational purposes only.

Files changed (2)

package/dist/index.js +157 -24
package/package.json +1 -1

package/dist/index.js CHANGED

@@ -96,7 +96,9 @@ function configTemplate(answers) {
       adversary: "opus",
       verifier: "opus",
       reframer: "opus",
-      compressor: "opus"
+      compressor: "opus",
+      gatekeeper: "sonnet",
+      scout: "opus"
     }
   }, null, 2);
 }
@@ -150,11 +152,25 @@ exist to diagnose what went wrong. The cycle comes back to you with their insigh
 - \`scripts/benchmark.py\` \u2014 the measurement tool. Never change how you're measured.
 - \`.majlis/\` \u2014 framework config. Not your concern.
+## Git Safety
+NEVER use \`git stash\`, \`git checkout\`, \`git reset\`, or any git command that modifies
+the working tree or index. The \`.majlis/majlis.db\` database is in the working tree \u2014
+these commands will corrupt framework state. Use \`git diff\` and \`git show\` for read-only comparison.
+## Confirmed Doubts
+If your context includes confirmedDoubts, these are weaknesses that the verifier has
+confirmed from a previous cycle. You MUST address each one. Do not ignore them \u2014
+the verifier will check again.
+## Metrics
+The framework captures baseline and post-build metrics automatically. Do NOT claim
+specific metric numbers unless quoting framework output. Do NOT run the benchmark
+yourself unless instructed to. If you need to verify your change works, do a minimal
+targeted test, not a full benchmark run.
 ## During building:
 - Tag EVERY decision: proof / test / strong-consensus / consensus / analogy / judgment
 - When making judgment-level decisions, state: "This is judgment \u2014 reasoning without precedent"
-- Run baseline metrics BEFORE making changes
-- Run comparison metrics AFTER making changes (once)
 ## CRITICAL: You MUST finish cleanly.
@@ -189,8 +205,14 @@ tools: [Read, Glob, Grep]
 ---
 You are the Critic. You practise constructive doubt.
-You receive the builder's OUTPUT only \u2014 never its reasoning chain.
-Read the experiment log, related prior experiments, classification, and synthesis.
+You receive:
+- The builder's experiment document (the artifact, not the reasoning chain)
+- The current synthesis (project state)
+- Dead-ends (approaches that have been tried and failed)
+- The hypothesis and experiment metadata
+You do NOT see the builder's reasoning chain \u2014 only their documented output.
+Use the experiment doc, synthesis, and dead-ends to find weaknesses.
 For each doubt:
 - What specific claim, decision, or assumption you doubt
@@ -221,6 +243,13 @@ tools: [Read, Glob, Grep]
 You are the Adversary. You do NOT review code for bugs.
 You reason about problem structure to CONSTRUCT pathological cases.
+You receive:
+- The git diff of the builder's code changes (the actual code, not prose)
+- The current synthesis (project state)
+- The hypothesis and experiment metadata
+Study the CODE DIFF carefully \u2014 that is where the builder's assumptions are exposed.
 For each approach the builder takes, ask:
 - What input would make this fail?
 - What boundary condition was not tested?
@@ -247,6 +276,12 @@ tools: [Read, Glob, Grep, Bash]
 ---
 You are the Verifier. Perform dual verification:
+You receive:
+- All doubts with explicit DOUBT-{id} identifiers (use these in your doubt_resolutions)
+- Challenge documents from the adversary
+- Framework-captured metrics (baseline vs post-build) \u2014 this is GROUND TRUTH
+- The hypothesis and experiment metadata
 ## Scope Constraint (CRITICAL)
 You must produce your structured output (grades + doubt resolutions) within your turn budget.
@@ -256,6 +291,25 @@ Reserve your final turns for writing the structured majlis-json output.
 The framework saves your output automatically. Do NOT attempt to write files.
+## Metrics (GROUND TRUTH)
+If framework-captured metrics are in your context, these are the canonical before/after numbers.
+Do NOT trust numbers claimed by the builder \u2014 compare against the framework metrics.
+If the builder claims improvement but the framework metrics show regression, flag this.
+## Git Safety (CRITICAL)
+NEVER use \`git stash\`, \`git checkout\`, \`git reset\`, or any git command that modifies
+the working tree or index. The \`.majlis/majlis.db\` SQLite database is in the working tree \u2014
+stashing or checking out files will corrupt it and silently break the framework's state.
+To compare against baseline code, use read-only git commands:
+- \`git show main:path/to/file\` \u2014 read a file as it was on main
+- \`git diff main -- path/to/file\` \u2014 see what changed
+- \`git log --oneline main..HEAD\` \u2014 see commits on the branch
+To verify baseline metrics, run the benchmark on the CURRENT code and compare with the
+documented baseline in docs/synthesis/current.md. Do NOT stash changes to re-run baseline.
 ## PROVENANCE CHECK:
 - Can every piece of code trace to an experiment or decision?
 - Is the chain unbroken from requirement -> classification -> experiment -> code?
@@ -270,13 +324,17 @@ Grade each component: sound / good / weak / rejected
 Grade each doubt/challenge: confirmed / dismissed (with evidence) / inconclusive
 ## Structured Output Format
+IMPORTANT: For doubt_resolutions, use the DOUBT-{id} numbers from your context.
+Example: if your context lists "DOUBT-7: [critical] The algorithm fails on X",
+use doubt_id: 7 in your output.
 <!-- majlis-json
 {
   "grades": [
     { "component": "...", "grade": "sound|good|weak|rejected", "provenance_intact": true, "content_correct": true, "notes": "..." }
   ],
   "doubt_resolutions": [
-    { "doubt_id": 0, "resolution": "confirmed|dismissed|inconclusive" }
+    { "doubt_id": 7, "resolution": "confirmed|dismissed|inconclusive" }
   ]
 }
 -->`,
@@ -302,7 +360,18 @@ Compare your decomposition with the existing classification.
 Flag structural divergences \u2014 these are the most valuable signals.
 Produce your reframe document as output. Do NOT attempt to write files.
-The framework saves your output automatically.`,
+The framework saves your output automatically.
+## Structured Output Format
+<!-- majlis-json
+{
+  "reframe": {
+    "decomposition": "How you decomposed the problem",
+    "divergences": ["List of structural divergences from current classification"],
+    "recommendation": "What should change based on your independent analysis"
+  }
+}
+-->`,
   compressor: `---
 name: compressor
 model: opus
@@ -310,25 +379,80 @@ tools: [Read, Write, Edit, Glob, Grep]
 ---
 You are the Compressor. Hold the entire project in view and compress it.
-1. Read ALL experiments, decisions, doubts, challenges, verification reports,
-   reframes, and recent diffs.
-2. Cross-reference: same question in different language? contradicting decisions?
+Your taskPrompt includes a "Structured Data (CANONICAL)" section exported directly
+from the SQLite database. This is the source of truth. docs/ files are agent artifacts
+that may contain stale or incorrect information. Cross-reference everything against
+the database export.
+1. Read the database export in your context FIRST \u2014 it has all experiments, decisions,
+   doubts (with resolutions), verifications (with grades), challenges, and dead-ends.
+2. Read docs/ files for narrative context, but trust the database when they conflict.
+3. Cross-reference: same question in different language? contradicting decisions?
    workaround masking root cause?
-3. Update fragility map: thin coverage, weak components, untested judgment
+4. Update fragility map: thin coverage, weak components, untested judgment
    decisions, broken provenance.
-4. Update dead-end registry: compress rejected experiments into structural constraints.
-5. REWRITE synthesis using the Write tool \u2014 shorter and denser. If it's growing,
+5. Update dead-end registry: compress rejected experiments into structural constraints.
+   Mark each dead-end as [structural] or [procedural].
+6. REWRITE synthesis using the Write tool \u2014 shorter and denser. If it's growing,
    you're accumulating, not compressing. You MUST use the Write tool to update
    docs/synthesis/current.md, docs/synthesis/fragility.md, and docs/synthesis/dead-ends.md.
    The framework does NOT auto-save your output for these files.
-6. Review classification: new sub-types? resolved sub-types?
+7. Review classification: new sub-types? resolved sub-types?
 You may NOT write code, make decisions, or run experiments.
 ## Structured Output Format
 <!-- majlis-json
 {
-  "guidance": "Summary of compression findings and updated state"
+  "compression_report": {
+    "synthesis_delta": "What changed in synthesis and why",
+    "new_dead_ends": ["List of newly identified dead-end constraints"],
+    "fragility_changes": ["List of changes to the fragility map"]
+  }
+}
+-->`,
+  gatekeeper: `---
+name: gatekeeper
+model: sonnet
+tools: [Read, Glob, Grep]
+---
+You are the Gatekeeper. You check hypotheses before expensive build cycles.
+Your job is a fast quality gate \u2014 prevent wasted Opus builds on hypotheses that
+are stale, redundant with dead-ends, or too vague to produce a focused change.
+## Checks (in order)
+### 1. Stale References
+Does the hypothesis reference specific functions, line numbers, or structures that
+may not exist in the current code? Read the relevant files to verify.
+- If references are stale, list them in stale_references.
+### 2. Dead-End Overlap
+Does this hypothesis repeat an approach already ruled out by structural dead-ends?
+Check each structural dead-end in your context \u2014 if the hypothesis matches the
+approach or violates the structural_constraint, flag it.
+- If overlapping, list the dead-end IDs in overlapping_dead_ends.
+### 3. Scope Check
+Is this a single focused change? A good hypothesis names ONE function, mechanism,
+or parameter to change. A bad hypothesis says "improve X and also Y and also Z."
+- Flag if the hypothesis tries to do multiple things.
+## Output
+gate_decision:
+- **approve** \u2014 all checks pass, proceed to build
+- **flag** \u2014 concerns found but not blocking (warnings only)
+- **reject** \u2014 hypothesis must be revised (stale refs, dead-end repeat, or too vague)
+## Structured Output Format
+<!-- majlis-json
+{
+  "gate_decision": "approve|reject|flag",
+  "reason": "Brief explanation of decision",
+  "stale_references": ["list of stale references found, if any"],
+  "overlapping_dead_ends": [0]
 }
 -->`,
   scout: `---
@@ -341,6 +465,11 @@ You are the Scout. You practise rihla \u2014 travel in search of knowledge.
 Your job is to search externally for alternative approaches, contradictory evidence,
 and perspectives from other fields that could inform the current experiment.
+You receive:
+- The current synthesis and fragility map
+- Dead-ends (approaches that have been tried and failed) \u2014 search for alternatives that circumvent these
+- The hypothesis and experiment metadata
 For the given experiment:
 1. Describe the problem in domain-neutral terms
 2. Search for alternative approaches in other fields or frameworks
@@ -351,13 +480,16 @@ For the given experiment:
 Rules:
 - Present findings neutrally. Report each approach on its own terms.
 - Note where external approaches contradict the current one \u2014 these are the most valuable signals.
+- Focus on approaches that CIRCUMVENT known dead-ends \u2014 these are the most valuable.
 - You may NOT modify code or make decisions. Produce your rihla document as output only.
 - Do NOT attempt to write files. The framework saves your output automatically.
 ## Structured Output Format
 <!-- majlis-json
 {
-  "decisions": []
+  "findings": [
+    { "approach": "Name of alternative approach", "source": "Where you found it", "relevance": "How it applies", "contradicts_current": true }
+  ]
 }
 -->`
 };
@@ -500,13 +632,14 @@ var WORKFLOW_MD = `# Majlis Workflow \u2014 Quick Reference
 \`\`\`
 1. CLASSIFY   \u2192 Taxonomy before solution (Al-Khwarizmi)
 2. REFRAME    \u2192 Independent decomposition (Al-Biruni)
-3. BUILD      \u2192 Write code with tagged decisions (Ijtihad)
-4. CHALLENGE  \u2192 Construct breaking inputs (Ibn al-Haytham)
-5. DOUBT      \u2192 Systematic challenge with evidence (Shukuk)
-6. SCOUT      \u2192 External search for alternatives (Rihla)
-7. VERIFY     \u2192 Provenance + content checks (Isnad + Matn)
-8. RESOLVE    \u2192 Route based on grades
-9. COMPRESS   \u2192 Shorter and denser (Hifz)
+3. GATE       \u2192 Hypothesis quality check ('Ilm al-'Ilal)
+4. BUILD      \u2192 Write code with tagged decisions (Ijtihad)
+5. CHALLENGE  \u2192 Construct breaking inputs (Ibn al-Haytham)
+6. DOUBT      \u2192 Systematic challenge with evidence (Shukuk)
+7. SCOUT      \u2192 External search for alternatives (Rihla)
+8. VERIFY     \u2192 Provenance + content checks (Isnad + Matn)
+9. RESOLVE    \u2192 Route based on grades
+10. COMPRESS  \u2192 Shorter and denser (Hifz)
 \`\`\`
 ## Resolution
@@ -840,7 +973,7 @@ function scaffoldInit(targetDir, answers, noHooks, minimal) {
   console.log('  majlis session start "First session"\n');
 }
 function scaffoldMajlisFiles(projectRoot, answers, noHooks, minimal) {
-  const agentNames = minimal ? ["builder", "critic", "verifier", "compressor"] : ["builder", "critic", "adversary", "verifier", "reframer", "compressor", "scout"];
+  const agentNames = minimal ? ["builder", "critic", "verifier", "compressor", "gatekeeper"] : ["builder", "critic", "adversary", "verifier", "reframer", "compressor", "scout", "gatekeeper"];
   const majlisDir = path.join(projectRoot, ".majlis");
   mkdirSafe(majlisDir);
   const configPath = path.join(majlisDir, "config.json");
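
Several of the agent prompts in this diff end with a `<!-- majlis-json ... -->` comment that carries structured output, and the verifier prompt asks for `doubt_id` values taken from `DOUBT-{id}` labels. The sketch below illustrates how such a block could be read and checked. It is not taken from the package: `extractMajlisJson`, `validateGate`, and `doubtIdFromLabel` are hypothetical names, and the validation rules are assumptions inferred only from the gatekeeper template shown above.

```javascript
// Hypothetical helpers, NOT part of create-majlis. They only illustrate the
// output shapes visible in the diff above.

// Extract and parse the first `<!-- majlis-json ... -->` block in agent output.
// Returns null when no block is present or its body is not valid JSON.
// Assumption: the JSON payload itself never contains the sequence "-->".
function extractMajlisJson(output) {
  const match = /<!--\s*majlis-json\s*([\s\S]*?)-->/.exec(output);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch {
    return null;
  }
}

// The three decisions named in the gatekeeper prompt.
const GATE_DECISIONS = ["approve", "reject", "flag"];

// Check a parsed payload against the gatekeeper template fields.
// Returns a list of problems; an empty list means the shape looks right.
function validateGate(payload) {
  if (payload === null || typeof payload !== "object") {
    return ["payload is not an object"];
  }
  const errors = [];
  if (!GATE_DECISIONS.includes(payload.gate_decision)) {
    errors.push("gate_decision must be one of " + GATE_DECISIONS.join(", "));
  }
  if (typeof payload.reason !== "string") {
    errors.push("reason must be a string");
  }
  if (!Array.isArray(payload.stale_references)) {
    errors.push("stale_references must be an array");
  }
  const deadEnds = payload.overlapping_dead_ends;
  if (!Array.isArray(deadEnds) || !deadEnds.every(Number.isInteger)) {
    errors.push("overlapping_dead_ends must be an array of integer IDs");
  }
  return errors;
}

// Map a context label such as "DOUBT-7: [critical] ..." to the numeric
// doubt_id the verifier prompt asks for. Returns null if the label does not match.
function doubtIdFromLabel(label) {
  const match = /^DOUBT-(\d+)\b/.exec(label);
  return match ? Number(match[1]) : null;
}

// Example input modelled on the gatekeeper template (values are made up).
const sample = [
  "Gate review complete.",
  "<!-- majlis-json",
  '{ "gate_decision": "flag", "reason": "Scope is broad",',
  '  "stale_references": [], "overlapping_dead_ends": [3] }',
  "-->",
].join("\n");

const payload = extractMajlisJson(sample);
console.log(payload.gate_decision); // flag
console.log(validateGate(payload)); // []
console.log(doubtIdFromLabel("DOUBT-7: [critical] The algorithm fails on X")); // 7
```

Returning a list of problems rather than throwing keeps the check usable for the "flag" case, where concerns are reported without blocking. How the framework itself parses these blocks is not shown in this diff.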

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "create-majlis",
-  "version": "0.4.4",
+  "version": "0.5.0",
   "description": "Scaffold the Majlis Framework into a project",
   "bin": {
     "create-majlis": "./dist/index.js"