create-majlis 0.3.2 → 0.4.0
- package/dist/index.js +16 -8
- package/package.json +1 -1
package/dist/index.js
CHANGED
@@ -121,8 +121,9 @@ You get ONE attempt per cycle. Your job is:
 2. Form ONE hypothesis about what to fix
 3. Implement ONE focused change (not a multi-step debug session)
 4. Run the benchmark ONCE to see the result
-5.
-6.
+5. Update the experiment doc in docs/experiments/ \u2014 fill in Approach, Results, and Metrics sections. This is NOT optional.
+6. Output the structured majlis-json block with your decisions
+7. STOP
 
 Do NOT iterate. Do NOT try multiple approaches. Do NOT debug your own fix.
 If your change doesn't work, document why and let the cycle continue \u2014
@@ -217,21 +218,28 @@ tools: [Read, Glob, Grep, Bash]
 ---
 You are the Verifier. Perform dual verification:
 
-
+## Scope Constraint (CRITICAL)
+
+You must produce your structured output (grades + doubt resolutions) within your turn budget.
+Do NOT exhaustively test every doubt and challenge \u2014 prioritize the critical ones.
+For each doubt/challenge: one targeted check is enough. Confirm, dismiss, or mark inconclusive.
+Reserve your final turns for writing the structured majlis-json output.
+
+The framework saves your output automatically. Do NOT attempt to write files.
+
+## PROVENANCE CHECK:
 - Can every piece of code trace to an experiment or decision?
 - Is the chain unbroken from requirement -> classification -> experiment -> code?
 - Flag any broken chains.
 
-CONTENT CHECK:
+## CONTENT CHECK:
 - Does the code do what the experiment log says?
--
--
+- Run at most 3-5 targeted diagnostic scripts, focused on the critical doubts/challenges.
+- Do NOT run exhaustive diagnostics on every claim.
 
 Grade each component: sound / good / weak / rejected
 Grade each doubt/challenge: confirmed / dismissed (with evidence) / inconclusive
 
-Produce your verification report as output. The framework saves it automatically.
-
 ## Structured Output Format
 <!-- majlis-json
 {
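
The Verifier prompt asks for a structured majlis-json block and fixes two grade vocabularies, so a consumer of that output might look roughly like the sketch below. It is illustrative only and is not taken from the package: the closing `-->` marker, the field names (`grades`, `doubt_resolutions`, `component`, `grade`, `doubt_id`, `resolution`) and both function names are assumptions, because the diff cuts off right after the opening `{`.

```javascript
// Grade vocabularies quoted from the Verifier prompt in the diff.
const COMPONENT_GRADES = ["sound", "good", "weak", "rejected"];
const DOUBT_GRADES = ["confirmed", "dismissed", "inconclusive"];

// Assumption: the block is an HTML comment that opens with "<!-- majlis-json"
// and closes with "-->". The closing marker is not visible in the diff.
function extractMajlisJson(agentOutput) {
  const match = /<!--\s*majlis-json\s*([\s\S]*?)-->/.exec(agentOutput);
  if (match === null) return null; // no block found
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // block present but not valid JSON
  }
}

// Hypothetical shape:
// { grades: [{ component, grade }], doubt_resolutions: [{ doubt_id, resolution }] }
function findInvalidGrades(block) {
  const problems = [];
  for (const g of block.grades ?? []) {
    if (!COMPONENT_GRADES.includes(g.grade)) {
      problems.push(`component ${g.component}: ${g.grade}`);
    }
  }
  for (const d of block.doubt_resolutions ?? []) {
    if (!DOUBT_GRADES.includes(d.resolution)) {
      problems.push(`doubt ${d.doubt_id}: ${d.resolution}`);
    }
  }
  return problems;
}

// Made-up sample output; "excellent" is deliberately outside the vocabulary.
const sample = [
  "Verification notes...",
  "<!-- majlis-json",
  '{ "grades": [{ "component": "parser", "grade": "sound" },',
  '              { "component": "cache", "grade": "excellent" }],',
  '  "doubt_resolutions": [{ "doubt_id": 1, "resolution": "dismissed" }] }',
  "-->",
].join("\n");

console.log(findInvalidGrades(extractMajlisJson(sample)));
// prints [ 'component cache: excellent' ]
```

Returning `null` for a missing or malformed block, rather than throwing, lets a caller treat an agent that ran out of turns before writing its output as a distinct case, which is the failure mode the new Scope Constraint section appears to target.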