npm - devlyn-cli - Versions diffs - 2.1.0 → 2.2.0 - Mend

devlyn-cli 2.1.0 → 2.2.0

Files changed (135)

package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/NOTES.md ADDED

@@ -0,0 +1,70 @@
+# F11 — Notes
+## Purpose
+Pair-discriminating high-risk fixture. Adds a batch-import write endpoint
+with an all-or-nothing guarantee. The pair-edge mechanism: implementers
+who validate-as-they-go produce a partial-write bug — by the time the
+invalid item is hit and 400 returned, prior items have already been
+appended. The natural shape:
+```js
+app.post('/items/import', (req, res) => {
+  for (const it of req.body.items) {
+    if (!valid(it)) return res.status(400).json(...);
+    items.push({ id: nextId(), ...it }); // already mutated
+  }
+  res.status(201).json({ inserted: req.body.items.length });
+});
+```
+This passes the "happy path" test trivially and the "all-bad" test trivially.
+It fails only on the discriminating case: one bad item mid-batch, where the
+store ends up with the prefix already inserted while the response says 400.
+A reviewer with fresh eyes asking "what does the store look like after the
+failure response?" catches it; the same model that wrote the loop tends to
+focus on the response correctness without re-examining the store delta.
+## Failure modes detected
+- **Partial inserts** before validation failure (the core discriminator).
+- **Order swap** — implementer inserts at wrong index or sorts unexpectedly.
+- **Id collision** — implementer reuses ids when batch validation rejects.
+- **Silent catch** wrapping `JSON.parse` or validation. Caught by
+  forbidden_pattern.
+## Pipeline exercise
+- Phase 1 BUILD: implementer must derive that "all or nothing" requires
+  validating the entire batch before any mutation, OR using a
+  copy-on-write pattern that rolls back on validation failure.
+- Phase 2 EVAL: scrutinizes whether the new tests assert the
+  store-unchanged invariant after a failed batch, not just the 400.
+- Phase 3 CRITIC: production-readiness on the "all or nothing" claim.
+## Discrimination expectation
+Calibration target (set in pyx-memory project memory 2026-05-05):
+- bare arm: 45-65 (passes spec text, fails the store-unchanged verifier
+  on mid-batch failure).
+- solo arm: 65-78 (review pass may catch the store-delta issue if the
+  reviewer re-reads the spec; coin-flip).
+- pair arm: 78-90 (cross-perspective derivation of the rollback or
+  validate-first pattern).
+## Public-spec wording — load-bearing
+The spec uses "accepted as a whole or rejected as a whole" and "left
+exactly as it was" instead of trigger keywords. If the spec said
+"transactional", "atomic", or "rollback", a single-pass solo arm would
+keyword-match the answer pattern and ace the fixture. The English prose
+forces invariant derivation — the discriminating axis.
+## Rotation trigger
+Retire when both arms consistently land > 90 across two shipped versions,
+OR when "all-or-nothing batch" becomes a recognized pattern such that
+solo arm reliably validates-first on the initial implementation pass.
+Whichever comes first.
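The partial-write bug and the validate-first alternative described in these notes can be sketched without Express. This is a minimal illustration, not the fixture's reference solution: the `store` shape, `makeStore`, `importNaive`, `importAllOrNothing`, and the id scheme are all hypothetical names chosen for the example.

```javascript
// Illustrative per-item check following the fixture's stated rules
// (non-empty trimmed name, positive integer qty).
const isValid = (it) =>
  it !== null && typeof it === 'object' &&
  typeof it.name === 'string' && it.name.trim() !== '' &&
  Number.isInteger(it.qty) && it.qty > 0;

// Validate-as-you-go: earlier items are already appended when a bad one is hit.
function importNaive(store, batch) {
  for (const it of batch) {
    if (!isValid(it)) return { status: 400 };
    store.items.push({ id: ++store.lastId, name: it.name, qty: it.qty });
  }
  return { status: 201, inserted: batch.length };
}

// Validate-first: nothing is appended unless every item passes.
function importAllOrNothing(store, batch) {
  if (!batch.every(isValid)) return { status: 400 };
  for (const it of batch) {
    store.items.push({ id: ++store.lastId, name: it.name, qty: it.qty });
  }
  return { status: 201, inserted: batch.length };
}

// Hypothetical in-memory store with one pre-existing item.
const makeStore = () => ({ items: [{ id: 1, name: 'alpha', qty: 1 }], lastId: 1 });
const batch = [
  { name: 'good1', qty: 1 },
  { name: '', qty: 2 }, // invalid: empty name
  { name: 'good2', qty: 3 },
];

const a = makeStore();
console.log(importNaive(a, batch).status, a.items.length); // 400 2
const b = makeStore();
console.log(importAllOrNothing(b, batch).status, b.items.length); // 400 1
```

Both variants answer 400 for the mid-batch failure; only the store length afterwards tells them apart, which is the delta the store-unchanged verifier looks at.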

package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/expected.json ADDED

@@ -0,0 +1,52 @@
+{
+  "verification_commands": [
+    {
+      "cmd": "node --test tests/server.test.js",
+      "exit_code": 0,
+      "stdout_contains": [],
+      "stdout_not_contains": ["not ok "]
+    },
+    {
+      "cmd": "node data/_verify-failed-leaves-unchanged.js",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": []
+    },
+    {
+      "cmd": "node data/_verify-success.js",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": []
+    },
+    {
+      "cmd": "node data/_verify-invalid-body.js",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": []
+    }
+  ],
+  "forbidden_patterns": [
+    {
+      "pattern": "catch\\s*\\([^)]*\\)\\s*\\{[^}]*return\\s+(null|undefined|''|\\{\\})",
+      "description": "silent catch returning fallback in import path",
+      "files": ["server/index.js"],
+      "severity": "disqualifier"
+    },
+    {
+      "pattern": "catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}",
+      "description": "empty catch block",
+      "files": ["server/index.js"],
+      "severity": "disqualifier"
+    },
+    {
+      "pattern": "/\\*\\s*eslint-disable",
+      "description": "eslint-disable without scoped justification",
+      "files": ["server/index.js"],
+      "severity": "disqualifier"
+    }
+  ],
+  "required_files": ["server/index.js", "tests/server.test.js"],
+  "forbidden_files": [],
+  "max_deps_added": 0,
+  "spec_output_files": ["server/index.js", "tests/server.test.js"]
+}
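To see what the first two `forbidden_patterns` entries would and would not flag, the sketch below compiles them with plain `RegExp` and runs them over a few hand-written snippets. How the harness actually applies the patterns (flags, per-line or whole-file matching) is not shown in this diff, so flagless whole-string matching is an assumption, and the sample snippets are made up for illustration.

```javascript
// The two catch-related patterns, copied from expected.json.
const silentFallback = new RegExp(
  "catch\\s*\\([^)]*\\)\\s*\\{[^}]*return\\s+(null|undefined|''|\\{\\})"
);
const emptyCatch = new RegExp("catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}");

// Hypothetical source fragments to probe the patterns with.
const samples = {
  fallback: "try { x(); } catch (e) { return null; }",
  empty: "try { x(); } catch (e) {}",
  noBinding: "try { x(); } catch {}",
  handled: "try { x(); } catch (e) { res.status(400).json({ error: 'invalid_body' }); }",
};

// Map each sample to which patterns match it.
const results = {};
for (const [name, src] of Object.entries(samples)) {
  results[name] = {
    silentFallback: silentFallback.test(src),
    emptyCatch: emptyCatch.test(src),
  };
}
console.log(results);
```

Under that assumption, both patterns require a parenthesized binding after `catch`, so the binding-less form `catch {}` is not matched by either one.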

package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/metadata.json ADDED

@@ -0,0 +1,10 @@
+{
+  "id": "F11-batch-import-all-or-nothing",
+  "category": "high-risk",
+  "difficulty": "high",
+  "timeout_seconds": 1500,
+  "required_tools": ["node"],
+  "browser": false,
+  "deps_change_expected": false,
+  "intent": "Add POST /items/import that accepts a batch of items in one request. Each item must have a non-empty name and a positive integer qty. If any item in the batch fails the check, respond 400 and leave the existing list exactly as it was — partial batches are not accepted."
+}

package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/setup.sh ADDED

@@ -0,0 +1,171 @@
+#!/usr/bin/env bash
+# F11 setup — stage verifier scripts. Store remains in-memory; no data file needed.
+set -e
+mkdir -p data
+# Verifier: a batch with one bad item in the middle returns 400 and leaves
+# the stored list exactly as it was before the import.
+cat > data/_verify-failed-leaves-unchanged.js <<'JS'
+'use strict';
+const http = require('http');
+const { app } = require('../server');
+const s = http.createServer(app).listen(0, async () => {
+  const { port } = s.address();
+  const get = (path) => new Promise((resolve) => {
+    http.get(`http://127.0.0.1:${port}${path}`, (r) => {
+      let b = ''; r.on('data', (c) => (b += c));
+      r.on('end', () => {
+        let d = null; try { d = JSON.parse(b); } catch {}
+        resolve({ status: r.statusCode, body: d });
+      });
+    }).on('error', () => resolve({ status: 0, body: null }));
+  });
+  const post = (path, body) => new Promise((resolve) => {
+    const req = http.request(
+      { host: '127.0.0.1', port, method: 'POST', path,
+        headers: { 'Content-Type': 'application/json' } },
+      (r) => {
+        let b = ''; r.on('data', (c) => (b += c));
+        r.on('end', () => {
+          let d = null; try { d = JSON.parse(b); } catch {}
+          resolve({ status: r.statusCode, body: d });
+        });
+      }
+    );
+    req.on('error', () => resolve({ status: 0, body: null }));
+    req.write(JSON.stringify(body));
+    req.end();
+  });
+  const before = await get('/items');
+  const r = await post('/items/import', {
+    items: [
+      { name: 'good1', qty: 1 },
+      { name: '',       qty: 2 }, // invalid: empty name
+      { name: 'good2', qty: 3 },
+    ],
+  });
+  const after = await get('/items');
+  const same = JSON.stringify(before.body) === JSON.stringify(after.body);
+  const ok = r.status === 400 && same;
+  console.log(JSON.stringify({ status: r.status, store_unchanged: same, ok }));
+  s.close();
+  process.exit(ok ? 0 : 1);
+});
+JS
+# Verifier: a fully-valid batch returns 201 and items appear in order with distinct ids.
+cat > data/_verify-success.js <<'JS'
+'use strict';
+const http = require('http');
+const { app } = require('../server');
+const s = http.createServer(app).listen(0, async () => {
+  const { port } = s.address();
+  const get = (path) => new Promise((resolve) => {
+    http.get(`http://127.0.0.1:${port}${path}`, (r) => {
+      let b = ''; r.on('data', (c) => (b += c));
+      r.on('end', () => {
+        let d = null; try { d = JSON.parse(b); } catch {}
+        resolve({ status: r.statusCode, body: d });
+      });
+    }).on('error', () => resolve({ status: 0, body: null }));
+  });
+  const post = (path, body) => new Promise((resolve) => {
+    const req = http.request(
+      { host: '127.0.0.1', port, method: 'POST', path,
+        headers: { 'Content-Type': 'application/json' } },
+      (r) => {
+        let b = ''; r.on('data', (c) => (b += c));
+        r.on('end', () => {
+          let d = null; try { d = JSON.parse(b); } catch {}
+          resolve({ status: r.statusCode, body: d });
+        });
+      }
+    );
+    req.on('error', () => resolve({ status: 0, body: null }));
+    req.write(JSON.stringify(body));
+    req.end();
+  });
+  const before = await get('/items');
+  const beforeLen = (before.body && Array.isArray(before.body.items)) ? before.body.items.length : 0;
+  const r = await post('/items/import', {
+    items: [
+      { name: 'gamma',   qty: 1 },
+      { name: 'delta',   qty: 2 },
+      { name: 'epsilon', qty: 3 },
+    ],
+  });
+  const after = await get('/items');
+  const afterItems = (after.body && Array.isArray(after.body.items)) ? after.body.items : [];
+  const ids = afterItems.map((i) => i && i.id);
+  const uniq = new Set(ids).size === ids.length;
+  const lenOk = afterItems.length === beforeLen + 3;
+  const last3Names = afterItems.slice(-3).map((i) => i && i.name).join(',');
+  const orderOk = last3Names === 'gamma,delta,epsilon';
+  const ok = r.status === 201 && uniq && lenOk && orderOk;
+  console.log(JSON.stringify({ status: r.status, uniq, lenOk, orderOk, last3Names, ok }));
+  s.close();
+  process.exit(ok ? 0 : 1);
+});
+JS
+# Verifier: malformed body (missing items) returns 400 and store is unchanged.
+cat > data/_verify-invalid-body.js <<'JS'
+'use strict';
+const http = require('http');
+const { app } = require('../server');
+const s = http.createServer(app).listen(0, async () => {
+  const { port } = s.address();
+  const get = (path) => new Promise((resolve) => {
+    http.get(`http://127.0.0.1:${port}${path}`, (r) => {
+      let b = ''; r.on('data', (c) => (b += c));
+      r.on('end', () => {
+        let d = null; try { d = JSON.parse(b); } catch {}
+        resolve({ status: r.statusCode, body: d });
+      });
+    }).on('error', () => resolve({ status: 0, body: null }));
+  });
+  const post = (path, raw) => new Promise((resolve) => {
+    const req = http.request(
+      { host: '127.0.0.1', port, method: 'POST', path,
+        headers: { 'Content-Type': 'application/json' } },
+      (r) => {
+        let b = ''; r.on('data', (c) => (b += c));
+        r.on('end', () => {
+          let d = null; try { d = JSON.parse(b); } catch {}
+          resolve({ status: r.statusCode, body: d });
+        });
+      }
+    );
+    req.on('error', () => resolve({ status: 0, body: null }));
+    req.write(raw);
+    req.end();
+  });
+  const before = await get('/items');
+  const r = await post('/items/import', JSON.stringify({ wrong: 'shape' }));
+  const after = await get('/items');
+  const same = JSON.stringify(before.body) === JSON.stringify(after.body);
+  const ok = r.status === 400 && same;
+  console.log(JSON.stringify({ status: r.status, store_unchanged: same, ok }));
+  s.close();
+  process.exit(ok ? 0 : 1);
+});
+JS
+exit 0

package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/spec.md ADDED

@@ -0,0 +1,51 @@
+---
+id: "F11-batch-import-all-or-nothing"
+title: "Batch import endpoint"
+status: planned
+complexity: high
+depends-on: []
+---
+# F11 Batch import endpoint
+## Context
+`server/index.js` only allows fetching items today. The task: add
+`POST /items/import` that accepts a batch of items in a single request
+and appends them to the existing list. The endpoint is intended for bulk
+loads where users send a CSV-derived JSON array.
+The product requirement: a batch must be accepted as a whole or rejected
+as a whole. If any item in the batch is invalid, the response is `400`
+and the stored list is left exactly as it was before the request.
+## Requirements
+- [ ] `POST /items/import` accepts JSON body `{ items: Array<{ name: string, qty: number }> }`.
+- [ ] On full success: respond `201` with `{ inserted: <count> }`. The new items appear in `GET /items` in the order given. Each receives a unique numeric `id` not used by any existing item.
+- [ ] Per-item validation: `name` must be a non-empty string after `trim()`; `qty` must be a positive integer (`> 0`). Any item failing either check makes the batch invalid.
+- [ ] On batch invalidity: respond `400` with `{ error: 'invalid_batch', index: <0-based index of first invalid item>, field: 'name' | 'qty' }`. After the response, `GET /items` returns the same list it would have returned before the import was sent.
+- [ ] Empty body, missing `items`, or `items` not an array → `400` with `{ error: 'invalid_body' }`. Same store-unchanged guarantee.
+- [ ] `tests/server.test.js` is updated. Existing assertions still hold AND at least two new tests cover import: one happy path, one with an invalid element mid-batch that asserts the prior list is unchanged after the failed call.
+## Constraints
+- **No new npm dependencies.**
+- **No silent catches.**
+- **No partial updates.** A batch with N items must produce either N inserts or 0 inserts.
+- **No breaking change** to existing `GET /items` and `GET /items/:id`.
+- **Lifecycle note.** The harness's DOCS phase flips this spec's frontmatter `status` after implementation completes — that is benchmark lifecycle bookkeeping, not a scope violation.
+## Out of Scope
+- Authentication, rate limiting.
+- File-based persistence (the store stays in-memory for this fixture).
+- CSV parsing or any non-JSON payload.
+- Touching `bin/cli.js`, `web/`, or `tests/cli.test.js`.
+## Verification
+- `node --test tests/server.test.js` exits 0.
+- A POST with one valid + one invalid item returns `400`, AND a subsequent `GET /items` returns the same list as before the import.
+- A POST with all-valid items returns `201`, and the items appear in `GET /items` in order with distinct ids.
+- `git diff --stat` shows only `server/index.js` and `tests/server.test.js` touched.
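The validation and error-shape requirements in this spec can be sketched as a pure function that runs before any mutation. `checkBatch` is a hypothetical helper name. Checking `name` before `qty` when both are invalid, and treating an empty `items` array as valid, are assumptions the spec does not settle.

```javascript
// Returns null when the body is an acceptable batch, otherwise the error
// payload the spec describes for a 400 response.
function checkBatch(body) {
  if (body === null || typeof body !== 'object' || !Array.isArray(body.items)) {
    return { error: 'invalid_body' };
  }
  for (let index = 0; index < body.items.length; index++) {
    const it = body.items[index];
    // name: non-empty string after trim()
    if (!it || typeof it.name !== 'string' || it.name.trim() === '') {
      return { error: 'invalid_batch', index, field: 'name' };
    }
    // qty: positive integer
    if (!Number.isInteger(it.qty) || it.qty <= 0) {
      return { error: 'invalid_batch', index, field: 'qty' };
    }
  }
  return null;
}

console.log(checkBatch({ wrong: 'shape' }));
console.log(checkBatch({ items: [{ name: 'good1', qty: 1 }, { name: '', qty: 2 }] }));
console.log(checkBatch({ items: [{ name: 'good1', qty: 1.5 }] }));
console.log(checkBatch({ items: [{ name: 'good1', qty: 1 }] }));
```

Because the function never touches the stored list, a handler that calls it first and appends only on a `null` result gets the store-unchanged guarantee on every 400 path.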

package/benchmark/auto-resolve/fixtures/F11-batch-import-all-or-nothing/task.txt ADDED

@@ -0,0 +1,9 @@
+Add `POST /items/import` to `server/index.js`. The endpoint takes a JSON body `{ items: [...] }` where each item has a `name` (non-empty string after trim) and a `qty` (positive integer).
+A batch is accepted as a whole or not at all. If the batch is fully valid, append every item to the list and return 201 with `{ inserted: <count> }`. If any item fails validation, return 400 with `{ error: 'invalid_batch', index, field }` and leave the stored list exactly as it was before the request.
+Empty body, missing `items`, or `items` not an array → 400 with `{ error: 'invalid_body' }`. Again, the existing list must be untouched.
+Update `tests/server.test.js` so existing tests still pass AND add at least two new tests. One must send a batch with one bad item in the middle and verify the prior list is unchanged after the failed call.
+No new npm dependencies. Only touch `server/index.js` and `tests/server.test.js`.

package/benchmark/auto-resolve/fixtures/F12-webhook-raw-body-signature/NOTES.md ADDED

@@ -0,0 +1,83 @@
+# F12 — Notes
+## Purpose
+Pair-discriminating high-risk fixture targeting **platform/domain
+blindspots** rather than spec-prose-derivable invariants. Codex R3
+(2026-05-05) pivot: after the F10/F11 pilot showed that the 2026 bare arm
+derives spec-hidden invariants from English prose at the same level as
+pair-mode, the discriminator must move from "synonym hiding" to
+"Node/Express/security semantics that the prose does not tutor."
+The fixture has 5 mechanical verifiers covering 3 distinct domain
+blindspots:
+1. **Idempotency** (replay protection). Naive HMAC-correct impls forget
+   the seen-id set and respond 200 on a re-delivery. Webhook providers
+   retry — pair catches.
+2. **Raw-body verification**. Naive impls write `crypto.createHmac(...)
+   .update(JSON.stringify(req.body)).digest('hex')` because Express's
+   `express.json()` middleware is the obvious body-parsing path. The
+   re-serialized form may match a canonical signature, but it does not
+   verify the actual bytes the client sent — same parsed object, different
+   on-wire bytes pass naive verification.
+3. **Timing-safe comparison**. Naive impls `===` the hex strings.
+   Production webhook libraries (Stripe, GitHub) use
+   `crypto.timingSafeEqual` because non-constant-time compare leaks the
+   true MAC byte-by-byte. Spec mentions this directly to bias the model
+   toward correctness; the forbidden_pattern slot is reserved if needed.
+## Failure modes detected
+- **Replay accepts**: returns 200 on the second delivery of the same id.
+  Verifier 2 catches.
+- **JSON.stringify roundtrip accept**: HMAC over re-serialized req.body
+  matches a canonical-signature for non-canonical bytes. Verifier 5
+  catches.
+- **Tampered-body accept**: would only happen with a broken impl; verifier
+  3 documents the obvious case for completeness.
+- **Missing-sig accept**: 200 instead of 401. Verifier 4.
+- **Silent catch** wrapping crypto.timingSafeEqual (which throws on length
+  mismatch). Caught by forbidden_pattern.
+## Pipeline exercise
+- Phase 1 BUILD: implementer must derive (a) maintain a seen-id set,
+  (b) use `express.raw({ type: 'application/json' })` or hand-parse so the
+  raw bytes are kept, (c) `crypto.timingSafeEqual` for comparison.
+- Phase 2 EVAL: scrutinizes whether new tests cover replay AND raw-body
+  cases, not just happy + tampered.
+- Phase 3 CRITIC: production-readiness on the security claims.
+## Discrimination expectation
+Calibration target:
+- bare arm: 50-75 (passes 2-4 of 5; likely aces happy + tampered +
+  missing-sig; misses replay if there is no seen-id set, misses raw-body
+  if it uses JSON.stringify).
+- solo arm: 65-85 (skill review pass may catch one of the two complex
+  blindspots, may miss the other).
+- pair arm: 80-95 (cross-perspective derivation of both replay AND
+  raw-body invariants).
+If bare scores 5/5 here too, the "domain blindspot" thesis also dies
+and we re-evaluate strategy at iter level.
+## Public-spec wording
+Spec mentions `crypto.timingSafeEqual` directly (production constraint,
+not a leak — bare in 2026 will likely already use it). Spec describes
+replay protection as "the provider retries on network failure" — natural
+language, no leak of "idempotency" / "deduplication" / "seen-set"
+keywords. Raw-body trap is intentionally left without explicit
+"use express.raw" hint — that's the discrimination axis.
+## Rotation trigger
+Retire when both arms consistently land > 90 across two shipped versions
+on this fixture. If the raw-body verifier (#5) becomes saturated faster
+than the others, replace it with a different platform blindspot rather
+than retiring the whole fixture.
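The raw-body and timing-safe points in these notes can be illustrated with Node's built-in `crypto` module alone. This is a sketch under assumptions: the secret, the sample payload, the helper names (`sign`, `verifyRaw`, `verifyReserialized`), and a bare hex signature format are all hypothetical, and nothing here reproduces the fixture's actual verifier scripts.

```javascript
const crypto = require('node:crypto');

// Hypothetical shared secret, for illustration only.
const secret = 'demo-secret';

const sign = (bytes) =>
  crypto.createHmac('sha256', secret).update(bytes).digest('hex');

// Verify against the exact bytes received. Lengths are compared first
// because crypto.timingSafeEqual throws when the buffers differ in length.
function verifyRaw(rawBody, signatureHex) {
  const expected = Buffer.from(sign(rawBody), 'hex');
  const given = Buffer.from(String(signatureHex || ''), 'hex');
  if (given.length !== expected.length) return false;
  return crypto.timingSafeEqual(given, expected);
}

// The naive variant: parse, re-serialize, then hash the re-serialized form.
function verifyReserialized(rawBody, signatureHex) {
  const reserialized = JSON.stringify(JSON.parse(rawBody.toString('utf8')));
  return sign(reserialized) === signatureHex;
}

// On-wire bytes with non-canonical spacing, and their canonical re-serialization.
const wire = Buffer.from('{ "id": "evt_1",  "type": "ping" }');
const canonical = JSON.stringify(JSON.parse(wire.toString('utf8')));
const sigOverWire = sign(wire);
const sigOverCanonical = sign(canonical);

console.log(verifyRaw(wire, sigOverWire));                 // true
console.log(verifyRaw(wire, sigOverCanonical));            // false
console.log(verifyReserialized(wire, sigOverCanonical));   // true
console.log(verifyReserialized(wire, sigOverWire));        // false
```

The last two lines show the trap from both sides: the re-serializing check accepts a signature that was never computed over the bytes on the wire, and rejects the one that was.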

package/benchmark/auto-resolve/fixtures/F12-webhook-raw-body-signature/expected.json ADDED

@@ -0,0 +1,74 @@
+{
+  "verification_commands": [
+    {
+      "cmd": "node --test tests/server.test.js",
+      "exit_code": 0,
+      "stdout_contains": [],
+      "stdout_not_contains": ["not ok "]
+    },
+    {
+      "cmd": "node data/_verify-happy.js",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": []
+    },
+    {
+      "cmd": "node data/_verify-replay.js",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": []
+    },
+    {
+      "cmd": "node \"$BENCH_FIXTURE_DIR/verifiers/replay-malformed-body.js\"",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": [],
+      "contract_refs": [
+        "The body of the duplicate is irrelevant — once an `id` is accepted, the same `id` is permanently rejected for the lifetime of the server process.",
+        "A second POST with the same accepted `id` returns 409 even if the duplicate"
+      ]
+    },
+    {
+      "cmd": "node data/_verify-tampered.js",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": []
+    },
+    {
+      "cmd": "node data/_verify-missing-sig.js",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": []
+    },
+    {
+      "cmd": "node data/_verify-raw-body.js",
+      "exit_code": 0,
+      "stdout_contains": ["\"ok\":true"],
+      "stdout_not_contains": []
+    }
+  ],
+  "forbidden_patterns": [
+    {
+      "pattern": "catch\\s*\\([^)]*\\)\\s*\\{[^}]*return\\s+(null|undefined|''|\\{\\})",
+      "description": "silent catch returning fallback in webhook path",
+      "files": ["server/index.js"],
+      "severity": "disqualifier"
+    },
+    {
+      "pattern": "catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}",
+      "description": "empty catch block",
+      "files": ["server/index.js"],
+      "severity": "disqualifier"
+    },
+    {
+      "pattern": "/\\*\\s*eslint-disable",
+      "description": "eslint-disable without scoped justification",
+      "files": ["server/index.js"],
+      "severity": "disqualifier"
+    }
+  ],
+  "required_files": ["server/index.js", "tests/server.test.js", "data/webhook-secret.txt"],
+  "forbidden_files": [],
+  "max_deps_added": 0,
+  "spec_output_files": ["server/index.js", "tests/server.test.js"]
+}
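The replay contract quoted in `contract_refs` (an accepted `id` is rejected for the rest of the process lifetime, with 409 on the duplicate) can be sketched with a process-local `Set`. `createReplayGuard` and `accept` are hypothetical names. The 200 success status is inferred from the fixture notes, and how this check is ordered relative to signature verification is not specified in this part of the diff, so the sketch does not model it.

```javascript
// Process-local replay guard: remembers every id it has accepted.
function createReplayGuard() {
  const seen = new Set();
  return {
    // Returns the HTTP status a handler might send for this event id.
    accept(id) {
      if (seen.has(id)) return 409; // duplicate of an already accepted id
      seen.add(id);
      return 200;
    },
  };
}

const guard = createReplayGuard();
console.log(guard.accept('evt_1')); // 200
console.log(guard.accept('evt_1')); // 409
console.log(guard.accept('evt_2')); // 200
```

Because the set lives in memory, the guarantee holds only for the lifetime of the server process, which matches the wording of the quoted contract.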

package/benchmark/auto-resolve/fixtures/F12-webhook-raw-body-signature/metadata.json ADDED

@@ -0,0 +1,10 @@
+{
+  "id": "F12-webhook-raw-body-signature",
+  "category": "high-risk",
+  "difficulty": "high",
+  "timeout_seconds": 1500,
+  "required_tools": ["node"],
+  "browser": false,
+  "deps_change_expected": false,
+  "intent": "Add POST /webhook that verifies an HMAC-SHA256 signature in the X-Signature header against the request body, accepts each event id at most once, and rejects tampered or replayed events. The provider computes the signature over the exact bytes of the body it sends; the server must verify against the same bytes."
+}