npm - @mcp-graph-workflow/agent-graph-flow - Versions diffs - 0.1.0 - Mend

@mcp-graph-workflow/agent-graph-flow 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

package/README.md +40 -0
package/dist/cli/index.d.ts +1 -0
package/dist/cli/index.js +12842 -0
package/dist/index.d.ts +43 -0
package/dist/index.js +48 -0
package/package.json +142 -0
package/src/skills/analyze/ambiguity-audit.md +46 -0
package/src/skills/analyze/decompose-prd.md +26 -0
package/src/skills/analyze/grill-me.md +26 -0
package/src/skills/analyze/to-prd.md +57 -0
package/src/skills/any/code-detachment.md +26 -0
package/src/skills/any/lessons-consult.md +26 -0
package/src/skills/any/wip-one.md +26 -0
package/src/skills/design/design-an-interface.md +26 -0
package/src/skills/design/seam-audit.md +26 -0
package/src/skills/domain/crypto/common-mistakes.md +71 -0
package/src/skills/domain/ml/common-mistakes.md +55 -0
package/src/skills/domain/rag/chunk-overlap-strategy.md +27 -0
package/src/skills/domain/sqlite-perf/fts5-tuning.md +25 -0
package/src/skills/domain/sqlite-perf/wal-mode.md +26 -0
package/src/skills/domain/systems/common-mistakes.md +62 -0
package/src/skills/domain/testing/vitest-isolation.md +31 -0
package/src/skills/domain/typescript/zod-v4-migration.md +27 -0
package/src/skills/implement/anti-hallucination.md +28 -0
package/src/skills/implement/pure-decision-pattern.md +26 -0
package/src/skills/implement/tracer-bullet-tdd.md +26 -0
package/src/skills/plan/budget-aware-picking.md +26 -0
package/src/skills/plan/plan-sprint.md +26 -0
package/src/skills/plan/to-issues.md +67 -0
package/src/skills/review/citation-coverage-review.md +26 -0
package/src/skills/review/deep-module-review.md +26 -0
package/src/skills/review/zoom-out.md +34 -0
package/src/skills/validate/dod-checklist.md +30 -0
package/src/skills/validate/harness-regression-check.md +26 -0

package/src/skills/domain/rag/chunk-overlap-strategy.md ADDED

@@ -0,0 +1,27 @@
+---
+domain: rag
+topic: chunk-overlap-strategy
+triggers: [retrieval_quality, lost_context, chunk_boundary]
+discovered_at: 2026-04-28T00:00:00.000Z
+source_task: seed
+confidence: 0.75
+---
+# Chunk Overlap Strategy
+Default to ~15% overlap between adjacent chunks. Information at chunk
+boundaries gets lost when there's no overlap; a query that spans the cut
+retrieves only one of the two chunks and misses the bridging context.
+## When to apply
+- Retrieval quality drops on multi-sentence reasoning.
+- Tables, code blocks, or definitions span chunk boundaries.
+- Citations point to the wrong chunk by one position.
+## Sizing rule
+- Token budget for a chunk: 256–512 tokens.
+- Overlap: 15% of chunk size (38–76 tokens).
+- For code chunks, prefer overlapping by full statements, not byte counts —
+  splitting an `if/else` block in half makes both chunks unhelpful.
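
A minimal sketch of the sizing rule above (an illustration only, not part of the published package). It assumes the text is already tokenized into an array; `chunkWithOverlap` and its parameters are hypothetical names.

```typescript
// Hypothetical helper: split a token array into chunks of `chunkSize`
// tokens where each chunk repeats the tail of the previous one.
function chunkWithOverlap<T>(
  tokens: T[],
  chunkSize = 512,
  overlapRatio = 0.15,
): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  if (overlapRatio < 0 || overlapRatio >= 1) {
    throw new Error("overlapRatio must be in [0, 1)");
  }
  const overlap = Math.floor(chunkSize * overlapRatio);
  const stride = chunkSize - overlap; // always >= 1 because overlapRatio < 1
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the input.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}
```

With 256-token chunks the overlap is 38 tokens, so each chunk starts 218 tokens after the previous one. For code, the statement-level rule above would need a splitter that is aware of syntax, which this sketch does not attempt.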

package/src/skills/domain/sqlite-perf/fts5-tuning.md ADDED

@@ -0,0 +1,25 @@
+---
+domain: sqlite-perf
+topic: fts5-tuning
+triggers: [slow_search, fts5_queries, search_relevance]
+discovered_at: 2026-04-28T00:00:00.000Z
+source_task: seed
+confidence: 0.78
+---
+# FTS5 Tuning
+`bm25()` ranking + `unicode61` tokenizer + content-rowid linking is the
+default fast path. Use external content tables to avoid double-storing.
+## When to apply
+- Search queries scan thousands of rows.
+- Relevance ranking feels off (BM25 needs explicit weights per column).
+- Index size is bloating the database.
+## Tips
+- Rebuild the FTS index after bulk inserts: `INSERT INTO fts(fts) VALUES('rebuild');`
+- Use `MATCH` with prefix tokens (`foo*`) instead of `LIKE`.
+- For Portuguese/Spanish content, set `remove_diacritics 2` on the `unicode61` tokenizer (`1` is already the default and misses codepoints that carry more than one diacritic).
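
As an illustration of the prefix-token tip (not part of the published package), the sketch below turns free-form user input into an FTS5 `MATCH` expression. `toFts5PrefixQuery` is a hypothetical name; it quotes each term as an FTS5 string, doubles embedded double quotes, and appends `*` so each term matches as a prefix.

```typescript
// Hypothetical helper: build an FTS5 MATCH expression in which every
// whitespace-separated term is a quoted prefix token, e.g. "foo"* "bar"*.
// Terms separated by a space are combined with an implicit AND.
function toFts5PrefixQuery(input: string): string {
  return input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"*`)
    .join(" ");
}
```

The resulting string is meant to be bound as a parameter, for example `SELECT rowid FROM fts WHERE fts MATCH ?`, rather than concatenated into the SQL text.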

package/src/skills/domain/sqlite-perf/wal-mode.md ADDED

@@ -0,0 +1,26 @@
+---
+domain: sqlite-perf
+topic: wal-mode
+triggers: [slow_writes, lock_contention, concurrent_readers]
+discovered_at: 2026-04-28T00:00:00.000Z
+source_task: seed
+confidence: 0.85
+---
+# SQLite WAL Mode
+Write-Ahead Logging lets readers and a single writer run concurrently without
+blocking each other. Enable with `PRAGMA journal_mode = WAL;` once per database;
+the setting persists in the file header.
+## When to apply
+- Writes are slow under read pressure.
+- `SQLITE_BUSY` errors appear in logs.
+- Multiple processes/threads need to read while one writes.
+## Trade-offs
+- `-wal` and `-shm` sidecar files appear next to the database.
+- Long-running readers can prevent the WAL from being checkpointed.
+- Backups must include the WAL file or use `VACUUM INTO`.

package/src/skills/domain/systems/common-mistakes.md ADDED

@@ -0,0 +1,62 @@
+---
+domain: systems
+topic: common-mistakes
+triggers: [concurrency, distributed_systems, performance, deadlock, race_condition]
+discovered_at: 2026-04-30T00:00:00.000Z
+source_task: extracta-paper2code
+confidence: 0.8
+---
+# Systems Implementation — Common Mistakes
+Patterns where the code compiles but the *behavior* is wrong under load.
+## Concurrency
+- **Mutex held across await** — JS/Python async: holding a lock while
+  awaiting another I/O call serializes the whole pool. Release before
+  awaiting; re-acquire after.
+- **Read-modify-write without lock** — `count++` is three ops. Under
+  concurrency you lose increments. Use atomic counters or a single
+  `UPDATE ... SET col = col + 1`.
+- **Double-check locking** — works in Java/C++ with proper memory
+  fences; broken in many other languages because the first, unlocked
+  check can see a reference to a partially-constructed object.
+## Distributed systems
+- **At-most-once vs at-least-once** — clients usually need at-least-once
+  with idempotency keys; servers usually offer at-least-once delivery.
+  Treat duplicates as the default, not the edge case.
+- **Clock skew** — never trust two machines' wall clocks to differ by
+  less than 100 ms. Use logical clocks or relative times for ordering.
+- **Network partitions are not "rare"** — every multi-AZ deploy will see
+  one within a quarter. Design retries + backoff with jitter as
+  always-on, not failure-mode.
+## Performance
+- **N+1 queries** — list view fetches N records then issues N follow-up
+  queries. JOIN or use a batch loader (DataLoader pattern).
+- **Wrong cache key granularity** — caching at request level invalidates
+  too aggressively; caching per user often invalidates too little. The right
+  granularity is the smallest stable subset of the data.
+- **Synchronous I/O in event loop** — Node `fs.readFileSync` in a
+  request handler blocks the whole process. Same for Python asyncio
+  with sync DB drivers.
+## Storage
+- **Index on the wrong column order** — composite index on (user, ts)
+  helps `WHERE user=? AND ts > ?`; doesn't help `WHERE ts > ?`. Order
+  by selectivity-then-range.
+- **WAL not checkpointed** — long-running readers in SQLite WAL mode
+  prevent checkpoint, growing the `-wal` file unboundedly.
+- **Migration without backfill** — adding a NOT NULL column on a large
+  table without a default scans the whole table under a lock.
+## When to escalate
+If a task describes "scale to N users" or "handle bursts", the AC needs
+explicit numbers (latency p95, throughput) and an answer to: what
+happens when the bound is exceeded? Mark UNSPECIFIED otherwise.
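
The "retries + backoff with jitter" item above can be sketched as follows (an illustration only, not part of the published package). The sketch uses capped exponential backoff with "full jitter", one common variant; the function names, the 100 ms base, and the 10 s cap are assumptions. The random source and the sleep function are injected so the delay logic stays deterministic under test.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// The ceiling doubles per attempt up to `capMs`; the actual delay is a
// uniformly random value below that ceiling ("full jitter").
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 10000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Hypothetical wrapper: retry `op` up to `maxAttempts` times, sleeping
// between attempts. `sleep` is supplied by the caller.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 5,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Because at-least-once delivery means retried calls can arrive twice, `op` should be idempotent (for example by carrying an idempotency key), as the distributed-systems notes above recommend.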

package/src/skills/domain/testing/vitest-isolation.md ADDED

@@ -0,0 +1,31 @@
+---
+domain: testing
+topic: vitest-isolation
+triggers: [flaky_tests, shared_state, test_pollution]
+discovered_at: 2026-04-28T00:00:00.000Z
+source_task: seed
+confidence: 0.8
+---
+# Vitest Test Isolation
+Each test should create its own SqliteStore (`SqliteStore.open(":memory:")`)
+and tear it down in `afterEach`. Module-level singletons leak state across
+files when the worker pool reuses a process.
+## When to apply
+- Tests pass alone but fail when run together.
+- Order-dependent failures.
+- Mutable module-level caches (`Map`, `Set`) that aren't cleared.
+## Pattern
+```ts
+let store: SqliteStore;
+beforeEach(() => { store = SqliteStore.open(":memory:"); });
+afterEach(() => { store.close(); });
+```
+For singletons that can't be replaced, expose a `_reset()` test-only hook
+and call it in `beforeEach` — never share a real DB file across tests.
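
A sketch of the `_reset()` hook described above (an illustration only, not part of the published package). The cache and the `remember`/`recall` functions are hypothetical; in a real module they would be exported, with `_reset` documented as test-only.

```typescript
// Hypothetical module-level singleton: a cache that would otherwise leak
// state between test files when a worker process is reused.
const cache = new Map<string, number>();

function remember(key: string, value: number): void {
  cache.set(key, value);
}

function recall(key: string): number | undefined {
  return cache.get(key);
}

/** Test-only hook: clears all module-level state. */
function _reset(): void {
  cache.clear();
}

// In a test file this would be wired up as:
//   beforeEach(() => { _reset(); });
```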

package/src/skills/domain/typescript/zod-v4-migration.md ADDED

@@ -0,0 +1,27 @@
+---
+domain: typescript
+topic: zod-v4-migration
+triggers: [zod_upgrade, schema_migration, type_inference_break]
+discovered_at: 2026-04-28T00:00:00.000Z
+source_task: seed
+confidence: 0.82
+---
+# Zod v4 Migration
+Always import from `zod/v4` in this project: `import { z } from 'zod/v4'`.
+Never import from `zod` — that path resolves to v3 in some environments and
+silently changes runtime behavior.
+## Common breaks
+- `z.record()` now requires both key and value schemas.
+- `z.string().email()` is deprecated; prefer the top-level `z.email()`.
+- `.optional()` followed by `.default()` changed evaluation order.
+- `safeParse` errors expose `issues` (no longer `errors`).
+## Migration recipe
+1. Bulk replace `from "zod"` → `from "zod/v4"`.
+2. Run `tsc --noEmit` and fix the schema-shape regressions case by case.
+3. Re-run unit tests for each schema; runtime parsing is stricter in v4.

package/src/skills/implement/anti-hallucination.md ADDED

@@ -0,0 +1,28 @@
+---
+name: anti-hallucination
+description: Forbidden phrases + citation requirements for code and rationale
+category: implement
+phases: [IMPLEMENT, REVIEW]
+---
+# anti-hallucination
+## When to use
+Always — when writing comments, commit messages, PR descriptions, or rationale strings. Per `.claude/rules/anti-hallucination.md`.
+## Steps
+1. Re-read the change before commit. Search for forbidden phrases:
+   "standard practice", "typically", "obviously", "best practice",
+   "as expected", "common pattern", "generally", "normally".
+2. Replace each with a citation (`§EPIC-...`, `§ADR-...`, RFC) or a measurement.
+3. If you can't cite, the claim is probably wrong — delete it.
+4. Add §-citations to any new file in src/core/.
+5. Run `analyze(mode='citation_groundedness')` before finish_task.
+## Anti-patterns
+- Hedging ("I think this is fine") instead of citing.
+- Decorative §-tags that don't actually trace to a spec.
+- Ignoring the linter when it flags banned phrases — fix, don't suppress.
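
Step 1 above can be sketched as a small scanner (an illustration only, not the package's linter). The phrase list is copied from the skill; `findForbiddenPhrases` and the `PhraseHit` shape are hypothetical. Matching is a case-insensitive substring search, so it will also flag a phrase inside a longer word.

```typescript
// Phrase list taken from step 1 of the skill.
const FORBIDDEN_PHRASES = [
  "standard practice",
  "typically",
  "obviously",
  "best practice",
  "as expected",
  "common pattern",
  "generally",
  "normally",
];

interface PhraseHit {
  line: number; // 1-based line number
  phrase: string;
}

// Hypothetical helper: report every forbidden phrase with its line number.
function findForbiddenPhrases(text: string): PhraseHit[] {
  const hits: PhraseHit[] = [];
  text.split("\n").forEach((content, index) => {
    const lower = content.toLowerCase();
    for (const phrase of FORBIDDEN_PHRASES) {
      if (lower.indexOf(phrase) !== -1) {
        hits.push({ line: index + 1, phrase });
      }
    }
  });
  return hits;
}
```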

package/src/skills/implement/pure-decision-pattern.md ADDED

@@ -0,0 +1,26 @@
+---
+name: pure-decision-pattern
+description: Extract pure logic from I/O so tests stay fast and deterministic
+category: implement
+phases: [IMPLEMENT]
+---
+# pure-decision-pattern
+## When to use
+Any module that decides something based on inputs (rate limit, threshold check, status mapping). Keep the decision pure; let the caller orchestrate I/O.
+## Steps
+1. Identify the decision: a function `(input) → output` with no side effects.
+2. Move the decision to its own file under src/core/.../<name>.ts.
+3. Test the decision with deterministic inputs only. No DB, no clock, no fs.
+4. Caller (MCP tool / hook handler) injects clocks, DB connections, env reads.
+5. For env-driven toggles, expose a tiny `isXDisabled(env)` helper that the caller checks.
+## Anti-patterns
+- Importing `process.env` deep in core (couples to node + makes test setup awful).
+- Reading DB inside the decision — breaks the unit test boundary.
+- Branching on `Date.now()` directly — inject a clock fn.
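
A sketch of the pattern above using a rate-limit decision (an illustration only, not part of the published package). All names, including the `RATE_LIMIT_DISABLED` variable, are hypothetical. The decision receives the current time and the request history as plain data; the caller is the only place that reads the clock, the store, or the environment.

```typescript
interface RateLimitInput {
  requestTimestampsMs: number[]; // earlier requests, loaded by the caller
  nowMs: number; // injected by the caller, never read here
  windowMs: number;
  maxRequests: number;
}

interface RateLimitDecision {
  allowed: boolean;
  remaining: number; // requests left in the window after this one
}

// Pure decision: same input, same output. No DB, no clock, no fs.
function decideRateLimit(input: RateLimitInput): RateLimitDecision {
  const windowStart = input.nowMs - input.windowMs;
  const recent = input.requestTimestampsMs.filter((t) => t > windowStart).length;
  const allowed = recent < input.maxRequests;
  const used = recent + (allowed ? 1 : 0);
  return { allowed, remaining: Math.max(0, input.maxRequests - used) };
}

// Env-driven toggle: takes the env as an argument instead of importing
// process.env inside core code.
function isRateLimitDisabled(env: Record<string, string | undefined>): boolean {
  return env["RATE_LIMIT_DISABLED"] === "1";
}
```

A caller would pass `Date.now()` and `process.env` at the edge, for example `isRateLimitDisabled(process.env) || decideRateLimit({ ...loaded, nowMs: Date.now() }).allowed`.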

package/src/skills/implement/tracer-bullet-tdd.md ADDED

@@ -0,0 +1,26 @@
+---
+name: tracer-bullet-tdd
+description: Red-Green-Refactor with one shot through every layer first
+category: implement
+phases: [IMPLEMENT]
+---
+# tracer-bullet-tdd
+## When to use
+Implementing a feature that touches multiple layers (e.g., MCP tool → core function → store). Get a thin slice end-to-end working before fattening any layer.
+## Steps
+1. Write the **smallest** test that exercises every layer (skinny e2e).
+2. Stub each layer with the minimum code to make it red, then green.
+3. Commit the tracer; the diff shows the architecture in one screen.
+4. Now widen each layer: add cases to the unit tests of the layer that needs them.
+5. Refactor only after green; never refactor red code.
+## Anti-patterns
+- Building one layer fully before the next exists ("vertical waterfall").
+- Writing 10 tests up front then implementing — long red period kills feedback.
+- Skipping refactor because tests pass — debt accumulates per layer.

package/src/skills/plan/budget-aware-picking.md ADDED

@@ -0,0 +1,26 @@
+---
+name: budget-aware-picking
+description: Pick next task respecting cost budget — prefer XS/S when low
+category: plan
+phases: [PLAN, IMPLEMENT]
+---
+# budget-aware-picking
+## When to use
+When LLM cost is approaching the run cap or the sprint budget. Prevents tail-end blowups by biasing toward small tasks.
+## Steps
+1. Read current spend via `metrics(action='session_cost')`.
+2. If totalUsd / capUsdPerRun > 0.8, filter ready tasks to xpSize XS/S.
+3. Sort remaining: priority ASC, depth ASC, id stable tiebreak.
+4. If no XS/S available, fall back to any size (still escalates approval).
+5. Document the bias in `decisions.record` so the audit trail shows why.
+## Anti-patterns
+- Letting an L task run with 5% budget remaining — guaranteed mid-task abort.
+- Hard-stopping at 80% without fallback — unfinished critical path.
+- Re-picking the same task after cost-fallback engages without re-budgeting.
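
Steps 2 to 4 above can be sketched as a pure function (an illustration only, not the package's implementation). The `ReadyTask` shape is hypothetical, and "id stable tiebreak" is interpreted here as comparing ids as strings. The approval escalation mentioned in step 4 and the `decisions.record` call in step 5 are left to the caller.

```typescript
type XpSize = "XS" | "S" | "M" | "L" | "XL";

interface ReadyTask {
  id: string;
  priority: number; // lower value = more urgent (priority ASC)
  depth: number;
  xpSize: XpSize;
}

// Hypothetical picker: above 80% of the run cap, prefer XS/S tasks and
// fall back to any size only when no small task is ready.
// Assumes capUsdPerRun > 0.
function pickNextTask(
  ready: ReadyTask[],
  totalUsd: number,
  capUsdPerRun: number,
): ReadyTask | undefined {
  const lowBudget = totalUsd / capUsdPerRun > 0.8;
  const small = ready.filter((t) => t.xpSize === "XS" || t.xpSize === "S");
  const pool = lowBudget && small.length > 0 ? small : ready;
  const sorted = pool.slice().sort((a, b) => {
    if (a.priority !== b.priority) return a.priority - b.priority;
    if (a.depth !== b.depth) return a.depth - b.depth;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  });
  return sorted[0];
}
```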

package/src/skills/plan/plan-sprint.md ADDED

@@ -0,0 +1,26 @@
+---
+name: plan-sprint
+description: Build a sprint with capacity + WIP=1 + dependency-respecting order
+category: plan
+phases: [PLAN]
+---
+# plan-sprint
+## When to use
+End of ANALYZE/DESIGN, before IMPLEMENT. You have decomposed atomic tasks and need a sprint that fits the team capacity.
+## Steps
+1. Compute capacity: hours_available × focus_factor (0.65 default).
+2. Sort tasks by `depends_on` (topological), then priority ASC.
+3. Greedily pack until capacity exhausted; respect xpSize budget.
+4. Run `analyze(mode='design_ready')` and `sync_stack_docs` before unfreezing.
+5. Verify WIP=1 enforceable: no two tasks share the same primary owner.
+## Anti-patterns
+- Filling capacity to 100% (kills cycle time per Little's Law).
+- Ignoring blocked tasks until sprint starts — pre-resolve in PLAN.
+- Sprint with mixed phases (analyze + implement) — split per phase.
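
Steps 1 to 3 above can be sketched as follows (an illustration only, not the package's implementation). The `SprintTask` shape and the hour-based estimate are assumptions, and every `dependsOn` id is assumed to refer to a task in the same list. The loop repeatedly takes the highest-priority task whose dependencies are already planned and which still fits, which yields a dependency-respecting order.

```typescript
interface SprintTask {
  id: string;
  priority: number; // lower value = more urgent (priority ASC)
  estimateHours: number;
  dependsOn: string[];
}

// Hypothetical planner: capacity = hours available x focus factor, then a
// greedy pack in topological order with priority as the tiebreak.
function planSprint(
  tasks: SprintTask[],
  hoursAvailable: number,
  focusFactor = 0.65,
): SprintTask[] {
  const capacity = hoursAvailable * focusFactor;
  const planned: SprintTask[] = [];
  const plannedIds = new Set<string>();
  let usedHours = 0;
  let remaining = tasks.slice().sort((a, b) => a.priority - b.priority);

  for (;;) {
    const hoursSoFar = usedHours;
    const next = remaining.find(
      (t) =>
        t.dependsOn.every((dep) => plannedIds.has(dep)) &&
        hoursSoFar + t.estimateHours <= capacity,
    );
    if (next === undefined) break; // nothing else is unblocked and fits
    planned.push(next);
    plannedIds.add(next.id);
    usedHours += next.estimateHours;
    remaining = remaining.filter((t) => t !== next);
  }
  return planned;
}
```

Because capacity is already scaled by the focus factor, the plan stays below 100% of raw hours, in line with the first anti-pattern above.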

package/src/skills/plan/to-issues.md ADDED

@@ -0,0 +1,67 @@
+---
+name: to-issues
+description: Break a plan or PRD into independently grabbable issues using tracer-bullet vertical slices; complements plan_sprint by producing the issue list that feeds it
+category: plan
+phases: [PLAN]
+---
+# to-issues
+Port of `skills-main/to-issues`. Use after `to-prd` (or when the PRD already exists) to produce the slice list that `plan_sprint` and `gh issue create` consume.
+## Process
+### 1. Gather context
+Work from the PRD node or GitHub issue passed in. `gh issue view <n>` if you need comments.
+### 2. Draft vertical slices
+Each slice is a **tracer bullet** — thin path through ALL layers (schema, API, UI, tests). NOT a horizontal layer cut.
+Rules:
+- Each slice is independently demoable
+- Prefer many thin slices over few fat ones
+- Mark slices **HITL** (needs human decision/review) or **AFK** (agent can ship alone). Prefer AFK.
+### 3. Quiz the user
+Present as a numbered table:
+| # | Title | Type | Blocked by | Stories covered |
+|---|---|---|---|---|
+| 1 | … | AFK | none | US-1, US-2 |
+Ask:
+- Granularity right? (too coarse / too fine)
+- Dependencies correct?
+- Any slices need to merge or split?
+- HITL/AFK assignments correct?
+Iterate until approved.
+### 4. File issues
+For each slice, create with `gh issue create` in dependency order so blockers get real numbers first.
+```markdown
+## Parent
+#<parent-issue-number>
+## What to build
+End-to-end behavior for this slice. No layer-by-layer how-to.
+## Acceptance criteria
+- [ ] …
+## Blocked by
+- #<n>  (or "None — can start immediately")
+```
+Do not close or modify the parent issue.
+## Anti-patterns
+- Horizontal slicing ("schema PR, then API PR, then UI PR") — none of those demo on their own
+- Treating every slice as HITL — block on human availability instead of shipping
+- Skipping the quiz step — user will reject the breakdown later, costing more

package/src/skills/review/citation-coverage-review.md ADDED

@@ -0,0 +1,26 @@
+---
+name: citation-coverage-review
+description: Verify every new core file has at least one traceable §-citation
+category: review
+phases: [REVIEW, HANDOFF]
+---
+# citation-coverage-review
+## When to use
+In PR review for any change touching src/core/. Citations anchor implementation back to design intent (EPIC, ADR, RFC).
+## Steps
+1. Run `analyze(mode='citation_groundedness')`.
+2. For each src/core/*.ts in the diff lacking §-citation, request one in review.
+3. Acceptable forms: `§EPIC-22.A4`, `§ADR-0049`, `RFC 7232 §2.3`.
+4. Skip src/tests/, src/cli/, src/web/ (caller-side discipline only).
+5. citation-coverage-guard hook (E21.T01) runs the check at task:post-complete; this skill is for human review depth.
+## Anti-patterns
+- Bulk-adding cosmetic §-tags that don't trace anywhere.
+- Approving "no citation needed for utilities" when the utility encodes a real design choice.
+- Ignoring the hook warning thinking "it's just advisory".
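
Steps 2 to 4 above can be sketched as a textual check (an illustration only, not the `citation_groundedness` analyzer). The regular expressions are assumptions generalized from the three example forms in step 3, and the helper names are hypothetical. The sketch only detects that a citation-shaped string is present; whether it traces to a real spec still needs human review.

```typescript
// Patterns generalized from the examples `§EPIC-22.A4`, `§ADR-0049`
// and `RFC 7232 §2.3`. The real accepted grammar may differ.
const CITATION_PATTERNS: RegExp[] = [
  /§EPIC-\d+(?:\.[A-Z]\d+)?/,
  /§ADR-\d+/,
  /RFC \d+ §\d+(?:\.\d+)*/,
];

function hasCitation(source: string): boolean {
  return CITATION_PATTERNS.some((pattern) => pattern.test(source));
}

interface ChangedFile {
  path: string;
  source: string;
}

// Hypothetical helper: list changed TypeScript files under src/core/
// (nested directories included) that carry no citation. Other
// directories are skipped, as in step 4.
function filesLackingCitation(files: ChangedFile[]): string[] {
  return files
    .filter((f) => f.path.indexOf("src/core/") === 0 && /\.ts$/.test(f.path))
    .filter((f) => !hasCitation(f.source))
    .map((f) => f.path);
}
```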

package/src/skills/review/deep-module-review.md ADDED

@@ -0,0 +1,26 @@
+---
+name: deep-module-review
+description: Audit a module's depth ratio + interface surface before merge
+category: review
+phases: [REVIEW]
+---
+# deep-module-review
+## When to use
+In code review of a new module or significant change. Goal: keep modules deep (small interface, large impl) per Ousterhout.
+## Steps
+1. Run `analyze(mode='deep_module', dir=<changed-files>)`.
+2. Reject any new file with depth='shallow' (ratio > 0.5) unless intentional facade.
+3. For 'medium', ask: can any export be made internal? Does any import only need one symbol?
+4. Check: function names describe behavior, not implementation.
+5. Block merge if shallowCandidates > 0 without justification in PR description.
+## Anti-patterns
+- Approving "looks fine" without running the analyzer.
+- Letting test helpers leak into production exports because they're "small".
+- Accepting helper modules with 10 exports — usually a bag of utilities, not a module.

package/src/skills/review/zoom-out.md ADDED

@@ -0,0 +1,34 @@
+---
+name: zoom-out
+description: Step up one abstraction layer; produce a map of relevant modules and their callers when you don't know an area of code well
+category: review
+phases: [REVIEW, ANALYZE]
+---
+# zoom-out
+Port of `skills-main/zoom-out`. Use when you (or a collaborator) hit code in an unfamiliar area and need orientation before changing it.
+## When to use
+- Reviewing a PR in a subsystem you've never touched
+- About to refactor without a clear map of impact
+- A bug report points into a module whose role you can't articulate
+## Output shape
+A short tree (≤ 1 page) covering:
+1. **The module itself** — purpose in one sentence
+2. **Direct callers** — who depends on this (use `code_intelligence` to find them)
+3. **Direct dependencies** — what this depends on
+4. **Sibling modules** — others playing the same role at the same layer
+5. **Owning epic / requirement node** — pull from the graph if a node references this code path
+Keep it factual. The point is a map, not commentary.
+## Anti-patterns
+- Diving into implementation details — that's "zoom IN"; this skill is the opposite
+- Skipping the callers — without them you can't tell which behaviors are load-bearing
+- Inventing structure to fit a pattern you saw elsewhere — describe what's actually there

package/src/skills/validate/dod-checklist.md ADDED

@@ -0,0 +1,30 @@
+---
+name: dod-checklist
+description: Definition of Done — 9 checks before update_status(done)
+category: validate
+phases: [VALIDATE, IMPLEMENT]
+---
+# dod-checklist
+## When to use
+Before calling `finish_task` or `update_status(done)`. The pipeline runs these automatically; this skill is the explicit human-readable form for review.
+## Steps
+1. has_acceptance_criteria — task or parent has AC. **Required.**
+2. ac_quality_pass — score ≥ 60 (INVEST). **Required.**
+3. no_unresolved_blockers — no `depends_on` to non-done. **Required.**
+4. status_flow_valid — passed through `in_progress` before `done`. **Required.**
+5. has_description — non-empty.
+6. not_oversized — L/XL must have subtasks.
+7. has_testable_ac — at least 1 AC has GIVEN/WHEN/THEN.
+8. has_estimate — xpSize OR estimateMinutes set.
+9. has_test_files — testFiles populated.
+## Anti-patterns
+- Marking done when test_gate is failing — violates DoD #4 spirit.
+- Hand-editing AC after starting work to fit "what was done".
+- Skipping ac_quality_pass with a "trivially fixed in next task" excuse.
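
The nine checks above can be sketched as a small evaluator (an illustration only, not the pipeline's implementation). The check names come from the list; treating the five checks without a **Required.** marker as advisory is an assumption, as are the function and type names.

```typescript
// Checks marked "Required." in the skill.
const REQUIRED_CHECKS = [
  "has_acceptance_criteria",
  "ac_quality_pass",
  "no_unresolved_blockers",
  "status_flow_valid",
];

// Remaining checks, treated here as advisory (an assumption).
const ADVISORY_CHECKS = [
  "has_description",
  "not_oversized",
  "has_testable_ac",
  "has_estimate",
  "has_test_files",
];

interface DodResult {
  canMarkDone: boolean;
  failedRequired: string[];
  failedAdvisory: string[];
}

// Hypothetical evaluator: `passed` holds the names of checks that passed.
function evaluateDod(passed: ReadonlySet<string>): DodResult {
  const failedRequired = REQUIRED_CHECKS.filter((name) => !passed.has(name));
  const failedAdvisory = ADVISORY_CHECKS.filter((name) => !passed.has(name));
  return { canMarkDone: failedRequired.length === 0, failedRequired, failedAdvisory };
}
```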

package/src/skills/validate/harness-regression-check.md ADDED

@@ -0,0 +1,26 @@
+---
+name: harness-regression-check
+description: Compare harness scores before/after to gate merge
+category: validate
+phases: [VALIDATE, REVIEW]
+---
+# harness-regression-check
+## When to use
+In VALIDATE before promoting to REVIEW; in REVIEW before merging. Harness < 70 = elevated hallucination risk.
+## Steps
+1. Read baseline from last green commit: `analyze(mode='harness_trend')`.
+2. Run `analyze(mode='harness_scan')` on current state.
+3. If delta ≤ -5 pts: investigate which dimension regressed (type, test, naming, error handling, etc.).
+4. If delta ≤ -10 pts: block merge. Open an investigation task before continuing.
+5. Persist the new score so the next session sees it as baseline.
+## Anti-patterns
+- Treating harness as vanity metric — it predicts review effort.
+- Boosting one dimension (e.g., adding empty JSDoc) to mask another regression.
+- Ignoring the warning at start_task because "I'll clean up at the end".
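
The thresholds in steps 3 and 4 above can be sketched as a pure gate (an illustration only, not the package's implementation). The function and type names are hypothetical; scores are assumed to be on the same scale for baseline and current runs.

```typescript
type HarnessVerdict = "ok" | "investigate" | "block";

// Hypothetical gate: a drop of 10 or more points blocks the merge, a
// drop of 5 or more calls for investigation, anything else passes.
function gateOnHarnessDelta(baseline: number, current: number): HarnessVerdict {
  const delta = current - baseline;
  if (delta <= -10) return "block";
  if (delta <= -5) return "investigate";
  return "ok";
}
```

The stricter threshold is checked first so that a 10-point drop is reported as a block rather than only an investigation.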