npm - @curdx/flow - Versions diffs - 2.0.0-beta.1 → 2.0.0-beta.10 - Mend

@curdx/flow 2.0.0-beta.1 → 2.0.0-beta.10

Files changed (57)

package/.claude-plugin/marketplace.json +1 -1
package/.claude-plugin/plugin.json +3 -10
package/CHANGELOG.md +20 -0
package/README.zh.md +2 -2
package/agent-preamble/preamble.md +81 -11
package/agents/flow-adversary.md +40 -55
package/agents/flow-architect.md +23 -10
package/agents/flow-debugger.md +2 -2
package/agents/flow-edge-hunter.md +20 -6
package/agents/flow-executor.md +3 -3
package/agents/flow-planner.md +51 -48
package/agents/flow-product-designer.md +14 -1
package/agents/flow-qa-engineer.md +1 -1
package/agents/flow-researcher.md +17 -2
package/agents/flow-reviewer.md +5 -1
package/agents/flow-security-auditor.md +1 -1
package/agents/flow-triage-analyst.md +1 -1
package/agents/flow-ui-researcher.md +2 -2
package/agents/flow-ux-designer.md +1 -1
package/agents/flow-verifier.md +47 -14
package/bin/curdx-flow.js +13 -1
package/cli/doctor.js +28 -13
package/cli/install.js +62 -36
package/cli/protocols.js +63 -10
package/cli/registry.js +73 -0
package/cli/uninstall.js +9 -11
package/cli/upgrade.js +6 -10
package/cli/utils.js +104 -56
package/commands/fast.md +1 -1
package/commands/implement.md +4 -4
package/commands/init.md +14 -3
package/commands/review.md +14 -5
package/commands/spec.md +26 -2
package/commands/start.md +47 -17
package/commands/verify.md +13 -0
package/gates/adversarial-review-gate.md +19 -19
package/gates/devex-gate.md +4 -5
package/gates/edge-case-gate.md +1 -1
package/hooks/hooks.json +0 -11
package/hooks/scripts/quick-mode-guard.sh +12 -9
package/hooks/scripts/session-start.sh +1 -1
package/hooks/scripts/stop-watcher.sh +25 -15
package/knowledge/execution-strategies.md +6 -5
package/knowledge/spec-driven-development.md +8 -7
package/knowledge/two-stage-review.md +4 -3
package/package.json +4 -2
package/skills/brownfield-index/SKILL.md +62 -0
package/skills/browser-qa/SKILL.md +50 -0
package/skills/epic/SKILL.md +68 -0
package/skills/security-audit/SKILL.md +50 -0
package/skills/ui-sketch/SKILL.md +49 -0
package/templates/config.json.tmpl +1 -1
package/templates/design.md.tmpl +32 -112
package/templates/requirements.md.tmpl +25 -43
package/templates/research.md.tmpl +37 -68
package/templates/tasks.md.tmpl +27 -84
package/hooks/scripts/fail-tracker.sh +0 -31

package/commands/start.md CHANGED

@@ -32,23 +32,45 @@ Entry point for every feature. Works in four modes depending on flags and existi
 ## Flag parsing
-```bash
-FLAG_RESUME=$(echo "$ARGUMENTS" | grep -q -- '--resume' && echo 1 || echo 0)
-FLAG_LIST=$(echo "$ARGUMENTS" | grep -q -- '--list' && echo 1 || echo 0)
-FLAG_MODE=$(echo "$ARGUMENTS" | grep -oP -- '--mode=\K[^\s]+' || echo "standard")
-# Strip flags from ARGUMENTS to leave the positional args
-POS=$(echo "$ARGUMENTS" | sed -E 's/--[a-z-]+(=[^ ]+)?//g' | xargs)
-SPEC_NAME=$(echo "$POS" | awk '{print $1}')
-GOAL=$(echo "$POS" | awk '{$1=""; print $0}' | sed 's/^"//; s/"$//' | xargs)
-```
-Mode must be `fast`, `standard`, or `enterprise`. Invalid → default to `standard` with a warning.
+**Do not shell-split `$ARGUMENTS`.** It is a user-supplied string that may
+contain quoted substrings with spaces, `$`-signs, or embedded quotes.
+`xargs`, naive `awk`, and `sed`-based quote stripping all mis-parse at
+least one of those cases (e.g. `my-feature "Fix user's login bug"` breaks
+`xargs: unmatched quote`). Parse the string as a model task instead:
+1. **Flags** (order-independent, each is self-delimited):
+   - `--resume` / `--list` — boolean presence
+   - `--mode=<fast|standard|enterprise>` — value after `=`
+   Detect each with a single regex over the full `$ARGUMENTS` string and
+   remove the matched span from your working copy. Flags not in the list
+   above are errors — surface them to the user.
+2. **Positional args** (after flags removed):
+   - First whitespace-separated token → `SPEC_NAME` (kebab-case `[a-z0-9-]+`).
+   - Remainder of the string, trimmed and with one layer of outer `"..."`
+     or `'...'` quotes stripped → `GOAL`. Preserve inner quotes as-is.
+3. If `SPEC_NAME` does not match `^[a-z0-9][a-z0-9-]*$` (per
+   `schemas/spec-state.schema.json`), stop and ask the user to pick a
+   valid kebab-case name.
+Mode must be `fast`, `standard`, or `enterprise`. Invalid → default to
+`standard` with a warning.
+Example inputs and their parse:
+| `$ARGUMENTS`                                    | SPEC_NAME    | GOAL                          | flags         |
+|-------------------------------------------------|--------------|-------------------------------|---------------|
+| `my-feature "Add JWT auth"`                     | `my-feature` | `Add JWT auth`                | —             |
+| `my-feature --mode=fast "Add JWT auth"`         | `my-feature` | `Add JWT auth`                | mode=fast     |
+| `my-feature "Fix user's login bug"`             | `my-feature` | `Fix user's login bug`        | —             |
+| `--list`                                        | —            | —                             | list=true     |
+| `--resume`                                      | —            | —                             | resume=true   |
 ## Branch logic
 ### Branch A: `--list`
-Enumerate every directory under `.flow/specs/`, read each `.state.json` for `phase` and `updated_at`, print a numbered list, then `AskUserQuestion` to pick one. Picking sets `.flow/.active-spec` and exits.
+Enumerate every directory under `.flow/specs/`, read each `.state.json` for `phase` and `updated` (per `schemas/spec-state.schema.json`), print a numbered list, then `AskUserQuestion` to pick one. Picking sets `.flow/.active-spec` and exits.
 ### Branch B: `--resume` (no name)
 Read `.flow/.active-spec`. If it points to a valid spec dir, report its current phase and next suggested command (`/curdx-flow:spec` if incomplete, `/curdx-flow:implement` if tasks ready). If `.active-spec` is empty or stale, fall back to Branch A.
@@ -61,17 +83,25 @@ Create a new spec:
 ```bash
 mkdir -p ".flow/specs/$SPEC_NAME"
+# NOTE: field names MUST match schemas/spec-state.schema.json:
+#   - spec_name (not "spec")
+#   - created (date, not "created_at")
+#   - updated (date-time, not "updated_at")
+#   - phase must be one of the enum values; the initial phase is "research"
+#     (there is no "created" phase — that was schema drift pre-beta.9)
+#   - version is required
 cat > ".flow/specs/$SPEC_NAME/.state.json" <<JSON
 {
-  "spec": "$SPEC_NAME",
+  "version": "1.0",
+  "spec_name": "$SPEC_NAME",
   "goal": "$GOAL",
   "mode": "$FLAG_MODE",
-  "phase": "created",
+  "phase": "research",
   "phase_status": {},
   "strategy": "auto",
   "execute_state": {},
-  "created_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
-  "updated_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+  "created": "$(date -u +%Y-%m-%d)",
+  "updated": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
 }
 JSON
 echo "$SPEC_NAME" > .flow/.active-spec

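The parsing rules added to `start.md` are written as instructions for the model, not as code. As a rough illustration only, here is a minimal Python sketch of the same rules, assuming flags are whitespace-delimited tokens. The names `parse_arguments` and `ParsedArgs` are hypothetical and do not appear in the package. One known limitation of this sketch: a flag-like token inside a quoted goal (for example `"use --list here"`) would still be consumed as a flag.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

VALID_MODES = {"fast", "standard", "enterprise"}
# Same pattern the diff cites for SPEC_NAME, used with fullmatch below.
SPEC_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]*")
# A flag is a whitespace-delimited token: --name or --name=value.
FLAG_RE = re.compile(r"(?<!\S)--([a-z-]+)(?:=(\S+))?(?!\S)")


@dataclass
class ParsedArgs:  # hypothetical container, not part of the package
    spec_name: Optional[str] = None
    goal: Optional[str] = None
    mode: str = "standard"
    resume: bool = False
    list_specs: bool = False
    warnings: List[str] = field(default_factory=list)


def parse_arguments(arguments: str) -> ParsedArgs:
    parsed = ParsedArgs()

    def consume_flag(match):
        name, value = match.group(1), match.group(2)
        if name == "resume" and value is None:
            parsed.resume = True
        elif name == "list" and value is None:
            parsed.list_specs = True
        elif name == "mode" and value is not None:
            if value in VALID_MODES:
                parsed.mode = value
            else:
                # Invalid mode: keep the "standard" default and warn.
                parsed.warnings.append(f"invalid mode {value!r}; using 'standard'")
        else:
            # Flags outside the documented list are surfaced as errors.
            raise ValueError(f"unknown flag: {match.group(0)}")
        return " "  # remove the matched span from the working copy

    rest = FLAG_RE.sub(consume_flag, arguments).strip()
    if not rest:
        return parsed  # flags only, e.g. "--list" or "--resume"

    parts = rest.split(None, 1)
    if not SPEC_NAME_RE.fullmatch(parts[0]):
        raise ValueError(f"spec name {parts[0]!r} is not valid kebab-case")
    parsed.spec_name = parts[0]

    goal = parts[1].strip() if len(parts) > 1 else ""
    # Strip exactly one layer of matching outer quotes; inner quotes stay.
    if len(goal) >= 2 and goal[0] == goal[-1] and goal[0] in "\"'":
        goal = goal[1:-1]
    parsed.goal = goal or None
    return parsed
```

Calling `parse_arguments` on the rows of the example table in the hunk should reproduce the listed `SPEC_NAME`, `GOAL` and flag values, including the apostrophe case that broke `xargs`.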
package/commands/verify.md CHANGED

@@ -67,6 +67,19 @@ If `--strict`:
 ### Step 4: Produce `verification-report.md`
+**Landing check**: sub-agent responses can be truncated by the model's output-length limit. After dispatching `flow-verifier`, verify the report actually landed:
+```bash
+REPORT=".flow/specs/$SPEC_NAME/verification-report.md"
+if [ ! -f "$REPORT" ] || [ "$(wc -c < "$REPORT" 2>/dev/null | tr -d ' ')" -lt 300 ]; then
+  echo "⚠ Report missing or truncated. Re-dispatching flow-verifier with a terse 'write the report now' prompt."
+  # Re-dispatch pattern:
+  #   "Your only job right now is to Write the verification-report.md using the
+  #    findings you already gathered. Do not re-scan. Do not narrate. Write
+  #    the file and stop."
+fi
+```
 Write to `.flow/specs/$SPEC_NAME/verification-report.md`:
 ```markdown

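For readers who prefer to see the landing check outside of shell, the condition in the added bash snippet can be restated as a small predicate. This is only a sketch: `report_landed` is a hypothetical name, and the 300-byte threshold is taken from the snippet above, not from any other documented contract.

```python
from pathlib import Path

MIN_REPORT_BYTES = 300  # threshold used by the bash landing check


def report_landed(report_path, min_bytes: int = MIN_REPORT_BYTES) -> bool:
    """Return True when the report exists and has at least min_bytes bytes."""
    path = Path(report_path)
    return path.is_file() and path.stat().st_size >= min_bytes
```

A caller would re-dispatch `flow-verifier` with the terse prompt whenever this returns `False`.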
package/gates/adversarial-review-gate.md CHANGED

@@ -33,19 +33,19 @@ A reviewer agent's output of "everything looks fine, no issues found" is an **in
 - "Looks good" is usually confirmation bias (the agent only checked the obvious)
 - AI tends to please the user ("great job!") — fight this tendency
-**Forced actions**:
-1. If the agent outputs "no issues", automatically trigger a second round
-2. The second round requires the agent to perform deeper analysis via sequential-thinking
-3. If both rounds yield no findings, the agent must **prove** it checked:
-   - List the dimensions examined (at least 5)
-   - For each dimension, give the specific code/file locations inspected
-   - Provide counterfactual hypotheses of "what it would look like if there were a problem"
+**Forced actions when the agent reports "no issues"**:
+1. Automatically trigger a second round framed as "what would a senior skeptic reject in this PR?"
+2. If both rounds still honestly yield no findings, the agent must emit a **proof-of-checking report**:
+   - Every category it examined (with "N/A" for categories that don't apply)
+   - For each examined category, the specific code/file locations inspected
+   - Counterfactual hypotheses of "what this would look like if there were a problem" and why that signature is absent
+3. Fabricating findings to avoid the proof-of-checking step is a violation of L3 red line #2 (fact-driven). Better to emit "clean verdict with proof" than invent issues.
 ---
-### Rule 2: Findings in at Least 3 Categories
+### Rule 2: Coverage proportional to feature scope
-A complete adversarial review must cover (find issues in at least 3 of these categories):
+A complete adversarial review covers every category that applies to the feature, marks the rest as N/A with reason. Number of findings per category is proportional to real issues, not a quota:
 1. **Architecture layer**: Are decisions sound? Future-extensible? Lock-in risks?
 2. **Implementation layer**: Code quality? Error handling? Performance?
@@ -86,22 +86,22 @@ Not allowed:
 Input: object under review (code range / spec / PR diff)
   ↓
 Round 1 (agent self-analysis):
-  - Use sequential-thinking ≥ 6 rounds
-  - Scan all 6 categories
+  - Use sequential-thinking proportional to the surface being probed
+  - Scan each applicable category; mark N/A ones with reason
   - Output findings list
   ↓
 Decision:
-  - Findings ≥ 3? → output report
-  - Findings < 3? → force Round 2
+  - Any real findings? → output report with findings
+  - Zero findings after honest Round 1? → force Round 2 framed as skeptic
   ↓
 Round 2 (deep analysis):
-  - sequential-thinking for another 6 rounds
+  - sequential-thinking proportional to residual uncertainty
   - Focus on "seemingly no issues" parts (trust but verify)
-  - May introduce external perspectives (read issues from similar projects)
+  - Optionally introduce external perspectives (read issues from similar projects)
   ↓
 Decision:
-  - Still < 3? → agent must explicitly prove it checked
-  - Otherwise → output report
+  - Still zero findings? → agent must emit proof-of-checking report (NOT invent findings)
+  - Findings exist? → output report
   ↓
 Output: review-report.md
 ```
@@ -190,10 +190,10 @@ Fix loop:
 ## Failure Recovery
-If after 2 rounds there are still < 3 findings:
+If after Round 2 the honest verdict is still zero findings, emit a proof-of-checking report (do NOT fabricate to hit a quota — there is no quota):
 ```markdown
-## Adversarial Review — Insufficient Findings
+## Adversarial Review — Proof of Checking (zero findings)
 I have examined the following dimensions across 2 rounds of analysis:

package/gates/devex-gate.md CHANGED

@@ -195,12 +195,12 @@ Reading these test names = reading API behavior documentation.
 ### Agent Automatic
-When `flow-ux-designer` / `flow-reviewer` applies this gate, use sequential-thinking ≥ 4 rounds to scan the 8 dimensions.
+When `flow-ux-designer` / `flow-reviewer` applies this gate, use sequential-thinking proportional to the complexity of the codebase being scanned.
 ### Human Review
 Attach a DevEx checklist at PR time:
-- [ ] Clear naming (reviewed at least 3 times)
+- [ ] Clear naming (re-read until obvious to a new maintainer)
 - [ ] Critical comments exist
 - [ ] Consistent structure
 - [ ] Actionable error messages
@@ -210,7 +210,7 @@ Attach a DevEx checklist at PR time:
 ## Scoring
-Each dimension 0-10 points:
+Score each **applicable** dimension 0-10 (N/A dimensions are excluded from the total):
 ```
 10 = best practice
@@ -220,8 +220,7 @@ Each dimension 0-10 points:
 0  = serious issue
 ```
-Total 40+ / 80 = pass (warning, non-blocking).
-Total < 40 = blocked, improvement required.
+Emit the per-dimension scores with evidence. The gate itself does not block on a numeric threshold; it surfaces the weaknesses for the user (or the reviewing agent) to decide whether any of them rise to a blocker. A single 0/10 on a material dimension is a blocker regardless of the total.
 ---

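The revised scoring rule (N/A dimensions excluded, no numeric pass threshold, a single 0 on a material dimension is a blocker) can be sketched as follows. This is an illustration under assumptions: the function name `summarize_devex`, the dimension names, and the idea of passing the set of material dimensions explicitly are all hypothetical, since the gate text does not say how "material" is decided.

```python
from typing import Dict, Optional, Set


def summarize_devex(scores: Dict[str, Optional[int]], material: Set[str]) -> dict:
    """Summarize per-dimension scores; None marks an N/A dimension."""
    applicable = {dim: s for dim, s in scores.items() if s is not None}
    for dim, s in applicable.items():
        if not 0 <= s <= 10:
            raise ValueError(f"score for {dim!r} must be in 0..10, got {s}")
    return {
        # N/A dimensions contribute to neither the total nor the maximum.
        "total": sum(applicable.values()),
        "max_total": 10 * len(applicable),
        "not_applicable": sorted(d for d, s in scores.items() if s is None),
        # A 0 on a material dimension blocks regardless of the total.
        "blockers": sorted(d for d, s in applicable.items() if s == 0 and d in material),
    }
```

Under this sketch the total is reported for context only; whether a non-zero weak score rises to a blocker is left to the user or the reviewing agent, as the new text says.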
package/gates/edge-case-gate.md CHANGED

@@ -104,7 +104,7 @@ Q4. If no test, what test should be added to cover it?
 Input: object under review (function / component / API) + requirements + tests
   ↓
 For each category (1-7):
-  1. Use sequential-thinking to list at least 3 possible edge scenarios
+  1. Use sequential-thinking to list every plausible edge scenario for this category — stop when you've covered the real risk surface, don't pad to a quota, don't fabricate scenarios that won't occur in production
   2. Check whether each scenario has corresponding coverage in tests
   3. Add uncovered ones to the "gap list"
   ↓

package/hooks/hooks.json CHANGED

@@ -20,17 +20,6 @@
         ]
       }
     ],
-    "PostToolUseFailure": [
-      {
-        "matcher": "Bash|Edit|Write",
-        "hooks": [
-          {
-            "type": "command",
-            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/scripts/fail-tracker.sh"
-          }
-        ]
-      }
-    ],
     "Stop": [
       {
         "hooks": [

package/hooks/scripts/quick-mode-guard.sh CHANGED

@@ -40,17 +40,20 @@ ACTIVE=$(cat .flow/.active-spec 2>/dev/null)
 STATE_FILE=".flow/specs/$ACTIVE/.state.json"
 [ ! -f "$STATE_FILE" ] && exit 0
-# Read quickMode + mode
-QUICK_MODE=$(python3 -c "
-import json
+# Read quickMode + mode. Pass STATE_FILE via env (NOT shell interpolation
+# into the python source) so an active-spec name containing quotes/$ cannot
+# inject python code.
+export STATE_FILE
+QUICK_MODE=$(python3 -c '
+import json, os
 try:
-    s = json.load(open('$STATE_FILE'))
-    qm = s.get('quickMode', False)
-    mode = s.get('mode', '')
-    print('true' if (qm or mode == 'autonomous') else 'false')
+    s = json.load(open(os.environ["STATE_FILE"]))
+    qm = s.get("quickMode", False)
+    mode = s.get("mode", "")
+    print("true" if (qm or mode == "autonomous") else "false")
 except Exception:
-    print('false')
-" 2>/dev/null)
+    print("false")
+' 2>/dev/null)
 if [ "$QUICK_MODE" = "true" ]; then
   # Block and inject guidance

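The fix in this hunk passes the state-file path through the environment so it is never spliced into Python source text. The same pattern can be shown from Python itself. The sketch below is an assumption-laden illustration, not the hook's code: `read_quick_mode` and `CHILD_SOURCE` are hypothetical names, and the child program mirrors the logic of the hook's inline script.

```python
import os
import subprocess
import sys

# Fixed child program: the path is read from the environment at run time,
# so quotes or $-signs in the path cannot alter this source text.
CHILD_SOURCE = (
    "import json, os\n"
    "s = json.load(open(os.environ['STATE_FILE']))\n"
    "qm = s.get('quickMode', False)\n"
    "mode = s.get('mode', '')\n"
    "print('true' if (qm or mode == 'autonomous') else 'false')\n"
)


def read_quick_mode(state_file: str) -> bool:
    """Run the child with STATE_FILE in its environment; False on any failure."""
    env = dict(os.environ, STATE_FILE=state_file)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"
```

Because the program text is constant, a spec directory named with a single quote or a `$` is just data to the child process.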
package/hooks/scripts/session-start.sh CHANGED

@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 # CurDX-Flow SessionStart Hook
 # Duties:
-#   1. Daily dependency check — nudge user to /flow-install-deps if recommended plugins missing
+#   1. Daily dependency check — nudge user to `npx @curdx/flow install --all` if recommended plugins missing
 #   2. Load active spec progress into session context
 #
 # Design notes:

package/hooks/scripts/stop-watcher.sh CHANGED

@@ -56,6 +56,12 @@ if ! command -v python3 >/dev/null 2>&1; then
   allow_stop
 fi
+# Export STATE_FILE BEFORE invoking python3 — the heredoc-based parser reads
+# os.environ["STATE_FILE"]. Previously the export was placed after the
+# heredoc, so python3 always got None, json.load(None) silently failed, and
+# the stop-hook strategy never activated.
+export STATE_FILE
 read STRATEGY PHASE TASK_INDEX TOTAL_TASKS FAILED ROUNDS <<EOF
 $(python3 <<'PY'
 import json, os, sys
@@ -75,7 +81,6 @@ print(strategy, phase, ti, tt, failed, rounds)
 PY
 )
 EOF
-export STATE_FILE
 # Only activate for stop-hook strategy + execute phase
 [ "$STRATEGY" != "stop-hook" ] && allow_stop
@@ -95,12 +100,17 @@ if [ -n "$TRANSCRIPT_PATH" ] && [ -f "$TRANSCRIPT_PATH" ]; then
   TRANSCRIPT_TAIL=$(tail -c 51200 "$TRANSCRIPT_PATH" 2>/dev/null || echo "")
 fi
+# Python state-file updates: use quoted heredocs (<<'PY') + os.environ so
+# the spec-name-derived STATE_FILE path is NEVER interpolated into the
+# python source text. Previously a spec name containing single quotes or
+# $-signs could break the script or inject arbitrary code.
 # Check for explicit completion signals
 if echo "$TRANSCRIPT_TAIL" | grep -q "ALL_TASKS_COMPLETE"; then
   # Cleanup: mark phase completed
-  python3 <<PY 2>/dev/null
-import json
-p = "$STATE_FILE"
+  python3 <<'PY' 2>/dev/null
+import json, os
+p = os.environ["STATE_FILE"]
 s = json.load(open(p))
 s.setdefault("phase_status", {})["execute"] = "completed"
 s["phase"] = "verify"  # move to verify phase
@@ -112,16 +122,16 @@ fi
 # Check for fail signal (accumulate; actual stop decision below)
 if echo "$TRANSCRIPT_TAIL" | grep -q "TASK_FAILED"; then
   # Increment failed_attempts
-  python3 <<PY 2>/dev/null
-import json
-p = "$STATE_FILE"
+  python3 <<'PY' 2>/dev/null
+import json, os
+p = os.environ["STATE_FILE"]
 s = json.load(open(p))
 s.setdefault("execute_state", {})
 s["execute_state"]["failed_attempts"] = s["execute_state"].get("failed_attempts", 0) + 1
 json.dump(s, open(p, "w"), indent=2, ensure_ascii=False)
 PY
-  # Re-read
-  FAILED=$(python3 -c "import json; print(json.load(open('$STATE_FILE'))['execute_state']['failed_attempts'])" 2>/dev/null || echo 0)
+  # Re-read — again via os.environ, no shell interpolation into python.
+  FAILED=$(python3 -c 'import json, os; print(json.load(open(os.environ["STATE_FILE"]))["execute_state"]["failed_attempts"])' 2>/dev/null || echo 0)
 fi
 # ---------- 6. Safety brakes ----------
@@ -138,9 +148,9 @@ fi
 # Check if all tasks done
 if [ "$TASK_INDEX" -ge "$TOTAL_TASKS" ] && [ "$TOTAL_TASKS" -gt 0 ]; then
   # Mark complete
-  python3 <<PY 2>/dev/null
-import json
-p = "$STATE_FILE"
+  python3 <<'PY' 2>/dev/null
+import json, os
+p = os.environ["STATE_FILE"]
 s = json.load(open(p))
 s.setdefault("phase_status", {})["execute"] = "completed"
 s["phase"] = "verify"
@@ -151,9 +161,9 @@ fi
 # ---------- 7. Block and continue ----------
 # Increment round counter
-python3 <<PY 2>/dev/null
-import json
-p = "$STATE_FILE"
+python3 <<'PY' 2>/dev/null
+import json, os
+p = os.environ["STATE_FILE"]
 s = json.load(open(p))
 s.setdefault("execute_state", {})
 s["execute_state"]["global_iteration"] = s["execute_state"].get("global_iteration", 0) + 1

package/knowledge/execution-strategies.md CHANGED

@@ -223,13 +223,14 @@ return "linear"
 ## Failure Handling (common to all strategies)
-`flow-executor` agent's 5-round retry mechanism:
+`flow-executor` agent's retry ladder — each step escalates only when the prior is honestly exhausted, not on a fixed count:
 ```
-Rounds 1-2: agent retries autonomously (edit code, rerun Verify)
-Round 3: sequential-thinking root-cause analysis ≥ 5 rounds
-Round 4: read related source + trace data flow
-Round 5: report TASK_FAILED
+Step A: autonomous retry (edit + rerun Verify) — only for shallow failures
+Step B: sequential-thinking root-cause analysis proportional to the hypothesis space
+Step C: read related source + trace data flow
+Step D: if ≥3 retries fail with no new hypothesis, stop and challenge the architecture (see preamble L3)
+Step E: report TASK_FAILED
 ```
 ### Extra protections for Stop-Hook strategy

package/knowledge/spec-driven-development.md CHANGED

@@ -57,7 +57,7 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
 **Key behaviors** (flow-researcher agent):
 1. Read `.flow/PROJECT.md` and `.flow/CONTEXT.md` to understand project background
 2. Call `mcp__claude_mem__search` to retrieve relevant historical experience
-3. Use sequential-thinking for 5-8 rounds of problem understanding
+3. Use sequential-thinking proportional to the unknowns (1 thought for a trivial prototype, many for a novel domain)
 4. Scan the codebase for reusable modules
 5. Use `mcp__context7__*` to look up latest docs for relevant libraries
 6. When necessary, WebSearch for the latest technical trends
@@ -99,11 +99,12 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
 **Key behaviors** (flow-architect agent):
 1. Read `research.md` + `requirements.md`
-2. **Must use sequential-thinking for at least 8 rounds**:
-   - Rounds 1-2: constraints
-   - Rounds 3-5: comparison of options A/B
-   - Rounds 6-7: selection + trade-offs
-   - Round 8: rebut yourself
+2. **Use sequential-thinking proportional to the tradeoff surface** — the phases below are orientation, not a quota:
+   - Constraints (from NFR / tech stack)
+   - Option comparison (only when alternatives genuinely compete)
+   - Selection + accepted tradeoff
+   - Self-rebuttal
+   A well-known stack pick may finish in 1 thought; a distributed-system design may run many. Do not pad.
 3. Assign an `AD-NN` ID to each architectural decision
 4. Draw a data flow diagram (mermaid)
 5. Define component interfaces + error paths
@@ -125,7 +126,7 @@ What's wasted isn't code — it's context tokens and decision fatigue from churn
 3. Each task has 5 fields: `Do` / `Files` / `Done-when` / `Verify` / `Commit`
 4. **Multi-source coverage audit**: for each FR / AC / AD / decision, confirm there is a covering task (no omissions)
 5. Mark `[P]` (parallel-safe) and `[VERIFY]` (checkpoint)
-6. Simple decomposition doesn't need sequential-thinking, but reflect on coverage every 5 tasks
+6. Simple decomposition doesn't need sequential-thinking; run a coverage audit at the end (every FR/AC/AD has a task)
 **Deliverable**: `tasks.md`

package/knowledge/two-stage-review.md CHANGED

@@ -113,17 +113,18 @@ Stage 2 applies all enabled Gates (from `.flow/config.json`):
 #### 2.5 (enterprise) Adversarial review (adversarial-review-gate)
-- ≥ 3 categories of issues found?
+- Every applicable category examined (N/A documented for the rest)?
+- Findings proportional to real issues (zero is OK with a proof-of-checking report)?
 - Each finding has evidence + recommendation?
 #### 2.6 (enterprise) Edge cases (edge-case-gate)
-- Did all 7 major categories pass?
+- Each applicable edge-case category addressed (N/A noted for the rest)?
 - Gap list has priorities?
 ### Stage 2 verdict
-- **EXCELLENT**: all enabled Gates pass, adversarial findings < 3 (high-quality code)
+- **EXCELLENT**: all enabled Gates pass, adversarial review clean or only low-severity findings
 - **GOOD**: all enabled Gates pass, but some warnings
 - **NEEDS_IMPROVEMENT**: Gate violations (blocking)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@curdx/flow",
-  "version": "2.0.0-beta.1",
+  "version": "2.0.0-beta.10",
   "description": "CLI installer for CurDX-Flow — AI engineering workflow meta-framework for Claude Code",
   "type": "module",
   "bin": {
@@ -8,7 +8,8 @@
     "curdx-flow": "bin/curdx-flow.js"
   },
   "scripts": {
-    "prepublishOnly": "node bin/curdx-flow.js --version"
+    "test": "node --test test/*.test.js",
+    "prepublishOnly": "node --test test/*.test.js && node bin/curdx-flow.js --version"
   },
   "files": [
     "bin/",
@@ -22,6 +23,7 @@
     "agent-preamble/",
     "templates/",
     "schemas/",
+    "skills/",
     "README.md",
     "CHANGELOG.md",
     "LICENSE"

package/skills/brownfield-index/SKILL.md ADDED

@@ -0,0 +1,62 @@
+---
+name: brownfield-index
+description: Invoke when the user is new to an unfamiliar / legacy / brownfield codebase and wants a structural understanding — module map, component inventory, API surface, data flow. Triggers on "legacy code", "brownfield", "unfamiliar", "new to this code", "new to this project", "just joined", "inherited codebase", "explore codebase", "understand structure", "index code", "map modules", "tour", "onboard", "what is this project".
+allowed-tools: [Read, Grep, Glob, Bash]
+---
+# Brownfield Index
+You are invoked when the user needs a structural map of an existing codebase they are not yet familiar with.
+## Preconditions
+1. The repository root is the current working directory (or a path the user specifies).
+2. The project is not a new `/curdx-flow:init`-ed greenfield project (if it is, direct the user to `/curdx-flow:start` instead).
+## Workflow
+### Step 1: Detect project type
+Read `package.json` / `Cargo.toml` / `pyproject.toml` / `go.mod` / `pom.xml` to classify the ecosystem and build tool. This determines which directory conventions to apply.
+### Step 2: Scan directory structure
+Produce a top-level inventory:
+- **Entry points** (main / index / bin scripts)
+- **Module directories** (src/, lib/, internal/, pkg/ …)
+- **Test directories**
+- **Config files**
+- **Tooling** (CI, lint, format configs)
+### Step 3: Component inventory
+For each module directory, list:
+- Files and their apparent role (inferred from names + top-of-file comments)
+- Public exports / exported symbols
+- Third-party dependencies imported
+### Step 4: API surface
+If HTTP / RPC endpoints exist, index them: route → handler → middleware. For CLI tools, index commands → handlers.
+### Step 5: Write index document
+Output `.flow/codebase-index.md` containing:
+- **Overview** (project purpose, build tool, runtime)
+- **Directory tree** (with per-directory one-liner descriptions)
+- **Entry points** (where execution starts)
+- **Key abstractions** (core types, interfaces, classes that everything else hangs off)
+- **External dependencies** (grouped: prod runtime / dev tooling / transitive)
+- **Known gaps / red flags** (missing tests, TODOs, suspicious patterns)
+### Step 6: Hand off
+Point the user at the next useful action:
+- "Looking to add a feature here? Run `/curdx-flow:start <name>` to begin a spec."
+- "Debugging something specific? Run `/curdx-flow:debug '<symptom>'`."
+## Notes
+This skill uses Read + Grep + Glob + Bash with no specialized agent — general tools are enough for structural discovery. The index is meant to be quick (5–10 minutes), not exhaustive.
+For deep research into a specific library or framework, use `context7` MCP directly.

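Step 1 of the brownfield-index skill classifies the project by looking for well-known manifest files. A minimal sketch of that lookup might look like the following. The function name `detect_ecosystems` and the label strings are hypothetical, and the mapping covers only the five manifests the skill names.

```python
from pathlib import Path
from typing import List

# Manifest file -> ecosystem label (labels are illustrative).
MANIFESTS = {
    "package.json": "Node.js (npm-compatible)",
    "Cargo.toml": "Rust (Cargo)",
    "pyproject.toml": "Python",
    "go.mod": "Go (modules)",
    "pom.xml": "JVM (Maven)",
}


def detect_ecosystems(repo_root) -> List[str]:
    """Return a label for every known manifest found directly in repo_root."""
    root = Path(repo_root)
    return [label for name, label in MANIFESTS.items() if (root / name).is_file()]
```

A repository can match more than one manifest (for example a Python service with a JavaScript frontend), which is why the sketch returns a list instead of a single value.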
package/skills/browser-qa/SKILL.md ADDED

@@ -0,0 +1,50 @@
+---
+name: browser-qa
+description: Invoke when the user wants to test a UI/frontend in a real browser — accessibility, performance, console errors, network traffic, visual regression. Triggers on "browser test", "test in browser", "UI test", "e2e test", "frontend test", "accessibility", "a11y", "WCAG", "lighthouse", "performance audit", "console error", "network request", "cross-browser", "responsive", "mobile test", "visual regression", "screenshot".
+allowed-tools: [Read, Write, Bash, Grep, Glob, WebFetch]
+---
+# Browser QA
+You are invoked when the user wants real-browser QA of a UI flow.
+## Preconditions
+1. `chrome-devtools` MCP is available (`mcp__chrome-devtools__*`). If missing, fall back to a manual checklist.
+2. A URL (dev server or deployed) is available. Prompt for it if not provided.
+## Workflow
+### Step 1: Clarify scope
+Confirm with the user:
+- **URL under test** (local `http://localhost:3000` or remote)
+- **Flow to test** (e.g., "sign up → dashboard → logout")
+- **What success looks like** (accessibility / performance / zero console errors / visual match)
+### Step 2: Dispatch `flow-qa-engineer`
+Delegate to the `flow-qa-engineer` agent. It will:
+1. Open the target URL via `mcp__chrome-devtools__new_page`
+2. Drive the flow with `mcp__chrome-devtools__click` / `fill` / `navigate`
+3. Capture `list_console_messages`, `list_network_requests`, `take_screenshot`, optionally `lighthouse_audit`
+4. Compare against expected behavior
+### Step 3: Report findings
+Produce `.flow/specs/<active>/qa-report.md` with:
+- **Bugs** (reproducible, severity P1/P2/P3)
+- **Performance** (LCP / INP / CLS from Lighthouse)
+- **Accessibility** (axe violations with WCAG references)
+- **Console errors** (full stack traces)
+- **Screenshots** (attached)
+### Step 4: Hand off
+If bugs found: suggest `/curdx-flow:debug "<bug title>"` for systematic root-cause analysis.
+If accessibility violations: suggest fixes inline with WCAG refs.
+## References
+- `flow-qa-engineer` agent: `@${CLAUDE_PLUGIN_ROOT}/agents/flow-qa-engineer.md`
+- chrome-devtools MCP docs: https://github.com/ChromeDevTools/chrome-devtools-mcp