npm - onecrawl - Versions diffs - 4.0.0-beta.1 → 4.0.0-beta.3 - Mend

onecrawl 4.0.0-beta.1 → 4.0.0-beta.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/assets/skills/breaking-change-paths/SKILL.md +46 -19
package/assets/skills/completion-gate/SKILL.md +48 -11
package/assets/skills/e2e-testing/SKILL.md +45 -5
package/assets/skills/github-sync/SKILL.md +80 -10
package/assets/skills/interaction-loop/SKILL.md +52 -7
package/assets/skills/planning-tracking/SKILL.md +61 -18
package/assets/skills/policy-coherence-audit/SKILL.md +56 -21
package/assets/skills/programmatic-tool-calling/SKILL.md +88 -19
package/assets/skills/rollback-rca/SKILL.md +57 -22
package/assets/skills/session-logging/SKILL.md +67 -25
package/assets/skills/systematic-debugging/SKILL.md +43 -12
package/assets/skills/testing-policy/SKILL.md +48 -20
package/package.json +1 -1

package/assets/skills/breaking-change-paths/SKILL.md CHANGED

@@ -17,43 +17,70 @@ Run a structured, future-proof decision process when a task may affect public co
 | Surface | Breaking Impact | Examples |
 |---------|----------------|----------|
 | **CLI flags/args** | HIGH | Renaming `--headless` → `--head`, removing `--native` |
-| **MCP tool names/schemas** | HIGH | Renaming tool actions, changing JSON schemas |
+| **MCP tool names/schemas** | HIGH | Renaming tool actions, changing JSON input schemas |
 | **NAPI/PyO3 method signatures** | HIGH | Changing function names, param types, return types |
-| **Config keys** | MEDIUM | Renaming `config.toml` keys, changing defaults |
+| **Config keys** (`config.toml`) | MEDIUM | Renaming keys, changing defaults, removing options |
 | **Session file format** | MEDIUM | Changing `/tmp/onecrawl-session-*.json` structure |
-| **Daemon protocol** | MEDIUM | Changing HTTP/WS API between CLI and daemon |
-| **Internal crate APIs** | LOW | Changing `pub` functions in workspace crates |
-| **CDP layer** | LOW | Internal CDP wrapper changes |
+| **Daemon HTTP/WS protocol** | MEDIUM | Changing API between CLI ↔ daemon communication |
+| **Profile file format** | MEDIUM | Changing profile storage schema |
+| **Internal crate `pub` APIs** | LOW | Changing `pub fn` in workspace crates (internal consumers only) |
+| **CDP layer wrappers** | LOW | Internal CDP abstractions (no external consumers) |
+## Concrete OneCrawl Examples
+### Non-Breaking Change Examples
+```
+✅ Adding new CLI flag: `onecrawl session start --timeout 30`
+✅ Adding new MCP action: `onecrawl run browser get_accessibility_snapshot`
+✅ New config key with default: config.toml gains `stealth.level = "standard"`
+✅ Adding optional field to session JSON: `{"pid": 1234, "created_at": "..."}`
+✅ New NAPI export function (additive)
+```
+### Breaking Change Examples
+```
+⛔ Renaming `onecrawl run browser` → `onecrawl browser exec`
+⛔ Removing `--native` flag from session start
+⛔ Changing MCP tool "browser" schema (existing clients break)
+⛔ Changing session JSON key: "session_name" → "name"
+⛔ Changing PyO3 class method signature
+```
 ## Decision Framework
 ### Non-Breaking Path
 - Add new flags/options alongside existing ones
-- Deprecation warnings before removal (minimum 2 alpha releases)
+- Deprecation warnings before removal (minimum 2 beta releases)
 - Backward-compatible config: new keys get defaults, old keys still work
 - Additive MCP actions (new actions, not renamed ones)
+- Session JSON: add optional fields, never remove/rename existing
 ### Breaking Path (requires justification)
-- Must document: what breaks, who is affected, migration steps
-- Must provide: migration guide or automated migration tool
-- Must bump: version increment that signals the break
-- Quality gates unchanged: unit/integration/E2E/non-regression must still pass
+- [ ] Document: what breaks, who is affected, migration steps
+- [ ] Migration: provide guide or automated migration in code
+- [ ] Version: bump that signals the break (beta → beta.N+1 minimum)
+- [ ] CHANGELOG: explicit "BREAKING" section with before/after examples
+- [ ] Quality gates unchanged: completion-gate must still pass
 ## Procedure
-1. Identify all contract surfaces affected.
-2. For each, classify as non-breaking or breaking.
-3. Present options via `ask_user` with the compatibility triad:
-   - Non-Breaking Path (additive/compatible)
-   - Breaking Path (with migration plan)
-   - Alternative Structural Path (redesign to avoid the break)
+1. Identify all contract surfaces affected (use table above).
+2. For each surface, classify impact as non-breaking or breaking.
+3. If any HIGH-impact surface is affected, present options to user:
+   - **Non-Breaking Path** (additive/compatible — deprecate first)
+   - **Breaking Path** (with migration plan)
+   - **Alternative Structural Path** (redesign to avoid the break)
 4. Document decision in commit message and CHANGELOG.
+5. If breaking: add `BREAKING:` prefix to commit message.
 ## Done Criteria
 - Decision documented with rationale.
 - Migration steps provided for any breaking change.
-- All quality gates pass.
+- All quality gates pass (completion-gate skill).
+- CHANGELOG updated with breaking change notice if applicable.
 ## Anti-patterns
-- Silent breaking changes (no documentation)
+- Silent breaking changes (no documentation, no CHANGELOG entry)
 - Breaking changes without version bump
-- Assuming internal changes are safe (NAPI/PyO3 are public)
+- Assuming internal crate changes are safe (NAPI/PyO3 expose them)
+- Removing deprecated items before 2-release grace period
+- Breaking MCP schemas without client-side migration path
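The surface table and procedure in this skill can be read as a simple lookup. What follows is a minimal sketch, assuming hypothetical surface keys (`cli-flags`, `mcp-schema`, and so on) and hypothetical function names that do not appear in the package; only the HIGH/MEDIUM/LOW levels and the `BREAKING:` commit prefix come from the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a contract surface to the impact level listed in
# the skill's table. The surface keys are illustrative names, not OneCrawl API.
impact_for() {
  case "$1" in
    cli-flags|mcp-schema|napi-pyo3) echo "HIGH" ;;
    config-keys|session-format|daemon-protocol|profile-format) echo "MEDIUM" ;;
    internal-crate-api|cdp-wrappers) echo "LOW" ;;
    *) echo "UNKNOWN"; return 1 ;;
  esac
}

# Procedure step 3: options go to the user when any HIGH surface is affected.
needs_user_decision() {
  local surface
  for surface in "$@"; do
    if [ "$(impact_for "$surface")" = "HIGH" ]; then
      return 0
    fi
  done
  return 1
}

# Procedure step 5: breaking commits carry a `BREAKING:` prefix.
commit_subject() {
  local is_breaking="$1" subject="$2"
  if [ "$is_breaking" = "yes" ]; then
    echo "BREAKING: $subject"
  else
    echo "$subject"
  fi
}
```

For example, `needs_user_decision config-keys mcp-schema` succeeds because one of the two surfaces is HIGH, while `needs_user_decision config-keys cdp-wrappers` does not.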

package/assets/skills/completion-gate/SKILL.md CHANGED

@@ -21,37 +21,74 @@ Enforce a mandatory quality gate for every Issue, Milestone, and PR with zero-er
 ## OneCrawl Gate Commands
-Run these in order. Both must pass **twice consecutively** with zero output:
+Run in order. All must pass **twice consecutively** with zero output:
 ```bash
-# 1. Clippy (lint + static analysis) - 0 warnings required
+# 1. Clippy (lint + static analysis) — 0 warnings required
 cargo clippy --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- -W clippy::all
 # 2. Build check (fast compilation verification)
-cargo check --workspace
+cargo check -p onecrawl-cli-rs -p onecrawl-mcp-rs
-# 3. Test suite (all tests, single-threaded for determinism)
+# 3. Test suite (573 tests, single-threaded for determinism)
 cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1
-# 4. Health check (binary runs correctly)
+# 4. Release build (binary artifact)
+cargo build --release -p onecrawl-cli-rs
+# 5. Health check (binary runs correctly)
 ./target/release/onecrawl health
 ./target/release/onecrawl --version
 ```
-Acceptable warnings: `onecrawl-browser` vendor crate (1 warning, do not touch).
+Acceptable warnings: `onecrawl-browser` vendor crate (1 pre-existing clippy warning — do not touch).
+## Gate Execution Checklist
+- [ ] **Pass 1 — Clippy**: 0 warnings in owned crates
+- [ ] **Pass 1 — Build**: `cargo check` exits 0
+- [ ] **Pass 1 — Tests**: 573 tests passed, 0 failed
+- [ ] **Pass 1 — Binary**: `onecrawl --version` prints `v4.0.0-beta.1`
+- [ ] **Pass 2 — Clippy**: identical to pass 1
+- [ ] **Pass 2 — Tests**: identical to pass 1
+- [ ] **No regressions**: test count ≥ baseline recorded before changes
+- [ ] **Commit**: use `git commit -F /tmp/commit-msg.txt` with Co-authored-by trailer
+## CI Verification (release.yml + rust-ci.yml)
+After push, verify CI on GitHub:
+```bash
+# Check CI status for current branch
+gh run list --repo giulio-leone/onecrawl --branch "$(git branch --show-current)" --limit 5
+# Watch a specific run
+gh run watch <run-id> --repo giulio-leone/onecrawl
+# Get failed job logs
+gh run view <run-id> --repo giulio-leone/onecrawl --log-failed
+```
+**rust-ci.yml** gates: clippy, test (--test-threads=1), build check.
+**release.yml** gates: 5-platform matrix (linux-x64, linux-arm64, macos-x64, macos-arm64, windows-x64).
 ## Procedure
-1. Run all gate commands above.
-2. If any error/warning exists in owned crates, fix and restart from step 1.
-3. Repeat until two clean consecutive passes are recorded.
-4. Only then mark status `done`/merge.
+1. Record baseline: `cargo test ... 2>&1 | tail -1` (note test count).
+2. Run all gate commands above.
+3. If any error/warning exists in owned crates, fix and restart from step 1.
+4. Repeat until two clean consecutive passes are recorded.
+5. Verify CI is green after push (`gh run list`).
+6. Only then mark status `done`/merge.
 ## Done Criteria
 - Two consecutive clean review passes are documented.
 - No unresolved pre-existing issues in touched scope.
+- CI (rust-ci.yml) is green on the pushed branch.
 ## Anti-patterns
 - Single-pass approval
-- Ignoring warnings
+- Ignoring warnings ("it's just one warning")
 - Deferring known issues to "later"
 - Excluding tests with `#[ignore]` to pass the gate
+- Pushing without verifying CI status
+- Using `cargo check --workspace` when `cargo check -p <crate>` suffices (slower)
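The "No regressions" item compares a test count against a recorded baseline. With `--workspace`, `cargo test` prints one `test result:` summary line per test binary, so the last line of output on its own may undercount. A minimal sketch that sums every summary line instead; the function names are hypothetical and not part of the package.

```shell
#!/usr/bin/env bash
# Sum the "N passed" figures across all `test result:` lines read from stdin.
sum_passed() {
  awk '/^test result:/ {
         for (i = 2; i <= NF; i++) if ($i == "passed;") total += $(i - 1)
       }
       END { print total + 0 }'
}

# Succeeds when the current count is at least the recorded baseline.
no_regression() {
  local baseline="$1" current="$2"
  [ "$current" -ge "$baseline" ]
}
```

A possible use is to pipe the gate's test command into the helper, for instance `cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1 2>&1 | sum_passed`, and pass the result to `no_regression` together with the baseline recorded before the change.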

package/assets/skills/e2e-testing/SKILL.md CHANGED

@@ -13,12 +13,13 @@ Ensure critical user flows work end-to-end with deterministic, CI-compatible exe
 ## OneCrawl E2E Architecture
 E2E tests live in `packages/onecrawl-rust/crates/onecrawl-e2e/` and require a running Chrome instance.
+Session state files: `/tmp/onecrawl-session*.json` (per-PID markers for multi-agent isolation).
 ```bash
 # Run E2E tests (requires Chrome)
 cargo test -p onecrawl-e2e -- --test-threads=1
-# Manual E2E verification (quick smoke test)
+# Manual E2E smoke test
 onecrawl session start -H                    # Headless Chrome
 onecrawl navigate https://example.com        # Navigate
 onecrawl get title                           # Verify page loaded
@@ -34,20 +35,59 @@ onecrawl session close                       # Cleanup
 5. **Config management**: config set → config show → verify change persists
 6. **Auth persistence**: auth-state save → session close → session start → auth-state load
 7. **Stealth**: session start → stealth detection-audit → verify 0% headless detection
+8. **MCP server**: onecrawl mcp → tool discovery → action execution → clean shutdown
-## Checklist
+## Multi-Agent E2E Test Template
+Tests verifying process-level isolation between concurrent agents:
+```bash
+# Agent A (PID-based session isolation)
+onecrawl session start -H -s "agent-$$-a"
+onecrawl navigate https://example.com
+TITLE_A=$(onecrawl get title)
+# Agent B (separate session, same daemon)
+onecrawl session start -H -s "agent-$$-b"
+onecrawl navigate https://httpbin.org
+TITLE_B=$(onecrawl get title)
+# Verify isolation — each session sees its own page
+[ "$TITLE_A" != "$TITLE_B" ] && echo "PASS: isolation OK"
+# Cleanup both
+onecrawl session close -s "agent-$$-a"
+onecrawl session close -s "agent-$$-b"
+```
+## Pre-Test Checklist
+- [ ] Chrome/Chromium installed and accessible
+- [ ] No stale sessions: `ls /tmp/onecrawl-session*.json` (should be empty)
+- [ ] No orphan Chrome processes: `pgrep -f "chrome.*remote-debugging" | head`
+- [ ] Release binary built: `cargo build --release -p onecrawl-cli-rs`
+## Test Execution Checklist
 - [ ] Happy path covered for each critical flow
 - [ ] Edge cases: invalid input, missing Chrome, concurrent sessions
-- [ ] Cleanup: sessions closed, temp files removed, profiles deleted
-- [ ] Deterministic: no timing-dependent assertions
+- [ ] Deterministic: no timing-dependent assertions (use wait-for, not sleep)
 - [ ] Stable selectors: use `data-testid` or semantic selectors
 - [ ] CI-compatible: headless mode, no GUI dependencies
+## Post-Test Cleanup Checklist
+- [ ] All sessions closed: `onecrawl session close` or `onecrawl session close -s <name>`
+- [ ] Temp files removed: `rm -f /tmp/onecrawl-session*.json`
+- [ ] No orphan Chrome: verify `pgrep -f "chrome.*remote-debugging"` returns empty
+- [ ] Profiles cleaned: `onecrawl profile delete <test-profile>` if created
+- [ ] Daemon stopped (if started for test): `onecrawl daemon stop`
 ## Done Criteria
 - All critical flows pass in headless mode.
 - No leaked browser processes after test completion.
+- No residual session files in `/tmp/`.
 ## Anti-patterns
 - Hard-coded sleep/timeouts instead of wait-for conditions
-- Tests that depend on network state
+- Tests that depend on network state or external services
 - Shared mutable state between test cases
+- Forgetting to close sessions (leaks Chrome processes)
+- Using `--test-threads=N` where N > 1 (session conflicts)
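The checklist asks for wait-for conditions rather than fixed sleeps. One way to read that is a bounded polling helper: the short pause between checks only paces the loop, and the test proceeds as soon as the condition holds. A minimal sketch; `wait_for` and `no_session_files` are hypothetical names, not OneCrawl commands.

```shell
#!/usr/bin/env bash
# Poll a condition until it succeeds or the timeout (whole seconds) elapses.
# Fractional `sleep` works with GNU and BSD sleep but is not required by POSIX.
wait_for() {
  local timeout_s="$1"; shift
  local deadline=$((SECONDS + timeout_s))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      return 1
    fi
    sleep 0.2
  done
  return 0
}

# Hypothetical condition: true once no session files remain in /tmp.
no_session_files() {
  ! ls /tmp/onecrawl-session*.json >/dev/null 2>&1
}
```

After closing sessions, `wait_for 10 no_session_files` would return as soon as the files are gone and fail only if they are still present after roughly ten seconds.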

package/assets/skills/github-sync/SKILL.md CHANGED

@@ -11,43 +11,113 @@ Keep local plan and GitHub project artifacts perfectly aligned.
 - Creating or updating a plan
 - Changing issue status
 - Completing milestones
+- Preparing a release
 ## OneCrawl Repository
 - Owner: `giulio-leone`
 - Repo: `onecrawl`
-- Branch: `main`
-- Tags: `v4.0.0-alpha.XX`
+- Default branch: `main`
+- Current tag: `v4.0.0-beta.1`
+- CI workflows: `release.yml` (5-platform matrix), `rust-ci.yml`
 ## Naming Rules
 - Milestones: `MX: <Title>` (e.g., `M14: Deep Audit`)
 - Issues: `<type>: <description>` (e.g., `fix: session name path traversal`)
 - Branches: `<type>/<short-description>` (e.g., `fix/session-path-traversal`)
-- Tags: `v4.0.0-alpha.XX` (SemVer pre-release)
+- Tags: SemVer pre-release (e.g., `v4.0.0-beta.2`)
+- Commits: use `git commit -F /tmp/commit-msg.txt` (not `-m`), always include Co-authored-by trailer
 ## Required Labels
 - Priority: `P0-critical`, `P1-high`, `P2-medium`, `P3-low`
 - Type: `bug`, `feature`, `refactor`, `docs`, `chore`
 - Status: `todo`, `in-progress`, `review`, `done`, `blocked`
+## gh CLI Commands
+```bash
+# Create milestone
+gh api repos/giulio-leone/onecrawl/milestones -f title="M15: Title" -f state="open"
+# Create issue with labels and milestone
+gh issue create --repo giulio-leone/onecrawl \
+  --title "fix: description" \
+  --body "Details..." \
+  --label "bug,P1-high,in-progress" \
+  --milestone "M15: Title"
+# Update issue status
+gh issue edit <number> --repo giulio-leone/onecrawl --remove-label "in-progress" --add-label "done"
+# Close issue
+gh issue close <number> --repo giulio-leone/onecrawl
+# List open issues for a milestone
+gh issue list --repo giulio-leone/onecrawl --milestone "M15: Title" --state open
+# Check CI status
+gh run list --repo giulio-leone/onecrawl --branch main --limit 5
+```
 ## Procedure
 1. Create GitHub milestone matching local plan milestone.
 2. Create issues for each task, linked to milestone.
 3. Apply labels (priority + type + status).
-4. Update issue status as work progresses.
-5. Close milestone when all issues are done.
+4. Update issue status labels as work progresses.
+5. Close issues when completion gate passes.
+6. Close milestone when all issues are done.
+## OneCrawl Release Workflow
+### Pre-Release Checklist
+- [ ] All milestone issues closed
+- [ ] Completion gate passed (2 consecutive clean runs)
+- [ ] CHANGELOG.md updated with release notes
+- [ ] Version bumped in `packages/onecrawl-rust/Cargo.toml`
+- [ ] `cargo check --workspace` (propagates version to all crates)
+### Release Steps
+```bash
+# 1. Version bump
+# Edit packages/onecrawl-rust/Cargo.toml — update version field
+cargo check --workspace
+# 2. Update CHANGELOG.md — add release section
+# 3. Commit
+echo "chore: release v4.0.0-beta.2
+- <summary of changes>
+Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>" > /tmp/release-msg.txt
+git add -A && git commit -F /tmp/release-msg.txt
+# 4. Tag and push
+git tag v4.0.0-beta.2
+git push origin main --tags
+# 5. npm publish (from packages/onecrawl/)
+cd packages/onecrawl
+npm version 4.0.0-beta.2 --no-git-tag-version
+npm publish --tag beta --access public
+# 6. Verify release.yml CI (5-platform matrix)
+gh run list --repo giulio-leone/onecrawl --workflow release.yml --limit 1
+```
-## OneCrawl Release Sync
-After each alpha release:
-1. Tag: `git tag v4.0.0-alpha.XX && git push origin main --tags`
-2. npm: `npm publish --tag alpha --access public`
-3. CHANGELOG: update with release notes
+### Post-Release Verification
+- [ ] `gh run view <id>` — all 5 platforms green
+- [ ] `npm view onecrawl versions --json | tail` — version published
+- [ ] GitHub tag visible: `gh release list --repo giulio-leone/onecrawl`
 ## Done Criteria
 - All plan items have matching GitHub artifacts.
 - Labels are applied consistently.
 - Milestones reflect actual progress.
+- Releases are tagged, CI-verified, and npm-published.
 ## Anti-patterns
 - Local plan diverges from GitHub state
 - Missing labels on issues
 - Stale milestones with no issues
+- Pushing tags before CI verification
+- Using `git commit -m` instead of `git commit -F`
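The release steps hard-code both the tag (`v4.0.0-beta.2`) and the npm dist-tag (`beta`), and the removed text did the same for `alpha`. A minimal sketch of deriving one from the other so the two cannot drift apart; the function names are hypothetical, and the accepted tag shape is an assumption based on the examples in this skill.

```shell
#!/usr/bin/env bash
# Accepts tags shaped like v4.0.0-beta.2 or v4.0.0-alpha.12.
is_prerelease_tag() {
  local re='^v[0-9]+\.[0-9]+\.[0-9]+-(alpha|beta)\.[0-9]+$'
  [[ "$1" =~ $re ]]
}

# Derive the npm dist-tag from a version string: 4.0.0-beta.2 -> beta.
# Versions without a pre-release part map to npm's default dist-tag, `latest`.
dist_tag_for() {
  local version="${1#v}"
  local pre
  case "$version" in
    *-*) pre="${version#*-}"; echo "${pre%%.*}" ;;
    *)   echo "latest" ;;
  esac
}
```

With these helpers the publish step could be written as `npm publish --tag "$(dist_tag_for "$VERSION")" --access public`, where `VERSION` is a variable assumed to hold the version being released.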

package/assets/skills/interaction-loop/SKILL.md CHANGED

@@ -13,14 +13,14 @@ Enforce a strict iterative decision loop with consistent user checkpoints and ex
 - Completing an autonomous run
 ## Default 5-Option Structure
-1. **Recommended Development Path** (mark with star as most future-proof)
+1. **Recommended Development Path** (mark with ⭐ as most future-proof)
 2. **Alternative Development Path A**
 3. **Alternative Development Path B**
 4. **Freeform** — set enum value to `custom` (allows free-text input)
 5. **Autonomous Mode**
 ## Escalation (Compatibility Triad)
-Use ONLY when there is a concrete compatibility/contract impact:
+Use ONLY when there is a concrete compatibility/contract impact (see breaking-change-paths skill):
 1. Non-Breaking Path
 2. Breaking Path
 3. Alternative Structural Path
@@ -31,17 +31,62 @@ Use ONLY when there is a concrete compatibility/contract impact:
 Each option must include:
 `<Title> — Why: <reason> | Leads to: <next step> | Risk: <low|medium|high>`
+### OneCrawl Example Cards
+```
+⭐ 1. Add new --timeout flag (non-breaking)
+   Why: Users need session timeouts | Leads to: CLI + daemon changes | Risk: low
+2. Replace --headless with --head (breaking)
+   Why: Simpler flag name | Leads to: CLI + docs + migration | Risk: high
+3. Config-only timeout (no CLI flag)
+   Why: Minimal surface change | Leads to: config.toml + docs | Risk: low
+4. Freeform — describe your preferred approach
+5. Autonomous Mode — I'll implement the recommended path and report back
+```
+## Copilot CLI Implementation
+In Copilot CLI runtime, the interaction loop uses natural language responses (not tool calls).
+The agent presents options as a numbered list in the response and the user replies with their choice.
+### Autonomous Mode Behavior
+When the user selects Autonomous Mode (option 5):
+1. Agent implements the recommended path (option 1) without further prompts.
+2. Follows all applicable skills (planning-tracking, testing-policy, completion-gate).
+3. On completion, presents a summary with:
+   - What was done
+   - Test results (exact counts)
+   - Files changed
+   - Asks: "Rate this result (1-5), suggest next action, or say 'I am satisfied'."
+4. Loop continues until the exact phrase **"I am satisfied"** is received.
+### Decision Tracking
+Record each decision point in the session database:
+```sql
+INSERT INTO session_state (key, value) VALUES
+  ('decision_1', 'Option 1: Add --timeout flag (non-breaking) — user chose at 2024-01-15T10:30Z');
+```
 ## Rules
 - One clear question per iteration
 - Exactly 5 options every time
-- Option 4 is ALWAYS Freeform (hard invariant)
+- Option 4 is ALWAYS Freeform (hard invariant — label "Freeform", enum value `custom`)
 - Options 1-3 carry forward from previous iteration unless explicitly replaced
-- Mark recommendation with star; explain changes
+- Mark recommendation with ⭐; explain if recommendation changes between iterations
 - At autonomous completion: ask for rating, next action, satisfaction
-- Loop stops ONLY on exact phrase: "I am satisfied"
+- Loop stops ONLY on exact phrase: **"I am satisfied"**
+## Done Criteria
+- User has explicitly ended the loop with "I am satisfied".
+- All decisions are documented in session log.
 ## Anti-patterns
 - Multi-question prompts in one iteration
-- Missing or renaming Freeform option
+- Missing or renaming Freeform option (option 4)
 - Stopping without explicit "I am satisfied"
-- Using compatibility triad as default (it's an escalation)
+- Using compatibility triad as default (it's an escalation for breaking changes)
+- Implementing without presenting options first
+- Changing recommendation without explaining why
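The Rules section of this skill states three invariants that are easy to check mechanically: exactly five options, option 4 is Freeform, and the loop ends only on the exact phrase "I am satisfied". A minimal sketch of those checks, assuming options are passed as plain title strings; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Exact-match stop condition: only the literal phrase ends the loop.
is_stop_phrase() {
  [ "$1" = "I am satisfied" ]
}

# Check the option list: exactly 5 entries, option 4 starts with "Freeform",
# option 5 starts with "Autonomous Mode".
valid_options() {
  [ "$#" -eq 5 ] || return 1
  [[ "$4" == Freeform* ]] || return 1
  [[ "$5" == "Autonomous Mode"* ]] || return 1
}
```

Because the comparison is exact, replies such as "i am satisfied." or "satisfied" keep the loop running, which matches the hard invariant stated in the skill.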

package/assets/skills/planning-tracking/SKILL.md CHANGED

@@ -19,42 +19,85 @@ interface Milestone { id: string; description: string; priority: "critical"|"hig
 interface Issue { id: string; task: string; priority: "critical"|"high"|"medium"|"low"; status: "todo"|"in_progress"|"review"|"done"|"blocked"; depends_on: string[]; children: Record<string, Issue>; }
 ```
+## SQL-Based Tracking
+Use the session database for operational tracking:
+```sql
+-- Create plan items
+INSERT INTO todos (id, title, description, status) VALUES
+  ('m1-cdp-refactor', 'Refactor CDP error handling', 'Update onecrawl-cdp error types to use thiserror, propagate to onecrawl-mcp-rs', 'pending'),
+  ('m1-cli-flags', 'Add --timeout flag', 'Add session timeout to onecrawl-cli-rs session start', 'pending'),
+  ('m1-tests', 'Add timeout tests', 'Unit tests for session timeout in cli-rs and cdp', 'pending');
+-- Declare dependencies
+INSERT INTO todo_deps (todo_id, depends_on) VALUES
+  ('m1-cli-flags', 'm1-cdp-refactor'),   -- CLI depends on CDP changes
+  ('m1-tests', 'm1-cli-flags');            -- Tests depend on implementation
+-- Find ready items (no pending dependencies)
+SELECT t.id, t.title FROM todos t
+WHERE t.status = 'pending'
+AND NOT EXISTS (
+  SELECT 1 FROM todo_deps td
+  JOIN todos dep ON td.depends_on = dep.id
+  WHERE td.todo_id = t.id AND dep.status != 'done'
+);
+-- Update status as work progresses
+UPDATE todos SET status = 'in_progress' WHERE id = 'm1-cdp-refactor';
+UPDATE todos SET status = 'done' WHERE id = 'm1-cdp-refactor';
+```
 ## OneCrawl Workspace Parallelism Rules
 Safe to parallelize:
-- Changes in different crates (e.g., `onecrawl-cdp` + `onecrawl-parser`)
+- Changes in independent crates (e.g., `onecrawl-crypto` + `onecrawl-parser`)
 - NAPI + PyO3 bindings (independent build targets)
 - Documentation + code changes
+- Tests in different crates (with `--test-threads=1` per crate)
 NOT safe to parallelize:
 - Changes in a crate + its dependents (e.g., `onecrawl-cdp` + `onecrawl-cli-rs`)
 - Multiple changes to the same file
-- Version bumps (must be sequential: Cargo.toml then package.json)
+- Version bumps (must be sequential: Cargo.toml → package.json → tag)
+- Any two changes that touch session state or `/tmp/onecrawl-session*.json`
 ## OneCrawl Crate Dependency Graph
 ```
-onecrawl-cli-rs
-  -> onecrawl-mcp-rs -> onecrawl-cdp -> onecrawl-browser
-  -> onecrawl-server -> onecrawl-cdp
-  -> onecrawl-core
-  -> onecrawl-crypto
-  -> onecrawl-parser
-  -> onecrawl-storage
+onecrawl-cli-rs (binary)
+  ├── onecrawl-mcp-rs ──→ onecrawl-cdp ──→ onecrawl-browser (vendor)
+  ├── onecrawl-server ──→ onecrawl-cdp
+  ├── onecrawl-core
+  ├── onecrawl-crypto
+  ├── onecrawl-parser
+  └── onecrawl-storage
+Independent leaf crates (safe to change in parallel):
+  onecrawl-core, onecrawl-crypto, onecrawl-parser, onecrawl-storage
+High-impact crates (changes cascade):
+  onecrawl-cdp (affects: mcp-rs, server, cli-rs)
+  onecrawl-browser (affects: cdp, mcp-rs, server, cli-rs)
 ```
 ## Procedure
-1. Build plan before implementation.
-2. Assign unique IDs to milestones/issues.
-3. Declare dependencies for every issue.
-4. Execute by dependency order, then priority (critical then high then medium then low).
-5. Run independent same-priority milestones in parallel when safe.
-6. Update statuses continuously.
+1. Build plan before implementation (use SQL todos).
+2. Assign unique IDs to milestones/issues (kebab-case: `m1-feature-name`).
+3. Declare dependencies using `todo_deps` table.
+4. Execute by dependency order, then priority (critical → high → medium → low).
+5. Run independent same-priority items in parallel when safe per rules above.
+6. Update statuses continuously (`pending` → `in_progress` → `done`).
+7. Sync to GitHub when milestones complete (github-sync skill).
 ## Done Criteria
-- Plan exists, is up-to-date, and reflects actual execution state.
-- Dependencies are respected and parallel work is safe.
+- Plan exists in SQL todos, is up-to-date, and reflects actual execution state.
+- Dependencies are respected and parallel work follows safety rules.
+- GitHub artifacts are synced (github-sync skill).
 ## Anti-patterns
 - Starting implementation without a plan
-- Missing dependency declarations
+- Missing dependency declarations (causes parallel conflicts)
 - Running blocked items in parallel
+- Changing `onecrawl-cdp` and `onecrawl-cli-rs` simultaneously
+- Using free-form notes instead of structured SQL tracking
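The parallelism rules follow directly from the crate dependency graph in this skill: two crates can be changed in parallel when neither is a dependent of the other. A minimal sketch that encodes the graph as shown in the diff, with crate names shortened by dropping the `onecrawl-` prefix; the function names are hypothetical, and unknown crate names are not validated.

```shell
#!/usr/bin/env bash
# Crates affected when the given crate changes, per the graph in this skill.
dependents_of() {
  case "$1" in
    browser) echo "cdp mcp-rs server cli-rs" ;;
    cdp)     echo "mcp-rs server cli-rs" ;;
    mcp-rs|server|core|crypto|parser|storage) echo "cli-rs" ;;
    cli-rs)  echo "" ;;
    *)       return 1 ;;
  esac
}

# Safe when the two crates differ and neither depends on the other.
safe_in_parallel() {
  local a="$1" b="$2"
  [ "$a" != "$b" ] || return 1
  case " $(dependents_of "$a") " in *" $b "*) return 1 ;; esac
  case " $(dependents_of "$b") " in *" $a "*) return 1 ;; esac
  return 0
}
```

Under this encoding `safe_in_parallel crypto parser` succeeds and `safe_in_parallel cdp cli-rs` fails, matching the two examples given in the rules.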

package/assets/skills/policy-coherence-audit/SKILL.md CHANGED

@@ -11,42 +11,77 @@ Detect and remove contradictions across agent policies before execution.
 - Updating AGENTS.md or runtime adapters
 - Merging new workflow rules
 - Noticing behavioral ambiguity during execution
+- Adding or modifying a SKILL.md file
 ## OneCrawl Policy Files
-| File | Purpose |
-|------|---------|
-| `AGENTS.md` | Root dispatcher, skill catalog |
-| `AGENTS.vscode.MD` | VS Code runtime adapter |
-| `AGENTS.copilot-cli.MD` | Copilot CLI runtime adapter |
-| `.github/copilot-instructions.md` | Build, test, architecture reference |
-| `.github/skills/*/SKILL.md` | 14 operational skills |
+| File | Purpose | Owner |
+|------|---------|-------|
+| `AGENTS.md` | Root dispatcher, skill catalog | Runtime-critical contract |
+| `AGENTS.vscode.MD` | VS Code runtime adapter | `vscode_askQuestions` binding |
+| `AGENTS.copilot-cli.MD` | Copilot CLI runtime adapter | `ask_user` binding |
+| `.github/copilot-instructions.md` | Build, test, architecture reference | Project-level context |
+| `.github/skills/*/SKILL.md` | 14 operational skills | Procedural ownership |
 ## Audit Checklist
-1. Language: Are all policies in English?
-2. Interaction model: Is ask_user (CLI) vs vscode_askQuestions (VS Code) correctly bound?
-3. Freeform invariant: Is option 4 always Freeform in all 5-option prompts?
-4. Completion gates: Do all skills reference the same gate procedure?
-5. Scope: Are skills non-overlapping? No duplicate procedures?
-6. Skill references: Do AGENTS.md catalog entries match actual SKILL.md files?
-7. Tool names: Are MCP tool references correct (onecrawl run, not chrome-devtools)?
-8. Build commands: Are cargo/npm commands consistent across all skills?
-9. Version: Are version references up-to-date?
-10. Stop condition: Is "I am satisfied" the only stop phrase in all loop references?
+### Language & Format
+- [ ] All policies are in English (no mixed languages)
+- [ ] Markdown formatting is consistent (headers, lists, code blocks)
+- [ ] No prose duplication between AGENTS.md and SKILL.md files
+### Runtime Binding
+- [ ] `AGENTS.copilot-cli.MD` uses only `ask_user` (never `vscode_askQuestions`)
+- [ ] `AGENTS.vscode.MD` uses only `vscode_askQuestions` (never `ask_user`)
+- [ ] Both adapters have semantic parity (same options, different tool names)
+### Interaction Model
+- [ ] Option 4 is ALWAYS "Freeform" with enum value `custom` in all 5-option prompts
+- [ ] Stop condition is always exact phrase "I am satisfied"
+- [ ] Autonomous mode is always option 5
+- [ ] Compatibility triad is escalation-only, not default
+### Completion Gate Consistency
+- [ ] All skills reference same gate commands:
+  - `cargo clippy --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- -W clippy::all`
+  - `cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1`
+- [ ] All skills agree: `onecrawl-browser` vendor crate 1 warning is acceptable
+- [ ] Test count reference is consistent (573 tests as of v4.0.0-beta.1)
+### Tool & Command Coherence
+- [ ] MCP tool references use `onecrawl run <tool>` format (not `chrome-devtools`)
+- [ ] Build commands are consistent: `cargo check -p onecrawl-cli-rs -p onecrawl-mcp-rs`
+- [ ] Commit convention: `git commit -F /tmp/file.txt` (not `-m`), Co-authored-by trailer
+- [ ] Version references: `v4.0.0-beta.1` (not stale alpha references)
+### Cross-Skill Coherence
+- [ ] No two skills define the same procedure (ownership is exclusive)
+- [ ] Skill catalog in `AGENTS.md` matches actual `.github/skills/*/SKILL.md` files (count = 14)
+- [ ] Each skill trigger description in catalog matches the skill's "Use when" section
+- [ ] No circular dependencies between skills
+### OneCrawl-Specific
+- [ ] Crate exclusions consistent: `--exclude onecrawl-e2e --exclude onecrawl-python`
+- [ ] VecDeque convention: `.back()` not `.last()` in all skills
+- [ ] `rand::rng()` warning present where async code is discussed
+- [ ] Session files: `/tmp/onecrawl-session*.json` format consistent
 ## Procedure
 1. Read all policy files listed above.
-2. Run each checklist item.
-3. Document any contradictions found.
-4. Fix contradictions (update the owning file).
-5. Verify fix does not introduce new contradictions.
+2. Run each checklist item — mark pass/fail.
+3. Document any contradictions found with file:line references.
+4. Fix contradictions (update the **owning** file, not duplicates).
+5. Verify fix does not introduce new contradictions (re-run checklist).
+6. If skill content was changed, verify AGENTS.md catalog entry still matches.
 ## Done Criteria
 - All checklist items pass.
 - No contradictions between any two policy files.
+- Skill catalog count matches actual skill directory count.
 ## Anti-patterns
 - Duplicating procedures across skills and AGENTS files
 - Mixing tool names across runtimes
 - Updating one file without checking cross-references
+- Adding a skill without updating AGENTS.md catalog
+- Fixing a contradiction by duplicating content (fix at source instead)
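Two of the checklist items are mechanical enough to script: the skill count (expected to be 14) and the search for stale alpha references. A minimal sketch, assuming the `.github/skills/<name>/SKILL.md` layout named in the policy table; the function names are hypothetical, and `-mindepth`/`-maxdepth` are GNU and BSD `find` extensions rather than POSIX.

```shell
#!/usr/bin/env bash
# Count SKILL.md files exactly one directory below the given skills root.
count_skills() {
  find "$1" -mindepth 2 -maxdepth 2 -type f -name SKILL.md | wc -l | tr -d ' '
}

# Print file:line matches that still mention an alpha tag.
# Exits non-zero when nothing is found, following grep's convention.
stale_alpha_refs() {
  grep -rn 'v4\.0\.0-alpha' "$1"
}
```

A possible audit step is `[ "$(count_skills .github/skills)" -eq 14 ]` followed by `stale_alpha_refs .github/skills`, whose file:line output fits the procedure's request for file:line references.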

package/assets/skills/programmatic-tool-calling/SKILL.md CHANGED

@@ -8,34 +8,103 @@ description: "Multi-step tool workflows via code orchestration to reduce latency
 Execute multi-step tool workflows via code orchestration to reduce latency, context pollution, and token overhead.
 ## Use when
-- 3+ dependent tool calls
-- Large intermediate outputs (logs, tables, files)
+- 3+ dependent tool calls in sequence
+- Large intermediate outputs (logs, tables, file listings)
 - Branching logic, retries, or fan-out/fan-in workflows
+- Multi-crate operations that follow the dependency graph
 ## Core Idea
-Treat tools as callable functions inside an orchestration runtime (script/runner), not as one-turn-at-a-time chat actions.
+Treat tools as callable functions inside an orchestration runtime (bash script, Python), not as one-turn-at-a-time chat actions. Return only high-signal summaries to the model context.
-## Procedure
-1. Generate/execute orchestration code for loops, conditionals, parallel calls, retries, and early termination.
-2. Process intermediate data in runtime (filter/aggregate/transform) instead of returning raw data to model context.
-3. Return only high-signal outputs to the model (summary, decision, artifact references).
+## OneCrawl Patterns
+### Pattern 1: Sequential Gate Check (reduce 4 tool calls to 1)
+```bash
+# Instead of 4 separate tool calls, run gate as single orchestrated script
+GATE_RESULT=$(
+  cargo clippy --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- -W clippy::all 2>&1 && \
+  cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1 2>&1 | tail -5
+)
+echo "GATE: $GATE_RESULT"
+```
+### Pattern 2: Multi-Crate Parallel Build Check
+```bash
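+# Fan-out: check independent crates in parallel, recording each PID
+pids=()
+for crate in onecrawl-crypto onecrawl-parser onecrawl-storage onecrawl-core; do
+  cargo check -p "$crate" &
+  pids+=("$!")
+done
+# Fan-in: a bare `wait` always returns 0, so wait on each PID to catch failures
+for pid in "${pids[@]}"; do
+  wait "$pid" || { echo "An independent crate failed to check"; exit 1; }
+done
+echo "All independent crates OK"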
+# Then check dependent crates sequentially
+cargo check -p onecrawl-cdp && cargo check -p onecrawl-mcp-rs && cargo check -p onecrawl-cli-rs
+```
+### Pattern 3: Batch File Analysis (filter before returning to model)
+```bash
+# Instead of reading 12 Cargo.toml files individually:
+grep -r "^version" packages/onecrawl-rust/crates/*/Cargo.toml | \
+  grep -v "\.version" | head -20
+# Returns only the version lines, not full file contents
+```
-## Why It Works (provider/model independent)
-- Fewer model round-trips for multi-call workflows.
-- Intermediate data stays out of context unless needed.
-- Explicit code control flow is easier to test, monitor, and debug.
+### Pattern 4: Multi-Agent Session Orchestration
+```bash
+# Orchestrate multiple onecrawl sessions for testing
+for agent in agent-a agent-b agent-c; do
+  onecrawl session start -H -s "$agent" &
+done
+wait
+# Verify all started
+for agent in agent-a agent-b agent-c; do
+  onecrawl get title -s "$agent"
+done
+# Cleanup
+for agent in agent-a agent-b agent-c; do
+  onecrawl session close -s "$agent"
+done
+```
+### Pattern 5: Conditional Retry with Backoff
+```bash
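+MAX_RETRIES=3
+passed=0
+for i in $(seq 1 "$MAX_RETRIES"); do
+  # Rely on cargo's exit status; the last output line is not a reliable pass marker
+  if cargo test -p onecrawl-cdp -- --test-threads=1 >/dev/null 2>&1; then
+    echo "PASS on attempt $i"
+    passed=1
+    break
+  fi
+  echo "FAIL attempt $i/$MAX_RETRIES, waiting ${i}s"
+  sleep "$i"
+done
+# Surface the failure if every attempt was exhausted
+[ "$passed" -eq 1 ] || { echo "FAIL after $MAX_RETRIES attempts"; exit 1; }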
+```
+## Procedure
+1. Identify workflows with 3+ sequential tool calls.
+2. Write orchestration code (bash preferred for CLI operations).
+3. Process intermediate data in the script (filter/aggregate/transform).
+4. Return only high-signal output to model context (summary, pass/fail, counts).
+5. Validate tool results before using them in subsequent steps.
 ## Guardrails
-- Strict input/output schemas.
-- Validate tool results before use.
-- Idempotent/retry-safe tool design when possible.
-- Timeout/cancellation/expiry handling.
-- Sandbox execution for untrusted code; never blindly execute external payloads.
+- Strict input/output schemas — validate before use.
+- Idempotent/retry-safe operations when possible.
+- Timeout handling: `timeout 60 cargo test ...` for long operations.
+- Never blindly execute tool output as code.
+- Use `set -e` in bash scripts to fail fast on errors.
+- Bound parallel jobs to avoid resource exhaustion: launch a fixed-size batch, then `wait` before starting more.
 ## Done Criteria
 - Workflow completes with reduced context load and deterministic control flow.
+- Intermediate data stays out of model context unless decision-relevant.
 ## Anti-patterns
-- Returning raw intermediate payloads to the model by default
-- Unbounded loops without stop conditions
-- Executing unvalidated tool output
+- Returning raw `cargo test` output (500+ lines) to model context
+- Unbounded loops without stop conditions or max retries
+- Executing unvalidated tool output as shell commands
+- Sequential tool calls for independent operations (use parallel)
+- Hardcoding paths instead of using workspace-relative references
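
The guardrail on bounding parallel jobs could be sketched as a small helper. This is a minimal sketch, not part of the package: `run_capped` is a hypothetical name, and it assumes each job is passed as a single command string. It runs jobs in batches of at most `MAX` and reports failure if any job exits non-zero.

```bash
#!/usr/bin/env bash
# run_capped MAX CMD...  (hypothetical helper, not shipped with onecrawl)
# Runs each CMD string in the background, at most MAX at a time (in batches),
# and returns 1 if any job exited non-zero.
run_capped() {
  local max=$1 failed=0 pids="" count=0 cmd pid
  shift
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
    count=$((count + 1))
    # Batch is full: wait on every PID so each exit status is observed
    if [ "$count" -ge "$max" ]; then
      for pid in $pids; do wait "$pid" || failed=1; done
      pids=""
      count=0
    fi
  done
  # Drain the final, possibly partial batch
  for pid in $pids; do wait "$pid" || failed=1; done
  return "$failed"
}
```

A call such as `run_capped 2 "cargo check -p onecrawl-crypto" "cargo check -p onecrawl-parser" "cargo check -p onecrawl-storage"` would then keep at most two checks in flight. Batching is simpler than a rolling job pool, at the cost of waiting for the slowest job in each batch.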

package/assets/skills/rollback-rca/SKILL.md CHANGED Viewed

@@ -14,42 +14,77 @@ Stop ineffective iteration loops after repeated completion gate failures and cho
 | Failure | Root Cause | Recovery |
 |---------|-----------|----------|
-| Clippy warnings persist | Vendor crate or structural pattern | Add targeted allow with justification |
-| Test timeout | Browser not running or port conflict | Check daemon status |
-| Linker error PyO3 | PyO3 test mode linker conflict | Exclude onecrawl-python from test runs |
-| CDP connection refused | Daemon crashed or wrong port | Restart daemon |
-| Memory growth in tests | Unbounded Vec or HashMap | Cap with VecDeque |
-| Send compile error | rand rng across await | Scope RNG in sync block before await |
+| Clippy warnings persist | Vendor crate or structural pattern | Add targeted `#[allow(...)]` with comment justification |
+| Test timeout | Browser not running or port conflict | `onecrawl health` → restart daemon if needed |
+| Linker error (PyO3) | PyO3 test mode linker conflict | Exclude `onecrawl-python` from test runs |
+| CDP connection refused | Daemon crashed or wrong port | `onecrawl daemon stop && onecrawl daemon start` |
+| Memory growth in tests | Unbounded Vec or HashMap | Cap with VecDeque, use `.back()` not `.last()` |
+| Send compile error | `rand::rng()` across `.await` | Scope RNG in sync block: `let val: u64 = { rand::rng().random() };` (`gen` is deprecated in rand 0.9) |
+| Serde parse failure | `unwrap_or_default()` on bad input | Use `match` with proper error propagation |
+| Session file collision | Multi-agent PID race | Use `session start -s "unique-name-$$"` |
+| Build cache stale | Incremental compilation artifacts | `cargo clean && cargo check --workspace` |
 ## RCA Procedure
-1. Reproduce: Run the exact failing command and capture full output.
-2. Classify: Is it architecture, dependency, scope, or environment?
-3. Analyze:
-   - Architecture: Does the fix require structural changes across crates?
-   - Dependency: Is a third-party crate causing the issue?
-   - Scope: Was the original issue too broadly defined?
-   - Environment: Is it platform or toolchain specific?
-## Recovery Options (present via ask_user)
-1. Rescope: Break the issue into smaller, independently-completable pieces.
-2. Rollback: git stash or git checkout to last known-good state, then retry with different approach.
-3. Redesign: Rethink the approach, maybe the breaking change path is needed.
+### Step 1 — Reproduce
+```bash
+# Capture full failing output
+cargo clippy --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- -W clippy::all 2>&1 | tee /tmp/rca-clippy.log
+cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1 2>&1 | tee /tmp/rca-test.log
+```
+### Step 2 — Classify
+- [ ] **Architecture**: Fix requires structural changes across multiple crates?
+- [ ] **Dependency**: Third-party crate causing the issue? (`cargo tree -p <crate>`)
+- [ ] **Scope**: Original issue too broadly defined? Can it be split?
+- [ ] **Environment**: Platform or toolchain specific? (check rust-ci.yml matrix)
+### Step 3 — Analyze
+Examine the crate dependency graph to understand blast radius:
+```
+onecrawl-cli-rs → onecrawl-mcp-rs → onecrawl-cdp → onecrawl-browser
+onecrawl-cli-rs → onecrawl-server → onecrawl-cdp
+onecrawl-cli-rs → onecrawl-core, onecrawl-crypto, onecrawl-parser, onecrawl-storage
+```
+A failure in `onecrawl-cdp` affects CLI, MCP, and server. A failure in `onecrawl-crypto` reaches only the CLI crate, which depends on it directly.
+## Recovery Options (present to user)
+1. **Rescope**: Break the issue into smaller, independently-completable pieces.
+2. **Rollback**: Return to last known-good state, retry with different approach.
+3. **Redesign**: Rethink the approach — the breaking-change path may be needed.
 ## OneCrawl Rollback Commands
 ```bash
-git stash push -m "rollback: issue-id"
-git checkout v4.0.0-alpha.XX
-cargo check --workspace
+# Option A: Stash current work
+git stash push -m "rollback: issue-id — gate failure #3"
+# Option B: Check out the last tag (detached HEAD; no reset is performed)
+git checkout v4.0.0-beta.1
+# Option C: Reset specific files to main
+git checkout main -- packages/onecrawl-rust/crates/<crate>/src/
+# Verify clean state after rollback
+cargo check -p onecrawl-cli-rs -p onecrawl-mcp-rs
 cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1
 ```
+## Post-Recovery Verification
+- [ ] Gate commands pass after recovery (completion-gate skill)
+- [ ] Root cause documented in session log
+- [ ] Recovery option chosen is recorded
+- [ ] If rescoped: new sub-issues created on GitHub (github-sync skill)
+- [ ] Cleanup: `rm -f /tmp/rca-*.log`
 ## Done Criteria
 - Root cause identified and documented.
 - Recovery option selected and executed.
-- Gate passes after recovery.
+- Completion gate passes after recovery.
 ## Anti-patterns
 - Continuing to iterate without analyzing root cause
 - Ignoring test failures to move forward
 - Rolling back without understanding what went wrong
+- Applying the same fix strategy after 3 failures (definition of insanity)
+- Not cleaning up RCA artifacts (`/tmp/rca-*.log`)
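
The blast-radius reasoning in Step 3 of the rollback-rca skill can be approximated with a reverse-dependency lookup. The sketch below is an illustration only: `blast_radius` and `EDGES` are hypothetical names, and the edge list is transcribed by hand from the graph shown in the diff rather than read from `cargo metadata`, so it would need updating if the workspace changes.

```bash
#!/usr/bin/env bash
# Edges as "dependent dependency", copied from the Step 3 graph (assumed complete).
EDGES='onecrawl-cli-rs onecrawl-mcp-rs
onecrawl-mcp-rs onecrawl-cdp
onecrawl-cdp onecrawl-browser
onecrawl-cli-rs onecrawl-server
onecrawl-server onecrawl-cdp
onecrawl-cli-rs onecrawl-core
onecrawl-cli-rs onecrawl-crypto
onecrawl-cli-rs onecrawl-parser
onecrawl-cli-rs onecrawl-storage'

# blast_radius CRATE: print every crate that depends on CRATE,
# directly or transitively, one per line, sorted.
blast_radius() {
  local frontier=$1 found="" next from to crate
  while [ -n "$frontier" ]; do
    next=""
    for crate in $frontier; do
      while read -r from to; do
        if [ "$to" = "$crate" ]; then
          case " $found " in
            *" $from "*) ;;                               # already recorded
            *) found="$found $from"; next="$next $from" ;;
          esac
        fi
      done <<EOF
$EDGES
EOF
    done
    frontier=$next
  done
  if [ -n "$found" ]; then
    printf '%s\n' $found | sort
  fi
}
```

Running `blast_radius onecrawl-cdp` lists the crates that would need rechecking after a change to the CDP crate, which helps decide between rescoping and rolling back.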

package/assets/skills/session-logging/SKILL.md CHANGED Viewed

@@ -15,55 +15,97 @@ Maintain an accurate, auditable execution journal for each working session.
 ## OneCrawl Session Template
 ```markdown
-# Session: YYYY-MM-DD
+# Session: YYYY-MM-DD-HHmm
 ## Status
-- Branch: main
-- Version: v4.0.0-alpha.XX
+- Branch: <current branch>
+- Version: v4.0.0-beta.1
 - Mode: Autonomous / Interactive
+- Started: <ISO 8601 timestamp>
+- Ended: <ISO 8601 timestamp>
 ## Work Completed
-- [ ] Issue/feature description
-- [ ] Tests added/updated
+- [ ] Issue/feature description (link to GitHub issue)
+- [ ] Tests added/updated (count: +N new, M modified)
 ## Completion Gate Evidence
-- Clippy: 0 warnings (1 vendor)
-- Tests: XX passed, 0 failed
-- Build: cargo check clean
+- Clippy: 0 warnings (1 vendor — onecrawl-browser)
+- Tests: 573 passed, 0 failed, 0 ignored
+- Build: cargo check -p onecrawl-cli-rs -p onecrawl-mcp-rs clean
+- Binary: ./target/release/onecrawl --version → v4.0.0-beta.1
+- CI: rust-ci.yml run #<id> — all green
 ## Decisions
-- Decision 1: rationale
+- Decision 1: rationale (link to interaction-loop option chosen)
 ## Blockers
 - None / description
 ## GitHub Sync
-- Commits pushed: X
-- Tags: vX.X.X-alpha.XX
-- npm: published vX.X.X-alpha.XX
+- Commits pushed: X (SHAs: abc1234, def5678)
+- Issues closed: #N, #M
+- Tags: v4.0.0-beta.X
+- npm: published v4.0.0-beta.X
+- CI runs: <run-id> (green/red)
 ## OneCrawl Health
-- CLI: vX.X.X-alpha.XX
-- Binary: XXM
+- CLI version: v4.0.0-beta.1
+- Binary size: XXM (./target/release/onecrawl)
 - Daemon: running/stopped
-- Sessions: X active
+- Active sessions: X (/tmp/onecrawl-session*.json)
+- Crate count: 12 workspace members
+```
+## SQL Checkpoint Integration
+Use the session database to track session state across tool calls:
+```sql
+-- Record session start
+INSERT OR REPLACE INTO session_state (key, value) VALUES
+  ('session_start', datetime('now')),
+  ('session_branch', '<branch>'),
+  ('session_version', 'v4.0.0-beta.1'),
+  ('baseline_test_count', '573');
+-- Record completion gate pass
+INSERT OR REPLACE INTO session_state (key, value) VALUES
+  ('gate_pass_1', datetime('now')),
+  ('gate_clippy', '0 warnings'),
+  ('gate_tests', '573 passed, 0 failed');
+-- Query session state
+SELECT key, value FROM session_state WHERE key LIKE 'gate_%' OR key LIKE 'session_%';
 ```
 ## OneCrawl Version Bump Checklist
-1. Update version in `packages/onecrawl-rust/Cargo.toml`
-2. `cargo check --workspace` (propagates to all crates)
-3. Commit with `chore: bump version to vX.X.X-alpha.XX`
-4. `git tag vX.X.X-alpha.XX`
-5. `git push origin main --tags`
-6. Update `packages/onecrawl/package.json` version
-7. `node scripts/sync-assets.js`
-8. `npm publish --tag alpha --access public`
+- [ ] Update version in `packages/onecrawl-rust/Cargo.toml`
+- [ ] `cargo check --workspace` (propagates to all crates)
+- [ ] Commit: `git commit -F /tmp/bump-msg.txt` with Co-authored-by trailer
+- [ ] Tag: `git tag v4.0.0-beta.X`
+- [ ] Push: `git push origin main --tags`
+- [ ] Update `packages/onecrawl/package.json` version
+- [ ] `node scripts/sync-assets.js`
+- [ ] `npm publish --tag beta --access public`
+## Session Health Snapshot Commands
+```bash
+# Capture health snapshot for session log
+./target/release/onecrawl --version
+ls -la ./target/release/onecrawl | awk '{print $5}'  # Binary size
+ls /tmp/onecrawl-session*.json 2>/dev/null | wc -l    # Active sessions
+git --no-pager log --oneline -5                        # Recent commits
+```
 ## Done Criteria
-- Session journal file exists with all required sections.
-- All status fields are accurate.
+- Session journal exists with all required sections filled.
+- All status fields are accurate and timestamped.
+- Completion gate evidence includes exact counts (not "all passed").
 ## Anti-patterns
 - Missing completion gate evidence
 - Undocumented decisions
 - Stale version numbers
+- Using approximate test counts ("~500 passed")
+- Missing GitHub sync SHAs and CI run IDs
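
Because the session-logging skill asks for exact counts in the completion gate evidence, one way to obtain them is to sum the per-binary `test result:` lines that `cargo test` prints. The helper below is a sketch under that assumption; `sum_test_results` is a hypothetical name and reads test output on stdin.

```bash
#!/usr/bin/env bash
# sum_test_results: read `cargo test` output on stdin and print workspace totals.
# cargo prints one "test result:" line per test binary, so the counts are summed.
sum_test_results() {
  awk '
    /^test result:/ {
      for (i = 2; i <= NF; i++) {
        if ($i == "passed;")  passed  += $(i - 1)
        if ($i == "failed;")  failed  += $(i - 1)
        if ($i == "ignored;") ignored += $(i - 1)
      }
    }
    END { printf "%d passed, %d failed, %d ignored\n", passed, failed, ignored }
  '
}
```

Piping the gate command into it, for example `cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1 2>&1 | sum_test_results`, yields a single line suitable for the "Tests:" field of the session template.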

package/assets/skills/systematic-debugging/SKILL.md CHANGED Viewed

@@ -20,6 +20,11 @@ Replace blind trial-and-error debugging with a structured, evidence-based proces
 2. Capture the **initial error output** (stack trace, console error, failing test output).
 3. If the bug is intermittent, note frequency and conditions.
+```bash
+# OneCrawl: reproduce a failing test with verbose output
+RUST_LOG=debug cargo test -p onecrawl-cdp -- failing_test_name --test-threads=1 --nocapture 2>&1 | tee debug-data.log
+```
 ### Step 2 — Hypothesize
 1. Based on the error and context, form **max 3 ranked hypotheses**.
 2. For each hypothesis state:
@@ -31,26 +36,37 @@ Replace blind trial-and-error debugging with a structured, evidence-based proces
 ### Step 3 — Instrument
 1. Add **targeted logging** at the locations identified in hypotheses.
 2. Use structured prefixes so output is parseable:
+   ```rust
+   // Rust: use eprintln! for debug (goes to stderr)
+   eprintln!("[DEBUG:H1:session_start] session_name={}, pid={}", name, pid);
+   ```
    ```
    [DEBUG:H1:functionName] variable = value
    [DEBUG:H2:moduleName] state = value
    ```
 3. Log **inputs, outputs, and intermediate state** — not just "reached here".
-4. For async code, include timestamps:
-   ```
-   [DEBUG:H1:fetch] ${new Date().toISOString()} response.status = ${res.status}
+4. For async Rust code, include context about the await point:
+   ```rust
+   eprintln!("[DEBUG:H1:cdp_send] before await, cmd={}", cmd);
+   let result = cdp.send(cmd).await;
+   eprintln!("[DEBUG:H1:cdp_send] after await, result={:?}", result);
    ```
 ### Step 4 — Collect
 1. Run the code / test that triggers the bug.
 2. Pipe **all debug output** to `debug-data.log`:
-   - Terminal: `node app.js 2>&1 | tee debug-data.log`
-   - Test runner: `npm test 2>&1 | tee debug-data.log`
-   - Browser: copy console output into `debug-data.log`
+   ```bash
+   # Rust test with full output
+   RUST_LOG=debug cargo test -p onecrawl-cdp -- test_name --test-threads=1 --nocapture 2>&1 | tee debug-data.log
+   # OneCrawl CLI operation
+   RUST_LOG=debug onecrawl session start -H 2>&1 | tee debug-data.log
+   ```
 3. If using OneCrawl MCP tools, also collect:
    - `onecrawl run browser get_console_messages` → append to `debug-data.log`
    - `onecrawl har drain` → append failed/relevant requests
    - `onecrawl health` → append daemon/session state
+   - `onecrawl config show` → append configuration
 ### Step 5 — Analyze
 1. Read `debug-data.log` and correlate with hypotheses.
@@ -64,12 +80,16 @@ Replace blind trial-and-error debugging with a structured, evidence-based proces
 ### Step 6 — Fix
 1. Apply the **minimal fix** that addresses the confirmed root cause.
 2. Re-run the failing test / reproduction steps to verify the fix.
-3. Ensure no regressions: run the full test suite if available.
+3. Run full suite to ensure no regressions:
+   ```bash
+   cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1
+   ```
 ### Step 7 — Clean Up
-1. Remove **all** `[DEBUG:...]` logging added in Step 3.
+1. Remove **all** `[DEBUG:...]` logging / `eprintln!` added in Step 3.
 2. Delete `debug-data.log`.
 3. Commit only the fix, not the debug instrumentation.
+4. Use `git diff --cached` to verify no debug code is staged.
 ## MCP Tool Integration
@@ -87,17 +107,28 @@ When available, prefer OneCrawl MCP tools over manual instrumentation:
 | `onecrawl config show` | Verify configuration values |
 | `onecrawl page-watcher drain` | Monitor DOM changes |
 | `onecrawl network-log drain` | Live network traffic |
-| `cargo test -p <crate> -- test_name` | Run specific failing test |
-| `RUST_LOG=debug cargo test` | Verbose test output |
+| `cargo test -p <crate> -- test_name --nocapture` | Run specific failing test with output |
+| `RUST_LOG=debug cargo test` | Verbose Rust test output |
+## OneCrawl-Specific Debug Checklist
+- [ ] Check daemon status: `onecrawl health`
+- [ ] Check session files: `ls /tmp/onecrawl-session*.json`
+- [ ] Check Chrome processes: `pgrep -la chrome | grep remote-debugging`
+- [ ] Check port conflicts: `lsof -i :9222` (default CDP port)
+- [ ] Check Rust toolchain: `rustc --version && cargo --version`
+- [ ] VecDeque: using `.back()` not `.last()`?
+- [ ] `rand::rng()`: scoped in sync block before `.await`?
+- [ ] Serde: using `match` not `unwrap_or_default()` on parse?
 ## Escalation
 If after **2 instrumentation → analysis cycles** the root cause is still unclear:
 1. Document all collected evidence in `debug-data.log`
 2. Present findings to the user with the data
-3. Ask whether to:
+3. Suggest options:
    - Broaden the investigation scope
    - Involve additional expertise (e.g., review architecture)
    - Accept a workaround while root cause is investigated
+4. If 3+ fix attempts fail, invoke rollback-rca skill.
 ## Done Criteria
 - Root cause identified and documented
@@ -110,7 +141,7 @@ If after **2 instrumentation → analysis cycles** the root cause is still uncle
 - **Blind changes** — modifying code without evidence of what is wrong
 - **Skipping log collection** — "fixing" based on guesses without data
 - **Shotgun debugging** — changing multiple things simultaneously
-- **Leaving debug code** — forgetting to remove `console.log` instrumentation
+- **Leaving debug code** — forgetting to remove `eprintln!` instrumentation
 - **Symptom fixing** — addressing the visible error instead of the root cause
 - **Infinite retry loops** — more than 3 fix attempts without re-analyzing from data
 - **No reproduction** — attempting to fix without confirming the bug exists
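
The structured `[DEBUG:Hn:location]` prefixes from Step 3 of the systematic-debugging skill make the collected log easy to summarize per hypothesis. The following is a minimal sketch; `count_by_hypothesis` is a hypothetical helper and assumes the log uses exactly that prefix format.

```bash
#!/usr/bin/env bash
# count_by_hypothesis FILE: print "Hn=count" for each hypothesis tag found in FILE.
count_by_hypothesis() {
  grep -o '\[DEBUG:H[0-9]*:' "$1" \
    | sort \
    | uniq -c \
    | awk '{
        tag = $2
        sub(/^\[DEBUG:/, "", tag)   # strip the leading "[DEBUG:"
        sub(/:$/, "", tag)          # strip the trailing ":"
        print tag "=" $1
      }'
}
```

Running `count_by_hypothesis debug-data.log` before reading the full log shows which hypotheses actually produced evidence, so only the relevant lines need to be returned to the model context.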

package/assets/skills/testing-policy/SKILL.md CHANGED Viewed

@@ -20,17 +20,21 @@ Guarantee deterministic, CI-ready quality verification with no regressions and n
 ## OneCrawl Test Commands
 ```bash
-# Full test suite (excludes E2E and PyO3 — known linker issue in test mode)
+# Full test suite — 573 tests (excludes E2E and PyO3)
 cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1
-# Single crate (fast iteration)
+# Single crate (fast iteration during development)
 cargo test -p onecrawl-cli-rs -- --test-threads=1
 cargo test -p onecrawl-cdp -- --test-threads=1
 cargo test -p onecrawl-mcp-rs -- --test-threads=1
+cargo test -p onecrawl-crypto -- --test-threads=1
-# Run specific test
+# Run specific test by name
 cargo test -p onecrawl-cli-rs -- test_name --test-threads=1
+# Verbose output for debugging failures
+RUST_LOG=debug cargo test -p onecrawl-cdp -- test_name --test-threads=1 --nocapture
 # Clippy as static analysis layer
 cargo clippy --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- -W clippy::all
 ```
@@ -39,33 +43,57 @@ cargo clippy --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- -W
 | Crate | Tests | Scope |
 |-------|-------|-------|
-| `onecrawl-cli-rs` | Unit | CLI dispatch, config, parsing |
-| `onecrawl-cdp` | Unit | CDP protocol, stealth, JS evaluation |
-| `onecrawl-mcp-rs` | Unit | MCP server, tool dispatch |
-| `onecrawl-core` | Unit | Shared types, utilities |
-| `onecrawl-crypto` | Unit | Encryption, TOTP, PKCE |
-| `onecrawl-parser` | Unit + Doc | HTML parsing, accessibility tree |
-| `onecrawl-storage` | Unit | KV storage, encrypted store |
-| `onecrawl-server` | Unit | HTTP API server |
+| `onecrawl-cli-rs` | Unit | CLI dispatch, config, session mgmt, daemon control |
+| `onecrawl-cdp` | Unit | CDP protocol, stealth patches, JS evaluation |
+| `onecrawl-mcp-rs` | Unit | MCP server, 18 tools / 546+ actions dispatch |
+| `onecrawl-core` | Unit | Shared types, utilities, error types |
+| `onecrawl-crypto` | Unit | AES-256-GCM, TOTP, PKCE, PBKDF2 |
+| `onecrawl-parser` | Unit + Doc | HTML parsing, accessibility tree, selectors |
+| `onecrawl-storage` | Unit | KV storage, encrypted store, profile mgmt |
+| `onecrawl-server` | Unit | axum HTTP API server, tab lifecycle |
+| `onecrawl-browser` | None | **Vendor** crate — do not add tests, 1 clippy warning OK |
 | `onecrawl-e2e` | E2E | **Excluded** — requires running Chrome |
-| `onecrawl-python` | Unit | **Excluded** — PyO3 linker issue |
+| `onecrawl-python` | Unit | **Excluded** — PyO3 linker issue in test mode |
+| `onecrawl-napi` | None | Node.js bindings — tested via npm |
+## Baseline Capture & Regression Check
+```bash
+# Before changes — capture baseline test count
+cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1 2>&1 | grep "test result"
+# Prints one "test result:" line per test binary; sum the "passed" counts (baseline total: 573 passed, 0 failed)
+# After changes — compare
+cargo test --workspace --exclude onecrawl-e2e --exclude onecrawl-python -- --test-threads=1 2>&1 | grep "test result"
+# Must show: passed ≥ baseline, 0 failed
+```
 ## Procedure
-1. **Before changes**: run full suite and record baseline.
-2. Implement changes and add/update tests.
-3. **After changes**: rerun full suite and verify no regressions.
+1. **Before changes**: run full suite and record baseline test count.
+2. Implement changes and add/update tests for new/changed behavior.
+3. **After changes**: rerun full suite and verify:
+   - [ ] No regressions (0 failed)
+   - [ ] Test count ≥ baseline (no tests removed without justification)
+   - [ ] New tests added for new code paths
 4. If any previously passing test breaks, fix before completion.
 5. Ensure tests are deterministic, isolated, non-interactive, and CI-compatible.
 ## Key Rules
-- Use `--test-threads=1` — some tests share browser state.
+- **Always** use `--test-threads=1` — tests share browser/session state.
 - Never use `#[ignore]` to skip failing tests.
-- VecDeque: use `.back()` not `.last()`, `.iter().skip(n)` not `[n..]`.
+- `VecDeque`: use `.back()` not `.last()`, `.iter().skip(n)` not `[n..]`.
+- `rand::rng()` is `!Send` — scope all RNG in sync blocks before `.await`.
+- Don't use `unwrap_or_default()` on serde parse — use `match` with proper error.
+- Prefer CDP-native operations over JS evaluation in test assertions.
 ## Done Criteria
-- Full suite green, no regressions.
+- [ ] Full suite green (573+ tests), 0 failed, 0 regressions.
+- [ ] Clippy: 0 warnings in owned crates.
+- [ ] New/changed code has corresponding test coverage.
 ## Anti-patterns
-- Manual-only validation
-- Flaky timeout-driven E2E
+- Manual-only validation ("I checked it works")
+- Flaky timeout-driven tests (use wait-for conditions)
 - Merging with failing tests
+- Using `--test-threads=N` where N > 1 (causes flaky failures)
+- Adding `#[allow(unused)]` to silence test compilation warnings
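
The regression rule in the testing-policy procedure (0 failed, passed count at or above baseline) can be expressed as a small check. This is a sketch only: `check_regression` is a hypothetical name, and the three numbers are assumed to have been extracted from the test output beforehand.

```bash
#!/usr/bin/env bash
# check_regression BASELINE PASSED FAILED
# Succeeds only when nothing failed and the passed count has not dropped below baseline.
check_regression() {
  local baseline=$1 passed=$2 failed=$3
  if [ "$failed" -ne 0 ]; then
    echo "REGRESSION: $failed failed"
    return 1
  fi
  if [ "$passed" -lt "$baseline" ]; then
    echo "REGRESSION: $passed passed, below baseline $baseline"
    return 1
  fi
  echo "OK: $passed passed (baseline $baseline), 0 failed"
}
```

For instance, `check_regression 573 575 0` succeeds, while `check_regression 573 570 0` fails because tests were removed or no longer run.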

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "onecrawl",
-  "version": "4.0.0-beta.1",
+  "version": "4.0.0-beta.3",
   "description": "Browser automation engine — CLI, MCP server, and agent skills installer",
   "license": "BUSL-1.1",
   "author": "Giulio Leone <giulio@onecrawl.dev>",