npm - pi-crew - Versions diffs - 0.5.2 → 0.5.6 - Mend

pi-crew 0.5.2 → 0.5.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (137)

package/CHANGELOG.md +183 -0
package/README.md +17 -1
package/docs/architecture.md +2 -0
package/docs/bugs/cross-session-notification-leakage.md +82 -0
package/docs/coding-agent-optimization.md +268 -0
package/docs/deep-review-report.md +384 -0
package/docs/distillation/cybersecurity-patterns.md +294 -0
package/docs/migration-v0.4-v0.5.md +208 -0
package/docs/optimization-plan.md +642 -0
package/docs/pi-crew-v0.5.5-audit-fix-plan.md +133 -0
package/docs/pi-mono-opportunities.md +969 -0
package/docs/pi-mono-review.md +291 -0
package/docs/skills/REFERENCE.md +144 -0
package/package.json +12 -9
package/skills/artifact-analysis-loop/SKILL.md +302 -0
package/skills/async-worker-recovery/SKILL.md +19 -1
package/skills/child-pi-spawning/SKILL.md +19 -6
package/skills/context-artifact-hygiene/SKILL.md +19 -2
package/skills/delegation-patterns/SKILL.md +68 -3
package/skills/detection-pipeline-design/SKILL.md +285 -0
package/skills/event-log-tracing/SKILL.md +20 -6
package/skills/git-master/SKILL.md +20 -6
package/skills/hunting-investigation-loop/SKILL.md +401 -0
package/skills/incident-playbook-construction/SKILL.md +383 -0
package/skills/live-agent-lifecycle/SKILL.md +20 -6
package/skills/mailbox-interactive/SKILL.md +19 -6
package/skills/model-routing-context/SKILL.md +19 -1
package/skills/multi-perspective-review/SKILL.md +19 -4
package/skills/observability-reliability/SKILL.md +19 -2
package/skills/orchestration/SKILL.md +20 -2
package/skills/ownership-session-security/SKILL.md +20 -2
package/skills/pi-extension-lifecycle/SKILL.md +20 -2
package/skills/post-mortem/SKILL.md +7 -2
package/skills/read-only-explorer/SKILL.md +20 -6
package/skills/requirements-to-task-packet/SKILL.md +23 -3
package/skills/resource-discovery-config/SKILL.md +20 -2
package/skills/runtime-state-reader/SKILL.md +20 -2
package/skills/safe-bash/SKILL.md +21 -6
package/skills/scrutinize/SKILL.md +20 -2
package/skills/secure-agent-orchestration-review/SKILL.md +29 -2
package/skills/security-review/SKILL.md +560 -0
package/skills/state-mutation-locking/SKILL.md +22 -2
package/skills/systematic-debugging/SKILL.md +8 -6
package/skills/threat-hypothesis-framework/SKILL.md +175 -0
package/skills/ui-render-performance/SKILL.md +20 -2
package/skills/verification-before-done/SKILL.md +17 -2
package/skills/widget-rendering/SKILL.md +21 -6
package/skills/workspace-isolation/SKILL.md +20 -6
package/skills/worktree-isolation/SKILL.md +20 -6
package/src/agents/agent-config.ts +40 -1
package/src/benchmark/benchmark-runner.ts +45 -0
package/src/benchmark/feedback-loop.ts +5 -0
package/src/config/config.ts +32 -5
package/src/config/role-tools.ts +82 -0
package/src/config/suggestions.ts +8 -0
package/src/config/types.ts +4 -0
package/src/extension/async-notifier.ts +10 -1
package/src/extension/crew-cleanup.ts +114 -0
package/src/extension/cross-extension-rpc.ts +1 -1
package/src/extension/notification-router.ts +18 -0
package/src/extension/register.ts +27 -19
package/src/extension/registration/subagent-tools.ts +1 -1
package/src/extension/team-tool/anchor.ts +201 -0
package/src/extension/team-tool/api.ts +2 -1
package/src/extension/team-tool/auto-summarize.ts +154 -0
package/src/extension/team-tool/run.ts +42 -7
package/src/extension/team-tool.ts +44 -2
package/src/hooks/registry.ts +1 -3
package/src/observability/event-bus.ts +69 -0
package/src/observability/event-to-metric.ts +0 -2
package/src/runtime/anchor-manager.ts +473 -0
package/src/runtime/async-runner.ts +8 -4
package/src/runtime/auto-summarize.ts +350 -0
package/src/runtime/background-runner.ts +10 -3
package/src/runtime/budget-tracker.ts +354 -0
package/src/runtime/chain-runner.ts +507 -0
package/src/runtime/child-pi.ts +123 -35
package/src/runtime/crash-recovery.ts +5 -4
package/src/runtime/crew-agent-runtime.ts +1 -0
package/src/runtime/custom-tools/irc-tool.ts +13 -0
package/src/runtime/custom-tools/submit-result-tool.ts +3 -2
package/src/runtime/delivery-coordinator.ts +10 -3
package/src/runtime/dynamic-script-runner.ts +482 -0
package/src/runtime/foreground-control.ts +87 -17
package/src/runtime/handoff-manager.ts +589 -0
package/src/runtime/hidden-handoff.ts +424 -0
package/src/runtime/live-agent-manager.ts +20 -4
package/src/runtime/live-session-runtime.ts +39 -4
package/src/runtime/manifest-cache.ts +2 -1
package/src/runtime/model-resolver.ts +16 -4
package/src/runtime/phase-tracker.ts +373 -0
package/src/runtime/pi-args.ts +11 -1
package/src/runtime/pi-json-output.ts +31 -0
package/src/runtime/pipeline-runner.ts +514 -0
package/src/runtime/progress-tracker.ts +124 -0
package/src/runtime/retry-runner.ts +354 -0
package/src/runtime/sandbox.ts +252 -0
package/src/runtime/scheduler.ts +7 -2
package/src/runtime/skill-effectiveness.ts +473 -0
package/src/runtime/skill-instructions.ts +37 -3
package/src/runtime/subagent-manager.ts +1 -1
package/src/runtime/task-graph.ts +11 -1
package/src/runtime/task-runner.ts +92 -18
package/src/runtime/team-runner.ts +13 -12
package/src/runtime/tool-progress.ts +10 -3
package/src/runtime/verification-gates.ts +367 -0
package/src/schema/team-tool-schema.ts +37 -0
package/src/skills/discover-skills.ts +5 -0
package/src/state/active-run-registry.ts +9 -2
package/src/state/contracts.ts +9 -0
package/src/state/crew-init.ts +3 -3
package/src/state/decision-ledger.ts +98 -55
package/src/state/event-log-rotation.ts +2 -2
package/src/state/event-log.ts +144 -10
package/src/state/hook-instinct-bridge.ts +5 -5
package/src/state/mailbox.ts +10 -0
package/src/state/run-cache.ts +18 -8
package/src/state/state-store.ts +3 -1
package/src/state/types.ts +4 -0
package/src/tools/safe-bash-extension.ts +1 -0
package/src/tools/safe-bash.ts +152 -20
package/src/types/new-api-types.ts +34 -0
package/src/ui/agent-management-overlay.ts +5 -1
package/src/ui/crew-widget.ts +29 -15
package/src/ui/overlays/mailbox-detail-overlay.ts +13 -2
package/src/ui/powerbar-publisher.ts +101 -7
package/src/ui/tool-render.ts +15 -15
package/src/ui/transcript-cache.ts +13 -0
package/src/utils/bm25-search.ts +16 -8
package/src/utils/env-filter.ts +8 -5
package/src/utils/redaction.ts +169 -15
package/src/utils/session-utils.ts +52 -0
package/src/utils/sse-parser.ts +10 -1
package/src/worktree/cleanup.ts +6 -1
package/src/worktree/worktree-manager.ts +32 -13
package/workflows/chain.workflow.md +252 -0
package/workflows/pipeline.workflow.md +27 -0

package/skills/incident-playbook-construction/SKILL.md ADDED

@@ -0,0 +1,383 @@
+---
+name: incident-playbook-construction
+description: "Build structured incident response playbooks and runbooks."
+triggers:
+  - "build playbook"
+  - "create runbook"
+  - "document procedure"
+  - "IR automation"
+  - "SOAR design"
+---
+# incident-playbook-construction
+Use this skill when building incident response playbooks and structured procedures.
+## Source
+Distilled from `building-incident-response-playbook` (Anthropic Cybersecurity Skills) and generalized for software/team context.
+## When to Use
+- Creating new incident response procedures
+- Documenting procedures for a specific incident type
+- Automating response workflows
+- Preparing for compliance audits (SOC 2, HIPAA, PCI-DSS)
+- Conducting gap analysis of existing response capabilities
+## Playbook Structure
+```yaml
+playbook:
+  name: string                    # e.g., "data-breach-response"
+  version: string                # e.g., "1.0.0"
+  trigger:
+    conditions:
+      - name: string
+        description: string
+  scope:
+    affected:
+      - [files, systems, teams]
+    not_in_scope:
+      - [what this playbook doesn't cover]
+  steps:
+    - id: number
+      name: string
+      action: string             # What to do
+      verify: string             # How to confirm success
+      on_success: next_step_id    # or "close"
+      on_failure: escalation_id   # or "abort"
+  decision_tree:
+    - condition: string          # e.g., "data_encrypted == true"
+      branches:
+        yes: step_id
+        no: step_id
+  escalation:
+    - id: number
+      condition: string
+      action: string              # notify, escalate, abort
+      notify: [roles]
+  raci:
+    responsible: [role]
+    accountable: [role]
+    consulted: [role]
+    informed: [role]
+  sla:
+    detection: duration          # e.g., "15m"
+    containment: duration
+    eradication: duration
+    recovery: duration
+    lessons_learned: duration
+```
+## Playbook Workflow
+```markdown
+## Playbook Construction Process
+1. **Identify Type** → [bug-fix, security-incident, outage, data-breach, supply-chain]
+2. **Define Scope** → [affected files, teams, systems]
+3. **Document Steps** → Numbered procedure: [step → action → verification]
+4. **Add Decisions** → Branch points: [if X then Y else Z]
+5. **Specify Roles** → RACI: [Responsible, Accountable, Consulted, Informed]
+6. **Set SLAs** → Time-based thresholds: [P1: 1hr, P2: 4hr, P3: 24hr]
+7. **Add Automation** → Auto-trigger conditions: [if X then run Y]
+8. **Test** → Validate with [scenario simulation]
+```
+## Step Definition
+Each step follows this structure:
+```yaml
+step:
+  id: 1                          # Sequential ID
+  name: "Contain the incident"  # Human-readable name
+  action: |
+    # Concrete actions to take
+    1. Isolate affected system
+    2. Preserve evidence
+    3. Notify team
+  verify: |
+    # How to confirm step completed
+    - System isolated: check network connections
+    - Evidence preserved: snapshot taken
+    - Team notified: ack received
+  tools:
+    - name: string
+      command: string
+  artifacts:
+    output: [files created by this step]
+  rollback: |
+    # How to undo this step if needed
+    reconnect_system()
+  next:
+    success: 2                   # Next step ID on success
+    failure: "escalate"          # Or step ID, or "abort"
+```
+## Decision Tree Patterns
+### Branch by Severity
+```yaml
+decision:
+  name: severity-assessment
+  condition: "incident.severity"
+  branches:
+    P1:
+      - step: 2                  # Immediate containment
+        notify: [lead, manager]
+      - step: 5                  # War room
+    P2:
+      - step: 3                  # Standard response
+        notify: [lead]
+    P3:
+      - step: 4                  # Low priority
+        notify: [team]
+    P4:
+      - step: 6                  # Backlog
+        notify: []
+```
+### Branch by Type
+```yaml
+decision:
+  name: incident-type-assessment
+  condition: "incident.type"
+  branches:
+    data-breach:
+      - step: data_containment
+      - step: legal_notification
+      - step: affected_users_notification
+    security-compromise:
+      - step: isolate_system
+      - step: preserve_evidence
+      - step: forensic_investigation
+    outage:
+      - step: assess_impact
+      - step: restore_service
+      - step: post-mortem
+    bug:
+      - step: reproduce
+      - step: fix
+      - step: verify
+```
+## RACI Matrix
+```yaml
+raci:
+  roles:
+    - name: orchestrator
+      responsible: [coordinate, dispatch]
+      accountable: [decisions, outcomes]
+    - name: executor
+      responsible: [implement, investigate]
+      accountable: []
+    - name: verifier
+      responsible: [test, validate]
+      accountable: []
+    - name: lead
+      responsible: []
+      accountable: [escalation, resource_allocation]
+    - name: manager
+      responsible: []
+      accountable: [approval, external_communication]
+```
+## SLA Definition
+```yaml
+sla:
+  phases:
+    - name: detection
+      target: "15m"              # Time to detect/acknowledge
+      measured_from: incident_start
+    - name: containment
+      target: "1h"
+      measured_from: detection_complete
+    - name: eradication
+      target: "4h"
+      measured_from: containment_complete
+    - name: recovery
+      target: "24h"
+      measured_from: eradication_complete
+    - name: lessons_learned
+      target: "72h"
+      measured_from: recovery_complete
+  escalation:
+    - phase: detection
+      exceeded: notify_lead
+    - phase: containment
+      exceeded: notify_manager
+    - phase: recovery
+      exceeded: notify_executive
+```
+## Playbook Examples
+### Example 1: Security Incident Playbook
+```yaml
+playbook:
+  name: security-compromise-response
+  trigger:
+    conditions:
+      - name: unauthorized_access
+        description: Access by unauthorized user
+      - name: malware_detection
+        description: Malware or suspicious process
+      - name: data_exfiltration
+        description: Abnormal data transfer
+  steps:
+    - id: 1
+      name: Detect and confirm
+      action: |
+        - Review logs for unauthorized access
+        - Confirm malware detection with secondary tool
+        - Identify scope of compromise
+      verify: |
+        - Logs show compromise indicators
+        - Malware confirmed by 2+ tools
+      next:
+        success: 2
+        failure: abort
+    - id: 2
+      name: Contain
+      action: |
+        - Isolate affected system from network
+        - Preserve evidence (memory dump, disk image)
+        - Block malicious IPs/domains
+      verify: |
+        - System isolated: no network connections
+        - Evidence preserved: hash verified
+      next:
+        success: 3
+        failure: escalate
+    - id: 3
+      name: Investigate
+      action: |
+        - Determine attack vector
+        - Identify affected systems
+        - Timeline reconstruction
+      next:
+        success: 4
+    - id: 4
+      name: Eradicate
+      action: |
+        - Remove malware/backdoor
+        - Patch vulnerability
+        - Reset compromised credentials
+      next:
+        success: 5
+    - id: 5
+      name: Recover
+      action: |
+        - Restore from clean backup
+        - Verify system integrity
+        - Monitor for recurrence
+      next:
+        success: close
+  sla:
+    detection: "15m"
+    containment: "1h"
+    eradication: "4h"
+    recovery: "24h"
+```
+### Example 2: Bug Fix Playbook
+```yaml
+playbook:
+  name: bug-fix-response
+  trigger:
+    conditions:
+      - name: regression
+        description: Existing feature broken
+      - name: new_bug
+        description: Newly reported issue
+  steps:
+    - id: 1
+      name: Reproduce
+      action: |
+        - Get reliable repro steps
+        - Verify bug exists
+        - Document environment
+      verify: |
+        - Bug reproduced consistently OR
+        - Bug confirmed as flaky (>50% reproduction)
+      next:
+        success: 2
+    - id: 2
+      name: Investigate
+      action: |
+        - Find root cause
+        - Identify affected code
+        - Determine fix approach
+      verify: |
+        - Root cause identified (not hypothesis)
+      next:
+        success: 3
+    - id: 3
+      name: Fix
+      action: |
+        - Implement fix
+        - Write regression test
+        - Update documentation
+      next:
+        success: 4
+    - id: 4
+      name: Verify
+      action: |
+        - Run original repro
+        - Run full test suite
+        - Code review
+      next:
+        success: close
+  sla:
+    detection: "30m"
+    containment: "1h"
+    eradication: "4h"
+    recovery: "24h"
+```
+## Enforcement — Incident Playbook Construction Gate
+**Before publishing a playbook, verify:**
+- [ ] Trigger conditions defined (knowing when to use this playbook)
+- [ ] Each step has action + verification + next (success/failure) defined
+- [ ] Decision points have explicit branches (if X then Y else Z)
+- [ ] RACI matrix assigns responsible/accountable roles
+- [ ] SLA phases defined with targets and escalation conditions
+- [ ] Rollback procedures documented for critical steps
+If ANY answer is NO → Stop. Complete playbook structure before publishing.
+## Anti-Patterns
+- **Don't** create playbooks without trigger conditions (don't know when to use)
+- **Don't** skip verification steps (can't confirm success)
+- **Don't** skip rollback procedures (can't undo mistakes)
+- **Don't** skip decision points (linear playbooks miss branches)
+- **Don't** skip SLA definition (no accountability for timing)
+## Tools
+| Tool | Purpose |
+|------|---------|
+| `post-mortem` | Post-incident documentation |
+| `systematic-debugging` | Root cause investigation |
+| `verification-before-done` | Step verification |
+## Verification
+For playbook changes:
+```bash
+cd pi-crew
+npx tsc --noEmit
+node --experimental-strip-types --test test/unit/playbook-validation.test.ts
+```
+*See also: `post-mortem` skill for post-incident documentation, `delegation-patterns` for escalation matrix.*
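
The enforcement gate in this skill can be checked mechanically. The following is a minimal sketch, not part of the package: the `Playbook` and `PlaybookStep` types and the `checkPlaybookGate` function are hypothetical names that mirror the YAML structure shown in the diff, and only the trigger, step, and SLA items of the gate are covered.

```typescript
// Hypothetical types mirroring the YAML playbook structure (not pi-crew's API).
interface StepNext {
  success?: number | "close";
  failure?: number | "escalate" | "abort";
}
interface PlaybookStep {
  id: number;
  name: string;
  action?: string;
  verify?: string;
  next?: StepNext;
}
interface Playbook {
  name: string;
  trigger?: { conditions: { name: string; description: string }[] };
  steps: PlaybookStep[];
  sla?: Record<string, string>;
}

// Returns human-readable gate failures; an empty array means the checked items pass.
function checkPlaybookGate(pb: Playbook): string[] {
  const problems: string[] = [];
  if (!pb.trigger || pb.trigger.conditions.length === 0) {
    problems.push("no trigger conditions defined");
  }
  const ids = new Set(pb.steps.map((s) => s.id));
  for (const step of pb.steps) {
    if (!step.action) problems.push(`step ${step.id}: missing action`);
    if (!step.verify) problems.push(`step ${step.id}: missing verify`);
    if (!step.next || step.next.success === undefined) {
      problems.push(`step ${step.id}: missing next.success`);
    } else if (typeof step.next.success === "number" && !ids.has(step.next.success)) {
      // A numeric target must refer to a step that exists in this playbook.
      problems.push(`step ${step.id}: next.success points at unknown step ${step.next.success}`);
    }
    if (!step.next || step.next.failure === undefined) {
      problems.push(`step ${step.id}: missing next.failure`);
    }
  }
  if (!pb.sla || Object.keys(pb.sla).length === 0) {
    problems.push("no SLA phases defined");
  }
  return problems;
}
```

Applied to Example 1 in the diff, a check like this would flag steps 3 to 5, which define `action` and `next.success` but omit `verify` and `next.failure`.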

package/skills/live-agent-lifecycle/SKILL.md CHANGED

@@ -1,8 +1,14 @@
 ---
 name: live-agent-lifecycle
-description: Live agent registration, workspace isolation, termination, and eviction workflow. Use when tracking live agents, debugging ghost agents, or understanding workspace boundaries.
----
+description: "Live agent registration, workspace isolation, termination, and eviction workflow."
+triggers:
+  - "register agent"
+  - "terminate agent"
+  - "evict stale"
+  - "ghost agent"
+  - "workspace isolation"
+---
 # live-agent-lifecycle
 Live agents are real-time, in-memory worker sessions managed by `LiveAgentManager` (`src/runtime/live-agent-manager.ts`). They are distinct from `CrewAgentRecord` files on disk — live agents provide real-time activity (tool names, response text, turn count) while agent records are durable snapshots.
@@ -33,8 +39,6 @@ interface LiveAgentHandle {
 The in-memory `liveAgents` Map stores all active handles. It is never persisted — on Pi restart, the Map is empty and agents are re-created from agent records.
----
 ## Registration
 `registerLiveAgent(input, eventLogFn?, eventsPath?)` is called when a live session worker starts. It:
@@ -49,8 +53,6 @@ Key caller sites:
 - `live-executor.ts` — when spawning a live task
 - (workspaceId is passed through the entire call chain)
----
 ## Workspace Isolation
 **`workspaceId: string`** field is the workspace boundary. Set to `manifest.cwd` at registration time.
@@ -159,6 +161,18 @@ task.completed → upsertCrewAgent → agents.json updated
 ---
+## Enforcement — Live Agent Lifecycle Gate
+**Before terminating or evicting live agents, verify:**
+- [ ] Agent handle status is terminal (not running/queued/waiting) for eviction
+- [ ] Handle age exceeds STALE_HANDLE_MS (10 minutes) for eviction
+- [ ] workspaceId matches current workspace for cross-workspace prevention
+- [ ] Agent record sync completed before handle eviction (upsertCrewAgent called)
+- [ ] Termination called in all exit paths (finally blocks, crash-recovery)
+If ANY answer is NO → Stop. Verify lifecycle state before mutation.
 ## Anti-patterns
 - **Missing termination on error path**: If a live agent crashes and `terminateLiveAgent` is not called, the handle stays in the Map forever with status "running". Use `finally` blocks or crash-recovery to ensure termination.
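
The eviction conditions in the gate above (terminal status, age beyond `STALE_HANDLE_MS`, matching `workspaceId`) can be sketched as a single predicate. This is an illustration under assumptions: `HandleLike`, `canEvict`, the `updatedAtMs` field, and the exact set of terminal status names are hypothetical; the real `LiveAgentHandle` in `src/runtime/live-agent-manager.ts` may differ.

```typescript
// Hypothetical, reduced handle shape; status names are assumed for illustration.
type HandleStatus = "queued" | "running" | "waiting" | "completed" | "failed" | "cancelled";

interface HandleLike {
  agentId: string;
  workspaceId: string;
  status: HandleStatus;
  updatedAtMs: number; // assumed timestamp of the last status change
}

// 10 minutes, the value quoted in the enforcement gate.
const STALE_HANDLE_MS = 10 * 60 * 1000;

const TERMINAL_STATUSES: ReadonlySet<HandleStatus> = new Set<HandleStatus>([
  "completed",
  "failed",
  "cancelled",
]);

// True only when every eviction condition from the gate holds.
function canEvict(handle: HandleLike, currentWorkspaceId: string, nowMs: number): boolean {
  if (handle.workspaceId !== currentWorkspaceId) return false; // never touch other workspaces
  if (!TERMINAL_STATUSES.has(handle.status)) return false; // running/queued/waiting stay
  return nowMs - handle.updatedAtMs > STALE_HANDLE_MS;
}
```

The sketch does not cover the record-sync item of the gate; a caller would still need to confirm that the durable agent record was updated before removing the handle.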

package/skills/mailbox-interactive/SKILL.md CHANGED

@@ -1,8 +1,13 @@
 ---
 name: mailbox-interactive
-description: Interactive waiting-task and mailbox workflow. Use when implementing or operating respond/nudge/ack/replay/supervisor-contact behavior.
+description: "Interactive waiting-task and mailbox workflow."
+triggers:
+  - "respond to worker"
+  - "nudge agent"
+  - "mailbox message"
+  - "supervisor contact"
+  - "waiting task"
 ---
 # mailbox-interactive
 Use this skill for live coordination between leader and workers. Mailbox provides an asynchronous message protocol for steer, follow-up, respond, and nudge operations.
@@ -276,6 +281,18 @@ function verifyRunOwnership(manifest: TeamRunManifest, sessionId: string, force
 }
 ```
+## Enforcement — Mailbox Interactive Gate
+**Before responding to or mutating mailbox state, verify:**
+- [ ] Target task status is "waiting" (respond only works on waiting tasks)
+- [ ] ownerSessionId matches current session (ownership verified)
+- [ ] Run status is not terminal (do not respond to completed/failed/cancelled)
+- [ ] Corrupt JSONL handled gracefully (skip malformed lines)
+- [ ] Backpressure respected (queue depth below MAX_PENDING limits)
+If ANY answer is NO → Stop. Verify mailbox state before mutating.
 ## Anti-patterns
 - **Resuming non-waiting tasks**: `respond` only works on `waiting` tasks. Resuming `running` tasks corrupts state.
@@ -285,8 +302,6 @@ function verifyRunOwnership(manifest: TeamRunManifest, sessionId: string, force
 - **Not handling corrupt JSONL**: Skip malformed lines; don't fail the whole read.
 - **Losing pending messages on session switch**: Pending steers/followups are stored in-memory in the handle. They survive session fork but not session death.
----
 ## Source patterns
 - `src/state/mailbox.ts` — appendInboxMessage, appendSteeringMessage, readMailboxMessages
@@ -297,8 +312,6 @@ function verifyRunOwnership(manifest: TeamRunManifest, sessionId: string, force
 - `src/runtime/supervisor-contact.ts` — parseSupervisorContactFromLine, recordSupervisorContact
 - `src/ui/overlays/mailbox-detail-overlay.ts` — mailbox UI
----
 ## Verification
 ```bash

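The "skip malformed lines" rule for corrupt JSONL in the mailbox gate can be sketched as a lenient reader. `parseJsonlLenient` is a hypothetical helper written for illustration; it is not the `readMailboxMessages` implementation from `src/state/mailbox.ts`.

```typescript
// Parses JSON Lines text, skipping blank and malformed lines instead of throwing.
function parseJsonlLenient(text: string): { records: unknown[]; skipped: number } {
  const records: unknown[] = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // blank lines are not counted as corruption
    try {
      records.push(JSON.parse(trimmed));
    } catch {
      skipped += 1; // one bad line must not fail the whole read
    }
  }
  return { records, skipped };
}
```

Returning the skipped count lets a caller log or surface corruption without losing the valid messages.
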
package/skills/model-routing-context/SKILL.md CHANGED

@@ -1,8 +1,14 @@
 ---
 name: model-routing-context
 description: Model routing, parent context, thinking level, and prompt construction workflow. Use when changing model fallback, child Pi args, inherited context, task prompts, or compact-read behavior.
----
+triggers:
+  - "change model"
+  - "parent context"
+  - "thinking level"
+  - "task prompts"
+  - "compact read"
+---
 # model-routing-context
 Use this skill when working on model/context propagation.
@@ -22,6 +28,18 @@ Use this skill when working on model/context propagation.
 - When changing model precedence, add tests for undefined, empty, whitespace, agent, task, parent, and explicit tool override cases.
 - Redact secrets in context snippets and child prompts where logs/artifacts may persist them.
+## Enforcement — Model Routing Context Gate
+**Before changing model precedence or building task prompts, verify:**
+- [ ] Empty/whitespace model values treated as absent (not as explicit overrides)
+- [ ] Model precedence chain understood: tool override → step model → team role → agent model → parent → registry default
+- [ ] Thinking level suffix applied correctly (or intentionally omitted)
+- [ ] Secrets redacted in context snippets and child prompts
+- [ ] Tests cover: undefined, empty, whitespace, agent, task, parent, and explicit tool override cases
+If ANY answer is NO → Stop. Verify model routing before proceeding.
 ## Anti-patterns
 - Letting `agentModel: ""` block parent model fallback.
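
The precedence chain in the gate above (tool override → step model → team role → agent model → parent → registry default) together with the "empty means absent" rule can be sketched as follows. The `ModelCandidates` shape and both function names are assumptions for illustration, not the API of `src/runtime/model-resolver.ts`.

```typescript
// Hypothetical bag of candidate model names, listed in precedence order.
interface ModelCandidates {
  toolOverride?: string;
  stepModel?: string;
  teamRoleModel?: string;
  agentModel?: string;
  parentModel?: string;
  registryDefault: string;
}

// Treats undefined, empty, and whitespace-only values as absent.
function normalizeModel(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Walks the chain and returns the first candidate that is actually set.
function resolveModel(c: ModelCandidates): string {
  const chain = [c.toolOverride, c.stepModel, c.teamRoleModel, c.agentModel, c.parentModel];
  for (const candidate of chain) {
    const model = normalizeModel(candidate);
    if (model !== undefined) return model;
  }
  return c.registryDefault;
}
```

With this shape, `agentModel: ""` falls through to the parent model instead of blocking it, which is the first anti-pattern listed above.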

package/skills/multi-perspective-review/SKILL.md CHANGED

@@ -1,8 +1,13 @@
 ---
 name: multi-perspective-review
-description: "Multi-perspective code review with simpler-alternative pass. Use when reviewing a plan, diff, implementation, worker output, release candidate, or external feedback. Triggers: review this, look at this, LGTM check, sanity check, audit this, get a second opinion, check this PR, examine this code."
+description: "Multi-perspective code review with simpler-alternative pass."
+triggers:
+  - "review this"
+  - "look at this"
+  - "LGTM check"
+  - "sanity check"
+  - "check this PR"
 ---
 # multi-perspective-review
 Core principle: review early, review often, and separate concerns. Reviewer output is evidence to evaluate, not an instruction to obey blindly.
@@ -23,8 +28,6 @@ Before running any review passes, ask:
 This is the most valuable finding you can produce — surfacing unnecessary complexity before reviewing its details.
----
 ## Review Passes
 Run relevant passes separately:
@@ -154,6 +157,18 @@ When receiving feedback:
 5. Test each fix and verify no regressions.
 6. Push back with evidence if the suggestion is wrong, out of scope, or violates user decisions.
+## Enforcement — Multi-Perspective Review Gate
+**Before reporting review findings, verify:**
+- [ ] Simpler-alternative pass completed first (delete, use existing, smaller change, different layer)
+- [ ] Findings include: severity, path/symbol, evidence, impact, fix, verification
+- [ ] No rubber-stamps (if nothing found, state what was traced)
+- [ ] Critical/high findings have actionable fixes before proceeding
+- [ ] Verdict stated: ship / fix-then-ship / rework / reject
+If ANY answer is NO → Stop. Complete review requirements before reporting.
 ## Rules
 - Do not use performative agreement; act or give technical reasoning.

package/skills/observability-reliability/SKILL.md CHANGED

@@ -1,8 +1,13 @@
 ---
 name: observability-reliability
-description: Metrics, diagnostics, correlation, retry, deadletter, and recovery evidence workflow. Use when adding reliability features or investigating failures.
+description: "Metrics, diagnostics, correlation, retry, deadletter, and recovery evidence workflow."
+triggers:
+  - "add metrics"
+  - "diagnose failure"
+  - "retry logic"
+  - "deadletter"
+  - "recovery evidence"
 ---
 # observability-reliability
 Use this skill for reliability and observability work.
@@ -24,6 +29,18 @@ Use this skill for reliability and observability work.
 - Heartbeat classification should be threshold-based and should ignore terminal tasks/runs.
 - Overflow recovery should track phase progression and terminal states without repeatedly alerting on completed work.
+## Enforcement — Observability Reliability Gate
+**Before emitting metrics or implementing retry, verify:**
+- [ ] Metric labels are low-cardinality (no raw paths, prompts, or secrets)
+- [ ] Secrets redacted before writing logs, events, diagnostics, or bundles
+- [ ] Retry records attempts and deadletters on exhaustion
+- [ ] Diagnostics are safe to share (no secrets, no raw sensitive data)
+- [ ] Heartbeat thresholds ignore terminal tasks/runs
+If ANY answer is NO → Stop. Fix observability issues before proceeding.
 ## Anti-patterns
 - High-cardinality Prometheus labels.
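
The gate item "retry records attempts and deadletters on exhaustion" can be sketched as below. This is a simplified, synchronous illustration with no backoff; `runWithRetry`, `AttemptRecord`, and `DeadLetter` are hypothetical names and do not describe `src/runtime/retry-runner.ts`.

```typescript
interface AttemptRecord {
  attempt: number;
  error: string;
}
interface DeadLetter {
  operation: string; // keep this a short, low-cardinality label (no paths or prompts)
  attempts: AttemptRecord[];
}

// Runs fn up to maxAttempts times; on exhaustion, pushes a dead letter and returns undefined.
function runWithRetry<T>(
  operation: string,
  fn: () => T,
  maxAttempts: number,
  deadletters: DeadLetter[],
): T | undefined {
  const attempts: AttemptRecord[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn();
    } catch (err) {
      // Record every failure so the dead letter carries the full history.
      attempts.push({ attempt, error: err instanceof Error ? err.message : String(err) });
    }
  }
  deadletters.push({ operation, attempts });
  return undefined;
}
```

Error messages stored this way would still need redaction before being written to logs or bundles, as the gate requires.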

package/skills/orchestration/SKILL.md CHANGED

@@ -1,8 +1,13 @@
 ---
 name: orchestration
-description: "Multi-phase orchestration for planners and executors. Use when decomposing complex tasks into parallel phases, dispatching workers, verifying gates, and iterating to closure. Triggers: orchestrate this, coordinate these tasks, run this multi-phase, dispatch workers, coordinate team."
+description: "Multi-phase orchestration for planners and executors."
+triggers:
+  - "orchestrate this"
+  - "coordinate tasks"
+  - "run this multi-phase"
+  - "dispatch workers"
+  - "coordinate team"
 ---
 # orchestration
 Use this skill when orchestrating multi-phase tasks across pi-crew teams and workers.
@@ -96,6 +101,19 @@ Maintain the original scope exactly. Do not expand scope because "saw more
 - This is the final safety net — typecheck, tests, lint, everything.
 - Only report DONE when final verification is green.
+## Enforcement — Orchestration Gate
+**Before launching a new phase, verify:**
+- [ ] Full work surface enumerated (all files, symbols, subsystems known)
+- [ ] Phase tasks are independent (disjoint file scope, no edit conflicts)
+- [ ] Each worker has explicit file ownership (no two workers same file)
+- [ ] Verification gates defined for phase completion
+- [ ] Phase gate passed (typecheck, tests, lint green) before advancing
+- [ ] Respawn workers for broken work (do not absorb/fix yourself)
+If ANY answer is NO → Stop. Complete planning before dispatching.
 ## Anti-patterns
 These are the behaviours that kill orchestration quality — avoid them:
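
The "no two workers same file" item of the orchestration gate can be checked before dispatch. The sketch below assumes a plain mapping from worker name to owned file paths; `findOwnershipConflicts` is a hypothetical helper, not a pi-crew function.

```typescript
// Maps each file claimed by more than one worker to the list of workers claiming it.
function findOwnershipConflicts(assignments: Record<string, string[]>): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const worker of Object.keys(assignments)) {
    // De-duplicate so a worker listing the same file twice is not a conflict.
    const files = Array.from(new Set(assignments[worker]));
    for (const file of files) {
      const list = owners.get(file) ?? [];
      list.push(worker);
      owners.set(file, list);
    }
  }
  const conflicts = new Map<string, string[]>();
  owners.forEach((workers, file) => {
    if (workers.length > 1) conflicts.set(file, workers);
  });
  return conflicts;
}
```

An empty result means the phase tasks have disjoint file scope; any entry is a reason to re-plan before launching the phase.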

package/skills/ownership-session-security/SKILL.md CHANGED

@@ -1,8 +1,14 @@
 ---
 name: ownership-session-security
-description: Session ownership and authorization workflow. Use when implementing cancel, respond, steer, run ownership, cwd overrides, imported runs, or cross-session actions.
----
+description: "Session ownership and authorization workflow."
+triggers:
+  - "cancel run"
+  - "respond to task"
+  - "cross-session action"
+  - "ownership verify"
+  - "session security"
+---
 # ownership-session-security
 Use this skill for cross-session safety and trust-boundary work.
@@ -24,6 +30,18 @@ Use this skill for cross-session safety and trust-boundary work.
 - Use `resolveContainedPath`, `resolveRealContainedPath`, `assertSafePathId`, and symlink checks rather than ad-hoc `startsWith` checks.
 - Destructive management actions must require `confirm: true`; referenced resource deletes must require `force: true` where applicable.
+## Enforcement — Ownership Session Security Gate
+**Before mutating run state or cross-session operations, verify:**
+- [ ] Session ID propagated into TeamContext for production paths
+- [ ] ownerSessionId verified before respond/cancel/mutate operations
+- [ ] Path fields (cwd, import, artifact) normalized and contained under allowed base
+- [ ] Safe path helpers used (resolveContainedPath, assertSafePathId) not startsWith checks
+- [ ] Destructive actions require explicit confirm/force parameters
+If ANY answer is NO → Stop. Verify ownership before mutating state.
 ## Anti-patterns
 - Assuming `ctx.sessionId` exists directly on Pi context.
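
The ownership and confirmation items of the gate above can be sketched as one fail-closed check. All names here (`RunLike`, `MutationRequest`, `authorizeMutation`) are hypothetical; this is not the `verifyRunOwnership` function referenced elsewhere in the diff, and path containment is not covered.

```typescript
interface RunLike {
  ownerSessionId?: string;
}
interface MutationRequest {
  sessionId?: string; // may be absent: the session ID is not guaranteed on the context
  destructive: boolean;
  confirm?: boolean;
}
type Decision = { ok: true } | { ok: false; reason: string };

// Denies by default: a missing session, an owner mismatch, or an unconfirmed
// destructive action each produce a refusal with a reason.
function authorizeMutation(run: RunLike, req: MutationRequest): Decision {
  if (!req.sessionId) return { ok: false, reason: "missing session id" };
  if (run.ownerSessionId !== req.sessionId) {
    return { ok: false, reason: "session does not own run" };
  }
  if (req.destructive && req.confirm !== true) {
    return { ok: false, reason: "destructive action requires confirm: true" };
  }
  return { ok: true };
}
```

Because a run without an `ownerSessionId` never matches a real session ID, unowned or imported runs are refused rather than silently allowed.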