npm - @damian87/omp - Versions diffs - 0.16.0 → 0.19.0 - Mend

@damian87/omp 0.16.0 → 0.19.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

package/.github/dependabot.yml +42 -0
package/.github/skills/code-review/SKILL.md +5 -1
package/.github/skills/debug/SKILL.md +1 -1
package/.github/skills/ponytail/SKILL.md +83 -0
package/.github/skills/qa-browse/SKILL.md +125 -0
package/.github/skills/ralplan/SKILL.md +4 -1
package/.github/skills/tdd/SKILL.md +21 -6
package/.github/skills/ultraqa/SKILL.md +1 -1
package/.github/skills/verify/SKILL.md +1 -1
package/.github/workflows/ci.yml +67 -0
package/.github/workflows/security.yml +157 -0
package/README.md +2 -0
package/catalog/capabilities.json +46 -0
package/catalog/skills-general.json +26 -0
package/dist/src/cli.js +25 -1
package/dist/src/cli.js.map +1 -1
package/dist/src/env/init.js +1 -1
package/dist/src/env/init.js.map +1 -1
package/dist/src/memory-review/transcript.js +1 -1
package/dist/src/memory-review/transcript.js.map +1 -1
package/dist/src/mode-state/index.d.ts +1 -0
package/dist/src/mode-state/index.js +1 -0
package/dist/src/mode-state/index.js.map +1 -1
package/dist/src/mode-state/paths.d.ts +5 -4
package/dist/src/mode-state/paths.js.map +1 -1
package/dist/src/mode-state/ponytail.d.ts +11 -0
package/dist/src/mode-state/ponytail.js +22 -0
package/dist/src/mode-state/ponytail.js.map +1 -0
package/docs/security-pipeline.md +101 -0
package/package.json +13 -6
package/plugin.json +1 -1
package/scripts/prompt-submit.mjs +10 -0
package/scripts/skills-safety-scan.mjs +231 -0

package/.github/dependabot.yml ADDED

@@ -0,0 +1,42 @@
+version: 2
+updates:
+  # npm dependencies
+  - package-ecosystem: npm
+    directory: "/"
+    schedule:
+      interval: weekly
+      day: monday
+      time: "06:00"
+      timezone: Europe/London
+    open-pull-requests-limit: 10
+    labels: ["dependencies"]
+    groups:
+      # Coupled peer pairs must upgrade together or npm hits ERESOLVE.
+      # Listed first so they take precedence over the type-based groups
+      # below, and intentionally unrestricted on update-types so majors
+      # (e.g. eslint 9->10) bump alongside their peers in one PR.
+      eslint:
+        patterns: ["eslint", "@eslint/*"]
+      vitest:
+        patterns: ["vitest", "@vitest/*"]
+      dev-dependencies:
+        dependency-type: development
+        update-types: ["minor", "patch"]
+      production-dependencies:
+        dependency-type: production
+        update-types: ["minor", "patch"]
+    commit-message:
+      prefix: "chore(deps)"
+      prefix-development: "chore(dev-deps)"
+  # GitHub Actions used in workflows
+  - package-ecosystem: github-actions
+    directory: "/"
+    schedule:
+      interval: weekly
+      day: monday
+      time: "06:00"
+      timezone: Europe/London
+    labels: ["ci", "dependencies"]
+    commit-message:
+      prefix: "ci(actions)"
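
The comment in the `groups` block relies on ordering: the `eslint` and `vitest` groups come first so they claim their packages before the type-based groups are considered. Below is a minimal JavaScript sketch of that first-match behaviour, assuming each update is assigned to the first group whose rules match. `assignGroup`, `groupMatches`, and the simplified trailing-`*` matcher are hypothetical helpers for illustration, not Dependabot's implementation.

```javascript
// Hypothetical model of the groups block above; not Dependabot's own code.
const groups = [
  { name: "eslint", patterns: ["eslint", "@eslint/*"] },
  { name: "vitest", patterns: ["vitest", "@vitest/*"] },
  { name: "dev-dependencies", dependencyType: "development", updateTypes: ["minor", "patch"] },
  { name: "production-dependencies", dependencyType: "production", updateTypes: ["minor", "patch"] },
];

// Simplified matcher (assumption): exact name, or a prefix when the pattern ends in "*".
function matchesPattern(pattern, name) {
  return pattern.endsWith("*") ? name.startsWith(pattern.slice(0, -1)) : pattern === name;
}

// A group matches only if every rule it declares accepts the update.
function groupMatches(group, update) {
  if (group.patterns && !group.patterns.some((p) => matchesPattern(p, update.name))) return false;
  if (group.dependencyType && group.dependencyType !== update.dependencyType) return false;
  if (group.updateTypes && !group.updateTypes.includes(update.updateType)) return false;
  return true;
}

// Return the name of the first matching group, or null for an ungrouped update.
function assignGroup(update) {
  const hit = groups.find((g) => groupMatches(g, update));
  return hit ? hit.name : null;
}
```

Under this model, a major `eslint` bump lands in the `eslint` group because that group declares no `update-types` restriction, while a major bump of an unrelated development dependency matches no group and would get its own pull request.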

package/.github/skills/code-review/SKILL.md CHANGED

@@ -17,13 +17,17 @@ Use `/code-review` before merge or final handoff.
 1. **Read the diff** — `git diff` for unstaged, `git diff --staged` for staged, or `git diff main...HEAD` for branch diff
 2. **Check for blockers** — bugs, logic errors, missing error handling, broken contracts
-3. **Check for security** — secrets in code, injection risks, auth gaps, unsafe defaults
+3. **Check for security** — secrets in code, injection risks, auth gaps, unsafe defaults, and
+   **data exposure / least privilege**: does the change return, log, or expose more than it
+   needs (PII, password hashes, `SELECT *`, tokens, internal fields)?
 4. **Check for regressions** — does the change break existing tests or documented behaviour?
 5. **Check for scope drift** — does the change do more or less than requested?
 6. **Run tests** if they exist and haven't been run
 ## Rules
+- **Don't stop at the first issue.** Once you find a blocker, keep scanning the whole change —
+  a serious bug (e.g. a data leak) often hides behind the obvious one. Review every line.
 - Only flag issues that genuinely matter — no style nits, no formatting opinions
 - If the code works, tests pass, and scope is right, say so clearly
 - Flag anything you'd reject in a PR review
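
The data-exposure check added to step 3 can be made concrete with a small sketch. The row shape and the `toResponseLeaky` and `toPublicUser` helpers below are hypothetical, chosen only to contrast returning a whole record with an explicit allow-list of fields.

```javascript
// Hypothetical database row; the field names are illustrative assumptions.
const row = {
  id: 42,
  displayName: "Ada",
  email: "ada@example.com",
  passwordHash: "$2b$12$examplehashvalue",
  resetToken: "tok_example",
};

// Over-exposing: hands every column (hash and token included) to the caller.
function toResponseLeaky(user) {
  return { ...user };
}

// Least privilege: copy only the fields the response actually needs.
function toPublicUser(user) {
  return { id: user.id, displayName: user.displayName };
}
```

A reviewer following the new rule would flag the first helper: it works today, but any column added to the row later is exposed automatically.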

package/.github/skills/debug/SKILL.md CHANGED

@@ -17,7 +17,7 @@ Use `/debug` for broken, failing, slow, or confusing behavior.
 ## Steps (follow in order)
-1. **Reproduce** — get the failure to happen reliably. If you can't reproduce, that's important information.
+1. **Reproduce** — get the failure to happen reliably. If you can't reproduce, that's important information. For a web UI bug, use `/qa-browse` to drive the page and reproduce the broken flow.
 2. **Minimise** — find the smallest case that still fails. Strip away unrelated code/config.
 3. **Hypothesise** — form 2–3 ranked theories about the cause. Start with the most likely.
 4. **Inspect** — gather evidence for/against each hypothesis. Read code, add logging, check state.

package/.github/skills/ponytail/SKILL.md ADDED

@@ -0,0 +1,83 @@
+---
+name: ponytail
+description: Lazy senior dev mode. Forces the simplest, shortest solution that actually works — YAGNI, stdlib first, native platform features before dependencies, one line before fifty, no unrequested abstractions. Use with /ponytail when the user complains about over-engineering, bloat, boilerplate, or unnecessary dependencies, or says "be lazy", "lazy mode", "simplest solution", "minimal solution", "yagni", "do less", or "shortest path". Adapted from DietrichGebert/ponytail (MIT).
+argument-hint: "[lite|full|ultra]"
+---
+# Ponytail — lazy senior dev mode
+You are a lazy senior developer. Lazy means efficient, not careless. The best
+code is the code never written.
+## Mode
+Once activated, every response follows the ladder until deactivated. No drift
+back to over-building. Still active if unsure. Default level: **full**.
+- `/ponytail` or `/ponytail full` — the ladder, applied with judgement (default).
+- `/ponytail lite` — apply the ladder but keep explanatory prose.
+- `/ponytail ultra` — smallest possible diff, code over prose, terse.
+- `/ponytail off` or "normal mode" or "stop ponytail" — deactivate.
+Run `omp ponytail start [level]` to persist the mode across turns (re-injected by
+the prompt-submit hook, like ralph/ultrawork). `omp ponytail off` clears it. If
+the CLI command is unavailable, the in-session rules below still apply.
+## The ladder (stop at the first rung that holds)
+1. Does this need to exist at all? (YAGNI)
+2. Already in this codebase? Reuse the helper/util/pattern — don't rewrite it.
+3. Stdlib does it? Use it.
+4. Native platform feature covers it? Use it.
+5. Already-installed dependency solves it? Use it.
+6. Can this be one line? Make it one line.
+7. Only then: write the minimum code that works.
+The ladder runs **after** you understand the problem, not instead of it: read
+the task and the code it touches, trace the real flow end to end, then climb. A
+small diff you don't understand is laziness dressed up as efficiency.
+Bug fix = root cause, not symptom: grep every caller of the function you touch
+and fix the shared function once. One guard there is a smaller diff than one per
+caller, and patching only the named path leaves a sibling caller broken.
+## Rules
+- No abstractions that weren't explicitly requested.
+- No new dependency if it can be avoided.
+- No boilerplate nobody asked for.
+- Deletion over addition. Boring over clever. Fewest files possible.
+- Shortest working diff wins — but only once you understand the problem.
+- Question complex requests: "Do you actually need X, or does Y cover it?"
+- When two stdlib approaches are the same size, pick the edge-case-correct one.
+  Lazy means less code, not the flimsier algorithm.
+- Mark intentional simplifications with a `ponytail:` comment. If the shortcut
+  has a known ceiling (global lock, O(n²) scan, naive heuristic), name the
+  ceiling and the upgrade path in the comment.
+## Never lazy about
+Understanding the problem, input validation at trust boundaries, error handling
+that prevents data loss, security, accessibility, and anything explicitly
+requested. Lazy code without its check is unfinished: non-trivial logic leaves
+**one** runnable check behind — the smallest thing that fails if the logic
+breaks (an assert-based self-check or one small test file; no frameworks, no
+fixtures). Trivial one-liners need no test.
+## Examples
+**Over-built**: install flatpickr, write a wrapper component, add a stylesheet,
+open a timezone discussion.
+**Ponytail**: `<input type="date">` — the browser has one.
+**Over-built**: a `StringUtils` class with a `capitalize` static method.
+**Ponytail**: `s[0].toUpperCase() + s.slice(1)` at the one call site.
+## Deactivate
+Say "normal mode", "/ponytail off", or "stop ponytail" to return to standard behaviour.
+If the mode was persisted with `omp ponytail start`, a chat-only "off" is not
+enough — the prompt-submit hook keeps re-injecting `[PONYTAIL ACTIVE]` until the
+state file is cleared. So on any deactivation request, **run `omp ponytail off`**
+to clear persisted state, then confirm it's off.
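
As an illustration of three rules in this skill (pick the edge-case-correct stdlib approach, mark shortcuts with a `ponytail:` comment, leave one runnable check behind), here is a minimal sketch. The `capitalize` and `selfCheck` names are hypothetical. It uses `charAt(0)` rather than `s[0]` because indexing an empty string yields `undefined`, and calling `.toUpperCase()` on that would throw.

```javascript
// ponytail: uppercases only the first UTF-16 code unit.
// Ceiling: a first character outside the BMP (a surrogate pair) is left unchanged.
// Upgrade path: split by code point, e.g. with Array.from(s), if that case appears.
function capitalize(s) {
  return s.charAt(0).toUpperCase() + s.slice(1);
}

// The one runnable check: the smallest thing that fails if the logic breaks.
function selfCheck() {
  if (capitalize("hello") !== "Hello") throw new Error("capitalize: basic case");
  if (capitalize("") !== "") throw new Error("capitalize: empty string");
}

selfCheck();
```

Both forms are the same size, so the empty-string-safe one costs nothing extra.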

package/.github/skills/qa-browse/SKILL.md ADDED

@@ -0,0 +1,125 @@
+---
+name: qa-browse
+description: Drive a real browser from the CLI to QA a flow — navigate, click, fill, verify. Uses @playwright/cli (token-efficient, not MCP). Use with /qa-browse when the user wants to manually check a web flow works, not write a test suite.
+argument-hint: "<url> <what to verify>"
+---
+# QA Browse — CLI browser driving with @playwright/cli
+`/qa-browse` opens a live browser via `@playwright/cli` (binary `playwright-cli`) and walks a flow to verify it works. No test files. No MCP. This is distinct from the standard Playwright CLI (`npx playwright`, used for test/codegen/show-trace).
+Engine: `@playwright/cli` (Microsoft). Snapshots live on disk, not in context — cheap tokens. Browser stays alive between commands.
+## Rules
+- If not installed globally, run via the scoped package: `npx @playwright/cli` (NOT `npx playwright-cli` — that resolves the unscoped name and fails with ENOTFOUND). Never assume global.
+- Loop: **snapshot → read refs → act → re-snapshot.** Always.
+- Refs (`e5`, `e12`) are valid only for the latest snapshot. Re-snapshot after any navigation/click that changes the page.
+- Headless by default. Add `--headed` only when a human must watch.
+- Prefer refs over CSS. Use `getByRole`/`getByText` selectors only if a ref isn't available.
+- Verify with `eval` or a snapshot of the result region — don't assume an action worked.
+- Screenshot on each pass/fail checkpoint so there's evidence.
+- `close` when done.
+## Setup
+```bash
+npm install -g @playwright/cli@latest   # or run ad-hoc: npx @playwright/cli
+playwright-cli install-browser chromium  # first run in a fresh env (NOT `install` — that inits a workspace)
+```
+## Core loop
+```bash
+playwright-cli open <url>        # open + navigate (prints a snapshot path)
+playwright-cli snapshot          # accessibility tree with refs → read it
+playwright-cli click e15         # act using a ref
+playwright-cli fill e5 "text"    # fill input (add --submit to press Enter)
+playwright-cli type "text"       # type into focused element
+playwright-cli press Enter       # key press
+playwright-cli snapshot          # re-snapshot to confirm new state
+playwright-cli screenshot        # evidence
+playwright-cli close
+```
+## Interact
+```bash
+playwright-cli click <ref> [button]     # left/right/middle
+playwright-cli dblclick <ref>
+playwright-cli fill <ref> <text> --submit
+playwright-cli select <ref> <value>     # dropdown
+playwright-cli check <ref> / uncheck <ref>
+playwright-cli hover <ref>
+playwright-cli drag <startRef> <endRef>
+playwright-cli upload ./file.pdf
+playwright-cli dialog-accept / dialog-dismiss
+```
+## Navigate
+```bash
+playwright-cli goto <url>
+playwright-cli go-back / go-forward / reload
+```
+## Inspect & verify
+```bash
+playwright-cli snapshot --depth=4          # shallow tree on big pages
+playwright-cli snapshot e34                 # drill into a subtree
+playwright-cli snapshot --raw | grep button # script-friendly
+playwright-cli eval "document.title"        # read page state
+playwright-cli eval "el => el.textContent" e5
+playwright-cli eval "el => el.getAttribute('data-testid')" e5
+playwright-cli console                       # console messages
+```
+## Evidence
+```bash
+playwright-cli screenshot --full-page        # full scrollable page (bare `screenshot` = current viewport)
+playwright-cli screenshot e5                 # one element
+playwright-cli screenshot --filename=step1.png
+playwright-cli video-start / video-stop
+playwright-cli tracing-start / tracing-stop  # record a trace; view it with: npx playwright show-trace <trace>
+playwright-cli pdf --filename=page.pdf
+```
+## Sessions
+State (cookies, localStorage) persists within a session across commands.
+```bash
+playwright-cli --session=qa open <url>       # named session
+playwright-cli -s=qa open <url> --persistent # save profile to disk
+playwright-cli list                          # running sessions
+playwright-cli show                          # live dashboard, take over mouse/kbd
+playwright-cli close-all / kill-all
+```
+## QA flow checklist
+1. `open <url>` → `snapshot`.
+2. For each step: find ref in snapshot → act → re-snapshot → verify expected element/text.
+3. `screenshot` at each checkpoint (pass and fail).
+4. On failure: `eval` the element, capture `console`, take a `--headed` re-run or trace.
+5. Report: what passed, what failed, with screenshot/snapshot paths. `close`.
+## Example — login flow
+```bash
+playwright-cli open https://app.example.com/login
+playwright-cli snapshot
+playwright-cli fill e1 "user@example.com"
+playwright-cli fill e2 "secret" --submit
+playwright-cli snapshot                       # expect dashboard
+playwright-cli eval "document.title"
+playwright-cli screenshot --filename=logged-in.png
+playwright-cli close
+```
+## When NOT to use
+- Want a saved, repeatable test suite → use `/tdd` or write `@playwright/test` specs.
+- Need long-running autonomous loops or persistent introspection → Playwright MCP may fit better.

package/.github/skills/ralplan/SKILL.md CHANGED

@@ -19,7 +19,10 @@ Use `/ralplan` when the task needs planning before edits.
 2. **List implementation slices** in execution order — each slice should be independently verifiable
 3. **Define acceptance criteria** — what must be true when done
 4. **Define test shape** — which tests to write or run, what they cover
-5. **Call out risks** — what could go wrong, tradeoffs chosen, alternatives rejected
+5. **Call out risks** — what could go wrong, tradeoffs chosen, alternatives rejected. For any
+   auth, security, or data-handling feature, the plan **must** name the security specifics even
+   if the request didn't: secret/token **expiry**, **single-use / replay** protection, and
+   **enumeration / rate-limiting**. Leaving these implicit is how the plan ships a hole.
 6. **Stop at the plan** unless the user explicitly asked to implement
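
Two of the security specifics named in step 5, expiry and single-use / replay protection, map onto small, checkable behaviours. The sketch below is a minimal in-memory illustration. `createTokenStore`, `register`, `redeem`, and the injected clock are hypothetical names, token generation is assumed to happen elsewhere with a CSPRNG, and rate-limiting is left out.

```javascript
// Hypothetical in-memory store; a real system would persist hashed tokens.
function createTokenStore(now = () => Date.now()) {
  const expiryByToken = new Map();
  return {
    // Record a token (generated elsewhere with a CSPRNG) and its lifetime.
    register(token, ttlMs) {
      expiryByToken.set(token, now() + ttlMs);
    },
    // True at most once per token, and only before it expires.
    redeem(token) {
      const expiresAt = expiryByToken.get(token);
      if (expiresAt === undefined) return false; // unknown or already used
      expiryByToken.delete(token);               // single-use: burn on first attempt
      return now() <= expiresAt;                 // expiry check
    },
  };
}
```

Injecting the clock keeps the expiry rule testable without waiting: a test can register a token, advance the fake time past the lifetime, and assert that `redeem` returns `false`.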
 ## Output

package/.github/skills/tdd/SKILL.md CHANGED

@@ -13,15 +13,29 @@ Use `/tdd` when a change can be specified by tests.
 - The codebase has an existing test framework
 - You want to prove correctness incrementally
-## Loop (repeat until done)
-1. **Red** — write or identify a failing test that describes the desired behaviour
-2. **Green** — write the minimal code to make the test pass
-3. **Refactor** — clean up the code while keeping tests green
-4. **Run** — run the full related test suite to check for regressions
+## Loop (Canon TDD — repeat until the list is empty)
+0. **List first** — before writing any code, read the **full spec/docstring** and write a
+   **test list**: every scenario you need to cover. Don't start from the happy path — walk the
+   edge-case taxonomy against the spec and add a line for each that applies:
+   - **Boundary** — min/max, zero, empty, first/last, length limits, collapsing/trimming
+   - **Empty/Null** — `""`, `None`, empty collection, whitespace-only
+   - **Format** — **unicode / accented characters**, emoji, special chars, malformed input
+   - **Implicit** — anything the spec *implies* but the prompt didn't spell out
+   A requirement that appears in the spec but not your list is the bug you're about to ship.
+1. **Red** — turn **exactly one** list item into a concrete test with real **assertions**
+   (`assert`, `expect`, `self.assertEqual`); run it and watch it **fail for the right reason**.
+2. **Green** — write the minimal code to make that test (and all previous tests) pass.
+3. **Refactor** — clean up while tests stay green.
+4. **Repeat** — take the next list item; add new items as you discover them. Run the full
+   related suite at the end to check for regressions.
 ## Rules
+- Use **executable assertions** — a script that only prints results for a human to eyeball is
+  **not a test** and does not count as red-green. Every scenario on the list gets an assertion.
+- Work the **whole list**, not just the first case — the bugs hide in the edge cases the prompt
+  didn't spell out (unicode/accents, empty input, boundaries).
 - Test **behaviour** through public surfaces, not implementation details
 - Each test should describe one behaviour — name it clearly (e.g. "returns 404 when user not found")
 - Avoid brittle tests that break when implementation changes but behaviour doesn't
@@ -30,6 +44,7 @@ Use `/tdd` when a change can be specified by tests.
 ## Output
+- `Test list` — the scenarios you enumerated from the spec (incl. the edge cases)
 - `Tests written` — list of test names and what they cover
 - `Implementation` — what was changed to make tests pass
 - `Refactoring` — what was cleaned up
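
To make the "executable assertions" rule concrete, here is a small sketch of a test list turned into assertions. It shows the end state after working the list one item at a time. The `slugify` function and its spec (lowercase, accents stripped, other characters collapsed to single hyphens) are hypothetical, and plain `throw`-based checks stand in for a test framework.

```javascript
function slugify(input) {
  return input
    .normalize("NFKD")                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")       // collapse runs of other characters
    .replace(/^-+|-+$/g, "");          // trim leading/trailing hyphens
}

// Test list: one entry per scenario, each with an expected value.
const testList = [
  ["happy path", "Hello World", "hello-world"],
  ["empty input", "", ""],
  ["whitespace only", "   ", ""],
  ["boundary: collapsing", "a -- b", "a-b"],
  ["format: accents", "Crème Brûlée", "creme-brulee"],
];

// Fails loudly on the first mismatch; returns the number of scenarios checked.
function runTestList() {
  for (const [name, input, expected] of testList) {
    const actual = slugify(input);
    if (actual !== expected) throw new Error(`${name}: expected "${expected}", got "${actual}"`);
  }
  return testList.length;
}
```

Calling `runTestList()` either throws with the name of the failing scenario or returns the count of passing ones, so nothing is left for a human to eyeball.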

package/.github/skills/ultraqa/SKILL.md CHANGED

@@ -54,7 +54,7 @@ Number every cycle explicitly: "Cycle 1", "Cycle 2", etc.
 ## Rules
-- Prefer runnable checks over inspection — run tests, don't just read code
+- Prefer runnable checks over inspection — run tests, don't just read code. For web UI flows, exercise the real page with `/qa-browse` rather than inspecting markup.
 - If tests don't exist, write minimal ones that cover the change
 - Route fixes back to `/ralph` or `/ultrawork` if they're substantial

package/.github/skills/verify/SKILL.md CHANGED

@@ -20,7 +20,7 @@ Use `/verify` before saying done.
    - Tests: `npm test`, `pytest`, etc.
    - Build: does it compile/build without errors?
    - Lint: any new warnings?
-   - Behaviour: does the feature work as described?
+   - Behaviour: does the feature work as described? For web UI flows, use `/qa-browse` to drive the live page and capture snapshot/screenshot evidence.
 3. **Read outputs** — don't assume green means pass; read the actual results
 4. **Report honestly** — if there are gaps, say so

package/.github/workflows/ci.yml ADDED

@@ -0,0 +1,67 @@
+name: CI
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  workflow_dispatch:
+# Avoid piling up redundant runs on the same ref.
+concurrency:
+  group: ci-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+permissions:
+  contents: read
+env:
+  # Hermetic runs — never auto-load a developer's ~/.omp/.env, never self-update.
+  OMP_SKIP_USER_ENV: "1"
+  OMP_NO_UPDATE_CHECK: "1"
+jobs:
+  build-test:
+    name: Build · Test · Lint (Node ${{ matrix.node }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        node: [20, 22]
+    steps:
+      - uses: actions/checkout@v7
+      - uses: actions/setup-node@v6
+        with:
+          node-version: ${{ matrix.node }}
+          cache: npm
+      - name: Install dependencies
+        run: npm ci
+      - name: Build (tsc)
+        run: npm run build
+      - name: Lint (eslint)
+        run: npm run lint
+      - name: Unit tests (vitest)
+        run: npm test
+  skills:
+    name: Validate skills & catalog
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v7
+      - uses: actions/setup-node@v6
+        with:
+          node-version: 20
+          cache: npm
+      - run: npm ci
+      # Project's own validators.
+      - name: Lint skills (omp lint:skills)
+        run: npm run lint:skills
+      - name: Validate catalog
+        run: npm run check:catalog

package/.github/workflows/security.yml ADDED

@@ -0,0 +1,157 @@
+name: Security
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  schedule:
+    # Weekly full scan (Mondays 06:17 UTC) to catch newly disclosed CVEs.
+    - cron: "17 6 * * 1"
+  workflow_dispatch:
+concurrency:
+  group: security-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+permissions:
+  contents: read
+env:
+  OMP_SKIP_USER_ENV: "1"
+  OMP_NO_UPDATE_CHECK: "1"
+jobs:
+  # ── 1. Native, free, zero-secret baseline ─────────────────────────────
+  npm-audit:
+    name: npm audit (prod deps)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v7
+      - uses: actions/setup-node@v6
+        with:
+          node-version: 20
+          cache: npm
+      - run: npm ci
+      # Fail only on HIGH/critical in production dependencies.
+      - name: npm audit (high+, prod only)
+        run: npm run audit:ci
+  skills-safety:
+    name: Skills safety scan
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v7
+      - uses: actions/setup-node@v6
+        with:
+          node-version: 20
+      # No install needed — pure Node script over SKILL.md / agents / catalog.
+      - name: Static safety audit of skills & agents
+        run: node scripts/skills-safety-scan.mjs --root .
+  codeql:
+    name: CodeQL (JS/TS)
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      security-events: write
+      actions: read
+    steps:
+      - uses: actions/checkout@v7
+      - uses: github/codeql-action/init@v4
+        with:
+          languages: javascript-typescript
+          queries: security-and-quality
+      - uses: github/codeql-action/analyze@v4
+        with:
+          category: "/language:javascript-typescript"
+  dependency-review:
+    name: Dependency review (PR only)
+    if: github.event_name == 'pull_request'
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+    steps:
+      - uses: actions/checkout@v7
+      - uses: actions/dependency-review-action@v5
+        with:
+          fail-on-severity: high
+          comment-summary-in-pr: on-failure
+  # ── 2. Socket — supply-chain / malicious package detection ────────────
+  # Requires repo secret SOCKET_SECURITY_API_KEY (free at socket.dev).
+  socket:
+    name: Socket supply-chain scan
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v7
+      - name: Check for Socket token
+        id: gate
+        run: |
+          if [ -n "${{ secrets.SOCKET_SECURITY_API_KEY }}" ]; then
+            echo "enabled=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "enabled=false" >> "$GITHUB_OUTPUT"
+            echo "::notice title=Socket skipped::Set the SOCKET_SECURITY_API_KEY repo secret to enable Socket scanning."
+          fi
+      - uses: actions/setup-node@v6
+        if: steps.gate.outputs.enabled == 'true'
+        with:
+          node-version: 20
+      - name: Socket CLI scan
+        if: steps.gate.outputs.enabled == 'true'
+        env:
+          SOCKET_SECURITY_API_KEY: ${{ secrets.SOCKET_SECURITY_API_KEY }}
+        run: npx -y @socketsecurity/cli@latest scan create . --view --no-interactive
+  # ── 3. Snyk — dependency + code vulnerability scanning ────────────────
+  # Requires repo secret SNYK_TOKEN (free at snyk.io). Uploads SARIF to the
+  # GitHub Security tab.
+  snyk:
+    name: Snyk (deps + code)
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      security-events: write
+    steps:
+      - uses: actions/checkout@v7
+      - name: Check for Snyk token
+        id: gate
+        run: |
+          if [ -n "${{ secrets.SNYK_TOKEN }}" ]; then
+            echo "enabled=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "enabled=false" >> "$GITHUB_OUTPUT"
+            echo "::notice title=Snyk skipped::Set the SNYK_TOKEN repo secret to enable Snyk scanning."
+          fi
+      - uses: actions/setup-node@v6
+        if: steps.gate.outputs.enabled == 'true'
+        with:
+          node-version: 20
+          cache: npm
+      - run: npm ci
+        if: steps.gate.outputs.enabled == 'true'
+      - name: Snyk Open Source (dependencies)
+        if: steps.gate.outputs.enabled == 'true'
+        continue-on-error: true
+        env:
+          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
+        run: npx -y snyk@latest test --severity-threshold=high --sarif-file-output=snyk-deps.sarif
+      - name: Snyk Code (SAST)
+        if: steps.gate.outputs.enabled == 'true'
+        continue-on-error: true
+        env:
+          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
+        run: npx -y snyk@latest code test --severity-threshold=high --sarif-file-output=snyk-code.sarif
+      - name: Upload Snyk results to GitHub Security
+        if: steps.gate.outputs.enabled == 'true'
+        uses: github/codeql-action/upload-sarif@v4
+        with:
+          sarif_file: .
+          category: snyk
+        continue-on-error: true
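
The `schedule` trigger near the top of this workflow uses the cron expression `17 6 * * 1`, which its comment reads as Mondays 06:17 UTC. Below is a minimal sketch that checks that reading. `matchesCron` is a hypothetical helper that handles only plain numbers and `*` in the five fields; it is not a full cron parser.

```javascript
// Fields: minute hour day-of-month month day-of-week (0 = Sunday), evaluated in UTC.
function matchesCron(expr, date) {
  const [minute, hour, dayOfMonth, month, dayOfWeek] = expr.trim().split(/\s+/);
  const actual = [
    date.getUTCMinutes(),
    date.getUTCHours(),
    date.getUTCDate(),
    date.getUTCMonth() + 1, // JavaScript months are 0-based
    date.getUTCDay(),
  ];
  return [minute, hour, dayOfMonth, month, dayOfWeek].every(
    (field, i) => field === "*" || Number(field) === actual[i],
  );
}

const weeklyScan = "17 6 * * 1";
const monday = new Date(Date.UTC(2024, 0, 1, 6, 17));  // 1 January 2024 was a Monday
const tuesday = new Date(Date.UTC(2024, 0, 2, 6, 17));
```

With these values, `matchesCron(weeklyScan, monday)` is `true` and `matchesCron(weeklyScan, tuesday)` is `false`, which agrees with the comment in the workflow.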

package/README.md CHANGED

@@ -385,6 +385,7 @@ omp grows in vertical slices. Items aren't pinned to specific semver versions
 - [Jira adapter](docs/jira.md) — configuration discovery, safe operations, dry-runs, fallback payloads
 - [Self-evolve](docs/self-evolve.md) — extracting reusable skills from session transcripts
 - [Slack setup](docs/slack-setup.md) — Slack app manifest, scopes, Socket-Mode token, `omp gateway serve`
+- [Skill benchmark](benchmarks/skill-bench/README.md) — agentic benchmark that measures whether a skill actually beats *just telling the model* (baseline / one-line prompt / skill arms), with live Haiku 4.5 findings
 ## Layout
@@ -394,6 +395,7 @@ omp grows in vertical slices. Items aren't pinned to specific semver versions
 hooks/hooks.json                  # lifecycle hook manifest
 scripts/*.mjs                     # hook implementations
 src/                              # omp CLI, team runtime, gateway/comms, schedule, mode-state loops
+benchmarks/skill-bench/           # agentic benchmark: does a skill beat just telling the model?
 ```
 Skills follow the [Copilot agent-skills docs](https://docs.github.com/en/copilot) — project skills live in `.github/skills/` and are invoked with `/skill-name`.

package/catalog/capabilities.json CHANGED

@@ -587,6 +587,52 @@
         }
       }
     },
+    {
+      "id": "ponytail",
+      "name": "ponytail",
+      "title": "Ponytail",
+      "category": "code",
+      "summary": "Lazy senior dev mode \u2014 simplest solution that works (YAGNI, stdlib first).",
+      "notes": "Lite slash project skill plus a persisted mode re-injected by the prompt-submit hook (omp ponytail start|status|off).",
+      "defaultCommand": "ponytail",
+      "phase1": true,
+      "sourceSkill": "ponytail",
+      "providers": {
+        "copilot": "supported"
+      },
+      "support": {
+        "copilot": "native"
+      },
+      "providerSupport": {
+        "copilot": {
+          "state": "native",
+          "notes": "Use /ponytail from .github/skills/ponytail/SKILL.md."
+        }
+      }
+    },
+    {
+      "id": "code.minimal",
+      "name": "code.minimal",
+      "title": "Ponytail",
+      "category": "code",
+      "summary": "Lazy senior dev mode \u2014 simplest solution that works (YAGNI, stdlib first).",
+      "notes": "Alias for ponytail capability.",
+      "defaultCommand": "ponytail",
+      "phase1": true,
+      "sourceSkill": "ponytail",
+      "providers": {
+        "copilot": "supported"
+      },
+      "support": {
+        "copilot": "native"
+      },
+      "providerSupport": {
+        "copilot": {
+          "state": "native",
+          "notes": "Use /ponytail from .github/skills/ponytail/SKILL.md."
+        }
+      }
+    },
     {
       "id": "debug",
       "name": "debug",