@damian87/omp 0.16.0 → 0.19.0

@@ -0,0 +1,42 @@
1
+ version: 2
2
+ updates:
3
+ # npm dependencies
4
+ - package-ecosystem: npm
5
+ directory: "/"
6
+ schedule:
7
+ interval: weekly
8
+ day: monday
9
+ time: "06:00"
10
+ timezone: Europe/London
11
+ open-pull-requests-limit: 10
12
+ labels: ["dependencies"]
13
+ groups:
14
+ # Coupled peer pairs must upgrade together or npm hits ERESOLVE.
15
+ # Listed first so they take precedence over the type-based groups
16
+ # below, and intentionally unrestricted on update-types so majors
17
+ # (e.g. eslint 9->10) bump alongside their peers in one PR.
18
+ eslint:
19
+ patterns: ["eslint", "@eslint/*"]
20
+ vitest:
21
+ patterns: ["vitest", "@vitest/*"]
22
+ dev-dependencies:
23
+ dependency-type: development
24
+ update-types: ["minor", "patch"]
25
+ production-dependencies:
26
+ dependency-type: production
27
+ update-types: ["minor", "patch"]
28
+ commit-message:
29
+ prefix: "chore(deps)"
30
+ prefix-development: "chore(dev-deps)"
31
+
32
+ # GitHub Actions used in workflows
33
+ - package-ecosystem: github-actions
34
+ directory: "/"
35
+ schedule:
36
+ interval: weekly
37
+ day: monday
38
+ time: "06:00"
39
+ timezone: Europe/London
40
+ labels: ["ci", "dependencies"]
41
+ commit-message:
42
+ prefix: "ci(actions)"
@@ -17,13 +17,17 @@ Use `/code-review` before merge or final handoff.
17
17
 
18
18
  1. **Read the diff** — `git diff` for unstaged, `git diff --staged` for staged, or `git diff main...HEAD` for branch diff
19
19
  2. **Check for blockers** — bugs, logic errors, missing error handling, broken contracts
20
- 3. **Check for security** — secrets in code, injection risks, auth gaps, unsafe defaults
20
+ 3. **Check for security** — secrets in code, injection risks, auth gaps, unsafe defaults, and
21
+ **data exposure / least privilege**: does the change return, log, or expose more than it
22
+ needs (PII, password hashes, `SELECT *`, tokens, internal fields)?
21
23
  4. **Check for regressions** — does the change break existing tests or documented behaviour?
22
24
  5. **Check for scope drift** — does the change do more or less than requested?
23
25
  6. **Run tests** if they exist and haven't been run
24
26
 
25
27
  ## Rules
26
28
 
29
+ - **Don't stop at the first issue.** Once you find a blocker, keep scanning the whole change —
30
+ a serious bug (e.g. a data leak) often hides behind the obvious one. Review every line.
27
31
  - Only flag issues that genuinely matter — no style nits, no formatting opinions
28
32
  - If the code works, tests pass, and scope is right, say so clearly
29
33
  - Flag anything you'd reject in a PR review
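The data-exposure check added to step 3 above can be illustrated with a small sketch. It assumes a hypothetical user row and a hypothetical `toPublicUser` helper (neither appears in the package). The point is only that a handler returns an explicit allow-list of fields rather than the whole row.

```javascript
// Hypothetical allow-list of fields that are safe to return to a client.
const PUBLIC_USER_FIELDS = ["id", "displayName"];

// Copy only allow-listed fields; anything else on the row (password hash,
// tokens, internal flags) is dropped by construction.
function toPublicUser(row) {
  return Object.fromEntries(
    PUBLIC_USER_FIELDS.filter((key) => key in row).map((key) => [key, row[key]])
  );
}

// Example row as it might come back from a `SELECT *` query (made-up data).
const row = {
  id: 7,
  displayName: "Ada",
  email: "ada@example.com",
  passwordHash: "x",
  isAdmin: false,
};
const publicUser = toPublicUser(row);
```

A reviewer following the new rule would flag a handler that returns `row` directly and accept one that returns `publicUser`, since adding a column to the table can then never widen the response by accident.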
@@ -17,7 +17,7 @@ Use `/debug` for broken, failing, slow, or confusing behavior.
17
17
 
18
18
  ## Steps (follow in order)
19
19
 
20
- 1. **Reproduce** — get the failure to happen reliably. If you can't reproduce, that's important information.
20
+ 1. **Reproduce** — get the failure to happen reliably. If you can't reproduce, that's important information. For a web UI bug, use `/qa-browse` to drive the page and reproduce the broken flow.
21
21
  2. **Minimise** — find the smallest case that still fails. Strip away unrelated code/config.
22
22
  3. **Hypothesise** — form 2–3 ranked theories about the cause. Start with the most likely.
23
23
  4. **Inspect** — gather evidence for/against each hypothesis. Read code, add logging, check state.
@@ -0,0 +1,83 @@
1
+ ---
2
+ name: ponytail
3
+ description: Lazy senior dev mode. Forces the simplest, shortest solution that actually works — YAGNI, stdlib first, native platform features before dependencies, one line before fifty, no unrequested abstractions. Use with /ponytail when the user complains about over-engineering, bloat, boilerplate, or unnecessary dependencies, or says "be lazy", "lazy mode", "simplest solution", "minimal solution", "yagni", "do less", or "shortest path". Adapted from DietrichGebert/ponytail (MIT).
4
+ argument-hint: "[lite|full|ultra]"
5
+ ---
6
+
7
+ # Ponytail — lazy senior dev mode
8
+
9
+ You are a lazy senior developer. Lazy means efficient, not careless. The best
10
+ code is the code never written.
11
+
12
+ ## Mode
13
+
14
+ Once activated, every response follows the ladder until deactivated. No drift
15
+ back to over-building. Still active if unsure. Default level: **full**.
16
+
17
+ - `/ponytail` or `/ponytail full` — the ladder, applied with judgement (default).
18
+ - `/ponytail lite` — apply the ladder but keep explanatory prose.
19
+ - `/ponytail ultra` — smallest possible diff, code over prose, terse.
20
+ - `/ponytail off` or "normal mode" or "stop ponytail" — deactivate.
21
+
22
+ Run `omp ponytail start [level]` to persist the mode across turns (re-injected by
23
+ the prompt-submit hook, like ralph/ultrawork). `omp ponytail off` clears it. If
24
+ the CLI command is unavailable, the in-session rules below still apply.
25
+
26
+ ## The ladder (stop at the first rung that holds)
27
+
28
+ 1. Does this need to exist at all? (YAGNI)
29
+ 2. Already in this codebase? Reuse the helper/util/pattern — don't rewrite it.
30
+ 3. Stdlib does it? Use it.
31
+ 4. Native platform feature covers it? Use it.
32
+ 5. Already-installed dependency solves it? Use it.
33
+ 6. Can this be one line? Make it one line.
34
+ 7. Only then: write the minimum code that works.
35
+
36
+ The ladder runs **after** you understand the problem, not instead of it: read
37
+ the task and the code it touches, trace the real flow end to end, then climb. A
38
+ small diff you don't understand is laziness dressed up as efficiency.
39
+
40
+ Bug fix = root cause, not symptom: grep every caller of the function you touch
41
+ and fix the shared function once. One guard there is a smaller diff than one per
42
+ caller, and patching only the named path leaves a sibling caller broken.
43
+
44
+ ## Rules
45
+
46
+ - No abstractions that weren't explicitly requested.
47
+ - No new dependency if it can be avoided.
48
+ - No boilerplate nobody asked for.
49
+ - Deletion over addition. Boring over clever. Fewest files possible.
50
+ - Shortest working diff wins — but only once you understand the problem.
51
+ - Question complex requests: "Do you actually need X, or does Y cover it?"
52
+ - When two stdlib approaches are the same size, pick the edge-case-correct one.
53
+ Lazy means less code, not the flimsier algorithm.
54
+ - Mark intentional simplifications with a `ponytail:` comment. If the shortcut
55
+ has a known ceiling (global lock, O(n²) scan, naive heuristic), name the
56
+ ceiling and the upgrade path in the comment.
57
+
58
+ ## Never lazy about
59
+
60
+ Understanding the problem, input validation at trust boundaries, error handling
61
+ that prevents data loss, security, accessibility, and anything explicitly
62
+ requested. Lazy code without its check is unfinished: non-trivial logic leaves
63
+ **one** runnable check behind — the smallest thing that fails if the logic
64
+ breaks (an assert-based self-check or one small test file; no frameworks, no
65
+ fixtures). Trivial one-liners need no test.
66
+
67
+ ## Examples
68
+
69
+ **Over-built**: install flatpickr, write a wrapper component, add a stylesheet,
70
+ open a timezone discussion.
71
+ **Ponytail**: `<input type="date">` — the browser has one.
72
+
73
+ **Over-built**: a `StringUtils` class with a `capitalize` static method.
74
+ **Ponytail**: `s[0].toUpperCase() + s.slice(1)` at the one call site.
75
+
76
+ ## Deactivate
77
+
78
+ Say "normal mode", "/ponytail off", or "stop ponytail" to return to standard behaviour.
79
+
80
+ If the mode was persisted with `omp ponytail start`, a chat-only "off" is not
81
+ enough — the prompt-submit hook keeps re-injecting `[PONYTAIL ACTIVE]` until the
82
+ state file is cleared. So on any deactivation request, **run `omp ponytail off`**
83
+ to clear persisted state, then confirm it's off.
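The `ponytail:` comment rule and the one-runnable-check rule from the skill above can be sketched together. The function name, the data, and the comment wording are illustrative assumptions, not taken from the package.

```javascript
// ponytail: O(n²) scan via indexOf. Ceiling: fine for a few hundred ids.
// Upgrade path: track seen ids in a Set for a single O(n) pass.
function hasDuplicateIds(ids) {
  return ids.some((id, index) => ids.indexOf(id) !== index);
}

// The one runnable check: the smallest thing that fails if the logic breaks.
// No framework and no fixtures; it throws on the first broken expectation.
function selfCheck() {
  if (hasDuplicateIds([1, 2, 3])) throw new Error("false positive");
  if (!hasDuplicateIds([1, 2, 1])) throw new Error("missed duplicate");
  if (hasDuplicateIds([])) throw new Error("empty list has no duplicates");
}
selfCheck();
```

The comment names both the ceiling and the way out, so a later reader can tell the shortcut was deliberate.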
@@ -0,0 +1,125 @@
1
+ ---
2
+ name: qa-browse
3
+ description: Drive a real browser from the CLI to QA a flow — navigate, click, fill, verify. Uses @playwright/cli (token-efficient, not MCP). Use with /qa-browse when the user wants to manually check a web flow works, not write a test suite.
4
+ argument-hint: "<url> <what to verify>"
5
+ ---
6
+
7
+ # QA Browse — CLI browser driving with @playwright/cli
8
+
9
+ `/qa-browse` opens a live browser via `@playwright/cli` (binary `playwright-cli`) and walks a flow to verify it works. No test files. No MCP. This is distinct from the standard Playwright CLI (`npx playwright`, used for test/codegen/show-trace).
10
+
11
+ Engine: `@playwright/cli` (Microsoft). Snapshots live on disk, not in context — cheap tokens. Browser stays alive between commands.
12
+
13
+ ## Rules
14
+
15
+ - If not installed globally, run via the scoped package: `npx @playwright/cli` (NOT `npx playwright-cli` — that resolves the unscoped name and fails with ENOTFOUND). Never assume global.
16
+ - Loop: **snapshot → read refs → act → re-snapshot.** Always.
17
+ - Refs (`e5`, `e12`) are valid only for the latest snapshot. Re-snapshot after any navigation/click that changes the page.
18
+ - Headless by default. Add `--headed` only when a human must watch.
19
+ - Prefer refs over CSS. Use `getByRole`/`getByText` selectors only if a ref isn't available.
20
+ - Verify with `eval` or a snapshot of the result region — don't assume an action worked.
21
+ - Screenshot on each pass/fail checkpoint so there's evidence.
22
+ - `close` when done.
23
+
24
+ ## Setup
25
+
26
+ ```bash
27
+ npm install -g @playwright/cli@latest # or run ad-hoc: npx @playwright/cli
28
+ playwright-cli install-browser chromium # first run in a fresh env (NOT `install` — that inits a workspace)
29
+ ```
30
+
31
+ ## Core loop
32
+
33
+ ```bash
34
+ playwright-cli open <url> # open + navigate (prints a snapshot path)
35
+ playwright-cli snapshot # accessibility tree with refs → read it
36
+ playwright-cli click e15 # act using a ref
37
+ playwright-cli fill e5 "text" # fill input (add --submit to press Enter)
38
+ playwright-cli type "text" # type into focused element
39
+ playwright-cli press Enter # key press
40
+ playwright-cli snapshot # re-snapshot to confirm new state
41
+ playwright-cli screenshot # evidence
42
+ playwright-cli close
43
+ ```
44
+
45
+ ## Interact
46
+
47
+ ```bash
48
+ playwright-cli click <ref> [button] # left/right/middle
49
+ playwright-cli dblclick <ref>
50
+ playwright-cli fill <ref> <text> --submit
51
+ playwright-cli select <ref> <value> # dropdown
52
+ playwright-cli check <ref> / uncheck <ref>
53
+ playwright-cli hover <ref>
54
+ playwright-cli drag <startRef> <endRef>
55
+ playwright-cli upload ./file.pdf
56
+ playwright-cli dialog-accept / dialog-dismiss
57
+ ```
58
+
59
+ ## Navigate
60
+
61
+ ```bash
62
+ playwright-cli goto <url>
63
+ playwright-cli go-back / go-forward / reload
64
+ ```
65
+
66
+ ## Inspect & verify
67
+
68
+ ```bash
69
+ playwright-cli snapshot --depth=4 # shallow tree on big pages
70
+ playwright-cli snapshot e34 # drill into a subtree
71
+ playwright-cli snapshot --raw | grep button # script-friendly
72
+ playwright-cli eval "document.title" # read page state
73
+ playwright-cli eval "el => el.textContent" e5
74
+ playwright-cli eval "el => el.getAttribute('data-testid')" e5
75
+ playwright-cli console # console messages
76
+ ```
77
+
78
+ ## Evidence
79
+
80
+ ```bash
81
+ playwright-cli screenshot --full-page # full scrollable page (bare `screenshot` = current viewport)
82
+ playwright-cli screenshot e5 # one element
83
+ playwright-cli screenshot --filename=step1.png
84
+ playwright-cli video-start / video-stop
85
+ playwright-cli tracing-start / tracing-stop # record a trace; view it with: npx playwright show-trace <trace>
86
+ playwright-cli pdf --filename=page.pdf
87
+ ```
88
+
89
+ ## Sessions
90
+
91
+ State (cookies, localStorage) persists within a session across commands.
92
+
93
+ ```bash
94
+ playwright-cli --session=qa open <url> # named session
95
+ playwright-cli -s=qa open <url> --persistent # save profile to disk
96
+ playwright-cli list # running sessions
97
+ playwright-cli show # live dashboard, take over mouse/kbd
98
+ playwright-cli close-all / kill-all
99
+ ```
100
+
101
+ ## QA flow checklist
102
+
103
+ 1. `open <url>` → `snapshot`.
104
+ 2. For each step: find ref in snapshot → act → re-snapshot → verify expected element/text.
105
+ 3. `screenshot` at each checkpoint (pass and fail).
106
+ 4. On failure: `eval` the element, capture `console`, then re-run with `--headed` or record a trace.
107
+ 5. Report: what passed, what failed, with screenshot/snapshot paths. `close`.
108
+
109
+ ## Example — login flow
110
+
111
+ ```bash
112
+ playwright-cli open https://app.example.com/login
113
+ playwright-cli snapshot
114
+ playwright-cli fill e1 "user@example.com"
115
+ playwright-cli fill e2 "secret" --submit
116
+ playwright-cli snapshot # expect dashboard
117
+ playwright-cli eval "document.title"
118
+ playwright-cli screenshot --filename=logged-in.png
119
+ playwright-cli close
120
+ ```
121
+
122
+ ## When NOT to use
123
+
124
+ - Want a saved, repeatable test suite → use `/tdd` or write `@playwright/test` specs.
125
+ - Need long-running autonomous loops or persistent introspection → Playwright MCP may fit better.
@@ -19,7 +19,10 @@ Use `/ralplan` when the task needs planning before edits.
19
19
  2. **List implementation slices** in execution order — each slice should be independently verifiable
20
20
  3. **Define acceptance criteria** — what must be true when done
21
21
  4. **Define test shape** — which tests to write or run, what they cover
22
- 5. **Call out risks** — what could go wrong, tradeoffs chosen, alternatives rejected
22
+ 5. **Call out risks** — what could go wrong, tradeoffs chosen, alternatives rejected. For any
23
+ auth, security, or data-handling feature, the plan **must** name the security specifics even
24
+ if the request didn't: secret/token **expiry**, **single-use / replay** protection, and
25
+ **enumeration / rate-limiting**. Leaving these implicit is how the plan ships a hole.
23
26
  6. **Stop at the plan** unless the user explicitly asked to implement
24
27
 
25
28
  ## Output
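Two of the security specifics that step 5 of the `/ralplan` hunk above now requires, token expiry and single-use (replay) protection, can be made concrete with a minimal sketch. The `createResetTokenStore` name, the in-memory `Map`, and the injectable clock are assumptions for illustration. Enumeration and rate-limiting are not covered here.

```javascript
// Hypothetical in-memory store; a real system would use shared storage.
function createResetTokenStore({ ttlMs, now = () => Date.now() }) {
  const expiryByToken = new Map();
  return {
    issue(token) {
      expiryByToken.set(token, now() + ttlMs);
    },
    // Returns true at most once per token, and only before it expires.
    redeem(token) {
      const expiresAt = expiryByToken.get(token);
      if (expiresAt === undefined) return false; // unknown or already used
      expiryByToken.delete(token); // single use: a replay finds nothing
      return now() <= expiresAt;
    },
  };
}

// Usage with a fake clock so the behaviour is deterministic.
let fakeTime = 0;
const store = createResetTokenStore({ ttlMs: 1000, now: () => fakeTime });
store.issue("token-a");
const firstUse = store.redeem("token-a");
const replay = store.redeem("token-a");
store.issue("token-b");
fakeTime = 5000;
const afterExpiry = store.redeem("token-b");
```

Here `firstUse` is accepted, `replay` is rejected because the token was deleted on first use, and `afterExpiry` is rejected because the clock has moved past the token's lifetime. A plan that names these cases up front gives the implementation slice something concrete to test.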
@@ -13,15 +13,29 @@ Use `/tdd` when a change can be specified by tests.
13
13
  - The codebase has an existing test framework
14
14
  - You want to prove correctness incrementally
15
15
 
16
- ## Loop (repeat until done)
17
-
18
- 1. **Red** — write or identify a failing test that describes the desired behaviour
19
- 2. **Green** write the minimal code to make the test pass
20
- 3. **Refactor** clean up the code while keeping tests green
21
- 4. **Run** — run the full related test suite to check for regressions
16
+ ## Loop (Canon TDD — repeat until the list is empty)
17
+
18
+ 0. **List first** — before writing any code, read the **full spec/docstring** and write a
19
+ **test list**: every scenario you need to cover. Don't start from the happy path — walk the
20
+ edge-case taxonomy against the spec and add a line for each that applies:
21
+ - **Boundary** — min/max, zero, empty, first/last, length limits, collapsing/trimming
22
+ - **Empty/Null** — `""`, `None`, empty collection, whitespace-only
23
+ - **Format** — **unicode / accented characters**, emoji, special chars, malformed input
24
+ - **Implicit** — anything the spec *implies* but the prompt didn't spell out
25
+ A requirement that appears in the spec but not your list is the bug you're about to ship.
26
+ 1. **Red** — turn **exactly one** list item into a concrete test with real **assertions**
27
+ (`assert`, `expect`, `self.assertEqual`); run it and watch it **fail for the right reason**.
28
+ 2. **Green** — write the minimal code to make that test (and all previous tests) pass.
29
+ 3. **Refactor** — clean up while tests stay green.
30
+ 4. **Repeat** — take the next list item; add new items as you discover them. Run the full
31
+ related suite at the end to check for regressions.
22
32
 
23
33
  ## Rules
24
34
 
35
+ - Use **executable assertions** — a script that only prints results for a human to eyeball is
36
+ **not a test** and does not count as red-green. Every scenario on the list gets an assertion.
37
+ - Work the **whole list**, not just the first case — the bugs hide in the edge cases the prompt
38
+ didn't spell out (unicode/accents, empty input, boundaries).
25
39
  - Test **behaviour** through public surfaces, not implementation details
26
40
  - Each test should describe one behaviour — name it clearly (e.g. "returns 404 when user not found")
27
41
  - Avoid brittle tests that break when implementation changes but behaviour doesn't
@@ -30,6 +44,7 @@ Use `/tdd` when a change can be specified by tests.
30
44
 
31
45
  ## Output
32
46
 
47
+ - `Test list` — the scenarios you enumerated from the spec (incl. the edge cases)
33
48
  - `Tests written` — list of test names and what they cover
34
49
  - `Implementation` — what was changed to make tests pass
35
50
  - `Refactoring` — what was cleaned up
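The test-list step and the executable-assertions rule in the `/tdd` hunks above can be sketched as follows. The `capitalize` function and its scenarios are hypothetical, chosen only to touch each branch of the edge-case taxonomy. In a real Canon TDD loop the items would be turned into tests one at a time; the list below shows the end state.

```javascript
// Hypothetical function under test: upper-case the first character.
function capitalize(s) {
  if (typeof s !== "string" || s.length === 0) return "";
  const [first, ...rest] = Array.from(s); // split by code point, not UTF-16 unit
  return first.toUpperCase() + rest.join("");
}

// Test list: one line per scenario, grouped by the taxonomy.
const testList = [
  ["happy path", "hello", "Hello"],
  ["boundary: single character", "a", "A"],
  ["empty: empty string", "", ""],
  ["empty: null input", null, ""],
  ["empty: whitespace only", "  ", "  "],
  ["format: accented first letter", "\u00e9clair", "\u00c9clair"],
  ["format: emoji first", "\u{1F600}ok", "\u{1F600}ok"],
];

// Executable assertions: any mismatch throws instead of printing.
function runTestList() {
  for (const [name, input, expected] of testList) {
    const actual = capitalize(input);
    if (actual !== expected) {
      throw new Error(
        `${name}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`
      );
    }
  }
  return testList.length;
}
runTestList();
```

A script that only logged `capitalize("hello")` for a human to read would not count under the new rule. The loop above fails loudly on its own, which is what makes each list item a test.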
@@ -54,7 +54,7 @@ Number every cycle explicitly: "Cycle 1", "Cycle 2", etc.
54
54
 
55
55
  ## Rules
56
56
 
57
- - Prefer runnable checks over inspection — run tests, don't just read code
57
+ - Prefer runnable checks over inspection — run tests, don't just read code. For web UI flows, exercise the real page with `/qa-browse` rather than inspecting markup.
58
58
  - If tests don't exist, write minimal ones that cover the change
59
59
  - Route fixes back to `/ralph` or `/ultrawork` if they're substantial
60
60
 
@@ -20,7 +20,7 @@ Use `/verify` before saying done.
20
20
  - Tests: `npm test`, `pytest`, etc.
21
21
  - Build: does it compile/build without errors?
22
22
  - Lint: any new warnings?
23
- - Behaviour: does the feature work as described?
23
+ - Behaviour: does the feature work as described? For web UI flows, use `/qa-browse` to drive the live page and capture snapshot/screenshot evidence.
24
24
  3. **Read outputs** — don't assume green means pass; read the actual results
25
25
  4. **Report honestly** — if there are gaps, say so
26
26
 
@@ -0,0 +1,67 @@
1
+ name: CI
2
+
3
+ on:
4
+ push:
5
+ branches: [main]
6
+ pull_request:
7
+ branches: [main]
8
+ workflow_dispatch:
9
+
10
+ # Avoid piling up redundant runs on the same ref.
11
+ concurrency:
12
+ group: ci-${{ github.workflow }}-${{ github.ref }}
13
+ cancel-in-progress: true
14
+
15
+ permissions:
16
+ contents: read
17
+
18
+ env:
19
+ # Hermetic runs — never auto-load a developer's ~/.omp/.env, never self-update.
20
+ OMP_SKIP_USER_ENV: "1"
21
+ OMP_NO_UPDATE_CHECK: "1"
22
+
23
+ jobs:
24
+ build-test:
25
+ name: Build · Test · Lint (Node ${{ matrix.node }})
26
+ runs-on: ubuntu-latest
27
+ strategy:
28
+ fail-fast: false
29
+ matrix:
30
+ node: [20, 22]
31
+ steps:
32
+ - uses: actions/checkout@v7
33
+
34
+ - uses: actions/setup-node@v6
35
+ with:
36
+ node-version: ${{ matrix.node }}
37
+ cache: npm
38
+
39
+ - name: Install dependencies
40
+ run: npm ci
41
+
42
+ - name: Build (tsc)
43
+ run: npm run build
44
+
45
+ - name: Lint (eslint)
46
+ run: npm run lint
47
+
48
+ - name: Unit tests (vitest)
49
+ run: npm test
50
+
51
+ skills:
52
+ name: Validate skills & catalog
53
+ runs-on: ubuntu-latest
54
+ steps:
55
+ - uses: actions/checkout@v7
56
+ - uses: actions/setup-node@v6
57
+ with:
58
+ node-version: 20
59
+ cache: npm
60
+ - run: npm ci
61
+
62
+ # Project's own validators.
63
+ - name: Lint skills (omp lint:skills)
64
+ run: npm run lint:skills
65
+
66
+ - name: Validate catalog
67
+ run: npm run check:catalog
@@ -0,0 +1,157 @@
1
+ name: Security
2
+
3
+ on:
4
+ push:
5
+ branches: [main]
6
+ pull_request:
7
+ branches: [main]
8
+ schedule:
9
+ # Weekly full scan (Mondays 06:17 UTC) to catch newly disclosed CVEs.
10
+ - cron: "17 6 * * 1"
11
+ workflow_dispatch:
12
+
13
+ concurrency:
14
+ group: security-${{ github.workflow }}-${{ github.ref }}
15
+ cancel-in-progress: true
16
+
17
+ permissions:
18
+ contents: read
19
+
20
+ env:
21
+ OMP_SKIP_USER_ENV: "1"
22
+ OMP_NO_UPDATE_CHECK: "1"
23
+
24
+ jobs:
25
+ # ── 1. Native, free, zero-secret baseline ─────────────────────────────
26
+ npm-audit:
27
+ name: npm audit (prod deps)
28
+ runs-on: ubuntu-latest
29
+ steps:
30
+ - uses: actions/checkout@v7
31
+ - uses: actions/setup-node@v6
32
+ with:
33
+ node-version: 20
34
+ cache: npm
35
+ - run: npm ci
36
+ # Fail only on HIGH/critical in production dependencies.
37
+ - name: npm audit (high+, prod only)
38
+ run: npm run audit:ci
39
+
40
+ skills-safety:
41
+ name: Skills safety scan
42
+ runs-on: ubuntu-latest
43
+ steps:
44
+ - uses: actions/checkout@v7
45
+ - uses: actions/setup-node@v6
46
+ with:
47
+ node-version: 20
48
+ # No install needed — pure Node script over SKILL.md / agents / catalog.
49
+ - name: Static safety audit of skills & agents
50
+ run: node scripts/skills-safety-scan.mjs --root .
51
+
52
+ codeql:
53
+ name: CodeQL (JS/TS)
54
+ runs-on: ubuntu-latest
55
+ permissions:
56
+ contents: read
57
+ security-events: write
58
+ actions: read
59
+ steps:
60
+ - uses: actions/checkout@v7
61
+ - uses: github/codeql-action/init@v4
62
+ with:
63
+ languages: javascript-typescript
64
+ queries: security-and-quality
65
+ - uses: github/codeql-action/analyze@v4
66
+ with:
67
+ category: "/language:javascript-typescript"
68
+
69
+ dependency-review:
70
+ name: Dependency review (PR only)
71
+ if: github.event_name == 'pull_request'
72
+ runs-on: ubuntu-latest
73
+ permissions:
74
+ contents: read
75
+ pull-requests: write
76
+ steps:
77
+ - uses: actions/checkout@v7
78
+ - uses: actions/dependency-review-action@v5
79
+ with:
80
+ fail-on-severity: high
81
+ comment-summary-in-pr: on-failure
82
+
83
+ # ── 2. Socket — supply-chain / malicious package detection ────────────
84
+ # Requires repo secret SOCKET_SECURITY_API_KEY (free at socket.dev).
85
+ socket:
86
+ name: Socket supply-chain scan
87
+ runs-on: ubuntu-latest
88
+ steps:
89
+ - uses: actions/checkout@v7
90
+ - name: Check for Socket token
91
+ id: gate
92
+ run: |
93
+ if [ -n "${{ secrets.SOCKET_SECURITY_API_KEY }}" ]; then
94
+ echo "enabled=true" >> "$GITHUB_OUTPUT"
95
+ else
96
+ echo "enabled=false" >> "$GITHUB_OUTPUT"
97
+ echo "::notice title=Socket skipped::Set the SOCKET_SECURITY_API_KEY repo secret to enable Socket scanning."
98
+ fi
99
+ - uses: actions/setup-node@v6
100
+ if: steps.gate.outputs.enabled == 'true'
101
+ with:
102
+ node-version: 20
103
+ - name: Socket CLI scan
104
+ if: steps.gate.outputs.enabled == 'true'
105
+ env:
106
+ SOCKET_SECURITY_API_KEY: ${{ secrets.SOCKET_SECURITY_API_KEY }}
107
+ run: npx -y @socketsecurity/cli@latest scan create . --view --no-interactive
108
+
109
+ # ── 3. Snyk — dependency + code vulnerability scanning ────────────────
110
+ # Requires repo secret SNYK_TOKEN (free at snyk.io). Uploads SARIF to the
111
+ # GitHub Security tab.
112
+ snyk:
113
+ name: Snyk (deps + code)
114
+ runs-on: ubuntu-latest
115
+ permissions:
116
+ contents: read
117
+ security-events: write
118
+ steps:
119
+ - uses: actions/checkout@v7
120
+ - name: Check for Snyk token
121
+ id: gate
122
+ run: |
123
+ if [ -n "${{ secrets.SNYK_TOKEN }}" ]; then
124
+ echo "enabled=true" >> "$GITHUB_OUTPUT"
125
+ else
126
+ echo "enabled=false" >> "$GITHUB_OUTPUT"
127
+ echo "::notice title=Snyk skipped::Set the SNYK_TOKEN repo secret to enable Snyk scanning."
128
+ fi
129
+ - uses: actions/setup-node@v6
130
+ if: steps.gate.outputs.enabled == 'true'
131
+ with:
132
+ node-version: 20
133
+ cache: npm
134
+ - run: npm ci
135
+ if: steps.gate.outputs.enabled == 'true'
136
+
137
+ - name: Snyk Open Source (dependencies)
138
+ if: steps.gate.outputs.enabled == 'true'
139
+ continue-on-error: true
140
+ env:
141
+ SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
142
+ run: npx -y snyk@latest test --severity-threshold=high --sarif-file-output=snyk-deps.sarif
143
+
144
+ - name: Snyk Code (SAST)
145
+ if: steps.gate.outputs.enabled == 'true'
146
+ continue-on-error: true
147
+ env:
148
+ SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
149
+ run: npx -y snyk@latest code test --severity-threshold=high --sarif-file-output=snyk-code.sarif
150
+
151
+ - name: Upload Snyk results to GitHub Security
152
+ if: steps.gate.outputs.enabled == 'true'
153
+ uses: github/codeql-action/upload-sarif@v4
154
+ with:
155
+ sarif_file: .
156
+ category: snyk
157
+ continue-on-error: true
package/README.md CHANGED
@@ -385,6 +385,7 @@ omp grows in vertical slices. Items aren't pinned to specific semver versions
385
385
  - [Jira adapter](docs/jira.md) — configuration discovery, safe operations, dry-runs, fallback payloads
386
386
  - [Self-evolve](docs/self-evolve.md) — extracting reusable skills from session transcripts
387
387
  - [Slack setup](docs/slack-setup.md) — Slack app manifest, scopes, Socket-Mode token, `omp gateway serve`
388
+ - [Skill benchmark](benchmarks/skill-bench/README.md) — agentic benchmark that measures whether a skill actually beats *just telling the model* (baseline / one-line prompt / skill arms), with live Haiku 4.5 findings
388
389
 
389
390
  ## Layout
390
391
 
@@ -394,6 +395,7 @@ omp grows in vertical slices. Items aren't pinned to specific semver versions
394
395
  hooks/hooks.json # lifecycle hook manifest
395
396
  scripts/*.mjs # hook implementations
396
397
  src/ # omp CLI, team runtime, gateway/comms, schedule, mode-state loops
398
+ benchmarks/skill-bench/ # agentic benchmark: does a skill beat just telling the model?
397
399
  ```
398
400
 
399
401
  Skills follow the [Copilot agent-skills docs](https://docs.github.com/en/copilot) — project skills live in `.github/skills/` and are invoked with `/skill-name`.
@@ -587,6 +587,52 @@
587
587
  }
588
588
  }
589
589
  },
590
+ {
591
+ "id": "ponytail",
592
+ "name": "ponytail",
593
+ "title": "Ponytail",
594
+ "category": "code",
595
+ "summary": "Lazy senior dev mode \u2014 simplest solution that works (YAGNI, stdlib first).",
596
+ "notes": "Lite slash project skill plus a persisted mode re-injected by the prompt-submit hook (omp ponytail start|status|off).",
597
+ "defaultCommand": "ponytail",
598
+ "phase1": true,
599
+ "sourceSkill": "ponytail",
600
+ "providers": {
601
+ "copilot": "supported"
602
+ },
603
+ "support": {
604
+ "copilot": "native"
605
+ },
606
+ "providerSupport": {
607
+ "copilot": {
608
+ "state": "native",
609
+ "notes": "Use /ponytail from .github/skills/ponytail/SKILL.md."
610
+ }
611
+ }
612
+ },
613
+ {
614
+ "id": "code.minimal",
615
+ "name": "code.minimal",
616
+ "title": "Ponytail",
617
+ "category": "code",
618
+ "summary": "Lazy senior dev mode \u2014 simplest solution that works (YAGNI, stdlib first).",
619
+ "notes": "Alias for ponytail capability.",
620
+ "defaultCommand": "ponytail",
621
+ "phase1": true,
622
+ "sourceSkill": "ponytail",
623
+ "providers": {
624
+ "copilot": "supported"
625
+ },
626
+ "support": {
627
+ "copilot": "native"
628
+ },
629
+ "providerSupport": {
630
+ "copilot": {
631
+ "state": "native",
632
+ "notes": "Use /ponytail from .github/skills/ponytail/SKILL.md."
633
+ }
634
+ }
635
+ },
590
636
  {
591
637
  "id": "debug",
592
638
  "name": "debug",