npm - wordpress-agent-kit - Versions diffs - 0.2.2 → 0.3.2 - Mend

wordpress-agent-kit 0.2.2 → 0.3.2


Files changed (51)

package/.github/skills/wp-abilities-verify/SKILL.md ADDED

@@ -0,0 +1,215 @@
+---
+name: wp-abilities-verify
+description: "Verify a WordPress plugin's Abilities API registrations: enumerate abilities, check that callback behavior matches each annotation's claim (the adversarial readonly-but-writes detection), validate permissions and schemas, and validate audit documents produced by wp-abilities-audit."
+license: GPL-2.0-or-later
+compatibility: "Targets WordPress 6.9+ plugins (PHP 7.2.24+). Requires a runnable environment (wp-env, docker-based dev stack, or equivalent) for runtime mode; static mode runs entirely from the plugin checkout with no env. Filesystem-based agent with bash + node."
+---
+# WP Abilities Verify
+Verify a WordPress plugin's Abilities API registrations. The
+centerpiece is the **adversarial annotation correctness check**: a
+`readonly: true` ability that actually writes (via `$wpdb->update`,
+`update_option`, a non-GET delegate, etc.) is a security and UX
+disaster because agents plan actions on the basis of the annotations
+they introspect. This skill catches those lies by reading the callback
+body and comparing what it does against what the annotation claims.
+The skill also validates audit docs produced by `wp-abilities-audit`,
+checks permission gates and schema hygiene, and optionally executes
+each ability against a live environment.
+## When to use
+- After abilities have been registered in a plugin but before a PR
+  lands.
+- As a health-check on an already-shipped plugin (catch regressions
+  where a refactor turned a readonly ability into a writing one).
+- To validate an audit document before handing it to an implementer.
+## Two modes
+- **Static mode** — runs from the plugin checkout. No env. Enumerates
+  via source inspection, runs the adversarial correctness check, runs
+  schema and permission lints, and validates audit docs.
+- **Runtime mode** — requires a running env. Does everything static
+  does PLUS: `wp_get_abilities()` for authoritative enumeration,
+  executes each ability with curated inputs, confirms permission
+  roundtrip against real users, and runs a twin-invocation heuristic
+  on `idempotent: true` abilities to flag candidates for review
+  (return-value equality is a signal, not a verdict — core defines
+  idempotent as "no additional effect on the environment").
+Both modes produce the same structured report format.
+A static-mode PASS means "no obvious-shape violations," not "verified
+write-free." For high-stakes plugins, run runtime mode before landing
+— it catches bootstrap-order, permission-roundtrip, and idempotency
+issues that static can't. See `references/annotation-correctness.md`
+for the static blind spots.
+## Inputs required
+1. **Plugin checkout path** — working tree to verify.
+2. **Mode** — `static` or `runtime`. Default to static if unspecified.
+3. **(Runtime only) Env-up command** — read the plugin's `AGENTS.md`.
+   Common patterns: `npm run wp-env start`, `npx wp-env start`, or a
+   composer-based bring-up. Plugin families with their own dev tooling
+   will document their own command. Do NOT assume `npm run wp-env`
+   works.
+4. **(Optional) Audit doc path** — enables cross-checks between the
+   audit and the registered abilities, and validates the audit itself.
+5. **Report output path** — explicit path, typically the user's vault.
+## Prerequisites
+- `wp-project-triage` has been run on the plugin.
+- The plugin has at least one registered ability in source. Zero hits
+  on `wp_register_ability(` → return a clear "no abilities registered"
+  report, not an empty PASS.
+## Procedure
+### 1. (If audit provided) Validate the audit doc
+Read `references/audit-schema-validation.md`. Validate the audit
+against the canonical schema owned by `wp-abilities-audit`. Surface
+missing required fields, multiple `reference_ability: true`, and
+`backing: null` entries that aren't paired with a `surfaced_gaps`
+entry. `backing: null` alone is WARN (intentional gap output), not
+FAIL.
+### 2. Enumerate abilities statically
+Read `references/static-enumeration.md`. Find each
+`wp_register_ability(` call, extract the name, the annotation block,
+and the execute-callback location. Use a multi-line tool (`rg
+--multiline --pcre2`) — the canonical formatting splits the call
+across lines. Record each ability's source-file + line + annotations +
+callback byte range.
+### 3. (Runtime only) Enumerate via REST + wp-cli
+Read `references/runtime-harness.md`. Bring the env up using the
+command from `AGENTS.md`, then enumerate via `wp_get_abilities()` over
+wp-cli and cross-check against the static inventory. Source-only →
+FAIL (registration not firing). Runtime-only → WARN (dynamic
+registration path).
+### 4. Annotation correctness (the adversarial core)
+Read `references/annotation-correctness.md`. Read each callback body
+and verify it matches the annotation claim:
+- `readonly: true` → callback must not write to the database, the
+  options table, post / user / term / comment data, the filesystem,
+  cron, or via non-GET HTTP / REST delegates.
+- `destructive: false` → callback must not delete, refund, void,
+  cancel, or trash.
+- `idempotent: true` → repeated calls with the same input have no
+  additional effect on the environment (per the `idempotent`
+  annotation's docblock in `class-wp-ability.php`). Static catches
+  counter writes and per-call cron schedules; runtime adds a
+  twin-invocation heuristic for visible state changes.
+The reference lists common write patterns as a starting set, not a
+checklist — plugin vocabularies vary, and the agent extends with verbs
+specific to the plugin under verification.
+False positives get suppressed via an inline `// verify-ignore:
+<annotation> -- <reason>` comment.
+### 5. Permission roundtrip
+Read `references/permission-roundtrip.md`. Static: classify each
+`permission_callback` against the six shapes (preferred Shape A
+`current_user_can(...)`; FAIL on Shape B-bad `WP_REST_Request`
+patterns or Shape E literal `true`). Runtime: anon and subscriber
+denied; admin allowed (unless deliberately public). When an audit was
+provided, cross-check the registered cap against the audit's declared
+gate.
+### 6. Schema lints
+Read `references/schema-lints.md`. Six small principles applied to
+each ability's `input_schema`: object schemas declare
+`additionalProperties`; required fields have descriptions; enums
+non-empty; no `$ref`; defaults are statically constant (including
+`(object) array()`); reference abilities have no required inputs.
+Cross-reference `../wp-abilities-api/references/input-schema-gotchas.md`
+for the four runtime gotchas (defaults not injected on the
+property-level path, pagination key drift, `empty()` on string IDs,
+direct vs indirect invocation strictness).
+### 7. Error-code vocabulary
+Cross-reference `../wp-abilities-api/references/error-code-vocabulary.md`.
+Inspect each callback's `WP_Error` returns; non-vocabulary codes →
+WARN.
+## Verification
+The run produces a structured markdown report at the user-specified
+path:
+```
+---
+Last updated: <YYYY-MM-DD HH:MM>
+---
+# <Plugin> Abilities Verification — <Static|Runtime> Mode
+## Status: <PASS|WARN|FAIL>
+## Audit doc validation (if provided)
+## Static inventory
+## Annotation correctness
+| Ability | Claim | Result | Evidence |
+|---|---|---|---|
+## Permission gates
+## Schema lints
+## Error-code vocabulary
+```
+Every ability is OK, WARN, or FAIL. A single FAIL → top-line FAIL;
+WARNs without FAILs → WARN; otherwise PASS.
+## Failure modes / debugging
+- **Env not reachable (runtime)** — env-up failed or Docker isn't
+  running. Re-run `wp-project-triage`, then fix the env. Don't fall
+  back silently to static without noting it in the report.
+- **No abilities in source** — return a clear "nothing to verify"
+  report.
+- **Audit schema mismatch** — point at
+  `references/audit-schema-validation.md`; don't auto-fix the audit.
+- **False positive on readonly-writes** — see the `// verify-ignore`
+  mechanism in `references/annotation-correctness.md`. Document why
+  each suppression is legitimate.
+- **Runtime enumeration smaller than static** — registration hook
+  isn't firing. Check init hook timing, activation state, autoloader
+  order.
+## Escalation
+- Recurring legitimate pattern that trips the adversarial check across
+  multiple plugins → propose adding it to the suppression guidance in
+  `annotation-correctness.md`. Don't broaden the candidate-pattern
+  list speculatively.
+- Audit-schema validator rejects a legitimate audit → the canonical
+  schema in `../wp-abilities-audit/references/audit-schema.md` has
+  evolved. Update `references/audit-schema-validation.md` to match.
+## Out of scope
+Token-budget measurement is a separate verification axis — an
+annotation-clean, schema-clean, runtime-passing ability set can still
+be unshippable if its `tools/list` form burns through an agent's
+context budget. That axis is tracked separately. Do not aggregate
+manual or external measurement into this skill's PASS / FAIL verdict.
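
An illustrative sketch, not part of the package diff: the enumeration cross-check in step 3 and the top-line status rule in the Verification section can be expressed as two small functions. The helper names `crossCheck` and `rollUp`, the finding shape, and the sample ability names are hypothetical; the skill itself describes a manual procedure and ships no such code.

```javascript
// Sketch only: crossCheck and rollUp are hypothetical helpers, not shipped by the skill.

// Compare the static inventory with the runtime inventory (step 3).
// Source-only  -> FAIL (registration is not firing).
// Runtime-only -> WARN (dynamic registration path).
function crossCheck(staticNames, runtimeNames) {
  const inSource = new Set(staticNames);
  const atRuntime = new Set(runtimeNames);
  const findings = [];
  for (const name of inSource) {
    findings.push({ ability: name, result: atRuntime.has(name) ? "OK" : "FAIL" });
  }
  for (const name of atRuntime) {
    if (!inSource.has(name)) findings.push({ ability: name, result: "WARN" });
  }
  return findings;
}

// Top-line status: any FAIL -> FAIL; WARNs without FAILs -> WARN; otherwise PASS.
function rollUp(findings) {
  if (findings.some((f) => f.result === "FAIL")) return "FAIL";
  if (findings.some((f) => f.result === "WARN")) return "WARN";
  return "PASS";
}

// Example inventories (made-up ability names).
const findings = crossCheck(
  ["myplugin/get-things", "myplugin/submit-thing"],
  ["myplugin/get-things", "myplugin/dynamic-thing"]
);
console.log(findings);
console.log(rollUp(findings)); // "FAIL": submit-thing is in source but not registered at runtime
```

Keeping the roll-up separate from the individual checks means the same rule can be applied to findings from every report section, which is how the skill describes the top-line status.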

package/.github/skills/wp-abilities-verify/references/annotation-correctness.md ADDED

@@ -0,0 +1,154 @@
+# Annotation Correctness
+The adversarial core of this skill: verify what the annotation claims
+by reading the callback. A `readonly: true` ability that actually
+writes is a security and UX disaster, and unit tests don't catch it
+because the mock looks just like the real writer.
+## Why this matters
+Agents plan actions on the basis of the annotations they introspect.
+If an ability is annotated `readonly: true`, an orchestrator will
+confidently invoke it in a dry-run, speculative exploration, or
+multi-agent fan-out without thinking twice — because `readonly` means
+"can't break anything".
+A `readonly: true` ability that actually writes is therefore:
+1. **A security hazard** — agents will invoke it in contexts where
+   side effects are forbidden.
+2. **A UX disaster** — the agent's mental model of what happened
+   diverges silently from reality.
+3. **Undetectable at the annotation layer** — the annotation says
+   `readonly: true`; nothing in the registration forces it to be true.
+Unit tests won't catch this class of bug because the mock the test
+constructs looks just like the real writer. What catches it is reading
+the execute callback body and comparing what it does against what the
+annotation says it does.
+## What each annotation promises
+| Annotation | What it promises (from core) |
+|---|---|
+| `readonly: true` | No durable writes to user / business state. GET-style side-effect-free. |
+| `destructive: false` | Won't irreversibly destroy data or forfeit money. |
+| `idempotent: true` | Repeated calls with the same arguments produce no additional effect on the environment (per the `idempotent` annotation's docblock in `class-wp-ability.php`). |
+`readonly: true` prohibits durable writes to user or business state.
+Read-through cache writes (e.g. `set_transient`) and observability
+timestamps (e.g. `last_read_at`) are acceptable when explicitly
+annotated with `verify-ignore` — see the "Suppressing legitimate
+exceptions" section below. The static check treats unannotated writes
+as FAILs; annotated ones pass with the reason recorded as evidence.
+These overlap but are not redundant: `readonly` is the strictest;
+`destructive: false` is weaker (updates that don't destroy are OK);
+`idempotent` is orthogonal (a POST that writes the same row twice is
+both "writes" and "idempotent").
+The Abilities REST run controller operationalizes annotations into
+HTTP method routing (`readonly: true` → GET, `destructive && idempotent`
+→ DELETE, otherwise POST — see
+`WP_REST_Abilities_V1_Run_Controller::validate_request_method()`). That
+mapping is the load-bearing semantic; verify checks that each
+callback's behavior is consistent with how the routing will treat it.
+## How to verify
+For each ability, locate the `execute_callback` body (see
+`static-enumeration.md` step 4), then:
+1. **Read the callback end-to-end.** Form a model of what it actually
+   does. Don't rely on pattern-matching alone.
+2. **Compare to the claim.** A `readonly: true` callback that writes
+   anywhere — the database via `$wpdb`, options / post / user / term /
+   comment writes, filesystem, cron schedules, or non-GET HTTP/REST
+   delegates — FAILs readonly. A `destructive: false` callback that
+   deletes, refunds, voids, cancels, or trashes FAILs destructive. An
+   `idempotent: true` callback whose environmental effect *accumulates*
+   per call (counters, append-only logs, per-call cron schedules) FAILs
+   idempotent.
+3. **Record evidence.** Cite file + line of the offending pattern so a
+   reviewer can jump straight to it.
+Use grep or ripgrep to surface *candidates*. Common writes worth
+looking for:
+```text
+$wpdb->update / insert / delete / replace
+update_option / add_option / delete_option
+wp_insert_post / wp_update_post / wp_delete_post
+update_post_meta / update_user_meta / update_term_meta
+->save / ->delete / ->set_status / ->add_*
+wp_remote_post / wp_remote_delete
+file_put_contents / wp_upload_bits / unlink / rename
+wp_schedule_event / wp_schedule_single_event
+```
+Treat the list as a starting set, not a checklist. Plugin vocabularies
+vary — domain-specific verbs (`->markAsPaid`, `->commit`, `->refund`)
+and framework patterns (Doctrine `->persist`, queue `->dispatch`) won't
+appear above. Once you've grepped for candidates, read the callback to
+confirm whether each hit is actually a write and whether it
+contradicts the annotation in context.
+## Known blind spots
+Static reading + grep can't reach every write. A static-mode PASS
+means "no obvious-shape violations," not "verified write-free."
+| Blind spot | Why static misses it | Mitigation |
+|---|---|---|
+| Indirected service writes — `$repo->persist()`, `$service->commit()`, custom verbs. | Any finite verb list drifts; domain vocabulary varies. | Inspect callbacks that touch custom services or repositories. |
+| `do_action()` whose listeners write. | Provenance ambiguity: ability looks clean; system mutates state in a listener. | Audit listeners on the action. If any writes, downgrade or split. |
+| Implicit core hooks fired by WP API calls — `wp_insert_post()` fires `save_post`; `update_option()` fires `updated_option`; `wp_create_user()` fires `user_register`; etc. | The WP API call IS the write; the hooks fire automatically as a side effect. Agents looking for `do_action()` won't see this. | Treat any WP write-API call as a write regardless of whether the callback also calls `do_action()`. |
+| Action Scheduler / deferred writes — `as_schedule_single_action()`, `WC()->queue()->schedule_single()`, custom job dispatchers. | The callback returns cleanly with no immediately visible DB mutation; the durable write lands later in the AS tables. A static grep for `$wpdb->insert` won't catch it. | Treat scheduler dispatches as writes. The "no additional effect on the environment" promise of `idempotent: true` is violated by accumulating queued jobs even if the immediate return value is constant. |
+| Variable-built HTTP methods on delegate helpers. | Static can't follow runtime values. | Treat callers of helpers whose default method isn't `GET` as suspect. |
+| Tautological capability gates — `current_user_can('read')` on a "private" ability. | The cap looks valid; subscribers happen to hold it. | Cross-reference the permission roundtrip — subscribers should be denied. |
+For high-stakes plugins, run runtime mode (see `runtime-harness.md`)
+before landing — it catches some blind spots via twin-invocation diff
+and live state inspection.
+## Suppressing legitimate exceptions
+When a pattern that looks like a write is semantically a read (e.g.
+populating a read-through cache via `set_transient`, updating a
+`last_read_at` timestamp for tracking, diagnostic logging), suppress
+with an inline comment on the offending line:
+```php
+// verify-ignore: readonly -- writes to read-through cache; semantically a read.
+set_transient( $cache_key, $data, HOUR_IN_SECONDS );
+```
+Format: `// verify-ignore: <annotation> -- <reason>`. Legal annotation
+names: `readonly`, `destructive`, `idempotent`, `all`. Narrower is
+better than `all`.
+## Runtime check complement
+For `idempotent: true` abilities, runtime mode adds a heuristic: invoke
+twice with the same input and compare. See `runtime-harness.md`
+Check 6. Differing returns are a *signal* to inspect, not a verdict —
+under core's definition, the question is whether the *environment*
+changed, not whether the *return value* matches. A response that
+embeds a per-call timestamp / nonce / random ID is fine; a response
+that reflects a counter that grew between calls is not.
+## Report format
+Each finding gets one row in the run's "Annotation correctness" table:
+```markdown
+| Ability | Claim | Result | Evidence |
+|---|---|---|---|
+| myplugin/get-things | readonly=true | OK | callback reads only |
+| myplugin/get-things-with-counts | readonly=true | FAIL | `src/Abilities/Things.php:142`: `$wpdb->update( $table, ... )` |
+| myplugin/submit-thing | destructive=false | OK | no destructive patterns |
+| myplugin/submit-thing | idempotent=false | OK | check only applies when idempotent=true; false annotation acknowledged |
+```
+The evidence column MUST cite file + line so a reviewer can jump
+straight to the issue.
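
An illustrative sketch, not part of the package diff: the candidate scan and the `verify-ignore` suppression described in this reference could be combined roughly as follows. `WRITE_PATTERNS` is only a subset of the starting set (plus `set_transient`, which the reference treats as a write needing suppression), `findCandidates` is a hypothetical name, and accepting the comment on either the same line or the line directly above is an assumption based on the PHP example. A hit is only a candidate; the reference still requires reading the callback.

```javascript
// Sketch only: a subset of the reference's starting set, not a complete checklist.
const WRITE_PATTERNS = [
  /\$wpdb->(update|insert|delete|replace)\b/,
  /\b(update|add|delete)_option\s*\(/,
  /\bwp_(insert|update|delete)_post\s*\(/,
  /\bset_transient\s*\(/,
];

// Format from the reference: // verify-ignore: <annotation> -- <reason>
const IGNORE = /\/\/\s*verify-ignore:\s*(readonly|destructive|idempotent|all)\s+--\s+(.+)$/;

// Return the suppression reason if `line` carries a verify-ignore for `claim`, else null.
function suppressionFor(line, claim) {
  const match = IGNORE.exec(line || "");
  if (!match) return null;
  return match[1] === claim || match[1] === "all" ? match[2].trim() : null;
}

// Scan callback source lines for candidate writes against one annotation claim.
// Assumption: the comment may sit on the offending line or the line directly above it.
function findCandidates(lines, claim) {
  const hits = [];
  lines.forEach((code, i) => {
    if (code.trim().startsWith("//")) return; // skip pure comment lines
    if (!WRITE_PATTERNS.some((p) => p.test(code))) return;
    const reason = suppressionFor(code, claim) || suppressionFor(lines[i - 1], claim);
    hits.push({ line: i + 1, code: code.trim(), suppressedBy: reason });
  });
  return hits;
}

const callbackLines = [
  "// verify-ignore: readonly -- writes to read-through cache; semantically a read.",
  "set_transient( $cache_key, $data, HOUR_IN_SECONDS );",
  "$wpdb->update( $table, $data, $where );",
];
console.log(findCandidates(callbackLines, "readonly"));
```

In this example the `set_transient` call is reported with its recorded reason, while the `$wpdb->update` call has no suppression and would be a FAIL candidate for `readonly: true`.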

package/.github/skills/wp-abilities-verify/references/audit-schema-validation.md ADDED

@@ -0,0 +1,131 @@
+# Audit Schema Validation
+How `wp-abilities-verify` validates an audit document produced by
+`wp-abilities-audit`. The canonical schema (field tables, types,
+invariants, known limitations) lives in
+`../../wp-abilities-audit/references/audit-schema.md` — this reference
+covers only the validation procedure: how to extract the YAML, what
+checks to run in what order, and how to report results.
+If a field type or shape question is not answered here, look in the
+canonical schema. Do NOT duplicate field tables in this file — the
+canonical is the single source of truth.
+## Why verify owns the validator
+Verify fails fast on a malformed audit so the rest of its procedure can
+assume well-formed input. Audit produces; verify validates the
+production. Co-locating the validator with verify keeps the
+"validate audit" step in the same procedure as "validate registered
+abilities" and lets a single run produce one consolidated report.
+## Step 1 — extract the YAML
+The audit doc is a markdown file with a single fenced ` ```yaml ` block
+containing the structured fields:
+```bash
+# Scan for the ```yaml fence and capture until the closing ``` fence.
+awk '/^```yaml$/{f=1;next} /^```$/{f=0} f' <audit-doc.md> > /tmp/audit.yaml
+```
+If the audit has multiple YAML blocks (it shouldn't, but defensively),
+take the first one with `proposed_abilities` as a top-level key.
+Parse with any YAML library — `js-yaml` from Node, `yaml` (Python), or
+`yq` from the command line. None of the canonical fields require
+non-standard YAML features (no anchors, no aliases), so a plain
+`yaml.load` is sufficient.
+## Step 2 — validate against the canonical schema
+Apply the field-shape rules defined in
+`../../wp-abilities-audit/references/audit-schema.md`. Specifically:
+1. Every required top-level field is present and non-empty (see
+   "Top-level fields" in the canonical).
+2. `capability_gate` matches one of the legal shapes (single string,
+   `{read, write}` object, or — with WARN per the canonical's "Known
+   limitations" — the legacy slash-separated string).
+3. Every entry in `proposed_abilities` has every required per-ability
+   field with the right type (see "`proposed_abilities`" in the
+   canonical).
+4. Each ability's `annotations` block has all three booleans
+   (`readonly`, `destructive`, `idempotent`) as actual booleans —
+   string `"true"` / `"false"` is FAIL (indicates a quoting bug).
+5. Each ability's `backing` is either an object with the canonical
+   fields or `null`; `null` is WARN, not FAIL (it's intentional gap
+   output).
+Missing required field → FAIL. Wrong type → FAIL. Legacy
+`capability_gate` slash-string → WARN.
+## Step 3 — whole-audit invariants
+Run these after per-field validation passes:
+### Exactly 0 or 1 abilities with `reference_ability: true`
+Count abilities where `reference_ability` is `true`. More than 1 → FAIL
+(the schema permits at most one reference; multiple are ambiguous for
+implementers picking a starting point).
+```js
+const refCount = audit.proposed_abilities.filter(a => a.reference_ability === true).length;
+if (refCount > 1) fail("multiple abilities claim reference_ability: true");
+```
+### Every `backing: null` ability appears in `surfaced_gaps`
+Per the canonical's "Known limitations": a `null` backing is intentional
+gap output and MUST be paired with a matching `surfaced_gaps` entry.
+```js
+const gapNames = new Set((audit.surfaced_gaps || []).map(g => g.name));
+for (const ability of audit.proposed_abilities) {
+  if (ability.backing === null && !gapNames.has(ability.name)) {
+    fail(`ability ${ability.name} has backing: null but is missing from surfaced_gaps`);
+  }
+}
+```
+### `excluded_from_mvp` and `surfaced_gaps` may be empty
+Both are optional; empty arrays are legal. Missing entirely → WARN
+(schema expects them, even if empty).
+## Step 4 — emit the report section
+Each check goes into the "Audit doc validation" section of the run's
+final report:
+```markdown
+## Audit doc validation
+| Check | Result | Detail |
+|---|---|---|
+| Top-level required fields | OK | All 7 required fields present |
+| `capability_gate` shape | OK | string (single-cap) |
+| Per-ability fields | WARN | 1 ability has `backing: null` (intentional) |
+| `reference_ability` uniqueness | OK | 1 ability marked |
+| `surfaced_gaps` consistency | OK | all `backing: null` entries present |
+```
+A single FAIL in this section makes the whole run FAIL; verify cannot
+meaningfully continue without a trustworthy audit. WARN entries don't
+block the rest of the procedure.
+The procedure is manual-but-deterministic: follow the steps above in
+order, emit the report section, and fail fast on any missing required
+field. A future contribution may add a deterministic CLI helper that
+extracts the YAML fence and applies the rules end-to-end; until that
+exists, the steps above are the contract.
+## Escalation
+If the validator rejects an audit that's actually well-formed, the
+canonical schema in
+`../../wp-abilities-audit/references/audit-schema.md` has evolved.
+Update this file's procedure to match (likely adding a new invariant
+or relaxing a field rule). Don't loosen the validation in isolation —
+the canonical schema is the contract; this file is the enforcer.
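
An illustrative sketch, not part of the package diff: rule 4 of Step 2 (all three annotations present as actual booleans, with string `"true"` / `"false"` treated as FAIL) could be checked along these lines. `checkAnnotations` is a hypothetical helper; the `name` and `annotations` fields follow the snippets in this reference, and the sample ability is made up.

```javascript
// Sketch only: checkAnnotations is a hypothetical helper for Step 2, rule 4.
const REQUIRED_ANNOTATIONS = ["readonly", "destructive", "idempotent"];

// Return a list of FAIL messages for one entry of proposed_abilities.
function checkAnnotations(ability) {
  const problems = [];
  const annotations = ability.annotations || {};
  for (const key of REQUIRED_ANNOTATIONS) {
    const value = annotations[key];
    if (value === undefined) {
      problems.push(`${ability.name}: annotations.${key} is missing`);
    } else if (typeof value !== "boolean") {
      // A quoted "true" / "false" parses as a string, which signals a quoting bug.
      problems.push(
        `${ability.name}: annotations.${key} is ${typeof value} ${JSON.stringify(value)}, expected a boolean`
      );
    }
  }
  return problems;
}

// Made-up audit entry: one quoted boolean and one missing annotation.
const sample = {
  name: "myplugin/get-things",
  annotations: { readonly: "true", destructive: false },
};
console.log(checkAnnotations(sample));
```

Collecting every problem before reporting, rather than stopping at the first, fits the report table in Step 4, where each check gets a single row with its detail.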