@clipboard-health/ai-rules 2.6.4 → 2.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/skills/cognito-user-analysis/SKILL.md +7 -10
- package/skills/cognito-user-analysis/docs/analysis-workflow.md +3 -3
- package/skills/cognito-user-analysis/docs/fix-workflow.md +2 -2
- package/skills/cognito-user-analysis/docs/setup.md +2 -2
- package/skills/flaky-test-debugger/SKILL.md +174 -0
- package/skills/flaky-test-debugger/references/datadog-apm-traces.md +79 -0
- package/skills/flaky-test-debugger/scripts/fetch-llm-report.sh +78 -0
- package/skills/unresolved-pr-comments/SKILL.md +1 -1
- package/skills/datadog-e2e-trace/SKILL.md +0 -146
package/package.json
CHANGED
package/skills/cognito-user-analysis/SKILL.md
CHANGED

````diff
@@ -22,28 +22,25 @@ Analyze and fix duplicate Cognito users in clipboard-production by comparing aga
 ## Quick Start
 
 ```bash
-# Set SKILL_DIR to wherever this skill is installed
-SKILL_DIR="<path-to-this-skill>"
-
 # 1. Verify prerequisites
-
+scripts/check-prerequisites.sh
 
 # 2. Create input file (one sub per line)
 echo "68e1e380-d0c1-7028-4256-3361fd833080" > subs.txt
 
 # 3. Pipeline: lookup → find duplicates → analyze → fix
-
-
-
+scripts/cognito-lookup.sh subs.txt results.csv
+scripts/cognito-find-duplicates.sh results.csv duplicates.csv
+scripts/cognito-analyze-duplicates.sh duplicates.csv analysis.csv
 
 # 4. Review analysis.csv, then fix (ALWAYS dry-run first!)
-
-
+scripts/cognito-fix-duplicates.sh analysis.csv --dry-run
+scripts/cognito-fix-duplicates.sh analysis.csv
 ```
 
 ## Prerequisites
 
-Run `
+Run `scripts/check-prerequisites.sh` to verify. Requirements:
 
 | Requirement | Setup |
 | ------------------------------------- | ----------------------------------------------------------- |
````
package/skills/cognito-user-analysis/docs/analysis-workflow.md
CHANGED

````diff
@@ -5,7 +5,7 @@ Pipeline: `subs.txt → lookup → find-duplicates → analyze → analysis.csv`
 ## Step 1: Lookup Users
 
 ```bash
-
+scripts/cognito-lookup.sh <input_file> [output_file]
 ```
 
 Converts Cognito subs to user details. Run `--help` for all options.
@@ -16,7 +16,7 @@ Converts Cognito subs to user details. Run `--help` for all options.
 ## Step 2: Find Duplicates
 
 ```bash
-
+scripts/cognito-find-duplicates.sh <results_csv> [output_file]
 ```
 
 Searches for other accounts sharing phone or email. Run `--help` for all options.
@@ -26,7 +26,7 @@ Searches for other accounts sharing phone or email. Run `--help` for all options
 ## Step 3: Analyze Duplicates
 
 ```bash
-
+scripts/cognito-analyze-duplicates.sh <duplicates_csv> [output_file]
 ```
 
 Compares each duplicate against backend API. Run `--help` for all options.
````
package/skills/cognito-user-analysis/docs/fix-workflow.md
CHANGED

````diff
@@ -5,7 +5,7 @@ Execute fixes after reviewing `analysis.csv`.
 ## Always Dry-Run First
 
 ```bash
-
+scripts/cognito-fix-duplicates.sh analysis.csv --dry-run
 ```
 
 Review output to confirm correct users will be deleted/updated.
@@ -13,7 +13,7 @@ Review output to confirm correct users will be deleted/updated.
 ## Execute
 
 ```bash
-
+scripts/cognito-fix-duplicates.sh analysis.csv
 ```
 
 Run `--help` for all options.
````
package/skills/cognito-user-analysis/docs/setup.md
CHANGED

````diff
@@ -3,7 +3,7 @@
 ## Quick Check
 
 ```bash
-
+scripts/check-prerequisites.sh
 ```
 
 This validates all requirements and shows how to fix failures.
@@ -60,7 +60,7 @@ aws cognito-idp list-user-pools \
 Pass the pool ID as a parameter to override the default:
 
 ```bash
-
+scripts/cognito-lookup.sh subs.txt results.csv cbh-staging-platform us-west-2_XXXXX
 ```
 
 ## Troubleshooting
````
package/skills/flaky-test-debugger/SKILL.md
ADDED

---
name: flaky-test-debugger
description: Debug and fix flaky Playwright E2E tests using Playwright reports and Datadog. Use this skill when investigating intermittent Playwright test failures, triaging flaky E2E tests, or fixing test instability.
---

Work through these phases in order. Skip phases only when you already have the information they produce.

## Phase 1: Triage Snapshot

Capture these details first so the investigation is reproducible. If the user hasn't provided them, ask.

- Failing test file and name
- GitHub Actions run URL to fetch the LLM report

### Fetch the LLM Report

Downloads the `playwright-llm-report` artifact from a GitHub Actions run.

```bash
bash scripts/fetch-llm-report.sh "<github-actions-url>"
```

This downloads and extracts to `/tmp/playwright-llm-report-{runId}/`. The report is a single `llm-report.json` file.
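As a first look, the report's `summary` block can be pulled straight out with `jq`. The sketch below fabricates a minimal report (field names taken from the structure listed in Phase 2; the file path and values are illustrative only) so the command is runnable anywhere:

```shell
# Fabricated minimal report for illustration; real reports come from fetch-llm-report.sh
report=/tmp/llm-report-sample.json
cat > "$report" <<'EOF'
{
  "summary": { "total": 3, "passed": 1, "failed": 1, "flaky": 1 },
  "tests": [
    { "title": "checkout completes", "status": "failed", "flaky": false },
    { "title": "login works", "status": "passed", "flaky": true },
    { "title": "search returns results", "status": "passed", "flaky": false }
  ]
}
EOF

# Quick pass/fail counts before digging into individual tests
jq '.summary' "$report"
```

On a real run, point `jq` at `/tmp/playwright-llm-report-{runId}/llm-report.json` instead.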

## Phase 2: Quick Classification

LLM report structure:

- **`summary`** -- quick pass/fail counts
- **`tests[].errors[].message`** -- ANSI-stripped, clean error text
- **`tests[].errors[].diff`** -- extracted expected/actual from assertion errors
- **`tests[].errors[].location`** -- exact file and line of failure
- **`tests[].flaky`** -- true if test passed after retry
- **`tests[].attempts[]`** -- full retry history with per-attempt status, timing, stdio, attachments, steps, and network
- **`tests[].attempts[].consoleMessages[]`** -- warning/error/pageerror/page-closed/page-crashed trace entries only (2KB text cap with `[truncated]` marker, max 50 per attempt, high-signal entries prioritized over low-signal)
- **`tests[].steps` / `tests[].network` / `tests[].timeline`** -- convenience aliases from the final attempt
- **`tests[].attempts[].timeline[]`** -- unified, sorted-by-`offsetMs` array of all retained events (`kind: "step" | "network" | "console"`). Slimmed-down entries for quick temporal scanning; full details remain in the source arrays
- **`offsetMs`** -- milliseconds since the attempt's `startTime`. Always present on steps (from `TestStep.startTime`). Optional on network entries (from trace `_monotonicTime` or `startedDateTime`, converted via the trace's `context-options` anchor) and console entries (from trace monotonic `time` field + anchor). Absent when the trace lacks a `context-options` event. Entries without `offsetMs` are excluded from the timeline
- **`tests[].attempts[].network[].traceId`** -- promoted from `x-datadog-trace-id` header for direct access
- **`tests[].attempts[].network[]`** -- max 200 per attempt, priority-based: fetch/xhr requests, error responses (status >= 400), failed, and aborted requests are retained over static assets (script, stylesheet, image, font). Includes failure details (`failureText`, `wasAborted`), redirect chain (`redirectToUrl`, `redirectFromUrl`, `redirectChain`), timing breakdown (`timings`), `durationMs` derived from available timing components, and allowlisted headers (`requestHeaders`, `responseHeaders`)
- **`tests[].attempts[].network[].responseHeaders`** -- includes `x-datadog-trace-id` and `x-datadog-span-id` when present (values capped to 256 chars)
- **`tests[].attempts[].failureArtifacts`** -- for failing/timed-out/interrupted attempts: `screenshotBase64` (base64-encoded screenshot, max 512KB), `videoPath` (first video attachment path). Omitted entirely when neither screenshot nor video is available
- **`tests[].attachments[].path`** -- relative to Playwright outputDir
- **`tests[].stdout` / `tests[].stderr`** -- capped at 4KB with `[truncated]` marker

Classify the flake to narrow the search space:

| Category | Signal | Timeline Pattern |
| --- | --- | --- |
| **Test-state leakage** | Retries or earlier tests leave auth, cookies, storage, or server state behind | `attempts[]` — different outcomes across retries |
| **Data collision** | "Random" identities aren't unique enough and collide with existing users/entities | `errors[]` — duplicate key or conflict errors |
| **Backend stale data** | API returned 200 but response body shows old state | `step(action)` → `network(GET, 200)` → `step(assert) FAIL` — API succeeded but data was stale |
| **Frontend cache stale** | No network request after navigation/reload for the relevant endpoint | `step(reload)` → `step(assert) FAIL` — no intervening network call for expected endpoint |
| **Silent network failure** | CORS, DNS, or transport error prevented the request from completing | `step(action)` → `console(error: "net::ERR_FAILED")` → `step(assert) FAIL` |
| **Render/hydration bug** | API returned correct data but component didn't render it | `network(GET, 200, correct data)` → `step(assert) FAIL` — no console errors |
| **Environment / infra** | Transient 5xx, timeouts, DNS/network instability | `network` entries with 5xx status; `consoleMessages[]` with connection errors |
| **Locator / UX drift** | Selector is valid but brittle against small UI changes | `errors[]` — locator/selector text in error message |
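The failed-or-flaky filter described above is a one-liner in `jq`. A runnable sketch, again against a fabricated report fragment (field names from the structure list; titles and messages are made up):

```shell
# Hypothetical report fragment for illustration
cat > /tmp/report-b.json <<'EOF'
{"tests":[
  {"title":"checkout completes","status":"failed","flaky":false,
   "errors":[{"message":"expect(locator).toBeVisible() failed"}]},
  {"title":"login works","status":"passed","flaky":true,"errors":[]},
  {"title":"search returns results","status":"passed","flaky":false,"errors":[]}
]}
EOF

# One line per failed or flaky test: title, status, first error message (or "-")
jq -r '.tests[]
  | select(.status == "failed" or .flaky)
  | [.title, .status, (.errors[0].message // "-")]
  | @tsv' /tmp/report-b.json
```

This prints the two problem tests and skips the clean pass, which is usually the right starting worklist.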

## Phase 3: Analyze LLM Report

### 3a: Walk the Timeline

**Use `attempts[].timeline[]` as the primary analysis view.** The timeline is a unified, `offsetMs`-sorted array of all steps, network requests, and console entries. Walk it to reconstruct the exact event sequence around the failure:

```text
step(click "Submit") → network(POST /api/orders, 201) → step(waitForURL /confirmation) → console(error: "Cannot read property...") → step(expect toBeVisible) FAILED
```

For each timeline entry:

- **`kind: "step"`** — test action with `title`, `category`, `durationMs`, `depth`, optional `error`
- **`kind: "network"`** — HTTP request with `method`, `url`, `status`, optional `durationMs`, `resourceType`, `traceId`, `failureText`, `wasAborted`
- **`kind: "console"`** — browser message with `type` (warning/error/pageerror/page-closed/page-crashed) and `text`

All entries share `offsetMs` (milliseconds since attempt start), giving a single temporal view.
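A compact one-line-per-event rendering of the timeline is easy to produce, picking the most descriptive field per kind. A sketch over a fabricated three-entry attempt (shape as described above; values invented):

```shell
# Hypothetical attempt with a three-entry timeline
cat > /tmp/report-c.json <<'EOF'
{"tests":[{"attempts":[{"timeline":[
  {"kind":"step","offsetMs":120,"title":"click \"Submit\""},
  {"kind":"network","offsetMs":180,"method":"POST","url":"/api/orders","status":201},
  {"kind":"console","offsetMs":240,"type":"error","text":"Cannot read property ..."}
]}]}]}
EOF

# One line per event: offset, kind, then title (steps), url (network), or text (console)
jq -r '.tests[0].attempts[0].timeline[]
  | "\(.offsetMs)ms \(.kind) \(.title // .url // .text)"' /tmp/report-c.json
```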

### 3b: Compare pass vs fail (flaky tests)

If you don't have passing and failing attempts for the same test, skip to 3c.

Walk the failed attempt's timeline and the passed attempt's timeline side-by-side to identify the first divergence point:

1. Align both timelines by step title sequence
2. Find the first step/network/console entry that differs between attempts
3. The divergence answers "what was different this time?" directly

Common divergence patterns:

- **Same step, different network response** — backend returned different data (stale cache, race condition, eventual consistency)
- **Same step, network call missing in failed attempt** — frontend cache served stale data, or request was silently blocked
- **Same step, console error only in failed attempt** — CORS/network failure, or JS exception from unexpected state
- **Different step timing** — failed attempt took much longer before the assertion, suggesting resource contention or slow backend
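One way to mechanize the side-by-side walk is to dump each attempt's event sequence to a file and let `diff` surface the first divergence. A sketch over a fabricated flaky test (attempt shapes and values invented; output paths are arbitrary):

```shell
# Hypothetical flaky test: attempt 0 failed, attempt 1 passed
cat > /tmp/report-d.json <<'EOF'
{"tests":[{"attempts":[
  {"status":"failed","timeline":[
    {"kind":"step","title":"goto /cart"},
    {"kind":"step","title":"click checkout"},
    {"kind":"console","type":"error","text":"net::ERR_FAILED"},
    {"kind":"step","title":"expect confirmation"}]},
  {"status":"passed","timeline":[
    {"kind":"step","title":"goto /cart"},
    {"kind":"step","title":"click checkout"},
    {"kind":"network","method":"POST","url":"/api/orders","status":201},
    {"kind":"step","title":"expect confirmation"}]}
]}]}
EOF

# Dump each attempt's event sequence, then diff to find the first divergence
jq -r '.tests[0].attempts[0].timeline[] | "\(.kind) \(.title // .url // .text)"' /tmp/report-d.json > /tmp/attempt-fail.txt
jq -r '.tests[0].attempts[1].timeline[] | "\(.kind) \(.title // .url // .text)"' /tmp/report-d.json > /tmp/attempt-pass.txt
diff /tmp/attempt-fail.txt /tmp/attempt-pass.txt || true
```

Here the diff lands on the third event: a silent network failure in the failed attempt where the passing attempt made a successful API call.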

### 3c: Identify failing tests

Filter `tests[]` for entries where `status` is `"failed"` or `flaky` is `true`. For each:

- **`errors[]`**: Contains clean error text with extracted assertion diffs and file/line location. This is usually enough to understand what went wrong.
- **`location`**: Source file, line, and column — jump straight to the code.
- **`attempts[]`**: Full retry history. Compare attempt outcomes, durations, and errors to see if the failure is consistent or intermittent.

### 3d: Examine attempts for retry patterns

Each attempt includes:

- `status` and `durationMs` — spot timing differences between passing and failing attempts
- `error` — failure reason per attempt (may differ across retries)
- `consoleMessages[]` — browser warnings/errors (only warning, error, pageerror, page-closed, page-crashed entries; capped at 2KB / 50 per attempt)
- `failureArtifacts` — for failed/timed-out/interrupted attempts:
  - `screenshotBase64` — base64-encoded failure screenshot (max 512KB). **Decode and inspect this** to see exactly what the page showed at failure time — often reveals modals, loading spinners, error banners, or unexpected navigation that the assertion text alone doesn't explain.
  - `videoPath` — path to video recording
- `network[]` — HTTP requests/responses for that attempt
- `timeline[]` — unified sorted event stream
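Decoding the screenshot is a two-step pipe. To keep the sketch self-contained, the base64 payload below is just the text "hello" rather than a real PNG; with a real report the output file would be an image you can open directly:

```shell
# Hypothetical attempt carrying failureArtifacts (payload is "hello", not a real PNG)
cat > /tmp/report-e.json <<'EOF'
{"tests":[{"attempts":[{"status":"failed","failureArtifacts":{"screenshotBase64":"aGVsbG8="}}]}]}
EOF

# Decode the last attempt's screenshot to a file for inspection
jq -r '.tests[0].attempts[-1].failureArtifacts.screenshotBase64 // empty' /tmp/report-e.json \
  | base64 -d > /tmp/failure-screenshot.bin
```

The `// empty` guard keeps the pipe quiet for attempts whose `failureArtifacts` was omitted.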

### 3e: Inspect network activity and extract trace IDs

The `network[]` array (on tests or individual attempts) includes:

- `method`, `url`, `status` — identify 4xx/5xx responses
- `timings` — detailed breakdown: `dnsMs`, `connectMs`, `sslMs`, `sendMs`, `waitMs`, `receiveMs`
- `durationMs` — total request duration derived from timing components
- `requestHeaders`, `responseHeaders` — allowlisted headers
- `redirectChain` — full redirect sequence
- **`traceId`** — Datadog trace ID extracted from the `x-datadog-trace-id` response header. **When present near a failure, use references/datadog-apm-traces.md for backend correlation — it bridges the gap between the frontend test failure and a potential backend root cause.**

Network is capped at 200 entries per attempt, prioritized: fetch/xhr and error responses are retained over static assets. Headers/values are capped at 256 chars. If all 200 entries are static assets (script/stylesheet/font) with no API calls, the capture is saturated.
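The interesting subset — error responses and anything carrying a `traceId` — can be filtered out in one pass. A sketch over a fabricated capture (entry shapes from the field list above; URLs and IDs invented):

```shell
# Hypothetical network capture: one static asset, one 5xx, one traced API call
cat > /tmp/report-f.json <<'EOF'
{"tests":[{"attempts":[{"network":[
  {"method":"GET","url":"/assets/app.js","status":200,"resourceType":"script"},
  {"method":"POST","url":"/api/orders","status":500,"resourceType":"fetch",
   "traceId":"1234567890123456789"},
  {"method":"GET","url":"/api/cart","status":200,"resourceType":"fetch",
   "traceId":"9876543210987654321"}
]}]}]}
EOF

# API calls and error responses, with their Datadog trace IDs when present
jq -r '.tests[0].attempts[-1].network[]
  | select(.status >= 400 or .traceId)
  | "\(.method) \(.url) \(.status) trace=\(.traceId // "-")"' /tmp/report-f.json
```

The trace IDs printed here are what feeds the Datadog lookup in references/datadog-apm-traces.md.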

### 3f: Review test steps

`tests[].steps[]` provides a step-by-step breakdown of test actions with timing (`offsetMs`, `durationMs`, `depth`). Prefer the timeline view (3a), which interleaves steps with network and console. Use steps directly when you need the full hierarchy (nested steps via `depth`).

## Phase 4: Evidence Standard

Do not propose a fix without concrete artifacts. At minimum, include:

- One **error artifact** — from `tests[].errors[]` (assertion diff, timeout message) or a trace/log entry
- One **network artifact** — from `tests[].network[]` or `attempts[].network[]` (response status, timing, headers)
- A **specific code path** that consumed that state — use `tests[].location` to jump to the source
- When available: **screenshot** from `failureArtifacts.screenshotBase64` showing page state at failure
- When available: **Datadog trace** via `network[].traceId` showing backend behavior for the failing request

## Phase 5: Fix Decision Tree

Apply fixes in this order of priority:

1. **Validate scenario realism first.** Is the failure path possible for real users, or is it purely a test-setup artifact? If not user-realistic, prioritize test/data/harness fixes over product changes.

2. **Test harness fix** (when the failure is non-product):
   - Reset cookies, storage, and session between retries
   - Isolate test data; generate stronger unique identities
   - Make retry blocks idempotent
   - Wait on deterministic app signals, not arbitrary sleeps

3. **Product fix** (when real users would hit the same issue):
   - Handle stale or intermediate states safely
   - Make routing/render logic robust to eventual consistency
   - Add telemetry for ambiguous transitions

4. **Both** if user impact exists _and_ tests are fragile.

## Phase 6: Verification

Lint and type-check touched files.

## Output Format

When documenting the fix in a PR or issue, use this structure:

- **Symptom:** what failed and where
- **Root cause:** concise technical explanation
- **Evidence:** trace and network artifacts (include screenshot and Datadog trace when available)
- **Fix:** test-only, product-only, or both
- **Validation:** commands and suites run
- **Residual risk:** what could still be flaky
package/skills/flaky-test-debugger/references/datadog-apm-traces.md
ADDED

# Datadog APM Traces

Fetch and display the full APM trace for a given trace ID, or look up a specific span by span ID.

## Prerequisites

The `pup` CLI must be installed and authenticated. Verify with:

```bash
pup auth status 2>/dev/null | jq '.status' # Should show: "valid"
```

## Key pup conventions

- **Durations are in NANOSECONDS**: 1 second = 1,000,000,000 ns; 5ms = 5,000,000 ns. Convert to ms for display by dividing by 1,000,000.
- **Default time range is 1h.** Always pass `--from` explicitly — use `--from=7d` or `--from=30d` for older traces.
- **Default output is JSON.** Pipe JSON through `jq` for extraction.
- **`--limit` defaults to 50.** Max is 1000. For large traces, you may need multiple paginated calls (but pup handles most pagination internally).
- **Query syntax for traces:** `service:<name> resource_name:<path> @duration:>5s env:production status:error operation_name:<op>`
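Because durations come back in nanoseconds, it is worth sanity-checking the conversion once before trusting any displayed numbers; `jq` does the arithmetic inline:

```shell
# 5 ms expressed in nanoseconds, converted back to ms for display
jq -n '5000000 / 1000000'
```

The same divide-by-1,000,000 appears in the error-summary `jq` filter below.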

## Steps

### 1. If a span ID was provided, fetch that span first

```bash
pup traces search --query="span_id:<SPAN_ID>" --from=30d --limit=1
```

Display the span's details (service, operation, resource, duration, status, error if any) before proceeding to fetch the full trace.

If the query returns no results, tell the user the span was not found in the APM Spans index. Continue to step 2 using the trace ID from the arguments.

### 2. Fetch the full trace

Use the `trace_id` to retrieve all spans in the trace:

```bash
pup traces search --query="trace_id:<TRACE_ID>" --from=30d --limit=1000
```

If the trace has more than 1000 spans, the response will be truncated. In that case, narrow the query by adding filters like `service:<name>` or `status:error` to focus on relevant spans.

### 3. Parse and summarize the results

The response JSON has this structure per span:

```text
.data[].attributes:
  .span_id — unique span identifier
  .trace_id — shared across all spans in the trace
  .parent_id — parent span (for building the call tree)
  .service — service name (e.g., "cbh-backend-main")
  .operation_name — operation (e.g., "express.request", "express.middleware", "http.request")
  .resource_name — resource (e.g., "GET /api/v1/users", "<anonymous>")
  .status — "ok" or "error"
  .start_timestamp — ISO 8601 start time
  .end_timestamp — ISO 8601 end time
  .custom.duration — duration in NANOSECONDS (divide by 1,000,000 for ms)
  .custom.env — environment (e.g., "staging", "production")
  .custom.error — error object with .message, .file, .fingerprint (null if no error)
  .custom.type — span type (e.g., "web", "http", "mongodb", "redis", "worker")
  .custom.service — service name (also at top level)
  .tags[] — array of tag strings
```

Use `jq` to extract a useful summary. Example:

```bash
# Quick error summary
pup traces search --query="trace_id:<TRACE_ID>" --from=30d --limit=1000 \
  | jq '[.data[] | select(.attributes.custom.error) | {
      span_id: .attributes.span_id,
      service: .attributes.service,
      operation: .attributes.operation_name,
      resource: .attributes.resource_name,
      duration_ms: ((.attributes.custom.duration // 0) / 1000000 | . * 100 | round / 100),
      error: .attributes.custom.error.message
    }]'
```
package/skills/flaky-test-debugger/scripts/fetch-llm-report.sh
ADDED

```bash
#!/usr/bin/env bash
set -euo pipefail

# Fetches the playwright-llm-report artifact from a GitHub Actions run.
# Uses the run ID in both the zip filename and extract directory so parallel
# downloads from different agents don't collide.
#
# Usage: fetch-llm-report.sh <github-actions-url>
# Example: fetch-llm-report.sh 'https://github.com/Org/Repo/actions/runs/123/attempts/1'

url="${1:-}"

if [[ -z "$url" ]]; then
  echo "Usage: fetch-llm-report.sh <github-actions-url>" >&2
  exit 1
fi

# Parse owner, repo, and run ID from the URL
if [[ "$url" =~ github\.com/([^/]+)/([^/]+)/actions/runs/([0-9]+) ]]; then
  owner="${BASH_REMATCH[1]}"
  repo="${BASH_REMATCH[2]}"
  run_id="${BASH_REMATCH[3]}"
else
  echo "Error: Could not parse GitHub Actions URL: $url" >&2
  exit 1
fi

echo "Repo: ${owner}/${repo}, Run ID: ${run_id}"

# Find the playwright-llm-report artifact ID
artifact_json=$(gh api "repos/${owner}/${repo}/actions/runs/${run_id}/artifacts" \
  --jq '[.artifacts[] | select(.name == "playwright-llm-report" and (.expired | not))] | sort_by(.created_at) | last // empty | {id, name, size_in_bytes, expired}')

if [[ -z "$artifact_json" ]]; then
  echo "Error: No 'playwright-llm-report' artifact found for run ${run_id}" >&2
  echo "Available artifacts:" >&2
  gh api "repos/${owner}/${repo}/actions/runs/${run_id}/artifacts" \
    --jq '.artifacts[].name' >&2
  exit 1
fi

artifact_id=$(echo "$artifact_json" | jq -r '.id')
expired=$(echo "$artifact_json" | jq -r '.expired')
size=$(echo "$artifact_json" | jq -r '.size_in_bytes')

if [[ "$expired" == "true" ]]; then
  echo "Error: Artifact has expired and is no longer available." >&2
  exit 1
fi

echo "Found artifact: id=${artifact_id}, size=${size} bytes"

# Download and extract using run ID for isolation
out_dir="/tmp/playwright-llm-report-${run_id}"
zip_path="${out_dir}.zip"

# Skip download if already extracted (avoids duplicate work in multi-agent runs)
if [[ -d "$out_dir" ]] && ls "$out_dir"/*.json &>/dev/null; then
  echo "Already downloaded — skipping."
  echo ""
  echo "Report directory: ${out_dir}"
  exit 0
fi

echo "Downloading to: ${zip_path}"
tmp_zip="${zip_path}.tmp"
gh api "repos/${owner}/${repo}/actions/artifacts/${artifact_id}/zip" > "$tmp_zip" && mv "$tmp_zip" "$zip_path"

echo "Extracting to: ${out_dir}"
mkdir -p "$out_dir"
unzip -o "$zip_path" -d "$out_dir"
rm -f "$zip_path"

echo ""
echo "Done! Files:"
ls -la "$out_dir"
echo ""
echo "Report directory: ${out_dir}"
```
package/skills/unresolved-pr-comments/SKILL.md
CHANGED

````diff
@@ -13,7 +13,7 @@ Fetch and analyze unresolved review comments from a GitHub pull request.
 Run the script to fetch PR comment data:
 
 ```bash
-node
+node scripts/unresolvedPrComments.ts [pr-number]
 ```
 
 If no PR number is provided, it uses the PR associated with the current branch.
````
package/skills/datadog-e2e-trace/SKILL.md
REMOVED

````markdown
---
name: datadog-e2e-trace
description: >
  Fetch and display the full APM trace for a Datadog CI test run from a Datadog UI URL.
  Use this skill whenever the user pastes a Datadog CI test URL, asks to investigate an E2E
  test failure trace, wants to see what happened during a CI test run, or mentions pulling
  spans/traces from Datadog CI Visibility.
argument-hint: "<datadog-ci-test-url>"
---

# Datadog E2E Test Trace

Fetch the full APM trace for a Datadog CI Visibility test run, given a Datadog UI URL.

## Arguments

- `$ARGUMENTS` — A Datadog CI test URL (e.g., `https://app.datadoghq.com/ci/test/...?...&spanID=123456&...`)

## Prerequisites

`DD_API_KEY` and `DD_APP_KEY` environment variables, or `~/.dogrc`:

```ini
[Connection]
apikey = <your-api-key>
appkey = <your-app-key>
```

## Steps

### 1. Extract the `spanID` from the URL

Parse the `spanID` query parameter from the URL. This is a decimal span ID.

If the URL has no `spanID`, stop and tell the user: the test run has no associated trace. This typically happens when Datadog RUM is active during E2E tests, which suppresses CI test traces.

### 2. Resolve API credentials

```bash
if [ -n "$DD_API_KEY" ] && [ -n "$DD_APP_KEY" ]; then
  API_KEY="$DD_API_KEY"
  APP_KEY="$DD_APP_KEY"
else
  API_KEY=$(grep apikey ~/.dogrc | cut -d= -f2 | tr -d ' ')
  APP_KEY=$(grep appkey ~/.dogrc | cut -d= -f2 | tr -d ' ')
fi
```

Use `$API_KEY` and `$APP_KEY` in all subsequent curl commands.

### 3. Fetch the span to get the `trace_id`

Query the Spans API. The request body **must** use the wrapped `data` format shown below — the flat `{"filter": ...}` format returns 400:

```bash
curl -s -X POST "https://api.datadoghq.com/api/v2/spans/events/search" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${API_KEY}" \
  -H "DD-APPLICATION-KEY: ${APP_KEY}" \
  -d '{
    "data": {
      "type": "search_request",
      "attributes": {
        "filter": {
          "query": "span_id:<SPAN_ID>",
          "from": "now-30d",
          "to": "now"
        },
        "page": {
          "limit": 1
        }
      }
    }
  }'
```

Extract `trace_id` from `.data[0].attributes.trace_id`.

If the query returns no results (empty `.data` array), the span exists only in the CI Visibility index and is not available in APM. Tell the user:

> The span was not found in the APM Spans index — it likely exists only in CI Visibility (e.g., a browser-side or Playwright test span). To fetch a backend trace, open the flamegraph in the Datadog UI, click on a **backend span** (e.g., an API endpoint from a server-side service, not a browser HTTP request), copy the updated URL, and run this skill again.

Stop here — do not proceed to step 4.

**Note:** The `index=citest` parameter sometimes present in the URL only controls the Datadog UI view. It does not mean the span is inaccessible via the Spans API. Backend spans (e.g., `express.request`) are often in both the CI Visibility flamegraph and the APM spans index. Always attempt the query regardless of that parameter.

### 4. Fetch the full trace

Use the `trace_id` to retrieve all spans in the trace. Paginate until all spans are collected:

```bash
ALL_SPANS="[]"
CURSOR=""

while true; do
  if [ -n "$CURSOR" ]; then
    PAGE_PARAM="\"cursor\": \"${CURSOR}\","
  else
    PAGE_PARAM=""
  fi

  RESPONSE=$(curl -s -X POST "https://api.datadoghq.com/api/v2/spans/events/search" \
    -H "Content-Type: application/json" \
    -H "DD-API-KEY: ${API_KEY}" \
    -H "DD-APPLICATION-KEY: ${APP_KEY}" \
    -d "{
      \"data\": {
        \"type\": \"search_request\",
        \"attributes\": {
          \"filter\": {
            \"query\": \"trace_id:<TRACE_ID>\",
            \"from\": \"now-30d\",
            \"to\": \"now\"
          },
          \"sort\": \"timestamp\",
          \"page\": {
            ${PAGE_PARAM}
            \"limit\": 50
          }
        }
      }
    }")

  ALL_SPANS=$(echo "$ALL_SPANS" | jq --argjson new "$(echo "$RESPONSE" | jq '.data')" '. + $new')
  CURSOR=$(echo "$RESPONSE" | jq -r '.meta.page.after // empty')

  if [ -z "$CURSOR" ]; then
    break
  fi
done
```

### 5. Display the results

Start with a one-line summary: total span count and trace duration (max end time minus min start time).

Then present a table of spans grouped by type. Mark any span with error status or non-2xx status code with a warning indicator.

| Type | Columns |
| --- | --- |
| **API endpoints** (type: `web`) | resource name, service, duration, status code |
| **External HTTP calls** (type: `http`) | resource, service, duration, status code, URL |
| **Database queries** (type: `mongodb`, `redis`, etc.) | resource, service, duration |
| **Other spans** | resource, service, type, duration |

If there are errors, call them out at the top before the table so the user sees them immediately.
````