@clipboard-health/ai-rules 2.6.4 → 2.7.0

This diff shows the content of publicly released package versions as published to a supported registry. It is provided for informational purposes only and reflects the changes between versions as they appear in their public registries.
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@clipboard-health/ai-rules",
  "description": "Pre-built AI agent rules for consistent coding standards.",
- "version": "2.6.4",
+ "version": "2.7.0",
  "bugs": "https://github.com/ClipboardHealth/core-utils/issues",
  "keywords": [
  "ai",
@@ -22,28 +22,25 @@ Analyze and fix duplicate Cognito users in clipboard-production by comparing aga
  ## Quick Start

  ```bash
- # Set SKILL_DIR to wherever this skill is installed
- SKILL_DIR="<path-to-this-skill>"
-
  # 1. Verify prerequisites
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/check-prerequisites.sh"
+ scripts/check-prerequisites.sh

  # 2. Create input file (one sub per line)
  echo "68e1e380-d0c1-7028-4256-3361fd833080" > subs.txt

  # 3. Pipeline: lookup → find duplicates → analyze → fix
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-lookup.sh" subs.txt results.csv
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-find-duplicates.sh" results.csv duplicates.csv
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-analyze-duplicates.sh" duplicates.csv analysis.csv
+ scripts/cognito-lookup.sh subs.txt results.csv
+ scripts/cognito-find-duplicates.sh results.csv duplicates.csv
+ scripts/cognito-analyze-duplicates.sh duplicates.csv analysis.csv

  # 4. Review analysis.csv, then fix (ALWAYS dry-run first!)
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-fix-duplicates.sh" analysis.csv --dry-run
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-fix-duplicates.sh" analysis.csv
+ scripts/cognito-fix-duplicates.sh analysis.csv --dry-run
+ scripts/cognito-fix-duplicates.sh analysis.csv
  ```

  ## Prerequisites

- Run `"${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/check-prerequisites.sh"` to verify. Requirements:
+ Run `scripts/check-prerequisites.sh` to verify. Requirements:

  | Requirement | Setup |
  | ------------------------------------- | ----------------------------------------------------------- |
@@ -5,7 +5,7 @@ Pipeline: `subs.txt → lookup → find-duplicates → analyze → analysis.csv`
  ## Step 1: Lookup Users

  ```bash
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-lookup.sh" <input_file> [output_file]
+ scripts/cognito-lookup.sh <input_file> [output_file]
  ```

  Converts Cognito subs to user details. Run `--help` for all options.
@@ -16,7 +16,7 @@ Converts Cognito subs to user details. Run `--help` for all options.
  ## Step 2: Find Duplicates

  ```bash
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-find-duplicates.sh" <results_csv> [output_file]
+ scripts/cognito-find-duplicates.sh <results_csv> [output_file]
  ```

  Searches for other accounts sharing phone or email. Run `--help` for all options.
@@ -26,7 +26,7 @@ Searches for other accounts sharing phone or email. Run `--help` for all options
  ## Step 3: Analyze Duplicates

  ```bash
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-analyze-duplicates.sh" <duplicates_csv> [output_file]
+ scripts/cognito-analyze-duplicates.sh <duplicates_csv> [output_file]
  ```

  Compares each duplicate against backend API. Run `--help` for all options.
@@ -5,7 +5,7 @@ Execute fixes after reviewing `analysis.csv`.
  ## Always Dry-Run First

  ```bash
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-fix-duplicates.sh" analysis.csv --dry-run
+ scripts/cognito-fix-duplicates.sh analysis.csv --dry-run
  ```

  Review output to confirm correct users will be deleted/updated.
@@ -13,7 +13,7 @@ Review output to confirm correct users will be deleted/updated.
  ## Execute

  ```bash
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-fix-duplicates.sh" analysis.csv
+ scripts/cognito-fix-duplicates.sh analysis.csv
  ```

  Run `--help` for all options.
@@ -3,7 +3,7 @@
  ## Quick Check

  ```bash
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/check-prerequisites.sh"
+ scripts/check-prerequisites.sh
  ```

  This validates all requirements and shows how to fix failures.
@@ -60,7 +60,7 @@ aws cognito-idp list-user-pools \
  Pass the pool ID as a parameter to override the default:

  ```bash
- "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/cognito-user-analysis/scripts/cognito-lookup.sh" subs.txt results.csv cbh-staging-platform us-west-2_XXXXX
+ scripts/cognito-lookup.sh subs.txt results.csv cbh-staging-platform us-west-2_XXXXX
  ```

  ## Troubleshooting
@@ -0,0 +1,174 @@
+ ---
+ name: flaky-test-debugger
+ description: Debug and fix flaky Playwright E2E tests using Playwright reports and Datadog. Use this skill when investigating intermittent Playwright test failures, triaging flaky E2E tests, or fixing test instability.
+ ---
+
+ Work through these phases in order. Skip phases only when you already have the information they produce.
+
+ ## Phase 1: Triage Snapshot
+
+ Capture these details first so the investigation is reproducible. If the user hasn't provided them, ask.
+
+ - Failing test file and name
+ - GitHub Actions run URL to fetch the LLM report
+
+ ### Fetch the LLM Report
+
+ Downloads the `playwright-llm-report` artifact from a GitHub Actions run.
+
+ ```bash
+ bash scripts/fetch-llm-report.sh "<github-actions-url>"
+ ```
+
+ This downloads and extracts to `/tmp/playwright-llm-report-{runId}/`. The report is a single `llm-report.json` file.
+
+ ## Phase 2: Quick Classification
+
+ LLM report structure:
+
+ - **`summary`** -- quick pass/fail counts
+ - **`tests[].errors[].message`** -- ANSI-stripped, clean error text
+ - **`tests[].errors[].diff`** -- extracted expected/actual from assertion errors
+ - **`tests[].errors[].location`** -- exact file and line of failure
+ - **`tests[].flaky`** -- true if test passed after retry
+ - **`tests[].attempts[]`** -- full retry history with per-attempt status, timing, stdio, attachments, steps, and network
+ - **`tests[].attempts[].consoleMessages[]`** -- warning/error/pageerror/page-closed/page-crashed trace entries only (2KB text cap with `[truncated]` marker, max 50 per attempt, high-signal entries prioritized over low-signal)
+ - **`tests[].steps` / `tests[].network` / `tests[].timeline`** -- convenience aliases from the final attempt
+ - **`tests[].attempts[].timeline[]`** -- unified, sorted-by-`offsetMs` array of all retained events (`kind: "step" | "network" | "console"`). Slimmed-down entries for quick temporal scanning; full details remain in the source arrays
+ - **`offsetMs`** -- milliseconds since the attempt's `startTime`. Always present on steps (from `TestStep.startTime`). Optional on network entries (from trace `_monotonicTime` or `startedDateTime`, converted via the trace's `context-options` anchor) and console entries (from trace monotonic `time` field + anchor). Absent when the trace lacks a `context-options` event. Entries without `offsetMs` are excluded from the timeline
+ - **`tests[].attempts[].network[].traceId`** -- promoted from `x-datadog-trace-id` header for direct access
+ - **`tests[].attempts[].network[]`** -- max 200 per attempt, priority-based: fetch/xhr requests, error responses (status >= 400), failed, and aborted requests are retained over static assets (script, stylesheet, image, font). Includes failure details (`failureText`, `wasAborted`), redirect chain (`redirectToUrl`, `redirectFromUrl`, `redirectChain`), timing breakdown (`timings`), `durationMs` derived from available timing components, and allowlisted headers (`requestHeaders`, `responseHeaders`)
+ - **`tests[].attempts[].network[].responseHeaders`** -- includes `x-datadog-trace-id` and `x-datadog-span-id` when present (values capped to 256 chars)
+ - **`tests[].attempts[].failureArtifacts`** -- for failing/timed-out/interrupted attempts: `screenshotBase64` (base64-encoded screenshot, max 512KB), `videoPath` (first video attachment path). Omitted entirely when neither screenshot nor video is available
+ - **`tests[].attachments[].path`** -- relative to Playwright outputDir
+ - **`tests[].stdout` / `tests[].stderr`** -- capped at 4KB with `[truncated]` marker
+
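A quick jq pass over these fields surfaces the interesting entries up front. A minimal sketch, assuming the report path from Phase 1 and a `name` field on each test entry (the exact title field is not specified above):

```bash
# List failed or flaky tests with their first error message.
# The concrete report path and the `name` field are illustrative assumptions.
report="/tmp/playwright-llm-report-<runId>/llm-report.json"

jq -r '.tests[]
  | select(.status == "failed" or .flaky == true)
  | "\(.name // "unknown"): \(.errors[0].message // "no error captured")"' "$report"
```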
+ Classify the flake to narrow the search space:
+
+ | Category | Signal | Timeline Pattern |
+ | -------------------------- | --------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
+ | **Test-state leakage** | Retries or earlier tests leave auth, cookies, storage, or server state behind | `attempts[]` — different outcomes across retries |
+ | **Data collision** | "Random" identities aren't unique enough and collide with existing users/entities | `errors[]` — duplicate key or conflict errors |
+ | **Backend stale data** | API returned 200 but response body shows old state | `step(action)` → `network(GET, 200)` → `step(assert) FAIL` — API succeeded but data was stale |
+ | **Frontend cache stale** | No network request after navigation/reload for the relevant endpoint | `step(reload)` → `step(assert) FAIL` — no intervening network call for expected endpoint |
+ | **Silent network failure** | CORS, DNS, or transport error prevented the request from completing | `step(action)` → `console(error: "net::ERR_FAILED")` → `step(assert) FAIL` |
+ | **Render/hydration bug** | API returned correct data but component didn't render it | `network(GET, 200, correct data)` → `step(assert) FAIL` — no console errors |
+ | **Environment / infra** | Transient 5xx, timeouts, DNS/network instability | `network` entries with 5xx status; `consoleMessages[]` with connection errors |
+ | **Locator / UX drift** | Selector is valid but brittle against small UI changes | `errors[]` — locator/selector text in error message |
+
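The test-state-leakage signal (different outcomes across retries) can be checked mechanically. A sketch, assuming the Phase 1 report path and an illustrative `name` field:

```bash
# Show per-attempt status sequences for flaky tests, e.g. "failed -> passed".
report="/tmp/playwright-llm-report-<runId>/llm-report.json"

jq -r '.tests[]
  | select(.flaky == true)
  | "\(.name // "unknown"): \([.attempts[].status] | join(" -> "))"' "$report"
```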
+ ## Phase 3: Analyze LLM Report
+
+ ### 3a: Walk the Timeline
+
+ **Use `attempts[].timeline[]` as the primary analysis view.** The timeline is a unified, `offsetMs`-sorted array of all steps, network requests, and console entries. Walk it to reconstruct the exact event sequence around the failure:
+
+ ```text
+ step(click "Submit") → network(POST /api/orders, 201) → step(waitForURL /confirmation) → console(error: "Cannot read property...") → step(expect toBeVisible) FAILED
+ ```
+
+ For each timeline entry:
+
+ - **`kind: "step"`** — test action with `title`, `category`, `durationMs`, `depth`, optional `error`
+ - **`kind: "network"`** — HTTP request with `method`, `url`, `status`, optional `durationMs`, `resourceType`, `traceId`, `failureText`, `wasAborted`
+ - **`kind: "console"`** — browser message with `type` (warning/error/pageerror/page-closed/page-crashed) and `text`
+
+ All entries share `offsetMs` (milliseconds since attempt start), giving a single temporal view.
+
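The walk can be made concrete with a jq one-liner that flattens an attempt's timeline into readable lines. A sketch using the per-kind fields listed above (the report path and the choice of first test / last attempt are illustrative):

```bash
# Render timeline entries as "offset  kind  summary" lines.
report="/tmp/playwright-llm-report-<runId>/llm-report.json"

jq -r '.tests[0].attempts[-1].timeline[]
  | "\(.offsetMs)ms  \(.kind)  \(
      if .kind == "step" then .title
      elif .kind == "network" then "\(.method) \(.url) \(.status // "?")"
      else .text end)"' "$report"
```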
+ ### 3b: Compare pass vs fail (flaky tests)
+
+ If you don't have passing and failing attempts for the same test, skip to 3c.
+
+ Walk the failed attempt's timeline and the passed attempt's timeline side-by-side to identify the first divergence point:
+
+ 1. Align both timelines by step title sequence
+ 2. Find the first step/network/console entry that differs between attempts
+ 3. The divergence answers "what was different this time?" directly
+
+ Common divergence patterns:
+
+ - **Same step, different network response** — backend returned different data (stale cache, race condition, eventual consistency)
+ - **Same step, network call missing in failed attempt** — frontend cache served stale data, or request was silently blocked
+ - **Same step, console error only in failed attempt** — CORS/network failure, or JS exception from unexpected state
+ - **Different step timing** — failed attempt took much longer before the assertion, suggesting resource contention or slow backend
+
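The step-title alignment in steps 1-2 can be approximated with `diff`. A sketch, assuming the first test has at least one failed and one passed attempt (the report path is the Phase 1 assumption):

```bash
# Dump step titles of the last failed vs last passed attempt, then diff
# them to spot the first divergence.
report="/tmp/playwright-llm-report-<runId>/llm-report.json"

jq -r '[.tests[0].attempts[] | select(.status == "failed")][-1].timeline[]
  | select(.kind == "step") | .title' "$report" > /tmp/steps-failed.txt
jq -r '[.tests[0].attempts[] | select(.status == "passed")][-1].timeline[]
  | select(.kind == "step") | .title' "$report" > /tmp/steps-passed.txt
diff /tmp/steps-failed.txt /tmp/steps-passed.txt || true
```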
+ ### 3c: Identify failing tests
+
+ Filter `tests[]` for entries where `status` is `"failed"` or `flaky` is `true`. For each:
+
+ - **`errors[]`**: Contains clean error text with extracted assertion diffs and file/line location. This is usually enough to understand what went wrong.
+ - **`location`**: Source file, line, and column — jump straight to the code.
+ - **`attempts[]`**: Full retry history. Compare attempt outcomes, durations, and errors to see if the failure is consistent or intermittent.
+
+ ### 3d: Examine attempts for retry patterns
+
+ Each attempt includes:
+
+ - `status` and `durationMs` — spot timing differences between passing and failing attempts
+ - `error` — failure reason per attempt (may differ across retries)
+ - `consoleMessages[]` — browser warnings/errors (only warning, error, pageerror, page-closed, page-crashed entries; capped at 2KB / 50 per attempt)
+ - `failureArtifacts` — for failed/timed-out/interrupted attempts:
+   - `screenshotBase64` — base64-encoded failure screenshot (max 512KB). **Decode and inspect this** to see exactly what the page showed at failure time — often reveals modals, loading spinners, error banners, or unexpected navigation that the assertion text alone doesn't explain.
+   - `videoPath` — path to video recording
+ - `network[]` — HTTP requests/responses for that attempt
+ - `timeline[]` — unified sorted event stream
+
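Decoding the screenshot is a one-liner over the field path described above. A sketch (report path and the first-test/last-attempt indices are illustrative):

```bash
# Decode the failure screenshot for visual inspection; writes nothing
# useful if the attempt has no failureArtifacts.
report="/tmp/playwright-llm-report-<runId>/llm-report.json"

jq -r '.tests[0].attempts[-1].failureArtifacts.screenshotBase64 // empty' "$report" \
  | base64 -d > /tmp/failure-screenshot.png
```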
+ ### 3e: Inspect network activity and extract trace IDs
+
+ The `network[]` array (on tests or individual attempts) includes:
+
+ - `method`, `url`, `status` — identify 4xx/5xx responses
+ - `timings` — detailed breakdown: `dnsMs`, `connectMs`, `sslMs`, `sendMs`, `waitMs`, `receiveMs`
+ - `durationMs` — total request duration derived from timing components
+ - `requestHeaders`, `responseHeaders` — allowlisted headers
+ - `redirectChain` — full redirect sequence
+ - **`traceId`** — Datadog trace ID extracted from the `x-datadog-trace-id` response header. **When present near a failure, follow references/datadog-apm-traces.md to correlate with the backend and bridge the gap between the frontend test failure and a potential backend root cause.**
+
+ Network is capped at 200 entries per attempt, prioritized: fetch/xhr and error responses are retained over static assets. Headers/values capped at 256 chars. If all 200 entries are static assets (script/stylesheet/font) with no API calls, the capture is saturated.
+
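Extracting the trace IDs worth following is another small jq filter over these fields. A sketch (report path per Phase 1):

```bash
# List error responses that carry a Datadog trace ID, ready to paste into
# a trace query.
report="/tmp/playwright-llm-report-<runId>/llm-report.json"

jq -r '.tests[].attempts[].network[]?
  | select((.status // 0) >= 400 and .traceId != null)
  | "\(.status) \(.method) \(.url) trace_id:\(.traceId)"' "$report"
```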
+ ### 3f: Review test steps
+
+ `tests[].steps[]` provides a step-by-step breakdown of test actions with timing (`offsetMs`, `durationMs`, `depth`). Prefer the timeline view (3a) which interleaves steps with network and console. Use steps directly when you need the full hierarchy (nested steps via `depth`).
+
+ ## Phase 4: Evidence Standard
+
+ Do not propose a fix without concrete artifacts. At minimum, include:
+
+ - One **error artifact** — from `tests[].errors[]` (assertion diff, timeout message) or a trace/log entry
+ - One **network artifact** — from `tests[].network[]` or `attempts[].network[]` (response status, timing, headers)
+ - A **specific code path** that consumed that state — use `tests[].location` to jump to the source
+ - When available: **screenshot** from `failureArtifacts.screenshotBase64` showing page state at failure
+ - When available: **Datadog trace** via `network[].traceId` showing backend behavior for the failing request
+
+ ## Phase 5: Fix Decision Tree
+
+ Apply fixes in this order of priority:
+
+ 1. **Validate scenario realism first.** Is the failure path possible for real users, or is it purely a test-setup artifact? If not user-realistic, prioritize test/data/harness fixes over product changes.
+
+ 2. **Test harness fix** (when the failure is non-product):
+    - Reset cookies, storage, and session between retries
+    - Isolate test data; generate stronger unique identities
+    - Make retry blocks idempotent
+    - Wait on deterministic app signals, not arbitrary sleeps
+
+ 3. **Product fix** (when real users would hit the same issue):
+    - Handle stale or intermediate states safely
+    - Make routing/render logic robust to eventual consistency
+    - Add telemetry for ambiguous transitions
+
+ 4. **Both** if user impact exists _and_ tests are fragile.
+
+ ## Phase 6: Verification
+
+ Lint and type-check the touched files.
+
+ ## Output Format
+
+ When documenting the fix in a PR or issue, use this structure:
+
+ - **Symptom:** what failed and where
+ - **Root cause:** concise technical explanation
+ - **Evidence:** trace and network artifacts (include screenshot and Datadog trace when available)
+ - **Fix:** test-only, product-only, or both
+ - **Validation:** commands and suites run
+ - **Residual risk:** what could still be flaky
@@ -0,0 +1,79 @@
+ # Datadog APM Traces
+
+ Fetch and display the full APM trace for a given trace ID, or look up a specific span by span ID.
+
+ ## Prerequisites
+
+ The `pup` CLI must be installed and authenticated. Verify with:
+
+ ```bash
+ pup auth status 2>/dev/null | jq '.status' # Should show: "valid"
+ ```
+
+ ## Key pup conventions
+
+ - **Durations are in NANOSECONDS**: 1 second = 1,000,000,000 ns; 5ms = 5,000,000 ns. Convert to ms for display by dividing by 1,000,000.
+ - **Default time range is 1h.** Always pass `--from` explicitly — use `--from=7d` or `--from=30d` for older traces.
+ - **Default output is JSON.** Pipe JSON through `jq` for extraction.
+ - **`--limit` defaults to 50.** Max is 1000. For large traces, you may need multiple paginated calls (but pup handles most pagination internally).
+ - **Query syntax for traces:** `service:<name> resource_name:<path> @duration:>5s env:production status:error operation_name:<op>`
+
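For example, the nanosecond convention means converting before display when working shell-side:

```bash
# Convert a span duration from nanoseconds to milliseconds (integer math).
duration_ns=5000000
echo "$(( duration_ns / 1000000 ))ms"   # prints: 5ms
```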
+ ## Steps
+
+ ### 1. If a span ID was provided, fetch that span first
+
+ ```bash
+ pup traces search --query="span_id:<SPAN_ID>" --from=30d --limit=1
+ ```
+
+ Display the span's details (service, operation, resource, duration, status, error if any) before proceeding to fetch the full trace.
+
+ If the query returns no results, tell the user the span was not found in the APM Spans index. Continue to step 2 using the trace ID from the arguments.
+
+ ### 2. Fetch the full trace
+
+ Use the `trace_id` to retrieve all spans in the trace:
+
+ ```bash
+ pup traces search --query="trace_id:<TRACE_ID>" --from=30d --limit=1000
+ ```
+
+ If the trace has more than 1000 spans, the response will be truncated. In that case, narrow the query by adding filters like `service:<name>` or `status:error` to focus on relevant spans.
+
+ ### 3. Parse and summarize the results
+
+ The response JSON has this structure per span:
+
+ ```text
+ .data[].attributes:
+   .span_id — unique span identifier
+   .trace_id — shared across all spans in the trace
+   .parent_id — parent span (for building the call tree)
+   .service — service name (e.g., "cbh-backend-main")
+   .operation_name — operation (e.g., "express.request", "express.middleware", "http.request")
+   .resource_name — resource (e.g., "GET /api/v1/users", "<anonymous>")
+   .status — "ok" or "error"
+   .start_timestamp — ISO 8601 start time
+   .end_timestamp — ISO 8601 end time
+   .custom.duration — duration in NANOSECONDS (divide by 1,000,000 for ms)
+   .custom.env — environment (e.g., "staging", "production")
+   .custom.error — error object with .message, .file, .fingerprint (null if no error)
+   .custom.type — span type (e.g., "web", "http", "mongodb", "redis", "worker")
+   .custom.service — service name (also at top level)
+   .tags[] — array of tag strings
+ ```
+
+ Use `jq` to extract a useful summary. Example:
+
+ ```bash
+ # Quick error summary
+ pup traces search --query="trace_id:<TRACE_ID>" --from=30d --limit=1000 \
+   | jq '[.data[] | select(.attributes.custom.error) | {
+       span_id: .attributes.span_id,
+       service: .attributes.service,
+       operation: .attributes.operation_name,
+       resource: .attributes.resource_name,
+       duration_ms: ((.attributes.custom.duration // 0) / 1000000 | . * 100 | round / 100),
+       error: .attributes.custom.error.message
+     }]'
+ ```
@@ -0,0 +1,78 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+
+ # Fetches the playwright-llm-report artifact from a GitHub Actions run.
+ # Uses the run ID in both the zip filename and extract directory so parallel
+ # downloads from different agents don't collide.
+ #
+ # Usage: fetch-llm-report.sh <github-actions-url>
+ # Example: fetch-llm-report.sh 'https://github.com/Org/Repo/actions/runs/123/attempts/1'
+
+ url="${1:-}"
+
+ if [[ -z "$url" ]]; then
+   echo "Usage: fetch-llm-report.sh <github-actions-url>" >&2
+   exit 1
+ fi
+
+ # Parse owner, repo, and run ID from the URL
+ if [[ "$url" =~ github\.com/([^/]+)/([^/]+)/actions/runs/([0-9]+) ]]; then
+   owner="${BASH_REMATCH[1]}"
+   repo="${BASH_REMATCH[2]}"
+   run_id="${BASH_REMATCH[3]}"
+ else
+   echo "Error: Could not parse GitHub Actions URL: $url" >&2
+   exit 1
+ fi
+
+ echo "Repo: ${owner}/${repo}, Run ID: ${run_id}"
+
+ # Find the playwright-llm-report artifact ID
+ artifact_json=$(gh api "repos/${owner}/${repo}/actions/runs/${run_id}/artifacts" \
+   --jq '[.artifacts[] | select(.name == "playwright-llm-report" and (.expired | not))] | sort_by(.created_at) | last // empty | {id, name, size_in_bytes, expired}')
+
+ if [[ -z "$artifact_json" ]]; then
+   echo "Error: No 'playwright-llm-report' artifact found for run ${run_id}" >&2
+   echo "Available artifacts:" >&2
+   gh api "repos/${owner}/${repo}/actions/runs/${run_id}/artifacts" \
+     --jq '.artifacts[].name' >&2
+   exit 1
+ fi
+
+ artifact_id=$(echo "$artifact_json" | jq -r '.id')
+ expired=$(echo "$artifact_json" | jq -r '.expired')
+ size=$(echo "$artifact_json" | jq -r '.size_in_bytes')
+
+ if [[ "$expired" == "true" ]]; then
+   echo "Error: Artifact has expired and is no longer available." >&2
+   exit 1
+ fi
+
+ echo "Found artifact: id=${artifact_id}, size=${size} bytes"
+
+ # Download and extract using run ID for isolation
+ out_dir="/tmp/playwright-llm-report-${run_id}"
+ zip_path="${out_dir}.zip"
+
+ # Skip download if already extracted (avoids duplicate work in multi-agent runs)
+ if [[ -d "$out_dir" ]] && ls "$out_dir"/*.json &>/dev/null; then
+   echo "Already downloaded — skipping."
+   echo ""
+   echo "Report directory: ${out_dir}"
+   exit 0
+ fi
+
+ echo "Downloading to: ${zip_path}"
+ tmp_zip="${zip_path}.tmp"
+ gh api "repos/${owner}/${repo}/actions/artifacts/${artifact_id}/zip" > "$tmp_zip" && mv "$tmp_zip" "$zip_path"
+
+ echo "Extracting to: ${out_dir}"
+ mkdir -p "$out_dir"
+ unzip -o "$zip_path" -d "$out_dir"
+ rm -f "$zip_path"
+
+ echo ""
+ echo "Done! Files:"
+ ls -la "$out_dir"
+ echo ""
+ echo "Report directory: ${out_dir}"
@@ -13,7 +13,7 @@ Fetch and analyze unresolved review comments from a GitHub pull request.
  Run the script to fetch PR comment data:

  ```bash
- node "${CLAUDE_PLUGIN_ROOT:-.agents}/skills/unresolved-pr-comments/scripts/unresolvedPrComments.ts" [pr-number]
+ node scripts/unresolvedPrComments.ts [pr-number]
  ```

  If no PR number is provided, it uses the PR associated with the current branch.
@@ -1,146 +0,0 @@
- ---
- name: datadog-e2e-trace
- description: >
-   Fetch and display the full APM trace for a Datadog CI test run from a Datadog UI URL.
-   Use this skill whenever the user pastes a Datadog CI test URL, asks to investigate an E2E
-   test failure trace, wants to see what happened during a CI test run, or mentions pulling
-   spans/traces from Datadog CI Visibility.
- argument-hint: "<datadog-ci-test-url>"
- ---
-
- # Datadog E2E Test Trace
-
- Fetch the full APM trace for a Datadog CI Visibility test run, given a Datadog UI URL.
-
- ## Arguments
-
- - `$ARGUMENTS` — A Datadog CI test URL (e.g., `https://app.datadoghq.com/ci/test/...?...&spanID=123456&...`)
-
- ## Prerequisites
-
- `DD_API_KEY` and `DD_APP_KEY` environment variables, or `~/.dogrc`:
-
- ```ini
- [Connection]
- apikey = <your-api-key>
- appkey = <your-app-key>
- ```
-
- ## Steps
-
- ### 1. Extract the `spanID` from the URL
-
- Parse the `spanID` query parameter from the URL. This is a decimal span ID.
-
- If the URL has no `spanID`, stop and tell the user: the test run has no associated trace. This typically happens when Datadog RUM is active during E2E tests, which suppresses CI test traces.
-
- ### 2. Resolve API credentials
-
- ```bash
- if [ -n "$DD_API_KEY" ] && [ -n "$DD_APP_KEY" ]; then
-   API_KEY="$DD_API_KEY"
-   APP_KEY="$DD_APP_KEY"
- else
-   API_KEY=$(grep apikey ~/.dogrc | cut -d= -f2 | tr -d ' ')
-   APP_KEY=$(grep appkey ~/.dogrc | cut -d= -f2 | tr -d ' ')
- fi
- ```
-
- Use `$API_KEY` and `$APP_KEY` in all subsequent curl commands.
-
- ### 3. Fetch the span to get the `trace_id`
-
- Query the Spans API. The request body **must** use the wrapped `data` format shown below — the flat `{"filter": ...}` format returns 400:
-
- ```bash
- curl -s -X POST "https://api.datadoghq.com/api/v2/spans/events/search" \
-   -H "Content-Type: application/json" \
-   -H "DD-API-KEY: ${API_KEY}" \
-   -H "DD-APPLICATION-KEY: ${APP_KEY}" \
-   -d '{
-     "data": {
-       "type": "search_request",
-       "attributes": {
-         "filter": {
-           "query": "span_id:<SPAN_ID>",
-           "from": "now-30d",
-           "to": "now"
-         },
-         "page": {
-           "limit": 1
-         }
-       }
-     }
-   }'
- ```
-
- Extract `trace_id` from `.data[0].attributes.trace_id`.
-
- If the query returns no results (empty `.data` array), the span exists only in the CI Visibility index and is not available in APM. Tell the user:
-
- > The span was not found in the APM Spans index — it likely exists only in CI Visibility (e.g., a browser-side or Playwright test span). To fetch a backend trace, open the flamegraph in the Datadog UI, click on a **backend span** (e.g., an API endpoint from a server-side service, not a browser HTTP request), copy the updated URL, and run this skill again.
-
- Stop here — do not proceed to step 4.
-
- **Note:** The `index=citest` parameter sometimes present in the URL only controls the Datadog UI view. It does not mean the span is inaccessible via the Spans API. Backend spans (e.g., `express.request`) are often in both the CI Visibility flamegraph and the APM spans index. Always attempt the query regardless of that parameter.
-
- ### 4. Fetch the full trace
-
- Use the `trace_id` to retrieve all spans in the trace. Paginate until all spans are collected:
-
- ```bash
- ALL_SPANS="[]"
- CURSOR=""
-
- while true; do
-   if [ -n "$CURSOR" ]; then
-     PAGE_PARAM="\"cursor\": \"${CURSOR}\","
-   else
-     PAGE_PARAM=""
-   fi
-
-   RESPONSE=$(curl -s -X POST "https://api.datadoghq.com/api/v2/spans/events/search" \
-     -H "Content-Type: application/json" \
-     -H "DD-API-KEY: ${API_KEY}" \
-     -H "DD-APPLICATION-KEY: ${APP_KEY}" \
-     -d "{
-       \"data\": {
-         \"type\": \"search_request\",
-         \"attributes\": {
-           \"filter\": {
-             \"query\": \"trace_id:<TRACE_ID>\",
-             \"from\": \"now-30d\",
-             \"to\": \"now\"
-           },
-           \"sort\": \"timestamp\",
-           \"page\": {
-             ${PAGE_PARAM}
-             \"limit\": 50
-           }
-         }
-       }
-     }")
-
-   ALL_SPANS=$(echo "$ALL_SPANS" | jq --argjson new "$(echo "$RESPONSE" | jq '.data')" '. + $new')
-   CURSOR=$(echo "$RESPONSE" | jq -r '.meta.page.after // empty')
-
-   if [ -z "$CURSOR" ]; then
-     break
-   fi
- done
- ```
-
- ### 5. Display the results
-
- Start with a one-line summary: total span count and trace duration (max end time minus min start time).
-
- Then present a table of spans grouped by type. Mark any span with error status or non-2xx status code with a warning indicator.
-
- | Type | Columns |
- | ----------------------------------------------------- | --------------------------------------------- |
- | **API endpoints** (type: `web`) | resource name, service, duration, status code |
- | **External HTTP calls** (type: `http`) | resource, service, duration, status code, URL |
- | **Database queries** (type: `mongodb`, `redis`, etc.) | resource, service, duration |
- | **Other spans** | resource, service, type, duration |
-
- If there are errors, call them out at the top before the table so the user sees them immediately.