@htekdev/actions-debugger 1.0.130 → 1.0.132

@@ -0,0 +1,138 @@
1
+ id: concurrency-timing-061
2
+ title: '`github.workflow` in reusable workflow (`workflow_call`) returns caller''s name — shared concurrency group causes deadlock'
3
+ category: concurrency-timing
4
+ severity: error
5
+ tags:
6
+ - concurrency
7
+ - reusable-workflow
8
+ - workflow-call
9
+ - github-workflow-context
10
+ - deadlock
11
+ - concurrency-group
12
+ patterns:
13
+ - regex: 'Canceling since a deadlock for concurrency group .+ was detected between .+ and .+'
14
+ flags: 'i'
15
+ - regex: 'deadlock.*concurrency group.*detected'
16
+ flags: 'i'
17
+ error_messages:
18
+ - "Canceling since a deadlock for concurrency group 'CI-refs/heads/main' was detected between 'CI workflow' and 'deploy'"
19
+ - "Canceling since a deadlock for concurrency group 'test-refs/heads/master' was detected between 'test workflow' and 'deploy'"
20
+ root_cause: |
21
+ When a caller workflow invokes a reusable workflow via `uses:` (the `workflow_call` trigger),
22
+ the `github.workflow` context variable inside the CALLEE resolves to the CALLER's workflow
23
+ name — not the callee's own filename. This is documented behavior but it directly undermines
24
+ the widely recommended concurrency group pattern of `${{ github.workflow }}-${{ github.ref }}`.
25
+
26
+ If both the caller and the callee define:
27
+ ```yaml
28
+ concurrency:
29
+ group: ${{ github.workflow }}-${{ github.ref }}
30
+ cancel-in-progress: true
31
+ ```
32
+
33
+ Both workflows compute the SAME group key (because `github.workflow` is the caller's name in
34
+ both). GitHub detects that two workflow instances are competing for the same concurrency slot
35
+ in a way that cannot resolve — one is waiting for the other which is waiting for the first —
36
+ treats it as a deadlock, and cancels one of the runs.
37
+
38
+ **Why this is especially confusing:**
39
+ The fix recommended for general concurrency group collisions (`${{ github.workflow }}-${{ github.ref }}`)
40
+ is itself the cause of this deadlock when applied naively to reusable workflows. Teams that
41
+ copy the "safe" concurrency pattern from caller to callee introduce the bug.
42
+
43
+ **Real-world evidence:**
44
+ - actions/runner#3205: `github.workflow` returns parent workflow name in child workflows
45
+ - vergil-project/vergil-actions#176: ci-push.yml calls ci.yml via workflow_call; both use
46
+ `${{ github.workflow }}-${{ github.ref }}`; deadlock detected on every push
47
+ - github/gh-aw#35173: workflow_call fan-out cancellations due to shared concurrency namespace
48
+ - SO #78101326 (6 upvotes, 1738 views): "GitHub Actions concurrency deadlock" — 6 answers
49
+ confirming the root cause is `github.workflow` returning the caller name
50
+ fix: |
51
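+ Use `github.workflow_ref` instead of `github.workflow` in the CALLEE (reusable) workflow.
52
+ `github.workflow_ref` contains the full ref path of a workflow file (e.g.
53
+ `<owner>/<repo>/.github/workflows/deploy.yml@refs/heads/main`). GitHub documents that the `github` context in a reusable workflow is associated with the caller, so confirm in a test run that
54
+ the value differs from the caller's key; a literal prefix such as `deploy-${{ github.ref }}` always does.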
55
+
56
+ **Preferred approach:** Define concurrency ONLY in the caller workflow and remove it from the
57
+ callee entirely. The callee runs as part of the caller's run and does not need its own
58
+ concurrency group. This is the cleanest fix and matches GitHub's recommendations for
59
+ reusable workflow design.
60
+
61
+ **Avoid:** Defining concurrency in both caller and callee with the same expression — this
62
+ is the root cause of the deadlock regardless of which context variable is used for the group
63
+ key, unless the keys are guaranteed to differ.
64
+ fix_code:
65
+ - language: yaml
66
+ label: 'Callee (reusable workflow): use github.workflow_ref — unique per callee file'
67
+ code: |
68
+ # .github/workflows/deploy.yml (callee — reusable workflow)
69
+ name: Deploy
70
+
71
+ on:
72
+ workflow_call:
73
+
74
+ # ✅ github.workflow_ref includes the full path, NOT the caller's name
75
+ concurrency:
76
+ group: ${{ github.workflow_ref }}-${{ github.ref }}
77
+ cancel-in-progress: true
78
+
79
+ jobs:
80
+ deploy:
81
+ runs-on: ubuntu-latest
82
+ steps:
83
+ - uses: actions/checkout@v4
84
+ - run: ./deploy.sh
85
+ - language: yaml
86
+ label: 'Preferred: define concurrency only in the caller, none in the callee'
87
+ code: |
88
+ # .github/workflows/ci.yml (caller workflow)
89
+ name: CI
90
+
91
+ on:
92
+ push:
93
+ branches: [main]
94
+
95
+ # Concurrency defined only in the caller
96
+ concurrency:
97
+ group: ${{ github.workflow }}-${{ github.ref }}
98
+ cancel-in-progress: true
99
+
100
+ jobs:
101
+ test:
102
+ runs-on: ubuntu-latest
103
+ steps:
104
+ - run: npm test
105
+
106
+ deploy:
107
+ needs: test
108
+ # ✅ No concurrency block here — the reusable workflow has no concurrency group
109
+ uses: ./.github/workflows/deploy.yml
110
+
111
+ ---
112
+ # .github/workflows/deploy.yml (callee — reusable workflow)
113
+ name: Deploy
114
+
115
+ on:
116
+ workflow_call:
117
+
118
+ # ✅ No concurrency block — inherit scheduling from caller
119
+ jobs:
120
+ deploy:
121
+ runs-on: ubuntu-latest
122
+ steps:
123
+ - uses: actions/checkout@v4
124
+ - run: ./deploy.sh
125
+ prevention:
126
+ - "Never use `github.workflow` in a reusable workflow's concurrency group — it resolves to the caller's name, not the callee's"
127
+ - "Use `github.workflow_ref` in callee workflows if a concurrency group is truly needed there"
128
+ - "Prefer defining concurrency only in the caller when calling reusable workflows via `uses:` — keep callee stateless"
129
+ - "Audit all reusable workflows for concurrency groups that use `github.workflow` and replace with `github.workflow_ref`"
130
+ docs:
131
+ - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/control-the-concurrency-of-workflows-and-jobs'
132
+ label: 'GitHub Docs: Control the concurrency of workflows and jobs'
133
+ - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs#github-context'
134
+ label: 'GitHub Docs: github context — github.workflow_ref field'
135
+ - url: 'https://github.com/actions/runner/issues/3205'
136
+ label: 'actions/runner#3205: github.workflow returns parent name in child workflows (closed not_planned)'
137
+ - url: 'https://stackoverflow.com/questions/78101326/github-actions-concurrency-deadlock'
138
+ label: 'SO #78101326: GitHub Actions concurrency deadlock in reusable workflows (6 upvotes, 1738 views)'
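
The audit suggested in the prevention list above can be approximated with a small shell function. This is a minimal sketch, not part of the package: the function name `audit_reusable_concurrency` is hypothetical, it assumes workflow files sit directly in the directory passed as the first argument, and it treats any file containing `workflow_call` as a reusable workflow. The match is a plain-text heuristic rather than a YAML parse, so it flags every `github.workflow` reference in such files, not only those inside a `concurrency` block.

```bash
# Hypothetical helper: list reusable workflows whose text references
# `github.workflow` (but not `github.workflow_ref`) so that their
# concurrency groups can be reviewed by hand.
audit_reusable_concurrency() {
  dir="${1:-.github/workflows}"
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    # Skip unmatched glob patterns and anything that is not a regular file.
    [ -f "$f" ] || continue
    # Only consider reusable workflows (files declaring workflow_call).
    grep -q 'workflow_call' "$f" || continue
    # Match `github.workflow` when it is NOT followed by `_` or a letter/digit,
    # which excludes `github.workflow_ref` and `github.workflow_sha`.
    if grep -Eq 'github\.workflow([^_[:alnum:]]|$)' "$f"; then
      echo "$f"
    fi
  done
}
```

Running `audit_reusable_concurrency .github/workflows` from the repository root prints one path per flagged file; an empty result means no reusable workflow in that directory references `github.workflow` directly.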
@@ -0,0 +1,142 @@
1
+ id: known-unsolved-076
2
+ title: '`env` context unavailable in `strategy.matrix` — workflow env vars cannot be used in matrix definitions'
3
+ category: known-unsolved
4
+ severity: limitation
5
+ tags:
6
+ - matrix
7
+ - env-context
8
+ - strategy
9
+ - fromJSON
10
+ - dynamic-matrix
11
+ - context-availability
12
+ - known-limitation
13
+ patterns:
14
+ - regex: 'Error when evaluating .+strategy.+ for job'
15
+ flags: 'i'
16
+ - regex: 'strategy.*matrix.*\$\{\{.*env\.'
17
+ flags: 'i'
18
+ - regex: 'Unrecognized named-value: .+env.+ \(Line: \d+'
19
+ flags: 'i'
20
+ error_messages:
21
+ - "Error when evaluating 'strategy' for job 'build'. .github/workflows/deploy.yaml (Line: 10, Col: 15): Error parsing fromJson"
22
+ - "Error when evaluating 'strategy' for job 'test'. Input string '3.11' is not a valid number. Path '[0]'"
23
+ - "Error when evaluating 'strategy' for job 'matrix-job': Object reference not set to an instance of an object"
24
+ root_cause: |
25
+ The `env` context — workflow-level environment variables defined under the top-level `env:` key
26
+ — is NOT available during evaluation of `strategy.matrix`. GitHub Actions evaluates the
27
+ `strategy:` block at workflow QUEUING time, before jobs are dispatched and before environment
28
+ variable interpolation occurs. Any expression like `${{ env.VERSIONS_JSON }}` or
29
+ `${{ fromJSON(env.VERSIONS_JSON) }}` inside `strategy.matrix` resolves to an empty string
30
+ or produces a parse error.
31
+
32
+ **GitHub's documented context availability:**
33
+ Per the official context availability table, `env` is explicitly NOT listed as available in
34
+ `jobs.<job_id>.strategy`. The contexts available there are `github`, `needs`, `vars`,
35
+ and `inputs`.
36
+
37
+ **Common failure patterns:**
38
+ - Using `${{ fromJSON(env.MY_VERSIONS) }}` to share a JSON array between matrix and other steps
39
+ - Setting `env: DEPLOY_ENVS: '["staging","production"]'` at workflow level and referencing it
40
+ in the matrix include/exclude blocks
41
+ - Using workflow-level env vars to DRY up matrix configs shared with `run:` script logic
42
+
43
+ **Why this surprises developers:**
44
+ - `env` IS available inside `jobs.<job_id>.steps.*` — the limitation is specific to `strategy`
45
+ - The error message "Error parsing fromJson" doesn't mention that `env` context is unavailable
46
+ - The empty-string resolution (when no parse error) causes a silent empty matrix, not an error
47
+
48
+ Sources: actions/runner#480 (281 👍, open since 2020 — most-upvoted env context limitation);
49
+ SO#74072206 (10 upvotes, 19,053 views); SO#76039616 (6 upvotes, 10,482 views)
50
+ fix: |
51
+ Three workarounds exist, in order of preference:
52
+
53
+ **Option 1 — Use `vars` context (org/repo-level variables)**
54
+ Org-level and repo-level variables (`vars.*`) ARE available in `strategy.matrix`.
55
+ Move shared values to repository Settings → Secrets and variables → Variables instead of
56
+ workflow-level `env:`.
57
+
58
+ **Option 2 — Compute matrix in a previous job's output (most flexible)**
59
+ Use a dedicated setup job that outputs the matrix JSON, then reference it in dependent
60
+ jobs via `fromJSON(needs.setup.outputs.matrix)`. This also enables dynamic matrix
61
+ generation from files, APIs, or scripts.
62
+
63
+ **Option 3 — Hardcode values directly in strategy.matrix**
64
+ Duplicate the values in `strategy.matrix` explicitly. This violates DRY but
65
+ is the simplest fix when values rarely change.
66
+ fix_code:
67
+ - language: yaml
68
+ label: 'Option 1: Use vars context — repo/org variables ARE available in strategy'
69
+ code: |
70
+ # In repo Settings → Secrets and variables → Variables, create:
71
+ # Name: PYTHON_VERSIONS Value: ["3.10","3.11","3.12"]
72
+
73
+ jobs:
74
+ test:
75
+ strategy:
76
+ matrix:
77
+ # ✅ vars context IS available in strategy.matrix
78
+ python-version: ${{ fromJSON(vars.PYTHON_VERSIONS) }}
79
+ runs-on: ubuntu-latest
80
+ steps:
81
+ - uses: actions/setup-python@v5
82
+ with:
83
+ python-version: ${{ matrix.python-version }}
84
+ - run: python -m pytest
85
+ - language: yaml
86
+ label: 'Option 2: Compute matrix in a setup job, pass via needs outputs'
87
+ code: |
88
+ jobs:
89
+ setup:
90
+ runs-on: ubuntu-latest
91
+ outputs:
92
+ matrix: ${{ steps.set-matrix.outputs.matrix }}
93
+ steps:
94
+ - id: set-matrix
95
+ run: |
96
+ # Build matrix from any source: env var, file, API, conditional logic
97
+ echo 'matrix={"python-version":["3.10","3.11","3.12"]}' >> "$GITHUB_OUTPUT"
98
+
99
+ test:
100
+ needs: setup
101
+ strategy:
102
+ matrix:
103
+ # ✅ fromJSON(needs.*.outputs.*) IS available in strategy.matrix
104
+ ${{ fromJSON(needs.setup.outputs.matrix) }}
105
+ runs-on: ubuntu-latest
106
+ steps:
107
+ - uses: actions/setup-python@v5
108
+ with:
109
+ python-version: ${{ matrix.python-version }}
110
+ - run: python -m pytest
111
+ - language: yaml
112
+ label: 'Anti-pattern: env context in strategy.matrix causes parse error'
113
+ code: |
114
+ # ❌ Workflow-level env vars are NOT available in strategy.matrix
115
+
116
+ env:
117
+ PYTHON_VERSIONS: '["3.10","3.11","3.12"]'
118
+
119
+ jobs:
120
+ test:
121
+ strategy:
122
+ matrix:
123
+ # This causes: "Error when evaluating 'strategy' for job 'test'"
124
+ python-version: ${{ fromJSON(env.PYTHON_VERSIONS) }}
125
+ runs-on: ubuntu-latest
126
+ steps:
127
+ - run: python --version
128
+ prevention:
129
+ - "Use `vars.*` (repo/org variables from Settings) instead of workflow `env:` for values shared between matrix and steps"
130
+ - "Generate dynamic matrices in a dedicated setup job and pass via `needs.*.outputs.*`"
131
+ - "Consult the GitHub Actions context availability table before using any context in `strategy` — not all contexts are available there"
132
+ - "actionlint will flag `env` context usage in strategy blocks — add it to your CI pipeline or pre-commit hooks"
133
+ - "Remember: `env` IS available in `steps.*.run`, `steps.*.with`, and most step-level keys; it is unavailable in `strategy` and in other keys evaluated before the job starts, such as `concurrency`"
134
+ docs:
135
+ - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs#context-availability'
136
+ label: 'GitHub Docs: Context availability table (env not listed for strategy)'
137
+ - url: 'https://github.com/actions/runner/issues/480'
138
+ label: 'actions/runner#480: Workflow level env does not work in all fields (281 upvotes, open since 2020)'
139
+ - url: 'https://stackoverflow.com/questions/74072206/github-actions-use-variables-in-matrix-definition'
140
+ label: 'SO #74072206: Use variables in matrix definition (10 upvotes, 19,053 views)'
141
+ - url: 'https://stackoverflow.com/questions/76039616/how-to-read-in-an-array-of-strings-to-matrix-in-github-actions'
142
+ label: 'SO #76039616: Read array of strings into matrix (6 upvotes, 10,482 views)'
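
Option 2 above mentions generating the matrix from files or scripts. The sketch below shows one way the setup job's `run:` step could assemble the JSON. The helper name `build_matrix_json` is hypothetical, and it assumes the values contain no quotes, backslashes, or spaces. Each value is emitted as a JSON string, so `3.10` is not read as a number.

```bash
# Hypothetical helper: turn a key plus a list of values into the JSON
# object consumed by `fromJSON(needs.setup.outputs.matrix)`.
# Assumes values contain no quotes, backslashes, or spaces.
build_matrix_json() {
  key="$1"; shift
  items=""
  for v in "$@"; do
    # Quote every value; append a comma separator only between items.
    items="${items:+$items,}\"$v\""
  done
  printf '{"%s":[%s]}\n' "$key" "$items"
}

# In a workflow step this line would be redirected to "$GITHUB_OUTPUT".
echo "matrix=$(build_matrix_json python-version 3.10 3.11 3.12)"
```

The values could equally come from a file, for example by passing `$(cat versions.txt)` as the arguments, where `versions.txt` is an assumed file holding one version per line.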
@@ -0,0 +1,158 @@
1
+ id: permissions-auth-076
2
+ title: '`docker push` to ghcr.io fails with 403 / "denied: permission_denied: write_package" — missing `packages: write` permission'
3
+ category: permissions-auth
4
+ severity: error
5
+ tags:
6
+ - ghcr
7
+ - github-packages
8
+ - container-registry
9
+ - packages-write
10
+ - github-token
11
+ - docker-push
12
+ - 403
13
+ - permission-denied
14
+ patterns:
15
+ - regex: 'denied: permission_denied: write_package'
16
+ flags: 'i'
17
+ - regex: 'Error response from daemon.*denied.*ghcr\.io.*403'
18
+ flags: 'i'
19
+ - regex: 'error parsing HTTP 403 response body.*ghcr\.io'
20
+ flags: 'i'
21
+ - regex: 'unauthorized.*ghcr\.io.*write'
22
+ flags: 'i'
23
+ - regex: 'The token provided does not match expected format for.*packages'
24
+ flags: 'i'
25
+ error_messages:
26
+ - "Error response from daemon: denied: permission_denied: write_package"
27
+ - "error parsing HTTP 403 response body: unexpected end of JSON input: \"\""
28
+ - "denied: denied"
29
+ - "unauthorized: authentication required"
30
+ - "buildx failed with: ERROR [auth] ghcr.io token refreshing failed with status: 403 Forbidden"
31
+ root_cause: |
32
+ GitHub Container Registry (ghcr.io) requires the `packages: write` permission
33
+ to be explicitly granted in the workflow job's `permissions:` block before
34
+ `GITHUB_TOKEN` can push (publish) images.
35
+
36
+ Since GitHub made read-only the default GITHUB_TOKEN permission for new repositories
37
+ and organizations in 2023, any repository whose "Workflow permissions" setting is
38
+ "Read repository contents and packages permissions" has `packages: write` DISABLED by default.
39
+ Without this permission, any `docker push` or `docker buildx build --push` to
40
+ `ghcr.io` fails with a 403 at the registry level.
41
+
42
+ **Why this is confusing:**
43
+ - `docker login ghcr.io -u ${{ github.actor }} -p ${{ secrets.GITHUB_TOKEN }}`
44
+ SUCCEEDS (read/login is allowed) — the login step passes
45
+ - The failure occurs on `docker push`, not login
46
+ - The error message "denied: permission_denied: write_package" or the generic 403
47
+ does not directly mention the `packages: write` permission or how to add it
48
+ - The same workflow may work on repositories created before the permission
49
+ tightening (where `packages: write` was still the default)
50
+
51
+ **Additional requirement — linking the package to the repository:**
52
+ On first publish, the package is created as unlinked. If the workflow runs in a
53
+ fork or if the package visibility is set to Private with no repository link, the
54
+ GITHUB_TOKEN push will also fail even with `packages: write`. The package must be
55
+ linked to the source repository for GITHUB_TOKEN to authenticate.
56
+
57
+ Source: SO #70646920 (22 upvotes, 34k views), SO #78342148, GitHub Docs on
58
+ container registry permissions.
59
+ fix: |
60
+ Add `packages: write` to the job's `permissions:` block. This is the most common
61
+ fix and resolves the majority of GHCR push 403 errors:
62
+
63
+ ```yaml
64
+ jobs:
65
+ build-and-push:
66
+ runs-on: ubuntu-latest
67
+ permissions:
68
+ contents: read
69
+ packages: write # Required to push to ghcr.io
70
+ ```
71
+
72
+ **If login succeeds but push still fails:**
73
+ 1. Verify the package is linked to the repository: go to the package on
74
+ `github.com/{owner}/{package}` → Package settings → "Add Repository"
75
+ 2. Ensure the package visibility matches the repository (private repo → private
76
+ package, or set the package to public)
77
+ 3. For organization repositories, confirm the organization has not blocked
78
+ GitHub Actions from creating packages (Org Settings → Actions → General)
79
+
80
+ **If using a separate PAT or App token:**
81
+ GITHUB_TOKEN with `packages: write` is sufficient for pushing to packages owned
82
+ by the same repository. For cross-repository or org-level package publishing,
83
+ use a personal access token (classic) with the `write:packages` scope, since GitHub Packages
84
+ does not accept fine-grained PATs, or a GitHub App with "Packages: write" permission.
85
+ fix_code:
86
+ - language: yaml
87
+ label: 'Correct job-level permissions for docker push to ghcr.io'
88
+ code: |
89
+ name: Build and Push Docker Image
90
+ on:
91
+ push:
92
+ branches: [main]
93
+ jobs:
94
+ build-push:
95
+ runs-on: ubuntu-latest
96
+ permissions:
97
+ contents: read
98
+ packages: write # <-- Required for ghcr.io push
99
+ steps:
100
+ - uses: actions/checkout@v4
101
+
102
+ - name: Log in to GitHub Container Registry
103
+ uses: docker/login-action@v3
104
+ with:
105
+ registry: ghcr.io
106
+ username: ${{ github.actor }}
107
+ password: ${{ secrets.GITHUB_TOKEN }}
108
+
109
+ - name: Build and push
110
+ uses: docker/build-push-action@v6
111
+ with:
112
+ context: .
113
+ push: true
114
+ tags: ghcr.io/${{ github.repository }}:latest
115
+ - language: yaml
116
+ label: 'Workflow-level permissions fallback (if setting per-job is not possible)'
117
+ code: |
118
+ name: Build and Push Docker Image
119
+ on:
120
+ push:
121
+ branches: [main]
122
+
123
+ # Set at workflow level — applies to ALL jobs
124
+ permissions:
125
+ contents: read
126
+ packages: write
127
+
128
+ jobs:
129
+ build-push:
130
+ runs-on: ubuntu-latest
131
+ steps:
132
+ - uses: actions/checkout@v4
133
+ - name: Log in to ghcr.io
134
+ uses: docker/login-action@v3
135
+ with:
136
+ registry: ghcr.io
137
+ username: ${{ github.actor }}
138
+ password: ${{ secrets.GITHUB_TOKEN }}
139
+ - name: Push image
140
+ run: |
141
+ docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
142
+ docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
143
+ prevention:
144
+ - "Always add `packages: write` to the job permissions block for any job that pushes images to ghcr.io"
145
+ - "Note that docker login to ghcr.io succeeds without packages:write — always test the actual push step to validate permissions"
146
+ - "Check your organization's Actions general settings for any policies that restrict packages creation"
147
+ - "When using docker/build-push-action with push: true, the packages: write permission is required on the job"
148
+ - "For fork PRs, packages: write is unavailable — use conditional steps: `if: github.event.pull_request.head.repo.full_name == github.repository`"
149
+ - "Link the package to its source repository after first creation to enable GITHUB_TOKEN-based pushes in future runs"
150
+ docs:
151
+ - url: 'https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry'
152
+ label: 'GitHub Docs: Working with the Container registry (ghcr.io)'
153
+ - url: 'https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#permissions-for-the-github_token'
154
+ label: 'GitHub Docs: GITHUB_TOKEN permissions reference'
155
+ - url: 'https://stackoverflow.com/questions/70646920/github-token-permission-denied-write-package-when-build-and-push-docker-in-githu'
156
+ label: 'SO #70646920: GITHUB_TOKEN permission denied write_package (22 upvotes, 34k views)'
157
+ - url: 'https://docs.github.com/en/packages/learn-github-packages/configuring-a-packages-access-control-and-visibility'
158
+ label: 'GitHub Docs: Configuring a package access control and visibility'
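
The error messages listed in this entry can be turned into a first-pass triage step for failed push logs. The following is a minimal sketch only: the function name `ghcr_push_hint` and the hint texts are illustrative assumptions, and the patterns are simplified shell globs rather than the regexes in the `patterns` list.

```bash
# Hypothetical triage helper: map one log line from a failed ghcr.io push
# to the first thing worth checking. Hint texts are illustrative.
ghcr_push_hint() {
  case "$1" in
    *"permission_denied: write_package"*)
      echo "add 'packages: write' to the job permissions" ;;
    *"403"*ghcr.io*|*ghcr.io*"403"*)
      echo "check 'packages: write' and the package-to-repository link" ;;
    *"authentication required"*)
      echo "check the docker login step" ;;
    *)
      echo "no ghcr.io permission pattern matched" ;;
  esac
}
```

A log-processing step could call it once per failing line, for example `ghcr_push_hint "$line"`, and surface the hint as a job summary entry.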
@@ -0,0 +1,160 @@
1
+ id: runner-environment-245
2
+ title: 'ARC ephemeral runner pod hangs indefinitely after workflow job cancellation — "Skipping message Job. Job message not found ... job was canceled"'
3
+ category: runner-environment
4
+ severity: error
5
+ tags:
6
+ - arc
7
+ - kubernetes
8
+ - ephemeral-runner
9
+ - cancellation
10
+ - stuck-runner
11
+ - gha-runner-scale-set
12
+ - job-canceled
13
+ patterns:
14
+ - regex: 'Skipping message Job\. Job message not found .+ job was canceled'
15
+ flags: 'i'
16
+ - regex: 'EphemeralRunner.*status.*Running.*canceled'
17
+ flags: 'i'
18
+ - regex: 'Registration .+ was not found\.'
19
+ flags: 'i'
20
+ - regex: 'POST request to .+acquirejob failed\. HTTP Status: Conflict'
21
+ flags: 'i'
22
+ error_messages:
23
+ - "Skipping message Job. Job message not found 'e420db7b-c780-5ae4-8145-4eea09b87aea'. job was canceled"
24
+ - "POST request to https://run-actions-*.actions.githubusercontent.com/*/acquirejob failed. HTTP Status: Conflict"
25
+ - "Registration 470d6c87-f8c4-411f-87d0-6b069233ccc7 was not found."
26
+ root_cause: |
27
+ When a GitHub Actions workflow job is canceled (manually or by concurrency group
28
+ cancellation) while an ARC ephemeral runner is in the process of starting up or
29
+ acquiring the job message, the runner pod enters a broken state:
30
+
31
+ 1. The broker sends a "job was canceled" message on the runner's job socket.
32
+ 2. The runner logs: "Skipping message Job. Job message not found 'X'. job was canceled"
33
+ 3. **The runner process does not exit** — it waits for a new job message that
34
+ will never arrive, because ephemeral runners are single-use and the broker
35
+ considers this runner's slot exhausted.
36
+ 4. The ARC controller sees the EphemeralRunner pod still in `Running` phase with
37
+ a live runner process. It does NOT delete and recreate the pod.
38
+ 5. The runner slot is held indefinitely. No new job can be dispatched to this slot.
39
+ Workflows queue but cannot start, consuming queue capacity without making progress.
40
+
41
+ **Secondary failure mode**: After the canceled-job hang, the runner may also
42
+ attempt to re-acquire its registration token. Since the ephemeral runner's
43
+ registration was already consumed or expired, this fails with:
44
+ `Registration 'X' was not found.`
45
+ The BrokerServer then throws repeatedly, producing a growing error log.
46
+
47
+ **Impact**: In a scale set with `minRunners > 0`, each canceled job permanently
48
+ removes one runner slot from the pool until the pod is manually deleted or the
49
+ scale set is bounced. With frequent cancellations (e.g., PR pushes with concurrency
50
+ cancel-in-progress), the pool drains completely and CI halts.
51
+
52
+ Source: actions/actions-runner-controller#4307 (Nov 2025, 11 reactions, open).
53
+ Also related: ARC#4468 (scale set queuing delay during burst) — adjacent but distinct.
54
+ fix: |
55
+ **Option 1 — Delete stuck EphemeralRunner objects manually (immediate relief):**
56
+ Identify stuck pods — they have been Running for longer than any expected job
57
+ duration and show "job was canceled" in their logs:
58
+
59
+ ```bash
60
+ kubectl get ephemeralrunners -n arc-runners --sort-by=.metadata.creationTimestamp
61
+ kubectl logs <stuck-pod-name> -n arc-runners | grep "job was canceled"
62
+ kubectl delete ephemeralrunner <stuck-pod-name> -n arc-runners
63
+ ```
64
+
65
+ **Option 2 — Add a watchdog sidecar or CronJob to detect and kill stuck runners:**
66
+ Run a periodic job that identifies EphemeralRunner pods that have been in Running
67
+ phase beyond `maxRunnerLifetime` and have no active child processes (no build
68
+ tools, compilers, or test runners as descendants):
69
+
70
+ ```bash
71
+ # List EphemeralRunners in Running phase; check their age and activity separately
72
+ kubectl get ephemeralrunners -n arc-runners -o json | \
73
+ jq '.items[] | select(.status.phase=="Running") | .metadata.name'
74
+ ```
75
+
76
+ **Option 3 — Configure autoscaling to replace stuck runners via scale-down:**
77
+ Set `minRunners` and `maxRunners` so that the scale set scales down when load is low.
78
+ Scale-down deletes idle pods, which may include stuck ones if ephemeral runners in the
79
+ "stuck/waiting" state appear idle to the autoscaler; verify this on your ARC version, since
80
+ runner scale sets scale through their own listener rather than KEDA.
81
+
82
+ **Option 4 — Use concurrency groups carefully to reduce cancellations:**
83
+ Each job cancellation risks creating a stuck runner. Reduce cancellation frequency
84
+ by scoping concurrency groups more narrowly (per-branch rather than per-repo):
85
+
86
+ ```yaml
87
+ concurrency:
88
+ group: ${{ github.workflow }}-${{ github.ref }}
89
+ cancel-in-progress: true
90
+ ```
91
+
92
+ **Option 5 — Upgrade ARC and runner image:**
93
+ Track the upstream fix in ARC#4307. Check ARC release notes for a fix to the
94
+ runner's post-cancellation cleanup path. Ensure runner image is on the latest
95
+ minor version.
96
+ fix_code:
97
+ - language: yaml
98
+ label: 'Kubernetes CronJob watchdog to delete stuck ARC ephemeral runners'
99
+ code: |
100
+ apiVersion: batch/v1
101
+ kind: CronJob
102
+ metadata:
103
+ name: arc-runner-watchdog
104
+ namespace: arc-runners
105
+ spec:
106
+ schedule: "*/10 * * * *"
107
+ jobTemplate:
108
+ spec:
109
+ template:
110
+ spec:
111
+ serviceAccountName: arc-runner-watchdog-sa
112
+ containers:
113
+ - name: watchdog
114
+ image: bitnami/kubectl:latest
115
+ command:
116
+ - /bin/sh
117
+ - -c
118
+ - |
119
+ TIMEOUT_MINUTES=30  # set above your longest job: any runner Running longer is deleted, busy or not
120
+ NOW=$(date +%s)
121
+ kubectl get ephemeralrunners -n arc-runners -o json | \
122
+ jq -r --argjson now "$NOW" --argjson timeout "$((TIMEOUT_MINUTES * 60))" \
123
+ '.items[] | select(.status.phase=="Running") |
124
+ select(($now - (.metadata.creationTimestamp | fromdateiso8601)) > $timeout) |
125
+ .metadata.name' | \
126
+ xargs -r -I{} kubectl delete ephemeralrunner {} -n arc-runners
127
+ restartPolicy: OnFailure
128
+ - language: yaml
129
+ label: 'Narrow concurrency groups to reduce job cancellations hitting ARC'
130
+ code: |
131
+ name: CI
132
+ on:
133
+ push:
134
+ branches: ['**']
135
+ pull_request:
136
+ concurrency:
137
+ # Scope to branch — only cancels duplicate runs on the same branch,
138
+ # not across all branches (reduces how often ARC runners get hit by cancellations)
139
+ group: ${{ github.workflow }}-${{ github.ref }}
140
+ cancel-in-progress: true
141
+ jobs:
142
+ build:
143
+ runs-on:
144
+ group: my-scale-set
145
+ steps:
146
+ - uses: actions/checkout@v4
147
+ - run: make build
148
+ prevention:
149
+ - "Monitor EphemeralRunner pod ages; any pod in Running phase beyond 2x your longest job duration is likely stuck"
150
+ - "Use narrow concurrency group scopes (per-branch) to minimize the frequency of job cancellations hitting ARC runners"
151
+ - "Configure scale-down (`minRunners: 0` during off-hours) so that idle runner pods, possibly including stuck ones, are removed"
152
+ - "Keep ARC controller and runner image on the latest stable release; the stuck-on-cancel bug has active upstream attention in ARC#4307"
153
+ - "For high-cancellation workflows (PR push loops), consider using GitHub-hosted runners which handle cancellation cleanup server-side"
154
+ docs:
155
+ - url: 'https://github.com/actions/actions-runner-controller/issues/4307'
156
+ label: 'ARC#4307: Ephemeral Runners get stuck when job is canceled or interrupted (11 reactions, open)'
157
+ - url: 'https://github.com/actions/actions-runner-controller/blob/master/docs/gha-runner-scale-set-controller/README.md'
158
+ label: 'ARC: GitHub Actions Runner Scale Set documentation'
159
+ - url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors'
160
+ label: 'GitHub Docs: Troubleshooting ARC errors'
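
The age-based selection used by the watchdog CronJob above can be exercised offline before it is pointed at a cluster. This is a minimal sketch under stated assumptions: the function name `stale_runners` is hypothetical, and its input is a pre-flattened three-column listing (`name created_epoch phase`), not the raw output of `kubectl`.

```bash
# Hypothetical filter: read "name created_epoch phase" lines on stdin and
# print the names of runners that are Running and older than the timeout.
stale_runners() {
  now="$1"; timeout_seconds="$2"
  awk -v now="$now" -v limit="$timeout_seconds" \
    '$3 == "Running" && (now - $2) > limit { print $1 }'
}
```

For example, `stale_runners "$(date +%s)" 1800 < runners.txt` lists candidates older than 30 minutes, where `runners.txt` is an assumed file in the three-column format. As with the CronJob, age alone cannot distinguish a stuck runner from one executing a long job, so the timeout should exceed the longest expected job.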
@@ -0,0 +1,132 @@
1
+ id: silent-failures-120
2
+ title: 'github.event.schedule returns stale cron expression for 1-2 runs after updating workflow schedule'
3
+ category: silent-failures
4
+ severity: silent-failure
5
+ tags:
6
+ - schedule
7
+ - cron
8
+ - github.event.schedule
9
+ - stale-value
10
+ - routing
11
+ - trigger-payload
12
+ patterns:
13
+ - regex: 'github\.event\.schedule.*[0-9\*\/\-\,]+'
14
+ flags: 'i'
15
+ - regex: 'Unknown schedule:\s+[0-9\*\/\-\, ]+'
16
+ flags: 'i'
17
+ - regex: 'Evaluating String.*=>\s+''[0-9\*\/\-\, ]+'''
18
+ flags: 'i'
19
+ error_messages:
20
+ - "Error: Unknown schedule: 0 5 * * *"
21
+ - "##[debug]....=> '0 5 * * *'"
22
+ - "github.event.schedule evaluates to removed cron expression"
23
+ root_cause: |
24
+ GitHub's schedule dispatcher caches the cron expression strings associated with
25
+ each workflow internally. When a workflow's `on.schedule` block is updated (cron
26
+ expression changed or removed) and pushed to the default branch, the dispatcher's
27
+ internal state does not immediately sync with the new YAML. For 1-2 scheduled runs
28
+ after the change, the dispatcher fires the workflow with the OLD cron expression
29
+ in `github.event.schedule`, even though the executed YAML already contains the
30
+ updated cron string.
31
+
32
+ This means:
33
+ - A workflow with routing logic such as `if: github.event.schedule == '0 5 * * 1-5'`
34
+ silently skips those steps during the transition window
35
+ - A `case`/`switch`-style step that branches on `github.event.schedule` hits an
36
+ "unknown schedule" branch and logs a confusing error
37
+ - Multi-schedule workflows that use `github.event.schedule` to dispatch different jobs
38
+ produce wrong results for 1-2 runs post-update
39
+
40
+ The stale value typically clears within the next scheduled trigger cycle once the
41
+ dispatcher re-reads the updated workflow YAML from the default branch. The actual
42
+ workflow code that runs is the CURRENT version — only the event payload is stale.
43
+
44
+ Source: actions/runner#4241 (Feb 2026) — real-world failure confirmed in
45
+ ava-labs/firewood with stale '0 5 * * *' firing after update to '0 5 * * 1-5',
46
+ with debug logs showing the old value in the evaluator.
47
+ fix: |
48
+ **Option 1 — Use a fallback in cron-routing logic:**
49
+ When switching cron expressions, treat the transition period gracefully by adding
50
+ the OLD expression to your routing logic temporarily. After 2-3 scheduled cycles,
51
+ remove the old branch.
52
+
53
+ **Option 2 — Avoid routing on github.event.schedule entirely:**
54
+ Use separate workflow files for each schedule instead of branching inside one
55
+ workflow on `github.event.schedule`. Each schedule in its own file has an
56
+ independent dispatch state.
57
+
58
+ **Option 3 — Trigger a manual run after updating the schedule:**
59
+ After pushing the cron change, trigger the workflow once via `workflow_dispatch` (the workflow must declare that trigger under `on:`).
60
+ This forces the dispatcher to re-evaluate the workflow file and typically clears
61
+ the stale state before the next real scheduled run.
62
+
63
+ **Option 4 — Accept and handle the stale run:**
64
+ Add a default/fallback case to your routing logic that gracefully handles an
65
+ unexpected `github.event.schedule` value rather than erroring:
66
+
67
+ ```yaml
68
+ - name: Route by schedule
69
+ run: |
70
+ case "${{ github.event.schedule }}" in
71
+ '0 5 * * 1-5') echo "Weekday run" ;;
72
+ '0 5 * * 6') echo "Saturday run" ;;
73
+ *) echo "Unknown schedule (possibly stale): ${{ github.event.schedule }}; skipping" ;;
74
+ esac
75
+ ```
76
+ fix_code:
77
+ - language: yaml
78
+ label: 'Use separate workflows per schedule (avoids routing entirely)'
79
+ code: |
80
+ # weekday-job.yml
81
+ on:
82
+ schedule:
83
+ - cron: '0 5 * * 1-5'
84
+ jobs:
85
+ weekday:
86
+ runs-on: ubuntu-latest
87
+ steps:
88
+ - run: echo "Weekday build"
89
+
90
+ # saturday-job.yml
91
+ on:
92
+ schedule:
93
+ - cron: '0 5 * * 6'
94
+ jobs:
95
+ saturday:
96
+ runs-on: ubuntu-latest
97
+ steps:
98
+ - run: echo "Saturday build"
99
+ - language: yaml
100
+ label: 'Graceful fallback case when routing on github.event.schedule'
101
+ code: |
102
+ on:
103
+ schedule:
104
+ - cron: '0 5 * * 1-5'
105
+ - cron: '0 5 * * 6'
106
+ jobs:
107
+ dispatch:
108
+ runs-on: ubuntu-latest
109
+ steps:
110
+ - name: Route by schedule
111
+ run: |
112
+ case "${{ github.event.schedule }}" in
113
+ '0 5 * * 1-5') echo "Weekday build" ;;
114
+ '0 5 * * 6') echo "Saturday build" ;;
115
+ # Fallback handles stale values during transition window
116
+ *)
117
+ echo "Unrecognized schedule value: '${{ github.event.schedule }}'"
118
+ echo "This may be a stale value from before a recent schedule update."
119
+ echo "Skipping; next run should carry the updated expression."
120
+ ;;
121
+ esac
122
+ prevention:
123
+ - "Never rely on github.event.schedule routing immediately after updating a cron expression; allow 2-3 cycles for dispatcher state to sync"
124
+ - "Prefer separate workflow files per schedule rather than branching on github.event.schedule inside a single workflow"
125
+ - "When routing on github.event.schedule is necessary, always include a default/fallback case that handles unexpected values gracefully"
126
+ - "Trigger a workflow_dispatch run after updating a cron expression to force dispatcher re-evaluation before the next scheduled cycle"
127
+ - "Add a debug step that logs github.event.schedule at the start of scheduled workflows to surface stale values immediately"
128
+ docs:
129
+ - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#schedule'
130
+ label: 'GitHub Docs: schedule event'
131
+ - url: 'https://github.com/actions/runner/issues/4241'
132
+ label: 'actions/runner#4241: Cron scheduled returns stale cron expression after updating workflow schedule triggers'
@@ -0,0 +1,140 @@
1
+ id: silent-failures-121
2
+ title: '`workflow_run: types: [completed]` triggers on ALL conclusions — deploy runs even when CI failed or was cancelled'
3
+ category: silent-failures
4
+ severity: silent-failure
5
+ tags:
6
+ - workflow-run
7
+ - conclusion
8
+ - silent-failure
9
+ - deploy-on-failure
10
+ - ci-cd
11
+ - workflow-trigger
12
+ - conclusion-check
13
+ patterns:
14
+ - regex: 'on:\s*\n\s+workflow_run:'
15
+ flags: 'im'
16
+ - regex: 'types:\s*[\[\s]*completed[\]\s]*\n'
17
+ flags: 'i'
18
+ error_messages:
19
+ - "# No error message — the deploy workflow runs silently even when upstream CI failed or was cancelled"
20
+ root_cause: |
21
+ The `workflow_run` event with `types: [completed]` fires whenever the referenced workflow
22
+ finishes — regardless of its conclusion. A completed workflow can have any of these conclusions:
23
+ `success`, `failure`, `cancelled`, `skipped`, `timed_out`, `action_required`, `neutral`,
24
+ or `stale`. GitHub fires `workflow_run: completed` for ALL of these states.
25
+
26
+ Developers commonly use `workflow_run` to chain a deploy or release workflow after a CI
27
+ workflow passes. The intuitive reading of "completed" is "finished successfully," but GitHub's
28
+ semantics are "finished with any outcome":
29
+
30
+ ```yaml
31
+ # ❌ This runs deploy even when CI failed, was cancelled, or was skipped
32
+ on:
33
+ workflow_run:
34
+ workflows: [CI]
35
+ types: [completed]
36
+ ```
37
+
38
+ Without an explicit `if:` conclusion check, the deploy workflow runs unconditionally after CI —
39
+ including when CI failed mid-run, was cancelled by a force-push, or was skipped by a path
40
+ filter. This silently deploys broken code (or publishes, releases, or notifies) whenever CI
41
+ does not succeed.
42
+
43
+ **Why this is a silent failure:**
44
+ - No error appears — the triggered workflow runs normally and reports "success"
45
+ - The CI failure and deploy completion appear as separate workflow runs — easy to miss
46
+ - GitHub does not add a default conclusion filter — the developer must add it manually
47
+ - The mistake is natural: "trigger when CI is done" sounds equivalent to "trigger when CI passes"
48
+
49
+ Sources: GitHub Docs (workflow_run event, conclusion field); SO#62750603 (159 upvotes,
50
+ 152,034 views — top answer recommends adding conclusion check); `ahmadnassri/action-workflow-run-wait`
51
+ (39 stars — third-party action created specifically to work around this limitation)
52
+ fix: |
53
+ Add an `if:` condition at the JOB level (not workflow level) to check
54
+ `github.event.workflow_run.conclusion == 'success'`. Job-level conditions are evaluated at
55
+ runtime after the event fires, so this reliably gates deployment on CI success.
56
+
57
+ GitHub Actions has no workflow-level `if:` key, so the check cannot be applied once for the
58
+ whole workflow. Put it on each job that must be gated, and leave it off cleanup or notification jobs that should run regardless of conclusion.
59
+
60
+ For workflows that need to handle both success and failure (e.g., notify Slack of failures),
61
+ use separate jobs with different `if:` checks.
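+
+ Where a single job should always record the upstream result but deploy only on success, the
+ same check can sit on a step instead. A minimal sketch, assuming `./deploy.sh` is the same
+ placeholder script used elsewhere in this entry; the job and step names are illustrative:
+
+ ```yaml
+ jobs:
+   post-ci:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Record upstream conclusion
+         env:
+           CONCLUSION: ${{ github.event.workflow_run.conclusion }}
+         # Writes the conclusion to the run summary for every outcome
+         run: echo "Upstream CI concluded with $CONCLUSION" >> "$GITHUB_STEP_SUMMARY"
+       - name: Deploy
+         # Step-level gate: skipped unless the upstream run succeeded
+         if: github.event.workflow_run.conclusion == 'success'
+         run: ./deploy.sh
+ ```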
62
+ fix_code:
63
+ - language: yaml
64
+ label: 'Add conclusion check at the job level — most common fix'
65
+ code: |
66
+ name: Deploy
67
+
68
+ on:
69
+ workflow_run:
70
+ workflows: [CI]
71
+ types: [completed]
72
+
73
+ jobs:
74
+ deploy:
75
+ # ✅ Only deploy when CI succeeded
76
+ if: github.event.workflow_run.conclusion == 'success'
77
+ runs-on: ubuntu-latest
78
+ steps:
79
+ - uses: actions/checkout@v4
80
+ with:
81
+ # workflow_run runs check out the default branch unless told otherwise; pass head_sha to get the commit CI tested
82
+ ref: ${{ github.event.workflow_run.head_sha }}
83
+ - run: ./deploy.sh
84
+ - language: yaml
85
+ label: 'Branch by conclusion — deploy on success, notify on failure'
86
+ code: |
87
+ name: Post-CI Actions
88
+
89
+ on:
90
+ workflow_run:
91
+ workflows: [CI]
92
+ types: [completed]
93
+
94
+ jobs:
95
+ deploy:
96
+ if: github.event.workflow_run.conclusion == 'success'
97
+ runs-on: ubuntu-latest
98
+ steps:
99
+ - run: ./deploy.sh
100
+
101
+ notify-failure:
102
+ if: >-
103
+ github.event.workflow_run.conclusion == 'failure' ||
104
+ github.event.workflow_run.conclusion == 'timed_out'
105
+ runs-on: ubuntu-latest
106
+ steps:
107
+ - name: Notify team of CI failure
108
+ run: |
109
+ echo "CI failed on branch ${{ github.event.workflow_run.head_branch }}"
110
+ # Send Slack/Teams notification here
111
+ - language: yaml
112
+ label: 'Anti-pattern: missing conclusion check silently deploys broken code'
113
+ code: |
114
+ # ❌ BAD: Runs even when CI failed, was cancelled, or was skipped by path filter
115
+ on:
116
+ workflow_run:
117
+ workflows: [CI]
118
+ types: [completed] # "completed" means ANY outcome, not just success
119
+
120
+ jobs:
121
+ deploy:
122
+ # No if: conclusion check — deploys broken code on CI failure!
123
+ runs-on: ubuntu-latest
124
+ steps:
125
+ - run: ./deploy.sh
126
+ prevention:
127
+ - "Always add `if: github.event.workflow_run.conclusion == 'success'` to deploy/release jobs triggered by workflow_run"
128
+ - "Remember: 'completed' means the upstream workflow finished — NOT that it succeeded; always check conclusion explicitly"
129
+ - "Use separate jobs with different `if: conclusion ==` checks for success vs failure vs cancelled handling"
130
+ - "Consider adding a linting step or actionlint check that flags workflow_run jobs without a conclusion check"
131
+ - "Test the failure path by intentionally failing CI and verifying the downstream workflow does NOT deploy"
132
+ docs:
133
+ - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#workflow_run'
134
+ label: 'GitHub Docs: workflow_run event — conclusion field and types'
135
+ - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs#github-context'
136
+ label: 'GitHub Docs: github.event.workflow_run context properties'
137
+ - url: 'https://stackoverflow.com/questions/62750603/github-actions-trigger-another-action-after-one-action-is-completed'
138
+ label: 'SO #62750603: Trigger action after completion — top answer adds conclusion check (159 upvotes, 152k views)'
139
+ - url: 'https://github.com/ahmadnassri/action-workflow-run-wait'
140
+ label: 'ahmadnassri/action-workflow-run-wait — third-party action created to wait for successful workflow_run'
@@ -0,0 +1,142 @@
1
+ id: triggers-074
2
+ title: '`workflow_run: branches:` filter silently never triggers when upstream was started by `schedule` or `workflow_dispatch` — `head_branch` is null'
3
+ category: triggers
4
+ severity: silent-failure
5
+ tags:
6
+ - workflow-run
7
+ - branches-filter
8
+ - head-branch
9
+ - schedule
10
+ - workflow-dispatch
11
+ - null
12
+ - silent-failure
13
+ - trigger-missing
14
+ patterns:
15
+ - regex: 'on:\s*\n\s+workflow_run:\s*\n(.|\n)*?branches:\s*\n'
16
+ flags: 'im'
17
+ - regex: 'workflow_run.*branches.*schedule\s*$|schedule.*workflow_run.*branches'
18
+ flags: 'im'
19
+ error_messages:
20
+ - "# No error message — the downstream workflow simply never runs when upstream was triggered by schedule or workflow_dispatch"
21
+ root_cause: |
22
+ The `workflow_run` event exposes a `branches:` filter that is supposed to limit which upstream
23
+ workflow run conclusions trigger the downstream workflow. The filter compares against
24
+ `github.event.workflow_run.head_branch`.
25
+
26
+ **The problem:** When the upstream workflow is triggered by a `schedule` or `workflow_dispatch`
27
+ event, GitHub sets `head_branch` to `null` (not a branch name). The `branches:` filter then
28
+ evaluates `null` against each pattern in the list — and null never matches any branch name
29
+ pattern, including `[main]`, `['**']`, or any glob. The downstream workflow silently never
30
+ triggers, even though the upstream ran successfully on main.
31
+
32
+ **Event types that produce null head_branch:**
33
+ - `schedule` — cron-triggered workflows have no associated branch context
34
+ - `workflow_dispatch` — manually dispatched workflows MAY have null head_branch in some cases
35
+
36
+ **Pull request triggers correctly populate head_branch** (the PR head branch name), and
37
+ `push` triggers also correctly populate it (the pushed branch name). The null behavior is
38
+ specific to schedule and dispatch events on the upstream.
39
+
40
+ **Why this is a silent failure:**
41
+ - No error is raised — the downstream workflow simply does not appear in the Actions UI for that run
42
+ - The upstream workflow runs and completes successfully
43
+ - The `branches:` filter looks correct — `[main]` is the right branch name
44
+ - The issue only manifests for schedule/dispatch-triggered upstream runs; if CI also triggers
45
+ on push to main, push-triggered runs DO reach the downstream workflow
46
+
47
+ **Example scenario:** A deploy workflow runs after CI. CI has `on: [push, schedule]`.
48
+ The `branches: [main]` filter in the deploy's workflow_run block works for push events but
49
+ silently skips all scheduled CI runs.
50
+
51
+ Sources: GitHub Docs (workflow_run, head_branch null for schedule); community/discussions
52
+ related to workflow_run not triggering after schedule; documented in
53
+ `workflow-run-head-branch-null-schedule-dispatch-concurrency.yml` (concurrency variant)
54
+ fix: |
55
+ Remove the `branches:` filter from the `workflow_run` trigger entirely and instead use a
56
+ job-level `if:` condition to check the branch after the event fires. The `if:` condition
57
+ evaluates at runtime and can handle null gracefully.
58
+
59
+ For schedule-triggered upstreams, use `github.event.workflow_run.head_branch == 'main' ||
60
+ github.event.workflow_run.head_branch == null` (null means it ran from the default branch).
61
+
62
+ Alternatively, resolve the branch from `github.event.workflow_run.head_sha`, for example by checking whether that commit is the current tip of `main`.
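+
+ To confirm what the upstream run reported before choosing a condition, a diagnostic job can
+ print the relevant payload fields. This is a sketch for debugging only; the job name, step
+ name, and environment variable names are illustrative:
+
+ ```yaml
+ jobs:
+   inspect-upstream:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Log upstream run details
+         env:
+           UPSTREAM_EVENT: ${{ github.event.workflow_run.event }}
+           UPSTREAM_BRANCH: ${{ github.event.workflow_run.head_branch }}
+           UPSTREAM_SHA: ${{ github.event.workflow_run.head_sha }}
+         run: |
+           echo "event=$UPSTREAM_EVENT"
+           # A null head_branch arrives as an empty string; print a marker instead
+           echo "head_branch=${UPSTREAM_BRANCH:-<empty>}"
+           echo "head_sha=$UPSTREAM_SHA"
+ ```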
63
+ fix_code:
64
+ - language: yaml
65
+ label: 'Remove branches filter and add job-level if condition — handles schedule and dispatch'
66
+ code: |
67
+ name: Deploy After CI
68
+
69
+ on:
70
+ workflow_run:
71
+ workflows: [CI]
72
+ types: [completed]
73
+ # ❌ Do NOT use branches: here if CI runs on schedule or workflow_dispatch
74
+ # branches: [main] # This silently skips all schedule-triggered CI runs
75
+
76
+ jobs:
77
+ deploy:
78
+ # ✅ Use job-level if: instead of trigger-level branches:
79
+ # null head_branch means schedule/dispatch from default branch — include it
80
+ if: >-
81
+ github.event.workflow_run.conclusion == 'success' &&
82
+ (github.event.workflow_run.head_branch == 'main' ||
83
+ github.event.workflow_run.head_branch == null)
84
+ runs-on: ubuntu-latest
85
+ steps:
86
+ - uses: actions/checkout@v4
87
+ - run: ./deploy.sh
88
+ - language: yaml
89
+ label: 'When upstream is push-only: branches filter works fine (head_branch is always set)'
90
+ code: |
91
+ name: Deploy After CI (push-only upstream)
92
+
93
+ on:
94
+ workflow_run:
95
+ workflows: [CI]
96
+ types: [completed]
97
+ # ✅ branches: filter works correctly ONLY when upstream runs on push or pull_request
98
+ # Because push and pull_request events always set head_branch
99
+ branches: [main]
100
+
101
+ jobs:
102
+ deploy:
103
+ if: github.event.workflow_run.conclusion == 'success'
104
+ runs-on: ubuntu-latest
105
+ steps:
106
+ - uses: actions/checkout@v4
107
+ - run: ./deploy.sh
108
+
109
+ # ⚠️ If CI also has on: schedule — add null check above or remove branches: filter
110
+ - language: yaml
111
+ label: 'Anti-pattern: branches filter with schedule-triggered upstream — downstream never fires'
112
+ code: |
113
+ # ❌ BAD: Upstream CI has on: [push, schedule]
114
+ # But deploy's branches filter silently skips all scheduled CI runs
115
+
116
+ name: Deploy After CI
117
+
118
+ on:
119
+ workflow_run:
120
+ workflows: [CI]
121
+ types: [completed]
122
+ branches: [main] # null (from schedule) never matches 'main' — deploy skipped!
123
+
124
+ jobs:
125
+ deploy:
126
+ if: github.event.workflow_run.conclusion == 'success'
127
+ runs-on: ubuntu-latest
128
+ steps:
129
+ - run: ./deploy.sh # Never runs after scheduled CI
130
+ prevention:
131
+ - "Avoid `branches:` filter in workflow_run if the upstream workflow can be triggered by `schedule` or `workflow_dispatch`"
132
+ - "Use job-level `if:` conditions with explicit null handling instead of trigger-level branch filters"
133
+ - "Check the upstream workflow's triggers — if it includes `schedule` or `workflow_dispatch`, `head_branch` will be null for those runs"
134
+ - "Test by triggering the upstream workflow manually and verifying the downstream workflow appears in the Actions UI"
135
+ - "When null head_branch is acceptable (schedule from default branch), use `head_branch == 'main' || head_branch == null`"
136
+ docs:
137
+ - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#workflow_run'
138
+ label: 'GitHub Docs: workflow_run event — branches filter and head_branch field'
139
+ - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs#github-context'
140
+ label: 'GitHub Docs: github.event.workflow_run — head_branch property (null for schedule/dispatch)'
141
+ - url: 'https://github.com/orgs/community/discussions/26238'
142
+ label: 'GitHub Community: workflow_run not triggering — head_branch null discussion'
@@ -0,0 +1,141 @@
1
+ id: yaml-syntax-077
2
+ title: 'VS Code GitHub Actions extension / languageservices emits false "Context access might be invalid" warning for secrets.*, vars.*, needs.* in valid regular workflows'
3
+ category: yaml-syntax
4
+ severity: warning
5
+ tags:
6
+ - vscode
7
+ - languageservices
8
+ - false-positive
9
+ - secrets-context
10
+ - vars-context
11
+ - needs-context
12
+ - linting
13
+ - actionlint
14
+ patterns:
15
+ - regex: 'Context access might be invalid:\s+secrets\.'
16
+ flags: 'i'
17
+ - regex: 'Context access might be invalid:\s+vars\.'
18
+ flags: 'i'
19
+ - regex: 'Context access might be invalid:\s+needs\.'
20
+ flags: 'i'
21
+ - regex: 'Unrecognized named-value:\s+''secrets'''
22
+ flags: 'i'
23
+ error_messages:
24
+ - "Context access might be invalid: secrets.MY_SECRET"
25
+ - "Context access might be invalid: vars.MY_VAR"
26
+ - "Context access might be invalid: needs.build.outputs.artifact-path"
27
+ - "Unrecognized named-value: 'secrets'"
28
+ root_cause: |
29
+ The VS Code GitHub Actions extension (github.vscode-github-actions) uses the
30
+ `@actions/languageservices` package for workflow linting and IntelliSense. The
31
+ language server performs static analysis without access to the repository's actual
32
+ secrets, variables, or environment configuration. It emits "Context access might
33
+ be invalid" warnings in several scenarios where the workflow is in fact correct:
34
+
35
+ 1. **Repository-level secrets and variables**: The language server cannot query
36
+ the repository's secrets or variables store, so any `secrets.MY_SECRET` or
37
+ `vars.MY_VAR` reference triggers a warning. This affects every workflow that
38
+ uses custom secrets or variables — i.e., nearly all real-world workflows.
39
+
40
+ 2. **Dynamic environment secrets**: When a workflow uses `environment: ${{ ... }}`
41
+ with an expression to select the environment at runtime, the language server
42
+ cannot statically resolve which environment's secrets are accessible. Even
43
+ validly scoped `secrets.*` references under dynamic environments are flagged.
44
+
45
+ 3. **needs.* context across complex job dependency graphs**: The language server
46
+ sometimes flags `needs.X.outputs.Y` as invalid when job X is defined in the
47
+ same workflow but the graph is not trivially linear, or when `needs:` contains
48
+ a list of multiple dependencies.
49
+
50
+ **Why this is a false positive:**
51
+ The warnings are generated by the language server's static type system, which
52
+ requires compile-time knowledge of what secrets/variables exist. GitHub Actions
53
+ resolves these at runtime from the repository/organization secrets store — a
54
+ source the language server has no access to. The workflow runs correctly at
55
+ runtime; only the editor shows red squiggles.
56
+
57
+ This is an open known issue tracked in actions/languageservices#239 (assigned
58
+ to GitHub staff, reopened after partial fixes). Multiple VS Code extension issues
59
+ document the same root cause: vscode-github-actions#461, #452, #375, #533.
60
+
61
+ **Scope**: Affects anyone using the VS Code GitHub Actions extension
62
+ (github.vscode-github-actions) v0.x and v1.x with workflows using secrets,
63
+ variables, or multi-job dependency outputs. NOT an actionlint issue —
64
+ actionlint correctly handles these cases without false positives.
65
+ fix: |
66
+ **Option 1 — Suppress per-line (extension-specific comment):**
67
+ The language server respects `# ignore` comments to suppress specific warnings.
68
+ Add a comment on the line preceding the flagged expression:
69
+
70
+ ```yaml
71
+ env:
72
+ # ignore: context-access-might-be-invalid
73
+ MY_VAR: ${{ secrets.MY_SECRET }}
74
+ ```
75
+
76
+ Note: This syntax is not standardized and may change with extension updates.
77
+
78
+ **Option 2 — Disable specific diagnostics in VS Code settings:**
79
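+ In `.vscode/settings.json`, adjust the extension settings. Note that the key below controls pinned-workflow refresh; confirm in the extension's settings reference whether a toggle for this diagnostic class exists: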
80
+
81
+ ```json
82
+ {
83
+ "github-actions.workflows.pinned.refresh.enabled": true
84
+ }
85
+ ```
86
+
87
+ **Option 3 — Ignore and trust runtime behavior:**
88
+ This is a false positive — the workflow will run correctly. If the warning is
89
+ distracting, the safest approach is to acknowledge it as a known extension
90
+ limitation and verify by running the workflow. GitHub Actions runtime correctly
91
+ resolves secrets, vars, and needs contexts.
92
+
93
+ **Option 4 — Use actionlint for CI-level linting (no false positives for secrets):**
94
+ `actionlint` does not flag `secrets.*` references as invalid (it correctly
95
+ treats them as opaque strings). For CI integration, prefer actionlint over
96
+ the VS Code extension's linter for automated checks.
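+
+ A minimal sketch of a CI lint job, assuming the download script and the `executable` step
+ output described in actionlint's usage documentation. Verify both against the current docs
+ before adopting it; the workflow and job names are illustrative:
+
+ ```yaml
+ name: Lint workflows
+ on:
+   pull_request:
+     paths:
+       - '.github/workflows/**'
+ jobs:
+   actionlint:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - name: Download actionlint
+         id: get_actionlint
+         shell: bash
+         run: bash <(curl https://raw.githubusercontent.com/rhysd/actionlint/main/scripts/download-actionlint.bash)
+       - name: Check workflow files
+         shell: bash
+         run: ${{ steps.get_actionlint.outputs.executable }} -color
+ ```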
97
+ fix_code:
98
+ - language: yaml
99
+ label: 'Workflow using secrets and vars — runs correctly despite VS Code warnings'
100
+ code: |
101
+ # This workflow is valid — VS Code warns but GitHub Actions runs it correctly.
102
+ # The "Context access might be invalid" warnings for secrets.* and vars.*
103
+ # are false positives from the languageservices static analyzer.
104
+ name: Deploy
105
+ on:
106
+ push:
107
+ branches: [main]
108
+ jobs:
109
+ deploy:
110
+ runs-on: ubuntu-latest
111
+ environment: production
112
+ steps:
113
+ - name: Use secret (VS Code may warn here — ignore it)
114
+ run: echo "Deploying..."
115
+ env:
116
+ API_KEY: ${{ secrets.API_KEY }} # false positive warning
117
+ BASE_URL: ${{ vars.DEPLOY_BASE_URL }} # false positive warning
118
+ - language: json
119
+ label: 'VS Code settings.json — disable noisy extension diagnostics'
120
+ code: |
121
+ {
122
+ "github-actions.workflows.pinned.refresh.enabled": true,
123
+ "github-actions.org-secrets.enabled": false
124
+ }
125
+ prevention:
126
+ - "Do not remove or rename secrets or variables references based solely on VS Code warnings; verify by actually running the workflow"
127
+ - "Use actionlint for CI-integrated linting; it handles secrets/vars correctly without false positives"
128
+ - "Keep the VS Code GitHub Actions extension updated; GitHub staff are actively working on reducing false positives in languageservices"
129
+ - "Check the languageservices GitHub repo (actions/languageservices) for known false positive issues before filing new bugs"
130
+ - "When sharing workflow snippets, note that VS Code warnings do not necessarily indicate real problems — always confirm against actual run output"
131
+ docs:
132
+ - url: 'https://github.com/actions/languageservices/issues/239'
133
+ label: 'actions/languageservices#239: Context access might be invalid is too aggressive (open, assigned)'
134
+ - url: 'https://github.com/github/vscode-github-actions/issues/461'
135
+ label: 'vscode-github-actions#461: Unrecognized Github Actions Secret Context (false positive)'
136
+ - url: 'https://github.com/github/vscode-github-actions/issues/452'
137
+ label: 'vscode-github-actions#452: False warning for actions/checkout: Context access might be invalid'
138
+ - url: 'https://github.com/github/vscode-github-actions/issues/375'
139
+ label: 'vscode-github-actions#375: false positive error on secrets context access in forks'
140
+ - url: 'https://github.com/rhysd/actionlint'
141
+ label: 'actionlint: alternative linter that handles secrets/vars without false positives'
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@htekdev/actions-debugger",
3
- "version": "1.0.130",
3
+ "version": "1.0.132",
4
4
  "description": "65+ real GitHub Actions errors, queryable by agents. CLI + MCP server + Copilot skills + error database.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",