@htekdev/actions-debugger 1.0.116 → 1.0.118
- package/errors/caching-artifacts/cache-key-windows-path-separator-never-matches.yml +107 -0
- package/errors/caching-artifacts/caching-artifacts-069.yml +133 -0
- package/errors/concurrency-timing/rerun-failed-jobs-bypasses-concurrency-group.yml +89 -0
- package/errors/concurrency-timing/workflow-run-head-branch-null-schedule-dispatch-concurrency.yml +135 -0
- package/errors/known-unsolved/empty-matrix-fromjson-workflow-failure-no-conditional-skip.yml +108 -0
- package/errors/known-unsolved/node-action-post-step-wrong-inputs-nested-composite.yml +133 -0
- package/errors/known-unsolved/ubuntu-24-04-arm64-missing-binder-ashmem-kernel-modules.yml +149 -0
- package/errors/permissions-auth/permissions-auth-069.yml +161 -0
- package/errors/runner-environment/arc-autoscalinglistener-ephemeralrunnerset-stale-after-upgrade.yml +134 -0
- package/errors/runner-environment/broker-server-socket-exception-nat-timeout-linux.yml +114 -0
- package/errors/runner-environment/checkout-v603-hash-algorithm-api-rate-limiting.yml +100 -0
- package/errors/runner-environment/macos-self-hosted-listener-aad-ghost-busy-stall.yml +126 -0
- package/errors/runner-environment/runner-environment-210.yml +105 -0
- package/errors/runner-environment/runner-environment-213.yml +142 -0
- package/errors/runner-environment/setup-node-ebaddevengines-devengines-packagemanager.yml +103 -0
- package/errors/runner-environment/ubuntu-24-man-db-dpkg-trigger-apt-install-stall.yml +94 -0
- package/errors/runner-environment/ubuntu-26-04-missing-preinstalled-tools.yml +178 -0
- package/errors/runner-environment/upload-artifact-v6-proxy-headers-leak-strict-proxy-fail.yml +101 -0
- package/errors/silent-failures/silent-failures-108.yml +108 -0
- package/errors/triggers/pull-request-labeled-fires-all-labels-no-name-filter.yml +110 -0
- package/errors/yaml-syntax/duplicate-step-id-within-job-scope-validation-error.yml +130 -0
- package/package.json +1 -1
@@ -0,0 +1,133 @@
+id: known-unsolved-066
+title: 'Node Action Post Step Receives Wrong INPUT_* Env Vars When Called Through Nested Composite Action'
+category: known-unsolved
+severity: error
+tags:
+  - composite-actions
+  - post-step
+  - node-action
+  - nested
+  - inputs
+  - runner-bug
+  - savestate
+patterns:
+  - regex: 'Input required and not supplied:\s*\S+'
+    flags: 'i'
+  - regex: 'Post job cleanup\.\s*\n.*Input required'
+    flags: 'im'
+  - regex: 'post.*wrong.*input|INPUT_.*post.*ancestor'
+    flags: 'i'
+error_messages:
+  - 'Input required and not supplied: token'
+  - 'Input required and not supplied: test'
+  - 'Error: Input required and not supplied: <input-name>'
+  - 'Post job cleanup.'
+root_cause: |
+  Runner bug (actions/runner#3514, actions/runner#2030, open since 2022): when a Node.js
+  action that has a `post:` step is invoked through one or more composite action layers,
+  the runner restores the wrong `INPUT_*` environment variables for the post step execution.
+
+  At post-step execution time, the runner sets environment variables from the nearest
+  ancestor composite action's inputs, not from the inputs actually passed to the Node
+  action itself. For example:
+
+    Workflow → outer-composite (inputs: image-tag: "foo") → inner-composite
+      → node-action (inputs: image-tag: "foo")
+
+  The node action's main step correctly sees INPUT_IMAGE_TAG=foo.
+  In the post step, INPUT_IMAGE_TAG is absent or overwritten by the outer composite's
+  INPUT_IMAGE_TAG value (or another ancestor composite's value), causing:
+  - Required inputs to appear missing → visible error in post cleanup
+  - Optional inputs to resolve to wrong values → silent wrong behavior (e.g.,
+    devcontainers/ci pushes image with wrong tag; codeql-action uploads SARIF with
+    wrong token)
+
+  This affects ANY Node action with a post step that is called through composite
+  action nesting (depth ≥ 2). Affected actions include github/codeql-action
+  (fixed via workaround in codeql-action#2557) and pnpm/action-setup (issue #253).
+fix: |
+  GitHub has not fixed the runner bug. The canonical workaround is to persist inputs
+  in the action's main step using `core.saveState()` and read them back in the post
+  step using `core.getState()` instead of `core.getInput()`.
+
+  Actions consuming this workaround pattern:
+  - github/codeql-action (PR #2557): saveState for upload inputs
+  - Any action that reads inputs in its post step
+
+  If you maintain a Node action with a post step, add to your main.ts:
+    core.saveState('my-input', core.getInput('my-input'));
+  And in your post.ts:
+    const val = core.getState('my-input'); // use this instead of core.getInput
+
+  If you are a workflow author calling a third-party action through composite layers
+  and seeing wrong post-step behavior, check whether the action uses `core.getInput`
+  in its post step. If so, file an issue with the action maintainer referencing
+  actions/runner#3514 and the saveState workaround.
+fix_code:
+  - language: typescript
+    label: 'Action main.ts — persist inputs to state before post step runs'
+    code: |
+      import * as core from '@actions/core';
+
+      async function run() {
+        // Read inputs normally in main
+        const token = core.getInput('token', { required: true });
+        const imageName = core.getInput('image-name');
+
+        // Persist for post step (workaround for runner#3514)
+        core.saveState('token', token);
+        core.saveState('image-name', imageName);
+
+        // ... rest of main logic
+      }
+
+      run();
+  - language: typescript
+    label: 'Action post.ts — read from state, NOT core.getInput'
+    code: |
+      import * as core from '@actions/core';
+
+      async function runPost() {
+        // Use getState, NOT getInput — inputs are wrong in post step
+        // when called through nested composite action (runner#3514)
+        const token = core.getState('token');
+        const imageName = core.getState('image-name');
+
+        // ... post step logic using state values
+      }
+
+      runPost();
+  - language: yaml
+    label: 'action.yml — declare post step'
+    code: |
+      name: 'My Action'
+      inputs:
+        token:
+          required: true
+        image-name:
+          required: false
+      runs:
+        using: 'node20'
+        main: 'dist/main.js'
+        post: 'dist/post.js'
+        post-if: always()
+prevention:
+  - 'In any Node action with a post: step, always use core.saveState/core.getState for
+    inputs consumed in post, never core.getInput — this is defensive programming regardless
+    of nesting depth'
+  - 'Test your action in a composite action wrapper (at least 2 layers deep) to catch
+    this bug before publishing'
+  - 'Check release notes of actions/runner for fixes to this bug before assuming
+    the built-in behavior is fixed'
+  - 'When using actions with known post-step input bugs through composite layers,
+    consider calling them directly from the workflow (not nested in a composite) as
+    a temporary workaround'
+docs:
+  - url: 'https://github.com/actions/runner/issues/3514'
+    label: 'actions/runner#3514 — Wrong environment passed to node post when called by composite called by composite'
+  - url: 'https://github.com/actions/runner/issues/2030'
+    label: 'actions/runner#2030 — Composite: Nested actions post steps have the wrong context (open since 2022)'
+  - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#sending-values-to-the-pre-and-post-actions'
+    label: 'GitHub Docs — Sending values to pre and post actions (saveState/getState)'
+  - url: 'https://github.com/github/codeql-action/pull/2557'
+    label: 'codeql-action#2557 — Fix: persist inputs between upload action and post step (reference implementation)'
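The entry above recommends reading saved state instead of inputs in the post step. A minimal, dependency-free sketch of why that helps is shown here. It assumes the `INPUT_<NAME>` and `STATE_<name>` environment-variable convention that `@actions/core` reads from; `readInput`, `readState`, and the sample values are hypothetical stand-ins for illustration, not the toolkit's own code.

```typescript
// Sketch only: readInput/readState are hypothetical stand-ins that mimic the
// environment-variable lookups described above. They are not @actions/core.
type Env = Record<string, string | undefined>;

// Inputs arrive as INPUT_<NAME>: upper-cased, spaces turned into underscores.
function readInput(env: Env, name: string): string {
  const key = "INPUT_" + name.replace(/ /g, "_").toUpperCase();
  const value = env[key];
  return value === undefined ? "" : value.trim();
}

// State saved in the main step arrives in the post step as STATE_<name>.
function readState(env: Env, name: string): string {
  const value = env["STATE_" + name];
  return value === undefined ? "" : value;
}

// Simulated post-step environment under the bug (values are made up):
// the input carries an ancestor composite's value, while the state saved
// by the main step still carries the value the Node action really received.
const postEnv: Env = {
  INPUT_TOKEN: "ancestor-value",
  STATE_token: "value-passed-to-node-action",
};

const fromInput = readInput(postEnv, "token");
const fromState = readState(postEnv, "token");
console.log({ fromInput, fromState });
```

In this simulation only the state lookup returns the value the action was actually given, which is the behavior the saveState/getState workaround relies on.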
@@ -0,0 +1,149 @@
+id: known-unsolved-065
+title: 'ubuntu-24.04-arm64 Hosted Runner Kernel Missing binder_linux and ashmem_linux Modules — Android Container Tests Fail'
+category: known-unsolved
+severity: limitation
+tags:
+  - ubuntu-24.04-arm64
+  - arm64
+  - kernel-modules
+  - android
+  - binder
+  - ashmem
+  - container
+  - known-limitation
+patterns:
+  - regex: 'modprobe: FATAL: Module binder_linux not found|modprobe.*binder_linux.*not found'
+    flags: 'i'
+  - regex: 'modprobe: FATAL: Module ashmem_linux not found|modprobe.*ashmem_linux.*not found'
+    flags: 'i'
+  - regex: '/dev/binder: No such file or directory|/dev/ashmem: No such file or directory'
+    flags: 'i'
+  - regex: 'CONFIG_ANDROID_BINDER_IPC.*not set|binder_linux.*absent.*kernel'
+    flags: 'i'
+error_messages:
+  - 'modprobe: FATAL: Module binder_linux not found in directory /lib/modules/<kernel>'
+  - 'modprobe: FATAL: Module ashmem_linux not found in directory /lib/modules/<kernel>'
+  - '/dev/binder: No such file or directory'
+  - '/dev/ashmem: No such file or directory'
+  - 'modprobe binder_linux exited with code 1'
+root_cause: |
+  The ubuntu-24.04-arm64 GitHub Actions hosted runner kernel is compiled WITHOUT
+  `CONFIG_ANDROID_BINDER_IPC=m` and `CONFIG_ANDROID_BINDERFS=m`. This means the
+  `binder_linux` and `ashmem_linux` kernel modules do not exist in
+  `/lib/modules/$(uname -r)/` on ARM64 runners.
+
+  Attempting `modprobe binder_linux` fails (exits 1) on ARM64 because
+  the module is simply absent from the kernel tree — it cannot be loaded even
+  from a privileged container.
+
+  **Why x86_64 hosted runners DO have these modules:**
+  The x86_64 ubuntu-24.04 hosted runners expose `binder` and `ashmem` because
+  they support the [GitHub Actions KVM Android hardware acceleration feature](https://github.blog/changelog/2024-04-02-github-actions-hardware-accelerated-android-virtualization-now-available/)
+  (released April 2024). The x86_64 kernel was explicitly built with Android
+  IPC support to enable this feature. The ARM64 kernel image is compiled
+  separately and was not built with these modules.
+
+  **Affected use cases:**
+  - Running [ReDroid](https://github.com/remote-android/redroid-doc) (GPU-enabled
+    Android-in-Docker) on native ARM64 CI for ARM-native app testing
+  - Running [Waydroid](https://waydro.id/) in ARM64 CI containers
+  - Any workflow that requires `/dev/binder` or `/dev/ashmem` devices
+
+  **Confirmed on:**
+  - ubuntu-24.04-arm64 image 20260531.15.1 (kernel aarch64, Azure westus2)
+  - Source: actions/runner-images#14184 (feature request, June 2026, open)
+
+  **No current workaround exists** — the module cannot be compiled from source
+  against the runner's kernel headers without the kernel source tree, and the
+  azure-linux kernel for ARM64 is not shipped with headers for out-of-tree
+  `binder_linux` builds. A request to add the modules to the ubuntu-24.04-arm64
+  image is tracked at runner-images#14184 and is currently open with no ETA.
+fix: |
+  **No complete fix is currently available for native ARM64 hosted runners.**
+
+  **Option 1 (Recommended): Use ubuntu-24.04 x86_64 runners for Android tests**
+
+  The x86_64 ubuntu-24.04 runner has `binder_linux` and `ashmem_linux` and
+  supports Android container tests via KVM acceleration. ARM-native testing can
+  be approximated via the Android native bridge, at the cost of some overhead:
+
+  ```yaml
+  android-test:
+    runs-on: ubuntu-24.04 # x86_64 — has binder_linux/ashmem_linux
+    steps:
+      - uses: actions/checkout@v6
+      - run: |
+          sudo modprobe binder_linux # succeeds on x86_64
+          sudo modprobe ashmem_linux
+          # ... run ReDroid or Waydroid Android container tests
+  ```
+
+  **Option 2: Use real ARM hardware (Firebase Test Lab, self-hosted)**
+
+  For true ARM64 profiling (power/wakelock, native execution) use Firebase
+  Test Lab with physical Pixel hardware, or a self-hosted ARM64 runner on
+  hardware/VMs that expose binder devices.
+
+  **Option 3: Compile binder_linux from source (complex, unreliable)**
+
+  Without kernel headers matching the runner kernel, this is not practical
+  on GitHub-hosted runners.
+
+  **Track for a platform fix:** Follow actions/runner-images#14184 for progress
+  on adding `binder_linux`/`ashmem_linux` to the ubuntu-24.04-arm64 runner image.
+fix_code:
+  - language: yaml
+    label: 'Route Android container tests to x86_64 runner which has binder_linux/ashmem_linux'
+    code: |
+      jobs:
+        android-container-test:
+          # ubuntu-24.04-arm64 runner kernel is compiled without CONFIG_ANDROID_BINDER_IPC=m
+          # Use x86_64 runner which has binder_linux/ashmem_linux for KVM Android acceleration.
+          # Track runner-images#14184 for ARM64 kernel module support.
+          runs-on: ubuntu-24.04 # x86_64 — binder_linux available
+          container:
+            image: redroid/redroid:15.0.0-latest
+            options: --privileged
+          steps:
+            - uses: actions/checkout@v6
+            - name: Verify binder device
+              run: ls -la /dev/binder /dev/ashmem
+            - name: Run instrumented tests
+              run: ./gradlew connectedAndroidTest
+  - language: yaml
+    label: 'Guard ARM64 jobs against missing kernel modules'
+    code: |
+      jobs:
+        android-test:
+          runs-on: ${{ matrix.runner }}
+          strategy:
+            matrix:
+              runner: [ubuntu-24.04, ubuntu-24.04-arm64]
+          steps:
+            - uses: actions/checkout@v6
+            - name: Check binder_linux availability
+              id: binder-check
+              run: |
+                # modprobe needs root; without sudo it fails on every runner
+                if sudo modprobe binder_linux 2>/dev/null; then
+                  echo "available=true" >> "$GITHUB_OUTPUT"
+                else
+                  echo "available=false" >> "$GITHUB_OUTPUT"
+                  echo "::warning::binder_linux not available on $(uname -m) runner. Skipping Android container tests."
+                fi
+            - name: Run Android container tests
+              if: steps.binder-check.outputs.available == 'true'
+              run: ./run-android-tests.sh
+prevention:
+  - 'Do not assume binder_linux or ashmem_linux are available on ubuntu-24.04-arm64 hosted runners — the kernel was not built with CONFIG_ANDROID_BINDER_IPC=m.'
+  - 'Route Android container (ReDroid/Waydroid) CI tests to x86_64 ubuntu-24.04 runners until runner-images#14184 is resolved.'
+  - 'Always guard binder_linux/ashmem_linux modprobe calls with an availability check so ARM64 jobs fail gracefully rather than unexpectedly.'
+  - 'Track the ARM64 runner image feature request at actions/runner-images#14184 to know when the modules are added.'
+docs:
+  - url: 'https://github.com/actions/runner-images/issues/14184'
+    label: 'runner-images #14184 — Add binder_linux and ashmem_linux kernel modules to ubuntu-24.04-arm image (open Jun 2026)'
+  - url: 'https://github.com/remote-android/redroid-doc/issues/928'
+    label: 'redroid-doc #928 — GitHub Actions ubuntu-24.04-arm runners lack binder/ashmem kernel modules'
+  - url: 'https://github.blog/changelog/2024-04-02-github-actions-hardware-accelerated-android-virtualization-now-available/'
+    label: 'GitHub Changelog — Hardware-accelerated Android virtualization on x86_64 Actions runners'
+  - url: 'https://github.com/actions/runner-images/releases/tag/ubuntu24-arm64%2F20260531.15'
+    label: 'ubuntu-24.04-arm64 image 20260531.15.1 release notes'
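The `patterns` list in the entry above can be sanity-checked against its own `error_messages`. The sketch below copies three of the regexes from the entry and assumes a consumer that compiles each one with its `flags` and reports the first match; `matchEntry` is a hypothetical helper, and how the debugger really applies patterns is not shown in this diff.

```typescript
// Sketch only: matchEntry is a hypothetical helper, not the package's API.
interface Pattern {
  regex: string;
  flags: string;
}

// Regexes copied from the entry's patterns list.
const patterns: Pattern[] = [
  { regex: "modprobe: FATAL: Module binder_linux not found|modprobe.*binder_linux.*not found", flags: "i" },
  { regex: "modprobe: FATAL: Module ashmem_linux not found|modprobe.*ashmem_linux.*not found", flags: "i" },
  { regex: "/dev/binder: No such file or directory|/dev/ashmem: No such file or directory", flags: "i" },
];

// Return the index of the first pattern that matches the log text, or -1.
function matchEntry(log: string): number {
  for (let i = 0; i < patterns.length; i++) {
    if (new RegExp(patterns[i].regex, patterns[i].flags).test(log)) {
      return i;
    }
  }
  return -1;
}

// Sample lines: the first two come from error_messages, the third is unrelated.
const binderLog = "modprobe: FATAL: Module binder_linux not found in directory /lib/modules/<kernel>";
const deviceLog = "/dev/ashmem: No such file or directory";
const cleanLog = "Run ./gradlew connectedAndroidTest";

console.log(matchEntry(binderLog), matchEntry(deviceLog), matchEntry(cleanLog));
```

A check like this, run over every listed error message, would catch a regex that no longer matches the message it was written for.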
@@ -0,0 +1,161 @@
+id: permissions-auth-069
+title: 'OIDC trust policy silently fails for repos missing required custom property claim — repository_property:* absent when property unset'
+category: permissions-auth
+severity: error
+tags:
+  - oidc
+  - custom-properties
+  - trust-policy
+  - aws
+  - azure
+  - gcp
+  - repository-property
+  - april-2026
+patterns:
+  - regex: 'Not authorized to perform sts:AssumeRoleWithWebIdentity'
+    flags: 'i'
+  - regex: 'AccessDenied.*AssumeRoleWithWebIdentity|WebIdentityErr.*AccessDenied'
+    flags: 'i'
+  - regex: 'Couldn''t retrieve OIDC token.*403|OIDC token.*invalid.*claim'
+    flags: 'i'
+  - regex: 'Error: Credentials could not be loaded.*OIDC'
+    flags: 'i'
+error_messages:
+  - 'Error: Not authorized to perform sts:AssumeRoleWithWebIdentity'
+  - 'AccessDenied: User: arn:aws:sts::... is not authorized to perform: sts:AssumeRoleWithWebIdentity'
+  - 'Error: Credentials could not be loaded, please check your action inputs: Could not load credentials from any providers'
+  - 'google.auth.exceptions.DefaultCredentialsError: OIDC token condition not satisfied'
+root_cause: |
+  GitHub Actions OIDC tokens now include `repository_property:{name}` claims for each
+  custom property set on the repository (generally available from April 2026). This
+  feature lets organizations create finer-grained cloud trust policies — for example,
+  only allowing OIDC authentication for repos whose `deploy_tier` custom property is
+  set to `production`.
+
+  However, the claim is **absent from the OIDC token when the property is not set on
+  the repository**. Cloud providers (AWS IAM, Azure AD, Google Cloud) evaluate a
+  missing claim as a condition failure:
+
+  - **AWS IAM** `Condition: { StringEquals: { "...repository_property:deploy_tier": "production" } }`
+    → `AccessDenied` if the repo has no `deploy_tier` property set
+  - **GCP** workload identity attribute conditions on `attribute.repository_property_*`
+    → condition evaluates false, token exchange rejected
+
+  Common scenarios that trigger this:
+  1. **Org-wide trust policy** uses a custom property claim, but individual repos have
+     not been tagged with the required property.
+  2. **Property renamed or deleted** — the trust policy still references the old
+     property name; the token no longer includes the old claim.
+  3. **Fork PRs** — forked repositories do not inherit the parent org's custom
+     properties; OIDC tokens from fork CI lack the expected claims.
+  4. **New repo** — a repository was added to the org after the trust policy was
+     configured; the property has not yet been applied to it.
+
+  The error message (`Not authorized to perform sts:AssumeRoleWithWebIdentity`) is
+  identical to other OIDC failures (wrong `sub`, wrong `aud`, expired token) and gives
+  no indication that a missing custom property claim is the cause.
+fix: |
+  1. **Verify the claim is present in the token**: Use the GitHub OIDC debugger or
+     print the decoded token payload in a workflow step to confirm the
+     `repository_property:{name}` claim exists and has the expected value.
+
+  2. **Ensure the custom property is set on all target repos**: In the org settings,
+     verify that every repository expected to use the trust policy has the required
+     property configured. Newly added repos will not have it by default.
+
+  3. **Make the condition optional (if the property may not always be set)**:
+     In AWS IAM, use an `...IfExists` operator such as `StringEqualsIfExists`, which
+     passes when the claim is absent (a `StringLike` wildcard still fails on a missing
+     key), or remove the custom-property condition from the trust policy; use a
+     separate, more permissive role for repos without the property.
+
+  4. **For fork PRs**: custom properties on the upstream org do not flow to forks.
+     Avoid trust policies that require custom property claims in workflows triggered
+     by `pull_request` from external forks.
+fix_code:
+  - language: yaml
+    label: 'Debug step — print OIDC token claims to diagnose missing custom property'
+    code: |
+      jobs:
+        debug-oidc:
+          runs-on: ubuntu-latest
+          permissions:
+            id-token: write
+          steps:
+            - name: Fetch OIDC token and decode payload
+              run: |
+                TOKEN=$(curl -sH "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
+                  "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sts.amazonaws.com" | jq -r '.value')
+                # Decode payload (second segment of JWT, base64url-encoded)
+                echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' \
+                  | base64 -d 2>/dev/null | jq .
+                # Look for "repository_property:your_property_name" in the output.
+                # If the claim is missing, the repo does not have that property set.
+
+  - language: yaml
+    label: 'AWS IAM trust policy — correct use of repository_property claim'
+    code: |
+      # AWS IAM role trust policy (JSON, not YAML — shown here for reference)
+      # Only allow OIDC from repos where custom property "deploy_tier" = "production"
+      {
+        "Version": "2012-10-17",
+        "Statement": [{
+          "Effect": "Allow",
+          "Principal": { "Federated": "arn:aws:iam::ACCOUNT:oidc-provider/token.actions.githubusercontent.com" },
+          "Action": "sts:AssumeRoleWithWebIdentity",
+          "Condition": {
+            "StringEquals": {
+              "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
+              "token.actions.githubusercontent.com:repository_property:deploy_tier": "production"
+            }
+          }
+        }]
+      }
+      # IMPORTANT: Every repo that runs this workflow MUST have the "deploy_tier"
+      # custom property set to "production" in org settings. If the property is
+      # unset or absent, the token will not include the claim and the assume-role
+      # call will return AccessDenied.
+
+  - language: yaml
+    label: 'Workflow — ensure the custom property claim is available before assuming role'
+    code: |
+      jobs:
+        deploy:
+          runs-on: ubuntu-latest
+          permissions:
+            id-token: write
+            contents: read
+          steps:
+            - uses: actions/checkout@v4
+
+            # Verify the custom property claim is present before assuming the role
+            - name: Validate OIDC custom property claim
+              run: |
+                TOKEN=$(curl -sH "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
+                  "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=sts.amazonaws.com" | jq -r '.value')
+                # JWT payloads are unpadded base64url, so base64 -d can exit non-zero
+                # even after decoding; "|| true" keeps the default "bash -e" shell going.
+                PAYLOAD=$(echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null || true)
+                TIER=$(echo "$PAYLOAD" | jq -r '."repository_property:deploy_tier" // "MISSING"')
+                echo "deploy_tier claim: $TIER"
+                if [[ "$TIER" != "production" ]]; then
+                  echo "::error::Repository custom property 'deploy_tier' is not set to 'production'. Set it in org settings before running this workflow."
+                  exit 1
+                fi
+
+            - name: Configure AWS credentials via OIDC
+              uses: aws-actions/configure-aws-credentials@v4
+              with:
+                role-to-assume: arn:aws:iam::123456789012:role/my-production-deploy-role
+                aws-region: us-east-1
+
+prevention:
+  - 'Maintain a registry of which repositories have each custom property set — before applying a trust policy that requires a custom property claim, verify all target repos have the property configured.'
+  - 'When adding a new repo to an org that uses OIDC custom property trust policies, immediately apply the required custom properties before running any workflows that assume cloud roles.'
+  - 'Do not use repository custom property claims in OIDC trust policies for workflows triggered by external fork pull requests — forks do not inherit the upstream org''s custom properties.'
+  - 'Add a preflight validation step to workflows that assume cloud roles — verify the expected repository_property:* claim is present in the OIDC token before calling the cloud provider.'
+  - 'If a custom property is renamed or removed, update all OIDC trust policies before the change takes effect to avoid sudden AccessDenied failures.'
+docs:
+  - url: 'https://github.blog/changelog/2026-04-02-github-actions-early-april-2026-updates/#actions-oidc-tokens-now-support-repository-custom-properties'
+    label: 'GitHub Changelog: OIDC tokens now support repository custom properties (April 2026)'
+  - url: 'https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/about-security-hardening-with-openid-connect#customizing-the-token-claims'
+    label: 'GitHub Docs: Customizing OIDC token claims — repository custom properties'
+  - url: 'https://docs.github.com/en/organizations/managing-organization-settings/managing-custom-properties-for-repositories-in-your-organization'
+    label: 'GitHub Docs: Managing custom properties for repositories in your organization'
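The preflight check in the entry above decodes the token payload with shell tools. The same idea is sketched here in dependency-free TypeScript: decode the second JWT segment from base64url and look up a `repository_property:<name>` claim. `base64UrlDecode`, `base64UrlEncodeAscii`, `readPropertyClaim`, and the two demo tokens are hypothetical and unsigned. The sketch performs no signature verification and is meant for diagnosis only.

```typescript
// Sketch only: helper names and demo tokens are made up for illustration.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Decode base64url (padding optional) to UTF-8 text.
function base64UrlDecode(input: string): string {
  let bits = 0;     // bit accumulator
  let bitCount = 0; // number of valid bits currently held
  let encoded = ""; // bytes as %XX escapes, decoded as UTF-8 at the end
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "=") break;
    const value = ALPHABET.indexOf(ch);
    if (value < 0) throw new Error("invalid base64url character: " + ch);
    bits = (bits << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      encoded += "%" + ("0" + ((bits >> bitCount) & 0xff).toString(16)).slice(-2);
      bits &= (1 << bitCount) - 1; // keep only the bits not yet consumed
    }
  }
  return decodeURIComponent(encoded);
}

// Demo-only encoder for ASCII text, used to build fake tokens below.
function base64UrlEncodeAscii(text: string): string {
  let out = "";
  let bits = 0;
  let bitCount = 0;
  for (let i = 0; i < text.length; i++) {
    bits = (bits << 8) | text.charCodeAt(i);
    bitCount += 8;
    while (bitCount >= 6) {
      bitCount -= 6;
      out += ALPHABET.charAt((bits >> bitCount) & 0x3f);
    }
    bits &= (1 << bitCount) - 1;
  }
  if (bitCount > 0) out += ALPHABET.charAt((bits << (6 - bitCount)) & 0x3f);
  return out;
}

// Return the repository_property:<name> claim, or undefined when it is absent.
function readPropertyClaim(jwt: string, property: string): string | undefined {
  const segments = jwt.split(".");
  if (segments.length !== 3) throw new Error("expected a three-segment JWT");
  const payload = JSON.parse(base64UrlDecode(segments[1]));
  const value = payload["repository_property:" + property];
  return typeof value === "string" ? value : undefined;
}

// "e30" is the base64url form of "{}"; the signature segment is a placeholder.
const withClaim = "e30." + base64UrlEncodeAscii('{"repository_property:deploy_tier":"production"}') + ".sig";
const withoutClaim = "e30." + base64UrlEncodeAscii('{"repository":"octo-org/octo-repo"}') + ".sig";

console.log(readPropertyClaim(withClaim, "deploy_tier"));
console.log(readPropertyClaim(withoutClaim, "deploy_tier"));
```

Returning `undefined` for a missing claim keeps the "property not set" case distinct from "property set to the wrong value", which is the distinction the cloud provider's error message hides.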
package/errors/runner-environment/arc-autoscalinglistener-ephemeralrunnerset-stale-after-upgrade.yml
ADDED
@@ -0,0 +1,134 @@
+id: runner-environment-211
+title: 'ARC Controller Upgrade Leaves Stale AutoscalingListener and EphemeralRunnerSet — Manual Intervention Required'
+category: runner-environment
+severity: error
+tags:
+  - arc
+  - actions-runner-controller
+  - kubernetes
+  - upgrade
+  - autoscaling
+  - helm
+  - stale-controller
+patterns:
+  - regex: 'AutoscalingListener.*spec\.image.*old.*version|spec\.image.*ghcr\.io/actions.*stale'
+    flags: 'i'
+  - regex: 'app\.kubernetes\.io/version.*mismatch|helm\.sh/chart.*stale.*controller'
+    flags: 'i'
+  - regex: 'RunnerScaleSet.*stale.*image|EphemeralRunnerSet.*old.*version'
+    flags: 'i'
+error_messages:
+  - 'AutoscalingListener CRs retain stale controller image after controller upgrade'
+  - 'EphemeralRunnerSet retains stale version labels after controller upgrade'
+  - 'spec.image still points to old controller version after helm upgrade'
+root_cause: |
+  When upgrading the `gha-runner-scale-set-controller` Helm chart, two
+  controller-managed objects are NOT updated to reflect the new version:
+
+  1. **AutoscalingListener CRs** — retain the old controller image in `spec.image`
+  2. **EphemeralRunnerSet objects** — retain old version labels
+     (`app.kubernetes.io/version`, `helm.sh/chart`)
+
+  The root cause is that the controller gates reconciliation on a **spec hash**.
+  A controller-only upgrade does not change any `AutoscalingRunnerSet` spec, so
+  the hash is identical and the controller skips reconciliation of both objects
+  entirely. The `updateStrategy` flag only governs spec-change rollout, not
+  controller version upgrades.
+
+  **Object hierarchy affected:**
+  ```
+  AutoscalingRunnerSet (Helm-managed)
+  ├── AutoscalingListener ← stale image after controller upgrade
+  └── EphemeralRunnerSet ← stale labels/spec after controller upgrade
+      └── EphemeralRunner
+  ```
+
+  For minor version bumps (e.g. 0.14.1 → 0.14.2) the staleness may appear
+  cosmetic. For **major upgrades** where the `EphemeralRunnerSet` or
+  `EphemeralRunner` spec has breaking changes (new required fields, removed
+  fields, changed defaults), stale objects under a new controller can cause
+  runtime failures — jobs queued but never dispatched, runner pods using the old
+  image's entrypoint, or scale-set reporting incorrect capacity.
+
+  **Version confirmed affected:** controller 0.14.2 / scale-set 0.14.2,
+  Kubernetes RKE2 (reproducible on any Kubernetes distribution).
+fix: |
+  Two separate manual steps are required after every controller upgrade where
+  AutoscalingListener or EphemeralRunnerSet spec has changed:
+
+  **Step 1 — Delete all AutoscalingListener CRs** (the controller recreates them
+  immediately with the new image; runner pods are unaffected):
+
+  ```bash
+  kubectl delete autoscalinglisteners -A --all
+  ```
+
+  Verify the new version is used after recreation:
+  ```bash
+  kubectl get autoscalinglisteners -A \
+    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.image}{"\n"}{end}'
+  ```
+
+  **Step 2 — Trigger EphemeralRunnerSet reconciliation** via a dummy annotation
+  under `spec.template.metadata.annotations` (a change to `minRunners` alone is NOT
+  sufficient — it only affects the AutoscalingListener hash, not the
+  EphemeralRunnerSet hash):
+
+  ```yaml
+  spec:
+    template:
+      metadata:
+        annotations:
+          upgrade-trigger: "0.14.2" # bump on each controller upgrade
+  ```
+
+  This triggers a graceful transition: the old EphemeralRunnerSet drains
+  (in-progress jobs complete) while the new one starts accepting jobs
+  immediately. Remove the annotation in a follow-up commit.
+
+  **Verify EphemeralRunnerSet version labels are updated:**
+  ```bash
+  kubectl get ephemeralrunnersets -A \
+    -o custom-columns='NAME:.metadata.name,VERSION:.metadata.labels.app\.kubernetes\.io/version'
+  ```
+fix_code:
+  - language: yaml
+    label: 'Trigger EphemeralRunnerSet reconciliation via dummy annotation (per scale set)'
+    code: |
+      # In your AutoscalingRunnerSet HelmRelease or values.yaml:
+      # Add a dummy annotation under spec.template.metadata.annotations to force
+      # EphemeralRunnerSet reconciliation after a controller upgrade.
+      # Remove the annotation in a follow-up commit once migration is confirmed.
+      spec:
+        template:
+          metadata:
+            annotations:
+              upgrade-trigger: "0.14.2" # bump to new controller version
+  - language: yaml
+    label: 'Post-upgrade runbook as a one-off Job'
+    code: |
+      # After upgrading the controller chart, run this in CI or manually:
+      # Step 1: delete stale AutoscalingListener CRs (controller recreates immediately)
+      # kubectl delete autoscalinglisteners -A --all
+      #
+      # Step 2: patch each AutoscalingRunnerSet with a dummy annotation to force
+      # EphemeralRunnerSet reconciliation:
+      # kubectl annotate autoscalingrunnersets -A --all \
+      # upgrade-trigger=$(date +%s) --overwrite
+      #
+      # Note: kubectl annotate updates metadata.annotations, not spec.template.spec,
+      # so it does NOT trigger EphemeralRunnerSet reconciliation. Use the values
+      # approach above (spec.template.metadata.annotations) instead.
+prevention:
+  - 'After every ARC controller upgrade, check AutoscalingListener images and EphemeralRunnerSet labels before routing production traffic.'
+  - 'Add a post-upgrade step to your CI/CD pipeline that deletes AutoscalingListener CRs and adds a dummy upgrade-trigger annotation.'
+  - 'Pin a dummy annotation like `upgrade-trigger: "<version>"` in your HelmRelease values and bump it with each controller upgrade.'
+  - 'Subscribe to actions/actions-runner-controller releases and review EphemeralRunnerSet spec changes before upgrading.'
+  - 'Track the upstream issue at actions/actions-runner-controller#4513 for a platform-side fix.'
+docs:
+  - url: 'https://github.com/actions/actions-runner-controller/issues/4513'
+    label: 'ARC #4513 — AutoscalingListener and EphemeralRunnerSet retain stale controller image/labels after upgrade (open Jun 2026)'
+  - url: 'https://github.com/actions/actions-runner-controller/blob/master/TROUBLESHOOTING.md'
+    label: 'ARC Troubleshooting Guide'
+  - url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/about-actions-runner-controller'
+    label: 'About Actions Runner Controller'
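The entry above attributes the stale objects to reconciliation being gated on a spec hash. The toy model below illustrates that gating under loudly stated assumptions: `ScaleSetSpec`, `toyHash`, `templateHash`, and `needsNewRunnerSet` are hypothetical, the hash is a simple string hash rather than whatever ARC uses, and only the template is assumed to feed the EphemeralRunnerSet hash, as the entry describes.

```typescript
// Toy model only: names and the hash function are made up for illustration
// and do not reproduce ARC's implementation.
interface ScaleSetSpec {
  minRunners: number;
  template: { annotations: Record<string, string>; image: string };
}

// Small deterministic string hash (djb2 style), kept within 32 bits.
function toyHash(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) | 0;
  }
  return hash;
}

// Only the template feeds this hash; the controller version is not an input.
function templateHash(spec: ScaleSetSpec): number {
  return toyHash(JSON.stringify(spec.template));
}

// A new EphemeralRunnerSet is rolled out only when the template hash changes.
function needsNewRunnerSet(applied: ScaleSetSpec, desired: ScaleSetSpec): boolean {
  return templateHash(applied) !== templateHash(desired);
}

function makeSpec(minRunners: number, annotations: Record<string, string>): ScaleSetSpec {
  return { minRunners: minRunners, template: { annotations: annotations, image: "example-runner-image" } };
}

const applied = makeSpec(1, {});
const afterControllerUpgrade = makeSpec(1, {}); // spec untouched by the upgrade
const minRunnersChanged = makeSpec(3, {});      // not part of the template
const withTrigger = makeSpec(1, { "upgrade-trigger": "0.14.2" });

console.log(
  needsNewRunnerSet(applied, afterControllerUpgrade),
  needsNewRunnerSet(applied, minRunnersChanged),
  needsNewRunnerSet(applied, withTrigger)
);
```

In this model only the annotation change alters the hash, which matches the entry's advice to bump a template annotation rather than rely on the upgrade itself or on `minRunners`.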