@htekdev/actions-debugger 1.0.35 → 1.0.37
- package/errors/known-unsolved/checkout-eu-runner-timeout-regional-degradation.yml +97 -0
- package/errors/permissions-auth/checkout-v6-includif-symlinked-work-credential-failure.yml +93 -0
- package/errors/runner-environment/ephemeral-runner-not-found-after-register.yml +115 -0
- package/errors/runner-environment/multi-platform-docker-build-missing-qemu.yml +84 -0
- package/errors/runner-environment/worker-wedged-task-orchestration-job-not-found.yml +119 -0
- package/errors/silent-failures/github-sha-pr-merge-commit-status-invisible.yml +95 -0
- package/errors/triggers/pull-request-labeled-type-not-default.yml +98 -0
- package/errors/yaml-syntax/cache-v3-save-always-unexpected-input.yml +85 -0
- package/package.json +1 -1
|
@@ -0,0 +1,97 @@
|
|
|
1
|
+
id: known-unsolved-033
|
|
2
|
+
title: 'actions/checkout Hangs or Times Out From EU GitHub-Hosted Runners (Regional Degradation)'
|
|
3
|
+
category: known-unsolved
|
|
4
|
+
severity: error
|
|
5
|
+
tags:
|
|
6
|
+
- checkout
|
|
7
|
+
- performance
|
|
8
|
+
- eu-runners
|
|
9
|
+
- timeout
|
|
10
|
+
- regional
|
|
11
|
+
- infrastructure
|
|
12
|
+
patterns:
|
|
13
|
+
- regex: 'fatal: unable to access.*https://github\.com.*timed out'
|
|
14
|
+
flags: 'i'
|
|
15
|
+
- regex: 'fatal: unable to access.*https://github\.com.*Could not resolve host'
|
|
16
|
+
flags: 'i'
|
|
17
|
+
- regex: 'Error: Process completed with exit code 128'
|
|
18
|
+
flags: 'i'
|
|
19
|
+
error_messages:
|
|
20
|
+
- "fatal: unable to access 'https://github.com/owner/repo/': Operation timed out"
|
|
21
|
+
- "Error: Process completed with exit code 128."
|
|
22
|
+
- 'actions/checkout step hanging for 5-30 minutes with no output'
|
|
23
|
+
root_cause: |
|
|
24
|
+
Starting May 19, 2026, workflows running on GitHub-hosted runners in European
|
|
25
|
+
data centers began experiencing severely degraded actions/checkout performance.
|
|
26
|
+
The fetch/clone phase hangs silently for 5-30 minutes before either completing
|
|
27
|
+
slowly or timing out, regardless of repository size. Workflows that previously
|
|
28
|
+
completed checkout in 10-30 seconds are affected.
|
|
29
|
+
|
|
30
|
+
The root cause is an infrastructure-level degradation on GitHub's side
|
|
31
|
+
affecting the European runner subnet's connectivity to GitHub's Smart HTTP
|
|
32
|
+
server or CDN endpoints. This is distinct from general large-repo slowness:
|
|
33
|
+
even tiny repositories with shallow clones exhibit the hang. GitHub has not
|
|
34
|
+
published a root cause analysis or resolution timeline as of June 2026.
|
|
35
|
+
|
|
36
|
+
Notably, runners in US regions (us-east-1, us-west-2) are not affected —
|
|
37
|
+
the issue is specific to EU runner region routing. The error manifests as
|
|
38
|
+
either a silent hang (no log output during the fetch phase) or an eventual
|
|
39
|
+
"Operation timed out" exit code 128.
|
|
40
|
+
|
|
41
|
+
Source: actions/checkout#2441 (52 reactions, opened May 24, 2026, still open).
|
|
42
|
+
fix: |
|
|
43
|
+
No upstream fix available — this is a GitHub infrastructure issue with no
|
|
44
|
+
workaround that completely eliminates the problem. Mitigations to reduce
|
|
45
|
+
impact:
|
|
46
|
+
|
|
47
|
+
1. Add timeout-minutes to checkout steps to prevent indefinite hangs and
|
|
48
|
+
fail fast with a clear error rather than a silent stuck pipeline.
|
|
49
|
+
|
|
50
|
+
2. Use fetch-depth: 1 (shallow clone) to reduce transfer size, which may
|
|
51
|
+
reduce hang duration even if it does not eliminate it.
|
|
52
|
+
|
|
53
|
+
3. Use sparse-checkout to limit the files transferred from the CDN.
|
|
54
|
+
|
|
55
|
+
4. For critical pipelines, consider temporarily switching to ubuntu-latest
|
|
56
|
+
with an explicit us-east-1 runner label if your GitHub plan supports
|
|
57
|
+
regional runner selection.
|
|
58
|
+
|
|
59
|
+
5. Subscribe to GitHub Status (githubstatus.com) for EU infrastructure
|
|
60
|
+
degradation notices — incidents affecting this region are tracked there.
|
|
61
|
+
fix_code:
|
|
62
|
+
- language: yaml
|
|
63
|
+
label: 'Shallow clone with timeout to fail fast during EU degradation'
|
|
64
|
+
code: |
|
|
65
|
+
steps:
|
|
66
|
+
- name: Checkout
|
|
67
|
+
uses: actions/checkout@v4
|
|
68
|
+
timeout-minutes: 5 # fail fast instead of hanging for 30+ minutes
|
|
69
|
+
with:
|
|
70
|
+
fetch-depth: 1 # shallow clone reduces CDN transfer size
|
|
71
|
+
- language: yaml
|
|
72
|
+
label: 'Sparse checkout to minimize data fetched during regional degradation'
|
|
73
|
+
code: |
|
|
74
|
+
steps:
|
|
75
|
+
- name: Sparse checkout
|
|
76
|
+
uses: actions/checkout@v4
|
|
77
|
+
timeout-minutes: 5
|
|
78
|
+
with:
|
|
79
|
+
fetch-depth: 1
|
|
80
|
+
sparse-checkout: |
|
|
81
|
+
src/
|
|
82
|
+
tests/
|
|
83
|
+
package.json
|
|
84
|
+
go.mod
|
|
85
|
+
prevention:
|
|
86
|
+
- 'Always specify fetch-depth: 1 for workflows that do not require full commit history'
|
|
87
|
+
- 'Add timeout-minutes to every checkout step to prevent indefinite pipeline hangs'
|
|
88
|
+
- 'Monitor p99 checkout duration from EU runners as a CI health SLI'
|
|
89
|
+
- 'Subscribe to GitHub Status page (githubstatus.com) for EU infrastructure degradation notices'
|
|
90
|
+
- 'Use sparse-checkout in large monorepos to reduce CDN dependency during fetch'
|
|
91
|
+
docs:
|
|
92
|
+
- url: 'https://github.com/actions/checkout/issues/2441'
|
|
93
|
+
label: 'actions/checkout #2441: Checkouts extremely slow or timing out from EU (52 reactions, May 2026)'
|
|
94
|
+
- url: 'https://www.githubstatus.com/'
|
|
95
|
+
label: 'GitHub Status page for infrastructure incidents'
|
|
96
|
+
- url: 'https://github.com/actions/checkout'
|
|
97
|
+
label: 'actions/checkout README: fetch-depth and sparse-checkout inputs'
|
|
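As a rough illustration of how the `patterns` entries in the file above might be applied, the sketch below compiles each `regex`/`flags` pair with JavaScript's `RegExp` and tests it against a log line. The `matchPatterns` helper and the sample log line are assumptions made for illustration; how the package itself evaluates patterns is not shown in this diff.

```javascript
// Hypothetical helper: not part of the package, shown only to illustrate the pattern fields.
const patterns = [
  { regex: 'fatal: unable to access.*https://github\\.com.*timed out', flags: 'i' },
  { regex: 'fatal: unable to access.*https://github\\.com.*Could not resolve host', flags: 'i' },
  { regex: 'Error: Process completed with exit code 128', flags: 'i' },
];

// Returns the regex source of every entry that matches somewhere in the log text.
function matchPatterns(logText, entries) {
  return entries
    .filter((p) => new RegExp(p.regex, p.flags).test(logText))
    .map((p) => p.regex);
}

const sampleLog = "fatal: unable to access 'https://github.com/owner/repo/': Operation timed out";
console.log(matchPatterns(sampleLog, patterns));
```

Note that the third pattern matches any step that exits with code 128, so it is probably best read as supporting evidence alongside one of the first two patterns rather than as a signal on its own.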
@@ -0,0 +1,93 @@
|
|
|
1
|
+
id: permissions-auth-032
|
|
2
|
+
title: 'checkout@v6 Credential Injection Fails on Self-Hosted Runners With Symlinked _work Directory'
|
|
3
|
+
category: permissions-auth
|
|
4
|
+
severity: error
|
|
5
|
+
tags:
|
|
6
|
+
- checkout-v6
|
|
7
|
+
- self-hosted
|
|
8
|
+
- symlink
|
|
9
|
+
- credentials
|
|
10
|
+
- includif
|
|
11
|
+
- macos
|
|
12
|
+
patterns:
|
|
13
|
+
- regex: 'fatal: could not read Username for.*terminal prompts disabled'
|
|
14
|
+
flags: 'i'
|
|
15
|
+
- regex: 'includeIf.*gitdir.*_work'
|
|
16
|
+
flags: 'i'
|
|
17
|
+
- regex: 'fatal: repository.*not found'
|
|
18
|
+
flags: 'i'
|
|
19
|
+
error_messages:
|
|
20
|
+
- "fatal: could not read Username for 'https://github.com': terminal prompts disabled"
|
|
21
|
+
- 'Error: fatal: repository not found'
|
|
22
|
+
- 'Authentication failed'
|
|
23
|
+
root_cause: |
|
|
24
|
+
actions/checkout@v6 changed credential injection from writing directly into
|
|
25
|
+
the repository configuration file as http.https://github.com/.extraheader
|
|
26
|
+
(the v5 approach) to using includeIf "gitdir:..." directives that reference
|
|
27
|
+
a temporary credentials file stored in _work/_temp/.
|
|
28
|
+
|
|
29
|
+
v6 writes the includeIf path using the symlink path of the runner _work
|
|
30
|
+
directory. However, Git evaluates gitdir: conditions
|
|
31
|
+
against the resolved (real) absolute path — it follows symlinks when
|
|
32
|
+
determining the current repository's directory.
|
|
33
|
+
|
|
34
|
+
When the runner _work directory is a symlink to an external volume (a common
|
|
35
|
+
setup for macOS Apple Silicon runners using external SSD storage), the
|
|
36
|
+
includeIf path written by v6 uses the symlink path
|
|
37
|
+
(e.g., /Users/runner/actions-runner-N/_work/repo/.git) but the actual
|
|
38
|
+
resolved path is different
|
|
39
|
+
(e.g., /Volumes/External/actions-runner-N-work/repo/.git).
|
|
40
|
+
These never match, so the credentials config file is never loaded and the
|
|
41
|
+
fetch step fails with "terminal prompts disabled."
|
|
42
|
+
|
|
43
|
+
v5 is unaffected because it injects credentials directly into the repository
|
|
44
|
+
configuration file rather than using conditional includes.
|
|
45
|
+
Source: actions/checkout#2393 (open March 2026, macOS Apple Silicon).
|
|
46
|
+
fix: |
|
|
47
|
+
Option 1 (recommended): Pin to actions/checkout@v5 for workflows running on
|
|
48
|
+
self-hosted runners with symlinked _work directories. v5 injects credentials
|
|
49
|
+
directly and is not affected by this symlink resolution issue.
|
|
50
|
+
|
|
51
|
+
Option 2: Reconfigure the runner to use the real volume path directly.
|
|
52
|
+
Remove the symlink from _work and mount the external volume at the actual
|
|
53
|
+
runner work path location. This eliminates the symlink entirely.
|
|
54
|
+
|
|
55
|
+
Option 3: Use persist-credentials: false with a separate authentication
|
|
56
|
+
step that does not rely on the includeIf mechanism.
|
|
57
|
+
fix_code:
|
|
58
|
+
- language: yaml
|
|
59
|
+
label: 'Pin to v5 as workaround for symlinked _work runners (checkout#2393)'
|
|
60
|
+
code: |
|
|
61
|
+
steps:
|
|
62
|
+
- name: Checkout
|
|
63
|
+
# Pinned to v5 — v6 includeIf credential injection fails when runner
|
|
64
|
+
# _work directory is a symlink to an external volume (checkout#2393)
|
|
65
|
+
uses: actions/checkout@v5
|
|
66
|
+
with:
|
|
67
|
+
token: ${{ secrets.GITHUB_TOKEN }}
|
|
68
|
+
- language: yaml
|
|
69
|
+
label: 'Use persist-credentials false with explicit token for subsequent steps'
|
|
70
|
+
code: |
|
|
71
|
+
steps:
|
|
72
|
+
- name: Checkout without credential persistence
|
|
73
|
+
uses: actions/checkout@v6
|
|
74
|
+
with:
|
|
75
|
+
persist-credentials: false
|
|
76
|
+
|
|
77
|
+
- name: Subsequent steps using explicit token
|
|
78
|
+
env:
|
|
79
|
+
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
|
80
|
+
run: |
|
|
81
|
+
echo "Use GITHUB_TOKEN env var in subsequent authenticated operations"
|
|
82
|
+
prevention:
|
|
83
|
+
- 'Audit self-hosted runner _work paths for symlinks before upgrading from checkout@v5 to v6'
|
|
84
|
+
- 'Avoid symlinking the runner _work directory — use bind mounts or configure the real path'
|
|
85
|
+
- 'Test checkout behavior on self-hosted runners in a canary workflow before rolling out v6'
|
|
86
|
+
- 'Check the resolved path differs from the symlink path when debugging "terminal prompts disabled" errors'
|
|
87
|
+
docs:
|
|
88
|
+
- url: 'https://github.com/actions/checkout/issues/2393'
|
|
89
|
+
label: 'actions/checkout #2393: v6 includeIf credential matching fails on symlinked _work (open March 2026)'
|
|
90
|
+
- url: 'https://github.com/actions/checkout/issues/2313'
|
|
91
|
+
label: 'actions/checkout #2313: v6 breaks Docker actions using credential auth (related, closed)'
|
|
92
|
+
- url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/configuring-the-self-hosted-runner-application-as-a-service'
|
|
93
|
+
label: 'GitHub Docs: Configuring the self-hosted runner as a service'
|
|
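The prevention advice above about auditing `_work` paths for symlinks can be sketched as a small Node.js check. This is a minimal sketch under stated assumptions: `inspectWorkDir` is a hypothetical helper, and the demo builds a throwaway directory layout rather than touching a real runner installation.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: reports whether a work directory resolves to a different real path.
// `symlinked` is also true when any parent directory on the path is a symlink.
function inspectWorkDir(workDir) {
  const configured = path.resolve(workDir);
  const resolved = fs.realpathSync(configured);
  return { configured, resolved, symlinked: configured !== resolved };
}

// Self-contained demo: a fake "external volume" with a _work symlink pointing at it.
const base = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'runner-demo-')));
const external = path.join(base, 'external-volume');
const work = path.join(base, '_work');
fs.mkdirSync(external);
fs.symlinkSync(external, work, 'dir');

console.log(inspectWorkDir(work));     // symlinked: true
console.log(inspectWorkDir(external)); // symlinked: false
```

On a real runner host, the same function could be pointed at the runner's `_work` directory before upgrading from checkout@v5 to v6.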
@@ -0,0 +1,115 @@
|
|
|
1
|
+
id: runner-environment-090
|
|
2
|
+
title: 'Ephemeral Self-Hosted Runner Fails Immediately With "An error occurred: Runner not found"'
|
|
3
|
+
category: runner-environment
|
|
4
|
+
severity: error
|
|
5
|
+
tags:
|
|
6
|
+
- self-hosted
|
|
7
|
+
- ephemeral
|
|
8
|
+
- runner-not-found
|
|
9
|
+
- registration
|
|
10
|
+
- broker
|
|
11
|
+
- jit
|
|
12
|
+
patterns:
|
|
13
|
+
- regex: 'An error occurred: Runner not found'
|
|
14
|
+
flags: 'i'
|
|
15
|
+
- regex: 'RunnerNotFoundException'
|
|
16
|
+
flags: 'i'
|
|
17
|
+
error_messages:
|
|
18
|
+
- 'An error occurred: Runner not found'
|
|
19
|
+
- 'GitHub.Actions.RunService.WebApi.RunnerNotFoundException'
|
|
20
|
+
- 'Listening for Jobs'
|
|
21
|
+
root_cause: |
|
|
22
|
+
GitHub's broker endpoint returns RunnerNotFoundException immediately after
|
|
23
|
+
a successful registration and connection for ephemeral self-hosted runners
|
|
24
|
+
configured with replace mode. The runner completes registration ("Successfully
|
|
25
|
+
replaced the runner"), establishes connection ("Runner connection is good"),
|
|
26
|
+
starts listening for jobs, then receives a RunnerNotFoundException from the
|
|
27
|
+
broker HTTP client within seconds.
|
|
28
|
+
|
|
29
|
+
The error originates in BrokerHttpClient.cs where the broker API returns a
|
|
30
|
+
404/RunnerNotFoundException for the registered runner slot. This can occur
|
|
31
|
+
when the broker has stale slot state from the previous ephemeral runner
|
|
32
|
+
iteration that collides with the newly registered runner identity during the
|
|
33
|
+
brief window between registration and first poll.
|
|
34
|
+
|
|
35
|
+
The runner has no graceful retry handling for this condition — it exits with
|
|
36
|
+
status 1, causing systemd to restart it repeatedly, rapidly exhausting GitHub
|
|
37
|
+
App installation tokens through frequent re-registration cycles.
|
|
38
|
+
|
|
39
|
+
Affects all architectures (x86_64, aarch64, s390x) on various runner versions.
|
|
40
|
+
Spikes during periods of elevated load on GitHub broker infrastructure.
|
|
41
|
+
Source: actions/runner#3857 (116 reactions, open May 2025).
|
|
42
|
+
fix: |
|
|
43
|
+
1. Switch from replace-mode ephemeral runners to JIT (Just-In-Time) runner
|
|
44
|
+
tokens. JIT runners receive a pre-assigned job ID and avoid the broker
|
|
45
|
+
slot replacement race entirely.
|
|
46
|
+
2. Update the runner to the latest version (v2.334.0+) which improves retry
|
|
47
|
+
behavior around transient broker errors.
|
|
48
|
+
3. Add restart delay in the systemd service unit to prevent token exhaustion
|
|
49
|
+
on rapid restart loops:
|
|
50
|
+
RestartSec=30
|
|
51
|
+
StartLimitIntervalSec=300
|
|
52
|
+
StartLimitBurst=5
|
|
53
|
+
4. Monitor runner diagnostic logs in _diag/Runner_*.log for the
|
|
54
|
+
RunnerNotFoundException pattern to distinguish broker errors from
|
|
55
|
+
configuration issues.
|
|
56
|
+
fix_code:
|
|
57
|
+
- language: ini
|
|
58
|
+
label: 'Systemd service unit with restart throttle to prevent token exhaustion'
|
|
59
|
+
code: |
|
|
60
|
+
# /etc/systemd/system/actions-runner.service
|
|
61
|
+
[Unit]
|
|
62
|
+
Description=GitHub Actions Self-Hosted Runner
|
|
63
|
+
After=network-online.target
|
|
64
|
+
StartLimitIntervalSec=300
|
|
65
|
+
StartLimitBurst=5
|
|
66
|
+
|
|
67
|
+
[Service]
|
|
68
|
+
ExecStart=/home/runner/actions-runner/run.sh
|
|
69
|
+
Restart=on-failure
|
|
70
|
+
RestartSec=30
|
|
71
|
+
User=runner
|
|
72
|
+
|
|
73
|
+
[Install]
|
|
74
|
+
WantedBy=multi-user.target
|
|
75
|
+
- language: yaml
|
|
76
|
+
label: 'Workflow using JIT runner token to avoid broker slot collision'
|
|
77
|
+
code: |
|
|
78
|
+
jobs:
|
|
79
|
+
provision-runner:
|
|
80
|
+
runs-on: ubuntu-latest
|
|
81
|
+
outputs:
|
|
82
|
+
runner-token: ${{ steps.jit.outputs.encoded_jit_config }}
|
|
83
|
+
steps:
|
|
84
|
+
- name: Generate JIT runner token
|
|
85
|
+
id: jit
|
|
86
|
+
uses: actions/github-script@v7
|
|
87
|
+
with:
|
|
88
|
+
script: |
|
|
89
|
+
const { data } = await github.rest.actions.generateRunnerJitconfigForRepo({
|
|
90
|
+
owner: context.repo.owner,
|
|
91
|
+
repo: context.repo.repo,
|
|
92
|
+
name: 'ephemeral-jit-runner',
|
|
93
|
+
runner_group_id: 1,
|
|
94
|
+
labels: ['self-hosted', 'ephemeral', 'linux']
|
|
95
|
+
});
|
|
96
|
+
core.setOutput('encoded_jit_config', data.encoded_jit_config);
|
|
97
|
+
|
|
98
|
+
build:
|
|
99
|
+
needs: provision-runner
|
|
100
|
+
runs-on: [self-hosted, ephemeral, linux]
|
|
101
|
+
steps:
|
|
102
|
+
- uses: actions/checkout@v4
|
|
103
|
+
prevention:
|
|
104
|
+
- 'Use JIT runner tokens instead of replace-mode registration to eliminate broker slot race'
|
|
105
|
+
- 'Set systemd RestartSec to at least 30 seconds to avoid GitHub App token exhaustion'
|
|
106
|
+
- 'Monitor _diag/Runner_*.log for RunnerNotFoundException patterns and alert on restart frequency'
|
|
107
|
+
- 'Keep runner version current — broker compatibility fixes are regularly backported'
|
|
108
|
+
- 'Consider Kubernetes ARC ephemeral runners where pod lifecycle handles registration cleanly'
|
|
109
|
+
docs:
|
|
110
|
+
- url: 'https://github.com/actions/runner/issues/3857'
|
|
111
|
+
label: 'actions/runner #3857: An error occurred: Runner not found (116 reactions, open May 2025)'
|
|
112
|
+
- url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/autoscaling-with-self-hosted-runners#using-just-in-time-runners'
|
|
113
|
+
label: 'GitHub Docs: Just-in-time runners (JIT)'
|
|
114
|
+
- url: 'https://docs.github.com/en/rest/actions/self-hosted-runners#create-configuration-for-a-just-in-time-runner-for-a-repository'
|
|
115
|
+
label: 'GitHub REST API: Generate JIT runner config'
|
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
id: runner-environment-092
|
|
2
|
+
title: "Multi-platform Docker Build Fails — Missing QEMU Setup for Cross-Architecture"
|
|
3
|
+
category: runner-environment
|
|
4
|
+
severity: error
|
|
5
|
+
tags:
|
|
6
|
+
- docker
|
|
7
|
+
- multi-platform
|
|
8
|
+
- qemu
|
|
9
|
+
- buildx
|
|
10
|
+
- arm64
|
|
11
|
+
- cross-arch
|
|
12
|
+
patterns:
|
|
13
|
+
- regex: 'failed to solve: no match for platform in manifest'
|
|
14
|
+
flags: "i"
|
|
15
|
+
- regex: 'no match for platform in manifest: linux/(arm64|arm/v[67]|riscv64|ppc64le|s390x)'
|
|
16
|
+
flags: "i"
|
|
17
|
+
- regex: 'exec format error'
|
|
18
|
+
flags: "i"
|
|
19
|
+
error_messages:
|
|
20
|
+
- "ERROR: failed to solve: no match for platform in manifest: linux/arm64"
|
|
21
|
+
- "failed to solve: no match for platform in manifest: linux/arm64"
|
|
22
|
+
- "exec format error"
|
|
23
|
+
- "cannot execute binary file: Exec format error"
|
|
24
|
+
root_cause: |
|
|
25
|
+
GitHub-hosted runners (ubuntu-*, macos-*, windows-*) are single-architecture machines.
|
|
26
|
+
When docker/build-push-action is configured with platforms: linux/amd64,linux/arm64 (or
|
|
27
|
+
any other non-native architecture), Docker BuildKit needs QEMU user-mode emulation to
|
|
28
|
+
execute non-native binaries during the build. Without docker/setup-qemu-action registered
|
|
29
|
+
first, BuildKit cannot emulate the target architecture and immediately fails with "no match
|
|
30
|
+
for platform in manifest" or exec format errors when a RUN instruction in the Dockerfile
|
|
31
|
+
executes a binary compiled for the wrong architecture.
|
|
32
|
+
|
|
33
|
+
This is a common mistake when adding multi-arch support to an existing single-arch workflow.
|
|
34
|
+
The amd64 platform succeeds while arm64 fails, producing confusing partial-success output.
|
|
35
|
+
Native ARM64 runners (e.g. macos-14/15, Ubuntu ARM) can build their native arch without
|
|
36
|
+
QEMU but still need it for other non-native targets.
|
|
37
|
+
fix: |
|
|
38
|
+
Add docker/setup-qemu-action before docker/setup-buildx-action in your workflow steps.
|
|
39
|
+
QEMU must be installed before buildx initializes so BuildKit discovers the emulators at
|
|
40
|
+
setup time. Order matters — QEMU before buildx, buildx before build-push-action.
|
|
41
|
+
fix_code:
|
|
42
|
+
- language: yaml
|
|
43
|
+
label: "WRONG — buildx without QEMU (arm64 fails)"
|
|
44
|
+
code: |
|
|
45
|
+
- name: Set up Docker Buildx
|
|
46
|
+
uses: docker/setup-buildx-action@v3
|
|
47
|
+
|
|
48
|
+
- name: Build and push
|
|
49
|
+
uses: docker/build-push-action@v6
|
|
50
|
+
with:
|
|
51
|
+
platforms: linux/amd64,linux/arm64 # arm64 fails without QEMU
|
|
52
|
+
push: true
|
|
53
|
+
tags: myimage:latest
|
|
54
|
+
- language: yaml
|
|
55
|
+
label: "RIGHT — QEMU registered before buildx"
|
|
56
|
+
code: |
|
|
57
|
+
- name: Set up QEMU
|
|
58
|
+
uses: docker/setup-qemu-action@v3
|
|
59
|
+
|
|
60
|
+
- name: Set up Docker Buildx
|
|
61
|
+
uses: docker/setup-buildx-action@v3
|
|
62
|
+
|
|
63
|
+
- name: Build and push
|
|
64
|
+
uses: docker/build-push-action@v6
|
|
65
|
+
with:
|
|
66
|
+
platforms: linux/amd64,linux/arm64
|
|
67
|
+
push: true
|
|
68
|
+
tags: myimage:latest
|
|
69
|
+
cache-from: type=gha
|
|
70
|
+
cache-to: type=gha,mode=max
|
|
71
|
+
prevention:
|
|
72
|
+
- "Always add docker/setup-qemu-action before docker/setup-buildx-action when targeting non-native platforms."
|
|
73
|
+
- "Verify QEMU platform list covers your targets: linux/arm64, linux/arm/v7, linux/arm/v6, linux/riscv64, linux/ppc64le, linux/s390x."
|
|
74
|
+
- "Use a platform matrix to build and test each architecture independently for faster CI feedback loops."
|
|
75
|
+
- "Check runner architecture with 'uname -m' in a run step when debugging platform mismatches."
|
|
76
|
+
docs:
|
|
77
|
+
- url: "https://github.com/docker/setup-qemu-action"
|
|
78
|
+
label: "docker/setup-qemu-action — GitHub Action"
|
|
79
|
+
- url: "https://docs.docker.com/build/building/multi-platform/"
|
|
80
|
+
label: "Docker — Multi-platform builds"
|
|
81
|
+
- url: "https://github.com/docker/build-push-action/blob/master/docs/advanced/multi-platform.md"
|
|
82
|
+
label: "docker/build-push-action — Multi-platform builds guide"
|
|
83
|
+
- url: "https://stackoverflow.com/questions/69984898/github-action-multi-arch-docker-build-failed-to-solve-no-match-for-platform-in"
|
|
84
|
+
label: "Stack Overflow — Multi-arch Docker build: no match for platform in manifest (400+ votes)"
|
|
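The `uname -m` check suggested in the prevention list above can be turned into a quick pre-flight test that shows which requested platforms differ from the runner's native architecture. This is a simplified sketch: `platformsNeedingEmulation` and the architecture mapping are assumptions for illustration, and it only handles `linux/*` targets on the common x86_64 and ARM64 hosts.

```javascript
// Assumed mapping from `uname -m` output to Docker platform architecture names.
const UNAME_TO_DOCKER_ARCH = { x86_64: 'amd64', aarch64: 'arm64', arm64: 'arm64' };

// Hypothetical helper: lists the target platforms that differ from the runner's native one.
function platformsNeedingEmulation(unameMachine, platforms) {
  const arch = UNAME_TO_DOCKER_ARCH[unameMachine];
  if (!arch) throw new Error(`unmapped architecture: ${unameMachine}`);
  const native = `linux/${arch}`;
  return platforms
    .split(',')
    .map((p) => p.trim())
    .filter((p) => p !== native);
}

console.log(platformsNeedingEmulation('x86_64', 'linux/amd64,linux/arm64'));
console.log(platformsNeedingEmulation('aarch64', 'linux/amd64,linux/arm64'));
```

A non-empty result suggests that docker/setup-qemu-action should run before docker/setup-buildx-action for that job. The check is deliberately conservative: some hosts can run closely related architectures without emulation, which this sketch does not model.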
@@ -0,0 +1,119 @@
|
|
|
1
|
+
id: runner-environment-091
|
|
2
|
+
title: 'Self-Hosted Runner Worker Wedges Indefinitely After TaskOrchestrationJobNotFoundException'
|
|
3
|
+
category: runner-environment
|
|
4
|
+
severity: error
|
|
5
|
+
tags:
|
|
6
|
+
- self-hosted
|
|
7
|
+
- runner-worker
|
|
8
|
+
- wedged
|
|
9
|
+
- slot-starvation
|
|
10
|
+
- v2-runservice
|
|
11
|
+
- macos
|
|
12
|
+
- apple-silicon
|
|
13
|
+
patterns:
|
|
14
|
+
- regex: 'TaskOrchestrationJobNotFoundException.*workflow instance not found'
|
|
15
|
+
flags: 'i'
|
|
16
|
+
- regex: 'Job not found:.*workflow instance not found'
|
|
17
|
+
flags: 'i'
|
|
18
|
+
- regex: 'CompleteJobAsync.*TaskOrchestrationJobNotFoundException'
|
|
19
|
+
flags: 'i'
|
|
20
|
+
error_messages:
|
|
21
|
+
- 'GitHub.DistributedTask.WebApi.TaskOrchestrationJobNotFoundException: Job not found: <job-guid>. workflow instance not found'
|
|
22
|
+
- 'TaskOrchestrationJobNotFoundException: workflow instance not found'
|
|
23
|
+
- 'System.AggregateException: One or more errors occurred. (Job not found:'
|
|
24
|
+
root_cause: |
|
|
25
|
+
When a self-hosted runner Worker process calls CompleteJobAsync in the V2
|
|
26
|
+
RunService path (useV2Flow: true, RunServiceHttpClient.CompleteJobAsync)
|
|
27
|
+
and the GitHub orchestrator has discarded the job record (e.g., due to a
|
|
28
|
+
server-side timeout, infrastructure failover, or job cancellation during
|
|
29
|
+
finalization), the Worker receives TaskOrchestrationJobNotFoundException
|
|
30
|
+
with "workflow instance not found."
|
|
31
|
+
|
|
32
|
+
After exhausting configured retry attempts (default maxAttempts), the Worker
|
|
33
|
+
logs the exception and stops processing — but critically, it fails to call
|
|
34
|
+
Environment.Exit and the process remains alive at ~0.1% CPU with no active
|
|
35
|
+
work, no child processes, and no job cleanup activity.
|
|
36
|
+
|
|
37
|
+
The parent Runner.Listener treats the still-running Worker process as a busy
|
|
38
|
+
runner slot and refuses to spawn a new Worker. This causes runner slot
|
|
39
|
+
starvation: the affected runner stops accepting new jobs until the wedged
|
|
40
|
+
Worker is externally terminated (kill, reboot, or watchdog).
|
|
41
|
+
|
|
42
|
+
On one 3-host Apple Silicon runner pool (v2.334.0), this affected 32.9% of
|
|
43
|
+
Worker invocations (50 of 152) over three weeks, with one incident wedging
|
|
44
|
+
all three Workers simultaneously and blocking CI for 3+ hours.
|
|
45
|
+
Source: actions/runner#4418 (open May 2026).
|
|
46
|
+
fix: |
|
|
47
|
+
No upstream fix available — the Worker does not exit on non-retryable
|
|
48
|
+
CompleteJobAsync failures. Mitigations:
|
|
49
|
+
|
|
50
|
+
1. Deploy a watchdog that monitors _diag/Worker_*.log for the
|
|
51
|
+
TaskOrchestrationJobNotFoundException pattern and kills the wedged
|
|
52
|
+
Worker process by PID.
|
|
53
|
+
|
|
54
|
+
2. Use Kubernetes ARC ephemeral runners where the pod lifecycle replaces
|
|
55
|
+
the entire runner environment after each job — a wedged Worker is
|
|
56
|
+
automatically cleaned up when the pod is recycled.
|
|
57
|
+
|
|
58
|
+
3. Configure a hard systemd runtime limit (RuntimeMaxSec) that terminates
|
|
59
|
+
any runner process exceeding your longest expected job duration plus a
|
|
60
|
+
safety margin.
|
|
61
|
+
|
|
62
|
+
4. Add an external health-check cron that queries the GitHub API for runner
|
|
63
|
+
status and restarts the runner service if slots show "busy" longer than
|
|
64
|
+
expected.
|
|
65
|
+
fix_code:
|
|
66
|
+
- language: yaml
|
|
67
|
+
label: 'Kubernetes ARC ephemeral runner configuration (avoids wedged Worker state)'
|
|
68
|
+
code: |
|
|
69
|
+
# ARC RunnerDeployment — pods are recycled after each job
|
|
70
|
+
apiVersion: actions.summerwind.dev/v1alpha1
|
|
71
|
+
kind: RunnerDeployment
|
|
72
|
+
metadata:
|
|
73
|
+
name: ephemeral-runner-deployment
|
|
74
|
+
spec:
|
|
75
|
+
replicas: 3
|
|
76
|
+
template:
|
|
77
|
+
spec:
|
|
78
|
+
ephemeral: true # pod recycled after each job, no wedge possible
|
|
79
|
+
repository: owner/repo
|
|
80
|
+
labels:
|
|
81
|
+
- self-hosted
|
|
82
|
+
- ephemeral
|
|
83
|
+
- language: yaml
|
|
84
|
+
label: 'Scheduled watchdog workflow to detect stalled runner slots via API'
|
|
85
|
+
code: |
|
|
86
|
+
on:
|
|
87
|
+
schedule:
|
|
88
|
+
- cron: '*/15 * * * *' # every 15 minutes
|
|
89
|
+
|
|
90
|
+
jobs:
|
|
91
|
+
runner-health-check:
|
|
92
|
+
runs-on: ubuntu-latest
|
|
93
|
+
steps:
|
|
94
|
+
- name: Detect stalled self-hosted runners
|
|
95
|
+
uses: actions/github-script@v7
|
|
96
|
+
with:
|
|
97
|
+
script: |
|
|
98
|
+
const runners = await github.rest.actions.listSelfHostedRunnersForRepo({
|
|
99
|
+
owner: context.repo.owner,
|
|
100
|
+
repo: context.repo.repo
|
|
101
|
+
});
|
|
102
|
+
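// Note: a wedged Worker leaves the runner online and busy, so an offline filter alone will not catch it; compare r.busy across runs as well
const offline = runners.data.runners.filter(r => r.status === 'offline');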
|
|
103
|
+
if (offline.length > 0) {
|
|
104
|
+
core.warning('Offline/stalled runners: ' + offline.map(r => r.name).join(', '));
|
|
105
|
+
// Trigger your runner restart webhook here
|
|
106
|
+
}
|
|
107
|
+
prevention:
|
|
108
|
+
- 'Use ephemeral Kubernetes ARC runners — pod recycle eliminates wedged Worker slot starvation'
|
|
109
|
+
- 'Monitor _diag/Worker_*.log for TaskOrchestrationJobNotFoundException patterns'
|
|
110
|
+
- 'Set systemd RuntimeMaxSec to maximum expected job duration plus 30 minutes'
|
|
111
|
+
- 'Track runner slot busy duration — sudden sustained busy state with no job output indicates wedge'
|
|
112
|
+
- 'Deploy a watchdog process alongside the runner that monitors Worker PID lifetime'
|
|
113
|
+
docs:
|
|
114
|
+
- url: 'https://github.com/actions/runner/issues/4418'
|
|
115
|
+
label: 'actions/runner #4418: Worker wedges after TaskOrchestrationJobNotFoundException (open May 2026)'
|
|
116
|
+
- url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/autoscaling-with-self-hosted-runners'
|
|
117
|
+
label: 'GitHub Docs: Autoscaling with self-hosted runners'
|
|
118
|
+
- url: 'https://github.com/actions/actions-runner-controller'
|
|
119
|
+
label: 'actions/actions-runner-controller: Kubernetes ARC runner controller'
|
|
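Because a wedged Worker shows up as a runner that stays busy rather than one that goes offline, a health check probably needs to compare the `busy` flag across polls. The sketch below shows one way to do that. `updateBusyStreaks` is a hypothetical helper, the runner names are invented, and persisting the `next` state between scheduled runs (for example in a cache or artifact) is left as an assumption.

```javascript
// Hypothetical helper: tracks how long each runner has been continuously busy across polls.
// `runners` uses the name/status/busy fields from the list self-hosted runners API response.
// `previous` maps runner name -> timestamp (ms) at which the current busy streak was first seen.
function updateBusyStreaks(previous, runners, nowMs, maxBusyMs) {
  const next = {};
  const suspects = [];
  for (const r of runners) {
    if (r.status === 'online' && r.busy) {
      const since = previous[r.name] ?? nowMs; // keep the earlier timestamp if the streak continues
      next[r.name] = since;
      if (nowMs - since >= maxBusyMs) suspects.push(r.name);
    }
  }
  return { next, suspects };
}

const HOUR_MS = 60 * 60 * 1000;
const firstPoll = updateBusyStreaks({}, [{ name: 'mac-1', status: 'online', busy: true }], 0, 2 * HOUR_MS);
const secondPoll = updateBusyStreaks(
  firstPoll.next,
  [
    { name: 'mac-1', status: 'online', busy: true },
    { name: 'mac-2', status: 'online', busy: false },
  ],
  3 * HOUR_MS,
  2 * HOUR_MS
);
console.log(secondPoll.suspects);
```

A legitimately long job would be flagged too, so the threshold should sit above the longest expected job duration, in line with the RuntimeMaxSec guidance above.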
@@ -0,0 +1,95 @@
|
|
|
1
|
+
id: silent-failures-041
|
|
2
|
+
title: "`github.sha` Is the Merge Commit on Pull Request Events — Commit Status Invisible"
|
|
3
|
+
category: silent-failures
|
|
4
|
+
severity: silent-failure
|
|
5
|
+
tags:
|
|
6
|
+
- github.sha
|
|
7
|
+
- pull_request
|
|
8
|
+
- commit-status
|
|
9
|
+
- merge-commit
|
|
10
|
+
- sha
|
|
11
|
+
- statuses
|
|
12
|
+
patterns:
|
|
13
|
+
- regex: 'github\.sha'
|
|
14
|
+
flags: "i"
|
|
15
|
+
- regex: 'pull_request\.head\.sha'
|
|
16
|
+
flags: "i"
|
|
17
|
+
error_messages:
|
|
18
|
+
- "No error — commit status silently attached to ephemeral merge commit not visible in PR timeline"
|
|
19
|
+
- "No pending/passing status appears on the PR despite successful workflow run"
|
|
20
|
+
root_cause: |
|
|
21
|
+
On pull_request events (not pull_request_target, where github.sha is the last commit on the base branch), github.sha is set to the SHA of a temporary
|
|
22
|
+
merge commit GitHub creates to test mergeability (refs/pull/<n>/merge), NOT the actual head
|
|
23
|
+
commit of the feature branch (refs/pull/<n>/head). This merge commit is ephemeral and is
|
|
24
|
+
never directly visible in the PR timeline or commit history.
|
|
25
|
+
|
|
26
|
+
Developers who pass github.sha to the GitHub Commit Status API (POST
|
|
27
|
+
/repos/{owner}/{repo}/statuses/{sha}), build provenance attestation tools, cosign, or other
|
|
28
|
+
tooling expecting the PR head commit get a silent mismatch: the API call succeeds with HTTP
|
|
29
|
+
201, the status is written, but it targets a commit no reviewer can see. The PR status checks
|
|
30
|
+
panel remains empty or stale, required checks are never satisfied, and the PR cannot be merged
|
|
31
|
+
even though the workflow ran successfully.
|
|
32
|
+
|
|
33
|
+
This is distinct from the checkout/ref problem where the wrong code is built — here the build
|
|
34
|
+
is correct but status reporting is silently targeting the wrong commit.
|
|
35
|
+
fix: |
|
|
36
|
+
Use github.event.pull_request.head.sha instead of github.sha when targeting the PR head
|
|
37
|
+
commit for status APIs or attestation. For non-PR events (push, workflow_dispatch,
|
|
38
|
+
schedule), github.sha is correct. Use a combined expression for reusable workflows that
|
|
+  run on both event types.
+fix_code:
+  - language: yaml
+    label: "WRONG — status set on merge commit (never shows on PR)"
+    code: |
+      - name: Set commit status
+        uses: actions/github-script@v7
+        with:
+          script: |
+            await github.rest.repos.createCommitStatus({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              sha: context.sha, // merge commit SHA on pull_request events
+              state: 'success',
+              context: 'my-check',
+            });
+  - language: yaml
+    label: "RIGHT — use PR head SHA for pull_request events"
+    code: |
+      - name: Set commit status
+        uses: actions/github-script@v7
+        with:
+          script: |
+            const sha = context.eventName === 'pull_request'
+              ? context.payload.pull_request.head.sha // actual PR head commit
+              : context.sha;
+            await github.rest.repos.createCommitStatus({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              sha,
+              state: 'success',
+              context: 'my-check',
+            });
+  - language: yaml
+    label: "RIGHT — pass head SHA via env for run steps"
+    code: |
+      - name: Attest or sign artifact
+        env:
+          COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
+        run: |
+          cosign sign-blob --bundle bundle.json artifact.tar.gz
+          # $COMMIT_SHA is the visible PR head SHA on pull_request events,
+          # falls back to github.sha for push/schedule/workflow_dispatch
+prevention:
+  - "Never use github.sha directly for Commit Status API calls in pull_request workflows — always use github.event.pull_request.head.sha."
+  - "For workflows triggered by both push and pull_request, use the safe fallback expression: ${{ github.event.pull_request.head.sha || github.sha }}."
+  - "After adding required status checks, verify they actually appear in the PR timeline — a missing check often means the wrong SHA is being targeted."
+  - "Run 'echo ${{ github.sha }} ${{ github.event.pull_request.head.sha }}' in a debug step to confirm the SHAs differ on a PR event."
+docs:
+  - url: "https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs#github-context"
+    label: "GitHub context — github.sha documentation"
+  - url: "https://docs.github.com/en/rest/commits/statuses"
+    label: "GitHub REST API — Commit statuses"
+  - url: "https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#pull_request"
+    label: "Events that trigger workflows — pull_request event and merge commit"
+  - url: "https://stackoverflow.com/questions/71264370/why-is-github-sha-not-pointing-to-the-pr-head-commit-on-pull-request-events"
+    label: "Stack Overflow — Why is github.sha not the PR head commit on pull_request events?"
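The debug step suggested in the prevention list above can be sketched as a small standalone workflow. This is a minimal sketch and not part of the package: the workflow name `debug-sha`, the job id `show-sha`, and the environment variable names `MERGE_SHA` and `HEAD_SHA` are hypothetical. It assumes only the `github.sha` and `github.event.pull_request.head.sha` expressions already used in the rule.

```yaml
# Hypothetical debug workflow: prints both SHAs so they can be compared in the job log.
name: debug-sha
on:
  pull_request:
  push:

jobs:
  show-sha:
    runs-on: ubuntu-latest
    steps:
      - name: Compare github.sha with the PR head SHA
        env:
          # On pull_request events this is the temporary merge commit.
          MERGE_SHA: ${{ github.sha }}
          # Empty on events that carry no pull_request payload (for example push).
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          echo "github.sha:  $MERGE_SHA"
          echo "PR head SHA: ${HEAD_SHA:-<not a pull_request event>}"
```

Passing the values through `env` instead of interpolating them into the script keeps the shell text fixed, which is a design choice of this sketch rather than something the rule requires.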
@@ -0,0 +1,98 @@
+id: triggers-030
+title: "`pull_request` Default Activity Types Exclude `labeled` and `unlabeled` — Label-Gated Workflows Never Fire"
+category: triggers
+severity: silent-failure
+tags:
+  - pull_request
+  - labeled
+  - unlabeled
+  - activity-types
+  - types
+  - label-gate
+patterns:
+  - regex: 'on:\s*\n\s*pull_request:\s*\n(?!\s*types:)'
+    flags: "im"
+  - regex: 'types:\s*\[?\s*labeled'
+    flags: "i"
+error_messages:
+  - "No error — workflow simply never runs when a label is added to or removed from a PR"
+root_cause: |
+  The pull_request event triggers on three activity types by default: opened, synchronize,
+  and reopened. The labeled and unlabeled activity types — which fire when a label is added
+  or removed from a pull request — are NOT in the default set. Any workflow that relies on
+  label additions to trigger CI (deploy-preview gates, security scan exclusions, manual
+  override flags, environment promotion labels) will silently never run.
+
+  This affects common workflows such as:
+    - "preview" label triggers a staging deployment
+    - "approved-for-staging" label kicks off an integration test suite
+    - "skip-e2e" label skips expensive test steps
+    - "force-rebuild" label retriggers a build without a new commit
+
+  Because there is no workflow run at all (no skipped run, no failed run, no log entry),
+  the failure is completely invisible. Developers typically spend significant time debugging
+  the conditional logic inside the workflow before discovering the trigger itself never fired.
+  The workflow only runs if the PR is also opened, synchronized, or reopened coincidentally
+  with the label operation.
+fix: |
+  Explicitly declare all required activity types in the pull_request trigger using the
+  types: key. Include labeled (and unlabeled if removals also matter) alongside the
+  standard types if the workflow should run for both code changes and label changes.
+fix_code:
+  - language: yaml
+    label: "WRONG — pull_request without types never fires on label add"
+    code: |
+      on:
+        pull_request: # defaults: opened, synchronize, reopened only
+          branches: [main]
+
+      jobs:
+        preview:
+          if: contains(github.event.pull_request.labels.*.name, 'preview')
+          runs-on: ubuntu-latest
+          steps:
+            - run: echo "Deploying preview..." # never reached from label addition
+  - language: yaml
+    label: "RIGHT — explicitly include labeled in types"
+    code: |
+      on:
+        pull_request:
+          branches: [main]
+          types:
+            - opened
+            - synchronize
+            - reopened
+            - labeled # fires when a label is added
+            - unlabeled # fires when a label is removed
+
+      jobs:
+        preview:
+          if: contains(github.event.pull_request.labels.*.name, 'preview')
+          runs-on: ubuntu-latest
+          steps:
+            - run: echo "Deploying preview..."
+  - language: yaml
+    label: "RIGHT — label-only workflow (fires only on label events)"
+    code: |
+      on:
+        pull_request:
+          types: [labeled, unlabeled]
+
+      jobs:
+        handle-label:
+          if: github.event.label.name == 'deploy-preview'
+          runs-on: ubuntu-latest
+          steps:
+            - run: echo "Label '${{ github.event.label.name }}' added/removed"
+prevention:
+  - "Always explicitly declare types: when workflow logic depends on specific PR activity — labels, reviews, drafts, assignments, or milestones."
+  - "The default pull_request types are ONLY opened, synchronize, and reopened — all other activity types must be opt-in."
+  - "Test label-triggered workflows by adding the label manually after the workflow file is merged to the default branch."
+  - "Use github.event.action in run steps to distinguish between labeled and unlabeled events when both types are declared."
+docs:
+  - url: "https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#pull_request"
+    label: "Events that trigger workflows — pull_request activity types"
+  - url: "https://docs.github.com/en/webhooks/webhook-events-and-payloads#pull_request"
+    label: "Webhook events — pull_request payload (full activity type list)"
+  - url: "https://stackoverflow.com/questions/63699767/github-actions-on-pull-request-labeled-trigger-not-working"
+    label: "Stack Overflow — GitHub Actions on: pull_request labeled trigger not working"
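The last prevention item above, telling `labeled` apart from `unlabeled` through `github.event.action`, can be sketched as follows. This is an illustrative fragment and not part of the package: the step name and the environment variable names `ACTION` and `LABEL` are hypothetical, and the two `echo` lines stand in for whatever deploy or teardown commands a real workflow would run.

```yaml
on:
  pull_request:
    types: [labeled, unlabeled]

jobs:
  handle-label:
    # Only react to the label this workflow cares about.
    if: github.event.label.name == 'deploy-preview'
    runs-on: ubuntu-latest
    steps:
      - name: Branch on the activity type
        env:
          # "labeled" or "unlabeled", matching the declared activity types.
          ACTION: ${{ github.event.action }}
          # Passed through env so the label text is never spliced into the script.
          LABEL: ${{ github.event.label.name }}
        run: |
          if [ "$ACTION" = "labeled" ]; then
            echo "Label '$LABEL' added: start the preview deployment here"
          else
            echo "Label '$LABEL' removed: tear the preview down here"
          fi
```

Reading the label name from `env` instead of expanding `${{ github.event.label.name }}` inside the `run:` script avoids treating label text as shell code.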
@@ -0,0 +1,85 @@
+id: yaml-syntax-033
+title: "`save-always` Input Does Not Exist in `actions/cache` v3 — Unexpected Input Error"
+category: yaml-syntax
+severity: error
+tags:
+  - actions/cache
+  - save-always
+  - v3
+  - v4
+  - unexpected-input
+  - version-mismatch
+patterns:
+  - regex: "Unexpected input\\(s\\) 'save-always'"
+    flags: "i"
+  - regex: "Unexpected input.*save-always.*valid inputs are"
+    flags: "i"
+error_messages:
+  - "Unexpected input(s) 'save-always', valid inputs are ['path', 'key', 'restore-keys', 'upload-chunk-size', 'enableCrossOsArchive', 'fail-on-cache-miss', 'lookup-only']"
+  - "Warning: Unexpected input(s) 'save-always'"
+root_cause: |
+  The save-always input was introduced in actions/cache v4.0.0 to allow the internal
+  post-step cache save to run even when the job fails or is cancelled, bypassing the
+  hardcoded post-if: success() guard present in v3. In v3, saving on failure required
+  explicitly splitting the monolithic cache step into separate actions/cache/restore and
+  actions/cache/save steps with if: always().
+
+  Developers who copy examples from v4 documentation or Stack Overflow answers written
+  after December 2023 and apply them to a workflow still pinned to actions/cache@v3
+  receive an "Unexpected input(s) 'save-always'" warning or error. In v3 the input is
+  not defined, so the action never reads it and the post-step guard is unchanged. Either way,
+  the intent (save cache on failure) is never fulfilled.
+fix: |
+  Upgrade to actions/cache@v4 to use save-always: true directly. If upgrading
+  is not possible, replace the single cache step with explicit restore + save steps and
+  guard the save step with if: always().
+fix_code:
+  - language: yaml
+    label: "WRONG — save-always on cache@v3 (input does not exist)"
+    code: |
+      - uses: actions/cache@v3
+        with:
+          path: ~/.npm
+          key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
+          save-always: true # does not exist in v3 — unexpected input error
+  - language: yaml
+    label: "RIGHT — upgrade to cache@v4 to use save-always"
+    code: |
+      - uses: actions/cache@v4
+        with:
+          path: ~/.npm
+          key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
+          save-always: true # introduced in v4.0.0
+  - language: yaml
+    label: "RIGHT — v3 workaround with explicit restore and save steps"
+    code: |
+      - name: Restore cache
+        id: cache-restore
+        uses: actions/cache/restore@v3
+        with:
+          path: ~/.npm
+          key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Save cache
+        if: always()
+        uses: actions/cache/save@v3
+        with:
+          path: ~/.npm
+          key: ${{ steps.cache-restore.outputs.cache-primary-key }}
+prevention:
+  - "Pin all new workflows to actions/cache@v4; v3 does not accept save-always."
+  - "When copying cache examples from documentation, check the version badge — inputs differ significantly between v3 and v4."
+  - "Enable Dependabot or Renovate for action version updates to avoid accumulating version-mismatch debt."
+  - "If still on v3, use the split restore/save pattern with if: always() for any workflow that must cache on failure."
+docs:
+  - url: "https://github.com/actions/cache/releases/tag/v4.0.0"
+    label: "actions/cache v4.0.0 release notes — save-always introduced"
+  - url: "https://github.com/actions/cache/blob/main/tips-and-workarounds.md#saving-cache-even-if-the-build-fails"
+    label: "actions/cache — Saving cache even if the build fails"
+  - url: "https://github.com/actions/cache/blob/main/save/README.md"
+    label: "actions/cache/save — Explicit save action"
+  - url: "https://stackoverflow.com/questions/60491837/saving-cache-on-job-failure-in-github-actions"
+    label: "Stack Overflow — Saving cache on job failure in GitHub Actions (200+ votes)"
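The split restore/save workaround in the rule above is not specific to v3. A minimal sketch of the same pattern on the v4 tags follows. It is not part of the package: it assumes the `actions/cache/restore` and `actions/cache/save` sub-actions under `@v4` and their `cache-hit` and `cache-primary-key` outputs, and it adds a checkout step so that `hashFiles` has a lockfile to hash.

```yaml
steps:
  - uses: actions/checkout@v4

  - name: Restore cache
    id: cache-restore
    uses: actions/cache/restore@v4
    with:
      path: ~/.npm
      key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}

  - name: Install dependencies
    run: npm ci

  - name: Save cache
    # always() lets the save run after a failed step; the cache-hit check
    # skips the save when the exact key was already restored.
    if: always() && steps.cache-restore.outputs.cache-hit != 'true'
    uses: actions/cache/save@v4
    with:
      path: ~/.npm
      key: ${{ steps.cache-restore.outputs.cache-primary-key }}
```

Because this pattern does not depend on `save-always`, it should behave the same whichever cache release is pinned, which may make it the more portable option.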
package/package.json
CHANGED