@htekdev/actions-debugger 1.0.115 → 1.0.116

@@ -0,0 +1,129 @@
1
+ id: runner-environment-203
2
+ title: 'ARC EphemeralRunner Stuck in Running State After OOM Kill — Scale Set Blocked'
3
+ category: runner-environment
4
+ severity: error
5
+ tags:
6
+ - arc
7
+ - actions-runner-controller
8
+ - ephemeral-runner
9
+ - oomkill
10
+ - kubernetes
11
+ - scale-set
12
+ - session-conflict
13
+ - stuck
14
+ - self-hosted
15
+ patterns:
16
+ - regex: 'RunnerScaleSetSessionConflictException'
17
+ flags: 'i'
18
+ - regex: 'TaskAgentSessionConflictException'
19
+ flags: 'i'
20
+ - regex: 'A session for this runner already exists'
21
+ flags: 'i'
22
+ - regex: 'Runner connect error: Error: Conflict\. Retrying until reconnected'
23
+ flags: 'i'
24
+ error_messages:
25
+ - 'RunnerScaleSetSessionConflictException: there is already an active session'
26
+ - 'TaskAgentSessionConflictException: Error: Conflict'
27
+ - 'A session for this runner already exists.'
28
+ - '2026-XX-XX HH:MM:SSZ: Runner connect error: Error: Conflict. Retrying until reconnected.'
29
+ root_cause: |
30
+ When an ARC (Actions Runner Controller) ephemeral runner container exceeds its memory limit,
31
+ the kernel OOM killer sends it SIGKILL (reported as OOMKilled), so it dies without going
32
+ through the runner's graceful shutdown path. The runner never sends a "job completed" or
33
+ "runner offline" signal to the GitHub broker.
34
+
35
+ As a result:
36
+ 1. The EphemeralRunner custom resource stays in phase `Running` indefinitely.
37
+ 2. The ARC scale-set controller still counts the dead runner as "in use", so it does not
38
+ spin up a replacement pod to service the next queued job.
39
+ 3. New jobs remain stuck at "Waiting for a runner to pick up this job..." until the
40
+ stale EphemeralRunner CR is manually deleted.
41
+
42
+ When ARC tries to restart the runner (either via a controller health check or manually),
43
+ the new runner pod connects to the GitHub broker using the same JIT token/session, and
44
+ the broker responds HTTP 409 Conflict because the old session is still registered:
45
+
46
+ TaskAgentSessionConflictException: Error: Conflict
47
+ A session for this runner already exists.
48
+ Runner connect error: Error: Conflict. Retrying until reconnected.
49
+
50
+ The runner retries every 30 seconds. After the broker's session lease expires (~2-3 min
51
+ in most cases), the conflict resolves and the runner connects — but the session timeout
52
+ window varies and can leave the scale set blocked for longer periods.
53
+
54
+ Versions affected:
55
+ - Reproducible across ARC v0.9.x - v0.12.x; partial mitigation added in ARC v0.12.0
56
+ (stale EphemeralRunner detection), but OOM kills on active runners can still bypass it.
57
+ - Frequently triggered by Vitest --coverage, Jest with large test suites, or any
58
+ memory-intensive build tool running without memory limits in the runner container.
59
+ fix: |
60
+ Immediate recovery: delete the stuck EphemeralRunner CR to release the scale-set slot:
61
+
62
+ kubectl delete ephemeralrunner -n <namespace> <runner-name>
63
+
64
+ After deletion, ARC will spin up a new runner pod and pick up the queued job.
65
+
66
+ Root fix (prevent recurrence):
67
+ 1. Set memory limits on runner containers that match actual job requirements with headroom.
68
+ 2. Add workflow-level `timeout-minutes:` to ensure jobs terminate and release the runner
69
+ if they run too long.
70
+ 3. Upgrade ARC to v0.12.0+ for improved stale EphemeralRunner detection.
71
+ 4. Configure `terminationGracePeriodSeconds: 90` (or longer) on runner pods to give the
72
+ runner process time to deregister during evictions and pod deletions. This does not help with the OOM kill itself, which is an immediate SIGKILL.
73
+ fix_code:
74
+ - language: yaml
75
+ label: 'Set memory limits and timeout on runner pods to prevent OOM kills'
76
+ code: |
77
+ # In your HelmRelease / values.yaml for the gha-runner-scale-set chart
78
+ githubConfigUrl: "https://github.com/your-org/your-repo"
79
+ maxRunners: 4
80
+ minRunners: 0
81
+ template:
82
+ spec:
83
+ terminationGracePeriodSeconds: 90
84
+ containers:
85
+ - name: runner
86
+ image: ghcr.io/actions/actions-runner:latest
87
+ resources:
88
+ requests:
89
+ memory: "2Gi"
90
+ cpu: "500m"
91
+ limits:
92
+ memory: "4Gi" # Set appropriate limit; OOM kill occurs when exceeded
93
+ cpu: "2"
94
+ - language: yaml
95
+ label: 'Add workflow timeout to guarantee runner release even on hang'
96
+ code: |
97
+ jobs:
98
+ test:
99
+ runs-on: arc-runner-set
100
+ timeout-minutes: 30 # Runner is released after 30 min even if job hangs
101
+ steps:
102
+ - uses: actions/checkout@v4
103
+ - run: npm test -- --coverage
104
+ - language: bash
105
+ label: 'Manual recovery: delete stuck EphemeralRunner CR'
106
+ code: |
107
+ # List stuck EphemeralRunners
108
+ kubectl get ephemeralrunner -n arc-systems
109
+
110
+ # Delete the stuck one (ARC will create a new pod automatically)
111
+ kubectl delete ephemeralrunner -n arc-systems <stuck-runner-name>
112
+
113
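+ # Caution: this targets every Running runner in the namespace, including healthy ones mid-job; custom resources may also reject field selectors other than metadata.name and metadata.namespace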
114
+ kubectl delete ephemeralrunner -n arc-systems --field-selector='status.phase=Running'
115
+ prevention:
116
+ - 'Always set memory limits on ARC runner containers; without limits, a single job can consume all node memory and OOM-kill other runners.'
117
+ - 'Set timeout-minutes: at the job level for all ARC-backed workflows to guarantee the runner is eventually released.'
118
+ - 'Upgrade ARC to v0.12.0+ for automatic stale EphemeralRunner cleanup.'
119
+ - 'Monitor EphemeralRunner phase distribution; a growing count of Running CRs with no corresponding pods is a leading indicator of this issue.'
120
+ - 'Add terminationGracePeriodSeconds: 90+ to runner pod templates so the runner has time to deregister after a graceful shutdown signal.'
121
+ docs:
122
+ - url: 'https://github.com/actions/actions-runner-controller/issues/4155'
123
+ label: 'EphemeralRunner and its pods left stuck Running after runner OOMKILL (15 reactions)'
124
+ - url: 'https://github.com/actions/actions-runner-controller/issues/3922'
125
+ label: 'Scaleset controllers stuck with RunnerScaleSetSessionConflictException (12 reactions)'
126
+ - url: 'https://github.com/actions/runner/issues/4312'
127
+ label: 'Self-hosted runner gets stuck in active state, blocking queued jobs across multiple repositories'
128
+ - url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller'
129
+ label: 'Managing self-hosted runners with Actions Runner Controller'
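The prevention advice for runner-environment-203 (watch for EphemeralRunner CRs in phase Running with no corresponding pod) could be sketched as a small check over `kubectl ... -o json` output. This is only an illustration: the function name is hypothetical, and it assumes each runner pod carries the same name as its EphemeralRunner, which should be verified against the cluster before acting on the result.

```python
from typing import List


def stale_running_runners(runners: dict, pods: dict) -> List[str]:
    """Return names of EphemeralRunners in phase Running that have no live pod.

    `runners` and `pods` are the parsed JSON documents from
    `kubectl get ephemeralrunner -n <namespace> -o json` and
    `kubectl get pods -n <namespace> -o json`.
    """
    # Assumption: the runner pod is named after its EphemeralRunner.
    live_pods = {
        pod["metadata"]["name"]
        for pod in pods.get("items", [])
        if pod.get("status", {}).get("phase") in ("Pending", "Running")
    }
    stale = []
    for runner in runners.get("items", []):
        name = runner["metadata"]["name"]
        phase = runner.get("status", {}).get("phase")
        if phase == "Running" and name not in live_pods:
            stale.append(name)
    return sorted(stale)
```

The returned names are candidates for `kubectl delete ephemeralrunner`, to be reviewed by a person first; a runner that still has a live pod is never listed.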
@@ -0,0 +1,113 @@
1
+ id: runner-environment-201
2
+ title: 'macOS 26 Homebrew python@3 ships Python 3.13 — removed stdlib modules (cgi, imghdr, aifc, telnetlib) cause ModuleNotFoundError'
3
+ category: runner-environment
4
+ severity: error
5
+ tags:
6
+ - macos
7
+ - macos-26
8
+ - python
9
+ - homebrew
10
+ - stdlib
11
+ - runner-image
12
+ - breaking-change
13
+ patterns:
14
+ - regex: 'ModuleNotFoundError: No module named ''(cgi|cgitb|imghdr|aifc|audioop|chunk|nis|nntplib|telnetlib|uu|xdrlib|sndhdr|sunau|mailcap|msilib|pipes|crypt|spwd|ossaudiodev)'''
15
+ flags: 'i'
16
+ - regex: 'ImportError.*No module named.*cgi|No module named.*imghdr|No module named.*telnetlib'
17
+ flags: 'i'
18
+ - regex: 'python.*3\.13.*deprecated.*module|removed.*python.*3\.13'
19
+ flags: 'i'
20
+ error_messages:
21
+ - 'ModuleNotFoundError: No module named ''cgi'''
22
+ - 'ModuleNotFoundError: No module named ''imghdr'''
23
+ - 'ModuleNotFoundError: No module named ''aifc'''
24
+ - 'ModuleNotFoundError: No module named ''telnetlib'''
25
+ - 'ModuleNotFoundError: No module named ''chunk'''
26
+ - 'ModuleNotFoundError: No module named ''nntplib'''
27
+ root_cause: |
28
+ macOS 26 runner images ship Homebrew python@3 pointing to Python 3.13.x. Python
29
+ 3.13 removed the following stdlib modules that were deprecated since Python 3.11:
30
+
31
+ aifc, audioop, cgi, cgitb, chunk, crypt, imghdr, mailcap, msilib (Windows only), nis,
32
+ nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu, xdrlib
33
+
34
+ Workflows that call bare python3 (resolved to Homebrew Python 3.13 on macOS 26)
35
+ and import any of these modules fail with ModuleNotFoundError at runtime.
36
+
37
+ This affects:
38
+ - Scripts using cgi or cgitb for HTTP form parsing
39
+ - Image-type detection using imghdr (commonly used with Pillow-based workflows)
40
+ - Legacy NNTP or Telnet clients using nntplib or telnetlib
41
+ - Audio file handling using aifc, sunau, or sndhdr
42
+
43
+ Workflows that previously ran on macos-14 or macos-15 (Homebrew Python 3.11/3.12)
44
+ are affected when the job label is macos-26 or when macos-latest migrates to
45
+ macOS 26. The failure is not immediately obvious because the error occurs at
46
+ import time inside Python, not at the runner level, and the runner step exits
47
+ with a non-zero code that may be mistaken for a test failure rather than an
48
+ environment regression.
49
+
50
+ Note: actions/setup-python@v5+ with an explicit python-version of 3.12 or earlier is
51
+ unaffected. The regression hits scripts that rely on the Homebrew python3 binary; pinning 3.13 explicitly fails the same way.
52
+ fix: |
53
+ Option 1 (recommended) — Pin Python with actions/setup-python:
54
+ Always use actions/setup-python with an explicit version to get the exact
55
+ Python version your code requires. This bypasses the Homebrew python@3 symlink.
56
+
57
+ Option 2 — Replace removed modules with modern equivalents:
58
+ - cgi → urllib.parse + email.parser (or the 3rd-party 'legacy-cgi' fork on PyPI)
59
+ - imghdr → install the 3rd-party 'standard-imghdr' backport from PyPI,
60
+ or use python-magic / filetype for image detection
61
+ - telnetlib → use telnetlib3 (PyPI) or asyncio-based Telnet
62
+ - aifc/sunau → use soundfile or wave for audio I/O
63
+
64
+ Option 3 — Pin Homebrew Python to 3.12 on macos-26 (temporary):
65
+ brew install python@3.12
66
+ brew link python@3.12 --force
67
+ echo "$(brew --prefix python@3.12)/libexec/bin" >> "$GITHUB_PATH"
68
+ fix_code:
69
+ - language: yaml
70
+ label: 'Fix: pin Python version with actions/setup-python to avoid Homebrew python@3'
71
+ code: |
72
+ - uses: actions/setup-python@v5
73
+ with:
74
+ python-version: '3.12' # pins to 3.12; immune to Homebrew python@3 upgrade
75
+
76
+ - name: Install dependencies
77
+ run: pip install -r requirements.txt
78
+
79
+ - name: Run script
80
+ run: python script.py # uses setup-python's 3.12, not Homebrew 3.13
81
+ - language: yaml
82
+ label: 'Fix: install removed modules from PyPI backports'
83
+ code: |
84
+ - uses: actions/setup-python@v5
85
+ with:
86
+ python-version: '3.13'
87
+ - name: Install backported removed modules
88
+ run: |
89
+ pip install standard-imghdr # PyPI backport of imghdr for Python 3.13+
90
+ # pip install telnetlib3 # if using Telnet
91
+ - name: Run script
92
+ run: python script.py
93
+ - language: yaml
94
+ label: 'Temporary: install and use python@3.12 from Homebrew on macos-26'
95
+ code: |
96
+ - name: Pin Homebrew Python to 3.12
97
+ run: |
98
+ brew install python@3.12
99
+ echo "$(brew --prefix python@3.12)/libexec/bin" >> "$GITHUB_PATH"
100
+ - name: Verify Python version
101
+ run: python3 --version # should print Python 3.12.x
102
+ prevention:
103
+ - 'Always use actions/setup-python with an explicit version — never rely on bare python3 pointing to Homebrew python@3'
104
+ - 'Audit scripts for imports of modules removed in Python 3.13: cgi, imghdr, aifc, telnetlib, nntplib, chunk, uu'
105
+ - 'Run scripts under Python 3.11 or 3.12 with -W error::DeprecationWarning before the macos-26 migration; the PEP 594 modules warn on import in those versions'
106
+ - 'Add python --version to diagnostic steps to catch unexpected Python version changes early'
107
+ docs:
108
+ - url: 'https://docs.python.org/3/whatsnew/3.13.html#removed-modules'
109
+ label: 'Python 3.13: Removed modules (official docs)'
110
+ - url: 'https://peps.python.org/pep-0594/'
111
+ label: 'PEP 594: Removing dead batteries from the standard library'
112
+ - url: 'https://github.com/actions/setup-python'
113
+ label: 'actions/setup-python: Pin a specific Python version'
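The audit step suggested for runner-environment-201 (finding imports of modules removed in Python 3.13) can be approximated with the standard-library `ast` module. The sketch below is a minimal illustration, not part of the package: the function name is hypothetical, the module set is the PEP 594 list, and it only sees static `import` statements, so dynamic imports via `importlib` are missed.

```python
import ast
from typing import List, Tuple

# Modules removed from the standard library in Python 3.13 (PEP 594).
REMOVED_IN_313 = frozenset({
    "aifc", "audioop", "cgi", "cgitb", "chunk", "crypt", "imghdr",
    "mailcap", "msilib", "nis", "nntplib", "ossaudiodev", "pipes",
    "sndhdr", "spwd", "sunau", "telnetlib", "uu", "xdrlib",
})


def removed_imports(source: str) -> List[Tuple[int, str]]:
    """Return (line number, module) pairs for imports of removed modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in REMOVED_IN_313:
                hits.append((node.lineno, top_level))
    return sorted(hits)
```

Running it over each script that a workflow invokes with bare `python3` gives a list of lines to migrate before switching the job label to macos-26.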
@@ -0,0 +1,102 @@
1
+ id: runner-environment-200
2
+ title: 'macOS 26 OpenSSL 3.x rejects legacy RC2 ciphers — p12 certificate import fails with inner_evp_generic_fetch unsupported'
3
+ category: runner-environment
4
+ severity: error
5
+ tags:
6
+ - macos
7
+ - macos-26
8
+ - openssl
9
+ - code-signing
10
+ - certificate
11
+ - ios
12
+ - legacy-cipher
13
+ patterns:
14
+ - regex: 'inner_evp_generic_fetch:unsupported'
15
+ flags: 'i'
16
+ - regex: 'Algorithm \(RC2-\d+-CBC.*\)'
17
+ flags: 'i'
18
+ - regex: 'Could not find certificate from.*stdin|Could not parse.*certificate'
19
+ flags: 'i'
20
+ - regex: 'openssl.*failed with return code:\s*1'
21
+ flags: 'i'
22
+ error_messages:
23
+ - '40CBBC50F87F0000:error:0308010C:digital envelope routines:inner_evp_generic_fetch:unsupported:crypto/evp/evp_fetch.c:355:Global default library context, Algorithm (RC2-40-CBC : 0), Properties ()'
24
+ - 'Could not find certificate from <stdin>'
25
+ - '##[error]Error: /usr/local/bin/openssl failed with return code: 1'
26
+ - '##[warning]Error parsing certificate. This might be caused by an unsupported algorithm. If you''re using old certificate with a new OpenSSL version try to set -legacy flag in opensslPkcsArgs input.'
27
+ root_cause: |
28
+ macOS 26 runner images ship OpenSSL 3.x (OpenSSL 3.6.x as of runner-images macOS 26
29
+ release notes). OpenSSL 3.x removed RC2-40-CBC, RC2-64-CBC, and RC2-128-CBC from
30
+ the default provider. These legacy ciphers were commonly used to encrypt PKCS#12
31
+ certificate bundles generated by macOS Keychain Access, Fastlane cert, older CI
32
+ pipelines, or Keychain import/export tools shipped before 2020.
33
+
34
+ When a workflow installs a p12 certificate using openssl pkcs12 (directly or via
35
+ apple-actions/import-codesign-certs@v1/v2), OpenSSL 3.x cannot decrypt the legacy
36
+ bundle and emits:
37
+
38
+ error:0308010C:digital envelope routines:inner_evp_generic_fetch:unsupported:
39
+ ...Algorithm (RC2-40-CBC : 0), Properties ()
40
+
41
+ followed by "Could not find certificate from <stdin>" and openssl exit code 1.
42
+
43
+ The code-signing step silently skips certificate import, causing downstream
44
+ codesign, xcodebuild archive, or notarytool steps to fail with "no identity found"
45
+ or "identity not in keychain" errors.
46
+
47
+ Workflows that ran successfully on macos-14 (OpenSSL 1.1.x) or macos-15
48
+ (OpenSSL 3.3.x with legacy provider enabled) may start failing when the job
49
+ label is macos-26 or when macos-latest migrates to macOS 26.
50
+ fix: |
51
+ Option 1 (quickest) — Pass opensslPkcsArgs: -legacy to apple-actions/import-codesign-certs:
52
+ The -legacy flag activates OpenSSL's legacy provider which re-enables RC2 and
53
+ other deprecated algorithms. Supported in apple-actions/import-codesign-certs v2+.
54
+
55
+ Option 2 (permanent) — Re-encode the p12 with a modern cipher on your local machine
56
+ before uploading to GitHub Secrets:
57
+ openssl pkcs12 -in legacy.p12 -out certs.pem -nodes -passin pass:OLDPASSWORD # append -legacy if your local OpenSSL is 3.x
58
+ openssl pkcs12 -export -in certs.pem -out modern.p12 -passout pass:NEWPASSWORD \
59
+ -keypbe aes-256-cbc -certpbe aes-256-cbc -macalg sha256
60
+ base64 -w 0 modern.p12 > modern_b64.txt
61
+ # Upload content of modern_b64.txt as the new GitHub secret value
62
+
63
+ Option 3 — Regenerate certificates with modern tooling:
64
+ Use Fastlane match (>= 3.x) or Xcode 26 certificate export; both default to
65
+ AES-256-CBC which OpenSSL 3.x supports without -legacy.
66
+ fix_code:
67
+ - language: yaml
68
+ label: 'Fix: pass opensslPkcsArgs: -legacy (apple-actions/import-codesign-certs)'
69
+ code: |
70
+ - uses: apple-actions/import-codesign-certs@v3
71
+ with:
72
+ p12-file-base64: ${{ secrets.IOS_DISTRIBUTION_P12 }}
73
+ p12-password: ${{ secrets.IOS_DISTRIBUTION_P12_PASSWORD }}
74
+ opensslPkcsArgs: -legacy # required on macOS 26 / OpenSSL 3.x for RC2-encrypted p12
75
+ - language: yaml
76
+ label: 'Long-term fix: re-encode p12 with AES-256-CBC before updating the secret'
77
+ code: |
78
+ # Run these commands locally (not in CI) to produce a modern p12:
79
+ #
80
+ # openssl pkcs12 -in legacy.p12 -out certs.pem -nodes -passin pass:$OLD_PASS \
81
+ # -legacy # may need -legacy on your local OpenSSL 3.x too
82
+ # openssl pkcs12 -export \
83
+ # -in certs.pem -out modern.p12 \
84
+ # -passout pass:$NEW_PASS \
85
+ # -keypbe aes-256-cbc \
86
+ # -certpbe aes-256-cbc \
87
+ # -macalg sha256
88
+ # base64 -i modern.p12 | pbcopy # macOS syntax; paste into GitHub Secrets
89
+ #
90
+ # After updating the secret, remove the opensslPkcsArgs: -legacy workaround.
91
+ prevention:
92
+ - 'Re-encode all p12 certificate bundles with AES-256-CBC before migrating to macos-26'
93
+ - 'Audit p12 algorithms with: openssl pkcs12 -in cert.p12 -info -noout -passin pass:X 2>&1 | grep -iE "rc2|des|aes"'
94
+ - 'Pin macos-15 temporarily as a fallback while re-encoding certificates'
95
+ - 'Test certificate import on macos-26 runners before macos-latest label migration completes'
96
+ docs:
97
+ - url: 'https://github.com/actions/runner-images/issues/10934'
98
+ label: 'GitHub runner-images #10934: Intermittent SSL issue loading iOS certificates on macOS 26'
99
+ - url: 'https://www.openssl.org/docs/man3.0/man7/OSSL_PROVIDER-legacy.html'
100
+ label: 'OpenSSL 3.x legacy provider documentation'
101
+ - url: 'https://github.com/apple-actions/import-codesign-certs'
102
+ label: 'apple-actions/import-codesign-certs: opensslPkcsArgs input'
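The `patterns` in runner-environment-200 can be exercised against the entry's own sample error message. The sketch below reuses two of those regexes (with one added capture group) to pull the rejected cipher name out of a log line. The function name is hypothetical, and nothing here reflects how the package itself applies patterns.

```python
import re
from typing import Optional

# Both expressions come from the entry's `patterns` list; the capture
# group around the cipher name is added for this illustration.
UNSUPPORTED = re.compile(r"inner_evp_generic_fetch:unsupported", re.IGNORECASE)
ALGORITHM = re.compile(r"Algorithm \((RC2-\d+-CBC)", re.IGNORECASE)


def legacy_cipher_in(log_line: str) -> Optional[str]:
    """Return the RC2 cipher named in an OpenSSL 'unsupported' error, if any."""
    if not UNSUPPORTED.search(log_line):
        return None
    match = ALGORITHM.search(log_line)
    return match.group(1) if match else None
```

A non-None result suggests the p12 bundle should be re-encoded with a modern cipher, as described in the entry's Option 2.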
@@ -0,0 +1,100 @@
1
+ id: runner-environment-202
2
+ title: 'Runner v2.334.0 "Download action repository" repeats for same action when references use different letter casing'
3
+ category: runner-environment
4
+ severity: warning
5
+ tags:
6
+ - runner
7
+ - action-resolution
8
+ - performance
9
+ - v2.334
10
+ - composite-actions
11
+ - case-sensitivity
12
+ patterns:
13
+ - regex: 'Download action repository.*already been downloaded'
14
+ flags: 'i'
15
+ - regex: 'Download action repository.*\d+\.\d+s.*Download action repository'
16
+ flags: 'i'
17
+ - regex: 'Resolving action.*batching.*dedup.*failed|action.*resolution.*duplicate.*case'
18
+ flags: 'i'
19
+ error_messages:
20
+ - 'Download action repository ''actions/checkout@v4'' (SHA:abc123...) 18.35s'
21
+ - 'Download action repository ''Actions/Checkout@v4'' (SHA:abc123...) 19.12s'
22
+ - 'Download action repository ''ACTIONS/CHECKOUT@v4'' (SHA:abc123...) 17.88s'
23
+ root_cause: |
24
+ Runner v2.334.0 introduced "batch and deduplicate action resolution across composite
25
+ depths" to speed up jobs with many actions. The deduplication logic uses a
26
+ case-sensitive equality check on the action reference string (owner/repo@ref). When
27
+ the same action is referenced with different capitalizations — for example,
28
+ actions/checkout@v4 in one workflow step and Actions/Checkout@v4 inside a composite
29
+ action — the deduplication logic treats them as distinct actions and downloads both.
30
+
31
+ This regression is tracked in actions/runner#3731. Each duplicate download takes
32
+ 15-20 seconds, which multiplies across all case-variant references in a job. In
33
+ workflows with many composite actions or large action dependency trees, this can add
34
+ several minutes of silent overhead.
35
+
36
+ The issue does not cause a build failure — the downloads resolve to the same SHA
37
+ and the action runs once. The cost is purely in duplicate network traffic, API
38
+ calls, and elapsed time. Repeated downloads also count against GitHub API rate
39
+ limits for private runners or GHES installations.
40
+
41
+ Most commonly observed when:
42
+ - Composite actions use uppercase or title-case owners in their uses: fields
43
+ - Third-party composite actions reference shared actions (for example actions/checkout) with
44
+ inconsistent casing
45
+ - Workflows mix community-contributed composite actions with differing conventions
46
+ fix: |
47
+ Canonicalize all action references to lowercase owner/repo throughout your
48
+ workflow files and composite action.yml files. GitHub API lookups are
49
+ case-insensitive, so actions/checkout and Actions/Checkout resolve identically,
50
+ but the v2.334.0 deduplication cache is case-sensitive.
51
+
52
+ After fixing, run the job again to confirm "Download action repository" appears
53
+ only once per unique action in the logs.
54
+
55
+ A fix for the runner-side deduplication (case-insensitive cache key) is tracked
56
+ in actions/runner#3731 and may be included in a future runner release.
57
+ fix_code:
58
+ - language: yaml
59
+ label: 'Bug: mixed-case action references cause repeated downloads'
60
+ code: |
61
+ # workflow.yml — uses lowercase
62
+ steps:
63
+ - uses: actions/checkout@v4
64
+
65
+ # internal/composite/action.yml — uses title-case (different casing)
66
+ runs:
67
+ using: composite
68
+ steps:
69
+ - uses: Actions/Checkout@v4 # triggers a second download in v2.334.0
70
+ - language: yaml
71
+ label: 'Fix: canonicalize to lowercase owner/repo in all workflow and composite action files'
72
+ code: |
73
+ # workflow.yml
74
+ steps:
75
+ - uses: actions/checkout@v4 # lowercase
76
+
77
+ # internal/composite/action.yml
78
+ runs:
79
+ using: composite
80
+ steps:
81
+ - uses: actions/checkout@v4 # lowercase matches — deduplicated correctly
82
+ - language: yaml
83
+ label: 'Diagnostic: detect duplicate downloads by searching job logs'
84
+ code: |
85
+ # In the Actions UI, open the job log and search for "Download action repository"
86
+ # If the same action appears more than once (even with different casing), you
87
+ # have duplicate downloads.
88
+ #
89
+ # CLI equivalent (with gh):
90
+ # gh run view <run-id> --log | grep -io "Download action repository '[^']*'" | tr 'A-Z' 'a-z' | sort | uniq -c
91
+ prevention:
92
+ - 'Use consistent lowercase owner/repo for all action references across workflow files and composite actions'
93
+ - 'Add a CI check that fails on uppercase characters in the owner/repo part of uses: references'
94
+ - 'Audit composite action.yml files for uppercase uses: references, as they are the most common source'
95
+ - 'On self-hosted runners, pin the runner version to v2.333.0 as a temporary workaround while waiting for the runner#3731 fix'
96
+ docs:
97
+ - url: 'https://github.com/actions/runner/issues/3731'
98
+ label: 'GitHub runner#3731: Download action repository called repeatedly for same action (case-sensitivity)'
99
+ - url: 'https://github.com/actions/runner/releases/tag/v2.334.0'
100
+ label: 'Runner v2.334.0 release notes: batch and deduplicate action resolution'
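The fix for runner-environment-202 (canonicalizing `uses:` references to lowercase owner/repo) lends itself to a small text transform. The sketch below is a hypothetical helper built on assumptions: it works line by line with a regex instead of parsing YAML, lowercases only the owner and repository segments (sub-paths inside a repository stay untouched because they may be case-sensitive), and skips local `./` and `docker://` references.

```python
import re

# Matches "uses: owner/repo[/path][@ref]" at the start of a line,
# optionally preceded by a list dash.
USES = re.compile(
    r"^([ \t]*(?:-[ \t]*)?uses:[ \t]*)([^\s@#]+)(@\S+)?",
    re.MULTILINE,
)


def canonicalize_uses(text: str) -> str:
    """Lowercase the owner/repo part of every remote `uses:` reference."""
    def lower_owner_repo(match):
        target = match.group(2)
        if target.startswith(("./", "docker://")):
            return match.group(0)  # local path or container image: leave as-is
        parts = target.split("/")
        parts[:2] = [part.lower() for part in parts[:2]]
        return match.group(1) + "/".join(parts) + (match.group(3) or "")

    return USES.sub(lower_owner_repo, text)
```

Comparing the function's output with the original file content is enough to build a CI check: any difference means a reference uses non-canonical casing.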
@@ -0,0 +1,105 @@
1
+ id: silent-failures-107
2
+ title: "dorny/paths-filter Reports All Files Changed When 'before' Field Missing on workflow_run"
3
+ category: silent-failures
4
+ severity: silent-failure
5
+ tags:
6
+ - paths-filter
7
+ - workflow_run
8
+ - before-field
9
+ - silent-failure
10
+ - fork-pr
11
+ - changed-files
12
+ - wrong-output
13
+ patterns:
14
+ - regex: '''before'' field is missing in event payload'
15
+ flags: 'i'
16
+ - regex: 'changes will be detected from last commit'
17
+ flags: 'i'
18
+ - regex: 'paths-filter.*workflow.run'
19
+ flags: 'i'
20
+ error_messages:
21
+ - "Warning: 'before' field is missing in event payload - changes will be detected from last commit"
22
+ root_cause: |
23
+ `dorny/paths-filter` (v3 and earlier) contains an internal code path that requires a
24
+ `before` SHA field from the GitHub event payload to compute the diff base. When used in
25
+ a `workflow_run`-triggered workflow, the event payload does not contain a `before` field
26
+ (it only has `head_sha`, `head_branch`, etc.) — so the action falls into a fallback path
27
+ that calls `getChangesInLastCommit()` instead of performing a proper merge-base diff.
28
+
29
+ The result: **every file in the repository is reported as "changed"**, defeating any
30
+ attempt to use paths-filter as a change detection gate in `workflow_run` workflows.
31
+
32
+ Root cause in source code (paths-filter v3):
33
+ The action's `getChangedFilesFromGit()` logic normalizes both `base` and `head` to short
34
+ branch names. On `workflow_run`, `github.context.ref` resolves to the same ref as the
35
+ user-supplied `base:` (e.g. both become "main"). The `isBaseSameAsHead` check triggers
36
+ the fallback: if `base === head` AND `beforeSha` is null, the action emits the warning
37
+ and returns only the last commit's changes — which in a merge-queue or PR workflow is
38
+ completely wrong.
39
+
40
+ The same bug occurs when:
41
+ - Using paths-filter in a `workflow_run` workflow without explicitly passing `ref:` as a SHA.
42
+ - Running from a forked PR where the `before` field is absent from the push payload.
43
+ - Any workflow_run where `github.context.ref` resolves to the same branch name as the
44
+ `base:` input after short-name normalization.
45
+ fix: |
46
+ Pass `ref:` explicitly as the HEAD SHA of the triggering workflow run. Using a full SHA
47
+ prevents the `isBaseSameAsHead` short-circuit because a SHA never equals a branch name
48
+ after normalization:
49
+
50
+ - uses: dorny/paths-filter@v3
51
+ with:
52
+ base: 'main'
53
+ ref: ${{ github.event.workflow_run.head_sha }}
54
+ filters: |
55
+ backend:
56
+ - 'src/**'
57
+
58
+ Also ensure the prior `actions/checkout` step uses `fetch-depth: 0` so the action has
59
+ both `base` and `ref` locally available for `getChangesSinceMergeBase`.
60
+
61
+ For fork PRs: use `ref: ${{ github.event.workflow_run.head_sha }}` and
62
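+ set `base:` to the target branch explicitly, for example `${{ github.event.repository.default_branch }}`; the documented `workflow_run` payload does not include a `base_branch` field.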
63
+
64
+ Alternative: upgrade to dorny/paths-filter v4.0.1+ which adds native `merge_group`
65
+ support and has improved ref handling (check release notes for workflow_run fixes).
66
+ fix_code:
67
+ - language: yaml
68
+ label: 'Pass ref as SHA to avoid isBaseSameAsHead false-positive on workflow_run'
69
+ code: |
70
+ on:
71
+ workflow_run:
72
+ workflows: ["CI"]
73
+ types: [completed]
74
+
75
+ jobs:
76
+ check-changes:
77
+ runs-on: ubuntu-latest
78
+ outputs:
79
+ backend: ${{ steps.filter.outputs.backend }}
80
+ steps:
81
+ - uses: actions/checkout@v4
82
+ with:
83
+ ref: ${{ github.event.workflow_run.head_sha }}
84
+ fetch-depth: 0
85
+
86
+ - uses: dorny/paths-filter@v3
87
+ id: filter
88
+ with:
89
+ base: ${{ github.event.repository.default_branch }}
90
+ # REQUIRED: pass ref as SHA, not branch name
91
+ # Without this, base == head after normalization → all files reported as changed
92
+ ref: ${{ github.event.workflow_run.head_sha }}
93
+ filters: |
94
+ backend:
95
+ - 'src/**'
96
+ prevention:
97
+ - 'Always pass ref: ${{ github.event.workflow_run.head_sha }} when using paths-filter in workflow_run contexts.'
98
+ - 'Never rely on implicit ref resolution in workflow_run workflows — the context.ref is the default branch, not the PR head.'
99
+ - 'After fixing, add a smoke-test PR that changes only an unmonitored path and verify paths-filter returns false for the guarded path.'
100
+ - 'Consider tj-actions/changed-files as an alternative; check its workflow_run documentation before switching.'
101
+ docs:
102
+ - url: 'https://github.com/dorny/paths-filter/issues/261'
103
+ label: "Bug: 'before' field is missing when it should not even be used (11 reactions, open)"
104
+ - url: 'https://github.com/dorny/paths-filter'
105
+ label: 'dorny/paths-filter — conditionally run actions based on modified files'
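The fallback condition described in silent-failures-107 can be modeled in a few lines to show why passing a full SHA as `ref:` avoids it. This is a simplified model of the behavior as the entry describes it, not the action's source code; both function names are hypothetical.

```python
from typing import Optional


def short_name(ref: str) -> str:
    """Strip a refs/heads/ or refs/tags/ prefix, as the entry describes."""
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref


def falls_back_to_last_commit(base: str, ref: str, before_sha: Optional[str]) -> bool:
    """Model of the described check: base equals head and no `before` SHA."""
    return short_name(base) == short_name(ref) and before_sha is None
```

With `base="main"` and `ref="refs/heads/main"` the model returns True when `before_sha` is None, which is the `workflow_run` situation; with a commit SHA as `ref` the names can never match, so the merge-base diff path is taken.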
@@ -0,0 +1,99 @@
1
+ id: silent-failures-106
2
+ title: 'windows-11-arm shell: bash Steps Intermittently Produce Zero Output and Exit 0'
3
+ category: silent-failures
4
+ severity: silent-failure
5
+ tags:
6
+ - windows-11-arm
7
+ - bash
8
+ - composite-action
9
+ - arm64
10
+ - wow64
11
+ - silent-failure
12
+ - intermittent
13
+ patterns:
14
+ - regex: 'error: no such command: `[a-z]'
15
+ flags: 'i'
16
+ - regex: 'error: no such subcommand: `[a-z]'
17
+ flags: 'i'
18
+ - regex: 'error\[E0463\]: can''t find crate for `[a-z]'
19
+ flags: 'i'
20
+ error_messages:
21
+ - 'error: no such command: `make`'
22
+ - 'error: no such command: `expand`'
23
+ - 'error: no such subcommand: `make`'
24
+ root_cause: |
25
+ On `windows-11-arm` runners, `shell: bash` resolves to `C:\Program Files\Git\bin\bash.EXE`,
26
+ which is an x86_64 binary running under WoW64 (Windows-on-Windows 64-bit) ARM64
27
+ emulation. The `actions/runner` binary on this platform is a **native ARM64** process.
28
+ When the native ARM64 runner spawns the x86_64 bash.EXE as a child process, an
29
+ intermittent failure in the WoW64 cross-architecture process launch causes the bash
30
+ process to start but never execute its script body. The step logs the command invocation
31
+ and environment dump, then exits 0 with zero script output — the script never ran.
32
+
33
+ Key characteristics:
34
+ - Failure is transient per-process-invocation: two consecutive bash steps in the same
35
+ job can have one succeed and one fail, ruling out image or runner misconfiguration.
36
+ - Zero-output steps take 0-3 seconds; successful steps take 2-7 seconds.
37
+ - Only occurs on `windows-11-arm`; `windows-latest` (amd64) is unaffected.
38
+ - The downstream step then fails with "no such command" or "not installed" errors because
39
+ the tool that was supposed to be installed by the bash step is missing.
40
+ - Reported across multiple independent Microsoft repositories
41
+ (microsoft/Windows-rust-driver-samples, microsoft/windows-drivers-rs), confirming
42
+ this is a platform-level runner bug, not a workflow configuration issue.
43
+
44
+ Root cause: intermittent x86_64 process execution failure under ARM64 WoW64 emulation
45
+ when launched from a native ARM64 parent process. No fix from the platform side as of
46
+ April 2026; the issue is open in actions/partner-runner-images.
47
+ fix: |
48
+ Option 1 (recommended): invoke bash from PowerShell as a workaround. Wrapping the bash
49
+ call in a pwsh step bypasses the native-arm64 → x86_64 WoW64 launch that triggers the
50
+ bug, because PowerShell on windows-11-arm is itself x86_64, so spawning x86_64 bash
51
+ from x86_64 pwsh does not cross the ARM64/x86_64 boundary:
52
+
53
+ - name: Run script via pwsh workaround
54
+ shell: pwsh
55
+ run: |
56
+ & 'C:\Program Files\Git\bin\bash.EXE' -c "./your-script.sh" # not captured into a variable, so script output reaches the step log
57
+ if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }
58
+
59
+ Option 2: use `shell: pwsh` for the entire step and rewrite the logic in PowerShell.
60
+
61
+ Option 3: add a retry wrapper that re-runs the composite action step if the output
62
+ is unexpectedly empty (e.g. check for absence of the installed binary before proceeding
63
+ and re-invoke if missing).
64
+
65
+ Option 4: avoid `windows-11-arm` in the runner matrix until the platform bug is fixed
66
+ upstream, or pin to `windows-latest` (amd64) for bash-heavy composite actions.
67
+ fix_code:
68
+ - language: yaml
69
+ label: 'Invoke bash from pwsh to avoid the WoW64 ARM64 launch bug'
70
+ code: |
71
+ - name: Install tool (windows-11-arm workaround)
72
+ shell: pwsh
73
+ run: |
74
+ # Call x86_64 bash from x86_64 pwsh — avoids native-arm64 → x86_64 WoW64 issue
75
+ & 'C:\Program Files\Git\bin\bash.EXE' --noprofile --norc `
76
+ "${env:GITHUB_ACTION_PATH}/main.sh"
77
+ if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }
78
+ - language: yaml
79
+ label: 'Exclude windows-11-arm from bash-heavy matrix jobs'
80
+ code: |
81
+ jobs:
82
+ build:
83
+ strategy:
84
+ matrix:
85
+ os: [windows-latest, ubuntu-latest, macos-latest]
86
+ # windows-11-arm excluded: shell: bash intermittent zero-output bug
87
+ # See: https://github.com/actions/partner-runner-images/issues/169
88
+ prevention:
89
+ - 'Test composite actions on windows-11-arm at scale before relying on them in production CI.'
90
+ - 'Add post-step verification that checks for the installed binary; fail explicitly rather than silently continuing.'
91
+ - 'Monitor taiki-e/install-action release notes for the official workaround (pwsh retry wrapper) if using that action.'
92
+ - 'Prefer shell: pwsh over shell: bash for setup/install scripts in composite actions targeting windows-11-arm.'
93
+ docs:
94
+ - url: 'https://github.com/actions/partner-runner-images/issues/169'
95
+ label: '[windows-11-arm] Composite action bash steps intermittently produce zero output and exit 0'
96
+ - url: 'https://github.com/taiki-e/install-action/issues/1562'
97
+ label: 'On windows-11-arm, sometimes does nothing, but succeeds'
98
+ - url: 'https://github.com/taiki-e/install-action/pull/1647'
99
+ label: 'Call main.sh from pwsh on Windows to work around windows-11-arm runner bug'
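Option 3 of silent-failures-106 (re-run the step when the expected binary is missing) amounts to a verify-and-retry loop. The sketch below shows the shape of that loop under stated assumptions: the function name is hypothetical, and the caller supplies both the action to run and the verification, so nothing here depends on a particular runner or tool.

```python
from typing import Callable


def run_until_verified(
    action: Callable[[], None],
    verify: Callable[[], bool],
    attempts: int = 3,
) -> int:
    """Run `action` until `verify` passes; return the attempt number that succeeded.

    Raises RuntimeError when every attempt leaves `verify` failing, so a
    zero-output step that "succeeds" becomes an explicit failure.
    """
    for attempt in range(1, attempts + 1):
        action()
        if verify():
            return attempt
    raise RuntimeError(f"verification still failing after {attempts} attempts")
```

In a workflow script, `action` might wrap `subprocess.run([...], check=True)` for the install script and `verify` might be `lambda: shutil.which("cargo-make") is not None`, where `cargo-make` stands in for whichever tool the step installs.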
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@htekdev/actions-debugger",
3
- "version": "1.0.115",
3
+ "version": "1.0.116",
4
4
  "description": "65+ real GitHub Actions errors, queryable by agents. CLI + MCP server + Copilot skills + error database.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",