@htekdev/actions-debugger 1.0.35 → 1.0.36

@@ -0,0 +1,97 @@
1
+ id: known-unsolved-033
2
+ title: 'actions/checkout Hangs or Times Out From EU GitHub-Hosted Runners (Regional Degradation)'
3
+ category: known-unsolved
4
+ severity: error
5
+ tags:
6
+ - checkout
7
+ - performance
8
+ - eu-runners
9
+ - timeout
10
+ - regional
11
+ - infrastructure
12
+ patterns:
13
+ - regex: 'fatal: unable to access.*https://github\.com.*timed out'
14
+ flags: 'i'
15
+ - regex: 'fatal: unable to access.*https://github\.com.*Could not resolve host'
16
+ flags: 'i'
17
+ - regex: 'Error: Process completed with exit code 128'
18
+ flags: 'i'
19
+ error_messages:
20
+ - "fatal: unable to access 'https://github.com/owner/repo/': Operation timed out"
21
+ - "Error: Process completed with exit code 128."
22
+ - 'actions/checkout step hanging for 5-30 minutes with no output'
23
+ root_cause: |
24
+ Starting May 19, 2026, workflows running on GitHub-hosted runners in European
25
+ data centers began experiencing severely degraded actions/checkout performance.
26
+ The fetch/clone phase hangs silently for 5-30 minutes before either completing
27
+ slowly or timing out, regardless of repository size. Workflows that previously
28
+ completed checkout in 10-30 seconds are affected.
29
+
30
+ The root cause is an infrastructure-level degradation on GitHub's side
31
+ affecting the European runner subnet's connectivity to GitHub's Smart HTTP
32
+ server or CDN endpoints. This is distinct from general large-repo slowness:
33
+ even tiny repositories with shallow clones exhibit the hang. GitHub has not
34
+ published a root cause analysis or resolution timeline as of June 2026.
35
+
36
+ Notably, runners in US regions (us-east-1, us-west-2) are not affected —
37
+ the issue is specific to EU runner region routing. The error manifests as
38
+ either a silent hang (no log output during the fetch phase) or an eventual
39
+ "Operation timed out" exit code 128.
40
+
41
+ Source: actions/checkout#2441 (52 reactions, opened May 24, 2026, still open).
42
+ fix: |
43
+ No upstream fix available — this is a GitHub infrastructure issue with no
44
+ workaround that completely eliminates the problem. Mitigations to reduce
45
+ impact:
46
+
47
+ 1. Add timeout-minutes to checkout steps to prevent indefinite hangs and
48
+ fail fast with a clear error rather than a silent stuck pipeline.
49
+
50
+ 2. Use fetch-depth: 1 (shallow clone) to reduce transfer size, which may
51
+ reduce hang duration even if it does not eliminate it.
52
+
53
+ 3. Use sparse-checkout to limit the files transferred from the CDN.
54
+
55
+ 4. For critical pipelines, consider temporarily moving the job to runners
56
+ outside the affected region (for example self-hosted runners); standard
57
+ GitHub-hosted labels such as ubuntu-latest do not select a region.
58
+
59
+ 5. Subscribe to GitHub Status (githubstatus.com) for EU infrastructure
60
+ degradation notices — incidents affecting this region are tracked there.
61
+ fix_code:
62
+ - language: yaml
63
+ label: 'Shallow clone with timeout to fail fast during EU degradation'
64
+ code: |
65
+ steps:
66
+ - name: Checkout
67
+ uses: actions/checkout@v4
68
+ timeout-minutes: 5 # fail fast instead of hanging for 30+ minutes
69
+ with:
70
+ fetch-depth: 1 # shallow clone reduces CDN transfer size
71
+ - language: yaml
72
+ label: 'Sparse checkout to minimize data fetched during regional degradation'
73
+ code: |
74
+ steps:
75
+ - name: Sparse checkout
76
+ uses: actions/checkout@v4
77
+ timeout-minutes: 5
78
+ with:
79
+ fetch-depth: 1
80
+ sparse-checkout: |
81
+ src/
82
+ tests/
83
+ package.json
84
+ go.mod
85
+ prevention:
86
+ - 'Always specify fetch-depth: 1 for workflows that do not require full commit history'
87
+ - 'Add timeout-minutes to every checkout step to prevent indefinite pipeline hangs'
88
+ - 'Monitor p99 checkout duration from EU runners as a CI health SLI'
89
+ - 'Subscribe to GitHub Status page (githubstatus.com) for EU infrastructure degradation notices'
90
+ - 'Use sparse-checkout in large monorepos to reduce CDN dependency during fetch'
91
+ docs:
92
+ - url: 'https://github.com/actions/checkout/issues/2441'
93
+ label: 'actions/checkout #2441: Checkouts extremely slow or timing out from EU (52 reactions, May 2026)'
94
+ - url: 'https://www.githubstatus.com/'
95
+ label: 'GitHub Status page for infrastructure incidents'
96
+ - url: 'https://github.com/actions/checkout#usage'
97
+ label: 'actions/checkout README: usage, including sparse-checkout'
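A minimal sketch of the "monitor p99 checkout duration" prevention item in known-unsolved-033, assuming checkout durations (in seconds) have already been collected per run. The helper names and the 60-second threshold are illustrative assumptions and are not part of the package; the entry itself only states that healthy checkouts take 10-30 seconds.

```python
def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile (integer 1-100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100; avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def checkout_degraded(durations_sec, threshold_sec=60):
    """Flag degradation when the p99 checkout time exceeds the threshold.

    threshold_sec is an assumed alerting value, chosen above the
    10-30 second range the entry describes as normal.
    """
    return percentile_nearest_rank(durations_sec, 99) > threshold_sec
```

With a small sample a single hung checkout dominates the p99, which is the intended behavior for an alert on silent hangs; a larger window smooths this out.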
@@ -0,0 +1,93 @@
1
+ id: permissions-auth-032
2
+ title: 'checkout@v6 Credential Injection Fails on Self-Hosted Runners With Symlinked _work Directory'
3
+ category: permissions-auth
4
+ severity: error
5
+ tags:
6
+ - checkout-v6
7
+ - self-hosted
8
+ - symlink
9
+ - credentials
10
+ - includeif
11
+ - macos
12
+ patterns:
13
+ - regex: 'fatal: could not read Username for.*terminal prompts disabled'
14
+ flags: 'i'
15
+ - regex: 'includeIf.*gitdir.*_work'
16
+ flags: 'i'
17
+ - regex: 'fatal: repository.*not found'
18
+ flags: 'i'
19
+ error_messages:
20
+ - "fatal: could not read Username for 'https://github.com': terminal prompts disabled"
21
+ - 'Error: fatal: repository not found'
22
+ - 'Authentication failed'
23
+ root_cause: |
24
+ actions/checkout@v6 changed credential injection from writing directly into
25
+ the repository configuration file as http.https://github.com/.extraheader
26
+ (the v5 approach) to using includeIf "gitdir:..." directives that reference
27
+ a temporary credentials file stored in _work/_temp/.
28
+
29
+ v6 writes the includeIf path using the symlink path of the runner _work
30
+ directory. However, Git evaluates gitdir: conditions
31
+ against the resolved (real) absolute path — it follows symlinks when
32
+ determining the current repository's directory.
33
+
34
+ When the runner _work directory is a symlink to an external volume (a common
35
+ setup for macOS Apple Silicon runners using external SSD storage), the
36
+ includeIf path written by v6 uses the symlink path
37
+ (e.g., /Users/runner/actions-runner-N/_work/repo/.git) but the actual
38
+ resolved path is different
39
+ (e.g., /Volumes/External/actions-runner-N-work/repo/.git).
40
+ These never match, so the credentials config file is never loaded and the
41
+ fetch step fails with "terminal prompts disabled."
42
+
43
+ v5 is unaffected because it injects credentials directly into the repository
44
+ configuration file rather than using conditional includes.
45
+ Source: actions/checkout#2393 (open March 2026, macOS Apple Silicon).
46
+ fix: |
47
+ Option 1 (recommended): Pin to actions/checkout@v5 for workflows running on
48
+ self-hosted runners with symlinked _work directories. v5 injects credentials
49
+ directly and is not affected by this symlink resolution issue.
50
+
51
+ Option 2: Reconfigure the runner to use the real volume path directly.
52
+ Remove the symlink from _work and mount the external volume at the actual
53
+ runner work path location. This eliminates the symlink entirely.
54
+
55
+ Option 3: Use persist-credentials: false with a separate authentication
56
+ step that does not rely on the includeIf mechanism.
57
+ fix_code:
58
+ - language: yaml
59
+ label: 'Pin to v5 as workaround for symlinked _work runners (checkout#2393)'
60
+ code: |
61
+ steps:
62
+ - name: Checkout
63
+ # Pinned to v5 — v6 includeIf credential injection fails when runner
64
+ # _work directory is a symlink to an external volume (checkout#2393)
65
+ uses: actions/checkout@v5
66
+ with:
67
+ token: ${{ secrets.GITHUB_TOKEN }}
68
+ - language: yaml
69
+ label: 'Use persist-credentials false with explicit token for subsequent steps'
70
+ code: |
71
+ steps:
72
+ - name: Checkout without credential persistence
73
+ uses: actions/checkout@v6
74
+ with:
75
+ persist-credentials: false
76
+
77
+ - name: Subsequent steps using explicit token
78
+ env:
79
+ GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
80
+ run: |
81
+ echo "Use GITHUB_TOKEN env var in subsequent authenticated operations"
82
+ prevention:
83
+ - 'Audit self-hosted runner _work paths for symlinks before upgrading from checkout@v5 to v6'
84
+ - 'Avoid symlinking the runner _work directory — use bind mounts or configure the real path'
85
+ - 'Test checkout behavior on self-hosted runners in a canary workflow before rolling out v6'
86
+ - 'Check the resolved path differs from the symlink path when debugging "terminal prompts disabled" errors'
87
+ docs:
88
+ - url: 'https://github.com/actions/checkout/issues/2393'
89
+ label: 'actions/checkout #2393: v6 includeIf credential matching fails on symlinked _work (open March 2026)'
90
+ - url: 'https://github.com/actions/checkout/issues/2313'
91
+ label: 'actions/checkout #2313: v6 breaks Docker actions using credential auth (related, closed)'
92
+ - url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/configuring-the-self-hosted-runner-application-as-a-service'
93
+ label: 'GitHub Docs: Configuring the self-hosted runner as a service'
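A minimal sketch of the "audit self-hosted runner _work paths for symlinks" prevention item in permissions-auth-032. It only compares a path with its resolved form; the function name is a hypothetical helper, and the sketch does not inspect any Git configuration.

```python
import os


def work_dir_is_symlinked(work_dir):
    """Return True when work_dir resolves to a different real path.

    A True result means at least one component of the path is a symlink,
    which is the setup the entry describes as affected.
    """
    absolute = os.path.abspath(work_dir)
    return os.path.realpath(absolute) != absolute
```

On a runner host this could be called with the runner's _work path (for example the /Users/runner/actions-runner-N/_work form quoted in the entry) before upgrading from checkout@v5 to v6.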
@@ -0,0 +1,115 @@
1
+ id: runner-environment-090
2
+ title: 'Ephemeral Self-Hosted Runner Fails Immediately With "An error occurred: Runner not found"'
3
+ category: runner-environment
4
+ severity: error
5
+ tags:
6
+ - self-hosted
7
+ - ephemeral
8
+ - runner-not-found
9
+ - registration
10
+ - broker
11
+ - jit
12
+ patterns:
13
+ - regex: 'An error occurred: Runner not found'
14
+ flags: 'i'
15
+ - regex: 'RunnerNotFoundException'
16
+ flags: 'i'
17
+ error_messages:
18
+ - 'An error occurred: Runner not found'
19
+ - 'GitHub.Actions.RunService.WebApi.RunnerNotFoundException'
20
+ - 'Listening for Jobs'
21
+ root_cause: |
22
+ GitHub's broker endpoint returns RunnerNotFoundException immediately after
23
+ a successful registration and connection for ephemeral self-hosted runners
24
+ configured with replace mode. The runner completes registration ("Successfully
25
+ replaced the runner"), establishes connection ("Runner connection is good"),
26
+ starts listening for jobs, then receives a RunnerNotFoundException from the
27
+ broker HTTP client within seconds.
28
+
29
+ The error originates in BrokerHttpClient.cs where the broker API returns a
30
+ 404/RunnerNotFoundException for the registered runner slot. This can occur
31
+ when the broker has stale slot state from the previous ephemeral runner
32
+ iteration that collides with the newly registered runner identity during the
33
+ brief window between registration and first poll.
34
+
35
+ The runner has no graceful retry handling for this condition — it exits with
36
+ status 1, causing systemd to restart it repeatedly, rapidly exhausting GitHub
37
+ App installation tokens through frequent re-registration cycles.
38
+
39
+ Affects all architectures (x86_64, aarch64, s390x) on various runner versions.
40
+ Spikes during periods of elevated load on GitHub broker infrastructure.
41
+ Source: actions/runner#3857 (116 reactions, open May 2025).
42
+ fix: |
43
+ 1. Switch from replace-mode ephemeral runners to JIT (Just-In-Time) runner
44
+ tokens. A JIT runner is registered from a single-use configuration and
45
+ runs at most one job, so it avoids the replace-mode slot race.
46
+ 2. Update the runner to the latest version (v2.334.0+) which improves retry
47
+ behavior around transient broker errors.
48
+ 3. Add restart delay in the systemd service unit to prevent token exhaustion
49
+ on rapid restart loops (RestartSec in [Service], StartLimit* in [Unit]):
50
+ RestartSec=30
51
+ StartLimitIntervalSec=300
52
+ StartLimitBurst=5
53
+ 4. Monitor runner diagnostic logs in _diag/Runner_*.log for the
54
+ RunnerNotFoundException pattern to distinguish broker errors from
55
+ configuration issues.
56
+ fix_code:
57
+ - language: ini
58
+ label: 'Systemd service unit with restart throttle to prevent token exhaustion'
59
+ code: |
60
+ # /etc/systemd/system/actions-runner.service
61
+ [Unit]
62
+ Description=GitHub Actions Self-Hosted Runner
63
+ After=network-online.target
64
+ StartLimitIntervalSec=300
65
+ StartLimitBurst=5
66
+
67
+ [Service]
68
+ ExecStart=/home/runner/actions-runner/run.sh
69
+ Restart=on-failure
70
+ RestartSec=30
71
+ User=runner
72
+
73
+ [Install]
74
+ WantedBy=multi-user.target
75
+ - language: yaml
76
+ label: 'Workflow using JIT runner token to avoid broker slot collision'
77
+ code: |
78
+ jobs:
79
+ provision-runner:
80
+ runs-on: ubuntu-latest
81
+ outputs:
82
+ runner-token: ${{ steps.jit.outputs.encoded_jit_config }}
83
+ steps:
84
+ - name: Generate JIT runner token
85
+ id: jit
86
+ uses: actions/github-script@v7
87
+ with:
88
+ script: |
89
+ const { data } = await github.rest.actions.generateRunnerJitconfigForRepo({
90
+ owner: context.repo.owner,
91
+ repo: context.repo.repo,
92
+ name: 'ephemeral-jit-runner',
93
+ runner_group_id: 1,
94
+ labels: ['self-hosted', 'ephemeral', 'linux']
95
+ });
96
+ core.setOutput('encoded_jit_config', data.encoded_jit_config);
97
+
98
+ build:
99
+ needs: provision-runner
100
+ runs-on: [self-hosted, ephemeral, linux]
101
+ steps:
102
+ - uses: actions/checkout@v4
103
+ prevention:
104
+ - 'Use JIT runner tokens instead of replace-mode registration to eliminate broker slot race'
105
+ - 'Set systemd RestartSec to at least 30 seconds to avoid GitHub App token exhaustion'
106
+ - 'Monitor _diag/Runner_*.log for RunnerNotFoundException patterns and alert on restart frequency'
107
+ - 'Keep runner version current — broker compatibility fixes are regularly backported'
108
+ - 'Consider Kubernetes ARC ephemeral runners where pod lifecycle handles registration cleanly'
109
+ docs:
110
+ - url: 'https://github.com/actions/runner/issues/3857'
111
+ label: 'actions/runner #3857: An error occurred: Runner not found (116 reactions, open May 2025)'
112
+ - url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/autoscaling-with-self-hosted-runners#using-just-in-time-runners'
113
+ label: 'GitHub Docs: Just-in-time runners (JIT)'
114
+ - url: 'https://docs.github.com/en/rest/actions/self-hosted-runners#create-configuration-for-a-just-in-time-runner-for-a-repository'
115
+ label: 'GitHub REST API: Generate JIT runner config'
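A rough sketch of how the restart throttle in runner-environment-090 plays out, assuming a sliding-window approximation of systemd's start rate limiting (more than StartLimitBurst starts inside StartLimitIntervalSec). The function name is hypothetical and systemd's internal accounting may differ in detail; the defaults mirror the values in the entry.

```python
def start_limit_hit(start_times_sec, interval_sec=300, burst=5):
    """Return True if more than `burst` starts fall inside any `interval_sec` window.

    start_times_sec holds service start timestamps in seconds.
    """
    times = sorted(start_times_sec)
    left = 0
    for right, current in enumerate(times):
        # Drop starts that are no longer inside the window ending at `current`.
        while current - times[left] >= interval_sec:
            left += 1
        if right - left + 1 > burst:
            return True
    return False
```

Under these assumptions, a runner that exits within seconds and restarts every 30 seconds reaches the limit on its sixth start, after which the unit stays stopped until an operator intervenes (for example with systemctl reset-failed). That caps re-registration attempts rather than merely slowing them.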
@@ -0,0 +1,119 @@
1
+ id: runner-environment-091
2
+ title: 'Self-Hosted Runner Worker Wedges Indefinitely After TaskOrchestrationJobNotFoundException'
3
+ category: runner-environment
4
+ severity: error
5
+ tags:
6
+ - self-hosted
7
+ - runner-worker
8
+ - wedged
9
+ - slot-starvation
10
+ - v2-runservice
11
+ - macos
12
+ - apple-silicon
13
+ patterns:
14
+ - regex: 'TaskOrchestrationJobNotFoundException.*workflow instance not found'
15
+ flags: 'i'
16
+ - regex: 'Job not found:.*workflow instance not found'
17
+ flags: 'i'
18
+ - regex: 'CompleteJobAsync.*TaskOrchestrationJobNotFoundException'
19
+ flags: 'i'
20
+ error_messages:
21
+ - 'GitHub.DistributedTask.WebApi.TaskOrchestrationJobNotFoundException: Job not found: <job-guid>. workflow instance not found'
22
+ - 'TaskOrchestrationJobNotFoundException: workflow instance not found'
23
+ - 'System.AggregateException: One or more errors occurred. (Job not found:'
24
+ root_cause: |
25
+ When a self-hosted runner Worker process calls CompleteJobAsync in the V2
26
+ RunService path (useV2Flow: true, RunServiceHttpClient.CompleteJobAsync)
27
+ and the GitHub orchestrator has discarded the job record (e.g., due to a
28
+ server-side timeout, infrastructure failover, or job cancellation during
29
+ finalization), the Worker receives TaskOrchestrationJobNotFoundException
30
+ with "workflow instance not found."
31
+
32
+ After exhausting configured retry attempts (default maxAttempts), the Worker
33
+ logs the exception and stops processing — but critically, it fails to call
34
+ Environment.Exit and the process remains alive at ~0.1% CPU with no active
35
+ work, no child processes, and no job cleanup activity.
36
+
37
+ The parent Runner.Listener treats the still-running Worker process as a busy
38
+ runner slot and refuses to spawn a new Worker. This causes runner slot
39
+ starvation: the affected runner stops accepting new jobs until the wedged
40
+ Worker is externally terminated (kill, reboot, or watchdog).
41
+
42
+ On one 3-host Apple Silicon runner pool (v2.334.0), this affected 32.8% of
43
+ Worker invocations (50 of 152) over three weeks, with one incident wedging
44
+ all three Workers simultaneously and blocking CI for 3+ hours.
45
+ Source: actions/runner#4418 (open May 2026).
46
+ fix: |
47
+ No upstream fix available — the Worker does not exit on non-retryable
48
+ CompleteJobAsync failures. Mitigations:
49
+
50
+ 1. Deploy a watchdog that monitors _diag/Worker_*.log for the
51
+ TaskOrchestrationJobNotFoundException pattern and kills the wedged
52
+ Worker process by PID.
53
+
54
+ 2. Use Kubernetes ARC ephemeral runners where the pod lifecycle replaces
55
+ the entire runner environment after each job — a wedged Worker is
56
+ automatically cleaned up when the pod is recycled.
57
+
58
+ 3. Configure a hard systemd runtime limit (RuntimeMaxSec) that terminates
59
+ any runner process exceeding your longest expected job duration plus a
60
+ safety margin.
61
+
62
+ 4. Add an external health-check cron that queries the GitHub API for runner
63
+ status and restarts the runner service if slots show "busy" longer than
64
+ expected.
65
+ fix_code:
66
+ - language: yaml
67
+ label: 'Kubernetes ARC ephemeral runner configuration (avoids wedged Worker state)'
68
+ code: |
69
+ # ARC RunnerDeployment — pods are recycled after each job
70
+ apiVersion: actions.summerwind.dev/v1alpha1
71
+ kind: RunnerDeployment
72
+ metadata:
73
+ name: ephemeral-runner-deployment
74
+ spec:
75
+ replicas: 3
76
+ template:
77
+ spec:
78
+ ephemeral: true # runner pod is replaced after each job completes
79
+ repository: owner/repo
80
+ labels:
81
+ - self-hosted
82
+ - ephemeral
83
+ - language: yaml
84
+ label: 'Scheduled watchdog workflow to detect stalled runner slots via API'
85
+ code: |
86
+ on:
87
+ schedule:
88
+ - cron: '*/15 * * * *' # every 15 minutes
89
+
90
+ jobs:
91
+ runner-health-check:
92
+ runs-on: ubuntu-latest
93
+ steps:
94
+ - name: Detect stalled self-hosted runners
95
+ uses: actions/github-script@v7
96
+ with:
97
+ script: |
98
+ const runners = await github.rest.actions.listSelfHostedRunnersForRepo({
99
+ owner: context.repo.owner,
100
+ repo: context.repo.repo
101
+ });
102
+ const suspect = runners.data.runners.filter(r => r.status === 'offline' || r.busy);
103
+ if (suspect.length > 0) {
104
+ core.warning('Offline or busy runners (possible wedge): ' + suspect.map(r => r.name).join(', '));
105
+ // Confirm no job is running on a busy runner, then trigger your restart webhook here
106
+ }
107
+ prevention:
108
+ - 'Use ephemeral Kubernetes ARC runners — pod recycle eliminates wedged Worker slot starvation'
109
+ - 'Monitor _diag/Worker_*.log for TaskOrchestrationJobNotFoundException patterns'
110
+ - 'Set systemd RuntimeMaxSec to maximum expected job duration plus 30 minutes'
111
+ - 'Track runner slot busy duration — sudden sustained busy state with no job output indicates wedge'
112
+ - 'Deploy a watchdog process alongside the runner that monitors Worker PID lifetime'
113
+ docs:
114
+ - url: 'https://github.com/actions/runner/issues/4418'
115
+ label: 'actions/runner #4418: Worker wedges after TaskOrchestrationJobNotFoundException (open May 2026)'
116
+ - url: 'https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/autoscaling-with-self-hosted-runners'
117
+ label: 'GitHub Docs: Autoscaling with self-hosted runners'
118
+ - url: 'https://github.com/actions/actions-runner-controller'
119
+ label: 'actions/actions-runner-controller: Kubernetes ARC runner controller'
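A minimal sketch of the log-watching watchdog described in mitigation 1 of runner-environment-091. The regular expressions are the entry's own patterns; the function names are hypothetical, and how the Worker PID is discovered (for example from a process listing) is left to the caller as an assumption.

```python
import os
import re
import signal

# Patterns copied from the entry's `patterns` list (case-insensitive).
WEDGE_PATTERNS = [
    re.compile(r"TaskOrchestrationJobNotFoundException.*workflow instance not found", re.IGNORECASE),
    re.compile(r"Job not found:.*workflow instance not found", re.IGNORECASE),
    re.compile(r"CompleteJobAsync.*TaskOrchestrationJobNotFoundException", re.IGNORECASE),
]


def log_shows_wedge(log_text):
    """Return True if any line of a Worker log matches a wedge pattern."""
    return any(
        pattern.search(line)
        for line in log_text.splitlines()
        for pattern in WEDGE_PATTERNS
    )


def terminate_worker(pid):
    """Send SIGTERM to the given PID; return False if the process is already gone."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return False
    return True
```

A watchdog built on this would read the newest _diag/Worker_*.log, call log_shows_wedge on its contents, and only then terminate the Worker. Matching the log alone does not prove the process is idle, so a real deployment would likely also check that no job is in progress.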
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@htekdev/actions-debugger",
3
- "version": "1.0.35",
3
+ "version": "1.0.36",
4
4
  "description": "65+ real GitHub Actions errors, queryable by agents. CLI + MCP server + Copilot skills + error database.",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",