@htekdev/actions-debugger 1.0.118 → 1.0.120

@@ -0,0 +1,94 @@
+ id: caching-artifacts-070
+ title: "setup-python Post step fails — pip cache directory doesn't exist on disk"
+ category: caching-artifacts
+ severity: error
+ tags:
+   - setup-python
+   - pip
+   - cache
+   - post-step
+   - cache-miss
+   - no-dependencies
+   - python-version-bump
+ patterns:
+   - regex: 'Cache folder path is retrieved for pip but doesn.t exist on disk'
+     flags: 'i'
+   - regex: 'likely indicates that there are no dependencies to cache'
+     flags: 'i'
+   - regex: 'Post Setup Python.*fail|Post.*setup-python.*error'
+     flags: 'i'
+ error_messages:
+   - "Cache folder path is retrieved for pip but doesn't exist on disk: /home/runner/.cache/pip. This likely indicates that there are no dependencies to cache. Consider removing the cache step if it is not needed."
+ root_cause: |
+   When actions/setup-python is configured with cache: 'pip', the action records the expected
+   pip cache directory path at setup time (/home/runner/.cache/pip on Linux, equivalent on
+   macOS/Windows). The Post Setup Python step runs at job end and attempts to save the cache
+   to that path. If the directory does not exist on disk at save time, the Post step fails
+   with this error.
+
+   Two common causes:
+
+   1. No pip install ran in the job: The job uses cache: 'pip' but only runs linting or other
+      pre-installed tools without installing any Python packages. Pip never creates its cache
+      directory, so there is nothing to save. The job's actual steps appear green while the
+      Post step fails and turns the overall workflow run red.
+
+   2. Python version bump causes cache key miss: When the Python patch version changes (e.g.,
+      from 3.13.5 to 3.13.6), the setup-python cache key changes and the first run after the
+      bump experiences a full cache miss. If pip install runs but writes packages to a virtual
+      environment rather than the global pip cache (/home/runner/.cache/pip), the expected
+      directory is never created and the Post step fails. Subsequent runs after the new cache is
+      warmed succeed.
+
+   The failure is deceptive because it surfaces in the Post Setup Python cleanup step — well
+   after the test or build steps have already succeeded — making it easy to overlook the root
+   cause.
+ fix: |
+   - Remove cache: 'pip' from any setup-python step in jobs that do not call pip install.
+     Linting-only jobs, type-check-only jobs, and jobs that rely entirely on pre-installed
+     system Python do not benefit from pip caching.
+   - If pip install does run, ensure it runs against the global pip (not a virtualenv that
+     bypasses /home/runner/.cache/pip) so the post step can find and save the cache.
+   - Upgrade to actions/setup-python@v5 or later: newer versions emit a warning annotation
+     instead of failing the step when the cache directory is missing.
+   - After a Python version bump, the first run is expected to cache-miss; monitor for Post
+     step failures on that first run and confirm subsequent runs succeed.
+ fix_code:
+   - language: yaml
+     label: 'Fix: Remove cache when no pip install runs'
+     code: |
+       # WRONG: cache: pip set but job only lints — no pip install → Post step fails
+       - uses: actions/setup-python@v4
+         with:
+           python-version: '3.12'
+           cache: 'pip' # ← REMOVE when no pip install follows
+       - name: Lint
+         run: flake8 . # no pip install; /home/runner/.cache/pip never created
+
+       # CORRECT: omit cache when the job does not install packages
+       - uses: actions/setup-python@v4
+         with:
+           python-version: '3.12'
+           # no cache input, so the Post step skips the cache save entirely
+       - name: Lint
+         run: flake8 .
+   - language: yaml
+     label: 'Fix: Upgrade to setup-python@v5 for graceful handling'
+     code: |
+       # CORRECT: v5+ emits a warning annotation instead of failing when cache path missing
+       - uses: actions/setup-python@v5
+         with:
+           python-version: '3.12'
+           cache: 'pip'
+       - name: Install dependencies
+         run: pip install -r requirements.txt
+ prevention:
+   - 'Only set cache: pip on jobs that actually run pip install — linting-only jobs should omit it.'
+   - 'Use actions/setup-python@v5 or later; it handles missing cache directories with a warning instead of a failure.'
+   - 'After bumping the Python patch version in python-version:, expect one cache-miss run and watch for Post step failures on that run only.'
+   - 'When using virtual environments (venv/pipenv/poetry), ensure pip still writes to the global cache or configure cache-dependency-path appropriately.'
+ docs:
+   - url: 'https://github.com/actions/setup-python/issues/1169'
+     label: 'setup-python#1169: Cache folder path doesn''t exist on disk (Aug 2025)'
+   - url: 'https://github.com/actions/setup-python'
+     label: 'actions/setup-python: Caching packages documentation'
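# Editorial sketch (not part of the published package diff): one possible way to
# confirm the root cause described above before the Post Setup Python step runs.
# It assumes a Linux runner with bash and a pip recent enough to provide
# `pip cache dir` (pip 20.1 or later). The step name is hypothetical.
- name: Report pip cache directory
  if: always()
  run: |
    CACHE_DIR="$(pip cache dir)"
    echo "pip cache dir: $CACHE_DIR"
    if [ -d "$CACHE_DIR" ]; then
      du -sh "$CACHE_DIR"   # directory exists, so the Post step has something to save
    else
      echo "::warning::pip cache directory is missing; the Post Setup Python cache save may fail"
    fi
# Placed as the last step of a job, this shows whether any earlier step created the directory.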
@@ -0,0 +1,127 @@
+ id: concurrency-timing-056
+ title: 'Workflow-level and job-level concurrency share same group key — deadlock cancellation fires immediately'
+ category: concurrency-timing
+ severity: error
+ tags:
+   - concurrency
+   - deadlock
+   - workflow-level
+   - job-level
+   - reusable-workflow
+   - workflow-call
+   - github-workflow-context
+ patterns:
+   - regex: 'Canceling since a deadlock for concurrency group .* was detected between'
+     flags: 'i'
+   - regex: 'deadlock.*concurrency group|concurrency group.*deadlock'
+     flags: 'i'
+ error_messages:
+   - "Canceling since a deadlock for concurrency group 'ci-refs/heads/main' was detected between 'top level workflow' and 'deploy'"
+   - "Canceling since a deadlock for concurrency group 'release-refs/heads/main' was detected between 'top level workflow' and 'api'"
+ root_cause: |
+   GitHub Actions fires a deadlock error and immediately cancels the offending job when the
+   same concurrency group name is held simultaneously by two levels within a single workflow
+   execution. Two distinct scenarios trigger this:
+
+   Scenario 1 — Same-workflow self-deadlock: A workflow file defines concurrency: at the
+   workflow level AND one of its jobs also defines jobs.<id>.concurrency: using an expression
+   that evaluates to the same string:
+
+     concurrency:
+       group: ${{ github.workflow }}-${{ github.ref }} # workflow-level slot
+
+     jobs:
+       deploy:
+         concurrency:
+           group: ${{ github.workflow }}-${{ github.ref }} # same string → deadlock
+         runs-on: ubuntu-latest
+
+   Scenario 2 — Reusable callee inherits caller context: A calling workflow has workflow-level
+   concurrency using ${{ github.workflow }}-${{ github.ref }}. The called reusable workflow
+   also defines workflow-level concurrency with the same expression. Because github.workflow
+   inside a reusable workflow inherits the CALLER's workflow name (not the callee's filename),
+   both evaluate to the identical group key and GitHub detects a deadlock.
+
+   github.workflow_ref also inherits from the top-level caller and does NOT produce a unique
+   value in the callee context; it cannot be used to distinguish caller from callee.
+ fix: |
+   Scenario 1 — Remove the duplicate concurrency block. Keep either the workflow-level OR
+   the job-level declaration, not both with the same key. If per-job isolation is needed,
+   append ${{ github.job }} to the job-level group name:
+
+     jobs:
+       deploy:
+         concurrency:
+           group: ${{ github.workflow }}-${{ github.ref }}-${{ github.job }}
+
+   Scenario 2 — Remove the concurrency: block entirely from the reusable workflow. The
+   caller's workflow-level concurrency already governs the entire execution. If the callee
+   needs standalone concurrency when triggered via workflow_dispatch, use a hardcoded unique
+   prefix instead of ${{ github.workflow }}:
+
+     concurrency:
+       group: deploy-${{ github.ref }} # hardcoded prefix avoids collision with any caller
+       cancel-in-progress: true
+ fix_code:
+   - language: yaml
+     label: 'Scenario 1 fix: remove duplicate job-level concurrency'
+     code: |
+       # WRONG — identical group at workflow level and job level → deadlock
+       concurrency:
+         group: ${{ github.workflow }}-${{ github.ref }}
+         cancel-in-progress: true
+
+       jobs:
+         deploy:
+           concurrency:
+             group: ${{ github.workflow }}-${{ github.ref }} # ← DELETE THIS
+           runs-on: ubuntu-latest
+           steps:
+             - run: echo deploying
+
+       # CORRECT — concurrency only at workflow level
+       concurrency:
+         group: ${{ github.workflow }}-${{ github.ref }}
+         cancel-in-progress: true
+
+       jobs:
+         deploy:
+           runs-on: ubuntu-latest
+           steps:
+             - run: echo deploying
+   - language: yaml
+     label: 'Scenario 2 fix: remove concurrency from reusable workflow'
+     code: |
+       # deploy.yml (reusable) — WRONG: workflow-level concurrency collides with caller
+       # because github.workflow returns the CALLER's name in reusable context
+       on:
+         workflow_call:
+         workflow_dispatch:
+       # concurrency: ← DELETE this entire block from the reusable workflow
+       #   group: ${{ github.workflow }}-${{ github.ref }}
+       #   cancel-in-progress: true
+
+       jobs:
+         deploy:
+           runs-on: ubuntu-latest
+           steps:
+             - run: echo deploying
+
+       # If standalone concurrency is needed for workflow_dispatch calls, use hardcoded prefix:
+       # concurrency:
+       #   group: deploy-${{ github.ref }} # hardcoded "deploy-" avoids collision
+       #   cancel-in-progress: true
+ prevention:
+   - 'Before adding concurrency: to a reusable workflow, check if it will be called via workflow_call — if so, remove it or use a hardcoded prefix.'
+   - 'Never use the same concurrency group expression at both the workflow level and job level in the same file.'
+   - 'Note that ${{ github.workflow }} and ${{ github.workflow_ref }} both return the top-level caller''s values inside a reusable workflow; neither provides the callee''s filename.'
+   - 'Use actionlint to statically detect identical concurrency groups — issue actionlint#538 tracks adding this check.'
+ docs:
+   - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/control-the-concurrency-of-workflows-and-jobs'
+     label: 'GitHub Docs: Controlling the concurrency of workflows and jobs'
+   - url: 'https://stackoverflow.com/questions/78101326/github-actions-concurrency-deadlock'
+     label: 'SO: GitHub Actions concurrency deadlock (Score 6, 1.7K views)'
+   - url: 'https://stackoverflow.com/questions/79511940/using-workflow-filename-in-concurrency-group-for-workflows-started-by-workflow-c'
+     label: 'SO: Using workflow filename in concurrency group for workflow_call (Score 3)'
+   - url: 'https://github.com/github/vscode-github-actions/issues/135'
+     label: 'vscode-github-actions#135: Identical concurrency groups cause silent never-run (14 reactions)'
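# Editorial sketch (not part of the published package diff): a caller workflow that
# would be expected to reproduce Scenario 2 if deploy.yml kept its own workflow-level
# concurrency block. The file names ci.yml and deploy.yml and the workflow name "CI"
# are assumptions for illustration only.
#
# .github/workflows/ci.yml (caller)
name: CI
on:
  push:
    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}   # evaluates to CI-refs/heads/main

jobs:
  deploy:
    # Inside deploy.yml, github.workflow still evaluates to "CI", so a callee group built
    # from the same expression is also CI-refs/heads/main and collides with the group above.
    uses: ./.github/workflows/deploy.yml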
@@ -0,0 +1,115 @@
+ id: concurrency-timing-057
+ title: "Fork PRs with Identical Branch Names Share Concurrency Group and Cancel Each Other"
+ category: concurrency-timing
+ severity: silent-failure
+ tags:
+   - concurrency
+   - fork
+   - pull_request
+   - head_ref
+   - cancel-in-progress
+   - silent-cancel
+ patterns:
+   - regex: 'group.*github\.head_ref'
+     flags: 'i'
+   - regex: 'This run was cancelled'
+     flags: 'i'
+ error_messages:
+   - "This run was cancelled."
+   - "Run was cancelled."
+ root_cause: |
+   When a workflow uses 'github.head_ref' as the sole identifier in a
+   concurrency group key (a common pattern to cancel stale runs on the same
+   branch), all pull requests that share a branch name across different forks
+   map to the SAME concurrency group. With 'cancel-in-progress: true', the
+   latest queued run cancels all earlier runs in that group — including runs
+   from completely unrelated PRs in OTHER contributor forks.
+
+   Example scenario:
+   - Fork A (alice/myrepo) opens PR from branch 'fix/auth-bug'.
+   - Fork B (bob/myrepo) opens a PR from a branch also named 'fix/auth-bug'.
+   - Both PRs target the same upstream repo (org/myrepo).
+   - Concurrency group: 'ci-fix/auth-bug' (from github.head_ref).
+   - When Fork B's PR triggers a run, it cancels Fork A's in-progress run.
+
+   The cancellation appears as "This run was cancelled" with no explanation
+   that a different fork's PR caused it. Maintainers and contributors see
+   flaky-looking CI with no obvious cause.
+
+   This is especially common in:
+   - Large open-source projects where many contributors use the same
+     conventional branch names (fix/typo, docs/readme, feature/x).
+   - Dependabot/Renovate PRs across forks — all use the same structured
+     branch name pattern (dependabot/npm_and_yarn/lodash-4.0.0).
+ fix: |
+   Include 'github.event.pull_request.number' in the concurrency group key.
+   PR numbers are unique per repository, so two PRs from different forks
+   always have different numbers even if their branch names collide.
+
+   Alternative: use 'github.run_id' for maximum uniqueness (no cancellation
+   across runs at all), but this defeats the purpose of cancel-in-progress
+   for the same PR.
+
+   The recommended pattern from GitHub documentation:
+     group: '${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}'
+
+   The '|| github.ref' fallback handles non-PR events (push, schedule)
+   where 'github.event.pull_request.number' is empty.
+ fix_code:
+   - language: yaml
+     label: "Broken — github.head_ref alone causes cross-fork cancellation"
+     code: |
+       concurrency:
+         group: ci-${{ github.head_ref }} # ❌ Collides across forks with same branch name
+         cancel-in-progress: true
+
+   - language: yaml
+     label: "Fixed — include PR number to ensure per-PR uniqueness"
+     code: |
+       concurrency:
+         # PR number is unique per repo — different forks never collide
+         group: '${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}'
+         cancel-in-progress: true
+
+   - language: yaml
+     label: "Alternative — include repo owner to scope per fork"
+     code: |
+       concurrency:
+         # Include head repo full_name to distinguish forks explicitly
+         group: >-
+           ${{ github.workflow }}-
+           ${{ github.event.pull_request.head.repo.full_name || github.repository }}-
+           ${{ github.event.pull_request.number || github.ref }}
+         cancel-in-progress: true
+
+   - language: yaml
+     label: "Recommended pattern from GitHub docs — workflow + PR number or ref"
+     code: |
+       name: CI
+       on:
+         pull_request:
+         push:
+           branches: [main]
+
+       concurrency:
+         group: '${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}'
+         cancel-in-progress: true
+
+       jobs:
+         test:
+           runs-on: ubuntu-latest
+           steps:
+             - uses: actions/checkout@v4
+             - run: ./run-tests.sh
+
+ prevention:
+   - "Never use github.head_ref alone as a concurrency group key for pull_request workflows."
+   - "Always pair github.head_ref with github.event.pull_request.number to scope to the specific PR."
+   - "Use the GitHub-recommended pattern: '${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}'."
+   - "Test concurrency behavior by opening two PRs from different forks with the same branch name before merging concurrency configuration."
+   - "On public repos with many external contributors, audit all concurrency group keys for cross-fork collision risk."
+ docs:
+   - url: "https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/using-concurrency"
+     label: "GitHub Docs — Using concurrency (recommended group key pattern)"
+   - url: "https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#concurrency"
+     label: "GitHub Docs — Workflow syntax: concurrency"
@@ -0,0 +1,121 @@
+ id: concurrency-timing-058
+ title: "concurrency queue:max 100-Slot Overflow Silently Cancels the Oldest Pending Run"
+ category: concurrency-timing
+ severity: silent-failure
+ tags:
+   - concurrency
+   - queue-max
+   - silent-cancel
+   - pending
+   - overflow
+   - deployment
+ patterns:
+   - regex: 'queue.*max|queue:.*max'
+     flags: 'i'
+   - regex: 'This run was cancelled|run.*cancelled'
+     flags: 'i'
+ error_messages:
+   - "This run was cancelled."
+   - "Run was cancelled."
+ root_cause: |
+   GitHub Actions concurrency queue:max allows up to 100 workflow runs or jobs
+   to wait in a single concurrency group instead of being cancelled immediately.
+   When the 100-slot queue is already full and a new run arrives, the oldest
+   pending run in the queue is silently cancelled to make room.
+
+   The cancelled run shows "This run was cancelled." in the UI with no
+   indication that the cancellation was caused by queue overflow. This message
+   is identical to normal concurrency cancel-in-progress cancellations, making
+   it impossible to distinguish overflow cancellation from intentional
+   cancellation without additional observability.
+
+   This problem surfaces when:
+   - A pipeline is slower than the push rate (runs queue faster than they drain)
+   - A burst of commits or tags fires many workflow runs simultaneously
+   - A monorepo path filter causes many unrelated commits to all queue in the
+     same deployment concurrency group
+   - A workflow is accidentally triggered on every push to any branch and all
+     share a single production-deploy concurrency group
+
+   Note: queue:max and cancel-in-progress:true cannot be combined (validation
+   error). queue:max assumes you want all runs to eventually execute in order.
+ fix: |
+   queue:max with 100 slots is generous for most pipelines. If you are hitting
+   the overflow limit, the queue depth is a symptom of a throughput mismatch:
+
+   1. Speed up the job so the queue drains faster than it fills. Profile and
+      optimize the slowest steps (build caching, parallelism inside the job).
+
+   2. Narrow the trigger to reduce unnecessary queuing:
+      - Use paths: filters so only relevant changes trigger the workflow
+      - Only trigger on specific branches (main, release/*) not all branches
+      - Use workflow_dispatch for on-demand deploys instead of automatic pushes
+
+   3. Split into multiple concurrency groups scoped per environment or service
+      component. Each group has its own 100-slot limit, distributing capacity
+      across workloads.
+
+   4. If strict ordering is not required and drops are acceptable, switch back
+      to queue:single (the default) with cancel-in-progress:false so only one
+      pending run is kept, avoiding the 100-slot limit entirely.
+ fix_code:
+   - language: yaml
+     label: "queue:max — silently cancels oldest pending run when >100 queued"
+     code: |
+       on:
+         push:
+           branches: ['**'] # Triggers on every push to every branch
+
+       concurrency:
+         group: production-deploy # All branches share ONE slot — queue fills fast
+         queue: max # Up to 100 queued; 101st silently cancels the 1st
+
+   - language: yaml
+     label: "Fixed — scope concurrency group per environment + narrow push trigger"
+     code: |
+       on:
+         push:
+           branches:
+             - main
+             - 'release/**'
+
+       concurrency:
+         # Each distinct group name gets its own 100-slot queue. On push events
+         # github.event.deployment is empty, so the environment part falls back to
+         # 'staging' and the ref name is what scopes the queue per branch.
+         group: deploy-${{ github.event.deployment.environment || 'staging' }}-${{ github.ref_name }}
+         queue: max
+
+   - language: yaml
+     label: "Fixed — add observability step to detect queue overflow"
+     code: |
+       jobs:
+         deploy:
+           runs-on: ubuntu-latest
+           concurrency:
+             group: production-deploy
+             queue: max
+           steps:
+             - name: Check concurrency queue depth
+               env:
+                 GH_TOKEN: ${{ github.token }}
+               run: |
+                 PENDING=$(gh api \
+                   "repos/${{ github.repository }}/actions/runs?status=waiting&per_page=100" \
+                   --jq '[.workflow_runs[] | select(.name == "${{ github.workflow }}")] | length')
+                 echo "Runs currently waiting in concurrency queue: $PENDING"
+                 if [ "${PENDING:-0}" -ge 90 ]; then
+                   echo "::warning::Queue near capacity (${PENDING}/100). Oldest run may be dropped on next push."
+                 fi
+
+             - name: Deploy
+               run: ./deploy.sh
+ prevention:
+   - "Monitor pipeline throughput: if runs consistently back up to 50+ in queue, the job is too slow for the push frequency."
+   - "Narrow workflow triggers with paths: and branches: filters to reduce unnecessary queuing."
+   - "Scope concurrency groups per environment or service to distribute the 100-slot limit across workloads."
+   - "Use queue:max only when strict ordering matters (deployments). For CI checks, cancel-in-progress:true is preferable."
+   - "The combination queue:max and cancel-in-progress:true is a workflow validation error — do not use both."
+ docs:
+   - url: "https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/using-concurrency"
+     label: "GitHub Docs — Using concurrency (queue:max behavior and 100-slot limit)"
+   - url: "https://docs.github.com/en/actions/reference/limits"
+     label: "GitHub Actions Limits — Concurrency group queue: 100 runs per group"
@@ -0,0 +1,117 @@
+ id: known-unsolved-067
+ title: 'ubuntu-24.04 Runner df Reports 12-15 GB Ghost Disk Usage — Invisible to du/lsof'
+ category: known-unsolved
+ severity: silent-failure
+ tags:
+   - ubuntu-24
+   - disk-space
+   - enospc
+   - runner-agent
+   - diagnostics
+   - phantom-disk
+   - playwright
+   - hosted-runner
+ patterns:
+   - regex: 'ENOSPC:\s*no space left on device'
+     flags: 'i'
+   - regex: 'df\s+/\s+.*\d{4,5}M\s+.*\d+%'
+     flags: 'i'
+   - regex: 'No space left on device'
+     flags: 'i'
+ error_messages:
+   - 'ENOSPC: no space left on device, write'
+   - 'No space left on device'
+   - 'df: cannot read table of mounted file systems: No space left on device'
+ root_cause: |
+   On ubuntu-24.04 hosted runners, `df /` can report 12–15 GB of disk used
+   during heavy test runs (particularly those spawning many short-lived child
+   processes or producing large volumes of stdout, such as Playwright WebKit /
+   WPE test suites). This usage CANNOT be accounted for by:
+
+   - `du -shx /` (sum of all directories does not grow)
+   - `lsof +L1` (deleted-but-open files show only kernel /memfd:* entries)
+   - /proc/<PID>/maps (only kernel memfd entries)
+   - /proc/<PID>/io write_bytes (single-digit MB cumulative)
+
+   The ghost usage RECOVERS fully ~40 seconds after the job's main process
+   exits — gradually, over ~10 seconds — even though all child processes are
+   already reaped at recovery start. This rules out lingering processes holding
+   mmap'd files.
+
+   Best-guess root cause (unconfirmed by GitHub team as of June 2026): the
+   runner agent's diagnostic/log buffers are flushed periodically on the host
+   and the flushed bytes are counted in the container's `df` view but are not
+   visible from inside the runner's PID namespace. The ~40-second recovery delay
+   is consistent with a periodic flush cycle on the agent side.
+
+   This issue is non-deterministic and tied to the state of the underlying host
+   VM. The same workload run locally on ubuntu-24.04 does not reproduce.
+
+   Affected environments:
+   - Native `ubuntu-24.04` hosted runner
+   - Containers running on the `ubuntu-24.04` runner (which share the host's /)
+
+   Not affected: a local, self-hosted ubuntu-24.04 VM does NOT reproduce the issue.
+
+   Tracked upstream: https://github.com/actions/runner/issues/4448 (open, May 2026)
+ fix: |
+   There is NO user-side fix for the phantom disk usage itself — this is
+   infrastructure-level behaviour outside the workflow's control.
+
+   Mitigations to prevent ENOSPC failures:
+
+   1. Use a larger runner (8-core or 16-core). Larger runner classes are
+      allocated more disk and run on different host hardware.
+
+   2. Reduce stdout volume by adding --quiet / --silent flags to test runners
+      and package managers (npm ci --quiet, pytest -q, etc.).
+
+   3. Pre-clean the runner's docker layer cache and tool downloads that are
+      not needed:
+        - name: Free disk space
+          run: |
+            sudo rm -rf /usr/share/dotnet
+            sudo rm -rf /opt/ghc
+            sudo rm -rf /usr/local/lib/android
+            docker system prune -af
+
+   4. Split the job into smaller parallel matrix jobs to reduce per-job output.
+
+   5. Monitor disk in a background step to detect the ghost spike early and
+      correlate it with failures.
+ fix_code:
+   - language: yaml
+     label: 'Pre-clean unused runner tools to reclaim disk headroom'
+     code: |
+       steps:
+         - name: Free runner disk space
+           run: |
+             sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android
+             sudo apt-get clean
+             docker system prune -af --volumes || true
+             df -h / # confirm headroom before heavy tests
+
+         - name: Run Playwright tests
+           run: npx playwright test
+   - language: yaml
+     label: 'Use a larger runner with more disk allocation'
+     code: |
+       jobs:
+         test:
+           # Example label (or ubuntu-24.04-x64-8-cores); use the larger-runner
+           # name configured for your organization.
+           runs-on: ubuntu-latest-8-cores
+           steps:
+             - run: npx playwright test
+ prevention:
+   - 'Add `df -h /` before and after heavy test steps to measure actual disk
+     consumption and detect when the ghost spike occurs.'
+   - 'Reduce test output verbosity — the agent diagnostic buffer hypothesis
+     correlates large stdout volumes with larger phantom disk readings.'
+   - 'For Playwright/WebKit CI that regularly sees ENOSPC: switch to
+     `ubuntu-24.04` larger runners or use `--reporter=dot` to minimise output.'
+   - 'Do not rely on `du -shx /` for disk capacity planning on hosted runners —
+     `df /` may show significantly more usage than du can account for during
+     heavy-output jobs.'
+ docs:
+   - url: 'https://github.com/actions/runner/issues/4448'
+     label: 'runner #4448 — df reports 12-15 GB ghost disk usage on ubuntu-24.04 runner (open, May 2026)'
+   - url: 'https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources'
+     label: 'GitHub Docs — Hosted runner hardware resources (disk sizes per runner class)'
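# Editorial sketch (not part of the published package diff): one possible shape for
# mitigation 5 above, a background disk monitor. It assumes a Linux hosted runner
# with bash and GNU df; the step names and the disk.log file name are hypothetical.
steps:
  - name: Start disk monitor (background)
    run: |
      # Sample used/available space on / every 5 seconds into a log file.
      # The loop keeps running across later steps and is cleaned up when the job ends.
      nohup bash -c 'while true; do echo "$(date -u +%T) $(df -BM --output=used,avail / | tail -1)"; sleep 5; done' \
        > "$RUNNER_TEMP/disk.log" 2>&1 &

  - name: Run Playwright tests
    run: npx playwright test

  - name: Show disk usage timeline
    if: always()   # print the samples even when the tests fail with ENOSPC
    run: cat "$RUNNER_TEMP/disk.log"
# Comparing the timestamps of a spike in the log with the failing step's timestamps
# helps correlate ENOSPC errors with the ghost usage.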