npm - @htekdev/actions-debugger - Versions diffs - 1.0.117 → 1.0.119 - Mend

@htekdev/actions-debugger 1.0.117 → 1.0.119

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

package/errors/known-unsolved/known-unsolved-067.yml ADDED

@@ -0,0 +1,117 @@
+id: known-unsolved-067
+title: 'ubuntu-24.04 Runner df Reports 12-15 GB Ghost Disk Usage — Invisible to du/lsof'
+category: known-unsolved
+severity: silent-failure
+tags:
+  - ubuntu-24
+  - disk-space
+  - enospc
+  - runner-agent
+  - diagnostics
+  - phantom-disk
+  - playwright
+  - hosted-runner
+patterns:
+  - regex: 'ENOSPC:\s*no space left on device'
+    flags: 'i'
+  - regex: 'df\s+/\s+.*\d{4,5}M\s+.*\d+%'
+    flags: 'i'
+  - regex: 'No space left on device'
+    flags: 'i'
+error_messages:
+  - 'ENOSPC: no space left on device, write'
+  - 'No space left on device'
+  - 'df: cannot read table of mounted file systems: No space left on device'
+root_cause: |
+  On ubuntu-24.04 hosted runners, `df /` can report 12–15 GB of disk used
+  during heavy test runs (particularly those spawning many short-lived child
+  processes or producing large volumes of stdout, such as Playwright WebKit /
+  WPE test suites). This usage CANNOT be accounted for by:
+  - `du -shx /` (sum of all directories does not grow)
+  - `lsof +L1` (deleted-but-open files show only kernel /memfd:* entries)
+  - /proc/<PID>/maps (only kernel memfd entries)
+  - /proc/<PID>/io write_bytes (single-digit MB cumulative)
+  The ghost usage RECOVERS fully ~40 seconds after the job's main process
+  exits — gradually, over ~10 seconds — even though all child processes are
+  already reaped at recovery start. This rules out lingering processes holding
+  mmap'd files.
+  Best-guess root cause (unconfirmed by GitHub team as of June 2026): the
+  runner agent's diagnostic/log buffers are flushed periodically on the host
+  and the flushed bytes are counted in the container's `df` view but are not
+  visible from inside the runner's PID namespace. The ~40-second recovery delay
+  is consistent with a periodic flush cycle on the agent side.
+  This issue is non-deterministic and tied to the state of the underlying host
+  VM. The same workload run locally on ubuntu-24.04 does not reproduce.
+  Affected environments:
+  - Native `ubuntu-24.04` hosted runner
+  - Containers running on the `ubuntu-24.04` runner (which share the host's /)
+  Not affected: a local or self-hosted ubuntu-24.04 VM (the issue does not reproduce there).
+  Tracked upstream: https://github.com/actions/runner/issues/4448 (open, May 2026)
+fix: |
+  There is NO user-side fix for the phantom disk usage itself — this is
+  infrastructure-level behaviour outside the workflow's control.
+  Mitigations to prevent ENOSPC failures:
+  1. Use a larger runner (8-core or 16-core) — larger runner classes have
+     more disk allocated on different host hardware.
+  2. Reduce stdout volume by adding --quiet / --silent flags to test runners
+     and package managers (npm ci --quiet, pytest -q, etc.).
+  3. Pre-clean the runner's docker layer cache and tool downloads that are
+     not needed:
+       - name: Free disk space
+         run: |
+           sudo rm -rf /usr/share/dotnet
+           sudo rm -rf /opt/ghc
+           sudo rm -rf /usr/local/lib/android
+           docker system prune -af
+  4. Split the job into smaller parallel matrix jobs to reduce per-job output.
+  5. Monitor disk in a background step to detect the ghost spike early and
+     correlate it with failures.
+fix_code:
+  - language: yaml
+    label: 'Pre-clean unused runner tools to reclaim disk headroom'
+    code: |
+      steps:
+        - name: Free runner disk space
+          run: |
+            sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android
+            sudo apt-get clean
+            docker system prune -af --volumes || true
+            df -h /   # confirm headroom before heavy tests
+        - name: Run Playwright tests
+          run: npx playwright test
+  - language: yaml
+    label: 'Use a larger runner with more disk allocation'
+    code: |
+      jobs:
+        test:
+          runs-on: ubuntu-latest-8-cores   # example label; larger-runner labels are chosen when the runner is created, so substitute your own
+          steps:
+            - run: npx playwright test
+prevention:
+  - 'Add `df -h /` before and after heavy test steps to measure actual disk
+    consumption and detect when the ghost spike occurs.'
+  - 'Reduce test output verbosity — the agent diagnostic buffer hypothesis
+    correlates large stdout volumes with larger phantom disk readings.'
+  - 'For Playwright/WebKit CI that regularly sees ENOSPC: switch to
+    `ubuntu-24.04` larger runners or use `--reporter=dot` to minimise output.'
+  - 'Do not rely on `du -shx /` for disk capacity planning on hosted runners —
+    `df /` may show significantly more usage than du can account for during
+    heavy-output jobs.'
+docs:
+  - url: 'https://github.com/actions/runner/issues/4448'
+    label: 'runner #4448 — df reports 12-15 GB ghost disk usage on ubuntu-24.04 runner (open, May 2026)'
+  - url: 'https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources'
+    label: 'GitHub Docs — Hosted runner hardware resources (disk sizes per runner class)'
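Mitigation 5 in the entry above (monitor disk in a background step) can be sketched in shell. This is a minimal sketch, not part of the package: the function names `df_used_kb`, `du_used_kb`, `gap_mb`, and `monitor_disk` are hypothetical, and the sampling interval and count are arbitrary defaults. It logs how much of the `df` "used" figure `du` cannot account for, which is the gap the entry describes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for logging the df-vs-du gap on a runner.

# Used space on a mount point in KiB, as df reports it (POSIX output format).
df_used_kb() {
  df -Pk "${1:-/}" | awk 'NR==2 {print $3}'
}

# Space in KiB that du can attribute to files on the same filesystem.
# Without root, unreadable directories are skipped, so this undercounts.
du_used_kb() {
  du -skx "${1:-/}" 2>/dev/null | awk '{print $1}'
}

# Difference between the two readings, in whole MiB (integer division).
gap_mb() {
  local df_kb=$1 du_kb=$2
  echo $(( (df_kb - du_kb) / 1024 ))
}

# Print one sample every INTERVAL seconds, COUNT times.
monitor_disk() {
  local interval=${1:-10} count=${2:-6} path=${3:-/} i d u
  for (( i = 0; i < count; i++ )); do
    d=$(df_used_kb "$path")
    u=$(du_used_kb "$path")
    echo "$(date -u +%H:%M:%S) df=${d}K du=${u}K gap=$(gap_mb "$d" "$u")MiB"
    sleep "$interval"
  done
}
```

In a workflow step, something like `monitor_disk 10 60 / > disk.log &` before the heavy tests would leave a log to correlate with an ENOSPC failure. Note that `du` over `/` is itself slow and I/O heavy, so a longer interval may be preferable.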

package/errors/known-unsolved/known-unsolved-068.yml ADDED

@@ -0,0 +1,124 @@
+id: known-unsolved-068
+title: "Step outcome cannot distinguish timeout from failure — both report as 'failure' in steps context"
+category: known-unsolved
+severity: limitation
+tags:
+  - timeout-minutes
+  - outcome
+  - conclusion
+  - continue-on-error
+  - steps-context
+  - retry
+  - known-limitation
+  - no-fix
+patterns:
+  - regex: 'steps\.\w+\.outcome\s*==\s*.failure.'
+    flags: 'i'
+  - regex: 'timeout-minutes.*continue-on-error|continue-on-error.*timeout-minutes'
+    flags: 'im'
+  - regex: 'The process.*timed out after \d+ minutes'
+    flags: 'i'
+error_messages:
+  - "Error: The process '/usr/bin/bash' failed with exit code 1"
+  - 'Error: Process completed with exit code 1'
+root_cause: |
+  GitHub Actions exposes two result fields for completed steps in the steps context:
+  - steps.<id>.outcome: the raw result before continue-on-error is applied.
+    Possible values: success, failure, cancelled, skipped.
+  - steps.<id>.conclusion: the final result after continue-on-error is applied.
+    When continue-on-error: true is set on a failed step, conclusion becomes 'success'
+    even if outcome is 'failure'.
+  Neither field distinguishes between a step that failed because the process exited with a
+  non-zero code and a step that failed because it hit its timeout-minutes limit. Both
+  scenarios set outcome to 'failure'. There is no 'timed_out' value, no
+  steps.<id>.timed_out boolean, and no built-in expression function to query the reason
+  for failure.
+  This means workflows cannot natively:
+  - Retry only on timeout while failing fast on real errors
+  - Alert with different severity for timeouts vs application failures
+  - Auto-escalate timeout-minutes only when a timeout (not a logic error) occurred
+  The limitation has been a known open request in the GitHub Actions community since at
+  least 2022 with no current implementation timeline from GitHub.
+fix: |
+  No native fix exists within GitHub Actions expressions. Two manual workarounds are
+  available in bash-based steps:
+  1. Record start time and compute elapsed duration at the next step to infer timeout:
+     Compare elapsed seconds against the timeout-minutes threshold. A step that used
+     approximately 100% of its time budget likely timed out.
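+  2. Write a sentinel file immediately after the long-running command returns and check
+     for it in the next step. A timed-out step is killed before the sentinel is written.
+     Under the default `bash -e` shell a failing command also aborts the step before the
+     sentinel line, so capture its exit code (for example `cmd || rc=$?`) and write the
+     sentinel regardless of that code; otherwise the two cases look identical.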
+  Neither workaround is exact — both have race conditions and edge cases. The most
+  reliable approach is to implement timeout detection inside the script itself using
+  shell signals or test-framework timeout flags.
+fix_code:
+  - language: yaml
+    label: 'Workaround 1: Infer timeout via elapsed time'
+    code: |
+      - name: Start timer
+        id: timer
+        run: echo "start=$(date +%s)" >> "$GITHUB_OUTPUT"
+      - name: Run slow tests
+        id: tests
+        timeout-minutes: 10
+        continue-on-error: true
+        run: npm test
+      - name: Classify failure type
+        if: steps.tests.outcome == 'failure'
+        env:
+          START: ${{ steps.timer.outputs.start }}
+        run: |
+          elapsed=$(( $(date +%s) - START ))
+          timeout_secs=600   # 10 minutes in seconds
+          threshold=$(( timeout_secs - 30 ))  # within 30s of limit → likely timeout
+          if [ "$elapsed" -ge "$threshold" ]; then
+            echo "::warning::Step likely timed out (elapsed ${elapsed}s, limit ${timeout_secs}s)"
+            # Handle timeout-specific logic here (e.g., don't fail, just warn)
+          else
+            echo "::error::Step failed (exit code, not timeout — elapsed ${elapsed}s)"
+            exit 1
+          fi
+  - language: yaml
+    label: 'Workaround 2: Sentinel file to detect timeout vs normal failure'
+    code: |
+      - name: Run tests with sentinel
+        id: tests
+        timeout-minutes: 10
+        continue-on-error: true
+        run: |
+          rc=0
+          # Capture the exit code so the default `bash -e` shell does not
+          # abort the step before the sentinel is written:
+          npm test || rc=$?
+          # Reached whenever npm test returns (pass or fail), never on timeout:
+          echo "$rc" > /tmp/test-exit-code
+          exit "$rc"
+      - name: Check failure reason
+        if: steps.tests.outcome == 'failure'
+        run: |
+          if [ ! -f /tmp/test-exit-code ]; then
+            # No sentinel: the step was killed before npm test returned.
+            echo "::warning::Step was killed before npm test returned (likely timeout)"
+          else
+            echo "::error::npm test exited with code $(cat /tmp/test-exit-code) (application failure)"
+            exit 1
+          fi
+prevention:
+  - 'Log test durations inside the script itself; test framework flags like --testTimeout (Jest) or --timeout (Mocha) provide per-test granularity inside logs.'
+  - 'Use separate jobs for steps with different timeout characteristics — a dedicated integration-test job with a high timeout-minutes and a unit-test job with a low one makes failures easier to categorize.'
+  - 'If the step runs a single long command, wrap it in a shell timeout with a slightly shorter duration than timeout-minutes; the shell timeout exit code (124) is detectable inside the same step.'
+docs:
+  - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs#steps-context'
+    label: 'GitHub Docs: steps context — outcome and conclusion fields'
+  - url: 'https://stackoverflow.com/questions/78233438/github-action-cannot-get-timeout-status-from-previous-step'
+    label: 'SO: Cannot get timeout status from previous step (Mar 2024)'
+  - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/using-conditions-to-control-job-execution'
+    label: 'GitHub Docs: Status check functions (failure, success, cancelled, always)'
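The third prevention item in the entry above (wrap the command in a shell `timeout` slightly shorter than `timeout-minutes`) can be sketched as follows. This is a minimal sketch assuming GNU coreutils `timeout` is on the PATH, as it is on Ubuntu; the helper name `classify_exit` and the output key `reason` are hypothetical. GNU `timeout` exits with status 124 when the limit is reached, which the step can inspect before it exits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an exit status from `timeout CMD` to a label.
# 124 is the status GNU timeout returns when the time limit was hit.
classify_exit() {
  case "$1" in
    0)   echo "success" ;;
    124) echo "timeout" ;;
    *)   echo "failure" ;;
  esac
}

# Example use inside a `run:` block (570s leaves 30s of slack under
# timeout-minutes: 10); commented out so the helper can be sourced alone:
#
#   rc=0
#   timeout 570 npm test || rc=$?
#   echo "reason=$(classify_exit "$rc")" >> "$GITHUB_OUTPUT"
#   exit "$rc"
```

A later step could then branch on `steps.<id>.outputs.reason` instead of `steps.<id>.outcome`. One caveat: a command that itself exits with 124 is indistinguishable from a timeout under this scheme.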

package/errors/known-unsolved/node-action-post-step-wrong-inputs-nested-composite.yml ADDED

@@ -0,0 +1,133 @@
+id: known-unsolved-066
+title: 'Node Action Post Step Receives Wrong INPUT_* Env Vars When Called Through Nested Composite Action'
+category: known-unsolved
+severity: error
+tags:
+  - composite-actions
+  - post-step
+  - node-action
+  - nested
+  - inputs
+  - runner-bug
+  - savestate
+patterns:
+  - regex: 'Input required and not supplied:\s*\S+'
+    flags: 'i'
+  - regex: 'Post job cleanup\.\s*\n.*Input required'
+    flags: 'im'
+  - regex: 'post.*wrong.*input|INPUT_.*post.*ancestor'
+    flags: 'i'
+error_messages:
+  - 'Input required and not supplied: token'
+  - 'Input required and not supplied: test'
+  - 'Error: Input required and not supplied: <input-name>'
+  - 'Post job cleanup.'
+root_cause: |
+  Runner bug (actions/runner#3514, actions/runner#2030, open since 2022): when a Node.js
+  action that has a `post:` step is invoked through one or more composite action layers,
+  the runner restores the wrong `INPUT_*` environment variables for the post step execution.
+  At post-step execution time, the runner sets environment variables from the nearest
+  ancestor composite action's inputs, not from the inputs actually passed to the Node
+  action itself. For example:
+    Workflow → outer-composite (inputs: image-tag: "foo") → inner-composite
+      → node-action (inputs: image-tag: "foo")
+  The node action's main step correctly sees its own input (`INPUT_IMAGE-TAG=foo`;
+  input names are upper-cased and spaces, not hyphens, become underscores).
+  In the post step, that variable is absent or overwritten by the outer composite's
+  value (or another ancestor composite's value), causing:
+  - Required inputs to appear missing → visible error in post cleanup
+  - Optional inputs to resolve to wrong values → silent wrong behavior (e.g.,
+    devcontainers/ci pushes image with wrong tag; codeql-action uploads SARIF with
+    wrong token)
+  This affects ANY Node action with a post step that is called through composite
+  action nesting (depth ≥ 2). Widely used actions known to be affected include
+  github/codeql-action (fixed via workaround in codeql-action#2557) and
+  pnpm/action-setup (issue #253).
+fix: |
+  GitHub has not fixed the runner bug. The canonical workaround is to persist inputs
+  in the action's main step using `core.saveState()` and read them back in the post
+  step using `core.getState()` instead of `core.getInput()`.
+  Known adopters of this workaround:
+  - github/codeql-action (PR #2557): saveState for upload inputs
+  The same pattern applies to any action that reads inputs in its post step.
+  If you maintain a Node action with a post step, add to your main.ts:
+    core.saveState('my-input', core.getInput('my-input'));
+  And in your post.ts:
+    const val = core.getState('my-input');  // use this instead of core.getInput
+  If you are a workflow author calling a third-party action through composite layers
+  and seeing wrong post-step behavior, check whether the action uses `core.getInput`
+  in its post step. If so, file an issue with the action maintainer referencing
+  actions/runner#3514 and the saveState workaround.
+fix_code:
+  - language: typescript
+    label: 'Action main.ts — persist inputs to state before post step runs'
+    code: |
+      import * as core from '@actions/core';
+      async function run() {
+        // Read inputs normally in main
+        const token = core.getInput('token', { required: true });
+        const imageName = core.getInput('image-name');
+        // Persist for post step (workaround for runner#3514)
+        core.saveState('token', token);
+        core.saveState('image-name', imageName);
+        // ... rest of main logic
+      }
+      run();
+  - language: typescript
+    label: 'Action post.ts — read from state, NOT core.getInput'
+    code: |
+      import * as core from '@actions/core';
+      async function runPost() {
+        // Use getState, NOT getInput — inputs are wrong in post step
+        // when called through nested composite action (runner#3514)
+        const token = core.getState('token');
+        const imageName = core.getState('image-name');
+        // ... post step logic using state values
+      }
+      runPost();
+  - language: yaml
+    label: 'action.yml — declare post step'
+    code: |
+      name: 'My Action'
+      inputs:
+        token:
+          required: true
+        image-name:
+          required: false
+      runs:
+        using: 'node20'
+        main: 'dist/main.js'
+        post: 'dist/post.js'
+        post-if: always()
+prevention:
+  - 'In any Node action with a post: step, always use core.saveState/core.getState for
+    inputs consumed in post, never core.getInput — this is defensive programming regardless
+    of nesting depth'
+  - 'Test your action in a composite action wrapper (at least 2 layers deep) to catch
+    this bug before publishing'
+  - 'Check release notes of actions/runner for fixes to this bug before assuming
+    the built-in behavior is fixed'
+  - 'When using actions with known post-step input bugs through composite layers,
+    consider calling them directly from the workflow (not nested in a composite) as
+    a temporary workaround'
+docs:
+  - url: 'https://github.com/actions/runner/issues/3514'
+    label: 'actions/runner#3514 — Wrong environment passed to node post when called by composite called by composite'
+  - url: 'https://github.com/actions/runner/issues/2030'
+    label: 'actions/runner#2030 — Composite: Nested actions post steps have the wrong context (open since 2022)'
+  - url: 'https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#sending-values-to-the-pre-and-post-actions'
+    label: 'GitHub Docs — Sending values to pre and post actions (saveState/getState)'
+  - url: 'https://github.com/github/codeql-action/pull/2557'
+    label: 'codeql-action#2557 — Fix: persist inputs between upload action and post step (reference implementation)'

package/errors/known-unsolved/ubuntu-24-04-arm64-missing-binder-ashmem-kernel-modules.yml ADDED

@@ -0,0 +1,149 @@
+id: known-unsolved-065
+title: 'ubuntu-24.04-arm64 Hosted Runner Kernel Missing binder_linux and ashmem_linux Modules — Android Container Tests Fail'
+category: known-unsolved
+severity: limitation
+tags:
+  - ubuntu-24.04-arm64
+  - arm64
+  - kernel-modules
+  - android
+  - binder
+  - ashmem
+  - container
+  - known-limitation
+patterns:
+  - regex: 'modprobe: FATAL: Module binder_linux not found|modprobe.*binder_linux.*not found'
+    flags: 'i'
+  - regex: 'modprobe: FATAL: Module ashmem_linux not found|modprobe.*ashmem_linux.*not found'
+    flags: 'i'
+  - regex: '/dev/binder: No such file or directory|/dev/ashmem: No such file or directory'
+    flags: 'i'
+  - regex: 'CONFIG_ANDROID_BINDER_IPC.*not set|binder_linux.*absent.*kernel'
+    flags: 'i'
+error_messages:
+  - 'modprobe: FATAL: Module binder_linux not found in directory /lib/modules/<kernel>'
+  - 'modprobe: FATAL: Module ashmem_linux not found in directory /lib/modules/<kernel>'
+  - '/dev/binder: No such file or directory'
+  - '/dev/ashmem: No such file or directory'
+  - 'modprobe binder_linux exited with code 1'
+root_cause: |
+  The ubuntu-24.04-arm64 GitHub Actions hosted runner kernel is compiled WITHOUT
+  `CONFIG_ANDROID_BINDER_IPC=m` and `CONFIG_ANDROID_BINDERFS=m`. This means the
+  `binder_linux` and `ashmem_linux` kernel modules do not exist in
+  `/lib/modules/$(uname -r)/` on ARM64 runners.
+  Attempting `modprobe binder_linux` fails with exit code 1 on ARM64 because
+  the module is absent from the kernel's module tree; it cannot be loaded even
+  from a privileged container.
+  **Why x86_64 hosted runners DO have these modules:**
+  The x86_64 ubuntu-24.04 hosted runners expose `binder` and `ashmem` because
+  they support the [GitHub Actions KVM Android hardware acceleration feature](https://github.blog/changelog/2024-04-02-github-actions-hardware-accelerated-android-virtualization-now-available/)
+  (released April 2024). The x86_64 kernel was explicitly built with Android
+  IPC support to enable this feature. The ARM64 kernel image is compiled
+  separately and was not built with these modules.
+  **Affected use cases:**
+  - Running [ReDroid](https://github.com/remote-android/redroid-doc) (GPU-enabled
+    Android-in-Docker) on native ARM64 CI for ARM-native app testing
+  - Running [Waydroid](https://waydro.id/) in ARM64 CI containers
+  - Any workflow that requires `/dev/binder` or `/dev/ashmem` devices
+  **Confirmed on:**
+  - ubuntu-24.04-arm64 image 20260531.15.1 (kernel aarch64, Azure westus2)
+  - Source: actions/runner-images#14184 (feature request, June 2026, open)
+  **No current workaround exists** — the module cannot be compiled from source
+  against the runner's kernel headers without the kernel source tree, and the
+  azure-linux kernel for ARM64 is not shipped with headers for out-of-tree
+  `binder_linux` builds. A request to add the modules to the ubuntu-24.04-arm64
+  image is tracked at runner-images#14184 and is currently open with no ETA.
+fix: |
+  **No complete fix is currently available for native ARM64 hosted runners.**
+  **Option 1 (Recommended): Use ubuntu-24.04 x86_64 runners for Android tests**
+  The x86_64 ubuntu-24.04 runner has `binder_linux` and `ashmem_linux` and
+  supports Android container tests via KVM acceleration. ARM-native testing can
+  be approximated via the Android native bridge, at the cost of some overhead:
+  ```yaml
+  android-test:
+    runs-on: ubuntu-24.04   # x86_64 — has binder_linux/ashmem_linux
+    steps:
+      - uses: actions/checkout@v6
+      - run: |
+          sudo modprobe binder_linux  # succeeds on x86_64
+          sudo modprobe ashmem_linux
+          # ... run ReDroid or Waydroid Android container tests
+  ```
+  **Option 2: Use real ARM hardware (Firebase Test Lab, self-hosted)**
+  For true ARM64 profiling (power/wakelock, native execution) use Firebase
+  Test Lab with physical Pixel hardware, or a self-hosted ARM64 runner on
+  hardware/VMs that expose binder devices.
+  **Option 3: Compile binder_linux from source (complex, unreliable)**
+  Without kernel headers matching the runner kernel, this is not practical
+  on GitHub-hosted runners.
+  **Track for a platform fix:** Follow actions/runner-images#14184 for progress
+  on adding `binder_linux`/`ashmem_linux` to the ubuntu-24.04-arm64 runner image.
+fix_code:
+  - language: yaml
+    label: 'Route Android container tests to x86_64 runner which has binder_linux/ashmem_linux'
+    code: |
+      jobs:
+        android-container-test:
+          # ubuntu-24.04-arm64 runner kernel is compiled without CONFIG_ANDROID_BINDER_IPC=m
+          # Use x86_64 runner which has binder_linux/ashmem_linux for KVM Android acceleration.
+          # Track runner-images#14184 for ARM64 kernel module support.
+          runs-on: ubuntu-24.04     # x86_64 — binder_linux available
+          container:
+            image: redroid/redroid:15.0.0-latest
+            options: --privileged
+          steps:
+            - uses: actions/checkout@v6
+            - name: Verify binder device
+              run: ls -la /dev/binder /dev/ashmem
+            - name: Run instrumented tests
+              run: ./gradlew connectedAndroidTest
+  - language: yaml
+    label: 'Guard ARM64 jobs against missing kernel modules'
+    code: |
+      jobs:
+        android-test:
+          runs-on: ${{ matrix.runner }}
+          strategy:
+            matrix:
+              runner: [ubuntu-24.04, ubuntu-24.04-arm64]
+          steps:
+            - uses: actions/checkout@v6
+            - name: Check binder_linux availability
+              id: binder-check
+              run: |
+                if sudo modprobe binder_linux 2>/dev/null; then
+                  echo "available=true" >> "$GITHUB_OUTPUT"
+                else
+                  echo "available=false" >> "$GITHUB_OUTPUT"
+                  echo "::warning::binder_linux not available on $(uname -m) runner. Skipping Android container tests."
+                fi
+            - name: Run Android container tests
+              if: steps.binder-check.outputs.available == 'true'
+              run: ./run-android-tests.sh
+prevention:
+  - 'Do not assume binder_linux or ashmem_linux are available on ubuntu-24.04-arm64 hosted runners — the kernel was not built with CONFIG_ANDROID_BINDER_IPC=m.'
+  - 'Route Android container (ReDroid/Waydroid) CI tests to x86_64 ubuntu-24.04 runners until runner-images#14184 is resolved.'
+  - 'Always guard binder_linux/ashmem_linux modprobe calls with an availability check so ARM64 jobs fail gracefully rather than unexpectedly.'
+  - 'Track the ARM64 runner image feature request at actions/runner-images#14184 to know when the modules are added.'
+docs:
+  - url: 'https://github.com/actions/runner-images/issues/14184'
+    label: 'runner-images #14184 — Add binder_linux and ashmem_linux kernel modules to ubuntu-24.04-arm image (open Jun 2026)'
+  - url: 'https://github.com/remote-android/redroid-doc/issues/928'
+    label: 'redroid-doc #928 — GitHub Actions ubuntu-24.04-arm runners lack binder/ashmem kernel modules'
+  - url: 'https://github.blog/changelog/2024-04-02-github-actions-hardware-accelerated-android-virtualization-now-available/'
+    label: 'GitHub Changelog — Hardware-accelerated Android virtualization on x86_64 Actions runners'
+  - url: 'https://github.com/actions/runner-images/releases/tag/ubuntu24-arm64%2F20260531.15'
+    label: 'ubuntu-24.04-arm64 image 20260531.15.1 release notes'
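The entry above attributes the failure to the kernel being built without `CONFIG_ANDROID_BINDER_IPC`. A job can check that directly, without attempting `modprobe`, by reading the kernel config. The sketch below is illustrative only: the helper name `kconfig_state` is hypothetical, and it assumes the image ships a config file at `/boot/config-$(uname -r)`, which is the usual Ubuntu location but is not confirmed by the entry for hosted runners.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how a kernel config symbol is set.
# Usage: kconfig_state FILE SYMBOL  ->  builtin | module | absent
kconfig_state() {
  local file=$1 symbol=$2 line
  # "# CONFIG_X is not set" lines do not match, so they count as absent.
  line=$(grep -E -m1 "^${symbol}=" "$file" 2>/dev/null || true)
  case "${line#*=}" in
    y) echo "builtin" ;;
    m) echo "module" ;;
    *) echo "absent" ;;
  esac
}

# Example use in a step (assumed path; commented out so the helper can be
# sourced on machines without that file):
#
#   state=$(kconfig_state "/boot/config-$(uname -r)" CONFIG_ANDROID_BINDER_IPC)
#   echo "binder=${state}" >> "$GITHUB_OUTPUT"
```

A result of `absent` means no `modprobe` call can succeed on that kernel, so the Android container tests can be skipped with a warning rather than failing later on a missing `/dev/binder`.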