npm - @htekdev/actions-debugger - Versions diffs - 1.0.3 → 1.0.4 - Mend

@htekdev/actions-debugger 1.0.3 → 1.0.4

Files changed (6)

package/errors/concurrency-timing/job-stuck-waiting-for-runner.yml ADDED

@@ -0,0 +1,105 @@
+id: concurrency-timing-006
+title: "Job Stuck: 'Waiting for a Runner to Pick Up This Job'"
+category: concurrency-timing
+severity: error
+tags:
+  - runner
+  - runs-on
+  - self-hosted
+  - queued
+  - stuck
+  - deprecated-runner
+patterns:
+  - regex: "Waiting for a runner to pick up this job"
+    flags: "i"
+  - regex: "No runner matching the specified labels"
+    flags: "i"
+  - regex: "Could not find any online and idle runners"
+    flags: "i"
+error_messages:
+  - "Waiting for a runner to pick up this job."
+  - "No runner matching the specified labels was found: [your-label]"
+  - "Could not find any online and idle runners matching the required labels."
+root_cause: |
+  A job remains stuck in the "queued" state, showing "Waiting for a runner to pick up
+  this job", when GitHub Actions cannot find an available runner matching the `runs-on:`
+  labels. The job stays queued until a matching runner comes online, the run is cancelled,
+  or a time limit ends it.
+  The most common causes:
+  1. **Deprecated or retired runner label** — GitHub periodically retires old runner images.
+     `ubuntu-18.04` was retired in April 2023. `ubuntu-20.04` deprecation is in progress.
+     Jobs using these labels get stuck because no GitHub-hosted runners serve the label.
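+  2. **Typo in `runs-on:` label**: misspellings such as `ubuntu-latets` or `ubuntu_latest`
+     fail silently because no runner carries that label. Casing alone (`UBuntu-latest`) is a
+     less likely cause, since GitHub's documentation describes runner labels as case-insensitive.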
+  3. **Self-hosted runner offline or de-registered** — the runner was stopped, the service
+     was not restarted after a reboot, or the runner registration token expired. GitHub queues
+     the job and waits for a registered runner with matching labels to come online.
+  4. **Runner group restrictions** — organization admins restrict which repositories can use
+     which runner groups. A job referencing a group the repository is not authorized for will
+     queue indefinitely without an explicit permission error.
+  5. **All runners busy** — all matching runners are executing other jobs. The job correctly
+     queues but appears "stuck" during peak usage. It will eventually be picked up.
+  There is no notification when a job has been queued for an unusually long time — the only
+  signal is the job's wall-clock age and the static "Waiting for a runner" message.
+fix: |
+  Verify the `runs-on:` label against the current list of supported GitHub-hosted runner
+  images. For self-hosted runners, check runner registration and service health.
+fix_code:
+  - language: yaml
+    label: "Use current, non-deprecated GitHub-hosted runner labels"
+    code: |
+      jobs:
+        build:
+          # Use current supported labels only
+          runs-on: ubuntu-latest     # OR ubuntu-22.04, ubuntu-24.04
+          # NOT: ubuntu-18.04 (retired), ubuntu-20.04 (deprecated)
+        build-windows:
+          runs-on: windows-latest   # OR windows-2022, windows-2025
+        build-macos:
+          runs-on: macos-latest     # OR macos-13, macos-14, macos-15
+  - language: yaml
+    label: "Self-hosted runner — verify registration and labels match exactly"
+    code: |
+      jobs:
+        deploy:
+          # Labels must exactly match what the runner was registered with
+          # Check: GitHub Settings → Actions → Runners → click runner → Labels
+          runs-on: [self-hosted, linux, production]
+          steps:
+            - name: Verify runner is the expected host
+              run: echo "Running on $RUNNER_NAME at $(hostname)"
+  - language: yaml
+    label: "Matrix across hosted and self-hosted runners (runs on both; not an automatic fallback)"
+    code: |
+      jobs:
+        build:
+          strategy:
+            matrix:
+              runner: [ubuntu-latest, [self-hosted, linux]]
+          runs-on: ${{ matrix.runner }}
+prevention:
+  - "Audit `runs-on:` labels in all workflows when GitHub announces runner image deprecations."
+  - "Set a job-level `timeout-minutes` so stuck jobs don't consume queue slots indefinitely."
+  - "For self-hosted runners, install the runner as a service so it restarts after a reboot (e.g., on Linux, `sudo ./svc.sh install` followed by `sudo ./svc.sh start`)."
+  - "Use GitHub's runner status page (Settings → Actions → Runners) to verify runners are Online before triggering long jobs."
+  - "Subscribe to GitHub Changelog and Actions deprecation notices to catch retiring runner labels early."
+docs:
+  - url: "https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources"
+    label: "Supported GitHub-hosted runner labels"
+  - url: "https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners"
+    label: "Adding self-hosted runners"
+  - url: "https://stackoverflow.com/questions/70959954/error-waiting-for-a-runner-to-pick-up-this-job-using-github-actions"
+    label: "Stack Overflow: Waiting for a runner to pick up this job"
+  - url: "https://github.com/actions/runner/issues/3609"
+    label: "actions/runner#3609 — Self-hosted runner stuck / deadlock"
+  - url: "https://github.com/orgs/community/discussions/147604"
+    label: "Community: Workflow stuck in queued state"
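
The first two root causes in this entry (retired labels and typos) can be caught before a job ever queues. The following is a minimal Python sketch, not part of the package: `check_runs_on`, `CURRENT_LABELS`, and `RETIRED_LABELS` are hypothetical names, the label lists are copied from the entry above rather than from an authoritative source, and the 0.8 similarity cutoff is an arbitrary assumption.

```python
from difflib import get_close_matches

# Labels copied from the entry above; NOT an authoritative or complete list.
CURRENT_LABELS = [
    "ubuntu-latest", "ubuntu-22.04", "ubuntu-24.04",
    "windows-latest", "windows-2022", "windows-2025",
    "macos-latest", "macos-13", "macos-14", "macos-15",
]
# Labels the entry describes as retired or being deprecated.
RETIRED_LABELS = {"ubuntu-18.04", "ubuntu-20.04"}


def check_runs_on(label: str) -> str:
    """Classify one runs-on label as ok, retired, a likely typo, or unknown."""
    if label in CURRENT_LABELS:
        return "ok"
    if label in RETIRED_LABELS:
        return "retired"
    # Fuzzy matching catches near-misses such as swapped letters or '_' for '-'.
    guess = get_close_matches(label, CURRENT_LABELS, n=1, cutoff=0.8)
    if guess:
        return f"possible typo of {guess[0]}"
    return "unknown (custom or self-hosted label?)"


for candidate in ["ubuntu-latest", "ubuntu-18.04", "ubuntu-latets", "production"]:
    print(f"{candidate}: {check_runs_on(candidate)}")
```

A check like this could run in a pre-commit hook over every `runs-on:` value; labels reported as unknown still need a manual look, since they may be legitimate self-hosted labels.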

package/errors/concurrency-timing/matrix-fail-fast-sibling-cancellation.yml ADDED

@@ -0,0 +1,113 @@
+id: concurrency-timing-007
+title: "Matrix Sibling Jobs Silently Cancelled by fail-fast Default"
+category: concurrency-timing
+severity: silent-failure
+tags:
+  - matrix
+  - fail-fast
+  - cancellation
+  - silent-failure
+  - strategy
+  - job-cancelled
+patterns:
+  - regex: "Some jobs were not run because a sibling job failed"
+    flags: "i"
+  - regex: "Canceling since a higher priority waiting run was found"
+    flags: "i"
+  - regex: "The workflow run was canceled\\."
+    flags: "i"
+error_messages:
+  - "Some jobs were not run because a sibling job failed. To allow them to run anyway, add 'continue-on-error: true' to the matrix job."
+  - "Job was cancelled"
+root_cause: |
+  GitHub Actions matrix strategy defaults to `fail-fast: true`. When ANY matrix leg fails,
+  GitHub immediately cancels all other in-progress and pending legs in the same matrix.
+  This default is rarely what developers want during debugging or CI investigation, and
+  produces a confusing failure pattern:
+  1. **Cancelled legs appear as "Cancelled" not "Failed"** — matrix siblings killed by
+     `fail-fast` show as CANCELLED in the UI (grey icon) rather than red failures. Developers
+     scanning the run summary see one red failure and many grey cancellations, and may not
+     realize those sibling legs had reached significant progress (e.g., partway through a
+     test suite on a different OS or Node version) before being killed.
+  2. **Root cause is obscured** — the only failing leg that matters for diagnosis is the one
+     that triggered `fail-fast`, but with multiple cancellations in the UI, it can be hard to
+     identify which leg failed first.
+  3. **`fail-fast` is inherited silently** — there is no warning annotation that says
+     "fail-fast is enabled and cancelled 5 sibling legs." The default is documented but
+     easy to forget when adding a new matrix.
+  4. **Re-running failed jobs doesn't re-run cancelled siblings** — "Re-run failed jobs"
+     only re-runs the legs that explicitly FAILED, not the ones that were cancelled by
+     fail-fast. Developers re-running failed jobs think they'll see results from all legs,
+     but cancelled siblings stay cancelled. Only "Re-run all jobs" restarts everything.
+  Example: a 3-OS matrix (ubuntu, windows, macos) where ubuntu fails. With fail-fast,
+  windows and macos are immediately cancelled. The developer sees one failure and two
+  cancellations, re-runs the failed ubuntu job, and never discovers that windows also
+  had an independent failing test.
+fix: |
+  Set `fail-fast: false` explicitly on any matrix where you need full signal from all
+  legs — especially for cross-platform or multi-version compatibility matrices. Use
+  `fail-fast: true` intentionally only when running the full matrix after one failure is
+  wasteful (e.g., expensive build matrices during pre-merge CI).
+fix_code:
+  - language: yaml
+    label: "Disable fail-fast to see all matrix leg results"
+    code: |
+      jobs:
+        test:
+          strategy:
+            fail-fast: false  # All legs run regardless of siblings failing
+            matrix:
+              os: [ubuntu-latest, windows-latest, macos-latest]
+              node: [18, 20, 22]
+          runs-on: ${{ matrix.os }}
+          steps:
+            - uses: actions/checkout@v4
+            - uses: actions/setup-node@v4
+              with:
+                node-version: ${{ matrix.node }}
+            - run: npm ci
+            - run: npm test
+  - language: yaml
+    label: "Use fail-fast: true only for expensive pre-merge CI"
+    code: |
+      jobs:
+        # Pre-merge: fail fast to conserve minutes — just need to know if it passes
+        lint-and-typecheck:
+          strategy:
+            fail-fast: true  # OK: fast, cheap, fail early
+            matrix:
+              node: [20, 22]
+          runs-on: ubuntu-latest
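+          steps:
+            - uses: actions/checkout@v4
+            - uses: actions/setup-node@v4
+              with:
+                node-version: ${{ matrix.node }}
+            - run: npm ci
+            - run: npm run lint && npm run typecheck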
+        # Post-merge: always see all platform results
+        full-test-suite:
+          if: github.event_name == 'push'
+          strategy:
+            fail-fast: false  # Need full signal on all platforms
+            matrix:
+              os: [ubuntu-latest, windows-latest, macos-latest]
+          runs-on: ${{ matrix.os }}
+          steps:
+            - uses: actions/checkout@v4
+            - run: npm ci
+            - run: npm test
+prevention:
+  - "Always set `fail-fast: false` explicitly on cross-platform or multi-version matrices where you need full compatibility signal."
+  - "After a matrix failure, use 'Re-run all jobs' (not 'Re-run failed jobs') to get results from previously-cancelled siblings."
+  - "Add a workflow summary step with `if: always()` to collect and consolidate test results across all matrix legs even when some are cancelled."
+  - "Be aware that cancelled legs (grey) are NOT the same as passed legs (green) — visually scan for both red and grey when investigating failures."
+docs:
+  - url: "https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idstrategyfail-fast"
+    label: "Workflow syntax: jobs.<job_id>.strategy.fail-fast"
+  - url: "https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idstrategymatrix"
+    label: "Workflow syntax: jobs.<job_id>.strategy.matrix"
+  - url: "https://github.com/orgs/community/discussions/26822"
+    label: "Community: fail-fast cancels matrix siblings unexpectedly"
+  - url: "https://stackoverflow.com/questions/57850553/github-actions-check-steps-status"
+    label: "Stack Overflow: Matrix job cancellation behavior with fail-fast"
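
The three-OS example in this entry's root cause can be reproduced with a toy model. The sketch below is a simplification under stated assumptions: every leg starts at the same moment, cancellation is instantaneous, and `Leg` and `run_matrix` are hypothetical names invented for illustration. It does not model `max-parallel`, queueing, or GitHub's actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    name: str
    duration: int  # minutes the leg needs to finish
    passes: bool   # outcome if the leg is allowed to finish


def run_matrix(legs: list[Leg], fail_fast: bool) -> dict[str, str]:
    """Return each leg's final status, assuming all legs start together."""
    failing = [leg.duration for leg in legs if not leg.passes]
    first_failure = min(failing) if failing else None
    statuses = {}
    for leg in legs:
        # With fail-fast, any leg still running at the first failure is cancelled.
        if fail_fast and first_failure is not None and leg.duration > first_failure:
            statuses[leg.name] = "cancelled"
        else:
            statuses[leg.name] = "success" if leg.passes else "failure"
    return statuses


legs = [
    Leg("ubuntu", duration=5, passes=False),
    Leg("windows", duration=8, passes=False),  # independent failure
    Leg("macos", duration=10, passes=True),
]
print(run_matrix(legs, fail_fast=True))
print(run_matrix(legs, fail_fast=False))
```

With `fail_fast=True` the model reports windows as cancelled, so its independent failure never surfaces; with `fail_fast=False` both failures are visible.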

package/errors/concurrency-timing/timeout-minutes-job-killed.yml ADDED

@@ -0,0 +1,107 @@
+id: concurrency-timing-005
+title: "Job Silently Cancelled When timeout-minutes Is Exceeded"
+category: concurrency-timing
+severity: error
+tags:
+  - timeout
+  - timeout-minutes
+  - job-cancelled
+  - timing
+  - runner
+patterns:
+  - regex: "##\\[error\\]The operation was cancelled\\."
+    flags: "i"
+  - regex: "The job '.*' was cancelled because it exceeded the maximum execution time"
+    flags: "i"
+  - regex: "Error: The operation was canceled"
+    flags: "i"
+  - regex: "cancel is received"
+    flags: "i"
+error_messages:
+  - "##[error]The operation was cancelled."
+  - "Error: The operation was canceled"
+  - "The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled."
+root_cause: |
+  When a job (or step) exceeds its configured `timeout-minutes`, GitHub Actions sends a
+  cancellation signal to the runner. The runner has 5 minutes to complete graceful shutdown,
+  after which it is forcibly terminated.
+  The failure mode has several layers of confusion:
+  1. **Status shows "Cancelled" not "Failed"** — a timed-out job is marked CANCELLED in the
+     UI. It does not appear as a red failure. Developers scanning the Actions tab may miss it
+     entirely, especially if another run succeeded after it.
+  2. **No step-level attribution** — the job log shows "The operation was cancelled" but does
+     not identify which specific step was still running or how far it had progressed. Long
+     builds, network-heavy steps, and interactive prompts are common culprits.
+  3. **Default timeout is 360 minutes (6 hours)** — if `timeout-minutes` is not explicitly
+     set, GitHub uses the platform default of 6 hours for GitHub-hosted runners. A job that
+     accidentally blocks (waiting for user input, infinite loop, hung network call) will silently
+     consume 6 hours of runner minutes before being cancelled with no diagnostic output.
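+  4. **Step-level and job-level timeouts differ in scope**: `timeout-minutes` on a `steps[*]`
+     entry terminates only that step, which is then marked failed; the rest of the job proceeds
+     only if that step sets `continue-on-error: true` or later steps use a condition such as
+     `if: always()`. `timeout-minutes` on `jobs[*]` cancels the entire job. Mixing both is valid
+     but should be a deliberate choice.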
+fix: |
+  Always set explicit `timeout-minutes` at the job level to bound worst-case runner cost.
+  Tune based on your typical build time (e.g., 2-3× the median duration). Add step-level
+  timeouts on known slow steps (network downloads, test suites) to get better attribution.
+  To diagnose which step was running at cancellation: add a step near the end that dumps
+  elapsed time, or use `if: cancelled()` post-steps to capture diagnostics on timeout.
+fix_code:
+  - language: yaml
+    label: "Explicit job-level timeout with diagnostic post-step"
+    code: |
+      jobs:
+        build:
+          runs-on: ubuntu-latest
+          timeout-minutes: 30  # Set explicitly — don't rely on 6h default
+          steps:
+            - uses: actions/checkout@v4
+            - name: Build
+              run: make build
+            - name: Tests
+              timeout-minutes: 15  # Step-level timeout for attribution
+              run: make test
+            # Runs only when the job is cancelled, which includes a job-level timeout
+            - name: Dump elapsed time on cancellation
+              if: cancelled()
+              run: echo "Job was cancelled at $(date -u). Check step durations above."
+  - language: yaml
+    label: "Identify which step timed out with job summary annotation"
+    code: |
+      steps:
+        - name: Long network operation
+          timeout-minutes: 10
+          run: |
+            # Use --max-time with curl to avoid relying solely on timeout-minutes
+            curl --max-time 300 https://example.com/large-asset -o output.bin
+        - name: Report timeout if cancelled
+          if: cancelled()
+          run: |
+            echo "## ⏱️ Job Timed Out" >> "$GITHUB_STEP_SUMMARY"
+            echo "The job was cancelled. Review step durations in the log." >> "$GITHUB_STEP_SUMMARY"
+prevention:
+  - "Always set `timeout-minutes` at the job level — never rely on the 6-hour GitHub default."
+  - "Add step-level `timeout-minutes` on network-heavy or test steps so cancellation is attributed to a specific step."
+  - "Use `if: cancelled()` post-steps to write a job summary annotation explaining the timeout."
+  - "Run commands with their own timeout flags (e.g., `curl --max-time`, `pytest --timeout`) in addition to runner timeouts."
+  - "Monitor job duration trends — a job approaching its timeout limit is a signal to investigate performance."
+docs:
+  - url: "https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes"
+    label: "Workflow syntax: jobs.<job_id>.timeout-minutes"
+  - url: "https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes"
+    label: "Workflow syntax: jobs.<job_id>.steps[*].timeout-minutes"
+  - url: "https://github.com/actions/runner/issues/1326"
+    label: "actions/runner#1326 — Steps hanging until timeout with no log output"
+  - url: "https://github.com/orgs/community/discussions/38004"
+    label: "Community: Job stops producing output and is later cancelled"
+  - url: "https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#usage-limits"
+    label: "Usage limits: maximum job execution time"
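
The fix text in this entry suggests sizing `timeout-minutes` at roughly 2-3× the median job duration. A minimal Python sketch of that arithmetic follows; `suggest_timeout` is a hypothetical helper, the duration history is made-up sample data, and capping at 360 minutes assumes a GitHub-hosted runner.

```python
import math
from statistics import median

HOSTED_JOB_CAP_MINUTES = 360  # 6-hour GitHub-hosted default cited in the entry


def suggest_timeout(durations_min: list[float], multiplier: float = 3.0) -> int:
    """Suggest timeout-minutes as multiplier x median duration, capped at 360."""
    if not durations_min:
        raise ValueError("need at least one recorded duration")
    budget = math.ceil(median(durations_min) * multiplier)
    return min(budget, HOSTED_JOB_CAP_MINUTES)


# Made-up history of recent job durations, in minutes.
history = [8, 10, 12]
print(suggest_timeout(history))       # 3x the median
print(suggest_timeout(history, 2.0))  # 2x the median
```

Using the median rather than the mean keeps one pathological run from inflating the budget; the multiplier is a judgment call and worth revisiting as duration trends change.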

package/errors/known-unsolved/github-step-summary-size-limit.yml ADDED

@@ -0,0 +1,112 @@
+id: known-unsolved-008
+title: "GITHUB_STEP_SUMMARY Upload Aborted When Content Exceeds 1024k"
+category: known-unsolved
+severity: error
+tags:
+  - step-summary
+  - GITHUB_STEP_SUMMARY
+  - size-limit
+  - job-summary
+  - markdown
+  - limitation
+patterns:
+  - regex: "\\$GITHUB_STEP_SUMMARY upload aborted, supports content up to a size of 1024k, got \\d+k"
+    flags: "i"
+  - regex: "upload aborted.*supports content up to a size of 1024k"
+    flags: "i"
+  - regex: "Error: GITHUB_STEP_SUMMARY.*1024"
+    flags: "i"
+error_messages:
+  - "$GITHUB_STEP_SUMMARY upload aborted, supports content up to a size of 1024k, got 1387k"
+  - "$GITHUB_STEP_SUMMARY upload aborted, supports content up to a size of 1024k, got 2048k"
+root_cause: |
+  GitHub Actions imposes a hard 1 MiB (1024 KiB) size limit on the content written to
+  `$GITHUB_STEP_SUMMARY`. When a step writes more than this limit, the runner aborts
+  the summary upload and logs an error.
+  This is a **platform limit that cannot be raised**: no workflow setting increases it, so
+  the only remedy is to write less content. GitHub has not announced plans to raise the limit.
+  Common triggers:
+  1. **Test reporters** — tools like `dorny/test-reporter`, `ctrf-io/github-actions-ctrf`,
+     or `EnricoMi/publish-unit-test-result-action` write per-test result tables. Large
+     test suites (thousands of test cases, especially with long failure messages) easily
+     exceed 1 MiB.
+  2. **Dependency review action** — `actions/dependency-review-action` writes full
+     dependency diff tables. Large projects with hundreds of transitive dependencies produce
+     summaries well above 1 MiB.
+  3. **Coverage reports** — HTML-style coverage tables written to `$GITHUB_STEP_SUMMARY`
+     with per-file rows can grow unboundedly on large monorepos.
+  4. **Log echo pipelines** — `cat large-file >> $GITHUB_STEP_SUMMARY` without size
+     checking is the most direct way to hit the limit.
+  The error aborts the summary upload but does **not** fail the step or job by default.
+  Depending on the action's error handling, the step may succeed (exit 0) even though the
+  summary was not written — making this a silent failure from a reporting perspective.
+fix: |
+  Truncate or paginate summary content before writing it. Most test reporters provide
+  options to limit which results are written (e.g., only failures, not all passed tests).
+  For custom summary generation, check the size before writing and truncate with a note.
+fix_code:
+  - language: yaml
+    label: "Truncate summary content with size check before writing"
+    code: |
+      - name: Generate test report
+        run: |
+          # Generate report to a temp file first
+          ./scripts/generate-report.sh > /tmp/report.md
+          # Check the content size in bytes before writing to the summary
+          SIZE_KB=$(( $(wc -c < /tmp/report.md) / 1024 ))
+          MAX_KB=800  # Leave headroom below 1024k limit
+          if [ "$SIZE_KB" -gt "$MAX_KB" ]; then
+            echo "⚠️ Full report too large (${SIZE_KB}k). Showing failures only." >> "$GITHUB_STEP_SUMMARY"
+            ./scripts/generate-report.sh --failures-only >> "$GITHUB_STEP_SUMMARY"
+          else
+            cat /tmp/report.md >> "$GITHUB_STEP_SUMMARY"
+          fi
+  - language: yaml
+    label: "dorny/test-reporter — limit to failures only for large test suites"
+    code: |
+      - name: Test Report
+        uses: dorny/test-reporter@v1
+        if: always()
+        with:
+          name: Test Results
+          path: test-results/**/*.xml
+          reporter: jest-junit
+          # Limit output to avoid 1024k summary limit on large suites
+          only-summary: true          # Write only totals, not per-test rows
+          fail-on-error: false
+  - language: yaml
+    label: "Upload full report as artifact instead of writing to summary"
+    code: |
+      - name: Generate full coverage report
+        run: ./scripts/coverage.sh > /tmp/coverage-full.md
+      - name: Write summary (truncated)
+        run: |
+          head -100 /tmp/coverage-full.md >> "$GITHUB_STEP_SUMMARY"
+          echo "" >> "$GITHUB_STEP_SUMMARY"
+          echo "_Full report available as workflow artifact._" >> "$GITHUB_STEP_SUMMARY"
+      - name: Upload full report as artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-report
+          path: /tmp/coverage-full.md
+prevention:
+  - "Never pipe unbounded command output directly to `$GITHUB_STEP_SUMMARY` — always size-check or limit first."
+  - "Configure test reporter actions to write only failures (not all passing tests) when the test suite is large."
+  - "Upload large reports as workflow artifacts and link to them from a short summary, instead of embedding all content in the summary."
+  - "The undocumented historical limit of 65,535 characters cited in older docs/answers is no longer accurate — the current limit is 1024 KiB (1 MiB)."
+docs:
+  - url: "https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#adding-a-job-summary"
+    label: "Workflow commands: Adding a job summary"
+  - url: "https://github.com/actions/dependency-review-action/issues/786"
+    label: "dependency-review-action#786 — Job Summary Size Limitation aborts the job"
+  - url: "https://github.com/dorny/test-reporter/issues/379"
+    label: "dorny/test-reporter#379 — Is the step summary limit for 65535 characters still accurate?"
+  - url: "https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#usage-limits"
+    label: "Usage limits — GitHub Actions"
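
For summaries generated by a script rather than a shell pipeline, the same size check can be applied on the encoded bytes. The sketch below is one possible approach, not the package's or GitHub's implementation: `fit_summary` and `NOTE` are hypothetical names, the 800 KiB default mirrors the headroom used in the shell example above, and it assumes `max_bytes` is larger than the note itself.

```python
SUMMARY_LIMIT_BYTES = 1024 * 1024  # 1 MiB limit described in the entry
NOTE = "\n\n_Report truncated; see the workflow artifact for the full version._\n"


def fit_summary(markdown: str, max_bytes: int = 800 * 1024) -> str:
    """Return markdown unchanged if it fits, else a truncated copy plus a note."""
    raw = markdown.encode("utf-8")
    if len(raw) <= max_bytes:
        return markdown
    # Reserve room for the note; assumes max_bytes exceeds the note's length.
    room = max(max_bytes - len(NOTE.encode("utf-8")), 0)
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    head = raw[:room].decode("utf-8", errors="ignore")
    return head + NOTE


big_report = "é" * 600_000  # about 1.2 MB once encoded as UTF-8
trimmed = fit_summary(big_report)
print(len(big_report.encode("utf-8")), "->", len(trimmed.encode("utf-8")))
```

Measuring encoded bytes rather than characters matters for non-ASCII reports, where the character count can be well below the byte count. The result would then be appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable.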

package/errors/known-unsolved/job-maximum-execution-time.yml ADDED

@@ -0,0 +1,127 @@
+id: known-unsolved-009
+title: "Job Killed After Maximum Execution Time (6h Hosted / 35-Day Workflow)"
+category: known-unsolved
+severity: limitation
+tags:
+  - timeout
+  - execution-time
+  - job-limits
+  - platform-limit
+  - self-hosted
+  - workflow-duration
+  - limitation
+patterns:
+  - regex: "The job running has exceeded the maximum execution time"
+    flags: "i"
+  - regex: "exceeded the maximum (?:time|execution time)"
+    flags: "i"
+  - regex: "job .* exceeded .* maximum"
+    flags: "i"
+error_messages:
+  - "The job running on runner GitHub Actions X has exceeded the maximum execution time of 360 minutes."
+  - "The job running has exceeded the maximum execution time"
+root_cause: |
+  GitHub Actions enforces hard platform-level execution time limits that cannot be
+  overridden or extended by workflow configuration. These limits exist to protect
+  shared infrastructure and prevent runaway jobs from consuming unlimited resources.
+  **GitHub-hosted runner limits:**
+  - Maximum job execution time: **6 hours** (360 minutes)
+  - Maximum workflow run time: **35 days** (across all jobs, including queued time)
+  - Default `timeout-minutes` when not set: **360 minutes** (6 hours)
+  **Self-hosted runner limits:**
+  - Maximum job execution time: **5 days** (7,200 minutes) by default
+  - Maximum workflow run time: **35 days** (same as hosted)
+  - Self-hosted limits can be customized in enterprise plans via org/enterprise policies
+  **When limits are hit:**
+  - The runner process is sent a SIGTERM (graceful) then SIGKILL (forced) after a grace period
+  - The job is marked CANCELLED (not FAILED) in the UI
+  - The log message "The job running has exceeded the maximum execution time" appears in
+    the runner log (may be visible in the step logs depending on where the runner was killed)
+  - Any `post:` steps for active actions (e.g., cache save, artifact upload) are skipped
+  - No email notification is sent to the repo owner about the cancellation
+  **Why this is a limitation, not just misconfiguration:**
+  - Setting `timeout-minutes` above 360 (6 hours, or 21,600 seconds) does not extend the GitHub-hosted 6h cap
+  - Neither job-level nor step-level `timeout-minutes` can override the platform cap on GitHub-hosted runners
+  - Jobs requiring more than 6 hours on GitHub-hosted runners have NO supported path without
+    migrating to self-hosted or restructuring the job into multiple shorter sequential jobs
+fix: |
+  There is no way to extend the GitHub-hosted runner 6-hour job cap. Options:
+  1. **Break the job into smaller sequential jobs** — split long-running work (e.g., build
+     artifacts first, test in separate parallel jobs, deploy last). Each job has its own
+     6-hour budget.
+  2. **Migrate to self-hosted runners** — self-hosted runners support up to 5-day jobs.
+     Use actions-runner-controller (ARC) or cloud auto-scaling for elastic capacity.
+  3. **Optimize the slow step** — profile build/test times; parallelize with matrix
+     strategy; use incremental builds or test sharding to reduce per-job duration.
+  4. **Use caching aggressively** — `actions/cache` reduces download/build time between
+     runs, but does not extend limits.
+fix_code:
+  - language: yaml
+    label: "Split a long job into sequential jobs to stay within 6h per job"
+    code: |
+      jobs:
+        build:
+          runs-on: ubuntu-latest
+          timeout-minutes: 120  # 2h budget for build
+          outputs:
+            artifact-id: ${{ steps.upload.outputs.artifact-id }}
+          steps:
+            - uses: actions/checkout@v4
+            - name: Build
+              run: make build-release
+            - name: Upload build artifact
+              id: upload
+              uses: actions/upload-artifact@v4
+              with:
+                name: release-build
+                path: dist/
+        # Separate job — gets its own 6h budget
+        test:
+          needs: build
+          runs-on: ubuntu-latest
+          timeout-minutes: 180  # 3h budget for tests
+          steps:
+            - uses: actions/checkout@v4
+            - uses: actions/download-artifact@v4
+              with:
+                name: release-build
+                path: dist/
+            - run: make test-full
+  - language: yaml
+    label: "Self-hosted runner for jobs requiring more than 6 hours"
+    code: |
+      jobs:
+        long-running-job:
+          # Self-hosted runners support up to 5-day job duration
+          runs-on: [self-hosted, linux, x64]
+          timeout-minutes: 2880  # 48h — only possible on self-hosted
+          steps:
+            - uses: actions/checkout@v4
+            - name: Long-running process
+              run: ./scripts/full-dataset-processing.sh
+prevention:
+  - "Set explicit `timeout-minutes` on every job — don't rely on the implicit 6h GitHub-hosted cap as your only safeguard."
+  - "Profile job duration regularly and alert when a job's P99 duration approaches 80% of its timeout budget."
+  - "Parallelize test suites using matrix strategy or `actions/github-script` dynamic matrix generation to reduce per-job time."
+  - "Use self-hosted runners for any workflow that legitimately requires more than 2-3 hours per job (e.g., large model training, full database rebuild, exhaustive integration tests)."
+  - "Be aware that post-run actions (cache save, artifact upload) will NOT execute if the parent job is killed for exceeding the time limit."
+docs:
+  - url: "https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#usage-limits"
+    label: "Usage limits: job execution time and workflow run time"
+  - url: "https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#usage-limits"
+    label: "Self-hosted runner usage limits"
+  - url: "https://github.com/orgs/community/discussions/48790"
+    label: "Community: Workflow run time limit 35 days"
+  - url: "https://github.com/orgs/community/discussions/150900"
+    label: "Community: Job cancellation after 6 hours"
+  - url: "https://stackoverflow.com/questions/70187174/github-actions-self-hosted-runner-the-job-running-has-exceeded-the-maximum-exe"
+    label: "Stack Overflow: The job running has exceeded the maximum execution time"
+  - url: "https://github.com/actions/actions-runner-controller"
+    label: "Actions Runner Controller (ARC) — Kubernetes-based self-hosted runner auto-scaling"
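
The limits quoted in this entry mix hours, days, and minutes, which makes unit slips easy. The following Python sketch spells out the conversions and checks a planned job split against the per-job cap; `over_budget` and the sample plans are hypothetical, and the constants simply restate the figures given in the entry.

```python
HOSTED_JOB_CAP_MIN = 6 * 60            # 360 minutes (6 hours)
SELF_HOSTED_JOB_CAP_MIN = 5 * 24 * 60  # 7,200 minutes (5 days)
WORKFLOW_CAP_MIN = 35 * 24 * 60        # 50,400 minutes (35 days)


def over_budget(plan: dict[str, int], cap_min: int = HOSTED_JOB_CAP_MIN) -> list[str]:
    """Return the names of jobs whose planned minutes exceed the per-job cap."""
    return [name for name, minutes in plan.items() if minutes > cap_min]


# Hypothetical plans: one monolithic job versus the build/test split shown above.
print(over_budget({"monolith": 400}))
print(over_budget({"build": 120, "test": 180}))
print(over_budget({"long-running-job": 2880}, SELF_HOSTED_JOB_CAP_MIN))
```

Each job is compared against the cap separately because, as the entry notes, every job gets its own budget; only the 35-day workflow limit applies to the run as a whole.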

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@htekdev/actions-debugger",
-  "version": "1.0.3",
+  "version": "1.0.4",
   "description": "65+ real GitHub Actions errors, queryable by agents. MCP server + Copilot skills + error database.",
   "type": "module",
   "main": "./dist/index.js",