autonomous-coding-toolkit 1.0.0

Files changed (324)
  1. package/.claude-plugin/marketplace.json +22 -0
  2. package/.claude-plugin/plugin.json +13 -0
  3. package/LICENSE +21 -0
  4. package/Makefile +21 -0
  5. package/README.md +140 -0
  6. package/SECURITY.md +28 -0
  7. package/agents/bash-expert.md +113 -0
  8. package/agents/dependency-auditor.md +138 -0
  9. package/agents/integration-tester.md +120 -0
  10. package/agents/lesson-scanner.md +149 -0
  11. package/agents/python-expert.md +179 -0
  12. package/agents/service-monitor.md +141 -0
  13. package/agents/shell-expert.md +147 -0
  14. package/benchmarks/runner.sh +147 -0
  15. package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
  16. package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
  17. package/benchmarks/tasks/02-refactor-module/task.md +8 -0
  18. package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
  19. package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
  20. package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
  21. package/bin/act.js +238 -0
  22. package/commands/autocode.md +6 -0
  23. package/commands/cancel-ralph.md +18 -0
  24. package/commands/code-factory.md +53 -0
  25. package/commands/create-prd.md +55 -0
  26. package/commands/ralph-loop.md +18 -0
  27. package/commands/run-plan.md +117 -0
  28. package/commands/submit-lesson.md +122 -0
  29. package/docs/ARCHITECTURE.md +630 -0
  30. package/docs/CONTRIBUTING.md +125 -0
  31. package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
  32. package/docs/lessons/0002-async-def-without-await.md +28 -0
  33. package/docs/lessons/0003-create-task-without-callback.md +28 -0
  34. package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
  35. package/docs/lessons/0005-sqlite-without-closing.md +33 -0
  36. package/docs/lessons/0006-venv-pip-path.md +27 -0
  37. package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
  38. package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
  39. package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
  40. package/docs/lessons/0010-local-outside-function-bash.md +33 -0
  41. package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
  42. package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
  43. package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
  44. package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
  45. package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
  46. package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
  47. package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
  48. package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
  49. package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
  50. package/docs/lessons/0020-persist-state-incrementally.md +44 -0
  51. package/docs/lessons/0021-dual-axis-testing.md +48 -0
  52. package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
  53. package/docs/lessons/0023-static-analysis-spiral.md +51 -0
  54. package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
  55. package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
  56. package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
  57. package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
  58. package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
  59. package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
  60. package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
  61. package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
  62. package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
  63. package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
  64. package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
  65. package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
  66. package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
  67. package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
  68. package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
  69. package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
  70. package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
  71. package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
  72. package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
  73. package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
  74. package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
  75. package/docs/lessons/0045-iterative-design-improvement.md +33 -0
  76. package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
  77. package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
  78. package/docs/lessons/0048-integration-wiring-batch.md +40 -0
  79. package/docs/lessons/0049-ab-verification.md +41 -0
  80. package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
  81. package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
  82. package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
  83. package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
  84. package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
  85. package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
  86. package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
  87. package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
  88. package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
  89. package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
  90. package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
  91. package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
  92. package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
  93. package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
  94. package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
  95. package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
  96. package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
  97. package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
  98. package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
  99. package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
  100. package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
  101. package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
  102. package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
  103. package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
  104. package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
  105. package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
  106. package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
  107. package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
  108. package/docs/lessons/0078-static-review-without-live-test.md +30 -0
  109. package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
  110. package/docs/lessons/FRAMEWORK.md +161 -0
  111. package/docs/lessons/SUMMARY.md +201 -0
  112. package/docs/lessons/TEMPLATE.md +85 -0
  113. package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
  114. package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
  115. package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
  116. package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
  117. package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
  118. package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
  119. package/docs/plans/2026-02-21-mab-research-report.md +406 -0
  120. package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
  121. package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
  122. package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
  123. package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
  124. package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
  125. package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
  126. package/docs/plans/2026-02-22-mab-run-design.md +462 -0
  127. package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
  128. package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
  129. package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
  130. package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
  131. package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
  132. package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
  133. package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
  134. package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
  135. package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
  136. package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
  137. package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
  138. package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
  139. package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
  140. package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
  141. package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
  142. package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
  143. package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
  144. package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
  145. package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
  146. package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
  147. package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
  148. package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
  149. package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
  150. package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
  151. package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
  152. package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
  153. package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
  154. package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
  155. package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
  156. package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
  157. package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
  158. package/docs/plans/2026-02-24-headless-module-split.md +443 -0
  159. package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
  160. package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
  161. package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
  162. package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
  163. package/docs/plans/audit-findings.md +186 -0
  164. package/docs/telegram-notification-format.md +98 -0
  165. package/examples/example-plan.md +51 -0
  166. package/examples/example-prd.json +72 -0
  167. package/examples/example-roadmap.md +33 -0
  168. package/examples/quickstart-plan.md +63 -0
  169. package/hooks/hooks.json +26 -0
  170. package/hooks/setup-symlinks.sh +48 -0
  171. package/hooks/stop-hook.sh +135 -0
  172. package/package.json +47 -0
  173. package/policies/bash.md +71 -0
  174. package/policies/python.md +71 -0
  175. package/policies/testing.md +61 -0
  176. package/policies/universal.md +60 -0
  177. package/scripts/analyze-report.sh +97 -0
  178. package/scripts/architecture-map.sh +145 -0
  179. package/scripts/auto-compound.sh +273 -0
  180. package/scripts/batch-audit.sh +42 -0
  181. package/scripts/batch-test.sh +101 -0
  182. package/scripts/entropy-audit.sh +221 -0
  183. package/scripts/failure-digest.sh +51 -0
  184. package/scripts/generate-ast-rules.sh +96 -0
  185. package/scripts/init.sh +112 -0
  186. package/scripts/lesson-check.sh +428 -0
  187. package/scripts/lib/common.sh +61 -0
  188. package/scripts/lib/cost-tracking.sh +153 -0
  189. package/scripts/lib/ollama.sh +60 -0
  190. package/scripts/lib/progress-writer.sh +128 -0
  191. package/scripts/lib/run-plan-context.sh +215 -0
  192. package/scripts/lib/run-plan-echo-back.sh +231 -0
  193. package/scripts/lib/run-plan-headless.sh +396 -0
  194. package/scripts/lib/run-plan-notify.sh +57 -0
  195. package/scripts/lib/run-plan-parser.sh +81 -0
  196. package/scripts/lib/run-plan-prompt.sh +215 -0
  197. package/scripts/lib/run-plan-quality-gate.sh +132 -0
  198. package/scripts/lib/run-plan-routing.sh +315 -0
  199. package/scripts/lib/run-plan-sampling.sh +170 -0
  200. package/scripts/lib/run-plan-scoring.sh +146 -0
  201. package/scripts/lib/run-plan-state.sh +142 -0
  202. package/scripts/lib/run-plan-team.sh +199 -0
  203. package/scripts/lib/telegram.sh +54 -0
  204. package/scripts/lib/thompson-sampling.sh +176 -0
  205. package/scripts/license-check.sh +74 -0
  206. package/scripts/mab-run.sh +575 -0
  207. package/scripts/module-size-check.sh +146 -0
  208. package/scripts/patterns/async-no-await.yml +5 -0
  209. package/scripts/patterns/bare-except.yml +6 -0
  210. package/scripts/patterns/empty-catch.yml +6 -0
  211. package/scripts/patterns/hardcoded-localhost.yml +9 -0
  212. package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
  213. package/scripts/pipeline-status.sh +197 -0
  214. package/scripts/policy-check.sh +226 -0
  215. package/scripts/prior-art-search.sh +133 -0
  216. package/scripts/promote-mab-lessons.sh +126 -0
  217. package/scripts/prompts/agent-a-superpowers.md +29 -0
  218. package/scripts/prompts/agent-b-ralph.md +29 -0
  219. package/scripts/prompts/judge-agent.md +61 -0
  220. package/scripts/prompts/planner-agent.md +44 -0
  221. package/scripts/pull-community-lessons.sh +90 -0
  222. package/scripts/quality-gate.sh +266 -0
  223. package/scripts/research-gate.sh +90 -0
  224. package/scripts/run-plan.sh +329 -0
  225. package/scripts/scope-infer.sh +159 -0
  226. package/scripts/setup-ralph-loop.sh +155 -0
  227. package/scripts/telemetry.sh +230 -0
  228. package/scripts/tests/run-all-tests.sh +52 -0
  229. package/scripts/tests/test-act-cli.sh +46 -0
  230. package/scripts/tests/test-agents-md.sh +87 -0
  231. package/scripts/tests/test-analyze-report.sh +114 -0
  232. package/scripts/tests/test-architecture-map.sh +89 -0
  233. package/scripts/tests/test-auto-compound.sh +169 -0
  234. package/scripts/tests/test-batch-test.sh +65 -0
  235. package/scripts/tests/test-benchmark-runner.sh +25 -0
  236. package/scripts/tests/test-common.sh +168 -0
  237. package/scripts/tests/test-cost-tracking.sh +158 -0
  238. package/scripts/tests/test-echo-back.sh +180 -0
  239. package/scripts/tests/test-entropy-audit.sh +146 -0
  240. package/scripts/tests/test-failure-digest.sh +66 -0
  241. package/scripts/tests/test-generate-ast-rules.sh +145 -0
  242. package/scripts/tests/test-helpers.sh +82 -0
  243. package/scripts/tests/test-init.sh +47 -0
  244. package/scripts/tests/test-lesson-check.sh +278 -0
  245. package/scripts/tests/test-lesson-local.sh +55 -0
  246. package/scripts/tests/test-license-check.sh +109 -0
  247. package/scripts/tests/test-mab-run.sh +182 -0
  248. package/scripts/tests/test-ollama-lib.sh +49 -0
  249. package/scripts/tests/test-ollama.sh +60 -0
  250. package/scripts/tests/test-pipeline-status.sh +198 -0
  251. package/scripts/tests/test-policy-check.sh +124 -0
  252. package/scripts/tests/test-prior-art-search.sh +96 -0
  253. package/scripts/tests/test-progress-writer.sh +140 -0
  254. package/scripts/tests/test-promote-mab-lessons.sh +110 -0
  255. package/scripts/tests/test-pull-community-lessons.sh +149 -0
  256. package/scripts/tests/test-quality-gate.sh +241 -0
  257. package/scripts/tests/test-research-gate.sh +132 -0
  258. package/scripts/tests/test-run-plan-cli.sh +86 -0
  259. package/scripts/tests/test-run-plan-context.sh +305 -0
  260. package/scripts/tests/test-run-plan-e2e.sh +153 -0
  261. package/scripts/tests/test-run-plan-headless.sh +424 -0
  262. package/scripts/tests/test-run-plan-notify.sh +124 -0
  263. package/scripts/tests/test-run-plan-parser.sh +217 -0
  264. package/scripts/tests/test-run-plan-prompt.sh +254 -0
  265. package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
  266. package/scripts/tests/test-run-plan-routing.sh +178 -0
  267. package/scripts/tests/test-run-plan-scoring.sh +148 -0
  268. package/scripts/tests/test-run-plan-state.sh +261 -0
  269. package/scripts/tests/test-run-plan-team.sh +157 -0
  270. package/scripts/tests/test-scope-infer.sh +150 -0
  271. package/scripts/tests/test-setup-ralph-loop.sh +63 -0
  272. package/scripts/tests/test-telegram-env.sh +38 -0
  273. package/scripts/tests/test-telegram.sh +121 -0
  274. package/scripts/tests/test-telemetry.sh +46 -0
  275. package/scripts/tests/test-thompson-sampling.sh +139 -0
  276. package/scripts/tests/test-validate-all.sh +60 -0
  277. package/scripts/tests/test-validate-commands.sh +89 -0
  278. package/scripts/tests/test-validate-hooks.sh +98 -0
  279. package/scripts/tests/test-validate-lessons.sh +150 -0
  280. package/scripts/tests/test-validate-plan-quality.sh +235 -0
  281. package/scripts/tests/test-validate-plans.sh +187 -0
  282. package/scripts/tests/test-validate-plugin.sh +106 -0
  283. package/scripts/tests/test-validate-prd.sh +184 -0
  284. package/scripts/tests/test-validate-skills.sh +134 -0
  285. package/scripts/validate-all.sh +57 -0
  286. package/scripts/validate-commands.sh +67 -0
  287. package/scripts/validate-hooks.sh +89 -0
  288. package/scripts/validate-lessons.sh +98 -0
  289. package/scripts/validate-plan-quality.sh +369 -0
  290. package/scripts/validate-plans.sh +120 -0
  291. package/scripts/validate-plugin.sh +86 -0
  292. package/scripts/validate-policies.sh +42 -0
  293. package/scripts/validate-prd.sh +118 -0
  294. package/scripts/validate-skills.sh +96 -0
  295. package/skills/autocode/SKILL.md +285 -0
  296. package/skills/autocode/ab-verification.md +51 -0
  297. package/skills/autocode/code-quality-standards.md +37 -0
  298. package/skills/autocode/competitive-mode.md +364 -0
  299. package/skills/brainstorming/SKILL.md +97 -0
  300. package/skills/capture-lesson/SKILL.md +187 -0
  301. package/skills/check-lessons/SKILL.md +116 -0
  302. package/skills/dispatching-parallel-agents/SKILL.md +110 -0
  303. package/skills/executing-plans/SKILL.md +85 -0
  304. package/skills/finishing-a-development-branch/SKILL.md +201 -0
  305. package/skills/receiving-code-review/SKILL.md +72 -0
  306. package/skills/requesting-code-review/SKILL.md +59 -0
  307. package/skills/requesting-code-review/code-reviewer.md +82 -0
  308. package/skills/research/SKILL.md +145 -0
  309. package/skills/roadmap/SKILL.md +115 -0
  310. package/skills/subagent-driven-development/SKILL.md +98 -0
  311. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
  312. package/skills/subagent-driven-development/implementer-prompt.md +73 -0
  313. package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
  314. package/skills/systematic-debugging/SKILL.md +134 -0
  315. package/skills/systematic-debugging/condition-based-waiting.md +64 -0
  316. package/skills/systematic-debugging/defense-in-depth.md +32 -0
  317. package/skills/systematic-debugging/root-cause-tracing.md +55 -0
  318. package/skills/test-driven-development/SKILL.md +167 -0
  319. package/skills/using-git-worktrees/SKILL.md +219 -0
  320. package/skills/using-superpowers/SKILL.md +54 -0
  321. package/skills/verification-before-completion/SKILL.md +140 -0
  322. package/skills/verify/SKILL.md +82 -0
  323. package/skills/writing-plans/SKILL.md +128 -0
  324. package/skills/writing-skills/SKILL.md +93 -0
@@ -0,0 +1,425 @@
1
+ # Research: Service Monitor Claude Code Agent
2
+
3
+ **Date:** 2026-02-23
4
+ **Status:** Research complete — ready for agent design
5
+ **Confidence:** High on patterns, Medium on gap between existing tools and Claude Code agent idioms
6
+
7
+ ---
8
+
9
+ ## BLUF
10
+
11
+ No existing tool does exactly what's needed. The closest analog is GASP (AI-first monitoring for LLM consumption) combined with systemd_mon's event-driven DBus model and pengutronix's `check-systemd-service` property enumeration. The agent should be structured as a Bash-heavy Claude Code subagent that runs a deterministic inspection suite, then applies AI pattern recognition to output a structured severity report. The key gap all existing tools miss: proactively hunting Cluster A (silent failure) patterns — services that are technically "active" but have swallowed errors and are doing nothing.
12
+
13
+ ---
14
+
15
+ ## Section 1: Claude Code Agent Infrastructure
16
+
17
+ **Source:** [Anthropic Claude Code Sub-agents Documentation](https://code.claude.com/docs/en/sub-agents) | [wshobson/agents](https://github.com/wshobson/agents) | [VoltAgent/awesome-claude-code-subagents](https://github.com/VoltAgent/awesome-claude-code-subagents) | [Piebald-AI/claude-code-system-prompts](https://github.com/Piebald-AI/claude-code-system-prompts)
18
+
19
+ ### Agent File Format
20
+
21
+ Subagents are `.md` files with YAML frontmatter stored at `~/.claude/agents/` (user-level) or `.claude/agents/` (project-level). The frontmatter controls all behavior:
22
+
23
+ ```yaml
24
+ ---
25
+ name: service-monitor
26
+ description: Monitors all user systemd services and timers for failures, restart loops, silent errors, and resource anomalies. Use when asked about service health.
27
+ tools: Bash, Read, Grep
28
+ model: sonnet
29
+ memory: user
30
+ ---
31
+ ```
32
+
33
+ Key frontmatter fields relevant to this agent:
34
+ - `tools: Bash, Read, Grep` — Bash is essential; Read/Grep for config file inspection
35
+ - `model: sonnet` — sufficient for pattern recognition; haiku acceptable for pure data collection
36
+ - `memory: user` — enables cross-session baseline learning (stores to `~/.claude/agent-memory/service-monitor/`)
37
+ - `permissionMode: default` — standard; no need for bypassPermissions since monitoring is read-only
38
+ - `maxTurns` — consider capping at 40 to prevent runaway inspection loops
39
+
40
+ ### Existing Agent Catalog Gaps
41
+
42
+ The wshobson/agents catalog has 100 agents across 9 categories. The infrastructure category includes `observability-engineer` (SLI/SLO management, distributed tracing) and `devops-incident-responder` but nothing targeting local systemd service health on a single machine. VoltAgent's collection similarly lacks systemd-specific agents — its closest entry is `sre-engineer`.
43
+
44
+ The `infra-auditor` agent already in `~/.claude/agents/infra-auditor.md` provides a template: it checks named services with `systemctl --user is-active`, runs connectivity probes, and checks resource usage. The service-monitor agent should be its successor — deeper on log analysis, more systematic on all 12 services and 21 timers, and specifically designed to find Cluster A silent failures.
45
+
46
+ ---
47
+
48
+ ## Section 2: Systemd Service Monitoring Tools
49
+
50
+ ### 2.1 systemd_mon
51
+
52
+ **Source:** [joonty/systemd_mon](https://github.com/joonty/systemd_mon)
53
+
54
+ A Ruby daemon that monitors systemd units via DBus subscription (no polling). Relevant patterns:
55
+
56
+ - **Event-driven via DBus:** Subscribes to state-change notifications — zero CPU overhead at idle. For a Claude Code agent (which runs on-demand rather than as a daemon), the equivalent is `systemctl --user list-units` + `systemctl show` — a snapshot-based approach.
57
+ - **State aggregation:** Systemd emits granular intermediate states (activating/start-pre, activating/start) during transitions. systemd_mon queues these until a stable terminal state emerges, then classifies the outcome as: `recovered`, `automatically restarted`, or `still failed`. This three-state taxonomy is exactly right for the agent's report format.
58
+ - **Restart loop detection:** If a service cycles through activating→failed→activating repeatedly, the queue of state transitions reveals the loop before the `StartLimitBurst` threshold is hit. The agent equivalent: check `NRestarts` from `systemctl show` combined with `ActiveEnterTimestamp` to compute restart frequency.
59
+ - **Alerting channels:** Email, Slack, HipChat. Not directly applicable, but the pattern of "agent-detected event → Telegram notification" fits the existing Telegram infrastructure.
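The three-state taxonomy above could be approximated from two on-demand snapshots instead of a DBus event queue. The following is only a sketch under stated assumptions: `classify_outcome` is a hypothetical helper, the `unchanged` label is added for illustration, and a unit that failed only since the previous run is also reported as `still failed` for simplicity. The previous values are assumed to come from whatever the agent stored on its last run.

```shell
# Sketch: classify what happened to a unit between two snapshots.
# classify_outcome and the "unchanged" label are assumptions; the other
# three labels are the taxonomy described above.
classify_outcome() {
    prev_state=$1      # ActiveState recorded on the previous run
    cur_state=$2       # ActiveState now
    prev_restarts=$3   # NRestarts recorded on the previous run
    cur_restarts=$4    # NRestarts now

    if [ "$cur_state" = "failed" ]; then
        echo "still failed"
    elif [ "$cur_state" = "active" ] && [ "$cur_restarts" -gt "$prev_restarts" ]; then
        echo "automatically restarted"
    elif [ "$cur_state" = "active" ] && [ "$prev_state" = "failed" ]; then
        echo "recovered"
    else
        echo "unchanged"
    fi
}
```

For example, `classify_outcome active active 2 5` prints `automatically restarted`, because the restart counter grew while the unit ended up active.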
60
+
61
+ ### 2.2 GASP (AI-First Linux Monitoring)
62
+
63
+ **Source:** [AcceleratedIndustries/gasp](https://github.com/AcceleratedIndustries/gasp)
64
+
65
+ The most directly relevant philosophy: monitoring designed for LLM consumption, not human dashboards. Key design principles:
66
+
67
+ - **Context-rich output format:** Structures data for AI reasoning rather than for terminal readability. Each service entry includes state, restart count, recent errors, and resource usage in a single coherent block.
68
+ - **Planned features (in development):** systemd unit states + failed services + restart tracking; 24-hour rolling baselines for anomaly detection; journal log analysis with error rate trending and pattern detection; MCP server implementation.
69
+ - **AI-first anomaly detection:** Rather than threshold alerting (PagerDuty model), GASP aims for contextual reasoning — "is this error rate unusual given time of day and service type?" This is the right model for a Claude Code agent.
70
+ - **Gap:** GASP is still early-stage and daemon-based (Go binary). The Claude Code agent gets the same benefit without a separate process by running on-demand with AI reasoning inline.
71
+
72
+ ### 2.3 pengutronix/monitoring-check-systemd-service
73
+
74
+ **Source:** [pengutronix/monitoring-check-systemd-service](https://github.com/pengutronix/monitoring-check-systemd-service)
75
+
76
+ A Nagios/Icinga plugin that provides the most complete enumeration of `systemctl show` properties worth checking:
77
+
78
+ - Uses DBus (not parsing `systemctl status` text output) — stable, machine-readable
79
+ - Properties it checks: `Id`, `ActiveState`, `SubState`, `LoadState`
80
+ - State mapping (directly adoptable):
81
+ - `LoadState != loaded` → NOT_LOADED (service missing or masked)
82
+ - `ActiveState == failed` → CRITICAL
83
+ - `ActiveState == active` → OK
84
+ - `ActiveState == inactive`, `SubState == dead` → DEAD (warn)
85
+ - `ActiveState == activating/deactivating/reloading` → CHANGING (potential restart loop)
86
+ - **Key insight:** Parsing `systemctl status` text output is explicitly discouraged by systemd developers (Lennart Poettering). Use `systemctl show -p PropertyName` or DBus. The agent should use `systemctl --user show <service> -p ActiveState,SubState,NRestarts,Result,ExecMainStartTimestamp,ActiveEnterTimestamp`.
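The state mapping above could be expressed as a small shell function. This is a minimal sketch, not the plugin's actual code: the name `classify_unit` and the `UNKNOWN` fallback are assumptions added for illustration, and the three arguments are expected to come from `systemctl --user show <service> -p LoadState,ActiveState,SubState`.

```shell
# Sketch: map LoadState / ActiveState / SubState to a severity label.
# classify_unit is a hypothetical helper; labels follow the mapping listed above.
classify_unit() {
    load_state=$1
    active_state=$2
    sub_state=$3

    # A unit that is missing or masked is reported before anything else.
    if [ "$load_state" != "loaded" ]; then
        echo "NOT_LOADED"
        return
    fi

    case "$active_state" in
        failed) echo "CRITICAL" ;;
        active) echo "OK" ;;
        inactive)
            if [ "$sub_state" = "dead" ]; then
                echo "DEAD"
            else
                echo "UNKNOWN"   # assumption: not covered by the mapping
            fi
            ;;
        activating|deactivating|reloading) echo "CHANGING" ;;
        *) echo "UNKNOWN" ;;     # assumption: fallback for unlisted states
    esac
}
```

Keeping the classification separate from the `systemctl` call means the mapping can be checked with fixed inputs, without a running user manager.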
87
+
88
+ ### 2.4 systemd-doctor
89
+
90
+ **Source:** [0xkelvin/systemd-doctor](https://github.com/0xkelvin/systemd-doctor)
91
+
92
+ Embedded Linux service health tracker. Relevant patterns:
93
+ - Stores metrics in a time-series database to detect trend-based anomalies, not just point-in-time failures
94
+ - Integrates with systemd to auto-restart services when abnormalities are detected
95
+ - For the agent: the equivalent of trend tracking is `memory: user` — the agent reads/writes its `MEMORY.md` to track baseline restart counts, error rates per service, and flag deviations from previous runs.
96
+
97
+ ---
98
+
99
+ ## Section 3: Log Pattern Analysis Tools
100
+
101
+ ### 3.1 gjalves/logwatch
102
+
103
+ **Source:** [gjalves/logwatch](https://github.com/gjalves/logwatch)
104
+
105
+ Real-time log monitor (C) for syslog and systemd. Relevant patterns:
106
+ - Pattern-to-action model: define a regex → trigger a script
107
+ - Dual source support: both syslog (`/var/log/syslog`) and systemd journal (via journalctl)
108
+ - For the agent: the equivalent is running `journalctl --user -u <service> --since "24 hours ago" -p warning -o cat` per service, then scanning output for known error patterns
109
+
110
+ ### 3.2 journalctl Patterns for Log Analysis
111
+
112
+ **Sources:** [DigitalOcean journalctl guide](https://www.digitalocean.com/community/tutorials/how-to-use-journalctl-to-view-and-manipulate-systemd-logs) | [Last9 journalctl cheatsheet](https://last9.io/blog/journalctl-commands-cheatsheet/) | [freedesktop.org journalctl man page](https://www.freedesktop.org/software/systemd/man/latest/journalctl.html)
113
+
114
+ The most useful journalctl invocations for the agent:
115
+
116
+ ```bash
117
+ # Error count per service (last 24h)
118
+ journalctl --user -u <service> --since "24 hours ago" -p err -q --no-pager | wc -l
119
+
120
+ # Most frequent error messages (deduped)
121
+ journalctl --user -u <service> --since "24 hours ago" -p warning -o cat --no-pager \
122
+ | sort | uniq -c | sort -rn | head -20
123
+
124
+ # Restart loop detection: look for rapid state transitions
125
+ journalctl --user -u <service> --since "1 hour ago" -o cat --no-pager \
126
+ | grep -E "(Started|Stopped|Failed|start limit)" | tail -20
127
+
128
+ # JSON output for structured parsing
129
+ journalctl --user -u <service> --since "1 hour ago" -p err -o json --no-pager
130
+
131
+ # User-level: rank ALL units by recent error count (one query across every unit)
132
+ journalctl --user --since "24 hours ago" -p err --no-pager -q -o verbose \
133
+ | grep -o "_SYSTEMD_USER_UNIT=.*" | sort | uniq -c | sort -rn
134
+ ```
135
+
136
+ **Priority levels for the agent:**
137
+ - `-p err` (level 3): err and anything more severe (crit, alert, emerg); good for critical detection
137
+ - `-p warning` (level 4): warnings and everything more severe; good for Cluster A silent failure hunting
138
+ - `-p info` (level 6): everything except debug; useful for verifying a service is actually doing work (not just alive)
140
+
141
+ **Silent failure detection pattern (Cluster A):**
142
+ A service can be `active (running)` but have produced zero log output in 24 hours — it's alive but doing nothing. Detection:
143
+ ```bash
144
+ # Count of log entries in last 24h (any priority)
145
+ journalctl --user -u <service> --since "24 hours ago" --no-pager -q | wc -l
146
+ # If count == 0 AND service is active: SILENT FAILURE CANDIDATE
147
+ ```
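The check described above can be wrapped in a predicate so it is easy to apply per service. This is a minimal sketch: `is_silent_candidate` is a hypothetical helper, and its arguments are assumed to be the unit's `ActiveState` and the line count produced by the `journalctl ... | wc -l` query.

```shell
# Sketch: flag a unit that is active but logged nothing in the window.
# is_silent_candidate is a hypothetical helper. Arguments (assumptions):
#   $1 ActiveState of the unit
#   $2 number of journal lines in the last 24h
is_silent_candidate() {
    active_state=$1
    log_count=$2
    [ "$active_state" = "active" ] && [ "$log_count" -eq 0 ]
}
```

A caller might write `if is_silent_candidate "$STATE" "$COUNT"; then echo "SILENT FAILURE CANDIDATE"; fi`. The result is only a candidate, since some services legitimately log nothing for long periods.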
148
+
149
+ ### 3.3 incident-helper
150
+
151
+ **Source:** [malikyawar/incident-helper](https://github.com/malikyawar/incident-helper)
152
+
153
+ AI-powered terminal assistant for SREs. Key architectural decisions:
154
+
155
+ - **ServiceResolver:** Runs `systemctl status <service>` + extracts structured fields (unit file, PID, memory, state, recent log lines) into a context block for LLM reasoning
156
+ - **LogResolver:** Reads log files + applies regex pattern detection before passing to LLM — reduces token usage by pre-filtering
157
+ - **Provider abstraction:** Supports multiple LLM backends (OpenAI, Anthropic, local) — for the Claude Code agent, the LLM is Claude itself, so the pre-filtering pattern still applies
158
+ - **Context-aware prompting:** Includes specialized system prompts for "service down", "high error rate", "restart loop" scenarios — each with different diagnostic questions
159
+
160
+ **Key insight:** Pre-filter log output before passing it to the LLM. Collecting 24 hours of logs across 12 services can yield hundreds of KB of text. The agent should: (1) collect raw counts and error summaries via bash, (2) extract only the top 20 most frequent error messages per service, (3) pass that structured summary to Claude for pattern reasoning.
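Step (2) of the pre-filtering idea can be sketched as a stdin filter. The helper name `top_messages` is an assumption for illustration; it expects one log message per line (for example from `journalctl ... -o cat`) and keeps only the most frequent ones.

```shell
# Sketch: reduce raw log lines on stdin to the N most frequent messages.
# top_messages is a hypothetical helper; N defaults to 20 as suggested above.
top_messages() {
    limit=${1:-20}
    # Group identical lines, count them, and keep the highest counts.
    sort | uniq -c | sort -rn | head -n "$limit"
}
```

Each output line is a count followed by the message, which gives the model the frequency signal without the full log volume.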
161
+
162
+ ---
163
+
+ ## Section 4: systemctl Properties for Health Checking
+
+ **Sources:** [systemctl.com show command reference](https://www.systemctl.com/commands/show/) | [freedesktop.org systemctl man](https://www.freedesktop.org/software/systemd/man/latest/systemctl.html) | [Baeldung NRestarts guide](https://www.baeldung.com/linux/systemd-show-times-service-restarted)
+
+ ### Service Properties (via `systemctl --user show <service> -p ...`)
+
+ | Property | Use |
+ |----------|-----|
+ | `ActiveState` | active/inactive/failed/activating/deactivating |
+ | `SubState` | running/dead/exited/failed (substate of ActiveState) |
+ | `LoadState` | loaded/not-found/masked |
+ | `NRestarts` | Cumulative auto-restart count since last manual start |
+ | `Result` | success/exit-code/signal/core-dump/watchdog/start-limit-hit/oom-kill/timeout |
+ | `ExecMainStartTimestamp` | When main process last started (printed as a formatted date string, not ISO 8601) |
+ | `ActiveEnterTimestamp` | When service last became active |
+ | `ActiveExitTimestamp` | When service last stopped being active |
+ | `ExecMainExitTimestamp` | When main process last exited |
+ | `MainPID` | Current main process PID (0 if not running) |
+ | `MemoryCurrent` | Current memory usage in bytes |
+ | `MemoryMax` | Configured memory limit |
+ | `CPUUsageNSec` | Cumulative CPU time in nanoseconds |
+
+ **Restart loop detection formula:**
+ ```bash
+ NRESTARTS=$(systemctl --user show <service> -p NRestarts --value)
+ ENTER_TS=$(systemctl --user show <service> -p ActiveEnterTimestamp --value)
+ # ENTER_TS is a formatted date string; convert it before comparing (GNU date)
+ ENTER_EPOCH=$(date -d "$ENTER_TS" +%s)
+ # If NRESTARTS > 3 and (now - ENTER_EPOCH) < 3600: recent restart loop
+ ```
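The comparison described in the comment can be made explicit. The sketch below assumes the timestamp has already been converted to epoch seconds; the function name `is_restart_loop` and the unit name are hypothetical, and the thresholds (more than 3 restarts, under one hour) are the ones stated above.

```shell
# Hypothetical helper: exit 0 when a unit looks like it is in a recent
# restart loop. All arguments are plain integers.
#   $1 = NRestarts
#   $2 = ActiveEnterTimestamp as epoch seconds
#   $3 = current time as epoch seconds
is_restart_loop() {
    nrestarts=$1
    enter_epoch=$2
    now_epoch=$3
    [ "$nrestarts" -gt 3 ] && [ $((now_epoch - enter_epoch)) -lt 3600 ]
}

# Usage (GNU date assumed for parsing systemctl's formatted timestamp):
#   n=$(systemctl --user show my.service -p NRestarts --value)
#   ts=$(systemctl --user show my.service -p ActiveEnterTimestamp --value)
#   is_restart_loop "$n" "$(date -d "$ts" +%s)" "$(date +%s)" && echo "RESTART LOOP"
```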
+
+ **Start-limit-hit detection:**
+ ```bash
+ RESULT=$(systemctl --user show <service> -p Result --value)
+ # If Result == "start-limit-hit": service is in death spiral, won't auto-restart
+ ```
+
+ ### Timer Properties (via `systemctl --user show <timer> -p ...`)
+
+ | Property | Use |
+ |----------|-----|
+ | `LastTriggerUSec` | Last time the timer fired (`systemctl show` prints it as a formatted timestamp despite the `USec` name) |
+ | `NextElapseUSecRealtime` | Next scheduled fire time |
+ | `ActiveState` | active (waiting) / inactive / failed |
+ | `Result` | success/failure for last trigger |
+
+ **Missed run detection:**
+ ```bash
+ LAST_TRIGGER=$(systemctl --user show <timer>.timer -p LastTriggerUSec --value)
+ # The value is a formatted timestamp, so convert it to epoch seconds (GNU date)
+ LAST_EPOCH=$(date -d "$LAST_TRIGGER" +%s)
+ NOW_EPOCH=$(date +%s)
+ AGE_HOURS=$(( (NOW_EPOCH - LAST_EPOCH) / 3600 ))
+ # Compare against expected interval (e.g., notion-sync should fire every 6h)
+ # If AGE_HOURS > 1.5 * expected_interval: MISSED RUN
+ ```
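Shell arithmetic is integer-only, so the 1.5x comparison needs rewriting before it can run. A minimal sketch, assuming all inputs are epoch seconds and using a hypothetical function name `timer_status`; it also separates the never-fired case from a missed run:

```shell
# Hypothetical helper: classify a timer from integer inputs.
#   $1 = last trigger in epoch seconds (0 or empty if the timer never fired)
#   $2 = current time in epoch seconds
#   $3 = expected interval in seconds
timer_status() {
    last=${1:-0}
    now=$2
    interval=$3
    if [ "$last" -eq 0 ]; then
        echo "NEVER_FIRED"
    # age > 1.5 * interval, written as 2*age > 3*interval to stay in integers
    elif [ $(( 2 * (now - last) )) -gt $(( 3 * interval )) ]; then
        echo "MISSED_RUN"
    else
        echo "OK"
    fi
}

# Example: a 6-hour timer (21600 s) that last fired 10 hours ago
#   timer_status "$(( $(date +%s) - 36000 ))" "$(date +%s)" 21600
```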
+
+ **Known issue:** `systemctl list-timers` output is human-formatted. Machine-readable checks require `systemctl show` with `-p` flags.
+
+ ---
+
+ ## Section 5: Dependency Health Patterns
+
+ **Sources:** [Netdata systemd units monitoring](https://www.netdata.cloud/monitoring-101/systemdunits-monitoring/) | [CubePath service monitoring guide](https://cubepath.com/docs/monitoring-logging/service-monitoring-with-systemd) | [Zabbix systemd template](https://github.com/MogiePete/zabbix-systemd-service-monitoring)
+
+ ### Service Dependency Failure Cascade
+
+ Systemd tracks `Wants=`, `Requires=`, `After=` dependencies. A service can fail because its dependency failed — the unit itself shows "failed" but the root cause is elsewhere. Detection pattern:
+
+ ```bash
+ # Check if a failed service has failed dependencies
+ systemctl --user list-dependencies <service> --failed
+ # Or: check the service's own log for "dependency failed" messages
+ journalctl --user -u <service> --since "1 hour ago" -o cat | grep -i "depend"
+ ```
+
+ **Known dependencies in the target system:**
+ - `aria-hub.service` depends on MQTT broker (core_mosquitto on HA Pi). If `aria-hub` fails, the agent should check MQTT connectivity: `nc -z <mqtt-broker-ip> 1883`
+ - `telegram-listener.service` and `telegram-capture.service` both poll the same Telegram bot API. The 409 conflict error ("Conflict: terminated by other getUpdates request; make sure that only one bot instance is running") appears in their logs as a distinct pattern. The agent must check for this specific pattern.
+
+ ### Known Failure Patterns to Detect
+
+ | Pattern | Service(s) | Detection |
+ |---------|-----------|-----------|
+ | Telegram 409 conflict | telegram-listener, telegram-capture | `journalctl --user -u 'telegram-*.service' --since "1h ago" \| grep "409"` |
+ | Ollama queue starvation | ollama-queue | Check port 7683 responds; check queue length via API |
+ | MQTT disconnect | aria-hub | `journalctl --user -u aria-hub --since "1h ago" \| grep -i "mqtt\|disconnect\|reconnect"` |
+ | Memory limit OOM kill | open-webui, gpt-researcher | `Result == oom-kill` in systemctl show; `journalctl -k --since "24h ago" \| grep -i "oom"` |
+ | Start limit hit | any | `Result == start-limit-hit` — service will not auto-restart |
+ | Silent active failure | any | `ActiveState == active` + 0 log entries in 24h |
+
+ ---
+
+ ## Section 6: Timer and Cron Job Monitoring
+
+ **Sources:** [check_systemd PyPI](https://pypi.org/project/check_systemd/) | [healthchecks.io](https://healthchecks.io/) | [ArchWiki systemd/Timers](https://wiki.archlinux.org/title/Systemd/Timers)
+
+ ### check_systemd (Python, PyPI)
+
+ A Nagios-compatible plugin that includes timer-specific monitoring:
+ - `--dead-timers` parameter detects timers that have not fired in longer than expected
+ - Checks `LastTriggerUSec` against a configurable age threshold
+ - Supports both system and user scope (`--user` flag)
+
+ Installation: `pip install check_systemd`. Relevant invocation:
+ ```bash
+ check_systemd --user --dead-timers --dead-timers-warning 32400 --dead-timers-critical 43200
+ # Thresholds are ages in seconds, not interval multipliers:
+ # 32400 s = 9 h (1.5x a 6-hour interval), 43200 s = 12 h (2x)
+ ```
+
+ ### Timer Health Check Strategy
+
+ The 21 timers have varying intervals. The agent needs a per-timer expected interval table:
+
+ | Timer | Expected Interval | Max Acceptable Age |
+ |-------|------------------|--------------------|
+ | `aria-watchdog.timer` | 5 minutes | 15 minutes |
+ | `ha-log-sync.timer` | 15 minutes | 45 minutes |
+ | `telegram-brief-alerts.timer` | 5 minutes | 15 minutes |
+ | `notion-sync.timer` | 6 hours | 9 hours |
+ | `notion-vector-sync.timer` | 6 hours | 9 hours |
+ | `telegram-capture-sync.timer` | 6 hours | 9 hours |
+ | `telegram-brief-{morning,midday,evening}.timer` | daily | 30 hours |
+ | `aria-*.timer` (daily) | daily | 30 hours |
+ | `aria-*.timer` (weekly) | weekly | 9 days |
+ | `ha-log-sync-rotate.timer` | daily | 30 hours |
+ | `lessons-review.timer` | monthly | 35 days |
+
+ ---
+
+ ## Synthesis: Best Patterns to Adopt
+
+ ### Pattern 1: Deterministic Inspection + AI Interpretation (from incident-helper)
+
+ Do NOT ask Claude to "look at the services." Instead, run a deterministic bash inspection suite that produces structured data, then pass that structured data to Claude for pattern reasoning. The agent's job is 80% bash data collection, 20% AI interpretation.
+
+ ### Pattern 2: Three-Tier Severity from systemd_mon
+
+ Adopt the `recovered / restarting / still-failed` taxonomy and expand it:
+ - **CRITICAL:** `ActiveState == failed`, `Result == start-limit-hit`, `Result == oom-kill`
+ - **WARNING:** `NRestarts > 3` in last hour, timer missed by >1.5x interval, error count > threshold
+ - **ANOMALY (Cluster A):** `ActiveState == active` + zero log entries in 24h, Telegram 409 in logs, MQTT disconnect loop
+ - **OK:** All checks pass
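The service-level part of these tiers can be sketched as one ordered check. This is an illustration under assumptions, not the agent's actual logic: the function name `classify_severity` is hypothetical, it covers only the property-based conditions (no timer, error-count, or log-pattern checks), and it does not verify that the restarts happened within the last hour.

```shell
# Hypothetical classifier for the service-level tiers listed above.
# Inputs are values already read from `systemctl show` and journalctl.
#   $1 = ActiveState, $2 = Result, $3 = NRestarts, $4 = journal entries in 24h
classify_severity() {
    active_state=$1
    result=$2
    nrestarts=$3
    log_count=$4
    if [ "$active_state" = "failed" ] \
        || [ "$result" = "start-limit-hit" ] \
        || [ "$result" = "oom-kill" ]; then
        echo "CRITICAL"
    elif [ "$nrestarts" -gt 3 ]; then
        echo "WARNING"
    elif [ "$active_state" = "active" ] && [ "$log_count" -eq 0 ]; then
        echo "ANOMALY"
    else
        echo "OK"
    fi
}
```

Checking the tiers from most to least severe means a unit that matches several conditions is reported once, at its highest severity.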
+
+ ### Pattern 3: Machine-Readable Properties Only (from pengutronix)
+
+ Never parse `systemctl status` text. Always use:
+ ```bash
+ # No --value here: with several properties, KEY=VALUE lines identify each field
+ systemctl --user show <service> -p ActiveState,SubState,NRestarts,Result,ExecMainStartTimestamp
+ ```
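When several properties are requested at once, `systemctl show` prints one `KEY=VALUE` line per property (as long as `--value` is not given), which is straightforward to pick apart. A minimal sketch with a hypothetical helper name `get_prop`:

```shell
# Hypothetical helper: print the value of one property from KEY=VALUE
# lines on stdin. Only the "KEY=" prefix is stripped, so values that
# themselves contain "=" are preserved.
get_prop() {
    sed -n "s/^$1=//p"
}

# Usage (my.service is a placeholder):
#   props=$(systemctl --user show my.service -p ActiveState,NRestarts,Result)
#   printf '%s\n' "$props" | get_prop NRestarts
```

Fetching all properties in one call and extracting fields afterwards keeps the sweep to a single `systemctl` invocation per service.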
+
+ ### Pattern 4: Per-Service Log Error Rate (from GASP's AI-first philosophy)
+
+ For each service, compute: `(error_count_last_24h, warning_count_last_24h, total_entries_last_24h)`. Pass the ratio, not the raw logs. A service with 1000 entries and 5 errors (0.5%) is healthier than one with 10 entries and 3 errors (30%).
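The ratio needs fractional arithmetic, which plain shell lacks, so the sketch below delegates the division to `awk`. The function name `error_rate_pct` is hypothetical; the two counts are assumed to come from `journalctl ... | wc -l` with and without `-p err`.

```shell
# Hypothetical helper: error rate as a percentage with one decimal place.
#   $1 = error count, $2 = total entry count
error_rate_pct() {
    errors=$1
    total=$2
    if [ "$total" -eq 0 ]; then
        # No entries at all: a rate is undefined (and a silent-failure hint)
        echo "n/a"
    else
        LC_ALL=C awk -v e="$errors" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * e / t }'
    fi
}

error_rate_pct 5 1000   # prints 0.5
error_rate_pct 3 10     # prints 30.0
```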
+
+ ### Pattern 5: Memory-Based Baselines (from systemd-doctor + GASP)
+
+ Use `memory: user` to persist:
+ - NRestarts baseline per service (from last clean run)
+ - Error rate baseline per service
+ - "Last seen active" timestamp for silent failure detection
+
+ Compare current values against stored baselines; flag deviations >2x as anomalies.
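The 2x comparison is a one-line integer test. A minimal sketch, assuming integer metrics such as restart or error counts and a hypothetical function name `exceeds_baseline`; how baselines are read from and written to memory is left out:

```shell
# Hypothetical helper: exit 0 when the current value is more than twice
# the stored baseline. Both arguments are non-negative integers.
exceeds_baseline() {
    current=$1
    baseline=$2
    [ "$current" -gt $(( 2 * baseline )) ]
}

# Usage:
#   exceeds_baseline "$errors_today" "$errors_baseline" && echo "ANOMALY"
```

Note that with a baseline of 0, any non-zero value counts as a deviation, which may or may not be the desired behaviour for a service that normally logs no errors.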
+
+ ### Pattern 6: Pre-Filter Before LLM (from incident-helper's LogResolver)
+
+ ```bash
+ # Collect: top 20 error messages per service (deduplicated)
+ journalctl --user -u <service> --since "24 hours ago" -p err -o cat --no-pager \
+   | sort | uniq -c | sort -rn | head -20
+ ```
+
+ Pass this 20-line summary to Claude, not hundreds of raw log lines.
+
+ ---
+
+ ## Recommended Agent Structure
+
+ ### File Location
+
+ `~/.claude/agents/service-monitor.md` (user-level, available in all projects)
+
+ ### Frontmatter
+
+ ```yaml
+ ---
+ name: service-monitor
+ description: Audits all 12 user systemd services and 21 timers for failures, restart loops, silent errors, resource anomalies, and known failure patterns (Telegram 409, MQTT disconnect, OOM kills, start-limit-hit). Use when asked about service health, when services may be failing silently, or before commits that touch service code.
+ tools: Bash, Read, Grep
+ model: sonnet
+ memory: user
+ maxTurns: 50
+ ---
+ ```
+
+ ### System Prompt Structure (recommended sections)
+
+ 1. **Identity:** You are a systemd service health monitor for a personal Linux workstation with 12 user services and 21 timers.
+
+ 2. **Inspection phases (ordered):**
+    - Phase 1: Service state sweep (all 12 services via `systemctl --user show`)
+    - Phase 2: Timer health check (all 21 timers, compare LastTriggerUSec against interval table)
+    - Phase 3: Per-service log analysis (error rates, silent failure detection, known pattern matching)
+    - Phase 4: Resource anomaly check (memory usage vs MemoryMax, load average)
+    - Phase 5: Known failure pattern scan (Telegram 409, MQTT disconnect, OOM kills)
+    - Phase 6: Baseline comparison (read MEMORY.md for previous baselines, flag deviations)
+
+ 3. **Data collection commands:** Explicit bash commands for each phase (do not improvise)
+
+ 4. **Output format:** Severity-stratified report (CRITICAL / WARNING / ANOMALY / OK) with recommended actions
+
+ 5. **Memory update:** After each run, update MEMORY.md with new baselines
+
+ ### Report Format
+
+ ```
+ SERVICE MONITOR REPORT — <timestamp>
+
+ CRITICAL (immediate action required):
+ - [service]: [issue] — [recommended action]
+
+ WARNING (investigate soon):
+ - [service]: [issue] — [recommended action]
+
+ ANOMALY — Cluster A Candidates (silent failures):
+ - [service]: active but [N] log entries in 24h (baseline: [M]) — verify service is doing work
+
+ TIMER ISSUES:
+ - [timer]: last fired [X] hours ago (expected: every [Y] hours)
+
+ OK: [N] services healthy, [M] timers on schedule
+
+ Baseline updated: [timestamp]
+ ```
+
+ ### Known Limitations to Document in the Agent
+
+ 1. `NRestarts` is cumulative since last `systemctl --user start` — it does not reset on auto-restart. This means a service restarted manually 30 days ago and auto-restarted 5 times since shows NRestarts=5, but a service that was started 1 hour ago and crashed 5 times also shows NRestarts=5. Must combine with `ActiveEnterTimestamp` to compute restart frequency.
+
+ 2. `LastTriggerUSec` is 0 on D-Bus for timers that have never fired (e.g., newly installed), which `systemctl show` renders as an empty value or `n/a` instead of a timestamp. The agent should detect this and flag it as a setup issue, not a missed run.
+
+ 3. User systemd scope requires `--user` flag on all `systemctl` and `journalctl` commands. System-scope services (ssh.socket, tailscaled) require separate invocations without `--user`.
+
+ 4. User services keep running after logout only if lingering is enabled (check with `loginctl show-user justin -p Linger`). If lingering is off, user services stop on logout, so the logs read via `journalctl --user` may be incomplete.
+
+ ---
+
+ ## Sources
+
+ - [Anthropic Claude Code Sub-agents Documentation](https://code.claude.com/docs/en/sub-agents) — definitive agent file format, frontmatter fields, permissionMode, memory scopes
+ - [joonty/systemd_mon](https://github.com/joonty/systemd_mon) — event-driven DBus monitoring, state aggregation, restart loop taxonomy
+ - [AcceleratedIndustries/gasp](https://github.com/AcceleratedIndustries/gasp) — AI-first monitoring philosophy, context-rich output for LLM consumption
+ - [pengutronix/monitoring-check-systemd-service](https://github.com/pengutronix/monitoring-check-systemd-service) — machine-readable property enumeration, DBus-over-text-parsing rationale
+ - [malikyawar/incident-helper](https://github.com/malikyawar/incident-helper) — ServiceResolver + LogResolver architecture, pre-filtering before LLM
+ - [gjalves/logwatch](https://github.com/gjalves/logwatch) — pattern-to-action model, syslog + systemd dual source
+ - [0xkelvin/systemd-doctor](https://github.com/0xkelvin/systemd-doctor) — time-series health tracking for trend-based anomaly detection
+ - [systemd/python-systemd](https://github.com/systemd/python-systemd) — programmatic journal access API
+ - [check_systemd PyPI](https://pypi.org/project/check_systemd/) — timer dead-run detection, `--dead-timers` parameter
+ - [wshobson/agents](https://github.com/wshobson/agents) — Claude Code agent catalog structure, observability-engineer pattern
+ - [VoltAgent/awesome-claude-code-subagents](https://github.com/VoltAgent/awesome-claude-code-subagents) — subagent design patterns, infrastructure category coverage
+ - [Piebald-AI/claude-code-system-prompts](https://github.com/Piebald-AI/claude-code-system-prompts) — Claude Code system prompt internals
+ - [DigitalOcean journalctl guide](https://www.digitalocean.com/community/tutorials/how-to-use-journalctl-to-view-and-manipulate-systemd-logs) — journalctl flag reference
+ - [Last9 journalctl cheatsheet](https://last9.io/blog/journalctl-commands-cheatsheet/) — practical monitoring patterns
+ - [freedesktop.org systemctl man](https://www.freedesktop.org/software/systemd/man/latest/systemctl.html) — authoritative property reference
+ - [MogiePete/zabbix-systemd-service-monitoring](https://github.com/MogiePete/zabbix-systemd-service-monitoring) — multi-service discovery and monitoring template
+ - [healthchecks.io](https://healthchecks.io/) — cron/timer missed-run detection model
+ - [Netdata systemd units monitoring](https://www.netdata.cloud/monitoring-101/systemdunits-monitoring/) — metrics and state monitoring reference