autonomous-coding-toolkit 1.0.0
- package/.claude-plugin/marketplace.json +22 -0
- package/.claude-plugin/plugin.json +13 -0
- package/LICENSE +21 -0
- package/Makefile +21 -0
- package/README.md +140 -0
- package/SECURITY.md +28 -0
- package/agents/bash-expert.md +113 -0
- package/agents/dependency-auditor.md +138 -0
- package/agents/integration-tester.md +120 -0
- package/agents/lesson-scanner.md +149 -0
- package/agents/python-expert.md +179 -0
- package/agents/service-monitor.md +141 -0
- package/agents/shell-expert.md +147 -0
- package/benchmarks/runner.sh +147 -0
- package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
- package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
- package/benchmarks/tasks/02-refactor-module/task.md +8 -0
- package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
- package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
- package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
- package/bin/act.js +238 -0
- package/commands/autocode.md +6 -0
- package/commands/cancel-ralph.md +18 -0
- package/commands/code-factory.md +53 -0
- package/commands/create-prd.md +55 -0
- package/commands/ralph-loop.md +18 -0
- package/commands/run-plan.md +117 -0
- package/commands/submit-lesson.md +122 -0
- package/docs/ARCHITECTURE.md +630 -0
- package/docs/CONTRIBUTING.md +125 -0
- package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
- package/docs/lessons/0002-async-def-without-await.md +28 -0
- package/docs/lessons/0003-create-task-without-callback.md +28 -0
- package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
- package/docs/lessons/0005-sqlite-without-closing.md +33 -0
- package/docs/lessons/0006-venv-pip-path.md +27 -0
- package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
- package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
- package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
- package/docs/lessons/0010-local-outside-function-bash.md +33 -0
- package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
- package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
- package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
- package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
- package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
- package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
- package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
- package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
- package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
- package/docs/lessons/0020-persist-state-incrementally.md +44 -0
- package/docs/lessons/0021-dual-axis-testing.md +48 -0
- package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
- package/docs/lessons/0023-static-analysis-spiral.md +51 -0
- package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
- package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
- package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
- package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
- package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
- package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
- package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
- package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
- package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
- package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
- package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
- package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
- package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
- package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
- package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
- package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
- package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
- package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
- package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
- package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
- package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
- package/docs/lessons/0045-iterative-design-improvement.md +33 -0
- package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
- package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
- package/docs/lessons/0048-integration-wiring-batch.md +40 -0
- package/docs/lessons/0049-ab-verification.md +41 -0
- package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
- package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
- package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
- package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
- package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
- package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
- package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
- package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
- package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
- package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
- package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
- package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
- package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
- package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
- package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
- package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
- package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
- package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
- package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
- package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
- package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
- package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
- package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
- package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
- package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
- package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
- package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
- package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
- package/docs/lessons/0078-static-review-without-live-test.md +30 -0
- package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
- package/docs/lessons/FRAMEWORK.md +161 -0
- package/docs/lessons/SUMMARY.md +201 -0
- package/docs/lessons/TEMPLATE.md +85 -0
- package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
- package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
- package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
- package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
- package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
- package/docs/plans/2026-02-21-mab-research-report.md +406 -0
- package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
- package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
- package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
- package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
- package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
- package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
- package/docs/plans/2026-02-22-mab-run-design.md +462 -0
- package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
- package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
- package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
- package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
- package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
- package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
- package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
- package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
- package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
- package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
- package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
- package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
- package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
- package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
- package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
- package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
- package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
- package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
- package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
- package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
- package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
- package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
- package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
- package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
- package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
- package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
- package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
- package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
- package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
- package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
- package/docs/plans/2026-02-24-headless-module-split.md +443 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
- package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
- package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
- package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
- package/docs/plans/audit-findings.md +186 -0
- package/docs/telegram-notification-format.md +98 -0
- package/examples/example-plan.md +51 -0
- package/examples/example-prd.json +72 -0
- package/examples/example-roadmap.md +33 -0
- package/examples/quickstart-plan.md +63 -0
- package/hooks/hooks.json +26 -0
- package/hooks/setup-symlinks.sh +48 -0
- package/hooks/stop-hook.sh +135 -0
- package/package.json +47 -0
- package/policies/bash.md +71 -0
- package/policies/python.md +71 -0
- package/policies/testing.md +61 -0
- package/policies/universal.md +60 -0
- package/scripts/analyze-report.sh +97 -0
- package/scripts/architecture-map.sh +145 -0
- package/scripts/auto-compound.sh +273 -0
- package/scripts/batch-audit.sh +42 -0
- package/scripts/batch-test.sh +101 -0
- package/scripts/entropy-audit.sh +221 -0
- package/scripts/failure-digest.sh +51 -0
- package/scripts/generate-ast-rules.sh +96 -0
- package/scripts/init.sh +112 -0
- package/scripts/lesson-check.sh +428 -0
- package/scripts/lib/common.sh +61 -0
- package/scripts/lib/cost-tracking.sh +153 -0
- package/scripts/lib/ollama.sh +60 -0
- package/scripts/lib/progress-writer.sh +128 -0
- package/scripts/lib/run-plan-context.sh +215 -0
- package/scripts/lib/run-plan-echo-back.sh +231 -0
- package/scripts/lib/run-plan-headless.sh +396 -0
- package/scripts/lib/run-plan-notify.sh +57 -0
- package/scripts/lib/run-plan-parser.sh +81 -0
- package/scripts/lib/run-plan-prompt.sh +215 -0
- package/scripts/lib/run-plan-quality-gate.sh +132 -0
- package/scripts/lib/run-plan-routing.sh +315 -0
- package/scripts/lib/run-plan-sampling.sh +170 -0
- package/scripts/lib/run-plan-scoring.sh +146 -0
- package/scripts/lib/run-plan-state.sh +142 -0
- package/scripts/lib/run-plan-team.sh +199 -0
- package/scripts/lib/telegram.sh +54 -0
- package/scripts/lib/thompson-sampling.sh +176 -0
- package/scripts/license-check.sh +74 -0
- package/scripts/mab-run.sh +575 -0
- package/scripts/module-size-check.sh +146 -0
- package/scripts/patterns/async-no-await.yml +5 -0
- package/scripts/patterns/bare-except.yml +6 -0
- package/scripts/patterns/empty-catch.yml +6 -0
- package/scripts/patterns/hardcoded-localhost.yml +9 -0
- package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
- package/scripts/pipeline-status.sh +197 -0
- package/scripts/policy-check.sh +226 -0
- package/scripts/prior-art-search.sh +133 -0
- package/scripts/promote-mab-lessons.sh +126 -0
- package/scripts/prompts/agent-a-superpowers.md +29 -0
- package/scripts/prompts/agent-b-ralph.md +29 -0
- package/scripts/prompts/judge-agent.md +61 -0
- package/scripts/prompts/planner-agent.md +44 -0
- package/scripts/pull-community-lessons.sh +90 -0
- package/scripts/quality-gate.sh +266 -0
- package/scripts/research-gate.sh +90 -0
- package/scripts/run-plan.sh +329 -0
- package/scripts/scope-infer.sh +159 -0
- package/scripts/setup-ralph-loop.sh +155 -0
- package/scripts/telemetry.sh +230 -0
- package/scripts/tests/run-all-tests.sh +52 -0
- package/scripts/tests/test-act-cli.sh +46 -0
- package/scripts/tests/test-agents-md.sh +87 -0
- package/scripts/tests/test-analyze-report.sh +114 -0
- package/scripts/tests/test-architecture-map.sh +89 -0
- package/scripts/tests/test-auto-compound.sh +169 -0
- package/scripts/tests/test-batch-test.sh +65 -0
- package/scripts/tests/test-benchmark-runner.sh +25 -0
- package/scripts/tests/test-common.sh +168 -0
- package/scripts/tests/test-cost-tracking.sh +158 -0
- package/scripts/tests/test-echo-back.sh +180 -0
- package/scripts/tests/test-entropy-audit.sh +146 -0
- package/scripts/tests/test-failure-digest.sh +66 -0
- package/scripts/tests/test-generate-ast-rules.sh +145 -0
- package/scripts/tests/test-helpers.sh +82 -0
- package/scripts/tests/test-init.sh +47 -0
- package/scripts/tests/test-lesson-check.sh +278 -0
- package/scripts/tests/test-lesson-local.sh +55 -0
- package/scripts/tests/test-license-check.sh +109 -0
- package/scripts/tests/test-mab-run.sh +182 -0
- package/scripts/tests/test-ollama-lib.sh +49 -0
- package/scripts/tests/test-ollama.sh +60 -0
- package/scripts/tests/test-pipeline-status.sh +198 -0
- package/scripts/tests/test-policy-check.sh +124 -0
- package/scripts/tests/test-prior-art-search.sh +96 -0
- package/scripts/tests/test-progress-writer.sh +140 -0
- package/scripts/tests/test-promote-mab-lessons.sh +110 -0
- package/scripts/tests/test-pull-community-lessons.sh +149 -0
- package/scripts/tests/test-quality-gate.sh +241 -0
- package/scripts/tests/test-research-gate.sh +132 -0
- package/scripts/tests/test-run-plan-cli.sh +86 -0
- package/scripts/tests/test-run-plan-context.sh +305 -0
- package/scripts/tests/test-run-plan-e2e.sh +153 -0
- package/scripts/tests/test-run-plan-headless.sh +424 -0
- package/scripts/tests/test-run-plan-notify.sh +124 -0
- package/scripts/tests/test-run-plan-parser.sh +217 -0
- package/scripts/tests/test-run-plan-prompt.sh +254 -0
- package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
- package/scripts/tests/test-run-plan-routing.sh +178 -0
- package/scripts/tests/test-run-plan-scoring.sh +148 -0
- package/scripts/tests/test-run-plan-state.sh +261 -0
- package/scripts/tests/test-run-plan-team.sh +157 -0
- package/scripts/tests/test-scope-infer.sh +150 -0
- package/scripts/tests/test-setup-ralph-loop.sh +63 -0
- package/scripts/tests/test-telegram-env.sh +38 -0
- package/scripts/tests/test-telegram.sh +121 -0
- package/scripts/tests/test-telemetry.sh +46 -0
- package/scripts/tests/test-thompson-sampling.sh +139 -0
- package/scripts/tests/test-validate-all.sh +60 -0
- package/scripts/tests/test-validate-commands.sh +89 -0
- package/scripts/tests/test-validate-hooks.sh +98 -0
- package/scripts/tests/test-validate-lessons.sh +150 -0
- package/scripts/tests/test-validate-plan-quality.sh +235 -0
- package/scripts/tests/test-validate-plans.sh +187 -0
- package/scripts/tests/test-validate-plugin.sh +106 -0
- package/scripts/tests/test-validate-prd.sh +184 -0
- package/scripts/tests/test-validate-skills.sh +134 -0
- package/scripts/validate-all.sh +57 -0
- package/scripts/validate-commands.sh +67 -0
- package/scripts/validate-hooks.sh +89 -0
- package/scripts/validate-lessons.sh +98 -0
- package/scripts/validate-plan-quality.sh +369 -0
- package/scripts/validate-plans.sh +120 -0
- package/scripts/validate-plugin.sh +86 -0
- package/scripts/validate-policies.sh +42 -0
- package/scripts/validate-prd.sh +118 -0
- package/scripts/validate-skills.sh +96 -0
- package/skills/autocode/SKILL.md +285 -0
- package/skills/autocode/ab-verification.md +51 -0
- package/skills/autocode/code-quality-standards.md +37 -0
- package/skills/autocode/competitive-mode.md +364 -0
- package/skills/brainstorming/SKILL.md +97 -0
- package/skills/capture-lesson/SKILL.md +187 -0
- package/skills/check-lessons/SKILL.md +116 -0
- package/skills/dispatching-parallel-agents/SKILL.md +110 -0
- package/skills/executing-plans/SKILL.md +85 -0
- package/skills/finishing-a-development-branch/SKILL.md +201 -0
- package/skills/receiving-code-review/SKILL.md +72 -0
- package/skills/requesting-code-review/SKILL.md +59 -0
- package/skills/requesting-code-review/code-reviewer.md +82 -0
- package/skills/research/SKILL.md +145 -0
- package/skills/roadmap/SKILL.md +115 -0
- package/skills/subagent-driven-development/SKILL.md +98 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
- package/skills/subagent-driven-development/implementer-prompt.md +73 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
- package/skills/systematic-debugging/SKILL.md +134 -0
- package/skills/systematic-debugging/condition-based-waiting.md +64 -0
- package/skills/systematic-debugging/defense-in-depth.md +32 -0
- package/skills/systematic-debugging/root-cause-tracing.md +55 -0
- package/skills/test-driven-development/SKILL.md +167 -0
- package/skills/using-git-worktrees/SKILL.md +219 -0
- package/skills/using-superpowers/SKILL.md +54 -0
- package/skills/verification-before-completion/SKILL.md +140 -0
- package/skills/verify/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +128 -0
- package/skills/writing-skills/SKILL.md +93 -0
|
@@ -0,0 +1,533 @@
|
|
|
1
|
+
# Research: Shell Expert Claude Code Agent
|
|
2
|
+
|
|
3
|
+
**Date:** 2026-02-23
|
|
4
|
+
**Status:** Complete
|
|
5
|
+
**Confidence:** High on tool landscape and check taxonomy; Medium on agent structure (few direct precedents for this exact scope)
|
|
6
|
+
**Cynefin domain:** Complicated — knowable with expert analysis
|
|
7
|
+
**Scope:** System operations agent — systemd services, PATH/environment issues, package management, permissions, config integrity. NOT script writing (that belongs to bash-expert).
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## BLUF
|
|
12
|
+
|
|
13
|
+
No public Claude Code agent targets the narrow ops domain this agent needs: systemd lifecycle, PATH/environment debugging, package health, and permissions auditing on a personal Linux workstation. The closest precedents are the existing `infra-auditor` agent (already in `~/.claude/agents/`) and the `devops-engineer` agents from VoltAgent/wshobson, which are cloud-IaC-oriented and too broad. The shell-expert agent should be built as a **diagnostic and remediation agent** scoped to five ops domains, using `systemd-analyze` as its primary systemd oracle, Lynis-style check categories as its audit vocabulary, and the existing `infra-auditor` as its status-monitoring complement (not replacement). Build as `~/.claude/agents/shell-expert.md`.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## Section 1: Claude Code Custom Agent Survey — DevOps/Infrastructure
|
|
18
|
+
|
|
19
|
+
### Sources
|
|
20
|
+
|
|
21
|
+
- [VoltAgent/awesome-claude-code-subagents](https://github.com/VoltAgent/awesome-claude-code-subagents) — 100+ production subagents, infrastructure category
|
|
22
|
+
- [wshobson/agents](https://github.com/wshobson/agents) — 76 agents, multi-agent orchestration
|
|
23
|
+
- [iannuttall/claude-agents](https://github.com/iannuttall/claude-agents) — lightweight agent collection
|
|
24
|
+
- [Anthropic sub-agents docs](https://code.claude.com/docs/en/sub-agents)
|
|
25
|
+
- Existing `~/.claude/agents/infra-auditor.md` — Justin's current ops agent
|
|
26
|
+
|
|
27
|
+
### Findings
|
|
28
|
+
|
|
29
|
+
**Structural pattern (all sources agree on this format):**
|
|
30
|
+
```yaml
---
name: shell-expert
description: "Use this agent when diagnosing systemd service failures, PATH/environment issues, package management problems, file permissions, or environment configuration on Linux. NOT for writing shell scripts."
tools: Read, Grep, Glob, Bash
model: sonnet
---
```
|
|
38
|
+
|
|
39
|
+
The description field is the routing key — Claude dispatches to the agent when user intent matches. The description must be specific enough to avoid false invocations on scripting tasks (those go to bash-expert).
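As a rough illustration of how an agent file's front matter might be sanity-checked before it is installed, the Python sketch below parses the `---`-delimited header and reports missing fields. The helper names (`parse_front_matter`, `missing_fields`) are hypothetical, the parser only handles flat `key: value` lines rather than general YAML, and treating all four fields as required is an assumption of this sketch.

```python
REQUIRED_FIELDS = ("name", "description", "tools", "model")  # assumption: all four required


def parse_front_matter(text):
    """Return the flat key/value pairs between the first two '---' lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def missing_fields(fields):
    """List required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


# Shortened, hypothetical agent file used only for demonstration.
SAMPLE = '''---
name: shell-expert
description: "Use this agent when diagnosing systemd service failures. NOT for writing shell scripts."
tools: Read, Grep, Glob, Bash
model: sonnet
---
Agent body goes here.
'''

fields = parse_front_matter(SAMPLE)
print(fields["name"], missing_fields(fields))
```

A check like this could run before copying a file into `~/.claude/agents/`, so that an agent with an empty description never reaches the router.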
|
|
40
|
+
|
|
41
|
+
**VoltAgent devops-engineer agent — key elements:**
|
|
42
|
+
- Focus: CI/CD pipelines, containers, Kubernetes, cloud IaC — cloud-first, not host-first
|
|
43
|
+
- Model: sonnet (correct for ops work)
|
|
44
|
+
- No systemd-specific checks, no PATH debugging, no host package management
|
|
45
|
+
- Success metrics: 100% automation, >99.9% availability — enterprise framing, not personal workstation
|
|
46
|
+
|
|
47
|
+
**wshobson devops-troubleshooter agent:**
|
|
48
|
+
- Purpose: "Debug production issues, analyze logs, and fix deployment failures"
|
|
49
|
+
- Model: sonnet
|
|
50
|
+
- Coordinate with: devops-troubleshooter, incident-responder, network-engineer
|
|
51
|
+
- Gap: Still cloud/container-centric; does not cover host-level systemd lifecycle or local environment config
|
|
52
|
+
|
|
53
|
+
**wshobson network-engineer agent:**
|
|
54
|
+
- Purpose: "Debug network connectivity, configure load balancers, and analyze traffic"
|
|
55
|
+
- Relevant for: Tailscale diagnostics, DNS resolution failures — partial overlap with shell-expert scope
|
|
56
|
+
|
|
57
|
+
**Existing infra-auditor (Justin's):**
|
|
58
|
+
- Strength: Targeted service health checks, named services, connectivity probes, resource thresholds, config integrity assertions
|
|
59
|
+
- Gap: Status monitoring, not diagnosis — tells you something is broken but doesn't root-cause it or remediate
|
|
60
|
+
- Gap: No systemd unit hardening analysis, no PATH/environment debugging, no package management audit
|
|
61
|
+
- Gap: Hardcoded to specific services — not general-purpose for new service setup or failure investigation
|
|
62
|
+
|
|
63
|
+
**Key pattern from all agents surveyed:** The most useful ops agents have a **checklist-driven diagnostic flow** rather than open-ended instructions. Every domain (services, environment, packages) should have explicit ordered steps that the agent follows, not just "investigate the issue."
|
|
64
|
+
|
|
65
|
+
**Adoption decision:** The shell-expert agent should be the *diagnostic and remediation companion* to infra-auditor's *monitoring* role. When infra-auditor flags a failure, shell-expert is invoked to root-cause and fix it. They are complementary, not overlapping.
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
## Section 2: Systemd Service Management — Tools and Validators
|
|
70
|
+
|
|
71
|
+
### Sources
|
|
72
|
+
|
|
73
|
+
- [priv-kweihmann/systemdlint](https://github.com/priv-kweihmann/systemdlint) — Python-based linter for unit files
|
|
74
|
+
- [mackwic/systemd-linter](https://github.com/mackwic/systemd-linter) — Cross-platform unit file linter
|
|
75
|
+
- [systemd/systemd — issue #3677: unit syntax validation](https://github.com/systemd/systemd/issues/3677)
|
|
76
|
+
- [systemd-analyze man page](https://www.freedesktop.org/software/systemd/man/latest/systemd-analyze.html)
|
|
77
|
+
- [linux-audit.com — systemd-analyze](https://linux-audit.com/system-administration/commands/systemd-analyze/)
|
|
78
|
+
- [linux-audit.com — how to verify systemd unit errors](https://linux-audit.com/systemd/faq/how-to-verify-a-systemd-unit-for-errors/)
|
|
79
|
+
- [containersolutions.github.io — debug systemd service units runbook](https://containersolutions.github.io/runbooks/posts/linux/debug-systemd-service-units/)
|
|
80
|
+
|
|
81
|
+
### Findings
|
|
82
|
+
|
|
83
|
+
**systemd-analyze — the primary oracle:**
|
|
84
|
+
|
|
85
|
+
`systemd-analyze` has five subcommands relevant to this agent:
|
|
86
|
+
|
|
87
|
+
| Subcommand | What it does | Use case |
|---|---|---|
| `systemd-analyze verify UNIT` | Lints unit file — unknown sections, invalid settings, dependency cycles | Pre-flight before enabling a new unit |
| `systemd-analyze security UNIT` | Scores hardening posture 0–10 (lower = more secure); lists missing directives | Hardening audit |
| `systemd-analyze blame` | Lists units by activation time, sorted descending | Boot performance investigation |
| `systemd-analyze critical-chain` | Timing tree of dependency chain | Slow boot root cause |
| `systemd-analyze syscall-filter` | Lists syscall filter sets for sandboxing | Understanding `SystemCallFilter` options |
|
|
94
|
+
|
|
95
|
+
The `security` subcommand produces JSON output (`--json=pretty`) with per-setting exposure scores — usable programmatically without parsing human-readable output.
|
|
96
|
+
|
|
97
|
+
**systemdlint (priv-kweihmann):**
|
|
98
|
+
- Originally for cross-compiled embedded images (no live systemd available)
|
|
99
|
+
- Hardening advice output format: `{file}:{line}:{severity} [{id}] - {message}`
|
|
100
|
+
- Example ID: `NoFailureCheck` — return code checking disabled
|
|
101
|
+
- Value: Identifies hardening gaps without running the service
|
|
102
|
+
- Limitation: Does not use systemd's own interpretation; may differ from live systemd behavior
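The output format quoted above is regular enough to parse mechanically. The following Python sketch shows one way an agent-side helper might turn such a line into structured data. The function name `parse_finding`, the sample path, and the sample message are hypothetical, and the regular expression is inferred only from the `{file}:{line}:{severity} [{id}] - {message}` layout, not from systemdlint's source.

```python
import re

# Pattern inferred from the layout {file}:{line}:{severity} [{id}] - {message}
FINDING_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<severity>\w+) "
    r"\[(?P<id>[^\]]+)\] - (?P<message>.*)$"
)


def parse_finding(text):
    """Parse one finding line into a dict, or return None if it does not match."""
    match = FINDING_RE.match(text.strip())
    if match is None:
        return None
    finding = match.groupdict()
    finding["line"] = int(finding["line"])
    return finding


# Hypothetical sample line; only the ID `NoFailureCheck` comes from the notes above.
sample = (
    "/etc/systemd/system/demo.service:7:warning "
    "[NoFailureCheck] - Return code check is disabled"
)
print(parse_finding(sample))
```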
|
|
103
|
+
|
|
104
|
+
**systemd-linter (mackwic):**
|
|
105
|
+
- Cross-platform (Linux, macOS, Windows)
|
|
106
|
+
- Validates unit file structure, applies industry best-practices
|
|
107
|
+
- Useful for writing/reviewing unit files before deployment
|
|
108
|
+
|
|
109
|
+
**Debugging runbook (containersolutions):**
|
|
110
|
+
|
|
111
|
+
Ordered diagnostic procedure for failed services:
|
|
112
|
+
1. `systemctl status <service> --no-pager` — immediate state + recent log lines
|
|
113
|
+
2. `journalctl -u <service> -n 50 --no-pager` — full log context
|
|
114
|
+
3. `journalctl -u <service> -f` — live tail during manual restart attempt
|
|
115
|
+
4. `systemctl status --full --lines=50 <service>` — extended status
|
|
116
|
+
5. Disable `Restart=` temporarily to see underlying errors without auto-restart loop masking them
|
|
117
|
+
6. Run `ExecStart` command manually as the service user (`sudo -u <user> <command>`) to reproduce the environment
|
|
118
|
+
7. Check environment: `systemctl show <service> -p Environment`
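The non-interactive steps of this runbook can be sketched as an ordered command list that an agent would execute one by one. This is a minimal Python illustration, not the runbook's own tooling: `diagnostic_commands` and `demo.service` are hypothetical names, and steps 3, 5, and 6 are left out on purpose because they are interactive or change the unit.

```python
import shlex


def diagnostic_commands(service, log_lines=50):
    """Return the read-only runbook steps as argv lists, in the order listed."""
    return [
        # Step 1: immediate state plus recent log lines
        ["systemctl", "status", service, "--no-pager"],
        # Step 2: fuller log context
        ["journalctl", "-u", service, "-n", str(log_lines), "--no-pager"],
        # Step 4: extended status
        ["systemctl", "status", "--full", f"--lines={log_lines}", service],
        # Step 7: environment as systemd sees it
        ["systemctl", "show", service, "-p", "Environment"],
    ]


for argv in diagnostic_commands("demo.service"):
    # shlex.join quotes arguments safely for display or logging.
    print(shlex.join(argv))
```

Keeping the commands as argv lists rather than shell strings means a service name with unusual characters cannot be reinterpreted by a shell.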
|
|
119
|
+
|
|
120
|
+
**Environment-related failure class (most common for user services):**
|
|
121
|
+
- systemd does NOT inherit shell PATH or env vars
|
|
122
|
+
- When PATH is missing: binary lookup uses `/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin` (compiled-in fixed value)
|
|
123
|
+
- When `~` is used in ExecStart: not expanded (systemd is not a shell); use an absolute path or the `%h` specifier instead
|
|
124
|
+
- When `$HOME` is in `EnvironmentFile`: not shell-expanded — values are literal
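Two of the pitfalls above (a `~` in `ExecStart` and a `$VAR` reference in an `EnvironmentFile`) can be detected with simple text checks. The Python sketch below is one hedged way to do that. `find_env_pitfalls` and the sample values are hypothetical, and the checks are deliberately naive (whitespace splitting, no quoting rules).

```python
def find_env_pitfalls(exec_start, env_file_text):
    """Flag shell-isms that systemd passes through literally."""
    problems = []
    # '~' is a shell expansion; systemd does not perform it.
    if any(arg.startswith("~") for arg in exec_start.split()):
        problems.append("ExecStart uses '~', which systemd does not expand")
    for number, line in enumerate(env_file_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        _, sep, value = stripped.partition("=")
        if sep and "$" in value:
            problems.append(f"EnvironmentFile line {number}: '$' is kept literally")
    return problems


# Hypothetical inputs for demonstration.
problems = find_env_pitfalls(
    "~/bin/demo --config /etc/demo.conf",
    "# demo settings\nDATA_DIR=$HOME/data\nPORT=8080\n",
)
for problem in problems:
    print(problem)
```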
|
|
125
|
+
|
|
126
|
+
**Check to adopt:** Before diagnosing any service failure, the agent should first run `systemctl show <service> -p Environment,EnvironmentFiles,ExecStart,WorkingDirectory` to get the full runtime config, not just the unit file. (`systemctl show` reports the `EnvironmentFile=` directive under the property name `EnvironmentFiles`.)
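`systemctl show` prints one `Key=Value` pair per line, so its output is easy to load into a dictionary for later checks. The sketch below assumes that layout. `parse_show_output` and the captured sample are hypothetical, and splitting `Environment` with `shlex` is a simplification of systemd's quoting rules.

```python
import shlex


def parse_show_output(text):
    """Parse `systemctl show` output (one Key=Value per line) into a dict."""
    properties = {}
    for line in text.splitlines():
        # Split on the first '=' only; values such as Environment contain more.
        key, sep, value = line.partition("=")
        if sep:
            properties[key] = value
    return properties


# Hypothetical captured output for demonstration.
SAMPLE = """Environment=PORT=8080 MODE=prod
WorkingDirectory=/srv/demo
"""

props = parse_show_output(SAMPLE)
# Break the Environment property into individual variables.
env = dict(item.split("=", 1) for item in shlex.split(props["Environment"]))
print(props["WorkingDirectory"], env)
```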
|
|
127
|
+
|
|
128
|
+
---
|
|
129
|
+
|
|
130
|
+
## Section 3: Systemd Hardening — Check Taxonomy
|
|
131
|
+
|
|
132
|
+
### Sources
|
|
133
|
+
|
|
134
|
+
- [linux-audit.com — how to harden systemd service unit](https://linux-audit.com/systemd/how-to-harden-a-systemd-service-unit/)
|
|
135
|
+
- [linuxjournal.com — systemd service strengthening](https://www.linuxjournal.com/content/systemd-service-strengthening)
|
|
136
|
+
- [ctrl.blog — systemd service sandboxing 101](https://www.ctrl.blog/entry/systemd-service-hardening.html)
|
|
137
|
+
- [synacktiv.com — SHH: systemd hardening made easy](https://www.synacktiv.com/en/publications/systemd-hardening-made-easy-with-shh)
|
|
138
|
+
- [rockylinux.org — systemd units hardening](https://docs.rockylinux.org/9/guides/security/systemd_hardening/)
|
|
139
|
+
|
|
140
|
+
### Findings
|
|
141
|
+
|
|
142
|
+
**Hardening directives by category:**
|
|
143
|
+
|
|
144
|
+
**Privilege escalation prevention:**
|
|
145
|
+
- `NoNewPrivileges=true` — prevents process and children from escalating privileges (SUID, capabilities)
|
|
146
|
+
- `CapabilityBoundingSet=` — restrict Linux capabilities to the minimum required (e.g., `CAP_NET_BIND_SERVICE`)
|
|
147
|
+
- `RestrictSUIDSGID=true` — prevents setuid/setgid file creation
|
|
148
|
+
|
|
149
|
+
**Filesystem restrictions:**
|
|
150
|
+
- `PrivateTmp=yes` — isolated `/tmp` namespace; eliminates tmp-prediction attacks
|
|
151
|
+
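- `ProtectSystem=strict` — mounts the entire file system hierarchy read-only except `/dev`, `/proc`, and `/sys`; `=yes` covers only `/usr`, `/boot`, and `/efi`, and `=full` additionally covers `/etc`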
|
|
152
|
+
- `ProtectHome=yes` — blocks home directory access; `=read-only` allows reads
|
|
153
|
+
- `ReadOnlyPaths=`, `ReadWritePaths=`, `InaccessiblePaths=` — fine-grained path control
|
|
154
|
+
- `RootDirectory=` — chroot-like confinement
|
|
155
|
+
- `NoExecPaths=/`, `ExecPaths=/usr/bin/myapp` — whitelist-only execution
|
|
156
|
+
|
|
157
|
+
**Namespace isolation:**
|
|
158
|
+
- `PrivateDevices=yes` — isolates hardware device access
|
|
159
|
+
- `PrivateNetwork=yes` — isolated network namespace (for services with no network needs)
|
|
160
|
+
- `RestrictNamespaces=~uts ipc pid user cgroup` — blocks creation of the listed namespace types (without the leading `~`, the list is an allow-list and every other type is blocked)
|
|
161
|
+
- `ProtectKernelModules=yes` — prevents explicit kernel module loading
|
|
162
|
+
|
|
163
|
+
**Kernel/system protection:**
|
|
164
|
+
- `ProtectKernelTunables=yes` — read-only kernel tunables (sysctl values)
|
|
165
|
+
- `ProtectControlGroups=yes` — prevents cgroup modification
|
|
166
|
+
- `ProtectClock=yes` — blocks clock changes
|
|
167
|
+
- `ProtectHostname=yes` — blocks hostname/NIS domain changes
|
|
168
|
+
|
|
169
|
+
**Syscall filtering:**
|
|
170
|
+
- `SystemCallFilter=@system-service` — predefined safe set for typical services
|
|
171
|
+
- `SystemCallFilter=~@mount` — blacklist specific syscall groups
|
|
172
|
+
|
|
173
|
+
**Memory security:**
|
|
174
|
+
- `MemoryDenyWriteExecute=yes` — prevents W^X violations (JIT engines need this disabled)
|
|
175
|
+
- `RestrictRealtime=yes` — prevents real-time scheduling (reduces DoS risk)
|
|
176
|
+
- `LockPersonality=yes` — locks execution domain (prevents personality changes)
|
|
177
|
+
|
|
178
|
+
**Network controls:**
|
|
179
|
+
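- `IPAddressAllow=192.168.1.0/24` — allow-list of peer addresses for incoming and outgoing traffic; pair with `IPAddressDeny=any` so that everything else is blocked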
|
|
180
|
+
- `RestrictAddressFamilies=AF_UNIX AF_INET` — restrict socket families
|
|
181
|
+
- `SocketBindDeny=any` — prevent socket binding except explicitly allowed
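To show how a selection of these directives would end up in a unit, the Python sketch below renders a `[Service]` drop-in from a mapping. The directive names and values are taken from the lists above, but the particular selection in `BASELINE` is only illustrative, not a recommended baseline, and `render_dropin` is a hypothetical helper.

```python
def render_dropin(directives):
    """Render a [Service] drop-in from an ordered mapping of directive -> value."""
    lines = ["[Service]"]
    lines.extend(f"{name}={value}" for name, value in directives.items())
    return "\n".join(lines) + "\n"


# Illustrative selection only; each service needs its own compatible subset.
BASELINE = {
    "NoNewPrivileges": "true",
    "PrivateTmp": "yes",
    "ProtectSystem": "strict",
    "ProtectHome": "yes",
    "PrivateDevices": "yes",
    "ProtectKernelTunables": "yes",
    "ProtectControlGroups": "yes",
    "RestrictAddressFamilies": "AF_UNIX AF_INET",
}

print(render_dropin(BASELINE))
```

Writing hardening as a drop-in rather than editing the unit file keeps the change easy to revert if the service stops working.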
|
|
182
|
+
|
|
183
|
+
**SHH (Synacktiv) — automated hardening approach:**
|
|
184
|
+
- Profiles service via strace during operation
|
|
185
|
+
- Maps syscall behavior to compatible hardening directives
|
|
186
|
+
- Excludes directives incompatible with observed behavior
|
|
187
|
+
- Selects most restrictive set that still permits normal operation
|
|
188
|
+
- Good model for the agent's hardening recommendation workflow: observe → profile → recommend
|
|
189
|
+
|
|
190
|
+
**Exposure score interpretation:**
|
|
191
|
+
- `systemd-analyze security <unit>` returns 0.0–10.0 (lower is better)
|
|
192
|
+
- Rating scale (labels as printed by `systemd-analyze`): below 5.0 = OK, 5.0–7.4 = MEDIUM, 7.5–8.9 = EXPOSED, 9.0–10 = UNSAFE
|
|
193
|
+
- The linuxjournal example went from 9.6 (UNSAFE) to 4.9 (OK) through incremental hardening
|
|
194
|
+
- Recommendation: agent should run this on every user service and flag anything >5
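
The "flag anything >5" rule needs a floating-point comparison, which plain shell arithmetic cannot do. A minimal sketch, assuming the score has already been extracted as a decimal string; the function name `exceeds_exposure_threshold` is hypothetical.

```bash
# exceeds_exposure_threshold SCORE [THRESHOLD]
# Exits 0 when SCORE is strictly greater than THRESHOLD (default 5).
# awk does the comparison because POSIX shell arithmetic is integer-only.
exceeds_exposure_threshold() {
    awk -v s="$1" -v t="${2:-5}" 'BEGIN { exit !((s + 0) > (t + 0)) }'
}
```

For example, `exceeds_exposure_threshold 9.6 && echo "needs attention"` prints the message, while a score of 4.9 does not.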
|
|
195
|
+
|
|
196
|
+
**Check to adopt:** The agent should run `systemd-analyze security <service> --json=pretty` on every unit it is asked about, parse exposure score, and present top 5 missing directives by exposure weight.
|
|
197
|
+
|
|
198
|
+
---
|
|
199
|
+
|
|
200
|
+
## Section 4: Linux System Hardening Auditors
|
|
201
|
+
|
|
202
|
+
### Sources
|
|
203
|
+
|
|
204
|
+
- [CISOfy/lynis](https://github.com/CISOfy/lynis) — agentless security auditing for Linux/Unix/macOS
|
|
205
|
+
- [nikhilkumar0102/Linux-cis-audit](https://github.com/nikhilkumar0102/Linux-cis-audit) — CIS benchmark auditor for Debian 12
|
|
206
|
+
- [sokdr/LinuxAudit](https://github.com/sokdr/LinuxAudit) — bash audit script
|
|
207
|
+
- [gopikrishna152/security-audit-hardening](https://github.com/gopikrishna152/security-audit-hardening) — combined audit + hardening
|
|
208
|
+
- [trimstray/linux-hardening-checklist](https://github.com/trimstray/linux-hardening-checklist) — production checklist
|
|
209
|
+
|
|
210
|
+
### Findings
|
|
211
|
+
|
|
212
|
+
**Lynis — most comprehensive, most adoptable patterns:**
|
|
213
|
+
|
|
214
|
+
Lynis uses unique identifiers per check (e.g., `KRNL-6000`, `AUTH-9328`) organized by category. Each check maps to a specific audit question. The categories most relevant to this agent:
|
|
215
|
+
|
|
216
|
+
| Category | What it covers |
|
|
217
|
+
|---|---|
|
|
218
|
+
| Boot | GRUB/GRUB2 password, boot loader config |
|
|
219
|
+
| Services | systemctl enabled/disabled, startup services, service manager detection |
|
|
220
|
+
| Users & Groups | Shadow passwords, password aging, inactive accounts, sudo config |
|
|
221
|
+
| File Permissions | `/etc/passwd`, `/etc/shadow`, `/etc/cron.*` ownership and perms |
|
|
222
|
+
| Package Management | Outdated/vulnerable packages, GPG key verification, update policy |
|
|
223
|
+
| Kernel | Kernel version, loaded modules, sysctl hardening parameters |
|
|
224
|
+
| Authentication | PAM config, SSH config, failed login attempts |
|
|
225
|
+
| Networking | Firewall status, open ports, ARP config |
|
|
226
|
+
| Storage | Filesystem options (nodev, nosuid, noexec on mounts) |
|
|
227
|
+
|
|
228
|
+
**Lynis-derived check vocabulary for the agent:**
|
|
229
|
+
|
|
230
|
+
For **package management** domain:
|
|
231
|
+
- Is `apt-get check` clean? (no broken dependencies)
|
|
232
|
+
- Are packages on hold? (`apt-mark showhold`)
|
|
233
|
+
- Are security updates available? (`apt list --upgradable 2>/dev/null | grep -i security`)
|
|
234
|
+
- Are orphaned packages present? (`deborphan` if installed)
|
|
235
|
+
- Is GPG verification enabled in apt config?
|
|
236
|
+
|
|
237
|
+
For **permissions** domain:
|
|
238
|
+
- World-writable files in key directories (`find /etc -type f -perm -o+w`; `-type f` keeps symlinks, which always report mode 777, out of the results)
|
|
239
|
+
- SUID/SGID binaries not in known-good list (`find / -type f \( -perm -4000 -o -perm -2000 \) 2>/dev/null`)
|
|
240
|
+
- `~/.env` mode is 600 (not 644, not 664)
|
|
241
|
+
- Sensitive config files owned by root or service user
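
The `~/.env` mode check can be done without parsing `ls` or relying on a particular `stat` implementation. A minimal sketch, where the function name `has_private_mode` is hypothetical:

```bash
# has_private_mode FILE
# Exits 0 only when FILE is a regular file whose permission bits are exactly 600.
# find's "-perm 600" is an exact match, so 644 and 664 both fail.
has_private_mode() {
    [ -n "$(find "$1" -prune -type f -perm 600 2>/dev/null)" ]
}
```

Usage might look like `has_private_mode "$HOME/.env" || echo "WARNING: ~/.env is not mode 600"`.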
|
|
242
|
+
|
|
243
|
+
For **users/groups** domain:
|
|
244
|
+
- No UID 0 accounts beyond root
|
|
245
|
+
- Password aging set for interactive accounts
|
|
246
|
+
- sudo configured with least privilege (NOPASSWD scope is minimal)
|
|
247
|
+
|
|
248
|
+
**CIS benchmark auditor patterns (nikhilkumar0102):**
|
|
249
|
+
- Color-coded PASS/FAIL per check — easy to adopt for agent report format
|
|
250
|
+
- Actionable recommendation per failure — agent should follow this pattern
|
|
251
|
+
- CIS benchmark categories map well to Lynis categories above
|
|
252
|
+
|
|
253
|
+
**Check to adopt:** Lynis's output format (check ID → severity → recommendation) is the right model for the agent's report section. Every finding should include: what was checked, what was found, what to do.
|
|
254
|
+
|
|
255
|
+
---
|
|
256
|
+
|
|
257
|
+
## Section 5: Environment and PATH Debugging
|
|
258
|
+
|
|
259
|
+
### Sources
|
|
260
|
+
|
|
261
|
+
- [linuxvox.com — systemd Environment directive PATH expansion](https://linuxvox.com/blog/systemd-environment-directive-to-set-path/)
|
|
262
|
+
- [itsfoss.gitlab.io — systemd service environment variables](https://itsfoss.gitlab.io/blog/systemd-service-environment-variables)
|
|
263
|
+
- [baeldung.com — systemd environment variables](https://www.baeldung.com/linux/systemd-services-environment-variables)
|
|
264
|
+
- [containersolutions runbook — debug systemd service units](https://containersolutions.github.io/runbooks/posts/linux/debug-systemd-service-units/)
|
|
265
|
+
- [systemd.io — known environment variables](https://systemd.io/ENVIRONMENT/)
|
|
266
|
+
|
|
267
|
+
### Findings
|
|
268
|
+
|
|
269
|
+
**The three most common PATH/environment failure classes:**
|
|
270
|
+
|
|
271
|
+
1. **PATH missing version manager shims** — nvm, pyenv, rbenv inject shims at shell init time (`~/.bashrc`, `~/.bash_profile`). systemd user services do not source these. Binary found in interactive shell; not found in service.
|
|
272
|
+
- Diagnosis: `systemctl show <service> -p Environment` (add `--user` for user units) — verify PATH contains `/home/user/.nvm/versions/node/vX.X.X/bin` or equivalent; an empty result means the unit sets no PATH of its own
|
|
273
|
+
- Fix: Add explicit `Environment=PATH=/home/user/.nvm/versions/node/vX.X.X/bin:/usr/local/bin:/usr/bin:/bin` to unit file, or use `ExecStart=/absolute/path/to/binary`
|
|
274
|
+
|
|
275
|
+
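2. **EnvironmentFile is not a shell script** — systemd's `EnvironmentFile=` does no shell processing. It reads plain `KEY=value` lines and strips matching quotes around a value, so `KEY="value"` yields `value`. What it does not do is expand `$VAR` references (`PATH=$HOME/bin:$PATH` is stored literally) or accept `export KEY=value` lines, which are not valid assignments. `.env` files written for a shell often rely on both.
|
|
276
|
+
- Diagnosis: Compare the value a shell sees after sourcing the file with what the service actually received: `tr '\0' '\n' < /proc/<MainPID>/environ`, with `<MainPID>` taken from `systemctl show <service> -p MainPID`
|
|
277
|
+
- Fix: Keep EnvironmentFile entries as literal `KEY=value` lines with no `export` prefix and no `$VAR` references, or set the variable in the unit with `Environment="KEY=value"` (systemd parses that quoting itself; no shell is involved)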
|
|
278
|
+
|
|
279
|
+
3. **Tilde and variable expansion** — `ExecStart=~/bin/myapp` and `ExecStart=$HOME/bin/myapp` both fail because systemd does not run `ExecStart` through a shell: `~` is never expanded, and `$VAR` substitution applies only to arguments, not to the executable path. Always use an absolute path for the executable.
|
|
280
|
+
- Diagnosis: `systemctl status` shows `code=exited, status=203/EXEC` — systemd could not execute the binary (missing, wrong path, or not executable)
|
|
281
|
+
- Fix: Replace `~/` with `/home/username/` (or the `%h` specifier in user units), and replace `$HOME` with the literal path in ExecStart
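
The third failure class can be caught before the service is ever started by classifying the first word of the `ExecStart=` value. A minimal sketch, assuming the value has already been read from the unit file; the function name `check_execstart` and its verdict words are hypothetical.

```bash
# check_execstart VALUE
# Prints a one-word verdict for the executable part of an ExecStart= value:
# tilde, variable, specifier, absolute, or relative.
check_execstart() {
    # Drop systemd's optional prefix characters (-, @, :, +, !) before the path.
    exe=$(printf '%s\n' "$1" | sed 's/^[-@:+!]*//')
    # Keep only the first word, i.e. the executable itself.
    exe=${exe%% *}
    case $exe in
        '~'*) echo tilde ;;
        '$'*) echo variable ;;
        %*)   echo specifier ;;
        /*)   echo absolute ;;
        *)    echo relative ;;
    esac
}
```

A verdict of `tilde` or `variable` corresponds to the failure described above. `relative` deserves a look as well, since the binary then has to be found on the service's search path.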
|
|
282
|
+
|
|
283
|
+
**The fixed systemd PATH (when no PATH is set in unit):**
|
|
284
|
+
```
|
|
285
|
+
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|
|
286
|
+
```
|
|
287
|
+
Version manager shims, Homebrew (`/home/linuxbrew/.linuxbrew/bin`), and `~/.local/bin` are NOT in this fixed path.
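
Whether a given binary would be found under this fixed path can be checked mechanically. A minimal sketch, where the variable `SYSTEMD_FIXED_PATH` and the function `in_fixed_path` are hypothetical names, and the path value is simply the one quoted above:

```bash
# The fixed PATH quoted above, kept in one place for the check below.
SYSTEMD_FIXED_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# in_fixed_path DIR
# Exits 0 when DIR (with or without a trailing slash) is one of the entries.
in_fixed_path() {
    case ":$SYSTEMD_FIXED_PATH:" in
        *":${1%/}:"*) return 0 ;;
        *)            return 1 ;;
    esac
}
```

For instance, `in_fixed_path "$(dirname "$(command -v python3)")"` fails in an interactive shell where `python3` resolves to a Homebrew or pyenv location, which signals that the unit needs an explicit `Environment=PATH=...` or an absolute `ExecStart`.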
|
|
288
|
+
|
|
289
|
+
**Justin's specific PATH gotcha (from CLAUDE.md):**
|
|
290
|
+
Homebrew Python 3.14 is first on PATH (`python3`). System python3.12 at `/usr/bin/python3`. Services that need ML packages must use `python3` (Homebrew path) not the system Python. The PATH in user services must be set explicitly to get this right.
|
|
291
|
+
|
|
292
|
+
**Diagnostic command sequence for PATH issues:**
|
|
293
|
+
```bash
|
|
294
|
+
# Step 1: See which Environment= values the unit sets (add --user for user units)
|
|
295
|
+
systemctl show <service> -p Environment
|
|
296
|
+
|
|
297
|
+
# Step 2: See what the unit file says
|
|
298
|
+
systemctl cat <service>
|
|
299
|
+
|
|
300
|
+
# Step 3: Manually test with the service user's environment
|
|
301
|
+
sudo -u justin env -i HOME=/home/justin PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin bash -c 'which python3'
|
|
302
|
+
|
|
303
|
+
# Step 4: Check the EnvironmentFile for shell-only syntax
|
|
304
|
+
grep -nE '^[[:space:]]*export[[:space:]]|=.*\$' /path/to/env-file # export prefixes and $VAR references are not processed by systemd
|
|
305
|
+
```
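
Step 4 can be made a little more informative by labelling why each line was flagged. The sketch below is an assumption-laden illustration: the function name `lint_env_file` and the reason labels are hypothetical, and it reports quoted values only as something to review, alongside `export` prefixes and `$` references, which systemd does not process the way a shell would.

```bash
# lint_env_file FILE
# Prints "LINE_NUMBER:REASON" for entries that depend on shell behaviour.
# Reasons: export (export prefix), variable ($ reference), quoted (value starts with a double quote).
lint_env_file() {
    awk '
        /^[ \t]*$/        { next }                      # blank lines
        /^[ \t]*[#;]/     { next }                      # comment lines
        /^[ \t]*export[ \t]/ { print NR ":export"; next }
        /=.*\$/           { print NR ":variable"; next }
        /^[^=]*="/        { print NR ":quoted" }
    ' "$1"
}
```

Running `lint_env_file /path/to/env-file` on a clean file prints nothing, so the agent can treat any output as a finding to report.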
|
|
306
|
+
|
|
307
|
+
**Check to adopt:** When the agent is asked about any service failure with `status=203/EXEC` (not found), it should run the four-step PATH diagnostic above before suggesting any other fix.
|
|
308
|
+
|
|
309
|
+
---
|
|
310
|
+
|
|
311
|
+
## Section 6: Package Management Auditing
|
|
312
|
+
|
|
313
|
+
### Sources
|
|
314
|
+
|
|
315
|
+
- [linux-audit.com — auditing Linux software packages](https://linux-audit.com/auditing-linux-software-packages-managers/)
|
|
316
|
+
- [baeldung.com — apt packages kept back](https://www.baeldung.com/linux/apt-packages-kept-back)
|
|
317
|
+
- [oneuptime.com — fix broken packages Ubuntu](https://oneuptime.com/blog/post/2026-01-15-fix-broken-packages-ubuntu/view)
|
|
318
|
+
- [labex.io — verify Linux package status](https://labex.io/tutorials/linux-how-to-verify-linux-package-status-435583)
|
|
319
|
+
- Lynis package management category (from Section 4)
|
|
320
|
+
|
|
321
|
+
### Findings
|
|
322
|
+
|
|
323
|
+
**Package health check taxonomy:**
|
|
324
|
+
|
|
325
|
+
| Check | Command | What to flag |
|
|
326
|
+
|---|---|---|
|
|
327
|
+
| Broken dependencies | `sudo apt-get check` | Non-zero exit status or `E:` lines = broken state |
|
|
328
|
+
| Held-back packages | `apt-mark showhold` | Unexpected holds |
|
|
329
|
+
| Security updates | `apt list --upgradable 2>/dev/null \| grep -i security` | Any = action required |
|
|
330
|
+
| Orphaned packages | `apt-get autoremove --dry-run` | Review before running |
|
|
331
|
+
| Broken package list | `dpkg -l \| grep -E '^(iF\|iU\|rF)'` | Install-required/failed states |
|
|
332
|
+
| GPG key validity | `apt-key list 2>/dev/null` (deprecated) / `/etc/apt/trusted.gpg.d/` | Expired or untrusted keys |
|
|
333
|
+
| Apt cache stale | `stat /var/cache/apt/pkgcache.bin` — warn if >48h old | Cache not refreshed |
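
The "Broken package list" row can return package names instead of raw `dpkg -l` lines, which is easier to feed into a report. A minimal sketch; the function name `broken_packages` is hypothetical and it assumes the usual `dpkg -l` column order (status, name, version, ...).

```bash
# broken_packages
# Reads `dpkg -l` output on stdin and prints the names of packages whose
# two-letter status is iF, iU, or rF (the states flagged in the table above).
broken_packages() {
    awk '$1 ~ /^(iF|iU|rF)$/ { print $2 }'
}
```

Typical use would be `dpkg -l | broken_packages`. Empty output means none of the flagged states are present.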
|
|
334
|
+
|
|
335
|
+
**Common failure patterns and fixes:**
|
|
336
|
+
|
|
337
|
+
`apt-get: The following packages have been kept back` — usually caused by:
|
|
338
|
+
1. A dependency changed (transitional package split)
|
|
339
|
+
2. A new package is required but would be a new install (apt won't auto-install new packages)
|
|
340
|
+
3. Fix: `sudo apt-get install --install-recommends <packages>` or `sudo apt full-upgrade`
|
|
341
|
+
|
|
342
|
+
Broken packages from partial upgrade:
|
|
343
|
+
1. `sudo dpkg --configure -a` — complete any interrupted configuration
|
|
344
|
+
2. `sudo apt-get install -f` — fix broken dependencies
|
|
345
|
+
3. `sudo apt-get check` — verify state after each step
|
|
346
|
+
|
|
347
|
+
**Check to adopt:** The package audit should always run `apt-get check` first (instant, non-destructive). If it fails, escalate to the dpkg/install-f sequence before attempting manual resolution.
|
|
348
|
+
|
|
349
|
+
---
|
|
350
|
+
|
|
351
|
+
## Section 7: Infrastructure-as-Code Review Patterns
|
|
352
|
+
|
|
353
|
+
### Sources
|
|
354
|
+
|
|
355
|
+
- [analysis-tools-dev/static-analysis](https://github.com/analysis-tools-dev/static-analysis) — curated SAST tool list
|
|
356
|
+
- [checkov](https://www.checkov.io/) — IaC security scanning (Terraform, CloudFormation, K8s)
|
|
357
|
+
- [tflint](https://github.com/terraform-linters/tflint) — Terraform-specific linter
|
|
358
|
+
- [bytebase.com — top open source IaC security tools](https://www.bytebase.com/blog/top-open-source-iac-security-tools/)
|
|
359
|
+
- [OWASP — IaC security cheat sheet](https://cheatsheetseries.owasp.org/cheatsheets/Infrastructure_as_Code_Security_Cheat_Sheet.html)
|
|
360
|
+
|
|
361
|
+
### Findings
|
|
362
|
+
|
|
363
|
+
**Relevance to this agent:** Justin's "infrastructure-as-code" is primarily systemd unit files, `~/.env` structure, Tailscale Serve config, and shell profile modifications. Not Terraform/CloudFormation. The IaC review tools are not directly adoptable, but the *review methodology* is:
|
|
364
|
+
|
|
365
|
+
**IaC review patterns worth adopting for unit file/config review:**
|
|
366
|
+
|
|
367
|
+
1. **Static analysis before live testing** — verify/lint the file before enabling the service (maps to `systemd-analyze verify` + `systemdlint`)
|
|
368
|
+
2. **Security policy as code** — hardening requirements are explicit, checkable, not just guidelines (maps to `systemd-analyze security` with threshold assertion)
|
|
369
|
+
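3. **Drift detection** — compare what systemd has loaded against the source-of-truth unit file (`systemctl cat <service>` vs. `~/.config/systemd/user/<service>.service`; `systemctl show <service> -p NeedDaemonReload` reports `yes` when the file changed after it was loaded)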
|
|
370
|
+
4. **Dependency graph validation** — `systemd-analyze critical-chain` for startup ordering; `After=`, `Requires=`, `Wants=` correctness
|
|
371
|
+
|
|
372
|
+
**Checkov patterns (adapted for systemd):**
|
|
373
|
+
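- Every service that needs the network up at start should declare `After=network-online.target` and `Wants=network-online.target` (`After=network.target` alone does not wait for connectivity)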
|
|
374
|
+
- Services using `User=` should not also run commands with full privileges (the `+` prefix on `Exec*=` lines or the deprecated `PermissionsStartOnly=`) or run as root
|
|
375
|
+
- `EnvironmentFile=` paths should be absolute, not relative
|
|
376
|
+
- `WorkingDirectory=` should not be `/` (common default that leaks filesystem access)
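
Two of these rules are simple enough to check with a text scan of the unit file. The sketch below covers only the `EnvironmentFile=` and `WorkingDirectory=` rules; the function name `lint_unit_file` and the finding labels are hypothetical, and it does not follow drop-ins or line continuations.

```bash
# lint_unit_file FILE
# Prints one line per finding; prints nothing when neither check fires.
#   relative-envfile:<path>  EnvironmentFile= path is not absolute
#   workdir-root             WorkingDirectory= is set to /
lint_unit_file() {
    awk '
        /^EnvironmentFile=/ {
            path = $0
            sub(/^EnvironmentFile=-?/, "", path)   # a leading "-" marks the file as optional
            if (path !~ /^\//) print "relative-envfile:" path
        }
        /^WorkingDirectory=\/[ \t]*$/ { print "workdir-root" }
    ' "$1"
}
```

A fuller check would run against `systemctl cat <service>` output so that drop-in overrides are included.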
|
|
377
|
+
|
|
378
|
+
---
|
|
379
|
+
|
|
380
|
+
## Section 8: Synthesis — Best Patterns to Adopt
|
|
381
|
+
|
|
382
|
+
### Gap analysis: what does no existing tool do?
|
|
383
|
+
|
|
384
|
+
| Capability | Existing tool | Gap |
|
|
385
|
+
|---|---|---|
|
|
386
|
+
| Service status monitoring | `infra-auditor` | None — already covered |
|
|
387
|
+
| Unit file syntax validation | `systemd-analyze verify` | Not yet in any agent |
|
|
388
|
+
| Security exposure scoring | `systemd-analyze security` | Not yet in any agent |
|
|
389
|
+
| PATH/environment root cause | None (manual process) | **Gap — primary agent value** |
|
|
390
|
+
| Package health audit | `apt-get check` (manual) | Not yet in any agent |
|
|
391
|
+
| Permissions audit | `find` (manual) | Not yet in any agent |
|
|
392
|
+
| Hardening recommendations | `systemd-analyze security` output | Not yet surfaced to agent |
|
|
393
|
+
| Unit file vs. live config drift | `systemctl cat` vs. file | Not yet in any agent |
|
|
394
|
+
|
|
395
|
+
### Decision matrix: agent vs. script vs. manual
|
|
396
|
+
|
|
397
|
+
| Task | Right tool | Reason |
|
|
398
|
+
|---|---|---|
|
|
399
|
+
| Routine service health check | `infra-auditor` (existing) | Already built, named-service checks |
|
|
400
|
+
| New service failing at start | `shell-expert` | Root cause requires diagnostic tree |
|
|
401
|
+
| PATH/env debugging | `shell-expert` | Requires judgment about which PATH source applies |
|
|
402
|
+
| Package audit | `shell-expert` | Fits same diagnostic/remediation pattern |
|
|
403
|
+
| Permissions review | `shell-expert` | Requires context about what should own what |
|
|
404
|
+
| Boot performance investigation | `shell-expert` | `systemd-analyze blame` + critical-chain interpretation |
|
|
405
|
+
| Unit file hardening | `shell-expert` | `systemd-analyze security` + recommendation |
|
|
406
|
+
|
|
407
|
+
### Core diagnostic flows to encode in the agent
|
|
408
|
+
|
|
409
|
+
**Flow 1: Service failure triage**
|
|
410
|
+
```
|
|
411
|
+
status=203 (exec failed) → PATH diagnostic → absolute path check
|
|
412
|
+
status=1 (runtime error) → journalctl → manual repro as service user
|
|
413
|
+
status=failed (dependency) → systemd-analyze verify → dependency check
|
|
414
|
+
active=failed (startup timeout) → systemd-analyze critical-chain → slow dep
|
|
415
|
+
```
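
The triage table above maps naturally onto a `case` statement. This is a sketch only: the function name `next_triage_step` and the symptom keys `dependency` and `timeout` are labels invented for the example, not values systemd prints in this exact form.

```bash
# next_triage_step SYMPTOM
# Maps a failure symptom to the first diagnostic named in Flow 1.
next_triage_step() {
    case $1 in
        203)        echo "PATH diagnostic" ;;
        1)          echo "journalctl" ;;
        dependency) echo "systemd-analyze verify" ;;
        timeout)    echo "systemd-analyze critical-chain" ;;
        *)          echo "unknown" ;;
    esac
}
```

Keeping the mapping in one place makes it easy to extend when a new exit code turns out to have a reliable first step.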
|
|
416
|
+
|
|
417
|
+
**Flow 2: PATH/environment root cause**
|
|
418
|
+
```
|
|
419
|
+
1. systemctl show <service> -p Environment,EnvironmentFiles,ExecStart
|
|
420
|
+
2. Check if ExecStart uses ~/ or $HOME → flag, suggest absolute path
|
|
421
|
+
3. Check if EnvironmentFile uses export prefixes or $VAR references → flag, show fix
|
|
422
|
+
4. Check if required binary is outside systemd's fixed PATH → add Environment= directive
|
|
423
|
+
5. Check for Homebrew/nvm/pyenv shims needed → explicit PATH with version manager bins
|
|
424
|
+
```
|
|
425
|
+
|
|
426
|
+
**Flow 3: Hardening audit**
|
|
427
|
+
```
|
|
428
|
+
1. systemd-analyze security <service> --json=pretty
|
|
429
|
+
2. Report exposure score with rating (OK/Medium/Exposed/UNSAFE)
|
|
430
|
+
3. List top 5 missing directives by exposure weight
|
|
431
|
+
4. For each: show current state → recommended value → impact
|
|
432
|
+
5. Flag services >5.0 as requiring attention
|
|
433
|
+
```
|
|
434
|
+
|
|
435
|
+
**Flow 4: Package health**
|
|
436
|
+
```
|
|
437
|
+
1. apt-get check → if fails, stop and fix before anything else
|
|
438
|
+
2. apt-mark showhold → report any held packages
|
|
439
|
+
3. apt list --upgradable | grep -i security → report security updates
|
|
440
|
+
4. apt-get autoremove --dry-run → report orphaned package count
|
|
441
|
+
```
|
|
442
|
+
|
|
443
|
+
### Report format (adopt from Lynis + infra-auditor patterns)
|
|
444
|
+
|
|
445
|
+
```
|
|
446
|
+
SHELL-EXPERT DIAGNOSIS — <service/domain> — <date>
|
|
447
|
+
|
|
448
|
+
CRITICAL (fix before proceeding):
|
|
449
|
+
- [ID] <finding> → <command to fix>
|
|
450
|
+
|
|
451
|
+
WARNING (action recommended):
|
|
452
|
+
- [ID] <finding> → <recommended action>
|
|
453
|
+
|
|
454
|
+
INFO (informational):
|
|
455
|
+
- [ID] <finding> → <explanation>
|
|
456
|
+
|
|
457
|
+
DIAGNOSIS SUMMARY:
|
|
458
|
+
- Root cause: <one-sentence root cause>
|
|
459
|
+
- Fix: <one or two commands>
|
|
460
|
+
- Verification: <command that confirms fix>
|
|
461
|
+
```
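
Each finding line in the template has the same three parts, so a small helper keeps the output consistent across domains. A minimal sketch; the function name `format_finding` and the example ID `ENV-001` are hypothetical.

```bash
# format_finding ID FINDING ACTION
# Prints one report line in the "- [ID] <finding> → <action>" shape used above.
# The leading dash is passed as an argument so the format string never starts with "-".
format_finding() {
    printf '%s [%s] %s → %s\n' '-' "$1" "$2" "$3"
}
```

For example, `format_finding ENV-001 'PATH lacks nvm bin' 'set Environment=PATH explicitly'` produces a line that can be placed under the CRITICAL, WARNING, or INFO heading as appropriate.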
|
|
462
|
+
|
|
463
|
+
---
|
|
464
|
+
|
|
465
|
+
## Section 9: Recommended Agent Structure
|
|
466
|
+
|
|
467
|
+
### Option A: Single shell-expert agent (recommended)
|
|
468
|
+
|
|
469
|
+
Create `~/.claude/agents/shell-expert.md` covering all five ops domains. Each domain has its own ordered diagnostic checklist. The agent decides which domain's flow to execute based on user intent.
|
|
470
|
+
|
|
471
|
+
**Frontmatter:**
|
|
472
|
+
```yaml
|
|
473
|
+
---
|
|
474
|
+
name: shell-expert
|
|
475
|
+
description: "Use this agent when diagnosing systemd service failures, PATH/environment issues, package management problems, file permissions auditing, or environment configuration on Linux. This agent performs diagnosis and remediation, NOT script writing."
|
|
476
|
+
tools: Read, Grep, Glob, Bash
|
|
477
|
+
model: sonnet
|
|
478
|
+
---
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
**Five diagnostic domains (ordered sections in agent body):**
|
|
482
|
+
|
|
483
|
+
1. **Service Lifecycle** — triage failure exit codes, journalctl, manual repro, dependency verification, `systemd-analyze verify`
|
|
484
|
+
2. **Environment & PATH** — four-step PATH diagnostic, EnvironmentFile parsing, tilde expansion, version manager shim detection
|
|
485
|
+
3. **Hardening Audit** — `systemd-analyze security` → exposure score → top-5 directives → fix recommendations
|
|
486
|
+
4. **Package Management** — `apt-get check` → held packages → security updates → orphaned packages → broken deps
|
|
487
|
+
5. **Permissions** — `~/.env` mode, SUID/SGID audit, world-writable scan, service user ownership
|
|
488
|
+
|
|
489
|
+
**Relationship to infra-auditor:**
|
|
490
|
+
- `infra-auditor` = monitoring (is everything up right now?)
|
|
491
|
+
- `shell-expert` = investigation (why did it fail and how to fix it?)
|
|
492
|
+
- Trigger: when infra-auditor reports CRITICAL, shell-expert is the next step
|
|
493
|
+
|
|
494
|
+
### Option B: Split into shell-health + shell-hardening
|
|
495
|
+
|
|
496
|
+
Split service lifecycle/environment/packages into `shell-health` (reactive/diagnosis) and hardening into `shell-hardening` (proactive/audit). Only warranted if the combined agent exceeds ~300 lines of instructions and routing ambiguity becomes a problem.
|
|
497
|
+
|
|
498
|
+
**Recommendation: Option A.** The five domains are naturally sequenced and a single agent with domain sections is cleaner. If it grows unwieldy, split at that point.
|
|
499
|
+
|
|
500
|
+
### What NOT to include
|
|
501
|
+
|
|
502
|
+
- Script writing or bash one-liners on demand → bash-expert (not yet built)
|
|
503
|
+
- Service monitoring on a schedule → infra-auditor (already built)
|
|
504
|
+
- Cloud/IaC/Terraform → different agent, different scope
|
|
505
|
+
- Tailscale network debugging → infra-auditor has connectivity checks; shell-expert handles host-side only (socket, bind address, lingering)
|
|
506
|
+
|
|
507
|
+
---
|
|
508
|
+
|
|
509
|
+
## References
|
|
510
|
+
|
|
511
|
+
- [VoltAgent/awesome-claude-code-subagents](https://github.com/VoltAgent/awesome-claude-code-subagents)
|
|
512
|
+
- [wshobson/agents](https://github.com/wshobson/agents)
|
|
513
|
+
- [iannuttall/claude-agents](https://github.com/iannuttall/claude-agents)
|
|
514
|
+
- [priv-kweihmann/systemdlint](https://github.com/priv-kweihmann/systemdlint)
|
|
515
|
+
- [mackwic/systemd-linter](https://github.com/mackwic/systemd-linter)
|
|
516
|
+
- [systemd-analyze man page](https://www.freedesktop.org/software/systemd/man/latest/systemd-analyze.html)
|
|
517
|
+
- [linux-audit.com — systemd-analyze](https://linux-audit.com/system-administration/commands/systemd-analyze/)
|
|
518
|
+
- [linux-audit.com — how to verify systemd unit errors](https://linux-audit.com/systemd/faq/how-to-verify-a-systemd-unit-for-errors/)
|
|
519
|
+
- [linux-audit.com — how to harden systemd service unit](https://linux-audit.com/systemd/how-to-harden-a-systemd-service-unit/)
|
|
520
|
+
- [linux-audit.com — auditing Linux software packages](https://linux-audit.com/auditing-linux-software-packages-managers/)
|
|
521
|
+
- [linuxjournal.com — systemd service strengthening](https://www.linuxjournal.com/content/systemd-service-strengthening)
|
|
522
|
+
- [ctrl.blog — systemd service sandboxing 101](https://www.ctrl.blog/entry/systemd-service-hardening.html)
|
|
523
|
+
- [synacktiv.com — SHH systemd hardening helper](https://www.synacktiv.com/en/publications/systemd-hardening-made-easy-with-shh)
|
|
524
|
+
- [rockylinux.org — systemd units hardening](https://docs.rockylinux.org/9/guides/security/systemd_hardening/)
|
|
525
|
+
- [CISOfy/lynis](https://github.com/CISOfy/lynis)
|
|
526
|
+
- [nikhilkumar0102/Linux-cis-audit](https://github.com/nikhilkumar0102/Linux-cis-audit)
|
|
527
|
+
- [trimstray/linux-hardening-checklist](https://github.com/trimstray/linux-hardening-checklist)
|
|
528
|
+
- [containersolutions — debug systemd service units runbook](https://containersolutions.github.io/runbooks/posts/linux/debug-systemd-service-units/)
|
|
529
|
+
- [linuxvox.com — systemd Environment PATH](https://linuxvox.com/blog/systemd-environment-directive-to-set-path/)
|
|
530
|
+
- [baeldung.com — systemd environment variables](https://www.baeldung.com/linux/systemd-services-environment-variables)
|
|
531
|
+
- [baeldung.com — apt packages kept back](https://www.baeldung.com/linux/apt-packages-kept-back)
|
|
532
|
+
- [OWASP — IaC security cheat sheet](https://cheatsheetseries.owasp.org/cheatsheets/Infrastructure_as_Code_Security_Cheat_Sheet.html)
|
|
533
|
+
- [analysis-tools-dev/static-analysis](https://github.com/analysis-tools-dev/static-analysis)
|