ralphctl 0.8.0 → 0.8.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "version": 1,
3
- "generatedAt": "2026-05-24T20:04:47.241Z",
3
+ "generatedAt": "2026-05-26T13:05:48.855Z",
4
4
  "assets": [
5
5
  "prompts/_partials/decisions.md",
6
6
  "prompts/_partials/harness-context.md",
@@ -18,6 +18,7 @@
18
18
  "prompts/refine/template.md",
19
19
  "skills/ralphctl-abstraction-first/SKILL.md",
20
20
  "skills/ralphctl-alignment/SKILL.md",
21
- "skills/ralphctl-iterative-review/SKILL.md"
21
+ "skills/ralphctl-iterative-review/SKILL.md",
22
+ "skills/ralphctl-minimal-scaffolding/SKILL.md"
22
23
  ]
23
24
  }
@@ -30,6 +30,13 @@ lift it verbatim. Prefer this over any inference from manifest scripts.
30
30
  entries). **Monorepos**: inspect the root manifest and one or two representative sub-modules to
31
31
  confirm the stack, then propose root-level commands that build/verify the whole tree.
32
32
 
33
+ **Non-interactive flags for JVM stacks.** The harness captures the script's combined stdout/stderr
34
+ to a plain-text log file. Maven, Gradle, and sbt emit ANSI colour codes by default that render
35
+ poorly there. When proposing a command for one of these tools, append the standard non-interactive
36
+ flag — `mvn -B …`, `gradle --console=plain …`, `sbt -no-colors …` — unless the project's own docs
37
+ prescribe a different invocation. Modern Node / Python / Rust tooling respects `NO_COLOR` which the
38
+ harness sets automatically, so no per-tool flag is needed there.
39
+
33
40
  **Polyglot monorepos.** When sub-trees use different toolchains, chain each sub-tree's command so
34
41
  the harness prepares / verifies every half from the repo root. Use `&&` so the first failure stops
35
42
  the chain. Prefer each tool's own directory flag over `cd … &&` so the line stays portable; fall
@@ -74,6 +81,15 @@ When only a manifest exists with install + test scripts and no context file:
74
81
  <note>No context file found; commands inferred from package.json scripts.</note>
75
82
  ```
76
83
 
84
+ When a JVM build descriptor (e.g. `pom.xml`) drives the project and `CLAUDE.md` names install +
85
+ verify steps:
86
+
87
+ ```
88
+ <setup-script>mvn -B -DskipTests install</setup-script>
89
+ <verify-script>mvn -B verify</verify-script>
90
+ <note>Commands lifted from CLAUDE.md; -B disables interactive prompts and ANSI colour for clean persisted logs.</note>
91
+ ```
92
+
77
93
  </example>
78
94
 
79
95
  ## Repository Context
@@ -0,0 +1,58 @@
1
+ ---
2
+ name: ralphctl-minimal-scaffolding
3
+ description: Cross-phase skill — question every harness component on every model bump; remove non-load-bearing pieces one at a time with measurement. Complexity drifts upward by default; subtraction requires discipline.
4
+ ---
5
+
6
+ # Minimal Scaffolding
7
+
8
+ > "Find the simplest solution possible, and only increase complexity when needed. Every component encodes
9
+ > assumptions about model limitations. Stress-test assumptions; they can go stale quickly as models improve.
10
+ > Remove one component at a time when simplifying. Re-examine entire harness when new model releases; strip
11
+ > non-load-bearing pieces."
12
+ >
13
+ > — Anthropic, _Harness Design for Long-Running Apps_
14
+
15
+ Harness complexity drifts upward. Each component that solves a real problem at the time of its addition
16
+ becomes a permanent fixture — even after the model capability that made it necessary has improved past the
17
+ threshold. Without an active counter-pressure, the harness grows into a weight that slows iteration and
18
+ obscures the actual design signal. Minimal scaffolding is not a one-time decision at design time; it is a
19
+ discipline applied on every model release and on every proposed addition.
20
+
21
+ ## When this applies
22
+
23
+ - **Refine** — before proposing that the refine phase needs a new guard, new evaluator, or new validation
24
+ step, ask whether the model would produce the right output without it given a well-scoped prompt.
25
+ - **Plan** — before adding a new planning sub-agent or splitting a flow into more phases, ask whether the
26
+ additional structure would improve output quality measurably, or whether it is defensive scaffolding
27
+ against a past model's limitations.
28
+ - **Execute** — before wiring a new chain primitive, a new leaf, or a new wrapper around the evaluator, ask
29
+ which assumption about current model capability the addition encodes, and whether that assumption is still
30
+ valid.
31
+
32
+ ## What to do
33
+
34
+ 1. **Start with the simplest viable shape.** Draft the simplest version that could work given today's model
35
+ capability. Only add components when the simple version demonstrably fails.
36
+ 2. **Question every component on every model bump.** When a new model version ships, re-read
37
+ `HARNESS-PRINCIPLES.md` § 14 and § 18 in the project's `.claude/docs/` directory. For each `applied` row,
38
+ ask: "Would removing this component degrade output quality on the new model?" If the answer is uncertain,
39
+ run the test.
40
+ 3. **Remove one component at a time, measure.** Never refactor two components simultaneously — you cannot
41
+ isolate the regression. Remove one; run the project's check gate; observe output quality; decide.
42
+ 4. **Default toward subtraction over addition.** When in doubt, omit. Adding a component later when its
43
+ need is proven is cheaper than carrying a component whose need was assumed.
44
+
45
+ ## Anti-patterns
46
+
47
+ - **Stacking primitives "just in case".** Adding a `guard` around the evaluator, a `loop` around the guard,
48
+ and a retry decorator around the loop — each layer may have been justified at the time, but the stack
49
+ encodes four separate assumptions, each of which needs re-validation on every model bump.
50
+ - **Treating scaffolding as load-bearing without measurement.** "We've always had the idle watchdog" is not
51
+ a reason to keep it — it is a reason to measure whether it still fires in practice. If it never fires on
52
+ the current model, it may be removable.
53
+ - **Mass refactors that change more than one component at a time.** Removing the plateau detector and the
54
+ idle watchdog in the same PR makes it impossible to attribute a quality change to either. One component
55
+ at a time.
56
+ - **Never re-auditing existing scaffolding when models get better.** Capability improvements accrue quietly.
57
+ A component that was essential for an older model may be transparently handled by a newer one. Without an
58
+ explicit re-audit cadence, the harness ossifies around old assumptions.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "ralphctl",
3
- "version": "0.8.0",
3
+ "version": "0.8.2",
4
4
  "description": "Agent harness for long-running AI coding tasks — orchestrates Claude Code, GitHub Copilot, and OpenAI Codex across repositories",
5
5
  "homepage": "https://github.com/lukas-grigis/ralphctl",
6
6
  "type": "module",
@@ -50,6 +50,8 @@
50
50
  "@types/react": "^19.2.14",
51
51
  "@vitest/coverage-v8": "^4.1.6",
52
52
  "eslint": "^10.4.0",
53
+ "eslint-plugin-react-hooks": "^7.1.1",
54
+ "eslint-plugin-sonarjs": "^4.0.3",
53
55
  "globals": "^17.6.0",
54
56
  "husky": "^9.1.7",
55
57
  "ink-testing-library": "^4.0.0",