ralphctl 0.8.0 → 0.8.1
package/dist/manifest.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "version": 1,
-"generatedAt": "2026-05-
+"generatedAt": "2026-05-25T15:37:20.381Z",
 "assets": [
 "prompts/_partials/decisions.md",
 "prompts/_partials/harness-context.md",
@@ -18,6 +18,7 @@
 "prompts/refine/template.md",
 "skills/ralphctl-abstraction-first/SKILL.md",
 "skills/ralphctl-alignment/SKILL.md",
-"skills/ralphctl-iterative-review/SKILL.md"
+"skills/ralphctl-iterative-review/SKILL.md",
+"skills/ralphctl-minimal-scaffolding/SKILL.md"
 ]
 }
@@ -30,6 +30,13 @@ lift it verbatim. Prefer this over any inference from manifest scripts.
 entries). **Monorepos**: inspect the root manifest and one or two representative sub-modules to
 confirm the stack, then propose root-level commands that build/verify the whole tree.
 
+**Non-interactive flags for JVM stacks.** The harness captures the script's combined stdout/stderr
+to a plain-text log file. Maven, Gradle, and sbt emit ANSI colour codes by default that render
+poorly there. When proposing a command for one of these tools, append the standard non-interactive
+flag — `mvn -B …`, `gradle --console=plain …`, `sbt -no-colors …` — unless the project's own docs
+prescribe a different invocation. Modern Node / Python / Rust tooling respects `NO_COLOR` which the
+harness sets automatically, so no per-tool flag is needed there.
+
 **Polyglot monorepos.** When sub-trees use different toolchains, chain each sub-tree's command so
 the harness prepares / verifies every half from the repo root. Use `&&` so the first failure stops
 the chain. Prefer each tool's own directory flag over `cd … &&` so the line stays portable; fall
@@ -74,6 +81,15 @@ When only a manifest exists with install + test scripts and no context file:
 <note>No context file found; commands inferred from package.json scripts.</note>
 ```
 
+When a JVM build descriptor (e.g. `pom.xml`) drives the project and `CLAUDE.md` names install +
+verify steps:
+
+```
+<setup-script>mvn -B -DskipTests install</setup-script>
+<verify-script>mvn -B verify</verify-script>
+<note>Commands lifted from CLAUDE.md; -B disables interactive prompts and ANSI colour for clean persisted logs.</note>
+```
+
 </example>
 
 ## Repository Context
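The non-interactive flag rule added in the hunks above can be illustrated with a small sketch. This is not code from ralphctl: the helper name `add_non_interactive_flag` and the lookup table are hypothetical, and only the three flags themselves (`-B`, `--console=plain`, `-no-colors`) come from the diff. It assumes the command starts with the bare tool name, so wrappers such as `./mvnw` are left untouched.

```python
import shlex

# Flags quoted in the diff above; the mapping itself is an illustrative assumption.
NON_INTERACTIVE_FLAGS = {
    "mvn": "-B",
    "gradle": "--console=plain",
    "sbt": "-no-colors",
}


def add_non_interactive_flag(command: str) -> str:
    """Insert the tool's non-interactive flag right after the executable name.

    Commands for other tools, and commands that already carry the flag,
    are returned unchanged.
    """
    parts = shlex.split(command)
    if not parts:
        return command
    flag = NON_INTERACTIVE_FLAGS.get(parts[0])
    if flag is None or flag in parts:
        return command
    return shlex.join([parts[0], flag, *parts[1:]])
```

Node, Python, and Rust commands pass through unchanged here, which matches the note that those toolchains rely on `NO_COLOR` rather than a per-tool flag.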
@@ -0,0 +1,58 @@
+---
+name: ralphctl-minimal-scaffolding
+description: Cross-phase skill — question every harness component on every model bump; remove non-load-bearing pieces one at a time with measurement. Complexity drifts upward by default; subtraction requires discipline.
+---
+
+# Minimal Scaffolding
+
+> "Find the simplest solution possible, and only increase complexity when needed. Every component encodes
+> assumptions about model limitations. Stress-test assumptions; they can go stale quickly as models improve.
+> Remove one component at a time when simplifying. Re-examine entire harness when new model releases; strip
+> non-load-bearing pieces."
+>
+> — Anthropic, _Harness Design for Long-Running Apps_
+
+Harness complexity drifts upward. Each component that solves a real problem at the time of its addition
+becomes a permanent fixture — even after the model capability that made it necessary has improved past the
+threshold. Without an active counter-pressure, the harness grows into a weight that slows iteration and
+obscures the actual design signal. Minimal scaffolding is not a one-time decision at design time; it is a
+discipline applied on every model release and on every proposed addition.
+
+## When this applies
+
+- **Refine** — before proposing that the refine phase needs a new guard, new evaluator, or new validation
+step, ask whether the model would produce the right output without it given a well-scoped prompt.
+- **Plan** — before adding a new planning sub-agent or splitting a flow into more phases, ask whether the
+additional structure would improve output quality measurably, or whether it is defensive scaffolding
+against a past model's limitations.
+- **Execute** — before wiring a new chain primitive, a new leaf, or a new wrapper around the evaluator, ask
+which assumption about current model capability the addition encodes, and whether that assumption is still
+valid.
+
+## What to do
+
+1. **Start with the simplest viable shape.** Draft the simplest version that could work given today's model
+capability. Only add components when the simple version demonstrably fails.
+2. **Question every component on every model bump.** When a new model version ships, re-read
+`HARNESS-PRINCIPLES.md` § 14 and § 18 in the project's `.claude/docs/` directory. For each `applied` row,
+ask: "Would removing this component degrade output quality on the new model?" If the answer is uncertain,
+run the test.
+3. **Remove one component at a time, measure.** Never refactor two components simultaneously — you cannot
+isolate the regression. Remove one; run the project's check gate; observe output quality; decide.
+4. **Default toward subtraction over addition.** When in doubt, omit. Adding a component later when its
+need is proven is cheaper than carrying a component whose need was assumed.
+
+## Anti-patterns
+
+- **Stacking primitives "just in case".** Adding a `guard` around the evaluator, a `loop` around the guard,
+and a retry decorator around the loop — each layer may have been justified at the time, but the stack
+encodes four separate assumptions, each of which needs re-validation on every model bump.
+- **Treating scaffolding as load-bearing without measurement.** "We've always had the idle watchdog" is not
+a reason to keep it — it is a reason to measure whether it still fires in practice. If it never fires on
+the current model, it may be removable.
+- **Mass refactors that change more than one component at a time.** Removing the plateau detector and the
+idle watchdog in the same PR makes it impossible to attribute a quality change to either. One component
+at a time.
+- **Never re-auditing existing scaffolding when models get better.** Capability improvements accrue quietly.
+A component that was essential for an older model may be transparently handled by a newer one. Without an
+explicit re-audit cadence, the harness ossifies around old assumptions.
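The "remove one component at a time, measure" step in the new skill file can be sketched as a small ablation loop. This is an illustration under stated assumptions, not part of ralphctl: `ablate_one_at_a_time`, the `measure` callback, and the `tolerance` parameter are hypothetical names, and `measure` stands in for whatever check gate and quality score a project actually uses.

```python
from typing import Callable, FrozenSet, Iterable, List


def ablate_one_at_a_time(
    components: Iterable[str],
    measure: Callable[[FrozenSet[str]], float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return components whose individual removal keeps quality near the baseline.

    `measure` is a hypothetical scoring hook: it receives the set of active
    components and returns a quality score (higher is better).
    """
    full = frozenset(components)
    baseline = measure(full)
    removable = []
    for name in sorted(full):
        # Exactly one component is removed per trial, so any drop in score
        # can be attributed to that component alone.
        score = measure(full - {name})
        if score >= baseline - tolerance:
            removable.append(name)
    return removable
```

Each candidate is scored against the full harness, so the result is a list of independent removal candidates. It does not show that they can all be removed together; after one removal the loop would need to be rerun on the reduced set.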
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "ralphctl",
-"version": "0.8.
+"version": "0.8.1",
 "description": "Agent harness for long-running AI coding tasks — orchestrates Claude Code, GitHub Copilot, and OpenAI Codex across repositories",
 "homepage": "https://github.com/lukas-grigis/ralphctl",
 "type": "module",