@canivel/ralph 0.2.0 → 0.2.3

Files changed (39)
  1. package/.agents/ralph/PROMPT_build.md +126 -126
  2. package/.agents/ralph/agents.sh +17 -15
  3. package/.agents/ralph/config.sh +25 -25
  4. package/.agents/ralph/log-activity.sh +15 -15
  5. package/.agents/ralph/loop.sh +1027 -1001
  6. package/.agents/ralph/references/CONTEXT_ENGINEERING.md +126 -126
  7. package/.agents/ralph/references/GUARDRAILS.md +174 -174
  8. package/AGENTS.md +20 -20
  9. package/README.md +270 -266
  10. package/bin/ralph +766 -765
  11. package/diagram.svg +55 -55
  12. package/examples/commands.md +46 -46
  13. package/package.json +39 -39
  14. package/skills/commit/SKILL.md +219 -219
  15. package/skills/commit/references/commit_examples.md +292 -292
  16. package/skills/dev-browser/SKILL.md +211 -211
  17. package/skills/dev-browser/bun.lock +443 -443
  18. package/skills/dev-browser/package-lock.json +2988 -2988
  19. package/skills/dev-browser/package.json +31 -31
  20. package/skills/dev-browser/references/scraping.md +155 -155
  21. package/skills/dev-browser/scripts/start-relay.ts +32 -32
  22. package/skills/dev-browser/scripts/start-server.ts +117 -117
  23. package/skills/dev-browser/server.sh +24 -24
  24. package/skills/dev-browser/src/client.ts +474 -474
  25. package/skills/dev-browser/src/index.ts +287 -287
  26. package/skills/dev-browser/src/relay.ts +731 -731
  27. package/skills/dev-browser/src/snapshot/__tests__/snapshot.test.ts +223 -223
  28. package/skills/dev-browser/src/snapshot/browser-script.ts +877 -877
  29. package/skills/dev-browser/src/snapshot/index.ts +14 -14
  30. package/skills/dev-browser/src/snapshot/inject.ts +13 -13
  31. package/skills/dev-browser/src/types.ts +34 -34
  32. package/skills/dev-browser/tsconfig.json +36 -36
  33. package/skills/dev-browser/vitest.config.ts +12 -12
  34. package/skills/prd/SKILL.md +235 -235
  35. package/tests/agent-loops.mjs +79 -79
  36. package/tests/agent-ping.mjs +39 -39
  37. package/tests/audit.md +56 -56
  38. package/tests/cli-smoke.mjs +47 -47
  39. package/tests/real-agents.mjs +127 -127
package/.agents/ralph/references/CONTEXT_ENGINEERING.md CHANGED
@@ -1,126 +1,126 @@
- # Context Engineering Reference
-
- This document explains the malloc/free metaphor for LLM context management that underlies the Ralph technique.
-
- ## The malloc() Metaphor
-
- In traditional programming:
- - `malloc()` allocates memory
- - `free()` releases memory
- - Memory leaks occur when you allocate without freeing
-
- In LLM context:
- - Reading files, receiving responses, tool outputs = `malloc()`
- - **There is no `free()`** - context cannot be released
- - The only way to "free" is to start a new conversation
-
- ## Why This Matters
-
- ### Context Pollution
-
- When you work on multiple unrelated tasks in the same context:
-
- ```
- Task 1: Build authentication → context contains auth code, JWT docs, security patterns
- Task 2: Build UI components → context now ALSO contains auth stuff
-
- Result: LLM might suggest auth-related patterns when building UI
- or mix concerns inappropriately
- ```
-
- ### Autoregressive Failure
-
- LLMs predict the next token based on ALL context. When context contains:
- - Unrelated information
- - Failed attempts
- - Mixed concerns
-
- The model can "spiral" into wrong territory, generating increasingly off-base responses.
-
- ### The Gutter Metaphor
-
- > "If the bowling ball is in the gutter, there's no saving it."
-
- Once context is polluted with failed attempts or mixed concerns, the model will keep referencing that pollution. Starting fresh is often faster than trying to correct course.
-
- ## Context Health Indicators
-
- ### 🟢 Healthy Context
- - Single focused task
- - Relevant files only
- - Clear progress
- - Under 60% capacity
-
- ### 🟡 Warning Signs
- - Multiple unrelated topics discussed
- - Several failed attempts in history
- - Approaching 80% capacity
- - Repeated similar errors
-
- ### 🔴 Critical / Gutter
- - Mixed concerns throughout
- - Circular failure patterns
- - Over 90% capacity
- - Model suggesting irrelevant solutions
-
- ## Best Practices
-
- ### 1. One Task Per Context
-
- Don't ask "fix the auth bug AND add the new feature". Do them in separate conversations.
-
- ### 2. Fresh Start on Topic Change
-
- Finished auth? Start a new conversation for the next feature.
-
- ### 3. Don't Redline
-
- Stay under 80% of context capacity. Quality degrades as you approach limits.
-
- ### 4. Recognize the Gutter
-
- If you're seeing:
- - Same error 3+ times
- - Solutions that don't match the problem
- - Circular suggestions
-
- Start fresh. Your progress is in the files.
-
- ### 5. State in Files, Not Context
-
- Write progress to files. The next conversation can read them. Context is ephemeral; files are permanent.
-
- ## Ralph's Approach
-
- The original Ralph technique (`while :; do cat PROMPT.md | agent ; done`) naturally implements these principles:
-
- 1. **Each iteration is a fresh process** - Context is freed
- 2. **State persists in files** - Progress survives context resets
- 3. **Same prompt each time** - Focused, single-task context
- 4. **Failures inform guardrails** - Learning without context pollution
-
- This Cursor implementation aims to bring these benefits while working within Cursor's session model.
-
- ## Measuring Context
-
- Rough estimates:
- - 1 token ≈ 4 characters
- - Average code file: 500-2000 tokens
- - Large file: 5000+ tokens
- - Conversation history: 100-500 tokens per exchange
-
- Track allocations in `.ralph/context-log.md` to stay aware.
-
- ## When to Start Fresh
-
- **Definitely start fresh when:**
- - Switching to unrelated task
- - Context over 90% full
- - Same error 3+ times
- - Model suggestions are off-topic
-
- **Consider starting fresh when:**
- - Context over 70% full
- - Significant topic shift within task
- - Feeling "stuck"
- - Multiple failed approaches in history
+ # Context Engineering Reference
+
+ This document explains the malloc/free metaphor for LLM context management that underlies the Ralph technique.
+
+ ## The malloc() Metaphor
+
+ In traditional programming:
+ - `malloc()` allocates memory
+ - `free()` releases memory
+ - Memory leaks occur when you allocate without freeing
+
+ In LLM context:
+ - Reading files, receiving responses, tool outputs = `malloc()`
+ - **There is no `free()`** - context cannot be released
+ - The only way to "free" is to start a new conversation
+
+ ## Why This Matters
+
+ ### Context Pollution
+
+ When you work on multiple unrelated tasks in the same context:
+
+ ```
+ Task 1: Build authentication → context contains auth code, JWT docs, security patterns
+ Task 2: Build UI components → context now ALSO contains auth stuff
+
+ Result: LLM might suggest auth-related patterns when building UI
+ or mix concerns inappropriately
+ ```
+
+ ### Autoregressive Failure
+
+ LLMs predict the next token based on ALL context. When context contains:
+ - Unrelated information
+ - Failed attempts
+ - Mixed concerns
+
+ The model can "spiral" into wrong territory, generating increasingly off-base responses.
+
+ ### The Gutter Metaphor
+
+ > "If the bowling ball is in the gutter, there's no saving it."
+
+ Once context is polluted with failed attempts or mixed concerns, the model will keep referencing that pollution. Starting fresh is often faster than trying to correct course.
+
+ ## Context Health Indicators
+
+ ### 🟢 Healthy Context
+ - Single focused task
+ - Relevant files only
+ - Clear progress
+ - Under 60% capacity
+
+ ### 🟡 Warning Signs
+ - Multiple unrelated topics discussed
+ - Several failed attempts in history
+ - Approaching 80% capacity
+ - Repeated similar errors
+
+ ### 🔴 Critical / Gutter
+ - Mixed concerns throughout
+ - Circular failure patterns
+ - Over 90% capacity
+ - Model suggesting irrelevant solutions
+
+ ## Best Practices
+
+ ### 1. One Task Per Context
+
+ Don't ask "fix the auth bug AND add the new feature". Do them in separate conversations.
+
+ ### 2. Fresh Start on Topic Change
+
+ Finished auth? Start a new conversation for the next feature.
+
+ ### 3. Don't Redline
+
+ Stay under 80% of context capacity. Quality degrades as you approach limits.
+
+ ### 4. Recognize the Gutter
+
+ If you're seeing:
+ - Same error 3+ times
+ - Solutions that don't match the problem
+ - Circular suggestions
+
+ Start fresh. Your progress is in the files.
+
+ ### 5. State in Files, Not Context
+
+ Write progress to files. The next conversation can read them. Context is ephemeral; files are permanent.
+
+ ## Ralph's Approach
+
+ The original Ralph technique (`while :; do cat PROMPT.md | agent ; done`) naturally implements these principles:
+
+ 1. **Each iteration is a fresh process** - Context is freed
+ 2. **State persists in files** - Progress survives context resets
+ 3. **Same prompt each time** - Focused, single-task context
+ 4. **Failures inform guardrails** - Learning without context pollution
+
+ This Cursor implementation aims to bring these benefits while working within Cursor's session model.
+
+ ## Measuring Context
+
+ Rough estimates:
+ - 1 token ≈ 4 characters
+ - Average code file: 500-2000 tokens
+ - Large file: 5000+ tokens
+ - Conversation history: 100-500 tokens per exchange
+
+ Track allocations in `.ralph/context-log.md` to stay aware.
+
+ ## When to Start Fresh
+
+ **Definitely start fresh when:**
+ - Switching to unrelated task
+ - Context over 90% full
+ - Same error 3+ times
+ - Model suggestions are off-topic
+
+ **Consider starting fresh when:**
+ - Context over 70% full
+ - Significant topic shift within task
+ - Feeling "stuck"
+ - Multiple failed approaches in history
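
Note on the hunk above: the "Measuring Context" estimates and the capacity thresholds can be combined into a small budget check. The sketch below is illustrative only and is not part of the package. The function names, the status labels, and the decision to treat everything between 60% and 90% as "warning" are assumptions. Only the 1 token ≈ 4 characters heuristic and the 60% and 90% thresholds come from the file itself.

```python
def estimate_tokens(text: str) -> int:
    """Estimate a token count using the rough 1 token ≈ 4 characters heuristic."""
    return len(text) // 4


def context_status(used_tokens: int, window_tokens: int) -> str:
    """Classify context health from the fraction of the window in use.

    Hypothetical helper: under 60% is "healthy", over 90% is "critical",
    and the band in between is labeled "warning" (an assumption).
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    if usage > 0.90:
        return "critical"
    if usage < 0.60:
        return "healthy"
    return "warning"
```

A caller would pass its own window size, for example `context_status(estimate_tokens(prompt_text), window_tokens)`, where `window_tokens` depends on the model in use and is not stated anywhere in the file.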
package/.agents/ralph/references/GUARDRAILS.md CHANGED
@@ -1,174 +1,174 @@
- # Guardrails Reference ("Signs")
-
- This document explains how to create and use guardrails in Ralph.
-
- ## The Signs Metaphor
-
- From Geoffrey Huntley:
-
- > "Ralph is very good at making playgrounds, but he comes home bruised because he fell off the slide, so one then tunes Ralph by adding a sign next to the slide saying 'SLIDE DOWN, DON'T JUMP, LOOK AROUND,' and Ralph is more likely to look and see the sign."
-
- Signs are explicit instructions added to prevent known failure modes.
-
- ## Anatomy of a Sign
-
- ```markdown
- ### Sign: [Descriptive Name]
- - **Trigger**: When this situation occurs
- - **Instruction**: What to do instead
- - **Added after**: When/why this was added
- - **Example**: Concrete example if helpful
- ```
-
- ## Types of Signs
-
- ### 1. Preventive Signs
-
- Stop problems before they happen:
-
- ```markdown
- ### Sign: Validate Before Trust
- - **Trigger**: When receiving external input
- - **Instruction**: Always validate and sanitize input before using it
- - **Added after**: Iteration 3 - SQL injection vulnerability
- ```
-
- ### 2. Corrective Signs
-
- Fix recurring mistakes:
-
- ```markdown
- ### Sign: Check Return Values
- - **Trigger**: When calling functions that can fail
- - **Instruction**: Always check return values and handle errors
- - **Added after**: Iteration 7 - Null pointer exception
- ```
-
- ### 3. Process Signs
-
- Enforce good practices:
-
- ```markdown
- ### Sign: Test Before Commit
- - **Trigger**: Before committing changes
- - **Instruction**: Run the test suite and ensure all tests pass
- - **Added after**: Iteration 2 - Broken tests committed
- ```
-
- ### 4. Architecture Signs
-
- Guide design decisions:
-
- ```markdown
- ### Sign: Single Responsibility
- - **Trigger**: When a function grows beyond 50 lines
- - **Instruction**: Consider splitting into smaller, focused functions
- - **Added after**: Iteration 12 - Unmaintainable god function
- ```
-
- ## When to Add Signs
-
- Add a sign when:
-
- 1. **The same mistake happens twice** - Once is learning, twice is a pattern
- 2. **A subtle bug is found** - Prevent future occurrences
- 3. **A best practice is violated** - Reinforce good habits
- 4. **Context-specific knowledge is needed** - Project-specific conventions
-
- ## Sign Lifecycle
-
- ### Creation
-
- ```markdown
- ### Sign: [New Sign]
- - **Trigger**: [When it applies]
- - **Instruction**: [What to do]
- - **Added after**: Iteration N - [What happened]
- ```
-
- ### Refinement
-
- If a sign isn't working:
- - Make the trigger more specific
- - Make the instruction clearer
- - Add examples
-
- ### Retirement
-
- Signs can be removed when:
- - The underlying issue is fixed at a deeper level
- - The sign is no longer relevant
- - The sign is causing more problems than it solves
-
- ## Example Signs Library
-
- ### Security
-
- ```markdown
- ### Sign: Sanitize All Input
- - **Trigger**: Any user-provided data
- - **Instruction**: Use parameterized queries, escape HTML, validate types
- - **Example**: `db.query("SELECT * FROM users WHERE id = ?", [userId])`
- ```
-
- ### Error Handling
-
- ```markdown
- ### Sign: Graceful Degradation
- - **Trigger**: External service calls
- - **Instruction**: Always have a fallback for when services are unavailable
- - **Example**: Cache results, provide default values, show friendly errors
- ```
-
- ### Testing
-
- ```markdown
- ### Sign: Test the Unhappy Path
- - **Trigger**: Writing tests for new functionality
- - **Instruction**: Include tests for error cases, edge cases, and invalid input
- ```
-
- ### Code Quality
-
- ```markdown
- ### Sign: Explain Why, Not What
- - **Trigger**: Writing comments
- - **Instruction**: Comments should explain reasoning, not describe obvious code
- - **Example**: `// Using retry because API is flaky under load` not `// Call the API`
- ```
-
- ## Automatic Sign Detection
-
- The Ralph hooks can automatically detect some patterns and suggest signs:
-
- - **Thrashing**: Same file edited many times → "Step back and reconsider"
- - **Repeated errors**: Same test failing → "Check the test assumptions"
- - **Large changes**: Big diffs → "Consider smaller increments"
-
- These are logged in `.ralph/failures.md` and can be promoted to guardrails.
-
- ## Using Signs Effectively
-
- ### Do
-
- - Keep signs concise and actionable
- - Include concrete examples
- - Update signs when they're not working
- - Remove outdated signs
-
- ### Don't
-
- - Add signs for every minor issue
- - Make signs too vague ("be careful")
- - Ignore signs that keep triggering
- - Let the guardrails file become overwhelming
-
- ## Integration with Ralph
-
- Signs are:
- 1. Stored in `.ralph/guardrails.md`
- 2. Injected into context at the start of each iteration
- 3. Referenced when relevant situations arise
- 4. Updated based on observed failures
-
- The goal is a self-improving system where each failure makes future iterations smarter.
+ # Guardrails Reference ("Signs")
+
+ This document explains how to create and use guardrails in Ralph.
+
+ ## The Signs Metaphor
+
+ From Geoffrey Huntley:
+
+ > "Ralph is very good at making playgrounds, but he comes home bruised because he fell off the slide, so one then tunes Ralph by adding a sign next to the slide saying 'SLIDE DOWN, DON'T JUMP, LOOK AROUND,' and Ralph is more likely to look and see the sign."
+
+ Signs are explicit instructions added to prevent known failure modes.
+
+ ## Anatomy of a Sign
+
+ ```markdown
+ ### Sign: [Descriptive Name]
+ - **Trigger**: When this situation occurs
+ - **Instruction**: What to do instead
+ - **Added after**: When/why this was added
+ - **Example**: Concrete example if helpful
+ ```
+
+ ## Types of Signs
+
+ ### 1. Preventive Signs
+
+ Stop problems before they happen:
+
+ ```markdown
+ ### Sign: Validate Before Trust
+ - **Trigger**: When receiving external input
+ - **Instruction**: Always validate and sanitize input before using it
+ - **Added after**: Iteration 3 - SQL injection vulnerability
+ ```
+
+ ### 2. Corrective Signs
+
+ Fix recurring mistakes:
+
+ ```markdown
+ ### Sign: Check Return Values
+ - **Trigger**: When calling functions that can fail
+ - **Instruction**: Always check return values and handle errors
+ - **Added after**: Iteration 7 - Null pointer exception
+ ```
+
+ ### 3. Process Signs
+
+ Enforce good practices:
+
+ ```markdown
+ ### Sign: Test Before Commit
+ - **Trigger**: Before committing changes
+ - **Instruction**: Run the test suite and ensure all tests pass
+ - **Added after**: Iteration 2 - Broken tests committed
+ ```
+
+ ### 4. Architecture Signs
+
+ Guide design decisions:
+
+ ```markdown
+ ### Sign: Single Responsibility
+ - **Trigger**: When a function grows beyond 50 lines
+ - **Instruction**: Consider splitting into smaller, focused functions
+ - **Added after**: Iteration 12 - Unmaintainable god function
+ ```
+
+ ## When to Add Signs
+
+ Add a sign when:
+
+ 1. **The same mistake happens twice** - Once is learning, twice is a pattern
+ 2. **A subtle bug is found** - Prevent future occurrences
+ 3. **A best practice is violated** - Reinforce good habits
+ 4. **Context-specific knowledge is needed** - Project-specific conventions
+
+ ## Sign Lifecycle
+
+ ### Creation
+
+ ```markdown
+ ### Sign: [New Sign]
+ - **Trigger**: [When it applies]
+ - **Instruction**: [What to do]
+ - **Added after**: Iteration N - [What happened]
+ ```
+
+ ### Refinement
+
+ If a sign isn't working:
+ - Make the trigger more specific
+ - Make the instruction clearer
+ - Add examples
+
+ ### Retirement
+
+ Signs can be removed when:
+ - The underlying issue is fixed at a deeper level
+ - The sign is no longer relevant
+ - The sign is causing more problems than it solves
+
+ ## Example Signs Library
+
+ ### Security
+
+ ```markdown
+ ### Sign: Sanitize All Input
+ - **Trigger**: Any user-provided data
+ - **Instruction**: Use parameterized queries, escape HTML, validate types
+ - **Example**: `db.query("SELECT * FROM users WHERE id = ?", [userId])`
+ ```
+
+ ### Error Handling
+
+ ```markdown
+ ### Sign: Graceful Degradation
+ - **Trigger**: External service calls
+ - **Instruction**: Always have a fallback for when services are unavailable
+ - **Example**: Cache results, provide default values, show friendly errors
+ ```
+
+ ### Testing
+
+ ```markdown
+ ### Sign: Test the Unhappy Path
+ - **Trigger**: Writing tests for new functionality
+ - **Instruction**: Include tests for error cases, edge cases, and invalid input
+ ```
+
+ ### Code Quality
+
+ ```markdown
+ ### Sign: Explain Why, Not What
+ - **Trigger**: Writing comments
+ - **Instruction**: Comments should explain reasoning, not describe obvious code
+ - **Example**: `// Using retry because API is flaky under load` not `// Call the API`
+ ```
+
+ ## Automatic Sign Detection
+
+ The Ralph hooks can automatically detect some patterns and suggest signs:
+
+ - **Thrashing**: Same file edited many times → "Step back and reconsider"
+ - **Repeated errors**: Same test failing → "Check the test assumptions"
+ - **Large changes**: Big diffs → "Consider smaller increments"
+
+ These are logged in `.ralph/failures.md` and can be promoted to guardrails.
+
+ ## Using Signs Effectively
+
+ ### Do
+
+ - Keep signs concise and actionable
+ - Include concrete examples
+ - Update signs when they're not working
+ - Remove outdated signs
+
+ ### Don't
+
+ - Add signs for every minor issue
+ - Make signs too vague ("be careful")
+ - Ignore signs that keep triggering
+ - Let the guardrails file become overwhelming
+
+ ## Integration with Ralph
+
+ Signs are:
+ 1. Stored in `.ralph/guardrails.md`
+ 2. Injected into context at the start of each iteration
+ 3. Referenced when relevant situations arise
+ 4. Updated based on observed failures
+
+ The goal is a self-improving system where each failure makes future iterations smarter.
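
Note on the hunk above: the "Anatomy of a Sign" template maps naturally onto a small record type. The sketch below is a hypothetical illustration, not code from the package. The `Sign` class and its `to_markdown` method are assumed names; only the field labels (Trigger, Instruction, Added after, Example) are taken from the file. The last two fields are treated as optional because several of the example signs omit one or the other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sign:
    """One guardrail entry, mirroring the template fields in GUARDRAILS.md."""

    name: str
    trigger: str
    instruction: str
    added_after: Optional[str] = None
    example: Optional[str] = None

    def to_markdown(self) -> str:
        """Render the sign in the `### Sign:` layout used by the file."""
        lines = [
            f"### Sign: {self.name}",
            f"- **Trigger**: {self.trigger}",
            f"- **Instruction**: {self.instruction}",
        ]
        # Optional fields are emitted only when they are set.
        if self.added_after:
            lines.append(f"- **Added after**: {self.added_after}")
        if self.example:
            lines.append(f"- **Example**: {self.example}")
        return "\n".join(lines)
```

Under these assumptions, a rendered sign could be appended to `.ralph/guardrails.md`, which the file names as the storage location. How the package itself writes that file is not shown in this diff.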
package/AGENTS.md CHANGED
@@ -1,20 +1,20 @@
- # AGENTS
-
- Keep this file short. It is always loaded into context.
-
- ## Build & test
- - No build step.
- - Tests (dry-run): `npm test`
- - Fast real agent check: `npm run test:ping`
- - Full real loop: `npm run test:real`
-
- ## CLI shape
- - CLI entry: `bin/ralph`
- - Templates: `.agents/ralph/` (copied to repos on install)
- - State/logs: `.ralph/` (local only)
- - Skills: `skills/`
- - Tests: `tests/`
- - Docs/examples: `README.md`, `examples/`
-
- ## Quirks / Guardrails
- **Add any common quirks guiderails here as needed**
+ # AGENTS
+
+ Keep this file short. It is always loaded into context.
+
+ ## Build & test
+ - No build step.
+ - Tests (dry-run): `npm test`
+ - Fast real agent check: `npm run test:ping`
+ - Full real loop: `npm run test:real`
+
+ ## CLI shape
+ - CLI entry: `bin/ralph`
+ - Templates: `.agents/ralph/` (copied to repos on install)
+ - State/logs: `.ralph/` (local only)
+ - Skills: `skills/`
+ - Tests: `tests/`
+ - Docs/examples: `README.md`, `examples/`
+
+ ## Quirks / Guardrails
+ **Add any common quirks guiderails here as needed**