lithermes-ai 0.5.0

Files changed (133)
  1. package/LICENSE +21 -0
  2. package/README.md +245 -0
  3. package/README_Ko-KR.md +245 -0
  4. package/assets/lithermes-plugin/NOTICE.md +37 -0
  5. package/assets/lithermes-plugin/README.md +40 -0
  6. package/assets/lithermes-plugin/__init__.py +179 -0
  7. package/assets/lithermes-plugin/core.py +853 -0
  8. package/assets/lithermes-plugin/litgoal/__init__.py +10 -0
  9. package/assets/lithermes-plugin/litgoal/cli.py +133 -0
  10. package/assets/lithermes-plugin/litgoal/hook.py +48 -0
  11. package/assets/lithermes-plugin/litgoal/model.py +171 -0
  12. package/assets/lithermes-plugin/litgoal/runtime.py +273 -0
  13. package/assets/lithermes-plugin/litgoal/store.py +93 -0
  14. package/assets/lithermes-plugin/litgoal/tools.py +228 -0
  15. package/assets/lithermes-plugin/payload-version.json +471 -0
  16. package/assets/lithermes-plugin/plugin.yaml +9 -0
  17. package/assets/lithermes-plugin/skills/ai-slop-remover/SKILL.md +142 -0
  18. package/assets/lithermes-plugin/skills/comment-checker/SKILL.md +50 -0
  19. package/assets/lithermes-plugin/skills/debugging/SKILL.md +116 -0
  20. package/assets/lithermes-plugin/skills/debugging/references/methodology/00-setup.md +108 -0
  21. package/assets/lithermes-plugin/skills/debugging/references/methodology/02-investigate.md +121 -0
  22. package/assets/lithermes-plugin/skills/debugging/references/methodology/04-oracle-triple.md +136 -0
  23. package/assets/lithermes-plugin/skills/debugging/references/methodology/05-escalate.md +69 -0
  24. package/assets/lithermes-plugin/skills/debugging/references/methodology/06-fix.md +116 -0
  25. package/assets/lithermes-plugin/skills/debugging/references/methodology/08-qa.md +94 -0
  26. package/assets/lithermes-plugin/skills/debugging/references/methodology/09-cleanup.md +164 -0
  27. package/assets/lithermes-plugin/skills/debugging/references/methodology/partial-runtime-evidence.md +229 -0
  28. package/assets/lithermes-plugin/skills/debugging/references/runtimes/bundled-js-binary.md +415 -0
  29. package/assets/lithermes-plugin/skills/debugging/references/runtimes/go.md +252 -0
  30. package/assets/lithermes-plugin/skills/debugging/references/runtimes/native-binary.md +484 -0
  31. package/assets/lithermes-plugin/skills/debugging/references/runtimes/node.md +260 -0
  32. package/assets/lithermes-plugin/skills/debugging/references/runtimes/python.md +248 -0
  33. package/assets/lithermes-plugin/skills/debugging/references/runtimes/rust.md +234 -0
  34. package/assets/lithermes-plugin/skills/debugging/references/tools/ghidra.md +212 -0
  35. package/assets/lithermes-plugin/skills/debugging/references/tools/playwright-cli.md +194 -0
  36. package/assets/lithermes-plugin/skills/debugging/references/tools/pwndbg.md +263 -0
  37. package/assets/lithermes-plugin/skills/debugging/references/tools/pwntools.md +265 -0
  38. package/assets/lithermes-plugin/skills/frontend-ui-ux/SKILL.md +77 -0
  39. package/assets/lithermes-plugin/skills/lit-plan/SKILL.md +374 -0
  40. package/assets/lithermes-plugin/skills/litgoal/.gitkeep +0 -0
  41. package/assets/lithermes-plugin/skills/litgoal/SKILL.md +207 -0
  42. package/assets/lithermes-plugin/skills/litwork/SKILL.md +262 -0
  43. package/assets/lithermes-plugin/skills/lsp/SKILL.md +53 -0
  44. package/assets/lithermes-plugin/skills/programming/SKILL.md +463 -0
  45. package/assets/lithermes-plugin/skills/programming/references/go/README.md +90 -0
  46. package/assets/lithermes-plugin/skills/programming/references/go/backend-stack.md +641 -0
  47. package/assets/lithermes-plugin/skills/programming/references/go/bootstrap.md +328 -0
  48. package/assets/lithermes-plugin/skills/programming/references/go/bubbletea-v2.md +360 -0
  49. package/assets/lithermes-plugin/skills/programming/references/go/cobra-stack.md +468 -0
  50. package/assets/lithermes-plugin/skills/programming/references/go/concurrency.md +362 -0
  51. package/assets/lithermes-plugin/skills/programming/references/go/data-modeling.md +329 -0
  52. package/assets/lithermes-plugin/skills/programming/references/go/error-handling.md +359 -0
  53. package/assets/lithermes-plugin/skills/programming/references/go/golangci-strict.md +236 -0
  54. package/assets/lithermes-plugin/skills/programming/references/go/grpc-connect.md +375 -0
  55. package/assets/lithermes-plugin/skills/programming/references/go/libraries.md +337 -0
  56. package/assets/lithermes-plugin/skills/programming/references/go/one-liners.md +202 -0
  57. package/assets/lithermes-plugin/skills/programming/references/go/sqlc-pgx.md +471 -0
  58. package/assets/lithermes-plugin/skills/programming/references/go/testing.md +467 -0
  59. package/assets/lithermes-plugin/skills/programming/references/go/type-patterns.md +298 -0
  60. package/assets/lithermes-plugin/skills/programming/references/python/README.md +314 -0
  61. package/assets/lithermes-plugin/skills/programming/references/python/async-anyio.md +442 -0
  62. package/assets/lithermes-plugin/skills/programming/references/python/data-modeling.md +233 -0
  63. package/assets/lithermes-plugin/skills/programming/references/python/data-processing.md +133 -0
  64. package/assets/lithermes-plugin/skills/programming/references/python/error-handling.md +218 -0
  65. package/assets/lithermes-plugin/skills/programming/references/python/fastapi-stack.md +316 -0
  66. package/assets/lithermes-plugin/skills/programming/references/python/httpx2-optimization.md +360 -0
  67. package/assets/lithermes-plugin/skills/programming/references/python/libraries.md +307 -0
  68. package/assets/lithermes-plugin/skills/programming/references/python/one-liners.md +268 -0
  69. package/assets/lithermes-plugin/skills/programming/references/python/orjson-stack.md +378 -0
  70. package/assets/lithermes-plugin/skills/programming/references/python/pydantic-ai.md +285 -0
  71. package/assets/lithermes-plugin/skills/programming/references/python/pyproject-strict.md +232 -0
  72. package/assets/lithermes-plugin/skills/programming/references/python/textual-tui.md +201 -0
  73. package/assets/lithermes-plugin/skills/programming/references/python/type-patterns.md +176 -0
  74. package/assets/lithermes-plugin/skills/programming/references/rust/README.md +317 -0
  75. package/assets/lithermes-plugin/skills/programming/references/rust/async-tokio.md +299 -0
  76. package/assets/lithermes-plugin/skills/programming/references/rust/axum-stack.md +467 -0
  77. package/assets/lithermes-plugin/skills/programming/references/rust/cargo-strict.md +317 -0
  78. package/assets/lithermes-plugin/skills/programming/references/rust/clap-stack.md +409 -0
  79. package/assets/lithermes-plugin/skills/programming/references/rust/concurrency.md +375 -0
  80. package/assets/lithermes-plugin/skills/programming/references/rust/libraries.md +439 -0
  81. package/assets/lithermes-plugin/skills/programming/references/rust/one-liners.md +291 -0
  82. package/assets/lithermes-plugin/skills/programming/references/rust/proptest-insta.md +429 -0
  83. package/assets/lithermes-plugin/skills/programming/references/rust/type-state.md +354 -0
  84. package/assets/lithermes-plugin/skills/programming/references/rust/unsafe-discipline.md +250 -0
  85. package/assets/lithermes-plugin/skills/programming/references/rust/zero-cost-safety.md +527 -0
  86. package/assets/lithermes-plugin/skills/programming/references/rust-ub/README.md +289 -0
  87. package/assets/lithermes-plugin/skills/programming/references/rust-ub/miri-sanitizers-loom.md +411 -0
  88. package/assets/lithermes-plugin/skills/programming/references/rust-ub/ub-taxonomy.md +269 -0
  89. package/assets/lithermes-plugin/skills/programming/references/typescript/README.md +195 -0
  90. package/assets/lithermes-plugin/skills/programming/references/typescript/backend-hono.md +672 -0
  91. package/assets/lithermes-plugin/skills/programming/references/typescript/bootstrap.md +199 -0
  92. package/assets/lithermes-plugin/skills/programming/references/typescript/data-modeling.md +202 -0
  93. package/assets/lithermes-plugin/skills/programming/references/typescript/error-handling.md +169 -0
  94. package/assets/lithermes-plugin/skills/programming/references/typescript/tsconfig-strict.md +152 -0
  95. package/assets/lithermes-plugin/skills/programming/references/typescript/type-patterns.md +196 -0
  96. package/assets/lithermes-plugin/skills/programming/scripts/go/check-no-excuse-rules.sh +173 -0
  97. package/assets/lithermes-plugin/skills/programming/scripts/go/new-project.py +138 -0
  98. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/.editorconfig +13 -0
  99. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/.golangci.yml +95 -0
  100. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/AGENTS.md.tmpl +24 -0
  101. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/README.md.tmpl +12 -0
  102. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/Taskfile.yml +40 -0
  103. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/ci.yml +37 -0
  104. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/config.go +24 -0
  105. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/gitignore +15 -0
  106. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/main.go.tmpl +22 -0
  107. package/assets/lithermes-plugin/skills/programming/scripts/go/templates/run.go +15 -0
  108. package/assets/lithermes-plugin/skills/programming/scripts/python/check-no-excuse-rules.py +687 -0
  109. package/assets/lithermes-plugin/skills/programming/scripts/python/new-project.py +172 -0
  110. package/assets/lithermes-plugin/skills/programming/scripts/python/new-script.py +116 -0
  111. package/assets/lithermes-plugin/skills/programming/scripts/rust/check-no-excuse-rules.py +296 -0
  112. package/assets/lithermes-plugin/skills/programming/scripts/rust/check-no-excuse-rules.sh +158 -0
  113. package/assets/lithermes-plugin/skills/programming/scripts/rust/new-project.py +175 -0
  114. package/assets/lithermes-plugin/skills/programming/scripts/typescript/check-no-excuse-rules.ts +282 -0
  115. package/assets/lithermes-plugin/skills/programming/scripts/typescript/new-project.ts +177 -0
  116. package/assets/lithermes-plugin/skills/refactor/SKILL.md +770 -0
  117. package/assets/lithermes-plugin/skills/remove-ai-slops/SKILL.md +335 -0
  118. package/assets/lithermes-plugin/skills/review-work/SKILL.md +562 -0
  119. package/assets/lithermes-plugin/skills/rules/SKILL.md +41 -0
  120. package/assets/lithermes-plugin/skills/start-work/SKILL.md +332 -0
  121. package/bin/lithermes.js +8 -0
  122. package/cover.png +0 -0
  123. package/package.json +39 -0
  124. package/src/cli.js +129 -0
  125. package/src/lib/check.js +94 -0
  126. package/src/lib/config.js +170 -0
  127. package/src/lib/files.js +65 -0
  128. package/src/lib/hermesDiscovery.js +50 -0
  129. package/src/lib/hud.js +121 -0
  130. package/src/lib/install.js +159 -0
  131. package/src/lib/patch.js +153 -0
  132. package/src/lib/skins.js +113 -0
  133. package/src/lib/spinner.js +104 -0
@@ -0,0 +1,562 @@
1
+ ---
2
+ name: review-work
3
+ description: "Post-implementation review orchestrator. Launches 5 parallel background sub-agents: Oracle (goal/constraint verification), Oracle (code quality), Oracle (security), unspecified-high (hands-on QA execution), unspecified-high (context mining from GitHub/git/Slack/Notion). All must pass for review to pass. MUST USE after completing any significant implementation work. Triggers: 'review work', 'review my work', 'review changes', 'QA my work', 'verify implementation', 'check my work', 'validate changes', 'post-implementation review'."
4
+ ---
5
+ ## LitHermes Hermes-native execution (authoritative — overrides any harness examples below)
6
+
7
+ In Hermes there is **one** way to fan out review lanes: the native **`delegate_task`**
8
+ tool. There is no `spawn_agent`, `task()`, `background_output()`, or
9
+ `team_*` — ignore any such names in the prose below and translate them to `delegate_task`.
10
+
11
+ Start this skill from the `/review-work` slash command. It runs Phase 0 for you
12
+ (collects `git diff --name-only`, the diff, and the detected run command) and hands you
13
+ the five lane briefs plus the gate contract. Then:
14
+
15
+ | Legacy prose says | Do this in Hermes |
16
+ | --- | --- |
17
+ | "launch 5 background sub-agents" | call `delegate_task` **once** with a batch of 5 child tasks (they run in parallel) |
18
+ | `task(subagent_type="oracle", ...)` | a `delegate_task` child whose message contains the lane's full brief + the diff/files inline |
19
+ | `background_output(task_id=...)` / `wait_agent(...)` | the `delegate_task` batch blocks until every child returns; you receive all results together |
20
+ | `load_skills=[...]` | name the skills to load inside the child's `message` |
21
+
22
+ Lane → child mapping (dispatch all five in the single batch):
23
+ `goal` · `qa` · `code-quality` · `security` (supplementary) · `context`.
24
+
25
+ Each child returns: `verdict` (PASS|FAIL), `confidence`, and findings with `file:line`.
26
+ Aggregate and dedupe across lanes, then apply the **all-or-nothing gate**: any lane FAIL
27
+ ⇒ **REVIEW FAILED** (list blocking issues by severity); all five PASS ⇒ **REVIEW PASSED**
28
+ (non-blocking suggestions only). Record the per-lane verdicts; the plugin's `subagent_stop`
29
+ hook also logs each child to the review ledger. If a code block below conflicts with this
30
+ section, this section wins. Every `task(...)`, `background_output(...)`, `spawn_agent(...)`,
31
+ `wait_agent(...)`, and `subagent_type=...` example in the prose below is a LEGACY illustration
32
+ copied from another harness — never call any of them literally; map each to `delegate_task`.
33
+
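The all-or-nothing gate described above can be sketched in a few lines. This is an illustration only, not the plugin's implementation: `LaneResult` and `apply_gate` are hypothetical names, and treating a lane that never reported as a failure is an assumption (the text only says all five must PASS).

```python
from dataclasses import dataclass

# Lane names taken from the lane -> child mapping above.
LANES = ("goal", "qa", "code-quality", "security", "context")


@dataclass(frozen=True)
class LaneResult:
    """Hypothetical container for what one delegate_task child returns."""
    lane: str
    verdict: str                      # "PASS" or "FAIL"
    confidence: str = "MEDIUM"
    findings: tuple[str, ...] = ()    # each entry like "file:line description"


def apply_gate(results: list[LaneResult]) -> dict:
    """Aggregate lane results, dedupe findings, and apply the gate."""
    by_lane = {r.lane: r for r in results}
    # Assumption: a lane that did not report cannot count as a PASS.
    missing = [lane for lane in LANES if lane not in by_lane]
    failed = [lane for lane in LANES
              if lane in by_lane and by_lane[lane].verdict != "PASS"]

    # Dedupe identical findings across lanes, keeping first-seen order.
    seen: dict[str, str] = {}
    for r in results:
        for finding in r.findings:
            seen.setdefault(finding, r.lane)

    passed = not missing and not failed
    return {
        "status": "REVIEW PASSED" if passed else "REVIEW FAILED",
        "failed_lanes": failed,
        "missing_lanes": missing,
        "findings": list(seen),
    }
```

Keeping the per-lane verdicts in the returned dictionary makes it straightforward to record them alongside the final status.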
34
+ # Review Work - 5-Agent Parallel Review Orchestrator
35
+
36
+ Launch 5 specialized sub-agents in parallel to review completed implementation work from every angle. All 5 must pass for the review to pass. If even ONE fails, the review fails.
37
+
38
+ The 5 agents cover complementary concerns - together they form a comprehensive review that no single reviewer could match:
39
+
40
+ | # | Agent | Type | Role | Focus Level |
41
+ |---|-------|------|------|-------------|
42
+ | 1 | Goal Verifier | Oracle | Did we build what was asked? | MAIN |
43
+ | 2 | QA Executor | unspecified-high | Does it actually work? | MAIN |
44
+ | 3 | Code Reviewer | Oracle | Is the code well-written? | MAIN |
45
+ | 4 | Security Auditor | Oracle | Is it secure? | SUB |
46
+ | 5 | Context Miner | unspecified-high | Did we miss any context? | MAIN |
47
+
48
+ ---
49
+
50
+ ## Phase 0: Gather Review Context
51
+
52
+ Before launching agents, collect these inputs. Extract from conversation history first - the user's original request, constraints discussed, and decisions made are usually already in the thread. Only ask if truly missing.
53
+
54
+ <required_inputs>
55
+
56
+ - **GOAL**: The original objective. What was the user trying to achieve? Pull from the initial request in this conversation.
57
+ - **CONSTRAINTS**: Rules, requirements, or limitations. Tech stack restrictions, performance targets, API contracts, design patterns to follow, backward compatibility needs.
58
+ - **BACKGROUND**: Why this work was needed. Business context, user stories, related systems, prior decisions that informed the approach.
59
+ - **CHANGED_FILES**: Auto-collect via `git diff --name-only HEAD~1` or against the appropriate base (branch point, specific commit).
60
+ - **DIFF**: Auto-collect via `git diff HEAD~1` or against the appropriate base.
61
+ - **FILE_CONTENTS**: Read the full content of each changed file (not just the diff). Oracle agents cannot read files - they need full context in the prompt.
62
+ - **RUN_COMMAND**: How to start/run the application. Check `package.json` scripts, `Makefile`, `docker-compose.yml`, or ask the user.
63
+
64
+ </required_inputs>
65
+
66
+
67
+ **NEVER CHECK OUT A PR BRANCH IN THE MAIN WORKTREE. ALWAYS CREATE A NEW GIT WORKTREE (`git worktree add`) AND WORK THERE. THIS PREVENTS CONTAMINATING THE USER'S WORKING DIRECTORY WITH UNRELATED BRANCH STATE.**
68
+
69
+ **Auto-collection sequence:**
70
+
71
+ ```bash
72
+ # 1. Get changed files
73
+ git diff --name-only HEAD~1 # or: git diff --name-only main...HEAD
74
+
75
+ # 2. Get diff
76
+ git diff HEAD~1 # or: git diff main...HEAD
77
+
78
+ # 3. Detect run command
79
+ # Check package.json -> "scripts.dev" or "scripts.start"
80
+ # Check Makefile -> default target
81
+ # Check docker-compose.yml -> services
82
+ ```
83
+
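Step 3 of the sequence above is left as comments. One possible way to automate it is sketched below. The function name `detect_run_command` is hypothetical, and the choice of `npm run ...` as the launcher is an assumption (a project may use a different package manager). The `"unknown"` fallback mirrors the value the QA brief accepts when no run command was determined.

```python
import json
from pathlib import Path


def detect_run_command(root: Path) -> str:
    """Guess how to start the app, checking sources in the order listed above."""
    pkg = root / "package.json"
    if pkg.is_file():
        try:
            data = json.loads(pkg.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            data = {}
        scripts = data.get("scripts", {}) if isinstance(data, dict) else {}
        if isinstance(scripts, dict):
            # Prefer "dev", then "start", as in the comments above.
            for name in ("dev", "start"):
                if name in scripts:
                    # Assumption: npm is the package manager in use.
                    return f"npm run {name}"

    if (root / "Makefile").is_file():
        return "make"                 # a bare `make` runs the default target

    if (root / "docker-compose.yml").is_file():
        return "docker compose up"    # starts the declared services

    return "unknown"                  # fall back to asking the user
```

A result of `"unknown"` is the cue to ask the user one focused question rather than guess.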
84
+ For GOAL, CONSTRAINTS, BACKGROUND - review the full conversation history. The user's original message almost always contains the goal. Constraints often emerge during discussion. If anything critical is ambiguous, ask ONE focused question - not a checklist.
85
+
86
+ ---
87
+
88
+ ## Phase 1: Launch 5 Agents
89
+
90
+ Launch ALL 5 in a single turn. Every agent uses `run_in_background=true`. No sequential launches. No waiting between them.
91
+
92
+ **Oracle agents receive everything in the prompt** (they cannot read files or run commands). Include DIFF + FILE_CONTENTS + all context directly in the prompt text.
93
+
94
+ **unspecified-high agents are autonomous** - they can read files, run commands, and use tools. Give them goals and pointers, not raw content dumps.
95
+
96
+ ---
97
+
98
+ ### Agent 1: Goal & Constraint Verification (Oracle) - MAIN
99
+
100
+ This agent answers: "Did we build exactly what was asked, within the rules we were given?"
101
+
102
+ ```
103
+ task(
104
+ subagent_type="oracle",
105
+ run_in_background=true,
106
+ load_skills=[],
107
+ description="Verify implementation against original goal and constraints",
108
+ prompt="""
109
+ <review_type>GOAL & CONSTRAINT VERIFICATION</review_type>
110
+
111
+ <original_goal>
112
+ {GOAL - paste the user's original request and any clarifications}
113
+ </original_goal>
114
+
115
+ <constraints>
116
+ {CONSTRAINTS - every rule, requirement, or limitation discussed}
117
+ </constraints>
118
+
119
+ <background>
120
+ {BACKGROUND - why this work was needed, broader context}
121
+ </background>
122
+
123
+ <changed_files>
124
+ {CHANGED_FILES - list of modified file paths}
125
+ </changed_files>
126
+
127
+ <file_contents>
128
+ {FILE_CONTENTS - full content of every changed file, clearly delimited per file}
129
+ </file_contents>
130
+
131
+ <diff>
132
+ {DIFF - the actual git diff}
133
+ </diff>
134
+
135
+ Review whether this implementation correctly and completely achieves the stated goal within the given constraints. Be obsessively thorough - the point of this review is to catch what the implementer missed.
136
+
137
+ REVIEW CHECKLIST:
138
+
139
+ 1. **Goal Completeness**: Break the goal into every sub-requirement (explicit AND implied). For each, mark ACHIEVED / MISSED / PARTIAL. Missing even one implied requirement that a reasonable engineer would have addressed = PARTIAL at minimum.
140
+
141
+ 2. **Constraint Compliance**: List every constraint. For each, verify compliance with specific code evidence. A constraint violated = automatic FAIL.
142
+
143
+ 3. **Requirement Gaps**: Requirements the user clearly wanted but didn't spell out. Things implied by the goal or background that a thoughtful engineer would have included.
144
+
145
+ 4. **Over-Engineering**: Anything added that wasn't requested - unnecessary abstractions, extra features, premature optimizations, speculative generality. Flag these as scope creep.
146
+
147
+ 5. **Edge Cases**: Given the goal, what inputs or scenarios would break this? Trace through at least 5 edge cases mentally.
148
+
149
+ 6. **Behavioral Correctness**: Walk through the code logic for 3+ representative scenarios. Does the code actually produce the expected behavior in each case?
150
+
151
+ OUTPUT FORMAT:
152
+ <verdict>PASS or FAIL</verdict>
153
+ <confidence>HIGH / MEDIUM / LOW</confidence>
154
+ <summary>1-3 sentence overall assessment</summary>
155
+ <goal_breakdown>
156
+ For each sub-requirement:
157
+ - [ACHIEVED/MISSED/PARTIAL] Requirement description
158
+ - Evidence: specific code reference or gap
159
+ </goal_breakdown>
160
+ <constraint_compliance>
161
+ For each constraint:
162
+ - [ACHIEVED/MISSED] Constraint description - evidence
163
+ </constraint_compliance>
164
+ <findings>
165
+ - [PASS/FAIL/WARN] Category: Description
166
+ - File: path (line range if applicable)
167
+ - Evidence: specific code or logic reference
168
+ </findings>
169
+ <blocking_issues>Issues that MUST be fixed. Empty if PASS.</blocking_issues>
170
+ """)
171
+ ```
172
+
173
+ ---
174
+
175
+ ### Agent 2: QA via App Execution (unspecified-high) - MAIN
176
+
177
+ This agent answers: "Does it actually work when you run it?"
178
+
179
+ The QA agent follows a structured process: brainstorm scenarios exhaustively first, then self-review and augment, then create a task list, then execute systematically.
180
+
181
+ ```
182
+ task(
183
+ category="unspecified-high",
184
+ run_in_background=true,
185
+ load_skills=["playwright", "dev-browser"],
186
+ description="QA by actually running and using the application",
187
+ prompt="""
188
+ <review_type>QA - HANDS-ON APP EXECUTION</review_type>
189
+
190
+ <original_goal>
191
+ {GOAL}
192
+ </original_goal>
193
+
194
+ <constraints>
195
+ {CONSTRAINTS}
196
+ </constraints>
197
+
198
+ <changed_files>
199
+ {CHANGED_FILES}
200
+ </changed_files>
201
+
202
+ <run_command>
203
+ {RUN_COMMAND - how to start the application, or "unknown" if not determined}
204
+ </run_command>
205
+
206
+ You are a QA engineer. Your job is to RUN the application and verify it works through hands-on testing. You do not review code - you test behavior.
207
+
208
+ MANDATORY PROCESS (follow in order):
209
+
210
+ ### Step 1: Scenario Brainstorm
211
+
212
+ Before touching the app, write down EVERY test scenario you can think of. Be exhaustive. Think about:
213
+
214
+ - **Happy paths**: The primary use cases this implementation enables. What's the main thing the user wanted to do?
215
+ - **Boundary conditions**: Empty inputs, maximum-length inputs, zero values, negative numbers, special characters, unicode, very large datasets.
216
+ - **Error paths**: Invalid inputs, network failures, missing files, permission denied, timeout conditions.
217
+ - **Regression scenarios**: Existing features that touch the same code paths. Things that worked before and must still work.
218
+ - **State transitions**: What happens when you do things out of order? Rapid repeated actions? Concurrent usage?
219
+ - **UX scenarios** (if applicable): Layout on different sizes, keyboard navigation, screen reader compatibility, loading states, error messages.
220
+ - **Integration points**: Does this feature interact with external services, databases, or other modules? Test those boundaries.
221
+
222
+ Write each scenario as a one-liner with expected behavior. Aim for at least 15 scenarios; 15-30 is a reasonable range.
223
+
224
+ ### Step 2: Scenario Augmentation
225
+
226
+ Review your scenario list with fresh eyes. For each scenario, ask:
227
+ - "What could go wrong here that I haven't considered?"
228
+ - "What would a malicious or careless user do?"
229
+ - "What environmental conditions could affect this?" (disk full, slow network, expired tokens)
230
+
231
+ Add at least 5 more scenarios from this reflection. Group scenarios by priority: P0 (must pass), P1 (should pass), P2 (nice to pass).
232
+
233
+ ### Step 3: Create Task List
234
+
235
+ Convert your augmented scenario list into a structured task list (use TaskCreate/TaskUpdate or your todo system). Each task = one test scenario with:
236
+ - Test name
237
+ - Steps to execute
238
+ - Expected result
239
+ - Priority (P0/P1/P2)
240
+
241
+ ### Step 4: Execute Systematically
242
+
243
+ Work through the task list in priority order (P0 first). For each test:
244
+
245
+ 1. Execute the test steps
246
+ 2. Record actual result
247
+ 3. Compare with expected result
248
+ 4. Mark PASS or FAIL
249
+ 5. If FAIL: capture evidence (screenshot, terminal output, error message)
250
+ 6. Mark the task complete
251
+
252
+ **Execution guidance by app type:**
253
+ - **Web app**: Use playwright/dev-browser to navigate, click, fill forms, verify visual output.
254
+ - **CLI tool**: Run commands with various arguments, pipe inputs, check exit codes and output.
255
+ - **Library/SDK**: Write and execute a test script that imports and exercises the public API.
256
+ - **Backend API**: Use curl/httpie to hit endpoints with various payloads, verify response codes and bodies.
257
+ - **Mobile/Desktop**: If not directly runnable, write integration tests and execute them.
258
+
259
+ If the app cannot be started (build failure), that's an immediate FAIL - no need to continue.
260
+
261
+ ### Step 5: Compile Results
262
+
263
+ OUTPUT FORMAT:
264
+ <verdict>PASS or FAIL</verdict>
265
+ <confidence>HIGH / MEDIUM / LOW</confidence>
266
+ <summary>1-3 sentence overall assessment</summary>
267
+ <scenario_coverage>
268
+ Total scenarios: N
269
+ P0: X tested, Y passed
270
+ P1: X tested, Y passed
271
+ P2: X tested, Y passed
272
+ </scenario_coverage>
273
+ <test_results>
274
+ For each test:
275
+ - [PASS/FAIL] Test name (Priority)
276
+ - Steps: What you did
277
+ - Expected: What should happen
278
+ - Actual: What actually happened
279
+ - Evidence: Screenshot path or terminal output snippet (if FAIL)
280
+ </test_results>
281
+ <blocking_issues>P0 or P1 failures only. Empty if PASS.</blocking_issues>
282
+ """)
283
+ ```
284
+
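The `<scenario_coverage>` block and the "P0 or P1 failures only" rule from the QA brief can be computed from the executed task list. The sketch below is illustrative: `ScenarioResult` and `summarize` are hypothetical names, and deriving the verdict solely from whether blocking issues exist is an assumption based on "Empty if PASS".

```python
from collections import Counter
from dataclasses import dataclass

PRIORITIES = ("P0", "P1", "P2")


@dataclass(frozen=True)
class ScenarioResult:
    """Hypothetical record of one executed QA scenario."""
    name: str
    priority: str   # "P0", "P1", or "P2"
    passed: bool


def summarize(results: list[ScenarioResult]) -> dict:
    """Build the coverage lines and the blocking list for the QA lane."""
    tested = Counter(r.priority for r in results)
    passed = Counter(r.priority for r in results if r.passed)

    # Only P0 and P1 failures block; P2 failures are reported but do not.
    blocking = [r.name for r in results
                if not r.passed and r.priority in ("P0", "P1")]

    lines = [f"Total scenarios: {len(results)}"]
    for p in PRIORITIES:
        lines.append(f"{p}: {tested[p]} tested, {passed[p]} passed")

    return {
        "verdict": "FAIL" if blocking else "PASS",
        "coverage": "\n".join(lines),
        "blocking_issues": blocking,
    }
```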
285
+ ---
286
+
287
+ ### Agent 3: Code Quality Review (Oracle) - MAIN
288
+
289
+ This agent answers: "Is the code well-written, maintainable, and consistent with the codebase?"
290
+
291
+ ```
292
+ task(
293
+ subagent_type="oracle",
294
+ run_in_background=true,
295
+ load_skills=[],
296
+ description="Review overall code quality, patterns, and architecture",
297
+ prompt="""
298
+ <review_type>CODE QUALITY REVIEW</review_type>
299
+
300
+ <changed_files>
301
+ {CHANGED_FILES}
302
+ </changed_files>
303
+
304
+ <file_contents>
305
+ {FILE_CONTENTS - full content of changed files AND neighboring files that show existing patterns}
306
+ </file_contents>
307
+
308
+ <diff>
309
+ {DIFF}
310
+ </diff>
311
+
312
+ <background>
313
+ {BACKGROUND}
314
+ </background>
315
+
316
+ You are a senior staff engineer conducting a code review. Your standard: "Would I approve this PR without comments?"
317
+
318
+ REVIEW DIMENSIONS (examine each):
319
+
320
+ 1. **Correctness**: Logic errors, off-by-one, null/undefined handling, race conditions, resource leaks, unhandled promise rejections.
321
+
322
+ 2. **Pattern Consistency**: Does new code follow the codebase's established patterns? Compare with the neighboring files provided. Introducing a new pattern where one already exists = finding.
323
+
324
+ 3. **Naming & Readability**: Clear variable/function/type names? Self-documenting code? Would another engineer understand this without explanation?
325
+
326
+ 4. **Error Handling**: Errors properly caught, logged, and propagated? No empty catch blocks? No swallowed errors? User-facing errors are helpful?
327
+
328
+ 5. **Type Safety**: Any `as any`, `@ts-ignore`, `@ts-expect-error`? Proper generic usage? Correct type narrowing? (If TypeScript/typed language)
329
+
330
+ 6. **Performance**: N+1 queries? Unnecessary re-renders? Blocking I/O on hot paths? Memory leaks? Unbounded growth?
331
+
332
+ 7. **Abstraction Level**: Right level of abstraction? No copy-paste duplication? But also no premature over-abstraction?
333
+
334
+ 8. **Testing**: New behaviors covered by tests? Tests are meaningful, not just coverage padding? Test names describe scenarios?
335
+
336
+ 9. **API Design**: Public interfaces clean and consistent with existing APIs? Breaking changes flagged?
337
+
338
+ 10. **Tech Debt**: Does this introduce new tech debt? Or create coupling that will be painful to change?
339
+
340
+ Categorize each finding by severity:
341
+ - **CRITICAL**: Will cause bugs, data loss, or crashes in production
342
+ - **MAJOR**: Significant quality issue that should be fixed before merge
343
+ - **MINOR**: Improvement worth making but not blocking
344
+ - **NITPICK**: Style preference, optional
345
+
346
+ OUTPUT FORMAT:
347
+ <verdict>PASS or FAIL</verdict>
348
+ <confidence>HIGH / MEDIUM / LOW</confidence>
349
+ <summary>1-3 sentence overall assessment</summary>
350
+ <findings>
351
+ - [CRITICAL/MAJOR/MINOR/NITPICK] Category: Description
352
+ - File: path (line range)
353
+ - Current: what the code does now
354
+ - Suggestion: how to improve
355
+ </findings>
356
+ <blocking_issues>CRITICAL and MAJOR items only. Empty if PASS.</blocking_issues>
357
+ """)
358
+ ```
359
+
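The severity rule in the code-quality brief (only CRITICAL and MAJOR findings block) lends itself to a mechanical check of a lane's `<findings>` text. This is a rough sketch, assuming findings follow the `- [SEVERITY] Category: Description` shape from the OUTPUT FORMAT. The helper name `triage` and the regular expression are illustrative, not part of the skill.

```python
import re

BLOCKING = {"CRITICAL", "MAJOR"}

# Matches top-level finding lines such as "- [MAJOR] Error Handling: empty catch".
# Indented detail lines ("- File: ...", "- Suggestion: ...") have no [SEVERITY] tag
# and are skipped.
FINDING = re.compile(
    r"^\s*-\s*\[(CRITICAL|MAJOR|MINOR|NITPICK)\]\s*([^:]+):\s*(.+)$"
)


def triage(findings_text: str):
    """Split findings into blocking and advisory lists and derive a verdict."""
    blocking, advisory = [], []
    for line in findings_text.splitlines():
        m = FINDING.match(line)
        if not m:
            continue
        item = (m.group(1), m.group(2).strip(), m.group(3).strip())
        (blocking if item[0] in BLOCKING else advisory).append(item)
    verdict = "FAIL" if blocking else "PASS"
    return verdict, blocking, advisory
```

The same shape would apply to the security lane by swapping the severity names for CRITICAL/HIGH/MEDIUM/LOW and the blocking set for CRITICAL and HIGH.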
360
+ ---
361
+
362
+ ### Agent 4: Security Review (Oracle) - SUB
363
+
364
+ This agent answers: "Are there security vulnerabilities in these changes?"
365
+
366
+ This is supplementary - it focuses exclusively on security. It does NOT comment on code style, architecture, or functionality unless those directly create a security risk.
367
+
368
+ ```
369
+ task(
370
+ subagent_type="oracle",
371
+ run_in_background=true,
372
+ load_skills=[],
373
+ description="Security-focused review of implementation changes",
374
+ prompt="""
375
+ <review_type>SECURITY REVIEW (supplementary)</review_type>
376
+
377
+ <changed_files>
378
+ {CHANGED_FILES}
379
+ </changed_files>
380
+
381
+ <file_contents>
382
+ {FILE_CONTENTS - full content of changed files}
383
+ </file_contents>
384
+
385
+ <diff>
386
+ {DIFF}
387
+ </diff>
388
+
389
+ You are a security engineer. Review this diff exclusively for security vulnerabilities and anti-patterns. Ignore code style, naming, architecture - unless it directly creates a security risk.
390
+
391
+ SECURITY CHECKLIST:
392
+
393
+ 1. **Input Validation**: User inputs sanitized? SQL injection, XSS, command injection, SSRF vectors?
394
+ 2. **Auth & AuthZ**: Authentication checks where needed? Authorization verified for each action? Privilege escalation paths?
395
+ 3. **Secrets & Credentials**: Hardcoded secrets, API keys, tokens in code or config? Secrets in logs?
396
+ 4. **Data Exposure**: Sensitive data in logs? PII in error messages? Over-exposed API responses?
397
+ 5. **Dependencies**: New dependencies added? Known CVEs? Suspicious or unnecessary packages?
398
+ 6. **Cryptography**: Proper algorithms? No custom crypto? Secure random? Proper key management?
399
+ 7. **File & Path**: Path traversal? Unsafe file operations? Symlink following?
400
+ 8. **Network**: CORS configured correctly? Rate limiting? TLS enforced? Certificate validation?
401
+ 9. **Error Leakage**: Stack traces exposed to users? Internal details in error responses?
402
+ 10. **Supply Chain**: Lockfile updated consistently? Dependency pinning?
403
+
404
+ OUTPUT FORMAT:
405
+ <verdict>PASS or FAIL</verdict>
406
+ <severity>CRITICAL / HIGH / MEDIUM / LOW / NONE</severity>
407
+ <summary>1-3 sentence overall assessment</summary>
408
+ <findings>
409
+ - [CRITICAL/HIGH/MEDIUM/LOW] Category: Description
410
+ - File: path (line range)
411
+ - Risk: What could an attacker do?
412
+ - Remediation: Specific fix
413
+ </findings>
414
+ <blocking_issues>CRITICAL and HIGH items only. Empty if PASS.</blocking_issues>
415
+ """)
416
+ ```
417
+
418
+ ---
419
+
420
+ ### Agent 5: Context Mining (unspecified-high) - MAIN
421
+
422
+ This agent answers: "Did we miss any context that should have informed this implementation?"
423
+
424
+ ```
425
+ task(
426
+ category="unspecified-high",
427
+ run_in_background=true,
428
+ load_skills=["git-master"],
429
+ description="Mine all accessible contexts for missed requirements or background knowledge",
430
+ prompt="""
431
+ <review_type>CONTEXT MINING - MISSED REQUIREMENTS & BACKGROUND</review_type>
432
+
433
+ <original_goal>
434
+ {GOAL}
435
+ </original_goal>
436
+
437
+ <constraints>
438
+ {CONSTRAINTS}
439
+ </constraints>
440
+
441
+ <changed_files>
442
+ {CHANGED_FILES}
443
+ </changed_files>
444
+
445
+ <background>
446
+ {BACKGROUND}
447
+ </background>
448
+
449
+ You are an investigator. Your mission: search every accessible information source to find context that should have informed this implementation but might have been missed. The question: "Is there something we should have known but didn't?"
450
+
451
+ SOURCES TO SEARCH (use every available tool):
452
+
453
+ 1. **Git History** (ALWAYS search):
454
+ - `git log --oneline -20 -- {each changed file}` - recent changes and their reasons
455
+ - `git blame {critical sections}` - who wrote what and when
456
+ - `git log --all --grep="{keywords from goal}"` - related commits
457
+ - Look for reverted commits, TODO/FIXME/HACK comments in history
458
+
459
+ 2. **GitHub** (if `gh` CLI available):
460
+ - `gh issue list --search "{keywords}"` - related open/closed issues
461
+ - `gh pr list --search "{keywords}" --state all` - related PRs and their review comments
462
+ - Check if any issue is specifically linked to this work
463
+ - Look at review comments on past PRs touching these files
464
+
465
+ 3. **Communication Channels** (if MCP tools available):
466
+ - Slack: search for messages mentioning the feature, file names, or related keywords
467
+ - Notion: search for design docs, RFCs, ADRs related to this feature
468
+ - Discord: relevant discussions
469
+
470
+ 4. **Codebase Cross-References** (ALWAYS search):
471
+ - Files that import or reference the changed modules
472
+ - Tests that might need updating due to behavior changes
473
+ - Documentation (README, docs/, comments) that references changed behavior
474
+ - Config files that might need corresponding updates
475
+ - Related features in the same domain
476
+
477
+ WHAT TO LOOK FOR:
478
+
479
+ - Requirements mentioned in issues/PRs that the implementation misses
480
+ - Past decisions explaining WHY code was written a certain way - and whether new changes respect those reasons
481
+ - Related systems or features affected by these changes
482
+ - Warnings from previous developers (PR review comments, inline TODOs, commit messages)
483
+ - Migration or deprecation notes that affect the changed code
484
+ - Design decisions documented outside the codebase (Notion, Slack, ADRs)
485
+
486
+ OUTPUT FORMAT:
487
+ <verdict>PASS or FAIL</verdict>
488
+ <confidence>HIGH / MEDIUM / LOW</confidence>
489
+ <summary>1-3 sentence overall assessment</summary>
490
+ <sources_searched>
491
+ - [SEARCHED/SKIPPED] Source name - what was searched (or why it wasn't accessible)
492
+ </sources_searched>
493
+ <discovered_context>
494
+ For each discovery:
495
+ - Source: Where found (git commit abc123, GitHub issue #42, Slack message, etc.)
496
+ - Finding: What was found
497
+ - Relevance: How it relates to the current work
498
+ - Impact: [BLOCKING / IMPORTANT / FYI]
499
+ </discovered_context>
500
+ <missed_requirements>Requirements the implementation should address but doesn't. Empty if none.</missed_requirements>
501
+ <blocking_issues>BLOCKING items only. Empty if PASS.</blocking_issues>
502
+ """)
503
+ ```
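The tagged output format used by the review prompts is simple enough to read mechanically. The following is a minimal sketch of how an orchestrator might pull the verdict, summary, and blocking issues out of an agent reply. `AgentResult`, `parse_agent_output`, and the choice to treat a missing or malformed verdict as FAIL are illustrative assumptions, not part of the package.

```python
import re
from dataclasses import dataclass


@dataclass
class AgentResult:
    """Hypothetical container for the fields every review agent reports."""
    verdict: str
    summary: str
    blocking_issues: str


def _tag(text: str, name: str) -> str:
    """Return the stripped body of the first <name>...</name> pair, or ''."""
    match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_agent_output(text: str) -> AgentResult:
    """Parse one agent reply written in the OUTPUT FORMAT shown above."""
    verdict = _tag(text, "verdict").upper()
    if verdict not in {"PASS", "FAIL"}:
        # Assumption: a reply without a usable verdict is treated as FAIL.
        verdict = "FAIL"
    return AgentResult(
        verdict=verdict,
        summary=_tag(text, "summary"),
        blocking_issues=_tag(text, "blocking_issues"),
    )
```

Failing closed on a malformed reply is a design choice made for this sketch; a real orchestrator might prefer to re-run the agent instead.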
504
+
505
+ ---
506
+
507
+ ## Phase 2: Wait & Collect
508
+
509
+ After launching all 5 agents in one turn, **end your response**. Wait for system notifications as each agent completes.
510
+
511
+ As each completes, collect via `background_output(task_id="bg_...")`. Store each verdict:
512
+
513
+ | Agent | Verdict | Notes |
514
+ |-------|---------|-------|
515
+ | 1. Goal Verification | pending | - |
516
+ | 2. QA Execution | pending | - |
517
+ | 3. Code Quality | pending | - |
518
+ | 4. Security | pending | - |
519
+ | 5. Context Mining | pending | - |
520
+
521
+ Do NOT deliver the final report until ALL 5 have completed.
522
+
523
+ ---
524
+
525
+ ## Phase 3: Deliver Verdict
526
+
527
+ <verdict_logic>
528
+
529
+ ALL 5 agents returned PASS → **REVIEW PASSED**
530
+ ANY agent returned FAIL → **REVIEW FAILED - criteria not met**
531
+
532
+ </verdict_logic>
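The verdict logic above amounts to an all-or-nothing check over the five agents. A minimal sketch, assuming verdicts are collected into a dict keyed by the agent names from the Phase 2 table (`EXPECTED_AGENTS` and `overall_verdict` are hypothetical names introduced here for illustration):

```python
# Agent names as listed in the Phase 2 collection table.
EXPECTED_AGENTS = (
    "Goal Verification",
    "QA Execution",
    "Code Quality",
    "Security",
    "Context Mining",
)


def overall_verdict(verdicts):
    """Return 'PASSED' only if every expected agent reported PASS.

    Raises ValueError while any agent is still missing, mirroring the rule
    that the final report waits for all 5 agents to complete.
    """
    missing = [name for name in EXPECTED_AGENTS if name not in verdicts]
    if missing:
        raise ValueError("still waiting on: " + ", ".join(missing))
    all_pass = all(verdicts[name] == "PASS" for name in EXPECTED_AGENTS)
    return "PASSED" if all_pass else "FAILED"
```

Raising on incomplete input, rather than returning a provisional result, keeps a partial collection from being reported as a pass.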
533
+
534
+ Compile the final report in this format:
535
+
536
+ ```markdown
537
+ # Review Work - Final Report
538
+
539
+ ## Overall Verdict: PASSED / FAILED
540
+
541
+ | # | Review Area | Agent Type | Verdict | Confidence |
542
+ |---|------------|------------|---------|------------|
543
+ | 1 | Goal & Constraint Verification | Oracle | PASS/FAIL | HIGH/MEDIUM/LOW |
543
+ | 2 | QA Execution | unspecified-high | PASS/FAIL | HIGH/MEDIUM/LOW |
544
+ | 3 | Code Quality | Oracle | PASS/FAIL | HIGH/MEDIUM/LOW |
545
+ | 4 | Security (supplementary) | Oracle | PASS/FAIL | Severity |
546
+ | 5 | Context Mining | unspecified-high | PASS/FAIL | HIGH/MEDIUM/LOW |
548
+
549
+ ## Blocking Issues
550
+ [Aggregated from all agents - deduplicated, prioritized]
551
+
552
+ ## Key Findings
553
+ [Top 5-10 most important findings across all agents, grouped by theme]
554
+
555
+ ## Recommendations
556
+ [If FAILED: exactly what to fix, in priority order]
557
+ [If PASSED: non-blocking suggestions worth considering]
558
+ ```
559
+
560
+ If FAILED - be specific. The user should know exactly what to fix and in what order. No vague "consider improving X" - state the problem, the file, and the fix.
561
+
562
+ If PASSED - keep it short. Highlight any non-blocking suggestions, but don't turn a passing review into a lecture.
@@ -0,0 +1,41 @@
1
+ ---
2
+ name: rules
3
+ description: Use when the user asks about LitHermes repo-rule loading, injected project rules, supported rule file locations, matching, or environment configuration.
4
+ ---
5
+
6
+ # Repo Rules
7
+
8
+ Repo Rules is automatic once the LitHermes plugin is enabled. It loads project
9
+ instructions from your repository and surfaces them to the model as context:
10
+
11
+ - static project instructions are injected on `on_session_start` and on `pre_llm_call`
12
+ - file-specific rules are surfaced as guidance the model self-applies whenever it
13
+ edits a matching file (there is no per-edit plugin hook in Hermes; the matching
14
+ rule text is treated as a standing instruction the model checks before/after edits)
15
+
16
+ Loaded rule text is injected as additional context and is deduplicated per session.
17
+ Repo Rules never rewrites tool output; it only adds context.
18
+
19
+ Supported project sources:
20
+
21
+ - `AGENTS.md`
22
+ - `CLAUDE.md`
23
+ - `CONTEXT.md`
24
+ - `.hermes/rules/**/*.md`
25
+ - `.claude/rules/**/*.md`
26
+ - `.cursor/rules/**/*.md`
27
+ - `.github/instructions/**/*.md`
28
+ - `.github/copilot-instructions.md`
29
+
30
+ Supported environment knobs:
31
+
32
+ - `HERMES_RULES_DISABLED=1`
33
+ - `HERMES_RULES_MODE=both|static|dynamic|off`
34
+ - `HERMES_RULES_MAX_RULE_CHARS=<number>`
35
+ - `HERMES_RULES_MAX_RESULT_CHARS=<number>`
36
+ - `HERMES_RULES_ENABLED_SOURCES=AGENTS.md,.hermes/rules`
37
+
38
+ Static mode loads only repo-root instruction files; dynamic mode loads only the
39
+ file-specific rules matched against the current edit target; `both` (the default)
40
+ loads both. State and dedupe bookkeeping live under
41
+ `.hermes/lithermes/`.
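To make the environment knobs concrete, here is a minimal sketch of how they could be read into a config object. It is not the plugin's actual loader. The variable names and the `both` default come from the list above. Everything else is an assumption made for illustration: the `RulesConfig` and `load_rules_config` names, treating `HERMES_RULES_DISABLED=1` as equivalent to `off`, falling back to `both` on an unrecognized mode, and using `None` to mean "no value set".

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

VALID_MODES = ("both", "static", "dynamic", "off")


@dataclass(frozen=True)
class RulesConfig:
    """Hypothetical parsed view of the HERMES_RULES_* variables."""
    mode: str
    max_rule_chars: Optional[int]
    max_result_chars: Optional[int]
    enabled_sources: Optional[Tuple[str, ...]]


def _int_or_none(raw: Optional[str]) -> Optional[int]:
    """Parse an integer knob; unset or non-numeric values become None."""
    try:
        return int(raw) if raw else None
    except ValueError:
        return None


def load_rules_config(env: Mapping[str, str] = os.environ) -> RulesConfig:
    mode = env.get("HERMES_RULES_MODE", "both").strip().lower()
    if mode not in VALID_MODES:
        mode = "both"  # assumption: unknown modes fall back to the default
    if env.get("HERMES_RULES_DISABLED") == "1":
        mode = "off"  # assumption: the disable flag overrides the mode
    raw_sources = env.get("HERMES_RULES_ENABLED_SOURCES", "")
    sources = tuple(s.strip() for s in raw_sources.split(",") if s.strip())
    return RulesConfig(
        mode=mode,
        max_rule_chars=_int_or_none(env.get("HERMES_RULES_MAX_RULE_CHARS")),
        max_result_chars=_int_or_none(env.get("HERMES_RULES_MAX_RESULT_CHARS")),
        enabled_sources=sources or None,
    )
```

Passing a plain dict instead of `os.environ` makes the sketch easy to exercise, for example `load_rules_config({"HERMES_RULES_MODE": "static"})`.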