xtrm-tools 2.4.0 → 2.4.2

Files changed (125)
  1. package/README.md +23 -9
  2. package/cli/dist/index.cjs +774 -240
  3. package/cli/dist/index.cjs.map +1 -1
  4. package/cli/package.json +1 -1
  5. package/config/hooks.json +10 -0
  6. package/config/pi/extensions/core/adapter.ts +2 -14
  7. package/config/pi/extensions/core/guard-rules.ts +70 -0
  8. package/config/pi/extensions/core/session-state.ts +59 -0
  9. package/config/pi/extensions/main-guard.ts +10 -14
  10. package/config/pi/extensions/plan-mode/README.md +65 -0
  11. package/config/pi/extensions/plan-mode/index.ts +340 -0
  12. package/config/pi/extensions/plan-mode/utils.ts +168 -0
  13. package/config/pi/extensions/service-skills.ts +51 -7
  14. package/config/pi/extensions/session-flow.ts +117 -0
  15. package/hooks/beads-claim-sync.mjs +123 -2
  16. package/hooks/beads-compact-restore.mjs +41 -9
  17. package/hooks/beads-compact-save.mjs +36 -5
  18. package/hooks/beads-gate-messages.mjs +27 -1
  19. package/hooks/beads-stop-gate.mjs +58 -8
  20. package/hooks/guard-rules.mjs +86 -0
  21. package/hooks/hooks.json +28 -18
  22. package/hooks/main-guard.mjs +3 -21
  23. package/hooks/quality-check.cjs +1286 -0
  24. package/hooks/quality-check.py +345 -0
  25. package/hooks/session-state.mjs +138 -0
  26. package/package.json +2 -1
  27. package/project-skills/quality-gates/.claude/settings.json +1 -24
  28. package/skills/creating-service-skills/SKILL.md +433 -0
  29. package/skills/creating-service-skills/references/script_quality_standards.md +425 -0
  30. package/skills/creating-service-skills/references/service_skill_system_guide.md +278 -0
  31. package/skills/creating-service-skills/scripts/bootstrap.py +326 -0
  32. package/skills/creating-service-skills/scripts/deep_dive.py +304 -0
  33. package/skills/creating-service-skills/scripts/scaffolder.py +482 -0
  34. package/skills/scoping-service-skills/SKILL.md +231 -0
  35. package/skills/scoping-service-skills/scripts/scope.py +74 -0
  36. package/skills/sync-docs/SKILL.md +235 -0
  37. package/skills/sync-docs/evals/evals.json +89 -0
  38. package/skills/sync-docs/references/doc-structure.md +104 -0
  39. package/skills/sync-docs/references/schema.md +103 -0
  40. package/skills/sync-docs/scripts/context_gatherer.py +246 -0
  41. package/skills/sync-docs/scripts/doc_structure_analyzer.py +495 -0
  42. package/skills/sync-docs/scripts/validate_doc.py +365 -0
  43. package/skills/sync-docs-workspace/iteration-1/benchmark.json +293 -0
  44. package/skills/sync-docs-workspace/iteration-1/benchmark.md +13 -0
  45. package/skills/sync-docs-workspace/iteration-1/eval-doc-audit/eval_metadata.json +27 -0
  46. package/skills/sync-docs-workspace/iteration-1/eval-doc-audit/with_skill/outputs/result.md +210 -0
  47. package/skills/sync-docs-workspace/iteration-1/eval-doc-audit/with_skill/run-1/grading.json +28 -0
  48. package/skills/sync-docs-workspace/iteration-1/eval-doc-audit/with_skill/run-1/timing.json +1 -0
  49. package/skills/sync-docs-workspace/iteration-1/eval-doc-audit/without_skill/outputs/result.md +101 -0
  50. package/skills/sync-docs-workspace/iteration-1/eval-doc-audit/without_skill/run-1/grading.json +28 -0
  51. package/skills/sync-docs-workspace/iteration-1/eval-doc-audit/without_skill/run-1/timing.json +5 -0
  52. package/skills/sync-docs-workspace/iteration-1/eval-doc-audit/without_skill/timing.json +5 -0
  53. package/skills/sync-docs-workspace/iteration-1/eval-fix-mode/eval_metadata.json +27 -0
  54. package/skills/sync-docs-workspace/iteration-1/eval-fix-mode/with_skill/outputs/result.md +198 -0
  55. package/skills/sync-docs-workspace/iteration-1/eval-fix-mode/with_skill/run-1/grading.json +28 -0
  56. package/skills/sync-docs-workspace/iteration-1/eval-fix-mode/with_skill/run-1/timing.json +1 -0
  57. package/skills/sync-docs-workspace/iteration-1/eval-fix-mode/without_skill/outputs/result.md +94 -0
  58. package/skills/sync-docs-workspace/iteration-1/eval-fix-mode/without_skill/run-1/grading.json +28 -0
  59. package/skills/sync-docs-workspace/iteration-1/eval-fix-mode/without_skill/run-1/timing.json +1 -0
  60. package/skills/sync-docs-workspace/iteration-1/eval-sprint-closeout/eval_metadata.json +27 -0
  61. package/skills/sync-docs-workspace/iteration-1/eval-sprint-closeout/with_skill/outputs/result.md +237 -0
  62. package/skills/sync-docs-workspace/iteration-1/eval-sprint-closeout/with_skill/run-1/grading.json +28 -0
  63. package/skills/sync-docs-workspace/iteration-1/eval-sprint-closeout/with_skill/run-1/timing.json +1 -0
  64. package/skills/sync-docs-workspace/iteration-1/eval-sprint-closeout/without_skill/outputs/result.md +134 -0
  65. package/skills/sync-docs-workspace/iteration-1/eval-sprint-closeout/without_skill/run-1/grading.json +28 -0
  66. package/skills/sync-docs-workspace/iteration-1/eval-sprint-closeout/without_skill/run-1/timing.json +1 -0
  67. package/skills/sync-docs-workspace/iteration-2/benchmark.json +297 -0
  68. package/skills/sync-docs-workspace/iteration-2/benchmark.md +13 -0
  69. package/skills/sync-docs-workspace/iteration-2/eval-doc-audit/eval_metadata.json +27 -0
  70. package/skills/sync-docs-workspace/iteration-2/eval-doc-audit/with_skill/outputs/result.md +137 -0
  71. package/skills/sync-docs-workspace/iteration-2/eval-doc-audit/with_skill/run-1/grading.json +92 -0
  72. package/skills/sync-docs-workspace/iteration-2/eval-doc-audit/with_skill/run-1/timing.json +1 -0
  73. package/skills/sync-docs-workspace/iteration-2/eval-doc-audit/without_skill/outputs/result.md +134 -0
  74. package/skills/sync-docs-workspace/iteration-2/eval-doc-audit/without_skill/run-1/grading.json +86 -0
  75. package/skills/sync-docs-workspace/iteration-2/eval-doc-audit/without_skill/run-1/timing.json +1 -0
  76. package/skills/sync-docs-workspace/iteration-2/eval-fix-mode/eval_metadata.json +27 -0
  77. package/skills/sync-docs-workspace/iteration-2/eval-fix-mode/with_skill/outputs/result.md +193 -0
  78. package/skills/sync-docs-workspace/iteration-2/eval-fix-mode/with_skill/run-1/grading.json +72 -0
  79. package/skills/sync-docs-workspace/iteration-2/eval-fix-mode/with_skill/run-1/timing.json +1 -0
  80. package/skills/sync-docs-workspace/iteration-2/eval-fix-mode/without_skill/outputs/result.md +211 -0
  81. package/skills/sync-docs-workspace/iteration-2/eval-fix-mode/without_skill/run-1/grading.json +91 -0
  82. package/skills/sync-docs-workspace/iteration-2/eval-fix-mode/without_skill/run-1/timing.json +5 -0
  83. package/skills/sync-docs-workspace/iteration-2/eval-sprint-closeout/eval_metadata.json +27 -0
  84. package/skills/sync-docs-workspace/iteration-2/eval-sprint-closeout/with_skill/outputs/result.md +182 -0
  85. package/skills/sync-docs-workspace/iteration-2/eval-sprint-closeout/with_skill/run-1/grading.json +95 -0
  86. package/skills/sync-docs-workspace/iteration-2/eval-sprint-closeout/with_skill/run-1/timing.json +1 -0
  87. package/skills/sync-docs-workspace/iteration-2/eval-sprint-closeout/without_skill/outputs/result.md +222 -0
  88. package/skills/sync-docs-workspace/iteration-2/eval-sprint-closeout/without_skill/run-1/grading.json +88 -0
  89. package/skills/sync-docs-workspace/iteration-2/eval-sprint-closeout/without_skill/run-1/timing.json +5 -0
  90. package/skills/sync-docs-workspace/iteration-3/benchmark.json +298 -0
  91. package/skills/sync-docs-workspace/iteration-3/benchmark.md +13 -0
  92. package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/eval_metadata.json +27 -0
  93. package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/with_skill/outputs/result.md +125 -0
  94. package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/with_skill/run-1/grading.json +97 -0
  95. package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/with_skill/run-1/timing.json +5 -0
  96. package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/without_skill/outputs/result.md +144 -0
  97. package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/without_skill/run-1/grading.json +78 -0
  98. package/skills/sync-docs-workspace/iteration-3/eval-doc-audit/without_skill/run-1/timing.json +5 -0
  99. package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/eval_metadata.json +27 -0
  100. package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/with_skill/outputs/result.md +104 -0
  101. package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/with_skill/run-1/grading.json +91 -0
  102. package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/with_skill/run-1/timing.json +5 -0
  103. package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/without_skill/outputs/result.md +79 -0
  104. package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/without_skill/run-1/grading.json +82 -0
  105. package/skills/sync-docs-workspace/iteration-3/eval-fix-mode/without_skill/run-1/timing.json +5 -0
  106. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/eval_metadata.json +27 -0
  107. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/with_skill/outputs/phase1_context.json +302 -0
  108. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/with_skill/outputs/phase2_drift.txt +33 -0
  109. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/with_skill/outputs/phase3_analysis.json +114 -0
  110. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/with_skill/outputs/phase4_fix.txt +118 -0
  111. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/with_skill/outputs/phase5_validate.txt +38 -0
  112. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/with_skill/outputs/result.md +158 -0
  113. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/with_skill/run-1/grading.json +95 -0
  114. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/with_skill/run-1/timing.json +5 -0
  115. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/without_skill/outputs/result.md +71 -0
  116. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/without_skill/run-1/grading.json +90 -0
  117. package/skills/sync-docs-workspace/iteration-3/eval-sprint-closeout/without_skill/run-1/timing.json +5 -0
  118. package/skills/updating-service-skills/SKILL.md +136 -0
  119. package/skills/updating-service-skills/scripts/drift_detector.py +222 -0
  120. package/skills/using-quality-gates/SKILL.md +254 -0
  121. package/skills/using-service-skills/SKILL.md +108 -0
  122. package/skills/using-service-skills/scripts/cataloger.py +74 -0
  123. package/skills/using-service-skills/scripts/skill_activator.py +152 -0
  124. package/skills/using-service-skills/scripts/test_skill_activator.py +58 -0
  125. package/skills/using-xtrm/SKILL.md +34 -38
@@ -0,0 +1,425 @@
1
+ # Script Quality Standards for Service Skills
2
+
3
+ > Distilled from the mercury-market-data implementation (Feb 2026).
4
+ > Updated with lessons from the processing-papers implementation (Feb 2026).
5
+ > Apply these standards to every script generated in Phase 2 of the service-skill-builder workflow.
6
+
7
+ ---
8
+
9
+ ## Table of Contents
10
+
11
+ - [Mandatory DB Connection Pattern](#mandatory-db-connection-pattern)
12
+ - [Schema Verification Before Writing Any SQL](#schema-verification-before-writing-any-sql)
13
+ - [Makefile Standard](#makefile-standard)
14
+ - [Design Principles](#design-principles)
15
+ - [health_probe.py Standards](#health_probepy-standards)
16
+ - [log_hunter.py Standards](#log_hunterpy-standards)
17
+ - [Specialist Script Standards](#specialist-script-standards)
18
+ - [Common Pitfalls](#common-pitfalls)
19
+
20
+ ---
21
+
22
+ ## Mandatory DB Connection Pattern
23
+
24
+ **Every script that touches the database MUST use this exact pattern.** No exceptions.
25
+
26
+ ```python
27
+ #!/usr/bin/env python3
28
+ import sys
29
+ from pathlib import Path
30
+ from dotenv import load_dotenv
31
+
32
+ # Resolve project root (depth depends on script location within .claude/skills/)
33
+ project_root = Path(__file__).resolve().parent.parent.parent.parent.parent
34
+ env_file = project_root / ".env"
35
+ if env_file.exists():
36
+     load_dotenv(str(env_file))
37
+
38
+ sys.path.insert(0, str(project_root))
39
+ from shared.db_pool_manager import execute_db_query
40
+ ```
41
+
42
+ **Why:** System `python3` may lack `dotenv` and project deps. Always test with `venv/bin/python3`.
43
+ **Never:** Raw `psycopg2`, hardcoded DSN strings, or skipping the `load_dotenv` call.
44
+
45
+ ---
46
+
47
+ ## Schema Verification Before Writing Any SQL
48
+
49
+ Run these queries against the live DB **before** writing any script SQL. Paste the results
50
+ into the delegation prompt so the agent never guesses column or table names.
51
+
52
+ ```sql
53
+ -- Step 1: Confirm which tables actually exist
54
+ SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY tablename;
55
+
56
+ -- Step 2: For each output table — get exact column names and types
57
+ SELECT column_name, data_type, is_nullable
58
+ FROM information_schema.columns
59
+ WHERE table_name = '<your_table>'
60
+ ORDER BY ordinal_position;
61
+ ```
62
+
63
+ **Critical rule:** If a table has no timestamp column, use `COUNT(*)` for freshness checks —
64
+ never guess a column name. Verify with `information_schema.columns` first.
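A minimal sketch of the rule above, assuming the `(column_name, data_type)` rows from the Step 2 query are already in hand. The helper name `build_freshness_query` is hypothetical and not part of the package; the two type strings are the values PostgreSQL reports in `information_schema.columns` for timestamp columns.

```python
from typing import Sequence, Tuple

# data_type values PostgreSQL reports for timestamp columns.
TIMESTAMP_TYPES = {
    "timestamp without time zone",
    "timestamp with time zone",
}


def build_freshness_query(table: str, columns: Sequence[Tuple[str, str]]) -> str:
    """Hypothetical helper: choose a freshness query from verified columns.

    `table` and `columns` must come from the pg_tables and
    information_schema.columns queries, so no name is ever guessed.
    """
    for name, data_type in columns:
        if data_type in TIMESTAMP_TYPES:
            # A real timestamp column exists: report the newest row.
            return f'SELECT MAX("{name}") FROM "{table}"'
    # No timestamp column: fall back to a row count.
    return f'SELECT COUNT(*) FROM "{table}"'
```

The function only reads names that the schema queries returned, which is the point of the rule: the fallback to `COUNT(*)` is chosen by inspection, not by assumption.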
65
+
66
+ ---
67
+
68
+ ## Makefile Standard
69
+
70
+ Every `scripts/` directory MUST contain a `Makefile` with these standard targets.
71
+ This is auto-generated by the scaffolder in Phase 1 and should be updated in Phase 2
72
+ to add any service-specific targets.
73
+
74
+ ```makefile
75
+ # Skill diagnostic scripts for <service-id>
76
+ # Usage: make <target> (from this directory)
77
+ # Override python: make health PYTHON=../../venv/bin/python3
78
+
79
+ PYTHON := python3
80
+
81
+ .PHONY: health health-json data data-json logs errors db help
82
+
83
+ help:
84
+ 	@echo "Available targets:"
85
+ 	@echo " health - Run health probe (human readable)"
86
+ 	@echo " health-json - Run health probe (JSON output)"
87
+ 	@echo " data - Show latest DB records"
88
+ 	@echo " data-json - Show latest DB records (JSON, limit 5)"
89
+ 	@echo " logs - Tail and analyze recent logs"
90
+ 	@echo " errors - Show errors/criticals only"
91
+ 	@echo " db - Run DB helper example queries"
92
+
93
+ health:
94
+ 	$(PYTHON) health_probe.py
95
+
96
+ health-json:
97
+ 	$(PYTHON) health_probe.py --json
98
+
99
+ data:
100
+ 	$(PYTHON) data_explorer.py
101
+
102
+ data-json:
103
+ 	$(PYTHON) data_explorer.py --json --limit 5
104
+
105
+ logs:
106
+ 	$(PYTHON) log_hunter.py --tail 50
107
+
108
+ errors:
109
+ 	$(PYTHON) log_hunter.py --errors-only --tail 50
110
+
111
+ db:
112
+ 	$(PYTHON) db_helper.py
113
+ ```
114
+
115
+ ---
116
+
117
+ ## Design Principles
118
+
119
+ ### 1. Service-Specific, Not Generic
120
+
121
+ The single most important rule. Generic scripts provide zero value.
122
+
123
+ **Wrong (generic stub output):**
124
+ ```python
125
+ error_patterns = [
126
+ r"(ERROR|CRITICAL|FATAL|EXCEPTION)",
127
+ r"ConnectionError",
128
+ r"SyntaxError",
129
+ r"ImportError",
130
+ ]
131
+ ```
132
+
133
+ **Right (service-specific, sourced from actual codebase):**
134
+ ```python
135
+ PATTERNS = [
136
+ # From yfinance source: actual exception class names
137
+ ("Rate limit", r"YFRateLimitError|429.*yahoo|Too Many Requests", "error"),
138
+ ("Missing data", r"YFPricesMissingError|no timezone found|Period.*invalid", "warning"),
139
+ # From DB layer: actual psycopg2 messages
140
+ ("DB connect", r"could not connect|password authentication failed", "critical"),
141
+ ("DB write", r"relation.*does not exist|column.*does not exist", "error"),
142
+ ]
143
+ ```
144
+
145
+ **How to find real patterns:** Read the service's entry point script, exception handlers, and log statements. Search for `logger.error`, `raise`, `except` blocks, and error message strings.
146
+
147
+ ---
148
+
149
+ ### 2. Port Awareness: Host vs. Container
150
+
151
+ Scripts in `skills/` run on the host machine, not inside Docker. Port mappings matter.
152
+
153
+ | Context | Use This Port |
154
+ |---------|--------------|
155
+ | Host scripts (`skills/*.py`) | External mapped port (e.g., `5433` for TimescaleDB `5433:5432`) |
156
+ | Docker service env vars | Internal port (`5432`) |
157
+ | `docker exec` commands | N/A — resolves via container DNS |
158
+
159
+ **Always use env vars with correct defaults:**
160
+ ```python
161
+ DB_HOST = os.getenv("DB_HOST", "localhost")
162
+ DB_PORT = int(os.getenv("DB_PORT", "5433")) # External mapped port
163
+ ```
164
+
165
+ ---
166
+
167
+ ### 3. Read-Before-Write Discipline
168
+
169
+ When a stub file already exists and you are replacing it, **always read it first**. Write tools fail with "File has not been read yet" otherwise. New files (no existing content) can be created directly.
170
+
171
+ ---
172
+
173
+ ### 4. Dual Output Mode
174
+
175
+ Every script must support both human-readable (default) and machine-readable (`--json`) output.
176
+
177
+ ```python
178
+ parser.add_argument("--json", action="store_true", help="Output as JSON")
179
+
180
+ if args.json:
181
+     print(json.dumps(result, indent=2, default=str))
182
+     return
183
+ # ... human-readable output below
184
+ ```
185
+
186
+ ---
187
+
188
+ ### 5. Actionable Remediation in Output
189
+
190
+ When a health probe or log hunter detects a critical problem, it must print the exact fix command — not a generic "check the logs."
191
+
192
+ ```python
193
+ if by_sev["critical"]:
194
+     print(f"\n ⚠ Critical issues detected.")
195
+     if any("OAuth expired" in h["labels"] for h in by_sev["critical"]):
196
+         print(f" Fix: docker exec -it {CONTAINER} python scripts/auth.py --refresh")
197
+     if any("DB connect" in h["labels"] for h in by_sev["critical"]):
198
+         print(f" Fix: docker compose restart timescaledb && docker compose restart {CONTAINER}")
199
+ ```
200
+
201
+ ---
202
+
203
+ ## health_probe.py Standards
204
+
205
+ ### Structure
206
+
207
+ ```python
208
+ CONTAINER = "service-name" # Exact Docker container name
209
+
210
+ def check_container() -> dict:
211
+     """docker inspect for status. Always present."""
212
+     ...
213
+
214
+ def check_<domain>() -> dict:
215
+     """Service-specific check (DB freshness, file presence, HTTP endpoint, etc.)"""
216
+     ...
217
+
218
+ def main():
219
+     # 1. Collect all checks
220
+     # 2. --json: dump report dict
221
+     # 3. Human: print formatted table
222
+     # 4. Print fix commands on failure
223
+ ```
224
+
225
+ ### For DB-writing services: Freshness Table
226
+
227
+ ```python
228
+ # Define per-table stale thresholds based on service update frequency
229
+ FRESHNESS_CHECKS = [
230
+ # (table_name, timestamp_col, stale_threshold_minutes)
231
+ ("candles_5m", "timestamp", 30), # 5m feed → stale if >30m old
232
+ ("outright_snapshots", "snapshot_ts", 10), # continuous → stale if >10m old
233
+ ("volatility_snapshots","snapshot_ts", 1500), # daily job → stale if >25h old
234
+ ]
235
+ ```
236
+
237
+ Stale threshold logic: `update_interval × 3` is a reasonable default, but adjust for business criticality.
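As a rough illustration of the threshold rule, the sketch below derives a default from the update interval and compares it with the age of the newest row. Both function names are hypothetical; the factor of 3 is the default quoted above, and the table above shows it being overridden (a 5m feed given 30 minutes rather than 15).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_threshold_minutes(update_interval_minutes: float, factor: float = 3.0) -> float:
    """Default threshold: update interval times 3 (hypothetical helper)."""
    return update_interval_minutes * factor


def is_stale(latest: datetime, threshold_minutes: float,
             now: Optional[datetime] = None) -> bool:
    """True when the newest row is older than the threshold.

    `latest` is assumed to be a timezone-aware timestamp read from the
    table's verified timestamp column.
    """
    now = now or datetime.now(timezone.utc)
    return now - latest > timedelta(minutes=threshold_minutes)
```

Passing `now` explicitly keeps the check deterministic in tests; a probe would normally omit it.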
238
+
239
+ ### For HTTP API services: Endpoint Probing
240
+
241
+ Do not ping generic ports. Probe the actual API routes the service exposes:
242
+
243
+ ```python
244
+ HEALTH_ENDPOINTS = [
245
+ ("FastAPI health", "http://localhost:8000/api/system/health", 3),
246
+ ("Background server", "http://localhost:5002/health", 2),
247
+ ]
248
+ # Optional smoke tests against real data endpoints
249
+ SMOKE_ENDPOINTS = [
250
+ ("Market snapshot", "http://localhost:8000/api/market/snapshot"),
251
+ ("Volatility data", "http://localhost:8000/api/analytics/volatility"),
252
+ ]
253
+ ```
254
+
255
+ ### For one-shot services (migrations, backfills): Exit Code
256
+
257
+ ```python
258
+ # docker inspect returns status="exited" and exit_code="0" on success
259
+ result = subprocess.run(
260
+ ["docker", "inspect", "--format",
261
+ "{{.State.Status}} {{.State.ExitCode}} {{.State.FinishedAt}}",
262
+ CONTAINER],
263
+ capture_output=True, text=True
264
+ )
265
+ ```
266
+
267
+ A one-shot service is healthy if `exit_code == "0"` and expected tables/schemas exist in the DB.
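One possible way to turn the `docker inspect` output into a verdict is sketched below. The function name `parse_inspect_output` and the returned keys are assumptions for illustration; the sketch covers only the exit-code half of the rule, so the table and schema existence check still has to be run separately against the DB.

```python
def parse_inspect_output(stdout: str) -> dict:
    """Parse '<status> <exit_code> <finished_at>' from docker inspect.

    Hypothetical helper matching the --format string
    '{{.State.Status}} {{.State.ExitCode}} {{.State.FinishedAt}}'.
    """
    parts = stdout.split()
    if len(parts) != 3:
        # Empty or unexpected output, e.g. the container does not exist.
        return {"status": "unknown", "exit_code": None,
                "finished_at": None, "completed_ok": False}
    status, exit_code, finished_at = parts
    return {
        "status": status,
        "exit_code": exit_code,  # kept as a string, as docker prints it
        "finished_at": finished_at,
        "completed_ok": status == "exited" and exit_code == "0",
    }
```

Feeding it `result.stdout` from the `subprocess.run` call above would give the probe a dict it can dump directly in `--json` mode.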
268
+
269
+ ### For file watcher services: Mount + State File
270
+
271
+ ```python
272
+ def check_scid_mount() -> dict:
273
+     result = subprocess.run(
274
+         ["docker", "exec", CONTAINER, "ls", "/data/scid"],
275
+         capture_output=True, text=True, timeout=10
276
+     )
277
+     files = [f for f in result.stdout.splitlines() if f.endswith(".scid")]
278
+     return {"accessible": result.returncode == 0, "file_count": len(files)}
279
+
280
+ def check_state_file() -> dict:
281
+     result = subprocess.run(
282
+         ["docker", "exec", CONTAINER, "cat", "/app/state/watcher_state.json"],
283
+         capture_output=True, text=True
284
+     )
285
+     if result.returncode != 0:
286
+         return {"present": False}
287
+     return {"present": True, "state": json.loads(result.stdout)}
288
+ ```
289
+
290
+ ---
291
+
292
+ ## log_hunter.py Standards
293
+
294
+ ### Pattern Structure
295
+
296
+ ```python
297
+ PATTERNS = [
298
+ # (label, regex_pattern, severity)
299
+ ("OAuth expired", r"invalid_grant|token.*expired", "critical"),
300
+ ("PDF parse", r"PdfReadError|pdf.*format.*changed", "error"),
301
+ ("No data", r"No new.*report|0 reports.*found", "warning"),
302
+ ("Report saved", r"report.*ingested|saved.*DB", "info"),
303
+ ]
304
+ ```
305
+
306
+ **Severity levels:**
307
+ - `critical`: Service needs restart or manual intervention to recover
308
+ - `error`: Functionality is impaired, data may be incomplete
309
+ - `warning`: Degraded state, worth monitoring
310
+ - `info`: Normal operation confirmation
311
+
312
+ ### Severity Ordering
313
+
314
+ Always use `sev_order` so that the highest severity "wins" when a line matches multiple patterns:
315
+
316
+ ```python
317
+ sev_order = {"critical": 0, "error": 1, "warning": 2, "info": 3}
318
+ if matched_severity is None or sev_order[severity] < sev_order[matched_severity]:
319
+     matched_severity = severity
320
+ ```
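To show where that comparison sits, here is a small self-contained sketch of a per-line classifier. The function name `classify_line` and the shape of the returned dict are assumptions (the `labels` key mirrors the `h["labels"]` lookups in the remediation example); the patterns are a subset of the ones listed above.

```python
import re

PATTERNS = [
    # (label, regex_pattern, severity)
    ("OAuth expired", r"invalid_grant|token.*expired", "critical"),
    ("No data", r"No new.*report|0 reports.*found", "warning"),
    ("Report saved", r"report.*ingested|saved.*DB", "info"),
]
SEV_ORDER = {"critical": 0, "error": 1, "warning": 2, "info": 3}

# Compile once, case-insensitively.
COMPILED = [(label, re.compile(rx, re.IGNORECASE), sev) for label, rx, sev in PATTERNS]


def classify_line(line: str):
    """Return labels plus the highest severity matched, or None if no match."""
    labels = []
    matched_severity = None
    for label, regex, severity in COMPILED:
        if regex.search(line):
            labels.append(label)
            # Lower rank means more severe, so the most severe match wins.
            if matched_severity is None or SEV_ORDER[severity] < SEV_ORDER[matched_severity]:
                matched_severity = severity
    if matched_severity is None:
        return None
    return {"line": line, "labels": labels, "severity": matched_severity}
```

A line that matches both a `warning` and a `critical` pattern is reported once, as `critical`, with both labels kept for the remediation step.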
321
+
322
+ ### Required CLI Flags
323
+
324
+ ```python
325
+ parser.add_argument("--tail", type=int, default=200)
326
+ parser.add_argument("--since", type=str, default=None) # Docker --since (e.g. "1h", "2026-01-01")
327
+ parser.add_argument("--errors-only", action="store_true") # Skip info entries
328
+ parser.add_argument("--json", action="store_true")
329
+ ```
330
+
331
+ ### Pattern Design Rules
332
+
333
+ 1. Test patterns against the **actual log format** of the service, not hypothetical messages.
334
+ 2. Use `re.IGNORECASE` — log levels and messages vary in capitalization.
335
+ 3. Prefer specific class names (`YFPricesMissingError`) over generic keywords (`Error`).
336
+ 4. For Rust services, add: `r"thread '.*' panicked|panicked at '"` as a critical pattern.
337
+ 5. Always include at least 2 `info` patterns for normal operation confirmation — so the absence of info lines itself becomes a signal.
338
+
339
+ ### Anti-patterns to avoid
340
+
341
+ | Anti-pattern | Why It Fails |
342
+ |---|---|
343
+ | `r"ERROR"` | Matches any line containing the word, including quoted text and identifiers, which yields many false positives |
344
+ | `r"Exception"` | Too broad: matches every traceback and any message that mentions an exception, including ones the service handled and recovered from |
345
+ | `r"ConnectionError"` | Only catches one subclass; misses `OperationalError`, `InterfaceError`, etc. |
346
+ | Single `error_patterns` list without severity | Provides no triage — everything looks equally bad |
347
+
348
+ ---
349
+
350
+ ## Specialist Script Standards
351
+
352
+ ### data_explorer.py (for DB-writing services)
353
+
354
+ Purpose: Let an agent query the service's output tables interactively without writing SQL.
355
+
356
+ ```python
357
+ # Always support:
358
+ parser.add_argument("--symbol", help="Filter to a specific symbol")
359
+ parser.add_argument("--history", action="store_true", help="Show time series, not just latest")
360
+ parser.add_argument("--limit", type=int, default=20)
361
+ parser.add_argument("--json", action="store_true")
362
+ ```
363
+
364
+ Use `DISTINCT ON (symbol) ... ORDER BY symbol, timestamp DESC` for "latest per symbol" queries. Use parameterized queries: `WHERE symbol = %s`.
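A sketch of how the two rules combine, assuming the output table has `symbol` and `timestamp` columns as in the sentence above. `build_latest_query` and `ALLOWED_TABLES` are hypothetical names; the table allowlist stands in for whatever list the schema verification step produced. The call signature of `execute_db_query` is not shown in this document, so the sketch stops at building the SQL and parameter tuple.

```python
# Hypothetical allowlist: fill it from the verified pg_tables output.
ALLOWED_TABLES = {"candles_5m"}


def build_latest_query(table, symbol=None, limit=20):
    """Build a 'latest row per symbol' query with bound parameters."""
    if table not in ALLOWED_TABLES:
        # Identifiers cannot be bound as parameters, so validate them instead.
        raise ValueError(f"unknown table: {table!r}")
    sql = f"SELECT DISTINCT ON (symbol) * FROM {table}"
    params = []
    if symbol is not None:
        sql += " WHERE symbol = %s"  # value is bound, never interpolated
        params.append(symbol)
    sql += " ORDER BY symbol, timestamp DESC LIMIT %s"
    params.append(limit)
    return sql, tuple(params)
```

Only the table name is interpolated, and only after it has been checked against the allowlist; every user-supplied value travels as a parameter.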
365
+
366
+ ### endpoint_tester.py (for HTTP API services)
367
+
368
+ Test every real route in the API, not just `/health`. Measure response time and size:
369
+
370
+ ```python
371
+ ENDPOINTS = [
372
+ # (label, method, path, expected_status, timeout_s)
373
+ ("Health check", "GET", "/api/system/health", 200, 3),
374
+ ("Market overview", "GET", "/api/market/overview", 200, 5),
375
+ ("Symbol detail", "GET", "/api/market/ES=F", 200, 5),
376
+ # ... all actual routes
377
+ ]
378
+ ```
379
+
380
+ Report slow endpoints (>2s) separately from failed ones.
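The split between slow and failed endpoints could look like the sketch below. The result-dict keys (`label`, `status`, `expected_status`, `elapsed_s`) and the name `partition_results` are assumptions for illustration; `status` is taken to be `None` when the request itself failed.

```python
SLOW_THRESHOLD_S = 2.0  # threshold quoted above


def partition_results(results):
    """Separate failed endpoints from ones that succeeded but were slow."""
    failed = [r for r in results if r["status"] != r["expected_status"]]
    slow = [
        r for r in results
        if r["status"] == r["expected_status"] and r["elapsed_s"] > SLOW_THRESHOLD_S
    ]
    return {"failed": failed, "slow": slow}
```

Keeping the two lists disjoint means a timed-out request is reported once, as a failure, rather than also appearing as slow.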
381
+
382
+ ### state_inspector.py (for stateful file watchers)
383
+
384
+ Read the state file via `docker exec` and compute lag between current file size and processed byte offset:
385
+
386
+ ```python
387
+ scid_size = get_file_size_in_container(container, filepath)
388
+ lag_bytes = scid_size - state["byte_offset"]
389
+ lag_flag = " ⚠" if lag_bytes > 1_000_000 else ""
390
+ ```
391
+
392
+ ### coverage_checker.py (for one-shot backfill services)
393
+
394
+ Report per-entity (spread, symbol, etc.) row counts, date ranges, and gaps:
395
+
396
+ ```sql
397
+ SELECT entity_id, COUNT(*) AS rows,
398
+ MIN(ts) AS earliest, MAX(ts) AS latest
399
+ FROM output_table
400
+ GROUP BY entity_id ORDER BY entity_id;
401
+ ```
402
+
403
+ Also detect missing entities against a known expected list, and find time-series gaps using `LAG()`.
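The sketch below illustrates both checks with `LAG()`. To stay runnable without a database server it uses an in-memory SQLite connection and integer timestamps (minutes), which is an assumption made for the example only; against PostgreSQL the same window function applies, but the placeholder would be `%s` and the gap would be compared as an interval. `find_gaps` and `find_missing` are hypothetical names.

```python
import sqlite3

# LAG() pairs each row with the previous timestamp for the same entity.
GAP_QUERY = """
SELECT entity_id, prev_ts, ts, ts - prev_ts AS gap
FROM (
    SELECT entity_id, ts,
           LAG(ts) OVER (PARTITION BY entity_id ORDER BY ts) AS prev_ts
    FROM output_table
) AS ordered
WHERE prev_ts IS NOT NULL AND ts - prev_ts > ?
ORDER BY entity_id, ts
"""


def find_gaps(conn, max_gap):
    """Return (entity_id, prev_ts, ts, gap) rows where the gap exceeds max_gap."""
    return conn.execute(GAP_QUERY, (max_gap,)).fetchall()


def find_missing(conn, expected):
    """Return expected entity ids that have no rows at all."""
    present = {row[0] for row in conn.execute("SELECT DISTINCT entity_id FROM output_table")}
    return sorted(set(expected) - present)
```

The expected-entity list has to come from the service's own configuration; it is passed in here so the sketch does not guess it.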
404
+
405
+ ---
406
+
407
+ ## Common Pitfalls
408
+
409
+ | Pitfall | Prevention |
410
+ |---------|-----------|
411
+ | Script uses port 5432 from host | Default to 5433 (external mapped port); document the discrepancy |
412
+ | Script uses HTTP port scanning instead of real routes | Read docker-compose to find actual port mappings; check the service's API routes |
413
+ | OAuth token path is wrong | `docker exec container ls /expected/path` to verify before hardcoding |
414
+ | **DB table name is guessed** | Run `SELECT tablename FROM pg_tables WHERE schemaname='public'` first; include output in delegation prompt |
415
+ | **DB column name is guessed** | Run `SELECT column_name, data_type FROM information_schema.columns WHERE table_name='X'` per table; include in delegation prompt |
416
+ | **Assumed timestamp column on every table** | Check `information_schema.columns` — if no timestamp exists, use `COUNT(*)` for freshness; never guess |
417
+ | `try` block with no matching `except` | Every DB call needs a complete `try/except`; a `try` with no `except` or `finally` is a `SyntaxError`, so the script fails before any check runs |
418
+ | Function renamed but call sites not updated | After any rename, grep the scripts dir for the old name before finishing |
419
+ | Delegation with no `-y` flag (Qwen) | Qwen requires `-y` for non-interactive file writes; without it, research happens but no files are written |
420
+ | Using `ccs gemini` instead of `gemini -p` | Gemini: `gemini -p "..."` · Qwen: `qwen -y "..."` · GLM: `env -u CLAUDECODE ccs glm -p "..."` |
421
+ | Scripts tested with system python3 | Always test with `venv/bin/python3`; system python may lack dotenv and other deps |
422
+ | Log patterns too broad | Read the actual `logger.error()` calls in the source code |
423
+ | Missing `--since` flag | Log hunters without `--since` can't be used for incremental monitoring |
424
+ | `health_probe.py` doesn't print fix commands | Always add actionable remediation text after detecting critical states |
425
+ | No `scripts/Makefile` | Every skill must have a Makefile with standard targets; scaffolder generates it in Phase 1 |
@@ -0,0 +1,278 @@
1
+ # Service Skill System: Architecture & Operations Guide
2
+
3
+ > Distilled from real-world Docker microservices projects.
4
+ > This guide is project-agnostic — adapt all examples to your stack.
5
+
6
+ ---
7
+
8
+ ## Table of Contents
9
+
10
+ - [1. System Overview](#1-system-overview)
11
+ - [2. System Architecture](#2-system-architecture)
12
+ - [3. Mandatory Two-Phase Workflow](#3-mandatory-two-phase-workflow)
13
+ - [4. Service Type Classification](#4-service-type-classification)
14
+ - [5. Directory Structure](#5-directory-structure)
15
+ - [6. Skill Lifecycle](#6-skill-lifecycle)
16
+ - [7. Quality Gates](#7-quality-gates)
17
+ - [8. Best Practices](#8-best-practices)
18
+ - [9. Anti-Patterns](#9-anti-patterns)
19
+
20
+ ---
21
+
22
+ ## 1. System Overview
23
+
24
+ The **Service Skill System** transforms an AI agent from a generic assistant into a service-aware operator. Each Docker service in your project gets a dedicated **skill package**: a structured combination of operational documentation and executable diagnostic scripts.
25
+
26
+ ### What a Skill Provides
27
+
28
+ | Layer | Contents | Purpose |
29
+ |-------|----------|---------|
30
+ | `SKILL.md` | Operational manual | How the service works, how to debug it |
31
+ | `scripts/health_probe.py` | Container + data freshness checks | Is the service healthy right now? |
32
+ | `scripts/log_hunter.py` | Pattern-based log analysis | What is the service logging and why? |
33
+ | `scripts/<specialist>.py` | Service-specific inspector | What state does this service hold? |
34
+
35
+ Without scripts, a skill is documentation only. Without documentation, scripts have no context. Both are required.
36
+
37
+ ---
38
+
39
+ ## 2. System Architecture
40
+
41
+ ### Three Components
42
+
43
+ **A. The Builder (`service-skill-builder`)**
44
+ The meta-skill that generates other skills.
45
+ - **Input**: `docker-compose*.yml`, Dockerfiles, entry-point source code
46
+ - **Engine**: `scripts/main.py` (Phase 1 skeleton generator)
47
+ - **Output**: `SKILL.md`, `REFINEMENT_BRIEF.md`, stub scripts → then replaced in Phase 2
48
+
49
+ **B. The Health Checker (`scripts/skill_health_check.py`)**
50
+ Detects drift between skills and the live codebase.
51
+ - Compares service modification timestamps vs. skill generation timestamps
52
+ - Identifies services with no skill (coverage gaps)
53
+ - Reports stale skills needing a re-dive
54
+
55
+ **C. The Generated Skills**
56
+ Individual packages per service (e.g., `.claude/skills/my-service/`).
57
+
58
+ ---
59
+
60
+ ## 3. Mandatory Two-Phase Workflow
61
+
62
+ **Phase 1 and Phase 2 are both required. The skeleton alone is never sufficient.**
63
+
64
+ ### Phase 1: Automated Skeleton
65
+
66
+ Run the generator against your project root:
67
+
68
+ ```bash
69
+ # Discover all Docker services
70
+ python3 .claude/skills/service-skill-builder/scripts/main.py --scan
71
+
72
+ # Generate skeleton for one service
73
+ python3 .claude/skills/service-skill-builder/scripts/main.py <service-name>
74
+ ```
75
+
76
+ The skeleton provides:
77
+ - Structural facts: port mappings, env var names, image names, volumes
78
+ - `REFINEMENT_BRIEF.md` listing every open question
79
+ - Generic stub scripts (placeholder only — **must be replaced**)
80
+
81
+ **The skeleton cannot tell you:**
82
+ - What the service actually writes to the database (column names, stale thresholds)
83
+ - What real error messages look like in the logs
84
+ - What "healthy" vs. "degraded" vs. "failed" looks like
85
+ - What exact commands fix common failures
86
+
87
+ ### Phase 2: Agentic Deep Dive
88
+
89
+ Read the source code. Answer every question in `REFINEMENT_BRIEF.md` using `Grep`, `Glob`, `Read`, and Serena LSP tools. Do not guess. Do not leave placeholders.
90
+
91
+ **Mandatory investigation areas:**
92
+
93
+ #### Container & Runtime
94
+ - What is the exact entry point? (Dockerfile CMD + docker-compose `command:`)
95
+ - Is this a long-running daemon, a cron job, or a one-shot? → determines health strategy
96
+ - Which env vars cause a crash if missing? Which are optional?
97
+ - What volumes does it read from? Write to?
98
+ - What is the restart policy and why?
99
+
100
+ #### Data Layer
101
+ - Which tables does it write? Which does it only read?
102
+ - What is the timestamp column for each output table (`created_at`, `snapshot_ts`, `asof_ts`, etc.)?
103
+ - What is a realistic "stale" threshold in minutes per table? (Rule of thumb: update_interval × 3)
104
+ - Does it use Redis, S3, local files, or other external state?
105
+ - Are queries parameterized? (Check `%s`, `%(name)s`, `?` patterns — never f-strings in SQL)
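The stale-threshold rule of thumb and the parameterization check above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the package: `stale_threshold_minutes`, `is_stale`, and the `snapshots` table are hypothetical names, and `sqlite3` stands in for the real database so the example runs anywhere (its placeholder is `?`, whereas PostgreSQL drivers such as psycopg2 use `%s`).

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def stale_threshold_minutes(update_interval_minutes: float, factor: float = 3.0) -> float:
    """Rule of thumb from the guide: threshold = update_interval x 3."""
    return update_interval_minutes * factor


def is_stale(latest: datetime, now: datetime, threshold_minutes: float) -> bool:
    """True when the newest row is older than the threshold."""
    return (now - latest) > timedelta(minutes=threshold_minutes)


# In-memory stand-in for a real output table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (symbol TEXT, snapshot_ts TEXT)")

# Fixed "now" so the example is deterministic.
now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
conn.execute(
    "INSERT INTO snapshots VALUES (?, ?)",  # parameterized, never an f-string
    ("ABC", (now - timedelta(minutes=20)).isoformat()),
)

# Freshness query: age of the most recent row, again with a bound parameter.
row = conn.execute(
    "SELECT MAX(snapshot_ts) FROM snapshots WHERE symbol = ?", ("ABC",)
).fetchone()
latest = datetime.fromisoformat(row[0])

threshold = stale_threshold_minutes(5)  # assume the service writes every 5 minutes
print(f"threshold={threshold} min, stale={is_stale(latest, now, threshold)}")
```

Here the newest row is 20 minutes old against a 15-minute threshold, so the table counts as stale. The multiplier is a starting point only; the real threshold should come from the write interval observed in the service's code.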
106
+
107
+ #### Failure Modes
108
+ Build this table with ≥5 rows from code comments, exception handlers, and READMEs:
109
+
110
+ | Symptom | Likely Cause | Resolution |
111
+ |---------|-------------|------------|
112
+ | (what you see in logs or alerts) | (root cause) | (exact docker/shell command to fix) |
113
+
114
+ #### Log Patterns
115
+ Search for `logger.error`, `logger.warning`, `raise`, `except`, and `panic!` in the source:
116
+ - What appears in logs during normal healthy operation? (→ `info` patterns)
117
+ - What appears during recoverable errors? (→ `warning` / `error` patterns)
118
+ - What appears during critical failures requiring restart? (→ `critical` patterns)
119
+ - For Rust services: what does a panic look like? (`thread '.*' panicked`)
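One way to organize the answers to these questions is a severity-keyed pattern table. The sketch below is an assumption about how a `log_hunter.py` might be structured, not the package's implementation. Every pattern except the Rust panic regex quoted above is invented for illustration; real patterns must be taken from the service's source.

```python
import re
from typing import Optional

# Hypothetical patterns; only the panic regex comes from the guide itself.
PATTERNS = {
    "critical": [
        re.compile(r"thread '.*' panicked"),
        re.compile(r"connection pool exhausted"),
    ],
    "error": [re.compile(r"failed to fetch batch \d+")],
    "warning": [re.compile(r"retrying in \d+s")],
    "info": [re.compile(r"wrote \d+ rows")],
}

# Check the most severe bucket first so a line is reported at its worst level.
SEVERITY_ORDER = ["critical", "error", "warning", "info"]


def classify(line: str) -> Optional[str]:
    """Return the highest severity whose pattern matches, or None."""
    for severity in SEVERITY_ORDER:
        if any(p.search(line) for p in PATTERNS[severity]):
            return severity
    return None


samples = [
    "thread 'main' panicked at src/main.rs:42:5",
    "2026-01-15 12:00:01 wrote 128 rows",
    "ERROR_COUNT = 0  # a variable name, not a failure",
]
for line in samples:
    print(classify(line), "<-", line)
```

Because each pattern is specific to a message the service actually emits, the third sample is not flagged even though it contains the word `ERROR`.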
120
+
121
+ ---
122
+
123
+ ## 4. Service Type Classification
124
+
125
+ Classify before writing scripts. The service type determines which scripts to write beyond the baseline `health_probe.py` and `log_hunter.py`.
126
+
127
+ | Service Type | Health Probe Strategy | Specialist Script |
128
+ |---|---|---|
129
+ | **Continuous DB writer** | Table freshness (age of most recent row per table) | `data_explorer.py` |
130
+ | **HTTP API server** | HTTP probe against real routes (not just port scan) | `endpoint_tester.py` |
131
+ | **One-shot / migration** | Container exit code + expected tables/schemas present | `coverage_checker.py` |
132
+ | **File watcher** | Mount path accessible + state file present + DB recency | `state_inspector.py` |
133
+ | **Email / API poller** | Container running + auth token file present | service-specific |
134
+ | **Scheduled backup** | Recent backup files in staging dir + daemon running | service-specific |
135
+ | **MCP stdio server** | Data source freshness in DB (no HTTP to probe) | service-specific |
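The classification table can also be expressed as a lookup so that the required script set follows mechanically from the service type. This is a sketch only: `ServiceType`, `SPECIALIST_SCRIPT`, and `required_scripts` are hypothetical names, and only five of the seven rows are shown.

```python
from enum import Enum


class ServiceType(Enum):
    CONTINUOUS_DB_WRITER = "continuous-db-writer"
    HTTP_API_SERVER = "http-api-server"
    ONE_SHOT = "one-shot"
    FILE_WATCHER = "file-watcher"
    MCP_STDIO_SERVER = "mcp-stdio-server"


# Every skill gets these two scripts regardless of type.
BASELINE_SCRIPTS = ["health_probe.py", "log_hunter.py"]

# Specialist script per type, copied from the table.
# None stands for the rows marked "service-specific".
SPECIALIST_SCRIPT = {
    ServiceType.CONTINUOUS_DB_WRITER: "data_explorer.py",
    ServiceType.HTTP_API_SERVER: "endpoint_tester.py",
    ServiceType.ONE_SHOT: "coverage_checker.py",
    ServiceType.FILE_WATCHER: "state_inspector.py",
    ServiceType.MCP_STDIO_SERVER: None,
}


def required_scripts(service_type: ServiceType) -> list:
    """Baseline scripts plus the specialist script, when the table names one."""
    scripts = list(BASELINE_SCRIPTS)
    specialist = SPECIALIST_SCRIPT[service_type]
    if specialist is not None:
        scripts.append(specialist)
    return scripts


print(required_scripts(ServiceType.HTTP_API_SERVER))
print(required_scripts(ServiceType.MCP_STDIO_SERVER))
```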
136
+
137
+ ---
138
+
139
+ ## 5. Directory Structure
140
+
141
+ ```
142
+ .claude/skills/
143
+ ├── service-skill-builder/ # Meta-skill (system core)
144
+ │   ├── SKILL.md
145
+ │   ├── references/
146
+ │   │   ├── service_skill_system_guide.md # This file
147
+ │   │   └── script_quality_standards.md # Script design rules
148
+ │   └── scripts/
149
+ │       ├── main.py # Phase 1 skeleton generator
150
+ │       ├── skill_health_check.py # Drift detection
151
+ │       ├── discovery.py # Docker Compose parser
152
+ │       ├── analysis.py # AST/regex code analyzer
153
+ │       ├── devops_audit.py # CI/CD/observability audit
154
+ │       └── generator.py # Skill file generation logic
155
+ │
156
+ ├── my-service-a/ # Generated skill (long-running daemon)
157
+ │   ├── SKILL.md
158
+ │   └── scripts/
159
+ │       ├── health_probe.py # Container + DB freshness checks
160
+ │       ├── log_hunter.py # Pattern-matched log analysis
161
+ │       └── data_explorer.py # Query output tables interactively
162
+ │
163
+ ├── my-service-b/ # Generated skill (HTTP API)
164
+ │   ├── SKILL.md
165
+ │   └── scripts/
166
+ │       ├── health_probe.py
167
+ │       ├── log_hunter.py
168
+ │       └── endpoint_tester.py # Probe all real API routes
169
+ │
170
+ └── my-service-c/ # Generated skill (file watcher)
171
+     ├── SKILL.md
172
+     └── scripts/
173
+         ├── health_probe.py
174
+         ├── log_hunter.py
175
+         └── state_inspector.py # Read state file, compute lag
176
+ ```
177
+
178
+ Agent mirrors — always sync after creating or updating skills:
179
+
180
+ ```bash
181
+ for d in .claude/skills/my-*/; do
182
+   svc=$(basename "$d")
183
+   mkdir -p ".agent/skills/$svc" && cp -r "$d." ".agent/skills/$svc/"
184
+   mkdir -p ".gemini/skills/$svc" && cp -r "$d." ".gemini/skills/$svc/"
185
+ done
186
+ ```
187
+
188
+ ---
189
+
190
+ ## 6. Skill Lifecycle
191
+
192
+ ### When to Generate a Skill
193
+ - A new Docker service is added to the project
194
+ - An existing service is significantly refactored
195
+
196
+ ### When to Update a Skill
197
+ - The service's database schema changes
198
+ - New error conditions are added to the code
199
+ - The entry point or restart policy changes
200
+ - The health check script's stale thresholds no longer reflect reality
201
+
202
+ ### Detecting Drift
203
+
204
+ ```bash
205
+ # Check all skills for staleness
206
+ python3 .claude/skills/service-skill-builder/scripts/skill_health_check.py --all
207
+ ```
208
+
209
+ Output example:
210
+ ```
211
+ my-service-a: HEALTHY
212
+ my-service-b: STALE (service code modified 2026-01-15, skill generated 2025-11-01)
213
+ my-service-c: MISSING (no skill exists)
214
+ ```
215
+
216
+ A skill is **STALE** when the service's source code or docker-compose definition has been modified more recently than the skill was generated. This is a signal to re-run Phase 2 for the affected service.
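The staleness rule above can be sketched as a timestamp comparison. How `skill_health_check.py` actually determines the skill generation time is not shown in this guide; the sketch below assumes the modification time of `SKILL.md` as a stand-in, and `newest_mtime` and `skill_status` are hypothetical helpers.

```python
import os
import tempfile
from pathlib import Path


def newest_mtime(root: Path) -> float:
    """Latest modification time of any file under root (0.0 if there are none)."""
    return max((p.stat().st_mtime for p in root.rglob("*") if p.is_file()), default=0.0)


def skill_status(service_dir: Path, skill_dir: Path) -> str:
    """Classify a skill as MISSING, STALE, or HEALTHY."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():
        return "MISSING"
    # Assumption: SKILL.md mtime approximates the skill generation timestamp.
    if newest_mtime(service_dir) > skill_md.stat().st_mtime:
        return "STALE"
    return "HEALTHY"


# Demonstration in a throwaway directory with explicitly set timestamps.
with tempfile.TemporaryDirectory() as tmp:
    service = Path(tmp, "services", "my-service-b")
    skill = Path(tmp, "skills", "my-service-b")
    service.mkdir(parents=True)
    skill.mkdir(parents=True)

    src = service / "main.py"
    src.write_text("print('hello')\n")
    status_missing = skill_status(service, skill)  # no SKILL.md yet

    skill_md = skill / "SKILL.md"
    skill_md.write_text("# my-service-b\n")
    os.utime(skill_md, (1_000, 1_000))  # skill generated first
    os.utime(src, (2_000, 2_000))       # service modified afterwards
    status_stale = skill_status(service, skill)

    os.utime(skill_md, (3_000, 3_000))  # skill regenerated after the change
    status_healthy = skill_status(service, skill)

print(status_missing, status_stale, status_healthy)
```

A real check would also need to include the docker-compose definition in the comparison, as the rule above states.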
217
+
218
+ ---
219
+
220
+ ## 7. Quality Gates
221
+
222
+ A skill is **complete** (not draft) when all of the following are true:
223
+
224
+ - [ ] No `[PENDING RESEARCH]` markers remain in SKILL.md
225
+ - [ ] All stub scripts have been replaced with service-specific implementations
226
+ - [ ] `health_probe.py` queries actual output tables with correct stale thresholds
227
+ - [ ] `log_hunter.py` patterns are sourced from the real codebase (not invented)
228
+ - [ ] At least one specialist script exists if the service has unique inspectable state
229
+ - [ ] The Troubleshooting table has ≥5 rows based on real failure modes
230
+ - [ ] All CLI commands in SKILL.md are verified against the actual docker-compose config
231
+ - [ ] Scripts have been synced to `.agent/skills/` and `.gemini/skills/` mirrors
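The first gate is easy to check mechanically. As a minimal sketch (the `pending_markers` helper is hypothetical and not part of the package), a script can list every line of `SKILL.md` that still contains the marker:

```python
import tempfile
from pathlib import Path

MARKER = "[PENDING RESEARCH]"


def pending_markers(skill_md: Path) -> list:
    """Return (line_number, text) for every line still carrying the marker."""
    hits = []
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    for lineno, line in enumerate(lines, start=1):
        if MARKER in line:
            hits.append((lineno, line.strip()))
    return hits


# Demonstration on a throwaway SKILL.md with one unresolved marker.
with tempfile.TemporaryDirectory() as tmp:
    skill_md = Path(tmp) / "SKILL.md"
    skill_md.write_text(
        "# my-service\n"
        "Entry point: python -m app.main\n"
        "Stale threshold: [PENDING RESEARCH]\n",
        encoding="utf-8",
    )
    hits = pending_markers(skill_md)

for lineno, text in hits:
    print(f"SKILL.md:{lineno}: {text}")
```

An empty result satisfies only this one gate; the remaining items still require reading the scripts against the service's source.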
232
+
233
+ ---
234
+
235
+ ## 8. Best Practices
236
+
237
+ ### One Service, One Skill
238
+ Keep skills granular. A skill for `my-api` should not also document `my-worker`. Tightly coupled services (e.g., Redis master/replica) may share a skill if they are always operated together.
239
+
240
+ ### Read Source, Not Docs
241
+ Internal README files go stale. The entry point script, exception handlers, and log statements are the ground truth. Always grep the source code for actual error messages before writing log patterns.
242
+
243
+ ### Port Awareness
244
+ Scripts in `skills/` run on the **host machine**, not inside Docker. Always use the external mapped port:
245
+
246
+ ```python
247
+ import os
+
+ # ✅ Host script (external mapped port)
248
+ DB_PORT = int(os.getenv("DB_PORT", "5433"))
249
+
250
+ # ❌ Wrong for a host script (container-internal port)
251
+ DB_PORT = int(os.getenv("DB_PORT", "5432"))
252
+ ```
253
+
254
+ ### Executable Knowledge
255
+ Prefer putting logic into `scripts/` (executed without reading into context) over text-only descriptions in SKILL.md. An agent that can run `health_probe.py` learns the truth about service health in one step. An agent reading stale prose may act on incorrect assumptions.
256
+
257
+ ### Actionable Remediation
258
+ Every critical failure detected by a script must print the exact command to fix it — not "check the logs." For example:
259
+
260
+ ```python
261
+ if not token_present:
262
+     print(f" Fix: docker exec -it {CONTAINER} python scripts/auth.py --refresh")
263
+ ```
264
+
265
+ ---
266
+
267
+ ## 9. Anti-Patterns
268
+
269
+ | Anti-pattern | Why It Fails |
270
+ |---|---|
271
+ | Skip Phase 2 because Phase 1 looks complete | Skeleton has correct port numbers but wrong table names, wrong log patterns, wrong stale thresholds |
272
+ | Copy log patterns from another service's skill | Different services emit different errors; shared patterns produce false positives and miss real failures |
273
+ | Use port 5432 in host scripts | Container-internal port is unreachable from host; scripts silently hang |
274
+ | Write `health_probe.py` without fix commands | Agent sees a failure but has no recovery path |
275
+ | Leave `[PENDING RESEARCH]` markers | The skill is unusable — an agent acting on incomplete info may apply wrong fixes |
276
+ | Forget to sync to `.agent/` and `.gemini/` | Other agent runtimes use stale or missing skills |
277
+ | Use `r"ERROR"` as a log pattern | Matches variable names and comments, producing thousands of false positives |
278
+ | Hardcode table names without verifying | Guessed table names may not exist in the live schema; run `SELECT tablename FROM pg_tables WHERE schemaname='public'` first |