@laitszkin/apollo-toolkit 2.8.0 → 2.9.0

package/CHANGELOG.md CHANGED
@@ -4,6 +4,12 @@ All notable changes to this repository are documented in this file.
  
  ## [Unreleased]
  
+ ## [v2.9.0] - 2026-03-21
+ 
+ ### Changed
+ - Update `scheduled-runtime-health-check` to run requested commands in a background terminal immediately or within a requested time window, with optional pre-run safe updates and optional post-run log findings.
+ - Update `open-github-issue` to require explicit BDD-style expected behavior, current behavior, and behavior-gap content for problem issues, and enforce that contract in the bundled publisher script and docs.
+ 
  ## [v2.8.0] - 2026-03-21
  
  ### Changed
@@ -42,7 +42,7 @@ The bundled script can also be called directly:
  python scripts/open_github_issue.py \
    --issue-type problem \
    --title "[Log] Payment timeout spike" \
-   --problem-description "Repeated timeout warnings escalated into request failures during the incident window." \
+   --problem-description $'Expected Behavior (BDD)\nGiven the payment service sees transient upstream latency\nWhen the retry path runs\nThen requests should recover without user-visible failures\n\nCurrent Behavior (BDD)\nGiven the payment service sees transient upstream latency\nWhen the retry path runs\nThen repeated timeout warnings still escalate into request failures\n\nBehavior Gap\n- Expected: retries absorb transient upstream slowness.\n- Actual: retries still end in request failures.\n- Difference/Impact: customers receive failed payment attempts during the incident window.\n\nEvidence\n- symptom: repeated timeout warnings escalated into request failures.\n- impact: payment attempts failed for end users.\n- key evidence: logs from the incident window show retries without successful recovery.' \
    --suspected-cause "payment-api/handler.py:84 retries immediately against a slow upstream with no jitter; confidence high." \
    --reproduction "Not yet reliably reproducible; more runtime evidence is required." \
    --repo owner/repo
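The invocation above relies on bash ANSI-C quoting (`$'...'`) to embed newlines in `--problem-description`. As a hedged sketch, the same call can be assembled from Python, where the description is an ordinary string with `\n` and no special quoting is needed; the argument list below simply mirrors the flags shown above.

```python
# Sketch only: mirrors the bash invocation above from Python. In Python the
# multi-line description is a plain string; bash's $'...' quoting is only a
# shell-level concern.
description = (
    "Expected Behavior (BDD)\n"
    "Given the payment service sees transient upstream latency\n"
    "When the retry path runs\n"
    "Then requests should recover without user-visible failures\n\n"
    "Current Behavior (BDD)\n"
    "Given the payment service sees transient upstream latency\n"
    "When the retry path runs\n"
    "Then repeated timeout warnings still escalate into request failures\n\n"
    "Behavior Gap\n"
    "- Expected: retries absorb transient upstream slowness.\n"
    "- Actual: retries still end in request failures.\n"
    "- Difference/Impact: customers receive failed payment attempts.\n"
)
cmd = [
    "python", "scripts/open_github_issue.py",
    "--issue-type", "problem",
    "--title", "[Log] Payment timeout spike",
    "--problem-description", description,
    "--repo", "owner/repo",
]
# subprocess.run(cmd) would invoke the script; omitted here so the sketch
# stays side-effect free.
print(len(cmd))
```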
@@ -72,6 +72,12 @@ Problem issues always include exactly three sections:
  - `Suspected Cause`
  - `Reproduction Conditions (if available)`
  
+ Within `Problem Description`, include:
+ 
+ - `Expected Behavior (BDD)`
+ - `Current Behavior (BDD)`
+ - `Behavior Gap`
+ 
  For Chinese-language repositories, use translated section titles with the same meaning.
  
  Feature proposal issues always include:
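The required subsections above can be composed mechanically. A minimal sketch, assuming a hypothetical helper (the section titles come from the docs; the function name is not part of the package):

```python
# Hypothetical sketch: assemble the three required Problem Description
# subsections so the publisher's validation can find all three markers.
def render_problem_description(expected: str, current: str, gap: str) -> str:
    # Each subsection gets its own heading line, separated by blank lines.
    return (
        "Expected Behavior (BDD)\n" + expected + "\n\n"
        "Current Behavior (BDD)\n" + current + "\n\n"
        "Behavior Gap\n" + gap
    )

body = render_problem_description(
    "Given X\nWhen Y\nThen Z",
    "Given X\nWhen Y\nThen W",
    "- Expected: Z\n- Actual: W\n- Difference/Impact: users see W",
)
print("Behavior Gap" in body)
```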
@@ -14,9 +14,9 @@ description: Publish structured GitHub issues and feature proposals with determi
  
  ## Standards
  
- - Evidence: Require structured issue inputs and detect repository language from the target README instead of guessing.
+ - Evidence: Require structured issue inputs, detect repository language from the target README instead of guessing, and for `problem` issues capture BDD-style expected vs current behavior with an explicit delta.
  - Execution: Resolve the repo, normalize the issue body, publish with strict auth order, then return the publication result.
- - Quality: Preserve upstream evidence, localize only the structural parts, and keep publication deterministic and reproducible.
+ - Quality: Preserve upstream evidence, localize only the structural parts, keep publication deterministic and reproducible, and make behavioral mismatches easy for maintainers to verify.
  - Output: Return publication mode, issue URL when created, rendered body, and any publish error in the standardized JSON contract.
  
  ## Overview
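The Output standard names four pieces of the standardized JSON contract: publication mode, issue URL, rendered body, and any publish error. A hedged sketch of that shape, with the exact key names assumed rather than taken from the package:

```python
# Assumed key names; only the four fields themselves come from the Output
# standard above.
import json

result = {
    "mode": "dry-run",             # publication mode
    "issue_url": None,             # set when an issue was actually created
    "body": "rendered issue body", # the normalized issue body
    "publish_error": None,         # populated when publishing failed
}
print(json.dumps(result))
```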
@@ -37,6 +37,7 @@ It is designed to be reusable by other skills that already know the issue title
  - Detect repository issue language from the target remote README instead of guessing.
  - Preserve upstream evidence content; only localize section headers and default fallback text.
  - Make the issue type explicit: `problem` for defects/incidents, `feature` for proposals.
+ - For `problem` issues, describe the expected behavior and current behavior with BDD-style `Given / When / Then`, then state the behavioral difference explicitly.
  
  ## Workflow
  
@@ -49,6 +50,11 @@ It is designed to be reusable by other skills that already know the issue title
    - `problem-description`
    - `suspected-cause`
    - `reproduction` (optional)
+   - Within `problem-description`, require a precise behavior diff:
+     - `Expected Behavior (BDD)`: `Given / When / Then` for what the program should do.
+     - `Current Behavior (BDD)`: `Given / When / Then` for what the program does now.
+     - `Behavior Gap`: a short explicit comparison of the observable difference and impact.
+   - Include the symptom, impact, and key evidence alongside the behavior diff; do not leave the mismatch implicit.
  - For `feature` issues, require these structured sections:
    - `proposal` (optional; defaults to title when omitted)
    - `reason`
@@ -74,7 +80,7 @@ Problem issue:
  python scripts/open_github_issue.py \
    --issue-type problem \
    --title "[Log] <short symptom>" \
-   --problem-description "<symptom + impact + key evidence>" \
+   --problem-description $'Expected Behavior (BDD)\nGiven ...\nWhen ...\nThen ...\n\nCurrent Behavior (BDD)\nGiven ...\nWhen ...\nThen ...\n\nBehavior Gap\n- Expected: ...\n- Actual: ...\n- Difference/Impact: ...\n\nEvidence\n- symptom: ...\n- impact: ...\n- key evidence: ...' \
    --suspected-cause "<path:line + causal chain + confidence>" \
    --reproduction "<steps/conditions or leave empty>" \
    --repo <owner/repo>
@@ -111,6 +117,7 @@ When another skill depends on `open-github-issue`:
  
  - Pass exactly one confirmed problem or one accepted feature proposal per invocation.
  - Prepare evidence or proposal details before calling this skill; do not ask this skill to infer root cause or architecture.
+ - For `problem` issues, pass a `problem-description` that contains `Expected Behavior (BDD)`, `Current Behavior (BDD)`, and `Behavior Gap`; the difference must be explicit, not implied.
  - Reuse the returned `mode`, `issue_url`, and `publish_error` in the parent skill response.
  - For accepted feature proposals, pass `--issue-type feature` plus `--proposal`, `--reason`, and `--suggested-architecture`.
  
@@ -19,6 +19,18 @@ DEFAULT_REPRO_ZH = "尚未穩定重現;需補充更多執行期資料。"
  DEFAULT_REPRO_EN = "Not yet reliably reproducible; more runtime evidence is required."
  ISSUE_TYPE_PROBLEM = "problem"
  ISSUE_TYPE_FEATURE = "feature"
+ PROBLEM_BDD_MARKER_GROUPS = (
+     (
+         r"Expected Behavior\s*\(BDD\)",
+         r"Current Behavior\s*\(BDD\)",
+         r"Behavior Gap",
+     ),
+     (
+         r"預期行為\s*[((]BDD[))]",
+         r"(?:目前|當前)行為\s*[((]BDD[))]",
+         r"行為(?:落差|差異)",
+     ),
+ )
  
  
  def parse_args() -> argparse.Namespace:
@@ -83,6 +95,19 @@ def validate_issue_content_args(args: argparse.Namespace) -> None:
          raise SystemExit("Problem issues require --problem-description.")
      if not (args.suspected_cause or "").strip():
          raise SystemExit("Problem issues require --suspected-cause.")
+     if not has_required_problem_bdd_sections(args.problem_description or ""):
+         raise SystemExit(
+             "Problem issues require --problem-description to include "
+             "Expected Behavior (BDD), Current Behavior (BDD), and Behavior Gap sections."
+         )
+ 
+ 
+ def has_required_problem_bdd_sections(problem_description: str) -> bool:
+     normalized = problem_description.strip()
+     return any(
+         all(re.search(pattern, normalized, flags=re.IGNORECASE) for pattern in marker_group)
+         for marker_group in PROBLEM_BDD_MARKER_GROUPS
+     )
  
  
  def run_command(args: list[str]) -> subprocess.CompletedProcess[str]:
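The validation added above accepts a description only when one complete marker group (English or Chinese in the script) matches. A self-contained sketch of that logic, reduced here to the English group only:

```python
# Standalone sketch of the marker-group check added above; the real script
# also carries a Chinese marker group.
import re

MARKER_GROUPS = (
    (r"Expected Behavior\s*\(BDD\)", r"Current Behavior\s*\(BDD\)", r"Behavior Gap"),
)

def has_bdd_sections(text: str) -> bool:
    normalized = text.strip()
    # A group passes only if every one of its three patterns is found.
    return any(
        all(re.search(p, normalized, flags=re.IGNORECASE) for p in group)
        for group in MARKER_GROUPS
    )

print(has_bdd_sections(
    "Expected Behavior (BDD)\nGiven X\n\nCurrent Behavior (BDD)\nGiven X\n\nBehavior Gap\n- ..."
))  # → True
```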
@@ -121,11 +121,59 @@ class OpenGitHubIssueTests(unittest.TestCase):
121
121
  )
122
122
  )
123
123
 
124
+ def test_validate_issue_content_args_requires_problem_bdd_sections(self) -> None:
125
+ with self.assertRaises(SystemExit):
126
+ MODULE.validate_issue_content_args(
127
+ Namespace(
128
+ issue_type=MODULE.ISSUE_TYPE_PROBLEM,
129
+ reason=None,
130
+ suggested_architecture=None,
131
+ problem_description="Repeated timeout warnings escalated into request failures.",
132
+ suspected_cause="handler.py:12",
133
+ )
134
+ )
135
+
136
+ MODULE.validate_issue_content_args(
137
+ Namespace(
138
+ issue_type=MODULE.ISSUE_TYPE_PROBLEM,
139
+ reason=None,
140
+ suggested_architecture=None,
141
+ problem_description=(
142
+ "Expected Behavior (BDD)\n"
143
+ "Given requests arrive during transient upstream latency\n"
144
+ "When the retry path runs\n"
145
+ "Then the request should recover without user-visible failure\n\n"
146
+ "Current Behavior (BDD)\n"
147
+ "Given requests arrive during transient upstream latency\n"
148
+ "When the retry path runs\n"
149
+ "Then the request still fails after immediate retries\n\n"
150
+ "Behavior Gap\n"
151
+ "- Expected: retries absorb transient slowness.\n"
152
+ "- Actual: retries amplify failures.\n"
153
+ "- Difference/Impact: users still receive errors.\n"
154
+ ),
155
+ suspected_cause="handler.py:12",
156
+ )
157
+ )
158
+
124
159
  def test_main_dry_run_returns_structured_json_without_publish_attempt(self) -> None:
125
160
  args = Namespace(
126
161
  title="[Log] sample",
127
162
  issue_type=MODULE.ISSUE_TYPE_PROBLEM,
128
- problem_description="problem",
163
+ problem_description=(
164
+ "Expected Behavior (BDD)\n"
165
+ "Given the issue is confirmed\n"
166
+ "When the issue body is rendered\n"
167
+ "Then the expected path should be explicit\n\n"
168
+ "Current Behavior (BDD)\n"
169
+ "Given the issue is confirmed\n"
170
+ "When the issue body is rendered\n"
171
+ "Then the current path should be explicit\n\n"
172
+ "Behavior Gap\n"
173
+ "- Expected: clear behavior diff.\n"
174
+ "- Actual: sample payload for dry run.\n"
175
+ "- Difference/Impact: contract stays structured.\n"
176
+ ),
129
177
  suspected_cause="handler.py:12",
130
178
  reproduction=None,
131
179
  proposal=None,
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@laitszkin/apollo-toolkit",
-   "version": "2.8.0",
+   "version": "2.9.0",
    "description": "Apollo Toolkit npm installer for managed skill copying across Codex, OpenClaw, and Trae.",
    "license": "MIT",
    "author": "LaiTszKin",
@@ -1,15 +1,16 @@
  # Scheduled Runtime Health Check
  
- An agent skill for scheduled, bounded project runs with post-run health analysis.
+ An agent skill for running user-requested commands in a background terminal, optionally inside a bounded time window with post-run log analysis.
  
- This skill helps agents start a project at a chosen time, keep it alive for a fixed observation window, stop it automatically, collect the relevant logs, and summarize module health with evidence-backed findings from `analyse-app-logs`.
+ This skill helps agents use a background terminal to run a requested command immediately or in a chosen time window, and optionally summarize evidence-backed findings from the resulting logs via `analyse-app-logs`.
  
  ## What this skill provides
  
- - A workflow for one-off or recurring runtime health checks.
+ - A workflow for one-off or recurring background-terminal runtime checks.
+ - An optional code-update step before execution.
  - Clear separation between scheduling, runtime observation, shutdown, and diagnosis.
  - A bounded log window so startup, steady-state, and shutdown evidence stay correlated.
- - Module-level health classification: `healthy`, `degraded`, `failed`, or `unknown`.
+ - Optional module-level health classification: `healthy`, `degraded`, `failed`, or `unknown`.
  - Escalation to `improve-observability` when existing telemetry is insufficient.
  
  ## Repository structure
@@ -33,22 +34,28 @@ cp -R scheduled-runtime-health-check "$CODEX_HOME/skills/scheduled-runtime-healt
  Invoke the skill in your prompt:
  
  ```text
- Use $scheduled-runtime-health-check to start this project at 22:00, keep it running for 6 hours, stop it automatically, and analyze whether the API, worker, and scheduler modules stayed healthy.
+ Use $scheduled-runtime-health-check to use a background terminal to run `docker compose up app worker`.
+ 
+ Run it in this specific time window: 2026-03-18 22:00 to 2026-03-19 04:00 Asia/Hong_Kong.
+ 
+ After the run completes, explain your findings from the logs.
  ```
  
  Best results come from including:
  
  - workspace path
- - start command
+ - execution command
  - stop command or acceptable shutdown method
- - schedule and timezone
- - duration
+ - schedule/time window and timezone
+ - duration when bounded
  - readiness signal
  - relevant log files
- - modules or subsystems to assess
+ - modules or subsystems to assess when findings are requested
+ - whether the repository should be updated first, only if you want that behavior
  
  If no trustworthy start command is documented, the agent should derive it from the repository or ask only for that missing command.
  If the user requests a future start time and no reliable scheduler is available, the agent should report that limitation instead of starting the run early.
+ If an optional update step was requested but the repository cannot be updated safely because the worktree is dirty or no upstream is configured, the agent should stop and report that exact blocker instead of forcing an update.
 
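The safe-update blocker described above reduces to two preconditions. A hedged sketch of that gate, with a hypothetical pure function; a real agent would feed it the output of `git status --porcelain` and the result of resolving the branch's upstream:

```python
# Hypothetical helper: decide whether a pre-run update is safe. The inputs
# stand in for `git status --porcelain` output and an upstream check.
def can_update_safely(porcelain_output: str, has_upstream: bool) -> tuple[bool, str]:
    if porcelain_output.strip():
        # Any porcelain output means uncommitted changes in the worktree.
        return False, "worktree is dirty"
    if not has_upstream:
        return False, "no upstream configured"
    # Both checks passed; e.g. `git pull --ff-only` could run here.
    return True, "ok"

print(can_update_safely(" M src/app.py\n", True))  # → (False, 'worktree is dirty')
```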
  
  ## Example
  
@@ -58,13 +65,14 @@ If the user requests a future start time and no reliable scheduler is available,
  Use $scheduled-runtime-health-check for this repository.
  
  Workspace: /workspace/my-app
- Start command: docker compose up app worker
+ Execution command: docker compose up app worker
  Stop command: docker compose down
  Schedule: 2026-03-18 22:00 Asia/Hong_Kong
  Duration: 6 hours
  Readiness signal: GET http://127.0.0.1:3000/health returns 200
  Logs: docker compose logs, logs/app.log, logs/worker.log
  Modules to assess: api, worker, scheduler
+ After completion: explain findings from the logs
  ```
  
  ### Expected response shape
@@ -73,21 +81,24 @@ Modules to assess: api, worker, scheduler
  1) Run summary
  - Started at 2026-03-18 22:00 HKT and stopped at 2026-03-19 04:00 HKT after a 6-hour bounded run.
  
- 2) Module health
+ 2) Execution result
+ - The background terminal completed the requested run workflow and kept the services up for the full window.
+ 
+ 3) Module health
  - api: healthy, served readiness checks and no error bursts were observed.
  - worker: degraded, repeated timeout warnings increased after 01:20 HKT.
  - scheduler: unknown, no positive execution signal was emitted during the window.
  
- 3) Confirmed issues
+ 4) Confirmed issues
  - Reuse evidence-backed findings from $analyse-app-logs.
  
- 4) Potential issues and validation needed
+ 5) Potential issues and validation needed
  - Scheduler may not be firing jobs; add a per-job execution log or metric to confirm.
  
- 5) Observability gaps
+ 6) Observability gaps
  - Missing correlation IDs between api requests and worker jobs.
  
- 6) Automation or scheduler status
+ 7) Automation or scheduler status
  - One bounded scheduled run completed and no further cleanup is required.
  ```
  
@@ -1,40 +1,51 @@
  ---
  name: scheduled-runtime-health-check
- description: Coordinate a scheduled, bounded project run that starts automatically, stays up for a fixed observation window, stops cleanly, and delegates log-based health analysis to analyse-app-logs. Use when users want timed project startups, post-run health checks across modules, and a report of confirmed issues and potential risks.
+ description: Use a background terminal to run a user-specified command immediately or in a requested time window, and optionally explain findings from the captured logs after the run. Use when users want timed project execution, bounded runtime checks, or post-run log-based findings.
  ---
  
  # Scheduled Runtime Health Check
  
  ## Dependencies
  
- - Required: `analyse-app-logs` for bounded post-run log analysis.
+ - Required: `analyse-app-logs` when the user asks for post-run log findings or when the observed run needs evidence-backed diagnosis.
  - Conditional: `improve-observability` when current logs cannot prove module health or root cause.
  - Optional: `open-github-issue` indirectly through `analyse-app-logs` when confirmed issues should be published.
  - Fallback: If no scheduler or automation capability is available for the requested future start time, stop and report that scheduling could not be created; only run immediately when the user explicitly allows an immediate bounded observation instead of a timed start.
  
  ## Standards
  
- - Evidence: Anchor every conclusion to the scheduled window, startup/shutdown timestamps, captured logs, and concrete module signals.
- - Execution: Collect the run contract, choose a scheduling mechanism, capture logs, run for a bounded window, stop cleanly, then delegate the review to `analyse-app-logs`.
- - Quality: Keep scheduling and shutdown deterministic, separate confirmed findings from hypotheses, and mark each module healthy/degraded/failed/unknown with reasons.
- - Output: Return the run configuration, module health by area, confirmed issues, potential issues, observability gaps, and automation or scheduler status.
+ - Evidence: Anchor every conclusion to the requested command, execution window, startup/shutdown timestamps, captured logs, and concrete runtime signals.
+ - Execution: Collect the run contract, use a background terminal, optionally update the code only when the user asks, execute the requested command immediately or in the requested window, capture logs, stop cleanly when bounded, then delegate log review to `analyse-app-logs` only when findings are requested or needed.
+ - Quality: Keep scheduling, execution, and shutdown deterministic; separate confirmed findings from hypotheses; and mark each assessed module healthy/degraded/failed/unknown with reasons.
+ - Output: Return the run configuration, execution status, log locations, optional code-update result, optional module health by area, confirmed issues, potential issues, observability gaps, and scheduler status when applicable.
  
  ## Overview
  
- Use this skill when the user wants an agent to:
+ Use this skill when the user wants an agent to do work in this shape:
  
- - start a project at a specific time
- - keep it running for a fixed window such as 6 hours
- - stop it automatically at the end of that window
- - collect logs from startup through shutdown
- - assess whether key modules behaved normally
- - identify confirmed problems and potential risks from the observed run
+ - use a background terminal for the whole run
+ - execute a specific command such as `npm run dev`, `docker compose up`, or another repo-defined entrypoint
+ - optionally update the project before execution when the user explicitly asks
+ - optionally run it inside a specific time window
+ - optionally wait for the run to finish and then explain findings from the logs
  
- This skill is an orchestration layer. It owns the schedule, bounded runtime, log capture, and module-level health summary. It delegates deep log diagnosis to `analyse-app-logs`.
+ Canonical task shape:
+ 
+ `Use $scheduled-runtime-health-check to use a background terminal to run <command>.`
+ 
+ Optional suffixes:
+ 
+ - `Before running, update this project to the latest safe code state.`
+ - `Run it in this specific time window: <window>.`
+ - `After the run completes, explain your findings from the logs.`
+ 
+ This skill is an orchestration layer. It owns the background terminal session, optional code-update step, optional scheduling, bounded runtime, log capture, and optional module-level health summary. It delegates deep log diagnosis to `analyse-app-logs` only when the user asks for findings or the run clearly needs evidence-backed analysis.
  
  ## Core principles
  
  - Prefer one bounded observation window over open-ended monitoring.
+ - Use one dedicated background terminal session per requested run so execution and logs stay correlated.
+ - Treat code update as optional and only perform it when the user explicitly requests it.
  - Treat startup, steady-state, and shutdown as part of the same investigation.
  - Do not call a module healthy unless there is at least one positive signal for it.
  - Separate scheduler failures, boot failures, runtime failures, and shutdown failures.
@@ -43,44 +54,45 @@ This skill is an orchestration layer. It owns the schedule, bounded runtime, log
  ## Required workflow
  
  1. Define the run contract
-    - Confirm or derive the workspace, start command, stop method, schedule, duration, readiness signal, log locations, and modules to assess.
+    - Confirm or derive the workspace, execution command, optional code-update step, optional schedule, optional duration, readiness signal, log locations, and whether post-run findings are required.
     - Derive commands from trustworthy sources first: `package.json`, `Makefile`, `docker-compose.yml`, `Procfile`, scripts, or project docs.
-    - If no trustworthy start command or stop method can be found, stop and ask only for the missing command rather than guessing.
- 2. Choose the scheduling mechanism
-    - Prefer the host's native automation or scheduled-task system when available.
-    - Prefer a single scheduled execution that performs start -> observe -> stop -> analyze so the log window is exact.
-    - If the platform cannot hold a long-running scheduled task, use paired start/stop jobs and record both task identifiers.
+    - If no trustworthy execution command or stop method can be found, stop and ask only for the missing command rather than guessing.
+ 2. Prepare the background terminal run
+    - Use a dedicated background terminal session for the whole workflow.
+    - Create a dedicated run folder and record timezone, cwd, requested command, terminal session identifier, and any requested start/end boundaries.
+    - Capture stdout and stderr from the beginning of the session so the full run stays auditable.
+ 3. Optionally update to the latest safe code state
+    - Only do this step when the user explicitly asked to update the project before execution.
+    - Prefer the repository's normal safe update path, such as `git pull --ff-only`, or the project's documented sync command if one exists.
+    - Record the commit before and after the update.
+    - If the worktree is dirty, the branch has no upstream, or the update cannot be done safely, stop and report the exact blocker instead of guessing or forcing a merge.
+ 4. Choose the execution timing
+    - If the user gave a specific time window, schedule or delay the same background-terminal run to start in that window.
+    - If no time window was requested, run immediately after setup, or after the optional update step if one was requested.
     - If the user requested a future start time and no reliable scheduler is available, fail closed and report the scheduling limitation instead of starting early.
- 3. Prepare bounded log capture
-    - Create a dedicated run folder for the window and record absolute start time, intended end time, timezone, cwd, command, and PID or job identifier.
-    - Capture stdout and stderr for the started process, plus any existing app log files that matter for diagnosis.
-    - Keep startup, runtime, and shutdown evidence in the same run record.
- 4. Start and verify readiness
-    - Launch the project at the scheduled time.
-    - Wait for a concrete readiness signal such as a health endpoint, listening-port log, worker boot line, or queue-consumer ready message.
-    - If readiness never arrives, stop the run, preserve logs, and analyze the failed startup window.
- 5. Observe during the bounded window
-    - Track crashes, restarts, retry storms, timeout bursts, stuck jobs, resource pressure, and repeated warnings.
-    - For each requested module or subsystem, gather at least one positive signal and any degradation signal in the same window.
-    - If the user did not list modules explicitly, infer the major runtime modules from the repository structure and runtime processes.
- 6. Stop cleanly at the end of the window
-    - Use the project's normal shutdown path first.
-    - If graceful stop fails, escalate deterministically and record the exact stop sequence and timestamps.
-    - Treat abnormal shutdown behavior as a health signal, not just an operational detail.
- 7. Delegate bounded log analysis
+ 5. Run and capture readiness
+    - Execute the requested command in the same background terminal.
+    - Wait for a concrete readiness signal when the command is expected to stay up, such as a health endpoint, listening-port log, worker boot line, or queue-consumer ready message.
+    - If readiness never arrives, stop the run, preserve logs, and treat it as a failed startup window.
+ 6. Observe and stop when bounded
+    - If a bounded window or explicit stop time was requested, keep the process running only for that agreed window and then stop it cleanly.
+    - Track crashes, restarts, retry storms, timeout bursts, stuck jobs, resource pressure, and repeated warnings during the run.
+    - Use the project's normal shutdown path first; if graceful stop fails, escalate deterministically and record the exact stop sequence and timestamps.
+ 7. Explain findings from logs when requested
+    - If the user asked for findings after completion, wait for the run to finish before analyzing the captured logs.
     - Invoke `analyse-app-logs` on only the captured runtime window.
     - Pass the service or module names, environment, timezone, run folder, relevant log files, and the exact start/end boundaries.
     - Reuse its confirmed issues, hypotheses, and monitoring improvements instead of rewriting a separate incident workflow.
- 8. Produce the runtime health report
-    - Summarize the schedule that was executed, whether readiness succeeded, how long the project stayed healthy, and how shutdown behaved.
-    - Classify each module as `healthy`, `degraded`, `failed`, or `unknown` with concrete evidence.
-    - Separate already observed issues from potential risks that need more telemetry or a longer run to confirm.
+ 8. Produce the final report
+    - Always summarize the actual command executed, actual start/end timestamps, execution status, and log locations.
+    - Include the code-update result only when an update step was requested.
+    - When findings were requested, classify each relevant module as `healthy`, `degraded`, `failed`, or `unknown` with concrete evidence and separate observed issues from risks that still need validation.
  
  ## Scheduling rules
  
  - Use the user's locale timezone when configuring scheduled tasks.
  - Name scheduled jobs clearly so the user can recognize start, stop, and analysis ownership.
- - Prefer recurring schedules only when the user explicitly wants repeated health checks; otherwise create a one-off bounded run.
+ - Prefer recurring schedules only when the user explicitly wants repeated checks; otherwise create a one-off bounded run.
  - If the host provides agent automations, use them before inventing project-local scheduling files.
  - If native automation is unavailable, prefer the smallest reliable OS-level scheduling method already present on the machine.
  - If the request depends on a future start time and no reliable scheduling method exists, do not silently convert the request into an immediate run.
@@ -99,21 +111,26 @@ Absence of errors alone is not enough for `healthy`.
  Use this structure in responses:
  
  1. Run summary
-    - Workspace, schedule, actual start/end timestamps, duration, readiness result, shutdown result, and log locations.
- 2. Module health
-    - One entry per module with status (`healthy` / `degraded` / `failed` / `unknown`) and evidence.
- 3. Confirmed issues
-    - Reuse evidence-backed findings from `analyse-app-logs`.
- 4. Potential issues and validation needed
-    - Risks that appeared in the run but need more evidence.
- 5. Observability gaps
-    - Missing logs, metrics, probes, or correlation IDs that blocked diagnosis.
- 6. Automation or scheduler status
-    - Created task identifiers, execution status, and whether future cleanup is needed.
+    - Workspace, command, schedule if any, actual start/end timestamps, duration if bounded, readiness result, shutdown result if applicable, and log locations.
+ 2. Execution result
+    - Whether the command completed, stayed up for the requested window, or failed early.
+ 3. Code update result
+    - Include only when an update step was requested. Record the update command, before/after commit, or the exact blocker.
+ 4. Module health
+    - Include only when findings were requested or health assessment was part of the task. One entry per module with status (`healthy` / `degraded` / `failed` / `unknown`) and evidence.
+ 5. Confirmed issues
+    - Include only when log analysis was requested. Reuse evidence-backed findings from `analyse-app-logs`.
+ 6. Potential issues and validation needed
+    - Include only when log analysis was requested. Risks that appeared in the run but need more evidence.
+ 7. Observability gaps
+    - Include only when log analysis was requested. Missing logs, metrics, probes, or correlation IDs that blocked diagnosis.
+ 8. Automation or scheduler status
+    - Include only when a future window or scheduler was involved. Record task identifiers, execution status, and whether future cleanup is needed.
  
  ## Guardrails
  
  - Do not let the project continue running past the agreed window unless the user explicitly asks.
+ - Do not perform a code-update step unless the user explicitly asked for it.
  - Do not claim steady-state health from startup-only evidence.
  - Keep the run folder and scheduler metadata so the investigation can be reproduced.
  - If current logs are too weak to judge module health, recommend `improve-observability` instead of stretching the evidence.
@@ -1,4 +1,4 @@
  interface:
    display_name: "Scheduled Runtime Health Check"
-   short_description: "Schedule a bounded project run, then assess module health from captured logs."
-   default_prompt: "Use $scheduled-runtime-health-check to schedule a bounded project run, capture logs from startup through shutdown, classify module health across the requested subsystems, and delegate evidence-based post-run diagnosis to $analyse-app-logs."
+   short_description: "Use a background terminal to run a command in a time window and optionally explain findings from logs."
+   default_prompt: "Use $scheduled-runtime-health-check to use a background terminal to run the requested command immediately or in the requested time window, optionally update the project first only when the user asks for that, capture logs through the run, and, when findings are requested, delegate bounded post-run log diagnosis to $analyse-app-logs."