@laitszkin/apollo-toolkit 2.7.0 → 2.9.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +4 -3
- package/CHANGELOG.md +13 -0
- package/README.md +5 -3
- package/analyse-app-logs/LICENSE +1 -1
- package/archive-specs/LICENSE +1 -1
- package/commit-and-push/LICENSE +1 -1
- package/develop-new-features/LICENSE +1 -1
- package/docs-to-voice/LICENSE +1 -1
- package/enhance-existing-features/LICENSE +1 -1
- package/feature-propose/LICENSE +1 -1
- package/generate-spec/LICENSE +1 -1
- package/learn-skill-from-conversations/LICENSE +1 -2
- package/lib/cli.js +4 -4
- package/lib/installer.js +6 -8
- package/novel-to-short-video/LICENSE +1 -1
- package/open-github-issue/LICENSE +1 -1
- package/open-github-issue/README.md +7 -1
- package/open-github-issue/SKILL.md +10 -3
- package/open-github-issue/scripts/open_github_issue.py +25 -0
- package/open-github-issue/tests/test_open_github_issue.py +49 -1
- package/open-source-pr-workflow/LICENSE +1 -1
- package/openai-text-to-image-storyboard/LICENSE +1 -1
- package/openclaw-configuration/SKILL.md +30 -2
- package/openclaw-configuration/references/best-practices.md +15 -0
- package/openclaw-configuration/references/config-reference-map.md +22 -0
- package/package.json +2 -2
- package/review-change-set/LICENSE +1 -1
- package/review-codebases/LICENSE +1 -1
- package/scheduled-runtime-health-check/README.md +26 -15
- package/scheduled-runtime-health-check/SKILL.md +70 -53
- package/scheduled-runtime-health-check/agents/openai.yaml +2 -2
- package/scripts/install_skills.ps1 +10 -17
- package/scripts/install_skills.sh +10 -10
- package/shadow-api-model-research/SKILL.md +114 -0
- package/shadow-api-model-research/agents/openai.yaml +4 -0
- package/shadow-api-model-research/references/fingerprinting-playbook.md +69 -0
- package/shadow-api-model-research/references/request-shape-checklist.md +44 -0
- package/systematic-debug/LICENSE +1 -1
- package/text-to-short-video/LICENSE +1 -1
- package/version-release/LICENSE +1 -1
- package/video-production/LICENSE +16 -13
package/scheduled-runtime-health-check/README.md

@@ -1,15 +1,16 @@
 # Scheduled Runtime Health Check
 
-An agent skill for
+An agent skill for running user-requested commands in a background terminal, optionally inside a bounded time window with post-run log analysis.
 
-This skill helps agents
+This skill helps agents use a background terminal to run a requested command immediately or in a chosen time window, and optionally summarize evidence-backed findings from the resulting logs via `analyse-app-logs`.
 
 ## What this skill provides
 
-- A workflow for one-off or recurring runtime
+- A workflow for one-off or recurring background-terminal runtime checks.
+- An optional code-update step before execution.
 - Clear separation between scheduling, runtime observation, shutdown, and diagnosis.
 - A bounded log window so startup, steady-state, and shutdown evidence stay correlated.
--
+- Optional module-level health classification: `healthy`, `degraded`, `failed`, or `unknown`.
 - Escalation to `improve-observability` when existing telemetry is insufficient.
 
 ## Repository structure
@@ -33,22 +34,28 @@ cp -R scheduled-runtime-health-check "$CODEX_HOME/skills/scheduled-runtime-healt
 Invoke the skill in your prompt:
 
 ```text
-Use $scheduled-runtime-health-check to
+Use $scheduled-runtime-health-check to use a background terminal to run `docker compose up app worker`.
+
+Run it in this specific time window: 2026-03-18 22:00 to 2026-03-19 04:00 Asia/Hong_Kong.
+
+After the run completes, explain your findings from the logs.
 ```
 
 Best results come from including:
 
 - workspace path
--
+- execution command
 - stop command or acceptable shutdown method
-- schedule and timezone
-- duration
+- schedule/time window and timezone
+- duration when bounded
 - readiness signal
 - relevant log files
-- modules or subsystems to assess
+- modules or subsystems to assess when findings are requested
+- whether the repository should be updated first, only if you want that behavior
 
 If no trustworthy start command is documented, the agent should derive it from the repository or ask only for that missing command.
 If the user requests a future start time and no reliable scheduler is available, the agent should report that limitation instead of starting the run early.
+If an optional update step was requested but the repository cannot be updated safely because the worktree is dirty or no upstream is configured, the agent should stop and report that exact blocker instead of forcing an update.
 
 ## Example
 
@@ -58,13 +65,14 @@ If the user requests a future start time and no reliable scheduler is available,
 Use $scheduled-runtime-health-check for this repository.
 
 Workspace: /workspace/my-app
-
+Execution command: docker compose up app worker
 Stop command: docker compose down
 Schedule: 2026-03-18 22:00 Asia/Hong_Kong
 Duration: 6 hours
 Readiness signal: GET http://127.0.0.1:3000/health returns 200
 Logs: docker compose logs, logs/app.log, logs/worker.log
 Modules to assess: api, worker, scheduler
+After completion: explain findings from the logs
 ```
 
 ### Expected response shape
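The example run contract names a readiness signal (`GET http://127.0.0.1:3000/health` returns 200). A minimal sketch of how a runner could poll such a signal before treating the service as started; the function name, probe interface, and messages are illustrative assumptions, not part of the package:

```shell
#!/bin/sh
# wait_for_ready PROBE DEADLINE_SECONDS
# Runs PROBE once per second until it exits 0 (ready) or the deadline passes.
# PROBE is any shell command, e.g. a curl against the documented health URL.
wait_for_ready() {
  probe="$1"
  deadline="$2"
  elapsed=0
  while [ "$elapsed" -lt "$deadline" ]; do
    if eval "$probe" >/dev/null 2>&1; then
      echo "ready after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "readiness timeout after ${deadline}s" >&2
  return 1
}

# Example probe matching the README's readiness signal (assumes curl exists):
#   wait_for_ready "curl -sf http://127.0.0.1:3000/health" 120
```

A failed poll maps to the skill's "failed startup window" outcome rather than an open-ended wait.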
@@ -73,21 +81,24 @@ Modules to assess: api, worker, scheduler
 1) Run summary
 - Started at 2026-03-18 22:00 HKT and stopped at 2026-03-19 04:00 HKT after a 6-hour bounded run.
 
-2)
+2) Execution result
+- The background terminal completed the requested run workflow and kept the services up for the full window.
+
+3) Module health
 - api: healthy, served readiness checks and no error bursts were observed.
 - worker: degraded, repeated timeout warnings increased after 01:20 HKT.
 - scheduler: unknown, no positive execution signal was emitted during the window.
 
-
+4) Confirmed issues
 - Reuse evidence-backed findings from $analyse-app-logs.
 
-
+5) Potential issues and validation needed
 - Scheduler may not be firing jobs; add a per-job execution log or metric to confirm.
 
-
+6) Observability gaps
 - Missing correlation IDs between api requests and worker jobs.
 
-
+7) Automation or scheduler status
 - One bounded scheduled run completed and no further cleanup is required.
 ```
 
package/scheduled-runtime-health-check/SKILL.md

@@ -1,40 +1,51 @@
 ---
 name: scheduled-runtime-health-check
-description:
+description: Use a background terminal to run a user-specified command immediately or in a requested time window, and optionally explain findings from the captured logs after the run. Use when users want timed project execution, bounded runtime checks, or post-run log-based findings.
 ---
 
 # Scheduled Runtime Health Check
 
 ## Dependencies
 
-- Required: `analyse-app-logs` for
+- Required: `analyse-app-logs` when the user asks for post-run log findings or when the observed run needs evidence-backed diagnosis.
 - Conditional: `improve-observability` when current logs cannot prove module health or root cause.
 - Optional: `open-github-issue` indirectly through `analyse-app-logs` when confirmed issues should be published.
 - Fallback: If no scheduler or automation capability is available for the requested future start time, stop and report that scheduling could not be created; only run immediately when the user explicitly allows an immediate bounded observation instead of a timed start.
 
 ## Standards
 
-- Evidence: Anchor every conclusion to the
-- Execution: Collect the run contract,
-- Quality: Keep scheduling and shutdown deterministic
-- Output: Return the run configuration, module health by area, confirmed issues, potential issues, observability gaps, and
+- Evidence: Anchor every conclusion to the requested command, execution window, startup/shutdown timestamps, captured logs, and concrete runtime signals.
+- Execution: Collect the run contract, use a background terminal, optionally update the code only when the user asks, execute the requested command immediately or in the requested window, capture logs, stop cleanly when bounded, then delegate log review to `analyse-app-logs` only when findings are requested or needed.
+- Quality: Keep scheduling, execution, and shutdown deterministic; separate confirmed findings from hypotheses; and mark each assessed module healthy/degraded/failed/unknown with reasons.
+- Output: Return the run configuration, execution status, log locations, optional code-update result, optional module health by area, confirmed issues, potential issues, observability gaps, and scheduler status when applicable.
 
 ## Overview
 
-Use this skill when the user wants an agent to:
+Use this skill when the user wants an agent to do work in this shape:
 
--
--
--
--
--
-- identify confirmed problems and potential risks from the observed run
+- use a background terminal for the whole run
+- execute a specific command such as `npm run dev`, `docker compose up`, or another repo-defined entrypoint
+- optionally update the project before execution when the user explicitly asks
+- optionally run it inside a specific time window
+- optionally wait for the run to finish and then explain findings from the logs
 
-
+Canonical task shape:
+
+`Use $scheduled-runtime-health-check to use a background terminal to run <command>.`
+
+Optional suffixes:
+
+- `Before running, update this project to the latest safe code state.`
+- `Run it in this specific time window: <window>.`
+- `After the run completes, explain your findings from the logs.`
+
+This skill is an orchestration layer. It owns the background terminal session, optional code-update step, optional scheduling, bounded runtime, log capture, and optional module-level health summary. It delegates deep log diagnosis to `analyse-app-logs` only when the user asks for findings or the run clearly needs evidence-backed analysis.
 
 ## Core principles
 
 - Prefer one bounded observation window over open-ended monitoring.
+- Use one dedicated background terminal session per requested run so execution and logs stay correlated.
+- Treat code update as optional and only perform it when the user explicitly requests it.
 - Treat startup, steady-state, and shutdown as part of the same investigation.
 - Do not call a module healthy unless there is at least one positive signal for it.
 - Separate scheduler failures, boot failures, runtime failures, and shutdown failures.
@@ -43,44 +54,45 @@ This skill is an orchestration layer. It owns the schedule, bounded runtime, log
 ## Required workflow
 
 1. Define the run contract
-   - Confirm or derive the workspace,
+   - Confirm or derive the workspace, execution command, optional code-update step, optional schedule, optional duration, readiness signal, log locations, and whether post-run findings are required.
    - Derive commands from trustworthy sources first: `package.json`, `Makefile`, `docker-compose.yml`, `Procfile`, scripts, or project docs.
-   - If no trustworthy
-2.
-   -
-   -
-   -
+   - If no trustworthy execution command or stop method can be found, stop and ask only for the missing command rather than guessing.
+2. Prepare the background terminal run
+   - Use a dedicated background terminal session for the whole workflow.
+   - Create a dedicated run folder and record timezone, cwd, requested command, terminal session identifier, and any requested start/end boundaries.
+   - Capture stdout and stderr from the beginning of the session so the full run stays auditable.
+3. Optionally update to the latest safe code state
+   - Only do this step when the user explicitly asked to update the project before execution.
+   - Prefer the repository's normal safe update path, such as `git pull --ff-only`, or the project's documented sync command if one exists.
+   - Record the commit before and after the update.
+   - If the worktree is dirty, the branch has no upstream, or the update cannot be done safely, stop and report the exact blocker instead of guessing or forcing a merge.
+4. Choose the execution timing
+   - If the user gave a specific time window, schedule or delay the same background-terminal run to start in that window.
+   - If no time window was requested, run immediately after setup, or after the optional update step if one was requested.
    - If the user requested a future start time and no reliable scheduler is available, fail closed and report the scheduling limitation instead of starting early.
-
-   -
-   -
-   -
-
-   -
-   -
-   -
-
-   -
-   - For each requested module or subsystem, gather at least one positive signal and any degradation signal in the same window.
-   - If the user did not list modules explicitly, infer the major runtime modules from the repository structure and runtime processes.
-6. Stop cleanly at the end of the window
-   - Use the project's normal shutdown path first.
-   - If graceful stop fails, escalate deterministically and record the exact stop sequence and timestamps.
-   - Treat abnormal shutdown behavior as a health signal, not just an operational detail.
-7. Delegate bounded log analysis
+5. Run and capture readiness
+   - Execute the requested command in the same background terminal.
+   - Wait for a concrete readiness signal when the command is expected to stay up, such as a health endpoint, listening-port log, worker boot line, or queue-consumer ready message.
+   - If readiness never arrives, stop the run, preserve logs, and treat it as a failed startup window.
+6. Observe and stop when bounded
+   - If a bounded window or explicit stop time was requested, keep the process running only for that agreed window and then stop it cleanly.
+   - Track crashes, restarts, retry storms, timeout bursts, stuck jobs, resource pressure, and repeated warnings during the run.
+   - Use the project's normal shutdown path first; if graceful stop fails, escalate deterministically and record the exact stop sequence and timestamps.
+7. Explain findings from logs when requested
+   - If the user asked for findings after completion, wait for the run to finish before analyzing the captured logs.
    - Invoke `analyse-app-logs` on only the captured runtime window.
    - Pass the service or module names, environment, timezone, run folder, relevant log files, and the exact start/end boundaries.
    - Reuse its confirmed issues, hypotheses, and monitoring improvements instead of rewriting a separate incident workflow.
-8. Produce the
-   -
-   -
-   -
+8. Produce the final report
+   - Always summarize the actual command executed, actual start/end timestamps, execution status, and log locations.
+   - Include the code-update result only when an update step was requested.
+   - When findings were requested, classify each relevant module as `healthy`, `degraded`, `failed`, or `unknown` with concrete evidence and separate observed issues from risks that still need validation.
 
 ## Scheduling rules
 
 - Use the user's locale timezone when configuring scheduled tasks.
 - Name scheduled jobs clearly so the user can recognize start, stop, and analysis ownership.
-- Prefer recurring schedules only when the user explicitly wants repeated
+- Prefer recurring schedules only when the user explicitly wants repeated checks; otherwise create a one-off bounded run.
 - If the host provides agent automations, use them before inventing project-local scheduling files.
 - If native automation is unavailable, prefer the smallest reliable OS-level scheduling method already present on the machine.
 - If the request depends on a future start time and no reliable scheduling method exists, do not silently convert the request into an immediate run.
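The workflow's optional update step refuses unsafe updates (dirty worktree, missing upstream) and prefers `git pull --ff-only`. That gate can be sketched as a shell function; the function name and messages are hypothetical, not taken from the package:

```shell
#!/bin/sh
# safe_update REPO_DIR: fast-forward-only update that fails closed.
safe_update() {
  repo="$1"
  # A dirty worktree (including untracked files) blocks the update.
  if [ -n "$(git -C "$repo" status --porcelain)" ]; then
    echo "blocked: worktree is dirty" >&2
    return 1
  fi
  # A branch with no upstream has nothing safe to pull from.
  if ! git -C "$repo" rev-parse --abbrev-ref '@{upstream}' >/dev/null 2>&1; then
    echo "blocked: no upstream configured" >&2
    return 1
  fi
  before=$(git -C "$repo" rev-parse HEAD)
  # --ff-only refuses merges and rebases; divergence is a reported blocker.
  if ! git -C "$repo" pull --ff-only; then
    echo "blocked: non-fast-forward update" >&2
    return 1
  fi
  after=$(git -C "$repo" rev-parse HEAD)
  echo "updated: $before -> $after"
}
```

Recording `before -> after` commits matches the step's requirement to log the commit before and after the update.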
@@ -99,21 +111,26 @@ Absence of errors alone is not enough for `healthy`.
 Use this structure in responses:
 
 1. Run summary
-   - Workspace, schedule, actual start/end timestamps, duration, readiness result, shutdown result, and log locations.
-2.
-   -
-3.
-   -
-4.
-   -
-5.
-   -
-6.
-   -
+   - Workspace, command, schedule if any, actual start/end timestamps, duration if bounded, readiness result, shutdown result if applicable, and log locations.
+2. Execution result
+   - Whether the command completed, stayed up for the requested window, or failed early.
+3. Code update result
+   - Include only when an update step was requested. Record the update command, before/after commit, or the exact blocker.
+4. Module health
+   - Include only when findings were requested or health assessment was part of the task. One entry per module with status (`healthy` / `degraded` / `failed` / `unknown`) and evidence.
+5. Confirmed issues
+   - Include only when log analysis was requested. Reuse evidence-backed findings from `analyse-app-logs`.
+6. Potential issues and validation needed
+   - Include only when log analysis was requested. Risks that appeared in the run but need more evidence.
+7. Observability gaps
+   - Include only when log analysis was requested. Missing logs, metrics, probes, or correlation IDs that blocked diagnosis.
+8. Automation or scheduler status
+   - Include only when a future window or scheduler was involved. Record task identifiers, execution status, and whether future cleanup is needed.
 
 ## Guardrails
 
 - Do not let the project continue running past the agreed window unless the user explicitly asks.
+- Do not perform a code-update step unless the user explicitly asked for it.
 - Do not claim steady-state health from startup-only evidence.
 - Keep the run folder and scheduler metadata so the investigation can be reproduced.
 - If current logs are too weak to judge module health, recommend `improve-observability` instead of stretching the evidence.
package/scheduled-runtime-health-check/agents/openai.yaml

@@ -1,4 +1,4 @@
 interface:
   display_name: "Scheduled Runtime Health Check"
-  short_description: "
-  default_prompt: "Use $scheduled-runtime-health-check to
+  short_description: "Use a background terminal to run a command in a time window and optionally explain findings from logs."
+  default_prompt: "Use $scheduled-runtime-health-check to use a background terminal to run the requested command immediately or in the requested time window, optionally update the project first only when the user asks for that, capture logs through the run, and, when findings are requested, delegate bounded post-run log diagnosis to $analyse-app-logs."
package/scripts/install_skills.ps1

@@ -12,9 +12,9 @@ Usage:
   ./scripts/install_skills.ps1 [codex|openclaw|trae|all]...
 
 Modes:
-  codex
-  openclaw
-  trae
+  codex     Copy skills into ~/.codex/skills
+  openclaw  Copy skills into ~/.openclaw/workspace*/skills
+  trae      Copy skills into ~/.trae/skills
   all       Install all supported targets
 
 Optional environment overrides:
@@ -33,7 +33,7 @@ function Show-Banner {
 @"
 +------------------------------------------+
 |              Apollo Toolkit              |
-| npm installer and skill
+|      npm installer and skill copier      |
 +------------------------------------------+
 "@
 }
@@ -180,7 +180,7 @@ function Remove-PathForce {
     }
 }
 
-function
+function Copy-Skill {
     param(
         [string]$Source,
         [string]$TargetRoot
@@ -192,15 +192,8 @@ function Link-Skill {
     New-Item -ItemType Directory -Path $TargetRoot -Force | Out-Null
     Remove-PathForce -Target $target
 
-
-
-        Write-Host "[linked] $target -> $Source"
-    }
-    catch {
-        # Fallback for environments where symlink permission is restricted.
-        New-Item -Path $target -ItemType Junction -Target $Source -Force | Out-Null
-        Write-Host "[linked-junction] $target -> $Source"
-    }
+    Copy-Item -LiteralPath $Source -Destination $target -Recurse -Force
+    Write-Host "[copied] $Source -> $target"
 }
 
 function Install-Codex {
@@ -215,7 +208,7 @@ function Install-Codex {
 
     Write-Host "Installing to codex: $target"
     foreach ($src in $SkillPaths) {
-
+        Copy-Skill -Source $src -TargetRoot $target
     }
 }
 
@@ -242,7 +235,7 @@ function Install-OpenClaw {
         $skillsDir = Join-Path $workspace.FullName "skills"
         Write-Host "Installing to openclaw workspace: $skillsDir"
         foreach ($src in $SkillPaths) {
-
+            Copy-Skill -Source $src -TargetRoot $skillsDir
         }
     }
 }
@@ -259,7 +252,7 @@ function Install-Trae {
 
     Write-Host "Installing to trae: $target"
     foreach ($src in $SkillPaths) {
-
+        Copy-Skill -Source $src -TargetRoot $target
     }
 }
 
package/scripts/install_skills.sh

@@ -7,9 +7,9 @@ Usage:
   ./scripts/install_skills.sh [codex|openclaw|trae|all]...
 
 Modes:
-  codex
-  openclaw
-  trae
+  codex     Copy skills into ~/.codex/skills
+  openclaw  Copy skills into ~/.openclaw/workspace*/skills
+  trae      Copy skills into ~/.trae/skills
   all       Install all supported targets
 
 Optional environment overrides:
@@ -29,7 +29,7 @@ show_banner() {
   cat <<'BANNER'
 +------------------------------------------+
 |              Apollo Toolkit              |
-| npm installer and skill
+|      npm installer and skill copier      |
 +------------------------------------------+
 BANNER
 }
@@ -74,7 +74,7 @@ collect_skills() {
   fi
 }
 
-
+replace_with_copy() {
   local src="$1"
   local target_root="$2"
   local name target
@@ -86,8 +86,8 @@ replace_with_symlink() {
   if [[ -e "$target" || -L "$target" ]]; then
     rm -rf "$target"
   fi
-
-  echo "[
+  cp -R "$src" "$target"
+  echo "[copied] $src -> $target"
 }
 
 install_codex() {
@@ -96,7 +96,7 @@ install_codex() {
 
   echo "Installing to codex: $codex_skills_dir"
   for src in "${SKILL_PATHS[@]}"; do
-
+    replace_with_copy "$src" "$codex_skills_dir"
   done
 }
 
@@ -120,7 +120,7 @@ install_openclaw() {
     skills_dir="$workspace/skills"
     echo "Installing to openclaw workspace: $skills_dir"
     for src in "${SKILL_PATHS[@]}"; do
-
+      replace_with_copy "$src" "$skills_dir"
     done
   done
 }
@@ -131,7 +131,7 @@ install_trae() {
 
   echo "Installing to trae: $trae_skills_dir"
   for src in "${SKILL_PATHS[@]}"; do
-
+    replace_with_copy "$src" "$trae_skills_dir"
   done
 }
 
@@ -0,0 +1,114 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: shadow-api-model-research
|
|
3
|
+
description: Investigate gated or shadow LLM APIs by capturing real client request shapes, separating request-shape gating from auth/entitlement checks, replaying verified traffic patterns, and attributing the likely underlying model with black-box fingerprinting. Use when users ask how Codex/OpenClaw/custom-provider traffic works, want a capture proxy or replay harness, need LLMMAP-style model comparison, or want a research report on which model a restricted endpoint likely wraps.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Shadow API Model Research
|
|
7
|
+
|
|
8
|
+
## Dependencies
|
|
9
|
+
|
|
10
|
+
- Required: `answering-questions-with-research` for primary-source web verification and code-backed explanations.
|
|
11
|
+
- Conditional: `openclaw-configuration` when the capture path uses OpenClaw custom providers or workspace config edits; `deep-research-topics` when the user wants a formal report, especially PDF output.
|
|
12
|
+
- Optional: none.
|
|
13
|
+
- Fallback: If you cannot inspect either the real client code path or authorized live traffic, stop and report the missing evidence instead of guessing from headers or marketing copy.
|
|
14
|
+
|
|
15
|
+
## Standards
|
|
16
|
+
|
|
17
|
+
- Evidence: Base conclusions on actual client code, captured traffic, official docs, and controlled replay results; do not infer protocol details from memory alone.
|
|
18
|
+
- Execution: Split the job into request-shape capture, replay validation, and model-attribution analysis; treat each as a separate hypothesis gate.
|
|
19
|
+
- Quality: Distinguish request-shape compatibility, auth or entitlement requirements, system-prompt wrapping, and underlying-model behavior; never collapse them into one claim.
|
|
20
|
+
- Output: Return the tested providers, exact capture or replay setup, prompt set or scoring rubric, observed differences, and an explicit confidence statement plus caveats.
|
|
21
|
+
|
|
22
|
+
## Goal
|
|
23
|
+
|
|
24
|
+
Help another agent run lawful, evidence-based shadow-API research without drifting into guesswork about what a gated endpoint checks or which model it wraps.
|
|
25
|
+
|
|
26
|
+
## Workflow
|
|
27
|
+
|
|
28
|
+
### 1. Classify the research ask
|
|
29
|
+
|
|
30
|
+
Decide which of these the user actually needs:
|
|
31
|
+
|
|
32
|
+
- capture the true request shape from a known client
|
|
33
|
+
- configure OpenClaw or another client to hit a controlled endpoint
|
|
34
|
+
- build a replay harness from observed traffic
|
|
35
|
+
- compare the endpoint against known providers with black-box prompts
|
|
36
|
+
- package findings into a concise report
|
|
37
|
+
|
|
38
|
+
If the user is mixing all of them, still execute in that order: capture first, replay second, attribution third.
|
|
39
|
+
|
|
40
|
+
### 2. Verify the real client path before writing any script
|
|
41
|
+
|
|
42
|
+
- Inspect the local client code and active config first.
|
|
43
|
+
- For OpenClaw, load the relevant official docs or local source through `answering-questions-with-research`, and use `openclaw-configuration` if you need to rewire a custom provider for capture.
|
|
44
|
+
- When Codex or another official client is involved, verify current behavior from primary sources and the local installed code when available.
|
|
45
|
+
- Do not claim that a request shape is "Codex-compatible" or "OpenClaw-compatible" until you have either:
|
|
46
|
+
- captured it from the client, or
|
|
47
|
+
- confirmed it from the current implementation and docs.
|
|
48
|
+
|
|
49
|
+
### 3. Capture the true request shape
|
|
50
|
+
|
|
51
|
+
- Read `references/request-shape-checklist.md` before touching the network path.
|
|
52
|
+
- Prefer routing the real client to a capture proxy or controlled upstream you own.
|
|
53
|
+
- Record, at minimum:
|
|
54
|
+
- method
|
|
55
|
+
- path
|
|
56
|
+
- query parameters
|
|
57
|
+
- headers
|
|
58
|
+
- body schema
|
|
59
|
+
- streaming or SSE frame shape
|
|
60
|
+
- retries, timeouts, and backoff behavior
|
|
61
|
+
- any client-added metadata that changes between providers or models
|
|
62
|
+
- Treat aborted turns or partially applied config edits as tainted state; re-check the active config before trusting a capture.
|
|
63
|
+
|
|
64
|
+
### 4. Separate request gating from auth or entitlement
|
|
65
|
+
|
|
66
|
+
- Build explicit hypotheses for what the endpoint may be checking:
|
|
67
|
+
- plain OpenAI-compatible schema only
|
|
68
|
+
- static headers or user-agent shape
|
|
69
|
+
- transport details such as SSE formatting
|
|
70
|
+
- token claims, workspace identity, or other entitlement state
|
|
71
|
+
- Do not tell the user that replaying the request shape is sufficient unless the replay actually works.
|
|
72
|
+
- If the evidence shows the endpoint still rejects cloned traffic, report that the barrier is likely beyond the visible request shape.
|
|
73
|
+
|
|
74
|
+
### 5. Build the replay harness only from observed facts
|
|
75
|
+
|
|
76
|
+
- Read `references/fingerprinting-playbook.md` before implementing the replay phase.
|
|
77
|
+
- Use `.env` or equivalent env-backed config for base URLs, API keys, and provider labels.
|
|
78
|
+
- Mirror only the fields that were actually observed from the client.
|
|
79
|
+
- Keep capture and replay scripts separate unless there is a strong reason to combine them.
|
|
80
|
+
- Preserve the observed stream mode; do not silently downgrade SSE to non-streaming or vice versa.
|
|
81
|
+
|
|
82
|
+

### 6. Run black-box fingerprinting

- Compare the target endpoint against one or more control providers with known or documented models.
- Use a prompt matrix that spans:
  - coding or tool-use style
  - factual knowledge questions with externally verified answers
  - refusal and policy behavior
  - instruction-following edge cases
  - long-context or truncation behavior when relevant
- When building factual question sets, verify the answer key from primary sources or fresh web research instead of relying on memory.
- If the user wants LLMMAP-style comparison, keep the benchmark inputs fixed across providers and score each response on the same rubric.
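
One possible shape for such a matrix runner, with stub callables standing in for the target and a documented control provider. The prompts, provider names, and the `ask` callable signature are illustrative assumptions for the sketch, not part of the skill.

```python
# Illustrative prompt-matrix runner: the SAME fixed prompts go to every
# provider, target and controls alike, so responses stay comparable.
from typing import Callable, Dict, List

PROMPT_MATRIX: List[dict] = [
    {"category": "coding", "prompt": "Write a function that reverses a list."},
    {"category": "factual", "prompt": "What year was RFC 8259 published?"},
    {"category": "refusal", "prompt": "(borderline prompt goes here)"},
]


def run_matrix(providers: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Run every prompt against every provider and tag each result row."""
    results = []
    for case in PROMPT_MATRIX:
        for name, ask in providers.items():
            results.append(
                {
                    "provider": name,
                    "category": case["category"],
                    "prompt": case["prompt"],
                    "response": ask(case["prompt"]),
                }
            )
    return results


# Stub callables; in real use these would wrap the replay harness and the
# control provider's documented API.
rows = run_matrix(
    {
        "target": lambda p: f"target says: {p[:10]}",
        "control-known-model": lambda p: f"control says: {p[:10]}",
    }
)
```

Because the matrix is fixed, every provider produces one row per prompt and the rows can be scored on the same rubric afterwards.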

### 7. Report with confidence and caveats

- Summarize:
  - what was captured
  - what replayed successfully
  - which differences were protocol-level versus model-level
  - the most likely underlying model family
  - the confidence level and why
- If system prompts or provider-side wrappers likely distort the output, say so explicitly and lower confidence accordingly.
- If the user wants a report artifact, hand off to `deep-research-topics` after the evidence has been collected.

## References

- `references/request-shape-checklist.md` for the capture and replay evidence checklist.
- `references/fingerprinting-playbook.md` for comparison design, scoring dimensions, and report structure.

## Guardrails

- Keep the work on systems the user is authorized to inspect or test.
- Do not present speculation about hidden auth checks as established fact.
- Do not over-index on one response; model attribution needs repeated prompts and multiple signal types.

@@ -0,0 +1,4 @@

interface:
  display_name: "Shadow API Model Research"
  short_description: "Capture gated client traffic and attribute likely model families"
  default_prompt: "Use $shadow-api-model-research when the task is to inspect Codex/OpenClaw/custom-provider request shapes, build a capture or replay workflow, compare a restricted endpoint against control providers, or estimate the likely underlying model with black-box fingerprinting."

@@ -0,0 +1,69 @@

# Fingerprinting Playbook

Use this playbook after you have trustworthy captured traffic or a validated replay harness.

## Comparison design

- Keep prompts, temperature-like settings, and stream mode fixed across providers.
- Prefer at least one documented control provider with a known model family.
- Run multiple prompt categories; one category is not enough for attribution.

## Recommended prompt categories

### 1. Factual knowledge

- Use questions with fresh, externally verifiable answers.
- Build the answer key from current primary sources or credible web verification.
- Score for correctness, completeness, and unsupported claims.

### 2. Coding style

- Use short implementation tasks and bug-fix prompts.
- Compare code structure, caution level, and explanation style.

### 3. Instruction following

- Use prompts with explicit formatting or ranking constraints.
- Compare compliance, stability, and unnecessary extra content.

### 4. Refusal and policy behavior

- Use borderline prompts that should trigger a recognizable refusal or safe alternative.
- Compare refusal style, redirect wording, and partial compliance behavior.

### 5. Long-context behavior

- Only run this when the target is expected to support larger contexts.
- Compare truncation, summarization drift, and consistency across later references.

## Scoring dimensions

Score each response on a fixed rubric, for example:

- factual accuracy
- completeness
- instruction compliance
- reasoning clarity
- code quality
- refusal consistency
- verbosity control
- latency or throughput when that matters

Use the same rubric for every provider and every prompt.
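
A minimal sketch of applying one rubric uniformly. The 0-5 per-dimension scale and the aggregation by mean are assumptions for the sketch; latency is left out because the list above makes it conditional.

```python
# Fixed-rubric aggregation: every provider is scored on the same dimensions,
# and a row missing any dimension is rejected rather than silently averaged.
from statistics import mean

RUBRIC = (
    "factual_accuracy", "completeness", "instruction_compliance",
    "reasoning_clarity", "code_quality", "refusal_consistency",
    "verbosity_control",
)


def aggregate(scores_by_provider: dict) -> dict:
    """Average each provider's per-prompt scores across the shared rubric."""
    summary = {}
    for provider, rows in scores_by_provider.items():
        for row in rows:
            missing = set(RUBRIC) - set(row)
            if missing:
                raise ValueError(f"{provider} scored without: {sorted(missing)}")
        summary[provider] = {
            dim: round(mean(row[dim] for row in rows), 2) for dim in RUBRIC
        }
    return summary


# Two prompts per provider, scored on a hypothetical 0-5 scale.
demo = aggregate(
    {
        "target": [dict.fromkeys(RUBRIC, 4), dict.fromkeys(RUBRIC, 2)],
        "control": [dict.fromkeys(RUBRIC, 3), dict.fromkeys(RUBRIC, 3)],
    }
)
```

The hard failure on missing dimensions is what keeps cross-provider averages honest: a provider never gets a better mean by simply skipping a dimension.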

## Confidence discipline

- High confidence needs multiple converging signals across categories.
- Medium confidence fits cases where the target tracks one family strongly but wrappers may distort style.
- Low confidence fits cases where the protocol was captured but output signals remain mixed.
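
These tiers could be operationalized roughly as follows; the agreement thresholds are illustrative assumptions, not calibrated values.

```python
# Toy mapping from cross-category signal convergence to a confidence label.
def confidence_label(categories_run: int, categories_agreeing: int,
                     wrapper_suspected: bool) -> str:
    """Translate converging cross-category signals into a reportable level."""
    if categories_run < 2:
        return "low"  # one category is never enough for attribution
    agreement = categories_agreeing / categories_run
    if agreement >= 0.8 and not wrapper_suspected:
        return "high"
    if agreement >= 0.6:
        return "medium"  # strong family match, but wrappers may distort style
    return "low"
```

Note how a suspected wrapper caps the label at medium even with strong agreement, matching the "wrappers may distort style" caveat above.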

## Suggested report structure

1. Research objective
2. Capture setup
3. Replay validation status
4. Prompt matrix and controls
5. Scoring rubric
6. Comparative findings
7. Most likely model family
8. Caveats, including wrappers and system prompts