@buaa_smat/hometrans 0.1.6 → 0.1.7

@@ -0,0 +1,308 @@
1
+ > ## Documentation Index
2
+ > Fetch the complete documentation index at: https://agentskills.io/llms.txt
3
+ > Use this file to discover all available pages before exploring further.
4
+
5
+ # Using scripts in skills
6
+
7
+ > How to run commands and bundle executable scripts in your skills.
8
+
9
+ Skills can instruct agents to run shell commands and bundle reusable scripts in a `scripts/` directory. This guide covers one-off commands, self-contained scripts with their own dependencies, and how to design script interfaces for agentic use.
10
+
11
+ ## One-off commands
12
+
13
+ When an existing package already does what you need, you can reference it directly in your `SKILL.md` instructions without a `scripts/` directory. Many ecosystems provide tools that auto-resolve dependencies at runtime.
14
+
15
+ <Tabs sync={false}>
16
+ <Tab title="uvx">
17
+ [uvx](https://docs.astral.sh/uv/guides/tools/) runs Python packages in isolated environments with aggressive caching. It ships with [uv](https://docs.astral.sh/uv/).
18
+
19
+ ```bash theme={null}
20
+ uvx ruff@0.8.0 check .
21
+ uvx black@24.10.0 .
22
+ ```
23
+
24
+ * Not bundled with Python — requires a separate install.
25
+ * Fast. Caches aggressively so repeat runs are near-instant.
26
+ </Tab>
27
+
28
+ <Tab title="pipx">
29
+ [pipx](https://pipx.pypa.io/) runs Python packages in isolated environments. Available via OS package managers (`apt install pipx`, `brew install pipx`).
30
+
31
+ ```bash theme={null}
32
+ pipx run 'black==24.10.0' .
33
+ pipx run 'ruff==0.8.0' check .
34
+ ```
35
+
36
+ * Not bundled with Python — requires a separate install.
37
+ * A mature alternative to `uvx`. While `uvx` has become the standard recommendation, `pipx` remains a reliable option with broader OS package manager availability.
38
+ </Tab>
39
+
40
+ <Tab title="npx">
41
+ [npx](https://docs.npmjs.com/cli/commands/npx) runs npm packages, downloading them on demand. It ships with npm (which ships with Node.js).
42
+
43
+ ```bash theme={null}
44
+ npx eslint@9 --fix .
45
+ npx create-vite@6 my-app
46
+ ```
47
+
48
+ * Bundled with Node.js — no extra install needed.
49
+ * Downloads the package, runs it, and caches it for future use.
50
+ * Pin versions with `npx package@version` for reproducibility.
51
+ </Tab>
52
+
53
+ <Tab title="bunx">
54
+ [bunx](https://bun.sh/docs/cli/bunx) is Bun's equivalent of `npx`. It ships with [Bun](https://bun.sh/).
55
+
56
+ ```bash theme={null}
57
+ bunx eslint@9 --fix .
58
+ bunx create-vite@6 my-app
59
+ ```
60
+
61
+ * Drop-in replacement for `npx` in Bun-based environments.
62
+ * Only appropriate when the user's environment has Bun rather than Node.js.
63
+ </Tab>
64
+
65
+ <Tab title="deno run">
66
+ [deno run](https://docs.deno.com/runtime/reference/cli/run/) runs scripts directly from URLs or specifiers. It ships with [Deno](https://deno.com/).
67
+
68
+ ```bash theme={null}
69
+ deno run npm:create-vite@6 my-app
70
+ deno run --allow-read npm:eslint@9 -- --fix .
71
+ ```
72
+
73
+ * Permission flags (`--allow-read`, etc.) are required for filesystem/network access.
74
+ * Use `--` to separate Deno flags from the tool's own flags.
75
+ </Tab>
76
+
77
+ <Tab title="go run">
78
+ [go run](https://pkg.go.dev/cmd/go#hdr-Compile_and_run_Go_program) compiles and runs Go packages directly. It is built into the `go` command.
79
+
80
+ ```bash theme={null}
81
+ go run golang.org/x/tools/cmd/goimports@v0.28.0 .
82
+ go run github.com/golangci/golangci-lint/cmd/golangci-lint@v1.62.0 run
83
+ ```
84
+
85
+ * Built into Go — no extra tooling needed.
86
+ * Pin versions or use `@latest` to make the command explicit.
87
+ </Tab>
88
+ </Tabs>
89
+
90
+ **Tips for one-off commands in skills:**
91
+
92
+ * **Pin versions** (e.g., `npx eslint@9.0.0`) so the command behaves the same over time.
93
+ * **State prerequisites** in your `SKILL.md` (e.g., "Requires Node.js 18+") rather than assuming the agent's environment has them. For runtime-level requirements, use the [`compatibility` frontmatter field](/specification#compatibility-field).
94
+ * **Move complex commands into scripts.** A one-off command works well when you're invoking a tool with a few flags. When a command grows complex enough that it's hard to get right on the first try, a tested script in `scripts/` is more reliable.
95
+
96
+ ## Referencing scripts from `SKILL.md`
97
+
98
+ Use **relative paths from the skill directory root** to reference bundled files. The agent resolves these paths automatically — no absolute paths needed.
99
+
100
+ List available scripts in your `SKILL.md` so the agent knows they exist:
101
+
102
+ ```markdown SKILL.md theme={null}
103
+ ## Available scripts
104
+
105
+ - **`scripts/validate.sh`** — Validates configuration files
106
+ - **`scripts/process.py`** — Processes input data
107
+ ```
108
+
109
+ Then instruct the agent to run them:
110
+
111
+ ````markdown SKILL.md theme={null}
112
+ ## Workflow
113
+
114
+ 1. Run the validation script:
115
+ ```bash
116
+ bash scripts/validate.sh "$INPUT_FILE"
117
+ ```
118
+
119
+ 2. Process the results:
120
+ ```bash
121
+ python3 scripts/process.py --input results.json
122
+ ```
123
+ ````
124
+
125
+ <Note>
126
+ The same relative-path convention applies in support files such as `references/*.md`: script paths in code blocks are relative to the **skill directory root**, because the agent runs commands from there.
127
+ </Note>
128
+
129
+ ## Self-contained scripts
130
+
131
+ When you need reusable logic, bundle a script in `scripts/` that declares its own dependencies inline. The agent can run the script with a single command — no separate manifest file or install step required.
132
+
133
+ Several languages support inline dependency declarations:
134
+
135
+ <Tabs sync={false}>
136
+ <Tab title="Python">
137
+ [PEP 723](https://peps.python.org/pep-0723/) defines a standard format for inline script metadata. Declare dependencies in a TOML block inside `# ///` markers:
138
+
139
+ ```python scripts/extract.py theme={null}
140
+ # /// script
141
+ # dependencies = [
142
+ # "beautifulsoup4",
143
+ # ]
144
+ # ///
145
+
146
+ from bs4 import BeautifulSoup
147
+
148
+ html = '<html><body><h1>Welcome</h1><p class="info">This is a test.</p></body></html>'
149
+ print(BeautifulSoup(html, "html.parser").select_one("p.info").get_text())
150
+ ```
151
+
152
+ Run with [uv](https://docs.astral.sh/uv/) (recommended):
153
+
154
+ ```bash theme={null}
155
+ uv run scripts/extract.py
156
+ ```
157
+
158
+ `uv run` creates an isolated environment, installs the declared dependencies, and runs the script. [pipx](https://pipx.pypa.io/) (`pipx run scripts/extract.py`) also supports PEP 723.
159
+
160
+ * Constrain versions with [PEP 508](https://peps.python.org/pep-0508/) specifiers such as `"beautifulsoup4>=4.12,<5"`, or pin an exact version with `==`.
161
+ * Use `requires-python` to constrain the Python version.
162
+ * Use `uv lock --script` to create a lockfile for full reproducibility.
163
+ </Tab>
164
+
165
+ <Tab title="Deno">
166
+ Deno's `npm:` and `jsr:` import specifiers make every script self-contained by default:
167
+
168
+ ```typescript scripts/extract.ts theme={null}
169
+ #!/usr/bin/env -S deno run
170
+
171
+ import * as cheerio from "npm:cheerio@1.0.0";
172
+
173
+ const html = `<html><body><h1>Welcome</h1><p class="info">This is a test.</p></body></html>`;
174
+ const $ = cheerio.load(html);
175
+ console.log($("p.info").text());
176
+ ```
177
+
178
+ ```bash theme={null}
179
+ deno run scripts/extract.ts
180
+ ```
181
+
182
+ * Use `npm:` for npm packages, `jsr:` for Deno-native packages.
183
+ * Version specifiers follow semver: `@1.0.0` (exact), `@^1.0.0` (compatible).
184
+ * Dependencies are cached globally. Use `--reload` to force re-fetch.
185
+ * Packages with native addons (node-gyp) may not work — packages that ship pre-built binaries work best.
186
+ </Tab>
187
+
188
+ <Tab title="Bun">
189
+ Bun auto-installs missing packages at runtime when no `node_modules` directory is found. Pin versions directly in the import path:
190
+
191
+ ```typescript scripts/extract.ts theme={null}
192
+ #!/usr/bin/env bun
193
+
194
+ import * as cheerio from "cheerio@1.0.0";
195
+
196
+ const html = `<html><body><h1>Welcome</h1><p class="info">This is a test.</p></body></html>`;
197
+ const $ = cheerio.load(html);
198
+ console.log($("p.info").text());
199
+ ```
200
+
201
+ ```bash theme={null}
202
+ bun run scripts/extract.ts
203
+ ```
204
+
205
+ * No `package.json` or `node_modules` needed. TypeScript works natively.
206
+ * Packages are cached globally. First run downloads; subsequent runs are near-instant.
207
+ * If a `node_modules` directory exists anywhere up the directory tree, auto-install is disabled and Bun falls back to standard Node.js resolution.
208
+ </Tab>
209
+
210
+ <Tab title="Ruby">
211
+ Bundler has shipped with Ruby since 2.6. Use `bundler/inline` to declare gems directly in the script:
212
+
213
+ ```ruby scripts/extract.rb theme={null}
214
+ require 'bundler/inline'
215
+
216
+ gemfile do
217
+ source 'https://rubygems.org'
218
+ gem 'nokogiri'
219
+ end
220
+
221
+ html = '<html><body><h1>Welcome</h1><p class="info">This is a test.</p></body></html>'
222
+ doc = Nokogiri::HTML(html)
223
+ puts doc.at_css('p.info').text
224
+ ```
225
+
226
+ ```bash theme={null}
227
+ ruby scripts/extract.rb
228
+ ```
229
+
230
+ * Pin versions explicitly (`gem 'nokogiri', '~> 1.16'`) — there is no lockfile.
231
+ * An existing `Gemfile` in the working directory, or a `BUNDLE_GEMFILE` environment variable, can interfere.
232
+ </Tab>
233
+ </Tabs>
234
+
235
+ ## Designing scripts for agentic use
236
+
237
+ When an agent runs your script, it reads stdout and stderr to decide what to do next. A few design choices make scripts dramatically easier for agents to use.
238
+
239
+ ### Avoid interactive prompts
240
+
241
+ This is a hard requirement of the agent execution environment. Agents operate in non-interactive shells — they cannot respond to TTY prompts, password dialogs, or confirmation menus. A script that blocks on interactive input will hang indefinitely.
242
+
243
+ Accept all input via command-line flags, environment variables, or stdin:
244
+
245
+ ```
246
+ # Bad: hangs waiting for input
247
+ $ python scripts/deploy.py
248
+ Target environment: _
249
+
250
+ # Good: clear error with guidance
251
+ $ python scripts/deploy.py
252
+ Error: --env is required. Options: development, staging, production.
253
+ Usage: python scripts/deploy.py --env staging --tag v1.2.3
254
+ ```
255
+
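One way to get the "good" behavior above is to declare every input as a required flag, so that a missing value produces a usage error instead of a prompt. The following is a minimal sketch using Python's standard `argparse`; the function names `build_deploy_parser` and `main` are hypothetical and only illustrate the pattern, and the exact error wording comes from `argparse` rather than matching the sample output verbatim.

```python
import argparse

# Closed set of valid environments, taken from the example error message above.
ENVIRONMENTS = ["development", "staging", "production"]


def build_deploy_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for scripts/deploy.py: every input is a flag."""
    parser = argparse.ArgumentParser(
        prog="scripts/deploy.py",
        description="Deploy a tagged build to an environment.",
    )
    # required=True makes argparse print a usage error and exit with status 2
    # when the flag is missing, rather than waiting on stdin.
    parser.add_argument("--env", required=True, choices=ENVIRONMENTS,
                        help="Target environment")
    parser.add_argument("--tag", required=True,
                        help="Release tag, for example v1.2.3")
    return parser


def main(argv=None) -> int:
    args = build_deploy_parser().parse_args(argv)
    print(f"Deploying {args.tag} to {args.env}")
    return 0
```

In a real script, the entry point would typically end with `raise SystemExit(main())` so the return value becomes the process exit code.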
256
+ ### Document usage with `--help`
257
+
258
+ `--help` output is the primary way an agent learns your script's interface. Include a brief description, available flags, and usage examples:
259
+
260
+ ```
261
+ Usage: scripts/process.py [OPTIONS] INPUT_FILE
262
+
263
+ Process input data and produce a summary report.
264
+
265
+ Options:
266
+ --format FORMAT Output format: json, csv, table (default: json)
267
+ --output FILE Write output to FILE instead of stdout
268
+ --verbose Print progress to stderr
269
+
270
+ Examples:
271
+ scripts/process.py data.csv
272
+ scripts/process.py --format csv --output report.csv data.csv
273
+ ```
274
+
275
+ Keep it concise — the output enters the agent's context window alongside everything else it's working with.
276
+
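Help text along these lines can be generated rather than hand-written. The sketch below assumes the script uses Python's standard `argparse`; `build_process_parser` is a hypothetical name, and the generated layout will differ slightly from the hand-formatted sample above.

```python
import argparse

# Usage examples shown at the bottom of --help output.
EXAMPLES = """\
Examples:
  scripts/process.py data.csv
  scripts/process.py --format csv --output report.csv data.csv
"""


def build_process_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for scripts/process.py."""
    parser = argparse.ArgumentParser(
        prog="scripts/process.py",
        description="Process input data and produce a summary report.",
        epilog=EXAMPLES,
        # Preserve the line breaks in the description and examples as written.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("input_file", metavar="INPUT_FILE")
    parser.add_argument("--format", choices=["json", "csv", "table"],
                        default="json", help="Output format (default: json)")
    parser.add_argument("--output", metavar="FILE",
                        help="Write output to FILE instead of stdout")
    parser.add_argument("--verbose", action="store_true",
                        help="Print progress to stderr")
    return parser
```

Calling `build_process_parser().format_help()` returns the help text as a string, which makes it easy to check its length before shipping the script.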
277
+ ### Write helpful error messages
278
+
279
+ When an agent gets an error, the message directly shapes its next attempt. An opaque "Error: invalid input" wastes a turn. Instead, say what went wrong, what was expected, and what to try:
280
+
281
+ ```
282
+ Error: --format must be one of: json, csv, table.
283
+ Received: "xml"
284
+ ```
285
+
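A small helper can keep such messages consistent across flags. This is an illustrative sketch only; `format_choice_error` and `check_format` are hypothetical names, not part of any library.

```python
ALLOWED_FORMATS = ("json", "csv", "table")


def format_choice_error(flag: str, received: str, allowed) -> str:
    """Build a message stating the rule that was broken and the value received."""
    options = ", ".join(allowed)
    return (
        f"Error: {flag} must be one of: {options}.\n"
        f'       Received: "{received}"'
    )


def check_format(value: str) -> str:
    """Return the value unchanged if valid, otherwise raise with guidance."""
    if value not in ALLOWED_FORMATS:
        raise ValueError(format_choice_error("--format", value, ALLOWED_FORMATS))
    return value
```

Listing the allowed values in the message lets the agent correct the call on its next attempt without running `--help` first.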
286
+ ### Use structured output
287
+
288
+ Prefer structured formats — JSON, CSV, TSV — over free-form text. Structured formats can be consumed by both the agent and standard tools (`jq`, `cut`, `awk`), making your script composable in pipelines.
289
+
290
+ ```
291
+ # Whitespace-aligned — hard to parse programmatically
292
+ NAME STATUS CREATED
293
+ my-service running 2025-01-15
294
+
295
+ # Structured (JSON): unambiguous field boundaries
296
+ {"name": "my-service", "status": "running", "created": "2025-01-15"}
297
+ ```
298
+
299
+ **Separate data from diagnostics:** send structured data to stdout and progress messages, warnings, and other diagnostics to stderr. This lets the agent capture clean, parseable output while still having access to diagnostic information when needed.
300
+
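The split between data and diagnostics might look like the following sketch, which writes one JSON object per line to stdout and progress messages to stderr. The function name `emit_report` and the record fields are assumptions made for illustration.

```python
import json
import sys


def emit_report(records, out=None, err=None):
    """Write records as JSON lines to `out` and diagnostics to `err`."""
    # Resolve the streams at call time so callers (and tests) can redirect them.
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr

    print(f"Processing {len(records)} record(s)", file=err)
    for record in records:
        # One JSON object per line keeps the output easy to pipe into jq.
        print(json.dumps(record), file=out)
    print("Done", file=err)
```

With this layout, redirecting stderr (for example `2>/dev/null`) leaves only parseable JSON on stdout.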
301
+ ### Further considerations
302
+
303
+ * **Idempotency.** Agents may retry commands. "Create if not exists" is safer than "create and fail on duplicate."
304
+ * **Input constraints.** Reject ambiguous input with a clear error rather than guessing. Use enums and closed sets where possible.
305
+ * **Dry-run support.** For destructive or stateful operations, a `--dry-run` flag lets the agent preview what will happen.
306
+ * **Meaningful exit codes.** Use distinct exit codes for different failure types (not found, invalid arguments, auth failure) and document them in your `--help` output so the agent knows what each code means.
307
+ * **Safe defaults.** Consider whether destructive operations should require explicit confirmation flags (`--confirm`, `--force`) or other safeguards appropriate to the risk level.
308
+ * **Predictable output size.** Many agent harnesses automatically truncate tool output beyond a threshold (e.g., 10-30K characters), potentially losing critical information. If your script might produce large output, default to a summary or a reasonable limit, and support flags like `--offset` so the agent can request more information when needed. Alternatively, if output is large and not amenable to pagination, require agents to pass an `--output` flag that specifies either an output file or `-` to explicitly opt in to stdout.
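
As a rough illustration of the output-size point, a script could return one bounded page of results along with the offset for the next page. In this sketch, `paginate` is a hypothetical helper and the `limit` parameter (which a script might expose as a `--limit` flag) is an assumption; only `--offset` is mentioned above.

```python
def paginate(items, offset=0, limit=50):
    """Return one page of items plus the metadata needed to request the next."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")

    page = items[offset:offset + limit]
    next_offset = offset + len(page)
    return {
        "items": page,
        "total": len(items),
        # None signals that there is nothing left to fetch.
        "next_offset": next_offset if next_offset < len(items) else None,
    }
```

Reporting `total` and `next_offset` in the output tells the agent whether more data exists and which `--offset` value to pass on the next call.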
@@ -0,0 +1,163 @@
1
+ # Report Template
2
+
3
+ ALWAYS use this exact structure for the evaluation report. Replace bracketed placeholders with actual content. Skip sections marked N/A only when the corresponding dimension was entirely skipped (e.g., no scripts).
4
+
5
+ ---
6
+
7
+ # Skill Quality Report: [skill-name]
8
+
9
+ **Evaluated on:** [date]
10
+ **Skill path:** [path]
11
+ **Overall score:** [X.X] / 5.0 — [Rating]
12
+
13
+ ---
14
+
15
+ ## Executive Summary
16
+
17
+ [2-3 sentences: what the skill does, its strongest quality, and the single most impactful improvement it needs.]
18
+
19
+ ---
20
+
21
+ ## Dimension Scores
22
+
23
+ | Dimension | Score | Weight | Weighted |
24
+ |-----------|-------|--------|----------|
25
+ | 1. Spec Compliance | X.X | 10% | X.X |
26
+ | 2. Progressive Disclosure | X.X | 15% | X.X |
27
+ | 3. Content Efficiency | X.X | 20% | X.X |
28
+ | 4. Instruction Quality | X.X | 25% | X.X |
29
+ | 5. Description Effectiveness | X.X | 15% | X.X |
30
+ | 6. Script Quality | X.X / N/A | 5% | X.X |
31
+ | 7. Evaluability | X.X | 10% | X.X |
32
+ | **Overall** | | | **X.X** |
33
+
34
+ ---
35
+
36
+ ## Dimension 1 — Specification Compliance: X.X / 5.0
37
+
38
+ | # | Criterion | Score | Evidence |
39
+ |---|-----------|-------|----------|
40
+ | 1.1 | Frontmatter validity | X | [Specific observation] |
41
+ | 1.2 | Name rules | X | [Specific observation] |
42
+ | 1.3 | Description length | X | [Specific observation] |
43
+ | 1.4 | Directory structure | X | [Specific observation] |
44
+
45
+ **What's good:** [1 sentence]
46
+ **What to improve:** [1 sentence with action]
47
+
48
+ ---
49
+
50
+ ## Dimension 2 — Progressive Disclosure: X.X / 5.0
51
+
52
+ | # | Criterion | Score | Evidence |
53
+ |---|-----------|-------|----------|
54
+ | 2.1 | SKILL.md size | X | [Line count / observation] |
55
+ | 2.2 | Reference usage | X | [Which references exist, are they focused?] |
56
+ | 2.3 | Load triggers | X | [Does the skill say when to load each reference?] |
57
+ | 2.4 | Path conventions | X | [Relative paths? Chain depth?] |
58
+
59
+ **What's good:** [1 sentence]
60
+ **What to improve:** [1 sentence with action]
61
+
62
+ ---
63
+
64
+ ## Dimension 3 — Content Efficiency: X.X / 5.0
65
+
66
+ | # | Criterion | Score | Evidence |
67
+ |---|-----------|-------|----------|
68
+ | 3.1 | No generic filler | X | [Examples of filler found, or confirmation of none] |
69
+ | 3.2 | Coherent scope | X | [Scope assessment] |
70
+ | 3.3 | Appropriate detail | X | [Detail level assessment] |
71
+ | 3.4 | Domain grounding | X | [What domain-specific knowledge is present?] |
72
+
73
+ **What's good:** [1 sentence]
74
+ **What to improve:** [1 sentence with action]
75
+
76
+ ---
77
+
78
+ ## Dimension 4 — Instruction Quality: X.X / 5.0
79
+
80
+ | # | Criterion | Score | Evidence |
81
+ |---|-----------|-------|----------|
82
+ | 4.1 | Calibrated specificity | X | [Examples of good or poor calibration] |
83
+ | 4.2 | Defaults over menus | X | [Where does it provide defaults? Where does it list options?] |
84
+ | 4.3 | Procedural over declarative | X | [Does it teach approach or give specific answers?] |
85
+ | 4.4 | Explains why | X | [Examples of reasoning in instructions] |
86
+ | 4.5 | Gotchas | X | [Presence and quality of gotchas section] |
87
+
88
+ **What's good:** [1 sentence]
89
+ **What to improve:** [1 sentence with action]
90
+
91
+ ---
92
+
93
+ ## Dimension 5 — Description Effectiveness: X.X / 5.0
94
+
95
+ | # | Criterion | Score | Evidence |
96
+ |---|-----------|-------|----------|
97
+ | 5.1 | Imperative framing | X | [Quote the description or its key phrases] |
98
+ | 5.2 | Intent-focused | X | [Mechanics vs. user intent analysis] |
99
+ | 5.3 | Trigger coverage | X | [What triggers are covered? What's missing?] |
100
+ | 5.4 | Conciseness | X | [Character count and density assessment] |
101
+ | 5.5 | Keyword specificity | X | [Would this false-trigger or miss-trigger?] |
102
+
103
+ **What's good:** [1 sentence]
104
+ **What to improve:** [1 sentence with action]
105
+
106
+ ---
107
+
108
+ ## Dimension 6 — Script Quality: [X.X / 5.0 or N/A]
109
+
110
+ [If N/A: "No scripts/ directory — this dimension is not applicable."]
111
+
112
+ [If scored:]
113
+
114
+ | # | Criterion | Score | Evidence |
115
+ |---|-----------|-------|----------|
116
+ | 6.1 | Self-contained | X | [Dependency handling] |
117
+ | 6.2 | Non-interactive | X | [Any TTY prompts?] |
118
+ | 6.3 | Help output | X | [--help quality] |
119
+ | 6.4 | Error messages | X | [Error message quality] |
120
+ | 6.5 | Structured output | X | [Output format assessment] |
121
+
122
+ **What's good:** [1 sentence]
123
+ **What to improve:** [1 sentence with action]
124
+
125
+ ---
126
+
127
+ ## Dimension 7 — Evaluability: X.X / 5.0
128
+
129
+ | # | Criterion | Score | Evidence |
130
+ |---|-----------|-------|----------|
131
+ | 7.1 | Test cases exist | X | [evals/evals.json present? How many cases?] |
132
+ | 7.2 | Realistic prompts | X | [Example prompt quality] |
133
+ | 7.3 | Verifiable assertions | X | [Are assertions specific and checkable?] |
134
+ | 7.4 | Edge case coverage | X | [Any boundary condition tests?] |
135
+
136
+ **What's good:** [1 sentence]
137
+ **What to improve:** [1 sentence with action]
138
+
139
+ ---
140
+
141
+ ## Top 3 Strengths
142
+
143
+ 1. **[Strength title]** — [1-2 sentences with evidence]
144
+ 2. **[Strength title]** — [1-2 sentences with evidence]
145
+ 3. **[Strength title]** — [1-2 sentences with evidence]
146
+
147
+ ---
148
+
149
+ ## Top 3 Improvement Areas
150
+
151
+ 1. **[Issue title]** — [Why it matters] → **Fix:** [Specific, actionable change]
152
+ 2. **[Issue title]** — [Why it matters] → **Fix:** [Specific, actionable change]
153
+ 3. **[Issue title]** — [Why it matters] → **Fix:** [Specific, actionable change]
154
+
155
+ ---
156
+
157
+ ## Quick Wins
158
+
159
+ [2-3 one-line fixes that would take under 5 minutes and yield immediate improvement. If none are obvious, say "No quick wins identified — improvements require structural changes."]
160
+
161
+ ---
162
+
163
+ *Report generated by skill-quality-evaluator*