npm - @buaa_smat/hometrans - Versions diffs - 0.1.7 → 0.1.8 - Mend

@buaa_smat/hometrans 0.1.7 → 0.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/skills/skill-quality-evaluator/SKILL.md DELETED

@@ -1,138 +0,0 @@
----
-name: skill-quality-evaluator
-description: Evaluate the quality of an Agent Skill across multiple dimensions — specification compliance, progressive disclosure, content efficiency, instruction quality, description effectiveness, script quality, and evaluability. Use this skill when the user wants to assess, review, score, or audit a skill, check if a skill follows best practices, get a quality report for a skill, or understand what improvements a skill needs. Triggers on phrases like "evaluate this skill", "review my skill", "skill quality check", "audit SKILL.md", "how good is this skill", "assess skill quality", "评分", "评估skill", "检查skill质量".
----
-# Skill Quality Evaluator
-Evaluate a target skill across 7 dimensions, each broken into concrete sub-criteria. The goal is a structured, evidence-based assessment — not a vibe check.
-## Foundational References
-This skill's evaluation criteria are derived from five authoritative documents bundled in `references/`. Load them when you need deeper context on *why* a criterion matters — especially when the target skill has a borderline score and you need to cite the official guidance.
-| Document | Covers | Most Relevant To |
-|----------|--------|------------------|
-| `references/Specification.md` | Complete Agent Skills format spec: frontmatter fields, naming rules, directory structure, progressive disclosure design | Dimensions 1, 2 |
-| `references/Best-practices-for-skill-creators.md` | How to write well-scoped, calibrated, effective skill instructions — start from real expertise, spend context wisely, calibrate control, patterns (gotchas, templates, checklists, validation loops, plan-validate-execute) | Dimensions 3, 4 |
-| `references/Evaluating-skill-output-quality.md` | Test case design, assertions, grading, benchmarking, human review, the iteration loop | Dimension 7 |
-| `references/Optimizing-skill-descriptions.md` | How the `description` field drives skill triggering, how to write effective descriptions, trigger eval design, train/validation splits, the optimization loop | Dimension 5 |
-| `references/Using-scripts-in-skills.md` | One-off commands, self-contained scripts, designing scripts for agentic use (non-interactive, --help, error messages, structured output) | Dimension 6 |
-The 7-dimension framework below is a synthesis of all five documents. If a criterion feels unclear, open the corresponding document — it contains the rationale and examples.
-## Bundled Resources
-This skill also includes two reference files and one asset. Load them at the specified points — do not load everything upfront.
-| File | When to Load | What It Contains |
-|------|-------------|------------------|
-| `references/scoring-rubric.md` | **Before assigning scores** (after reading the target skill and collecting evidence) | 1-5 scale anchors for every sub-criterion, dimension weight table, and overall score interpretation ranges. Use these anchors to map evidence to scores — never score without consulting this file. |
-| `references/report-template.md` | **After all scores are finalized** (when writing the final report) | Exact Markdown template for the output report. Follow its structure precisely so every report is consistent and machine-readable. |
-| `assets/SKILL_TEMPLATE.md` | **When the user asks to create a new skill** or wants to see what a well-structured skill looks like, or when the evaluation report recommends a structural rewrite | A SKILL.md template with standard frontmatter plus three extra sections: Input Parameters table, Output Content table, and Preconditions (with failure handling patterns). Use this as the starting point for generating new skills. |
-## Workflow
-1. Confirm the skill path from the user. If the user only gave a skill name, search under `./skills/` and `~/.claude/` for it.
-2. Read the target skill's `SKILL.md` fully. Note its directory structure — check for `scripts/`, `references/`, `assets/`, `evals/`.
-3. Read the 7 dimension checklists below to know what evidence to collect. Scan the target skill's auxiliary files (`scripts/`, `references/`, `evals/`) if they exist.
-4. **Load `references/scoring-rubric.md`**, then assign a 1-5 score per sub-criterion using the scale anchors.
-5. Compute dimension averages and the overall weighted score.
-6. **Load `references/report-template.md`** and write the final report following that exact template.
-## Dimension 1 — Specification Compliance
-Check the `SKILL.md` frontmatter and directory against the Agent Skills spec.
-| # | Criterion | What to Check |
-|---|-----------|---------------|
-| 1.1 | Frontmatter validity | YAML parses without errors; `name` and `description` are present |
-| 1.2 | Name rules | 1-64 chars, lowercase alphanumeric + hyphens only, no leading/trailing hyphen, no consecutive hyphens, matches parent directory name |
-| 1.3 | Description length | 1-1024 characters |
-| 1.4 | Directory structure | `SKILL.md` exists at root; optional dirs (`scripts/`, `references/`, `assets/`) used appropriately |
-| 1.5 | No broken references | Every file path mentioned in SKILL.md (`references/*.md`, `scripts/*`, `assets/*`) resolves to an existing file. Every reference/script/asset listed in the skill's own resource tables exists on disk. No 404s. |
-## Dimension 2 — Progressive Disclosure
-Assess whether the skill respects context budgets and loads detail on demand.
-| # | Criterion | What to Check |
-|---|-----------|---------------|
-| 2.1 | SKILL.md size | Under 500 lines; core instructions only |
-| 2.2 | Reference usage | Detailed/exhaustive material moved to `references/` instead of inline; reference files are focused and reasonably sized |
-| 2.3 | Load triggers | When referencing auxiliary files, the skill tells the agent *when* to load them (e.g., "if the API returns 4xx, read `references/errors.md`"), not just "see references/" |
-| 2.4 | Path conventions | File references use relative paths from skill root; no deeply nested reference chains (one level deep from SKILL.md) |
-## Dimension 3 — Content Efficiency
-Evaluate whether every token earns its place.
-| # | Criterion | What to Check |
-|---|-----------|---------------|
-| 3.1 | No generic filler | Does NOT explain concepts the agent already knows (e.g., what a PDF is, how HTTP works). Focuses on what the agent would get wrong without the skill. |
-| 3.2 | Coherent scope | The skill encapsulates one coherent unit of work. Not so narrow that multiple skills must chain for basic tasks; not so broad that it's hard to activate precisely. |
-| 3.3 | Appropriate detail | Not an exhaustive reference manual, not a vague one-liner. Enough structure to guide without over-constraining. |
-| 3.4 | Domain grounding | Contains project-specific, domain-specific, or environment-specific knowledge — not just general best practices the agent already knows. |
-## Dimension 4 — Instruction Quality
-Assess how well the skill *teaches the agent to approach* the task.
-| # | Criterion | What to Check |
-|---|-----------|---------------|
-| 4.1 | Calibrated specificity | Flexible where approaches vary, prescriptive where operations are fragile. Not uniformly rigid or uniformly loose. |
-| 4.2 | Defaults over menus | When multiple tools/approaches work, picks a clear default with brief mention of alternatives — doesn't present an undifferentiated list of options. |
-| 4.3 | Procedural over declarative | Teaches *how to approach* a class of problems, not *what to produce* for a specific instance. The method generalizes. |
-| 4.4 | Explains why | Instructions include reasoning where non-obvious ("because X tends to Y"). Not just naked directives. |
-| 4.5 | Gotchas | Captures environment-specific corrections — things that defy reasonable assumptions and would cause errors if not stated. |
-## Dimension 5 — Description Effectiveness
-Evaluate the `description` frontmatter field as a triggering mechanism.
-| # | Criterion | What to Check |
-|---|-----------|---------------|
-| 5.1 | Imperative framing | Uses "Use this skill when..." or equivalent instructional framing, not passive "This skill does..." |
-| 5.2 | Intent-focused | Describes what the user is trying to achieve, not the skill's internal mechanics. |
-| 5.3 | Trigger coverage | Lists specific contexts and trigger phrases. Pushy enough — includes cases where the user doesn't name the domain explicitly. |
-| 5.4 | Conciseness | Long enough to cover scope, short enough not to bloat context. Under the 1024-char hard limit. |
-| 5.5 | Keyword specificity | Contains concrete keywords that disambiguate from adjacent skills. Would match relevant prompts and not match irrelevant ones. |
-## Dimension 6 — Script Quality
-Skip this dimension entirely if the skill has no `scripts/` directory. Otherwise:
-| # | Criterion | What to Check |
-|---|-----------|---------------|
-| 6.1 | Self-contained | Dependencies declared inline (PEP 723, package.json, bundler/inline) or clearly documented. |
-| 6.2 | Non-interactive | No TTY prompts, password dialogs, or confirmation menus. All input via flags/env/stdin. |
-| 6.3 | Help output | `--help` shows usage, flags, and examples. |
-| 6.4 | Error messages | Say what went wrong, what was expected, and what to try. Not opaque. |
-| 6.5 | Structured output | Prefers JSON/CSV/TSV over free-form text for data; separates diagnostics to stderr. |
-## Dimension 7 — Evaluability
-Check whether the skill can be systematically tested and improved.
-| # | Criterion | What to Check |
-|---|-----------|---------------|
-| 7.1 | Test cases exist | Has `evals/evals.json` with at least 2 test cases. |
-| 7.2 | Realistic prompts | Test prompts look like real user messages — file paths, personal context, varied formality. Not generic "do X". |
-| 7.3 | Verifiable assertions | Assertions are specific, objective, and programmatically checkable where possible. Not vague ("output is good"). |
-| 7.4 | Edge case coverage | At least one test covers a boundary condition or ambiguous input. |
-## Scoring
-Each sub-criterion scores 1-5. The full scale anchors, dimension weights, and score interpretation ranges are in `references/scoring-rubric.md` — you should have already loaded it per the workflow above. After scoring:
-1. Compute dimension averages (mean of sub-criteria scores in each dimension).
-2. Compute the overall weighted score using the weights table in `references/scoring-rubric.md`. Redistribute N/A dimension weights proportionally.
-3. Identify the top 3 strengths (highest-scoring sub-criteria with most impact) and top 3 improvement areas (lowest-scoring, highest-weight items first).
-4. Write the final report following the template in `references/report-template.md` (loaded per the workflow above).
-## Important Notes
-- **Evidence is mandatory.** Every score must cite a specific line, pattern, or absence in the skill files. No "feels like a 4."
-- **Missing dimensions.** If a dimension is not applicable (e.g., no scripts), mark it N/A and exclude from the overall score.
-- **Be constructive.** The goal is to help the skill author improve, not to judge. Frame weaknesses as specific, actionable changes.
-- **Read the whole skill.** Scan `scripts/`, `references/`, and `evals/` if they exist — don't judge on SKILL.md alone.
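
Note on the Scoring section of the deleted SKILL.md: the arithmetic it describes (per-dimension means, then a weighted overall score with N/A weights redistributed proportionally) can be sketched as below. This is only an illustration. The weight values and dimension keys are hypothetical, since the real weight table lives in `references/scoring-rubric.md`, which is not part of this hunk.

```python
from statistics import mean

# Hypothetical weights for illustration only; the actual table is in
# references/scoring-rubric.md and may differ.
WEIGHTS = {
    "spec": 0.15, "disclosure": 0.15, "efficiency": 0.15,
    "instructions": 0.20, "description": 0.15,
    "scripts": 0.10, "evaluability": 0.10,
}


def overall_score(sub_scores, weights=WEIGHTS):
    """sub_scores maps dimension -> list of 1-5 scores, or None for N/A.

    Returns (dimension averages, overall weighted score).
    """
    # Step 1: mean of sub-criteria per applicable dimension.
    averages = {dim: mean(s) for dim, s in sub_scores.items() if s}
    # Step 2: dividing by the sum of the applicable weights is the same as
    # redistributing the N/A weights proportionally over the rest.
    applicable = sum(weights[dim] for dim in averages)
    overall = sum(averages[dim] * weights[dim] for dim in averages) / applicable
    return averages, overall
```

Passing `None` for a dimension such as `"scripts"` marks it N/A, so a skill without a `scripts/` directory is neither penalized nor rewarded for it.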

package/skills/skill-quality-evaluator/assets/SKILL_TEMPLATE.md DELETED

@@ -1,77 +0,0 @@
----
-name: <skill-name>
-description: >
-  <what it does + when to use it. Use imperative framing, focus on user intent.
-  List trigger phrases. Keep under 1024 chars.>
-license: <license-name or bundled license file>
-compatibility: <environment requirements — product, system packages, network, etc.>
-allowed-tools:
----
-# <Skill Title> — <One-line summary>
-<Brief overview of what this skill accomplishes — 2-3 sentences.>
-## Input Parameters
-<!--
-  List every input the skill expects from the user. Use this exact table format.
-  Mark optional params with "(optional)" in the Variable column.
--->
-| Variable | Required | Meaning | Typical User Phrases / Expected Format |
-|----------|----------|---------|----------------------------------------|
-| `<param1>` | Yes | <what this parameter represents> | <examples of how users describe it, expected format, fallback sources> |
-| `<param2>` | Yes | <what this parameter represents> | <examples of how users describe it, expected format, fallback sources> |
-| `<param3>` | No | <what this parameter represents> | <examples, format, default value if omitted> |
-<!-- Extraction rule: if a required param is missing, ask the user — do NOT guess. -->
-## Output Content
-<!--
-  List every artifact or result this skill produces. Be specific about formats,
-  locations, and what a successful result looks like.
--->
-| Output | Type | Location / Path | Description |
-|--------|------|-----------------|-------------|
-| `<output1>` | <file \| directory \| in-message> | `<where it goes>` | <what it contains, format> |
-| `<output2>` | <file \| directory \| in-message> | `<where it goes>` | <what it contains, format> |
-## Preconditions
-<!--
-  Every file, directory, tool, or environment state the skill assumes exists
-  BEFORE it starts. For each precondition, define the check AND the fallback.
--->
-| # | Precondition | How to Verify | On Failure |
-|---|-------------|---------------|------------|
-| 1 | `<expected state>` | `<bash command or file check to run>` | `<what to tell the user / what to do — e.g., "ask the user to provide X", "create it automatically by running Y", "abort and tell the user to install Z">` |
-<!--
-  Failure handling patterns (pick the right one for each precondition):
-  - **Ask user**: "Parameter X is missing — please provide ..."
-  - **Auto-create**: "Run scripts/setup.sh to generate ..."
-  - **Auto-install**: "pip install X" / "npm install X"
-  - **Abort**: "Cannot proceed — X requires Y. Please ... then retry."
--->
----
-## Workflow
-<!--
-  PATTERN A: Progress Checklist
-  - [ ] markers let the agent track what's done and what remains.
-  - After each step: validate the output before checking the box.
-  - Include the tool/command inside each step so the agent knows what to run.
-  - Steps with (depends: Step N) must wait — the agent skips them until the
-    prerequisite step is checked.
--->
-Progress:
-- [ ] **Step 1:** <action verb + object> — run `scripts/<name>.py <args>`
-- [ ] **Step 2:** <action verb + object> — <tool or command> (depends: Step 1)
-- [ ] **Step 3:** <action verb + object> — <tool or command> (depends: Step 2)
-- [ ] **Step 4:** Validate — run `scripts/<validate>.py <args>`
-- [ ] **Step 5:** Verify output — <final check>
----
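
Note on the template's frontmatter: the `name` and `description` placeholders are subject to the limits listed in the evaluator's Dimension 1 checklist (criteria 1.2 and 1.3). A minimal sketch of those checks follows. The function name `check_frontmatter` and its arguments are hypothetical, and it assumes the frontmatter has already been parsed into plain strings.

```python
import re

# Lowercase alphanumeric segments joined by single hyphens: this rules out
# leading, trailing, and consecutive hyphens in one pattern.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_frontmatter(name, description, dir_name):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not 1 <= len(name) <= 64:
        problems.append("name must be 1-64 characters")
    if not NAME_RE.fullmatch(name):
        problems.append(
            "name must be lowercase alphanumeric with single, "
            "non-leading, non-trailing hyphens"
        )
    if name != dir_name:
        problems.append("name must match the parent directory name")
    if not 1 <= len(description) <= 1024:
        problems.append("description must be 1-1024 characters")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets a report cite all of the evidence for a low score in one pass.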

package/skills/skill-quality-evaluator/references/Best-practices-for-skill-creators.md DELETED

@@ -1,277 +0,0 @@
-> ## Documentation Index
-> Fetch the complete documentation index at: https://agentskills.io/llms.txt
-> Use this file to discover all available pages before exploring further.
-# Best practices for skill creators
-> How to write skills that are well-scoped and calibrated to the task.
-## Start from real expertise
-A common pitfall in skill creation is asking an LLM to generate a skill without providing domain-specific context — relying solely on the LLM's general training knowledge. The result is vague, generic procedures ("handle errors appropriately," "follow best practices for authentication") rather than the specific API patterns, edge cases, and project conventions that make a skill valuable.
-Effective skills are grounded in real expertise. The key is feeding domain-specific context into the creation process.
-### Extract from a hands-on task
-Complete a real task in conversation with an agent, providing context, corrections, and preferences along the way. Then extract the reusable pattern into a skill. Pay attention to:
-* **Steps that worked** — the sequence of actions that led to success
-* **Corrections you made** — places where you steered the agent's approach (e.g., "use library X instead of Y," "check for edge case Z")
-* **Input/output formats** — what the data looked like going in and coming out
-* **Context you provided** — project-specific facts, conventions, or constraints the agent didn't already know
-### Synthesize from existing project artifacts
-When you have a body of existing knowledge, you can feed it into an LLM and ask it to synthesize a skill. A data-pipeline skill synthesized from your team's actual incident reports and runbooks will outperform one synthesized from a generic "data engineering best practices" article, because it captures *your* schemas, failure modes, and recovery procedures. The key is project-specific material, not generic references.
-Good source material includes:
-* Internal documentation, runbooks, and style guides
-* API specifications, schemas, and configuration files
-* Code review comments and issue trackers (captures recurring concerns and reviewer expectations)
-* Version control history, especially patches and fixes (reveals patterns through what actually changed)
-* Real-world failure cases and their resolutions
-## Refine with real execution
-The first draft of a skill usually needs refinement. Run the skill against real tasks, then feed the results — all of them, not just failures — back into the creation process. Ask: what triggered false positives? What was missed? What could be cut?
-Even a single pass of execute-then-revise noticeably improves quality, and complex domains often benefit from several.
-<Tip>
-  Read agent execution traces, not just final outputs. If the agent wastes time on unproductive steps, common causes include instructions that are too vague (the agent tries several approaches before finding one that works), instructions that don't apply to the current task (the agent follows them anyway), or too many options presented without a clear default.
-</Tip>
-For a more structured approach to iteration, including test cases, assertions, and grading, see [Evaluating skill output quality](/skill-creation/evaluating-skills).
-## Spending context wisely
-Once a skill activates, its full `SKILL.md` body loads into the agent's context window alongside conversation history, system context, and other active skills. Every token in your skill competes for the agent's attention with everything else in that window.
-### Add what the agent lacks, omit what it knows
-Focus on what the agent *wouldn't* know without your skill: project-specific conventions, domain-specific procedures, non-obvious edge cases, and the particular tools or APIs to use. You don't need to explain what a PDF is, how HTTP works, or what a database migration does.
-````markdown theme={null}
-<!-- Too verbose — the agent already knows what PDFs are -->
-## Extract PDF text
-PDF (Portable Document Format) files are a common file format that contains
-text, images, and other content. To extract text from a PDF, you'll need to
-use a library. pdfplumber is recommended because it handles most cases well.
-<!-- Better — jumps straight to what the agent wouldn't know on its own -->
-## Extract PDF text
-Use pdfplumber for text extraction. For scanned documents, fall back to
-pdf2image with pytesseract.
-```python
-import pdfplumber
-with pdfplumber.open("file.pdf") as pdf:
-    text = pdf.pages[0].extract_text()
-```
-````
-Ask yourself about each piece of content: "Would the agent get this wrong without this instruction?" If the answer is no, cut it. If you're unsure, test it. And if the agent already handles the entire task well without the skill, the skill may not be adding value. See [Evaluating skill output quality](/skill-creation/evaluating-skills) for how to test this systematically.
-### Design coherent units
-Deciding what a skill should cover is like deciding what a function should do: you want it to encapsulate a coherent unit of work that composes well with other skills. Skills scoped too narrowly force multiple skills to load for a single task, risking overhead and conflicting instructions. Skills scoped too broadly become hard to activate precisely. A skill for querying a database and formatting the results may be one coherent unit, while a skill that also covers database administration is probably trying to do too much.
-### Aim for moderate detail
-Overly comprehensive skills can hurt more than they help — the agent struggles to extract what's relevant and may pursue unproductive paths triggered by instructions that don't apply to the current task. Concise, stepwise guidance with a working example tends to outperform exhaustive documentation. When you find yourself covering every edge case, consider whether most are better handled by the agent's own judgment.
-### Structure large skills with progressive disclosure
-The [specification](/specification#progressive-disclosure) recommends keeping `SKILL.md` under 500 lines and 5,000 tokens — just the core instructions the agent needs on every run. When a skill legitimately needs more content, move detailed reference material to separate files in `references/` or similar directories.
-The key is telling the agent *when* to load each file. "Read `references/api-errors.md` if the API returns a non-200 status code" is more useful than a generic "see references/ for details." This lets the agent load context on demand rather than up front, which is how [progressive disclosure](/specification#progressive-disclosure) is designed to work.
-## Calibrating control
-Not every part of a skill needs the same level of prescriptiveness. Match the specificity of your instructions to the fragility of the task.
-### Match specificity to fragility
-**Give the agent freedom** when multiple approaches are valid and the task tolerates variation. For flexible instructions, explaining *why* can be more effective than rigid directives — an agent that understands the purpose behind an instruction makes better context-dependent decisions. A code review skill can describe what to look for without prescribing exact steps:
-```markdown theme={null}
-## Code review process
-1. Check all database queries for SQL injection (use parameterized queries)
-2. Verify authentication checks on every endpoint
-3. Look for race conditions in concurrent code paths
-4. Confirm error messages don't leak internal details
-```
-**Be prescriptive** when operations are fragile, consistency matters, or a specific sequence must be followed:
-````markdown theme={null}
-## Database migration
-Run exactly this sequence:
-```bash
-python scripts/migrate.py --verify --backup
-```
-Do not modify the command or add additional flags.
-````
-Most skills have a mix. Calibrate each part independently.
-### Provide defaults, not menus
-When multiple tools or approaches could work, pick a default and mention alternatives briefly rather than presenting them as equal options.
-````markdown theme={null}
-<!-- Too many options -->
-You can use pypdf, pdfplumber, PyMuPDF, or pdf2image...
-<!-- Clear default with escape hatch -->
-Use pdfplumber for text extraction:
-```python
-import pdfplumber
-```
-For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
-````
-### Favor procedures over declarations
-A skill should teach the agent *how to approach* a class of problems, not *what to produce* for a specific instance. Compare:
-```markdown theme={null}
-<!-- Specific answer — only useful for this exact task -->
-Join the `orders` table to `customers` on `customer_id`, filter where
-`region = 'EMEA'`, and sum the `amount` column.
-<!-- Reusable method — works for any analytical query -->
-1. Read the schema from `references/schema.yaml` to find relevant tables
-2. Join tables using the `_id` foreign key convention
-3. Apply any filters from the user's request as WHERE clauses
-4. Aggregate numeric columns as needed and format as a markdown table
-```
-This doesn't mean skills can't include specific details — output format templates (see [Templates for output format](#templates-for-output-format)), constraints like "never output PII," and tool-specific instructions are all valuable. The point is that the *approach* should generalize even when individual details are specific.
-## Patterns for effective instructions
-These are reusable techniques for structuring skill content. Not every skill needs all of them — use the ones that fit your task.
-### Gotchas sections
-The highest-value content in many skills is a list of gotchas — environment-specific facts that defy reasonable assumptions. These aren't general advice ("handle errors appropriately") but concrete corrections to mistakes the agent will make without being told otherwise:
-```markdown theme={null}
-## Gotchas
-- The `users` table uses soft deletes. Queries must include
-  `WHERE deleted_at IS NULL` or results will include deactivated accounts.
-- The user ID is `user_id` in the database, `uid` in the auth service,
-  and `accountId` in the billing API. All three refer to the same value.
-- The `/health` endpoint returns 200 as long as the web server is running,
-  even if the database connection is down. Use `/ready` to check full
-  service health.
-```
-Keep gotchas in `SKILL.md` where the agent reads them before encountering the situation. A separate reference file works if you tell the agent when to load it, but for non-obvious issues, the agent may not recognize the trigger.
-<Tip>
-  When an agent makes a mistake you have to correct, add the correction to the gotchas section. This is one of the most direct ways to improve a skill iteratively (see [Refine with real execution](#refine-with-real-execution)).
-</Tip>
-### Templates for output format
-When you need the agent to produce output in a specific format, provide a template. This is more reliable than describing the format in prose, because agents pattern-match well against concrete structures. Short templates can live inline in `SKILL.md`; for longer templates, or templates only needed in certain cases, store them in `assets/` and reference them from `SKILL.md` so they only load when needed.
-````markdown theme={null}
-## Report structure
-Use this template, adapting sections as needed for the specific analysis:
-```markdown
-# [Analysis Title]
-## Executive summary
-[One-paragraph overview of key findings]
-## Key findings
-- Finding 1 with supporting data
-- Finding 2 with supporting data
-## Recommendations
-1. Specific actionable recommendation
-2. Specific actionable recommendation
-```
-````
-### Checklists for multi-step workflows
-An explicit checklist helps the agent track progress and avoid skipping steps, especially when steps have dependencies or validation gates.
-```markdown theme={null}
-## Form processing workflow
-Progress:
-- [ ] Step 1: Analyze the form (run `scripts/analyze_form.py`)
-- [ ] Step 2: Create field mapping (edit `fields.json`)
-- [ ] Step 3: Validate mapping (run `scripts/validate_fields.py`)
-- [ ] Step 4: Fill the form (run `scripts/fill_form.py`)
-- [ ] Step 5: Verify output (run `scripts/verify_output.py`)
-```
-### Validation loops
-Instruct the agent to validate its own work before moving on. The pattern is: do the work, run a validator (a script, a reference checklist, or a self-check), fix any issues, and repeat until validation passes.
-```markdown theme={null}
-## Editing workflow
-1. Make your edits
-2. Run validation: `python scripts/validate.py output/`
-3. If validation fails:
-   - Review the error message
-   - Fix the issues
-   - Run validation again
-4. Only proceed when validation passes
-```
-A reference document can also serve as the "validator" — instruct the agent to check its work against the reference before finalizing.
-### Plan-validate-execute
-For batch or destructive operations, have the agent create an intermediate plan in a structured format, validate it against a source of truth, and only then execute.
-```markdown theme={null}
-## PDF form filling
-1. Extract form fields: `python scripts/analyze_form.py input.pdf` → `form_fields.json`
-   (lists every field name, type, and whether it's required)
-2. Create `field_values.json` mapping each field name to its intended value
-3. Validate: `python scripts/validate_fields.py form_fields.json field_values.json`
-   (checks that every field name exists in the form, types are compatible, and
-   required fields aren't missing)
-4. If validation fails, revise `field_values.json` and re-validate
-5. Fill the form: `python scripts/fill_form.py input.pdf field_values.json output.pdf`
-```
-The key ingredient is step 3: a validation script that checks the plan (`field_values.json`) against the source of truth (`form_fields.json`). Errors like "Field 'signature\_date' not found — available fields: customer\_name, order\_total, signature\_date\_signed" give the agent enough information to self-correct.
-### Bundling reusable scripts
-When [iterating on a skill](/skill-creation/evaluating-skills), compare the agent's execution traces across test cases. If you notice the agent independently reinventing the same logic each run — building charts, parsing a specific format, validating output — that's a signal to write a tested script once and bundle it in `scripts/`.
-For more on designing and bundling scripts, see [Using scripts in skills](/skill-creation/using-scripts).
-## Next steps
-Once you have a working skill, two guides can help you refine it further:
-* **[Evaluating skill output quality](/skill-creation/evaluating-skills)** — Set up test cases, grade results, and iterate systematically.
-* **[Optimizing skill descriptions](/skill-creation/optimizing-descriptions)** — Test and improve your skill's `description` field so it triggers on the right prompts.
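
Note on the plan-validate-execute pattern in the deleted guide: the validation step it describes (checking `field_values.json` against `form_fields.json`) could look roughly like the sketch below. The JSON shapes, the `TYPE_MAP` table, and the function body are assumptions made for illustration. The guide names `scripts/validate_fields.py` but does not show its contents.

```python
# Hypothetical mapping from form field types to acceptable Python types.
TYPE_MAP = {"text": str, "number": (int, float), "checkbox": bool}


def validate_fields(form_fields, field_values):
    """Check a plan (field_values) against the source of truth (form_fields).

    Assumed shapes: form_fields is a list of dicts with "name", "type", and
    "required" keys; field_values maps field name -> intended value.
    Returns a list of error strings; an empty list means the plan is valid.
    """
    known = {field["name"]: field for field in form_fields}
    errors = []
    for name, value in field_values.items():
        if name not in known:
            # Listing the available names gives the agent enough to self-correct.
            available = ", ".join(sorted(known))
            errors.append(f"Field '{name}' not found; available fields: {available}")
            continue
        expected = TYPE_MAP.get(known[name]["type"])
        if expected is not None and not isinstance(value, expected):
            errors.append(f"Field '{name}' expects type {known[name]['type']}")
    for name, field in known.items():
        if field.get("required") and name not in field_values:
            errors.append(f"Required field '{name}' is missing")
    return errors
```

An agent following the pattern would re-run this check after each revision of the plan and fill the form only once the returned list is empty.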