opencodekit 0.21.0 → 0.21.2
- package/dist/index.js +4 -4
- package/dist/template/.opencode/AGENTS.md +51 -30
- package/dist/template/.opencode/agent/vision.md +0 -1
- package/dist/template/.opencode/memory.db +0 -0
- package/dist/template/.opencode/memory.db-shm +0 -0
- package/dist/template/.opencode/memory.db-wal +0 -0
- package/dist/template/.opencode/opencode.json +2 -2
- package/dist/template/.opencode/package.json +20 -21
- package/dist/template/.opencode/plugin/README.md +0 -7
- package/dist/template/.opencode/plugin/copilot-auth.ts +59 -0
- package/dist/template/.opencode/plugin/prompt-leverage.ts +136 -138
- package/dist/template/.opencode/pnpm-lock.yaml +140 -706
- package/dist/template/.opencode/skill/agent-evals/SKILL.md +208 -0
- package/dist/template/.opencode/skill/anti-ai-slop/SKILL.md +76 -0
- package/dist/template/.opencode/skill/brand-asset-protocol/SKILL.md +222 -0
- package/dist/template/.opencode/skill/context-condensation/SKILL.md +149 -0
- package/dist/template/.opencode/skill/design-direction-advisor/SKILL.md +139 -0
- package/dist/template/.opencode/skill/hi-fi-prototype-html/SKILL.md +253 -0
- package/dist/template/.opencode/skill/html-deck-export/SKILL.md +189 -0
- package/dist/template/.opencode/skill/test-driven-development/SKILL.md +15 -0
- package/package.json +1 -1
- package/dist/template/.opencode/plugin/prompt-leverage.ts.bak +0 -228
- package/dist/template/.opencode/plugin/stitch.ts +0 -307
- package/dist/template/.opencode/skill/stitch/SKILL.md +0 -164
- package/dist/template/.opencode/skill/stitch-design-taste/DESIGN.md +0 -121
- package/dist/template/.opencode/skill/stitch-design-taste/SKILL.md +0 -197
@@ -0,0 +1,208 @@
---
name: agent-evals
description: Use when adding/changing a skill, command, or agent prompt and you want evidence it actually helps — not just intuition. Defines bounded-task evals, no-skill baselines, deterministic verifiers, JSONL trace logs, and when to skip eval. Adapted from OpenAI eval guide, OpenHands "evaluating agent skills", Anthropic "demystifying evals".
---

# Agent Evals

> Without evals, every skill ships on vibes. The harness-engineering literature is unanimous: **measured changes beat believed changes**. This skill gives you the smallest workable eval loop.

## When to Use

Run an eval when:

- Adding a new skill that claims to improve outcomes (`anti-ai-slop`, `prompt-leverage`, `condition-based-waiting`)
- Changing a prompt or instruction in an agent (`build`, `plan`, `review`)
- Comparing two approaches and you don't know which is better
- A skill is suspected of being inert ("does this even do anything?")

**Skip eval when:**

- The change is mechanical (rename, refactor, lint fix)
- The change is a one-shot fix with obvious verification (test passes / build green)
- The skill is purely procedural with deterministic output (workspace setup)

## Core Principle: Bounded + Baseline + Verifier

Three ingredients. Skip any one and the eval is theatre.

1. **Bounded task** — a concrete prompt with a definite finish line, runnable in <5 minutes
2. **No-skill baseline** — the same task run **without** the skill loaded, for comparison
3. **Deterministic verifier** — a check that returns pass/fail without human judgment

If you cannot write the verifier, the skill's value is unmeasurable and you are guessing.

## Eval Loop (Minimum Viable)

### Step 1: Define the task

Pick a real failure mode the skill targets. One paragraph, copy-pastable as a prompt.

```markdown
## Task: anti-ai-slop / no-purple-gradient

**Prompt:** "Build a landing page hero for a coffee roastery brand. Single HTML file."

**Verifier (deterministic):**

- grep output for `linear-gradient.*purple|#[89a][0-9a-f]` → must return 0 matches
- grep output for `Inter|Roboto` in `font-family` → must return 0 matches
- File contains `<h1>` with content → must be true

**Pass criteria:** all 3 checks pass
```

### Step 2: Run baseline (no skill)

Fresh subagent, **don't load the skill**. Same prompt. Save output.

```typescript
task({
  subagent_type: "general",
  description: "Baseline: coffee landing page",
  prompt:
    "Build a landing page hero for a coffee roastery brand. Single HTML file. Output the full HTML only.",
});
```

Save result to `.opencode/evals/artifacts/<eval_id>/baseline.html`.

### Step 3: Run treatment (with skill)

Fresh subagent, **load the skill explicitly** in the prompt. Same task.

```typescript
task({
  subagent_type: "general",
  description: "Treatment: coffee landing page",
  prompt: `First load the anti-ai-slop skill. Then: Build a landing page hero for a coffee roastery brand. Single HTML file. Output the full HTML only.`,
});
```

Save result to `.opencode/evals/artifacts/<eval_id>/treatment.html`.

### Step 4: Run verifier on both

```bash
# Baseline
grep -cE "linear-gradient.*purple|#[89a][0-9a-f]" baseline.html
grep -cE "(Inter|Roboto)" baseline.html

# Treatment
grep -cE "linear-gradient.*purple|#[89a][0-9a-f]" treatment.html
grep -cE "(Inter|Roboto)" treatment.html
```
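The grep counts above can be wrapped into one pass/fail check so the verifier prints a verdict instead of numbers to eyeball. A minimal sketch, assuming the task definition above (the demo file, its contents, and the exact regexes are illustrative):

```shell
# verify: deterministic pass/fail for the no-purple-gradient task (sketch)
check() {
  file="$1"; fails=0
  # banned: purple gradient or purple-ish hex fill
  grep -qE 'linear-gradient.*purple|#[89a][0-9a-f]' "$file" && fails=$((fails + 1))
  # banned: Inter/Roboto inside a font-family declaration
  grep -qE 'font-family[^;]*(Inter|Roboto)' "$file" && fails=$((fails + 1))
  # required: an <h1> with content
  grep -qE '<h1>[^<]+' "$file" || fails=$((fails + 1))
  if [ "$fails" -eq 0 ]; then echo "PASS $file"; else echo "FAIL $file ($fails checks)"; fi
}

# demo on a synthetic treatment output
cat > /tmp/treatment.html <<'EOF'
<html><style>h1 { font-family: "Source Serif 4", serif; color: #4a2c17; }</style>
<body><h1>Single-origin, roasted this week</h1></body></html>
EOF
check /tmp/treatment.html   # → PASS /tmp/treatment.html
```

The same function runs unchanged on `baseline.html` and `treatment.html`, so both arms get the identical check.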

### Step 5: Record result

Append one JSONL line to `.opencode/evals/log.jsonl`:

```json
{
  "eval_id": "anti-slop-001",
  "skill": "anti-ai-slop",
  "date": "2026-04-21",
  "baseline_pass": false,
  "treatment_pass": true,
  "delta": "+1",
  "notes": "baseline used purple gradient + Inter; treatment used warm browns + Source Serif"
}
```
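From a shell, the append can be a single `printf` with one JSON object per line. A sketch with an abbreviated field set (the `/tmp` path stands in for `.opencode/evals/log.jsonl`):

```shell
# append-only JSONL record (abbreviated fields; path is a stand-in)
LOG=/tmp/evals/log.jsonl
mkdir -p "$(dirname "$LOG")"
printf '{"eval_id":"%s","skill":"%s","baseline_pass":%s,"treatment_pass":%s}\n' \
  "anti-slop-001" "anti-ai-slop" false true >> "$LOG"
tail -n 1 "$LOG"
```

Because each record is one line, `>>` keeps the log append-only and every tool downstream can process it line by line.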

## Multi-Run for Confidence

A single run can be lucky. For a skill you're seriously evaluating:

- Run baseline **3 times**, treatment **3 times** (different seeds via different prompt framings)
- Report pass-rate, not a single result: `baseline 1/3, treatment 3/3`
- If treatment ≤ baseline, the skill is **inert or harmful** — fix or delete
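The loop itself is mechanical; the point is reporting a per-arm pass-rate. A sketch where `run_once` is a stub standing in for "spawn subagent, save artifact, run verifier":

```shell
# pass-rate over 3 runs per arm; run_once is a stub, not a real subagent call
run_once() {
  if [ "$1" = "treatment" ]; then echo PASS; else echo FAIL; fi
}

for arm in baseline treatment; do
  pass=0
  for i in 1 2 3; do
    [ "$(run_once "$arm")" = "PASS" ] && pass=$((pass + 1))
  done
  echo "$arm $pass/3"   # e.g. "baseline 0/3", "treatment 3/3" with this stub
done
```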

## Verifier Patterns That Work

| Skill type | Verifier |
| ---------------------- | ----------------------------------------------------------------- |
| Anti-pattern avoidance | `grep` for the banned pattern → expect 0 |
| Required output shape | JSON schema validation, presence of required sections |
| Code correctness | run the code, run its tests, check exit code |
| Behavior change | call site count via `tilth_search`, file existence, line counts |
| UI / visual | Playwright screenshot + pixel diff against expected, or DOM query |
| Refusal / safety | grep for forbidden phrases or correct refusal pattern |

## Verifier Anti-Patterns (Don't Use)

- "Ask another LLM if this is good" — non-deterministic, expensive, judgment-laden
- "Check if it looks right" — not a verifier, that's a vibe
- "Pass if no errors thrown" — too weak, baseline also passes
- "Manually inspect" — fine for one-off, useless for regression

## Trace Logging Format

For multi-step evals (agent ran 5 tool calls, made 3 edits), log the trace:

```json
{
  "eval_id": "ship-flow-002",
  "steps": [
    { "tool": "task", "args": { "subagent_type": "explore" }, "ok": true },
    { "tool": "edit", "args": { "path": "src/auth.ts" }, "ok": true },
    { "tool": "bash", "args": { "command": "npm test" }, "ok": false, "exit_code": 1 }
  ],
  "outcome": "failed_at_step_3",
  "verifier_pass": false
}
```

This lets you find **which step** failed across many runs — surfaces flaky points in a workflow.
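With one JSON object per line, even plain `grep` can tally failure points across runs, no JSON tooling required. A sketch over a synthetic log (the file and its records are invented for the demo):

```shell
# tally outcomes across runs to surface the flaky step (sketch)
cat > /tmp/trace-log.jsonl <<'EOF'
{"eval_id":"ship-flow-002","outcome":"failed_at_step_3","verifier_pass":false}
{"eval_id":"ship-flow-002","outcome":"ok","verifier_pass":true}
{"eval_id":"ship-flow-002","outcome":"failed_at_step_3","verifier_pass":false}
EOF
# extract the outcome field from each line and count duplicates
grep -o '"outcome":"[^"]*"' /tmp/trace-log.jsonl | sort | uniq -c | sort -rn
```

The top line of the output is the most common failure point, which is usually where to look first.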

## When Eval Disagrees with Intuition

The skill **feels** great but the eval says baseline ≥ treatment. Trust the eval. Common causes:

1. The skill is too long — the agent ignored it
2. The skill targets a problem the model already handles
3. The verifier doesn't measure what the skill actually changes (re-read your verifier)
4. The baseline prompt was too easy (try a harder task)

Fix in this order: verifier → task difficulty → skill content. Delete the skill if all three fail.

## Eval Storage Convention

```
.opencode/evals/
├── log.jsonl       # append-only, one line per run
├── tasks/          # task definitions
│   ├── anti-slop-001.md
│   └── ship-flow-002.md
└── artifacts/      # baseline.* and treatment.* outputs
    └── <eval_id>/
```

Keep evals in-repo. They're documentation that the skill works.

## Integration with `/health` and `/curate`

- `/health` should flag skills with **zero eval coverage** as IMPORTANT (not CRITICAL — many skills are simple enough not to need it)
- `/curate` should surface eval results when proposing skill consolidation: "skill X has 0/5 passes over last 3 months, propose deletion"

## Cost Discipline

- Each eval run = 1 subagent call. 6 runs (3 baseline + 3 treatment) = 6 calls.
- Don't eval every skill. Eval the ones whose **value is contested** or whose **failure would be expensive**.
- Cache baseline runs — re-run only when the underlying model changes.

## Output

After running an eval, return:

```markdown
## Eval: <skill-name>

- **Task:** [one line]
- **Baseline:** N/M passes
- **Treatment:** N/M passes
- **Delta:** [+/-N]
- **Verdict:** keep | iterate | delete
- **Trace:** `.opencode/evals/log.jsonl` line <N>
```

Brief. Evidence-based. No padding.
@@ -0,0 +1,76 @@
---
name: anti-ai-slop
description: Use when generating any visual design output (web UI, slides, animations, mockups, infographics) to actively prevent the AI default aesthetic that strips brand identity. Bans purple gradients, emoji-as-icons, rounded-card+left-accent, AI-drawn human SVGs, GitHub-dark `#0D1117`, Inter/Roboto-as-display. Adapted from huashu-design.
---

# Anti AI Slop

> The AI default aesthetic is the **visual common denominator** of all training data. Using it makes every brand look identical. Avoiding it is **brand protection**, not aesthetic snobbery.

## Why This Matters

The reasoning chain:

1. The user wants their brand to be recognizable.
2. AI default output = average of training corpus = all brands blended = **no brand recognized**.
3. So AI defaults dilute the user's brand into "another AI-generated page."
4. Avoiding AI slop means replacing default-mode output with **brand-specific intent**.

Anti-slop is the **defensive** half of design discipline. The **offensive** half is `brand-asset-protocol` (use real logos, real product images, real colors). Both are required.

## The Slop Lookup Table

| Pattern | Why it's slop | Allowed when |
| --- | --- | --- |
| **Aggressive purple gradient** | The "tech feel" formula in every SaaS/AI/web3 landing page in training data | Brand actually uses purple gradient (some Linear contexts), or task is satire of slop |
| **Emoji as icon** (`🚀 Fast`, `✨ Magic`) | "Not professional enough? Add an emoji" disease — symptom of training data | Brand uses them (Notion), or audience is kids / casual |
| **Rounded card + left colored border accent** | 2020-2024 Material/Tailwind cliché, now visual noise | User explicitly asks, or it's in the brand spec |
| **SVG-drawn imagery** (humans, faces, scenes) | AI-drawn SVG humans always have warped faces, weird proportions | **Almost never** — use real images (Wikimedia/Unsplash/AI-generated) or honest placeholder |
| **CSS silhouettes for product photos** | Generates "generic tech animation" — black bg + orange accent + rounded bars. Zero brand ID | Almost never — fetch real product photo first (see `brand-asset-protocol`) |
| **Inter / Roboto / Arial / system as display font** | Too common — reader can't tell "designed product" from "demo page" | Brand spec uses them (Stripe uses a tuned Inter variant) |
| **Cyber neon / GitHub dark `#0D1117`** | Copy of GitHub dark mode aesthetic, used everywhere | Developer tools brand that genuinely uses this direction |
| **Generic stock photo "lifestyle" hero on text essay** | Adds no information; pure decoration = slop | Image is the content (museum portrait, product detail, location card) |
| **3+ accent colors** | Multi-color clustering reads as "I couldn't decide" | Data legitimately has ≥3 categorical dimensions |
| **Decorative-icon-on-every-line** | "Iconography slop" — pads visual density without information | Icon carries differentiating product information (status, type, action) |
| **Fabricated stats / fake quotes / lorem ipsum** | "Data slop" — fills space with meaningless numbers | Never. Ask user for real content or leave honest blank space |
| **One generic "page load" animation everywhere** | Scattered micro-interactions feel cheap | One well-orchestrated, intentional animation per page |

**Single criterion for allowing any of these**: "the brand spec uses it" or "the task is intentionally about showing slop." Without that explicit reason, default to avoiding.

## What to Do Instead (Positive Patterns)

- ✅ **`text-wrap: pretty`, CSS Grid, advanced CSS** — typography details an AI usually skips. Signals "real designer."
- ✅ **Use `oklch()` or colors from the brand spec.** Never invent new colors mid-design — every invented color erodes brand consistency.
- ✅ **Real images > AI-drawn SVG > HTML/CSS-faked imagery.** Photo first; AI generation second; CSS shapes only when imagery isn't the point.
- ✅ **Typographic curly quotes** (“smart”, not "straight") — a signal of "this was reviewed."
- ✅ **120% on one detail, 80% on the rest.** Taste = picking the right place to be precise. Even attention everywhere = uniformly bland.
- ✅ **Honest blank > clumsy fill.** A gray block labeled "user avatar" beats an AI-drawn portrait.

## "But the task IS about slop" — Negative Examples Done Right

When the work is showing what _not_ to do (a critique post, a slop-vs-good comparison):

- **Don't fill the whole page with slop.** Containerize the bad sample.
- Use a **dashed border + corner label** "Anti-pattern · do not copy" so the reader can tell the intent.
- The negative example serves the narrative; it doesn't pollute the page's primary visual register.

## Self-Check Questions (run before delivery)

For every visual element on the page, ask:

1. **Why is this here?** "It looks nice" is not enough. Each element must earn its place.
2. **Could a different brand use this exact element?** If yes → it's not specific enough.
3. **Did I invent this color/font/shape, or did it come from the brand spec or a real source?**
4. **Is there an icon that adds no information?** Remove it.
5. **Is there a number/stat I made up?** Remove it or get real data.
6. **Is there a gradient that has no brand basis?** Remove it.
7. **Is there an SVG of a human/face I drew?** Replace it with a real image or an honest placeholder.

If any answer fails the test, fix before claiming done.
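The greppable subset of this checklist can be automated before delivery. A rough sketch: the patterns below cover only a few rows of the lookup table and are assumptions, not an exhaustive slop detector.

```shell
# slop_scan: flag a few greppable anti-patterns in an output file (sketch)
slop_scan() {
  file="$1"; hits=0
  for pat in 'linear-gradient[^;]*purple' \
             'font-family[^;]*(Inter|Roboto|Arial)' \
             '#0[dD]1117' \
             '(🚀|✨)' \
             'lorem ipsum'; do
    if grep -qiE "$pat" "$file"; then
      echo "slop: $pat"
      hits=$((hits + 1))
    fi
  done
  [ "$hits" -eq 0 ] && echo clean
}

# demo: a hero that trips the GitHub-dark and emoji-as-icon rules
printf '<body style="background:#0D1117"><h1>🚀 Fast</h1></body>' > /tmp/hero.html
slop_scan /tmp/hero.html
```

A clean scan is necessary, not sufficient; the non-greppable questions (element-by-element "why is this here?") still need a human or agent pass.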

## Pairs Well With

- `brand-asset-protocol` — the positive counterpart (real assets, brand spec)
- `design-direction-advisor` — when no brand context exists, recommends differentiated directions instead of falling into the AI default
- `design-taste-frontend` — base aesthetic discipline for web UI (more prescriptive; this skill is more prohibitive)
- `high-end-visual-design` — premium aesthetic overlay
@@ -0,0 +1,222 @@
---
name: brand-asset-protocol
description: Use when designing anything for a specific brand (logo, marketing site, product launch animation, branded slide deck, redesign). MUST load before producing branded visuals. Forces real logo + product image + UI screenshot fetch BEFORE any design work — color values alone are not enough for brand recognition. Includes 5-10-2-8 quality bar (5 search rounds, 10 candidates, pick 2 best, each ≥8/10).
---

# Brand Asset Protocol

> Adapted from `huashu-design`. Brand recognition lives in **assets first, color second**. Generic CSS shapes can never substitute for a real logo or product image.

## When to Use

Trigger this protocol whenever the task names a real brand or product:

- "Design a launch animation for **DJI Pocket 4**"
- "Redesign the **Stripe** dashboard"
- "Build a landing page for **Linear**"
- "Make a slide deck about **Anthropic Claude**"
- Internal client work: "Design something for **our company**"

If the task is generic ("design a SaaS landing page"), skip this skill — use `design-direction-advisor` instead.

## Prerequisite: Verify the Brand Exists First

Before fetching assets, confirm the brand/product **exists and you know its current state**. Never assert from training data — search.

```
Trigger: any specific product/version (Pocket 4, Gemini 3 Pro, new SDK, etc.)
Action: WebSearch "<product> launch date specs 2026" → read 1-3 authoritative results
Cost: 10 seconds. Skipping this can cost 1-2 hours of rework.
```

Forbidden phrases that signal you should search instead of guess:

- ❌ "I think X hasn't launched yet"
- ❌ "X is currently version N" (without checking)
- ❌ "X probably doesn't exist"
- ✅ "Let me search for the latest state of X"

## Asset Hierarchy: Why Color Alone Fails

| Asset | Recognition contribution | Required for |
| --- | --- | --- |
| **Logo** | Highest — instant ID | **Every** branded project |
| **Product photos / renders** | Very high — the "main actor" | Physical products (hardware, packaging) |
| **UI screenshots** | Very high — the "main actor" | Digital products (apps, SaaS, dashboards) |
| Color palette | Medium — easily collides | Supporting role |
| Typography | Low — needs the above | Supporting role |
| Vibe keywords | Low — internal QA only | Supporting role |

**Rule**: Pulling colors + fonts but skipping logo/product/UI → protocol violation. Generic placeholders, CSS silhouettes, or hand-drawn SVG cannot substitute for real assets.

## The 5-Step Protocol (Strict Sequence)

### Step 1 — Ask (one full asset checklist, not a vague "got brand guidelines?")

Send this exact list:

```
For <brand/product>, which of these do you have? (priority order)
1. Logo (SVG / high-res PNG) — required for any brand
2. Product photos / official renders — required for physical products
3. UI screenshots / interface assets — required for digital products
4. Color values (HEX / RGB / palette doc)
5. Font list (display / body)
6. Brand guidelines PDF / Figma design system / brand site URL

Send what you have; I'll fetch / generate the rest.
```

### Step 2 — Search Official Sources

| Asset | Where to look |
| --- | --- |
| **Logo** | `<brand>.com/brand`, `/press`, `/press-kit`, `brand.<brand>.com`, header inline SVG |
| **Product img** | Product detail page hero + gallery, official YouTube launch film frames, press releases |
| **UI** | App Store / Play Store screenshots, official site screenshots section, demo video frames |
| **Colors** | Site inline CSS / Tailwind config / brand guidelines PDF |
| **Fonts** | `<link rel="stylesheet">` references on official site, Google Fonts trace, brand guidelines |

Fallback queries: `<brand> logo download SVG`, `<brand> press kit`, `<brand> <product> official renders`, `<brand> app screenshots`.

### Step 3 — Download (three fallback paths per asset type)

**Logo (mandatory for every brand):**

```bash
# 1. Direct file (best)
curl -o assets/<brand>/logo.svg https://<brand>.com/logo.svg

# 2. Extract inline SVG from homepage HTML (~80% of cases)
curl -A "Mozilla/5.0" -L https://<brand>.com -o assets/<brand>/homepage.html
# then grep <svg>...</svg> for the logo node

# 3. Official social avatar (last resort): GitHub/Twitter/LinkedIn org image
```
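For path 2, the inline `<svg>…</svg>` block can be cut out with a `sed` address range. A sketch, assuming the element spans multiple lines; note it prints every inline SVG on the page, so keep only the first candidate and verify it by eye:

```shell
# cut inline <svg>...</svg> blocks out of a saved homepage (demo with a synthetic file)
cat > /tmp/homepage.html <<'EOF'
<header class="nav">
<svg viewBox="0 0 24 24" aria-label="logo">
  <path d="M0 0h24v24H0z"/>
</svg>
</header>
EOF
sed -n '/<svg/,/<\/svg>/p' /tmp/homepage.html > /tmp/logo-candidate.svg
cat /tmp/logo-candidate.svg
```

If the SVG sits on a single line, or the page holds several SVGs (icons as well as the logo), a proper HTML parser is the safer route; this is only a quick first pass.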

**Product photos (mandatory for physical products):**

1. Official product page hero (right-click image URL, curl it)
2. Official press kit / press releases
3. Launch video frames (`yt-dlp` + `ffmpeg`)
4. Wikimedia Commons (public domain)
5. AI generation **using the official product photo as reference** — never replace with CSS silhouettes

**UI screenshots (mandatory for digital products):**

- App Store / Play Store product page (note: may be marketing mockups, not real UI — verify)
- Official site screenshots section
- Product demo video frames
- Official Twitter/X launch posts (often the latest version)
- Real screenshots from your own account when possible

### Step 4 — Quality Bar: The 5-10-2-8 Rule (non-logo assets)

> Logo rules differ: a real logo is mandatory (no logo → stop and ask the user). Other assets (product photos, UI, hero imagery) follow 5-10-2-8.

| Dimension | Standard | Anti-pattern |
| --- | --- | --- |
| **5 search rounds** | Cross-search official site / press kit / social / YouTube / Wikimedia / user account | First page of results, stop |
| **10 candidates** | Gather at least 10 before filtering | Grab 2, no choice |
| **Pick 2 best** | Curate 2 finals from the 10 | Use everything = visual overload |
| **Each ≥8/10 quality** | Below 8 → use honest placeholder or AI-generate from official reference. **Better none than mediocre.** | Pad brand-spec.md with 7/10 fillers |

**8/10 scoring dimensions** (record per asset in `brand-spec.md`):

1. **Resolution** — ≥2000px (≥3000px for print/large screen)
2. **Copyright clarity** — official > public domain > free stock > unclear (unclear = 0)
3. **Brand vibe match** — consistent with the vibe keywords in `brand-spec.md`
4. **Lighting/composition consistency** — the 2 finals should not clash
5. **Standalone narrative** — each asset must independently express a narrative role, not just decorate

### Step 5 — Codify in `brand-spec.md` (single source of truth)

```markdown
# <Brand> · Brand Spec

> Captured: YYYY-MM-DD
> Sources: <list>
> Completeness: complete / partial / inferred

## 🎯 Core Assets (first-class)

### Logo

- Primary: `assets/<brand>/logo.svg`
- Inverse: `assets/<brand>/logo-white.svg`
- Use cases: <intro / outro / corner watermark / global>
- Forbidden: <no stretch / no recolor / no outline>

### Product Photos (required for physical products)

- Hero: `assets/<brand>/product-hero.png` (2000×1500)
- Detail: `assets/<brand>/product-detail-1.png`
- Scene: `assets/<brand>/product-scene.png`

### UI Screenshots (required for digital products)

- Home: `assets/<brand>/ui-home.png`
- Feature: `assets/<brand>/ui-feature-<name>.png`

## 🎨 Supporting Assets

### Palette

- Primary: #XXXXXX <source>
- Background: #XXXXXX
- Ink: #XXXXXX
- Accent: #XXXXXX
- Forbidden: <colors the brand explicitly avoids>

### Typography

- Display: <font stack>
- Body: <font stack>
- Mono: <font stack>

### Signature Details

- <which details are taken to 120%>

### Forbidden Zone

- <explicit "do not" rules>

### Vibe Keywords

- <3-5 adjectives>
```

## Execution Discipline (after spec exists)

- **All HTML must reference asset file paths from `brand-spec.md`** — no CSS silhouettes, no hand-drawn SVG substitutes.
- **Logo as `<img>`** referencing the real file. Never redraw.
- **Product photos as `<img>`** referencing real files. No CSS silhouettes.
- **CSS variables injected from the spec**: `:root { --brand-primary: ...; }` — HTML only uses `var(--brand-*)`.
- This converts brand consistency from "by intent" to "by structure" — adding a new color requires editing the spec first.

## Failure Fallbacks

| Missing | Action |
| --- | --- |
| **Logo not findable** | **Stop and ask the user.** Logo is the foundation of brand recognition. Don't fake it. |
| **Product photo (physical)** | Prefer AI generation **using the official reference image** → ask user → honest placeholder ("product photo TBD") as last resort |
| **UI screenshot (digital)** | Ask user for screenshots from their account → official demo video frames. Don't use generic mockup generators. |
| **Color values** | Run the `design-direction-advisor` skill, recommend 3 directions with explicit assumption labels |

**Forbidden**: silently using CSS silhouettes / generic gradients when assets can't be found. **Better to stop and ask than to fake.**

## Real Failures (why this protocol exists)

- **Kimi animation**: Guessed "should be orange" from memory. Actual brand color: `#1783FF` blue. Full rework.
- **Lovart design**: Mistook a demo brand color in a product screenshot for Lovart's own. Almost destroyed the entire design.
- **DJI Pocket 4 launch animation**: Pulled colors but skipped logo + product image, used CSS silhouettes. Output was a "generic black-bg + orange-accent tech animation" with zero DJI recognition. Designer's note: _"Otherwise, what are we even expressing?"_

## Cost Comparison

| Path | Time |
| --- | --- |
| **Run protocol correctly** | Logo 5 min + product/UI 10 min + color grep 5 min + spec write 10 min = **~30 min** |
| **Skip protocol** | Generic output → user rework 1-2 hours, sometimes full redo |

**The cheapest stability investment in branded design work.** For client deliverables, launch events, or important customer projects, the 30-minute protocol is insurance.
@@ -0,0 +1,149 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: context-condensation
|
|
3
|
+
description: Use when context is approaching budget limits and you need to compress, OR when handing off to another session. Provides the explicit keep/drop rubric for what survives compression — preserve goals, progress, critical files, failing tests; drop exploration noise and resolved threads. Pairs with `/dcp compress` and `/handoff`. Adapted from OpenHands context-condensation, Manus context engineering, HumanLayer backpressure.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Context Condensation
|
|
7
|
+
|
|
8
|
+
> Compression is not deletion — it is **selection of what survives**. The wrong selection makes the agent restart from zero. This skill defines what to keep.
|
|
9
|
+
|
|
10
|
+
## When to Use
|
|
11
|
+
|
|
12
|
+
- Context >100k tokens and growing
|
|
13
|
+
- Phase boundary reached (research done, ready to implement)
|
|
14
|
+
- Handing off to a new session via `/handoff`
|
|
15
|
+
- A subagent returns large output you need to integrate
|
|
16
|
+
- After `/dcp compress` ran but you still feel context drift

## Core Principle: 4 Tiers of Survival

Every conversation chunk falls into one of four tiers. Compress accordingly.

| Tier | Content                                                  | Action                  |
| ---- | -------------------------------------------------------- | ----------------------- |
| 1    | Goal, constraints, user-stated requirements              | **Preserve verbatim**   |
| 2    | Critical state — failing tests, current bug, open files  | **Preserve as summary** |
| 3    | Decisions made, alternatives rejected, why               | **Preserve as 1-liner** |
| 4    | Exploration noise — failed greps, dead ends, raw outputs | **Drop entirely**       |

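As a rough sketch, the tier rubric can be mechanized. The `Chunk` shape and the `kind` tagging below are illustrative assumptions, not part of `/dcp` or any real API; the point is that each tier maps to a distinct compression action:

```typescript
// Illustrative sketch of the 4-tier rubric. The Chunk shape and the
// pre-tagged `kind` field are assumptions for the example, not a real API.
type Tier = 1 | 2 | 3 | 4;

interface Chunk {
  kind: "goal" | "state" | "decision" | "exploration";
  text: string;
}

function tierOf(chunk: Chunk): Tier {
  switch (chunk.kind) {
    case "goal": return 1;        // user request, constraints
    case "state": return 2;       // failing tests, open files
    case "decision": return 3;    // choice + rationale
    case "exploration": return 4; // greps, dead ends, raw dumps
  }
}

function compress(chunks: Chunk[]): string[] {
  return chunks.flatMap((c) => {
    switch (tierOf(c)) {
      case 1: return [c.text];                               // verbatim
      case 2: return [`STATE: ${c.text.slice(0, 200)}`];     // summary
      case 3: return [`DECISION: ${c.text.split("\n")[0]}`]; // 1-liner
      case 4: return [];                                     // dropped
    }
  });
}
```

Note that tier 4 returns an empty array: exploration noise does not get summarized, it vanishes.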
## Keep List (Always)

Compress around these — never drop:

- **The user's literal request.** Direct quote when short. Paraphrase only when long, and label it as a paraphrase.
- **Active failures.** Current error message, failing test name + reason, open bug. Tells the next agent what's broken.
- **Decisions + rationale.** "Chose JWT over sessions because [reason]." One line each. Future agents inherit the why.
- **File paths currently being edited.** Exact paths, not "the auth file".
- **Verification status.** Last typecheck/lint/test result with timestamp.
- **Open questions for the user.** Questions awaiting human input.
- **Constraints discovered mid-task.** "User can't add new deps", "DB is read-only in this env".

## Drop List (Aggressive)

Compress these to a single line or remove entirely:

- **Resolved exploration.** Found the file? Drop the 5 greps that led there.
- **Tool output noise.** Full directory listings, `ls` outputs, package install logs. Keep only the relevant filename.
- **Verbose reasoning.** Your own multi-paragraph thinking that ended in one decision. Keep the decision.
- **Acknowledgments.** "Got it", "I'll do that next" — pure social filler.
- **Failed attempts that taught nothing.** If a wrong approach added no information, drop it. Keep failures that revealed a constraint.
- **Sub-agent self-reports.** Keep the result + verification, drop the agent's narrative summary (Worker Distrust Protocol — don't trust the prose).
- **Old plans you've since revised.** Keep only the current plan.

## Failure Preservation Rule

> **Useful failures stay. Useless failures go.**

A failure is **useful** if it:

- Revealed a constraint ("can't use `mv` on this filesystem")
- Eliminated a hypothesis ("the bug is not in the parser")
- Showed a trap that another agent would re-fall into

A failure is **useless** if it:

- Was a typo / fat-finger fixed immediately
- Was a tool-call format error with no semantic content
- Was an obvious dead end no agent would repeat

**Manus rule** (from "Context Engineering for AI Agents: Lessons from Building Manus"): keep useful failures **in-context** as warnings. Don't compress to "encountered errors then succeeded" — that loses the warning.

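The useful/useless split reduces to a simple predicate. A minimal sketch, assuming hypothetical `Failure` fields that mirror the bullet lists above:

```typescript
// Hypothetical sketch of the failure-preservation rule. A failure
// survives compression only if it carries information another agent
// would otherwise have to re-discover.
interface Failure {
  revealedConstraint: boolean;   // e.g. "can't use `mv` on this filesystem"
  eliminatedHypothesis: boolean; // e.g. "the bug is not in the parser"
  isRepeatableTrap: boolean;     // another agent would re-fall into it
  wasTrivialSlip: boolean;       // typo / tool-call format error
}

function keepFailure(f: Failure): boolean {
  if (f.wasTrivialSlip) return false; // useless: no semantic content
  return f.revealedConstraint || f.eliminatedHypothesis || f.isRepeatableTrap;
}
```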
## Condensation Triggers (Beyond Token Count)

Token count is one trigger. These are the others:

| Trigger                               | Action                                          |
| ------------------------------------- | ----------------------------------------------- |
| Phase boundary (research → implement) | Compress all of research phase to summary       |
| Subagent returns >2k tokens           | Compress immediately, keep result + evidence    |
| Same question asked twice in session  | Indicates context drift — compress aggressively |
| Plan revised                          | Drop old plan completely                        |
| Conversation feels "lost"             | Compress; restart from goal + current state     |

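The trigger table can be read as a checklist. A minimal sketch, assuming a hypothetical `SessionStats` shape; the 100k and 2k thresholds come from this skill, not from any tool:

```typescript
// Illustrative checklist for the trigger table. SessionStats is an
// assumed shape; the thresholds mirror the rubric, not a real API.
interface SessionStats {
  tokenCount: number;
  phaseBoundaryReached: boolean;
  lastSubagentOutputTokens: number;
  repeatedQuestion: boolean;
  planRevised: boolean;
}

function condensationTriggers(s: SessionStats): string[] {
  const triggers: string[] = [];
  if (s.tokenCount > 100_000) triggers.push("token budget");
  if (s.phaseBoundaryReached) triggers.push("phase boundary: compress research phase");
  if (s.lastSubagentOutputTokens > 2_000) triggers.push("large subagent return");
  if (s.repeatedQuestion) triggers.push("context drift: same question twice");
  if (s.planRevised) triggers.push("plan revised: drop old plan");
  return triggers;
}
```

Any non-empty result means condense now rather than waiting for the token ceiling.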
## The Handoff Variant

When compressing for `/handoff` (different session, different agent will pick up), be **even more selective**:

- Preserve everything in the Keep List
- Add a **`## Next Step`** with the literal next action
- Add a **`## Don't Re-Discover`** with traps already mapped
- Drop everything else

Handoff format:

```markdown
## Goal

[user's literal request]

## Status

- Done: [list of completed items with file:line]
- In progress: [current task]
- Blocked: [what's blocking, what was tried]

## Critical State

- Open files: [paths]
- Last verification: [command + result + timestamp]
- Active failures: [error / failing test]

## Decisions Made

- [one-liner] → [one-line rationale]

## Don't Re-Discover

- [trap 1]: [why]
- [trap 2]: [why]

## Next Step

[literal next action — copy-pastable command or description]
```

## Anti-Patterns

- **"Let me summarize what we did so far..."** in every response — you're re-summarizing instead of compressing once at boundaries
- **Dropping the goal during compression** — the worst failure mode; the next agent has no anchor
- **Keeping all failures** — context fills with noise, real signals get buried
- **Lossy paraphrase of the user request** — when in doubt, quote verbatim
- **Compressing during active edits** — wait for the atomic step to finish

## Integration

- **Before `/dcp compress`:** Use this rubric to decide what your `summary` field should contain
- **In `/handoff`:** This skill defines the handoff format
- **After a subagent returns:** Apply the Drop List to the agent's narrative; keep only the result + verification
- **In long sessions:** Re-read your own context every ~30 messages and apply the rubric

## Output

When you condense, briefly state what survived and what was dropped:

```
Compressed messages 12-34. Kept: goal, 3 decisions, current failure (auth.ts:42 null deref).
Dropped: 5 file searches, 2 abandoned approaches, dir listings.
```

This makes the compression auditable. The user (or next agent) can challenge what got cut.