understanding-prime-env 0.1.1 → 0.1.3
package/bin/install.js
CHANGED
@@ -7,7 +7,7 @@ const path = require('path');
 const os = require('os');
 const readline = require('readline');
 
-const SKILL_NAME = 'understand-
+const SKILL_NAME = 'understand-prime-env';
 const PACKAGE_ROOT = path.join(__dirname, '..');
 const SKILL_MD_PATH = path.join(PACKAGE_ROOT, 'skills', SKILL_NAME, 'SKILL.md');
 
@@ -65,7 +65,7 @@ function installCursor() {
   // Cursor MDC format: YAML front-matter + markdown body
   const mdc = [
     '---',
-    `description:
+    `description: understand-prime-env — generate HTML overview for a Prime Intellect verifiers environment`,
     'globs:',
     ' - "**/*.py"',
     'alwaysApply: false',
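The hunk above shows only the opening lines of the front-matter array, so how `installCursor()` completes and writes the rule is not visible in this diff. As a rough sketch, assuming the array is joined with newlines, closed with a second `---`, and followed by the markdown body, the MDC text could be assembled as below. The helper `buildMdc`, its parameters, the placeholder description, and the two-space list indent are all hypothetical and not taken from the package.

```javascript
// Hypothetical helper, not part of the package: assembles a Cursor MDC rule
// as YAML front-matter followed by a markdown body.
function buildMdc(description, globs, body) {
  const frontMatter = [
    '---',
    `description: ${description}`,
    'globs:',
    ...globs.map((glob) => `  - "${glob}"`),
    'alwaysApply: false',
    '---', // assumed closing delimiter; the diff does not show this line
  ];
  return frontMatter.join('\n') + '\n\n' + body + '\n';
}

// Placeholder inputs for illustration only.
const mdcText = buildMdc(
  'example rule description',
  ['**/*.py'],
  '# Understand Environment'
);
console.log(mdcText);
```

Keeping the front-matter as an array of lines, as the original code does, makes a one-line change such as the new `description` easy to review in a diff.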
package/package.json
CHANGED
@@ -0,0 +1,169 @@
+---
+name: understand-prime-env
+description: Generate a rich, self-contained HTML report that fully explains a Prime Intellect verifiers environment. Use this skill any time the user asks to understand, explain, document, visualize, or explore a verifiers environment — even if they just say "what does this environment do?", "explain this env", "give me an overview", or "generate an HTML for this environment". The skill reads the Python source files in the current directory, extracts the dataset, reward functions, rollout logic, and configuration parameters, and writes a beautiful HTML file to the environment folder.
+---
+
+# Understand Environment
+
+## Goal
+
+Produce a single self-contained HTML file (`environment_overview.html`) that gives a first-timer — someone who has never seen this environment — a clear answer to one question in under 2 minutes: **"What does the model get asked to do, and how does it get scored?"**
+
+The output is a single screen (no scrolling), three tabs. That's it.
+
+---
+
+## Step 1 — Read the source
+
+Read **every `.py` file** in the current directory. Also read `pyproject.toml` and `README.md` if they exist. Do not skip helper files — reward logic is often split across modules (e.g. `*_checks.py`, `*_prompts.py`).
+
+Extract only these three things:
+
+### 1. Dataset — what does the model see?
+- Find 1–2 real example prompts from the source (a `PROMPTS` list, HuggingFace dataset, or prompt-building function).
+- If real data is unavailable, synthesize 1–2 examples that match the prompt schema exactly.
+- Extract only the **user-facing prompt text** — what the model actually reads. No metadata, no field schemas, no accompanying fields.
+
+### 2. Rollout — what is the sequence of events?
+- Identify the 4–5 steps that happen during a single rollout: what the model receives, what it produces, what tools or sandbox it has (if any), and what happens at scoring time.
+- Write each step as a short label (2–5 words) and a one-line description.
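The step constraints above (4–5 steps, a 2–5 word label, a one-line description) can be checked mechanically before any HTML is rendered. The sketch below is only an illustration: the `{ label, description }` shape, the function `checkRolloutSteps`, and the sample steps are hypothetical and do not come from the skill or from any real environment.

```javascript
// Hypothetical sample data: four steps, each with a short label and one line of text.
const rolloutSteps = [
  { label: 'Prompt delivered', description: 'The model receives the task text.' },
  { label: 'Model responds', description: 'The model writes a single answer.' },
  { label: 'Checks run', description: 'Reward functions inspect the answer.' },
  { label: 'Score returned', description: 'Rewards are combined into one number.' },
];

// Returns a list of human-readable problems; an empty list means the steps fit the limits.
function checkRolloutSteps(steps) {
  const problems = [];
  if (steps.length < 4 || steps.length > 5) {
    problems.push(`expected 4-5 steps, got ${steps.length}`);
  }
  steps.forEach((step, index) => {
    const wordCount = step.label.trim().split(/\s+/).length;
    if (wordCount < 2 || wordCount > 5) {
      problems.push(`step ${index + 1}: label has ${wordCount} word(s), expected 2-5`);
    }
    if (step.description.includes('\n')) {
      problems.push(`step ${index + 1}: description spans more than one line`);
    }
  });
  return problems;
}

console.log(checkRolloutSteps(rolloutSteps)); // prints []
```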
+
+### 3. Rewards — how does scoring work?
+- List every reward function (`@vf.reward`, functions passed to `Rubric`, reward methods on `Taskset`).
+- For each: its name and one sentence describing what it measures.
+- If multiple rewards combine into a final score, extract the exact formula (e.g. `R = (1 - hw) × visible + hw × hidden`).
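To make the example formula concrete, the sketch below evaluates `R = (1 - hw) × visible + hw × hidden` directly. It assumes `hw` is a weight between 0 and 1 and that `visible` and `hidden` are numeric reward values; the function name `compositeReward` is hypothetical, and a real environment may combine its rewards differently.

```javascript
// Weighted blend of two reward values, following the example formula.
// hw is assumed to be the weight given to the hidden reward, in [0, 1].
function compositeReward(visible, hidden, hw) {
  if (hw < 0 || hw > 1) {
    throw new RangeError('hw must be between 0 and 1');
  }
  return (1 - hw) * visible + hw * hidden;
}

console.log(compositeReward(1.0, 0.5, 0.5)); // prints 0.75
console.log(compositeReward(1.0, 0.0, 0.0)); // prints 1 (hidden reward ignored)
```

With `hw = 0` the score is the visible reward alone, and with `hw = 1` it is the hidden reward alone, which is a quick sanity check when reading a formula out of source code.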
+
+---
+
+## Step 2 — Generate the HTML
+
+Write a single self-contained HTML file to `./environment_overview.html`. No external CDN dependencies — all CSS and JS inline.
+
+### Design
+
+**Light theme default, dark toggle in the top-right corner.**
+
+```
+Light: bg #f8f7f4 · card #ffffff · border #e5e1f0
+       text #1a1523 · muted #8b82a8 · accent #a855f7
+Dark:  bg #0f0f1a · card #161627 · border #2a2a4a
+       text #e2e8f0 · muted #6b6890 · accent #a855f7
+```
+
+All colors as CSS custom properties on `:root` and `[data-theme="dark"]`. Toggle swaps the attribute; `localStorage` persists the choice.
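One way to turn the two palettes into CSS custom properties is to generate the rule text from a plain object. This is a sketch under assumptions: only `--bg-card`, `--border`, and `--accent` are named in the new skill text, so the remaining variable names and the helpers `cssVars` and `buildThemeCss` are hypothetical.

```javascript
// Palette values copied from the color table; variable names are partly assumed.
const palette = {
  light: {
    'bg-page': '#f8f7f4', 'bg-card': '#ffffff', border: '#e5e1f0',
    'text-primary': '#1a1523', 'text-muted': '#8b82a8', accent: '#a855f7',
  },
  dark: {
    'bg-page': '#0f0f1a', 'bg-card': '#161627', border: '#2a2a4a',
    'text-primary': '#e2e8f0', 'text-muted': '#6b6890', accent: '#a855f7',
  },
};

// Turns { name: value } pairs into indented "--name: value;" declarations.
function cssVars(colors) {
  return Object.entries(colors)
    .map(([name, value]) => `  --${name}: ${value};`)
    .join('\n');
}

// Light values go on :root (the default); dark values override them
// whenever data-theme="dark" is set on an ancestor, here <html>.
function buildThemeCss(p) {
  return `:root {\n${cssVars(p.light)}\n}\n[data-theme="dark"] {\n${cssVars(p.dark)}\n}\n`;
}

const themeCss = buildThemeCss(palette);
console.log(themeCss);
```

Because the dark block only redefines variables, every rule that uses `var(--…)` picks up the new values when the attribute changes, without any per-element theme logic.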
+
+Typography: Georgia/serif for the env name; `-apple-system, Helvetica Neue, sans-serif` for everything else; `ui-monospace, Fira Code, monospace` for code and formulas. No Inter, no Roboto.
+
+### Structure
+
+The entire page fits on one screen without scrolling. Layout:
+
+```
+┌─────────────────────────────────────────────┐
+│  env name (large, serif)        [☀/☾ toggle]│
+│  one-sentence description                   │
+├─────────────────────────────────────────────┤
+│  [ Dataset ]  [ Rollout ]  [ Rewards ]      │
+├─────────────────────────────────────────────┤
+│                                             │
+│  tab content (no scroll)                    │
+│                                             │
+└─────────────────────────────────────────────┘
+```
+
+### Tab 1 — Dataset
+
+Show 1–2 example prompts in a clean monospace block:
+- `background: var(--bg-code)`, `border-left: 3px solid var(--accent)`, `padding: 12px 16px`, `border-radius: 0 6px 6px 0`
+- If there are 2 examples, a subtle "Example 1 / 2" toggle (two small buttons, no full tab strip)
+- Nothing else on this tab — no labels, no field names, no copy button
+
+### Tab 2 — Rollout
+
+A static horizontal pipeline: 4–5 boxes connected by `→` arrows.
+
+```
+[ Prompt ] → [ Model ] → [ Response ] → [ Scoring ] → [ Score ]
+```
+
+Each box:
+- `background: var(--bg-card)`, `border: 1.5px solid var(--border)`, `border-radius: 8px`, `padding: 10px 16px`
+- **Bold label** (2–4 words) on top
+- One-line description beneath in muted text, `font-size: 0.8rem`
+- On hover: `border-color: var(--accent)`
+
+Arrows: plain `→` character in muted color between boxes. No SVG, no animation.
+
+Layout: `display: flex; align-items: center; gap: 8px; flex-wrap: wrap` so it reflows gracefully on smaller screens.
+
+### Tab 3 — Rewards
+
+A clean list. For each reward function:
+
+```
+reward_name
+One sentence describing what it measures.
+```
+
+- Name: monospace, accent color, `font-size: 0.9rem`
+- Description: normal prose, secondary text color, `font-size: 0.875rem`
+- Separated by a thin `border-bottom: 1px solid var(--border)`
+
+If there is a composite formula, show it below the list in a single styled block:
+```
+background: var(--accent-glow) /* rgba(168,85,247,0.10) */
+border: 1px solid var(--accent)
+border-radius: 6px
+padding: 12px 16px
+font-family: monospace
+color: var(--accent)
+```
+
+Nothing else on this tab — no weights, no score bars, no judge details.
+
+### Theme Toggle
+
+A small pill button, top-right of the header. Shows `☀` in dark mode, `☾` in light mode.
+
+```js
+const root = document.documentElement;
+const btn = document.getElementById('theme-toggle');
+const saved = localStorage.getItem('pi-theme');
+if (saved) root.setAttribute('data-theme', saved);
+btn.addEventListener('click', () => {
+  const next = root.getAttribute('data-theme') === 'dark' ? 'light' : 'dark';
+  root.setAttribute('data-theme', next);
+  localStorage.setItem('pi-theme', next);
+});
+```
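The snippet above swaps the `data-theme` attribute but never changes the button's own glyph, although the text asks for `☀` in dark mode and `☾` in light mode. A minimal sketch of that missing piece follows, written as pure functions so it runs outside a browser; `nextTheme` and `toggleGlyph` are hypothetical names, and treating a missing attribute as the light theme is an assumption based on light being the default.

```javascript
// Mirrors the ternary in the click handler: anything other than 'dark' flips to 'dark'.
function nextTheme(current) {
  return current === 'dark' ? 'light' : 'dark';
}

// Glyph the button should show for the theme that is currently active.
// A missing attribute (null) is treated as the light default.
function toggleGlyph(theme) {
  return theme === 'dark' ? '☀' : '☾';
}

// Simulated page state: no data-theme attribute has been set yet.
let theme = null;
console.log(toggleGlyph(theme)); // prints ☾
theme = nextTheme(theme);
console.log(theme, toggleGlyph(theme)); // prints dark ☀
```

In the page itself, this would amount to setting `btn.textContent = toggleGlyph(root.getAttribute('data-theme'))` once after the saved theme is restored and again at the end of the click handler.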
+
+### Tab Switching
+
+```js
+document.querySelectorAll('.tab-btn').forEach(btn => {
+  btn.addEventListener('click', () => {
+    document.querySelectorAll('.tab-btn').forEach(b => b.classList.remove('active'));
+    document.querySelectorAll('.tab-panel').forEach(p => p.classList.remove('active'));
+    btn.classList.add('active');
+    document.getElementById(btn.dataset.tab).classList.add('active');
+  });
+});
+```
+
+Active tab style: `border-bottom: 2px solid var(--accent)`, accent color text. Inactive: muted text, no border.
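The tab-switching handler only works if each `.tab-btn` carries a `data-tab` attribute equal to the `id` of its `.tab-panel`. The sketch below generates matching markup for the three tabs; the `tab-` id prefix and the helpers `tabId` and `renderTabs` are hypothetical, and it assumes the stylesheet hides panels that lack the `active` class.

```javascript
const tabNames = ['Dataset', 'Rollout', 'Rewards'];

// Hypothetical id scheme: "Dataset" -> "tab-dataset".
function tabId(name) {
  return `tab-${name.toLowerCase()}`;
}

// Builds the button strip and the panels; the first tab starts active.
// panels maps a tab name to its inner HTML (missing entries render empty).
function renderTabs(names, panels) {
  const buttons = names
    .map((name, i) =>
      `<button class="tab-btn${i === 0 ? ' active' : ''}" data-tab="${tabId(name)}">${name}</button>`)
    .join('\n');
  const sections = names
    .map((name, i) =>
      `<div class="tab-panel${i === 0 ? ' active' : ''}" id="${tabId(name)}">${panels[name] || ''}</div>`)
    .join('\n');
  return `${buttons}\n${sections}`;
}

const tabsHtml = renderTabs(tabNames, { Dataset: '<pre>example prompt</pre>' });
console.log(tabsHtml);
```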
+
+---
+
+## Step 3 — Confirm and report
+
+After writing the file, tell the user:
+- The full path and `open environment_overview.html` command
+- Two sentences: what the environment does and how it scores
+
+## Anti-patterns
+
+- Do not add config parameters, file maps, quick-start commands, or any section beyond the three tabs
+- Do not add animations, score bars, copy buttons, or collapsible sections
+- Do not hallucinate reward weights, defaults, or prompt content not found in the source
+- Do not skip helper modules — they often contain the core scoring logic
+- If content would cause scrolling within a tab, cut it further
@@ -1,398 +0,0 @@
----
-name: understand-environment
-description: Generate a rich, self-contained HTML report that fully explains a Prime Intellect verifiers environment. Use this skill any time the user asks to understand, explain, document, visualize, or explore a verifiers environment — even if they just say "what does this environment do?", "explain this env", "give me an overview", or "generate an HTML for this environment". The skill reads the Python source files in the current directory, extracts the dataset, reward functions, rollout logic, and configuration parameters, and writes a beautiful HTML file to the environment folder.
----
-
-# Understand Environment
-
-## Goal
-
-Produce a single self-contained HTML file (`environment_overview.html`) that gives anyone — a researcher, a new contributor, a team lead — a complete, visual understanding of what a verifiers environment does, how rollouts are judged, and how to use it. Run this skill from inside an environment directory (any folder under `environments/`).
-
-## Step 1 — Read the source
-
-Read **every `.py` file** in the current directory. Also read `pyproject.toml` and `README.md` if they exist. Do not skip helper files — reward logic is often split across modules (e.g. `*_checks.py`, `*_prompts.py`).
-
-Extract these four things:
-
-### 1. Dataset / Task Prompts
-- What prompts or tasks does the environment feed to the model?
-- If the environment imports a `PROMPTS` list, a HuggingFace dataset, or any structured prompt-building function, surface the actual content (or a representative sample of ≤10 rows).
-- If real data is too large or not local, synthesize 3–5 realistic example rows that match the prompt schema exactly.
-- Show the full message structure: `system` (if any), `user` content, and what fields accompany each row (e.g. `answer`, `info`, `checks`).
-
-### 2. Configuration Parameters
-- Find every parameter exposed by `load_environment(...)`, `TasksetConfig`, `HarnessConfig`, or `EnvConfig`.
-- For each: name, type, default value, and a plain-English description of what it controls.
-- Flag parameters that have significant behavioral impact (e.g. change scoring mode, enable/disable reward components).
-
-### 3. Reward Functions & Scoring Logic
-- List every reward and metric function (`@vf.reward`, `@vf.metric`, functions passed to `Rubric`, reward methods on `Taskset`).
-- For each: name, weight (if any), what it measures, scoring range (0–1, float, etc.).
-- Show the **composite reward formula** if multiple rewards are combined.
-- If a judge LLM is used, state the model, the abbreviated judge prompt, and what it returns.
-
-### 4. Rollout Logic — What Gets Judged
-- What the model sees, how many turns, what tools or sandbox it has access to, what it's expected to produce.
-- What signals are measured: visible constraints, hidden signals, group monitors, etc.
-- The full scoring pipeline in plain English: "Model is given X, produces Y, then Z is checked, W is judged by a model, final score = …"
-
----
-
-## Step 2 — Generate the HTML
-
-Write a single **self-contained** HTML file to `./environment_overview.html`. No external CDN dependencies whatsoever — all CSS, JS, fonts, and icons must be inline.
-
----
-
-### Design Direction: "Research Notebook"
-
-The aesthetic is a **premium research notebook** — crisp, information-dense, and scholarly, with a living interactive layer. Light mode is the default; it should feel like a beautifully typeset technical paper. Dark mode shifts to Prime Intellect's signature deep-space palette.
-
-**The one thing someone will remember**: The animated rollout pipeline — glowing nodes connected by flowing dashed lines, showing exactly what happens to a prompt step-by-step.
-
----
-
-### Theme System
-
-Implement a **light/dark toggle** using a single `data-theme` attribute on `<html>`. All colors must be CSS custom properties so the toggle is a single attribute swap with a smooth `transition: background 0.3s, color 0.3s` on every element.
-
-**Light theme (default):**
-```
---bg-page:       #f8f7f4 /* warm off-white, like laid paper */
---bg-card:       #ffffff
---bg-card-hover: #faf9ff /* barely-there purple tint on hover */
---bg-code:       #f3f0ff /* very light purple wash for code */
---border:        #e5e1f0 /* soft lavender border */
---border-strong: #c4b8e8
---text-primary:  #1a1523 /* near-black with purple undertone */
---text-secondary:#4a4560
---text-muted:    #8b82a8
---accent:        #a855f7 /* PI purple */
---accent-dark:   #7c3aed
---accent-glow:   rgba(168,85,247,0.15)
---green:         #16a34a
---red:           #dc2626
---yellow:        #ca8a04
---shadow-sm:     0 1px 3px rgba(120,80,180,0.08), 0 1px 2px rgba(120,80,180,0.05)
---shadow-md:     0 4px 16px rgba(120,80,180,0.10), 0 2px 6px rgba(120,80,180,0.06)
---shadow-lg:     0 12px 40px rgba(120,80,180,0.14)
-```
-
-**Dark theme** (`[data-theme="dark"]`):
-```
---bg-page:       #0f0f1a
---bg-card:       #161627
---bg-card-hover: #1c1c35
---bg-code:       #1a1a30
---border:        #2a2a4a
---border-strong: #3d3d6b
---text-primary:  #e2e8f0
---text-secondary:#b8b0d0
---text-muted:    #6b6890
---accent:        #a855f7
---accent-dark:   #c084fc
---accent-glow:   rgba(168,85,247,0.20)
---green:         #22c55e
---red:           #ef4444
---yellow:        #eab308
---shadow-sm:     0 1px 3px rgba(0,0,0,0.4)
---shadow-md:     0 4px 16px rgba(0,0,0,0.5)
---shadow-lg:     0 12px 40px rgba(0,0,0,0.6)
-```
-
----
-
-### Typography
-
-No generic fonts. Use this stack for the UI body:
-```css
-font-family: 'Georgia', 'Times New Roman', serif; /* for display/headers — editorial */
-font-family: ui-monospace, 'Cascadia Code', 'Fira Code', monospace; /* code */
-font-family: -apple-system, 'Helvetica Neue', Arial, sans-serif; /* body prose */
-```
-
-Apply Georgia/serif to section headings (`h1`, `h2`) for a research-paper feel. Use the sans stack for body text, labels, badges. Use mono for all code.
-
-Typographic scale:
-- Page title: `3rem`, `font-weight: 800`, letter-spacing `-0.03em`
-- Section headings: `1.4rem`, serif, `font-weight: 700`
-- Body: `0.9375rem` / `1.6` line-height
-- Captions/labels: `0.75rem`, uppercase, letter-spacing `0.08em`
-- Code: `0.85rem`
-
----
-
-### Layout
-
-```
-┌─────────────────────────────────────────────────────┐
-│  STICKY NAV  [logo]  [sections...]  [theme toggle]  │
-├─────────────────────────────────────────────────────┤
-│  HERO HEADER (full-width, gradient mesh background) │
-│  env name + description + stat badges               │
-├─────────┬───────────────────────────────────────────┤
-│         │  ROLLOUT PIPELINE (animated, full-width)  │
-│  SIDE   ├───────────────────────────────────────────┤
-│  TOC    │  DATASET EXAMPLES (card carousel/tabs)    │
-│  (sticky│───────────────────────────────────────────┤
-│  left   │  REWARD & SCORING (score bars + formula)  │
-│  on     ├───────────────────────────────────────────┤
-│  desktop│  CONFIGURATION (grouped param table)      │
-│  )      ├───────────────────────────────────────────┤
-│         │  QUICK START (terminal-style code block)  │
-│         ├───────────────────────────────────────────┤
-│         │  FILE MAP (visual tree)                   │
-└─────────┴───────────────────────────────────────────┘
-```
-
-On mobile (< 768px): sidebar collapses, single column.
-
----
-
-### Component Specs
-
-#### Sticky Navigation Bar
-- Height: 52px, `backdrop-filter: blur(12px)`, `background: rgba(var(--bg-page-rgb), 0.85)`
-- Left: `⬡ PI` monogram in accent color + env name in muted text
-- Center: anchor links to each section — underline slides in on hover
-- Right: **theme toggle** — a pill switch (`☀ / ☾`) with CSS transition. Clicking it toggles `data-theme="dark"` on `<html>` and persists to `localStorage`
-- On scroll past hero, nav gains `box-shadow: var(--shadow-sm)`
-
-#### Hero Header
-- Background: a CSS mesh gradient that shifts between `--bg-page` and a very subtle purple wash. In light mode: `radial-gradient(ellipse at 20% 50%, rgba(168,85,247,0.07) 0%, transparent 60%), radial-gradient(ellipse at 80% 20%, rgba(124,58,237,0.05) 0%, transparent 50%)`. In dark mode, increase opacity to 0.15/0.12.
-- Top-left: `⬡ Prime Intellect` in `0.75rem` caps + accent color
-- Environment name: large serif, with a thin purple underline that animates in (width 0→100%) on page load using a CSS keyframe
-- One-sentence description beneath in secondary text
-- Stat badges row: pill chips showing environment type, reward count, dataset size, turns. Each chip: `background: var(--accent-glow)`, `border: 1px solid var(--accent)`, `color: var(--accent)`, `font-size: 0.75rem`, `border-radius: 99px`, `padding: 3px 12px`
-
-#### Animated Rollout Pipeline
-This is the centerpiece section. Render it as an SVG or pure CSS/HTML flow diagram.
-
-Structure: horizontal nodes connected by animated dashed lines.
-
-```
-[PROMPT] ──▶ [MODEL] ──▶ [RESPONSE] ──▶ [SCORING] ──▶ [FINAL SCORE]
-```
-
-Each node:
-- Rounded rectangle, `background: var(--bg-card)`, `border: 2px solid var(--border)`
-- Icon (use Unicode/emoji: 📋 🤖 💬 ⚖️ 🎯) + label + 1-line description from the actual env
-- On hover: `border-color: var(--accent)`, subtle `box-shadow: 0 0 0 3px var(--accent-glow)`
-
-Connecting lines: SVG `<line>` or CSS `border-top: 2px dashed var(--border-strong)` with an animated flow:
-```css
-@keyframes flow {
-  from { stroke-dashoffset: 20; }
-  to { stroke-dashoffset: 0; }
-}
-/* apply: animation: flow 1s linear infinite; */
-```
-
-If the env has tool calls or sandbox steps, add branch nodes below the main line with a vertical connector.
-
-On page load, nodes fade+slide in with staggered `animation-delay` (0.1s per node).
-
-#### Dataset Examples
-- Tab strip at top: "Example 1", "Example 2", … (up to 5 tabs). Active tab: `border-bottom: 2px solid var(--accent)`, accent color text
-- Each tab panel: a card showing the prompt content, then metadata chips for accompanying fields
-- Prompt content: monospace block with subtle `background: var(--bg-code)`, `border-left: 3px solid var(--accent)`, `padding: 12px 16px`
-- If the prompt has format constraints (e.g. "must not contain letter g"), highlight those phrases with `background: rgba(168,85,247,0.15)`, `border-radius: 3px`
-- Copy button: top-right of each code block, shows "Copied!" with checkmark on click (pure JS)
-
-#### Reward & Scoring
-For each reward function, render a **reward card**:
-```
-┌─────────────────────────────────────────────┐
-│  ◉ visible_reward              weight: 0.5  │
-│  ─────────────────────────────────────────  │
-│  Checks format constraints programmatically │
-│                                             │
-│  Score range  [████████░░]  0.0 → 1.0       │
-│  Type: deterministic · Returns: float       │
-└─────────────────────────────────────────────┘
-```
-
-Score bar: `<div>` with `background: linear-gradient(to right, var(--accent), var(--accent-dark))`, width animates from 0 to the displayed percentage on page load using `@keyframes grow-bar`.
-
-If there's a composite formula, render it in a prominent callout:
-```
-┌─ COMPOSITE FORMULA ──────────────────────────────┐
-│  R = (1 − hidden_weight) × visible                │
-│    + hidden_weight × hidden                       │
-└───────────────────────────────────────────────────┘
-```
-Style: `background: var(--accent-glow)`, `border: 1px solid var(--accent)`, `border-radius: 8px`, `padding: 16px 20px`, monospace formula text in accent color.
-
-If a judge LLM is used: a separate "Judge" callout card showing model name as a badge, abbreviated prompt in a collapsible block.
-
-#### Configuration Table
-Group parameters by `[Taskset]` / `[Harness]` / `[Top-level]` with a small section label.
-
-Each row:
-- `Parameter`: monospace, accent color
-- `Type`: small badge — gray bg, rounded
-- `Default`: monospace, muted
-- `Description`: normal prose
-
-High-impact parameters get a `⚡ key` badge in accent color.
-
-Alternating row background: `var(--bg-card)` / `var(--bg-card-hover)` for readability.
-
-#### Quick Start Block
-A terminal-style window:
-- Header bar: three colored dots (red/yellow/green `●●●`) + title "bash" in muted text
-- Body: `background: #1a1a2e` (always dark, regardless of theme — this is a terminal), `color: #e2e8f0`
-- Commands: lines prefixed with `$ ` in accent color; actual command in white
-- Comments: muted color
-- Copy-all button top-right
-
-#### File Map
-A visual file tree using box-drawing characters in a monospace block:
-```
-environments/ifeval_goblin/
-├── ifeval_goblin.py          — Main environment, config classes, taskset
-├── ifeval_goblin_checks.py   — Format constraint checkers
-├── ifeval_goblin_prompts.py  — PROMPTS list (64 task definitions)
-└── pyproject.toml            — Dependencies and eval defaults
-```
-Style: `background: var(--bg-code)`, `border-radius: 8px`, `padding: 20px`, monospace, muted text with filename in primary color.
-
-#### Collapsible Sections
-Every `<section>` has a toggle header. Clicking it smoothly expands/collapses:
-```css
-.section-body {
-  display: grid;
-  grid-template-rows: 1fr;
-  transition: grid-template-rows 0.3s ease;
-}
-.section-body.collapsed {
-  grid-template-rows: 0fr;
-}
-.section-body > .inner { overflow: hidden; }
-```
-The section heading shows a `▾` / `▸` chevron that rotates with `transition: transform 0.3s`.
-
-#### Syntax Highlighting (inline JS, no library)
-For Python code snippets, apply a minimal tokenizer via JS after DOM load:
-- Keywords (`def`, `class`, `import`, `from`, `return`, `if`, `async`, `await`, `True`, `False`, `None`): wrap in `<span style="color: var(--accent)">`
-- Strings (`"..."`, `'...'`, `"""..."""`): `color: var(--green)`
-- Comments (`# ...`): `color: var(--text-muted); font-style: italic`
-- Numbers: `color: var(--yellow)`
-- Decorator (`@...`): `color: var(--accent-dark)`
-
-#### Footer
-```
-Generated by Claude · Prime Intellect Verifiers · <timestamp>
-```
-Small, centered, `color: var(--text-muted)`, `font-size: 0.75rem`. Separator line above using `border-top: 1px solid var(--border)`.
-
----
-
-### Animation Summary
-
-All animations use `prefers-reduced-motion: reduce` guard:
-```css
-@media (prefers-reduced-motion: reduce) {
-  *, *::before, *::after { animation: none !important; transition: none !important; }
-}
-```
-
-Animations to include:
-1. **Page load**: Hero title underline grows left-to-right (600ms ease-out, 200ms delay)
-2. **Stagger fade-in**: Pipeline nodes slide up 12px + fade in, 100ms stagger per node
-3. **Score bars**: Width grows from 0 to final value (800ms ease-out, triggered when section scrolls into view via `IntersectionObserver`)
-4. **Flow lines**: Dashed SVG/CSS lines have continuous `stroke-dashoffset` animation
-5. **Theme toggle**: Smooth 300ms transition on all color properties
-6. **Tab switch**: Content fades in at 150ms
-7. **Section collapse**: Grid row height transition (300ms)
-8. **Copy button**: Brief scale pulse (0.95 → 1.0) + text change
-
----
-
-### JavaScript (inline, vanilla, ~80 lines total)
-
-```js
-// Theme toggle
-const toggle = document.getElementById('theme-toggle');
-const root = document.documentElement;
-const saved = localStorage.getItem('pi-theme');
-if (saved) root.setAttribute('data-theme', saved);
-toggle.addEventListener('click', () => {
-  const next = root.getAttribute('data-theme') === 'dark' ? 'light' : 'dark';
-  root.setAttribute('data-theme', next);
-  localStorage.setItem('pi-theme', next);
-});
-
-// Score bar animation via IntersectionObserver
-const observer = new IntersectionObserver((entries) => {
-  entries.forEach(e => {
-    if (e.isIntersecting) {
-      e.target.style.width = e.target.dataset.target;
-      observer.unobserve(e.target);
-    }
-  });
-}, { threshold: 0.3 });
-document.querySelectorAll('.score-bar-fill').forEach(el => observer.observe(el));
-
-// Copy buttons
-document.querySelectorAll('.copy-btn').forEach(btn => {
-  btn.addEventListener('click', () => {
-    navigator.clipboard.writeText(btn.closest('.code-block').querySelector('code').innerText);
-    btn.textContent = '✓ Copied';
-    setTimeout(() => btn.textContent = 'Copy', 1800);
-  });
-});
-
-// Tab switching
-document.querySelectorAll('.tab-btn').forEach(btn => {
-  btn.addEventListener('click', () => {
-    const group = btn.closest('.tab-group');
-    group.querySelectorAll('.tab-btn').forEach(b => b.classList.remove('active'));
-    group.querySelectorAll('.tab-panel').forEach(p => p.classList.remove('active'));
-    btn.classList.add('active');
-    group.querySelector(btn.dataset.target).classList.add('active');
-  });
-});
-
-// Collapsible sections
-document.querySelectorAll('.section-header').forEach(header => {
-  header.addEventListener('click', () => {
-    const body = header.nextElementSibling;
-    const chevron = header.querySelector('.chevron');
-    body.classList.toggle('collapsed');
-    chevron.style.transform = body.classList.contains('collapsed') ? 'rotate(-90deg)' : 'rotate(0deg)';
-  });
-});
-
-// Active nav highlight on scroll
-const sections = document.querySelectorAll('section[id]');
-const navLinks = document.querySelectorAll('.nav-link');
-const scrollObserver = new IntersectionObserver((entries) => {
-  entries.forEach(e => {
-    if (e.isIntersecting) {
-      navLinks.forEach(l => l.classList.remove('active'));
-      document.querySelector(`.nav-link[href="#${e.target.id}"]`)?.classList.add('active');
-    }
-  });
-}, { rootMargin: '-40% 0px -55% 0px' });
-sections.forEach(s => scrollObserver.observe(s));
-```
-
----
-
-## Step 3 — Confirm and report
-
-After writing the file, tell the user:
-- The full path to `environment_overview.html` and the command to open it (`open environment_overview.html`)
-- A one-paragraph summary: environment type, number of reward functions, dataset source, and the key behavioral parameters worth knowing
-
-If any section couldn't be filled because the information wasn't in the source, say so explicitly — never hallucinate reward weights, defaults, or dataset contents.
-
-## Anti-patterns
-
-- Do not invent reward weights, parameter defaults, or dataset contents not in the source
-- Do not link to external URLs — the file must be fully self-contained
-- Do not skip helper modules (e.g. `*_checks.py`) — they often contain the core scoring logic
-- Do not use Inter, Roboto, or Arial — use the Georgia/serif + system-sans stack specified above
-- Do not default to a dark-only output — light is the default; dark toggle must work