understanding-prime-env 0.1.5 → 0.1.7

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "understanding-prime-env",
-   "version": "0.1.5",
+   "version": "0.1.7",
    "description": "Generate a rich, self-contained HTML report explaining any Prime Intellect verifiers environment.",
    "keywords": [
      "prime-intellect",
@@ -1,169 +1,290 @@
  ---
  name: understand-prime-env
- description: Generate a rich, self-contained HTML report that fully explains a Prime Intellect verifiers environment. Use this skill any time the user asks to understand, explain, document, visualize, or explore a verifiers environment — even if they just say "what does this environment do?", "explain this env", "give me an overview", or "generate an HTML for this environment". The skill reads the Python source files in the current directory, extracts the dataset, reward functions, rollout logic, and configuration parameters, and writes a beautiful HTML file to the environment folder.
+ description: Generate a rich, self-contained HTML report that fully explains a Prime Intellect verifiers environment. Use this skill any time the user asks to understand, explain, document, visualize, or explore a verifiers environment — even if they just say "what does this environment do?", "explain this env", "give me an overview", or "generate an HTML for this environment". The skill reads the Python source files in the current directory, extracts the raw dataset, reward functions, and rollout logic, and writes a visually stunning gamified HTML file to the environment folder.
  ---

- # Understand Environment
+ # Understand Prime Environment

  ## Goal

- Produce a single self-contained HTML file (`environment_overview.html`) that gives a first-timer (someone who has never seen this environment) a clear answer to one question in under 2 minutes: **"What does the model get asked to do, and how does it get scored?"**
-
- The output is a single screen (no scrolling), three tabs. That's it.
+ Produce a single self-contained HTML file (`environment_overview.html`). A researcher opens it and sees a **stack of 4 cards** like a physical deck, each one peeking out behind the one in front. They click through the deck, one card at a time, in a satisfying progressive reveal. Each card is one chapter of the story. The whole experience should feel like flipping through a beautifully designed research brief.

  ---

  ## Step 1 — Read the source

- Read **every `.py` file** in the current directory. Also read `pyproject.toml` and `README.md` if they exist. Do not skip helper files — reward logic is often split across modules (e.g. `*_checks.py`, `*_prompts.py`).
+ Read **every `.py` file** in the current directory. Also read `pyproject.toml` and `README.md` if they exist. Do not skip helper files — reward logic is often split across modules (e.g. `*_checks.py`, `*_prompts.py`). Read everything before writing a single line of HTML.
+
+ Extract exactly four things:

- Extract only these three things:
+ ### Card 1 - Environment
+ - Name, and one punchy paragraph (3–4 sentences) describing what task this trains a model to do
+ - GitHub URL if found anywhere in source or README — if not found, omit entirely
+ - 3–5 stat chips: dataset size, reward count, turn count, task type, etc.

- ### 1. Dataset - what does the model see?
- - Find 1–2 real example prompts from the source (a `PROMPTS` list, HuggingFace dataset, or prompt-building function).
- - If real data is unavailable, synthesize 1–2 examples that match the prompt schema exactly.
- - Extract only the **user-facing prompt text**: what the model actually reads. No metadata, no field schemas, no accompanying fields.
+ ### Card 2 - Dataset
+ - Where the data comes from: HuggingFace dataset name + split, hardcoded list, generator, etc. — one line
+ - Every field in a data row: name, type, purpose
+ - One complete example row with every field shown in full: real values if available, otherwise synthesize one that is indistinguishable from real (exact field names, value formats, constraints)

- ### 2. Rollout - what is the sequence of events?
- - Identify the 4–5 steps that happen during a single rollout: what the model receives, what it produces, what tools or sandbox it has (if any), and what happens at scoring time.
- - Write each step as a short label (2–5 words) and a one-line description.
+ ### Card 3 - Rewards
+ - Every reward function: name, exactly what it checks, precisely what makes it score 0 vs 1 (and any partial values)
+ - Any thresholds, regex patterns, or edge cases a model writer needs to know
+ - If rewards combine into a final score: the exact formula

- ### 3. Rewards - how does scoring work?
- - List every reward function (`@vf.reward`, functions passed to `Rubric`, reward methods on `Taskset`).
- - For each: its name and one sentence describing what it measures.
- - If multiple rewards combine into a final score, extract the exact formula (e.g. `R = (1 - hw) × visible + hw × hidden`).
+ ### Card 4 - Rollout
+ - Step-by-step theoretical trace of one example running end-to-end:
+   1. How the raw row becomes the prompt the model sees
+   2. What the model is expected to produce
+   3. How each reward fires on the output
+   4. How the final score is computed
+   5. What a perfect response looks like vs a zero-score response

  ---

  ## Step 2 — Generate the HTML

- Write a single self-contained HTML file to `./environment_overview.html`. No external CDN dependencies — all CSS and JS inline.
+ Write a single **self-contained** HTML file to `./environment_overview.html`. Zero external dependencies — all CSS and JS inline.
+
+ ---
+
+ ### The Core Mechanic — Card Stack Reveal

- ### Design
+ The entire UI is a **centered card deck**. All four cards occupy the same position. The active card is front and center at full size. Cards behind it peek out — each one slightly smaller, slightly lower, slightly darker — giving the illusion of a physical stack.

- **Light theme default, dark toggle in the top-right corner.**
+ ```
+      ░░░░░░░░░░░░░░░░          ← Card 4 (furthest back, barely visible)
+     ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒         ← Card 3
+    ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓        ← Card 2
+ ██████████████████████████     ← Card 1 (active, full size, full opacity)
+ ```
+
+ Clicking anywhere on the active card — or the "Continue →" button — triggers the reveal: the active card flies out (slides left + slight rotation + fade), and the next card scales up to the front with a spring animation. A progress indicator shows position (● ● ○ ○).
+
+ When card 4 is shown, "Continue →" becomes "Done ✓" and clicking it does nothing (or fades the stack out gracefully).
+
+ ---

+ ### Visual Design
+
+ **Background:** Full-viewport dark canvas.
+ ```css
+ body {
+   background: #07090f;
+   background-image:
+     radial-gradient(ellipse at 20% 20%, rgba(168,85,247,0.06) 0%, transparent 50%),
+     radial-gradient(ellipse at 80% 80%, rgba(34,211,238,0.04) 0%, transparent 50%);
+   min-height: 100vh;
+   display: flex;
+   flex-direction: column;
+   align-items: center;
+   justify-content: center;
+ }
  ```
- Light: bg #f8f7f4 · card #ffffff · border #e5e1f0
-        text #1a1523 · muted #8b82a8 · accent #a855f7
- Dark:  bg #0f0f1a · card #161627 · border #2a2a4a
-        text #e2e8f0 · muted #6b6890 · accent #a855f7
+
+ **Card base:**
+ ```css
+ .card {
+   position: absolute;
+   width: min(600px, 90vw);
+   background: #0d1117;
+   border-radius: 24px;
+   padding: 40px 44px 36px;
+   transform-origin: center bottom;
+   will-change: transform, opacity;
+ }
+ ```
+
+ **Stack offset** (CSS, applied via `data-depth` attribute 0–3, 0 = active):
+ ```
+ depth 0: scale(1.00) translateY(0px)   opacity: 1      (active)
+ depth 1: scale(0.96) translateY(18px)  opacity: 0.65   z-index: -1
+ depth 2: scale(0.92) translateY(36px)  opacity: 0.35   z-index: -2
+ depth 3: scale(0.88) translateY(54px)  opacity: 0.15   z-index: -3
+ ```
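
The stack-offset table above follows a regular pattern, so the per-depth values could be derived rather than hard-coded four times. The following is only a sketch under stated assumptions: `stackStyle` and `OPACITY_BY_DEPTH` are hypothetical names that do not appear in the skill file, and the active card's `z-index` of 0 is assumed because the table leaves it unspecified.

```javascript
// Hypothetical helper (not part of the skill file): derives the stack-offset
// values from a card's depth. The 0.04 scale step and 18px offset step are
// read off the table; opacity is looked up because it is not linear.
const OPACITY_BY_DEPTH = [1, 0.65, 0.35, 0.15];

function stackStyle(depth) {
  if (!Number.isInteger(depth) || depth < 0 || depth > 3) {
    throw new RangeError(`depth must be an integer from 0 to 3, got ${depth}`);
  }
  const scale = (1 - 0.04 * depth).toFixed(2);
  return {
    transform: `scale(${scale}) translateY(${18 * depth}px)`,
    opacity: OPACITY_BY_DEPTH[depth],
    // Assumption: the active card sits at z-index 0; deeper cards go negative.
    zIndex: depth === 0 ? 0 : -depth,
  };
}
```

In a generated page the same values would more likely live in four static CSS rules keyed on `data-depth`; the function form is mainly useful for checking that the rules agree with the table.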
+
+ Each card has a unique accent. Apply via a CSS custom property `--accent` and `--glow` set on the card element itself. The gradient border and glow use this accent.
+
+ ```
+ Card 1: --accent: #a855f7   --glow: rgba(168,85,247,0.3)   (purple)
+ Card 2: --accent: #22d3ee   --glow: rgba(34,211,238,0.3)   (cyan)
+ Card 3: --accent: #f59e0b   --glow: rgba(245,158,11,0.3)   (amber)
+ Card 4: --accent: #f43f5e   --glow: rgba(244,63,94,0.3)    (rose)
+ ```
+
+ **Gradient border** on the active card only:
+ ```css
+ .card[data-depth="0"] {
+   box-shadow:
+     0 0 0 1.5px var(--accent),
+     0 0 60px var(--glow),
+     0 32px 80px rgba(0,0,0,0.6);
+ }
+ .card[data-depth="1"],
+ .card[data-depth="2"],
+ .card[data-depth="3"] {
+   box-shadow: 0 0 0 1px rgba(255,255,255,0.06);
+ }
  ```

- All colors as CSS custom properties on `:root` and `[data-theme="dark"]`. Toggle swaps the attribute; `localStorage` persists the choice.
+ **Typography:**
+ ```css
+ font-family: -apple-system, 'SF Pro Display', 'Helvetica Neue', sans-serif;
+ font-family: ui-monospace, 'Cascadia Code', 'Fira Code', monospace; /* code only */
+ ```

- Typography: Georgia/serif for the env name; `-apple-system, Helvetica Neue, sans-serif` for everything else; `ui-monospace, Fira Code, monospace` for code and formulas. No Inter, no Roboto.
+ ---

- ### Structure
+ ### Card Content Specs

- The entire page fits on one screen without scrolling. Layout:
+ Each card has the same shell structure:

  ```
- ┌─────────────────────────────────────────────┐
- │ env name (large, serif)        [☀/☾ toggle] │
- │ one-sentence description                    │
- ├─────────────────────────────────────────────┤
- │ [ Dataset ] [ Rollout ] [ Rewards ]         │
- ├─────────────────────────────────────────────┤
- │                                             │
- │ tab content (no scroll)                     │
- │                                             │
- └─────────────────────────────────────────────┘
+ ┌────────────────────────────────────────────┐
+ │ LABEL (0.65rem, accent, caps, tracking)    │
+ │ TITLE (1.6rem, 700, white)                 │
+ │ ─────────────────────────────────────      │
+ │                                            │
+ │ [BODY — unique per card, see below]        │
+ │                                            │
+ │ ────────────────────────────────────────   │
+ │ [progress dots]        [Continue → button] │
+ └────────────────────────────────────────────┘
  ```

- ### Tab 1 - Dataset
+ **Progress dots:** 4 dots, `width: 7px; height: 7px; border-radius: 50%`. Active dot: accent color, `width: 20px; border-radius: 4px` (pill). Inactive: `rgba(255,255,255,0.15)`. Transition: `width 0.3s ease`.
+
+ **Continue button:** `background: var(--accent)`, `color: #000`, `font-weight: 700`, `font-size: 0.82rem`, `border-radius: 99px`, `padding: 8px 20px`, `border: none`, `cursor: pointer`. Hover: `opacity: 0.85`.

- Show 1–2 example prompts in a clean monospace block:
- - `background: var(--bg-code)`, `border-left: 3px solid var(--accent)`, `padding: 12px 16px`, `border-radius: 0 6px 6px 0`
- - If there are 2 examples, a subtle "Example 1 / 2" toggle (two small buttons, no full tab strip)
- - Nothing else on this tab — no labels, no field names, no copy button
+ ---
+
+ #### Card 1 Body - Environment

- ### Tab 2 - Rollout
+ - **Env name**: `font-size: 1.6rem`, `font-weight: 800`, white
+ - **Description**: 3–4 sentences, `font-size: 0.9rem`, `color: #94a3b8`, `line-height: 1.65`, `margin: 14px 0`
+ - **GitHub link** (only if URL was found in source): pill button — `background: rgba(255,255,255,0.05)`, `border: 1px solid rgba(255,255,255,0.1)`, `color: #e2e8f0`, `border-radius: 99px`, `padding: 5px 14px`, `font-size: 0.78rem`. Shows `↗ GitHub`. Hover: `border-color: var(--accent)`, `color: var(--accent)`. If no URL found, this element does not exist.
+ - **Stat chips**: row of 3–5 pills. `background: rgba(168,85,247,0.08)`, `border: 1px solid rgba(168,85,247,0.2)`, `color: #c4b5fd`, `border-radius: 99px`, `padding: 4px 11px`, `font-size: 0.73rem`

- A static horizontal pipeline: 4–5 boxes connected by `→` arrows.
+ ---

+ #### Card 2 Body — Dataset
+
+ **Source line** — monospace, one line:
  ```
- [ Prompt ] → [ Model ] → [ Response ] → [ Scoring ] → [ Score ]
+ HuggingFace · openai/gsm8k · train split
  ```
+ Style: `background: rgba(34,211,238,0.05)`, `border-left: 3px solid #22d3ee`, `padding: 8px 14px`, `border-radius: 0 6px 6px 0`, `font-size: 0.82rem`, `color: #67e8f9`

- Each box:
- - `background: var(--bg-card)`, `border: 1.5px solid var(--border)`, `border-radius: 8px`, `padding: 10px 16px`
- - **Bold label** (2–4 words) on top
- - One-line description beneath in muted text, `font-size: 0.8rem`
- - On hover: `border-color: var(--accent)`
+ **Field list** — compact, beneath the source line:
+ Each field on one line: `field_name` in cyan monospace + `·` + type/purpose in muted text. `font-size: 0.8rem`, `line-height: 1.8`.

- Arrows: plain `→` character in muted color between boxes. No SVG, no animation.
+ **Example row** (the main content):
+ A clean structured display. Label: `EXAMPLE ROW` in `0.65rem` cyan caps. Then each field:
+ - Field name: cyan monospace, `font-size: 0.78rem`
+ - Value: white, `font-size: 0.82rem`, `line-height: 1.5`
+ - Long text values (prompts, answers): wrapped in a soft box — `background: rgba(255,255,255,0.03)`, `border-radius: 6px`, `padding: 8px 12px`, `margin-top: 2px`
+ - Full content — never truncated

- Layout: `display: flex; align-items: center; gap: 8px; flex-wrap: wrap` so it reflows gracefully on smaller screens.
+ ---

- ### Tab 3 — Rewards
+ #### Card 3 Body — Rewards

- A clean list. For each reward function:
+ For each reward function, a compact block:

  ```
- reward_name
- One sentence describing what it measures.
+ format_reward                                  [float]
+ Checks response contains <answer>…</answer>
+ ✗ 0   tags absent or inner content non-numeric
+ ✓ 1   tags present, content is a valid integer
  ```
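
Reward descriptions such as the one above often contain literal angle brackets (`<answer>…</answer>`), which a browser would swallow as markup if written into the report unescaped. The sketch below shows one way to build a reward block as an HTML string with escaping. It is illustrative only: `escapeHtml`, `renderRewardBlock`, the `reward` object shape, and the class names are all hypothetical and do not come from the skill file.

```javascript
// Hypothetical helper: escape text before placing it inside the report's HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Assumed input shape: { name, type, checks, zero, one }, all strings.
// Class names are placeholders for whatever the generated page uses.
function renderRewardBlock(reward) {
  return [
    '<div class="reward">',
    `  <div class="reward-head"><code>${escapeHtml(reward.name)}</code><span>[${escapeHtml(reward.type)}]</span></div>`,
    `  <p class="reward-desc">${escapeHtml(reward.checks)}</p>`,
    `  <p class="reward-zero">✗ 0 ${escapeHtml(reward.zero)}</p>`,
    `  <p class="reward-one">✓ 1 ${escapeHtml(reward.one)}</p>`,
    "</div>",
  ].join("\n");
}

// Usage with the example block from the spec.
const exampleRewardHtml = renderRewardBlock({
  name: "format_reward",
  type: "float",
  checks: "Checks response contains <answer>…</answer>",
  zero: "tags absent or inner content non-numeric",
  one: "tags present, content is a valid integer",
});
```

The same escaping applies to the example row on Card 2, where prompt text taken from a dataset may contain arbitrary markup.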

- - Name: monospace, accent color, `font-size: 0.9rem`
- - Description: normal prose, secondary text color, `font-size: 0.875rem`
- - Separated by a thin `border-bottom: 1px solid var(--border)`
+ - Name: monospace, `color: #fcd34d`, `font-weight: 600`, `font-size: 0.88rem`
+ - `[float]` badge: `font-size: 0.68rem`, `color: #6b7280`, right-aligned via `display: flex; justify-content: space-between`
+ - Description line: `color: #94a3b8`, `font-size: 0.8rem`, `margin: 4px 0 6px`
+ - `✗ 0` / `✓ 1` lines: `font-size: 0.78rem`, `✗` in `#f87171`, `✓` in `#4ade80`, text in `#94a3b8`
+ - Block: `background: rgba(245,158,11,0.05)`, `border: 1px solid rgba(245,158,11,0.12)`, `border-radius: 10px`, `padding: 12px 14px`, `margin-bottom: 10px`

- If there is a composite formula, show it below the list in a single styled block:
+ If a composite formula exists, show it after all reward blocks:
  ```
- background: var(--accent-glow) /* rgba(168,85,247,0.10) */
- border: 1px solid var(--accent)
- border-radius: 6px
- padding: 12px 16px
- font-family: monospace
- color: var(--accent)
+ background: rgba(245,158,11,0.08)
+ border: 1px solid rgba(245,158,11,0.25)
+ border-radius: 8px · padding: 10px 14px
+ font-family: monospace · color: #fcd34d · font-size: 0.85rem
  ```

- Nothing else on this tab — no weights, no score bars, no judge details.
+ ---

- ### Theme Toggle
+ #### Card 4 Body — Rollout

- A small pill button, top-right of the header. Shows `☀` in dark mode, `☾` in light mode.
+ Numbered steps. Each step:

- ```js
- const root = document.documentElement;
- const btn = document.getElementById('theme-toggle');
- const saved = localStorage.getItem('pi-theme');
- if (saved) root.setAttribute('data-theme', saved);
- btn.addEventListener('click', () => {
-   const next = root.getAttribute('data-theme') === 'dark' ? 'light' : 'dark';
-   root.setAttribute('data-theme', next);
-   localStorage.setItem('pi-theme', next);
- });
  ```
+ 01
+ Data → Prompt
+ The problem field is inserted into "Solve step by step…"
+ as the user message. No system prompt.
+ ```
+
+ - Number: `font-size: 2rem`, `font-weight: 800`, `color: var(--accent)`, `opacity: 0.25`, `line-height: 1`
+ - Title: `font-size: 0.88rem`, `font-weight: 700`, `color: #f1f5f9`, `margin: 2px 0`
+ - Description: `font-size: 0.8rem`, `color: #94a3b8`, `line-height: 1.55`
+ - Left connector: `border-left: 2px solid rgba(244,63,94,0.15)`, `padding-left: 16px`, `margin-left: 12px`, except on last step
+ - Between steps: `margin-bottom: 16px`

- ### Tab Switching
+ Always 5 steps: Data→Prompt · Model Response · Reward Evaluation · Score Computation · Perfect vs Zero.

- ```js
- document.querySelectorAll('.tab-btn').forEach(btn => {
-   btn.addEventListener('click', () => {
-     document.querySelectorAll('.tab-btn').forEach(b => b.classList.remove('active'));
-     document.querySelectorAll('.tab-panel').forEach(p => p.classList.remove('active'));
-     btn.classList.add('active');
-     document.getElementById(btn.dataset.tab).classList.add('active');
-   });
- });
+ ---
+
+ ### Reveal Animation
+
+ ```css
+ @keyframes flyOut {
+   to { transform: translateX(-120%) rotate(-8deg); opacity: 0; }
+ }
+ @keyframes riseUp {
+   from { transform: scale(0.96) translateY(18px); opacity: 0.65; }
+   to { transform: scale(1) translateY(0); opacity: 1; }
+ }
+ ```
+
+ On click:
+ 1. Active card: `flyOut 0.45s cubic-bezier(0.4, 0, 0.2, 1) forwards`
+ 2. After 80ms: next card transitions from depth-1 styles to depth-0 styles — `riseUp 0.5s cubic-bezier(0.34, 1.56, 0.64, 1)` (spring overshoot)
+ 3. All remaining cards shift their `data-depth` attributes down by 1
+ 4. Progress dots update with a `0.3s` width transition
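
The click steps above amount to a small piece of state: which card is active, what depth every other card has, and what the button says. The sketch below models only that bookkeeping as pure functions. It is a hedged illustration, not the skill's implementation: `createDeck`, `depthsFor`, `advance`, and `buttonLabel` are hypothetical names, and DOM wiring, the 80ms delay, and the keyframe animations are left out.

```javascript
// Hypothetical deck state: `active` is the index of the front card.
function createDeck(total = 4) {
  return { active: 0, total };
}

// Depth of every card: null once it has flown out, otherwise distance from the front.
// In the page, each non-null value would be written to that card's data-depth attribute.
function depthsFor(deck) {
  return Array.from({ length: deck.total }, (_, i) =>
    i < deck.active ? null : i - deck.active
  );
}

// One click: move to the next card. On the last card the click is a no-op,
// which is one of the two behaviors the spec allows for "Done ✓".
function advance(deck) {
  if (deck.active >= deck.total - 1) return deck;
  return { ...deck, active: deck.active + 1 };
}

// Button text switches once the last card is at the front.
function buttonLabel(deck) {
  return deck.active === deck.total - 1 ? "Done ✓" : "Continue →";
}
```

Keeping the state separate from the animation makes it straightforward to honor `prefers-reduced-motion`: the depths still update on click even when the keyframes are disabled.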
+
+ Guard all animations:
+ ```css
+ @media (prefers-reduced-motion: reduce) {
+   *, *::before, *::after { animation: none !important; transition: none !important; }
+ }
  ```

- Active tab style: `border-bottom: 2px solid var(--accent)`, accent color text. Inactive: muted text, no border.
+ ---
+
+ ### Page Chrome
+
+ **Top:** Environment name in small muted text, centered, `font-size: 0.75rem`, `color: #334155`, `letter-spacing: 0.08em`, `margin-bottom: 32px`.
+
+ **Bottom:** `Generated by Claude · Prime Intellect · <timestamp>` — `font-size: 0.68rem`, `color: #1e293b`, `margin-top: 28px`.
+
+ Nothing else. No nav, no sidebar, no header. The cards are the whole UI.

  ---

  ## Step 3 — Confirm and report

- After writing the file, tell the user:
- - The full path and `open environment_overview.html` command
- - Two sentences: what the environment does and how it scores
+ After writing the file:
+ - Give the full path and `open environment_overview.html`
+ - Two sentences: what the environment trains and how it scores

  ## Anti-patterns

- - Do not add config parameters, file maps, quick-start commands, or any section beyond the three tabs
- - Do not add animations, score bars, copy buttons, or collapsible sections
- - Do not hallucinate reward weights, defaults, or prompt content not found in the source
- - Do not skip helper modules; they often contain the core scoring logic
- - If content would cause scrolling within a tab, cut it further
+ - Do not dump all content at once; each card is one focused chapter
+ - Do not truncate the example row — every field, every value, in full
+ - Do not invent a GitHub URL; only include it if found in the source
+ - Do not hallucinate reward weights, field names, or dataset content
+ - Do not skip helper modules; they contain the core reward logic
+ - Do not add tabs, sidebars, scroll-within-cards, or any structure beyond the 4-card deck
+ - Do not use a light theme — dark only
+ - Do not use Inter, Roboto, or any Google Font