understanding-prime-env 0.1.5 → 0.1.7

Files changed (2)

package/package.json +1 -1
package/skills/understand-prime-env/SKILL.md +221 -100

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "understanding-prime-env",
-  "version": "0.1.5",
+  "version": "0.1.7",
   "description": "Generate a rich, self-contained HTML report explaining any Prime Intellect verifiers environment.",
   "keywords": [
     "prime-intellect",

package/skills/understand-prime-env/SKILL.md

@@ -1,169 +1,290 @@
 ---
 name: understand-prime-env
-description: Generate a rich, self-contained HTML report that fully explains a Prime Intellect verifiers environment. Use this skill any time the user asks to understand, explain, document, visualize, or explore a verifiers environment — even if they just say "what does this environment do?", "explain this env", "give me an overview", or "generate an HTML for this environment". The skill reads the Python source files in the current directory, extracts the dataset, reward functions, rollout logic, and configuration parameters, and writes a beautiful HTML file to the environment folder.
+description: Generate a rich, self-contained HTML report that fully explains a Prime Intellect verifiers environment. Use this skill any time the user asks to understand, explain, document, visualize, or explore a verifiers environment — even if they just say "what does this environment do?", "explain this env", "give me an overview", or "generate an HTML for this environment". The skill reads the Python source files in the current directory, extracts the raw dataset, reward functions, and rollout logic, and writes a visually stunning gamified HTML file to the environment folder.
 ---
-# Understand Environment
+# Understand Prime Environment
 ## Goal
-Produce a single self-contained HTML file (`environment_overview.html`) that gives a first-timer — someone who has never seen this environment — a clear answer to one question in under 2 minutes: **"What does the model get asked to do, and how does it get scored?"**
-The output is a single screen (no scrolling), three tabs. That's it.
+Produce a single self-contained HTML file (`environment_overview.html`). A researcher opens it and sees a **stack of 4 cards** — like a physical deck — each one peeking out behind the one in front. They click through the deck, one card at a time, in a satisfying progressive reveal. Each card is one chapter of the story. The whole experience should feel like flipping through a beautifully designed research brief.
 ---
 ## Step 1 — Read the source
-Read **every `.py` file** in the current directory. Also read `pyproject.toml` and `README.md` if they exist. Do not skip helper files — reward logic is often split across modules (e.g. `*_checks.py`, `*_prompts.py`).
+Read **every `.py` file** in the current directory. Also read `pyproject.toml` and `README.md` if they exist. Do not skip helper files — reward logic is often split across modules (e.g. `*_checks.py`, `*_prompts.py`). Read everything before writing a single line of HTML.
+Extract exactly four things:
-Extract only these three things:
+### Card 1 — Environment
+- Name, and one punchy paragraph (3–4 sentences) describing what task this trains a model to do
+- GitHub URL if found anywhere in source or README — if not found, omit entirely
+- 3–5 stat chips: dataset size, reward count, turn count, task type, etc.
-### 1. Dataset — what does the model see?
-- Find 1–2 real example prompts from the source (a `PROMPTS` list, HuggingFace dataset, or prompt-building function).
-- If real data is unavailable, synthesize 1–2 examples that match the prompt schema exactly.
-- Extract only the **user-facing prompt text** — what the model actually reads. No metadata, no field schemas, no accompanying fields.
+### Card 2 — Dataset
+- Where the data comes from: HuggingFace dataset name + split, hardcoded list, generator, etc. — one line
+- Every field in a data row: name, type, purpose
+- One complete example row with every field shown in full — real values if available, otherwise synthesize one that is indistinguishable from real (exact field names, value formats, constraints)
-### 2. Rollout — what is the sequence of events?
-- Identify the 4–5 steps that happen during a single rollout: what the model receives, what it produces, what tools or sandbox it has (if any), and what happens at scoring time.
-- Write each step as a short label (2–5 words) and a one-line description.
+### Card 3 — Rewards
+- Every reward function: name, exactly what it checks, precisely what makes it score 0 vs 1 (and any partial values)
+- Any thresholds, regex patterns, or edge cases a model writer needs to know
+- If rewards combine into a final score: the exact formula
-### 3. Rewards — how does scoring work?
-- List every reward function (`@vf.reward`, functions passed to `Rubric`, reward methods on `Taskset`).
-- For each: its name and one sentence describing what it measures.
-- If multiple rewards combine into a final score, extract the exact formula (e.g. `R = (1 - hw) × visible + hw × hidden`).
+### Card 4 — Rollout
+- Step-by-step theoretical trace of one example running end-to-end:
+  1. How the raw row becomes the prompt the model sees
+  2. What the model is expected to produce
+  3. How each reward fires on the output
+  4. How the final score is computed
+  5. What a perfect response looks like vs a zero-score response
 ---
 ## Step 2 — Generate the HTML
-Write a single self-contained HTML file to `./environment_overview.html`. No external CDN dependencies — all CSS and JS inline.
+Write a single **self-contained** HTML file to `./environment_overview.html`. Zero external dependencies — all CSS and JS inline.
+---
+### The Core Mechanic — Card Stack Reveal
-### Design
+The entire UI is a **centered card deck**. All four cards occupy the same position. The active card is front and center at full size. Cards behind it peek out — each one slightly smaller, slightly lower, slightly darker — giving the illusion of a physical stack.
-**Light theme default, dark toggle in the top-right corner.**
+```
+         ░░░░░░░░░░░░░░░░  ← Card 4 (furthest back, barely visible)
+       ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒  ← Card 3
+     ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  ← Card 2
+   ██████████████████████████  ← Card 1 (active, full size, full opacity)
+```
+Clicking anywhere on the active card — or the "Continue →" button — triggers the reveal: the active card flies out (slides left + slight rotation + fade), and the next card scales up to the front with a spring animation. A progress indicator shows position (● ● ○ ○).
+When card 4 is shown, "Continue →" becomes "Done ✓" and clicking it does nothing (or fades the stack out gracefully).
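A minimal sketch of the footer state described above, written as a pure function so it can be checked without a DOM. The name `controlsFor` and the returned shape are assumptions for illustration; the skill file does not prescribe them.

```javascript
// Hypothetical helper: derives the footer controls for the card at `activeIndex`
// (0-based). Neither the name nor the return shape comes from the skill file.
function controlsFor(activeIndex, total = 4) {
  const isLast = activeIndex === total - 1;
  return {
    // The last card swaps the button label, as the spec describes.
    label: isLast ? "Done ✓" : "Continue →",
    isLast,
    // One entry per progress dot; only the current card's dot is active.
    dots: Array.from({ length: total }, (_, i) => i === activeIndex),
  };
}
```

A click handler could read `isLast` to decide whether to trigger the reveal or leave the deck as it is.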
+---
+### Visual Design
+**Background:** Full-viewport dark canvas.
+```css
+body {
+  background: #07090f;
+  background-image:
+    radial-gradient(ellipse at 20% 20%, rgba(168,85,247,0.06) 0%, transparent 50%),
+    radial-gradient(ellipse at 80% 80%, rgba(34,211,238,0.04) 0%, transparent 50%);
+  min-height: 100vh;
+  display: flex;
+  flex-direction: column;
+  align-items: center;
+  justify-content: center;
+}
 ```
-Light:  bg #f8f7f4 · card #ffffff · border #e5e1f0
-        text #1a1523 · muted #8b82a8 · accent #a855f7
-Dark:   bg #0f0f1a · card #161627 · border #2a2a4a
-        text #e2e8f0 · muted #6b6890 · accent #a855f7
+**Card base:**
+```css
+.card {
+  position: absolute;
+  width: min(600px, 90vw);
+  background: #0d1117;
+  border-radius: 24px;
+  padding: 40px 44px 36px;
+  transform-origin: center bottom;
+  will-change: transform, opacity;
+}
+```
+**Stack offset** (CSS, applied via `data-depth` attribute 0–3, 0 = active):
+```
+depth 0: scale(1.00)   translateY(0px)    opacity: 1     (active)
+depth 1: scale(0.96)   translateY(18px)   opacity: 0.65  z-index: -1
+depth 2: scale(0.92)   translateY(36px)   opacity: 0.35  z-index: -2
+depth 3: scale(0.88)   translateY(54px)   opacity: 0.15  z-index: -3
+```
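The offsets in the table follow a regular pattern (scale drops by 0.04 and the card moves down 18px per depth level), so they can be derived rather than hard-coded. The sketch below is one way to do that; `depthStyle` is a hypothetical helper, and the z-index of 0 for the active card is an assumption, since the table leaves it unspecified.

```javascript
// Opacity per depth, copied from the stack-offset table (0 = active card).
const OPACITY_BY_DEPTH = [1, 0.65, 0.35, 0.15];

// Hypothetical helper: style values for a card at the given depth (0-3).
function depthStyle(depth) {
  if (!Number.isInteger(depth) || depth < 0 || depth > 3) {
    throw new RangeError(`depth must be an integer 0-3, got ${depth}`);
  }
  const scale = (1 - 0.04 * depth).toFixed(2); // 1.00, 0.96, 0.92, 0.88
  const offset = 18 * depth;                   // 0, 18, 36, 54 (px)
  return {
    transform: `scale(${scale}) translateY(${offset}px)`,
    opacity: OPACITY_BY_DEPTH[depth],
    // Assumption: the table gives no z-index for depth 0, so 0 is used here.
    zIndex: depth === 0 ? 0 : -depth,
  };
}
```

The same values could equally be written as four static `.card[data-depth="N"]` rules; the function form only makes the pattern explicit.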
+Each card has a unique accent. Apply it via the CSS custom properties `--accent` and `--glow`, set on the card element itself. The border ring and glow both use this accent.
+```
+Card 1:  --accent: #a855f7   --glow: rgba(168,85,247,0.3)   (purple)
+Card 2:  --accent: #22d3ee   --glow: rgba(34,211,238,0.3)   (cyan)
+Card 3:  --accent: #f59e0b   --glow: rgba(245,158,11,0.3)   (amber)
+Card 4:  --accent: #f43f5e   --glow: rgba(244,63,94,0.3)    (rose)
+```
+**Accent border and glow** on the active card only (drawn with `box-shadow`, so the ring is a solid accent color, not a true gradient):
+```css
+.card[data-depth="0"] {
+  box-shadow:
+    0 0 0 1.5px var(--accent),
+    0 0 60px var(--glow),
+    0 32px 80px rgba(0,0,0,0.6);
+}
+.card[data-depth="1"],
+.card[data-depth="2"],
+.card[data-depth="3"] {
+  box-shadow: 0 0 0 1px rgba(255,255,255,0.06);
+}
 ```
-All colors as CSS custom properties on `:root` and `[data-theme="dark"]`. Toggle swaps the attribute; `localStorage` persists the choice.
+**Typography:**
+```css
+body { font-family: -apple-system, 'SF Pro Display', 'Helvetica Neue', sans-serif; }
+code, pre { font-family: ui-monospace, 'Cascadia Code', 'Fira Code', monospace; } /* code only */
+```
-Typography: Georgia/serif for the env name; `-apple-system, Helvetica Neue, sans-serif` for everything else; `ui-monospace, Fira Code, monospace` for code and formulas. No Inter, no Roboto.
+---
-### Structure
+### Card Content Specs
-The entire page fits on one screen without scrolling. Layout:
+Each card has the same shell structure:
 ```
-┌─────────────────────────────────────────────┐
-│  env name (large, serif)        [☀/☾ toggle]│
-│  one-sentence description                   │
-├─────────────────────────────────────────────┤
-│  [ Dataset ]  [ Rollout ]  [ Rewards ]      │
-├─────────────────────────────────────────────┤
-│                                             │
-│  tab content (no scroll)                   │
-│                                             │
-└─────────────────────────────────────────────┘
+┌────────────────────────────────────────────┐
+│  LABEL (0.65rem, accent, caps, tracking)   │
+│  TITLE (1.6rem, 700, white)                │
+│  ─────────────────────────────────────     │
+│                                            │
+│  [BODY — unique per card, see below]       │
+│                                            │
+│  ────────────────────────────────────────  │
+│  [progress dots]    [Continue → button]    │
+└────────────────────────────────────────────┘
 ```
-### Tab 1 — Dataset
+**Progress dots:** 4 dots, `width: 7px; height: 7px; border-radius: 50%`. Active dot: accent color, `width: 20px; border-radius: 4px` (pill). Inactive: `rgba(255,255,255,0.15)`. Transition: `width 0.3s ease`.
+**Continue button:** `background: var(--accent)`, `color: #000`, `font-weight: 700`, `font-size: 0.82rem`, `border-radius: 99px`, `padding: 8px 20px`, `border: none`, `cursor: pointer`. Hover: `opacity: 0.85`.
-Show 1–2 example prompts in a clean monospace block:
-- `background: var(--bg-code)`, `border-left: 3px solid var(--accent)`, `padding: 12px 16px`, `border-radius: 0 6px 6px 0`
-- If there are 2 examples, a subtle "Example 1 / 2" toggle (two small buttons, no full tab strip)
-- Nothing else on this tab — no labels, no field names, no copy button
+---
+#### Card 1 Body — Environment
-### Tab 2 — Rollout
+- **Env name**: `font-size: 1.6rem`, `font-weight: 800`, white
+- **Description**: 3–4 sentences, `font-size: 0.9rem`, `color: #94a3b8`, `line-height: 1.65`, `margin: 14px 0`
+- **GitHub link** (only if URL was found in source): pill button — `background: rgba(255,255,255,0.05)`, `border: 1px solid rgba(255,255,255,0.1)`, `color: #e2e8f0`, `border-radius: 99px`, `padding: 5px 14px`, `font-size: 0.78rem`. Shows `↗ GitHub`. Hover: `border-color: var(--accent)`, `color: var(--accent)`. If no URL found, this element does not exist.
+- **Stat chips**: row of 3–5 pills. `background: rgba(168,85,247,0.08)`, `border: 1px solid rgba(168,85,247,0.2)`, `color: #c4b5fd`, `border-radius: 99px`, `padding: 4px 11px`, `font-size: 0.73rem`
-A static horizontal pipeline: 4–5 boxes connected by `→` arrows.
+---
+#### Card 2 Body — Dataset
+**Source line** — monospace, one line:
 ```
-[ Prompt ] → [ Model ] → [ Response ] → [ Scoring ] → [ Score ]
+HuggingFace · openai/gsm8k · train split
 ```
+Style: `background: rgba(34,211,238,0.05)`, `border-left: 3px solid #22d3ee`, `padding: 8px 14px`, `border-radius: 0 6px 6px 0`, `font-size: 0.82rem`, `color: #67e8f9`
-Each box:
-- `background: var(--bg-card)`, `border: 1.5px solid var(--border)`, `border-radius: 8px`, `padding: 10px 16px`
-- **Bold label** (2–4 words) on top
-- One-line description beneath in muted text, `font-size: 0.8rem`
-- On hover: `border-color: var(--accent)`
+**Field list** — compact, beneath the source line:
+Each field on one line: `field_name` in cyan monospace + `·` + type/purpose in muted text. `font-size: 0.8rem`, `line-height: 1.8`.
-Arrows: plain `→` character in muted color between boxes. No SVG, no animation.
+**Example row** — the main content:
+A clean structured display. Label: `EXAMPLE ROW` in `0.65rem` cyan caps. Then each field:
+- Field name: cyan monospace, `font-size: 0.78rem`
+- Value: white, `font-size: 0.82rem`, `line-height: 1.5`
+- Long text values (prompts, answers): wrapped in a soft box — `background: rgba(255,255,255,0.03)`, `border-radius: 6px`, `padding: 8px 12px`, `margin-top: 2px`
+- Full content — never truncated
-Layout: `display: flex; align-items: center; gap: 8px; flex-wrap: wrap` so it reflows gracefully on smaller screens.
+---
-### Tab 3 — Rewards
+#### Card 3 Body — Rewards
-A clean list. For each reward function:
+For each reward function, a compact block:
 ```
-reward_name
-One sentence describing what it measures.
+format_reward                               [float]
+Checks response contains <answer>…</answer>
+  ✗ 0  tags absent or inner content non-numeric
+  ✓ 1  tags present, content is a valid integer
 ```
-- Name: monospace, accent color, `font-size: 0.9rem`
-- Description: normal prose, secondary text color, `font-size: 0.875rem`
-- Separated by a thin `border-bottom: 1px solid var(--border)`
+- Name: monospace, `color: #fcd34d`, `font-weight: 600`, `font-size: 0.88rem`
+- `[float]` badge: `font-size: 0.68rem`, `color: #6b7280`, right-aligned via `display: flex; justify-content: space-between`
+- Description line: `color: #94a3b8`, `font-size: 0.8rem`, `margin: 4px 0 6px`
+- `✗ 0` / `✓ 1` lines: `font-size: 0.78rem`, `✗` in `#f87171`, `✓` in `#4ade80`, text in `#94a3b8`
+- Block: `background: rgba(245,158,11,0.05)`, `border: 1px solid rgba(245,158,11,0.12)`, `border-radius: 10px`, `padding: 12px 14px`, `margin-bottom: 10px`
-If there is a composite formula, show it below the list in a single styled block:
+If composite formula exists — after all reward blocks:
 ```
-background: var(--accent-glow)   /* rgba(168,85,247,0.10) */
-border: 1px solid var(--accent)
-border-radius: 6px
-padding: 12px 16px
-font-family: monospace
-color: var(--accent)
+background: rgba(245,158,11,0.08)
+border: 1px solid rgba(245,158,11,0.25)
+border-radius: 8px · padding: 10px 14px
+font-family: monospace · color: #fcd34d · font-size: 0.85rem
 ```
-Nothing else on this tab — no weights, no score bars, no judge details.
+---
-### Theme Toggle
+#### Card 4 Body — Rollout
-A small pill button, top-right of the header. Shows `☀` in dark mode, `☾` in light mode.
+Numbered steps. Each step:
-```js
-const root = document.documentElement;
-const btn = document.getElementById('theme-toggle');
-const saved = localStorage.getItem('pi-theme');
-if (saved) root.setAttribute('data-theme', saved);
-btn.addEventListener('click', () => {
-  const next = root.getAttribute('data-theme') === 'dark' ? 'light' : 'dark';
-  root.setAttribute('data-theme', next);
-  localStorage.setItem('pi-theme', next);
-});
 ```
+  01
+  Data → Prompt
+  The problem field is inserted into "Solve step by step…"
+  as the user message. No system prompt.
+```
+- Number: `font-size: 2rem`, `font-weight: 800`, `color: var(--accent)`, `opacity: 0.25`, `line-height: 1`
+- Title: `font-size: 0.88rem`, `font-weight: 700`, `color: #f1f5f9`, `margin: 2px 0`
+- Description: `font-size: 0.8rem`, `color: #94a3b8`, `line-height: 1.55`
+- Left connector: `border-left: 2px solid rgba(244,63,94,0.15)`, `padding-left: 16px`, `margin-left: 12px`, except on last step
+- Between steps: `margin-bottom: 16px`
-### Tab Switching
+Always 5 steps: Data→Prompt · Model Response · Reward Evaluation · Score Computation · Perfect vs Zero.
-```js
-document.querySelectorAll('.tab-btn').forEach(btn => {
-  btn.addEventListener('click', () => {
-    document.querySelectorAll('.tab-btn').forEach(b => b.classList.remove('active'));
-    document.querySelectorAll('.tab-panel').forEach(p => p.classList.remove('active'));
-    btn.classList.add('active');
-    document.getElementById(btn.dataset.tab).classList.add('active');
-  });
-});
+---
+### Reveal Animation
+```css
+@keyframes flyOut {
+  to { transform: translateX(-120%) rotate(-8deg); opacity: 0; }
+}
+@keyframes riseUp {
+  from { transform: scale(0.96) translateY(18px); opacity: 0.65; }
+  to   { transform: scale(1) translateY(0); opacity: 1; }
+}
+```
+On click:
+1. Active card: `flyOut 0.45s cubic-bezier(0.4, 0, 0.2, 1) forwards`
+2. After 80ms: next card transitions from depth-1 styles to depth-0 styles — `riseUp 0.5s cubic-bezier(0.34, 1.56, 0.64, 1)` (spring overshoot)
+3. All remaining cards shift their `data-depth` attributes down by 1
+4. Progress dots update with a `0.3s` width transition
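The depth bookkeeping behind steps 1 and 3 can be sketched as a pure state transition, kept separate from the CSS animations. This assumes the deck is modeled as an array of `data-depth` values, with `null` marking a card that has already flown out; both the model and the name `advance` are hypothetical.

```javascript
// Hypothetical helper: `depths[i]` is the data-depth of card i,
// or null once that card has flown out of the deck.
function advance(depths) {
  const hasNext = depths.some((d) => d !== null && d > 0);
  if (!hasNext) return depths.slice(); // last card is showing: nothing to reveal
  return depths.map((d) => {
    if (d === null) return null; // already dismissed
    if (d === 0) return null;    // active card flies out
    return d - 1;                // every remaining card moves one step forward
  });
}
```

For example, `advance([0, 1, 2, 3])` yields `[null, 0, 1, 2]`. A DOM layer would then write each non-null value back to the matching card's `data-depth` attribute and start `flyOut` and `riseUp` on the two cards that changed roles.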
+Guard all animations:
+```css
+@media (prefers-reduced-motion: reduce) {
+  *, *::before, *::after { animation: none !important; transition: none !important; }
+}
 ```
-Active tab style: `border-bottom: 2px solid var(--accent)`, accent color text. Inactive: muted text, no border.
+---
+### Page Chrome
+**Top:** Environment name in small muted text, centered, `font-size: 0.75rem`, `color: #334155`, `letter-spacing: 0.08em`, `margin-bottom: 32px`.
+**Bottom:** `Generated by Claude · Prime Intellect · <timestamp>` — `font-size: 0.68rem`, `color: #1e293b`, `margin-top: 28px`.
+Nothing else. No nav, no sidebar, no header. The cards are the whole UI.
 ---
 ## Step 3 — Confirm and report
-After writing the file, tell the user:
-- The full path and `open environment_overview.html` command
-- Two sentences: what the environment does and how it scores
+After writing the file:
+- Give the full path and `open environment_overview.html`
+- Two sentences: what the environment trains and how it scores
 ## Anti-patterns
-- Do not add config parameters, file maps, quick-start commands, or any section beyond the three tabs
-- Do not add animations, score bars, copy buttons, or collapsible sections
-- Do not hallucinate reward weights, defaults, or prompt content not found in the source
-- Do not skip helper modules — they often contain the core scoring logic
-- If content would cause scrolling within a tab, cut it further
+- Do not dump all content at once — each card is one focused chapter
+- Do not truncate the example row — every field, every value, in full
+- Do not invent a GitHub URL — only include it if found in the source
+- Do not hallucinate reward weights, field names, or dataset content
+- Do not skip helper modules — they contain the core reward logic
+- Do not add tabs, sidebars, scroll-within-cards, or any structure beyond the 4-card deck
+- Do not use a light theme — dark only
+- Do not use Inter, Roboto, or any Google Font