@chaprola/mcp-server 1.8.0 → 1.9.0

package/dist/index.js CHANGED
@@ -416,7 +416,21 @@ server.tool("chaprola_list", "List files in a project with optional wildcard pat
416
416
  return textResult(res);
417
417
  }));
418
418
  // --- Compile ---
419
- server.tool("chaprola_compile", "Compile Chaprola source (.CS) to bytecode (.PR). READ chaprola://cookbook BEFORE writing source. Key syntax: no PROGRAM keyword (start with commands), no commas, reports can use MOVE+PRINT 0 buffers or one-line PRINT concatenation, SEEK for primary records, OPEN/READ/WRITE/CLOSE for secondary files, LET supports one operation (no parentheses). Use primary_format to enable P.fieldname addressing (recommended) — the compiler resolves field names to positions and lengths from the format file. If compile fails, call chaprola_help before retrying.", {
419
+ server.tool("chaprola_compile", `Compile Chaprola source (.CS) to bytecode (.PR). READ chaprola://cookbook BEFORE writing source.
420
+
421
+ STYLE RULES (mandatory — project review enforces these):
422
+ 1. Use QUERY instead of SEEK loops for filtering or single-record lookup. SEEK loops only for processing every record unconditionally.
423
+ 2. Don't use MOVE + IF EQUAL for comparisons — use QUERY WHERE.
424
+ 3. Use implicit variable assignment (LET name = value) — don't use DEFINE VARIABLE.
425
+ 4. END/STOP only for early exit — not needed at end of program.
426
+ 5. OPEN PRIMARY not needed when using QUERY with primary_format.
427
+ 6. Use named read (READ name rec + name.field) instead of OPEN SECONDARY + S.field for QUERY results.
428
+ 7. Every program MUST have an intent file (.DS) — one paragraph: what the program does, parameters, output, who uses it.
429
+ 8. Add a comment header — first lines describe purpose and parameters.
430
+ 9. Use PRINT concatenation (PRINT "text" + P.field + R1), not MOVE + PRINT 0 buffers.
431
+ 10. Use RECORDNUMBERS for bulk delete: QUERY INTO name, then DELETE PRIMARY name.RECORDNUMBERS.
432
+
433
+ KEY SYNTAX: no PROGRAM keyword (start with commands), no commas, LET supports one operation (no parentheses), no built-in functions. Use primary_format to enable P.fieldname addressing — the compiler resolves field names to positions and lengths from the format file. If compile fails, call chaprola_help before retrying.`, {
420
434
  project: z.string().describe("Project name"),
421
435
  name: z.string().describe("Program name (without extension)"),
422
436
  source: z.string().describe("Chaprola source code"),
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chaprola/mcp-server",
3
- "version": "1.8.0",
3
+ "version": "1.9.0",
4
4
  "description": "MCP server for Chaprola — agent-first data platform. Gives AI agents tools for structured data storage, record CRUD, querying, schema inspection, documentation lookup, web search, URL fetching, scheduled jobs, scoped site keys, and execution via plain HTTP.",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",
@@ -1,5 +1,18 @@
1
1
  # Chaprola Cookbook — Quick Reference
2
2
 
3
+ ## Style Rules (mandatory)
4
+
5
+ Every program must follow these rules. The project review system enforces them.
6
+
7
+ 1. **Use QUERY instead of SEEK loops** for filtering or single-record lookup. SEEK loops are only appropriate when processing every record unconditionally.
8
+ 2. **Don't use MOVE + IF EQUAL for comparisons** — use QUERY WHERE.
9
+ 3. **Use implicit variable assignment** (`LET name = value`) — don't use DEFINE VARIABLE.
10
+ 4. **END/STOP only for early exit** — not needed at the end of a program.
11
+ 5. **OPEN PRIMARY not needed** when using QUERY with primary_format on compile.
12
+ 6. **Use named read** (`READ name rec` + `name.field`) instead of OPEN SECONDARY + S.field for QUERY results.
13
+ 7. **Every program must have an intent file (.DS)** — one paragraph: what the program does, what parameters it accepts, what output it produces, who uses it. Create the intent with chaprola_compile or write it manually.
14
+ 8. **Add a comment header** — first line(s) should describe the program's purpose and parameters.
15
+
3
16
  ## Workflow: Import → Compile → Run
4
17
 
5
18
  ```bash
@@ -13,136 +26,192 @@ POST /compile {userid, project, name: "REPORT", source: "...", primary_format: "
13
26
  POST /run {userid, project, name: "REPORT", primary_file: "STAFF", record: 1}
14
27
  ```
15
28
 
16
- ## R-Variable Ranges
17
-
18
- | Range | Purpose | Safe for DEFINE VARIABLE? |
19
- |-------|---------|--------------------------|
20
- | R1–R20 | HULDRA elements (parameters) | No — HULDRA overwrites these |
21
- | R21–R40 | HULDRA objectives (error metrics) | No — HULDRA reads these |
22
- | R41–R99 | Scratch space | **Yes — always use R41–R99 for DEFINE VARIABLE** |
23
-
24
- For non-HULDRA programs, R1–R40 are technically available but using R41–R99 is a good habit.
25
-
26
- ## PRINT: Preferred Output Methods
29
+ ## Hello World (no data file)
27
30
 
28
- **Concatenation (preferred):**
29
31
  ```chaprola
30
- PRINT P.name + " " + P.department + " — $" + R41
31
- PRINT "Total: " + R42
32
- PRINT P.last_name // single field, auto-trimmed
33
- PRINT "Hello from Chaprola!" // literal string
32
+ // Hello World — minimal Chaprola program
33
+ PRINT "Hello from Chaprola!"
34
34
  ```
35
35
 
36
- - String literals are copied as-is.
37
- - P./S./U./X. fields are auto-trimmed (trailing spaces removed).
38
- - R-variables print as integers when no fractional part, otherwise as floats.
39
- - Concatenation auto-flushes the line.
36
+ No END needed — the program ends naturally.
40
37
 
41
- **U buffer output (for fixed-width columnar reports only):**
42
- ```chaprola
43
- CLEAR U
44
- MOVE P.name U.1 20
45
- PUT sal INTO U.22 10 D 2
46
- PRINT 0 // output entire U buffer, then clear
47
- ```
38
+ ## Loop Through All Records
48
39
 
49
- ## Hello World (no data file)
40
+ When processing every record unconditionally (no filter), SEEK loops are appropriate:
50
41
 
51
42
  ```chaprola
52
- PRINT "Hello from Chaprola!"
53
- END
43
+ // REPORT — List all staff with salaries
44
+ // Primary: STAFF
45
+
46
+ LET rec = 1
47
+ 100 SEEK rec
48
+ IF EOF END
49
+ PRINT P.name + " — " + P.salary
50
+ LET rec = rec + 1
51
+ GOTO 100
54
52
  ```
55
53
 
56
- ## Loop Through All Records
54
+ Compile with `primary_format: "STAFF"`. No OPEN PRIMARY needed — the compiler reads the format from primary_format.
57
55
 
58
- Always start programs with `OPEN PRIMARY` to declare the data file. This makes the program self-documenting and eliminates the need for `primary_format` on compile.
56
+ ## Filtered Report (QUERY)
57
+
58
+ When you want a subset of records, use QUERY — not a SEEK loop with IF conditions:
59
59
 
60
60
  ```chaprola
61
- OPEN PRIMARY "STAFF" 0
62
- DEFINE VARIABLE rec R41
61
+ // HIGH_EARNERS List staff earning over 80000
62
+ // Primary: STAFF
63
+
64
+ QUERY STAFF INTO earners WHERE salary GT 80000 ORDER BY salary DESC
65
+
63
66
  LET rec = 1
64
- 100 SEEK rec
65
- IF EOF GOTO 900
66
- PRINT P.name + " — " + P.salary
67
+ 100 READ earners rec
68
+ IF EOF END
69
+ PRINT earners.name + " — $" + earners.salary
67
70
  LET rec = rec + 1
68
71
  GOTO 100
69
- 900 END
70
72
  ```
71
73
 
72
- Compile without `primary_format` — the compiler reads the format from `OPEN PRIMARY`:
73
- ```bash
74
- POST /compile {userid, project, name: "REPORT", source: "OPEN PRIMARY \"STAFF\" 0\n..."}
74
+ ## Single-Record Lookup (QUERY)
75
+
76
+ For finding one record by key, use QUERY not a SEEK/MOVE/IF EQUAL loop:
77
+
78
+ ```chaprola
79
+ // DETAIL — Look up a single staff member by ID
80
+ // Parameter: staff_id
81
+ // Primary: STAFF
82
+
83
+ QUERY STAFF INTO person WHERE staff_id EQ PARAM.staff_id
84
+
85
+ LET rec = 1
86
+ READ person rec
87
+ IF EOF END
88
+ PRINT "Name: " + person.name
89
+ PRINT "Department: " + person.department
90
+ PRINT "Salary: $" + person.salary
75
91
  ```
76
92
 
77
- ## Filtered Report
93
+ ## DELETE with QUERY (RECORDNUMBERS)
94
+
95
+ Use QUERY to find records, then DELETE with RECORDNUMBERS for bulk deletion:
78
96
 
79
97
  ```chaprola
80
- GET sal FROM P.salary
81
- IF sal LT 80000 GOTO 200 // skip low earners
82
- PRINT P.name + " — " + R41
83
- 200 LET rec = rec + 1
98
+ // CLEANUP — Delete a poll and its votes
99
+ // Parameter: poll_id
100
+ // Primary: polls, Secondary: votes
101
+
102
+ QUERY polls INTO poll WHERE poll_id EQ PARAM.poll_id
103
+ DELETE PRIMARY poll.RECORDNUMBERS
104
+
105
+ OPEN "votes" WHERE poll_id EQ PARAM.poll_id
106
+ DELETE votes.RECORDNUMBERS
107
+ CLOSE
108
+
109
+ IF poll.RECORDCOUNT EQ 0 GOTO 900 ;
110
+ PRINT "STATUS|OK"
111
+ PRINT "VOTES_DELETED|" + votes.RECORDCOUNT
112
+ END
113
+
114
+ 900 PRINT "STATUS|NOT_FOUND"
84
115
  ```
85
116
 
117
+ - `poll.RECORDNUMBERS` returns all physical record positions matched by the QUERY
118
+ - `DELETE PRIMARY poll.RECORDNUMBERS` bulk-deletes them in one statement
119
+ - `votes.RECORDCOUNT` returns the filtered count from OPEN WHERE, not total file count
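The RECORDNUMBERS semantics above can be sketched outside Chaprola as plain list filtering; a minimal Python sketch, where the in-memory `records` list and its field values are made-up illustrations, not Chaprola API:

```python
# Sketch of RECORDNUMBERS semantics: a QUERY collects the 1-based
# physical positions of matching records; DELETE removes exactly those.
records = [
    {"poll_id": "p1"},  # position 1
    {"poll_id": "p2"},  # position 2
    {"poll_id": "p1"},  # position 3
]

# Like QUERY polls INTO poll WHERE poll_id EQ "p1" -> poll.RECORDNUMBERS
recordnumbers = [i for i, r in enumerate(records, start=1)
                 if r["poll_id"] == "p1"]

# Like DELETE PRIMARY poll.RECORDNUMBERS -> bulk delete in one statement
remaining = [r for i, r in enumerate(records, start=1)
             if i not in recordnumbers]
```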
120
+
86
121
  ## JOIN Two Files (FIND)
87
122
 
88
123
  ```chaprola
124
+ // ROSTER — Staff with department names
125
+ // Primary: EMPLOYEES, Secondary: DEPARTMENTS
126
+
89
127
  OPEN "DEPARTMENTS" 0
90
- FIND match FROM S.dept_code USING P.dept_code
91
- IF match EQ 0 GOTO 200 // no match
92
- READ match // load matched secondary record
93
- PRINT P.name + " " + S.dept_name
128
+ LET rec = 1
129
+ 100 SEEK rec
130
+ IF EOF END
131
+ FIND match FROM S.dept_code USING P.dept_code
132
+ IF match EQ 0 GOTO 200
133
+ READ match
134
+ PRINT P.name + " — " + S.dept_name
135
+ 200 LET rec = rec + 1
136
+ GOTO 100
94
137
  ```
95
138
 
96
- Compile with both formats so the compiler resolves fields from both files:
139
+ Compile with both formats:
97
140
  ```bash
98
141
  POST /compile {
99
- userid, project, name: "REPORT",
142
+ userid, project, name: "ROSTER",
100
143
  source: "...",
101
144
  primary_format: "EMPLOYEES",
102
145
  secondary_format: "DEPARTMENTS"
103
146
  }
104
147
  ```
105
148
 
106
- ## Comparing Two Memory Locations
107
-
108
- IF EQUAL compares a literal to a location. To compare two memory locations, copy both to U buffer:
149
+ ## PRINT: Preferred Output Methods
109
150
 
151
+ **Concatenation (preferred):**
110
152
  ```chaprola
111
- MOVE PARAM.poll_id U.200 12
112
- MOVE P.poll_id U.180 12
113
- IF EQUAL U.200 U.180 12 GOTO 200 // match jump to handler
153
+ PRINT P.name + " — " + P.department + " — $" + R41
154
+ PRINT "Total: " + R42
155
+ PRINT P.last_name // single field, auto-trimmed
156
+ PRINT "Hello from Chaprola!" // literal string
114
157
  ```
115
158
 
116
- ## Read-Modify-Write (UPDATE)
159
+ - String literals are copied as-is.
160
+ - P./S./U./X. fields are auto-trimmed (trailing spaces removed).
161
+ - R-variables print as integers when no fractional part, otherwise as floats.
162
+ - Concatenation auto-flushes the line.
117
163
 
164
+ **U buffer output (for fixed-width columnar reports only):**
118
165
  ```chaprola
119
- READ match // load record
120
- GET bal FROM S.balance // read current value
121
- LET bal = bal + amt // modify
122
- PUT bal INTO S.balance F 0 // write back to S memory (length auto-filled)
123
- WRITE match // flush to disk
124
- CLOSE // flush all at end
166
+ CLEAR U
167
+ MOVE P.name U.1 20
168
+ PUT sal INTO U.22 10 D 2
169
+ PRINT 0 // output entire U buffer, then clear
125
170
  ```
126
171
 
127
- ## Date Arithmetic
172
+ ## R-Variable Ranges
173
+
174
+ | Range | Purpose | Notes |
175
+ |-------|---------|-------|
176
+ | R1–R20 | HULDRA elements (parameters) | HULDRA overwrites these |
177
+ | R21–R40 | HULDRA objectives (error metrics) | HULDRA reads these |
178
+ | R41–R99 | Scratch space | Always safe |
179
+
180
+ For non-HULDRA programs, all R1–R99 are available. Use implicit assignment (`LET name = value`) — the compiler assigns R-variable slots automatically.
181
+
182
+ ## Read-Modify-Write (UPDATE)
128
183
 
129
184
  ```chaprola
130
- GET DATE R41 FROM X.primary_modified // when was file last changed?
131
- GET DATE R42 FROM X.utc_time // what time is it now?
132
- LET R43 = R42 - R41 // difference in seconds
133
- LET R43 = R43 / 86400 // convert to days
134
- IF R43 GT 30 PRINT "WARNING: file is over 30 days old" ;
185
+ // UPDATE_BALANCE — Add amount to account balance
186
+ // Primary: ACCOUNTS, Secondary: LEDGER
187
+
188
+ OPEN "LEDGER" 0
189
+ FIND match FROM S.account_id USING P.account_id
190
+ IF match EQ 0 END
191
+ READ match
192
+ GET bal FROM S.balance
193
+ LET bal = bal + amt
194
+ PUT bal INTO S.balance F 0
195
+ WRITE match
196
+ CLOSE
135
197
  ```
136
198
 
137
- ## Get Current User
199
+ ## Date Arithmetic
138
200
 
139
201
  ```chaprola
140
- PRINT "Logged in as: " + X.username
202
+ // CHECK_FRESHNESS — Warn if file is stale
203
+ // Primary: any
204
+
205
+ GET DATE R1 FROM X.primary_modified
206
+ GET DATE R2 FROM X.utc_time
207
+ LET days = R2 - R1
208
+ LET days = days / 86400
209
+ IF days GT 30 PRINT "WARNING: file is over 30 days old" ;
141
210
  ```
142
211
 
143
212
  ## System Text Properties (X.)
144
213
 
145
- Access system metadata by property name — no numeric positions needed:
214
+ Access system metadata by property name:
146
215
 
147
216
  | Property | Description |
148
217
  |----------|-------------|
@@ -157,6 +226,51 @@ Access system metadata by property name — no numeric positions needed:
157
226
  | `X.primary_modified` | Primary file Last-Modified |
158
227
  | `X.secondary_modified` | Secondary file Last-Modified |
159
228
 
229
+ ## Parameterized Reports (PARAM.name)
230
+
231
+ Programs accept named parameters from URL query strings:
232
+
233
+ ```chaprola
234
+ // STAFF_BY_DEPT — List staff in a department
235
+ // Parameter: dept
236
+ // Primary: STAFF
237
+
238
+ QUERY STAFF INTO team WHERE department EQ PARAM.dept ORDER BY salary DESC
239
+
240
+ LET rec = 1
241
+ 100 READ team rec
242
+ IF EOF END
243
+ PRINT team.name + " — " + team.title + " — $" + team.salary
244
+ LET rec = rec + 1
245
+ GOTO 100
246
+ ```
247
+
248
+ Publish with: `POST /publish {userid, project, name: "STAFF_BY_DEPT", primary_file: "STAFF", acl: "authenticated"}`
249
+ Call with: `GET /report?userid=X&project=Y&name=STAFF_BY_DEPT&dept=Engineering`
250
+ Discover params: `POST /report/params {userid, project, name}` — returns .PF schema
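The call pattern above is ordinary query-string construction; a Python sketch, where the host and the `userid`/`project` values are placeholders:

```python
from urllib.parse import urlencode

# Build the GET /report URL for a published parameterized report.
# Named params (here: dept) are passed as plain query-string keys.
base = "https://api.example.com/report"  # placeholder host
params = {
    "userid": "X",
    "project": "Y",
    "name": "STAFF_BY_DEPT",
    "dept": "Engineering",
}
url = base + "?" + urlencode(params)
```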
251
+
252
+ ## Cross-File Filtering (IN/NOT IN)
253
+
254
+ Use QUERY with NOT IN to find records in one file that don't appear in another:
255
+
256
+ ```chaprola
257
+ // UNREVIEWED — Find flashcards the user hasn't studied yet
258
+ // Parameters: username
259
+ // Primary: flashcards
260
+
261
+ QUERY progress INTO reviewed WHERE username EQ PARAM.username
262
+ QUERY flashcards INTO new_cards WHERE kanji NOT IN "reviewed.kanji"
263
+
264
+ LET rec = 1
265
+ 100 READ new_cards rec
266
+ IF EOF END
267
+ PRINT new_cards.kanji + " — " + new_cards.reading + " — " + new_cards.meaning
268
+ LET rec = rec + 1
269
+ GOTO 100
270
+ ```
271
+
272
+ If the IN-file doesn't exist (e.g., new user), NOT IN treats it as empty — all records pass.
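The NOT IN behavior reduces to a set difference; a Python sketch of the same filtering, with made-up sample rows:

```python
# Like QUERY progress INTO reviewed WHERE username EQ "alice"
progress = [{"username": "alice", "kanji": "A"},
            {"username": "bob", "kanji": "B"}]
reviewed_kanji = {r["kanji"] for r in progress if r["username"] == "alice"}

# Like QUERY flashcards INTO new_cards WHERE kanji NOT IN "reviewed.kanji"
flashcards = [{"kanji": "A"}, {"kanji": "B"}, {"kanji": "C"}]
new_cards = [c for c in flashcards if c["kanji"] not in reviewed_kanji]

# A missing IN-file behaves as an empty set: every record passes.
no_history = [c for c in flashcards if c["kanji"] not in set()]
```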
273
+
160
274
  ## Async for Large Datasets
161
275
 
162
276
  ```bash
@@ -169,77 +283,47 @@ POST /run/status {userid, project, job_id}
169
283
  # Response: {status: "done", output: "..."}
170
284
  ```
171
285
 
172
- ## Parameterized Reports (PARAM.name)
286
+ ## Public Apps: Use /report, Not /query
173
287
 
174
- Programs can accept named parameters from URL query strings. Use this for dynamic reports.
288
+ Site keys require BAA signing. For public-facing web apps, use published reports (`/report`) instead of `/query` for all read operations. `/report` is public — no auth or BAA needed.
289
+
290
+ ```javascript
291
+ // GOOD: public report — no auth needed
292
+ const url = `${API}/report?userid=myapp&project=data&name=RESULTS&poll_id=${id}`;
293
+ const response = await fetch(url);
294
+
295
+ // BAD: /query with site key — fails if BAA not signed
296
+ const response = await fetch(`${API}/query`, {
297
+ headers: { 'Authorization': `Bearer ${SITE_KEY}` },
298
+ body: JSON.stringify({ userid: 'myapp', project: 'data', file: 'votes', where: [...] })
299
+ });
300
+ ```
301
+
302
+ ## Chart Data with TABULATE
175
303
 
176
304
  ```chaprola
177
- // Report that accepts &deck=kanji&level=3 as URL params
178
- MOVE PARAM.deck U.1 20 // string param → U buffer
179
- LET lvl = PARAM.level // numeric param → R variable
180
- SEEK 1
181
- 100 IF EOF GOTO 900
182
- MOVE P.deck U.30 10
183
- IF EQUAL PARAM.deck U.30 GOTO 200 // filter by deck param
184
- GOTO 300
185
- 200 GET cardlvl FROM P.level
186
- IF cardlvl NE lvl GOTO 300 // filter by level param
187
- PRINT P.kanji + " — " + P.reading
188
- 300 LET rec = rec + 1
189
- SEEK rec
190
- GOTO 100
191
- 900 END
305
+ // TRENDS — Cross-tabulate mortality by cause and year
306
+ // Primary: mortality
307
+
308
+ TABULATE mortality SUM deaths FOR cause VS year WHERE year GE "2020" INTO trends
309
+
310
+ PRINT TABULATE trends AS CSV
192
311
  ```
193
312
 
194
- Publish with: `POST /publish {userid, project, name, primary_file, acl: "authenticated"}`
195
- Call with: `GET /report?userid=X&project=Y&name=Z&deck=kanji&level=3`
196
- Discover params: `POST /report/params {userid, project, name}` → returns .PF schema (field names, types, widths)
313
+ For web apps, use `PRINT TABULATE trends AS JSON`.
197
314
 
198
315
  ## Named Output Positions (U.name)
199
316
 
200
- Instead of `U.1`, `U.12`, etc., use named positions for readable code:
317
+ Instead of `U.1`, `U.12`, use named positions:
201
318
 
202
319
  ```chaprola
203
- // U.name positions are auto-allocated by the compiler
204
320
  MOVE P.name U.name 20
205
321
  MOVE P.dept U.dept 10
206
322
  PUT sal INTO U.salary 10 D 0
207
323
  PRINT 0
208
324
  ```
209
325
 
210
- ## GROUP BY with Pivot (via /query)
211
-
212
- Chaprola's pivot IS GROUP BY. Schema: `{row, column, value, aggregate}`.
213
-
214
- ```bash
215
- # SQL: SELECT department, AVG(salary) FROM staff GROUP BY department
216
- POST /query {
217
- userid, project, file: "STAFF",
218
- pivot: {
219
- row: "department",
220
- column: "",
221
- value: "salary",
222
- aggregate: "avg"
223
- }
224
- }
225
-
226
- # SQL: SELECT department, year, SUM(revenue) FROM sales GROUP BY department, year
227
- POST /query {
228
- userid, project, file: "SALES",
229
- pivot: {
230
- row: "department",
231
- column: "year",
232
- value: "revenue",
233
- aggregate: "sum"
234
- }
235
- }
236
- ```
237
-
238
- - `row` — grouping field
239
- - `column` — cross-tab field (use `""` for simple aggregation)
240
- - `value` — field to aggregate (string or array of strings)
241
- - `aggregate` — function: `count`, `sum`, `avg`, `min`, `max`, `stddev`
242
- - Row and column totals included automatically in response
326
+ Positions are auto-allocated by the compiler.
243
327
 
244
328
  ## PUT Format Codes
245
329
 
@@ -252,39 +336,17 @@ POST /query {
252
336
 
253
337
  Syntax: `PUT R41 INTO P.salary D 2` (R-var, field name, format, decimals — length auto-filled)
254
338
 
255
- ## Common Field Widths
256
-
257
- | Data type | Chars | Example |
258
- |-----------|-------|---------|
259
- | ISO datetime | 20 | `2026-03-28T14:30:00Z` |
260
- | UUID | 36 | `550e8400-e29b-41d4-a716-446655440000` |
261
- | Email | 50 | `user@example.com` |
262
- | Short ID | 8–12 | `poll_001` |
263
- | Dollar amount | 10 | `$1,234.56` |
264
- | Phone | 15 | `+1-555-123-4567` |
265
-
266
- Use these when sizing MOVE lengths and U buffer positions.
267
-
268
- ## Memory Regions
269
-
270
- | Prefix | Description |
271
- |--------|-------------|
272
- | `P` | Primary data file — use field names: `P.salary`, `P.name` |
273
- | `S` | Secondary data file — use field names: `S.dept`, `S.emp_id` |
274
- | `U` | User buffer (scratch for output) |
275
- | `X` | System text — use property names: `X.username`, `X.utc_time` |
276
-
277
339
  ## Math Intrinsics
278
340
 
279
341
  ```chaprola
280
342
  LET R42 = EXP R41 // e^R41
281
343
  LET R42 = LOG R41 // ln(R41)
282
- LET R42 = SQRT R41 // R41
344
+ LET R42 = SQRT R41 // sqrt(R41)
283
345
  LET R42 = ABS R41 // |R41|
284
346
  LET R43 = POW R41 R42 // R41^R42
285
347
  ```
286
348
 
287
- ## Import-Download: URL Dataset (Parquet, Excel, CSV, JSON)
349
+ ## Import-Download: URL to Dataset (Parquet, Excel, CSV, JSON)
288
350
 
289
351
  ```bash
290
352
  # Import Parquet from a cloud data lake
@@ -304,16 +366,57 @@ POST /import-download {
304
366
  ```
305
367
 
306
368
  Supports: CSV, TSV, JSON, NDJSON, Parquet (zstd/snappy/lz4), Excel (.xlsx/.xls).
307
- AI instructions are optional — omit to import all columns as-is.
308
369
  Lambda: 10 GB /tmp, 900s timeout, 500 MB download limit.
309
370
 
371
+ ## GROUP BY with Pivot (via /query)
372
+
373
+ Chaprola's pivot IS GROUP BY:
374
+
375
+ ```bash
376
+ # SQL: SELECT department, AVG(salary) FROM staff GROUP BY department
377
+ POST /query {
378
+ userid, project, file: "STAFF",
379
+ pivot: {
380
+ row: "department",
381
+ column: "",
382
+ value: "salary",
383
+ aggregate: "avg"
384
+ }
385
+ }
386
+ ```
387
+
388
+ - `row` — grouping field
389
+ - `column` — cross-tab field (use `""` for simple aggregation)
390
+ - `value` — field to aggregate
391
+ - `aggregate` — count, sum, avg, min, max, stddev
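The avg-by-department pivot corresponds to a plain group-by aggregation; a Python sketch of what the first `/query` example computes, with made-up sample rows:

```python
from collections import defaultdict

# Like: SELECT department, AVG(salary) FROM staff GROUP BY department
staff = [
    {"department": "ENG", "salary": 100},
    {"department": "ENG", "salary": 80},
    {"department": "OPS", "salary": 60},
]

totals = defaultdict(lambda: [0, 0])  # department -> [sum, count]
for row in staff:
    totals[row["department"]][0] += row["salary"]
    totals[row["department"]][1] += 1

avg_salary = {dept: s / n for dept, (s, n) in totals.items()}
```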
392
+
393
+ ## Common Field Widths
394
+
395
+ | Data type | Chars | Example |
396
+ |-----------|-------|---------|
397
+ | ISO datetime | 20 | `2026-03-28T14:30:00Z` |
398
+ | UUID | 36 | `550e8400-e29b-41d4-a716-446655440000` |
399
+ | Email | 50 | `user@example.com` |
400
+ | Short ID | 8–12 | `poll_001` |
401
+ | Dollar amount | 10 | `$1,234.56` |
402
+ | Phone | 15 | `+1-555-123-4567` |
403
+
404
+ ## Memory Regions
405
+
406
+ | Prefix | Description |
407
+ |--------|-------------|
408
+ | `P` | Primary data file — use field names: `P.salary`, `P.name` |
409
+ | `S` | Secondary data file — use field names: `S.dept`, `S.emp_id` |
410
+ | `U` | User buffer (scratch for output) |
411
+ | `X` | System text — use property names: `X.username`, `X.utc_time` |
412
+
310
413
  ## HULDRA Optimization — Nonlinear Parameter Fitting
311
414
 
312
- HULDRA finds the best parameter values for a mathematical model by minimizing the difference between model predictions and observed data. You propose a model, HULDRA finds the coefficients.
415
+ HULDRA finds the best parameter values for a mathematical model by minimizing the difference between model predictions and observed data.
313
416
 
314
417
  ### How It Works
315
418
 
316
- 1. You write a VALUE program (normal Chaprola) that reads data, computes predictions using R-variable parameters, and stores the error in an objective R-variable
419
+ 1. Write a VALUE program that reads data, computes predictions using R-variable parameters, and stores the error in an objective R-variable
317
420
  2. HULDRA repeatedly runs your program with different parameter values, using gradient descent to minimize the objective
318
421
  3. When the objective stops improving, HULDRA returns the optimal parameters
319
422
 
@@ -327,7 +430,7 @@ HULDRA finds the best parameter values for a mathematical model by minimizing th
327
430
 
328
431
  ### Complete Example: Fit a Linear Model
329
432
 
330
- **Goal:** Find `salary = a × years_exp + b` that best fits employee data.
433
+ **Goal:** Find `salary = a * years_exp + b` that best fits employee data.
331
434
 
332
435
  **Step 1: Import data**
333
436
  ```bash
@@ -344,33 +447,29 @@ POST /import {
344
447
  ```
345
448
 
346
449
  **Step 2: Write and compile the VALUE program**
450
+
451
+ Note: HULDRA VALUE programs are the one case where you MUST use R1–R20 directly (HULDRA sets elements) and R21–R40 directly (HULDRA reads objectives). Use implicit assignment for scratch variables R41+.
452
+
347
453
  ```chaprola
348
- // VALUE program: salary = R1 * years_exp + R2
454
+ // SALFIT — Linear salary model: salary = R1 * years_exp + R2
349
455
  // R1 = slope (per-year raise), R2 = base salary
350
456
  // R21 = sum of squared residuals (SSR)
457
+ // Primary: EMP
351
458
 
352
- DEFINE VARIABLE REC R41
353
- DEFINE VARIABLE YRS R42
354
- DEFINE VARIABLE SAL R43
355
- DEFINE VARIABLE PRED R44
356
- DEFINE VARIABLE RESID R45
357
- DEFINE VARIABLE SSR R46
358
-
359
- LET SSR = 0
360
- LET REC = 1
361
- 100 SEEK REC
459
+ LET ssr = 0
460
+ LET rec = 1
461
+ 100 SEEK rec
362
462
  IF EOF GOTO 200
363
- GET YRS FROM P.years_exp
364
- GET SAL FROM P.salary
365
- LET PRED = R1 * YRS
366
- LET PRED = PRED + R2
367
- LET RESID = PRED - SAL
368
- LET RESID = RESID * RESID
369
- LET SSR = SSR + RESID
370
- LET REC = REC + 1
463
+ GET yrs FROM P.years_exp
464
+ GET sal FROM P.salary
465
+ LET pred = R1 * yrs
466
+ LET pred = pred + R2
467
+ LET resid = pred - sal
468
+ LET resid = resid * resid
469
+ LET ssr = ssr + resid
470
+ LET rec = rec + 1
371
471
  GOTO 100
372
- 200 LET R21 = SSR
373
- END
472
+ 200 LET R21 = ssr
374
473
  ```
375
474
 
376
475
  Compile with: `primary_format: "EMP"`
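The VALUE program's objective is the usual sum of squared residuals; the same computation in Python, where the sample rows are made up and `r1`/`r2` stand for the slope and intercept HULDRA proposes:

```python
# salary = R1 * years_exp + R2; objective R21 = sum of squared residuals
def ssr(r1, r2, rows):
    total = 0.0
    for years_exp, salary in rows:
        pred = r1 * years_exp + r2
        resid = pred - salary
        total += resid * resid
    return total

rows = [(1, 5.0), (2, 7.0), (3, 9.0)]  # perfectly linear: 2x + 3
```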
@@ -392,285 +491,67 @@ POST /optimize {
392
491
  }
393
492
  ```
394
493
 
395
- **Response:**
396
- ```json
397
- {
398
- "status": "converged",
399
- "iterations": 12,
400
- "elements": [
401
- {"index": 1, "label": "per_year_raise", "value": 4876.5},
402
- {"index": 2, "label": "base_salary", "value": 46230.1}
403
- ],
404
- "objectives": [
405
- {"index": 1, "label": "SSR", "value": 2841050.3, "goal": 0.0}
406
- ],
407
- "elapsed_seconds": 0.02
408
- }
409
- ```
410
-
411
- **Result:** `salary = $4,877/year × experience + $46,230 base`
412
-
413
- ### Element Parameters Explained
494
+ ### Element Parameters
414
495
 
415
496
  | Field | Description | Guidance |
416
497
  |-------|-------------|----------|
417
- | `index` | Maps to R-variable (1 R1, 2 R2, ...) | Max 20 elements |
498
+ | `index` | Maps to R-variable (1 = R1, 2 = R2, ...) | Max 20 elements |
418
499
  | `label` | Human-readable name | Returned in results |
419
- | `start` | Initial guess | Closer to true value = faster convergence |
420
- | `min`, `max` | Bounds | HULDRA clamps parameters to this range |
421
- | `delta` | Step size for gradient computation | ~0.1% of expected value range. Too large = inaccurate gradients. Too small = numerical noise |
500
+ | `start` | Initial guess | Closer = faster convergence |
501
+ | `min`, `max` | Bounds | HULDRA clamps to this range |
502
+ | `delta` | Step size for gradient | ~0.1% of expected value range |
422
503
 
423
504
  ### Choosing Delta Values
424
505
 
425
- Delta controls how HULDRA estimates gradients (via central differences). Rules of thumb:
426
- - **Dollar amounts** (fares, salaries): `delta: 0.01` to `1.0`
427
- - **Rates/percentages** (per-mile, per-minute): `delta: 0.001` to `0.01`
506
+ - **Dollar amounts**: `delta: 0.01` to `1.0`
507
+ - **Rates/percentages**: `delta: 0.001` to `0.01`
428
508
  - **Counts/integers**: `delta: 0.1` to `1.0`
429
- - **Time values** (hours, peaks): `delta: 0.05` to `0.5`
430
-
431
- If optimization doesn't converge, try making delta smaller.
432
-
433
- ### Performance & Limits
509
+ - **Time values**: `delta: 0.05` to `0.5`
434
510
 
435
- HULDRA runs your VALUE program **1 + 2 × N_elements** times per iteration (once for evaluation, twice per element for gradient). With `max_iterations: 100`:
511
+ ### Performance
436
512
 
437
- | Elements | VM runs/iteration | At 100 iterations |
438
- |----------|-------------------|-------------------|
439
- | 2 | 5 | 500 |
440
- | 3 | 7 | 700 |
441
- | 5 | 11 | 1,100 |
442
- | 10 | 21 | 2,100 |
513
+ HULDRA runs your program `1 + 2 * N_elements` times per iteration. Lambda timeout is 900 seconds. For large datasets, sample first — query 200–500 representative records, optimize against the sample.
443
514
 
444
- **Lambda timeout is 900 seconds.** If each VM run takes 0.01s (100 records), you're fine. If each run takes 1s (100K records), 3 elements × 100 iterations = 700s — cutting it close.
515
+ Use `async_exec: true` for optimizations that might exceed 30 seconds.
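The run count above is easy to budget against the timeout; a small Python sketch, where the per-run timing is illustrative:

```python
def vm_runs(n_elements: int, iterations: int) -> int:
    # 1 evaluation + 2 gradient runs per element, per iteration
    return (1 + 2 * n_elements) * iterations

# Budget against the 900 s Lambda timeout.
per_run_seconds = 0.01  # illustrative: a ~100-record dataset
total_seconds = vm_runs(3, 100) * per_run_seconds
```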
445
516
 
446
- **Strategy for large datasets:** Sample first. Query 200–500 representative records into a smaller dataset, optimize against that. The coefficients transfer to the full dataset.
447
-
448
- ```bash
449
- # Sample 500 records from a large dataset
450
- POST /query {userid, project, file: "BIGDATA", limit: 500, offset: 100000}
451
- # Import the sample
452
- POST /import {userid, project, name: "SAMPLE", data: [...results...]}
453
- # Optimize against the sample
454
- POST /optimize {... primary_file: "SAMPLE" ...}
455
- ```
456
-
457
- ### Async Optimization
458
-
459
- For optimizations that might exceed 30 seconds (API Gateway timeout), use async mode:
460
-
461
- ```bash
462
- POST /optimize {
463
- ... async_exec: true ...
464
- }
465
- # Response: {status: "running", job_id: "20260325_..."}
466
-
467
- POST /optimize/status {userid, project, job_id: "20260325_..."}
468
- # Response: {status: "converged", elements: [...], ...}
469
- ```
470
-
471
- ### Multi-Objective Optimization
472
-
473
- HULDRA can minimize multiple objectives simultaneously with different weights:
474
-
475
- ```bash
476
- objectives: [
477
- {index: 1, label: "price_error", goal: 0.0, weight: 1.0},
478
- {index: 2, label: "volume_error", goal: 0.0, weight: 10.0}
479
- ]
480
- ```
481
-
482
- Higher weight = more important. HULDRA minimizes `Q = sum(weight × (value - goal)²)`.
483
-
484
- ### Interpreting Results
485
-
486
- - **`status: "converged"`** — Optimal parameters found. The objective stopped improving.
487
- - **`status: "timeout"`** — Hit 900s wall clock. Results are the best found so far — often still useful.
488
- - **`total_objective`** — The raw Q value. Compare across runs, not in absolute terms. Lower = better fit.
489
- - **`SSR` (objective value)** — Sum of squared residuals. Divide by record count for mean squared error. Take the square root for RMSE in the same units as your data.
490
- - **`dq_dx` on elements** — Gradient. Values near zero mean the parameter is well-optimized. Large values may indicate the bounds are too tight.
491
-
492
- ### Model Catalog — Which Formula to Try
493
-
494
- HULDRA fits any model expressible with Chaprola's math: `+`, `-`, `*`, `/`, `EXP`, `LOG`, `SQRT`, `ABS`, `POW`, and `IF` branching. Use this catalog to pick the right model for your data shape.
517
+ ### Model Catalog
495
518
 
496
519
  | Model | Formula | When to use | Chaprola math |
497
520
  |-------|---------|-------------|---------------|
498
- | **Linear** | `y = R1*x + R2` | Proportional relationships, constant rate | `*`, `+` |
499
- | **Multi-linear** | `y = R1*x1 + R2*x2 + R3` | Multiple independent factors | `*`, `+` |
500
- | **Quadratic** | `y = R1*x^2 + R2*x + R3` | Accelerating/decelerating curves, area scaling | `*`, `+`, `POW` |
501
- | **Exponential growth** | `y = R1 * EXP(R2*x)` | Compound growth, population, interest | `EXP`, `*` |
502
- | **Exponential decay** | `y = R1 * EXP(-R2*x) + R3` | Drug clearance, radioactive decay, cooling | `EXP`, `*`, `-` |
503
- | **Power law** | `y = R1 * POW(x, R2)` | Scaling laws (Zipf, Kleiber), fractal relationships | `POW`, `*` |
504
- | **Logarithmic** | `y = R1 * LOG(x) + R2` | Diminishing returns, perception (Weber-Fechner) | `LOG`, `*`, `+` |
505
- | **Gaussian** | `y = R1 * EXP(-(x-R2)^2/(2*R3^2))` | Bell curves, distributions, demand peaks | `EXP`, `*`, `/` |
506
- | **Logistic (S-curve)** | `y = R1 / (1 + EXP(-R2*(x-R3)))` | Adoption curves, saturation, carrying capacity | `EXP`, `/`, `+` |
507
- | **Inverse** | `y = R1/x + R2` | Boyle's law, unit cost vs volume | `/`, `+` |
508
- | **Square root** | `y = R1 * SQRT(x) + R2` | Flow rates (Bernoulli), risk vs portfolio size | `SQRT`, `*`, `+` |
509
-
510
- **How to choose:** Look at your data's shape.
511
- - Straight line → linear or multi-linear
512
- - Curves upward faster and faster → exponential growth or quadratic
513
- - Curves upward then flattens → logarithmic, square root, or logistic
514
- - Drops fast then levels off → exponential decay or inverse
515
- - Has a peak/hump → Gaussian
516
- - Straight on log-log axes → power law
521
+ | **Linear** | `y = R1*x + R2` | Proportional relationships | `*`, `+` |
522
+ | **Multi-linear** | `y = R1*x1 + R2*x2 + R3` | Multiple factors | `*`, `+` |
523
+ | **Quadratic** | `y = R1*x^2 + R2*x + R3` | Accelerating curves | `*`, `+`, `POW` |
524
+ | **Exponential growth** | `y = R1 * EXP(R2*x)` | Compound growth | `EXP`, `*` |
525
+ | **Exponential decay** | `y = R1 * EXP(-R2*x) + R3` | Drug clearance, cooling | `EXP`, `*`, `-` |
526
+ | **Power law** | `y = R1 * POW(x, R2)` | Scaling laws | `POW`, `*` |
527
+ | **Logarithmic** | `y = R1 * LOG(x) + R2` | Diminishing returns | `LOG`, `*`, `+` |
528
+ | **Logistic** | `y = R1 / (1 + EXP(-R2*(x-R3)))` | S-curves, saturation | `EXP`, `/`, `+` |
517
529
 
518
530
  ### Nonlinear VALUE Program Patterns
519
531
 
520
532
  **Exponential decay:** `y = R1 * exp(-R2 * x) + R3`
521
533
  ```chaprola
522
- LET ARG = R2 * X
523
- LET ARG = ARG * -1
524
- LET PRED = EXP ARG
525
- LET PRED = PRED * R1
526
- LET PRED = PRED + R3
534
+ LET arg = R2 * x
535
+ LET arg = arg * -1
536
+ LET pred = EXP arg
537
+ LET pred = pred * R1
538
+ LET pred = pred + R3
527
539
  ```
528
540
 
529
541
  **Power law:** `y = R1 * x^R2`
530
542
  ```chaprola
531
- LET PRED = POW X R2
532
- LET PRED = PRED * R1
533
- ```
534
-
535
- **Gaussian:** `y = R1 * exp(-(x - R2)^2 / (2 * R3^2))`
536
- ```chaprola
537
- LET DIFF = X - R2
538
- LET DIFF = DIFF * DIFF
539
- LET DENOM = R3 * R3
540
- LET DENOM = DENOM * 2
541
- LET ARG = DIFF / DENOM
542
- LET ARG = ARG * -1
543
- LET PRED = EXP ARG
544
- LET PRED = PRED * R1
543
+ LET pred = POW x R2
544
+ LET pred = pred * R1
545
545
  ```
546
546
 
547
547
  **Logistic S-curve:** `y = R1 / (1 + exp(-R2 * (x - R3)))`
548
548
  ```chaprola
549
- LET ARG = X - R3
550
- LET ARG = ARG * R2
551
- LET ARG = ARG * -1
552
- LET DENOM = EXP ARG
553
- LET DENOM = DENOM + 1
554
- LET PRED = R1 / DENOM
555
- ```
556
-
557
- **Logarithmic:** `y = R1 * ln(x) + R2`
558
- ```chaprola
559
- LET PRED = LOG X
560
- LET PRED = PRED * R1
561
- LET PRED = PRED + R2
562
- ```
563
-
564
- All patterns follow the same loop structure: SEEK records, GET fields, compute PRED, accumulate `(PRED - OBS)^2` in SSR, store SSR in R21 at the end.
565
-
566
- ## Parameterized Report Endpoint
567
-
568
- Combine QUERY with PARAM to build a dynamic JSON API from a published program. QUERY output is a .QR file (read-only). R20 = matched record count. Missing PARAMs are silently replaced with blank (string) or 0.0 (numeric) — check param warnings in the response for diagnostics.
569
-
570
- ```chaprola
571
- // STAFF_BY_DEPT.CS — call via /report?publisher=admin&program=STAFF_BY_DEPT&dept=Engineering
572
- QUERY STAFF FIELDS name, salary, title INTO dept_staff WHERE department EQ PARAM.dept ORDER BY salary DESC
573
- OPEN SECONDARY dept_staff
574
- DEFINE name = S.name
575
- DEFINE salary = S.salary
576
- DEFINE title = S.title
577
- PRINT "["
578
- READ SECONDARY
579
- IF FINI GOTO done
580
- 100 PRINT TRIM "{\"name\":\"" + name + "\",\"title\":\"" + title + "\",\"salary\":" + salary + "}"
581
- READ SECONDARY
582
- IF FINI GOTO done
583
- PRINT ","
584
- GOTO 100
585
- done.
586
- PRINT "]"
587
- STOP
588
- ```
589
-
590
- Publish with: `POST /publish {userid, project, name: "STAFF_BY_DEPT", primary_file: "STAFF", acl: "authenticated"}`
591
- Call with: `POST /report?publisher=admin&program=STAFF_BY_DEPT&dept=Engineering`
592
-
593
- ## Cross-File Filtering (IN/NOT IN)
594
-
595
- Use QUERY with NOT IN to find records in one file that don't appear in another. This is the flashcard review pattern — find unreviewed cards by excluding already-reviewed ones. One IN/NOT IN per QUERY.
596
-
597
- If the IN-file doesn't exist (e.g., new user with no progress), NOT IN treats it as empty — all records pass. This is correct: "kanji not in (nothing)" = all kanji.
598
-
599
- ```chaprola
600
- // Step 1: Get the list of kanji the user has already reviewed
601
- QUERY progress INTO reviewed WHERE username EQ PARAM.username
602
-
603
- // Step 2: Filter flashcards to only those NOT in the reviewed set
604
- // If progress doesn't exist (new user), all flashcards are returned
605
- QUERY flashcards INTO new_cards WHERE kanji NOT IN reviewed.kanji
606
-
607
- // Step 3: Loop through unreviewed cards
608
- OPEN SECONDARY new_cards
609
- READ SECONDARY
610
- IF FINI GOTO empty
611
- 100 PRINT S.kanji + " — " + S.reading + " — " + S.meaning
612
- READ SECONDARY
613
- IF FINI GOTO done
614
- GOTO 100
615
- empty.
616
- PRINT "All cards reviewed!"
617
- done.
618
- STOP
619
- ```
620
-
621
- ## Public Apps: Use /report, Not /query
622
-
623
- Site keys require BAA signing. For public-facing web apps, use published reports (`/report`) instead of `/query` for all read operations. `/report` is public — no auth or BAA needed.
624
-
625
- **Pattern:** Move data logic into Chaprola programs (QUERY, TABULATE), publish them, and call `/report` from the frontend. Reserve site keys for write operations (`/insert-record`) only.
626
-
627
- ```javascript
628
- // GOOD: public report — no auth needed, works for anyone
629
- const url = `${API}/report?userid=myapp&project=data&name=RESULTS&poll_id=${id}`;
630
- const response = await fetch(url);
631
-
632
- // BAD: /query with site key — fails if BAA not signed (403 Forbidden)
633
- const response = await fetch(`${API}/query`, {
634
- headers: { 'Authorization': `Bearer ${SITE_KEY}` },
635
- body: JSON.stringify({ userid: 'myapp', project: 'data', file: 'votes', where: [...] })
636
- });
637
- ```
638
-
639
- **Why this is better:** The program runs server-side with full access. The frontend gets clean output. No API keys exposed for reads. QUERY + TABULATE in a program replaces client-side pivot logic.
640
-
641
- ## Chart Data with TABULATE
642
-
643
- Use TABULATE to produce CSV output suitable for charting. This example cross-tabulates mortality data by cause and year.
644
-
645
- ```chaprola
646
- // Generate a pivot table of death counts by cause and year
647
- TABULATE mortality SUM deaths FOR cause VS year WHERE year GE "2020" INTO trends
648
-
649
- // Output as CSV — ready for any charting library
650
- PRINT TABULATE trends AS CSV
651
- ```
652
-
653
- Output:
654
- ```
655
- cause,2020,2021,2022,2023,total
656
- Heart disease,690882,693021,699659,702000,2785562
657
- Cancer,598932,602350,608371,611000,2420653
658
- ...
659
- ```
660
-
661
- For web apps, use JSON output instead:
662
- ```chaprola
663
- PRINT TABULATE trends AS JSON
549
+ LET arg = x - R3
550
+ LET arg = arg * R2
551
+ LET arg = arg * -1
552
+ LET denom = EXP arg
553
+ LET denom = denom + 1
554
+ LET pred = R1 / denom
664
555
  ```
665
556
 
666
- ### Agent Workflow Summary
667
-
668
- 1. **Inspect** — Call `/format` to see what fields exist
669
- 2. **Sample** — Use `/query` with `limit` to get a manageable subset (200–500 records)
670
- 3. **Import sample** — `/import` the subset as a new small dataset
671
- 4. **Hypothesize** — Propose a model relating the fields
672
- 5. **Write VALUE program** — Loop through records, compute predicted vs actual, accumulate SSR in R21
673
- 6. **Compile** — `/compile` with `primary_format` pointing to the sample
674
- 7. **Optimize** — `/optimize` with elements, objectives, and the sample as primary_file
675
- 8. **Interpret** — Read the converged element values — those are your model coefficients
676
- 9. **Iterate** — If SSR is high, try a different model (add terms, try nonlinear)
557
+ All patterns follow the same loop structure: SEEK records, GET fields, compute pred, accumulate `(pred - obs)^2` in ssr, store ssr in R21 at the end.
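+ 
+ As a minimal sketch, the linear model's version of that loop might look as follows — field names `x`/`y` and the exact SEEK loop spelling are assumptions, so confirm against chaprola://cookbook before compiling:
+ 
+ ```chaprola
+ // Hedged sketch: SSR loop for y = R1*x + R2
+ // SEEK/IF FINI spelling assumed by analogy with the READ SECONDARY loops above
+ LET ssr = 0
+ 100 SEEK
+ IF FINI GOTO done
+ LET pred = R1 * P.x
+ LET pred = pred + R2
+ LET diff = pred - P.y
+ LET diff = diff * diff
+ LET ssr = ssr + diff
+ GOTO 100
+ done.
+ LET R21 = ssr
+ ```
+ 
+ Note each LET performs exactly one operation, per the no-parentheses rule.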
@@ -60,8 +60,11 @@ IF EQUAL "CREDIT" U.76 GOTO 200
60
60
  ### MOVE literal auto-pads to field width
61
61
  `MOVE "Jones" P.name` auto-fills the rest of the field with blanks. No need to clear first.
62
62
 
63
- ### DEFINE VARIABLE names must not collide with field names
64
- If the format has a `balance` field, don't `DEFINE VARIABLE balance R41`. Use `bal` instead. The compiler confuses the alias with the field name.
63
+ ### Don't use DEFINE VARIABLE
64
+ Use implicit assignment: `LET rec = 1`. The compiler assigns R-variable slots automatically. DEFINE VARIABLE is legacy boilerplate.
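+ 
+ A before/after sketch (the field name `amount` is illustrative):
+ 
+ ```chaprola
+ // Legacy — flagged by project review:
+ //   DEFINE VARIABLE total R41
+ // Preferred — implicit assignment; the compiler picks the slot:
+ LET total = 0
+ LET total = total + P.amount
+ ```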
65
+
66
+ ### Every program needs an intent file (.DS)
67
+ One paragraph: what the program does, what parameters it accepts, what output it produces. The project review system flags programs without intents.
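+ 
+ For example, a hypothetical PAYROLL_SUMMARY.DS (program name, parameter, and audience all invented for illustration):
+ 
+ ```
+ Summarizes monthly payroll by department. Accepts one parameter, PARAM.month
+ (YYYY-MM). Prints one CSV line per department with headcount and total gross
+ pay. Used by the finance team's month-end close script.
+ ```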
65
68
 
66
69
  ### R-variables are floating point
67
70
  All R1–R99 are 64-bit floats. `7 / 2 = 3.5`. Use PUT with `I` format to display as integer.
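+ 
+ A minimal illustration (string-plus-number PRINT concatenation follows the report examples elsewhere in this cookbook):
+ 
+ ```chaprola
+ LET half = 7 / 2
+ // float division: 3.5, per the note above
+ PRINT "half = " + half
+ ```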
@@ -120,11 +123,11 @@ Always CLOSE before END if you wrote to the secondary file. Unflushed writes are
120
123
 
121
124
  ## HULDRA Optimization
122
125
 
123
- ### Use R41–R99 for scratch variables, not R1–R20
124
- R1–R20 are reserved for HULDRA elements. R21–R40 are reserved for objectives. Your VALUE program's DEFINE VARIABLE declarations must use R41–R99 only.
126
+ ### HULDRA programs: don't use R1–R40 for your variables
127
+ R1–R20 are reserved for HULDRA elements. R21–R40 are reserved for objectives. Use implicit assignment with names that the compiler will place in R41+:
125
128
  ```chaprola
126
- // WRONG: DEFINE VARIABLE counter R1 (HULDRA will overwrite this)
127
- // RIGHT: DEFINE VARIABLE counter R41
129
+ LET rec = 1 // compiler assigns to R41+, safe from HULDRA
130
+ LET ssr = 0 // accumulate error here, then LET R21 = ssr at the end
128
131
  ```
129
132
 
130
133
  ### Sample large datasets before optimizing