@graffiticode/l0175 0.2.0 → 0.3.0
- package/dist/compiler.d.ts.map +1 -1
- package/dist/compiler.js +50 -124
- package/dist/compiler.js.map +1 -1
- package/dist/embedding.d.ts +1 -0
- package/dist/embedding.d.ts.map +1 -1
- package/dist/embedding.js +35 -2
- package/dist/embedding.js.map +1 -1
- package/dist/index.d.ts +4 -0
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -1
- package/dist/lexicon.d.ts +7 -0
- package/dist/lexicon.d.ts.map +1 -1
- package/dist/lexicon.js +12 -4
- package/dist/lexicon.js.map +1 -1
- package/dist/static/instructions.md +187 -10
- package/dist/static/language-info.json +3 -3
- package/dist/static/lexicon.json +98 -0
- package/dist/static/scope.json +1 -1
- package/dist/static/spec.html +2 -1
- package/dist/static/stems.md +14 -0
- package/dist/static/targets.json +209 -0
- package/dist/static/template.gc +7 -67
- package/dist/static/unparse-hints.json +3 -0
- package/dist/static/usage-guide.md +22 -2
- package/dist/targets.d.ts +25 -0
- package/dist/targets.d.ts.map +1 -0
- package/dist/targets.js +169 -0
- package/dist/targets.js.map +1 -0
- package/dist/verify-example.d.ts +27 -0
- package/dist/verify-example.d.ts.map +1 -0
- package/dist/verify-example.js +82 -0
- package/dist/verify-example.js.map +1 -0
- package/package.json +1 -1
- package/spec/docs.md +5 -2
- package/spec/examples/c1-t10-tm1-multiplechoice.expect.json +1 -0
- package/spec/examples/c1-t10-tm1-multiplechoice.gc +15 -0
- package/spec/examples/c1-t10-tm2-multiselect.expect.json +1 -0
- package/spec/examples/c1-t10-tm2-multiselect.gc +15 -0
- package/spec/examples/c1-t10-tm3-hottext.expect.json +1 -0
- package/spec/examples/c1-t10-tm3-hottext.gc +13 -0
- package/spec/examples/c1-t11-tm1-ebsr.expect.json +4 -0
- package/spec/examples/c1-t11-tm1-ebsr.gc +30 -0
- package/spec/examples/c1-t11-tm2-hottext.expect.json +4 -0
- package/spec/examples/c1-t11-tm2-hottext.gc +30 -0
- package/spec/examples/c1-t11-tm3-shorttext.expect.json +4 -0
- package/spec/examples/c1-t11-tm3-shorttext.gc +30 -0
- package/spec/examples/c1-t4-tm1-ebsr.expect.json +4 -0
- package/spec/examples/c1-t4-tm1-ebsr.gc +30 -0
- package/spec/examples/c1-t4-tm2-hottext.expect.json +4 -0
- package/spec/examples/c1-t4-tm2-hottext.gc +30 -0
- package/spec/examples/c1-t4-tm3-shorttext.expect.json +4 -0
- package/spec/examples/c1-t4-tm3-shorttext.gc +30 -0
- package/spec/examples/c1-t8-tm1-multiplechoice.expect.json +1 -0
- package/spec/examples/c1-t8-tm1-multiplechoice.gc +25 -0
- package/spec/examples/c1-t8-tm2-multiselect.expect.json +1 -0
- package/spec/examples/c1-t8-tm2-multiselect.gc +25 -0
- package/spec/examples/c1-t8-tm3-hottext.expect.json +1 -0
- package/spec/examples/c1-t8-tm3-hottext.gc +25 -0
- package/spec/examples/c1-t9-tm1-multiplechoice.expect.json +1 -0
- package/spec/examples/c1-t9-tm1-multiplechoice.gc +23 -0
- package/spec/examples/c1-t9-tm2-multiselect.expect.json +1 -0
- package/spec/examples/c1-t9-tm2-multiselect.gc +24 -0
- package/spec/examples/c1-t9-tm3-ebsr.expect.json +10 -0
- package/spec/examples/c1-t9-tm3-ebsr.gc +30 -0
- package/spec/examples/c1-t9-tm4-hottext.expect.json +10 -0
- package/spec/examples/c1-t9-tm4-hottext.gc +20 -0
- package/spec/examples/c1-t9-tm5-shorttext.expect.json +10 -0
- package/spec/examples/c1-t9-tm5-shorttext.gc +20 -0
- package/spec/examples.md +55 -11
- package/spec/instructions.md +187 -10
- package/spec/language-info.json +2 -2
- package/spec/rag-examples-design.md +74 -0
- package/spec/scope.json +1 -1
- package/spec/spec.md +9 -1
- package/spec/stems.md +14 -0
- package/spec/template.gc +7 -67
- package/spec/usage-guide.md +22 -2
package/spec/instructions.md
CHANGED
@@ -14,7 +14,11 @@ Always declare a top-level `target` (the SBAC learning target the program compos
 
 - **`c1-t4`** — Target 4: Reasoning & Evidence over **literary** texts (RL standards). Dimensions:
 `character`, `setting`, `event`, `point-of-view`, `theme`, `topic`, `narrators-feelings`,
-`character-relationship`.
+`character-relationship`. Companion standards by dimension: `character` / `character-relationship`
+/ `setting` / `event` → `rl-3`; `point-of-view` / `narrators-feelings` → `rl-6`; **`theme` /
+`topic` → `rl-2`** (the CCSS theme standard — **not** `rl-9`). `rl-1` (cite evidence) is always
+added. You normally **omit** `standard` and let the dimension pick its companion; the full
+Grade-5 **RL** strand (`rl-1`–`rl-7`, `rl-9`) is accepted if you author one explicitly.
 - **`c1-t11`** — Target 11: Reasoning & Evidence over **informational** texts (RI standards).
 Dimensions: `relationships-interactions`, `author-use-of-information`, `point-of-view`,
 `purpose`, `authors-opinion`. Standards: `ri-1` (always) + `ri-3` / `ri-6` / `ri-7` / `ri-8` / `ri-9`.
@@ -115,8 +119,11 @@ Quote free text (`text`, `rationale`, `subject`, passage heading) and id labels
 (required — the id of the supported correct claim; on `multi-select` a **list** of ids = the
 correct set), `stem` (required — the Part A / single-question stem / short-text prompt, authored
 from `stems.md`), and on EBSR `stem-b` (required — the Part B stem). Optional: `subject`,
-`standard`, `dok`,
-
+`standard`, `dok`, `task-model` (`tm1`..`tm5` — the per-target task model; the compiler resolves
+it to the item type for the program's `target` and hard-errors if it disagrees with `type`, so it
+both documents and guards intent — see "Task models are per-target" below), and `rubric`
+(short-text only — a list of `band score <n> descriptor "…"` elements; defaults to a 0/1/2 rubric
+if omitted).
 - **band** — a rubric row: `band score 2 descriptor "…" {}`. Used only inside an outcome's `rubric`.
 - **word** / **meaning** (**target `c1-t10` only**) — a top-level `words` list of `word`s; each
 `word` has `id`, `text` (the targeted word/phrase), optional `line`/`quote` (its context), and a
@@ -138,14 +145,56 @@ Quote free text (`text`, `rationale`, `subject`, passage heading) and id labels
 in the focus word's paragraph are warned and dropped.
 - A top-level **`title`** attribute (before `passage`) names the assessment; it is echoed on the output.
 
-## Stems (Appropriate Stems
+## Stems (Appropriate Stems)
 
 **You author the stem; the compiler does not generate it. Use the guideline's Appropriate-Stem
-templates verbatim — do not invent phrasings.**
-
-
-
-
+templates verbatim — do not invent phrasings.** `stems.md` has a **section per target** (T4, T11,
+T9, T8, T10), each with its own task models — open the section for the `target` you picked. For
+each item, pick the one template that matches the item type and the task, and fill its bracketed
+`[...]` slot. Author `stem` (Part A / single-question / short-text prompt) and, on EBSR, `stem-b`
+(Part B).
+
+### Task models are per-target — a number alone is meaningless
+
+⚠ **Task-model numbers are PER-TARGET and COLLIDE across targets.** The same number maps to a
+different item type depending on the `target`. Look at `tm3` alone:
+
+| Number | `c1-t4` / `c1-t11` | `c1-t9` | `c1-t8` / `c1-t10` |
+|--------|--------------------|---------|--------------------|
+| **tm3** | short-text | **ebsr (two-part)** | hot-text |
+
+So "task model 3" cannot be resolved without first knowing the target. **Do not assume the
+Reasoning & Evidence (T4/T11) numbering applies elsewhere** — under `c1-t9`, Task Model 3 is EBSR,
+Task Model 4 is Hot Text, Task Model 5 is Short Text.
+
+The full per-target task-model → item-type mapping (the compiler enforces exactly this):
+
+<!-- GENERATED:task-models START (from targets.json — regenerated by tools/build-static.js; do not edit by hand) -->
+| Target | tm1 | tm2 | tm3 | tm4 | tm5 |
+|--------|-----|-----|-----|-----|-----|
+| `c1-t4` — Grade 5 · Claim 1 · Target 4 (Reasoning & Evidence) | ebsr | hot-text | short-text | — | — |
+| `c1-t11` — Grade 5 · Claim 1 · Target 11 (Reasoning & Evidence) | ebsr | hot-text | short-text | — | — |
+| `c1-t9` — Grade 5 · Claim 1 · Target 9 (Central Ideas) | multiple-choice | multi-select | ebsr | hot-text | short-text |
+| `c1-t8` — Grade 5 · Claim 1 · Target 8 (Key Details) | multiple-choice | multi-select | hot-text | — | — |
+| `c1-t10` — Grade 5 · Claim 1 · Target 10 (Word Meanings) | multiple-choice | multi-select | hot-text | — | — |
+<!-- GENERATED:task-models END -->
+
+**Normalization rule (required).** When a request names a task model by number ("TM3", "task
+model 3"), **resolve it against the row for the program's `target`** in the table above — never the
+T4/T11 default. Before composing, echo the resolution, e.g. `target c1-t9, TM3 → ebsr`, and author
+that item type.
+
+**Belt-and-suspenders:** you may author the resolved number directly on the outcome as
+`task-model tm3`. The compiler resolves it against the target's table and **hard-errors on a
+mismatch** with `type` (or supplies `type` when you omit it) — so `outcome … task-model tm3 type
+ebsr` on `c1-t9` is self-checking, and `task-model tm3 type short-text` is rejected.
+
+The task-model mapping and Part-A choices below are the **Reasoning & Evidence (T4/T11)** catalog
+(EBSR → Task Model 1, Hot Text → Task Model 2, Short Text → Task Model 3; task = inference vs.
+conclusion vs. author-intent; plain subject vs. narrator's-feelings vs. relationship). **T9, T8,
+and T10 use different task models and stems** — see their `stems.md` sections (e.g. T9 "Which
+sentence best shows the main idea…", T8 states the inference then "Which detail … best supports
+this conclusion?", T10 "What does the word … most likely mean?"). Common R&E Part A choices:
 
 - inference — "Which of these inferences about [...] is supported by the passage?"
 - conclusion — "Which of these conclusions about [...] is supported by the passage?"
@@ -182,6 +231,15 @@ outright is literal recall and out of scope.
 
 ## Authoring guidance
 
+The distractor-pool, error-type-coverage, and EBSR Part-B rules below are the **Reasoning &
+Evidence (T4/T11)** contract — they apply wherever wrong answers are authored as `distractor`
+claims. The other targets author their wrong answers differently (see Step 0): **T9** uses
+`distractor` claims too but with the significance taxonomy (`too-narrow` / `too-broad` /
+`misreads-detail` / `insignificant`); **T8** has **no distractor claims** — its foils are
+non-supporting `source`s (`supports-wrong-claim` / `irrelevant`); **T10** authors distractor
+`meaning`s, not claims. The grade-level, length-balance, and stem-wording rules apply to **all**
+targets.
+
 - For each EBSR/Hot-Text outcome, author **at least 5 viable distractor claims that `targets`
 it**, covering all three error types (`misreads-detail`, `erroneous-inference`,
 `faulty-reasoning`) — with ≥2 alternatives in at least two of the types. An item draws only 3
@@ -278,7 +336,7 @@ outright is literal recall and out of scope.
 - `dimension` (**c1-t9**): `central-idea`, `key-detail`, `summary` · (**c1-t8**): `supporting-evidence` · (**c1-t10**): `word-meaning`
 - claim `status`: `supported`, `distractor` · source `status`: `directly-supports`, `supports-wrong-claim`, `irrelevant` · meaning `status` (c1-t10): `correct`, `distractor`
 - `error-type` (**c1-t4 / c1-t11**): `misreads-detail`, `erroneous-inference`, `faulty-reasoning` · (**c1-t9**): `too-narrow`, `too-broad`, `misreads-detail`, `insignificant` · (**c1-t8**): none — wrong answers are non-supporting `source`s · (**c1-t10**): `other-meaning`, `misinterprets`, `wrong-context`
-- `standard` (**c1-t4**)
+- `standard` — primary companions (normally inferred from the dimension; author one only to override): (**c1-t4**) `rl-1` + `rl-2` (theme/topic) / `rl-3` / `rl-6` · (**c1-t11**) `ri-1` + `ri-3` / `ri-6` / `ri-7` / `ri-8` · (**c1-t9**) `ri-1` + `ri-2` · (**c1-t8**) `ri-1` + `ri-7` · (**c1-t10**) `ri-4` + `l-4` / `l-4a` / `l-4b` / `l-4c` / `l-5c`. The **full CCSS Grade-5 strand for the target's text type is accepted**: any `rl-1`–`rl-7` / `rl-9` on a literary target (c1-t4), any `ri-1`–`ri-9` on an informational target (c1-t11/t9/t8/t10), plus the `l-4` / `l-5` families on c1-t10. (`rl-2` is the theme standard — valid; there is no `rl-8`.)
 - `dok`: `r-dok1`, `r-dok2`, `r-dok3` (R&E items are `r-dok3`; T9 selected-response is `r-dok2`, its written summary `r-dok3`; T8 & T10 are `r-dok2`)
 
 ## What composition does
@@ -335,3 +393,122 @@ outcomes [
 ]
 {}..
 ```
+
+(For **Target 11**, the same R&E shape over an *informational* passage: `target c1-t11`,
+`type informational`, an RI dimension like `relationships-interactions`, `standard ri-1` + `ri-3`,
+and the T11 stems from `stems.md`.)
+
+## Example (Target 9 — Central Ideas, multiple-choice)
+
+The OPTIONS are still `claim`s, but the skill is the **main idea** (not infer-and-justify) and the
+distractors are the **significance** taxonomy — usually true statements that just aren't central.
+DOK is `r-dok2`; the standards are `ri-1` + `ri-2`. (No EBSR Part B here; on T9 EBSR/Hot-Text the
+correct claim's `directly-supports` sources are the supporting selection.)
+
+```
+target c1-t9
+passage "Honeybees"
+type informational
+lines [
+"Honeybees live together in large groups called colonies. Worker bees gather nectar and build the hive. The queen bee lays all the eggs. By working together, the colony survives and grows."
+]
+claims [
+claim id "c1" status supported dimension central-idea subject "the colony" standard ri-2
+text "Honeybees survive by living and working together, each bee with its own job."
+cites ["e1"] {}
+/* T9 distractors are usually TRUE statements that simply aren't the central idea */
+claim id "d1" status distractor error-type too-narrow targets ["q1"]
+text "The queen bee lays all the eggs."
+rationale "A true supporting detail, not the central idea." cites ["e1"] {}
+claim id "d2" status distractor error-type too-broad targets ["q1"]
+text "Insects are the most important animals on Earth."
+rationale "An overgeneralization beyond the passage." cites ["e1"] {}
+claim id "d3" status distractor error-type misreads-detail targets ["q1"]
+text "Each bee in the colony does every job by itself."
+rationale "Misreads the division of labor." cites ["e1"] {}
+]
+evidence [ source id "e1" line 1 status directly-supports supports ["c1"] {} ]
+outcomes [
+outcome id "q1" type multiple-choice dimension central-idea subject "the colony" standard ri-2 focus "c1"
+stem "Which sentence best shows the main idea of the passage?" {}
+]
+{}..
+```
+
+## Example (Target 8 — Key Details, evidence selection)
+
+The inference is **GIVEN in the stem**; the OPTIONS are passage `source`s, not claims. Author ONE
+supported `claim` (the given inference, named by `focus`), state it in the `stem`, and author the
+`source`s as the choices: `directly-supports` = correct evidence (give each a `quote`),
+`irrelevant`/`supports-wrong-claim` = foils. **No distractor claims.** Standards `ri-1` + `ri-7`,
+DOK `r-dok2`.
+
+```
+target c1-t8
+passage "Aqueducts"
+type informational
+lines [
+"Roman aqueducts carried water across long distances. They used gentle slopes so water flowed by gravity. Arches held the channels high above valleys. Cities far from rivers could finally get fresh water."
+]
+claims [
+claim id "c1" status supported dimension supporting-evidence subject "the aqueducts"
+text "Roman aqueducts let cities far from rivers get fresh water." cites ["e1" "e2"] {}
+]
+evidence [
+source id "e1" line 1 quote "Cities far from rivers could finally get fresh water." status directly-supports supports ["c1"] {}
+source id "e2" line 1 quote "Roman aqueducts carried water across long distances." status directly-supports supports ["c1"] {}
+source id "e3" line 1 quote "They used gentle slopes so water flowed by gravity." status irrelevant supports [] {}
+source id "e4" line 1 quote "Arches held the channels high above valleys." status irrelevant supports [] {}
+source id "e5" line 1 quote "Roman builders also paved long, straight roads." status irrelevant supports [] {}
+]
+outcomes [
+outcome id "q1" type multiple-choice dimension supporting-evidence subject "the aqueducts" standard ri-7 focus "c1"
+stem "Roman aqueducts let far-off cities get fresh water. Which detail from the passage best supports this conclusion?" {}
+]
+{}..
+```
+
+## Example (Target 10 — Word Meanings)
+
+The OPTIONS are **meanings** of a targeted `word`, authored in a top-level `words` list — not
+claims. One `status correct` meaning + `status distractor` meanings (each a T10 `error-type` +
+`rationale`). The outcome's `focus` names the `word`; state the word + its sentence in the `stem`.
+Standard `ri-4` + an L-4 strategy code, DOK `r-dok2`.
+
+```
+target c1-t10
+passage "Aqueducts"
+type informational
+lines [
+"Roman engineers built aqueducts. The aqueduct carried water across long distances."
+]
+words [
+word id "w1" text "aqueduct" line 1 quote "The aqueduct carried water across long distances."
+meanings [
+meaning id "m1" status correct text "a channel built to carry water" {}
+meaning id "m2" status distractor error-type other-meaning text "a boat that carries cargo"
+rationale "Another meaning that ignores the context." {}
+meaning id "m3" status distractor error-type misinterprets text "a tall stone tower"
+rationale "Misreads the sentence." {}
+meaning id "m4" status distractor error-type wrong-context text "a kind of road"
+rationale "Uses the wrong context." {}
+] {}
+]
+outcomes [
+outcome id "q1" type multiple-choice dimension word-meaning subject "aqueduct" standard l-4a focus "w1"
+stem "Read the sentence: \"The aqueduct carried water across long distances.\" What does the word aqueduct most likely mean?" {}
+]
+{}..
+```
+
+For a **click-the-word** (`hot-text`) T10 item, drop the `meanings` and instead list the clickable
+candidates as bare `word`s in the same paragraph — the correct one is the outcome's `focus`, the
+rest are distractor candidate words; put only the instruction + definition in the `stem`:
+
+```
+words [ word id "w1" text "aqueduct" line 1 {} word id "w2" text "engineers" {} word id "w3" text "water" {} ]
+outcomes [
+outcome id "q1" type hot-text dimension word-meaning subject "aqueduct" standard l-4c focus "w1"
+stem "Read the paragraph below. Click the word that means a channel that carries water." {}
+]
+```
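Reviewer note on the `task-model` guard documented in this file: the per-target lookup and the mismatch check can be sketched in a few lines of TypeScript. This is a minimal sketch that assumes the generated table in `instructions.md` is authoritative. `TASK_MODELS` and `resolveItemType` are hypothetical names chosen for illustration; they are not taken from the package's `targets.js`, whose actual exports are not shown in this diff.

```typescript
// Hypothetical sketch of the per-target task-model resolution described in
// instructions.md. Names and error messages are illustrative assumptions.
type ItemType = "ebsr" | "hot-text" | "short-text" | "multiple-choice" | "multi-select";
type Target = "c1-t4" | "c1-t11" | "c1-t9" | "c1-t8" | "c1-t10";
type TaskModel = "tm1" | "tm2" | "tm3" | "tm4" | "tm5";
type Row = Partial<Record<TaskModel, ItemType>>;

// Rows copied from the generated task-model table in the hunk above.
const REASONING_AND_EVIDENCE: Row = { tm1: "ebsr", tm2: "hot-text", tm3: "short-text" };
const SELECTED_RESPONSE: Row = { tm1: "multiple-choice", tm2: "multi-select", tm3: "hot-text" };

const TASK_MODELS: Record<Target, Row> = {
  "c1-t4": REASONING_AND_EVIDENCE,
  "c1-t11": REASONING_AND_EVIDENCE,
  "c1-t9": {
    tm1: "multiple-choice",
    tm2: "multi-select",
    tm3: "ebsr",
    tm4: "hot-text",
    tm5: "short-text",
  },
  "c1-t8": SELECTED_RESPONSE,
  "c1-t10": SELECTED_RESPONSE,
};

// Resolve a task-model number against the program's target. If the author
// also declared a `type`, it must agree with the resolved item type;
// if `type` was omitted, the resolved item type is supplied.
function resolveItemType(target: Target, taskModel: TaskModel, declaredType?: ItemType): ItemType {
  const resolved = TASK_MODELS[target][taskModel];
  if (resolved === undefined) {
    throw new Error(`${taskModel} is not defined for target ${target}`);
  }
  if (declaredType !== undefined && declaredType !== resolved) {
    throw new Error(
      `task-model ${taskModel} on ${target} resolves to ${resolved}, but type is ${declaredType}`
    );
  }
  return resolved;
}
```

Under these assumptions, `resolveItemType("c1-t9", "tm3")` yields `ebsr`, while passing `"short-text"` as the declared type for the same pair throws, which mirrors the self-checking behaviour the instructions describe.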
package/spec/language-info.json
CHANGED
@@ -1,7 +1,7 @@
 {
 "revised": "2026-06-24",
 "id": "0175",
-"description": "Composes 5th-grade English Language Arts assessment items (Smarter Balanced · Grade 5 · Claim 1) for multiple learning targets — DIFFERENT reading skills: c1-t4 (Reasoning & Evidence, literary, RL) and c1-t11 (Reasoning & Evidence, informational, RI) ask students to infer/conclude and justify with evidence; c1-t9 (Central Ideas, informational, RI-1/RI-2) asks them to determine the main idea, the key details that build it, or summarize; c1-t8 (Key Details, informational, RI-1/RI-7) GIVES the inference in the stem and asks them to select the supporting evidence (the options are passage sources, not statements; item types multiple-choice/multi-select/single-part-hot-text); c1-t10 (Word Meanings, informational, RI-4/L-4) asks for the meaning of a targeted word/phrase in context (the options are MEANINGS, authored as a `word` with candidate `meaning`s — not claims; item types multiple-choice/multi-select, plus a click-the-word hot-text — for that, author the candidate `word`s as a top-level `words` list: the correct one is the outcome's `focus` (give it the `line` of its paragraph) and the others are distractor candidate words; ALL candidates must be words that appear in that one paragraph (they need only `text` — no `meanings`). The compiler shows the whole paragraph and makes the authored candidate words clickable, with the focus word correct (if you author only the correct word, every content word in the paragraph becomes a choice). The `stem` is only the instruction + the definition (e.g. 'Read the paragraph below. Click the word that means …') — never paste the paragraph into the stem). Every program declares a top-level `target` selecting the dimension/standard/error-type/DOK/stem vocabulary. Item-first: a program authors the questions (outcomes) first — each with a unique id, a focus correct claim (a list on multi-select), and a stem from the target's guideline catalog — then supported and distractor claims (each distractor targets the question(s) it foils) and evidence sources for one passage. Item types: ebsr (two-part), hot-text, short-text, multiple-choice (one correct), multi-select (exact correct set). T9 distractors are a significance taxonomy (too-narrow/too-broad/misreads-detail/insignificant) — usually true statements that just aren't central — vs R&E's reasoning-error taxonomy. Supply the passage text already split into paragraphs (keep the paragraph breaks). The compiler takes each outcome's focus and the foils that target it and assembles the item; it selects and validates authored content, it does not generate content or stems. Author the passage and all question text at the target reading level (Grade 5; overridable with a top-level `grade <n>`); the compiler estimates the passage's reading level and warns when it runs above the target grade.",
+"description": "Composes 5th-grade English Language Arts assessment items (Smarter Balanced · Grade 5 · Claim 1) for multiple learning targets — DIFFERENT reading skills: c1-t4 (Reasoning & Evidence, literary, RL) and c1-t11 (Reasoning & Evidence, informational, RI) ask students to infer/conclude and justify with evidence; c1-t9 (Central Ideas, informational, RI-1/RI-2) asks them to determine the main idea, the key details that build it, or summarize; c1-t8 (Key Details, informational, RI-1/RI-7) GIVES the inference in the stem and asks them to select the supporting evidence (the options are passage sources, not statements; item types multiple-choice/multi-select/single-part-hot-text); c1-t10 (Word Meanings, informational, RI-4/L-4) asks for the meaning of a targeted word/phrase in context (the options are MEANINGS, authored as a `word` with candidate `meaning`s — not claims; item types multiple-choice/multi-select, plus a click-the-word hot-text — for that, author the candidate `word`s as a top-level `words` list: the correct one is the outcome's `focus` (give it the `line` of its paragraph) and the others are distractor candidate words; ALL candidates must be words that appear in that one paragraph (they need only `text` — no `meanings`). The compiler shows the whole paragraph and makes the authored candidate words clickable, with the focus word correct (if you author only the correct word, every content word in the paragraph becomes a choice). The `stem` is only the instruction + the definition (e.g. 'Read the paragraph below. Click the word that means …') — never paste the paragraph into the stem). Every program declares a top-level `target` selecting the dimension/standard/error-type/DOK/stem vocabulary. Item-first: a program authors the questions (outcomes) first — each with a unique id, a focus correct claim (a list on multi-select), and a stem from the target's guideline catalog — then supported and distractor claims (each distractor targets the question(s) it foils) and evidence sources for one passage. Item types: ebsr (two-part), hot-text, short-text, multiple-choice (one correct), multi-select (exact correct set); the allowed set is per-target. Task-model NUMBERS are per-target and collide (tm3 = short-text in c1-t4/c1-t11, ebsr in c1-t9, hot-text in c1-t8/c1-t10), so name the item type, or resolve a number against the program's target — never assume the Reasoning & Evidence numbering carries to T9/T8/T10 (see the per-target task-model table in the usage guide). T9 distractors are a significance taxonomy (too-narrow/too-broad/misreads-detail/insignificant) — usually true statements that just aren't central — vs R&E's reasoning-error taxonomy. Supply the passage text already split into paragraphs (keep the paragraph breaks). The compiler takes each outcome's focus and the foils that target it and assembles the item; it selects and validates authored content, it does not generate content or stems. Author the passage and all question text at the target reading level (Grade 5; overridable with a top-level `grade <n>`); the compiler estimates the passage's reading level and warns when it runs above the target grade.",
 "supported_item_types": ["ebsr", "hot-text", "short-text", "multiple-choice", "multi-select"],
 "example_prompts": [
 {
@@ -72,7 +72,7 @@
 {
 "prompt": "Make a click-the-word item: in the paragraph about the city's water supply, the target word is 'aqueduct' (it means a system of pipes/canals that carries water). Stem: 'Read the paragraph below. Click the word that means a system of pipes or canals that carries water.'",
 "produces": "hot-text",
-"notes": "T10 Task Model 3
+"notes": "T10's click-the-word hot-text (its Task Model 3 — task-model numbers are per-target, so this 3 is specific to c1-t10 and is NOT the same item type as tm3 in other targets): target c1-t10, type hot-text, dimension word-meaning, standard ri-4 + l-4a. Author the focus `word` (e.g. { id \"w1\" text \"aqueduct\" line <paragraph #> }) as the outcome's `focus`, then the DISTRACTOR CANDIDATE WORDS either way: (a) as more single-word `word`s in the `words` list, OR (b) as the focus word's distractor `meanings` whose `text` IS the candidate word — a single word, with error-type + rationale (e.g. text \"causeways\", text \"canals\"). For hot-text the meaning text must be the literal word to click, NOT a definition. ALL candidates must be words that appear in the focus word's paragraph. The compiler shows the whole paragraph (from the focus word's `line`) and makes the candidate words clickable, focus correct. Do NOT put the paragraph in the stem; the stem is only the instruction + definition. (If you author only the focus word with a real multi-word definition, the compiler can't find candidates and falls back to making EVERY content word clickable — avoid that by listing the candidate words.)"
 }
 ]
 }
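Reviewer note on the click-the-word rule in the new `notes` text: candidates must be words that appear in the focus word's paragraph, and `instructions.md` says candidates outside that paragraph are warned and dropped. A minimal TypeScript sketch of that filter follows. The function names, the warning wording, and the tokenization (lowercased runs of letters and apostrophes) are assumptions made for illustration; the compiler's real matching rules are not visible in this diff, and the fallback that makes every content word clickable is not modelled.

```typescript
// Hypothetical sketch of the "candidates must appear in the paragraph" rule.
interface ClickableResult {
  clickable: string[]; // focus word first, then the surviving candidates
  warnings: string[];  // one message per dropped candidate
}

// Assumed tokenization: lowercased runs of letters and apostrophes.
function wordsOf(paragraph: string): string[] {
  return paragraph.toLowerCase().match(/[a-z']+/g) ?? [];
}

function selectClickable(paragraph: string, focus: string, candidates: string[]): ClickableResult {
  const present = new Set(wordsOf(paragraph));
  const warnings: string[] = [];
  const kept = candidates.filter((word) => {
    const found = present.has(word.toLowerCase());
    if (!found) {
      warnings.push(`candidate "${word}" is not in the focus word's paragraph; dropped`);
    }
    return found;
  });
  return { clickable: [focus, ...kept], warnings };
}

// Usage with the paragraph from the Target 10 example in instructions.md.
const paragraph =
  "Roman engineers built aqueducts. The aqueduct carried water across long distances.";
const result = selectClickable(paragraph, "aqueduct", ["engineers", "water", "canals"]);
```

Here `canals` does not occur in the paragraph, so under the sketch's assumptions it is dropped with a warning and the clickable set is the focus word plus `engineers` and `water`.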
@@ -0,0 +1,74 @@
|
|
|
1
|
+
<!-- SPDX-License-Identifier: CC-BY-4.0 -->
|
|
2
|
+
# L0175 RAG anchor set — design
|
|
3
|
+
|
|
4
|
+
_Status: DESIGN (authoring of the golden pairs is a follow-up). Revised 2026-06-30._
|
|
5
|
+
|
|
6
|
+
## Why this exists
|
|
7
|
+
|
|
8
|
+
The reported failure: asked for **c1-t9, Task Model 3 (two-part EBSR)**, the generator drifts to
|
|
9
|
+
TM4 (hot-text) / TM5 (short-text). The instruction-side fixes (the served collision warning, the
|
|
10
|
+
generated per-target task-model table, and the compiler's `task-model` guard) make the rule
|
|
11
|
+
explicit and enforceable. RAG adds the third leg: a **positive worked example** of the exact shape
|
|
12
|
+
the request wants, so retrieval biases generation toward it instead of toward the more salient
|
|
13
|
+
T4/T11 numbering.
|
|
14
|
+
|
|
15
|
+
The drift is between **siblings within one target** (t9 EBSR vs hot-text vs short-text). An abstract
|
|
16
|
+
per-target renumbering rule is exactly what an LLM under-weights; a retrieved exemplar of
|
|
17
|
+
"c1-t9, TM3, two-part EBSR" is the strongest available anchor. The embedding infra already supports
|
|
18
|
+
this — `src/embedding.ts` emits `target:c1-t9`, `item:ebsr`, `shape:two-part`, and now
|
|
19
|
+
`task-model:3`; `extractQueryFacets` pulls target + item type + an explicit/derived task-model
|
|
20
|
+
number from the query. The corpus just lacks positive output-side exemplars.
|
|
21
|
+
|
|
22
|
+
## What the set should be
|
|
23
|
+
|
|
24
|
+
1. **Paired golden examples (prompt → canonical L0175 program), not prompt-only.** `examples.md` is
|
|
25
|
+
prompt-only; disambiguating siblings needs the *output* shape. Store paired examples the console
|
|
26
|
+
RAG can index via `buildEmbeddingArtifacts({ prompt, code })` — e.g. a `spec/examples/` directory
|
|
27
|
+
of `<id>.gc` programs each with a sibling `<id>.prompt.md`, or an L0175 `training_examples.json`
|
|
28
|
+
array of `{ prompt, code }`.
|
|
29
|
+
|
|
30
|
+
2. **Contrastive siblings — same passage, vary only the task model — for every target** (parity, not
|
|
31
|
+
t9-only). Holding the passage + dimension fixed and changing only the task model makes the
|
|
32
|
+
vectors/facets separate cleanly on item type / shape, so retrieval discriminates instead of
|
|
33
|
+
blurring:
|
|
34
|
+
|
|
35
|
+
| Target | Task models to anchor (item type) |
|
|
36
|
+
|--------|-----------------------------------|
|
|
37
|
+
| `c1-t9` (validate first — the observed failure) | tm3 ebsr · tm4 hot-text · tm5 short-text · tm1 multiple-choice · tm2 multi-select |
|
|
38
|
+
| `c1-t8`, `c1-t10` | tm1 multiple-choice · tm2 multi-select · tm3 hot-text |
|
|
39
|
+
| `c1-t4`, `c1-t11` | tm1 ebsr · tm2 hot-text · tm3 short-text |
|
|
40
|
+
|
|
41
|
+
Every target × task-model cell gets ≥1 anchor. No target ships without anchors. (Source of truth
|
|
42
|
+
for which cells exist: `spec/targets.json` → `targets.<id>.taskModels`.)
|
|
43
|
+
|
|
44
|
+
3. **Tagged with the design signature `embedding.ts` emits** — `target:<t>`, `item:<type>`,
|
|
45
|
+
`shape:two-part|single-part`, `dimension:<d>`, `standard:<s>`, `dok:<d>`, and **`task-model:<n>`**
|
|
46
|
+
(derived from target + item type via `targets.json`). The console filters/boosts on these facets;
|
|
47
|
+
`target:c1-t9` + `task-model:3` together select the EBSR exemplar over its siblings.
|
|
48
|
+
|
|
49
|
+
4. **Each prompt phrased both ways** — by number ("task model 3", "TM3") **and** by item type
|
|
50
|
+
("two-part EBSR") — so the query embedding and `extractQueryFacets` match however the client
|
|
51
|
+
phrases it. `extractQueryFacets` already extracts an explicit number and derives one from
|
|
52
|
+
target + item type; the paired prompts should exercise both spellings.
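To illustrate why each prompt should be phrased both by number and by item type, here is a minimal sketch of query-side facet parsing. It is not the package's `extractQueryFacets`: the function names `parseQueryFacets` and `toTags`, the `QueryFacets` shape, and the regular expressions are all assumptions made for illustration, and the sketch skips the step that derives a task-model number from target + item type.

```typescript
// Hypothetical sketch only; NOT the package's `extractQueryFacets`.
type ItemType = "ebsr" | "hot-text" | "short-text" | "multiple-choice" | "multi-select";

interface QueryFacets {
  target?: string;     // e.g. "c1-t9"
  itemType?: ItemType; // request phrased by item type
  taskModel?: number;  // request phrased by number ("task model 3", "TM3")
}

// Assumed phrasings per item type; a real matcher may accept more spellings.
const ITEM_PATTERNS: Array<[RegExp, ItemType]> = [
  [/\bebsr\b|evidence-based selected response/i, "ebsr"],
  [/\bhot[- ]?text\b/i, "hot-text"],
  [/\bshort[- ]?text\b|constructed response/i, "short-text"],
  [/\bmultiple[- ]choice\b/i, "multiple-choice"],
  [/\bmulti[- ]?select\b/i, "multi-select"],
];

function parseQueryFacets(query: string): QueryFacets {
  const facets: QueryFacets = {};

  // Target id such as "c1-t9" (normalized to lowercase).
  const target = query.match(/\bc1-t(\d+)\b/i);
  if (target) facets.target = `c1-t${target[1]}`;

  // Explicit task-model number: "task model 3", "task-model 3", "TM3", "tm3".
  const tm = query.match(/\b(?:task[- ]model|tm)\s*(\d)\b/i);
  if (tm) facets.taskModel = Number(tm[1]);

  // First item-type phrase that appears in the query.
  for (const [pattern, itemType] of ITEM_PATTERNS) {
    if (pattern.test(query)) {
      facets.itemType = itemType;
      break;
    }
  }
  return facets;
}

// Render facets in the tag spelling quoted above (target:, item:, task-model:).
function toTags(facets: QueryFacets): string[] {
  const tags: string[] = [];
  if (facets.target) tags.push(`target:${facets.target}`);
  if (facets.itemType) tags.push(`item:${facets.itemType}`);
  if (facets.taskModel !== undefined) tags.push(`task-model:${facets.taskModel}`);
  return tags;
}
```

Under these assumptions, "c1-t9 task model 3" and "a two-part EBSR for c1-t9" yield different partial tag sets, which is why an anchor whose prompt carries both spellings can match either phrasing.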
|
|
53
|
+
|
|
54
|
+
## The store and the verification gate (implemented)
|
|
55
|
+
|
|
56
|
+
The paired store and the gate now exist:
|
|
57
|
+
|
|
58
|
+
- **Store:** `spec/examples/<id>.gc` (the program, authored with `task-model tmN`) + `<id>.expect.json`
|
|
59
|
+
(`{ prompt, expect: { target, taskModel/itemType, dimension, standard } }`). First anchor in:
|
|
60
|
+
`c1-t9-tm3-ebsr` (the exact drift case). This is the human-refined artifact — author/edit the `.gc`.
|
|
61
|
+
- **Gate:** `verifyExample({ errors, data, code, expect })` (exported from `@graffiticode/l0175`) —
|
|
62
|
+
BLOCKING checks (compiles clean; composed `target`/item type/`task-model:<n>`/dimension/standard
|
|
63
|
+
match the declared intent, read from `targets.ts` NOT `instructions.md`) + ADVISORY (compiler
|
|
64
|
+
warnings surfaced, not fatal). It never maps prompt→code; it validates whatever code exists, so it
|
|
65
|
+
guards **both** bootstrapped and hand-refined examples. Run it in the console capture step and after
|
|
66
|
+
any refinement.
|
|
67
|
+
- **Corpus check:** `test/corpus.spec.ts` runs the gate over every `spec/examples/*.gc` on each test
|
|
68
|
+
run; `test/verify-example.spec.ts` proves the gate rejects a mislabeled example that compiles clean.
|
|
69
|
+
|
|
70
|
+
**Coverage: 17 of 17 matrix cells — the full matrix.** Every `targets.<id>.taskModels` cell has a
|
|
71
|
+
gated golden pair (all advisory-free): the c1-t4 / c1-t11 R&E trios, the c1-t8 / c1-t10 trios, and
|
|
72
|
+
the complete c1-t9 five-item set (tm1 MC, tm2 multi-select, tm3 ebsr, tm4 hot-text, tm5 short-text).
|
|
73
|
+
Each is a contrastive sibling set (same passage per target, varying only the task model). Any new or
|
|
74
|
+
edited anchor is re-validated by the gate (`test/corpus.spec.ts`).
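The blocking/advisory split described above can be sketched roughly as follows. This is not the exported `verifyExample`: the name `gateExample`, the `GateInput` and `GateResult` shapes, and the representation of errors and warnings as plain strings are assumptions for illustration. Only the idea comes from the text: compile errors and intent mismatches block, while warnings are surfaced without failing.

```typescript
// Hypothetical sketch of a blocking/advisory gate; NOT the package's `verifyExample`.
interface Intent {
  target: string;
  itemType: string;
  taskModel: number;
  dimension: string;
  standard: string;
}

interface GateInput {
  errors: string[];        // compiler errors (assumed to be plain strings)
  warnings: string[];      // compiler warnings (assumed to be plain strings)
  composed: Intent;        // what the compiled item actually is
  expect: Partial<Intent>; // what the example declares it should be
}

interface GateResult {
  ok: boolean;
  blocking: string[];
  advisory: string[];
}

function gateExample(input: GateInput): GateResult {
  // BLOCKING 1: the program must compile clean.
  const blocking: string[] = input.errors.map((e) => `compile error: ${e}`);

  // BLOCKING 2: every declared field must match the composed item.
  const keys = Object.keys(input.expect) as Array<keyof Intent>;
  for (const key of keys) {
    const want = input.expect[key];
    const got = input.composed[key];
    if (want !== undefined && want !== got) {
      blocking.push(`${key}: expected ${want}, got ${got}`);
    }
  }

  // ADVISORY: warnings are reported but never fail the gate.
  const advisory = input.warnings.map((w) => `warning: ${w}`);

  return { ok: blocking.length === 0, blocking, advisory };
}
```

Note that the sketch never looks at the prompt: it only compares the composed item with the declared intent, so a mislabeled example that compiles clean is still rejected.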
|
package/spec/scope.json
CHANGED
|
@@ -14,7 +14,7 @@
|
|
|
14
14
|
"c1-t10 (Word Meanings) dimension: word-meaning — the question asks the meaning of a targeted word/phrase; options are meanings authored as word/meaning (not claims)",
|
|
15
15
|
"Item types: ebsr (two-part selected response), hot-text (select text — sentence-level for R&E/Central-Ideas/Key Details, word-level click-the-word for Word Meanings), short-text (constructed response), multiple-choice (one correct), multi-select (exact correct set); the allowed set is per-target",
|
|
16
16
|
"Authored, error-typed distractors — R&E: misreads-detail / erroneous-inference / faulty-reasoning; Central Ideas (c1-t9): too-narrow / too-broad / misreads-detail / insignificant; Key Details (c1-t8): non-supporting sources (supports-wrong-claim / irrelevant), not distractor claims; Word Meanings (c1-t10): distractor meanings other-meaning / misinterprets / wrong-context — with rationales",
|
|
17
|
-
"Standards rl-1
|
|
17
|
+
"Standards: the full CCSS Grade-5 strand for the target's text type is accepted (literary c1-t4 → RL rl-1..rl-7/rl-9, no rl-8; informational c1-t11/t9/t8/t10 → RI ri-1..ri-9; c1-t10 also the L-4/L-5 vocabulary families). The dimension's companion is inferred when standard is omitted — primary companions: c1-t4 rl-1 + rl-2 (theme/topic) / rl-3 / rl-6; c1-t11 ri-1 + ri-3/ri-6/ri-7/ri-8; c1-t9 ri-1+ri-2; c1-t8 ri-1+ri-7; c1-t10 ri-4 + the L-4 family. DOK r-dok1..r-dok3 (R&E r-dok3; T9 r-dok2, written summary r-dok3; T8 & T10 r-dok2)",
|
|
18
18
|
"A reading-level target set by the guideline/target's grade (Grade 5 for c1-t4/c1-t11), overridable by an optional top-level `grade`; the compiler estimates the passage's reading level and warns when it runs above the target grade"
|
|
19
19
|
],
|
|
20
20
|
"out_of_scope": [
|
package/spec/spec.md
CHANGED
|
@@ -151,7 +151,7 @@ composes). A `focus` that isn't a supported claim, or a `targets` to a missing o
|
|
|
151
151
|
- **`dimension` (c1-t9)**: `central-idea`, `key-detail`, `summary` · **(c1-t8)**: `supporting-evidence` · **(c1-t10)**: `word-meaning`
|
|
152
152
|
- **claim `status`**: `supported`, `distractor` · **source `status`**: `directly-supports`, `supports-wrong-claim`, `irrelevant` · **meaning `status` (c1-t10)**: `correct`, `distractor`
|
|
153
153
|
- **`error-type` (c1-t4 / c1-t11)**: `misreads-detail`, `erroneous-inference`, `faulty-reasoning` · **(c1-t9)**: `too-narrow`, `too-broad`, `misreads-detail`, `insignificant` · **(c1-t8)**: none — non-supporting sources · **(c1-t10)**: `other-meaning`, `misinterprets`, `wrong-context`
|
|
154
|
-
- **`standard
|
|
154
|
+
- **`standard`** — the full CCSS Grade-5 strand for the target's text type is accepted; the dimension's companion is inferred when you omit it. Primary companions: **(c1-t4)** `rl-1` + `rl-2` (theme/topic) / `rl-3` / `rl-6` — full **RL** strand `rl-1`–`rl-7`/`rl-9` accepted (no `rl-8`) · **(c1-t11)** `ri-1` + `ri-3` / `ri-6` / `ri-7` / `ri-8` · **(c1-t9)** `ri-1` + `ri-2` · **(c1-t8)** `ri-1` + `ri-7` — c1-t11/t9/t8 accept the full **RI** strand `ri-1`–`ri-9` · **(c1-t10)** `ri-4` + the `l-4` / `l-5` families. **`dok`**: `r-dok1`, `r-dok2`, `r-dok3`
|
|
155
155
|
|
|
156
156
|
## How composition uses the vocabulary
|
|
157
157
|
|
|
@@ -275,3 +275,11 @@ outcomes [
|
|
|
275
275
|
]
|
|
276
276
|
{}..
|
|
277
277
|
```
|
|
278
|
+
|
|
279
|
+
The example above is Target 4 (literary, R&E). The other targets keep the same flat-chain shape
|
|
280
|
+
but differ in what the options are: **T11** is the same R&E shape over an informational passage
|
|
281
|
+
(RI standards); **T9** (Central Ideas) options are `claim`s judged by significance (`too-narrow` /
|
|
282
|
+
`too-broad` / `insignificant` / `misreads-detail`); **T8** (Key Details) gives the inference in the
|
|
283
|
+
stem and the options are `source`s (no distractor claims); **T10** (Word Meanings) options are
|
|
284
|
+
`meaning`s of a targeted `word` authored in a top-level `words` list. See `instructions.md` for a
|
|
285
|
+
full worked program per target and `stems.md` for each target's stem catalog.
|
package/spec/stems.md
CHANGED
|
@@ -28,6 +28,20 @@ dual-text-stimuli stems are out of scope).
|
|
|
28
28
|
3. Pick the **one stem template** that matches the task and the dimension, and fill its
|
|
29
29
|
bracketed `[...]` slot.
|
|
30
30
|
|
|
31
|
+
Task-model number → item type, per target (the numbers collide across targets, so a number is
|
|
32
|
+
meaningless without its target):
|
|
33
|
+
|
|
34
|
+
<!-- GENERATED:task-models START (from targets.json — regenerated by tools/build-static.js; do not edit by hand) -->
|
|
35
|
+
| Target | tm1 | tm2 | tm3 | tm4 | tm5 |
|
|
36
|
+
|--------|-----|-----|-----|-----|-----|
|
|
37
|
+
| `c1-t4` — Grade 5 · Claim 1 · Target 4 (Reasoning & Evidence) | ebsr | hot-text | short-text | — | — |
|
|
38
|
+
| `c1-t11` — Grade 5 · Claim 1 · Target 11 (Reasoning & Evidence) | ebsr | hot-text | short-text | — | — |
|
|
39
|
+
| `c1-t9` — Grade 5 · Claim 1 · Target 9 (Central Ideas) | multiple-choice | multi-select | ebsr | hot-text | short-text |
|
|
40
|
+
| `c1-t8` — Grade 5 · Claim 1 · Target 8 (Key Details) | multiple-choice | multi-select | hot-text | — | — |
|
|
41
|
+
| `c1-t10` — Grade 5 · Claim 1 · Target 10 (Word Meanings) | multiple-choice | multi-select | hot-text | — | — |
|
|
42
|
+
<!-- GENERATED:task-models END -->
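Because the same number maps to different item types per target, resolution has to be a lookup keyed by target. The sketch below transcribes the table above into a map and resolves a `tmN` label against it, failing when the label disagrees with an authored `type` and supplying the type when it is omitted. The names `TASK_MODELS` and `resolveItemType` and the error messages are hypothetical; the compiler's actual implementation is not shown in this diff.

```typescript
// Hypothetical sketch; the table rows are copied from the generated table above.
type ItemType = "ebsr" | "hot-text" | "short-text" | "multiple-choice" | "multi-select";

const TASK_MODELS: Record<string, readonly ItemType[]> = {
  "c1-t4": ["ebsr", "hot-text", "short-text"],
  "c1-t11": ["ebsr", "hot-text", "short-text"],
  "c1-t9": ["multiple-choice", "multi-select", "ebsr", "hot-text", "short-text"],
  "c1-t8": ["multiple-choice", "multi-select", "hot-text"],
  "c1-t10": ["multiple-choice", "multi-select", "hot-text"],
};

// Resolve an outcome's item type from its target, optional `tmN` label, and optional `type`.
function resolveItemType(target: string, taskModel?: string, type?: ItemType): ItemType {
  const models = TASK_MODELS[target];
  if (!models) throw new Error(`unknown target ${target}`);

  if (taskModel === undefined) {
    // No number given: the authored type must be in the target's allowed set.
    if (type === undefined) throw new Error("outcome needs a type or a task-model");
    if (models.indexOf(type) === -1) throw new Error(`${type} is not allowed for ${target}`);
    return type;
  }

  // "tm3" -> index 2 in the target's row; out-of-range numbers resolve to undefined.
  const match = /^tm([1-9])$/.exec(taskModel);
  const resolved = match ? models[Number(match[1]) - 1] : undefined;
  if (resolved === undefined) throw new Error(`${taskModel} is not defined for ${target}`);

  // Disagreement between the number and the authored type is an error, not a guess.
  if (type !== undefined && type !== resolved) {
    throw new Error(`${taskModel} is ${resolved} for ${target}, but type is ${type}`);
  }
  return resolved;
}
```

With this map, `tm3` resolves to `short-text` for `c1-t4`, `ebsr` for `c1-t9`, and `hot-text` for `c1-t8`, which is the collision the note above warns about.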
|
|
43
|
+
|
|
44
|
+
|
|
31
45
|
**Specificity rule (required, both targets).** The `[...]` slot must name the concrete thing the
|
|
32
46
|
question is about — a character's name (`Mother`), a specific event (`the turkey-feeding`), a
|
|
33
47
|
specific idea (`the relationship between the bridge designs`), the author's point of view, etc.
|
package/spec/template.gc
CHANGED
|
@@ -1,67 +1,7 @@
|
|
|
1
|
-
|
|
2
|
-
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
"Mara crouched at the edge of the tide pool, ignoring the picnic her family had spread out behind her. Her brother called twice, but she did not turn around. A tiny crab scuttled under a rock, and Mara smiled for the first time all day."
|
|
9
|
-
"She had spent the whole car ride staring out the window, saying nothing. Now her fingers traced the cold water as if the pool were the only thing that mattered. \"Five more minutes,\" she whispered, though no one was listening."
|
|
10
|
-
"Behind her, paper plates rustled and her mother laughed at someone's joke. Mara did not hear them."
|
|
11
|
-
]
|
|
12
|
-
claims [
|
|
13
|
-
claim id "c1" status supported dimension character subject "Mara" standard rl-1 dok r-dok3
|
|
14
|
-
text "Mara is more interested in the tide pool than in her family's picnic."
|
|
15
|
-
rationale "Paragraphs 1 and 2 show Mara absorbed by the tide pool while she ignores her family."
|
|
16
|
-
cites ["e1" "e2" "e3" "e4"] {}
|
|
17
|
-
claim id "c5" status supported dimension character subject "the brother"
|
|
18
|
-
text "Mara's brother wants her attention."
|
|
19
|
-
cites ["e2"] {}
|
|
20
|
-
/* Distractors target the question(s) they foil (here, all foil the Mara items q1/q2/q3). */
|
|
21
|
-
claim id "c2" status distractor error-type misreads-detail plausibility 0.85 targets ["q1" "q2" "q3"]
|
|
22
|
-
text "Mara is angry at her brother for calling her."
|
|
23
|
-
rationale "Misreads 'she did not turn around': absorption, not anger."
|
|
24
|
-
cites ["e2"] {}
|
|
25
|
-
claim id "c6" status distractor error-type misreads-detail plausibility 0.6 targets ["q1" "q2" "q3"]
|
|
26
|
-
text "Mara is bored and impatient to leave."
|
|
27
|
-
rationale "Misreads her stillness as boredom; the crab makes her smile — delight, not boredom."
|
|
28
|
-
cites ["e6"] {}
|
|
29
|
-
claim id "c3" status distractor error-type erroneous-inference plausibility 0.55 targets ["q1" "q2" "q3"]
|
|
30
|
-
text "Mara dislikes spending time outdoors."
|
|
31
|
-
rationale "Over-generalizes from her quiet to a dislike her smile at the crab contradicts."
|
|
32
|
-
cites ["e6"] {}
|
|
33
|
-
claim id "c7" status distractor error-type erroneous-inference plausibility 0.5 targets ["q1" "q2" "q3"]
|
|
34
|
-
text "Mara wishes she were somewhere else with her friends."
|
|
35
|
-
rationale "Invents an off-text desire; nothing in the passage mentions friends."
|
|
36
|
-
cites ["e7"] {}
|
|
37
|
-
claim id "c4" status distractor error-type faulty-reasoning plausibility 0.45 targets ["q1" "q2" "q3"]
|
|
38
|
-
text "Because Mara whispers, she must be afraid of her family."
|
|
39
|
-
rationale "Whispering is treated as fear without textual support."
|
|
40
|
-
cites ["e5"] {}
|
|
41
|
-
claim id "c8" status distractor error-type faulty-reasoning plausibility 0.4 targets ["q1" "q2" "q3"]
|
|
42
|
-
text "Because Mara ignores the picnic, she must not love her family."
|
|
43
|
-
rationale "Leaps from one moment of focus to a sweeping claim about her feelings."
|
|
44
|
-
cites ["e6"] {}
|
|
45
|
-
]
|
|
46
|
-
evidence [
|
|
47
|
-
source id "e1" line 1 quote "Mara crouched at the edge of the tide pool, ignoring the picnic her family had spread out behind her." status directly-supports supports ["c1"] {}
|
|
48
|
-
source id "e2" line 1 quote "Her brother called twice, but she did not turn around." status supports-wrong-claim supports ["c1" "c2"]
|
|
49
|
-
rationale "Real evidence, but props up the 'anger' misreading, not the inference." {}
|
|
50
|
-
source id "e3" line 1 quote "A tiny crab scuttled under a rock, and Mara smiled for the first time all day." status directly-supports supports ["c1"] {}
|
|
51
|
-
source id "e4" line 2 quote "Now her fingers traced the cold water as if the pool were the only thing that mattered." status directly-supports supports ["c1"] {}
|
|
52
|
-
source id "e5" line 2 quote "\"Five more minutes,\" she whispered, though no one was listening." status supports-wrong-claim supports ["c4"]
|
|
53
|
-
rationale "Her whisper is real, but it shows focus, not fear." {}
|
|
54
|
-
source id "e6" line 3 quote "Behind her, paper plates rustled and her mother laughed at someone's joke." status irrelevant supports [] {}
|
|
55
|
-
source id "e7" line 3 quote "Mara did not hear them." status irrelevant supports [] {}
|
|
56
|
-
source id "e8" line 2 quote "She had spent the whole car ride staring out the window, saying nothing." status irrelevant supports [] {}
|
|
57
|
-
]
|
|
58
|
-
outcomes [
|
|
59
|
-
outcome id "q1" type ebsr dimension character subject "Mara" standard rl-1 focus "c1"
|
|
60
|
-
stem "Which of these inferences about Mara is supported by the passage?"
|
|
61
|
-
stem-b "Which sentence(s) from the passage best support your answer in Part A?" {}
|
|
62
|
-
outcome id "q2" type short-text dimension character subject "Mara" standard rl-1 focus "c1"
|
|
63
|
-
stem "What inference can be made about Mara? Explain using key details from the passage to support your answer." {}
|
|
64
|
-
outcome id "q3" type hot-text dimension character subject "Mara" standard rl-1 focus "c1"
|
|
65
|
-
stem "Click on the statement that best provides an inference about Mara that is supported by the passage." {}
|
|
66
|
-
]
|
|
67
|
-
{}..
|
|
1
|
+
passage "Title"
|
|
2
|
+
type literary
|
|
3
|
+
lines [
|
|
4
|
+
"Line one."
|
|
5
|
+
] claims []
|
|
6
|
+
evidence []
|
|
7
|
+
outcomes [] {}..
|
package/spec/usage-guide.md
CHANGED
|
@@ -7,7 +7,16 @@ Agent-facing guide for authoring L0175 programs. Read this before composing a `c
|
|
|
7
7
|
|
|
8
8
|
## Overview
|
|
9
9
|
|
|
10
|
-
L0175 is a content-composition language for 5th-grade English Language Arts assessment items (Smarter Balanced spec ELA · Grade 5 · Claim 1 · Reasoning & Evidence). One language serves **multiple learning targets**; **every program first declares a top-level `target`**: `c1-t4` (Target 4 — *literary* texts, RL standards, dimensions like character/theme/point-of-view) or `c1-t11` (Target 11 — *informational* texts, RI standards, dimensions like relationships-interactions/author-use-of-information/point-of-view/purpose). Choose the target from the request (literary vs. informational text and the skill assessed); the dimensions, standards, and stem catalog (`stems.md`) are then scoped to that target, and mixing targets' vocabularies is a compile error. It is **item-first**: after picking the target you compose the questions (`outcome`s) first — each with a unique `id`, a `focus` naming its correct claim, and an explicit `stem` (and `stem-b` on EBSR) instantiated from the guideline's Appropriate-Stem catalog (`stems.md`) — then author the supported `claim`s and a *superset* of distractor `claim`s, each tagging the question(s) it foils via `targets`, plus evidence `source`s. The compiler then *composes* each outcome deterministically: it takes the correct claim from `focus`, draws that question's foils ONLY from the distractors that `targets` it, uses the authored stem, and assembles a finished item in
|
|
10
|
+
L0175 is a content-composition language for 5th-grade English Language Arts assessment items (Smarter Balanced spec ELA · Grade 5 · Claim 1 · Reasoning & Evidence). One language serves **multiple learning targets**; **every program first declares a top-level `target`**: `c1-t4` (Target 4 — *literary* texts, RL standards, dimensions like character/theme/point-of-view) or `c1-t11` (Target 11 — *informational* texts, RI standards, dimensions like relationships-interactions/author-use-of-information/point-of-view/purpose). Choose the target from the request (literary vs. informational text and the skill assessed); the dimensions, standards, and stem catalog (`stems.md`) are then scoped to that target, and mixing targets' vocabularies is a compile error. It is **item-first**: after picking the target you compose the questions (`outcome`s) first — each with a unique `id`, a `focus` naming its correct claim, and an explicit `stem` (and `stem-b` on EBSR) instantiated from the guideline's Appropriate-Stem catalog (`stems.md`) — then author the supported `claim`s and a *superset* of distractor `claim`s, each tagging the question(s) it foils via `targets`, plus evidence `source`s. The compiler then *composes* each outcome deterministically: it takes the correct claim from `focus`, draws that question's foils ONLY from the distractors that `targets` it, uses the authored stem, and assembles a finished item in the task model named by its item type — for Reasoning & Evidence (T4/T11): `ebsr` (two-part evidence-based selected response), `hot-text` (select-text), or `short-text` (constructed response); other targets also offer `multiple-choice`/`multi-select` (the allowed set and the per-target task-model numbering are in *Vocabulary Cues*). One passage + superset can yield several items, each with its own bound foil set. The compiler performs no generation and no stem synthesis — it selects, validates against the guideline, and warns when a question's pool falls short. 
Distractors are tagged by the SBAC error taxonomy (Part A: `misreads-detail`, `erroneous-inference`, `faulty-reasoning`; Part B: `supports-wrong-claim`, `irrelevant`), each carrying a rationale; composition picks foils for error-type coverage and couples Part B evidence to the claims it plausibly supports. **For each EBSR/Hot-Text question author at least 5 viable distractors that `targets` it (aim for 5–8, over-generating since some are filtered as near-duplicates or accidentally correct) — covering all three error types with ≥2 alternatives in at least two of them, and giving each a `plausibility` score (0–1). An item draws only 3 foils, so a richer targeted pool yields stronger items; fewer than 3 targeting a question is a hard error, fewer than 5 a warning. Likewise, for EBSR Part B author at least 5 non-supporting evidence lines (`supports-wrong-claim` + `irrelevant`) so the compiler can choose the most tempting 3 foil options. No-giveaway rule: at least one of those `supports-wrong-claim` lines must list BOTH the correct claim's id AND a distractor's id in its `supports` (a line that seems to support the right answer but actually backs a misreading) — otherwise the correct evidence line stands alone, Part B telegraphs Part A, and the compiler warns "possible A↔B giveaway." Do not make every wrong-claim line point only at distractors.**
|
|
11
|
+
|
|
12
|
+
**Name the item type; task-model NUMBERS are per-target and collide.** Phrase a request by item
|
|
13
|
+
type (`ebsr` / `hot-text` / `short-text` / `multiple-choice` / `multi-select`) — that is unambiguous.
|
|
14
|
+
Task-model *numbers* (TM1…TM5) are numbered **per target**, so the same number means different things
|
|
15
|
+
in different targets: `tm3` is **short-text** in `c1-t4`/`c1-t11`, **`ebsr`** in `c1-t9`, and
|
|
16
|
+
**`hot-text`** in `c1-t8`/`c1-t10`. A bare "task model 3" is therefore meaningless without its
|
|
17
|
+
target — never assume the Reasoning & Evidence (T4/T11) numbering carries over to T9/T8/T10. When a
|
|
18
|
+
request names a task model by number, resolve it against the **program's target** (see the per-target
|
|
19
|
+
task-model table in *Vocabulary Cues*) and state the item type, e.g. for `c1-t9` "task model 3 → EBSR".
|
|
11
20
|
|
|
12
21
|
**Targets are different skills — pick the one the request assesses.** Beyond the two Reasoning &
|
|
13
22
|
Evidence targets above (`c1-t4` literary, `c1-t11` informational — infer/conclude and justify with
|
|
@@ -84,6 +93,17 @@ Say this to get that:
|
|
|
84
93
|
- **Dimensions (c1-t10)** — `word-meaning` (the answer is a meaning; authored via `word`/`meaning`).
|
|
85
94
|
- **Word / meaning (c1-t10)** — `words [ word id "w1" text "aqueduct" line 1 quote "…" meanings [ meaning id "m1" status correct text "a water channel" {} meaning id "m2" status distractor error-type other-meaning text "a boat" rationale "…" {} ] {} ]`; the outcome's `focus` names the word.
|
|
86
95
|
- **Item types** — `ebsr` (two-part: statement → evidence), `hot-text` (R&E/Central-Ideas: click the supporting/main-idea sentences; Key Details: click the evidence sentences; Word Meanings: click the word matching a definition), `short-text` (constructed response), `multiple-choice` (one correct, single-part), `multi-select` (exact correct set, single-part). MC/Multi-Select have no Part B. The allowed set is per-target (the compiler rejects others).
|
|
96
|
+
- **Task models are per-target — resolve a number against the target.** Task-model NUMBERS (TM1…TM5) are numbered per target and collide, so name the item type rather than a bare number; if a request says "task model N", map N to the item type for the program's target using the table below (e.g. `c1-t9` TM3 → `ebsr`, TM4 → `hot-text`). You may also author the number on the outcome as `task-model tm3` — the compiler resolves it against the target's table and **hard-errors if it disagrees with `type`** (or supplies `type` when omitted), so the intended task model is checked, not guessed.
|
|
97
|
+
|
|
98
|
+
<!-- GENERATED:task-models START (from targets.json — regenerated by tools/build-static.js; do not edit by hand) -->
|
|
99
|
+
| Target | tm1 | tm2 | tm3 | tm4 | tm5 |
|
|
100
|
+
|--------|-----|-----|-----|-----|-----|
|
|
101
|
+
| `c1-t4` — Grade 5 · Claim 1 · Target 4 (Reasoning & Evidence) | ebsr | hot-text | short-text | — | — |
|
|
102
|
+
| `c1-t11` — Grade 5 · Claim 1 · Target 11 (Reasoning & Evidence) | ebsr | hot-text | short-text | — | — |
|
|
103
|
+
| `c1-t9` — Grade 5 · Claim 1 · Target 9 (Central Ideas) | multiple-choice | multi-select | ebsr | hot-text | short-text |
|
|
104
|
+
| `c1-t8` — Grade 5 · Claim 1 · Target 8 (Key Details) | multiple-choice | multi-select | hot-text | — | — |
|
|
105
|
+
| `c1-t10` — Grade 5 · Claim 1 · Target 10 (Word Meanings) | multiple-choice | multi-select | hot-text | — | — |
|
|
106
|
+
<!-- GENERATED:task-models END -->
|
|
87
107
|
- **Central Ideas (c1-t9) distractor `error-type`** — `too-narrow`, `too-broad`, `misreads-detail`, `insignificant` (true-but-not-central); R&E targets use `misreads-detail`/`erroneous-inference`/`faulty-reasoning`.
|
|
88
108
|
- **Program terminator** — top-level forms chain with no `{}` between them; the program ends with a single `{}..`.
|
|
89
109
|
|
|
@@ -104,7 +124,7 @@ Say this to get that:
|
|
|
104
124
|
|
|
105
125
|
## Out of Scope
|
|
106
126
|
|
|
107
|
-
- **Other targets / grades / claims** — L0175 covers G5 · Claim 1
|
|
127
|
+
- **Other targets / grades / claims** — L0175 covers G5 · Claim 1, targets T4 (Reasoning & Evidence, literary), T11 (Reasoning & Evidence, informational), T9 (Central Ideas), T8 (Key Details), and T10 (Word Meanings). Other claims, grades, or Claim-1 targets belong in their own dialects.
|
|
108
128
|
- **Dual-text stimuli** — a single passage only in this version.
|
|
109
129
|
- **Compile-time generation** — the compiler selects and validates authored content; it does not invent claims, distractors, or evidence.
|
|
110
130
|
- **Auto-scoring** — short-text responses are hand-scored against the rubric; the compiler emits the rubric only.
|