soustack 0.2.1 → 0.3.0

package/README.md CHANGED
@@ -1,244 +1,394 @@
1
- # Soustack Core
2
-
3
- > **The Logic Engine for Computational Recipes.**
4
-
5
- [![npm version](https://img.shields.io/npm/v/soustack.svg)](https://www.npmjs.com/package/soustack)
6
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
- [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue)](https://www.typescriptlang.org/)
8
-
9
- **Soustack Core** is the reference implementation for the [Soustack Standard](https://github.com/soustack/spec). It provides the validation, parsing, and scaling logic required to turn static recipe data into dynamic, computable objects.
10
-
11
- ---
12
-
13
- ## 💡 The Value Proposition
14
-
15
- Most recipe formats (like Schema.org) are **descriptive**: they tell you _what_ a recipe is.
16
- Soustack is **computational**: it understands _how_ a recipe behaves.
17
-
18
- ### The Problems We Solve:
19
-
20
- 1. **The "Salty Soup" Problem (Intelligent Scaling):**
21
- - _Old Way:_ Doubling a recipe doubles every ingredient blindly.
22
- - _Soustack:_ Understands that salt scales differently than flour, and frying oil shouldn't scale at all. It supports **Linear**, **Fixed**, **Discrete**, and **Baker's Percentage** scaling modes.
23
- 2. **The "Lying Prep Time" Problem:**
24
- - _Old Way:_ Authors guess "Prep: 15 mins."
25
- - _Soustack:_ Calculates total time dynamically based on the active/passive duration of every step.
26
- 3. **The "Timing Clash" Problem:**
27
- - _Old Way:_ A flat list of instructions.
28
- - _Soustack:_ A **Dependency Graph** that knows you can chop vegetables while the water boils.
29
-
30
- ---
31
-
32
- ## 📦 Installation
33
-
34
- ```bash
35
- npm install soustack
36
- ```
37
-
38
- ## What's Included
39
-
40
- - **Validation**: `validateRecipe()` validates Soustack JSON against the bundled schema.
41
- - **Scaling & Computation**: `scaleRecipe()` produces a flat, UI-ready "computed recipe" (scaled ingredients + aggregated timing).
42
- - **Parsers**:
43
- - Ingredient parsing (`parseIngredient`, `parseIngredientLine`)
44
- - Duration parsing (`smartParseDuration`)
45
- - Yield parsing (`parseYield`)
46
- - **Schema.org Conversion**:
47
- - `fromSchemaOrg()` (Schema.org JSON-LD → Soustack)
48
- - `toSchemaOrg()` (Soustack → Schema.org JSON-LD)
49
- - `normalizeImage()` utility for converting Schema.org image formats to Soustack format
50
- - **Image Support**:
51
- - Recipe-level images: single URL or array of URLs
52
- - Instruction-level images: optional image URL per step
53
- - Automatic normalization from Schema.org ImageObject formats
54
- - **Web Scraping**:
55
- - `scrapeRecipe()` fetches a recipe page and extracts Schema.org recipe data (Node.js only)
56
- - `extractRecipeFromHTML()` extracts recipe data from HTML string, returns Soustack format (browser & Node.js compatible)
57
- - `extractSchemaOrgRecipeFromHTML()` extracts raw Schema.org recipe data from HTML string (browser & Node.js compatible)
58
- - Supports JSON-LD (`<script type="application/ld+json">`) and Microdata (`itemscope/itemtype`)
59
-
60
- ## Programmatic Usage
61
-
62
- ```ts
63
- import {
64
- scrapeRecipe,
65
- extractRecipeFromHTML,
66
- extractSchemaOrgRecipeFromHTML,
67
- fromSchemaOrg,
68
- toSchemaOrg,
69
- validateRecipe,
70
- scaleRecipe,
71
- normalizeImage,
72
- } from 'soustack';
73
-
74
- // Validate a Soustack recipe JSON object
75
- validateRecipe(recipe);
76
-
77
- // Scale a recipe to a target yield amount (returns a "computed recipe")
78
- const computed = scaleRecipe(recipe, 2);
79
-
80
- // Scrape a URL into a Soustack recipe (Node.js only, throws if no recipe is found)
81
- const scraped = await scrapeRecipe('https://example.com/recipe');
82
-
83
- // Extract recipe from HTML string (browser & Node.js compatible)
84
- // Option 1: Get Soustack format directly
85
- const html = await fetch('https://example.com/recipe').then((r) => r.text());
86
- const recipe = extractRecipeFromHTML(html);
87
-
88
- // Option 2: Get Schema.org format first (for inspection/modification)
89
- const schemaOrgRecipe = extractSchemaOrgRecipeFromHTML(html);
90
- if (schemaOrgRecipe) {
91
- const soustackRecipe = fromSchemaOrg(schemaOrgRecipe);
92
- }
93
-
94
- // Convert Schema.org → Soustack
95
- const soustack = fromSchemaOrg(schemaOrgJsonLd);
96
-
97
- // Convert Soustack → Schema.org
98
- const jsonLd = toSchemaOrg(recipe);
99
-
100
- // Normalize Schema.org image formats (strings, arrays, ImageObjects)
101
- const normalized = normalizeImage(schemaOrgRecipe.image);
102
- // Returns: string | string[] | undefined
103
- ```
104
-
105
- ## 🔁 Schema.org Conversion
106
-
107
- Use the helpers to move between Schema.org JSON-LD and Soustack's structured recipe format. The conversion automatically handles image normalization, supporting multiple image formats from Schema.org.
108
-
109
- ```ts
110
- import { fromSchemaOrg, toSchemaOrg, normalizeImage } from 'soustack';
111
-
112
- // Convert Schema.org → Soustack (automatically normalizes images)
113
- const soustackRecipe = fromSchemaOrg(schemaOrgJsonLd);
114
- // Recipe images: string | string[] | undefined
115
- // Instruction images: optional image URL per step
116
-
117
- // Convert Soustack → Schema.org (preserves images)
118
- const schemaOrgRecipe = toSchemaOrg(soustackRecipe);
119
-
120
- // Manual image normalization (if needed)
121
- const normalized = normalizeImage(schemaOrgImage);
122
- // Handles: strings, arrays, ImageObjects with url/contentUrl
123
- ```
124
-
125
- ### Image Format Support
126
-
127
- Soustack supports flexible image formats:
128
-
129
- - **Recipe-level images**: Single URL (`string`) or multiple URLs (`string[]`)
130
- - **Instruction-level images**: Optional `image` property on instruction objects
131
- - **Automatic normalization**: Schema.org ImageObjects are automatically converted to URLs during import
132
-
133
- Example recipe with images:
134
-
135
- ```ts
136
- const recipe = {
137
- name: "Chocolate Cake",
138
- image: ["https://example.com/hero.jpg", "https://example.com/gallery.jpg"],
139
- instructions: [
140
- "Mix dry ingredients",
141
- { text: "Decorate the cake", image: "https://example.com/decorate.jpg" },
142
- "Serve"
143
- ]
144
- };
145
- ```
146
-
147
- ## 🧰 Web Scraping
148
-
149
- ### Node.js: `scrapeRecipe()`
150
-
151
- `scrapeRecipe(url, options)` fetches a recipe page and extracts Schema.org data. **Node.js only** due to CORS restrictions.
152
-
153
- Options:
154
-
155
- - `timeout` (ms, default `10000`)
156
- - `userAgent` (string, optional)
157
- - `maxRetries` (default `2`, retries on non-4xx failures)
158
-
159
- ```ts
160
- import { scrapeRecipe } from 'soustack';
161
-
162
- const recipe = await scrapeRecipe('https://example.com/recipe', {
163
- timeout: 15000,
164
- maxRetries: 3,
165
- });
166
- ```
167
-
168
- ### Browser: `extractRecipeFromHTML()` and `extractSchemaOrgRecipeFromHTML()`
169
-
170
- #### `extractRecipeFromHTML()` - Returns Soustack Format
171
-
172
- `extractRecipeFromHTML(html)` extracts recipe data from an HTML string and returns it in Soustack format. **Works in both browser and Node.js**. Perfect for browser usage where you fetch HTML yourself (with cookies/session for authenticated content).
173
-
174
- ```ts
175
- import { extractRecipeFromHTML } from 'soustack';
176
-
177
- // In browser: fetch HTML yourself (bypasses CORS, uses your cookies/session)
178
- const response = await fetch('https://example.com/recipe');
179
- const html = await response.text();
180
- const recipe = extractRecipeFromHTML(html); // Already in Soustack format
181
- ```
182
-
183
- #### `extractSchemaOrgRecipeFromHTML()` - Returns Schema.org Format
184
-
185
- `extractSchemaOrgRecipeFromHTML(html)` extracts the raw Schema.org recipe data from HTML. Returns `null` if no recipe is found. Use this when you need to inspect, debug, or modify the Schema.org data before converting to Soustack format.
186
-
187
- ```ts
188
- import { extractSchemaOrgRecipeFromHTML, fromSchemaOrg } from 'soustack';
189
-
190
- // In browser: fetch HTML yourself
191
- const response = await fetch('https://example.com/recipe');
192
- const html = await response.text();
193
-
194
- // Extract Schema.org format (for inspection/modification)
195
- const schemaOrgRecipe = extractSchemaOrgRecipeFromHTML(html);
196
-
197
- if (schemaOrgRecipe) {
198
- // Inspect or modify Schema.org data before converting
199
- console.log('Found recipe:', schemaOrgRecipe.name);
200
-
201
- // Convert to Soustack format when ready
202
- const soustackRecipe = fromSchemaOrg(schemaOrgRecipe);
203
- }
204
- ```
205
-
206
- **Why use these functions in browsers?**
207
-
208
- - ✅ No CORS issues: you fetch HTML yourself
209
- - ✅ Works with authenticated/paywalled content: uses browser cookies
210
- - ✅ Smaller bundle: no Node.js dependencies
211
- - ✅ Universal: works in both browser and Node.js environments
212
- - ✅ Flexible: choose Schema.org format for inspection/modification, or Soustack format for direct use
213
-
214
- ### CLI
215
-
216
- ```bash
217
- # Validate & Scale (existing commands)
218
- npx soustack validate recipe.soustack.json
219
- npx soustack scale recipe.soustack.json 2
220
-
221
- # Schema.org ↔ Soustack
222
- npx soustack import recipe.jsonld -o recipe.soustack.json
223
- npx soustack export recipe.soustack.json -o recipe.jsonld
224
- npx soustack scrape "https://example.com/recipe" -o recipe.soustack.json
225
- ```
226
-
227
- ## 🔄 Keeping the Schema in Sync
228
-
229
- The `src/schema.json` file in this repository is a **copy** of the official standard.
230
- The source of truth lives in [RichardHerold/soustack-spec](https://github.com/RichardHerold/soustack-spec).
231
-
232
- **Do not edit `src/schema.json` manually.**
233
-
234
- To update to the latest version of the standard, run:
235
-
236
- ```bash
237
- npm run sync:spec
238
- ```
239
-
240
- ## Development
241
-
242
- ```bash
243
- npm test
244
- ```
1
+ # Soustack Core
2
+
3
+ > **The Logic Engine for Computational Recipes.**
4
+
5
+ [![npm version](https://img.shields.io/npm/v/soustack.svg)](https://www.npmjs.com/package/soustack)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
+ [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue)](https://www.typescriptlang.org/)
8
+
9
+ **Soustack Core** is the reference implementation for the [Soustack Standard](https://github.com/RichardHerold/soustack-spec). It provides the validation, parsing, and scaling logic required to turn static recipe data into dynamic, computable objects.
10
+
11
+ ---
12
+
13
+ ## 💡 The Value Proposition
14
+
15
+ Most recipe formats (like Schema.org) are **descriptive**: they tell you _what_ a recipe is.
16
+ Soustack is **computational**: it understands _how_ a recipe behaves.
17
+
18
+ ### The Problems We Solve:
19
+
20
+ 1. **The "Salty Soup" Problem (Intelligent Scaling):**
21
+ - _Old Way:_ Doubling a recipe doubles every ingredient blindly.
22
+ - _Soustack:_ Understands that salt scales differently than flour, and frying oil shouldn't scale at all. It supports **Linear**, **Fixed**, **Discrete**, **Proportional**, and **Baker's Percentage** scaling modes.
23
+ 2. **The "Lying Prep Time" Problem:**
24
+ - _Old Way:_ Authors guess "Prep: 15 mins."
25
+ - _Soustack:_ Calculates total time dynamically based on the active/passive duration of every step.
26
+ 3. **The "Timing Clash" Problem:**
27
+ - _Old Way:_ A flat list of instructions.
28
+ - _Soustack:_ A **Dependency Graph** that knows you can chop vegetables while the water boils.
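
The scaling modes named in the first item above can be illustrated with a small, self-contained sketch. It covers only the linear, fixed, and discrete modes. The `ScalingMode` and `Ingredient` types and the `scaleQuantity` function are hypothetical names invented for this illustration and are not Soustack's actual data model or API. The rule that discrete items round to a whole number of at least 1 is also an assumption.

```ts
// Hypothetical types for illustration only; not Soustack's actual data model.
type ScalingMode = 'linear' | 'fixed' | 'discrete';

interface Ingredient {
  name: string;
  quantity: number;
  mode: ScalingMode;
}

// Scale one ingredient quantity according to its mode.
function scaleQuantity(item: Ingredient, multiplier: number): number {
  switch (item.mode) {
    case 'linear':
      // Scales in direct proportion to the multiplier (e.g. flour).
      return item.quantity * multiplier;
    case 'fixed':
      // Never changes, whatever the multiplier (e.g. frying oil).
      return item.quantity;
    case 'discrete':
      // Assumed rule: round to a whole item, never fewer than one (e.g. eggs).
      return Math.max(1, Math.round(item.quantity * multiplier));
  }
}

const pantry: Ingredient[] = [
  { name: 'flour (g)', quantity: 250, mode: 'linear' },
  { name: 'frying oil (ml)', quantity: 500, mode: 'fixed' },
  { name: 'eggs', quantity: 3, mode: 'discrete' },
];

// Doubling the recipe: flour -> 500, frying oil -> 500, eggs -> 6
const doubled = pantry.map((item) => ({
  name: item.name,
  quantity: scaleQuantity(item, 2),
}));
console.log(doubled);
```

In the library itself, scaling goes through `scaleRecipe()`; the sketch only shows why a single multiplier cannot be applied blindly to every ingredient.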
29
+
30
+ ---
31
+
32
+ ## 📦 Installation
33
+
34
+ ```bash
35
+ npm install soustack
36
+ ```
37
+
38
+ ## What's Included
39
+
40
+ - **Validation**: `validateRecipe()` validates Soustack JSON against the bundled schema.
41
+ - **Scaling & Computation**: `scaleRecipe()` scales a recipe while honoring per-ingredient scaling rules and instruction timing.
42
+ - **Schema.org Conversion**:
43
+ - `fromSchemaOrg()` (Schema.org JSON-LD → Soustack)
44
+ - `toSchemaOrg()` (Soustack → Schema.org JSON-LD)
45
+ - **Web Extraction**:
46
+ - Browser-safe HTML parsing: `extractSchemaOrgRecipeFromHTML()` (convert to Soustack with `fromSchemaOrg()`)
47
+ - Node-only scraping entrypoint: `scrapeRecipe()` and helpers via `import { ... } from 'soustack/scrape'`
48
+ - **Unit Conversion**: `convertLineItemToMetric()` converts ingredient line items from imperial volumes/masses into metric with deterministic rounding and ingredient-aware equivalencies.
49
+
50
+ ## 🚀 Quickstart
51
+
52
+ Validate and scale a recipe in just a few lines:
53
+
54
+ ```ts
55
+ import { validateRecipe, scaleRecipe } from 'soustack';
56
+
57
+ // Validate against the bundled Soustack schema
58
+ const { valid, errors, warnings } = validateRecipe(recipe);
59
+ if (!valid) {
60
+ throw new Error(JSON.stringify(errors, null, 2));
61
+ }
62
+ if (warnings?.length) {
63
+ console.warn('Non-blocking warnings', warnings);
64
+ }
65
+
66
+ // Scale to a new yield (multiplier, target yield, or servings)
67
+ const scaled = scaleRecipe(recipe, { multiplier: 2 });
68
+ ```
69
+
70
+ ### Profile-aware validation
71
+
72
+ Use profiles to enforce integration contracts. Available profiles:
73
+ - **minimal**: Basic recipe structure with minimal requirements
74
+ - **core**: Enhanced profile with structured ingredients and instructions
75
+
76
+ ```ts
77
+ import { detectProfiles, validateRecipe } from 'soustack';
78
+
79
+ // Discover which profiles a recipe already satisfies
80
+ const profiles = detectProfiles(recipe); // e.g. ['minimal', 'core']
81
+
82
+ // Validate with a specific profile (defaults to 'core' if not specified)
83
+ const result = validateRecipe(recipe, { profile: 'minimal' });
84
+ if (!result.valid) {
85
+ console.error('Profile validation failed', result.errors);
86
+ }
87
+
88
+ // Validate with modules
89
+ const recipeWithModules = {
90
+ profile: 'minimal',
91
+ modules: ['nutrition@1', 'times@1'],
92
+ name: 'Test Recipe',
93
+ ingredients: ['1 cup flour'],
94
+ instructions: ['Mix'],
95
+ nutrition: { calories: 100, protein_g: 5 }, // Module payload required if declared
96
+ times: { prepMinutes: 10, cookMinutes: 20, totalMinutes: 30 }, // v0.3: uses *Minutes fields
97
+ };
98
+ const result2 = validateRecipe(recipeWithModules);
99
+ // Validates using: base + minimal profile + nutrition@1 module + times@1 module
100
+ // Module contract: if module is declared, payload must exist (and vice versa)
101
+ ```
102
+
103
+ ### Imperial → metric ingredient conversion
104
+
105
+ ```ts
106
+ import { convertLineItemToMetric } from 'soustack';
107
+
108
+ const flour = convertLineItemToMetric(
109
+ { ingredient: 'flour', quantity: 2, unit: 'cup' },
110
+ 'mass'
111
+ );
112
+ // -> { ingredient: 'flour', quantity: 240, unit: 'g', notes: 'Converted using 120g per cup...' }
113
+
114
+ const liquid = convertLineItemToMetric(
115
+ { ingredient: 'milk', quantity: 2, unit: 'cup' },
116
+ 'volume'
117
+ );
118
+ // -> { ingredient: 'milk', quantity: 473, unit: 'ml' }
119
+ ```
120
+
121
+ The converter rounds using “sane” defaults (1 g/ml under 1 kg/1 L, then 5 g/10 ml and 2 decimal places for kg/L) and surfaces typed errors:
122
+
123
+ - `UnknownUnitError` for unsupported unit tokens
124
+ - `UnsupportedConversionError` if you request a mismatched dimension
125
+ - `MissingEquivalencyError` when no volume→mass density is registered for the ingredient/unit combo
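
One possible reading of the rounding rule above is sketched below. It assumes that values under 1000 g or 1000 ml are rounded to the nearest whole unit, and that larger values are rounded to the nearest 5 g or 10 ml. `roundMetric` is a hypothetical helper written for this illustration, not the library's implementation, and it does not cover the kg/L display with two decimal places. The cup constant is the US customary cup.

```ts
// Assumed interpretation of the README's rounding rule; not the library's code.
type Dimension = 'mass' | 'volume';

function roundMetric(value: number, dimension: Dimension): number {
  // Below 1 kg / 1 L: nearest 1 g or 1 ml.
  if (value < 1000) {
    return Math.round(value);
  }
  // At or above 1 kg / 1 L: nearest 5 g for mass, nearest 10 ml for volume.
  const step = dimension === 'mass' ? 5 : 10;
  return Math.round(value / step) * step;
}

// US customary cup in millilitres.
const ML_PER_US_CUP = 236.588;

console.log(roundMetric(2 * ML_PER_US_CUP, 'volume')); // 473, matching the milk example above
console.log(roundMetric(2 * 120, 'mass')); // 240, matching the flour example above
console.log(roundMetric(1234, 'mass')); // 1235
console.log(roundMetric(1234, 'volume')); // 1230
```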
126
+
127
+ ### Browser-safe vs. Node-only entrypoints
128
+
129
+ - **Browser-safe:** `import { extractSchemaOrgRecipeFromHTML, fromSchemaOrg, validateRecipe, scaleRecipe } from 'soustack';`
130
+ - Ships without Node fetch/cheerio dependencies.
131
+ - **Node-only scraping:** `import { scrapeRecipe, extractRecipeFromHTML, extractSchemaOrgRecipeFromHTML } from 'soustack/scrape';`
132
+ - Includes HTTP fetching, retries, and cheerio-based parsing for server environments.
133
+
134
+ ## Spec compatibility & bundled schemas
135
+
136
+ - Targets Soustack spec **v0.3.0** (`spec/SOUSTACK_SPEC_VERSION`, exported as `SOUSTACK_SPEC_VERSION`).
137
+ - Ships the base schema, profile schemas, and module schemas in `spec/schemas/recipe/` and mirrors them into `src/schemas/recipe/` for consumers.
138
+ - Vendored fixtures live in `spec/fixtures` so tests can run offline, and version drift can be checked via `npm run validate:version`.
139
+
140
+ ### Composed Validation Model
141
+
142
+ Soustack v0.3.0 uses a **composed validation model** where recipes are validated using JSON Schema's `allOf` composition:
143
+
144
+ ```json
145
+ {
146
+ "allOf": [
147
+ { "$ref": "base.schema.json" },
148
+ { "$ref": "profiles/{profile}.schema.json" },
149
+ { "$ref": "modules/{module1}/{version}.schema.json" },
150
+ { "$ref": "modules/{module2}/{version}.schema.json" }
151
+ ]
152
+ }
153
+ ```
154
+
155
+ The validator composes three layers:
156
+ - **Base schema**: Defines the core recipe structure (`@type`, `name`, `ingredients`, `instructions`, `profile`, `modules`)
157
+ - **Profile overlay**: Adds profile-specific requirements (e.g., `minimal` or `core`)
158
+ - **Module overlays**: Each declared module adds its own validation rules
159
+
160
+ **Defaults:**
161
+ - If `profile` is missing, it defaults to `"core"`
162
+ - If `modules` is missing, it defaults to `[]`
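
The composition and defaults described above can be sketched as a function that builds the `allOf` list for a given recipe. This is an illustration under stated assumptions: `composeSchema` and `RecipeLike` are hypothetical names, and the relative `$ref` paths simply mirror the JSON example in this section rather than the validator's internal representation.

```ts
// Hypothetical shape: only the two fields that drive schema composition.
interface RecipeLike {
  profile?: string;
  modules?: string[];
}

interface ComposedSchema {
  allOf: Array<{ $ref: string }>;
}

function composeSchema(recipe: RecipeLike): ComposedSchema {
  // Defaults from the README: profile -> "core", modules -> [].
  const profile = recipe.profile ?? 'core';
  const modules = recipe.modules ?? [];

  return {
    allOf: [
      { $ref: 'base.schema.json' },
      { $ref: `profiles/${profile}.schema.json` },
      // One overlay per declared module, written as "<name>@<version>".
      ...modules.map((id) => {
        const [name, version] = id.split('@');
        return { $ref: `modules/${name}/${version}.schema.json` };
      }),
    ],
  };
}

console.log(JSON.stringify(composeSchema({ profile: 'minimal', modules: ['nutrition@1', 'times@1'] }), null, 2));
```

A recipe with neither field set therefore validates against the base schema plus the `core` profile overlay only.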
163
+
164
+ **Module Contract:** Modules enforce a symmetric contract:
165
+ - If a module is declared in `modules`, the corresponding payload must exist
166
+ - If a payload exists (e.g., `nutrition`, `times`), the module must be declared
167
+ - The validator automatically infers modules from payloads and enforces this contract
168
+
169
+ **Caching:** Validators are cached by `${profile}::${sortedModules.join(",")}` for performance.
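
The symmetric module contract and the cache key format can be sketched as follows. The helper names (`contractViolations`, `validatorCacheKey`, `moduleName`) are hypothetical, and the sketch assumes that each module's payload lives under a top-level key equal to the module name, as it does for `nutrition` and `times` in the example earlier. It reports violations only and does not attempt the module inference that the validator performs.

```ts
// Hypothetical recipe shape: a modules list plus arbitrary payload keys.
type RecipeWithModules = { modules?: string[] } & Record<string, unknown>;

// "nutrition@1" -> "nutrition"
function moduleName(id: string): string {
  return id.replace(/@.*$/, '');
}

// Report both directions of the contract:
// a declared module without a payload, and a payload without a declaration.
function contractViolations(recipe: RecipeWithModules, payloadKeys: string[]): string[] {
  const declared = (recipe.modules ?? []).map(moduleName);
  const problems: string[] = [];

  declared.forEach((name) => {
    if (recipe[name] === undefined) {
      problems.push(`module "${name}" is declared but has no payload`);
    }
  });

  payloadKeys.forEach((key) => {
    if (recipe[key] !== undefined && declared.indexOf(key) === -1) {
      problems.push(`payload "${key}" is present but its module is not declared`);
    }
  });

  return problems;
}

// Cache key in the format quoted above: `${profile}::${sortedModules.join(",")}`.
function validatorCacheKey(profile: string, modules: string[]): string {
  const sortedModules = [...modules].sort();
  return `${profile}::${sortedModules.join(',')}`;
}

const problems = contractViolations(
  { modules: ['nutrition@1'], nutrition: { calories: 100 }, times: { prepMinutes: 10 } },
  ['nutrition', 'times']
);
console.log(problems); // one violation: the "times" payload has no declared module
console.log(validatorCacheKey('core', ['times@1', 'nutrition@1'])); // "core::nutrition@1,times@1"
```

Sorting the modules before joining means that the same set of modules produces the same key regardless of declaration order.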
170
+
171
+ ### Module Resolution
172
+
173
+ Modules are resolved to schema references using the pattern:
174
+ - Module identifier format: `<name>@<version>` (e.g., `nutrition@1`, `schedule@1`)
175
+ - Schema reference: `https://soustack.org/schemas/recipe/modules/<name>/<version>.schema.json`
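
As a minimal sketch of this resolution pattern, the snippet below parses a `<name>@<version>` identifier and builds the schema reference URL. `parseModuleId` and `moduleSchemaUrl` are hypothetical helpers for illustration; rejecting malformed identifiers with a plain `Error` is an assumption of this sketch, not documented library behavior.

```ts
const MODULE_SCHEMA_BASE = 'https://soustack.org/schemas/recipe/modules';

// Split "<name>@<version>" into its two parts, rejecting anything else.
function parseModuleId(id: string): { name: string; version: string } {
  const match = /^([^@\s]+)@([^@\s]+)$/.exec(id);
  if (!match) {
    throw new Error(`Malformed module identifier: ${id}`);
  }
  const [, name = '', version = ''] = match;
  return { name, version };
}

// Build the schema reference for a module identifier.
function moduleSchemaUrl(id: string): string {
  const { name, version } = parseModuleId(id);
  return `${MODULE_SCHEMA_BASE}/${name}/${version}.schema.json`;
}

console.log(moduleSchemaUrl('nutrition@1'));
// https://soustack.org/schemas/recipe/modules/nutrition/1.schema.json
```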
176
+
177
+ The module registry (`schemas/registry/modules.json`) defines which modules are available and their properties, including:
178
+ - `schemaOrgMappable`: Whether the module can be converted to Schema.org format
179
+ - `minProfile`: Minimum profile required to use the module
180
+ - `allowedOnMinimal`: Whether the module can be used with the minimal profile
181
+
182
+ **Available Modules (v0.3.0):**
183
+ - `attribution@1`: Source attribution (url, author, datePublished)
184
+ - `taxonomy@1`: Classification (keywords, category, cuisine)
185
+ - `media@1`: Images and videos (images, videos arrays)
186
+ - `times@1`: Timing information (prepMinutes, cookMinutes, totalMinutes)
187
+ - `nutrition@1`: Nutritional data (calories, protein_g as numbers)
188
+ - `schedule@1`: Task scheduling (requires core profile, includes instruction dependencies)
189
+
190
+ ## Programmatic Usage
191
+
192
+ ```ts
193
+ import {
194
+ extractSchemaOrgRecipeFromHTML,
195
+ fromSchemaOrg,
196
+ toSchemaOrg,
197
+ validateRecipe,
198
+ scaleRecipe,
199
+ } from 'soustack';
200
+ import {
201
+ scrapeRecipe,
202
+ extractRecipeFromHTML,
203
+ extractSchemaOrgRecipeFromHTML as extractSchemaOrgRecipeFromHTMLNode,
204
+ } from 'soustack/scrape';
205
+
206
+ // Validate a Soustack recipe JSON object with profile enforcement
207
+ const validation = validateRecipe(recipe, { profile: 'core' });
208
+ if (!validation.valid) {
209
+ console.error(validation.errors);
210
+ }
211
+
212
+ // Scale a recipe by a multiplier (here: double it)
213
+ const scaled = scaleRecipe(recipe, { multiplier: 2 });
214
+
215
+ // Scrape a URL into a Soustack recipe (Node.js only, throws if no recipe is found)
216
+ const scraped = await scrapeRecipe('https://example.com/recipe');
217
+
218
+ // Browser: fetch your own HTML, then parse and convert
219
+ const html = await fetch('https://example.com/recipe').then((r) => r.text());
220
+ const schemaOrgRecipe = extractSchemaOrgRecipeFromHTML(html);
221
+ const browserRecipe = schemaOrgRecipe ? fromSchemaOrg(schemaOrgRecipe) : null;
222
+
223
+ // Node: parse raw HTML with cheerio-powered extractor
224
+ const nodeSchemaOrg = extractSchemaOrgRecipeFromHTMLNode(html);
225
+ const nodeRecipe = extractRecipeFromHTML(html);
226
+
227
+ // Convert Schema.org → Soustack
228
+ const soustack = fromSchemaOrg(schemaOrgJsonLd);
229
+
230
+ // Convert Soustack → Schema.org
231
+ const jsonLd = toSchemaOrg(recipe);
232
+
233
+ ```
234
+
235
+ ## 🪶 Core-lite (browser) Schema.org conversion
236
+
237
+ Need to stay browser-only? Import the core bundle (no `fetch`, no cheerio) and perform Schema.org extraction and conversion entirely client-side:
238
+
239
+ ```ts
240
+ import { extractSchemaOrgRecipeFromHTML, fromSchemaOrg, toSchemaOrg } from 'soustack';
241
+
242
+ async function convert(url: string) {
243
+ const html = await fetch(url).then((r) => r.text());
244
+
245
+ // Pure DOMParser-based extraction (works in modern browsers)
246
+ const schemaOrg = extractSchemaOrgRecipeFromHTML(html);
247
+ if (!schemaOrg) throw new Error('No Schema.org recipe found');
248
+
249
+ // Convert to Soustack and back to Schema.org JSON-LD if needed
250
+ const soustackRecipe = fromSchemaOrg(schemaOrg);
251
+ const jsonLd = toSchemaOrg(soustackRecipe);
252
+
253
+ return { soustackRecipe, jsonLd };
254
+ }
255
+ ```
256
+
257
+ ## 🔁 Schema.org Conversion
258
+
259
+ Use the helpers to move between Schema.org JSON-LD and Soustack's structured recipe format. The conversion automatically handles image normalization, supporting multiple image formats from Schema.org.
260
+
261
+ **BREAKING CHANGE in v0.3.0:** `toSchemaOrg()` now targets the **minimal profile** and only includes modules that are marked as `schemaOrgMappable` in the modules registry. Non-mappable modules (e.g., `nutrition@1`, `schedule@1`) are excluded from the conversion.
262
+
263
+ ```ts
264
+ import { fromSchemaOrg, toSchemaOrg, normalizeImage } from 'soustack';
265
+
266
+ // Convert Schema.org → Soustack (automatically normalizes images)
267
+ const soustackRecipe = fromSchemaOrg(schemaOrgJsonLd);
268
+ // Recipe images: string | string[] | undefined
269
+ // Instruction images: optional image URL per step
270
+
271
+ // Convert Soustack → Schema.org (preserves images)
272
+ const schemaOrgRecipe = toSchemaOrg(soustackRecipe);
273
+
274
+ // Manual image normalization (if needed)
275
+ const normalized = normalizeImage(schemaOrgImage);
276
+ // Handles: strings, arrays, ImageObjects with url/contentUrl
277
+ ```
278
+
279
+ ### Image Format Support
280
+
281
+ Soustack supports flexible image formats:
282
+
283
+ - **Recipe-level images**: Single URL (`string`) or multiple URLs (`string[]`)
284
+ - **Instruction-level images**: Optional `image` property on instruction objects
285
+ - **Automatic normalization**: Schema.org ImageObjects are automatically converted to URLs during import
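
The normalization described above can be sketched as follows. This is not the library's `normalizeImage()` source; `normalizeImageSketch`, `ImageObjectLike`, and `ImageInput` are hypothetical names. Two choices here are assumptions of the sketch: `url` is preferred over `contentUrl`, and a single resulting URL is returned as a plain string rather than a one-element array.

```ts
// Hypothetical input shapes covering strings, ImageObjects, and arrays of either.
interface ImageObjectLike {
  url?: string;
  contentUrl?: string;
}

type ImageInput = string | ImageObjectLike | Array<string | ImageObjectLike> | null | undefined;

// Reduce one entry to a URL, or undefined if it carries none.
function toUrl(entry: string | ImageObjectLike): string | undefined {
  if (typeof entry === 'string') {
    return entry.trim() || undefined;
  }
  // Assumption: prefer `url`, fall back to `contentUrl`.
  return entry.url ?? entry.contentUrl;
}

function normalizeImageSketch(image: ImageInput): string | string[] | undefined {
  if (image === null || image === undefined) {
    return undefined;
  }
  const entries = Array.isArray(image) ? image : [image];
  const urls = entries
    .map(toUrl)
    .filter((u): u is string => typeof u === 'string');

  if (urls.length === 0) {
    return undefined;
  }
  // Assumption: collapse a single URL to a string.
  return urls.length === 1 ? urls[0] : urls;
}

console.log(normalizeImageSketch({ contentUrl: 'https://example.com/hero.jpg' }));
console.log(normalizeImageSketch(['https://example.com/hero.jpg', { url: 'https://example.com/gallery.jpg' }]));
```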
286
+
287
+ Example recipe with images:
288
+
289
+ ```ts
290
+ const recipe = {
291
+ name: "Chocolate Cake",
292
+ image: ["https://example.com/hero.jpg", "https://example.com/gallery.jpg"],
293
+ instructions: [
294
+ "Mix dry ingredients",
295
+ { text: "Decorate the cake", image: "https://example.com/decorate.jpg" },
296
+ "Serve"
297
+ ]
298
+ };
299
+ ```
300
+
301
+ ## 🧰 Web Scraping
302
+
303
+ ### Node.js: `scrapeRecipe()`
304
+
305
+ `scrapeRecipe(url, options)` fetches a recipe page and extracts Schema.org data. **Node.js only**, because browsers block cross-origin page fetches (CORS).
306
+
307
+ Options:
308
+
309
+ - `timeout` (ms, default `10000`)
310
+ - `userAgent` (string, optional)
311
+ - `maxRetries` (default `2`, retries on non-4xx failures)
312
+
313
+ ```ts
314
+ import { scrapeRecipe } from 'soustack/scrape';
315
+
316
+ const recipe = await scrapeRecipe('https://example.com/recipe', {
317
+ timeout: 15000,
318
+ maxRetries: 3,
319
+ });
320
+ ```
321
+
322
+ ### Browser: `extractSchemaOrgRecipeFromHTML()`
323
+
324
+ `extractSchemaOrgRecipeFromHTML(html)` extracts the raw Schema.org recipe data from HTML. Returns `null` if no recipe is found. Use this when you need to inspect, debug, or convert Schema.org data in browser builds without dragging in Node dependencies.
325
+
326
+ ```ts
327
+ import { extractSchemaOrgRecipeFromHTML, fromSchemaOrg } from 'soustack';
328
+
329
+ // In browser: fetch HTML yourself
330
+ const response = await fetch('https://example.com/recipe');
331
+ const html = await response.text();
332
+
333
+ // Extract Schema.org format (for inspection/modification)
334
+ const schemaOrgRecipe = extractSchemaOrgRecipeFromHTML(html);
335
+
336
+ if (schemaOrgRecipe) {
337
+ // Inspect or modify Schema.org data before converting
338
+ console.log('Found recipe:', schemaOrgRecipe.name);
339
+
340
+ // Convert to Soustack format when ready
341
+ const soustackRecipe = fromSchemaOrg(schemaOrgRecipe);
342
+ }
343
+ ```
344
+
345
+ ### Node-only scraping: `soustack/scrape`
346
+
347
+ For server-side scraping with built-in fetching and cheerio-based parsing, use the dedicated entrypoint:
348
+
349
+ ```ts
350
+ import { scrapeRecipe, extractRecipeFromHTML, fetchPage } from 'soustack/scrape';
351
+
352
+ // Fetch and parse a URL directly
353
+ const recipe = await scrapeRecipe('https://example.com/recipe');
354
+
355
+ // Or work with already-downloaded HTML
356
+ const html = await fetchPage('https://example.com/recipe');
357
+ const parsed = extractRecipeFromHTML(html);
358
+ ```
359
+
360
+ ### CLI
361
+
362
+ ```bash
363
+ # Validate with profiles (JSON output for pipelines)
364
+ npx soustack validate recipe.soustack.json --profile core --strict --json
365
+
366
+ # Repo-wide test run (validates every *.soustack.json)
367
+ npx soustack test --profile core
368
+
369
+ # Convert Schema.org ↔ Soustack
370
+ npx soustack convert --from schemaorg --to soustack recipe.jsonld -o recipe.soustack.json
371
+ npx soustack convert --from soustack --to schemaorg recipe.soustack.json -o recipe.jsonld
372
+
373
+ # Import (scrape) or scale from the CLI
374
+ npx soustack import --url "https://example.com/recipe" -o recipe.soustack.json
375
+ npx soustack scale recipe.soustack.json 2
376
+ ```
377
+
378
+ ## 🔄 Keeping the Schema in Sync
379
+
380
+ The schema files in this repository are **copies** of the official standard. The source of truth lives in [RichardHerold/soustack-spec](https://github.com/RichardHerold/soustack-spec).
381
+
382
+ **Do not edit any synced schema artifacts manually** (`src/schema.json`, `src/soustack.schema.json`, `src/profiles/*.schema.json`).
383
+
384
+ To update to the latest tagged version of the standard, run:
385
+
386
+ ```bash
387
+ npm run sync:spec
388
+ ```
389
+
390
+ ## Development
391
+
392
+ ```bash
393
+ npm test
394
+ ```