soustack 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE CHANGED
@@ -1,21 +1,21 @@
1
- MIT License
2
-
3
- Copyright (c) 2024 Richard Herold
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
1
+ MIT License
2
+
3
+ Copyright (c) 2024 Richard Herold
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md CHANGED
@@ -1,138 +1,168 @@
1
- # Soustack Core
2
-
3
- > **The Logic Engine for Computational Recipes.**
4
-
5
- [![npm version](https://img.shields.io/npm/v/soustack.svg)](https://www.npmjs.com/package/soustack)
6
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
- [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue)](https://www.typescriptlang.org/)
8
-
9
- **Soustack Core** is the reference implementation for the [Soustack Standard](https://github.com/soustack/spec). It provides the validation, parsing, and scaling logic required to turn static recipe data into dynamic, computable objects.
10
-
11
- ---
12
-
13
- ## 💡 The Value Proposition
14
-
15
- Most recipe formats (like Schema.org) are **descriptive**—they tell you _what_ a recipe is.
16
- Soustack is **computational**—it understands _how_ a recipe behaves.
17
-
18
- ### The Problems We Solve:
19
-
20
- 1. **The "Salty Soup" Problem (Intelligent Scaling):**
21
- - _Old Way:_ Doubling a recipe doubles every ingredient blindly.
22
- - _Soustack:_ Understands that salt scales differently than flour, and frying oil shouldn't scale at all. It supports **Linear**, **Fixed**, **Discrete**, and **Baker's Percentage** scaling modes.
23
- 2. **The "Lying Prep Time" Problem:**
24
- - _Old Way:_ Authors guess "Prep: 15 mins."
25
- - _Soustack:_ Calculates total time dynamically based on the active/passive duration of every step.
26
- 3. **The "Timing Clash" Problem:**
27
- - _Old Way:_ A flat list of instructions.
28
- - _Soustack:_ A **Dependency Graph** that knows you can chop vegetables while the water boils.
29
-
30
- ---
31
-
32
- ## 📦 Installation
33
-
34
- ```bash
35
- npm install soustack
36
- ```
37
-
38
- ## What’s Included
39
-
40
- - **Validation**: `validateRecipe()` validates Soustack JSON against the bundled schema.
41
- - **Scaling & Computation**: `scaleRecipe()` produces a flat, UI-ready “computed recipe” (scaled ingredients + aggregated timing).
42
- - **Parsers**:
43
- - Ingredient parsing (`parseIngredient`, `parseIngredientLine`)
44
- - Duration parsing (`smartParseDuration`)
45
- - Yield parsing (`parseYield`)
46
- - **Schema.org Conversion**:
47
- - `fromSchemaOrg()` (Schema.org JSON-LD → Soustack)
48
- - `toSchemaOrg()` (Soustack → Schema.org JSON-LD)
49
- - **Web Scraping**: `scrapeRecipe()` fetches a recipe page and extracts Schema.org recipe data from:
50
- - JSON-LD (`<script type="application/ld+json">`)
51
- - Microdata (`itemscope/itemtype`)
52
-
53
- ## Programmatic Usage
54
-
55
- ```ts
56
- import {
57
- scrapeRecipe,
58
- fromSchemaOrg,
59
- toSchemaOrg,
60
- validateRecipe,
61
- scaleRecipe
62
- } from 'soustack';
63
-
64
- // Validate a Soustack recipe JSON object
65
- validateRecipe(recipe);
66
-
67
- // Scale a recipe to a target yield amount (returns a "computed recipe")
68
- const computed = scaleRecipe(recipe, 2);
69
-
70
- // Scrape a URL into a Soustack recipe (throws if no recipe is found)
71
- const scraped = await scrapeRecipe('https://example.com/recipe');
72
-
73
- // Convert Schema.org Soustack
74
- const soustack = fromSchemaOrg(schemaOrgJsonLd);
75
-
76
- // Convert Soustack Schema.org
77
- const jsonLd = toSchemaOrg(recipe);
78
- ```
79
-
80
- ## 🔁 Schema.org Conversion
81
-
82
- Use the new helpers to move between Schema.org JSON-LD and Soustack's structured recipe format.
83
-
84
- ```ts
85
- import { fromSchemaOrg, toSchemaOrg } from 'soustack';
86
-
87
- const soustackRecipe = fromSchemaOrg(schemaOrgJsonLd);
88
- const schemaOrgRecipe = toSchemaOrg(soustackRecipe);
89
- ```
90
-
91
- ## 🧰 Scraping Options
92
-
93
- `scrapeRecipe(url, options)` supports basic fetch tuning:
94
-
95
- - `timeout` (ms, default `10000`)
96
- - `userAgent` (string, optional)
97
- - `maxRetries` (default `2`, retries on non-4xx failures)
98
-
99
- ```ts
100
- import { scrapeRecipe } from 'soustack';
101
-
102
- const recipe = await scrapeRecipe('https://example.com/recipe', {
103
- timeout: 15000,
104
- maxRetries: 3
105
- });
106
- ```
107
-
108
- ### CLI
109
-
110
- ```bash
111
- # Validate & Scale (existing commands)
112
- npx soustack validate recipe.soustack.json
113
- npx soustack scale recipe.soustack.json 2
114
-
115
- # Schema.org ↔ Soustack
116
- npx soustack import recipe.jsonld -o recipe.soustack.json
117
- npx soustack export recipe.soustack.json -o recipe.jsonld
118
- npx soustack scrape "https://example.com/recipe" -o recipe.soustack.json
119
- ```
120
-
121
- ## 🔄 Keeping the Schema in Sync
122
-
123
- The `src/schema.json` file in this repository is a **copy** of the official standard.
124
- The source of truth lives in [RichardHerold/soustack-spec](https://github.com/RichardHerold/soustack-spec).
125
-
126
- **Do not edit `src/schema.json` manually.**
127
-
128
- To update to the latest version of the standard, run:
129
-
130
- ```bash
131
- npm run sync:spec
132
- ```
133
-
134
- ## Development
135
-
136
- ```bash
137
- npm test
138
- ```
1
+ # Soustack Core
2
+
3
+ > **The Logic Engine for Computational Recipes.**
4
+
5
+ [![npm version](https://img.shields.io/npm/v/soustack.svg)](https://www.npmjs.com/package/soustack)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
+ [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue)](https://www.typescriptlang.org/)
8
+
9
+ **Soustack Core** is the reference implementation for the [Soustack Standard](https://github.com/soustack/spec). It provides the validation, parsing, and scaling logic required to turn static recipe data into dynamic, computable objects.
10
+
11
+ ---
12
+
13
+ ## 💡 The Value Proposition
14
+
15
+ Most recipe formats (like Schema.org) are **descriptive**—they tell you _what_ a recipe is.
16
+ Soustack is **computational**—it understands _how_ a recipe behaves.
17
+
18
+ ### The Problems We Solve:
19
+
20
+ 1. **The "Salty Soup" Problem (Intelligent Scaling):**
21
+ - _Old Way:_ Doubling a recipe doubles every ingredient blindly.
22
+ - _Soustack:_ Understands that salt scales differently than flour, and frying oil shouldn't scale at all. It supports **Linear**, **Fixed**, **Discrete**, and **Baker's Percentage** scaling modes.
23
+ 2. **The "Lying Prep Time" Problem:**
24
+ - _Old Way:_ Authors guess "Prep: 15 mins."
25
+ - _Soustack:_ Calculates total time dynamically based on the active/passive duration of every step.
26
+ 3. **The "Timing Clash" Problem:**
27
+ - _Old Way:_ A flat list of instructions.
28
+ - _Soustack:_ A **Dependency Graph** that knows you can chop vegetables while the water boils.
29
+
30
+ ---
31
+
32
+ ## 📦 Installation
33
+
34
+ ```bash
35
+ npm install soustack
36
+ ```
37
+
38
+ ## What’s Included
39
+
40
+ - **Validation**: `validateRecipe()` validates Soustack JSON against the bundled schema.
41
+ - **Scaling & Computation**: `scaleRecipe()` produces a flat, UI-ready “computed recipe” (scaled ingredients + aggregated timing).
42
+ - **Parsers**:
43
+ - Ingredient parsing (`parseIngredient`, `parseIngredientLine`)
44
+ - Duration parsing (`smartParseDuration`)
45
+ - Yield parsing (`parseYield`)
46
+ - **Schema.org Conversion**:
47
+ - `fromSchemaOrg()` (Schema.org JSON-LD → Soustack)
48
+ - `toSchemaOrg()` (Soustack → Schema.org JSON-LD)
49
+ - **Web Scraping**:
50
+ - `scrapeRecipe()` fetches a recipe page and extracts Schema.org recipe data (Node.js only)
51
+ - `extractRecipeFromHTML()` extracts recipe data from HTML string (browser & Node.js compatible)
52
+ - Supports JSON-LD (`<script type="application/ld+json">`) and Microdata (`itemscope/itemtype`)
53
+
54
+ ## Programmatic Usage
55
+
56
+ ```ts
57
+ import {
58
+ scrapeRecipe,
59
+ extractRecipeFromHTML,
60
+ fromSchemaOrg,
61
+ toSchemaOrg,
62
+ validateRecipe,
63
+ scaleRecipe,
64
+ } from 'soustack';
65
+
66
+ // Validate a Soustack recipe JSON object
67
+ validateRecipe(recipe);
68
+
69
+ // Scale a recipe to a target yield amount (returns a "computed recipe")
70
+ const computed = scaleRecipe(recipe, 2);
71
+
72
+ // Scrape a URL into a Soustack recipe (Node.js only, throws if no recipe is found)
73
+ const scraped = await scrapeRecipe('https://example.com/recipe');
74
+
75
+ // Extract recipe from HTML string (browser & Node.js compatible)
76
+ const html = await fetch('https://example.com/recipe').then((r) => r.text());
77
+ const extracted = extractRecipeFromHTML(html);
78
+
79
+ // Convert Schema.org → Soustack
80
+ const soustack = fromSchemaOrg(schemaOrgJsonLd);
81
+
82
+ // Convert Soustack → Schema.org
83
+ const jsonLd = toSchemaOrg(recipe);
84
+ ```
85
+
86
+ ## 🔁 Schema.org Conversion
87
+
88
+ Use the new helpers to move between Schema.org JSON-LD and Soustack's structured recipe format.
89
+
90
+ ```ts
91
+ import { fromSchemaOrg, toSchemaOrg } from 'soustack';
92
+
93
+ const soustackRecipe = fromSchemaOrg(schemaOrgJsonLd);
94
+ const schemaOrgRecipe = toSchemaOrg(soustackRecipe);
95
+ ```
96
+
97
+ ## 🧰 Web Scraping
98
+
99
+ ### Node.js: `scrapeRecipe()`
100
+
101
+ `scrapeRecipe(url, options)` fetches a recipe page and extracts Schema.org data. **Node.js only**, since browsers block cross-origin page fetches (CORS).
102
+
103
+ Options:
104
+
105
+ - `timeout` (ms, default `10000`)
106
+ - `userAgent` (string, optional)
107
+ - `maxRetries` (default `2`, retries on non-4xx failures)
108
+
109
+ ```ts
110
+ import { scrapeRecipe } from 'soustack';
111
+
112
+ const recipe = await scrapeRecipe('https://example.com/recipe', {
113
+ timeout: 15000,
114
+ maxRetries: 3,
115
+ });
116
+ ```
117
+
118
+ ### Browser: `extractRecipeFromHTML()`
119
+
120
+ `extractRecipeFromHTML(html)` extracts recipe data from an HTML string. **Works in both browser and Node.js**. Ideal for browser usage where you fetch the HTML yourself (using your cookies/session for authenticated content).
121
+
122
+ ```ts
123
+ import { extractRecipeFromHTML } from 'soustack';
124
+
125
+ // In browser: fetch HTML yourself (bypasses CORS, uses your cookies/session)
126
+ const response = await fetch('https://example.com/recipe');
127
+ const html = await response.text();
128
+ const recipe = extractRecipeFromHTML(html);
129
+ ```
130
+
131
+ **Why use `extractRecipeFromHTML()` in browsers?**
132
+
133
+ - ✅ No CORS issues — you fetch HTML yourself
134
+ - ✅ Works with authenticated/paywalled content — uses browser cookies
135
+ - ✅ Smaller bundle — no Node.js dependencies
136
+ - ✅ Universal — works in both browser and Node.js environments
137
+
138
+ ### CLI
139
+
140
+ ```bash
141
+ # Validate & Scale (existing commands)
142
+ npx soustack validate recipe.soustack.json
143
+ npx soustack scale recipe.soustack.json 2
144
+
145
+ # Schema.org ↔ Soustack
146
+ npx soustack import recipe.jsonld -o recipe.soustack.json
147
+ npx soustack export recipe.soustack.json -o recipe.jsonld
148
+ npx soustack scrape "https://example.com/recipe" -o recipe.soustack.json
149
+ ```
150
+
151
+ ## 🔄 Keeping the Schema in Sync
152
+
153
+ The `src/schema.json` file in this repository is a **copy** of the official standard.
154
+ The source of truth lives in [RichardHerold/soustack-spec](https://github.com/RichardHerold/soustack-spec).
155
+
156
+ **Do not edit `src/schema.json` manually.**
157
+
158
+ To update to the latest version of the standard, run:
159
+
160
+ ```bash
161
+ npm run sync:spec
162
+ ```
163
+
164
+ ## Development
165
+
166
+ ```bash
167
+ npm test
168
+ ```
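The new `extractRecipeFromHTML()` helper documented above walks JSON-LD blocks (including `@graph` arrays) looking for a node whose `@type` is `Recipe`. A minimal, package-independent sketch of that traversal follows; the regex-based `<script>` scan and function names here are illustrative only, not the library's actual implementation, which uses a real HTML parser:

```javascript
// Hypothetical sketch of JSON-LD recipe extraction; not the soustack source.
// Recursively collects any node whose @type matches "Recipe"
// (case-insensitively), including nodes nested inside @graph arrays.
function collectRecipeNodes(payload, bucket) {
  if (!payload) return;
  if (Array.isArray(payload)) {
    payload.forEach((entry) => collectRecipeNodes(entry, bucket));
    return;
  }
  if (typeof payload !== "object") return;
  const types = Array.isArray(payload["@type"])
    ? payload["@type"]
    : [payload["@type"]];
  if (types.some((t) => typeof t === "string" && t.toLowerCase() === "recipe")) {
    bucket.push(payload);
    return;
  }
  if (Array.isArray(payload["@graph"])) {
    payload["@graph"].forEach((entry) => collectRecipeNodes(entry, bucket));
  }
}

function extractFirstRecipe(html) {
  // Naive script-tag scan for illustration; a robust implementation
  // parses the DOM instead of using a regex.
  const scriptRe =
    /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;
  const candidates = [];
  let match;
  while ((match = scriptRe.exec(html)) !== null) {
    let parsed;
    try {
      parsed = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed JSON-LD blocks
    }
    collectRecipeNodes(parsed, candidates);
  }
  return candidates[0] ?? null;
}

const html = `<script type="application/ld+json">
  {"@context":"https://schema.org","@graph":[{"@type":"Recipe","name":"Pancakes"}]}
</script>`;
console.log(extractFirstRecipe(html).name); // "Pancakes"
```

The same candidate-collection shape appears in both the Node (cheerio) and browser (DOMParser) code paths of the bundle below; only the HTML-parsing front end differs.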
package/dist/cli/index.js CHANGED
@@ -1313,6 +1313,8 @@ function extractRecipeNode(input) {
1313
1313
  function hasRecipeType(value) {
1314
1314
  if (!value) return false;
1315
1315
  const types = Array.isArray(value) ? value : [value];
1316
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "fromSchemaOrg.ts:95", message: "hasRecipeType check", data: { types, typesLower: types.map((t) => typeof t === "string" ? t.toLowerCase() : t), isMatch: types.some((e) => typeof e === "string" && e.toLowerCase() === "recipe") }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "A" }) }).catch(() => {
1317
+ });
1316
1318
  return types.some(
1317
1319
  (entry) => typeof entry === "string" && entry.toLowerCase() === "recipe"
1318
1320
  );
@@ -1669,18 +1671,26 @@ var DEFAULT_USER_AGENTS = [
1669
1671
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
1670
1672
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0"
1671
1673
  ];
1672
- var fetchImpl = null;
1673
- async function ensureFetch() {
1674
- if (!fetchImpl) {
1675
- fetchImpl = import('node-fetch').then((mod) => mod.default);
1676
- }
1677
- return fetchImpl;
1678
- }
1679
1674
  function chooseUserAgent(provided) {
1680
1675
  if (provided) return provided;
1681
1676
  const index = Math.floor(Math.random() * DEFAULT_USER_AGENTS.length);
1682
1677
  return DEFAULT_USER_AGENTS[index];
1683
1678
  }
1679
+ function resolveFetch(fetchFn) {
1680
+ if (fetchFn) {
1681
+ return fetchFn;
1682
+ }
1683
+ const globalFetch = globalThis.fetch;
1684
+ if (!globalFetch) {
1685
+ throw new Error(
1686
+ "A global fetch implementation is not available. Provide window.fetch in browsers or upgrade to Node 18+."
1687
+ );
1688
+ }
1689
+ return globalFetch;
1690
+ }
1691
+ function isBrowserEnvironment() {
1692
+ return typeof globalThis.document !== "undefined";
1693
+ }
1684
1694
  function isClientError(error) {
1685
1695
  if (typeof error.status === "number") {
1686
1696
  return error.status >= 400 && error.status < 500;
@@ -1694,25 +1704,40 @@ async function fetchPage(url, options = {}) {
1694
1704
  const {
1695
1705
  timeout = 1e4,
1696
1706
  userAgent,
1697
- maxRetries = 2
1707
+ maxRetries = 2,
1708
+ fetchFn
1698
1709
  } = options;
1699
1710
  let lastError = null;
1711
+ const resolvedFetch = resolveFetch(fetchFn);
1712
+ const isBrowser2 = isBrowserEnvironment();
1700
1713
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
1701
1714
  const controller = new AbortController();
1702
1715
  const timeoutId = setTimeout(() => controller.abort(), timeout);
1703
1716
  try {
1704
- const fetch = await ensureFetch();
1705
1717
  const headers = {
1706
- "User-Agent": chooseUserAgent(userAgent),
1707
1718
  Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
1708
1719
  "Accept-Language": "en-US,en;q=0.5"
1709
1720
  };
1710
- const response = await fetch(url, {
1721
+ if (!isBrowser2) {
1722
+ headers["User-Agent"] = chooseUserAgent(userAgent);
1723
+ }
1724
+ const requestInit = {
1711
1725
  headers,
1712
1726
  signal: controller.signal,
1713
1727
  redirect: "follow"
1714
- });
1728
+ };
1729
+ const response = await resolvedFetch(url, requestInit);
1715
1730
  clearTimeout(timeoutId);
1731
+ if (response && (typeof process === "undefined" || process.env.NODE_ENV !== "test")) {
1732
+ try {
1733
+ const globalFetch = typeof globalThis !== "undefined" && typeof globalThis.fetch !== "undefined" ? globalThis.fetch : null;
1734
+ if (globalFetch) {
1735
+ globalFetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/fetch.ts:63", message: "fetch response", data: { url, status: response.status, statusText: response.statusText, ok: response.ok, isNYTimes: url.includes("nytimes.com") }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "B" }) }).catch(() => {
1736
+ });
1737
+ }
1738
+ } catch {
1739
+ }
1740
+ }
1716
1741
  if (!response.ok) {
1717
1742
  const error = new Error(
1718
1743
  `HTTP ${response.status}: ${response.statusText}`
@@ -1720,7 +1745,18 @@ async function fetchPage(url, options = {}) {
1720
1745
  error.status = response.status;
1721
1746
  throw error;
1722
1747
  }
1723
- return await response.text();
1748
+ const html = await response.text();
1749
+ if (typeof process === "undefined" || process.env.NODE_ENV !== "test") {
1750
+ try {
1751
+ const globalFetch = typeof globalThis !== "undefined" && typeof globalThis.fetch !== "undefined" ? globalThis.fetch : null;
1752
+ if (globalFetch) {
1753
+ globalFetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/fetch.ts:75", message: "HTML received", data: { htmlLength: html.length, hasLoginPage: html.toLowerCase().includes("login") || html.toLowerCase().includes("sign in"), hasRecipeData: html.includes("application/ld+json") || html.includes("schema.org/Recipe") }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "B,D" }) }).catch(() => {
1754
+ });
1755
+ }
1756
+ } catch {
1757
+ }
1758
+ }
1759
+ return html;
1724
1760
  } catch (err) {
1725
1761
  clearTimeout(timeoutId);
1726
1762
  lastError = err instanceof Error ? err : new Error(String(err));
@@ -1747,6 +1783,8 @@ function isRecipeNode(value) {
1747
1783
  return false;
1748
1784
  }
1749
1785
  const type = value["@type"];
1786
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/extractors/utils.ts:14", message: "isRecipeNode check", data: { type, typeLower: typeof type === "string" ? type.toLowerCase() : Array.isArray(type) ? type.map((t) => typeof t === "string" ? t.toLowerCase() : t) : void 0, isMatch: typeof type === "string" ? RECIPE_TYPES.has(type.toLowerCase()) : Array.isArray(type) ? type.some((e) => typeof e === "string" && RECIPE_TYPES.has(e.toLowerCase())) : false }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "A" }) }).catch(() => {
1787
+ });
1750
1788
  if (typeof type === "string") {
1751
1789
  return RECIPE_TYPES.has(type.toLowerCase());
1752
1790
  }
@@ -1774,14 +1812,20 @@ function normalizeText(value) {
1774
1812
  function extractJsonLd(html) {
1775
1813
  const $ = cheerio.load(html);
1776
1814
  const scripts = $('script[type="application/ld+json"]');
1815
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/extractors/jsonld.ts:8", message: "JSON-LD scripts found", data: { scriptCount: scripts.length }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "C,D" }) }).catch(() => {
1816
+ });
1777
1817
  const candidates = [];
1778
1818
  scripts.each((_, element) => {
1779
1819
  const content = $(element).html();
1780
1820
  if (!content) return;
1781
1821
  const parsed = safeJsonParse(content);
1782
1822
  if (!parsed) return;
1823
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/extractors/jsonld.ts:18", message: "JSON-LD parsed", data: { hasGraph: !!(parsed && typeof parsed === "object" && "@graph" in parsed), type: parsed && typeof parsed === "object" && "@type" in parsed ? parsed["@type"] : void 0 }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "A,C" }) }).catch(() => {
1824
+ });
1783
1825
  collectCandidates(parsed, candidates);
1784
1826
  });
1827
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/extractors/jsonld.ts:22", message: "JSON-LD candidates", data: { candidateCount: candidates.length, candidateTypes: candidates.map((c) => c["@type"]) }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "A,C" }) }).catch(() => {
1828
+ });
1785
1829
  return candidates[0] ?? null;
1786
1830
  }
1787
1831
  function collectCandidates(payload, bucket) {
@@ -1853,13 +1897,123 @@ function findPropertyValue($, context, prop) {
1853
1897
  return normalizeText(node.attr("content")) || normalizeText(node.attr("href")) || normalizeText(node.attr("src")) || normalizeText(node.text());
1854
1898
  }
1855
1899
 
1900
+ // src/scraper/extractors/browser.ts
1901
+ var SIMPLE_PROPS2 = ["name", "description", "image", "recipeYield", "prepTime", "cookTime", "totalTime"];
1902
+ function extractRecipeBrowser(html) {
1903
+ const jsonLdRecipe = extractJsonLdBrowser(html);
1904
+ if (jsonLdRecipe) {
1905
+ return { recipe: jsonLdRecipe, source: "jsonld" };
1906
+ }
1907
+ const microdataRecipe = extractMicrodataBrowser(html);
1908
+ if (microdataRecipe) {
1909
+ return { recipe: microdataRecipe, source: "microdata" };
1910
+ }
1911
+ return { recipe: null, source: null };
1912
+ }
1913
+ function extractJsonLdBrowser(html) {
1914
+ if (typeof globalThis.DOMParser === "undefined") {
1915
+ return null;
1916
+ }
1917
+ const parser = new globalThis.DOMParser();
1918
+ const doc = parser.parseFromString(html, "text/html");
1919
+ const scripts = doc.querySelectorAll('script[type="application/ld+json"]');
1920
+ const candidates = [];
1921
+ scripts.forEach((script) => {
1922
+ const content = script.textContent;
1923
+ if (!content) return;
1924
+ const parsed = safeJsonParse(content);
1925
+ if (!parsed) return;
1926
+ collectCandidates2(parsed, candidates);
1927
+ });
1928
+ return candidates[0] ?? null;
1929
+ }
1930
+ function extractMicrodataBrowser(html) {
1931
+ if (typeof globalThis.DOMParser === "undefined") {
1932
+ return null;
1933
+ }
1934
+ const parser = new globalThis.DOMParser();
1935
+ const doc = parser.parseFromString(html, "text/html");
1936
+ const recipeEl = doc.querySelector('[itemscope][itemtype*="schema.org/Recipe"]');
1937
+ if (!recipeEl) {
1938
+ return null;
1939
+ }
1940
+ const recipe = {
1941
+ "@type": "Recipe"
1942
+ };
1943
+ SIMPLE_PROPS2.forEach((prop) => {
1944
+ const value = findPropertyValue2(recipeEl, prop);
1945
+ if (value) {
1946
+ recipe[prop] = value;
1947
+ }
1948
+ });
1949
+ const ingredients = [];
1950
+ recipeEl.querySelectorAll('[itemprop="recipeIngredient"]').forEach((el) => {
1951
+ const text = normalizeText(
1952
+ el.getAttribute("content") || el.textContent || void 0
1953
+ );
1954
+ if (text) ingredients.push(text);
1955
+ });
1956
+ if (ingredients.length) {
1957
+ recipe.recipeIngredient = ingredients;
1958
+ }
1959
+ const instructions = [];
1960
+ recipeEl.querySelectorAll('[itemprop="recipeInstructions"]').forEach((el) => {
1961
+ const text = normalizeText(el.getAttribute("content")) || normalizeText(el.querySelector('[itemprop="text"]')?.textContent || void 0) || normalizeText(el.textContent || void 0);
1962
+ if (text) instructions.push(text);
1963
+ });
1964
+ if (instructions.length) {
1965
+ recipe.recipeInstructions = instructions;
1966
+ }
1967
+ if (recipe.name || ingredients.length) {
1968
+ return recipe;
1969
+ }
1970
+ return null;
1971
+ }
1972
+ function findPropertyValue2(context, prop) {
1973
+ const node = context.querySelector(`[itemprop="${prop}"]`);
1974
+ if (!node) return void 0;
1975
+ return normalizeText(node.getAttribute("content")) || normalizeText(node.getAttribute("href")) || normalizeText(node.getAttribute("src")) || normalizeText(node.textContent || void 0);
1976
+ }
1977
+ function collectCandidates2(payload, bucket) {
1978
+ if (!payload) return;
1979
+ if (Array.isArray(payload)) {
1980
+ payload.forEach((entry) => collectCandidates2(entry, bucket));
1981
+ return;
1982
+ }
1983
+ if (typeof payload !== "object") {
1984
+ return;
1985
+ }
1986
+ if (isRecipeNode(payload)) {
1987
+ bucket.push(payload);
1988
+ return;
1989
+ }
1990
+ const graph = payload["@graph"];
1991
+ if (Array.isArray(graph)) {
1992
+ graph.forEach((entry) => collectCandidates2(entry, bucket));
1993
+ }
1994
+ }
1995
+
1856
1996
  // src/scraper/extractors/index.ts
1997
+ function isBrowser() {
1998
+ try {
1999
+ return typeof globalThis.DOMParser !== "undefined";
2000
+ } catch {
2001
+ return false;
2002
+ }
2003
+ }
1857
2004
  function extractRecipe(html) {
2005
+ if (isBrowser()) {
2006
+ return extractRecipeBrowser(html);
2007
+ }
1858
2008
  const jsonLdRecipe = extractJsonLd(html);
2009
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/extractors/index.ts:6", message: "JSON-LD extraction result", data: { hasJsonLd: !!jsonLdRecipe }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "C,D" }) }).catch(() => {
2010
+ });
1859
2011
  if (jsonLdRecipe) {
1860
2012
  return { recipe: jsonLdRecipe, source: "jsonld" };
1861
2013
  }
1862
2014
  const microdataRecipe = extractMicrodata(html);
2015
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/extractors/index.ts:12", message: "Microdata extraction result", data: { hasMicrodata: !!microdataRecipe }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "D" }) }).catch(() => {
2016
+ });
1863
2017
  if (microdataRecipe) {
1864
2018
  return { recipe: microdataRecipe, source: "microdata" };
1865
2019
  }
@@ -1868,12 +2022,20 @@ function extractRecipe(html) {
1868
2022
 
1869
2023
  // src/scraper/index.ts
1870
2024
  async function scrapeRecipe(url, options = {}) {
2025
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/index.ts:7", message: "scrapeRecipe entry", data: { url, hasOptions: !!options }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "A,B,C,D,E" }) }).catch(() => {
2026
+ });
1871
2027
  const html = await fetchPage(url, options);
2028
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/index.ts:9", message: "HTML fetched", data: { htmlLength: html?.length, htmlPreview: html?.substring(0, 200) }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "B" }) }).catch(() => {
2029
+ });
1872
2030
  const { recipe } = extractRecipe(html);
2031
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/index.ts:11", message: "extractRecipe result", data: { hasRecipe: !!recipe, recipeType: recipe?.["@type"], recipeName: recipe?.name }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "A,C,D" }) }).catch(() => {
2032
+ });
1873
2033
  if (!recipe) {
1874
2034
  throw new Error("No Schema.org recipe data found in page");
1875
2035
  }
1876
2036
  const soustackRecipe = fromSchemaOrg(recipe);
2037
+ fetch("http://127.0.0.1:7243/ingest/7225c3b5-9ac2-4c94-b561-807ca9003b66", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ location: "scraper/index.ts:17", message: "fromSchemaOrg result", data: { hasSoustackRecipe: !!soustackRecipe, soustackRecipeName: soustackRecipe?.name }, timestamp: Date.now(), sessionId: "debug-session", runId: "run1", hypothesisId: "A" }) }).catch(() => {
2038
+ });
1877
2039
  if (!soustackRecipe) {
1878
2040
  throw new Error("Schema.org data did not include a valid recipe");
1879
2041
  }
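The rewritten fetch layer in this diff drops the lazy `node-fetch` import in favor of resolving a caller-supplied or global `fetch`. The pattern in isolation, with `resolveFetch` mirroring the diff and the stub-injection usage being an illustrative assumption:

```javascript
// Pattern from the diff above: prefer an injected fetch, fall back to the
// global one (Node 18+ / browsers), and fail loudly when neither exists.
function resolveFetch(fetchFn) {
  if (fetchFn) {
    return fetchFn;
  }
  const globalFetch = globalThis.fetch;
  if (!globalFetch) {
    throw new Error(
      "A global fetch implementation is not available. Provide window.fetch in browsers or upgrade to Node 18+."
    );
  }
  return globalFetch;
}

// Injecting a stub makes the scraper exercisable without network access.
const stubFetch = async () =>
  new Response("<html></html>", { status: 200, statusText: "OK" });

const fetchImpl = resolveFetch(stubFetch);
fetchImpl("https://example.com/recipe").then((res) => {
  console.log(res.status); // 200
});
```

Accepting a `fetchFn` option also explains the new `fetchFn` key threaded through `fetchPage`'s options in the hunk above: it lets tests and browser hosts supply their own transport while Node 18+ callers rely on the built-in global.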