json-from-llm 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -4,6 +4,27 @@ All notable changes to this project are documented here. The format follows
4
4
  [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and the project adheres
5
5
  to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
6
 
7
+ ## [Unreleased]
8
+
9
+ ## [0.2.1] - 2026-06-07
10
+
11
+ ### Added
12
+
13
+ - Added provider-style fixtures for fenced OpenAI-like output, multiple fenced
14
+ candidates, Anthropic-like prose wrappers, malformed drafts before final JSON,
15
+ truncated streams and unclosed reasoning blocks.
16
+ - Added edge-case coverage for BOM/whitespace, escaped braces in strings,
17
+ deeply nested JSON and partial malformed input.
18
+
19
+ ### Changed
20
+
21
+ - Made balanced scanning delimiter-aware so mismatched `{` / `]` drafts and
22
+ truncated JSON-looking containers are skipped as malformed candidates.
23
+ - Prefer complete fenced payloads across all JSON-ish fences before falling back
24
+ to lower-confidence balanced fragments inside fenced prose.
25
+ - Treat unclosed reasoning tags as reasoning through the end of the text to
26
+ avoid extracting valid-looking draft JSON.
27
+
7
28
  ## [0.2.0] - 2026-06-05
8
29
 
9
30
  ### Added
package/README.md CHANGED
@@ -35,9 +35,9 @@ const data = extractJson<{ score: number }>(modelOutput);
35
35
 
36
36
  ## Why
37
37
 
38
- - **Reasoning-model aware.** Strips `<think>` / `<thinking>` blocks first, so brace-laden reasoning (a real cause of `No object generated` failures with DeepSeek R1, Gemini 2.5 thinking, prompted Claude) never gets mistaken for the payload.
38
+ - **Reasoning-model aware.** Strips `<think>` / `<thinking>` blocks first, including unclosed reasoning prefixes, so brace-laden reasoning (a real cause of `No object generated` failures with DeepSeek R1, Gemini 2.5 thinking, prompted Claude) never gets mistaken for the payload.
39
39
  - **Handles the real wrappers.** Markdown fences (`json` and bare ```), conversational prose before/after, and the JSON sitting bare in the text.
40
- - **String-aware, never corrupts.** The scanner and the trailing-comma repair both respect string contents — a `}` or `,` inside `"a string value"` is left alone.
40
+ - **String-aware, delimiter-aware, never corrupts.** The scanner and the trailing-comma repair both respect string contents — a `}` or `,` inside `"a string value"` is left alone, and mismatched or truncated JSON-looking drafts are skipped.
41
41
  - **Conservative repair.** Removes trailing commas (the most common malformation); it will never rewrite your data.
42
42
  - **Fixture-backed edge cases.** Public fixtures cover reasoning tags, fenced JSON, prose wrappers, trailing commas, top-level type expectations and no-JSON failures.
43
43
  - **Two library entry points + CLI.** `extractJson` throws on failure; `tryExtractJson` returns `{ found }`; `json-from-llm` reads stdin for shell pipelines.
@@ -107,23 +107,69 @@ interface ExtractOptions {
107
107
  extractJson('[1,2] then the answer {"a":1}', { expect: 'object' }); // { a: 1 }
108
108
  ```
109
109
 
110
+ ### Provider-style snippets
111
+
112
+ OpenAI-style fenced output:
113
+
114
+ `````ts
115
+ const value = extractJson<{ score: number }>(
116
+ `Here is the JSON:
117
+ ```json
118
+ {"score":8,"reason":"clear"}
119
+ ````,
120
+ { expect: 'object' },
121
+ );
122
+ `````
123
+
124
+ Anthropic-style prose around the object:
125
+
126
+ ```ts
127
+ const result = tryExtractJson<{ safe: boolean }>(
128
+ 'I will return the object first.\n{"safe":true}\nLet me know if you need more.',
129
+ { expect: 'object' },
130
+ );
131
+ ```
132
+
133
+ Gemini-style thinking plus a top-level array:
134
+
135
+ ```ts
136
+ const items = extractJson<Array<{ id: string }>>(
137
+ '<thinking>{draft: true}</thinking>\n[{"id":"a"}]',
138
+ { expect: 'array' },
139
+ );
140
+ ```
141
+
110
142
  ### Algorithm
111
143
 
112
- 1. Strip `<think>` / `<thinking>` / `<reasoning>` blocks.
113
- 2. Prefer the contents of fenced `json (or bare `) code blocks.
114
- 3. Otherwise scan for the first balanced `{…}` / `[…]` that parses, string-aware.
115
- 4. If parsing fails, apply conservative repair (trailing commas) and retry.
144
+ 1. Strip `<think>` / `<thinking>` / `<reasoning>` blocks. If a reasoning tag is opened and never closed, treat the rest as reasoning.
145
+ 2. Prefer complete contents of fenced `json` (or bare) code blocks.
146
+ 3. If a fence contains prose, scan inside those fences for balanced JSON after complete fence payloads have been tried.
147
+ 4. Otherwise scan for the first balanced `{…}` / `[…]` that parses, string-aware and delimiter-aware.
148
+ 5. If parsing fails, apply conservative repair (trailing commas) and retry.
116
149
 
117
150
  The low-level pieces (`stripReasoning`, `fencedBlocks`, `balancedSpans`, `removeTrailingCommas`) are exported too.
118
151
 
152
+ ### Caveats
153
+
154
+ - TypeScript generics do not validate runtime shape. Pair this with your schema validator when fields matter.
155
+ - Repair is intentionally narrow: trailing commas only. It will not convert JSON5, comments, single quotes or unquoted keys.
156
+ - Candidate order is deterministic: JSON-ish fences first, then balanced objects/arrays in document order, filtered by `expect`.
157
+ - Unclosed reasoning tags return no JSON from that suffix instead of risking a draft extraction.
158
+
119
159
  ## Fixture corpus
120
160
 
121
161
  The package includes a small public corpus under [`fixtures/`](./fixtures):
122
162
 
123
163
  - `deepseek-thinking-object.txt`
124
164
  - `gemini-reasoning-array.txt`
165
+ - `openai-fenced-object.txt`
166
+ - `multiple-fenced-final.txt`
167
+ - `anthropic-prose-object.txt`
125
168
  - `prose-trailing-commas.txt`
169
+ - `malformed-draft-valid-final.txt`
126
170
  - `expect-object-skips-array.txt`
171
+ - `truncated-stream-no-json.txt`
172
+ - `unclosed-thinking-no-json.txt`
127
173
  - `no-json.txt`
128
174
  - expected `tryExtractJson` outputs under `fixtures/expected/`
129
175
 
package/dist/cli.js CHANGED
@@ -48,19 +48,21 @@ function balancedSpans(text) {
48
48
  while (i < text.length) {
49
49
  const ch = text[i];
50
50
  if (ch === "{" || ch === "[") {
51
- const end = matchBalanced(text, i);
52
- if (end !== -1) {
53
- spans.push(text.slice(i, end));
54
- i = end;
51
+ const match = matchBalanced(text, i);
52
+ if (match.end !== -1) {
53
+ spans.push(text.slice(i, match.end));
54
+ i = match.end;
55
55
  continue;
56
56
  }
57
+ i = Math.max(match.resume, i + 1);
58
+ continue;
57
59
  }
58
60
  i++;
59
61
  }
60
62
  return spans;
61
63
  }
62
64
  function matchBalanced(text, start) {
63
- let depth = 0;
65
+ const expectedClosers = [];
64
66
  let inString = false;
65
67
  let escaped = false;
66
68
  for (let i = start; i < text.length; i++) {
@@ -77,22 +79,53 @@ function matchBalanced(text, start) {
77
79
  }
78
80
  if (ch === '"') {
79
81
  inString = true;
80
- } else if (ch === "{" || ch === "[") {
81
- depth++;
82
- } else if (ch === "}" || ch === "]") {
83
- depth--;
84
- if (depth === 0) {
85
- return i + 1;
82
+ continue;
83
+ }
84
+ if (ch === "{") {
85
+ expectedClosers.push("}");
86
+ continue;
87
+ }
88
+ if (ch === "[") {
89
+ expectedClosers.push("]");
90
+ continue;
91
+ }
92
+ if (ch === "}" || ch === "]") {
93
+ if (expectedClosers.pop() !== ch) {
94
+ return { end: -1, resume: i + 1 };
95
+ }
96
+ if (expectedClosers.length === 0) {
97
+ return { end: i + 1, resume: i + 1 };
86
98
  }
87
99
  }
88
100
  }
89
- return -1;
101
+ return {
102
+ end: -1,
103
+ resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1
104
+ };
105
+ }
106
+ function looksLikeJsonContainerStart(text, start) {
107
+ let index = start + 1;
108
+ while (index < text.length && /\s/.test(text[index])) {
109
+ index++;
110
+ }
111
+ const next = text[index];
112
+ if (text[start] === "{") {
113
+ return next === '"' || next === "}";
114
+ }
115
+ return next === void 0 || next === "[" || next === "{" || next === '"' || next === "]" || next === "-" || next >= "0" && next <= "9" || next === "t" || next === "f" || next === "n";
90
116
  }
91
117
 
92
118
  // src/strip.ts
93
- var REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\s\S]*?<\/\1>/gi;
119
+ var CLOSED_REASONING_BLOCK = /<(think|thinking|reasoning|thought)\b[^>]*>[\s\S]*?<\/\1>/gi;
120
+ var OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\b[^>]*>/gi;
94
121
  function stripReasoning(text) {
95
- return text.replace(REASONING_TAGS, "");
122
+ const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, "");
123
+ OPEN_REASONING_TAG.lastIndex = 0;
124
+ const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);
125
+ if (!unclosed) {
126
+ return withoutClosedBlocks;
127
+ }
128
+ return withoutClosedBlocks.slice(0, unclosed.index);
96
129
  }
97
130
  var FENCE = /```[^\S\n]*([a-zA-Z0-9_+-]*)[^\S\n]*\n?([\s\S]*?)```/g;
98
131
  function fencedBlocks(text) {
@@ -150,8 +183,10 @@ function tryExtractJson(text, options = {}) {
150
183
  const expect = options.expect ?? "any";
151
184
  const cleaned = stripReasoning(text);
152
185
  const candidates = [];
153
- for (const block of fencedBlocks(cleaned)) {
154
- candidates.push(block, ...balancedSpans(block));
186
+ const blocks = fencedBlocks(cleaned);
187
+ candidates.push(...blocks);
188
+ for (const block of blocks) {
189
+ candidates.push(...balancedSpans(block));
155
190
  }
156
191
  candidates.push(...balancedSpans(cleaned));
157
192
  for (const candidate of candidates) {
package/dist/cli.js.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"sources":["../src/cli.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["#!/usr/bin/env node\nimport { realpathSync } from 'node:fs';\nimport { fileURLToPath } from 'node:url';\nimport { extractJson } from './extract.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions } from './types.ts';\n\nexport interface CliStreams {\n stdout: (chunk: string) => void;\n stderr: (chunk: string) => void;\n}\n\ninterface CliConfig extends ExtractOptions {\n help: boolean;\n}\n\nconst usage = `Usage: json-from-llm [--expect object|array|any] [--no-repair]\n\nRead LLM output from stdin and print the extracted JSON value to stdout.\n\nOptions:\n --expect <type> Require the top-level JSON value to be object, array or any.\n --no-repair Disable conservative trailing-comma repair.\n -h, --help Show this help text.\n`;\n\nfunction parseArgs(args: string[]): CliConfig | string {\n const config: CliConfig = { help: false };\n\n for (let index = 0; index < args.length; index += 1) {\n const arg = args[index];\n\n if (arg === '-h' || arg === '--help') {\n config.help = true;\n continue;\n }\n\n if (arg === '--no-repair') {\n config.repair = false;\n continue;\n }\n\n if (arg === '--expect') {\n const value = args[index + 1];\n if (value !== 'object' && value !== 'array' && value !== 'any') {\n return 'invalid --expect value; use object, array or any';\n }\n config.expect = value;\n index += 1;\n continue;\n }\n\n return `unknown option: ${arg}`;\n }\n\n return config;\n}\n\nexport async function runCli(\n args: string[],\n stdin: string,\n streams: CliStreams,\n): Promise<number> {\n const config = parseArgs(args);\n\n if (typeof config === 'string') {\n streams.stderr(`json-from-llm: ${config}\\n`);\n return 2;\n }\n\n if (config.help) {\n streams.stdout(usage);\n return 0;\n }\n\n try {\n const value = extractJson(stdin, config);\n 
streams.stdout(`${JSON.stringify(value)}\\n`);\n return 0;\n } catch (error) {\n if (error instanceof JsonExtractionError) {\n streams.stderr('json-from-llm: no JSON found\\n');\n return 1;\n }\n\n streams.stderr(\n `json-from-llm: ${error instanceof Error ? error.message : String(error)}\\n`,\n );\n return 1;\n }\n}\n\nasync function readStdin(): Promise<string> {\n process.stdin.setEncoding('utf8');\n let input = '';\n for await (const chunk of process.stdin) {\n input += chunk;\n }\n return input;\n}\n\nexport function isExecutedFile(moduleUrl: string, argvPath?: string): boolean {\n if (!argvPath) {\n return false;\n }\n\n try {\n return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(argvPath);\n } catch {\n return false;\n }\n}\n\nfunction isMain(): boolean {\n return isExecutedFile(import.meta.url, process.argv[1]);\n}\n\nasync function main(): Promise<void> {\n process.exitCode = await runCli(process.argv.slice(2), await readStdin(), {\n stdout: (chunk) => {\n process.stdout.write(chunk);\n },\n stderr: (chunk) => {\n process.stderr.write(chunk);\n },\n });\n}\n\nif (isMain()) {\n void main();\n}\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. 
String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware: braces and brackets inside JSON strings do\n * not affect nesting, so prose like `\"the } char\"` won't break the scan.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const end = matchBalanced(text, i);\n if (end !== -1) {\n spans.push(text.slice(i, end));\n i = end;\n continue;\n }\n }\n i++;\n }\n return spans;\n}\n\n/** Return the index just past the balanced value starting at `start`, or -1. 
*/\nfunction matchBalanced(text: string, start: number): number {\n let depth = 0;\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n } else if (ch === '{' || ch === '[') {\n depth++;\n } else if (ch === '}' || ch === ']') {\n depth--;\n if (depth === 0) {\n return i + 1;\n }\n }\n }\n\n return -1;\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n */\nconst REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\\s\\S]*?<\\/\\1>/gi;\n\nexport function stripReasoning(text: string): string {\n return text.replace(REASONING_TAGS, '');\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. 
*/\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * 
code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n for (const block of fencedBlocks(cleaned)) {\n candidates.push(block, ...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";;;AACA,SAAS,oBAAoB;AAC7B,SAAS,qBAAqB;;;ACGvB,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,MAAM,cAAc,MAAM,CAAC;AACjC,UAAI,QAAQ,IAAI;AACd,cAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAC7B,YAAI;AACJ;AAAA,MACF;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AAGA,SAAS,cAAc,MAAc,OAAuB;AAC1D,MAAI,QAAQ;AACZ,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AAAA,IACb,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AAAA,IACF,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AACA,UAAI,UAAU,GAAG;AACf,eAAO,IAAI;AAAA,MACb;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AACT;;;AClDA,IAAM,iBAAiB;AAEhB,SAAS,eAAe,MAAsB;AACnD,SAAO,KAAK,QAAQ,gBAAgB,EAAE;AACxC;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MA
AM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACVO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,aAAW,SAAS,aAAa,OAAO,GAAG;AACzC,eAAW,KAAK,OAAO,GAAG,cAAc,KAAK,CAAC;AAAA,EAChD;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;;;ALhFA,IAAM,QAAQ;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAUd,SAAS,UAAU,MAAoC;AACrD,QAAM,SAAoB,EAAE,MAAM,MAAM;AAExC,WAAS,QAAQ,GAAG,QAAQ,KAAK,QAAQ,SAAS,GAAG;AACnD,UAAM,MAAM,KAAK,KAAK;AAEtB,QAAI,QAAQ,QAAQ,QAAQ,UAAU;AACpC,aAAO,OAAO;AACd;AAAA,IACF;AAEA,QAAI,QAAQ,eAAe;AACzB,aAAO,SAAS;AAChB;AAAA,IACF;AAEA,QAAI,QAAQ,YAAY;AACtB,YAAM,QAAQ,KAAK,QAAQ,CAAC;AAC5B,UAAI,UAAU,YAAY,UAAU,WAAW,UAAU,OAAO;AAC9D,eAAO;AAAA,MACT;AACA,aAAO,SAAS;AAChB,eAAS;AACT;AAAA,IACF;AAEA,WAAO,mBAAmB,GAAG;AAAA,EAC/B;AAEA,SAAO;AACT;AAEA,eAAsB,OACpB,MACA,OACA,SACiB;AACjB,QAAM,SA
AS,UAAU,IAAI;AAE7B,MAAI,OAAO,WAAW,UAAU;AAC9B,YAAQ,OAAO,kBAAkB,MAAM;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT;AAEA,MAAI,OAAO,MAAM;AACf,YAAQ,OAAO,KAAK;AACpB,WAAO;AAAA,EACT;AAEA,MAAI;AACF,UAAM,QAAQ,YAAY,OAAO,MAAM;AACvC,YAAQ,OAAO,GAAG,KAAK,UAAU,KAAK,CAAC;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT,SAAS,OAAO;AACd,QAAI,iBAAiB,qBAAqB;AACxC,cAAQ,OAAO,gCAAgC;AAC/C,aAAO;AAAA,IACT;AAEA,YAAQ;AAAA,MACN,kBAAkB,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK,CAAC;AAAA;AAAA,IAC1E;AACA,WAAO;AAAA,EACT;AACF;AAEA,eAAe,YAA6B;AAC1C,UAAQ,MAAM,YAAY,MAAM;AAChC,MAAI,QAAQ;AACZ,mBAAiB,SAAS,QAAQ,OAAO;AACvC,aAAS;AAAA,EACX;AACA,SAAO;AACT;AAEO,SAAS,eAAe,WAAmB,UAA4B;AAC5E,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,MAAI;AACF,WAAO,aAAa,cAAc,SAAS,CAAC,MAAM,aAAa,QAAQ;AAAA,EACzE,QAAQ;AACN,WAAO;AAAA,EACT;AACF;AAEA,SAAS,SAAkB;AACzB,SAAO,eAAe,YAAY,KAAK,QAAQ,KAAK,CAAC,CAAC;AACxD;AAEA,eAAe,OAAsB;AACnC,UAAQ,WAAW,MAAM,OAAO,QAAQ,KAAK,MAAM,CAAC,GAAG,MAAM,UAAU,GAAG;AAAA,IACxE,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,IACA,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,EACF,CAAC;AACH;AAEA,IAAI,OAAO,GAAG;AACZ,OAAK,KAAK;AACZ;","names":[]}
1
+ {"version":3,"sources":["../src/cli.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["#!/usr/bin/env node\nimport { realpathSync } from 'node:fs';\nimport { fileURLToPath } from 'node:url';\nimport { extractJson } from './extract.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions } from './types.ts';\n\nexport interface CliStreams {\n stdout: (chunk: string) => void;\n stderr: (chunk: string) => void;\n}\n\ninterface CliConfig extends ExtractOptions {\n help: boolean;\n}\n\nconst usage = `Usage: json-from-llm [--expect object|array|any] [--no-repair]\n\nRead LLM output from stdin and print the extracted JSON value to stdout.\n\nOptions:\n --expect <type> Require the top-level JSON value to be object, array or any.\n --no-repair Disable conservative trailing-comma repair.\n -h, --help Show this help text.\n`;\n\nfunction parseArgs(args: string[]): CliConfig | string {\n const config: CliConfig = { help: false };\n\n for (let index = 0; index < args.length; index += 1) {\n const arg = args[index];\n\n if (arg === '-h' || arg === '--help') {\n config.help = true;\n continue;\n }\n\n if (arg === '--no-repair') {\n config.repair = false;\n continue;\n }\n\n if (arg === '--expect') {\n const value = args[index + 1];\n if (value !== 'object' && value !== 'array' && value !== 'any') {\n return 'invalid --expect value; use object, array or any';\n }\n config.expect = value;\n index += 1;\n continue;\n }\n\n return `unknown option: ${arg}`;\n }\n\n return config;\n}\n\nexport async function runCli(\n args: string[],\n stdin: string,\n streams: CliStreams,\n): Promise<number> {\n const config = parseArgs(args);\n\n if (typeof config === 'string') {\n streams.stderr(`json-from-llm: ${config}\\n`);\n return 2;\n }\n\n if (config.help) {\n streams.stdout(usage);\n return 0;\n }\n\n try {\n const value = extractJson(stdin, config);\n 
streams.stdout(`${JSON.stringify(value)}\\n`);\n return 0;\n } catch (error) {\n if (error instanceof JsonExtractionError) {\n streams.stderr('json-from-llm: no JSON found\\n');\n return 1;\n }\n\n streams.stderr(\n `json-from-llm: ${error instanceof Error ? error.message : String(error)}\\n`,\n );\n return 1;\n }\n}\n\nasync function readStdin(): Promise<string> {\n process.stdin.setEncoding('utf8');\n let input = '';\n for await (const chunk of process.stdin) {\n input += chunk;\n }\n return input;\n}\n\nexport function isExecutedFile(moduleUrl: string, argvPath?: string): boolean {\n if (!argvPath) {\n return false;\n }\n\n try {\n return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(argvPath);\n } catch {\n return false;\n }\n}\n\nfunction isMain(): boolean {\n return isExecutedFile(import.meta.url, process.argv[1]);\n}\n\nasync function main(): Promise<void> {\n process.exitCode = await runCli(process.argv.slice(2), await readStdin(), {\n stdout: (chunk) => {\n process.stdout.write(chunk);\n },\n stderr: (chunk) => {\n process.stderr.write(chunk);\n },\n });\n}\n\nif (isMain()) {\n void main();\n}\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. 
String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware and delimiter-aware: braces and brackets\n * inside JSON strings do not affect nesting, and `[` must close with `]`.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const match = matchBalanced(text, i);\n if (match.end !== -1) {\n spans.push(text.slice(i, match.end));\n i = match.end;\n continue;\n }\n\n i = Math.max(match.resume, i + 1);\n continue;\n }\n i++;\n }\n return spans;\n}\n\ninterface MatchResult {\n /** Index just past the balanced value, or -1 when no complete value exists. */\n end: number;\n /** Next scan index after a malformed or incomplete candidate. 
*/\n resume: number;\n}\n\nfunction matchBalanced(text: string, start: number): MatchResult {\n const expectedClosers: string[] = [];\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n continue;\n }\n\n if (ch === '{') {\n expectedClosers.push('}');\n continue;\n }\n\n if (ch === '[') {\n expectedClosers.push(']');\n continue;\n }\n\n if (ch === '}' || ch === ']') {\n if (expectedClosers.pop() !== ch) {\n return { end: -1, resume: i + 1 };\n }\n if (expectedClosers.length === 0) {\n return { end: i + 1, resume: i + 1 };\n }\n }\n }\n\n return {\n end: -1,\n resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1,\n };\n}\n\nfunction looksLikeJsonContainerStart(text: string, start: number): boolean {\n let index = start + 1;\n while (index < text.length && /\\s/.test(text[index])) {\n index++;\n }\n\n const next = text[index];\n if (text[start] === '{') {\n return next === '\"' || next === '}';\n }\n\n return (\n next === undefined ||\n next === '[' ||\n next === '{' ||\n next === '\"' ||\n next === ']' ||\n next === '-' ||\n (next >= '0' && next <= '9') ||\n next === 't' ||\n next === 'f' ||\n next === 'n'\n );\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n *\n * If a reasoning tag is opened but not closed, treat the rest of the text as\n * reasoning. 
Returning no JSON is safer than extracting a valid-looking draft.\n */\nconst CLOSED_REASONING_BLOCK =\n /<(think|thinking|reasoning|thought)\\b[^>]*>[\\s\\S]*?<\\/\\1>/gi;\nconst OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\\b[^>]*>/gi;\n\nexport function stripReasoning(text: string): string {\n const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, '');\n OPEN_REASONING_TAG.lastIndex = 0;\n const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);\n\n if (!unclosed) {\n return withoutClosedBlocks;\n }\n\n return withoutClosedBlocks.slice(0, unclosed.index);\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. */\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. 
*/\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). 
Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n const blocks = fencedBlocks(cleaned);\n candidates.push(...blocks);\n for (const block of blocks) {\n candidates.push(...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";;;AACA,SAAS,oBAAoB;AAC7B,SAAS,qBAAqB;;;ACGvB,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,QAAQ,cAAc,MAAM,CAAC;AACnC,UAAI,MAAM,QAAQ,IAAI;AACpB,cAAM,KAAK,KAAK,MAAM,GAAG,MAAM,GAAG,CAAC;AACnC,YAAI,MAAM;AACV;AAAA,MACF;AAEA,UAAI,KAAK,IAAI,MAAM,QAAQ,IAAI,CAAC;AAChC;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AASA,SAAS,cAAc,MAAc,OAA4B;AAC/D,QAAM,kBAA4B,CAAC;AACnC,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,UAAI,gBAAgB,IAAI,MAAM,IAAI;AAChC,eAAO,EAAE,KAAK,IAAI,QAAQ,IAAI,EAAE;AAAA,MAClC;AACA,UAAI,gBAAgB,WAAW,GAAG;AAChC,eAAO,EAAE,KAAK,I
AAI,GAAG,QAAQ,IAAI,EAAE;AAAA,MACrC;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AAAA,IACL,KAAK;AAAA,IACL,QAAQ,4BAA4B,MAAM,KAAK,IAAI,KAAK,SAAS,QAAQ;AAAA,EAC3E;AACF;AAEA,SAAS,4BAA4B,MAAc,OAAwB;AACzE,MAAI,QAAQ,QAAQ;AACpB,SAAO,QAAQ,KAAK,UAAU,KAAK,KAAK,KAAK,KAAK,CAAC,GAAG;AACpD;AAAA,EACF;AAEA,QAAM,OAAO,KAAK,KAAK;AACvB,MAAI,KAAK,KAAK,MAAM,KAAK;AACvB,WAAO,SAAS,OAAO,SAAS;AAAA,EAClC;AAEA,SACE,SAAS,UACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACR,QAAQ,OAAO,QAAQ,OACxB,SAAS,OACT,SAAS,OACT,SAAS;AAEb;;;ACjGA,IAAM,yBACJ;AACF,IAAM,qBAAqB;AAEpB,SAAS,eAAe,MAAsB;AACnD,QAAM,sBAAsB,KAAK,QAAQ,wBAAwB,EAAE;AACnE,qBAAmB,YAAY;AAC/B,QAAM,WAAW,mBAAmB,KAAK,mBAAmB;AAE5D,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,SAAO,oBAAoB,MAAM,GAAG,SAAS,KAAK;AACpD;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACvBO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,QAAM,SAAS,aAAa,OAAO;AACnC,aAAW,KAAK,GAAG,MAAM;AACzB,aAAW,SAAS,QAAQ;AAC1B,eAAW,KAAK,GAAG,cAAc,KAAK,CAAC;AAAA,EACzC;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UA
AM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;;;ALlFA,IAAM,QAAQ;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAUd,SAAS,UAAU,MAAoC;AACrD,QAAM,SAAoB,EAAE,MAAM,MAAM;AAExC,WAAS,QAAQ,GAAG,QAAQ,KAAK,QAAQ,SAAS,GAAG;AACnD,UAAM,MAAM,KAAK,KAAK;AAEtB,QAAI,QAAQ,QAAQ,QAAQ,UAAU;AACpC,aAAO,OAAO;AACd;AAAA,IACF;AAEA,QAAI,QAAQ,eAAe;AACzB,aAAO,SAAS;AAChB;AAAA,IACF;AAEA,QAAI,QAAQ,YAAY;AACtB,YAAM,QAAQ,KAAK,QAAQ,CAAC;AAC5B,UAAI,UAAU,YAAY,UAAU,WAAW,UAAU,OAAO;AAC9D,eAAO;AAAA,MACT;AACA,aAAO,SAAS;AAChB,eAAS;AACT;AAAA,IACF;AAEA,WAAO,mBAAmB,GAAG;AAAA,EAC/B;AAEA,SAAO;AACT;AAEA,eAAsB,OACpB,MACA,OACA,SACiB;AACjB,QAAM,SAAS,UAAU,IAAI;AAE7B,MAAI,OAAO,WAAW,UAAU;AAC9B,YAAQ,OAAO,kBAAkB,MAAM;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT;AAEA,MAAI,OAAO,MAAM;AACf,YAAQ,OAAO,KAAK;AACpB,WAAO;AAAA,EACT;AAEA,MAAI;AACF,UAAM,QAAQ,YAAY,OAAO,MAAM;AACvC,YAAQ,OAAO,GAAG,KAAK,UAAU,KAAK,CAAC;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT,SAAS,OAAO;AACd,QAAI,iBAAiB,qBAAqB;AACxC,cAAQ,OAAO,gCAAgC;AAC/C,aAAO;AAAA,IACT;AAEA,YAAQ;AAAA,MACN,kBAAkB,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK,CAAC;AAAA;AAAA,IAC1E;AACA,WAAO;AAAA,EACT;AACF;AAEA,eAAe,YAA6B;AAC1C,UAAQ,MAAM,YAAY,MAAM;AAChC,MAAI,QAAQ;AACZ,mBAAiB,SAAS,QAAQ,OAAO;AACvC,aAAS;AAAA,EACX;AACA,SAAO;AACT;AAEO,SAAS,eAAe,WAAmB,UAA4B;AAC5E,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,MAAI;AACF,WAAO,aAAa,cAAc,SAAS,CAAC,MAAM,aAAa,QAAQ;AAAA,EACzE,QAAQ;AACN,WAAO;AAAA,EACT;AACF;AAEA,SAAS,SAAkB;AACzB,SAAO,eAAe,YAAY,KAAK,QAAQ,KAAK,CAAC,CAAC;AACxD;AAEA,eAAe,OAAsB;AACnC,UAAQ,WAAW,MAAM,OAAO,QAAQ,KAAK,MAAM,CAAC,GAAG,MAAM,UAAU,GAAG;AAAA,IACxE,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,IACA,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,EACF,CAAC;AACH;AAEA,IAAI,OAAO,GAAG;AACZ,OAAK,KAAK;AACZ;","names":[]}
package/dist/index.cjs CHANGED
@@ -74,19 +74,21 @@ function balancedSpans(text) {
74
74
  while (i < text.length) {
75
75
  const ch = text[i];
76
76
  if (ch === "{" || ch === "[") {
77
- const end = matchBalanced(text, i);
78
- if (end !== -1) {
79
- spans.push(text.slice(i, end));
80
- i = end;
77
+ const match = matchBalanced(text, i);
78
+ if (match.end !== -1) {
79
+ spans.push(text.slice(i, match.end));
80
+ i = match.end;
81
81
  continue;
82
82
  }
83
+ i = Math.max(match.resume, i + 1);
84
+ continue;
83
85
  }
84
86
  i++;
85
87
  }
86
88
  return spans;
87
89
  }
88
90
  function matchBalanced(text, start) {
89
- let depth = 0;
91
+ const expectedClosers = [];
90
92
  let inString = false;
91
93
  let escaped = false;
92
94
  for (let i = start; i < text.length; i++) {
@@ -103,22 +105,53 @@ function matchBalanced(text, start) {
103
105
  }
104
106
  if (ch === '"') {
105
107
  inString = true;
106
- } else if (ch === "{" || ch === "[") {
107
- depth++;
108
- } else if (ch === "}" || ch === "]") {
109
- depth--;
110
- if (depth === 0) {
111
- return i + 1;
108
+ continue;
109
+ }
110
+ if (ch === "{") {
111
+ expectedClosers.push("}");
112
+ continue;
113
+ }
114
+ if (ch === "[") {
115
+ expectedClosers.push("]");
116
+ continue;
117
+ }
118
+ if (ch === "}" || ch === "]") {
119
+ if (expectedClosers.pop() !== ch) {
120
+ return { end: -1, resume: i + 1 };
121
+ }
122
+ if (expectedClosers.length === 0) {
123
+ return { end: i + 1, resume: i + 1 };
112
124
  }
113
125
  }
114
126
  }
115
- return -1;
127
+ return {
128
+ end: -1,
129
+ resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1
130
+ };
131
+ }
132
+ function looksLikeJsonContainerStart(text, start) {
133
+ let index = start + 1;
134
+ while (index < text.length && /\s/.test(text[index])) {
135
+ index++;
136
+ }
137
+ const next = text[index];
138
+ if (text[start] === "{") {
139
+ return next === '"' || next === "}";
140
+ }
141
+ return next === void 0 || next === "[" || next === "{" || next === '"' || next === "]" || next === "-" || next >= "0" && next <= "9" || next === "t" || next === "f" || next === "n";
116
142
  }
117
143
 
118
144
  // src/strip.ts
119
- var REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\s\S]*?<\/\1>/gi;
145
+ var CLOSED_REASONING_BLOCK = /<(think|thinking|reasoning|thought)\b[^>]*>[\s\S]*?<\/\1>/gi;
146
+ var OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\b[^>]*>/gi;
120
147
  function stripReasoning(text) {
121
- return text.replace(REASONING_TAGS, "");
148
+ const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, "");
149
+ OPEN_REASONING_TAG.lastIndex = 0;
150
+ const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);
151
+ if (!unclosed) {
152
+ return withoutClosedBlocks;
153
+ }
154
+ return withoutClosedBlocks.slice(0, unclosed.index);
122
155
  }
123
156
  var FENCE = /```[^\S\n]*([a-zA-Z0-9_+-]*)[^\S\n]*\n?([\s\S]*?)```/g;
124
157
  function fencedBlocks(text) {
@@ -176,8 +209,10 @@ function tryExtractJson(text, options = {}) {
176
209
  const expect = options.expect ?? "any";
177
210
  const cleaned = stripReasoning(text);
178
211
  const candidates = [];
179
- for (const block of fencedBlocks(cleaned)) {
180
- candidates.push(block, ...balancedSpans(block));
212
+ const blocks = fencedBlocks(cleaned);
213
+ candidates.push(...blocks);
214
+ for (const block of blocks) {
215
+ candidates.push(...balancedSpans(block));
181
216
  }
182
217
  candidates.push(...balancedSpans(cleaned));
183
218
  for (const candidate of candidates) {
@@ -1 +1 @@
1
- {"version":3,"sources":["../src/index.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["export { extractJson, tryExtractJson } from './extract.ts';\nexport { stripReasoning, fencedBlocks } from './strip.ts';\nexport { balancedSpans } from './scan.ts';\nexport { removeTrailingCommas } from './repair.ts';\nexport { JsonExtractionError } from './types.ts';\nexport type { ExtractOptions, ExtractResult } from './types.ts';\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. 
String-aware: braces and brackets inside JSON strings do\n * not affect nesting, so prose like `\"the } char\"` won't break the scan.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const end = matchBalanced(text, i);\n if (end !== -1) {\n spans.push(text.slice(i, end));\n i = end;\n continue;\n }\n }\n i++;\n }\n return spans;\n}\n\n/** Return the index just past the balanced value starting at `start`, or -1. */\nfunction matchBalanced(text: string, start: number): number {\n let depth = 0;\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n } else if (ch === '{' || ch === '[') {\n depth++;\n } else if (ch === '}' || ch === ']') {\n depth--;\n if (depth === 0) {\n return i + 1;\n }\n }\n }\n\n return -1;\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n */\nconst REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\\s\\S]*?<\\/\\1>/gi;\n\nexport function stripReasoning(text: string): string {\n return text.replace(REASONING_TAGS, '');\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. 
Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. */\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. 
*/\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 
'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n for (const block of fencedBlocks(cleaned)) {\n candidates.push(block, ...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return 
result.value;\n}\n"],"mappings":";;;;;;;;;;;;;;;;;;;;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;;;ACKO,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,MAAM,cAAc,MAAM,CAAC;AACjC,UAAI,QAAQ,IAAI;AACd,cAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAC7B,YAAI;AACJ;AAAA,MACF;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AAGA,SAAS,cAAc,MAAc,OAAuB;AAC1D,MAAI,QAAQ;AACZ,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AAAA,IACb,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AAAA,IACF,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AACA,UAAI,UAAU,GAAG;AACf,eAAO,IAAI;AAAA,MACb;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AACT;;;AClDA,IAAM,iBAAiB;AAEhB,SAAS,eAAe,MAAsB;AACnD,SAAO,KAAK,QAAQ,gBAAgB,EAAE;AACxC;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACVO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJk
B;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,aAAW,SAAS,aAAa,OAAO,GAAG;AACzC,eAAW,KAAK,OAAO,GAAG,cAAc,KAAK,CAAC;AAAA,EAChD;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;","names":[]}
1
+ {"version":3,"sources":["../src/index.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["export { extractJson, tryExtractJson } from './extract.ts';\nexport { stripReasoning, fencedBlocks } from './strip.ts';\nexport { balancedSpans } from './scan.ts';\nexport { removeTrailingCommas } from './repair.ts';\nexport { JsonExtractionError } from './types.ts';\nexport type { ExtractOptions, ExtractResult } from './types.ts';\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. 
String-aware and delimiter-aware: braces and brackets\n * inside JSON strings do not affect nesting, and `[` must close with `]`.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const match = matchBalanced(text, i);\n if (match.end !== -1) {\n spans.push(text.slice(i, match.end));\n i = match.end;\n continue;\n }\n\n i = Math.max(match.resume, i + 1);\n continue;\n }\n i++;\n }\n return spans;\n}\n\ninterface MatchResult {\n /** Index just past the balanced value, or -1 when no complete value exists. */\n end: number;\n /** Next scan index after a malformed or incomplete candidate. */\n resume: number;\n}\n\nfunction matchBalanced(text: string, start: number): MatchResult {\n const expectedClosers: string[] = [];\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n continue;\n }\n\n if (ch === '{') {\n expectedClosers.push('}');\n continue;\n }\n\n if (ch === '[') {\n expectedClosers.push(']');\n continue;\n }\n\n if (ch === '}' || ch === ']') {\n if (expectedClosers.pop() !== ch) {\n return { end: -1, resume: i + 1 };\n }\n if (expectedClosers.length === 0) {\n return { end: i + 1, resume: i + 1 };\n }\n }\n }\n\n return {\n end: -1,\n resume: looksLikeJsonContainerStart(text, start) ? 
text.length : start + 1,\n };\n}\n\nfunction looksLikeJsonContainerStart(text: string, start: number): boolean {\n let index = start + 1;\n while (index < text.length && /\\s/.test(text[index])) {\n index++;\n }\n\n const next = text[index];\n if (text[start] === '{') {\n return next === '\"' || next === '}';\n }\n\n return (\n next === undefined ||\n next === '[' ||\n next === '{' ||\n next === '\"' ||\n next === ']' ||\n next === '-' ||\n (next >= '0' && next <= '9') ||\n next === 't' ||\n next === 'f' ||\n next === 'n'\n );\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n *\n * If a reasoning tag is opened but not closed, treat the rest of the text as\n * reasoning. Returning no JSON is safer than extracting a valid-looking draft.\n */\nconst CLOSED_REASONING_BLOCK =\n /<(think|thinking|reasoning|thought)\\b[^>]*>[\\s\\S]*?<\\/\\1>/gi;\nconst OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\\b[^>]*>/gi;\n\nexport function stripReasoning(text: string): string {\n const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, '');\n OPEN_REASONING_TAG.lastIndex = 0;\n const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);\n\n if (!unclosed) {\n return withoutClosedBlocks;\n }\n\n return withoutClosedBlocks.slice(0, unclosed.index);\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. 
Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. */\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. 
*/\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 
'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n const blocks = fencedBlocks(cleaned);\n candidates.push(...blocks);\n for (const block of blocks) {\n candidates.push(...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return 
result.value;\n}\n"],"mappings":";;;;;;;;;;;;;;;;;;;;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;;;ACKO,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,QAAQ,cAAc,MAAM,CAAC;AACnC,UAAI,MAAM,QAAQ,IAAI;AACpB,cAAM,KAAK,KAAK,MAAM,GAAG,MAAM,GAAG,CAAC;AACnC,YAAI,MAAM;AACV;AAAA,MACF;AAEA,UAAI,KAAK,IAAI,MAAM,QAAQ,IAAI,CAAC;AAChC;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AASA,SAAS,cAAc,MAAc,OAA4B;AAC/D,QAAM,kBAA4B,CAAC;AACnC,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,UAAI,gBAAgB,IAAI,MAAM,IAAI;AAChC,eAAO,EAAE,KAAK,IAAI,QAAQ,IAAI,EAAE;AAAA,MAClC;AACA,UAAI,gBAAgB,WAAW,GAAG;AAChC,eAAO,EAAE,KAAK,IAAI,GAAG,QAAQ,IAAI,EAAE;AAAA,MACrC;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AAAA,IACL,KAAK;AAAA,IACL,QAAQ,4BAA4B,MAAM,KAAK,IAAI,KAAK,SAAS,QAAQ;AAAA,EAC3E;AACF;AAEA,SAAS,4BAA4B,MAAc,OAAwB;AACzE,MAAI,QAAQ,QAAQ;AACpB,SAAO,QAAQ,KAAK,UAAU,KAAK,KAAK,KAAK,KAAK,CAAC,GAAG;AACpD;AAAA,EACF;AAEA,QAAM,OAAO,KAAK,KAAK;AACvB,
MAAI,KAAK,KAAK,MAAM,KAAK;AACvB,WAAO,SAAS,OAAO,SAAS;AAAA,EAClC;AAEA,SACE,SAAS,UACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACR,QAAQ,OAAO,QAAQ,OACxB,SAAS,OACT,SAAS,OACT,SAAS;AAEb;;;ACjGA,IAAM,yBACJ;AACF,IAAM,qBAAqB;AAEpB,SAAS,eAAe,MAAsB;AACnD,QAAM,sBAAsB,KAAK,QAAQ,wBAAwB,EAAE;AACnE,qBAAmB,YAAY;AAC/B,QAAM,WAAW,mBAAmB,KAAK,mBAAmB;AAE5D,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,SAAO,oBAAoB,MAAM,GAAG,SAAS,KAAK;AACpD;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACvBO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,QAAM,SAAS,aAAa,OAAO;AACnC,aAAW,KAAK,GAAG,MAAM;AACzB,aAAW,SAAS,QAAQ;AAC1B,eAAW,KAAK,GAAG,cAAc,KAAK,CAAC;AAAA,EACzC;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IAC
F;AAAA,EACF;AACA,SAAO,OAAO;AAChB;","names":[]}
package/dist/index.d.cts CHANGED
@@ -54,8 +54,8 @@ declare function fencedBlocks(text: string): string[];
 
 /**
  * Find the substrings of complete, balanced JSON objects/arrays in `text`,
- * in document order. String-aware: braces and brackets inside JSON strings do
- * not affect nesting, so prose like `"the } char"` won't break the scan.
+ * in document order. String-aware and delimiter-aware: braces and brackets
+ * inside JSON strings do not affect nesting, and `[` must close with `]`.
  */
 declare function balancedSpans(text: string): string[];
 
package/dist/index.d.ts CHANGED
@@ -54,8 +54,8 @@ declare function fencedBlocks(text: string): string[];
 
 /**
  * Find the substrings of complete, balanced JSON objects/arrays in `text`,
- * in document order. String-aware: braces and brackets inside JSON strings do
- * not affect nesting, so prose like `"the } char"` won't break the scan.
+ * in document order. String-aware and delimiter-aware: braces and brackets
+ * inside JSON strings do not affect nesting, and `[` must close with `]`.
  */
 declare function balancedSpans(text: string): string[];
 
package/dist/index.js CHANGED
@@ -42,19 +42,21 @@ function balancedSpans(text) {
   while (i < text.length) {
     const ch = text[i];
     if (ch === "{" || ch === "[") {
-      const end = matchBalanced(text, i);
-      if (end !== -1) {
-        spans.push(text.slice(i, end));
-        i = end;
+      const match = matchBalanced(text, i);
+      if (match.end !== -1) {
+        spans.push(text.slice(i, match.end));
+        i = match.end;
         continue;
       }
+      i = Math.max(match.resume, i + 1);
+      continue;
     }
     i++;
   }
   return spans;
 }
 function matchBalanced(text, start) {
-  let depth = 0;
+  const expectedClosers = [];
   let inString = false;
   let escaped = false;
   for (let i = start; i < text.length; i++) {
@@ -71,22 +73,53 @@ function matchBalanced(text, start) {
     }
     if (ch === '"') {
       inString = true;
-    } else if (ch === "{" || ch === "[") {
-      depth++;
-    } else if (ch === "}" || ch === "]") {
-      depth--;
-      if (depth === 0) {
-        return i + 1;
+      continue;
+    }
+    if (ch === "{") {
+      expectedClosers.push("}");
+      continue;
+    }
+    if (ch === "[") {
+      expectedClosers.push("]");
+      continue;
+    }
+    if (ch === "}" || ch === "]") {
+      if (expectedClosers.pop() !== ch) {
+        return { end: -1, resume: i + 1 };
+      }
+      if (expectedClosers.length === 0) {
+        return { end: i + 1, resume: i + 1 };
       }
     }
   }
-  return -1;
+  return {
+    end: -1,
+    resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1
+  };
+}
+function looksLikeJsonContainerStart(text, start) {
+  let index = start + 1;
+  while (index < text.length && /\s/.test(text[index])) {
+    index++;
+  }
+  const next = text[index];
+  if (text[start] === "{") {
+    return next === '"' || next === "}";
+  }
+  return next === void 0 || next === "[" || next === "{" || next === '"' || next === "]" || next === "-" || next >= "0" && next <= "9" || next === "t" || next === "f" || next === "n";
 }
 
 // src/strip.ts
-var REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\s\S]*?<\/\1>/gi;
+var CLOSED_REASONING_BLOCK = /<(think|thinking|reasoning|thought)\b[^>]*>[\s\S]*?<\/\1>/gi;
+var OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\b[^>]*>/gi;
 function stripReasoning(text) {
-  return text.replace(REASONING_TAGS, "");
+  const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, "");
+  OPEN_REASONING_TAG.lastIndex = 0;
+  const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);
+  if (!unclosed) {
+    return withoutClosedBlocks;
+  }
+  return withoutClosedBlocks.slice(0, unclosed.index);
 }
 var FENCE = /```[^\S\n]*([a-zA-Z0-9_+-]*)[^\S\n]*\n?([\s\S]*?)```/g;
 function fencedBlocks(text) {
@@ -144,8 +177,10 @@ function tryExtractJson(text, options = {}) {
   const expect = options.expect ?? "any";
   const cleaned = stripReasoning(text);
   const candidates = [];
-  for (const block of fencedBlocks(cleaned)) {
-    candidates.push(block, ...balancedSpans(block));
+  const blocks = fencedBlocks(cleaned);
+  candidates.push(...blocks);
+  for (const block of blocks) {
+    candidates.push(...balancedSpans(block));
   }
   candidates.push(...balancedSpans(cleaned));
   for (const candidate of candidates) {
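
The delimiter-aware matching that `matchBalanced` gains in this file can be illustrated with a reduced sketch. This is not the shipped code: `findBalancedEnd` is a hypothetical name, it returns only the end index (the real function also returns a `resume` position), and it assumes scanning starts on an opening `{` or `[`.

```ts
// Sketch only: a stack-based matcher in the spirit of the diff above.
// `findBalancedEnd` is a hypothetical helper, not part of the package API.
function findBalancedEnd(text: string, start: number): number {
  const expectedClosers: string[] = [];
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];

    if (inString) {
      // Inside a JSON string, only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }

    if (ch === '"') inString = true;
    else if (ch === "{") expectedClosers.push("}");
    else if (ch === "[") expectedClosers.push("]");
    else if (ch === "}" || ch === "]") {
      // A closer must match the most recently opened container.
      if (expectedClosers.pop() !== ch) return -1;
      if (expectedClosers.length === 0) return i + 1;
    }
  }
  return -1; // ran out of text before the value closed
}

const mismatched = '{"a": 1]';
const withBraceInString = '{"note": "the } char"}';
const mismatchedEnd = findBalancedEnd(mismatched, 0);
const stringEnd = findBalancedEnd(withBraceInString, 0);
console.log(mismatchedEnd, stringEnd === withBraceInString.length); // -1 true
```

A plain depth counter, as in the removed lines, would treat `{"a": 1]` as a balanced span because it only counts openers and closers; the stack rejects it at the mismatched `]`.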
package/dist/index.js.map CHANGED
@@ -1 +1 @@
- {"version":3,"sources":["../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware: braces and brackets inside JSON strings do\n * not affect nesting, so prose like `\"the } char\"` won't break the scan.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const end = matchBalanced(text, i);\n if (end !== -1) {\n spans.push(text.slice(i, end));\n i = end;\n continue;\n }\n }\n i++;\n }\n return spans;\n}\n\n/** Return the index just past the balanced value starting at `start`, or -1. 
*/\nfunction matchBalanced(text: string, start: number): number {\n let depth = 0;\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n } else if (ch === '{' || ch === '[') {\n depth++;\n } else if (ch === '}' || ch === ']') {\n depth--;\n if (depth === 0) {\n return i + 1;\n }\n }\n }\n\n return -1;\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n */\nconst REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\\s\\S]*?<\\/\\1>/gi;\n\nexport function stripReasoning(text: string): string {\n return text.replace(REASONING_TAGS, '');\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. 
*/\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * 
code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n for (const block of fencedBlocks(cleaned)) {\n candidates.push(block, ...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";AAKO,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,MAAM,cAAc,MAAM,CAAC;AACjC,UAAI,QAAQ,IAAI;AACd,cAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAC7B,YAAI;AACJ;AAAA,MACF;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AAGA,SAAS,cAAc,MAAc,OAAuB;AAC1D,MAAI,QAAQ;AACZ,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AAAA,IACb,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AAAA,IACF,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AACA,UAAI,UAAU,GAAG;AACf,eAAO,IAAI;AAAA,MACb;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AACT;;;AClDA,IAAM,iBAAiB;AAEhB,SAAS,eAAe,MAAsB;AACnD,SAAO,KAAK,QAAQ,gBAAgB,EAAE;AACxC;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;A
AClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACVO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,aAAW,SAAS,aAAa,OAAO,GAAG;AACzC,eAAW,KAAK,OAAO,GAAG,cAAc,KAAK,CAAC;AAAA,EAChD;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;","names":[]}
+ {"version":3,"sources":["../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware and delimiter-aware: braces and brackets\n * inside JSON strings do not affect nesting, and `[` must close with `]`.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const match = matchBalanced(text, i);\n if (match.end !== -1) {\n spans.push(text.slice(i, match.end));\n i = match.end;\n continue;\n }\n\n i = Math.max(match.resume, i + 1);\n continue;\n }\n i++;\n }\n return spans;\n}\n\ninterface MatchResult {\n /** Index just past the balanced value, or -1 when no complete value exists. */\n end: number;\n /** Next scan index after a malformed or incomplete candidate. 
*/\n resume: number;\n}\n\nfunction matchBalanced(text: string, start: number): MatchResult {\n const expectedClosers: string[] = [];\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n continue;\n }\n\n if (ch === '{') {\n expectedClosers.push('}');\n continue;\n }\n\n if (ch === '[') {\n expectedClosers.push(']');\n continue;\n }\n\n if (ch === '}' || ch === ']') {\n if (expectedClosers.pop() !== ch) {\n return { end: -1, resume: i + 1 };\n }\n if (expectedClosers.length === 0) {\n return { end: i + 1, resume: i + 1 };\n }\n }\n }\n\n return {\n end: -1,\n resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1,\n };\n}\n\nfunction looksLikeJsonContainerStart(text: string, start: number): boolean {\n let index = start + 1;\n while (index < text.length && /\\s/.test(text[index])) {\n index++;\n }\n\n const next = text[index];\n if (text[start] === '{') {\n return next === '\"' || next === '}';\n }\n\n return (\n next === undefined ||\n next === '[' ||\n next === '{' ||\n next === '\"' ||\n next === ']' ||\n next === '-' ||\n (next >= '0' && next <= '9') ||\n next === 't' ||\n next === 'f' ||\n next === 'n'\n );\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n *\n * If a reasoning tag is opened but not closed, treat the rest of the text as\n * reasoning. 
Returning no JSON is safer than extracting a valid-looking draft.\n */\nconst CLOSED_REASONING_BLOCK =\n /<(think|thinking|reasoning|thought)\\b[^>]*>[\\s\\S]*?<\\/\\1>/gi;\nconst OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\\b[^>]*>/gi;\n\nexport function stripReasoning(text: string): string {\n const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, '');\n OPEN_REASONING_TAG.lastIndex = 0;\n const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);\n\n if (!unclosed) {\n return withoutClosedBlocks;\n }\n\n return withoutClosedBlocks.slice(0, unclosed.index);\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. */\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. 
*/\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). 
Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n const blocks = fencedBlocks(cleaned);\n candidates.push(...blocks);\n for (const block of blocks) {\n candidates.push(...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";AAKO,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,QAAQ,cAAc,MAAM,CAAC;AACnC,UAAI,MAAM,QAAQ,IAAI;AACpB,cAAM,KAAK,KAAK,MAAM,GAAG,MAAM,GAAG,CAAC;AACnC,YAAI,MAAM;AACV;AAAA,MACF;AAEA,UAAI,KAAK,IAAI,MAAM,QAAQ,IAAI,CAAC;AAChC;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AASA,SAAS,cAAc,MAAc,OAA4B;AAC/D,QAAM,kBAA4B,CAAC;AACnC,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,UAAI,gBAAgB,IAAI,MAAM,IAAI;AAChC,eAAO,EAAE,KAAK,IAAI,QAAQ,IAAI,EAAE;AAAA,MAClC;AACA,UAAI,gBAAgB,WAAW,GAAG;AAChC,eAAO,EAAE,KAAK,IAAI,GAAG,QAAQ,IAAI,EAAE;AAAA,MACrC;AAAA,
IACF;AAAA,EACF;AAEA,SAAO;AAAA,IACL,KAAK;AAAA,IACL,QAAQ,4BAA4B,MAAM,KAAK,IAAI,KAAK,SAAS,QAAQ;AAAA,EAC3E;AACF;AAEA,SAAS,4BAA4B,MAAc,OAAwB;AACzE,MAAI,QAAQ,QAAQ;AACpB,SAAO,QAAQ,KAAK,UAAU,KAAK,KAAK,KAAK,KAAK,CAAC,GAAG;AACpD;AAAA,EACF;AAEA,QAAM,OAAO,KAAK,KAAK;AACvB,MAAI,KAAK,KAAK,MAAM,KAAK;AACvB,WAAO,SAAS,OAAO,SAAS;AAAA,EAClC;AAEA,SACE,SAAS,UACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACR,QAAQ,OAAO,QAAQ,OACxB,SAAS,OACT,SAAS,OACT,SAAS;AAEb;;;ACjGA,IAAM,yBACJ;AACF,IAAM,qBAAqB;AAEpB,SAAS,eAAe,MAAsB;AACnD,QAAM,sBAAsB,KAAK,QAAQ,wBAAwB,EAAE;AACnE,qBAAmB,YAAY;AAC/B,QAAM,WAAW,mBAAmB,KAAK,mBAAmB;AAE5D,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,SAAO,oBAAoB,MAAM,GAAG,SAAS,KAAK;AACpD;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACvBO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,QAAM,SAAS,aAAa,OAAO;AACnC,aAAW,KAAK,GAAG,MAAM;AACzB,aAAW,SAAS,QAAQ;AAC1B,eAAW,KAAK,GAAG,cAAc,KAAK,CAAC;AAAA,EACzC;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,M
AAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;","names":[]}
@@ -5,8 +5,12 @@ plain `JSON.parse`:
5
5
 
6
6
  - reasoning/thinking tags with brace-laden prose
7
7
  - fenced JSON blocks inside conversational text
8
+ - multiple fenced blocks where a later complete payload should win
8
9
  - trailing commas in otherwise valid JSON
9
10
  - competing arrays/objects when callers expect a specific top-level type
11
+ - malformed drafts before a final valid JSON answer
12
+ - truncated stream output that contains nested draft fragments
13
+ - unclosed reasoning tags that should not leak draft JSON
10
14
  - negative text with no recoverable JSON
11
15
 
12
16
  Each `.txt` file has a matching expected JSON file under `fixtures/expected/`.
@@ -0,0 +1,9 @@
1
+ I'll return the object first so it is easy to parse.
2
+
3
+ {
4
+ "provider": "anthropic",
5
+ "safe": true,
6
+ "note": "the literal } character is part of this string"
7
+ }
8
+
9
+ Let me know if you want the schema too.
@@ -0,0 +1,9 @@
1
+ {
2
+ "found": true,
3
+ "expect": "object",
4
+ "value": {
5
+ "provider": "anthropic",
6
+ "safe": true,
7
+ "note": "the literal } character is part of this string"
8
+ }
9
+ }
@@ -0,0 +1,8 @@
1
+ {
2
+ "found": true,
3
+ "expect": "object",
4
+ "value": {
5
+ "score": 8,
6
+ "reason": "clear"
7
+ }
8
+ }
@@ -0,0 +1,8 @@
1
+ {
2
+ "found": true,
3
+ "expect": "object",
4
+ "value": {
5
+ "score": 8,
6
+ "draft": false
7
+ }
8
+ }
@@ -0,0 +1,11 @@
1
+ {
2
+ "found": true,
3
+ "expect": "object",
4
+ "value": {
5
+ "provider": "openai",
6
+ "classification": {
7
+ "label": "ship",
8
+ "confidence": 0.92
9
+ }
10
+ }
11
+ }
@@ -0,0 +1,4 @@
1
+ {
2
+ "found": false,
3
+ "expect": "object"
4
+ }
@@ -0,0 +1,4 @@
1
+ {
2
+ "found": false,
3
+ "expect": "object"
4
+ }
@@ -0,0 +1,7 @@
1
+ Draft:
2
+
3
+ {score: 7, reason: "rough"}
4
+
5
+ Final JSON:
6
+
7
+ {"score":8,"reason":"clear"}
@@ -0,0 +1,11 @@
1
+ The first fenced block is a draft note, not the payload.
2
+
3
+ ```json
4
+ The draft object was {"score": 4, "draft": true}, but do not use it.
5
+ ```
6
+
7
+ Final JSON:
8
+
9
+ ```json
10
+ {"score":8,"draft":false}
11
+ ```
@@ -0,0 +1,11 @@
1
+ Here is the JSON you requested:
2
+
3
+ ```json
4
+ {
5
+ "provider": "openai",
6
+ "classification": {
7
+ "label": "ship",
8
+ "confidence": 0.92
9
+ }
10
+ }
11
+ ```
@@ -0,0 +1,6 @@
1
+ The provider stream ended before the top-level object closed.
2
+
3
+ ```json
4
+ {
5
+ "items": [
6
+ { "id": "draft", "score": 4 }
@@ -0,0 +1,3 @@
1
+ <thinking>
2
+ {"draft": true, "score": 4}
3
+ Still reasoning; no final answer was emitted.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "json-from-llm",
3
- "version": "0.2.0",
3
+ "version": "0.2.1",
4
4
  "description": "Extract valid JSON from an LLM response, even when it is wrapped in reasoning/thinking tags, markdown fences or prose. Zero dependencies.",
5
5
  "keywords": [
6
6
  "llm",