json-from-llm 0.2.0 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +21 -0
- package/README.md +52 -6
- package/dist/cli.js +51 -16
- package/dist/cli.js.map +1 -1
- package/dist/index.cjs +51 -16
- package/dist/index.cjs.map +1 -1
- package/dist/index.d.cts +2 -2
- package/dist/index.d.ts +2 -2
- package/dist/index.js +51 -16
- package/dist/index.js.map +1 -1
- package/fixtures/README.md +4 -0
- package/fixtures/anthropic-prose-object.txt +9 -0
- package/fixtures/expected/anthropic-prose-object.json +9 -0
- package/fixtures/expected/malformed-draft-valid-final.json +8 -0
- package/fixtures/expected/multiple-fenced-final.json +8 -0
- package/fixtures/expected/openai-fenced-object.json +11 -0
- package/fixtures/expected/truncated-stream-no-json.json +4 -0
- package/fixtures/expected/unclosed-thinking-no-json.json +4 -0
- package/fixtures/malformed-draft-valid-final.txt +7 -0
- package/fixtures/multiple-fenced-final.txt +11 -0
- package/fixtures/openai-fenced-object.txt +11 -0
- package/fixtures/truncated-stream-no-json.txt +6 -0
- package/fixtures/unclosed-thinking-no-json.txt +3 -0
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,27 @@ All notable changes to this project are documented here. The format follows
|
|
|
4
4
|
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and the project adheres
|
|
5
5
|
to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
6
6
|
|
|
7
|
+
## [Unreleased]
|
|
8
|
+
|
|
9
|
+
## [0.2.1] - 2026-06-07
|
|
10
|
+
|
|
11
|
+
### Added
|
|
12
|
+
|
|
13
|
+
- Added provider-style fixtures for fenced OpenAI-like output, multiple fenced
|
|
14
|
+
candidates, Anthropic-like prose wrappers, malformed drafts before final JSON,
|
|
15
|
+
truncated streams and unclosed reasoning blocks.
|
|
16
|
+
- Added edge-case coverage for BOM/whitespace, escaped braces in strings,
|
|
17
|
+
deeply nested JSON and partial malformed input.
|
|
18
|
+
|
|
19
|
+
### Changed
|
|
20
|
+
|
|
21
|
+
- Made balanced scanning delimiter-aware so mismatched `{` / `]` drafts and
|
|
22
|
+
truncated JSON-looking containers are skipped as malformed candidates.
|
|
23
|
+
- Prefer complete fenced payloads across all JSON-ish fences before falling back
|
|
24
|
+
to lower-confidence balanced fragments inside fenced prose.
|
|
25
|
+
- Treat unclosed reasoning tags as reasoning through the end of the text to
|
|
26
|
+
avoid extracting valid-looking draft JSON.
|
|
27
|
+
|
|
7
28
|
## [0.2.0] - 2026-06-05
|
|
8
29
|
|
|
9
30
|
### Added
|
package/README.md
CHANGED
|
@@ -35,9 +35,9 @@ const data = extractJson<{ score: number }>(modelOutput);
|
|
|
35
35
|
|
|
36
36
|
## Why
|
|
37
37
|
|
|
38
|
-
- **Reasoning-model aware.** Strips `<think>` / `<thinking>` blocks first, so brace-laden reasoning (a real cause of `No object generated` failures with DeepSeek R1, Gemini 2.5 thinking, prompted Claude) never gets mistaken for the payload.
|
|
38
|
+
- **Reasoning-model aware.** Strips `<think>` / `<thinking>` blocks first, including unclosed reasoning prefixes, so brace-laden reasoning (a real cause of `No object generated` failures with DeepSeek R1, Gemini 2.5 thinking, prompted Claude) never gets mistaken for the payload.
|
|
39
39
|
- **Handles the real wrappers.** Markdown fences (`json` and bare ```), conversational prose before/after, and the JSON sitting bare in the text.
|
|
40
|
-
- **String-aware, never corrupts.** The scanner and the trailing-comma repair both respect string contents — a `}` or `,` inside `"a string value"` is left alone.
|
|
40
|
+
- **String-aware, delimiter-aware, never corrupts.** The scanner and the trailing-comma repair both respect string contents — a `}` or `,` inside `"a string value"` is left alone, and mismatched or truncated JSON-looking drafts are skipped.
|
|
41
41
|
- **Conservative repair.** Removes trailing commas (the most common malformation); it will never rewrite your data.
|
|
42
42
|
- **Fixture-backed edge cases.** Public fixtures cover reasoning tags, fenced JSON, prose wrappers, trailing commas, top-level type expectations and no-JSON failures.
|
|
43
43
|
- **Two library entry points + CLI.** `extractJson` throws on failure; `tryExtractJson` returns `{ found }`; `json-from-llm` reads stdin for shell pipelines.
|
|
@@ -107,23 +107,69 @@ interface ExtractOptions {
|
|
|
107
107
|
extractJson('[1,2] then the answer {"a":1}', { expect: 'object' }); // { a: 1 }
|
|
108
108
|
```
|
|
109
109
|
|
|
110
|
+
### Provider-style snippets
|
|
111
|
+
|
|
112
|
+
OpenAI-style fenced output:
|
|
113
|
+
|
|
114
|
+
`````ts
|
|
115
|
+
const value = extractJson<{ score: number }>(
|
|
116
|
+
`Here is the JSON:
|
|
117
|
+
```json
|
|
118
|
+
{"score":8,"reason":"clear"}
|
|
119
|
+
````,
|
|
120
|
+
{ expect: 'object' },
|
|
121
|
+
);
|
|
122
|
+
`````
|
|
123
|
+
|
|
124
|
+
Anthropic-style prose around the object:
|
|
125
|
+
|
|
126
|
+
```ts
|
|
127
|
+
const result = tryExtractJson<{ safe: boolean }>(
|
|
128
|
+
'I will return the object first.\n{"safe":true}\nLet me know if you need more.',
|
|
129
|
+
{ expect: 'object' },
|
|
130
|
+
);
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
Gemini-style thinking plus a top-level array:
|
|
134
|
+
|
|
135
|
+
```ts
|
|
136
|
+
const items = extractJson<Array<{ id: string }>>(
|
|
137
|
+
'<thinking>{draft: true}</thinking>\n[{"id":"a"}]',
|
|
138
|
+
{ expect: 'array' },
|
|
139
|
+
);
|
|
140
|
+
```
|
|
141
|
+
|
|
110
142
|
### Algorithm
|
|
111
143
|
|
|
112
|
-
1. Strip `<think>` / `<thinking>` / `<reasoning>` blocks.
|
|
113
|
-
2. Prefer
|
|
114
|
-
3.
|
|
115
|
-
4.
|
|
144
|
+
1. Strip `<think>` / `<thinking>` / `<reasoning>` blocks. If a reasoning tag is opened and never closed, treat the rest as reasoning.
|
|
145
|
+
2. Prefer complete contents of fenced `json` (or bare) code blocks.
|
|
146
|
+
3. If a fence contains prose, scan inside those fences for balanced JSON after complete fence payloads have been tried.
|
|
147
|
+
4. Otherwise scan for the first balanced `{…}` / `[…]` that parses, string-aware and delimiter-aware.
|
|
148
|
+
5. If parsing fails, apply conservative repair (trailing commas) and retry.
|
|
116
149
|
|
|
117
150
|
The low-level pieces (`stripReasoning`, `fencedBlocks`, `balancedSpans`, `removeTrailingCommas`) are exported too.
|
|
118
151
|
|
|
152
|
+
### Caveats
|
|
153
|
+
|
|
154
|
+
- TypeScript generics do not validate runtime shape. Pair this with your schema validator when fields matter.
|
|
155
|
+
- Repair is intentionally narrow: trailing commas only. It will not convert JSON5, comments, single quotes or unquoted keys.
|
|
156
|
+
- Candidate order is deterministic: JSON-ish fences first, then balanced objects/arrays in document order, filtered by `expect`.
|
|
157
|
+
- Unclosed reasoning tags return no JSON from that suffix instead of risking a draft extraction.
|
|
158
|
+
|
|
119
159
|
## Fixture corpus
|
|
120
160
|
|
|
121
161
|
The package includes a small public corpus under [`fixtures/`](./fixtures):
|
|
122
162
|
|
|
123
163
|
- `deepseek-thinking-object.txt`
|
|
124
164
|
- `gemini-reasoning-array.txt`
|
|
165
|
+
- `openai-fenced-object.txt`
|
|
166
|
+
- `multiple-fenced-final.txt`
|
|
167
|
+
- `anthropic-prose-object.txt`
|
|
125
168
|
- `prose-trailing-commas.txt`
|
|
169
|
+
- `malformed-draft-valid-final.txt`
|
|
126
170
|
- `expect-object-skips-array.txt`
|
|
171
|
+
- `truncated-stream-no-json.txt`
|
|
172
|
+
- `unclosed-thinking-no-json.txt`
|
|
127
173
|
- `no-json.txt`
|
|
128
174
|
- expected `tryExtractJson` outputs under `fixtures/expected/`
|
|
129
175
|
|
package/dist/cli.js
CHANGED
|
@@ -48,19 +48,21 @@ function balancedSpans(text) {
|
|
|
48
48
|
while (i < text.length) {
|
|
49
49
|
const ch = text[i];
|
|
50
50
|
if (ch === "{" || ch === "[") {
|
|
51
|
-
const
|
|
52
|
-
if (end !== -1) {
|
|
53
|
-
spans.push(text.slice(i, end));
|
|
54
|
-
i = end;
|
|
51
|
+
const match = matchBalanced(text, i);
|
|
52
|
+
if (match.end !== -1) {
|
|
53
|
+
spans.push(text.slice(i, match.end));
|
|
54
|
+
i = match.end;
|
|
55
55
|
continue;
|
|
56
56
|
}
|
|
57
|
+
i = Math.max(match.resume, i + 1);
|
|
58
|
+
continue;
|
|
57
59
|
}
|
|
58
60
|
i++;
|
|
59
61
|
}
|
|
60
62
|
return spans;
|
|
61
63
|
}
|
|
62
64
|
function matchBalanced(text, start) {
|
|
63
|
-
|
|
65
|
+
const expectedClosers = [];
|
|
64
66
|
let inString = false;
|
|
65
67
|
let escaped = false;
|
|
66
68
|
for (let i = start; i < text.length; i++) {
|
|
@@ -77,22 +79,53 @@ function matchBalanced(text, start) {
|
|
|
77
79
|
}
|
|
78
80
|
if (ch === '"') {
|
|
79
81
|
inString = true;
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
82
|
+
continue;
|
|
83
|
+
}
|
|
84
|
+
if (ch === "{") {
|
|
85
|
+
expectedClosers.push("}");
|
|
86
|
+
continue;
|
|
87
|
+
}
|
|
88
|
+
if (ch === "[") {
|
|
89
|
+
expectedClosers.push("]");
|
|
90
|
+
continue;
|
|
91
|
+
}
|
|
92
|
+
if (ch === "}" || ch === "]") {
|
|
93
|
+
if (expectedClosers.pop() !== ch) {
|
|
94
|
+
return { end: -1, resume: i + 1 };
|
|
95
|
+
}
|
|
96
|
+
if (expectedClosers.length === 0) {
|
|
97
|
+
return { end: i + 1, resume: i + 1 };
|
|
86
98
|
}
|
|
87
99
|
}
|
|
88
100
|
}
|
|
89
|
-
return
|
|
101
|
+
return {
|
|
102
|
+
end: -1,
|
|
103
|
+
resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1
|
|
104
|
+
};
|
|
105
|
+
}
|
|
106
|
+
function looksLikeJsonContainerStart(text, start) {
|
|
107
|
+
let index = start + 1;
|
|
108
|
+
while (index < text.length && /\s/.test(text[index])) {
|
|
109
|
+
index++;
|
|
110
|
+
}
|
|
111
|
+
const next = text[index];
|
|
112
|
+
if (text[start] === "{") {
|
|
113
|
+
return next === '"' || next === "}";
|
|
114
|
+
}
|
|
115
|
+
return next === void 0 || next === "[" || next === "{" || next === '"' || next === "]" || next === "-" || next >= "0" && next <= "9" || next === "t" || next === "f" || next === "n";
|
|
90
116
|
}
|
|
91
117
|
|
|
92
118
|
// src/strip.ts
|
|
93
|
-
var
|
|
119
|
+
var CLOSED_REASONING_BLOCK = /<(think|thinking|reasoning|thought)\b[^>]*>[\s\S]*?<\/\1>/gi;
|
|
120
|
+
var OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\b[^>]*>/gi;
|
|
94
121
|
function stripReasoning(text) {
|
|
95
|
-
|
|
122
|
+
const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, "");
|
|
123
|
+
OPEN_REASONING_TAG.lastIndex = 0;
|
|
124
|
+
const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);
|
|
125
|
+
if (!unclosed) {
|
|
126
|
+
return withoutClosedBlocks;
|
|
127
|
+
}
|
|
128
|
+
return withoutClosedBlocks.slice(0, unclosed.index);
|
|
96
129
|
}
|
|
97
130
|
var FENCE = /```[^\S\n]*([a-zA-Z0-9_+-]*)[^\S\n]*\n?([\s\S]*?)```/g;
|
|
98
131
|
function fencedBlocks(text) {
|
|
@@ -150,8 +183,10 @@ function tryExtractJson(text, options = {}) {
|
|
|
150
183
|
const expect = options.expect ?? "any";
|
|
151
184
|
const cleaned = stripReasoning(text);
|
|
152
185
|
const candidates = [];
|
|
153
|
-
|
|
154
|
-
|
|
186
|
+
const blocks = fencedBlocks(cleaned);
|
|
187
|
+
candidates.push(...blocks);
|
|
188
|
+
for (const block of blocks) {
|
|
189
|
+
candidates.push(...balancedSpans(block));
|
|
155
190
|
}
|
|
156
191
|
candidates.push(...balancedSpans(cleaned));
|
|
157
192
|
for (const candidate of candidates) {
|
package/dist/cli.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"sources":["../src/cli.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["#!/usr/bin/env node\nimport { realpathSync } from 'node:fs';\nimport { fileURLToPath } from 'node:url';\nimport { extractJson } from './extract.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions } from './types.ts';\n\nexport interface CliStreams {\n stdout: (chunk: string) => void;\n stderr: (chunk: string) => void;\n}\n\ninterface CliConfig extends ExtractOptions {\n help: boolean;\n}\n\nconst usage = `Usage: json-from-llm [--expect object|array|any] [--no-repair]\n\nRead LLM output from stdin and print the extracted JSON value to stdout.\n\nOptions:\n --expect <type> Require the top-level JSON value to be object, array or any.\n --no-repair Disable conservative trailing-comma repair.\n -h, --help Show this help text.\n`;\n\nfunction parseArgs(args: string[]): CliConfig | string {\n const config: CliConfig = { help: false };\n\n for (let index = 0; index < args.length; index += 1) {\n const arg = args[index];\n\n if (arg === '-h' || arg === '--help') {\n config.help = true;\n continue;\n }\n\n if (arg === '--no-repair') {\n config.repair = false;\n continue;\n }\n\n if (arg === '--expect') {\n const value = args[index + 1];\n if (value !== 'object' && value !== 'array' && value !== 'any') {\n return 'invalid --expect value; use object, array or any';\n }\n config.expect = value;\n index += 1;\n continue;\n }\n\n return `unknown option: ${arg}`;\n }\n\n return config;\n}\n\nexport async function runCli(\n args: string[],\n stdin: string,\n streams: CliStreams,\n): Promise<number> {\n const config = parseArgs(args);\n\n if (typeof config === 'string') {\n streams.stderr(`json-from-llm: ${config}\\n`);\n return 2;\n }\n\n if (config.help) {\n streams.stdout(usage);\n return 0;\n }\n\n try {\n const value = extractJson(stdin, config);\n 
streams.stdout(`${JSON.stringify(value)}\\n`);\n return 0;\n } catch (error) {\n if (error instanceof JsonExtractionError) {\n streams.stderr('json-from-llm: no JSON found\\n');\n return 1;\n }\n\n streams.stderr(\n `json-from-llm: ${error instanceof Error ? error.message : String(error)}\\n`,\n );\n return 1;\n }\n}\n\nasync function readStdin(): Promise<string> {\n process.stdin.setEncoding('utf8');\n let input = '';\n for await (const chunk of process.stdin) {\n input += chunk;\n }\n return input;\n}\n\nexport function isExecutedFile(moduleUrl: string, argvPath?: string): boolean {\n if (!argvPath) {\n return false;\n }\n\n try {\n return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(argvPath);\n } catch {\n return false;\n }\n}\n\nfunction isMain(): boolean {\n return isExecutedFile(import.meta.url, process.argv[1]);\n}\n\nasync function main(): Promise<void> {\n process.exitCode = await runCli(process.argv.slice(2), await readStdin(), {\n stdout: (chunk) => {\n process.stdout.write(chunk);\n },\n stderr: (chunk) => {\n process.stderr.write(chunk);\n },\n });\n}\n\nif (isMain()) {\n void main();\n}\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. 
String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware: braces and brackets inside JSON strings do\n * not affect nesting, so prose like `\"the } char\"` won't break the scan.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const end = matchBalanced(text, i);\n if (end !== -1) {\n spans.push(text.slice(i, end));\n i = end;\n continue;\n }\n }\n i++;\n }\n return spans;\n}\n\n/** Return the index just past the balanced value starting at `start`, or -1. 
*/\nfunction matchBalanced(text: string, start: number): number {\n let depth = 0;\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n } else if (ch === '{' || ch === '[') {\n depth++;\n } else if (ch === '}' || ch === ']') {\n depth--;\n if (depth === 0) {\n return i + 1;\n }\n }\n }\n\n return -1;\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n */\nconst REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\\s\\S]*?<\\/\\1>/gi;\n\nexport function stripReasoning(text: string): string {\n return text.replace(REASONING_TAGS, '');\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. 
*/\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * 
code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n for (const block of fencedBlocks(cleaned)) {\n candidates.push(block, ...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";;;AACA,SAAS,oBAAoB;AAC7B,SAAS,qBAAqB;;;ACGvB,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,MAAM,cAAc,MAAM,CAAC;AACjC,UAAI,QAAQ,IAAI;AACd,cAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAC7B,YAAI;AACJ;AAAA,MACF;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AAGA,SAAS,cAAc,MAAc,OAAuB;AAC1D,MAAI,QAAQ;AACZ,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AAAA,IACb,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AAAA,IACF,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AACA,UAAI,UAAU,GAAG;AACf,eAAO,IAAI;AAAA,MACb;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AACT;;;AClDA,IAAM,iBAAiB;AAEhB,SAAS,eAAe,MAAsB;AACnD,SAAO,KAAK,QAAQ,gBAAgB,EAAE;AACxC;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MA
AM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACVO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,aAAW,SAAS,aAAa,OAAO,GAAG;AACzC,eAAW,KAAK,OAAO,GAAG,cAAc,KAAK,CAAC;AAAA,EAChD;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;;;ALhFA,IAAM,QAAQ;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAUd,SAAS,UAAU,MAAoC;AACrD,QAAM,SAAoB,EAAE,MAAM,MAAM;AAExC,WAAS,QAAQ,GAAG,QAAQ,KAAK,QAAQ,SAAS,GAAG;AACnD,UAAM,MAAM,KAAK,KAAK;AAEtB,QAAI,QAAQ,QAAQ,QAAQ,UAAU;AACpC,aAAO,OAAO;AACd;AAAA,IACF;AAEA,QAAI,QAAQ,eAAe;AACzB,aAAO,SAAS;AAChB;AAAA,IACF;AAEA,QAAI,QAAQ,YAAY;AACtB,YAAM,QAAQ,KAAK,QAAQ,CAAC;AAC5B,UAAI,UAAU,YAAY,UAAU,WAAW,UAAU,OAAO;AAC9D,eAAO;AAAA,MACT;AACA,aAAO,SAAS;AAChB,eAAS;AACT;AAAA,IACF;AAEA,WAAO,mBAAmB,GAAG;AAAA,EAC/B;AAEA,SAAO;AACT;AAEA,eAAsB,OACpB,MACA,OACA,SACiB;AACjB,QAAM,SA
AS,UAAU,IAAI;AAE7B,MAAI,OAAO,WAAW,UAAU;AAC9B,YAAQ,OAAO,kBAAkB,MAAM;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT;AAEA,MAAI,OAAO,MAAM;AACf,YAAQ,OAAO,KAAK;AACpB,WAAO;AAAA,EACT;AAEA,MAAI;AACF,UAAM,QAAQ,YAAY,OAAO,MAAM;AACvC,YAAQ,OAAO,GAAG,KAAK,UAAU,KAAK,CAAC;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT,SAAS,OAAO;AACd,QAAI,iBAAiB,qBAAqB;AACxC,cAAQ,OAAO,gCAAgC;AAC/C,aAAO;AAAA,IACT;AAEA,YAAQ;AAAA,MACN,kBAAkB,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK,CAAC;AAAA;AAAA,IAC1E;AACA,WAAO;AAAA,EACT;AACF;AAEA,eAAe,YAA6B;AAC1C,UAAQ,MAAM,YAAY,MAAM;AAChC,MAAI,QAAQ;AACZ,mBAAiB,SAAS,QAAQ,OAAO;AACvC,aAAS;AAAA,EACX;AACA,SAAO;AACT;AAEO,SAAS,eAAe,WAAmB,UAA4B;AAC5E,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,MAAI;AACF,WAAO,aAAa,cAAc,SAAS,CAAC,MAAM,aAAa,QAAQ;AAAA,EACzE,QAAQ;AACN,WAAO;AAAA,EACT;AACF;AAEA,SAAS,SAAkB;AACzB,SAAO,eAAe,YAAY,KAAK,QAAQ,KAAK,CAAC,CAAC;AACxD;AAEA,eAAe,OAAsB;AACnC,UAAQ,WAAW,MAAM,OAAO,QAAQ,KAAK,MAAM,CAAC,GAAG,MAAM,UAAU,GAAG;AAAA,IACxE,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,IACA,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,EACF,CAAC;AACH;AAEA,IAAI,OAAO,GAAG;AACZ,OAAK,KAAK;AACZ;","names":[]}
|
|
1
|
+
{"version":3,"sources":["../src/cli.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["#!/usr/bin/env node\nimport { realpathSync } from 'node:fs';\nimport { fileURLToPath } from 'node:url';\nimport { extractJson } from './extract.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions } from './types.ts';\n\nexport interface CliStreams {\n stdout: (chunk: string) => void;\n stderr: (chunk: string) => void;\n}\n\ninterface CliConfig extends ExtractOptions {\n help: boolean;\n}\n\nconst usage = `Usage: json-from-llm [--expect object|array|any] [--no-repair]\n\nRead LLM output from stdin and print the extracted JSON value to stdout.\n\nOptions:\n --expect <type> Require the top-level JSON value to be object, array or any.\n --no-repair Disable conservative trailing-comma repair.\n -h, --help Show this help text.\n`;\n\nfunction parseArgs(args: string[]): CliConfig | string {\n const config: CliConfig = { help: false };\n\n for (let index = 0; index < args.length; index += 1) {\n const arg = args[index];\n\n if (arg === '-h' || arg === '--help') {\n config.help = true;\n continue;\n }\n\n if (arg === '--no-repair') {\n config.repair = false;\n continue;\n }\n\n if (arg === '--expect') {\n const value = args[index + 1];\n if (value !== 'object' && value !== 'array' && value !== 'any') {\n return 'invalid --expect value; use object, array or any';\n }\n config.expect = value;\n index += 1;\n continue;\n }\n\n return `unknown option: ${arg}`;\n }\n\n return config;\n}\n\nexport async function runCli(\n args: string[],\n stdin: string,\n streams: CliStreams,\n): Promise<number> {\n const config = parseArgs(args);\n\n if (typeof config === 'string') {\n streams.stderr(`json-from-llm: ${config}\\n`);\n return 2;\n }\n\n if (config.help) {\n streams.stdout(usage);\n return 0;\n }\n\n try {\n const value = extractJson(stdin, config);\n 
streams.stdout(`${JSON.stringify(value)}\\n`);\n return 0;\n } catch (error) {\n if (error instanceof JsonExtractionError) {\n streams.stderr('json-from-llm: no JSON found\\n');\n return 1;\n }\n\n streams.stderr(\n `json-from-llm: ${error instanceof Error ? error.message : String(error)}\\n`,\n );\n return 1;\n }\n}\n\nasync function readStdin(): Promise<string> {\n process.stdin.setEncoding('utf8');\n let input = '';\n for await (const chunk of process.stdin) {\n input += chunk;\n }\n return input;\n}\n\nexport function isExecutedFile(moduleUrl: string, argvPath?: string): boolean {\n if (!argvPath) {\n return false;\n }\n\n try {\n return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(argvPath);\n } catch {\n return false;\n }\n}\n\nfunction isMain(): boolean {\n return isExecutedFile(import.meta.url, process.argv[1]);\n}\n\nasync function main(): Promise<void> {\n process.exitCode = await runCli(process.argv.slice(2), await readStdin(), {\n stdout: (chunk) => {\n process.stdout.write(chunk);\n },\n stderr: (chunk) => {\n process.stderr.write(chunk);\n },\n });\n}\n\nif (isMain()) {\n void main();\n}\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. 
String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware and delimiter-aware: braces and brackets\n * inside JSON strings do not affect nesting, and `[` must close with `]`.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const match = matchBalanced(text, i);\n if (match.end !== -1) {\n spans.push(text.slice(i, match.end));\n i = match.end;\n continue;\n }\n\n i = Math.max(match.resume, i + 1);\n continue;\n }\n i++;\n }\n return spans;\n}\n\ninterface MatchResult {\n /** Index just past the balanced value, or -1 when no complete value exists. */\n end: number;\n /** Next scan index after a malformed or incomplete candidate. 
*/\n resume: number;\n}\n\nfunction matchBalanced(text: string, start: number): MatchResult {\n const expectedClosers: string[] = [];\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n continue;\n }\n\n if (ch === '{') {\n expectedClosers.push('}');\n continue;\n }\n\n if (ch === '[') {\n expectedClosers.push(']');\n continue;\n }\n\n if (ch === '}' || ch === ']') {\n if (expectedClosers.pop() !== ch) {\n return { end: -1, resume: i + 1 };\n }\n if (expectedClosers.length === 0) {\n return { end: i + 1, resume: i + 1 };\n }\n }\n }\n\n return {\n end: -1,\n resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1,\n };\n}\n\nfunction looksLikeJsonContainerStart(text: string, start: number): boolean {\n let index = start + 1;\n while (index < text.length && /\\s/.test(text[index])) {\n index++;\n }\n\n const next = text[index];\n if (text[start] === '{') {\n return next === '\"' || next === '}';\n }\n\n return (\n next === undefined ||\n next === '[' ||\n next === '{' ||\n next === '\"' ||\n next === ']' ||\n next === '-' ||\n (next >= '0' && next <= '9') ||\n next === 't' ||\n next === 'f' ||\n next === 'n'\n );\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n *\n * If a reasoning tag is opened but not closed, treat the rest of the text as\n * reasoning. 
Returning no JSON is safer than extracting a valid-looking draft.\n */\nconst CLOSED_REASONING_BLOCK =\n /<(think|thinking|reasoning|thought)\\b[^>]*>[\\s\\S]*?<\\/\\1>/gi;\nconst OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\\b[^>]*>/gi;\n\nexport function stripReasoning(text: string): string {\n const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, '');\n OPEN_REASONING_TAG.lastIndex = 0;\n const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);\n\n if (!unclosed) {\n return withoutClosedBlocks;\n }\n\n return withoutClosedBlocks.slice(0, unclosed.index);\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. */\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. 
*/\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). 
Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n const blocks = fencedBlocks(cleaned);\n candidates.push(...blocks);\n for (const block of blocks) {\n candidates.push(...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";;;AACA,SAAS,oBAAoB;AAC7B,SAAS,qBAAqB;;;ACGvB,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,QAAQ,cAAc,MAAM,CAAC;AACnC,UAAI,MAAM,QAAQ,IAAI;AACpB,cAAM,KAAK,KAAK,MAAM,GAAG,MAAM,GAAG,CAAC;AACnC,YAAI,MAAM;AACV;AAAA,MACF;AAEA,UAAI,KAAK,IAAI,MAAM,QAAQ,IAAI,CAAC;AAChC;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AASA,SAAS,cAAc,MAAc,OAA4B;AAC/D,QAAM,kBAA4B,CAAC;AACnC,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,UAAI,gBAAgB,IAAI,MAAM,IAAI;AAChC,eAAO,EAAE,KAAK,IAAI,QAAQ,IAAI,EAAE;AAAA,MAClC;AACA,UAAI,gBAAgB,WAAW,GAAG;AAChC,eAAO,EAAE,KAAK,I
AAI,GAAG,QAAQ,IAAI,EAAE;AAAA,MACrC;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AAAA,IACL,KAAK;AAAA,IACL,QAAQ,4BAA4B,MAAM,KAAK,IAAI,KAAK,SAAS,QAAQ;AAAA,EAC3E;AACF;AAEA,SAAS,4BAA4B,MAAc,OAAwB;AACzE,MAAI,QAAQ,QAAQ;AACpB,SAAO,QAAQ,KAAK,UAAU,KAAK,KAAK,KAAK,KAAK,CAAC,GAAG;AACpD;AAAA,EACF;AAEA,QAAM,OAAO,KAAK,KAAK;AACvB,MAAI,KAAK,KAAK,MAAM,KAAK;AACvB,WAAO,SAAS,OAAO,SAAS;AAAA,EAClC;AAEA,SACE,SAAS,UACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACR,QAAQ,OAAO,QAAQ,OACxB,SAAS,OACT,SAAS,OACT,SAAS;AAEb;;;ACjGA,IAAM,yBACJ;AACF,IAAM,qBAAqB;AAEpB,SAAS,eAAe,MAAsB;AACnD,QAAM,sBAAsB,KAAK,QAAQ,wBAAwB,EAAE;AACnE,qBAAmB,YAAY;AAC/B,QAAM,WAAW,mBAAmB,KAAK,mBAAmB;AAE5D,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,SAAO,oBAAoB,MAAM,GAAG,SAAS,KAAK;AACpD;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACvBO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,QAAM,SAAS,aAAa,OAAO;AACnC,aAAW,KAAK,GAAG,MAAM;AACzB,aAAW,SAAS,QAAQ;AAC1B,eAAW,KAAK,GAAG,cAAc,KAAK,CAAC;AAAA,EACzC;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UA
AM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;;;ALlFA,IAAM,QAAQ;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAUd,SAAS,UAAU,MAAoC;AACrD,QAAM,SAAoB,EAAE,MAAM,MAAM;AAExC,WAAS,QAAQ,GAAG,QAAQ,KAAK,QAAQ,SAAS,GAAG;AACnD,UAAM,MAAM,KAAK,KAAK;AAEtB,QAAI,QAAQ,QAAQ,QAAQ,UAAU;AACpC,aAAO,OAAO;AACd;AAAA,IACF;AAEA,QAAI,QAAQ,eAAe;AACzB,aAAO,SAAS;AAChB;AAAA,IACF;AAEA,QAAI,QAAQ,YAAY;AACtB,YAAM,QAAQ,KAAK,QAAQ,CAAC;AAC5B,UAAI,UAAU,YAAY,UAAU,WAAW,UAAU,OAAO;AAC9D,eAAO;AAAA,MACT;AACA,aAAO,SAAS;AAChB,eAAS;AACT;AAAA,IACF;AAEA,WAAO,mBAAmB,GAAG;AAAA,EAC/B;AAEA,SAAO;AACT;AAEA,eAAsB,OACpB,MACA,OACA,SACiB;AACjB,QAAM,SAAS,UAAU,IAAI;AAE7B,MAAI,OAAO,WAAW,UAAU;AAC9B,YAAQ,OAAO,kBAAkB,MAAM;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT;AAEA,MAAI,OAAO,MAAM;AACf,YAAQ,OAAO,KAAK;AACpB,WAAO;AAAA,EACT;AAEA,MAAI;AACF,UAAM,QAAQ,YAAY,OAAO,MAAM;AACvC,YAAQ,OAAO,GAAG,KAAK,UAAU,KAAK,CAAC;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT,SAAS,OAAO;AACd,QAAI,iBAAiB,qBAAqB;AACxC,cAAQ,OAAO,gCAAgC;AAC/C,aAAO;AAAA,IACT;AAEA,YAAQ;AAAA,MACN,kBAAkB,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK,CAAC;AAAA;AAAA,IAC1E;AACA,WAAO;AAAA,EACT;AACF;AAEA,eAAe,YAA6B;AAC1C,UAAQ,MAAM,YAAY,MAAM;AAChC,MAAI,QAAQ;AACZ,mBAAiB,SAAS,QAAQ,OAAO;AACvC,aAAS;AAAA,EACX;AACA,SAAO;AACT;AAEO,SAAS,eAAe,WAAmB,UAA4B;AAC5E,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,MAAI;AACF,WAAO,aAAa,cAAc,SAAS,CAAC,MAAM,aAAa,QAAQ;AAAA,EACzE,QAAQ;AACN,WAAO;AAAA,EACT;AACF;AAEA,SAAS,SAAkB;AACzB,SAAO,eAAe,YAAY,KAAK,QAAQ,KAAK,CAAC,CAAC;AACxD;AAEA,eAAe,OAAsB;AACnC,UAAQ,WAAW,MAAM,OAAO,QAAQ,KAAK,MAAM,CAAC,GAAG,MAAM,UAAU,GAAG;AAAA,IACxE,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,IACA,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,EACF,CAAC;AACH;AAEA,IAAI,OAAO,GAAG;AACZ,OAAK,KAAK;AACZ;","names":[]}
|
package/dist/index.cjs
CHANGED
|
@@ -74,19 +74,21 @@ function balancedSpans(text) {
   while (i < text.length) {
     const ch = text[i];
     if (ch === "{" || ch === "[") {
-      const
-      if (end !== -1) {
-        spans.push(text.slice(i, end));
-        i = end;
+      const match = matchBalanced(text, i);
+      if (match.end !== -1) {
+        spans.push(text.slice(i, match.end));
+        i = match.end;
         continue;
       }
+      i = Math.max(match.resume, i + 1);
+      continue;
     }
     i++;
   }
   return spans;
 }
 function matchBalanced(text, start) {
-
+  const expectedClosers = [];
   let inString = false;
   let escaped = false;
   for (let i = start; i < text.length; i++) {
@@ -103,22 +105,53 @@ function matchBalanced(text, start) {
     }
     if (ch === '"') {
       inString = true;
-
-
-
-
-
-
+      continue;
+    }
+    if (ch === "{") {
+      expectedClosers.push("}");
+      continue;
+    }
+    if (ch === "[") {
+      expectedClosers.push("]");
+      continue;
+    }
+    if (ch === "}" || ch === "]") {
+      if (expectedClosers.pop() !== ch) {
+        return { end: -1, resume: i + 1 };
+      }
+      if (expectedClosers.length === 0) {
+        return { end: i + 1, resume: i + 1 };
       }
     }
   }
-  return
+  return {
+    end: -1,
+    resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1
+  };
+}
+function looksLikeJsonContainerStart(text, start) {
+  let index = start + 1;
+  while (index < text.length && /\s/.test(text[index])) {
+    index++;
+  }
+  const next = text[index];
+  if (text[start] === "{") {
+    return next === '"' || next === "}";
+  }
+  return next === void 0 || next === "[" || next === "{" || next === '"' || next === "]" || next === "-" || next >= "0" && next <= "9" || next === "t" || next === "f" || next === "n";
 }
 
 // src/strip.ts
-var
+var CLOSED_REASONING_BLOCK = /<(think|thinking|reasoning|thought)\b[^>]*>[\s\S]*?<\/\1>/gi;
+var OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\b[^>]*>/gi;
 function stripReasoning(text) {
-
+  const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, "");
+  OPEN_REASONING_TAG.lastIndex = 0;
+  const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);
+  if (!unclosed) {
+    return withoutClosedBlocks;
+  }
+  return withoutClosedBlocks.slice(0, unclosed.index);
 }
 var FENCE = /```[^\S\n]*([a-zA-Z0-9_+-]*)[^\S\n]*\n?([\s\S]*?)```/g;
 function fencedBlocks(text) {
@@ -176,8 +209,10 @@ function tryExtractJson(text, options = {}) {
   const expect = options.expect ?? "any";
   const cleaned = stripReasoning(text);
   const candidates = [];
-
-
+  const blocks = fencedBlocks(cleaned);
+  candidates.push(...blocks);
+  for (const block of blocks) {
+    candidates.push(...balancedSpans(block));
   }
   candidates.push(...balancedSpans(cleaned));
   for (const candidate of candidates) {
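The matchBalanced change in package/dist/index.cjs swaps the earlier depth counter for a stack of expected closers, so a `[` that is closed by `}` is rejected instead of being counted as balanced. The block below is a minimal sketch of that idea only. `isBalanced` is a hypothetical helper written for illustration; it is not exported by the package, and it leaves out the `resume` bookkeeping that the real function returns.

```typescript
// Hypothetical helper, not part of json-from-llm: reports whether the text
// beginning at `start` contains one complete container whose closing
// delimiters match their openers.
function isBalanced(text: string, start = 0): boolean {
  const expectedClosers: string[] = [];
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];

    // Inside a JSON string, delimiters are plain characters; only track
    // escapes and the closing quote.
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }

    if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      expectedClosers.push('}');
    } else if (ch === '[') {
      expectedClosers.push(']');
    } else if (ch === '}' || ch === ']') {
      // A closer must match the most recently opened container.
      if (expectedClosers.pop() !== ch) return false;
      // The outermost container has closed: the value is complete.
      if (expectedClosers.length === 0) return true;
    }
  }

  // Ran out of input before the outermost container closed.
  return false;
}
```

With a plain depth counter, an input such as `[1,2}` would reach depth zero and be handed to the parser as a candidate; with the stack, the mismatch is detected at the `}` and the scan can move on.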
package/dist/index.cjs.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"sources":["../src/index.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["export { extractJson, tryExtractJson } from './extract.ts';\nexport { stripReasoning, fencedBlocks } from './strip.ts';\nexport { balancedSpans } from './scan.ts';\nexport { removeTrailingCommas } from './repair.ts';\nexport { JsonExtractionError } from './types.ts';\nexport type { ExtractOptions, ExtractResult } from './types.ts';\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. 
String-aware: braces and brackets inside JSON strings do\n * not affect nesting, so prose like `\"the } char\"` won't break the scan.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const end = matchBalanced(text, i);\n if (end !== -1) {\n spans.push(text.slice(i, end));\n i = end;\n continue;\n }\n }\n i++;\n }\n return spans;\n}\n\n/** Return the index just past the balanced value starting at `start`, or -1. */\nfunction matchBalanced(text: string, start: number): number {\n let depth = 0;\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n } else if (ch === '{' || ch === '[') {\n depth++;\n } else if (ch === '}' || ch === ']') {\n depth--;\n if (depth === 0) {\n return i + 1;\n }\n }\n }\n\n return -1;\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n */\nconst REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\\s\\S]*?<\\/\\1>/gi;\n\nexport function stripReasoning(text: string): string {\n return text.replace(REASONING_TAGS, '');\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. 
Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. */\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. 
*/\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 
'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n for (const block of fencedBlocks(cleaned)) {\n candidates.push(block, ...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return 
result.value;\n}\n"],"mappings":";;;;;;;;;;;;;;;;;;;;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;;;ACKO,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,MAAM,cAAc,MAAM,CAAC;AACjC,UAAI,QAAQ,IAAI;AACd,cAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAC7B,YAAI;AACJ;AAAA,MACF;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AAGA,SAAS,cAAc,MAAc,OAAuB;AAC1D,MAAI,QAAQ;AACZ,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AAAA,IACb,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AAAA,IACF,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AACA,UAAI,UAAU,GAAG;AACf,eAAO,IAAI;AAAA,MACb;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AACT;;;AClDA,IAAM,iBAAiB;AAEhB,SAAS,eAAe,MAAsB;AACnD,SAAO,KAAK,QAAQ,gBAAgB,EAAE;AACxC;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACVO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJk
B;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,aAAW,SAAS,aAAa,OAAO,GAAG;AACzC,eAAW,KAAK,OAAO,GAAG,cAAc,KAAK,CAAC;AAAA,EAChD;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;","names":[]}
|
|
1
|
+
{"version":3,"sources":["../src/index.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["export { extractJson, tryExtractJson } from './extract.ts';\nexport { stripReasoning, fencedBlocks } from './strip.ts';\nexport { balancedSpans } from './scan.ts';\nexport { removeTrailingCommas } from './repair.ts';\nexport { JsonExtractionError } from './types.ts';\nexport type { ExtractOptions, ExtractResult } from './types.ts';\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. 
String-aware and delimiter-aware: braces and brackets\n * inside JSON strings do not affect nesting, and `[` must close with `]`.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const match = matchBalanced(text, i);\n if (match.end !== -1) {\n spans.push(text.slice(i, match.end));\n i = match.end;\n continue;\n }\n\n i = Math.max(match.resume, i + 1);\n continue;\n }\n i++;\n }\n return spans;\n}\n\ninterface MatchResult {\n /** Index just past the balanced value, or -1 when no complete value exists. */\n end: number;\n /** Next scan index after a malformed or incomplete candidate. */\n resume: number;\n}\n\nfunction matchBalanced(text: string, start: number): MatchResult {\n const expectedClosers: string[] = [];\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n continue;\n }\n\n if (ch === '{') {\n expectedClosers.push('}');\n continue;\n }\n\n if (ch === '[') {\n expectedClosers.push(']');\n continue;\n }\n\n if (ch === '}' || ch === ']') {\n if (expectedClosers.pop() !== ch) {\n return { end: -1, resume: i + 1 };\n }\n if (expectedClosers.length === 0) {\n return { end: i + 1, resume: i + 1 };\n }\n }\n }\n\n return {\n end: -1,\n resume: looksLikeJsonContainerStart(text, start) ? 
text.length : start + 1,\n };\n}\n\nfunction looksLikeJsonContainerStart(text: string, start: number): boolean {\n let index = start + 1;\n while (index < text.length && /\\s/.test(text[index])) {\n index++;\n }\n\n const next = text[index];\n if (text[start] === '{') {\n return next === '\"' || next === '}';\n }\n\n return (\n next === undefined ||\n next === '[' ||\n next === '{' ||\n next === '\"' ||\n next === ']' ||\n next === '-' ||\n (next >= '0' && next <= '9') ||\n next === 't' ||\n next === 'f' ||\n next === 'n'\n );\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n *\n * If a reasoning tag is opened but not closed, treat the rest of the text as\n * reasoning. Returning no JSON is safer than extracting a valid-looking draft.\n */\nconst CLOSED_REASONING_BLOCK =\n /<(think|thinking|reasoning|thought)\\b[^>]*>[\\s\\S]*?<\\/\\1>/gi;\nconst OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\\b[^>]*>/gi;\n\nexport function stripReasoning(text: string): string {\n const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, '');\n OPEN_REASONING_TAG.lastIndex = 0;\n const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);\n\n if (!unclosed) {\n return withoutClosedBlocks;\n }\n\n return withoutClosedBlocks.slice(0, unclosed.index);\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. 
Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. */\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. 
*/\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 
'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n const blocks = fencedBlocks(cleaned);\n candidates.push(...blocks);\n for (const block of blocks) {\n candidates.push(...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return 
result.value;\n}\n"],"mappings":";;;;;;;;;;;;;;;;;;;;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;;;ACKO,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,QAAQ,cAAc,MAAM,CAAC;AACnC,UAAI,MAAM,QAAQ,IAAI;AACpB,cAAM,KAAK,KAAK,MAAM,GAAG,MAAM,GAAG,CAAC;AACnC,YAAI,MAAM;AACV;AAAA,MACF;AAEA,UAAI,KAAK,IAAI,MAAM,QAAQ,IAAI,CAAC;AAChC;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AASA,SAAS,cAAc,MAAc,OAA4B;AAC/D,QAAM,kBAA4B,CAAC;AACnC,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,UAAI,gBAAgB,IAAI,MAAM,IAAI;AAChC,eAAO,EAAE,KAAK,IAAI,QAAQ,IAAI,EAAE;AAAA,MAClC;AACA,UAAI,gBAAgB,WAAW,GAAG;AAChC,eAAO,EAAE,KAAK,IAAI,GAAG,QAAQ,IAAI,EAAE;AAAA,MACrC;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AAAA,IACL,KAAK;AAAA,IACL,QAAQ,4BAA4B,MAAM,KAAK,IAAI,KAAK,SAAS,QAAQ;AAAA,EAC3E;AACF;AAEA,SAAS,4BAA4B,MAAc,OAAwB;AACzE,MAAI,QAAQ,QAAQ;AACpB,SAAO,QAAQ,KAAK,UAAU,KAAK,KAAK,KAAK,KAAK,CAAC,GAAG;AACpD;AAAA,EACF;AAEA,QAAM,OAAO,KAAK,KAAK;AACvB,
MAAI,KAAK,KAAK,MAAM,KAAK;AACvB,WAAO,SAAS,OAAO,SAAS;AAAA,EAClC;AAEA,SACE,SAAS,UACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACR,QAAQ,OAAO,QAAQ,OACxB,SAAS,OACT,SAAS,OACT,SAAS;AAEb;;;ACjGA,IAAM,yBACJ;AACF,IAAM,qBAAqB;AAEpB,SAAS,eAAe,MAAsB;AACnD,QAAM,sBAAsB,KAAK,QAAQ,wBAAwB,EAAE;AACnE,qBAAmB,YAAY;AAC/B,QAAM,WAAW,mBAAmB,KAAK,mBAAmB;AAE5D,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,SAAO,oBAAoB,MAAM,GAAG,SAAS,KAAK;AACpD;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACvBO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,QAAM,SAAS,aAAa,OAAO;AACnC,aAAW,KAAK,GAAG,MAAM;AACzB,aAAW,SAAS,QAAQ;AAC1B,eAAW,KAAK,GAAG,cAAc,KAAK,CAAC;AAAA,EACzC;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IAC
F;AAAA,EACF;AACA,SAAO,OAAO;AAChB;","names":[]}
|
package/dist/index.d.cts
CHANGED
@@ -54,8 +54,8 @@ declare function fencedBlocks(text: string): string[];
 
 /**
  * Find the substrings of complete, balanced JSON objects/arrays in `text`,
- * in document order. String-aware: braces and brackets
- * not affect nesting,
+ * in document order. String-aware and delimiter-aware: braces and brackets
+ * inside JSON strings do not affect nesting, and `[` must close with `]`.
  */
 declare function balancedSpans(text: string): string[];
 
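The revised doc comment describes the scan as delimiter-aware: `[` must close with `]`. A minimal sketch of that idea follows, assuming a stack of expected closing characters. `matchDelimiters` is a hypothetical name used for illustration only; it is not part of the package's exported API and it omits the resume logic the real scanner carries.

```typescript
// Hypothetical helper (not the package's API): returns the index just past the
// balanced value that starts at `start`, or -1 if a closer mismatches its
// opener or the value never closes.
function matchDelimiters(text: string, start: number): number {
  const expectedClosers: string[] = [];
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];

    if (inString) {
      // Inside a JSON string, only escapes and the closing quote matter.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }

    if (ch === '"') inString = true;
    else if (ch === "{") expectedClosers.push("}");
    else if (ch === "[") expectedClosers.push("]");
    else if (ch === "}" || ch === "]") {
      // The closer must be the one the most recent opener asked for.
      if (expectedClosers.pop() !== ch) return -1;
      if (expectedClosers.length === 0) return i + 1;
    }
  }
  return -1;
}

// Illustrative inputs: a well-formed object, a mismatched pair, and a bracket
// hidden inside a string value.
const delimiterSamples = ['{"a": [1, 2]}', "[1, 2}", '{"s": "a ] in a string"}'];
for (const sample of delimiterSamples) {
  console.log(JSON.stringify(sample), "->", matchDelimiters(sample, 0));
}
```

A depth counter alone would accept `[1, 2}` as balanced; tracking which closer is expected is what rejects it.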
package/dist/index.d.ts
CHANGED
@@ -54,8 +54,8 @@ declare function fencedBlocks(text: string): string[];
 
 /**
  * Find the substrings of complete, balanced JSON objects/arrays in `text`,
- * in document order. String-aware: braces and brackets
- * not affect nesting,
+ * in document order. String-aware and delimiter-aware: braces and brackets
+ * inside JSON strings do not affect nesting, and `[` must close with `]`.
  */
 declare function balancedSpans(text: string): string[];
 
package/dist/index.js
CHANGED
@@ -42,19 +42,21 @@ function balancedSpans(text) {
   while (i < text.length) {
     const ch = text[i];
     if (ch === "{" || ch === "[") {
-      const
-      if (end !== -1) {
-        spans.push(text.slice(i, end));
-        i = end;
+      const match = matchBalanced(text, i);
+      if (match.end !== -1) {
+        spans.push(text.slice(i, match.end));
+        i = match.end;
         continue;
       }
+      i = Math.max(match.resume, i + 1);
+      continue;
     }
     i++;
   }
   return spans;
 }
 function matchBalanced(text, start) {
-
+  const expectedClosers = [];
   let inString = false;
   let escaped = false;
   for (let i = start; i < text.length; i++) {
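The `match.resume` value in this hunk decides where scanning continues after a failed match. As a sketch based on the compiled code in the next hunk, an opener that never closes but looks like the start of real JSON causes the rest of the text to be skipped, which appears intended to keep nested fragments of a truncated draft from being reported as spans. `looksLikeJsonContainerStart` is transcribed from the diff; `resumeAfterUnclosed` is a hypothetical wrapper added for illustration, and the sample strings are made up.

```typescript
// Transcribed from the diff: does the character after the opener (skipping
// whitespace) look like the beginning of a JSON object or array body?
function looksLikeJsonContainerStart(text: string, start: number): boolean {
  let index = start + 1;
  while (index < text.length && /\s/.test(text[index])) {
    index++;
  }
  const next = text[index];
  if (text[start] === "{") {
    // An object body can only begin with a key string or close immediately.
    return next === '"' || next === "}";
  }
  return (
    next === undefined ||
    next === "[" ||
    next === "{" ||
    next === '"' ||
    next === "]" ||
    next === "-" ||
    (next >= "0" && next <= "9") ||
    next === "t" ||
    next === "f" ||
    next === "n"
  );
}

// Hypothetical wrapper: where to resume scanning when the opener at `start`
// reached the end of the text without closing.
function resumeAfterUnclosed(text: string, start: number): number {
  return looksLikeJsonContainerStart(text, start) ? text.length : start + 1;
}

const truncatedDraft = '{"items": [1, 2';
const proseBrace = "set {x | x > 0";
console.log(resumeAfterUnclosed(truncatedDraft, 0) === truncatedDraft.length);
console.log(resumeAfterUnclosed(proseBrace, proseBrace.indexOf("{")));
```

The prose brace resumes one character later, so a valid value further along in the text can still be found; the truncated draft ends the scan.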
@@ -71,22 +73,53 @@ function matchBalanced(text, start) {
     }
     if (ch === '"') {
       inString = true;
-
-
-
-
-
-
+      continue;
+    }
+    if (ch === "{") {
+      expectedClosers.push("}");
+      continue;
+    }
+    if (ch === "[") {
+      expectedClosers.push("]");
+      continue;
+    }
+    if (ch === "}" || ch === "]") {
+      if (expectedClosers.pop() !== ch) {
+        return { end: -1, resume: i + 1 };
+      }
+      if (expectedClosers.length === 0) {
+        return { end: i + 1, resume: i + 1 };
       }
     }
   }
-  return
+  return {
+    end: -1,
+    resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1
+  };
+}
+function looksLikeJsonContainerStart(text, start) {
+  let index = start + 1;
+  while (index < text.length && /\s/.test(text[index])) {
+    index++;
+  }
+  const next = text[index];
+  if (text[start] === "{") {
+    return next === '"' || next === "}";
+  }
+  return next === void 0 || next === "[" || next === "{" || next === '"' || next === "]" || next === "-" || next >= "0" && next <= "9" || next === "t" || next === "f" || next === "n";
 }
 
 // src/strip.ts
-var
+var CLOSED_REASONING_BLOCK = /<(think|thinking|reasoning|thought)\b[^>]*>[\s\S]*?<\/\1>/gi;
+var OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\b[^>]*>/gi;
 function stripReasoning(text) {
-
+  const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, "");
+  OPEN_REASONING_TAG.lastIndex = 0;
+  const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);
+  if (!unclosed) {
+    return withoutClosedBlocks;
+  }
+  return withoutClosedBlocks.slice(0, unclosed.index);
 }
 var FENCE = /```[^\S\n]*([a-zA-Z0-9_+-]*)[^\S\n]*\n?([\s\S]*?)```/g;
 function fencedBlocks(text) {
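The new `stripReasoning` in this hunk handles two cases: closed reasoning blocks are removed, and an opening tag with no matching close truncates the text at that tag. The sketch below restates the function from the diff in TypeScript so it can be run on its own; the two sample inputs are illustrative and do not come from the package's fixtures.

```typescript
// Patterns as they appear in the diff.
const CLOSED_REASONING_BLOCK =
  /<(think|thinking|reasoning|thought)\b[^>]*>[\s\S]*?<\/\1>/gi;
const OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\b[^>]*>/gi;

function stripReasoning(text: string): string {
  // Step 1: drop every properly closed reasoning block.
  const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, "");
  // Step 2: the regex is global, so reset its cursor before reusing it.
  OPEN_REASONING_TAG.lastIndex = 0;
  const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);
  // Step 3: anything after a dangling opening tag is treated as reasoning.
  return unclosed
    ? withoutClosedBlocks.slice(0, unclosed.index)
    : withoutClosedBlocks;
}

// Closed block: the draft inside <think> disappears, the answer survives.
const closedCase = stripReasoning('<think>maybe {"draft":1}</think>\n{"score":7}');
// Unclosed block: everything from the tag onward is discarded.
const unclosedCase = stripReasoning('intro <thinking>draft {"score":1}');

console.log(JSON.stringify(closedCase));
console.log(JSON.stringify(unclosedCase));
```

In the unclosed case no JSON remains for later stages to find, which matches the changelog's stated preference for returning nothing over extracting a draft.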
@@ -144,8 +177,10 @@ function tryExtractJson(text, options = {}) {
   const expect = options.expect ?? "any";
   const cleaned = stripReasoning(text);
   const candidates = [];
-
-
+  const blocks = fencedBlocks(cleaned);
+  candidates.push(...blocks);
+  for (const block of blocks) {
+    candidates.push(...balancedSpans(block));
   }
   candidates.push(...balancedSpans(cleaned));
   for (const candidate of candidates) {
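This hunk changes the order in which candidates are tried: every whole fenced block now comes before any balanced span found inside a block. A sketch of the difference follows, under stated assumptions: the two ordering functions mirror the old and new loops from the source map, while `flatObjects` is a deliberately crude stand-in for `balancedSpans` and `firstParseable` is a hypothetical helper that skips the repair step.

```typescript
type SpanFinder = (text: string) => string[];

// Ordering before this release: each block is followed by its own spans.
function candidatesInterleaved(blocks: string[], spansOf: SpanFinder): string[] {
  const out: string[] = [];
  for (const block of blocks) {
    out.push(block, ...spansOf(block));
  }
  return out;
}

// Ordering after this release: all whole blocks first, then spans inside them.
function candidatesBlocksFirst(blocks: string[], spansOf: SpanFinder): string[] {
  const out: string[] = [...blocks];
  for (const block of blocks) {
    out.push(...spansOf(block));
  }
  return out;
}

// Stand-in span finder (assumption): matches flat, non-nested objects only.
const flatObjects: SpanFinder = (text) => text.match(/\{[^{}]*\}/g) ?? [];

// Hypothetical helper: the first candidate that JSON.parse accepts.
function firstParseable(candidates: string[]): unknown {
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return undefined;
}

// Two fenced blocks: a draft wrapped in prose, then a clean final payload.
const fencedContents = ['draft: {"answer": "maybe"} (not final)', '{"answer": "yes"}'];

const oldPick = firstParseable(candidatesInterleaved(fencedContents, flatObjects));
const newPick = firstParseable(candidatesBlocksFirst(fencedContents, flatObjects));
console.log(JSON.stringify(oldPick), JSON.stringify(newPick));
```

With the interleaved order the span inside the draft block wins; with blocks first, the later block that parses as a whole wins.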
package/dist/index.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"sources":["../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware: braces and brackets inside JSON strings do\n * not affect nesting, so prose like `\"the } char\"` won't break the scan.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const end = matchBalanced(text, i);\n if (end !== -1) {\n spans.push(text.slice(i, end));\n i = end;\n continue;\n }\n }\n i++;\n }\n return spans;\n}\n\n/** Return the index just past the balanced value starting at `start`, or -1. 
*/\nfunction matchBalanced(text: string, start: number): number {\n let depth = 0;\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n } else if (ch === '{' || ch === '[') {\n depth++;\n } else if (ch === '}' || ch === ']') {\n depth--;\n if (depth === 0) {\n return i + 1;\n }\n }\n }\n\n return -1;\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n */\nconst REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\\s\\S]*?<\\/\\1>/gi;\n\nexport function stripReasoning(text: string): string {\n return text.replace(REASONING_TAGS, '');\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. 
*/\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * 
code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n for (const block of fencedBlocks(cleaned)) {\n candidates.push(block, ...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";AAKO,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,MAAM,cAAc,MAAM,CAAC;AACjC,UAAI,QAAQ,IAAI;AACd,cAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAC7B,YAAI;AACJ;AAAA,MACF;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AAGA,SAAS,cAAc,MAAc,OAAuB;AAC1D,MAAI,QAAQ;AACZ,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AAAA,IACb,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AAAA,IACF,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AACA,UAAI,UAAU,GAAG;AACf,eAAO,IAAI;AAAA,MACb;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AACT;;;AClDA,IAAM,iBAAiB;AAEhB,SAAS,eAAe,MAAsB;AACnD,SAAO,KAAK,QAAQ,gBAAgB,EAAE;AACxC;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;A
AClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACVO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,aAAW,SAAS,aAAa,OAAO,GAAG;AACzC,eAAW,KAAK,OAAO,GAAG,cAAc,KAAK,CAAC;AAAA,EAChD;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;","names":[]}
|
|
1
|
+
{"version":3,"sources":["../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware and delimiter-aware: braces and brackets\n * inside JSON strings do not affect nesting, and `[` must close with `]`.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const match = matchBalanced(text, i);\n if (match.end !== -1) {\n spans.push(text.slice(i, match.end));\n i = match.end;\n continue;\n }\n\n i = Math.max(match.resume, i + 1);\n continue;\n }\n i++;\n }\n return spans;\n}\n\ninterface MatchResult {\n /** Index just past the balanced value, or -1 when no complete value exists. */\n end: number;\n /** Next scan index after a malformed or incomplete candidate. 
*/\n resume: number;\n}\n\nfunction matchBalanced(text: string, start: number): MatchResult {\n const expectedClosers: string[] = [];\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n continue;\n }\n\n if (ch === '{') {\n expectedClosers.push('}');\n continue;\n }\n\n if (ch === '[') {\n expectedClosers.push(']');\n continue;\n }\n\n if (ch === '}' || ch === ']') {\n if (expectedClosers.pop() !== ch) {\n return { end: -1, resume: i + 1 };\n }\n if (expectedClosers.length === 0) {\n return { end: i + 1, resume: i + 1 };\n }\n }\n }\n\n return {\n end: -1,\n resume: looksLikeJsonContainerStart(text, start) ? text.length : start + 1,\n };\n}\n\nfunction looksLikeJsonContainerStart(text: string, start: number): boolean {\n let index = start + 1;\n while (index < text.length && /\\s/.test(text[index])) {\n index++;\n }\n\n const next = text[index];\n if (text[start] === '{') {\n return next === '\"' || next === '}';\n }\n\n return (\n next === undefined ||\n next === '[' ||\n next === '{' ||\n next === '\"' ||\n next === ']' ||\n next === '-' ||\n (next >= '0' && next <= '9') ||\n next === 't' ||\n next === 'f' ||\n next === 'n'\n );\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n *\n * If a reasoning tag is opened but not closed, treat the rest of the text as\n * reasoning. 
Returning no JSON is safer than extracting a valid-looking draft.\n */\nconst CLOSED_REASONING_BLOCK =\n /<(think|thinking|reasoning|thought)\\b[^>]*>[\\s\\S]*?<\\/\\1>/gi;\nconst OPEN_REASONING_TAG = /<(think|thinking|reasoning|thought)\\b[^>]*>/gi;\n\nexport function stripReasoning(text: string): string {\n const withoutClosedBlocks = text.replace(CLOSED_REASONING_BLOCK, '');\n OPEN_REASONING_TAG.lastIndex = 0;\n const unclosed = OPEN_REASONING_TAG.exec(withoutClosedBlocks);\n\n if (!unclosed) {\n return withoutClosedBlocks;\n }\n\n return withoutClosedBlocks.slice(0, unclosed.index);\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. */\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. 
*/\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). 
Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n const blocks = fencedBlocks(cleaned);\n candidates.push(...blocks);\n for (const block of blocks) {\n candidates.push(...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";AAKO,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,QAAQ,cAAc,MAAM,CAAC;AACnC,UAAI,MAAM,QAAQ,IAAI;AACpB,cAAM,KAAK,KAAK,MAAM,GAAG,MAAM,GAAG,CAAC;AACnC,YAAI,MAAM;AACV;AAAA,MACF;AAEA,UAAI,KAAK,IAAI,MAAM,QAAQ,IAAI,CAAC;AAChC;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AASA,SAAS,cAAc,MAAc,OAA4B;AAC/D,QAAM,kBAA4B,CAAC;AACnC,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,sBAAgB,KAAK,GAAG;AACxB;AAAA,IACF;AAEA,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,UAAI,gBAAgB,IAAI,MAAM,IAAI;AAChC,eAAO,EAAE,KAAK,IAAI,QAAQ,IAAI,EAAE;AAAA,MAClC;AACA,UAAI,gBAAgB,WAAW,GAAG;AAChC,eAAO,EAAE,KAAK,IAAI,GAAG,QAAQ,IAAI,EAAE;AAAA,MACrC;AAAA,
IACF;AAAA,EACF;AAEA,SAAO;AAAA,IACL,KAAK;AAAA,IACL,QAAQ,4BAA4B,MAAM,KAAK,IAAI,KAAK,SAAS,QAAQ;AAAA,EAC3E;AACF;AAEA,SAAS,4BAA4B,MAAc,OAAwB;AACzE,MAAI,QAAQ,QAAQ;AACpB,SAAO,QAAQ,KAAK,UAAU,KAAK,KAAK,KAAK,KAAK,CAAC,GAAG;AACpD;AAAA,EACF;AAEA,QAAM,OAAO,KAAK,KAAK;AACvB,MAAI,KAAK,KAAK,MAAM,KAAK;AACvB,WAAO,SAAS,OAAO,SAAS;AAAA,EAClC;AAEA,SACE,SAAS,UACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACT,SAAS,OACR,QAAQ,OAAO,QAAQ,OACxB,SAAS,OACT,SAAS,OACT,SAAS;AAEb;;;ACjGA,IAAM,yBACJ;AACF,IAAM,qBAAqB;AAEpB,SAAS,eAAe,MAAsB;AACnD,QAAM,sBAAsB,KAAK,QAAQ,wBAAwB,EAAE;AACnE,qBAAmB,YAAY;AAC/B,QAAM,WAAW,mBAAmB,KAAK,mBAAmB;AAE5D,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,SAAO,oBAAoB,MAAM,GAAG,SAAS,KAAK;AACpD;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MAAM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACvBO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,QAAM,SAAS,aAAa,OAAO;AACnC,aAAW,KAAK,GAAG,MAAM;AACzB,aAAW,SAAS,QAAQ;AAC1B,eAAW,KAAK,GAAG,cAAc,KAAK,CAAC;AAAA,EACzC;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,M
AAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;","names":[]}
|
package/fixtures/README.md
CHANGED
@@ -5,8 +5,12 @@ plain `JSON.parse`:
 
 - reasoning/thinking tags with brace-laden prose
 - fenced JSON blocks inside conversational text
+- multiple fenced blocks where a later complete payload should win
 - trailing commas in otherwise valid JSON
 - competing arrays/objects when callers expect a specific top-level type
+- malformed drafts before a final valid JSON answer
+- truncated stream output that contains nested draft fragments
+- unclosed reasoning tags that should not leak draft JSON
 - negative text with no recoverable JSON
 
 Each `.txt` file has a matching expected JSON file under `fixtures/expected/`.
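The fixtures README pairs each `.txt` input with a JSON file of the same base name under `fixtures/expected/`. A small sketch of that naming rule follows; `expectedPathFor` is a hypothetical helper written for illustration, not something the package ships.

```typescript
// Hypothetical helper: map a fixture input path to its expected-output path,
// following the "same base name under expected/" convention.
function expectedPathFor(fixturePath: string): string {
  const slash = fixturePath.lastIndexOf("/");
  const dir = slash === -1 ? "" : fixturePath.slice(0, slash + 1);
  const file = fixturePath.slice(slash + 1);
  if (!file.endsWith(".txt")) {
    throw new Error(`not a fixture input: ${fixturePath}`);
  }
  // Swap the ".txt" suffix for ".json" and move into the expected/ folder.
  return `${dir}expected/${file.slice(0, -".txt".length)}.json`;
}

console.log(expectedPathFor("fixtures/openai-fenced-object.txt"));
console.log(expectedPathFor("fixtures/unclosed-thinking-no-json.txt"));
```

A test runner could read both paths, run the extractor on the input, and compare the result with the parsed expected file.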
package/package.json
CHANGED