json-from-llm 0.1.2 → 0.2.0
- package/CHANGELOG.md +22 -0
- package/README.md +53 -1
- package/dist/cli.js +277 -0
- package/dist/cli.js.map +1 -0
- package/fixtures/README.md +14 -0
- package/fixtures/deepseek-thinking-object.txt +13 -0
- package/fixtures/expect-object-skips-array.txt +7 -0
- package/fixtures/expected/deepseek-thinking-object.json +9 -0
- package/fixtures/expected/expect-object-skips-array.json +8 -0
- package/fixtures/expected/gemini-reasoning-array.json +8 -0
- package/fixtures/expected/no-json.json +4 -0
- package/fixtures/expected/prose-trailing-commas.json +8 -0
- package/fixtures/gemini-reasoning-array.txt +12 -0
- package/fixtures/no-json.txt +1 -0
- package/fixtures/prose-trailing-commas.txt +8 -0
- package/package.json +6 -2
package/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,28 @@ All notable changes to this project are documented here. The format follows
|
|
|
4
4
|
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and the project adheres
|
|
5
5
|
to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
6
6
|
|
|
7
|
+
## [0.2.0] - 2026-06-05
|
|
8
|
+
|
|
9
|
+
### Added
|
|
10
|
+
|
|
11
|
+
- Added the `json-from-llm` CLI for shell pipelines. It reads stdin, prints the
|
|
12
|
+
extracted JSON value to stdout and supports `--expect object|array|any` plus
|
|
13
|
+
`--no-repair`.
|
|
14
|
+
- Added CLI tests covering normalized stdout, top-level type selection and
|
|
15
|
+
extraction failures.
|
|
16
|
+
|
|
17
|
+
## [0.1.3] - 2026-06-05
|
|
18
|
+
|
|
19
|
+
### Added
|
|
20
|
+
|
|
21
|
+
- Added a public reasoning-output fixture corpus covering `<think>` /
|
|
22
|
+
`<reasoning>` blocks, fenced JSON, prose wrappers, trailing commas, type
|
|
23
|
+
expectations and no-JSON failures.
|
|
24
|
+
- Added tests that read the published fixture corpus and verify `tryExtractJson`
|
|
25
|
+
results.
|
|
26
|
+
- Published the `fixtures/` directory in the npm package for downstream parser
|
|
27
|
+
and agent-loop tests.
|
|
28
|
+
|
|
7
29
|
## [0.1.2] - 2026-06-04
|
|
8
30
|
|
|
9
31
|
### Changed
|
package/README.md
CHANGED
|
@@ -9,6 +9,9 @@
|
|
|
9
9
|
|
|
10
10
|
> Extract valid JSON from an LLM response — even when it's wrapped in reasoning/thinking tags, markdown fences or prose. **Zero dependencies.**
|
|
11
11
|
|
|
12
|
+
Security posture is tracked in [docs/security-posture.md](./docs/security-posture.md),
|
|
13
|
+
including CodeQL, OpenSSF Scorecard, Dependabot and branch rules.
|
|
14
|
+
|
|
12
15
|
You asked for JSON. The model gave you:
|
|
13
16
|
|
|
14
17
|
````text
|
|
@@ -36,7 +39,8 @@ const data = extractJson<{ score: number }>(modelOutput);
|
|
|
36
39
|
- **Handles the real wrappers.** Markdown fences (`json` and bare ```), conversational prose before/after, and the JSON sitting bare in the text.
|
|
37
40
|
- **String-aware, never corrupts.** The scanner and the trailing-comma repair both respect string contents — a `}` or `,` inside `"a string value"` is left alone.
|
|
38
41
|
- **Conservative repair.** Removes trailing commas (the most common malformation); it will never rewrite your data.
|
|
39
|
-
- **
|
|
42
|
+
- **Fixture-backed edge cases.** Public fixtures cover reasoning tags, fenced JSON, prose wrappers, trailing commas, top-level type expectations and no-JSON failures.
|
|
43
|
+
- **Two library entry points + CLI.** `extractJson` throws on failure; `tryExtractJson` returns `{ found }`; `json-from-llm` reads stdin for shell pipelines.
|
|
40
44
|
- **Zero dependencies**, ESM + CJS, fully typed.
|
|
41
45
|
|
|
42
46
|
## Install
|
|
@@ -45,6 +49,39 @@ const data = extractJson<{ score: number }>(modelOutput);
|
|
|
45
49
|
npm install json-from-llm
|
|
46
50
|
```
|
|
47
51
|
|
|
52
|
+
## CLI
|
|
53
|
+
|
|
54
|
+
Pipe model output directly into the binary:
|
|
55
|
+
|
|
56
|
+
```sh
|
|
57
|
+
cat response.txt | npx json-from-llm
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
Example:
|
|
61
|
+
|
|
62
|
+
````sh
|
|
63
|
+
printf '%s\n' '<think>{draft: true}</think>```json
|
|
64
|
+
{"score":8,"reason":"clear"}
|
|
65
|
+
```' | npx json-from-llm
|
|
66
|
+
# {"score":8,"reason":"clear"}
|
|
67
|
+
````
|
|
68
|
+
|
|
69
|
+
Useful flags:
|
|
70
|
+
|
|
71
|
+
```sh
|
|
72
|
+
# Skip an earlier array and require the first object that parses
|
|
73
|
+
cat response.txt | npx json-from-llm --expect object
|
|
74
|
+
|
|
75
|
+
# Disable trailing-comma repair when you want strict parsing
|
|
76
|
+
cat response.txt | npx json-from-llm --no-repair
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
Exit codes:
|
|
80
|
+
|
|
81
|
+
- `0` — JSON extracted and printed to stdout.
|
|
82
|
+
- `1` — no matching JSON value found.
|
|
83
|
+
- `2` — invalid CLI options.
|
|
84
|
+
|
|
48
85
|
## API
|
|
49
86
|
|
|
50
87
|
### `extractJson<T>(text, options?) => T`
|
|
@@ -79,6 +116,21 @@ extractJson('[1,2] then the answer {"a":1}', { expect: 'object' }); // { a: 1 }
|
|
|
79
116
|
|
|
80
117
|
The low-level pieces (`stripReasoning`, `fencedBlocks`, `balancedSpans`, `removeTrailingCommas`) are exported too.
|
|
81
118
|
|
|
119
|
+
## Fixture corpus
|
|
120
|
+
|
|
121
|
+
The package includes a small public corpus under [`fixtures/`](./fixtures):
|
|
122
|
+
|
|
123
|
+
- `deepseek-thinking-object.txt`
|
|
124
|
+
- `gemini-reasoning-array.txt`
|
|
125
|
+
- `prose-trailing-commas.txt`
|
|
126
|
+
- `expect-object-skips-array.txt`
|
|
127
|
+
- `no-json.txt`
|
|
128
|
+
- expected `tryExtractJson` outputs under `fixtures/expected/`
|
|
129
|
+
|
|
130
|
+
The tests read these files directly, so parser changes are checked against
|
|
131
|
+
stable, reusable examples. The fixtures are synthetic and safe for public CI:
|
|
132
|
+
they contain no prompts, secrets, user data or live provider responses.
|
|
133
|
+
|
|
82
134
|
## Related
|
|
83
135
|
|
|
84
136
|
- [`tool-schema`](https://www.npmjs.com/package/tool-schema) — turn a JSON Schema into a provider tool/function schema (define the shape you then extract).
|
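Review note: the README example above (reasoning block followed by a fenced JSON answer) can be reproduced with a small sketch. The two patterns mirror `REASONING_TAGS` and `FENCE` from `dist/cli.js` in this diff; `firstFencedJson` and `TICKS` are hypothetical names used only for illustration. The sketch assumes the answer sits in a fenced block, and it omits the balanced-span scan and trailing-comma repair that the real package performs.

```javascript
// Three backticks, built at runtime so this example contains no literal fence.
const TICKS = "`".repeat(3);

// Same patterns as REASONING_TAGS and FENCE in dist/cli.js (this diff).
const REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\s\S]*?<\/\1>/gi;
const FENCE = new RegExp(
  TICKS + "[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)" + TICKS,
  "g"
);

// Hypothetical, simplified helper: strip reasoning blocks, then return the
// first untagged or JSON-tagged fenced block whose contents parse as JSON.
function firstFencedJson(text) {
  const cleaned = text.replace(REASONING_TAGS, "");
  for (const match of cleaned.matchAll(FENCE)) {
    const lang = match[1].toLowerCase();
    const content = match[2].trim();
    if (content.length === 0 || (lang !== "" && !lang.includes("json"))) {
      continue; // empty block, or a fence tagged with another language
    }
    try {
      return { found: true, value: JSON.parse(content) };
    } catch {
      // Not valid JSON; fall through to the next fenced block.
    }
  }
  return { found: false };
}

// The input from the README's CLI example.
const sample =
  "<think>{draft: true}</think>" + TICKS + 'json\n{"score":8,"reason":"clear"}\n' + TICKS + "\n";
console.log(JSON.stringify(firstFencedJson(sample).value));
// {"score":8,"reason":"clear"}
```

Stripping the reasoning block first is what keeps `{draft: true}` from ever becoming a candidate; the fenced block is then the only place left to look.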
package/dist/cli.js
ADDED
|
@@ -0,0 +1,277 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
|
|
3
|
+
// src/cli.ts
|
|
4
|
+
import { realpathSync } from "fs";
|
|
5
|
+
import { fileURLToPath } from "url";
|
|
6
|
+
|
|
7
|
+
// src/repair.ts
|
|
8
|
+
function removeTrailingCommas(json) {
|
|
9
|
+
let out = "";
|
|
10
|
+
let inString = false;
|
|
11
|
+
let escaped = false;
|
|
12
|
+
for (let i = 0; i < json.length; i++) {
|
|
13
|
+
const ch = json[i];
|
|
14
|
+
if (inString) {
|
|
15
|
+
out += ch;
|
|
16
|
+
if (escaped) {
|
|
17
|
+
escaped = false;
|
|
18
|
+
} else if (ch === "\\") {
|
|
19
|
+
escaped = true;
|
|
20
|
+
} else if (ch === '"') {
|
|
21
|
+
inString = false;
|
|
22
|
+
}
|
|
23
|
+
continue;
|
|
24
|
+
}
|
|
25
|
+
if (ch === '"') {
|
|
26
|
+
inString = true;
|
|
27
|
+
out += ch;
|
|
28
|
+
continue;
|
|
29
|
+
}
|
|
30
|
+
if (ch === ",") {
|
|
31
|
+
let j = i + 1;
|
|
32
|
+
while (j < json.length && (json[j] === " " || json[j] === "\n" || json[j] === "\r" || json[j] === "\t")) {
|
|
33
|
+
j++;
|
|
34
|
+
}
|
|
35
|
+
if (json[j] === "}" || json[j] === "]") {
|
|
36
|
+
continue;
|
|
37
|
+
}
|
|
38
|
+
}
|
|
39
|
+
out += ch;
|
|
40
|
+
}
|
|
41
|
+
return out;
|
|
42
|
+
}
|
|
43
|
+
|
|
44
|
+
// src/scan.ts
|
|
45
|
+
function balancedSpans(text) {
|
|
46
|
+
const spans = [];
|
|
47
|
+
let i = 0;
|
|
48
|
+
while (i < text.length) {
|
|
49
|
+
const ch = text[i];
|
|
50
|
+
if (ch === "{" || ch === "[") {
|
|
51
|
+
const end = matchBalanced(text, i);
|
|
52
|
+
if (end !== -1) {
|
|
53
|
+
spans.push(text.slice(i, end));
|
|
54
|
+
i = end;
|
|
55
|
+
continue;
|
|
56
|
+
}
|
|
57
|
+
}
|
|
58
|
+
i++;
|
|
59
|
+
}
|
|
60
|
+
return spans;
|
|
61
|
+
}
|
|
62
|
+
function matchBalanced(text, start) {
|
|
63
|
+
let depth = 0;
|
|
64
|
+
let inString = false;
|
|
65
|
+
let escaped = false;
|
|
66
|
+
for (let i = start; i < text.length; i++) {
|
|
67
|
+
const ch = text[i];
|
|
68
|
+
if (inString) {
|
|
69
|
+
if (escaped) {
|
|
70
|
+
escaped = false;
|
|
71
|
+
} else if (ch === "\\") {
|
|
72
|
+
escaped = true;
|
|
73
|
+
} else if (ch === '"') {
|
|
74
|
+
inString = false;
|
|
75
|
+
}
|
|
76
|
+
continue;
|
|
77
|
+
}
|
|
78
|
+
if (ch === '"') {
|
|
79
|
+
inString = true;
|
|
80
|
+
} else if (ch === "{" || ch === "[") {
|
|
81
|
+
depth++;
|
|
82
|
+
} else if (ch === "}" || ch === "]") {
|
|
83
|
+
depth--;
|
|
84
|
+
if (depth === 0) {
|
|
85
|
+
return i + 1;
|
|
86
|
+
}
|
|
87
|
+
}
|
|
88
|
+
}
|
|
89
|
+
return -1;
|
|
90
|
+
}
|
|
91
|
+
|
|
92
|
+
// src/strip.ts
|
|
93
|
+
var REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\s\S]*?<\/\1>/gi;
|
|
94
|
+
function stripReasoning(text) {
|
|
95
|
+
return text.replace(REASONING_TAGS, "");
|
|
96
|
+
}
|
|
97
|
+
var FENCE = /```[^\S\n]*([a-zA-Z0-9_+-]*)[^\S\n]*\n?([\s\S]*?)```/g;
|
|
98
|
+
function fencedBlocks(text) {
|
|
99
|
+
const blocks = [];
|
|
100
|
+
FENCE.lastIndex = 0;
|
|
101
|
+
let match;
|
|
102
|
+
while ((match = FENCE.exec(text)) !== null) {
|
|
103
|
+
const lang = match[1].toLowerCase();
|
|
104
|
+
const content = match[2].trim();
|
|
105
|
+
if (content.length > 0 && (lang === "" || lang.includes("json"))) {
|
|
106
|
+
blocks.push(content);
|
|
107
|
+
}
|
|
108
|
+
}
|
|
109
|
+
return blocks;
|
|
110
|
+
}
|
|
111
|
+
|
|
112
|
+
// src/types.ts
|
|
113
|
+
var JsonExtractionError = class extends Error {
|
|
114
|
+
constructor(message, text) {
|
|
115
|
+
super(message);
|
|
116
|
+
this.text = text;
|
|
117
|
+
this.name = "JsonExtractionError";
|
|
118
|
+
}
|
|
119
|
+
text;
|
|
120
|
+
};
|
|
121
|
+
|
|
122
|
+
// src/extract.ts
|
|
123
|
+
function parseCandidate(candidate, repair) {
|
|
124
|
+
try {
|
|
125
|
+
return { ok: true, value: JSON.parse(candidate) };
|
|
126
|
+
} catch {
|
|
127
|
+
}
|
|
128
|
+
if (repair) {
|
|
129
|
+
try {
|
|
130
|
+
return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };
|
|
131
|
+
} catch {
|
|
132
|
+
}
|
|
133
|
+
}
|
|
134
|
+
return { ok: false };
|
|
135
|
+
}
|
|
136
|
+
function matchesExpect(value, expect) {
|
|
137
|
+
if (expect === "any") {
|
|
138
|
+
return true;
|
|
139
|
+
}
|
|
140
|
+
if (expect === "array") {
|
|
141
|
+
return Array.isArray(value);
|
|
142
|
+
}
|
|
143
|
+
return typeof value === "object" && value !== null && !Array.isArray(value);
|
|
144
|
+
}
|
|
145
|
+
function tryExtractJson(text, options = {}) {
|
|
146
|
+
if (typeof text !== "string" || text.length === 0) {
|
|
147
|
+
return { found: false };
|
|
148
|
+
}
|
|
149
|
+
const repair = options.repair ?? true;
|
|
150
|
+
const expect = options.expect ?? "any";
|
|
151
|
+
const cleaned = stripReasoning(text);
|
|
152
|
+
const candidates = [];
|
|
153
|
+
for (const block of fencedBlocks(cleaned)) {
|
|
154
|
+
candidates.push(block, ...balancedSpans(block));
|
|
155
|
+
}
|
|
156
|
+
candidates.push(...balancedSpans(cleaned));
|
|
157
|
+
for (const candidate of candidates) {
|
|
158
|
+
const parsed = parseCandidate(candidate, repair);
|
|
159
|
+
if (parsed.ok && matchesExpect(parsed.value, expect)) {
|
|
160
|
+
return { found: true, value: parsed.value };
|
|
161
|
+
}
|
|
162
|
+
}
|
|
163
|
+
return { found: false };
|
|
164
|
+
}
|
|
165
|
+
function extractJson(text, options = {}) {
|
|
166
|
+
const result = tryExtractJson(text, options);
|
|
167
|
+
if (!result.found) {
|
|
168
|
+
throw new JsonExtractionError(
|
|
169
|
+
"No JSON value could be extracted from the text.",
|
|
170
|
+
text
|
|
171
|
+
);
|
|
172
|
+
}
|
|
173
|
+
return result.value;
|
|
174
|
+
}
|
|
175
|
+
|
|
176
|
+
// src/cli.ts
|
|
177
|
+
var usage = `Usage: json-from-llm [--expect object|array|any] [--no-repair]
|
|
178
|
+
|
|
179
|
+
Read LLM output from stdin and print the extracted JSON value to stdout.
|
|
180
|
+
|
|
181
|
+
Options:
|
|
182
|
+
--expect <type> Require the top-level JSON value to be object, array or any.
|
|
183
|
+
--no-repair Disable conservative trailing-comma repair.
|
|
184
|
+
-h, --help Show this help text.
|
|
185
|
+
`;
|
|
186
|
+
function parseArgs(args) {
|
|
187
|
+
const config = { help: false };
|
|
188
|
+
for (let index = 0; index < args.length; index += 1) {
|
|
189
|
+
const arg = args[index];
|
|
190
|
+
if (arg === "-h" || arg === "--help") {
|
|
191
|
+
config.help = true;
|
|
192
|
+
continue;
|
|
193
|
+
}
|
|
194
|
+
if (arg === "--no-repair") {
|
|
195
|
+
config.repair = false;
|
|
196
|
+
continue;
|
|
197
|
+
}
|
|
198
|
+
if (arg === "--expect") {
|
|
199
|
+
const value = args[index + 1];
|
|
200
|
+
if (value !== "object" && value !== "array" && value !== "any") {
|
|
201
|
+
return "invalid --expect value; use object, array or any";
|
|
202
|
+
}
|
|
203
|
+
config.expect = value;
|
|
204
|
+
index += 1;
|
|
205
|
+
continue;
|
|
206
|
+
}
|
|
207
|
+
return `unknown option: ${arg}`;
|
|
208
|
+
}
|
|
209
|
+
return config;
|
|
210
|
+
}
|
|
211
|
+
async function runCli(args, stdin, streams) {
|
|
212
|
+
const config = parseArgs(args);
|
|
213
|
+
if (typeof config === "string") {
|
|
214
|
+
streams.stderr(`json-from-llm: ${config}
|
|
215
|
+
`);
|
|
216
|
+
return 2;
|
|
217
|
+
}
|
|
218
|
+
if (config.help) {
|
|
219
|
+
streams.stdout(usage);
|
|
220
|
+
return 0;
|
|
221
|
+
}
|
|
222
|
+
try {
|
|
223
|
+
const value = extractJson(stdin, config);
|
|
224
|
+
streams.stdout(`${JSON.stringify(value)}
|
|
225
|
+
`);
|
|
226
|
+
return 0;
|
|
227
|
+
} catch (error) {
|
|
228
|
+
if (error instanceof JsonExtractionError) {
|
|
229
|
+
streams.stderr("json-from-llm: no JSON found\n");
|
|
230
|
+
return 1;
|
|
231
|
+
}
|
|
232
|
+
streams.stderr(
|
|
233
|
+
`json-from-llm: ${error instanceof Error ? error.message : String(error)}
|
|
234
|
+
`
|
|
235
|
+
);
|
|
236
|
+
return 1;
|
|
237
|
+
}
|
|
238
|
+
}
|
|
239
|
+
async function readStdin() {
|
|
240
|
+
process.stdin.setEncoding("utf8");
|
|
241
|
+
let input = "";
|
|
242
|
+
for await (const chunk of process.stdin) {
|
|
243
|
+
input += chunk;
|
|
244
|
+
}
|
|
245
|
+
return input;
|
|
246
|
+
}
|
|
247
|
+
function isExecutedFile(moduleUrl, argvPath) {
|
|
248
|
+
if (!argvPath) {
|
|
249
|
+
return false;
|
|
250
|
+
}
|
|
251
|
+
try {
|
|
252
|
+
return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(argvPath);
|
|
253
|
+
} catch {
|
|
254
|
+
return false;
|
|
255
|
+
}
|
|
256
|
+
}
|
|
257
|
+
function isMain() {
|
|
258
|
+
return isExecutedFile(import.meta.url, process.argv[1]);
|
|
259
|
+
}
|
|
260
|
+
async function main() {
|
|
261
|
+
process.exitCode = await runCli(process.argv.slice(2), await readStdin(), {
|
|
262
|
+
stdout: (chunk) => {
|
|
263
|
+
process.stdout.write(chunk);
|
|
264
|
+
},
|
|
265
|
+
stderr: (chunk) => {
|
|
266
|
+
process.stderr.write(chunk);
|
|
267
|
+
}
|
|
268
|
+
});
|
|
269
|
+
}
|
|
270
|
+
if (isMain()) {
|
|
271
|
+
void main();
|
|
272
|
+
}
|
|
273
|
+
export {
|
|
274
|
+
isExecutedFile,
|
|
275
|
+
runCli
|
|
276
|
+
};
|
|
277
|
+
//# sourceMappingURL=cli.js.map
|
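Review note: the string-aware trailing-comma repair bundled into `dist/cli.js` is easiest to judge on concrete inputs. The function below is condensed from `removeTrailingCommas` in this diff (the whitespace test is written as a single `includes` call, which should be equivalent); the sample inputs are illustrative and are not taken from the package's fixtures.

```javascript
// Condensed from removeTrailingCommas in dist/cli.js (this diff).
function removeTrailingCommas(json) {
  let out = "";
  let inString = false;
  let escaped = false;
  for (let i = 0; i < json.length; i++) {
    const ch = json[i];
    if (inString) {
      // Inside a string every character is copied verbatim.
      out += ch;
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
      out += ch;
      continue;
    }
    if (ch === ",") {
      // Look past whitespace; drop the comma only if a closer follows.
      let j = i + 1;
      while (j < json.length && " \n\r\t".includes(json[j])) j++;
      if (json[j] === "}" || json[j] === "]") continue;
    }
    out += ch;
  }
  return out;
}

console.log(removeTrailingCommas('{"a":1,}'));     // {"a":1}
console.log(removeTrailingCommas('{"s":"a,}",}')); // {"s":"a,}"}
console.log(JSON.parse(removeTrailingCommas("[1,2,\n]")).length); // 2
```

In the second input the `,}` inside the string value survives untouched while the structural trailing comma is removed, which is the property the README describes as "never corrupts".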
package/dist/cli.js.map
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"sources":["../src/cli.ts","../src/repair.ts","../src/scan.ts","../src/strip.ts","../src/types.ts","../src/extract.ts"],"sourcesContent":["#!/usr/bin/env node\nimport { realpathSync } from 'node:fs';\nimport { fileURLToPath } from 'node:url';\nimport { extractJson } from './extract.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions } from './types.ts';\n\nexport interface CliStreams {\n stdout: (chunk: string) => void;\n stderr: (chunk: string) => void;\n}\n\ninterface CliConfig extends ExtractOptions {\n help: boolean;\n}\n\nconst usage = `Usage: json-from-llm [--expect object|array|any] [--no-repair]\n\nRead LLM output from stdin and print the extracted JSON value to stdout.\n\nOptions:\n --expect <type> Require the top-level JSON value to be object, array or any.\n --no-repair Disable conservative trailing-comma repair.\n -h, --help Show this help text.\n`;\n\nfunction parseArgs(args: string[]): CliConfig | string {\n const config: CliConfig = { help: false };\n\n for (let index = 0; index < args.length; index += 1) {\n const arg = args[index];\n\n if (arg === '-h' || arg === '--help') {\n config.help = true;\n continue;\n }\n\n if (arg === '--no-repair') {\n config.repair = false;\n continue;\n }\n\n if (arg === '--expect') {\n const value = args[index + 1];\n if (value !== 'object' && value !== 'array' && value !== 'any') {\n return 'invalid --expect value; use object, array or any';\n }\n config.expect = value;\n index += 1;\n continue;\n }\n\n return `unknown option: ${arg}`;\n }\n\n return config;\n}\n\nexport async function runCli(\n args: string[],\n stdin: string,\n streams: CliStreams,\n): Promise<number> {\n const config = parseArgs(args);\n\n if (typeof config === 'string') {\n streams.stderr(`json-from-llm: ${config}\\n`);\n return 2;\n }\n\n if (config.help) {\n streams.stdout(usage);\n return 0;\n }\n\n try {\n const value = extractJson(stdin, config);\n 
streams.stdout(`${JSON.stringify(value)}\\n`);\n return 0;\n } catch (error) {\n if (error instanceof JsonExtractionError) {\n streams.stderr('json-from-llm: no JSON found\\n');\n return 1;\n }\n\n streams.stderr(\n `json-from-llm: ${error instanceof Error ? error.message : String(error)}\\n`,\n );\n return 1;\n }\n}\n\nasync function readStdin(): Promise<string> {\n process.stdin.setEncoding('utf8');\n let input = '';\n for await (const chunk of process.stdin) {\n input += chunk;\n }\n return input;\n}\n\nexport function isExecutedFile(moduleUrl: string, argvPath?: string): boolean {\n if (!argvPath) {\n return false;\n }\n\n try {\n return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(argvPath);\n } catch {\n return false;\n }\n}\n\nfunction isMain(): boolean {\n return isExecutedFile(import.meta.url, process.argv[1]);\n}\n\nasync function main(): Promise<void> {\n process.exitCode = await runCli(process.argv.slice(2), await readStdin(), {\n stdout: (chunk) => {\n process.stdout.write(chunk);\n },\n stderr: (chunk) => {\n process.stderr.write(chunk);\n },\n });\n}\n\nif (isMain()) {\n void main();\n}\n","/**\n * Remove trailing commas (`{\"a\":1,}` → `{\"a\":1}`, `[1,2,]` → `[1,2]`), which\n * models emit frequently. 
String-aware: a comma inside a string value is never\n * touched, so this can only ever fix structure, never corrupt content.\n */\nexport function removeTrailingCommas(json: string): string {\n let out = '';\n let inString = false;\n let escaped = false;\n\n for (let i = 0; i < json.length; i++) {\n const ch = json[i];\n\n if (inString) {\n out += ch;\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n out += ch;\n continue;\n }\n\n if (ch === ',') {\n let j = i + 1;\n while (\n j < json.length &&\n (json[j] === ' ' ||\n json[j] === '\\n' ||\n json[j] === '\\r' ||\n json[j] === '\\t')\n ) {\n j++;\n }\n if (json[j] === '}' || json[j] === ']') {\n continue; // drop the trailing comma\n }\n }\n\n out += ch;\n }\n\n return out;\n}\n","/**\n * Find the substrings of complete, balanced JSON objects/arrays in `text`,\n * in document order. String-aware: braces and brackets inside JSON strings do\n * not affect nesting, so prose like `\"the } char\"` won't break the scan.\n */\nexport function balancedSpans(text: string): string[] {\n const spans: string[] = [];\n let i = 0;\n while (i < text.length) {\n const ch = text[i];\n if (ch === '{' || ch === '[') {\n const end = matchBalanced(text, i);\n if (end !== -1) {\n spans.push(text.slice(i, end));\n i = end;\n continue;\n }\n }\n i++;\n }\n return spans;\n}\n\n/** Return the index just past the balanced value starting at `start`, or -1. 
*/\nfunction matchBalanced(text: string, start: number): number {\n let depth = 0;\n let inString = false;\n let escaped = false;\n\n for (let i = start; i < text.length; i++) {\n const ch = text[i];\n\n if (inString) {\n if (escaped) {\n escaped = false;\n } else if (ch === '\\\\') {\n escaped = true;\n } else if (ch === '\"') {\n inString = false;\n }\n continue;\n }\n\n if (ch === '\"') {\n inString = true;\n } else if (ch === '{' || ch === '[') {\n depth++;\n } else if (ch === '}' || ch === ']') {\n depth--;\n if (depth === 0) {\n return i + 1;\n }\n }\n }\n\n return -1;\n}\n","/**\n * Remove model \"thinking\" / reasoning blocks. Reasoning models (DeepSeek R1,\n * Qwen, and prompted Claude/Gemini setups) emit `<think>…</think>` or\n * `<thinking>…</thinking>` before the answer, and that text frequently contains\n * brace-laden prose that would otherwise be mistaken for the payload.\n */\nconst REASONING_TAGS = /<(think|thinking|reasoning|thought)>[\\s\\S]*?<\\/\\1>/gi;\n\nexport function stripReasoning(text: string): string {\n return text.replace(REASONING_TAGS, '');\n}\n\n/**\n * Return the inner contents of fenced code blocks that could hold JSON: blocks\n * tagged ```json / ```jsonc / ```json5, or untagged ``` blocks. Other languages\n * (```python, ```ts) are skipped — they won't contain the answer JSON.\n */\nconst FENCE = /```[^\\S\\n]*([a-zA-Z0-9_+-]*)[^\\S\\n]*\\n?([\\s\\S]*?)```/g;\n\nexport function fencedBlocks(text: string): string[] {\n const blocks: string[] = [];\n FENCE.lastIndex = 0;\n let match: RegExpExecArray | null;\n while ((match = FENCE.exec(text)) !== null) {\n const lang = match[1].toLowerCase();\n const content = match[2].trim();\n if (content.length > 0 && (lang === '' || lang.includes('json'))) {\n blocks.push(content);\n }\n }\n return blocks;\n}\n","/** Options for {@link extractJson} and {@link tryExtractJson}. 
*/\nexport interface ExtractOptions {\n /**\n * Apply conservative, string-aware repairs before parsing — currently the\n * removal of trailing commas, which models emit often. Never rewrites string\n * contents. Default `true`.\n */\n repair?: boolean;\n /**\n * Restrict which top-level JSON value to accept: an `'object'`, an `'array'`,\n * or `'any'` (the default).\n */\n expect?: 'object' | 'array' | 'any';\n}\n\n/** The result of {@link tryExtractJson}. */\nexport type ExtractResult<T> =\n | { found: true; value: T }\n | { found: false; value?: undefined };\n\n/** Thrown by {@link extractJson} when no JSON value can be recovered. */\nexport class JsonExtractionError extends Error {\n constructor(\n message: string,\n /** The original text that no JSON could be extracted from. */\n public readonly text: string,\n ) {\n super(message);\n this.name = 'JsonExtractionError';\n }\n}\n","import { removeTrailingCommas } from './repair.ts';\nimport { balancedSpans } from './scan.ts';\nimport { fencedBlocks, stripReasoning } from './strip.ts';\nimport { JsonExtractionError } from './types.ts';\nimport type { ExtractOptions, ExtractResult } from './types.ts';\n\nfunction parseCandidate(\n candidate: string,\n repair: boolean,\n): { ok: true; value: unknown } | { ok: false } {\n try {\n return { ok: true, value: JSON.parse(candidate) };\n } catch {\n // fall through to repair\n }\n if (repair) {\n try {\n return { ok: true, value: JSON.parse(removeTrailingCommas(candidate)) };\n } catch {\n // unrecoverable\n }\n }\n return { ok: false };\n}\n\nfunction matchesExpect(\n value: unknown,\n expect: 'object' | 'array' | 'any',\n): boolean {\n if (expect === 'any') {\n return true;\n }\n if (expect === 'array') {\n return Array.isArray(value);\n }\n return typeof value === 'object' && value !== null && !Array.isArray(value);\n}\n\n/**\n * Extract a JSON value from LLM output without throwing.\n *\n * Strips `<think>` / `<thinking>` reasoning blocks, prefers fenced ```json\n * 
code blocks, then scans for the first balanced object/array that parses\n * (applying conservative repair). Returns `{ found: false }` if nothing parses.\n *\n * @example\n * ```ts\n * const r = tryExtractJson<{ score: number }>('<think>...</think>\\n{\"score\":7}');\n * if (r.found) console.log(r.value.score); // 7\n * ```\n */\nexport function tryExtractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): ExtractResult<T> {\n if (typeof text !== 'string' || text.length === 0) {\n return { found: false };\n }\n\n const repair = options.repair ?? true;\n const expect = options.expect ?? 'any';\n const cleaned = stripReasoning(text);\n\n // Candidate substrings, highest confidence first: fenced blocks (and any\n // balanced values inside them), then balanced values anywhere in the text.\n const candidates: string[] = [];\n for (const block of fencedBlocks(cleaned)) {\n candidates.push(block, ...balancedSpans(block));\n }\n candidates.push(...balancedSpans(cleaned));\n\n for (const candidate of candidates) {\n const parsed = parseCandidate(candidate, repair);\n if (parsed.ok && matchesExpect(parsed.value, expect)) {\n return { found: true, value: parsed.value as T };\n }\n }\n return { found: false };\n}\n\n/**\n * Extract a JSON value from LLM output, throwing {@link JsonExtractionError}\n * if none can be recovered. 
See {@link tryExtractJson} for the algorithm.\n */\nexport function extractJson<T = unknown>(\n text: string,\n options: ExtractOptions = {},\n): T {\n const result = tryExtractJson<T>(text, options);\n if (!result.found) {\n throw new JsonExtractionError(\n 'No JSON value could be extracted from the text.',\n text,\n );\n }\n return result.value;\n}\n"],"mappings":";;;AACA,SAAS,oBAAoB;AAC7B,SAAS,qBAAqB;;;ACGvB,SAAS,qBAAqB,MAAsB;AACzD,MAAI,MAAM;AACV,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,GAAG,IAAI,KAAK,QAAQ,KAAK;AACpC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,aAAO;AACP,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AACX,aAAO;AACP;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,UAAI,IAAI,IAAI;AACZ,aACE,IAAI,KAAK,WACR,KAAK,CAAC,MAAM,OACX,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,QACZ,KAAK,CAAC,MAAM,MACd;AACA;AAAA,MACF;AACA,UAAI,KAAK,CAAC,MAAM,OAAO,KAAK,CAAC,MAAM,KAAK;AACtC;AAAA,MACF;AAAA,IACF;AAEA,WAAO;AAAA,EACT;AAEA,SAAO;AACT;;;AC9CO,SAAS,cAAc,MAAwB;AACpD,QAAM,QAAkB,CAAC;AACzB,MAAI,IAAI;AACR,SAAO,IAAI,KAAK,QAAQ;AACtB,UAAM,KAAK,KAAK,CAAC;AACjB,QAAI,OAAO,OAAO,OAAO,KAAK;AAC5B,YAAM,MAAM,cAAc,MAAM,CAAC;AACjC,UAAI,QAAQ,IAAI;AACd,cAAM,KAAK,KAAK,MAAM,GAAG,GAAG,CAAC;AAC7B,YAAI;AACJ;AAAA,MACF;AAAA,IACF;AACA;AAAA,EACF;AACA,SAAO;AACT;AAGA,SAAS,cAAc,MAAc,OAAuB;AAC1D,MAAI,QAAQ;AACZ,MAAI,WAAW;AACf,MAAI,UAAU;AAEd,WAAS,IAAI,OAAO,IAAI,KAAK,QAAQ,KAAK;AACxC,UAAM,KAAK,KAAK,CAAC;AAEjB,QAAI,UAAU;AACZ,UAAI,SAAS;AACX,kBAAU;AAAA,MACZ,WAAW,OAAO,MAAM;AACtB,kBAAU;AAAA,MACZ,WAAW,OAAO,KAAK;AACrB,mBAAW;AAAA,MACb;AACA;AAAA,IACF;AAEA,QAAI,OAAO,KAAK;AACd,iBAAW;AAAA,IACb,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AAAA,IACF,WAAW,OAAO,OAAO,OAAO,KAAK;AACnC;AACA,UAAI,UAAU,GAAG;AACf,eAAO,IAAI;AAAA,MACb;AAAA,IACF;AAAA,EACF;AAEA,SAAO;AACT;;;AClDA,IAAM,iBAAiB;AAEhB,SAAS,eAAe,MAAsB;AACnD,SAAO,KAAK,QAAQ,gBAAgB,EAAE;AACxC;AAOA,IAAM,QAAQ;AAEP,SAAS,aAAa,MAAwB;AACnD,QAAM,SAAmB,CAAC;AAC1B,QAAM,YAAY;AAClB,MAAI;AACJ,UAAQ,QAAQ,MAAM,KAAK,IAAI,OAAO,MA
AM;AAC1C,UAAM,OAAO,MAAM,CAAC,EAAE,YAAY;AAClC,UAAM,UAAU,MAAM,CAAC,EAAE,KAAK;AAC9B,QAAI,QAAQ,SAAS,MAAM,SAAS,MAAM,KAAK,SAAS,MAAM,IAAI;AAChE,aAAO,KAAK,OAAO;AAAA,IACrB;AAAA,EACF;AACA,SAAO;AACT;;;ACVO,IAAM,sBAAN,cAAkC,MAAM;AAAA,EAC7C,YACE,SAEgB,MAChB;AACA,UAAM,OAAO;AAFG;AAGhB,SAAK,OAAO;AAAA,EACd;AAAA,EAJkB;AAKpB;;;ACxBA,SAAS,eACP,WACA,QAC8C;AAC9C,MAAI;AACF,WAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,SAAS,EAAE;AAAA,EAClD,QAAQ;AAAA,EAER;AACA,MAAI,QAAQ;AACV,QAAI;AACF,aAAO,EAAE,IAAI,MAAM,OAAO,KAAK,MAAM,qBAAqB,SAAS,CAAC,EAAE;AAAA,IACxE,QAAQ;AAAA,IAER;AAAA,EACF;AACA,SAAO,EAAE,IAAI,MAAM;AACrB;AAEA,SAAS,cACP,OACA,QACS;AACT,MAAI,WAAW,OAAO;AACpB,WAAO;AAAA,EACT;AACA,MAAI,WAAW,SAAS;AACtB,WAAO,MAAM,QAAQ,KAAK;AAAA,EAC5B;AACA,SAAO,OAAO,UAAU,YAAY,UAAU,QAAQ,CAAC,MAAM,QAAQ,KAAK;AAC5E;AAeO,SAAS,eACd,MACA,UAA0B,CAAC,GACT;AAClB,MAAI,OAAO,SAAS,YAAY,KAAK,WAAW,GAAG;AACjD,WAAO,EAAE,OAAO,MAAM;AAAA,EACxB;AAEA,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,SAAS,QAAQ,UAAU;AACjC,QAAM,UAAU,eAAe,IAAI;AAInC,QAAM,aAAuB,CAAC;AAC9B,aAAW,SAAS,aAAa,OAAO,GAAG;AACzC,eAAW,KAAK,OAAO,GAAG,cAAc,KAAK,CAAC;AAAA,EAChD;AACA,aAAW,KAAK,GAAG,cAAc,OAAO,CAAC;AAEzC,aAAW,aAAa,YAAY;AAClC,UAAM,SAAS,eAAe,WAAW,MAAM;AAC/C,QAAI,OAAO,MAAM,cAAc,OAAO,OAAO,MAAM,GAAG;AACpD,aAAO,EAAE,OAAO,MAAM,OAAO,OAAO,MAAW;AAAA,IACjD;AAAA,EACF;AACA,SAAO,EAAE,OAAO,MAAM;AACxB;AAMO,SAAS,YACd,MACA,UAA0B,CAAC,GACxB;AACH,QAAM,SAAS,eAAkB,MAAM,OAAO;AAC9C,MAAI,CAAC,OAAO,OAAO;AACjB,UAAM,IAAI;AAAA,MACR;AAAA,MACA;AAAA,IACF;AAAA,EACF;AACA,SAAO,OAAO;AAChB;;;ALhFA,IAAM,QAAQ;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAAA;AAUd,SAAS,UAAU,MAAoC;AACrD,QAAM,SAAoB,EAAE,MAAM,MAAM;AAExC,WAAS,QAAQ,GAAG,QAAQ,KAAK,QAAQ,SAAS,GAAG;AACnD,UAAM,MAAM,KAAK,KAAK;AAEtB,QAAI,QAAQ,QAAQ,QAAQ,UAAU;AACpC,aAAO,OAAO;AACd;AAAA,IACF;AAEA,QAAI,QAAQ,eAAe;AACzB,aAAO,SAAS;AAChB;AAAA,IACF;AAEA,QAAI,QAAQ,YAAY;AACtB,YAAM,QAAQ,KAAK,QAAQ,CAAC;AAC5B,UAAI,UAAU,YAAY,UAAU,WAAW,UAAU,OAAO;AAC9D,eAAO;AAAA,MACT;AACA,aAAO,SAAS;AAChB,eAAS;AACT;AAAA,IACF;AAEA,WAAO,mBAAmB,GAAG;AAAA,EAC/B;AAEA,SAAO;AACT;AAEA,eAAsB,OACpB,MACA,OACA,SACiB;AACjB,QAAM,SA
AS,UAAU,IAAI;AAE7B,MAAI,OAAO,WAAW,UAAU;AAC9B,YAAQ,OAAO,kBAAkB,MAAM;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT;AAEA,MAAI,OAAO,MAAM;AACf,YAAQ,OAAO,KAAK;AACpB,WAAO;AAAA,EACT;AAEA,MAAI;AACF,UAAM,QAAQ,YAAY,OAAO,MAAM;AACvC,YAAQ,OAAO,GAAG,KAAK,UAAU,KAAK,CAAC;AAAA,CAAI;AAC3C,WAAO;AAAA,EACT,SAAS,OAAO;AACd,QAAI,iBAAiB,qBAAqB;AACxC,cAAQ,OAAO,gCAAgC;AAC/C,aAAO;AAAA,IACT;AAEA,YAAQ;AAAA,MACN,kBAAkB,iBAAiB,QAAQ,MAAM,UAAU,OAAO,KAAK,CAAC;AAAA;AAAA,IAC1E;AACA,WAAO;AAAA,EACT;AACF;AAEA,eAAe,YAA6B;AAC1C,UAAQ,MAAM,YAAY,MAAM;AAChC,MAAI,QAAQ;AACZ,mBAAiB,SAAS,QAAQ,OAAO;AACvC,aAAS;AAAA,EACX;AACA,SAAO;AACT;AAEO,SAAS,eAAe,WAAmB,UAA4B;AAC5E,MAAI,CAAC,UAAU;AACb,WAAO;AAAA,EACT;AAEA,MAAI;AACF,WAAO,aAAa,cAAc,SAAS,CAAC,MAAM,aAAa,QAAQ;AAAA,EACzE,QAAQ;AACN,WAAO;AAAA,EACT;AACF;AAEA,SAAS,SAAkB;AACzB,SAAO,eAAe,YAAY,KAAK,QAAQ,KAAK,CAAC,CAAC;AACxD;AAEA,eAAe,OAAsB;AACnC,UAAQ,WAAW,MAAM,OAAO,QAAQ,KAAK,MAAM,CAAC,GAAG,MAAM,UAAU,GAAG;AAAA,IACxE,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,IACA,QAAQ,CAAC,UAAU;AACjB,cAAQ,OAAO,MAAM,KAAK;AAAA,IAC5B;AAAA,EACF,CAAC;AACH;AAEA,IAAI,OAAO,GAAG;AACZ,OAAK,KAAK;AACZ;","names":[]}
|
package/fixtures/README.md
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
1
|
+
# Reasoning Output Fixture Corpus
|
|
2
|
+
|
|
3
|
+
This corpus contains deterministic model-output examples that commonly break
|
|
4
|
+
plain `JSON.parse`:
|
|
5
|
+
|
|
6
|
+
- reasoning/thinking tags with brace-laden prose
|
|
7
|
+
- fenced JSON blocks inside conversational text
|
|
8
|
+
- trailing commas in otherwise valid JSON
|
|
9
|
+
- competing arrays/objects when callers expect a specific top-level type
|
|
10
|
+
- negative text with no recoverable JSON
|
|
11
|
+
|
|
12
|
+
Each `.txt` file has a matching expected JSON file under `fixtures/expected/`.
|
|
13
|
+
The fixtures are synthetic and safe for public CI: they contain no prompts,
|
|
14
|
+
secrets, user data or live provider responses.
|
package/fixtures/deepseek-thinking-object.txt
ADDED
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
<think>
|
|
2
|
+
The user asked for a risk score. I might draft {score: maybe 6}, but that is
|
|
3
|
+
not JSON and should not be selected.
|
|
4
|
+
</think>
|
|
5
|
+
|
|
6
|
+
Final answer:
|
|
7
|
+
```json
|
|
8
|
+
{
|
|
9
|
+
"score": 8,
|
|
10
|
+
"reason": "clear evidence",
|
|
11
|
+
"tags": ["portable", "safe"]
|
|
12
|
+
}
|
|
13
|
+
```
|
package/fixtures/no-json.txt
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
I cannot produce a structured result from the supplied input.
|
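Review note: the type-expectation and no-JSON fixtures exercise the selection rule implemented by `parseCandidate` and `matchesExpect` in `dist/cli.js`. A minimal sketch of that rule follows, assuming the candidate strings have already been collected; `firstMatching` is a hypothetical name, and the real package additionally attempts trailing-comma repair on candidates that fail to parse.

```javascript
// Hypothetical helper (not exported by the package): return the first
// candidate that parses and whose top-level type satisfies `expect`.
function firstMatching(candidates, expect = "any") {
  for (const candidate of candidates) {
    let value;
    try {
      value = JSON.parse(candidate);
    } catch {
      continue; // unparseable candidates are skipped, not fatal
    }
    const isArray = Array.isArray(value);
    const isObject = typeof value === "object" && value !== null && !isArray;
    if (
      expect === "any" ||
      (expect === "array" && isArray) ||
      (expect === "object" && isObject)
    ) {
      return { found: true, value };
    }
  }
  return { found: false };
}

const candidates = ["[1, 2]", '{"a": 1}'];
console.log(JSON.stringify(firstMatching(candidates)));
// {"found":true,"value":[1,2]}
console.log(JSON.stringify(firstMatching(candidates, "object")));
// {"found":true,"value":{"a":1}}
console.log(JSON.stringify(firstMatching(["I cannot produce a structured result."])));
// {"found":false}
```

With `expect` set to `object` the earlier array is passed over rather than treated as a failure, and text with nothing parseable yields `{ found: false }`, which the CLI reports as exit code `1`.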
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "json-from-llm",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.2.0",
|
|
4
4
|
"description": "Extract valid JSON from an LLM response, even when it is wrapped in reasoning/thinking tags, markdown fences or prose. Zero dependencies.",
|
|
5
5
|
"keywords": [
|
|
6
6
|
"llm",
|
|
@@ -32,6 +32,9 @@
|
|
|
32
32
|
"main": "./dist/index.cjs",
|
|
33
33
|
"module": "./dist/index.js",
|
|
34
34
|
"types": "./dist/index.d.ts",
|
|
35
|
+
"bin": {
|
|
36
|
+
"json-from-llm": "dist/cli.js"
|
|
37
|
+
},
|
|
35
38
|
"exports": {
|
|
36
39
|
".": {
|
|
37
40
|
"types": "./dist/index.d.ts",
|
|
@@ -42,6 +45,7 @@
|
|
|
42
45
|
},
|
|
43
46
|
"files": [
|
|
44
47
|
"dist",
|
|
48
|
+
"fixtures",
|
|
45
49
|
"README.md",
|
|
46
50
|
"LICENSE",
|
|
47
51
|
"CHANGELOG.md"
|
|
@@ -67,7 +71,7 @@
|
|
|
67
71
|
"eslint": "^10.4.1",
|
|
68
72
|
"prettier": "^3.4.2",
|
|
69
73
|
"tsup": "^8.3.5",
|
|
70
|
-
"typescript": "^
|
|
74
|
+
"typescript": "^6.0.3",
|
|
71
75
|
"typescript-eslint": "^8.60.0",
|
|
72
76
|
"vitest": "^4.1.8"
|
|
73
77
|
}
|