json-llm-repair 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Tiago Gouvêa
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,201 @@
1
+ # json-llm-repair
2
+
3
+ Parse and repair JSON from LLM outputs with intelligent repair strategies.
4
+
5
+ ## Why?
6
+
7
+ LLMs frequently return JSON in unexpected formats. Models without `response_format` support often wrap JSON in explanatory text or produce malformed syntax. Even models with structured output support (like OpenAI's JSON mode or Anthropic's tool use) occasionally fail to return the exact schema, omitting wrapper objects or adding extra fields.
8
+
9
+ This library handles these issues automatically, with configurable repair strategies.
10
+
11
+ ## Installation
12
+
13
+ ```bash
14
+ npm install json-llm-repair
15
+ # or
16
+ yarn add json-llm-repair
17
+ ```
18
+
19
+ ## Quick Start
20
+
21
+ ```typescript
22
+ import { parseFromLLM } from 'json-llm-repair';
23
+
24
+ const llmOutput = 'Sure! Here is the data: {"name": "John", "age": 30} if you need anything else please let me know.';
25
+ const data = parseFromLLM(llmOutput);
26
+ console.log(data); // { name: "John", age: 30 }
27
+ ```
28
+
29
+ ## What It Fixes
30
+
31
+ ### 1. Extra Text Around JSON
32
+ LLMs often add explanatory text before or after JSON.
33
+
34
+ ```typescript
35
+ const llmOutput = 'Sure! Here is the data: {"name": "John"} Hope this helps!';
36
+ const data = parseFromLLM(llmOutput);
37
+ // Both modes handle this
38
+ ```
39
+
40
+ ### 2. JSON Inside Markdown Code Blocks
41
+ Common with ChatGPT, Claude, and other assistants.
42
+
43
+ ```typescript
44
+ const llmOutput = `Here's your data:
45
+ \`\`\`json
46
+ {"name": "John", "age": 30}
47
+ \`\`\``;
48
+ const data = parseFromLLM(llmOutput);
49
+ // Both modes handle this
50
+ ```
51
+
52
+ ### 3. Multiple JSONs Concatenated
53
+ When the LLM outputs multiple JSON objects in sequence.
54
+
55
+ ```typescript
56
+ const llmOutput = '{"id": 1}{"id": 2}{"id": 3}';
57
+ const data = parseFromLLM(llmOutput);
58
+ // Returns first valid JSON: {"id": 1}
59
+ ```
60
+
61
+ ### 4. Invalid JSON Syntax
62
+ Missing quotes, trailing commas, unquoted keys (repair mode only).
63
+
64
+ ```typescript
65
+ const llmOutput = '{name: "John", age: 30,}';
66
+ const data = parseFromLLM(llmOutput, { mode: 'repair' });
67
+ // Fixed to: {"name": "John", "age": 30}
68
+ ```
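To make the kind of rewrite in this section concrete, here is a deliberately naive sketch. It is not how `jsonrepair` works, and `toyRepair` is a hypothetical name used only for illustration; the regular expressions will corrupt string values that happen to contain similar patterns.

```typescript
// Toy illustration only: NOT the jsonrepair algorithm, and unsafe for string
// values that themselves contain text such as `, key:` or `,}`.
function toyRepair(text: string): string {
  return text
    // Quote bare identifiers used as keys: {name: ...} -> {"name": ...}
    .replace(/([{,]\s*)([A-Za-z_$][\w$]*)(\s*:)/g, '$1"$2"$3')
    // Drop commas that directly precede a closing brace or bracket
    .replace(/,\s*([}\]])/g, "$1");
}

const broken = '{name: "John", age: 30,}';
const fixed = toyRepair(broken);
console.log(fixed); // {"name": "John", "age": 30}
console.log(JSON.parse(fixed).age); // 30
```

A real repair library tokenizes the input instead of pattern-matching on it, which is why the package delegates this step to `jsonrepair`.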
69
+
70
+ ### 5. Missing Root Key
71
+ The LLM forgets the wrapper object expected by your schema (repair mode + schema).
72
+
73
+ ```typescript
74
+ import { z } from 'zod';
75
+
76
+ const UserSchema = z.object({
77
+ user: z.object({ name: z.string(), age: z.number() })
78
+ });
79
+
80
+ const llmOutput = '{"name": "John", "age": 30}';
81
+ const data = parseFromLLM(llmOutput, { mode: 'repair', schema: UserSchema });
82
+ // Wrapped to: { user: { name: "John", age: 30 } }
83
+ ```
84
+
85
+ ### 6. Unescaped Quotes in Strings
86
+ The LLM embeds quotes without proper escaping (repair mode only).
87
+
88
+ ```typescript
89
+ const llmOutput = '{"message": "She said "hello" to me"}';
90
+ const data = parseFromLLM(llmOutput, { mode: 'repair' });
91
+ // Fixed to: { message: 'She said "hello" to me' }
92
+ ```
93
+
94
+ > **Note:** May not work reliably with non-ASCII characters (accents, etc).
95
+
96
+ ### 7. Missing Closing Braces or Quotes
97
+ Incomplete JSON from streaming or interrupted responses (repair mode only).
98
+
99
+ ```typescript
100
+ const llmOutput = '{"name": "John", "age": 30';
101
+ const data = parseFromLLM(llmOutput, { mode: 'repair' });
102
+ // Fixed to: { name: "John", age: 30 }
103
+ ```
104
+
105
+ ### 8. Duplicate Keys
106
+ Same property appearing multiple times. `JSON.parse` keeps the last value, so both modes handle this.
107
+
108
+ ```typescript
109
+ const llmOutput = '{"id": 1, "name": "Alice", "id": 2}';
110
+ const data = parseFromLLM(llmOutput, { mode: 'repair' });
111
+ // Result: { id: 2, name: "Alice" } (last value wins)
112
+ ```
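For reference, the built-in `JSON.parse` accepts duplicate keys and keeps the last value, which matches the result shown above. A quick check using only the standard library (the variable names are illustrative):

```typescript
// Duplicate keys are legal input for JSON.parse; later values overwrite
// earlier ones, and the key keeps its original position.
const duplicated = '{"id": 1, "name": "Alice", "id": 2}';
const parsedDup = JSON.parse(duplicated) as { id: number; name: string };
console.log(parsedDup); // { id: 2, name: 'Alice' }
```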
113
+
114
+ ## Mode Comparison
115
+
116
+ | Failure Type | Parse Mode | Repair Mode |
117
+ |--------------|------------|-------------|
118
+ | Text before/after JSON | ✅ Extracts | ✅ Extracts |
119
+ | JSON in markdown blocks | ✅ Extracts | ✅ Extracts |
120
+ | Concatenated JSONs | ✅ Returns first | ✅ Returns first |
121
+ | Missing quotes in keys | ❌ Throws error | ✅ Fixes with jsonrepair |
122
+ | Trailing commas | ❌ Throws error | ✅ Fixes with jsonrepair |
123
+ | Unquoted keys | ❌ Throws error | ✅ Fixes with jsonrepair |
124
+ | Unescaped quotes in values | ❌ Throws error | ✅ Fixes with jsonrepair |
125
+ | Missing closing braces/quotes | ❌ Throws error | ✅ Fixes with jsonrepair |
126
+ | Duplicate keys in object | ✅ Parses (last wins) | ✅ Parses (last wins) |
127
+ | Missing root object | ❌ Returns as-is | ✅ Wraps (with schema) |
128
+ | Completely invalid JSON | ❌ Throws error | ⚠️ Best effort repair |
129
+
130
+ ## Modes
131
+
132
+ | Mode | Behavior |
133
+ |------|----------|
134
+ | `parse` (default) | Extract and parse JSON. Fails on syntax errors. |
135
+ | `repair` | All strategies: jsonrepair, multiple candidates, schema fixes. |
136
+
137
+ ## Examples
138
+
139
+ ### Parse Mode (default)
140
+
141
+ ```typescript
142
+ // Extracts JSON from text, no repair
143
+ const data = parseFromLLM('Here is your data: {"name": "John"}');
144
+ ```
145
+
146
+ ### Repair Mode
147
+
148
+ ```typescript
149
+ // Handles broken JSON syntax
150
+ const data = parseFromLLM(
151
+ 'Sure! {name: "John", age: 30}', // missing quotes
152
+ { mode: 'repair' }
153
+ );
154
+ ```
155
+
156
+ ### Repair Mode + Schema
157
+
158
+ ```typescript
159
+ import { z } from 'zod';
160
+
161
+ const UserSchema = z.object({
162
+ user: z.object({
163
+ name: z.string(),
164
+ age: z.number()
165
+ })
166
+ });
167
+
168
+ // LLM forgot the root "user" key
169
+ const data = parseFromLLM(
170
+ '{"name": "John", "age": 30}',
171
+ { mode: 'repair', schema: UserSchema }
172
+ );
173
+ console.log(data); // { user: { name: "John", age: 30 } }
174
+ ```
175
+
176
+ ## API
177
+
178
+ ### `parseFromLLM<T>(llmOutput: string, options?: ParseOptions): T`
179
+
180
+ Parses JSON from LLM output.
181
+
182
+ **Parameters:**
183
+ - `llmOutput: string` - Raw string from LLM that may contain JSON
184
+ - `options?: ParseOptions` - Optional configuration
185
+
186
+ **Options:**
187
+ - `mode?: 'parse' | 'repair'` - Parsing strategy (default: `'parse'`)
188
+ - `schema?: ZodSchema` - Optional Zod schema used to wrap a missing root key (repair mode only); the parsed result is not validated against it
189
+
190
+ ### Helper Functions
191
+
192
+ - `hasPossibleJson(str: string): boolean` - Check if string contains JSON braces
193
+ - `isJsonString(str: string): boolean` - Validate if string is valid JSON
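The two helpers answer different questions, which a small sketch can show. The functions below are local stand-ins written to mirror the helper logic in `dist/index.js` so the snippet runs without installing the package; `hasBraces` and `parsesAsJson` are hypothetical names.

```typescript
// Local stand-ins mirroring the package helpers (assumption: same logic as
// hasPossibleJson and isJsonString in dist/index.js).
function hasBraces(str: string): boolean {
  return str.indexOf("{") > -1 && str.lastIndexOf("}") + 1 > 1;
}

function parsesAsJson(str: string): boolean {
  try {
    JSON.parse(str);
    return true;
  } catch {
    return false;
  }
}

const samples = ['{"a": 1}', 'Here: {"a": 1}', "42", "no json here"];
for (const sample of samples) {
  console.log(JSON.stringify(sample), hasBraces(sample), parsesAsJson(sample));
}
```

Text with an embedded object passes the brace check but is not valid JSON on its own, while a bare number such as `42` is valid JSON and contains no braces.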
194
+
195
+ ## Found an Issue?
196
+
197
+ If you encounter a JSON output format that this library doesn't handle, please [open an issue](../../issues) with an example. We'll be happy to help and improve the library!
198
+
199
+ ## License
200
+
201
+ MIT
package/dist/index.d.ts ADDED
@@ -0,0 +1,49 @@
1
+ import { z } from 'zod';
2
+ /**
3
+ * Parsing mode options:
4
+ * - parse: Only basic JSON extraction and parsing
5
+ * - repair: All repair strategies including jsonrepair and schema fixes
6
+ */
7
+ export type ParseMode = 'parse' | 'repair';
8
+ /**
9
+ * Options for parseFromLLM function
10
+ */
11
+ export interface ParseOptions {
12
+ /**
13
+ * Parsing mode
14
+ * @default 'parse'
15
+ */
16
+ mode?: ParseMode;
17
+ /**
18
+ * Optional Zod schema used for structural fixes (wrapping a missing root key)
19
+ * Only used in repair mode
20
+ */
21
+ schema?: z.ZodTypeAny;
22
+ }
23
+ /**
24
+ * Parses and extracts JSON from LLM output strings
25
+ *
26
+ * @param input - Raw string from LLM that may contain JSON
27
+ * @param options - Parsing options
28
+ * @returns Parsed JSON object
29
+ * @throws Error if no valid JSON is found
30
+ *
31
+ * @example
32
+ * ```ts
33
+ * // Parse mode (default)
34
+ * const data = parseFromLLM('Here is the data: {"name": "John"}');
35
+ *
36
+ * // Repair mode with schema
37
+ * const schema = z.object({ user: z.object({ name: z.string() }) });
38
+ * const data = parseFromLLM('{"name": "John"}', { mode: 'repair', schema });
39
+ * ```
40
+ */
41
+ export declare function parseFromLLM<T = any>(input: string, options?: ParseOptions): T;
42
+ /**
43
+ * Checks whether the string may contain JSON braces
44
+ */
45
+ export declare function hasPossibleJson(str: string): boolean;
46
+ /**
47
+ * Validates whether the string is valid JSON
48
+ */
49
+ export declare function isJsonString(str: string): boolean;
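The generic parameter `T` on `parseFromLLM` is a compile-time annotation only, so callers who need a guarantee about the shape may want a runtime check. A minimal sketch, assuming a hypothetical `UserPayload` shape and a hand-written guard (neither is part of the package):

```typescript
// Hypothetical shape and guard, used only for illustration.
interface UserPayload {
  user: { name: string; age: number };
}

function isUserPayload(value: unknown): value is UserPayload {
  if (typeof value !== "object" || value === null) return false;
  const user = (value as { user?: unknown }).user;
  if (typeof user !== "object" || user === null) return false;
  const fields = user as { name?: unknown; age?: unknown };
  return typeof fields.name === "string" && typeof fields.age === "number";
}

// Stand-in for a value returned by parseFromLLM<UserPayload>(...)
const candidate: unknown = JSON.parse('{"user": {"name": "John", "age": 30}}');
if (isUserPayload(candidate)) {
  console.log(candidate.user.name); // John
}
```

With Zod installed, calling the schema's own `parse` or `safeParse` on the result would serve the same purpose.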
package/dist/index.js ADDED
@@ -0,0 +1,226 @@
1
+ "use strict";
2
+ Object.defineProperty(exports, "__esModule", { value: true });
3
+ exports.parseFromLLM = parseFromLLM;
4
+ exports.hasPossibleJson = hasPossibleJson;
5
+ exports.isJsonString = isJsonString;
6
+ const jsonrepair_1 = require("jsonrepair");
7
+ const zod_1 = require("zod");
8
+ /**
9
+ * Parses and extracts JSON from LLM output strings
10
+ *
11
+ * @param input - Raw string from LLM that may contain JSON
12
+ * @param options - Parsing options
13
+ * @returns Parsed JSON object
14
+ * @throws Error if no valid JSON is found
15
+ *
16
+ * @example
17
+ * ```ts
18
+ * // Parse mode (default)
19
+ * const data = parseFromLLM('Here is the data: {"name": "John"}');
20
+ *
21
+ * // Repair mode with schema
22
+ * const schema = z.object({ user: z.object({ name: z.string() }) });
23
+ * const data = parseFromLLM('{"name": "John"}', { mode: 'repair', schema });
24
+ * ```
25
+ */
26
+ function parseFromLLM(input, options) {
27
+ const mode = options?.mode || 'parse';
28
+ const schema = options?.schema;
29
+ let result;
30
+ if (mode === 'parse') {
31
+ result = parseOnly(input);
32
+ }
33
+ else {
34
+ result = parseWithRepair(input);
35
+ }
36
+ // Apply schema fixes only in repair mode
37
+ if (mode === 'repair' && schema) {
38
+ result = wrapRootIfMissing(result, schema);
39
+ }
40
+ return result;
41
+ }
42
+ /**
43
+ * Parse mode: extract and parse JSON without repair
44
+ */
45
+ function parseOnly(input) {
46
+ // Try to find first complete JSON object
47
+ const firstJson = findFirstCompleteJson(input);
48
+ if (!firstJson) {
49
+ throw new Error('No JSON found in the string.');
50
+ }
51
+ try {
52
+ return JSON.parse(firstJson);
53
+ }
54
+ catch (error) {
55
+ throw new Error('Failed to parse JSON: ' + error.message);
56
+ }
57
+ }
58
+ /**
59
+ * Repair mode: multiple strategies with repair
60
+ */
61
+ function parseWithRepair(input) {
62
+ const cleaned = extractOnlyJson(input);
63
+ if (cleaned.startsWith('Invalid input')) {
64
+ throw new Error('No JSON found in the string.');
65
+ }
66
+ // Strategy 1: Try all possible JSON candidates
67
+ const possibleJson = findAllPossibleJson(cleaned);
68
+ for (const jsonCandidate of possibleJson) {
69
+ // Try native JSON.parse first (fast path)
70
+ try {
71
+ return JSON.parse(jsonCandidate);
72
+ }
73
+ catch (parseError) {
74
+ // Try jsonrepair as fallback
75
+ try {
76
+ const repaired = (0, jsonrepair_1.jsonrepair)(jsonCandidate);
77
+ return JSON.parse(repaired);
78
+ }
79
+ catch (repairError) {
80
+ continue;
81
+ }
82
+ }
83
+ }
84
+ // Strategy 2: Fallback to first complete JSON
85
+ const firstJson = findFirstCompleteJson(cleaned);
86
+ if (firstJson) {
87
+ try {
88
+ return JSON.parse(firstJson);
89
+ }
90
+ catch (parseError) {
91
+ const repaired = (0, jsonrepair_1.jsonrepair)(firstJson);
92
+ return JSON.parse(repaired);
93
+ }
94
+ }
95
+ throw new Error('No valid JSON found in the string.');
96
+ }
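The candidate loop in `parseWithRepair` follows a common pattern: try the cheap native parse first, fall back to repair, and move to the next candidate if both fail. A standalone sketch of that pattern, where `firstParsable` is a hypothetical name and `repair` is a caller-supplied stand-in for a real repair library:

```typescript
// `repair` is a caller-supplied stand-in for a real repair library.
function firstParsable(
  candidates: string[],
  repair: (text: string) => string,
): unknown {
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate); // fast path: already valid
    } catch {
      try {
        return JSON.parse(repair(candidate)); // fallback: repair, then parse
      } catch {
        continue; // neither worked; try the next candidate
      }
    }
  }
  throw new Error("No valid JSON found in the string.");
}

// Trivial stand-in that only removes trailing commas.
const dropTrailingCommas = (text: string): string =>
  text.replace(/,\s*([}\]])/g, "$1");

console.log(firstParsable(["{oops", '{"id": 1,}'], dropTrailingCommas));
// { id: 1 }
```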
97
+ /**
98
+ * Wraps parsed object with root key if schema expects it but it's missing
99
+ * Only applies when schema has a single root object key
100
+ */
101
+ function wrapRootIfMissing(parsed, schema) {
102
+ if (!(schema instanceof zod_1.z.ZodObject))
103
+ return parsed;
104
+ const shape = schema.shape;
105
+ const rootKeys = Object.keys(shape);
106
+ if (rootKeys.length !== 1)
107
+ return parsed;
108
+ const rootKey = rootKeys[0];
109
+ const rootSchema = shape[rootKey];
110
+ if (!(rootSchema instanceof zod_1.z.ZodObject))
111
+ return parsed;
112
+ // Already has the root key
113
+ if (parsed && typeof parsed === 'object' && rootKey in parsed)
114
+ return parsed;
115
+ // Check if parsed has all children of the expected root
116
+ const childShape = rootSchema.shape;
117
+ const childKeys = Object.keys(childShape);
118
+ const hasAllChildren = parsed && typeof parsed === 'object' && childKeys.every((k) => k in parsed);
119
+ if (hasAllChildren) {
120
+ return { [rootKey]: parsed };
121
+ }
122
+ return parsed;
123
+ }
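The wrapping rule can be illustrated without Zod. In this sketch, `wrapUnderRoot` is a hypothetical name, and `rootKey` and `childKeys` stand in for what `wrapRootIfMissing` reads from the schema's shape:

```typescript
// Hypothetical, Zod-free version of the wrapping rule.
function wrapUnderRoot(
  parsed: unknown,
  rootKey: string,
  childKeys: string[],
): unknown {
  if (parsed === null || typeof parsed !== "object") return parsed;
  const obj = parsed as Record<string, unknown>;
  if (rootKey in obj) return parsed; // root key already present
  const hasAllChildren = childKeys.every((key) => key in obj);
  return hasAllChildren ? { [rootKey]: parsed } : parsed;
}

console.log(wrapUnderRoot({ name: "John", age: 30 }, "user", ["name", "age"]));
// { user: { name: 'John', age: 30 } }
console.log(wrapUnderRoot({ name: "John" }, "user", ["name", "age"]));
// { name: 'John' }  (a child key is missing, so nothing is wrapped)
```

Because every child key must be present, an object that omits a field declared in the inner schema is returned unwrapped, even if that field is optional in the schema.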
124
+ /**
125
+ * Extracts the substring between the first and last braces
126
+ */
127
+ function extractOnlyJson(str) {
128
+ const start = str.indexOf('{');
129
+ const end = str.lastIndexOf('}') + 1;
130
+ if (start !== -1 && end > start) {
131
+ return str.slice(start, end);
132
+ }
133
+ return 'Invalid input: no braces found.';
134
+ }
135
+ /**
136
+ * Finds every complete JSON object in the input
137
+ */
138
+ function findAllPossibleJson(input) {
139
+ const candidates = [];
140
+ for (let i = 0; i < input.length; i++) {
141
+ if (input[i] === '{') {
142
+ const jsonCandidate = findCompleteJsonStartingAt(input, i);
143
+ if (jsonCandidate) {
144
+ candidates.push(jsonCandidate);
145
+ }
146
+ }
147
+ }
148
+ return candidates;
149
+ }
150
+ /**
151
+ * Returns a balanced JSON object starting at a given index
152
+ */
153
+ function findCompleteJsonStartingAt(input, startIndex) {
154
+ let braceCount = 0;
155
+ let inString = false;
156
+ let escapeNext = false;
157
+ for (let i = startIndex; i < input.length; i++) {
158
+ const char = input[i];
159
+ if (escapeNext) {
160
+ escapeNext = false;
161
+ continue;
162
+ }
163
+ if (char === '\\') {
164
+ escapeNext = true;
165
+ continue;
166
+ }
167
+ if (char === '"' && !escapeNext) {
168
+ inString = !inString;
169
+ continue;
170
+ }
171
+ if (!inString) {
172
+ if (char === '{') {
173
+ braceCount++;
174
+ }
175
+ else if (char === '}') {
176
+ braceCount--;
177
+ if (braceCount === 0) {
178
+ return input.substring(startIndex, i + 1);
179
+ }
180
+ }
181
+ }
182
+ }
183
+ return null;
184
+ }
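The scanner above tracks string state so that braces inside string values do not affect the depth count. A standalone sketch of the same idea, using the hypothetical name `sliceBalancedObject`:

```typescript
// Hypothetical standalone helper (not exported by the package): returns the
// balanced {...} substring starting at `start`, or null if it never closes.
function sliceBalancedObject(text: string, start: number): string | null {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (escaped) {
      escaped = false; // the character after a backslash is taken literally
      continue;
    }
    if (ch === "\\") {
      escaped = true;
      continue;
    }
    if (ch === '"') {
      inString = !inString;
      continue;
    }
    if (inString) continue; // braces inside strings are ignored
    if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return null;
}

const reply = 'Result: {"note": "use } carefully", "ok": true} Done.';
const found = sliceBalancedObject(reply, reply.indexOf("{"));
console.log(found); // {"note": "use } carefully", "ok": true}
```

A scanner that counts braces without tracking strings would stop at the `}` inside the `note` value and return a truncated object.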
185
+ /**
186
+ * Finds the first complete JSON object in the input
187
+ */
188
+ function findFirstCompleteJson(input) {
189
+ let braceCount = 0;
190
+ let startIndex = -1;
191
+ for (let i = 0; i < input.length; i++) {
192
+ const char = input[i];
193
+ if (char === '{') {
194
+ if (startIndex === -1)
195
+ startIndex = i;
196
+ braceCount++;
197
+ }
198
+ else if (char === '}') {
199
+ braceCount--;
200
+ if (braceCount === 0 && startIndex !== -1) {
201
+ return input.substring(startIndex, i + 1);
202
+ }
203
+ }
204
+ }
205
+ return null;
206
+ }
207
+ /**
208
+ * Checks whether the string may contain JSON braces
209
+ */
210
+ function hasPossibleJson(str) {
211
+ const start = str.indexOf('{');
212
+ const end = str.lastIndexOf('}') + 1;
213
+ return start > -1 && end > 1;
214
+ }
215
+ /**
216
+ * Validates whether the string is valid JSON
217
+ */
218
+ function isJsonString(str) {
219
+ try {
220
+ JSON.parse(str);
221
+ return true;
222
+ }
223
+ catch (e) {
224
+ return false;
225
+ }
226
+ }
package/package.json ADDED
@@ -0,0 +1,64 @@
1
+ {
2
+ "name": "json-llm-repair",
3
+ "version": "0.1.0",
4
+ "description": "Parse and repair JSON from LLM outputs with multiple strategies",
5
+ "main": "dist/index.js",
6
+ "types": "dist/index.d.ts",
7
+ "files": [
8
+ "dist"
9
+ ],
10
+ "scripts": {
11
+ "build": "tsc",
12
+ "test": "vitest run",
13
+ "test:watch": "vitest",
14
+ "prepublishOnly": "npm run build",
15
+ "release:patch": "npm version patch -m \"chore(release): %s\" && git push --follow-tags && npm publish",
16
+ "release:minor": "npm version minor -m \"chore(release): %s\" && git push --follow-tags && npm publish",
17
+ "release:major": "npm version major -m \"chore(release): %s\" && git push --follow-tags && npm publish",
18
+ "lint": "eslint src --ext .ts",
19
+ "lint:fix": "eslint src --ext .ts --fix",
20
+ "format": "prettier --write \"src/**/*.ts\"",
21
+ "format:check": "prettier --check \"src/**/*.ts\""
22
+ },
23
+ "keywords": [
24
+ "llm",
25
+ "json",
26
+ "parser",
27
+ "ai",
28
+ "openai",
29
+ "anthropic",
30
+ "repair",
31
+ "extract",
32
+ "typescript"
33
+ ],
34
+ "author": "Tiago Gouvêa",
35
+ "license": "MIT",
36
+ "repository": {
37
+ "type": "git",
38
+ "url": "https://github.com/tiagogouvea/json-llm-repair.git"
39
+ },
40
+ "homepage": "https://github.com/tiagogouvea/json-llm-repair#readme",
41
+ "bugs": {
42
+ "url": "https://github.com/tiagogouvea/json-llm-repair/issues"
43
+ },
44
+ "dependencies": {
45
+ "jsonrepair": "^3.8.0"
46
+ },
47
+ "peerDependencies": {
48
+ "zod": "^3.0.0"
49
+ },
50
+ "peerDependenciesMeta": {
51
+ "zod": {
52
+ "optional": true
53
+ }
54
+ },
55
+ "devDependencies": {
56
+ "zod": "^3.22.0",
57
+ "@typescript-eslint/eslint-plugin": "^6.0.0",
58
+ "@typescript-eslint/parser": "^6.0.0",
59
+ "eslint": "^8.0.0",
60
+ "prettier": "^3.0.0",
61
+ "typescript": "^5.0.0",
62
+ "vitest": "^1.0.0"
63
+ }
64
+ }