distenum-0.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
distenum-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Karan Taneja
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
distenum-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,129 @@
+ Metadata-Version: 2.4
+ Name: distenum
+ Version: 0.1.0
+ Summary: Parse JSON outputs with logprobs from the OpenAI API to convert enum fields into probability distributions.
+ Author: Karan Taneja
+ License-Expression: MIT
+ Project-URL: Repository, https://github.com/ktaneja6/distenum
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Provides-Extra: openai
+ Requires-Dist: openai>=1.0; extra == "openai"
+ Provides-Extra: dev
+ Requires-Dist: pytest; extra == "dev"
+ Requires-Dist: ruff; extra == "dev"
+ Dynamic: license-file
+
+ # distenum
+
+ **distenum** parses JSON outputs with logprobs from the OpenAI API and converts `enum`-type string fields into probability distributions instead of a single label.
+
+ ## What it does
+
+ With structured outputs, the API returns a single enum value (e.g. `"positive"`). If you request `logprobs=True`, you also get token-level logprobs. **distenum** turns those logprobs into a probability distribution over your enum options so you can see how confident the model was in each choice.
+
+ **Example:** For a sentiment field with `enum: ["positive", "negative", "neutral"]`:
+
+ | From the API (content only) | With distenum (using logprobs) |
+ |-----------------------------|---------------------------------|
+ | `"sentiment": "positive"` | `"sentiment": {"positive": 0.72, "negative": 0.18, "neutral": 0.10}` |
+
+ So instead of a single label, you get a distribution you can use for uncertainty, ranking, or thresholding (e.g. only accept when `positive` probability > 0.8).
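The conversion can be sketched in a few lines of plain Python (an editorial sketch with a hypothetical `enum_distribution` helper and made-up numbers, not the package's API): each enum label collects the probability mass, `exp(logprob)`, of every top-logprob alternative token that is a prefix of it, and the masses are then normalized to sum to 1.

```python
import math

def enum_distribution(enum_values, top_logprobs):
    # top_logprobs: (token, logprob) pairs for the first token of the enum value.
    # Each label collects exp(logprob) from every alternative token that
    # prefixes it; the collected masses are then normalized to sum to 1.
    mass = {
        label: sum(
            math.exp(logprob)
            for token, logprob in top_logprobs
            if label.lower().startswith(token.strip().lower())
        )
        for label in enum_values
    }
    total = sum(mass.values())
    if total == 0:
        raise ValueError("no logprob mass matched any enum value")
    return {label: m / total for label, m in mass.items()}

dist = enum_distribution(
    ["positive", "negative", "neutral"],
    [("positive", math.log(0.72)),
     ("negative", math.log(0.18)),
     ("neutral", math.log(0.10))],
)
# dist is approximately {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
```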
+
+ ## Install
+
+ ```bash
+ pip install distenum
+ ```
+
+ To run the example script (calls the OpenAI API):
+
+ ```bash
+ pip install distenum[openai]
+ ```
+
+ ## Quick start
+
+ Install the package and the OpenAI client: `pip install distenum[openai]`. Then call the API with `logprobs=True` and `top_logprobs=20`, and pass the response logprobs into distenum:
+
+ ```python
+ from openai import OpenAI
+ from distenum import parse_using_schema_and_logprobs
+
+ schema = {
+     "type": "object",
+     "properties": {
+         "sentiment": {
+             "type": "string",
+             "enum": ["positive", "negative", "neutral"]
+         }
+     }
+ }
+
+ client = OpenAI()
+ response = client.chat.completions.create(
+     model="gpt-4o-2024-08-06",
+     messages=[{"role": "user", "content": "Your prompt"}],
+     response_format={
+         "type": "json_schema",
+         "json_schema": {"name": "my_schema", "strict": True, "schema": schema}
+     },
+     logprobs=True,
+     top_logprobs=20,
+ )
+
+ logprobs_data = response.choices[0].logprobs
+ parsed = parse_using_schema_and_logprobs(schema, logprobs_data)
+ # parsed["sentiment"] might be: {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
+ ```
+
+ ## Enum design tips
+
+ - **Different prefixes:** Enum values are matched to token logprobs by **prefix**. Prefer enum labels that do not share a common prefix (e.g. `"positive"`, `"negative"`, `"neutral"` are good; `"pos"` and `"positive"` can blur probabilities).
+ - **Fewer is better:** The API returns at most **20** logprobs per token (`top_logprobs=20`). With many enum values, most will get no mass; keep enums small for meaningful distributions.
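The first tip can be demonstrated directly with the prefix-matching rule (an editorial sketch with hypothetical numbers): when two labels share a prefix, a single alternative token's mass is credited to both labels, so the token alone cannot tell them apart.

```python
import math

# Suppose the model's top alternatives for the enum's first token are:
top_logprobs = [("pos", math.log(0.9)), ("neg", math.log(0.1))]
labels = ["pos", "positive"]  # both labels start with the token "pos"

matched = {
    label: sum(
        math.exp(logprob)
        for token, logprob in top_logprobs
        if label.startswith(token)
    )
    for label in labels
}
# Both labels absorb the same 0.9 mass from the "pos" token, so the
# resulting distribution cannot distinguish "pos" from "positive".
```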
+
+ ## API
+
+ - **`parse_using_schema_and_logprobs(schema_dict, logprobs_data)`**
+   Parses the logprobs stream according to the JSON Schema. Fields of type `string` with an `enum` are returned as a dict mapping each enum label to a probability (non-negative, summing to 1). Other fields are parsed as normal JSON values.
+
+ - **`tokenize(logprobs_data)`**
+   Low-level generator that yields tokens and their top-logprobs from the OpenAI logprobs content.
+
+ ## Performance
+
+ The parser walks token-level logprobs and builds probability distributions for enum fields, so it is slower than parsing the same JSON with the standard library. A rough comparison (same logical structure, 100k iterations):
+
+ | Parser | Time (100k parses) | Throughput | Avg per parse |
+ |---------------|--------------------|--------------|---------------|
+ | `json.loads` | ~0.16 s | ~630k/sec | ~1.6 µs |
+ | distenum | ~3.0 s | ~33k/sec | ~30 µs |
+
+ So distenum is typically **about 15–20× slower** than `json.loads` for the same structure. In absolute terms, **~30 µs per parse** is negligible compared to an OpenAI API call (typically hundreds of milliseconds to several seconds). Parsing a single response adds no meaningful latency.
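The `json.loads` baseline in the table can be reproduced with just the standard library (absolute timings vary by machine; this sketch does not depend on distenum):

```python
import json
import timeit

doc = '{"sentiment": "positive"}'
n = 100_000

# Total time for n parses, then the average cost per parse in microseconds.
elapsed = timeit.timeit(lambda: json.loads(doc), number=n)
per_parse_us = elapsed / n * 1e6
```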
+
+ To run the benchmark yourself from the repo root:
+
+ ```bash
+ PYTHONPATH=. python scripts/benchmark_parser.py
+ ```
+
+ ## Example script
+
+ From the repo root (with `distenum[openai]` installed):
+
+ ```bash
+ python example_sentiment_openai.py
+ ```
+
+ ## License
+
+ MIT
distenum-0.1.0/README.md ADDED
@@ -0,0 +1,104 @@
+ # distenum
+
+ **distenum** parses JSON outputs with logprobs from the OpenAI API and converts `enum`-type string fields into probability distributions instead of a single label.
+
+ ## What it does
+
+ With structured outputs, the API returns a single enum value (e.g. `"positive"`). If you request `logprobs=True`, you also get token-level logprobs. **distenum** turns those logprobs into a probability distribution over your enum options so you can see how confident the model was in each choice.
+
+ **Example:** For a sentiment field with `enum: ["positive", "negative", "neutral"]`:
+
+ | From the API (content only) | With distenum (using logprobs) |
+ |-----------------------------|---------------------------------|
+ | `"sentiment": "positive"` | `"sentiment": {"positive": 0.72, "negative": 0.18, "neutral": 0.10}` |
+
+ So instead of a single label, you get a distribution you can use for uncertainty, ranking, or thresholding (e.g. only accept when `positive` probability > 0.8).
+
+ ## Install
+
+ ```bash
+ pip install distenum
+ ```
+
+ To run the example script (calls the OpenAI API):
+
+ ```bash
+ pip install distenum[openai]
+ ```
+
+ ## Quick start
+
+ Install the package and the OpenAI client: `pip install distenum[openai]`. Then call the API with `logprobs=True` and `top_logprobs=20`, and pass the response logprobs into distenum:
+
+ ```python
+ from openai import OpenAI
+ from distenum import parse_using_schema_and_logprobs
+
+ schema = {
+     "type": "object",
+     "properties": {
+         "sentiment": {
+             "type": "string",
+             "enum": ["positive", "negative", "neutral"]
+         }
+     }
+ }
+
+ client = OpenAI()
+ response = client.chat.completions.create(
+     model="gpt-4o-2024-08-06",
+     messages=[{"role": "user", "content": "Your prompt"}],
+     response_format={
+         "type": "json_schema",
+         "json_schema": {"name": "my_schema", "strict": True, "schema": schema}
+     },
+     logprobs=True,
+     top_logprobs=20,
+ )
+
+ logprobs_data = response.choices[0].logprobs
+ parsed = parse_using_schema_and_logprobs(schema, logprobs_data)
+ # parsed["sentiment"] might be: {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
+ ```
+
+ ## Enum design tips
+
+ - **Different prefixes:** Enum values are matched to token logprobs by **prefix**. Prefer enum labels that do not share a common prefix (e.g. `"positive"`, `"negative"`, `"neutral"` are good; `"pos"` and `"positive"` can blur probabilities).
+ - **Fewer is better:** The API returns at most **20** logprobs per token (`top_logprobs=20`). With many enum values, most will get no mass; keep enums small for meaningful distributions.
+
+ ## API
+
+ - **`parse_using_schema_and_logprobs(schema_dict, logprobs_data)`**
+   Parses the logprobs stream according to the JSON Schema. Fields of type `string` with an `enum` are returned as a dict mapping each enum label to a probability (non-negative, summing to 1). Other fields are parsed as normal JSON values.
+
+ - **`tokenize(logprobs_data)`**
+   Low-level generator that yields tokens and their top-logprobs from the OpenAI logprobs content.
+
+ ## Performance
+
+ The parser walks token-level logprobs and builds probability distributions for enum fields, so it is slower than parsing the same JSON with the standard library. A rough comparison (same logical structure, 100k iterations):
+
+ | Parser | Time (100k parses) | Throughput | Avg per parse |
+ |---------------|--------------------|--------------|---------------|
+ | `json.loads` | ~0.16 s | ~630k/sec | ~1.6 µs |
+ | distenum | ~3.0 s | ~33k/sec | ~30 µs |
+
+ So distenum is typically **about 15–20× slower** than `json.loads` for the same structure. In absolute terms, **~30 µs per parse** is negligible compared to an OpenAI API call (typically hundreds of milliseconds to several seconds). Parsing a single response adds no meaningful latency.
+
+ To run the benchmark yourself from the repo root:
+
+ ```bash
+ PYTHONPATH=. python scripts/benchmark_parser.py
+ ```
+
+ ## Example script
+
+ From the repo root (with `distenum[openai]` installed):
+
+ ```bash
+ python example_sentiment_openai.py
+ ```
+
+ ## License
+
+ MIT
@@ -0,0 +1,5 @@
+ """distenum: parse OpenAI JSON logprobs and convert enum fields to probability distributions."""
+
+ from .parser import parse_using_schema_and_logprobs, tokenize
+
+ __all__ = ["parse_using_schema_and_logprobs", "tokenize"]
@@ -0,0 +1,348 @@
+ """Parse JSON logprobs from the OpenAI API and convert enum fields to probability distributions."""
+
+ import itertools
+ import math
+
+
+ def tokenize(logprobs_data):
+     """Yield tokens and their top_logprobs from OpenAI logprobs content."""
+     if logprobs_data is None:
+         raise ValueError(
+             "logprobs_data is None. Pass the logprobs object from the response, e.g. response.choices[0].logprobs"
+         )
+     if not hasattr(logprobs_data, "content"):
+         raise ValueError(
+             "logprobs_data has no 'content' attribute. Expected an OpenAI logprobs object (e.g. response.choices[0].logprobs)."
+         )
+     logprobs_sequence = logprobs_data.content
+     if not logprobs_sequence:
+         raise ValueError(
+             "logprobs_data.content is empty. Ensure the API response was requested with logprobs=True and has content."
+         )
+
+     n = len(logprobs_sequence)
+     yield logprobs_sequence[0].top_logprobs
+     li = 0
+     ci = 0
+
+     def increment_ci(increase=1):
+         nonlocal li, ci
+         for _ in range(increase):
+             if ci == len(logprobs_sequence[li].token) - 1:
+                 li += 1
+                 ci = 0
+                 yield logprobs_sequence[li].top_logprobs
+             else:
+                 ci += 1
+
+     while li < n:
+         char = logprobs_sequence[li].token[ci]
+
+         # --- Whitespace ---
+         if char.isspace():
+             yield from increment_ci()
+             continue
+
+         # --- Structural Tokens ---
+         if char in ["{", "}", "[", "]", ":", ","]:
+             yield char
+             yield from increment_ci()
+             continue
+
+         # --- String Token (with JSON escape decoding: \", \\, \/, \b, \f, \n, \r, \t, \uXXXX) ---
+         if char == '"':
+             yield from increment_ci()
+             start_to_end_string = ""
+             _escape_map = {
+                 '"': '"',
+                 "\\": "\\",
+                 "/": "/",
+                 "b": "\b",
+                 "f": "\f",
+                 "n": "\n",
+                 "r": "\r",
+                 "t": "\t",
+             }
+             while li < n:
+                 c = logprobs_sequence[li].token[ci]
+                 if c == "\\":
+                     yield from increment_ci()
+                     if li >= n:
+                         break
+                     esc = logprobs_sequence[li].token[ci]
+                     if esc == "u":
+                         yield from increment_ci()
+                         hex_str = ""
+                         for _ in range(4):
+                             if li < n:
+                                 hex_str += logprobs_sequence[li].token[ci]
+                                 yield from increment_ci()
+                         if len(hex_str) == 4:
+                             start_to_end_string += chr(int(hex_str, 16))
+                     else:
+                         start_to_end_string += _escape_map.get(esc, esc)
+                         yield from increment_ci()
+                 elif c == '"':
+                     break
+                 else:
+                     start_to_end_string += c
+                     yield from increment_ci()
+             yield '"' + start_to_end_string + '"'
+             yield from increment_ci()
+             continue
+
+         # --- Number Tokens (incl. scientific notation: 1e-5, 2.5E+10) ---
+         if char.isdigit() or char == "-":
+             start_to_end_number = ""
+             while li < n:
+                 c = logprobs_sequence[li].token[ci]
+                 if (
+                     c.isdigit()
+                     or c == "."
+                     or c == "-"
+                     or c in "eE+"
+                 ):
+                     start_to_end_number += c
+                     yield from increment_ci()
+                 else:
+                     break
+             yield start_to_end_number
+             continue
+
+         # --- Keywords (true, false, null) ---
+         if char in "tfn":
+             if logprobs_sequence[li].token[ci : ci + 4] == "true":
+                 yield "true"
+                 yield from increment_ci(4)
+             elif logprobs_sequence[li].token[ci : ci + 5] == "false":
+                 yield "false"
+                 yield from increment_ci(5)
+             elif logprobs_sequence[li].token[ci : ci + 4] == "null":
+                 yield "null"
+                 yield from increment_ci(4)
+             else:
+                 raise ValueError(
+                     f"Invalid keyword in logprobs stream at content index {li}, char index {ci}: "
+                     f"got {logprobs_sequence[li].token!r}. Expected one of: true, false, null."
+                 )
+             continue
+
+         raise ValueError(
+             f"Unexpected character in logprobs stream at content index {li}, char index {ci}: "
+             f"{char!r} in {logprobs_sequence[li].token!r}. Expected a valid JSON token (string, number, true, false, null, or {{ }} [ ] : ,)."
+         )
+
+
+ def get_next_token(tokens, allow_list=False):
+     token = next(tokens)
+     while isinstance(token, list) and not allow_list:
+         token = next(tokens)
+     return token
+
+
+ def parse_value(tokens, schema_dict):
+     """Parse any JSON value (object, array, string, number, bool, null) per schema."""
+     logprobs = None
+     token = get_next_token(tokens, allow_list=True)
+     if isinstance(token, list):
+         logprobs = token
+         token = get_next_token(tokens)
+
+     if token == "{" and schema_dict.get("type") == "object":
+         return parse_object(tokens, schema_dict)
+     elif token == "[" and schema_dict.get("type", "array") == "array":
+         return parse_array(tokens, schema_dict)
+
+     elif token.startswith('"') and schema_dict.get("type") == "string":
+         if "enum" in schema_dict:
+             if logprobs is None or (isinstance(logprobs, list) and len(logprobs) == 0):
+                 raise ValueError(
+                     "Enum field in schema but no logprobs available for this token. "
+                     "Ensure the API was called with logprobs=True and top_logprobs (e.g. 20). "
+                     f"Schema enum: {schema_dict['enum']}."
+                 )
+             output = {
+                 class_label: sum(
+                     math.exp(tlp.logprob)
+                     for tlp in logprobs
+                     if class_label.lower().startswith(tlp.token.strip().lower())
+                 )
+                 for class_label in schema_dict["enum"]
+             }
+             sum_output = sum(output.values())
+             if sum_output == 0:
+                 raise ValueError(
+                     "Enum field: no logprob mass matched any enum value. "
+                     f"Schema enum: {schema_dict['enum']}. "
+                     "Try enum labels with distinct prefixes, or increase top_logprobs."
+                 )
+             output = {k: v / sum_output for k, v in output.items()}
+             # Consume the rest of the string token if we only saw the opening quote
+             if token == '"':
+                 get_next_token(tokens)
+         else:
+             output = token[1:-1]
+         return output
+
+     elif token == "null" and schema_dict.get("type") == "null":
+         return None
+
+     elif token in ("true", "false", "null") and schema_dict.get("type") == "boolean":
+         return {"true": True, "false": False, "null": None}[token]
+
+     elif schema_dict.get("type") == "integer":
+         try:
+             return int(token)
+         except ValueError:
+             raise ValueError(
+                 f"Expected an integer for schema type 'integer', got {token!r}. "
+                 "Check that the logprobs content matches the schema."
+             ) from None
+     elif schema_dict.get("type") == "number":
+         try:
+             if "." in token or "e" in token.lower():
+                 return float(token)
+             return int(token)
+         except ValueError:
+             raise ValueError(
+                 f"Expected a number for schema type 'number', got {token!r}. "
+                 "Check that the logprobs content matches the schema."
+             ) from None
+     else:
+         schema_type = schema_dict.get("type", "(missing)")
+         raise ValueError(
+             f"Unexpected token {token!r} for schema type {schema_type!r}. "
+             f"Schema expects one of: object, array, string, number, integer, boolean, null. "
+             "Ensure the logprobs_data content matches the JSON structure implied by the schema."
+         )
+
+
+ def parse_object(tokens, schema_dict):
+     """Parse a JSON object into a Python dict."""
+     if "properties" not in schema_dict:
+         raise ValueError(
+             "Schema type is 'object' but schema has no 'properties'. "
+             "Add a 'properties' dict, e.g. {\"type\": \"object\", \"properties\": {\"key\": {\"type\": \"string\"}}}."
+         )
+     properties = schema_dict["properties"]
+     obj = {}
+     peek = get_next_token(tokens)
+
+     if peek == "}":
+         return obj
+
+     while True:
+         if not peek.startswith('"'):
+             raise ValueError(
+                 f"Expected a quoted string key (object property name), got {peek!r}. "
+                 "JSON object keys must be double-quoted strings."
+             )
+         # Tokenizer yields '"' then '"key"'; consume full key token when needed
+         if peek == '"':
+             peek = get_next_token(tokens)
+         key = peek[1:-1]
+
+         colon = get_next_token(tokens)
+         if colon != ":":
+             raise ValueError(
+                 f"Expected ':' after object key {key!r}, got {colon!r}. "
+                 "Check that the logprobs content is valid JSON."
+             )
+
+         if key not in properties:
+             allowed = list(properties.keys())
+             raise ValueError(
+                 f"Unknown key {key!r}. Schema only allows: {allowed}. "
+                 "Ensure the response matches the schema, or add this key to schema properties."
+             )
+
+         value = parse_value(tokens, properties[key])
+         obj[key] = value
+
+         separator = get_next_token(tokens)
+         if separator == "}":
+             return obj
+         if separator != ",":
+             raise ValueError(
+                 f"Expected ',' or '}}' after object value, got {separator!r}. "
+                 "Check that the logprobs content is valid JSON."
+             )
+
+         peek = get_next_token(tokens)
+
+
+ def parse_array(tokens, schema_dict):
+     """Parse a JSON array into a Python list."""
+     if "items" not in schema_dict:
+         raise ValueError(
+             "Schema type is 'array' but schema has no 'items'. "
+             "Add an 'items' schema, e.g. {\"type\": \"array\", \"items\": {\"type\": \"string\"}}."
+         )
+     arr = []
+     first_token = next(tokens)
+     if isinstance(first_token, list):
+         first_token = next(tokens)
+     if first_token == "]":
+         return arr
+
+     while True:
+         if arr:
+             value = parse_value(tokens, schema_dict["items"])
+         else:
+             temp_tokens = itertools.chain([first_token], tokens)
+             value = parse_value(temp_tokens, schema_dict["items"])
+
+         arr.append(value)
+
+         separator = next(tokens)
+         if isinstance(separator, list):
+             separator = next(tokens)
+
+         if separator == "]":
+             return arr
+         if separator != ",":
+             raise ValueError(
+                 f"Expected ',' or ']' after array element, got {separator!r}. "
+                 "Check that the logprobs content is valid JSON."
+             )
+
+
+ def _tokenizer_wrapper(logprobs_data):
+     yield from tokenize(logprobs_data)
+
+
+ def parse_using_schema_and_logprobs(schema_dict, logprobs_data):
+     """Parse OpenAI logprobs content according to a JSON schema.
+
+     For schema fields of type string with an "enum", returns a probability
+     distribution over the enum values (dict mapping each enum label to a
+     probability in [0, 1]) instead of the raw string. Other fields are
+     parsed as usual JSON values.
+
+     Args:
+         schema_dict: JSON Schema dict (e.g. type, properties, items, enum).
+         logprobs_data: OpenAI response logprobs object (e.g. response.choices[0].logprobs).
+
+     Returns:
+         Parsed structure (dict/list/primitives) with enum fields as probability dicts.
+     """
+     if schema_dict is None:
+         raise ValueError("schema_dict is None. Pass a JSON Schema dict with at least 'type'.")
+     if not isinstance(schema_dict, dict):
+         raise ValueError(
+             f"schema_dict must be a dict, got {type(schema_dict).__name__}. "
+             "Pass a JSON Schema dict (e.g. {\"type\": \"object\", \"properties\": {...}})."
+         )
+     if "type" not in schema_dict:
+         raise ValueError(
+             "schema_dict must contain 'type'. Example: {\"type\": \"object\", \"properties\": {...}}."
+         )
+
+     tokens = _tokenizer_wrapper(logprobs_data)
+     try:
+         return parse_value(tokens, schema_dict)
+     except StopIteration:
+         raise ValueError(
+             "Logprobs stream ended unexpectedly. The logprobs_data.content may not match the expected JSON structure, "
+             "or the response may be truncated. Ensure logprobs=True and top_logprobs is set (e.g. 20)."
+         ) from None