structai 0.1.2__tar.gz → 0.1.5__tar.gz

@@ -0,0 +1,598 @@
1
+ Metadata-Version: 2.4
2
+ Name: structai
3
+ Version: 0.1.5
4
+ Summary: A utility package for AI development
5
+ Author-email: Wanghan Xu <xu_wanghan@sjtu.edu.cn>
6
+ Project-URL: Homepage, https://github.com/black-yt/structai
7
+ Classifier: Programming Language :: Python :: 3
8
+ Classifier: License :: OSI Approved :: MIT License
9
+ Classifier: Operating System :: OS Independent
10
+ Requires-Python: >=3.6
11
+ Description-Content-Type: text/markdown
12
+ License-File: LICENSE
13
+ Requires-Dist: openai
14
+ Requires-Dist: python-Levenshtein
15
+ Requires-Dist: json_repair
16
+ Requires-Dist: pillow
17
+ Requires-Dist: httpx[socks]
18
+ Requires-Dist: pandas
19
+ Requires-Dist: numpy
20
+ Requires-Dist: tqdm
21
+ Requires-Dist: fastapi
22
+ Requires-Dist: uvicorn
23
+ Dynamic: license-file
24
+
25
+ # StructAI
26
+
27
+ StructAI is a utility package for AI development. It provides tools for file I/O, LLM interaction, parallel processing, timeouts, and text parsing.
28
+
29
+ ## Installation
30
+
31
+ > **Recommended for most users.** Installs the latest stable release from PyPI.
32
+ ```bash
33
+ pip install structai
34
+ ```
35
+
36
+ > **For development.** Installs StructAI in editable mode from source, enabling live code changes.
37
+
38
+ ```bash
39
+ git clone https://github.com/black-yt/structai.git
40
+ cd structai
41
+ pip install -e .
42
+ ```
43
+
44
+ > **Note:** Before using LLM-related features, please ensure you have set the necessary environment variables:
45
+
46
+ ```bash
47
+ export LLM_API_KEY="your-api-key"
48
+ export LLM_BASE_URL="your-api-base-url"
49
+ ```
50
+
51
+ ---
52
+
53
+ ## StructAI Library Documentation
54
+
55
+ ### `structai_skill`
56
+
57
+ Returns a comprehensive documentation string for the StructAI library in Markdown format. This is useful for providing context to LLMs about the available tools in this library.
58
+
59
+ * **Args**:
60
+ * None
61
+ * **Returns**:
62
+ * (str): The documentation string.
63
+
64
+ * **Example**:
65
+ ```python
66
+ from structai import structai_skill
67
+
68
+ docs = structai_skill()
69
+ print(docs)
70
+ ```
71
+
72
+
73
+ ### `load_file`
74
+ Automatically reads a file based on its extension.
75
+
76
+ * **Args**:
77
+ * `path` (str): The path to the file to be read.
78
+ * **Returns**:
79
+ * (Any): The content of the file, parsed into an appropriate Python object.
80
+     * `.json` -> `dict` or `list`
81
+     * `.jsonl` -> `list` of dicts
82
+     * `.csv`, `.parquet`, `.xlsx` -> `pandas.DataFrame`
83
+     * `.txt`, `.md`, `.py` -> `str`
84
+     * `.pkl` -> unpickled object
85
+     * `.npy` -> `numpy.ndarray`
86
+     * `.pt` -> `torch` object
87
+     * `.png`, `.jpg`, `.jpeg` -> `PIL.Image.Image`
88
+
89
+ * **Example**:
90
+ ```python
91
+ from structai import load_file
92
+
93
+ # Load a JSON file
94
+ data = load_file("config.json")
95
+
96
+ # Load a CSV file as a pandas DataFrame
97
+ df = load_file("data.csv")
98
+
99
+ # Load an image
100
+ image = load_file("photo.jpg")
101
+ ```
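The extension-to-type mapping above suggests a dispatch table keyed on the file suffix. The following is a minimal sketch of that idea using only the standard library; it is not StructAI's actual implementation, `load_by_extension` and `READERS` are hypothetical names, and only the JSON, JSONL, text, and pickle cases are covered.

```python
import json
import os
import pickle
import tempfile
from pathlib import Path


def _read_json(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def _read_jsonl(path):
    # One JSON document per non-empty line.
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def _read_text(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def _read_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical dispatch table: suffix -> reader function.
READERS = {
    ".json": _read_json,
    ".jsonl": _read_jsonl,
    ".txt": _read_text,
    ".md": _read_text,
    ".py": _read_text,
    ".pkl": _read_pickle,
}


def load_by_extension(path):
    """Pick a reader from the file suffix (illustrative stand-in for load_file)."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return READERS[suffix](path)


# Small demonstration on a temporary JSONL file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "rows.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"id": 1}\n{"id": 2}\n')
    rows = load_by_extension(demo_path)
```

A table like this keeps each format's reader isolated, so supporting another extension is a one-line change.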
102
+
103
+ ### `save_file`
104
+ Automatically saves data to a file based on the extension. Creates necessary directories if they don't exist.
105
+
106
+ * **Args**:
107
+ * `data` (Any): The data object to save.
108
+ * `path` (str): The destination file path.
109
+ * **Returns**:
110
+ * None
111
+
112
+ * **Example**:
113
+ ```python
114
+ from structai import save_file
115
+
116
+ data = {"key": "value"}
117
+
118
+ # Save as JSON
119
+ save_file(data, "output.json")
120
+
121
+ # Save as Pickle
122
+ save_file(data, "backup.pkl")
123
+ ```
124
+
125
+ ### `print_once`
126
+ Prints a message to stdout only once during the entire program execution. Useful for logging warnings or info inside loops.
127
+
128
+ * **Args**:
129
+ * `msg` (str): The message to print.
130
+ * **Returns**:
131
+ * None
132
+
133
+ * **Example**:
134
+ ```python
135
+ from structai import print_once
136
+
137
+ for i in range(10):
138
+     print_once("Starting processing...")  # printed only once
139
+ ```
140
+
141
+ ### `make_print_once`
142
+ Creates and returns a local function that prints a message only once. This is useful if you need a "print once" behavior scoped to a specific function or instance rather than globally.
143
+
144
+ * **Args**:
145
+ * None
146
+ * **Returns**:
147
+ * (callable): A function `inner(msg)` that behaves like `print_once`.
148
+
149
+ * **Example**:
150
+ ```python
151
+ from structai import make_print_once
152
+
153
+ logger1 = make_print_once()
154
+ logger2 = make_print_once()
155
+
156
+ logger1("Hello") # Prints "Hello"
157
+ logger1("Hello") # Does nothing
158
+
159
+ logger2("World") # Prints "World"
160
+ logger2("World") # Does nothing
161
+ ```
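The difference between the global `print_once` and the scoped printers returned by `make_print_once` can be illustrated with a closure. This is a sketch under assumptions, not the library's code: `make_once_printer` is a hypothetical name, deduplication is assumed to be per distinct message, and the boolean return value is added here only to make the behavior observable.

```python
def make_once_printer():
    """Return a printer that emits each distinct message at most once."""
    seen = set()  # private to this printer instance

    def inner(msg):
        if msg in seen:
            return False  # already printed by this printer
        seen.add(msg)
        print(msg)
        return True

    return inner


logger_a = make_once_printer()
logger_b = make_once_printer()

first = logger_a("Hello")    # prints "Hello"
second = logger_a("Hello")   # suppressed
other = logger_b("Hello")    # prints again: logger_b has its own state
```

Because each call to the factory creates a fresh `seen` set, two printers never suppress each other's messages.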
162
+
163
+ ### `LLMAgent` Class
164
+
165
+ A powerful wrapper class for interacting with OpenAI-compatible LLM APIs. It handles retries, timeouts, and structured output validation.
166
+
167
+ #### `__init__`
168
+
169
+ * **Args**:
170
+ * `api_key` (str, optional): API Key. Defaults to `os.environ["LLM_API_KEY"]`.
171
+ * `api_base` (str, optional): Base URL. Defaults to `os.environ["LLM_BASE_URL"]`.
172
+ * `model_version` (str, optional): Model identifier. Default `'gpt-4.1-mini'`.
173
+ * `system_prompt` (str, optional): Default system prompt. Default `'You are a helpful assistant.'`.
174
+ * `max_tokens` (int, optional): Maximum tokens for generation. Default `None`.
175
+ * `temperature` (float, optional): Sampling temperature. Default `0`.
176
+ * `http_client` (httpx.Client, optional): Optional custom httpx client.
177
+ * `headers` (dict, optional): Optional custom headers.
178
+ * `time_limit` (int, optional): Timeout in seconds. Default `300` (5 minutes).
179
+ * `max_try` (int, optional): Default maximum number of attempts per call. Default `1`.
180
+ * `use_responses_api` (bool, optional): Whether to use the Responses API format. Default `False`.
181
+
182
+ * **Returns**:
183
+ * (LLMAgent): LLMAgent instance.
184
+
185
+ * **Example**:
186
+ ```python
187
+ from structai import LLMAgent
188
+
189
+ agent = LLMAgent()
190
+ ```
191
+
192
+ #### `__call__`
193
+ Sends a query to the LLM with built-in validation, parsing, and retry logic.
194
+
195
+
196
+ * **Args**:
197
+ * `query` (str): The main input text or prompt to be sent to the LLM.
198
+ * `system_prompt` (str, optional): The system instruction. Overrides the default if provided.
199
+ * `return_example` (str | list | dict, optional): A template defining the expected structure and type of the response.
200
+     * `None` or `str` (default): Returns raw response string.
201
+     * `list`: Expects a JSON list string. Validates element types if example elements are provided.
202
+     * `dict`: Expects a JSON object string. Validates keys (supports fuzzy matching).
203
+ * `max_try` (int, optional): Max attempts. Defaults to instance's `max_try`.
204
+ * `wait_time` (float, optional): Time in seconds to wait between retries. Default `0.0`.
205
+ * `n` (int, optional): Number of completion choices. Default `1`.
206
+ * `max_tokens` (int, optional): Overrides instance's `max_tokens`.
207
+ * `temperature` (float, optional): Overrides instance's `temperature`.
208
+ * `image_paths` (list[str], optional): List of local image paths for multimodal models.
209
+ * `history` (list[dict], optional): Conversation history `[{"role": "user", "content": "..."}, ...]`.
210
+ * `use_responses_api` (bool, optional): Overrides instance setting.
211
+ * `list_len` (int, optional): *Validation* - Enforces exact list length.
212
+ * `list_min` (int | float, optional): *Validation* - Enforces minimum value for list elements.
213
+ * `list_max` (int | float, optional): *Validation* - Enforces maximum value for list elements.
214
+ * `check_keys` (bool, optional): *Validation* - Whether to validate dict keys. Default `True`.
215
+
216
+ * **Returns**:
217
+ * (str | list | dict): The parsed response from the LLM.
218
+     * If `n > 1`, returns a list of results.
219
+     * Returns `None` if all retries fail.
220
+
221
+ * **Example**:
222
+ ```python
223
+ # Basic usage
224
+ response = agent("Generate a random number.", n=3, temperature=1)
225
+ # Output: ["Sure! Here's a random number for you: 738", "Sure! Here's a random number: 7382", "Sure! Here's a random number: 487."]
226
+
227
+ # Enforce the output format (List, Dict, or specific types) using `return_example`. Note that the output format needs to be explicitly specified in the prompt.
228
+ numbers = agent(
229
+     "Generate 3 random numbers, for example, [1, 2, 3].",
230
+     return_example=[1],
231
+     list_len=3
232
+ )
233
+ # Output: [10, 42, 7]
234
+
235
+ profile = agent(
236
+     "Create a user profile for Alice, for example, {'name': 'Alice', 'age': 1, 'city': 'shanghai'}.",
237
+     return_example={"name": "str", "age": 1, "city": "str"}
238
+ )
239
+ # Output: {'name': 'Alice', 'age': 25, 'city': 'New York'}
240
+
241
+ # Multimodal input for vision models
242
+ description = agent(
243
+     "Describe these images",
244
+     image_paths=["path/to/image_1.jpg", "path/to/image_2.jpg"]
245
+ )
246
+
247
+ # Memory context
248
+ history = [
249
+     {"role": "user", "content": "My name is Bob."},
250
+     {"role": "assistant", "content": "Hello Bob."}
251
+ ]
252
+ answer = agent(
253
+     "What is my name?",
254
+     history=history,
255
+ )
256
+ # Output: 'Your name is Bob.'
257
+ ```
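The validation and retry behavior described for `__call__` (parse the reply, check it against `return_example`, `list_len`, `list_min`, and `list_max`, and return `None` once all attempts fail) can be sketched without an API call. The code below is an illustration under assumptions rather than the library's implementation: `validate_list` and `call_with_retries` are hypothetical helpers, and `generate` stands in for the request to the model.

```python
import json


def validate_list(raw, example=None, list_len=None, list_min=None, list_max=None):
    """Parse a JSON list and apply the checks; raise ValueError on any failure."""
    value = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(value, list):
        raise ValueError("expected a JSON list")
    if list_len is not None and len(value) != list_len:
        raise ValueError(f"expected {list_len} items, got {len(value)}")
    if example:
        expected_type = type(example[0])  # element type taken from the example
        for item in value:
            if not isinstance(item, expected_type):
                raise ValueError(f"unexpected element type: {type(item).__name__}")
    for item in value:
        if list_min is not None and item < list_min:
            raise ValueError(f"{item} is below the minimum {list_min}")
        if list_max is not None and item > list_max:
            raise ValueError(f"{item} is above the maximum {list_max}")
    return value


def call_with_retries(generate, max_try=1, **checks):
    """Call generate() up to max_try times; return None if no reply validates."""
    for _ in range(max_try):
        try:
            return validate_list(generate(), **checks)
        except (ValueError, TypeError):
            continue  # invalid reply: try again
    return None


# Simulated model replies: two invalid answers, then a valid one.
replies = iter(["not json", "[1, 2]", "[3, 4, 5]"])
result = call_with_retries(lambda: next(replies), max_try=3,
                           example=[1], list_len=3)
failed = call_with_retries(lambda: "oops", max_try=2, example=[1])
```

Returning `None` instead of raising mirrors the documented behavior, so callers should check the result before using it.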
258
+
259
+ ### `sanitize_text`
260
+
261
+ Sanitizes text by keeping only ASCII English characters, digits, and common punctuation. Removes control characters and ANSI codes.
262
+
263
+ * **Args**:
264
+ * `text` (str): The text to sanitize.
265
+ * **Returns**:
266
+ * (str): The sanitized text.
267
+
268
+ * **Example**:
269
+ ```python
270
+ from structai import sanitize_text
271
+
272
+ clean = sanitize_text("Hello \x1b[31mWorld\x1b[0m!")
273
+ print(clean) # 'Hello [31mWorld[0m!'
274
+ ```
275
+
276
+ ### `filter_excessive_repeats`
277
+
278
+ Finds runs in which a single character repeats consecutively more than `threshold` times and removes each such run entirely.
279
+
280
+ * **Args**:
281
+ * `text` (str): The input string.
282
+ * `threshold` (int, optional): The maximum allowed consecutive repetitions. Default `5`.
283
+ * **Returns**:
284
+ * (str): The processed string with excessive repetitions removed.
285
+
286
+ * **Example**:
287
+ ```python
288
+ from structai import filter_excessive_repeats
289
+
290
+ clean = filter_excessive_repeats("Helloooooo World", threshold=5)
291
+ print(clean) # "Hell World"
292
+ ```
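One way to express "remove runs longer than the threshold" is a backreference regular expression. This is a sketch consistent with the documented example, not necessarily how the library does it; `drop_long_runs` is a hypothetical name.

```python
import re


def drop_long_runs(text, threshold=5):
    """Delete any run of one character repeated more than `threshold` times."""
    # (.) captures one character; \1{threshold,} requires at least `threshold`
    # further copies, so the whole match is at least threshold + 1 long.
    pattern = re.compile(r"(.)\1{%d,}" % threshold, flags=re.DOTALL)
    return pattern.sub("", text)


cleaned = drop_long_runs("Helloooooo World", threshold=5)  # six o's are removed
kept = drop_long_runs("aaaaa b", threshold=5)              # five a's are allowed
```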
293
+
294
+ ### `str2dict`
295
+
296
+ Robustly converts a string representation of a dictionary to a Python `dict`. It handles common formatting errors and uses `json_repair` as a fallback.
297
+
298
+ * **Args**:
299
+ * `s` (str): The string representation of a dictionary.
300
+ * **Returns**:
301
+ * (dict): The parsed dictionary.
302
+
303
+ * **Example**:
304
+ ```python
305
+ from structai import str2dict
306
+
307
+ d = str2dict("{'a': 1, 'b': 2}")
308
+ print(d['a']) # 1
309
+ ```
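The need for "robust" parsing comes from the fact that a Python-style dict literal with single quotes is not valid JSON. The sketch below shows a standard-library-only fallback chain (strict JSON first, then a Python literal); it omits the `json_repair` fallback that the library mentions, and `parse_dict_text` is a hypothetical name.

```python
import ast
import json


def parse_dict_text(s):
    """Parse a dict from text: strict JSON first, then a Python literal."""
    s = s.strip()
    try:
        value = json.loads(s)
    except json.JSONDecodeError:
        # Handles single-quoted keys, True/False/None, trailing commas, etc.
        value = ast.literal_eval(s)
    if not isinstance(value, dict):
        raise ValueError(f"expected a dict, got {type(value).__name__}")
    return value


from_python_literal = parse_dict_text("{'a': 1, 'b': 2}")
from_json = parse_dict_text('{"a": true}')
```

`ast.literal_eval` only evaluates literals, so it is safe to use on untrusted text in a way that `eval` is not.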
310
+
311
+ ### `str2list`
312
+
313
+ Robustly converts a string representation of a list to a Python `list`.
314
+
315
+ * **Args**:
316
+ * `s` (str): The string representation of a list.
317
+ * **Returns**:
318
+ * (list): The parsed list.
319
+
320
+ * **Example**:
321
+ ```python
322
+ from structai import str2list
323
+
324
+ l = str2list("[1, 2, 3]")
325
+ print(len(l)) # 3
326
+ ```
327
+
328
+ ### `add_no_proxy_if_private`
329
+
330
+ Checks whether the hostname in the URL is a private IP address. If so, adds that host to the `no_proxy` environment variable so that requests to it bypass any configured proxy.
331
+
332
+ * **Args**:
333
+ * `url` (str): The URL to check.
334
+ * **Returns**:
335
+ * None
336
+
337
+ * **Example**:
338
+ ```python
339
+ from structai import add_no_proxy_if_private
340
+
341
+ add_no_proxy_if_private("http://192.168.1.100:8080/v1")
342
+ ```
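The private-address check can be reproduced with the standard `ipaddress` and `urllib.parse` modules. The following is a sketch under assumptions: `is_private_host` and `with_no_proxy` are hypothetical helpers, the comma-separated `no_proxy` format is assumed, and the code works on a copy of the environment mapping instead of mutating `os.environ`.

```python
import ipaddress
from urllib.parse import urlparse


def is_private_host(url):
    """True if the URL's host is an IP literal in a private range."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False  # a DNS name, not an IP literal


def with_no_proxy(url, env):
    """Return a copy of env with the host appended to no_proxy when private."""
    if not is_private_host(url):
        return dict(env)
    host = urlparse(url).hostname
    entries = [e for e in env.get("no_proxy", "").split(",") if e]
    if host not in entries:
        entries.append(host)
    updated = dict(env)
    updated["no_proxy"] = ",".join(entries)
    return updated


private = is_private_host("http://192.168.1.100:8080/v1")
public = is_private_host("http://8.8.8.8/")
env_after = with_no_proxy("http://192.168.1.100:8080/v1", {"no_proxy": "localhost"})
```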
343
+
344
+ ### `read_image`
345
+
346
+ Reads an image from a path and returns a PIL Image object.
347
+
348
+ * **Args**:
349
+ * `image_path` (str): The path to the image file.
350
+ * **Returns**:
351
+ * (PIL.Image.Image): The loaded image object.
352
+
353
+ * **Example**:
354
+ ```python
355
+ from structai import read_image
356
+
357
+ img = read_image("photo.jpg")
358
+ ```
359
+
360
+ ### `encode_image`
361
+
362
+ Encodes a PIL Image object into a base64 string.
363
+
364
+ * **Args**:
365
+ * `image_obj` (PIL.Image.Image): The image object to encode.
366
+ * **Returns**:
367
+ * (str): The base64 encoded string.
368
+
369
+ * **Example**:
370
+ ```python
371
+ from structai import encode_image, read_image
372
+
373
+ b64_str = encode_image(read_image("photo.jpg"))  # encode_image expects a PIL image
374
+ ```
375
+
376
+ ### `messages_to_responses_input`
377
+
378
+ Converts standard Chat Completions `messages` format (list of dicts) to the input format required by the Responses API.
379
+
380
+ * **Args**:
381
+ * `messages` (list[dict]): List of message dictionaries with 'role' and 'content'.
382
+ * **Returns**:
383
+ * (tuple): A tuple containing `(system_prompt_content, input_blocks)`.
384
+
385
+ * **Example**:
386
+ ```python
387
+ from structai import messages_to_responses_input
388
+
389
+ messages = [{"role": "user", "content": "Hello"}]
390
+ system_prompt, input_blocks = messages_to_responses_input(messages)
391
+ ```
392
+
393
+ ### `extract_text_outputs`
394
+
395
+ Extracts the text content from an LLM API response object (supports both Chat Completions and Responses API formats).
396
+
397
+ * **Args**:
398
+ * `result` (object): The response object from the LLM API.
399
+ * **Returns**:
400
+ * (list[str]): A list of extracted text outputs.
401
+
402
+ * **Example**:
403
+ ```python
404
+ from structai import extract_text_outputs
405
+
406
+ # Assuming 'response' is the object returned by the OpenAI client
407
+ texts = extract_text_outputs(response)
408
+ print(texts[0])
409
+ ```
410
+
411
+ ### `multi_thread`
412
+
413
+ Executes a function concurrently for each item in `inp_list` using a thread pool.
414
+
415
+ * **Args**:
416
+ * `inp_list` (list[dict]): A list of dictionaries, where each dictionary contains keyword arguments for `function`.
417
+ * `function` (callable): The function to execute.
418
+ * `max_workers` (int, optional): The maximum number of threads. Default `40`.
419
+ * `use_tqdm` (bool, optional): Whether to show a progress bar. Default `True`.
420
+ * **Returns**:
421
+ * (list): A list of results corresponding to the input list order.
422
+
423
+ * **Example**:
424
+ ```python
425
+ from structai import multi_thread
426
+ import time
427
+
428
+ def square(x):
429
+     return x * x
430
+
431
+ inputs = [{"x": i} for i in range(10)]
432
+ results = multi_thread(inputs, square, max_workers=4)
433
+ print(results) # [0, 1, 4, 9, ...]
434
+ ```
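The calling convention (one dict of keyword arguments per task, results returned in input order) maps directly onto `concurrent.futures`. This is a minimal sketch of the pattern, without the progress bar; `map_kwargs_threaded` is a hypothetical name and not the library's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def map_kwargs_threaded(inp_list, function, max_workers=4):
    """Run function(**kwargs) for each dict in inp_list, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submitting in order and collecting in the same order keeps the
        # results aligned with inp_list, whichever task finishes first.
        futures = [pool.submit(function, **kwargs) for kwargs in inp_list]
        return [future.result() for future in futures]


def square(x):
    return x * x


squares = map_kwargs_threaded([{"x": i} for i in range(10)], square)
```

Threads suit I/O-bound work such as API calls; for CPU-bound work a process pool is usually the better fit.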
435
+
436
+ ### `multi_process`
437
+
438
+ Executes a function concurrently for each item in `inp_list` using a process pool. Ideal for CPU-bound tasks.
439
+
440
+ * **Args**:
441
+ * `inp_list` (list[dict]): A list of dictionaries, where each dictionary contains keyword arguments for `function`.
442
+ * `function` (callable): The function to execute.
443
+ * `max_workers` (int, optional): The maximum number of processes. Default `40`.
444
+ * `use_tqdm` (bool, optional): Whether to show a progress bar. Default `True`.
445
+ * **Returns**:
446
+ * (list): A list of results corresponding to the input list order.
447
+
448
+ * **Example**:
449
+ ```python
450
+ from structai import multi_process
451
+
452
+ # 'heavy_computation' must be defined at the top level for multiprocessing pickling.
453
+ def heavy_computation(n):
454
+     return sum(range(n))
455
+
456
+ inputs = [{"n": 1000} for _ in range(5)]
457
+ results = multi_process(inputs, heavy_computation)
458
+ ```
459
+
460
+ ### `run_server`
461
+
462
+ Starts a FastAPI server that acts as a proxy to an OpenAI-compatible LLM provider. The provider's URL and API key are read from the `LLM_BASE_URL` and `LLM_API_KEY` environment variables.
463
+
464
+ * **Args**:
465
+ * `host` (str, optional): The host to bind to. Default `"0.0.0.0"`.
466
+ * `port` (int, optional): The port to bind to. Default `8001`.
467
+ * **Returns**:
468
+ * None (Runs indefinitely until stopped).
469
+
470
+ * **Example**:
471
+ ```python
472
+ from structai import run_server
473
+
474
+ if __name__ == "__main__":
475
+     run_server()
476
+ ```
477
+
478
+ ### `timeout_limit`
479
+
480
+ A decorator that enforces a maximum execution time on a function. Raises `TimeoutError` if the limit is exceeded.
481
+
482
+ * **Args**:
483
+ * `timeout` (float | None): Maximum allowed execution time in seconds.
484
+ * **Returns**:
485
+ * (decorator): A decorator function that wraps the target function.
486
+
487
+ * **Example**:
488
+ ```python
489
+ from structai import timeout_limit
490
+ import time
491
+
492
+ @timeout_limit(timeout=2.0)
493
+ def task():
494
+     time.sleep(5)
495
+
496
+ # This will raise TimeoutError
497
+ task()
498
+ ```
499
+
500
+ ### `run_with_timeout`
501
+
502
+ Runs a function with a specified timeout without using a decorator.
503
+
504
+ * **Args**:
505
+ * `func` (callable): The function to run.
506
+ * `args` (tuple, optional): Positional arguments for the function. Default `()`.
507
+ * `kwargs` (dict, optional): Keyword arguments for the function. Default `None`.
508
+ * `timeout` (float | None): Maximum allowed execution time in seconds.
509
+ * **Returns**:
510
+ * (Any): The return value of the function.
511
+
512
+ * **Example**:
513
+ ```python
514
+ from structai import run_with_timeout
515
+
516
+ def task(x):
517
+     return x * 2
518
+
519
+ result = run_with_timeout(task, args=(10,), timeout=1.0)
520
+ ```
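A timeout wrapper with this signature, and a decorator built on top of it, can be sketched with a worker thread. This illustrates one possible approach and is not necessarily the mechanism StructAI uses; `run_with_deadline` and `deadline` are hypothetical names. Note that in this sketch a timed-out call is abandoned, not killed: the daemon thread keeps running in the background.

```python
import functools
import threading
import time


def run_with_deadline(func, args=(), kwargs=None, timeout=None):
    """Run func in a daemon thread and stop waiting after `timeout` seconds."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **(kwargs or {}))
        except Exception as exc:
            outcome["error"] = exc  # re-raised in the caller's thread below

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)  # timeout=None waits until the call finishes
    if worker.is_alive():
        raise TimeoutError(f"call exceeded {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def deadline(timeout):
    """Decorator form of run_with_deadline."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return run_with_deadline(func, args, kwargs, timeout)
        return wrapper
    return decorator


@deadline(timeout=0.05)
def slow():
    time.sleep(0.5)


try:
    slow()
    timed_out = False
except TimeoutError:
    timed_out = True

doubled = run_with_deadline(lambda x: x * 2, args=(10,), timeout=1.0)
```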
521
+
522
+ ### `remove_tag`
523
+
524
+ Removes specified tags from a string, replacing them with a separator (default newline).
525
+
526
+ * **Args**:
527
+ * `s` (str): The input string.
528
+ * `tags` (list[str], optional): A list of tags to remove. Default `["<think>", "</think>", "<answer>", "</answer>"]`.
529
+ * `r` (str, optional): The replacement string. Default `"\n"`.
530
+ * **Returns**:
531
+ * (str): The cleaned string.
532
+
533
+ * **Example**:
534
+ ```python
535
+ from structai import remove_tag
536
+
537
+ clean_text = remove_tag("<think>...</think> Answer")
538
+ # Output: "...\n Answer"
539
+ ```
540
+
541
+ ### `parse_think_answer`
542
+
543
+ Parses a string containing Chain-of-Thought tags (`<think>...</think>` and `<answer>...</answer>`) and returns the content of both.
544
+
545
+ * **Args**:
546
+ * `text` (str): The input text containing the tags.
547
+ * **Returns**:
548
+ * (tuple): A tuple `(think_content, answer_content)`.
549
+
550
+ * **Example**:
551
+ ```python
552
+ from structai import parse_think_answer
553
+
554
+ raw_text = "<think>Step 1...</think><answer>42</answer>"
555
+ think, answer = parse_think_answer(raw_text)
556
+ print(f"Reasoning: {think}") # Reasoning: Step 1...
557
+ print(f"Result: {answer}") # Result: 42
558
+ ```
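Splitting a reply into its reasoning and answer parts can be done with a non-greedy regular expression per tag. A minimal sketch follows; `split_think_answer` is a hypothetical name, and both the whitespace stripping and the `None` returned for a missing tag are assumptions, since the library's behavior in those cases is not documented here.

```python
import re


def split_think_answer(text):
    """Return (think_content, answer_content); a missing tag yields None."""
    def grab(tag):
        # DOTALL lets the content span several lines; .*? stops at the
        # first closing tag.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None

    return grab("think"), grab("answer")


think, answer = split_think_answer("<think>Step 1...</think><answer>42</answer>")
no_think, only_answer = split_think_answer("<answer>7</answer>")
```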
559
+
560
+ ### `extract_within_tags`
561
+
562
+ Extracts the substring found between two specific tags.
563
+
564
+ * **Args**:
565
+ * `content` (str): The text to search within.
566
+ * `start_tag` (str, optional): The opening tag. Default `'<answer>'`.
567
+ * `end_tag` (str, optional): The closing tag. Default `'</answer>'`.
568
+ * `default_return` (Any, optional): The value to return if tags are not found. Default `None`.
569
+ * **Returns**:
570
+ * (str | Any): The extracted content string, or `default_return` if not found.
571
+
572
+ * **Example**:
573
+ ```python
574
+ from structai import extract_within_tags
575
+
576
+ text = "Result: <json>{...}</json>"
577
+ json_str = extract_within_tags(text, "<json>", "</json>")
578
+ # Output: "{...}"
579
+ ```
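The documented parameters and the `default_return` fallback can be mirrored with two `str.find` calls. This is an illustrative sketch, not the library's source; `between_tags` is a hypothetical name, and taking the first occurrence of each tag is an assumption.

```python
def between_tags(content, start_tag="<answer>", end_tag="</answer>",
                 default_return=None):
    """Return the text between the first start_tag and the next end_tag."""
    start = content.find(start_tag)
    if start == -1:
        return default_return
    start += len(start_tag)
    end = content.find(end_tag, start)  # search only after the opening tag
    if end == -1:
        return default_return
    return content[start:end]


json_str = between_tags("Result: <json>{...}</json>", "<json>", "</json>")
missing = between_tags("no tags here", default_return="")
```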
580
+
581
+ ### `get_all_file_paths`
582
+
583
+ Recursively retrieves all file paths in a directory that match a given suffix.
584
+
585
+ * **Args**:
586
+ * `directory` (str): The root directory to search.
587
+ * `suffix` (str, optional): The file suffix to filter by (e.g., '.py'). Default `''` (matches all files).
588
+ * **Returns**:
589
+ * (list[str]): A list of matching file paths.
590
+
591
+ * **Example**:
592
+ ```python
593
+ from structai import get_all_file_paths
594
+
595
+ # Get all Python files in the current directory
596
+ py_files = get_all_file_paths(".", suffix=".py")
597
+ print(py_files)
598
+ ```
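A recursive, suffix-filtered listing is a thin layer over `os.walk`. The sketch below shows the idea on a temporary directory; `list_files` is a hypothetical name, and sorting the result is a choice made here for reproducibility, not documented library behavior.

```python
import os
import tempfile


def list_files(directory, suffix=""):
    """Recursively collect paths under directory whose names end with suffix."""
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(suffix):  # an empty suffix matches every file
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.py", os.path.join("sub", "b.py"), "c.txt"):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as f:
            f.write("")
    found = [os.path.relpath(p, tmp) for p in list_files(tmp, suffix=".py")]
    everything = list_files(tmp)
```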