retab 0.0.40__py3-none-any.whl → 0.0.42__py3-none-any.whl

Files changed (53)
  1. retab/client.py +5 -5
  2. retab/resources/consensus/completions.py +1 -1
  3. retab/resources/consensus/completions_stream.py +5 -5
  4. retab/resources/consensus/responses.py +1 -1
  5. retab/resources/consensus/responses_stream.py +2 -2
  6. retab/resources/documents/client.py +12 -11
  7. retab/resources/documents/extractions.py +4 -4
  8. retab/resources/evals.py +1 -1
  9. retab/resources/evaluations/documents.py +1 -1
  10. retab/resources/jsonlUtils.py +4 -4
  11. retab/resources/processors/automations/endpoints.py +9 -5
  12. retab/resources/processors/automations/links.py +2 -2
  13. retab/resources/processors/automations/logs.py +2 -2
  14. retab/resources/processors/automations/mailboxes.py +43 -32
  15. retab/resources/processors/automations/outlook.py +25 -7
  16. retab/resources/processors/automations/tests.py +8 -2
  17. retab/resources/processors/client.py +25 -16
  18. retab/resources/prompt_optimization.py +1 -1
  19. retab/resources/schemas.py +3 -3
  20. retab/types/automations/mailboxes.py +1 -1
  21. retab/types/completions.py +1 -1
  22. retab/types/documents/create_messages.py +4 -4
  23. retab/types/documents/extractions.py +3 -3
  24. retab/types/documents/parse.py +3 -1
  25. retab/types/evals.py +2 -2
  26. retab/types/evaluations/iterations.py +2 -2
  27. retab/types/evaluations/model.py +2 -2
  28. retab/types/extractions.py +34 -9
  29. retab/types/jobs/prompt_optimization.py +1 -1
  30. retab/types/logs.py +3 -3
  31. retab/types/schemas/object.py +4 -4
  32. retab/types/schemas/templates.py +1 -1
  33. retab/utils/__init__.py +0 -0
  34. retab/utils/_model_cards/anthropic.yaml +59 -0
  35. retab/utils/_model_cards/auto.yaml +43 -0
  36. retab/utils/_model_cards/gemini.yaml +117 -0
  37. retab/utils/_model_cards/openai.yaml +301 -0
  38. retab/utils/_model_cards/xai.yaml +28 -0
  39. retab/utils/ai_models.py +138 -0
  40. retab/utils/benchmarking.py +484 -0
  41. retab/utils/chat.py +327 -0
  42. retab/utils/display.py +440 -0
  43. retab/utils/json_schema.py +2156 -0
  44. retab/utils/mime.py +165 -0
  45. retab/utils/responses.py +169 -0
  46. retab/utils/stream_context_managers.py +52 -0
  47. retab/utils/usage/__init__.py +0 -0
  48. retab/utils/usage/usage.py +301 -0
  49. retab-0.0.42.dist-info/METADATA +119 -0
  50. {retab-0.0.40.dist-info → retab-0.0.42.dist-info}/RECORD +52 -36
  51. retab-0.0.40.dist-info/METADATA +0 -418
  52. {retab-0.0.40.dist-info → retab-0.0.42.dist-info}/WHEEL +0 -0
  53. {retab-0.0.40.dist-info → retab-0.0.42.dist-info}/top_level.txt +0 -0
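
The per-file counts in the list above can be totalled with a short script. This is a minimal sketch, assuming each entry follows the `N. path +added -removed` layout shown above; the `tally` helper and the `ENTRY` pattern are hypothetical names introduced here, not part of `retab` or of the registry's tooling.

```python
import re

# Hypothetical pattern for entries shaped like "1. retab/client.py +5 -5".
# The path may contain spaces (e.g. renamed dist-info directories).
ENTRY = re.compile(
    r"^\s*(?:\d+\.\s+)?(?P<path>.+?)\s+\+(?P<added>\d+)\s+-(?P<removed>\d+)\s*$"
)


def tally(lines):
    """Sum added and removed line counts over file-list entries."""
    added = removed = 0
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # skip anything that is not a file entry
        added += int(match["added"])
        removed += int(match["removed"])
    return added, removed


# Three entries copied from the list above.
sample = [
    "1. retab/client.py +5 -5",
    "14. retab/resources/processors/automations/mailboxes.py +43 -32",
    "51. retab-0.0.40.dist-info/METADATA +0 -418",
]
print(tally(sample))
```

Passing all 53 entries instead of the three-line sample gives the overall size of the change between the two wheels.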
@@ -1,418 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: retab
3
- Version: 0.0.40
4
- Summary: Retab official python library
5
- Home-page: https://github.com/Retab-dev/retab
6
- Author: Retab
7
- Author-email: contact@retab.com
8
- Project-URL: Team website, https://retab.com
9
- Classifier: Programming Language :: Python :: 3
10
- Classifier: License :: OSI Approved :: MIT License
11
- Classifier: Operating System :: POSIX :: Linux
12
- Classifier: Operating System :: MacOS
13
- Classifier: Intended Audience :: Science/Research
14
- Requires-Python: >=3.6
15
- Description-Content-Type: text/markdown
16
- Requires-Dist: Pillow
17
- Requires-Dist: httpx
18
- Requires-Dist: pydantic
19
- Requires-Dist: pydantic-core
20
- Requires-Dist: requests
21
- Requires-Dist: tqdm
22
- Requires-Dist: types-tqdm
23
- Requires-Dist: backoff
24
- Requires-Dist: termplotlib
25
- Requires-Dist: Levenshtein
26
- Requires-Dist: pandas
27
- Requires-Dist: numpy
28
- Requires-Dist: motor
29
- Requires-Dist: rich
30
- Requires-Dist: puremagic
31
- Requires-Dist: pycountry
32
- Requires-Dist: phonenumbers
33
- Requires-Dist: email-validator
34
- Requires-Dist: python-stdnum
35
- Requires-Dist: nanoid
36
- Requires-Dist: openai
37
- Requires-Dist: google-genai
38
- Requires-Dist: google-generativeai
39
- Requires-Dist: anthropic
40
- Requires-Dist: tiktoken
41
- Requires-Dist: truststore
42
- Requires-Dist: ruff
43
-
44
- # Retab
45
-
46
- <div align="center" style="margin-bottom: 1em;">
47
-
48
- <img src="https://raw.githubusercontent.com/Retab/retab/refs/heads/main/assets/retab-logo.png" alt="Retab Logo" width="150">
49
-
50
-
51
- *The AI Automation Platform*
52
-
53
- Made with love by the team at [Retab](https://retab.dev) 🤍.
54
-
55
- [Our Website](https://retab.dev) | [Documentation](https://docs.retab.dev/get-started/introduction) | [Discord](https://discord.com/invite/vc5tWRPqag) | [Twitter](https://x.com/retabdev)
56
-
57
-
58
- </div>
59
-
60
- ---
61
-
62
- ## How It Works
63
-
64
- Retab allows you to easily create document processing automations. Here is the general workflow:
65
-
66
- ```mermaid
67
- sequenceDiagram
68
- User ->> Retab: File Upload
69
- Retab -->> Retab: Preprocessing
70
- Retab ->> AI Provider: Request on your behalf
71
- AI Provider -->> Retab: Structured Generation
72
- Retab ->> Webhook: Send result
73
- Retab ->> User: Send Confirmation
74
- ```
75
-
76
- ---
77
-
78
- ## General philosophy
79
-
80
- Many people haven't yet realized how powerful LLMs have become at document processing tasks - **we're here to help you unlock these capabilities**.
81
-
82
- Our mission can be described as follows:
83
-
84
- - **Smarter Document Processing**
85
- > Convert any file type (PDFs, Excel, emails, etc.) into LLM-ready format without touching any line of code.
86
-
87
- - **Scalable Workflow Automation**
88
- > Create custom automation loops to process documents at scale.
89
-
90
- - **Model Efficiency & Cost Optimization**
91
- > Get consistent, reliable outputs using schema-based prompt engineering to reduce costs and improve performance.
92
-
93
- You come with your own API key from your favorite AI provider, and we handle the rest in an **easy** and **transparent** way.
94
-
95
- We currently support [OpenAI](https://platform.openai.com/docs/overview), [Anthropic](https://www.anthropic.com/api), [Gemini](https://aistudio.google.com/prompts/new_chat) and [xAI](https://x.ai/api) models.
96
-
97
- <p align="center">
98
- <img src="https://raw.githubusercontent.com/Retab/retab/refs/heads/main/assets/supported_models.png" alt="Supported Models" width="600">
99
- </p>
100
-
101
- ---
102
-
103
- ## Quickstart
104
-
105
- Explore our [Playground](https://www.retab.dev/dashboard/playground) and create your first automations easily 🚀!
106
-
107
- <p align="center">
108
- <img src="https://raw.githubusercontent.com/Retab/retab/refs/heads/main/assets/retab-playground.png" alt="Retab Playground" width="600">
109
- </p>
110
-
111
- ---
112
-
113
- ---
114
-
115
- ## Dev Mode 🔧
116
-
117
- You need more control? You can access the [Documentation](https://docs.retab.dev/get-started/introduction) of our **Python SDK**.
118
-
119
- 1. **Set up the Python SDK**
120
- > Install the Retab Python SDK and configure your API keys to start processing documents with your preferred AI provider.
121
-
122
- 2. **Create your JSON schema**
123
- > Define the structure of the data you want to extract from your documents using our schema format with custom prompting capabilities.
124
-
125
- 3. **Create your FastAPI server with a webhook**
126
- > Set up an endpoint that will receive the structured data extracted from your documents after processing.
127
-
128
- 4. **Create your automation**
129
- > Configure an automation (mailbox or link) that will automatically process incoming documents using your schema and send results to your webhook.
130
-
131
- 5. **Test your automation**
132
- > Validate your setup by sending test documents through your automation and verifying that the extracted data matches your requirements.
133
-
134
- ### Step 1: Set up the Python SDK
135
-
136
- To get started, install the `retab` package using pip:
137
-
138
- ```bash
139
- pip install retab
140
- ```
141
-
142
- Then, [create your API key on retab.dev](https://www.retab.dev).
143
-
144
- Create another API key with your favorite AI provider.
145
-
146
- **Reminder**: We currently support [OpenAI](https://platform.openai.com/docs/overview), [Anthropic](https://www.anthropic.com/api), [Gemini](https://aistudio.google.com/prompts/new_chat) and [xAI](https://x.ai/api) models.
147
-
148
- As we will use your API key to make requests to OpenAI on your behalf within an automation, you need to store your API key in the Retab secrets manager:
149
-
150
- ```
151
- OPENAI_API_KEY=sk-xxxxxxxxx
152
- RETAB_API_KEY=sk_retab_xxxxxxxxx
153
- ```
154
-
155
- ```python
156
- import retab
157
- import os
158
-
159
- reclient = retab.Retab()
160
-
161
- reclient.secrets.external_api_keys.create(
162
- provider="OpenAI",
163
- api_key=os.getenv("OPENAI_API_KEY")
164
- )
165
- ```
166
-
167
- #### Process your first document with the create_messages method:
168
-
169
- ```python
170
- from retab import Retab
171
- from openai import OpenAI
172
-
173
- # Initialize Retab client
174
- reclient = Retab()
175
-
176
- # Convert any document into LLM-ready format
177
- doc_msg = reclient.documents.create_messages(
178
- document = "invoice.pdf" # Works with PDFs, Excel, emails, etc.
179
- )
180
-
181
- client = OpenAI()
182
- completion = client.chat.completions.create(
183
- model="gpt-4.1-nano",
184
- messages=doc_msg.openai_messages + [
185
- {
186
- "role": "user",
187
- "content": "Summarize the document"
188
- }
189
- ]
190
- )
191
- ```
192
-
193
- ### Step 2: Create your JSON Schema
194
-
195
- We use a standard JSON Schema with custom annotations (`X-SystemPrompt`, `X-FieldPrompt`, and `X-ReasoningPrompt`) as a prompt-engineering framework for the extraction process.
196
-
197
- These annotations help guide the LLM’s behavior and improve extraction accuracy.
198
-
199
- You can learn more about these in our [JSON Schema documentation](https://docs.retab.dev/get-started/prompting-with-the-JSON-schema).
200
-
201
- ```python
202
- from retab import Retab
203
- from openai import OpenAI
204
- from pydantic import BaseModel, Field, ConfigDict
205
-
206
- # Define your extraction schema
207
- class Invoice(BaseModel):
208
-     model_config = ConfigDict(
209
-         json_schema_extra = {
210
-             "X-SystemPrompt": "You are an expert at analyzing invoice documents."
211
-         }
212
-     )
213
-
214
-     total_amount: float = Field(...,
215
-         description="The total invoice amount",
216
-         json_schema_extra={
217
-             "X-FieldPrompt": "Find the final total amount including taxes"
218
-         }
219
-     )
220
-     date: str = Field(...,
221
-         description="Invoice date in YYYY-MM-DD format",
222
-         json_schema_extra={
223
-             "X-ReasoningPrompt": "Look for dates labeled as 'Invoice Date', 'Date', etc."
224
-         }
225
-     )
226
-
227
- # Process document and extract data
228
- reclient = Retab()
229
- doc_msg = reclient.documents.create_messages(
230
- document = "invoice.pdf"
231
- )
232
- schema_obj = reclient.schemas.load(
233
- pydantic_model = Invoice
234
- )
235
-
236
- # Extract structured data with any LLM
237
- client = OpenAI()
238
- completion = client.beta.chat.completions.parse(
239
- model="gpt-4o",
240
- messages=schema_obj.openai_messages + doc_msg.openai_messages,
241
- response_format=schema_obj.inference_pydantic_model
242
- )
243
-
244
- print("Extracted data:", completion.choices[0].message.parsed)
245
-
246
- # Validate the response against the original schema if you want to remove the reasoning fields
247
- from retab._utils.json_schema import filter_auxiliary_fields_json
248
- assert completion.choices[0].message.content is not None
249
- extraction = schema_obj.pydantic_model.model_validate(
250
- filter_auxiliary_fields_json(completion.choices[0].message.content, schema_obj.pydantic_model)
251
- )
252
-
253
- print("Extracted data without the reasoning fields:", extraction)
254
- ```
255
-
256
- ### Step 3: Create your FastAPI server with a webhook
257
-
258
- Next, set up a FastAPI route that will handle incoming webhook POST requests.
259
-
260
- Below is an example of a simple FastAPI application with a webhook endpoint:
261
-
262
- ```python
263
- from fastapi import FastAPI, Request
264
- from fastapi.responses import JSONResponse
265
- from retab.types.automations.webhooks import WebhookRequest
266
- from pydantic import BaseModel, Field, ConfigDict
267
-
268
- app = FastAPI()
269
-
270
- @app.post("/webhook")
271
- async def webhook(request: WebhookRequest):
272
-     invoice_object = request.completion.choices[0].message.parsed # The parsed object is the same Invoice object as the one you defined in the Pydantic model
273
-     print("Received payload:", invoice_object)
274
-     return JSONResponse(content={"status": "success", "data": invoice_object})
275
-
276
- # To run the FastAPI app locally, use the command:
277
- # uvicorn your_module_name:app --reload
278
- if __name__ == "__main__":
279
-     import uvicorn
280
-     uvicorn.run(app, host="0.0.0.0", port=8000)
281
- ```
282
-
283
- You can test the webhook endpoint locally with a tool like curl or Postman - for example, using curl:
284
-
285
- ```bash
286
- curl -X POST "http://localhost:8000/webhook" \
287
- -H "Content-Type: application/json" \
288
- -d '{"name": "Team Meeting", "date": "2023-12-31"}'
289
- ```
290
-
291
- > ⚠️ **To continue**, you need to deploy your FastAPI app to a server to make your webhook endpoint publicly accessible.
292
- > We recommend using [Replit](https://replit.com) to get started quickly if you don’t have a server yet.
293
-
294
- ### Step 4: Create your automation
295
-
296
- Next, integrate the webhook with your automation system using the `retab` client.
297
-
298
- This example demonstrates how to create an automation that triggers the webhook when a matching event occurs:
299
-
300
- ```python
301
- from retab import Retab
302
-
303
- # Initialize the Retab client
304
- reclient = Retab()
305
-
306
- # Create an automation that uses the webhook URL from Step 3
307
- automation = reclient.processors.automations.mailboxes.create(
308
- email="invoices@mailbox.retab.dev",
309
- model="gpt-4.1-nano",
310
- json_schema=Invoice.model_json_schema(), # use the pydantic model to create the json schema
311
- webhook_url="https://your-server.com/webhook", # Replace with your actual webhook URL
312
- )
313
- ```
314
-
315
- For every email sent to `invoices@mailbox.retab.dev`, the automation sends a POST request to your FastAPI webhook endpoint, where the payload can be processed.
316
-
317
- You can see the automation you just created on your [dashboard](https://www.retab.dev/dashboard/processors)!
318
-
319
- ### Step 5: Test your automation
320
-
321
- Finally, you can quickly test the automation with the SDK's test functions:
322
-
323
- ```python
324
- from retab import Retab
325
-
326
- # Initialize the Retab client
327
- reclient = Retab()
328
-
329
- # If you just want to send a test request to your webhook
330
- log = reclient.processors.automations.mailboxes.tests.webhook(
331
- email="test-mailbox-local@devmail.retab.dev",
332
- )
333
-
334
- # If you want to test the file processing logic:
335
- log = reclient.processors.automations.mailboxes.tests.process(
336
- email="test-mailbox-local@devmail.retab.dev",
337
- document="your_invoice_email.eml"
338
- )
339
-
340
- # If you want to test a full email forwarding
341
- log = reclient.processors.automations.mailboxes.tests.forward(
342
- email="retab-quickstart@mailbox.retab.dev",
343
- document="your_invoice_email.eml"
344
- )
345
- ```
346
-
347
- > 💡 **Tip:** You can also test your webhook locally by overriding the webhook URL set in the automation.
348
-
349
- ```python
350
- from retab import Retab
351
-
352
- reclient = Retab()
353
-
354
- # If you just want to send a test request to your webhook
355
- log = reclient.processors.automations.mailboxes.tests.webhook(
356
- email="test-mailbox-local@devmail.retab.dev",
357
- webhook_url="http://localhost:8000/webhook" # If you want to try your webhook locally, you can override the webhook url set in the automation
358
- )
359
- ```
360
-
361
- And that's it! You can start processing documents at scale!
362
- You have 1000 free requests to get started, and you can [subscribe](https://www.retab.dev) to the pro plan to get more.
363
-
364
- But this minimalistic example is just the beginning.
365
-
366
- Continue reading to learn more about how to use Retab **to its full potential** 🔥.
367
-
368
- ---
369
-
370
- ## Go further
371
-
372
- - [Prompt Engineering Guide](https://docs.retab.dev/get-started/prompting-with-the-json-schema)
373
- - [General Concepts](https://docs.retab.dev/get-started/General-Concepts)
374
- - [Consensus](https://docs.retab.dev/SDK/General-Concepts#consensus)
375
- - [Create mailboxes](https://docs.retab.dev/SDK/Automations#mailbox)
376
- - [Create links](https://docs.retab.dev/SDK/Automations#link)
377
- - Finetuning (coming soon)
378
- - Prompt optimization (coming soon)
379
- - Data-Labelling with our AI-powered annotator (coming soon)
380
-
381
- ---
382
-
383
- ## Jupyter Notebooks
384
-
385
- You can view minimal notebooks that demonstrate how to use Retab to process documents:
386
-
387
- - [Mailbox creation quickstart](https://github.com/Retab-dev/retab/blob/main/notebooks/mailboxes_quickstart.ipynb)
388
- - [Upload Links creation quickstart](https://github.com/Retab-dev/retab/blob/main/notebooks/links_quickstart.ipynb)
389
- - [Document Extractions quickstart](https://github.com/Retab-dev/retab/blob/main/notebooks/Quickstart.ipynb)
390
- - [Document Extractions quickstart - Async](https://github.com/Retab-dev/retab/blob/main/notebooks/Quickstart-Async.ipynb)
391
-
392
- ---
393
-
394
- ## Community
395
-
396
- Let's create the future of document processing together!
397
-
398
- Join our [discord community](https://discord.com/invite/vc5tWRPqag) to share tips, discuss best practices, and showcase what you build. Or just [tweet](https://x.com/retabdev) at us.
399
-
400
- We can't wait to see how you'll use Retab.
401
-
402
- - [Discord](https://discord.com/invite/vc5tWRPqag)
403
- - [Twitter](https://x.com/retabdev)
404
-
405
-
406
- ## Roadmap
407
-
408
- We publicly share our Roadmap with the community.
409
-
410
- Please open an issue or [contact us on X](https://x.com/sachaicb) if you have suggestions or ideas.
411
-
412
- - [ ] node client with ZOD
413
- - [ ] Make a json-schema zoo
414
- - [ ] Offer tools to display tokens usage to our users
415
- - [ ] Launch the data-labelling API (Dataset Upload / Creation / Management / Labelling / Distillation)
416
- - [ ] Launch the data-labelling platform: A web app based on the data-labelling API with a nice UI
417
- - [ ] Launch the prompt-optimisation sdk
418
- - [ ] Launch the finetuning sdk
File without changes