chunkr-ai 0.0.3tar.gz → 0.0.5tar.gz

Files changed (24)

{chunkr_ai-0.0.3/src/chunkr_ai.egg-info → chunkr_ai-0.0.5}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: chunkr-ai
-Version: 0.0.3
+Version: 0.0.5
 Summary: Python client for Chunkr: open source document intelligence
 Author-email: Ishaan Kapoor <ishaan@lumina.sh>
 Project-URL: Homepage, https://chunkr.ai
@@ -17,7 +17,13 @@ Requires-Dist: pytest-xdist>=3.6.1; extra == "test"
 # Chunkr Python Client
-This is the Python client for the Chunkr API. It provides a simple interface to interact with Chunkr's services.
+This provides a simple interface to interact with the Chunkr API.
+## Getting Started
+You can get an API key from [Chunkr](https://chunkr.ai) or deploy your own Chunkr instance. For self-hosted deployment options, check out our [deployment guide](https://github.com/lumina-ai-inc/chunkr/tree/main?tab=readme-ov-file#self-hosted-deployment-options).
+For more information about the API and its capabilities, visit the [Chunkr API docs](https://docs.chunkr.ai).
 ## Installation
@@ -102,6 +108,80 @@ chunkr.upload(img)
 ### Configuration
+You can customize the processing behavior by passing a `Configuration` object:
+```python
+from chunkr_ai.models import Configuration, OcrStrategy, SegmentationStrategy, GenerationStrategy
+# Basic configuration
+config = Configuration(
+    ocr_strategy=OcrStrategy.AUTO,
+    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
+    high_resolution=True,
+    expires_in=3600,  # seconds
+)
+# Upload with configuration
+task = chunkr.upload("document.pdf", config)
+```
+#### Available Configuration Examples
+- **Chunk Processing**
+  ```python
+  from chunkr_ai.models import ChunkProcessing
+  config = Configuration(
+      chunk_processing=ChunkProcessing(target_length=1024)
+  )
+  ```
+- **Expires In**
+  ```python
+  config = Configuration(expires_in=3600)
+  ```
+- **High Resolution**
+  ```python
+  config = Configuration(high_resolution=True)
+  ```
+- **JSON Schema**
+  ```python
+  config = Configuration(json_schema=JsonSchema(
+      title="Sales Data",
+      properties=[
+          Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
+          Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
+      ]
+  ))
+  ```
+- **OCR Strategy**
+  ```python
+  config = Configuration(ocr_strategy=OcrStrategy.AUTO)
+  ```
+- **Segment Processing**
+  ```python
+  from chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy
+  config = Configuration(
+      segment_processing=SegmentProcessing(
+          page=GenerationConfig(
+              html=GenerationStrategy.LLM,
+              markdown=GenerationStrategy.LLM
+          )
+      )
+  )
+  ```
+- **Segmentation Strategy**
+  ```python
+  config = Configuration(
+      segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS  # or SegmentationStrategy.PAGE
+  )
+  ```
+## Environment setup
 You can provide your API key and URL in several ways:
 1. Environment variables: `CHUNKR_API_KEY` and `CHUNKR_URL`
 2. `.env` file
@@ -112,13 +192,3 @@ chunkr = Chunkr(
     url="https://api.chunkr.ai"
 )
 ```
-## Run tests
-```python
-# Install dependencies
-uv pip install -e ".[test]"
-# Run tests
-uv run pytest
-```

chunkr_ai-0.0.5/README.md ADDED

@@ -0,0 +1,177 @@
+# Chunkr Python Client
+This provides a simple interface to interact with the Chunkr API.
+## Getting Started
+You can get an API key from [Chunkr](https://chunkr.ai) or deploy your own Chunkr instance. For self-hosted deployment options, check out our [deployment guide](https://github.com/lumina-ai-inc/chunkr/tree/main?tab=readme-ov-file#self-hosted-deployment-options).
+For more information about the API and its capabilities, visit the [Chunkr API docs](https://docs.chunkr.ai).
+## Installation
+```bash
+pip install chunkr-ai
+```
+## Usage
+We provide two clients: `Chunkr` for synchronous operations and `ChunkrAsync` for asynchronous operations.
+### Synchronous Usage
+```python
+from chunkr_ai import Chunkr
+# Initialize client
+chunkr = Chunkr()
+# Upload a file and wait for processing
+task = chunkr.upload("document.pdf")
+# Print the response
+print(task)
+# Get output from task
+output = task.output
+# If you want to upload without waiting for processing
+task = chunkr.start_upload("document.pdf")
+# ... do other things ...
+task.poll()  # Check status when needed
+```
+### Asynchronous Usage
+```python
+from chunkr_ai import ChunkrAsync
+async def process_document():
+    # Initialize client
+    chunkr = ChunkrAsync()
+    # Upload a file and wait for processing
+    task = await chunkr.upload("document.pdf")
+    # Print the response
+    print(task)
+    # Get output from task
+    output = task.output
+    # If you want to upload without waiting for processing
+    task = await chunkr.start_upload("document.pdf")
+    # ... do other things ...
+    await task.poll_async()  # Check status when needed
+```
+### Additional Features
+Both clients support various input types:
+```python
+# Upload from file path
+chunkr.upload("document.pdf")
+# Upload from opened file
+with open("document.pdf", "rb") as f:
+    chunkr.upload(f)
+# Upload from URL
+chunkr.upload("https://example.com/document.pdf")
+# Upload from base64 string
+chunkr.upload("data:application/pdf;base64,JVBERi0xLjcKCjEgMCBvYmo...")
+# Upload an image
+from PIL import Image
+img = Image.open("photo.jpg")
+chunkr.upload(img)
+```
+### Configuration
+You can customize the processing behavior by passing a `Configuration` object:
+```python
+from chunkr_ai.models import Configuration, OcrStrategy, SegmentationStrategy, GenerationStrategy
+# Basic configuration
+config = Configuration(
+    ocr_strategy=OcrStrategy.AUTO,
+    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
+    high_resolution=True,
+    expires_in=3600,  # seconds
+)
+# Upload with configuration
+task = chunkr.upload("document.pdf", config)
+```
+#### Available Configuration Examples
+- **Chunk Processing**
+  ```python
+  from chunkr_ai.models import ChunkProcessing
+  config = Configuration(
+      chunk_processing=ChunkProcessing(target_length=1024)
+  )
+  ```
+- **Expires In**
+  ```python
+  config = Configuration(expires_in=3600)
+  ```
+- **High Resolution**
+  ```python
+  config = Configuration(high_resolution=True)
+  ```
+- **JSON Schema**
+  ```python
+  config = Configuration(json_schema=JsonSchema(
+      title="Sales Data",
+      properties=[
+          Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
+          Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
+      ]
+  ))
+  ```
+- **OCR Strategy**
+  ```python
+  config = Configuration(ocr_strategy=OcrStrategy.AUTO)
+  ```
+- **Segment Processing**
+  ```python
+  from chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy
+  config = Configuration(
+      segment_processing=SegmentProcessing(
+          page=GenerationConfig(
+              html=GenerationStrategy.LLM,
+              markdown=GenerationStrategy.LLM
+          )
+      )
+  )
+  ```
+- **Segmentation Strategy**
+  ```python
+  config = Configuration(
+      segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS  # or SegmentationStrategy.PAGE
+  )
+  ```
+## Environment setup
+You can provide your API key and URL in several ways:
+1. Environment variables: `CHUNKR_API_KEY` and `CHUNKR_URL`
+2. `.env` file
+3. Direct initialization:
+```python
+chunkr = Chunkr(
+    api_key="your-api-key",
+    url="https://api.chunkr.ai"
+)
+```
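
The README above lists three ways to supply the API key and URL but does not say which one wins when several are present. The following is a minimal standard-library sketch of one plausible resolution order. The helper name `resolve_settings`, the precedence (explicit arguments first, then environment variables), and the use of `https://api.chunkr.ai` as a fallback are assumptions for illustration, not confirmed client behavior.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed fallback, taken from the README's direct-initialization example.
DEFAULT_URL = "https://api.chunkr.ai"


def resolve_settings(
    api_key: Optional[str] = None,
    url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], str]:
    """Hypothetical helper: explicit arguments win, then environment variables."""
    env = os.environ if env is None else env
    key = api_key or env.get("CHUNKR_API_KEY")
    base_url = url or env.get("CHUNKR_URL") or DEFAULT_URL
    return key, base_url


# Example: only the environment supplies the key.
key, base_url = resolve_settings(env={"CHUNKR_API_KEY": "your-api-key"})
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.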

{chunkr_ai-0.0.3 → chunkr_ai-0.0.5}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "chunkr-ai"
-version = "0.0.3"
+version = "0.0.5"
 authors = [{"name" = "Ishaan Kapoor", "email" = "ishaan@lumina.sh"}]
 description = "Python client for Chunkr: open source document intelligence"
 readme = "README.md"

{chunkr_ai-0.0.3 → chunkr_ai-0.0.5}/src/chunkr_ai/api/config.py RENAMED

@@ -1,4 +1,4 @@
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator
 from enum import Enum
 from typing import Optional, List, Dict
@@ -40,15 +40,14 @@ class ChunkProcessing(BaseModel):
 class Property(BaseModel):
     name: str
-    title: Optional[str]
+    title: Optional[str] = None
     prop_type: str
-    description: Optional[str]
-    default: Optional[str]
+    description: Optional[str] = None
+    default: Optional[str] = None
 class JsonSchema(BaseModel):
     title: str
     properties: List[Property]
-    schema_type: Optional[str]
 class OcrStrategy(str, Enum):
     ALL = "All"
@@ -121,10 +120,12 @@ class Configuration(BaseModel):
     ocr_strategy: Optional[OcrStrategy] = Field(default=None)
     segment_processing: Optional[SegmentProcessing] = Field(default=None)
     segmentation_strategy: Optional[SegmentationStrategy] = Field(default=None)
-    target_chunk_length: Optional[int] = Field(default=None)
-class Status(str, Enum):
-    STARTING = "Starting"
-    PROCESSING = "Processing"
-    SUCCEEDED = "Succeeded"
-    FAILED = "Failed"
+    @model_validator(mode='before')
+    def map_deprecated_fields(cls, values: Dict) -> Dict:
+        if isinstance(values, dict) and "target_chunk_length" in values:
+            target_length = values.pop("target_chunk_length")
+            if target_length is not None:
+                values["chunk_processing"] = values.get("chunk_processing", {}) or {}
+                values["chunk_processing"]["target_length"] = target_length
+        return values
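
The validator added in this hunk rewrites the removed `target_chunk_length` field into `chunk_processing.target_length` before validation. The sketch below reproduces that mapping on plain dictionaries so it can be run without pydantic. It is an illustration, not the package's code: it copies the input instead of mutating it, and it assumes that `chunk_processing`, when present, is a plain dict.

```python
from typing import Any, Dict


def map_deprecated_fields(values: Dict[str, Any]) -> Dict[str, Any]:
    """Move a legacy top-level target_chunk_length into chunk_processing."""
    values = dict(values)  # shallow copy; the validator in the diff mutates in place
    if "target_chunk_length" in values:
        target_length = values.pop("target_chunk_length")
        if target_length is not None:
            # Assumes chunk_processing is a dict (or missing/None), not a model instance.
            chunk_processing = dict(values.get("chunk_processing") or {})
            chunk_processing["target_length"] = target_length
            values["chunk_processing"] = chunk_processing
    return values


legacy = {"target_chunk_length": 512, "high_resolution": True}
migrated = map_deprecated_fields(legacy)
```

As in the diff, a legacy `target_chunk_length` overrides any `target_length` already set under `chunk_processing`, and a `None` value is dropped without creating a `chunk_processing` entry.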

{chunkr_ai-0.0.3 → chunkr_ai-0.0.5}/src/chunkr_ai/api/task.py RENAMED

@@ -1,11 +1,18 @@
 from .protocol import ChunkrClientProtocol
-from .config import Configuration, Status, OutputResponse
+from .config import Configuration, OutputResponse
 import asyncio
 from datetime import datetime
+from enum import Enum
 from pydantic import BaseModel, PrivateAttr
 import time
 from typing import Optional, Union
+class Status(str, Enum):
+    STARTING = "Starting"
+    PROCESSING = "Processing"
+    SUCCEEDED = "Succeeded"
+    FAILED = "Failed"
 class TaskResponse(BaseModel):
     configuration: Configuration
     created_at: datetime
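
`Status` moves from `config.py` to `task.py` unchanged. Because it mixes in `str`, its members compare equal to their string values, which is why the tests in this package can assert `response.status == "Succeeded"`. A minimal standalone sketch of that behavior follows; the `is_finished` helper is hypothetical and not part of the package.

```python
from enum import Enum


class Status(str, Enum):
    STARTING = "Starting"
    PROCESSING = "Processing"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


def is_finished(status: Status) -> bool:
    """Hypothetical helper: True once a task has reached a terminal state."""
    return status in (Status.SUCCEEDED, Status.FAILED)


# A raw string from an API response can be converted by value lookup.
parsed = Status("Failed")
```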

{chunkr_ai-0.0.3 → chunkr_ai-0.0.5}/src/chunkr_ai/models.py RENAMED

@@ -18,10 +18,9 @@ from .api.config import (
     SegmentProcessing,
     SegmentType,
     SegmentationStrategy,
-    Status
 )
-from .api.task import TaskResponse, TaskPayload
+from .api.task import TaskResponse, TaskPayload, Status
 __all__ = [
     'BoundingBox',

{chunkr_ai-0.0.3 → chunkr_ai-0.0.5/src/chunkr_ai.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: chunkr-ai
-Version: 0.0.3
+Version: 0.0.5
 Summary: Python client for Chunkr: open source document intelligence
 Author-email: Ishaan Kapoor <ishaan@lumina.sh>
 Project-URL: Homepage, https://chunkr.ai
@@ -17,7 +17,13 @@ Requires-Dist: pytest-xdist>=3.6.1; extra == "test"
 # Chunkr Python Client
-This is the Python client for the Chunkr API. It provides a simple interface to interact with Chunkr's services.
+This provides a simple interface to interact with the Chunkr API.
+## Getting Started
+You can get an API key from [Chunkr](https://chunkr.ai) or deploy your own Chunkr instance. For self-hosted deployment options, check out our [deployment guide](https://github.com/lumina-ai-inc/chunkr/tree/main?tab=readme-ov-file#self-hosted-deployment-options).
+For more information about the API and its capabilities, visit the [Chunkr API docs](https://docs.chunkr.ai).
 ## Installation
@@ -102,6 +108,80 @@ chunkr.upload(img)
 ### Configuration
+You can customize the processing behavior by passing a `Configuration` object:
+```python
+from chunkr_ai.models import Configuration, OcrStrategy, SegmentationStrategy, GenerationStrategy
+# Basic configuration
+config = Configuration(
+    ocr_strategy=OcrStrategy.AUTO,
+    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
+    high_resolution=True,
+    expires_in=3600,  # seconds
+)
+# Upload with configuration
+task = chunkr.upload("document.pdf", config)
+```
+#### Available Configuration Examples
+- **Chunk Processing**
+  ```python
+  from chunkr_ai.models import ChunkProcessing
+  config = Configuration(
+      chunk_processing=ChunkProcessing(target_length=1024)
+  )
+  ```
+- **Expires In**
+  ```python
+  config = Configuration(expires_in=3600)
+  ```
+- **High Resolution**
+  ```python
+  config = Configuration(high_resolution=True)
+  ```
+- **JSON Schema**
+  ```python
+  config = Configuration(json_schema=JsonSchema(
+      title="Sales Data",
+      properties=[
+          Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
+          Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
+      ]
+  ))
+  ```
+- **OCR Strategy**
+  ```python
+  config = Configuration(ocr_strategy=OcrStrategy.AUTO)
+  ```
+- **Segment Processing**
+  ```python
+  from chunkr_ai.models import SegmentProcessing, GenerationConfig, GenerationStrategy
+  config = Configuration(
+      segment_processing=SegmentProcessing(
+          page=GenerationConfig(
+              html=GenerationStrategy.LLM,
+              markdown=GenerationStrategy.LLM
+          )
+      )
+  )
+  ```
+- **Segmentation Strategy**
+  ```python
+  config = Configuration(
+      segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS  # or SegmentationStrategy.PAGE
+  )
+  ```
+## Environment setup
 You can provide your API key and URL in several ways:
 1. Environment variables: `CHUNKR_API_KEY` and `CHUNKR_URL`
 2. `.env` file
@@ -112,13 +192,3 @@ chunkr = Chunkr(
     url="https://api.chunkr.ai"
 )
 ```
-## Run tests
-```python
-# Install dependencies
-uv pip install -e ".[test]"
-# Run tests
-uv run pytest
-```

{chunkr_ai-0.0.3 → chunkr_ai-0.0.5}/tests/test_chunkr.py RENAMED

@@ -8,7 +8,9 @@ from chunkr_ai.models import (
     Configuration,
     GenerationStrategy,
     GenerationConfig,
+    JsonSchema,
     OcrStrategy,
+    Property,
     SegmentationStrategy,
     SegmentProcessing,
     TaskResponse,
@@ -129,7 +131,21 @@ def test_page_llm(chunkr, sample_path):
     assert response.task_id is not None
     assert response.status == "Succeeded"
     assert response.output is not None
+def test_json_schema(chunkr, sample_path):
+    response = chunkr.upload(sample_path, Configuration(
+        json_schema=JsonSchema(
+            title="Sales Data",
+            properties=[
+                Property(name="Person with highest sales", prop_type="string", description="The person with the highest sales"),
+                Property(name="Person with lowest sales", prop_type="string", description="The person with the lowest sales"),
+            ]
+        )
+    ))
+    assert isinstance(response, TaskResponse)
+    assert response.task_id is not None
+    assert response.status == "Succeeded"
+    assert response.output is not None
 async def test_async_send_file_path(async_chunkr, sample_path):
     response = await async_chunkr.upload(sample_path)
@@ -138,4 +154,5 @@ async def test_async_send_file_path(async_chunkr, sample_path):
     assert response.task_id is not None
     assert response.status == "Succeeded"
     assert response.output is not None

chunkr_ai-0.0.3/README.md DELETED

@@ -1,107 +0,0 @@
-# Chunkr Python Client
-This is the Python client for the Chunkr API. It provides a simple interface to interact with Chunkr's services.
-## Installation
-```bash
-pip install chunkr-ai
-```
-## Usage
-We provide two clients: `Chunkr` for synchronous operations and `ChunkrAsync` for asynchronous operations.
-### Synchronous Usage
-```python
-from chunkr_ai import Chunkr
-# Initialize client
-chunkr = Chunkr()
-# Upload a file and wait for processing
-task = chunkr.upload("document.pdf")
-# Print the response
-print(task)
-# Get output from task
-output = task.output
-# If you want to upload without waiting for processing
-task = chunkr.start_upload("document.pdf")
-# ... do other things ...
-task.poll()  # Check status when needed
-```
-### Asynchronous Usage
-```python
-from chunkr_ai import ChunkrAsync
-async def process_document():
-    # Initialize client
-    chunkr = ChunkrAsync()
-    # Upload a file and wait for processing
-    task = await chunkr.upload("document.pdf")
-    # Print the response
-    print(task)
-    # Get output from task
-    output = task.output
-    # If you want to upload without waiting for processing
-    task = await chunkr.start_upload("document.pdf")
-    # ... do other things ...
-    await task.poll_async()  # Check status when needed
-```
-### Additional Features
-Both clients support various input types:
-```python
-# Upload from file path
-chunkr.upload("document.pdf")
-# Upload from opened file
-with open("document.pdf", "rb") as f:
-    chunkr.upload(f)
-# Upload from URL
-chunkr.upload("https://example.com/document.pdf")
-# Upload from base64 string
-chunkr.upload("data:application/pdf;base64,JVBERi0xLjcKCjEgMCBvYmo...")
-# Upload an image
-from PIL import Image
-img = Image.open("photo.jpg")
-chunkr.upload(img)
-```
-### Configuration
-You can provide your API key and URL in several ways:
-1. Environment variables: `CHUNKR_API_KEY` and `CHUNKR_URL`
-2. `.env` file
-3. Direct initialization:
-```python
-chunkr = Chunkr(
-    api_key="your-api-key",
-    url="https://api.chunkr.ai"
-)
-```
-## Run tests
-```python
-# Install dependencies
-uv pip install -e ".[test]"
-# Run tests
-uv run pytest
-```