PyPI - parallex - Versions diffs - 0.1.0__tar.gz → 0.1.1__tar.gz - Mend

parallex 0.1.0.tar.gz → 0.1.1.tar.gz

Files changed (23)

parallex-0.1.1/PKG-INFO ADDED

@@ -0,0 +1,93 @@
+Metadata-Version: 2.1
+Name: parallex
+Version: 0.1.1
+Summary:
+Author: Jeff Hostetler
+Author-email: jeff@summed.ai
+Requires-Python: >=3.12,<4.0
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Requires-Dist: aiologger (>=0.7.0,<0.8.0)
+Requires-Dist: asyncio (>=3.4.3,<4.0.0)
+Requires-Dist: httpx (>=0.27.2,<0.28.0)
+Requires-Dist: openai (>=1.54.4,<2.0.0)
+Requires-Dist: pdf2image (>=1.17.0,<2.0.0)
+Requires-Dist: pydantic (>=2.9.2,<3.0.0)
+Description-Content-Type: text/markdown
+# Parallex
+### What it does
+- Converts PDF into images
+- Makes requests to Azure OpenAI to convert the images to markdown using the Batch API
+  - [Azure OpenAI Batch](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch?tabs=standard-input%2Cpython-secure&pivots=programming-language-python)
+  - [OpenAI Batch](https://platform.openai.com/docs/guides/batch)
+- Polls for batch completion and then converts AI responses into structured output based on the page of the corresponding PDF
+- Runs an optional post-processing callable so you can do what you wish with the resulting markdown
+### Requirements
+Parallex uses `graphicsmagick` for the conversion of PDF to images.
+```bash
+brew install graphicsmagick
+```
+### Example usage
+```python
+import os
+from parallex.models.parallex_callable_output import ParallexCallableOutput
+from parallex.parallex import parallex
+os.environ["AZURE_OPENAI_API_KEY"] = "key"
+os.environ["AZURE_OPENAI_ENDPOINT"] = "your-endpoint.com"
+os.environ["AZURE_OPENAI_API_VERSION"] = "deployment_version"
+os.environ["AZURE_OPENAI_API_DEPLOYMENT"] = "deployment_name"
+model = "gpt-4o"
+async def some_operation(file_url: str) -> None:
+    response_data: ParallexCallableOutput = await parallex(
+        model=model,
+        pdf_source_url=file_url,
+        post_process_callable=example_post_process,  # Optional
+        concurrency=2,  # Optional
+        prompt_text="Turn images into markdown",  # Optional
+        log_level="ERROR",  # Optional
+    )
+    pages = response_data.pages
+def example_post_process(output: ParallexCallableOutput) -> None:
+    file_name = output.file_name
+    pages = output.pages
+    for page in pages:
+        markdown_for_page = page.output_content
+        pdf_page_number = page.page_number
+```
+Responses have the following structure:
+```python
+class ParallexCallableOutput(BaseModel):
+    file_name: str = Field(description="Name of file that is processed")
+    pdf_source_url: str = Field(description="Given URL of the source of output")
+    trace_id: UUID = Field(description="Unique trace for each file")
+    pages: list[PageResponse] = Field(description="List of PageResponse objects")
+class PageResponse(BaseModel):
+    output_content: str = Field(description="Markdown generated for the page")
+    page_number: int = Field(description="Page number of the associated PDF")
+```
+### Default prompt is
+```python
+"""
+    Convert the following PDF page to markdown.
+    Return only the markdown with no explanation text.
+    Leave out any page numbers and redundant headers or footers.
+    Do not include any code blocks (e.g. "```markdown" or "```") in the response.
+    If unable to parse, return an empty string.
+"""
+```
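
The README's "polls for batch completion" step is not shown in this diff. A minimal sketch of what such a loop might look like, assuming a hypothetical `fetch_status` coroutine and status strings modeled on the OpenAI Batch documentation (both are assumptions, not parallex's actual implementation):

```python
import asyncio
from typing import Awaitable, Callable

# Assumed terminal statuses, modeled on the OpenAI Batch docs; not taken from parallex.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


async def wait_for_batch(
    fetch_status: Callable[[], Awaitable[str]],  # hypothetical status getter
    interval_seconds: float = 0.01,
    max_attempts: int = 100,
) -> str:
    """Poll until the batch reaches a terminal status or attempts run out."""
    for _ in range(max_attempts):
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Yield to the event loop between polls instead of busy-waiting.
        await asyncio.sleep(interval_seconds)
    raise TimeoutError("batch did not reach a terminal status in time")


async def _demo() -> str:
    # Fake status source that finishes on the third poll.
    statuses = iter(["validating", "in_progress", "completed"])

    async def fake_fetch() -> str:
        return next(statuses)

    return await wait_for_batch(fake_fetch)


final_status = asyncio.run(_demo())
```

Bounding the loop with `max_attempts` keeps a stuck batch from hanging the caller indefinitely; a real client would likely use a much longer interval.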

parallex-0.1.1/README.md ADDED

@@ -0,0 +1,74 @@
+# Parallex
+### What it does
+- Converts PDF into images
+- Makes requests to Azure OpenAI to convert the images to markdown using the Batch API
+  - [Azure OpenAI Batch](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch?tabs=standard-input%2Cpython-secure&pivots=programming-language-python)
+  - [OpenAI Batch](https://platform.openai.com/docs/guides/batch)
+- Polls for batch completion and then converts AI responses into structured output based on the page of the corresponding PDF
+- Runs an optional post-processing callable so you can do what you wish with the resulting markdown
+### Requirements
+Parallex uses `graphicsmagick` for the conversion of PDF to images.
+```bash
+brew install graphicsmagick
+```
+### Example usage
+```python
+import os
+from parallex.models.parallex_callable_output import ParallexCallableOutput
+from parallex.parallex import parallex
+os.environ["AZURE_OPENAI_API_KEY"] = "key"
+os.environ["AZURE_OPENAI_ENDPOINT"] = "your-endpoint.com"
+os.environ["AZURE_OPENAI_API_VERSION"] = "deployment_version"
+os.environ["AZURE_OPENAI_API_DEPLOYMENT"] = "deployment_name"
+model = "gpt-4o"
+async def some_operation(file_url: str) -> None:
+    response_data: ParallexCallableOutput = await parallex(
+        model=model,
+        pdf_source_url=file_url,
+        post_process_callable=example_post_process,  # Optional
+        concurrency=2,  # Optional
+        prompt_text="Turn images into markdown",  # Optional
+        log_level="ERROR",  # Optional
+    )
+    pages = response_data.pages
+def example_post_process(output: ParallexCallableOutput) -> None:
+    file_name = output.file_name
+    pages = output.pages
+    for page in pages:
+        markdown_for_page = page.output_content
+        pdf_page_number = page.page_number
+```
+Responses have the following structure:
+```python
+class ParallexCallableOutput(BaseModel):
+    file_name: str = Field(description="Name of file that is processed")
+    pdf_source_url: str = Field(description="Given URL of the source of output")
+    trace_id: UUID = Field(description="Unique trace for each file")
+    pages: list[PageResponse] = Field(description="List of PageResponse objects")
+class PageResponse(BaseModel):
+    output_content: str = Field(description="Markdown generated for the page")
+    page_number: int = Field(description="Page number of the associated PDF")
+```
+### Default prompt is
+```python
+"""
+    Convert the following PDF page to markdown.
+    Return only the markdown with no explanation text.
+    Leave out any page numbers and redundant headers or footers.
+    Do not include any code blocks (e.g. "```markdown" or "```") in the response.
+    If unable to parse, return an empty string.
+"""
+```
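
The response structure documented in the new README lends itself to simple post-processing. The sketch below uses plain dataclasses as stand-ins for the pydantic models (so it runs without parallex installed) and a hypothetical helper, `join_pages`, that stitches per-page markdown back together in page order. It skips empty pages, since the default prompt asks the model to return an empty string when a page cannot be parsed.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


# Stand-ins mirroring the fields listed in the README; the real classes are pydantic models.
@dataclass
class PageResponse:
    output_content: str
    page_number: int


@dataclass
class ParallexCallableOutput:
    file_name: str
    pdf_source_url: str
    trace_id: UUID
    pages: list[PageResponse]


def join_pages(output: ParallexCallableOutput) -> str:
    """Hypothetical helper: concatenate page markdown in page order, skipping empty pages."""
    ordered = sorted(output.pages, key=lambda page: page.page_number)
    return "\n\n".join(page.output_content for page in ordered if page.output_content)


sample = ParallexCallableOutput(
    file_name="example.pdf",
    pdf_source_url="https://example.com/example.pdf",
    trace_id=uuid4(),
    pages=[
        PageResponse("# Page two", 2),
        PageResponse("", 3),  # an unparseable page, per the default prompt
        PageResponse("# Page one", 1),
    ],
)
combined = join_pages(sample)
```

A helper like this could be called from inside a `post_process_callable`, which itself is typed to return `None`.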

{parallex-0.1.0 → parallex-0.1.1}/parallex/parallex.py RENAMED

@@ -21,9 +21,9 @@ async def parallex(
     model: str,
     pdf_source_url: str,
     post_process_callable: Optional[Callable[..., None]] = None,
-    concurrency: int = 20,
-    prompt_text: str = DEFAULT_PROMPT,
-    log_level: str = "ERROR",
+    concurrency: Optional[int] = 20,
+    prompt_text: Optional[str] = DEFAULT_PROMPT,
+    log_level: Optional[str] = "ERROR",
 ) -> ParallexCallableOutput:
     setup_logger(log_level)
     with tempfile.TemporaryDirectory() as temp_directory:
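
This hunk changes only the annotations: the defaults (`20`, `DEFAULT_PROMPT`, `"ERROR"`) stay the same, while `Optional[...]` additionally tells type checkers that `None` is an acceptable argument. Whether the function body handles `None` at runtime is not visible in this diff. A small sketch with a stand-in function, `parallex_stub`, and a placeholder `DEFAULT_PROMPT` (both hypothetical) shows how the new signature can be inspected:

```python
import inspect
from typing import Callable, Optional, get_args, get_type_hints

DEFAULT_PROMPT = "placeholder prompt"  # stand-in; the real value lives in the package


async def parallex_stub(
    model: str,
    pdf_source_url: str,
    post_process_callable: Optional[Callable[..., None]] = None,
    concurrency: Optional[int] = 20,
    prompt_text: Optional[str] = DEFAULT_PROMPT,
    log_level: Optional[str] = "ERROR",
) -> None:
    """Stand-in with the 0.1.1 parameter annotations; it does no work."""


# Collect the default value of every parameter that has one.
signature = inspect.signature(parallex_stub)
defaults = {
    name: parameter.default
    for name, parameter in signature.parameters.items()
    if parameter.default is not inspect.Parameter.empty
}

# Resolve annotations and record which parameters admit None.
hints = get_type_hints(parallex_stub)
accepts_none = {
    name: type(None) in get_args(hint)
    for name, hint in hints.items()
    if name != "return"
}
```

Here `defaults` still reports `20` for `concurrency`, and `accepts_none` marks the three re-annotated parameters as admitting `None` while `model` and `pdf_source_url` do not.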

{parallex-0.1.0 → parallex-0.1.1}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "parallex"
-version = "0.1.0"
+version = "0.1.1"
 description = ""
 authors = ["Jeff Hostetler <jeff@summed.ai>", "Kevin Bao <kevin@summed.ai>"]
 readme = "README.md"

parallex-0.1.0/PKG-INFO DELETED

@@ -1,42 +0,0 @@
-Metadata-Version: 2.1
-Name: parallex
-Version: 0.1.0
-Summary:
-Author: Jeff Hostetler
-Author-email: jeff@summed.ai
-Requires-Python: >=3.12,<4.0
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Requires-Dist: aiologger (>=0.7.0,<0.8.0)
-Requires-Dist: asyncio (>=3.4.3,<4.0.0)
-Requires-Dist: httpx (>=0.27.2,<0.28.0)
-Requires-Dist: openai (>=1.54.4,<2.0.0)
-Requires-Dist: pdf2image (>=1.17.0,<2.0.0)
-Requires-Dist: pydantic (>=2.9.2,<3.0.0)
-Description-Content-Type: text/markdown
-# Parallex
-### What it does
-- Converts file into images
-- Makes requests to OpenAI to convert the images to markdown
-  - [Azure OpenAI Batch](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch?tabs=standard-input%2Cpython-secure&pivots=programming-language-python)
-  - [OpenAI Batch](https://platform.openai.com/docs/guides/batch)
-- Post batch processing to do what you wish with the resulting markdown
-# Notes for us as we build
-### Poetry
-- Using [poetry](https://python-poetry.org/docs/) for dependency management
-- add dependency `poetry add pydantic`
-- add dev dependency `poetry add --group dev black`
-- run main script `poetry run python main.py`
-- run dev commands `poetry run black parallex`
-# General behavior
-- parallex takes args to do things with file
-- parallex takes args to specify llm model
-- parallex takes a callable to execute once batch process is "ready"

parallex-0.1.0/README.md DELETED

@@ -1,23 +0,0 @@
-# Parallex
-### What it does
-- Converts file into images
-- Makes requests to OpenAI to convert the images to markdown
-  - [Azure OpenAI Batch](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch?tabs=standard-input%2Cpython-secure&pivots=programming-language-python)
-  - [OpenAI Batch](https://platform.openai.com/docs/guides/batch)
-- Post batch processing to do what you wish with the resulting markdown
-# Notes for us as we build
-### Poetry
-- Using [poetry](https://python-poetry.org/docs/) for dependency management
-- add dependency `poetry add pydantic`
-- add dev dependency `poetry add --group dev black`
-- run main script `poetry run python main.py`
-- run dev commands `poetry run black parallex`
-# General behavior
-- parallex takes args to do things with file
-- parallex takes args to specify llm model
-- parallex takes a callable to execute once batch process is "ready"