parallex 0.1.0__tar.gz → 0.1.1__tar.gz

@@ -0,0 +1,93 @@
+ Metadata-Version: 2.1
+ Name: parallex
+ Version: 0.1.1
+ Summary:
+ Author: Jeff Hostetler
+ Author-email: jeff@summed.ai
+ Requires-Python: >=3.12,<4.0
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Requires-Dist: aiologger (>=0.7.0,<0.8.0)
+ Requires-Dist: asyncio (>=3.4.3,<4.0.0)
+ Requires-Dist: httpx (>=0.27.2,<0.28.0)
+ Requires-Dist: openai (>=1.54.4,<2.0.0)
+ Requires-Dist: pdf2image (>=1.17.0,<2.0.0)
+ Requires-Dist: pydantic (>=2.9.2,<3.0.0)
+ Description-Content-Type: text/markdown
+
+ # Parallex
+
+ ### What it does
+ - Converts a PDF into images
+ - Makes requests to Azure OpenAI to convert the images to markdown using the Batch API
+ - [Azure OpenAI Batch](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch?tabs=standard-input%2Cpython-secure&pivots=programming-language-python)
+ - [OpenAI Batch](https://platform.openai.com/docs/guides/batch)
+ - Polls for batch completion, then converts the AI responses into structured output keyed to the corresponding PDF page
+ - Supports post-batch processing, so you can do what you wish with the resulting markdown
+
+ ### Requirements
+ Parallex uses `graphicsmagick` for the conversion of PDF to images.
+ ```bash
+ brew install graphicsmagick
+ ```
+
+
+ ### Example usage
+
+ ```python
+ import os
+ from parallex.models.parallex_callable_output import ParallexCallableOutput
+ from parallex.parallex import parallex
+
+ os.environ["AZURE_OPENAI_API_KEY"] = "key"
+ os.environ["AZURE_OPENAI_ENDPOINT"] = "your-endpoint.com"
+ os.environ["AZURE_OPENAI_API_VERSION"] = "deployment_version"
+ os.environ["AZURE_OPENAI_API_DEPLOYMENT"] = "deployment_name"
+
+ model = "gpt-4o"
+
+ async def some_operation(file_url: str) -> None:
+     response_data: ParallexCallableOutput = await parallex(
+         model=model,
+         pdf_source_url=file_url,
+         post_process_callable=example_post_process, # Optional
+         concurrency=2, # Optional
+         prompt_text="Turn images into markdown", # Optional
+         log_level="ERROR" # Optional
+     )
+     pages = response_data.pages
+
+ def example_post_process(output: ParallexCallableOutput) -> None:
+     file_name = output.file_name
+     pages = output.pages
+     for page in pages:
+         markdown_for_page = page.output_content
+         pdf_page_number = page.page_number
+
+ ```
+
+ Responses have the following structure:
+ ```python
+ class ParallexCallableOutput(BaseModel):
+     file_name: str = Field(description="Name of file that is processed")
+     pdf_source_url: str = Field(description="Given URL of the source of output")
+     trace_id: UUID = Field(description="Unique trace for each file")
+     pages: list[PageResponse] = Field(description="List of PageResponse objects")
+
+ class PageResponse(BaseModel):
+     output_content: str = Field(description="Markdown generated for the page")
+     page_number: int = Field(description="Page number of the associated PDF")
+ ```
+
+ ### Default prompt is
+ ```python
+ """
+ Convert the following PDF page to markdown.
+ Return only the markdown with no explanation text.
+ Leave out any page numbers and redundant headers or footers.
+ Do not include any code blocks (e.g. "```markdown" or "```") in the response.
+ If unable to parse, return an empty string.
+ """
+ ```
+
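Reviewer note (not part of the package): the new README says parallex "polls for batch completion", but the diff does not show how. The sketch below is one common shape for such a loop, under stated assumptions: `fetch_status` is a hypothetical coroutine standing in for a batch status request, and `poll_until_done`, `TERMINAL_STATES`, and the interval values are illustrative names and numbers, not taken from parallex.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed terminal states for illustration; a real batch API defines its own.
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[str]],
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
) -> str:
    """Call fetch_status repeatedly until it reports a terminal state."""
    for _ in range(max_attempts):
        status = await fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Yield to the event loop between polls instead of busy-waiting.
        await asyncio.sleep(interval_seconds)
    raise TimeoutError(f"batch not finished after {max_attempts} polls")


def make_fake_status_source(statuses: list[str]) -> Callable[[], Awaitable[str]]:
    """Hypothetical stand-in for a real batch status request."""
    remaining = iter(statuses)

    async def fetch_status() -> str:
        return next(remaining)

    return fetch_status


final_status = asyncio.run(
    poll_until_done(
        make_fake_status_source(["validating", "in_progress", "completed"]),
        interval_seconds=0.01,
    )
)
```

Capping the number of attempts keeps a stuck batch from blocking the caller forever; whether parallex applies such a cap is not visible in this diff.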
@@ -0,0 +1,74 @@
+ # Parallex
+
+ ### What it does
+ - Converts a PDF into images
+ - Makes requests to Azure OpenAI to convert the images to markdown using the Batch API
+ - [Azure OpenAI Batch](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch?tabs=standard-input%2Cpython-secure&pivots=programming-language-python)
+ - [OpenAI Batch](https://platform.openai.com/docs/guides/batch)
+ - Polls for batch completion, then converts the AI responses into structured output keyed to the corresponding PDF page
+ - Supports post-batch processing, so you can do what you wish with the resulting markdown
+
+ ### Requirements
+ Parallex uses `graphicsmagick` for the conversion of PDF to images.
+ ```bash
+ brew install graphicsmagick
+ ```
+
+
+ ### Example usage
+
+ ```python
+ import os
+ from parallex.models.parallex_callable_output import ParallexCallableOutput
+ from parallex.parallex import parallex
+
+ os.environ["AZURE_OPENAI_API_KEY"] = "key"
+ os.environ["AZURE_OPENAI_ENDPOINT"] = "your-endpoint.com"
+ os.environ["AZURE_OPENAI_API_VERSION"] = "deployment_version"
+ os.environ["AZURE_OPENAI_API_DEPLOYMENT"] = "deployment_name"
+
+ model = "gpt-4o"
+
+ async def some_operation(file_url: str) -> None:
+     response_data: ParallexCallableOutput = await parallex(
+         model=model,
+         pdf_source_url=file_url,
+         post_process_callable=example_post_process, # Optional
+         concurrency=2, # Optional
+         prompt_text="Turn images into markdown", # Optional
+         log_level="ERROR" # Optional
+     )
+     pages = response_data.pages
+
+ def example_post_process(output: ParallexCallableOutput) -> None:
+     file_name = output.file_name
+     pages = output.pages
+     for page in pages:
+         markdown_for_page = page.output_content
+         pdf_page_number = page.page_number
+
+ ```
+
+ Responses have the following structure:
+ ```python
+ class ParallexCallableOutput(BaseModel):
+     file_name: str = Field(description="Name of file that is processed")
+     pdf_source_url: str = Field(description="Given URL of the source of output")
+     trace_id: UUID = Field(description="Unique trace for each file")
+     pages: list[PageResponse] = Field(description="List of PageResponse objects")
+
+ class PageResponse(BaseModel):
+     output_content: str = Field(description="Markdown generated for the page")
+     page_number: int = Field(description="Page number of the associated PDF")
+ ```
+
+ ### Default prompt is
+ ```python
+ """
+ Convert the following PDF page to markdown.
+ Return only the markdown with no explanation text.
+ Leave out any page numbers and redundant headers or footers.
+ Do not include any code blocks (e.g. "```markdown" or "```") in the response.
+ If unable to parse, return an empty string.
+ """
+ ```
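Reviewer note (not part of the package): the README's `example_post_process` only reads fields and leaves the "do what you wish" part open. As a rough illustration of what such a step could do, the sketch below merges page markdown in page order and skips empty pages, which the default prompt returns when a page cannot be parsed. The dataclasses are simplified stand-ins for the pydantic models so the snippet runs without parallex installed; `join_pages` and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


# Simplified stand-ins for the pydantic models shown in the README.
@dataclass
class PageResponse:
    output_content: str
    page_number: int


@dataclass
class ParallexCallableOutput:
    file_name: str
    pdf_source_url: str
    trace_id: UUID
    pages: list[PageResponse] = field(default_factory=list)


def join_pages(output: ParallexCallableOutput) -> str:
    """Hypothetical post-process step: merge page markdown in page order."""
    ordered = sorted(output.pages, key=lambda page: page.page_number)
    # Empty strings mark pages the model could not parse, so drop them.
    return "\n\n".join(page.output_content for page in ordered if page.output_content)


sample = ParallexCallableOutput(
    file_name="report.pdf",
    pdf_source_url="https://example.com/report.pdf",
    trace_id=uuid4(),
    pages=[
        PageResponse(output_content="## Results", page_number=2),
        PageResponse(output_content="", page_number=3),
        PageResponse(output_content="# Title", page_number=1),
    ],
)
document = join_pages(sample)
```

A callable passed as `post_process_callable` is annotated as returning `None`, so in practice a helper like this would write `document` to a file or a database rather than return it.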
@@ -21,9 +21,9 @@ async def parallex(
      model: str,
      pdf_source_url: str,
      post_process_callable: Optional[Callable[..., None]] = None,
-     concurrency: int = 20,
-     prompt_text: str = DEFAULT_PROMPT,
-     log_level: str = "ERROR",
+     concurrency: Optional[int] = 20,
+     prompt_text: Optional[str] = DEFAULT_PROMPT,
+     log_level: Optional[str] = "ERROR",
  ) -> ParallexCallableOutput:
      setup_logger(log_level)
      with tempfile.TemporaryDirectory() as temp_directory:
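Reviewer note (not part of the package): widening the annotations to `Optional[...]` means a caller may now pass `None` explicitly, and `None` is not the same as omitting the argument, since the default only applies when the argument is left out. The hunk does not show whether the function body handles `None`. A minimal sketch of one way to normalize the values, assuming a hypothetical helper `resolve_options` and an abbreviated stand-in for `DEFAULT_PROMPT`:

```python
from typing import Optional

# Only 20 and "ERROR" appear in the diff; the prompt text is abbreviated here.
DEFAULT_CONCURRENCY = 20
DEFAULT_LOG_LEVEL = "ERROR"
DEFAULT_PROMPT = "Convert the following PDF page to markdown."


def resolve_options(
    concurrency: Optional[int] = DEFAULT_CONCURRENCY,
    prompt_text: Optional[str] = DEFAULT_PROMPT,
    log_level: Optional[str] = DEFAULT_LOG_LEVEL,
) -> tuple[int, str, str]:
    """Hypothetical helper: treat an explicit None like an omitted argument."""
    return (
        DEFAULT_CONCURRENCY if concurrency is None else concurrency,
        DEFAULT_PROMPT if prompt_text is None else prompt_text,
        DEFAULT_LOG_LEVEL if log_level is None else log_level,
    )
```

With a step like this, `resolve_options(concurrency=None)` and `resolve_options()` yield the same values, which is usually what a caller forwarding optional settings expects.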
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "parallex"
- version = "0.1.0"
+ version = "0.1.1"
  description = ""
  authors = ["Jeff Hostetler <jeff@summed.ai>", "Kevin Bao <kevin@summed.ai>"]
  readme = "README.md"
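Reviewer note (not part of the package): the version bump here is what surfaces as `Version: 0.1.1` in the generated PKG-INFO. PKG-INFO uses an email-header style layout, so the standard library's `email` parser can read it. The sketch below checks the version and dependency pins on an abbreviated copy of the metadata from this diff; the variable names are illustrative.

```python
from email import message_from_string

# Abbreviated copy of the 0.1.1 PKG-INFO header shown in this diff.
PKG_INFO = """\
Metadata-Version: 2.1
Name: parallex
Version: 0.1.1
Requires-Python: >=3.12,<4.0
Requires-Dist: httpx (>=0.27.2,<0.28.0)
Requires-Dist: pydantic (>=2.9.2,<3.0.0)
Description-Content-Type: text/markdown

# Parallex
"""

metadata = message_from_string(PKG_INFO)
version = metadata["Version"]
# Repeated fields such as Requires-Dist come back as a list.
requirements = metadata.get_all("Requires-Dist")
# Everything after the first blank line is the long description (the README).
description = metadata.get_payload()
```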
parallex-0.1.0/PKG-INFO DELETED
@@ -1,42 +0,0 @@
- Metadata-Version: 2.1
- Name: parallex
- Version: 0.1.0
- Summary:
- Author: Jeff Hostetler
- Author-email: jeff@summed.ai
- Requires-Python: >=3.12,<4.0
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.12
- Classifier: Programming Language :: Python :: 3.13
- Requires-Dist: aiologger (>=0.7.0,<0.8.0)
- Requires-Dist: asyncio (>=3.4.3,<4.0.0)
- Requires-Dist: httpx (>=0.27.2,<0.28.0)
- Requires-Dist: openai (>=1.54.4,<2.0.0)
- Requires-Dist: pdf2image (>=1.17.0,<2.0.0)
- Requires-Dist: pydantic (>=2.9.2,<3.0.0)
- Description-Content-Type: text/markdown
-
- # Parallex
-
- ### What it does
- - Converts file into images
- - Makes requests to OpenAI to convert the images to markdown
- - [Azure OpenAI Batch](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch?tabs=standard-input%2Cpython-secure&pivots=programming-language-python)
- - [OpenAI Batch](https://platform.openai.com/docs/guides/batch)
- - Post batch processing to do what you wish with the resulting markdown
-
-
- # Notes for us as we build
- ### Poetry
- - Using [poetry](https://python-poetry.org/docs/) for dependency management
- - add dependency `poetry add pydantic`
- - add dev dependency `poetry add --group dev black`
- - run main script `poetry run python main.py`
- - run dev commands `poetry run black parallex`
-
-
- # General behavior
- - parallex takes args to do things with file
- - parallex takes args to specify llm model
- - parallex takes a callable to execute once batch process is "ready"
-
parallex-0.1.0/README.md DELETED
@@ -1,23 +0,0 @@
- # Parallex
-
- ### What it does
- - Converts file into images
- - Makes requests to OpenAI to convert the images to markdown
- - [Azure OpenAI Batch](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch?tabs=standard-input%2Cpython-secure&pivots=programming-language-python)
- - [OpenAI Batch](https://platform.openai.com/docs/guides/batch)
- - Post batch processing to do what you wish with the resulting markdown
-
-
- # Notes for us as we build
- ### Poetry
- - Using [poetry](https://python-poetry.org/docs/) for dependency management
- - add dependency `poetry add pydantic`
- - add dev dependency `poetry add --group dev black`
- - run main script `poetry run python main.py`
- - run dev commands `poetry run black parallex`
-
-
- # General behavior
- - parallex takes args to do things with file
- - parallex takes args to specify llm model
- - parallex takes a callable to execute once batch process is "ready"
File without changes
File without changes