mistral-ai-ocr 1.0__tar.gz → 1.2__tar.gz

@@ -1,10 +1,46 @@
  Metadata-Version: 2.1
  Name: mistral-ai-ocr
- Version: 1.0
+ Version: 1.2
  Description-Content-Type: text/markdown
 
  # Mistral AI OCR
- This is a simple script that uses the Mistral AI OCR API to extract text from a PDF or image file
+ This is a simple script that uses the Mistral AI OCR API to get the Markdown text from a PDF or image file
+
+ # Usage
+
+ ## Install the Requirements
+
+ To install the necessary requirements, run the following command:
+
+ ```sh
+ pip install mistral-ai-ocr
+ ```
+
+ ## Typical Usage
+
+ ```sh
+ mistral-ai-ocr paper.pdf
+ mistral-ai-ocr paper.pdf --api-key jrWjJE5lFketfB2sA6vvhQK2SoHQ6R39
+ mistral-ai-ocr paper.pdf -o revision
+ mistral-ai-ocr paper.pdf -e
+ mistral-ai-ocr paper.pdf -m FULL
+ mistral-ai-ocr page74.jpg -e
+ mistral-ai-ocr -j paper.json
+ mistral-ai-ocr -j paper.json -m TEXT_NO_PAGES -n
+ ```
+
+ ## Arguments
+
+ | Argument || Description |
+ |-|-|-|
+ | | | input PDF or image file |
+ | -k API_KEY | --api-key API_KEY | Mistral API key, can be set via the **MISTRAL_API_KEY** environment variable |
+ | -o OUTPUT | --output OUTPUT | output directory path. If not set, a directory will be created in the current working directory using the same stem (filename without extension) as the input file |
+ | -j JSON_OCR_RESPONSE | --json-ocr-response JSON_OCR_RESPONSE | path from which to load a pre-existing JSON OCR response (any input file will be ignored) |
+ | -m MODE | --mode MODE | mode of operation: either the name or numerical value of the mode. _Defaults to FULL_NO_PAGES_ |
+ | -s PAGE_SEPARATOR | --page-separator PAGE_SEPARATOR | page separator to use when writing the Markdown file. _Defaults to `\n`_ |
+ | -n | --no-json | do not write the JSON OCR response to a file. By default, the response is written |
+ | -e | --load-dot-env | load the .env file from the current directory using [`python-dotenv`](https://pypi.org/project/python-dotenv/), to retrieve the Mistral API key |
 
  ## Modes
 
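The Arguments table in the hunk above maps naturally onto an `argparse` parser. The following is only a minimal sketch of what such a parser might look like, not the package's actual code: the function name `build_parser`, the help strings, and the choice to make the positional input optional (so that `-j` can be used on its own, as in the usage examples) are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the Arguments table (not the package's real code)."""
    parser = argparse.ArgumentParser(prog="mistral-ai-ocr")
    # Assumption: the input is optional because -j can replace it.
    parser.add_argument("input", nargs="?", help="input PDF or image file")
    parser.add_argument("-k", "--api-key", help="Mistral API key")
    parser.add_argument("-o", "--output", help="output directory path")
    parser.add_argument("-j", "--json-ocr-response",
                        help="path to a pre-existing JSON OCR response")
    # Defaults taken from the table: FULL_NO_PAGES and a newline separator.
    parser.add_argument("-m", "--mode", default="FULL_NO_PAGES",
                        help="mode name or numerical value")
    parser.add_argument("-s", "--page-separator", default="\n",
                        help="page separator for the Markdown file")
    parser.add_argument("-n", "--no-json", action="store_true",
                        help="do not write the JSON OCR response")
    parser.add_argument("-e", "--load-dot-env", action="store_true",
                        help="load .env from the current directory")
    return parser


# Equivalent to: mistral-ai-ocr -j paper.json -m TEXT_NO_PAGES -n
args = build_parser().parse_args(["-j", "paper.json", "-m", "TEXT_NO_PAGES", "-n"])
print(args.mode, args.no_json, args.input)
```

With this layout, `args.input` is `None` when only `-j` is given, which matches the table's note that any input file is ignored in that case.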
@@ -106,42 +142,6 @@ paper
 
  By default, the JSON response from the Mistral AI OCR API is saved in the output directory. To disable JSON output, use the `-n` or `--no-json` argument. To experiment with a different **mode** without using additional API calls, reuse an existing JSON response instead of the original input file
 
- # Usage
-
- ## Install the Requirements
-
- To install the necessary requirements, run the following command:
-
- ```sh
- pip install mistral-ai-ocr
- ```
-
- ## Typical Usage
-
- ```sh
- mistral-ai-ocr paper.pdf
- mistral-ai-ocr paper.pdf --api-key jrWjJE5lFketfB2sA6vvhQK2SoHQ6R39
- mistral-ai-ocr paper.pdf -o revision
- mistral-ai-ocr paper.pdf -e
- mistral-ai-ocr paper.pdf -m FULL
- mistral-ai-ocr page74.jpg -e
- mistral-ai-ocr -j paper.json
- mistral-ai-ocr -j paper.json -m TEXT_NO_PAGES -n
- ```
-
- ## Arguments
-
- | Argument || Description |
- |-|-|-|
- | | | input PDF or image file |
- | -k API_KEY | --api-key API_KEY | Mistral API key, can be set via the **MISTRAL_API_KEY** environment variable |
- | -o OUTPUT | --output OUTPUT | output directory path. If not set, a directory will be created in the current working directory using the same stem (filename without extension) as the input file |
- | -j JSON_OCR_RESPONSE | --json-ocr-response JSON_OCR_RESPONSE | path from which to load a pre-existing JSON OCR response (any input file will be ignored) |
- | -m MODE | --mode MODE | mode of operation: either the name or numerical value of the mode. _Defaults to FULL_NO_PAGES_ |
- | -s PAGE_SEPARATOR | --page-separator PAGE_SEPARATOR | page separator to use when writing the Markdown file. _Defaults to `\n`_ |
- | -n | --no-json | do not write the JSON OCR response to a file. By default, the response is written |
- | -e | --load-dot-env | load the .env file from the current directory using [`python-dotenv`](https://pypi.org/project/python-dotenv/), to retrieve the Mistral API key |
-
  ### Mistral AI API Key
 
  To obtain an API key, you need a [Mistral AI](https://auth.mistral.ai/ui/registration) account. Then visit [https://admin.mistral.ai/organization/api-keys](https://admin.mistral.ai/organization/api-keys) and click the **Create new key** button
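The `-o` row of the Arguments table says that, when no output path is given, a directory named after the input file's stem is created in the current working directory. A minimal sketch of that rule using `pathlib` could look like the following; the helper name `default_output_dir` is hypothetical and the package may compute the path differently.

```python
from pathlib import Path


def default_output_dir(input_path: str, cwd: Path) -> Path:
    """Hypothetical helper: a directory under cwd named after the input file's stem."""
    # Path.stem drops the last extension: "paper.pdf" -> "paper".
    return cwd / Path(input_path).stem


# The parent directory of the input is discarded; only the stem is kept.
print(default_output_dir("scans/page74.jpg", Path.cwd()).name)
```

Under these assumptions, `mistral-ai-ocr paper.pdf` would write into `./paper`, while `-o revision` would override that choice.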
@@ -1,5 +1,41 @@
  # Mistral AI OCR
- This is a simple script that uses the Mistral AI OCR API to extract text from a PDF or image file
+ This is a simple script that uses the Mistral AI OCR API to get the Markdown text from a PDF or image file
+
+ # Usage
+
+ ## Install the Requirements
+
+ To install the necessary requirements, run the following command:
+
+ ```sh
+ pip install mistral-ai-ocr
+ ```
+
+ ## Typical Usage
+
+ ```sh
+ mistral-ai-ocr paper.pdf
+ mistral-ai-ocr paper.pdf --api-key jrWjJE5lFketfB2sA6vvhQK2SoHQ6R39
+ mistral-ai-ocr paper.pdf -o revision
+ mistral-ai-ocr paper.pdf -e
+ mistral-ai-ocr paper.pdf -m FULL
+ mistral-ai-ocr page74.jpg -e
+ mistral-ai-ocr -j paper.json
+ mistral-ai-ocr -j paper.json -m TEXT_NO_PAGES -n
+ ```
+
+ ## Arguments
+
+ | Argument || Description |
+ |-|-|-|
+ | | | input PDF or image file |
+ | -k API_KEY | --api-key API_KEY | Mistral API key, can be set via the **MISTRAL_API_KEY** environment variable |
+ | -o OUTPUT | --output OUTPUT | output directory path. If not set, a directory will be created in the current working directory using the same stem (filename without extension) as the input file |
+ | -j JSON_OCR_RESPONSE | --json-ocr-response JSON_OCR_RESPONSE | path from which to load a pre-existing JSON OCR response (any input file will be ignored) |
+ | -m MODE | --mode MODE | mode of operation: either the name or numerical value of the mode. _Defaults to FULL_NO_PAGES_ |
+ | -s PAGE_SEPARATOR | --page-separator PAGE_SEPARATOR | page separator to use when writing the Markdown file. _Defaults to `\n`_ |
+ | -n | --no-json | do not write the JSON OCR response to a file. By default, the response is written |
+ | -e | --load-dot-env | load the .env file from the current directory using [`python-dotenv`](https://pypi.org/project/python-dotenv/), to retrieve the Mistral API key |
 
  ## Modes
 
@@ -101,42 +137,6 @@ paper
 
  By default, the JSON response from the Mistral AI OCR API is saved in the output directory. To disable JSON output, use the `-n` or `--no-json` argument. To experiment with a different **mode** without using additional API calls, reuse an existing JSON response instead of the original input file
 
- # Usage
-
- ## Install the Requirements
-
- To install the necessary requirements, run the following command:
-
- ```sh
- pip install mistral-ai-ocr
- ```
-
- ## Typical Usage
-
- ```sh
- mistral-ai-ocr paper.pdf
- mistral-ai-ocr paper.pdf --api-key jrWjJE5lFketfB2sA6vvhQK2SoHQ6R39
- mistral-ai-ocr paper.pdf -o revision
- mistral-ai-ocr paper.pdf -e
- mistral-ai-ocr paper.pdf -m FULL
- mistral-ai-ocr page74.jpg -e
- mistral-ai-ocr -j paper.json
- mistral-ai-ocr -j paper.json -m TEXT_NO_PAGES -n
- ```
-
- ## Arguments
-
- | Argument || Description |
- |-|-|-|
- | | | input PDF or image file |
- | -k API_KEY | --api-key API_KEY | Mistral API key, can be set via the **MISTRAL_API_KEY** environment variable |
- | -o OUTPUT | --output OUTPUT | output directory path. If not set, a directory will be created in the current working directory using the same stem (filename without extension) as the input file |
- | -j JSON_OCR_RESPONSE | --json-ocr-response JSON_OCR_RESPONSE | path from which to load a pre-existing JSON OCR response (any input file will be ignored) |
- | -m MODE | --mode MODE | mode of operation: either the name or numerical value of the mode. _Defaults to FULL_NO_PAGES_ |
- | -s PAGE_SEPARATOR | --page-separator PAGE_SEPARATOR | page separator to use when writing the Markdown file. _Defaults to `\n`_ |
- | -n | --no-json | do not write the JSON OCR response to a file. By default, the response is written |
- | -e | --load-dot-env | load the .env file from the current directory using [`python-dotenv`](https://pypi.org/project/python-dotenv/), to retrieve the Mistral API key |
-
  ### Mistral AI API Key
 
  To obtain an API key, you need a [Mistral AI](https://auth.mistral.ai/ui/registration) account. Then visit [https://admin.mistral.ai/organization/api-keys](https://admin.mistral.ai/organization/api-keys) and click the **Create new key** button
@@ -49,6 +49,7 @@ def main():
 
      if args.load_dot_env:
          load_dotenv()
+         load_dotenv(".env")
 
      if args.api_key is None:
          args.api_key = getenv("MISTRAL_API_KEY")
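The hunk above adds an explicit `load_dotenv(".env")` next to the existing `load_dotenv()` call. A plausible reading, based on python-dotenv's documented defaults, is that the no-argument call searches for a `.env` file starting from the calling script's directory, whereas the explicit relative path is resolved against the current working directory, which is what the README describes for `-e`. The key lookup that follows can be sketched with the standard library alone. The function name `resolve_api_key` is hypothetical, and the sketch assumes any `.env` loading has already populated the environment.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_key: Optional[str],
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper mirroring the fallback shown in the hunk above."""
    # An explicit --api-key value always wins.
    if cli_key is not None:
        return cli_key
    # Otherwise fall back to MISTRAL_API_KEY, which a prior .env load may have set.
    return environ.get("MISTRAL_API_KEY")


# Passing a plain dict instead of os.environ keeps the example deterministic.
print(resolve_api_key(None, {"MISTRAL_API_KEY": "from-env"}))
```

Taking the environment as a parameter is only a convenience for testing; the code in the diff reads it through `getenv` directly.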
@@ -1,10 +1,46 @@
  Metadata-Version: 2.1
  Name: mistral-ai-ocr
- Version: 1.0
+ Version: 1.2
  Description-Content-Type: text/markdown
 
  # Mistral AI OCR
- This is a simple script that uses the Mistral AI OCR API to extract text from a PDF or image file
+ This is a simple script that uses the Mistral AI OCR API to get the Markdown text from a PDF or image file
+
+ # Usage
+
+ ## Install the Requirements
+
+ To install the necessary requirements, run the following command:
+
+ ```sh
+ pip install mistral-ai-ocr
+ ```
+
+ ## Typical Usage
+
+ ```sh
+ mistral-ai-ocr paper.pdf
+ mistral-ai-ocr paper.pdf --api-key jrWjJE5lFketfB2sA6vvhQK2SoHQ6R39
+ mistral-ai-ocr paper.pdf -o revision
+ mistral-ai-ocr paper.pdf -e
+ mistral-ai-ocr paper.pdf -m FULL
+ mistral-ai-ocr page74.jpg -e
+ mistral-ai-ocr -j paper.json
+ mistral-ai-ocr -j paper.json -m TEXT_NO_PAGES -n
+ ```
+
+ ## Arguments
+
+ | Argument || Description |
+ |-|-|-|
+ | | | input PDF or image file |
+ | -k API_KEY | --api-key API_KEY | Mistral API key, can be set via the **MISTRAL_API_KEY** environment variable |
+ | -o OUTPUT | --output OUTPUT | output directory path. If not set, a directory will be created in the current working directory using the same stem (filename without extension) as the input file |
+ | -j JSON_OCR_RESPONSE | --json-ocr-response JSON_OCR_RESPONSE | path from which to load a pre-existing JSON OCR response (any input file will be ignored) |
+ | -m MODE | --mode MODE | mode of operation: either the name or numerical value of the mode. _Defaults to FULL_NO_PAGES_ |
+ | -s PAGE_SEPARATOR | --page-separator PAGE_SEPARATOR | page separator to use when writing the Markdown file. _Defaults to `\n`_ |
+ | -n | --no-json | do not write the JSON OCR response to a file. By default, the response is written |
+ | -e | --load-dot-env | load the .env file from the current directory using [`python-dotenv`](https://pypi.org/project/python-dotenv/), to retrieve the Mistral API key |
 
  ## Modes
 
@@ -106,42 +142,6 @@ paper
 
  By default, the JSON response from the Mistral AI OCR API is saved in the output directory. To disable JSON output, use the `-n` or `--no-json` argument. To experiment with a different **mode** without using additional API calls, reuse an existing JSON response instead of the original input file
 
- # Usage
-
- ## Install the Requirements
-
- To install the necessary requirements, run the following command:
-
- ```sh
- pip install mistral-ai-ocr
- ```
-
- ## Typical Usage
-
- ```sh
- mistral-ai-ocr paper.pdf
- mistral-ai-ocr paper.pdf --api-key jrWjJE5lFketfB2sA6vvhQK2SoHQ6R39
- mistral-ai-ocr paper.pdf -o revision
- mistral-ai-ocr paper.pdf -e
- mistral-ai-ocr paper.pdf -m FULL
- mistral-ai-ocr page74.jpg -e
- mistral-ai-ocr -j paper.json
- mistral-ai-ocr -j paper.json -m TEXT_NO_PAGES -n
- ```
-
- ## Arguments
-
- | Argument || Description |
- |-|-|-|
- | | | input PDF or image file |
- | -k API_KEY | --api-key API_KEY | Mistral API key, can be set via the **MISTRAL_API_KEY** environment variable |
- | -o OUTPUT | --output OUTPUT | output directory path. If not set, a directory will be created in the current working directory using the same stem (filename without extension) as the input file |
- | -j JSON_OCR_RESPONSE | --json-ocr-response JSON_OCR_RESPONSE | path from which to load a pre-existing JSON OCR response (any input file will be ignored) |
- | -m MODE | --mode MODE | mode of operation: either the name or numerical value of the mode. _Defaults to FULL_NO_PAGES_ |
- | -s PAGE_SEPARATOR | --page-separator PAGE_SEPARATOR | page separator to use when writing the Markdown file. _Defaults to `\n`_ |
- | -n | --no-json | do not write the JSON OCR response to a file. By default, the response is written |
- | -e | --load-dot-env | load the .env file from the current directory using [`python-dotenv`](https://pypi.org/project/python-dotenv/), to retrieve the Mistral API key |
-
  ### Mistral AI API Key
 
  To obtain an API key, you need a [Mistral AI](https://auth.mistral.ai/ui/registration) account. Then visit [https://admin.mistral.ai/organization/api-keys](https://admin.mistral.ai/organization/api-keys) and click the **Create new key** button
@@ -6,7 +6,7 @@ with open("README.md", "r", encoding="utf-8") as fh:
 
  setup(
      name="mistral-ai-ocr",
-     version="1.0",
+     version="1.2",
      packages=find_packages(),
      entry_points={
          'console_scripts': [