PyPI - gemini-ocr-cli - Versions diffs - 0.2.0__tar.gz - Mend

gemini-ocr-cli 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

gemini_ocr_cli-0.2.0/.env.example +16 -0
gemini_ocr_cli-0.2.0/.gitignore +57 -0
gemini_ocr_cli-0.2.0/CHANGELOG.md +47 -0
gemini_ocr_cli-0.2.0/LICENSE +21 -0
gemini_ocr_cli-0.2.0/PKG-INFO +193 -0
gemini_ocr_cli-0.2.0/README.md +155 -0
gemini_ocr_cli-0.2.0/gemini_ocr/__init__.py +8 -0
gemini_ocr_cli-0.2.0/gemini_ocr/__main__.py +6 -0
gemini_ocr_cli-0.2.0/gemini_ocr/cli.py +367 -0
gemini_ocr_cli-0.2.0/gemini_ocr/config.py +106 -0
gemini_ocr_cli-0.2.0/gemini_ocr/processor.py +560 -0
gemini_ocr_cli-0.2.0/gemini_ocr/retry.py +104 -0
gemini_ocr_cli-0.2.0/gemini_ocr/utils.py +193 -0
gemini_ocr_cli-0.2.0/pyproject.toml +82 -0
gemini_ocr_cli-0.2.0/tests/__init__.py +1 -0
gemini_ocr_cli-0.2.0/tests/conftest.py +131 -0
gemini_ocr_cli-0.2.0/tests/test_cli.py +194 -0
gemini_ocr_cli-0.2.0/tests/test_config.py +136 -0
gemini_ocr_cli-0.2.0/tests/test_import.py +27 -0
gemini_ocr_cli-0.2.0/tests/test_integration.py +149 -0
gemini_ocr_cli-0.2.0/tests/test_processor.py +344 -0
gemini_ocr_cli-0.2.0/tests/test_utils.py +217 -0
gemini_ocr_cli-0.2.0/uv.lock +1143 -0

gemini_ocr_cli-0.2.0/.env.example ADDED

@@ -0,0 +1,16 @@
+# Gemini OCR CLI Configuration
+# Required: Your Google Gemini API key
+# Get one at: https://makersuite.google.com/app/apikey
+GEMINI_API_KEY=your-api-key-here
+# Optional: Default model (default: gemini-2.0-flash-exp)
+# Options: gemini-2.0-flash-exp, gemini-1.5-flash, gemini-1.5-pro
+# GEMINI_MODEL=gemini-2.0-flash-exp
+# Optional: PDF rendering DPI (default: 200)
+# Higher = better quality but slower
+# GEMINI_DPI=200
+# Optional: Maximum file size in MB (default: 20)
+# GEMINI_MAX_FILE_SIZE_MB=20

gemini_ocr_cli-0.2.0/.gitignore ADDED

@@ -0,0 +1,57 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual environments
+.venv/
+venv/
+ENV/
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+.tox/
+.nox/
+# Type checking
+.mypy_cache/
+.dmypy.json
+# Environment
+.env
+.env.local
+.env.*.local
+# OS
+.DS_Store
+Thumbs.db
+# Output
+*_output/
+gemini_ocr_output/

gemini_ocr_cli-0.2.0/CHANGELOG.md ADDED

@@ -0,0 +1,47 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.2.0] - 2024-12-23
+### Changed
+- **BREAKING**: Replaced page-by-page PDF processing with native Gemini Files API upload
+  - PDFs are now uploaded directly to Gemini (single API call per document)
+  - Significantly faster processing for multi-page documents
+  - Better quality: native PDF parsing preserves text, tables, and layout
+- Updated default model from `gemini-2.0-flash-exp` to `gemini-3.0-flash`
+- Simplified `OCRResult` dataclass (removed per-page tracking)
+### Added
+- Retry logic with exponential backoff for API rate limits
+- Comprehensive test suite (105 unit tests, integration tests)
+- `token_count` field in `OCRResult` for usage tracking
+### Removed
+- `--dpi` CLI flag (no longer applicable with native PDF upload)
+- `GEMINI_DPI` environment variable
+- Unused utility functions: `image_to_base64`, `pil_image_to_base64`, `save_base64_image`
+- Module-level `settings` singleton from config
+### Fixed
+- API key resolution now correctly prioritizes `GEMINI_API_KEY` over `GOOGLE_API_KEY`
+## [0.1.0] - 2024-12-22
+### Added
+- Initial release
+- PDF and image OCR using Google Gemini vision models
+- CLI commands: `process`, `describe`, `info`
+- Batch processing with progress tracking
+- Incremental processing (skip already-processed files)
+- Markdown output format
+- Figure/chart description generation
+- Support for multiple image formats (JPG, PNG, WEBP, GIF, BMP, TIFF)
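
Note on the 0.2.0 "Retry logic with exponential backoff" entry: `gemini_ocr/retry.py` is not shown in this part of the diff, so the following is only a generic sketch of the technique, not the package's implementation. `RateLimitError`, `backoff_delays`, `call_with_retry`, and all default values are hypothetical names and numbers chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises on a rate limit."""


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delay before each retry: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on RateLimitError, wait and retry up to max_retries times."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # A small random jitter spreads out retries from concurrent clients.
            sleep(delays[attempt] * (1 + random.uniform(0, jitter)))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is a design choice that lets tests substitute a recorder for `time.sleep`, so the retry schedule can be checked without waiting.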

gemini_ocr_cli-0.2.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2024 Ruben Fernandez-Fuertes
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

gemini_ocr_cli-0.2.0/PKG-INFO ADDED

@@ -0,0 +1,193 @@
+Metadata-Version: 2.4
+Name: gemini-ocr-cli
+Version: 0.2.0
+Summary: CLI tool for OCR processing using Google Gemini's vision capabilities
+Project-URL: Homepage, https://github.com/r-uben/gemini-ocr-cli
+Project-URL: Repository, https://github.com/r-uben/gemini-ocr-cli
+Project-URL: Issues, https://github.com/r-uben/gemini-ocr-cli/issues
+Author-email: Ruben Fernandez-Fuertes <fernandezfuertesruben@gmail.com>
+License: MIT
+License-File: LICENSE
+Keywords: cli,document-processing,gemini,google,ocr,pdf,vision
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Text Processing :: General
+Requires-Python: >=3.10
+Requires-Dist: click>=8.1.0
+Requires-Dist: google-genai>=1.0.0
+Requires-Dist: pillow>=10.0.0
+Requires-Dist: pydantic-settings>=2.0.0
+Requires-Dist: pydantic>=2.0.0
+Requires-Dist: pymupdf>=1.24.0
+Requires-Dist: python-dotenv>=1.0.0
+Requires-Dist: rich>=13.0.0
+Provides-Extra: dev
+Requires-Dist: mypy>=1.0.0; extra == 'dev'
+Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
+Requires-Dist: pytest>=8.0.0; extra == 'dev'
+Requires-Dist: ruff>=0.8.0; extra == 'dev'
+Description-Content-Type: text/markdown
+# Gemini OCR CLI
+Command-line tool for OCR processing using Google Gemini's vision capabilities. Extract text, tables, equations, and figures from PDFs and images with high accuracy.
+## Features
+- **Native PDF upload**: Direct PDF processing via Gemini Files API (fast, single API call)
+- **Multi-format support**: PDF and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
+- **High-quality OCR**: Leverages Gemini's advanced vision models
+- **Structure preservation**: Maintains headings, tables, lists, equations
+- **Figure analysis**: Generate detailed descriptions of charts and diagrams
+- **Batch processing**: Process entire directories with progress tracking
+- **Incremental processing**: Skip already-processed files
+- **Automatic retry**: Exponential backoff for API rate limits
+- **Markdown output**: Clean, structured output format
+## Installation
+### From PyPI (recommended)
+```bash
+pip install gemini-ocr-cli
+```
+### Using pipx
+```bash
+pipx install gemini-ocr-cli
+```
+### From source
+```bash
+git clone https://github.com/r-uben/gemini-ocr-cli.git
+cd gemini-ocr-cli
+uv pip install -e .
+```
+## Quick Start
+### API Key Resolution
+The CLI automatically picks up your API key from environment variables (no configuration needed if already set):
+**Priority order:**
+1. `--api-key` CLI argument (highest priority)
+2. `GEMINI_API_KEY` environment variable
+3. `GOOGLE_API_KEY` environment variable (fallback)
+4. `.env` file in current directory
+```bash
+# Option 1: Set environment variable (recommended)
+export GEMINI_API_KEY="your-api-key"
+# Option 2: Use existing GOOGLE_API_KEY (auto-detected)
+export GOOGLE_API_KEY="your-api-key"
+# Option 3: Create a .env file
+echo "GEMINI_API_KEY=your-api-key" > .env
+# Option 4: Pass directly (not recommended for security)
+gemini-ocr paper.pdf --api-key "your-api-key"
+```
+### Process documents
+```bash
+# Single file
+gemini-ocr paper.pdf
+# Directory
+gemini-ocr ./documents/ -o ./results/
+# With custom model
+gemini-ocr paper.pdf --model gemini-1.5-pro
+```
+### Describe figures
+```bash
+# Analyze a chart/diagram
+gemini-ocr describe chart.png
+# Save to file
+gemini-ocr describe figure.jpg -o description.md
+```
+## CLI Reference
+### `gemini-ocr process`
+Process documents and images with OCR.
+```
+Usage: gemini-ocr process [OPTIONS] INPUT_PATH
+Options:
+  -o, --output-dir PATH           Output directory for results
+  --api-key TEXT                  Gemini API key
+  --model TEXT                    Model to use (default: gemini-3.0-flash)
+  --task [convert|extract|table]  OCR task type (default: convert)
+  --prompt TEXT                   Custom prompt for OCR
+  --include-images/--no-images    Extract embedded images (default: True)
+  --save-originals/--no-save-originals
+                                  Save original input images (default: True)
+  --add-timestamp/--no-timestamp  Add timestamp to output folder
+  --reprocess                     Reprocess existing files
+  --env-file PATH                 Path to .env file
+  -v, --verbose                   Enable verbose output
+```
+### `gemini-ocr describe`
+Generate detailed descriptions of figures, charts, and diagrams.
+```
+Usage: gemini-ocr describe [OPTIONS] IMAGE_PATH
+Options:
+  --api-key TEXT    Gemini API key
+  --model TEXT      Model to use
+  -o, --output PATH Output file (default: stdout)
+```
+### `gemini-ocr info`
+Show configuration and system information.
+## Output Format
+Results are saved as Markdown files with:
+- File metadata (original path, processing time)
+- Extracted text (full document)
+- Embedded image references (if enabled)
+- `metadata.json` tracking all processed files
+## Models
+| Model | Speed | Quality | Cost | Recommended For |
+|-------|-------|---------|------|-----------------|
+| `gemini-3.0-flash` | Fast | Good | Low | Default, most documents |
+| `gemini-1.5-flash` | Fast | Good | Low | Simple documents |
+| `gemini-1.5-pro` | Slower | Best | Higher | Complex layouts, equations |
+## Environment Variables
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `GEMINI_API_KEY` | Google Gemini API key | Required |
+| `GOOGLE_API_KEY` | Fallback API key | - |
+| `GEMINI_MODEL` | Default model | `gemini-3.0-flash` |
+## License
+MIT
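
Note on the "API Key Resolution" priority order documented above: it can be read as "first non-empty value wins". The sketch below illustrates that reading only. `resolve_api_key` and its parameters are hypothetical and not taken from `gemini_ocr/config.py`, and it assumes the `.env` file stores the key under `GEMINI_API_KEY`, as in the Quick Start example.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    dotenv_values: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty key in the documented priority order."""
    env = os.environ if env is None else env
    dotenv_values = {} if dotenv_values is None else dotenv_values
    candidates = [
        cli_value,                            # 1. --api-key CLI argument
        env.get("GEMINI_API_KEY"),            # 2. GEMINI_API_KEY environment variable
        env.get("GOOGLE_API_KEY"),            # 3. GOOGLE_API_KEY fallback
        dotenv_values.get("GEMINI_API_KEY"),  # 4. .env file (assumed key name)
    ]
    for value in candidates:
        if value:
            return value
    return None  # nothing configured; the caller decides how to report it
```

For example, `resolve_api_key(None, {"GOOGLE_API_KEY": "x"}, {"GEMINI_API_KEY": "y"})` picks the `GOOGLE_API_KEY` value, because under the documented order the environment outranks the `.env` file.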

gemini_ocr_cli-0.2.0/README.md ADDED

@@ -0,0 +1,155 @@
+# Gemini OCR CLI
+Command-line tool for OCR processing using Google Gemini's vision capabilities. Extract text, tables, equations, and figures from PDFs and images with high accuracy.
+## Features
+- **Native PDF upload**: Direct PDF processing via Gemini Files API (fast, single API call)
+- **Multi-format support**: PDF and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
+- **High-quality OCR**: Leverages Gemini's advanced vision models
+- **Structure preservation**: Maintains headings, tables, lists, equations
+- **Figure analysis**: Generate detailed descriptions of charts and diagrams
+- **Batch processing**: Process entire directories with progress tracking
+- **Incremental processing**: Skip already-processed files
+- **Automatic retry**: Exponential backoff for API rate limits
+- **Markdown output**: Clean, structured output format
+## Installation
+### From PyPI (recommended)
+```bash
+pip install gemini-ocr-cli
+```
+### Using pipx
+```bash
+pipx install gemini-ocr-cli
+```
+### From source
+```bash
+git clone https://github.com/r-uben/gemini-ocr-cli.git
+cd gemini-ocr-cli
+uv pip install -e .
+```
+## Quick Start
+### API Key Resolution
+The CLI automatically picks up your API key from environment variables (no configuration needed if already set):
+**Priority order:**
+1. `--api-key` CLI argument (highest priority)
+2. `GEMINI_API_KEY` environment variable
+3. `GOOGLE_API_KEY` environment variable (fallback)
+4. `.env` file in current directory
+```bash
+# Option 1: Set environment variable (recommended)
+export GEMINI_API_KEY="your-api-key"
+# Option 2: Use existing GOOGLE_API_KEY (auto-detected)
+export GOOGLE_API_KEY="your-api-key"
+# Option 3: Create a .env file
+echo "GEMINI_API_KEY=your-api-key" > .env
+# Option 4: Pass directly (not recommended for security)
+gemini-ocr paper.pdf --api-key "your-api-key"
+```
+### Process documents
+```bash
+# Single file
+gemini-ocr paper.pdf
+# Directory
+gemini-ocr ./documents/ -o ./results/
+# With custom model
+gemini-ocr paper.pdf --model gemini-1.5-pro
+```
+### Describe figures
+```bash
+# Analyze a chart/diagram
+gemini-ocr describe chart.png
+# Save to file
+gemini-ocr describe figure.jpg -o description.md
+```
+## CLI Reference
+### `gemini-ocr process`
+Process documents and images with OCR.
+```
+Usage: gemini-ocr process [OPTIONS] INPUT_PATH
+Options:
+  -o, --output-dir PATH           Output directory for results
+  --api-key TEXT                  Gemini API key
+  --model TEXT                    Model to use (default: gemini-3.0-flash)
+  --task [convert|extract|table]  OCR task type (default: convert)
+  --prompt TEXT                   Custom prompt for OCR
+  --include-images/--no-images    Extract embedded images (default: True)
+  --save-originals/--no-save-originals
+                                  Save original input images (default: True)
+  --add-timestamp/--no-timestamp  Add timestamp to output folder
+  --reprocess                     Reprocess existing files
+  --env-file PATH                 Path to .env file
+  -v, --verbose                   Enable verbose output
+```
+### `gemini-ocr describe`
+Generate detailed descriptions of figures, charts, and diagrams.
+```
+Usage: gemini-ocr describe [OPTIONS] IMAGE_PATH
+Options:
+  --api-key TEXT    Gemini API key
+  --model TEXT      Model to use
+  -o, --output PATH Output file (default: stdout)
+```
+### `gemini-ocr info`
+Show configuration and system information.
+## Output Format
+Results are saved as Markdown files with:
+- File metadata (original path, processing time)
+- Extracted text (full document)
+- Embedded image references (if enabled)
+- `metadata.json` tracking all processed files
+## Models
+| Model | Speed | Quality | Cost | Recommended For |
+|-------|-------|---------|------|-----------------|
+| `gemini-3.0-flash` | Fast | Good | Low | Default, most documents |
+| `gemini-1.5-flash` | Fast | Good | Low | Simple documents |
+| `gemini-1.5-pro` | Slower | Best | Higher | Complex layouts, equations |
+## Environment Variables
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `GEMINI_API_KEY` | Google Gemini API key | Required |
+| `GOOGLE_API_KEY` | Fallback API key | - |
+| `GEMINI_MODEL` | Default model | `gemini-3.0-flash` |
+## License
+MIT
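
Note on "Incremental processing" and the `metadata.json` file listed under Output Format: the README does not describe the file's layout, so the sketch below assumes a hypothetical schema (a top-level `"processed"` list of file names) purely to illustrate how skipping already-processed files could work. `SUPPORTED`, `load_processed`, and `pending_files` are illustrative names, not the package's API.

```python
import json
from pathlib import Path

# Extensions taken from the README's format list; the exact set is an assumption.
SUPPORTED = {".pdf", ".jpg", ".png", ".webp", ".gif", ".bmp", ".tiff"}


def load_processed(metadata_path: Path) -> set[str]:
    """Names of already-processed files; empty if no metadata file exists yet."""
    if not metadata_path.exists():
        return set()
    data = json.loads(metadata_path.read_text(encoding="utf-8"))
    return set(data.get("processed", []))  # "processed" is an assumed key


def pending_files(input_dir: Path, metadata_path: Path, reprocess: bool = False) -> list[Path]:
    """Supported files in input_dir that still need OCR, sorted by path."""
    # With reprocess=True (mirroring the --reprocess flag) nothing is skipped.
    done = set() if reprocess else load_processed(metadata_path)
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED and p.name not in done
    )
```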

gemini_ocr_cli-0.2.0/gemini_ocr/__init__.py ADDED

@@ -0,0 +1,8 @@
+"""Gemini OCR CLI - Document processing using Google Gemini's vision capabilities."""
+__version__ = "0.2.0"
+from gemini_ocr.processor import OCRProcessor
+from gemini_ocr.config import Config
+__all__ = ["OCRProcessor", "Config", "__version__"]

gemini_ocr_cli-0.2.0/gemini_ocr/__main__.py ADDED

@@ -0,0 +1,6 @@
+"""Allow running as python -m gemini_ocr."""
+from gemini_ocr.cli import main
+if __name__ == "__main__":
+    main()