PyPI - owlbear - Versions diffs - 0.2.0__tar.gz - Mend

owlbear 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

owlbear-0.2.0/.claude/settings.local.json +9 -0
owlbear-0.2.0/.gitignore +31 -0
owlbear-0.2.0/CLAUDE.md +26 -0
owlbear-0.2.0/PKG-INFO +436 -0
owlbear-0.2.0/README.md +397 -0
owlbear-0.2.0/owlbear.png +0 -0
owlbear-0.2.0/pyproject.toml +70 -0
owlbear-0.2.0/src/owlbear/__init__.py +7 -0
owlbear-0.2.0/src/owlbear/athena.py +302 -0
owlbear-0.2.0/src/owlbear/client.py +0 -0
owlbear-0.2.0/src/owlbear/trino.py +121 -0
owlbear-0.2.0/src/owlbear/types.py +98 -0
owlbear-0.2.0/tests/__init__.py +0 -0
owlbear-0.2.0/tests/conftest.py +0 -0
owlbear-0.2.0/tests/test_athena.py +490 -0
owlbear-0.2.0/tests/test_client.py +0 -0
owlbear-0.2.0/tests/test_trino.py +229 -0
owlbear-0.2.0/tests/test_types.py +127 -0
owlbear-0.2.0/wolpertinger.jpg +0 -0

owlbear-0.2.0/.claude/settings.local.json ADDED

@@ -0,0 +1,9 @@
+{
+  "permissions": {
+    "allow": [
+      "WebFetch(domain:github.com)",
+      "WebFetch(domain:boto3.amazonaws.com)",
+      "WebFetch(domain:arrow.apache.org)"
+    ]
+  }
+}

owlbear-0.2.0/.gitignore ADDED

@@ -0,0 +1,31 @@
+# These are some examples of commonly ignored file patterns.
+# You should customize this list as applicable to your project.
+# Learn more about .gitignore:
+#     https://www.atlassian.com/git/tutorials/saving-changes/gitignore
+# Python specific
+venv/
+__pycache__/
+*.py[cod]
+*.pyc
+.pytest_cache/
+# direnv
+.direnv/
+.envrc
+# Build artifacts
+dist/
+build/
+*.egg-info/
+# IDE
+.idea/
+.vscode/
+# OS generated files
+.DS_Store
+Thumbs.db
+# Log files
+*.log
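In the ignore list above, `*.py[cod]` uses a bracket character class, so it already matches `.pyc` files and the separate `*.pyc` line is redundant. Python's `fnmatch` module implements similar shell-style glob matching and can be used to sanity-check file-name patterns like these. It is only an approximation of gitignore semantics: it has no notion of directory-only entries (`venv/`), negation (`!`), or anchoring, and its `*` also matches `/`.

```python
from fnmatch import fnmatchcase
from typing import List

# File-name patterns copied from the .gitignore above (directory patterns omitted,
# since fnmatch has no notion of "directory only" entries like `venv/`).
PATTERNS = ["*.py[cod]", "*.pyc", "*.log", ".DS_Store", "Thumbs.db"]


def matching_patterns(filename: str) -> List[str]:
    """Return every pattern that matches `filename` (case-sensitive comparison)."""
    return [p for p in PATTERNS if fnmatchcase(filename, p)]


for name in ["module.pyc", "module.pyo", "module.py", "debug.log"]:
    print(name, "->", matching_patterns(name))
```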

owlbear-0.2.0/CLAUDE.md ADDED

@@ -0,0 +1,26 @@
+# Claude Instructions for owlbear
+## Project Overview
+Owlbear is a Python client that bridges AWS Athena and Polars. It executes Athena SQL queries and returns results as typed Polars DataFrames via PyArrow. Named for its two halves: Owl (Athena) + Bear (Polars).
+## Development Guidelines
+- Use Polars for all data processing operations
+- Follow Python packaging best practices with pyproject.toml
+- Maintain compatibility with Python 3.9+
+## Dependencies
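+- polars: Core data processing library
+- pyarrow: Arrow-based conversion of query results into Polars
+- boto3: AWS SDK for Athena integration (optional `[athena]` extra)
+- trino: Trino client (optional `[trino]` extra)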
+## Development Dependencies
+- pytest: Testing framework
+- black: Code formatter
+- ruff: Linter
+- mypy: Type checker
+## Commands
+- Install dependencies: `pip install -e .[dev]`
+- Run tests: `pytest`
+- Format code: `black .`
+- Lint code: `ruff check .`
+- Type check: `mypy src/`
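The commands listed above can be chained into a single pre-commit style check. The sketch below is a hypothetical helper (`run_checks` is not part of owlbear, and where you keep it is up to you); it runs each command in order and stops at the first failure. It swaps `black .` for `black --check .` so the check reports problems instead of rewriting files, and the runner is injectable so the logic can be exercised without the tools installed.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Commands from the list above; `black --check` reports instead of reformatting.
CHECKS: List[List[str]] = [
    ["pytest"],
    ["black", "--check", "."],
    ["ruff", "check", "."],
    ["mypy", "src/"],
]


def _subprocess_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _subprocess_runner,
) -> Optional[List[str]]:
    """Run each check in order; return the first failing command, or None if all pass."""
    for cmd in checks:
        if runner(cmd) != 0:
            return list(cmd)  # stop at the first failure
    return None
```

Calling `run_checks()` with no arguments runs the real tools; a non-`None` result names the command that failed.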

owlbear-0.2.0/PKG-INFO ADDED

@@ -0,0 +1,436 @@
+Metadata-Version: 2.4
+Name: owlbear
+Version: 0.2.0
+Summary: Feathers and claws for your data lake
+Project-URL: Homepage, https://github.com/jdonaldson/owlbear
+Project-URL: Repository, https://github.com/jdonaldson/owlbear
+Project-URL: Issues, https://github.com/jdonaldson/owlbear/issues
+Author-email: "J. Justin Donaldson" <jjd@jjd.io>
+License: MIT
+Keywords: analytics,athena,aws,data,polars,trino
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Database
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Requires-Python: >=3.9
+Requires-Dist: polars>=0.20.0
+Requires-Dist: pyarrow>=10.0.0
+Provides-Extra: all
+Requires-Dist: boto3>=1.26.0; extra == 'all'
+Requires-Dist: trino>=0.320.0; extra == 'all'
+Provides-Extra: athena
+Requires-Dist: boto3>=1.26.0; extra == 'athena'
+Provides-Extra: dev
+Requires-Dist: black>=23.0.0; extra == 'dev'
+Requires-Dist: boto3>=1.26.0; extra == 'dev'
+Requires-Dist: mypy>=1.0.0; extra == 'dev'
+Requires-Dist: pytest>=7.0.0; extra == 'dev'
+Requires-Dist: ruff>=0.1.0; extra == 'dev'
+Requires-Dist: trino>=0.320.0; extra == 'dev'
+Provides-Extra: trino
+Requires-Dist: trino>=0.320.0; extra == 'trino'
+Description-Content-Type: text/markdown
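The header block above follows the core metadata format, which uses email-style headers, so the standard-library `email` parser can read it. As a sketch, the helper below groups the `Requires-Dist` entries by extra. Its string splitting on `extra ==` is a simplification for illustration only; real tooling should evaluate environment markers with the `packaging` library.

```python
from collections import defaultdict
from email.parser import HeaderParser
from typing import Dict, List

# An excerpt of the headers above (the real file has more fields).
PKG_INFO = """\
Name: owlbear
Requires-Dist: polars>=0.20.0
Requires-Dist: pyarrow>=10.0.0
Requires-Dist: boto3>=1.26.0; extra == 'athena'
Requires-Dist: trino>=0.320.0; extra == 'trino'
Requires-Dist: boto3>=1.26.0; extra == 'all'
Requires-Dist: trino>=0.320.0; extra == 'all'
"""


def requirements_by_extra(text: str) -> Dict[str, List[str]]:
    """Group Requires-Dist values by extra name; the "" key holds unconditional ones."""
    msg = HeaderParser().parsestr(text)
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in msg.get_all("Requires-Dist", []):
        requirement, _, marker = value.partition(";")
        # Naive marker handling: only understands the simple `extra == '...'` form.
        extra = marker.split("==")[-1].strip().strip("'\"") if "extra" in marker else ""
        groups[extra].append(requirement.strip())
    return dict(groups)


print(requirements_by_extra(PKG_INFO))
```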
+# owlbear
+<img src="owlbear.png" width="150" align="right" alt="Owlbear" />
+**Feathers and claws for your data lake.**
+Owlbear is a Python client that bridges **Athena** and **Trino** to **Polars** DataFrames via PyArrow. A wise chimera: part **Owl** ([Athena](https://aws.amazon.com/athena/), goddess of wisdom), part **Bear** ([Polars](https://pola.rs/), the bear constellation). Query your data lake with SQL and get back fast, typed DataFrames without going through an ODBC/JDBC driver.
+## Features
+- **Two backends**: `AthenaClient` (AWS Athena via boto3) and `TrinoClient` (direct Trino connection)
+- Shared Presto-family type conversion — both backends produce identically typed Polars DataFrames
+- Pagination support for large result sets (Athena) and row limits (both)
+- Comprehensive error handling and timeout management
+- Query cancellation and execution monitoring (Athena)
+- Built-in retry logic with exponential backoff (Athena)
+## Installation
+### From GitHub (Git)
+```bash
+# Core only (no backend)
+pip install git+https://github.com/jdonaldson/owlbear.git
+# With Athena backend
+pip install "owlbear[athena] @ git+https://github.com/jdonaldson/owlbear.git"
+# With Trino backend
+pip install "owlbear[trino] @ git+https://github.com/jdonaldson/owlbear.git"
+# Both backends
+pip install "owlbear[all] @ git+https://github.com/jdonaldson/owlbear.git"
+```
+### For Development
+```bash
+git clone https://github.com/jdonaldson/owlbear.git
+cd owlbear
+pip install -e ".[dev]"
+```
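Because `boto3` and `trino` are optional extras in 0.2.0, code that supports either backend may want to check which one is installed before constructing a client. A minimal standard-library sketch; the helper name `available_backends` is hypothetical and not part of owlbear:

```python
import importlib.util
from typing import Dict

# Optional extra -> top-level module it provides (per the [athena] and [trino] extras).
BACKEND_MODULES = {"athena": "boto3", "trino": "trino"}


def available_backends() -> Dict[str, bool]:
    """Report which optional backend dependencies are importable, without importing them."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in BACKEND_MODULES.items()
    }


missing = [extra for extra, ok in available_backends().items() if not ok]
if missing:
    print(f"Optional backends not installed: {', '.join(missing)}")
```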
+## Prerequisites
+- Python 3.9+
+- **Athena**: AWS credentials configured (via AWS CLI, environment variables, or IAM roles) and an S3 bucket for query results
+- **Trino**: A running Trino cluster with network access
+## Quick Start
+### Athena
+```python
+from owlbear import AthenaClient
+client = AthenaClient(
+    database="my_database",
+    output_location="s3://my-bucket/athena-results/",
+    region="us-east-1"
+)
+execution_id = client.query("SELECT * FROM orders LIMIT 5")
+df = client.results(execution_id)
+print(df)
+```
+```
+shape: (5, 4)
+┌─────────────┬────────────┬──────────────┬────────────┐
+│ customer_id ┆ order_date ┆ order_amount ┆ status     │
+│ ---         ┆ ---        ┆ ---          ┆ ---        │
+│ i64         ┆ date       ┆ f64          ┆ str        │
+╞═════════════╪════════════╪══════════════╪════════════╡
+│ 1001        ┆ 2024-03-15 ┆ 249.99       ┆ shipped    │
+│ 1002        ┆ 2024-03-15 ┆ 89.5         ┆ delivered  │
+│ 1003        ┆ 2024-03-16 ┆ 1024.0       ┆ processing │
+│ 1001        ┆ 2024-03-17 ┆ 54.25        ┆ shipped    │
+│ 1004        ┆ 2024-03-17 ┆ 399.99       ┆ delivered  │
+└─────────────┴────────────┴──────────────┴────────────┘
+```
+### Trino
+```python
+from owlbear import TrinoClient
+client = TrinoClient(
+    host="trino.example.com",
+    port=443,
+    user="analyst",
+    catalog="hive",
+    schema="default",
+)
+df = client.query("SELECT * FROM orders LIMIT 5")
+print(df)
+```
+```
+shape: (5, 4)
+┌─────────────┬────────────┬──────────────┬────────────┐
+│ customer_id ┆ order_date ┆ order_amount ┆ status     │
+│ ---         ┆ ---        ┆ ---          ┆ ---        │
+│ i64         ┆ date       ┆ f64          ┆ str        │
+╞═════════════╪════════════╪══════════════╪════════════╡
+│ 1001        ┆ 2024-03-15 ┆ 249.99       ┆ shipped    │
+│ 1002        ┆ 2024-03-15 ┆ 89.5         ┆ delivered  │
+│ 1003        ┆ 2024-03-16 ┆ 1024.0       ┆ processing │
+│ 1001        ┆ 2024-03-17 ┆ 54.25        ┆ shipped    │
+│ 1004        ┆ 2024-03-17 ┆ 399.99       ┆ delivered  │
+└─────────────┴────────────┴──────────────┴────────────┘
+```
+## Usage Examples
+### Basic Query Execution
+```python
+import polars as pl
+from owlbear import AthenaClient
+# Initialize client
+client = AthenaClient(
+    database="analytics_db",
+    output_location="s3://my-athena-results/queries/",
+    region="us-west-2"
+)
+# Execute query with automatic waiting
+query = """
+SELECT
+    customer_id,
+    SUM(order_amount) as total_spent,
+    COUNT(*) as order_count
+FROM orders
+WHERE order_date >= '2024-01-01'
+GROUP BY customer_id
+ORDER BY total_spent DESC
+LIMIT 50
+"""
+execution_id = client.query(query, wait_for_completion=True)
+results_df = client.results(execution_id)
+# Use Polars operations
+top_customers = results_df.filter(pl.col("total_spent") > 1000)
+print(f"Found {len(top_customers)} high-value customers")
+```
+### Asynchronous Query Execution
+```python
+# Start query without waiting
+execution_id = client.query(
+    "SELECT * FROM large_table",
+    wait_for_completion=False
+)
+# Check query status
+query_info = client.get_query_info(execution_id)
+print(f"Query status: {query_info['Status']['State']}")
+# Wait for completion, then fetch results (`_wait_for_completion` is a private helper, so its behavior may change between releases)
+client._wait_for_completion(execution_id)
+df = client.results(execution_id)
+```
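The v0.1.0 changelog describes waiting as exponential-backoff polling. The sketch below shows that general technique; it is not owlbear's `_wait_for_completion`, whose parameters are not documented here. It assumes a `get_state` callable that returns Athena query states (`QUEUED`, `RUNNING`, `SUCCEEDED`, `FAILED`, `CANCELLED`), and the delay and timeout defaults are illustrative assumptions.

```python
import time
from typing import Callable

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_with_backoff(
    get_state: Callable[[], str],
    initial_delay: float = 0.5,
    max_delay: float = 10.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_state` until a terminal state is reached, doubling the delay each time."""
    delay = initial_delay
    waited = 0.0  # tracks sleep time only (ignores API call latency)
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"query still {state} after {waited:.1f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential growth, capped at max_delay
```

With boto3, `get_state` could be `lambda: athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]["State"]`. Injecting `sleep` keeps the function testable without real delays.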
+### Using Work Groups
+```python
+# Execute query with a specific work group
+execution_id = client.query(
+    query="SELECT COUNT(*) FROM my_table",
+    work_group="my-workgroup"
+)
+df = client.results(execution_id)
+```
+### Handling Large Result Sets
+```python
+# Get results with pagination (limit to 5000 rows)
+df = client.results(execution_id, max_rows=5000)
+# For larger datasets, consider using LIMIT in your SQL query
+# or processing results in chunks
+```
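For reference, Athena's `GetQueryResults` API returns at most 1,000 rows per call and signals further pages with a `NextToken`; for `SELECT` queries the first row of the first page holds the column names. The sketch below shows row-limited pagination over that response shape. It is independent of owlbear's own implementation, and `fetch_page` is an assumed callable wrapping boto3's `get_query_results`.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Assumed shape of one boto3 `get_query_results` response:
# {"ResultSet": {"Rows": [{"Data": [{"VarCharValue": "..."}, ...]}, ...]}, "NextToken": "..."}
Page = Dict[str, Any]


def iter_rows(
    fetch_page: Callable[[Optional[str]], Page],
    max_rows: Optional[int] = None,
) -> Iterator[List[Optional[str]]]:
    """Yield data rows as lists of strings (None for NULL), skipping the header row."""
    token: Optional[str] = None
    first_page = True
    yielded = 0
    while True:
        page = fetch_page(token)
        rows = page["ResultSet"]["Rows"]
        if first_page:
            rows = rows[1:]  # for SELECT queries, the first row holds the column names
            first_page = False
        for row in rows:
            if max_rows is not None and yielded >= max_rows:
                return
            # A cell without "VarCharValue" represents SQL NULL
            yield [cell.get("VarCharValue") for cell in row["Data"]]
            yielded += 1
        token = page.get("NextToken")
        if token is None or (max_rows is not None and yielded >= max_rows):
            return  # no more pages, or the row limit is already reached
```

With boto3, `fetch_page` could call `athena.get_query_results(QueryExecutionId=execution_id, MaxResults=1000)`, adding `NextToken=token` when `token` is not `None`. Because rows are yielded lazily, a caller can process each page in chunks instead of materializing everything at once.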
+### Using with Existing boto3 Session
+```python
+import boto3
+from owlbear import AthenaClient
+# Use existing session (useful for custom credential handling)
+session = boto3.Session(profile_name='my-profile')
+client = AthenaClient.from_session(
+    session=session,
+    database="my_db",
+    output_location="s3://my-bucket/results/"
+)
+# Or with custom config
+from botocore.config import Config
+config = Config(
+    region_name='eu-west-1',
+    retries={'max_attempts': 5}
+)
+client = AthenaClient(
+    database="my_db",
+    output_location="s3://my-bucket/results/",
+    config=config
+)
+```
+### Query Management
+```python
+# List available work groups
+work_groups = client.list_work_groups()
+print(f"Available work groups: {work_groups}")
+# Cancel a running query
+client.cancel_query(execution_id)
+# Get detailed query information
+query_info = client.get_query_info(execution_id)
+print(f"Query execution time: {query_info['Statistics']['TotalExecutionTimeInMillis']}ms")
+print(f"Data processed: {query_info['Statistics']['DataProcessedInBytes']} bytes")
+```
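The statistics fields are raw milliseconds and bytes, and they may not be populated yet while a query is still queued or running. A small formatting helper (hypothetical, not part of owlbear) that tolerates missing fields, assuming `get_query_info` returns a dict with the `Status`/`Statistics` layout used in the example above:

```python
from typing import Any, Dict


def summarize_query(query_info: Dict[str, Any]) -> str:
    """Format state, elapsed time and data processed from a query-info dict."""
    state = query_info.get("Status", {}).get("State", "UNKNOWN")
    stats = query_info.get("Statistics", {})
    millis = stats.get("TotalExecutionTimeInMillis")
    processed = stats.get("DataProcessedInBytes")
    parts = [state]
    if millis is not None:
        parts.append(f"{millis / 1000:.2f} s")
    if processed is not None:
        parts.append(f"{processed / 1024**2:.1f} MiB processed")
    return ", ".join(parts)
```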
+### Error Handling
+```python
+try:
+    execution_id = client.query("SELECT * FROM non_existent_table")
+    df = client.results(execution_id)
+except Exception as e:
+    if "Query failed" in str(e):
+        print(f"Query execution failed: {e}")
+    elif "timeout" in str(e).lower():
+        print(f"Query timed out: {e}")
+    else:
+        print(f"Unexpected error: {e}")
+```
+## Advanced Usage
+### Custom Query Context
+```python
+execution_id = client.query(
+    query="SELECT * FROM my_table",
+    query_context={"Catalog": "my_catalog"},
+    result_config={"EncryptionConfiguration": {"EncryptionOption": "SSE_S3"}}
+)
+```
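`query_context` and `result_config` appear to correspond to the `QueryExecutionContext` and `ResultConfiguration` parameters of boto3's `start_query_execution`. The sketch below shows one plausible way such arguments could be merged into that call's keyword arguments; the function name and merge rules are assumptions, not owlbear's code.

```python
from typing import Any, Dict, Optional


def build_start_query_kwargs(
    query: str,
    database: str,
    output_location: str,
    work_group: Optional[str] = None,
    query_context: Optional[Dict[str, Any]] = None,
    result_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's athena.start_query_execution (illustrative)."""
    kwargs: Dict[str, Any] = {
        "QueryString": query,
        # Caller-supplied keys (e.g. "Catalog") are layered over the default database
        "QueryExecutionContext": {"Database": database, **(query_context or {})},
        # Caller-supplied keys (e.g. "EncryptionConfiguration") extend the output location
        "ResultConfiguration": {"OutputLocation": output_location, **(result_config or {})},
    }
    if work_group is not None:
        kwargs["WorkGroup"] = work_group
    return kwargs
```

The resulting dict could be passed as `athena.start_query_execution(**kwargs)`, whose response includes the `QueryExecutionId`.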
+### Working with Different Data Types
+The library maps Athena's column types to PyArrow types (via the shared Presto-family type converter), so results arrive as typed Polars columns:
+```python
+import polars as pl
+# Column types are mapped from Athena's metadata and converted via PyArrow
+df = client.results(execution_id)
+# Check the resulting Polars types
+print(df.dtypes)  # e.g. [Int32, String, Float64, Boolean, Date, ...]
+# No manual casting is needed for basic types, but you can still convert columns,
+# e.g. parse timestamps that are stored as strings in the source table:
+df_modified = df.with_columns(
+    pl.col("timestamp_str_col").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S"),
+)
+```
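The type handling rests on mapping Presto-family type names to Arrow/Polars types; in owlbear this is done by `presto_type_to_pyarrow`, whose exact signature is not shown here. The stdlib-only sketch below is an independent illustration that maps Presto-style type strings (as written in DDL, e.g. `decimal(10,2)` or `array<integer>`) to Polars dtype names. The mapping table is an assumption covering common scalar types plus `decimal` and `array`.

```python
import re

# Assumed scalar mappings from Presto/Athena type names to Polars dtype names
SCALARS = {
    "boolean": "Boolean",
    "tinyint": "Int8",
    "smallint": "Int16",
    "integer": "Int32",
    "int": "Int32",
    "bigint": "Int64",
    "real": "Float32",
    "float": "Float32",
    "double": "Float64",
    "varchar": "String",
    "char": "String",
    "string": "String",
    "date": "Date",
    "timestamp": "Datetime",
}


def presto_to_polars_name(type_str: str) -> str:
    """Translate a type string such as 'decimal(10,2)' or 'array<integer>'."""
    t = type_str.strip().lower()
    if t.startswith("array<") and t.endswith(">"):
        # Recurse into the element type, e.g. array<array<double>>
        return f"List({presto_to_polars_name(t[6:-1])})"
    m = re.fullmatch(r"decimal\((\d+),\s*(\d+)\)", t)
    if m:
        return f"Decimal({m.group(1)}, {m.group(2)})"
    base = t.split("(", 1)[0]  # drop length/precision parameters, e.g. varchar(255)
    if base in SCALARS:
        return SCALARS[base]
    raise ValueError(f"unsupported type: {type_str}")
```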
+## Configuration
+### Environment Variables
+You can configure AWS credentials using standard environment variables:
+```bash
+export AWS_ACCESS_KEY_ID=your_access_key
+export AWS_SECRET_ACCESS_KEY=your_secret_key
+export AWS_DEFAULT_REGION=us-east-1
+```
+### IAM Permissions
+Your AWS credentials need permissions along these lines (replace `your-athena-results-bucket` with the bucket in your `output_location`):
+```json
+{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Effect": "Allow",
+            "Action": [
+                "athena:StartQueryExecution",
+                "athena:GetQueryExecution",
+                "athena:GetQueryResults",
+                "athena:StopQueryExecution",
+                "athena:ListWorkGroups"
+            ],
+            "Resource": "*"
+        },
+        {
+            "Effect": "Allow",
+            "Action": [
+                "s3:GetObject",
+                "s3:PutObject"
+            ],
+            "Resource": "arn:aws:s3:::your-athena-results-bucket/*"
+        },
+        {
+            "Effect": "Allow",
+            "Action": [
+                "glue:GetDatabase",
+                "glue:GetTable",
+                "glue:GetPartitions"
+            ],
+            "Resource": "*"
+        }
+    ]
+}
+```
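The bucket ARN in the policy above is a placeholder. A small sketch that fills it in from the bucket name of an `output_location` URI; the helper is hypothetical, and its statements mirror the example policy exactly, so they may need widening for your setup.

```python
import json
from urllib.parse import urlparse


def athena_policy(output_location: str) -> dict:
    """Build the example IAM policy, scoped to the results bucket of output_location."""
    parsed = urlparse(output_location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/... URI, got {output_location!r}")
    bucket = parsed.netloc
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                    "athena:ListWorkGroups",
                ],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(athena_policy("s3://my-bucket/athena-results/"), indent=4))
```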
+## Testing
+Run the test suite:
+```bash
+pytest tests/ -v
+```
+Run tests with coverage (requires the `pytest-cov` plugin, which is not included in the `dev` extra):
+```bash
+pytest tests/ --cov=src --cov-report=html
+```
+## Development
+### Setup Development Environment
+```bash
+git clone https://github.com/jdonaldson/owlbear.git
+cd owlbear
+pip install -e ".[dev]"
+```
+### Code Quality
+Format code:
+```bash
+black .
+```
+Lint code:
+```bash
+ruff check .
+```
+Type checking:
+```bash
+mypy src/
+```
+## License
+MIT License - see LICENSE file for details.
+## Contributing
+1. Fork the repository on GitHub
+2. Create a feature branch
+3. Make your changes with tests
+4. Ensure all tests pass and code is formatted
+5. Submit a pull request
+## Changelog
+### v0.2.0
+- Add `TrinoClient` for direct Trino connections
+- Rename `OwlbearClient` → `AthenaClient` (alias kept for backward compat)
+- Extract shared `presto_type_to_pyarrow` type converter
+- Make `boto3` and `trino` optional extras (`[athena]`, `[trino]`, `[all]`)
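The v0.2.0 changelog notes that `OwlbearClient` remains available as an alias of `AthenaClient`. Whether owlbear warns on its use is not stated; a common pattern for such renames is a module-level `__getattr__` (PEP 562) that returns the new class and emits a `DeprecationWarning`. A self-contained sketch, using a stand-in `AthenaClient` class and a throwaway module object:

```python
import types
import warnings


class AthenaClient:
    """Stand-in for the real client class (illustration only)."""


def _module_getattr(name: str):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    if name == "OwlbearClient":
        warnings.warn(
            "OwlbearClient is deprecated; use AthenaClient instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return AthenaClient
    raise AttributeError(f"module has no attribute {name!r}")


# In a real package this function would simply be named `__getattr__` in __init__.py;
# here a throwaway module object demonstrates the behavior.
demo = types.ModuleType("demo_owlbear")
demo.AthenaClient = AthenaClient
demo.__getattr__ = _module_getattr
```

Accessing `demo.OwlbearClient` returns `AthenaClient` and emits the warning, so old imports keep working while nudging callers toward the new name.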
+### v0.1.0 (2024-08-28)
+- Initial release
+- `AthenaClient` for executing Athena SQL and returning typed Polars DataFrames via PyArrow
+- Automatic Athena-to-PyArrow type mapping (integers, floats, decimals, timestamps, booleans, arrays, maps)
+- Paginated result retrieval with configurable row limits
+- Async query execution with exponential-backoff polling
+- Work group support, query cancellation, and execution monitoring