PyPI - airflow-toolkit - Versions diffs - 2.2.0__tar.gz → 2.4.0__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (55)

{airflow_toolkit-2.2.0/src/airflow_toolkit.egg-info → airflow_toolkit-2.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: airflow-toolkit
-Version: 2.2.0
+Version: 2.4.0
 Summary: A toolkit of operators, hooks and utilities for Apache Airflow 3
 Author-email: Biel Llobera <biel_llobera@dkl.digital>
 Requires-Python: <3.15,>=3.11
@@ -32,6 +32,10 @@ Provides-Extra: duckdb
 Requires-Dist: airflow-provider-duckdb>=0.1.2; extra == "duckdb"
 Provides-Extra: sqlite
 Requires-Dist: apache-airflow-providers-sqlite; extra == "sqlite"
+Provides-Extra: excel
+Requires-Dist: openpyxl>=3.1; extra == "excel"
+Provides-Extra: avro
+Requires-Dist: fastavro>=1.9; extra == "avro"
 Provides-Extra: airflow3-full
 Requires-Dist: apache-airflow<4,>=3; extra == "airflow3-full"
 Requires-Dist: apache-airflow-providers-fab>=3.0.0; extra == "airflow3-full"
@@ -49,6 +53,8 @@ Requires-Dist: requests>=2.31.0; extra == "airflow3-full"
 Requires-Dist: jmespath<2,>=1.0.1; extra == "airflow3-full"
 Requires-Dist: airflow-provider-duckdb>=0.1.2; extra == "airflow3-full"
 Requires-Dist: apache-airflow-providers-sqlite; extra == "airflow3-full"
+Requires-Dist: openpyxl>=3.1; extra == "airflow3-full"
+Requires-Dist: fastavro>=1.9; extra == "airflow3-full"
 Dynamic: license-file
 # Airflow Toolkit
@@ -136,10 +142,11 @@ pip install "airflow-toolkit[airflow3-full]"
 | `google` | `providers-google` | GCS filesystem backend |
 | `azure` | `providers-microsoft-azure` | Azure Blob / ADLS filesystem backend |
 | `sftp` | `providers-sftp` | SFTP filesystem backend |
-| `slack` | `providers-slack` | Slack failure notifications |
 | `http` | `providers-http`, `requests`, `jmespath`, `pandas` | `HttpToFilesystem`, `MultiHttpToFilesystem` |
 | `duckdb` | `airflow-provider-duckdb` | `DuckdbToDeltalake` operator |
 | `sqlite` | `providers-sqlite` | SQLite as source or destination |
+| `excel` | `openpyxl` | Excel (`.xlsx` / `.xls`) support in `FilesystemToDatabase` and `HttpToFilesystem` |
+| `avro` | `fastavro` | Avro support in `FilesystemToDatabase` and `HttpToFilesystem` |
 | `airflow3-full` | all of the above | Quick start / development |
 ---
@@ -184,7 +191,7 @@ Changing the connection's `conn_type` is all that is needed to switch backends
 ### HttpToFilesystem
-Calls an HTTP endpoint and writes the response to any filesystem. Supports pagination, JMESPath filtering, compression, and custom response transformations.
+Calls an HTTP endpoint and writes the response to any filesystem. Supports pagination, JMESPath filtering, compression, OAuth 2.0 authentication, rate limiting, and custom response transformations.
 ```python
 from airflow_toolkit.providers.filesystem.operators.http_to_filesystem import HttpToFilesystem
@@ -201,7 +208,7 @@ HttpToFilesystem(
 )
 ```
-With cursor-based pagination:
+**With cursor-based pagination:**
 ```python
 def next_page(response):
@@ -223,9 +230,70 @@ HttpToFilesystem(
 )
 ```
+**With OAuth 2.0 Client Credentials:**
+`OAuth2ClientCredentials.client_credentials()` returns a configured auth class that fetches the token lazily on the first request and refreshes it automatically 30 seconds before expiry — no manual token management required.
+```python
+from airflow_toolkit.providers.filesystem.operators.auth import OAuth2ClientCredentials
+HttpToFilesystem(
+    task_id='fetch_protected_data',
+    http_conn_id='my_api',
+    filesystem_conn_id='my_data_lake',
+    filesystem_path='raw/data/{{ ds }}/',
+    endpoint='/api/v1/data',
+    method='GET',
+    save_format='jsonl',
+    auth_type=OAuth2ClientCredentials.client_credentials(
+        token_url='https://auth.example.com/oauth2/token',
+        client_id='{{ var.value.oauth2_client_id }}',
+        client_secret='{{ var.value.oauth2_client_secret }}',
+        scope='read',           # optional
+    ),
+)
+```
+**With rate limiting:**
+Use `requests_per_second` to cap how fast paginated requests are sent. This is useful when the API enforces a rate limit.
+```python
+HttpToFilesystem(
+    task_id='fetch_with_rate_limit',
+    http_conn_id='my_api',
+    filesystem_conn_id='my_data_lake',
+    filesystem_path='raw/events/{{ ds }}/',
+    endpoint='/api/v1/events',
+    method='GET',
+    pagination_function=next_page,
+    save_format='jsonl',
+    requests_per_second=3.0,    # max 3 requests per second between pages
+)
+```
+**Supported response formats:**
+`save_format` controls how the response is written to the filesystem. For APIs that return binary formats natively (e.g. a reporting API that streams Excel files), set `source_format` to match the response content type:
+| `source_format` / `save_format` | File extension | Notes |
+|---|---|---|
+| `json` | `.json` | Single JSON object or array |
+| `jsonl` | `.jsonl` | Array response written as one record per line |
+| `csv` | `.csv` | Raw CSV text from the response |
+| `xml` | `.xml` | Raw XML text from the response |
+| `parquet` | `.parquet` | Binary passthrough — API must return Parquet bytes |
+| `excel` | `.xlsx` | Binary passthrough — API must return Excel bytes (requires `[excel]`) |
+| `avro` | `.avro` | Binary passthrough — API must return Avro bytes (requires `[avro]`) |
+| `fixed_width` | `.fwf` | Fixed-width text from the response |
+All text and JSON formats support gzip/zip compression via the `compression` parameter.
 ### MultiHttpToFilesystem
-Runs multiple HTTP requests in a single Airflow task, saving each response as a separate file. Useful for fetching multiple entities or date ranges without creating one task per request.
+Runs multiple HTTP requests in a single Airflow task, saving each response as a separate file. Requests can run **sequentially** (with optional rate limiting) or **in parallel** using a thread pool.
+**Sequential with rate limiting:**
 ```python
 from airflow_toolkit.providers.filesystem.operators.http_to_filesystem import MultiHttpToFilesystem
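The OAuth 2.0 hunk above states that the auth class fetches its token lazily on the first request and refreshes it 30 seconds before expiry. The sketch below shows one way such caching can work, assuming the token endpoint returns the standard `access_token` and `expires_in` fields of an OAuth 2.0 token response. `TokenCache`, `fetch_token`, and the injectable clock are hypothetical names for illustration, not the internals of the toolkit's `OAuth2ClientCredentials`.

```python
import time
from typing import Callable


class TokenCache:
    """Hypothetical lazy OAuth 2.0 token cache (not the toolkit's implementation)."""

    def __init__(
        self,
        fetch_token: Callable[[], dict],
        refresh_margin: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_token is assumed to POST grant_type=client_credentials to the
        # token URL and return the parsed JSON response body.
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._token: str | None = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Fetch lazily on first use, then again once the current token is
        # within refresh_margin seconds of its expiry.
        if self._token is None or self._clock() >= self._expires_at - self._refresh_margin:
            body = self._fetch_token()
            self._token = body["access_token"]
            # expires_in is optional in OAuth 2.0; the 3600 s fallback is an assumption.
            self._expires_at = self._clock() + float(body.get("expires_in", 3600))
        return self._token

    def auth_header(self) -> dict[str, str]:
        # Header for a bearer-token request.
        return {"Authorization": f"Bearer {self.get()}"}
```

Injecting the clock keeps the refresh rule testable without waiting for a real token to expire.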
@@ -237,6 +305,7 @@ MultiHttpToFilesystem(
     filesystem_path='raw/reference/{{ ds }}/',
     method='GET',
     save_format='jsonl',
+    requests_per_second=2.0,    # max 2 requests per second between calls
     multi_requests=[
         {'endpoint': '/api/v1/categories'},
         {'endpoint': '/api/v1/statuses'},
@@ -245,6 +314,31 @@ MultiHttpToFilesystem(
 )
 ```
+**Parallel execution:**
+Set `max_workers` to run requests concurrently using a thread pool. Each request writes to its own file — there are no file collisions.
+```python
+MultiHttpToFilesystem(
+    task_id='fetch_users_parallel',
+    http_conn_id='my_api',
+    filesystem_conn_id='my_data_lake',
+    filesystem_path='raw/users/{{ ds }}/',
+    method='GET',
+    save_format='json',
+    max_workers=5,              # up to 5 concurrent threads
+    multi_requests=[
+        {'endpoint': '/api/v1/users/1'},
+        {'endpoint': '/api/v1/users/2'},
+        {'endpoint': '/api/v1/users/3'},
+        {'endpoint': '/api/v1/users/4'},
+        {'endpoint': '/api/v1/users/5'},
+    ],
+)
+```
+> Rate limiting (`requests_per_second`) applies only in sequential mode. In parallel mode the thread pool controls concurrency — use `max_workers` to avoid overwhelming the API.
 Each entry in `multi_requests` can override any base parameter (`endpoint`, `method`, `headers`, `data`, `jmespath_expression`, `save_format`, `compression`).
 ### SQLToFilesystem
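The note in the hunk above restricts `requests_per_second` to sequential mode. A minimal sketch of such a cap, assuming it is enforced as a minimum gap of `1 / requests_per_second` seconds between the starts of consecutive requests. `MinIntervalLimiter` is a hypothetical helper; the toolkit may use a different scheme, such as a token bucket.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Hypothetical limiter: space request starts at least 1/rps seconds apart."""

    def __init__(
        self,
        requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_start: float | None = None

    def wait(self) -> None:
        # Block until at least one interval has passed since the previous start.
        now = self._clock()
        if self._last_start is not None:
            remaining = self._last_start + self._interval - now
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last_start = now
```

Calling `wait()` immediately before each request in a sequential loop spaces request starts at least 0.5 seconds apart when `requests_per_second=2.0`.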
@@ -313,7 +407,9 @@ FilesystemToFilesystem(
 ### FilesystemToDatabase
-Reads files (CSV, JSON, or Parquet) from any filesystem and loads them into any SQLAlchemy-compatible database. Handles schema drift automatically: columns present in the file but missing from the table are added; columns present in the table but missing from the file are filled with `NULL`.
+Reads files from any filesystem and loads them into any SQLAlchemy-compatible database. Handles schema drift automatically: columns present in the file but missing from the table are added; columns present in the table but missing from the file are filled with `NULL`.
+**Supported formats:** `csv`, `json`, `parquet`, `excel`, `avro`, `fixed_width`.
 ```python
 from airflow_toolkit.providers.deltalake.operators.filesystem_to_database import FilesystemToDatabaseOperator
@@ -325,7 +421,7 @@ FilesystemToDatabaseOperator(
     filesystem_path='raw/orders/{{ ds }}/',
     db_schema='public',
     db_table='orders',
-    source_format='csv',
+    source_format='csv',                   # 'csv' | 'json' | 'parquet' | 'excel' | 'avro' | 'fixed_width'
     table_aggregation_type='append',       # 'append' | 'replace' | 'fail'
     metadata={
         '_ds':          '{{ ds }}',
@@ -335,6 +431,52 @@ FilesystemToDatabaseOperator(
 )
 ```
+**Excel** (requires the `[excel]` extra):
+```python
+FilesystemToDatabaseOperator(
+    task_id='load_excel_report',
+    filesystem_conn_id='my_data_lake',
+    database_conn_id='my_postgres',
+    filesystem_path='raw/reports/{{ ds }}/',
+    db_table='monthly_report',
+    source_format='excel',
+    source_format_options={'sheet_name': 'Data'},
+)
+```
+**Avro** (requires the `[avro]` extra):
+```python
+FilesystemToDatabaseOperator(
+    task_id='load_avro_events',
+    filesystem_conn_id='my_data_lake',
+    database_conn_id='my_postgres',
+    filesystem_path='raw/events/{{ ds }}/',
+    db_table='events',
+    source_format='avro',
+)
+```
+**Fixed-width** (no extra required — pandas native):
+```python
+FilesystemToDatabaseOperator(
+    task_id='load_fixed_width',
+    filesystem_conn_id='my_data_lake',
+    database_conn_id='my_postgres',
+    filesystem_path='raw/exports/{{ ds }}/',
+    db_table='transactions',
+    source_format='fixed_width',
+    source_format_options={
+        'colspecs': [(0, 10), (10, 25), (25, 35)],
+        'names': ['date', 'description', 'amount'],
+    },
+)
+```
+Each format is matched by file extension: `.csv`/`.csv.gz`, `.json`/`.json.gz`, `.parquet`/`.parquet.gz`, `.xlsx`/`.xls`, `.avro`, `.fwf`/`.txt`/`.dat`. Files with other extensions in the same prefix are silently skipped.
 ### DuckdbToDeltalake
 Executes a DuckDB SQL query and writes the result directly to a Delta Lake table on Azure storage. Useful for in-process transformations that land results as an open table format.
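The extension rule in the hunk above amounts to a lookup from `source_format` to accepted file suffixes. A sketch, assuming suffix matching is case-insensitive. `EXTENSIONS` and `filter_files` are hypothetical names, and the toolkit's own matching may differ in details such as case handling.

```python
# Suffixes per source_format, as listed in the README hunk above.
EXTENSIONS: dict[str, tuple[str, ...]] = {
    "csv": (".csv", ".csv.gz"),
    "json": (".json", ".json.gz"),
    "parquet": (".parquet", ".parquet.gz"),
    "excel": (".xlsx", ".xls"),
    "avro": (".avro",),
    "fixed_width": (".fwf", ".txt", ".dat"),
}


def filter_files(paths: list[str], source_format: str) -> list[str]:
    """Keep only paths whose suffix belongs to source_format; skip the rest."""
    try:
        suffixes = EXTENSIONS[source_format]
    except KeyError:
        raise ValueError(f"unsupported source_format: {source_format!r}") from None
    # str.endswith accepts a tuple, so one call checks every allowed suffix.
    return [p for p in paths if p.lower().endswith(suffixes)]
```

Files such as `_SUCCESS` markers in the same prefix simply drop out of the result, which matches the "silently skipped" behaviour described above.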
@@ -530,6 +672,50 @@ Each environment maps to a distinct colour across all channels so alerts are rec
 ---
+## Testing Utilities
+### MockFilesystem
+`MockFilesystem` is an in-memory implementation of `FilesystemProtocol` for unit testing. It requires no Docker, no cloud credentials, and no network — all files are stored in a plain Python dict.
+```python
+from airflow_toolkit.testing import MockFilesystem
+# Pre-load files at construction time
+fs = MockFilesystem({
+    "raw/orders/2024-01-01/data.csv": b"id,amount\n1,100\n2,200",
+})
+# Or write files programmatically
+fs.write(b"id,amount\n3,300", "raw/orders/2024-01-02/data.csv")
+# Inspect the result in assertions
+assert fs.check_file("raw/orders/2024-01-01/data.csv")
+assert len(fs.list_files("raw/orders/")) == 2
+assert fs.files["raw/orders/2024-01-01/data.csv"] == b"id,amount\n1,100\n2,200"
+```
+Use it to patch `FilesystemFactory.get_data_lake_filesystem` in your operator tests:
+```python
+from unittest.mock import patch
+from airflow_toolkit.testing import MockFilesystem
+def test_my_pipeline(tmp_path):
+    fs = MockFilesystem({"data/file.csv": b"id,name\n1,Alice"})
+    with patch(
+        "airflow_toolkit.filesystems.filesystem_factory.FilesystemFactory.get_data_lake_filesystem",
+        return_value=fs,
+    ):
+        # run your operator or task here
+        ...
+```
+`MockFilesystem` implements the full `FilesystemProtocol`: `read`, `write`, `delete_file`, `create_prefix`, `delete_prefix`, `check_file`, `check_prefix`, `list_files`.
+---
 ## Running Tests
 ### Integration tests

{airflow_toolkit-2.2.0 → airflow_toolkit-2.4.0}/README.md RENAMED

@@ -83,10 +83,11 @@ pip install "airflow-toolkit[airflow3-full]"
 | `google` | `providers-google` | GCS filesystem backend |
 | `azure` | `providers-microsoft-azure` | Azure Blob / ADLS filesystem backend |
 | `sftp` | `providers-sftp` | SFTP filesystem backend |
-| `slack` | `providers-slack` | Slack failure notifications |
 | `http` | `providers-http`, `requests`, `jmespath`, `pandas` | `HttpToFilesystem`, `MultiHttpToFilesystem` |
 | `duckdb` | `airflow-provider-duckdb` | `DuckdbToDeltalake` operator |
 | `sqlite` | `providers-sqlite` | SQLite as source or destination |
+| `excel` | `openpyxl` | Excel (`.xlsx` / `.xls`) support in `FilesystemToDatabase` and `HttpToFilesystem` |
+| `avro` | `fastavro` | Avro support in `FilesystemToDatabase` and `HttpToFilesystem` |
 | `airflow3-full` | all of the above | Quick start / development |
 ---
@@ -131,7 +132,7 @@ Changing the connection's `conn_type` is all that is needed to switch backends
 ### HttpToFilesystem
-Calls an HTTP endpoint and writes the response to any filesystem. Supports pagination, JMESPath filtering, compression, and custom response transformations.
+Calls an HTTP endpoint and writes the response to any filesystem. Supports pagination, JMESPath filtering, compression, OAuth 2.0 authentication, rate limiting, and custom response transformations.
 ```python
 from airflow_toolkit.providers.filesystem.operators.http_to_filesystem import HttpToFilesystem
@@ -148,7 +149,7 @@ HttpToFilesystem(
 )
 ```
-With cursor-based pagination:
+**With cursor-based pagination:**
 ```python
 def next_page(response):
@@ -170,9 +171,70 @@ HttpToFilesystem(
 )
 ```
+**With OAuth 2.0 Client Credentials:**
+`OAuth2ClientCredentials.client_credentials()` returns a configured auth class that fetches the token lazily on the first request and refreshes it automatically 30 seconds before expiry — no manual token management required.
+```python
+from airflow_toolkit.providers.filesystem.operators.auth import OAuth2ClientCredentials
+HttpToFilesystem(
+    task_id='fetch_protected_data',
+    http_conn_id='my_api',
+    filesystem_conn_id='my_data_lake',
+    filesystem_path='raw/data/{{ ds }}/',
+    endpoint='/api/v1/data',
+    method='GET',
+    save_format='jsonl',
+    auth_type=OAuth2ClientCredentials.client_credentials(
+        token_url='https://auth.example.com/oauth2/token',
+        client_id='{{ var.value.oauth2_client_id }}',
+        client_secret='{{ var.value.oauth2_client_secret }}',
+        scope='read',           # optional
+    ),
+)
+```
+**With rate limiting:**
+Use `requests_per_second` to cap how fast paginated requests are sent. This is useful when the API enforces a rate limit.
+```python
+HttpToFilesystem(
+    task_id='fetch_with_rate_limit',
+    http_conn_id='my_api',
+    filesystem_conn_id='my_data_lake',
+    filesystem_path='raw/events/{{ ds }}/',
+    endpoint='/api/v1/events',
+    method='GET',
+    pagination_function=next_page,
+    save_format='jsonl',
+    requests_per_second=3.0,    # max 3 requests per second between pages
+)
+```
+**Supported response formats:**
+`save_format` controls how the response is written to the filesystem. For APIs that return binary formats natively (e.g. a reporting API that streams Excel files), set `source_format` to match the response content type:
+| `source_format` / `save_format` | File extension | Notes |
+|---|---|---|
+| `json` | `.json` | Single JSON object or array |
+| `jsonl` | `.jsonl` | Array response written as one record per line |
+| `csv` | `.csv` | Raw CSV text from the response |
+| `xml` | `.xml` | Raw XML text from the response |
+| `parquet` | `.parquet` | Binary passthrough — API must return Parquet bytes |
+| `excel` | `.xlsx` | Binary passthrough — API must return Excel bytes (requires `[excel]`) |
+| `avro` | `.avro` | Binary passthrough — API must return Avro bytes (requires `[avro]`) |
+| `fixed_width` | `.fwf` | Fixed-width text from the response |
+All text and JSON formats support gzip/zip compression via the `compression` parameter.
 ### MultiHttpToFilesystem
-Runs multiple HTTP requests in a single Airflow task, saving each response as a separate file. Useful for fetching multiple entities or date ranges without creating one task per request.
+Runs multiple HTTP requests in a single Airflow task, saving each response as a separate file. Requests can run **sequentially** (with optional rate limiting) or **in parallel** using a thread pool.
+**Sequential with rate limiting:**
 ```python
 from airflow_toolkit.providers.filesystem.operators.http_to_filesystem import MultiHttpToFilesystem
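The `jsonl` row in the formats table above writes an array response as one record per line. A sketch of that conversion with the standard `json` module, assuming the body is a top-level JSON array and treating a single object as a one-record array; `to_jsonl` is a hypothetical name.

```python
import json


def to_jsonl(body: str) -> str:
    """Hypothetical converter: JSON array response -> newline-delimited JSON."""
    parsed = json.loads(body)
    # Wrap a lone object so it is written as a single record.
    records = parsed if isinstance(parsed, list) else [parsed]
    # ensure_ascii=False keeps non-ASCII text readable; each record ends with "\n".
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
```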
@@ -184,6 +246,7 @@ MultiHttpToFilesystem(
     filesystem_path='raw/reference/{{ ds }}/',
     method='GET',
     save_format='jsonl',
+    requests_per_second=2.0,    # max 2 requests per second between calls
     multi_requests=[
         {'endpoint': '/api/v1/categories'},
         {'endpoint': '/api/v1/statuses'},
@@ -192,6 +255,31 @@ MultiHttpToFilesystem(
 )
 ```
+**Parallel execution:**
+Set `max_workers` to run requests concurrently using a thread pool. Each request writes to its own file — there are no file collisions.
+```python
+MultiHttpToFilesystem(
+    task_id='fetch_users_parallel',
+    http_conn_id='my_api',
+    filesystem_conn_id='my_data_lake',
+    filesystem_path='raw/users/{{ ds }}/',
+    method='GET',
+    save_format='json',
+    max_workers=5,              # up to 5 concurrent threads
+    multi_requests=[
+        {'endpoint': '/api/v1/users/1'},
+        {'endpoint': '/api/v1/users/2'},
+        {'endpoint': '/api/v1/users/3'},
+        {'endpoint': '/api/v1/users/4'},
+        {'endpoint': '/api/v1/users/5'},
+    ],
+)
+```
+> Rate limiting (`requests_per_second`) applies only in sequential mode. In parallel mode the thread pool controls concurrency — use `max_workers` to avoid overwhelming the API.
 Each entry in `multi_requests` can override any base parameter (`endpoint`, `method`, `headers`, `data`, `jmespath_expression`, `save_format`, `compression`).
 ### SQLToFilesystem
@@ -260,7 +348,9 @@ FilesystemToFilesystem(
 ### FilesystemToDatabase
-Reads files (CSV, JSON, or Parquet) from any filesystem and loads them into any SQLAlchemy-compatible database. Handles schema drift automatically: columns present in the file but missing from the table are added; columns present in the table but missing from the file are filled with `NULL`.
+Reads files from any filesystem and loads them into any SQLAlchemy-compatible database. Handles schema drift automatically: columns present in the file but missing from the table are added; columns present in the table but missing from the file are filled with `NULL`.
+**Supported formats:** `csv`, `json`, `parquet`, `excel`, `avro`, `fixed_width`.
 ```python
 from airflow_toolkit.providers.deltalake.operators.filesystem_to_database import FilesystemToDatabaseOperator
@@ -272,7 +362,7 @@ FilesystemToDatabaseOperator(
     filesystem_path='raw/orders/{{ ds }}/',
     db_schema='public',
     db_table='orders',
-    source_format='csv',
+    source_format='csv',                   # 'csv' | 'json' | 'parquet' | 'excel' | 'avro' | 'fixed_width'
     table_aggregation_type='append',       # 'append' | 'replace' | 'fail'
     metadata={
         '_ds':          '{{ ds }}',
@@ -282,6 +372,52 @@ FilesystemToDatabaseOperator(
 )
 ```
+**Excel** (requires the `[excel]` extra):
+```python
+FilesystemToDatabaseOperator(
+    task_id='load_excel_report',
+    filesystem_conn_id='my_data_lake',
+    database_conn_id='my_postgres',
+    filesystem_path='raw/reports/{{ ds }}/',
+    db_table='monthly_report',
+    source_format='excel',
+    source_format_options={'sheet_name': 'Data'},
+)
+```
+**Avro** (requires the `[avro]` extra):
+```python
+FilesystemToDatabaseOperator(
+    task_id='load_avro_events',
+    filesystem_conn_id='my_data_lake',
+    database_conn_id='my_postgres',
+    filesystem_path='raw/events/{{ ds }}/',
+    db_table='events',
+    source_format='avro',
+)
+```
+**Fixed-width** (no extra required — pandas native):
+```python
+FilesystemToDatabaseOperator(
+    task_id='load_fixed_width',
+    filesystem_conn_id='my_data_lake',
+    database_conn_id='my_postgres',
+    filesystem_path='raw/exports/{{ ds }}/',
+    db_table='transactions',
+    source_format='fixed_width',
+    source_format_options={
+        'colspecs': [(0, 10), (10, 25), (25, 35)],
+        'names': ['date', 'description', 'amount'],
+    },
+)
+```
+Each format is matched by file extension: `.csv`/`.csv.gz`, `.json`/`.json.gz`, `.parquet`/`.parquet.gz`, `.xlsx`/`.xls`, `.avro`, `.fwf`/`.txt`/`.dat`. Files with other extensions in the same prefix are silently skipped.
 ### DuckdbToDeltalake
 Executes a DuckDB SQL query and writes the result directly to a Delta Lake table on Azure storage. Useful for in-process transformations that land results as an open table format.
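In the fixed-width hunk above, `colspecs` are `(start, end)` character offsets that `pandas.read_fwf` treats as half-open intervals `[start, end)`, and `names` labels the resulting columns. A stdlib-only sketch of that slicing shows what `[(0, 10), (10, 25), (25, 35)]` does to one line of input; `parse_fixed_width` is hypothetical and, unlike pandas, performs no type conversion.

```python
def parse_fixed_width(
    lines: list[str],
    colspecs: list[tuple[int, int]],
    names: list[str],
) -> list[dict[str, str]]:
    """Hypothetical parser: slice each line by half-open [start, end) offsets."""
    if len(colspecs) != len(names):
        raise ValueError("colspecs and names must have the same length")
    rows = []
    for line in lines:
        # Strip padding spaces from each field, as read_fwf does by default.
        rows.append({name: line[start:end].strip() for (start, end), name in zip(colspecs, names)})
    return rows
```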
@@ -477,6 +613,50 @@ Each environment maps to a distinct colour across all channels so alerts are rec
 ---
+## Testing Utilities
+### MockFilesystem
+`MockFilesystem` is an in-memory implementation of `FilesystemProtocol` for unit testing. It requires no Docker, no cloud credentials, and no network — all files are stored in a plain Python dict.
+```python
+from airflow_toolkit.testing import MockFilesystem
+# Pre-load files at construction time
+fs = MockFilesystem({
+    "raw/orders/2024-01-01/data.csv": b"id,amount\n1,100\n2,200",
+})
+# Or write files programmatically
+fs.write(b"id,amount\n3,300", "raw/orders/2024-01-02/data.csv")
+# Inspect the result in assertions
+assert fs.check_file("raw/orders/2024-01-01/data.csv")
+assert len(fs.list_files("raw/orders/")) == 2
+assert fs.files["raw/orders/2024-01-01/data.csv"] == b"id,amount\n1,100\n2,200"
+```
+Use it to patch `FilesystemFactory.get_data_lake_filesystem` in your operator tests:
+```python
+from unittest.mock import patch
+from airflow_toolkit.testing import MockFilesystem
+def test_my_pipeline(tmp_path):
+    fs = MockFilesystem({"data/file.csv": b"id,name\n1,Alice"})
+    with patch(
+        "airflow_toolkit.filesystems.filesystem_factory.FilesystemFactory.get_data_lake_filesystem",
+        return_value=fs,
+    ):
+        # run your operator or task here
+        ...
+```
+`MockFilesystem` implements the full `FilesystemProtocol`: `read`, `write`, `delete_file`, `create_prefix`, `delete_prefix`, `check_file`, `check_prefix`, `list_files`.
+---
 ## Running Tests
 ### Integration tests

{airflow_toolkit-2.2.0 → airflow_toolkit-2.4.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "airflow-toolkit"
-version = "2.2.0"
+version = "2.4.0"
 description = "A toolkit of operators, hooks and utilities for Apache Airflow 3"
 authors = [{ name = "Biel Llobera", email = "biel_llobera@dkl.digital" }]
 requires-python = ">=3.11,<3.15"
@@ -49,6 +49,12 @@ duckdb = [
 sqlite = [
     "apache-airflow-providers-sqlite",
 ]
+excel = [
+    "openpyxl>=3.1",
+]
+avro = [
+    "fastavro>=1.9",
+]
 airflow3-full = [
     "apache-airflow>=3,<4",
     "apache-airflow-providers-fab>=3.0.0",
@@ -66,6 +72,8 @@ airflow3-full = [
     "jmespath>=1.0.1,<2",
     "airflow-provider-duckdb>=0.1.2",
     "apache-airflow-providers-sqlite",
+    "openpyxl>=3.1",
+    "fastavro>=1.9",
 ]
 [dependency-groups]

{airflow_toolkit-2.2.0 → airflow_toolkit-2.4.0}/src/airflow_toolkit/compression_utils.py RENAMED

@@ -1,10 +1,10 @@
 import gzip
 import zipfile
 from io import BytesIO
-from typing import Literal, Union
+from airflow_toolkit.types import CompressionOptions
 DEFAULT_ZIP_FILENAME = "file.zip"
-CompressionOptions = Union[Literal["infer", "gzip", "bz2", "zip", "xz", "zstd"], None]
 def gzip_data(data: bytes) -> bytes:

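The hunk above moves the `CompressionOptions` alias into `airflow_toolkit.types` but elides the body of `gzip_data`. A minimal stdlib sketch of gzip and zip helpers in the same spirit, assuming `gzip_data` wraps `gzip.compress`; `zip_data`, its `member_name` parameter, and the `compress` dispatcher are hypothetical and cover only the gzip and zip cases mentioned in the README.

```python
import gzip
import zipfile
from io import BytesIO
from typing import Literal, Union

# Same alias the hunk relocates to airflow_toolkit.types.
CompressionOptions = Union[Literal["infer", "gzip", "bz2", "zip", "xz", "zstd"], None]


def gzip_data(data: bytes) -> bytes:
    # Assumed behaviour: plain gzip framing around the payload.
    return gzip.compress(data)


def zip_data(data: bytes, member_name: str = "data") -> bytes:
    """Hypothetical helper: wrap data as a single-member, deflated zip archive."""
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, data)
    return buffer.getvalue()


def compress(data: bytes, compression: CompressionOptions) -> bytes:
    """Hypothetical dispatcher for the gzip and zip cases only."""
    if compression is None:
        return data
    if compression == "gzip":
        return gzip_data(data)
    if compression == "zip":
        return zip_data(data)
    raise NotImplementedError(f"compression={compression!r} is not handled in this sketch")
```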