ingestr 0.13.60__tar.gz → 0.13.61__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of ingestr might be problematic.
- {ingestr-0.13.60 → ingestr-0.13.61}/PKG-INFO +3 -2
- {ingestr-0.13.60 → ingestr-0.13.61}/README.md +1 -1
- ingestr-0.13.61/docs/media/cratedb-destination.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/cratedb.md +60 -1
- ingestr-0.13.61/docs/supported-sources/gcs.md +169 -0
- ingestr-0.13.61/ingestr/src/buildinfo.py +1 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/destinations.py +102 -45
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/factory.py +4 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/isoc_pulse/__init__.py +1 -1
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/sources.py +1 -1
- {ingestr-0.13.60 → ingestr-0.13.61}/requirements.in +1 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/requirements.txt +7 -1
- {ingestr-0.13.60 → ingestr-0.13.61}/requirements_arm64.txt +7 -1
- ingestr-0.13.60/docs/supported-sources/gcs.md +0 -66
- ingestr-0.13.60/ingestr/src/buildinfo.py +0 -1
- {ingestr-0.13.60 → ingestr-0.13.61}/.dockerignore +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.githooks/pre-commit-hook.sh +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.github/workflows/deploy-docs.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.github/workflows/release.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.github/workflows/secrets-scan.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.github/workflows/tests.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.gitignore +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.gitleaksignore +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.python-version +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/.vale.ini +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/Dockerfile +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/LICENSE.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/Makefile +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/.vitepress/config.mjs +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/.vitepress/theme/custom.css +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/.vitepress/theme/index.js +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/commands/example-uris.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/commands/ingest.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/getting-started/core-concepts.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/getting-started/incremental-loading.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/getting-started/quickstart.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/getting-started/telemetry.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/index.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/applovin_max.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/athena.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/clickhouse_img.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/cratedb-source.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/freshdesk_ingestion.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/gcp_spanner_ingestion.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/github.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/google_analytics_realtime_report.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/googleanalytics.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/ingestion_elasticsearch_img.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/kinesis.bigquery.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/linkedin_ads.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/mixpanel_ingestion.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/personio.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/personio_duckdb.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/phantombuster.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/pipedrive.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/quickbook_ingestion.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/sftp.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/stripe_postgres.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/media/tiktok.png +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/adjust.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/airtable.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/applovin.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/applovin_max.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/appsflyer.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/appstore.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/asana.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/athena.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/attio.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/bigquery.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/chess.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/clickhouse.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/csv.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/custom_queries.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/databricks.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/db2.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/duckdb.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/dynamodb.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/elasticsearch.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/facebook-ads.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/frankfurter.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/freshdesk.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/github.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/google-ads.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/google_analytics.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/gorgias.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/gsheets.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/hubspot.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/isoc-pulse.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/kafka.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/kinesis.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/klaviyo.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/linkedin_ads.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/mixpanel.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/mongodb.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/mssql.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/mysql.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/notion.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/oracle.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/personio.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/phantombuster.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/pinterest.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/pipedrive.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/postgres.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/quickbooks.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/redshift.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/s3.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/salesforce.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/sap-hana.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/sftp.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/shopify.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/slack.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/smartsheets.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/snowflake.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/solidgate.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/spanner.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/sqlite.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/stripe.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/tiktok-ads.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/trustpilot.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/supported-sources/zendesk.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/tutorials/load-kinesis-bigquery.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/tutorials/load-personio-duckdb.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/docs/tutorials/load-stripe-postgres.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/conftest.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/main.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/.gitignore +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/adjust/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/adjust/adjust_helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/airtable/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/applovin/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/applovin_max/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/appsflyer/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/appsflyer/client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/appstore/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/appstore/client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/appstore/errors.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/appstore/models.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/appstore/resources.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/arrow/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/asana_source/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/asana_source/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/asana_source/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/attio/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/attio/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/blob.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/chess/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/chess/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/chess/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/collector/spinner.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/dynamodb/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/elasticsearch/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/errors.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/facebook_ads/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/facebook_ads/exceptions.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/facebook_ads/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/facebook_ads/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/facebook_ads/utils.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/filesystem/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/filesystem/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/filesystem/readers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/filters.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/frankfurter/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/frankfurter/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/freshdesk/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/freshdesk/freshdesk_client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/freshdesk/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/github/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/github/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/github/queries.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/github/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_ads/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_ads/field.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_ads/metrics.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_ads/predicates.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_ads/reports.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_analytics/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_analytics/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_sheets/README.md +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_sheets/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_sheets/helpers/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_sheets/helpers/api_calls.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/google_sheets/helpers/data_processing.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/gorgias/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/gorgias/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/http_client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/hubspot/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/hubspot/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/hubspot/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/kafka/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/kafka/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/kinesis/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/kinesis/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/klaviyo/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/klaviyo/client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/klaviyo/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/linkedin_ads/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/linkedin_ads/dimension_time_enum.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/linkedin_ads/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/loader.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/mixpanel/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/mixpanel/client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/mongodb/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/mongodb/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/notion/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/notion/helpers/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/notion/helpers/client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/notion/helpers/database.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/notion/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/partition.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/personio/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/personio/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/phantombuster/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/phantombuster/client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/pinterest/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/pipedrive/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/pipedrive/helpers/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/pipedrive/helpers/custom_fields_munger.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/pipedrive/helpers/pages.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/pipedrive/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/pipedrive/typing.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/quickbooks/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/resource.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/salesforce/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/salesforce/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/shopify/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/shopify/exceptions.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/shopify/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/shopify/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/slack/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/slack/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/slack/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/smartsheets/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/solidgate/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/solidgate/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/sql_database/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/sql_database/callbacks.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/stripe_analytics/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/stripe_analytics/helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/stripe_analytics/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/table_definition.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/telemetry/event.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/testdata/fakebqcredentials.json +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/tiktok_ads/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/tiktok_ads/tiktok_helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/time.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/trustpilot/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/trustpilot/client.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/version.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/zendesk/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/zendesk/helpers/__init__.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/zendesk/helpers/api_helpers.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/zendesk/helpers/credentials.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/zendesk/helpers/talk_api.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/src/zendesk/settings.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/testdata/.gitignore +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/testdata/create_replace.csv +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/testdata/delete_insert_expected.csv +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/testdata/delete_insert_part1.csv +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/testdata/delete_insert_part2.csv +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/testdata/merge_expected.csv +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/testdata/merge_part1.csv +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/testdata/merge_part2.csv +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/ingestr/tests/unit/test_smartsheets.py +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/package-lock.json +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/package.json +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/pyproject.toml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/requirements-dev.txt +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/resources/demo.gif +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/resources/demo.tape +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/resources/ingestr.svg +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/AMPM.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Acronyms.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Colons.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Contractions.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/DateFormat.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Ellipses.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/EmDash.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Exclamation.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/FirstPerson.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Gender.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/GenderBias.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/HeadingPunctuation.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Headings.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Latin.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/LyHyphens.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/OptionalPlurals.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Ordinal.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/OxfordComma.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Parens.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Passive.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Periods.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Quotes.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Ranges.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Semicolons.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Slang.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Spacing.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Spelling.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Units.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/We.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/Will.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/WordList.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/meta.json +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/Google/vocab.txt +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/bruin/Ingestr.yml +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/styles/config/vocabularies/bruin/accept.txt +0 -0
- {ingestr-0.13.60 → ingestr-0.13.61}/test.env.template +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ingestr
-Version: 0.13.60
+Version: 0.13.61
 Summary: ingestr is a command-line application that ingests data from various sources and stores them in any database.
 Project-URL: Homepage, https://github.com/bruin-data/ingestr
 Project-URL: Issues, https://github.com/bruin-data/ingestr/issues
@@ -47,6 +47,7 @@ Requires-Dist: databricks-sqlalchemy==1.0.2
 Requires-Dist: dataclasses-json==0.6.7
 Requires-Dist: decorator==5.2.1
 Requires-Dist: deprecation==2.1.0
+Requires-Dist: dlt-cratedb==0.0.1
 Requires-Dist: dlt==1.10.0
 Requires-Dist: dnspython==2.7.0
 Requires-Dist: duckdb-engine==0.17.0
@@ -305,7 +306,7 @@ Pull requests are welcome. However, please open an issue first to discuss what y
 <tr>
 <td>CrateDB</td>
 <td>✅</td>
-<td
+<td>✅</td>
 </tr>
 <tr>
 <td>Databricks</td>
Binary file (docs/media/cratedb-destination.png)
@@ -4,7 +4,7 @@
 massive amounts of data in near real-time, even with complex queries. It is
 PostgreSQL-compatible, and based on Lucene.
 
-ingestr supports CrateDB as a source database.
+ingestr supports CrateDB as a source and destination database.
 
 ## Source
 
@@ -56,6 +56,61 @@ duckdb cratedb.duckdb 'SELECT * FROM dest.summits LIMIT 5'
 
 <img alt="CrateDB_img" src="../media/cratedb-source.png" />
 
+## Destination
+
+For connecting to CrateDB as a database destination, ingestr uses the
+[dlt cratedb adapter], which is based on the [dlt postgres adapter],
+in turn using the [psycopg2] package.
+
+### URI format
+
+The URI format for CrateDB as a destination is as follows:
+```plaintext
+cratedb://<username>:<password>@<host>:<port>?sslmode=<sslmode>
+```
+> [!INFO]
+> When connecting to CrateDB on localhost, use:
+> ```plaintext
+> cratedb://crate:@localhost:5432?sslmode=disable
+> ```
+>
+> When connecting to [CrateDB Cloud], the URI looks like this:
+> ```plaintext
+> cratedb://admin:<PASSWORD>@<CLUSTERNAME>.eks1.eu-west-1.aws.cratedb.net:5432?sslmode=require
+> ```
+
+### URI parameters
+- `username` (required): The username is required to authenticate with the CrateDB server.
+- `password` (required): The password is required to authenticate the provided username.
+- `host` (required): The hostname or IP address of the CrateDB server where the database is hosted.
+- `port` (required): The TCP port number used by the CrateDB server. Mostly `5432`.
+- `sslmode` (optional): Set to one of `disable`, `allow`, `prefer`, `require`, `verify-ca`,
+  or `verify-full`, see [PostgreSQL SSL Mode Descriptions].
+
+### Example
+
+This is an example command that will import a CSV file into CrateDB,
+then display the content from CrateDB.
+
+```shell
+wget -O input.csv https://github.com/bruin-data/ingestr/raw/refs/heads/main/ingestr/testdata/create_replace.csv
+```
+```shell
+ingestr ingest \
+    --source-uri 'csv://input.csv' \
+    --source-table 'sample' \
+    --dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \
+    --dest-table 'doc.sample'
+```
+```shell
+uvx crash -c 'SELECT * FROM doc.sample'
+```
+
+<img alt="CrateDB_img" src="../media/cratedb-destination.png" />
+
+> [!WARNING]
+> CrateDB supports the `replace` incremental materialization strategy, but
+> currently does not support the `delete+insert`, `merge`, or `scd2` strategies.
 
 ## Appendix
 
@@ -75,6 +130,10 @@ or share relevant issue reports that help us improve interoperability. Thanks!
 
 [CrateDB]: https://github.com/crate/crate
 [CrateDB Cloud]: https://console.cratedb.cloud/
+[dlt cratedb adapter]: https://github.com/dlt-hub/dlt/pull/2733
+[dlt postgres adapter]: https://github.com/dlt-hub/dlt/tree/devel/dlt/destinations/impl/postgres
+[PostgreSQL SSL Mode Descriptions]: https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-SSLMODE-STATEMENTS
+[psycopg2]: https://pypi.org/project/psycopg2-binary/
 [sqlalchemy-cratedb]: https://pypi.org/project/sqlalchemy-cratedb/
 [Support for ingestr/CrateDB]: https://github.com/crate/crate-clients-tools/issues/86
 ["tool: dlt/ingestr"]: https://github.com/crate/crate/issues?q=state%3Aopen%20label%3A%22tool%3A%20dlt%2Fingestr%22
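The destination section added above leans on CrateDB's PostgreSQL wire compatibility (via psycopg2). As a quick illustration of that claim, a hypothetical check of the example load from Python could look like the sketch below; the host, user, and table mirror the example command and are not part of the release.

```python
# Hypothetical verification of the example load above; assumes a local CrateDB
# node on the PostgreSQL wire port with the default "crate" user and the
# doc.sample table created by the ingestr command.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432, user="crate", password="")
with conn.cursor() as cur:
    cur.execute("REFRESH TABLE doc.sample")  # make recent writes visible
    cur.execute("SELECT COUNT(*) FROM doc.sample")
    print("rows loaded:", cur.fetchone()[0])
conn.close()
```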
@@ -0,0 +1,169 @@
+# Google Cloud Storage
+
+[Google Cloud Storage](https://cloud.google.com/storage?hl=en) is an online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. The service combines the performance and scalability of Google's cloud with advanced security and sharing capabilities. It is an Infrastructure as a Service (IaaS), comparable to Amazon S3.
+
+`ingestr` supports Google Cloud Storage as both a data source and destination.
+
+## URI format
+
+The URI format for Google Cloud Storage is as follows:
+
+```plaintext
+gs://?credentials_path=/path/to/service-account.json
+```
+
+URI parameters:
+
+- `credentials_path`: path to file containing your Google Cloud [Service Account](https://cloud.google.com/iam/docs/service-account-overview)
+- `credentials_base64`: base64-encoded service account JSON (alternative to credentials_path)
+- `layout`: Layout template (optional, destination only)
+
+The `--source-table` must be in the format:
+```
+{bucket name}/{file glob}
+```
+
+## Setting up a GCS Integration
+
+To use Google Cloud Storage source in `ingestr`, you will need:
+* A Google Cloud Project.
+* A Service Account with at least [roles/storage.objectUser](https://cloud.google.com/storage/docs/access-control/iam-roles) IAM permission for reading, or [roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles) for writing to GCS.
+* A Service Account key file for the corresponding service account.
+
+For more information on how to create a Service Account or its keys, see [Create service accounts](https://cloud.google.com/iam/docs/service-accounts-create) and [Create or delete service account keys](https://cloud.google.com/iam/docs/keys-create-delete) on Google Cloud docs.
+
+## Example: Loading data from GCS
+
+Let's assume that:
+* Service account key is available in the current directory, under the filename `service_account.json`.
+* The bucket you want to load data from is called `my-org-bucket`
+* The source file is available at `data/latest/dump.csv`
+* The data needs to be saved in a DuckDB database called `local.db`
+* The destination table name will be `public.latest_dump`
+
+You can run the following command line to achieve this:
+
+```sh
+ingestr ingest \
+    --source-uri "gs://?credentials_path=$PWD/service_account.json" \
+    --source-table "my-org-bucket/data/latest/dump.csv" \
+    --dest-uri "duckdb:///local.db" \
+    --dest-table "public.latest_dump"
+```
+
+## Example: Uploading data to GCS
+
+For this example, we'll assume that:
+* `records.db` is a DuckDB database.
+* It has a table called `public.users`.
+* The service account key is available in the current directory.
+
+The following command demonstrates how to copy data from a local DuckDB database to GCS:
+```sh
+ingestr ingest \
+    --source-uri 'duckdb:///records.db' \
+    --source-table 'public.users' \
+    --dest-uri "gs://?credentials_path=$PWD/service_account.json" \
+    --dest-table 'my-org-bucket/records'
+```
+
+This will result in a file structure like the following:
+```
+my-org-bucket/
+└── records
+    ├── _dlt_loads
+    ├── _dlt_pipeline_state
+    ├── _dlt_version
+    └── users
+        └── <load_id>.<file_id>.parquet
+```
+
+The value of `load_id` and `file_id` is determined at runtime. The default layout creates a folder with the same table name as the source and places the data inside a parquet file. This layout is configurable using the `layout` parameter.
+
+For example, if you would like to create a parquet file with the same name as the source table (as opposed to a folder) you can set `layout` to `{table_name}.{ext}` in the command line above:
+
+```sh
+ingestr ingest \
+    --source-uri 'duckdb:///records.db' \
+    --source-table 'public.users' \
+    --dest-uri "gs://?layout={table_name}.{ext}&credentials_path=$PWD/service_account.json" \
+    --dest-table 'my-org-bucket/records'
+```
+
+Result:
+```
+my-org-bucket/
+└── records
+    ├── _dlt_loads
+    ├── _dlt_pipeline_state
+    ├── _dlt_version
+    └── users.parquet
+```
+
+List of available Layout variables is available [here](https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#available-layout-placeholders)
+
+## Supported File Formats
+`gs` source only supports loading files in the following formats:
+* `csv`: Comma Separated Values
+* `parquet`: [Apache Parquet](https://parquet.apache.org/) storage format.
+* `jsonl`: Line delimited JSON. see [https://jsonlines.org/](https://jsonlines.org/)
+
+::: info NOTE
+When writing to GCS, only `parquet` is supported.
+:::
+## File Pattern
+`ingestr` supports [glob](https://en.wikipedia.org/wiki/Glob_(programming)) like pattern matching for `gs` source.
+This allows for a powerful pattern matching mechanism that allows you to specify multiple files in a single `--source-table`.
+
+Below are some examples of path patterns, each path pattern is glob you can specify after the bucket name:
+
+- `**/*.csv`: Retrieves all the CSV files, regardless of how deep they are within the folder structure.
+- `*.csv`: Retrieves all the CSV files from the first level of a folder.
+- `myFolder/**/*.jsonl`: Retrieves all the JSONL files from anywhere under `myFolder`.
+- `myFolder/mySubFolder/users.parquet`: Retrieves the `users.parquet` file from `mySubFolder`.
+- `employees.jsonl`: Retrieves the `employees.jsonl` file from the root level of the bucket.
+
+### Working with compressed files
+
+`ingestr` automatically detects and handles gzipped files in your GCS bucket. You can load data from compressed files with the `.gz` extension without any additional configuration.
+
+For example, to load data from a gzipped CSV file:
+
+```sh
+ingestr ingest \
+    --source-uri "gs://?credentials_path=$PWD/service_account.json" \
+    --source-table "my-org-bucket/logs/event-data.csv.gz" \
+    --dest-uri "duckdb:///compressed_data.duckdb" \
+    --dest-table "logs.events"
+```
+
+You can also use glob patterns to load multiple compressed files:
+
+```sh
+ingestr ingest \
+    --source-uri "gs://?credentials_path=$PWD/service_account.json" \
+    --source-table "my-org-bucket/logs/**/*.csv.gz" \
+    --dest-uri "duckdb:///compressed_data.duckdb" \
+    --dest-table "logs.events"
+```
+
+### File type hinting
+
+If your files are properly encoded but lack the correct file extension (CSV, JSONL, or Parquet), you can provide a file type hint to inform `ingestr` about the format of the files. This is done by appending a fragment identifier (`#format`) to the end of the path in your `--source-table` parameter.
+
+For example, if you have JSONL-formatted log files stored in GCS with a non-standard extension:
+
+```
+--source-table "my-org-bucket/logs/event-data#jsonl"
+```
+
+This tells `ingestr` to process the files as JSONL, regardless of their actual extension.
+
+Supported format hints include:
+- `#csv` - For comma-separated values files
+- `#jsonl` - For line-delimited JSON files
+- `#parquet` - For Parquet format files
+
+::: tip
+File type hinting works with `gzip` compressed files as well.
+:::
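The new page documents a `credentials_base64` parameter as an alternative to `credentials_path`, but does not show how to produce the value. A small sketch, assuming a local `service_account.json` (file name and bucket are illustrative):

```python
# Sketch: turn a service-account key file into a credentials_base64 value and
# build the corresponding gs:// source URI. File name and bucket are illustrative.
import base64
from urllib.parse import quote

with open("service_account.json", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

source_uri = f"gs://?credentials_base64={quote(encoded)}"   # pass as --source-uri
source_table = "my-org-bucket/data/latest/dump.csv"         # pass as --source-table
print(source_uri[:40] + "...", source_table)
```

URL-quoting the payload keeps `+`, `/`, and `=` characters from being mangled inside the query string.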
@@ -0,0 +1 @@
+version = "v0.13.61"
@@ -1,3 +1,4 @@
+import abc
 import base64
 import csv
 import json
@@ -9,6 +10,7 @@ from urllib.parse import parse_qs, quote, urlparse
 import dlt
 import dlt.destinations.impl.filesystem.filesystem
 from dlt.common.configuration.specs import AwsCredentials
+from dlt.common.storages.configuration import FileSystemCredentials
 from dlt.destinations.impl.clickhouse.configuration import (
     ClickHouseCredentials,
 )
@@ -111,6 +113,14 @@ class BigQueryDestination:
     pass
 
 
+class CrateDBDestination(GenericSqlDestination):
+    def dlt_dest(self, uri: str, **kwargs):
+        uri = uri.replace("cratedb://", "postgres://")
+        import dlt_cratedb.impl.cratedb.factory
+
+        return dlt_cratedb.impl.cratedb.factory.cratedb(credentials=uri, **kwargs)
+
+
 class PostgresDestination(GenericSqlDestination):
     def dlt_dest(self, uri: str, **kwargs):
         return dlt.destinations.postgres(credentials=uri, **kwargs)
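The new `CrateDBDestination` does little beyond rewriting the URI scheme before handing it to the `dlt-cratedb` factory, since the adapter builds on dlt's postgres credentials. A standalone sketch of that rewrite with an illustrative URI:

```python
# Sketch of the scheme rewrite done in CrateDBDestination.dlt_dest; the URI is
# illustrative. The postgres-style credentials parser used downstream does not
# know the cratedb:// scheme, hence the replace().
uri = "cratedb://crate:@localhost:5432/?sslmode=disable"
rewritten = uri.replace("cratedb://", "postgres://")
assert rewritten == "postgres://crate:@localhost:5432/?sslmode=disable"
print(rewritten)
```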
@@ -386,43 +396,62 @@ class ClickhouseDestination:
     pass
 
 
-class
+class BlobFSClient(dlt.destinations.impl.filesystem.filesystem.FilesystemClient):
     @property
     def dataset_path(self):
         # override to remove dataset path
         return self.bucket_path
 
 
-class
+class BlobFS(dlt.destinations.filesystem):
     @property
     def client_class(self):
-        return
+        return BlobFSClient
 
 
-class
+class SqliteDestination(GenericSqlDestination):
     def dlt_dest(self, uri: str, **kwargs):
-
-        params = parse_qs(parsed_uri.query)
+        return dlt.destinations.sqlalchemy(credentials=uri)
 
-
-
-
+    def dlt_run_params(self, uri: str, table: str, **kwargs):
+        return {
+            # https://dlthub.com/docs/dlt-ecosystem/destinations/sqlalchemy#dataset-files
+            "dataset_name": "main",
+            "table_name": table,
+        }
 
-        secret_access_key = params.get("secret_access_key", [None])[0]
-        if secret_access_key is None:
-            raise MissingValueError("secret_access_key", "S3")
 
-
-
-
-        if not parsed_endpoint.scheme or not parsed_endpoint.netloc:
-            raise ValueError("Invalid endpoint_url. Must be a valid URL.")
+class MySqlDestination(GenericSqlDestination):
+    def dlt_dest(self, uri: str, **kwargs):
+        return dlt.destinations.sqlalchemy(credentials=uri)
 
-
-
-
-
-
+    def dlt_run_params(self, uri: str, table: str, **kwargs):
+        parsed = urlparse(uri)
+        database = parsed.path.lstrip("/")
+        if not database:
+            raise ValueError("You need to specify a database")
+        return {
+            "dataset_name": database,
+            "table_name": table,
+        }
+
+
+class BlobStorageDestination(abc.ABC):
+    @abc.abstractmethod
+    def credentials(self, params: dict) -> FileSystemCredentials:
+        """Build credentials for the blob storage destination."""
+        pass
+
+    @property
+    @abc.abstractmethod
+    def protocol(self) -> str:
+        """The protocol used for the blob storage destination."""
+        pass
+
+    def dlt_dest(self, uri: str, **kwargs):
+        parsed_uri = urlparse(uri)
+        params = parse_qs(parsed_uri.query)
+        creds = self.credentials(params)
 
         dest_table = kwargs["dest_table"]
 
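The hunk above restructures the filesystem-style destinations around an abstract `BlobStorageDestination`: subclasses only declare a `protocol` and build `credentials`, while URI parsing and the dlt filesystem wiring stay shared. A hypothetical third subclass (not in the release, and assuming dlt's `AzureCredentials` spec) illustrates the extension point:

```python
# Hypothetical subclass of the new BlobStorageDestination base class; it is not
# part of ingestr 0.13.61 and reuses names (MissingValueError, the base class)
# defined elsewhere in destinations.py. AzureCredentials is dlt's Azure spec.
from dlt.common.configuration.specs import AzureCredentials
from dlt.common.storages.configuration import FileSystemCredentials


class AzureBlobDestination(BlobStorageDestination):
    @property
    def protocol(self) -> str:
        return "az"

    def credentials(self, params: dict) -> FileSystemCredentials:
        # parse_qs() values are lists, hence the [0] indexing.
        account_name = params.get("account_name", [None])[0]
        account_key = params.get("account_key", [None])[0]
        if account_name is None or account_key is None:
            raise MissingValueError("account_name and account_key", "Azure Blob")
        return AzureCredentials(
            azure_storage_account_name=account_name,
            azure_storage_account_key=account_key,
        )
```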
@@ -442,7 +471,7 @@ class S3Destination:
         base_path = "/".join(table_parts[:-1])
 
         opts = {
-            "bucket_url": f"
+            "bucket_url": f"{self.protocol}://{base_path}",
             "credentials": creds,
             # supresses dlt warnings about dataset name normalization.
             # we don't use dataset names in S3 so it's fine to disable this.
@@ -452,7 +481,7 @@ class S3Destination:
         if layout is not None:
             opts["layout"] = layout
 
-        return
+        return BlobFS(**opts)  # type: ignore
 
     def validate_table(self, table: str):
         table = table.strip("/ ")
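In the shared `dlt_dest` path, the `--dest-table` value supplies the bucket URL: everything up to the last `/` becomes the base path, the subclass's `protocol` supplies the scheme, and the final segment presumably ends up as the table folder. A small sketch with an illustrative destination table:

```python
# Sketch of how a --dest-table value maps onto the filesystem bucket_url;
# "my-org-bucket/records" is illustrative, "gs" would come from GCSDestination.protocol.
dest_table = "my-org-bucket/records"
table_parts = dest_table.split("/")
base_path = "/".join(table_parts[:-1])        # "my-org-bucket"
table_name = table_parts[-1]                  # "records" (presumed table/folder name)
bucket_url = f"gs://{base_path}"
print(bucket_url, table_name)                 # gs://my-org-bucket records
```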
@@ -470,28 +499,56 @@ class S3Destination:
     pass
 
 
-class
-
-
+class S3Destination(BlobStorageDestination):
+    @property
+    def protocol(self) -> str:
+        return "s3"
 
-    def
-
-
-            "
-            "table_name": table,
-        }
+    def credentials(self, params: dict) -> FileSystemCredentials:
+        access_key_id = params.get("access_key_id", [None])[0]
+        if access_key_id is None:
+            raise MissingValueError("access_key_id", "S3")
 
+        secret_access_key = params.get("secret_access_key", [None])[0]
+        if secret_access_key is None:
+            raise MissingValueError("secret_access_key", "S3")
 
-
-
-
+        endpoint_url = params.get("endpoint_url", [None])[0]
+        if endpoint_url is not None:
+            parsed_endpoint = urlparse(endpoint_url)
+            if not parsed_endpoint.scheme or not parsed_endpoint.netloc:
+                raise ValueError("Invalid endpoint_url. Must be a valid URL.")
 
-
-
-
-
-
-
-
-
-
+        return AwsCredentials(
+            aws_access_key_id=access_key_id,
+            aws_secret_access_key=secret_access_key,
+            endpoint_url=endpoint_url,
+        )
+
+
+class GCSDestination(BlobStorageDestination):
+    @property
+    def protocol(self) -> str:
+        return "gs"
+
+    def credentials(self, params: dict) -> FileSystemCredentials:
+        """Builds GCS credentials from the provided parameters."""
+        credentials_path = params.get("credentials_path")
+        credentials_base64 = params.get("credentials_base64")
+        credentials_available = any(
+            map(
+                lambda x: x is not None,
+                [credentials_path, credentials_base64],
+            )
+        )
+        if credentials_available is False:
+            raise MissingValueError("credentials_path or credentials_base64", "GCS")
+
+        credentials = None
+        if credentials_path:
+            with open(credentials_path[0], "r") as f:
+                credentials = json.load(f)
+        else:
+            credentials = json.loads(base64.b64decode(credentials_base64[0]).decode())  # type: ignore
+
+        return credentials
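`GCSDestination.credentials` receives query parameters already run through `parse_qs`, so every value arrives as a list; that is why the code indexes `credentials_path[0]` and `credentials_base64[0]`. A reduced sketch of the same path-or-base64 resolution, runnable on its own:

```python
# Reduced sketch of the credential resolution in GCSDestination.credentials;
# not the shipped code, just the same decision expressed standalone.
import base64
import json
from urllib.parse import parse_qs, urlparse

params = parse_qs(urlparse("gs://?credentials_path=/tmp/sa.json").query)
print(params)  # {'credentials_path': ['/tmp/sa.json']} -- values are lists


def resolve_credentials(params: dict) -> dict:
    if params.get("credentials_path"):
        with open(params["credentials_path"][0], "r") as f:
            return json.load(f)
    if params.get("credentials_base64"):
        return json.loads(base64.b64decode(params["credentials_base64"][0]).decode())
    raise ValueError("credentials_path or credentials_base64 is required for GCS")


# Inline base64 payload, bypassing the filesystem entirely:
encoded = base64.b64encode(b'{"type": "service_account"}').decode()
print(resolve_credentials({"credentials_base64": [encoded]}))
```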
@@ -7,9 +7,11 @@ from ingestr.src.destinations import (
     AthenaDestination,
     BigQueryDestination,
     ClickhouseDestination,
+    CrateDBDestination,
     CsvDestination,
     DatabricksDestination,
     DuckDBDestination,
+    GCSDestination,
     MsSQLDestination,
     MySqlDestination,
     PostgresDestination,
@@ -181,6 +183,7 @@ class SourceDestinationFactory:
     }
     destinations: Dict[str, Type[DestinationProtocol]] = {
         "bigquery": BigQueryDestination,
+        "cratedb": CrateDBDestination,
         "databricks": DatabricksDestination,
         "duckdb": DuckDBDestination,
         "mssql": MsSQLDestination,
@@ -197,6 +200,7 @@ class SourceDestinationFactory:
         "clickhouse+native": ClickhouseDestination,
         "clickhouse": ClickhouseDestination,
         "s3": S3Destination,
+        "gs": GCSDestination,
         "sqlite": SqliteDestination,
         "mysql": MySqlDestination,
         "mysql+pymysql": MySqlDestination,
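With the two registrations above, picking a destination stays a plain scheme-to-class lookup keyed on the URI scheme. A reduced sketch of that resolution (only the schemes visible in this diff are shown, and the real factory returns the classes imported earlier rather than strings):

```python
# Reduced sketch of the factory lookup; the mapping is trimmed to the schemes
# visible in this diff and uses strings so the snippet runs standalone.
from urllib.parse import urlparse

destinations = {
    "cratedb": "CrateDBDestination",
    "s3": "S3Destination",
    "gs": "GCSDestination",
}


def resolve(dest_uri: str) -> str:
    scheme = urlparse(dest_uri).scheme
    try:
        return destinations[scheme]
    except KeyError:
        raise ValueError(f"unsupported destination scheme: {scheme}")


print(resolve("cratedb://crate:@localhost:5432?sslmode=disable"))  # CrateDBDestination
print(resolve("gs://?credentials_path=/tmp/sa.json"))              # GCSDestination
```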
@@ -1885,7 +1885,7 @@ class GCSSource:
             endpoint = blob.parse_endpoint(path_to_file)
         except blob.UnsupportedEndpointError:
             raise ValueError(
-                "
+                "GCS Source only supports specific formats files: csv, jsonl, parquet"
             )
         except Exception as e:
             raise ValueError(
@@ -98,6 +98,10 @@ decorator==5.2.1
 deprecation==2.1.0
     # via rudder-sdk-python
 dlt==1.10.0
+    # via
+    #   -r requirements.in
+    #   dlt-cratedb
+dlt-cratedb==0.0.1
     # via -r requirements.in
 dnspython==2.7.0
     # via pymongo
@@ -363,7 +367,9 @@ protobuf==4.25.6
 psutil==6.1.1
     # via -r requirements.in
 psycopg2-binary==2.9.10
-    # via
+    # via
+    #   -r requirements.in
+    #   dlt
 py-machineid==0.6.0
     # via -r requirements.in
 pyairtable==2.3.3
@@ -98,6 +98,10 @@ decorator==5.2.1
 deprecation==2.1.0
     # via rudder-sdk-python
 dlt==1.10.0
+    # via
+    #   -r requirements.in
+    #   dlt-cratedb
+dlt-cratedb==0.0.1
     # via -r requirements.in
 dnspython==2.7.0
     # via pymongo
@@ -357,7 +361,9 @@ protobuf==4.25.8
 psutil==6.1.1
     # via -r requirements.in
 psycopg2-binary==2.9.10
-    # via
+    # via
+    #   -r requirements.in
+    #   dlt
 py-machineid==0.6.0
     # via -r requirements.in
 pyairtable==2.3.3
@@ -1,66 +0,0 @@
-# Google Cloud Storage
-
-[Google Cloud Storage](https://cloud.google.com/storage?hl=en) is an online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. The service combines the performance and scalability of Google's cloud with advanced security and sharing capabilities. It is an Infrastructure as a Service (IaaS), comparable to Amazon S3.
-
-## URI format
-
-The URI format for Google Cloud Storage is as follows:
-
-```plaintext
-gs://?credentials_path=/path/to/service-account.json>
-```
-
-URI parameters:
-
-- `credentials_path`: path to file containing your Google Cloud [Service Account](https://cloud.google.com/iam/docs/service-account-overview)
-
-The `--source-table` must be in the format:
-```
-{bucket name}/{file glob}
-```
-
-## Setting up a GCS Integration
-
-To use Google Cloud Storage source in `ingestr`, you will need:
-* A Google Cloud Project.
-* A Service Account with atleast [roles/storage.objectUser](https://cloud.google.com/storage/docs/access-control/iam-roles) IAM permission.
-* A Service Account key file for the corresponding service account.
-
-For more information on how to create a Service Account or it's keys, see [Create service accounts](https://cloud.google.com/iam/docs/service-accounts-create) and [Create or delete service account keys](https://cloud.google.com/iam/docs/keys-create-delete) on Google Cloud docs.
-
-## Example
-
-Let's assume that:
-* Service account key in available in the current directory, under the filename `service_account.json`.
-* The bucket you want to load data from is called `my-org-bucket`
-* The source file is available at `data/latest/dump.csv`
-* The data needs to be saved in a DuckDB database called `local.db`
-* The destination table name will be `public.latest_dump`
-
-You can run the following command line to achieve this:
-
-```sh
-ingestr ingest \
-    --source-uri "gs://?credentials_path=$PWD/service_account.json" \
-    --source-table "my-org-bucket/data/latest/dump.csv" \
-    --dest-uri "duckdb:///local.db" \
-    --dest-table "public.latest_dump"
-```
-
-## Supported File Formats
-`gs` source only supports loading files in the following formats:
-* `csv`: Comma Separated Values (supports Tab Separated Values as well)
-* `parquet`: [Apache Parquet](https://parquet.apache.org/) storage format.
-* `jsonl`: Line delimited JSON. see [https://jsonlines.org/](https://jsonlines.org/)
-
-## File Pattern
-`ingestr` supports [glob](https://en.wikipedia.org/wiki/Glob_(programming)) like pattern matching for `gs` source.
-This allows for a powerful pattern matching mechanism that allows you to specify multiple files in a single `--source-table`.
-
-Below are some examples of path patterns, each path pattern is glob you can specify after the bucket name:
-
-- `**/*.csv`: Retrieves all the CSV files, regardless of how deep they are within the folder structure.
-- `*.csv`: Retrieves all the CSV files from the first level of a folder.
-- `myFolder/**/*.jsonl`: Retrieves all the JSONL files from anywhere under `myFolder`.
-- `myFolder/mySubFolder/users.parquet`: Retrieves the `users.parquet` file from `mySubFolder`.
-- `employees.jsonl`: Retrieves the `employees.jsonl` file from the root level of the bucket.
@@ -1 +0,0 @@
-version = "v0.13.60"
File without changes