ingestr 0.13.87__tar.gz → 0.13.89__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of ingestr might be problematic.

Files changed (336)
  1. {ingestr-0.13.87 → ingestr-0.13.89}/PKG-INFO +2 -2
  2. {ingestr-0.13.87 → ingestr-0.13.89}/docs/.vitepress/config.mjs +1 -0
  3. {ingestr-0.13.87 → ingestr-0.13.89}/docs/commands/ingest.md +21 -0
  4. ingestr-0.13.89/docs/getting-started/data-masking.md +377 -0
  5. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/adjust.md +10 -8
  6. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/applovin.md +6 -6
  7. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/applovin_max.md +5 -1
  8. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/appsflyer.md +7 -3
  9. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/asana.md +11 -10
  10. ingestr-0.13.89/docs/supported-sources/attio.md +47 -0
  11. ingestr-0.13.89/docs/supported-sources/fluxx.md +119 -0
  12. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/salesforce.md +21 -19
  13. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/stripe.md +40 -39
  14. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/main.py +12 -0
  15. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/applovin/__init__.py +1 -1
  16. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/asana_source/__init__.py +1 -1
  17. ingestr-0.13.89/ingestr/src/buildinfo.py +1 -0
  18. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/destinations.py +37 -2
  19. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/filesystem/__init__.py +8 -3
  20. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/filters.py +9 -0
  21. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/frankfurter/__init__.py +10 -14
  22. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/frankfurter/helpers.py +2 -2
  23. ingestr-0.13.89/ingestr/src/masking.py +344 -0
  24. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/mongodb/helpers.py +11 -7
  25. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/revenuecat/__init__.py +4 -4
  26. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/revenuecat/helpers.py +4 -4
  27. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/salesforce/__init__.py +9 -8
  28. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/sources.py +1 -0
  29. {ingestr-0.13.87 → ingestr-0.13.89}/requirements.in +2 -2
  30. {ingestr-0.13.87 → ingestr-0.13.89}/requirements.txt +5 -11
  31. {ingestr-0.13.87 → ingestr-0.13.89}/requirements_arm64.txt +5 -11
  32. ingestr-0.13.87/docs/supported-sources/attio.md +0 -45
  33. ingestr-0.13.87/docs/supported-sources/fluxx.md +0 -116
  34. ingestr-0.13.87/ingestr/src/buildinfo.py +0 -1
  35. {ingestr-0.13.87 → ingestr-0.13.89}/.dlt/config.toml +0 -0
  36. {ingestr-0.13.87 → ingestr-0.13.89}/.dockerignore +0 -0
  37. {ingestr-0.13.87 → ingestr-0.13.89}/.githooks/pre-commit-hook.sh +0 -0
  38. {ingestr-0.13.87 → ingestr-0.13.89}/.github/workflows/deploy-docs.yml +0 -0
  39. {ingestr-0.13.87 → ingestr-0.13.89}/.github/workflows/release.yml +0 -0
  40. {ingestr-0.13.87 → ingestr-0.13.89}/.github/workflows/secrets-scan.yml +0 -0
  41. {ingestr-0.13.87 → ingestr-0.13.89}/.github/workflows/tests.yml +0 -0
  42. {ingestr-0.13.87 → ingestr-0.13.89}/.gitignore +0 -0
  43. {ingestr-0.13.87 → ingestr-0.13.89}/.gitleaksignore +0 -0
  44. {ingestr-0.13.87 → ingestr-0.13.89}/.python-version +0 -0
  45. {ingestr-0.13.87 → ingestr-0.13.89}/.vale.ini +0 -0
  46. {ingestr-0.13.87 → ingestr-0.13.89}/Dockerfile +0 -0
  47. {ingestr-0.13.87 → ingestr-0.13.89}/LICENSE.md +0 -0
  48. {ingestr-0.13.87 → ingestr-0.13.89}/Makefile +0 -0
  49. {ingestr-0.13.87 → ingestr-0.13.89}/README.md +0 -0
  50. {ingestr-0.13.87 → ingestr-0.13.89}/docs/.vitepress/theme/custom.css +0 -0
  51. {ingestr-0.13.87 → ingestr-0.13.89}/docs/.vitepress/theme/index.js +0 -0
  52. {ingestr-0.13.87 → ingestr-0.13.89}/docs/commands/example-uris.md +0 -0
  53. {ingestr-0.13.87 → ingestr-0.13.89}/docs/getting-started/core-concepts.md +0 -0
  54. {ingestr-0.13.87 → ingestr-0.13.89}/docs/getting-started/incremental-loading.md +0 -0
  55. {ingestr-0.13.87 → ingestr-0.13.89}/docs/getting-started/quickstart.md +0 -0
  56. {ingestr-0.13.87 → ingestr-0.13.89}/docs/getting-started/telemetry.md +0 -0
  57. {ingestr-0.13.87 → ingestr-0.13.89}/docs/index.md +0 -0
  58. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/applovin_max.png +0 -0
  59. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/athena.png +0 -0
  60. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/clickhouse_img.png +0 -0
  61. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/clickup_ingestion.png +0 -0
  62. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/cratedb-destination.png +0 -0
  63. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/cratedb-source.png +0 -0
  64. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/freshdesk_ingestion.png +0 -0
  65. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/gcp_spanner_ingestion.png +0 -0
  66. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/github.png +0 -0
  67. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/google_analytics_realtime_report.png +0 -0
  68. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/googleanalytics.png +0 -0
  69. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/ingestion_elasticsearch_img.png +0 -0
  70. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/kinesis.bigquery.png +0 -0
  71. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/linear.png +0 -0
  72. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/linkedin_ads.png +0 -0
  73. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/mixpanel_ingestion.png +0 -0
  74. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/personio.png +0 -0
  75. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/personio_duckdb.png +0 -0
  76. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/phantombuster.png +0 -0
  77. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/pipedrive.png +0 -0
  78. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/quickbook_ingestion.png +0 -0
  79. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/sftp.png +0 -0
  80. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/stripe_postgres.png +0 -0
  81. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/tiktok.png +0 -0
  82. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/wise_ingestion.png +0 -0
  83. {ingestr-0.13.87 → ingestr-0.13.89}/docs/media/zoom_ingestion.png +0 -0
  84. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/airtable.md +0 -0
  85. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/appstore.md +0 -0
  86. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/athena.md +0 -0
  87. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/bigquery.md +0 -0
  88. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/chess.md +0 -0
  89. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/clickhouse.md +0 -0
  90. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/clickup.md +0 -0
  91. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/cratedb.md +0 -0
  92. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/csv.md +0 -0
  93. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/custom_queries.md +0 -0
  94. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/databricks.md +0 -0
  95. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/db2.md +0 -0
  96. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/duckdb.md +0 -0
  97. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/dynamodb.md +0 -0
  98. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/elasticsearch.md +0 -0
  99. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/facebook-ads.md +0 -0
  100. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/frankfurter.md +0 -0
  101. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/freshdesk.md +0 -0
  102. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/gcs.md +0 -0
  103. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/github.md +0 -0
  104. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/google-ads.md +0 -0
  105. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/google_analytics.md +0 -0
  106. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/gorgias.md +0 -0
  107. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/gsheets.md +0 -0
  108. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/hubspot.md +0 -0
  109. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/influxdb.md +0 -0
  110. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/isoc-pulse.md +0 -0
  111. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/kafka.md +0 -0
  112. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/kinesis.md +0 -0
  113. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/klaviyo.md +0 -0
  114. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/linear.md +0 -0
  115. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/linkedin_ads.md +0 -0
  116. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/mixpanel.md +0 -0
  117. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/mongodb.md +0 -0
  118. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/motherduck.md +0 -0
  119. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/mssql.md +0 -0
  120. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/mysql.md +0 -0
  121. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/notion.md +0 -0
  122. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/oracle.md +0 -0
  123. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/personio.md +0 -0
  124. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/phantombuster.md +0 -0
  125. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/pinterest.md +0 -0
  126. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/pipedrive.md +0 -0
  127. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/postgres.md +0 -0
  128. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/quickbooks.md +0 -0
  129. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/redshift.md +0 -0
  130. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/revenuecat.md +0 -0
  131. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/s3.md +0 -0
  132. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/sap-hana.md +0 -0
  133. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/sftp.md +0 -0
  134. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/shopify.md +0 -0
  135. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/slack.md +0 -0
  136. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/smartsheets.md +0 -0
  137. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/snowflake.md +0 -0
  138. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/solidgate.md +0 -0
  139. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/spanner.md +0 -0
  140. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/sqlite.md +0 -0
  141. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/tiktok-ads.md +0 -0
  142. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/trustpilot.md +0 -0
  143. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/wise.md +0 -0
  144. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/zendesk.md +0 -0
  145. {ingestr-0.13.87 → ingestr-0.13.89}/docs/supported-sources/zoom.md +0 -0
  146. {ingestr-0.13.87 → ingestr-0.13.89}/docs/tutorials/load-kinesis-bigquery.md +0 -0
  147. {ingestr-0.13.87 → ingestr-0.13.89}/docs/tutorials/load-personio-duckdb.md +0 -0
  148. {ingestr-0.13.87 → ingestr-0.13.89}/docs/tutorials/load-stripe-postgres.md +0 -0
  149. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/conftest.py +0 -0
  150. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/.gitignore +0 -0
  151. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/adjust/__init__.py +0 -0
  152. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/adjust/adjust_helpers.py +0 -0
  153. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/airtable/__init__.py +0 -0
  154. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/applovin_max/__init__.py +0 -0
  155. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/appsflyer/__init__.py +0 -0
  156. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/appsflyer/client.py +0 -0
  157. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/appstore/__init__.py +0 -0
  158. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/appstore/client.py +0 -0
  159. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/appstore/errors.py +0 -0
  160. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/appstore/models.py +0 -0
  161. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/appstore/resources.py +0 -0
  162. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/arrow/__init__.py +0 -0
  163. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/asana_source/helpers.py +0 -0
  164. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/asana_source/settings.py +0 -0
  165. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/attio/__init__.py +0 -0
  166. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/attio/helpers.py +0 -0
  167. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/blob.py +0 -0
  168. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/chess/__init__.py +0 -0
  169. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/chess/helpers.py +0 -0
  170. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/chess/settings.py +0 -0
  171. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/clickup/__init__.py +0 -0
  172. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/clickup/helpers.py +0 -0
  173. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/collector/spinner.py +0 -0
  174. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/dynamodb/__init__.py +0 -0
  175. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/elasticsearch/__init__.py +0 -0
  176. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/errors.py +0 -0
  177. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/facebook_ads/__init__.py +0 -0
  178. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/facebook_ads/exceptions.py +0 -0
  179. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/facebook_ads/helpers.py +0 -0
  180. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/facebook_ads/settings.py +0 -0
  181. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/facebook_ads/utils.py +0 -0
  182. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/factory.py +0 -0
  183. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/filesystem/helpers.py +0 -0
  184. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/filesystem/readers.py +0 -0
  185. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/fluxx/__init__.py +0 -0
  186. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/fluxx/helpers.py +0 -0
  187. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/freshdesk/__init__.py +0 -0
  188. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/freshdesk/freshdesk_client.py +0 -0
  189. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/freshdesk/settings.py +0 -0
  190. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/github/__init__.py +0 -0
  191. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/github/helpers.py +0 -0
  192. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/github/queries.py +0 -0
  193. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/github/settings.py +0 -0
  194. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_ads/__init__.py +0 -0
  195. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_ads/field.py +0 -0
  196. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_ads/metrics.py +0 -0
  197. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_ads/predicates.py +0 -0
  198. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_ads/reports.py +0 -0
  199. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_analytics/__init__.py +0 -0
  200. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_analytics/helpers.py +0 -0
  201. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_sheets/README.md +0 -0
  202. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_sheets/__init__.py +0 -0
  203. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_sheets/helpers/__init__.py +0 -0
  204. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_sheets/helpers/api_calls.py +0 -0
  205. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/google_sheets/helpers/data_processing.py +0 -0
  206. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/gorgias/__init__.py +0 -0
  207. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/gorgias/helpers.py +0 -0
  208. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/http_client.py +0 -0
  209. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/hubspot/__init__.py +0 -0
  210. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/hubspot/helpers.py +0 -0
  211. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/hubspot/settings.py +0 -0
  212. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/influxdb/__init__.py +0 -0
  213. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/influxdb/client.py +0 -0
  214. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/isoc_pulse/__init__.py +0 -0
  215. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/kafka/__init__.py +0 -0
  216. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/kafka/helpers.py +0 -0
  217. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/kinesis/__init__.py +0 -0
  218. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/kinesis/helpers.py +0 -0
  219. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/klaviyo/__init__.py +0 -0
  220. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/klaviyo/client.py +0 -0
  221. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/klaviyo/helpers.py +0 -0
  222. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/linear/__init__.py +0 -0
  223. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/linear/helpers.py +0 -0
  224. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/linkedin_ads/__init__.py +0 -0
  225. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/linkedin_ads/dimension_time_enum.py +0 -0
  226. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/linkedin_ads/helpers.py +0 -0
  227. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/loader.py +0 -0
  228. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/mixpanel/__init__.py +0 -0
  229. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/mixpanel/client.py +0 -0
  230. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/mongodb/__init__.py +0 -0
  231. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/notion/__init__.py +0 -0
  232. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/notion/helpers/__init__.py +0 -0
  233. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/notion/helpers/client.py +0 -0
  234. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/notion/helpers/database.py +0 -0
  235. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/notion/settings.py +0 -0
  236. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/partition.py +0 -0
  237. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/personio/__init__.py +0 -0
  238. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/personio/helpers.py +0 -0
  239. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/phantombuster/__init__.py +0 -0
  240. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/phantombuster/client.py +0 -0
  241. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/pinterest/__init__.py +0 -0
  242. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/pipedrive/__init__.py +0 -0
  243. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/pipedrive/helpers/__init__.py +0 -0
  244. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/pipedrive/helpers/custom_fields_munger.py +0 -0
  245. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/pipedrive/helpers/pages.py +0 -0
  246. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/pipedrive/settings.py +0 -0
  247. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/pipedrive/typing.py +0 -0
  248. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/quickbooks/__init__.py +0 -0
  249. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/resource.py +0 -0
  250. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/salesforce/helpers.py +0 -0
  251. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/shopify/__init__.py +0 -0
  252. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/shopify/exceptions.py +0 -0
  253. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/shopify/helpers.py +0 -0
  254. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/shopify/settings.py +0 -0
  255. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/slack/__init__.py +0 -0
  256. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/slack/helpers.py +0 -0
  257. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/slack/settings.py +0 -0
  258. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/smartsheets/__init__.py +0 -0
  259. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/solidgate/__init__.py +0 -0
  260. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/solidgate/helpers.py +0 -0
  261. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/sql_database/__init__.py +0 -0
  262. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/sql_database/callbacks.py +0 -0
  263. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/stripe_analytics/__init__.py +0 -0
  264. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/stripe_analytics/helpers.py +0 -0
  265. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/stripe_analytics/settings.py +0 -0
  266. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/table_definition.py +0 -0
  267. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/telemetry/event.py +0 -0
  268. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/testdata/fakebqcredentials.json +0 -0
  269. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/tiktok_ads/__init__.py +0 -0
  270. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/tiktok_ads/tiktok_helpers.py +0 -0
  271. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/time.py +0 -0
  272. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/trustpilot/__init__.py +0 -0
  273. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/trustpilot/client.py +0 -0
  274. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/version.py +0 -0
  275. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/wise/__init__.py +0 -0
  276. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/wise/client.py +0 -0
  277. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/zendesk/__init__.py +0 -0
  278. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/zendesk/helpers/__init__.py +0 -0
  279. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/zendesk/helpers/api_helpers.py +0 -0
  280. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/zendesk/helpers/credentials.py +0 -0
  281. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/zendesk/helpers/talk_api.py +0 -0
  282. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/zendesk/settings.py +0 -0
  283. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/zoom/__init__.py +0 -0
  284. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/src/zoom/helpers.py +0 -0
  285. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/testdata/.gitignore +0 -0
  286. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/testdata/create_replace.csv +0 -0
  287. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/testdata/delete_insert_expected.csv +0 -0
  288. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/testdata/delete_insert_part1.csv +0 -0
  289. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/testdata/delete_insert_part2.csv +0 -0
  290. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/testdata/merge_expected.csv +0 -0
  291. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/testdata/merge_part1.csv +0 -0
  292. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/testdata/merge_part2.csv +0 -0
  293. {ingestr-0.13.87 → ingestr-0.13.89}/ingestr/tests/unit/test_smartsheets.py +0 -0
  294. {ingestr-0.13.87 → ingestr-0.13.89}/package-lock.json +0 -0
  295. {ingestr-0.13.87 → ingestr-0.13.89}/package.json +0 -0
  296. {ingestr-0.13.87 → ingestr-0.13.89}/pyproject.toml +0 -0
  297. {ingestr-0.13.87 → ingestr-0.13.89}/requirements-dev.txt +0 -0
  298. {ingestr-0.13.87 → ingestr-0.13.89}/resources/demo.gif +0 -0
  299. {ingestr-0.13.87 → ingestr-0.13.89}/resources/demo.tape +0 -0
  300. {ingestr-0.13.87 → ingestr-0.13.89}/resources/ingestr.svg +0 -0
  301. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/AMPM.yml +0 -0
  302. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Acronyms.yml +0 -0
  303. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Colons.yml +0 -0
  304. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Contractions.yml +0 -0
  305. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/DateFormat.yml +0 -0
  306. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Ellipses.yml +0 -0
  307. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/EmDash.yml +0 -0
  308. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Exclamation.yml +0 -0
  309. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/FirstPerson.yml +0 -0
  310. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Gender.yml +0 -0
  311. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/GenderBias.yml +0 -0
  312. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/HeadingPunctuation.yml +0 -0
  313. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Headings.yml +0 -0
  314. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Latin.yml +0 -0
  315. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/LyHyphens.yml +0 -0
  316. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/OptionalPlurals.yml +0 -0
  317. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Ordinal.yml +0 -0
  318. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/OxfordComma.yml +0 -0
  319. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Parens.yml +0 -0
  320. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Passive.yml +0 -0
  321. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Periods.yml +0 -0
  322. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Quotes.yml +0 -0
  323. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Ranges.yml +0 -0
  324. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Semicolons.yml +0 -0
  325. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Slang.yml +0 -0
  326. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Spacing.yml +0 -0
  327. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Spelling.yml +0 -0
  328. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Units.yml +0 -0
  329. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/We.yml +0 -0
  330. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/Will.yml +0 -0
  331. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/WordList.yml +0 -0
  332. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/meta.json +0 -0
  333. {ingestr-0.13.87 → ingestr-0.13.89}/styles/Google/vocab.txt +0 -0
  334. {ingestr-0.13.87 → ingestr-0.13.89}/styles/bruin/Ingestr.yml +0 -0
  335. {ingestr-0.13.87 → ingestr-0.13.89}/styles/config/vocabularies/bruin/accept.txt +0 -0
  336. {ingestr-0.13.87 → ingestr-0.13.89}/test.env.template +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: ingestr
- Version: 0.13.87
+ Version: 0.13.89
  Summary: ingestr is a command-line application that ingests data from various sources and stores them in any database.
  Project-URL: Homepage, https://github.com/bruin-data/ingestr
  Project-URL: Issues, https://github.com/bruin-data/ingestr/issues
@@ -42,7 +42,7 @@ Requires-Dist: confluent-kafka==2.8.0
  Requires-Dist: crate==2.0.0
  Requires-Dist: cryptography==44.0.2
  Requires-Dist: curlify==2.2.1
- Requires-Dist: databricks-sql-connector==2.9.3
+ Requires-Dist: databricks-sql-connector==4.0.5
  Requires-Dist: databricks-sqlalchemy==1.0.2
  Requires-Dist: dataclasses-json==0.6.7
  Requires-Dist: decorator==5.2.1
@@ -43,6 +43,7 @@ export default defineConfig({
  text: "Incremental Loading",
  link: "/getting-started/incremental-loading.md",
  },
+ { text: "Data Masking", link: "/getting-started/data-masking.md" },
  { text: "Telemetry", link: "/getting-started/telemetry.md" },
  ],
  },
@@ -28,6 +28,7 @@ ingestr ingest \
  - `--interval-end`: Sets the end of the interval for the incremental key. Defaults to `None`.
  - `--primary-key TEXT`: Specifies the primary key for the merge operation. Defaults to `None`.
  - `--columns <column_name>:<column_type>`: Specifies the columns to be ingested. Defaults to `None`.
+ - `--mask <column_name>:<algorithm>[:param]`: Applies data masking to specified columns. Can be used multiple times for different columns. See the [Data Masking](../getting-started/data-masking.md) documentation for available algorithms and usage examples. Defaults to `None`.
 
  The `interval-start` and `interval-end` options support various datetime formats, here are some examples:
  - `%Y-%m-%d`: `2023-01-31`
@@ -106,5 +107,25 @@ ingestr ingest
  --columns 'dt:date'
  ```
 
+ ### Ingesting with Data Masking
+
+ ```bash
+ ingestr ingest \
+ --source-uri 'postgresql://user:pass@localhost/customers' \
+ --source-table 'customer_data' \
+ --dest-uri 'duckdb:///masked_customers.db' \
+ --dest-table 'masked_customers' \
+ --mask 'email:hash' \
+ --mask 'phone:partial:3' \
+ --mask 'ssn:redact' \
+ --mask 'salary:round:5000'
+ ```
+
+ This example demonstrates masking sensitive customer data:
+ - Email addresses are hashed for consistent anonymization
+ - Phone numbers show only first and last 3 digits
+ - SSNs are completely redacted
+ - Salaries are rounded to nearest $5000
+
  > [!INFO]
  > For more examples, please refer to the specific platforms' documentation on the sidebar.
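The `--mask <column_name>:<algorithm>[:param]` value added in the hunk above has a simple colon-separated shape. The following is a minimal sketch of how such a value could be split into its parts. `MaskSpec` and `parse_mask_spec` are hypothetical names chosen for illustration; this is not the code shipped in `ingestr/src/masking.py`, and lower-casing the algorithm name is an assumption.

```python
from typing import NamedTuple, Optional


class MaskSpec(NamedTuple):
    """Parsed form of one --mask value (hypothetical type, for illustration)."""
    column: str
    algorithm: str
    param: Optional[str]


def parse_mask_spec(spec: str) -> MaskSpec:
    """Split 'column:algorithm[:param]' into its parts."""
    # Split at most twice so that a parameter may itself contain ':'.
    parts = spec.split(":", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected <column>:<algorithm>[:param], got {spec!r}")
    # Assumption: algorithm names are treated case-insensitively.
    column, algorithm = parts[0], parts[1].lower()
    param = parts[2] if len(parts) == 3 else None
    return MaskSpec(column, algorithm, param)


if __name__ == "__main__":
    for raw in ["email:hash", "phone:partial:3", "ssn:redact", "salary:round:5000"]:
        print(parse_mask_spec(raw))
```

Applied to the four flags in the example command, this yields one `(column, algorithm, param)` triple per `--mask` occurrence, with `param` left as `None` for `hash` and `redact`.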
@@ -0,0 +1,377 @@
1
+ # Data Masking
2
+
3
+ Data masking is a critical security feature that allows you to protect sensitive information while maintaining data utility for development, testing, and analytics purposes. ingestr provides comprehensive masking capabilities that can be applied to any column during the ingestion process.
4
+
5
+ ## Overview
6
+
7
+ Data masking transforms sensitive data into a protected format while preserving the structure and type of the original data. This is essential for:
8
+
9
+ - **Compliance** with regulations like GDPR, CCPA, HIPAA
10
+ - **Security** in development and testing environments
11
+ - **Privacy** protection in analytics and reporting
12
+ - **Data sharing** with third parties or external systems
13
+
14
+ ## Usage
15
+
16
+ Apply masking to specific columns using the `--mask` parameter:
17
+
18
+ ```bash
19
+ ingestr ingest \
20
+ --source-uri "postgres://user:pass@localhost/db" \
21
+ --source-table "users" \
22
+ --dest-uri "duckdb:///masked_data.db" \
23
+ --dest-table "masked_users" \
24
+ --mask "email:hash" \
25
+ --mask "ssn:partial:4" \
26
+ --mask "salary:round:1000"
27
+ ```
28
+
29
+ ### Format
30
+
31
+ ```
32
+ --mask <column_name>:<algorithm>[:<parameter>]
33
+ ```
34
+
35
+ - `column_name`: The name of the column to mask
36
+ - `algorithm`: The masking algorithm to apply
37
+ - `parameter`: Optional parameter for algorithms that require configuration
38
+
39
+ ## Masking Algorithms
40
+
41
+ ### Irreversible Masking
42
+
43
+ These algorithms permanently transform data in a way that cannot be reversed.
44
+
45
+ #### `hash` / `sha256`
46
+ Creates a SHA-256 hash of the value. Consistent across runs - the same input always produces the same output.
47
+
48
+ **Use cases:** Creating anonymous identifiers, consistent tokenization
49
+ ```bash
50
+ --mask "user_id:hash"
51
+ # john.doe@example.com → a94a8fe5ccb19ba61c4c0873d391e987982fbbd3
52
+ ```
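As a rough illustration of what a SHA-256 mask produces, the sketch below uses Python's standard `hashlib`. It assumes the value is UTF-8 encoded and hashed without a salt; the function name `sha256_mask` is hypothetical, and ingestr's actual implementation may differ.

```python
import hashlib


def sha256_mask(value: str) -> str:
    # Hash the UTF-8 bytes of the value; the hex digest is always 64 characters.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


digest = sha256_mask("john.doe@example.com")
print(len(digest))  # 64
```

Because no random salt is involved in this sketch, the same input always maps to the same digest, which is what makes the output usable as a consistent anonymous identifier.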
53
+
54
+ #### `md5`
55
+ Creates an MD5 hash. MD5 is faster than SHA-256 but cryptographically broken, so use it only where security does not matter.
56
+
57
+ **Use cases:** Quick checksums, non-security tokenization
58
+ ```bash
59
+ --mask "session_id:md5"
60
+ ```
61
+
62
+ #### `hmac`
63
+ Hash-based message authentication code with a secret key. Provides consistent hashing across systems when using the same key.
64
+
65
+ **Use cases:** Cross-system consistency with shared secret
66
+ ```bash
67
+ --mask "customer_id:hmac:my-secret-key"
68
+ ```
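The keyed hashing described above can be sketched with Python's standard `hmac` module. Using SHA-256 as the underlying digest is an assumption here, and `hmac_mask` is a hypothetical name rather than ingestr's API.

```python
import hashlib
import hmac


def hmac_mask(value: str, key: str) -> str:
    # The same key and value always give the same digest;
    # a different key gives an unrelated digest.
    return hmac.new(
        key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256
    ).hexdigest()
```

Two systems that share the key can therefore produce matching tokens for the same customer ID, while someone without the key cannot precompute a lookup table of hashes.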
69
+
70
+ #### `redact`
71
+ Replaces the entire value with "REDACTED".
72
+
73
+ **Use cases:** Complete removal of sensitive data
74
+ ```bash
75
+ --mask "comments:redact"
76
+ # "Customer complaint about..." → "REDACTED"
77
+ ```
78
+
79
+ ### Format-Preserving Masking
80
+
81
+ These algorithms maintain the format and structure of the original data.
82
+
83
+ #### `email`
84
+ Masks email addresses while preserving the domain.
85
+
86
+ **Use cases:** Protecting email addresses while maintaining domain analysis
87
+ ```bash
88
+ --mask "email:email"
89
+ # john.doe@example.com → j******e@example.com
90
+ ```
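A minimal Python sketch that reproduces the example above is shown here. The handling of very short local parts and of values without an `@` is an assumption, and `mask_email` is a hypothetical name.

```python
def mask_email(address: str) -> str:
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        # Not a recognizable email address: mask everything (assumed behavior).
        return "*" * len(address)
    if len(local) <= 2:
        masked = "*" * len(local)
    else:
        # Keep the first and last character of the local part.
        masked = local[0] + "*" * (len(local) - 2) + local[-1]
    return f"{masked}@{domain}"


print(mask_email("john.doe@example.com"))  # j******e@example.com
```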
91
+
92
+ #### `phone`
93
+ Masks phone numbers while preserving country and area codes.
94
+
95
+ **Use cases:** Geographic analysis without exposing full numbers
96
+ ```bash
97
+ --mask "phone:phone"
98
+ # +1-555-123-4567 → +1-555-***-****
99
+ ```
100
+
101
+ #### `credit_card`
102
+ Shows only the last 4 digits of credit card numbers.
103
+
104
+ **Use cases:** Payment processing logs, transaction records
105
+ ```bash
106
+ --mask "card_number:credit_card"
107
+ # 4111-1111-1111-1111 → ****-****-****-1111
108
+ ```
109
+
110
+ #### `ssn`
111
+ Masks Social Security Numbers showing only last 4 digits.
112
+
113
+ **Use cases:** Identity verification systems
114
+ ```bash
115
+ --mask "ssn:ssn"
116
+ # 123-45-6789 → ***-**-6789
117
+ ```
118
+
119
+ ### Partial Masking
120
+
121
+ These algorithms show only portions of the original data.
122
+
123
+ #### `partial`
124
+ Shows first and last N characters, masking the middle.
125
+
126
+ **Use cases:** Names, addresses, partial visibility
127
+ ```bash
128
+ --mask "name:partial:2"
129
+ # "Jonathan" → "Jo****an"
130
+ ```
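The `partial` behavior in the example can be approximated in a few lines of Python. This sketch assumes that values too short to keep `N` characters on each side are masked completely; `partial_mask` is a hypothetical name.

```python
def partial_mask(value: str, keep: int) -> str:
    # Values that would otherwise be fully revealed are masked entirely (assumption).
    if keep <= 0 or len(value) <= 2 * keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - 2 * keep) + value[-keep:]


print(partial_mask("Jonathan", 2))  # Jo****an
```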
131
+
132
+ #### `first_letter`
133
+ Shows only the first character.
134
+
135
+ **Use cases:** Initials, abbreviated names
136
+ ```bash
137
+ --mask "first_name:first_letter"
138
+ # "Alice" → "A****"
139
+ ```
140
+
141
+ #### `stars`
142
+ Replaces entire value with asterisks of the same length.
143
+
144
+ **Use cases:** Password fields, complete obfuscation
145
+ ```bash
146
+ --mask "password:stars"
147
+ # "secret123" → "*********"
148
+ ```
149
+
150
+ #### `fixed`
151
+ Replaces with a fixed value.
152
+
153
+ **Use cases:** Standardized replacement values
154
+ ```bash
155
+ --mask "api_key:fixed:MASKED_KEY"
156
+ # "sk_live_abc123" → "MASKED_KEY"
157
+ ```
158
+
159
+ ### Tokenization
160
+
161
+ These algorithms replace values with tokens or identifiers.
162
+
163
+ #### `uuid`
164
+ Replaces with a UUID token. Same values get the same UUID (consistent).
165
+
166
+ **Use cases:** Creating surrogate keys, maintaining referential integrity
167
+ ```bash
168
+ --mask "customer_id:uuid"
169
+ # "CUST001" → "550e8400-e29b-41d4-a716-446655440000"
170
+ ```
171
+
172
+ #### `sequential`
173
+ Replaces with sequential integers starting from 1.
174
+
175
+ **Use cases:** Simple anonymization, reducing data size
176
+ ```bash
177
+ --mask "account_number:sequential"
178
+ # "ACC-2024-001" → 1
179
+ # "ACC-2024-002" → 2
180
+ ```
181
+
182
+ #### `random`
183
+ Replaces with random data of the same type.
184
+
185
+ **Use cases:** Test data generation, complete randomization
186
+ ```bash
187
+ --mask "age:random"
188
+ # 35 → 67 (random number)
189
+ ```
190
+
191
+ ### Numeric Masking
192
+
193
+ These algorithms transform numeric values while preserving their general magnitude.
194
+
195
+ #### `round`
196
+ Rounds numbers to the nearest specified value.
197
+
198
+ **Use cases:** Salary bands, age groups, reducing precision
199
+ ```bash
200
+ --mask "salary:round:5000"
201
+ # 52300 → 50000
202
+
203
+ --mask "age:round:10"
204
+ # 34 → 30
205
+ ```
206
+
207
+ #### `range`
208
+ Replaces with a range bracket.
209
+
210
+ **Use cases:** Bucketing, categorical analysis
211
+ ```bash
212
+ --mask "income:range:10000"
213
+ # 45000 → "40000-50000"
214
+
215
+ --mask "score:range:100"
216
+ # 234 → "200-300"
217
+ ```
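The bucketing shown in these examples corresponds to flooring the value to a multiple of the bucket width. The Python sketch below reproduces both examples; treating the lower bound as inclusive (so a value exactly on a boundary falls into the higher bucket) is an assumption, and `range_bucket` is a hypothetical name.

```python
import math


def range_bucket(value: float, width: int) -> str:
    # Floor to the nearest lower multiple of the bucket width.
    lower = math.floor(value / width) * width
    return f"{lower}-{lower + width}"


print(range_bucket(45000, 10000))  # 40000-50000
print(range_bucket(234, 100))      # 200-300
```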
218
+
219
+ #### `noise`
220
+ Adds random noise to numeric values.
221
+
222
+ **Use cases:** Statistical privacy, differential privacy
223
+ ```bash
224
+ --mask "revenue:noise:0.1"
225
+ # 100000 → 91234 (±10% random noise)
226
+
227
+ --mask "temperature:noise:0.05"
228
+ # 98.6 → 97.2 (±5% random noise)
229
+ ```
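One way to read the `±10%` comment is as multiplicative noise drawn uniformly from the given fraction. The Python sketch below follows that reading; the uniform distribution and the name `add_noise` are assumptions, not a description of ingestr's internals.

```python
import random
from typing import Optional


def add_noise(value: float, fraction: float,
              rng: Optional[random.Random] = None) -> float:
    # Scale the value by a factor drawn uniformly from [1 - fraction, 1 + fraction].
    rng = rng or random.Random()
    return value * (1 + rng.uniform(-fraction, fraction))


noisy = add_noise(100000, 0.1)
print(90000 <= round(noisy) <= 110000)  # True
```

Note that bounded uniform noise of this kind does not by itself provide a formal differential-privacy guarantee; it only blurs individual values.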
230
+
231
+ ### Date Masking
232
+
233
+ These algorithms transform date and datetime values.
234
+
235
+ #### `date_shift`
236
+ Adds or subtracts random days within a specified range.
237
+
238
+ **Use cases:** Preserving date relationships while obscuring exact dates
239
+ ```bash
240
+ --mask "birth_date:date_shift:30"
241
+ # 1990-05-15 → 1990-06-02 (shifted ±30 days randomly)
242
+ ```
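A per-value random shift, as the example comment suggests, can be sketched in Python with `datetime.timedelta`. Drawing an independent whole-day offset for every value is an assumption, and `shift_date` is a hypothetical name.

```python
import random
from datetime import date, timedelta
from typing import Optional


def shift_date(value: date, max_days: int,
               rng: Optional[random.Random] = None) -> date:
    # Pick a whole-day offset in [-max_days, max_days], inclusive on both ends.
    rng = rng or random.Random()
    return value + timedelta(days=rng.randint(-max_days, max_days))


shifted = shift_date(date(1990, 5, 15), 30)
print(abs((shifted - date(1990, 5, 15)).days) <= 30)  # True
```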
243
+
244
+ #### `year_only`
245
+ Keeps only the year portion of dates.
246
+
247
+ **Use cases:** Age analysis, cohort studies
248
+ ```bash
249
+ --mask "registration_date:year_only"
250
+ # 2024-03-15 → 2024
251
+ ```
252
+
253
+ #### `month_year`
254
+ Keeps only month and year.
255
+
256
+ **Use cases:** Seasonal analysis, monthly aggregations
257
+ ```bash
258
+ --mask "purchase_date:month_year"
259
+ # 2024-03-15 → "2024-03"
260
+ ```
261
+
262
+ ## Use Case Examples
263
+
264
+ ### GDPR Compliance for Development Environment
265
+
266
+ ```bash
267
+ ingestr ingest \
268
+ --source-uri "postgres://prod_user:pass@prod.db/customers" \
269
+ --source-table "customer_data" \
270
+ --dest-uri "postgres://dev_user:pass@dev.db/customers" \
271
+ --dest-table "customer_data" \
272
+ --mask "email:hash" \
273
+ --mask "phone:phone" \
274
+ --mask "name:partial:1" \
275
+ --mask "address:redact" \
276
+ --mask "ip_address:hash" \
277
+ --mask "birth_date:year_only"
278
+ ```
279
+
280
+ ### Healthcare Data for Analytics
281
+
282
+ ```bash
283
+ ingestr ingest \
284
+ --source-uri "mysql://user:pass@hospital.db/patients" \
285
+ --source-table "patient_records" \
286
+ --dest-uri "bigquery://project/dataset" \
287
+ --dest-table "patient_analytics" \
288
+ --mask "patient_id:uuid" \
289
+ --mask "ssn:redact" \
290
+ --mask "diagnosis_notes:redact" \
291
+ --mask "admission_date:date_shift:7" \
292
+ --mask "age:round:5"
293
+ ```
294
+
295
+ ### Financial Data for Testing
296
+
297
+ ```bash
298
+ ingestr ingest \
299
+ --source-uri "snowflake://account/database/schema" \
300
+ --source-table "transactions" \
301
+ --dest-uri "duckdb:///test_data.db" \
302
+ --dest-table "test_transactions" \
303
+ --mask "account_number:sequential" \
304
+ --mask "card_number:credit_card" \
305
+ --mask "amount:noise:0.2" \
306
+ --mask "merchant_name:fixed:TEST_MERCHANT"
307
+ ```
308
+
309
+ ### E-commerce Data Sharing
310
+
311
+ ```bash
312
+ ingestr ingest \
313
+ --source-uri "postgres://internal.db/ecommerce" \
314
+ --source-table "orders" \
315
+ --dest-uri "s3://partner-bucket/data.parquet" \
316
+ --dest-table "shared_orders" \
317
+ --mask "customer_email:email" \
318
+ --mask "shipping_address:first_letter" \
319
+ --mask "order_value:round:10" \
320
+ --mask "customer_name:partial:2"
321
+ ```
322
+
323
+ ## Best Practices
324
+
325
+ ### Choosing the Right Algorithm
326
+
327
+ 1. **For PII (Personally Identifiable Information)**
328
+ - Use `hash` for consistent anonymization
329
+ - Use `redact` for complete removal
330
+ - Use format-preserving masks (`email`, `phone`, `ssn`) for maintaining data structure
331
+
332
+ 2. **For Development/Testing**
333
+ - Use `uuid` or `sequential` for maintaining relationships
334
+ - Use `random` for generating test data
335
+ - Use `partial` for semi-realistic data
336
+
337
+ 3. **For Analytics**
338
+ - Use `round` or `range` for numerical aggregations
339
+ - Use `date_shift` for time-series analysis
340
+ - Use `year_only` or `month_year` for temporal grouping
341
+
342
+ 4. **For Compliance**
343
+ - GDPR: Consider `hash`, `redact`, or `uuid` for personal data
344
+ - HIPAA: Use `redact` for medical records, `date_shift` for dates
345
+ - PCI DSS: Use `credit_card` for card numbers
346
+
347
+ ### Performance Considerations
348
+
349
+ - **Hash-based algorithms** are fast and consistent
350
+ - **Random algorithms** have minimal overhead but don't preserve consistency
351
+ - **Format-preserving masks** have moderate performance impact
352
+ - **Multiple masks** can be applied efficiently in a single pass
353
+
354
+ ### Security Notes
355
+
356
+ 1. **Hashed values** are one-way transformations, but unsalted hashes of low-entropy values (emails, phone numbers, SSNs) can be reversed with dictionary or rainbow table attacks
357
+ 2. **Partial masking** may not provide sufficient protection for highly sensitive data
358
+ 3. **Date shifting** preserves intervals between dates, which may leak information
359
+ 4. **Consistent tokenization** (uuid, hash) maintains relationships which could be exploited
360
+ 5. Always validate that your masking strategy meets your compliance requirements
361
+
362
+ ## Environment Variables
363
+
364
+ You can also set masking configurations via environment variables:
365
+
366
+ ```bash
367
+ export INGESTR_MASK="email:hash,phone:partial:3,ssn:redact"
368
+ ```
369
+
370
+ Multiple masks should be comma-separated when using environment variables.
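The comma-separated form can be split back into individual mask specifications as in the Python sketch below. It only illustrates the format described above: `masks_from_env` is a hypothetical helper, and how ingestr itself reads `INGESTR_MASK` is not shown in this document.

```python
import os
from typing import List


def masks_from_env(name: str = "INGESTR_MASK") -> List[str]:
    # Split on commas and drop empty entries and surrounding whitespace.
    raw = os.environ.get(name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


os.environ["INGESTR_MASK"] = "email:hash,phone:partial:3,ssn:redact"
print(masks_from_env())  # ['email:hash', 'phone:partial:3', 'ssn:redact']
```

A consequence of splitting on commas is that a parameter containing a comma (for example, a `fixed` replacement value) cannot be expressed this way in the sketch.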
371
+
372
+ ## Limitations
373
+
374
+ - Masking is applied in-memory during the ingestion process
375
+ - The original source data remains unchanged
376
+ - Some algorithms require additional dependencies (e.g., `date_shift` requires `python-dateutil`)
377
+ - Masking adds processing overhead proportional to the data volume and number of masks applied
@@ -32,12 +32,7 @@ Adjust data may change going back, which means you'll need to change your start
32
32
  ## Tables
33
33
  Adjust source allows ingesting data from various sources:
34
34
 
35
- - `campaigns`: Retrieves data for a campaign, showing the app's revenue and network costs over multiple days.
36
- - `creatives`: Retrieves data for a creative assets, detailing the app's revenue and network costs across multiple days.
37
- - `events`: Retrieves data for [events](https://dev.adjust.com/en/api/rs-api/events/) and event slugs.
38
- - `custom`: Retrieves custom data based on the dimensions and metrics specified.
39
-
40
- ### Custom reports: `custom:<dimensions>:<metrics>[:<filters>]`
35
+ #### Custom reports: `custom:<dimensions>:<metrics>[:<filters>]`
41
36
 
42
37
  The custom table allows you to retrieve data based on specific dimensions and metrics, and apply filters to the data.
43
38
 
@@ -55,7 +50,14 @@ Parameters:
55
50
  > [!WARNING]
56
51
  > Custom tables require a time-based dimension for efficient operation, such as `hour`, `day`, `week`, `month`, or `year`.
57
52
 
58
- ## Examples
53
+ | Table | PK/Merge Key | Inc Key | Inc Strategy | Details |
54
+ | --------------- | ----------- | --------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
55
+ | [events](https://dev.adjust.com/en/api/rs-api/events) | id | - | replace | Retrieves data for [events](https://dev.adjust.com/en/api/rs-api/events/) and event slugs. |
56
+ | [campaigns](https://dev.adjust.com/en/api/rs-api/reports) | day | – | merge | Retrieves data for a campaign, showing the app's revenue and network costs over multiple days. `Columns:` campaign, day, app, store_type, channel, country, network_cost, all_revenue_total_d0, ad_revenue_total_d0, revenue_total_d0, all_revenue_total_d1, ad_revenue_total_d1, revenue_total_d1, all_revenue_total_d3, ad_revenue_total_d3, revenue_total_d3, all_revenue_total_d7, ad_revenue_total_d7, revenue_total_d7, all_revenue_total_d14, ad_revenue_total_d14, revenue_total_d14, all_revenue_total_d21 |
57
+ | [creatives](https://dev.adjust.com/en/api/rs-api/reports) | day | - | merge | Retrieves data for creative assets, detailing the app's revenue and network costs across multiple days. `Columns:` campaign, day, app, store_type, channel, country, adgroup, creative, network_cost, all_revenue_total_d0, ad_revenue_total_d0, revenue_total_d0, all_revenue_total_d1, ad_revenue_total_d1, revenue_total_d1, all_revenue_total_d3, ad_revenue_total_d3, revenue_total_d3, all_revenue_total_d7, ad_revenue_total_d7, revenue_total_d7, all_revenue_total_d14, ad_revenue_total_d14, revenue_total_d14, all_revenue_total_d21 |
58
+ | `custom` | `configurable` | - | merge | Retrieves custom data based on the dimensions and metrics specified. |
59
+
60
+ ## Examples
59
61
 
60
62
  Copy campaigns data from Adjust into a DuckDB database:
61
63
  ```sh
@@ -82,4 +84,4 @@ ingestr ingest \
82
84
  --source-table "custom:hour,app,store_id,channel,os_name,country_code,campaign_network,campaign_id_network,adgroup_network, adgroup_id_network,creative_network,creative_id_network:impressions,clicks,cost,network_cost,installs,ad_revenue,all_revenue" \
83
85
  --dest-uri duckdb:///adjust.db \
84
86
  --dest-table "mat.example"
85
- ```
87
+ ```
@@ -101,12 +101,12 @@ $ duckdb report.db 'select day from public.publisher_report group by 1'
101
101
 
102
102
  ## Tables
103
103
 
104
- | Name | Description |
105
- | --- | --- |
106
- | `publisher-report` | Provides daily metrics from the `report` end point using the report_type `publisher` |
107
- | `advertiser-report` | Provides daily metrics from the `report` end point using the report_type `advertiser`|
108
- | `advertiser-probabilistic-report` | Provides daily metrics from the `probabilisticReport` end point using the report_type `advertiser` |
109
- | `advertiser-ska-report` | Provides daily metrics from the `skaReport` end point using the report_type `advertiser` |
104
+ | Name | Merge Key | Inc Key | Inc Strategy | Details |
105
+ | --------------- | ----------- | --------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
106
+ | `publisher-report` | day | day | merge| Provides daily metrics from the `report` end point using the report_type `publisher` |
107
+ | `advertiser-report` | day | day | merge| Provides daily metrics from the `report` end point using the report_type `advertiser`|
108
+ | `advertiser-probabilistic-report` | day | day | merge| Provides daily metrics from the `probabilisticReport` end point using the report_type `advertiser` |
109
+ | `advertiser-ska-report` | day | day | merge| Provides daily metrics from the `skaReport` end point using the report_type `advertiser` |
110
110
 
111
111
  ## Custom Reports
112
112
 
@@ -45,4 +45,8 @@ By default, `ingestr` retrieves data for the last 30 days. For a custom date ran
45
45
  <img alt="applovin_max_img" src="../media/applovin_max.png"/>
46
46
 
47
47
  ## Table
48
- [user_ad_revenue](https://developers.applovin.com/en/max/reporting-apis/user-level-ad-revenue-api/): Provides daily metrics from the user level ad revenue API.User-level revenue data is available eight hours after UTC day end. So, for example, data for UTC 2025-01-01 is available on UTC 2025-01-02 after 08:00.
48
+
49
+ Applovin Max source allows ingesting the following sources into separate tables:
50
+ | Table | PK | Inc Key | Inc Strategy | Details |
51
+ | --------------- | ----------- | --------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
52
+ | [user_ad_revenue](https://developers.applovin.com/en/max/reporting-apis/user-level-ad-revenue-api/) | `partition_date` (extracts date-only portion from the timestamp-based date column) | partition_date | merge | Provides daily metrics from the user level ad revenue API. User-level revenue data is available eight hours after the end of the UTC day; for example, data for UTC 2025-01-01 is available on UTC 2025-01-02 after 08:00. |
@@ -34,9 +34,13 @@ The result of this command will be a table in the `appsflyer.duckdb` database.
34
34
 
35
35
  ingestr integrates with the [Master Report API](https://dev.appsflyer.com/hc/reference/master_api_get) of AppsFlyer, which allows you to retrieve data for the following tables:
36
36
 
37
- - `campaigns`: Retrieves data for campaigns, detailing the app's costs, loyal users, total installs, and revenue over multiple days.
38
- - `creatives`: Retrieves data for a creative asset, including revenue and cost.
39
- - `custom:<dimensions>:<metrics>`: Retrieves data for custom tables, which can be specified by the user.
37
+ ## Tables
38
+
39
+ | Name | PK/Merge Key | Inc Key | Inc Strategy | Details |
40
+ | --------------- | ----------- | --------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
41
+ | [campaigns](https://dev.appsflyer.com/hc/reference/master_api_get) | install_time | install_time | merge | Retrieves data for campaigns, detailing the app's costs, loyal users, total installs, and revenue over multiple days. `columns:` app_id, campaign, geo, install_time, average_ecpi, clicks, cohort_day_1_revenue_per_user, cohort_day_1_total_revenue_per_user, cohort_day_14_revenue_per_user, cohort_day_14_total_revenue_per_user, cohort_day_21_revenue_per_user, cohort_day_21_total_revenue_per_user, cohort_day_3_revenue_per_user, cohort_day_3_total_revenue_per_user, cohort_day_7_revenue_per_user, cohort_day_7_total_revenue_per_user, cost, impressions, installs, loyal_users, retention_day_7, revenue, roi, uninstalls |
42
+ | [creatives](https://dev.appsflyer.com/hc/reference/master_api_get) | install_time | install_time | merge| Retrieves data for a creative asset, including revenue and cost. `columns:` geo, app_id, install_time, campaign, adset_id, adset, ad_id, impressions, clicks, installs, cost, revenue, average_ecpi, loyal_users, uninstalls, roi |
43
+ | `custom:<dimensions>:<metrics>` | Dynamic (dimensions + install_time) | install_time | merge| Retrieves data for custom tables, which can be specified by the user.|
40
44
 
41
45
  Use these as `--source-table` parameter in the `ingestr ingest` command.
42
46
 
@@ -41,16 +41,17 @@ ingestr ingest \
41
41
 
42
42
  Asana source allows ingesting the following sources into separate tables:
43
43
 
44
- | **Table** | **Description** |
45
- |---------------|---------------------------------------------------------------------------------|
46
- | `workspaces` | Information about people, materials, or assets required to complete a task or project successfully. |
47
- | `projects` | Collections of tasks and related information. |
48
- | `tasks` | Tasks within a project. Only tasks that belong to a project can be ingested. Users private tasks are not ingested, for example. |
49
- | `projects` | Collections of tasks and related information. |
50
- | `tags` | Labels that can be attached to tasks, projects, or conversations to help categorize and organize them. |
51
- | `stories` | Updates or comments that team members can add to a task or project. |
52
- | `teams` | Groups of individuals who work together to complete projects and tasks. |
53
- | `users` | Individuals who have access to the Asana platform. |
44
+ | Table | Primary/Merge Key | Inc Key | Inc Strategy | Details |
45
+ |-------|----|----------|--------------|---------|
46
+ | `workspaces` | - | - | replace | Information about people, materials, or assets required to complete a task or project successfully. Full reload on each run. |
47
+ | `projects` | - | - | replace | Collections of tasks and related information. Full reload on each run. |
48
+ | `sections` | - | - | replace | Project sections and organization. Full reload on each run. |
49
+ | `tags` | - | - | replace | Labels that can be attached to tasks, projects, or conversations. Full reload on each run. |
50
+ | `tasks` | gid | modified_at | merge | Tasks within a project. Only tasks that belong to a project can be ingested. Uses modified_since API parameter for incremental loading. |
51
+ | `stories` | - | - | replace | Updates or comments that team members can add to a task or project. |
52
+ | `teams` | - | - | replace | Groups of individuals who work together to complete projects and tasks. Full reload on each run. |
53
+ | `users` | - | - | replace | Individuals who have access to the Asana platform. Full reload on each run. |
54
+
54
55
 
55
56
  Use these as `--source-table` parameter in the `ingestr ingest` command.
56
57
 
@@ -0,0 +1,47 @@
1
+ # Attio
2
+ [Attio](https://attio.com/) is an AI-native CRM platform that helps companies build, scale, and grow their business.
3
+
4
+ ingestr supports Attio as a source.
5
+
6
+ ## URI format
7
+
8
+ The URI format for Attio is as follows:
9
+
10
+ ```plaintext
11
+ attio://?api_key=<api_key>
12
+ ```
13
+
14
+ URI parameters:
15
+ - `api_key`: the API key used for authentication with the Attio API
16
+
17
+ ## Setting up an Attio Integration
18
+
19
+ You can find your Attio API key by following the guide [here](https://attio.com/help/apps/other-apps/generating-an-api-key).
20
+
21
+ Assuming your `api_key` is `key_123`, here's a sample command that copies data from Attio into a DuckDB database:
22
+
23
+
24
+ ```bash
25
+ ingestr ingest \
26
+ --source-uri 'attio://?api_key=key_123' \
27
+ --source-table 'objects' \
28
+ --dest-uri duckdb:///attio.duckdb \
29
+ --dest-table 'dest.objects'
30
+ ```
31
+
32
+ ## Tables
33
+
34
+ Attio source supports ingesting the following sources into separate tables:
35
+
36
+ | Table | PK | Inc Key | Inc Strategy | Details |
37
+ |-------|----|----------|--------------|---------|
38
+ | [objects](https://docs.attio.com/rest-api/endpoint-reference/objects/list-objects) | - | - | replace | Objects are the data types used to store facts about your customers. Fetches all objects. Full reload on each run. |
39
+ | [records:{object_api_slug}](https://docs.attio.com/rest-api/endpoint-reference/records/list-records) | - | - | replace | Fetches all records of an object. For example: `records:companies`. Full reload on each run. |
40
+ | [lists](https://docs.attio.com/rest-api/endpoint-reference/lists/list-all-lists) | - | - | replace | Fetches all lists. Full reload on each run. |
41
+ | [list_entries:{list_id}](https://docs.attio.com/rest-api/endpoint-reference/entries/list-entries) | - | - | replace | Lists all items in a specific list. For example: `list_entries:8abc-123-456-789d-123`. Full reload on each run. |
42
+ | [all_list_entries:{object_api_slug}](https://docs.attio.com/rest-api/endpoint-reference/entries/list-entries) | - | - | replace | Fetches all the lists for an object, and then fetches all the entries from each of those lists. For example: `all_list_entries:companies`. Full reload on each run. |
43
+
44
+ Use these as the `--source-table` parameter in the `ingestr ingest` command.
45
+
46
+ > [!WARNING]
47
+ > Attio does not support incremental loading, which means ingestr will do a full-refresh.