airbyte-cdk 6.8.1rc9__py3-none-any.whl → 6.8.2.dev1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (73)
  1. airbyte_cdk/cli/source_declarative_manifest/_run.py +11 -5
  2. airbyte_cdk/config_observation.py +1 -1
  3. airbyte_cdk/connector_builder/main.py +1 -1
  4. airbyte_cdk/connector_builder/message_grouper.py +10 -10
  5. airbyte_cdk/destinations/destination.py +1 -1
  6. airbyte_cdk/destinations/vector_db_based/embedder.py +2 -2
  7. airbyte_cdk/destinations/vector_db_based/writer.py +12 -4
  8. airbyte_cdk/entrypoint.py +7 -6
  9. airbyte_cdk/logger.py +2 -2
  10. airbyte_cdk/sources/abstract_source.py +1 -1
  11. airbyte_cdk/sources/config.py +1 -1
  12. airbyte_cdk/sources/connector_state_manager.py +9 -4
  13. airbyte_cdk/sources/declarative/auth/oauth.py +1 -1
  14. airbyte_cdk/sources/declarative/auth/selective_authenticator.py +6 -1
  15. airbyte_cdk/sources/declarative/concurrent_declarative_source.py +76 -28
  16. airbyte_cdk/sources/declarative/datetime/min_max_datetime.py +10 -4
  17. airbyte_cdk/sources/declarative/declarative_component_schema.yaml +16 -17
  18. airbyte_cdk/sources/declarative/decoders/noop_decoder.py +4 -1
  19. airbyte_cdk/sources/declarative/extractors/record_filter.py +3 -5
  20. airbyte_cdk/sources/declarative/incremental/__init__.py +3 -0
  21. airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py +270 -0
  22. airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py +8 -6
  23. airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py +9 -0
  24. airbyte_cdk/sources/declarative/interpolation/jinja.py +35 -36
  25. airbyte_cdk/sources/declarative/interpolation/macros.py +1 -1
  26. airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py +71 -17
  27. airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py +13 -7
  28. airbyte_cdk/sources/declarative/requesters/error_handlers/default_error_handler.py +1 -1
  29. airbyte_cdk/sources/declarative/requesters/error_handlers/http_response_filter.py +8 -6
  30. airbyte_cdk/sources/declarative/requesters/paginators/default_paginator.py +1 -1
  31. airbyte_cdk/sources/declarative/requesters/request_options/datetime_based_request_options_provider.py +2 -2
  32. airbyte_cdk/sources/declarative/requesters/request_options/interpolated_request_options_provider.py +1 -1
  33. airbyte_cdk/sources/declarative/retrievers/async_retriever.py +5 -2
  34. airbyte_cdk/sources/declarative/retrievers/simple_retriever.py +1 -1
  35. airbyte_cdk/sources/declarative/spec/spec.py +1 -1
  36. airbyte_cdk/sources/declarative/stream_slicers/declarative_partition_generator.py +0 -1
  37. airbyte_cdk/sources/embedded/base_integration.py +3 -2
  38. airbyte_cdk/sources/file_based/availability_strategy/abstract_file_based_availability_strategy.py +12 -4
  39. airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py +18 -7
  40. airbyte_cdk/sources/file_based/file_types/avro_parser.py +14 -11
  41. airbyte_cdk/sources/file_based/file_types/csv_parser.py +3 -3
  42. airbyte_cdk/sources/file_based/file_types/excel_parser.py +11 -5
  43. airbyte_cdk/sources/file_based/file_types/jsonl_parser.py +1 -1
  44. airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py +2 -2
  45. airbyte_cdk/sources/file_based/stream/concurrent/adapters.py +6 -3
  46. airbyte_cdk/sources/file_based/stream/cursor/default_file_based_cursor.py +1 -1
  47. airbyte_cdk/sources/http_logger.py +3 -3
  48. airbyte_cdk/sources/streams/concurrent/abstract_stream.py +5 -2
  49. airbyte_cdk/sources/streams/concurrent/adapters.py +6 -3
  50. airbyte_cdk/sources/streams/concurrent/availability_strategy.py +9 -3
  51. airbyte_cdk/sources/streams/concurrent/cursor.py +10 -1
  52. airbyte_cdk/sources/streams/concurrent/state_converters/datetime_stream_state_converter.py +2 -2
  53. airbyte_cdk/sources/streams/core.py +17 -14
  54. airbyte_cdk/sources/streams/http/http.py +19 -19
  55. airbyte_cdk/sources/streams/http/http_client.py +4 -48
  56. airbyte_cdk/sources/streams/http/requests_native_auth/abstract_token.py +2 -1
  57. airbyte_cdk/sources/streams/http/requests_native_auth/oauth.py +62 -33
  58. airbyte_cdk/sources/utils/record_helper.py +1 -1
  59. airbyte_cdk/sources/utils/schema_helpers.py +1 -1
  60. airbyte_cdk/sources/utils/transform.py +34 -15
  61. airbyte_cdk/test/entrypoint_wrapper.py +11 -6
  62. airbyte_cdk/test/mock_http/response_builder.py +1 -1
  63. airbyte_cdk/utils/airbyte_secrets_utils.py +1 -1
  64. airbyte_cdk/utils/event_timing.py +10 -10
  65. airbyte_cdk/utils/message_utils.py +4 -3
  66. airbyte_cdk/utils/spec_schema_transformations.py +3 -2
  67. airbyte_cdk/utils/traced_exception.py +14 -12
  68. airbyte_cdk-6.8.2.dev1.dist-info/METADATA +111 -0
  69. {airbyte_cdk-6.8.1rc9.dist-info → airbyte_cdk-6.8.2.dev1.dist-info}/RECORD +72 -71
  70. airbyte_cdk-6.8.1rc9.dist-info/METADATA +0 -307
  71. {airbyte_cdk-6.8.1rc9.dist-info → airbyte_cdk-6.8.2.dev1.dist-info}/LICENSE.txt +0 -0
  72. {airbyte_cdk-6.8.1rc9.dist-info → airbyte_cdk-6.8.2.dev1.dist-info}/WHEEL +0 -0
  73. {airbyte_cdk-6.8.1rc9.dist-info → airbyte_cdk-6.8.2.dev1.dist-info}/entry_points.txt +0 -0
@@ -1,307 +0,0 @@
- Metadata-Version: 2.1
- Name: airbyte-cdk
- Version: 6.8.1rc9
- Summary: A framework for writing Airbyte Connectors.
- Home-page: https://airbyte.com
- License: MIT
- Keywords: airbyte,connector-development-kit,cdk
- Author: Airbyte
- Author-email: contact@airbyte.io
- Requires-Python: >=3.10,<3.13
- Classifier: Development Status :: 3 - Alpha
- Classifier: Intended Audience :: Developers
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Programming Language :: Python :: 3.11
- Classifier: Programming Language :: Python :: 3.12
- Classifier: Topic :: Scientific/Engineering
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
- Provides-Extra: file-based
- Provides-Extra: sphinx-docs
- Provides-Extra: sql
- Provides-Extra: vector-db-based
- Requires-Dist: Deprecated (>=1.2,<1.3)
- Requires-Dist: Jinja2 (>=3.1.2,<3.2.0)
- Requires-Dist: PyYAML (>=6.0.1,<7.0.0)
- Requires-Dist: Sphinx (>=4.2,<4.3) ; extra == "sphinx-docs"
- Requires-Dist: airbyte-protocol-models-dataclasses (>=0.14,<0.15)
- Requires-Dist: avro (>=1.11.2,<1.12.0) ; extra == "file-based"
- Requires-Dist: backoff
- Requires-Dist: cachetools
- Requires-Dist: cohere (==4.21) ; extra == "vector-db-based"
- Requires-Dist: cryptography (>=42.0.5,<44.0.0)
- Requires-Dist: dpath (>=2.1.6,<3.0.0)
- Requires-Dist: dunamai (>=1.22.0,<2.0.0)
- Requires-Dist: fastavro (>=1.8.0,<1.9.0) ; extra == "file-based"
- Requires-Dist: genson (==1.3.0)
- Requires-Dist: isodate (>=0.6.1,<0.7.0)
- Requires-Dist: jsonref (>=0.2,<0.3)
- Requires-Dist: jsonschema (>=4.17.3,<4.18.0)
- Requires-Dist: langchain (==0.1.16) ; extra == "vector-db-based"
- Requires-Dist: langchain_core (==0.1.42)
- Requires-Dist: markdown ; extra == "file-based"
- Requires-Dist: nltk (==3.9.1)
- Requires-Dist: numpy (<2)
- Requires-Dist: openai[embeddings] (==0.27.9) ; extra == "vector-db-based"
- Requires-Dist: orjson (>=3.10.7,<4.0.0)
- Requires-Dist: pandas (==2.2.2)
- Requires-Dist: pdf2image (==1.16.3) ; extra == "file-based"
- Requires-Dist: pdfminer.six (==20221105) ; extra == "file-based"
- Requires-Dist: pendulum (<3.0.0)
- Requires-Dist: psutil (==6.1.0)
- Requires-Dist: pyarrow (>=15.0.0,<15.1.0) ; extra == "file-based"
- Requires-Dist: pydantic (>=2.7,<3.0)
- Requires-Dist: pyjwt (>=2.8.0,<3.0.0)
- Requires-Dist: pyrate-limiter (>=3.1.0,<3.2.0)
- Requires-Dist: pytesseract (==0.3.10) ; extra == "file-based"
- Requires-Dist: python-calamine (==0.2.3) ; extra == "file-based"
- Requires-Dist: python-dateutil
- Requires-Dist: python-snappy (==0.7.3) ; extra == "file-based"
- Requires-Dist: python-ulid (>=3.0.0,<4.0.0)
- Requires-Dist: pytz (==2024.1)
- Requires-Dist: rapidfuzz (>=3.10.1,<4.0.0)
- Requires-Dist: requests
- Requires-Dist: requests_cache
- Requires-Dist: serpyco-rs (>=1.10.2,<2.0.0)
- Requires-Dist: sphinx-rtd-theme (>=1.0,<1.1) ; extra == "sphinx-docs"
- Requires-Dist: sqlalchemy (>=2.0,<3.0,!=2.0.36) ; extra == "sql"
- Requires-Dist: tiktoken (==0.8.0) ; extra == "vector-db-based"
- Requires-Dist: unstructured.pytesseract (>=0.3.12) ; extra == "file-based"
- Requires-Dist: unstructured[docx,pptx] (==0.10.27) ; extra == "file-based"
- Requires-Dist: wcmatch (==10.0)
- Requires-Dist: xmltodict (>=0.13.0,<0.14.0)
- Project-URL: Documentation, https://docs.airbyte.io/
- Project-URL: Repository, https://github.com/airbytehq/airbyte-python-cdk
- Description-Content-Type: text/markdown
-
- # Airbyte Python CDK and Low-Code CDK
-
- Airbyte Python CDK is a framework for building Airbyte API Source Connectors. It provides a set of
- classes and helpers that make it easy to build a connector against an HTTP API (REST, GraphQL, etc),
- or a generic Python source connector.
-
- ## Usage
-
- If you're looking to build a connector, we highly recommend that you
- [start with the Connector Builder](https://docs.airbyte.com/connector-development/connector-builder-ui/overview).
- It should be enough for 90% connectors out there. For more flexible and complex connectors, use the
- [low-code CDK and `SourceDeclarativeManifest`](https://docs.airbyte.com/connector-development/config-based/low-code-cdk-overview).
-
- If that doesn't work, then consider building on top of the
- [lower-level Python CDK itself](https://docs.airbyte.com/connector-development/cdk-python/).
-
- ### Quick Start
-
- To get started on a Python CDK based connector or a low-code connector, you can generate a connector
- project from a template:
-
- ```bash
- # from the repo root
- cd airbyte-integrations/connector-templates/generator
- ./generate.sh
- ```
-
- ### Example Connectors
-
- **HTTP Connectors**:
-
- - [Stripe](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/)
- - [Salesforce](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/)
-
- **Python connectors using the bare-bones `Source` abstraction**:
-
- - [Google Sheets](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-google-sheets/google_sheets_source/google_sheets_source.py)
-
- This will generate a project with a type and a name of your choice and put it in
- `airbyte-integrations/connectors`. Open the directory with your connector in an editor and follow
- the `TODO` items.
-
- ## Python CDK Overview
-
- Airbyte CDK code is within `airbyte_cdk` directory. Here's a high level overview of what's inside:
-
- - `connector_builder`. Internal wrapper that helps the Connector Builder platform run a declarative
- manifest (low-code connector). You should not use this code directly. If you need to run a
- `SourceDeclarativeManifest`, take a look at
- [`source-declarative-manifest`](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-declarative-manifest)
- connector implementation instead.
- - `destinations`. Basic Destination connector support! If you're building a Destination connector in
- Python, try that. Some of our vector DB destinations like `destination-pinecone` are using that
- code.
- - `models` expose `airbyte_protocol.models` as a part of `airbyte_cdk` package.
- - `sources/concurrent_source` is the Concurrent CDK implementation. It supports reading data from
- streams concurrently per slice / partition, useful for connectors with high throughput and high
- number of records.
- - `sources/declarative` is the low-code CDK. It works on top of Airbyte Python CDK, but provides a
- declarative manifest language to define streams, operations, etc. This makes it easier to build
- connectors without writing Python code.
- - `sources/file_based` is the CDK for file-based sources. Examples include S3, Azure, GCS, etc.
-
- ## Contributing
-
- Thank you for being interested in contributing to Airbyte Python CDK! Here are some guidelines to
- get you started:
-
- - We adhere to the [code of conduct](/CODE_OF_CONDUCT.md).
- - You can contribute by reporting bugs, posting github discussions, opening issues, improving
- [documentation](/docs/), and submitting pull requests with bugfixes and new features alike.
- - If you're changing the code, please add unit tests for your change.
- - When submitting issues or PRs, please add a small reproduction project. Using the changes in your
- connector and providing that connector code as an example (or a satellite PR) helps!
-
- ### First time setup
-
- Install the project dependencies and development tools:
-
- ```bash
- poetry install --all-extras
- ```
-
- Installing all extras is required to run the full suite of unit tests.
-
- #### Running tests locally
-
- - Iterate on the CDK code locally
- - Run tests via `poetry run poe unit-test-with-cov`, or `python -m pytest -s unit_tests` if you want
- to pass pytest options.
- - Run `poetry run poe check-local` to lint all code, type-check modified code, and run unit tests
- with coverage in one command.
-
- To see all available scripts, run `poetry run poe`.
-
- #### Formatting the code
-
- - Iterate on the CDK code locally
- - Run `poetry run ruff format` to format your changes.
-
- To see all available `ruff` options, run `poetry run ruff`.
-
- ##### Autogenerated files
-
- Low-code CDK models are generated from `sources/declarative/declarative_component_schema.yaml`. If
- the iteration you are working on includes changes to the models or the connector generator, you
- might want to regenerate them. In order to do that, you can run:
-
- ```bash
- poetry run poe build
- ```
-
- This will generate the code generator docker image and the component manifest files based on the
- schemas and templates.
-
- #### Testing
-
- All tests are located in the `unit_tests` directory. Run `poetry run poe unit-test-with-cov` to run
- them. This also presents a test coverage report. For faster iteration with no coverage report and
- more options, `python -m pytest -s unit_tests` is a good place to start.
-
- #### Building and testing a connector with your local CDK
-
- When developing a new feature in the CDK, you may find it helpful to run a connector that uses that
- new feature. You can test this in one of two ways:
-
- - Running a connector locally
- - Building and running a source via Docker
-
- ##### Installing your local CDK into a local Python connector
-
- Open the connector's `pyproject.toml` file and replace the line with `airbyte_cdk` with the
- following:
-
- ```toml
- airbyte_cdk = { path = "../../../airbyte-cdk/python/airbyte_cdk", develop = true }
- ```
-
- Then, running `poetry update` should reinstall `airbyte_cdk` from your local working directory.
-
- ##### Building a Python connector in Docker with your local CDK installed
-
- _Pre-requisite: Install the
- [`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_
-
- You can build your connector image with the local CDK using
-
- ```bash
- # from the airbytehq/airbyte base directory
- airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build
- ```
-
- Note that the local CDK is injected at build time, so if you make changes, you will have to run the
- build command again to see them reflected.
-
- ##### Running Connector Acceptance Tests for a single connector in Docker with your local CDK installed
-
- _Pre-requisite: Install the
- [`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_
-
- To run acceptance tests for a single connectors using the local CDK, from the connector directory,
- run
-
- ```bash
- airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
- ```
-
- #### When you don't have access to the API
-
- There may be a time when you do not have access to the API (either because you don't have the
- credentials, network access, etc...) You will probably still want to do end-to-end testing at least
- once. In order to do so, you can emulate the server you would be reaching using a server stubbing
- tool.
-
- For example, using [mockserver](https://www.mock-server.com/), you can set up an expectation file
- like this:
-
- ```json
- {
- "httpRequest": {
- "method": "GET",
- "path": "/data"
- },
- "httpResponse": {
- "body": "{\"data\": [{\"record_key\": 1}, {\"record_key\": 2}]}"
- }
- }
- ```
-
- Assuming this file has been created at `secrets/mock_server_config/expectations.json`, running the
- following command will allow to match any requests on path `/data` to return the response defined in
- the expectation file:
-
- ```bash
- docker run -d --rm -v $(pwd)/secrets/mock_server_config:/config -p 8113:8113 --env MOCKSERVER_LOG_LEVEL=TRACE --env MOCKSERVER_SERVER_PORT=8113 --env MOCKSERVER_WATCH_INITIALIZATION_JSON=true --env MOCKSERVER_PERSISTED_EXPECTATIONS_PATH=/config/expectations.json --env MOCKSERVER_INITIALIZATION_JSON_PATH=/config/expectations.json mockserver/mockserver:5.15.0
- ```
-
- HTTP requests to `localhost:8113/data` should now return the body defined in the expectations file.
- To test this, the implementer either has to change the code which defines the base URL for Python
- source or update the `url_base` from low-code. With the Connector Builder running in docker, you
- will have to use domain `host.docker.internal` instead of `localhost` as the requests are executed
- within docker.
-
- #### Publishing a new version to PyPi
-
- Python CDK has a
- [GitHub workflow](https://github.com/airbytehq/airbyte/actions/workflows/publish-cdk-command-manually.yml)
- that manages the CDK changelog, making a new release for `airbyte_cdk`, publishing it to PyPI, and
- then making a commit to update (and subsequently auto-release)
- [`source-declarative-manifest`](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-declarative-manifest)
- and Connector Builder (in the platform repository).
-
- > [!Note]: The workflow will handle the `CHANGELOG.md` entry for you. You should not add changelog
- > lines in your PRs to the CDK itself.
-
- > [!Warning]: The workflow bumps version on it's own, please don't change the CDK version in
- > `pyproject.toml` manually.
-
- 1. You only trigger the release workflow once all the PRs that you want to be included are already
- merged into the `master` branch.
- 2. The
- [`Publish CDK Manually`](https://github.com/airbytehq/airbyte/actions/workflows/publish-cdk-command-manually.yml)
- workflow from master using `release-type=major|manor|patch` and setting the changelog message.
- 3. When the workflow runs, it will commit a new version directly to master branch.
- 4. The workflow will bump the version of `source-declarative-manifest` according to the
- `release-type` of the CDK, then commit these changes back to master. The commit to master will
- kick off a publish of the new version of `source-declarative-manifest`.
- 5. The workflow will also add a pull request to `airbyte-platform-internal` repo to bump the
- dependency in Connector Builder.
-