airbyte-source-commcare 0.1.0__py3-none-any.whl
- airbyte_source_commcare-0.1.0.dist-info/METADATA +102 -0
- airbyte_source_commcare-0.1.0.dist-info/RECORD +19 -0
- airbyte_source_commcare-0.1.0.dist-info/WHEEL +5 -0
- airbyte_source_commcare-0.1.0.dist-info/entry_points.txt +2 -0
- airbyte_source_commcare-0.1.0.dist-info/top_level.txt +3 -0
- integration_tests/__init__.py +3 -0
- integration_tests/abnormal_state.json +5 -0
- integration_tests/acceptance.py +16 -0
- integration_tests/catalog.json +1 -0
- integration_tests/configured_catalog.json +16 -0
- integration_tests/invalid_config.json +6 -0
- integration_tests/sample_config.json +6 -0
- integration_tests/sample_state.json +5 -0
- source_commcare/__init__.py +8 -0
- source_commcare/run.py +14 -0
- source_commcare/source.py +337 -0
- source_commcare/spec.yaml +38 -0
- unit_tests/__init__.py +3 -0
- unit_tests/test_source.py +25 -0
@@ -0,0 +1,102 @@
+Metadata-Version: 2.1
+Name: airbyte-source-commcare
+Version: 0.1.0
+Summary: Source implementation for Commcare.
+Author: Airbyte
+Author-email: contact@airbyte.io
+Description-Content-Type: text/markdown
+Requires-Dist: airbyte-cdk
+Requires-Dist: bigquery-schema-generator ~=1.5
+Requires-Dist: gbqschema-converter ~=1.2.0
+Requires-Dist: flatten-json ~=0.1.13
+Provides-Extra: tests
+Requires-Dist: requests-mock ~=1.9.3 ; extra == 'tests'
+Requires-Dist: pytest ~=6.1 ; extra == 'tests'
+Requires-Dist: pytest-mock ~=3.6.1 ; extra == 'tests'
+
+# Commcare Source
+
+This is the repository for the Commcare source connector, written in Python.
+For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/commcare).
+
+
+**To iterate on this connector, make sure to complete this prerequisites section.**
+
+
+From this connector directory, create a virtual environment:
+```
+python -m venv .venv
+```
+
+This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
+development environment of choice. To activate it from the terminal, run:
+```
+source .venv/bin/activate
+pip install -r requirements.txt
+pip install '.[tests]'
+```
+If you are in an IDE, follow your IDE's instructions to activate the virtualenv.
+
+Note that while we are installing dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
+used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo and will call `setup.py`.
+If this is mumbo jumbo to you, don't worry about it, just put your deps in `setup.py` but install using `pip install -r requirements.txt` and everything
+should work as you expect.
+
+**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/commcare)
+to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_commcare/spec.yaml` file.
+Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
+See `integration_tests/sample_config.json` for a sample config file.
+
+**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source commcare test creds`
+and place them into `secrets/config.json`.
+
+```
+python main.py spec
+python main.py check --config secrets/config.json
+python main.py discover --config secrets/config.json
+python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
+```
+
+
+
+**Via [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) (recommended):**
+```bash
+airbyte-ci connectors --name=source-commcare build
+```
+
+An image will be built with the tag `airbyte/source-commcare:dev`.
+
+**Via `docker build`:**
+```bash
+docker build -t airbyte/source-commcare:dev .
+```
+
+Then run any of the connector commands as follows:
+```
+docker run --rm airbyte/source-commcare:dev spec
+docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-commcare:dev check --config /secrets/config.json
+docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-commcare:dev discover --config /secrets/config.json
+docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-commcare:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
+```
+
+You can run our full test suite locally using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md):
+```bash
+airbyte-ci connectors --name=source-commcare test
+```
+
+Customize the `acceptance-test-config.yml` file to configure tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
+If your connector needs to create or destroy resources for use during acceptance tests, create fixtures for them and place them inside `integration_tests/acceptance.py`.
+
+All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
+We split dependencies between two groups, dependencies that are:
+* required for your connector to work go in the `MAIN_REQUIREMENTS` list.
+* required for testing go in the `TEST_REQUIREMENTS` list.
+
+You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
+1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-commcare test`
+2. Bump the connector version in `metadata.yaml`: increment the `dockerImageTag` value. Please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors).
+3. Make sure the `metadata.yaml` content is up to date.
+4. Make sure the connector documentation and its changelog are up to date (`docs/integrations/sources/commcare.md`).
+5. Create a Pull Request: use [our PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#pull-request-title-convention).
+6. Pat yourself on the back for being an awesome contributor.
+7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
@@ -0,0 +1,19 @@
+integration_tests/__init__.py,sha256=4Hw-PX1-VgESLF16cDdvuYCzGJtHntThLF4qIiULWeo,61
+integration_tests/abnormal_state.json,sha256=-DJMcBhq1do1ntOZXzsaGmxvbDTmYmFS0cSSaUoRLZ4,86
+integration_tests/acceptance.py,sha256=8eU9iSDbmHyufPvAouJGhPMgPAFTCP8IKIKHLm7u5TE,435
+integration_tests/catalog.json,sha256=yj0WO6sFU4GCciYUBWjzvvfqrBh869doeOC2Pp5EI1Y,3
+integration_tests/configured_catalog.json,sha256=1tbuuekIkjKW6NKdwW9ulYAYNjErYgAJqDsq08etQLE,397
+integration_tests/invalid_config.json,sha256=kXy1C-2u6aW122jeDBlv1taNZDYjzdjRyer_2B1xgk8,142
+integration_tests/sample_config.json,sha256=Lh9_zss5lxbqGFYpZnvi8qo5eIMNcuQ6GBDKbw1dVhM,115
+integration_tests/sample_state.json,sha256=HBR6G9QPoPDFAlfAKwRmeNaQ5_56xehMLtwWr4mxd5k,63
+source_commcare/__init__.py,sha256=e2gcZC5sfkJ_uidHqoPCVmqUdL9mhpMrw4Ix4uIzSng,128
+source_commcare/run.py,sha256=aZZ10q2rM_fY8lmNk4Gmb_pOhDJvv-biHN6V-ut5Y6I,236
+source_commcare/source.py,sha256=X_4bNim1LBDCVjqnhWViaFgsjPvMTXkZZ1yNRGRf1Aw,12956
+source_commcare/spec.yaml,sha256=CMpevuCDKMFZmq16CdeilpvXKbCiQPjmI1fcgzvJQuQ,1019
+unit_tests/__init__.py,sha256=4Hw-PX1-VgESLF16cDdvuYCzGJtHntThLF4qIiULWeo,61
+unit_tests/test_source.py,sha256=qneLJGlvcpoyFlnvwzwP2voDtzi8z6c_N-lu0won5wM,682
+airbyte_source_commcare-0.1.0.dist-info/METADATA,sha256=-GDqr4YKVK8LGyoTB9nmPitvdARyMlTt9LOzmE5dJUc,5536
+airbyte_source_commcare-0.1.0.dist-info/WHEEL,sha256=oiQVh_5PnQM0E3gPdiz09WCNmwiHDMaGer_elqB3coM,92
+airbyte_source_commcare-0.1.0.dist-info/entry_points.txt,sha256=zVJihl1jgNaayMgxgA1G9KN85LZ9T8bbss4VfqE78Bg,60
+airbyte_source_commcare-0.1.0.dist-info/top_level.txt,sha256=WCua9CoU5uJ26Af2zVYpamondw5woLFReykWLPCjX7s,45
+airbyte_source_commcare-0.1.0.dist-info/RECORD,,
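Each RECORD row has the form `path,sha256=<digest>,<size>`. In the wheel format the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. A minimal sketch of how such a row could be recomputed follows; the helper name `record_row` and the sample bytes are illustrative assumptions, not part of the package.

```python
import base64
import hashlib


def record_row(path: str, data: bytes) -> str:
    """Build a RECORD-style row for one file's contents."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 with the trailing "=" padding stripped
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# Hypothetical contents: a three-byte file holding an empty JSON object
row = record_row("integration_tests/catalog.json", b"{}\n")
print(row)
```

Comparing a recomputed row against the published one is a quick way to check that an unpacked file has not been altered.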
@@ -0,0 +1,16 @@
+#
+# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
+#
+
+
+import pytest
+
+pytest_plugins = ("connector_acceptance_test.plugin",)
+
+
+@pytest.fixture(scope="session", autouse=True)
+def connector_setup():
+    """This fixture is a placeholder for external resources that acceptance test might require."""
+    # TODO: setup test dependencies if needed. otherwise remove the TODO comments
+    yield
+    # TODO: clean up test dependencies
@@ -0,0 +1 @@
+{}
@@ -0,0 +1,16 @@
+{
+  "streams": [
+    {
+      "stream": {
+        "name": "Assess a referred patient",
+        "json_schema": {},
+        "supported_sync_modes": ["full_refresh", "incremental"],
+        "source_defined_cursor": true,
+        "default_cursor_field": ["indexed_on"]
+      },
+      "sync_mode": "incremental",
+      "cursor_field": ["indexed_on"],
+      "destination_sync_mode": "append"
+    }
+  ]
+}
source_commcare/run.py
ADDED
@@ -0,0 +1,14 @@
+#
+# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
+#
+
+
+import sys
+
+from airbyte_cdk.entrypoint import launch
+from source_commcare import SourceCommcare
+
+
+def run():
+    source = SourceCommcare()
+    launch(source, sys.argv[1:])
@@ -0,0 +1,337 @@
+#
+# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
+#
+
+import re
+from abc import ABC
+from datetime import datetime
+from typing import Any, Iterable, List, Mapping, MutableMapping, Optional, Tuple
+from urllib.parse import parse_qs
+
+import requests
+from airbyte_cdk.models import SyncMode
+from airbyte_cdk.sources import AbstractSource
+from airbyte_cdk.sources.streams import IncrementalMixin, Stream
+from airbyte_cdk.sources.streams.http import HttpStream
+from airbyte_cdk.sources.streams.http.requests_native_auth import TokenAuthenticator
+from flatten_json import flatten
+
+
+# Basic full refresh stream
+class CommcareStream(HttpStream, ABC):
+    def __init__(self, project_space, **kwargs):
+        super().__init__(**kwargs)
+        self.project_space = project_space
+
+    @property
+    def url_base(self) -> str:
+        return f"https://www.commcarehq.org/a/{self.project_space}/api/v0.5/"
+
+    # These class variables save state
+    # forms holds form ids and we filter cases which contain one of these form ids
+    # last_form_date stores the date of the last form read so the next cycle for forms and cases starts at the same timestamp
+    forms = set()
+    last_form_date = None
+    schemas = {}
+    unwantedfields = re.compile(r"^(case_|update_|meta|create_|commcare_).*$")
+
+    @property
+    def dateformat(self):
+        return "%Y-%m-%dT%H:%M:%S.%f"
+
+    def scrubUnwantedFields(self, form):
+        newform = {k: v for k, v in form.items() if not self.unwantedfields.match(k)}
+        return newform
+
+    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
+        try:
+            # Server returns status 500 when there are no more rows.
+            # raise an error if server returns an error
+            response.raise_for_status()
+            meta = response.json()["meta"]
+            return parse_qs(meta["next"][1:])
+        except Exception:
+            # Any failure here (HTTP error, missing or null "next") means there is no further page.
+            # Return None rather than the exception object, which would be mistaken for a page token.
+            return None
+
+    def request_params(
+        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> MutableMapping[str, Any]:
+
+        params = {"format": "json"}
+        return params
+
+
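The `unwantedfields` pattern in `CommcareStream` drops every key that begins with one of the listed prefixes. A small sketch of that filter in isolation is shown here; the function name `scrub` and the sample payload are assumptions made for illustration. Note that the `meta` alternative has no trailing underscore, so it also matches longer keys such as `metadata_note`.

```python
import re

# Same pattern as CommcareStream.unwantedfields in source.py
UNWANTED = re.compile(r"^(case_|update_|meta|create_|commcare_).*$")


def scrub(form: dict) -> dict:
    """Drop keys whose names start with one of the unwanted prefixes."""
    return {k: v for k, v in form.items() if not UNWANTED.match(k)}


# Hypothetical form payload, for illustration only
sample = {
    "id": "abc",
    "meta": {"userID": "u1"},
    "case_id": "c1",
    "patient_name": "Ada",
    "metadata_note": "x",
}
cleaned = scrub(sample)
print(cleaned)
```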
+class Application(CommcareStream):
+    primary_key = "id"
+
+    def __init__(self, app_id, **kwargs):
+        super().__init__(**kwargs)
+        self.app_id = app_id
+
+    def path(
+        self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> str:
+        return f"application/{self.app_id}/"
+
+    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
+        return None
+
+    def request_params(
+        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> MutableMapping[str, Any]:
+
+        params = {"format": "json", "extras": "true"}
+        return params
+
+    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
+        yield response.json()
+
+
+class IncrementalStream(CommcareStream, IncrementalMixin):
+    cursor_field = "indexed_on"
+    _cursor_value = None
+
+    @property
+    def state(self) -> Mapping[str, Any]:
+        if self._cursor_value:
+            return {self.cursor_field: self._cursor_value}
+
+    @state.setter
+    def state(self, value: Mapping[str, Any]):
+        self._cursor_value = datetime.strptime(value[self.cursor_field], self.dateformat)
+
+    @property
+    def sync_mode(self):
+        return SyncMode.incremental
+
+    @property
+    def supported_sync_modes(self):
+        return [SyncMode.incremental]
+
+    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
+        try:
+            # Server returns status 500 when there are no more rows.
+            # raise an error if server returns an error
+            response.raise_for_status()
+            meta = response.json()["meta"]
+            if meta["next"]:
+                return parse_qs(meta["next"][1:])
+            return None
+        except Exception:
+            return None
+
+    def request_params(
+        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> MutableMapping[str, Any]:
+
+        params = {"format": "json"}
+        if next_page_token:
+            params.update(next_page_token)
+        return params
+
+    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
+        for o in iter(response.json()["objects"]):
+            yield o
+        return None
+
+
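`IncrementalStream.next_page_token` turns the `meta["next"]` value of a response into request parameters for the following page. The sketch below isolates that step. It assumes, as the `[1:]` slice in the connector suggests, that `next` is a query string beginning with `?`; the function name `next_page_params` and the sample bodies are hypothetical.

```python
from typing import Any, Mapping, Optional
from urllib.parse import parse_qs


def next_page_params(body: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Turn meta["next"] (assumed to be a query string starting with "?") into request params."""
    next_query = body.get("meta", {}).get("next")
    if not next_query:
        return None  # no further pages
    # Strip the leading "?" before parsing; parse_qs maps each key to a list of values
    return parse_qs(next_query[1:])


# Hypothetical response bodies
page = {"meta": {"next": "?limit=1000&offset=1000"}, "objects": []}
last = {"meta": {"next": None}, "objects": []}
print(next_page_params(page))
print(next_page_params(last))
```

Because `parse_qs` returns list values, merging the result into the request parameters with `params.update(...)` replaces scalar entries such as `"limit": "1000"` with one-element lists.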
+class Case(IncrementalStream):
+
+    """
+    docs: https://www.commcarehq.org/a/[domain]/api/[version]/case/
+    """
+
+    cursor_field = "indexed_on"
+    primary_key = "id"
+
+    def __init__(self, start_date, app_id, schema, **kwargs):
+        super().__init__(**kwargs)
+        self._cursor_value = datetime.strptime(start_date, "%Y-%m-%dT%H:%M:%SZ")
+        self.schema = schema
+
+    def get_json_schema(self):
+        return self.schema
+
+    @property
+    def name(self):
+        # Airbyte orders streams in alpha order but since we have dependent peers and we need to
+        # pull all forms before cases, we name this stream to
+        # ensure this stream gets pulled last (assuming ascii stream names only)
+        return "zzz_case"
+
+    def path(
+        self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> str:
+        return "case"
+
+    def request_params(
+        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> MutableMapping[str, Any]:
+
+        # start date is what we saved for forms
+        # if self.cursor_field in self.state else (CommcareStream.last_form_date or self.initial_date)
+        ix = self.state[self.cursor_field]
+        params = {"format": "json", "indexed_on_start": ix.strftime(self.dateformat), "order_by": "indexed_on", "limit": "5000"}
+        if next_page_token:
+            params.update(next_page_token)
+        return params
+
+    def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
+        for record in super().read_records(*args, **kwargs):
+            found = False
+            for f in record["xform_ids"]:
+                if f in CommcareStream.forms:
+                    found = True
+                    break
+            if found:
+                self._cursor_value = datetime.strptime(record[self.cursor_field], self.dateformat)
+                # Make indexed_on tz aware
+                record.update({"streamname": "case", "indexed_on": record["indexed_on"] + "Z"})
+                # convert xform_ids field from array to comma separated list so flattening won't create
+                # one field per item. This is because some cases have up to 2000 xform_ids and we don't want 2000 extra
+                # fields in the schema
+                record["xform_ids"] = ",".join(record["xform_ids"])
+                frec = flatten(record)
+                yield frec
+        if self._cursor_value.microsecond == 0:
+            # Airbyte converts the cursor_field value (datetime) to string when it saves the state and
+            # our state setter parses the saved state with a format that contains microseconds
+            # self._cursor_value must have non-zero microseconds for the formatting and parsing to work correctly.
+            # This issue would also occur if an incoming record had a timestamp with zero microseconds
+            self._cursor_value = self._cursor_value.replace(microsecond=10)
+        # This cycle of pull is complete so clear out the form ids we saved for this cycle
+        CommcareStream.forms.clear()
+
+
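The comments in `Case.read_records` describe why the cursor is nudged to a non-zero microsecond value: a `datetime` with zero microseconds loses its fractional part when converted to a string, and the state setter then cannot parse it with a `%f` format. The sketch below reproduces that round trip, assuming the state is serialized in a way equivalent to `datetime.isoformat()`; that serialization detail and the helper name `roundtrip_ok` are assumptions, not something this diff confirms.

```python
from datetime import datetime

DATEFORMAT = "%Y-%m-%dT%H:%M:%S.%f"  # same format string as CommcareStream.dateformat


def roundtrip_ok(value: datetime) -> bool:
    """Return True if isoformat() output can be parsed back with DATEFORMAT."""
    try:
        datetime.strptime(value.isoformat(), DATEFORMAT)
        return True
    except ValueError:
        return False


whole_second = datetime(2022, 10, 1, 0, 0, 0)
print(whole_second.isoformat())  # no fractional part when microsecond == 0
print(roundtrip_ok(whole_second))
print(roundtrip_ok(whole_second.replace(microsecond=10)))

# Formatting explicitly with strftime always emits the six-digit fraction
explicit = whole_second.strftime(DATEFORMAT)
print(datetime.strptime(explicit, DATEFORMAT) == whole_second)
```

Under the same assumption, emitting the state with `strftime(DATEFORMAT)` would avoid shifting the cursor by ten microseconds, at the cost of storing a string rather than a `datetime` in the state mapping.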
+class Form(IncrementalStream):
+    """
+    docs: https://www.commcarehq.org/a/[domain]/api/[version]/form/
+    """
+
+    cursor_field = "indexed_on"
+    primary_key = "id"
+
+    def __init__(self, start_date, app_id, name, xmlns, schema, **kwargs):
+        super().__init__(**kwargs)
+        self.app_id = app_id
+        self._cursor_value = datetime.strptime(start_date, "%Y-%m-%dT%H:%M:%SZ")
+        self.streamname = name
+        self.xmlns = xmlns
+        self.schema = schema
+
+    @property
+    def name(self):
+        return self.streamname
+
+    def get_json_schema(self):
+        return self.schema
+
+    def path(
+        self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> str:
+        return "form"
+
+    def request_params(
+        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> MutableMapping[str, Any]:
+
+        # if self.cursor_field in self.state else self.initial_date
+        ix = self.state[self.cursor_field]
+        params = {
+            "format": "json",
+            "app_id": self.app_id,
+            "indexed_on_start": ix.strftime(self.dateformat),
+            "order_by": "indexed_on",
+            "limit": "1000",
+            "xmlns": self.xmlns,
+        }
+        if next_page_token:
+            params.update(next_page_token)
+        return params
+
+    def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
+        upd = {"streamname": self.streamname, "xmlns": self.xmlns}
+        for record in super().read_records(*args, **kwargs):
+            self._cursor_value = datetime.strptime(record[self.cursor_field], self.dateformat)
+            CommcareStream.forms.add(record["id"])
+            form = record["form"]
+            form.update(upd)
+            # Append Z to make it timezone aware
+            form.update({"id": record["id"], "indexed_on": record["indexed_on"] + "Z"})
+            newform = self.scrubUnwantedFields(form)
+            yield flatten(newform)
+        if self._cursor_value.microsecond == 0:
+            # Airbyte converts the cursor_field value (datetime) to string when it saves the state and
+            # our state setter parses the saved state with a format that contains microseconds
+            # self._cursor_value must have non-zero microseconds for the formatting and parsing to work correctly.
+            # This issue would also occur if an incoming record had a timestamp with zero microseconds
+            self._cursor_value = self._cursor_value.replace(microsecond=10)
+
+
+# Source
+class SourceCommcare(AbstractSource):
+    def check_connection(self, logger, config) -> Tuple[bool, Any]:
+        if "api_key" not in config:
+            return False, None
+        return True, None
+
+    def base_schema(self):
+        return {
+            "$schema": "http://json-schema.org/draft-07/schema#",
+            "type": "object",
+            "properties": {"id": {"type": "string"}, "indexed_on": {"type": "string", "format": "date-time"}},
+        }
+
+    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
+        auth = TokenAuthenticator(config["api_key"], auth_method="ApiKey")
+        args = {
+            "authenticator": auth,
+        }
+        appdata = Application(**{**args, "app_id": config["app_id"], "project_space": config["project_space"]}).read_records(
+            sync_mode=SyncMode.full_refresh
+        )
+
+        # Generate streams for forms, one per xmlns and one stream for cases.
+        streams = self.generate_streams(args, config, appdata)
+        return streams
+
+    def generate_streams(self, args, config, appdata):
+        form_args = {"app_id": config["app_id"], "start_date": config["start_date"], "project_space": config["project_space"], **args}
+        streams = []
+        name2xmlns = {}
+
+        # Collect the form names and xmlns from the application
+        for record in appdata:
+            mods = record["modules"]
+            for m in mods:
+                forms = m["forms"]
+                for f in forms:
+                    xmlns = f["xmlns"]
+                    formname = ""
+                    if "en" in f["name"]:
+                        formname = f["name"]["en"].strip()
+                    else:
+                        # Forms without an English name are named Unnamed_xxxxx, where xxxxx are the last 5 characters of the XMLNS
+                        # This convention gives us repeatable names
+                        formname = f"Unnamed_{xmlns[-5:]}"
+
+                    name = formname
+                    name2xmlns[name] = xmlns
+
+        # Create the streams from the collected names
+        # Sorted by name
+        for k in sorted(name2xmlns):
+            key = name2xmlns[k]
+            stream = Form(name=k, xmlns=key, schema=self.base_schema(), **form_args)
+            streams.append(stream)
+
+        stream = Case(
+            app_id=config["app_id"],
+            start_date=config["start_date"],
+            schema=self.base_schema(),
+            project_space=config["project_space"],
+            **args,
+        )
+        streams.append(stream)
+
+        return streams
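`generate_streams` derives one stream name per form from the application record and falls back to a name built from the form's xmlns when no English name exists. A standalone sketch of that collection step is given below; the function name `collect_form_names`, the example URLs, and the sample record are hypothetical, and real application payloads carry many more fields.

```python
from typing import Any, Dict, Iterable, Mapping


def collect_form_names(appdata: Iterable[Mapping[str, Any]]) -> Dict[str, str]:
    """Map each form's stream name to its xmlns, mirroring the loop in generate_streams."""
    name2xmlns: Dict[str, str] = {}
    for record in appdata:
        for module in record["modules"]:
            for form in module["forms"]:
                xmlns = form["xmlns"]
                if "en" in form["name"]:
                    name = form["name"]["en"].strip()
                else:
                    # Fall back to a repeatable name built from the last 5 characters of the xmlns
                    name = f"Unnamed_{xmlns[-5:]}"
                name2xmlns[name] = xmlns
    return name2xmlns


# Hypothetical application record
app = [{"modules": [{"forms": [
    {"name": {"en": " Assess a referred patient "}, "xmlns": "http://example.org/formdesigner/AAAA11111"},
    {"name": {"fr": "Suivi"}, "xmlns": "http://example.org/formdesigner/BBBB22222"},
]}]}]
names = collect_form_names(app)
print(sorted(names))
```

Because the mapping is keyed by name, two forms that share the same English name collapse into a single entry, with the later xmlns overwriting the earlier one.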
@@ -0,0 +1,38 @@
+documentationUrl: https://docsurl.com
+connectionSpecification:
+  $schema: http://json-schema.org/draft-07/schema#
+  title: Commcare Source Spec
+  type: object
+  required:
+    - api_key
+    - app_id
+    - start_date
+  properties:
+    api_key:
+      type: string
+      title: API Key
+      description: >-
+        Commcare API Key
+      airbyte_secret: true
+      order: 0
+    project_space:
+      type: string
+      title: Project Space
+      description: >-
+        Project Space for commcare
+      order: 1
+    app_id:
+      type: string
+      title: Application ID
+      description: >-
+        The Application ID we are interested in
+      airbyte_secret: true
+      order: 2
+    start_date:
+      type: string
+      title: Start date for extracting records
+      pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$
+      default: "2022-10-01T00:00:00Z"
+      description: >-
+        UTC date and time in the format 2017-01-25T00:00:00Z. Only records after this date will be replicated.
+      order: 3
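The `start_date` pattern in the spec and the `strptime` format used by the `Case` and `Form` constructors describe the same shape of timestamp. A minimal sketch that checks a value against both is shown here; the helper name `parse_start_date` is an assumption for illustration and does not exist in the package.

```python
import re
from datetime import datetime

# Pattern copied from spec.yaml; the parse format matches the one used in source.py
START_DATE_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$")


def parse_start_date(value: str) -> datetime:
    """Validate value against the spec pattern, then parse it the way the streams do."""
    if not START_DATE_PATTERN.match(value):
        raise ValueError(f"start_date does not match the spec pattern: {value!r}")
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


parsed = parse_start_date("2022-10-01T00:00:00Z")
print(parsed)
```

The pattern only constrains the digit layout, so a value such as `2022-13-01T00:00:00Z` passes the regex and is rejected only by `strptime`.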
unit_tests/test_source.py
ADDED
@@ -0,0 +1,25 @@
+#
+# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
+#
+
+from unittest.mock import MagicMock, Mock
+
+import pytest
+from source_commcare.source import SourceCommcare
+
+
+@pytest.fixture(name="config")
+def config_fixture():
+    return {"api_key": "apikey", "app_id": "appid", "start_date": "2022-01-01T00:00:00Z"}
+
+
+def test_check_connection_ok(mocker, config):
+    source = SourceCommcare()
+    logger_mock = Mock()
+    assert source.check_connection(logger_mock, config=config) == (True, None)
+
+
+def test_check_connection_fail(mocker, config):
+    source = SourceCommcare()
+    logger_mock = MagicMock()
+    assert source.check_connection(logger_mock, config={}) == (False, None)