airbyte_source_commcare-0.1.0-py3-none-any.whl
- airbyte_source_commcare-0.1.0.dist-info/METADATA +102 -0
- airbyte_source_commcare-0.1.0.dist-info/RECORD +19 -0
- airbyte_source_commcare-0.1.0.dist-info/WHEEL +5 -0
- airbyte_source_commcare-0.1.0.dist-info/entry_points.txt +2 -0
- airbyte_source_commcare-0.1.0.dist-info/top_level.txt +3 -0
- integration_tests/__init__.py +3 -0
- integration_tests/abnormal_state.json +5 -0
- integration_tests/acceptance.py +16 -0
- integration_tests/catalog.json +1 -0
- integration_tests/configured_catalog.json +16 -0
- integration_tests/invalid_config.json +6 -0
- integration_tests/sample_config.json +6 -0
- integration_tests/sample_state.json +5 -0
- source_commcare/__init__.py +8 -0
- source_commcare/run.py +14 -0
- source_commcare/source.py +337 -0
- source_commcare/spec.yaml +38 -0
- unit_tests/__init__.py +3 -0
- unit_tests/test_source.py +25 -0
airbyte_source_commcare-0.1.0.dist-info/METADATA
ADDED
@@ -0,0 +1,102 @@
Metadata-Version: 2.1
Name: airbyte-source-commcare
Version: 0.1.0
Summary: Source implementation for Commcare.
Author: Airbyte
Author-email: contact@airbyte.io
Description-Content-Type: text/markdown
Requires-Dist: airbyte-cdk
Requires-Dist: bigquery-schema-generator ~=1.5
Requires-Dist: gbqschema-converter ~=1.2.0
Requires-Dist: flatten-json ~=0.1.13
Provides-Extra: tests
Requires-Dist: requests-mock ~=1.9.3 ; extra == 'tests'
Requires-Dist: pytest ~=6.1 ; extra == 'tests'
Requires-Dist: pytest-mock ~=3.6.1 ; extra == 'tests'

# Commcare Source

This is the repository for the Commcare source connector, written in Python.
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/commcare).

**To iterate on this connector, make sure to complete this prerequisites section.**

From this connector directory, create a virtual environment:
```
python -m venv .venv
```

This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
development environment of choice. To activate it from the terminal, run:
```
source .venv/bin/activate
pip install -r requirements.txt
pip install '.[tests]'
```
If you are in an IDE, follow your IDE's instructions to activate the virtualenv.

Note that while we install dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo and will call `setup.py`.
If this is mumbo jumbo to you, don't worry about it: just put your deps in `setup.py`, but install using `pip install -r requirements.txt`, and everything
should work as you expect.

**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/commcare)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to `source_commcare/spec.yaml`.
Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
See `integration_tests/sample_config.json` for a sample config file.
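
A minimal sketch of what `secrets/config.json` might look like, with placeholder values only: the spec marks `api_key`, `app_id` and `start_date` as required, and the streams also read `project_space`.
```json
{
  "api_key": "<your CommCare API key>",
  "project_space": "<your project space>",
  "app_id": "<your application id>",
  "start_date": "2022-10-01T00:00:00Z"
}
```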

**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source commcare test creds`
and place them into `secrets/config.json`.

To run the connector locally:
```
python main.py spec
python main.py check --config secrets/config.json
python main.py discover --config secrets/config.json
python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
```

To build the connector image:

**Via [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) (recommended):**
```bash
airbyte-ci connectors --name=source-commcare build
```

An image will be built with the tag `airbyte/source-commcare:dev`.

**Via `docker build`:**
```bash
docker build -t airbyte/source-commcare:dev .
```

Then run any of the connector commands as follows:
```
docker run --rm airbyte/source-commcare:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-commcare:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-commcare:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-commcare:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
```

You can run our full test suite locally using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md):
```bash
airbyte-ci connectors --name=source-commcare test
```

Customize the `acceptance-test-config.yml` file to configure tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
If your connector requires creating or destroying resources during acceptance tests, create fixtures for them and place them inside `integration_tests/acceptance.py`.

All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
We split dependencies into two groups (a sketch follows below):
* dependencies required for your connector to work go in the `MAIN_REQUIREMENTS` list.
* dependencies required for testing go in the `TEST_REQUIREMENTS` list.
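
The connector's `setup.py` is not included in this wheel, but based on the `Requires-Dist` entries in the metadata above, a sketch of the two dependency groups could look roughly like this (illustrative, not the actual file):
```python
# Illustrative sketch of the dependency split described above; the real
# setup.py is not part of the wheel, so the pins and metadata here are
# reconstructed from the Requires-Dist entries.
from setuptools import find_packages, setup

MAIN_REQUIREMENTS = [
    "airbyte-cdk",
    "bigquery-schema-generator~=1.5",
    "gbqschema-converter~=1.2.0",
    "flatten-json~=0.1.13",
]

TEST_REQUIREMENTS = [
    "requests-mock~=1.9.3",
    "pytest~=6.1",
    "pytest-mock~=3.6.1",
]

setup(
    name="airbyte-source-commcare",
    version="0.1.0",
    description="Source implementation for Commcare.",
    author="Airbyte",
    author_email="contact@airbyte.io",
    packages=find_packages(),
    install_requires=MAIN_REQUIREMENTS,
    extras_require={"tests": TEST_REQUIREMENTS},
)
```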

You've checked out the repo, implemented a million-dollar feature, and you're ready to share your changes with the world. Now what?
1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-commcare test`
2. Bump the connector version in `metadata.yaml`: increment the `dockerImageTag` value. Please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors).
3. Make sure the `metadata.yaml` content is up to date.
4. Make sure the connector documentation and its changelog are up to date (`docs/integrations/sources/commcare.md`).
5. Create a Pull Request: use [our PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#pull-request-title-convention).
6. Pat yourself on the back for being an awesome contributor.
7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
airbyte_source_commcare-0.1.0.dist-info/RECORD
ADDED
@@ -0,0 +1,19 @@
integration_tests/__init__.py,sha256=4Hw-PX1-VgESLF16cDdvuYCzGJtHntThLF4qIiULWeo,61
integration_tests/abnormal_state.json,sha256=-DJMcBhq1do1ntOZXzsaGmxvbDTmYmFS0cSSaUoRLZ4,86
integration_tests/acceptance.py,sha256=8eU9iSDbmHyufPvAouJGhPMgPAFTCP8IKIKHLm7u5TE,435
integration_tests/catalog.json,sha256=yj0WO6sFU4GCciYUBWjzvvfqrBh869doeOC2Pp5EI1Y,3
integration_tests/configured_catalog.json,sha256=1tbuuekIkjKW6NKdwW9ulYAYNjErYgAJqDsq08etQLE,397
integration_tests/invalid_config.json,sha256=kXy1C-2u6aW122jeDBlv1taNZDYjzdjRyer_2B1xgk8,142
integration_tests/sample_config.json,sha256=Lh9_zss5lxbqGFYpZnvi8qo5eIMNcuQ6GBDKbw1dVhM,115
integration_tests/sample_state.json,sha256=HBR6G9QPoPDFAlfAKwRmeNaQ5_56xehMLtwWr4mxd5k,63
source_commcare/__init__.py,sha256=e2gcZC5sfkJ_uidHqoPCVmqUdL9mhpMrw4Ix4uIzSng,128
source_commcare/run.py,sha256=aZZ10q2rM_fY8lmNk4Gmb_pOhDJvv-biHN6V-ut5Y6I,236
source_commcare/source.py,sha256=X_4bNim1LBDCVjqnhWViaFgsjPvMTXkZZ1yNRGRf1Aw,12956
source_commcare/spec.yaml,sha256=CMpevuCDKMFZmq16CdeilpvXKbCiQPjmI1fcgzvJQuQ,1019
unit_tests/__init__.py,sha256=4Hw-PX1-VgESLF16cDdvuYCzGJtHntThLF4qIiULWeo,61
unit_tests/test_source.py,sha256=qneLJGlvcpoyFlnvwzwP2voDtzi8z6c_N-lu0won5wM,682
airbyte_source_commcare-0.1.0.dist-info/METADATA,sha256=-GDqr4YKVK8LGyoTB9nmPitvdARyMlTt9LOzmE5dJUc,5536
airbyte_source_commcare-0.1.0.dist-info/WHEEL,sha256=oiQVh_5PnQM0E3gPdiz09WCNmwiHDMaGer_elqB3coM,92
airbyte_source_commcare-0.1.0.dist-info/entry_points.txt,sha256=zVJihl1jgNaayMgxgA1G9KN85LZ9T8bbss4VfqE78Bg,60
airbyte_source_commcare-0.1.0.dist-info/top_level.txt,sha256=WCua9CoU5uJ26Af2zVYpamondw5woLFReykWLPCjX7s,45
airbyte_source_commcare-0.1.0.dist-info/RECORD,,
integration_tests/acceptance.py
ADDED
@@ -0,0 +1,16 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#


import pytest

pytest_plugins = ("connector_acceptance_test.plugin",)


@pytest.fixture(scope="session", autouse=True)
def connector_setup():
    """This fixture is a placeholder for external resources that acceptance test might require."""
    # TODO: setup test dependencies if needed. otherwise remove the TODO comments
    yield
    # TODO: clean up test dependencies
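
As the README notes, resource setup and teardown for acceptance tests belongs in this fixture. A hedged sketch of how the TODOs could be filled in, using a temporary directory as a stand-in for whatever external resources the tests actually need (nothing like this ships with the connector):
```python
# Illustrative only: one way to replace the TODOs in connector_setup().
# A temporary directory stands in for real test resources.
import shutil
import tempfile

import pytest

pytest_plugins = ("connector_acceptance_test.plugin",)


@pytest.fixture(scope="session", autouse=True)
def connector_setup():
    """Create resources the acceptance tests need, then clean them up."""
    workdir = tempfile.mkdtemp(prefix="commcare-acceptance-")  # setup
    yield
    shutil.rmtree(workdir, ignore_errors=True)  # teardown
```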
integration_tests/catalog.json
ADDED
@@ -0,0 +1 @@
{}
integration_tests/configured_catalog.json
ADDED
@@ -0,0 +1,16 @@
{
  "streams": [
    {
      "stream": {
        "name": "Assess a referred patient",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"],
        "source_defined_cursor": true,
        "default_cursor_field": ["indexed_on"]
      },
      "sync_mode": "incremental",
      "cursor_field": ["indexed_on"],
      "destination_sync_mode": "append"
    }
  ]
}
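
Stream names in this catalog correspond to form names discovered from the CommCare application; `source.py` also exposes a case stream that it deliberately names `zzz_case` so it sorts after all form streams. A catalog that additionally syncs cases would add an entry along these lines (illustrative, mirroring the entry above):
```json
{
  "stream": {
    "name": "zzz_case",
    "json_schema": {},
    "supported_sync_modes": ["full_refresh", "incremental"],
    "source_defined_cursor": true,
    "default_cursor_field": ["indexed_on"]
  },
  "sync_mode": "incremental",
  "cursor_field": ["indexed_on"],
  "destination_sync_mode": "append"
}
```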
source_commcare/run.py
ADDED
@@ -0,0 +1,14 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#


import sys

from airbyte_cdk.entrypoint import launch
from source_commcare import SourceCommcare


def run():
    source = SourceCommcare()
    launch(source, sys.argv[1:])
source_commcare/source.py
ADDED
@@ -0,0 +1,337 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

import re
from abc import ABC
from datetime import datetime
from typing import Any, Iterable, List, Mapping, MutableMapping, Optional, Tuple
from urllib.parse import parse_qs

import requests
from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import IncrementalMixin, Stream
from airbyte_cdk.sources.streams.http import HttpStream
from airbyte_cdk.sources.streams.http.requests_native_auth import TokenAuthenticator
from flatten_json import flatten


# Basic full refresh stream
class CommcareStream(HttpStream, ABC):
    def __init__(self, project_space, **kwargs):
        super().__init__(**kwargs)
        self.project_space = project_space

    @property
    def url_base(self) -> str:
        return f"https://www.commcarehq.org/a/{self.project_space}/api/v0.5/"

    # These class variables save state
    # forms holds form ids and we filter cases which contain one of these form ids
    # last_form_date stores the date of the last form read so the next cycle for forms and cases starts at the same timestamp
    forms = set()
    last_form_date = None
    schemas = {}
    unwantedfields = re.compile(r"^(case_|update_|meta|create_|commcare_).*$")

    @property
    def dateformat(self):
        return "%Y-%m-%dT%H:%M:%S.%f"

    def scrubUnwantedFields(self, form):
        newform = {k: v for k, v in form.items() if not self.unwantedfields.match(k)}
        return newform

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        try:
            # Server returns status 500 when there are no more rows.
            # raise an error if server returns an error
            response.raise_for_status()
            meta = response.json()["meta"]
            return parse_qs(meta["next"][1:])
        except Exception as ex:
            return ex

    def request_params(
        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, any] = None, next_page_token: Mapping[str, Any] = None
    ) -> MutableMapping[str, Any]:

        params = {"format": "json"}
        return params


class Application(CommcareStream):
    primary_key = "id"

    def __init__(self, app_id, **kwargs):
        super().__init__(**kwargs)
        self.app_id = app_id

    def path(
        self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
    ) -> str:
        return f"application/{self.app_id}/"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        return None

    def request_params(
        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, any] = None, next_page_token: Mapping[str, Any] = None
    ) -> MutableMapping[str, Any]:

        params = {"format": "json", "extras": "true"}
        return params

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        yield response.json()


class IncrementalStream(CommcareStream, IncrementalMixin):
    cursor_field = "indexed_on"
    _cursor_value = None

    @property
    def state(self) -> Mapping[str, Any]:
        if self._cursor_value:
            return {self.cursor_field: self._cursor_value}

    @state.setter
    def state(self, value: Mapping[str, Any]):
        self._cursor_value = datetime.strptime(value[self.cursor_field], self.dateformat)

    @property
    def sync_mode(self):
        return SyncMode.incremental

    @property
    def supported_sync_modes(self):
        return [SyncMode.incremental]

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        try:
            # Server returns status 500 when there are no more rows.
            # raise an error if server returns an error
            response.raise_for_status()
            meta = response.json()["meta"]
            if meta["next"]:
                return parse_qs(meta["next"][1:])
            return None
        except Exception:
            return None

    def request_params(
        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, any] = None, next_page_token: Mapping[str, Any] = None
    ) -> MutableMapping[str, Any]:

        params = {"format": "json"}
        if next_page_token:
            params.update(next_page_token)
        return params

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        for o in iter(response.json()["objects"]):
            yield o
        return None


class Case(IncrementalStream):

    """
    docs: https://www.commcarehq.org/a/[domain]/api/[version]/case/
    """

    cursor_field = "indexed_on"
    primary_key = "id"

    def __init__(self, start_date, app_id, schema, **kwargs):
        super().__init__(**kwargs)
        self._cursor_value = datetime.strptime(start_date, "%Y-%m-%dT%H:%M:%SZ")
        self.schema = schema

    def get_json_schema(self):
        return self.schema

    @property
    def name(self):
        # Airbyte orders streams in alpha order but since we have dependent peers and we need to
        # pull all forms before cases, we name this stream to
        # ensure this stream gets pulled last (assuming ascii stream names only)
        return "zzz_case"

    def path(
        self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
    ) -> str:
        return "case"

    def request_params(
        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, any] = None, next_page_token: Mapping[str, Any] = None
    ) -> MutableMapping[str, Any]:

        # start date is what we saved for forms
        # if self.cursor_field in self.state else (CommcareStream.last_form_date or self.initial_date)
        ix = self.state[self.cursor_field]
        params = {"format": "json", "indexed_on_start": ix.strftime(self.dateformat), "order_by": "indexed_on", "limit": "5000"}
        if next_page_token:
            params.update(next_page_token)
        return params

    def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
        for record in super().read_records(*args, **kwargs):
            found = False
            for f in record["xform_ids"]:
                if f in CommcareStream.forms:
                    found = True
                    break
            if found:
                self._cursor_value = datetime.strptime(record[self.cursor_field], self.dateformat)
                # Make indexed_on tz aware
                record.update({"streamname": "case", "indexed_on": record["indexed_on"] + "Z"})
                # convert xform_ids field from array to comma separated list so flattening won't create
                # one field per item. This is because some cases have up to 2000 xform_ids and we don't want 2000 extra
                # fields in the schema
                record["xform_ids"] = ",".join(record["xform_ids"])
                frec = flatten(record)
                yield frec
        if self._cursor_value.microsecond == 0:
            # Airbyte converts the cursor_field value (datetime) to string when it saves the state and
            # our state setter parses the saved state with a format that contains microseconds
            # self._cursor_value must have non-zero microseconds for the formatting and parsing to work correctly.
            # This issue would also occur if an incoming record had a timestamp with zero microseconds
            self._cursor_value = self._cursor_value.replace(microsecond=10)
        # This cycle of pull is complete so clear out the form ids we saved for this cycle
        CommcareStream.forms.clear()


class Form(IncrementalStream):
    """
    docs: https://www.commcarehq.org/a/[domain]/api/[version]/form/
    """

    cursor_field = "indexed_on"
    primary_key = "id"

    def __init__(self, start_date, app_id, name, xmlns, schema, **kwargs):
        super().__init__(**kwargs)
        self.app_id = app_id
        self._cursor_value = datetime.strptime(start_date, "%Y-%m-%dT%H:%M:%SZ")
        self.streamname = name
        self.xmlns = xmlns
        self.schema = schema

    @property
    def name(self):
        return self.streamname

    def get_json_schema(self):
        return self.schema

    def path(
        self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
    ) -> str:
        return "form"

    def request_params(
        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, any] = None, next_page_token: Mapping[str, Any] = None
    ) -> MutableMapping[str, Any]:

        # if self.cursor_field in self.state else self.initial_date
        ix = self.state[self.cursor_field]
        params = {
            "format": "json",
            "app_id": self.app_id,
            "indexed_on_start": ix.strftime(self.dateformat),
            "order_by": "indexed_on",
            "limit": "1000",
            "xmlns": self.xmlns,
        }
        if next_page_token:
            params.update(next_page_token)
        return params

    def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
        upd = {"streamname": self.streamname, "xmlns": self.xmlns}
        for record in super().read_records(*args, **kwargs):
            self._cursor_value = datetime.strptime(record[self.cursor_field], self.dateformat)
            CommcareStream.forms.add(record["id"])
            form = record["form"]
            form.update(upd)
            # Append Z to make it timezone aware
            form.update({"id": record["id"], "indexed_on": record["indexed_on"] + "Z"})
            newform = self.scrubUnwantedFields(form)
            yield flatten(newform)
        if self._cursor_value.microsecond == 0:
            # Airbyte converts the cursor_field value (datetime) to string when it saves the state and
            # our state setter parses the saved state with a format that contains microseconds
            # self._cursor_value must have non-zero microseconds for the formatting and parsing to work correctly.
            # This issue would also occur if an incoming record had a timestamp with zero microseconds
            self._cursor_value = self._cursor_value.replace(microsecond=10)


# Source
class SourceCommcare(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, any]:
        if "api_key" not in config:
            return False, None
        return True, None

    def base_schema(self):
        return {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": {"id": {"type": "string"}, "indexed_on": {"type": "string", "format": "date-time"}},
        }

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        auth = TokenAuthenticator(config["api_key"], auth_method="ApiKey")
        args = {
            "authenticator": auth,
        }
        appdata = Application(**{**args, "app_id": config["app_id"], "project_space": config["project_space"]}).read_records(
            sync_mode=SyncMode.full_refresh
        )

        # Generate streams for forms, one per xmlns and one stream for cases.
        streams = self.generate_streams(args, config, appdata)
        return streams

    def generate_streams(self, args, config, appdata):
        form_args = {"app_id": config["app_id"], "start_date": config["start_date"], "project_space": config["project_space"], **args}
        streams = []
        name2xmlns = {}

        # Collect the form names and xmlns from the application
        for record in appdata:
            mods = record["modules"]
            for m in mods:
                forms = m["forms"]
                for f in forms:
                    xmlns = f["xmlns"]
                    formname = ""
                    if "en" in f["name"]:
                        formname = f["name"]["en"].strip()
                    else:
                        # Unknown forms are named UNNAMED_xxxxx where xxxxx are the last 5 digits of the XMLNS
                        # This convention gives us repeatable names
                        formname = f"Unnamed_{xmlns[-5:]}"

                    name = formname
                    name2xmlns[name] = xmlns

        # Create the streams from the collected names
        # Sorted by name
        for k in sorted(name2xmlns):
            key = name2xmlns[k]
            stream = Form(name=k, xmlns=key, schema=self.base_schema(), **form_args)
            streams.append(stream)

        stream = Case(
            app_id=config["app_id"],
            start_date=config["start_date"],
            schema=self.base_schema(),
            project_space=config["project_space"],
            **args,
        )
        streams.append(stream)

        return streams
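
A short sketch (not part of the wheel) of how pagination in `IncrementalStream` works: CommCare's list endpoints return a `meta.next` value that is itself a query string, `next_page_token` strips the leading `?` and runs it through `parse_qs`, and `request_params` merges the result into the next request. The `meta` payload below is a hypothetical example, not captured output.
```python
from urllib.parse import parse_qs

# Hypothetical `meta` block from a CommCare list endpoint response.
meta = {"limit": 1000, "offset": 0, "next": "?limit=1000&offset=1000&format=json"}

# What IncrementalStream.next_page_token() would return for this response:
next_page_token = parse_qs(meta["next"][1:])  # strip the leading "?"
print(next_page_token)  # {'limit': ['1000'], 'offset': ['1000'], 'format': ['json']}

# request_params() then folds it into the base params, so the follow-up request
# repeats the query with the advanced offset.
params = {"format": "json"}
params.update(next_page_token)
```
The `microsecond == 0` adjustment in `read_records` exists because the saved state is later re-parsed with `"%Y-%m-%dT%H:%M:%S.%f"`, which needs a fractional-second component to be present.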
source_commcare/spec.yaml
ADDED
@@ -0,0 +1,38 @@
documentationUrl: https://docsurl.com
connectionSpecification:
  $schema: http://json-schema.org/draft-07/schema#
  title: Commcare Source Spec
  type: object
  required:
    - api_key
    - app_id
    - start_date
  properties:
    api_key:
      type: string
      title: API Key
      description: >-
        Commcare API Key
      airbyte_secret: true
      order: 0
    project_space:
      type: string
      title: Project Space
      description: >-
        Project Space for commcare
      order: 1
    app_id:
      type: string
      title: Application ID
      description: >-
        The Application ID we are interested in
      airbyte_secret: true
      order: 2
    start_date:
      type: string
      title: Start date for extracting records
      pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$
      default: "2022-10-01T00:00:00Z"
      description: >-
        UTC date and time in the format 2017-01-25T00:00:00Z. Only records after this date will be replicated.
      order: 3
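
The `start_date` pattern above lines up with the `"%Y-%m-%dT%H:%M:%SZ"` format that the `Form` and `Case` constructors use to parse it. A quick, illustrative check:
```python
# Illustrative check that a start_date satisfies both the spec.yaml pattern
# and the strptime format used by the stream constructors in source.py.
import re
from datetime import datetime

PATTERN = r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
value = "2022-10-01T00:00:00Z"  # the spec's default

assert re.match(PATTERN, value)
datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")  # parses without error
```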
unit_tests/test_source.py
ADDED
@@ -0,0 +1,25 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from unittest.mock import MagicMock, Mock

import pytest
from source_commcare.source import SourceCommcare


@pytest.fixture(name="config")
def config_fixture():
    return {"api_key": "apikey", "app_id": "appid", "start_date": "2022-01-01T00:00:00Z"}


def test_check_connection_ok(mocker, config):
    source = SourceCommcare()
    logger_mock = Mock()
    assert source.check_connection(logger_mock, config=config) == (True, None)


def test_check_connection_fail(mocker, config):
    source = SourceCommcare()
    logger_mock = MagicMock()
    assert source.check_connection(logger_mock, config={}) == (False, None)