allus-company-data 0.0.3__tar.gz
- allus_company_data-0.0.3/LICENSE +21 -0
- allus_company_data-0.0.3/PKG-INFO +666 -0
- allus_company_data-0.0.3/README.md +648 -0
- allus_company_data-0.0.3/pyproject.toml +32 -0
- allus_company_data-0.0.3/setup.cfg +4 -0
- allus_company_data-0.0.3/src/allus_company_data/__init__.py +63 -0
- allus_company_data-0.0.3/src/allus_company_data/buffer.py +325 -0
- allus_company_data-0.0.3/src/allus_company_data/client.py +399 -0
- allus_company_data-0.0.3/src/allus_company_data/config.py +234 -0
- allus_company_data-0.0.3/src/allus_company_data/crypto.py +285 -0
- allus_company_data-0.0.3/src/allus_company_data/errors.py +103 -0
- allus_company_data-0.0.3/src/allus_company_data/http.py +301 -0
- allus_company_data-0.0.3/src/allus_company_data/models.py +404 -0
- allus_company_data-0.0.3/src/allus_company_data/pump.py +360 -0
- allus_company_data-0.0.3/src/allus_company_data/webhooks.py +370 -0
- allus_company_data-0.0.3/src/allus_company_data.egg-info/PKG-INFO +666 -0
- allus_company_data-0.0.3/src/allus_company_data.egg-info/SOURCES.txt +25 -0
- allus_company_data-0.0.3/src/allus_company_data.egg-info/dependency_links.txt +1 -0
- allus_company_data-0.0.3/src/allus_company_data.egg-info/requires.txt +5 -0
- allus_company_data-0.0.3/src/allus_company_data.egg-info/top_level.txt +1 -0
- allus_company_data-0.0.3/tests/test_client.py +427 -0
- allus_company_data-0.0.3/tests/test_config.py +135 -0
- allus_company_data-0.0.3/tests/test_crypto.py +267 -0
- allus_company_data-0.0.3/tests/test_http.py +338 -0
- allus_company_data-0.0.3/tests/test_models.py +370 -0
- allus_company_data-0.0.3/tests/test_pump.py +709 -0
- allus_company_data-0.0.3/tests/test_webhooks.py +563 -0
|
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 allme.fyi
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,666 @@
+Metadata-Version: 2.4
+Name: allus-company-data
+Version: 0.0.3
+Summary: Reference Python SDK for the allus company-data API: typed, plaintext, slug-keyed conclusions with transparent decryption.
+Author: allme.fyi
+License: MIT
+Project-URL: Homepage, https://github.com/allus-fyi/company-data-python
+Project-URL: Repository, https://github.com/allus-fyi/company-data-python
+Keywords: allus,allme,company-data,sdk
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: cryptography>=42
+Requires-Dist: requests>=2.31
+Provides-Extra: dev
+Requires-Dist: pytest>=8; extra == "dev"
+Dynamic: license-file
+
|
|
19
|
+
# allus-company-data (Python)
|
|
20
|
+
|
|
21
|
+
The Python SDK for the **allus company-data API**. Point it at a JSON
|
|
22
|
+
config file and it hands back typed, plaintext, **your-slug-keyed conclusions**:
|
|
23
|
+
for each connected person, a map of *your request-field slug → plaintext value*
|
|
24
|
+
(plus whether the value is live and when it last changed).
|
|
25
|
+
|
|
26
|
+
The SDK hides everything else — the OAuth token, the field catalog, the id
|
|
27
|
+
plumbing, the hybrid decryption, binary fetching, the changes-queue mechanics,
|
|
28
|
+
JSON-vs-XML. The platform is **zero-knowledge**: the API only ever holds
|
|
29
|
+
ciphertext, so all decryption happens inside the SDK with your service private
|
|
30
|
+
key. **The person's own field choices are never exposed** — you only ever see
|
|
31
|
+
the request slots you configured.
|
|
32
|
+
|
|
33
|
+
> This SDK is one of six language ports that share an identical API surface.
|
|
34
|
+
> This manual is the Python view of it.
|
|
35
|
+
|
|
36
|
+
**Contents:** [TL;DR — fetch new updates](#tldr--fetch-new-updates) ·
|
|
37
|
+
[Quickstart](#quickstart) · [Every call](#every-call) ·
|
|
38
|
+
[The typed value model](#the-typed-value-model) ·
|
|
39
|
+
[The changes pump](#the-changes-pump) · [Webhooks](#webhooks) ·
|
|
40
|
+
[Rate limits](#rate-limits) · [Errors](#errors) ·
|
|
41
|
+
[How it's wired](#how-its-wired)
|
|
42
|
+
|
|
43
|
+
Deeper reference pages live in [`docs/`](docs/):
|
|
44
|
+
[config](docs/config.md) · [model](docs/model.md) · [pump](docs/pump.md) ·
|
|
45
|
+
[webhooks](docs/webhooks.md) · [errors](docs/errors.md).
|
|
46
|
+
|
|
47
|
+
---
|
|
48
|
+
|
|
49
|
+
## TL;DR — fetch new updates
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
pip install allus-company-data
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
Point a config.json at your service keys:
|
|
56
|
+
|
|
57
|
+
```json
|
|
58
|
+
{
|
|
59
|
+
"api_url": "https://api.allme.fyi",
|
|
60
|
+
"client_id": "svc_xxx",
|
|
61
|
+
"client_secret": "xxx",
|
|
62
|
+
"service_private_key": "/path/to/service.pem",
|
|
63
|
+
"key_passphrase": "xxx",
|
|
64
|
+
"cache_dir": "./allus-cache"
|
|
65
|
+
}
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
Drain everything new, handled one update at a time:
|
|
69
|
+
|
|
70
|
+
```python
|
|
71
|
+
from allus_company_data import Client
|
|
72
|
+
|
|
73
|
+
client = Client.from_config("config.json")
|
|
74
|
+
|
|
75
|
+
def handle(change):
|
|
76
|
+
# one update at a time: event, person, slug, value, live, at
|
|
77
|
+
print(change.event, change.person_id, change.slug, change.value,
|
|
78
|
+
"live" if change.live else "snapshot", change.at)
|
|
79
|
+
|
|
80
|
+
client.process_changes(handle) # returns when the feed is empty
|
|
81
|
+
```
|
|
82
|
+
|
|
83
|
+
`process_changes` pulls every pending change, decrypts each one, and hands them to your
|
|
84
|
+
callback ONE BY ONE, acking each only after your code returns. Crash mid-batch?
|
|
85
|
+
The next run replays exactly what wasn't acked — nothing is lost, and the API
|
|
86
|
+
keeps no backlog of its own. Run it on a schedule (cron / systemd timer); there
|
|
87
|
+
is no daemon/follow mode by design. Connections, binary values, and webhooks are
|
|
88
|
+
documented below.
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## Quickstart
|
|
93
|
+
|
|
94
|
+
Requires **Python ≥ 3.11**.
|
|
95
|
+
|
|
96
|
+
```bash
|
|
97
|
+
pip install allus-company-data
|
|
98
|
+
# or, working from this repo: pip install -e '.[dev]' # from sdks/python/
|
|
99
|
+
python -c "import allus_company_data; print(allus_company_data.__version__)"
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
### 1. Write a config file
|
|
103
|
+
|
|
104
|
+
A single JSON file holds everything. Any field can be overridden by an `ALLUS_*`
|
|
105
|
+
env var, so secrets needn't live in the file. **No SDK method ever takes a key,
|
|
106
|
+
passphrase, or secret as an argument** — they all come from here.
|
|
107
|
+
|
|
108
|
+
`allus.json`:
|
|
109
|
+
|
|
110
|
+
```json
|
|
111
|
+
{
|
|
112
|
+
"api_url": "https://api.allme.fyi",
|
|
113
|
+
"client_id": "svc_1a2b3c…",
|
|
114
|
+
"client_secret": "…",
|
|
115
|
+
"service_private_key": "./service-CRM.pem",
|
|
116
|
+
"key_passphrase": "…",
|
|
117
|
+
|
|
118
|
+
"account_private_key": "./account.pem",
|
|
119
|
+
"account_passphrase": "…",
|
|
120
|
+
|
|
121
|
+
"webhooks": {
|
|
122
|
+
"wh_abc123": "hmac_secret_for_that_webhook"
|
|
123
|
+
},
|
|
124
|
+
|
|
125
|
+
"cache_dir": "./allus-cache",
|
|
126
|
+
"format": "json"
|
|
127
|
+
}
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
| Field | Required | Meaning |
|
|
131
|
+
|-------|----------|---------|
|
|
132
|
+
| `api_url` | yes | API base, e.g. `https://api.allme.fyi`. |
|
|
133
|
+
| `client_id` / `client_secret` | yes | The registered `client_credentials` credentials for **one** service. |
|
|
134
|
+
| `service_private_key` | yes | Path to the OpenSSL-encrypted PKCS#8 PEM you downloaded from the portal. |
|
|
135
|
+
| `key_passphrase` | yes | Decrypts that PEM in memory at startup. |
|
|
136
|
+
| `account_private_key` / `account_passphrase` | only for `encrypt_payload` webhooks | The company **account** key, used to unwrap an encrypted webhook envelope. |
|
|
137
|
+
| `webhooks` / `webhook_secret` | webhook auth — HMAC (default) | Per-webhook HMAC secrets keyed by webhook id (matched via the `X-Allus-Webhook-Id` header). A single-webhook service can use a flat `"webhook_secret": "…"` instead of the map. |
|
|
138
|
+
| `webhook_bearer_token` | webhook auth — bearer | Verify `Authorization: Bearer <token>` deliveries. |
|
|
139
|
+
| `webhook_basic` | webhook auth — basic | `{"username","password"}` — verify HTTP Basic deliveries. |
|
|
140
|
+
| `webhook_header` | webhook auth — header | `{"name","value"}` — verify a custom-header delivery. |
|
|
141
|
+
| `webhook_auth_none` | webhook auth — none | `true` — explicit opt-out; `verify_webhook` always passes (use only behind your own gateway). **Configure at most one** webhook auth method (two or more → `ConfigError`). |
|
|
142
|
+
| `cache_dir` | no (default `./allus-cache`) | Durable local buffer for the changes pump. Must be writable + durable. |
|
|
143
|
+
| `format` | no (default `json`) | Wire format `json` or `xml`. Invisible in the output. |
|
|
144
|
+
|
|
145
|
+
Env overrides use the upper-cased field name with an `ALLUS_` prefix, e.g.
|
|
146
|
+
`ALLUS_CLIENT_SECRET`, `ALLUS_KEY_PASSPHRASE`, `ALLUS_ACCOUNT_PASSPHRASE`,
|
|
147
|
+
`ALLUS_WEBHOOK_SECRET`. A missing/invalid config (or an unreadable PEM / wrong
|
|
148
|
+
passphrase) raises `ConfigError` at construction — fail fast.
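
The override rule can be pictured with a small sketch. It assumes each variable is `ALLUS_` plus the upper-cased field name, as the examples above suggest; `apply_env_overrides` and the `OVERRIDABLE` tuple are hypothetical names for illustration, not part of the SDK, and the real loader may treat empty variables or nested fields differently.

```python
import os
from typing import Mapping

# Hypothetical subset of overridable string fields (names taken from the table above).
OVERRIDABLE = ("api_url", "client_id", "client_secret", "key_passphrase",
               "account_passphrase", "webhook_secret", "cache_dir", "format")


def apply_env_overrides(config: dict, environ: Mapping[str, str] = os.environ) -> dict:
    """Return a copy of config where any ALLUS_<FIELD> variable wins over the file."""
    merged = dict(config)
    for field in OVERRIDABLE:
        env_value = environ.get("ALLUS_" + field.upper())
        if env_value:  # unset or empty variables leave the file value alone
            merged[field] = env_value
    return merged


file_config = {"api_url": "https://api.allme.fyi", "client_secret": "from-file"}
merged = apply_env_overrides(file_config, {"ALLUS_CLIENT_SECRET": "from-env"})
print(merged["client_secret"])  # from-env
```

Passing the environment as an argument keeps the sketch testable without touching the real process environment.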
|
|
149
|
+
|
|
150
|
+
### 2. First call — list a connection's values
|
|
151
|
+
|
|
152
|
+
```python
|
|
153
|
+
from allus_company_data import Client
|
|
154
|
+
|
|
155
|
+
client = Client.from_config("allus.json")
|
|
156
|
+
|
|
157
|
+
# Iterate every connected person (lazy, auto-paged).
|
|
158
|
+
for conn in client.connections():
|
|
159
|
+
print(conn.display_name, conn.person_id)
|
|
160
|
+
for slug, val in conn.values.items():
|
|
161
|
+
print(f" {slug} = {val.value!r} (live={val.live}, updated={val.updated_at})")
|
|
162
|
+
break # just the first one for the demo
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
Or fetch one connection by id:
|
|
166
|
+
|
|
167
|
+
```python
|
|
168
|
+
conn = client.connection("019xxxxxxxxxxxxxxxxxxxxxxxxx")
|
|
169
|
+
email = conn.values["work_email"].value # "alice@acme.com" (a str)
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
`client = Client.from_env()` builds the same client entirely from `ALLUS_*`
|
|
173
|
+
env vars (no file).
|
|
174
|
+
|
|
175
|
+
---
|
|
176
|
+
|
|
177
|
+
## Every call
|
|
178
|
+
|
|
179
|
+
`Client` is the only object you construct. Build it from config, then:
|
|
180
|
+
|
|
181
|
+
```python
|
|
182
|
+
Client.from_config(path, **kwargs) -> Client # from a JSON file (env overrides secrets)
|
|
183
|
+
Client.from_env(**kwargs) -> Client # entirely from ALLUS_* env vars
|
|
184
|
+
```
|
|
185
|
+
|
|
186
|
+
`kwargs` are advanced/optional: `http` (an injected `HttpClient`), `logger` (a
|
|
187
|
+
`logging.Logger`), `sleep` (a `Callable[[float], None]`, for tests).
|
|
188
|
+
|
|
189
|
+
### `request_fields()`
|
|
190
|
+
|
|
191
|
+
```python
|
|
192
|
+
request_fields() -> list[RequestField]
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
Your request-field **definitions** — fetched once from
|
|
196
|
+
`GET /api/company-data/request-fields` and cached for the life of the client (it
|
|
197
|
+
types every value). Returns *your* request config, never the person's fields.
|
|
198
|
+
|
|
199
|
+
* **Params:** none.
|
|
200
|
+
* **Returns:** `list[RequestField]` — each `RequestField(slug, label, type, one_time, mandatory, raw)`. `mandatory` is true when the field is mandatory-to-provide **or** mandatory-to-stay-connected.
|
|
201
|
+
* **Raises:** `AuthError`, `ApiError`, `RateLimitError`.
|
|
202
|
+
|
|
203
|
+
```python
|
|
204
|
+
for f in client.request_fields():
|
|
205
|
+
flag = "mandatory" if f.mandatory else "optional"
|
|
206
|
+
print(f"{f.slug:20} {f.type:10} {flag}{' (one-time)' if f.one_time else ''}")
|
|
207
|
+
```
|
|
208
|
+
|
|
209
|
+
### `connections(limit, offset)`
|
|
210
|
+
|
|
211
|
+
```python
|
|
212
|
+
connections(limit: int = 100, offset: int = 0) -> Iterator[Connection]
|
|
213
|
+
```
|
|
214
|
+
|
|
215
|
+
A **lazy generator** that auto-pages `GET /api/company-data/connections?limit&offset`
|
|
216
|
+
and yields one typed `Connection` at a time (bounded memory for a large book).
|
|
217
|
+
Each `conn.values[slug]` is already decrypted (or a lazy binary handle).
|
|
218
|
+
|
|
219
|
+
* **Params:** `limit` — page size (default 100); `offset` — starting offset.
|
|
220
|
+
* **Returns:** `Iterator[Connection]`.
|
|
221
|
+
* **Raises:** `AuthError`, `ApiError`, `DecryptError` (per value, at access), `RateLimitError` (after the iterator's bounded internal backoff — see [Rate limits](#rate-limits)).
|
|
222
|
+
|
|
223
|
+
> **Heavily rate-limited.** Use for the initial full sync + occasional
|
|
224
|
+
> reconciliation only — never as a poll substitute for the changes feed. The
|
|
225
|
+
> generator paces itself within the limit (backs off on `Retry-After`).
|
|
226
|
+
|
|
227
|
+
```python
|
|
228
|
+
# Initial full sync, streaming so a 100k-connection book never lands in memory.
|
|
229
|
+
for conn in client.connections(limit=200):
|
|
230
|
+
upsert_local_record(conn)
|
|
231
|
+
```
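
To make "backs off on `Retry-After`" concrete, here is a minimal sketch of bounded pacing around a page fetch. `RateLimited`, `fetch_with_backoff`, and `max_waits` are hypothetical stand-ins; the SDK's internal retry budget and exception type are not shown in this manual.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 response carrying a Retry-After value."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def fetch_with_backoff(fetch_page: Callable[[], T], max_waits: int = 3,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch_page, sleeping for Retry-After on each 429, up to max_waits times."""
    for _ in range(max_waits):
        try:
            return fetch_page()
        except RateLimited as exc:
            sleep(exc.retry_after)
    return fetch_page()  # final attempt: a RateLimited here propagates to the caller


calls = {"n": 0}


def flaky_page():
    # Simulated endpoint: rate-limited twice, then returns a page.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited(retry_after=2.0)
    return ["conn-1", "conn-2"]


waits = []
page = fetch_with_backoff(flaky_page, sleep=waits.append)  # record waits instead of sleeping
print(page, waits)
```

Injecting `sleep` mirrors the `sleep` keyword that `Client.from_config` accepts for tests.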
|
|
232
|
+
|
|
233
|
+
### `connection(id)`
|
|
234
|
+
|
|
235
|
+
```python
|
|
236
|
+
connection(id: str) -> Connection
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
Fetch one connection by its connection id (`GET /api/company-data/connections/{id}`).
|
|
240
|
+
|
|
241
|
+
* **Params:** `id` — the connection id (`Connection.id`).
|
|
242
|
+
* **Returns:** one `Connection`. Note: this endpoint returns `{connection_id, user_id, values}` and **no** `display_name`/`connected_at`, so those identity fields are `None` here (the list endpoint carries them).
|
|
243
|
+
* **Raises:** `AuthError`, `ApiError` (404 if unknown), `DecryptError`, `RateLimitError`.
|
|
244
|
+
|
|
245
|
+
```python
|
|
246
|
+
conn = client.connection(conn_id)
|
|
247
|
+
phone = conn.values.get("mobile")
|
|
248
|
+
if phone:
|
|
249
|
+
print(phone.value, "live" if phone.live else "snapshot")
|
|
250
|
+
```
|
|
251
|
+
|
|
252
|
+
### `logs(limit, offset)`
|
|
253
|
+
|
|
254
|
+
```python
|
|
255
|
+
logs(limit: int = 50, offset: int = 0) -> list[LogEntry]
|
|
256
|
+
```
|
|
257
|
+
|
|
258
|
+
The service's activity log (`GET /api/company-data/logs?limit&offset`) — **ops
|
|
259
|
+
events only** (email / purge / webhook), never person field data.
|
|
260
|
+
|
|
261
|
+
* **Params:** `limit` (default 50), `offset` (default 0).
|
|
262
|
+
* **Returns:** `list[LogEntry]` — each `LogEntry(type, message, metadata, at, raw)`.
|
|
263
|
+
* **Raises:** `AuthError`, `ApiError`, `RateLimitError`.
|
|
264
|
+
|
|
265
|
+
```python
|
|
266
|
+
for entry in client.logs(limit=20):
|
|
267
|
+
print(entry.at, entry.type, entry.message)
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
### `process_changes(handler, **options)`
|
|
271
|
+
|
|
272
|
+
```python
|
|
273
|
+
process_changes(handler: Callable[[Change], None], **options) -> None
|
|
274
|
+
```
|
|
275
|
+
|
|
276
|
+
The crash-safe changes pump: drains the feed through `handler` **one `Change` at
|
|
277
|
+
a time**, durably buffering each batch before delivery, with per-item ack and
|
|
278
|
+
retry → dead-letter → continue. Runs **until the feed is empty, then returns** —
|
|
279
|
+
there is **no follow/daemon mode** (you schedule re-runs yourself). Delivery is
|
|
280
|
+
**at-least-once**, so your handler **must be idempotent** (dedup on `Change.id`).
|
|
281
|
+
See [The changes pump](#the-changes-pump) for the full model.
|
|
282
|
+
|
|
283
|
+
* **Params:** `handler` — your callback; called with one `Change`. A return is an ack; an exception triggers retry.
|
|
284
|
+
* **Options** (keyword-only): `batch_size` (clamped to ≤ 500, default 100), `max_retries` (default 3), `on_error` (`"deadletter"` — default — or `"halt"`), `backoff` (`Callable[[int], float]`, attempt → seconds).
|
|
285
|
+
* **Returns:** `None` (when the feed is empty + the buffer is drained).
|
|
286
|
+
* **Raises:** `AuthError`, `ApiError`, `RateLimitError` (during a drain); `ValueError` (bad `on_error`); whatever the handler raises if `on_error="halt"` and retries are exhausted.
|
|
287
|
+
|
|
288
|
+
```python
|
|
289
|
+
def handle(change):
|
|
290
|
+
if already_processed(change.id): # idempotency — dedup on the stable id
|
|
291
|
+
return
|
|
292
|
+
if change.event == "field_updated":
|
|
293
|
+
store(change.person_id, change.slug, change.value)
|
|
294
|
+
elif change.event in ("connection_deleted", "field_deleted"):
|
|
295
|
+
remove(change.person_id, change.slug)
|
|
296
|
+
mark_processed(change.id)
|
|
297
|
+
|
|
298
|
+
client.process_changes(handle) # returns when the feed is empty
|
|
299
|
+
```
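
The `backoff` option takes an attempt number and returns seconds to wait. A capped exponential schedule is one common choice; the sketch below assumes attempts are numbered from 1, which this manual does not state, and `capped_exponential` is a hypothetical helper name.

```python
from typing import Callable


def capped_exponential(base: float = 0.5, cap: float = 30.0) -> Callable[[int], float]:
    """Build an attempt -> seconds function: base, 2*base, 4*base, ... capped at `cap`."""
    def backoff(attempt: int) -> float:
        # Assumes attempts are numbered from 1; max() guards a 0-based caller.
        return min(cap, base * 2 ** max(attempt - 1, 0))
    return backoff


backoff = capped_exponential(base=0.5, cap=30.0)
delays = [backoff(n) for n in range(1, 9)]
print(delays)  # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

It would then be passed as `client.process_changes(handle, backoff=backoff)`.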
|
|
300
|
+
|
|
301
|
+
> `logger` is **not** a `process_changes` option in this SDK — pass it once to
|
|
302
|
+
> the `Client` constructor (`Client.from_config("allus.json", logger=my_logger)`).
|
|
303
|
+
|
|
304
|
+
### Advanced changes primitives
|
|
305
|
+
|
|
306
|
+
```python
|
|
307
|
+
drain_batch(max: int = 100) -> list[Change] # raw, UNBUFFERED — you own durability
|
|
308
|
+
dead_letters() -> list[dict] # the local dead-letter store
|
|
309
|
+
retry_dead_letters(handler, **options) -> int # re-drive dead-lettered events; returns count re-driven
|
|
310
|
+
```
|
|
311
|
+
|
|
312
|
+
* `drain_batch(max)` — fetches one batch (clamped ≤ 500) and returns the decrypted `Change`s directly. It does **not** persist anything, so a crash loses what the API already deleted. Prefer `process_changes` for safe consumption.
|
|
313
|
+
* `dead_letters()` — each dict is the stored (ciphertext) event plus a flattened `error` and `attempts`.
|
|
314
|
+
* `retry_dead_letters(handler, **options)` — same `max_retries` / `on_error` / `backoff` options as `process_changes`; on success a record is removed, on repeated failure it stays dead-lettered (or re-raises under `"halt"`). Dead letters are never re-fetched from the API — the local store is their only home.
|
|
315
|
+
|
|
316
|
+
```python
|
|
317
|
+
for dl in client.dead_letters():
|
|
318
|
+
print("stuck:", dl["id"], dl["error"], "after", dl["attempts"], "attempts")
|
|
319
|
+
|
|
320
|
+
n = client.retry_dead_letters(handle) # after you've fixed the bug
|
|
321
|
+
print(f"re-drove {n} dead letters")
|
|
322
|
+
```
|
|
323
|
+
|
|
324
|
+
### Webhook helpers (on the client)
|
|
325
|
+
|
|
326
|
+
The webhook receiver helpers are also exposed as `Client` methods (they delegate
|
|
327
|
+
to the module functions, fully config-driven — no key/secret arguments):
|
|
328
|
+
|
|
329
|
+
```python
|
|
330
|
+
client.verify_webhook(raw_body: bytes, headers: dict) -> bool
|
|
331
|
+
client.parse_webhook(raw_body: bytes, headers: dict) -> Change
|
|
332
|
+
client.handle_webhook(raw_body: bytes, headers: dict) -> Change # verify + parse
|
|
333
|
+
```
|
|
334
|
+
|
|
335
|
+
* `verify_webhook` — recomputes `HMAC-SHA256(raw_body, secret)` and constant-time-compares it to `X-Allus-Signature`. Returns `True`/`False`; **never raises** for a bad signature.
|
|
336
|
+
* `parse_webhook` — body → a typed `Change`. Does **not** verify. Handles JSON, XML, and the `encrypt_payload` account-key envelope. Raises `WebhookError` on a malformed/unparseable body.
|
|
337
|
+
* `handle_webhook` — verify **then** parse; raises `WebhookError` on a bad/unknown signature, otherwise returns the `Change`. The typical one-liner inside a route.
|
|
338
|
+
|
|
339
|
+
The same three are importable as standalone functions
|
|
340
|
+
(`from allus_company_data import verify_webhook, parse_webhook, handle_webhook`),
|
|
341
|
+
which take the `config` and the decrypt/type closures explicitly — but inside an
|
|
342
|
+
app you'll almost always use the client methods. See [Webhooks](#webhooks).
|
|
343
|
+
|
|
344
|
+
---
|
|
345
|
+
|
|
346
|
+
## The typed value model
|
|
347
|
+
|
|
348
|
+
You work with these objects and nothing else (`from allus_company_data import …`):
|
|
349
|
+
|
|
350
|
+
```text
|
|
351
|
+
RequestField { slug, label, type, one_time, mandatory } # YOUR request config
|
|
352
|
+
Connection { id, person_id, display_name, connected_at, values: {<slug>: Value} }
|
|
353
|
+
Value { value, live, updated_at }
|
|
354
|
+
Change { id, event, person_id, slug?, value?, live?, at }
|
|
355
|
+
LogEntry { type, message, metadata, at }
|
|
356
|
+
```
|
|
357
|
+
|
|
358
|
+
### Keyed by *your* slug
|
|
359
|
+
|
|
360
|
+
`conn.values["work_email"].value` → `"alice@acme.com"`. The key is the stable,
|
|
361
|
+
explicit slug you set per request field in the portal — rename the label freely,
|
|
362
|
+
the slug is the contract. **The person's source field is never exposed**: no
|
|
363
|
+
source slug, no `field_id`, not even via `.raw`.
|
|
364
|
+
|
|
365
|
+
### `Value(value, live, updated_at)`
|
|
366
|
+
|
|
367
|
+
| Attribute | Meaning |
|
|
368
|
+
|-----------|---------|
|
|
369
|
+
| `value` | The typed plaintext (see the table below). |
|
|
370
|
+
| `live` | `True` if the person chose "keep connected" (auto-updates); `False` for a one-time snapshot. |
|
|
371
|
+
| `updated_at` | `datetime` of when this answer last changed (per-answer, rides on the `Value`). |
|
|
372
|
+
|
|
373
|
+
### Value types (from the field's `type`)
|
|
374
|
+
|
|
375
|
+
| Field type | Python `value` |
|
|
376
|
+
|------------|----------------|
|
|
377
|
+
| `email`, `phone`, `url`, `text` | `str` |
|
|
378
|
+
| `address`, `bank`, `creditcard` | `dict` — the decrypted plaintext is a JSON object, parsed for you |
|
|
379
|
+
| `date`, `date_of_birth` | `datetime.date` (falls back to the raw string if it can't be parsed) |
|
|
380
|
+
| `photo`, `document`, `legal_document` | a lazy `BinaryHandle` — see below |
|
|
381
|
+
|
|
382
|
+
```python
|
|
383
|
+
addr = conn.values["home_address"].value # dict, e.g. {"street": "...", "city": "...", ...}
|
|
384
|
+
dob = conn.values["birthday"].value # datetime.date(1990, 5, 17)
|
|
385
|
+
```
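
The mapping in the table can be sketched as a small coercion function. This is an illustration only: `coerce` is a hypothetical name, the ISO 8601 date format is an assumption, and binary types are left out because they resolve to a lazy `BinaryHandle` rather than a plain value.

```python
import datetime
import json
from typing import Any

JSON_TYPES = {"address", "bank", "creditcard"}
DATE_TYPES = {"date", "date_of_birth"}


def coerce(field_type: str, plaintext: str) -> Any:
    """Map a decrypted plaintext string to a Python value by field type."""
    if field_type in JSON_TYPES:
        return json.loads(plaintext)
    if field_type in DATE_TYPES:
        try:
            # Assumption: dates arrive as ISO 8601 (YYYY-MM-DD).
            return datetime.date.fromisoformat(plaintext)
        except ValueError:
            return plaintext  # unparseable: fall back to the raw string
    return plaintext  # email, phone, url, text


print(coerce("date_of_birth", "1990-05-17"))
print(coerce("date", "mid-May"))
print(coerce("address", '{"city": "Utrecht"}'))
```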
|
|
386
|
+
|
|
387
|
+
### Binary fields — the lazy `BinaryHandle`
|
|
388
|
+
|
|
389
|
+
A photo/document value is a `BinaryHandle`. Nothing is fetched or decrypted until
|
|
390
|
+
you call `.bytes()` or `.save()`:
|
|
391
|
+
|
|
392
|
+
```python
|
|
393
|
+
handle = conn.values["passport_scan"].value # BinaryHandle (no network yet)
|
|
394
|
+
|
|
395
|
+
data = handle.bytes() # GET the slot file → decrypt → file bytes
|
|
396
|
+
n = handle.save("/tmp/passport.jpg") # same, written to disk; returns bytes written
|
|
397
|
+
print(handle.value_url) # the opaque slot-keyed URL it fetches from
|
|
398
|
+
```
|
|
399
|
+
|
|
400
|
+
`.bytes()` GETs the slot-keyed file endpoint, unwraps the API's
|
|
401
|
+
`{"encrypted": true, "value": <wrapper>}` envelope, decrypts with your service
|
|
402
|
+
key, parses the inner JSON envelope (`{"full": "data:…"}` for photos,
|
|
403
|
+
`{"file": "data:…"}` for documents) and base64-decodes the data URI into the
|
|
404
|
+
file bytes. The result is cached on the handle, so repeated calls don't re-fetch.
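
The last two steps (inner envelope, then data URI) can be sketched on their own, leaving out the network fetch and the decryption. `decode_inner_envelope` is a hypothetical helper, and the sketch assumes the data URI always carries a `;base64` marker.

```python
import base64
import json


def decode_inner_envelope(decrypted_json: str) -> bytes:
    """Extract file bytes from a decrypted {"full": "data:..."} or {"file": "data:..."} envelope."""
    envelope = json.loads(decrypted_json)
    data_uri = envelope.get("full") or envelope.get("file")
    if not data_uri or not data_uri.startswith("data:"):
        raise ValueError("no data URI in envelope")
    header, _, payload = data_uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


sample = json.dumps({"file": "data:text/plain;base64," + base64.b64encode(b"hello").decode()})
print(decode_inner_envelope(sample))  # b'hello'
```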
|
|
405
|
+
|
|
406
|
+
### `Change(id, event, person_id, slug?, value?, live?, at)`
|
|
407
|
+
|
|
408
|
+
A change-feed / webhook event.
|
|
409
|
+
|
|
410
|
+
| Attribute | Meaning |
|
|
411
|
+
|-----------|---------|
|
|
412
|
+
| `id` | **The stable server change-row id — your dedup key** (captured before the server delete). |
|
|
413
|
+
| `event` | `connection_created`, `connection_deleted`, `field_updated`, `field_deleted`, `consent_accepted`, `consent_declined`. |
|
|
414
|
+
| `person_id` | The person the change is about (may be `None`). |
|
|
415
|
+
| `slug`, `value`, `live` | Present only on `field_updated`; `value` is typed exactly like `Value.value` (incl. a lazy `BinaryHandle` for binaries). Connection/consent events carry no slot/value. |
|
|
416
|
+
| `at` | `datetime` of the change. (There is no separate `updated_at` on a change.) |
|
|
417
|
+
|
|
418
|
+
### `.raw`
|
|
419
|
+
|
|
420
|
+
Every model carries `.raw` — the underlying *hardened* API dict — for debugging
|
|
421
|
+
or an edge case the SDK didn't model. It still never contains the person's source
|
|
422
|
+
field.
|
|
423
|
+
|
|
424
|
+
See [`docs/model.md`](docs/model.md) for the full reference.
|
|
425
|
+
|
|
426
|
+
---
|
|
427
|
+
|
|
428
|
+
## The changes pump
|
|
429
|
+
|
|
430
|
+
The changes feed is a server-side **drain-on-fetch queue**:
|
|
431
|
+
`GET /api/company-data/changes?limit=N` returns up to N events (default 100, max
|
|
432
|
+
500) **and deletes exactly those rows in the same transaction** — no
|
|
433
|
+
offset/cursor, and the API keeps no copy afterward. So consumption can't be a
|
|
434
|
+
plain list: a consumer crash mid-batch would lose events the API already deleted,
|
|
435
|
+
and a huge backlog must not materialize in memory. `process_changes` solves both.
|
|
436
|
+
|
|
437
|
+
**Per run, repeating until the feed is empty then returning:**
|
|
438
|
+
|
|
439
|
+
1. **Replay first.** Deliver any un-acked events already in the local buffer (from a previous crashed run), oldest-first.
|
|
440
|
+
2. **Drain.** When the buffer is empty, fetch one batch and **persist it to the durable file buffer (fsync) BEFORE handing anything out.** This is the backup the API no longer has.
|
|
441
|
+
3. **Deliver one-by-one.** For each buffered event, oldest-first: decrypt its value *at delivery* (never on disk), build the typed `Change`, call `handler`.
|
|
442
|
+
4. **Ack / retry / dead-letter.** On success, remove the event from the buffer (ack). On a handler error, retry with backoff up to `max_retries`; then either move it to the dead-letter store and continue (`on_error="deadletter"`, default — one poison event never wedges the stream) or stop and re-raise (`on_error="halt"`). A `DecryptError` on a buffered event (corrupt/truncated ciphertext, rotated key) is **dead-lettered immediately** — re-decrypting can't fix it, so it does *not* burn retries (under `on_error="halt"` it re-raises instead). In the default mode it never propagates out of the pump, so a bad event cannot wedge replay.
|
|
443
|
+
5. Repeat until a drain returns empty **and** the buffer is drained → return.
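
The five steps above can be sketched as an in-memory toy. It is not the SDK's implementation: `run_pump` is a hypothetical name, a `deque` stands in for the fsync'd file buffer, backoff sleeps and decryption are omitted, and `max_retries` is treated as the total number of attempts, which is a simplification.

```python
from collections import deque
from typing import Callable


def run_pump(fetch_batch: Callable[[], list], handler: Callable[[dict], None],
             pending: deque, dead_letters: list, max_retries: int = 3) -> None:
    """Replay, drain, deliver one by one, then ack or dead-letter, until the feed is empty."""
    while True:
        if not pending:
            batch = fetch_batch()          # step 2: drain (the server deletes these rows)
            if not batch:
                return                     # step 5: feed empty and buffer drained
            pending.extend(batch)          # stand-in for the durable file buffer
        while pending:                     # steps 1 and 3: oldest-first delivery
            event = pending[0]
            for _attempt in range(max_retries):
                try:
                    handler(event)
                    break                  # a normal return counts as an ack
                except Exception as exc:
                    last_error = exc
            else:
                # attempts exhausted: park the event and keep going
                dead_letters.append({**event, "error": str(last_error),
                                     "attempts": max_retries})
            pending.popleft()              # step 4: remove from the buffer


batches = deque([[{"id": "c1"}, {"id": "c2"}], [{"id": "c3"}]])
delivered = []


def fetch():
    return batches.popleft() if batches else []


def handler(event):
    if event["id"] == "c2":
        raise RuntimeError("poison event")
    delivered.append(event["id"])


pending = deque([{"id": "c0"}])            # left over from an earlier crashed run
dead = []
run_pump(fetch, handler, pending, dead)
print(delivered, [d["id"] for d in dead])  # ['c0', 'c1', 'c3'] ['c2']
```

The leftover event `c0` is delivered before any fetch, and the poison event `c2` ends up dead-lettered without blocking `c3`.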
|
|
444
|
+
|
|
445
|
+
### The durable buffer
|
|
446
|
+
|
|
447
|
+
* Plain files under `cache_dir` (zero extra dependencies): `pending/` for un-acked events, `deadletter/` for ones that exhausted retries.
|
|
448
|
+
* Stored events keep their **ciphertext** value — **no plaintext PII is ever written to disk**. Decryption happens only at delivery.
|
|
449
|
+
* Writes are crash-safe (temp file → fsync → atomic rename → dir fsync). Files are named with a monotonic, zero-padded sequence so they replay oldest-first.
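
The write sequence in the last bullet is a standard pattern and can be sketched with the standard library. `write_event_durably` and the 12-digit file name width are assumptions for illustration; the directory fsync is attempted only where the platform exposes `os.O_DIRECTORY`.

```python
import json
import os
import tempfile


def write_event_durably(directory: str, seq: int, event: dict) -> str:
    """Write one event as <seq>.json so a crash leaves either the whole file or nothing."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, f"{seq:012d}.json")  # zero-padded: sorts oldest-first
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(event, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())         # file contents reach the disk
        os.replace(tmp_path, final_path)   # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if hasattr(os, "O_DIRECTORY"):         # POSIX only: persist the directory entry too
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return final_path


with tempfile.TemporaryDirectory() as demo_dir:
    write_event_durably(demo_dir, 2, {"id": "c2", "value": "<ciphertext>"})
    write_event_durably(demo_dir, 1, {"id": "c1", "value": "<ciphertext>"})
    names = sorted(os.listdir(demo_dir))
print(names)  # ['000000000001.json', '000000000002.json']
```

Because the temp file lives in the same directory as the target, the rename never crosses a filesystem boundary.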
|
|
450
|
+
|
|
451
|
+
### Crash safety, at-least-once, and idempotency
|
|
452
|
+
|
|
453
|
+
A batch is durably buffered *before* any delivery, and acked per-item only *after*
|
|
454
|
+
the handler succeeds. The ack can't be atomic with your side-effects — a crash
|
|
455
|
+
between your handler's success and its ack re-delivers that event on the next run.
|
|
456
|
+
That makes delivery **at-least-once**, so:
|
|
457
|
+
|
|
458
|
+
> **Your handler must be idempotent. Dedup on `Change.id`.**
|
|
459
|
+
|
|
460
|
+
`Change.id` is the stable server change-row id, captured before the server delete,
|
|
461
|
+
so it survives crash + replay unchanged.
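
One way to dedup on `Change.id` is a small table with a primary key on the id. The sketch below uses SQLite; `SeenStore` is a hypothetical helper, and it assumes the id can be stored as text.

```python
import sqlite3


class SeenStore:
    """Remember processed change ids so a replayed event is applied only once."""

    def __init__(self, path: str = ":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS seen (change_id TEXT PRIMARY KEY)")
        self._db.commit()

    def first_time(self, change_id: str) -> bool:
        """Record the id; return True only if it had not been recorded before."""
        cur = self._db.execute(
            "INSERT OR IGNORE INTO seen (change_id) VALUES (?)", (change_id,))
        self._db.commit()
        return cur.rowcount == 1


store = SeenStore()
applied = []
for change_id in ["c1", "c2", "c1"]:   # "c1" is re-delivered after a simulated crash
    if store.first_time(change_id):
        applied.append(change_id)
print(applied)  # ['c1', 'c2']
```

In a real handler the id is best recorded in the same database transaction as the side effect, so that a crash cannot leave one committed without the other.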
|
|
462
|
+
|
|
463
|
+
### No follow mode
|
|
464
|
+
|
|
465
|
+
`process_changes` returns when the feed empties. **You** schedule re-runs — a
|
|
466
|
+
cron job, a `while True: client.process_changes(handle); time.sleep(5)` loop, a
|
|
467
|
+
worker queue, whatever fits. The feed is cheap to poll (see
|
|
468
|
+
[Rate limits](#rate-limits)).
|
|
469
|
+
|
|
470
|
+
### Worked example
|
|
471
|
+
|
|
472
|
+
```python
|
|
473
|
+
import time
|
|
474
|
+
from allus_company_data import Client
|
|
475
|
+
|
|
476
|
+
client = Client.from_config("allus.json")
|
|
477
|
+
|
|
478
|
+
def handle(change):
|
|
479
|
+
# Idempotent: skip anything we've already applied.
|
|
480
|
+
if seen(change.id):
|
|
481
|
+
return
|
|
482
|
+
match change.event:
|
|
483
|
+
case "field_updated":
|
|
484
|
+
store_value(change.person_id, change.slug, change.value, live=change.live)
|
|
485
|
+
case "field_deleted":
|
|
486
|
+
clear_value(change.person_id, change.slug)
|
|
487
|
+
case "connection_deleted":
|
|
488
|
+
drop_person(change.person_id)
|
|
489
|
+
case "connection_created" | "consent_accepted" | "consent_declined":
|
|
490
|
+
note_event(change.person_id, change.event, change.at)
|
|
491
|
+
record_seen(change.id)
|
|
492
|
+
|
|
493
|
+
# Schedule your own re-runs; process_changes itself returns when empty.
|
|
494
|
+
while True:
|
|
495
|
+
client.process_changes(handle, batch_size=200, max_retries=5)
|
|
496
|
+
time.sleep(5)
|
|
497
|
+
```
|
|
498
|
+
|
|
499
|
+
If a handler keeps failing, the event lands in the dead-letter store instead of
|
|
500
|
+
blocking the stream; inspect with `client.dead_letters()` and re-drive with
|
|
501
|
+
`client.retry_dead_letters(handle)` after fixing the cause. See
|
|
502
|
+
[`docs/pump.md`](docs/pump.md).
|
|
503
|
+
|
|
504
|
+
---
|
|
505
|
+
|
|
506
|
+
## Webhooks
|
|
507
|
+
|
|
508
|
+
Webhooks are the lower-latency push alternative to polling the changes feed. The
|
|
509
|
+
platform POSTs each change event to your configured webhook URL with:
|
|
510
|
+
|
|
511
|
+
* `X-Allus-Webhook-Id` — which webhook this is (selects the HMAC secret from config).
|
|
512
|
+
* `X-Allus-Signature` — `HMAC-SHA256(raw_body, secret)` as lowercase hex.
|
|
513
|
+
* the body — the same slug-keyed `Change` shape as the pull feed (JSON or XML).
|
|
514
|
+
|
|
515
|
+
All secrets/keys come from config; the helpers take **no key or secret
|
|
516
|
+
arguments**. Use the raw request body bytes (do not re-serialize a parsed body —
|
|
517
|
+
the HMAC is over the exact bytes the platform sent).
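
The check itself is small enough to sketch with the standard library. `signature_is_valid` is a hypothetical function, not the SDK's `verify_webhook`; it assumes the secret is used as UTF-8 bytes, and lower-casing the header is a tolerance added here.

```python
import hashlib
import hmac


def signature_is_valid(raw_body: bytes, secret: str, signature_header: str) -> bool:
    """Recompute HMAC-SHA256 over the exact request bytes and compare in constant time."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.strip().lower()
    # Compare as bytes so a non-ASCII header cannot raise inside compare_digest.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))


body = b'{"event":"field_updated"}'
good = hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
print(signature_is_valid(body, "s3cret", good))         # True
print(signature_is_valid(body + b" ", "s3cret", good))  # False: bytes differ
```

The second call shows why a re-serialized body fails: a single extra byte changes the digest.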
|
|
518
|
+
|
|
519
|
+
### In a web route (Flask)
|
|
520
|
+
|
|
521
|
+
```python
|
|
522
|
+
from flask import Flask, request, abort
|
|
523
|
+
from allus_company_data import Client, WebhookError
|
|
524
|
+
|
|
525
|
+
app = Flask(__name__)
|
|
526
|
+
client = Client.from_config("allus.json")
|
|
527
|
+
|
|
528
|
+
@app.post("/allus/webhook")
|
|
529
|
+
def allus_webhook():
|
|
530
|
+
    try:
|
|
531
|
+
        change = client.handle_webhook(request.get_data(), dict(request.headers))
|
|
532
|
+
    except WebhookError:
|
|
533
|
+
        abort(401) # bad / unknown signature, or unparseable envelope
|
|
534
|
+
|
|
535
|
+
    # Same idempotency rule as the pump: dedup on change.id.
|
|
536
|
+
    if not seen(change.id):
|
|
537
|
+
        apply_change(change)
|
|
538
|
+
        record_seen(change.id)
|
|
539
|
+
    return ("", 204)
|
|
540
|
+
```
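
`seen` and `record_seen` are yours to supply. One minimal way to back them is sketched below, assuming change ids are strings and a SQLite table is acceptable for your deployment. The table name is illustrative and not part of the SDK, and `:memory:` is used only so the snippet runs on its own.

```python
import sqlite3

# Illustrative store; point this at a real file (or your main database) in production.
_db = sqlite3.connect(":memory:", check_same_thread=False)
_db.execute("CREATE TABLE IF NOT EXISTS seen_changes (id TEXT PRIMARY KEY)")


def seen(change_id: str) -> bool:
    """True when this change id has already been recorded."""
    row = _db.execute(
        "SELECT 1 FROM seen_changes WHERE id = ?", (change_id,)
    ).fetchone()
    return row is not None


def record_seen(change_id: str) -> None:
    """Record the id; INSERT OR IGNORE makes a repeated delivery a no-op."""
    with _db:  # commits on success, rolls back on error
        _db.execute(
            "INSERT OR IGNORE INTO seen_changes (id) VALUES (?)", (change_id,)
        )
```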
|
|
541
|
+
|
|
542
|
+
`verify_webhook` / `parse_webhook` let you split the steps if you prefer:
|
|
543
|
+
|
|
544
|
+
```python
|
|
545
|
+
if not client.verify_webhook(raw_body, headers):
|
|
546
|
+
    abort(401)
|
|
547
|
+
change = client.parse_webhook(raw_body, headers)
|
|
548
|
+
```
|
|
549
|
+
|
|
550
|
+
### Config-driven secrets
|
|
551
|
+
|
|
552
|
+
Per-webhook HMAC secrets live in the config `webhooks` map, keyed by webhook id;
|
|
553
|
+
the SDK reads `X-Allus-Webhook-Id` off the request and looks up the matching
|
|
554
|
+
secret. A single-webhook service can use the flat `"webhook_secret": "…"`
|
|
555
|
+
shortcut (or `ALLUS_WEBHOOK_SECRET`). With an unknown or unconfigured id, verification
|
|
556
|
+
returns `False` (and `handle_webhook` raises `WebhookError`).
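
The lookup described above can be pictured roughly as follows. This is a sketch under assumptions, not the SDK's code: `resolve_webhook_secret` is a hypothetical helper, the `webhooks` map is assumed to hold plain secret strings, and the precedence (map first, then the flat key, then the environment) is a guess.

```python
import os
from typing import Mapping, Optional


def resolve_webhook_secret(
    config: Mapping[str, object],
    headers: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the secret for this request, or None when the id is not configured."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    webhook_id = lowered.get("x-allus-webhook-id")

    webhooks = config.get("webhooks")
    if isinstance(webhooks, dict) and webhooks:
        # Multi-webhook config: an id missing from the map yields None.
        return webhooks.get(webhook_id)

    # Single-webhook shortcut: flat config key, then the environment variable.
    flat = config.get("webhook_secret")
    if isinstance(flat, str) and flat:
        return flat
    return environ.get("ALLUS_WEBHOOK_SECRET")
```

A `None` result corresponds to the "unknown/unconfigured id" case, where verification fails.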
|
|
557
|
+
|
|
558
|
+
### The `encrypt_payload` account-key envelope
|
|
559
|
+
|
|
560
|
+
If a webhook has `encrypt_payload` enabled, the body is **replaced** by a
|
|
561
|
+
`{"_enc":1,…}` envelope encrypted to your company **account** key (and the HMAC is
|
|
562
|
+
over that envelope — the final bytes sent). `parse_webhook`/`handle_webhook`
|
|
563
|
+
unwrap it transparently using the configured `account_private_key` +
|
|
564
|
+
`account_passphrase`, then decrypt the inner field value with the service key — so
|
|
565
|
+
an encrypted-payload `Change` is identical to a plain one. If you receive such a
|
|
566
|
+
webhook without an `account_private_key` configured, you get a `WebhookError`.
|
|
567
|
+
|
|
568
|
+
> The account-key envelope uses OAEP-**SHA1** (OpenSSL's default), distinct from
|
|
569
|
+
> the OAEP-SHA256 used for person field values — the SDK handles this difference
|
|
570
|
+
> internally; you only supply the account key in config.
|
|
571
|
+
|
|
572
|
+
See [`docs/webhooks.md`](docs/webhooks.md).
|
|
573
|
+
|
|
574
|
+
---
|
|
575
|
+
|
|
576
|
+
## Rate limits
|
|
577
|
+
|
|
578
|
+
| Endpoint | Limit | Use it for |
|
|
579
|
+
|----------|-------|-----------|
|
|
580
|
+
| `changes` (the pump) | **generous** | Poll **as often as you like** — it's a cheap drain-on-fetch queue. |
|
|
581
|
+
| `request-fields`, `logs` | moderate | Occasional reads. |
|
|
582
|
+
| `connections`, `connection(id)`, binary `/file` | **heavily limited** | Initial full sync + occasional reconciliation **only** — never as a poll substitute. |
|
|
583
|
+
|
|
584
|
+
A 429 carries `Retry-After`. The SDK backs off and retries automatically:
|
|
585
|
+
|
|
586
|
+
* The transport (`HttpClient`) retries a 429 a bounded number of times, honoring `Retry-After` each time, and then surfaces `RateLimitError`.
|
|
587
|
+
* The `connections(...)` generator additionally backs off per `Retry-After` on a surfaced `RateLimitError` and retries the page a bounded number of times before re-raising — so it paces itself within the limit instead of hammering.
|
|
588
|
+
|
|
589
|
+
If you catch a `RateLimitError`, its `.retry_after` is the seconds to wait
|
|
590
|
+
(or `None` when the header was absent).
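
If you want your own outer retry on top of the SDK's, a wrapper along these lines is one option. It is a sketch: `call_with_backoff` is a hypothetical helper, and the stand-in `RateLimitError` below only mirrors the documented `.retry_after` attribute so the snippet runs on its own. In real code, import the SDK's class instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for allus_company_data.RateLimitError (illustration only)."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    default_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting retry_after seconds between rate-limited attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == attempts:
                raise  # budget exhausted: let the caller decide
            # retry_after is None when the 429 carried no Retry-After header.
            sleep(default_wait if exc.retry_after is None else exc.retry_after)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real delays.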
|
|
591
|
+
|
|
592
|
+
---
|
|
593
|
+
|
|
594
|
+
## Errors
|
|
595
|
+
|
|
596
|
+
All are importable from `allus_company_data`. The taxonomy and names are the same across all six SDKs.
|
|
597
|
+
|
|
598
|
+
| Error | When |
|
|
599
|
+
|-------|------|
|
|
600
|
+
| `ConfigError` | Missing/invalid config, unreadable key file, or wrong passphrase — at construction (fail fast). |
|
|
601
|
+
| `AuthError` | Token fetch/refresh failed (bad `client_id`/`client_secret`, revoked client); or a 401 survives the one automatic refresh-and-retry. |
|
|
602
|
+
| `ApiError(status, error_key, message)` | Any non-2xx from the API; carries the HTTP `status`, the platform `error_key` (when present), and `message`. |
|
|
603
|
+
| `DecryptError` | A ciphertext wrapper is malformed, the key is wrong, or the GCM tag mismatches. Surfaces when a value is accessed/decrypted. |
|
|
604
|
+
| `WebhookError` | Signature verification failed, or an envelope couldn't be unwrapped/parsed. |
|
|
605
|
+
| `RateLimitError(retry_after)` | A 429 from a rate-limited endpoint. Subclass of `ApiError` (status fixed at 429); carries `retry_after` (seconds, or `None`). |
|
|
606
|
+
|
|
607
|
+
```python
|
|
608
|
+
from allus_company_data import (
|
|
609
|
+
    Client, ConfigError, AuthError, ApiError,
|
|
610
|
+
    DecryptError, WebhookError, RateLimitError,
|
|
611
|
+
)
|
|
612
|
+
|
|
613
|
+
try:
|
|
614
|
+
    client = Client.from_config("allus.json")
|
|
615
|
+
    for conn in client.connections():
|
|
616
|
+
        ...
|
|
617
|
+
except ConfigError as e:
|
|
618
|
+
    ... # fix the config / key file
|
|
619
|
+
except RateLimitError as e:
|
|
620
|
+
    wait(e.retry_after or 60)
|
|
621
|
+
except ApiError as e:
|
|
622
|
+
    log(e.status, e.error_key, e.message)
|
|
623
|
+
```
|
|
624
|
+
|
|
625
|
+
See [`docs/errors.md`](docs/errors.md).
|
|
626
|
+
|
|
627
|
+
---
|
|
628
|
+
|
|
629
|
+
## How it's wired
|
|
630
|
+
|
|
631
|
+
Everything below is what the SDK hides so your code only ever sees conclusions.
|
|
632
|
+
|
|
633
|
+
**Auth / token.** An `HttpClient` owns a `client_credentials`-only token. On the
|
|
634
|
+
first call (or when the cached token nears expiry) it POSTs
|
|
635
|
+
`client_id`/`client_secret` to `{api_url}/oauth2/token` and caches the bearer
|
|
636
|
+
token + its expiry; refresh is automatic. A mid-flight 401 triggers exactly one
|
|
637
|
+
refresh-and-retry, then `AuthError`. The token is scoped server-side to **one**
|
|
638
|
+
service, so every call is implicitly that service's data.
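
The caching rule in the auth paragraph can be pictured as below. This is a sketch rather than the SDK's `HttpClient`: `TokenCache`, the `fetch` callable (standing in for the POST to `{api_url}/oauth2/token`), and the 30-second `leeway` are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Tuple

# fetch() stands in for the token request and returns
# (access_token, lifetime_in_seconds). Its shape is an assumption.
FetchToken = Callable[[], Tuple[str, float]]


class TokenCache:
    def __init__(
        self,
        fetch: FetchToken,
        leeway: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._leeway = leeway
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, refreshing when it is missing or near expiry."""
        if self._token is None or self._clock() >= self._expires_at - self._leeway:
            self._token, lifetime = self._fetch()
            self._expires_at = self._clock() + lifetime
        return self._token

    def invalidate(self) -> None:
        """Drop the cached token, e.g. after a 401, so the next get() refreshes."""
        self._token = None
```

A caller that sees a 401 would call `invalidate()`, retry once with a fresh `get()`, and give up if the second attempt also fails.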
|
|
639
|
+
|
|
640
|
+
**Slug resolution.** `request_fields()` is fetched once and cached; its slug→type
|
|
641
|
+
map determines the type of every value (so `address` parses to a dict, `photo` becomes a lazy
|
|
642
|
+
binary handle, etc.). The connection/changes endpoints return values keyed by
|
|
643
|
+
**your** request slug — the person's source field is dropped server-side and
|
|
644
|
+
never reaches the SDK.
|
|
645
|
+
|
|
646
|
+
**Decryption (zero-knowledge).** The service private key is loaded **once** at
|
|
647
|
+
construction from the configured encrypted PEM + passphrase into an in-memory RSA
|
|
648
|
+
key. A `decrypt` closure over it is handed to every model factory and the pump —
|
|
649
|
+
the key never appears in a method signature. Each value is a hybrid wrapper
|
|
650
|
+
(`{"_enc":1,"k":rsa_oaep_sha256(aesKey),"iv":…,"d":aes256gcm(…)}`); the SDK
|
|
651
|
+
RSA-OAEP-SHA256 unwraps the AES key, then AES-256-GCM decrypts the payload. **The
|
|
652
|
+
platform only ever holds ciphertext — it never sees your plaintext.**
|
|
653
|
+
|
|
654
|
+
**Binary fetch.** A binary value is a lazy `BinaryHandle` over a slot-keyed
|
|
655
|
+
`value_url`. On `.bytes()`/`.save()` it GETs that file endpoint, unwraps the
|
|
656
|
+
`{"encrypted":true,"value":<wrapper>}` envelope, runs the same service-key
|
|
657
|
+
decrypt to a JSON file-envelope, and base64-decodes its data URI to the file
|
|
658
|
+
bytes. (Slot-keyed, never source-field-keyed.)
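
The last step of that chain, turning the decrypted file-envelope into bytes, might look like this. The envelope's field name is not specified here, so `data_key="data"` is a placeholder assumption, and `file_bytes_from_envelope` is a hypothetical helper, not an SDK function.

```python
import base64
import json


def file_bytes_from_envelope(envelope_json: str, data_key: str = "data") -> bytes:
    """Extract raw file bytes from a decrypted JSON file-envelope.

    data_key is a guess at the field that holds the data URI; adjust as needed.
    """
    envelope = json.loads(envelope_json)
    data_uri = envelope[data_key]
    # A base64 data URI looks like "data:<media type>;base64,<payload>".
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload, validate=True)
```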
|
|
659
|
+
|
|
660
|
+
**The drain-on-fetch feed.** `process_changes` delegates to a `Pump` wired to a
|
|
661
|
+
`fetch_changes` closure (`GET /changes?limit=`, returning raw ciphertext events)
|
|
662
|
+
and a `decrypt` closure (builds a typed `Change`). Because the fetch deletes the
|
|
663
|
+
rows it returns, the pump persists each batch to the durable file buffer
|
|
664
|
+
(ciphertext at rest) before delivery, acks per-item after your handler succeeds,
|
|
665
|
+
and replays the buffer on restart — see [The changes pump](#the-changes-pump).
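
To make the ordering concrete (persist, deliver, ack, replay), here is a deliberately small sketch. It is not the SDK's `Pump` or its buffer format: `FileBuffer`, `drain`, the JSON file layout, and the `id` key are illustrative assumptions, and retries and dead-lettering are left out.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

Event = Dict[str, object]


class FileBuffer:
    """One JSON file holding the events that were fetched but not yet acked."""

    def __init__(self, path: str) -> None:
        self._path = path

    def pending(self) -> List[Event]:
        if not os.path.exists(self._path):
            return []
        with open(self._path, "r", encoding="utf-8") as fh:
            return json.load(fh)

    def _write(self, events: List[Event]) -> None:
        # Write to a temp file and rename so a crash never leaves a torn buffer.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self._path) or ".")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(events, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self._path)

    def extend(self, events: List[Event]) -> None:
        self._write(self.pending() + events)

    def ack(self, event_id: object) -> None:
        self._write([e for e in self.pending() if e["id"] != event_id])


def drain(
    fetch: Callable[[], List[Event]],
    handle: Callable[[Event], None],
    buffer: FileBuffer,
) -> None:
    """Replay anything left from a previous run, then fetch until empty."""
    while True:
        batch = buffer.pending()
        if not batch:
            batch = fetch()  # destructive: the server forgets these rows
            if not batch:
                return
            buffer.extend(batch)  # persist before any handler runs
        for event in batch:
            handle(event)  # may raise; the event stays buffered
            buffer.ack(event["id"])  # ack only after the handler succeeded
```

If the process dies between `fetch()` returning and a handler finishing, the unacked events are still on disk and are delivered first on the next run, which is why handlers need to be idempotent.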
|
|
666
|
+
```
|