pytrials-v2 0.1.0__tar.gz

@@ -0,0 +1,24 @@
1
+ name: Publish to PyPI
2
+
3
+ # Publishes on a version tag (e.g. v0.1.0). Uses PyPI Trusted Publishing
4
+ # (OIDC), so no API token is stored in the repo. Configure the trusted
5
+ # publisher once on PyPI: project settings -> Publishing -> add a GitHub
6
+ # publisher for prahlaadr/pytrials-v2, workflow publish.yml, environment pypi.
7
+ on:
8
+ push:
9
+ tags:
10
+ - "v*"
11
+
12
+ jobs:
13
+ publish:
14
+ runs-on: ubuntu-latest
15
+ environment: pypi
16
+ permissions:
17
+ id-token: write
18
+ steps:
19
+ - uses: actions/checkout@v4
20
+ - uses: astral-sh/setup-uv@v5
21
+ - name: Build
22
+ run: uv build
23
+ - name: Publish
24
+ uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,34 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *.egg-info/
5
+ .eggs/
6
+ build/
7
+ dist/
8
+ # Envs
9
+ .venv/
10
+ venv/
11
+ .env
12
+ .env.local
13
+ # Tooling caches
14
+ .mypy_cache/
15
+ .ruff_cache/
16
+ .pytest_cache/
17
+ .coverage
18
+ htmlcov/
19
+ # uv
20
+ .uv/
21
+ # OS / editor
22
+ .DS_Store
23
+ .idea/
24
+ .vscode/
25
+ # docs build
26
+ site/
27
+
28
+ # marimo wasm build output
29
+ demo/dist/
30
+ .vercel/
31
+
32
+ # build artifacts
33
+ dist/
34
+ *.egg-info/
@@ -0,0 +1,23 @@
1
+ # AGENTS.md
2
+
3
+ Python SDK for the ClinicalTrials.gov API v2. Full design and roadmap in `PROJECT_PLAN.md`.
4
+
5
+ ## Conventions
6
+ - Python 3.10+. Package/deps via `uv`. Build backend: hatch.
7
+ - Lint/format: ruff. Types: mypy strict. Tests: pytest + pytest-asyncio, mock httpx with respx.
8
+ - Never use em dashes in any output (code, comments, docs, commits). Use periods, commas, colons, or parentheses.
9
+ - Never commit secrets (this API needs none; it is public and unauthenticated).
10
+
11
+ ## Core stack (from the plan)
12
+ httpx, pydantic v2, tenacity, respx, pytest, ruff, mypy, mkdocs-material, mike, hatch.
13
+
14
+ ## Recommended libraries to speed the build
15
+ - **datamodel-code-generator**: generate the Pydantic v2 models directly from the ClinicalTrials.gov OpenAPI 3.0 spec instead of hand-writing the deeply nested hierarchy. Biggest time saver.
16
+ - **aiolimiter**: async leaky-bucket rate limiter (for the 50 req/min cap), instead of hand-rolling one.
17
+ - **hishel**: httpx-native HTTP caching, useful given the rate limit and large study payloads.
18
+ - **stamina**: modern retry wrapper over tenacity with cleaner ergonomics (optional; tenacity is fine).
19
+ - **Typer**: for the v1.0 CLI (`pytrials search ...`).
20
+ - **Polars** for the DataFrame layer (matches the house data stack: DuckDB then Polars, avoid pandas). Offer `.to_polars()` as primary, `.to_pandas()` as a thin convenience.
21
+
22
+ ## Demo / frontend
23
+ A live interactive demo (clinical-trial search playground) is the artifact's "live" surface. Best fit: **Marimo** (reactive Python notebook, export to WASM static HTML, deploy to a pyaarproject subdomain). Alternatives: Streamlit or Gradio for a quick search UI, FastHTML for a fuller pure-Python web app.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Prahlaad R.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,82 @@
1
+ Metadata-Version: 2.4
2
+ Name: pytrials-v2
3
+ Version: 0.1.0
4
+ Summary: A modern, fully-typed Python SDK for the ClinicalTrials.gov API v2.
5
+ Project-URL: Homepage, https://github.com/raami/pytrials-v2
6
+ Project-URL: Repository, https://github.com/raami/pytrials-v2
7
+ Author: Prahlaad Ram
8
+ License: MIT License
9
+
10
+ Copyright (c) 2026 Prahlaad R.
11
+
12
+ Permission is hereby granted, free of charge, to any person obtaining a copy
13
+ of this software and associated documentation files (the "Software"), to deal
14
+ in the Software without restriction, including without limitation the rights
15
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
16
+ copies of the Software, and to permit persons to whom the Software is
17
+ furnished to do so, subject to the following conditions:
18
+
19
+ The above copyright notice and this permission notice shall be included in all
20
+ copies or substantial portions of the Software.
21
+
22
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
23
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
24
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
25
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
26
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
27
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
28
+ SOFTWARE.
29
+ License-File: LICENSE
30
+ Keywords: api,clinical-trials,clinicaltrials,healthcare,sdk
31
+ Classifier: Development Status :: 3 - Alpha
32
+ Classifier: Intended Audience :: Science/Research
33
+ Classifier: License :: OSI Approved :: MIT License
34
+ Classifier: Programming Language :: Python :: 3 :: Only
35
+ Classifier: Programming Language :: Python :: 3.10
36
+ Classifier: Typing :: Typed
37
+ Requires-Python: >=3.10
38
+ Requires-Dist: httpx>=0.27
39
+ Requires-Dist: pydantic>=2
40
+ Description-Content-Type: text/markdown
41
+
42
+ # pytrials-v2
43
+
44
+ A modern, fully-typed Python SDK for the [ClinicalTrials.gov API v2](https://clinicaltrials.gov/data-api/api).
45
+
46
+ > Status: early development. See [PROJECT_PLAN.md](./PROJECT_PLAN.md) for the full design and roadmap.
47
+
48
+ ## Why
49
+
50
+ The ClinicalTrials.gov API v2 (JSON, token pagination, OpenAPI 3.0) launched in 2024, and the old XML v1 API was retired. There is still no well-designed Python SDK for v2. pytrials-v2 aims to be the default: Pydantic-modeled, async-capable, with the ergonomic helpers clinical-trial data consumers actually need.
51
+
52
+ ## What it offers
53
+
54
+ - Full Pydantic v2 models for every API response (real autocomplete, no dict-digging)
55
+ - A validating QueryBuilder that catches bad status, phase, and sort values before the request
56
+ - Async auto-pagination over `pageToken`
57
+ - DataFrame integration that flattens the nested study structure for analysis
58
+ - Date normalization across the API's inconsistent formats
59
+ - Built-in rate limiting (50 req/min) and retry with backoff
60
+
61
+ ## Quickstart (planned API)
62
+
63
+ ```python
64
+ from pytrials import ClinicalTrials
65
+
66
+ ctg = ClinicalTrials()
67
+
68
+ results = ctg.studies.search(condition="breast cancer", status=["RECRUITING"], phase=["PHASE3"])
69
+ study = ctg.studies.get("NCT04852770")
70
+ df = ctg.studies.search(condition="diabetes", status=["RECRUITING"]).to_dataframe()
71
+ ```
72
+
73
+ ## Roadmap
74
+
75
+ - **v0.1.0 Core**: client, search/get, core models, error handling, PyPI publish
76
+ - **v0.2.0 Ergonomics**: QueryBuilder, async paginator, stats endpoints, rate limiting
77
+ - **v0.3.0 Data science**: DataFrame integration, docs site, 90%+ coverage
78
+ - **v1.0.0 Stable**: full results-section models, CLI, notebook examples
79
+
80
+ ## License
81
+
82
+ MIT
@@ -0,0 +1,397 @@
1
+ # pytrials-v2: ClinicalTrials.gov API v2 Python SDK
2
+
3
+ ## Project overview
4
+
5
+ A modern, fully-typed Python SDK for the ClinicalTrials.gov API v2. No authentication required (public API). The goal is to become the default Python library for anyone working with clinical trial data programmatically.
6
+
7
+ **Package name:** `pytrials-v2`
8
+ **PyPI:** `pip install pytrials-v2`
9
+ **GitHub:** `github.com/raami/pytrials-v2`
10
+ **License:** MIT
11
+ **Python:** 3.10+
12
+
13
+ ---
14
+
15
+ ## Why this exists
16
+
17
+ The ClinicalTrials.gov API v2 launched in March 2024 as a complete rewrite: JSON responses, token-based pagination, OpenAPI 3.0 spec, enumerated values instead of free text. The old v1 API (XML-based) was retired in June 2024. There is currently no proper Python SDK for v2. What exists:
18
+
19
+ - `pytrials` on PyPI: partially updated for v2, minimal type coverage, no async, no pagination helper, no query builder
20
+ - `clinical-trials-api` on npm: JavaScript, toy-level
21
+ - MCP servers (cyanheads, etc.): designed for LLM tool use, not developer integration
22
+ - Raw `requests.get()` examples in blog posts
23
+
24
+ The gap: a well-designed, Pydantic-modeled, async-capable Python SDK with ergonomic helpers for the patterns that clinical trial data consumers actually need.
25
+
26
+ ---
27
+
28
+ ## ClinicalTrials.gov API v2 endpoints
29
+
30
+ Base URL: `https://clinicaltrials.gov/api/v2`
31
+
32
+ | Endpoint | Method | Description |
33
+ |---|---|---|
34
+ | `/studies` | GET | Search studies with query params and filters |
35
+ | `/studies/{nctId}` | GET | Get full study record by NCT ID |
36
+ | `/stats/size` | GET | Get total study count for a query |
37
+ | `/stats/field/values` | GET | Get valid values for a field with counts |
38
+ | `/stats/listFields` | GET | Browse the study data model field tree |
39
+ | `/version` | GET | API version and data timestamp |
40
+ | `/studies/metadata` | GET | Study data structure metadata |
41
+
42
+ **No auth required.** Rate limit is approximately 50 requests per minute per IP.
43
+
44
+ ---
45
+
46
+ ## SDK architecture
47
+
48
+ ### Layer 1: Public API surface
49
+
50
+ ```python
51
+ from pytrials import ClinicalTrials
52
+
53
+ ctg = ClinicalTrials()
54
+
55
+ # Search studies
56
+ results = ctg.studies.search(
57
+ condition="breast cancer",
58
+ status=["RECRUITING"],
59
+ phase=["PHASE3"],
60
+ page_size=100,
61
+ sort="LastUpdatePostDate:desc"
62
+ )
63
+
64
+ # Get single study
65
+ study = ctg.studies.get("NCT04852770")
66
+
67
+ # Count studies matching a query
68
+ count = ctg.stats.size(condition="diabetes", status=["RECRUITING"])
69
+
70
+ # Get valid values for a field
71
+ statuses = ctg.stats.field_values("OverallStatus")
72
+
73
+ # API version info
74
+ version = ctg.version()
75
+ ```
76
+
77
+ #### Module namespaces
78
+
79
+ **ctg.studies**
80
+ - `search(**kwargs)` -- search with query params, returns `StudySearchResult`
81
+ - `get(nct_id)` -- fetch single study, returns `Study`
82
+ - `search_all(**kwargs)` -- async generator, auto-paginates through all results
83
+ - `bulk_get(nct_ids: list)` -- fetch multiple studies by NCT ID list
84
+
85
+ **ctg.stats**
86
+ - `size(**kwargs)` -- total count for a query, returns `int`
87
+ - `field_values(field, **kwargs)` -- enum values with counts, returns `list[FieldValue]`
88
+ - `list_fields(parent=None)` -- browse study data model tree
89
+
90
+ **ctg.metadata**
91
+ - `fields()` -- available fields for field selection
92
+ - `search_areas()` -- valid search area definitions
93
+
94
+ **ctg.version**
95
+ - `__call__()` -- returns `VersionInfo` with api_version and data_timestamp
96
+
97
+ ### Layer 2: QueryBuilder (fluent interface)
98
+
99
+ ```python
100
+ from pytrials import ClinicalTrials, Query
101
+
102
+ ctg = ClinicalTrials()
103
+
104
+ # Fluent query construction
105
+ query = (
106
+ Query()
107
+ .condition("non-small cell lung cancer")
108
+ .intervention("pembrolizumab")
109
+ .sponsor("Merck")
110
+ .location("United States")
111
+ .status("RECRUITING", "NOT_YET_RECRUITING")
112
+ .phase("PHASE3")
113
+ .sort("LastUpdatePostDate:desc")
114
+ )
115
+
116
+ results = ctg.studies.search(query, page_size=50)
117
+ ```
118
+
119
+ The `Query` object validates enum values at construction time (status, phase) and raises `InvalidQueryError` before hitting the API.
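
The validation described above could be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `OverallStatus` members shown are only a subset, the `statuses()` accessor is hypothetical, and here the check runs when a builder method is called.

```python
from enum import Enum


class InvalidQueryError(ValueError):
    """Raised when a query value is not an allowed enum member."""


class OverallStatus(str, Enum):
    # Illustrative subset only; the API defines more statuses than these.
    RECRUITING = "RECRUITING"
    NOT_YET_RECRUITING = "NOT_YET_RECRUITING"
    COMPLETED = "COMPLETED"


class Query:
    """Tiny fluent builder that validates status values as they are added."""

    def __init__(self) -> None:
        self._statuses: list[OverallStatus] = []

    def status(self, *values: str) -> "Query":
        for value in values:
            try:
                # Enum lookup by value fails with ValueError for unknown input.
                self._statuses.append(OverallStatus(value))
            except ValueError:
                allowed = ", ".join(s.value for s in OverallStatus)
                raise InvalidQueryError(
                    f"Unknown status {value!r}; expected one of: {allowed}"
                ) from None
        return self  # returning self is what makes the interface chainable

    def statuses(self) -> list[str]:
        # Hypothetical accessor, used here only to inspect the result.
        return [s.value for s in self._statuses]
```

With this shape, `Query().status("recruiting")` fails locally with `InvalidQueryError` rather than as an HTTP 400 from the server.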
120
+
121
+ ### Layer 3: Pydantic v2 models
122
+
123
+ Every API response is a validated Pydantic model. The study data structure is deeply nested, so models mirror the hierarchy:
124
+
125
+ ```
126
+ Study
127
+ protocolSection
128
+ identificationModule (nctId, briefTitle, officialTitle, organization)
129
+ statusModule (overallStatus, startDateStruct, completionDateStruct)
130
+ sponsorCollaboratorsModule (leadSponsor, collaborators)
131
+ descriptionModule (briefSummary, detailedDescription)
132
+ conditionsModule (conditions, keywords)
133
+ designModule (studyType, phases, enrollmentInfo, designInfo)
134
+ armsInterventionsModule (armGroups, interventions)
135
+ outcomesModule (primaryOutcomes, secondaryOutcomes)
136
+ eligibilityModule (criteria, healthyVolunteers, sex, minimumAge, maximumAge)
137
+ contactsLocationsModule (overallOfficials, locations)
138
+ referencesModule (references, seeAlsoLinks)
139
+ derivedSection
140
+ miscInfoModule
141
+ conditionBrowseModule (meshTerms)
142
+ interventionBrowseModule (meshTerms)
143
+ resultsSection (when hasResults=True)
144
+ participantFlowModule
145
+ baselineCharacteristicsModule
146
+ outcomeMeasuresModule
147
+ adverseEventsModule
148
+ hasResults: bool
149
+ ```
150
+
151
+ Key design decisions:
152
+ - All fields are `Optional` with sensible defaults (the API has many nullable fields)
153
+ - Date fields use a custom `CTGDate` type that handles the inconsistent formats ("2024-01-15", "January 2024", "January 15, 2024")
154
+ - Enum fields (status, phase, study type) are Python enums with validation
155
+ - `model_config = ConfigDict(extra="allow")` so new API fields don't break the SDK
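
As one possible shape for the `CTGDate` idea, the sketch below parses the three formats listed above with the standard library only. The names `ParsedDate` and `parse_ctg_date` are hypothetical, and month-only dates are pinned to the first of the month with the precision recorded separately.

```python
import re
from dataclasses import dataclass
from datetime import date

# Explicit month table so parsing does not depend on the process locale.
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


@dataclass(frozen=True)
class ParsedDate:
    value: date
    precision: str  # "day" or "month"


def parse_ctg_date(text: str) -> ParsedDate:
    """Parse '2024-01-15', 'January 15, 2024', or 'January 2024'."""
    text = text.strip()

    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if match:
        year, month, day = (int(part) for part in match.groups())
        return ParsedDate(date(year, month, day), "day")

    match = re.fullmatch(r"([A-Za-z]+) (\d{1,2}), (\d{4})", text)
    if match and match.group(1) in _MONTHS:
        month = _MONTHS[match.group(1)]
        return ParsedDate(date(int(match.group(3)), month, int(match.group(2))), "day")

    match = re.fullmatch(r"([A-Za-z]+) (\d{4})", text)
    if match and match.group(1) in _MONTHS:
        # No day was given, so keep the first of the month and flag the precision.
        return ParsedDate(date(int(match.group(2)), _MONTHS[match.group(1)], 1), "month")

    raise ValueError(f"Unrecognized date format: {text!r}")
```

Keeping the precision next to the value lets callers tell a real "first of the month" apart from a month-only date.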
156
+
157
+ ### Layer 4: Core HTTP client
158
+
159
+ ```python
160
+ # Internal, not public API
161
+ class CTGClient:
162
+ base_url = "https://clinicaltrials.gov/api/v2"
163
+
164
+ # httpx async client with:
165
+ # - Automatic retry with exponential backoff (tenacity)
166
+ # - Rate limiting (50 req/min token bucket)
167
+ # - Configurable timeout (default 30s)
168
+ # - Response validation via Pydantic
169
+ # - Structured CTGError on 4xx/5xx
170
+ ```
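
The token bucket mentioned in the comments above can be sketched without any dependencies. This is an illustrative, synchronous version; the class name `TokenBucket`, its methods, and the injectable `clock` parameter are assumptions made for this example, not the client's internals.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` requests, refilled evenly over `period` seconds."""

    def __init__(
        self,
        capacity: int = 50,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.rate = capacity / period  # tokens added per second
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last check, up to capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        self._refill()
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token becomes available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1.0 - self._tokens) / self.rate)
```

An async client would typically `await asyncio.sleep(bucket.wait_time())` before retrying `try_acquire()`; passing a fake `clock` keeps the logic testable without real sleeping.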
171
+
172
+ ### Layer 5: Pagination
173
+
174
+ ```python
175
+ # Auto-paginate through all results
176
+ async for study in ctg.studies.search_all(condition="cancer"):
177
+ process(study)
178
+
179
+ # Or collect all at once (careful with large result sets)
180
+ all_studies = await ctg.studies.search_all(
181
+ condition="cancer",
182
+ status=["COMPLETED"]
183
+ ).collect()
184
+ ```
185
+
186
+ The paginator handles `pageToken` transparently, respects rate limits between pages, and yields `Study` objects one at a time via async generator.
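
A rough sketch of the token-following loop is shown below. It assumes each response page carries a `studies` list and an optional `nextPageToken`, and it takes the page-fetching coroutine as a parameter so that it runs without network access. `iter_studies`, `fake_fetch`, and the canned pages are hypothetical, and the fake studies are flattened to a bare `nctId` for brevity.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Optional

Page = dict[str, Any]
FetchPage = Callable[[Optional[str]], Awaitable[Page]]


async def iter_studies(fetch_page: FetchPage) -> AsyncIterator[dict[str, Any]]:
    """Yield studies one at a time, following the page token until none is returned."""
    token: Optional[str] = None
    while True:
        page = await fetch_page(token)
        for study in page.get("studies") or []:
            yield study
        token = page.get("nextPageToken")
        if not token:
            break


# Stand-in for the HTTP layer: two canned pages keyed by the incoming page token.
_FAKE_PAGES: dict[Optional[str], Page] = {
    None: {"studies": [{"nctId": "NCT00000001"}], "nextPageToken": "abc"},
    "abc": {"studies": [{"nctId": "NCT00000002"}]},
}


async def fake_fetch(token: Optional[str]) -> Page:
    return _FAKE_PAGES[token]


async def collect_ids() -> list[str]:
    # An async comprehension drains the generator into a list.
    return [study["nctId"] async for study in iter_studies(fake_fetch)]
```

In a real client, `fetch_page` would issue the GET request with `pageToken` set and would be the natural place to apply the rate limiter between pages.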
187
+
188
+ ### Layer 6: pandas integration
189
+
190
+ ```python
191
+ # Direct DataFrame output
192
+ df = ctg.studies.search(
193
+ condition="diabetes",
194
+ status=["RECRUITING"],
195
+ page_size=100
196
+ ).to_dataframe()
197
+
198
+ # Flattened columns: nct_id, brief_title, overall_status,
199
+ # lead_sponsor, phase, enrollment, start_date, ...
200
+ ```
201
+
202
+ The `.to_dataframe()` method flattens the nested study structure into a tabular format suitable for analysis. Users can specify which fields to include.
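
The flattening step might look something like the sketch below, which turns one nested study dict into one flat row. The helper names `dig` and `flatten_study` are hypothetical, the column set is a small subset, and the `leadSponsor.name` path is an assumption about the response shape.

```python
from typing import Any


def dig(data: dict[str, Any], *path: str) -> Any:
    """Walk nested dicts, returning None as soon as a key is missing or null."""
    current: Any = data
    for key in path:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def flatten_study(study: dict[str, Any]) -> dict[str, Any]:
    """Project a nested study record onto a handful of flat, snake_case columns."""
    protocol = study.get("protocolSection") or {}
    phases = dig(protocol, "designModule", "phases") or []
    return {
        "nct_id": dig(protocol, "identificationModule", "nctId"),
        "brief_title": dig(protocol, "identificationModule", "briefTitle"),
        "overall_status": dig(protocol, "statusModule", "overallStatus"),
        # Assumed path: the sponsor's display name nested under leadSponsor.
        "lead_sponsor": dig(protocol, "sponsorCollaboratorsModule", "leadSponsor", "name"),
        # A study may list several phases; join them into one cell.
        "phase": "/".join(phases) if phases else None,
    }
```

A list of such rows can be passed straight to `pandas.DataFrame(rows)`, which accepts a list of dicts.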
203
+
204
+ ---
205
+
206
+ ## Project structure
207
+
208
+ ```
209
+ pytrials-v2/
210
+ src/
211
+ pytrials/
212
+ __init__.py # ClinicalTrials client, Query, enums
213
+ client.py # CTGClient (httpx, retry, rate limit)
214
+ models/
215
+ __init__.py
216
+ study.py # Study, ProtocolSection, etc.
217
+ search.py # StudySearchResult, pagination
218
+ stats.py # FieldValue, VersionInfo
219
+ enums.py # OverallStatus, Phase, StudyType
220
+ dates.py # CTGDate custom type
221
+ modules/
222
+ __init__.py
223
+ studies.py # StudiesModule (search, get, search_all)
224
+ stats.py # StatsModule (size, field_values)
225
+ metadata.py # MetadataModule (fields, search_areas)
226
+ query.py # QueryBuilder fluent interface
227
+ pagination.py # AsyncPaginator generator
228
+ errors.py # CTGError, RateLimitError, NotFoundError
229
+ pandas_ext.py # .to_dataframe() integration
230
+ tests/
231
+ conftest.py # respx fixtures, sample responses
232
+ test_client.py
233
+ test_studies.py
234
+ test_stats.py
235
+ test_query.py
236
+ test_pagination.py
237
+ test_models.py
238
+ fixtures/
239
+ search_response.json
240
+ study_detail.json
241
+ stats_size.json
242
+ docs/
243
+ index.md
244
+ quickstart.md
245
+ api-reference/
246
+ studies.md
247
+ stats.md
248
+ query-builder.md
249
+ models.md
250
+ guides/
251
+ pagination.md
252
+ pandas-integration.md
253
+ common-patterns.md # Competitive intel, site selection, patient matching
254
+ pyproject.toml
255
+ README.md
256
+ LICENSE
257
+ CHANGELOG.md
258
+ .github/
259
+ workflows/
260
+ ci.yml # pytest, mypy, ruff, coverage
261
+ publish.yml # PyPI publish on tag
262
+ ```
263
+
264
+ ---
265
+
266
+ ## Tech stack
267
+
268
+ | Tool | Purpose |
269
+ |---|---|
270
+ | httpx | Async HTTP client (better than requests for async) |
271
+ | pydantic v2 | Response validation and type safety |
272
+ | tenacity | Retry with exponential backoff |
273
+ | respx | Mock httpx in tests |
274
+ | pytest + pytest-asyncio | Test framework |
275
+ | ruff | Linting and formatting |
276
+ | mypy (strict) | Static type checking |
277
+ | mkdocs-material | Documentation site |
278
+ | mike | Docs versioning |
279
+ | hatch | Build backend and environment management |
280
+
281
+ ---
282
+
283
+ ## Release roadmap
284
+
285
+ ### v0.1.0: Core (weeks 1-2)
286
+
287
+ Ship a working SDK that covers the basic use cases.
288
+
289
+ - [ ] `ClinicalTrials` client class with httpx
290
+ - [ ] `ctg.studies.search()` with all query params
291
+ - [ ] `ctg.studies.get()` single study by NCT ID
292
+ - [ ] Pydantic models for Study, ProtocolSection, and key submodules
293
+ - [ ] Enum types for OverallStatus, Phase, StudyType
294
+ - [ ] Basic error handling (CTGError with status_code, message)
295
+ - [ ] pytest suite with respx mocking
296
+ - [ ] README with quickstart
297
+ - [ ] PyPI publish via GitHub Actions
298
+
299
+ ### v0.2.0: Ergonomics (weeks 3-4)
300
+
301
+ Make the SDK pleasant to use for real workflows.
302
+
303
+ - [ ] QueryBuilder fluent interface with validation
304
+ - [ ] AsyncPaginator (async generator for search_all)
305
+ - [ ] `ctg.stats.size()` and `ctg.stats.field_values()`
306
+ - [ ] `ctg.version()` endpoint
307
+ - [ ] Retry with exponential backoff (tenacity)
308
+ - [ ] Rate limiter (token bucket, 50 req/min)
309
+ - [ ] CTGDate custom type for inconsistent date formats
310
+ - [ ] `bulk_get()` for multiple NCT IDs
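
To illustrate the retry item in this list, here is a dependency-free sketch of exponential backoff. The plan names tenacity for this job, so treat the helpers below (`backoff_delays`, `call_with_retry`) as hypothetical stand-ins that show the schedule rather than the intended implementation. Jitter is omitted to keep the example deterministic.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> Iterator[float]:
    """Yield the wait before each retry: base, 2*base, 4*base, ... capped at `cap`."""
    for n in range(attempts):
        yield min(cap, base * (2 ** n))


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func up to `attempts` times, retrying on ConnectionError."""
    for delay in backoff_delays(attempts - 1):
        try:
            return func()
        except ConnectionError:
            sleep(delay)  # wait longer after each consecutive failure
    return func()  # final attempt; any error now propagates to the caller
```

Injecting `sleep` keeps tests fast, and a production version would usually add random jitter and also retry on HTTP 429 and 5xx responses.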
311
+
312
+ ### v0.3.0: Data science (weeks 5-6)
313
+
314
+ Make it useful for analysts and researchers.
315
+
316
+ - [ ] `.to_dataframe()` pandas integration
317
+ - [ ] CSV export option (format=csv passthrough)
318
+ - [ ] `ctg.metadata.fields()` and `search_areas()`
319
+ - [ ] mkdocs-material documentation site
320
+ - [ ] "Common patterns" guide (competitive intel, site selection, patient matching)
321
+ - [ ] mypy strict mode passing
322
+ - [ ] 90%+ test coverage
323
+
324
+ ### v1.0.0: Stable (week 8+)
325
+
326
+ - [ ] Full model coverage for resultsSection (outcomes, adverse events, participant flow)
327
+ - [ ] CLI tool (`pytrials search --condition "diabetes" --status RECRUITING`)
328
+ - [ ] Jupyter notebook examples
329
+ - [ ] Docs site deployed (GitHub Pages or Vercel)
330
+ - [ ] Community feedback incorporated
331
+
332
+ ---
333
+
334
+ ## Competitive differentiation
335
+
336
+ What this SDK does that nothing else offers:
337
+
338
+ 1. **Full Pydantic models.** Every field in the API response is typed. IDE autocomplete works everywhere. `study.protocol_section.eligibility_module.minimum_age` not `study["protocolSection"]["eligibilityModule"]["minimumAge"]`.
339
+
340
+ 2. **QueryBuilder with validation.** Catches invalid status values, phase values, and sort options before the request is sent. No more 400 errors from typos.
341
+
342
+ 3. **Auto-pagination.** `async for study in ctg.studies.search_all(...)` handles pageToken transparently. Nobody should write a pagination loop manually.
343
+
344
+ 4. **pandas-native.** `.to_dataframe()` flattens the deeply nested study structure into analysis-ready columns. This is what 80% of users actually want.
345
+
346
+ 5. **Date normalization.** The API returns dates in at least three formats. The SDK normalizes them into Python `date` objects.
347
+
348
+ 6. **Rate limit awareness.** Built-in token bucket respects the 50 req/min limit. No more 429 errors during bulk operations.
349
+
350
+ 7. **Domain expertise in the API design.** Search patterns (condition + intervention + sponsor + location + status + phase) are first-class, not afterthoughts. The QueryBuilder reflects how regulatory professionals, CROs, and biotech analysts actually query this data.
351
+
352
+ ---
353
+
354
+ ## Example use cases to document
355
+
356
+ ### Competitive intelligence
357
+ Search what trials a specific sponsor is running, filter by phase and status, export to DataFrame for analysis.
358
+
359
+ ### Clinical site selection
360
+ Find facilities running trials for a condition in a geography. The locations data in contactsLocationsModule is rich enough for this.
361
+
362
+ ### Patient matching
363
+ Search recruiting trials by condition, location, age range, and healthy volunteer status. This is the patient-facing use case.
364
+
365
+ ### Regulatory landscape
366
+ Count trials by phase and status for a therapeutic area. Use stats endpoints for aggregate views.
367
+
368
+ ### Drug pipeline tracking
369
+ Track all trials for a specific intervention across phases. Use sort by last update to catch new filings.
370
+
371
+ ---
372
+
373
+ ## Marketing and distribution
374
+
375
+ - PyPI package with clear README
376
+ - Blog post on dev.to or Medium: "Building the Python SDK ClinicalTrials.gov should have shipped"
377
+ - Post in r/bioinformatics, r/clinicalresearch, r/Python
378
+ - LinkedIn post (leveraging your existing healthcare audience)
379
+ - Submit to awesome-python and awesome-healthcare lists
380
+ - Present at OOP Data Camp (you're already going)
381
+ - Cross-reference from OpenTrialGraph
382
+
383
+ ---
384
+
385
+ ## Key API quirks to handle
386
+
387
+ 1. **Date inconsistency.** Some dates are "2024-01-15", others "January 2024", others "January 15, 2024". The SDK should normalize to `datetime.date` or a `CTGDate` that preserves precision (year-only vs. full date).
388
+
389
+ 2. **Nullable arrays.** conditions, interventions, locations, collaborators can all be null or empty arrays. Models must handle both.
390
+
391
+ 3. **Large responses.** A single study with results can be 200KB+ of JSON. The resultsSection (outcomes, adverse events, participant flow, baseline) is massive. Consider lazy loading or optional field selection.
392
+
393
+ 4. **pageSize default is 10.** Most users want more. The SDK should default to 100 or let users set a client-level default.
394
+
395
+ 5. **Enumerated values are case-sensitive.** RECRUITING works, recruiting doesn't. The SDK should handle case normalization.
396
+
397
+ 6. **CSV format returns flat columns.** The JSON and CSV response schemas are different. The SDK should abstract this.
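
Quirks 2 and 5 lend themselves to small helpers. The sketch below is one possible approach; `normalize_enum` and `as_list` are hypothetical names, and mapping spaces and hyphens to underscores is an assumption about what users are likely to type.

```python
from typing import Any, Optional


def normalize_enum(value: str) -> str:
    """Map input like 'recruiting' or 'not yet recruiting' to the API's upper snake case."""
    return value.strip().upper().replace(" ", "_").replace("-", "_")


def as_list(value: Optional[list[Any]]) -> list[Any]:
    """Treat a null array the same as an empty one so callers can always iterate."""
    return [] if value is None else list(value)
```

Normalizing before validation means `"recruiting"` and `"RECRUITING"` resolve to the same enum member, and `as_list` lets code loop over `conditions` or `locations` without a null check.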
@@ -0,0 +1,41 @@
1
+ # pytrials-v2
2
+
3
+ A modern, fully-typed Python SDK for the [ClinicalTrials.gov API v2](https://clinicaltrials.gov/data-api/api).
4
+
5
+ > Status: early development. See [PROJECT_PLAN.md](./PROJECT_PLAN.md) for the full design and roadmap.
6
+
7
+ ## Why
8
+
9
+ The ClinicalTrials.gov API v2 (JSON, token pagination, OpenAPI 3.0) launched in 2024, and the old XML v1 API was retired. There is still no well-designed Python SDK for v2. pytrials-v2 aims to be the default: Pydantic-modeled, async-capable, with the ergonomic helpers clinical-trial data consumers actually need.
10
+
11
+ ## What it offers
12
+
13
+ - Full Pydantic v2 models for every API response (real autocomplete, no dict-digging)
14
+ - A validating QueryBuilder that catches bad status, phase, and sort values before the request
15
+ - Async auto-pagination over `pageToken`
16
+ - DataFrame integration that flattens the nested study structure for analysis
17
+ - Date normalization across the API's inconsistent formats
18
+ - Built-in rate limiting (50 req/min) and retry with backoff
19
+
20
+ ## Quickstart (planned API)
21
+
22
+ ```python
23
+ from pytrials import ClinicalTrials
24
+
25
+ ctg = ClinicalTrials()
26
+
27
+ results = ctg.studies.search(condition="breast cancer", status=["RECRUITING"], phase=["PHASE3"])
28
+ study = ctg.studies.get("NCT04852770")
29
+ df = ctg.studies.search(condition="diabetes", status=["RECRUITING"]).to_dataframe()
30
+ ```
31
+
32
+ ## Roadmap
33
+
34
+ - **v0.1.0 Core**: client, search/get, core models, error handling, PyPI publish
35
+ - **v0.2.0 Ergonomics**: QueryBuilder, async paginator, stats endpoints, rate limiting
36
+ - **v0.3.0 Data science**: DataFrame integration, docs site, 90%+ coverage
37
+ - **v1.0.0 Stable**: full results-section models, CLI, notebook examples
38
+
39
+ ## License
40
+
41
+ MIT