aikosh 0.1.0__tar.gz

@@ -0,0 +1,18 @@
include README.md
include pyproject.toml

# Only ship the actual package code.
recursive-include src/aikosh *.py py.typed

# Exclude everything else from sdist.
prune tests
prune docs
prune dist
prune build
prune .venv
prune venv
prune .pytest_cache

global-exclude *.py[cod]
global-exclude __pycache__
global-exclude *.egg-info
aikosh-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,385 @@
Metadata-Version: 2.4
Name: aikosh
Version: 0.1.0
Summary: Python SDK for the AIKosh platform (datasets, models, and more).
Author: AIKosh SDK contributors
License: Apache-2.0
Project-URL: Homepage, https://aikosh.indiaai.gov.in/home
Keywords: aikosh,indiaai,datasets,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: httpx<1,>=0.27
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"

# AIKosh SDK (Python)

Open-source Python SDK for working with the AIKosh platform: **discover datasets and models, inspect their files, and download assets** (whole packages or individual files).

## What is AIKosh SDK?

AIKosh SDK is a **developer-friendly Python library** that wraps the AIKosh platform’s APIs in **simple, stable functions** you can call from notebooks, scripts, and applications.

It is designed to feel like an ML developer tool, similar in spirit to libraries such as `transformers`: common workflows (searching, browsing files, downloading) are one import away.

## Why use an SDK instead of calling the APIs directly?

An SDK gives you:

- **Simpler usage**: no manual URL construction, headers, or response parsing in every script.
- **Consistent patterns**: the same function shapes for datasets and models (`list_directory`, `list_files`, `get_metadata`, `download`).
- **Safer downloads**: fresh temporary download URLs are fetched automatically, and files are streamed to disk.
- **One place to evolve**: when the backend changes, the SDK absorbs the change, and downstream code picks it up by upgrading the package.

## What this SDK aspires to help you build

Over time, the goal is to make it easy to build:

- **Repeatable data and model pipelines**: programmatic discovery and download for training and evaluation.
- **Dataset and model exploration tools**: list and traverse file trees for validation and QA.
- **Automation**: integrate AIKosh assets into CI workflows and internal platforms.

## Install

Install the published package with pip:

```bash
pip install aikosh
```

For contributors and local development (from a source checkout):

```bash
pip install -e ".[dev]"
```

## Configuration

### API key

Option A: environment variable:

```python
import os
os.environ["AIKOSH_API_KEY"] = "YOUR_KEY"
```

Option B: in code:

```python
import aikosh
aikosh.set_api_key("YOUR_KEY")
```

(`AIKOSH_ACCESS_KEY` is also supported.)

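In CI jobs it helps to fail fast when no key is configured instead of waiting for a 401 on the first request. The sketch below is a stdlib-only preflight check. `require_api_key` is a hypothetical helper, not part of the SDK. Checking `AIKOSH_API_KEY` before `AIKOSH_ACCESS_KEY` is an assumption, because this README does not say which variable wins when both are set.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty AIKosh key found in the environment.

    Hypothetical helper. It checks AIKOSH_API_KEY first, then AIKOSH_ACCESS_KEY.
    That order is an assumption and is not documented SDK behaviour.
    """
    env = os.environ if env is None else env
    for name in ("AIKOSH_API_KEY", "AIKOSH_ACCESS_KEY"):
        value = env.get(name, "").strip()
        if value:  # skip unset or whitespace-only values
            return value
    raise RuntimeError(
        "No AIKosh API key found; set AIKOSH_API_KEY (or AIKOSH_ACCESS_KEY) "
        "or call aikosh.set_api_key(...) before making requests."
    )
```

A typical call is `aikosh.set_api_key(require_api_key())` at the top of a script. This turns a missing secret into an immediate, readable error.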
## Asset identifiers (`id`)

List and metadata responses identify each dataset or model by its **`id`** field. Use that value wherever the SDK expects an **`identifier`**, and as the first argument to domain helpers such as `aikosh.datasets.list_files`.

```python
import aikosh

out = aikosh.list_directory("dataset", filters={"page": 1, "size": 10})
items = out["data"]["items"]  # shape depends on the API; each item carries "id"
dataset_id = items[0]["id"]  # raises IndexError if the page is empty

out = aikosh.list_directory("model", filters={"page": 1, "size": 10})
model_id = out["data"]["items"][0]["id"]
```

Do not pass human-readable slugs or display names where the API expects the platform **`id`**.

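Indexing `items[0]` fails with `IndexError` when a page comes back empty. The following is a slightly more defensive sketch. It assumes the `out["data"]["items"]` shape used above, and `extract_ids` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def extract_ids(response: Dict[str, Any]) -> List[str]:
    """Collect the platform `id` of every item in a list_directory response.

    Assumes the README's shape: response["data"]["items"] is a list of dicts,
    each carrying an "id" key. Missing or empty sections yield an empty list.
    """
    data = response.get("data") or {}
    items = data.get("items") or []
    return [item["id"] for item in items if "id" in item]
```

For example, call `ids = extract_ids(aikosh.list_directory("dataset", filters={"page": 1, "size": 10}))`, then `dataset_id = ids[0] if ids else None`.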
## Download request parameters

| Parameter | Role |
|-----------|------|
| `identifier` | Dataset or model **`id`** from list/metadata responses |
| `type` | `"dataset"` or `"model"` |
| `destination_path` | **Local** folder or file path where the download is saved |
| `file_path` | **Remote** file path inside the asset (single-file download) |
| `directory_path` | **Remote** folder inside the asset (single-file download; can be combined with `filename`) |
| `filename` | Optional local output file name; with `directory_path`, it is also appended to the remote folder to form the remote file path |
| `version_id` | Optional version **`id`** when the API supports multiple versions |
| `max_workers` | Batch downloads only: number of parallel workers (default **4**, maximum **4**) |

**`directory_path` is not a local save path**; use **`destination_path`** for that.

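The table implies a few rules that can be checked before any request is sent. The sketch below gives one reading of them. `build_download_request` and `remote_file_path` are hypothetical helpers. Treating `file_path` and `directory_path` as mutually exclusive is also an assumption, because the table does not state it. The returned dict matches the requests that `aikosh.download(...)` accepts in the Quickstart.

```python
import posixpath
from typing import Any, Dict, Optional

VALID_TYPES = {"dataset", "model"}


def build_download_request(
    identifier: str,
    type_: str,
    destination_path: str,
    *,
    file_path: Optional[str] = None,
    directory_path: Optional[str] = None,
    filename: Optional[str] = None,
    version_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request dict with the table's keys, dropping unset ones."""
    if type_ not in VALID_TYPES:
        raise ValueError(f"type must be 'dataset' or 'model', got {type_!r}")
    if not identifier:
        raise ValueError("identifier must be the asset's platform id")
    if file_path and directory_path:
        # Assumption: the two remote-path styles are alternatives.
        raise ValueError("pass either file_path or directory_path, not both")
    request = {
        "identifier": identifier,
        "type": type_,
        "destination_path": destination_path,  # local save location
        "file_path": file_path,  # remote path inside the asset
        "directory_path": directory_path,  # remote folder inside the asset
        "filename": filename,
        "version_id": version_id,
    }
    return {key: value for key, value in request.items() if value is not None}


def remote_file_path(request: Dict[str, Any]) -> Optional[str]:
    """Return the remote file a request points at, or None for a whole asset."""
    if request.get("file_path"):
        return request["file_path"]
    if request.get("directory_path") is not None:
        # With directory_path, filename is appended to the remote folder.
        return posixpath.join(request["directory_path"], request.get("filename", ""))
    return None
```

`posixpath` is used deliberately. Remote paths inside an asset use forward slashes regardless of the local operating system.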
## Quickstart

### 1) Check connectivity

```python
import aikosh
print(aikosh.ping())  # calls the dataset filters endpoint
```

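Scripts that should stop early when the platform is unreachable can wrap `aikosh.ping` so that the check returns a boolean. This is a sketch, and `is_reachable` is hypothetical. The README does not document what `ping()` returns. The sketch assumes it either raises on failure or returns an envelope with a `"status"` key, like the filter-info responses in step 3.

```python
from typing import Any, Callable


def is_reachable(ping_fn: Callable[[], Any]) -> bool:
    """Return True if ping_fn() succeeds, False on any exception.

    Assumption: if the result is a dict with a "status" key (like the
    filter-info envelope), any value other than "success" counts as unreachable.
    """
    try:
        result = ping_fn()
    except Exception:  # network errors, auth errors, etc. all mean "not usable"
        return False
    if isinstance(result, dict) and "status" in result:
        return result["status"] == "success"
    return True
```

Usage: `if not is_reachable(aikosh.ping): raise SystemExit("AIKosh API unreachable")`. The broad `except` is deliberate for a preflight check. Avoid it around real download or listing work, where the specific error matters.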
### 2) Discover available functions

```python
import aikosh
aikosh.list_functions()
# Returns: {"aikosh": {...}, "aikosh.datasets": {...}, "aikosh.models": {...}}
```

### 3) Filter master (codes for list filters)

```python
import aikosh

aikosh.get_datasets_filter_info()
# Returns:
# {"status": "success", "message": "filters endpoint reachable",
#  "data": {
#      "organisationList": [{"id": ..., "name": ...}, ...],
#      "sectorsList": [{"id": ..., "name": ...}, ...],
#      "licensesList": [{"id": ..., "name": ...}, ...],
#      "datasetTypesList": [{"id": ..., "name": ...}, ...],
#  }}

aikosh.get_models_filter_info()
# Returns:
# {"status": "success", "message": "filters endpoint reachable",
#  "data": {
#      "organisationList": [{"id": ..., "name": ...}, ...],
#      "sectorsList": [{"id": ..., "name": ...}, ...],
#      "licensesList": [{"id": ..., "name": ...}, ...],
#      "modelTypesList": [{"id": ..., "name": ...}, ...],
#  }}
```

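The list filters take numeric codes, while people usually think in names. The following sketch resolves names to codes from a filter-info response, assuming the `{"id": ..., "name": ...}` entries shown above. `codes_for` is hypothetical. The pairings are inferred from the key names rather than documented: `licensesList` feeds the `license` filter, `sectorsList` feeds `sector`, and `modelTypesList` feeds `modelType`.

```python
from typing import Any, Dict, Iterable, List


def codes_for(filter_info: Dict[str, Any], list_key: str, names: Iterable[str]) -> List[Any]:
    """Map human-readable filter names to the ids expected by list filters.

    Assumes filter_info["data"][list_key] is a list of {"id": ..., "name": ...}
    dicts. Matching is case-insensitive; unknown names raise KeyError.
    """
    entries = (filter_info.get("data") or {}).get(list_key) or []
    by_name = {str(entry["name"]).casefold(): entry["id"] for entry in entries}
    codes = []
    for name in names:
        key = name.casefold()
        if key not in by_name:
            raise KeyError(f"{name!r} not found in {list_key}")
        codes.append(by_name[key])
    return codes
```

For example, `filters = {"page": 1, "size": 20, "license": codes_for(aikosh.get_datasets_filter_info(), "licensesList", ["<license name>"])}`. Raising on unknown names is a design choice. It surfaces typos instead of silently sending a filter that matches nothing.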
### 4) List datasets or models

```python
import aikosh

out = aikosh.list_directory(
    "dataset",
    filters={"page": 1, "size": 20, "keyword": "sanskrit"},
)
print(out["data"])

out = aikosh.list_directory(
    "model",
    filters={"page": 1, "size": 20, "keyword": "Bhashini", "modelType": [374, 375]},
)
print(out["data"])

# If the filters match nothing, "status" stays "success" and the SDK adds a "message":
if out.get("message"):
    print(out["message"])
```

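Each `list_directory` call returns a single page. The sketch below is a generator that walks every page, with `aikosh.list_directory` passed in as `list_fn`. `iter_items` is hypothetical. No total-count field is documented here, so the sketch assumes a page shorter than `size` is the last one. It reads items from `out["data"]["items"]`, as in the identifier example.

```python
import logging
from typing import Any, Callable, Dict, Iterator, Optional

log = logging.getLogger(__name__)


def iter_items(
    list_fn: Callable[..., Dict[str, Any]],
    kind: str,
    filters: Optional[Dict[str, Any]] = None,
    size: int = 20,
    max_pages: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield every item across pages of list_fn(kind, filters=...).

    Assumptions: items live under response["data"]["items"], and a page
    shorter than `size` is the last one. max_pages is a safety stop.
    """
    base = dict(filters or {})
    for page in range(1, max_pages + 1):
        out = list_fn(kind, filters={**base, "page": page, "size": size})
        items = (out.get("data") or {}).get("items") or []
        if not items and out.get("message"):
            # A no-match search sets "message" while status stays "success".
            log.info(out["message"])
        yield from items
        if len(items) < size:
            return
```

Usage: `for item in iter_items(aikosh.list_directory, "dataset", {"keyword": "sanskrit"}): print(item["id"])`.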
#### Dataset list filters

```python
import aikosh

out = aikosh.list_directory(
    "dataset",
    filters={
        "page": 1,
        "size": 20,
        "license": [213, 214],
        "sector": [3228, 209],
        "fileFormat": ["csv", "json"],
        "versionScore": 3,
        "keyword": "Krishi",
    },
)
```

### 5) Get metadata (datasets and models)

One function covers both; set `type` to `"dataset"` or `"model"`:

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"  # from a list response: item["id"]
model_id = "PUT_MODEL_ID_HERE"

print(aikosh.get_metadata("dataset", dataset_id)["data"])
print(aikosh.get_metadata("model", model_id)["data"])
```

Aliases: `aikosh.get_dataset_metadata(dataset_id)` and `aikosh.get_model_metadata(model_id)`.

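Repeatable pipelines can benefit from pinning the metadata that a run saw. The sketch below caches `get_metadata(...)["data"]` as JSON on disk, with `aikosh.get_metadata` passed in as `get_metadata_fn`. `cached_metadata` and the `.aikosh_cache/<type>/<id>.json` layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def cached_metadata(
    get_metadata_fn: Callable[[str, str], Dict[str, Any]],
    kind: str,
    identifier: str,
    cache_dir: str = ".aikosh_cache",
) -> Dict[str, Any]:
    """Return metadata["data"] for one asset, reading/writing a local JSON file.

    Assumes the response carries its payload under "data", as in the example
    above. There is no expiry: delete the file to force a refresh.
    """
    path = Path(cache_dir) / kind / f"{identifier}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = get_metadata_fn(kind, identifier)["data"]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return data
```

Committing the cache directory alongside a training config records exactly which asset description the run used.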
### 6) List files (datasets and models)

`directory_path` in **filters** is the **remote** folder inside the asset (`""` for the root).

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
model_id = "PUT_MODEL_ID_HERE"

aikosh.list_files(
    "dataset",
    dataset_id,
    filters={"directory_path": "", "page": 1, "limit": 50},
)

aikosh.list_files(
    "model",
    model_id,
    filters={"directory_path": "", "page": 1, "limit": 50},
)
```

Domain shortcuts: `aikosh.datasets.list_files(dataset_id, ...)` and `aikosh.models.list_files(model_id, ...)`.

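A whole file tree can be traversed with a breadth-first walk over remote folders, which suits the "exploration tools" use case. The README does not document the `list_files` response shape, so this sketch relies on an adapter that you supply. `list_dir(path)` should return `(name, is_directory)` pairs for one folder, built on `list_files` with `filters={"directory_path": path, ...}` and paging through `page`/`limit`. Both `walk_remote` and the adapter are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Tuple

Entry = Tuple[str, bool]  # (name, is_directory)


def walk_remote(list_dir: Callable[[str], Iterable[Entry]], root: str = "") -> Iterator[str]:
    """Breadth-first walk that yields every remote file path under `root`.

    `list_dir(path)` is a caller-written adapter around list_files that
    returns (name, is_directory) pairs for a single remote folder.
    """
    pending = deque([root])  # folders still to visit, oldest first
    while pending:
        folder = pending.popleft()
        for name, is_dir in list_dir(folder):
            path = f"{folder.rstrip('/')}/{name}" if folder else name
            if is_dir:
                pending.append(path)
            else:
                yield path
```

Keeping the response parsing in the adapter means the traversal logic does not change if the `list_files` payload shape does.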
### 7) Download

#### Whole dataset or model

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
out = aikosh.download(
    {
        "identifier": dataset_id,
        "type": "dataset",
        "destination_path": "./downloads",
    }
)

model_id = "PUT_MODEL_ID_HERE"
out = aikosh.download(
    {
        "identifier": model_id,
        "type": "model",
        "destination_path": "./downloads/models",
        # "version_id": "OPTIONAL_VERSION_ID",
    }
)
```

#### Single file (remote path + local destination)

```python
import aikosh

dataset_id = "PUT_DATASET_ID_HERE"
model_id = "PUT_MODEL_ID_HERE"

# Remote file path inside the asset
out = aikosh.download(
    {
        "identifier": dataset_id,
        "type": "dataset",
        "file_path": "documents/report.pdf",
        "destination_path": "./downloads/files",
        "filename": "report_copy.pdf",
    }
)

# Or remote folder + file name
out = aikosh.download(
    {
        "identifier": model_id,
        "type": "model",
        "directory_path": "weights/",
        "filename": "model.bin",
        "destination_path": "./downloads/models/files",
    }
)
```

#### Batch download

Pass a **list** of download request dicts. Concurrency is controlled with `max_workers` (default **4**; values above **4** are capped at **4**).

```python
import aikosh

out = aikosh.download(
    [
        {"identifier": "DATASET_ID_1", "type": "dataset", "destination_path": "./downloads"},
        {"identifier": "DATASET_ID_2", "type": "dataset", "destination_path": "./downloads"},
    ],
    max_workers=4,  # optional; values above 4 are capped at 4
)
print(out["status"])  # "success", "partial_success", or "failed"
print(out["items"])
```

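The documented batch rules are a worker cap of 4 and an aggregate `success` / `partial_success` / `failed` status. How they fit together can be illustrated with `concurrent.futures`. This sketch shows those semantics and is not the SDK's implementation. `run_batch` and its per-item result dicts are hypothetical. `download_one` stands in for a single-request call such as `aikosh.download(request)`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

MAX_WORKERS_CAP = 4  # documented upper bound for batch downloads


def run_batch(
    download_one: Callable[[Dict[str, Any]], Any],
    requests: List[Dict[str, Any]],
    max_workers: int = 4,
) -> Dict[str, Any]:
    """Run download_one over requests with at most 4 workers; summarise outcomes."""
    workers = max(1, min(max_workers, MAX_WORKERS_CAP))

    def attempt(request: Dict[str, Any]) -> Dict[str, Any]:
        try:
            download_one(request)
            return {"identifier": request.get("identifier"), "ok": True}
        except Exception as exc:  # record the failure instead of aborting the batch
            return {"identifier": request.get("identifier"), "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        items = list(pool.map(attempt, requests))  # results keep input order

    succeeded = sum(item["ok"] for item in items)
    if succeeded == len(items):
        status = "success"
    elif succeeded:
        status = "partial_success"
    else:
        status = "failed"
    return {"status": status, "items": items}
```

The SDK already handles this when `download` is given a list. The sketch mainly helps in reasoning about partial failures, for example by collecting the entries whose `ok` is false and retrying only those.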
## Modules (what to import)

- **`import aikosh`**: most users only need this (high-level journey functions).
- **`import aikosh.datasets`**: dataset journey functions plus raw HTTP helpers.
- **`import aikosh.models`**: model journey functions plus raw HTTP helpers.
- **`from aikosh.datasets import api as ds_api`**: advanced usage (parsed `data` from HTTP responses).
- **`from aikosh.models import api as models_api`**: the same for models.

## Reference: top-level package (`import aikosh`)

| Function | Typical use |
|----------|-------------|
| `set_api_key` / `set_access_key` | Store API key in-process (also reads env vars). |
| `get_access_key` | Read the configured key (if any). |
| `get_metadata(type, identifier, ...)` | Metadata for datasets or models. |
| `get_dataset_metadata(identifier, ...)` | Same as `get_metadata("dataset", identifier, ...)`. |
| `get_model_metadata(identifier, ...)` | Same as `get_metadata("model", identifier, ...)`. |
| `list_directory(type, filters=..., ...)` | List datasets or models. |
| `list_files(type, identifier, filters=..., ...)` | List files inside a dataset or model. |
| `download(request, ..., max_workers=...)` | Download dataset or model (single dict or batch list; `max_workers` default 4, max 4). |
| `to_json(data, ...)` | Serialize nested structures to a JSON string. |
| `ping(...)` | Connectivity check (dataset filters endpoint). |
| `list_functions(...)` | List user-facing functions and one-line descriptions (auto-generated). |
| `get_datasets_filter_info()` | Dataset filter master (codes for list filters). |
| `get_models_filter_info()` | Model filter master (codes for list filters). |
| `__version__` | Installed package version string. |

## Reference: `aikosh.models`

### Journey (user-facing)

| Function | Purpose |
|----------|---------|
| `list_directory(filters=..., ...)` | List models (`page`, `size`, `license`, `sector`, `fileFormat`, `modelType`, `keyword`). |
| `get_metadata(model_id, ...)` | Model metadata by **`id`**. |
| `list_files(model_id, filters=..., ...)` | Remote file tree (`directory_path`, optional `version_id`, `page`, `limit`). |
| `download(request, ..., max_workers=...)` | Download whole model or one file (batch supported; `max_workers` default 4, max 4). |
| `ping(...)` | Model filters connectivity check. |
| `to_json(data, ...)` | JSON helper. |

### Low-level API

| Function | Purpose |
|----------|---------|
| `require_uuid_string(name, value)` | Validate identifier format before API calls. |
| `get_filters`, `list_models`, `get_model_metadata`, `list_file_details` | Raw HTTP wrappers. |
| `get_model_download_url`, `get_file_download_url`, `stream_download_url_to_path` | Presigned URLs and streaming to disk. |

## Reference: `aikosh.datasets`

### Journey

| Function | Purpose |
|----------|---------|
| `list_directory("dataset", filters=..., ...)` | List datasets. |
| `get_metadata("dataset", dataset_id, ...)` | Dataset metadata by **`id`**. |
| `get_dataset_metadata_journey(dataset_id, ...)` | Same as `get_metadata("dataset", ...)`. |
| `list_files(dataset_id, filters=..., ...)` | Remote file tree for a dataset. |
| `download(..., max_workers=...)` | Dataset downloads (single or batch; `max_workers` default 4, max 4). |
| `ping(...)` | Dataset filters connectivity check. |
| `to_json(...)` | JSON helper. |

### Low-level API

`get_filters`, `list_datasets`, `get_dataset_metadata`, `list_file_details`, `get_dataset_version_download_url`, `get_file_download_url`, `stream_download_url_to_path`, `require_uuid_string`.

## Notes and limitations

- **Use the `id` from API responses** as `identifier` in download, metadata, and `list_files` calls.
- **Batch downloads**: `max_workers` defaults to **4** and cannot exceed **4**.
- **Unified top-level APIs**: `list_directory`, `get_metadata`, `list_files`, and `download` all work with both datasets and models, selected by the `type` argument (or, for `download`, the `"type"` key of each request dict).
- **Model-specific shortcuts**: `aikosh.models.list_files`, `aikosh.models.ping`, etc., when you prefer not to pass `type`.

## Troubleshooting

- **401 / Invalid API key**: re-check `AIKOSH_API_KEY` or `aikosh.set_api_key(...)`.
- **422 invalid id**: pass the **`id`** from `list_directory(...)` or metadata responses, not a slug or display name.
- **No results from a list search**: if `list_directory` is called with filters (e.g. `keyword`) and nothing matches, the response includes a **`message`** field (e.g. `"No dataset found with the passed filters; try a different combination."`). Pagination-only calls (`page` / `size` alone) do not add this message.
- **Wrong download location**: use `destination_path` for local saves; `directory_path` is only for remote paths inside the asset.

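Many `422 invalid id` errors can be caught before any request is made. The SDK ships `require_uuid_string(name, value)` for this purpose, but its exact behaviour is not documented here. The stdlib sketch below assumes platform ids are canonical UUID strings, as that helper's name suggests. `looks_like_platform_id` is hypothetical.

```python
import uuid


def looks_like_platform_id(value: object) -> bool:
    """True if value is a canonical hyphenated UUID string (assumed id format)."""
    try:
        parsed = uuid.UUID(value)  # type: ignore[arg-type]
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes, and unhyphenated hex;
    # requiring the canonical text rejects those looser forms.
    return str(parsed) == str(value).lower()
```

Run it on every id before calling `download` or `get_metadata`. A `False` result usually means a slug or display name was passed instead of the platform `id`.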
## License

Apache-2.0