atdata 0.2.3b1__tar.gz → 0.3.1b1__tar.gz

Files changed (399)
  1. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/issues.db +0 -0
  2. atdata-0.3.1b1/.claude/commands/adr.md +42 -0
  3. atdata-0.3.1b1/.claude/commands/changelog.md +61 -0
  4. atdata-0.3.1b1/.claude/commands/feature.md +43 -0
  5. atdata-0.3.1b1/.claude/commands/release.md +63 -0
  6. {atdata-0.2.3b1 → atdata-0.3.1b1}/.github/workflows/uv-test.yml +58 -3
  7. {atdata-0.2.3b1 → atdata-0.3.1b1}/.gitignore +3 -0
  8. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/04_schema_evolution.md +1 -1
  9. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/05_lexicon_namespace.md +1 -1
  10. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/assessment.md +1 -1
  11. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/record_lexicon_assessment.md +22 -22
  12. atdata-0.2.3b1/.planning/setup/decisions/sampleSchema_design_questions.md → atdata-0.3.1b1/.planning/phases/01-atproto-foundation/decisions/schema_design_questions.md +3 -3
  13. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/examples/code/ndarray_roundtrip.py +1 -1
  14. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/examples/code/validate_ndarray_shim.py +1 -1
  15. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/examples/dataset_blob_storage.json +1 -1
  16. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/examples/dataset_external_storage.json +1 -1
  17. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/examples/lens_example.json +2 -2
  18. atdata-0.2.3b1/.planning/setup/examples/sampleSchema_example.json → atdata-0.3.1b1/.planning/phases/01-atproto-foundation/examples/schema_example.json +2 -2
  19. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/README.md +7 -7
  20. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/README_ARRAY_FORMATS.md +5 -5
  21. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/README_SCHEMA_TYPES.md +10 -10
  22. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/ac.foundation.dataset.getLatestSchema.json +1 -1
  23. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/ac.foundation.dataset.lens.json +2 -2
  24. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/ac.foundation.dataset.record.json +1 -1
  25. atdata-0.2.3b1/.planning/setup/lexicons/ac.foundation.dataset.sampleSchema.json → atdata-0.3.1b1/.planning/phases/01-atproto-foundation/lexicons/ac.foundation.dataset.schema.json +3 -3
  26. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/ac.foundation.dataset.schemaType.json +1 -1
  27. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/ndarray_shim_spec.md +1 -1
  28. {atdata-0.2.3b1 → atdata-0.3.1b1}/.vscode/settings.json +6 -0
  29. atdata-0.3.1b1/CHANGELOG.md +153 -0
  30. {atdata-0.2.3b1 → atdata-0.3.1b1}/CLAUDE.md +122 -35
  31. {atdata-0.2.3b1 → atdata-0.3.1b1}/PKG-INFO +5 -2
  32. {atdata-0.2.3b1 → atdata-0.3.1b1}/README.md +1 -1
  33. atdata-0.3.1b1/benchmarks/bench_atmosphere.py +220 -0
  34. atdata-0.3.1b1/benchmarks/bench_dataset_io.py +293 -0
  35. atdata-0.3.1b1/benchmarks/bench_index_providers.py +215 -0
  36. atdata-0.3.1b1/benchmarks/bench_query.py +278 -0
  37. atdata-0.3.1b1/benchmarks/conftest.py +345 -0
  38. atdata-0.3.1b1/benchmarks/render_report.py +462 -0
  39. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/AbstractDataStore.html +72 -288
  40. atdata-0.3.1b1/docs/api/AbstractIndex.html +1043 -0
  41. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/AtUri.html +61 -194
  42. atdata-0.3.1b1/docs/api/AtmosphereClient.html +684 -0
  43. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/AtmosphereIndex.html +68 -203
  44. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/AtmosphereIndexEntry.html +59 -192
  45. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/BlobSource.html +59 -192
  46. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/DataSource.html +69 -271
  47. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/Dataset.html +645 -338
  48. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/DatasetDict.html +60 -193
  49. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/DatasetLoader.html +61 -194
  50. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/DatasetPublisher.html +68 -202
  51. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/DictSample.html +66 -345
  52. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/IndexEntry.html +60 -201
  53. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/Lens.html +59 -192
  54. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/LensLoader.html +61 -194
  55. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/LensPublisher.html +71 -205
  56. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/PDSBlobStore.html +61 -199
  57. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/Packable-protocol.html +59 -250
  58. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/PackableSample.html +65 -278
  59. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/S3Source.html +59 -192
  60. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/SampleBatch.html +71 -212
  61. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/SchemaLoader.html +65 -199
  62. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/SchemaPublisher.html +65 -199
  63. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/URLSource.html +59 -192
  64. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/index.html +65 -198
  65. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/load_dataset.html +60 -193
  66. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/local.Index.html +533 -250
  67. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/local.LocalDatasetEntry.html +60 -193
  68. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/local.S3DataStore.html +101 -193
  69. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/packable.html +62 -252
  70. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/api/promote_to_atmosphere.html +64 -196
  71. atdata-0.3.1b1/docs/benchmarks/index.html +1331 -0
  72. atdata-0.3.1b1/docs/examples/index-workflow.html +1186 -0
  73. atdata-0.3.1b1/docs/examples/index.html +928 -0
  74. atdata-0.3.1b1/docs/examples/lens-transforms.html +1154 -0
  75. atdata-0.3.1b1/docs/examples/manifest-queries.html +1148 -0
  76. atdata-0.3.1b1/docs/examples/multi-split.html +1147 -0
  77. atdata-0.3.1b1/docs/examples/typed-pipeline.html +1132 -0
  78. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/index.html +194 -425
  79. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/architecture.html +126 -212
  80. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/atmosphere.html +134 -220
  81. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/datasets.html +125 -211
  82. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/deployment.html +112 -198
  83. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/lenses.html +123 -209
  84. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/load-dataset.html +124 -210
  85. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/local-storage.html +123 -209
  86. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/packable-samples.html +125 -211
  87. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/promotion.html +120 -206
  88. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/protocols.html +124 -210
  89. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/troubleshooting.html +113 -199
  90. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/reference/uri-spec.html +114 -200
  91. atdata-0.3.1b1/docs/robots.txt +1 -0
  92. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/search.json +659 -387
  93. atdata-0.3.1b1/docs/site_libs/bootstrap/bootstrap-62ce3d63edf8507b4d15f75c6b92352a.min.css +12 -0
  94. atdata-0.2.3b1/docs/site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css → atdata-0.3.1b1/docs/site_libs/quarto-html/quarto-syntax-highlighting-b854dd4081d6110d4acfde180236d7b2.css +2 -2
  95. atdata-0.3.1b1/docs/sitemap.xml +223 -0
  96. atdata-0.3.1b1/docs/styles.css +50 -0
  97. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/tutorials/atmosphere.html +203 -299
  98. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/tutorials/local-workflow.html +283 -374
  99. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/tutorials/promotion.html +245 -389
  100. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/tutorials/quickstart.html +152 -243
  101. atdata-0.3.1b1/docs_src/.nojekyll +0 -0
  102. atdata-0.3.1b1/docs_src/_brand.yml +73 -0
  103. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/_quarto.yml +79 -13
  104. atdata-0.3.1b1/docs_src/api/AbstractDataStore.qmd +54 -0
  105. atdata-0.3.1b1/docs_src/api/AbstractIndex.qmd +153 -0
  106. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/AtUri.qmd +2 -2
  107. atdata-0.3.1b1/docs_src/api/AtmosphereClient.qmd +4 -0
  108. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/AtmosphereIndex.qmd +11 -9
  109. atdata-0.3.1b1/docs_src/api/DataSource.qmd +54 -0
  110. atdata-0.3.1b1/docs_src/api/Dataset.qmd +510 -0
  111. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/DatasetDict.qmd +1 -1
  112. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/DatasetLoader.qmd +2 -2
  113. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/DatasetPublisher.qmd +2 -3
  114. atdata-0.3.1b1/docs_src/api/DictSample.qmd +96 -0
  115. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/IndexEntry.qmd +1 -3
  116. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/LensLoader.qmd +2 -2
  117. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/LensPublisher.qmd +4 -5
  118. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/PDSBlobStore.qmd +3 -3
  119. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/Packable-protocol.qmd +1 -31
  120. atdata-0.3.1b1/docs_src/api/PackableSample.qmd +59 -0
  121. atdata-0.3.1b1/docs_src/api/SampleBatch.qmd +31 -0
  122. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/SchemaLoader.qmd +3 -4
  123. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/SchemaPublisher.qmd +3 -4
  124. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/index.qmd +6 -6
  125. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/load_dataset.qmd +1 -1
  126. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/local.Index.qmd +284 -77
  127. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/local.LocalDatasetEntry.qmd +3 -3
  128. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/local.S3DataStore.qmd +23 -7
  129. atdata-0.3.1b1/docs_src/api/packable.qmd +23 -0
  130. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/promote_to_atmosphere.qmd +8 -3
  131. atdata-0.3.1b1/docs_src/examples/index-workflow.qmd +191 -0
  132. atdata-0.3.1b1/docs_src/examples/index.qmd +22 -0
  133. atdata-0.3.1b1/docs_src/examples/lens-transforms.qmd +198 -0
  134. atdata-0.3.1b1/docs_src/examples/manifest-queries.qmd +174 -0
  135. atdata-0.3.1b1/docs_src/examples/multi-split.qmd +174 -0
  136. atdata-0.3.1b1/docs_src/examples/typed-pipeline.qmd +168 -0
  137. atdata-0.3.1b1/docs_src/index.qmd +138 -0
  138. atdata-0.3.1b1/docs_src/objects.json +1 -0
  139. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/troubleshooting.qmd +1 -1
  140. atdata-0.3.1b1/docs_src/styles.css +50 -0
  141. atdata-0.3.1b1/docs_src/theme-dark.scss +1 -0
  142. atdata-0.3.1b1/docs_src/theme-light.scss +15 -0
  143. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/tutorials/atmosphere.qmd +30 -41
  144. atdata-0.3.1b1/docs_src/tutorials/local-workflow.qmd +270 -0
  145. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/tutorials/promotion.qmd +65 -126
  146. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/tutorials/quickstart.qmd +8 -16
  147. atdata-0.3.1b1/justfile +51 -0
  148. atdata-0.3.1b1/lexicons/ac.foundation.dataset.arrayFormat.json +1 -0
  149. atdata-0.3.1b1/lexicons/ac.foundation.dataset.getLatestSchema.json +1 -0
  150. atdata-0.3.1b1/lexicons/ac.foundation.dataset.lens.json +1 -0
  151. atdata-0.3.1b1/lexicons/ac.foundation.dataset.record.json +1 -0
  152. atdata-0.3.1b1/lexicons/ac.foundation.dataset.schema.json +1 -0
  153. atdata-0.3.1b1/lexicons/ac.foundation.dataset.schemaType.json +1 -0
  154. atdata-0.3.1b1/lexicons/ac.foundation.dataset.storageBlobs.json +1 -0
  155. atdata-0.3.1b1/lexicons/ac.foundation.dataset.storageExternal.json +1 -0
  156. atdata-0.3.1b1/lexicons/ndarray_shim.json +1 -0
  157. atdata-0.3.1b1/prototyping/human-review-atmosphere.ipynb +66 -0
  158. atdata-0.3.1b1/prototyping/human-review-local.ipynb +674 -0
  159. {atdata-0.2.3b1 → atdata-0.3.1b1}/pyproject.toml +15 -1
  160. atdata-0.3.1b1/src/atdata/.gitignore +1 -0
  161. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/__init__.py +39 -0
  162. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/_cid.py +0 -21
  163. atdata-0.3.1b1/src/atdata/_exceptions.py +168 -0
  164. atdata-0.3.1b1/src/atdata/_helpers.py +86 -0
  165. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/_hf_api.py +95 -11
  166. atdata-0.3.1b1/src/atdata/_logging.py +70 -0
  167. atdata-0.3.1b1/src/atdata/_protocols.py +343 -0
  168. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/_schema_codec.py +7 -6
  169. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/_stub_manager.py +5 -25
  170. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/_type_utils.py +28 -2
  171. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/atmosphere/__init__.py +31 -20
  172. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/atmosphere/_types.py +4 -4
  173. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/atmosphere/client.py +64 -12
  174. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/atmosphere/lens.py +11 -12
  175. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/atmosphere/records.py +12 -12
  176. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/atmosphere/schema.py +16 -18
  177. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/atmosphere/store.py +6 -7
  178. atdata-0.3.1b1/src/atdata/cli/__init__.py +208 -0
  179. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/cli/diagnose.py +2 -2
  180. atdata-0.2.3b1/src/atdata/cli/local.py → atdata-0.3.1b1/src/atdata/cli/infra.py +11 -11
  181. atdata-0.3.1b1/src/atdata/cli/inspect.py +69 -0
  182. atdata-0.3.1b1/src/atdata/cli/preview.py +63 -0
  183. atdata-0.3.1b1/src/atdata/cli/schema.py +109 -0
  184. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/dataset.py +583 -328
  185. atdata-0.3.1b1/src/atdata/index/__init__.py +54 -0
  186. atdata-0.3.1b1/src/atdata/index/_entry.py +157 -0
  187. atdata-0.3.1b1/src/atdata/index/_index.py +1198 -0
  188. atdata-0.3.1b1/src/atdata/index/_schema.py +380 -0
  189. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/lens.py +9 -2
  190. atdata-0.3.1b1/src/atdata/lexicons/__init__.py +121 -0
  191. atdata-0.3.1b1/src/atdata/lexicons/ac.foundation.dataset.arrayFormat.json +16 -0
  192. atdata-0.3.1b1/src/atdata/lexicons/ac.foundation.dataset.getLatestSchema.json +78 -0
  193. atdata-0.3.1b1/src/atdata/lexicons/ac.foundation.dataset.lens.json +99 -0
  194. atdata-0.3.1b1/src/atdata/lexicons/ac.foundation.dataset.record.json +96 -0
  195. atdata-0.3.1b1/src/atdata/lexicons/ac.foundation.dataset.schema.json +107 -0
  196. atdata-0.3.1b1/src/atdata/lexicons/ac.foundation.dataset.schemaType.json +16 -0
  197. atdata-0.3.1b1/src/atdata/lexicons/ac.foundation.dataset.storageBlobs.json +24 -0
  198. atdata-0.3.1b1/src/atdata/lexicons/ac.foundation.dataset.storageExternal.json +25 -0
  199. atdata-0.3.1b1/src/atdata/lexicons/ndarray_shim.json +16 -0
  200. atdata-0.3.1b1/src/atdata/local/__init__.py +70 -0
  201. atdata-0.3.1b1/src/atdata/local/_repo_legacy.py +218 -0
  202. atdata-0.3.1b1/src/atdata/manifest/__init__.py +28 -0
  203. atdata-0.3.1b1/src/atdata/manifest/_aggregates.py +156 -0
  204. atdata-0.3.1b1/src/atdata/manifest/_builder.py +163 -0
  205. atdata-0.3.1b1/src/atdata/manifest/_fields.py +154 -0
  206. atdata-0.3.1b1/src/atdata/manifest/_manifest.py +146 -0
  207. atdata-0.3.1b1/src/atdata/manifest/_query.py +150 -0
  208. atdata-0.3.1b1/src/atdata/manifest/_writer.py +74 -0
  209. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/promote.py +18 -14
  210. atdata-0.3.1b1/src/atdata/providers/__init__.py +25 -0
  211. atdata-0.3.1b1/src/atdata/providers/_base.py +140 -0
  212. atdata-0.3.1b1/src/atdata/providers/_factory.py +69 -0
  213. atdata-0.3.1b1/src/atdata/providers/_postgres.py +214 -0
  214. atdata-0.3.1b1/src/atdata/providers/_redis.py +171 -0
  215. atdata-0.3.1b1/src/atdata/providers/_sqlite.py +191 -0
  216. atdata-0.3.1b1/src/atdata/repository.py +323 -0
  217. atdata-0.3.1b1/src/atdata/stores/__init__.py +23 -0
  218. atdata-0.3.1b1/src/atdata/stores/_disk.py +123 -0
  219. atdata-0.3.1b1/src/atdata/stores/_s3.py +349 -0
  220. atdata-0.3.1b1/src/atdata/testing.py +341 -0
  221. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/EXPECTED_WARNINGS.md +2 -2
  222. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_atmosphere.py +42 -46
  223. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_cid.py +0 -44
  224. atdata-0.3.1b1/tests/test_cli.py +790 -0
  225. atdata-0.3.1b1/tests/test_coverage_gaps.py +306 -0
  226. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_dataset.py +44 -10
  227. atdata-0.3.1b1/tests/test_dev_experience.py +423 -0
  228. atdata-0.3.1b1/tests/test_disk_store.py +123 -0
  229. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_helpers.py +49 -0
  230. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_hf_api.py +48 -8
  231. atdata-0.3.1b1/tests/test_index_providers.py +477 -0
  232. atdata-0.3.1b1/tests/test_index_write.py +254 -0
  233. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration.py +1 -1
  234. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_atmosphere.py +25 -26
  235. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_atmosphere_live.py +16 -42
  236. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_cross_backend.py +28 -27
  237. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_dynamic_types.py +2 -2
  238. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_edge_cases.py +3 -3
  239. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_error_handling.py +32 -40
  240. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_lens.py +1 -1
  241. atdata-0.3.1b1/tests/test_integration_manifest.py +263 -0
  242. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_promotion.py +28 -30
  243. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_lens.py +1 -1
  244. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_local.py +52 -200
  245. atdata-0.3.1b1/tests/test_logging.py +60 -0
  246. atdata-0.3.1b1/tests/test_manifest.py +528 -0
  247. atdata-0.3.1b1/tests/test_partial_failure.py +152 -0
  248. atdata-0.3.1b1/tests/test_postgres_provider.py +411 -0
  249. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_promote.py +4 -4
  250. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_protocols.py +10 -7
  251. atdata-0.3.1b1/tests/test_query_coverage.py +215 -0
  252. atdata-0.3.1b1/tests/test_repository.py +379 -0
  253. atdata-0.3.1b1/tests/test_repository_coverage.py +265 -0
  254. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_sources.py +5 -0
  255. atdata-0.3.1b1/tests/test_stub_manager.py +533 -0
  256. atdata-0.3.1b1/tests/test_testing.py +246 -0
  257. atdata-0.3.1b1/tests/test_type_utils.py +182 -0
  258. atdata-0.3.1b1/tests/test_write_samples.py +173 -0
  259. {atdata-0.2.3b1 → atdata-0.3.1b1}/uv.lock +114 -2
  260. atdata-0.2.3b1/CHANGELOG.md +0 -195
  261. atdata-0.2.3b1/docs/api/AbstractIndex.html +0 -1356
  262. atdata-0.2.3b1/docs/api/AtmosphereClient.html +0 -1891
  263. atdata-0.2.3b1/docs/robots.txt +0 -1
  264. atdata-0.2.3b1/docs/site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css +0 -12
  265. atdata-0.2.3b1/docs/site_libs/bootstrap/bootstrap-dark-7964ffd8887b0991fe8d71c6c8bc75d6.min.css +0 -12
  266. atdata-0.2.3b1/docs/site_libs/quarto-html/quarto-syntax-highlighting-dark-8dcd8563ea6803ab7cbb3d71ca5772e1.css +0 -210
  267. atdata-0.2.3b1/docs/sitemap.xml +0 -199
  268. atdata-0.2.3b1/docs_src/api/AbstractDataStore.qmd +0 -94
  269. atdata-0.2.3b1/docs_src/api/AbstractIndex.qmd +0 -236
  270. atdata-0.2.3b1/docs_src/api/AtmosphereClient.qmd +0 -422
  271. atdata-0.2.3b1/docs_src/api/DataSource.qmd +0 -95
  272. atdata-0.2.3b1/docs_src/api/Dataset.qmd +0 -241
  273. atdata-0.2.3b1/docs_src/api/DictSample.qmd +0 -151
  274. atdata-0.2.3b1/docs_src/api/PackableSample.qmd +0 -83
  275. atdata-0.2.3b1/docs_src/api/SampleBatch.qmd +0 -42
  276. atdata-0.2.3b1/docs_src/api/packable.qmd +0 -45
  277. atdata-0.2.3b1/docs_src/index.qmd +0 -247
  278. atdata-0.2.3b1/docs_src/objects.json +0 -1
  279. atdata-0.2.3b1/docs_src/tutorials/local-workflow.qmd +0 -269
  280. atdata-0.2.3b1/justfile +0 -2
  281. atdata-0.2.3b1/prototyping/human-review-atmosphere.ipynb +0 -25
  282. atdata-0.2.3b1/prototyping/human-review-local.ipynb +0 -634
  283. atdata-0.2.3b1/src/atdata/_helpers.py +0 -60
  284. atdata-0.2.3b1/src/atdata/_protocols.py +0 -504
  285. atdata-0.2.3b1/src/atdata/cli/__init__.py +0 -222
  286. atdata-0.2.3b1/src/atdata/local.py +0 -1720
  287. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/c.md +0 -0
  288. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/cpp.md +0 -0
  289. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/csharp.md +0 -0
  290. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/global.md +0 -0
  291. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/go.md +0 -0
  292. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/java.md +0 -0
  293. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/javascript-react.md +0 -0
  294. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/javascript.md +0 -0
  295. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/kotlin.md +0 -0
  296. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/odin.md +0 -0
  297. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/php.md +0 -0
  298. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/project.md +0 -0
  299. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/python.md +0 -0
  300. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/ruby.md +0 -0
  301. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/rust.md +0 -0
  302. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/scala.md +0 -0
  303. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/swift.md +0 -0
  304. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/typescript-react.md +0 -0
  305. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/typescript.md +0 -0
  306. {atdata-0.2.3b1 → atdata-0.3.1b1}/.chainlink/rules/zig.md +0 -0
  307. {atdata-0.2.3b1 → atdata-0.3.1b1}/.claude/hooks/post-edit-check.py +0 -0
  308. {atdata-0.2.3b1 → atdata-0.3.1b1}/.claude/hooks/prompt-guard.py +0 -0
  309. {atdata-0.2.3b1 → atdata-0.3.1b1}/.claude/hooks/session-start.py +0 -0
  310. {atdata-0.2.3b1 → atdata-0.3.1b1}/.claude/settings.json +0 -0
  311. {atdata-0.2.3b1 → atdata-0.3.1b1}/.github/workflows/uv-publish-pypi.yml +0 -0
  312. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/01_overview.md +0 -0
  313. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/02_lexicon_design.md +0 -0
  314. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/03_python_client.md +0 -0
  315. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/04_appview.md +0 -0
  316. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/05_codegen.md +0 -0
  317. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/README.md +0 -0
  318. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/atproto_integration.md +0 -0
  319. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/01_schema_representation_format.md +0 -0
  320. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/02_lens_code_storage.md +0 -0
  321. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/03_webdataset_storage.md +0 -0
  322. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/06_lexicon_validation.md +0 -0
  323. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/decisions/README.md +0 -0
  324. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/ac.foundation.dataset.arrayFormat.json +0 -0
  325. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/ac.foundation.dataset.storageBlobs.json +0 -0
  326. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/ac.foundation.dataset.storageExternal.json +0 -0
  327. {atdata-0.2.3b1/.planning/setup → atdata-0.3.1b1/.planning/phases/01-atproto-foundation}/lexicons/ndarray_shim.json +0 -0
  328. {atdata-0.2.3b1/.planning/roadmap/v0.2 → atdata-0.3.1b1/.planning/phases/02-v0.2-review}/03_human-review-assessment.md +0 -0
  329. {atdata-0.2.3b1/.planning/roadmap/v0.3 → atdata-0.3.1b1/.planning/phases/03-v0.3-roadmap}/01_codebase-review.md +0 -0
  330. {atdata-0.2.3b1/.planning/roadmap/v0.3 → atdata-0.3.1b1/.planning/phases/03-v0.3-roadmap}/02_synthesis-roadmap.md +0 -0
  331. {atdata-0.2.3b1/.planning/roadmap/v0.3 → atdata-0.3.1b1/.planning/phases/03-v0.3-roadmap}/architecture-doc.md +0 -0
  332. {atdata-0.2.3b1 → atdata-0.3.1b1}/.python-version +0 -0
  333. {atdata-0.2.3b1 → atdata-0.3.1b1}/.reference/atproto_lexicon_guide.md +0 -0
  334. {atdata-0.2.3b1 → atdata-0.3.1b1}/.reference/atproto_lexicon_spec.md +0 -0
  335. {atdata-0.2.3b1 → atdata-0.3.1b1}/.reference/huggingface-datasets/architecture.md +0 -0
  336. {atdata-0.2.3b1 → atdata-0.3.1b1}/.reference/huggingface-datasets/loading-guide.md +0 -0
  337. {atdata-0.2.3b1 → atdata-0.3.1b1}/.reference/huggingface-datasets/loading-methods.md +0 -0
  338. {atdata-0.2.3b1 → atdata-0.3.1b1}/.reference/huggingface-datasets/main-classes.md +0 -0
  339. {atdata-0.2.3b1 → atdata-0.3.1b1}/.reference/python_atproto_sdk.md +0 -0
  340. {atdata-0.2.3b1 → atdata-0.3.1b1}/.review/comprehensive-review.md +0 -0
  341. {atdata-0.2.3b1 → atdata-0.3.1b1}/.review/human-review.md +0 -0
  342. {atdata-0.2.3b1 → atdata-0.3.1b1}/LICENSE +0 -0
  343. /atdata-0.2.3b1/docs/.nojekyll → /atdata-0.3.1b1/benchmarks/__init__.py +0 -0
  344. {atdata-0.2.3b1/docs_src → atdata-0.3.1b1/docs}/.nojekyll +0 -0
  345. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/assets/styles.css +0 -0
  346. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/bootstrap/bootstrap-icons.css +0 -0
  347. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/bootstrap/bootstrap-icons.woff +0 -0
  348. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/bootstrap/bootstrap.min.js +0 -0
  349. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/clipboard/clipboard.min.js +0 -0
  350. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-html/anchor.min.js +0 -0
  351. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-html/popper.min.js +0 -0
  352. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-html/quarto.js +0 -0
  353. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-html/tabsets/tabsets.js +0 -0
  354. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-html/tippy.css +0 -0
  355. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-html/tippy.umd.min.js +0 -0
  356. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-nav/headroom.min.js +0 -0
  357. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-nav/quarto-nav.js +0 -0
  358. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-search/autocomplete.umd.js +0 -0
  359. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-search/fuse.min.js +0 -0
  360. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs/site_libs/quarto-search/quarto-search.js +0 -0
  361. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/api-index-handwritten.qmd +0 -0
  362. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/atmosphere.md +0 -0
  363. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/datasets.md +0 -0
  364. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/index.md +0 -0
  365. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/lenses.md +0 -0
  366. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/load-dataset.md +0 -0
  367. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/local-storage.md +0 -0
  368. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/packable-samples.md +0 -0
  369. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/promotion.md +0 -0
  370. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.backup/protocols.md +0 -0
  371. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/.gitignore +0 -0
  372. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/AtmosphereIndexEntry.qmd +0 -0
  373. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/BlobSource.qmd +0 -0
  374. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/Lens.qmd +0 -0
  375. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/S3Source.qmd +0 -0
  376. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/api/URLSource.qmd +0 -0
  377. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/assets/styles.css +0 -0
  378. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/architecture.qmd +0 -0
  379. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/atmosphere.qmd +0 -0
  380. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/datasets.qmd +0 -0
  381. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/deployment.qmd +0 -0
  382. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/lenses.qmd +0 -0
  383. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/load-dataset.qmd +0 -0
  384. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/local-storage.qmd +0 -0
  385. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/packable-samples.qmd +0 -0
  386. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/promotion.qmd +0 -0
  387. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/protocols.qmd +0 -0
  388. {atdata-0.2.3b1 → atdata-0.3.1b1}/docs_src/reference/uri-spec.qmd +0 -0
  389. {atdata-0.2.3b1 → atdata-0.3.1b1}/examples/atmosphere_demo.py +0 -0
  390. {atdata-0.2.3b1 → atdata-0.3.1b1}/examples/local_workflow.py +0 -0
  391. {atdata-0.2.3b1 → atdata-0.3.1b1}/examples/promote_workflow.py +0 -0
  392. {atdata-0.2.3b1 → atdata-0.3.1b1}/issues.db +0 -0
  393. {atdata-0.2.3b1 → atdata-0.3.1b1}/prototyping/.credentials/.gitignore +0 -0
  394. {atdata-0.2.3b1 → atdata-0.3.1b1}/prototyping/data/.gitignore +0 -0
  395. {atdata-0.2.3b1 → atdata-0.3.1b1}/src/atdata/_sources.py +0 -0
  396. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/conftest.py +0 -0
  397. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/fixtures/test_samples.tar +0 -0
  398. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_e2e.py +0 -0
  399. {atdata-0.2.3b1 → atdata-0.3.1b1}/tests/test_integration_local.py +0 -0
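The listing above uses three entry shapes: a plain path (the file exists in only one version), `{old → new}/rest` (a shared suffix under a renamed prefix), and `old → new` (a full rename), each followed by `+added -removed` line counts. The following is a minimal sketch of how such an entry could be expanded into explicit old and new paths. The function name `parse_entry` and both regular expressions are hypothetical, written for illustration only, and are not part of atdata or of the diff viewer.

```python
import re

# Hypothetical patterns for the listing format shown above (assumptions).
ENTRY = re.compile(
    r"^\s*(?:\d+\.\s+)?(?P<path>.+?) \+(?P<added>\d+) -(?P<removed>\d+)\s*$"
)
BRACES = re.compile(r"\{(?P<old>[^{}]*) → (?P<new>[^{}]*)\}")


def parse_entry(line):
    """Return (old_path, new_path, added, removed) for one listing entry."""
    m = ENTRY.match(line)
    if m is None:
        raise ValueError(f"not a file-list entry: {line!r}")
    path = m["path"]
    braces = BRACES.search(path)
    if braces:
        # "{a → b}/rest": substitute each side of the braces into the path.
        prefix, suffix = path[: braces.start()], path[braces.end():]
        old = prefix + braces["old"] + suffix
        new = prefix + braces["new"] + suffix
    elif " → " in path:
        # "old → new": a full rename with no shared part factored out.
        old, new = path.split(" → ", 1)
    else:
        # Plain path: present in one version only; report it for both sides.
        old = new = path
    return old, new, int(m["added"]), int(m["removed"])


if __name__ == "__main__":
    print(parse_entry("  7. {atdata-0.2.3b1 → atdata-0.3.1b1}/.gitignore +3 -0"))
```

For a plain-path entry the sketch returns the same path twice; whether the file was added or removed has to be read from the version prefix and the `+`/`-` counts.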
@@ -0,0 +1,42 @@
+ ---
+ allowed-tools: Bash(git status:*), Bash(git log:*), Bash(chainlink tree:*), Bash(chainlink comment:*), Bash(chainlink subissue:*), Bash(chainlink create:*), Bash(chainlink session:*), Bash(chainlink --help), Bash(chainlink close:*), Bash(uv run pytest:*), Bash(uv run ruff:*)
+ description: Perform an adversarial review
+ ---
+
+ ## Context
+
+ - Current issue tree: !`chainlink tree`
+ - Current test outputs: !`uv run pytest -v`
+ - Recent commits: !`git log --oneline -10`
+ - Chainlink help: !`chainlink --help`
+
+ ## Your task
+
+ 1. Develop summary assessment of test suite
+ - Look through all of the unit tests currently in the project, and create a plan of how well these tests are implemented to test the functionality at the core of the project, how well these tests actually fully cover desired behavior and edge cases, whether the tests are formally correct, and whether there is any redundancy in the tests or documentation for them
+ - Develop a plan for how to address these concerns point by point
+ 2. Develop summary assessment of codebase
+ - Look through all of the source files currently in the project's main modules, and create a plan of how well-implemented, efficient, and generalizable the current implementation is, as well as whether there is adequate, too sparse, or too verbose documentation
+ - Develop a plan for improvements, tweaks, or refactors that could be applied to the current codebase and its documentation
+ 3. Create issue and subissues
+ - Create a base issue in chainlink for this adversarial review
+ - Create subissues for each of the plan items addressed in steps 1 and 2.
+ 4. Address all subissues for this adversarial review
+ - Ordered by priority, address and close each of the subissues identified
+ - Provide thorough documentation of each step you take in the chainlink comments
+
+ ## Constraints
+
+ - **Adversarial**: You are engaging in this task from the perspective of a reviewer that is hyper-critical.
+ - **Optimize code contraction**: You are operating as one half of a cyclical dyad, in which the other half is responsible for generating a lot of code, but has a propensity to write too much, and write implementations that are verbose, inefficient, or inaccurate at times. Your job is to be the critical eye, and to identify and implement revisions that make the code concise, efficient, and formally correct.
+ - **Consider test correctness**: The tests you are presented with are not necessarily complete for covering the desired functionality. Think through ways in which you could make the test suite more accurate to the task at hand, and also of ways in which you could test the codebase's functionality that are not currently addressed. Be creative and leverage web search in this endeavor to see current best practices for the problem that could aid developing tests.
33
+ - **Preserve documentation for API generation**: This project uses quartodoc to auto-generate API documentation from docstrings. Docstrings are a feature, not bloat. When reviewing documentation verbosity, apply these rules:
34
+ - **KEEP**: Module-level docstrings, class-level docstrings, `Args:`, `Returns:`, `Raises:`, `Examples:` sections on all public APIs
35
+ - **KEEP**: Docstrings that explain *why* something works a certain way, non-obvious behavior, or protocol/interface contracts
36
+ - **KEEP**: `Examples:` sections — these render as live code samples in the docs site
37
+ - **TRIM**: Docstrings that *only* restate the function signature with no added value (e.g. "`name: The name`" when the type hint already says `name: str`)
38
+ - **TRIM**: Multi-paragraph explanations on private/internal helpers where a one-liner suffices
39
+ - **NEVER REMOVE**: Docstrings from public API methods, protocol definitions, or decorated classes
40
+ - When in doubt, leave the docstring. A slightly verbose docstring that helps a user is better than a missing one that forces them to read source.
41
+ - **Batch mechanical fixes**: Group similar changes (e.g. all weak assertion fixes) into a single commit rather than one subissue per file. Reserve individual subissues for changes that require design thought.
42
+ - **Close low-value issues**: If a finding would add complexity, risk regressions, or save fewer than 10 lines, close it as "not worth the churn" with a comment explaining why.
@@ -0,0 +1,61 @@
1
+ ---
2
+ allowed-tools: Bash(git log:*), Bash(git tag:*), Bash(git diff:*), Bash(chainlink *)
3
+ description: Generate a clean CHANGELOG entry from recent work
4
+ ---
5
+
6
+ ## Context
7
+
8
+ - Current version: !`grep '^version' pyproject.toml`
9
+ - Recent tags: !`git tag --sort=-creatordate | head -5`
10
+ - CHANGELOG head: !`head -20 CHANGELOG.md`
11
+ - Recent chainlink issues: !`chainlink list`
12
+
13
+ ## Your task
14
+
15
+ Generate a properly structured CHANGELOG entry for the current release, following [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.
16
+
17
+ ### 1. Gather changes
18
+
19
+ Identify all changes since the last release by examining:
20
+ - `git log --oneline <last-release-tag-or-branch>..HEAD` for commit messages
21
+ - `chainlink list` for closed issues and their descriptions
22
+ - `git diff --stat <last-release-tag-or-branch>..HEAD` for files changed
23
+
24
+ ### 2. Categorize changes
25
+
26
+ Sort changes into Keep a Changelog sections:
27
+
28
+ - **Added**: New features, new files, new public APIs, new test suites
29
+ - **Changed**: Modifications to existing behavior, refactors, dependency updates, CI changes
30
+ - **Fixed**: Bug fixes, lint fixes, CI fixes
31
+ - **Deprecated**: Newly deprecated APIs (with migration path)
32
+ - **Removed**: Removed features, deleted files, removed APIs
33
+
34
+ ### 3. Write the entry
35
+
36
+ Follow these formatting rules:
37
+ - Each item should be a concise, user-facing description — not a chainlink issue title
38
+ - Group related changes under bold sub-headers (e.g. **`LocalDiskStore`**: description)
39
+ - Use nested bullets for sub-items that belong to a feature group
40
+ - Omit internal-only changes (individual subissue closes, review assessments, investigation tickets)
41
+ - Include GitHub issue references where relevant (e.g. `(GH#42)`)
42
+ - Do NOT include chainlink issue numbers — these are internal tracking
43
+
44
+ ### 4. Update CHANGELOG.md
45
+
46
+ - Insert the new version section between `## [Unreleased]` and the previous release
47
+ - Leave `## [Unreleased]` empty at the top
48
+ - Do not modify any existing release sections below
49
+
50
+ ### 5. Verify
51
+
52
+ - Confirm the CHANGELOG renders as valid markdown
53
+ - Confirm no chainlink auto-appended entries leaked into existing release sections
54
+
55
+ ## Constraints
56
+
57
+ - Follow Keep a Changelog format strictly
58
+ - Write for the library's users, not for internal tracking
59
+ - Consolidate — 5 well-written bullets are better than 30 issue titles
60
+ - Preserve existing release sections exactly as they are
61
+ - If chainlink has appended noise to existing sections, clean it up
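Note on step 4 of `changelog.md`: the placement rule (new section between `## [Unreleased]` and the previous release, with older sections left untouched) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `insert_release_section`, the sample changelog text, and the dates in it are made up, and the command file itself describes the rule only in prose.

```python
def insert_release_section(changelog: str, version: str, date: str, body: str) -> str:
    """Hypothetical helper: add a release section right below '## [Unreleased]'."""
    marker = "## [Unreleased]"
    lines = changelog.splitlines()
    if marker not in lines:
        raise ValueError(f"missing {marker!r} heading")
    idx = lines.index(marker)
    head, tail = lines[: idx + 1], lines[idx + 1 :]
    # Drop blank lines directly after the marker so spacing stays uniform.
    while tail and not tail[0].strip():
        tail.pop(0)
    # Keep a Changelog heading style: "## [version] - date".
    section = ["", f"## [{version}] - {date}", "", *body.strip().splitlines(), ""]
    # Everything in `tail` (the older releases) is re-emitted unchanged.
    return "\n".join(head + section + tail) + "\n"


# Sample input; the dates and entries are placeholders, not real release data.
old = (
    "# Changelog\n\n## [Unreleased]\n\n"
    "## [0.2.3b1] - 2025-01-01\n\n### Added\n- Old thing\n"
)
new = insert_release_section(old, "0.3.1b1", "2025-02-01", "### Added\n- New thing")
print(new)
```

One consequence of this design: any entries that were sitting under `## [Unreleased]` end up at the bottom of the new section, which leaves `## [Unreleased]` empty as the command requires.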
@@ -0,0 +1,43 @@
1
+ ---
2
+ allowed-tools: Bash(git *), Bash(chainlink *)
3
+ description: Create a feature branch from a human-readable description
4
+ ---
5
+
6
+ ## Context
7
+
8
+ - Current branch: !`git branch --show-current`
9
+ - Recent release branches: !`git branch --list 'release/*' | tail -5`
10
+ - Existing feature branches: !`git branch --list 'feature/*' | tail -10`
11
+ - Working tree status: !`git status --short`
12
+ - Remotes: !`git remote -v`
13
+
14
+ ## Your task
15
+
16
+ The user will provide a human-readable description of the feature (e.g. "add batch retry logic"). Create a feature branch following the project's naming convention.
17
+
18
+ ### 1. Derive the branch name
19
+
20
+ - Slugify the description: lowercase, strip non-alphanumeric characters (except hyphens), replace spaces with hyphens, collapse consecutive hyphens.
21
+ - The branch name is `feature/<slug>` (e.g. `feature/add-batch-retry-logic`).
22
+ - If the slug is empty or the branch already exists, ask the user for a different name.
23
+
24
+ ### 2. Validate preconditions
25
+
26
+ - Confirm there are no uncommitted changes (other than `.chainlink/issues.db`). If there are, warn the user and ask whether to stash or abort.
27
+ - Identify the base branch. Default to the current branch. If the user provides a `--from <ref>` argument, use that instead.
28
+
29
+ ### 3. Create the branch
30
+
31
+ - `git checkout -b feature/<slug>` (from the resolved base)
32
+ - Print the created branch name so the user can confirm.
33
+
34
+ ### 4. Track in chainlink
35
+
36
+ - Create a chainlink issue for the feature work with the user's original description as the title.
37
+ - Set priority to `medium` (unless the user specifies otherwise).
38
+
39
+ ## Constraints
40
+
41
+ - Never force-push or delete branches.
42
+ - Do not push the branch to a remote — the user will do that when ready.
43
+ - Keep the slug concise. If the description is very long, truncate to the first 6-8 meaningful words.
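Note on step 1 of `feature.md`: the slug rule is described only in prose, so here is a minimal Python sketch of one way to read it. The names `slugify_description` and `feature_branch` and the `max_words=8` default are assumptions for illustration. Spaces are converted to hyphens before other characters are stripped, since stripping first would also remove the spaces.

```python
import re


def slugify_description(description: str, max_words: int = 8) -> str:
    """Hypothetical helper: turn a free-text description into a branch slug."""
    text = description.lower().strip()
    # Replace runs of whitespace with a single hyphen.
    text = re.sub(r"\s+", "-", text)
    # Strip everything except ASCII letters, digits, and hyphens.
    text = re.sub(r"[^a-z0-9-]", "", text)
    # Collapse consecutive hyphens and trim stray ones at the ends.
    text = re.sub(r"-{2,}", "-", text).strip("-")
    # Keep the slug concise (the file suggests 6-8 words; 8 is an assumed cap).
    return "-".join(text.split("-")[:max_words])


def feature_branch(description: str) -> str:
    """Build the 'feature/<slug>' name, rejecting empty slugs."""
    slug = slugify_description(description)
    if not slug:
        raise ValueError("empty slug; ask the user for a different name")
    return f"feature/{slug}"


print(feature_branch("Add batch retry logic"))  # feature/add-batch-retry-logic
```

Checking whether the branch already exists is left out here; that needs a call to `git branch --list`, which the command file already runs in its context section.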
@@ -0,0 +1,63 @@
1
+ ---
2
+ allowed-tools: Bash(git *), Bash(gh *), Bash(uv lock*), Bash(uv run ruff*), Bash(uv run pytest*), Bash(chainlink *), Bash(uv run ruff format*)
3
+ description: Prepare and submit a beta release
4
+ ---
5
+
6
+ ## Context
7
+
8
+ - Current branch: !`git branch --show-current`
9
+ - Recent commits: !`git log --oneline -15`
10
+ - All branches: !`git branch --list 'release/*' | tail -5`
11
+ - Current version: !`grep '^version' pyproject.toml`
12
+ - Remotes: !`git remote -v`
13
+
14
+ ## Your task
15
+
16
+ The user will provide a version string (e.g. `v0.3.0b2`). Perform the full release flow:
17
+
18
+ ### 1. Validate preconditions
19
+ - Confirm all tests pass: `uv run pytest tests/ -x -q`
20
+ - Confirm lint is clean: `uv run ruff check src/ tests/`
21
+ - Confirm formatting is clean: `uv run ruff format --check src/ tests/` (fix with `uv run ruff format src/ tests/` if needed)
22
+ - Confirm no uncommitted changes (other than `.chainlink/issues.db`)
23
+ - Identify the previous release branch to branch from (e.g. `release/v0.3.0b1`)
24
+ - Identify the feature branch to merge (current branch or ask user)
25
+
26
+ ### 2. Create release branch
27
+ - Stash any uncommitted changes
28
+ - `git checkout <previous-release-branch>`
29
+ - `git checkout -b release/<version>`
30
+ - `git merge <feature-branch> --no-ff --no-edit`
31
+ - `git stash pop` (if anything was stashed)
32
+
33
+ ### 3. Prepare release
34
+ - Bump version in `pyproject.toml`
35
+ - Run `uv lock` to update the lockfile
36
+ - Run `/changelog` skill to generate a clean CHANGELOG entry (or generate one manually following Keep a Changelog format with Added/Changed/Fixed sections)
37
+ - Run `uv run ruff check src/ tests/` and fix any lint errors
38
+ - Run `uv run ruff format --check src/ tests/` and fix any format errors (run `uv run ruff format src/ tests/` to auto-fix)
39
+ - Run `uv run pytest tests/ -x -q` to confirm tests pass
40
+
41
+ ### 4. Commit and push
42
+ - `git add pyproject.toml uv.lock CHANGELOG.md .chainlink/issues.db`
43
+ - `git commit -m "release: prepare <version>"`
44
+ - `git push -u origin release/<version>`
45
+
46
+ ### 5. Create PR
47
+ - Create PR to `upstream/main` using `gh pr create`:
48
+ - `--repo forecast-bio/atdata`
49
+ - `--base main`
50
+ - `--head release/<version>`
51
+ - Title: `release: <version>`
52
+ - Body: summary of changes from CHANGELOG, test plan with pass counts
53
+
54
+ ### 6. Track in chainlink
55
+ - Create a chainlink issue for the release, close when PR is submitted
56
+
57
+ ## Constraints
58
+
59
+ - Always use `--no-ff` for merges to preserve branch topology
60
+ - Always run `uv lock` after version bumps — stale lockfiles break CI
61
+ - Always run both `ruff check` and `ruff format --check` before committing — either will fail CI
62
+ - Never force-push to release branches
63
+ - The CHANGELOG should follow Keep a Changelog format with proper Added/Changed/Fixed sections, not a flat list of chainlink issues
@@ -4,13 +4,11 @@ on:
4
4
    push:
5
5
      branches:
6
6
        - main
7
-       - release/*
8
7
    pull_request:
9
-     branches:
10
-       - main
11
8
 
12
9
  permissions:
13
10
    contents: read
11
+   actions: read
14
12
 
15
13
  concurrency:
16
14
    group: ${{ github.workflow }}-${{ github.ref }}
@@ -77,3 +75,60 @@ jobs:
77
75
          with:
78
76
            fail_ci_if_error: false
79
77
            token: ${{ secrets.CODECOV_TOKEN }}
78
+
79
+   benchmark:
80
+     name: Benchmarks
81
+     runs-on: ubuntu-latest
82
+     needs: [lint]
83
+     permissions:
84
+       contents: write
85
+       actions: write
86
+     steps:
87
+       - uses: actions/checkout@v5
88
+
89
+       - name: Set up Python
90
+         uses: actions/setup-python@v5
91
+         with:
92
+           python-version: "3.14"
93
+
94
+       - name: Install uv
95
+         uses: astral-sh/setup-uv@v6
96
+         with:
97
+           enable-cache: true
98
+
99
+       - name: Install just
100
+         uses: extractions/setup-just@v2
101
+
102
+       - name: Install the project
103
+         run: uv sync --locked --all-extras --dev
104
+
105
+       - name: Start Redis
106
+         uses: supercharge/redis-github-action@1.8.1
107
+         with:
108
+           redis-version: 7
109
+
110
+       - name: Run benchmarks
111
+         run: just bench
112
+
113
+       - name: Copy report to docs
114
+         run: |
115
+           mkdir -p docs/benchmarks
116
+           cp .bench/report.html docs/benchmarks/index.html
117
+
118
+       - name: Commit updated benchmark docs
119
+         if: github.event_name == 'push'
120
+         run: |
121
+           git config user.name "github-actions[bot]"
122
+           git config user.email "github-actions[bot]@users.noreply.github.com"
123
+           git add docs/benchmarks/index.html
124
+           git diff --cached --quiet || git commit -m "docs: update benchmark report [skip ci]"
125
+           git push
126
+
127
+       - name: Upload benchmark report
128
+         uses: actions/upload-artifact@v4
129
+         if: always()
130
+         with:
131
+           name: benchmark-report
132
+           path: |
133
+             .bench/report.html
134
+             .bench/*.json
@@ -52,6 +52,9 @@ MANIFEST
52
52
  pip-log.txt
53
53
  pip-delete-this-directory.txt
54
54
 
55
+ # Benchmark results
56
+ .bench/
57
+
55
58
  # Unit test / coverage reports
56
59
  htmlcov/
57
60
  .tox/
@@ -9,7 +9,7 @@
9
9
 
10
10
  For this, let's take the following approach:
11
11
 
12
- 1. Let's make the `rkey` for the `ac.foundation.dataset.sampleSchema` records be of type `any`.
12
+ 1. Let's make the `rkey` for the `ac.foundation.dataset.schema` records be of type `any`.
13
13
  2. Then, we can have our own standard for the `rkey` being of the format `{NSID}@{semver}`, where `{NSID}` gives an NSID for the permanent identifier of this sample schema type.
14
14
  * This allows us to bookkeep on the version updates
15
15
  * We can make a `ac.foundation.dataset.getLatestSchema` `query` Lexicon that will provide the record for the latest version of a given schema, as well
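Note on the `{NSID}@{semver}` rkey convention in this hunk: a `getLatestSchema`-style lookup has to compare versions numerically, because plain string ordering would rank `1.9.0` above `1.10.0`. The Python sketch below shows that selection step only. The function name `latest_rkey` and the sample rkeys are assumptions for illustration, and it handles plain `MAJOR.MINOR.PATCH` versions, skipping pre-release or malformed ones.

```python
from typing import List, Optional, Tuple


def latest_rkey(rkeys: List[str], nsid: str) -> Optional[str]:
    """Hypothetical helper: pick the highest-versioned rkey for one schema NSID."""
    best: Optional[Tuple[Tuple[int, ...], str]] = None
    for rkey in rkeys:
        # Split on the last "@": "{NSID}@{semver}".
        name, sep, version = rkey.rpartition("@")
        if not sep or name != nsid:
            continue
        try:
            key = tuple(int(part) for part in version.split("."))
        except ValueError:
            continue  # pre-release or malformed version: skipped in this sketch
        if len(key) != 3:
            continue
        # Tuples compare element by element, so (1, 10, 0) > (1, 9, 0).
        if best is None or key > best[0]:
            best = (key, rkey)
    return best[1] if best else None


sample = ["imagesample@1.9.0", "imagesample@1.10.0", "textsample@2.1.0"]
print(latest_rkey(sample, "imagesample"))  # imagesample@1.10.0
```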
@@ -16,7 +16,7 @@ ac.foundation.dataset.*
16
16
  The choices we have then are
17
17
 
18
18
  ```
19
- ac.foundation.dataset.sampleSchema
19
+ ac.foundation.dataset.schema
20
20
  ac.foundation.dataset.record
21
21
  ac.foundation.dataset.lens
22
22
  ```
@@ -16,7 +16,7 @@ The finalized design decisions prioritize **flexibility and future-proofing** ov
16
16
  2. **Lens Code (#46)**: External repos (GitHub + tangled.org), language metadata, future attestation
17
17
  3. **Storage (#47)**: Hybrid (URLs + blobs) from start, AppView proxy for blobs
18
18
  4. **Evolution (#48)**: rkey as {NSID}@{semver}, getLatestSchema query, optional migration Lenses
19
- 5. **Namespace (#49)**: `ac.foundation.dataset.*` (sampleSchema, record, lens)
19
+ 5. **Namespace (#49)**: `ac.foundation.dataset.*` (schema, record, lens)
20
20
 
21
21
  ---
22
22
 
@@ -17,7 +17,7 @@ Comprehensive assessment of `ac.foundation.dataset.record` Lexicon design agains
17
17
  The record Lexicon provides a solid foundation for dataset indexing with hybrid storage support. Key strengths include clean union-based storage design and appropriate use of ATProto primitives. However, several issues need addressing:
18
18
 
19
19
  - ⚠️ **Critical**: schemaRef should use format validation
20
- - ⚠️ **High**: Metadata structure inconsistency with sampleSchema pattern
20
+ - ⚠️ **High**: Metadata structure inconsistency with schema pattern
21
21
  - ⚠️ **Medium**: Missing $type discriminators in union variants
22
22
  - ✅ **Strength**: Clean storage union design
23
23
  - ✅ **Strength**: Appropriate use of tid keys for datasets
@@ -40,8 +40,8 @@ The record Lexicon provides a solid foundation for dataset indexing with hybrid
40
40
  - Appropriate for records without natural semantic keys
41
41
  - Consistent with ATProto patterns for user-generated content
42
42
 
43
- **Comparison to sampleSchema:**
44
- - sampleSchema uses `"key": "any"` for versioned rkeys like `{NSID}@{semver}`
43
+ **Comparison to schema:**
44
+ - schema uses `"key": "any"` for versioned rkeys like `{NSID}@{semver}`
45
45
  - record uses `"key": "tid"` for chronological dataset entries
46
46
  - Both choices are appropriate for their use cases
47
47
 
@@ -59,14 +59,14 @@ The record Lexicon provides a solid foundation for dataset indexing with hybrid
59
59
  }
60
60
  ```
61
61
 
62
- **Problem:** Should use `"format": "at-uri"` like we did for sampleSchema fields.
62
+ **Problem:** Should use `"format": "at-uri"` like we did for schema fields.
63
63
 
64
64
  **Fix:**
65
65
  ```json
66
66
  "schemaRef": {
67
67
  "type": "string",
68
68
  "format": "at-uri",
69
- "description": "AT-URI reference to the sampleSchema record",
69
+ "description": "AT-URI reference to the schema record",
70
70
  "maxLength": 500
71
71
  }
72
72
  ```
@@ -77,7 +77,7 @@ The record Lexicon provides a solid foundation for dataset indexing with hybrid
77
77
 
78
78
  #### Issue 2.2: License Field Inconsistency ⚠️ **Medium**
79
79
 
80
- sampleSchema metadata:
80
+ schema metadata:
81
81
  ```json
82
82
  "license": {
83
83
  "type": "string",
@@ -97,7 +97,7 @@ record:
97
97
 
98
98
  **Problem:** Inconsistent maxLength and less detailed guidance.
99
99
 
100
- **Recommendation:** Align with sampleSchema:
100
+ **Recommendation:** Align with schema:
101
101
  - maxLength: 200 (to support full URLs)
102
102
  - Enhanced description with examples
103
103
  - Reference Schema.org license property
@@ -106,7 +106,7 @@ record:
106
106
 
107
107
  #### Issue 2.3: Tags Field Inconsistency ⚠️ **Medium**
108
108
 
109
- sampleSchema metadata:
109
+ schema metadata:
110
110
  ```json
111
111
  "tags": {
112
112
  "type": "array",
@@ -145,7 +145,7 @@ record:
145
145
  "license": {...}
146
146
  ```
147
147
 
148
- sampleSchema:
148
+ schema:
149
149
  ```json
150
150
  "metadata": {
151
151
  "type": "object",
@@ -164,10 +164,10 @@ sampleSchema:
164
164
  - Pros: More discoverable (top-level fields, indexed/searchable)
165
165
  - Pros: Validated by Lexicon
166
166
  - Cons: Duplicates structure with metadata blob
167
- - Cons: Inconsistent with sampleSchema pattern
167
+ - Cons: Inconsistent with schema pattern
168
168
 
169
169
  **Option B: Unified Metadata Object**
170
- - Pros: Consistent with sampleSchema
170
+ - Pros: Consistent with schema
171
171
  - Pros: Single source of truth
172
172
  - Cons: Less discoverable for search
173
173
  - Cons: Can't validate blob contents
@@ -298,7 +298,7 @@ storageExternal:
298
298
 
299
299
  **Arguments for closed: false (current):**
300
300
  - Future extensibility (e.g., IPFS-native, Filecoin, Arweave)
301
- - Consistent with sampleSchema schema union pattern
301
+ - Consistent with schema schema union pattern
302
302
  - Graceful degradation for unknown types
303
303
 
304
304
  **Recommendation:** Keep open but document in description that external/blobs are the canonical types maintained by foundation.ac.
@@ -307,7 +307,7 @@ storageExternal:
307
307
 
308
308
  ### 8. Missing Fields from Standard Patterns
309
309
 
310
- Comparing to Schema.org Dataset and sampleSchema patterns:
310
+ Comparing to Schema.org Dataset and schema patterns:
311
311
 
312
312
  **Consider Adding:**
313
313
 
@@ -317,7 +317,7 @@ Comparing to Schema.org Dataset and sampleSchema patterns:
317
317
 
318
318
  2. **Version** - Dataset versioning?
319
319
  - Current approach: New record per version (via tid)
320
- - Alternative: Add explicit `version` field like sampleSchema
320
+ - Alternative: Add explicit `version` field like schema
321
321
  - **Recommendation:** Document that versioning is via new records, reference via AT-URI with tid
322
322
 
323
323
  3. **Citation** - How to cite this dataset?
@@ -357,7 +357,7 @@ Comparing to Schema.org Dataset and sampleSchema patterns:
357
357
  {
358
358
  "$type": "ac.foundation.dataset.record",
359
359
  "name": "CIFAR-10 Training Set",
360
- "schemaRef": "at://did:plc:abc123/ac.foundation.dataset.sampleSchema/imageclassification@1.0.0",
360
+ "schemaRef": "at://did:plc:abc123/ac.foundation.dataset.schema/imageclassification@1.0.0",
361
361
  "storage": {"type": "external", "urls": ["..."]}
362
362
  }
363
363
  ```
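Note on the `schemaRef` value in this hunk: the AT-URI packs the repository DID, the collection NSID, and a versioned rkey into one string. The Python sketch below splits it into those parts. The `SchemaRef` type and `parse_schema_ref` function are hypothetical names for illustration, not part of atdata's API, and the check is deliberately looser than full AT-URI validation.

```python
from typing import NamedTuple


class SchemaRef(NamedTuple):
    """Hypothetical container for the parts of a schema AT-URI."""
    did: str
    collection: str
    name: str
    version: str


def parse_schema_ref(uri: str) -> SchemaRef:
    """Split 'at://<did>/<collection>/<name>@<version>' into its parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3:
        raise ValueError(f"expected authority/collection/rkey: {uri!r}")
    did, collection, rkey = parts
    # The rkey follows the "{NSID}@{semver}" convention from the design notes.
    name, sep, version = rkey.rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"rkey is not of the form name@version: {rkey!r}")
    return SchemaRef(did, collection, name, version)


ref = parse_schema_ref(
    "at://did:plc:abc123/ac.foundation.dataset.schema/imageclassification@1.0.0"
)
print(ref.collection, ref.name, ref.version)
```

After the rename in this release, a consumer could compare `ref.collection` against `ac.foundation.dataset.schema` to detect references that still point at the old `sampleSchema` collection.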
@@ -400,7 +400,7 @@ Comparing to Schema.org Dataset and sampleSchema patterns:
400
400
  ### High Priority (Should Fix)
401
401
 
402
402
  4. **Align metadata pattern** - Clarify relationship between top-level fields and metadata blob
403
- 5. **Standardize license field** - Match sampleSchema maxLength and description
403
+ 5. **Standardize license field** - Match schema maxLength and description
404
404
  6. **Standardize tags field** - Use consistent limits or document rationale
405
405
 
406
406
  ### Medium Priority (Consider)
@@ -419,9 +419,9 @@ Comparing to Schema.org Dataset and sampleSchema patterns:
419
419
 
420
420
  ## Consistency Matrix
421
421
 
422
- Comparison of patterns between sampleSchema and record Lexicons:
422
+ Comparison of patterns between schema and record Lexicons:
423
423
 
424
- | Pattern | sampleSchema | record | Status |
424
+ | Pattern | schema | record | Status |
425
425
  |---------|--------------|--------|--------|
426
426
  | AT-URI format | ✅ Uses format | ❌ Missing | **Fix** |
427
427
  | License field | 200 chars, detailed | 100 chars, basic | **Align** |
@@ -440,15 +440,15 @@ Comparison of patterns between sampleSchema and record Lexicons:
440
440
  1. Add `"format": "at-uri"` to schemaRef field
441
441
  2. Change storage union variants to use `$type` discriminator
442
442
  3. Verify blob array item type with ATProto specification
443
- 4. Align license field with sampleSchema (maxLength: 200, enhanced description)
444
- 5. Decide on tags limits (recommend matching sampleSchema: 150/30)
443
+ 4. Align license field with schema (maxLength: 200, enhanced description)
444
+ 5. Decide on tags limits (recommend matching schema: 150/30)
445
445
 
446
446
  ### Documentation Improvements
447
447
 
448
448
  6. Add description clarifying metadata blob vs top-level fields relationship
449
449
  7. Document that dataset versioning is via new records (tids)
450
450
  8. Add note about storage union extensibility
451
- 9. Cross-reference with sampleSchema Lexicon
451
+ 9. Cross-reference with schema Lexicon
452
452
 
453
453
  ### Consider for Phase 2
454
454
 
@@ -460,7 +460,7 @@ Comparison of patterns between sampleSchema and record Lexicons:
460
460
 
461
461
  ## Conclusion
462
462
 
463
- The record Lexicon provides a solid foundation but needs refinement for ATProto compliance and consistency with sampleSchema patterns. The storage union design is excellent, and the use of tids is appropriate. Primary concerns are format validation, union discriminators, and metadata pattern clarity.
463
+ The record Lexicon provides a solid foundation but needs refinement for ATProto compliance and consistency with schema patterns. The storage union design is excellent, and the use of tids is appropriate. Primary concerns are format validation, union discriminators, and metadata pattern clarity.
464
464
 
465
465
  **Estimated effort to address critical issues:** 2-3 hours
466
466
  **Recommended timeline:** Before Phase 1 completion
@@ -1,6 +1,6 @@
1
- # sampleSchema Lexicon Design Questions
1
+ # schema Lexicon Design Questions
2
2
 
3
- This document captures open design questions for the `ac.foundation.dataset.sampleSchema` Lexicon that require user decisions before implementation.
3
+ This document captures open design questions for the `ac.foundation.dataset.schema` Lexicon that require user decisions before implementation.
4
4
 
5
5
  ## Q1: Key Format Validation
6
6
 
@@ -156,7 +156,7 @@ Should we add an explicit default value for `ndarrayShimUri`?
156
156
 
157
157
  ## Notes
158
158
 
159
- These questions should be resolved before finalizing the sampleSchema Lexicon design. Some can be deferred to Phase 2 implementation based on priority.
159
+ These questions should be resolved before finalizing the schema Lexicon design. Some can be deferred to Phase 2 implementation based on priority.
160
160
 
161
161
  **Priority:**
162
162
  - Q1: High (affects rkey strategy)
@@ -41,7 +41,7 @@ def bytes_to_array(b: bytes) -> np.ndarray:
41
41
  # Step 2: Load the JSON Schema for ImageSample
42
42
 
43
43
  # Get path to the schema example
44
- schema_path = Path(__file__).parent.parent / "sampleSchema_example.json"
44
+ schema_path = Path(__file__).parent.parent / "schema_example.json"
45
45
  with open(schema_path) as f:
46
46
  schema_record = json.load(f)
47
47
 
@@ -312,5 +312,5 @@ print("""
312
312
  - Shim definition is sound and reusable
313
313
  - Works as both inline $def and external $ref
314
314
  - Compatible with JSON Schema tooling
315
- - Ready for use in ac.foundation.dataset.sampleSchema Lexicon
315
+ - Ready for use in ac.foundation.dataset.schema Lexicon
316
316
  """)
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "$type": "ac.foundation.dataset.record",
3
3
  "name": "Small Sample Dataset",
4
- "schemaRef": "at://did:plc:def456/ac.foundation.dataset.sampleSchema/textsample@2.1.0",
4
+ "schemaRef": "at://did:plc:def456/ac.foundation.dataset.schema/textsample@2.1.0",
5
5
  "storage": {
6
6
  "$type": "ac.foundation.dataset.storageBlobs",
7
7
  "blobs": [
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "$type": "ac.foundation.dataset.record",
3
3
  "name": "CIFAR-10 Training Set",
4
- "schemaRef": "at://did:plc:abc123/ac.foundation.dataset.sampleSchema/imageclassification@1.0.0",
4
+ "schemaRef": "at://did:plc:abc123/ac.foundation.dataset.schema/imageclassification@1.0.0",
5
5
  "storage": {
6
6
  "$type": "ac.foundation.dataset.storageExternal",
7
7
  "urls": [
@@ -1,8 +1,8 @@
1
1
  {
2
2
  "$type": "ac.foundation.dataset.lens",
3
3
  "name": "RGB to Grayscale Conversion",
4
- "sourceSchema": "at://did:plc:abc123/ac.foundation.dataset.sampleSchema/rgbimage@1.0.0",
5
- "targetSchema": "at://did:plc:abc123/ac.foundation.dataset.sampleSchema/grayscaleimage@1.0.0",
4
+ "sourceSchema": "at://did:plc:abc123/ac.foundation.dataset.schema/rgbimage@1.0.0",
5
+ "targetSchema": "at://did:plc:abc123/ac.foundation.dataset.schema/grayscaleimage@1.0.0",
6
6
  "description": "Converts RGB images to grayscale using standard luminosity formula",
7
7
  "getterCode": {
8
8
  "repository": "https://github.com/alice/vision-lenses",
@@ -1,10 +1,10 @@
1
1
  {
2
- "$type": "ac.foundation.dataset.sampleSchema",
2
+ "$type": "ac.foundation.dataset.schema",
3
3
  "name": "ImageSample",
4
4
  "version": "1.0.0",
5
5
  "schemaType": "jsonSchema",
6
6
  "schema": {
7
- "$type": "ac.foundation.dataset.sampleSchema#jsonSchemaFormat",
7
+ "$type": "ac.foundation.dataset.schema#jsonSchemaFormat",
8
8
  "$schema": "http://json-schema.org/draft-07/schema#",
9
9
  "title": "ImageSample",
10
10
  "type": "object",
@@ -6,16 +6,16 @@ This directory contains the ATProto Lexicon JSON definitions for the distributed
6
6
 
7
7
  ### Core Record Types
8
8
 
9
- 1. **[ac.foundation.dataset.sampleSchema](ac.foundation.dataset.sampleSchema.json)**
9
+ 1. **[ac.foundation.dataset.schema](ac.foundation.dataset.schema.json)**
10
10
  - Defines PackableSample-compatible sample types using JSON Schema
11
11
  - Supports versioning via rkey format: `{NSID}@{semver}`
12
12
  - Includes NDArray shim for ML/scientific data types
13
- - Example: [sampleSchema_example.json](../examples/sampleSchema_example.json)
13
+ - Example: [schema_example.json](../examples/schema_example.json)
14
14
 
15
15
  2. **[ac.foundation.dataset.record](ac.foundation.dataset.record.json)**
16
16
  - Index records for WebDataset-backed datasets
17
17
  - Hybrid storage support (external URLs + PDS blobs)
18
- - References sampleSchema for type information
18
+ - References schema for type information
19
19
  - Examples:
20
20
  - [External storage](../examples/dataset_external_storage.json)
21
21
  - [Blob storage](../examples/dataset_blob_storage.json)
@@ -40,7 +40,7 @@ This directory contains the ATProto Lexicon JSON definitions for the distributed
40
40
  All Lexicons use the `ac.foundation.dataset.*` namespace:
41
41
  - `ac.foundation` - Organization namespace
42
42
  - `dataset` - Domain (distributed datasets)
43
- - Specific record types: `sampleSchema`, `record`, `lens`
43
+ - Specific record types: `schema`, `record`, `lens`
44
44
 
45
45
  ### 2. Schema Versioning (rkey Convention)
46
46
 
@@ -57,7 +57,7 @@ All Lexicons use the `ac.foundation.dataset.*` namespace:
57
57
  - Natural query pattern via `getLatestSchema`
58
58
  - Clear semantic versioning enforcement
59
59
 
60
- **Implementation**: The sampleSchema Lexicon uses `"key": "any"` to support this custom format.
60
+ **Implementation**: The schema Lexicon uses `"key": "any"` to support this custom format.
61
61
 
62
62
  ### 3. JSON Schema with NDArray Shim
63
63
 
@@ -151,7 +151,7 @@ schema_uri = publisher.publish_schema(
151
151
  version="1.0.0",
152
152
  description="RGB image with label"
153
153
  )
154
- # Result: at://did:plc:abc123/ac.foundation.dataset.sampleSchema/imagesample@1.0.0
154
+ # Result: at://did:plc:abc123/ac.foundation.dataset.schema/imagesample@1.0.0
155
155
  ```
156
156
 
157
157
  ### Publishing a Dataset
@@ -226,7 +226,7 @@ See [06_lexicon_validation.md](../decisions/06_lexicon_validation.md) for valida
226
226
 
227
227
  ```bash
228
228
  # Validate Lexicon JSON (requires ATProto tooling)
229
- atproto-lexicon validate ac.foundation.dataset.sampleSchema.json
229
+ atproto-lexicon validate ac.foundation.dataset.schema.json
230
230
 
231
231
  # Validate example records
232
232
  python scripts/validate_examples.py