hafnia 0.1.27__tar.gz → 0.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (132)
  1. {hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/ci_cd.yaml +1 -1
  2. {hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/publish_docker.yaml +1 -1
  3. {hafnia-0.1.27 → hafnia-0.2.1}/.vscode/launch.json +2 -2
  4. {hafnia-0.1.27 → hafnia-0.2.1}/.vscode/settings.json +10 -2
  5. {hafnia-0.1.27 → hafnia-0.2.1}/PKG-INFO +209 -99
  6. hafnia-0.2.1/README.md +447 -0
  7. {hafnia-0.1.27 → hafnia-0.2.1}/docs/cli.md +3 -9
  8. hafnia-0.2.1/examples/example_dataset_recipe.py +165 -0
  9. hafnia-0.2.1/examples/example_hafnia_dataset.py +129 -0
  10. {hafnia-0.1.27 → hafnia-0.2.1}/examples/example_torchvision_dataloader.py +14 -8
  11. {hafnia-0.1.27 → hafnia-0.2.1}/pyproject.toml +7 -2
  12. {hafnia-0.1.27 → hafnia-0.2.1}/src/cli/__main__.py +2 -2
  13. {hafnia-0.1.27 → hafnia-0.2.1}/src/cli/config.py +17 -4
  14. hafnia-0.2.1/src/cli/dataset_cmds.py +60 -0
  15. {hafnia-0.1.27 → hafnia-0.2.1}/src/cli/runc_cmds.py +1 -1
  16. hafnia-0.2.1/src/hafnia/data/__init__.py +3 -0
  17. hafnia-0.2.1/src/hafnia/data/factory.py +23 -0
  18. hafnia-0.2.1/src/hafnia/dataset/dataset_helpers.py +91 -0
  19. hafnia-0.2.1/src/hafnia/dataset/dataset_names.py +72 -0
  20. hafnia-0.2.1/src/hafnia/dataset/dataset_recipe/dataset_recipe.py +327 -0
  21. hafnia-0.2.1/src/hafnia/dataset/dataset_recipe/recipe_transforms.py +53 -0
  22. hafnia-0.2.1/src/hafnia/dataset/dataset_recipe/recipe_types.py +140 -0
  23. hafnia-0.2.1/src/hafnia/dataset/dataset_upload_helper.py +468 -0
  24. hafnia-0.2.1/src/hafnia/dataset/hafnia_dataset.py +624 -0
  25. hafnia-0.2.1/src/hafnia/dataset/operations/dataset_stats.py +15 -0
  26. hafnia-0.2.1/src/hafnia/dataset/operations/dataset_transformations.py +82 -0
  27. hafnia-0.2.1/src/hafnia/dataset/operations/table_transformations.py +183 -0
  28. hafnia-0.2.1/src/hafnia/dataset/primitives/__init__.py +16 -0
  29. hafnia-0.2.1/src/hafnia/dataset/primitives/bbox.py +137 -0
  30. hafnia-0.2.1/src/hafnia/dataset/primitives/bitmask.py +182 -0
  31. hafnia-0.2.1/src/hafnia/dataset/primitives/classification.py +56 -0
  32. hafnia-0.2.1/src/hafnia/dataset/primitives/point.py +25 -0
  33. hafnia-0.2.1/src/hafnia/dataset/primitives/polygon.py +100 -0
  34. hafnia-0.2.1/src/hafnia/dataset/primitives/primitive.py +44 -0
  35. hafnia-0.2.1/src/hafnia/dataset/primitives/segmentation.py +51 -0
  36. hafnia-0.2.1/src/hafnia/dataset/primitives/utils.py +51 -0
  37. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/experiment/hafnia_logger.py +7 -7
  38. hafnia-0.2.1/src/hafnia/helper_testing.py +108 -0
  39. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/http.py +5 -3
  40. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/platform/__init__.py +2 -2
  41. hafnia-0.2.1/src/hafnia/platform/datasets.py +197 -0
  42. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/platform/download.py +85 -23
  43. hafnia-0.2.1/src/hafnia/torch_helpers.py +255 -0
  44. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/utils.py +21 -2
  45. hafnia-0.2.1/src/hafnia/visualizations/colors.py +267 -0
  46. hafnia-0.2.1/src/hafnia/visualizations/image_visualizations.py +202 -0
  47. hafnia-0.2.1/tests/conftest.py +81 -0
  48. hafnia-0.2.1/tests/data/expected_images/test_samples/test_check_dataset[caltech-101].png +0 -0
  49. hafnia-0.2.1/tests/data/expected_images/test_samples/test_check_dataset[caltech-256].png +0 -0
  50. hafnia-0.2.1/tests/data/expected_images/test_samples/test_check_dataset[cifar100].png +0 -0
  51. hafnia-0.2.1/tests/data/expected_images/test_samples/test_check_dataset[cifar10].png +0 -0
  52. hafnia-0.2.1/tests/data/expected_images/test_samples/test_check_dataset[coco-2017].png +0 -0
  53. hafnia-0.2.1/tests/data/expected_images/test_samples/test_check_dataset[midwest-vehicle-detection].png +0 -0
  54. hafnia-0.2.1/tests/data/expected_images/test_samples/test_check_dataset[mnist].png +0 -0
  55. hafnia-0.2.1/tests/data/expected_images/test_samples/test_check_dataset[tiny-dataset].png +0 -0
  56. hafnia-0.2.1/tests/data/expected_images/test_samples/test_dataset_draw_image_and_target[caltech-101].png +0 -0
  57. hafnia-0.2.1/tests/data/expected_images/test_samples/test_dataset_draw_image_and_target[caltech-256].png +0 -0
  58. hafnia-0.2.1/tests/data/expected_images/test_samples/test_dataset_draw_image_and_target[cifar100].png +0 -0
  59. hafnia-0.2.1/tests/data/expected_images/test_samples/test_dataset_draw_image_and_target[cifar10].png +0 -0
  60. hafnia-0.2.1/tests/data/expected_images/test_samples/test_dataset_draw_image_and_target[coco-2017].png +0 -0
  61. hafnia-0.2.1/tests/data/expected_images/test_samples/test_dataset_draw_image_and_target[midwest-vehicle-detection].png +0 -0
  62. hafnia-0.2.1/tests/data/expected_images/test_samples/test_dataset_draw_image_and_target[mnist].png +0 -0
  63. hafnia-0.2.1/tests/data/expected_images/test_samples/test_dataset_draw_image_and_target[tiny-dataset].png +0 -0
  64. hafnia-0.2.1/tests/data/expected_images/test_visualizations/test_blur_anonymization[coco-2017].png +0 -0
  65. hafnia-0.2.1/tests/data/expected_images/test_visualizations/test_blur_anonymization[tiny-dataset].png +0 -0
  66. hafnia-0.2.1/tests/data/expected_images/test_visualizations/test_draw_annotations[coco-2017].png +0 -0
  67. hafnia-0.2.1/tests/data/expected_images/test_visualizations/test_draw_annotations[tiny-dataset].png +0 -0
  68. hafnia-0.2.1/tests/data/expected_images/test_visualizations/test_mask_region[coco-2017].png +0 -0
  69. hafnia-0.2.1/tests/data/expected_images/test_visualizations/test_mask_region[tiny-dataset].png +0 -0
  70. hafnia-0.2.1/tests/data/micro_test_datasets/coco-2017/annotations.jsonl +3 -0
  71. hafnia-0.2.1/tests/data/micro_test_datasets/coco-2017/annotations.parquet +0 -0
  72. hafnia-0.2.1/tests/data/micro_test_datasets/coco-2017/data/182a2c0a3ce312cf.jpg +0 -0
  73. hafnia-0.2.1/tests/data/micro_test_datasets/coco-2017/data/4e95c6eb6209880a.jpg +0 -0
  74. hafnia-0.2.1/tests/data/micro_test_datasets/coco-2017/data/cf86c7a23edb55ce.jpg +0 -0
  75. hafnia-0.2.1/tests/data/micro_test_datasets/coco-2017/dataset_info.json +232 -0
  76. hafnia-0.2.1/tests/data/micro_test_datasets/tiny-dataset/annotations.jsonl +3 -0
  77. hafnia-0.2.1/tests/data/micro_test_datasets/tiny-dataset/annotations.parquet +0 -0
  78. hafnia-0.2.1/tests/data/micro_test_datasets/tiny-dataset/data/222bbd5721a8a86e.png +0 -0
  79. hafnia-0.2.1/tests/data/micro_test_datasets/tiny-dataset/data/3251d85443622e4c.png +0 -0
  80. hafnia-0.2.1/tests/data/micro_test_datasets/tiny-dataset/data/3657ababa44af9b6.png +0 -0
  81. hafnia-0.2.1/tests/data/micro_test_datasets/tiny-dataset/dataset_info.json +108 -0
  82. hafnia-0.2.1/tests/dataset/dataset_recipe/test_dataset_recipe_helpers.py +120 -0
  83. hafnia-0.2.1/tests/dataset/dataset_recipe/test_dataset_recipes.py +260 -0
  84. hafnia-0.2.1/tests/dataset/dataset_recipe/test_recipe_transformations.py +224 -0
  85. hafnia-0.2.1/tests/dataset/operations/test_dataset_transformations.py +0 -0
  86. hafnia-0.2.1/tests/dataset/operations/test_table_transformations.py +94 -0
  87. hafnia-0.2.1/tests/dataset/test_colors.py +8 -0
  88. hafnia-0.2.1/tests/dataset/test_dataset_helpers.py +79 -0
  89. hafnia-0.2.1/tests/dataset/test_hafnia_dataset.py +110 -0
  90. hafnia-0.2.1/tests/dataset/test_shape_primitives.py +70 -0
  91. {hafnia-0.1.27 → hafnia-0.2.1}/tests/test_check_example_scripts.py +4 -3
  92. {hafnia-0.1.27 → hafnia-0.2.1}/tests/test_cli.py +47 -44
  93. {hafnia-0.1.27 → hafnia-0.2.1}/tests/test_hafnia_logger.py +8 -10
  94. hafnia-0.2.1/tests/test_samples.py +171 -0
  95. hafnia-0.2.1/tests/test_visualizations.py +62 -0
  96. {hafnia-0.1.27 → hafnia-0.2.1}/uv.lock +135 -658
  97. hafnia-0.1.27/README.md +0 -342
  98. hafnia-0.1.27/examples/dataset_builder.py +0 -537
  99. hafnia-0.1.27/examples/example_load_dataset.py +0 -14
  100. hafnia-0.1.27/src/cli/data_cmds.py +0 -53
  101. hafnia-0.1.27/src/hafnia/data/__init__.py +0 -3
  102. hafnia-0.1.27/src/hafnia/data/factory.py +0 -67
  103. hafnia-0.1.27/src/hafnia/torch_helpers.py +0 -170
  104. hafnia-0.1.27/tests/test_samples.py +0 -174
  105. {hafnia-0.1.27 → hafnia-0.2.1}/.devcontainer/devcontainer.json +0 -0
  106. {hafnia-0.1.27 → hafnia-0.2.1}/.devcontainer/hooks/post_create +0 -0
  107. {hafnia-0.1.27 → hafnia-0.2.1}/.github/dependabot.yaml +0 -0
  108. {hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/Dockerfile +0 -0
  109. {hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/build.yaml +0 -0
  110. {hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/check_release.yaml +0 -0
  111. {hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/lint.yaml +0 -0
  112. {hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/publish_pypi.yaml +0 -0
  113. {hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/tests.yaml +0 -0
  114. {hafnia-0.1.27 → hafnia-0.2.1}/.gitignore +0 -0
  115. {hafnia-0.1.27 → hafnia-0.2.1}/.pre-commit-config.yaml +0 -0
  116. {hafnia-0.1.27 → hafnia-0.2.1}/.python-version +0 -0
  117. {hafnia-0.1.27 → hafnia-0.2.1}/.vscode/extensions.json +0 -0
  118. {hafnia-0.1.27 → hafnia-0.2.1}/LICENSE +0 -0
  119. {hafnia-0.1.27 → hafnia-0.2.1}/docs/release.md +0 -0
  120. {hafnia-0.1.27 → hafnia-0.2.1}/examples/example_logger.py +0 -0
  121. {hafnia-0.1.27 → hafnia-0.2.1}/src/cli/__init__.py +0 -0
  122. {hafnia-0.1.27 → hafnia-0.2.1}/src/cli/consts.py +0 -0
  123. {hafnia-0.1.27 → hafnia-0.2.1}/src/cli/experiment_cmds.py +0 -0
  124. {hafnia-0.1.27 → hafnia-0.2.1}/src/cli/profile_cmds.py +0 -0
  125. {hafnia-0.1.27 → hafnia-0.2.1}/src/cli/recipe_cmds.py +0 -0
  126. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/__init__.py +0 -0
  127. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/experiment/__init__.py +0 -0
  128. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/log.py +0 -0
  129. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/platform/builder.py +0 -0
  130. {hafnia-0.1.27 → hafnia-0.2.1}/src/hafnia/platform/experiment.py +0 -0
  131. {hafnia-0.1.27 → hafnia-0.2.1}/tests/test_builder.py +0 -0
  132. {hafnia-0.1.27 → hafnia-0.2.1}/tests/test_utils.py +0 -0

{hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/ci_cd.yaml
@@ -21,7 +21,7 @@ jobs:
     steps:
       - uses: actions/checkout@v4.2.2
       - name: Run Trivy vulnerability scanner
-        uses: aquasecurity/trivy-action@0.31.0
+        uses: aquasecurity/trivy-action@0.32.0
         with:
           scan-type: 'fs'
           scan-ref: '.'

{hafnia-0.1.27 → hafnia-0.2.1}/.github/workflows/publish_docker.yaml
@@ -57,7 +57,7 @@ jobs:
         uses: aws-actions/amazon-ecr-login@v2.0.1

       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3.11.0
+        uses: docker/setup-buildx-action@v3.11.1

       - name: Build and push
         uses: docker/build-push-action@v6.18.0

{hafnia-0.1.27 → hafnia-0.2.1}/.vscode/launch.json
@@ -48,12 +48,12 @@
       ],
     },
     {
-      "name": "debug (hafnia data download mnist)",
+      "name": "debug (hafnia dataset X)",
       "type": "debugpy",
       "request": "launch",
       "program": "${workspaceFolder}/src/cli/__main__.py",
       "args": [
-        "data",
+        "dataset",
         "download",
         "mnist",
         //"./.data",

{hafnia-0.1.27 → hafnia-0.2.1}/.vscode/settings.json
@@ -22,11 +22,19 @@

   "python.testing.pytestArgs": [
     "tests",
-    "-vv"
+    "-vv",
+    "--durations=20",
   ],
   "python.testing.unittestEnabled": false,
   "python.testing.pytestEnabled": true,
   "cSpell.words": [
-    "HAFNIA"
+    "bboxes",
+    "Bitmask",
+    "bitmasks",
+    "flatnonzero",
+    "fromarray",
+    "HAFNIA",
+    "ndarray",
+    "rprint"
   ]
 }

{hafnia-0.1.27 → hafnia-0.2.1}/PKG-INFO
@@ -1,22 +1,27 @@
 Metadata-Version: 2.4
 Name: hafnia
-Version: 0.1.27
+Version: 0.2.1
 Summary: Python SDK for communication with Hafnia platform.
 Author-email: Milestone Systems <hafniaplatform@milestone.dk>
 License-File: LICENSE
 Requires-Python: >=3.10
 Requires-Dist: boto3>=1.35.91
 Requires-Dist: click>=8.1.8
-Requires-Dist: datasets>=3.2.0
 Requires-Dist: emoji>=2.14.1
 Requires-Dist: flatten-dict>=0.4.2
+Requires-Dist: more-itertools>=10.7.0
+Requires-Dist: opencv-python-headless>=4.11.0.86
 Requires-Dist: pathspec>=0.12.1
 Requires-Dist: pillow>=11.1.0
+Requires-Dist: polars>=1.30.0
 Requires-Dist: pyarrow>=18.1.0
+Requires-Dist: pycocotools>=2.0.10
 Requires-Dist: pydantic>=2.10.4
 Requires-Dist: rich>=13.9.4
+Requires-Dist: s5cmd>=0.2.0
 Requires-Dist: seedir>=0.5.0
 Requires-Dist: tqdm>=4.67.1
+Requires-Dist: xxhash>=3.5.0
 Description-Content-Type: text/markdown

 # Hafnia
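
The dependency swap above (Hugging Face `datasets` out; `polars`, `opencv-python-headless`, `pycocotools`, `s5cmd` and `xxhash` in) tracks the move to a polars/parquet-based dataset format described in the README changes below. As a minimal, hypothetical sketch — assuming a dataset sample has already been downloaded to the default `.data/datasets/` layout shown later — the new `annotations.parquet` file can be inspected with polars alone:

```python
import polars as pl

# Hypothetical path following the default download layout described below;
# adjust to wherever the sample was actually downloaded.
path = ".data/datasets/midwest-vehicle-detection/sample/annotations.parquet"

samples = pl.read_parquet(path)  # one row per sample
print(samples.shape)    # e.g. (n_samples, 14)
print(samples.columns)  # sample_index, file_name, height, width, ...
```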
@@ -28,8 +33,8 @@ The package includes the following interfaces:

 - `cli`: A Command Line Interface (CLI) to 1) configure/connect to Hafnia's [Training-aaS](https://hafnia.readme.io/docs/training-as-a-service) and 2) create and
   launch recipe scripts.
-- `hafnia`: A python package with helper functions to load and interact with sample datasets and an experiment
-  tracker (`HafniaLogger`).
+- `hafnia`: A python package including `HafniaDataset` to manage datasets and `HafniaLogger` for
+  experiment tracking.


 ## The Concept: Training as a Service (Training-aaS)
@@ -76,7 +81,7 @@ Copy the key and save it for later use.
 1. Download `mnist` from terminal to verify that your configuration is working.

 ```bash
-hafnia data download mnist --force
+hafnia dataset download mnist --force
 ```

 ## Getting started: Loading dataset samples
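
The renamed CLI group mirrors the Python-side helpers. A sketch of the equivalent check from Python, based on the README excerpt further down where `get_dataset_path` is described as downloading a sample and returning its local folder:

```python
from hafnia.data import get_dataset_path

# Downloads the `mnist` sample (if not already cached) and returns its local
# path, mirroring `hafnia dataset download mnist` on the command line.
path_dataset = get_dataset_path("mnist")
print(path_dataset)
```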
@@ -84,115 +89,220 @@ With Hafnia configured on your local machine, it is now possible to download
 and explore the dataset sample with a python script:

 ```python
-from hafnia.data import load_dataset
+from hafnia.data import load_dataset, get_dataset_path
+from hafnia.dataset.hafnia_dataset import HafniaDataset

-dataset_splits = load_dataset("mnist")
+# To download the sample dataset use:
+path_dataset = get_dataset_path("midwest-vehicle-detection")
 ```

-### Dataset Format
-The returned sample dataset is a [hugging face dataset](https://huggingface.co/docs/datasets/index)
-and contains train, validation and test splits.
+This will download the dataset sample `midwest-vehicle-detection` to the local `.data/datasets/` folder
+in a human-readable format.
+
+Images are stored in the `data` folder, general dataset information is stored in `dataset_info.json`,
+and annotations are stored as both `annotations.jsonl` (JSON Lines) and `annotations.parquet`.
+
+```bash
+$ cd .data/datasets/
+$ tree midwest-vehicle-detection
+midwest-vehicle-detection
+└── sample
+    ├── annotations.jsonl
+    ├── annotations.parquet
+    ├── data
+    │   ├── video_0026a86b-2f43-49f2-a17c-59244d10a585_1fps_mp4_frame_00000.png
+    │   ....
+    │   ├── video_ff17d777-e783-44e2-9bff-a4adac73de4b_1fps_mp4_frame_00000.png
+    │   └── video_ff17d777-e783-44e2-9bff-a4adac73de4b_1fps_mp4_frame_00100.png
+    └── dataset_info.json
+
+3 directories, 217 files
+```
+
+You can interact with the data however you want, but we also provide `HafniaDataset`
+for loading/saving, managing and interacting with the dataset.
+
+We recommend visiting and potentially executing the example script [examples/example_hafnia_dataset.py](examples/example_hafnia_dataset.py)
+to see how to use the `HafniaDataset` class and its methods.
+
+Below is a short introduction to the `HafniaDataset` class.
+
+```python
+from hafnia.dataset.hafnia_dataset import HafniaDataset, Sample
+
+# Load dataset
+dataset = HafniaDataset.read_from_path(path_dataset)
+
+# Alternatively, you can use the 'load_dataset' function to download and load the dataset in one go.
+# dataset = load_dataset("midwest-vehicle-detection")
+
+# Print dataset information
+dataset.print_stats()
+
+# Create a dataset split for training
+dataset_train = dataset.create_split_dataset("train")
+```
+
+The `HafniaDataset` object provides a convenient way to interact with the dataset, including methods for
+creating splits, accessing samples, printing statistics, and saving to and loading from disk.
+
+In essence, the `HafniaDataset` class contains `dataset.info` with dataset information
+and `dataset.samples` with annotations as a polars DataFrame.

 ```python
-print(dataset_splits)
-
-# Output:
->>> DatasetDict({
-    train: Dataset({
-        features: ['image_id', 'image', 'height', 'width', 'objects', 'Weather', 'Surface Conditions'],
-        num_rows: 172
-    })
-    validation: Dataset({
-        features: ['image_id', 'image', 'height', 'width', 'objects', 'Weather', 'Surface Conditions'],
-        num_rows: 21
-    })
-    test: Dataset({
-        features: ['image_id', 'image', 'height', 'width', 'objects', 'Weather', 'Surface Conditions'],
-        num_rows: 21
-    })
-})
+# Annotations are stored in a polars DataFrame
+print(dataset.samples.head(2))
+shape: (2, 14)
+┌──────────────┬─────────────────────────────────┬────────┬───────┬───┬─────────────────────────────────┬──────────┬──────────┬─────────────────────────────────┐
+│ sample_index ┆ file_name                       ┆ height ┆ width ┆ … ┆ objects                         ┆ bitmasks ┆ polygons ┆ meta                            │
+│ ---          ┆ ---                             ┆ ---    ┆ ---   ┆   ┆ ---                             ┆ ---      ┆ ---      ┆ ---                             │
+│ u32          ┆ str                             ┆ i64    ┆ i64   ┆   ┆ list[struct[11]]                ┆ null     ┆ null     ┆ struct[5]                       │
+╞══════════════╪═════════════════════════════════╪════════╪═══════╪═══╪═════════════════════════════════╪══════════╪══════════╪═════════════════════════════════╡
+│ 0            ┆ /home/ubuntu/code/hafnia/.data… ┆ 1080   ┆ 1920  ┆ … ┆ [{0.0492,0.0357,0.2083,0.23,"V… ┆ null     ┆ null     ┆ {120.0,1.0,"2024-07-10T18:30:0… │
+│ 100          ┆ /home/ubuntu/code/hafnia/.data… ┆ 1080   ┆ 1920  ┆ … ┆ [{0.146382,0.078704,0.42963,0.… ┆ null     ┆ null     ┆ {120.0,1.0,"2024-07-10T18:30:0… │
+└──────────────┴─────────────────────────────────┴────────┴───────┴───┴─────────────────────────────────┴──────────┴──────────┴─────────────────────────────────┘
+```

+```python
+# General dataset information is stored in `dataset.info`
+rich.print(dataset.info)
+DatasetInfo(
+    dataset_name='midwest-vehicle-detection',
+    version='1.0.0',
+    tasks=[
+        TaskInfo(
+            primitive=<class 'hafnia.dataset.primitives.Bbox'>,
+            class_names=[
+                'Person',
+                'Vehicle.Bicycle',
+                'Vehicle.Motorcycle',
+                'Vehicle.Car',
+                'Vehicle.Van',
+                'Vehicle.RV',
+                'Vehicle.Single_Truck',
+                'Vehicle.Combo_Truck',
+                'Vehicle.Pickup_Truck',
+                'Vehicle.Trailer',
+                'Vehicle.Emergency_Vehicle',
+                'Vehicle.Bus',
+                'Vehicle.Heavy_Duty_Vehicle'
+            ],
+            name='bboxes'
+        ),
+        TaskInfo(primitive=<class 'hafnia.dataset.primitives.Classification'>, class_names=['Clear', 'Foggy'], name='Weather'),
+        TaskInfo(primitive=<class 'hafnia.dataset.primitives.Classification'>, class_names=['Dry', 'Wet'], name='Surface Conditions')
+    ],
+    meta={
+        'n_videos': 109,
+        'n_cameras': 20,
+        'duration': 13080.0,
+        'duration_average': 120.0,
+        ...
+    }
+)
 ```

-A Hugging Face dataset is a dictionary with splits, where each split is a `Dataset` object.
-Each `Dataset` is structured as a table with a set of columns (also called features) and a row for each sample.
+You can iterate over and access samples in the dataset using the `HafniaDataset` object.
+Each sample contains image and annotation information.

-The features of the dataset can be viewed with the `features` attribute.
 ```python
-# View features of the train split
-pprint.pprint(dataset["train"].features)
-{'Surface Conditions': ClassLabel(names=['Dry', 'Wet'], id=None),
- 'Weather': ClassLabel(names=['Clear', 'Foggy'], id=None),
- 'height': Value(dtype='int64', id=None),
- 'image': Image(mode=None, decode=True, id=None),
- 'image_id': Value(dtype='int64', id=None),
- 'objects': Sequence(feature={'bbox': Sequence(feature=Value(dtype='int64',
-                                                             id=None),
-                                               length=-1,
-                                               id=None),
-                              'class_idx': ClassLabel(names=['Vehicle.Bicycle',
-                                                             'Vehicle.Motorcycle',
-                                                             'Vehicle.Car',
-                                                             'Vehicle.Van',
-                                                             'Vehicle.RV',
-                                                             'Vehicle.Single_Truck',
-                                                             'Vehicle.Combo_Truck',
-                                                             'Vehicle.Pickup_Truck',
-                                                             'Vehicle.Trailer',
-                                                             'Vehicle.Emergency_Vehicle',
-                                                             'Vehicle.Bus',
-                                                             'Vehicle.Heavy_Duty_Vehicle'],
-                                                      id=None),
-                              'class_name': Value(dtype='string', id=None),
-                              'id': Value(dtype='string', id=None)},
-                     length=-1,
-                     id=None),
- 'width': Value(dtype='int64', id=None)}
+from hafnia.dataset.hafnia_dataset import HafniaDataset, Sample
+# Access the first sample in the dataset either by index or by iterating over the dataset
+sample_dict = dataset[0]
+
+for sample_dict in dataset:
+    sample = Sample(**sample_dict)
+    print(sample.sample_index, sample.objects)
+    break
 ```
+Note that it is possible to create a `Sample` object from the sample dictionary.
+This is useful for accessing the image and annotations in a structured way.

-View the first sample in the training set:
 ```python
-# Print sample from the training set
-pprint.pprint(dataset["train"][0])
-
-{'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=1920x1080 at 0x79D6292C5ED0>,
- 'image_id': 4920,
- 'height': 1080,
- 'Weather': 0,
- 'Surface Conditions': 0,
- 'objects': {'bbox': [[441, 180, 121, 126],
-                      [549, 151, 131, 103],
-                      [1845, 722, 68, 130],
-                      [1810, 571, 110, 149]],
-             'class_idx': [7, 7, 2, 2],
-             'class_name': ['Vehicle.Pickup_Truck',
-                            'Vehicle.Pickup_Truck',
-                            'Vehicle.Car',
-                            'Vehicle.Car'],
-             'id': ['HW6WiLAJ', 'T/ccFpRi', 'CS0O8B6W', 'DKrJGzjp']},
- 'width': 1920}
+# By unpacking the sample dictionary, you can create a `Sample` object.
+sample = Sample(**sample_dict)
+
+# Use the `Sample` object to easily read the image and draw annotations
+image = sample.read_image()
+image_annotations = sample.draw_annotations()
+```

+Note that the `Sample` object contains all information about the sample, including image and metadata.
+It also contains annotations as primitive types such as `Bbox` and `Classification`.
+
+```python
+rich.print(sample)
+Sample(
+    sample_index=120,
+    file_name='/home/ubuntu/code/hafnia/.data/datasets/midwest-vehicle-detection/data/343403325f27e390.png',
+    height=1080,
+    width=1920,
+    split='train',
+    is_sample=True,
+    collection_index=None,
+    collection_id=None,
+    remote_path='s3://mdi-production-midwest-vehicle-detection/sample/data/343403325f27e390.png',
+    classifications=[
+        Classification(
+            class_name='Clear',
+            class_idx=0,
+            object_id=None,
+            confidence=None,
+            ground_truth=True,
+            task_name='Weather',
+            meta=None
+        ),
+        Classification(
+            class_name='Day',
+            class_idx=3,
+            object_id=None,
+            confidence=None,
+            ground_truth=True,
+            task_name='Time of Day',
+            meta=None
+        ),
+        ...
+    ],
+    objects=[
+        Bbox(
+            height=0.0492,
+            width=0.0357,
+            top_left_x=0.2083,
+            top_left_y=0.23,
+            class_name='Vehicle.Car',
+            class_idx=3,
+            object_id='cXT4NRVu',
+            confidence=None,
+            ground_truth=True,
+            task_name='bboxes',
+            meta=None
+        ),
+        Bbox(
+            height=0.0457,
+            width=0.0408,
+            top_left_x=0.2521,
+            top_left_y=0.2153,
+            class_name='Vehicle.Car',
+            class_idx=3,
+            object_id='MelbIIDU',
+            confidence=None,
+            ground_truth=True,
+            task_name='bboxes',
+            meta=None
+        ),
+        ...
+    ],
+    bitmasks=None,  # Optionally a list of Bitmask objects (List[Bitmask])
+    polygons=None,  # Optionally a list of Polygon objects (List[Polygon])
+    meta={
+        'video.data_duration': 120.0,
+        'video.data_fps': 1.0,
+        ...
+    }
+)
 ```

-For hafnia based datasets, we want to standardized how a dataset and dataset tasks are represented.
-We have defined a set of features that are common across all datasets in the Hafnia data library.
-
-- `image`: The image itself, stored as a PIL image
-- `height`: The height of the image in pixels
-- `width`: The width of the image in pixels
-- `[IMAGE_CLASSIFICATION_TASK]`: [Optional] Image classification tasks are top-level `ClassLabel` feature.
-  `ClassLabel` is a Hugging Face feature that maps class indices to class names.
-  In above example we have two classification tasks:
-  - `Weather`: Classifies the weather conditions in the image, with possible values `Clear` and `Foggy`
-  - `Surface Conditions`: Classifies the surface conditions in the image, with possible values `Dry` and `Wet`
-- `objects`: A dictionary containing information about objects in the image, including:
-  - `bbox`: Bounding boxes for each object, represented with a list of bounding box coordinates
-    `[xmin, ymin, bbox_width, bbox_height]`. Each bounding box is defined with a top-left corner coordinate
-    `(xmin, ymin)` and bounding box width and height `(bbox_width, bbox_height)` in pixels.
-  - `class_idx`: Class indices for each detected object. This is a
-    `ClassLabel` feature that maps to the `class_name` feature.
-  - `class_name`: Class names for each detected object
-  - `id`: Unique identifiers for each detected object
+To learn more, view and potentially execute the example script [examples/example_hafnia_dataset.py](examples/example_hafnia_dataset.py).

 ### Dataset Locally vs. Training-aaS
 An important feature of `load_dataset` is that it will return the full dataset
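
Taken together, the added README snippets compose into a short end-to-end flow. The sketch below combines only calls that appear in the added lines above; the dataset name and method names are as documented there:

```python
from hafnia.data import load_dataset
from hafnia.dataset.hafnia_dataset import Sample

# Download (if needed) and load the dataset in one go.
dataset = load_dataset("midwest-vehicle-detection")
dataset.print_stats()

# Restrict to the training split, then walk the samples.
dataset_train = dataset.create_split_dataset("train")
for sample_dict in dataset_train:
    sample = Sample(**sample_dict)         # structured access to annotations
    image = sample.read_image()            # the raw image
    annotated = sample.draw_annotations()  # image with annotations drawn
    break
```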
@@ -354,7 +464,7 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
 Create virtual environment and install python dependencies

 ```bash
-uv sync
+uv sync --dev
 ```

 Run tests: