hirundo 0.1.16__py3-none-any.whl → 0.1.21__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: hirundo
- Version: 0.1.16
+ Version: 0.1.21
  Summary: This package is used to interface with Hirundo's platform. It provides a simple API to optimize your ML datasets.
  Author-email: Hirundo <dev@hirundo.io>
  License: MIT License
@@ -13,7 +13,7 @@ License: MIT License
 
  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
- Project-URL: Homepage, https://github.com/Hirundo-io/hirundo-client
+ Project-URL: Homepage, https://github.com/Hirundo-io/hirundo-python-sdk
  Keywords: dataset,machine learning,data science,data engineering
  Classifier: License :: OSI Approved :: MIT License
  Classifier: Programming Language :: Python
@@ -32,6 +32,10 @@ Requires-Dist: httpx>=0.27.0
  Requires-Dist: stamina>=24.2.0
  Requires-Dist: httpx-sse>=0.4.0
  Requires-Dist: tqdm>=4.66.5
+ Requires-Dist: h11>=0.16.0
+ Requires-Dist: requests>=2.32.4
+ Requires-Dist: urllib3>=2.5.0
+ Requires-Dist: setuptools>=78.1.1
  Provides-Extra: dev
  Requires-Dist: pyyaml>=6.0.1; extra == "dev"
  Requires-Dist: types-PyYAML>=6.0.12; extra == "dev"
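The hunk above adds four minimum-version pins to the core requirements (`h11`, `requests`, `urllib3`, `setuptools`). As a rough sketch, an existing environment could be checked against those floors with the standard library alone. The helper names below are illustrative and not part of `hirundo`, and the version parsing is deliberately crude: it compares only the leading numeric release segments, so a pre-release such as `2.5.0rc1` is treated like `2.5.0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the Requires-Dist lines added in 0.1.21.
NEW_MINIMUMS = {
    "h11": "0.16.0",
    "requests": "2.32.4",
    "urllib3": "2.5.0",
    "setuptools": "78.1.1",
}


def release_tuple(text):
    """Turn '2.32.4' into (2, 32, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the suffix and the rest
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings on their numeric release segments."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=NEW_MINIMUMS):
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name}: {'missing' if ok is None else 'ok' if ok else 'too old'}")
```

For anything beyond a quick check, a real specifier parser (for example the third-party `packaging` library) would be the safer choice.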
@@ -46,69 +50,94 @@ Requires-Dist: stamina>=24.2.0; extra == "dev"
  Requires-Dist: httpx-sse>=0.4.0; extra == "dev"
  Requires-Dist: pytest>=8.2.0; extra == "dev"
  Requires-Dist: pytest-asyncio>=0.23.6; extra == "dev"
- Requires-Dist: uv>=0.5.8; extra == "dev"
+ Requires-Dist: uv>=0.8.6; extra == "dev"
  Requires-Dist: pre-commit>=3.7.1; extra == "dev"
  Requires-Dist: virtualenv>=20.6.6; extra == "dev"
- Requires-Dist: ruff>=0.11.6; extra == "dev"
+ Requires-Dist: ruff>=0.12.0; extra == "dev"
  Requires-Dist: bumpver; extra == "dev"
  Requires-Dist: platformdirs>=4.3.6; extra == "dev"
  Requires-Dist: safety>=3.2.13; extra == "dev"
+ Requires-Dist: cryptography>=44.0.1; extra == "dev"
+ Requires-Dist: jinja2>=3.1.6; extra == "dev"
  Provides-Extra: docs
  Requires-Dist: sphinx>=7.4.7; extra == "docs"
- Requires-Dist: sphinx-autobuild>=2024.4.16; extra == "docs"
+ Requires-Dist: sphinx-autobuild>=2024.9.3; extra == "docs"
  Requires-Dist: sphinx-click>=5.0.1; extra == "docs"
  Requires-Dist: autodoc_pydantic>=2.2.0; extra == "docs"
  Requires-Dist: furo; extra == "docs"
  Requires-Dist: sphinx-multiversion; extra == "docs"
  Requires-Dist: esbonio; extra == "docs"
- Requires-Dist: starlette>0.40.0; extra == "docs"
+ Requires-Dist: starlette>=0.47.2; extra == "docs"
  Requires-Dist: markupsafe>=3.0.2; extra == "docs"
+ Requires-Dist: jinja2>=3.1.6; extra == "docs"
  Provides-Extra: pandas
- Requires-Dist: pandas>=2.2.2; extra == "pandas"
+ Requires-Dist: pandas>=2.2.3; extra == "pandas"
  Provides-Extra: polars
  Requires-Dist: polars>=1.0.0; extra == "polars"
  Dynamic: license-file
 
  # Hirundo
 
- This package exposes access to Hirundo APIs for dataset optimization for Machine Learning.
-
- Dataset optimization is currently available for datasets labelled for classification and object detection.
+ This package exposes access to Hirundo APIs for dataset QA for Machine Learning.
 
+ Dataset QA is currently available for datasets labelled for classification and object detection.
 
  Support dataset storage configs include:
- - Google Cloud (GCP) Storage
- - Amazon Web Services (AWS) S3
- - Git LFS (Large File Storage) repositories (e.g. GitHub or HuggingFace)
+
+ - Google Cloud (GCP) Storage
+ - Amazon Web Services (AWS) S3
+ - Git LFS (Large File Storage) repositories (e.g. GitHub or HuggingFace)
+
+ Note: This Python package must be used alongside a Hirundo server, either the SaaS platform, a custom VPC deployment or an on-premises installation.
 
  Optimizing a classification dataset
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
- Currently ``hirundo`` requires a CSV file with the following columns (all columns are required):
- - ``image_path``: The location of the image within the dataset ``root``
- - ``label``: The label of the image, i.e. which the class that was annotated for this image
+ Currently `hirundo` requires a CSV file with the following columns (all columns are required):
+
+ - `image_path`: The location of the image within the dataset `data_root_url`
+ - `class_name`: The semantic label, i.e. the class name of the class that the image was annotated as belonging to
+
+ And outputs two Pandas DataFrames with the dataset columns as well as:
+
+ Suspect DataFrame (filename: `mislabel_suspects.csv`) columns:
 
- And outputs a CSV with the same columns and:
- - ``suspect_level``: mislabel suspect level
- - ``suggested_label``: suggested label
- - ``suggested_label_conf``: suggested label confidence
+ - ``suspect_score``: mislabel suspect score
+ - ``suspect_level``: mislabel suspect level
+ - ``suspect_rank``: mislabel suspect ranking
+ - ``suggested_class_name``: suggested semantic label
+ - ``suggested_class_conf``: suggested semantic label confidence
+
+ Errors and warnings DataFrame (filename: `invalid_data.csv`) columns:
+
+ - ``status``: status message (one of ``NO_LABELS`` / ``MISSING_IMAGE`` / ``INVALID_IMAGE``)
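The added README lines above rename the classification input columns to `image_path` and `class_name`. A minimal sketch of producing a CSV in that shape with the standard library follows. The function name and the sample paths are made up for illustration, and nothing here calls the Hirundo SDK or server.

```python
import csv
import io

# Column names taken from the 0.1.21 README text in the hunk above.
REQUIRED_COLUMNS = ("image_path", "class_name")


def build_classification_csv(rows):
    """Serialize (image_path, class_name) pairs into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for image_path, class_name in rows:
        # Both columns are documented as required, so reject empty values early.
        if not image_path or not class_name:
            raise ValueError(f"incomplete row: {(image_path, class_name)!r}")
        writer.writerow({"image_path": image_path, "class_name": class_name})
    return buffer.getvalue()


# Hypothetical sample rows; paths are relative to the dataset's data_root_url.
csv_text = build_classification_csv(
    [("train/0001.png", "apple"), ("train/0002.png", "bicycle")]
)
print(csv_text)
```

A file written under the 0.1.16 layout would need its `label` column renamed to `class_name` to match this header.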
 
  Optimizing an object detection (OD) dataset
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
  Currently ``hirundo`` requires a CSV file with the following columns (all columns are required):
- - ``image_path``: The location of the image within the dataset ``root``
- - ``bbox_id``: The index of the bounding box within the dataset. Used to indicate label suspects
- - ``label``: The label of the image, i.e. which the class that was annotated for this image
- - ``x1``, ``y1``, ``x2``, ``y2``: The bounding box coordinates of the object within the image
 
- And outputs a CSV with the same columns and:
- - ``suspect_level``: object mislabel suspect level
- - ``suggested_label``: suggested object label
- - ``suggested_label_conf``: suggested object label confidence
+ - ``image_path``: The location of the image within the dataset ``data_root_url``
+ - ``object_id``: The ID of the bounding box within the dataset. Used to indicate object suspects
+ - ``class_name``: Object semantic label, i.e. the class name of the object that was annotated
+ - ``xmin``: leftmost horizontal pixel coordinate of the object's bounding box
+ - ``ymin``: uppermost vertical pixel coordinate of the object's bounding box
+ - ``xmax``: rightmost horizontal pixel coordinate of the object's bounding box
+ - ``ymax``: lowermost vertical pixel coordinate of the object's bounding box
 
- Note: This Python package must be used alongside a Hirundo server, either the SaaS platform, a custom VPC deployment or an on-premises installation.
 
+ And outputs two Pandas DataFrames with the dataset columns as well as:
+
+ Suspect DataFrame (filename: `mislabel_suspects.csv`) columns:
+
+ - ``suspect_score``: object mislabel suspect score
+ - ``suspect_level``: object mislabel suspect level
+ - ``suspect_rank``: object mislabel suspect ranking
+ - ``suggested_class_name``: suggested object semantic label
+ - ``suggested_class_conf``: suggested object semantic label confidence
+
+ Errors and warnings DataFrame (filename: `invalid_data.csv`) columns:
+ - ``status``: status message (one of ``NO_LABELS`` / ``MISSING_IMAGE`` / ``INVALID_IMAGE`` / ``INVALID_BBOX`` / ``INVALID_BBOX_SIZE``)
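The object detection columns above replace `x1`/`y1`/`x2`/`y2` with `xmin`/`ymin`/`xmax`/`ymax`, and the new `status` values include `INVALID_BBOX`. The sketch below is one possible local pre-check of a CSV in the new layout before upload. It is an assumption about what a reasonable sanity check looks like (numeric coordinates, `xmin < xmax`, `ymin < ymax`) and does not claim to reproduce the server-side rules behind `INVALID_BBOX` or `INVALID_BBOX_SIZE`. The function name and sample data are hypothetical.

```python
import csv
import io

# Column names taken from the 0.1.21 README text in the hunk above.
OD_COLUMNS = ("image_path", "object_id", "class_name", "xmin", "ymin", "xmax", "ymax")


def find_bad_boxes(csv_text):
    """Return (object_id, reason) pairs for rows that fail a basic box check."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in OD_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    problems = []
    for row in reader:
        try:
            xmin, ymin, xmax, ymax = (
                float(row[key]) for key in ("xmin", "ymin", "xmax", "ymax")
            )
        except (TypeError, ValueError):
            # TypeError covers short rows, where DictReader fills in None.
            problems.append((row["object_id"], "non-numeric coordinate"))
            continue
        # Pixel coordinates grow rightwards and downwards, so min must be below max.
        if xmin >= xmax or ymin >= ymax:
            problems.append((row["object_id"], "empty or inverted box"))
    return problems


# Hypothetical sample: the second box has xmin > xmax.
sample = (
    "image_path,object_id,class_name,xmin,ymin,xmax,ymax\n"
    "val/a.jpg,0,car,10,20,110,90\n"
    "val/a.jpg,1,bus,50,40,30,80\n"
)
print(find_bad_boxes(sample))
```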
 
  ## Installation
 
@@ -117,11 +146,12 @@ You can install the codebase with a simple `pip install hirundo` to install the
  ## Usage
 
  Classification example:
+
  ```python
  from hirundo import (
  HirundoCSV,
  LabelingType,
- OptimizationDataset,
+ QADataset,
  StorageGCP,
  StorageConfig,
  StorageTypes,
@@ -132,7 +162,7 @@ gcp_bucket = StorageGCP(
  project="Hirundo-global",
  credentials_json=json.loads(os.environ["GCP_CREDENTIALS"]),
  )
- test_dataset = OptimizationDataset(
+ test_dataset = QADataset(
  name="TEST-GCP cifar 100 classification dataset",
  labeling_type=LabelingType.SINGLE_LABEL_CLASSIFICATION,
  storage_config=StorageConfig(
@@ -147,12 +177,11 @@ test_dataset = OptimizationDataset(
  classes=cifar100_classes,
  )
 
- test_dataset.run_optimization()
+ test_dataset.run_qa()
  results = test_dataset.check_run()
  print(results)
  ```
 
-
  Object detection example:
 
  ```python
@@ -160,7 +189,7 @@ from hirundo import (
  GitRepo,
  HirundoCSV,
  LabelingType,
- OptimizationDataset,
+ QADataset,
  StorageGit,
  StorageConfig,
  StorageTypes,
@@ -173,7 +202,7 @@ git_storage = StorageGit(
  ),
  branch="main",
  )
- test_dataset = OptimizationDataset(
+ test_dataset = QADataset(
  name="TEST-HuggingFace-BDD-100k-validation-OD-validation-dataset",
  labeling_type=LabelingType.OBJECT_DETECTION,
  storage_config=StorageConfig(
@@ -187,30 +216,15 @@ test_dataset = OptimizationDataset(
  path="/BDD100K Val from Hirundo.zip/bdd100k/bdd100k.csv"
  ),
  ),
- classes=[
- "traffic light",
- "traffic sign",
- "car",
- "pedestrian",
- "bus",
- "truck",
- "rider",
- "bicycle",
- "motorcycle",
- "train",
- "other vehicle",
- "other person",
- "trailer",
- ],
  )
 
- test_dataset.run_optimization()
+ test_dataset.run_qa()
  results = test_dataset.check_run()
  print(results)
  ```
 
- Note: Currently we only support the main CPython release 3.9, 3.10 and 3.11. PyPy support may be introduced in the future.
+ Note: Currently we only support the main CPython release 3.9, 3.10, 3.11, 3.12 & 3.13. PyPy support may be introduced in the future.
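The updated note widens the supported interpreters to CPython 3.9 through 3.13. A short sketch of checking the running interpreter against that list is given below. The set of versions is copied from the note, while the function name is illustrative and not something the package is known to provide.

```python
import platform
import sys

# Minor versions listed in the 0.1.21 README note.
SUPPORTED_MINORS = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}


def is_supported(version_info=sys.version_info, implementation=None):
    """True when the interpreter is CPython and its major.minor is in the README list."""
    impl = implementation or platform.python_implementation()
    return impl == "CPython" and tuple(version_info[:2]) in SUPPORTED_MINORS


if __name__ == "__main__":
    print("supported" if is_supported() else "unsupported interpreter")
```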
 
  ## Further documentation
 
- To learn more about how to use this library, please visit the [http://docs.hirundo.io/](documentation) or see the Google Colab examples.
+ To learn more about how to use this library, please visit the [http://docs.hirundo.io/](documentation) or see the [Google Colab examples](https://github.com/Hirundo-io/hirundo-python-sdk/tree/main/notebooks).
@@ -0,0 +1,25 @@
+ hirundo/__init__.py,sha256=GxRK_DHPKG1aqxNa19imqspHRAvBHSAQ5Q0fDwJPCDE,1341
+ hirundo/__main__.py,sha256=wcCrL4PjG51r5wVKqJhcoJPTLfHW0wNbD31DrUN0MWI,28
+ hirundo/_constraints.py,sha256=slW7Rk9Ml5fuwjnXTLUvHIhnY_9hmcUUy57v9hFog1o,6003
+ hirundo/_dataframe.py,sha256=sXEEbCNcLi83wyU9ii884YikCzfASo_3nnrDxhuCv7U,758
+ hirundo/_env.py,sha256=efX2sjvYlHkFr2Lcstelei67YSTFpVGT0l08ZsfiMuE,622
+ hirundo/_headers.py,sha256=Cwha8gXEQNXL2lc9Lb1klLotkMLD82XOpAdX33TLVj8,521
+ hirundo/_http.py,sha256=0kfoznumU3jinHhJIpB6qn5Mt4a3kso59GNXVbpWH7M,2267
+ hirundo/_iter_sse_retrying.py,sha256=xNpf3W5qAHkKPJz8H4NZjKE3CrI_8b3m1iYeahdpdEc,4653
+ hirundo/_timeouts.py,sha256=gE58NU0t2e4KgKq2sk5rZcezDJAkgvRIbM5AVYFY6Ho,86
+ hirundo/_urls.py,sha256=0C85EbL0T-Bj25vJwjNs_obUG8ROSADpmbFdTAyhzlw,1375
+ hirundo/cli.py,sha256=u-LsrN17-J7temjrq6NeUGnJ4mO04tMCiQYqVMm6el8,7752
+ hirundo/dataset_enum.py,sha256=QnS3fy1OF4wvUtiIAHubKRhc611idS8huopEEolgqEM,1217
+ hirundo/dataset_qa.py,sha256=U7cqV4JbYkaByXEf2XdoJrQZ_rI9pgDxrXVbQLc50R8,32470
+ hirundo/dataset_qa_results.py,sha256=1F7JhRf7TQomwW9tjbNn8OBrhWHwEaWOND80r39l5uY,1104
+ hirundo/git.py,sha256=cBjP7kPnaUHR77FI5ZaERst38eTUDy8q1gAQzy45EB4,6567
+ hirundo/labeling.py,sha256=zXQCaqfdaLIG4qbzFGbb94L3FDdRMpdzHwbrDJE07Yk,5006
+ hirundo/logger.py,sha256=MUqrYp0fBlxWFhGl6P5t19_uqO7T_PNhrLN5bqY3i7s,275
+ hirundo/storage.py,sha256=MPKxkhrBmX84Yuexd4QoLDdVIJHrll9RosCLUsz5q3c,15936
+ hirundo/unzip.py,sha256=3aPOsBvF-ZgAumHnQ6hq7JtbFUe9eRRRFsiI6K8cRDE,8188
+ hirundo-0.1.21.dist-info/licenses/LICENSE,sha256=fusGGjqT2RGlU6kbkaOk7d-gDnsjk17wq67AO0mwBZI,1065
+ hirundo-0.1.21.dist-info/METADATA,sha256=3m7R5dMN5h_C-L2Wl76lzYjpreP5upyHcEkIoAZF1lY,9497
+ hirundo-0.1.21.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ hirundo-0.1.21.dist-info/entry_points.txt,sha256=4ZtnA_Nl1Af8fLnHp3lwjbGDEGU1S6ujb_JwtuQ7ZPM,44
+ hirundo-0.1.21.dist-info/top_level.txt,sha256=cmyNqrNZOAYxnywJGFI1AJBLe4SkH8HGsfFx6ncdrbI,8
+ hirundo-0.1.21.dist-info/RECORD,,
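Each RECORD entry above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with trailing `=` padding removed, and the RECORD file lists itself with both fields empty. A minimal sketch of recomputing and checking such an entry is shown below. The helper names are illustrative, the payload is synthetic rather than a real file from the wheel, and the simple `rsplit` parsing assumes paths without quoted commas (a full reader would use the `csv` module).

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Hash bytes the way wheel RECORD files do: sha256, urlsafe base64, no '=' padding."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_record_line(line):
    """Split 'path,hash,size' from the right; size is None when the field is empty."""
    path, hash_field, size = line.rsplit(",", 2)
    return path, hash_field, int(size) if size else None


def matches_record(line, data):
    """Check file contents against one RECORD entry."""
    _, hash_field, size = parse_record_line(line)
    if not hash_field:
        # RECORD lists itself with empty hash and size fields; nothing to verify.
        return True
    return hash_field == record_digest(data) and size == len(data)


# Synthetic entry for demonstration; not a real file from the hirundo wheel.
payload = b"hello\n"
entry = f"demo/hello.txt,{record_digest(payload)},{len(payload)}"
print(entry, matches_record(entry, payload))
```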
@@ -1,5 +1,5 @@
  Wheel-Version: 1.0
- Generator: setuptools (79.0.1)
+ Generator: setuptools (80.9.0)
  Root-Is-Purelib: true
  Tag: py3-none-any
 
@@ -1,23 +0,0 @@
- hirundo/__init__.py,sha256=qKC89bNReZSjGtmf7l3PZD2JoptyVphpsD0Kf2PNXvY,1035
- hirundo/__main__.py,sha256=wcCrL4PjG51r5wVKqJhcoJPTLfHW0wNbD31DrUN0MWI,28
- hirundo/_constraints.py,sha256=gRv7fXwtjPGqYWIhkVYxu1B__3PdlYRqFyDkTpa9f74,1032
- hirundo/_dataframe.py,sha256=sXEEbCNcLi83wyU9ii884YikCzfASo_3nnrDxhuCv7U,758
- hirundo/_env.py,sha256=efX2sjvYlHkFr2Lcstelei67YSTFpVGT0l08ZsfiMuE,622
- hirundo/_headers.py,sha256=3hybpD_X4SODv3cFZPt9AjGY2vvZaag5OKT3z1SHSjA,521
- hirundo/_http.py,sha256=izlnuxStyPugjTAbD8Lo30tA4lZJ5d3kOENNduqrbX4,573
- hirundo/_iter_sse_retrying.py,sha256=U331_wZRIbVzi-jnMqo8bp9jBC8MtFBLEs-X0ZvhSDw,4634
- hirundo/_timeouts.py,sha256=gE58NU0t2e4KgKq2sk5rZcezDJAkgvRIbM5AVYFY6Ho,86
- hirundo/cli.py,sha256=5Tn0eXZGG92BR9HJYUaYozjFbS1t6UTw_I2R0tZBE04,7824
- hirundo/dataset_enum.py,sha256=ZEYBP-lrlVqfNWptlmw7JgLNhCyDirtWWPtoMvtg2AE,531
- hirundo/dataset_optimization.py,sha256=jR4ZOlKKl05jrA4cq9L1IQuKVPJ3ytXkhOJEg6efFqI,31390
- hirundo/dataset_optimization_results.py,sha256=A9YyF5zaZXVtzeDE08I_05v90dhZQADpSjDcS_6eLMc,1129
- hirundo/git.py,sha256=6h1hFPlw5FfYMGWXPCitnTqGICmBKmQtb5qKGe3Icmk,6580
- hirundo/logger.py,sha256=MUqrYp0fBlxWFhGl6P5t19_uqO7T_PNhrLN5bqY3i7s,275
- hirundo/storage.py,sha256=kO-LWlQAM3qTnALEl8s79AiFMYqCG9Sem4MIFQcyvAg,15950
- hirundo/unzip.py,sha256=XJqvt2m5pWR-G-fnzgW75VOdd-K4_Rw2r4wiEhZgKZA,8245
- hirundo-0.1.16.dist-info/licenses/LICENSE,sha256=fusGGjqT2RGlU6kbkaOk7d-gDnsjk17wq67AO0mwBZI,1065
- hirundo-0.1.16.dist-info/METADATA,sha256=CxdCbzafRuVRf1BGsS_tgjodO0g745uuNBl7y4UFMj8,8501
- hirundo-0.1.16.dist-info/WHEEL,sha256=SmOxYU7pzNKBqASvQJ7DjX3XGUF92lrGhMb3R6_iiqI,91
- hirundo-0.1.16.dist-info/entry_points.txt,sha256=4ZtnA_Nl1Af8fLnHp3lwjbGDEGU1S6ujb_JwtuQ7ZPM,44
- hirundo-0.1.16.dist-info/top_level.txt,sha256=cmyNqrNZOAYxnywJGFI1AJBLe4SkH8HGsfFx6ncdrbI,8
- hirundo-0.1.16.dist-info/RECORD,,