active-vision 0.1.1__tar.gz → 0.3.0__tar.gz

@@ -1,7 +1,7 @@
  Metadata-Version: 2.2
  Name: active-vision
- Version: 0.1.1
- Summary: Active learning for edge vision.
+ Version: 0.3.0
+ Summary: Active learning for computer vision.
  Requires-Python: >=3.10
  Description-Content-Type: text/markdown
  License-File: LICENSE
@@ -17,10 +17,10 @@ Requires-Dist: timm>=1.0.13
  Requires-Dist: transformers>=4.48.0
  Requires-Dist: xinfer>=0.3.2
  
- ![Python Version](https://img.shields.io/badge/python-3.10%2B-blue?style=for-the-badge)
- ![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=for-the-badge)
- [![PyPI](https://img.shields.io/pypi/v/active-vision?style=for-the-badge)](https://pypi.org/project/active-vision/)
- ![Downloads](https://img.shields.io/pepy/dt/active-vision?style=for-the-badge&logo=pypi&logoColor=white&label=Downloads&color=purple)
+ [![Python Version](https://img.shields.io/badge/python-3.10%2B-blue?style=for-the-badge&logo=python&logoColor=white)](https://pypi.org/project/active-vision/)
+ [![PyPI](https://img.shields.io/pypi/v/active-vision?style=for-the-badge&logo=pypi&logoColor=white)](https://pypi.org/project/active-vision/)
+ [![Downloads](https://img.shields.io/pepy/dt/active-vision?style=for-the-badge&logo=pypi&logoColor=white&label=Downloads&color=purple)](https://pypi.org/project/active-vision/)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=for-the-badge&logo=apache&logoColor=white)](https://github.com/dnth/active-vision/blob/main/LICENSE)
  
  <p align="center">
  <img src="https://raw.githubusercontent.com/dnth/active-vision/main/assets/logo.png" alt="active-vision">
@@ -47,13 +47,14 @@ The goal of this project is to create a framework for the active learning loop f
  
  Uncertainty Sampling:
  - [X] Least confidence
- - [ ] Margin of confidence
- - [ ] Ratio of confidence
- - [ ] Entropy
+ - [X] Margin of confidence
+ - [X] Ratio of confidence
+ - [X] Entropy
  
  Diverse Sampling:
  - [X] Random sampling
- - [ ] Model-based outlier
+ - [X] Model-based outlier
+ - [ ] Embeddings-based outlier
  - [ ] Cluster-based
  - [ ] Representative
  
@@ -172,7 +173,7 @@ The active learning loop is a iterative process and can keep going until you hit
  - You hit a budget.
  - Other criteria.
  
- For this dataset,I decided to stop the active learning loop at 275 labeled images because the performance on the evaluation set is close to the top performing model on the leaderboard.
+ For this dataset, I decided to stop the active learning loop at 275 labeled images because the performance on the evaluation set exceeds the top performing model on the leaderboard.
  
  
  | #Labeled Images | Evaluation Accuracy | Train Epochs | Model | Active Learning | Source |
@@ -1,7 +1,7 @@
- ![Python Version](https://img.shields.io/badge/python-3.10%2B-blue?style=for-the-badge)
- ![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=for-the-badge)
- [![PyPI](https://img.shields.io/pypi/v/active-vision?style=for-the-badge)](https://pypi.org/project/active-vision/)
- ![Downloads](https://img.shields.io/pepy/dt/active-vision?style=for-the-badge&logo=pypi&logoColor=white&label=Downloads&color=purple)
+ [![Python Version](https://img.shields.io/badge/python-3.10%2B-blue?style=for-the-badge&logo=python&logoColor=white)](https://pypi.org/project/active-vision/)
+ [![PyPI](https://img.shields.io/pypi/v/active-vision?style=for-the-badge&logo=pypi&logoColor=white)](https://pypi.org/project/active-vision/)
+ [![Downloads](https://img.shields.io/pepy/dt/active-vision?style=for-the-badge&logo=pypi&logoColor=white&label=Downloads&color=purple)](https://pypi.org/project/active-vision/)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=for-the-badge&logo=apache&logoColor=white)](https://github.com/dnth/active-vision/blob/main/LICENSE)
  
  <p align="center">
  <img src="https://raw.githubusercontent.com/dnth/active-vision/main/assets/logo.png" alt="active-vision">
@@ -28,13 +28,14 @@ The goal of this project is to create a framework for the active learning loop f
  
  Uncertainty Sampling:
  - [X] Least confidence
- - [ ] Margin of confidence
- - [ ] Ratio of confidence
- - [ ] Entropy
+ - [X] Margin of confidence
+ - [X] Ratio of confidence
+ - [X] Entropy
  
  Diverse Sampling:
  - [X] Random sampling
- - [ ] Model-based outlier
+ - [X] Model-based outlier
+ - [ ] Embeddings-based outlier
  - [ ] Cluster-based
  - [ ] Representative
  
@@ -153,7 +154,7 @@ The active learning loop is a iterative process and can keep going until you hit
  - You hit a budget.
  - Other criteria.
  
- For this dataset,I decided to stop the active learning loop at 275 labeled images because the performance on the evaluation set is close to the top performing model on the leaderboard.
+ For this dataset, I decided to stop the active learning loop at 275 labeled images because the performance on the evaluation set exceeds the top performing model on the leaderboard.
  
  
  | #Labeled Images | Evaluation Accuracy | Train Epochs | Model | Active Learning | Source |
@@ -1,7 +1,7 @@
  [project]
  name = "active-vision"
- version = "0.1.1"
- description = "Active learning for edge vision."
+ version = "0.3.0"
+ description = "Active learning for computer vision."
  readme = "README.md"
  requires-python = ">=3.10"
  dependencies = [
@@ -0,0 +1,3 @@
+ __version__ = "0.3.0"
+
+ from .core import *
@@ -2,6 +2,8 @@ import pandas as pd
  from loguru import logger
  from fastai.vision.all import *
  import torch
+ import numpy as np
+ import bisect
  
  import warnings
  from typing import Callable
@@ -55,7 +57,6 @@ class ActiveLearner:
  learner_path: str = None,
  ):
  logger.info(f"Loading dataset from {filepath_col} and {label_col}")
- self.train_set = df.copy()
  
  logger.info("Creating dataloaders")
  self.dls = ImageDataLoaders.from_df(
@@ -84,6 +85,8 @@ class ActiveLearner:
  self.dls, self.model, metrics=accuracy
  ).to_fp16()
  
+ self.train_set = self.learn.dls.train_ds.items
+ self.valid_set = self.learn.dls.valid_ds.items
  self.class_names = self.dls.vocab
  self.num_classes = self.dls.c
  logger.info("Done. Ready to train.")
@@ -135,16 +138,24 @@ class ActiveLearner:
  """
  logger.info(f"Running inference on {len(filepaths)} samples")
  test_dl = self.dls.test_dl(filepaths, bs=batch_size)
- preds, _, cls_preds = self.learn.get_preds(dl=test_dl, with_decoded=True)
+
+ def identity(x):
+ return x
+
+ logits, _, class_idxs = self.learn.get_preds(
+ dl=test_dl, with_decoded=True, act=identity
+ )
  
  self.pred_df = pd.DataFrame(
  {
  "filepath": filepaths,
- "pred_label": [self.learn.dls.vocab[i] for i in cls_preds.numpy()],
- "pred_conf": torch.max(preds, dim=1)[0].numpy(),
- "pred_raw": preds.numpy().tolist(),
+ "pred_label": [self.learn.dls.vocab[i] for i in class_idxs.numpy()],
+ "pred_conf": torch.max(F.softmax(logits, dim=1), dim=1)[0].numpy(),
+ "probs": F.softmax(logits, dim=1).numpy().tolist(),
+ "logits": logits.numpy().tolist(),
  }
  )
+
  return self.pred_df
  
  def evaluate(
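Review note: the hunk above switches `get_preds` to return raw logits (via `act=identity`) and derives `probs` and `pred_conf` from them with `F.softmax`. The relationship between the three columns can be sketched with the standard library alone. This is a minimal illustration, not the package's code: the `softmax` helper is a hypothetical stand-in for `torch.nn.functional.softmax`, and the logits are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits (hypothetical helper)."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]            # made-up logits for a 3-class model
probs = softmax(logits)             # analogous to the "probs" column
pred_conf = max(probs)              # analogous to the "pred_conf" column
pred_idx = probs.index(pred_conf)   # index into the class vocabulary
```

Keeping the logits alongside the probabilities matters for the model-based outlier strategy later in this diff, which ranks raw logits rather than softmax outputs.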
@@ -189,38 +200,62 @@ class ActiveLearner:
  df = df[~df["filepath"].isin(self.train_set["filepath"])].copy()
  
  if strategy == "least-confidence":
- logger.info(f"Getting top {num_samples} low confidence samples")
-
- df.loc[:, "uncertainty_score"] = 1 - (df["pred_conf"]) / (
+ logger.info(
+ f"Using least confidence strategy to get top {num_samples} samples"
+ )
+ df.loc[:, "score"] = 1 - (df["pred_conf"]) / (
  self.num_classes - (self.num_classes - 1)
  )
  
- # Sort by descending uncertainty score
- uncertain_df = df.sort_values(by="uncertainty_score", ascending=False).head(
- num_samples
+ elif strategy == "margin-of-confidence":
+ logger.info(
+ f"Using margin of confidence strategy to get top {num_samples} samples"
  )
- return uncertain_df
+ if len(df["probs"].iloc[0]) < 2:
+ logger.error("probs has less than 2 elements")
+ raise ValueError("probs has less than 2 elements")
  
- # TODO: Implement margin of confidence strategy
- elif strategy == "margin-of-confidence":
- logger.error("Margin of confidence strategy not implemented")
- raise NotImplementedError("Margin of confidence strategy not implemented")
+ # Calculate uncertainty score as 1 - (difference between top two predictions)
+ df.loc[:, "score"] = df["probs"].apply(
+ lambda x: 1 - (np.sort(x)[-1] - np.sort(x)[-2])
+ )
  
- # TODO: Implement ratio of confidence strategy
  elif strategy == "ratio-of-confidence":
- logger.error("Ratio of confidence strategy not implemented")
- raise NotImplementedError("Ratio of confidence strategy not implemented")
+ logger.info(
+ f"Using ratio of confidence strategy to get top {num_samples} samples"
+ )
+ if len(df["probs"].iloc[0]) < 2:
+ logger.error("probs has less than 2 elements")
+ raise ValueError("probs has less than 2 elements")
+
+ # Calculate uncertainty score as ratio of top two predictions
+ df.loc[:, "score"] = df["probs"].apply(
+ lambda x: np.sort(x)[-2] / np.sort(x)[-1]
+ )
  
- # TODO: Implement entropy strategy
  elif strategy == "entropy":
- logger.error("Entropy strategy not implemented")
- raise NotImplementedError("Entropy strategy not implemented")
+ logger.info(f"Using entropy strategy to get top {num_samples} samples")
+
+ # Calculate uncertainty score as entropy of the prediction
+ df.loc[:, "score"] = df["probs"].apply(lambda x: -np.sum(x * np.log2(x)))
+
+ # Normalize the uncertainty score to be between 0 and 1 by dividing by log2 of the number of classes
+ df.loc[:, "score"] = df["score"] / np.log2(self.num_classes)
  
  else:
  logger.error(f"Unknown strategy: {strategy}")
  raise ValueError(f"Unknown strategy: {strategy}")
  
- def sample_diverse(self, df: pd.DataFrame, num_samples: int):
+ df = df[["filepath", "pred_label", "pred_conf", "score", "probs", "logits"]]
+
+ df["score"] = df["score"].map("{:.4f}".format)
+ df["pred_conf"] = df["pred_conf"].map("{:.4f}".format)
+
+ return df.sort_values(by="score", ascending=False).head(num_samples)
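Review note: the four uncertainty scores added in this hunk can be written as small pure functions over one probability vector, which makes their ranges easy to check. The sketch below is an illustration under stated assumptions, not the released code. The function names are hypothetical. The least-confidence version uses the common `n / (n - 1)` rescaling, which is an assumption of this sketch; the expression in the diff divides by `self.num_classes - (self.num_classes - 1)`, which evaluates to 1. The entropy version also skips zero probabilities, a guard the diff does not have.

```python
import math


def least_confidence(probs):
    """(1 - max prob), rescaled by n / (n - 1) so the score spans [0, 1]."""
    n = len(probs)
    return (1.0 - max(probs)) * n / (n - 1)


def margin_of_confidence(probs):
    """1 - (gap between the two largest probabilities)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def ratio_of_confidence(probs):
    """Second-largest probability divided by the largest."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top2 / top1


def normalized_entropy(probs):
    """Shannon entropy in bits, divided by log2(n) so the score spans [0, 1]."""
    # Zero probabilities are skipped: p * log2(p) tends to 0 as p tends to 0,
    # while log2(0) itself is undefined.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(len(probs))


uniform = [0.25, 0.25, 0.25, 0.25]    # maximally uncertain prediction
certain = [1.0, 0.0, 0.0, 0.0]        # fully confident prediction
scores = {
    name: (fn(uniform), fn(certain))
    for name, fn in [
        ("least-confidence", least_confidence),
        ("margin-of-confidence", margin_of_confidence),
        ("ratio-of-confidence", ratio_of_confidence),
        ("entropy", normalized_entropy),
    ]
}
```

In this formulation every score is highest for the uniform vector and lowest for the one-hot vector, which is consistent with the hunk sorting by `score` in descending order.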
+
+ def sample_diverse(
+ self, df: pd.DataFrame, num_samples: int, strategy: str = "model-based-outlier"
+ ):
  """
  Sample top `num_samples` diverse samples. Returns a df with filepaths and predicted labels, and confidence scores.
  
@@ -228,9 +263,63 @@ class ActiveLearner:
  - model-based-outlier: Get top `num_samples` samples with lowest activation of the model's last layer.
  - cluster-based: Get top `num_samples` samples with the highest distance to the nearest neighbor.
  - representative: Get top `num_samples` samples with the highest distance to the centroid of the training set.
+
  """
- logger.error("Diverse sampling strategy not implemented")
- raise NotImplementedError("Diverse sampling strategy not implemented")
+ # Remove samples that is already in the training set
+ df = df[~df["filepath"].isin(self.train_set["filepath"])].copy()
+
+ if strategy == "model-based-outlier":
+ logger.info(
+ f"Using model-based outlier strategy to get top {num_samples} samples"
+ )
+
+ # Get the activations for all items in the validation set.
+ valid_set_preds = self.predict(self.valid_set["filepath"].tolist())
+
+ # Store logits for each class in a list instead of dict
+ validation_class_logits = [
+ sorted(
+ valid_set_preds["logits"].apply(lambda x: x[i]).tolist(),
+ reverse=True,
+ )
+ for i in range(self.num_classes)
+ ]
+
+ # Get the logits for the unlabeled set
+ unlabeled_set_preds = self.predict(df["filepath"].tolist())
+
+ # For each element in the unlabeled set logits, compare it to the validation set ranked logits and get the position in the ranked logits
+ unlabeled_set_logits = []
+ for idx, row in unlabeled_set_preds.iterrows():
+ logits = row["logits"]
+ # For each class, find where this sample's logit would rank in the validation set
+ ranks = []
+ for class_idx in range(self.num_classes):
+ class_logit = logits[class_idx]
+ ranked_logits = validation_class_logits[
+ class_idx
+ ] # Access by index instead of dict key
+ # Find position where this logit would be inserted to maintain sorted order
+ # Now using bisect_left directly since logits are sorted high to low
+ rank = bisect.bisect_left(ranked_logits, class_logit)
+ ranks.append(
+ rank / len(ranked_logits)
+ ) # Normalize rank to 0-1 range
+
+ # Average rank across all classes - lower means more outlier-like
+ avg_rank = np.mean(ranks)
+ unlabeled_set_logits.append(avg_rank)
+
+ # Add outlier scores to dataframe
+ df.loc[:, "score"] = unlabeled_set_logits
+
+ df = df[["filepath", "pred_label", "pred_conf", "score", "probs", "logits"]]
+
+ df["score"] = df["score"].map("{:.4f}".format)
+ df["pred_conf"] = df["pred_conf"].map("{:.4f}".format)
+
+ # Sort by score ascending higher rank = more outlier-like compared to the validation set
+ return df.sort_values(by="score", ascending=False).head(num_samples)
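Review note: the model-based outlier strategy in this hunk scores each unlabeled sample by where its per-class logits rank among the validation-set logits. The sketch below shows one way to compute such a rank-based score; it is an illustration, not the released implementation. Python's `bisect` module expects its input list to be sorted in ascending order, so this version sorts ascending and counts how many validation logits lie above the sample, rather than reusing the descending lists from the diff. The function name and the toy data are hypothetical.

```python
import bisect


def outlier_score(sample_logits, validation_logits_by_class):
    """Mean fraction of validation logits that lie above the sample's logit.

    validation_logits_by_class[i] holds the validation logits for class i,
    sorted in ascending order (a requirement of the bisect module).
    Returns a value in [0, 1]; higher means lower activation than the
    validation set, i.e. more outlier-like.
    """
    fractions = []
    for class_idx, logit in enumerate(sample_logits):
        ranked = validation_logits_by_class[class_idx]
        # Number of validation logits less than or equal to this logit.
        at_or_below = bisect.bisect_right(ranked, logit)
        fractions.append(1.0 - at_or_below / len(ranked))
    return sum(fractions) / len(fractions)


# Toy validation logits for a 2-class model, sorted ascending per class.
validation = [sorted([3.0, 0.0, 2.0, 1.0]), sorted([1.0, 3.0, 0.0, 2.0])]

low_activation = outlier_score([-1.0, -1.0], validation)   # below every validation logit
high_activation = outlier_score([5.0, 5.0], validation)    # above every validation logit
in_between = outlier_score([1.5, 1.5], validation)         # in the middle of the range
```

With this orientation a higher score means lower activation relative to the validation set, so sorting in descending order surfaces the most outlier-like samples first.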
  
  def sample_random(self, df: pd.DataFrame, num_samples: int, seed: int = None):
  """
@@ -258,7 +347,7 @@ class ActiveLearner:
  return;
  }
  
- if (e.key === "ArrowUp" || e.key === "Enter") {
+ if (e.key === "ArrowUp") {
  document.getElementById("submit_btn").click();
  } else if (e.key === "ArrowRight") {
  document.getElementById("next_btn").click();
@@ -284,7 +373,7 @@ class ActiveLearner:
  type="filepath",
  label="Image",
  value=filepaths[0],
- height=500
+ height=510,
  )
  
  # Add bar plot with top 5 predictions
@@ -295,11 +384,11 @@ class ActiveLearner:
  title="Top 5 Predictions",
  x_lim=[0, 1],
  value=None
- if "pred_raw" not in df.columns
+ if "probs" not in df.columns
  else pd.DataFrame(
  {
  "class": self.class_names,
- "probability": df["pred_raw"].iloc[0],
+ "probability": df["probs"].iloc[0],
  }
  ).nlargest(5, "probability"),
  )
@@ -307,18 +396,27 @@ class ActiveLearner:
  filename = gr.Textbox(
  label="Filename", value=filepaths[0], interactive=False
  )
-
- pred_label = gr.Textbox(
- label="Predicted Label",
- value=df["pred_label"].iloc[0]
- if "pred_label" in df.columns
- else "",
- interactive=False,
- )
- pred_conf = gr.Textbox(
- label="Confidence",
- value=f"{df['pred_conf'].iloc[0]:.2%}"
- if "pred_conf" in df.columns
+ with gr.Row():
+ pred_label = gr.Textbox(
+ label="Predicted Label",
+ value=df["pred_label"].iloc[0]
+ if "pred_label" in df.columns
+ else "",
+ interactive=False,
+ )
+
+ pred_conf = gr.Textbox(
+ label="Confidence",
+ value=df["pred_conf"].iloc[0]
+ if "pred_conf" in df.columns
+ else "",
+ interactive=False,
+ )
+
+ sample_score = gr.Textbox(
+ label="Sample Score [0-1] - Indicates how informative the sample is. Higher means more informative.",
+ value=df["score"].iloc[0]
+ if "score" in df.columns
  else "",
  interactive=False,
  )
@@ -334,7 +432,7 @@ class ActiveLearner:
  with gr.Row():
  back_btn = gr.Button("← Previous", elem_id="back_btn")
  submit_btn = gr.Button(
- "Submit (↑/Enter)",
+ "Submit ",
  variant="primary",
  elem_id="submit_btn",
  )
@@ -344,8 +442,26 @@ class ActiveLearner:
  minimum=0,
  maximum=len(filepaths) - 1,
  value=0,
+ step=1,
  label="Progress",
- interactive=False,
+ interactive=True,
+ )
+
+ # Add event handler for slider changes
+ progress.change(
+ fn=lambda idx: navigate(idx, 0),
+ inputs=[progress],
+ outputs=[
+ filename,
+ image,
+ pred_label,
+ pred_conf,
+ category,
+ current_index,
+ progress,
+ pred_plot,
+ sample_score,
+ ],
  )
  
  finish_btn = gr.Button("Finish Labeling", variant="primary")
@@ -434,11 +550,11 @@ class ActiveLearner:
  if 0 <= next_idx < len(filepaths):
  plot_data = (
  None
- if "pred_raw" not in df.columns
+ if "probs" not in df.columns
  else pd.DataFrame(
  {
  "class": self.class_names,
- "probability": df["pred_raw"].iloc[next_idx],
+ "probability": df["probs"].iloc[next_idx],
  }
  ).nlargest(5, "probability")
  )
@@ -448,7 +564,7 @@ class ActiveLearner:
  df["pred_label"].iloc[next_idx]
  if "pred_label" in df.columns
  else "",
- f"{df['pred_conf'].iloc[next_idx]:.2%}"
+ df["pred_conf"].iloc[next_idx]
  if "pred_conf" in df.columns
  else "",
  df["pred_label"].iloc[next_idx]
@@ -457,14 +573,15 @@ class ActiveLearner:
  next_idx,
  next_idx,
  plot_data,
+ df["score"].iloc[next_idx] if "score" in df.columns else "",
  )
  plot_data = (
  None
- if "pred_raw" not in df.columns
+ if "probs" not in df.columns
  else pd.DataFrame(
  {
  "class": self.class_names,
- "probability": df["pred_raw"].iloc[current_idx],
+ "probability": df["probs"].iloc[current_idx],
  }
  ).nlargest(5, "probability")
  )
@@ -474,7 +591,7 @@ class ActiveLearner:
  df["pred_label"].iloc[current_idx]
  if "pred_label" in df.columns
  else "",
- f"{df['pred_conf'].iloc[current_idx]:.2%}"
+ df["pred_conf"].iloc[current_idx]
  if "pred_conf" in df.columns
  else "",
  df["pred_label"].iloc[current_idx]
@@ -483,6 +600,7 @@ class ActiveLearner:
  current_idx,
  current_idx,
  plot_data,
+ df["score"].iloc[current_idx] if "score" in df.columns else "",
  )
  
  def save_and_next(current_idx, selected_category):
@@ -490,21 +608,32 @@ class ActiveLearner:
  current_idx = int(current_idx)
  
  if selected_category is None:
- plot_data = None if "pred_raw" not in df.columns else pd.DataFrame(
- {
- "class": self.class_names,
- "probability": df["pred_raw"].iloc[current_idx],
- }
- ).nlargest(5, "probability")
+ plot_data = (
+ None
+ if "probs" not in df.columns
+ else pd.DataFrame(
+ {
+ "class": self.class_names,
+ "probability": df["probs"].iloc[current_idx],
+ }
+ ).nlargest(5, "probability")
+ )
  return (
  filepaths[current_idx],
  filepaths[current_idx],
- df["pred_label"].iloc[current_idx] if "pred_label" in df.columns else "",
- f"{df['pred_conf'].iloc[current_idx]:.2%}" if "pred_conf" in df.columns else "",
- df["pred_label"].iloc[current_idx] if "pred_label" in df.columns else None,
+ df["pred_label"].iloc[current_idx]
+ if "pred_label" in df.columns
+ else "",
+ df["pred_conf"].iloc[current_idx]
+ if "pred_conf" in df.columns
+ else "",
+ df["pred_label"].iloc[current_idx]
+ if "pred_label" in df.columns
+ else None,
  current_idx,
  current_idx,
  plot_data,
+ df["score"].iloc[current_idx] if "score" in df.columns else "",
  )
  
  # Save the current annotation
@@ -514,38 +643,58 @@ class ActiveLearner:
  # Move to next image if not at the end
  next_idx = current_idx + 1
  if next_idx >= len(filepaths):
- plot_data = None if "pred_raw" not in df.columns else pd.DataFrame(
- {
- "class": self.class_names,
- "probability": df["pred_raw"].iloc[current_idx],
- }
- ).nlargest(5, "probability")
+ plot_data = (
+ None
+ if "probs" not in df.columns
+ else pd.DataFrame(
+ {
+ "class": self.class_names,
+ "probability": df["probs"].iloc[current_idx],
+ }
+ ).nlargest(5, "probability")
+ )
  return (
  filepaths[current_idx],
  filepaths[current_idx],
- df["pred_label"].iloc[current_idx] if "pred_label" in df.columns else "",
- f"{df['pred_conf'].iloc[current_idx]:.2%}" if "pred_conf" in df.columns else "",
- df["pred_label"].iloc[current_idx] if "pred_label" in df.columns else None,
+ df["pred_label"].iloc[current_idx]
+ if "pred_label" in df.columns
+ else "",
+ df["pred_conf"].iloc[current_idx]
+ if "pred_conf" in df.columns
+ else "",
+ df["pred_label"].iloc[current_idx]
+ if "pred_label" in df.columns
+ else None,
  current_idx,
  current_idx,
  plot_data,
+ df["score"].iloc[current_idx] if "score" in df.columns else "",
  )
  
- plot_data = None if "pred_raw" not in df.columns else pd.DataFrame(
- {
- "class": self.class_names,
- "probability": df["pred_raw"].iloc[next_idx],
- }
- ).nlargest(5, "probability")
+ plot_data = (
+ None
+ if "probs" not in df.columns
+ else pd.DataFrame(
+ {
+ "class": self.class_names,
+ "probability": df["probs"].iloc[next_idx],
+ }
+ ).nlargest(5, "probability")
+ )
  return (
  filepaths[next_idx],
  filepaths[next_idx],
- df["pred_label"].iloc[next_idx] if "pred_label" in df.columns else "",
- f"{df['pred_conf'].iloc[next_idx]:.2%}" if "pred_conf" in df.columns else "",
- df["pred_label"].iloc[next_idx] if "pred_label" in df.columns else None,
+ df["pred_label"].iloc[next_idx]
+ if "pred_label" in df.columns
+ else "",
+ df["pred_conf"].iloc[next_idx] if "pred_conf" in df.columns else "",
+ df["pred_label"].iloc[next_idx]
+ if "pred_label" in df.columns
+ else None,
  next_idx,
  next_idx,
  plot_data,
+ df["score"].iloc[next_idx] if "score" in df.columns else "",
  )
  
  def convert_csv_to_parquet():
@@ -571,6 +720,7 @@ class ActiveLearner:
  current_index,
  progress,
  pred_plot,
+ sample_score,
  ],
  )
  
@@ -586,6 +736,7 @@ class ActiveLearner:
  current_index,
  progress,
  pred_plot,
+ sample_score,
  ],
  )
  
@@ -601,6 +752,7 @@ class ActiveLearner:
  current_index,
  progress,
  pred_plot,
+ sample_score,
  ],
  )
  
@@ -1,7 +1,7 @@
  Metadata-Version: 2.2
  Name: active-vision
- Version: 0.1.1
- Summary: Active learning for edge vision.
+ Version: 0.3.0
+ Summary: Active learning for computer vision.
  Requires-Python: >=3.10
  Description-Content-Type: text/markdown
  License-File: LICENSE
@@ -17,10 +17,10 @@ Requires-Dist: timm>=1.0.13
  Requires-Dist: transformers>=4.48.0
  Requires-Dist: xinfer>=0.3.2
  
- ![Python Version](https://img.shields.io/badge/python-3.10%2B-blue?style=for-the-badge)
- ![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=for-the-badge)
- [![PyPI](https://img.shields.io/pypi/v/active-vision?style=for-the-badge)](https://pypi.org/project/active-vision/)
- ![Downloads](https://img.shields.io/pepy/dt/active-vision?style=for-the-badge&logo=pypi&logoColor=white&label=Downloads&color=purple)
+ [![Python Version](https://img.shields.io/badge/python-3.10%2B-blue?style=for-the-badge&logo=python&logoColor=white)](https://pypi.org/project/active-vision/)
+ [![PyPI](https://img.shields.io/pypi/v/active-vision?style=for-the-badge&logo=pypi&logoColor=white)](https://pypi.org/project/active-vision/)
+ [![Downloads](https://img.shields.io/pepy/dt/active-vision?style=for-the-badge&logo=pypi&logoColor=white&label=Downloads&color=purple)](https://pypi.org/project/active-vision/)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-green.svg?style=for-the-badge&logo=apache&logoColor=white)](https://github.com/dnth/active-vision/blob/main/LICENSE)
  
  <p align="center">
  <img src="https://raw.githubusercontent.com/dnth/active-vision/main/assets/logo.png" alt="active-vision">
@@ -47,13 +47,14 @@ The goal of this project is to create a framework for the active learning loop f
  
  Uncertainty Sampling:
  - [X] Least confidence
- - [ ] Margin of confidence
- - [ ] Ratio of confidence
- - [ ] Entropy
+ - [X] Margin of confidence
+ - [X] Ratio of confidence
+ - [X] Entropy
  
  Diverse Sampling:
  - [X] Random sampling
- - [ ] Model-based outlier
+ - [X] Model-based outlier
+ - [ ] Embeddings-based outlier
  - [ ] Cluster-based
  - [ ] Representative
  
@@ -172,7 +173,7 @@ The active learning loop is a iterative process and can keep going until you hit
  - You hit a budget.
  - Other criteria.
  
- For this dataset,I decided to stop the active learning loop at 275 labeled images because the performance on the evaluation set is close to the top performing model on the leaderboard.
+ For this dataset, I decided to stop the active learning loop at 275 labeled images because the performance on the evaluation set exceeds the top performing model on the leaderboard.
  
  
  | #Labeled Images | Evaluation Accuracy | Train Epochs | Model | Active Learning | Source |
@@ -1,3 +0,0 @@
- __version__ = "0.1.1"
-
- from .core import *
File without changes
File without changes