Perception 0.7.5__tar.gz → 0.7.6__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (60)
  1. {perception-0.7.5 → perception-0.7.6}/PKG-INFO +1 -1
  2. perception-0.7.6/perception/__init__.py +3 -0
  3. {perception-0.7.5 → perception-0.7.6}/perception/benchmarking/common.py +5 -7
  4. {perception-0.7.5 → perception-0.7.6}/perception/benchmarking/image.py +4 -5
  5. {perception-0.7.5 → perception-0.7.6}/perception/benchmarking/video.py +2 -2
  6. {perception-0.7.5 → perception-0.7.6}/perception/benchmarking/video_transforms.py +6 -8
  7. {perception-0.7.5 → perception-0.7.6}/perception/experimental/ann/index.py +13 -10
  8. {perception-0.7.5 → perception-0.7.6}/perception/experimental/ann/serve.py +2 -3
  9. {perception-0.7.5 → perception-0.7.6}/perception/experimental/approximate_deduplication.py +8 -8
  10. {perception-0.7.5 → perception-0.7.6}/perception/experimental/debug.py +1 -2
  11. {perception-0.7.5 → perception-0.7.6}/perception/experimental/local_descriptor_deduplication.py +35 -37
  12. {perception-0.7.5 → perception-0.7.6}/perception/hashers/hasher.py +14 -19
  13. {perception-0.7.5 → perception-0.7.6}/perception/hashers/tools.py +26 -29
  14. {perception-0.7.5 → perception-0.7.6}/perception/hashers/video/framewise.py +3 -7
  15. {perception-0.7.5 → perception-0.7.6}/perception/hashers/video/scenes.py +5 -8
  16. {perception-0.7.5 → perception-0.7.6}/perception/hashers/video/tmk.py +10 -11
  17. {perception-0.7.5 → perception-0.7.6}/perception/testing/__init__.py +2 -2
  18. {perception-0.7.5 → perception-0.7.6}/perception/tools.py +22 -30
  19. {perception-0.7.5 → perception-0.7.6}/pyproject.toml +1 -1
  20. {perception-0.7.5 → perception-0.7.6}/setup.py +1 -1
  21. perception-0.7.5/perception/__init__.py +0 -1
  22. {perception-0.7.5 → perception-0.7.6}/LICENSE +0 -0
  23. {perception-0.7.5 → perception-0.7.6}/README.md +0 -0
  24. {perception-0.7.5 → perception-0.7.6}/build.py +0 -0
  25. {perception-0.7.5 → perception-0.7.6}/perception/benchmarking/__init__.py +0 -0
  26. {perception-0.7.5 → perception-0.7.6}/perception/benchmarking/extensions.pyx +0 -0
  27. {perception-0.7.5 → perception-0.7.6}/perception/benchmarking/image_transforms.py +0 -0
  28. {perception-0.7.5 → perception-0.7.6}/perception/experimental/__init__.py +0 -0
  29. {perception-0.7.5 → perception-0.7.6}/perception/experimental/ann/__init__.py +0 -0
  30. {perception-0.7.5 → perception-0.7.6}/perception/extensions.pyx +0 -0
  31. {perception-0.7.5 → perception-0.7.6}/perception/hashers/__init__.py +0 -0
  32. {perception-0.7.5 → perception-0.7.6}/perception/hashers/image/__init__.py +0 -0
  33. {perception-0.7.5 → perception-0.7.6}/perception/hashers/image/average.py +0 -0
  34. {perception-0.7.5 → perception-0.7.6}/perception/hashers/image/dhash.py +0 -0
  35. {perception-0.7.5 → perception-0.7.6}/perception/hashers/image/opencv.py +0 -0
  36. {perception-0.7.5 → perception-0.7.6}/perception/hashers/image/pdq.py +0 -0
  37. {perception-0.7.5 → perception-0.7.6}/perception/hashers/image/phash.py +0 -0
  38. {perception-0.7.5 → perception-0.7.6}/perception/hashers/image/wavelet.py +0 -0
  39. {perception-0.7.5 → perception-0.7.6}/perception/hashers/video/__init__.py +0 -0
  40. {perception-0.7.5 → perception-0.7.6}/perception/py.typed +0 -0
  41. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/README.md +0 -0
  42. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image1.jpg +0 -0
  43. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image10.jpg +0 -0
  44. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image2.jpg +0 -0
  45. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image3.jpg +0 -0
  46. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image4.jpg +0 -0
  47. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image5.jpg +0 -0
  48. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image6.jpg +0 -0
  49. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image7.jpg +0 -0
  50. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image8.jpg +0 -0
  51. {perception-0.7.5 → perception-0.7.6}/perception/testing/images/image9.jpg +0 -0
  52. {perception-0.7.5 → perception-0.7.6}/perception/testing/logos/README.md +0 -0
  53. {perception-0.7.5 → perception-0.7.6}/perception/testing/logos/logoipsum.png +0 -0
  54. {perception-0.7.5 → perception-0.7.6}/perception/testing/videos/README.md +0 -0
  55. {perception-0.7.5 → perception-0.7.6}/perception/testing/videos/expected_tmk.json.gz +0 -0
  56. {perception-0.7.5 → perception-0.7.6}/perception/testing/videos/rgb.m4v +0 -0
  57. {perception-0.7.5 → perception-0.7.6}/perception/testing/videos/v1.m4v +0 -0
  58. {perception-0.7.5 → perception-0.7.6}/perception/testing/videos/v2.m4v +0 -0
  59. {perception-0.7.5 → perception-0.7.6}/perception/testing/videos/v2s.mov +0 -0
  60. {perception-0.7.5 → perception-0.7.6}/perception/utils.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: Perception
3
- Version: 0.7.5
3
+ Version: 0.7.6
4
4
  Summary: Perception provides flexible, well-documented, and comprehensively tested tooling for perceptual hashing research, development, and production use.
5
5
  License: Apache-2.0
6
6
  Author: Thorn
@@ -0,0 +1,3 @@
1
+ from importlib import metadata
2
+
3
+ __version__ = metadata.version("perception")
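The new `perception/__init__.py` reads the version from the installed distribution's metadata rather than from a hard-coded string. A minimal sketch of that pattern follows. The `get_version` helper and its fallback value are illustrative assumptions; the diff itself calls `metadata.version` directly and shows no fallback.

```python
from importlib import metadata


def get_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed version of ``dist_name``, or ``default``.

    Hypothetical helper, not part of the package.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Raised when the distribution is not installed in this environment.
        return default


# Prints the installed version, or "unknown" if perception is absent.
print(get_version("perception"))
```

Without a fallback like this, importing the package from a source checkout that has not been installed would likely raise `PackageNotFoundError`.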
@@ -3,12 +3,10 @@ import logging
3
3
  import os
4
4
  import shutil
5
5
  import tempfile
6
- import typing
7
6
  import uuid
8
7
  import warnings
9
8
  import zipfile
10
9
  from abc import ABC
11
- from typing import Optional
12
10
 
13
11
  import matplotlib.pyplot as plt
14
12
  import numpy as np
@@ -101,7 +99,7 @@ def compute_threshold_precision_recall(pos, neg, precision_threshold=99.9):
101
99
 
102
100
  class Filterable(ABC):
103
101
  _df: pd.DataFrame
104
- expected_columns: typing.List
102
+ expected_columns: list
105
103
 
106
104
  def __init__(self, df):
107
105
  assert sorted(df.columns) == sorted(
@@ -135,7 +133,7 @@ class Saveable(Filterable):
135
133
  def load(
136
134
  cls,
137
135
  path_to_zip_or_directory: str,
138
- storage_dir: Optional[str] = None,
136
+ storage_dir: str | None = None,
139
137
  verify_md5=True,
140
138
  ):
141
139
  """Load a dataset from a ZIP file or directory.
@@ -311,7 +309,7 @@ class BenchmarkHashes(Filterable):
311
309
 
312
310
  def __init__(self, df: pd.DataFrame):
313
311
  super().__init__(df)
314
- self._metrics: Optional[pd.DataFrame] = None
312
+ self._metrics: pd.DataFrame | None = None
315
313
 
316
314
  def __add__(self, other):
317
315
  return BenchmarkHashes(df=pd.concat([self._df, other._df]).drop_duplicates())
@@ -327,7 +325,7 @@ class BenchmarkHashes(Filterable):
327
325
  self._df.to_csv(filepath, index=False)
328
326
 
329
327
  def compute_metrics(
330
- self, custom_distance_metrics: Optional[dict] = None
328
+ self, custom_distance_metrics: dict | None = None
331
329
  ) -> pd.DataFrame:
332
330
  if self._metrics is not None:
333
331
  return self._metrics
@@ -610,7 +608,7 @@ class BenchmarkDataset(Saveable):
610
608
  expected_columns = ["filepath", "category"]
611
609
 
612
610
  @classmethod
613
- def from_tuples(cls, files: typing.List[typing.Tuple[str, str]]):
611
+ def from_tuples(cls, files: list[tuple[str, str]]):
614
612
  """Build dataset from a set of files.
615
613
 
616
614
  Args:
@@ -1,6 +1,5 @@
1
1
  import logging
2
2
  import os
3
- import typing
4
3
  import uuid
5
4
  import warnings
6
5
 
@@ -19,7 +18,7 @@ log = logging.getLogger(__name__)
19
18
 
20
19
  class BenchmarkImageTransforms(BenchmarkTransforms):
21
20
  def compute_hashes(
22
- self, hashers: typing.Dict[str, ImageHasher], max_workers: int = 5
21
+ self, hashers: dict[str, ImageHasher], max_workers: int = 5
23
22
  ) -> BenchmarkHashes:
24
23
  """Compute hashes for a series of files given some set of hashers.
25
24
 
@@ -86,7 +85,7 @@ class BenchmarkImageTransforms(BenchmarkTransforms):
86
85
  class BenchmarkImageDataset(BenchmarkDataset):
87
86
  def deduplicate(
88
87
  self, hasher: ImageHasher, threshold=0.001, isometric=False
89
- ) -> typing.Tuple["BenchmarkImageDataset", typing.Set[typing.Tuple[str, str]]]:
88
+ ) -> tuple["BenchmarkImageDataset", set[tuple[str, str]]]:
90
89
  """Remove duplicate files from dataset.
91
90
 
92
91
  Args:
@@ -99,7 +98,7 @@ class BenchmarkImageDataset(BenchmarkDataset):
99
98
  A list where each entry is a list of files that are
100
99
  duplicates of each other. We keep only the last entry.
101
100
  """
102
- pairs: typing.Set[typing.Tuple[str, str]] = set()
101
+ pairs: set[tuple[str, str]] = set()
103
102
  for _, group in tqdm(
104
103
  self._df.groupby(["category"]), desc="Deduplicating categories."
105
104
  ):
@@ -120,7 +119,7 @@ class BenchmarkImageDataset(BenchmarkDataset):
120
119
 
121
120
  def transform(
122
121
  self,
123
- transforms: typing.Dict[str, imgaug.augmenters.meta.Augmenter],
122
+ transforms: dict[str, imgaug.augmenters.meta.Augmenter],
124
123
  storage_dir: str,
125
124
  errors: str = "raise",
126
125
  ) -> BenchmarkImageTransforms:
@@ -68,7 +68,7 @@ def _process_row(row, hashers, framerates):
68
68
  class BenchmarkVideoDataset(BenchmarkDataset):
69
69
  def transform(
70
70
  self,
71
- transforms: typing.Dict[str, typing.Callable],
71
+ transforms: dict[str, typing.Callable],
72
72
  storage_dir: str,
73
73
  errors: str = "raise",
74
74
  ):
@@ -171,7 +171,7 @@ class BenchmarkVideoTransforms(BenchmarkTransforms):
171
171
  ]
172
172
 
173
173
  def compute_hashes(
174
- self, hashers: typing.Dict[str, VideoHasher], max_workers: int = 5
174
+ self, hashers: dict[str, VideoHasher], max_workers: int = 5
175
175
  ) -> BenchmarkHashes:
176
176
  """Compute hashes for a series of files given some set of hashers.
177
177
 
@@ -1,6 +1,4 @@
1
1
  import os
2
- import typing
3
- from typing import Optional
4
2
 
5
3
  import cv2
6
4
  import ffmpeg
@@ -29,12 +27,12 @@ def sanitize_output_filepath(input_filepath, output_filepath, output_ext=None):
29
27
 
30
28
 
31
29
  def get_simple_transform(
32
- width: typing.Union[str, int] = -1,
33
- height: typing.Union[str, int] = -1,
34
- pad: Optional[str] = None,
35
- codec: Optional[str] = None,
36
- clip_pct: Optional[typing.Tuple[float, float]] = None,
37
- clip_s: Optional[typing.Tuple[float, float]] = None,
30
+ width: str | int = -1,
31
+ height: str | int = -1,
32
+ pad: str | None = None,
33
+ codec: str | None = None,
34
+ clip_pct: tuple[float, float] | None = None,
35
+ clip_s: tuple[float, float] | None = None,
38
36
  sar=None,
39
37
  fps=None,
40
38
  output_ext=None,
@@ -1,7 +1,6 @@
1
1
  import time
2
2
  import typing
3
3
  import warnings
4
- from typing import Optional
5
4
 
6
5
  import faiss
7
6
  import numpy as np
@@ -10,11 +9,15 @@ import typing_extensions
10
9
 
11
10
  import perception.hashers.tools as pht
12
11
 
13
- QueryInput = typing_extensions.TypedDict("QueryInput", {"id": str, "hash": str})
14
12
 
15
- QueryMatch = typing_extensions.TypedDict(
16
- "QueryMatch", {"id": typing.Any, "matches": typing.List[dict]}
17
- )
13
+ class QueryInput(typing_extensions.TypedDict):
14
+ id: str
15
+ hash: str
16
+
17
+
18
+ class QueryMatch(typing_extensions.TypedDict):
19
+ id: typing.Any
20
+ matches: list[dict]
18
21
 
19
22
 
20
23
  class TuningFailure(Exception):
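The hunk above rewrites `QueryInput` and `QueryMatch` from the functional `TypedDict(...)` call to the class syntax. The sketch below illustrates that the two forms declare the same fields. It uses the standard-library `typing.TypedDict` instead of `typing_extensions` to stay self-contained, and the names `QueryInputFunctional` and `QueryInputClass` and the sample values are made up for the comparison.

```python
import typing

# Functional syntax, in the style of 0.7.5.
QueryInputFunctional = typing.TypedDict(
    "QueryInputFunctional", {"id": str, "hash": str}
)


# Class syntax, in the style of 0.7.6.
class QueryInputClass(typing.TypedDict):
    id: str
    hash: str


# A TypedDict instance is a plain dict at runtime.
query: QueryInputClass = {"id": "image-1", "hash": "AAAA"}

# Both declarations carry the same field annotations.
same_fields = QueryInputFunctional.__annotations__ == QueryInputClass.__annotations__
```

The class form reads like a dataclass and leaves room for per-field comments, which may be why it was preferred here.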
@@ -260,7 +263,7 @@ class ApproximateNearestNeighbors:
260
263
  s, hash_format=hash_format, dtype=self.dtype, hash_length=self.hash_length
261
264
  )
262
265
 
263
- def vector_to_string(self, vector, hash_format="base64") -> typing.Optional[str]:
266
+ def vector_to_string(self, vector, hash_format="base64") -> str | None:
264
267
  """Convert a vector back to string
265
268
 
266
269
  Args:
@@ -272,9 +275,9 @@ class ApproximateNearestNeighbors:
272
275
 
273
276
  def search(
274
277
  self,
275
- queries: typing.List[QueryInput],
276
- threshold: Optional[int] = None,
277
- threshold_func: Optional[typing.Callable[[np.ndarray], np.ndarray]] = None,
278
+ queries: list[QueryInput],
279
+ threshold: int | None = None,
280
+ threshold_func: typing.Callable[[np.ndarray], np.ndarray] | None = None,
278
281
  hash_format="base64",
279
282
  k=1,
280
283
  ):
@@ -318,7 +321,7 @@ class ApproximateNearestNeighbors:
318
321
  if not self.metadata_columns
319
322
  else self.query_by_id(ids=np.unique(indices[distances < thresholds]))
320
323
  )
321
- matches: typing.List[QueryMatch] = []
324
+ matches: list[QueryMatch] = []
322
325
  for match_distances, match_ids, q, q_threshold in zip(
323
326
  distances, indices, queries, thresholds
324
327
  ):
@@ -3,7 +3,6 @@ import functools
3
3
  import json
4
4
  import logging
5
5
  import typing
6
- from typing import Optional
7
6
 
8
7
  import aiohttp.web
9
8
  import numpy as np
@@ -96,8 +95,8 @@ def get_logger(name, log_level):
96
95
 
97
96
  async def serve(
98
97
  index: ApproximateNearestNeighbors,
99
- default_threshold: Optional[int] = None,
100
- default_threshold_func: Optional[typing.Callable[[np.ndarray], np.ndarray]] = None,
98
+ default_threshold: int | None = None,
99
+ default_threshold_func: typing.Callable[[np.ndarray], np.ndarray] | None = None,
101
100
  default_k: int = 1,
102
101
  concurrency: int = 2,
103
102
  log_level=logging.INFO,
@@ -2,7 +2,6 @@ import logging
2
2
  import math
3
3
  import os.path as op
4
4
  import typing
5
- from typing import Optional
6
5
 
7
6
  import faiss
8
7
  import networkit as nk
@@ -17,9 +16,10 @@ DEFAULT_PCT_PROBE = 0
17
16
  # For faiss training on datasets larger than 50,000 vectors, we take a random sub-sample.
18
17
  TRAIN_LARGE_SIZE: int = 50_000
19
18
 
20
- ClusterAssignment = typing_extensions.TypedDict(
21
- "ClusterAssignment", {"cluster": int, "id": typing.Any}
22
- )
19
+
20
+ class ClusterAssignment(typing_extensions.TypedDict):
21
+ cluster: int
22
+ id: typing.Any
23
23
 
24
24
 
25
25
  def build_index(
@@ -90,7 +90,7 @@ def compute_euclidean_pairwise_duplicates_approx(
90
90
  y_counts=None,
91
91
  pct_probe=0.1,
92
92
  use_gpu: bool = True,
93
- faiss_cache_path: Optional[str] = None,
93
+ faiss_cache_path: str | None = None,
94
94
  show_progress: bool = False,
95
95
  ):
96
96
  """Provides the same result as perception.extensions.compute_pairwise_duplicates_simple
@@ -199,12 +199,12 @@ def compute_euclidean_pairwise_duplicates_approx(
199
199
 
200
200
  def pairs_to_clusters(
201
201
  ids: typing.Iterable[str],
202
- pairs: typing.Iterable[typing.Tuple[str, str]],
202
+ pairs: typing.Iterable[tuple[str, str]],
203
203
  strictness: typing_extensions.Literal[
204
204
  "clique", "community", "component"
205
205
  ] = "clique",
206
206
  max_clique_batch_size: int = 1000,
207
- ) -> typing.List[ClusterAssignment]:
207
+ ) -> list[ClusterAssignment]:
208
208
  """Given a list of pairs of matching files, compute sets
209
209
  of cliques where all files in a clique are connected.
210
210
  Args:
@@ -232,7 +232,7 @@ def pairs_to_clusters(
232
232
  for node_pair in node_pairs:
233
233
  graph.addEdge(node_pair[0], node_pair[1])
234
234
 
235
- assignments: typing.List[ClusterAssignment] = []
235
+ assignments: list[ClusterAssignment] = []
236
236
  cluster_index = 0
237
237
  cc_query = nk.components.ConnectedComponents(graph)
238
238
  cc_query.run()
@@ -1,6 +1,5 @@
1
1
  import logging
2
2
  import random
3
- from typing import Optional
4
3
 
5
4
  import cv2
6
5
  import numpy as np
@@ -18,7 +17,7 @@ def vizualize_pair(
18
17
  features_2,
19
18
  ratio: float,
20
19
  match_metadata=None,
21
- local_path_col: Optional[str] = None,
20
+ local_path_col: str | None = None,
22
21
  sanitized: bool = False,
23
22
  include_all_points=False,
24
23
  circle_size=KEYPOINT_SIZE,
@@ -35,20 +35,20 @@ class Descriptors(typing_extensions.TypedDict):
35
35
  keypoints: np.ndarray
36
36
  descriptors: np.ndarray
37
37
  descriptor_count: int
38
- dimensions: typing.Tuple[int, int]
38
+ dimensions: tuple[int, int]
39
39
  filepath: str
40
40
  hasher: str
41
41
 
42
42
 
43
43
  class MatchStats(typing_extensions.TypedDict):
44
- match: typing.Optional[float]
45
- min_kpBM: typing.Optional[int]
46
- MAB: typing.Optional[str]
47
- intersection: typing.Optional[float]
48
- inliers: typing.Optional[float]
49
- bounds_intersection: typing.Optional[float]
50
- final_matched_a_pts: typing.Optional[typing.List[np.ndarray]]
51
- final_matched_b_pts: typing.Optional[typing.List[np.ndarray]]
44
+ match: float | None
45
+ min_kpBM: int | None
46
+ MAB: str | None
47
+ intersection: float | None
48
+ inliers: float | None
49
+ bounds_intersection: float | None
50
+ final_matched_a_pts: list[np.ndarray] | None
51
+ final_matched_b_pts: list[np.ndarray] | None
52
52
 
53
53
 
54
54
  class LocalHasher(ABC):
@@ -76,7 +76,7 @@ class LocalHasher(ABC):
76
76
  self.validation_inliers = validation_inliers
77
77
  self.validation_intersection = validation_intersection
78
78
 
79
- def compute(self, image) -> typing.Tuple[np.ndarray, np.ndarray]:
79
+ def compute(self, image) -> tuple[np.ndarray, np.ndarray]:
80
80
  return self.hasher.detectAndCompute(image, None)
81
81
 
82
82
  def validate_match(
@@ -86,7 +86,7 @@ class LocalHasher(ABC):
86
86
  minimum_match: float = DEFAULT_MATCH_PCT,
87
87
  minimum_intersection: float = DEFAULT_INTERSECTION,
88
88
  minimum_inliers: int = DEFAULT_INLIERS,
89
- ) -> typing.Tuple[bool, MatchStats]:
89
+ ) -> tuple[bool, MatchStats]:
90
90
  """Validate the match between two sets of keypoints and descriptors. The
91
91
  validation algorithm is as follows:
92
92
 
@@ -307,10 +307,10 @@ def load_and_preprocess(filepath, max_size=DEFAULT_MAX_SIZE, grayscale=True):
307
307
 
308
308
  def generate_image_descriptors(
309
309
  filepath: str,
310
- hasher: typing.Optional[LocalHasher] = None,
310
+ hasher: LocalHasher | None = None,
311
311
  min_features=DEFAULT_MIN_FEATURES,
312
312
  max_size=DEFAULT_MAX_SIZE,
313
- ) -> typing.Optional[Descriptors]:
313
+ ) -> Descriptors | None:
314
314
  """Generate local descriptors for a file.
315
315
 
316
316
  Args:
@@ -362,7 +362,7 @@ def generate_image_descriptors(
362
362
 
363
363
  def build_reference_df(
364
364
  filepaths: typing.Iterable[str],
365
- hasher: typing.Optional[LocalHasher] = None,
365
+ hasher: LocalHasher | None = None,
366
366
  min_features=DEFAULT_MIN_FEATURES,
367
367
  max_size=DEFAULT_MAX_SIZE,
368
368
  show_progress=False,
@@ -429,10 +429,10 @@ def check_hasher(df1: pd.DataFrame, df2: pd.DataFrame):
429
429
  def compute_pairs(
430
430
  match_df,
431
431
  query_df=None,
432
- hasher: typing.Optional[LocalHasher] = None,
432
+ hasher: LocalHasher | None = None,
433
433
  pct_probe=0.1,
434
434
  use_gpu: bool = True,
435
- faiss_cache_path: typing.Optional[str] = None,
435
+ faiss_cache_path: str | None = None,
436
436
  show_progress: bool = False,
437
437
  ):
438
438
  """Compute pairs of matching images from a reference
@@ -537,18 +537,18 @@ def deduplicate_sift_dfs(*args, **kwargs):
537
537
 
538
538
  def deduplicate_dfs(
539
539
  match_df: pd.DataFrame,
540
- query_df: typing.Optional[pd.DataFrame] = None,
540
+ query_df: pd.DataFrame | None = None,
541
541
  coarse_pct_probe: float = ad.DEFAULT_PCT_PROBE,
542
- max_workers: typing.Optional[int] = None,
542
+ max_workers: int | None = None,
543
543
  use_gpu: bool = True,
544
- faiss_cache_path: typing.Optional[str] = None,
544
+ faiss_cache_path: str | None = None,
545
545
  verbose: bool = False,
546
- hasher: typing.Optional[LocalHasher] = None,
546
+ hasher: LocalHasher | None = None,
547
547
  show_progress: bool = False,
548
- ) -> typing.Union[
549
- typing.List[typing.Tuple[typing.Any, typing.Any]],
550
- typing.List[typing.Tuple[typing.Any, typing.Any, MatchStats]],
551
- ]:
548
+ ) -> (
549
+ list[tuple[typing.Any, typing.Any]]
550
+ | list[tuple[typing.Any, typing.Any, MatchStats]]
551
+ ):
552
552
  """Deduplicate images within one set of images or between two sets of images:
553
553
  #. Given a dataframe (or two) of descriptors and keypoints for images.
554
554
  #. Perform a coarse, approximate search for images with common features.
@@ -606,10 +606,10 @@ def deduplicate_dfs(
606
606
  ), "Index of query_df must be unique, or it will cause wrong matches."
607
607
 
608
608
  LOGGER.debug("Validating candidate pairs: %d", len(candidates))
609
- keep: typing.Union[
610
- typing.List[typing.Tuple[typing.Any, typing.Any]],
611
- typing.List[typing.Tuple[typing.Any, typing.Any, MatchStats]],
612
- ] = [] # type: ignore
609
+ keep: (
610
+ list[tuple[typing.Any, typing.Any]]
611
+ | list[tuple[typing.Any, typing.Any, MatchStats]]
612
+ ) = [] # type: ignore
613
613
  with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
614
614
  batch_size = 10_000
615
615
  for start in tqdm.tqdm(range(0, len(candidates), batch_size)):
@@ -638,20 +638,18 @@ def deduplicate_dfs(
638
638
 
639
639
 
640
640
  def deduplicate(
641
- filepaths_or_reference_df: typing.Union[typing.Iterable[str], pd.DataFrame],
642
- query_filepaths_or_df: typing.Optional[
643
- typing.Union[typing.Iterable[str], pd.DataFrame]
644
- ] = None,
641
+ filepaths_or_reference_df: typing.Iterable[str] | pd.DataFrame,
642
+ query_filepaths_or_df: None | (typing.Iterable[str] | pd.DataFrame) = None,
645
643
  max_features: int = DEFAULT_MAX_FEATURES,
646
644
  min_features: int = DEFAULT_MIN_FEATURES,
647
645
  max_size: int = DEFAULT_MAX_SIZE,
648
- hasher: typing.Optional[LocalHasher] = None,
646
+ hasher: LocalHasher | None = None,
649
647
  show_progress: bool = False,
650
648
  **kwargs,
651
- ) -> typing.Union[
652
- typing.List[typing.Tuple[typing.Any, typing.Any]],
653
- typing.List[typing.Tuple[typing.Any, typing.Any, MatchStats]],
654
- ]:
649
+ ) -> (
650
+ list[tuple[typing.Any, typing.Any]]
651
+ | list[tuple[typing.Any, typing.Any, MatchStats]]
652
+ ):
655
653
  """Deduplicate images by doing the following:
656
654
  #. Unletterbox all images and resize to some maximum size, preserving
657
655
  aspect ratio.
@@ -3,7 +3,6 @@ import typing
3
3
  import warnings
4
4
  from abc import ABC, abstractmethod
5
5
  from logging import warning
6
- from typing import Optional
7
6
 
8
7
  import numpy as np
9
8
  import scipy.spatial
@@ -50,7 +49,7 @@ class Hasher(ABC):
50
49
 
51
50
  def vector_to_string(
52
51
  self, vector: np.ndarray, hash_format: str = "base64"
53
- ) -> typing.Optional[str]:
52
+ ) -> str | None:
54
53
  """Convert vector to hash string.
55
54
 
56
55
  Args:
@@ -61,8 +60,8 @@ class Hasher(ABC):
61
60
 
62
61
  def compute_distance(
63
62
  self,
64
- hash1: typing.Union[np.ndarray, str],
65
- hash2: typing.Union[np.ndarray, str],
63
+ hash1: np.ndarray | str,
64
+ hash2: np.ndarray | str,
66
65
  hash_format="base64",
67
66
  ):
68
67
  """Compute the distance between two hashes.
@@ -110,9 +109,9 @@ class Hasher(ABC):
110
109
  @typing.no_type_check
111
110
  def compute_parallel(
112
111
  self,
113
- filepaths: typing.List[str],
114
- progress: Optional["tqdm.tqdm"] = None,
115
- progress_desc: Optional[str] = None,
112
+ filepaths: list[str],
113
+ progress: tqdm.tqdm | None = None,
114
+ progress_desc: str | None = None,
116
115
  max_workers: int = 5,
117
116
  isometric: bool = False,
118
117
  ):
@@ -231,9 +230,7 @@ class ImageHasher(Hasher):
231
230
 
232
231
  def compute(
233
232
  self, image: tools.ImageInputType, hash_format="base64"
234
- ) -> typing.Union[
235
- np.ndarray, typing.Optional[str], typing.List[typing.Optional[str]]
236
- ]:
233
+ ) -> np.ndarray | str | None | list[str | None]:
237
234
  """Compute a hash from an image.
238
235
 
239
236
  Args:
@@ -259,10 +256,8 @@ class ImageHasher(Hasher):
259
256
 
260
257
  def compute_with_quality(
261
258
  self, image: tools.ImageInputType, hash_format="base64"
262
- ) -> typing.Tuple[
263
- typing.Union[
264
- np.ndarray, typing.Optional[str], typing.List[typing.Optional[str]]
265
- ],
259
+ ) -> tuple[
260
+ (np.ndarray | str | None | list[str | None]),
266
261
  int,
267
262
  ]:
268
263
  """Compute hash and hash quality from image.
@@ -287,7 +282,7 @@ class ImageHasher(Hasher):
287
282
  )
288
283
  return (self.vector_to_string(vector, hash_format=hash_format), quality)
289
284
 
290
- def _compute_with_quality(self, image: np.ndarray) -> typing.Tuple[np.ndarray, int]:
285
+ def _compute_with_quality(self, image: np.ndarray) -> tuple[np.ndarray, int]:
291
286
  return self._compute(image), tools.compute_quality(image)
292
287
 
293
288
 
@@ -300,9 +295,9 @@ class VideoHasher(Hasher):
300
295
  def process_frame(
301
296
  self,
302
297
  frame: np.ndarray,
303
- frame_index: typing.Optional[int],
304
- frame_timestamp: typing.Optional[float],
305
- state: Optional[dict] = None,
298
+ frame_index: int | None,
299
+ frame_timestamp: float | None,
300
+ state: dict | None = None,
306
301
  ) -> dict:
307
302
  """Called for each frame in the video. For all
308
303
  but the first frame, a state is provided recording the state from
@@ -327,7 +322,7 @@ class VideoHasher(Hasher):
327
322
  def compute_with_timestamps(
328
323
  self, filepath, errors="raise", hash_format="base64", **kwargs
329
324
  ):
330
- scenes: typing.List[dict] = []
325
+ scenes: list[dict] = []
331
326
  hashes = self.compute(filepath, errors, hash_format, scenes, **kwargs)
332
327
  return [
333
328
  {
@@ -17,7 +17,6 @@ import warnings
17
17
  from collections import Counter
18
18
  from http import client
19
19
  from numbers import Number
20
- from typing import Optional
21
20
  from urllib import request
22
21
 
23
22
  import cv2
@@ -47,7 +46,7 @@ CUDA_CODECS = {
47
46
  }
48
47
 
49
48
  FramesWithIndexesAndTimestamps = typing.Generator[
50
- typing.Tuple[np.ndarray, typing.Optional[int], typing.Optional[float]], None, None
49
+ tuple[np.ndarray, int | None, float | None], None, None
51
50
  ]
52
51
 
53
52
 
@@ -105,9 +104,7 @@ def get_string_length(hash_length: int, dtype: str, hash_format="hex") -> int:
105
104
  raise NotImplementedError("Unknown hash format: " + hash_format)
106
105
 
107
106
 
108
- def vector_to_string(
109
- vector: np.ndarray, dtype: str, hash_format: str
110
- ) -> typing.Optional[str]:
107
+ def vector_to_string(vector: np.ndarray, dtype: str, hash_format: str) -> str | None:
111
108
  """Convert vector to hash.
112
109
 
113
110
  Args:
@@ -287,8 +284,8 @@ def get_common_framerates(id_rates: dict):
287
284
  min(framerates) >= 1 / factor
288
285
  ), "Framerates must be at least 1 frame per hour."
289
286
  best_frame_count = np.inf
290
- best_grouping: typing.Optional[typing.List] = None
291
- best_frame_rates: typing.Optional[typing.List] = None
287
+ best_grouping: list | None = None
288
+ best_frame_rates: list | None = None
292
289
 
293
290
  # We try every possible grouping of framerates to minimize the number
294
291
  # of frames we decode. There is likely a better way to do this,
@@ -432,7 +429,7 @@ def get_video_properties(filepath):
432
429
  raise ValueError(f"{str(out)}: {str(err)}")
433
430
  data = json.loads(out.decode("utf-8"))["streams"][0]
434
431
  numerator, denominator = tuple(map(int, data["avg_frame_rate"].split("/")[:2]))
435
- avg_frame_rate: typing.Optional[fractions.Fraction]
432
+ avg_frame_rate: fractions.Fraction | None
436
433
  if numerator > 0 and denominator > 0:
437
434
  avg_frame_rate = fractions.Fraction(
438
435
  numerator=numerator, denominator=denominator
@@ -450,11 +447,11 @@ def get_video_properties(filepath):
450
447
 
451
448
  def read_video_to_generator_ffmpeg(
452
449
  filepath,
453
- frames_per_second: typing.Optional[typing.Union[str, float]] = None,
450
+ frames_per_second: str | float | None = None,
454
451
  errors="raise",
455
- max_duration: Optional[float] = None,
456
- max_size: Optional[int] = None,
457
- interp: Optional[str] = None,
452
+ max_duration: float | None = None,
453
+ max_size: int | None = None,
454
+ interp: str | None = None,
458
455
  frame_rounding: str = "up",
459
456
  draw_timestamps=False,
460
457
  use_cuda=False,
@@ -519,7 +516,7 @@ def read_video_to_generator_ffmpeg(
519
516
  start_time,
520
517
  ) = get_video_properties(filepath)
521
518
  start_time_offset = (
522
- 0.0 if avg_frame_rate is None else float((1 / (2 * avg_frame_rate)))
519
+ 0.0 if avg_frame_rate is None else float(1 / (2 * avg_frame_rate))
523
520
  )
524
521
  LOGGER.debug(
525
522
  "raw_width: %s, raw_height: %s, avg_frame_rate: %s, codec_name: %s, start_time: %s",
@@ -597,8 +594,8 @@ def read_video_to_generator_ffmpeg(
597
594
  bufsize=bufsize,
598
595
  ) as p:
599
596
  assert p.stdout is not None, "Could not launch subprocess pipe."
600
- timestamp: typing.Optional[float] = 0
601
- frame_index: typing.Optional[int] = 0
597
+ timestamp: float | None = 0
598
+ frame_index: int | None = 0
602
599
  while True:
603
600
  batch = p.stdout.read(bufsize)
604
601
  if not batch:
@@ -648,10 +645,10 @@ def read_video_to_generator_ffmpeg(
648
645
 
649
646
  def read_video_to_generator(
650
647
  filepath,
651
- frames_per_second: typing.Optional[typing.Union[str, float]] = None,
648
+ frames_per_second: str | float | None = None,
652
649
  errors="raise",
653
- max_duration: Optional[float] = None,
654
- max_size: Optional[int] = None,
650
+ max_duration: float | None = None,
651
+ max_size: int | None = None,
655
652
  ) -> FramesWithIndexesAndTimestamps:
656
653
  """This is used by :code:`read_video` when :code:`use_ffmpeg` is False (default).
657
654
 
@@ -674,7 +671,7 @@ def read_video_to_generator(
674
671
  if not os.path.isfile(filepath):
675
672
  raise FileNotFoundError(f"Could not find {filepath}.")
676
673
  if not os.access(filepath, os.R_OK):
677
- raise IOError(f"{filepath} is not readable")
674
+ raise OSError(f"{filepath} is not readable")
678
675
  cap = cv2.VideoCapture(filename=filepath, apiPreference=cv2.CAP_FFMPEG)
679
676
  try:
680
677
  # The purpose of the following block is largely to create a
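The hunk above replaces `IOError` with `OSError`. Since Python 3.3 the two names refer to the same class, so existing `except IOError` handlers keep working. The sketch below demonstrates this; `ensure_readable` is a hypothetical helper modeled on the two checks visible in the diff, and the file path is made up.

```python
import os


def ensure_readable(filepath: str) -> None:
    """Raise if ``filepath`` is missing or unreadable (hypothetical helper)."""
    if not os.path.isfile(filepath):
        raise FileNotFoundError(f"Could not find {filepath}.")
    if not os.access(filepath, os.R_OK):
        raise OSError(f"{filepath} is not readable")


# IOError has been an alias of OSError since Python 3.3.
alias_is_same = IOError is OSError

caught = None
try:
    ensure_readable("/nonexistent/path/to/video.mp4")
except IOError as error:  # Also catches FileNotFoundError, an OSError subclass.
    caught = type(error).__name__
```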
@@ -702,9 +699,9 @@ def read_video_to_generator(
702
699
  seconds_between_grabbed_frames = 1 / file_frames_per_second
703
700
  grabbed_frame_count = 0
704
701
  if frames_per_second == "keyframes":
705
- frame_indexes: typing.Union[
706
- range, typing.List[int], typing.Iterator[int]
707
- ] = _get_keyframes(filepath)
702
+ frame_indexes: range | list[int] | typing.Iterator[int] = _get_keyframes(
703
+ filepath
704
+ )
708
705
  # The repeat flag is used to handle the case where the
709
706
  # desired sampling rate is higher than the file's frame
710
707
  # rate. In this case, we will need to repeat frames in
@@ -723,7 +720,7 @@ def read_video_to_generator(
723
720
  scale = min(max_size / max(input_width, input_height), 1)
724
721
  else:
725
722
  scale = 1
726
- target_size: typing.Optional[typing.Tuple[int, int]]
723
+ target_size: tuple[int, int] | None
727
724
  if scale < 1:
728
725
  target_size = (int(scale * input_width), int(scale * input_height))
729
726
  else:
@@ -780,7 +777,7 @@ def read_video_into_queue(*args, video_queue, terminate, func, **kwargs):
780
777
 
781
778
  def read_video(
782
779
  filepath,
783
- frames_per_second: typing.Optional[typing.Union[str, float]] = None,
780
+ frames_per_second: str | float | None = None,
784
781
  max_queue_size=128,
785
782
  use_queue=True,
786
783
  errors="raise",
@@ -822,12 +819,12 @@ def read_video(
  generator = read_video_to_generator_ffmpeg
  else:
  generator = read_video_to_generator
- frame_index: typing.Optional[int]
- timestamp: typing.Optional[float]
+ frame_index: int | None
+ timestamp: float | None
  if use_queue:
- video_queue = queue.Queue(
+ video_queue: queue.Queue[tuple[np.ndarray, int, float]] = queue.Queue(
  maxsize=max_queue_size
- ) # type: queue.Queue[typing.Tuple[np.ndarray, int, float]]
+ )
  terminate = threading.Event()
  thread = threading.Thread(
  target=read_video_into_queue,
@@ -964,7 +961,7 @@ def compute_synchronized_video_hashes(
  
  def unletterbox(
  image, only_remove_black: bool = False, min_fraction_meaningful_pixels: float = 0.1
- ) -> typing.Optional[typing.Tuple[typing.Tuple[int, int], typing.Tuple[int, int]]]:
+ ) -> tuple[tuple[int, int], tuple[int, int]] | None:
  """Return bounds of non-trivial region of image or None.
  
  Unletterboxing is cropping an image such that trivial edge regions
@@ -1,5 +1,3 @@
- from typing import Optional
-
  import numpy as np
  
  from .. import tools
@@ -17,7 +15,7 @@ class FramewiseHasher(VideoHasher):
  frame_hasher: ImageHasher,
  interframe_threshold: float,
  frames_per_second: int = 15,
- quality_threshold: Optional[float] = None,
+ quality_threshold: float | None = None,
  ):
  self.hash_length = frame_hasher.hash_length
  self.frames_per_second = frames_per_second
@@ -25,10 +23,8 @@ class FramewiseHasher(VideoHasher):
  self.distance_metric = frame_hasher.distance_metric
  if self.distance_metric == "hamming" and interframe_threshold > 1:
  raise ValueError(
- (
- "Hamming distance is always between 0 and 1 but "
- f"`interframe_threshold` was set to {interframe_threshold}."
- )
+ "Hamming distance is always between 0 and 1 but "
+ f"`interframe_threshold` was set to {interframe_threshold}."
  )
  self.dtype = frame_hasher.dtype
  self.interframe_threshold = interframe_threshold
@@ -1,5 +1,4 @@
  import logging
- from typing import Optional
  
  import cv2
  import numpy as np
@@ -37,7 +36,7 @@ class SimpleSceneDetection(VideoHasher):
  
  def __init__(
  self,
- base_hasher: Optional[VideoHasher] = None,
+ base_hasher: VideoHasher | None = None,
  interscene_threshold=None,
  min_frame_size=50,
  similarity_threshold=0.95,
@@ -131,12 +130,10 @@ class SimpleSceneDetection(VideoHasher):
  if subhash is not None and (
  self.base_hasher.returns_multiple
  or (
- (
- self.interscene_threshold is None
- or not state["scenes"]
- or self.compute_distance(state["scenes"][-1]["hash"], subhash)
- > self.interscene_threshold
- )
+ self.interscene_threshold is None
+ or not state["scenes"]
+ or self.compute_distance(state["scenes"][-1]["hash"], subhash)
+ > self.interscene_threshold
  )
  ):
  # Persist the scene's hash, frames, start timestamp, and end timestamp.
@@ -1,4 +1,3 @@
- from typing import Optional
  import platform
  import warnings
  
@@ -17,7 +16,7 @@ class TMKL2(VideoHasher):
  
  def __init__(
  self,
- frame_hasher: Optional[ImageHasher] = None,
+ frame_hasher: ImageHasher | None = None,
  frames_per_second: int = 15,
  normalization: str = "matrix",
  ):
@@ -119,23 +118,23 @@ class TMKL2(VideoHasher):
  fv_b = fv_b / norm_b
  
  if "freq" in normalization:
- norm_a, norm_b = [
+ norm_a, norm_b = (
  np.sqrt((fv**2).sum(axis=1, keepdims=True) / self.m + eps) + eps
  for fv in [fv_a, fv_b]
- ]
+ )
  fv_a = fv_a / norm_a
  fv_b = fv_b / norm_b
  
  if normalization == "matrix":
- norm_a, norm_b = [
+ norm_a, norm_b = (
  np.sqrt(np.sum(fv**2, axis=(1, 2)) + eps)[..., np.newaxis] + eps
  for fv in [fv_a, fv_b]
- ] # (T, 1)
+ ) # (T, 1)
  
- fv_a_sin, fv_b_sin = [fv[:, : self.m] for fv in [fv_a, fv_b]] # (T, m, d)
- fv_a_cos, fv_b_cos = [fv[:, self.m :] for fv in [fv_a, fv_b]] # (T, m, d)
+ fv_a_sin, fv_b_sin = (fv[:, : self.m] for fv in [fv_a, fv_b]) # (T, m, d)
+ fv_a_cos, fv_b_cos = (fv[:, self.m :] for fv in [fv_a, fv_b]) # (T, m, d)
  ms = self.ms.reshape(-1, 1) # (m, 1)
- dot_sin_sin, dot_sin_cos, dot_cos_cos, dot_cos_sin = [
+ dot_sin_sin, dot_sin_cos, dot_cos_cos, dot_cos_sin = (
  np.sum(p, axis=2, keepdims=True)
  for p in [
  fv_a_sin * fv_b_sin,
@@ -143,7 +142,7 @@ class TMKL2(VideoHasher):
  fv_a_cos * fv_b_cos,
  fv_a_cos * fv_b_sin,
  ]
- ] # (T, m, 1)
+ ) # (T, m, 1)
  delta = (
  ms.reshape(1, -1, 1) * offsets.reshape(1, -1) / self.T.reshape((-1, 1, 1))
  )
@@ -169,7 +168,7 @@ class TMKL1(VideoHasher):
  
  def __init__(
  self,
- frame_hasher: Optional[ImageHasher] = None,
+ frame_hasher: ImageHasher | None = None,
  frames_per_second: int = 15,
  dtype="float32",
  distance_metric="cosine",
@@ -127,7 +127,7 @@ def test_hasher_parallelization(hasher, test_filepaths):
  
  
  def test_video_hasher_integrity(
- hasher: hashers.VideoHasher, test_videos: typing.List[str] = DEFAULT_TEST_VIDEOS
+ hasher: hashers.VideoHasher, test_videos: list[str] = DEFAULT_TEST_VIDEOS
  ):
  test_hasher_parallelization(hasher, test_videos)
  
@@ -136,7 +136,7 @@ def test_image_hasher_integrity(
  hasher: hashers.ImageHasher,
  pil_opencv_threshold: float,
  transform_threshold: float,
- test_images: typing.List[str] = DEFAULT_TEST_IMAGES,
+ test_images: list[str] = DEFAULT_TEST_IMAGES,
  opencv_hasher: bool = False,
  ):
  """Test to ensure a hasher works correctly.
@@ -1,11 +1,9 @@
  import base64
  import json
  import os
- import typing
  import urllib.parse
  import urllib.request
  import warnings
- from typing import Optional
  
  import numpy as np
  from scipy import spatial
@@ -25,9 +23,7 @@ except ImportError:
  extensions = None
  
  
- def _multiple_hashes_for_ids(
- hashes: typing.List[typing.Tuple[str, typing.Union[str, np.ndarray]]]
- ):
+ def _multiple_hashes_for_ids(hashes: list[tuple[str, str | np.ndarray]]):
  """Check if a list of (hash_id, hash) tuples has more
  than one hash for a hash_id.
  
@@ -39,15 +35,15 @@ def _multiple_hashes_for_ids(
  
  
  def deduplicate_hashes(
- hashes: typing.List[typing.Tuple[str, typing.Union[str, np.ndarray]]],
+ hashes: list[tuple[str, str | np.ndarray]],
  threshold: float,
  hash_format: str = "base64",
- hasher: Optional[perception_hashers.ImageHasher] = None,
- hash_length: Optional[int] = None,
- hash_dtype: Optional[str] = None,
- distance_metric: Optional[str] = None,
- progress: Optional[tqdm] = None,
- ) -> typing.List[typing.Tuple[str, str]]:
+ hasher: perception_hashers.ImageHasher | None = None,
+ hash_length: int | None = None,
+ hash_dtype: str | None = None,
+ distance_metric: str | None = None,
+ progress: tqdm | None = None,
+ ) -> list[tuple[str, str]]:
  """Find duplicates using a list of precomputed hashes.
  
  Args:
@@ -102,7 +98,7 @@ def deduplicate_hashes(
  ]
  )
  files = np.array([identifier for identifier, _ in hashes])
- pairs: typing.List[typing.Tuple[str, str]] = []
+ pairs: list[tuple[str, str]] = []
  n_hashes = len(vectors)
  start_idx = 0
  end_idx = None
@@ -134,7 +130,7 @@ def deduplicate_hashes(
  # this so we can pass it to the compute_euclidean_pairwise_duplicates
  # function.
  if multiple_hashes_per_id:
- counts = np.zeros(shape=len(set(hash_id for hash_id, _ in hashes))).astype(
+ counts = np.zeros(shape=len({hash_id for hash_id, _ in hashes})).astype(
  "uint32"
  )
  previous_hash_id = None
@@ -162,11 +158,11 @@ def deduplicate_hashes(
  
  
  def deduplicate(
- files: typing.List[str],
- hashers: typing.List[typing.Tuple[perception_hashers.ImageHasher, float]],
+ files: list[str],
+ hashers: list[tuple[perception_hashers.ImageHasher, float]],
  isometric: bool = False,
- progress: Optional[tqdm] = None,
- ) -> typing.List[typing.Tuple[str, str]]:
+ progress: tqdm | None = None,
+ ) -> list[tuple[str, str]]:
  """Find duplicates in a list of files.
  
  Args:
@@ -187,7 +183,7 @@ def deduplicate(
  category=UserWarning,
  )
  files = list(files_dedup)
- pairs: typing.List[typing.Tuple[str, str]] = []
+ pairs: list[tuple[str, str]] = []
  for hasher_idx, (hasher, threshold) in enumerate(hashers):
  hash_dicts = hasher.compute_parallel(
  filepaths=files,
@@ -271,12 +267,12 @@ class SaferMatcher:
  
  def __init__(
  self,
- api_key: Optional[str] = None,
- username: Optional[str] = None,
- password: Optional[str] = None,
- url: Optional[str] = None,
- hasher: Optional[perception_hashers.ImageHasher] = None,
- hasher_api_id: Optional[str] = None,
+ api_key: str | None = None,
+ username: str | None = None,
+ password: str | None = None,
+ url: str | None = None,
+ hasher: perception_hashers.ImageHasher | None = None,
+ hasher_api_id: str | None = None,
  quality_threshold: int = 90,
  ):
  if (
@@ -322,11 +318,7 @@ class SaferMatcher:
  
  def match(
  self,
- images: typing.List[
- typing.Union[
- str, typing.Tuple[perception_hashers.tools.ImageInputType, str]
- ]
- ],
+ images: list[(str | tuple[perception_hashers.tools.ImageInputType, str])],
  ) -> dict:
  """Match hashes with the Safer matching service.
  
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "Perception"
- version = "0.7.5"
+ version = "0.7.6"
  description = "Perception provides flexible, well-documented, and comprehensively tested tooling for perceptual hashing research, development, and production use."
  authors = ["Thorn <info@wearethorn.org>"]
  license = "Apache License 2.0"
@@ -38,7 +38,7 @@ extras_require = \
  
  setup_kwargs = {
  'name': 'Perception',
- 'version': '0.7.5',
+ 'version': '0.7.6',
  'description': 'Perception provides flexible, well-documented, and comprehensively tested tooling for perceptual hashing research, development, and production use.',
  'long_description': "# perception ![ci](https://github.com/thorn-oss/perception/workflows/ci/badge.svg)\n\n`perception` provides flexible, well-documented, and comprehensively tested tooling for perceptual hashing research, development, and production use. See [the documentation](https://perception.thorn.engineering/en/latest/) for details.\n\n## Background\n\n`perception` was initially developed at [Thorn](https://www.thorn.org) as part of our work to eliminate child sexual abuse material from the internet. For more information on the issue, check out [our CEO's TED talk](https://www.thorn.org/blog/time-is-now-eliminate-csam/).\n\n## Getting Started\n\n### Installation\n\n`pip install perception`\n\n### Hashing\n\nHashing with different functions is simple with `perception`.\n\n```python\nfrom perception import hashers\n\nfile1, file2 = 'test1.jpg', 'test2.jpg'\nhasher = hashers.PHash()\nhash1, hash2 = hasher.compute(file1), hasher.compute(file2)\ndistance = hasher.compute_distance(hash1, hash2)\n```\n\n### Examples\n\nSee below for end-to-end examples for common use cases for perceptual hashes.\n\n- [Detecting child sexual abuse material](https://perception.thorn.engineering/en/latest/examples/detecting_csam.html)\n- [Deduplicating media](https://perception.thorn.engineering/en/latest/examples/deduplication.html)\n- [Benchmarking perceptual hashes](https://perception.thorn.engineering/en/latest/examples/benchmarking.html)\n\n## Supported Hashing Algorithms\n\n`perception` currently ships with:\n\n- pHash (DCT hash) (`perception.hashers.PHash`)\n- Facebook's PDQ Hash (`perception.hashers.PDQ`)\n- dHash (difference hash) (`perception.hashers.DHash`)\n- aHash (average hash) (`perception.hashers.AverageHash`)\n- Marr-Hildreth (`perception.hashers.MarrHildreth`)\n- Color Moment (`perception.hashers.ColorMoment`)\n- Block Mean (`perception.hashers.BlockMean`)\n- wHash (wavelet hash) (`perception.hashers.WaveletHash`)\n\n## Contributing\n\nTo work on the project, 
start by doing the following.\n\n```bash\n# Install local dependencies for\n# code completion, etc.\nmake init\n\n- To do a (close to) comprehensive check before committing code, you can use `make precommit`.\n\nTo implement new features, please first file an issue proposing your change for discussion.\n\nTo report problems, please file an issue with sample code, expected results, actual results, and a complete traceback.\n\n## Alternatives\n\nThere are other packages worth checking out to see if they meet your needs for perceptual hashing. Here are some\nexamples.\n\n- [dedupe](https://github.com/dedupeio/dedupe)\n- [imagededup](https://idealo.github.io/imagededup/)\n- [ImageHash](https://github.com/JohannesBuchner/imagehash)\n- [PhotoHash](https://github.com/bunchesofdonald/photohash)\n```\n",
  'author': 'Thorn',
@@ -1 +0,0 @@
- __version__ = "0.7.5"
File without changes
File without changes
File without changes