openvisionkit 0.4.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- openvisionkit/__init__.py +1 -0
- openvisionkit/_version.py +24 -0
- openvisionkit/capture/draw_object.py +296 -0
- openvisionkit/capture/image_template.py +61 -0
- openvisionkit/capture/screen_capture.py +13 -0
- openvisionkit/capture/video_recorder.py +128 -0
- openvisionkit/capture/video_template.py +336 -0
- openvisionkit/lib/classifier.py +186 -0
- openvisionkit/lib/face_detector.py +587 -0
- openvisionkit/lib/face_mesh_detector.py +913 -0
- openvisionkit/lib/form_detector.py +465 -0
- openvisionkit/lib/form_roi_annotator.py +679 -0
- openvisionkit/lib/form_roi_detector.py +1078 -0
- openvisionkit/lib/fps_counter.py +38 -0
- openvisionkit/lib/hair_segmentation.py +298 -0
- openvisionkit/lib/hand_detector.py +1230 -0
- openvisionkit/lib/image_detector.py +1095 -0
- openvisionkit/lib/object_detector.py +401 -0
- openvisionkit/lib/pose_detector.py +919 -0
- openvisionkit/lib/selfie_segmentation.py +528 -0
- openvisionkit/lib/text_detector.py +1229 -0
- openvisionkit/utility/live_plot.py +141 -0
- openvisionkit/utility/vision_utilis.py +871 -0
- openvisionkit-0.4.0.dist-info/METADATA +1018 -0
- openvisionkit-0.4.0.dist-info/RECORD +26 -0
- openvisionkit-0.4.0.dist-info/WHEEL +4 -0
@@ -0,0 +1,1018 @@
Metadata-Version: 2.4
Name: openvisionkit
Version: 0.4.0
Summary: MediaPipe Tasks API wrapper for Python computer vision
License: MIT
Keywords: computer-vision,face-detection,mediapipe,opencv,pose-estimation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.11.8
Requires-Dist: imageio>=2.37.3
Requires-Dist: imutils>=0.5.4
Requires-Dist: mediapipe>=0.10.35
Requires-Dist: mss>=10.2.0
Requires-Dist: opencv-python>=4.13.0.92
Requires-Dist: pandas>=3.0.3
Requires-Dist: pyautogui>=0.9.54
Requires-Dist: pytesseract>=0.3.13
Requires-Dist: scikit-image>=0.26.0
Description-Content-Type: text/markdown

# OpenVisionKit

[](https://github.com/your-org/openvisionkit/actions/workflows/ci-unit.yml)
[](https://github.com/your-org/openvisionkit/actions/workflows/ci-security.yml)
[](https://pypi.org/p/openvisionkit)
[](https://www.python.org/downloads/)
[](LICENSE)

**OpenVisionKit** is a high-level Python computer vision library built on top of [MediaPipe](https://developers.google.com/mediapipe) and [OpenCV](https://opencv.org/). It provides production-ready detectors and segmentation utilities for face detection, face mesh, hand tracking, pose estimation, object detection, and background segmentation, wrapped in clean, developer-friendly APIs that eliminate boilerplate and let you focus on building.

Whether you are prototyping a gesture-controlled application, building a fitness tracker, adding AR effects, or conducting research, OpenVisionKit gives you the tools to go from camera frame to structured detections in a few lines of code.

---

## Features

| Module | Capability |
|---|---|
| `FaceDetector` | Bounding boxes, 6-point keypoints, confidence filtering, IoU, face cropping |
| `FaceMeshDetector` | 478 landmarks, blendshapes, head pose (yaw/pitch/roll), gaze direction, emotion, AR overlays |
| `HandDetector` | 21 landmarks, gesture recognition, finger-join detection, distance estimation, palm width |
| `PoseDetector` | 33 body landmarks, joint angle calculation, exercise detection, workout rep counter, segmentation |
| `ObjectDetector` | EfficientDet-based multi-class detection with bounding boxes and labels |
| `SelfieSegmentation` | Background removal, blur, replacement, virtual backgrounds, alpha blending |
| `HairSegmentation` | Hair region segmentation and recoloring |
| `ScreenCapture` | High-performance screen grabbing via `mss` |
| `video_capture_template` | Drop-in webcam loop with FPS overlay, recording, and screenshot support |
| `image_template` | Single-image processing template with auto-centering, resize, and custom logic hook |
| `TextDetector` | Tesseract OCR with character/word/digit/table detection, NLP entity extraction, image matching, handwriting support |

---

## Requirements

- Python >= 3.11.8
- A `.tflite` / `.task` model file for each MediaPipe detector (see [Model Downloads](#model-downloads))

### TextDetector additional requirements

`TextDetector` uses Tesseract OCR and optional NLP tooling that are **not** bundled with MediaPipe.

**1. Install Tesseract binary** (system-level):

```bash
# macOS
brew install tesseract

# Ubuntu / Debian
sudo apt-get install tesseract-ocr

# Windows
# Download installer from https://github.com/UB-Mannheim/tesseract/wiki
```

**2. Install Python packages:**

```bash
# pip
pip install pytesseract imutils pandas scikit-image Pillow

# uv
uv add pytesseract imutils pandas scikit-image Pillow
```

**3. (Optional) spaCy for NLP features** (entity extraction, keyword extraction, summarization, relation extraction):

```bash
pip install spacy
python -m spacy download en_core_web_sm
```

Without spaCy, all NLP methods return empty results gracefully; the rest of `TextDetector` works without it.

---

## Installation

### pip

```bash
pip install openvisionkit
```

Or install directly from source:

```bash
pip install git+https://github.com/your-org/openvisionkit.git
```

### uv

```bash
uv add openvisionkit
```

Or from source:

```bash
uv add git+https://github.com/your-org/openvisionkit.git
```

For development (editable install with all dev dependencies):

```bash
git clone https://github.com/your-org/openvisionkit.git
cd openvisionkit
make setup
```

---

## Model Downloads

OpenVisionKit delegates inference to MediaPipe `.tflite` / `.task` model files. Download the models you need and place them in a `models/` directory at your project root.

| Detector | Model file | Download |
|---|---|---|
| `FaceDetector` | `face_detector.tflite` | [MediaPipe Face Detector](https://developers.google.com/mediapipe/solutions/vision/face_detector) |
| `FaceMeshDetector` | `face_landmarker_v2_with_blendshapes.task` | [MediaPipe Face Landmarker](https://developers.google.com/mediapipe/solutions/vision/face_landmarker) |
| `HandDetector` | `hand_landmarker.task` | [MediaPipe Hand Landmarker](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker) |
| `PoseDetector` | `pose_landmarker.task` | [MediaPipe Pose Landmarker](https://developers.google.com/mediapipe/solutions/vision/pose_landmarker) |
| `ObjectDetector` | `efficientdet_lite.tflite` | [MediaPipe Object Detector](https://developers.google.com/mediapipe/solutions/vision/object_detector) |
| `SelfieSegmentation` | `deeplab_v3.tflite` | [MediaPipe Image Segmenter](https://developers.google.com/mediapipe/solutions/vision/image_segmenter) |
| `HairSegmentation` | `hair_segmenter.tflite` | [MediaPipe Hair Segmenter](https://developers.google.com/mediapipe/solutions/vision/image_segmenter) |
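
Because each detector expects its model file on disk, a quick pre-flight check can catch a missing download early. The snippet below is a minimal sketch and not part of OpenVisionKit: the `missing_models` helper and the `MODEL_FILES` mapping are hypothetical names, and the file names are simply copied from the table above.

```python
from pathlib import Path

# File names copied from the table above; adjust to the models you actually use.
MODEL_FILES = {
    "FaceDetector": "face_detector.tflite",
    "FaceMeshDetector": "face_landmarker_v2_with_blendshapes.task",
    "HandDetector": "hand_landmarker.task",
    "PoseDetector": "pose_landmarker.task",
    "ObjectDetector": "efficientdet_lite.tflite",
    "SelfieSegmentation": "deeplab_v3.tflite",
    "HairSegmentation": "hair_segmenter.tflite",
}


def missing_models(models_dir="models", required=None):
    """Return {detector_name: expected_path} for model files not found on disk."""
    root = Path(models_dir)
    names = required if required is not None else list(MODEL_FILES)
    return {
        name: root / MODEL_FILES[name]
        for name in names
        if not (root / MODEL_FILES[name]).is_file()
    }


if __name__ == "__main__":
    for name, path in missing_models().items():
        print(f"{name}: missing {path}")
```

Passing `required=["HandDetector"]` limits the check to the detectors a project actually uses.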

---

## Quick Start

```python
import cv2
from openvisionkit.capture.video_template import video_capture_template
from openvisionkit.lib.hand_detector import HandDetector

detector = HandDetector(model_path="./models/hand_landmarker.task")

def process(frame):
    frame = detector.draw_landmarks(frame)
    return frame

video_capture_template(custom_logic=process, window_name="Hand Tracking")
```

---

## Usage

### FaceDetector

Detects faces in an image or video stream and returns bounding boxes, keypoints, and confidence scores.

```python
import cv2
from openvisionkit.lib.face_detector import FaceDetector

detector = FaceDetector(
    model_path="./models/face_detector.tflite",
    max_faces=5,
    running_mode="IMAGE",  # "IMAGE" | "VIDEO"
    min_detection_confidence=0.5,
    min_suppression_threshold=0.3,
)

frame = cv2.imread("photo.jpg")

# Returns annotated frame + list of detection dicts
annotated, detections = detector.detect_faces(frame, to_draw_bounding_box=True, to_draw_landmarks=True)

for det in detections:
    print(det["id"])  # face index
    print(det["score"])  # confidence 0–1
    print(det["bbox"])  # (x, y, w, h)
    print(det["bbox_xyxy"])  # (x1, y1, x2, y2)
    print(det["center"])  # (cx, cy)
    print(det["normalized_keypoints"])  # list of (x, y) pixel coords for 6 landmarks

cv2.imshow("Faces", annotated)
cv2.waitKey(0)
```

**Utility methods:**

```python
# Filter detections below a confidence threshold
confident = detector.filter_by_confidence(detections, threshold=0.7)

# Get the largest face by bounding-box area
biggest = detector.get_largest_face(detections)

# Crop face regions out of the image (optional pixel margin)
face_crops = detector.crop_faces(frame, detections, margin=10)

# Sort by area (descending) or any other detection key
sorted_faces = detector.sort_faces(detections, by="area")

# Intersection over Union, useful for NMS or tracking (needs two detections)
if len(detections) >= 2:
    iou = detector.get_iou(detections[0]["bbox_xyxy"], detections[1]["bbox_xyxy"])
```
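
For readers unfamiliar with the metric, Intersection over Union for two `(x1, y1, x2, y2)` boxes can be sketched in a few lines of plain Python. This is an illustrative stand-alone function (`iou_xyxy` is a hypothetical name), not the body of `get_iou`, whose implementation is not shown here.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes, in [0.0, 1.0]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap rectangle; width and height clamp to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 corner: 25 / (100 + 100 - 25)
print(iou_xyxy((0, 0, 10, 10), (5, 5, 15, 15)))
```

A value near 1 means the two boxes describe almost the same region, which is why the metric is commonly used to match detections across frames.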

---

### FaceMeshDetector

Detects 478 facial landmarks per face along with blendshape expressions and head-pose matrices.

```python
import cv2
from openvisionkit.lib.face_mesh_detector import FaceMeshDetector

detector = FaceMeshDetector(
    model_path="./models/face_landmarker_v2_with_blendshapes.task",
    num_faces=2,
    min_face_detection_confidence=0.5,
    output_face_blendshapes=True,
    output_facial_transformation_matrixes=True,
)

frame = cv2.imread("face.jpg")

annotated, faces, blendshapes, matrices, bboxes = detector.face_mesh_detection(frame, drawLandMarks=True)

# faces[i] -> list of [x, y] pixel coords for 478 landmarks
# blendshapes[i] -> dict of {blendshape_name: score} (52 expressions)
# matrices[i] -> 4x4 numpy head-pose matrix
# bboxes[i] -> [min_x, min_y, max_x, max_y]

for i, blend in enumerate(blendshapes):
    # Rule-based emotion from blendshapes
    emotion = detector.get_emotion(blend)
    print(f"Face {i}: {emotion}")

    # Gaze direction for each eye
    gaze = detector.get_eye_gaze_direction(faces[i], is_left_eye=True)
    print(f"Left gaze: {gaze}")  # "Left" | "Center" | "Right"

    # Mouth openness ratio (0 = closed, 0.5+ = wide open)
    ratio = detector.get_mouth_openness_ratio(faces[i])
    print(f"Mouth ratio: {ratio:.2f}")

    # Head pose angles from transformation matrix
    if matrices[i] is not None:
        yaw, pitch, roll = detector.get_head_pose_angles(matrices[i])
        print(f"Yaw: {yaw:.1f} Pitch: {pitch:.1f} Roll: {roll:.1f}")

    # Inter-pupillary distance
    ipd = detector.get_inter_pupillary_distance(faces[i], normalized=False)
    print(f"IPD: {ipd:.1f}px")
```
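
To give a feel for how yaw, pitch, and roll can be read out of a transformation matrix, here is a minimal sketch in plain Python. It assumes the rotation is composed as `R = Rz(roll) · Ry(yaw) · Rx(pitch)` and reads only the upper-left 3x3 block. Both that convention and the `euler_from_matrix` helper are assumptions for illustration, so the axis assignments and signs used by `get_head_pose_angles` may differ.

```python
import math


def euler_from_matrix(matrix):
    """Return (yaw, pitch, roll) in degrees from a 3x3 or 4x4 transform matrix.

    Assumes R = Rz(roll) @ Ry(yaw) @ Rx(pitch); only the upper-left 3x3 block is read.
    """
    r00, r10 = matrix[0][0], matrix[1][0]
    r20, r21, r22 = matrix[2][0], matrix[2][1], matrix[2][2]

    yaw = math.atan2(-r20, math.hypot(r00, r10))  # rotation about the y axis
    pitch = math.atan2(r21, r22)                  # rotation about the x axis
    roll = math.atan2(r10, r00)                   # rotation about the z axis
    return tuple(math.degrees(angle) for angle in (yaw, pitch, roll))


# Example input: a pure 30-degree rotation about the vertical (y) axis.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
turn = [
    [c, 0.0, s, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [-s, 0.0, c, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
yaw, pitch, roll = euler_from_matrix(turn)
print(f"Yaw: {yaw:.1f} Pitch: {pitch:.1f} Roll: {roll:.1f}")
```

Near a yaw of ±90 degrees this decomposition becomes ill-conditioned (gimbal lock), so pitch and roll are unreliable there.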

**AR overlay example:**

```python
# Overlay a PNG glasses filter (must have alpha channel)
glasses = cv2.imread("glasses.png", cv2.IMREAD_UNCHANGED)  # RGBA
frame_with_glasses = detector.overlay_ar_filter(frame, faces[0], glasses, filter_type="glasses")
```

---

### HandDetector

Tracks up to N hands with 21 landmarks each. Provides gesture recognition, finger-join detection, and distance estimation.

```python
import cv2
from openvisionkit.lib.hand_detector import HandDetector

detector = HandDetector(
    model_path="./models/hand_landmarker.task",
    running_mode="IMAGE",  # "IMAGE" | "VIDEO"
    max_hands=2,
    detection_confidence=0.5,
    tracking_confidence=0.5,
    smoothing_window=8,
)

frame = cv2.imread("hand.jpg")

# Draw landmarks, bounding box, and handedness label
annotated = detector.draw_landmarks(
    frame,
    to_draw_landmark=True,
    to_draw_bounding_box=True,
    to_put_handle_label=True,
)

# Get structured landmark data for all detected hands
all_hands = detector.get_landmarks(frame)

for hand in all_hands:
    print(hand["hand_type"])  # "Left" or "Right"
    print(hand["bounding_box"])  # (x, y, w, h)
    print(hand["center_point"])  # (cx, cy)
    lm = hand["landmarks_list"]  # list of [id, x, y, z]

    # Which fingers are raised?
    fingers = detector.fingers_up(lm)
    # [thumb, index, middle, ring, little]; 1 = up, 0 = down

    # Gesture shortcuts
    print(detector.is_fist())
    print(detector.is_thumbs_up())
    print(detector.is_peace_sign())
    print(detector.is_open_hand())

    # Distance between any two landmarks with visual feedback
    p1 = (lm[4][1], lm[4][2])  # thumb tip
    p2 = (lm[8][1], lm[8][2])  # index tip
    length, annotated, coords = detector.get_distance(p1, p2, annotated)
    print(f"Thumb-index distance: {length:.1f}px")

    # Detect if two finger tips are touching
    joined = detector.is_fingers_joined(4, 8, annotated, lm, threshold=0.25)

    # Palm width in pixels (stable reference)
    palm_px, idx_mcp, pinky_mcp = detector.palm_width_px(frame, lm)
    print(f"Palm width: {palm_px:.1f}px")
```

**Distance estimation (calibration-based):**

```python
# Provide (palm_width_px, distance_cm) pairs to calibrate
calibration = [(180, 20), (120, 35), (80, 55), (60, 75)]
detector_calibrated = HandDetector(
    model_path="./models/hand_landmarker.task",
    calibration_samples=calibration,
)

# After calibration, estimate distance from a new palm width
distance_cm = detector_calibrated.estimate_distance_cm(palm_width_px=110)
print(f"Estimated distance: {distance_cm:.1f} cm")
```
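
The calibration idea can be sketched independently of the library. Under a pinhole-camera assumption the apparent palm width is roughly inversely proportional to distance, so one simple model is `distance ≈ a / width + b`, fitted by least squares. The model that `estimate_distance_cm` actually uses is not stated here, so treat the following as an illustration only; `fit_inverse_model` and `estimate_distance` are hypothetical helpers.

```python
def fit_inverse_model(samples):
    """Least-squares fit of distance_cm = a / palm_width_px + b.

    samples: sequence of (palm_width_px, distance_cm) pairs containing at least
    two distinct widths.
    """
    xs = [1.0 / width for width, _ in samples]
    ys = [float(distance) for _, distance in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two samples with distinct palm widths")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def estimate_distance(palm_width_px, a, b):
    """Apply the fitted model to a new palm width (pixels); returns centimetres."""
    return a / palm_width_px + b


calibration = [(180, 20), (120, 35), (80, 55), (60, 75)]
a, b = fit_inverse_model(calibration)
print(f"Estimated distance: {estimate_distance(110, a, b):.1f} cm")
```

Fitting against `1 / width` keeps the regression linear, so no optimizer or external dependency is needed.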

---

### PoseDetector

Detects 33 body landmarks. Supports joint angle calculation, exercise classification, workout rep counting, and body segmentation.

```python
import cv2
from openvisionkit.lib.pose_detector import PoseDetector
from mediapipe.tasks.python import vision

detector = PoseDetector(
    model_path="./models/pose_landmarker.task",
    running_mode=vision.RunningMode.VIDEO,  # VIDEO for webcam streams
    num_poses=1,
    min_pose_detection_confidence=0.5,
    output_segmentation_masks=True,
)

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect and annotate
    annotated, result = detector.detect(frame, draw_landmarks=True)

    # All landmark positions as pixel dicts
    landmarks = detector.get_all_postion(frame, result)

    # Get a specific landmark (e.g. nose = id 0)
    nose = detector.get_landmark(result, pose_index=0, landmark_id=0)
    print(nose["x"], nose["y"], nose["visibility"])

    # Calculate joint angle, e.g. left elbow (shoulder=11, elbow=13, wrist=15)
    annotated, angle = detector.calculate_angle(annotated, result, p1=11, p2=13, p3=15)
    print(f"Left elbow angle: {angle:.1f} degrees")

    # Classify current exercise
    exercise = detector.detect_exercise(annotated, result)
    print(f"Exercise: {exercise}")

    # Workout rep counter (tracks bicep curls automatically)
    angle, percent, reps = detector.calculate_workout_percentage()
    stats = detector.get_workout_stats(annotated)
    print(f"Reps: {stats['reps']} Calories: {stats['calories']:.1f}")

    # Body segmentation overlay (requires output_segmentation_masks=True)
    annotated = detector.draw_segmentation_mask(annotated, result, alpha=0.5, color=(0, 255, 0))

    cv2.imshow("Pose", annotated)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
```
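
The joint angle at a landmark is the angle between the two limb segments that meet there. A minimal stand-alone sketch of that geometry follows; `joint_angle` is a hypothetical helper working on plain `(x, y)` pixel pairs, and it is not claimed to match how `calculate_angle` is implemented.

```python
import math


def joint_angle(a, b, c):
    """Angle at vertex b (degrees, 0-180) formed by points a-b-c, each an (x, y) pair."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2(|cross|, dot) stays numerically stable near 0 and 180 degrees.
    return math.degrees(math.atan2(abs(cross), dot))


# Upper arm pointing straight up from the elbow, forearm pointing right: a right angle.
shoulder, elbow, wrist = (100, 100), (100, 200), (200, 200)
print(f"Elbow angle: {joint_angle(shoulder, elbow, wrist):.1f} degrees")
```

Using `atan2` instead of `acos` avoids domain errors when rounding pushes the cosine slightly outside [-1, 1].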

**Auto-select the most visible arm for curl tracking:**

```python
p1, p2, p3 = detector.select_active_arm(result)
annotated, angle = detector.calculate_angle(annotated, result, p1, p2, p3)
```
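
Counting reps from a joint angle is usually a small two-state machine: the arm must curl past one threshold and then extend past another before a repetition is counted. The sketch below assumes thresholds of 160 degrees (extended) and 40 degrees (curled). Those values and the `RepCounter` class are illustrative assumptions, not the library's implementation.

```python
class RepCounter:
    """Count repetitions from a stream of elbow angles (degrees)."""

    def __init__(self, extended_angle=160.0, curled_angle=40.0):
        self.extended_angle = extended_angle
        self.curled_angle = curled_angle
        self.stage = "down"  # "down" = arm extended, "up" = arm curled
        self.reps = 0

    def percent(self, angle):
        """Map the angle linearly to 0 (extended) .. 100 (fully curled), clamped."""
        span = self.extended_angle - self.curled_angle
        value = (self.extended_angle - angle) / span * 100.0
        return max(0.0, min(100.0, value))

    def update(self, angle):
        """Feed one angle; a rep is counted when the arm curls and then extends again."""
        if self.stage == "down" and angle <= self.curled_angle:
            self.stage = "up"
        elif self.stage == "up" and angle >= self.extended_angle:
            self.stage = "down"
            self.reps += 1
        return self.reps


counter = RepCounter()
for angle in [170, 120, 35, 90, 165, 100, 30, 170]:
    counter.update(angle)
print(counter.reps)  # two full curl-and-extend cycles
```

Requiring both thresholds to be crossed gives hysteresis, so jitter around a single angle does not inflate the count.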

---

### ObjectDetector

Detects multiple object classes in a frame using EfficientDet Lite.

```python
import cv2
from openvisionkit.lib.object_detector import ObjectDetector

detector = ObjectDetector(
    model_path="./models/efficientdet_lite.tflite",
    max_results=5,
    running_mode="IMAGE",  # "IMAGE" | "VIDEO"
    category_allowlist=None,  # e.g. ["person", "car"] to restrict classes
    category_denylist=None,
)

frame = cv2.imread("street.jpg")

# Returns annotated image with bounding boxes and labels drawn
annotated = detector.detect_objects(frame)

cv2.imshow("Objects", annotated)
cv2.waitKey(0)

# Or get raw detection result for custom processing
result, mp_image = detector.detect(frame)
for detection in result.detections:
    label = detection.categories[0].category_name
    score = detection.categories[0].score
    bbox = detection.bounding_box
    print(f"{label}: {score:.2f} @ ({bbox.origin_x}, {bbox.origin_y})")
```

---

### SelfieSegmentation

Separates people from backgrounds using DeepLab V3. Multiple compositing modes available.

```python
import cv2
from openvisionkit.lib.selfie_segmentation import SelfieSegmentation

seg = SelfieSegmentation(
    model_path="./models/deeplab_v3.tflite",
    output_category_mask=True,
)

frame = cv2.imread("selfie.jpg")

# Remove background (black fill)
no_bg = seg.remove_background(frame)

# Blur background
blurred = seg.blur_background(frame, blur_strength=(55, 55))

# Replace background with an image
replaced = seg.replace_background(frame, background_path="./bg.jpg")

# Solid color background
colored = seg.color_background(frame, color=(0, 120, 255))

# Alpha-blend foreground over a custom background array
bg = cv2.imread("./bg.jpg")
blended = seg.alpha_blend(frame, bg)

# Optimized virtual background with temporal smoothing + edge refinement
# (best for real-time webcam use)
output = seg.optimize_virtual_background(frame, bg)

# Single-person isolation: removes other people in the background
output = seg.optimize_virtual_background_improved(frame, bg)

# Debug: visualize the raw segmentation heatmap
heatmap = seg.overlay_mask(frame)

cv2.imshow("Segmented", output)
cv2.waitKey(0)
```

---

### HairSegmentation

Segments hair regions for recoloring or styling effects.

```python
import cv2
import numpy as np
from openvisionkit.lib.hair_segmentation import HairSegmentation

seg = HairSegmentation(model_path="./models/hair_segmenter.tflite")

frame = cv2.imread("portrait.jpg")
rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

result = seg.process(rgb_frame)
mask = result.category_mask.numpy_view()  # shape (H, W), values 0–1

# Recolor hair to blue
hair_color = np.zeros_like(frame)
hair_color[:] = (255, 0, 0)  # BGR blue
hair_region = (mask > 0.5)[..., None]
output = np.where(hair_region, hair_color, frame)

cv2.imshow("Hair", output)
cv2.waitKey(0)
```

---

### ScreenCapture

Captures live frames from a monitor, which is useful for screen-based CV pipelines.

```python
from openvisionkit.capture.screen_capture import ScreenCapture
import cv2

cap = ScreenCapture(monitor_index=1)  # 1 = primary monitor

while True:
    frame = cap.grab()  # returns BGR numpy array
    cv2.imshow("Screen", frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cv2.destroyAllWindows()
```

---

### video_capture_template

A reusable webcam loop that handles window setup, FPS display, recording, and screenshots. Pass a `custom_logic` callback for your processing.

```python
import cv2
from openvisionkit.capture.video_template import video_capture_template
from openvisionkit.lib.face_detector import FaceDetector

detector = FaceDetector(model_path="./models/face_detector.tflite", running_mode="VIDEO")

def process(frame):
    annotated, _ = detector.detect_faces(frame)
    return annotated

video_capture_template(
    video_source=0,  # webcam index or path to video file
    custom_logic=process,
    window_name="Face Detection",
    resolution=(1280, 720),
    draw_fps=True,
    enable_auto_recording=True,  # auto-saves .mp4 from first frame
    record_format="mp4",  # "mp4" or "gif"
    enable_screenshot=True,  # press 's' to capture a frame
    auto_screenshot_after_seconds=10.0,  # also auto-capture after 10 s
    auto_screenshot_repeat=False,  # True = repeat every 10 s
)
```

**Key bindings (built-in):**

| Key | Action | Condition |
|---|---|---|
| `ESC` | Exit loop | always |
| `s` / `S` | Save screenshot | `enable_screenshot=True` |
| `r` / `R` | Toggle manual recording on/off | `enable_manual_recording=True` |

**Stateful key handlers with `KeyEventManager`:**

```python
from openvisionkit.capture.video_template import KeyEventManager, video_capture_template

state = {"score": 0}
km = KeyEventManager()
km.register(ord("p"), lambda frame, s: print(f"Score: {s['score']}"))
km.register(ord("+"), lambda frame, s: s.update({"score": s["score"] + 1}))

video_capture_template(
    video_source=0,
    state=state,
    key_manager=km,
    custom_logic=lambda frame: frame,
)
```
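
The handlers above receive `(frame, state)`. To illustrate the dispatch pattern behind such a manager, here is a minimal stand-in written from scratch. `SimpleKeyDispatcher` is a hypothetical class, not the library's `KeyEventManager`; its internals, including the `& 0xFF` masking of the key code, are assumptions.

```python
class SimpleKeyDispatcher:
    """Map key codes to handlers called as handler(frame, state)."""

    def __init__(self):
        self._handlers = {}

    def register(self, key_code, handler):
        self._handlers[key_code] = handler

    def dispatch(self, key_code, frame, state):
        """Run the handler for key_code, if any; return True when one was called."""
        handler = self._handlers.get(key_code & 0xFF)
        if handler is None:
            return False
        handler(frame, state)
        return True


state = {"score": 0}
keys = SimpleKeyDispatcher()
keys.register(ord("+"), lambda frame, s: s.update({"score": s["score"] + 1}))

# In a real loop the key code would come from cv2.waitKey(1).
for pressed in (ord("+"), ord("+"), ord("x")):
    keys.dispatch(pressed, frame=None, state=state)
print(state["score"])  # "+" was pressed twice, "x" has no handler
```

Keeping the mutable `state` dict outside the handlers lets several keys share and update the same values without globals.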

**Manual recording:**

```python
video_capture_template(
    video_source=0,
    enable_manual_recording=True,  # press R to start, R again to stop and save
    record_format="gif",
)
```

**Parameter reference:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `video_source` | `int \| str` | `0` | Camera index or path to video file |
| `loop_forever` | `bool` | `True` | Loop video file when it ends |
| `custom_logic` | `Callable[[ndarray], ndarray]` | `None` | Per-frame processing; receives and returns BGR image |
| `state` | `dict` | `None` | Shared state dict passed to every key handler |
| `key_manager` | `KeyEventManager` | `None` | Custom key-event dispatcher |
| `window_name` | `str` | `"Demo"` | OpenCV window title |
| `show_window` | `bool` | `True` | Display the OpenCV window |
| `resolution` | `tuple[int, int]` | `(1280, 720)` | Camera resolution `(width, height)` |
| `center_window` | `bool` | `True` | Auto-center window on screen via pyautogui |
| `draw_fps` | `bool` | `True` | Overlay FPS counter on frame |
| `fps` | `int` | `15` | Recording frame rate (auto-recording only) |
| `mouse_callback` | `Callable` | `None` | OpenCV mouse-event callback |
| `mouse_callback_params` | `dict` | `None` | Extra params passed to mouse callback |
| `enable_auto_recording` | `bool` | `False` | Record every frame automatically from start |
| `enable_manual_recording` | `bool` | `False` | Allow toggling recording with `R` key |
| `record_format` | `str` | `"mp4"` | `"mp4"` or `"gif"` |
| `enable_screenshot` | `bool` | `False` | Enable `s`-key and auto-screenshot |
| `screenshot_output_dir` | `str` | `"screenshots"` | Directory for saved screenshots |
| `screenshot_prefix` | `str` | `"capture"` | Filename prefix before timestamp |
| `auto_screenshot_after_seconds` | `float` | `None` | Trigger first screenshot after N seconds |
| `auto_screenshot_repeat` | `bool` | `False` | Repeat auto-screenshot every N seconds |

---

### image_template

A single-image equivalent of `video_capture_template`. Loads one image from disk, applies an optional processing callback, resizes to the target resolution, auto-centers the window on screen, and displays it.

```python
import cv2
from openvisionkit.capture.image_template import image_template
from openvisionkit.lib.face_detector import FaceDetector

detector = FaceDetector(model_path="./models/face_detector.tflite", running_mode="IMAGE")

def process(frame):
    annotated, _ = detector.detect_faces(frame)
    return annotated

image_template(
    image_path="photo.jpg",
    custom_logic=process,  # receives the loaded BGR image, must return BGR image
    window_name="Face Demo",
    resolution=(1280, 720),  # image is resized to this before display
    center_window=True,  # auto-centers window on screen via pyautogui
    show_window=True,  # set False to run headless (e.g. save to disk instead)
)
```

Without a `custom_logic` callback the image is loaded, resized, and displayed as-is:

```python
image_template(image_path="photo.jpg")
```

**Parameter reference:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `image_path` | `str` | required | Path to the image file |
| `custom_logic` | `Callable[[ndarray], ndarray]` | `None` | Processing function applied before display |
| `window_name` | `str` | `"Demo"` | OpenCV window title |
| `resolution` | `tuple[int, int]` | `(1280, 720)` | `(width, height)` to resize the image |
| `center_window` | `bool` | `True` | Move window to screen center via pyautogui |
| `show_window` | `bool` | `True` | Display the OpenCV window |

---

### TextDetector

Tesseract-backed OCR class with per-character, per-word, and per-digit detection, document boundary detection, table extraction, image-to-image feature matching, cursive/handwriting OCR, and optional NLP post-processing via spaCy.

#### Installation prerequisites

See [TextDetector additional requirements](#textdetector-additional-requirements) above before using this class.

#### Basic OCR

```python
import cv2
from openvisionkit.lib.text_detector import TextDetector

image = cv2.imread("document.jpg")

detector = TextDetector(
    image=image,
    lang="eng",  # Tesseract language code(s); multi-language: "eng+chi_sim"
    oem=3,  # OCR Engine Mode: 3 = default (LSTM preferred)
    psm=6,  # Page Segmentation Mode: 6 = single uniform text block
    preprocess=True,  # apply grayscale + histogram equalization + adaptive threshold
    use_gpu=False,  # enable OpenCL GPU acceleration for OpenCV ops
)

# Full text string from the image
text = detector.detect_text()
print(text)

# Switch language at runtime (no need to reinstantiate)
detector.set_language("eng+fra")

# Replace the image on an existing instance
new_image = cv2.imread("page2.jpg")
detector.set_image(new_image)
```

#### Word-level detection

```python
words, annotated = detector.detect_words(
    draw_boxes=True,
    bounding_box_color=(255, 0, 0),  # BGR
    text_color=(255, 0, 0),
    font_scale=1,
    font_thickness=2,
)

for word in words:
    print(word["text"])  # recognized word string
    print(word["conf"])  # Tesseract confidence 0–100
    print(word["x"], word["y"], word["w"], word["h"])  # bounding box

cv2.imshow("Words", annotated)
cv2.waitKey(0)

# Convenience accessors
|
|
754
|
+
word_strings = detector.get_words() # List[str]
|
|
755
|
+
lines = detector.get_lines() # List[str] — full lines
|
|
756
|
+
avg_conf = detector.get_confidence() # float — mean confidence across all words
|
|
757
|
+
df = detector.to_dataframe() # pandas DataFrame of word detections
|
|
758
|
+
```
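
The word dictionaries carry a `conf` value, so low-confidence detections can be dropped before further processing. The following is a minimal sketch that works on plain dictionaries with the keys shown above. `filter_words` and the threshold of 60 are assumptions made for illustration, not library API or a recommended value.

```python
from typing import Dict, List


def filter_words(words: List[Dict], min_conf: float = 60.0) -> List[Dict]:
    """Keep non-empty words whose confidence is at least min_conf (hypothetical helper)."""
    kept = []
    for word in words:
        text = str(word["text"]).strip()
        # float() also handles confidences that arrive as strings or as -1.
        if text and float(word["conf"]) >= min_conf:
            kept.append(word)
    return kept


# Hand-written sample data in the documented shape, not real OCR output.
sample = [
    {"text": "Invoice", "conf": 96, "x": 10, "y": 12, "w": 80, "h": 20},
    {"text": "  ", "conf": 95, "x": 95, "y": 12, "w": 6, "h": 20},
    {"text": "t0tal", "conf": 31, "x": 10, "y": 40, "w": 52, "h": 20},
]
reliable = filter_words(sample)
```

In real use, `sample` would be replaced by the `words` list returned from `detect_words()`.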
|
|
759
|
+
|
|
760
|
+
#### Character-level detection
|
|
761
|
+
|
|
762
|
+
```python
|
|
763
|
+
chars, annotated = detector.detect_characters(
|
|
764
|
+
draw_boxes=True,
|
|
765
|
+
is_dark_background=False, # set True to invert image before OCR
|
|
766
|
+
adjust_text_height=20, # vertical offset for label above bounding box
|
|
767
|
+
bounding_box_color=(255, 0, 0),
|
|
768
|
+
text_color=(255, 0, 0),
|
|
769
|
+
)
|
|
770
|
+
|
|
771
|
+
for c in chars:
|
|
772
|
+
print(c["char"]) # single character string
|
|
773
|
+
print(c["x1"], c["y1"]) # top-left (OpenCV coords)
|
|
774
|
+
print(c["x2"], c["y2"]) # bottom-right (OpenCV coords)
|
|
775
|
+
```
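
Character results use corner coordinates (`x1`, `y1`, `x2`, `y2`), while word results use `x`, `y`, `w`, `h`. If both need to be handled by the same code, a small conversion is enough. `corners_to_xywh` is a hypothetical helper for this example.

```python
from typing import Dict, Tuple


def corners_to_xywh(char: Dict) -> Tuple[int, int, int, int]:
    """Convert a corner-style box to (x, y, w, h) (hypothetical helper)."""
    # min/abs keep the result valid even if the two corners arrive swapped.
    x = min(char["x1"], char["x2"])
    y = min(char["y1"], char["y2"])
    w = abs(char["x2"] - char["x1"])
    h = abs(char["y2"] - char["y1"])
    return x, y, w, h


box = corners_to_xywh({"char": "A", "x1": 10, "y1": 20, "x2": 25, "y2": 40})
```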
|
|
776
|
+
|
|
777
|
+
#### Digit-only detection
|
|
778
|
+
|
|
779
|
+
```python
|
|
780
|
+
digits, annotated = detector.detect_digits(image, draw_boxes=True)
|
|
781
|
+
print(digits) # e.g. ['4', '2', '0']
|
|
782
|
+
```
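
Since the digits come back as a list of single-character strings, they usually need to be joined before use. This sketch assumes the list is in reading order, which is not stated above and should be checked on your own images. `digits_to_int` is a hypothetical helper.

```python
from typing import List, Optional


def digits_to_int(digits: List[str]) -> Optional[int]:
    """Join detected digit strings into one integer, or None if nothing usable was found."""
    joined = "".join(digits)
    # isdigit() is False for an empty string and for any stray non-digit character.
    return int(joined) if joined.isdigit() else None


value = digits_to_int(["4", "2", "0"])
```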
|
|
783
|
+
|
|
784
|
+
#### Document & table detection
|
|
785
|
+
|
|
786
|
+
```python
|
|
787
|
+
# Detect document boundary (returns 4-corner numpy array, or None)
|
|
788
|
+
corners = detector.detect_document()
|
|
789
|
+
if corners is not None:
|
|
790
|
+
print("Document corners:", corners)
|
|
791
|
+
|
|
792
|
+
# Extract text from table regions using morphological line detection
|
|
793
|
+
tables = detector.detect_tables()
|
|
794
|
+
for table_text in tables:
|
|
795
|
+
print(table_text)
|
|
796
|
+
```
|
|
797
|
+
|
|
798
|
+
#### Orientation & script detection
|
|
799
|
+
|
|
800
|
+
```python
|
|
801
|
+
osd = detector.image_to_osd()
|
|
802
|
+
print(osd["Orientation in degrees"]) # e.g. '90'
|
|
803
|
+
print(osd["Script"]) # e.g. 'Latin'
|
|
804
|
+
```
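
The OSD values shown above are strings, so the orientation has to be parsed before it can drive a rotation. A minimal sketch, assuming the dictionary shape from the example; `orientation_degrees` is a hypothetical helper.

```python
from typing import Dict


def orientation_degrees(osd: Dict[str, str]) -> int:
    """Parse the OSD orientation string into an integer in the range 0..359."""
    return int(osd["Orientation in degrees"]) % 360


angle = orientation_degrees({"Orientation in degrees": "90", "Script": "Latin"})
page_is_upright = angle == 0
```

Whether the correction passed to `rotate_bound` should be `angle` or `-angle` depends on the rotation convention in use, so it is worth verifying on a sample page.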
|
|
805
|
+
|
|
806
|
+
#### Export formats
|
|
807
|
+
|
|
808
|
+
```python
|
|
809
|
+
# PDF bytes
|
|
810
|
+
pdf_bytes = detector.image_to_pdf_or_hocr(extension="pdf")
|
|
811
|
+
with open("output.pdf", "wb") as f:
|
|
812
|
+
f.write(pdf_bytes)
|
|
813
|
+
|
|
814
|
+
# hOCR HTML bytes
|
|
815
|
+
hocr_bytes = detector.image_to_pdf_or_hocr(extension="hocr")
|
|
816
|
+
|
|
817
|
+
# ALTO XML string (structured layout format for digital libraries)
|
|
818
|
+
alto_xml = detector.image_to_alto_xml()
|
|
819
|
+
```
|
|
820
|
+
|
|
821
|
+
#### Handwriting / cursive OCR
|
|
822
|
+
|
|
823
|
+
```python
|
|
824
|
+
text, preprocessed = detector.extract_cursive_text(image)
|
|
825
|
+
print(text)
|
|
826
|
+
# preprocessed is the adaptive-threshold binary image used for OCR
|
|
827
|
+
```
|
|
828
|
+
|
|
829
|
+
#### Image preprocessing utilities
|
|
830
|
+
|
|
831
|
+
```python
|
|
832
|
+
# Resize (uses imutils to preserve aspect ratio)
|
|
833
|
+
resized = detector.resize(width=800)
|
|
834
|
+
|
|
835
|
+
# Rotate (may clip corners)
|
|
836
|
+
rotated = detector.rotate(angle=45)
|
|
837
|
+
|
|
838
|
+
# Rotate without clipping
|
|
839
|
+
rotated_bound = detector.rotate_bound(angle=45)
|
|
840
|
+
|
|
841
|
+
# Auto deskew (corrects small rotation from skewed scans)
|
|
842
|
+
deskewed = detector.deskew()
|
|
843
|
+
|
|
844
|
+
# Auto Canny edge detection with sigma-based threshold
|
|
845
|
+
edges = detector.auto_canny(sigma=0.33)
|
|
846
|
+
```
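
For context on the `sigma` argument: in the widely used "auto Canny" recipe (the one implemented by `imutils.auto_canny`), the two Canny thresholds are placed symmetrically around the median pixel intensity. The sketch below reproduces that arithmetic on a plain list of grayscale values. It assumes openvisionkit follows the same recipe, which the text above does not confirm; `canny_thresholds` is a hypothetical helper.

```python
from statistics import median
from typing import Sequence, Tuple


def canny_thresholds(pixels: Sequence[int], sigma: float = 0.33) -> Tuple[int, int]:
    """Compute (lower, upper) Canny thresholds from the median intensity."""
    v = median(pixels)
    # A larger sigma widens the band around the median; results are clamped to 0..255.
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    return lower, upper


mid_gray = canny_thresholds([100, 100, 100], sigma=0.5)
bright = canny_thresholds([200, 200, 200], sigma=0.5)
```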
|
|
847
|
+
|
|
848
|
+
#### ORB keypoint detection and image matching
|
|
849
|
+
|
|
850
|
+
These methods are useful for comparing a scanned form against a template to detect alignment, tampering, or form type.
|
|
851
|
+
|
|
852
|
+
```python
|
|
853
|
+
# Detect ORB keypoints and descriptors
|
|
854
|
+
keypoints, descriptors, annotated = detector.detect_keypoints(
|
|
855
|
+
features=500,
|
|
856
|
+
draw_keypoints=True,
|
|
857
|
+
keypoint_color=(0, 255, 0),
|
|
858
|
+
)
|
|
859
|
+
|
|
860
|
+
# Compare two images using KNN feature matching + RANSAC homography
|
|
861
|
+
# Falls back to SSIM if not enough features are found
|
|
862
|
+
template = cv2.imread("template.jpg")
|
|
863
|
+
result = detector.compare_matches_knn_matcher(
|
|
864
|
+
image2=template,
|
|
865
|
+
form_name="Invoice",
|
|
866
|
+
no_of_feature=500,
|
|
867
|
+
matched_amount=50,
|
|
868
|
+
percentage_of_matches=20,
|
|
869
|
+
draw_matches=False,
|
|
870
|
+
draw_aligned=False,
|
|
871
|
+
)
|
|
872
|
+
print(result["matches"]) # number of good matches
|
|
873
|
+
print(result["homography"]) # 3x3 transformation matrix
|
|
874
|
+
# result["aligned_image"] # template warped to match the query
|
|
875
|
+
# result["matched_image"] # side-by-side match visualization
|
|
876
|
+
|
|
877
|
+
# Brute-force matcher variant (no ratio test, faster but less selective)
|
|
878
|
+
result_bf = detector.compare_matches_bf_matcher(image2=template, form_name="Invoice")
|
|
879
|
+
|
|
880
|
+
# SSIM-based fallback (used automatically, also callable directly)
|
|
881
|
+
ssim_result = TextDetector.fallback_ssim(image, template, "Invoice")
|
|
882
|
+
print(ssim_result["ssim_score"]) # structural similarity, at most 1.0 (1.0 = identical images)
|
|
883
|
+
```
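
The comment on the brute-force variant mentions a ratio test. As background, Lowe's ratio test keeps a KNN match only when the best candidate is clearly closer than the second best. The sketch below applies it to plain `(best_distance, second_distance)` pairs. The ratio of 0.75 is a common choice in OpenCV tutorials, not a value taken from this library, and how `matched_amount` is actually applied internally is an assumption.

```python
from typing import List, Sequence, Tuple


def ratio_test(pairs: Sequence[Tuple[float, float]], ratio: float = 0.75) -> List[float]:
    """Return the best distances that pass Lowe's ratio test."""
    return [best for best, second in pairs if best < ratio * second]


def enough_matches(good_count: int, matched_amount: int = 50) -> bool:
    """Assumed decision rule: accept when at least matched_amount good matches remain."""
    return good_count >= matched_amount


# Hand-written distances: only the first pair is clearly unambiguous.
good = ratio_test([(10.0, 40.0), (30.0, 32.0), (5.0, 5.0)])
accepted = enough_matches(len(good), matched_amount=1)
```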
|
|
884
|
+
|
|
885
|
+
#### NLP methods (requires spaCy `en_core_web_sm`)
|
|
886
|
+
|
|
887
|
+
```python
|
|
888
|
+
raw_text = detector.detect_text()
|
|
889
|
+
|
|
890
|
+
# Clean whitespace and newlines
|
|
891
|
+
clean = detector.clean_text(raw_text)
|
|
892
|
+
|
|
893
|
+
# Named entity recognition — returns list of {text, label} dicts
|
|
894
|
+
entities = detector.extract_entities(raw_text)
|
|
895
|
+
# e.g. [{"text": "Singapore", "label": "GPE"}, {"text": "2026", "label": "DATE"}]
|
|
896
|
+
|
|
897
|
+
# Group entities by label
|
|
898
|
+
grouped = detector.group_entities(raw_text)
|
|
899
|
+
# e.g. {"GPE": ["Singapore"], "DATE": ["2026"]}
|
|
900
|
+
|
|
901
|
+
# Keyword extraction (nouns and proper nouns, stop-words filtered)
|
|
902
|
+
keywords = detector.extract_keywords(raw_text)
|
|
903
|
+
|
|
904
|
+
# Extractive summarization (top N sentences)
|
|
905
|
+
summary = detector.summarize(raw_text, max_sentences=3)
|
|
906
|
+
|
|
907
|
+
# Subject-verb-object relation extraction
|
|
908
|
+
relations = detector.extract_relations(raw_text)
|
|
909
|
+
# e.g. [{"subject": ["John"], "verb": "signed", "object": ["contract"]}]
|
|
910
|
+
```
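
To make the relationship between the two entity formats concrete, the grouped shape can be derived from the flat list with a few lines of plain Python. This is an illustrative sketch, not the library's implementation; `group_by_label` is a hypothetical helper, and skipping duplicates is a choice made here.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group {text, label} dicts into {label: [text, ...]}, skipping duplicates."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent["text"] not in grouped[ent["label"]]:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


by_label = group_by_label(
    [
        {"text": "Singapore", "label": "GPE"},
        {"text": "2026", "label": "DATE"},
        {"text": "Singapore", "label": "GPE"},
    ]
)
```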
|
|
911
|
+
|
|
912
|
+
#### GPU acceleration
|
|
913
|
+
|
|
914
|
+
```python
|
|
915
|
+
detector.enable_gpu() # enables OpenCV OpenCL (requires compatible GPU)
|
|
916
|
+
detector.disable_gpu() # revert to CPU
|
|
917
|
+
```
|
|
918
|
+
|
|
919
|
+
---
|
|
920
|
+
|
|
921
|
+
## Project Structure
|
|
922
|
+
|
|
923
|
+
```
|
|
924
|
+
openvisionkit/
|
|
925
|
+
├── __init__.py # package version (__version__)
|
|
926
|
+
├── lib/
|
|
927
|
+
│ ├── face_detector.py # FaceDetector
|
|
928
|
+
│ ├── face_mesh_detector.py # FaceMeshDetector (478 landmarks)
|
|
929
|
+
│ ├── hand_detector.py # HandDetector (21 landmarks)
|
|
930
|
+
│ ├── pose_detector.py # PoseDetector (33 landmarks)
|
|
931
|
+
│ ├── object_detector.py # ObjectDetector (EfficientDet)
|
|
932
|
+
│ ├── selfie_segmentation.py # SelfieSegmentation
|
|
933
|
+
│ ├── hair_segmentation.py # HairSegmentation
|
|
934
|
+
│ ├── fps_counter.py # FPSCounter utility
|
|
935
|
+
│ ├── classifier.py # Generic classifier
|
|
936
|
+
│ ├── form_detector.py # Form / document detector
|
|
937
|
+
│ ├── form_roi_detector.py # Form region-of-interest detector
|
|
938
|
+
│ ├── form_roi_annotator.py # Form annotation utilities
|
|
939
|
+
│ ├── image_detector.py # Image-based detector
|
|
940
|
+
│ ├── image_hsv_detector.py # HSV color-range detector
|
|
941
|
+
│ └── text_detector.py # Text detection
|
|
942
|
+
├── capture/
|
|
943
|
+
│ ├── video_template.py # video_capture_template loop
|
|
944
|
+
│ ├── screen_capture.py # ScreenCapture
|
|
945
|
+
│ ├── video_recorder.py # VideoRecorder
|
|
946
|
+
│ ├── image_template.py # Single-image processing template
|
|
947
|
+
│ └── draw_object.py # Drawing helpers
|
|
948
|
+
└── utility/
|
|
949
|
+
├── vision_utilis.py # Shared image utilities
|
|
950
|
+
└── live_plot.py # Real-time matplotlib plotting
|
|
951
|
+
```
|
|
952
|
+
|
|
953
|
+
---
|
|
954
|
+
|
|
955
|
+
## Running Modes
|
|
956
|
+
|
|
957
|
+
The MediaPipe-based detectors support three running modes:
|
|
958
|
+
|
|
959
|
+
| Mode | Use case | Notes |
|
|
960
|
+
|---|---|---|
|
|
961
|
+
| `IMAGE` | Static images | No timestamp needed |
|
|
962
|
+
| `VIDEO` | Webcam / pre-recorded video | Pass `timestamp_ms` or let the detector auto-increment it |
|
|
963
|
+
| `LIVE_STREAM` | Async streaming | Results delivered via callback |
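
In `VIDEO` mode MediaPipe expects timestamps that increase from frame to frame. If you prefer to supply `timestamp_ms` yourself rather than rely on auto-increment, one option is to derive it from the frame index and a known frame rate. `FrameClock` is a hypothetical helper for this sketch, and which detector method accepts `timestamp_ms` is not shown in this section.

```python
class FrameClock:
    """Generate increasing millisecond timestamps from a fixed frame rate."""

    def __init__(self, fps: float = 30.0) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        self._fps = fps
        self._index = 0

    def next_timestamp_ms(self) -> int:
        # Derive the value from the frame index so rounding errors do not accumulate.
        timestamp = int(self._index * 1000 / self._fps)
        self._index += 1
        return timestamp


clock = FrameClock(fps=30.0)
first_four = [clock.next_timestamp_ms() for _ in range(4)]
```

The values are strictly increasing as long as the frame rate does not exceed 1000 fps.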
|
|
964
|
+
|
|
965
|
+
---
|
|
966
|
+
|
|
967
|
+
## Contributing
|
|
968
|
+
|
|
969
|
+
### Dev setup
|
|
970
|
+
|
|
971
|
+
```bash
|
|
972
|
+
git clone https://github.com/your-org/openvisionkit.git
|
|
973
|
+
cd openvisionkit
|
|
974
|
+
make setup # uv sync + install pre-commit hooks
|
|
975
|
+
```
|
|
976
|
+
|
|
977
|
+
### Useful Make targets
|
|
978
|
+
|
|
979
|
+
| Target | What it does |
|
|
980
|
+
|---|---|
|
|
981
|
+
| `make setup` | Install all deps + pre-commit hooks (run once after clone) |
|
|
982
|
+
| `make format` | Auto-format with black + isort |
|
|
983
|
+
| `make lint` | Run ruff + flake8 |
|
|
984
|
+
| `make lint-fix` | Auto-fix ruff-fixable issues |
|
|
985
|
+
| `make test` | Run all non-integration tests |
|
|
986
|
+
| `make test-cov` | Run tests with HTML coverage report |
|
|
987
|
+
| `make typecheck` | mypy static analysis |
|
|
988
|
+
| `make check` | format-check + lint + typecheck (pre-push sanity) |
|
|
989
|
+
|
|
990
|
+
### Commit convention
|
|
991
|
+
|
|
992
|
+
All commits must follow [Conventional Commits](https://www.conventionalcommits.org/). The pre-commit hook enforces this.
|
|
993
|
+
|
|
994
|
+
| Prefix | Effect |
|
|
995
|
+
|---|---|
|
|
996
|
+
| `fix:`, `perf:`, `refactor:` | patch release |
|
|
997
|
+
| `feat:` | minor release |
|
|
998
|
+
| `feat!:` or `BREAKING CHANGE:` footer | major release |
|
|
999
|
+
| `chore:`, `docs:`, `test:`, `ci:` | no release |
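
The mapping in the table can be expressed as a small function, which is handy for checking a commit message locally. This is only a sketch of the table's rules: the project's actual semantic-release configuration is not shown here, and `release_type` is a hypothetical helper. Following the Conventional Commits specification, it treats a `!` after any type as breaking.

```python
import re


def release_type(message: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a commit message."""
    header = message.splitlines()[0] if message else ""
    # type, optional (scope), optional "!", then a colon
    match = re.match(r"^(\w+)(\([^)]*\))?(!)?:", header)
    if not match:
        return "none"
    if match.group(3) or "BREAKING CHANGE:" in message:
        return "major"
    kind = match.group(1)
    if kind == "feat":
        return "minor"
    if kind in {"fix", "perf", "refactor"}:
        return "patch"
    return "none"


examples = {
    "feat: add table export": release_type("feat: add table export"),
    "fix(ocr): handle empty image": release_type("fix(ocr): handle empty image"),
}
```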
|
|
1000
|
+
|
|
1001
|
+
### CI/CD
|
|
1002
|
+
|
|
1003
|
+
| Workflow | Trigger | Purpose |
|
|
1004
|
+
|---|---|---|
|
|
1005
|
+
| `ci-unit.yml` | push / PR | Unit tests on Python 3.11 + 3.12 |
|
|
1006
|
+
| `ci-integration.yml` | push/PR to main, manual | Integration tests (requires model files) |
|
|
1007
|
+
| `ci-security.yml` | push/PR to main, daily 02:00 UTC | pip-audit, Trivy, CodeQL |
|
|
1008
|
+
| `renovate.yml` | weekly Monday 01:00 UTC | Automated dependency updates |
|
|
1009
|
+
| `semantic-release.yml` | push to main | Semantic version bump + GitHub Release |
|
|
1010
|
+
| `publish.yml` | GitHub Release published | Build + publish to PyPI via OIDC |
|
|
1011
|
+
|
|
1012
|
+
Releases are fully automated: push commits to `main` and the semantic-release workflow handles version bumping, tagging, changelog generation, and the GitHub Release, which in turn triggers `publish.yml` to publish to PyPI.
|
|
1013
|
+
|
|
1014
|
+
---
|
|
1015
|
+
|
|
1016
|
+
## License
|
|
1017
|
+
|
|
1018
|
+
MIT
|