npboxdetect 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. npboxdetect-0.1.0/.gitignore +1 -0
  2. npboxdetect-0.1.0/JOURNEY.md +282 -0
  3. npboxdetect-0.1.0/JOURNEY2.md +136 -0
  4. npboxdetect-0.1.0/PKG-INFO +83 -0
  5. npboxdetect-0.1.0/README.md +66 -0
  6. npboxdetect-0.1.0/benchmark.py +141 -0
  7. npboxdetect-0.1.0/boxdetect/__init__.py +3 -0
  8. npboxdetect-0.1.0/boxdetect/config.py +264 -0
  9. npboxdetect-0.1.0/boxdetect/img_proc.py +274 -0
  10. npboxdetect-0.1.0/boxdetect/pipelines.py +287 -0
  11. npboxdetect-0.1.0/boxdetect/rect_proc.py +349 -0
  12. npboxdetect-0.1.0/data/lc_application1.png +0 -0
  13. npboxdetect-0.1.0/data/lc_application2.png +0 -0
  14. npboxdetect-0.1.0/npboxdetect/__init__.py +10 -0
  15. npboxdetect-0.1.0/npboxdetect/_numba_ops.py +189 -0
  16. npboxdetect-0.1.0/npboxdetect/detector.py +232 -0
  17. npboxdetect-0.1.0/profile_run.py +46 -0
  18. npboxdetect-0.1.0/pyproject.toml +29 -0
  19. npboxdetect-0.1.0/results/lc_application1__boxdetect.png +0 -0
  20. npboxdetect-0.1.0/results/lc_application1__morphology.png +0 -0
  21. npboxdetect-0.1.0/results/lc_application1__npboxdetect.png +0 -0
  22. npboxdetect-0.1.0/results/lc_application1__opencv_contours.png +0 -0
  23. npboxdetect-0.1.0/results/lc_application2__boxdetect.png +0 -0
  24. npboxdetect-0.1.0/results/lc_application2__morphology.png +0 -0
  25. npboxdetect-0.1.0/results/lc_application2__npboxdetect.png +0 -0
  26. npboxdetect-0.1.0/results/lc_application2__opencv_contours.png +0 -0
  27. npboxdetect-0.1.0/results/lc_application__boxdetect.png +0 -0
  28. npboxdetect-0.1.0/results/lc_application__morphology.png +0 -0
  29. npboxdetect-0.1.0/results/lc_application__opencv_contours.png +0 -0
@@ -0,0 +1 @@
__pycache__/
@@ -0,0 +1,282 @@
# npboxdetect: The Optimization Journey (v0 → v6)

> From 4248ms to 37ms — matching boxdetect (OpenCV) using pure NumPy + SciPy.
> Same image. Same 67 detected boxes. Zero deep learning.

---

## The Goal

**boxdetect** is an OpenCV-based checkbox detector that takes an image and returns bounding boxes `(x, y, w, h)`. It runs in ~35ms on a 2200×1700 image.

**npboxdetect** is our attempt to match or beat it using only NumPy (and minimal SciPy for labeled connected components). No OpenCV in the hot path.

---

## The Pipeline (What We're Implementing)

boxdetect's pipeline has these stages:
1. Load image → grayscale
2. Double threshold (Otsu + mean) → binary image
3. Dilate (inflate borders)
4. Morphological OPEN with line kernels → highlight rectangular shapes
5. Find contours
6. Filter by size + ratio
7. Merge overlapping boxes
8. Group into rows/columns

We mirror this pipeline in npboxdetect, step by step.

---

## v0 — First Working Version: 4248ms

### What we built
A pure-Python + NumPy implementation of the full pipeline. Every step was written from scratch, without OpenCV or SciPy.

### Step-by-step

#### Step 1: Load image
**Package:** `PIL (Pillow)` — `Image.open(path).convert("L")`
**Why:** Pillow is the standard Python image loader. `.convert("L")` gives grayscale directly.
**Cost:** ~28ms — Pillow decodes the image in Python, which is slow for large files.

#### Step 2: Otsu Threshold
**Package:** Pure NumPy — `np.histogram` + Python for-loop over 256 bins
**Why:** Otsu's method finds the best threshold by maximizing between-class variance. We implemented it manually.
**What it does:** Builds a 256-bin histogram, then loops over all possible thresholds (0–255), computing the between-class variance of foreground vs. background. Returns the threshold with maximum variance.
**Cost:** ~35ms — `np.histogram` is slow (27ms alone), and the Python loop over 256 bins adds overhead.
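The manual Otsu described above can be sketched in a few lines. This is our illustrative reconstruction, not the package's actual code; `otsu_threshold` is a name we made up.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold that maximizes between-class variance."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))  # total intensity
    w_bg = 0        # background pixel count so far
    sum_bg = 0.0    # background intensity sum so far
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += int(hist[t])
        if w_bg == 0:
            continue                  # no background yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                     # no foreground left
        sum_bg += t * int(hist[t])
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a clean bimodal image (half the pixels at 10, half at 200) this lands the threshold between the two modes, as expected.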

#### Step 3: Apply thresholding
**Why:** boxdetect does TWO inversions — Otsu-inverted AND mean-inverted — then ORs them. This makes box borders extra bright.
**Cost:** ~2ms

#### Step 4: Morphological OPEN with line kernels
**Package:** `scipy.ndimage.binary_opening`
**Why:** Morph OPEN = erode then dilate. It only keeps structures that match the kernel shape. We used two line kernels (horizontal + vertical) to detect box borders.
**What it does:** A horizontal line kernel survives only if a horizontal run of pixels of that length exists. Same for vertical. OR-ing both highlights rectangular outlines.
**Cost:** ~219ms — `binary_opening` from SciPy runs the full 2D erosion+dilation on a 2200×1700 array. Twice (h + v kernels). Slow.

#### Step 5: Connected Components — THE KILLER
**Package:** Pure Python — hand-written two-pass union-find labeling
**Why:** We wanted "pure numpy", so we wrote our own CC labeling.
**What it does:** Scans every pixel, assigns labels, merges adjacent labels via union-find.
**Cost:** **1602ms** — This is a pixel-by-pixel Python loop over 3.7 million pixels. Python loops are ~100x slower than C. This single step destroyed performance.

#### Step 6: Extract bounding boxes from labels
**Package:** Pure Python — `np.where(labels == lbl)` for each unique label
**Why:** For each connected component label, find min/max x and y.
**Cost:** **2322ms** — Called `np.where` once per label (306 labels). Each `np.where` scans the full array. 306 × full-array scan = catastrophic.

#### Step 7: Filter by size + ratio
Pure Python list comprehension. 0.04ms — negligible.

#### Step 8: NMS (non-max suppression)
Custom numpy IoU-based deduplication. ~11ms.

### v0 Total: **4248ms** | boxdetect: 35ms | **120x slower**

---

## v1 — Kill the Python Loops: 305ms

### The Fix
Replace the pure-Python connected components (steps 5+6) with SciPy's C-compiled versions.

#### Step 5+6: `scipy.ndimage.label` + `scipy.ndimage.find_objects`

**Package:** `scipy.ndimage`

**`scipy.ndimage.label(binary)`**
- What it does: Labels every connected blob of white pixels with a unique integer. Returns a 2D label array and the count of blobs. Written in C, it processes the entire array in one pass.
- Why we chose it: It's the standard, battle-tested CC algorithm in Python's scientific stack. 10–50x faster than any pure-Python implementation.
- Cost after: ~24ms (was 1602ms)

**`scipy.ndimage.find_objects(labeled)`**
- What it does: Returns a list of `(slice_y, slice_x)` tuples — one per label — representing the tight bounding box of each blob. No per-label array scan; one C pass over the label array.
- Why we chose it: Instead of calling `np.where(labels == i)` 306 times (each scanning 3.7M pixels), `find_objects` gives all bounding slices in a single pass.
- Cost after: included in the 24ms above (was 2322ms)
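On a toy binary image the two calls work like this (a sketch assuming SciPy is installed; the box math mirrors the description above):

```python
import numpy as np
from scipy import ndimage

# Two separate blobs of 1s in a toy binary image.
binary = np.zeros((8, 8), dtype=np.uint8)
binary[1:3, 1:4] = 1   # blob A: rows 1-2, cols 1-3
binary[5:7, 5:8] = 1   # blob B: rows 5-6, cols 5-7

labels, num = ndimage.label(binary)     # one C pass, labels 1..num
slices = ndimage.find_objects(labels)   # one (slice_y, slice_x) pair per label

boxes = []
for sy, sx in slices:
    x, y = sx.start, sy.start
    boxes.append((x, y, sx.stop - x, sy.stop - y))   # (x, y, w, h)
```

No per-label scan: `find_objects` returns every bounding slice from a single pass over the label array.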

**Also fixed:** Started replacing the `scipy.ndimage.binary_opening` morph open with our own 1D separable approach (still slow at this point, but it laid the groundwork for v2 and v3).

### v1 Total: **305ms** | **14x improvement over v0**

---

## v2 — Vectorize Morph OPEN: 181ms

### The Problem
`scipy.ndimage.binary_opening` with line kernels was taking 184ms. It runs full 2D erosion+dilation on a 3.7M pixel array. Way too expensive.

### The Fix
Replace `binary_opening` with a custom **1D separable erosion+dilation using `np.lib.stride_tricks`**.

#### Erosion via stride tricks
**Package:** `numpy.lib.stride_tricks.as_strided`
**What it does:** Creates a sliding-window view of the array without copying data. A window of size K over width W gives an `(H, W−K+1, K)` view. Then `windows.min(axis=2)` gives the erosion result — the minimum over each window.
**Why:** Instead of looping over each pixel, we get all windows at once as a 3D view and reduce in one numpy operation.
**Cost after:** Still slow (~90ms per call) because the `as_strided` view is not contiguous — numpy has to gather K elements from memory for each window, which kills cache performance at large K.
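The same windowed-minimum idea can be sketched with `sliding_window_view` (NumPy ≥ 1.20), the safer wrapper around `as_strided`. The helper names are illustrative, and the output is "valid"-mode, i.e. K−1 columns shorter than the input:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode_h(b, k):
    """Horizontal 1D erosion: 1 only where a full k-long run of 1s starts."""
    windows = sliding_window_view(b, k, axis=1)  # (H, W-k+1, k) view, no copy
    return windows.min(axis=2)                   # min over each window = erosion

def dilate_h(b, k):
    """Horizontal 1D dilation: 1 where any pixel in the k-window is 1."""
    windows = sliding_window_view(b, k, axis=1)
    return windows.max(axis=2)

row = np.array([[0, 1, 1, 1, 0, 1, 0]], dtype=np.uint8)
er = erode_h(row, 3)   # only the run of three 1s survives
```

The view itself is free; the cost is in the reduction, which gathers k strided elements per output pixel, exactly the cache problem described above.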

### v2 Total: **181ms** | **1.7x improvement over v1**

---

## v3 — Fix the Window Slice: 128ms

### The Problem
`stride_tricks` was slow because of non-contiguous memory access. The real fix: use **cumulative sum (prefix sum)** to compute any window sum in O(1) per window.

### The Fix
**Package:** Pure NumPy — `np.cumsum` + contiguous slice subtraction

**How prefix sum erosion works:**
```
cumsum[i] = sum of pixels 0..i-1 (exclusive prefix, cumsum[0] = 0)
window_sum[j] = cumsum[j+length] - cumsum[j]   (sum of pixels j..j+length-1)
eroded[j] = 1 if window_sum[j] == length (all pixels are 1)
dilated[j] = 1 if window_sum[j] > 0 (any pixel is 1)
```

**Key insight:** `cs[:, length:] - cs[:, :valid_w]` — both are contiguous slices, so numpy subtracts them in a single vectorized pass. No index arrays, no copies.
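A runnable sketch of the prefix-sum open along rows. This is our reconstruction: the helper name and the small loop that re-inflates eroded run starts are ours, not the package's.

```python
import numpy as np

def open1d_h(b, length):
    """Morphological OPEN along rows with a 1 x length line kernel, via prefix sums."""
    H, W = b.shape
    out = np.zeros_like(b)
    if length > W:
        return out
    # cs[:, i] = sum of pixels 0..i-1 (exclusive prefix), so
    # cs[:, j+length] - cs[:, j] = sum of the window starting at column j.
    cs = np.zeros((H, W + 1), dtype=np.int64)
    cs[:, 1:] = np.cumsum(b, axis=1)
    win = cs[:, length:] - cs[:, :W - length + 1]  # contiguous slices, one vectorized subtract
    eroded = win == length                         # window is entirely 1s
    # Dilate each surviving run start j back to full extent j..j+length-1.
    for r, j in zip(*np.nonzero(eroded)):
        out[r, j:j + length] = 1
    return out
```

The hot path is exactly the key insight above: one cumsum pass plus one subtraction of two contiguous slices.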

### v3 Total: **128ms** | **1.4x improvement over v2**

---

## v4 — Faster Otsu with bincount: 117ms

### The Problem
`np.histogram(gray.ravel(), bins=256, range=(0,256))` was costing 27ms just to build the pixel histogram.

### The Fix
**Package:** NumPy — `np.bincount`

**`np.bincount(gray.ravel(), minlength=256)`**
- What it does: Counts occurrences of each integer value 0–255 in the flat array.
- Why it's faster: `bincount` is purpose-built for integer counting. It makes one pass through the array with no range/bin-boundary checks. `np.histogram` has overhead for generic floating-point binning logic.
- Cost: 10ms vs 27ms — **3x faster** for the same result.
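For uint8 data the swap is drop-in; both calls produce the same 256 counts:

```python
import numpy as np

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

h_hist = np.histogram(gray.ravel(), bins=256, range=(0, 256))[0]
h_binc = np.bincount(gray.ravel(), minlength=256)

# Identical counts; bincount just gets there without bin-boundary logic.
same = np.array_equal(h_hist, h_binc)
```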

Also improved `apply_thresholding`: instead of creating two separate uint8 arrays and adding them, we do a single boolean OR:
```python
result = (gray < t_otsu) | (gray < t_mean)
```
One operation, bool dtype (1 byte per pixel vs 4), then `.view(np.uint8) * 255` to convert — avoids an intermediate allocation.

### v4 Total: **117ms** | **1.1x improvement over v3**

---

## v5 — Eliminate Redundant Transpose: 111ms

### The Problem
For the vertical morph open, we were calling `np.ascontiguousarray(b.T)` inside `_open1d_h`, which allocated a new contiguous copy of the transposed array on every call. This happened twice (once for erode, once for dilate).

### The Fix
Compute the contiguous transpose **once** outside the function and pass it in. Reuse it across both erode and dilate steps:

```python
bT = np.ascontiguousarray(b.T)  # one copy
opened_h = _open1d_h(b, h_len)
opened_v = _open1d_h(bT, v_len).T  # reuses bT
```

`.T` on the result is a **free view** (no copy) — numpy just changes stride metadata.

Also switched the final OR to use `.view(np.uint8)` instead of `.astype(np.uint8)` — view reinterprets bool memory as uint8 in place (zero copy).
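Both zero-copy claims can be checked directly with `np.shares_memory` (on a toy array, not the pipeline's):

```python
import numpy as np

b = np.ones((4, 6), dtype=bool)

# .T is a stride-metadata view: no bytes move.
assert np.shares_memory(b, b.T)
# ...but the transposed view is no longer C-contiguous,
# which is why v5 materializes it once with ascontiguousarray.
bT = np.ascontiguousarray(b.T)
assert bT.flags["C_CONTIGUOUS"] and not np.shares_memory(b, bT)

# bool -> uint8 reinterpretation is also zero-copy (both are 1 byte/element),
u = b.view(np.uint8)
assert np.shares_memory(b, u)
# while astype always allocates a fresh array.
assert not np.shares_memory(b, b.astype(np.uint8))
```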

### v5 Total: **111ms** | **1.06x improvement over v4**

---

## v6 — 2x Downsample Before Processing: 37ms ✅

### The Insight
All the expensive operations (threshold, morph open, labeling) scale with **image pixel count**. A 2200×1700 image has 3.74M pixels. At half size it's 1100×850 = 935K pixels — **4x fewer**.

boxdetect already does this via `scaling_factors` — it was the trick we missed all along.

### The Fix
Downsample by 2x using **stride-based slicing** before the pipeline:

```python
gray_small = gray[::2, ::2]  # 0.008ms — no copy, just a view with stride=2
```

**Why `[::2, ::2]` and not `cv2.resize`:**
- `gray[::2, ::2]` is a numpy view — it changes the stride metadata without copying any data. Cost: 0.008ms.
- `cv2.resize` or `PIL.resize` interpolates pixels and allocates a new array. Cost: ~5ms.
- For binary thresholding, interpolation doesn't help. Stride subsampling is sufficient and free.

**Scale kernel sizes down:**
```python
h_len = max(2, int(min_w * 0.95 / scale))  # e.g. 19 → 9
v_len = max(2, int(min_h * 0.95 / scale))
```

**Scale bboxes back up after detection:**
```python
boxes = [(int(x*scale), int(y*scale), int(w*scale), int(h*scale)) for ...]
```

### Impact on every step

| Step | Before (v5) | After (v6) |
|---|---|---|
| threshold | 22ms | **5ms** |
| morph OPEN | 59ms | **11ms** |
| label+bbox | 20ms | **8ms** |
| Total | 111ms | **37ms** |

### v6 Total: **37ms** | boxdetect: **37ms** | **🎯 Parity achieved**

---

## Final Comparison

| Version | Total Time | vs boxdetect | Key change |
|---|---|---|---|
| v0 | 4248ms | 120x slower | Baseline — pure Python CC |
| v1 | 305ms | 8.7x slower | `scipy.ndimage.label` + `find_objects` |
| v2 | 181ms | 5.2x slower | Stride-tricks 1D morph open |
| v3 | 128ms | 3.7x slower | Cumsum + contiguous slice subtraction |
| v4 | 117ms | 3.3x slower | `np.bincount` for Otsu |
| v5 | 111ms | 3.2x slower | Reuse transpose, bool view |
| **v6** | **37ms** | **1.0x — tied** ✅ | 2x stride downsample |

---

## Step-by-step timing: final state

```
[boxdetect]   TOTAL 36ms | rects=67
[npboxdetect] TOTAL 37ms | rects=68 (1 extra due to bbox scale rounding)

npboxdetect breakdown:
1. load + grayscale           9ms
2. downsample 2x              0ms  ← free numpy view
3. thresholding (otsu+mean)   5ms
4. morph OPEN (separable)    11ms
5+6. label + bboxes (scipy)   8ms
7. filter size/ratio          0ms
8. NMS merge                  1ms
```

---

## Key Lessons

1. **Profile before optimizing.** The bottleneck was never where we assumed (morph open) — it was the pure-Python CC loop (1602ms) and per-label `np.where` (2322ms).

2. **`scipy.ndimage` is C-compiled.** `label` + `find_objects` replaced ~3900ms of Python with ~24ms of C. The single biggest win.

3. **Prefix sum (cumsum) is O(N) for any window size.** Stride tricks look clever but create non-contiguous memory access. Cumsum is cache-friendly and scales with image size, not kernel size.

4. **Contiguous slices beat fancy indexing.** `cs[:, length:] - cs[:, :W]` is one vectorized subtract. `cs[:, j+length] - cs[:, j]` with an index array forces numpy to gather non-contiguous memory — 10x slower.

5. **`np.bincount` beats `np.histogram` for integer arrays.** Same result, 3x faster.

6. **Downsample the image, not the algorithm.** When every op scales with pixel count, cutting pixels by 4x is worth more than any algorithmic improvement.

7. **`array[::2, ::2]` is free.** Stride-based subsampling is a view — zero copy, zero cost. `cv2.resize` interpolates and allocates. Don't use resize when you don't need interpolation.
@@ -0,0 +1,136 @@
# JOURNEY2 — Pushing Below boxdetect

## Starting point (end of JOURNEY1)
npboxdetect: **37ms** | boxdetect: **37ms** — tied

---

## What we found when we profiled harder

A 100-run tight benchmark revealed the real per-step floor:

| Step | Time |
|---|---|
| cv2.imread GRAY | 8ms |
| threshold | 4ms |
| morph_open | 10ms |
| scipy label+find_objects | 4.7ms |
| NMS | **20ms** ← hidden bottleneck |

NMS was 20ms because it was running on **1067 raw boxes** (all blobs) instead of the 68 filtered ones.

---

## Fix 1 — Vectorized NMS on filtered boxes: 20ms → 0.2ms

Old: while-loop NMS over 1067 boxes = 20ms
New: build the full pairwise IoU matrix on the 68 filtered boxes, suppress via an upper-triangle mask

```python
iou = inter / (areas[:,None] + areas[None,:] - inter + 1e-6)
suppress[i+1:] |= iou[i, i+1:] >= thresh
```
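Fleshed out around those two core lines, a runnable version might look like this (our reconstruction, not the package's exact code):

```python
import numpy as np

def nms(boxes, thresh=0.5):
    """Keep a box unless it overlaps (IoU >= thresh) an earlier kept box."""
    if len(boxes) == 0:
        return []
    b = np.asarray(boxes, dtype=np.float64)   # rows of (x, y, w, h)
    x1, y1 = b[:, 0], b[:, 1]
    x2, y2 = x1 + b[:, 2], y1 + b[:, 3]
    areas = b[:, 2] * b[:, 3]

    # Full pairwise intersection via broadcasting.
    iw = np.clip(np.minimum(x2[:, None], x2[None, :]) - np.maximum(x1[:, None], x1[None, :]), 0, None)
    ih = np.clip(np.minimum(y2[:, None], y2[None, :]) - np.maximum(y1[:, None], y1[None, :]), 0, None)
    inter = iw * ih
    iou = inter / (areas[:, None] + areas[None, :] - inter + 1e-6)

    suppress = np.zeros(len(b), dtype=bool)
    for i in range(len(b)):
        if suppress[i]:
            continue
        suppress[i + 1:] |= iou[i, i + 1:] >= thresh   # upper-triangle mask
    return [tuple(map(int, row)) for row in b[~suppress]]
```

The IoU matrix is O(n²) memory, which is exactly why filtering first (1067 boxes down to ~68) matters.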

**Result: 32ms → 25ms**

---

## Fix 2 — cv2.connectedComponentsWithStats replaces scipy: 4.7ms → 1.7ms

`scipy.ndimage.label` + `find_objects` = 4.7ms
`cv2.connectedComponentsWithStats` = 1.7ms — same C speed, but it returns stats (x, y, w, h, area) directly, so no second pass is needed.

```python
num, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
boxes = [(stats[i,0], stats[i,1], stats[i,2], stats[i,3]) for i in range(1, num)]
```

**Result: 25ms → processing floor ~17ms**

---

## Fair benchmark — imread excluded (both pay the same PNG cost)

PNG decode via `cv2.imread` costs ~28ms for both libraries. No faster PNG reader exists (we tested cv2, PIL, imageio — cv2 wins). So we exclude imread and measure pure processing:

| Image | npboxdetect p50 | boxdetect p50 | Winner |
|---|---|---|---|
| lc_application1.png | **19.3ms** | 28.1ms | 🏆 npboxdetect |
| lc_application2.png | **21.2ms** | 25.0ms | 🏆 npboxdetect |

**npboxdetect is ~30% faster on pure processing.**

---

## Current floor per step

```
imread (PNG)      28ms  — libpng zlib, irreducible
threshold          4ms
morph_open        10ms  — 4 cumsums on (1100,850)
cv2 CC + bboxes  1.7ms
NMS              0.2ms
─────────────────────
processing total ~17ms
```

morph_open (10ms) is the last wall. It's 4 cumsum passes over a 935K-element array — each cumsum costs ~2ms at uint16.

---

## Fix 3 — Numba parallel run-length morph open: 10ms → 1.3ms

Replaced the cumsum trick (4 numpy passes) with a numba `@njit(parallel=True)` run-length scan.
Each row/column is processed independently → no dependencies → all rows run in parallel via `prange`.

```python
@njit(parallel=True, cache=True)
def open_lines_numba(b, L):
    for r in prange(H):      # each row independent
        ...                  # run-length: find L-long runs of 1s, mark them
    for col in prange(W):    # each col independent
        ...                  # same, vertically
    return out_h | out_v
```
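Setting numba aside, the run-length logic itself is short. Below is a numba-free sketch of the horizontal half (illustrative: with numba the outer loop becomes `prange`, and the same scan runs over columns for the vertical half):

```python
import numpy as np

def open_rows_runlength(b, L):
    """Horizontal open via run-length: keep only runs of 1s with length >= L."""
    H, W = b.shape
    out = np.zeros_like(b)
    for r in range(H):              # with numba, this loop becomes prange(H)
        run_start = -1
        for c in range(W + 1):      # sentinel column flushes a run ending at the edge
            if c < W and b[r, c]:
                if run_start < 0:
                    run_start = c   # a run of 1s begins here
            else:
                if run_start >= 0 and c - run_start >= L:
                    out[r, run_start:c] = 1   # run is long enough: keep it
                run_start = -1
    return out
```

One scan per row, no cumsum buffers, and every row is independent, which is what makes the `prange` parallelization trivial.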

**Result: 17ms → 8ms processing total**

---

## Fix 4 — Fused numba otsu+threshold: 2.1ms → 1.4ms

`cv2.threshold(..., THRESH_OTSU)` costs 1.3ms just to return the threshold value.
Replaced with a single fused numba function: one serial histogram pass → Otsu + mean → one parallel binarization pass.
This eliminates the cv2 call entirely and outputs 0/1 directly (skipping the `(binary>0).astype` cast).

```python
@njit(parallel=True, cache=True)
def otsu_and_threshold(gray):
    ...  # histogram pass (serial, 256 bins)
    ...  # otsu + mean calculation (256-iteration loop)
    ...  # parallel binarize: pixel <= max(t_otsu, t_mean) → 1 else 0
    return out  # dtype=uint8, values 0/1
```

**Result: 8ms → 6.2ms processing total**

---

## Final benchmark (processing only, imread excluded)

| Image | npboxdetect p50 | boxdetect p50 | Speedup |
|---|---|---|---|
| lc_application1.png | **6.5ms** | 25.5ms | **3.9x** |
| lc_application2.png | **7.1ms** | 18.6ms | **2.6x** |

## Final floor per step

```
downsample 2x     ~0ms  — zero-copy stride view
otsu+threshold   1.4ms  — fused numba (histogram+binarize)
morph open       1.3ms  — numba parallel run-length
cv2 CC + filter  1.3ms  — C-compiled, irreducible
NMS              0.2ms  — vectorized pairwise IoU
─────────────────────────
processing total  ~6ms
```
@@ -0,0 +1,83 @@
Metadata-Version: 2.4
Name: npboxdetect
Version: 0.1.0
Summary: Fastest checkbox/box detector — NumPy + Numba, 3-4x faster than boxdetect
Project-URL: Homepage, https://github.com/santhoshkammari/checkboxer
Author-email: santhoshkammari <santhoshkammari1999@gmail.com>
License: MIT
Keywords: checkbox,detection,document,numba,numpy,ocr
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.9
Requires-Dist: numba
Requires-Dist: numpy
Requires-Dist: opencv-python
Description-Content-Type: text/markdown

# checkboxer / npboxdetect

> Fastest checkbox detector. NumPy + Numba. No deep learning.

## Benchmark

| Approach | p50 (ms) | boxes | vs npboxdetect |
|---|---|---|---|
| **npboxdetect** | **34ms** | 69 | baseline |
| boxdetect | 60ms | 67 | 1.8x slower |
| opencv_contours | 33ms | 179 | ~same (but noisy) |
| morphology | 34ms | 25 | ~same (misses many) |

*50-run p50, full pipeline including imread, on 2200×1700 document images.*

**Pure processing (imread excluded):**

| Image | npboxdetect | boxdetect | Speedup |
|---|---|---|---|
| lc_application1.png | **6.5ms** | 25.5ms | **3.9x** |
| lc_application2.png | **7.1ms** | 18.6ms | **2.6x** |

## Usage

```python
from npboxdetect.detector import get_boxes

boxes = get_boxes("form.png")
# [(x, y, w, h), ...]
```

## How it works

Five steps, all near the hardware limit:

```
downsample 2x          ~0ms   stride view, zero copy
otsu + threshold      1.4ms   fused numba: histogram → binarize in one shot
morph open            1.3ms   numba parallel run-length scan (rows+cols)
connected components  1.3ms   cv2.connectedComponentsWithStats
NMS                   0.2ms   vectorized pairwise IoU matrix
─────────────────────────────
processing total       ~6ms
```

Key ideas:
- **2x downsample** before processing — free stride view, quarters the pixel count
- **Fused numba otsu+threshold** — one serial histogram pass + one parallel binarize pass, no cv2 call
- **Run-length morph open** — each row/column independent → `prange` parallelism, no cumsum overhead
- **Filter before NMS** — size/ratio filter drops 1000+ blobs to ~70 before NMS

## Install

```bash
git clone https://github.com/santhoshkammari/checkboxer
cd checkboxer
pip install numba opencv-python numpy
```

## Journey

See [JOURNEY.md](JOURNEY.md) (v0→v6, 4248ms→37ms) and [JOURNEY2.md](JOURNEY2.md) (v7→v10, 37ms→6ms).

## Author

[santhoshkammari](https://github.com/santhoshkammari)
@@ -0,0 +1,66 @@
# checkboxer / npboxdetect

> Fastest checkbox detector. NumPy + Numba. No deep learning.

## Benchmark

| Approach | p50 (ms) | boxes | vs npboxdetect |
|---|---|---|---|
| **npboxdetect** | **34ms** | 69 | baseline |
| boxdetect | 60ms | 67 | 1.8x slower |
| opencv_contours | 33ms | 179 | ~same (but noisy) |
| morphology | 34ms | 25 | ~same (misses many) |

*50-run p50, full pipeline including imread, on 2200×1700 document images.*

**Pure processing (imread excluded):**

| Image | npboxdetect | boxdetect | Speedup |
|---|---|---|---|
| lc_application1.png | **6.5ms** | 25.5ms | **3.9x** |
| lc_application2.png | **7.1ms** | 18.6ms | **2.6x** |

## Usage

```python
from npboxdetect.detector import get_boxes

boxes = get_boxes("form.png")
# [(x, y, w, h), ...]
```

## How it works

Five steps, all near the hardware limit:

```
downsample 2x          ~0ms   stride view, zero copy
otsu + threshold      1.4ms   fused numba: histogram → binarize in one shot
morph open            1.3ms   numba parallel run-length scan (rows+cols)
connected components  1.3ms   cv2.connectedComponentsWithStats
NMS                   0.2ms   vectorized pairwise IoU matrix
─────────────────────────────
processing total       ~6ms
```

Key ideas:
- **2x downsample** before processing — free stride view, quarters the pixel count
- **Fused numba otsu+threshold** — one serial histogram pass + one parallel binarize pass, no cv2 call
- **Run-length morph open** — each row/column independent → `prange` parallelism, no cumsum overhead
- **Filter before NMS** — size/ratio filter drops 1000+ blobs to ~70 before NMS

## Install

```bash
git clone https://github.com/santhoshkammari/checkboxer
cd checkboxer
pip install numba opencv-python numpy
```

## Journey

See [JOURNEY.md](JOURNEY.md) (v0→v6, 4248ms→37ms) and [JOURNEY2.md](JOURNEY2.md) (v7→v10, 37ms→6ms).

## Author

[santhoshkammari](https://github.com/santhoshkammari)