mangascourx 1.0.3__tar.gz
- mangascourx-1.0.3/MANIFEST.in +16 -0
- mangascourx-1.0.3/PKG-INFO +558 -0
- mangascourx-1.0.3/README.md +515 -0
- mangascourx-1.0.3/_version.py +2 -0
- mangascourx-1.0.3/mangascourx.egg-info/PKG-INFO +558 -0
- mangascourx-1.0.3/mangascourx.egg-info/SOURCES.txt +10 -0
- mangascourx-1.0.3/mangascourx.egg-info/dependency_links.txt +1 -0
- mangascourx-1.0.3/mangascourx.egg-info/not-zip-safe +1 -0
- mangascourx-1.0.3/mangascourx.egg-info/requires.txt +11 -0
- mangascourx-1.0.3/mangascourx.egg-info/top_level.txt +1 -0
- mangascourx-1.0.3/setup.cfg +4 -0
- mangascourx-1.0.3/setup.py +72 -0
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
# MANIFEST.in
|
|
2
|
+
include README.md
|
|
3
|
+
include LICENSE
|
|
4
|
+
include _version.py
|
|
5
|
+
include requirements.txt
|
|
6
|
+
|
|
7
|
+
recursive-include MangaScourX *.py
|
|
8
|
+
recursive-include MangaScourX *.pyi
|
|
9
|
+
|
|
10
|
+
recursive-exclude * __pycache__
|
|
11
|
+
recursive-exclude * *.py[co]
|
|
12
|
+
recursive-exclude * *.so
|
|
13
|
+
recursive-exclude * *.dll
|
|
14
|
+
|
|
15
|
+
prune tests
|
|
16
|
+
prune docs/_build
|
|
@@ -0,0 +1,558 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: mangascourx
|
|
3
|
+
Version: 1.0.3
|
|
4
|
+
Summary: Advanced Multi-Scale PatchMatch & AI-Powered Text Removal Engine for Manga
|
|
5
|
+
Home-page: https://github.com/zxui86/mangascourx
|
|
6
|
+
Author: Zizo
|
|
7
|
+
Author-email: zly30257@gmail.com
|
|
8
|
+
License: MIT
|
|
9
|
+
Keywords: manga,comic,inpainting,text-removal,patchmatch,image-processing,computer-vision
|
|
10
|
+
Classifier: Development Status :: 4 - Beta
|
|
11
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
12
|
+
Classifier: Programming Language :: Python :: 3
|
|
13
|
+
Classifier: Programming Language :: Python :: 3.8
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
18
|
+
Classifier: Topic :: Scientific/Engineering :: Image Processing
|
|
19
|
+
Requires-Python: >=3.8
|
|
20
|
+
Description-Content-Type: text/markdown
|
|
21
|
+
Requires-Dist: numpy>=1.20.0
|
|
22
|
+
Requires-Dist: opencv-python-headless>=4.5.0
|
|
23
|
+
Requires-Dist: numba>=0.53.0
|
|
24
|
+
Requires-Dist: scipy>=1.7.0
|
|
25
|
+
Requires-Dist: torch>=1.9.0
|
|
26
|
+
Requires-Dist: torchvision>=0.10.0
|
|
27
|
+
Provides-Extra: dev
|
|
28
|
+
Requires-Dist: pytest>=7.0.0; extra == "dev"
|
|
29
|
+
Requires-Dist: twine>=4.0.0; extra == "dev"
|
|
30
|
+
Requires-Dist: build>=0.8.0; extra == "dev"
|
|
31
|
+
Dynamic: author
|
|
32
|
+
Dynamic: author-email
|
|
33
|
+
Dynamic: classifier
|
|
34
|
+
Dynamic: description
|
|
35
|
+
Dynamic: description-content-type
|
|
36
|
+
Dynamic: home-page
|
|
37
|
+
Dynamic: keywords
|
|
38
|
+
Dynamic: license
|
|
39
|
+
Dynamic: provides-extra
|
|
40
|
+
Dynamic: requires-dist
|
|
41
|
+
Dynamic: requires-python
|
|
42
|
+
Dynamic: summary
|
|
43
|
+
|
|
44
|
+
MangaScourX (v1.0.2) — Production-Grade Multi-Scale Geometric-Aware Inpainting & Hybrid Text Detection Architecture
|
|
45
|
+
|
|
46
|
+
## 1. EXECUTIVE SUMMARY & ARCHITECTURAL PHILOSOPHY
|
|
47
|
+
|
|
48
|
+
`MangaScourX` is an industrial‑grade, highly optimized Python library tailored specifically for the automated localization, segmentation, and high‑fidelity geometric restoration of structural anomalies, speech bubbles, and text layers within stylized line art, specifically Japanese Manga and comic illustrations.
|
|
49
|
+
|
|
50
|
+
Unlike generic image‑processing pipelines or standard convolutional neural network (CNN) inpainters that suffer from severe structural boundary degradation, high‑frequency aliasing, and catastrophic blurring on binary/halftone high‑contrast structures, `MangaScourX` implements a decoupled mathematical framework:
|
|
51
|
+
|
|
52
|
+
1. **Hybrid Structural Localization Layer (Detection):**
|
|
53
|
+
Synthesizes non‑parametric geometric feature tracking (Maximally Stable Extremal Regions - MSER, and Stroke Width Transform - SWT) with deep‑learning‑driven sequence awareness (Character‑Region Awareness for Text Detection - CRAFT) to isolate text bounding hulls without destroying background frame borders.
|
|
54
|
+
|
|
55
|
+
2. **5D Generalized PatchMatch Resynthesizer (Inpainting):**
|
|
56
|
+
An exact multi‑scale non‑local texture synthesis engine optimized via Numba‑driven LLVM compilation, capable of navigating a 5‑Dimensional search space—incorporating Subpixel Fractional Floating‑Point Translations $(X, Y)$, Continuous Orientation Rotation Matrices $(\theta)$, Scale Multipliers $(S)$, and Nearest‑Neighbor Fields ($K$‑NN).
|
|
57
|
+
|
|
58
|
+
---
|
|
59
|
+
|
|
60
|
+
## 2. REPOSITORY HIERARCHY & SYSTEM TOPOLOGY
|
|
61
|
+
|
|
62
|
+
The system is split into isolated modules that separate detection contracts, deterministic numerical array processing, and high-level execution coordinators.
|
|
63
|
+
|
|
64
|
+
```
MangaScourX/
│
├── __init__.py                 # Global library gateway & public namespace exposition
├── setup.py                    # Dependency matrices, architecture compilation configs
│
├── detection/                  # Text Tracking & Feature Extraction Subsystem
│   ├── __init__.py
│   ├── base.py                 # Strict ABC contracts for detection interfaces
│   ├── detection.py            # Central Orchestrator & Fallback Coordinator
│   ├── mask.py                 # Label alignment, conflict matrix, morphological consolidation
│   │
│   ├── bubbles/                # Structural Bubble Geometry Trackers
│   │   ├── contours.py         # Convex Hull extractions & topological filters
│   │   └── morphology.py       # Multi-stage structuring binary morph operators
│   │
│   └── text/                   # Textual Content Segmentors
│       ├── mser.py             # Maximally Stable Extremal Regions (Non-AI Fast Track)
│       ├── swt.py              # Stroke Width Transform (Geometric stroke tracking)
│       └── craft_adapter.py    # PyTorch CRAFT Deep Learning Adapter Layer
│
├── inpainting/                 # High-Fidelity Non-Local Texture Synthesizers
│   ├── __init__.py
│   ├── base.py                 # Strict Inpainter ABC signature blueprints
│   ├── telea.py                # Fast-Marching PDE-based propagation (Edge seed)
│   ├── coherence.py            # Structure Tensor Coherence Transport (Directional drift)
│   │
│   └── patchmatch/             # 5D Non-Parametric Patch Resynthesis Engine
│       ├── __init__.py
│       ├── core.py             # XorShift32 RNG, Bilinear Sampling, Numba-compiled SSD
│       ├── propagation.py      # Spatial/Geometric step propagation & Log-Random Search
│       └── engine.py           # Multi-Scale NNF memory manager and execution pipeline
│
└── pipelines/                  # Monolithic Execution Controllers (Orchestration Traffic)
    ├── __init__.py
    ├── text_remove.py          # Decoupled Segment-then-Inpaint linear pipe
    └── manga_clean.py          # Clean-up pipeline (Adaptive Whitening + Despeckle)
```
|
|
104
|
+
|
|
105
|
+
---
|
|
106
|
+
|
|
107
|
+
## 3. DEEP DIVE: DETECTION SUBSYSTEM (`detection/`)
|
|
108
|
+
|
|
109
|
+
### 3.1 `base.py` — Structural Contracts
|
|
110
|
+
|
|
111
|
+
Implements the abstract contract base for all feature extractors. Every detection engine must subclass `BaseDetector`.
|
|
112
|
+
|
|
113
|
+
```python
from __future__ import annotations

import abc

import numpy as np
from numpy.typing import NDArray


class BaseDetector(abc.ABC):
    """Abstract contract that every detection engine subclasses."""

    def __init__(self, **kwargs) -> None:
        # Keep arbitrary keyword options for concrete detectors to read
        self.config = kwargs

    @abc.abstractmethod
    def detect(self, image: NDArray[np.uint8]) -> NDArray[np.uint8]:
        """Must return a strict binary mask of shape (H, W), dtype=np.uint8 (values: 0 or 255)"""
        pass
```
|
|
128
|
+
|
|
129
|
+
### 3.2 `text/mser.py`: Non-AI Maximally Stable Extremal Regions
|
|
130
|
+
|
|
131
|
+
MSER views an image as a topographic surface where intensity levels define watersheds. Sweeping a threshold $\alpha$ across $[0, 255]$, it tracks each connected region $R_i$ and extracts those whose relative area variation $q(i) = \bigl|\,|R_i| - |R_{i-\Delta}|\,\bigr| / |R_i|$ (with $\Delta$ the threshold step) falls below a configured stability threshold.
|
|
132
|
+
|
|
133
|
+
· Target Use‑Case: Ultra‑fast processing of high‑contrast standard typography (English/Japanese structural scan lines) without neural overhead.
|
|
134
|
+
· Geometric Filtering: Extracted regions are subjected to strict component filters (area, aspect ratio, convexity) to retain only likely textual elements.
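The geometric filtering step can be illustrated with a minimal pure-Python sketch. The function name `filter_text_boxes` and every threshold value below are hypothetical and are not taken from the library; the sketch only shows how area and aspect-ratio limits discard regions that are unlikely to be glyphs.

```python
def filter_text_boxes(boxes, min_area=20, max_area=5000, max_aspect=5.0):
    """Keep (x, y, w, h) boxes whose area and aspect ratio look glyph-like.

    All thresholds are illustrative assumptions, not library defaults.
    """
    kept = []
    for x, y, w, h in boxes:
        area = w * h
        # Aspect ratio is measured long side over short side, so it is >= 1
        aspect = max(w, h) / max(min(w, h), 1)
        if min_area <= area <= max_area and aspect <= max_aspect:
            kept.append((x, y, w, h))
    return kept


candidates = [
    (0, 0, 10, 12),    # plausible character
    (0, 0, 2, 2),      # speckle: too small
    (0, 0, 200, 3),    # panel border: too elongated
    (5, 5, 100, 100),  # large blob: too big
]
glyph_boxes = filter_text_boxes(candidates)
```

A convexity test, which the text also mentions, would need the region's pixel set or contour and is left out of this sketch.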
|
|
135
|
+
|
|
136
|
+
### 3.3 `text/swt.py`: Stroke Width Transform (Epshtein et al.)
|
|
137
|
+
|
|
138
|
+
Estimates the per-pixel width of text strokes by following image gradient directions from one stroke edge to the opposite edge.
|
|
139
|
+
|
|
140
|
+
1. Computes the Canny edge map of the grayscale image space.
|
|
141
|
+
2. Computes the horizontal and vertical image gradients $(\nabla I_x, \nabla I_y)$ via Sobel kernels.
|
|
142
|
+
3. For each edge pixel, traverses along the gradient vector $\mathbf{d} = \nabla I / \|\nabla I\|$ until hitting a corresponding counter‑edge with an opposing gradient vector direction ($\mathbf{d}_{target} \approx -\mathbf{d}$).
|
|
143
|
+
4. The Euclidean distance between these boundaries defines the stroke width assigned to all intermediate elements. Elements with high variances in stroke thickness are heavily culled, preserving constant‑width textual strokes while omitting complex cross‑hatching.
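The variance-based culling in step 4 can be sketched as follows. This is an illustration under stated assumptions: the helper name `is_text_like` and the coefficient-of-variation limit of 0.5 are hypothetical, and the input is assumed to be the list of stroke widths already assigned to one connected component.

```python
from statistics import mean, pstdev


def is_text_like(stroke_widths, max_cv=0.5):
    """Return True when stroke widths in a component are nearly constant.

    Uses the coefficient of variation (std / mean); max_cv is an assumed limit.
    """
    if len(stroke_widths) < 2:
        return False
    mu = mean(stroke_widths)
    if mu <= 0:
        return False
    return pstdev(stroke_widths) / mu <= max_cv


letter_component = [3, 3, 4, 3, 3]      # near-constant width, typical of a glyph
hatching_component = [1, 9, 2, 12, 1]   # erratic widths, typical of cross-hatching
```

Normalizing by the mean keeps the test independent of font size, which is why a ratio is used here rather than a raw variance threshold.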
|
|
144
|
+
|
|
145
|
+
### 3.4 `text/craft_adapter.py`: Convolutional Character Awareness
|
|
146
|
+
|
|
147
|
+
Wraps a deep convolutional neural network that predicts two per-pixel score maps:
|
|
148
|
+
|
|
149
|
+
· Region Score: The spatial probability that a pixel forms the center of a textual character.
|
|
150
|
+
· Affinity Score: The spatial probability that space between characters belongs to the same semantic cluster, allowing vertical and horizontal line grouping.
|
|
151
|
+
|
|
152
|
+
```
[Input BGR] ──> [VGG16 U-Net Backbone] ──> [Region Heatmap] ────┐
                                      └──  [Affinity Heatmap] ──┴──> [Watershed/Mask Conversion]
```
|
|
156
|
+
|
|
157
|
+
### 3.5 `bubbles/contours.py` & `morphology.py`
|
|
158
|
+
|
|
159
|
+
Isolates elliptical or rectangular high-contrast speech bubble boundaries using Suzuki-Abe border following, retrieving only the outermost contours (`cv2.RETR_EXTERNAL`).
|
|
160
|
+
|
|
161
|
+
· Bubble Selection Logic: Contours enclosing areas below a configured threshold or showing low circularity metrics are rejected.
|
|
162
|
+
· Morphological Refinement: Applies an optimized structural matrix sequence to heal ink breakdowns (closing, dilation, erosion) and produce clean, closed bubble masks.
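The bubble selection rule can be made concrete with the common circularity measure $4\pi A / P^2$, which equals 1 for a circle and drops toward 0 for elongated shapes. The sketch below is a pure-Python illustration on polygon vertices; the name `keep_bubble` and both thresholds are assumptions, and the library may define circularity differently.

```python
import math


def circularity(points):
    """Return 4*pi*area / perimeter**2 for a closed polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0          # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    if perimeter == 0.0:
        return 0.0
    return 4.0 * math.pi * (abs(twice_area) / 2.0) / (perimeter ** 2)


def keep_bubble(points, min_area=50.0, min_circularity=0.6):
    """Reject small or low-circularity contours (thresholds are illustrative)."""
    twice_area = sum(
        points[i][0] * points[(i + 1) % len(points)][1]
        - points[(i + 1) % len(points)][0] * points[i][1]
        for i in range(len(points))
    )
    return abs(twice_area) / 2.0 >= min_area and circularity(points) >= min_circularity


square = [(0, 0), (10, 0), (10, 10), (0, 10)]     # circularity = pi / 4
sliver = [(0, 0), (100, 0), (100, 1), (0, 1)]     # long thin strip
```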
|
|
163
|
+
|
|
164
|
+
### 3.6 `mask.py` & `detection.py`: The Traffic Orchestrator
|
|
165
|
+
|
|
166
|
+
The central class `DetectionOrchestrator` implements a fallback cascade intended to keep detection robust across image variations. The following diagram illustrates the decision flow:
|
|
167
|
+
|
|
168
|
+
```
            [Input BGR Image]
                    │
          ┌─────────┴─────────┐
          ▼                   ▼
     [Run MSER]      [Run Bubble Contour]
          │                   │
  (Area Evaluation)           │
          ▼                   ▼
  Too Few Regions?            │
  ├── YES ──> [CRAFT AI]      │
  └── NO  ──> [Pass]          │
          │                   │
          └─────────┬─────────┘
                    ▼
        [Priority Matrix Blender]
                    ▼
       [Unified Clean Binary Mask]
```
|
|
187
|
+
|
|
188
|
+
Theoretical rationale:
|
|
189
|
+
The cascade ensures that fast geometric methods (MSER, contours) are attempted first. When they yield insufficient regions (e.g., due to low contrast or complex backgrounds), the more computationally expensive CRAFT model is invoked. The Priority Matrix Blender then fuses all available masks, respecting a user‑defined priority to resolve conflicts (e.g., text masks take precedence over bubble masks). This hybrid strategy balances speed and robustness across diverse manga pages.
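The cascade and the priority-based fusion described above can be sketched in a few lines of plain Python. Everything here is hypothetical scaffolding (the names `detect_with_fallback` and `blend_masks`, the `min_regions` threshold, and masks represented as nested lists); it illustrates the control flow only and is not the `DetectionOrchestrator` implementation.

```python
def detect_with_fallback(image, fast_detector, ai_detector, min_regions=3):
    """Run the cheap detector first; fall back to the expensive one if it finds too little."""
    regions = fast_detector(image)
    if len(regions) >= min_regions:
        return regions, "fast"
    return ai_detector(image), "ai"


def blend_masks(masks, priority):
    """Fuse named binary masks; the earliest name in `priority` wins on overlap.

    masks: dict mapping a name to a 2D list of 0/255 values, all of equal shape.
    Returns (unified_mask, label_map).
    """
    first = next(iter(masks.values()))
    h, w = len(first), len(first[0])
    labels = [[None] * w for _ in range(h)]
    # Paint lowest priority first so that higher priorities overwrite it
    for name in reversed(priority):
        mask = masks.get(name)
        if mask is None:
            continue
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    labels[y][x] = name
    unified = [[255 if labels[y][x] else 0 for x in range(w)] for y in range(h)]
    return unified, labels


regions, source = detect_with_fallback(
    None, lambda img: ["r1"], lambda img: ["a", "b", "c", "d"]
)
unified, labels = blend_masks(
    {"text": [[255, 0, 0]], "bubbles": [[255, 255, 0]]},
    priority=["text", "bubbles"],
)
```

Keeping a label map next to the unified mask makes it possible to treat text and bubble pixels differently in later stages.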
|
|
190
|
+
|
|
191
|
+
---
|
|
192
|
+
|
|
193
|
+
## 4. MATHEMATICAL FOUNDATION: INPAINTING SUBSYSTEM (`inpainting/`)
|
|
194
|
+
|
|
195
|
+
### 4.1 `telea.py`: Partial Differential Equation Propagation
|
|
196
|
+
|
|
197
|
+
Alexandru Telea's inpainting algorithm, built on the Fast Marching Method (FMM), treats the binary mask boundary as a moving front defined via the Eikonal equation:
|
|
198
|
+
|
|
199
|
+
$$|\nabla T| = 1 \quad \text{with} \quad T = 0 \text{ on the boundary}$$
|
|
200
|
+
|
|
201
|
+
Pixels inside the missing area are processed strictly from the boundary inward, in order of their distance $T$ to the known region. The color value $I(p)$ of an unknown pixel $p$ is calculated as a normalized weighted sum over its known neighbors $q \in B_\epsilon(p)$:
|
|
202
|
+
|
|
203
|
+
$$I(p) = \frac{\sum_{q} w(p,q) \, I(q)}{\sum_{q} w(p,q)}$$
|
|
204
|
+
|
|
205
|
+
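The weight is a product of three factors: a directional term (neighbors lying along the front normal), a Euclidean distance term, and a level-set term (neighbors at a similar distance $T$ from the boundary):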
|
|
206
|
+
|
|
207
|
+
$$w(p,q) = w_{\text{dir}} \cdot w_{\text{dist}} \cdot w_{\text{level}}$$
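The weighted average and the three weight factors can be sketched in plain Python. The factor definitions follow the usual formulation of Telea's method with all normalizing constants set to 1, which is a simplifying assumption; the function names and the `(q, value, t_q)` neighbor layout are hypothetical and do not describe the contents of `telea.py`.

```python
import math


def telea_weight(p, q, normal_p, t_p, t_q):
    """Weight of known neighbor q when filling pixel p; points are (y, x) tuples."""
    dy, dx = p[0] - q[0], p[1] - q[1]
    dist_sq = dy * dy + dx * dx
    if dist_sq == 0:
        return 0.0
    dist = math.sqrt(dist_sq)
    # Directional term: favors neighbors aligned with the front normal at p
    w_dir = max(abs(dy * normal_p[0] + dx * normal_p[1]) / dist, 1e-6)
    # Distance term: closer neighbors contribute more
    w_dist = 1.0 / dist_sq
    # Level-set term: favors neighbors at a similar distance T from the boundary
    w_level = 1.0 / (1.0 + abs(t_p - t_q))
    return w_dir * w_dist * w_level


def inpaint_pixel(p, neighbors, normal_p, t_p):
    """Normalized weighted average of known neighbors given as (q, value, t_q)."""
    num = 0.0
    den = 0.0
    for q, value, t_q in neighbors:
        w = telea_weight(p, q, normal_p, t_p, t_q)
        num += w * value
        den += w
    return num / den if den > 0.0 else 0.0


# Two symmetric neighbors left and right of p, front normal along x
filled = inpaint_pixel(
    (0, 0),
    [((0, -1), 100.0, 0.0), ((0, 1), 200.0, 0.0)],
    normal_p=(0.0, 1.0),
    t_p=1.0,
)
```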
|
|
208
|
+
|
|
209
|
+
### 4.2 `coherence.py`: Structure Tensor Coherence Transport
|
|
210
|
+
|
|
211
|
+
Before structural pixel replacement, the local orientation of image gradients must be derived. This is mathematically achieved via the Structure Tensor (Second‑Moment Matrix):
|
|
212
|
+
|
|
213
|
+
$$J = K_\rho * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$
|
|
217
|
+
|
|
218
|
+
Where $K_\rho$ represents a regularizing Gaussian smoothing kernel. An eigendecomposition of $J$ yields eigenvalues $\lambda_1 \ge \lambda_2 \ge 0$.
|
|
219
|
+
|
|
220
|
+
· The dominant eigenvector $\mathbf{v}_1$ points in the direction of maximum intensity change (normal to edges).
|
|
221
|
+
· The subdominant eigenvector $\mathbf{v}_2$ gives the orientation of continuous structural lines (the coherence direction, tangent to edges).
|
|
222
|
+
|
|
223
|
+
The text removal pipeline propagates line tracking information along $\mathbf{v}_2$ into the center of the speech bubble mask, preventing the degradation of strong structural bounds.
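For a single pixel, the eigendecomposition of the $2 \times 2$ symmetric tensor has a closed form, sketched below in plain Python. The function name `structure_tensor_axes` is hypothetical, vectors are returned in `(x, y)` order by assumption, and the inputs are taken to be the already smoothed entries of $J$.

```python
import math


def structure_tensor_axes(jxx, jxy, jyy):
    """Closed-form eigen-analysis of the symmetric matrix [[jxx, jxy], [jxy, jyy]].

    Returns (lam1, lam2, v1, v2) with lam1 >= lam2; v1 is normal to the edge
    and v2 is tangent to it, both as (x, y) unit vectors.
    """
    half_trace = 0.5 * (jxx + jyy)
    root = math.hypot(0.5 * (jxx - jyy), jxy)
    lam1 = half_trace + root
    lam2 = half_trace - root
    # Orientation of the dominant eigenvector: tan(2*theta) = 2*jxy / (jxx - jyy)
    theta = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return lam1, lam2, v1, v2


# A purely horizontal gradient (I_x = 1, I_y = 0) describes a vertical edge
lam1, lam2, v1, v2 = structure_tensor_axes(1.0, 0.0, 0.0)
```

In this example `v2` points along the y axis, which is the direction in which a coherence-transport step would carry line information.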
|
|
224
|
+
|
|
225
|
+
---
|
|
226
|
+
|
|
227
|
+
## 5. GENERALIZED MULTI-SCALE HYBRID 5D PATCHMATCH ENGINE (`inpainting/patchmatch/`)
|
|
228
|
+
|
|
229
|
+
The core texture synthesis module implements a multi-scale Generalized PatchMatch algorithm optimized for high-contrast line art and complex textures. Baseline PatchMatch resolves only a 2D translational displacement $\mathbf{f}(x,y) = (\Delta x, \Delta y)$. MangaScourX instead searches a decoupled 5-dimensional parameter space that covers translation shifts, fractional subpixel lookups, precomputed discrete rotations, and multi-scale isometric scaling.
|
|
230
|
+
|
|
231
|
+
### 5.1 Comprehensive Mathematical Specification of the 5D State Vector
|
|
232
|
+
|
|
233
|
+
For every coordinate $\mathbf{p} = (y, x)$ within the targeted degradation layer (mask region), the Nearest-Neighbor Field (NNF) is modeled by the NNF state manager class, which maintains parallel memory buffers holding the top $K$ match configurations:
|
|
234
|
+
|
|
235
|
+
$$\mathbf{\Phi}(y, x, k) = \left[ \mathcal{Y}_{\text{offset}}, \mathcal{X}_{\text{offset}}, \Theta_{\text{idx}}, \mathcal{S}_{\text{idx}}, \mathcal{C}_{\text{SSD}} \right]$$
|
|
236
|
+
|
|
237
|
+
Where the components are structurally decoupled across low‑overhead scalar types:
|
|
238
|
+
|
|
239
|
+
· $\mathcal{Y}_{\text{offset}}, \mathcal{X}_{\text{offset}} \in \mathbb{R}$ (float32): Continuous floating-point offsets mapping target elements back into valid source textures.
|
|
240
|
+
· $\Theta_{\text{idx}} \in \mathbb{Z}$ (int8): Discrete index pointing to a slice of the precomputed rotation matrix table ($\theta \in [-\pi, +\pi]$).
|
|
241
|
+
· $\mathcal{S}_{\text{idx}} \in \mathbb{Z}$ (int8): Discrete index into the table of isometric scale multipliers ($S \in [0.5, 2.0]$).
|
|
242
|
+
· $\mathcal{C}_{\text{SSD}} \in \mathbb{R}^+$ (float32): Match cost measuring local similarity via an error-weighted structural loss function.
|
|
243
|
+
|
|
244
|
+
```
+-----------------------------------------------------------------------+
|                          NNF 5D STATE BOUNDS                          |
+----------------------+------------------------------------------------+
| nnf_y / nnf_x        | Continuous fractional source offsets (float32) |
| rot_idx / scale_idx  | Precomputed index slices (int8)                |
| nnf_cost             | Sorted K-NN evaluation cost array (float32)    |
+----------------------+------------------------------------------------+
```
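The practical effect of storing the two indices as `int8` can be estimated with simple arithmetic. The sketch below assumes one NNF entry per pixel and per candidate for the whole image, which may overestimate the real allocation if only masked pixels are tracked; the helper name `nnf_bytes` is hypothetical.

```python
def nnf_bytes(height, width, k, packed=True):
    """Estimate NNF storage in bytes for `k` candidates per pixel.

    packed=True  : two float32 offsets, two int8 indices, one float32 cost
    packed=False : five float32 values per candidate
    """
    per_candidate = (4 + 4 + 1 + 1 + 4) if packed else (5 * 4)
    return height * width * k * per_candidate


# Example page size of 800 x 6000 pixels with K = 3 candidates
packed_size = nnf_bytes(6000, 800, 3, packed=True)
float_size = nnf_bytes(6000, 800, 3, packed=False)
saving = 1.0 - packed_size / float_size
```

Under these assumptions the mixed-type layout needs 14 bytes per candidate instead of 20, a saving of 30 percent.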
|
|
253
|
+
|
|
254
|
+
### 5.2 LLVM-Compiled Low-Level Array Architecture (`core.py`)
|
|
255
|
+
|
|
256
|
+
To avoid the overhead of Python's dynamic object model and pointer-chasing lookups, all performance-critical computational paths are compiled to native code with Numba (`@njit(cache=True)`).
|
|
257
|
+
|
|
258
|
+
Exact Fractional Subpixel Bilinear Reconstruction Layer
|
|
259
|
+
|
|
260
|
+
When a candidate is evaluated under a rotation $\theta$ and scale $S$, coordinates mapped back to the source domain fall on fractional points. To avoid aliasing on crisp line art, values are interpolated by a boundary-clamped bilinear sampler:
|
|
261
|
+
|
|
262
|
+
```python
import numpy as np
from numba import njit


@njit(cache=True)
def sample_pixel(img, sy, sx):
    """Bilinearly sample an (H, W, C) image at the fractional point (sy, sx)."""
    h, w, c = img.shape
    # Clamp the query so both neighbouring rows and columns stay in bounds
    sy = min(max(sy, 0.0), h - 1e-6)
    sx = min(max(sx, 0.0), w - 1e-6)

    y0, x0 = int(sy), int(sx)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)

    # Fractional parts act as the interpolation weights
    wy, wx = sy - y0, sx - x0
    out = np.zeros(c, dtype=np.float32)

    for ch in range(c):
        out[ch] = (
            (1.0 - wy) * (1.0 - wx) * img[y0, x0, ch] +
            wy * (1.0 - wx) * img[y1, x0, ch] +
            (1.0 - wy) * wx * img[y0, x1, ch] +
            wy * wx * img[y1, x1, ch]
        )
    return out
```
|
|
285
|
+
|
|
286
|
+
Multi‑Channel Multi‑Feature Error Loss Metric
|
|
287
|
+
|
|
288
|
+
To guarantee visual continuity over complex screens and tones, the similarity cost function evaluates both localized pixel intensity deviations and gradient variations. The loss distance $\mathcal{C}_{\text{SSD}}$ over a spatial patch domain $\Omega = [-P_{\text{rad}}, P_{\text{rad}}]^2$ is defined via a dual-component objective function:
|
|
289
|
+
|
|
290
|
+
$$\mathcal{C}_{\text{SSD}} = \sum_{\Omega} \left( \| \mathcal{A}(\mathbf{p}) - \mathcal{A}(\mathbf{q}) \|^2 + \alpha \cdot \| \nabla \mathcal{A}(\mathbf{p}) - \nabla \mathcal{A}(\mathbf{q}) \|^2 \right)$$
|
|
291
|
+
|
|
292
|
+
Where $\mathcal{A}$ denotes the transformed patch lookup, $\nabla \mathcal{A}$ its gradient field, and $\alpha$ the weight balancing the two terms.
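The dual-component cost can be written out for a single-channel image in plain Python. This is a sketch under stated assumptions: the name `patch_cost`, the nested-list image layout, and the precomputed gradient arrays `grad_y` and `grad_x` are all hypothetical, and no rotation or scaling is applied.

```python
def patch_cost(img, grad_y, grad_x, ty, tx, sy, sx, radius, alpha):
    """Intensity SSD plus alpha-weighted gradient SSD between two patches.

    img, grad_y, grad_x are 2D lists of equal shape; (ty, tx) and (sy, sx)
    are the patch centres of the target and the source.
    """
    cost = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            d = img[ty + dy][tx + dx] - img[sy + dy][sx + dx]
            gy = grad_y[ty + dy][tx + dx] - grad_y[sy + dy][sx + dx]
            gx = grad_x[ty + dy][tx + dx] - grad_x[sy + dy][sx + dx]
            cost += d * d + alpha * (gy * gy + gx * gx)
    return cost


img = [[0.0, 10.0], [0.0, 0.0]]
grad_y = [[0.0, -10.0], [0.0, 0.0]]
grad_x = [[10.0, 10.0], [0.0, 0.0]]

# Single-pixel patches (radius 0): compare pixel (0, 0) against pixel (0, 1)
cost = patch_cost(img, grad_y, grad_x, 0, 0, 0, 1, radius=0, alpha=0.5)
```

Here the intensity term contributes 100 and the gradient term 0.5 × 100, so two pixels with equal intensity but different local structure would still be told apart.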
|
|
293
|
+
|
|
294
|
+
```python
from numba import njit


@njit(cache=True)
def patch_ssd(img_pad, mask_pad, ty, tx, sy, sx, patch_size, worst_cost):
    """Sum of squared intensity differences between a target and a source patch.

    Only the intensity term of the cost is shown in this excerpt.
    """
    pad = patch_size // 2
    c = img_pad.shape[2]
    ssd = 0.0

    for i in range(patch_size):
        for j in range(patch_size):
            # Target patch pixel in padded coordinates
            t_y_curr = ty + i
            t_x_curr = tx + j

            # Source patch pixel, centred on (sy, sx)
            s_y_curr = sy - pad + i
            s_x_curr = sx - pad + j

            for ch in range(c):
                diff = img_pad[t_y_curr, t_x_curr, ch] - img_pad[int(s_y_curr), int(s_x_curr), ch]
                ssd += diff * diff

        # Early termination once the running cost exceeds the worst kept candidate
        if i >= pad and ssd >= worst_cost:
            return ssd
    return ssd
```
|
|
319
|
+
|
|
320
|
+
### 5.3 Advanced Spatial/Coherence Heuristic Propagation Layout (`propagation.py`)
|
|
321
|
+
|
|
322
|
+
The relaxation engine alternates between top‑left scanning loops (propagate_forward) and bottom‑right cycles (propagate_backward) to diffuse optimal structural values across space. This bidirectional sweep ensures that information can travel from any region of the image to any other, preventing directional bias.
|
|
323
|
+
|
|
324
|
+
```
Forward Sweep Scanline:                Backward Sweep Scanline:
┌───────────┐   ┌───────────┐          ┌───────────┐   ┌───────────┐
│ (y, x-1)  │──>│  (y, x)   │          │  (y, x)   │<──│ (y, x+1)  │
└───────────┘   └───────────┘          └───────────┘   └───────────┘
                      │                      ▲
                      ▼                      │
                ┌───────────┐          ┌───────────┐
                │ (y-1, x)  │          │ (y+1, x)  │
                └───────────┘          └───────────┘
```
|
|
335
|
+
|
|
336
|
+
Theoretical rationale:
|
|
337
|
+
During the forward pass, each pixel considers candidates from its left and upper neighbours; during the backward pass, it considers candidates from its right and lower neighbours. This two-phase propagation resembles a dynamic-programming sweep and lets good matches spread across the NNF in a few iterations, although it converges to a good approximate solution rather than a guaranteed global optimum. Because the search space includes rotations and scales, propagating these parameters together with the offsets carries geometric variations smoothly across the image domain.
|
|
338
|
+
|
|
339
|
+
Spatial Translation Diffusion Logic
|
|
340
|
+
|
|
341
|
+
When evaluating the left spatial neighbour $(y, x-1)$, its best candidate offsets are tested for the current element $(y, x)$. If a neighbour candidate has a lower cost than the worst entry in the target's current K-NN pool, the state arrays are updated by sorted insertion, handled by `update_knn` and `sort_knn_row`:
|
|
342
|
+
|
|
343
|
+
```python
from numba import njit


@njit(cache=True, parallel=False)
def propagate_forward(img_pad, mask_pad, abs_y, abs_x, cost, h, w, patch_size, k):
    pad = patch_size // 2
    for y in range(h):
        for x in range(w):
            if not mask_pad[y + pad, x + pad]:
                continue  # Element resides in known unmasked territory

            # Test the left neighbour's K candidates for the current pixel
            if x > 0:
                for i in range(k):
                    sy = abs_y[y, x - 1, i]
                    sx = abs_x[y, x - 1, i]
                    cst = cost[y, x - 1, i]

                    # Accept only if better than the current worst candidate
                    if cst < cost[y, x, -1]:
                        abs_y[y, x, -1] = sy
                        abs_x[y, x, -1] = sx
                        cost[y, x, -1] = cst
                        # Re-sort this pixel's K candidates by ascending cost
                        # (_sort_row_slice is a helper not shown in this excerpt)
                        _sort_row_slice(abs_y, abs_x, cost, y, x, k)
```
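The sorted-insertion update of a K-NN pool can be shown on plain Python lists. This is an illustrative stand-in, not the library's `update_knn` or `sort_knn_row`: the name `insert_candidate` and the three parallel lists are assumptions chosen to mirror the separate cost and offset buffers.

```python
def insert_candidate(costs, ys, xs, new_cost, new_y, new_x):
    """Insert a candidate into ascending-sorted parallel lists of fixed length K.

    The worst entry is dropped. Returns True if the candidate was inserted.
    """
    if new_cost >= costs[-1]:
        return False  # not better than the current worst candidate
    i = len(costs) - 1
    # Shift worse entries one slot to the right until the insertion point is found
    while i > 0 and costs[i - 1] > new_cost:
        costs[i] = costs[i - 1]
        ys[i] = ys[i - 1]
        xs[i] = xs[i - 1]
        i -= 1
    costs[i], ys[i], xs[i] = new_cost, new_y, new_x
    return True


costs, ys, xs = [1.0, 4.0, 9.0], [0, 1, 2], [0, 1, 2]
accepted = insert_candidate(costs, ys, xs, 3.0, 7, 8)
rejected = insert_candidate(costs, ys, xs, 10.0, 5, 5)
```

A single shifting pass costs O(K) per accepted candidate, which for small K is cheaper than re-sorting the whole row.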
|
|
367
|
+
|
|
368
|
+
Dual‑Path Decoupled Optimization Engines
|
|
369
|
+
|
|
370
|
+
Beyond standard spatial propagation passes, MangaScourX runs two distinct optimization searches to handle complex structures:
|
|
371
|
+
|
|
372
|
+
1. Coherence Vector Field Transport (coherence_search):
|
|
373
|
+
Evaluates texture candidates along derived structural paths (isophote alignments) extracted via localized second‑moment matrices. This prevents structural lines from washing out or breaking across text bubble boundaries.
|
|
374
|
+
2. Bidirectional Constraint Heuristic (bidirectional_heuristic):
|
|
375
|
+
Evaluates inverse match profiles by mapping source lookups back to target regions. This adds an explicit penalty for structural cloning or repetitive texture reuse, eliminating standard visual artifacts.
|
|
376
|
+
|
|
377
|
+
### 5.4 Logarithmic Random Exploration Layer
|
|
378
|
+
|
|
379
|
+
To avoid converging into poor local minima, each update step concludes with a random search whose radius shrinks exponentially. Starting from a global search radius $R_0 = \max(\text{Height}, \text{Width})$, the candidate radius is scaled at every step by a decay factor $\alpha = 0.5$:
|
|
380
|
+
|
|
381
|
+
```python
from numba import njit


@njit(cache=True)
def random_search(img_pad, mask_pad, abs_y, abs_x, cost, h, w, patch_size, rng_state):
    # rand_float and _evaluate_and_insert_step are helpers not shown in this excerpt
    pad = patch_size // 2
    radius = max(h, w)

    for y in range(h):
        for x in range(w):
            if not mask_pad[y + pad, x + pad]:
                continue

            curr_r = float(radius)
            while curr_r > 1.0:
                # Draw a deterministic pseudo-random offset within the current radius
                dy = int(curr_r * (rand_float(rng_state) * 2.0 - 1.0))
                dx = int(curr_r * (rand_float(rng_state) * 2.0 - 1.0))

                # Perturb the best candidate and clamp it to the image bounds
                cand_y = min(max(abs_y[y, x, 0] + dy, 0.0), h - 1)
                cand_x = min(max(abs_x[y, x, 0] + dx, 0.0), w - 1)

                # Re-evaluate matching costs and update the K-NN array if valid
                _evaluate_and_insert_step(img_pad, mask_pad, y, x, cand_y, cand_x, cost, abs_y, abs_x)
                curr_r *= 0.5  # Apply geometric decay
```
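The two ingredients of this search, a deterministic XorShift32 generator and the halving radius schedule, can be sketched in plain Python. The shift triple (13, 17, 5) is the commonly used XorShift32 variant and is assumed here; the function names and the scalar-state interface are hypothetical and differ from the `rng_state` array passed to `rand_float` above.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state):
    """Advance a non-zero 32-bit XorShift state by one step."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32


def next_float(state):
    """Return (new_state, value) with value uniformly spread over [0, 1)."""
    state = xorshift32(state)
    return state, state / 2.0 ** 32


def search_radii(r0, decay=0.5):
    """Radii visited by the search: r0, r0*decay, ... while the radius exceeds 1."""
    radii = []
    r = float(r0)
    while r > 1.0:
        radii.append(r)
        r *= decay
    return radii


first_state = xorshift32(1)
state, sample = next_float(1)
radii = search_radii(8)
```

Because the radius halves at every step, the number of candidates tested per pixel grows only logarithmically with the image size.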
|
|
405
|
+
|
|
406
|
+
---
|
|
407
|
+
|
|
408
|
+
## 6. PIPELINES & HIGH-LEVEL EXECUTION (`pipelines/`)
|
|
409
|
+
|
|
410
|
+
### 6.1 `text_remove.py`: High-Speed Inpainting Orchestrator
|
|
411
|
+
|
|
412
|
+
Coordinates data flow from detection inputs to inpainting outputs, avoiding memory allocation overhead by reusing temporary pixel arrays.
|
|
413
|
+
|
|
414
|
+
```python
from __future__ import annotations

from typing import Any, Dict, List, Optional

import numpy as np
from numpy.typing import NDArray

from MangaScourX.detection.detection import DetectionOrchestrator
from MangaScourX.inpainting.patchmatch.engine import PatchMatchInpainter


class TextRemovePipeline:
    def __init__(self, merge_priority: Optional[List[str]] = None, patch_size: int = 7) -> None:
        # Avoid a shared mutable default; fall back to the documented priority order
        if merge_priority is None:
            merge_priority = ["text", "bubbles"]
        self.orchestrator = DetectionOrchestrator(merge_priority=merge_priority)
        self.patch_size = patch_size

    def run(self, image: NDArray[np.uint8]) -> Dict[str, Any]:
        detection_res = self.orchestrator.run(image, enable_text=True, enable_bubbles=True)
        binary_mask = detection_res["mask"]

        # Nothing detected: return an untouched copy of the input
        if not np.any(binary_mask):
            return {"result": image.copy(), "mask": binary_mask, "mutated": False}

        inpainter = PatchMatchInpainter(patch_size=self.patch_size, knn=3, iterations=3)
        restored_img = inpainter.run(image, binary_mask)

        return {"result": restored_img, "mask": binary_mask, "mutated": True}
```
|
|
439
|
+
|
|
440
|
+
### 6.2 `manga_clean.py`: Automated Adaptive Whitening Pipeline
|
|
441
|
+
|
|
442
|
+
Vintage scan layers often introduce unwanted halftone shifts, yellowing paper tints, or digital compression artifacts into the white spaces of drawings. MangaCleanPipeline applies an adaptive background separation model:
|
|
443
|
+
|
|
444
|
+
$$I_{\text{clean}} = I_{\text{original}} - G_{\sigma} * I_{\text{original}}$$
|
|
445
|
+
|
|
446
|
+
Where $G_{\sigma}$ is a large-window Gaussian blur kernel (window of roughly $25 \times 25$ pixels). It acts as a local illumination field estimator, removing paper stains and background noise while keeping ink lines crisp.
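The background separation formula can be tried on a one-dimensional scanline in plain Python. This is a simplified sketch, not the `MangaCleanPipeline` code: the function names, the value $\sigma = 4$, and the 25-tap window (radius 12) are assumptions, and a real page would use a two-dimensional blur.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights over 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def subtract_background(signal, sigma=4.0, radius=12):
    """Return signal minus its Gaussian-blurred version, with edge clamping."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        background = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp at the borders
            background += weight * signal[idx]
        out.append(signal[i] - background)
    return out


paper = [200.0] * 40                              # evenly tinted paper, no ink
scanline = [200.0] * 20 + [0.0] + [200.0] * 19    # one dark ink pixel at index 20

flat = subtract_background(paper)
inked = subtract_background(scanline)
```

The uniformly tinted paper maps to zero everywhere, while the ink pixel keeps a strongly negative response, so the tint is removed without erasing the line.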
|
|
447
|
+
|
|
448
|
+
---
|
|
449
|
+
|
|
450
|
+
## 7. DATA FLOW ANALYSIS & MEMORY SIGNATURE
|
|
451
|
+
|
|
452
|
+
Below is a track of array lifecycle transformations throughout the execution flow of MangaScourX:
|
|
453
|
+
|
|
454
|
+
```
|
|
455
|
+
[Disk Input Node]
|
|
456
|
+
│ (cv2.imread -> np.uint8 NumPy Array Layout C-Contiguous)
|
|
457
|
+
▼
|
|
458
|
+
[Memory Address Pointer]
|
|
459
|
+
│
|
|
460
|
+
├───> [Detection Layer] ──> Extracts Binary Structural Feature Maps (0 or 255)
|
|
461
|
+
│ │
|
|
462
|
+
▼ ▼
|
|
463
|
+
[Float32 Conversion] ───────────> [5D PatchMatch Engine Core]
|
|
464
|
+
(Scale Normalization Matrix) │
|
|
465
|
+
▼
|
|
466
|
+
- Allocates NNF Map Array Layer State Tensor
|
|
467
|
+
Shape: (H, W, K, 5), Type: np.float32
|
|
468
|
+
- Compiles Numba Stack Iteration Cycles
|
|
469
|
+
│
|
|
470
|
+
▼
|
|
471
|
+
[Image Reconstruction Stage Node]
|
|
472
|
+
│ (Bilinear Interpolation Lookup)
|
|
473
|
+
▼
|
|
474
|
+
[Adaptive Illuminant Field Whiten Layer]
|
|
475
|
+
│
|
|
476
|
+
▼
|
|
477
|
+
[Terminal Array Transformation Output Target]
|
|
478
|
+
```
|
|
479
|
+
|
|
480
|
+
To optimize memory usage, MangaScourX avoids high‑overhead operations like array splitting, transposition (.T), or frequent dimension adjustments inside Numba loops. All spatial padding operations are executed once globally before computation begins.
|
|
481
|
+
|
|
482
|
+
---
|
|
483
|
+
|
|
484
|
+
## 8. PROGRAMMATIC INTERFACE GUIDE (API SPECIFICATION)
|
|
485
|
+
|
|
486
|
+
### 8.1 Basic Implementation Pattern
|
|
487
|
+
|
|
488
|
+
```python
|
|
489
|
+
import cv2
|
|
490
|
+
from MangaScourX import MangaCleanPipeline
|
|
491
|
+
|
|
492
|
+
# Initialize production pipeline with optimized settings
|
|
493
|
+
pipeline = MangaCleanPipeline(
|
|
494
|
+
inpainting_method="patchmatch",
|
|
495
|
+
patch_size=7,
|
|
496
|
+
whiten_background=True
|
|
497
|
+
)
|
|
498
|
+
|
|
499
|
+
# Load target document scan line
|
|
500
|
+
img = cv2.imread("raw_scan.png")
|
|
501
|
+
|
|
502
|
+
# Execute core processing pipeline
|
|
503
|
+
output_package = pipeline.run(img)
|
|
504
|
+
|
|
505
|
+
# Export cleaned output
|
|
506
|
+
cv2.imwrite("cleaned_scan.png", output_package["final_page"])
|
|
507
|
+
```
|
|
508
|
+
|
|
509
|
+
### 8.2 Comprehensive Structural Configuration
|
|
510
|
+
|
|
511
|
+
```python
|
|
512
|
+
from MangaScourX.pipelines.manga_clean import MangaCleanPipeline
|
|
513
|
+
import cv2
|
|
514
|
+
|
|
515
|
+
advanced_config = {
|
|
516
|
+
"inpainting_method": "patchmatch",
|
|
517
|
+
"patch_size": 9, # Larger patch captures macro‑texture patterns
|
|
518
|
+
"denoise_level": 3, # Pre‑smoothing factor for noisy scans
|
|
519
|
+
"whiten_background": True # Runs the adaptive background model
|
|
520
|
+
}
|
|
521
|
+
|
|
522
|
+
orchestrator = MangaCleanPipeline(**advanced_config)
|
|
523
|
+
package = orchestrator.run(cv2.imread("complex_page.jpg"))
|
|
524
|
+
|
|
525
|
+
if package["text_removed"]:
|
|
526
|
+
    print(f"Successfully processed image. Final shape: {package['final_page'].shape}")
|
|
527
|
+
```
|
|
528
|
+
|
|
529
|
+
---
|
|
530
|
+
|
|
531
|
+
## 9. EXHAUSTIVE TESTING PROTOCOLS & REPRODUCTION STEPS
|
|
532
|
+
|
|
533
|
+
To verify that modifications to floating‑point precision do not break matrix coordinate mappings, a regression evaluation suite can be run via the command line interface:
|
|
534
|
+
|
|
535
|
+
```bash
|
|
536
|
+
# Execute internal unit testing modules
|
|
537
|
+
python -m unittest discover -s MangaScourX/tests -p "*.py"
|
|
538
|
+
```
|
|
539
|
+
|
|
540
|
+
Target Execution Benchmarks
|
|
541
|
+
|
|
542
|
+
· Input Image Dimensions: $800 \times 6000$ pixels.
|
|
543
|
+
· Target Detection Step: $\le 0.42$ seconds via MSER tracking mode.
|
|
544
|
+
· PatchMatch 5D Loop Convergence: $\le 4.89$ seconds total execution time over 3 full iterations on an Intel Core i7 laptop CPU or better.
|
|
545
|
+
|
|
546
|
+
---
|
|
547
|
+
|
|
548
|
+
## 10. ARTIFICIAL INTELLIGENCE EMBEDDING INSTRUCTIONS (LLM KNOWLEDGE ASSIMILATION)
|
|
549
|
+
|
|
550
|
+
Attention AI Agent / Auto‑Coder Context Model:
|
|
551
|
+
When maintaining or extending this codebase, adhere strictly to the following technical axioms to preserve system stability:
|
|
552
|
+
|
|
553
|
+
1. Numba Variable Constraint:
|
|
554
|
+
Inside files core.py and propagation.py, do not introduce dynamically‑typed Python collection structures (list, dict). All arrays must pass through direct continuous memory buffers (np.ndarray).
|
|
555
|
+
2. Float Bounds Handling:
|
|
556
|
+
When calculating bilinear coordinates inside `sample_pixel_bilinear`, ensure structural values are safely clamped within the range $[0.0, \text{Dimension} - 1.0]$. Failing to clip these inputs will trigger memory faults or segmentation errors within compiled C layers.
|
|
557
|
+
3. Geometric Transformation Continuity:
|
|
558
|
+
When modifying the propagation paths in `propagation.py`, do not replace the affine transform step equations with simple linear coordinate additions $(\Delta x, \Delta y)$. Scaling and rotation continuity must stay projected through the target's neighbour matrices to correctly reconstruct text over skewed or perspective-warped manga screentone backgrounds.
|