flappa-doormal 2.13.2 → 2.13.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +32 -34
- package/README.md +0 -3
- package/dist/index.d.mts +1 -30
- package/dist/index.d.mts.map +1 -1
- package/dist/index.mjs +68 -86
- package/dist/index.mjs.map +1 -1
- package/package.json +1 -1
package/AGENTS.md
CHANGED
@@ -453,46 +453,44 @@ bunx biome lint .
 1. **`lineStartsAfter` vs `lineStartsWith` is not “cosmetic”**: `lineStartsAfter` changes output by stripping the matched marker via an internal `contentStartOffset` during segment construction. If a client used it by accident, you cannot reconstruct the exact stripped prefix from output alone without referencing the original pages and re-matching the marker.
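As a sketch of the recovery constraint in point 1 — hypothetical helper name, not the library's API — the stripped prefix can only be re-derived from the original page line:

```typescript
// Hypothetical sketch (not the library's API): re-derive the prefix that
// `lineStartsAfter` stripped by re-matching the marker against the ORIGINAL
// page line; the segment output alone is not enough.
const recoverStrippedPrefix = (
  originalLine: string,
  markerRegex: RegExp, // must be compiled via the same path segmentation used
): string | undefined => {
  const m = markerRegex.exec(originalLine);
  // Everything up to and including the matched marker is what was stripped.
  return m ? originalLine.slice(0, m.index + m[0].length) : undefined;
};
```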
-2. **
-
-3. **Page joining differs between matching and output**:
+2. **Page joining differs between matching and output**:
 - Matching always happens on pages concatenated with `\\n` separators.
 - Output segments may normalize page boundaries (`pageJoiner: 'space' | 'newline'`) and breakpoints post-processing uses its own join normalization utilities.
 Recovery code must be explicit about which representation it’s searching.
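A minimal illustration of the two representations (assumed page shapes, not the library's internal types):

```typescript
// Matching always runs against '\n'-joined pages; output may be joined
// differently (pageJoiner: 'space'). A search valid in one view is not
// automatically valid in the other.
const pages = ['first page', 'second page'];
const matchingView = pages.join('\n'); // what rules are matched against
const outputView = pages.join(' ');    // what a 'space' pageJoiner emits
// A needle spanning the page boundary only exists in the matching view:
const spansBoundary = 'page\nsecond';
```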
|
|
462
460
|
|
|
463
|
-
|
|
461
|
+
3. **Breakpoints can produce “pieces” that were never marker-stripped**: When `maxPages` + `breakpoints` are enabled, only the piece that starts at the original structural boundary could have lost a marker due to `lineStartsAfter`. Mid-segment breakpoint pieces should not be “recovered” unless you can anchor them confidently.
|
|
464
462
|
|
|
465
|
-
|
|
463
|
+
4. **Fuzzy defaults are easy to miss**: Some tokens auto-enable fuzzy matching unless `fuzzy: false` is set (`bab`, `basmalah`, `fasl`, `kitab`, `naql`). If you are validating markers or re-matching prefixes, use the same compilation path as segmentation (`buildRuleRegex` / `processPattern`) so diacritics and token expansion behave identically.
|
|
466
464
|
|
|
467
|
-
|
|
465
|
+
5. **Auto-escaping applies to template-like patterns**: `lineStartsWith`, `lineStartsAfter`, `lineEndsWith`, and `template` auto-escape `()[]` outside `{{tokens}}`. Raw `regex` does not. If you compare patterns by string equality, be careful about escaping and whitespace.
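The dist bundle in this diff ships the escaping as a single replace; here is a standalone sketch of the same behavior (renamed to mark it as illustrative):

```typescript
// Escape ()[] only when they fall OUTSIDE {{token}} placeholders, mirroring
// the `escapeTemplateBrackets` implementation visible in dist/index.mjs.
const escapeTemplateBracketsSketch = (pattern: string): string =>
  pattern.replace(/(\{\{[^}]*\}\})|([()[\]])/g, (_match, token, bracket) =>
    token || `\\${bracket}`);
```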
-
+6. **TypeScript union pitfalls with `SplitRule`**: `SplitRule` is a union where only one pattern type should exist. Avoid mutating rules in-place with `delete` on fields (TS often narrows unions and then complains). Prefer rebuilding converted rules via destructuring (e.g. `{ lineStartsAfter, ...rest }` then create `{...rest, lineStartsWith: lineStartsAfter}`).
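The rebuild-instead-of-delete pattern from point 6, sketched against a simplified rule shape (the real `SplitRule` union has more members):

```typescript
// Simplified stand-in for the SplitRule union (assumption, not the real type).
type SplitRuleLike =
  | { lineStartsWith: string[]; split?: 'at' | 'after' }
  | { lineStartsAfter: string[]; split?: 'at' | 'after' };

// Rebuild the rule via destructuring instead of mutating with `delete`,
// so the union stays well-formed for the type checker.
const toLineStartsWith = (rule: SplitRuleLike): SplitRuleLike => {
  if ('lineStartsAfter' in rule) {
    const { lineStartsAfter, ...rest } = rule;
    return { ...rest, lineStartsWith: lineStartsAfter };
  }
  return rule;
};
```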
-
+7. **Biome lint constraints shape implementation**: The repo enforces low function complexity. Expect to extract helpers (alignment, selector resolution, anchoring) to keep Biome happy. Also, Biome can flag regex character-class usage as misleading; prefer alternation (e.g. `(?:\\u200C|\\u200D|\\uFEFF)`) when removing specific codepoints.
-
+8. **When debugging recovery, start here**:
 - `src/segmentation/segmenter.ts` (how content is sliced/trimmed and how `from/to` are computed)
 - `src/segmentation/rule-regex.ts` + `src/segmentation/tokens.ts` (token expansion + fuzzy behavior)
 - `src/preprocessing/replace.ts` (preprocessing parity)
 - `src/recovery.ts` (recovery implementation)
-
+9. **Prefer library utilities for UI tasks**: Instead of re-implementing rule merging, validation, or token mapping in client code, use `optimizeRules`, `validateRules`/`formatValidationReport`, and `applyTokenMappings`. They handle edge cases (like duplicate patterns, regex safety, or diacritic handling) that ad-hoc implementations might miss.
-
+10. **Safety Fallback (Search-back)**: When forced to split at a hard character limit, searching backward for whitespace/punctuation (`[\s\n.,;!?؛،۔]`) prevents word-chopping and improves readability significantly.
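A hedged sketch of the search-back fallback described in point 10 (the function name and lookback budget are assumptions, not the library's `findSafeBreakPosition`):

```typescript
// Walk backward from a hard limit to the nearest whitespace/punctuation so
// words are not chopped; fall back to the hard limit if nothing is found.
const SAFE_BREAK = /[\s\n.,;!?؛،۔]/; // includes Arabic punctuation, per the note above

const findSafeBreak = (text: string, hardLimit: number, maxLookback = 50): number => {
  const end = Math.min(hardLimit, text.length);
  for (let i = end; i > Math.max(0, end - maxLookback); i--) {
    if (SAFE_BREAK.test(text[i - 1])) return i; // break just after the separator
  }
  return end; // no safe position found: accept the hard limit
};
```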
-
+11. **Unicode Boundary Safety (Surrogates + Graphemes)**: Multi-byte characters (like emojis) can be corrupted if split in the middle of a surrogate pair. Similarly, Arabic diacritics (combining marks), ZWJ/ZWNJ, and variation selectors can be orphaned if a hard split lands in the middle of a grapheme cluster. Use `adjustForUnicodeBoundary` when forced to hard-split near a limit.
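A minimal surrogate-pair guard illustrating point 11; the library's `adjustForUnicodeBoundary` presumably also handles combining marks and joiners, which this sketch does not:

```typescript
// If the split position lands on a low surrogate (the second half of a
// surrogate pair), step back one UTF-16 unit so the pair stays intact.
const adjustForSurrogate = (text: string, pos: number): number => {
  const code = text.charCodeAt(pos);
  return code >= 0xdc00 && code <= 0xdfff ? pos - 1 : pos;
};
```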
-
+12. **Recursion/Iteration Safety**: Using a progress-based guard (comparing `cursorPos` before and after loop iteration) is safer than fixed iteration limits for supporting arbitrary-sized content without truncation risks.
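The progress-based guard from point 12 as a sketch (the `step` callback is a hypothetical stand-in for one loop iteration of segment processing):

```typescript
// Stop when an iteration makes no forward progress, instead of relying on a
// fixed iteration cap that would truncate arbitrarily large content.
const consumeAll = (content: string, step: (pos: number) => number): number => {
  let cursorPos = 0;
  while (cursorPos < content.length) {
    const next = step(cursorPos);
    if (next <= cursorPos) break; // no progress: bail out instead of spinning
    cursorPos = next;
  }
  return cursorPos;
};
```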
-
+13. **Accidental File Overwrites**: Be extremely careful when using tools like `replace_file_content` with large ranges. Verify file integrity frequently (e.g., `git diff`) to catch accidental deletions of existing code or tests. Merging new tests into existing files is a high-risk operation for AI agents.
-
+14. **Invisible Unicode Marks Break Regex Anchors**: Arabic text often contains invisible formatting marks like Left-to-Right Mark (`U+200E`), Right-to-Left Mark (`U+200F`), Arabic Letter Mark (`U+061C`), Zero-Width Space (`U+200B`), Zero-Width Non-Joiner (`U+200C`), Zero-Width Joiner (`U+200D`), or BOM (`U+FEFF`). These can appear at line starts after `\n` but before visible characters, breaking `^` anchored patterns. Solution: include an optional zero-width character class prefix in line-start patterns: `^[\u200E\u200F\u061C\u200B\u200C\u200D\uFEFF]*(?:pattern)`. The library now handles this automatically in `buildLineStartsWithRegexSource` and `buildLineStartsAfterRegexSource`.
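The zero-width-tolerant anchor described in point 14, built as a standalone regex (the library constructs this internally in `buildLineStartsWithRegexSource`; the `'## '` marker here is just an example):

```typescript
// Tolerate invisible directional/zero-width marks between line start and the
// first visible character, so '^'-anchored markers still match.
const ZW_PREFIX = '[\\u200E\\u200F\\u061C\\u200B\\u200C\\u200D\\uFEFF]*';
const lineStart = new RegExp(`^${ZW_PREFIX}(?:## )`, 'u');
```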
-
+15. **Large Segment Performance & Debugging Strategy**: When processing large books (1000+ pages), avoid O(n²) algorithms. The library uses a fast-path threshold (1000 pages) to switch from accurate string-search boundary detection to cumulative-offset-based slicing. Even on the iterative path (e.g. debug mode), we **slice only the active window (+padding)** per iteration (never `fullContent.slice(cursorPos)`), to avoid quadratic allocation/GC churn. To diagnose performance bottlenecks: (1) Look for logs with "Using iterative path" or "Using accurate string-search path" with large `pageCount` values, (2) Check `iterations` count in completion logs, (3) Strategic logs are placed at operation boundaries (start/end) NOT inside tight loops to avoid log-induced performance regression.
-
+16. **`maxPages=0` is a hard invariant**: When `maxPages=0`, breakpoint windows must never scan beyond the current page boundary. Relying purely on boundary detection (string search) can fail near page ends for long Arabic text + space joiners, letting the window “see” into the next page and creating multi-page segments. The safe fix is to clamp the breakpoint window to the current page’s end using `boundaryPositions` in breakpoint processing.
-
+17. **`''` breakpoint semantics depend on whether the window is page-bounded vs length-bounded**: `''` means “page boundary fallback”, but it’s intentionally **mode-dependent**:
 - **Page-bounded window (maxPages-driven)**: `''` should “swallow the remainder of the current page” (i.e. break at the **next page boundary**, not at an arbitrary character limit). This prevents accidentally consuming part of the next page when no other breakpoint patterns match.
 - **Length-bounded window (maxContentLength-driven)**: `''` should **not** force an early page-boundary break. In this mode we want the best split *near the length limit* (safe-break fallback → Unicode-safe hard split) even if that means a piece can cross a page boundary.
@@ -508,17 +506,17 @@ bunx biome lint .
 }
 ```
-
+18. **Beware `.only` in test files**: A single `it.only(...)` can mask unrelated failing fixtures for a long time. When debugging, remove `.only` as soon as you have a focused reproduction, and re-run the full suite to catch latent failures.
-
+19. **Tooling gotcha: IDE diagnostics vs actual parser**: If the editor shows parse errors but `bun test` and `bunx biome check` pass, suspect unsaved local edits or stale diagnostics rather than codebase syntax. Always validate with a direct `bunx biome check <file>` before making sweeping “syntax fix” edits.
-
+20. **Content-based page detection fails with overlapping content**: The `computeNextFromIdx` function uses prefix matching to detect page transitions. When page 0 ends with text identical to page 1's prefix, it incorrectly advances `currentFromIdx`. **Fix**: When `maxPages=0`, override content-based detection with position-based detection via `findPageIndexForPosition(cursorPos, boundaryPositions, fromIdx)`. Always trust cumulative offsets over content heuristics for strict page isolation.
-
+21. **Test edge cases with data that TRIGGERS the bug path**: Simple test data often bypasses problematic code paths. Ensure tests: (a) use `maxContentLength` to force sub-page splitting, (b) include enough content to exceed window sizes, (c) create overlapping/duplicate text at page boundaries, (d) verify that segments are actually split (not just checking no crashes).
-
+22. **Debug breakpoint processing with the logger**: Pass a `logger` object with `debug` and `trace` methods to `segmentPages()`. Key logs: `boundaryPositions built` (page boundary byte offsets), `iteration=N` (shows `currentFromIdx`, `cursorPos`, `windowEndPosition` per loop), `Complete` (final segment count).
-
+23. **Navigating `breakpoint-processor.ts`**: Key functions in (approximate) execution order:
 - `applyBreakpoints()` (entry point)
 - `processOversizedSegment()` (main loop)
@@ -530,15 +528,15 @@ bunx biome lint .
 - `advanceCursorAndIndex()` (progress)
 - `computeNextFromIdx()` (heuristic) **or** position-based override when `maxPages=0` (see #21)
-
+24. **Page attribution can drift in large-document breakpoint processing**: For ≥`FAST_PATH_THRESHOLD` segments, boundary positions may be derived from cumulative offsets (fast path). If upstream content is modified (e.g. marker stripping or accidental leading-trim), binary-search attribution can classify a piece as starting **before** `currentFromIdx`, inflating `(to - from)` and violating `maxPages`. **Fix**: clamp `actualStartIdx >= currentFromIdx` and re-apply the `maxPages` window using the same ID-span logic as `computeWindowEndIdx(...)` before creating the piece segment.
-
+25. **Offset fast path must respect page-ID span semantics**: `maxPages` in this library is enforced as an **ID span** invariant (`(to ?? from) - from <= maxPages`). For large segments, the offset-based fast path must choose `segEnd` using the same ID-window logic as `computeWindowEndIdx(...)` (not “N pages by index”), otherwise gaps (e.g. `2216 → 2218`) produce illegal spans.
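The ID-span invariant from point 25, expressed as a predicate over a simplified segment shape (the real `Segment` type has more fields):

```typescript
// maxPages is enforced as an ID-span invariant: (to ?? from) - from <= maxPages.
// Page-ID gaps (e.g. 2216 → 2218) count as a span of 2, not "two pages".
type SegmentSpan = { from: number; to?: number };

const violatesMaxPages = (seg: SegmentSpan, maxPages: number): boolean =>
  (seg.to ?? seg.from) - seg.from > maxPages;
```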
|
|
536
534
|
|
|
537
|
-
|
|
535
|
+
26. **Never `trimStart()` huge fallback content**: `ensureFallbackSegment()` constructs “all pages as one segment” when there are no structural split rules. If this giant content is `trimStart()`’d, cumulative offsets and derived boundary positions become inconsistent, which can lead to incorrect `from/to` attribution and `maxPages` violations that only appear on very large books.
|
|
538
536
|
|
|
539
|
-
|
|
537
|
+
27. **Always test both sides of the fast-path threshold**: Several breakpoint bugs only reproduce at or above `FAST_PATH_THRESHOLD` (1000). Add regressions at `threshold-1` and `threshold` to avoid “works in small unit tests, fails on full books” surprises.
|
|
540
538
|
|
|
541
|
-
|
|
539
|
+
28. **Breakpoint `split` behavior**: The `split: 'at' | 'after'` option for breakpoints controls where the split happens relative to the matched text:
|
|
542
540
|
- `'after'` (default): Match is included in the previous segment
|
|
543
541
|
- `'at'`: Match starts the next segment
|
|
544
542
|
Key implementation details in `findPatternBreakPosition`:
|
|
@@ -547,15 +545,15 @@ bunx biome lint .
|
|
|
547
545
|
- Zero-length matches (lookaheads) are always skipped to prevent infinite loops
|
|
548
546
|
- Empty pattern `''` forces `splitAt=false` since page boundaries have no matched text
|
|
549
547
|
|
|
550
|
-
|
|
548
|
+
29. **Unicode safety is the user's responsibility for patterns**: Unlike `findSafeBreakPosition` (which adjusts for grapheme boundaries), pattern-based breaks use the exact position where the user's regex matched. If a pattern matches mid-grapheme, that's a pattern authoring error, not a library bug. The library should NOT silently adjust pattern match positions.
|
|
551
549
|
|
|
552
|
-
|
|
550
|
+
30. **Fast path doesn't affect split behavior**: The offset-based fast path only applies to empty pattern `''` breakpoints (page boundary fallback), and empty patterns force `splitAt=false`. Pattern-based breakpoints with `split:'at'` never engage the fast path.
|
|
553
551
|
|
|
554
|
-
|
|
552
|
+
31. **Whitespace trimming affects split:'at' output**: `createSegment()` trims segment content. With `split:'at'`, if the matched text is whitespace-only, it will be trimmed from the start of the next segment. This is usually desirable for delimiter patterns.
|
|
555
553
|
|
|
556
|
-
|
|
554
|
+
32. **`prefer` semantics with `split:'at'`**: With `prefer:'longer'` + `split:'at'`, the algorithm selects the LAST valid match, maximizing content in the previous segment. This is correct but can be counterintuitive since the resulting previous segment might appear "shorter" than with `split:'after'`.
|
|
557
555
|
|
|
558
|
-
|
|
556
|
+
33. **Multi-agent review synthesis**: Getting implementation reviews from multiple AI models (Claude, GPT, Grok, Gemini) and synthesizing their feedback helps catch issues a single reviewer might miss. Key insight: when reviewers disagree on "critical" issues, investigate the codebase to verify claims before implementing fixes. Some "critical" issues are based on incorrect assumptions about how fast paths or downstream functions work.
|
|
559
557
|
|
|
560
558
|
### Process Template (Multi-agent design review, TDD-first)
|
|
561
559
|
|
package/README.md
CHANGED
@@ -1166,9 +1166,6 @@ const options: SegmentationOptions = {
 // - `flags` defaults to 'gu'. If provided, `g` and `u` are always enforced.
 // - `pageIds: []` means "apply to no pages" (skip that rule).
 // - Remember JSON escaping: to match a literal '.', use regex: "\\\\." in JSON.
-  replace: [
-    { regex: "([\\u0660-\\u0669]+)\\s*[-–—ـ]\\s*", replacement: "$1 - " }
-  ],
   rules: [
     { lineStartsWith: ['## '], split: 'at' }
   ],
package/dist/index.d.mts
CHANGED
@@ -698,26 +698,6 @@ interface Logger {
 	/** Log a warning message (potential issues) */
 	warn?: (message: string, ...args: unknown[]) => void;
 }
-/**
- * - Default regex flags: `gu` (global + unicode)
- * - If `flags` is provided, it is validated and merged with required flags:
- *   `g` and `u` are always enforced.
- *
- * `pageIds` controls which pages a rule applies to:
- * - `undefined`: apply to all pages
- * - `[]`: apply to no pages (rule is skipped)
- * - `[id1, id2, ...]`: apply only to those pages
- */
-type Replacement = {
-	/** Raw regex source string (no token expansion). Compiled with `u` (and always `g`). */
-	regex: string;
-	/** Replacement string (passed to `String.prototype.replace`). */
-	replacement: string;
-	/** Optional regex flags; `g` and `u` are always enforced. */
-	flags?: string;
-	/** Optional list of page IDs to apply this replacement to. Empty array means skip. */
-	pageIds?: number[];
-};
 /**
  * Segmentation options controlling how pages are split.
  *
@@ -751,12 +731,6 @@ type Replacement = {
  * };
  */
 type SegmentationOptions = {
-	/**
-	 * Optional pre-processing replacements applied to page content BEFORE segmentation.
-	 *
-	 * Replacements are applied per-page (not on concatenated content), in array order.
-	 */
-	replace?: Replacement[];
 	/**
 	 * Rules applied in order to find split points.
 	 *
@@ -891,9 +865,6 @@ type SegmentationOptions = {
 	logger?: Logger;
 };
 //#endregion
-//#region src/preprocessing/replace.d.ts
-declare const applyReplacements: (pages: Page[], rules?: Replacement[]) => Page[];
-//#endregion
 //#region src/recovery.d.ts
 type MarkerRecoverySelector = {
 	type: 'rule_indices';
@@ -1389,5 +1360,5 @@ declare const escapeTemplateBrackets: (pattern: string) => string;
 declare const escapeRegex: (s: string) => string;
 declare const makeDiacriticInsensitive: (text: string) => string;
 //#endregion
-export { type Breakpoint, type BreakpointRule, type CommonLineStartPattern, type DetectedPattern, type ExpandResult, type LineStartAnalysisOptions, type LineStartPatternExample, type Logger, type MarkerRecoveryReport, type MarkerRecoveryRun, type MarkerRecoverySelector, type OptimizeResult, PATTERN_TYPE_KEYS, type Page, type PageRange, type PatternProcessor, type PatternTypeKey, type RepeatingSequenceExample, type RepeatingSequenceOptions, type RepeatingSequencePattern, type RuleValidationResult, type Segment, type SegmentationOptions, type SplitRule, TOKEN_PATTERNS, Token, type TokenKey, type TokenMapping, type ValidationIssue, type ValidationIssueType, analyzeCommonLineStarts, analyzeRepeatingSequences, analyzeTextForRule,
+export { type Breakpoint, type BreakpointRule, type CommonLineStartPattern, type DetectedPattern, type ExpandResult, type LineStartAnalysisOptions, type LineStartPatternExample, type Logger, type MarkerRecoveryReport, type MarkerRecoveryRun, type MarkerRecoverySelector, type OptimizeResult, PATTERN_TYPE_KEYS, type Page, type PageRange, type PatternProcessor, type PatternTypeKey, type RepeatingSequenceExample, type RepeatingSequenceOptions, type RepeatingSequencePattern, type RuleValidationResult, type Segment, type SegmentationOptions, type SplitRule, TOKEN_PATTERNS, Token, type TokenKey, type TokenMapping, type ValidationIssue, type ValidationIssueType, analyzeCommonLineStarts, analyzeRepeatingSequences, analyzeTextForRule, applyTokenMappings, containsTokens, detectTokenPatterns, escapeRegex, escapeTemplateBrackets, expandCompositeTokensInTemplate, expandTokens, expandTokensWithCaptures, formatValidationReport, generateTemplateFromText, getAvailableTokens, getTokenPattern, makeDiacriticInsensitive, optimizeRules, recoverMistakenLineStartsAfterMarkers, recoverMistakenMarkersForRuns, segmentPages, shouldDefaultToFuzzy, stripTokenMappings, suggestPatternConfig, templateToRegex, validateRules, withCapture };
 //# sourceMappingURL=index.d.mts.map
package/dist/index.d.mts.map
CHANGED
@@ -1 +1 @@
-{"version":3,"file":"index.d.mts","names":[],"sources":["../src/types/index.ts","../src/analysis/line-starts.ts","../src/analysis/repeating-sequences.ts","../src/detection.ts","../src/types/rules.ts","../src/optimization/optimize-rules.ts","../src/types/breakpoints.ts","../src/types/options.ts","../src/
+
{"version":3,"file":"index.d.mts","names":[],"sources":["../src/types/index.ts","../src/analysis/line-starts.ts","../src/analysis/repeating-sequences.ts","../src/detection.ts","../src/types/rules.ts","../src/optimization/optimize-rules.ts","../src/types/breakpoints.ts","../src/types/options.ts","../src/recovery.ts","../src/segmentation/breakpoint-utils.ts","../src/segmentation/pattern-validator.ts","../src/segmentation/segmenter.ts","../src/segmentation/tokens.ts","../src/utils/textUtils.ts"],"sourcesContent":[],"mappings":";;AAcA;AA4CA;AA2BA;;;;AClEA;AAcA;AAEA;AA4PA;;;;AAGyB,KDpRb,OAAA,GCoRa;;;;ACzQzB;AAaA;AAOA;EAiOa,OAAA,EAAA,MAAA;EACF;;;EAEgB,IAAA,EAAA,MAAA;;;;AC5Q3B;AA+EA;AAgEA;EAuBa,EAAA,CAAA,EAAA,MAAA;EAiCA;;;;AC5M+B;AA4B3B;AA4BG;EA+Df,IAAA,CAAA,EJ5EM,MI4EN,CAAA,MAAA,EAAA,OAAsB,CAAA;AAAA,CAAA;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AAU9C;AAoJlB;AAAwB,KJ/RZ,IAAA,GI+RY;EAAc;;;;;;ECjV1B;AAsCZ;;;;AC7BA;EAyGY,OAAA,EAAA,MAAU;;;;AC/FtB;AA6CA;;;;;;;KPaY,SAAA;;;AAvEA,KCKA,wBAAA,GDwBK;EAeL,IAAA,CAAA,EAAI,MAAA;EA2BJ,WAAA,CAAS,EAAA,MAAA;;;;EClET,wBAAA,CAAA,EAAwB,OAAA;EAcxB,yBAAA,CAAuB,EAAA,OAAA;EAEvB,MAAA,CAAA,EAAA,aAAA,GAAsB,OAAA;EA4PrB,UAAA,CAAA,EAAA,CAAA,IAAA,EAAA,MAmBZ,EAAA,MAAA,EAAA,MAAA,EAAA,GAAA,OAAA;EAlBU,cAAA,CAAA,EAnQU,MAmQV,EAAA;EACE,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;CACV;AAAsB,KAjQb,uBAAA,GAiQa;;;;ACzQb,KDUA,sBAAA,GCVwB;EAaxB,OAAA,EAAA,MAAA;EAOA,KAAA,EAAA,MAAA;EAiOC,QAAA,EDxOC,uBC6Pb,EAAA;CApBU;;;;cDgBE,iCACF,kBACE,6BACV;;;AAHU,KCtQD,wBAAA,GDyRX;EAlBU,WAAA,CAAA,EAAA,MAAA;EACE,WAAA,CAAA,EAAA,MAAA;EACV,QAAA,CAAA,EAAA,MAAA;EAAsB,IAAA,CAAA,EAAA,MAAA;;;;ECzQb,WAAA,CAAA,EAAA,MAAA;EAaA,YAAA,CAAA,EAAA,MAAA;EAOA,iBAAA,CAAA,EAAA,MAAA;AAiOZ,CAAA;AACW,KAzOC,wBAAA,GAyOD;EACG,IAAA,EAAA,MAAA;EACX,OAAA,EAAA,MAAA;EAAwB,MAAA,EAAA,MAAA;;;KApOf,wBAAA;ECxCA,OAAA,EAAA,MAAA;EA+EC,KAAA,EAAA,MAAA;EAgEA,QAAA,EDpGC,wBCmHb,EAAA;AAQD,CAAA;;;AC3K4C;AA4B3B;AA4BG;AA8BM;AAyDrB,cF+HQ,yBE/HW,EAAA,CAAA,KAAA,EFgIb,IEhIa,EAAA,EAAA,OAAA,CAAA,EFiIV,wBEjIU,EAAA,GFkIrB,wBElIqB,EAAA;;;;AJjIxB;AA4CA;AA2BY,KGhFA,eAAA,
GHgFS;;;;EClET,KAAA,EAAA,MAAA;EAcA;EAEA,KAAA,EAAA,MAAA;EA4PC;EACF,QAAA,EAAA,MAAA;CACE;;;;;;ACxQb;AAaA;AAOA;AAiOA;;;;;;;cC1La,uCAAmC;AA/EhD;AA+EA;AAgEA;AAuBA;AAiCA;;;;AC5M4C;AA4B3B;AA4BG;AA8BM;AAiCC;AAuCtB,cDVQ,wBCUG,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,QAAA,EDViD,eCUjD,EAAA,EAAA,GAAA,MAAA;;;;;;;AAsBH,cDTA,oBCSuG,EAAA,CAAA,QAAA,EDRtG,eCQsG,EAAA,EAAA,GAAA;EAOxG,WAAA,EAAA,gBAAyB,GAAA,iBAAiB;EAUjD,KAAA,EAAA,OAAA;EAyCA,QAAA,CAAA,EAAA,MAAA;AA2GL,CAAA;;;;;;;cD7Ia;EEpMD,QAAA,EAAA,MAAA;EAsCC,WAAA,EAAA,gBAAwB,GAAA,iBAAA;;;YFqKvB;AGlMd,CAAA,GAAY,IAAA;;;ANHZ;AA4CA;AA2BA;;;;AClEA;AAcA;AAEA;AA4PA;;;;;;;;ACtQA;AAaA;AAOA;AAiOA;;;;;;KElPK,YAAA;;EDvBO,KAAA,EAAA,MAAA;AA+EZ,CAAA;AAgEA;AAuBA;AAiCA;;;;AC5M4C;AA4B3B;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AAU9C;AAoJlB;KAjSK,eAAA,GAiSmB;EAAc;EAAgB,QAAA,EAAA,MAAA;CAAe;;;;ACjVrE;AAsCA;;;;AC7BA;AAyGA;;;;AC/FA;AA6CA;;;;;;;;;ACjEA;AAKA;KJ0EK,qBAAA,GIzEQ;EACF;EACG,cAAA,EAAA,MAAA,EAAA;CACA;;AAGd;AA2BE;AAknBF;;;;;;;;;AAsDA;;;;;;;;;ACrjBA;;;;ACpJA;AAKA;AAcA;KNuFK,sBAAA,GMtFiB;EACC;EACH,eAAA,EAAA,MAAA,EAAA;CACL;;AAuFf;AAsCA;;;;ACgJA;;;;;;;;AC1MA;AAsCA;AAGA;AAiBA;AAgDA;AA2CA,KR7GK,mBAAA,GQgHJ;EAQW;EAkJC,YAAA,EAAA,MAAA,EAAA;AA8Cb,CAAA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBA;AA6BA;;;;ACxjBA,KTyHK,WAAA,GACC,YSxHL,GTyHK,eSzHL,GT0HK,qBS1HL,GT2HK,sBS3HL,GT4HK,mBS5HL;AAoDD;AAeA;;;;;;;;;;;;;;cT0Ea;;;;;;KAOD,cAAA,WAAyB;;;;;;;KAUhC,aAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAyCA,eAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAyCS;;;;;;;;;;;;SAaH;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAqDC,SAAA,GAAY,cAAc,gBAAgB;;;AJ3UtD;AA4CA;AA2BA;KK7EY,cAAA;;SAED;EJSC;EAcA,WAAA,EAAA,MAAA;AAEZ,CAAA;AA4Pa,cIjPA,aJoQZ,EAAA,CAAA,KAAA,EIpQoC,SJoQpC,EAAA,EAAA,GAAA;EAlBU,WAAA,EAAA,MAAA;EACE,KAAA,EInPwB,SJmPxB,EAAA;CACV;;;ADpRH;AA4CA;AA2BA;;;;AClEA;AAcA;AAEA;AA4PA;;;;;;KK9QY,cAAA;;AJQZ;AAaA;AAOA;AAiOA;;;;;;;;ACzQA;AA+EA;AAgEA;AAuBA;AAiCA;;;;AC5M4C;AA4B3B;EA0DZ,KAAA,CAAA,EAAA,MAAA;EAiCA;AAAsB;AAwBH;;;;;;;AAqCxB;EAOY,KAAA,CAAA,EAAA,IAAA,GAAA,OAAc;EAUrB;AAAa;AAoJlB;;EAAsC,GAAA,CAAA,E
AAA,MAAA;EAAgB;;;;;ECjV1C;AAsCZ;;;;AC7BA;AAyGA;;;;AC/FA;AA6CA;;;;;;;YDac;;AE9Ed;AAKA;;;;;;AAOA;AA2BE;AAknBF;;;;;;;;EASoD,QAAA,CAAA,EAAA,MAAA;AA6CpD,CAAA;;;;;;;;;ACrjBA;;;;ACpJA;AAKA;AAcY,KJ0FA,UAAA,GI1FA,MAAoB,GJ0FE,cI1FF;;;AV0BhC;AA2BA;;;;AClEA;AAcA;AAEA;AA4PA;;;;;;;;ACtQA;AAaA;AAOA;AAiOA;;;;;UKnPiB,MAAA;;;EJtBL;EA+EC,KAAA,CAAA,EAAA,CAAA,OAAA,EAAA,MAiDZ,EAAA,GAAA,IAjD+C,EAAA,OAAA,EAAA,EAAA,GAAA,IAAA;EAgEnC;EAuBA,IAAA,CAAA,EAAA,CAAA,OAAA,EAAA,MAyBZ,EAAA,GAAA,IAAA,EAxBa,OAAA,EAAA,EAAA,GAAA,IAAe;EAgChB;;;;AC5M+B;AA4B3B;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AAU9C;AAoJlB;;;;;;;;ACjVA;AAsCA;;;;AC7BA;AAyGA;;;KClDY,mBAAA;EA7CK;AA6CjB;;;;;;UAQY;;;ACzEZ;AAKA;;;;;;EAOY,KAAA,CAAA,EAAA,OAAA,GAAA;IA6BP;IAgnBW,OAAA,CAAA,EAAA,MAAA;IACL;IACG,OAAA,CAAA,EDjkBU,KCikBV,CAAA,MAAA,GAAA,YAAA,CAAA;EACD,CAAA;EACC;;;;;AAkDd;;;;;;;;;ECrjBY,QAAA,CAAA,EAAA,MAAA;;;;ACpJZ;AAKA;AAcA;;;;;;EA2Fa,gBAwBP,CAAA,EAAA,MAAA;EAcO;;;;ACgJb;;;;;;;;AC1MA;AAsCA;AAGA;AAiBA;AAgDA;AA2CA;AAWA;AAkJA;AA8CA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBA;AA6BA;gBLhckB;;;AMxHlB;AAsDA;AAeA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;WN2Ga;;;;APhID,KQ9EA,sBAAA,GR8ES;;;;EClET,IAAA,EAAA,0BAAwB;EAcxB,KAAA,CAAA,EAAA,OAAA,GAAA,YAAuB;EAEvB,QAAA,EAAA,MAAA,EAAA;AA4PZ,CAAA,GAAa;EACF,IAAA,EAAA,WAAA;EACE,SAAA,EAAA,CAAA,IAAA,EOvRgC,SPuRhC,EAAA,KAAA,EAAA,MAAA,EAAA,GAAA,OAAA;CACV;AAAsB,KOtRb,iBAAA,GPsRa;WOrRZ;SACF;YACG;ENUF,QAAA,EMTE,sBNSsB;AAapC,CAAA;AAOY,KM1BA,oBAAA,GN0BwB;EAiOvB,OAAA,EAAA;IACF,IAAA,EAAA,YAAA,GAAA,wBAAA;IACG,SAAA,EAAA,MAAA;IACX,aAAA,EAAA,MAAA;IAAwB,SAAA,EAAA,MAAA;;;UMtPf;ILtBA,SAAA,EAAA,MAAe;IA+Ed,QAAA,EAAA,MAAA;IAgEA,aAAA,EAAA,MAAA;IAuBA,UAAA,EAAA,MAAA;EAiCA,CAAA,CAAA;WK3KA;;;IJLR,sBAAY,CAAA,EAAA,MAAA;IA4BZ,qBAAe,CAAA,EAAA,MAAA;IA8Bf,YAAA,EAAA,MAAqB;IAiCrB,MAAA,EAAA,WAAA,GAAsB,oBAAA,GAAA,WAAA,GAAA,sBAAA,GAAA,qBAAA;IAwBtB,QAAA,EAAA,OAAA,GAAmB,QAAA,GAAA,MAAA;IAenB,EAAA,CAAA,EAAA,MAAW;IACV,KAAA,CAAA,EAAA,MAAA,EAAA;EACA,CAAA,CAAA;EACA,MAAA,EAAA,MAAA,EAAA;EACA,QAAA,EAAA,MAAA,EAAA;CACA;KInHD,oBAAA,GJmHoB,MAAA,GAAA,YAAA,G
AAA,qBAAA;AAiBZ,iBI4eG,qCAAA,CJ5eoG,KAAA,EI6ezG,IJ7eyG,EAAA,EAAA,QAAA,EI8etG,OJ9esG,EAAA,EAAA,OAAA,EI+evG,mBJ/euG,EAAA,QAAA,EIgftG,sBJhfsG,EAAA,IAiBlG,CAjBkG,EAAA;EAOxG,IAAA,CAAA,EAAA,YAAc,GAAA,wBAAW;EAUhC,gBAAa,CAAA,EIkeS,oBJleT;AAAA,CAAA,CAAA,EAyCb;EA2GO,MAAA,EIgVC,oBJhVQ;EAAG,QAAA,EIgVqB,OJhVrB,EAAA;CAAc;AAAgB,iBI6XtC,6BAAA,CJ7XsC,IAAA,EI8X5C,iBJ9X4C,EAAA,EAAA,KAAA,EAAA;EAAe,IAAA,CAAA,EAAA,YAAA,GAAA,wBAAA;qBI+XW;;UACnE;EHjtBD,QAAA,EGitBiC,OHjtBnB,EAAA;AAsC1B,CAAA;;;;AG2qBoD,KCxjBxC,gBAAA,GDwjBwC,CAAA,OAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;APtsBpD;AAcA;AAEA;AA4Pa,KSlRD,mBAAA,GTqSX,gBAAA,GAAA,eAAA,GAAA,WAAA,GAAA,eAAA;;;;AAhBwB,KShRb,eAAA,GTgRa;QS/Qf;;;ERME;EAaA,KAAA,CAAA,EAAA,MAAA;EAOA;EAiOC,OAAA,CAAA,EAAA,MAAA;CACF;;;;;KQ/OC,oBAAA;oBACU;EP5BV,eAAA,CAAA,EAAe,CO6BJ,eP7BI,GAAA,SAAA,CAAA,EAAA;EA+Ed,YAAA,CAAA,EAAA,COjDO,ePkGnB,GAjD+C,SAAA,CAAA,EAAA;EAgEnC,QAAA,CAAA,EOhHE,ePgHF;AAuBb,CAAA;AAiCA;;;;AC5M4C;AA4B3B;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AAmD3D,cMnHQ,aN4JC,EAAA,CAAA,KAAA,EM5JuB,SNyKpB,EAAA,EAAA,GAAA,CMzK+B,oBNyK/B,GAAA,SAAA,CAAA,EAAA;AAqDjB;;;;;;;;ACjVA;AAsCA;;;;AC7BY,cIgJC,sBJ5EU,EAAA,CAAA,OAAA,EAAA,CI4E0B,oBJ5E1B,GAAA,SAAA,CAAA,EAAA,EAAA,GAAA,MAAA,EAAA;;;;;AJ5DvB;AAaA;AAOA;AAiOA;;;;;;;;ACzQA;AA+EA;AAgEA;AAuBA;AAiCA;;;;AC5M4C;AA4B3B;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AAU9C;AAoJlB;;;;;cOxCa,sBAAuB,iBAAiB,wBAAmB;;;;AXnS5D,cYyFC,KZ5DF,EAAA;EAeC;EA2BA,SAAA,GAAS,EAAA,SAAA;;;;EClET,SAAA,MAAA,EAAA,YAAwB;EAcxB;EAEA,SAAA,IAAA,EAAA,UAAsB;EA4PrB;EACF,SAAA,IAAA,EAAA,UAAA;EACE;EACV,SAAA,IAAA,EAAA,UAAA;EAAsB;;;;ECzQb;EAaA,SAAA,IAAA,EAAA,UAAA;EAOA;EAiOC,SAAA,GAAA,EAAA,SAAA;EACF;EACG,SAAA,QAAA,EAAA,cAAA;EACX;EAAwB,SAAA,IAAA,EAAA,UAAA;;;;EC5Qf,SAAA,KAAA,EAAA,WAAe;EA+Ed;EAgEA,SAAA,KAAA,EAAA,WAeZ;EAQY;EAiCA,SAAA,MAAA,EAAA,YAmBZ;;;;AC/N2C;AAwDvC,KQqFO,QAAA,GRrFQ,MAAA,OQqFgB,KRrFhB;AAAA;AA+Df,cQyBQ,WRzBc,EAAA,CAAA,KAAA,EAAA,MAAA,EAAA,IAAA,EAAA,MAAA,EAAA,GAAA,MAAA;AAAA;AAuCtB,cQGQ,+BRHG,EAAA,CAAA,QAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;;;;AAsBhB;AAOA;AAAgE;AAU9C;AAoJlB;;
;;;;;;ACjVA;AAsCA;;;;AC7BA;AAyGA;cMuFa,gBAAgB;;;ALtL7B;AA6CA;;;;;;;;;ACjEA;AAKA;;AAEW,cI8OE,cJ9OF,EAAA,CAAA,KAAA,EAAA,MAAA,EAAA,GAAA,OAAA;;;;AAKX;AA2BE;AAknBF;AACW,KI1ZC,YAAA,GJ0ZD;EACG;;;;;EAO+B,OAAA,EAAA,MAAA;EAAO;AA6CpD;;;;EAG6C,YAAA,EAAA,MAAA,EAAA;EAAO;;;;ACxjBpD;;;;ACpJA;AAKA;AAcA;;;;;;AA2FA;AAsCA;;;;ACgJA;;;;;;;;AC1MA;AAsCA;AAGA;AAiBA;AAgDA;AA2CA;AAWA;AAkJA;AA8CA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBa,cAvKA,wBAuKkD,EAAA,CAAA,KAAY,EAAA,MAAA,EAAA,cAAA,CAAA,EAAA,CAAA,OAAA,EAAA,MAAA,EAAA,GAAA,MAAA,EAAA,aAAA,CAAA,EAAA,MAAA,EAAA,GAAA;EA6B9D,YAAA,EAAA,MAAA,EAGZ;;;;AC3jBD;AAsDA;AAeA;;;;;;;;;;;;;;;;;cD6Va;;;;;;;;;;;;;;;;;;;;;;cAuBA,uCAAmC;;;;;;;;;;;;;cAqBnC;;;;;;;;;;;;;;;cAgBA;;;;;;;;;;;;;;;cA8BA;;;;KAWD,YAAA;;;;;;;;;;;;;;;;;;;;;;cAoBC,iDAAkD;;;;;;;;;;;;;;;;cA6BlD;;;AX9Tb;;;;;;;;ACtQA;AAaA;AAOA;AAiOA;;;;;;;;ACzQA;AA+EA;AAgEA;AAuBA;AAiCa,cUvKA,sBV8KC,EAAA,CAAA,OAAe,EAAA,MAAA,EAAA,GAAA,MAAA;;;;ACnNe;AA4B3B;AA4BG;AA8BM;AAiCC;AAwBH;;;;;AAoBlB,cSxEO,WTwEP,EAAA,CAAA,CAAA,EAAA,MAAA,EAAA,GAAA,MAAA;AAAmB,cSzDZ,wBTyDY,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA,MAAA"}
package/dist/index.mjs
CHANGED
@@ -34,7 +34,9 @@ const normalizeLineEndings = (content) => {
 * escapeTemplateBrackets('{{harf}}')
 * // → '{{harf}}' (unchanged - no brackets outside tokens)
 */
-const escapeTemplateBrackets = (pattern) =>
+const escapeTemplateBrackets = (pattern) => {
+	return pattern.replace(/(\{\{[^}]*\}\})|([()[\]])/g, (_match, token, bracket) => token || `\\${bracket}`);
+};
 /**
 * Character class matching all Arabic diacritics (Tashkeel/Harakat).
 *
@@ -93,7 +95,9 @@ const getEquivClass = (ch) => {
 	const group = EQUIV_GROUPS.find((g) => g.includes(ch));
 	return group ? `[${group.map(escapeRegex).join("")}]` : escapeRegex(ch);
 };
-const normalizeArabicLight = (str) =>
+const normalizeArabicLight = (str) => {
+	return str.normalize("NFC").replace(/[\u200C\u200D]/g, "").replace(/\s+/g, " ").trim();
+};
 const makeDiacriticInsensitive = (text) => {
 	const diacriticsMatcher = `${DIACRITICS_CLASS}*`;
 	return Array.from(normalizeArabicLight(text)).map((ch) => getEquivClass(ch) + diacriticsMatcher).join("");
@@ -1224,53 +1228,6 @@ const optimizeRules = (rules) => {
 };
 };
 
-//#endregion
-//#region src/preprocessing/replace.ts
-const DEFAULT_REPLACE_FLAGS = "gu";
-const normalizeReplaceFlags = (flags) => {
-	if (!flags) return DEFAULT_REPLACE_FLAGS;
-	const allowed = new Set([
-		"g",
-		"i",
-		"m",
-		"s",
-		"u",
-		"y"
-	]);
-	const set = new Set(flags.split("").filter((ch) => {
-		if (!allowed.has(ch)) throw new Error(`Invalid replace regex flag: "${ch}" (allowed: gimsyu)`);
-		return true;
-	}));
-	set.add("g");
-	set.add("u");
-	return [
-		"g",
-		"i",
-		"m",
-		"s",
-		"y",
-		"u"
-	].filter((c) => set.has(c)).join("");
-};
-const compileReplaceRules = (rules) => rules.filter((r) => !(r.pageIds && r.pageIds.length === 0)).map((r) => ({
-	pageIdSet: r.pageIds ? new Set(r.pageIds) : void 0,
-	re: new RegExp(r.regex, normalizeReplaceFlags(r.flags)),
-	replacement: r.replacement
-}));
-const applyReplacements = (pages, rules) => {
-	if (!rules?.length || !pages.length) return pages;
-	const compiled = compileReplaceRules(rules);
-	if (!compiled.length) return pages;
-	return pages.map((p) => {
-		let content = p.content;
-		for (const rule of compiled) if (!rule.pageIdSet || rule.pageIdSet.has(p.id)) content = content.replace(rule.re, rule.replacement);
-		return content === p.content ? p : {
-			...p,
-			content
-		};
-	});
-};
-
 //#endregion
 //#region src/segmentation/rule-regex.ts
 /**
|
|
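For anyone who depended on the removed replace-rules preprocessing: the deleted `normalizeReplaceFlags` forced `g` and `u` onto whatever flags a rule supplied and rejected anything outside `gimsyu`. A condensed sketch of that deleted behavior, for reference only (it is not an API of 2.13.4):

```javascript
// Condensed restatement of the deleted normalizeReplaceFlags: validate
// against gimsyu, force-add g and u, and emit flags in a canonical order.
const normalizeReplaceFlags = (flags) => {
    if (!flags) return "gu";
    const allowed = new Set(["g", "i", "m", "s", "u", "y"]);
    for (const ch of flags) {
        if (!allowed.has(ch)) throw new Error(`Invalid replace regex flag: "${ch}" (allowed: gimsyu)`);
    }
    const set = new Set([...flags, "g", "u"]);
    return ["g", "i", "m", "s", "y", "u"].filter((c) => set.has(c)).join("");
};

console.log(normalizeReplaceFlags("i")); // "giu"
console.log(normalizeReplaceFlags());    // "gu"
```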
@@ -1928,24 +1885,63 @@ const findPatternBreakPosition = (windowContent, regex, prefer, splitAt = false)
  * Handles page boundary breakpoint (empty pattern).
  * Returns break position or -1 if no valid position found.
  */
+const findStartOfNextPageInWindow = (remainingContent, currentFromIdx, toIdx, pageIds, normalizedPages, targetPos) => {
+  const targetNextPageIdx = currentFromIdx + 1;
+  for (let nextIdx = targetNextPageIdx; nextIdx > currentFromIdx; nextIdx--) if (nextIdx <= toIdx) {
+    const nextPageData = normalizedPages.get(pageIds[nextIdx]);
+    if (nextPageData) {
+      const boundaryPos = findNextPagePosition(remainingContent, nextPageData);
+      if (boundaryPos > 0 && boundaryPos <= targetPos) return boundaryPos;
+    }
+  }
+  return -1;
+};
 const handlePageBoundaryBreak = (remainingContent, currentFromIdx, windowEndPosition, maxContentLength, toIdx, pageIds, normalizedPages) => {
   const targetPos = Math.min(windowEndPosition, remainingContent.length);
-
-
-
-
-  if (nextPageData) {
-    const boundaryPos = findNextPagePosition(remainingContent, nextPageData);
-    if (boundaryPos > 0 && boundaryPos <= targetPos) return boundaryPos;
-  }
-  }
+  const isLengthBounded = maxContentLength !== void 0 && windowEndPosition === maxContentLength;
+  if (!isLengthBounded) {
+    const boundaryPos = findStartOfNextPageInWindow(remainingContent, currentFromIdx, toIdx, pageIds, normalizedPages, targetPos);
+    if (boundaryPos > 0) return { pos: boundaryPos };
   }
   if (targetPos < remainingContent.length) {
     const safePos = findSafeBreakPosition(remainingContent, targetPos);
-    if (safePos !== -1) return
-
+    if (safePos !== -1) return {
+      pos: safePos,
+      splitReason: isLengthBounded ? "whitespace" : void 0
+    };
+    return {
+      pos: adjustForUnicodeBoundary(remainingContent, targetPos),
+      splitReason: isLengthBounded ? "unicode_boundary" : void 0
+    };
   }
-  return targetPos;
+  return { pos: targetPos };
+};
+const checkBreakpointMatch = (i, remainingContent, currentFromIdx, toIdx, windowEndIdx, windowEndPosition, ctx, maxContentLength) => {
+  const { pageIds, normalizedPages, expandedBreakpoints, prefer } = ctx;
+  const bpCtx = expandedBreakpoints[i];
+  const { rule, regex, excludeSet, skipWhenRegex } = bpCtx;
+  if (!isInBreakpointRange(pageIds[currentFromIdx], rule)) return null;
+  if (hasExcludedPageInRange(excludeSet, pageIds, currentFromIdx, windowEndIdx)) return null;
+  if (skipWhenRegex?.test(remainingContent)) return null;
+  if (regex === null) {
+    const result = handlePageBoundaryBreak(remainingContent, currentFromIdx, windowEndPosition, maxContentLength, toIdx, pageIds, normalizedPages);
+    return {
+      breakPos: result.pos,
+      breakpointIndex: i,
+      contentLengthSplit: result.splitReason && maxContentLength ? {
+        maxContentLength,
+        reason: result.splitReason
+      } : void 0,
+      rule
+    };
+  }
+  const breakPos = findPatternBreakPosition(remainingContent.slice(0, Math.min(windowEndPosition, remainingContent.length)), regex, prefer, bpCtx.splitAt);
+  if (breakPos > 0) return {
+    breakPos,
+    breakpointIndex: i,
+    rule
+  };
+  return null;
 };
 /**
  * Tries to find a break position within the current window using breakpoint patterns.
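`adjustForUnicodeBoundary` itself is not part of this diff, so its body is unknown here. The sketch below is an assumption of what such a guard typically does — back off one UTF-16 code unit when the cut would land between the halves of a surrogate pair — and is named `...Guess` to flag that it is hypothetical, not the package's implementation:

```javascript
// HYPOTHETICAL sketch: if pos falls between a high and low surrogate,
// move the cut back one code unit so no character is split in half.
const adjustForUnicodeBoundaryGuess = (content, pos) => {
    if (pos > 0 && pos < content.length) {
        const prev = content.charCodeAt(pos - 1);
        if (prev >= 0xd800 && prev <= 0xdbff) return pos - 1; // high surrogate before pos
    }
    return pos;
};

const s = "ab\u{1F600}cd"; // the emoji occupies indices 2-3 as a surrogate pair
console.log(adjustForUnicodeBoundaryGuess(s, 3)); // 2 — moved off the pair
console.log(adjustForUnicodeBoundaryGuess(s, 4)); // 4 — already a valid boundary
```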
@@ -1959,23 +1955,10 @@ const handlePageBoundaryBreak = (remainingContent, currentFromIdx, windowEndPosi
  * @returns Break position in the content, or -1 if no break found
  */
 const findBreakPosition = (remainingContent, currentFromIdx, toIdx, windowEndIdx, windowEndPosition, ctx, maxContentLength) => {
-  const {
+  const { expandedBreakpoints } = ctx;
   for (let i = 0; i < expandedBreakpoints.length; i++) {
-    const
-    if (
-    if (hasExcludedPageInRange(excludeSet, pageIds, currentFromIdx, windowEndIdx)) continue;
-    if (skipWhenRegex?.test(remainingContent)) continue;
-    if (regex === null) return {
-      breakPos: handlePageBoundaryBreak(remainingContent, currentFromIdx, windowEndPosition, maxContentLength, toIdx, pageIds, normalizedPages),
-      breakpointIndex: i,
-      rule
-    };
-    const breakPos = findPatternBreakPosition(remainingContent.slice(0, Math.min(windowEndPosition, remainingContent.length)), regex, prefer, expandedBreakpoints[i].splitAt);
-    if (breakPos > 0) return {
-      breakPos,
-      breakpointIndex: i,
-      rule
-    };
+    const match = checkBreakpointMatch(i, remainingContent, currentFromIdx, toIdx, windowEndIdx, windowEndPosition, ctx, maxContentLength);
+    if (match) return match;
   }
   return null;
 };
@@ -2124,7 +2107,8 @@ const findBreakOffsetForWindow = (remainingContent, currentFromIdx, windowEndIdx
   if (patternMatch && patternMatch.breakPos > 0) return {
     breakOffset: patternMatch.breakPos,
     breakpointIndex: patternMatch.breakpointIndex,
-    breakpointRule: patternMatch.rule
+    breakpointRule: patternMatch.rule,
+    contentLengthSplit: patternMatch.contentLengthSplit
   };
   if (windowEndPosition < remainingContent.length) {
     const safeOffset = findSafeBreakPosition(remainingContent, windowEndPosition);
@@ -3128,8 +3112,7 @@ const segmentPages = (pages, options) => {
     prefer,
     ruleCount: rules.length
   });
-  const
-  const { content: matchContent, normalizedPages: normalizedContent, pageMap } = buildPageMap(processedPages);
+  const { content: matchContent, normalizedPages: normalizedContent, pageMap } = buildPageMap(pages);
   logger?.debug?.("[segmenter] content built", {
     pageIds: pageMap.pageIds,
     totalContentLength: matchContent.length
@@ -3142,10 +3125,10 @@ const segmentPages = (pages, options) => {
   });
   let segments = buildSegments(unique, matchContent, pageMap, rules, pageJoiner);
   logger?.debug?.("[segmenter] structural segments built", { segmentCount: segments.length });
-  segments = ensureFallbackSegment(segments,
+  segments = ensureFallbackSegment(segments, pages, normalizedContent, pageJoiner);
   if (hasLimits) {
     logger?.debug?.("[segmenter] applying breakpoints to oversized segments");
-    const result = applyBreakpoints(segments,
+    const result = applyBreakpoints(segments, pages, normalizedContent, maxPages, breakpoints, prefer, (p) => processPattern(p, false).pattern, logger, pageJoiner, debug?.includeBreakpoint ? debug.metaKey : void 0, maxContentLength, processBreakpointPattern);
     logger?.info?.("[segmenter] segmentation complete (with breakpoints)", { finalSegmentCount: result.length });
     return result;
   }
@@ -3497,13 +3480,12 @@ const runStage1IfEnabled = (pages, segments, options, selectedRuleIndices, mode)
     recoveredAtIndex,
     recoveredDetailAtIndex
   };
-  const
-  const pageIdToIndex = buildPageIdToIndex(processedPages);
+  const pageIdToIndex = buildPageIdToIndex(pages);
   const pageJoiner = options.pageJoiner ?? "space";
   const compiledMistaken = compileMistakenRulesAsStartsWith(options, selectedRuleIndices);
   for (let i = 0; i < segments.length; i++) {
     const orig = segments[i];
-    const r = tryBestEffortRecoverOneSegment(orig,
+    const r = tryBestEffortRecoverOneSegment(orig, pages, pageIdToIndex, compiledMistaken, pageJoiner);
     if (r.kind !== "recovered") continue;
     const seg = {
       ...orig,
@@ -3858,5 +3840,5 @@ const formatValidationReport = (results) => results.flatMap((result, i) => {
 });
 
 //#endregion
-export { PATTERN_TYPE_KEYS, TOKEN_PATTERNS, Token, analyzeCommonLineStarts, analyzeRepeatingSequences, analyzeTextForRule,
+export { PATTERN_TYPE_KEYS, TOKEN_PATTERNS, Token, analyzeCommonLineStarts, analyzeRepeatingSequences, analyzeTextForRule, applyTokenMappings, containsTokens, detectTokenPatterns, escapeRegex, escapeTemplateBrackets, expandCompositeTokensInTemplate, expandTokens, expandTokensWithCaptures, formatValidationReport, generateTemplateFromText, getAvailableTokens, getTokenPattern, makeDiacriticInsensitive, optimizeRules, recoverMistakenLineStartsAfterMarkers, recoverMistakenMarkersForRuns, segmentPages, shouldDefaultToFuzzy, stripTokenMappings, suggestPatternConfig, templateToRegex, validateRules, withCapture };
 //# sourceMappingURL=index.mjs.map