flappa-doormal 2.11.1 → 2.11.3

package/AGENTS.md CHANGED
@@ -400,6 +400,8 @@ bunx biome lint .
400
400
 
401
401
  12. **Prefix matching fails with duplicated content**: When using `indexOf()` to find page boundaries by matching prefixes, false positives occur when pages have identical prefixes AND content is duplicated within pages. Solution: use cumulative byte offsets as the source of truth for expected boundaries, and only accept prefix matches within a strict deviation threshold (2000 chars). When content-based detection fails, fall back directly to the calculated offset rather than returning `remainingContent.length` (which merges all remaining pages).
402
402
 
403
+ 13. **ASCII vs Arabic-Indic Numerals**: While most classical Arabic texts use Arabic-Indic digits (`٠-٩`), modern digitizers often mix them with ASCII digits (`0-9`). Providing separate tokens (`{{raqms}}` for Arabic and `{{nums}}` for ASCII) allows better precision in rule definitions while keeping patterns readable. Always check which digit set is used in the source text before authoring rules.
404
+
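Lesson 13's advice to check the digit set before authoring rules can be sketched as a small helper. This is only an illustration under stated assumptions: `countDigitSets` and `suggestNumeralToken` are hypothetical names, not part of the library, and the two character classes simply mirror the patterns documented for `{{nums}}` and `{{raqms}}`.

```javascript
// Hypothetical helper (not a flappa-doormal API): count which digit sets a text uses.
const countDigitSets = (text) => ({
  // In JavaScript, \d matches only ASCII 0-9, never Arabic-Indic digits.
  ascii: (text.match(/\d/g) ?? []).length,
  arabicIndic: (text.match(/[\u0660-\u0669]/g) ?? []).length,
});

// Suggest a token family for the text; "mixed" means both digit sets appear.
const suggestNumeralToken = (text) => {
  const { ascii, arabicIndic } = countDigitSets(text);
  if (ascii > 0 && arabicIndic > 0) return "mixed";
  if (arabicIndic > 0) return "{{raqms}}";
  if (ascii > 0) return "{{nums}}";
  return "none";
};
```

Running such a check over a sample of pages before writing rules would show whether `{{raqms}}`, `{{nums}}`, or both are needed.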
403
405
  ### For Future AI Agents (Recovery + Repo gotchas)
404
406
 
405
407
  1. **`lineStartsAfter` vs `lineStartsWith` is not “cosmetic”**: `lineStartsAfter` changes output by stripping the matched marker via an internal `contentStartOffset` during segment construction. If a client used it by accident, you cannot reconstruct the exact stripped prefix from output alone without referencing the original pages and re-matching the marker.
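Lesson 12 earlier in this hunk (treat the calculated offset as the source of truth and accept prefix matches only within a deviation threshold) can be sketched roughly as follows. The function name, signature, and use of plain string indices are assumptions made for illustration; only the 2000-character threshold and the fallback behaviour come from the lesson text.

```javascript
// Hypothetical sketch of lesson 12; not the library's implementation.
const MAX_DEVIATION = 2000; // threshold quoted in the lesson

const findBoundaryNearExpected = (content, prefix, expectedOffset, maxDeviation = MAX_DEVIATION) => {
  const fallback = Math.min(expectedOffset, content.length);
  if (!prefix) return fallback; // an empty prefix would match everywhere

  let best = -1;
  let pos = content.indexOf(prefix);
  while (pos !== -1) {
    const deviation = Math.abs(pos - expectedOffset);
    // Keep the candidate closest to the expected boundary, if it is close enough.
    if (deviation <= maxDeviation && (best === -1 || deviation < Math.abs(best - expectedOffset))) {
      best = pos;
    }
    pos = content.indexOf(prefix, pos + 1);
  }
  // No acceptable match: use the calculated offset instead of content.length,
  // which would merge all remaining pages.
  return best !== -1 ? best : fallback;
};
```

The design point is that a duplicated prefix far from the expected boundary is rejected rather than trusted.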
@@ -439,6 +441,8 @@ bunx biome lint .
439
441
 
440
442
  15. **Invisible Unicode Marks Break Regex Anchors**: Arabic text often contains invisible bidirectional formatting marks like Left-to-Right Mark (`U+200E`), Right-to-Left Mark (`U+200F`), or Arabic Letter Mark (`U+061C`). These appear at line starts after `\n` but before visible characters, breaking `^` anchored patterns. Solution: include an optional zero-width character class prefix in line-start patterns: `^[\u200E\u200F\u061C\u200B\uFEFF]*(?:pattern)`. The library now handles this automatically in `buildLineStartsWithRegexSource` and `buildLineStartsAfterRegexSource`.
441
443
 
444
+ 16. **Large Segment Performance & Debugging Strategy**: When processing large books (1000+ pages), avoid O(n²) algorithms. The library uses a fast-path threshold (1000 pages) to switch from accurate string-search boundary detection to cumulative-offset-based slicing. To diagnose performance bottlenecks: (1) look for logs containing "Using iterative path" or "Using accurate string-search path" with large `pageCount` values; (2) check the `iterations` count in completion logs; (3) keep in mind that strategic logs are placed at operation boundaries (start/end), not inside tight loops, to avoid log-induced performance regressions.
445
+
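The invisible-mark problem from lesson 15 is easy to reproduce. The sketch below assumes a hypothetical helper, `lineStartRegex`, that prepends the optional zero-width character class quoted in the lesson; it is not the library's `buildLineStartsWithRegexSource`, whose internals are not shown here.

```javascript
// Optional prefix of invisible marks, as quoted in lesson 15.
const INVISIBLE_PREFIX = "[\\u200E\\u200F\\u061C\\u200B\\uFEFF]*";

// Hypothetical helper: a multiline line-start regex that tolerates those marks.
const lineStartRegex = (pattern) => new RegExp(`^${INVISIBLE_PREFIX}(?:${pattern})`, "mu");

// The second line starts with a Right-to-Left Mark (U+200F) before the visible text.
const sampleText = "مقدمة\n\u200Fباب الصلاة";

const naive = /^باب/mu; // anchored pattern without the prefix
const tolerant = lineStartRegex("باب");

const naiveMatches = naive.test(sampleText);
const tolerantMatches = tolerant.test(sampleText);
```

Here `naiveMatches` is `false` because the mark sits between the line start and the first visible letter, while `tolerantMatches` is `true`.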
442
446
  ### Process Template (Multi-agent design review, TDD-first)
443
447
 
444
448
  If you want to repeat the “write a plan → get multiple AI critiques → synthesize → update plan → implement TDD-first” workflow, use:
@@ -467,6 +471,8 @@ If you want to repeat the “write a plan → get multiple AI critiques → synt
467
471
  | `{{kitab}}` | "كتاب" (book) | كتاب الصلاة |
468
472
  | `{{raqm}}` | Single Arabic-Indic numeral | ٥ |
469
473
  | `{{raqms}}` | Multiple Arabic-Indic numerals | ٧٥٦٣ |
474
+ | `{{num}}` | Single ASCII numeral | 5 |
475
+ | `{{nums}}` | Multiple ASCII numerals | 123 |
470
476
  | `{{raqms:num}}` | Numerals with named capture | `meta.num = "٧٥٦٣"` |
471
477
  | `{{dash}}` | Various dash characters | - – — ـ |
472
478
  | `{{harfs}}` | Single-letter codes separated by spaces | `د ت س ي ق` |
package/README.md CHANGED
@@ -102,6 +102,8 @@ Replace regex with readable tokens:
102
102
  |-------|---------|------------------|
103
103
  | `{{raqms}}` | Arabic-Indic digits | `[\\u0660-\\u0669]+` |
104
104
  | `{{raqm}}` | Single Arabic digit | `[\\u0660-\\u0669]` |
105
+ | `{{nums}}` | ASCII digits | `\\d+` |
106
+ | `{{num}}` | Single ASCII digit | `\\d` |
105
107
  | `{{dash}}` | Dash variants | `[-–—ـ]` |
106
108
  | `{{harf}}` | Arabic letter | `[أ-ي]` |
107
109
  | `{{harfs}}` | Single-letter codes separated by spaces | `[أ-ي](?:\s+[أ-ي])*` |
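The token rows above can be read as plain string substitution. The following is a minimal sketch of that idea, not the library's expander: `TOKEN_PATTERNS` copies only four rows from the table, `expandTokens` is a hypothetical name, and rendering `{{token:name}}` as a named capture group is an assumption based on the `{{raqms:num}}` entry in AGENTS.md.

```javascript
// Illustrative subset of token patterns, copied from the table.
const TOKEN_PATTERNS = {
  num: "\\d",
  nums: "\\d+",
  raqm: "[\\u0660-\\u0669]",
  raqms: "[\\u0660-\\u0669]+",
};

// Hypothetical expander: {{token}} -> pattern, {{token:name}} -> named capture group.
const expandTokens = (template) =>
  template.replace(/\{\{(\w+)(?::(\w+))?\}\}/g, (whole, token, name) => {
    if (!Object.prototype.hasOwnProperty.call(TOKEN_PATTERNS, token)) {
      return whole; // leave unknown tokens untouched
    }
    const pattern = TOKEN_PATTERNS[token];
    return name ? `(?<${name}>${pattern})` : pattern;
  });

// Usage: an ASCII-numbered heading and an Arabic-Indic one with a named capture.
const asciiRule = new RegExp(expandTokens("^{{nums}} - "), "u");
const arabicRule = new RegExp(expandTokens("^{{raqms:num}} "), "u");
```

With these rules, `asciiRule` matches a line such as `123 - ...` but not one that starts with Arabic-Indic digits, which is the precision the separate tokens are meant to provide.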
@@ -776,6 +778,8 @@ Available tokens you may use in templates:
776
778
  - {{naql}} (حدثنا/أخبرنا/... narration phrases)
777
779
  - {{raqm}} (single Arabic-Indic digit)
778
780
  - {{raqms}} (Arabic-Indic digits)
781
+ - {{num}} (single ASCII digit)
782
+ - {{nums}} (ASCII digits)
779
783
  - {{dash}} (dash variants)
780
784
  - {{tarqim}} (punctuation [. ! ? ؟ ؛])
781
785
  - {{harf}} (Arabic letter)
@@ -1324,3 +1328,6 @@ bun run deploy
1324
1328
 
1325
1329
  MIT
1326
1330
 
1331
+ ## Inspiration
1332
+
1333
+ The name of the project comes from Asmāʾ; it seems to refer to some sort of gymnastic move.
@@ -1 +1 @@
1
- {"version":3,"file":"index.d.mts","names":[],"sources":["../src/segmentation/fuzzy.ts","../src/segmentation/types.ts","../src/segmentation/optimize-rules.ts","../src/segmentation/pattern-validator.ts","../src/segmentation/replace.ts","../src/segmentation/segmenter.ts","../src/segmentation/tokens.ts","../src/analysis/line-starts.ts","../src/analysis/repeating-sequences.ts","../src/detection.ts","../src/recovery.ts"],"sourcesContent":[],"mappings":";;AAkEA;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBY,cDnUC,WCmUG,EAAA,CAAA,CAAA,EAAA,MAAA,EAAA,GAAA,MAAA;AAoChB;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2DA;AAAyC,cJ0F5B,wBI1F4B,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;AJLzC;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;KArVK,YAAA,GAqViD;EAAe;EAkBzD,KAAA,EAAI,MAAA;AAoChB,CAAA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;;KF0BK,eAAA,GExBkB;EACH;EACL,QAAA,EAAA,MAAA;CAAe;AA0G9B;AAwDA;;;;AC1LA;AA2DA;;;;;;;;ACyPA;;;;;;;;AC5PA;AA0QA;AAsDA;AA2CA,KLvVK,qBAAA,GK0VJ;EAQW;EAuKC,cAAA,EAAA,MAAA,EAAA;AA6Cb,CAAA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBA;AA6BA;;;;AC/wBA;AAcA;AAEA;AAwQA;;;;;;;;AClRA;AAaA;AAOA;AA2OA;;;;KPjKK,sBAAA,GOoKsB;;;;ACjR3B;AA+EA;AAgEA;AAuBA;AAiCA;;;;AC7MA;AAKA;;;;;;AAOA;AA2BE;AAmnBF;;KT/gBK,mBAAA,GSihBS;EACD;EACC,YAAA,EAAA,MAAA,EAAA;CAGa;;;;AA+C3B;;;;;;;KTtjBK,WAAA,GACC,eACA,kBACA,wBACA,yBACA;;;;;;;;;;;;;;;;cAiBO;;;;;;KAOD,cAAA,WAAyB;;;;;;;KAYhC,aAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAiDO,SAAA;;;;;;;KAYP,eAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAyCS;;;;;;;;;;;;SAaH;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAuDC,SAAA,GAAY,cAAc,gBAAgB;;;;;;;;;;;;;KAkB1C,IAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAoCA,cAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAqCE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAqCF,UAAA,YAAsB;;;;;;;;;;;;;;;;;;;;;;;;;UA8BjB,MAAA;;;;;;;;;;;;;;;;;;;;;;KAuBL,WAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;KA+CA,mBAAA;;;;;;YAME;;;;;;;;UASF;;;;;;;;;;;;;;cAiBY;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;gBA4DN;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;WAwDL;;;;;;;;;;;;;;;;KAiBD,OAAA;;;;;;;;;;;;;;;;;;;;;;;;;;SA6BD;;;;AAxoBa;;;AAkBlB,KChJM,cAAA,GDgJN;EACA;EACA,KAAA,EChJK,SDgJL,EAAA;EAAmB;EAiBZ,WAAA,EAAA,MAAA;AAOb,CAAA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;cAAa,uBAAwB,gBAAc;;;AD3FlC;AA4BG;AA8BM;AAyDrB,KEpIO,mBAAA,GFoIY,gBAAA,GAAA,eAAA,GAAA,WAAA,GAAA,eAAA;AAAA;;;AAkBlB,KEjJM,eAAA,GFiJN;EACA,IAAA,EEjJI,mBFiJJ;EACA,OAAA,EAAA,MAAA;EAAmB,UAAA,CAAA,EAAA,MAAA;EAiBZ;EAOD,KAAA,CAAA,EAAA,MAAA;EAYP;EAiDO,OAAA,CAAA,EAAA,MAAS;AAA6B,CAAA;AAyHlD;;;;AAAqE,KEnVzD,oBAAA,GFmVyD;EAkBzD,cAAI,CAAA,EAAA,CEpWM,eFoWN,GAAA,SAAA,CAAA,EAAA;EAoCJ,eAAA,CAAA,EAAc,CEvYH,eF4aT,GAAA,SAAS,CAAA,EAAA;EAqCX,YAAA,CAAU,EAAA,CEhdF,eFgdc,GAAA,SAAc,CAAA,EAAA;EA8B/B,QAAA,CAAM,EE7eR,eF6eQ;AAuBvB,CAAA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;AACsB,cA6GT,aA7GS,EAAA,CAAA,KAAA,EA6Ge,SA7Gf,EAAA,EAAA,GAAA,CA6G8B,oBA7G9B,GAAA,SAAA,CAAA,EAAA;;;;;AA6GtB;AAwDA;;;;AC1LA;AA2DA;;;AAAyE,cD+H5D,sBC/H4D,EAAA,CAAA,OAAA,EAAA,CD+HxB,oBC/HwB,GAAA,SAAA,CAAA,EAAA,EAAA,GAAA,MAAA,EAAA;;;AJLzE;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAuCtB,KGpJO,WAAA,GAAc,WHoJV,CGpJsB,mBHoJtB,CAAA,SAAA,CAAA,CAAA,CAAA,MAAA,CAAA;;;;;;;AAsBhB;AAOA;AAAgE;AA6DhE;AAYK,cG/LQ,iBHwOC,EAAA,CAAA,KAaH,EGrP8B,IHqPxB,EAAA,EAAA,KAAA,CAAA,EGrPwC,WHqPxC,EAAA,EAAA,GGrPwD,IHqPxD,EAAA;;;;;;;AAyEjB;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2DA;;;AAAyE,cCyP5D,YDzP4D,EAAA,CAAA,KAAA,ECyPrC,IDzPqC,EAAA,EAAA,OAAA,ECyPpB,mBDzPoB,EAAA,GCyPD,ODzPC,EAAA;;;;AJLzE;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2Da,cEHA,sBFyBZ,EAAA,CAAA,OAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;;;;;ACmOD;;;;
;;;cCca;AA1Qb;AA0QA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBA;AA6BA;;;;AC/wBA;AAcA;AAEA;AAwQA;;;;;cDuFa,gBAAgB;;;AEzW7B;AAaA;AAOA;AA2OA;;;;;;;;AC9QA;AA+EA;AAgEa,cHoRA,cGrQZ,EAAA,CAAA,KAfgE,EAAA,MAAA,EAAA,GAAA,OAAe;AAuBhF;AAiCA;;;;AC7MA;AAKY,KJ+aA,YAAA,GI/aiB;EAChB;;;;;EAMD,OAAA,EAAA,MAAA;EA6BP;AAinBL;;;;EAIc,YAAA,EAAA,MAAA,EAAA;EAGa;;;;AA+C3B;EACU,WAAA,EAAA,OAAA;CACsE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;cJvHnE,mHAIV;;;;;;;;;;;;;;;;;;;;cAyCU;;;;;;;;;;;;;;;;;;;;;;cAuBA,uCAAmC;;;;;;;;;;;;;cAqBnC;;;;;;;;;;;;;;;cAgBA;;;;;;;;;;;;;;;cA8BA;;;;KAWD,YAAA;;;;;;;;;;;;;;;;;;;;;;cAoBC,iDAAkD;;;;;;;;;;;;;;;;cA6BlD;;;ANluBA,KO7CD,wBAAA,GP6C8E;EA+F7E,IAAA,CAAA,EAAA,MAAA;;;;ECnIR,WAAA,CAAA,EAAA,MAAY;EA4BZ,wBAAe,CAAA,EAAA,OAAA;EA8Bf,yBAAqB,CAAA,EAAA,OAAA;EAiCrB,MAAA,CAAA,EAAA,aAAA,GAAsB,OAAA;EAwBtB,UAAA,CAAA,EAAA,CAAA,IAAA,EAAA,MAAmB,EAAA,MAAA,EAAA,MAAA,EAAA,GAAA,OAAA;EAenB,cAAW,CAAA,EMjIK,MNiIL,EAAA;EACV,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;CACA;AACA,KMhIM,uBAAA,GNgIN;EACA,IAAA,EAAA,MAAA;EACA,MAAA,EAAA,MAAA;CAAmB;AAiBZ,KMjJD,sBAAA,GNiJwG;EAOxG,OAAA,EAAA,MAAA;EAYP,KAAA,EAAA,MAAA;EAiDO,QAAA,EMlNE,uBNkNO,EAAA;AAA6B,CAAA;AAyHlD;;;AAAsD,cMtEzC,uBNsEyC,EAAA,CAAA,KAAA,EMrE3C,INqE2C,EAAA,EAAA,OAAA,CAAA,EMpEzC,wBNoEyC,EAAA,GMnEnD,sBNmEmD,EAAA;;;AAlOjD,KOtHO,wBAAA,GPsHY;EAenB,WAAA,CAAA,EAAW,MAAA;EACV,WAAA,CAAA,EAAA,MAAA;EACA,QAAA,CAAA,EAAA,MAAA;EACA,IAAA,CAAA,EAAA,MAAA;EACA,yBAAA,CAAA,EAAA,OAAA;EACA,YAAA,CAAA,EAAA,OAAA;EAAmB,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;EAiBZ,WAAA,CAAA,EAAA,MAAuG;EAOxG,YAAA,CAAA,EAAA,MAAc;EAYrB,iBAAa,CAAA,EAAA,MAAA;AAiDlB,CAAA;AAYK,KO9NO,wBAAA,GPuQE;EAoEF,IAAA,EAAA,MAAS;EAAG,OAAA,EAAA,MAAA;EAAc,MAAA,EAAA,MAAA;EAAgB,YAAA,EAAA,MAAA,EAAA;CAAe;AAkBzD,KOtVA,wBAAA,GPsVI;EAoCJ,OAAA,EAAA,MAAA;EA0EA,KAAA,EAAA,MAAU;EA8BL,QAAA,EO/dH,wBP+dS,EAAA;AAuBvB,CAAA;;;;;;AAoNA;cOlea,mCACF,kBACG,6BACX;;;;AR3NH;AA+FA;;;;;ACnIiB;AA4BG;AA+Df,KQ7GO,eAAA,GR6Ge;EAwBtB;EAeA,KAAA,EAAA,MAAA;EACC;EACA,KAAA,EAAA,MAAA;EACA;EACA,KAAA,EAAA,MAAA;EACA;EAAmB,QAAA,EAAA,MAAA;AAiBzB,CAAA;AAOA;AAAgE;AA6
DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;AAMc,cQlgBD,mBRkgBC,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GQlgBkC,eRkgBlC,EAAA;;;;;;AA+Jd;;;;ACzuBA;AAsGA;;;cOkCa,mDAAoD;AN9IjE;AAKA;AAcA;;;;AAIe,cM8IF,oBN9IE,EAAA,CAAA,QAAA,EM+ID,eN/IC,EAAA,EAAA,GAAA;EAAe,WAAA,EAAA,gBAAA,GAAA,iBAAA;EA0GjB,KAAA,EAAA,OAAA;EAwDA,QAAA,CAAA,EAAA,MAAA;;;;AC1Lb;AA2DA;;;AAAyE,cK4I5D,kBL5I4D,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA;EAAI,QAAA,EAAA,MAAA;;;;ECyPhE,QAAA,EItGC,eJgLb,EAAA;CA1EmC,GAAA,IAAA;;;AL9PvB,KU5DD,sBAAA,GV4D8E;EA+F7E,IAAA,EAAA,cAAA;;;;ECnIR,KAAA,CAAA,EAAA,OAAY,GAAA,YAAA;EA4BZ,QAAA,EAAA,MAAA,EAAe;AAAA,CAAA,GA8Bf;EAiCA,IAAA,EAAA,WAAA;EAwBA,SAAA,EAAA,CAAA,IAAA,ESxIwC,STwIrB,EAAA,KAAA,EAAA,MAAA,EAAA,GAAA,OAAA;AAAA,CAAA;AAgBlB,KStJM,iBAAA,GTsJN;EACA,OAAA,EStJO,mBTsJP;EACA,KAAA,EStJK,ITsJL,EAAA;EACA,QAAA,EStJQ,OTsJR,EAAA;EACA,QAAA,EStJQ,sBTsJR;CAAmB;AAiBZ,KSpKD,oBAAA,GToKwG;EAOxG,OAAA,EAAA;IAYP,IAAA,EAAA,YAAa,GAAA,wBAAA;IAiDN,SAAS,EAAA,MAAA;IAYhB,aAAA,EAAe,MAAA;IA6GR,SAAS,EAAA,MAAA;IAAG,UAAA,EAAA,MAAA;EAAc,CAAA;EAAgB,KAAA,CAAA,ESzV1C,KTyV0C,CAAA;IAAe,SAAA,EAAA,MAAA;IAkBzD,QAAI,EAAA,MAAA;IAoCJ,aAAc,EAAA,MAAA;IA0Ed,UAAU,EAAA,MAAA;EA8BL,CAAA,CAAA;EAuBL,OAAA,ESxgBC,KTwgBU,CAAA;IA+CX,IAAA,EAAA,MAAA;IAME,oBAAA,EAAA,MAAA;IASF,sBAAA,CAAA,EAAA,MAAA;IAiBY,qBAAA,CAAA,EAAA,MAAA;IA4DN,YAAA,EAAA,MAAA;IAwDL,MAAA,EAAA,WAAA,GAAA,oBAAA,GAAA,WAAA,GAAA,sBAAA,GAAA,qBAAA;IAAM,QAAA,EAAA,OAAA,GAAA,QAAA,GAAA,MAAA;IAiBP,EAAA,CAAA,EAAA,MAAO;;;;ECzuBP,QAAA,EAAA,MAAA,EAAc;AAsG1B,CAAA;KQ1EK,oBAAA;iBAinBW,qCAAA,QACL,kBACG,oBACD,+BACC,4BPlpBd;;EALY,gBAAA,CAAA,EO0pBe,oBP1pBI;AAK/B,CAAA,CAAA,EAAY;EAcA,MAAA,EOyoBC,oBPzoBmB;EACV,QAAA,EOwoBuB,OPxoBvB,EAAA;CACC;AACH,iBOmrBJ,6BAAA,CPnrBI,IAAA,EOorBV,iBPprBU,EAAA,EAAA,IA2GpB,CA3GoB,EAAA;EACL,IAAA,CAAA,EAAA,YAAA,GAAA,wBAAA;EAAe,gBAAA,CAAA,EOorBkD,oBPprBlD;AA0G9B,CAAA,CAAA,EAAa;EAwDA,MAAA,EOmhBA,oBP1eZ;YO0e4C"}
1
+ {"version":3,"file":"index.d.mts","names":[],"sources":["../src/segmentation/fuzzy.ts","../src/segmentation/types.ts","../src/segmentation/optimize-rules.ts","../src/segmentation/pattern-validator.ts","../src/segmentation/replace.ts","../src/segmentation/segmenter.ts","../src/segmentation/tokens.ts","../src/analysis/line-starts.ts","../src/analysis/repeating-sequences.ts","../src/detection.ts","../src/recovery.ts"],"sourcesContent":[],"mappings":";;AAkEA;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBY,cDnUC,WCmUG,EAAA,CAAA,CAAA,EAAA,MAAA,EAAA,GAAA,MAAA;AAoChB;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2DA;AAAyC,cJ0F5B,wBI1F4B,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;AJLzC;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;KArVK,YAAA,GAqViD;EAAe;EAkBzD,KAAA,EAAI,MAAA;AAoChB,CAAA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;;KF0BK,eAAA,GExBkB;EACH;EACL,QAAA,EAAA,MAAA;CAAe;AA0G9B;AAwDA;;;;AC1LA;AA2DA;;;;;;;;ACyPA;;;;;;;;AC5PA;AAwRA;AAsDA;AA2CA,KLrWK,qBAAA,GKwWJ;EAQW;EAuKC,cAAA,EAAA,MAAA,EAAA;AA6Cb,CAAA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBA;AA6BA;;;;AC7xBA;AAcA;AAEA;AAwQA;;;;;;;;AClRA;AAaA;AAOA;AA2OA;;;;KPjKK,sBAAA,GOoKsB;;;;ACjR3B;AA+EA;AAgEA;AAuBA;AAiCA;;;;AC7MA;AAKA;;;;;;AAOA;AA2BE;AAmnBF;;KT/gBK,mBAAA,GSihBS;EACD;EACC,YAAA,EAAA,MAAA,EAAA;CAGa;;;;AA+C3B;;;;;;;KTtjBK,WAAA,GACC,eACA,kBACA,wBACA,yBACA;;;;;;;;;;;;;;;;cAiBO;;;;;;KAOD,cAAA,WAAyB;;;;;;;KAYhC,aAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAiDO,SAAA;;;;;;;KAYP,eAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAyCS;;;;;;;;;;;;SAaH;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAuDC,SAAA,GAAY,cAAc,gBAAgB;;;;;;;;;;;;;KAkB1C,IAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAoCA,cAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAqCE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAqCF,UAAA,YAAsB;;;;;;;;;;;;;;;;;;;;;;;;;UA8BjB,MAAA;;;;;;;;;;;;;;;;;;;;;;KAuBL,WAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;KA+CA,mBAAA;;;;;;YAME;;;;;;;;UASF;;;;;;;;;;;;;;cAiBY;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;gBA4DN;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;WAwDL;;;;;;;;;;;;;;;;KAiBD,OAAA;;;;;;;;;;;;;;;;;;;;;;;;;;SA6BD;;;;AAxoBa;;;AAkBlB,KChJM,cAAA,GDgJN;EACA;EACA,KAAA,EChJK,SDgJL,EAAA;EAAmB;EAiBZ,WAAA,EAAA,MAAA;AAOb,CAAA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;cAAa,uBAAwB,gBAAc;;;AD3FlC;AA4BG;AA8BM;AAyDrB,KEpIO,mBAAA,GFoIY,gBAAA,GAAA,eAAA,GAAA,WAAA,GAAA,eAAA;AAAA;;;AAkBlB,KEjJM,eAAA,GFiJN;EACA,IAAA,EEjJI,mBFiJJ;EACA,OAAA,EAAA,MAAA;EAAmB,UAAA,CAAA,EAAA,MAAA;EAiBZ;EAOD,KAAA,CAAA,EAAA,MAAA;EAYP;EAiDO,OAAA,CAAA,EAAA,MAAS;AAA6B,CAAA;AAyHlD;;;;AAAqE,KEnVzD,oBAAA,GFmVyD;EAkBzD,cAAI,CAAA,EAAA,CEpWM,eFoWN,GAAA,SAAA,CAAA,EAAA;EAoCJ,eAAA,CAAA,EAAc,CEvYH,eF4aT,GAAA,SAAS,CAAA,EAAA;EAqCX,YAAA,CAAU,EAAA,CEhdF,eFgdc,GAAA,SAAc,CAAA,EAAA;EA8B/B,QAAA,CAAM,EE7eR,eF6eQ;AAuBvB,CAAA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;AACsB,cA6GT,aA7GS,EAAA,CAAA,KAAA,EA6Ge,SA7Gf,EAAA,EAAA,GAAA,CA6G8B,oBA7G9B,GAAA,SAAA,CAAA,EAAA;;;;;AA6GtB;AAwDA;;;;AC1LA;AA2DA;;;AAAyE,cD+H5D,sBC/H4D,EAAA,CAAA,OAAA,EAAA,CD+HxB,oBC/HwB,GAAA,SAAA,CAAA,EAAA,EAAA,GAAA,MAAA,EAAA;;;AJLzE;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAuCtB,KGpJO,WAAA,GAAc,WHoJV,CGpJsB,mBHoJtB,CAAA,SAAA,CAAA,CAAA,CAAA,MAAA,CAAA;;;;;;;AAsBhB;AAOA;AAAgE;AA6DhE;AAYK,cG/LQ,iBHwOC,EAAA,CAAA,KAaH,EGrP8B,IHqPxB,EAAA,EAAA,KAAA,CAAA,EGrPwC,WHqPxC,EAAA,EAAA,GGrPwD,IHqPxD,EAAA;;;;;;;AAyEjB;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2DA;;;AAAyE,cCyP5D,YDzP4D,EAAA,CAAA,KAAA,ECyPrC,IDzPqC,EAAA,EAAA,OAAA,ECyPpB,mBDzPoB,EAAA,GCyPD,ODzPC,EAAA;;;;AJLzE;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAqKA;;;;ACzuBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2Da,cEHA,sBFyBZ,EAAA,CAAA,OAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;;;;;ACmOD;;;;
;;;cC4Ba;AAxRb;AAwRA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBA;AA6BA;;;;AC7xBA;AAcA;AAEA;AAwQA;;;;;cDqGa,gBAAgB;;;AEvX7B;AAaA;AAOA;AA2OA;;;;;;;;AC9QA;AA+EA;AAgEa,cHkSA,cGnRZ,EAAA,CAAA,KAfgE,EAAA,MAAA,EAAA,GAAA,OAAe;AAuBhF;AAiCA;;;;AC7MA;AAKY,KJ6bA,YAAA,GI7biB;EAChB;;;;;EAMD,OAAA,EAAA,MAAA;EA6BP;AAinBL;;;;EAIc,YAAA,EAAA,MAAA,EAAA;EAGa;;;;AA+C3B;EACU,WAAA,EAAA,OAAA;CACsE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;cJzGnE,mHAIV;;;;;;;;;;;;;;;;;;;;cAyCU;;;;;;;;;;;;;;;;;;;;;;cAuBA,uCAAmC;;;;;;;;;;;;;cAqBnC;;;;;;;;;;;;;;;cAgBA;;;;;;;;;;;;;;;cA8BA;;;;KAWD,YAAA;;;;;;;;;;;;;;;;;;;;;;cAoBC,iDAAkD;;;;;;;;;;;;;;;;cA6BlD;;;ANhvBA,KO7CD,wBAAA,GP6C8E;EA+F7E,IAAA,CAAA,EAAA,MAAA;;;;ECnIR,WAAA,CAAA,EAAA,MAAY;EA4BZ,wBAAe,CAAA,EAAA,OAAA;EA8Bf,yBAAqB,CAAA,EAAA,OAAA;EAiCrB,MAAA,CAAA,EAAA,aAAA,GAAsB,OAAA;EAwBtB,UAAA,CAAA,EAAA,CAAA,IAAA,EAAA,MAAmB,EAAA,MAAA,EAAA,MAAA,EAAA,GAAA,OAAA;EAenB,cAAW,CAAA,EMjIK,MNiIL,EAAA;EACV,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;CACA;AACA,KMhIM,uBAAA,GNgIN;EACA,IAAA,EAAA,MAAA;EACA,MAAA,EAAA,MAAA;CAAmB;AAiBZ,KMjJD,sBAAA,GNiJwG;EAOxG,OAAA,EAAA,MAAA;EAYP,KAAA,EAAA,MAAA;EAiDO,QAAA,EMlNE,uBNkNO,EAAA;AAA6B,CAAA;AAyHlD;;;AAAsD,cMtEzC,uBNsEyC,EAAA,CAAA,KAAA,EMrE3C,INqE2C,EAAA,EAAA,OAAA,CAAA,EMpEzC,wBNoEyC,EAAA,GMnEnD,sBNmEmD,EAAA;;;AAlOjD,KOtHO,wBAAA,GPsHY;EAenB,WAAA,CAAA,EAAW,MAAA;EACV,WAAA,CAAA,EAAA,MAAA;EACA,QAAA,CAAA,EAAA,MAAA;EACA,IAAA,CAAA,EAAA,MAAA;EACA,yBAAA,CAAA,EAAA,OAAA;EACA,YAAA,CAAA,EAAA,OAAA;EAAmB,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;EAiBZ,WAAA,CAAA,EAAA,MAAuG;EAOxG,YAAA,CAAA,EAAA,MAAc;EAYrB,iBAAa,CAAA,EAAA,MAAA;AAiDlB,CAAA;AAYK,KO9NO,wBAAA,GPuQE;EAoEF,IAAA,EAAA,MAAS;EAAG,OAAA,EAAA,MAAA;EAAc,MAAA,EAAA,MAAA;EAAgB,YAAA,EAAA,MAAA,EAAA;CAAe;AAkBzD,KOtVA,wBAAA,GPsVI;EAoCJ,OAAA,EAAA,MAAA;EA0EA,KAAA,EAAA,MAAU;EA8BL,QAAA,EO/dH,wBP+dS,EAAA;AAuBvB,CAAA;;;;;;AAoNA;cOlea,mCACF,kBACG,6BACX;;;;AR3NH;AA+FA;;;;;ACnIiB;AA4BG;AA+Df,KQ7GO,eAAA,GR6Ge;EAwBtB;EAeA,KAAA,EAAA,MAAA;EACC;EACA,KAAA,EAAA,MAAA;EACA;EACA,KAAA,EAAA,MAAA;EACA;EAAmB,QAAA,EAAA,MAAA;AAiBzB,CAAA;AAOA;AAAgE;AA
6DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;AAMc,cQlgBD,mBRkgBC,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GQlgBkC,eRkgBlC,EAAA;;;;;;AA+Jd;;;;ACzuBA;AAsGA;;;cOkCa,mDAAoD;AN9IjE;AAKA;AAcA;;;;AAIe,cM8IF,oBN9IE,EAAA,CAAA,QAAA,EM+ID,eN/IC,EAAA,EAAA,GAAA;EAAe,WAAA,EAAA,gBAAA,GAAA,iBAAA;EA0GjB,KAAA,EAAA,OAAA;EAwDA,QAAA,CAAA,EAAA,MAAA;;;;AC1Lb;AA2DA;;;AAAyE,cK4I5D,kBL5I4D,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA;EAAI,QAAA,EAAA,MAAA;;;;ECyPhE,QAAA,EItGC,eJgLb,EAAA;CA1EmC,GAAA,IAAA;;;AL9PvB,KU5DD,sBAAA,GV4D8E;EA+F7E,IAAA,EAAA,cAAA;;;;ECnIR,KAAA,CAAA,EAAA,OAAY,GAAA,YAAA;EA4BZ,QAAA,EAAA,MAAA,EAAe;AAAA,CAAA,GA8Bf;EAiCA,IAAA,EAAA,WAAA;EAwBA,SAAA,EAAA,CAAA,IAAA,ESxIwC,STwIrB,EAAA,KAAA,EAAA,MAAA,EAAA,GAAA,OAAA;AAAA,CAAA;AAgBlB,KStJM,iBAAA,GTsJN;EACA,OAAA,EStJO,mBTsJP;EACA,KAAA,EStJK,ITsJL,EAAA;EACA,QAAA,EStJQ,OTsJR,EAAA;EACA,QAAA,EStJQ,sBTsJR;CAAmB;AAiBZ,KSpKD,oBAAA,GToKwG;EAOxG,OAAA,EAAA;IAYP,IAAA,EAAA,YAAa,GAAA,wBAAA;IAiDN,SAAS,EAAA,MAAA;IAYhB,aAAA,EAAe,MAAA;IA6GR,SAAS,EAAA,MAAA;IAAG,UAAA,EAAA,MAAA;EAAc,CAAA;EAAgB,KAAA,CAAA,ESzV1C,KTyV0C,CAAA;IAAe,SAAA,EAAA,MAAA;IAkBzD,QAAI,EAAA,MAAA;IAoCJ,aAAc,EAAA,MAAA;IA0Ed,UAAU,EAAA,MAAA;EA8BL,CAAA,CAAA;EAuBL,OAAA,ESxgBC,KTwgBU,CAAA;IA+CX,IAAA,EAAA,MAAA;IAME,oBAAA,EAAA,MAAA;IASF,sBAAA,CAAA,EAAA,MAAA;IAiBY,qBAAA,CAAA,EAAA,MAAA;IA4DN,YAAA,EAAA,MAAA;IAwDL,MAAA,EAAA,WAAA,GAAA,oBAAA,GAAA,WAAA,GAAA,sBAAA,GAAA,qBAAA;IAAM,QAAA,EAAA,OAAA,GAAA,QAAA,GAAA,MAAA;IAiBP,EAAA,CAAA,EAAA,MAAO;;;;ECzuBP,QAAA,EAAA,MAAA,EAAc;AAsG1B,CAAA;KQ1EK,oBAAA;iBAinBW,qCAAA,QACL,kBACG,oBACD,+BACC,4BPlpBd;;EALY,gBAAA,CAAA,EO0pBe,oBP1pBI;AAK/B,CAAA,CAAA,EAAY;EAcA,MAAA,EOyoBC,oBPzoBmB;EACV,QAAA,EOwoBuB,OPxoBvB,EAAA;CACC;AACH,iBOmrBJ,6BAAA,CPnrBI,IAAA,EOorBV,iBPprBU,EAAA,EAAA,IA2GpB,CA3GoB,EAAA;EACL,IAAA,CAAA,EAAA,YAAA,GAAA,wBAAA;EAAe,gBAAA,CAAA,EOorBkD,oBPprBlD;AA0G9B,CAAA,CAAA,EAAa;EAwDA,MAAA,EOmhBA,oBP1eZ;YO0e4C"}
package/dist/index.mjs CHANGED
@@ -423,6 +423,8 @@ const BASE_TOKENS = {
423
423
  "وحدثني",
424
424
  "وحدثنيه"
425
425
  ].join("|"),
426
+ num: "\\d",
427
+ nums: "\\d+",
426
428
  raqm: "[\\u0660-\\u0669]",
427
429
  raqms: "[\\u0660-\\u0669]+",
428
430
  rumuz: RUMUZ_BLOCK,
@@ -1043,8 +1045,43 @@ const applyReplacements = (pages, rules) => {
1043
1045
  });
1044
1046
  };
1045
1047
 
1048
+ //#endregion
1049
+ //#region src/segmentation/breakpoint-constants.ts
1050
+ /**
1051
+ * Shared constants for segmentation breakpoint processing.
1052
+ */
1053
+ /**
1054
+ * Threshold for using offset-based fast path in boundary processing.
1055
+ *
1056
+ * Below this: accurate string-search (handles offset drift from structural rules).
1057
+ * At or above this: O(n) arithmetic (performance critical for large books).
1058
+ *
1059
+ * The value of 1000 is chosen based on typical Arabic book sizes:
1060
+ * - Sahih al-Bukhari: ~1000-3000 pages
1061
+ * - Standard hadith collections: 1000-7000 pages
1062
+ * - Large aggregated corpora: 10k-50k pages
1063
+ *
1064
+ * For segments ≥1000 pages, the performance gain from offset-based slicing
1065
+ * outweighs the minor accuracy loss from potential offset drift.
1066
+ *
1067
+ * @remarks
1068
+ * Fast path is skipped when:
1069
+ * - `maxContentLength` is set (requires character-accurate splitting)
1070
+ * - `debugMetaKey` is set (requires proper provenance tracking)
1071
+ * - Content was structurally modified by marker stripping (offsets may drift)
1072
+ */
1073
+ const FAST_PATH_THRESHOLD = 1e3;
1074
+
1046
1075
  //#endregion
1047
1076
  //#region src/segmentation/breakpoint-utils.ts
1077
+ /**
1078
+ * Utility functions for breakpoint processing in the segmentation engine.
1079
+ *
1080
+ * These functions handle breakpoint normalization, page exclusion checking,
1081
+ * and segment creation. Extracted for independent testing and reuse.
1082
+ *
1083
+ * @module breakpoint-utils
1084
+ */
1048
1085
  const WINDOW_PREFIX_LENGTHS = [
1049
1086
  80,
1050
1087
  60,
@@ -1357,6 +1394,31 @@ const findPageStartNearExpectedBoundary = (remainingContent, _currentFromIdx, ta
1357
1394
  */
1358
1395
  const buildBoundaryPositions = (segmentContent, fromIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, logger) => {
1359
1396
  const boundaryPositions = [0];
1397
+ const pageCount = toIdx - fromIdx + 1;
1398
+ if (pageCount >= FAST_PATH_THRESHOLD) {
1399
+ logger?.debug?.("[breakpoints] Using fast-path for large segment in buildBoundaryPositions", {
1400
+ fromIdx,
1401
+ pageCount,
1402
+ toIdx
1403
+ });
1404
+ const baseOffset = cumulativeOffsets[fromIdx] ?? 0;
1405
+ for (let i = fromIdx + 1; i <= toIdx; i++) {
1406
+ const offset = cumulativeOffsets[i];
1407
+ if (offset !== void 0) {
1408
+ const boundary = Math.max(0, offset - baseOffset);
1409
+ const prevBoundary = boundaryPositions[boundaryPositions.length - 1];
1410
+ boundaryPositions.push(Math.max(prevBoundary + 1, Math.min(boundary, segmentContent.length)));
1411
+ }
1412
+ }
1413
+ boundaryPositions.push(segmentContent.length);
1414
+ return boundaryPositions;
1415
+ }
1416
+ logger?.debug?.("[breakpoints] buildBoundaryPositions: Using accurate string-search path", {
1417
+ contentLength: segmentContent.length,
1418
+ fromIdx,
1419
+ pageCount,
1420
+ toIdx
1421
+ });
1360
1422
  const startOffsetInFromPage = estimateStartOffsetInCurrentPage(segmentContent, fromIdx, pageIds, normalizedPages);
1361
1423
  for (let i = fromIdx + 1; i <= toIdx; i++) {
1362
1424
  const expectedBoundary = cumulativeOffsets[i] !== void 0 && cumulativeOffsets[fromIdx] !== void 0 ? Math.max(0, cumulativeOffsets[i] - cumulativeOffsets[fromIdx] - startOffsetInFromPage) : segmentContent.length;
@@ -1369,6 +1431,7 @@ const buildBoundaryPositions = (segmentContent, fromIdx, toIdx, pageIds, normali
1369
1431
  }
1370
1432
  }
1371
1433
  boundaryPositions.push(segmentContent.length);
1434
+ logger?.debug?.("[breakpoints] buildBoundaryPositions: Complete", { boundaryCount: boundaryPositions.length });
1372
1435
  return boundaryPositions;
1373
1436
  };
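The fast-path branch added to `buildBoundaryPositions` derives every boundary arithmetically from the cumulative offsets. A standalone sketch of that arithmetic, assuming the same clamping rules as the added lines, might look like this; `fastBoundaryPositions` is a hypothetical name and takes the content length directly instead of the content.

```javascript
// Hypothetical standalone version of the fast-path arithmetic shown in the hunk above.
const fastBoundaryPositions = (contentLength, fromIdx, toIdx, cumulativeOffsets) => {
  const positions = [0];
  const baseOffset = cumulativeOffsets[fromIdx] ?? 0;
  for (let i = fromIdx + 1; i <= toIdx; i++) {
    const offset = cumulativeOffsets[i];
    if (offset === undefined) continue; // no offset recorded for this page
    const boundary = Math.max(0, offset - baseOffset);
    const prev = positions[positions.length - 1];
    // Clamp to the content length, then force strictly increasing positions.
    positions.push(Math.max(prev + 1, Math.min(boundary, contentLength)));
  }
  positions.push(contentLength);
  return positions;
};

// Three pages of lengths 10, 15, 15 starting at offsets 0, 10, 25.
const boundaries = fastBoundaryPositions(40, 0, 2, [0, 10, 25, 40]);
```

For this input `boundaries` is `[0, 10, 25, 40]`. In this sketch, when the offsets overshoot the content (for example length 5 with offsets `[0, 10, 20]`), the `prev + 1` clamp yields `[0, 5, 6, 5]`, which suggests why alignment is worth validating before relying on offset arithmetic.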
1374
1437
  /**
@@ -1719,6 +1782,127 @@ const skipWhitespace$1 = (content, startPos) => {
1719
1782
  return pos;
1720
1783
  };
1721
1784
  /**
1785
+ * Validates that cumulative offsets match actual content length within a tolerance.
1786
+ * Required to detect if structural rules (like `lineStartsAfter`) have stripped content
1787
+ * which would make offset-based calculations inaccurate.
1788
+ */
1789
+ const checkFastPathAlignment = (cumulativeOffsets, fullContent, fromIdx, toIdx, pageCount, logger) => {
1790
+ const expectedLength = (cumulativeOffsets[toIdx + 1] ?? fullContent.length) - (cumulativeOffsets[fromIdx] ?? 0);
1791
+ const actualLength = fullContent.length;
1792
+ const driftTolerance = Math.max(100, actualLength * .01);
1793
+ const isAligned = Math.abs(expectedLength - actualLength) <= driftTolerance;
1794
+ if (!isAligned && pageCount >= FAST_PATH_THRESHOLD) logger?.warn?.("[breakpoints] Offset drift detected in fast-path candidate, falling back to slow path", {
1795
+ actualLength,
1796
+ drift: Math.abs(expectedLength - actualLength),
1797
+ expectedLength,
1798
+ pageCount
1799
+ });
1800
+ return isAligned;
1801
+ };
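The alignment check above reduces to two small calculations: the length the offsets predict, and a tolerance of the larger of 100 characters or 1% of the actual length. The sketch below isolates them; `expectedSpanLength` and `isWithinDriftTolerance` are hypothetical names introduced for illustration.

```javascript
// Length predicted by cumulative offsets for pages fromIdx..toIdx (inclusive).
const expectedSpanLength = (cumulativeOffsets, fromIdx, toIdx, fallbackEnd) =>
  (cumulativeOffsets[toIdx + 1] ?? fallbackEnd) - (cumulativeOffsets[fromIdx] ?? 0);

// Same tolerance rule as checkFastPathAlignment: max(100, 1% of actual length).
const isWithinDriftTolerance = (expectedLength, actualLength) => {
  const tolerance = Math.max(100, actualLength * 0.01);
  return Math.abs(expectedLength - actualLength) <= tolerance;
};

// A 50,000-character segment tolerates 500 characters of drift.
const smallDrift = isWithinDriftTolerance(50400, 50000); // drift 400
const largeDrift = isWithinDriftTolerance(50600, 50000); // drift 600
```

`smallDrift` is `true` and `largeDrift` is `false`; for short segments the fixed floor of 100 characters dominates the 1% rule.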
1802
+ /**
1803
+ * Handles the special optimized case for maxPages=0 (1 page per segment).
1804
+ * This is O(n) and safer than offset arithmetic as it uses source pages directly.
1805
+ */
1806
+ const processTrivialFastPath = (fromIdx, toIdx, pageIds, normalizedPages, pageCount, originalMeta, debugMetaKey, logger) => {
1807
+ logger?.debug?.("[breakpoints] Using trivial per-page fast-path (maxPages=0)", {
1808
+ fromIdx,
1809
+ pageCount,
1810
+ toIdx
1811
+ });
1812
+ const result = [];
1813
+ for (let i = fromIdx; i <= toIdx; i++) {
1814
+ const pageData = normalizedPages.get(pageIds[i]);
1815
+ if (pageData?.content.trim()) {
1816
+ const meta = getSegmentMetaWithDebug(i === fromIdx, debugMetaKey, originalMeta, null);
1817
+ const seg = createSegment(pageData.content.trim(), pageIds[i], void 0, meta);
1818
+ if (seg) result.push(seg);
1819
+ }
1820
+ }
1821
+ return result;
1822
+ };
1823
+ /**
1824
+ * Handles fast-path segmentation for maxPages > 0 using cumulative offsets.
1825
+ * Avoids O(n²) string searching but requires accurate offsets.
1826
+ */
1827
+ const processOffsetFastPath = (fullContent, fromIdx, toIdx, pageIds, cumulativeOffsets, maxPages, originalMeta, debugMetaKey, logger) => {
1828
+ const result = [];
1829
+ const effectiveMaxPages = maxPages + 1;
1830
+ const pageCount = toIdx - fromIdx + 1;
1831
+ logger?.debug?.("[breakpoints] Using offset-based fast-path for large segment", {
1832
+ effectiveMaxPages,
1833
+ fromIdx,
1834
+ maxPages,
1835
+ pageCount,
1836
+ toIdx
1837
+ });
1838
+ const baseOffset = cumulativeOffsets[fromIdx] ?? 0;
1839
+ for (let segStart = fromIdx; segStart <= toIdx; segStart += effectiveMaxPages) {
1840
+ const segEnd = Math.min(segStart + effectiveMaxPages - 1, toIdx);
1841
+ const startOffset = Math.max(0, (cumulativeOffsets[segStart] ?? 0) - baseOffset);
1842
+ const endOffset = segEnd < toIdx ? Math.max(0, (cumulativeOffsets[segEnd + 1] ?? fullContent.length) - baseOffset) : fullContent.length;
1843
+ const rawContent = fullContent.slice(startOffset, endOffset).trim();
1844
+ if (rawContent) {
1845
+ const meta = getSegmentMetaWithDebug(segStart === fromIdx, debugMetaKey, originalMeta, null);
1846
+ const seg = {
1847
+ content: rawContent,
1848
+ from: pageIds[segStart]
1849
+ };
1850
+ if (segEnd > segStart) seg.to = pageIds[segEnd];
1851
+ if (meta) seg.meta = meta;
1852
+ result.push(seg);
1853
+ }
1854
+ }
1855
+ return result;
1856
+ };
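The loop in `processOffsetFastPath` groups pages in steps of `maxPages + 1` and slices the content at the corresponding cumulative offsets. The following simplified sketch shows that grouping under a few assumptions: it returns page indices rather than page IDs, omits metadata, and keeps empty pieces instead of dropping them; both function names are hypothetical.

```javascript
// Group page indices into ranges of at most maxPages + 1 pages
// (maxPages = 0 therefore means one page per segment).
const groupPageRanges = (fromIdx, toIdx, maxPages) => {
  const step = maxPages + 1;
  const ranges = [];
  for (let start = fromIdx; start <= toIdx; start += step) {
    ranges.push([start, Math.min(start + step - 1, toIdx)]);
  }
  return ranges;
};

// Slice the content for each range using offsets relative to the first page.
const sliceByOffsets = (content, cumulativeOffsets, fromIdx, toIdx, maxPages) => {
  const base = cumulativeOffsets[fromIdx] ?? 0;
  return groupPageRanges(fromIdx, toIdx, maxPages).map(([start, end]) => {
    const startOffset = Math.max(0, (cumulativeOffsets[start] ?? 0) - base);
    const endOffset = end < toIdx
      ? Math.max(0, (cumulativeOffsets[end + 1] ?? content.length) - base)
      : content.length; // the last range always runs to the end of the content
    return { from: start, to: end, content: content.slice(startOffset, endOffset).trim() };
  });
};

// Four pages "aa ", "bb ", "cc ", "dd" joined into one segment, maxPages = 1.
const pieces = sliceByOffsets("aa bb cc dd", [0, 3, 6, 9], 0, 3, 1);
```

Each piece costs one `slice` call, so the total work grows linearly with the number of pages rather than with repeated string searches.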
1857
+ /**
1858
+ * Checks if the remaining content fits within paged/length limits.
1859
+ * If so, pushes the final segment and returns true.
1860
+ */
1861
+ const handleOversizedSegmentFit = (remainingContent, currentFromIdx, toIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint, result) => {
1862
+ const remainingSpan = computeRemainingSpan(currentFromIdx, toIdx, pageIds);
1863
+ const remainingHasExclusions = hasAnyExclusionsInRange(expandedBreakpoints, pageIds, currentFromIdx, toIdx);
1864
+ const fitsInPages = remainingSpan <= maxPages;
1865
+ const fitsInLength = !maxContentLength || remainingContent.length <= maxContentLength;
1866
+ if (fitsInPages && fitsInLength && !remainingHasExclusions) {
1867
+ const includeMeta = isFirstPiece || Boolean(debugMetaKey);
1868
+ const finalSeg = createFinalSegment(remainingContent, currentFromIdx, toIdx, pageIds, getSegmentMetaWithDebug(isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint), includeMeta);
1869
+ if (finalSeg) result.push(finalSeg);
1870
+ return true;
1871
+ }
1872
+ return false;
1873
+ };
1874
+ /**
1875
+ * Builds metadata for a segment piece, optionally including debug info.
1876
+ */
1877
+ const getSegmentMetaWithDebug = (isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint) => {
1878
+ if (!(isFirstPiece || Boolean(debugMetaKey))) return;
1879
+ if (debugMetaKey && lastBreakpoint) return mergeDebugIntoMeta(isFirstPiece ? originalMeta : void 0, debugMetaKey, buildBreakpointDebugPatch(lastBreakpoint.breakpointIndex, lastBreakpoint.rule));
1880
+ return isFirstPiece ? originalMeta : void 0;
1881
+ };
1882
+ /**
1883
+ * Calculates window end position, capped by maxContentLength if present.
1884
+ */
1885
+ const getWindowEndPosition = (remainingContent, currentFromIdx, windowEndIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, maxContentLength, logger) => {
1886
+ let windowEndPosition = findBreakpointWindowEndPosition(remainingContent, currentFromIdx, windowEndIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, logger);
1887
+ if (maxContentLength && maxContentLength < windowEndPosition) windowEndPosition = maxContentLength;
1888
+ return windowEndPosition;
1889
+ };
1890
+ /**
1891
+ * Advances cursorPos and currentFromIdx for the next iteration.
1892
+ */
1893
+ const advanceCursorAndIndex = (fullContent, breakPos, actualEndIdx, toIdx, pageIds, normalizedPages) => {
1894
+ const nextCursorPos = skipWhitespace$1(fullContent, breakPos);
1895
+ return {
1896
+ currentFromIdx: computeNextFromIdx(fullContent.slice(nextCursorPos, nextCursorPos + 500), actualEndIdx, toIdx, pageIds, normalizedPages),
1897
+ cursorPos: nextCursorPos
1898
+ };
1899
+ };
1900
+ /**
1901
+ * Applies breakpoints to oversized segments.
1902
+ *
1903
+ * Note: This is an internal engine used by `segmentPages()`.
1904
+ */
1905
+ /**
1722
1906
  * Processes an oversized segment by iterating through the content and
1723
1907
  * breaking it into smaller pieces that fit within maxPages constraints.
1724
1908
  *
@@ -1727,6 +1911,20 @@ const skipWhitespace$1 = (content, startPos) => {
  const processOversizedSegment = (segment, fromIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, expandedBreakpoints, maxPages, prefer, logger, debugMetaKey, maxContentLength) => {
  const result = [];
  const fullContent = segment.content;
+ const pageCount = toIdx - fromIdx + 1;
+ const isAligned = checkFastPathAlignment(cumulativeOffsets, fullContent, fromIdx, toIdx, pageCount, logger);
+ if (pageCount >= FAST_PATH_THRESHOLD && isAligned && !maxContentLength && !debugMetaKey) {
+ if (maxPages === 0) return processTrivialFastPath(fromIdx, toIdx, pageIds, normalizedPages, pageCount, segment.meta, debugMetaKey, logger);
+ return processOffsetFastPath(fullContent, fromIdx, toIdx, pageIds, cumulativeOffsets, maxPages, segment.meta, debugMetaKey, logger);
+ }
+ logger?.debug?.("[breakpoints] processOversizedSegment: Using iterative path", {
+ contentLength: fullContent.length,
+ fromIdx,
+ maxContentLength,
+ maxPages,
+ pageCount,
+ toIdx
+ });
  let cursorPos = 0;
  let currentFromIdx = fromIdx;
  let isFirstPiece = true;
@@ -1742,12 +1940,13 @@ const processOversizedSegment = (segment, fromIdx, toIdx, pageIds, normalizedPag
  const MAX_SAFE_ITERATIONS = 1e5;
  while (cursorPos < fullContent.length && currentFromIdx <= toIdx && i < MAX_SAFE_ITERATIONS) {
  i++;
- const remainingContent = fullContent.slice(cursorPos);
+ const safeSliceLen = maxContentLength ? maxContentLength + 4e3 : void 0;
+ const remainingContent = safeSliceLen ? fullContent.slice(cursorPos, cursorPos + safeSliceLen) : fullContent.slice(cursorPos);
  if (!remainingContent.trim()) break;
  if (handleOversizedSegmentFit(remainingContent, currentFromIdx, toIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, segment.meta, lastBreakpoint, result)) break;
  const windowEndIdx = computeWindowEndIdx(currentFromIdx, toIdx, pageIds, maxPages);
  const windowEndPosition = getWindowEndPosition(remainingContent, currentFromIdx, windowEndIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, maxContentLength, logger);
- logger?.debug?.(`[breakpoints] iteration=${i}`, {
+ logger?.trace?.(`[breakpoints] iteration=${i}`, {
  currentFromIdx,
  cursorPos,
  windowEndIdx,
@@ -1784,57 +1983,12 @@ const processOversizedSegment = (segment, fromIdx, toIdx, pageIds, normalizedPag
  fullContentLength: fullContent.length,
  iterations: i
  });
- logger?.debug?.("[breakpoints] done", { resultCount: result.length });
+ logger?.debug?.("[breakpoints] processOversizedSegment: Complete", {
+ iterations: i,
+ resultCount: result.length
+ });
  return result;
  };
- /**
- * Checks if the remaining content fits within paged/length limits.
- * If so, pushes the final segment and returns true.
- */
- const handleOversizedSegmentFit = (remainingContent, currentFromIdx, toIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint, result) => {
- const remainingSpan = computeRemainingSpan(currentFromIdx, toIdx, pageIds);
- const remainingHasExclusions = hasAnyExclusionsInRange(expandedBreakpoints, pageIds, currentFromIdx, toIdx);
- const fitsInPages = remainingSpan <= maxPages;
- const fitsInLength = !maxContentLength || remainingContent.length <= maxContentLength;
- if (fitsInPages && fitsInLength && !remainingHasExclusions) {
- const includeMeta = isFirstPiece || Boolean(debugMetaKey);
- const finalSeg = createFinalSegment(remainingContent, currentFromIdx, toIdx, pageIds, getSegmentMetaWithDebug(isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint), includeMeta);
- if (finalSeg) result.push(finalSeg);
- return true;
- }
- return false;
- };
- /**
- * Builds metadata for a segment piece, optionally including debug info.
- */
- const getSegmentMetaWithDebug = (isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint) => {
- if (!(isFirstPiece || Boolean(debugMetaKey))) return;
- if (debugMetaKey && lastBreakpoint) return mergeDebugIntoMeta(isFirstPiece ? originalMeta : void 0, debugMetaKey, buildBreakpointDebugPatch(lastBreakpoint.breakpointIndex, lastBreakpoint.rule));
- return isFirstPiece ? originalMeta : void 0;
- };
- /**
- * Calculates window end position, capped by maxContentLength if present.
- */
- const getWindowEndPosition = (remainingContent, currentFromIdx, windowEndIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, maxContentLength, logger) => {
- let windowEndPosition = findBreakpointWindowEndPosition(remainingContent, currentFromIdx, windowEndIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, logger);
- if (maxContentLength && maxContentLength < windowEndPosition) windowEndPosition = maxContentLength;
- return windowEndPosition;
- };
- /**
- * Advances cursorPos and currentFromIdx for the next iteration.
- */
- const advanceCursorAndIndex = (fullContent, breakPos, actualEndIdx, toIdx, pageIds, normalizedPages) => {
- const nextCursorPos = skipWhitespace$1(fullContent, breakPos);
- return {
- currentFromIdx: computeNextFromIdx(fullContent.slice(nextCursorPos), actualEndIdx, toIdx, pageIds, normalizedPages),
- cursorPos: nextCursorPos
- };
- };
- /**
- * Applies breakpoints to oversized segments.
- *
- * Note: This is an internal engine used by `segmentPages()`.
- */
  const applyBreakpoints = (segments, pages, normalizedContent, maxPages, breakpoints, prefer, patternProcessor, logger, pageJoiner = "space", debugMetaKey, maxContentLength) => {
  const pageIds = pages.map((p) => p.id);
  const pageIdToIndex = buildPageIdToIndexMap(pageIds);
@@ -1865,6 +2019,15 @@ const applyBreakpoints = (segments, pages, normalizedContent, maxPages, breakpoi
  result.push(segment);
  continue;
  }
+ logger?.debug?.("[breakpoints] Processing oversized segment", {
+ contentLength: segment.content.length,
+ from: segment.from,
+ hasExclusions,
+ pageSpan: toIdx - fromIdx + 1,
+ reasonFitsInLength: fitsInLength,
+ reasonFitsInPages: fitsInPages,
+ to: segment.to
+ });
  const broken = processOversizedSegment(segment, fromIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, expandedBreakpoints, maxPages, prefer, logger, debugMetaKey, maxContentLength);
  result.push(...broken.map((s) => {
  const segFromIdx = pageIdToIndex.get(s.from) ?? -1;