flappa-doormal 2.7.0 → 2.8.0

package/AGENTS.md CHANGED
@@ -28,6 +28,8 @@ src/
28
28
  ├── index.ts # Main entry point and exports
29
29
  ├── pattern-detection.ts # Token detection for auto-generating rules (NEW)
30
30
  ├── pattern-detection.test.ts # Pattern detection tests (22 tests)
31
+ ├── recovery.ts # Marker recovery utility (recover mistaken lineStartsAfter)
32
+ ├── recovery.test.ts # Marker recovery tests
31
33
  └── segmentation/
32
34
  ├── types.ts # TypeScript type definitions for rules/segments
33
35
  ├── segmenter.ts # Core segmentation engine (segmentPages)
@@ -56,6 +58,11 @@ src/
56
58
  - Takes array of `{id, content}` pages and split rules
57
59
  - Returns array of `{content, from, to?, meta?}` segments
58
60
 
61
+ 1. **`recoverMistakenLineStartsAfterMarkers(pages, segments, options, selector)`** - Recovery helper
62
+ - Use when a client mistakenly used `lineStartsAfter` where `lineStartsWith` was intended
63
+ - Deterministic mode reruns segmentation with selected rules converted to `lineStartsWith` and merges recovered `content` back into the provided segments
64
+ - Optional `mode: 'best_effort_then_rerun'` attempts a conservative anchor-based recovery first, then falls back to rerun for unresolved segments
65
+
59
66
  2. **`tokens.ts`** - Template system
60
67
  - `TOKEN_PATTERNS` - Map of token names to regex patterns
61
68
  - `expandTokensWithCaptures()` - Expands `{{token:name}}` syntax
@@ -362,6 +369,39 @@ bunx biome lint .
362
369
 
363
370
  11. **Boundary-position algorithm improves page attribution**: Building a position map of page boundaries once per segment (O(n)) enables binary search for O(log n) lookups per piece. Key insight: when a segment starts mid-page (common after structural rules), expected boundary estimates must account for the offset into the starting page. Without this adjustment, position-based lookups can return the wrong page when pages have identical content prefixes.
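The boundary-position idea in item 11 can be sketched as follows. This is a minimal illustration, not the library's implementation: `PageLike`, `buildBoundaryPositions`, and `pageIndexAt` are hypothetical names, and the sketch assumes pages are joined with a single separator character.

```ts
// Hypothetical names throughout; a sketch of the technique only.
type PageLike = { id: number; content: string };

// Record where each page starts, in segment-relative coordinates.
// If the segment begins mid-page, the first page starts at a negative position.
const buildBoundaryPositions = (pages: PageLike[], offsetIntoFirstPage = 0): number[] => {
    const starts: number[] = [];
    let position = -offsetIntoFirstPage;
    for (const page of pages) {
        starts.push(position);
        position += page.content.length + 1; // +1 for the assumed one-character separator
    }
    return starts;
};

// Binary search for the last page whose start is <= position.
const pageIndexAt = (starts: number[], position: number): number => {
    let lo = 0;
    let hi = starts.length - 1;
    while (lo < hi) {
        const mid = (lo + hi + 1) >> 1;
        const start = starts[mid] ?? Number.POSITIVE_INFINITY;
        if (start <= position) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
};
```

With two pages that share the same five-character content, a lookup at segment position 4 resolves to the first page when the segment starts at the page start, and to the second page when the segment starts three characters into the first page. Ignoring that offset is the failure mode item 11 warns about.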
364
371
 
372
+ ### For Future AI Agents (Recovery + Repo gotchas)
373
+
374
+ 1. **`lineStartsAfter` vs `lineStartsWith` is not “cosmetic”**: `lineStartsAfter` changes output by stripping the matched marker via an internal `contentStartOffset` during segment construction. If a client used it by accident, you cannot reconstruct the exact stripped prefix from output alone without referencing the original pages and re-matching the marker.
375
+
376
+ 2. **Recovery must mirror segmentation’s preprocessing**: If `SegmentationOptions.replace` was used, recovery must apply the same replacements (see `src/segmentation/replace.ts`) before attempting anchoring or rerun alignment, otherwise substring matching and page joins will drift.
377
+
378
+ 3. **Page joining differs between matching and output**:
379
+ - Matching always happens on pages concatenated with `\\n` separators.
380
+ - Output segments may normalize page boundaries (`pageJoiner: 'space' | 'newline'`) and breakpoints post-processing uses its own join normalization utilities.
381
+ Recovery code must be explicit about which representation it’s searching.
382
+
383
+ 4. **Breakpoints can produce “pieces” that were never marker-stripped**: When `maxPages` + `breakpoints` are enabled, only the piece that starts at the original structural boundary could have lost a marker due to `lineStartsAfter`. Mid-segment breakpoint pieces should not be “recovered” unless you can anchor them confidently.
384
+
385
+ 5. **Fuzzy defaults are easy to miss**: Some tokens auto-enable fuzzy matching unless `fuzzy: false` is set (`bab`, `basmalah`, `fasl`, `kitab`, `naql`). If you are validating markers or re-matching prefixes, use the same compilation path as segmentation (`buildRuleRegex` / `processPattern`) so diacritics and token expansion behave identically.
386
+
387
+ 6. **Auto-escaping applies to template-like patterns**: `lineStartsWith`, `lineStartsAfter`, `lineEndsWith`, and `template` auto-escape `()[]` outside `{{tokens}}`. Raw `regex` does not. If you compare patterns by string equality, be careful about escaping and whitespace.
388
+
389
+ 7. **TypeScript union pitfalls with `SplitRule`**: `SplitRule` is a union where only one pattern type should exist. Avoid mutating rules in-place with `delete` on fields (TS often narrows unions and then complains). Prefer rebuilding converted rules via destructuring (e.g. `{ lineStartsAfter, ...rest }` then create `{...rest, lineStartsWith: lineStartsAfter}`).
390
+
391
+ 8. **Biome lint constraints shape implementation**: The repo enforces low function complexity. Expect to extract helpers (alignment, selector resolution, anchoring) to keep Biome happy. Also, Biome can flag regex character-class usage as misleading; prefer alternation (e.g. `(?:\\u200C|\\u200D|\\uFEFF)`) when removing specific codepoints.
392
+
393
+ 9. **When debugging recovery, start here**:
394
+ - `src/segmentation/segmenter.ts` (how content is sliced/trimmed and how `from/to` are computed)
395
+ - `src/segmentation/rule-regex.ts` + `src/segmentation/tokens.ts` (token expansion + fuzzy behavior)
396
+ - `src/segmentation/replace.ts` (preprocessing parity)
397
+ - `src/recovery.ts` (recovery implementation)
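The rebuild-by-destructuring advice in item 7 can be sketched like this. `SketchRule` is a simplified stand-in for the library's `SplitRule` union (the real type has more variants and fields), and `toLineStartsWith` and `convertSelected` are hypothetical helper names, not exports of the package.

```ts
// Simplified stand-in types (assumption: the real SplitRule union is richer).
type RuleBase = { fuzzy?: boolean };
type LineStartsAfterRule = RuleBase & { lineStartsAfter: string[] };
type LineStartsWithRule = RuleBase & { lineStartsWith: string[] };
type RegexRule = RuleBase & { regex: string };
type SketchRule = LineStartsAfterRule | LineStartsWithRule | RegexRule;

// Rebuild the rule instead of deleting a field in place, so the union stays well-typed
// and the caller's original rule object is left untouched.
const toLineStartsWith = (rule: SketchRule): SketchRule => {
    if (!('lineStartsAfter' in rule)) {
        return rule; // other pattern types pass through unchanged
    }
    const { lineStartsAfter, ...rest } = rule;
    return { ...rest, lineStartsWith: lineStartsAfter };
};

// Convert only the rules at the selected indices.
const convertSelected = (rules: SketchRule[], indices: number[]): SketchRule[] => {
    const selected = new Set(indices);
    return rules.map((rule, index) => (selected.has(index) ? toLineStartsWith(rule) : rule));
};
```

Returning new objects also keeps the original options usable for comparison after the conversion.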
398
+
399
+ ### Process Template (Multi-agent design review, TDD-first)
400
+
401
+ If you want to repeat the “write a plan → get multiple AI critiques → synthesize → update plan → implement TDD-first” workflow, use:
402
+
403
+ - `docs/ai-multi-agent-tdd-template.md`
404
+
365
405
  ### Architecture Insights
366
406
 
367
407
  - **Declarative > Imperative**: Users describe patterns, library handles regex
package/README.md CHANGED
@@ -697,6 +697,53 @@ const options: SegmentationOptions = {
697
697
  const segments: Segment[] = segmentPages(pages, options);
698
698
  ```
699
699
 
700
+ ### Marker recovery (when `lineStartsAfter` was used by accident)
701
+
702
+ If you accidentally used `lineStartsAfter` for markers that should have been preserved (e.g. Arabic connective phrases like `وروى` / `وذكر`), you can recover those missing prefixes from existing segments.
703
+
704
+ #### `recoverMistakenLineStartsAfterMarkers(pages, segments, options, selector, opts?)`
705
+
706
+ This function returns new segments with recovered `content` plus a `report` describing what happened.
707
+
708
+ **Recommended (deterministic) mode**: rerun segmentation with selected rules converted to `lineStartsWith`, then merge recovered content back.
709
+
710
+ ```ts
711
+ import { recoverMistakenLineStartsAfterMarkers, segmentPages } from 'flappa-doormal';
712
+
713
+ const pages = [{ id: 1, content: 'وروى أحمد\nوذكر خالد' }];
714
+ const options = { rules: [{ lineStartsAfter: ['وروى '] }, { lineStartsAfter: ['وذكر '] }] };
715
+
716
+ const segments = segmentPages(pages, options);
717
+ // segments[0].content === 'أحمد' (marker stripped)
718
+
719
+ const { segments: recovered, report } = recoverMistakenLineStartsAfterMarkers(
720
+ pages,
721
+ segments,
722
+ options,
723
+ { type: 'rule_indices', indices: [0] }, // recover only the first rule
724
+ );
725
+
726
+ // recovered[0].content === 'وروى أحمد'
727
+ // recovered[1].content === 'خالد' (unchanged)
728
+ console.log(report.summary);
729
+ ```
730
+
731
+ **Optional**: best-effort anchoring mode attempts to recover without rerunning first, then falls back to rerun for unresolved segments:
732
+
733
+ ```ts
734
+ const { segments: recovered } = recoverMistakenLineStartsAfterMarkers(
735
+ pages,
736
+ segments,
737
+ options,
738
+ { type: 'rule_indices', indices: [0] },
739
+ { mode: 'best_effort_then_rerun' }
740
+ );
741
+ ```
742
+
743
+ Notes:
744
+ - Recovery is **explicitly scoped** by the `selector`; it will not “guess” which rules are mistaken.
745
+ - If your segments were heavily post-processed (trimmed/normalized/reordered), recovery may return unresolved items; see the report for details.
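One possible way to act on the report mentioned in the notes above is to list the unresolved entries. The field names below are copied from the `MarkerRecoveryReport` declaration in this release's `index.d.mts`, trimmed to the fields used here; `ReportLike` and `describeRecovery` are hypothetical names for illustration, not part of the package.

```ts
// Trimmed local mirror of the published report shape (assumption: only these fields are needed).
type DetailStatus = 'recovered' | 'skipped_idempotent' | 'unchanged' | 'unresolved_alignment' | 'unresolved_selector';
type ReportLike = {
    summary: { recovered: number; totalSegments: number; unchanged: number; unresolved: number };
    details: Array<{ segmentIndex: number; status: DetailStatus; notes?: string[] }>;
};

// Build one headline line plus one line per unresolved segment.
const describeRecovery = (report: ReportLike): string[] => {
    const { recovered, totalSegments, unresolved } = report.summary;
    const lines = [`recovered ${recovered}/${totalSegments}, unresolved ${unresolved}`];
    for (const detail of report.details) {
        if (detail.status.startsWith('unresolved')) {
            const notes = detail.notes?.join('; ') ?? 'no notes';
            lines.push(`segment ${detail.segmentIndex}: ${detail.status} (${notes})`);
        }
    }
    return lines;
};
```

Because `ReportLike` is a structural subset of the declared report type, the object returned by `recoverMistakenLineStartsAfterMarkers` should be passable to such a helper directly, though that has not been checked against the shipped build.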
746
+
700
747
  ### `stripHtmlTags(html)`
701
748
 
702
749
  Remove all HTML tags from content, keeping only text.
package/dist/index.d.mts CHANGED
@@ -1149,70 +1149,18 @@ declare const getAvailableTokens: () => string[];
1149
1149
  */
1150
1150
  declare const getTokenPattern: (tokenName: string) => string | undefined;
1151
1151
  //#endregion
1152
- //#region src/analysis.d.ts
1152
+ //#region src/analysis/line-starts.d.ts
1153
1153
  type LineStartAnalysisOptions = {
1154
- /** Return top K patterns (after filtering). Default: 20 */
1155
1154
  topK?: number;
1156
- /** Only consider the first N characters of each trimmed line. Default: 60 */
1157
1155
  prefixChars?: number;
1158
- /** Ignore lines shorter than this (after trimming). Default: 6 */
1159
1156
  minLineLength?: number;
1160
- /** Only include patterns that appear at least this many times. Default: 3 */
1161
1157
  minCount?: number;
1162
- /** Keep up to this many example lines per pattern. Default: 5 */
1163
1158
  maxExamples?: number;
1164
- /**
1165
- * If true, include a literal first word when no token match is found at the start.
1166
- * Default: true
1167
- */
1168
1159
  includeFirstWordFallback?: boolean;
1169
- /**
1170
- * If true, strip Arabic diacritics (harakat/tashkeel) for the purposes of matching tokens.
1171
- * This helps patterns like `وأَخْبَرَنَا` match the `{{naql}}` token (`وأخبرنا`).
1172
- *
1173
- * Note: examples are still stored in their original (unstripped) form.
1174
- *
1175
- * Default: true
1176
- */
1177
1160
  normalizeArabicDiacritics?: boolean;
1178
- /**
1179
- * How to sort patterns before applying `topK`.
1180
- *
1181
- * - `specificity` (default): prioritize more structured prefixes first (tokenCount, then literalLen), then count.
1182
- * - `count`: prioritize highest-frequency patterns first, then specificity.
1183
- */
1184
1161
  sortBy?: 'specificity' | 'count';
1185
- /**
1186
- * Optional filter to restrict which lines are analyzed.
1187
- *
1188
- * The `line` argument is the trimmed + whitespace-collapsed version of the line.
1189
- * Return `true` to include it, `false` to skip it.
1190
- *
1191
- * @example
1192
- * // Only analyze markdown H2 headings
1193
- * { lineFilter: (line) => line.startsWith('## ') }
1194
- */
1195
1162
  lineFilter?: (line: string, pageId: number) => boolean;
1196
- /**
1197
- * Optional list of prefix matchers to consume before tokenization.
1198
- *
1199
- * This is for "syntactic" prefixes that are common at line start but are not
1200
- * meaningful as tokens by themselves (e.g. markdown headings like `##`).
1201
- *
1202
- * Each matcher is applied at the current position. If it matches, the matched
1203
- * text is appended (escaped) to the signature and the scanner advances.
1204
- *
1205
- * @example
1206
- * // Support markdown blockquotes and headings
1207
- * { prefixMatchers: [/^>+/u, /^#+/u] }
1208
- */
1209
1163
  prefixMatchers?: RegExp[];
1210
- /**
1211
- * How to represent whitespace in returned `pattern` signatures.
1212
- *
1213
- * - `regex` (default): use `\\s*` placeholders between tokens (useful if you paste patterns into regex-ish templates).
1214
- * - `space`: use literal single spaces (`' '`) between tokens (safer if you don't want `\\s` to match newlines when reused as regex).
1215
- */
1216
1164
  whitespace?: 'regex' | 'space';
1217
1165
  };
1218
1166
  type LineStartPatternExample = {
@@ -1226,12 +1174,41 @@ type CommonLineStartPattern = {
1226
1174
  };
1227
1175
  /**
1228
1176
  * Analyze pages and return the most common line-start patterns (top K).
1229
- *
1230
- * This is a pure algorithmic heuristic: it tokenizes common prefixes into a stable
1231
- * template-ish string using the library tokens (e.g., `{{bab}}`, `{{raqms}}`, `{{rumuz}}`).
1232
1177
  */
1233
1178
  declare const analyzeCommonLineStarts: (pages: Page[], options?: LineStartAnalysisOptions) => CommonLineStartPattern[];
1234
1179
  //#endregion
1180
+ //#region src/analysis/repeating-sequences.d.ts
1181
+ type RepeatingSequenceOptions = {
1182
+ minElements?: number;
1183
+ maxElements?: number;
1184
+ minCount?: number;
1185
+ topK?: number;
1186
+ normalizeArabicDiacritics?: boolean;
1187
+ requireToken?: boolean;
1188
+ whitespace?: 'regex' | 'space';
1189
+ maxExamples?: number;
1190
+ contextChars?: number;
1191
+ maxUniquePatterns?: number;
1192
+ };
1193
+ type RepeatingSequenceExample = {
1194
+ text: string;
1195
+ context: string;
1196
+ pageId: number;
1197
+ startIndices: number[];
1198
+ };
1199
+ type RepeatingSequencePattern = {
1200
+ pattern: string;
1201
+ count: number;
1202
+ examples: RepeatingSequenceExample[];
1203
+ };
1204
+ /**
1205
+ * Analyze pages for commonly repeating word sequences.
1206
+ *
1207
+ * Use for continuous text without line breaks. For line-based analysis,
1208
+ * use `analyzeCommonLineStarts()` instead.
1209
+ */
1210
+ declare const analyzeRepeatingSequences: (pages: Page[], options?: RepeatingSequenceOptions) => RepeatingSequencePattern[];
1211
+ //#endregion
1235
1212
  //#region src/detection.d.ts
1236
1213
  /**
1237
1214
  * Pattern detection utilities for recognizing template tokens in Arabic text.
@@ -1307,5 +1284,67 @@ declare const analyzeTextForRule: (text: string) => {
1307
1284
  detected: DetectedPattern[];
1308
1285
  } | null;
1309
1286
  //#endregion
1310
- export { type Breakpoint, type BreakpointRule, type CommonLineStartPattern, type DetectedPattern, type ExpandResult, type LineStartAnalysisOptions, type LineStartPatternExample, type Logger, type Page, type PageRange, type ReplaceRule, type RuleValidationResult, type Segment, type SegmentationOptions, type SplitRule, TOKEN_PATTERNS, type ValidationIssue, type ValidationIssueType, analyzeCommonLineStarts, analyzeTextForRule, applyReplacements, containsTokens, detectTokenPatterns, escapeRegex, escapeTemplateBrackets, expandCompositeTokensInTemplate, expandTokens, expandTokensWithCaptures, generateTemplateFromText, getAvailableTokens, getTokenPattern, makeDiacriticInsensitive, segmentPages, suggestPatternConfig, templateToRegex, validateRules };
1287
+ //#region src/recovery.d.ts
1288
+ type MarkerRecoverySelector = {
1289
+ type: 'rule_indices';
1290
+ indices: number[];
1291
+ } | {
1292
+ type: 'lineStartsAfter_patterns';
1293
+ match?: 'exact' | 'normalized';
1294
+ patterns: string[];
1295
+ } | {
1296
+ type: 'predicate';
1297
+ predicate: (rule: SplitRule, index: number) => boolean;
1298
+ };
1299
+ type MarkerRecoveryRun = {
1300
+ options: SegmentationOptions;
1301
+ pages: Page[];
1302
+ segments: Segment[];
1303
+ selector: MarkerRecoverySelector;
1304
+ };
1305
+ type MarkerRecoveryReport = {
1306
+ summary: {
1307
+ mode: 'rerun_only' | 'best_effort_then_rerun';
1308
+ recovered: number;
1309
+ totalSegments: number;
1310
+ unchanged: number;
1311
+ unresolved: number;
1312
+ };
1313
+ byRun?: Array<{
1314
+ recovered: number;
1315
+ runIndex: number;
1316
+ totalSegments: number;
1317
+ unresolved: number;
1318
+ }>;
1319
+ details: Array<{
1320
+ from: number;
1321
+ originalStartPreview: string;
1322
+ recoveredPrefixPreview?: string;
1323
+ recoveredStartPreview?: string;
1324
+ segmentIndex: number;
1325
+ status: 'recovered' | 'skipped_idempotent' | 'unchanged' | 'unresolved_alignment' | 'unresolved_selector';
1326
+ strategy: 'rerun' | 'stage1' | 'none';
1327
+ to?: number;
1328
+ notes?: string[];
1329
+ }>;
1330
+ errors: string[];
1331
+ warnings: string[];
1332
+ };
1333
+ type NormalizeCompareMode = 'none' | 'whitespace' | 'whitespace_and_nfkc';
1334
+ declare function recoverMistakenLineStartsAfterMarkers(pages: Page[], segments: Segment[], options: SegmentationOptions, selector: MarkerRecoverySelector, opts?: {
1335
+ mode?: 'rerun_only' | 'best_effort_then_rerun';
1336
+ normalizeCompare?: NormalizeCompareMode;
1337
+ }): {
1338
+ report: MarkerRecoveryReport;
1339
+ segments: Segment[];
1340
+ };
1341
+ declare function recoverMistakenMarkersForRuns(runs: MarkerRecoveryRun[], opts?: {
1342
+ mode?: 'rerun_only' | 'best_effort_then_rerun';
1343
+ normalizeCompare?: NormalizeCompareMode;
1344
+ }): {
1345
+ report: MarkerRecoveryReport;
1346
+ segments: Segment[];
1347
+ };
1348
+ //#endregion
1349
+ export { type Breakpoint, type BreakpointRule, type CommonLineStartPattern, type DetectedPattern, type ExpandResult, type LineStartAnalysisOptions, type LineStartPatternExample, type Logger, type MarkerRecoveryReport, type MarkerRecoveryRun, type MarkerRecoverySelector, type Page, type PageRange, type RepeatingSequenceExample, type RepeatingSequenceOptions, type RepeatingSequencePattern, type ReplaceRule, type RuleValidationResult, type Segment, type SegmentationOptions, type SplitRule, TOKEN_PATTERNS, type ValidationIssue, type ValidationIssueType, analyzeCommonLineStarts, analyzeRepeatingSequences, analyzeTextForRule, applyReplacements, containsTokens, detectTokenPatterns, escapeRegex, escapeTemplateBrackets, expandCompositeTokensInTemplate, expandTokens, expandTokensWithCaptures, generateTemplateFromText, getAvailableTokens, getTokenPattern, makeDiacriticInsensitive, recoverMistakenLineStartsAfterMarkers, recoverMistakenMarkersForRuns, segmentPages, suggestPatternConfig, templateToRegex, validateRules };
1311
1350
  //# sourceMappingURL=index.d.mts.map
@@ -1 +1 @@
1
- {"version":3,"file":"index.d.mts","names":[],"sources":["../src/segmentation/fuzzy.ts","../src/segmentation/types.ts","../src/segmentation/pattern-validator.ts","../src/segmentation/replace.ts","../src/segmentation/segmenter.ts","../src/segmentation/tokens.ts","../src/analysis.ts","../src/detection.ts"],"sourcesContent":[],"mappings":";;AAkEA;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAoBC;AA2FzB;AAAkD;AAgIlD;;;;;AAkBA;AAqCA;AA0EY,cD/bC,WC+bqB,EAAA,CAAA,CAAA,EAAA,MAAc,EAAA,GAAA,MAAA;AA8BhD;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;;AAwGA;;;;ACxHA;AA2DA;;;;;;;;ACkcA;;AAAqD,cJxWxC,wBIwWwC,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;AJvcrD;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAoBC;AA2FzB;AAAkD;AAgIlD;;;;;AAkBA,KApXK,YAAA,GAoXW;EAqCJ;EA0EA,KAAA,EAAA,MAAU;AA8BtB,CAAA;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;;AAwGA;;;;ACxHA,KF8CK,eAAA,GE9CkB;EA2DV;EAA4B,QAAA,EAAA,MAAA;CAAgB;;;;;;ACkczD;;;;;;;;ACrcA;AA0QA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBA;AAgBA;;;KJlnBK,qBAAA;EKpFO;EAkEA,cAAA,EAAA,MAAA,EAAA;AAEZ,CAAA;AAkSA;;;;;;;;AC9VA;AA+EA;AAgEA;AAuBA;AAiCA;;;;;;;;;;;;;;;;KN1FK,sBAAA;;;;;;;;;;;;;;;;;;;;;;;KAwBA,mBAAA;;;;;;;;;;;;;;KAeA,WAAA,GACC,eACA,kBACA,wBACA,yBACA;;;;;;;KAYD,aAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KA+EO,SAAA;;;;;;;KAYP,eAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAyCS;;;;;;;;;;;;SAaH;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KA8DC,SAAA,GAAY,cAAc,gBAAgB;;;;;;;;;;;;;KAkB1C,IAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAqCA,cAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAqCE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAqCF,UAAA,YAAsB;;;;;;;;;;;;;;;;;;;;;;;;;UA8BjB,MAAA;;;;;;;;;;;;;;;;;;;;;;KAuBL,WAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KA+CA,mBAAA;;;;;;YAME;;;;;;;;UASF;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;gBA8CM;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;WAwDL;;;;;;;;;;;;;;;;KAiBD,OAAA;;;;;;;;;;;;;;;;;;;;;;;;;;SA6BD;;;;AA1uBM;AA4BG;AA8BM;AAyDrB,KCpIO,mBAAA,GDoIY,gBAAA,GAAA,eAAA,GAAA,WAAA;AAAA;;;AAk
BlB,KCjJM,eAAA,GDiJN;EACA,IAAA,ECjJI,mBDiJJ;EACA,OAAA,EAAA,MAAA;EAAmB,UAAA,CAAA,EAAA,MAAA;AAAA,CAAA;AA2FzB;AAAkD;AAgIlD;;AAAsC,KCpW1B,oBAAA,GDoW0B;EAAgB,cAAA,CAAA,EAAA,CCnWhC,eDmWgC,GAAA,SAAA,CAAA,EAAA;EAAe,eAAA,CAAA,EAAA,CClW9C,eDkW8C,GAAA,SAAA,CAAA,EAAA;EAkBzD,YAAI,CAAA,EAAA,CCnXI,eDmXJ,GAAA,SAAA,CAAA,EAAA;EAqCJ,QAAA,CAAA,ECvZG,eDuZW;AA0E1B,CAAA;AA8BA;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;AAI8B,cAoGjB,aApGiB,EAAA,CAAA,KAAA,EAoGO,SApGP,EAAA,EAAA,GAAA,CAoGsB,oBApGtB,GAAA,SAAA,CAAA,EAAA;;;AFkC9B;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAuCtB,KEpJO,WAAA,GAAc,WFoJV,CEpJsB,mBFoJtB,CAAA,SAAA,CAAA,CAAA,CAAA,MAAA,CAAA;;;;;;;AAKS;AA2FzB;AAAkD;AAgIlD;AAAwB,cEzTX,iBFyTW,EAAA,CAAA,KAAA,EEzTiB,IFyTjB,EAAA,EAAA,KAAA,CAAA,EEzTiC,WFyTjC,EAAA,EAAA,GEzTiD,IFyTjD,EAAA;;;;;AAkBxB;AAqCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;;AAwGA;;;;ACxHA;AA2DA;;;;;;;;ACkcA;;;;;;;;ACrcA;AA0QA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBA;AAgBA;cDjMa,sBAAuB,iBAAiB,wBAAsB;;;;AJvc3E;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAoBC;AA2FzB;AAAkD;AAgIlD;;;;;AAkBA;AAqCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;;AAwGA;;;;ACxHA;AA2DA;;;;;;;;ACkcA;AAAoC,cCrcvB,sBDqcuB,EAAA,CAAA,OAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;;;;ACrcpC;AA0QA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBa,cA5WA,+BA4WsD,EAAA,CAAA,QAAA,EAAA,MAAA,EAAA,GAAA,MAAA;AAgBnE;;;;ACtsBA;AAkEA;AAEA;AAkSA;;;;;;;;AC9VA;AA+EA;AAgEA;AAuBA;AAiCA;;;;;;cFiLa,gBAAgB;;;;;;;;;;;;;;;;cA2ChB;;;;;;;KAWD,YAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;cAuKC,mHAIV;;;;;;;;;;;;;;;;;;;;cAyCU;;;;;;;;;;;;;;;;;;;;;;cAuBA,uCAAmC;;;;;;;;;;;;;cAqBnC;;;;;;;;;;;;;;;cAgBA;;;ALxoBA,KM9DD,wBAAA,GN8D8E;EA+F7E;;;;ECnIR;EA4BA,aAAA,CAAA,EAAA,MAAe;EA8Bf;EAiCA,QAAA,CAAA,EAAA,MAAA;EAwBA;EAeA,WAAA,CAAA,EAAW,MAAA;EACV;;;;EAIA,wBAAA,CAAA,EAAA,OAAA;EAAmB;AAAA;AA2FzB;AAAkD;AAgIlD;;;;EAAqE,yBAAA,CAAA,EAAA,OAAA;EAkBzD;AAqCZ;AA0EA;AA8BA;AAuBA;AA+CA;EAMc,MAAA,CAAA,EAAA,aAAA,GAAA,OAAA;EASF;;;;AAuHZ;;;;AC9tBA;AAKA;EAUY,UAAA,CAAA,EAAA,CAAA,IAAA,EAAA,MAAoB,E
AAA,MAAA,EAAA,MAAA,EAAA,GAAA,OAAA;EACV;;;;;AAuGtB;;;;ACxHA;AA2DA;;;EAAyE,cAAA,CAAA,EGXpD,MHWoD,EAAA;EAAI;;;;ACkc7E;;EAAqD,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;CAAsB;AAAO,KEnctE,uBAAA,GFmcsE;;;;ACrcrE,KCID,sBAAA,GDKX;EAiQY,OAAA,EAAA,MAAA;EAsDA,KAAA,EAAA,MAAA;EA2CA,QAAA,ECpWC,uBDuWb,EAAA;AAQD,CAAA;AAuKA;AA6CA;AAuBA;AAqBA;AAgBA;;cChWa,iCACF,kBACE,6BACV;;;;AN3SH;AA+FA;;;;;ACnIiB;AA4BG;AA+Df,KM7GO,eAAA,GN6Ge;EAwBtB;EAeA,KAAA,EAAA,MAAA;EACC;EACA,KAAA,EAAA,MAAA;EACA;EACA,KAAA,EAAA,MAAA;EACA;EAAmB,QAAA,EAAA,MAAA;AAAA,CAAA;AA2FzB;AAAkD;AAgIlD;;;;;AAkBA;AAqCA;AA0EA;AA8BA;AAuBA;AA+CA;;;AA6DkB,cMvkBL,mBNukBK,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GMvkB8B,eNukB9B,EAAA;;;AAyElB;;;;AC9tBA;AAKA;AAUA;;;;;AAI8B,cK2HjB,wBL3HiB,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,QAAA,EK2HmC,eL3HnC,EAAA,EAAA,GAAA,MAAA;AAoG9B;;;;ACxHA;AA2DA;AAAyC,cI2G5B,oBJ3G4B,EAAA,CAAA,QAAA,EI4G3B,eJ5G2B,EAAA,EAAA,GAAA;EAAgB,WAAA,EAAA,gBAAA,GAAA,iBAAA;EAAgB,KAAA,EAAA,OAAA;EAAI,QAAA,CAAA,EAAA,MAAA;;;;ACkc7E;;;;AAAkF,cGtTrE,kBHsTqE,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA;;;;ECrcrE,QAAA,CAAA,EAAA,MAAA;EA0QA,QAAA,EEpHC,eFoHD,EAAA;AAsDb,CAAA,GAAa,IAAA"}
1
+ {"version":3,"file":"index.d.mts","names":[],"sources":["../src/segmentation/fuzzy.ts","../src/segmentation/types.ts","../src/segmentation/pattern-validator.ts","../src/segmentation/replace.ts","../src/segmentation/segmenter.ts","../src/segmentation/tokens.ts","../src/analysis/line-starts.ts","../src/analysis/repeating-sequences.ts","../src/detection.ts","../src/recovery.ts"],"sourcesContent":[],"mappings":";;AAkEA;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAoBC;AA2FzB;AAAkD;AAgIlD;;;;;AAkBA;AAqCA;AA0EY,cD/bC,WC+bqB,EAAA,CAAA,CAAA,EAAA,MAAc,EAAA,GAAA,MAAA;AA8BhD;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;;AAwGA;;;;ACxHA;AA2DA;;;;;;;;AC6SA;;AAAqD,cJnNxC,wBImNwC,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;AJlTrD;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAoBC;AA2FzB;AAAkD;AAgIlD;;;;;AAkBA,KApXK,YAAA,GAoXW;EAqCJ;EA0EA,KAAA,EAAA,MAAU;AA8BtB,CAAA;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;;AAwGA;;;;ACxHA,KF8CK,eAAA,GE9CkB;EA2DV;EAA4B,QAAA,EAAA,MAAA;CAAgB;;;;;;AC6SzD;;;;;;;;AChTA;AA0QA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBA;AAgBA;;;KJlnBK,qBAAA;EKnEO;EAcA,cAAA,EAAA,MAAA,EAAA;AAEZ,CAAA;AAwQA;;;;;;;;AClRA;AAaA;AAOA;AA2OA;;;;;;;;AC9QA;AA+EA;AAgEA;AAuBA;AAiCA;;;;AC7MA;AAKA,KR8GK,sBAAA,GQ9GwB;EAChB;EACF,eAAA,EAAA,MAAA,EAAA;CACG;;;AAId;AA2BE;AAmnBF;;;;;;;;;AAsDA;;;;;;KRrkBK,mBAAA;;;;;;;;;;;;;;KAeA,WAAA,GACC,eACA,kBACA,wBACA,yBACA;;;;;;;KAYD,aAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KA+EO,SAAA;;;;;;;KAYP,eAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAyCS;;;;;;;;;;;;SAaH;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KA8DC,SAAA,GAAY,cAAc,gBAAgB;;;;;;;;;;;;;KAkB1C,IAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAqCA,cAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAqCE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAqCF,UAAA,YAAsB;;;;;;;;;;;;;;;;;;;;;;;;;UA8BjB,MAAA;;;;;;;;;;;;;;;;;;;;;;KAuBL,WAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KA+CA,mBAAA;;;;;;YAME;;;;;;;;UASF;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;gBA8CM;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;WAwDL;;;;;;;;;;;;;;;;KAiBD,OAAA;;;;;;;;;;;;;;;;;;;;;;;;;;SA6BD;;;;AA1uBM;AA4BG;AA8BM;AAyDrB,KCpIO,mBAAA,GDoIY,gBAAA,GAAA,eAAA,GAAA,WAAA;AAAA;;;AAkBlB,KCjJM,eAAA,GDiJN;EACA,IAAA,ECjJI,mBDiJJ;EACA,OAAA,EAAA,MAAA;EAAmB,UAAA,CAAA,EAAA,MAAA;AAAA,CAAA;AA2FzB;AAAkD;AAgIlD;;AAAsC,KCpW1B,oBAAA,GDoW0B;EAAgB,cAAA,CAAA,EAAA,CCnWhC,eDmWgC,GAAA,SAAA,CAAA,EAAA;EAAe,eAAA,CAAA,EAAA,CClW9C,eDkW8C,GAAA,SAAA,CAAA,EAAA;EAkBzD,YAAI,CAAA,EAAA,CCnXI,eDmXJ,GAAA,SAAA,CAAA,EAAA;EAqCJ,QAAA,CAAA,ECvZG,eDuZW;AA0E1B,CAAA;AA8BA;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;AAI8B,cAoGjB,aApGiB,EAAA,CAAA,KAAA,EAoGO,SApGP,EAAA,EAAA,GAAA,CAoGsB,oBApGtB,GAAA,SAAA,CAAA,EAAA;;;AFkC9B;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAuCtB,KEpJO,WAAA,GAAc,WFoJV,CEpJsB,mBFoJtB,CAAA,SAAA,CAAA,CAAA,CAAA,MAAA,CAAA;;;;;;;AAKS;AA2FzB;AAAkD;AAgIlD;AAAwB,cEzTX,iBFyTW,EAAA,CAAA,KAAA,EEzTiB,IFyTjB,EAAA,EAAA,KAAA,CAAA,EEzTiC,WFyTjC,EAAA,EAAA,GEzTiD,IFyTjD,EAAA;;;;;AAkBxB;AAqCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;;AAwGA;;;;ACxHA;AA2DA;;;;;;;;AC6SA;;;;;;;;AChTA;AA0QA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBA;AAgBA;cDtVa,sBAAuB,iBAAiB,wBAAmB;;;;AJlTxE;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAoBC;AA2FzB;AAAkD;AAgIlD;;;;;AAkBA;AAqCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;AAsIA;;;;AC9tBA;AAKA;AAUA;;;;;;AAwGA;;;;ACxHA;AA2DA;;;;;;;;AC6SA;AAAoC,cChTvB,sBDgTuB,EAAA,CAAA,OAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;;;;AChTpC;AA0QA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBa,cA5WA,+BA4WsD,EAAA,CAAA,QAAA,EAAA,MAAA,EAAA,GAAA,MAAA;AAgBnE;;;;ACrrBA;AAcA;AAEA;AAwQA;;;;;;;;AClRA;AAaA;AAOA;AA2OA;;;;;;;cF0Ga,gBAAgB;AGxX7B;AA+EA;AAgEA;AAuBA;AAiCA;;;;AC7MA;AAKA;;;;;;AAOY,cJ6ZC,cI7ZmB,EAAA,CAAA,KAQpB,EAAA,MAMC,EAAK,GAAA,OAAA;AAahB;AAmnBF;;;;;AAO2B,KJ7Of,YAAA,GI6Oe;EAEd;;;AA6Cb;;EAEgF,OAAA,EAAA,MAAA;EACnE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;cJxHA,mHAIV;;;;;;;;;;;;;;;;;;;;cAyCU;;;;;;;;;;;;;;;;;;;;;;cAuBA,uCAAmC;;;;;;;;;;;;;cAqBnC;;;;;;;;;;;;;;;cAgBA;;;ALxoBA,KM7CD,wBAAA,GN6C8E;EA+F7E,IAAA,CAAA,EAAA,MAAA;;;;ECnIR,WAAA,CAA
A,EAAA,MAAY;EA4BZ,wBAAe,CAAA,EAAA,OAAA;EA8Bf,yBAAqB,CAAA,EAAA,OAAA;EAiCrB,MAAA,CAAA,EAAA,aAAA,GAAsB,OAAA;EAwBtB,UAAA,CAAA,EAAA,CAAA,IAAA,EAAA,MAAmB,EAAA,MAAA,EAAA,MAAA,EAAA,GAAA,OAAA;EAenB,cAAW,CAAA,EKjIK,MLiIL,EAAA;EACV,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;CACA;AACA,KKhIM,uBAAA,GLgIN;EACA,IAAA,EAAA,MAAA;EACA,MAAA,EAAA,MAAA;CAAmB;AAYpB,KK5IO,sBAAA,GL4IM;EA+EN,OAAA,EAAA,MAAS;EAYhB,KAAA,EAAA,MAAA;EAoHO,QAAA,EKxVE,uBLwVO,EAAA;CAAG;;;;AAkBZ,cKrGC,uBLqGG,EAAA,CAAA,KAAA,EKpGL,ILoGK,EAAA,EAAA,OAAA,CAAA,EKnGH,wBLmGG,EAAA,GKlGb,sBLkGa,EAAA;;;AAjQX,KMtHO,wBAAA,GNsHY;EAenB,WAAA,CAAA,EAAW,MAAA;EACV,WAAA,CAAA,EAAA,MAAA;EACA,QAAA,CAAA,EAAA,MAAA;EACA,IAAA,CAAA,EAAA,MAAA;EACA,yBAAA,CAAA,EAAA,OAAA;EACA,YAAA,CAAA,EAAA,OAAA;EAAmB,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;EAYpB,WAAA,CAAA,EAAA,MAAa;EA+EN,YAAS,CAAA,EAAA,MAAA;EAYhB,iBAAA,CAAe,EAAA,MAAA;AAoHpB,CAAA;AAAwB,KMxVZ,wBAAA,GNwVY;EAAc,IAAA,EAAA,MAAA;EAAgB,OAAA,EAAA,MAAA;EAAe,MAAA,EAAA,MAAA;EAkBzD,YAAI,EAAA,MAAA,EAAA;AAqChB,CAAA;AA0EY,KMldA,wBAAA,GNkdsB;EA8BjB,OAAA,EAAM,MAAA;EAuBX,KAAA,EAAA,MAAA;EA+CA,QAAA,EMnjBE,wBNmjBiB,EAAA;CAMjB;;;AAgId;;;;AC9tBY,cK6QC,yBL7QkB,EAAA,CAAA,KAAA,EK8QpB,IL9QoB,EAAA,EAAA,OAAA,CAAA,EK+QjB,wBL/QiB,EAAA,GKgR5B,wBLhR4B,EAAA;;;;AFqD/B;AA+FA;;;;;ACnIiB;AA4BG;AA+Df,KO7GO,eAAA,GP6Ge;EAwBtB;EAeA,KAAA,EAAA,MAAA;EACC;EACA,KAAA,EAAA,MAAA;EACA;EACA,KAAA,EAAA,MAAA;EACA;EAAmB,QAAA,EAAA,MAAA;AAAA,CAAA;AA2FzB;AAAkD;AAgIlD;;;;;AAkBA;AAqCA;AA0EA;AA8BA;AAuBA;AA+CA;;;AA6DkB,cOvkBL,mBPukBK,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GOvkB8B,ePukB9B,EAAA;;;AAyElB;;;;AC9tBA;AAKA;AAUA;;;;;AAI8B,cM2HjB,wBN3HiB,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,QAAA,EM2HmC,eN3HnC,EAAA,EAAA,GAAA,MAAA;AAoG9B;;;;ACxHA;AA2DA;AAAyC,cK2G5B,oBL3G4B,EAAA,CAAA,QAAA,EK4G3B,eL5G2B,EAAA,EAAA,GAAA;EAAgB,WAAA,EAAA,gBAAA,GAAA,iBAAA;EAAgB,KAAA,EAAA,OAAA;EAAI,QAAA,CAAA,EAAA,MAAA;;;;AC6S7E;;;;AAAwE,cIjK3D,kBJiK2D,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA;;;;EChT3D,QAAA,CAAA,EAAA,MAAA;EA0QA,QAAA,EGpHC,eHoHD,EAAA;AAsDb,CAAA,GAAa,IAAA;;;ALlUA,KS5DD,sBAAA,GT4D8E;EA+F7E,IAAA,EAAA,cAAA;;;;ECnIR,KAAA,CAAA,EAAA,
OAAY,GAAA,YAAA;EA4BZ,QAAA,EAAA,MAAA,EAAe;AAAA,CAAA,GA8Bf;EAiCA,IAAA,EAAA,WAAA;EAwBA,SAAA,EAAA,CAAA,IAAA,EQxIwC,SRwIrB,EAAA,KAAA,EAAA,MAAA,EAAA,GAAA,OAAA;AAAA,CAAA;AAgBlB,KQtJM,iBAAA,GRsJN;EACA,OAAA,EQtJO,mBRsJP;EACA,KAAA,EQtJK,IRsJL,EAAA;EACA,QAAA,EQtJQ,ORsJR,EAAA;EACA,QAAA,EQtJQ,sBRsJR;CAAmB;AAYpB,KQ/JO,oBAAA,GR+JM;EA+EN,OAAA,EAAA;IAYP,IAAA,EAAA,YAAe,GAAA,wBAsDH;IA8DL,SAAS,EAAA,MAAA;IAAG,aAAA,EAAA,MAAA;IAAc,SAAA,EAAA,MAAA;IAAgB,UAAA,EAAA,MAAA;EAAe,CAAA;EAkBzD,KAAA,CAAA,EQxXA,KRwXI,CAAA;IAqCJ,SAAA,EAAA,MAAc;IA0Ed,QAAA,EAAU,MAAA;IA8BL,aAAM,EAAA,MAAA;IAuBX,UAAW,EAAA,MAAA;EA+CX,CAAA,CAAA;EAME,OAAA,EQ3kBD,KR2kBC,CAAA;IASF,IAAA,EAAA,MAAA;IA8CM,oBAAA,EAAA,MAAA;IAwDL,sBAAA,CAAA,EAAA,MAAA;IAAM,qBAAA,CAAA,EAAA,MAAA;IAiBP,YAAO,EAAA,MA6BR;;;;IC3vBC,KAAA,CAAA,EAAA,MAAA,EAAA;EAKA,CAAA,CAAA;EAUA,MAAA,EAAA,MAAA,EAAA;EACU,QAAA,EAAA,MAAA,EAAA;CACC;KOiBlB,oBAAA,GPhBe,MAAA,GAAA,YAAA,GAAA,qBAAA;AACL,iBOgoBC,qCAAA,CPhoBD,KAAA,EOioBJ,IPjoBI,EAAA,EAAA,QAAA,EOkoBD,OPloBC,EAAA,EAAA,OAAA,EOmoBF,mBPnoBE,EAAA,QAAA,EOooBD,sBPpoBC,EAAA,KAAA,EAAA;EAAe,IAAA,CAAA,EAAA,YAAA,GAAA,wBAAA;EAoGjB,gBA0CZ,CAAA,EOyf0B,oBPniByB;;UOqiBvC;YAAgC;AN7pB7C,CAAA;AA2Da,iBM+oBG,6BAAA,CNznBf,IAAA,EM0nBS,iBN1nBT,EAAA,EAAA,IAtBwE,CAsBxE,EAAA;EAtBwC,IAAA,CAAA,EAAA,YAAA,GAAA,wBAAA;EAAgB,gBAAA,CAAA,EMipBuB,oBNjpBvB;CAAgB,CAAA,EAAA;EAAI,MAAA,EMkpBhE,oBNlpBgE;YMkpBhC"}