flappa-doormal 2.9.0 → 2.10.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +19 -9
- package/README.md +73 -1
- package/dist/index.d.mts +138 -1
- package/dist/index.d.mts.map +1 -1
- package/dist/index.mjs +231 -2
- package/dist/index.mjs.map +1 -1
- package/package.json +2 -2
package/AGENTS.md
CHANGED
@@ -42,6 +42,7 @@ src/
 ├── breakpoint-processor.ts # Breakpoint post-processing engine (applyBreakpoints)
 ├── breakpoint-utils.ts # Breakpoint processing utilities (windowing, excludes, page joins)
 ├── rule-regex.ts # SplitRule -> compiled regex builder (buildRuleRegex, processPattern)
+├── optimize-rules.ts # Rule optimization logic (merge, dedupe, sort)
 ├── tokens.ts # Token definitions and expansion logic
 ├── fuzzy.ts # Diacritic-insensitive matching utilities
 ├── html.ts # HTML utilities (stripHtmlTags)
@@ -69,32 +70,38 @@ src/
 - Deterministic mode reruns segmentation with selected rules converted to `lineStartsWith` and merges recovered `content` back into the provided segments
 - Optional `mode: 'best_effort_then_rerun'` attempts a conservative anchor-based recovery first, then falls back to rerun for unresolved segments
 
-
+3. **`tokens.ts`** - Template system
 - `TOKEN_PATTERNS` - Map of token names to regex patterns
 - `expandTokensWithCaptures()` - Expands `{{token:name}}` syntax
 - `shouldDefaultToFuzzy()` - Checks if patterns contain fuzzy-default tokens (bab, basmalah, fasl, kitab, naql)
+- `applyTokenMappings()` - Applies named captures (`{{token:name}}`) to raw templates
+- `stripTokenMappings()` - Strips named captures (reverts to `{{token}}`)
 - Supports fuzzy transform for diacritic-insensitive matching
 - **Fuzzy-default tokens**: `bab`, `basmalah`, `fasl`, `kitab`, `naql` - auto-enable fuzzy matching unless `fuzzy: false` is set
 
-
+4. **`match-utils.ts`** - Extracted utilities (for testability)
 - `extractNamedCaptures()` - Get named groups from regex match
 - `filterByConstraints()` - Apply min/max page filters
 - `anyRuleAllowsId()` - Check if page passes rule constraints
 
-
+5. **`rule-regex.ts`** - SplitRule → compiled regex builder
 - `buildRuleRegex()` - Compiles rule patterns (`lineStartsWith`, `lineStartsAfter`, `lineEndsWith`, `template`, `regex`)
 - `processPattern()` - Token expansion + auto-escaping + optional fuzzy application
 - `extractNamedCaptureNames()` - Extract `(?<name>...)` groups from raw regex patterns
 
-
+6. **`optimize-rules.ts`** - Rule management logic
+- `optimizeRules()` - Merges compatible rules, deduplicates patterns, and sorts by specificity (longest patterns first)
+
+7. **`pattern-validator.ts`** - Rule validation utilities
 - `validateRules()` - Detects typos in patterns (missing `{{}}`, unknown tokens, duplicates)
+- `formatValidationReport()` - Formats validation issues into human-readable strings
 - Returns parallel array structure for easy error tracking
 
-
+8. **`breakpoint-processor.ts`** - Breakpoint post-processing engine
 - `applyBreakpoints()` - Splits oversized structural segments using breakpoint patterns + windowing
 - Applies `pageJoiner` normalization to breakpoint-created segments
 
-
+9. **`breakpoint-utils.ts`** - Breakpoint processing utilities
 - `normalizeBreakpoint()` - Convert string to BreakpointRule object
 - `isPageExcluded()` - Check if page is in exclude list
 - `isInBreakpointRange()` - Validate page against min/max/exclude constraints
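The `tokens.ts` entries above list the fuzzy-default tokens but do not show how the check behaves. The following is a minimal sketch of such a check. It makes two assumptions that are not taken from the package source. First, a token counts whether it is written as `{{bab}}` or with a capture suffix such as `{{bab:name}}`. Second, an explicit `fuzzy` value on a rule (either `true` or `false`) overrides the token-based default. `shouldDefaultToFuzzySketch` and `resolveFuzzySketch` are hypothetical names.

```typescript
// Hypothetical sketch for illustration; not the flappa-doormal implementation.
const FUZZY_DEFAULT_TOKENS = ['bab', 'basmalah', 'fasl', 'kitab', 'naql'];

// Matches {{token}} or {{token:captureName}} for any fuzzy-default token.
// Assumption: a capture suffix does not change whether a token is fuzzy-default.
const FUZZY_TOKEN_RE = new RegExp(`\\{\\{(?:${FUZZY_DEFAULT_TOKENS.join('|')})(?::[^{}]*)?\\}\\}`);

const shouldDefaultToFuzzySketch = (patterns: string | string[]): boolean => {
  const list = Array.isArray(patterns) ? patterns : [patterns];
  return list.some((p) => FUZZY_TOKEN_RE.test(p));
};

// Assumption: an explicit `fuzzy` value on the rule wins; otherwise fall back to the token check.
const resolveFuzzySketch = (fuzzy: boolean | undefined, patterns: string | string[]): boolean =>
  fuzzy ?? shouldDefaultToFuzzySketch(patterns);

console.log(shouldDefaultToFuzzySketch('{{bab}} الإيمان')); // true
console.log(shouldDefaultToFuzzySketch('{{raqms}} {{dash}}')); // false
console.log(resolveFuzzySketch(false, '{{kitab}}')); // false
```

The `(?::...)?\}\}` tail anchors each alternative at the closing braces. This prevents a longer, unrelated token name that merely starts with `bab` from counting as a fuzzy-default token.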
@@ -111,16 +118,17 @@ src/
 - `findNextPagePosition()` - Find next page content position
 - `findPatternBreakPosition()` - Find pattern match by preference
 
-
+10. **`types.ts`** - Type definitions
 - `Logger` interface - Optional logging for debugging
 - `SegmentationOptions` - Options with `logger` property
 - `pageJoiner` - Controls how page boundaries are represented in output (`space` default)
+- `PATTERN_TYPE_KEYS` - Runtime array of all pattern types (for UI building)
 - Verbosity levels: `trace`, `debug`, `info`, `warn`, `error`
 
-
+11. **`fuzzy.ts`** - Arabic text normalization
 - `makeDiacriticInsensitive()` - Generate regex that ignores diacritics
 
-
+12. **`pattern-detection.ts`** - Token auto-detection (NEW)
 - `detectTokenPatterns()` - Detect tokens in text with positions
 - `generateTemplateFromText()` - Convert text to template string
 - `suggestPatternConfig()` - Suggest rule configuration
@@ -402,6 +410,8 @@ bunx biome lint .
 - `src/segmentation/replace.ts` (preprocessing parity)
 - `src/recovery.ts` (recovery implementation)
 
+10. **Prefer library utilities for UI tasks**: Instead of re-implementing rule merging, validation, or token mapping in client code, use `optimizeRules`, `validateRules`/`formatValidationReport`, and `applyTokenMappings`. They handle edge cases (like duplicate patterns, regex safety, or diacritic handling) that ad-hoc implementations might miss.
+
 ### Process Template (Multi-agent design review, TDD-first)
 
 If you want to repeat the “write a plan → get multiple AI critiques → synthesize → update plan → implement TDD-first” workflow, use:
package/README.md
CHANGED
@@ -158,6 +158,23 @@ const rules = [{
 | `template` | Depends | Custom pattern with full control |
 | `regex` | Depends | Raw regex for complex cases |
 
+#### Building UIs with Pattern Type Keys
+
+The library exports `PATTERN_TYPE_KEYS` (a const array) and `PatternTypeKey` (a type) for building UIs that let users select pattern types:
+
+```typescript
+import { PATTERN_TYPE_KEYS, type PatternTypeKey } from 'flappa-doormal';
+
+// PATTERN_TYPE_KEYS = ['lineStartsWith', 'lineStartsAfter', 'lineEndsWith', 'template', 'regex']
+
+// Build a dropdown/select
+PATTERN_TYPE_KEYS.map(key => <option value={key}>{key}</option>)
+
+// Type-safe validation
+const isPatternKey = (k: string): k is PatternTypeKey =>
+(PATTERN_TYPE_KEYS as readonly string[]).includes(k);
+```
+
 ### 4.1 Page-start Guard (avoid page-wrap false positives)
 
 When matching at line starts (e.g., `{{naql}}`), a new page can begin with a marker that is actually a **continuation** of the previous page (page wrap), not a true new segment.
@@ -495,8 +512,34 @@ const segments = segmentPages(pages, { rules });
 // ]
 ```
 
-## Rule
+## Rule Optimization
 
+Use `optimizeRules()` to automatically merge compatible rules, remove duplicate patterns, and sort rules by specificity (longest patterns first):
+
+```typescript
+import { optimizeRules } from 'flappa-doormal';
+
+const rules = [
+// These will be merged because meta/fuzzy options match
+{ lineStartsWith: ['{{kitab}}'], fuzzy: true, meta: { type: 'header' } },
+{ lineStartsWith: ['{{bab}}'], fuzzy: true, meta: { type: 'header' } },
+
+// This will be kept separate
+{ lineStartsAfter: ['{{numbered}}'], meta: { type: 'entry' } },
+];
+
+const { rules: optimized, mergedCount } = optimizeRules(rules);
+
+// Result:
+// optimized[0] = {
+// lineStartsWith: ['{{kitab}}', '{{bab}}'],
+// fuzzy: true,
+// meta: { type: 'header' }
+// }
+// optimized[1] = { lineStartsAfter: ['{{numbered}}'], ... }
+```
+
+## Rule Validation
 
 Use `validateRules()` to detect common mistakes in rule patterns before running segmentation:
 
@@ -512,6 +555,16 @@ const issues = validateRules([
 // issues[0]?.lineStartsAfter?.[0]?.type === 'missing_braces'
 // issues[1]?.lineStartsWith?.[0]?.type === 'unknown_token'
 // issues[2]?.lineStartsAfter?.[0]?.type === 'missing_braces'
+
+// To get a simple list of error strings for UI display:
+import { formatValidationReport } from 'flappa-doormal';
+
+const errors = formatValidationReport(issues);
+// [
+// 'Rule 1, lineStartsAfter: Missing {{}} around token "raqms:num"',
+// 'Rule 2, lineStartsWith: Unknown token "{{unknown}}"',
+// ...
+// ]
 ```
 
 **Checks performed:**
@@ -519,6 +572,25 @@ const issues = validateRules([
 - **Unknown tokens**: Flags tokens inside `{{}}` that don't exist (e.g., `{{nonexistent}}`)
 - **Duplicates**: Finds duplicate patterns within the same rule
 
+## Token Mapping Utilities
+
+When building UIs for rule editing, it's often useful to separate the *token pattern* (e.g., `{{raqms}}`) from the *capture name* (e.g., `{{raqms:hadithNum}}`).
+
+```typescript
+import { applyTokenMappings, stripTokenMappings } from 'flappa-doormal';
+
+// 1. Apply user-defined mappings to a raw template
+const template = '{{raqms}} {{dash}}';
+const mappings = [{ token: 'raqms', name: 'num' }];
+
+const result = applyTokenMappings(template, mappings);
+// result = '{{raqms:num}} {{dash}}'
+
+// 2. Strip captures to get back to the canonical pattern
+const raw = stripTokenMappings(result);
+// raw = '{{raqms}} {{dash}}'
+```
+
 ## Prompting LLMs / Agents to Generate Rules (Shamela books)
 
 ### Pre-analysis (no LLM required): generate “hints” from the book
package/dist/index.d.mts
CHANGED
@@ -218,6 +218,28 @@ type LineEndsWithPattern = {
 * - `lineEndsWith` - Match line endings
 */
 type PatternType = RegexPattern | TemplatePattern | LineStartsWithPattern | LineStartsAfterPattern | LineEndsWithPattern;
+/**
+* Pattern type key names for split rules.
+*
+* Use this array to dynamically iterate over pattern types in UIs,
+* or use the `PatternTypeKey` type for type-safe string unions.
+*
+* @example
+* // Build a dropdown/select in UI
+* PATTERN_TYPE_KEYS.map(key => <option value={key}>{key}</option>)
+*
+* @example
+* // Type-safe pattern key validation
+* const validateKey = (k: string): k is PatternTypeKey =>
+* (PATTERN_TYPE_KEYS as readonly string[]).includes(k);
+*/
+declare const PATTERN_TYPE_KEYS: readonly ["lineStartsWith", "lineStartsAfter", "lineEndsWith", "template", "regex"];
+/**
+* String union of pattern type key names.
+*
+* Derived from `PATTERN_TYPE_KEYS` to stay in sync automatically.
+*/
+type PatternTypeKey = (typeof PATTERN_TYPE_KEYS)[number];
 /**
 * Configuration for how and where to split content when a pattern matches.
 *
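The `PATTERN_TYPE_KEYS` dropdown example above uses JSX (`<option>`), and TypeScript only compiles JSX in a `.tsx` file. The following is a framework-free sketch of the same idea. It shows how the `(typeof X)[number]` derivation turns a readonly tuple into a string union, and how the type guard narrows untrusted input. The tuple is re-declared locally only so that the snippet runs on its own; in an application you would import `PATTERN_TYPE_KEYS` and `PatternTypeKey` from the package instead. `SelectOption` and `fromForm` are hypothetical names.

```typescript
// Local mirror of the declared tuple so this snippet is self-contained.
const PATTERN_TYPE_KEYS = ['lineStartsWith', 'lineStartsAfter', 'lineEndsWith', 'template', 'regex'] as const;

// Indexing a readonly tuple type by `number` yields the union of its element types.
type PatternTypeKey = (typeof PATTERN_TYPE_KEYS)[number];

// Widen to readonly string[] so .includes() accepts an arbitrary string, then narrow via the guard.
const isPatternKey = (k: string): k is PatternTypeKey =>
  (PATTERN_TYPE_KEYS as readonly string[]).includes(k);

// Plain option descriptors that any UI layer (DOM, React, Vue, ...) can render.
type SelectOption = { value: PatternTypeKey; label: string };
const patternTypeOptions: SelectOption[] = PATTERN_TYPE_KEYS.map((key) => ({ value: key, label: key }));

// Narrow a raw form value before using it as a rule key; unknown strings become undefined.
const fromForm = (raw: string): PatternTypeKey | undefined => (isPatternKey(raw) ? raw : undefined);

console.log(patternTypeOptions.map((o) => o.value).join(', '));
console.log(fromForm('regex')); // regex
console.log(fromForm('startsWith')); // undefined
```

Because `PatternTypeKey` is derived from the array rather than written out by hand, adding a pattern type to the tuple automatically extends the union and the options list.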
@@ -745,6 +767,46 @@ type Segment = {
 meta?: Record<string, unknown>;
 };
 //#endregion
+//#region src/segmentation/optimize-rules.d.ts
+/**
+* Result from optimizing rules.
+*/
+type OptimizeResult = {
+/** Optimized rules (merged and sorted by specificity) */
+rules: SplitRule[];
+/** Number of rules that were merged into existing rules */
+mergedCount: number;
+};
+/**
+* Optimize split rules by merging compatible rules and sorting by specificity.
+*
+* This function:
+* 1. **Merges compatible rules**: Rules with the same pattern type and identical
+* options (meta, fuzzy, min/max, etc.) have their pattern arrays combined
+* 2. **Deduplicates patterns**: Removes duplicate patterns within each rule
+* 3. **Sorts by specificity**: Rules with longer patterns come first
+*
+* Only array-based pattern types (`lineStartsWith`, `lineStartsAfter`, `lineEndsWith`)
+* can be merged. `template` and `regex` rules are kept separate.
+*
+* @param rules - Array of split rules to optimize
+* @returns Optimized rules and count of merged rules
+*
+* @example
+* import { optimizeRules } from 'flappa-doormal';
+*
+* const { rules, mergedCount } = optimizeRules([
+* { lineStartsWith: ['{{kitab}}'], fuzzy: true, meta: { type: 'header' } },
+* { lineStartsWith: ['{{bab}}'], fuzzy: true, meta: { type: 'header' } },
+* { lineStartsAfter: ['{{numbered}}'], meta: { type: 'entry' } },
+* ]);
+*
+* // rules[0] = { lineStartsWith: ['{{kitab}}', '{{bab}}'], fuzzy: true, meta: { type: 'header' } }
+* // rules[1] = { lineStartsAfter: ['{{numbered}}'], meta: { type: 'entry' } }
+* // mergedCount = 1
+*/
+declare const optimizeRules: (rules: SplitRule[]) => OptimizeResult;
+//#endregion
 //#region src/segmentation/pattern-validator.d.ts
 /**
 * Types of validation issues that can be detected.
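The `optimizeRules` JSDoc above describes three steps. The following is a minimal sketch of the first two: it merges rules that share an array-based pattern type and have identical remaining options, then deduplicates their patterns. It assumes options are compared by value using a key-order-insensitive JSON encoding. The third step, sorting, is omitted because the declaration does not specify how specificity is measured. `optimizeRulesSketch`, `SketchRule` and `stableStringify` are hypothetical names. This is not the package's implementation.

```typescript
// Hypothetical sketch of the merge + dedupe steps; not the flappa-doormal source.
type SketchRule = Record<string, unknown>;

// Per the JSDoc, only these pattern types can be merged; template/regex rules stay separate.
const ARRAY_KEYS = ['lineStartsWith', 'lineStartsAfter', 'lineEndsWith'] as const;

// Key-order-insensitive serialization, so { fuzzy, meta } and { meta, fuzzy } compare equal.
const stableStringify = (value: unknown): string => {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  return String(JSON.stringify(value));
};

const optimizeRulesSketch = (rules: SketchRule[]): { rules: SketchRule[]; mergedCount: number } => {
  const result: SketchRule[] = [];
  const groups = new Map<string, SketchRule>();
  let mergedCount = 0;

  for (const rule of rules) {
    const patternKey = ARRAY_KEYS.find((k) => Array.isArray(rule[k]));
    if (patternKey === undefined) {
      result.push({ ...rule }); // template/regex rules are never merged
      continue;
    }
    // Every property except the pattern array must match for two rules to merge.
    const options: SketchRule = {};
    for (const k of Object.keys(rule)) if (k !== patternKey) options[k] = rule[k];
    const groupKey = `${patternKey}|${stableStringify(options)}`;
    const patterns = rule[patternKey] as string[];

    const existing = groups.get(groupKey);
    if (existing) {
      (existing[patternKey] as string[]).push(...patterns);
      mergedCount++;
    } else {
      // Copy the pattern array so the caller's input is not mutated by later merges.
      const copy: SketchRule = { ...rule, [patternKey]: [...patterns] };
      groups.set(groupKey, copy);
      result.push(copy);
    }
  }

  // Deduplicate patterns within each array-based rule, keeping first-occurrence order.
  for (const rule of result) {
    for (const k of ARRAY_KEYS) {
      if (Array.isArray(rule[k])) rule[k] = Array.from(new Set(rule[k] as string[]));
    }
  }
  return { rules: result, mergedCount };
};
```

Applied to the three rules in the JSDoc example, the sketch returns two rules: a merged rule with `lineStartsWith: ['{{kitab}}', '{{bab}}']` and the unchanged `lineStartsAfter` rule, with `mergedCount` equal to 1. This matches the documented output, apart from any reordering that the real sort step may perform.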
@@ -757,6 +819,10 @@ type ValidationIssue = {
 type: ValidationIssueType;
 message: string;
 suggestion?: string;
+/** The token name involved in the issue (for unknown_token / missing_braces) */
+token?: string;
+/** The specific pattern involved (for duplicate) */
+pattern?: string;
 };
 /**
 * Validation result for a single rule, with issues keyed by pattern type.
@@ -788,6 +854,20 @@ type RuleValidationResult = {
 * // issues[1]?.lineStartsWith?.[0]?.type === 'unknown_token'
 */
 declare const validateRules: (rules: SplitRule[]) => (RuleValidationResult | undefined)[];
+/**
+* Formats a validation result array into a list of human-readable error messages.
+*
+* Useful for displaying validation errors in UIs.
+*
+* @param results - The result array from `validateRules()`
+* @returns Array of formatted error strings
+*
+* @example
+* const issues = validateRules(rules);
+* const errors = formatValidationReport(issues);
+* // ["Rule 1, lineStartsWith: Missing {{}} around token..."]
+*/
+declare const formatValidationReport: (results: (RuleValidationResult | undefined)[]) => string[];
 //#endregion
 //#region src/segmentation/replace.d.ts
 /**
@@ -1127,6 +1207,63 @@ declare const getAvailableTokens: () => string[];
 * getTokenPattern('unknown') // → undefined
 */
 declare const getTokenPattern: (tokenName: string) => string | undefined;
+/**
+* Checks if a pattern (or array of patterns) contains tokens that should
+* default to fuzzy matching.
+*
+* Fuzzy-default tokens are: bab, basmalah, fasl, kitab, naql
+*
+* @param patterns - Single pattern string or array of pattern strings
+* @returns `true` if any pattern contains a fuzzy-default token
+*
+* @example
+* shouldDefaultToFuzzy('{{bab}} الإيمان') // true
+* shouldDefaultToFuzzy('{{raqms}} {{dash}}') // false
+* shouldDefaultToFuzzy(['{{kitab}}', '{{raqms}}']) // true
+*/
+declare const shouldDefaultToFuzzy: (patterns: string | string[]) => boolean;
+/**
+* Structure for mapping a token to a capture name.
+*/
+type TokenMapping = {
+token: string;
+name: string;
+};
+/**
+* Apply token mappings to a template string.
+*
+* Transforms `{{token}}` into `{{token:name}}` based on the provided mappings.
+* Useful for applying user-configured capture names to a raw template.
+*
+* - Only affects exact matches of `{{token}}`.
+* - Does NOT affect tokens that already have a capture name (e.g. `{{token:existing}}`).
+* - Does NOT affect capture-only tokens (e.g. `{{:name}}`).
+*
+* @param template - The template string to transform
+* @param mappings - Array of mappings from token name to capture name
+* @returns Transformed template string with captures applied
+*
+* @example
+* applyTokenMappings('{{raqms}} {{dash}}', [{ token: 'raqms', name: 'num' }])
+* // → '{{raqms:num}} {{dash}}'
+*/
+declare const applyTokenMappings: (template: string, mappings: TokenMapping[]) => string;
+/**
+* Strip token mappings from a template string.
+*
+* Transforms `{{token:name}}` back into `{{token}}`.
+* Also transforms `{{:name}}` patterns (capture-only) into `{{}}` (which is invalid/empty).
+*
+* Useful for normalizing templates for storage or comparison.
+*
+* @param template - The template string to strip
+* @returns Template string with capture names removed
+*
+* @example
+* stripTokenMappings('{{raqms:num}} {{dash}}')
+* // → '{{raqms}} {{dash}}'
+*/
+declare const stripTokenMappings: (template: string) => string;
 //#endregion
 //#region src/analysis/line-starts.d.ts
 type LineStartAnalysisOptions = {
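The bullet rules in the `applyTokenMappings` and `stripTokenMappings` JSDoc comments above are specific enough to sketch. The following is an illustrative re-implementation that follows only those documented rules. It is not the package source, and `applyTokenMappingsSketch` and `stripTokenMappingsSketch` are hypothetical names. The sketch works by replacing only the literal `{{token}}` string, so `{{token:existing}}` and `{{:name}}` are left untouched. `TokenMapping` mirrors the shape declared above.

```typescript
// Mirrors the declared TokenMapping shape.
type TokenMapping = { token: string; name: string };

// Replace every literal `{{token}}` with `{{token:name}}`.
// `{{token:existing}}` and `{{:name}}` never contain the literal `{{token}}`, so they are left alone.
const applyTokenMappingsSketch = (template: string, mappings: TokenMapping[]): string =>
  mappings.reduce(
    (acc, { token, name }) => acc.split(`{{${token}}}`).join(`{{${token}:${name}}}`),
    template,
  );

// Drop the `:name` suffix from `{{token:name}}`; `{{:name}}` becomes `{{}}`, as the JSDoc notes.
const stripTokenMappingsSketch = (template: string): string =>
  template.replace(/\{\{([^{}:]*):[^{}]*\}\}/g, '{{$1}}');

const withCaptures = applyTokenMappingsSketch('{{raqms}} {{dash}}', [{ token: 'raqms', name: 'num' }]);
console.log(withCaptures); // {{raqms:num}} {{dash}}
console.log(stripTokenMappingsSketch(withCaptures)); // {{raqms}} {{dash}}
```

Using `split`/`join` instead of a regex for the apply step avoids having to escape token names that contain regex metacharacters.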
@@ -1325,5 +1462,5 @@ declare function recoverMistakenMarkersForRuns(runs: MarkerRecoveryRun[], opts?:
 segments: Segment[];
 };
 //#endregion
-export { type Breakpoint, type BreakpointRule, type CommonLineStartPattern, type DetectedPattern, type ExpandResult, type LineStartAnalysisOptions, type LineStartPatternExample, type Logger, type MarkerRecoveryReport, type MarkerRecoveryRun, type MarkerRecoverySelector, type Page, type PageRange, type RepeatingSequenceExample, type RepeatingSequenceOptions, type RepeatingSequencePattern, type ReplaceRule, type RuleValidationResult, type Segment, type SegmentationOptions, type SplitRule, TOKEN_PATTERNS, type ValidationIssue, type ValidationIssueType, analyzeCommonLineStarts, analyzeRepeatingSequences, analyzeTextForRule, applyReplacements, containsTokens, detectTokenPatterns, escapeRegex, escapeTemplateBrackets, expandCompositeTokensInTemplate, expandTokens, expandTokensWithCaptures, generateTemplateFromText, getAvailableTokens, getTokenPattern, makeDiacriticInsensitive, recoverMistakenLineStartsAfterMarkers, recoverMistakenMarkersForRuns, segmentPages, suggestPatternConfig, templateToRegex, validateRules };
+export { type Breakpoint, type BreakpointRule, type CommonLineStartPattern, type DetectedPattern, type ExpandResult, type LineStartAnalysisOptions, type LineStartPatternExample, type Logger, type MarkerRecoveryReport, type MarkerRecoveryRun, type MarkerRecoverySelector, type OptimizeResult, PATTERN_TYPE_KEYS, type Page, type PageRange, type PatternTypeKey, type RepeatingSequenceExample, type RepeatingSequenceOptions, type RepeatingSequencePattern, type ReplaceRule, type RuleValidationResult, type Segment, type SegmentationOptions, type SplitRule, TOKEN_PATTERNS, type TokenMapping, type ValidationIssue, type ValidationIssueType, analyzeCommonLineStarts, analyzeRepeatingSequences, analyzeTextForRule, applyReplacements, applyTokenMappings, containsTokens, detectTokenPatterns, escapeRegex, escapeTemplateBrackets, expandCompositeTokensInTemplate, expandTokens, expandTokensWithCaptures, formatValidationReport, generateTemplateFromText, getAvailableTokens, getTokenPattern, makeDiacriticInsensitive, optimizeRules, recoverMistakenLineStartsAfterMarkers, recoverMistakenMarkersForRuns, segmentPages, shouldDefaultToFuzzy, stripTokenMappings, suggestPatternConfig, templateToRegex, validateRules };
 //# sourceMappingURL=index.d.mts.map
package/dist/index.d.mts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.mts","names":[],"sources":["../src/segmentation/fuzzy.ts","../src/segmentation/types.ts","../src/segmentation/pattern-validator.ts","../src/segmentation/replace.ts","../src/segmentation/segmenter.ts","../src/segmentation/tokens.ts","../src/analysis/line-starts.ts","../src/analysis/repeating-sequences.ts","../src/detection.ts","../src/recovery.ts"],"sourcesContent":[],"mappings":";;AAkEA;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;
|
|
1
|
+
{"version":3,"file":"index.d.mts","names":[],"sources":["../src/segmentation/fuzzy.ts","../src/segmentation/types.ts","../src/segmentation/optimize-rules.ts","../src/segmentation/pattern-validator.ts","../src/segmentation/replace.ts","../src/segmentation/segmenter.ts","../src/segmentation/tokens.ts","../src/analysis/line-starts.ts","../src/analysis/repeating-sequences.ts","../src/detection.ts","../src/recovery.ts"],"sourcesContent":[],"mappings":";;AAkEA;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBY,cDnUC,WCmUG,EAAA,CAAA,CAAA,EAAA,MAAA,EAAA,GAAA,MAAA;AAoChB;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAwJA;;;;AC5tBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2DA;AAAyC,cJ0F5B,wBI1F4B,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;AJLzC;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;KArVK,YAAA,GAqViD;EAAe;EAkBzD,KAAA,EAAI,MAAA;AAoChB,CAAA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAwJA;;;;AC5tBA;AAsGA;;;;AC5GA;AAKA;AAcA;;KF0BK,eAAA,GExBkB;EACH;EACL,QAAA,EAAA,MAAA;CAAe;AA0G9B;AAwDA;;;;AC1LA;AA2DA;;;;;;;;AC+SA;;;;;;;;AClTA;AA0QA;AAsDA;AA2CA,KLvVK,qBAAA,GK0VJ;EAQW;EAuKC,cAAA,EAAA,MAAA,EAAA;AA6Cb,CAAA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBA;AA6BA;;;;AC/wBA;AAcA;AAEA;AAwQA;;;;;;;;AClRA;AAaA;AAOA;AA2OA;;;;KPjKK,sBAAA,GOoKsB;;;;ACjR3B;AA+EA;AAgEA;AAuBA;AAiCA;;;;AC7MA;AAKA;;;;;;AAOA;AA2BE;AAmnBF;;KT/gBK,mBAAA,GSihBS;EACD;EACC,YAAA,EAAA,MAAA,EAAA;CAGa;;;;AA+C3B;;;;;;;KTtjBK,WAAA,GACC,eACA,kBACA,wBACA,yBACA;;;;;;;;;;;;;;;;cAiBO;;;;;;KAOD,cAAA,WAAyB;;;;;;;KAYhC,aAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAiDO,SAAA;;;;;;;KAYP,eAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAyCS;;;;;;;;;;;;SAaH;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAuDC,SAAA,GAAY,cAAc,gBAAgB;;;;;;;;;;;;;KAkB1C,IAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAoCA,cAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;YAqCE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;KAqCF,UAAA,YAAsB;;;;;;;;;;;;;;;;;;;;;;;;;UA8BjB,MAAA;;;;;;;;;;;;;;;;;;;;;;KAuBL,WAAA;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;KA+CA,mBAAA;;;;;;YAME;;;;;;;;UASF;;;;;;;;;;;;;;cAiBY;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;gBA+CN;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;WAwDL;;;;;;;;;;;;;;;;KAiBD,OAAA;;;;;;;;;;;;;;;;;;;;;;;;;;SA6BD;;;;AA3nBa;;;AAkBlB,KChJM,cAAA,GDgJN;EACA;EACA,KAAA,EChJK,SDgJL,EAAA;EAAmB;EAiBZ,WAAA,EAAA,MAAA;AAOb,CAAA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAwJA;;;;AC5tBA;AAsGA;;;cAAa,uBAAwB,gBAAc;;;AD3FlC;AA4BG;AA8BM;AAyDrB,KEpIO,mBAAA,GFoIY,gBAAA,GAAA,eAAA,GAAA,WAAA,GAAA,eAAA;AAAA;;;AAkBlB,KEjJM,eAAA,GFiJN;EACA,IAAA,EEjJI,mBFiJJ;EACA,OAAA,EAAA,MAAA;EAAmB,UAAA,CAAA,EAAA,MAAA;EAiBZ;EAOD,KAAA,CAAA,EAAA,MAAA;EAYP;EAiDO,OAAA,CAAA,EAAA,MAAS;AAA6B,CAAA;AAyHlD;;;;AAAqE,KEnVzD,oBAAA,GFmVyD;EAkBzD,cAAI,CAAA,EAAA,CEpWM,eFoWN,GAAA,SAAA,CAAA,EAAA;EAoCJ,eAAA,CAAA,EAAc,CEvYH,eF4aT,GAAA,SAAS,CAAA,EAAA;EAqCX,YAAA,CAAU,EAAA,CEhdF,eFgdc,GAAA,SAAc,CAAA,EAAA;EA8B/B,QAAA,CAAM,EE7eR,eF6eQ;AAuBvB,CAAA;AA+CA;;;;;;;AAwJA;;;;AC5tBA;AAsGA;;;;AC5GA;AAKA;AAcA;AACsB,cA6GT,aA7GS,EAAA,CAAA,KAAA,EA6Ge,SA7Gf,EAAA,EAAA,GAAA,CA6G8B,oBA7G9B,GAAA,SAAA,CAAA,EAAA;;;;;AA6GtB;AAwDA;;;;AC1LA;AA2DA;;;AAAyE,cD+H5D,sBC/H4D,EAAA,CAAA,OAAA,EAAA,CD+HxB,oBC/HwB,GAAA,SAAA,CAAA,EAAA,EAAA,GAAA,MAAA,EAAA;;;AJLzE;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAuCtB,KGpJO,WAAA,GAAc,WHoJV,CGpJsB,mBHoJtB,CAAA,SAAA,CAAA,CAAA,CAAA,MAAA,CAAA;;;;;;;AAsBhB;AAOA;AAAgE;AA6DhE;AAYK,cG/LQ,iBHwOC,EAAA,CAAA,KAaH,EGrP8B,IHqPxB,EAAA,EAAA,KAAA,CAAA,EGrPwC,WHqPxC,EAAA,EAAA,GGrPwD,IHqPxD,EAAA;;;;;;;AAyEjB;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAwJA;;;;AC5tBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2DA;;;;;;;;AC+SA;;;;;;;;AClTA;AA0Qa,cDwCA,YCxCA,EAAA,CAAA,KAaZ,ED2BmC,IC3BnC,EAAA,EAAA,OAAA,ED2BoD,mBC3BpD,EAAA,GD2BuE,OC3BvE,EAAA;;;;ANzRD;AA+FA;;;;;ACnIiB;AA4BG;AA8BM;AAiCC;AAwBH;;;;;;;AAqCxB;AAOA;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;;;;;;;AAwJA;;;;AC5tBA;AAsGA;;;;AC5GA;AAKA;AAcA;;;;;;AA8GA;AAwDA;;;;AC1LA;AA2Da,cEHA,sBFyBZ,EAAA,CAAA,OAAA,EAAA,MAAA,EAAA,GAAA,MAAA;;;;;;;
;ACyRD;;;;;;;cCxCa;AA1Qb;AA0QA;AAsDA;AA2CA;AAWA;AAuKA;AA6CA;AAuBA;AAqBA;AAgBA;AA8BA;AAWA;AAoBA;AA6BA;;;;AC/wBA;AAcA;AAEA;AAwQA;;;;;cDuFa,gBAAgB;;;AEzW7B;AAaA;AAOA;AA2OA;;;;;;;;AC9QA;AA+EA;AAgEa,cHoRA,cGrQZ,EAAA,CAAA,KAfgE,EAAA,MAAA,EAAA,GAAA,OAAe;AAuBhF;AAiCA;;;;AC7MA;AAKY,KJ+aA,YAAA,GI/aiB;EAChB;;;;;EAMD,OAAA,EAAA,MAAA;EA6BP;AAinBL;;;;EAIc,YAAA,EAAA,MAAA,EAAA;EAGa;;;;AA+C3B;EACU,WAAA,EAAA,OAAA;CACsE;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;cJvHnE,mHAIV;;;;;;;;;;;;;;;;;;;;cAyCU;;;;;;;;;;;;;;;;;;;;;;cAuBA,uCAAmC;;;;;;;;;;;;;cAqBnC;;;;;;;;;;;;;;;cAgBA;;;;;;;;;;;;;;;cA8BA;;;;KAWD,YAAA;;;;;;;;;;;;;;;;;;;;;;cAoBC,iDAAkD;;;;;;;;;;;;;;;;cA6BlD;;;ANluBA,KO7CD,wBAAA,GP6C8E;EA+F7E,IAAA,CAAA,EAAA,MAAA;;;;ECnIR,WAAA,CAAA,EAAA,MAAY;EA4BZ,wBAAe,CAAA,EAAA,OAAA;EA8Bf,yBAAqB,CAAA,EAAA,OAAA;EAiCrB,MAAA,CAAA,EAAA,aAAA,GAAsB,OAAA;EAwBtB,UAAA,CAAA,EAAA,CAAA,IAAA,EAAA,MAAmB,EAAA,MAAA,EAAA,MAAA,EAAA,GAAA,OAAA;EAenB,cAAW,CAAA,EMjIK,MNiIL,EAAA;EACV,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;CACA;AACA,KMhIM,uBAAA,GNgIN;EACA,IAAA,EAAA,MAAA;EACA,MAAA,EAAA,MAAA;CAAmB;AAiBZ,KMjJD,sBAAA,GNiJwG;EAOxG,OAAA,EAAA,MAAA;EAYP,KAAA,EAAA,MAAA;EAiDO,QAAA,EMlNE,uBNkNO,EAAA;AAA6B,CAAA;AAyHlD;;;AAAsD,cMtEzC,uBNsEyC,EAAA,CAAA,KAAA,EMrE3C,INqE2C,EAAA,EAAA,OAAA,CAAA,EMpEzC,wBNoEyC,EAAA,GMnEnD,sBNmEmD,EAAA;;;AAlOjD,KOtHO,wBAAA,GPsHY;EAenB,WAAA,CAAA,EAAW,MAAA;EACV,WAAA,CAAA,EAAA,MAAA;EACA,QAAA,CAAA,EAAA,MAAA;EACA,IAAA,CAAA,EAAA,MAAA;EACA,yBAAA,CAAA,EAAA,OAAA;EACA,YAAA,CAAA,EAAA,OAAA;EAAmB,UAAA,CAAA,EAAA,OAAA,GAAA,OAAA;EAiBZ,WAAA,CAAA,EAAA,MAAuG;EAOxG,YAAA,CAAA,EAAA,MAAc;EAYrB,iBAAa,CAAA,EAAA,MAAA;AAiDlB,CAAA;AAYK,KO9NO,wBAAA,GPuQE;EAoEF,IAAA,EAAA,MAAS;EAAG,OAAA,EAAA,MAAA;EAAc,MAAA,EAAA,MAAA;EAAgB,YAAA,EAAA,MAAA,EAAA;CAAe;AAkBzD,KOtVA,wBAAA,GPsVI;EAoCJ,OAAA,EAAA,MAAA;EA0EA,KAAA,EAAA,MAAU;EA8BL,QAAA,EO/dH,wBP+dS,EAAA;AAuBvB,CAAA;;;;;;AAuMA;cOrda,mCACF,kBACG,6BACX;;;;AR3NH;AA+FA;;;;;ACnIiB;AA4BG;AA+Df,KQ7GO,eAAA,GR6Ge;EAwBtB;EAeA,KAAA,EAAA,MAAA;EACC;EACA,KAAA,EAAA,MAAA;EACA;EACA,KAAA,EAAA,MAAA;EACA;EAAmB,QAAA,EAAA,MAAA;AAiBzB,CAAA;AAO
A;AAAgE;AA6DhE;AAAkD;AAyHlD;;;;;AAkBA;AAoCA;AA0EA;AA8BA;AAuBA;AA+CA;AAMc,cQlgBD,mBRkgBC,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GQlgBkC,eRkgBlC,EAAA;;;;;;AAkJd;;;;AC5tBA;AAsGA;;;cOkCa,mDAAoD;AN9IjE;AAKA;AAcA;;;;AAIe,cM8IF,oBN9IE,EAAA,CAAA,QAAA,EM+ID,eN/IC,EAAA,EAAA,GAAA;EAAe,WAAA,EAAA,gBAAA,GAAA,iBAAA;EA0GjB,KAAA,EAAA,OAAA;EAwDA,QAAA,CAAA,EAAA,MAAA;;;;AC1Lb;AA2DA;;;AAAyE,cK4I5D,kBL5I4D,EAAA,CAAA,IAAA,EAAA,MAAA,EAAA,GAAA;EAAI,QAAA,EAAA,MAAA;;;;EC+ShE,QAAA,EI5JC,eJ6Nb,EAAA;CAjEmC,GAAA,IAAA;;;ALpTvB,KU5DD,sBAAA,GV4D8E;EA+F7E,IAAA,EAAA,cAAA;;;;ECnIR,KAAA,CAAA,EAAA,OAAY,GAAA,YAAA;EA4BZ,QAAA,EAAA,MAAA,EAAe;AAAA,CAAA,GA8Bf;EAiCA,IAAA,EAAA,WAAA;EAwBA,SAAA,EAAA,CAAA,IAAA,ESxIwC,STwIrB,EAAA,KAAA,EAAA,MAAA,EAAA,GAAA,OAAA;AAAA,CAAA;AAgBlB,KStJM,iBAAA,GTsJN;EACA,OAAA,EStJO,mBTsJP;EACA,KAAA,EStJK,ITsJL,EAAA;EACA,QAAA,EStJQ,OTsJR,EAAA;EACA,QAAA,EStJQ,sBTsJR;CAAmB;AAiBZ,KSpKD,oBAAA,GToKwG;EAOxG,OAAA,EAAA;IAYP,IAAA,EAAA,YAAa,GAAA,wBAAA;IAiDN,SAAS,EAAA,MAAA;IAYhB,aAAA,EAAe,MAAA;IA6GR,SAAS,EAAA,MAAA;IAAG,UAAA,EAAA,MAAA;EAAc,CAAA;EAAgB,KAAA,CAAA,ESzV1C,KTyV0C,CAAA;IAAe,SAAA,EAAA,MAAA;IAkBzD,QAAI,EAAA,MAAA;IAoCJ,aAAc,EAAA,MAAA;IA0Ed,UAAU,EAAA,MAAA;EA8BL,CAAA,CAAA;EAuBL,OAAA,ESxgBC,KTwgBU,CAAA;IA+CX,IAAA,EAAA,MAAA;IAME,oBAAA,EAAA,MAAA;IASF,sBAAA,CAAA,EAAA,MAAA;IAiBY,qBAAA,CAAA,EAAA,MAAA;IA+CN,YAAA,EAAA,MAAA;IAwDL,MAAA,EAAA,WAAA,GAAA,oBAAA,GAAA,WAAA,GAAA,sBAAA,GAAA,qBAAA;IAAM,QAAA,EAAA,OAAA,GAAA,QAAA,GAAA,MAAA;IAiBP,EAAA,CAAA,EAAA,MAAO;;;;EC5tBP,QAAA,EAAA,MAAA,EAAc;AAsG1B,CAAA;KQ1EK,oBAAA;iBAinBW,qCAAA,QACL,kBACG,oBACD,+BACC,4BPlpBd;;EALY,gBAAA,CAAA,EO0pBe,oBP1pBI;AAK/B,CAAA,CAAA,EAAY;EAcA,MAAA,EOyoBC,oBPzoBmB;EACV,QAAA,EOwoBuB,OPxoBvB,EAAA;CACC;AACH,iBOmrBJ,6BAAA,CPnrBI,IAAA,EOorBV,iBPprBU,EAAA,EAAA,IA2GpB,CA3GoB,EAAA;EACL,IAAA,CAAA,EAAA,YAAA,GAAA,wBAAA;EAAe,gBAAA,CAAA,EOorBkD,oBPprBlD;AA0G9B,CAAA,CAAA,EAAa;EAwDA,MAAA,EOmhBA,oBP1eZ;YO0e4C"}