npm - flappa-doormal - Versions diffs - 2.3.1 → 2.5.0 - Mend

flappa-doormal 2.3.1 → 2.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/AGENTS.md CHANGED

@@ -379,6 +379,8 @@ bunx biome lint .
 | `{{raqms}}` | Multiple Arabic-Indic numerals | ٧٥٦٣ |
 | `{{raqms:num}}` | Numerals with named capture | `meta.num = "٧٥٦٣"` |
 | `{{dash}}` | Various dash characters | - – — ـ |
+| `{{harfs}}` | Single-letter codes separated by spaces | `د ت س ي ق` |
+| `{{rumuz}}` | Rijāl/takhrīj source abbreviations (matches blocks like `خت ٤`, `خ سي`) | `خت ٤` |
 | `{{numbered}}` | Composite: `{{raqms}} {{dash}}` | ٧٥٦٣ - |
 **Named captures**: Add `:name` suffix to capture into `meta`:
@@ -387,3 +389,56 @@ bunx biome lint .
 // → segment.meta.hadithNum = "٧٥٦٣"
 ```
+## Page-start Guard (`pageStartGuard`)
+Some books contain page-wrap continuations: a new page begins with a common line-start marker (e.g. `{{naql}}`) that does not actually start a new segment.
+Use `pageStartGuard` on a rule to allow matches at the start of a page **only if** the previous page’s last non-whitespace character matches a pattern (tokens supported):
+```typescript
+{
+  fuzzy: true,
+  lineStartsWith: ['{{naql}}'],
+  split: 'at',
+  pageStartGuard: '{{tarqim}}'
+}
+```
+Notes:
+- Applies only at page starts; mid-page line starts are unaffected.
+- Implemented in the match-filtering step of `src/segmentation/segmenter.ts`.
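To make the guard rule concrete, here is a minimal sketch of the check described above. It is not the library's internal code: `passesPageStartGuard` and `TARQIM` are hypothetical names, `{{tarqim}}` is expanded by hand to the character class shown in the README token table, and the handling of an empty previous page is an assumption.

```typescript
// Hypothetical sketch of the documented page-start guard; not library internals.
// '{{tarqim}}' expanded by hand from the README token table: [.!?؟؛]
const TARQIM = '[.!?؟؛]';

/**
 * Returns true if a match at the very start of a page may split there, i.e. the
 * previous page's last non-whitespace character matches `guardPattern`.
 */
export function passesPageStartGuard(previousPage: string, guardPattern: string): boolean {
  // Array.from iterates by code point, so astral characters are not split in half.
  const last = Array.from(previousPage.trimEnd()).pop();
  // Assumption: an empty (or whitespace-only) previous page never satisfies the guard.
  if (last === undefined) return false;
  // Anchor the guard so it must match that single character exactly.
  return new RegExp(`^(?:${guardPattern})$`, 'u').test(last);
}

passesPageStartGuard('... حدثنا فلان.', TARQIM); // previous page ends a sentence: split allowed
passesPageStartGuard('... حدثنا فلان', TARQIM); // page wrap: treated as a continuation
```

A segmenter would consult such a check only for matches at offset 0 of a page; matches on later lines of the same page skip it, which is what "mid-page line starts are unaffected" means.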
+## Analysis Helper (`analyzeCommonLineStarts`)
+`analyzeCommonLineStarts(pages)` scans lines across pages and returns common template-like line-start signatures (tokenized with `TOKEN_PATTERNS`). It’s intended to help you quickly discover rule candidates without using an LLM.
+Useful options (recent additions):
+- **`sortBy`**: `'specificity'` (default) or `'count'` (highest-frequency first). `topK` is applied **after** sorting.
+- **`lineFilter`**: restrict which lines are analyzed (e.g. only Markdown headings).
+- **`prefixMatchers`**: consume syntactic prefixes before tokenization (default includes headings via `/^#+/u`).
+  - This lets you see the variations that follow a prefix such as `##`, instead of collapsing them all into the single signature `"##"`.
+- **`normalizeArabicDiacritics`**: `true` by default so tokens match diacritized forms (e.g. `وأَخْبَرَنَا` → `{{naql}}`).
+- **`whitespace`**: `'regex'` (default) joins tokens with `\s*` placeholders; `'space'` uses literal single spaces in returned signatures.
+Examples:
+```typescript
+import { analyzeCommonLineStarts } from 'flappa-doormal';
+// Top 20 by frequency
+const top20 = analyzeCommonLineStarts(pages, { sortBy: 'count', topK: 20 });
+// Only headings (## / ### / ...)
+const headings = analyzeCommonLineStarts(pages, {
+  lineFilter: (line) => line.startsWith('#'),
+  sortBy: 'count',
+});
+// Custom prefixes (e.g. blockquotes + headings)
+const quoted = analyzeCommonLineStarts(pages, {
+  lineFilter: (line) => line.startsWith('>') || line.startsWith('#'),
+  prefixMatchers: [/^>+/u, /^#+/u],
+  sortBy: 'count',
+});
+```
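The options above can be pictured as stages of signature building: normalize the line, consume syntactic prefixes, replace leading tokens, optionally fall back to the first word, then join with the chosen whitespace style. The sketch below is a simplified approximation under those assumptions; `buildSignature` and its four-entry token table are hypothetical and far smaller than the library's `TOKEN_PATTERNS`, and the exact matching order inside `analyzeCommonLineStarts` is not shown in this diff.

```typescript
// Hypothetical, heavily simplified token table (NOT the library's TOKEN_PATTERNS).
// Each entry must match at the start of the remaining text.
const SKETCH_TOKENS: Array<[string, RegExp]> = [
  ['naql', /^(?:وحدثنا|وأخبرنا|حدثنا|أخبرنا)(?=\s|$)/u],
  ['bab', /^باب(?=\s|$)/u],
  ['raqms', /^[\u0660-\u0669]+/u],
  ['dash', /^[-–—ـ]/u],
];

type SignatureOptions = {
  prefixMatchers?: RegExp[];
  normalizeArabicDiacritics?: boolean;
  whitespace?: 'regex' | 'space';
  includeFirstWordFallback?: boolean;
};

const escapeForRegex = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Remove tashkeel/tanween (U+064B to U+0652) and superscript alef (U+0670).
const stripDiacritics = (s: string): string => s.replace(/[\u064B-\u0652\u0670]/gu, '');

export function buildSignature(rawLine: string, opts: SignatureOptions = {}): string {
  const {
    prefixMatchers = [/^#+/u],
    normalizeArabicDiacritics = true,
    whitespace = 'regex',
    includeFirstWordFallback = true,
  } = opts;

  // Trim and collapse runs of whitespace to single spaces.
  let rest = rawLine.trim().replace(/\s+/gu, ' ');
  if (normalizeArabicDiacritics) rest = stripDiacritics(rest);

  const parts: string[] = [];

  // 1) Consume syntactic prefixes such as "##" or ">>", keeping them literally (escaped).
  for (const matcher of prefixMatchers) {
    const m = rest.match(matcher);
    if (m && m.index === 0 && m[0].length > 0) {
      parts.push(escapeForRegex(m[0]));
      rest = rest.slice(m[0].length).trimStart();
    }
  }

  // 2) Replace leading tokens with {{name}} placeholders until none matches.
  let tokenFound = false;
  let progressed = true;
  while (progressed && rest.length > 0) {
    progressed = false;
    for (const [name, re] of SKETCH_TOKENS) {
      const m = rest.match(re);
      if (m && m[0].length > 0) {
        parts.push(`{{${name}}}`);
        rest = rest.slice(m[0].length).trimStart();
        tokenFound = progressed = true;
        break;
      }
    }
  }

  // 3) Optionally fall back to the literal first word when no token matched.
  if (!tokenFound && includeFirstWordFallback && rest.length > 0) {
    parts.push(escapeForRegex(rest.split(' ')[0]));
  }

  // 4) Join with a regex whitespace placeholder or a literal space.
  return parts.join(whitespace === 'regex' ? '\\s*' : ' ');
}
```

For example, `buildSignature('## باب الصلاة')` returns the string `##\s*{{bab}}`, and `buildSignature('> وأَخْبَرَنَا مالك', { prefixMatchers: [/^>+/u, /^#+/u], whitespace: 'space' })` returns `> {{naql}}` because the diacritics are stripped before token matching.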

package/README.md CHANGED

@@ -90,7 +90,8 @@ Replace regex with readable tokens:
 | `{{raqm}}` | Single Arabic digit | `[\\u0660-\\u0669]` |
 | `{{dash}}` | Dash variants | `[-–—ـ]` |
 | `{{harf}}` | Arabic letter | `[أ-ي]` |
-| `{{harfs}}` | Arabic letters with spaces | `[أ-ي](?:[أ-ي\s]*[أ-ي])?` |
+| `{{harfs}}` | Single-letter codes separated by spaces | `[أ-ي](?:\s+[أ-ي])*` |
+| `{{rumuz}}` | Source abbreviations (rijāl/takhrīj rumuz), incl. multi-code blocks | e.g. `خت ٤`, `خ سي`, `خ فق`, `د ت سي ق` |
 | `{{numbered}}` | Hadith numbering `٢٢ - ` | `{{raqms}} {{dash}} ` |
 | `{{fasl}}` | Section markers | `فصل\|مسألة` |
 | `{{tarqim}}` | Punctuation marks | `[.!?؟؛]` |
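The `{{harfs}}` row changed from `[أ-ي](?:[أ-ي\s]*[أ-ي])?` to `[أ-ي](?:\s+[أ-ي])*`. A quick way to see the effect is to anchor both expansions and test a few strings; the sketch below (hypothetical helper `compareHarfs`) uses the two regexes exactly as listed in the table.

```typescript
// {{harfs}} expansions copied from the README table, anchored for whole-string tests.
const HARFS_2_3_1 = /^[أ-ي](?:[أ-ي\s]*[أ-ي])?$/u;
const HARFS_2_5_0 = /^[أ-ي](?:\s+[أ-ي])*$/u;

// Hypothetical helper: does each version's {{harfs}} accept the whole string?
export const compareHarfs = (s: string): { before: boolean; after: boolean } => ({
  before: HARFS_2_3_1.test(s),
  after: HARFS_2_5_0.test(s),
});

compareHarfs('د ت س ي ق'); // accepted by both
compareHarfs('حجاج بن دينار'); // old pattern also accepts a plain name; new one rejects it
compareHarfs('خ سي'); // two-letter code: rejected by the new pattern
```

The old form accepts any run of letters and spaces, so it can swallow ordinary words such as a narrator's name; the new form only accepts single letters separated by whitespace, which is presumably why multi-letter codes like `سي` are handled by `{{rumuz}}` instead.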
@@ -144,6 +145,26 @@ const rules = [{
 | `template` | Depends | Custom pattern with full control |
 | `regex` | Depends | Raw regex for complex cases |
+### 4.1 Page-start Guard (avoid page-wrap false positives)
+When matching at line starts (e.g., `{{naql}}`), a new page can begin with a marker that is actually a **continuation** of the previous page (page wrap), not a true new segment.
+Use `pageStartGuard` to allow a rule to match at the start of a page **only if** the previous page’s last non-whitespace character matches a pattern (tokens supported):
+```typescript
+const segments = segmentPages(pages, {
+  rules: [{
+    fuzzy: true,
+    lineStartsWith: ['{{naql}}'],
+    split: 'at',
+    // Only allow a split at the start of a new page if the previous page ended with sentence punctuation:
+    pageStartGuard: '{{tarqim}}'
+  }]
+});
+```
+This guard applies **only at page starts**. Mid-page line starts are unaffected.
 ### 5. Auto-Escaping Brackets
 In `lineStartsWith`, `lineStartsAfter`, `lineEndsWith`, and `template` patterns, parentheses `()` and square brackets `[]` are **automatically escaped**. This means you can write intuitive patterns without manual escaping:
@@ -296,19 +317,253 @@ const segments = segmentPages(pages, {
 ### Narrator Abbreviation Codes
-Use `{{harfs}}` for matching Arabic letter abbreviations with spaces (common in narrator biography books):
+Use `{{rumuz}}` for matching rijāl/takhrīj source abbreviations (common in narrator biography books and takhrīj notes):
 ```typescript
 const segments = segmentPages(pages, {
   rules: [{
-    lineStartsAfter: ['{{raqms:num}} {{harfs}}:'],
+    lineStartsAfter: ['{{raqms:num}} {{rumuz}}:'],
     split: 'at'
   }]
 });
-// Matches: ١١١٨ د ت سي ق: حجاج بن دينار
+// Matches: ١١١٨ ع: ... / ١١١٨ خ سي: ... / ١١١٨ خ فق: ...
 // meta: { num: '١١١٨' }
-// content: 'حجاج بن دينار' (abbreviations stripped)
+// content: '...' (the whole marker is stripped: number, rumuz and colon)
+```
+If your data uses *only single-letter codes separated by spaces* (e.g., `د ت س ي ق`), you can also use `{{harfs}}`.
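The narrator example relies on `lineStartsAfter` stripping the whole matched marker from `content` while moving named captures such as `{{raqms:num}}` into `meta`. The hypothetical `splitAtLineStartsAfter` below imitates that behavior with `split: 'at'` for a plain numbered marker (`{{raqms:num}}\s*{{dash}}\s*`, expanded by hand from the README token table). It does not model `{{rumuz}}`, whose exact expansion is not shown here, and it is not `segmentPages`.

```typescript
// Hypothetical mini-segmenter (NOT segmentPages) imitating lineStartsAfter + split: 'at'
// for the marker '{{raqms:num}}\s*{{dash}}\s*', expanded by hand from the token table.
type MiniSegment = { content: string; meta?: { num: string } };

const NUMBERED_MARKER = /^(?<num>[\u0660-\u0669]+)\s*[-–—ـ]\s*/u;

export function splitAtLineStartsAfter(text: string): MiniSegment[] {
  const segments: MiniSegment[] = [];
  for (const line of text.split('\n')) {
    const m = NUMBERED_MARKER.exec(line);
    const num = m?.groups?.num;
    if (m && num !== undefined) {
      // A marker starts a new segment: strip it from content, keep the number in meta.
      segments.push({ content: line.slice(m[0].length), meta: { num } });
    } else if (segments.length > 0) {
      // No marker: the line continues the current segment.
      segments[segments.length - 1].content += `\n${line}`;
    } else {
      // Text before the first marker becomes an unnumbered segment.
      segments.push({ content: line });
    }
  }
  return segments;
}

const segs = splitAtLineStartsAfter('٣٤ - حدثنا فلان\nتتمة الحديث\n٣٥ - حدثنا آخر');
// segs[0] → { content: 'حدثنا فلان\nتتمة الحديث', meta: { num: '٣٤' } }
```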
+## Analysis Helpers (no LLM required)
+Use `analyzeCommonLineStarts(pages)` to discover common line-start signatures across a book, useful for rule authoring:
+```typescript
+import { analyzeCommonLineStarts } from 'flappa-doormal';
+const patterns = analyzeCommonLineStarts(pages);
+// [{ pattern: "{{numbered}}", count: 1234, examples: [...] }, ...]
+```
+You can control **what gets analyzed** and **how results are ranked**:
+```typescript
+import { analyzeCommonLineStarts } from 'flappa-doormal';
+// Top 20 most common line-start signatures (by frequency)
+const topByCount = analyzeCommonLineStarts(pages, {
+  sortBy: 'count',
+  topK: 20,
+});
+// Only analyze markdown H2 headings (lines beginning with "##")
+// This shows what comes AFTER the heading marker (e.g. "## {{bab}}", "## {{numbered}}\\[", etc.)
+const headingVariants = analyzeCommonLineStarts(pages, {
+  lineFilter: (line) => line.startsWith('##'),
+  sortBy: 'count',
+  topK: 40,
+});
+// Support additional prefix styles without changing library code
+// (e.g. markdown blockquotes ">> ..." + headings)
+const quotedHeadings = analyzeCommonLineStarts(pages, {
+  lineFilter: (line) => line.startsWith('>') || line.startsWith('#'),
+  prefixMatchers: [/^>+/u, /^#+/u],
+  sortBy: 'count',
+  topK: 40,
+});
+```
+Key options:
+- `sortBy`: `'specificity'` (default) or `'count'` (highest frequency first)
+- `lineFilter`: restrict which lines are counted (e.g. only headings)
+- `prefixMatchers`: consume syntactic prefixes (default includes headings via `/^#+/u`) so you can see variations *after* the prefix
+- `normalizeArabicDiacritics`: `true` by default (helps token matching like `وأَخْبَرَنَا` → `{{naql}}`)
+- `whitespace`: how whitespace is represented in returned patterns:
+  - `'regex'` (default): uses `\s*` placeholders between tokens
+  - `'space'`: uses literal single spaces (`' '`) between tokens (useful if you don't want `\s` to match newlines when you later reuse these patterns)
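Ranking can be sketched as filter, sort, then slice, consistent with the documented rule that `topK` is applied after sorting. The comparator order follows the `sortBy` JSDoc in `dist/index.d.mts` (specificity means `tokenCount`, then `literalLen`, then count); the `Candidate` fields `tokenCount` and `literalLen` are assumed names for internal bookkeeping, and descending order for every key is an assumption. Defaults (`topK` 20, `minCount` 3) come from the typings.

```typescript
// Shape based on CommonLineStartPattern, plus two HYPOTHETICAL specificity fields.
type Candidate = {
  pattern: string;
  count: number;
  tokenCount: number;
  literalLen: number;
};

type RankOptions = { sortBy?: 'specificity' | 'count'; topK?: number; minCount?: number };

// More tokens first, then longer literal text.
const bySpecificity = (a: Candidate, b: Candidate): number =>
  b.tokenCount - a.tokenCount || b.literalLen - a.literalLen;

export function rankPatterns(candidates: Candidate[], opts: RankOptions = {}): Candidate[] {
  const { sortBy = 'specificity', topK = 20, minCount = 3 } = opts;
  const compare =
    sortBy === 'count'
      ? (a: Candidate, b: Candidate) => b.count - a.count || bySpecificity(a, b)
      : (a: Candidate, b: Candidate) => bySpecificity(a, b) || b.count - a.count;
  // filter() returns a new array, so sort() does not mutate the caller's input.
  return candidates
    .filter((c) => c.count >= minCount)
    .sort(compare)
    .slice(0, topK);
}
```

Because `slice` runs last, switching `sortBy` changes which patterns survive the `topK` cut, not just their order.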
+## Prompting LLMs / Agents to Generate Rules (Shamela books)
+### Pre-analysis (no LLM required): generate “hints” from the book
+Before prompting an LLM, you can quickly extract **high-signal pattern hints** from the book using:
+- `analyzeCommonLineStarts(pages, options)` (from `src/analysis.ts`): common **line-start signatures** (tokenized)
+- `analyzeTextForRule(text)` / `detectTokenPatterns(text)` (from `src/detection.ts`): turn a **single representative line** into a token template suggestion
+These help the LLM avoid guessing and focus on the patterns actually present.
+#### Step 1: top line-start signatures (frequency-first)
+```typescript
+import { analyzeCommonLineStarts } from 'flappa-doormal';
+const top = analyzeCommonLineStarts(pages, {
+  sortBy: 'count',
+  topK: 40,
+  minCount: 10,
+});
+console.log(top.map((p) => ({ pattern: p.pattern, count: p.count, example: p.examples[0] })));
+```
+Typical output (example):
+```text
+[
+  { pattern: "{{numbered}}", count: 1200, example: { pageId: 50, line: "١ - حَدَّثَنَا ..." } },
+  { pattern: "{{bab}}",      count:  180, example: { pageId: 66, line: "باب ..." } },
+  { pattern: "##\\s*{{bab}}", count:  140, example: { pageId: 69, line: "## باب ..." } }
+]
+```
+If you only want to analyze headings (to see what comes *after* `##`):
+```typescript
+const headingVariants = analyzeCommonLineStarts(pages, {
+  lineFilter: (line) => line.startsWith('##'),
+  sortBy: 'count',
+  topK: 40,
+});
+```
+#### Step 2: convert a few representative lines into token templates
+Pick 3–10 representative line prefixes from the book (often from the examples returned above) and run:
+```typescript
+import { analyzeTextForRule } from 'flappa-doormal';
+console.log(analyzeTextForRule("٢٩- خ سي: أحمد بن حميد ..."));
+// -> { template: "{{raqms}}- {{rumuz}}: أحمد...", patternType: "lineStartsAfter", fuzzy: false, ... }
+```
+#### Step 3: paste the “hints” into your LLM prompt
+When you prompt the LLM, include a short “Hints” section:
+- Top 20–50 `analyzeCommonLineStarts` patterns (with counts + 1–2 examples)
+- 3–10 `analyzeTextForRule(...)` results
+- A small sample of pages (not the full book)
+Then instruct the LLM to **prioritize rules that align with those hints**.
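Step 3 can be automated with a small formatter that turns `analyzeCommonLineStarts` results into a plain-text block for the prompt. `formatHints` is a hypothetical helper; the types are local copies of `CommonLineStartPattern` and `LineStartPatternExample` from the 2.5.0 typings so the sketch compiles on its own.

```typescript
// Local copies of the result types declared in dist/index.d.mts (2.5.0).
type LineStartPatternExample = { line: string; pageId: number };
type CommonLineStartPattern = { pattern: string; count: number; examples: LineStartPatternExample[] };

// Hypothetical helper: render patterns + a few examples as a "Hints" section.
export function formatHints(patterns: CommonLineStartPattern[], examplesPerPattern = 2): string {
  const lines = ['Hints (common line-start signatures):'];
  for (const p of patterns) {
    lines.push(`- ${p.pattern} (count: ${p.count})`);
    // Keep the prompt short: only the first few examples per pattern.
    for (const ex of p.examples.slice(0, examplesPerPattern)) {
      lines.push(`    e.g. page ${ex.pageId}: ${ex.line}`);
    }
  }
  return lines.join('\n');
}
```

The `analyzeTextForRule(...)` results from Step 2 can be appended to the same block, for instance with `JSON.stringify`.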
+You can use an LLM to generate `SegmentationOptions` by giving it a random subset of pages and asking it to infer robust segmentation rules. Here’s a ready-to-copy plain-text prompt:
+```text
+You are helping me generate JSON configuration for a text-segmentation function called segmentPages(pages, options).
+It segments Arabic book pages (e.g., Shamela) into logical segments (books/chapters/sections/entries/hadiths).
+I will paste a random subset of pages so you can infer patterns. Each page has:
+- id: page number (not necessarily consecutive)
+- content: plain text; line breaks are \n
+Output ONLY a JSON object compatible with SegmentationOptions (no prose, no code fences).
+SegmentationOptions shape:
+- rules: SplitRule[]
+- optional: maxPages, breakpoints, prefer
+SplitRule constraints:
+- Each rule must use exactly ONE of: lineStartsWith, lineStartsAfter, lineEndsWith, template, regex
+- Optional fields: split ("at" | "after"), meta, min, max, exclude, occurrence ("first" | "last"), fuzzy
+Important behaviors:
+- lineStartsAfter matches at line start but strips the marker from segment.content.
+- Template patterns (lineStartsWith/After/EndsWith/template) auto-escape ()[] outside tokens.
+- Raw regex patterns do NOT auto-escape and can include groups, named captures, etc.
+Available tokens you may use in templates:
+- {{basmalah}}  (بسم الله / ﷽)
+- {{kitab}}     (كتاب)
+- {{bab}}       (باب)
+- {{fasl}}      (فصل | مسألة)
+- {{naql}}      (حدثنا/أخبرنا/... narration phrases)
+- {{raqm}}      (single Arabic-Indic digit)
+- {{raqms}}     (Arabic-Indic digits)
+- {{dash}}      (dash variants)
+- {{tarqim}}    (punctuation [. ! ? ؟ ؛])
+- {{harf}}      (Arabic letter)
+- {{harfs}}     (single-letter codes separated by spaces; e.g. "د ت س ي ق")
+- {{rumuz}}     (rijāl/takhrīj source abbreviations; matches blocks like "خت ٤", "خ سي", "خ فق")
+Named captures:
+- {{raqms:num}} captures to meta.num
+- {{:name}} captures arbitrary text to meta.name
+Your tasks:
+1) Identify document structure from the sample:
+   - book headers (كتاب), chapter headers (باب), sections (فصل/مسألة), hadith numbering, biography entries, etc.
+2) Propose a minimal but robust ordered ruleset:
+   - Put most-specific rules first.
+   - Use fuzzy:true for Arabic headings where diacritics vary.
+   - Use lineStartsAfter when you want to remove the marker (e.g., hadith numbers, rumuz prefixes).
+3) Use constraints:
+   - Use min/max/exclude when front matter differs or specific pages are noisy.
+4) If segments can span many pages:
+   - Set maxPages and breakpoints.
+   - Suggested breakpoints (in order): "{{tarqim}}\\s*", "\\n", "" (page boundary)
+   - Prefer "longer" unless there’s a reason to prefer shorter segments.
+5) Capture useful metadata:
+   - For numbering patterns, capture the number into meta.num (e.g., {{raqms:num}}).
+Examples (what good answers look like):
+Example A: hadith-style numbered segments
+Input pages:
+PAGE 10:
+٣٤ - حَدَّثَنَا ...\n... (rest of hadith)
+PAGE 11:
+٣٥ - حَدَّثَنَا ...\n... (rest of hadith)
+Good JSON answer:
+{
+  "rules": [
+    {
+      "lineStartsAfter": ["{{raqms:num}} {{dash}}\\s*"],
+      "split": "at",
+      "meta": { "type": "hadith" }
+    }
+  ]
+}
+Example B: chapter markers + hadith numbers
+Input pages:
+PAGE 50:
+كتاب الصلاة\nباب فضل الصلاة\n١ - حَدَّثَنَا ...\n...
+PAGE 51:
+٢ - حَدَّثَنَا ...\n...
+Good JSON answer:
+{
+  "rules": [
+    { "fuzzy": true, "lineStartsWith": ["{{kitab}}"], "split": "at", "meta": { "type": "book" } },
+    { "fuzzy": true, "lineStartsWith": ["{{bab}}"], "split": "at", "meta": { "type": "chapter" } },
+    { "lineStartsAfter": ["{{raqms:num}}\\s*{{dash}}\\s*"], "split": "at", "meta": { "type": "hadith" } }
+  ]
+}
+Example C: narrator/rijāl entries with rumuz (codes) + colon
+Input pages:
+PAGE 257:
+٢٩- خ سي: أحمد بن حميد...\nوكان من حفاظ الكوفة.
+PAGE 258:
+١٠٢- ق: تمييز ولهم شيخ آخر...\n...
+Good JSON answer:
+{
+  "rules": [
+    {
+      "lineStartsAfter": ["{{raqms:num}}\\s*{{dash}}\\s*{{rumuz}}:\\s*"],
+      "split": "at",
+      "meta": { "type": "entry" }
+    }
+  ]
+}
+Now wait for the pages.
 ```
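Because the prompt states hard constraints, it can be worth checking an LLM's answer before passing it to `segmentPages`. The sketch below (hypothetical `validateRules`) checks only the constraints listed in the prompt: a top-level `rules` array, exactly one pattern key per rule, and the allowed `split`/`occurrence` values. It does not validate tokens or regex syntax, and it is not part of the library.

```typescript
// Pattern keys from the prompt's SplitRule constraints.
const PATTERN_KEYS = ['lineStartsWith', 'lineStartsAfter', 'lineEndsWith', 'template', 'regex'] as const;

// Hypothetical helper: returns a list of problems (empty array = looks valid).
export function validateRules(json: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch (e) {
    return [`invalid JSON: ${(e as Error).message}`];
  }
  if (typeof parsed !== 'object' || parsed === null || !Array.isArray((parsed as { rules?: unknown }).rules)) {
    return ['top-level "rules" array is missing'];
  }
  const errors: string[] = [];
  (parsed as { rules: unknown[] }).rules.forEach((rule, i) => {
    if (typeof rule !== 'object' || rule === null) {
      errors.push(`rules[${i}] is not an object`);
      return;
    }
    const r = rule as Record<string, unknown>;
    const used = PATTERN_KEYS.filter((k) => k in r);
    if (used.length !== 1) {
      errors.push(`rules[${i}] must use exactly one of ${PATTERN_KEYS.join(', ')} (found ${used.length})`);
    }
    if ('split' in r && r.split !== 'at' && r.split !== 'after') {
      errors.push(`rules[${i}].split must be "at" or "after"`);
    }
    if ('occurrence' in r && r.occurrence !== 'first' && r.occurrence !== 'last') {
      errors.push(`rules[${i}].occurrence must be "first" or "last"`);
    }
  });
  return errors;
}
```

Feeding the error list back to the LLM ("fix these problems and answer with JSON only") is usually enough to repair a malformed answer.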
 ### Sentence-Based Splitting (Last Period Per Page)

package/dist/index.d.mts CHANGED

@@ -360,6 +360,27 @@ type RuleConstraints = {
    * - undefined: No fallback (current behavior)
    */
   fallback?: 'page';
+  /**
+   * Page-start guard: only allow this rule to match at the START of a page if the
+   * previous page's last non-whitespace character matches this pattern.
+   *
+   * This is useful for avoiding false positives caused purely by page wrap.
+   *
+   * Example use-case:
+   * - Split on `{{naql}}` at line starts (e.g. "أخبرنا ...")
+   * - BUT if a new page starts with "أخبرنا ..." and the previous page did NOT
+   *   end with sentence-ending punctuation, treat it as a continuation and do not split.
+   *
+   * Notes:
+   * - This guard applies ONLY at page starts, not mid-page line starts.
+   * - This is a template pattern (tokens allowed). It is checked against the LAST
+   *   non-whitespace character of the previous page's content.
+   *
+   * @example
+   * // Allow split at page start only if previous page ends with sentence punctuation
+   * { lineStartsWith: ['{{naql}}'], fuzzy: true, pageStartGuard: '{{tarqim}}' }
+   */
+  pageStartGuard?: string;
 };
 /**
  * A complete split rule combining pattern, behavior, and constraints.
@@ -720,7 +741,6 @@ type Segment = {
 };
 //#endregion
 //#region src/segmentation/segmenter.d.ts
 /**
  * Applies breakpoints to oversized segments.
  *
@@ -779,25 +799,6 @@ type Segment = {
  */
 declare const segmentPages: (pages: Page[], options: SegmentationOptions) => Segment[];
 //#endregion
-//#region src/segmentation/textUtils.d.ts
-/**
- * Strip all HTML tags from content, keeping only text.
- *
- * @param html - HTML content
- * @returns Plain text content
- */
-declare const stripHtmlTags: (html: string) => string;
-/**
- * Normalizes line endings to Unix-style (`\n`).
- *
- * Converts Windows (`\r\n`) and old Mac (`\r`) line endings to Unix style
- * for consistent pattern matching across platforms.
- *
- * @param content - Raw content with potentially mixed line endings
- * @returns Content with all line endings normalized to `\n`
- */
-declare const normalizeLineEndings: (content: string) => string;
-//#endregion
 //#region src/segmentation/tokens.d.ts
 /**
  * Token-based template system for Arabic text pattern matching.
@@ -1039,7 +1040,90 @@ declare const getAvailableTokens: () => string[];
  */
 declare const getTokenPattern: (tokenName: string) => string | undefined;
 //#endregion
-//#region src/pattern-detection.d.ts
+//#region src/analysis.d.ts
+type LineStartAnalysisOptions = {
+  /** Return top K patterns (after filtering). Default: 20 */
+  topK?: number;
+  /** Only consider the first N characters of each trimmed line. Default: 60 */
+  prefixChars?: number;
+  /** Ignore lines shorter than this (after trimming). Default: 6 */
+  minLineLength?: number;
+  /** Only include patterns that appear at least this many times. Default: 3 */
+  minCount?: number;
+  /** Keep up to this many example lines per pattern. Default: 5 */
+  maxExamples?: number;
+  /**
+   * If true, include a literal first word when no token match is found at the start.
+   * Default: true
+   */
+  includeFirstWordFallback?: boolean;
+  /**
+   * If true, strip Arabic diacritics (harakat/tashkeel) for the purposes of matching tokens.
+   * This helps patterns like `وأَخْبَرَنَا` match the `{{naql}}` token (`وأخبرنا`).
+   *
+   * Note: examples are still stored in their original (unstripped) form.
+   *
+   * Default: true
+   */
+  normalizeArabicDiacritics?: boolean;
+  /**
+   * How to sort patterns before applying `topK`.
+   *
+   * - `specificity` (default): prioritize more structured prefixes first (tokenCount, then literalLen), then count.
+   * - `count`: prioritize highest-frequency patterns first, then specificity.
+   */
+  sortBy?: 'specificity' | 'count';
+  /**
+   * Optional filter to restrict which lines are analyzed.
+   *
+   * The `line` argument is the trimmed + whitespace-collapsed version of the line.
+   * Return `true` to include it, `false` to skip it.
+   *
+   * @example
+   * // Only analyze markdown H2 headings
+   * { lineFilter: (line) => line.startsWith('## ') }
+   */
+  lineFilter?: (line: string, pageId: number) => boolean;
+  /**
+   * Optional list of prefix matchers to consume before tokenization.
+   *
+   * This is for "syntactic" prefixes that are common at line start but are not
+   * meaningful as tokens by themselves (e.g. markdown headings like `##`).
+   *
+   * Each matcher is applied at the current position. If it matches, the matched
+   * text is appended (escaped) to the signature and the scanner advances.
+   *
+   * @example
+   * // Support markdown blockquotes and headings
+   * { prefixMatchers: [/^>+/u, /^#+/u] }
+   */
+  prefixMatchers?: RegExp[];
+  /**
+   * How to represent whitespace in returned `pattern` signatures.
+   *
+   * - `regex` (default): use `\\s*` placeholders between tokens (useful if you paste patterns into regex-ish templates).
+   * - `space`: use literal single spaces (`' '`) between tokens (safer if you don't want `\\s` to match newlines when reused as regex).
+   */
+  whitespace?: 'regex' | 'space';
+};
+type LineStartPatternExample = {
+  line: string;
+  pageId: number;
+};
+type CommonLineStartPattern = {
+  pattern: string;
+  count: number;
+  examples: LineStartPatternExample[];
+};
+/**
+ * Analyze pages and return the most common line-start patterns (top K).
+ *
+ * This is a pure algorithmic heuristic: it tokenizes common prefixes into a stable
+ * template-ish string using the library tokens (e.g., `{{bab}}`, `{{raqms}}`, `{{rumuz}}`).
+ */
+declare const analyzeCommonLineStarts: (pages: Page[], options?: LineStartAnalysisOptions) => CommonLineStartPattern[];
+//#endregion
+//#region src/detection.d.ts
 /**
  * Pattern detection utilities for recognizing template tokens in Arabic text.
  * Used to auto-detect patterns from user-highlighted text in the segmentation dialog.
@@ -1114,5 +1198,5 @@ declare const analyzeTextForRule: (text: string) => {
   detected: DetectedPattern[];
 } | null;
 //#endregion
-export { type Breakpoint, type BreakpointRule, type DetectedPattern, type ExpandResult, type Logger, type Page, type PageRange, type Segment, type SegmentationOptions, type SplitRule, TOKEN_PATTERNS, analyzeTextForRule, containsTokens, detectTokenPatterns, escapeRegex, escapeTemplateBrackets, expandTokens, expandTokensWithCaptures, generateTemplateFromText, getAvailableTokens, getTokenPattern, makeDiacriticInsensitive, normalizeLineEndings, segmentPages, stripHtmlTags, suggestPatternConfig, templateToRegex };
+export { type Breakpoint, type BreakpointRule, type CommonLineStartPattern, type DetectedPattern, type ExpandResult, type LineStartAnalysisOptions, type LineStartPatternExample, type Logger, type Page, type PageRange, type Segment, type SegmentationOptions, type SplitRule, TOKEN_PATTERNS, analyzeCommonLineStarts, analyzeTextForRule, containsTokens, detectTokenPatterns, escapeRegex, escapeTemplateBrackets, expandTokens, expandTokensWithCaptures, generateTemplateFromText, getAvailableTokens, getTokenPattern, makeDiacriticInsensitive, segmentPages, suggestPatternConfig, templateToRegex };
 //# sourceMappingURL=index.d.mts.map
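The export list above drops `stripHtmlTags` and `normalizeLineEndings`, and their declarations (the `src/segmentation/textUtils.d.ts` region) are removed. Code that imported them from the package root would need a local replacement. The sketch below follows the removed JSDoc for `normalizeLineEndings` (convert `\r\n` and `\r` to `\n`); the `stripHtmlTags` version is a naive regex approximation that may differ from the removed implementation: it does not decode entities and must not be used for sanitization.

```typescript
// Local stand-in following the removed JSDoc: Windows (\r\n) and old Mac (\r)
// line endings become Unix (\n).
export const normalizeLineEndings = (content: string): string => content.replace(/\r\n?/g, '\n');

// Naive approximation: drops anything that looks like a tag, keeps the text between tags.
// Entities such as &amp; are left as-is; not an HTML sanitizer.
export const stripHtmlTags = (html: string): string => html.replace(/<[^>]*>/g, '');
```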

package/dist/index.d.mts.map CHANGED

@@ -1 +1 @@
(Single-line minified source map regenerated; the encoded `mappings` string is omitted here. The `sources` list changed from `../src/segmentation/textUtils.ts`, `../src/segmentation/tokens.ts`, `../src/pattern-detection.ts` to `../src/segmentation/tokens.ts`, `../src/analysis.ts`, `../src/detection.ts`; `fuzzy.ts`, `types.ts` and `segmenter.ts` under `../src/segmentation/` appear in both versions.)