flappa-doormal 2.19.0 → 2.21.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/AGENTS.md CHANGED
@@ -30,6 +30,17 @@ src/
  │ ├── breakpoints.ts # Breakpoint types
  │ ├── options.ts # SegmentationOptions and Logger
  │ └── segmenter.ts # Internal segmenter types
+ ├── dictionary/ # Dictionary-specific compiler, runtime, profiles, diagnostics
+ │ ├── arabic-dictionary-rule.ts
+ │ ├── constants.ts
+ │ ├── dictionary-blockers.ts
+ │ ├── dictionary-candidates.ts
+ │ ├── dictionary-diagnostics.ts
+ │ ├── dictionary-zones.ts
+ │ ├── heading-classifier.ts
+ │ ├── profile.ts
+ │ ├── profiles.ts
+ │ └── runtime.ts
  ├── analysis/ # Pattern discovery module
  │ ├── line-starts.ts # analyzeCommonLineStarts (frequent line markers)
  │ ├── repeating-sequences.ts # analyzeRepeatingSequences (N-grams)
@@ -56,6 +67,22 @@ src/
  ├── detection.ts # Pattern auto-detection (standalone)
  └── *.test.ts # Unit and integration tests (co-located)

+ testing/
+ ├── exports.test.ts # Public export contract test
+ └── fixtures/
+     ├── README.md # Fixture purpose, source, and refresh workflow
+     ├── dictionary-book-options.ts # Local golden options for the four reference dictionaries
+     ├── dictionary-books.ts # Test fixture loader helpers
+     ├── dictionary-fixture-manifest.ts
+     └── dictionary-books/ # Extracted markdown pages used by integration tests
+
+ scripts/
+ ├── analyze-dictionary-profile.ts # Full-book diagnostics against an explicit input file/books dir
+ ├── export-dictionary-book-options.ts
+ ├── extract-dictionary-test-fixtures.ts
+ ├── generate-dictionary-html-previews.ts
+ └── split-dictionary-csvs.ts
+
  ### Core Components

  1. **`segmentPages(pages, options)`** - Main entry point (`src/segmentation/segmenter.ts`)
@@ -100,6 +127,24 @@ src/
  - `options.ts`: Comprehensive `SegmentationOptions` and `Logger` definitions
  - `index.ts`: Public API types for consumers

+ ### Dictionary Blocker Notes
+
+ - `previousWord.scope` defaults to `'samePage'` and only checks the same page's
+   preceding Arabic word unless you opt into cross-page behavior.
+ - `previousWord.scope: 'pageStart'` only runs for page-start candidates and
+   compares against the previous page's last Arabic word, skipping the check when
+   the previous page ends with strong sentence punctuation.
+ - `previousWord.scope: 'any'` combines the page-start cross-page check with the
+   usual same-page check for non-page-start candidates.
+ - `pageContinuation.authorityPrecision` defaults to `'high'`; set it to
+   `'aggressive'` when page-start continuation blocking should treat
+   authority-like prefixes more conservatively.
+ - `qualifierTail` and `structuralLeak` are intentionally non-configurable global
+   safety checks. They run before zone blockers and appear in diagnostics as
+   rejection reasons.
+ - `diagnoseDictionaryProfile()` now reports `rejectionReasons` rather than
+   the former `blockerHits`.
+
  11. **`textUtils.ts`** - Low-level helpers (`src/utils/textUtils.ts`)
      - `makeDiacriticInsensitive()`: Arabic-aware regex generation
      - `adjustForUnicodeBoundary()`: Prevents invalid splits across multi-character clusters
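The `previousWord.scope` semantics in the hunk above can be modeled as plain logic. This is an illustrative sketch only: the type and function names below are hypothetical, not the library's actual blocker types.

```typescript
// Hypothetical model of the documented previousWord.scope behavior.
// 'samePage' checks only the same page's preceding word; 'pageStart' checks
// only the previous page's last word (for page-start candidates); 'any'
// combines both checks.
type PreviousWordScope = 'samePage' | 'pageStart' | 'any';

function checksFor(scope: PreviousWordScope = 'samePage'): string[] {
  switch (scope) {
    case 'samePage':
      return ['same-page previous word'];
    case 'pageStart':
      return ['previous-page last word (page-start candidates only)'];
    case 'any':
      return [
        'previous-page last word (page-start candidates only)',
        'same-page previous word',
      ];
  }
}

console.log(checksFor());      // default scope is 'samePage'
console.log(checksFor('any')); // both checks apply
```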
@@ -386,7 +431,8 @@ The original `segmentPages` had complexity 37 (max: 15). Extraction:

  - **Unit tests**: Each utility function has dedicated tests
  - **Integration tests**: Full pipeline tests in `src/segmentation/segmenter.test.ts`
- - **Real-world tests**: `src/segmentation/segmenter.bukhari.test.ts` uses actual hadith data
+ - **Dictionary integration tests**: `src/dictionary/*.test.ts` use extracted markdown fixtures under `testing/fixtures/dictionary-books/`
+ - **Optional corpus tooling**: full-book diagnostics/preview scripts can use external Shamela JSONs via `--input` or `--books-dir`, but the test suite does not require a local `books/` directory
  - **Style convention**: Prefer `it('should ...', () => { ... })` (Bun) for consistency across the suite
  - Run: `bun test`

@@ -395,7 +441,7 @@ The original `segmentPages` had complexity 37 (max: 15). Extraction:
  1. **TypeScript strict mode** - No `any` types
  2. **Biome linting** - Max complexity 15 per function (some exceptions exist)
  3. **JSDoc comments** - All exported functions documented
- 4. **Test coverage** - 642 tests across 21 files
+ 4. **Test coverage** - keep coverage representative; do not rely on local corpora for CI

  ## Dependencies

@@ -417,14 +463,26 @@ bun test
  bun run build
  # Output: dist/index.mjs (~17 KB gzip ~5.7 KB)

- # Run performance test (generates 50K pages, measures segmentation speed/memory)
- bun run perf
+ # Run performance tests
+ bun run test:perf
+
+ # Regenerate extracted dictionary test fixtures (requires external books dir if not using ./books)
+ bun run dictionary:extract-fixtures -- --books-dir /path/to/books
+
+ # Export built-in dictionary options (writes to out/dictionary-options by default)
+ bun run dictionary:export-options
+
+ # Scan a full book with a builtin dictionary profile
+ bun run dictionary:scan -- --book 1687 --input /path/to/1687.json
+
+ # Validate a dictionary profile shape in userland
+ # (public API: validateDictionaryProfile(profile))

  # Format code
  bunx biome format --write .

  # Lint code
- bunx biome lint .
+ bunx biome check .
  ```

  ## Lessons Learned
@@ -605,12 +663,6 @@ bunx biome lint .

  57. **Validation Hints Specificity**: Generic error hints like "Check segmenter.ts" are unhelpful. Provide specific file names and logical components (e.g., "Check maxPages windowing in breakpoint-processor.ts"). User-friendly validation reports guide debugging much faster than "Something is wrong".

- ### Process Template (Multi-agent design review, TDD-first)
-
- If you want to repeat the “write a plan → get multiple AI critiques → synthesize → update plan → implement TDD-first” workflow, use:
-
- - `docs/ai-multi-agent-tdd-template.md`
-
  ### Architecture Insights

  - **Declarative > Imperative**: Users describe patterns, library handles regex
package/README.md CHANGED
@@ -349,12 +349,100 @@ const segments = segmentPages(pages, {
  If the previous page ends with strong sentence punctuation (`.`, `!`, `?`, `؟`, `؛`),
  the stoplist guard is skipped and the page-start match is allowed.

- #### Arabic Dictionary Helper
+ #### Preferred Dictionary Profile

- Use `createArabicDictionaryEntryRule()` to build a conservative rule for Arabic
- dictionaries with lemma capture, stopword filtering, and page-wrap protection.
- The helper now returns a serializable native `dictionaryEntry` rule rather than
- an eagerly-compiled regex blob:
+ For new Shamela-style dictionary work, prefer the top-level `dictionary`
+ profile over hand-built raw regexes or the older one-rule helper:
+
+ ```typescript
+ import { segmentPages } from 'flappa-doormal';
+
+ const segments = segmentPages(pages, {
+   breakpoints: ['{{tarqim}}'],
+   dictionary: {
+     version: 2,
+     zones: [{
+       name: 'main',
+       blockers: [
+         { appliesTo: ['lineEntry', 'inlineSubentry'], use: 'pageContinuation' },
+         { appliesTo: ['lineEntry', 'inlineSubentry'], use: 'intro' },
+         {
+           appliesTo: ['lineEntry', 'inlineSubentry'],
+           use: 'stopLemma',
+           words: ['ومعناه', 'ويقال', 'وقيل']
+         },
+       ],
+       families: [
+         { classes: ['chapter'], emit: 'chapter', use: 'heading' },
+         { emit: 'entry', use: 'lineEntry', wrappers: 'none' },
+         { emit: 'entry', prefixes: ['و'], stripPrefixesFromLemma: false, use: 'inlineSubentry' },
+       ],
+     }],
+   },
+   maxPages: 1,
+ });
+ ```
+
+ Why this is preferred:
+ - serializable JSON authoring shape
+ - profile-scoped blockers instead of giant regex blobs
+ - zone support for books that change layout later
+ - compatible with diagnostics tooling via `diagnoseDictionaryProfile()`
+ - first-class validation via `validateDictionaryProfile()`
+
+ Blocker authoring notes:
+ - `previousWord.scope` defaults to `'samePage'`
+ - set `scope: 'pageStart'` to compare only against the previous page's last
+   Arabic word for page-start candidates
+ - set `scope: 'any'` to combine the page-start cross-page check with the normal
+   same-page check
+ - `pageContinuation.authorityPrecision` defaults to `'high'`; use
+   `'aggressive'` when page-start continuation filtering should treat
+   authority-like prefixes more conservatively
+ - `qualifierTail` and `structuralLeak` are always-on global safety checks and
+   show up in diagnostics even though they are not zone-declared blockers
+
+ The production dictionary implementation now lives under `src/dictionary/`
+ inside the repo, separate from the generic segmentation internals.
+
+ Dictionary runtime semantics:
+ - `segmentPages()` is still the only entry point; dictionary profiles do not use
+   a separate API
+ - dictionary split points are merged with ordinary `rules`
+ - when a rule split and a dictionary split land at the same offset, metadata is
+   merged; if `debug` is enabled, `_flappa.rule` and `_flappa.dictionary` can
+   both appear on the same segment
+ - for dictionary-only configs, content before the first detected entry/chapter
+   is preserved as a leading segment with no dictionary metadata
+
+ #### Advanced: Single-Rule Arabic Dictionary Matching
+
+ `createArabicDictionaryEntryRule()` and the native `dictionaryEntry` rule shape
+ are still supported as the lower-level, advanced path for clients who want one
+ Arabic dictionary-style matcher inside a broader `rules` pipeline.
+
+ Use this path when:
+ - you need exactly one conservative dictionary headword rule
+ - you want to compose it with ordinary `SplitRule[]`
+ - you do not need profile zones, per-family blockers, or full-book tuning
+
+ Prefer the top-level `dictionary` profile when:
+ - segmenting an entire dictionary book
+ - persisting JSON config for a corpus
+ - the book changes layout in different sections
+ - you need diagnostics, rejection-reason rates, or book-specific profile tuning
+
+ Decision guide:
+
+ | Use case | Preferred API |
+ |----------|---------------|
+ | One conservative lemma matcher inside a normal segmentation pipeline | `createArabicDictionaryEntryRule()` / `dictionaryEntry` |
+ | Full-book dictionary segmentation with blockers, families, and zones | top-level `dictionary` |
+ | Persisted JSON config for real books | top-level `dictionary` |
+ | Advanced composition with other `SplitRule[]` rules | `createArabicDictionaryEntryRule()` / `dictionaryEntry` |
+
+ The helper returns a serializable native `dictionaryEntry` rule rather than an
+ eagerly-compiled regex blob:

  ```typescript
  import { createArabicDictionaryEntryRule, segmentPages } from 'flappa-doormal';
@@ -400,6 +488,193 @@ Behavior:
  - Can match comma-separated headword lists like `سبد، دبس:` when enabled
  - Can suppress same-page false positives like `جلّ وعزّ:` with `samePagePrevWordStoplist`

+ Option notes:
+ - `stopWords`
+   - exact lemma-level blockers for non-lexical heads like `وقيل` or `ويقال`
+   - use this for rejecting candidate headwords themselves
+ - `pageStartPrevWordStoplist`
+   - blocks a page-start candidate when the previous page ends with one of these
+     words
+   - useful for page-wrap false positives after citation/introduction prose
+ - `samePagePrevWordStoplist`
+   - blocks a same-page candidate when the previous local word matches
+   - useful for phrases like `جلّ وعزّ`
+ - `allowParenthesized`
+   - enables heads like `(عنبر):`
+ - `allowWhitespaceBeforeColon`
+   - enables spacing variants like `عنبر :`
+ - `allowCommaSeparated`
+   - enables grouped heads like `سبد، دبس:`
+ - `midLineSubentries`
+   - when `true`, allows conservative same-line subentries such as `والعزاء:`
+   - when `false`, only line-start/page-start heads are emitted
+
+ Serialization tradeoff:
+ - `dictionaryEntry` is serializable and safe to keep in JSON
+ - but it is still a single-rule primitive
+ - if you need corpus-wide blocker tuning, families, or zones, move up to the
+   top-level `dictionary` profile
+
+ Example: compose with chapter rules
+
+ ```typescript
+ import { createArabicDictionaryEntryRule, segmentPages } from 'flappa-doormal';
+
+ const segments = segmentPages(pages, {
+   rules: [
+     { lineStartsAfter: ['## '], meta: { type: 'chapter' } },
+     {
+       fuzzy: true,
+       lineStartsAfter: ['{{bab}} '],
+       meta: { type: 'chapter' },
+     },
+     createArabicDictionaryEntryRule({
+       stopWords: ['وقيل', 'ويقال', 'قال'],
+       pageStartPrevWordStoplist: ['قال', 'وقيل', 'ويقال'],
+       samePagePrevWordStoplist: ['جل'],
+       allowCommaSeparated: true,
+     }),
+   ],
+   breakpoints: ['{{tarqim}}'],
+   maxPages: 1,
+ });
+ ```
+
+ Example: one-off advanced rule inside a non-dictionary pipeline
+
+ ```typescript
+ import { createArabicDictionaryEntryRule, segmentPages } from 'flappa-doormal';
+
+ const segments = segmentPages(pages, {
+   rules: [
+     { lineStartsWith: ['{{kitab}}'], meta: { type: 'book' } },
+     { lineStartsWith: ['{{bab}}'], meta: { type: 'chapter' } },
+     createArabicDictionaryEntryRule({
+       stopWords: ['وقيل', 'ويقال'],
+       midLineSubentries: false,
+       allowParenthesized: true,
+     }),
+   ],
+ });
+ ```
+
+ Use `createArabicDictionaryEntryRule()` or `dictionaryEntry` when you only need
+ one conservative dictionary matcher and want it to behave like a normal
+ `SplitRule`.
+
+ For full-book dictionary profiling, diagnostics, and book-specific tuning,
+ prefer the top-level `dictionary` contract above.
+
+ #### Repo Fixture Book Options
+
+ The repo keeps book-specific golden options for the four reference Shamela
+ dictionaries as local test/support fixtures, not as part of the public package
+ API.
+
+ If you want standalone JSON copies of those fixture options for your own local
+ workflow, export them on demand:
+
+ ```bash
+ bun run dictionary:export-options
+ bun run dictionary:export-options -- --out-dir /path/to/dictionary-options
+ ```
+
+ By default this writes to `out/dictionary-options/`, which is not intended to
+ be checked into the repo.
+
+ #### Dictionary Diagnostics
+
+ Use `diagnoseDictionaryProfile()` when tuning blockers and families for a
+ dictionary profile:
+
+ ```typescript
+ import { diagnoseDictionaryProfile } from 'flappa-doormal';
+
+ const diagnostics = diagnoseDictionaryProfile(pages, profile, {
+   sampleLimit: 25,
+ });
+
+ console.log(diagnostics.rejectionReasons);
+ console.log(diagnostics.rejectedLemmas.slice(0, 10));
+ ```
+
+ Returned diagnostics include:
+ - accepted vs rejected candidate counts
+ - accepted counts by `kind`
+ - accepted/rejected counts by family and zone
+ - rejection-reason counts (`intro`, `stopLemma`, `pageContinuation`,
+   `qualifierTail`, `structuralLeak`, etc.)
+ - top rejected lemmas
+ - sampled accepted/rejected candidates for quick inspection
+
+ `diagnoseDictionaryProfile()` is primarily a tuning API for profile authoring,
+ so consumers should treat its output shape as less stable than the segmentation
+ API itself.
+
+ Validate profiles before persisting them or shipping them to an editor/CI step:
+
+ ```typescript
+ import { validateDictionaryProfile } from 'flappa-doormal';
+
+ const issues = validateDictionaryProfile(profile);
+ if (issues.length > 0) {
+   console.error(issues);
+ }
+ ```
+
+ Validation catches:
+ - empty or duplicate zones
+ - invalid gate shapes
+ - empty blocker lists
+ - inert heading families (for example, a heading family that emits `entry` but
+   never matches `entry` headings)
+
+ The runtime throws `DictionaryProfileValidationError` if invalid profiles reach
+ `segmentPages()` or `diagnoseDictionaryProfile()`.
+
+ #### Dictionary Surface Analysis
+
+ For corpus exploration and profile authoring, the library also exposes the
+ heading/surface scanner used during the proposal phase:
+
+ ```typescript
+ import {
+   analyzeDictionaryMarkdownPages,
+   classifyDictionaryHeading,
+   scanDictionaryMarkdownPage,
+ } from 'flappa-doormal';
+
+ const kind = classifyDictionaryHeading('## (خَ غ)');
+ const pageMatches = scanDictionaryMarkdownPage(page);
+ const report = analyzeDictionaryMarkdownPages(pages);
+ ```
+
+ Use these for:
+ - inspecting `convertContentToMarkdown()` output before profile authoring
+ - spotting structural marker/code lines
+ - building your own authoring tools around the same heading classifier
+
+ These are analysis helpers, not a replacement for the full runtime.
+
+ For full-book scans, use the bundled script:
+
+ ```bash
+ bun run dictionary:scan -- --book 1687 --input /path/to/1687.json
+ bun run dictionary:scan -- --book 7031 --books-dir /path/to/books --json
+ bun run dictionary:scan -- --book 1687 --input /path/to/1687.json --out diagnostics/1687.txt
+ ```
+
+ The scan script:
+ - reads an explicit `--input` file or resolves `<books-dir>/<book>.json`
+ - converts each page with `convertContentToMarkdown()`
+ - applies `removeZeroWidth`
+ - runs `diagnoseDictionaryProfile()` with the repo-local golden profile fixture
+   for that book
+
+ The test suite does not require the full Shamela corpora. It uses extracted
+ markdown fixtures under `testing/fixtures/dictionary-books/`, so moving your
+ local `books/` directory will not break CI or the built-in tests.
+
  #### Dictionary Letter-Code Lines

  For dictionary-specific letter-code lines like `ك ش ن` or `(هـ ث)`, use
@@ -561,11 +836,6 @@ Pass an optional `logger` to trace segmentation decisions or enable `debug` to a
  const segments = segmentPages(pages, {
    rules: [...],
    debug: true, // Enables detailed match metadata
-   logger: {
-     debug: (msg, data) => console.log(`[DEBUG] ${msg}`, data),
-     info: (msg, data) => console.info(`[INFO] ${msg}`, data),
-     warn: (msg, data) => console.warn(`[WARN] ${msg}`, data),
-     error: (msg, data) => console.error(`[ERROR] ${msg}`, data),
    logger: {
      debug: (msg, data) => console.log(`[DEBUG] ${msg}`, data),
      info: (msg, data) => console.info(`[INFO] ${msg}`, data),
@@ -620,7 +890,35 @@ If a segment was created by a `breakpoint` pattern (e.g. because it exceeded `ma
  }
  }

- **3. Safety Fallback Splits (`maxContentLength`)**
+ **3. Dictionary-based Splits**
+ If a segment was created by a dictionary profile:
+ ```json
+ {
+   "meta": {
+     "_flappa": {
+       "dictionary": {
+         "family": "lineEntry"
+       }
+     }
+   }
+ }
+ ```
+
+ Heading-driven dictionary splits can also record the heading class:
+ ```json
+ {
+   "meta": {
+     "_flappa": {
+       "dictionary": {
+         "family": "heading",
+         "headingClass": "chapter"
+       }
+     }
+   }
+ }
+ ```
+
+ **4. Safety Fallback Splits (`maxContentLength`)**
  If no rule or breakpoint matched and the library was forced to perform a safety fallback split:
  ```json
  {
1099
1397
  // ]
1100
1398
  ```
1101
1399
 
1400
+ ## Agent Advisor Workflow
1401
+
1402
+ If you want an AI agent to start from raw pages and get to a draft configuration with less hand-written glue, use `suggestSegmentationOptions()`:
1403
+
1404
+ ```typescript
1405
+ import { suggestSegmentationOptions } from 'flappa-doormal';
1406
+
1407
+ const report = suggestSegmentationOptions(pages, {
1408
+ maxRules: 4,
1409
+ topLineStarts: 12,
1410
+ topRepeatingSequences: 8,
1411
+ });
1412
+
1413
+ console.log(report.assessment);
1414
+ console.log(report.recommendedOptions);
1415
+ console.log(report.ruleSuggestions.slice(0, 5));
1416
+ ```
1417
+
1418
+ The report includes:
1419
+
1420
+ - preprocess cleanup hints (`removeZeroWidth`, `condenseEllipsis`, `fixTrailingWaw`)
1421
+ - an assessment of whether the book looks `structured`, `continuous`, or `mixed`
1422
+ - draft `SplitRule[]` suggestions with examples and confidence
1423
+ - a ready-to-run `recommendedOptions` object
1424
+ - rule validation output
1425
+ - self-evaluation of the generated segmentation draft
1426
+ - optional breakpoint suggestions when the draft still produces very large segments
1427
+
1428
+ For local JSON files, you can run the bundled script:
1429
+
1430
+ ```bash
1431
+ bun run segment:advise -- --input ./pages.json
1432
+ bun run segment:advise -- --input ./book.json --format markdown --out ./segmentation-report.md
1433
+ ```
1434
+
1435
+ Input can be either:
1436
+
1437
+ - `Page[]`
1438
+ - `{ pages: Page[] }`
1439
+
1440
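Accepting both of those input shapes is a one-line normalization. This is a hedged sketch: `PageLike` stands in for the library's real `Page` type, and `normalizePages` is an illustrative helper, not an exported API.

```typescript
// Sketch: normalize the two documented advisor input shapes to a page array.
// PageLike is a placeholder; the real Page type has more fields.
interface PageLike {
  id: number;
  content: string;
}

type AdvisorInput = PageLike[] | { pages: PageLike[] };

function normalizePages(input: AdvisorInput): PageLike[] {
  // A bare array is Page[]; otherwise unwrap the { pages } envelope.
  return Array.isArray(input) ? input : input.pages;
}

const direct: AdvisorInput = [{ id: 1, content: 'baab' }];
const wrapped: AdvisorInput = { pages: [{ id: 1, content: 'baab' }] };
console.log(normalizePages(direct).length, normalizePages(wrapped).length); // 1 1
```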
+ ## MCP Server
+
+ The repo now includes a stdio MCP server wrapper for agent workflows:
+
+ ```bash
+ bun run mcp:serve
+ ```
+
+ When packaged, the server binary is:
+
+ ```bash
+ flappa-doormal-mcp
+ ```
+
+ Exposed MCP tools:
+
+ - `inspect_book`
+   Input: `{ pages, advisorOptions? }`
+   Returns preprocess detections, line-start analysis, repeating sequences, and draft rule suggestions.
+ - `suggest_segmentation_options`
+   Input: `{ pages, advisorOptions? }`
+   Returns the full advisor report, including `recommendedOptions`.
+ - `preview_segmentation`
+   Input: `{ pages, options, sampleSegments? }`
+   Runs segmentation and returns segments, samples, and validation.
+ - `validate_segmentation`
+   Input: `{ pages, options, segments }`
+   Validates caller-provided segments against the source book.
+ - `score_candidate_options`
+   Input: `{ pages, candidates, sampleSegments? }`
+   Ranks multiple `SegmentationOptions` candidates using validation and segment-shape heuristics.
+
+ All tool results are returned as JSON-friendly objects so agents can iterate without scraping prose output.
+
  ## Advanced: Metadata Extraction & Data Migration

  If you already have pre-segmented data (e.g., records from a database or JSON file) and want to use **flappa-doormal's** token system to extract metadata and clean the content without further splitting, you can use the **Metadata Extraction** pattern.