flappa-doormal 2.14.0 → 2.14.2
- package/AGENTS.md +23 -0
- package/README.md +12 -0
- package/dist/index.mjs +51 -6
- package/dist/index.mjs.map +1 -1
- package/package.json +1 -1
package/AGENTS.md
CHANGED
|
@@ -583,6 +583,29 @@ bunx biome lint .
|
|
|
583
583
|
|
|
584
584
|
42. **Use `assertNever` for exhaustive switches**: When switching on union types (like `PreprocessTransform`), add a `default` case that calls `assertNever(x: never)` which throws. TypeScript will error at compile time if a new union member is added but not handled.
|
|
585
585
|
|
|
586
|
+
43. **`words` field matches partial words**: The `words` field generates `\s+(?:word1|word2)`, which matches any text *starting with* the word rather than complete words only. `words: ['ثم']` will match `ثمامة` (a name). **Solution**: Add a trailing space for whole-word matching: `words: ['ثم ']`.
|
|
587
|
+
|
|
588
|
+
44. **Breakpoints are only applied when content EXCEEDS limits**: Per the documented behavior, breakpoints split segments that exceed `maxPages` or `maxContentLength`. If content fits within both limits, breakpoints should NOT be applied. Tests that expect breakpoint splits on already-compliant content have incorrect expectations.
|
|
589
|
+
|
|
590
|
+
45. **`maxPages=0` + `maxContentLength` interaction is subtle**: When both constraints are set:
|
|
591
|
+
- Check if remaining content on the CURRENT PAGE fits within `maxContentLength`
|
|
592
|
+
- If yes AND remaining content spans multiple pages: create segment for current page, advance to next page
|
|
593
|
+
- If no (content exceeds length): apply breakpoints as normal
|
|
594
|
+
- Bug symptom: adding a second page caused first page to be over-split into tiny fragments (e.g., 147, 229, 65 chars instead of ~1800 chars)
|
|
595
|
+
- Root cause: code checked ALL remaining content's span (crossing pages) instead of just current page's content
|
|
596
|
+
|
|
597
|
+
46. **Minimal regression tests must trigger the bug path**: When fixing bugs, create tests that:
|
|
598
|
+
- Use realistic data sizes that exceed thresholds
|
|
599
|
+
- Include the specific constraint combination that triggered the bug (e.g., `maxPages=0` + `maxContentLength` + multiple pages)
|
|
600
|
+
- Assert on segment COUNT and LENGTHS, not just "no crashes"
|
|
601
|
+
- Would FAIL without the fix (tiny fragments) and PASS with it (normal segments)
|
|
602
|
+
|
|
603
|
+
47. **Existing test expectations can be wrong**: When a fix causes existing tests to fail, investigate whether the test expectation matches documented behavior. The test `should not merge the pages when content overlaps between pages` expected 4 segments, but the correct count is 3 (per documented semantics). Update tests to match correct behavior; don't revert fixes to match incorrect tests.
|
|
604
|
+
|
|
605
|
+
48. **The "adding content changes behavior" smell**: If adding unrelated content (like a second page) dramatically changes how the first page is processed, suspect incorrect span/window calculations. The fix pattern: ensure window calculations are scoped to the CURRENT context (current page) not the ORIGINAL context (all remaining content).
|
|
606
|
+
|
|
607
|
+
49. **Use `trimStart()` not `trim()` for user-provided patterns with semantic whitespace**: When processing user-provided patterns like the `words` field, only strip leading whitespace (likely accidental). Trailing whitespace may be intentional for whole-word matching (e.g., `'بل '` should match only the standalone word, not words starting with `بل` like `بلغ`). **Bug symptom**: `words: ['بل ']` matched `بلغ` because `.trim()` stripped the trailing space to just `بل`. **Fix**: Use `.trimStart()` to preserve trailing whitespace.
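The page-scoped check described in items 45 and 48 above can be sketched as follows. This is a simplified illustration, not the library's code: `fitsWithoutBreakpoints`, the `Scope` type, and the plain string pages are hypothetical stand-ins, and the real implementation works on cursor positions and boundary offsets instead.

```typescript
// Hypothetical, simplified model: each page is a plain string.
type Scope = 'currentPage' | 'allRemaining';

// Returns true when the chosen window can be emitted as one segment
// without applying breakpoints.
const fitsWithoutBreakpoints = (
    pages: string[],
    pageIdx: number,
    maxPages: number,
    maxContentLength: number,
    scope: Scope,
): boolean => {
    // 'allRemaining' models the buggy window (everything left, across pages);
    // 'currentPage' models the fixed window (only the page being processed).
    const lastIdx = scope === 'currentPage' ? pageIdx : pages.length - 1;
    const windowPages = pages.slice(pageIdx, lastIdx + 1);
    const span = lastIdx - pageIdx; // number of page boundaries crossed
    const length = windowPages.join(' ').trim().length;
    return length > 0 && span <= maxPages && length <= maxContentLength;
};

// Page 1 alone is within the length limit; page 2 is unrelated to it.
const pages = ['a'.repeat(1800), 'b'.repeat(1500)];

const scopedFit = fitsWithoutBreakpoints(pages, 0, 0, 2000, 'currentPage');
const unscopedFit = fitsWithoutBreakpoints(pages, 0, 0, 2000, 'allRemaining');

console.log({ scopedFit, unscopedFit }); // { scopedFit: true, unscopedFit: false }
```

With the unscoped window, adding the second page makes the first page fail the fit check even though its own content is unchanged, which is the "adding content changes behavior" smell from item 48.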
|
|
608
|
+
|
|
586
609
|
### Process Template (Multi-agent design review, TDD-first)
|
|
587
610
|
|
|
588
611
|
If you want to repeat the “write a plan → get multiple AI critiques → synthesize → update plan → implement TDD-first” workflow, use:
|
package/README.md
CHANGED
|
@@ -657,6 +657,18 @@ For breaking on multiple words, the `words` field provides a simpler syntax with
|
|
|
657
657
|
// Note: Empty `words: []` is filtered out (no-op), NOT treated as page-boundary fallback
|
|
658
658
|
```
|
|
659
659
|
|
|
660
|
+
**⚠️ Partial Word Matching**: The `words` field matches text that *starts with* the word, not complete words only. For example, `words: ['ثم']` will also match `ثمامة` (a name starting with ثم).
|
|
661
|
+
|
|
662
|
+
To match only complete words, add a **trailing space**:
|
|
663
|
+
|
|
664
|
+
```typescript
|
|
665
|
+
// ❌ Matches any word starting with 'ثم', including 'ثمامة'
|
|
666
|
+
{ words: ['فهذا', 'ثم', 'أقول'] }
|
|
667
|
+
|
|
668
|
+
// ✅ Matches only standalone words followed by space
|
|
669
|
+
{ words: ['فهذا ', 'ثم ', 'أقول '] }
|
|
670
|
+
```
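To see why the trailing space matters, here is a minimal sketch that reproduces only the documented `\s+(?:word1|word2)` shape. `buildWordsPattern` and `escapeRegex` are hypothetical helpers written for this illustration; the library's own builder also handles tokens and other processing that is omitted here.

```typescript
// Hypothetical helper: escapes regex metacharacters in a literal word.
const escapeRegex = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: builds a pattern of the form \s+(?:word1|word2).
const buildWordsPattern = (words: string[]): RegExp | null => {
    const cleaned = words
        .map((w) => w.trimStart()) // keep trailing whitespace: it may be intentional
        .filter((w) => w.length > 0)
        .map(escapeRegex);
    // Longest alternatives first so shorter ones cannot shadow them.
    const unique = [...new Set(cleaned)].sort((a, b) => b.length - a.length);
    return unique.length === 0 ? null : new RegExp(`\\s+(?:${unique.join('|')})`, 'u');
};

const text = 'قال ثمامة ثم قال';

// Without a trailing space, the first hit is the start of the name 'ثمامة'.
const looseIdx = buildWordsPattern(['ثم'])?.exec(text)?.index;

// With a trailing space, the name is skipped and the standalone word is found.
const strictIdx = buildWordsPattern(['ثم '])?.exec(text)?.index;

console.log({ looseIdx, strictIdx }); // { looseIdx: 3, strictIdx: 9 }
```

Note that a trailing space only matches a word followed by a space, so under this assumption a word at the very end of the text or directly before punctuation would not match.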
|
|
671
|
+
|
|
660
672
|
**Security note (ReDoS)**: Breakpoints (and raw `regex` rules) compile user-provided regular expressions. **Do not accept untrusted patterns** (e.g. from end users) without validation/sandboxing; some regexes can trigger catastrophic backtracking and hang the process.
|
|
661
673
|
|
|
662
674
|
### 12. Occurrence Filtering
|
package/dist/index.mjs
CHANGED
|
@@ -1690,7 +1690,7 @@ const createSegment = (content, fromPageId, toPageId, meta) => {
|
|
|
1690
1690
|
* Words are escaped, processed, sorted by length, and joined with alternation.
|
|
1691
1691
|
*/
|
|
1692
1692
|
const buildWordsRegex = (words, processPattern$1) => {
|
|
1693
|
-
const processed = words.map((w) => w.
|
|
1693
|
+
const processed = words.map((w) => w.trimStart()).filter((w) => w.length > 0).map((w) => processPattern$1(escapeWordsOutsideTokens(w)));
|
|
1694
1694
|
const unique = [...new Set(processed)];
|
|
1695
1695
|
if (unique.length === 0) return null;
|
|
1696
1696
|
unique.sort((a, b) => b.length - a.length);
|
|
@@ -2389,15 +2389,19 @@ const processOffsetFastPath = (fullContent, fromIdx, toIdx, pageIds, cumulativeO
|
|
|
2389
2389
|
/**
|
|
2390
2390
|
 * Checks if the remaining content fits within page/length limits.
|
|
2391
2391
|
* If so, pushes the final segment and returns true.
|
|
2392
|
+
*
|
|
2393
|
+
* @param actualRemainingEndIdx - The actual end page index of the remaining content
|
|
2394
|
+
* (computed from boundaryPositions), NOT the original segment's toIdx. This is critical
|
|
2395
|
+
* for maxPages=0 scenarios where remaining content may end before toIdx.
|
|
2392
2396
|
*/
|
|
2393
|
-
const handleOversizedSegmentFit = (remainingContent, currentFromIdx,
|
|
2394
|
-
const remainingSpan = computeRemainingSpan(currentFromIdx,
|
|
2395
|
-
const remainingHasExclusions = hasAnyExclusionsInRange(expandedBreakpoints, pageIds, currentFromIdx,
|
|
2397
|
+
const handleOversizedSegmentFit = (remainingContent, currentFromIdx, actualRemainingEndIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint, result) => {
|
|
2398
|
+
const remainingSpan = computeRemainingSpan(currentFromIdx, actualRemainingEndIdx, pageIds);
|
|
2399
|
+
const remainingHasExclusions = hasAnyExclusionsInRange(expandedBreakpoints, pageIds, currentFromIdx, actualRemainingEndIdx);
|
|
2396
2400
|
const fitsInPages = remainingSpan <= maxPages;
|
|
2397
2401
|
const fitsInLength = !maxContentLength || remainingContent.length <= maxContentLength;
|
|
2398
2402
|
if (fitsInPages && fitsInLength && !remainingHasExclusions) {
|
|
2399
2403
|
const includeMeta = isFirstPiece || Boolean(debugMetaKey);
|
|
2400
|
-
const finalSeg = createFinalSegment(remainingContent, currentFromIdx,
|
|
2404
|
+
const finalSeg = createFinalSegment(remainingContent, currentFromIdx, actualRemainingEndIdx, pageIds, getSegmentMetaWithDebug(isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint), includeMeta);
|
|
2401
2405
|
if (finalSeg) result.push(finalSeg);
|
|
2402
2406
|
return true;
|
|
2403
2407
|
}
|
|
@@ -2510,6 +2514,36 @@ const tryProcessOversizedSegmentFastPath = (segment, fromIdx, toIdx, pageIds, no
|
|
|
2510
2514
|
if (maxPages === 0) return processTrivialFastPath(fromIdx, toIdx, pageIds, normalizedPages, pageCount, segment.meta, debugMetaKey, logger);
|
|
2511
2515
|
return processOffsetFastPath(fullContent, fromIdx, toIdx, pageIds, cumulativeOffsets, maxPages, segment.meta, debugMetaKey, logger);
|
|
2512
2516
|
};
|
|
2517
|
+
/**
|
|
2518
|
+
* For maxPages=0 with maxContentLength: if current page's remaining content fits,
|
|
2519
|
+
* create a segment and advance to next page without applying breakpoints.
|
|
2520
|
+
*/
|
|
2521
|
+
const tryHandleCurrentPageFit = (fullContent, cursorPos, currentFromIdx, fromIdx, actualRemainingEndIdx, boundaryPositions, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, segmentMeta, lastBreakpoint, result) => {
|
|
2522
|
+
if (maxPages !== 0 || !maxContentLength || currentFromIdx >= actualRemainingEndIdx) return { handled: false };
|
|
2523
|
+
const currentPageEndPos = boundaryPositions[currentFromIdx - fromIdx + 1] ?? fullContent.length;
|
|
2524
|
+
const currentPageRemainingContent = fullContent.slice(cursorPos, currentPageEndPos).trim();
|
|
2525
|
+
if (!currentPageRemainingContent) return { handled: false };
|
|
2526
|
+
const currentPageFitsInLength = currentPageRemainingContent.length <= maxContentLength;
|
|
2527
|
+
const currentPageHasExclusions = hasAnyExclusionsInRange(expandedBreakpoints, pageIds, currentFromIdx, currentFromIdx);
|
|
2528
|
+
if (!currentPageFitsInLength || currentPageHasExclusions) return { handled: false };
|
|
2529
|
+
const pageBoundaryIdx = expandedBreakpoints.findIndex((bp) => bp.regex === null);
|
|
2530
|
+
const pageBoundaryBreakpoint = pageBoundaryIdx >= 0 ? {
|
|
2531
|
+
breakpointIndex: pageBoundaryIdx,
|
|
2532
|
+
rule: { pattern: "" }
|
|
2533
|
+
} : lastBreakpoint;
|
|
2534
|
+
const includeMeta = isFirstPiece || Boolean(debugMetaKey);
|
|
2535
|
+
const meta = getSegmentMetaWithDebug(isFirstPiece, debugMetaKey, segmentMeta, pageBoundaryBreakpoint);
|
|
2536
|
+
const seg = createSegment(currentPageRemainingContent, pageIds[currentFromIdx], void 0, includeMeta ? meta : void 0);
|
|
2537
|
+
if (seg) result.push(seg);
|
|
2538
|
+
let newCursorPos = currentPageEndPos;
|
|
2539
|
+
while (newCursorPos < fullContent.length && /\s/.test(fullContent[newCursorPos])) newCursorPos++;
|
|
2540
|
+
return {
|
|
2541
|
+
handled: true,
|
|
2542
|
+
newCursorPos,
|
|
2543
|
+
newFromIdx: currentFromIdx + 1,
|
|
2544
|
+
newLastBreakpoint: pageBoundaryBreakpoint
|
|
2545
|
+
};
|
|
2546
|
+
};
|
|
2513
2547
|
const processOversizedSegmentIterative = (segment, fromIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, expandedBreakpoints, maxPages, prefer, logger, debugMetaKey, maxContentLength) => {
|
|
2514
2548
|
const result = [];
|
|
2515
2549
|
const fullContent = segment.content;
|
|
@@ -2545,7 +2579,18 @@ const processOversizedSegmentIterative = (segment, fromIdx, toIdx, pageIds, norm
|
|
|
2545
2579
|
didHitMaxIterations = false;
|
|
2546
2580
|
break;
|
|
2547
2581
|
}
|
|
2548
|
-
|
|
2582
|
+
const actualRemainingContent = fullContent.slice(cursorPos);
|
|
2583
|
+
const actualEndPos = Math.max(cursorPos, fullContent.length - 1);
|
|
2584
|
+
const actualRemainingEndIdx = Math.min(findPageIndexForPosition(actualEndPos, boundaryPositions, fromIdx), toIdx);
|
|
2585
|
+
const currentPageFit = tryHandleCurrentPageFit(fullContent, cursorPos, currentFromIdx, fromIdx, actualRemainingEndIdx, boundaryPositions, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, segment.meta, lastBreakpoint, result);
|
|
2586
|
+
if (currentPageFit.handled) {
|
|
2587
|
+
cursorPos = currentPageFit.newCursorPos;
|
|
2588
|
+
currentFromIdx = currentPageFit.newFromIdx;
|
|
2589
|
+
lastBreakpoint = currentPageFit.newLastBreakpoint;
|
|
2590
|
+
isFirstPiece = false;
|
|
2591
|
+
continue;
|
|
2592
|
+
}
|
|
2593
|
+
if (handleOversizedSegmentFit(actualRemainingContent, currentFromIdx, actualRemainingEndIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, segment.meta, lastBreakpoint, result)) {
|
|
2549
2594
|
didHitMaxIterations = false;
|
|
2550
2595
|
break;
|
|
2551
2596
|
}
|