flappa-doormal 2.14.0 → 2.14.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/AGENTS.md CHANGED

@@ -583,6 +583,29 @@ bunx biome lint .
 42. **Use `assertNever` for exhaustive switches**: When switching on union types (like `PreprocessTransform`), add a `default` case that calls `assertNever(x: never)` which throws. TypeScript will error at compile time if a new union member is added but not handled.
+43. **`words` field matches partial words**: The `words` field generates `\s+(?:word1|word2)` which matches text *starting with* the word, not complete words. `words: ['ثم']` will match `ثمامة` (a name). **Solution**: Add trailing space for whole-word matching: `words: ['ثم ']`.
+44. **Breakpoints are only applied when content EXCEEDS limits**: Per the documented behavior, breakpoints split segments that exceed `maxPages` or `maxContentLength`. If content fits within both limits, breakpoints should NOT be applied. Tests that expect breakpoint splits on already-compliant content have incorrect expectations.
+45. **`maxPages=0` + `maxContentLength` interaction is subtle**: When both constraints are set:
+   - Check if remaining content on the CURRENT PAGE fits within `maxContentLength`
+   - If yes AND remaining content spans multiple pages: create segment for current page, advance to next page
+   - If no (content exceeds length): apply breakpoints as normal
+   - Bug symptom: adding a second page caused first page to be over-split into tiny fragments (e.g., 147, 229, 65 chars instead of ~1800 chars)
+   - Root cause: code checked ALL remaining content's span (crossing pages) instead of just current page's content
+46. **Minimal regression tests must trigger the bug path**: When fixing bugs, create tests that:
+   - Use realistic data sizes that exceed thresholds
+   - Include the specific constraint combination that triggered the bug (e.g., `maxPages=0` + `maxContentLength` + multiple pages)
+   - Assert on segment COUNT and LENGTHS, not just "no crashes"
+   - Would FAIL without the fix (tiny fragments) and PASS with it (normal segments)
+47. **Existing test expectations can be wrong**: When a fix causes existing tests to fail, investigate whether the test expectation matches documented behavior. The test `should not merge the pages when content overlaps between pages` expected 4 segments, but the correct count is 3 (per documented semantics). Update tests to match correct behavior; don't revert fixes to match incorrect tests.
+48. **The "adding content changes behavior" smell**: If adding unrelated content (like a second page) dramatically changes how the first page is processed, suspect incorrect span/window calculations. The fix pattern: ensure window calculations are scoped to the CURRENT context (current page) not the ORIGINAL context (all remaining content).
+49. **Use `trimStart()` not `trim()` for user-provided patterns with semantic whitespace**: When processing user-provided patterns like the `words` field, only strip leading whitespace (likely accidental). Trailing whitespace may be intentional for whole-word matching (e.g., `'بل '` should match only the standalone word, not words starting with `بل` like `بلغ`). **Bug symptom**: `words: ['بل ']` matched `بلغ` because `.trim()` stripped the trailing space to just `بل`. **Fix**: Use `.trimStart()` to preserve trailing whitespace.
 ### Process Template (Multi-agent design review, TDD-first)
 If you want to repeat the “write a plan → get multiple AI critiques → synthesize → update plan → implement TDD-first” workflow, use:

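Item 42 in the hunk above describes the `assertNever` pattern for exhaustive switches. A minimal sketch of that pattern follows. The `ExampleTransform` union and its members are hypothetical stand-ins (the real `PreprocessTransform` members are not shown in this diff), and this `assertNever` is an assumed implementation, not necessarily the package's own.

```typescript
// Hypothetical union for illustration; the real `PreprocessTransform` members may differ.
type ExampleTransform =
  | { type: 'removeZeroWidth' }
  | { type: 'condenseEllipsis' };

// Throws at runtime. At compile time, passing any value whose type is not `never` is an error.
const assertNever = (x: never): never => {
  throw new Error(`Unhandled case: ${JSON.stringify(x)}`);
};

const describeTransform = (t: ExampleTransform): string => {
  switch (t.type) {
    case 'removeZeroWidth':
      return 'strip zero-width characters';
    case 'condenseEllipsis':
      return 'collapse repeated dots';
    default:
      // If a new member is added to ExampleTransform without a matching case,
      // `t` is no longer narrowed to `never` and this call fails to compile.
      return assertNever(t);
  }
};
```
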
package/README.md CHANGED

@@ -657,6 +657,18 @@ For breaking on multiple words, the `words` field provides a simpler syntax with
 // Note: Empty `words: []` is filtered out (no-op), NOT treated as page-boundary fallback
 ```
+**⚠️ Partial Word Matching**: The `words` field matches text that *starts with* the word, not complete words only. For example, `words: ['ثم']` will also match `ثمامة` (a name starting with ثم).
+To match only complete words, add a **trailing space**:
+```typescript
+// ❌ Matches any word that starts with 'ثم', including 'ثمامة'
+{ words: ['فهذا', 'ثم', 'أقول'] }
+// ✅ Matches only standalone words followed by space
+{ words: ['فهذا ', 'ثم ', 'أقول '] }
+```
 **Security note (ReDoS)**: Breakpoints (and raw `regex` rules) compile user-provided regular expressions. **Do not accept untrusted patterns** (e.g. from end users) without validation/sandboxing; some regexes can trigger catastrophic backtracking and hang the process.
 ### 12. Occurrence Filtering

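The partial-match behavior described in the README hunk above can be reproduced with a plain regular expression. This is a simplified sketch that assumes the generated pattern has the shape `\s+(?:word1|word2)` stated in AGENTS.md; `escapeRegex` and `buildWordsPattern` are hypothetical helpers written for this example, not the library's API.

```typescript
// Escape regex metacharacters so each word is matched literally.
const escapeRegex = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical stand-in for the generated pattern: whitespace, then one of the words.
const buildWordsPattern = (words: string[]): RegExp =>
  new RegExp(`\\s+(?:${words.map(escapeRegex).join('|')})`, 'u');

const nameText = 'حدثنا ثمامة بن عبد الله';
const wordText = 'قال ثم قال';

// Without a trailing space, the word matches as a prefix of the name.
const prefixMatchesName = buildWordsPattern(['ثم']).test(nameText);

// With a trailing space, a space must follow the word, so the name no longer matches.
const spacedMatchesName = buildWordsPattern(['ثم ']).test(nameText);

// The standalone word, followed by a space, still matches.
const spacedMatchesWord = buildWordsPattern(['ثم ']).test(wordText);

console.log({ prefixMatchesName, spacedMatchesName, spacedMatchesWord });
```

One consequence of the trailing-space approach: a word at the very end of the text, with nothing after it, would not match in this sketch.
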
package/dist/index.mjs CHANGED

@@ -1690,7 +1690,7 @@ const createSegment = (content, fromPageId, toPageId, meta) => {
 * Words are escaped, processed, sorted by length, and joined with alternation.
 */
 const buildWordsRegex = (words, processPattern$1) => {
-	const processed = words.map((w) => w.trim()).filter((w) => w.length > 0).map((w) => processPattern$1(escapeWordsOutsideTokens(w)));
+	const processed = words.map((w) => w.trimStart()).filter((w) => w.length > 0).map((w) => processPattern$1(escapeWordsOutsideTokens(w)));
 	const unique = [...new Set(processed)];
 	if (unique.length === 0) return null;
 	unique.sort((a, b) => b.length - a.length);
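The effect of the `trim()` to `trimStart()` change in the hunk above can be seen in isolation. The sketch below mirrors only the normalize, filter, dedupe, and sort steps; `normalizeWords` is a hypothetical name, and the escaping and token processing done by `escapeWordsOutsideTokens` and `processPattern$1` are deliberately left out.

```typescript
// Keep trailing whitespace (it may be intentional), drop leading whitespace and empty entries.
const normalizeWords = (words: string[]): string[] => {
  const cleaned = words.map((w) => w.trimStart()).filter((w) => w.length > 0);
  const unique = [...new Set(cleaned)];
  // Longest first, so a longer alternative is tried before a shorter one that is its prefix.
  return unique.sort((a, b) => b.length - a.length);
};

const normalized = normalizeWords(['  بل ', 'بل ', '   ', 'ثم']);

// For comparison: the old behavior stripped the trailing space as well.
const oldStyle = 'بل '.trim();

console.log(normalized, JSON.stringify(oldStyle));
```

Whitespace-only entries still become empty strings after `trimStart()`, so the existing length filter continues to remove them.
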
@@ -2389,15 +2389,19 @@ const processOffsetFastPath = (fullContent, fromIdx, toIdx, pageIds, cumulativeO
 /**
 * Checks if the remaining content fits within paged/length limits.
 * If so, pushes the final segment and returns true.
+*
+* @param actualRemainingEndIdx - The actual end page index of the remaining content
+*   (computed from boundaryPositions), NOT the original segment's toIdx. This is critical
+*   for maxPages=0 scenarios where remaining content may end before toIdx.
 */
-const handleOversizedSegmentFit = (remainingContent, currentFromIdx, toIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint, result) => {
-	const remainingSpan = computeRemainingSpan(currentFromIdx, toIdx, pageIds);
-	const remainingHasExclusions = hasAnyExclusionsInRange(expandedBreakpoints, pageIds, currentFromIdx, toIdx);
+const handleOversizedSegmentFit = (remainingContent, currentFromIdx, actualRemainingEndIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint, result) => {
+	const remainingSpan = computeRemainingSpan(currentFromIdx, actualRemainingEndIdx, pageIds);
+	const remainingHasExclusions = hasAnyExclusionsInRange(expandedBreakpoints, pageIds, currentFromIdx, actualRemainingEndIdx);
 	const fitsInPages = remainingSpan <= maxPages;
 	const fitsInLength = !maxContentLength || remainingContent.length <= maxContentLength;
 	if (fitsInPages && fitsInLength && !remainingHasExclusions) {
 		const includeMeta = isFirstPiece || Boolean(debugMetaKey);
-		const finalSeg = createFinalSegment(remainingContent, currentFromIdx, toIdx, pageIds, getSegmentMetaWithDebug(isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint), includeMeta);
+		const finalSeg = createFinalSegment(remainingContent, currentFromIdx, actualRemainingEndIdx, pageIds, getSegmentMetaWithDebug(isFirstPiece, debugMetaKey, originalMeta, lastBreakpoint), includeMeta);
 		if (finalSeg) result.push(finalSeg);
 		return true;
 	}
@@ -2510,6 +2514,36 @@ const tryProcessOversizedSegmentFastPath = (segment, fromIdx, toIdx, pageIds, no
 	if (maxPages === 0) return processTrivialFastPath(fromIdx, toIdx, pageIds, normalizedPages, pageCount, segment.meta, debugMetaKey, logger);
 	return processOffsetFastPath(fullContent, fromIdx, toIdx, pageIds, cumulativeOffsets, maxPages, segment.meta, debugMetaKey, logger);
 };
+/**
+* For maxPages=0 with maxContentLength: if current page's remaining content fits,
+* create a segment and advance to next page without applying breakpoints.
+*/
+const tryHandleCurrentPageFit = (fullContent, cursorPos, currentFromIdx, fromIdx, actualRemainingEndIdx, boundaryPositions, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, segmentMeta, lastBreakpoint, result) => {
+	if (maxPages !== 0 || !maxContentLength || currentFromIdx >= actualRemainingEndIdx) return { handled: false };
+	const currentPageEndPos = boundaryPositions[currentFromIdx - fromIdx + 1] ?? fullContent.length;
+	const currentPageRemainingContent = fullContent.slice(cursorPos, currentPageEndPos).trim();
+	if (!currentPageRemainingContent) return { handled: false };
+	const currentPageFitsInLength = currentPageRemainingContent.length <= maxContentLength;
+	const currentPageHasExclusions = hasAnyExclusionsInRange(expandedBreakpoints, pageIds, currentFromIdx, currentFromIdx);
+	if (!currentPageFitsInLength || currentPageHasExclusions) return { handled: false };
+	const pageBoundaryIdx = expandedBreakpoints.findIndex((bp) => bp.regex === null);
+	const pageBoundaryBreakpoint = pageBoundaryIdx >= 0 ? {
+		breakpointIndex: pageBoundaryIdx,
+		rule: { pattern: "" }
+	} : lastBreakpoint;
+	const includeMeta = isFirstPiece || Boolean(debugMetaKey);
+	const meta = getSegmentMetaWithDebug(isFirstPiece, debugMetaKey, segmentMeta, pageBoundaryBreakpoint);
+	const seg = createSegment(currentPageRemainingContent, pageIds[currentFromIdx], void 0, includeMeta ? meta : void 0);
+	if (seg) result.push(seg);
+	let newCursorPos = currentPageEndPos;
+	while (newCursorPos < fullContent.length && /\s/.test(fullContent[newCursorPos])) newCursorPos++;
+	return {
+		handled: true,
+		newCursorPos,
+		newFromIdx: currentFromIdx + 1,
+		newLastBreakpoint: pageBoundaryBreakpoint
+	};
+};
 const processOversizedSegmentIterative = (segment, fromIdx, toIdx, pageIds, normalizedPages, cumulativeOffsets, expandedBreakpoints, maxPages, prefer, logger, debugMetaKey, maxContentLength) => {
 	const result = [];
 	const fullContent = segment.content;
@@ -2545,7 +2579,18 @@ const processOversizedSegmentIterative = (segment, fromIdx, toIdx, pageIds, norm
 			didHitMaxIterations = false;
 			break;
 		}
-		if (handleOversizedSegmentFit(remainingContent, currentFromIdx, toIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, segment.meta, lastBreakpoint, result)) {
+		const actualRemainingContent = fullContent.slice(cursorPos);
+		const actualEndPos = Math.max(cursorPos, fullContent.length - 1);
+		const actualRemainingEndIdx = Math.min(findPageIndexForPosition(actualEndPos, boundaryPositions, fromIdx), toIdx);
+		const currentPageFit = tryHandleCurrentPageFit(fullContent, cursorPos, currentFromIdx, fromIdx, actualRemainingEndIdx, boundaryPositions, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, segment.meta, lastBreakpoint, result);
+		if (currentPageFit.handled) {
+			cursorPos = currentPageFit.newCursorPos;
+			currentFromIdx = currentPageFit.newFromIdx;
+			lastBreakpoint = currentPageFit.newLastBreakpoint;
+			isFirstPiece = false;
+			continue;
+		}
+		if (handleOversizedSegmentFit(actualRemainingContent, currentFromIdx, actualRemainingEndIdx, pageIds, expandedBreakpoints, maxPages, maxContentLength, isFirstPiece, debugMetaKey, segment.meta, lastBreakpoint, result)) {
 			didHitMaxIterations = false;
 			break;
 		}
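The core idea behind `tryHandleCurrentPageFit` is to measure only the current page's remaining content against `maxContentLength`, rather than everything left in the segment. The sketch below is a reduced illustration of that idea, not the package's code: it assumes `boundaryPositions` holds the start offset of each page within `fullContent`, uses a page index relative to that array, and omits the exclusion check, the `maxPages` guard, and segment metadata. `tryCurrentPageFit` and `PageFit` are hypothetical names.

```typescript
interface PageFit {
  handled: boolean;
  content?: string;
  newCursorPos?: number;
}

const tryCurrentPageFit = (
  fullContent: string,
  cursorPos: number,
  pageIdx: number,             // index of the current page within boundaryPositions
  boundaryPositions: number[], // assumed: start offset of each page in fullContent
  maxContentLength: number,
): PageFit => {
  // The current page ends where the next page starts, or at the end of the content.
  const pageEnd = boundaryPositions[pageIdx + 1] ?? fullContent.length;
  const content = fullContent.slice(cursorPos, pageEnd).trim();
  if (!content || content.length > maxContentLength) return { handled: false };
  // Skip whitespace between pages so the next piece starts on real content.
  let newCursorPos = pageEnd;
  while (newCursorPos < fullContent.length && /\s/.test(fullContent.charAt(newCursorPos))) newCursorPos++;
  return { handled: true, content, newCursorPos };
};

// Two 50-character pages separated by a space; the whole remainder (101 chars) exceeds 60,
// but each page on its own fits.
const twoPages = 'a'.repeat(50) + ' ' + 'b'.repeat(50);
const pageStarts = [0, 51];

const firstFit = tryCurrentPageFit(twoPages, 0, 0, pageStarts, 60);
const secondFit = tryCurrentPageFit(twoPages, firstFit.newCursorPos ?? 0, 1, pageStarts, 60);
const tooLong = tryCurrentPageFit(twoPages, 0, 0, pageStarts, 40);

console.log(firstFit.content?.length, secondFit.content?.length, tooLong.handled);
```

When the current page does not fit, the sketch returns `handled: false`, which corresponds to the caller falling through to the normal breakpoint logic.
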