email-origin-chain 1.0.8 → 1.0.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/README.md +35 -2
package/dist/detectors/outlook-empty-header-detector.d.ts +1 -1
package/dist/detectors/outlook-empty-header-detector.js +2 -1
package/dist/detectors/outlook-reverse-fr-detector.d.ts +1 -1
package/dist/detectors/outlook-reverse-fr-detector.js +2 -1
package/dist/detectors/registry.js +6 -6
package/dist/index.js +15 -3
package/dist/inline-layer.js +1 -1
package/dist/scoring.d.ts +16 -0
package/dist/scoring.js +154 -0
package/dist/types.d.ts +8 -0
package/dist/utils.js +3 -0
package/docs/architecture/README.md +4 -0
package/docs/confidence_scoring.md +75 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -17,6 +17,7 @@ Detailed documentation can be found in the [docs/architecture/](docs/architectur
 - [Phase 2: Plugin Architecture](docs/architecture/phase2_plugin_foundation.md)
 - [Phase 3: Full Compatibility (100%)](docs/architecture/phase3_fallbacks.md)
 - [Deep Forward Fix Walkthrough](docs/walkthrough_deep_forward_fix.md)
+- [Confidence Scoring System](docs/confidence_scoring.md)
 - [Detector Usage & Priorities](docs/detectors_usage.md)
 **✅ Test Coverage:** The library has been validated against **239 fixtures** from the `email-forward-parser-recursive` library with a **100% success rate** (239/239). This includes validating message bodies and ensuring non-message snippets are correctly identified. See [Test Coverage Report](docs/TEST_COVERAGE.md) for details.
@@ -77,6 +78,10 @@ The library returns a `ResultObject` with the following structure:
 | `text` | `string \| null` | Cleaned body content of the deepest message. |
 | `attachments` | `array` | Metadata for MIME attachments found at the deepest level. |
 | `history` | `array` | **Conversation Chaining**: Full audit trail of the discussion (see below). |
+| `confidence_score` | `number` | Reliability score (0-100) based on signal analysis. |
+| `confidence_description` | `string` | Human-readable explanation of the score. |
+| `confidence_signals` | `object` | Key-value breakdown of triggered bonuses and penalties. |
+| `confidence_reasons` | `array` | Detailed list of triggered scoring rules. |
 | `diagnostics` | `object` | Metadata about the parsing process. |
 ### Diagnostics Detail
@@ -116,6 +121,22 @@ Each history entry contains its own `from`, `to`, `cc`, `subject`, `date_iso`, `
 - `content:silent_forward`: The user forwarded the message without adding any text.
 - `date:unparseable`: A date string was found but could not be normalized to ISO.
+## Confidence Scoring System
+To ensure high-quality extraction from text-based forwards, the library uses a **Signal-Based Confidence Score**. It analyzes metrics like email address density, sender count consistency, and quote levels to detect "Garbage" or incomplete chains.
+### Scoring Logic:
+- **Baseline**: 100% confidence for standard formatting (~2 emails per level).
+- **Penalties**:
+    - **Sender Mismatch**: More senders found than levels detected (-75%).
+    - **Quote Mismatch**: Quote nesting deeper than detected levels (-75%).
+    - **Partial Chain**: Only 1 email detected per level (-50%).
+    - **Ghost Forward**: No emails found in text (-100%).
+- **Bonuses**:
+    - **Validated Density**: High email density corroborated by context headers (+75%).
+Check the [Confidence Scoring Documentation](docs/confidence_scoring.md) for full details.
 ### Typical Output Example
 ```json
@@ -148,7 +169,13 @@ Each history entry contains its own `from`, `to`, `cc`, `subject`, `date_iso`, `
     "depth": 2,
     "parsedOk": true,
     "warnings": []
-  }
+  },
+  "confidence_score": 100,
+  "confidence_description": "High Confidence: Standard Density: Ratio 2.00 is optimal (~2 emails per level)",
+  "confidence_signals": {},
+  "confidence_reasons": [
+    "Standard Density: Ratio 2.00 is optimal (~2 emails per level)"
+  ]
 }
 ```
@@ -278,7 +305,13 @@ console.log(result.diagnostics.depth); // 4 (5 messages total)
     "depth": 4,
     "parsedOk": true,
     "warnings": []
-  }
+  },
+  "confidence_score": 100,
+  "confidence_description": "High Confidence: Standard Density: Ratio 2.00 is optimal (~2 emails per level)",
+  "confidence_signals": {},
+  "confidence_reasons": [
+    "Standard Density: Ratio 2.00 is optimal (~2 emails per level)"
+  ]
 }
 ```

package/dist/detectors/outlook-empty-header-detector.d.ts CHANGED

@@ -10,7 +10,7 @@ import { ForwardDetector, DetectionResult } from './types';
  */
 export declare class OutlookEmptyHeaderDetector implements ForwardDetector {
     readonly name = "outlook_empty_header";
-    readonly priority = 50;
+    readonly priority = -50;
     private readonly HEADER_PATTERN;
     detect(text: string): DetectionResult;
 }

package/dist/detectors/outlook-empty-header-detector.js CHANGED

@@ -14,7 +14,7 @@ const cleaner_1 = require("../utils/cleaner");
 class OutlookEmptyHeaderDetector {
     constructor() {
         this.name = 'outlook_empty_header';
-        this.priority = 50; // Fallback for corrupted headers (after specifics, before generic Crisp)
+        this.priority = -50; // Very specific - High Priority
         // Regex to capture the header block:
         // 1. Optional Separator (mostly underscores)
         // 2. De: ... (From)
@@ -51,6 +51,7 @@ class OutlookEmptyHeaderDetector {
                     message: message || undefined,
                     email: {
                         from: fromLine,
+                        to: toLine,
                         subject: subjectLine,
                         date: dateLine || undefined,
                         body: finalBody

package/dist/detectors/outlook-reverse-fr-detector.d.ts CHANGED

@@ -4,7 +4,7 @@ import { ForwardDetector, DetectionResult } from './types';
  */
 export declare class OutlookReverseFrDetector implements ForwardDetector {
     readonly name = "outlook_reverse_fr";
-    readonly priority = -20;
+    readonly priority = -45;
     private readonly ENVOYE_PATTERN;
     private readonly DE_PATTERN;
     private readonly A_PATTERN;

package/dist/detectors/outlook-reverse-fr-detector.js CHANGED

@@ -8,7 +8,7 @@ const cleaner_1 = require("../utils/cleaner");
 class OutlookReverseFrDetector {
     constructor() {
         this.name = 'outlook_reverse_fr';
-        this.priority = -20; // Specific detector - High Priority (Override)
+        this.priority = -45; // Specific detector - High Priority
         // Regex patterns for field detection
         this.ENVOYE_PATTERN = /^[ \t]*Envoy(?:é|=E9|e)?\s*:\s*(.*?)\s*$/m;
         this.DE_PATTERN = /^[ \t]*De\s*:/i;
@@ -76,6 +76,7 @@ class OutlookReverseFrDetector {
                 from: fromEmail.includes('@')
                     ? { name: fromName !== fromEmail ? fromName : '', address: fromEmail }
                     : { name: fromName, address: fromName },
+                to: a ? extractValue(a.line) : undefined,
                 subject: objet ? extractValue(objet.line) : '',
                 date: extractValue(envoyeMatch[0]),
                 body: finalBody

package/dist/detectors/registry.js CHANGED

@@ -15,12 +15,12 @@ class DetectorRegistry {
     constructor(customDetectors = []) {
         this.detectors = [];
         // Register all detectors (priority determines order)
-        this.register(new crisp_detector_1.CrispDetector()); // priority: 0 (highest - universal library)
-        this.register(new outlook_empty_header_detector_1.OutlookEmptyHeaderDetector()); // priority: 5 (handle empty headers)
-        this.register(new outlook_reverse_fr_detector_1.OutlookReverseFrDetector()); // priority: 6 (handle reversed FR headers)
-        this.register(new reply_detector_1.ReplyDetector()); // priority: 7 (handle standard replies)
-        this.register(new outlook_fr_detector_1.OutlookFRDetector()); // priority: 10 (fallback for FR formats)
-        this.register(new new_outlook_detector_1.NewOutlookDetector()); // priority: 10 (fallback for new Outlook)
+        this.register(new outlook_empty_header_detector_1.OutlookEmptyHeaderDetector()); // priority: -50 (Very specific)
+        this.register(new outlook_reverse_fr_detector_1.OutlookReverseFrDetector()); // priority: -45 (Specific)
+        this.register(new new_outlook_detector_1.NewOutlookDetector()); // priority: -40 (Specific)
+        this.register(new outlook_fr_detector_1.OutlookFRDetector()); // priority: -30 (Fallback for FR)
+        this.register(new reply_detector_1.ReplyDetector()); // priority: -10 (Replies)
+        this.register(new crisp_detector_1.CrispDetector()); // priority: 100 (Universal fallback)
         // Register custom detectors
         customDetectors.forEach(detector => this.register(detector));
     }
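
The updated comments suggest that a lower `priority` number is tried first: the specific Outlook detectors sit at -50 and -45, and `CrispDetector` moves to 100 as the universal fallback. The `register` method itself is not part of this diff, so the following is only a sketch under that assumption. `MiniRegistry`, the stub detector named `crisp`, the matching regexes, and the `found` field are hypothetical and not taken from the package.

```javascript
// Hypothetical stand-in for DetectorRegistry; assumes lower `priority` runs first.
class MiniRegistry {
    constructor() {
        this.detectors = [];
    }
    register(detector) {
        this.detectors.push(detector);
        // Assumption: ascending sort, so -50 is tried before 100.
        this.detectors.sort((a, b) => a.priority - b.priority);
    }
    // Return the name of the first detector whose detect() reports a match.
    detect(text) {
        for (const d of this.detectors) {
            if (d.detect(text).found) return d.name;
        }
        return null;
    }
}

const registry = new MiniRegistry();
// Stub detectors: priorities follow the diff, the matching logic is invented.
registry.register({ name: 'crisp', priority: 100, detect: () => ({ found: true }) });
registry.register({ name: 'outlook_reverse_fr', priority: -45, detect: t => ({ found: /Envoy/.test(t) }) });
registry.register({ name: 'outlook_empty_header', priority: -50, detect: t => ({ found: /^De\s*:/m.test(t) }) });

// Registration order no longer matters; the sort decides who runs first.
const order = registry.detectors.map(d => d.name);
```

Under this reading, moving the universal detector from priority 0 to 100 means it only sees text that none of the specific detectors claimed.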

package/dist/index.js CHANGED

@@ -18,6 +18,7 @@ exports.extractDeepestHybrid = extractDeepestHybrid;
 const mime_layer_1 = require("./mime-layer");
 const inline_layer_1 = require("./inline-layer");
 const utils_1 = require("./utils");
+const scoring_1 = require("./scoring");
 /**
  * Main entry point: Extract the deepest forwarded email using hybrid strategy
  */
@@ -53,17 +54,17 @@ async function extractDeepestHybrid(raw, options) {
         const inlineResult = await (0, inline_layer_1.processInline)(mimeResult.rawBody, mimeResult.depth, mimeResult.history, opts.customDetectors);
         // Step 3: Align results
         let from = (0, utils_1.normalizeFrom)(inlineResult.from);
-        let to = inlineResult.to;
+        let to = (0, utils_1.normalizeFrom)(inlineResult.to);
         let subject = inlineResult.subject;
         let date_raw = inlineResult.date_raw;
         let date_iso = inlineResult.date_iso;
         let text = inlineResult.text;
         if (inlineResult.diagnostics.method === 'fallback' && mimeResult.metadata) {
             const m = mimeResult.metadata;
-            if (!from && m.from?.value?.[0]) {
+            if ((!from || !from.address) && m.from?.value?.[0]) {
                 from = (0, utils_1.normalizeFrom)({ name: m.from.value[0].name, address: m.from.value[0].address });
             }
-            if (!to && m.to?.value?.[0]) {
+            if ((!to || !to.address) && m.to?.value?.[0]) {
                 to = (0, utils_1.normalizeFrom)({ name: m.to.value[0].name, address: m.to.value[0].address });
             }
             if (!subject && m.subject)
@@ -99,6 +100,8 @@ async function extractDeepestHybrid(raw, options) {
         date_iso = date_iso || (0, utils_1.normalizeDateToISO)(date_raw);
         // Destructure to exclude 'from' since we have our own normalized version
         const { from: _unusedFrom, ...restInlineResult } = inlineResult;
+        // Calculate confidence score
+        const confidence = (0, scoring_1.calculateConfidence)(mimeResult.rawBody, mimeResult.depth + inlineResult.diagnostics.depth);
         const result = {
             ...restInlineResult,
             // Use our normalized/enriched values
@@ -109,6 +112,15 @@ async function extractDeepestHybrid(raw, options) {
             date_iso,
             text: (0, utils_1.cleanText)(text),
             full_body: mimeResult.rawBody,
+            // Confidence
+            confidence_score: confidence.score,
+            confidence_description: confidence.description,
+            confidence_ratio: confidence.ratio,
+            confidence_email_count: confidence.email_count,
+            confidence_sender_count: confidence.sender_count,
+            confidence_quote_depth: confidence.quote_depth,
+            confidence_signals: confidence.signals,
+            confidence_reasons: confidence.reasons,
             attachments: [...attachments, ...inlineResult.attachments],
             diagnostics: {
                 ...inlineResult.diagnostics,
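
The guard change from `!to` to `(!to || !to.address)` matters because an object with an empty `address` is still truthy, so the old check would never reach the MIME metadata. A minimal sketch of the difference follows; `pickOld` and `pickNew` are hypothetical helpers that isolate the guard, and they leave out the surrounding `method === 'fallback'` condition and the `normalizeFrom` call.

```javascript
// Hypothetical helpers isolating the two guards from extractDeepestHybrid.
// `inlineValue` plays the role of `from`/`to`, `mimeValue` the role of m.from.value[0].
function pickOld(inlineValue, mimeValue) {
    // 1.0.8 guard: only falls back when the inline value is missing entirely.
    if (!inlineValue && mimeValue) {
        return { name: mimeValue.name, address: mimeValue.address };
    }
    return inlineValue;
}

function pickNew(inlineValue, mimeValue) {
    // 1.0.11 guard: also falls back when the inline value has no address.
    if ((!inlineValue || !inlineValue.address) && mimeValue) {
        return { name: mimeValue.name, address: mimeValue.address };
    }
    return inlineValue;
}

const mime = { name: 'Bob', address: 'bob@example.com' };
const nameOnly = { name: 'Bob', address: '' };

const oldResult = pickOld(nameOnly, mime); // keeps the address-less object
const newResult = pickNew(nameOnly, mime); // takes the MIME address
```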

package/dist/inline-layer.js CHANGED

@@ -121,7 +121,7 @@ async function processInline(text, depth, baseHistory = [], customDetectors = []
             attachments: [],
             history: history.slice().reverse(),
             diagnostics: {
-                method: (deepestEntry.flags.find(f => f.startsWith('method:')) || 'inline'),
+                method: (deepestEntry.flags.find(f => f.startsWith('method:'))?.replace('method:', '') || 'inline'),
                 depth: currentDepth - startingDepth,
                 parsedOk: true,
                 warnings: warnings
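
The one-line change above strips the `method:` prefix before the flag is stored in `diagnostics.method`. This looks relevant to `index.js`, which compares `diagnostics.method` against the bare string `'fallback'`. The sketch below reproduces both expressions on a sample flag list; the flag value `method:fallback` is an assumption chosen for illustration, and `methodBefore` / `methodAfter` are hypothetical names.

```javascript
// Expression as it appeared in 1.0.8: the full flag string is returned.
function methodBefore(flags) {
    return flags.find(f => f.startsWith('method:')) || 'inline';
}

// Expression as of 1.0.11: the 'method:' prefix is removed first.
function methodAfter(flags) {
    return flags.find(f => f.startsWith('method:'))?.replace('method:', '') || 'inline';
}

// Sample flags; 'method:fallback' is an assumed value, not taken from the package.
const flags = ['content:silent_forward', 'method:fallback'];

const before = methodBefore(flags); // 'method:fallback'
const after = methodAfter(flags);   // 'fallback'

// A comparison like the one in index.js only succeeds with the stripped form.
const matchesBefore = before === 'fallback';
const matchesAfter = after === 'fallback';
```

When no `method:` flag is present, both versions fall back to `'inline'`.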

package/dist/scoring.d.ts ADDED

@@ -0,0 +1,16 @@
+/**
+ * Confidence Score Calculation Logic
+ * Evaluates the coherence between detected forward depth and email address density.
+ * Uses a Signal-Based architecture where various factors contribute to a health score.
+ */
+export interface ConfidenceResult {
+    score: number;
+    description: string;
+    ratio: number;
+    email_count: number;
+    sender_count: number;
+    quote_depth: number;
+    signals: Record<string, number>;
+    reasons: string[];
+}
+export declare function calculateConfidence(fullBody: string, depth: number): ConfidenceResult;

package/dist/scoring.js ADDED

@@ -0,0 +1,154 @@
+"use strict";
+/**
+ * Confidence Score Calculation Logic
+ * Evaluates the coherence between detected forward depth and email address density.
+ * Uses a Signal-Based architecture where various factors contribute to a health score.
+ */
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.calculateConfidence = calculateConfidence;
+function calculateConfidence(fullBody, depth) {
+    // 0. Base case: No depth detected implies no confidence metric applicable (N/A)
+    if (depth === 0) {
+        return {
+            score: 100,
+            description: "N/A (No depth detected)",
+            ratio: 0,
+            email_count: 0,
+            sender_count: 0,
+            quote_depth: 0,
+            signals: {},
+            reasons: ["No depth detected"]
+        };
+    }
+    // 1. Calculate Max Quote Depth (">" prefix)
+    const lines = fullBody.split('\n');
+    let maxQuoteDepth = 0;
+    for (const line of lines) {
+        const match = line.match(/^(\s*>)+/);
+        if (match) {
+            const qCount = (match[0].match(/>/g) || []).length;
+            if (qCount > maxQuoteDepth)
+                maxQuoteDepth = qCount;
+        }
+    }
+    // 2. Count emails strictly between angle brackets <...>
+    const emailRegex = /<[\s\r\n]*([^\s<>@]+@[^\s<>@]+)[\s\r\n]*>/g;
+    let match;
+    const emails = [];
+    while ((match = emailRegex.exec(fullBody)) !== null) {
+        emails.push({ addr: match[1], index: match.index, fullMatchLength: match[0].length });
+    }
+    const count = emails.length;
+    const ratio = count / depth;
+    // 3. Sender & Header context analysis
+    let explainedCount = 0;
+    let fromCount = 0;
+    const contextWindow = 150;
+    const fromKeywords = [
+        "From", "Od", "Fra", "Von", "De", "Lähettäjä", "Šalje", "Feladó", "Da", "Van", "Expeditorul",
+        "Отправитель", "Från", "Kimden", "Від кого", "Saatja", "De la", "Gönderen", "От", "Від",
+        "Mittente", "Nadawca", "送信元"
+    ];
+    const otherKeywords = [
+        "To", "Komu", "Til", "An", "Para", "Vastaanottaja", "À", "Prima", "Címzett", "A", "Aan", "Do",
+        "Destinatarul", "Кому", "Pre", "Till", "Kime", "Pour", "Adresat", "送信先",
+        "Cc", "CC", "Kopie", "Kopio", "Másolat", "Kopi", "Dw", "Копия", "Kopia", "Bilgi", "Копія",
+        "Másolatot kap", "Kópia", "Copie à",
+        "Reply-To", "Odgovori na", "Odpověď na", "Svar til", "Antwoord aan", "Vastaus", "Répondre à",
+        "Antwort an", "Válaszcím", "Rispondi a", "Odpowiedź-do", "Responder A", "Responder a",
+        "Răspuns către", "Ответ-Кому", "Odpovedať-Pre", "Svara till", "Yanıt Adresi", "Кому відповісти"
+    ];
+    const trailingSenderKeywords = [
+        "wrote", "escribió", "a écrit", "kirjoitti", "ezt írta", "ha scritto", "geschreven", "skrev",
+        "napisał", "escreveu", "написал", "napísal", "följande", "tarihinde şunu yazdı", "napsal"
+    ];
+    const buildRegex = (words, strict = false) => {
+        const sorted = Array.from(new Set(words)).sort((a, b) => b.length - a.length);
+        const joined = sorted.map(k => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join('|');
+        const prefix = `[\\*\\_\\>]*\\s*`;
+        const suffix = `\\s*[\\*\\_]*\\s*`;
+        if (strict) {
+            return new RegExp(`(?:${prefix}(?:${joined})${suffix})\\s*:\\s*(?:[^:\\n]*\\n\\s*)?[^:\\n]*$`, 'i');
+        }
+        return new RegExp(`(?:${prefix}(?:${joined})${suffix})\\s*:`, 'i');
+    };
+    const headerPattern = buildRegex([...fromKeywords, ...otherKeywords], false);
+    const fromPattern = buildRegex(fromKeywords, true);
+    const trailingPattern = new RegExp(`^\\s*[\\*\\_\\>]*\\s*(?:${trailingSenderKeywords.join('|')})\\s*:?`, 'i');
+    for (const email of emails) {
+        const start = Math.max(0, email.index - contextWindow);
+        const preText = fullBody.substring(start, email.index);
+        const postText = fullBody.substring(email.index + email.fullMatchLength);
+        const blocks = preText.split(/\n\s*\n/);
+        const currentBlock = blocks[blocks.length - 1];
+        if (headerPattern.test(currentBlock))
+            explainedCount++;
+        if (fromPattern.test(preText) || trailingPattern.test(postText))
+            fromCount++;
+    }
+    // ⚖️  SIGNAL-BASED SCORING ⚖️
+    const signals = {};
+    let finalScore = 100;
+    const reasons = [];
+    // --- 1. Ratio Signals (The base score) ---
+    if (count === 0) {
+        signals['penalty_ghost'] = -100;
+        reasons.push("Ghost Forward: 0 emails found in the body");
+    }
+    else if (ratio < 0.5) {
+        signals['penalty_inconsistent'] = -100;
+        reasons.push(`Inconsistent Density: Ratio ${ratio.toFixed(2)} is too low (expected >= 0.5)`);
+    }
+    else if (ratio >= 0.5 && ratio <= 1.5) {
+        signals['adjustment_partial'] = -50;
+        reasons.push(`Partial Chain: Ratio ${ratio.toFixed(2)} suggests ~1 email per detected level`);
+    }
+    else if (ratio > 2.4) {
+        signals['adjustment_high_density'] = -75;
+        reasons.push(`High Density: Ratio ${ratio.toFixed(2)} is high (many emails per level)`);
+        // Bonus for validated high density
+        const explainedRatio = explainedCount / count;
+        if (explainedRatio >= 0.6) {
+            signals['bonus_validated_density'] = 75;
+            reasons.push(`Validated Density: ${Math.round(explainedRatio * 100)}% of emails are preceded by headers`);
+        }
+        else {
+            reasons.push(`Unvalidated Density: Only ${Math.round(explainedRatio * 100)}% of emails have header context`);
+        }
+    }
+    else {
+        reasons.push(`Standard Density: Ratio ${ratio.toFixed(2)} is optimal (~2 emails per level)`);
+    }
+    // --- 2. Coherence Signals (Penalties) ---
+    if (fromCount > depth) {
+        signals['penalty_sender_mismatch'] = -75;
+        reasons.push(`Sender Mismatch: Found ${fromCount} senders but only ${depth} forward levels`);
+    }
+    if (maxQuoteDepth > depth) {
+        signals['penalty_quote_mismatch'] = -75;
+        reasons.push(`Quote Mismatch: Max quote nesting ${maxQuoteDepth} exceeds detected depth ${depth}`);
+    }
+    // --- Aggregate ---
+    for (const val of Object.values(signals)) {
+        finalScore += val;
+    }
+    finalScore = Math.max(0, Math.min(100, finalScore));
+    // Map description based on final score if not already descriptive
+    let description = reasons.join("; ");
+    if (finalScore === 100)
+        description = "High Confidence: " + description;
+    else if (finalScore >= 50)
+        description = "Medium Confidence: " + description;
+    else
+        description = "Low Confidence: " + description;
+    return {
+        score: finalScore,
+        description,
+        ratio,
+        email_count: count,
+        sender_count: fromCount,
+        quote_depth: maxQuoteDepth,
+        signals,
+        reasons
+    };
+}
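
To make the aggregation easier to follow, here is a condensed restatement of the ratio branch and the summing step from `scoring.js`. It takes pre-computed metrics instead of parsing a message body. `scoreFromMetrics` is a hypothetical helper, and it assumes `depth > 0` (the real function returns early for `depth === 0`).

```javascript
// Hypothetical helper: same thresholds and signal names as scoring.js above,
// but fed with already-counted metrics. Assumes depth > 0.
function scoreFromMetrics({ count, depth, explainedCount, fromCount, maxQuoteDepth }) {
    const signals = {};
    const ratio = count / depth;
    // Ratio signals: exactly one branch applies.
    if (count === 0) signals.penalty_ghost = -100;
    else if (ratio < 0.5) signals.penalty_inconsistent = -100;
    else if (ratio <= 1.5) signals.adjustment_partial = -50;
    else if (ratio > 2.4) {
        signals.adjustment_high_density = -75;
        // The bonus cancels the penalty when >= 60% of emails have header context.
        if (explainedCount / count >= 0.6) signals.bonus_validated_density = 75;
    }
    // Coherence penalties stack on top of the ratio signal.
    if (fromCount > depth) signals.penalty_sender_mismatch = -75;
    if (maxQuoteDepth > depth) signals.penalty_quote_mismatch = -75;
    // Start from 100, add every signal, clamp to [0, 100].
    const total = Object.values(signals).reduce((sum, v) => sum + v, 100);
    return { score: Math.max(0, Math.min(100, total)), signals };
}

// Ratio 2.00, nothing triggered.
const standard = scoreFromMetrics({ count: 4, depth: 2, explainedCount: 4, fromCount: 2, maxQuoteDepth: 0 });
// Ratio 5.00 without header context, plus more senders than levels.
const suspect = scoreFromMetrics({ count: 5, depth: 1, explainedCount: 0, fromCount: 2, maxQuoteDepth: 0 });
// Ratio 3.00, but 5 of 6 emails sit behind a header keyword.
const validated = scoreFromMetrics({ count: 6, depth: 2, explainedCount: 5, fromCount: 2, maxQuoteDepth: 1 });
```

With these rules, `suspect` sums to 100 - 75 - 75 = -50 and clamps to 0. The "Example Suspect Result" in `docs/confidence_scoring.md` lists the same two signals with a score of 25, which does not appear to follow from the code in this diff.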

package/dist/types.d.ts CHANGED

@@ -35,6 +35,14 @@ export interface ResultObject {
     date_iso: string | null;
     text: string | null;
     full_body?: string;
+    confidence_score?: number;
+    confidence_description?: string;
+    confidence_ratio?: number;
+    confidence_email_count?: number;
+    confidence_sender_count?: number;
+    confidence_quote_depth?: number;
+    confidence_signals?: Record<string, number>;
+    confidence_reasons?: string[];
     attachments: Attachment[];
     history: HistoryEntry[];
     diagnostics: Diagnostics;

package/dist/utils.js CHANGED

@@ -221,6 +221,9 @@ function normalizeFrom(from) {
     if (from.address) {
         from.address = from.address.replace(/^[\*\_]+|[\*\_]+$/g, '').trim();
     }
+    // FINAL VALIDATION: If at the end we have no address and no name, return null
+    if (!from.address && !from.name)
+        return null;
     return from;
 }
 function normalizeParserResult(parsed, method, depth, warnings = []) {

package/docs/architecture/README.md CHANGED

@@ -16,6 +16,10 @@ This directory contains the documentation for the refactor of the `email-deepest
     *   Implementation of `OutlookFRDetector`, `NewOutlookDetector`, and `ReplyDetector`.
     *   Achieved **100% compatibility** with 239/239 body fixtures.
+4.  **[Confidence Scoring System](../confidence_scoring.md)**
+    *   Implementation of the signal-based reliability evaluation.
+    *   Handles email density, sender count mismatches, and quote level analysis.
 ## Planning & Reports
 *   **[Overall Plugin Plan](plugin_plan.md)**: The technical blueprint for the refactor.

package/docs/confidence_scoring.md ADDED

@@ -0,0 +1,75 @@
+# Confidence Scoring System
+The `email-origin-chain` library implements a specialized **Signal-Based Scoring System** to evaluate the reliability of detected email chains. This is particularly important for `inline` detection (text-based), where formatting can sometimes be ambiguous.
+## ⚖️ Architecture
+Instead of a single boolean check, the system evaluates several independent **Signals**. These signals can be positive (bonuses) or negative (penalties) and are aggregated into a final score from **0 to 100**.
+```mermaid
+graph TD
+    A[Message Body Analysis] --> B[Metrics Extraction]
+    B --> C[Ratio: Email / Depth]
+    B --> D[Sender Detection]
+    B --> E[Quote Nesting Levels]
+    C & D & E --> F[Signal Evaluators]
+    F --> S1[Ratio Adjustments]
+    F --> S2[Sender Count Penalty]
+    F --> S3[Quote Depth Penalty]
+    F --> S4[Context Header Bonus]
+    S1 & S2 & S3 & S4 --> G[Aggregate Score]
+    G --> H[Clamping 0-100]
+```
+## 📊 Available Signals
+### 1. Ratio Signals (Base Score)
+The ratio is calculated as `Detected Email Addresses / Detected Forward Depth`.
+| Signal | Logic | Impact | Description |
+| :--- | :--- | :--- | :--- |
+| `penalty_ghost` | `email_count == 0` | -100 | The chain indicates a forward but no actual email addresses found. |
+| `penalty_inconsistent` | `ratio < 0.5` | -100 | Extremely low density for the detected depth. |
+| `adjustment_partial` | `0.5 <= ratio <= 1.5` | -50 | Likely a partial chain (e.g., only 1 email detected per level). |
+| `adjustment_high_density` | `ratio > 2.4` | -75 | Very high density (many emails). Requires verification. |
+| **Standard** | `1.5 < ratio <= 2.4` | +0 | **Optimal base score (100)**. Typical for From/To blocks. |
+### 2. Validation & Penalties
+These signals refine the base score based on visual or logical evidence.
+| Signal | Condition | Impact | Description |
+| :--- | :--- | :--- | :--- |
+| `bonus_validated_density` | High density + >60% headers | +75 | Validates "High Density" chains if email addresses are preceded by headers like `To:` or `Cc:`. |
+| `penalty_sender_mismatch` | `senders > depth` | -75 | Found more actual `From:` headers than recursion levels. Suggests a missed separator. |
+| `penalty_quote_mismatch` | `quote_depth > depth` | -75 | Found nested `>` symbols deeper than the detected levels. Suggests hidden levels. |
+## 🔍 Debugging & Transparency
+The extraction result provides two fields for auditing the score:
+1.  **`confidence_signals`**: A raw key-value pair of every triggered signal and its impact.
+2.  **`confidence_reasons`**: A list of human-readable strings explaining each triggered signal.
+### Example Suspect Result
+```json
+{
+  "confidence_score": 25,
+  "confidence_description": "Low Confidence: High Density: Ratio 5.00 is high (many emails per level); Unvalidated Density; Sender Mismatch: Found 2 senders but only 1 forward levels",
+  "confidence_signals": {
+    "adjustment_high_density": -75,
+    "penalty_sender_mismatch": -75
+  },
+  "confidence_reasons": [
+    "High Density: Ratio 5.00 is high (many emails per level)",
+    "Unvalidated Density: Only 0% of emails have header context",
+    "Sender Mismatch: Found 2 senders but only 1 forward levels"
+  ]
+}
+```
+## 🛠 Usage for Developers
+You should typically flag results with `confidence_score < 50` for manual review, as they likely indicate "Garbage" chains or highly fragmented formatting that fooled the parser.
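
The review threshold recommended above could be applied with a small triage step like the one below. This is a sketch, not part of the package: `triage` and its return labels are hypothetical, and only the `confidence_score` and `confidence_reasons` field names come from the `ResultObject` type in this diff.

```javascript
// Hypothetical triage helper for objects shaped like ResultObject.
function triage(result, threshold = 50) {
    // confidence_score is declared optional in types.d.ts, so handle its absence.
    if (typeof result.confidence_score !== 'number') {
        return { action: 'unknown', reasons: [] };
    }
    const action = result.confidence_score < threshold ? 'review' : 'accept';
    return { action, reasons: result.confidence_reasons || [] };
}

// Sample objects containing only the fields the helper reads.
const good = triage({
    confidence_score: 100,
    confidence_reasons: ['Standard Density: Ratio 2.00 is optimal (~2 emails per level)']
});
const bad = triage({
    confidence_score: 0,
    confidence_reasons: ['Ghost Forward: 0 emails found in the body']
});
const partial = triage({ confidence_score: 50, confidence_reasons: [] });
```

Note that a result with no detected depth also reports a score of 100 with the description "N/A (No depth detected)", so a caller that cares about that case may want to check `confidence_description` or `diagnostics.depth` as well.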

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "email-origin-chain",
-  "version": "1.0.8",
+  "version": "1.0.11",
   "description": "Uncover the full audit trail of your email threads. Recursively reconstructs the entire conversation history with instant access to the original sender and true source message.",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",