email-origin-chain 1.0.8 → 1.0.11

package/README.md CHANGED
@@ -17,6 +17,7 @@ Detailed documentation can be found in the [docs/architecture/](docs/architectur
17
17
  - [Phase 2: Plugin Architecture](docs/architecture/phase2_plugin_foundation.md)
18
18
  - [Phase 3: Full Compatibility (100%)](docs/architecture/phase3_fallbacks.md)
19
19
  - [Deep Forward Fix Walkthrough](docs/walkthrough_deep_forward_fix.md)
20
+ - [Confidence Scoring System](docs/confidence_scoring.md)
20
21
  - [Detector Usage & Priorities](docs/detectors_usage.md)
21
22
 
22
23
  **✅ Test Coverage:** The library has been validated against **239 fixtures** from the `email-forward-parser-recursive` library with a **100% success rate** (239/239). This includes validating message bodies and ensuring non-message snippets are correctly identified. See [Test Coverage Report](docs/TEST_COVERAGE.md) for details.
@@ -77,6 +78,10 @@ The library returns a `ResultObject` with the following structure:
77
78
  | `text` | `string \| null` | Cleaned body content of the deepest message. |
78
79
  | `attachments` | `array` | Metadata for MIME attachments found at the deepest level. |
79
80
  | `history` | `array` | **Conversation Chaining**: Full audit trail of the discussion (see below). |
81
+ | `confidence_score` | `number` | Reliability score (0-100) based on signal analysis. |
82
+ | `confidence_description` | `string` | Human-readable explanation of the score. |
83
+ | `confidence_signals` | `object` | Key-value breakdown of triggered bonuses and penalties. |
84
+ | `confidence_reasons` | `array` | Detailed list of triggered scoring rules. |
80
85
  | `diagnostics` | `object` | Metadata about the parsing process. |
81
86
 
82
87
  ### Diagnostics Detail
@@ -116,6 +121,22 @@ Each history entry contains its own `from`, `to`, `cc`, `subject`, `date_iso`, `
116
121
  - `content:silent_forward`: The user forwarded the message without adding any text.
117
122
  - `date:unparseable`: A date string was found but could not be normalized to ISO.
118
123
 
124
+ ## Confidence Scoring System
125
+
126
+ To ensure high-quality extraction from text-based forwards, the library computes a **Signal-Based Confidence Score** from 0 to 100. It compares email address density, sender count, and quote nesting against the detected forward depth to flag "garbage" or incomplete chains.
127
+
128
+ ### Scoring Logic:
129
+ - **Baseline**: score of 100 for standard formatting (~2 emails per level, i.e. a ratio above 1.5 and up to 2.4).
130
+ - **Penalties** (points subtracted from the baseline; the total is clamped to 0-100):
131
+ - **Sender Mismatch**: More senders found than levels detected (-75).
132
+ - **Quote Mismatch**: Quote nesting deeper than detected levels (-75).
133
+ - **Partial Chain**: Roughly 1 email detected per level, ratio 0.5 to 1.5 (-50).
134
+ - **Ghost Forward**: No emails found in text (-100).
135
+ - **Bonuses**:
136
+ - **Validated Density**: High email density (ratio above 2.4, itself a -75 adjustment) corroborated by context headers (+75).
137
+
138
+ Check the [Confidence Scoring Documentation](docs/confidence_scoring.md) for full details.
139
+
119
140
  ### Typical Output Example
120
141
 
121
142
  ```json
@@ -148,7 +169,13 @@ Each history entry contains its own `from`, `to`, `cc`, `subject`, `date_iso`, `
148
169
  "depth": 2,
149
170
  "parsedOk": true,
150
171
  "warnings": []
151
- }
172
+ },
173
+ "confidence_score": 100,
174
+ "confidence_description": "High Confidence: Standard Density: Ratio 2.00 is optimal (~2 emails per level)",
175
+ "confidence_signals": {},
176
+ "confidence_reasons": [
177
+ "Standard Density: Ratio 2.00 is optimal (~2 emails per level)"
178
+ ]
152
179
  }
153
180
  ```
154
181
 
@@ -278,7 +305,13 @@ console.log(result.diagnostics.depth); // 4 (5 messages total)
278
305
  "depth": 4,
279
306
  "parsedOk": true,
280
307
  "warnings": []
281
- }
308
+ },
309
+ "confidence_score": 100,
310
+ "confidence_description": "High Confidence: Standard Density: Ratio 2.00 is optimal (~2 emails per level)",
311
+ "confidence_signals": {},
312
+ "confidence_reasons": [
313
+ "Standard Density: Ratio 2.00 is optimal (~2 emails per level)"
314
+ ]
282
315
  }
283
316
  ```
284
317
 
@@ -10,7 +10,7 @@ import { ForwardDetector, DetectionResult } from './types';
10
10
  */
11
11
  export declare class OutlookEmptyHeaderDetector implements ForwardDetector {
12
12
  readonly name = "outlook_empty_header";
13
- readonly priority = 50;
13
+ readonly priority = -50;
14
14
  private readonly HEADER_PATTERN;
15
15
  detect(text: string): DetectionResult;
16
16
  }
@@ -14,7 +14,7 @@ const cleaner_1 = require("../utils/cleaner");
14
14
  class OutlookEmptyHeaderDetector {
15
15
  constructor() {
16
16
  this.name = 'outlook_empty_header';
17
- this.priority = 50; // Fallback for corrupted headers (after specifics, before generic Crisp)
17
+ this.priority = -50; // Very specific - High Priority
18
18
  // Regex to capture the header block:
19
19
  // 1. Optional Separator (mostly underscores)
20
20
  // 2. De: ... (From)
@@ -51,6 +51,7 @@ class OutlookEmptyHeaderDetector {
51
51
  message: message || undefined,
52
52
  email: {
53
53
  from: fromLine,
54
+ to: toLine,
54
55
  subject: subjectLine,
55
56
  date: dateLine || undefined,
56
57
  body: finalBody
@@ -4,7 +4,7 @@ import { ForwardDetector, DetectionResult } from './types';
4
4
  */
5
5
  export declare class OutlookReverseFrDetector implements ForwardDetector {
6
6
  readonly name = "outlook_reverse_fr";
7
- readonly priority = -20;
7
+ readonly priority = -45;
8
8
  private readonly ENVOYE_PATTERN;
9
9
  private readonly DE_PATTERN;
10
10
  private readonly A_PATTERN;
@@ -8,7 +8,7 @@ const cleaner_1 = require("../utils/cleaner");
8
8
  class OutlookReverseFrDetector {
9
9
  constructor() {
10
10
  this.name = 'outlook_reverse_fr';
11
- this.priority = -20; // Specific detector - High Priority (Override)
11
+ this.priority = -45; // Specific detector - High Priority
12
12
  // Regex patterns for field detection
13
13
  this.ENVOYE_PATTERN = /^[ \t]*Envoy(?:é|=E9|e)?\s*:\s*(.*?)\s*$/m;
14
14
  this.DE_PATTERN = /^[ \t]*De\s*:/i;
@@ -76,6 +76,7 @@ class OutlookReverseFrDetector {
76
76
  from: fromEmail.includes('@')
77
77
  ? { name: fromName !== fromEmail ? fromName : '', address: fromEmail }
78
78
  : { name: fromName, address: fromName },
79
+ to: a ? extractValue(a.line) : undefined,
79
80
  subject: objet ? extractValue(objet.line) : '',
80
81
  date: extractValue(envoyeMatch[0]),
81
82
  body: finalBody
@@ -15,12 +15,12 @@ class DetectorRegistry {
15
15
  constructor(customDetectors = []) {
16
16
  this.detectors = [];
17
17
  // Register all detectors (priority determines order)
18
- this.register(new crisp_detector_1.CrispDetector()); // priority: 0 (highest - universal library)
19
- this.register(new outlook_empty_header_detector_1.OutlookEmptyHeaderDetector()); // priority: 5 (handle empty headers)
20
- this.register(new outlook_reverse_fr_detector_1.OutlookReverseFrDetector()); // priority: 6 (handle reversed FR headers)
21
- this.register(new reply_detector_1.ReplyDetector()); // priority: 7 (handle standard replies)
22
- this.register(new outlook_fr_detector_1.OutlookFRDetector()); // priority: 10 (fallback for FR formats)
23
- this.register(new new_outlook_detector_1.NewOutlookDetector()); // priority: 10 (fallback for new Outlook)
18
+ this.register(new outlook_empty_header_detector_1.OutlookEmptyHeaderDetector()); // priority: -50 (Very specific)
19
+ this.register(new outlook_reverse_fr_detector_1.OutlookReverseFrDetector()); // priority: -45 (Specific)
20
+ this.register(new new_outlook_detector_1.NewOutlookDetector()); // priority: -40 (Specific)
21
+ this.register(new outlook_fr_detector_1.OutlookFRDetector()); // priority: -30 (Fallback for FR)
22
+ this.register(new reply_detector_1.ReplyDetector()); // priority: -10 (Replies)
23
+ this.register(new crisp_detector_1.CrispDetector()); // priority: 100 (Universal fallback)
24
24
  // Register custom detectors
25
25
  customDetectors.forEach(detector => this.register(detector));
26
26
  }
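The new priority values suggest that lower numbers run first: the very specific Outlook detectors sit at -50 and -45, while Crisp moves to 100 as the universal fallback. Below is a minimal sketch of that ordering. It assumes the registry sorts ascending by `priority` and takes the first detector whose `detect` reports a match. The `found` field, the `sortByPriority` and `firstMatch` helpers, and the stub detectors are hypothetical and not taken from the package.

```javascript
// Hypothetical stand-ins for the real detectors: only `name`, `priority`
// and `detect` come from the ForwardDetector declarations in this diff.
const stub = (name, priority, matches) => ({
  name,
  priority,
  // `found` is an assumed field name, used here for illustration only.
  detect: (text) => ({ found: matches(text) }),
});

const detectors = [
  stub("crisp", 100, () => true), // universal fallback, matches anything
  stub("reply", -10, (t) => /wrote:/i.test(t)),
  stub("outlook_empty_header", -50, (t) => /^De\s*:/m.test(t)),
  stub("outlook_reverse_fr", -45, (t) => /^Envoy/m.test(t)),
];

// Assumption: a lower priority value means the detector is tried earlier.
function sortByPriority(list) {
  return [...list].sort((a, b) => a.priority - b.priority);
}

// Returns the name of the first detector (in priority order) that matches.
function firstMatch(list, text) {
  return sortByPriority(list).find((d) => d.detect(text).found)?.name ?? null;
}

console.log(sortByPriority(detectors).map((d) => d.name));
console.log(firstMatch(detectors, "De : Alice\nObjet : Test"));
console.log(firstMatch(detectors, "plain text"));
```

Under this assumption, registration order in the constructor is cosmetic: the sort decides which detector gets the first attempt, and the fallback only wins when nothing more specific matches.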
package/dist/index.js CHANGED
@@ -18,6 +18,7 @@ exports.extractDeepestHybrid = extractDeepestHybrid;
18
18
  const mime_layer_1 = require("./mime-layer");
19
19
  const inline_layer_1 = require("./inline-layer");
20
20
  const utils_1 = require("./utils");
21
+ const scoring_1 = require("./scoring");
21
22
  /**
22
23
  * Main entry point: Extract the deepest forwarded email using hybrid strategy
23
24
  */
@@ -53,17 +54,17 @@ async function extractDeepestHybrid(raw, options) {
53
54
  const inlineResult = await (0, inline_layer_1.processInline)(mimeResult.rawBody, mimeResult.depth, mimeResult.history, opts.customDetectors);
54
55
  // Step 3: Align results
55
56
  let from = (0, utils_1.normalizeFrom)(inlineResult.from);
56
- let to = inlineResult.to;
57
+ let to = (0, utils_1.normalizeFrom)(inlineResult.to);
57
58
  let subject = inlineResult.subject;
58
59
  let date_raw = inlineResult.date_raw;
59
60
  let date_iso = inlineResult.date_iso;
60
61
  let text = inlineResult.text;
61
62
  if (inlineResult.diagnostics.method === 'fallback' && mimeResult.metadata) {
62
63
  const m = mimeResult.metadata;
63
- if (!from && m.from?.value?.[0]) {
64
+ if ((!from || !from.address) && m.from?.value?.[0]) {
64
65
  from = (0, utils_1.normalizeFrom)({ name: m.from.value[0].name, address: m.from.value[0].address });
65
66
  }
66
- if (!to && m.to?.value?.[0]) {
67
+ if ((!to || !to.address) && m.to?.value?.[0]) {
67
68
  to = (0, utils_1.normalizeFrom)({ name: m.to.value[0].name, address: m.to.value[0].address });
68
69
  }
69
70
  if (!subject && m.subject)
@@ -99,6 +100,8 @@ async function extractDeepestHybrid(raw, options) {
99
100
  date_iso = date_iso || (0, utils_1.normalizeDateToISO)(date_raw);
100
101
  // Destructure to exclude 'from' since we have our own normalized version
101
102
  const { from: _unusedFrom, ...restInlineResult } = inlineResult;
103
+ // Calculate confidence score
104
+ const confidence = (0, scoring_1.calculateConfidence)(mimeResult.rawBody, mimeResult.depth + inlineResult.diagnostics.depth);
102
105
  const result = {
103
106
  ...restInlineResult,
104
107
  // Use our normalized/enriched values
@@ -109,6 +112,15 @@ async function extractDeepestHybrid(raw, options) {
109
112
  date_iso,
110
113
  text: (0, utils_1.cleanText)(text),
111
114
  full_body: mimeResult.rawBody,
115
+ // Confidence
116
+ confidence_score: confidence.score,
117
+ confidence_description: confidence.description,
118
+ confidence_ratio: confidence.ratio,
119
+ confidence_email_count: confidence.email_count,
120
+ confidence_sender_count: confidence.sender_count,
121
+ confidence_quote_depth: confidence.quote_depth,
122
+ confidence_signals: confidence.signals,
123
+ confidence_reasons: confidence.reasons,
112
124
  attachments: [...attachments, ...inlineResult.attachments],
113
125
  diagnostics: {
114
126
  ...inlineResult.diagnostics,
@@ -121,7 +121,7 @@ async function processInline(text, depth, baseHistory = [], customDetectors = []
121
121
  attachments: [],
122
122
  history: history.slice().reverse(),
123
123
  diagnostics: {
124
- method: (deepestEntry.flags.find(f => f.startsWith('method:')) || 'inline'),
124
+ method: (deepestEntry.flags.find(f => f.startsWith('method:'))?.replace('method:', '') || 'inline'),
125
125
  depth: currentDepth - startingDepth,
126
126
  parsedOk: true,
127
127
  warnings: warnings
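The change to `diagnostics.method` strips the `method:` prefix from the matching flag, so consumers presumably see a bare method name rather than the raw flag string. Here is a small sketch of the before and after behaviour on a hypothetical flags array. The flag value `method:crisp` and the helper names are illustrative assumptions.

```javascript
// Old behaviour: the whole flag string was returned, prefix included.
function methodBefore(flags) {
  return flags.find((f) => f.startsWith("method:")) || "inline";
}

// New behaviour: the prefix is removed; "inline" remains the default.
function methodAfter(flags) {
  return flags.find((f) => f.startsWith("method:"))?.replace("method:", "") || "inline";
}

const flags = ["content:silent_forward", "method:crisp"]; // hypothetical flags
console.log(methodBefore(flags));
console.log(methodAfter(flags));
console.log(methodAfter([]));
```

This matters for the check `inlineResult.diagnostics.method === 'fallback'` in `dist/index.js`, which can only succeed against the bare name.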
@@ -0,0 +1,16 @@
1
+ /**
2
+ * Confidence Score Calculation Logic
3
+ * Evaluates the coherence between detected forward depth and email address density.
4
+ * Uses a Signal-Based architecture where various factors contribute to a health score.
5
+ */
6
+ export interface ConfidenceResult {
7
+ score: number;
8
+ description: string;
9
+ ratio: number;
10
+ email_count: number;
11
+ sender_count: number;
12
+ quote_depth: number;
13
+ signals: Record<string, number>;
14
+ reasons: string[];
15
+ }
16
+ export declare function calculateConfidence(fullBody: string, depth: number): ConfidenceResult;
@@ -0,0 +1,154 @@
1
+ "use strict";
2
+ /**
3
+ * Confidence Score Calculation Logic
4
+ * Evaluates the coherence between detected forward depth and email address density.
5
+ * Uses a Signal-Based architecture where various factors contribute to a health score.
6
+ */
7
+ Object.defineProperty(exports, "__esModule", { value: true });
8
+ exports.calculateConfidence = calculateConfidence;
9
+ function calculateConfidence(fullBody, depth) {
10
+ // 0. Base case: No depth detected implies no confidence metric applicable (N/A)
11
+ if (depth === 0) {
12
+ return {
13
+ score: 100,
14
+ description: "N/A (No depth detected)",
15
+ ratio: 0,
16
+ email_count: 0,
17
+ sender_count: 0,
18
+ quote_depth: 0,
19
+ signals: {},
20
+ reasons: ["No depth detected"]
21
+ };
22
+ }
23
+ // 1. Calculate Max Quote Depth (">" prefix)
24
+ const lines = fullBody.split('\n');
25
+ let maxQuoteDepth = 0;
26
+ for (const line of lines) {
27
+ const match = line.match(/^(\s*>)+/);
28
+ if (match) {
29
+ const qCount = (match[0].match(/>/g) || []).length;
30
+ if (qCount > maxQuoteDepth)
31
+ maxQuoteDepth = qCount;
32
+ }
33
+ }
34
+ // 2. Count emails strictly between angle brackets <...>
35
+ const emailRegex = /<[\s\r\n]*([^\s<>@]+@[^\s<>@]+)[\s\r\n]*>/g;
36
+ let match;
37
+ const emails = [];
38
+ while ((match = emailRegex.exec(fullBody)) !== null) {
39
+ emails.push({ addr: match[1], index: match.index, fullMatchLength: match[0].length });
40
+ }
41
+ const count = emails.length;
42
+ const ratio = count / depth;
43
+ // 3. Sender & Header context analysis
44
+ let explainedCount = 0;
45
+ let fromCount = 0;
46
+ const contextWindow = 150;
47
+ const fromKeywords = [
48
+ "From", "Od", "Fra", "Von", "De", "Lähettäjä", "Šalje", "Feladó", "Da", "Van", "Expeditorul",
49
+ "Отправитель", "Från", "Kimden", "Від кого", "Saatja", "De la", "Gönderen", "От", "Від",
50
+ "Mittente", "Nadawca", "送信元"
51
+ ];
52
+ const otherKeywords = [
53
+ "To", "Komu", "Til", "An", "Para", "Vastaanottaja", "À", "Prima", "Címzett", "A", "Aan", "Do",
54
+ "Destinatarul", "Кому", "Pre", "Till", "Kime", "Pour", "Adresat", "送信先",
55
+ "Cc", "CC", "Kopie", "Kopio", "Másolat", "Kopi", "Dw", "Копия", "Kopia", "Bilgi", "Копія",
56
+ "Másolatot kap", "Kópia", "Copie à",
57
+ "Reply-To", "Odgovori na", "Odpověď na", "Svar til", "Antwoord aan", "Vastaus", "Répondre à",
58
+ "Antwort an", "Válaszcím", "Rispondi a", "Odpowiedź-do", "Responder A", "Responder a",
59
+ "Răspuns către", "Ответ-Кому", "Odpovedať-Pre", "Svara till", "Yanıt Adresi", "Кому відповісти"
60
+ ];
61
+ const trailingSenderKeywords = [
62
+ "wrote", "escribió", "a écrit", "kirjoitti", "ezt írta", "ha scritto", "geschreven", "skrev",
63
+ "napisał", "escreveu", "написал", "napísal", "följande", "tarihinde şunu yazdı", "napsal"
64
+ ];
65
+ const buildRegex = (words, strict = false) => {
66
+ const sorted = Array.from(new Set(words)).sort((a, b) => b.length - a.length);
67
+ const joined = sorted.map(k => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')).join('|');
68
+ const prefix = `[\\*\\_\\>]*\\s*`;
69
+ const suffix = `\\s*[\\*\\_]*\\s*`;
70
+ if (strict) {
71
+ return new RegExp(`(?:${prefix}(?:${joined})${suffix})\\s*:\\s*(?:[^:\\n]*\\n\\s*)?[^:\\n]*$`, 'i');
72
+ }
73
+ return new RegExp(`(?:${prefix}(?:${joined})${suffix})\\s*:`, 'i');
74
+ };
75
+ const headerPattern = buildRegex([...fromKeywords, ...otherKeywords], false);
76
+ const fromPattern = buildRegex(fromKeywords, true);
77
+ const trailingPattern = new RegExp(`^\\s*[\\*\\_\\>]*\\s*(?:${trailingSenderKeywords.join('|')})\\s*:?`, 'i');
78
+ for (const email of emails) {
79
+ const start = Math.max(0, email.index - contextWindow);
80
+ const preText = fullBody.substring(start, email.index);
81
+ const postText = fullBody.substring(email.index + email.fullMatchLength);
82
+ const blocks = preText.split(/\n\s*\n/);
83
+ const currentBlock = blocks[blocks.length - 1];
84
+ if (headerPattern.test(currentBlock))
85
+ explainedCount++;
86
+ if (fromPattern.test(preText) || trailingPattern.test(postText))
87
+ fromCount++;
88
+ }
89
+ // ⚖️ SIGNAL-BASED SCORING ⚖️
90
+ const signals = {};
91
+ let finalScore = 100;
92
+ const reasons = [];
93
+ // --- 1. Ratio Signals (The base score) ---
94
+ if (count === 0) {
95
+ signals['penalty_ghost'] = -100;
96
+ reasons.push("Ghost Forward: 0 emails found in the body");
97
+ }
98
+ else if (ratio < 0.5) {
99
+ signals['penalty_inconsistent'] = -100;
100
+ reasons.push(`Inconsistent Density: Ratio ${ratio.toFixed(2)} is too low (expected >= 0.5)`);
101
+ }
102
+ else if (ratio >= 0.5 && ratio <= 1.5) {
103
+ signals['adjustment_partial'] = -50;
104
+ reasons.push(`Partial Chain: Ratio ${ratio.toFixed(2)} suggests ~1 email per detected level`);
105
+ }
106
+ else if (ratio > 2.4) {
107
+ signals['adjustment_high_density'] = -75;
108
+ reasons.push(`High Density: Ratio ${ratio.toFixed(2)} is high (many emails per level)`);
109
+ // Bonus for validated high density
110
+ const explainedRatio = explainedCount / count;
111
+ if (explainedRatio >= 0.6) {
112
+ signals['bonus_validated_density'] = 75;
113
+ reasons.push(`Validated Density: ${Math.round(explainedRatio * 100)}% of emails are preceded by headers`);
114
+ }
115
+ else {
116
+ reasons.push(`Unvalidated Density: Only ${Math.round(explainedRatio * 100)}% of emails have header context`);
117
+ }
118
+ }
119
+ else {
120
+ reasons.push(`Standard Density: Ratio ${ratio.toFixed(2)} is optimal (~2 emails per level)`);
121
+ }
122
+ // --- 2. Coherence Signals (Penalties) ---
123
+ if (fromCount > depth) {
124
+ signals['penalty_sender_mismatch'] = -75;
125
+ reasons.push(`Sender Mismatch: Found ${fromCount} senders but only ${depth} forward levels`);
126
+ }
127
+ if (maxQuoteDepth > depth) {
128
+ signals['penalty_quote_mismatch'] = -75;
129
+ reasons.push(`Quote Mismatch: Max quote nesting ${maxQuoteDepth} exceeds detected depth ${depth}`);
130
+ }
131
+ // --- Aggregate ---
132
+ for (const val of Object.values(signals)) {
133
+ finalScore += val;
134
+ }
135
+ finalScore = Math.max(0, Math.min(100, finalScore));
136
+ // Prefix the joined reasons with a confidence label derived from the final score
137
+ let description = reasons.join("; ");
138
+ if (finalScore === 100)
139
+ description = "High Confidence: " + description;
140
+ else if (finalScore >= 50)
141
+ description = "Medium Confidence: " + description;
142
+ else
143
+ description = "Low Confidence: " + description;
144
+ return {
145
+ score: finalScore,
146
+ description,
147
+ ratio,
148
+ email_count: count,
149
+ sender_count: fromCount,
150
+ quote_depth: maxQuoteDepth,
151
+ signals,
152
+ reasons
153
+ };
154
+ }
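The first two steps of `calculateConfidence` can be tried in isolation. The sketch below reuses the two regular expressions from the listing above to measure the deepest `>` quote nesting and to count addresses wrapped in angle brackets. The `extractMetrics` wrapper and the sample body are illustrative and not part of the package.

```javascript
function extractMetrics(fullBody, depth) {
  // Deepest run of leading ">" markers on any single line.
  let maxQuoteDepth = 0;
  for (const line of fullBody.split("\n")) {
    const m = line.match(/^(\s*>)+/);
    if (m) {
      maxQuoteDepth = Math.max(maxQuoteDepth, (m[0].match(/>/g) || []).length);
    }
  }
  // Only addresses enclosed in <...> are counted, as in the listing.
  const emailRegex = /<[\s\r\n]*([^\s<>@]+@[^\s<>@]+)[\s\r\n]*>/g;
  const emails = [...fullBody.matchAll(emailRegex)].map((m) => m[1]);
  return {
    maxQuoteDepth,
    emailCount: emails.length,
    ratio: depth > 0 ? emails.length / depth : 0,
  };
}

const body = [
  "---------- Forwarded message ---------",
  "From: Alice <alice@example.com>",
  "To: Bob <bob@example.com>",
  "",
  "> > nested quote",
  "> single quote",
  "contact me at carol@example.com",
].join("\n");

console.log(extractMetrics(body, 1));
```

The bare address on the last line is not counted because it is not enclosed in angle brackets. With a detected depth of 1, the quote nesting of 2 in this sample would also trip the quote mismatch penalty.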
package/dist/types.d.ts CHANGED
@@ -35,6 +35,14 @@ export interface ResultObject {
35
35
  date_iso: string | null;
36
36
  text: string | null;
37
37
  full_body?: string;
38
+ confidence_score?: number;
39
+ confidence_description?: string;
40
+ confidence_ratio?: number;
41
+ confidence_email_count?: number;
42
+ confidence_sender_count?: number;
43
+ confidence_quote_depth?: number;
44
+ confidence_signals?: Record<string, number>;
45
+ confidence_reasons?: string[];
38
46
  attachments: Attachment[];
39
47
  history: HistoryEntry[];
40
48
  diagnostics: Diagnostics;
package/dist/utils.js CHANGED
@@ -221,6 +221,9 @@ function normalizeFrom(from) {
221
221
  if (from.address) {
222
222
  from.address = from.address.replace(/^[\*\_]+|[\*\_]+$/g, '').trim();
223
223
  }
224
+ // FINAL VALIDATION: If at the end we have no address and no name, return null
225
+ if (!from.address && !from.name)
226
+ return null;
224
227
  return from;
225
228
  }
226
229
  function normalizeParserResult(parsed, method, depth, warnings = []) {
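The `normalizeFrom` change pairs with the stricter `!from || !from.address` checks in `dist/index.js`. An empty sender now collapses to `null`, and a sender with a name but no address still lets the MIME metadata fill in. The reduced sketch below shows that interplay. `normalizeFromLite` is a hypothetical stand-in that keeps only the asterisk/underscore trimming and the new final validation. `pickSender` mimics the fallback condition but omits the `method === 'fallback'` guard of the real code.

```javascript
// Hypothetical, reduced stand-in for normalizeFrom in dist/utils.js.
function normalizeFromLite(from) {
  if (!from) return null;
  const out = { name: (from.name || "").trim(), address: from.address || "" };
  out.address = out.address.replace(/^[\*\_]+|[\*\_]+$/g, "").trim();
  // Final validation added in this release: nothing usable means null.
  if (!out.address && !out.name) return null;
  return out;
}

// Mirrors the new condition: fall back when there is no sender at all
// or when the sender lacks an address.
function pickSender(inlineFrom, mimeFrom) {
  const from = normalizeFromLite(inlineFrom);
  if ((!from || !from.address) && mimeFrom) return normalizeFromLite(mimeFrom);
  return from;
}

const mime = { name: "Alice", address: "alice@example.com" };
console.log(normalizeFromLite({ name: "", address: "**" }));
console.log(pickSender({ name: "Alice", address: "" }, mime));
console.log(pickSender({ name: "Bob", address: "*bob@example.com*" }, mime));
```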
@@ -16,6 +16,10 @@ This directory contains the documentation for the refactor of the `email-deepest
16
16
  * Implementation of `OutlookFRDetector`, `NewOutlookDetector`, and `ReplyDetector`.
17
17
  * Achieved **100% compatibility** with 239/239 body fixtures.
18
18
 
19
+ 4. **[Confidence Scoring System](../confidence_scoring.md)**
20
+ * Implementation of the signal-based reliability evaluation.
21
+ * Handles email density, sender count mismatches, and quote level analysis.
22
+
19
23
  ## Planning & Reports
20
24
 
21
25
  * **[Overall Plugin Plan](plugin_plan.md)**: The technical blueprint for the refactor.
@@ -0,0 +1,75 @@
1
+ # Confidence Scoring System
2
+
3
+ The `email-origin-chain` library implements a specialized **Signal-Based Scoring System** to evaluate the reliability of detected email chains. This is particularly important for `inline` detection (text-based), where formatting can sometimes be ambiguous.
4
+
5
+ ## ⚖️ Architecture
6
+
7
+ Instead of a single boolean check, the system evaluates several independent **Signals**. These signals can be positive (bonuses) or negative (penalties) and are aggregated into a final score from **0 to 100**.
8
+
9
+ ```mermaid
10
+ graph TD
11
+ A[Message Body Analysis] --> B[Metrics Extraction]
12
+ B --> C[Ratio: Email / Depth]
13
+ B --> D[Sender Detection]
14
+ B --> E[Quote Nesting Levels]
15
+
16
+ C & D & E --> F[Signal Evaluators]
17
+
18
+ F --> S1[Ratio Adjustments]
19
+ F --> S2[Sender Count Penalty]
20
+ F --> S3[Quote Depth Penalty]
21
+ F --> S4[Context Header Bonus]
22
+
23
+ S1 & S2 & S3 & S4 --> G[Aggregate Score]
24
+ G --> H[Clamping 0-100]
25
+ ```
26
+
27
+ ## 📊 Available Signals
28
+
29
+ ### 1. Ratio Signals (Base Score)
30
+ The ratio is calculated as `Detected Email Addresses / Detected Forward Depth`.
31
+
32
+ | Signal | Logic | Impact | Description |
33
+ | :--- | :--- | :--- | :--- |
34
+ | `penalty_ghost` | `email_count == 0` | -100 | The chain indicates a forward, but no email addresses were found. |
35
+ | `penalty_inconsistent` | `ratio < 0.5` | -100 | Extremely low density for the detected depth. |
36
+ | `adjustment_partial` | `0.5 <= ratio <= 1.5` | -50 | Likely a partial chain (e.g., only 1 email detected per level). |
37
+ | `adjustment_high_density` | `ratio > 2.4` | -75 | Very high density (many emails). Requires verification. |
38
+ | **Standard** | `1.5 < ratio <= 2.4` | +0 | **Optimal base score (100)**. Typical for From/To blocks. |
39
+
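The ratio bands in the table above map onto a simple chain of comparisons. Below is a minimal sketch with the thresholds copied from the table. The `ratioSignal` helper is hypothetical and returns the signal name with its impact. The compiled function also records reasons, and it reports a depth of 0 as not applicable before any ratio is computed.

```javascript
// Assumes depth > 0; the real function returns early when depth is 0.
function ratioSignal(emailCount, depth) {
  const ratio = emailCount / depth;
  if (emailCount === 0) return { name: "penalty_ghost", impact: -100 };
  if (ratio < 0.5) return { name: "penalty_inconsistent", impact: -100 };
  if (ratio <= 1.5) return { name: "adjustment_partial", impact: -50 };
  if (ratio > 2.4) return { name: "adjustment_high_density", impact: -75 };
  // "standard" is a placeholder label: no signal is recorded in this band.
  return { name: "standard", impact: 0 };
}

for (const [emails, depth] of [[0, 2], [1, 3], [2, 2], [4, 2], [5, 1]]) {
  console.log(emails, depth, ratioSignal(emails, depth));
}
```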
40
+ ### 2. Validation & Penalties
41
+ These signals refine the base score based on visual or logical evidence.
42
+
43
+ | Signal | Condition | Impact | Description |
44
+ | :--- | :--- | :--- | :--- |
45
+ | `bonus_validated_density` | High density + >=60% of emails with header context | +75 | Validates "High Density" chains, offsetting the -75 adjustment, if email addresses are preceded by headers like `To:` or `Cc:`. |
46
+ | `penalty_sender_mismatch` | `senders > depth` | -75 | Found more actual `From:` headers than recursion levels. Suggests a missed separator. |
47
+ | `penalty_quote_mismatch` | `quote_depth > depth` | -75 | Found nested `>` symbols deeper than the detected levels. Suggests hidden levels. |
48
+
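Aggregation is a plain sum over the triggered signals, starting from 100 and clamped to the 0-100 range, followed by a label based on the final score. Here is a short sketch of that arithmetic. The `aggregate` helper is hypothetical, and the label thresholds follow the compiled `calculateConfidence`.

```javascript
function aggregate(signals) {
  // Start at 100 and add every signal impact (penalties are negative).
  const raw = Object.values(signals).reduce((sum, v) => sum + v, 100);
  const score = Math.max(0, Math.min(100, raw));
  const label =
    score === 100 ? "High Confidence" : score >= 50 ? "Medium Confidence" : "Low Confidence";
  return { raw, score, label };
}

console.log(aggregate({}));
console.log(aggregate({ adjustment_partial: -50 }));
console.log(aggregate({ adjustment_high_density: -75, bonus_validated_density: 75 }));
console.log(aggregate({ adjustment_high_density: -75, penalty_sender_mismatch: -75 }));
```

Because the sum is clamped, two -75 signals together bring the raw total to -50 and the reported score to 0, while a validated high-density chain nets out to the full 100.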
49
+ ## 🔍 Debugging & Transparency
50
+
51
+ The extraction result provides two fields for auditing the score:
52
+
53
+ 1. **`confidence_signals`**: A raw key-value map of every triggered signal and its impact.
54
+ 2. **`confidence_reasons`**: A list of human-readable strings explaining each triggered signal.
55
+
56
+ ### Example Suspect Result
57
+ ```json
58
+ {
59
+ "confidence_score": 0,
60
+ "confidence_description": "Low Confidence: High Density: Ratio 5.00 is high (many emails per level); Unvalidated Density: Only 0% of emails have header context; Sender Mismatch: Found 2 senders but only 1 forward levels",
61
+ "confidence_signals": {
62
+ "adjustment_high_density": -75,
63
+ "penalty_sender_mismatch": -75
64
+ },
65
+ "confidence_reasons": [
66
+ "High Density: Ratio 5.00 is high (many emails per level)",
67
+ "Unvalidated Density: Only 0% of emails have header context",
68
+ "Sender Mismatch: Found 2 senders but only 1 forward levels"
69
+ ]
70
+ }
71
+ ```
72
+
73
+ ## 🛠 Usage for Developers
74
+
75
+ You should typically flag results with `confidence_score < 50` for manual review, as they likely indicate "Garbage" chains or highly fragmented formatting that fooled the parser.
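One way to apply that threshold is a small triage helper. The sketch below assumes only the `confidence_score` and `confidence_reasons` fields shown in this document. `triage` and its `threshold` parameter are hypothetical names. A missing score is treated as needing review because the fields are declared optional on `ResultObject`.

```javascript
function triage(result, threshold = 50) {
  const score = result.confidence_score;
  // No numeric score at all is treated conservatively as "review".
  const needsReview = typeof score !== "number" || score < threshold;
  return {
    needsReview,
    reasons: needsReview ? (result.confidence_reasons ?? []) : [],
  };
}

console.log(triage({
  confidence_score: 100,
  confidence_reasons: ["Standard Density: Ratio 2.00 is optimal (~2 emails per level)"],
}));
console.log(triage({
  confidence_score: 0,
  confidence_reasons: ["Ghost Forward: 0 emails found in the body"],
}));
console.log(triage({}));
```

Returning the reasons alongside the flag keeps the audit trail next to the decision, which is the purpose of the `confidence_reasons` field.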
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "email-origin-chain",
3
- "version": "1.0.8",
3
+ "version": "1.0.11",
4
4
  "description": "Uncover the full audit trail of your email threads. Recursively reconstructs the entire conversation history with instant access to the original sender and true source message.",
5
5
  "main": "dist/index.js",
6
6
  "types": "dist/index.d.ts",