transparent-confidence 0.1.0

package/README.md ADDED
@@ -0,0 +1,537 @@
1
+ # transparent-confidence
2
+
3
+ **Structured, explainable confidence scoring for RAG systems.**
4
+
5
+ [![npm version](https://img.shields.io/npm/v/transparent-confidence.svg)](https://www.npmjs.com/package/transparent-confidence)
6
+ [![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
7
+ [![CI](https://github.com/etetzlaff/transparent-confidence/actions/workflows/ci.yml/badge.svg)](https://github.com/etetzlaff/transparent-confidence/actions/workflows/ci.yml)
8
+
9
+ ---
10
+
11
+ ## The Problem
12
+
13
+ - RAG pipelines produce answers but no reliable signal of whether to trust them
14
+ - Raw retrieval scores (cosine similarity, BM25) are hard to interpret for end users and for downstream systems
15
+ - There is no standard for expressing RAG answer quality in a way that is auditable, explainable, and actionable
16
+
17
+ ---
18
+
19
+ ## The Solution
20
+
21
+ `transparent-confidence` computes a typed scorecard (0–100) for any RAG answer:
22
+
23
+ - **Always normalized** — score is 0–100 regardless of which optional dimensions are active
24
+ - **Per-dimension breakdowns** — every point is explainable, not a black box
25
+ - **Tiered display** — Answer Confidence (Tier 1) and System Readiness (Tier 2) shown separately
26
+ - **Zero required config** — three core dimensions work out of the box; optional extensions activate on demand
27
+
28
+ ---
29
+
30
+ ## Install
31
+
32
+ ```bash
33
+ npm install transparent-confidence
34
+ ```
35
+
36
+ Requires Node.js 18+.
37
+
38
+ ---
39
+
40
+ ## Quick Start
41
+
42
+ ```typescript
43
+ import { computeConfidence } from 'transparent-confidence';
44
+
45
+ const scorecard = computeConfidence({
46
+   confidenceLevel: 'high',
47
+   citationCount: 3,
48
+   candidates: [
49
+     {
50
+       retrievalScores: { semantic: 0.88, keyword: 0.72 },
51
+       combinedScore: 0.88,
52
+       documentId: 'doc-001',
53
+     },
54
+     {
55
+       retrievalScores: { semantic: 0.85, keyword: 0.68 },
56
+       combinedScore: 0.85,
57
+       documentId: 'doc-002',
58
+     },
59
+     {
60
+       retrievalScores: { semantic: 0.82, keyword: 0.65 },
61
+       combinedScore: 0.82,
62
+       documentId: 'doc-003',
63
+     },
64
+   ],
65
+ });
66
+
67
+ console.log(scorecard.total); // 100
68
+ console.log(scorecard.label); // 'Strong'
69
+ console.log(scorecard.labelColor); // 'green'
70
+ ```
71
+
72
+ **Output shape:**
73
+
74
+ ```json
75
+ {
76
+ "total": 100,
77
+ "label": "Strong",
78
+ "labelColor": "green",
79
+ "tier1": { "score": 100, "label": "Strong", "color": "green" },
80
+ "tier2": null,
81
+ "dimensions": {
82
+ "grounding": { "raw": 30, "max": 30, "normalized": 100, "explanation": "Source text directly and unambiguously answers the question. 3 sections explicitly cited in answer (+2)." },
83
+ "retrieval": { "raw": 25, "max": 25, "normalized": 100, "explanation": "3 candidates confirmed by 2+ retrieval methods. Top-3 effective score avg: 0.85. 3 distinct source documents. 3 total candidates." },
84
+ "consistency": { "raw": 10, "max": 10, "normalized": 100, "explanation": "Score std dev 0.024 — very tight retrieval consistency. No conflict detected (+2)." }
85
+ },
86
+ "meta": {
87
+ "rawTotal": 65,
88
+ "maxPossible": 65,
89
+ "activeExtensions": []
90
+ }
91
+ }
92
+ ```
93
+
94
+ ---
95
+
96
+ ## Algorithm
97
+
98
+ The score is built from three core dimensions (always active) and up to three optional extensions. Raw points from all active dimensions are summed and normalized to 0–100.
99
+
100
+ ```
101
+ normalizedScore = round((rawTotal / maxPossible) × 100)
102
+
103
+ maxPossible = 65 (core)
104
+ + 20 (Authority active)
105
+ + 15 (Corpus active)
106
+ + 15 (Freshness active)
107
+ ```
108
+
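The normalization formula above can be sketched in a few lines. This is a minimal standalone illustration, not the package's source; the names `maxPossibleFor` and `normalizeScore` are hypothetical.

```typescript
// Sketch of the normalization step (hypothetical helper names, not the library API).
type Extension = 'authority' | 'corpus' | 'freshness';

const CORE_MAX = 65; // grounding 30 + retrieval 25 + consistency 10
const EXTENSION_MAX: Record<Extension, number> = {
  authority: 20,
  corpus: 15,
  freshness: 15,
};

// Maximum raw points achievable for a given set of active extensions.
function maxPossibleFor(active: Extension[]): number {
  return active.reduce((sum, ext) => sum + EXTENSION_MAX[ext], CORE_MAX);
}

// Normalize a raw total to an integer on the 0-100 scale.
function normalizeScore(rawTotal: number, maxPossible: number): number {
  return Math.round((rawTotal / maxPossible) * 100);
}

const coreOnly = normalizeScore(65, maxPossibleFor([])); // 100
const withAuthority = normalizeScore(65, maxPossibleFor(['authority'])); // 76
console.log(coreOnly, withAuthority);
```

The same 65 raw points read as 100 with core dimensions only and as 76 once Authority raises `maxPossible` to 85, which is why the score stays comparable across configurations.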
109
+ ### Labels
110
+
111
+ Applied to the final normalized score:
112
+
113
+ | Label | Range | Color |
114
+ |---|---|---|
115
+ | Strong | ≥ 85 | green |
116
+ | Moderate | 65–84 | amber |
116
+ | Limited | 40–64 | orange |
118
+ | Insufficient | < 40 | red |
119
+
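Read top to bottom, the label table is a simple threshold cascade. A minimal sketch, assuming the thresholds are checked from highest to lowest (`labelForScore` is a hypothetical name, not a documented export):

```typescript
// Sketch of the label thresholds from the table (hypothetical helper, not the library API).
type ScoreLabel = 'Strong' | 'Moderate' | 'Limited' | 'Insufficient';
type ScoreColor = 'green' | 'amber' | 'orange' | 'red';

function labelForScore(score: number): { label: ScoreLabel; color: ScoreColor } {
  if (score >= 85) return { label: 'Strong', color: 'green' };
  if (score >= 65) return { label: 'Moderate', color: 'amber' };
  if (score >= 40) return { label: 'Limited', color: 'orange' };
  return { label: 'Insufficient', color: 'red' };
}

console.log(labelForScore(78)); // { label: 'Moderate', color: 'amber' }
```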
120
+ ### Tier Display
121
+
122
+ **Tier 1 — Answer Confidence:** Grounding + Retrieval + Consistency + Authority (when active). Normalized independently to 0–100. Labels match composite scale.
123
+
124
+ **Tier 2 — System Readiness:** Corpus + Freshness (when active). Normalized independently to 0–100. Uses separate labels: Complete / Good / Partial / Thin. Hidden (`null`) when neither extension is configured.
125
+
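Per-tier normalization can be pictured as the composite formula applied to a subset of dimensions. The sketch below assumes each tier uses the same `round(raw / max × 100)` rule as the composite score; `tierScore` is a hypothetical helper, not part of the documented API.

```typescript
// Sketch of independent tier normalization (assumed to mirror the composite formula).
interface DimensionPoints {
  raw: number;
  max: number;
}

// Normalize a group of dimensions on its own 0-100 scale.
// Inactive dimensions are passed as undefined; an empty group yields null.
function tierScore(dims: Array<DimensionPoints | undefined>): number | null {
  const active = dims.filter((d): d is DimensionPoints => d !== undefined);
  if (active.length === 0) return null;
  const raw = active.reduce((sum, d) => sum + d.raw, 0);
  const max = active.reduce((sum, d) => sum + d.max, 0);
  return Math.round((raw / max) * 100);
}

// Quick Start values: core dimensions only, no extensions configured.
const grounding = { raw: 30, max: 30 };
const retrieval = { raw: 25, max: 25 };
const consistency = { raw: 10, max: 10 };

const tier1 = tierScore([grounding, retrieval, consistency, undefined]); // 100 (Authority inactive)
const tier2 = tierScore([undefined, undefined]); // null (Corpus and Freshness inactive)
console.log(tier1, tier2);
```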
126
+ ---
127
+
128
+ ### Dimension 1 — Answer Grounding (max 30 pts)
129
+
130
+ Scores how well the LLM answer is grounded in source documents.
131
+
132
+ **Required inputs:** `confidenceLevel`
133
+
134
+ **Optional inputs:** `ambiguityNotes`, `documentsSilent`, `requiresExpertReview`, `externalConstraintNote`, `hasConflict`, `queryComplexity`, `faithfulnessScore`, `citationCount`
135
+
136
+ #### Base score
137
+
138
+ | Condition | Base |
139
+ |---|---|
140
+ | `documentsSilent = true` | 0 — all further logic skipped |
141
+ | `confidenceLevel = 'low'` | 5 |
142
+ | `confidenceLevel = 'medium'` | 13 |
143
+ | `confidenceLevel = 'high'` + ambiguity present | 21 |
144
+ | `confidenceLevel = 'high'` + no ambiguity | 30 |
145
+
146
+ #### Penalties (applied after base, floor 0)
147
+
148
+ | Condition | Penalty |
149
+ |---|---|
150
+ | `requiresExpertReview = true` | −3 |
151
+ | `externalConstraintNote` present | −2 |
152
+ | `hasConflict = true` | −5 |
153
+
154
+ #### `queryComplexity` ceiling (applied after penalties)
155
+
156
+ | Value | Ceiling |
157
+ |---|---|
158
+ | `'direct'` or not provided | 30 (no ceiling) |
159
+ | `'inferential'` | 24 |
160
+ | `'multi-hop'` | 18 |
161
+ | `'comparative'` | 16 |
162
+
163
+ #### `faithfulnessScore` modifier (applied after ceiling, floor 0)
164
+
165
+ An external faithfulness score (e.g. from RAGAs or a custom evaluator) that measures whether the LLM answer text is supported by the retrieved passages.
166
+
167
+ | Value | Modifier |
168
+ |---|---|
169
+ | ≥ 0.90 | +0 |
170
+ | 0.70–0.89 | −3 |
171
+ | 0.50–0.69 | −7 |
172
+ | < 0.50 | −12 |
173
+ | Not provided | Not applied |
174
+
175
+ #### `citationCount` bonus (applied last; grounding total capped at 30)
176
+
177
+ | Value | Bonus |
178
+ |---|---|
179
+ | ≥ 3 | +2 |
180
+ | 2 | +1 |
181
+ | 0–1 or not provided | +0 |
182
+
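The five grounding steps are order-sensitive, so a compact sketch may help. This is an illustration of the tables above, not the package's source. Two points are assumptions: "ambiguity present" is read as a non-null `ambiguityNotes`, and a `faithfulnessScore` between 0.89 and 0.90 is treated as falling in the −3 band. `groundingRaw` is a hypothetical name.

```typescript
// Sketch of the grounding pipeline: base, penalties, ceiling, faithfulness, citations.
type QueryComplexity = 'direct' | 'inferential' | 'multi-hop' | 'comparative';

interface GroundingInputs {
  confidenceLevel: 'high' | 'medium' | 'low';
  ambiguityNotes?: string | null;
  documentsSilent?: boolean;
  requiresExpertReview?: boolean;
  externalConstraintNote?: string | null;
  hasConflict?: boolean;
  queryComplexity?: QueryComplexity;
  faithfulnessScore?: number;
  citationCount?: number;
}

const COMPLEXITY_CEILING: Record<QueryComplexity, number> = {
  direct: 30,
  inferential: 24,
  'multi-hop': 18,
  comparative: 16,
};

function groundingRaw(i: GroundingInputs): number {
  if (i.documentsSilent) return 0; // all further logic skipped

  // 1. Base score (assumption: any non-null ambiguityNotes counts as ambiguity)
  let score: number;
  if (i.confidenceLevel === 'low') score = 5;
  else if (i.confidenceLevel === 'medium') score = 13;
  else score = i.ambiguityNotes != null ? 21 : 30;

  // 2. Penalties, floor 0
  if (i.requiresExpertReview) score -= 3;
  if (i.externalConstraintNote != null) score -= 2;
  if (i.hasConflict) score -= 5;
  score = Math.max(0, score);

  // 3. Query-complexity ceiling
  score = Math.min(score, COMPLEXITY_CEILING[i.queryComplexity ?? 'direct']);

  // 4. Faithfulness modifier, floor 0 (skipped when not provided)
  const f = i.faithfulnessScore;
  if (f !== undefined) {
    if (f < 0.5) score -= 12;
    else if (f < 0.7) score -= 7;
    else if (f < 0.9) score -= 3; // assumption: band boundary at 0.90
    score = Math.max(0, score);
  }

  // 5. Citation bonus, total capped at 30
  const citations = i.citationCount ?? 0;
  score += citations >= 3 ? 2 : citations === 2 ? 1 : 0;
  return Math.min(30, score);
}

console.log(groundingRaw({ confidenceLevel: 'high', citationCount: 3 })); // 30
console.log(groundingRaw({ confidenceLevel: 'high', faithfulnessScore: 0.45 })); // 18
```

Because the citation bonus is applied after the ceiling, a `'comparative'` question capped at 16 can still end at 18 with three or more citations under this reading.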
183
+ ---
184
+
185
+ ### Dimension 2 — Retrieval Confidence (max 25 pts)
186
+
187
+ Scores the quality, breadth, and agreement of the retrieved candidates. Three sub-signals are summed (up to 28 points in total) and the result is capped at 25.
188
+
189
+ **Required inputs:** `candidates[].retrievalScores`, `candidates[].combinedScore`
190
+
191
+ **Optional inputs:** `candidates[].documentId`, `candidates[].extractionQuality`
192
+
193
+ #### Sub-signal A — Method Agreement (0–15)
194
+
195
+ Counts candidates where ≥ 2 named retrieval methods each scored > 0:
196
+
197
+ | Candidates confirmed by 2+ methods | Points |
198
+ |---|---|
199
+ | ≥ 3 | 15 |
200
+ | 2 | 12 |
201
+ | 1 | 8 |
202
+ | 0 | 3 |
203
+
204
+ #### Sub-signal B — Score Magnitude (0–8)
205
+
206
+ Average `combinedScore` of the top 3 candidates by score. If `extractionQuality` is provided, it is applied as a multiplier before averaging: `effectiveScore = combinedScore × extractionQuality`.
207
+
208
+ | Avg effective score | Points |
209
+ |---|---|
210
+ | ≥ 0.80 | 8 |
211
+ | ≥ 0.65 | 6 |
212
+ | ≥ 0.50 | 4 |
213
+ | ≥ 0.35 | 2 |
214
+ | < 0.35 | 0 |
215
+
216
+ #### Sub-signal C — Source Diversity + Section Breadth (0–5)
217
+
218
+ | Unique `documentId` values | Points |
219
+ |---|---|
220
+ | ≥ 3 distinct documents | +3 |
221
+ | 2 distinct documents | +1 |
222
+ | 1 or not provided | +0 |
223
+
224
+ | Total candidate count | Points |
225
+ |---|---|
226
+ | ≥ 5 | +2 |
227
+ | 3–4 | +1 |
228
+ | ≤ 2 | +0 |
229
+
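The three retrieval sub-signals can be combined as in the sketch below. It is a standalone illustration of the tables, not the package's source. It assumes the top 3 candidates are selected by `combinedScore` before the `extractionQuality` multiplier is applied, and that a missing `extractionQuality` counts as 1. `retrievalRaw` is a hypothetical name.

```typescript
// Sketch of the Retrieval Confidence dimension (sub-signals A + B + C, capped at 25).
interface RetrievalCandidate {
  retrievalScores: Record<string, number>;
  combinedScore: number;
  documentId?: string;
  extractionQuality?: number;
}

function retrievalRaw(candidates: RetrievalCandidate[]): number {
  // A: candidates where at least 2 named methods each scored > 0
  const confirmed = candidates.filter(
    (c) => Object.values(c.retrievalScores).filter((s) => s > 0).length >= 2,
  ).length;
  const agreement = confirmed >= 3 ? 15 : confirmed === 2 ? 12 : confirmed === 1 ? 8 : 3;

  // B: average effective score of the top 3 (assumption: ranked by combinedScore)
  const top3 = [...candidates]
    .sort((a, b) => b.combinedScore - a.combinedScore)
    .slice(0, 3)
    .map((c) => c.combinedScore * (c.extractionQuality ?? 1));
  const avg = top3.length > 0 ? top3.reduce((sum, s) => sum + s, 0) / top3.length : 0;
  const magnitude = avg >= 0.8 ? 8 : avg >= 0.65 ? 6 : avg >= 0.5 ? 4 : avg >= 0.35 ? 2 : 0;

  // C: distinct documents plus total candidate count
  const ids = candidates
    .map((c) => c.documentId)
    .filter((id): id is string => id !== undefined);
  const distinctDocs = new Set(ids).size;
  const diversity = distinctDocs >= 3 ? 3 : distinctDocs === 2 ? 1 : 0;
  const breadth = candidates.length >= 5 ? 2 : candidates.length >= 3 ? 1 : 0;

  return Math.min(25, agreement + magnitude + diversity + breadth);
}

// The Quick Start candidates: 15 + 8 + 3 + 1 = 27, capped at 25.
const quickStartCandidates: RetrievalCandidate[] = [
  { retrievalScores: { semantic: 0.88, keyword: 0.72 }, combinedScore: 0.88, documentId: 'doc-001' },
  { retrievalScores: { semantic: 0.85, keyword: 0.68 }, combinedScore: 0.85, documentId: 'doc-002' },
  { retrievalScores: { semantic: 0.82, keyword: 0.65 }, combinedScore: 0.82, documentId: 'doc-003' },
];
console.log(retrievalRaw(quickStartCandidates)); // 25
```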
230
+ ---
231
+
232
+ ### Dimension 3 — Evidence Consistency (max 10 pts)
233
+
234
+ Scores how consistent the retrieved candidates are with each other.
235
+
236
+ **Required inputs:** `candidates[].combinedScore`
237
+
238
+ **Optional inputs:** `conflictingCandidateCount`, `hasConflict`
239
+
240
+ `conflictingCandidateCount` takes precedence over boolean `hasConflict` when both are provided.
241
+
242
+ #### Sub-signal A — Score Variance (0–8)
243
+
244
+ Population standard deviation of `combinedScore` across all candidates:
245
+
246
+ | Condition | Points |
247
+ |---|---|
248
+ | No candidates | 0 |
249
+ | Only 1 candidate | 4 (neutral — variance unmeasurable) |
250
+ | std dev < 0.10 | 8 |
251
+ | std dev < 0.20 | 6 |
252
+ | std dev < 0.30 | 4 |
253
+ | std dev ≥ 0.30 | 2 |
254
+
255
+ #### Sub-signal B — Conflict Status (−2 to +2, total floor 0)
256
+
257
+ | Condition | Adjustment |
258
+ |---|---|
259
+ | `conflictingCandidateCount = 0` or no conflict indicators | +2 |
260
+ | `conflictingCandidateCount = 1` | 0 |
261
+ | `conflictingCandidateCount ≥ 2` | −2 |
262
+ | `hasConflict = true` (boolean fallback, no count given) | −2 |
263
+
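A short sketch of the consistency dimension, following the two tables above. It is an illustration rather than the package's source; `populationStdDev` and `consistencyRaw` are hypothetical names.

```typescript
// Sketch of Evidence Consistency: score variance (0-8) plus conflict adjustment (-2 to +2).
function populationStdDev(values: number[]): number {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

function consistencyRaw(
  combinedScores: number[],
  conflictingCandidateCount?: number,
  hasConflict?: boolean,
): number {
  // Sub-signal A: spread of combinedScore across all candidates
  let variancePoints: number;
  if (combinedScores.length === 0) variancePoints = 0;
  else if (combinedScores.length === 1) variancePoints = 4; // neutral, variance unmeasurable
  else {
    const sd = populationStdDev(combinedScores);
    variancePoints = sd < 0.1 ? 8 : sd < 0.2 ? 6 : sd < 0.3 ? 4 : 2;
  }

  // Sub-signal B: the count takes precedence over the boolean when both are given
  let adjustment: number;
  if (conflictingCandidateCount !== undefined) {
    adjustment = conflictingCandidateCount === 0 ? 2 : conflictingCandidateCount === 1 ? 0 : -2;
  } else {
    adjustment = hasConflict ? -2 : 2;
  }

  return Math.max(0, variancePoints + adjustment); // total floor 0
}

// Quick Start scores: std dev is about 0.024, so 8 + 2 = 10.
console.log(consistencyRaw([0.88, 0.85, 0.82])); // 10
```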
264
+ ---
265
+
266
+ ## API Reference
267
+
268
+ ### `computeConfidence(inputs, config?)`
269
+
270
+ ```typescript
271
+ function computeConfidence(inputs: ScoringInputs, config?: ScoringConfig): ConfidenceScorecard;
272
+ ```
273
+
274
+ Scores a single RAG answer. `config` is optional — omitting it runs the three core dimensions only.
275
+
276
+ ### `createScorer(config)`
277
+
278
+ ```typescript
279
+ function createScorer(config: ScoringConfig): {
280
+   compute: (inputs: ScoringInputs) => ConfidenceScorecard;
281
+ };
282
+ ```
283
+
284
+ Returns a scorer pre-bound to a config. Use when scoring many answers against the same corpus/authority setup.
285
+
286
+ ```typescript
287
+ const scorer = createScorer({ corpus: { expectedDocCount: 10 } });
288
+ const s1 = scorer.compute(inputs1);
289
+ const s2 = scorer.compute(inputs2);
290
+ ```
291
+
292
+ ---
293
+
294
+ ### `ScoringInputs`
295
+
296
+ | Field | Type | Required | Description |
297
+ |---|---|---|---|
298
+ | `confidenceLevel` | `'high' \| 'medium' \| 'low'` | ✅ | LLM self-assessed confidence in the answer |
299
+ | `candidates` | `Candidate[]` | ✅ | Retrieved chunks used to produce the answer |
300
+ | `ambiguityNotes` | `string \| null` | — | Non-null value signals the LLM found ambiguity in the source |
301
+ | `requiresExpertReview` | `boolean` | — | LLM recommends human expert review |
302
+ | `externalConstraintNote` | `string \| null` | — | Non-null signals an external constraint limits the answer |
303
+ | `documentsSilent` | `boolean` | — | True when source documents do not address the question at all |
304
+ | `hasConflict` | `boolean` | — | Documents contain conflicting information |
305
+ | `conflictingCandidateCount` | `number` | — | Number of conflicting candidates (overrides `hasConflict`) |
306
+ | `queryComplexity` | `'direct' \| 'inferential' \| 'multi-hop' \| 'comparative'` | — | Complexity of the question type; applies ceiling to grounding |
307
+ | `faithfulnessScore` | `number` | — | 0–1 external faithfulness score (e.g. RAGAs); applies modifier to grounding |
308
+ | `citationCount` | `number` | — | Number of distinct source sections explicitly cited in the answer |
309
+ | `corpusDocCount` | `number` | — | Current document count in the corpus (required when Corpus extension active) |
310
+ | `missingRelevantType` | `boolean` | — | True when a known relevant document type is not in the corpus |
311
+
312
+ ### `Candidate`
313
+
314
+ | Field | Type | Required | Description |
315
+ |---|---|---|---|
316
+ | `retrievalScores` | `Record<string, number>` | ✅ | Named scores per retrieval method, e.g. `{ semantic: 0.8, keyword: 0.6 }` |
317
+ | `combinedScore` | `number` | ✅ | Final blended score 0–1 used for ranking |
318
+ | `documentId` | `string` | — | Source document identifier; used for diversity scoring |
319
+ | `documentType` | `string` | — | Document type label; matched against Authority tier keywords |
320
+ | `authorityRank` | `number` | — | Explicit authority rank (lower = higher authority); overrides keyword matching |
321
+ | `isAmendment` | `boolean` | — | True if this candidate comes from an amendment to the base document |
322
+ | `extractionQuality` | `number` | — | 0–1 OCR or extraction quality multiplier applied to `combinedScore` |
323
+ | `lastUpdated` | `Date` | — | Document last-updated date; used by Freshness extension |
324
+
325
+ ### `ScoringConfig`
326
+
327
+ All fields optional. Passing a key activates that extension.
328
+
329
+ | Field | Type | Default | Description |
330
+ |---|---|---|---|
331
+ | `authority` | `{ tiers?: AuthorityTier[] }` | — | Activates Source Authority extension |
332
+ | `authority.tiers` | `AuthorityTier[]` | See below | Custom authority tier definitions |
333
+ | `corpus` | `{ expectedDocCount: number }` | — | Activates Corpus Completeness extension |
334
+ | `corpus.expectedDocCount` | `number` | *(required)* | Number of document types expected in a complete corpus |
335
+ | `freshness` | `FreshnessConfig` | — | Activates Document Freshness extension |
336
+ | `freshness.maxAgeForFullScore` | `number` (days) | 90 | Documents within this age receive full freshness points |
337
+ | `freshness.penaltyPerMonth` | `number` | 1.5 | Points deducted per 30-day increment beyond window |
338
+ | `freshness.hardCutoffAge` | `number` (days) | 730 | Documents at or beyond this age score 0 |
339
+
340
+ **Default Authority tiers** (when `config.authority.tiers` is omitted):
341
+
342
+ | Name | Rank |
343
+ |---|---|
344
+ | Primary | 10 |
345
+ | Secondary | 20 |
346
+ | Supporting | 30 |
347
+
348
+ ### `ConfidenceScorecard`
349
+
350
+ | Field | Type | Description |
351
+ |---|---|---|
352
+ | `total` | `number` | Normalized score 0–100 (integer) |
353
+ | `label` | `'Strong' \| 'Moderate' \| 'Limited' \| 'Insufficient'` | Human-readable label |
354
+ | `labelColor` | `'green' \| 'amber' \| 'orange' \| 'red'` | Display color for UI badge |
355
+ | `tier1` | `{ score, label, color } \| null` | Answer Confidence tier (Grounding + Retrieval + Consistency + Authority) |
356
+ | `tier2` | `{ score, label, color } \| null` | System Readiness tier (Corpus + Freshness); null when neither extension active |
357
+ | `dimensions.grounding` | `DimensionScore` | Always present |
358
+ | `dimensions.retrieval` | `DimensionScore` | Always present |
359
+ | `dimensions.consistency` | `DimensionScore` | Always present |
360
+ | `dimensions.authority` | `DimensionScore \| undefined` | Present only when Authority extension active |
361
+ | `dimensions.corpus` | `DimensionScore \| undefined` | Present only when Corpus extension active |
362
+ | `dimensions.freshness` | `DimensionScore \| undefined` | Present only when Freshness extension active |
363
+ | `meta.rawTotal` | `number` | Sum of raw points before normalization |
364
+ | `meta.maxPossible` | `number` | Maximum achievable raw points given active extensions |
365
+ | `meta.activeExtensions` | `string[]` | Names of active extensions, e.g. `['authority', 'corpus']` |
366
+
367
+ ### `DimensionScore`
368
+
369
+ | Field | Type | Description |
370
+ |---|---|---|
371
+ | `raw` | `number` | Raw points scored for this dimension |
372
+ | `max` | `number` | Maximum raw points for this dimension |
373
+ | `normalized` | `number` | `raw / max × 100`, rounded (0–100) |
374
+ | `explanation` | `string` | Human-readable summary of what drove the score |
375
+
376
+ ---
377
+
378
+ ## Extensions
379
+
380
+ ### Source Authority
381
+
382
+ Scores how authoritative the retrieved sources are. Useful for legal, compliance, governance, and policy domains where document hierarchy matters.
383
+
384
+ ```typescript
385
+ import { computeConfidence } from 'transparent-confidence';
386
+
387
+ const scorecard = computeConfidence(inputs, {
388
+   authority: {
389
+     tiers: [
390
+       { name: 'CC&Rs', rank: 10, keywords: ['CC&Rs', 'Declaration', 'Master Deed'] },
391
+       { name: 'Bylaws', rank: 15, keywords: ['Bylaws'] },
392
+       { name: 'Rules', rank: 20, keywords: ['Rules', 'Regulations', 'Policy'] },
393
+       { name: 'Board Notes', rank: 30, keywords: ['Minutes', 'Resolution'] },
394
+     ],
395
+   },
396
+ });
397
+ ```
398
+
399
+ Each candidate is classified by matching `documentType` against tier `keywords`. `authorityRank` on the candidate overrides keyword matching if provided.
400
+
401
+ **Scoring:** 18 base pts for highest-authority source found, +1 if any candidate has `isAmendment: true`, +1 if multiple tiers represented. Max 20.
402
+
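To make the classification step concrete, here is a minimal sketch of how a candidate might be mapped to a tier rank. The matching rule is an assumption: the README does not say whether keywords match by substring or case, so this sketch uses a case-insensitive substring test and picks the highest-authority (lowest-rank) match. `resolveRank` is a hypothetical helper, and the point scoring itself is not reproduced here.

```typescript
// Sketch of tier classification (assumed matching rule: case-insensitive substring).
interface AuthorityTier {
  name: string;
  rank: number; // lower = higher authority
  keywords: string[];
}

interface AuthorityCandidate {
  documentType?: string;
  authorityRank?: number;
}

// Returns the resolved rank, or undefined when nothing matches.
function resolveRank(c: AuthorityCandidate, tiers: AuthorityTier[]): number | undefined {
  if (c.authorityRank !== undefined) return c.authorityRank; // explicit rank overrides keywords
  const docType = c.documentType?.toLowerCase();
  if (docType === undefined) return undefined;
  const match = tiers
    .filter((t) => t.keywords.some((k) => docType.includes(k.toLowerCase())))
    .sort((a, b) => a.rank - b.rank)[0];
  return match?.rank;
}

// Tier definitions taken from the example above.
const hoaTiers: AuthorityTier[] = [
  { name: 'CC&Rs', rank: 10, keywords: ['CC&Rs', 'Declaration', 'Master Deed'] },
  { name: 'Bylaws', rank: 15, keywords: ['Bylaws'] },
  { name: 'Rules', rank: 20, keywords: ['Rules', 'Regulations', 'Policy'] },
  { name: 'Board Notes', rank: 30, keywords: ['Minutes', 'Resolution'] },
];

console.log(resolveRank({ documentType: 'HOA Bylaws 2021' }, hoaTiers)); // 15
console.log(resolveRank({ documentType: 'Board Minutes', authorityRank: 5 }, hoaTiers)); // 5
```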
403
+ ### Corpus Completeness
404
+
405
+ Scores how complete the document corpus is relative to what's expected. Surfaces the risk that a correct answer exists but the documents needed to find it haven't been uploaded.
406
+
407
+ ```typescript
408
+ const scorecard = computeConfidence(inputs, {
409
+   corpus: { expectedDocCount: 6 },
410
+ });
411
+ ```
412
+
413
+ Provide `corpusDocCount` on inputs with the current document count. Set `missingRelevantType: true` if a known document type relevant to the query is absent from the corpus.
414
+
415
+ **Scoring:** 15 pts at 100% coverage, scales down by ratio. −3 penalty for `missingRelevantType`. Floor 0.
416
+
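The corpus scoring rule can be sketched as follows. This is an illustration only; it assumes coverage is `corpusDocCount / expectedDocCount` capped at 100% and that no rounding happens at this stage. `corpusRaw` is a hypothetical name.

```typescript
// Sketch of Corpus Completeness scoring (assumptions: coverage capped at 1, no rounding).
function corpusRaw(
  corpusDocCount: number,
  expectedDocCount: number,
  missingRelevantType = false,
): number {
  const coverage = Math.min(1, corpusDocCount / expectedDocCount);
  const penalty = missingRelevantType ? 3 : 0;
  return Math.max(0, 15 * coverage - penalty); // floor 0
}

console.log(corpusRaw(6, 6)); // 15
console.log(corpusRaw(3, 6, true)); // 4.5
```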
417
+ ### Document Freshness
418
+
419
+ Scores how recent the retrieved documents are. Uses the median `lastUpdated` date across all candidates.
420
+
421
+ ```typescript
422
+ const scorecard = computeConfidence(inputs, {
423
+   freshness: {
424
+     maxAgeForFullScore: 60, // days: full score if median age ≤ 60
425
+     penaltyPerMonth: 2, // pts lost per 30-day increment beyond window
426
+     hardCutoffAge: 365, // days: score is 0 at or beyond this age
427
+   },
428
+ });
429
+ ```
430
+
431
+ All three config fields are optional; defaults are `maxAgeForFullScore: 90`, `penaltyPerMonth: 1.5`, `hardCutoffAge: 730`. Provide `lastUpdated: Date` on each candidate.
432
+
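A rough sketch of how the freshness settings interact, using the 15-point maximum from the Algorithm section. Several details are assumptions rather than documented behavior: partial 30-day increments are counted in full (`Math.ceil`), the median of an even number of dates is the mean of the two middle ages, and the result is not rounded. `medianAgeDays` and `freshnessRaw` are hypothetical names.

```typescript
// Sketch of Document Freshness scoring (increment rounding and median rule are assumptions).
interface FreshnessConfig {
  maxAgeForFullScore?: number; // days
  penaltyPerMonth?: number; // points per 30-day increment
  hardCutoffAge?: number; // days
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Median age in days of the candidates' lastUpdated dates; expects at least one date.
function medianAgeDays(dates: Date[], now: Date): number {
  const ages = dates
    .map((d) => (now.getTime() - d.getTime()) / MS_PER_DAY)
    .sort((a, b) => a - b);
  const mid = Math.floor(ages.length / 2);
  return ages.length % 2 === 1 ? ages[mid] : (ages[mid - 1] + ages[mid]) / 2;
}

function freshnessRaw(ageDays: number, cfg: FreshnessConfig = {}): number {
  const maxAge = cfg.maxAgeForFullScore ?? 90;
  const penaltyPerMonth = cfg.penaltyPerMonth ?? 1.5;
  const cutoff = cfg.hardCutoffAge ?? 730;

  if (ageDays >= cutoff) return 0; // at or beyond the hard cutoff
  if (ageDays <= maxAge) return 15; // inside the full-score window
  const increments = Math.ceil((ageDays - maxAge) / 30); // assumption: partial increments count
  return Math.max(0, 15 - penaltyPerMonth * increments);
}

console.log(freshnessRaw(120)); // 13.5 with the default config
```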
433
+ ---
434
+
435
+ ## Enhanced Signals
436
+
437
+ These inputs add nuance to the core dimension scores. All are optional and independently skipped when not provided.
438
+
439
+ ### `faithfulnessScore`
440
+
441
+ A 0–1 score measuring whether the LLM answer text is actually supported by the retrieved passages. It is distinct from `confidenceLevel`, which reflects LLM self-assessment. Tools like [RAGAs](https://docs.ragas.io/) compute this. Applies a modifier of 0 to −12 points to grounding (no deduction at ≥ 0.90), which keeps the grounding score from staying high when the model hallucinates.
442
+
443
+ ```typescript
444
+ { confidenceLevel: 'high', faithfulnessScore: 0.45, candidates: [...] }
445
+ // confidenceLevel='high' with no ambiguity starts at 30; faithfulnessScore < 0.50 applies −12, leaving a grounding raw score of 18
446
+ ```
447
+
448
+ ### `queryComplexity`
449
+
450
+ Indicates the structural complexity of the question. Sets a ceiling on grounding to prevent high grounding scores on questions that require inference the model may not have made correctly.
451
+
452
+ | Value | Ceiling | Use when |
453
+ |---|---|---|
454
+ | `'direct'` | 30 (none) | Factual lookup, single document section |
455
+ | `'inferential'` | 24 | Requires reasoning across implicit relationships |
456
+ | `'multi-hop'` | 18 | Answer requires chaining multiple document sections |
457
+ | `'comparative'` | 16 | Comparing two or more policies, rules, or entities |
458
+
459
+ ### `citationCount`
460
+
461
+ Number of distinct source sections explicitly cited in the answer. Adds +1 (2 citations) or +2 (≥3 citations) to grounding. Rewards answers that show their work.
462
+
463
+ ### `extractionQuality`
464
+
465
+ A 0–1 multiplier per candidate reflecting OCR or PDF extraction quality. Applied as `effectiveScore = combinedScore × extractionQuality` before retrieval scoring. Prevents high retrieval scores from poorly extracted documents.
466
+
467
+ ### Source Diversity (`documentId`)
468
+
469
+ Setting `documentId` on candidates enables source diversity scoring in the Retrieval dimension. Answers grounded in 3+ distinct documents earn +3 pts; 2 documents earn +1 pt. Encourages retrieval pipelines to cast a wide net rather than pulling multiple chunks from the same document.
470
+
471
+ ---
472
+
473
+ ## Examples
474
+
475
+ Working examples are in the [`examples/`](examples/) directory. Each file includes the scenario description, expected label, and expected score range in the header comment.
476
+
477
+ | File | Scenario | Expected |
478
+ |---|---|---|
479
+ | [`basic-rag.ts`](examples/basic-rag.ts) | Three core dimensions, zero config | Strong (100) |
480
+ | [`legal-docs.ts`](examples/legal-docs.ts) | Authority + Corpus, HOA governance | Moderate (78) |
481
+ | [`knowledge-base.ts`](examples/knowledge-base.ts) | Freshness only, API documentation KB | Moderate (78) |
482
+ | [`full-pipeline.ts`](examples/full-pipeline.ts) | All six dimensions, all enhanced signals | Strong (91) |
483
+
484
+ Run any example:
485
+
486
+ ```bash
487
+ npx tsx examples/basic-rag.ts
488
+ ```
489
+
490
+ ---
491
+
492
+ ## Roadmap
493
+
494
+ Planned for future versions — none of these are started or committed:
495
+
496
+ - **Calibration API** — supply historical score/outcome pairs to tune dimension weights for your domain
497
+ - **Batch scoring** — `computeAll(inputs[])` returning sorted scorecards for comparison
498
+ - **Score explanation renderer** — format `DimensionScore.explanation` fields into structured Markdown or HTML for display
499
+ - **Streaming scorecard** — emit partial scorecard as dimensions complete, useful for long-running pipelines
500
+ - **Python port** — identical algorithm, same test scenarios, same output shape
501
+ - **Preset configs** — `createScorer(presets.legalDocs)`, `createScorer(presets.customerSupport)` for common domain setups
502
+
503
+ ---
504
+
505
+ ## Contributing
506
+
507
+ 1. Clone the repo and install dependencies:
508
+ ```bash
509
+ git clone https://github.com/etetzlaff/transparent-confidence.git
510
+ cd transparent-confidence
511
+ npm install
512
+ ```
513
+
514
+ 2. Run tests:
515
+ ```bash
516
+ npm test
517
+ ```
518
+
519
+ 3. Type-check:
520
+ ```bash
521
+ npm run typecheck
522
+ ```
523
+
524
+ 4. Lint and format:
525
+ ```bash
526
+ npm run lint
527
+ ```
528
+
529
+ 5. File issues at [GitHub Issues](https://github.com/etetzlaff/transparent-confidence/issues). PRs welcome — please open an issue first for non-trivial changes.
530
+
531
+ **Test coverage target:** ≥ 90% line, ≥ 95% function, ≥ 85% branch. Run `npm run coverage` to check.
532
+
533
+ ---
534
+
535
+ ## License
536
+
537
+ [Apache 2.0](LICENSE)