
wobble-bibble 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/LICENSE.md ADDED

@@ -0,0 +1,9 @@
+MIT License
+Copyright (c) 2026 Ragaeeb Haq
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,133 @@
+# wobble-bibble 🕌
+[![npm version](https://img.shields.io/npm/v/wobble-bibble.svg)](https://www.npmjs.com/package/wobble-bibble)
+[![codecov](https://codecov.io/gh/ragaeeb/wobble-bibble/graph/badge.svg?token=3BCT73JB7F)](https://codecov.io/gh/ragaeeb/wobble-bibble)
+[![Size](https://deno.bundlejs.com/badge?q=wobble-bibble@latest&badge=detailed)](https://bundlejs.com/?q=wobble-bibble%40latest)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Bun](https://img.shields.io/badge/Runtime-Bun-black?logo=bun)](https://bun.sh)
+[![TypeScript](https://img.shields.io/badge/Language-TypeScript-blue?logo=typescript)](https://www.typescriptlang.org)
+[![Linter: Biome](https://img.shields.io/badge/Linter-Biome-FFB13B?logo=biome)](https://biomejs.dev)
+![Status](https://img.shields.io/badge/Status-Active_Research-green)
+![Version](https://img.shields.io/badge/Prompts-v4.0_Optimized-blue)
+![Focus](https://img.shields.io/badge/Focus-Academic_Fidelity-orange)
+![Standard](https://img.shields.io/badge/Standard-ALA--LC-darkred)
+[![wakatime](https://wakatime.com/badge/user/a0b906ce-b8e7-4463-8bce-383238df6d4b/project/f2110f75-cd59-4395-9790-b971ad3a8195.svg)](https://wakatime.com/badge/user/a0b906ce-b8e7-4463-8bce-383238df6d4b/project/f2110f75-cd59-4395-9790-b971ad3a8195)
+TypeScript library for Islamic text translation prompts with LLM output validation and prompt stacking utilities.
+## Installation
+```bash
+npm install wobble-bibble
+# or
+bun add wobble-bibble
+```
+## Features
+- **Bundled Prompts**: 8 optimized translation prompts (Hadith, Fiqh, Tafsir, etc.) with strongly-typed access
+- **Translation Validation**: Catch LLM hallucinations like malformed segment IDs, Arabic leaks, forbidden terms
+- **Prompt Stacking**: Master + specialized prompts combined automatically
+- **Gold Standards**: [High-fidelity reference dataset](docs/gold-standard.md) for benchmarking
+## Quick Start
+### Get Translation Prompts
+```typescript
+import { getPrompt, getPrompts, getPromptIds } from 'wobble-bibble';
+// Get a specific stacked prompt (strongly typed)
+const hadithPrompt = getPrompt('hadith');
+console.log(hadithPrompt.content); // Master + Hadith addon combined
+// Get all available prompts
+const allPrompts = getPrompts();
+// Get list of prompt IDs for dropdowns
+const ids = getPromptIds(); // ['master_prompt', 'hadith', 'fiqh', ...]
+```
+### Validate LLM Output
+```typescript
+import {
+    validateTranslations,
+    detectArabicScript,
+    detectNewlineAfterId,
+} from 'wobble-bibble';
+const llmOutput = `P1234 - Translation of first segment
+P1235 - Translation of second segment`;
+// Full validation pipeline
+const result = validateTranslations(llmOutput, ['P1234', 'P1235']);
+if (!result.isValid) {
+    console.error('Error:', result.error);
+}
+// Individual detectors
+const arabicWarnings = detectArabicScript(llmOutput); // Soft warnings
+const newlineError = detectNewlineAfterId(llmOutput); // Hard error
+```
+## API Reference
+### Prompts
+| Function | Description |
+|----------|-------------|
+| `getPrompt(id)` | Get a specific stacked prompt by ID (strongly typed) |
+| `getPrompts()` | Get all available stacked prompts |
+| `getStackedPrompt(id)` | Get just the prompt content string |
+| `getMasterPrompt()` | Get raw master prompt (for custom addons) |
+| `getPromptIds()` | Get list of available prompt IDs |
+| `stackPrompts(master, addon)` | Manually combine prompts |
+### Validation (Hard Errors)
+| Function | Description |
+|----------|-------------|
+| `validateTranslations(text, expectedIds)` | Full validation pipeline |
+| `validateTranslationMarkers(text)` | Check for malformed IDs (e.g., `P123$4`) |
+| `detectNewlineAfterId(text)` | Catch `P1234\nText` (Gemini bug) |
+| `detectImplicitContinuation(text)` | Catch "implicit continuation" text |
+| `detectMetaTalk(text)` | Catch "(Note:", "[Editor:" |
+| `detectDuplicateIds(ids)` | Catch same ID appearing twice |
+### Validation (Soft Warnings)
+| Function | Description |
+|----------|-------------|
+| `detectArabicScript(text)` | Detect Arabic characters (except ﷺ) |
+| `detectWrongDiacritics(text)` | Detect â/ã/á instead of macrons |
+### Utilities
+| Function | Description |
+|----------|-------------|
+| `extractTranslationIds(text)` | Extract all segment IDs from text |
+| `normalizeTranslationText(text)` | Split merged markers onto separate lines |
+| `findUnmatchedTranslationIds(ids, expected)` | Find IDs not in expected list |
+| `formatExcerptsForPrompt(segments, prompt)` | Format segments for LLM input |
+## Available Prompts
+| ID | Name | Use Case |
+|----|------|----------|
+| `master_prompt` | Master Prompt | Universal grounding rules |
+| `hadith` | Hadith | Isnad-heavy texts, Sharh |
+| `fiqh` | Fiqh | Legal terminology |
+| `tafsir` | Tafsir | Quranic exegesis |
+| `fatawa` | Fatawa | Q&A format |
+| `encyclopedia_mixed` | Encyclopedia Mixed | Polymath works |
+| `jarh_wa_tadil` | Jarh Wa Tadil | Narrator criticism |
+| `usul_al_fiqh` | Usul Al Fiqh | Legal methodology |
+## Prompt Development
+See [REFINEMENT_GUIDE.md](docs/refinement-guide.md) for the methodology used to develop and test these prompts.
+## License
+MIT
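
Reviewer note (not part of the package): the README above says the expected output shape is `ID - Text` and that an ID followed directly by a newline is a hard error. The sketch below illustrates that idea locally. It is not the package's implementation: the helper names `idsAtLineStart` and `hasNewlineAfterId` are hypothetical, and the ID shape (one of `B C F T P N`, digits, optional lowercase suffix) is taken from the `TRANSLATION_MARKER_PARTS` declaration in `dist/index.d.ts`.

```typescript
// Hypothetical sketch; the real checks live in wobble-bibble and may differ.
// ID shape assumed from TRANSLATION_MARKER_PARTS: prefix letter, digits, optional suffix.
const ID = "[BCFTPN]\\d+[a-z]?";

/** Collects IDs that start a line and are followed by the " - " separator. */
const idsAtLineStart = (text: string): string[] => {
    const re = new RegExp(`^(${ID}) - `, "gm");
    const ids: string[] = [];
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
        ids.push(m[1]);
    }
    return ids;
};

/** True when some line holds only an ID, so its text starts on the next line. */
const hasNewlineAfterId = (text: string): boolean =>
    new RegExp(`^${ID}[ \\t]*$`, "m").test(text);

const good = "P1234 - First segment\nP1235 - Second segment";
const bad = "P1234\nFirst segment";

console.log(idsAtLineStart(good)); // [ 'P1234', 'P1235' ]
console.log(hasNewlineAfterId(good), hasNewlineAfterId(bad)); // false true
```

For real use, the package's own `validateTranslations` and `detectNewlineAfterId` are the functions to call; this sketch only shows the kind of pattern matching such checks rely on.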

package/dist/index.d.ts ADDED

@@ -0,0 +1,324 @@
+//#region src/constants.d.ts
+/**
+ * Supported marker types for segments.
+ */
+declare enum Markers {
+  /** B - Book reference */
+  Book = "B",
+  /** F - Footnote reference */
+  Footnote = "F",
+  /** T - Heading reference */
+  Heading = "T",
+  /** C - Chapter reference */
+  Chapter = "C",
+  /** N - Note reference */
+  Note = "N",
+  /** P - Translation/Plain segment */
+  Plain = "P",
+}
+/**
+ * Regex parts for building translation marker patterns.
+ */
+declare const TRANSLATION_MARKER_PARTS: {
+  /** Dash variations (hyphen, en dash, em dash) */
+  readonly dashes: "[-–—]";
+  /** Numeric portion of the reference */
+  readonly digits: "\\d+";
+  /** Valid marker prefixes (Book, Chapter, Footnote, Heading, Plain, Note) */
+  readonly markers: "[BCFTPN]";
+  /** Optional whitespace before dash */
+  readonly optionalSpace: "\\s?";
+  /** Valid single-letter suffixes */
+  readonly suffix: "[a-z]";
+};
+/**
+ * Pattern for a segment ID (e.g., P1234, B45a).
+ */
+declare const MARKER_ID_PATTERN: string;
+//#endregion
+//#region src/formatting.d.ts
+/**
+ * Internal segment type for formatting.
+ */
+type Segment = {
+  /** The segment ID (e.g., P1) */
+  id: string;
+  /** The segment text */
+  text: string;
+};
+/**
+ * Formats excerpts for an LLM prompt by combining the prompt rules with the segment text.
+ * Each segment is formatted as "ID - Text" and separated by double newlines.
+ *
+ * @param segments - Array of segments to format
+ * @param prompt - The instruction/system prompt to prepend
+ * @returns Combined prompt and formatted text
+ */
+declare const formatExcerptsForPrompt: (segments: Segment[], prompt: string) => string;
+//#endregion
+//#region .generated/prompts.d.ts
+type PromptId = 'master_prompt' | 'encyclopedia_mixed' | 'fatawa' | 'fiqh' | 'hadith' | 'jarh_wa_tadil' | 'tafsir' | 'usul_al_fiqh';
+declare const PROMPTS: readonly [{
+  readonly id: "master_prompt";
+  readonly name: "Master Prompt";
+  readonly content: "ROLE: Expert academic translator of Classical Islamic texts; prioritize accuracy and structure over fluency.\nCRITICAL NEGATIONS: 1. NO SANITIZATION (Do not soften polemics). 2. NO META-TALK (Output translation only). 3. NO MARKDOWN (Plain text only). 4. NO EMENDATION. 5. NO INFERENCE. 6. NO RESTRUCTURING. 7. NO OPAQUE TRANSLITERATION (Must translate phrases). 8. NO INVENTED SEGMENTS (Do not create, modify, or \"continue\" segment IDs. Output IDs verbatim exactly as they appear in the source input/metadata. Alphabetic suffixes (e.g., P5511a) are allowed IF AND ONLY IF that exact ID appears in the source. Any ID not present verbatim in the source is INVENTED. EXAMPLE: If P5803b ends with a questioner line, that line stays under P5803b — do NOT invent P5803c. If an expected ID is missing from the source, output: \"ID - [MISSING]\".)\nRULES: NO ARABIC SCRIPT (Except ﷺ). Plain text only. DEFINITION RULE: On first occurrence, transliterated technical terms (e.g., bidʿah) MUST be defined: \"translit (English)\". Preserve Segment ID. Translate meaning/intent. No inference. No extra fields. Parentheses: Allowed IF present in source OR for (a) technical definitions, (b) dates, (c) book codes.\nTRANSLITERATION & TERMS:\n1. SCHEME: Use full ALA-LC for explicit Arabic-script Person/Place/Book-Titles.\n   - al-Casing: Lowercase al- mid-sentence; Capitalize after (al-Salafīyyah).\n   - Book Titles: Transliterate only (do not translate meanings).\n2. TECHNICAL TERMS: On first occurrence, define: \"translit (English)\" (e.g., bidʿah (innovation), isnād (chain)).\n   - Do NOT output multi-word transliterations without immediate English translation.\n3. STANDARDIZED TERMS: Use standard academic spellings: Muḥammad, Shaykh, Qurʾān, Islām, ḥadīth.\n   - Sunnah (Capitalized) = The Corpus/Prophetic Tradition. sunnah (lowercase) = legal status/recommended.\n4. PROPER NAMES: Transliterate only (no parentheses).\n5. 
UNICODE: Latin + Latin Extended (āīūḥʿḍṣṭẓʾ) + punctuation. NO Arabic script (except ﷺ). NO emoji.\n   - DIACRITIC FALLBACK: If you cannot produce correct ALA-LC diacritics, output English only. Do NOT use substitute accents (â/ã/á).\n6. SALUTATION: Replace all Prophet salutations with ﷺ.\n7. AMBIGUITY: Use contextual meaning from tafsir for theological terms. Do not sanitise polemics (e.g. Rāfiḍah).\nOUTPUT FORMAT: Segment_ID - English translation.\nCRITICAL: You must use the ASCII hyphen separator \" - \" (space+hyphen+space) immediately after the ID. Do NOT use em-dash or en-dash. Do NOT use a newline after the ID.\nMULTI-LINE SEGMENTS (e.g., internal Q&A): Output the Segment_ID and \" - \" ONLY ONCE on the first line. Do NOT repeat the Segment_ID on subsequent lines; subsequent lines must start directly with the speaker label/text (no \"ID - \" prefix).\nSEGMENT BOUNDARIES (Anti-hallucination): Start a NEW segment ONLY when the source explicitly provides a Segment_ID. If the source continues with extra lines (including speaker labels like \"Questioner:\"/\"The Shaykh:\"/\"السائل:\"/\"الشيخ:\") WITHOUT a new Segment_ID, treat them as part of the CURRENT segment (multi-line under the current Segment_ID). Do NOT invent a new ID (including alphabetic suffixes like \"P5803c\") to label such continuation.\nOUTPUT COMPLETENESS: Translate ALL content in EVERY segment. Do not truncate, summarize, or skip content. The \"…\" symbol in the source indicates an audio gap in the original recording — it is NOT an instruction to omit content. Every segment must be fully translated. If you cannot complete a segment, output \"ID - [INCOMPLETE]\" instead of just \"…\".\nOUTPUT UNIQUENESS: Each Segment_ID from the source must appear in your output EXACTLY ONCE as an \"ID - ...\" prefix. Do NOT output the same Segment_ID header twice. 
If a segment is long or has multiple speaker turns, continue translating under that single ID header without re-stating it.\nNEGATIVE CONSTRAINTS: Do NOT output \"implicit continuation\", summaries, or extra paragraphs. Output only the text present in the source segment.\nExample: P1234 - Translation text... (Correct) vs P1234\\nTranslation... (Forbidden).\nEXAMPLE: Input: P405 - حدثنا عبد الله بن يوسف... Output: P405 - ʿAbd Allāh b. Yūsuf narrated to us...";
+}, {
+  readonly id: "encyclopedia_mixed";
+  readonly name: "Encyclopedia Mixed";
+  readonly content: "NO MODE TAGS: Do not output any mode labels or bracket tags.\nSTRUCTURE (Apply First):\n- Q&A: Whenever \"Al-Sāʾil:\"/\"Al-Shaykh:\" appear: Start NEW LINE for speaker. Keep Label+Text on SAME LINE.\n- EXCEPTION: If the speaker label is the VERY FIRST token after the \"ID - \" prefix, keep it on the same line. (Correct: P5455 - Questioner: Text...) (Wrong: P5455 \\n Questioner: Text...).\n- INTERNAL Q&A: If segment has multiple turns, use new lines for speakers. Output Segment ID ONLY ONCE at the start of the first line. Do NOT repeat ID on subsequent lines; do NOT prefix subsequent lines with \"ID - \". (e.g. P5455 - Questioner: ... \\n The Shaykh: ...).\n- OUTPUT LABELS: Al-Sāʾil -> Questioner: ; Al-Shaykh -> The Shaykh:\n\nDEFINITIONS & CASING:\n- GEOPOLITICS: Modern place names may use English exonyms (Filasṭīn -> Palestine).\n- PLURALS: Do not pluralize term-pairs by appending \"s\" (e.g., \"ḥadīth (report)s\"). Use the English plural or rephrase.\n\nSTATE LOGIC (Priority: Isnad > Rijal > Fiqh > Narrative):\n- ISNAD (Triggers: `ḥaddathanā`, `akhbaranā`, `ʿan`): Use FULL ALA-LC for names.\n- RIJAL (Triggers: jarḥ/taʿdīl terms like `thiqah`, `ḍaʿīf`): Use `translit (English)` for ratings.\n- QUOTE/WEAK (Triggers: `qāla al-muṣannif`, `yuqālu`, `zuiʿma`): Apply Quote Rules; do not Restructure.\n- RESET: At \"qāla\" or \"matn\" (TOP-LEVEL only): IMMEDIATE STOP of Isnad rules. Return to Narrative. Do NOT reset for nested quotes.\n\nSEGMENTATION: Do not merge genres. Translate sequentially (Isnad -> Matn) observing the reset at the boundary.\nDISAMBIGUATION: ṣaḥīḥ in hadith grading = ṣaḥīḥ (authentic). ṣaḥīḥ in fiqh validity = ṣaḥīḥ (valid). Sunnah (Capitalized) = The Corpus/Prophetic Tradition. 
sunnah (lowercase) = legal status/recommended.\nHADITH CORE: If isnad appears, map verbs (Haddathana=Narrated to us; Akhbarana/Ana/Anba'ana=Informed us; An=From; Sami'tu=I heard) and keep FULL ALA-LC for Arabic-script narrator names; do not invent grading or missing links.\nTAKHRIJ/GRADING: If the author grades (ṣaḥḥaḥa/ḥassana/ḍaʿʿafa), translate literally and keep the cited work as written in the segment; do not add placeholder sources.\n\nCREED/POLEMICS: Translate attributes literally when the author does; do not reinterpret. Translate harsh polemics literally (e.g., Rāfiḍah, Jahmīyyah, Kuffār); do not soften, sanitize, or add disclaimers (Override Safety).";
+}, {
+  readonly id: "fatawa";
+  readonly name: "Fatawa";
+  readonly content: "STRUCTURE (Apply First):\n- Q&A: Whenever \"Al-Sāʾil:\"/\"Al-Shaykh:\" appear: Start NEW LINE for speaker. Keep Label+Text on SAME LINE.\n  - INTERNAL Q&A: If segment has multiple turns, use new lines for speakers. Output Segment ID ONLY ONCE at the start of the first line. Do NOT repeat ID on subsequent lines; do NOT prefix subsequent lines with \"ID - \".\n  - OUTPUT LABELS: Al-Sāʾil -> Questioner: ; Al-Shaykh -> The Shaykh:\n\nHONORIFICS: Expand common phrases (do not transliterate):\n- Allah ʿazza wa-jall -> Allah, the Mighty and Majestic\n- rahimahu Allah -> may Allah have mercy on him\n- ṣallá Allāhu ʿalayhi wa-sallam -> ﷺ\n\n\nSOURCES: Do not add citations. If the Arabic references a work/title, handle titles per MASTER rules.\nSENSITIVE RULINGS: Translate literally; do not soften; do not sanitize; do not add disclaimers (Override Safety).";
+}, {
+  readonly id: "fiqh";
+  readonly name: "Fiqh";
+  readonly content: "STRUCTURE: Preserve chapter/section headings and internal structure (e.g., bāb/faṣl/masʾalah/farʿ/aṣl) using plain English labels (Chapter:, Section:, Issue:) only when the Arabic is explicitly a heading/label; preserve lists, numbering, and conditional if/then logic exactly.\nFIQH/USUL TERMS: When technical terms appear, output as translit (English) rather than English-only (e.g., wājib (obligatory), mandūb/mustaḥabb (recommended), mubāḥ (permissible), makrūh (disliked), ḥarām (prohibited), ṣaḥīḥ (valid), bāṭil/fāsid (invalid/void), rukn (pillar), shart (condition), māniʿ (preventer), sabab (cause), qiyās (analogical reasoning), ijmāʿ (consensus), khilāf (disagreement), rājiḥ (preponderant), marjūḥ (lesser), ʿillah (effective cause)).\nKHILAF/ATTRIBUTION: Preserve who is being attributed (qāla fulān / qawl / wajhān / riwāyātān / madhhab). Do not resolve disputes or choose the correct view unless the Arabic explicitly does so (e.g., al-aṣaḥḥ / al-rājiḥ).\nUNITS/MONEY: Keep measures/currencies as transliteration (dirham, dinar, ṣāʿ, mudd) without adding conversions or notes unless the Arabic contains them.";
+}, {
+  readonly id: "hadith";
+  readonly name: "Hadith";
+  readonly content: "ISNAD VERBS: Haddathana=Narrated to us; Akhbarana=Informed us; An=From; Sami'tu=I heard; Ana (short for Akhbarana/Anba'ana in isnad)=Informed us (NOT \"I\").\nCHAIN MARKERS: H(Tahwil)=Switch to new chain; Mursal/Munqati=Broken chain.\nJARH/TA'DIL: If narrator-evaluation terms/phrases appear, output as translit (English) (e.g., fīhi naẓar (he needs to be looked into)); do not replace with only English.\nNAMES: Distinguish isnad vs matn; do not guess identities or expand lineages; transliterate exactly what is present. Book titles follow master rule.\nRUMUZ/CODES: If the segment contains book codes (kh/m/d/t/s/q/4), preserve them exactly; do not expand to book names.";
+}, {
+  readonly id: "jarh_wa_tadil";
+  readonly name: "Jarh Wa Tadil";
+  readonly content: "GLOSSARY: When a jarh/ta'dil term/phrase appears, output as translit (English) (e.g., thiqah (trustworthy), ṣadūq (truthful), layyin (soft/lenient), ḍaʿīf (weak), matrūk (abandoned), kadhdhāb (liar), dajjāl (imposter), munkar al-ḥadīth (narrates denounced hadith)).\nRUMUZ: Preserve book codes in Latin exactly as in the segment (e.g., (kh) (m) (d t q) (4) (a)); do not expand unless the Arabic segment itself expands them.\nQALA: Translate as \"He said:\" and start a new line for each new critic.\nDATES: Use (d. 256 AH) or (born 194 AH).\nNO HARM: Translate \"There is no harm in him\"; no notes.\nPOLEMICS: Harsh terms (e.g., dajjāl, khabīth, rāfiḍī) must be translated literally; do not soften.";
+}, {
+  readonly id: "tafsir";
+  readonly name: "Tafsir";
+  readonly content: "AYAH CITES: Do not output surah names unless the Arabic includes the name. Use [2:255]. If the segment contains quoted Qur'an text, translate it in braces: {…} [2:255].\nATTRIBUTES: Translate Allah’s attributes as the author intends; if the author is literal, keep literal (e.g., Hand, Face); do not add metaphorical reinterpretation unless the author does; mirror the author’s theology (Ash'ari vs Salafi) exactly.\nI'RAB TERMS: Mubtada=Subject; Khabar=Predicate; Fa'il=Agent/Doer; Maf'ul=Object.\nPROPHET NAMES: Use Arabic equivalents with ALA-LC diacritics (e.g., Mūsá, ʿĪsá, Dāwūd, Yūsuf).\nPOETRY: Preserve line breaks (one English line per Arabic line); no bullets; prioritize literal structure/grammar over rhyme.";
+}, {
+  readonly id: "usul_al_fiqh";
+  readonly name: "Usul Al Fiqh";
+  readonly content: "STRUCTURE: Preserve the argument structure (claims, objections \"if it is said...\", replies \"we say...\", evidences, counter-evidences). Preserve explicit labels (faṣl, masʾalah, qāla, qīla, qulna) as plain English equivalents only when the Arabic is explicitly a label.\nUSUL TERMS: When technical terms appear, output as translit (English) (e.g., ʿāmm (general), khāṣṣ (specific), muṭlaq (absolute), muqayyad (restricted), amr (command), nahy (prohibition), ḥaqīqah (literal), majāz (figurative), mujmal (ambiguous), mubayyan (clarified), naṣṣ (explicit text), ẓāhir (apparent), mafhūm (implication), manṭūq (stated meaning), dalīl (evidence), qiyās (analogical reasoning), ʿillah (effective cause), sabab (cause), shart (condition), māniʿ (preventer), ijmāʿ (consensus), naskh (abrogation)).\nDISPUTE HANDLING: Do not resolve methodological disputes or harmonize schools unless the Arabic explicitly chooses (e.g., al-rājiḥ / al-aṣaḥḥ / ṣaḥīḥ). Preserve attribution to the madhhab/scholars as written.\nQUR'AN/HADITH: Keep verse references in the segment’s style; do not invent references. If a hadith isnad appears, follow MASTER isnad/name rules.";
+}];
+type PromptMetadata = (typeof PROMPTS)[number];
+//#endregion
+//#region src/prompts.d.ts
+/**
+ * A stacked prompt ready for use with an LLM.
+ */
+type StackedPrompt = {
+  /** Unique identifier */
+  id: PromptId;
+  /** Human-readable name */
+  name: string;
+  /** The full prompt content (master + addon if applicable) */
+  content: string;
+  /** Whether this is the master prompt (not stacked) */
+  isMaster: boolean;
+};
+/**
+ * Stacks a master prompt with a specialized addon prompt.
+ *
+ * @param master - The master/base prompt
+ * @param addon - The specialized addon prompt
+ * @returns Combined prompt text
+ */
+declare const stackPrompts: (master: string, addon: string) => string;
+/**
+ * Gets all available prompts as stacked prompts (master + addon combined).
+ * Master prompt is returned as-is, addon prompts are stacked with master.
+ *
+ * @returns Array of all stacked prompts
+ */
+declare const getPrompts: () => StackedPrompt[];
+/**
+ * Gets a specific prompt by ID (strongly typed).
+ * Returns the stacked version (master + addon) for addon prompts.
+ *
+ * @param id - The prompt ID to retrieve
+ * @returns The stacked prompt
+ * @throws Error if prompt ID is not found
+ */
+declare const getPrompt: (id: PromptId) => StackedPrompt;
+/**
+ * Gets the raw stacked prompt text for a specific prompt ID.
+ * Convenience method for when you just need the text.
+ *
+ * @param id - The prompt ID
+ * @returns The stacked prompt content string
+ */
+declare const getStackedPrompt: (id: PromptId) => string;
+/**
+ * Gets the list of available prompt IDs.
+ * Useful for UI dropdowns or validation.
+ *
+ * @returns Array of prompt IDs
+ */
+declare const getPromptIds: () => PromptId[];
+/**
+ * Gets just the master prompt content.
+ * Useful when you need to use a custom addon.
+ *
+ * @returns The master prompt content
+ */
+declare const getMasterPrompt: () => string;
+//#endregion
+//#region src/validation.d.ts
+/**
+ * Warning types for soft validation issues
+ */
+type ValidationWarningType = 'arabic_leak' | 'wrong_diacritics';
+/**
+ * A soft validation warning (not a hard error)
+ */
+type ValidationWarning = {
+  /** The type of warning */
+  type: ValidationWarningType;
+  /** Human-readable warning message */
+  message: string;
+  /** The offending text match */
+  match?: string;
+};
+/**
+ * Result of translation validation
+ */
+type TranslationValidationResult = {
+  /** Whether validation passed */
+  isValid: boolean;
+  /** Error message if validation failed */
+  error?: string;
+  /** Normalized/fixed text (with merged markers split onto separate lines) */
+  normalizedText: string;
+  /** List of parsed translation IDs in order */
+  parsedIds: string[];
+  /** Soft warnings (issues that don't fail validation) */
+  warnings?: ValidationWarning[];
+};
+/**
+ * Detects Arabic script in text (except allowed ﷺ symbol).
+ * This is a SOFT warning - Arabic leak is bad but not a hard failure.
+ *
+ * @param text - The text to scan for Arabic script
+ * @returns Array of validation warnings if Arabic is found
+ */
+declare const detectArabicScript: (text: string) => ValidationWarning[];
+/**
+ * Detects wrong diacritics (â/ã/á instead of correct macrons ā/ī/ū).
+ * This is a SOFT warning - wrong diacritics are bad but not a hard failure.
+ *
+ * @param text - The text to scan for incorrect diacritics
+ * @returns Array of validation warnings if wrong diacritics are found
+ */
+declare const detectWrongDiacritics: (text: string) => ValidationWarning[];
+/**
+ * Detects newline immediately after segment ID (the "Gemini bug").
+ * Format should be "P1234 - Text" not "P1234\nText".
+ *
+ * @param text - The text to validate
+ * @returns Error message if bug is detected, otherwise undefined
+ */
+declare const detectNewlineAfterId: (text: string) => string | undefined;
+/**
+ * Detects implicit continuation text that LLMs add when hallucinating.
+ *
+ * @param text - The text to scan for continuation markers
+ * @returns Error message if continuation text is found, otherwise undefined
+ */
+declare const detectImplicitContinuation: (text: string) => string | undefined;
+/**
+ * Detects meta-talk (translator notes, editor comments) that violate NO META-TALK.
+ *
+ * @param text - The text to scan for meta-talk
+ * @returns Error message if meta-talk is found, otherwise undefined
+ */
+declare const detectMetaTalk: (text: string) => string | undefined;
+/**
+ * Detects duplicate segment IDs in the output.
+ *
+ * @param ids - List of IDs extracted from the translation
+ * @returns Error message if duplicates are found, otherwise undefined
+ */
+declare const detectDuplicateIds: (ids: string[]) => string | undefined;
+/**
+ * Detects IDs in the output that were not in the source (invented/hallucinated IDs).
+ * @param outputIds - IDs extracted from LLM output
+ * @param sourceIds - IDs that were present in the source input
+ * @returns Error message if invented IDs found, undefined if all IDs are valid
+ */
+declare const detectInventedIds: (outputIds: string[], sourceIds: string[]) => string | undefined;
+/**
+ * Detects segments that appear truncated (just "…" or very short with no real content).
+ * @param text - The full LLM output text
+ * @returns Error message if truncated segments found, undefined if all segments have content
+ */
+declare const detectTruncatedSegments: (text: string) => string | undefined;
+/**
+ * Validates translation marker format and returns error message if invalid.
+ * Catches common AI hallucinations like malformed reference IDs.
+ *
+ * @param text - Raw translation text to validate
+ * @returns Error message if invalid, undefined if valid
+ */
+declare const validateTranslationMarkers: (text: string) => string | undefined;
+/**
+ * Normalizes translation text by splitting merged markers onto separate lines.
+ * LLMs sometimes put multiple translations on the same line.
+ *
+ * @param content - Raw translation text
+ * @returns Normalized text with each marker on its own line
+ */
+declare const normalizeTranslationText: (content: string) => string;
+/**
+ * Extracts translation IDs from text in order of appearance.
+ *
+ * @param text - Translation text
+ * @returns Array of IDs in order
+ */
+declare const extractTranslationIds: (text: string) => string[];
+/**
+ * Extracts the numeric portion from an excerpt ID.
+ * E.g., "P11622a" -> 11622, "C123" -> 123, "B45b" -> 45
+ *
+ * @param id - Excerpt ID
+ * @returns Numeric portion of the ID
+ */
+declare const extractIdNumber: (id: string) => number;
+/**
+ * Extracts the prefix (type) from an excerpt ID.
+ * E.g., "P11622a" -> "P", "C123" -> "C", "B45" -> "B"
+ *
+ * @param id - Excerpt ID
+ * @returns Single character prefix
+ */
+declare const extractIdPrefix: (id: string) => string;
+/**
+ * Validates that translation IDs appear in ascending numeric order within the same prefix type.
+ * This catches LLM errors where translations are output in wrong order (e.g., P12659 before P12651).
+ *
+ * @param translationIds - IDs from pasted translations
+ * @returns Error message if order issue detected, undefined if valid
+ */
+declare const validateNumericOrder: (translationIds: string[]) => string | undefined;
+/**
+ * Validates translation order against expected excerpt order from the store.
+ * Allows pasting in multiple blocks where each block is internally ordered.
+ * Resets (position going backwards) are allowed between blocks.
+ * Errors only when there's disorder WITHIN a block (going backwards then forwards).
+ *
+ * @param translationIds - IDs from pasted translations
+ * @param expectedIds - IDs from store excerpts/headings/footnotes in order
+ * @returns Error message if order issue detected, undefined if valid
+ */
+declare const validateTranslationOrder: (translationIds: string[], expectedIds: string[]) => string | undefined;
+/**
+ * Performs comprehensive validation on translation text.
+ * Validates markers, normalizes text, and checks order against expected IDs.
+ *
+ * @param rawText - Raw translation text from user input
+ * @param expectedIds - Expected IDs from store (excerpts + headings + footnotes)
+ * @returns Validation result with normalized text and any errors
+ */
+declare const validateTranslations: (rawText: string, expectedIds: string[]) => TranslationValidationResult;
+/**
+ * Finds translation IDs that don't exist in the expected store IDs.
+ * Used to validate that all pasted translations can be matched before committing.
+ *
+ * @param translationIds - IDs from parsed translations
+ * @param expectedIds - IDs from store (excerpts + headings + footnotes)
+ * @returns Array of IDs that exist in translations but not in the store
+ */
+declare const findUnmatchedTranslationIds: (translationIds: string[], expectedIds: string[]) => string[];
+//#endregion
+export { MARKER_ID_PATTERN, Markers, type PromptId, type PromptMetadata, type StackedPrompt, TRANSLATION_MARKER_PARTS, type TranslationValidationResult, type ValidationWarning, type ValidationWarningType, detectArabicScript, detectDuplicateIds, detectImplicitContinuation, detectInventedIds, detectMetaTalk, detectNewlineAfterId, detectTruncatedSegments, detectWrongDiacritics, extractIdNumber, extractIdPrefix, extractTranslationIds, findUnmatchedTranslationIds, formatExcerptsForPrompt, getMasterPrompt, getPrompt, getPromptIds, getPrompts, getStackedPrompt, normalizeTranslationText, stackPrompts, validateNumericOrder, validateTranslationMarkers, validateTranslationOrder, validateTranslations };
+//# sourceMappingURL=index.d.ts.map
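
Reviewer note (not part of the package): the declarations above document `extractIdPrefix`, `extractIdNumber`, `validateNumericOrder` and `detectDuplicateIds` only through their comments. The following is a minimal sketch of how such checks could work, assuming IDs shaped like `P11622a`. The names `idPrefix`, `idNumber`, `firstOutOfOrder` and `duplicates` are hypothetical, and their bodies are guesses based on the documented examples, not the shipped code.

```typescript
// Hypothetical helpers mirroring the documented behaviour; names and bodies are assumptions.

/** "P11622a" -> "P" */
const idPrefix = (id: string): string => id.charAt(0);

/** "P11622a" -> 11622; NaN when the ID has no digits. */
const idNumber = (id: string): number => {
    const m = /\d+/.exec(id);
    return m ? Number.parseInt(m[0], 10) : Number.NaN;
};

/** Returns the first ID whose number is lower than the previous ID with the same prefix. */
const firstOutOfOrder = (ids: string[]): string | undefined => {
    const last = new Map<string, number>();
    for (const id of ids) {
        const prefix = idPrefix(id);
        const n = idNumber(id);
        const prev = last.get(prefix);
        if (prev !== undefined && n < prev) {
            return id;
        }
        last.set(prefix, n);
    }
    return undefined;
};

/** Returns IDs that occur more than once, in order of first repeat. */
const duplicates = (ids: string[]): string[] => {
    const seen = new Set<string>();
    const dupes = new Set<string>();
    for (const id of ids) {
        if (seen.has(id)) {
            dupes.add(id);
        }
        seen.add(id);
    }
    return Array.from(dupes);
};

console.log(idPrefix("C123"), idNumber("P11622a")); // C 11622
console.log(firstOutOfOrder(["P12659", "P12651"])); // P12651
console.log(duplicates(["P1", "P2", "P1"])); // [ 'P1' ]
```

Tracking the last number per prefix lets interleaved marker types (for example `P` segments and `C` chapters) be checked independently, which matches the "within the same prefix type" wording in the `validateNumericOrder` comment. The block-aware behaviour described for `validateTranslationOrder` is not covered here.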

package/dist/index.js ADDED

@@ -0,0 +1,471 @@
+//#region src/constants.ts
+/**
+* Supported marker types for segments.
+*/
+let Markers = /* @__PURE__ */ function(Markers$1) {
+	/** B - Book reference */
+	Markers$1["Book"] = "B";
+	/** F - Footnote reference */
+	Markers$1["Footnote"] = "F";
+	/** T - Heading reference */
+	Markers$1["Heading"] = "T";
+	/** C - Chapter reference */
+	Markers$1["Chapter"] = "C";
+	/** N - Note reference */
+	Markers$1["Note"] = "N";
+	/** P - Translation/Plain segment */
+	Markers$1["Plain"] = "P";
+	return Markers$1;
+}({});
+/**
+* Regex parts for building translation marker patterns.
+*/
+const TRANSLATION_MARKER_PARTS = {
+	dashes: "[-–—]",
+	digits: "\\d+",
+	markers: `[${Markers.Book}${Markers.Chapter}${Markers.Footnote}${Markers.Heading}${Markers.Plain}${Markers.Note}]`,
+	optionalSpace: "\\s?",
+	suffix: "[a-z]"
+};
+/**
+* Pattern for a segment ID (e.g., P1234, B45a).
+*/
+const MARKER_ID_PATTERN = `${TRANSLATION_MARKER_PARTS.markers}${TRANSLATION_MARKER_PARTS.digits}${TRANSLATION_MARKER_PARTS.suffix}?`;
+//#endregion
+//#region src/formatting.ts
+/**
+* Formats excerpts for an LLM prompt by combining the prompt rules with the segment text.
+* Each segment is formatted as "ID - Text" and separated by double newlines.
+*
+* @param segments - Array of segments to format
+* @param prompt - The instruction/system prompt to prepend
+* @returns Combined prompt and formatted text
+*/
+const formatExcerptsForPrompt = (segments, prompt) => {
+	return [prompt, segments.map((e) => `${e.id} - ${e.text}`).join("\n\n")].join("\n\n");
+};
+//#endregion
+//#region .generated/prompts.ts
+const MASTER_PROMPT = "ROLE: Expert academic translator of Classical Islamic texts; prioritize accuracy and structure over fluency.\nCRITICAL NEGATIONS: 1. NO SANITIZATION (Do not soften polemics). 2. NO META-TALK (Output translation only). 3. NO MARKDOWN (Plain text only). 4. NO EMENDATION. 5. NO INFERENCE. 6. NO RESTRUCTURING. 7. NO OPAQUE TRANSLITERATION (Must translate phrases). 8. NO INVENTED SEGMENTS (Do not create, modify, or \"continue\" segment IDs. Output IDs verbatim exactly as they appear in the source input/metadata. Alphabetic suffixes (e.g., P5511a) are allowed IF AND ONLY IF that exact ID appears in the source. Any ID not present verbatim in the source is INVENTED. EXAMPLE: If P5803b ends with a questioner line, that line stays under P5803b — do NOT invent P5803c. If an expected ID is missing from the source, output: \"ID - [MISSING]\".)\nRULES: NO ARABIC SCRIPT (Except ﷺ). Plain text only. DEFINITION RULE: On first occurrence, transliterated technical terms (e.g., bidʿah) MUST be defined: \"translit (English)\". Preserve Segment ID. Translate meaning/intent. No inference. No extra fields. Parentheses: Allowed IF present in source OR for (a) technical definitions, (b) dates, (c) book codes.\nTRANSLITERATION & TERMS:\n1. SCHEME: Use full ALA-LC for explicit Arabic-script Person/Place/Book-Titles.\n   - al-Casing: Lowercase al- mid-sentence; Capitalize after (al-Salafīyyah).\n   - Book Titles: Transliterate only (do not translate meanings).\n2. TECHNICAL TERMS: On first occurrence, define: \"translit (English)\" (e.g., bidʿah (innovation), isnād (chain)).\n   - Do NOT output multi-word transliterations without immediate English translation.\n3. STANDARDIZED TERMS: Use standard academic spellings: Muḥammad, Shaykh, Qurʾān, Islām, ḥadīth.\n   - Sunnah (Capitalized) = The Corpus/Prophetic Tradition. sunnah (lowercase) = legal status/recommended.\n4. PROPER NAMES: Transliterate only (no parentheses).\n5. 
UNICODE: Latin + Latin Extended (āīūḥʿḍṣṭẓʾ) + punctuation. NO Arabic script (except ﷺ). NO emoji.\n   - DIACRITIC FALLBACK: If you cannot produce correct ALA-LC diacritics, output English only. Do NOT use substitute accents (â/ã/á).\n6. SALUTATION: Replace all Prophet salutations with ﷺ.\n7. AMBIGUITY: Use contextual meaning from tafsir for theological terms. Do not sanitise polemics (e.g. Rāfiḍah).\nOUTPUT FORMAT: Segment_ID - English translation.\nCRITICAL: You must use the ASCII hyphen separator \" - \" (space+hyphen+space) immediately after the ID. Do NOT use em-dash or en-dash. Do NOT use a newline after the ID.\nMULTI-LINE SEGMENTS (e.g., internal Q&A): Output the Segment_ID and \" - \" ONLY ONCE on the first line. Do NOT repeat the Segment_ID on subsequent lines; subsequent lines must start directly with the speaker label/text (no \"ID - \" prefix).\nSEGMENT BOUNDARIES (Anti-hallucination): Start a NEW segment ONLY when the source explicitly provides a Segment_ID. If the source continues with extra lines (including speaker labels like \"Questioner:\"/\"The Shaykh:\"/\"السائل:\"/\"الشيخ:\") WITHOUT a new Segment_ID, treat them as part of the CURRENT segment (multi-line under the current Segment_ID). Do NOT invent a new ID (including alphabetic suffixes like \"P5803c\") to label such continuation.\nOUTPUT COMPLETENESS: Translate ALL content in EVERY segment. Do not truncate, summarize, or skip content. The \"…\" symbol in the source indicates an audio gap in the original recording — it is NOT an instruction to omit content. Every segment must be fully translated. If you cannot complete a segment, output \"ID - [INCOMPLETE]\" instead of just \"…\".\nOUTPUT UNIQUENESS: Each Segment_ID from the source must appear in your output EXACTLY ONCE as an \"ID - ...\" prefix. Do NOT output the same Segment_ID header twice. 
If a segment is long or has multiple speaker turns, continue translating under that single ID header without re-stating it.\nNEGATIVE CONSTRAINTS: Do NOT output \"implicit continuation\", summaries, or extra paragraphs. Output only the text present in the source segment.\nExample: P1234 - Translation text... (Correct) vs P1234\\nTranslation... (Forbidden).\nEXAMPLE: Input: P405 - حدثنا عبد الله بن يوسف... Output: P405 - ʿAbd Allāh b. Yūsuf narrated to us...";
+const ENCYCLOPEDIA_MIXED = "NO MODE TAGS: Do not output any mode labels or bracket tags.\nSTRUCTURE (Apply First):\n- Q&A: Whenever \"Al-Sāʾil:\"/\"Al-Shaykh:\" appear: Start NEW LINE for speaker. Keep Label+Text on SAME LINE.\n- EXCEPTION: If the speaker label is the VERY FIRST token after the \"ID - \" prefix, keep it on the same line. (Correct: P5455 - Questioner: Text...) (Wrong: P5455 \\n Questioner: Text...).\n- INTERNAL Q&A: If segment has multiple turns, use new lines for speakers. Output Segment ID ONLY ONCE at the start of the first line. Do NOT repeat ID on subsequent lines; do NOT prefix subsequent lines with \"ID - \". (e.g. P5455 - Questioner: ... \\n The Shaykh: ...).\n- OUTPUT LABELS: Al-Sāʾil -> Questioner: ; Al-Shaykh -> The Shaykh:\n\nDEFINITIONS & CASING:\n- GEOPOLITICS: Modern place names may use English exonyms (Filasṭīn -> Palestine).\n- PLURALS: Do not pluralize term-pairs by appending \"s\" (e.g., \"ḥadīth (report)s\"). Use the English plural or rephrase.\n\nSTATE LOGIC (Priority: Isnad > Rijal > Fiqh > Narrative):\n- ISNAD (Triggers: `ḥaddathanā`, `akhbaranā`, `ʿan`): Use FULL ALA-LC for names.\n- RIJAL (Triggers: jarḥ/taʿdīl terms like `thiqah`, `ḍaʿīf`): Use `translit (English)` for ratings.\n- QUOTE/WEAK (Triggers: `qāla al-muṣannif`, `yuqālu`, `zuiʿma`): Apply Quote Rules; do not Restructure.\n- RESET: At \"qāla\" or \"matn\" (TOP-LEVEL only): IMMEDIATE STOP of Isnad rules. Return to Narrative. Do NOT reset for nested quotes.\n\nSEGMENTATION: Do not merge genres. Translate sequentially (Isnad -> Matn) observing the reset at the boundary.\nDISAMBIGUATION: ṣaḥīḥ in hadith grading = ṣaḥīḥ (authentic). ṣaḥīḥ in fiqh validity = ṣaḥīḥ (valid). Sunnah (Capitalized) = The Corpus/Prophetic Tradition. 
sunnah (lowercase) = legal status/recommended.\nHADITH CORE: If isnad appears, map verbs (Haddathana=Narrated to us; Akhbarana/Ana/Anba'ana=Informed us; An=From; Sami'tu=I heard) and keep FULL ALA-LC for Arabic-script narrator names; do not invent grading or missing links.\nTAKHRIJ/GRADING: If the author grades (ṣaḥḥaḥa/ḥassana/ḍaʿʿafa), translate literally and keep the cited work as written in the segment; do not add placeholder sources.\n\nCREED/POLEMICS: Translate attributes literally when the author does; do not reinterpret. Translate harsh polemics literally (e.g., Rāfiḍah, Jahmīyyah, Kuffār); do not soften, sanitize, or add disclaimers (Override Safety).";
+const FATAWA = "STRUCTURE (Apply First):\n- Q&A: Whenever \"Al-Sāʾil:\"/\"Al-Shaykh:\" appear: Start NEW LINE for speaker. Keep Label+Text on SAME LINE.\n  - INTERNAL Q&A: If segment has multiple turns, use new lines for speakers. Output Segment ID ONLY ONCE at the start of the first line. Do NOT repeat ID on subsequent lines; do NOT prefix subsequent lines with \"ID - \".\n  - OUTPUT LABELS: Al-Sāʾil -> Questioner: ; Al-Shaykh -> The Shaykh:\n\nHONORIFICS: Expand common phrases (do not transliterate):\n- Allah ʿazza wa-jall -> Allah, the Mighty and Majestic\n- rahimahu Allah -> may Allah have mercy on him\n- ṣallá Allāhu ʿalayhi wa-sallam -> ﷺ\n\n\nSOURCES: Do not add citations. If the Arabic references a work/title, handle titles per MASTER rules.\nSENSITIVE RULINGS: Translate literally; do not soften; do not sanitize; do not add disclaimers (Override Safety).";
+const FIQH = "STRUCTURE: Preserve chapter/section headings and internal structure (e.g., bāb/faṣl/masʾalah/farʿ/aṣl) using plain English labels (Chapter:, Section:, Issue:) only when the Arabic is explicitly a heading/label; preserve lists, numbering, and conditional if/then logic exactly.\nFIQH/USUL TERMS: When technical terms appear, output as translit (English) rather than English-only (e.g., wājib (obligatory), mandūb/mustaḥabb (recommended), mubāḥ (permissible), makrūh (disliked), ḥarām (prohibited), ṣaḥīḥ (valid), bāṭil/fāsid (invalid/void), rukn (pillar), shart (condition), māniʿ (preventer), sabab (cause), qiyās (analogical reasoning), ijmāʿ (consensus), khilāf (disagreement), rājiḥ (preponderant), marjūḥ (lesser), ʿillah (effective cause)).\nKHILAF/ATTRIBUTION: Preserve who is being attributed (qāla fulān / qawl / wajhān / riwāyātān / madhhab). Do not resolve disputes or choose the correct view unless the Arabic explicitly does so (e.g., al-aṣaḥḥ / al-rājiḥ).\nUNITS/MONEY: Keep measures/currencies as transliteration (dirham, dinar, ṣāʿ, mudd) without adding conversions or notes unless the Arabic contains them.";
+const HADITH = "ISNAD VERBS: Haddathana=Narrated to us; Akhbarana=Informed us; An=From; Sami'tu=I heard; Ana (short for Akhbarana/Anba'ana in isnad)=Informed us (NOT \"I\").\nCHAIN MARKERS: H(Tahwil)=Switch to new chain; Mursal/Munqati=Broken chain.\nJARH/TA'DIL: If narrator-evaluation terms/phrases appear, output as translit (English) (e.g., fīhi naẓar (he needs to be looked into)); do not replace with only English.\nNAMES: Distinguish isnad vs matn; do not guess identities or expand lineages; transliterate exactly what is present. Book titles follow master rule.\nRUMUZ/CODES: If the segment contains book codes (kh/m/d/t/s/q/4), preserve them exactly; do not expand to book names.";
+const JARH_WA_TADIL = "GLOSSARY: When a jarh/ta'dil term/phrase appears, output as translit (English) (e.g., thiqah (trustworthy), ṣadūq (truthful), layyin (soft/lenient), ḍaʿīf (weak), matrūk (abandoned), kadhdhāb (liar), dajjāl (imposter), munkar al-ḥadīth (narrates denounced hadith)).\nRUMUZ: Preserve book codes in Latin exactly as in the segment (e.g., (kh) (m) (d t q) (4) (a)); do not expand unless the Arabic segment itself expands them.\nQALA: Translate as \"He said:\" and start a new line for each new critic.\nDATES: Use (d. 256 AH) or (born 194 AH).\nNO HARM: Translate \"There is no harm in him\"; no notes.\nPOLEMICS: Harsh terms (e.g., dajjāl, khabīth, rāfiḍī) must be translated literally; do not soften.";
+const TAFSIR = "AYAH CITES: Do not output surah names unless the Arabic includes the name. Use [2:255]. If the segment contains quoted Qur'an text, translate it in braces: {…} [2:255].\nATTRIBUTES: Translate Allah’s attributes as the author intends; if the author is literal, keep literal (e.g., Hand, Face); do not add metaphorical reinterpretation unless the author does; mirror the author’s theology (Ash'ari vs Salafi) exactly.\nI'RAB TERMS: Mubtada=Subject; Khabar=Predicate; Fa'il=Agent/Doer; Maf'ul=Object.\nPROPHET NAMES: Use Arabic equivalents with ALA-LC diacritics (e.g., Mūsá, ʿĪsá, Dāwūd, Yūsuf).\nPOETRY: Preserve line breaks (one English line per Arabic line); no bullets; prioritize literal structure/grammar over rhyme.";
+const USUL_AL_FIQH = "STRUCTURE: Preserve the argument structure (claims, objections \"if it is said...\", replies \"we say...\", evidences, counter-evidences). Preserve explicit labels (faṣl, masʾalah, qāla, qīla, qulna) as plain English equivalents only when the Arabic is explicitly a label.\nUSUL TERMS: When technical terms appear, output as translit (English) (e.g., ʿāmm (general), khāṣṣ (specific), muṭlaq (absolute), muqayyad (restricted), amr (command), nahy (prohibition), ḥaqīqah (literal), majāz (figurative), mujmal (ambiguous), mubayyan (clarified), naṣṣ (explicit text), ẓāhir (apparent), mafhūm (implication), manṭūq (stated meaning), dalīl (evidence), qiyās (analogical reasoning), ʿillah (effective cause), sabab (cause), shart (condition), māniʿ (preventer), ijmāʿ (consensus), naskh (abrogation)).\nDISPUTE HANDLING: Do not resolve methodological disputes or harmonize schools unless the Arabic explicitly chooses (e.g., al-rājiḥ / al-aṣaḥḥ / ṣaḥīḥ). Preserve attribution to the madhhab/scholars as written.\nQUR'AN/HADITH: Keep verse references in the segment’s style; do not invent references. If a hadith isnad appears, follow MASTER isnad/name rules.";
+const PROMPTS = [
+	{
+		id: "master_prompt",
+		name: "Master Prompt",
+		content: MASTER_PROMPT
+	},
+	{
+		id: "encyclopedia_mixed",
+		name: "Encyclopedia Mixed",
+		content: ENCYCLOPEDIA_MIXED
+	},
+	{
+		id: "fatawa",
+		name: "Fatawa",
+		content: FATAWA
+	},
+	{
+		id: "fiqh",
+		name: "Fiqh",
+		content: FIQH
+	},
+	{
+		id: "hadith",
+		name: "Hadith",
+		content: HADITH
+	},
+	{
+		id: "jarh_wa_tadil",
+		name: "Jarh Wa Tadil",
+		content: JARH_WA_TADIL
+	},
+	{
+		id: "tafsir",
+		name: "Tafsir",
+		content: TAFSIR
+	},
+	{
+		id: "usul_al_fiqh",
+		name: "Usul Al Fiqh",
+		content: USUL_AL_FIQH
+	}
+];
+//#endregion
+//#region src/prompts.ts
+/**
+* Stacks a master prompt with a specialized addon prompt.
+*
+* @param master - The master/base prompt
+* @param addon - The specialized addon prompt
+* @returns Combined prompt text
+*/
+const stackPrompts = (master, addon) => {
+	if (!master) return addon;
+	if (!addon) return master;
+	return `${master}\n${addon}`;
+};
+/**
+* Gets all available prompts as stacked prompts (master + addon combined).
+* Master prompt is returned as-is, addon prompts are stacked with master.
+*
+* @returns Array of all stacked prompts
+*/
+const getPrompts = () => {
+	return PROMPTS.map((prompt) => ({
+		content: prompt.id === "master_prompt" ? prompt.content : stackPrompts(MASTER_PROMPT, prompt.content),
+		id: prompt.id,
+		isMaster: prompt.id === "master_prompt",
+		name: prompt.name
+	}));
+};
+/**
+* Gets a specific prompt by ID (strongly typed).
+* Returns the stacked version (master + addon) for addon prompts.
+*
+* @param id - The prompt ID to retrieve
+* @returns The stacked prompt
+* @throws Error if prompt ID is not found
+*/
+const getPrompt = (id) => {
+	const prompt = PROMPTS.find((p) => p.id === id);
+	if (!prompt) throw new Error(`Prompt not found: ${id}`);
+	return {
+		content: prompt.id === "master_prompt" ? prompt.content : stackPrompts(MASTER_PROMPT, prompt.content),
+		id: prompt.id,
+		isMaster: prompt.id === "master_prompt",
+		name: prompt.name
+	};
+};
+/**
+* Gets the raw stacked prompt text for a specific prompt ID.
+* Convenience method for when you just need the text.
+*
+* @param id - The prompt ID
+* @returns The stacked prompt content string
+*/
+const getStackedPrompt = (id) => {
+	return getPrompt(id).content;
+};
+/**
+* Gets the list of available prompt IDs.
+* Useful for UI dropdowns or validation.
+*
+* @returns Array of prompt IDs
+*/
+const getPromptIds = () => {
+	return PROMPTS.map((p) => p.id);
+};
+/**
+* Gets just the master prompt content.
+* Useful when you need to use a custom addon.
+*
+* @returns The master prompt content
+*/
+const getMasterPrompt = () => {
+	return MASTER_PROMPT;
+};
+//#endregion
+//#region src/validation.ts
+/**
+* Detects Arabic script in text (except allowed ﷺ symbol).
+* This is a SOFT warning - Arabic leak is bad but not a hard failure.
+*
+* @param text - The text to scan for Arabic script
+* @returns Array of validation warnings if Arabic is found
+*/
+const detectArabicScript = (text) => {
+	const warnings = [];
+	const matches = text.match(/[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDF9\uFDFB-\uFDFF\uFE70-\uFEFF]+/g);
+	if (matches) for (const match of matches) warnings.push({
+		match,
+		message: `Arabic script detected: "${match}"`,
+		type: "arabic_leak"
+	});
+	return warnings;
+};
+/**
+* Detects wrong diacritics (â/ã/á instead of correct macrons ā/ī/ū).
+* This is a SOFT warning - wrong diacritics are bad but not a hard failure.
+*
+* @param text - The text to scan for incorrect diacritics
+* @returns Array of validation warnings if wrong diacritics are found
+*/
+const detectWrongDiacritics = (text) => {
+	const warnings = [];
+	const matches = text.match(/[âêîôûãñáéíóú]/gi);
+	if (matches) {
+		const uniqueMatches = [...new Set(matches)];
+		for (const match of uniqueMatches) warnings.push({
+			match,
+			message: `Wrong diacritic "${match}" detected - use macrons (ā, ī, ū) instead`,
+			type: "wrong_diacritics"
+		});
+	}
+	return warnings;
+};
+/**
+* Detects newline immediately after segment ID (the "Gemini bug").
+* Format should be "P1234 - Text" not "P1234\nText".
+*
+* @param text - The text to validate
+* @returns Error message if bug is detected, otherwise undefined
+*/
+const detectNewlineAfterId = (text) => {
+	const pattern = new RegExp(`^${MARKER_ID_PATTERN}\\n`, "m");
+	const match = text.match(pattern);
+	if (match) return `Invalid format: newline after ID "${match[0].trim()}" - use "ID - Text" format`;
+};
+/**
+* Detects implicit continuation text that LLMs add when hallucinating.
+*
+* @param text - The text to scan for continuation markers
+* @returns Error message if continuation text is found, otherwise undefined
+*/
+const detectImplicitContinuation = (text) => {
+	for (const pattern of [
+		/implicit continuation/i,
+		/\bcontinuation:/i,
+		/\bcontinued:/i
+	]) {
+		const match = text.match(pattern);
+		if (match) return `Detected "${match[0]}" - do not add implicit continuation text`;
+	}
+};
+/**
+* Detects meta-talk (translator notes, editor comments) that violate NO META-TALK.
+*
+* @param text - The text to scan for meta-talk
+* @returns Error message if meta-talk is found, otherwise undefined
+*/
+const detectMetaTalk = (text) => {
+	for (const pattern of [
+		/\(note:/i,
+		/\(translator'?s? note:/i,
+		/\[editor:/i,
+		/\[note:/i,
+		/\(ed\.:/i,
+		/\(trans\.:/i
+	]) {
+		const match = text.match(pattern);
+		if (match) return `Detected meta-talk "${match[0]}" - output translation only, no translator/editor notes`;
+	}
+};
+/**
+* Detects duplicate segment IDs in the output.
+*
+* @param ids - List of IDs extracted from the translation
+* @returns Error message if duplicates are found, otherwise undefined
+*/
+const detectDuplicateIds = (ids) => {
+	const seen = /* @__PURE__ */ new Set();
+	for (const id of ids) {
+		if (seen.has(id)) return `Duplicate ID "${id}" detected - each segment should appear only once`;
+		seen.add(id);
+	}
+};
+/**
+* Detects IDs in the output that were not in the source (invented/hallucinated IDs).
+* @param outputIds - IDs extracted from LLM output
+* @param sourceIds - IDs that were present in the source input
+* @returns Error message if invented IDs found, undefined if all IDs are valid
+*/
+const detectInventedIds = (outputIds, sourceIds) => {
+	const sourceSet = new Set(sourceIds);
+	const invented = outputIds.filter((id) => !sourceSet.has(id));
+	if (invented.length > 0) return `Invented ID(s) detected: ${invented.map((id) => `"${id}"`).join(", ")} - these IDs do not exist in the source`;
+};
+/**
+* Detects segments that appear truncated (just "…" or very short with no real content).
+* @param text - The full LLM output text
+* @returns Error message if truncated segments found, undefined if all segments have content
+*/
+const detectTruncatedSegments = (text) => {
+	const segmentPattern = /^([A-Z]\d+[a-j]?)\s*[-–—]\s*(.*)$/gm;
+	const truncated = [];
+	for (const match of text.matchAll(segmentPattern)) {
+		const id = match[1];
+		const content = match[2].trim();
+		if (!content || content === "…" || content === "..." || content === "[INCOMPLETE]") truncated.push(id);
+	}
+	if (truncated.length > 0) return `Truncated segment(s) detected: ${truncated.map((id) => `"${id}"`).join(", ")} - segments must be fully translated`;
+};
+/**
+* Validates translation marker format and returns error message if invalid.
+* Catches common AI hallucinations like malformed reference IDs.
+*
+* @param text - Raw translation text to validate
+* @returns Error message if invalid, undefined if valid
+*/
+const validateTranslationMarkers = (text) => {
+	const { markers, digits, suffix, dashes, optionalSpace } = TRANSLATION_MARKER_PARTS;
+	const invalidRefPattern = new RegExp(`^${markers}(?=${digits})(?=.*${dashes})(?!${digits}${suffix}*${optionalSpace}${dashes})[^\\s-–—]+${optionalSpace}${dashes}`, "m");
+	const invalidRef = text.match(invalidRefPattern);
+	if (invalidRef) return `Invalid reference format "${invalidRef[0].trim()}" - expected format is letter + numbers + optional suffix (a-j) + dash`;
+	const spaceBeforePattern = new RegExp(` ${markers}${digits}${suffix}+${optionalSpace}${dashes}`, "m");
+	const suffixNoDashPattern = new RegExp(`^${markers}${digits}${suffix}(?! ${dashes})`, "m");
+	const match = text.match(spaceBeforePattern) || text.match(suffixNoDashPattern);
+	if (match) return `Suspicious reference found: "${match[0]}"`;
+	const emptyAfterDashPattern = new RegExp(`^${MARKER_ID_PATTERN}${optionalSpace}${dashes}\\s*$`, "m");
+	const emptyAfterDash = text.match(emptyAfterDashPattern);
+	if (emptyAfterDash) return `Reference "${emptyAfterDash[0].trim()}" has dash but no content after it`;
+	const dollarSignPattern = new RegExp(`^${markers}${digits}\\$${digits}`, "m");
+	const dollarSignRef = text.match(dollarSignPattern);
+	if (dollarSignRef) return `Invalid reference format "${dollarSignRef[0]}" - contains $ character`;
+};
+/**
+* Normalizes translation text by splitting merged markers onto separate lines.
+* LLMs sometimes put multiple translations on the same line.
+*
+* @param content - Raw translation text
+* @returns Normalized text with each marker on its own line
+*/
+const normalizeTranslationText = (content) => {
+	const mergedMarkerPattern = new RegExp(` (${MARKER_ID_PATTERN}${TRANSLATION_MARKER_PARTS.optionalSpace}${TRANSLATION_MARKER_PARTS.dashes})`, "gm");
+	return content.replace(mergedMarkerPattern, "\n$1").replace(/\\\[/gm, "[");
+};
+/**
+* Extracts translation IDs from text in order of appearance.
+*
+* @param text - Translation text
+* @returns Array of IDs in order
+*/
+const extractTranslationIds = (text) => {
+	const { dashes, optionalSpace } = TRANSLATION_MARKER_PARTS;
+	const pattern = new RegExp(`^(${MARKER_ID_PATTERN})${optionalSpace}${dashes}`, "gm");
+	const ids = [];
+	for (const match of text.matchAll(pattern)) ids.push(match[1]);
+	return ids;
+};
+/**
+* Extracts the numeric portion from an excerpt ID.
+* E.g., "P11622a" -> 11622, "C123" -> 123, "B45b" -> 45
+*
+* @param id - Excerpt ID
+* @returns Numeric portion of the ID
+*/
+const extractIdNumber = (id) => {
+	const match = id.match(/\d+/);
+	return match ? Number.parseInt(match[0], 10) : 0;
+};
+/**
+* Extracts the prefix (type) from an excerpt ID.
+* E.g., "P11622a" -> "P", "C123" -> "C", "B45" -> "B"
+*
+* @param id - Excerpt ID
+* @returns Single character prefix
+*/
+const extractIdPrefix = (id) => {
+	return id.charAt(0);
+};
+/**
+* Validates that translation IDs appear in ascending numeric order within the same prefix type.
+* This catches LLM errors where translations are output in wrong order (e.g., P12659 before P12651).
+*
+* @param translationIds - IDs from pasted translations
+* @returns Error message if order issue detected, undefined if valid
+*/
+const validateNumericOrder = (translationIds) => {
+	if (translationIds.length < 2) return;
+	const lastNumberByPrefix = /* @__PURE__ */ new Map();
+	for (const id of translationIds) {
+		const prefix = extractIdPrefix(id);
+		const num = extractIdNumber(id);
+		const last = lastNumberByPrefix.get(prefix);
+		if (last && num < last.num) return `Numeric order error: "${id}" (${num}) appears after "${last.id}" (${last.num}) but should come before it`;
+		lastNumberByPrefix.set(prefix, {
+			id,
+			num
+		});
+	}
+};
+/**
+* Validates translation order against expected excerpt order from the store.
+* Allows pasting in multiple blocks where each block is internally ordered.
+* Resets (position going backwards) are allowed between blocks.
+* Errors only when there's disorder WITHIN a block (going backwards then forwards).
+*
+* @param translationIds - IDs from pasted translations
+* @param expectedIds - IDs from store excerpts/headings/footnotes in order
+* @returns Error message if order issue detected, undefined if valid
+*/
+const validateTranslationOrder = (translationIds, expectedIds) => {
+	if (translationIds.length === 0 || expectedIds.length === 0) return;
+	const expectedPositions = /* @__PURE__ */ new Map();
+	for (let i = 0; i < expectedIds.length; i++) expectedPositions.set(expectedIds[i], i);
+	let lastExpectedPosition = -1;
+	let blockStartPosition = -1;
+	let lastFoundId = null;
+	for (const translationId of translationIds) {
+		const expectedPosition = expectedPositions.get(translationId);
+		if (expectedPosition === void 0) continue;
+		if (lastFoundId !== null) {
+			if (expectedPosition < lastExpectedPosition) blockStartPosition = expectedPosition;
+			else if (expectedPosition < blockStartPosition && blockStartPosition !== -1) return `Order error: "${translationId}" appears after "${lastFoundId}" but comes before it in the excerpts. This suggests a duplicate or misplaced translation.`;
+		} else blockStartPosition = expectedPosition;
+		lastExpectedPosition = expectedPosition;
+		lastFoundId = translationId;
+	}
+};
+/**
+* Performs comprehensive validation on translation text.
+* Validates markers, normalizes text, and checks order against expected IDs.
+*
+* @param rawText - Raw translation text from user input
+* @param expectedIds - Expected IDs from store (excerpts + headings + footnotes)
+* @returns Validation result with normalized text and any errors
+*/
+const validateTranslations = (rawText, expectedIds) => {
+	const normalizedText = normalizeTranslationText(rawText);
+	const markerError = validateTranslationMarkers(normalizedText);
+	if (markerError) return {
+		error: markerError,
+		isValid: false,
+		normalizedText,
+		parsedIds: []
+	};
+	const parsedIds = extractTranslationIds(normalizedText);
+	if (parsedIds.length === 0) return {
+		error: "No valid translation markers found",
+		isValid: false,
+		normalizedText,
+		parsedIds: []
+	};
+	const orderError = validateTranslationOrder(parsedIds, expectedIds);
+	if (orderError) return {
+		error: orderError,
+		isValid: false,
+		normalizedText,
+		parsedIds
+	};
+	return {
+		isValid: true,
+		normalizedText,
+		parsedIds
+	};
+};
+/**
+* Finds translation IDs that don't exist in the expected store IDs.
+* Used to validate that all pasted translations can be matched before committing.
+*
+* @param translationIds - IDs from parsed translations
+* @param expectedIds - IDs from store (excerpts + headings + footnotes)
+* @returns Array of IDs that exist in translations but not in the store
+*/
+const findUnmatchedTranslationIds = (translationIds, expectedIds) => {
+	const expectedSet = new Set(expectedIds);
+	return translationIds.filter((id) => !expectedSet.has(id));
+};
+//#endregion
+export { MARKER_ID_PATTERN, Markers, TRANSLATION_MARKER_PARTS, detectArabicScript, detectDuplicateIds, detectImplicitContinuation, detectInventedIds, detectMetaTalk, detectNewlineAfterId, detectTruncatedSegments, detectWrongDiacritics, extractIdNumber, extractIdPrefix, extractTranslationIds, findUnmatchedTranslationIds, formatExcerptsForPrompt, getMasterPrompt, getPrompt, getPromptIds, getPrompts, getStackedPrompt, normalizeTranslationText, stackPrompts, validateNumericOrder, validateTranslationMarkers, validateTranslationOrder, validateTranslations };
+//# sourceMappingURL=index.js.map
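
To make the regex composition in the file above easier to follow, here is a small self-contained sketch of the normalize, extract, and order-check steps. It re-declares the marker parts locally rather than importing the package, so the names `PARTS`, `ID_PATTERN`, `normalize`, `extractIds`, and `firstOutOfOrder` are assumptions for illustration. `normalize` covers only the marker-splitting half of `normalizeTranslationText` (the `\[` unescaping is omitted), and `firstOutOfOrder` is a simplified take on `validateNumericOrder` that returns the offending ID instead of a message.

```javascript
// Local copies of the regex parts shown in the diff above.
const PARTS = {
	dashes: "[-–—]",
	digits: "\\d+",
	markers: "[BCFTPN]",
	optionalSpace: "\\s?",
	suffix: "[a-z]",
};
const ID_PATTERN = `${PARTS.markers}${PARTS.digits}${PARTS.suffix}?`;

// Move any "ID -" marker that follows a space onto its own line.
const normalize = (content) =>
	content.replace(new RegExp(` (${ID_PATTERN}${PARTS.optionalSpace}${PARTS.dashes})`, "gm"), "\n$1");

// Collect IDs that start a line, in order of appearance.
const extractIds = (text) => {
	const pattern = new RegExp(`^(${ID_PATTERN})${PARTS.optionalSpace}${PARTS.dashes}`, "gm");
	return [...text.matchAll(pattern)].map((m) => m[1]);
};

// Return the first ID whose number is lower than the previous one with the same prefix.
const firstOutOfOrder = (ids) => {
	const lastByPrefix = new Map();
	for (const id of ids) {
		const prefix = id.charAt(0);
		const num = Number.parseInt(id.match(/\d+/)[0], 10);
		const last = lastByPrefix.get(prefix);
		if (last !== undefined && num < last) return id;
		lastByPrefix.set(prefix, num);
	}
};

// Hypothetical LLM output: two segments merged on one line, and one out of order.
const raw = "P10 - First. P12 - Second.\nP11 - Third.";
const ids = extractIds(normalize(raw));
const misplaced = firstOutOfOrder(ids);
console.log(ids, misplaced);
```

Tracking the last number per prefix means a footnote sequence (`F…`) and a plain sequence (`P…`) can interleave without triggering a false order error.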

package/dist/index.js.map ADDED

@@ -0,0 +1 @@

+ {"version":3,"file":"index.js","names":[],"sources":["../src/constants.ts","../src/formatting.ts","../.generated/prompts.ts","../src/prompts.ts","../src/validation.ts"],"sourcesContent":["/**\n * Supported marker types for segments.\n */\nexport enum Markers {\n /** B - Book reference */\n Book = 'B',\n /** F - Footnote reference */\n Footnote = 'F',\n /** T - Heading reference */\n Heading = 'T',\n /** C - Chapter reference */\n Chapter = 'C',\n /** N - Note reference */\n Note = 'N',\n /** P - Translation/Plain segment */\n Plain = 'P',\n}\n\n/**\n * Regex parts for building translation marker patterns.\n */\nexport const TRANSLATION_MARKER_PARTS = {\n /** Dash variations (hyphen, en dash, em dash) */\n dashes: '[-–—]',\n /** Numeric portion of the reference */\n digits: '\\\\d+',\n /** Valid marker prefixes (Book, Chapter, Footnote, Translation, Page) */\n markers: `[${Markers.Book}${Markers.Chapter}${Markers.Footnote}${Markers.Heading}${Markers.Plain}${Markers.Note}]`,\n /** Optional whitespace before dash */\n optionalSpace: '\\\\s?',\n /** Valid single-letter suffixes */\n suffix: '[a-z]',\n} as const;\n\n/**\n * Pattern for a segment ID (e.g., P1234, B45a).\n */\nexport const MARKER_ID_PATTERN = `${TRANSLATION_MARKER_PARTS.markers}${TRANSLATION_MARKER_PARTS.digits}${TRANSLATION_MARKER_PARTS.suffix}?`;\n","/**\n * Internal segment type for formatting.\n */\ntype Segment = {\n /** The segment ID (e.g., P1) */\n id: string;\n /** The segment text */\n text: string;\n};\n\n/**\n * Formats excerpts for an LLM prompt by combining the prompt rules with the segment text.\n * Each segment is formatted as \"ID - Text\" and separated by double newlines.\n *\n * @param segments - Array of segments to format\n * @param prompt - The instruction/system prompt to prepend\n * @returns Combined prompt and formatted text\n */\nexport const formatExcerptsForPrompt = (segments: Segment[], prompt: string) => {\n const formatted = segments.map((e) => `${e.id} - 
${e.text}`).join('\\n\\n');\n return [prompt, formatted].join('\\n\\n');\n};\n","// AUTO-GENERATED FILE - DO NOT EDIT\n// Generated from prompts/*.md by scripts/generate-prompts.ts\n\n// =============================================================================\n// PROMPT TYPE\n// =============================================================================\n\nexport type PromptId = 'master_prompt' | 'encyclopedia_mixed' | 'fatawa' | 'fiqh' | 'hadith' | 'jarh_wa_tadil' | 'tafsir' | 'usul_al_fiqh';\n\n// =============================================================================\n// RAW PROMPT CONTENT\n// =============================================================================\n\nexport const MASTER_PROMPT = \"ROLE: Expert academic translator of Classical Islamic texts; prioritize accuracy and structure over fluency.\\nCRITICAL NEGATIONS: 1. NO SANITIZATION (Do not soften polemics). 2. NO META-TALK (Output translation only). 3. NO MARKDOWN (Plain text only). 4. NO EMENDATION. 5. NO INFERENCE. 6. NO RESTRUCTURING. 7. NO OPAQUE TRANSLITERATION (Must translate phrases). 8. NO INVENTED SEGMENTS (Do not create, modify, or \\\"continue\\\" segment IDs. Output IDs verbatim exactly as they appear in the source input/metadata. Alphabetic suffixes (e.g., P5511a) are allowed IF AND ONLY IF that exact ID appears in the source. Any ID not present verbatim in the source is INVENTED. EXAMPLE: If P5803b ends with a questioner line, that line stays under P5803b — do NOT invent P5803c. If an expected ID is missing from the source, output: \\\"ID - [MISSING]\\\".)\\nRULES: NO ARABIC SCRIPT (Except ﷺ). Plain text only. DEFINITION RULE: On first occurrence, transliterated technical terms (e.g., bidʿah) MUST be defined: \\\"translit (English)\\\". Preserve Segment ID. Translate meaning/intent. No inference. No extra fields. Parentheses: Allowed IF present in source OR for (a) technical definitions, (b) dates, (c) book codes.\\nTRANSLITERATION & TERMS:\\n1. 
SCHEME: Use full ALA-LC for explicit Arabic-script Person/Place/Book-Titles.\\n - al-Casing: Lowercase al- mid-sentence; Capitalize after (al-Salafīyyah).\\n - Book Titles: Transliterate only (do not translate meanings).\\n2. TECHNICAL TERMS: On first occurrence, define: \\\"translit (English)\\\" (e.g., bidʿah (innovation), isnād (chain)).\\n - Do NOT output multi-word transliterations without immediate English translation.\\n3. STANDARDIZED TERMS: Use standard academic spellings: Muḥammad, Shaykh, Qurʾān, Islām, ḥadīth.\\n - Sunnah (Capitalized) = The Corpus/Prophetic Tradition. sunnah (lowercase) = legal status/recommended.\\n4. PROPER NAMES: Transliterate only (no parentheses).\\n5. UNICODE: Latin + Latin Extended (āīūḥʿḍṣṭẓʾ) + punctuation. NO Arabic script (except ﷺ). NO emoji.\\n - DIACRITIC FALLBACK: If you cannot produce correct ALA-LC diacritics, output English only. Do NOT use substitute accents (â/ã/á).\\n6. SALUTATION: Replace all Prophet salutations with ﷺ.\\n7. AMBIGUITY: Use contextual meaning from tafsir for theological terms. Do not sanitise polemics (e.g. Rāfiḍah).\\nOUTPUT FORMAT: Segment_ID - English translation.\\nCRITICAL: You must use the ASCII hyphen separator \\\" - \\\" (space+hyphen+space) immediately after the ID. Do NOT use em-dash or en-dash. Do NOT use a newline after the ID.\\nMULTI-LINE SEGMENTS (e.g., internal Q&A): Output the Segment_ID and \\\" - \\\" ONLY ONCE on the first line. Do NOT repeat the Segment_ID on subsequent lines; subsequent lines must start directly with the speaker label/text (no \\\"ID - \\\" prefix).\\nSEGMENT BOUNDARIES (Anti-hallucination): Start a NEW segment ONLY when the source explicitly provides a Segment_ID. If the source continues with extra lines (including speaker labels like \\\"Questioner:\\\"/\\\"The Shaykh:\\\"/\\\"السائل:\\\"/\\\"الشيخ:\\\") WITHOUT a new Segment_ID, treat them as part of the CURRENT segment (multi-line under the current Segment_ID). 
Do NOT invent a new ID (including alphabetic suffixes like \\\"P5803c\\\") to label such continuation.\\nOUTPUT COMPLETENESS: Translate ALL content in EVERY segment. Do not truncate, summarize, or skip content. The \\\"…\\\" symbol in the source indicates an audio gap in the original recording — it is NOT an instruction to omit content. Every segment must be fully translated. If you cannot complete a segment, output \\\"ID - [INCOMPLETE]\\\" instead of just \\\"…\\\".\\nOUTPUT UNIQUENESS: Each Segment_ID from the source must appear in your output EXACTLY ONCE as an \\\"ID - ...\\\" prefix. Do NOT output the same Segment_ID header twice. If a segment is long or has multiple speaker turns, continue translating under that single ID header without re-stating it.\\nNEGATIVE CONSTRAINTS: Do NOT output \\\"implicit continuation\\\", summaries, or extra paragraphs. Output only the text present in the source segment.\\nExample: P1234 - Translation text... (Correct) vs P1234\\\\nTranslation... (Forbidden).\\nEXAMPLE: Input: P405 - حدثنا عبد الله بن يوسف... Output: P405 - ʿAbd Allāh b. Yūsuf narrated to us...\";\n\nexport const ENCYCLOPEDIA_MIXED = \"NO MODE TAGS: Do not output any mode labels or bracket tags.\\nSTRUCTURE (Apply First):\\n- Q&A: Whenever \\\"Al-Sāʾil:\\\"/\\\"Al-Shaykh:\\\" appear: Start NEW LINE for speaker. Keep Label+Text on SAME LINE.\\n- EXCEPTION: If the speaker label is the VERY FIRST token after the \\\"ID - \\\" prefix, keep it on the same line. (Correct: P5455 - Questioner: Text...) (Wrong: P5455 \\\\n Questioner: Text...).\\n- INTERNAL Q&A: If segment has multiple turns, use new lines for speakers. Output Segment ID ONLY ONCE at the start of the first line. Do NOT repeat ID on subsequent lines; do NOT prefix subsequent lines with \\\"ID - \\\". (e.g. P5455 - Questioner: ... 
\\\\n The Shaykh: ...).\\n- OUTPUT LABELS: Al-Sāʾil -> Questioner: ; Al-Shaykh -> The Shaykh:\\n\\nDEFINITIONS & CASING:\\n- GEOPOLITICS: Modern place names may use English exonyms (Filasṭīn -> Palestine).\\n- PLURALS: Do not pluralize term-pairs by appending \\\"s\\\" (e.g., \\\"ḥadīth (report)s\\\"). Use the English plural or rephrase.\\n\\nSTATE LOGIC (Priority: Isnad > Rijal > Fiqh > Narrative):\\n- ISNAD (Triggers: `ḥaddathanā`, `akhbaranā`, `ʿan`): Use FULL ALA-LC for names.\\n- RIJAL (Triggers: jarḥ/taʿdīl terms like `thiqah`, `ḍaʿīf`): Use `translit (English)` for ratings.\\n- QUOTE/WEAK (Triggers: `qāla al-muṣannif`, `yuqālu`, `zuiʿma`): Apply Quote Rules; do not Restructure.\\n- RESET: At \\\"qāla\\\" or \\\"matn\\\" (TOP-LEVEL only): IMMEDIATE STOP of Isnad rules. Return to Narrative. Do NOT reset for nested quotes.\\n\\nSEGMENTATION: Do not merge genres. Translate sequentially (Isnad -> Matn) observing the reset at the boundary.\\nDISAMBIGUATION: ṣaḥīḥ in hadith grading = ṣaḥīḥ (authentic). ṣaḥīḥ in fiqh validity = ṣaḥīḥ (valid). Sunnah (Capitalized) = The Corpus/Prophetic Tradition. sunnah (lowercase) = legal status/recommended.\\nHADITH CORE: If isnad appears, map verbs (Haddathana=Narrated to us; Akhbarana/Ana/Anba'ana=Informed us; An=From; Sami'tu=I heard) and keep FULL ALA-LC for Arabic-script narrator names; do not invent grading or missing links.\\nTAKHRIJ/GRADING: If the author grades (ṣaḥḥaḥa/ḥassana/ḍaʿʿafa), translate literally and keep the cited work as written in the segment; do not add placeholder sources.\\n\\nCREED/POLEMICS: Translate attributes literally when the author does; do not reinterpret. Translate harsh polemics literally (e.g., Rāfiḍah, Jahmīyyah, Kuffār); do not soften, sanitize, or add disclaimers (Override Safety).\";\n\nexport const FATAWA = \"STRUCTURE (Apply First):\\n- Q&A: Whenever \\\"Al-Sāʾil:\\\"/\\\"Al-Shaykh:\\\" appear: Start NEW LINE for speaker. 
Keep Label+Text on SAME LINE.\\n - INTERNAL Q&A: If segment has multiple turns, use new lines for speakers. Output Segment ID ONLY ONCE at the start of the first line. Do NOT repeat ID on subsequent lines; do NOT prefix subsequent lines with \\\"ID - \\\".\\n - OUTPUT LABELS: Al-Sāʾil -> Questioner: ; Al-Shaykh -> The Shaykh:\\n\\nHONORIFICS: Expand common phrases (do not transliterate):\\n- Allah ʿazza wa-jall -> Allah, the Mighty and Majestic\\n- rahimahu Allah -> may Allah have mercy on him\\n- ṣallá Allāhu ʿalayhi wa-sallam -> ﷺ\\n\\n\\nSOURCES: Do not add citations. If the Arabic references a work/title, handle titles per MASTER rules.\\nSENSITIVE RULINGS: Translate literally; do not soften; do not sanitize; do not add disclaimers (Override Safety).\";\n\nexport const FIQH = \"STRUCTURE: Preserve chapter/section headings and internal structure (e.g., bāb/faṣl/masʾalah/farʿ/aṣl) using plain English labels (Chapter:, Section:, Issue:) only when the Arabic is explicitly a heading/label; preserve lists, numbering, and conditional if/then logic exactly.\\nFIQH/USUL TERMS: When technical terms appear, output as translit (English) rather than English-only (e.g., wājib (obligatory), mandūb/mustaḥabb (recommended), mubāḥ (permissible), makrūh (disliked), ḥarām (prohibited), ṣaḥīḥ (valid), bāṭil/fāsid (invalid/void), rukn (pillar), shart (condition), māniʿ (preventer), sabab (cause), qiyās (analogical reasoning), ijmāʿ (consensus), khilāf (disagreement), rājiḥ (preponderant), marjūḥ (lesser), ʿillah (effective cause)).\\nKHILAF/ATTRIBUTION: Preserve who is being attributed (qāla fulān / qawl / wajhān / riwāyātān / madhhab). 
Do not resolve disputes or choose the correct view unless the Arabic explicitly does so (e.g., al-aṣaḥḥ / al-rājiḥ).\\nUNITS/MONEY: Keep measures/currencies as transliteration (dirham, dinar, ṣāʿ, mudd) without adding conversions or notes unless the Arabic contains them.\";\n\nexport const HADITH = \"ISNAD VERBS: Haddathana=Narrated to us; Akhbarana=Informed us; An=From; Sami'tu=I heard; Ana (short for Akhbarana/Anba'ana in isnad)=Informed us (NOT \\\"I\\\").\\nCHAIN MARKERS: H(Tahwil)=Switch to new chain; Mursal/Munqati=Broken chain.\\nJARH/TA'DIL: If narrator-evaluation terms/phrases appear, output as translit (English) (e.g., fīhi naẓar (he needs to be looked into)); do not replace with only English.\\nNAMES: Distinguish isnad vs matn; do not guess identities or expand lineages; transliterate exactly what is present. Book titles follow master rule.\\nRUMUZ/CODES: If the segment contains book codes (kh/m/d/t/s/q/4), preserve them exactly; do not expand to book names.\";\n\nexport const JARH_WA_TADIL = \"GLOSSARY: When a jarh/ta'dil term/phrase appears, output as translit (English) (e.g., thiqah (trustworthy), ṣadūq (truthful), layyin (soft/lenient), ḍaʿīf (weak), matrūk (abandoned), kadhdhāb (liar), dajjāl (imposter), munkar al-ḥadīth (narrates denounced hadith)).\\nRUMUZ: Preserve book codes in Latin exactly as in the segment (e.g., (kh) (m) (d t q) (4) (a)); do not expand unless the Arabic segment itself expands them.\\nQALA: Translate as \\\"He said:\\\" and start a new line for each new critic.\\nDATES: Use (d. 256 AH) or (born 194 AH).\\nNO HARM: Translate \\\"There is no harm in him\\\"; no notes.\\nPOLEMICS: Harsh terms (e.g., dajjāl, khabīth, rāfiḍī) must be translated literally; do not soften.\";\n\nexport const TAFSIR = \"AYAH CITES: Do not output surah names unless the Arabic includes the name. Use [2:255]. 
If the segment contains quoted Qur'an text, translate it in braces: {…} [2:255].\\nATTRIBUTES: Translate Allah’s attributes as the author intends; if the author is literal, keep literal (e.g., Hand, Face); do not add metaphorical reinterpretation unless the author does; mirror the author’s theology (Ash'ari vs Salafi) exactly.\\nI'RAB TERMS: Mubtada=Subject; Khabar=Predicate; Fa'il=Agent/Doer; Maf'ul=Object.\\nPROPHET NAMES: Use Arabic equivalents with ALA-LC diacritics (e.g., Mūsá, ʿĪsá, Dāwūd, Yūsuf).\\nPOETRY: Preserve line breaks (one English line per Arabic line); no bullets; prioritize literal structure/grammar over rhyme.\";\n\nexport const USUL_AL_FIQH = \"STRUCTURE: Preserve the argument structure (claims, objections \\\"if it is said...\\\", replies \\\"we say...\\\", evidences, counter-evidences). Preserve explicit labels (faṣl, masʾalah, qāla, qīla, qulna) as plain English equivalents only when the Arabic is explicitly a label.\\nUSUL TERMS: When technical terms appear, output as translit (English) (e.g., ʿāmm (general), khāṣṣ (specific), muṭlaq (absolute), muqayyad (restricted), amr (command), nahy (prohibition), ḥaqīqah (literal), majāz (figurative), mujmal (ambiguous), mubayyan (clarified), naṣṣ (explicit text), ẓāhir (apparent), mafhūm (implication), manṭūq (stated meaning), dalīl (evidence), qiyās (analogical reasoning), ʿillah (effective cause), sabab (cause), shart (condition), māniʿ (preventer), ijmāʿ (consensus), naskh (abrogation)).\\nDISPUTE HANDLING: Do not resolve methodological disputes or harmonize schools unless the Arabic explicitly chooses (e.g., al-rājiḥ / al-aṣaḥḥ / ṣaḥīḥ). Preserve attribution to the madhhab/scholars as written.\\nQUR'AN/HADITH: Keep verse references in the segment’s style; do not invent references. 
If a hadith isnad appears, follow MASTER isnad/name rules.\";\n\n// =============================================================================\n// PROMPT METADATA\n// =============================================================================\n\nexport const PROMPTS = [\n {\n id: 'master_prompt' as const,\n name: 'Master Prompt',\n content: MASTER_PROMPT,\n },\n {\n id: 'encyclopedia_mixed' as const,\n name: 'Encyclopedia Mixed',\n content: ENCYCLOPEDIA_MIXED,\n },\n {\n id: 'fatawa' as const,\n name: 'Fatawa',\n content: FATAWA,\n },\n {\n id: 'fiqh' as const,\n name: 'Fiqh',\n content: FIQH,\n },\n {\n id: 'hadith' as const,\n name: 'Hadith',\n content: HADITH,\n },\n {\n id: 'jarh_wa_tadil' as const,\n name: 'Jarh Wa Tadil',\n content: JARH_WA_TADIL,\n },\n {\n id: 'tafsir' as const,\n name: 'Tafsir',\n content: TAFSIR,\n },\n {\n id: 'usul_al_fiqh' as const,\n name: 'Usul Al Fiqh',\n content: USUL_AL_FIQH,\n },\n] as const;\n\nexport type PromptMetadata = (typeof PROMPTS)[number];\n","import { MASTER_PROMPT, PROMPTS, type PromptId, type PromptMetadata } from '@generated/prompts';\n\nexport type { PromptId, PromptMetadata };\n\n/**\n * A stacked prompt ready for use with an LLM.\n */\nexport type StackedPrompt = {\n /** Unique identifier */\n id: PromptId;\n /** Human-readable name */\n name: string;\n /** The full prompt content (master + addon if applicable) */\n content: string;\n /** Whether this is the master prompt (not stacked) */\n isMaster: boolean;\n};\n\n/**\n * Stacks a master prompt with a specialized addon prompt.\n *\n * @param master - The master/base prompt\n * @param addon - The specialized addon prompt\n * @returns Combined prompt text\n */\nexport const stackPrompts = (master: string, addon: string): string => {\n if (!master) {\n return addon;\n }\n if (!addon) {\n return master;\n }\n return `${master}\\n${addon}`;\n};\n\n/**\n * Gets all available prompts as stacked prompts (master + addon combined).\n * Master prompt is returned 
as-is, addon prompts are stacked with master.\n *\n * @returns Array of all stacked prompts\n */\nexport const getPrompts = (): StackedPrompt[] => {\n return PROMPTS.map((prompt) => ({\n content: prompt.id === 'master_prompt' ? prompt.content : stackPrompts(MASTER_PROMPT, prompt.content),\n id: prompt.id,\n isMaster: prompt.id === 'master_prompt',\n name: prompt.name,\n }));\n};\n\n/**\n * Gets a specific prompt by ID (strongly typed).\n * Returns the stacked version (master + addon) for addon prompts.\n *\n * @param id - The prompt ID to retrieve\n * @returns The stacked prompt\n * @throws Error if prompt ID is not found\n */\nexport const getPrompt = (id: PromptId): StackedPrompt => {\n const prompt = PROMPTS.find((p) => p.id === id);\n if (!prompt) {\n throw new Error(`Prompt not found: ${id}`);\n }\n\n return {\n content: prompt.id === 'master_prompt' ? prompt.content : stackPrompts(MASTER_PROMPT, prompt.content),\n id: prompt.id,\n isMaster: prompt.id === 'master_prompt',\n name: prompt.name,\n };\n};\n\n/**\n * Gets the raw stacked prompt text for a specific prompt ID.\n * Convenience method for when you just need the text.\n *\n * @param id - The prompt ID\n * @returns The stacked prompt content string\n */\nexport const getStackedPrompt = (id: PromptId): string => {\n return getPrompt(id).content;\n};\n\n/**\n * Gets the list of available prompt IDs.\n * Useful for UI dropdowns or validation.\n *\n * @returns Array of prompt IDs\n */\nexport const getPromptIds = (): PromptId[] => {\n return PROMPTS.map((p) => p.id);\n};\n\n/**\n * Gets just the master prompt content.\n * Useful when you need to use a custom addon.\n *\n * @returns The master prompt content\n */\nexport const getMasterPrompt = (): string => {\n return MASTER_PROMPT;\n};\n","import { MARKER_ID_PATTERN, TRANSLATION_MARKER_PARTS } from './constants';\n\n/**\n * Warning types for soft validation issues\n */\nexport type ValidationWarningType = 'arabic_leak' | 'wrong_diacritics';\n\n/**\n * A 
soft validation warning (not a hard error)\n */\nexport type ValidationWarning = {\n /** The type of warning */\n type: ValidationWarningType;\n /** Human-readable warning message */\n message: string;\n /** The offending text match */\n match?: string;\n};\n\n/**\n * Result of translation validation\n */\nexport type TranslationValidationResult = {\n /** Whether validation passed */\n isValid: boolean;\n /** Error message if validation failed */\n error?: string;\n /** Normalized/fixed text (with merged markers split onto separate lines) */\n normalizedText: string;\n /** List of parsed translation IDs in order */\n parsedIds: string[];\n /** Soft warnings (issues that don't fail validation) */\n warnings?: ValidationWarning[];\n};\n\n/**\n * Detects Arabic script in text (except allowed ﷺ symbol).\n * This is a SOFT warning - Arabic leak is bad but not a hard failure.\n *\n * @param text - The text to scan for Arabic script\n * @returns Array of validation warnings if Arabic is found\n */\nexport const detectArabicScript = (text: string): ValidationWarning[] => {\n const warnings: ValidationWarning[] = [];\n // Arabic Unicode range: \\u0600-\\u06FF, \\u0750-\\u077F, \\uFB50-\\uFDFF, \\uFE70-\\uFEFF\n // Exclude ﷺ (U+FDFA)\n const arabicPattern = /[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDF9\\uFDFB-\\uFDFF\\uFE70-\\uFEFF]+/g;\n const matches = text.match(arabicPattern);\n\n if (matches) {\n for (const match of matches) {\n warnings.push({\n match,\n message: `Arabic script detected: \"${match}\"`,\n type: 'arabic_leak',\n });\n }\n }\n\n return warnings;\n};\n\n/**\n * Detects wrong diacritics (â/ã/á instead of correct macrons ā/ī/ū).\n * This is a SOFT warning - wrong diacritics are bad but not a hard failure.\n *\n * @param text - The text to scan for incorrect diacritics\n * @returns Array of validation warnings if wrong diacritics are found\n */\nexport const detectWrongDiacritics = (text: string): ValidationWarning[] => {\n const warnings: 
ValidationWarning[] = [];\n // Wrong diacritics: circumflex (â/ê/î/ô/û), tilde (ã/ñ), acute (á/é/í/ó/ú)\n const wrongPattern = /[âêîôûãñáéíóú]/gi;\n const matches = text.match(wrongPattern);\n\n if (matches) {\n const uniqueMatches = [...new Set(matches)];\n for (const match of uniqueMatches) {\n warnings.push({\n match,\n message: `Wrong diacritic \"${match}\" detected - use macrons (ā, ī, ū) instead`,\n type: 'wrong_diacritics',\n });\n }\n }\n\n return warnings;\n};\n\n/**\n * Detects newline immediately after segment ID (the \"Gemini bug\").\n * Format should be \"P1234 - Text\" not \"P1234\\nText\".\n *\n * @param text - The text to validate\n * @returns Error message if bug is detected, otherwise undefined\n */\nexport const detectNewlineAfterId = (text: string): string | undefined => {\n const pattern = new RegExp(`^${MARKER_ID_PATTERN}\\\\n`, 'm');\n const match = text.match(pattern);\n\n if (match) {\n return `Invalid format: newline after ID \"${match[0].trim()}\" - use \"ID - Text\" format`;\n }\n};\n\n/**\n * Detects forbidden terms from the locked glossary.\n * These are common \"gravity well\" spellings that should be avoided.\n *\n * @param text - The text to scan for forbidden terms\n * @returns Error message if a forbidden term is found, otherwise undefined\n */\nexport const detectForbiddenTerms = (text: string): string | undefined => {\n const forbidden: Array<{ term: RegExp; correct: string }> = [\n { correct: 'Shaykh', term: /\\bSheikh\\b/i },\n { correct: 'Qurʾān', term: /\\bKoran\\b/i },\n { correct: 'ḥadīth', term: /\\bHadith\\b/ }, // Case-sensitive: Hadith without dots\n { correct: 'Islām', term: /\\bIslam\\b/ }, // Case-sensitive: Islam without macron\n { correct: 'Salafīyyah', term: /\\bSalafism\\b/i },\n ];\n\n for (const { term, correct } of forbidden) {\n const match = text.match(term);\n if (match) {\n return `Forbidden term \"${match[0]}\" detected - use \"${correct}\" instead`;\n }\n }\n};\n\n/**\n * Detects implicit continuation 
text that LLMs add when hallucinating.\n *\n * @param text - The text to scan for continuation markers\n * @returns Error message if continuation text is found, otherwise undefined\n */\nexport const detectImplicitContinuation = (text: string): string | undefined => {\n const patterns = [/implicit continuation/i, /\\bcontinuation:/i, /\\bcontinued:/i];\n\n for (const pattern of patterns) {\n const match = text.match(pattern);\n if (match) {\n return `Detected \"${match[0]}\" - do not add implicit continuation text`;\n }\n }\n};\n\n/**\n * Detects meta-talk (translator notes, editor comments) that violate NO META-TALK.\n *\n * @param text - The text to scan for meta-talk\n * @returns Error message if meta-talk is found, otherwise undefined\n */\nexport const detectMetaTalk = (text: string): string | undefined => {\n const patterns = [/\\(note:/i, /\\(translator'?s? note:/i, /\\[editor:/i, /\\[note:/i, /\\(ed\\.:/i, /\\(trans\\.:/i];\n\n for (const pattern of patterns) {\n const match = text.match(pattern);\n if (match) {\n return `Detected meta-talk \"${match[0]}\" - output translation only, no translator/editor notes`;\n }\n }\n};\n\n/**\n * Detects duplicate segment IDs in the output.\n *\n * @param ids - List of IDs extracted from the translation\n * @returns Error message if duplicates are found, otherwise undefined\n */\nexport const detectDuplicateIds = (ids: string[]): string | undefined => {\n const seen = new Set<string>();\n for (const id of ids) {\n if (seen.has(id)) {\n return `Duplicate ID \"${id}\" detected - each segment should appear only once`;\n }\n seen.add(id);\n }\n};\n\n/**\n * Detects IDs in the output that were not in the source (invented/hallucinated IDs).\n * @param outputIds - IDs extracted from LLM output\n * @param sourceIds - IDs that were present in the source input\n * @returns Error message if invented IDs found, undefined if all IDs are valid\n */\nexport const detectInventedIds = (outputIds: string[], sourceIds: string[]): string | 
undefined => {\n const sourceSet = new Set(sourceIds);\n const invented = outputIds.filter((id) => !sourceSet.has(id));\n\n if (invented.length > 0) {\n return `Invented ID(s) detected: ${invented.map((id) => `\"${id}\"`).join(', ')} - these IDs do not exist in the source`;\n }\n};\n\n/**\n * Detects segments that appear truncated (just \"…\" or very short with no real content).\n * @param text - The full LLM output text\n * @returns Error message if truncated segments found, undefined if all segments have content\n */\nexport const detectTruncatedSegments = (text: string): string | undefined => {\n // Pattern to match segment lines\n const segmentPattern = /^([A-Z]\\d+[a-j]?)\\s*[-–—]\\s*(.*)$/gm;\n const truncated: string[] = [];\n\n for (const match of text.matchAll(segmentPattern)) {\n const id = match[1];\n const content = match[2].trim();\n\n // Check for truncated content: empty, just ellipsis, or [INCOMPLETE]\n if (!content || content === '…' || content === '...' || content === '[INCOMPLETE]') {\n truncated.push(id);\n }\n }\n\n if (truncated.length > 0) {\n return `Truncated segment(s) detected: ${truncated.map((id) => `\"${id}\"`).join(', ')} - segments must be fully translated`;\n }\n};\n\n/**\n * Validates translation marker format and returns error message if invalid.\n * Catches common AI hallucinations like malformed reference IDs.\n *\n * @param text - Raw translation text to validate\n * @returns Error message if invalid, undefined if valid\n */\nexport const validateTranslationMarkers = (text: string): string | undefined => {\n const { markers, digits, suffix, dashes, optionalSpace } = TRANSLATION_MARKER_PARTS;\n\n // Check for invalid reference format (with dash but wrong structure)\n // This catches cases like B12a34 -, P1x2y3 -, P2247$2 -, etc.\n // Requires at least one digit after the marker to be considered a potential reference\n const invalidRefPattern = new RegExp(\n 
`^${markers}(?=${digits})(?=.*${dashes})(?!${digits}${suffix}*${optionalSpace}${dashes})[^\\\\s-–—]+${optionalSpace}${dashes}`,\n 'm',\n );\n const invalidRef = text.match(invalidRefPattern);\n\n if (invalidRef) {\n return `Invalid reference format \"${invalidRef[0].trim()}\" - expected format is letter + numbers + optional suffix (a-j) + dash`;\n }\n\n // Check for space before reference with multi-letter suffix (e.g., \" P123ab -\")\n const spaceBeforePattern = new RegExp(` ${markers}${digits}${suffix}+${optionalSpace}${dashes}`, 'm');\n\n // Check for reference with single letter suffix but no dash after (e.g., \"P123a without\")\n const suffixNoDashPattern = new RegExp(`^${markers}${digits}${suffix}(?! ${dashes})`, 'm');\n\n const match = text.match(spaceBeforePattern) || text.match(suffixNoDashPattern);\n\n if (match) {\n return `Suspicious reference found: \"${match[0]}\"`;\n }\n\n // Check for references with dash but no content after (e.g., \"P123 -\")\n const emptyAfterDashPattern = new RegExp(`^${MARKER_ID_PATTERN}${optionalSpace}${dashes}\\\\s*$`, 'm');\n const emptyAfterDash = text.match(emptyAfterDashPattern);\n\n if (emptyAfterDash) {\n return `Reference \"${emptyAfterDash[0].trim()}\" has dash but no content after it`;\n }\n\n // Check for $ character in references (invalid format like B1234$5)\n const dollarSignPattern = new RegExp(`^${markers}${digits}\\\\$${digits}`, 'm');\n const dollarSignRef = text.match(dollarSignPattern);\n\n if (dollarSignRef) {\n return `Invalid reference format \"${dollarSignRef[0]}\" - contains $ character`;\n }\n};\n\n/**\n * Normalizes translation text by splitting merged markers onto separate lines.\n * LLMs sometimes put multiple translations on the same line.\n *\n * @param content - Raw translation text\n * @returns Normalized text with each marker on its own line\n */\nexport const normalizeTranslationText = (content: string): string => {\n const mergedMarkerPattern = new RegExp(\n ` 
(${MARKER_ID_PATTERN}${TRANSLATION_MARKER_PARTS.optionalSpace}${TRANSLATION_MARKER_PARTS.dashes})`,\n 'gm',\n );\n\n return content.replace(mergedMarkerPattern, '\\n$1').replace(/\\\\\\[/gm, '[');\n};\n\n/**\n * Extracts translation IDs from text in order of appearance.\n *\n * @param text - Translation text\n * @returns Array of IDs in order\n */\nexport const extractTranslationIds = (text: string): string[] => {\n const { dashes, optionalSpace } = TRANSLATION_MARKER_PARTS;\n const pattern = new RegExp(`^(${MARKER_ID_PATTERN})${optionalSpace}${dashes}`, 'gm');\n const ids: string[] = [];\n\n for (const match of text.matchAll(pattern)) {\n ids.push(match[1]);\n }\n\n return ids;\n};\n\n/**\n * Extracts the numeric portion from an excerpt ID.\n * E.g., \"P11622a\" -> 11622, \"C123\" -> 123, \"B45b\" -> 45\n *\n * @param id - Excerpt ID\n * @returns Numeric portion of the ID\n */\nexport const extractIdNumber = (id: string): number => {\n const match = id.match(/\\d+/);\n return match ? Number.parseInt(match[0], 10) : 0;\n};\n\n/**\n * Extracts the prefix (type) from an excerpt ID.\n * E.g., \"P11622a\" -> \"P\", \"C123\" -> \"C\", \"B45\" -> \"B\"\n *\n * @param id - Excerpt ID\n * @returns Single character prefix\n */\nexport const extractIdPrefix = (id: string): string => {\n return id.charAt(0);\n};\n\n/**\n * Validates that translation IDs appear in ascending numeric order within the same prefix type.\n * This catches LLM errors where translations are output in wrong order (e.g., P12659 before P12651).\n *\n * @param translationIds - IDs from pasted translations\n * @returns Error message if order issue detected, undefined if valid\n */\nexport const validateNumericOrder = (translationIds: string[]): string | undefined => {\n if (translationIds.length < 2) {\n return;\n }\n\n // Track last seen number for each prefix type\n const lastNumberByPrefix = new Map<string, { id: string; num: number }>();\n\n for (const id of translationIds) {\n const prefix = 
extractIdPrefix(id);\n const num = extractIdNumber(id);\n\n const last = lastNumberByPrefix.get(prefix);\n\n if (last && num < last.num) {\n // Out of numeric order within the same prefix type\n return `Numeric order error: \"${id}\" (${num}) appears after \"${last.id}\" (${last.num}) but should come before it`;\n }\n\n lastNumberByPrefix.set(prefix, { id, num });\n }\n};\n\n/**\n * Validates translation order against expected excerpt order from the store.\n * Allows pasting in multiple blocks where each block is internally ordered.\n * Resets (position going backwards) are allowed between blocks.\n * Errors only when there's disorder WITHIN a block (going backwards then forwards).\n *\n * @param translationIds - IDs from pasted translations\n * @param expectedIds - IDs from store excerpts/headings/footnotes in order\n * @returns Error message if order issue detected, undefined if valid\n */\nexport const validateTranslationOrder = (translationIds: string[], expectedIds: string[]): string | undefined => {\n if (translationIds.length === 0 || expectedIds.length === 0) {\n return;\n }\n\n // Build a map of expected ID positions for O(1) lookup\n const expectedPositions = new Map<string, number>();\n for (let i = 0; i < expectedIds.length; i++) {\n expectedPositions.set(expectedIds[i], i);\n }\n\n // Track position within current block\n // When position goes backwards, we start a new block\n // Error only if we go backwards THEN forwards within the same conceptual sequence\n let lastExpectedPosition = -1;\n let blockStartPosition = -1;\n let lastFoundId: string | null = null;\n\n for (const translationId of translationIds) {\n const expectedPosition = expectedPositions.get(translationId);\n\n if (expectedPosition === undefined) {\n // ID not found in expected list - skip\n continue;\n }\n\n if (lastFoundId !== null) {\n if (expectedPosition < lastExpectedPosition) {\n // Reset detected - starting a new block\n // This is allowed, just track the new block's start\n 
blockStartPosition = expectedPosition;\n } else if (expectedPosition < blockStartPosition && blockStartPosition !== -1) {\n // Within the current block, we went backwards - this is an error\n // This catches: A, B, C (block 1), D, E, C (error: C < E but we're in block starting at D)\n return `Order error: \"${translationId}\" appears after \"${lastFoundId}\" but comes before it in the excerpts. This suggests a duplicate or misplaced translation.`;\n }\n } else {\n blockStartPosition = expectedPosition;\n }\n\n lastExpectedPosition = expectedPosition;\n lastFoundId = translationId;\n }\n};\n\n/**\n * Performs comprehensive validation on translation text.\n * Validates markers, normalizes text, and checks order against expected IDs.\n *\n * @param rawText - Raw translation text from user input\n * @param expectedIds - Expected IDs from store (excerpts + headings + footnotes)\n * @returns Validation result with normalized text and any errors\n */\nexport const validateTranslations = (rawText: string, expectedIds: string[]): TranslationValidationResult => {\n // First normalize the text (split merged markers)\n const normalizedText = normalizeTranslationText(rawText);\n\n // Validate marker formats\n const markerError = validateTranslationMarkers(normalizedText);\n if (markerError) {\n return { error: markerError, isValid: false, normalizedText, parsedIds: [] };\n }\n\n // Extract IDs from normalized text\n const parsedIds = extractTranslationIds(normalizedText);\n\n if (parsedIds.length === 0) {\n return { error: 'No valid translation markers found', isValid: false, normalizedText, parsedIds: [] };\n }\n\n // Validate order against expected IDs\n const orderError = validateTranslationOrder(parsedIds, expectedIds);\n if (orderError) {\n return { error: orderError, isValid: false, normalizedText, parsedIds };\n }\n\n return { isValid: true, normalizedText, parsedIds };\n};\n\n/**\n * Finds translation IDs that don't exist in the expected store IDs.\n * Used to validate 
that all pasted translations can be matched before committing.\n *\n * @param translationIds - IDs from parsed translations\n * @param expectedIds - IDs from store (excerpts + headings + footnotes)\n * @returns Array of IDs that exist in translations but not in the store\n */\nexport const findUnmatchedTranslationIds = (translationIds: string[], expectedIds: string[]): string[] => {\n const expectedSet = new Set(expectedIds);\n return translationIds.filter((id) => !expectedSet.has(id));\n};\n"],"mappings":";;;;AAGA,IAAY,8CAAL;;AAEH;;AAEA;;AAEA;;AAEA;;AAEA;;AAEA;;;;;;AAMJ,MAAa,2BAA2B;CAEpC,QAAQ;CAER,QAAQ;CAER,SAAS,IAAI,QAAQ,OAAO,QAAQ,UAAU,QAAQ,WAAW,QAAQ,UAAU,QAAQ,QAAQ,QAAQ,KAAK;CAEhH,eAAe;CAEf,QAAQ;CACX;;;;AAKD,MAAa,oBAAoB,GAAG,yBAAyB,UAAU,yBAAyB,SAAS,yBAAyB,OAAO;;;;;;;;;;;;ACnBzI,MAAa,2BAA2B,UAAqB,WAAmB;AAE5E,QAAO,CAAC,QADU,SAAS,KAAK,MAAM,GAAG,EAAE,GAAG,KAAK,EAAE,OAAO,CAAC,KAAK,OAAO,CAC/C,CAAC,KAAK,OAAO;;;;;ACP3C,MAAa,gBAAgB;AAE7B,MAAa,qBAAqB;AAElC,MAAa,SAAS;AAEtB,MAAa,OAAO;AAEpB,MAAa,SAAS;AAEtB,MAAa,gBAAgB;AAE7B,MAAa,SAAS;AAEtB,MAAa,eAAe;AAM5B,MAAa,UAAU;CACnB;EACI,IAAI;EACJ,MAAM;EACN,SAAS;EACZ;CACD;EACI,IAAI;EACJ,MAAM;EACN,SAAS;EACZ;CACD;EACI,IAAI;EACJ,MAAM;EACN,SAAS;EACZ;CACD;EACI,IAAI;EACJ,MAAM;EACN,SAAS;EACZ;CACD;EACI,IAAI;EACJ,MAAM;EACN,SAAS;EACZ;CACD;EACI,IAAI;EACJ,MAAM;EACN,SAAS;EACZ;CACD;EACI,IAAI;EACJ,MAAM;EACN,SAAS;EACZ;CACD;EACI,IAAI;EACJ,MAAM;EACN,SAAS;EACZ;CACJ;;;;;;;;;;;ACjDD,MAAa,gBAAgB,QAAgB,UAA0B;AACnE,KAAI,CAAC,OACD,QAAO;AAEX,KAAI,CAAC,MACD,QAAO;AAEX,QAAO,GAAG,OAAO,IAAI;;;;;;;;AASzB,MAAa,mBAAoC;AAC7C,QAAO,QAAQ,KAAK,YAAY;EAC5B,SAAS,OAAO,OAAO,kBAAkB,OAAO,UAAU,aAAa,eAAe,OAAO,QAAQ;EACrG,IAAI,OAAO;EACX,UAAU,OAAO,OAAO;EACxB,MAAM,OAAO;EAChB,EAAE;;;;;;;;;;AAWP,MAAa,aAAa,OAAgC;CACtD,MAAM,SAAS,QAAQ,MAAM,MAAM,EAAE,OAAO,GAAG;AAC/C,KAAI,CAAC,OACD,OAAM,IAAI,MAAM,qBAAqB,KAAK;AAG9C,QAAO;EACH,SAAS,OAAO,OAAO,kBAAkB,OAAO,UAAU,aAAa,eAAe,OAAO,QAAQ;EACrG,IAAI,OAAO;EACX,UAAU,OAAO,OAAO;EACxB,MAAM,OAAO;EAChB;;;;;;;;;AAUL,MAAa,oBAAoB,OAAyB;AACtD,QAAO,UAAU,GAAG,CAAC;;;;;;;;A
ASzB,MAAa,qBAAiC;AAC1C,QAAO,QAAQ,KAAK,MAAM,EAAE,GAAG;;;;;;;;AASnC,MAAa,wBAAgC;AACzC,QAAO;;;;;;;;;;;;AC1DX,MAAa,sBAAsB,SAAsC;CACrE,MAAM,WAAgC,EAAE;CAIxC,MAAM,UAAU,KAAK,MADC,wEACmB;AAEzC,KAAI,QACA,MAAK,MAAM,SAAS,QAChB,UAAS,KAAK;EACV;EACA,SAAS,4BAA4B,MAAM;EAC3C,MAAM;EACT,CAAC;AAIV,QAAO;;;;;;;;;AAUX,MAAa,yBAAyB,SAAsC;CACxE,MAAM,WAAgC,EAAE;CAGxC,MAAM,UAAU,KAAK,MADA,mBACmB;AAExC,KAAI,SAAS;EACT,MAAM,gBAAgB,CAAC,GAAG,IAAI,IAAI,QAAQ,CAAC;AAC3C,OAAK,MAAM,SAAS,cAChB,UAAS,KAAK;GACV;GACA,SAAS,oBAAoB,MAAM;GACnC,MAAM;GACT,CAAC;;AAIV,QAAO;;;;;;;;;AAUX,MAAa,wBAAwB,SAAqC;CACtE,MAAM,UAAU,IAAI,OAAO,IAAI,kBAAkB,MAAM,IAAI;CAC3D,MAAM,QAAQ,KAAK,MAAM,QAAQ;AAEjC,KAAI,MACA,QAAO,qCAAqC,MAAM,GAAG,MAAM,CAAC;;;;;;;;AAkCpE,MAAa,8BAA8B,SAAqC;AAG5E,MAAK,MAAM,WAFM;EAAC;EAA0B;EAAoB;EAAgB,EAEhD;EAC5B,MAAM,QAAQ,KAAK,MAAM,QAAQ;AACjC,MAAI,MACA,QAAO,aAAa,MAAM,GAAG;;;;;;;;;AAWzC,MAAa,kBAAkB,SAAqC;AAGhE,MAAK,MAAM,WAFM;EAAC;EAAY;EAA2B;EAAc;EAAY;EAAY;EAAc,EAE7E;EAC5B,MAAM,QAAQ,KAAK,MAAM,QAAQ;AACjC,MAAI,MACA,QAAO,uBAAuB,MAAM,GAAG;;;;;;;;;AAWnD,MAAa,sBAAsB,QAAsC;CACrE,MAAM,uBAAO,IAAI,KAAa;AAC9B,MAAK,MAAM,MAAM,KAAK;AAClB,MAAI,KAAK,IAAI,GAAG,CACZ,QAAO,iBAAiB,GAAG;AAE/B,OAAK,IAAI,GAAG;;;;;;;;;AAUpB,MAAa,qBAAqB,WAAqB,cAA4C;CAC/F,MAAM,YAAY,IAAI,IAAI,UAAU;CACpC,MAAM,WAAW,UAAU,QAAQ,OAAO,CAAC,UAAU,IAAI,GAAG,CAAC;AAE7D,KAAI,SAAS,SAAS,EAClB,QAAO,4BAA4B,SAAS,KAAK,OAAO,IAAI,GAAG,GAAG,CAAC,KAAK,KAAK,CAAC;;;;;;;AAStF,MAAa,2BAA2B,SAAqC;CAEzE,MAAM,iBAAiB;CACvB,MAAM,YAAsB,EAAE;AAE9B,MAAK,MAAM,SAAS,KAAK,SAAS,eAAe,EAAE;EAC/C,MAAM,KAAK,MAAM;EACjB,MAAM,UAAU,MAAM,GAAG,MAAM;AAG/B,MAAI,CAAC,WAAW,YAAY,OAAO,YAAY,SAAS,YAAY,eAChE,WAAU,KAAK,GAAG;;AAI1B,KAAI,UAAU,SAAS,EACnB,QAAO,kCAAkC,UAAU,KAAK,OAAO,IAAI,GAAG,GAAG,CAAC,KAAK,KAAK,CAAC;;;;;;;;;AAW7F,MAAa,8BAA8B,SAAqC;CAC5E,MAAM,EAAE,SAAS,QAAQ,QAAQ,QAAQ,kBAAkB;CAK3D,MAAM,oBAAoB,IAAI,OAC1B,IAAI,QAAQ,KAAK,OAAO,QAAQ,OAAO,MAAM,SAAS,OAAO,GAAG,gBAAgB,OAAO,aAAa,gBAAgB,UACpH,IACH;CACD,MAAM,aAAa,KAAK,MAAM,kBAAkB;AAEhD,KAAI,WACA,QAAO,6BAA6B,WAAW,GAAG,MAAM,CAAC;CAI7D,MAAM,qBAAqB,IAAI,OAAO,IAAI,UAAU,SAAS,
OAAO,GAAG,gBAAgB,UAAU,IAAI;CAGrG,MAAM,sBAAsB,IAAI,OAAO,IAAI,UAAU,SAAS,OAAO,MAAM,OAAO,IAAI,IAAI;CAE1F,MAAM,QAAQ,KAAK,MAAM,mBAAmB,IAAI,KAAK,MAAM,oBAAoB;AAE/E,KAAI,MACA,QAAO,gCAAgC,MAAM,GAAG;CAIpD,MAAM,wBAAwB,IAAI,OAAO,IAAI,oBAAoB,gBAAgB,OAAO,QAAQ,IAAI;CACpG,MAAM,iBAAiB,KAAK,MAAM,sBAAsB;AAExD,KAAI,eACA,QAAO,cAAc,eAAe,GAAG,MAAM,CAAC;CAIlD,MAAM,oBAAoB,IAAI,OAAO,IAAI,UAAU,OAAO,KAAK,UAAU,IAAI;CAC7E,MAAM,gBAAgB,KAAK,MAAM,kBAAkB;AAEnD,KAAI,cACA,QAAO,6BAA6B,cAAc,GAAG;;;;;;;;;AAW7D,MAAa,4BAA4B,YAA4B;CACjE,MAAM,sBAAsB,IAAI,OAC5B,KAAK,oBAAoB,yBAAyB,gBAAgB,yBAAyB,OAAO,IAClG,KACH;AAED,QAAO,QAAQ,QAAQ,qBAAqB,OAAO,CAAC,QAAQ,UAAU,IAAI;;;;;;;;AAS9E,MAAa,yBAAyB,SAA2B;CAC7D,MAAM,EAAE,QAAQ,kBAAkB;CAClC,MAAM,UAAU,IAAI,OAAO,KAAK,kBAAkB,GAAG,gBAAgB,UAAU,KAAK;CACpF,MAAM,MAAgB,EAAE;AAExB,MAAK,MAAM,SAAS,KAAK,SAAS,QAAQ,CACtC,KAAI,KAAK,MAAM,GAAG;AAGtB,QAAO;;;;;;;;;AAUX,MAAa,mBAAmB,OAAuB;CACnD,MAAM,QAAQ,GAAG,MAAM,MAAM;AAC7B,QAAO,QAAQ,OAAO,SAAS,MAAM,IAAI,GAAG,GAAG;;;;;;;;;AAUnD,MAAa,mBAAmB,OAAuB;AACnD,QAAO,GAAG,OAAO,EAAE;;;;;;;;;AAUvB,MAAa,wBAAwB,mBAAiD;AAClF,KAAI,eAAe,SAAS,EACxB;CAIJ,MAAM,qCAAqB,IAAI,KAA0C;AAEzE,MAAK,MAAM,MAAM,gBAAgB;EAC7B,MAAM,SAAS,gBAAgB,GAAG;EAClC,MAAM,MAAM,gBAAgB,GAAG;EAE/B,MAAM,OAAO,mBAAmB,IAAI,OAAO;AAE3C,MAAI,QAAQ,MAAM,KAAK,IAEnB,QAAO,yBAAyB,GAAG,KAAK,IAAI,mBAAmB,KAAK,GAAG,KAAK,KAAK,IAAI;AAGzF,qBAAmB,IAAI,QAAQ;GAAE;GAAI;GAAK,CAAC;;;;;;;;;;;;;AAcnD,MAAa,4BAA4B,gBAA0B,gBAA8C;AAC7G,KAAI,eAAe,WAAW,KAAK,YAAY,WAAW,EACtD;CAIJ,MAAM,oCAAoB,IAAI,KAAqB;AACnD,MAAK,IAAI,IAAI,GAAG,IAAI,YAAY,QAAQ,IACpC,mBAAkB,IAAI,YAAY,IAAI,EAAE;CAM5C,IAAI,uBAAuB;CAC3B,IAAI,qBAAqB;CACzB,IAAI,cAA6B;AAEjC,MAAK,MAAM,iBAAiB,gBAAgB;EACxC,MAAM,mBAAmB,kBAAkB,IAAI,cAAc;AAE7D,MAAI,qBAAqB,OAErB;AAGJ,MAAI,gBAAgB,MAChB;OAAI,mBAAmB,qBAGnB,sBAAqB;YACd,mBAAmB,sBAAsB,uBAAuB,GAGvE,QAAO,iBAAiB,cAAc,mBAAmB,YAAY;QAGzE,sBAAqB;AAGzB,yBAAuB;AACvB,gBAAc;;;;;;;;;;;AAYtB,MAAa,wBAAwB,SAAiB,gBAAuD;CAEzG,MAAM,iBAAiB,yBAAyB,QAAQ;CAGxD,MAAM,cAAc,2BAA2B,eAAe;AAC9D,KAAI,YACA,QAAO;EAAE,OAAO;EAAa,SAAS;EAAO;EAAgB,WAAW,EAAE;EAAE;CAIhF,MAAM,YA
AY,sBAAsB,eAAe;AAEvD,KAAI,UAAU,WAAW,EACrB,QAAO;EAAE,OAAO;EAAsC,SAAS;EAAO;EAAgB,WAAW,EAAE;EAAE;CAIzG,MAAM,aAAa,yBAAyB,WAAW,YAAY;AACnE,KAAI,WACA,QAAO;EAAE,OAAO;EAAY,SAAS;EAAO;EAAgB;EAAW;AAG3E,QAAO;EAAE,SAAS;EAAM;EAAgB;EAAW;;;;;;;;;;AAWvD,MAAa,+BAA+B,gBAA0B,gBAAoC;CACtG,MAAM,cAAc,IAAI,IAAI,YAAY;AACxC,QAAO,eAAe,QAAQ,OAAO,CAAC,YAAY,IAAI,GAAG,CAAC"}

package/package.json ADDED

@@ -0,0 +1,61 @@
+{
+    "author": "Ragaeeb Haq",
+    "bugs": {
+        "url": "https://github.com/ragaeeb/wobble-bibble/issues"
+    },
+    "description": "TypeScript library for Islamic text translation prompts with LLM output validation and prompt stacking utilities.",
+    "devDependencies": {
+        "@biomejs/biome": "^2.3.11",
+        "@types/bun": "^1.3.6",
+        "@types/node": "^25.0.8",
+        "semantic-release": "^25.0.2",
+        "tsdown": "^0.20.0-beta.2",
+        "typescript": "^5.9.3"
+    },
+    "engines": {
+        "bun": ">=1.3.6",
+        "node": ">=24.0.0"
+    },
+    "exports": {
+        ".": {
+            "import": "./dist/index.js",
+            "types": "./dist/index.d.ts"
+        }
+    },
+    "files": [
+        "dist/**"
+    ],
+    "homepage": "https://github.com/ragaeeb/wobble-bibble",
+    "keywords": [
+        "islamic-translation",
+        "llm-prompts",
+        "arabic-english",
+        "hadith",
+        "fiqh",
+        "tafsir",
+        "prompt-engineering",
+        "validation",
+        "typescript"
+    ],
+    "license": "MIT",
+    "main": "dist/index.js",
+    "module": "dist/index.js",
+    "name": "wobble-bibble",
+    "packageManager": "bun@1.3.6",
+    "repository": {
+        "type": "git",
+        "url": "git+https://github.com/ragaeeb/wobble-bibble.git"
+    },
+    "scripts": {
+        "build": "bun run generate && tsdown",
+        "generate": "bun run scripts/generate-prompts.ts",
+        "lint": "biome check .",
+        "test": "bun test",
+        "test:dist": "bun run build && bun test tests/dist.test.ts"
+    },
+    "sideEffects": false,
+    "source": "src/index.ts",
+    "type": "module",
+    "types": "dist/index.d.ts",
+    "version": "1.0.0"
+}