npm - bitaboom - Versions diffs - 1.5.0 → 2.1.0 - Mend

bitaboom 1.5.0 → 2.1.0


Files changed (5)

package/README.md CHANGED

@@ -1,4 +1,4 @@
-# Table of Contents
+# Bitaboom
 [![wakatime](https://wakatime.com/badge/user/a0b906ce-b8e7-4463-8bce-383238df6d4b/project/4a00f7dd-3a49-4d59-a2ff-43c89e22d650.svg)](https://wakatime.com/badge/user/a0b906ce-b8e7-4463-8bce-383238df6d4b/project/4a00f7dd-3a49-4d59-a2ff-43c89e22d650)
 ![GitHub](https://img.shields.io/github/license/ragaeeb/bitaboom)
@@ -8,838 +8,169 @@
 ![GitHub stars](https://img.shields.io/github/stars/ragaeeb/bitaboom?style=social)
 ![GitHub Release](https://img.shields.io/github/v/release/ragaeeb/bitaboom)
 [![codecov](https://codecov.io/gh/ragaeeb/bitaboom/graph/badge.svg?token=7Z3E38HXCD)](https://codecov.io/gh/ragaeeb/bitaboom)
-[![Size](https://deno.bundlejs.com/badge?q=bitaboom@latest&badge=detailed)](https://bundlejs.com/?q=bitaboom%40latest)
 ![typescript](https://badgen.net/badge/icon/typescript?icon=typescript&label&color=blue)
-# Bitaboom - A String Utilities Library
+Bitaboom is a TypeScript-first string utility toolkit focused on Arabic and bilingual (Arabic ↔ English) publishing workflows. It ships a wide surface of helpers for:
-Bitaboom is a NodeJS string utility library written in TypeScript, designed to provide a collection of helpful string manipulation functions. It supports the latest ESNext features and is tested using Bun's test runner.
+- Arabic script awareness (diacritics, tatweel, Urdu glyphs, punctuation harmonisation)
+- Formatting and typography clean-up for scanned/OCRd manuscripts
+- Sanitisation pipelines for removing noise such as references, page numbers, markdown artefacts, or escaped spaces
+- Parsing helpers (balanced punctuation, JSON normalisation, page range parsing)
+- Transliteration cleanup and salutation normalisation for classical Islamic texts
-## Demo
+The project targets ESNext and is built/tested with Bun. All exports are tree-shakeable and documented with JSDoc.
-[Demo](https://ragaeeb.github.io/bitaboom/)
-## Installation
-To install Bitaboom, use npm or yarn:
+## Quick start
 ```bash
 npm install bitaboom
 # or
 yarn add bitaboom
 # or
-pnpm i bitaboom
+pnpm add bitaboom
 # or
 bun add bitaboom
 ```
-## Usage
-Import the library into your project:
-```typescript
-import { functionName } from 'bitaboom';
-```
-Use any function from the API in your code:
 ```typescript
-const result = functionName('inputString');
-console.log(result);
-```
-## Available Functions
-### `addSpaceBeforeAndAfterPunctuation`
-Adds spaces before and after punctuation marks except in specific cases like quoted text.
-#### Example
-```javascript
-addSpaceBeforeAndAfterPunctuation('Text,word');
-// Output: 'Text, word'
-```
----
-### `addSpaceBetweenArabicTextAndNumbers`
-Inserts spaces between Arabic text and numbers.
-#### Example
-```javascript
-addSpaceBetweenArabicTextAndNumbers('الآية37');
-// Output: 'الآية 37'
-```
----
-### `applySmartQuotes`
-Turns regular double quotes into smart quotes and fixes any incorrect starting quotes.
-#### Example
-```javascript
-applySmartQuotes('The "quick brown" fox');
-// Output: 'The "quick brown" fox'
-```
----
-### `arabicNumeralToNumber`
-Converts Arabic-Indic numerals (٠-٩) to a JavaScript number. This function finds all Arabic-Indic digits in the input string and converts them to their corresponding Arabic (Western) digits, then parses the result as an integer.
-#### Example
-```javascript
-arabicNumeralToNumber("١٢٣");
-// Output: 123
-arabicNumeralToNumber("٥٠");
-// Output: 50
-arabicNumeralToNumber("abc١٢٣xyz");
-// Output: 123 (non-digits ignored)
-```
----
-### `cleanExtremeArabicUnderscores`
-Removes extreme Arabic underscores (ـ) from the beginning or end of lines. It does not affect Hijri dates or certain Arabic terms.
-#### Example
-```javascript
-cleanExtremeArabicUnderscores('ـThis is a textـ');
-// Output: "This is a text"
-```
----
-### `cleanLiteralNewLines`
-Replaces literal new line characters (`\n`) with actual line breaks.
-#### Example
-```javascript
-cleanLiteralNewLines('A\nB');
-// Output: 'A\nB'
-```
----
-### `cleanMultilines`
-Removes trailing spaces from each line in a multiline string.
-#### Example
-```javascript
-cleanMultilines(' This is a line   \nAnother line   ');
-// Output: 'This is a line\nAnother line'
-```
----
-### `cleanSymbolsAndPartReferences`
-Removes various symbols, part references, and numerical markers from the text.
-#### Example
-```javascript
-cleanSymbolsAndPartReferences('(1) (2/3)');
-// Output: ''
-```
----
-### `cleanTrailingPageNumbers`
-Removes trailing page numbers formatted as `-[46]-` from the text.
-#### Example
-```javascript
-cleanTrailingPageNumbers('This is some -[46]- text');
-// Output: 'This is some text'
-```
----
-### `convertUrduSymbolsToArabic`
-Converts Urdu symbols like 'ھ' and 'ی' to their Arabic equivalents 'ه' and 'ي'.
-#### Example
-```javascript
-convertUrduSymbolsToArabic('ھذا');
-// Output: 'هذا'
-```
----
-### `ensureSpaceBeforeBrackets`
-Ensures there is exactly one space before parentheses that follow non-whitespace characters. Normalizes multiple spaces to a single space.
-#### Example
-```javascript
-ensureSpaceBeforeBrackets('text(note)');
-// Output: 'text (note)'
-ensureSpaceBeforeBrackets('text   (note)');
-// Output: 'text (note)'
-```
----
-### `ensureSpaceBeforeQuotes`
-Ensures at most 1 space exists before any word before Arabic quotation marks. Adds a space if there isn't one, or reduces multiple spaces to one.
-#### Example
-```javascript
-ensureSpaceBeforeQuotes('text«quote»');
-// Output: 'text «quote»'
-ensureSpaceBeforeQuotes('text   «quote»');
-// Output: 'text «quote»'
-```
----
-### `escapeRegex`
-Escapes a string so it can be safely embedded into a RegExp source.
-#### Example
-```javascript
-escapeRegex('Hello [world]');
-// Output: 'Hello \\[world\\]'
-```
----
-### `extractInitials`
-Extracts initials from the input string, typically for names or titles.
-#### Example
-```javascript
-extractInitials('Nayl al-Awtar');
-// Output: 'NA'
-```
-### `fixBracketTypos`
-Fixes common bracket and quotation mark typos in text. Corrects malformed patterns like "(«", "»)", and misplaced digits in brackets.
-#### Example
-```javascript
-fixBracketTypos('(«text»)');
-// Output: '«text»'
-fixBracketTypos(')5)');
-// Output: '(5)'
-```
----
-### `fixCurlyBraces`
-Fixes mismatched curly braces by converting incorrect bracket/brace combinations to proper curly braces { }.
-#### Example
-```javascript
-fixCurlyBraces('(content}');
-// Output: '{content}'
-fixCurlyBraces('{content)');
-// Output: '{content}'
-```
----
-### `fixMismatchedQuotationMarks`
-Fixes mismatched quotation marks in Arabic text by converting various incorrect bracket/quote combinations to proper Arabic quotation marks (« »).
-#### Example
-```javascript
-fixMismatchedQuotationMarks('«text)');
-// Output: '«text»'
-fixMismatchedQuotationMarks('(text»');
-// Output: '«text»'
-```
----
-### `fixTrailingWow`
-Corrects unnecessary trailing "و" in greetings or phrases.
-#### Example
-```javascript
-fixTrailingWow('السلام عليكم و رحمة');
-// Output: 'السلام عليكم ورحمة'
-```
----
-### `getArabicScore`
-Calculates the proportion of Arabic characters in text relative to total non-whitespace characters.
-#### Example
-```javascript
-getArabicScore('مرحبا hello');
-// Output: 0.5 (5 Arabic chars out of 10 non-whitespace chars)
-getArabicScore('مرحبا');
-// Output: 1.0 (100% Arabic)
-```
----
-### `hasWordInSingleLine`
-Checks if a line has any word by itself.
-#### Example
-```javascript
-hasWordInSingleLine('Abc efg\nhij\nklmn opq');
-// Output: true (since "hij" is by itself)
-```
----
-### `insertLineBreaksAfterPunctuation`
-Adds line breaks after punctuation marks such as periods, exclamation points, and question marks.
-#### Example
-```javascript
-insertLineBreaksAfterPunctuation('Text.');
-// Output: 'Text.
-'
-```
----
-### `isAllUppercase`
-Detects if text is entirely in uppercase letters.
-#### Example
-```javascript
-isAllUppercase('HELLO WORLD');
-// Output: true
-isAllUppercase('Hello World');
-// Output: false
-isAllUppercase('123');
-// Output: false (no letters)
-```
----
-### `isBalanced`
-Checks if both quotes and brackets are balanced in a string. A string is considered balanced when all double quotes have matching pairs (even count) and all brackets (parentheses, square brackets, curly braces) are properly matched and nested.
-#### Example
-```javascript
-isBalanced('He said "Hello (world)!"');
-// Output: true
-isBalanced('He said "Hello (world!"');
-// Output: false (unbalanced quote)
-```
----
-### `isJsonStructureValid`
-Checks if a given string resembles a JSON object with numeric or quoted keys and values that are single or double quoted. Useful for detecting malformed JSON-like structures that can be fixed.
-#### Example
-```javascript
-isJsonStructureValid("{10: 'abc', 'key': 'value'}");
-// Output: true
-```
----
-### `isOnlyPunctuation`
-Checks if the input string consists only of punctuation characters.
-#### Example
-```javascript
-isOnlyPunctuation('!?');
-// Output: true
-```
----
-### `makeDiacriticInsensitive`
-Creates a diacritic-insensitive regex pattern for Arabic text matching. Normalizes text, handles character equivalences (ا/آ/أ/إ, ة/ه, ى/ي), and makes each character tolerant of Arabic diacritics (Tashkeel/Harakat).
-#### Example
-```javascript
-const pattern = makeDiacriticInsensitive('محمد');
-const regex = new RegExp(pattern, 'gi');
-regex.test('مُحَمَّد'); // true
-regex.test('محمد'); // true
-```
----
-### `normalizeArabicPrefixesToAl`
-Replaces common Arabic prefixes like 'Al-', 'Ar-', 'Ash-', etc., with 'al-' in the text. Handles variations and lam-assimilation patterns (before sun letters), and avoids changes where assimilation rules do not apply.
-#### Example
-```javascript
-normalizeArabicPrefixesToAl('Ash-Shafiee');
-// Output: 'al-Shafiee'
-```
----
-### `normalizeDoubleApostrophes`
-Removes double occurrences of Arabic apostrophes such as ʿʿ or ʾʾ.
-#### Example
-```javascript
-normalizeDoubleApostrophes('ʿulamāʾʾ');
-// Output: 'ʿulamāʾ'
-```
----
-### `normalizeJsonSyntax`
-Converts a string that resembles JSON but has numeric keys and single-quoted values into valid JSON format. The function replaces numeric keys with quoted numeric keys and ensures all values are double-quoted, as required by JSON.
-#### Example
-```javascript
-normalizeJsonSyntax("{10: 'abc', 20: 'def'}");
-// Output: '{"10": "abc", "20": "def"}'
-```
----
-### `normalizeTransliteratedEnglish`
-Simplifies English transliterations by removing diacritics, apostrophes, and common prefixes.
-#### Example
-```javascript
-normalizeTransliteratedEnglish('Al-Jadwāl');
-// Output: 'Jadwal'
-```
----
-### `normalize`
-Normalizes the text by removing diacritics, apostrophes, and dashes.
-#### Example
-```javascript
-normalize('Al-Jadwāl');
-// Output: 'AlJadwal'
-```
----
-### `parsePageRanges`
-Parses page input string into array of page numbers, supporting ranges and lists.
-#### Example
-```javascript
-parsePageRanges('1-5');
-// Output: [1, 2, 3, 4, 5]
-parsePageRanges('1,3,5');
-// Output: [1, 3, 5]
-parsePageRanges('10-8');
-// Throws Error: 'Start page cannot be greater than end page'
-```
----
-### `removeArabicPrefixes`
-Strips common Arabic prefixes like 'al-', 'bi-', 'fī', 'wa-', etc., from the beginning of words.
-#### Example
-```javascript
-removeArabicPrefixes('al-Bukhari');
-// Output: 'Bukhari'
-```
----
-### `removeDeathYear`
-Removes death year references like "(d. 390H)" and "[d. 100h]" from the text.
-#### Example
-```javascript
-removeDeathYear('Sufyān ibn 'Uyaynah (d. 198h)');
-// Output: 'Sufyān ibn 'Uyaynah'
-```
----
-### `removeMarkdownFormatting`
-Removes common Markdown formatting syntax from text.
-#### Example
-```javascript
-removeMarkdownFormatting('**Bold** and *italic* text');
-// Output: 'Bold and italic text'
-removeMarkdownFormatting('# Header\n- List item');
-// Output: 'Header\nList item'
-```
----
-### `removeNonIndexSignatures`
-Removes single-digit numbers and dashes from Arabic text but preserves numbers used as indexes.
-#### Example
-```javascript
-removeNonIndexSignatures('الورقه 3 المصدر');
-// Output: 'الورقه المصدر'
-```
----
-### `removeNumbersAndDashes`
-Removes numeric digits and dashes from the text.
-#### Example
-```javascript
-removeNumbersAndDashes('ABC 123-Xyz');
-// Output: 'ABC Xyz'
-```
----
-### `removeRedundantPunctuation`
-Removes redundant punctuation marks that follow Arabic question marks or exclamation marks. This function cleans up text by removing periods (.) or Arabic commas (،) that immediately follow Arabic question marks (؟) or exclamation marks (!).
-#### Example
-```javascript
-removeRedundantPunctuation('كيف حالك؟.');
-// Output: 'كيف حالك؟'
-removeRedundantPunctuation('ممتاز!،');
-// Output: 'ممتاز!'
-```
----
-### `removeSingleDigitReferences`
-Removes single digit references like (1), «2», [3] from the text.
-#### Example
-```javascript
-removeSingleDigitReferences('Ref (1), Ref «2», Ref [3]');
-// Output: 'Ref , Ref , Ref '
-```
----
-### `removeSingularCodes`
-Removes Arabic letters or Arabic-Indic numerals enclosed in square brackets or parentheses.
-#### Example
-```javascript
-removeSingularCodes('[س]');
-// Output: ''
-```
----
-### `removeSolitaryArabicLetters`
-Removes solitary Arabic letters unless they are 'ha' used in Hijri years.
-#### Example
-```javascript
-removeSolitaryArabicLetters('ب ا الكلمات ت');
-// Output: 'ا الكلمات'
-```
----
-### `removeUrls`
-Removes URLs from the text.
-#### Example
-```javascript
-removeUrls('Visit https://example.com');
-// Output: 'Visit '
-```
-### `replaceDoubleBracketsWithArrows`
-Replaces double parentheses with single arrow quotation marks. Converts `((text))` format to `«text»` format, handling optional spaces inside the brackets.
-#### Example
-```javascript
-replaceDoubleBracketsWithArrows('((text))');
-// Output: '«text»'
-replaceDoubleBracketsWithArrows('(( spaced text ))');
-// Output: '«spaced text»'
-```
----
-### `replaceEnglishPunctuationWithArabic`
-Replaces English punctuation marks (e.g., ? and ;) with their Arabic equivalents.
-#### Example
-```javascript
-replaceEnglishPunctuationWithArabic('This; and, that?');
-// Output: 'This؛and، that؟'
-```
----
-### `replaceLineBreaksWithSpaces`
-Replaces consecutive line breaks and whitespace characters with a single space.
-#### Example
-```javascript
-replaceLineBreaksWithSpaces('a\nb');
-// Output: 'a b'
-```
----
-### `replaceSalutationsWithSymbol`
-Replaces common salutations like "sallahu alayhi wasallam" with "ﷺ". Handles variations like 'peace and blessings be upon him'.
-#### Example
-```javascript
-replaceSalutationsWithSymbol('Then Muḥammad (sallahu alayhi wasallam)');
-// Output: 'Then Muḥammad ﷺ'
-```
----
-### `splitByQuotes`
-Splits a string by spaces but keeps quoted substrings intact. Substrings enclosed in double quotes are treated as a single part.
-#### Example
-```javascript
-splitByQuotes('"This is" "a part of the" "string and"');
-// Output: ["This is", "a part of the", "string and"]
-```
-### `stripAllDigits`
-Removes all numeric digits from the text.
-#### Example
-```javascript
-stripAllDigits('abc123');
-// Output: 'abc'
-```
----
-### `toTitleCase`
-Converts a string to title case (first letter of each word capitalized).
-#### Example
-```javascript
-toTitleCase('hello world');
-// Output: 'Hello World'
-toTitleCase('the quick brown fox');
-// Output: 'The Quick Brown Fox'
-```
----
-### `truncate`
-Truncates a string to a specified length, adding an ellipsis if truncated.
-#### Example
-```javascript
-truncate('The quick brown fox jumps over the lazy dog', 20);
-// Output: 'The quick brown fox…'
-truncate('Short text', 50);
-// Output: 'Short text'
-```
----
-### `truncateMiddle`
-Truncates a string from the middle, preserving both the beginning and end portions.
-#### Example
-```javascript
-truncateMiddle('The quick brown fox jumps right over the lazy dog', 20);
-// Output: 'The quick bro…zy dog'
-truncateMiddle('The quick brown fox jumps right over the lazy dog', 25, 8);
-// Output: 'The quick brown …lazy dog'
-truncateMiddle('Short text', 50);
-// Output: 'Short text'
-```
----
-### `unescapeSpaces`
-Unescapes backslash-escaped spaces and trims whitespace from both ends. Commonly used to clean file paths that have been escaped when pasted into terminals.
-#### Example
-```javascript
-unescapeSpaces('My\\ Folder\\ Name');
-// Output: 'My Folder Name'
-unescapeSpaces('  /path/to/My\\ Document.txt  ');
-// Output: '/path/to/My Document.txt'
-unescapeSpaces('regular text');
-// Output: 'regular text'
-```
----
-## sanitizeArabic — unified Arabic text sanitizer
-`sanitizeArabic(input, optionsOrPreset)` provides fast, configurable cleanup for Arabic text and replaces older per-rule utilities.
-It supports presets (`"light"`, `"search"`, `"aggressive"`) and fine-grained options like `stripDiacritics`, `stripTatweel`, `normalizeAlif`,
-`replaceAlifMaqsurah`, `replaceTaMarbutahWithHa`, `stripZeroWidth`, `zeroWidthToSpace`, `stripLatinAndSymbols`, `lettersAndSpacesOnly`,
-`keepOnlyArabicLetters`, `collapseWhitespace`, `trim`, and `removeHijriMarker`. For one-off rules, use `base: 'none'` to apply only what you specify.
-**Examples**
-```ts
-import { sanitizeArabic } from 'bitaboom';
-// Light display cleanup
-sanitizeArabic('  مرحبا\u200C\u200D   بالعالم  ', 'light'); // → 'مرحبا بالعالم'
-// Tolerant search normalization
-sanitizeArabic('اَلسَّلَامُ عَلَيْكُمْ', 'search'); // → 'السلام عليكم'
-// Indexing-friendly text (letters + spaces only)
-sanitizeArabic('اَلسَّلَامُ 1435/3/29 هـ — www', 'aggressive'); // → 'السلام'
-// Tatweel-only, preserving dates/list markers
-sanitizeArabic('أبـــتِـــكَةُ', { base: 'none', stripTatweel: true }); // → 'أبتِكَةُ'
-// Zero-width controls → spaces
-sanitizeArabic('يَخْلُوَ ‏. ‏ قَالَ غَرِيبٌ ‏. ‏', { base: 'none', stripZeroWidth: true, zeroWidthToSpace: true });
-// → 'يَخْلُوَ  .   قَالَ غَرِيبٌ  .  '
-```
-## makeDiacriticInsensitiveRegex — tolerant Arabic matcher
-`makeDiacriticInsensitiveRegex(needle, opts?)` returns a `RegExp` that matches Arabic text while ignoring diacritics,
-optionally tolerating tatweel, and treating common equivalents as equal (`ا~أ~إ~آ`, `ة~ه`, `ى~ي`). Whitespace in the needle
-is treated as `\s+` by default, making it robust across spacing variants.
-**Examples**
-```ts
-import { makeDiacriticInsensitiveRegex } from 'bitaboom';
+import { makeDiacriticInsensitiveRegex, removeMarkdownFormatting } from 'bitaboom';
 const rx = makeDiacriticInsensitiveRegex('أنا إلى الآفاق');
-rx.test('انا الى الافاق'); // true
-rx.test('أنا الي الآفاق'); // true
-```
-**Composing tolerant heads with a literal tail**
-```ts
-const heads = ['السلام', 'مرحبا'];
-const pattern = heads.map(h => makeDiacriticInsensitiveRegex(h).source).join('|');
-const rx2 = new RegExp(`^(?:${pattern})\s+عليكم.*$`, 'mu');
-rx2.test('اَلسَّلَامُ عَلَيْكُمْ ورحمة'); // true
-```
+rx.test('انا الي الافاق'); // true
+const plain = removeMarkdownFormatting('**Bold** _italic_ [link](https://example.com)');
+console.log(plain); // "Bold italic link"
+```
+## Feature highlights
+- **Arabic-first matching** – build diacritic-insensitive regular expressions, collapse tatweel, score Arabic content density, and replace Urdu glyphs.
+- **Rich typography normalisers** – more than 30 helpers to fix punctuation spacing, quotes, brackets, ellipses, smart quotes, uppercase detection, and whitespace quirks.
+- **Sanitisation pipelines** – strip references, URLs, part markers, markdown decorations, escaped spaces, or numbers in bilingual text.
+- **Parsing helpers** – validate JSON-ish blobs, split search queries by quotes, balance parentheses/quotes, and expand page range strings.
+- **Transliteration polish** – normalise common Arabic prefixes (`al-`, `wa-`, `bi-`), dedupe apostrophes, replace salutations with ﷺ, and extract initials from transliterated names.
+- **Bun-native toolchain** – tests run through `bun test` and builds use an in-repo `tsdown` pipeline powered by `bun build` + `tsc` for declarations.
+## API overview
+All modules are exported from `src/index.ts`. Functions are grouped below by feature area.
+### Arabic helpers (`src/arabic.ts`)
+| Function | Description |
+| --- | --- |
+| `arabicNumeralToNumber` | Convert Arabic-Indic numerals (٠-٩) embedded in a string into a JavaScript number. |
+| `cleanExtremeArabicUnderscores` | Remove decorative tatweel/underscores at line edges without touching Hijri date suffixes. |
+| `convertUrduSymbolsToArabic` | Map Urdu variants such as ھ → ه and ی → ي. |
+| `getArabicScore` | Return the ratio of Arabic letters to total non-space, non-digit characters (0 → 1). |
+| `fixTrailingWow` | Collapse stray "و" separators in greetings (e.g. `عليكم و رحمة` → `عليكم ورحمة`). |
+| `addSpaceBetweenArabicTextAndNumbers` | Insert a space between Arabic text segments and following numbers. |
+| `removeNonIndexSignatures` | Drop single-digit indices and dangling dashes surrounded by Arabic text. |
+| `removeSingularCodes` | Strip single Arabic letters or digits enclosed in (), [], or «». |
+| `removeSolitaryArabicLetters` | Remove isolated Arabic letters (excluding Hijri "ه"). |
+| `replaceEnglishPunctuationWithArabic` | Replace ASCII `?` and `;` with Arabic equivalents (`؟`, `؛`) and normalise commas. |
+### Cleaning & tolerant matching (`src/cleaning.ts`, `src/sanitization.ts`)
+| Function | Description |
+| --- | --- |
+| `escapeRegex` | Safely escape special characters for inclusion in regular expression sources. |
+| `makeDiacriticInsensitiveRegex` | Build a `RegExp` tolerant of Arabic diacritics, tatweel, whitespace variants, and letter equivalences. |
+| `makeDiacriticInsensitive` | Produce a pattern string (no delimiters) for diacritic-insensitive matching of Arabic text. |
+| `cleanSymbolsAndPartReferences` | Remove bracketed part markers, Arabic ornaments, and numeric references. |
+| `cleanTrailingPageNumbers` | Drop `-[123]-` page markers. |
+| `replaceLineBreaksWithSpaces` | Collapse whitespace and newline runs to single spaces. |
+| `stripAllDigits` | Remove ASCII digits. |
+| `removeDeathYear` | Strip `(d. ####H)`/`[d. ####h]` style death-year mentions. |
+| `removeNumbersAndDashes` | Remove digits and dash characters everywhere. |
+| `removeSingleDigitReferences` | Delete single digit markers like `(1)`, `[2]`, `«3»`. |
+| `removeUrls` | Remove `http(s)` URLs. |
+| `removeMarkdownFormatting` | Drop markdown bold/italic/link/list/header/backtick syntax. |
+| `truncate` | Trim strings to a maximum length with ellipsis (`…`). |
+| `truncateMiddle` | Preserve start/end segments while truncating the middle with ellipsis. |
+| `unescapeSpaces` | Convert escaped spaces (`\ `) back to regular spaces and trim ends. |
+### Formatting & typography (`src/formatting.ts`)
+| Function | Description |
+| --- | --- |
+| `insertLineBreaksAfterPunctuation` | Add line breaks after `.`, `!`, `?`, and `؟`. |
+| `addSpaceBeforeAndAfterPunctuation` | Normalise spacing around punctuation while respecting quotes and ayah markers. |
+| `applySmartQuotes` | Convert straight quotes to smart quotes and fix opening quotes. |
+| `cleanLiteralNewLines` | Replace literal `\n`/`\r` sequences with actual newlines. |
+| `cleanMultilines` | Trim trailing spaces per line. |
+| `hasWordInSingleLine` | Detect whether a line contains a single standalone word. |
+| `isOnlyPunctuation` | Check whether a string consists solely of punctuation/digits. |
+| `cleanSpacesBeforePeriod` | Remove stray spaces before punctuation marks. |
+| `condenseAsterisks` | Collapse multiple `*` into a single asterisk. |
+| `condenseColons` | Normalise colon clusters like `.:.` → `:`. |
+| `condenseDashes` | Reduce consecutive dashes to a single dash. |
+| `condenseEllipsis` | Convert runs of periods to a single ellipsis character. |
+| `reduceMultilineBreaksToDouble` | Limit blank lines to at most two consecutive newlines. |
+| `reduceMultilineBreaksToSingle` | Collapse multiple blank lines to a single newline. |
+| `condensePeriods` | Normalise spaced dot sequences (`. . .`). |
+| `condenseUnderscores` | Collapse repeated underscores and tatweel runs. |
+| `doubleToSingleBrackets` | Replace doubled parentheses/brackets with single ones. |
+| `ensureSpaceBeforeBrackets` | Guarantee a single space before bracketed notes. |
+| `ensureSpaceBeforeQuotes` | Ensure spacing before Arabic guillemets « ». |
+| `fixBracketTypos` | Repair mismatched bracket pairs (e.g. `(«` or `)3)`). |
+| `fixCurlyBraces` | Normalise `{}` curly brace mismatches. |
+| `fixMismatchedQuotationMarks` | Fix malformed Arabic guillemets and parentheses combos. |
+| `formatStringBySentence` | Reflow paragraphs while keeping numbered footnotes on separate lines. |
+| `isAllUppercase` | Detect text containing only uppercase letters (ignoring non-letters). |
+| `normalizeSlashInReferences` | Convert spaced fractions `127 / 11` → `127/11`. |
+| `normalizeSpaces` | Collapse spaces/tabs to single spaces. |
+| `removeRedundantPunctuation` | Remove redundant punctuation following Arabic `؟`/`!`. |
+| `removeSpaceInsideBrackets` | Trim internal spaces inside brackets/parentheses. |
+| `replaceDoubleBracketsWithArrows` | Turn `((text))` into `«text»`. |
+| `stripBoldStyling` | Remove bold stylisation by decomposing Unicode. |
+| `stripItalicsStyling` | Replace italic Unicode letters with plain equivalents. |
+| `stripStyling` | Convenience combo of bold + italics stripping. |
+| `toTitleCase` | Convert strings to title case, respecting Unicode letters. |
+| `trimSpaceInsideQuotes` | Remove spaces immediately inside quotes/guillemets. |
+### Parsing helpers (`src/parsing.ts`)
+| Function | Description |
+| --- | --- |
+| `normalizeJsonSyntax` | Convert pseudo-JSON with numeric keys/single quotes into valid JSON. |
+| `isJsonStructureValid` | Detect JSON-like key/value blobs that can be normalised. |
+| `splitByQuotes` | Split by spaces while keeping quoted substrings intact. |
+| `isBalanced` | Ensure quotes and brackets are balanced and properly nested. |
+| `parsePageRanges` | Expand range/list strings (`1-3,5`) into numeric arrays. |
+### Transliteration (`src/transliteration.ts`)
+| Function | Description |
+| --- | --- |
+| `normalizeArabicPrefixesToAl` | Normalise Arabic definite article prefixes to `al-`. |
+| `normalizeDoubleApostrophes` | Collapse duplicated Arabic apostrophes (`ʿʿ`, `ʾʾ`). |
+| `replaceSalutationsWithSymbol` | Replace salutations like "sallallahu alayhi wasallam" with ﷺ. |
+| `normalize` | Strip diacritics, apostrophes, and dashes from transliterated text. |
+| `removeArabicPrefixes` | Remove prefixes such as `al-`, `wa-`, `bi-`, `fī`, `li-`. |
+| `normalizeTransliteratedEnglish` | Combine prefix removal + diacritic stripping. |
+| `extractInitials` | Extract the first letters from up to two words (after normalisation). |
+## Build & development
+| Task | Command |
+| --- | --- |
+| Build library | `bun run build` (invokes the in-repo `scripts/tsdown.ts` pipeline, which bundles via `bun build` then emits declarations through `tsc`). |
+| Run tests | `bun test` |
+| Lint | `bun run lint` |
+| Format | `bun run format` |
+| Continuous lint | `bun run lint:ci` |
+The custom `tsdown` script ensures reproducible builds without relying on `tsup`. It cleans the `dist/` directory, bundles `src/index.ts` with Bun's bundler (minified ESM output + sourcemap), and finally emits `.d.ts` files using `tsc --emitDeclarationOnly`.
+## Contributing
+1. Fork the repository and clone it locally.
+2. Install Bun (`curl -fsSL https://bun.sh/install | bash`).
+3. Run tests with `bun test` and format with `bun run format` before opening a pull request.
+Issues and PRs are welcome—please include tests whenever you add or change behaviour.
+## License
+MIT © Ragaeeb Haq
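
The 2.1.0 README describes `parsePageRanges` only as a table row ("Expand range/list strings (`1-3,5`) into numeric arrays"), while the 1.5.0 README carried worked examples. As a rough illustration of that documented behaviour, here is a minimal standalone sketch. The name `expandPageRanges`, the token regex, and the "Invalid page token" error are assumptions made for this example and are not taken from bitaboom's source. Only the inputs, outputs, and the "Start page cannot be greater than end page" message come from the 1.5.0 README examples.

```typescript
// Hypothetical stand-in for illustration only; this is NOT bitaboom's implementation.
// It mirrors the behaviour shown in the 1.5.0 README examples for `parsePageRanges`.
function expandPageRanges(input: string): number[] {
    const pages: number[] = [];

    for (const part of input.split(',')) {
        const token = part.trim();
        if (token === '') {
            continue; // assumption: stray commas are tolerated
        }

        // Accept either a single page ("5") or an inclusive range ("1-3").
        const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
        if (!match) {
            throw new Error(`Invalid page token: ${token}`); // assumed message
        }

        const start = Number(match[1]);
        const end = match[2] === undefined ? start : Number(match[2]);

        if (start > end) {
            // Message copied from the 1.5.0 README example.
            throw new Error('Start page cannot be greater than end page');
        }

        for (let page = start; page <= end; page++) {
            pages.push(page);
        }
    }

    return pages;
}

console.log(expandPageRanges('1-3,5')); // pages 1, 2, 3 and 5
```

When upgrading from 1.5.0, a small check like this can serve as a reference while comparing against the real `parsePageRanges` export, since the new README no longer spells out edge cases such as reversed ranges.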