flappa-doormal 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE.md ADDED
@@ -0,0 +1,7 @@
1
+ Copyright 2025 Ragaeeb Haq
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4
+
5
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6
+
7
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,357 @@
1
+ # flappa-doormal
2
+
3
+ [![wakatime](https://wakatime.com/badge/user/a0b906ce-b8e7-4463-8bce-383238df6d4b/project/384fa29d-72e8-4078-980f-45d363f10507.svg)](https://wakatime.com/badge/user/a0b906ce-b8e7-4463-8bce-383238df6d4b/project/384fa29d-72e8-4078-980f-45d363f10507)
4
+ [![Node.js CI](https://github.com/ragaeeb/flappa-doormal/actions/workflows/build.yml/badge.svg)](https://github.com/ragaeeb/flappa-doormal/actions/workflows/build.yml) ![GitHub License](https://img.shields.io/github/license/ragaeeb/flappa-doormal)
5
+ ![GitHub Release](https://img.shields.io/github/v/release/ragaeeb/flappa-doormal)
6
+ [![Size](https://deno.bundlejs.com/badge?q=flappa-doormal@latest)](https://bundlejs.com/?q=flappa-doormal%40latest)
7
+ ![typescript](https://badgen.net/badge/icon/typescript?icon=typescript&label&color=blue)
8
+ ![npm](https://img.shields.io/npm/v/flappa-doormal)
9
+ ![npm](https://img.shields.io/npm/dm/flappa-doormal)
10
+ ![GitHub issues](https://img.shields.io/github/issues/ragaeeb/flappa-doormal)
11
+ ![GitHub stars](https://img.shields.io/github/stars/ragaeeb/flappa-doormal?style=social)
12
+ [![codecov](https://codecov.io/gh/ragaeeb/flappa-doormal/graph/badge.svg?token=RQ2BV4M9IS)](https://codecov.io/gh/ragaeeb/flappa-doormal)
13
+ [![npm version](https://badge.fury.io/js/flappa-doormal.svg)](https://badge.fury.io/js/flappa-doormal)
14
+
15
+ **Arabic text marker pattern library** - Generate regex patterns from declarative marker configurations.
16
+
17
+ 🎯 **Purpose:** Simplify Arabic text segmentation by replacing complex regex patterns with readable, composable templates.
18
+
19
+ ## Installation
20
+
21
+ ```bash
22
+ bun add flappa-doormal
23
+ # Peer dependencies
24
+ bun add bitaboom baburchi shamela
25
+ ```
26
+
27
+ ## Quick Start
28
+
29
+ ```typescript
30
+ import { generateRegexFromMarker } from 'flappa-doormal';
31
+
32
+ // Simple numbered marker
33
+ const regex = generateRegexFromMarker({
34
+ type: 'numbered' // Defaults: Arabic-Indic numerals, dash separator
35
+ });
36
+
37
+ regex.exec('٥ - نص الحديث');
38
+ // Returns: ['٥ - نص الحديث', 'نص الحديث']
39
+ ```
40
+
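To make the Quick Start result concrete, here is a minimal sketch of a hand-written regex that yields the same match for that input. The pattern is an assumption written for illustration; it is not taken from the library, whose generated pattern may differ.

```typescript
// Hypothetical hand-written equivalent; NOT the pattern flappa-doormal generates.
// Arabic-Indic digits, optional whitespace, one dash variant, then the captured text.
const numberedSketch = new RegExp('^[\\u0660-\\u0669]+\\s?[-–—ـ]\\s*(.*)');

const quickStartMatch = numberedSketch.exec('٥ - نص الحديث');
console.log(quickStartMatch?.[0]); // full match, marker included
console.log(quickStartMatch?.[1]); // text after the marker
```

A line that does not start with an Arabic-Indic number produces `null`, which is how a caller would tell marker lines from ordinary text.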
41
+ ## Features
42
+
43
+ ✅ **13 Preset Types** - Common patterns like `bab`, `hadith-chain`, `basmala`
44
+ ✅ **Template System** - Use `{num}`, `{dash}`, `{bullet}` instead of regex
45
+ ✅ **Type-Safe** - Full TypeScript support
46
+ ✅ **Composable** - Mix and match tokens with quantifiers
47
+ ✅ **Diacritic-Insensitive** - Handles Arabic text variations
48
+
49
+ ## Marker Types
50
+
51
+ ### Basic Types
52
+ ```typescript
53
+ { type: 'numbered' } // ٥ - text
54
+ { type: 'bullet' } // • text
55
+ { type: 'bab' } // باب chapter
56
+ { type: 'hadith-chain' } // حَدَّثَنَا narrator
57
+ { type: 'basmala' } // بسم الله
58
+ { type: 'square-bracket' } // [٦٥] reference
59
+ ```
60
+
61
+ ### Numbered Variants
62
+ ```typescript
63
+ { type: 'num-letter' } // ٥ أ - (number + letter)
64
+ { type: 'num-paren' } // ٥ (أ) - (number + paren)
65
+ { type: 'num-slash' } // ٥/٦ - (number/number)
66
+ ```
67
+
68
+ ### Custom Patterns
69
+
70
+ **Using templates (recommended):**
71
+ ```typescript
72
+ {
73
+ type: 'pattern',
74
+ template: '{bullet}? {num}+ {dash}'
75
+ }
76
+ ```
77
+
78
+ **Using raw regex (for complex patterns):**
79
+ ```typescript
80
+ {
81
+ type: 'pattern',
82
+ pattern: '^CUSTOM: (.*)' // When templates aren't sufficient
83
+ }
84
+ ```
85
+
86
+ **Using `format` with the `numbered` type:**
87
+ ```typescript
88
+ {
89
+ type: 'numbered',
90
+ format: '{bullet}+ {num} {letter} {dash}'
91
+ }
92
+ ```
93
+
94
+ ## Complex Pattern Examples
95
+
96
+ ### Comma-Separated Numerals
97
+ Match patterns like: `٩٩٣٦، ٩٩٣٧ - حَدَّثَنَا`
98
+
99
+ ```typescript
100
+ {
101
+ type: 'pattern',
102
+ template: '{num}(?:،{s}{num})*{s}{dash}'
103
+ }
104
+ ```
105
+
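To see what this template has to match, the tokens can be substituted by hand using the expansions listed in the Template Tokens table. The sketch below does that with a plain `RegExp`; the `^` anchor and the variable names are assumptions made for illustration, and the library's actual output may differ.

```typescript
// '{num}(?:،{s}{num})*{s}{dash}' with each token replaced by hand,
// using the expansions listed in the Template Tokens table.
const commaListSource = '[\\u0660-\\u0669]+(?:،\\s?[\\u0660-\\u0669]+)*\\s?[-–—ـ]';
// Anchoring with ^ is an assumption made for this sketch.
const commaListRegex = new RegExp(`^${commaListSource}`);

const commaListMatch = commaListRegex.exec('٩٩٣٦، ٩٩٣٧ - حَدَّثَنَا');
console.log(commaListMatch?.[0]); // the marker: both numbers and the dash
```

Because the comma group is repeated with `*`, the same regex also accepts a single number followed by a dash.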
106
+ ### Number / Letter
107
+ Match patterns like: `١١٠٧٣/ أ - حَدَّثَنَا`
108
+
109
+ ```typescript
110
+ {
111
+ type: 'pattern',
112
+ template: '{num}{s}/{s}{letter}{s}{dash}'
113
+ }
114
+ ```
115
+
116
+ ### Number / Number (Built-in)
117
+ Match patterns like: `١٠٢٦٦ / ١ - "وَإِذَا`
118
+
119
+ ```typescript
120
+ {
121
+ type: 'num-slash' // Built-in preset
122
+ }
123
+ ```
124
+
125
+ ### Repeating Dots
126
+ Match patterns like: `. . . . . . . . . .`
127
+
128
+ ```typescript
129
+ {
130
+ type: 'pattern',
131
+ template: '\\.(?:{s}\\.)+'
132
+ }
133
+ ```
134
+
135
+ ### Asterisk + Dots + Number
136
+ Match patterns like: `*. . . / ٨٦ - حَدَّثَنَا`
137
+
138
+ **Option 1: Capture from asterisk**
139
+ ```typescript
140
+ {
141
+ type: 'pattern',
142
+ template: '\\*\\.(?:{s}\\.)*{s}/{s}{num}{s}{dash}',
143
+ removeMarker: false // Keep everything
144
+ }
145
+ ```
146
+
147
+ **Option 2: Detect from asterisk, capture from number**
148
+ ```typescript
149
+ {
150
+ type: 'pattern',
151
+ pattern: '^\\*\\.(?:\\s?\\.)*\\s?/\\s?([\\u0660-\\u0669]+\\s?[-–—ـ].*)'
152
+ }
153
+ ```
154
+
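The Option 2 pattern can be checked directly with the built-in `RegExp`, without the library. This sketch uses only the pattern string and the sample input from this section; the variable names are illustrative.

```typescript
// Pattern string copied from Option 2: the asterisk and dots are matched
// but sit outside the capture group, so the capture starts at the number.
const asteriskPattern = '^\\*\\.(?:\\s?\\.)*\\s?/\\s?([\\u0660-\\u0669]+\\s?[-–—ـ].*)';
const asteriskRegex = new RegExp(asteriskPattern);

const asteriskMatch = asteriskRegex.exec('*. . . / ٨٦ - حَدَّثَنَا');
console.log(asteriskMatch?.[1]); // the number, the dash, and the following text
```

A line that starts at the number, with no leading asterisk, does not match, so the asterisk acts purely as the detection cue.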
155
+ ## Template Tokens
156
+
157
+ | Token | Matches | Expands to |
158
+ |-------|---------|---------|
159
+ | `{num}` | Arabic-Indic numerals | `[\\u0660-\\u0669]+` |
160
+ | `{latin}` | Western digits (0-9) | `\\d+` |
161
+ | `{roman}` | Roman numerals | `[IVXLCDM]+` |
162
+ | `{dash}` | Various dashes | `[-–—ـ]` |
163
+ | `{dot}` | Period | `\\.` |
164
+ | `{bullet}` | Bullet variants | `[•*°]` |
165
+ | `{letter}` | Arabic letters | `[أ-ي]` |
166
+ | `{s}` | Optional space | `\\s?` |
167
+ | `{space}` | Required space | `\\s+` |
168
+
169
+ **Quantifiers:** Append `+`, `*`, or `?` to a token to repeat it or make it optional, for example `{num}+` or `{bullet}?`.
170
+
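One way to picture how tokens and quantifiers combine is a plain string substitution over the token table. The following is a minimal sketch, not the library's implementation: `SKETCH_TOKENS` and `expandTokensSketch` are hypothetical names, and wrapping each token in a non-capturing group (so that a trailing quantifier applies to the whole token) is an assumption.

```typescript
// Hypothetical token map mirroring the table; not imported from the library.
const SKETCH_TOKENS: Record<string, string> = {
  num: '[\\u0660-\\u0669]+',
  latin: '\\d+',
  roman: '[IVXLCDM]+',
  dash: '[-–—ـ]',
  dot: '\\.',
  bullet: '[•*°]',
  letter: '[أ-ي]',
  s: '\\s?',
  space: '\\s+',
};

// Replace every {name} with its pattern, wrapped in (?:...) so that a
// quantifier written after the token applies to the token as a unit.
function expandTokensSketch(
  template: string,
  tokens: Record<string, string> = SKETCH_TOKENS,
): string {
  return template.replace(/\{(\w+)\}/g, (_whole: string, tokenName: string) => {
    const tokenSource = tokens[tokenName];
    if (tokenSource === undefined) {
      throw new Error(`Unknown token: {${tokenName}}`);
    }
    return `(?:${tokenSource})`;
  });
}

const bulletNumberSource = expandTokensSketch('{bullet}? {num}+ {dash}');
// The ^ anchor and trailing capture group are assumptions of this sketch.
const bulletNumberRegex = new RegExp(`^${bulletNumberSource}(.*)`);
console.log(bulletNumberRegex.exec('• ٥ - نص')?.[1]); // text after the dash
```

Spaces in the template are kept literally in this sketch, so `'{bullet}? {num}+ {dash}'` still expects a space before the number even when the bullet is absent.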
171
+ ## Examples
172
+
173
+ ### Before (Regex)
174
+ ```typescript
175
+ const pattern = '^[•*°]+ ([\\u0660-\\u0669]+\\s?[-–—ـ].*)';
176
+ ```
177
+
178
+ ### After (Template)
179
+ ```typescript
180
+ {
181
+ type: 'numbered',
182
+ format: '{bullet}+ {num} {dash}'
183
+ }
184
+ ```
185
+
186
+ **The template replaces hand-written character classes with named tokens.**
187
+
188
+ ## API
189
+
190
+ ### `generateRegexFromMarker(config)`
191
+
192
+ ```typescript
193
+ import { generateRegexFromMarker, type MarkerConfig } from 'flappa-doormal';
194
+
195
+ const config: MarkerConfig = {
196
+ type: 'numbered',
197
+ numbering: 'arabic-indic', // or 'latin', 'roman'
198
+ separator: 'dash', // or 'dot', 'colon', 'paren'
199
+ removeMarker: true, // Remove marker from capture (default: true)
200
+ };
201
+
202
+ const regex = generateRegexFromMarker(config);
203
+ ```
204
+
205
+ ### `expandTemplate(template, options)`
206
+
207
+ ```typescript
208
+ import { expandTemplate } from 'flappa-doormal';
209
+
210
+ const pattern = expandTemplate('{num} {dash}');
211
+ // Returns: '^[\\u0660-\\u0669]+ [-–—ـ](.*)'
212
+
213
+ const pattern2 = expandTemplate('{num} {dash}', { removeMarker: false });
214
+ // Returns: '^([\\u0660-\\u0669]+ [-–—ـ].*)'
215
+ ```
216
+
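The effect of `removeMarker` is easiest to see by running the two returned pattern strings against the same line. This sketch uses only the strings quoted in the example with the built-in `RegExp`; it does not call the library, and the sample input is illustrative.

```typescript
// Pattern strings as quoted in the expandTemplate example.
const withoutMarker = new RegExp('^[\\u0660-\\u0669]+ [-–—ـ](.*)');
const withMarker = new RegExp('^([\\u0660-\\u0669]+ [-–—ـ].*)');

const sampleLine = '٥ - نص الحديث';

// removeMarker: true -> capture group 1 starts right after the dash.
const afterMarker = withoutMarker.exec(sampleLine)?.[1];
// removeMarker: false -> capture group 1 spans the marker and the text.
const wholeLine = withMarker.exec(sampleLine)?.[1];

console.log(JSON.stringify(afterMarker)); // keeps the space that follows the dash
console.log(JSON.stringify(wholeLine));
```

With these exact strings the first capture keeps the space after the dash, so callers may want to trim the result.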
217
+ ### `validateTemplate(template)`
218
+
219
+ ```typescript
220
+ import { validateTemplate } from 'flappa-doormal';
221
+
222
+ const result = validateTemplate('{num} {invalid}');
223
+ // Returns: { valid: false, errors: ['Unknown tokens: {invalid}'] }
224
+ ```
225
+
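The validation idea can be sketched as a lookup of every `{name}` placeholder against a set of known token names. This is an illustration only: `validateTemplateSketch` and `KNOWN_TOKEN_NAMES` are hypothetical, the token names come from the Template Tokens table, and the library's real checks may cover more cases.

```typescript
// Hypothetical list of known names, taken from the Template Tokens table.
const KNOWN_TOKEN_NAMES = new Set([
  'num', 'latin', 'roman', 'dash', 'dot', 'bullet', 'letter', 's', 'space',
]);

type ValidationSketch = { valid: boolean; errors: string[] };

function validateTemplateSketch(template: string): ValidationSketch {
  // Collect every {name} placeholder, then keep the ones that are not known.
  const placeholders = template.match(/\{\w+\}/g) ?? [];
  const unknown = placeholders.filter(
    (placeholder) => !KNOWN_TOKEN_NAMES.has(placeholder.slice(1, -1)),
  );
  if (unknown.length === 0) {
    return { valid: true, errors: [] };
  }
  return { valid: false, errors: [`Unknown tokens: ${unknown.join(', ')}`] };
}

console.log(validateTemplateSketch('{num} {invalid}'));
console.log(validateTemplateSketch('{num} {dash}'));
```

Validating before expanding lets a caller report a misspelled token name instead of producing a regex that silently never matches.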
226
+ ## Configuration Options
227
+
228
+ ```typescript
229
+ type MarkerConfig = {
230
+ type: MarkerType;
231
+ numbering?: 'arabic-indic' | 'latin' | 'roman';
232
+ separator?: 'dash' | 'dot' | 'paren' | 'colon' | 'none' | string;
233
+ format?: string; // Template for numbered markers
234
+ template?: string; // Template for pattern markers
235
+ pattern?: string; // Raw regex (when templates aren't enough)
236
+ tokens?: Record<string, string>; // Custom token definitions
237
+ phrases?: string[]; // For 'phrase' and 'hadith-chain' types
238
+ removeMarker?: boolean; // Default: true for numbered/bullet
239
+ };
240
+ ```
241
+
242
+ ## Extensibility
243
+
244
+ ### Extending Default Phrase Lists
245
+
246
+ ```typescript
247
+ import { DEFAULT_HADITH_PHRASES, generateRegexFromMarker } from 'flappa-doormal';
248
+
249
+ // Add to existing hadith phrases
250
+ const myPhrases = [...DEFAULT_HADITH_PHRASES, 'أَخْبَرَنِي', 'سَمِعْتُ'];
251
+
252
+ const regex = generateRegexFromMarker({
253
+ type: 'hadith-chain',
254
+ phrases: myPhrases,
255
+ });
256
+ ```
257
+
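For intuition about how a phrase list can become a diacritic-insensitive regex, here is a minimal sketch. It is an assumption about one possible approach, not the library's implementation: the helper names are hypothetical, and the diacritic range (U+064B to U+0652 plus U+0670) is a choice made for this example.

```typescript
// Optional run of Arabic diacritics allowed after every base character.
const OPTIONAL_DIACRITICS = '[\\u064B-\\u0652\\u0670]*';

// Strip diacritics from the phrase, escape regex metacharacters, and let
// any diacritics appear after each remaining character.
function phraseToLoosePattern(phrase: string): string {
  const bare = phrase.replace(/[\u064B-\u0652\u0670]/g, '');
  return Array.from(bare)
    .map((ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') + OPTIONAL_DIACRITICS)
    .join('');
}

// Join the phrases into one alternation anchored at the start of the line;
// the anchor and the trailing capture group are assumptions of this sketch.
function phrasesToRegexSketch(phrases: string[]): RegExp {
  const alternation = phrases.map(phraseToLoosePattern).join('|');
  return new RegExp(`^(?:${alternation})(.*)`);
}

const looseRegex = phrasesToRegexSketch(['حَدَّثَنَا', 'أَخْبَرَنَا']);
console.log(looseRegex.test('حدثنا فلان')); // phrase written without diacritics
console.log(looseRegex.test('حَدَّثَنَا فلان')); // phrase written with diacritics
```

Under these assumptions, extending the phrase list only adds another branch to the alternation, which is why spreading `DEFAULT_HADITH_PHRASES` into a longer array is enough to widen what is matched.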
258
+ ### Using Type-Specific Generators
259
+
260
+ ```typescript
261
+ import { generateHadithChainRegex, DEFAULT_HADITH_PHRASES } from 'flappa-doormal';
262
+
263
+ // Direct access to type-specific generator
264
+ const regex = generateHadithChainRegex(
265
+ { type: 'hadith-chain', phrases: [...DEFAULT_HADITH_PHRASES, 'extra'] },
266
+ true // removeMarker
267
+ );
268
+ ```
269
+
270
+ ### Custom Tokens
271
+
272
+ ```typescript
273
+ import { createTokenMap, expandTemplate } from 'flappa-doormal';
274
+
275
+ const customTokens = createTokenMap({
276
+ verse: '\\[[\\u0660-\\u0669]+\\]',
277
+ tafsir: 'تفسير',
278
+ });
279
+
280
+ const pattern = expandTemplate('{verse} {tafsir}', {
281
+ tokens: customTokens,
282
+ removeMarker: true
283
+ });
284
+ ```
285
+
286
+ ## Available Exports
287
+
288
+ **Constants:**
289
+ - `DEFAULT_HADITH_PHRASES` - Default narrator phrases
290
+ - `DEFAULT_BASMALA_PATTERNS` - Default basmala patterns
291
+ - `TOKENS` - Token definitions
292
+
293
+ **Functions:**
294
+ - `generateRegexFromMarker()` - Main function
295
+ - `generate{Type}Regex()` - 12 type-specific generators
296
+ - `expandTemplate()` - Template expansion
297
+ - `validateTemplate()` - Template validation
298
+ - `createTokenMap()` - Custom token maps
299
+
300
+ ## Testing
301
+
302
+ This project has comprehensive unit test coverage for all marker type generators.
303
+
304
+ ```bash
305
+ # Run all tests
306
+ bun test
307
+
308
+ # Run specific test file
309
+ bun test src/markers/type-generators.test.ts
310
+
311
+ # Run tests with coverage
312
+ bun test --coverage
313
+ ```
314
+
315
+ **Test Coverage**: 100% coverage for `type-generators.ts` with 54+ test cases covering:
316
+ - All 12 marker type generators
317
+ - Edge cases (empty phrases, diacritic variations, custom separators)
318
+ - Error handling (missing required fields)
319
+ - Various numbering styles and separators
320
+
321
+ ## Development
322
+
323
+ ```bash
324
+ # Install dependencies
325
+ bun install
326
+
327
+ # Run tests
328
+ bun test
329
+
330
+ # Build (if needed)
331
+ bun run build
332
+
333
+ # Format code
334
+ bunx biome format --write .
335
+
336
+ # Lint code
337
+ bunx biome lint .
338
+ ```
339
+
340
+ ## For AI Agents
341
+
342
+ See [AGENTS.md](./AGENTS.md) for comprehensive guidance on:
343
+ - Project architecture and design patterns
344
+ - Adding new marker types
345
+ - Testing strategies
346
+ - Code quality standards
347
+ - Extension points
348
+
349
+ ## License
350
+
351
+ MIT
352
+
353
+ ## Related
354
+
355
+ - [bitaboom](https://github.com/ragaeeb/bitaboom) - Arabic text utilities
356
+ - [baburchi](https://github.com/ragaeeb/baburchi) - Text sanitization
357
+ - [shamela](https://github.com/ragaeeb/shamela) - Shamela library utilities