@futpib/parser 1.0.7 → 1.0.8

package/readme.md CHANGED
@@ -1,17 +1,478 @@
- # parser
+ # @futpib/parser
 
- > parser
+ > A functional TypeScript library for parsing and serializing binary data formats using composable parser combinators
 
  [![Coverage Status](https://coveralls.io/repos/github/futpib/parser/badge.svg?branch=master)](https://coveralls.io/github/futpib/parser?branch=master)
+ [![npm version](https://img.shields.io/npm/v/@futpib/parser.svg)](https://www.npmjs.com/package/@futpib/parser)
 
- ## Example
+ ## Overview
 
- ```js
- // parser
- ```
+ `@futpib/parser` is a powerful parser combinator library built on async iterables for streaming binary data parsing. It provides a functional, type-safe approach to parsing complex binary formats including APK files, DEX files, ZIP archives, BSON documents, Java KeyStore files, and more.
+
+ ### Key Features
+
+ - 🔄 **Bidirectional**: Both parsing (deserialization) and unparsing (serialization) support
+ - 🌊 **Streaming**: Built on async iterables for memory-efficient parsing of large files
+ - 🧩 **Composable**: Build complex parsers from simple, reusable building blocks
+ - 🔒 **Type-safe**: Full TypeScript support with extensive generics
+ - 🎯 **Functional**: Uses fp-ts for functional programming patterns
+ - ⚡ **Fast**: Optimized for performance with lookahead and backtracking support
+
+ ### Supported Formats
+
+ - **Android Packages (APK)** - Parse and unparse Android application packages
+ - **Dalvik Executable (DEX)** - Parse Dalvik bytecode and executable files
+ - **ZIP Archives** - Full ZIP file parsing and creation
+ - **BSON** - Binary JSON documents
+ - **Java KeyStore** - Java keystore files
+ - **Smali** - Smali bytecode assembly language
+ - **JSON** - Standard JSON parsing
+ - **Regular Expressions** - Pattern-based parsing
+ - **Bash Scripts** - Parse bash scripts
+ - **Java Source** - Parse Java source code
+ - **JavaScript** - Parse JavaScript source code
+ - **Zig** - Parse Zig source code
+ - **S-expressions** - Symbolic expressions
 
  ## Install
 
+ ```bash
+ # Using yarn
+ yarn add @futpib/parser
+
+ # Using npm
+ npm install @futpib/parser
+ ```
+
+ **Optional peer dependency**: For parsing modified UTF-8 (used in DEX files):
+ ```bash
+ yarn add mutf-8
+ ```
+
+ ## Quick Start
+
+ ### Basic Parsing Example
+
+ ```typescript
+ import {
+   createTupleParser,
+   createExactSequenceParser,
+   createFixedLengthSequenceParser,
+   runParser,
+   uint8ArrayParserInputCompanion,
+ } from '@futpib/parser';
+
+ // Parse a simple binary format: magic bytes + 4-byte length
+ const myFormatParser = createTupleParser([
+   createExactSequenceParser(new Uint8Array([0x4D, 0x59, 0x46, 0x4D])), // "MYFM" magic
+   createFixedLengthSequenceParser(4), // 4-byte length field
+ ]);
+
+ const data = new Uint8Array([0x4D, 0x59, 0x46, 0x4D, 0x00, 0x00, 0x00, 0x10]);
+ const result = await runParser(
+   myFormatParser,
+   data,
+   uint8ArrayParserInputCompanion
+ );
+
+ console.log(result); // [Uint8Array([0x4D, 0x59, 0x46, 0x4D]), Uint8Array([0x00, 0x00, 0x00, 0x10])]
+ ```
+
+ ### String Parsing Example
+
+ ```typescript
+ import {
+   createObjectParser,
+   createExactSequenceParser,
+   createRegExpParser,
+   runParser,
+   stringParserInputCompanion,
+ } from '@futpib/parser';
+
+ // Parse a simple key-value format
+ // Note: RegExpParser returns a RegExpExecArray (match array), not just the string
+ // Note: Underscore-prefixed keys like _separator are omitted from output
+ const keyValueParser = createObjectParser({
+   key: createRegExpParser(/[a-z]+/),
+   _separator: createExactSequenceParser('='),
+   value: createRegExpParser(/[0-9]+/),
+ });
+
+ const result = await runParser(
+   keyValueParser,
+   'name=123',
+   stringParserInputCompanion
+ );
+
+ console.log(result.key[0]); // 'name' - the [0] element contains the matched string
+ console.log(result.value[0]); // '123'
+ ```
+
+ ### Array Parsing Example
+
+ ```typescript
+ import {
+   createTerminatedArrayParser,
+   createUnionParser,
+   createExactElementParser,
+   runParser,
+   uint8ArrayParserInputCompanion,
+ } from '@futpib/parser';
+
+ // Parse an array of specific bytes (1 or 2) until we hit a zero byte
+ const byteArrayParser = createTerminatedArrayParser(
+   createUnionParser([
+     createExactElementParser(1),
+     createExactElementParser(2),
+   ]),
+   createExactElementParser(0)
+ );
+
+ const data = new Uint8Array([1, 2, 1, 2, 1, 0]);
+ const result = await runParser(
+   byteArrayParser,
+   data,
+   uint8ArrayParserInputCompanion
+ );
+
+ console.log(result); // [[1, 2, 1, 2, 1], 0]
+ ```
+
+ ### Real-world Example: Parsing ZIP Files
+
+ ```typescript
+ import { runParser, uint8ArrayParserInputCompanion } from '@futpib/parser';
+ import { zipParser } from '@futpib/parser/build/zipParser.js';
+ import { readFile } from 'fs/promises';
+
+ const zipData = await readFile('archive.zip');
+ const zip = await runParser(
+   zipParser,
+   zipData,
+   uint8ArrayParserInputCompanion
+ );
+
+ for (const entry of zip.entries) {
+   console.log(`File: ${entry.path}`);
+   console.log(`Size: ${entry.uncompressedSize} bytes`);
+   console.log(`Compressed: ${entry.compressedSize} bytes`);
+ }
+ ```
+
+ ### Unparsing (Serialization) Example
+
+ ```typescript
+ import {
+   createArrayUnparser,
+   runUnparser,
+   uint8ArrayUnparserOutputCompanion,
+ } from '@futpib/parser';
+
+ // Create an unparser for arrays of bytes
+ const byteArrayUnparser = createArrayUnparser(async function* (byte) {
+   yield byte;
+ });
+
+ const bytes = [0x48, 0x65, 0x6C, 0x6C, 0x6F]; // "Hello"
+
+ // Collect the output chunks, as in the WriteLater example below
+ const chunks: Uint8Array[] = [];
+ for await (const chunk of runUnparser(
+   byteArrayUnparser,
+   bytes,
+   uint8ArrayUnparserOutputCompanion
+ )) {
+   chunks.push(chunk);
+ }
+ const result = uint8ArrayUnparserOutputCompanion.concat(chunks);
+
+ console.log(result); // Uint8Array([0x48, 0x65, 0x6C, 0x6C, 0x6F])
+ ```
+
+ ## Core Concepts
+
+ ### Parser
+
+ A `Parser<Output, Sequence, Element>` is a function that takes a `ParserContext` and returns parsed output:
+
+ ```typescript
+ type Parser<Output, Sequence, Element> = (
+   parserContext: ParserContext<Sequence, Element>
+ ) => Output | Promise<Output>;
+ ```
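
To make the shape of this type concrete, the following is a minimal synchronous sketch that does not use the library. `MiniContext`, `MiniParser`, `readByte`, and `expectByte` are hypothetical names made up for illustration; the real `ParserContext` is asynchronous and offers more than a cursor.

```typescript
// Hypothetical, simplified stand-in for ParserContext: a cursor over bytes.
type MiniContext = { input: Uint8Array; position: number };

// A parser is a function from a context to an output.
type MiniParser<Output> = (context: MiniContext) => Output;

// Read one byte and advance the cursor; throw at end of input.
const readByte: MiniParser<number> = (context) => {
  const byte = context.input[context.position];
  if (byte === undefined) {
    throw new Error('Unexpected end of input');
  }
  context.position += 1;
  return byte;
};

// Build a parser that accepts exactly one expected byte.
const expectByte = (expected: number): MiniParser<number> => (context) => {
  const start = context.position;
  const byte = readByte(context);
  if (byte !== expected) {
    context.position = start; // leave the cursor untouched on failure
    throw new Error(`Expected ${expected} at position ${start}, got ${byte}`);
  }
  return byte;
};

const miniContext: MiniContext = { input: new Uint8Array([0x4d, 0x59]), position: 0 };
const firstByte = expectByte(0x4d)(miniContext); // 0x4d, cursor now at 1
const secondByte = readByte(miniContext); // 0x59, cursor now at 2
```

Because every parser shares the same function shape, a combinator is simply a function that takes parsers and returns a new parser.
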
+
+ ### Parser Combinators
+
+ Build complex parsers from simple ones:
+
+ #### Basic Combinators
+ - **`createElementParser()`** - Parse a single element from the input
+ - **`createExactElementParser(element)`** - Parse and match an exact element value
+ - **`createExactSequenceParser(sequence)`** - Match exact byte/string sequences
+ - **`createFixedLengthSequenceParser(length)`** - Parse a fixed-length sequence
+ - **`createPredicateElementParser(predicate)`** - Parse element matching a predicate function
+
+ #### Structural Combinators
+ - **`createTupleParser(parsers)`** - Parse a tuple of values in sequence
+ - **`createObjectParser(parsers)`** - Parse an object with named fields (underscore-prefixed keys omitted)
+ - **`createArrayParser(parser)`** - Parse an array of values (until parser fails)
+ - **`createNonEmptyArrayParser(parser)`** - Parse at least one element into an array
+ - **`createListParser(parser)`** - Parse a list of values
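
The "underscore-prefixed keys omitted" rule can be sketched without the library. The names below (`Cursor`, `literal`, `pattern`, `objectOf`) are hypothetical, the sketch is synchronous, and `pattern` returns the matched string rather than a `RegExpExecArray`, so it only approximates how `createObjectParser` behaves.

```typescript
// Hypothetical cursor over a string input.
type Cursor = { input: string; position: number };
type StringParser<Output> = (cursor: Cursor) => Output;

// Match a literal string at the cursor or throw.
const literal = (expected: string): StringParser<string> => (cursor) => {
  if (!cursor.input.startsWith(expected, cursor.position)) {
    throw new Error(`Expected "${expected}" at position ${cursor.position}`);
  }
  cursor.position += expected.length;
  return expected;
};

// Match a regular expression anchored at the cursor (sticky flag) or throw.
const pattern = (regexp: RegExp): StringParser<string> => (cursor) => {
  const sticky = new RegExp(regexp.source, 'y');
  sticky.lastIndex = cursor.position;
  const match = sticky.exec(cursor.input);
  if (match === null) {
    throw new Error(`No match for ${regexp.source} at position ${cursor.position}`);
  }
  cursor.position += match[0].length;
  return match[0];
};

// Run the field parsers in key order; keep only keys that do not start with "_".
const objectOf = (
  fields: Record<string, StringParser<string>>,
): StringParser<Record<string, string>> => (cursor) => {
  const output: Record<string, string> = {};
  for (const [key, parser] of Object.entries(fields)) {
    const value = parser(cursor); // always consume input, even for "_" keys
    if (!key.startsWith('_')) {
      output[key] = value;
    }
  }
  return output;
};

const keyValue = objectOf({
  key: pattern(/[a-z]+/),
  _separator: literal('='),
  value: pattern(/[0-9]+/),
});
const parsedPair = keyValue({ input: 'name=123', position: 0 });
// parsedPair has the keys "key" and "value" only
```

The separator still has to match for parsing to succeed; it is only left out of the result object.
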
+
+ #### Array & Sequence Combinators
+ - **`createTerminatedArrayParser(elementParser, terminatorParser)`** - Parse until terminator is found
+ - **`createSeparatedArrayParser(elementParser, separatorParser)`** - Parse values separated by delimiters
+ - **`createSeparatedNonEmptyArrayParser(elementParser, separatorParser)`** - Parse at least one separated value
+ - **`createElementTerminatedSequenceParser(terminator)`** - Parse sequence until element terminator
+ - **`createElementTerminatedSequenceArrayParser(terminator)`** - Parse array of sequences terminated by element
+ - **`createSequenceTerminatedSequenceParser(terminator)`** - Parse sequence until sequence terminator
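
As a rough illustration of "parse until terminator", here is a library-free sketch. `readUntilTerminator` and its return shape are hypothetical and chosen for clarity; they are not the output format of the combinators listed above.

```typescript
// Hypothetical helper: collect bytes from `start` until `terminator` is seen.
// The terminator is consumed but not included in `elements`.
function readUntilTerminator(
  input: Uint8Array,
  start: number,
  terminator: number,
): { elements: number[]; nextPosition: number } {
  const elements: number[] = [];
  let position = start;

  while (true) {
    const byte = input[position];
    if (byte === undefined) {
      throw new Error('Input ended before the terminator was found');
    }
    position += 1;
    if (byte === terminator) {
      return { elements, nextPosition: position };
    }
    elements.push(byte);
  }
}

const terminated = readUntilTerminator(new Uint8Array([1, 2, 1, 2, 1, 0, 7]), 0, 0);
// terminated.elements holds 1, 2, 1, 2, 1 and nextPosition points at the 7
```

Returning the next position lets a caller continue parsing right after the terminator, which is what makes this kind of parser composable.
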
+
+ #### Choice & Optional Combinators
+ - **`createUnionParser(parsers)`** - Try multiple parsers (first success wins)
+ - **`createDisjunctionParser(parsers)`** - Try multiple parsers and collect all results
+ - **`createOptionalParser(parser)`** - Parse optional values (returns undefined on failure)
+
+ #### Lookahead & Control Flow
+ - **`createLookaheadParser(parser)`** - Parse without consuming input (peek ahead)
+ - **`createNegativeLookaheadParser(parser)`** - Succeed only if parser would fail (without consuming)
+ - **`createParserAccessorParser(accessor)`** - Dynamically select parser based on context
+ - **`createElementSwitchParser(cases, defaultParser?)`** - Switch parser based on next element
+
+ #### Utility Combinators
+ - **`createSkipParser(length)`** - Skip a fixed number of elements
+ - **`createSkipToParser(parser)`** - Skip until parser succeeds
+ - **`createSliceBoundedParser(parser, maxLength)`** - Limit parser to slice bounds
+ - **`createEndOfInputParser()`** - Ensure all input has been consumed
+ - **`createQuantifierParser(parser, min, max)`** - Parse between min and max repetitions
+ - **`createParserConsumedSequenceParser(parser)`** - Parse and return consumed sequence
+
+ #### String-Specific Combinators
+ - **`createRegExpParser(regexp)`** - Parse using regular expressions (returns RegExpExecArray)
+
+ #### Debugging Combinators
+ - **`createDebugLogParser(label, parser)`** - Log parser execution for debugging
+ - **`createDebugLogInputParser(label)`** - Log current input position
+
+ ### Unparser (Serializer) Combinators
+
+ An `Unparser<Input, Sequence, Element>` converts data back into binary/string format:
+
+ ```typescript
+ type Unparser<Input, Sequence, Element> = (
+   input: Input,
+   unparserContext: UnparserContext<Sequence, Element>
+ ) => AsyncIterable<Sequence | Element>;
  ```
- yarn add --dev @futpib/parser
+
+ #### Available Unparsers
+ - **`createArrayUnparser(elementUnparser)`** - Serialize an array of elements
+ - **`createSequenceUnparser(elementUnparsers)`** - Serialize a sequence of values
+
+ #### Advanced Unparser Features: WriteLater and WriteEarlier
+
+ When serializing complex formats, you may need to write a size or offset field before you know its value. Use `WriteLater` and `WriteEarlier` for this:
+
+ ```typescript
+ import {
+   runUnparser,
+   uint8ArrayUnparserOutputCompanion,
+   type UnparserContext,
+ } from '@futpib/parser';
+
+ // Example: Write a length-prefixed string
+ async function* lengthPrefixedStringUnparser(
+   text: string,
+   ctx: UnparserContext<Uint8Array, number>
+ ) {
+   // Reserve 4 bytes for length (to be written later)
+   const lengthPlaceholder = yield* ctx.writeLater(4);
+
+   // Write the string content
+   const encoder = new TextEncoder();
+   const bytes = encoder.encode(text);
+   yield bytes;
+
+   // Now write the length back to the reserved space
+   const lengthBytes = new Uint8Array(4);
+   new DataView(lengthBytes.buffer).setUint32(0, bytes.length, false);
+   yield* ctx.writeEarlier(
+     lengthPlaceholder,
+     async function*() { yield lengthBytes; },
+     null
+   );
+ }
+
+ // Collect results from the async iterable
+ const chunks: Uint8Array[] = [];
+ for await (const chunk of runUnparser(
+   lengthPrefixedStringUnparser,
+   'Hello',
+   uint8ArrayUnparserOutputCompanion
+ )) {
+   chunks.push(chunk);
+ }
+ const result = uint8ArrayUnparserOutputCompanion.concat(chunks);
+
+ // result: Uint8Array([0, 0, 0, 5, 72, 101, 108, 108, 111])
+ // length=5 ^ H e l l o
  ```
+
+ **`WriteLater`** - Reserve space in the output to be filled later
+ - `writeLater(length)` - Returns a `WriteLater` token representing reserved space
+ - Used when you need to write a value that depends on data processed later
+
+ **`WriteEarlier`** - Fill in previously reserved space
+ - `writeEarlier(writeLater, unparser, input)` - Write data to previously reserved space
+ - The `writeLater` token identifies which space to fill
+ - Allows writing forward references (e.g., file sizes, offsets)
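
The reserve-then-fill idea behind these two operations can be shown with a plain in-memory buffer. This is only a sketch of the general back-patching technique: `frameWithLength` is a hypothetical helper, and it says nothing about how the library implements `writeLater` and `writeEarlier` internally.

```typescript
// Hypothetical helper: frame payload chunks with a 4-byte big-endian length
// that is only known after all chunks have been written.
function frameWithLength(chunks: Uint8Array[]): Uint8Array {
  const output: number[] = [];

  // 1. Reserve four bytes and remember where they are (the "write later" step).
  const lengthOffset = output.length;
  output.push(0, 0, 0, 0);

  // 2. Write the payload, counting bytes as they go out.
  let payloadLength = 0;
  for (const chunk of chunks) {
    chunk.forEach((byte) => output.push(byte));
    payloadLength += chunk.length;
  }

  // 3. Back-patch the reserved bytes now that the length is known
  //    (the "write earlier" step).
  const lengthBytes = new Uint8Array(4);
  new DataView(lengthBytes.buffer).setUint32(0, payloadLength, false); // big-endian
  lengthBytes.forEach((byte, index) => {
    output[lengthOffset + index] = byte;
  });

  return Uint8Array.from(output);
}

const framed = frameWithLength([
  new Uint8Array([72, 101]), // "He"
  new Uint8Array([108, 108, 111]), // "llo"
]);
// framed holds 0, 0, 0, 5 followed by the bytes of "Hello"
```

The sketch buffers everything in memory, which is the simplest way to allow patching an earlier position; a streaming implementation has to hold back output from the reserved position onward until the placeholder is filled.
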
+
+ ### Input/Output Companions
+
+ Companions provide type-specific operations:
+
+ - **`uint8ArrayParserInputCompanion`** - For binary data (Uint8Array)
+ - **`stringParserInputCompanion`** - For text data (string)
+ - **`uint8ArrayUnparserOutputCompanion`** - For binary serialization
+ - **`stringUnparserOutputCompanion`** - For text serialization
+
+ ## API Reference
+
+ ### Running Parsers
+
+ ```typescript
+ // Parse input completely
+ runParser<Output, Sequence, Element>(
+   parser: Parser<Output, Sequence, Element>,
+   input: Sequence,
+   inputCompanion: ParserInputCompanion<Sequence, Element>,
+   options?: RunParserOptions
+ ): Promise<Output>
+
+ // Parse with remaining input
+ runParserWithRemainingInput<Output, Sequence, Element>(
+   parser: Parser<Output, Sequence, Element>,
+   input: Sequence,
+   inputCompanion: ParserInputCompanion<Sequence, Element>,
+   options?: RunParserOptions
+ ): Promise<{ output: Output; remainingInput: Sequence }>
+ ```
+
+ ### Running Unparsers
+
+ ```typescript
+ // Takes the unparser first and yields output chunks, as in the WriteLater example
+ runUnparser<Input, Sequence, Element>(
+   unparser: Unparser<Input, Sequence, Element>,
+   input: Input,
+   outputCompanion: UnparserOutputCompanion<Sequence, Element>
+ ): AsyncIterable<Sequence>
+ ```
+
+ ### Error Handling
+
+ Parsers throw `ParserError` when parsing fails:
+
+ ```typescript
+ try {
+   const result = await runParser(parser, input, inputCompanion);
+ } catch (error) {
+   if (isParserError(error)) {
+     console.error('Parse failed:', error.message);
+   }
+ }
+ ```
+
+ Options for error handling:
+ - `errorJoinMode: 'first'` - Return first error (default, faster)
+ - `errorJoinMode: 'all'` - Collect all errors (more detailed, slower)
+
+ ## Advanced Usage
+
+ ### Custom Parsers
+
+ Create custom parsers by implementing the `Parser` type:
+
+ ```typescript
+ import { setParserName, type Parser } from '@futpib/parser';
+
+ const customParser: Parser<number, Uint8Array> = async (parserContext) => {
+   const byte1 = await parserContext.read(0); // Read and consume 1 byte
+   const byte2 = await parserContext.read(0); // Read and consume another byte
+
+   // Combine into a 16-bit big-endian integer
+   return (byte1 << 8) | byte2;
+ };
+
+ setParserName(customParser, 'uint16BEParser');
+ ```
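
The byte-combining expression in this parser can be checked on its own. The short sketch below uses only standard JavaScript APIs; the sample bytes `0x12` and `0x34` are arbitrary values chosen for illustration.

```typescript
const highByte = 0x12;
const lowByte = 0x34;

// Big-endian: the first byte is the most significant one.
const bigEndian = (highByte << 8) | lowByte; // 0x1234

// Little-endian would put the first byte in the low position instead.
const littleEndian = (lowByte << 8) | highByte; // 0x3412

// DataView performs the same big-endian decoding when littleEndian is false.
const viaDataView = new DataView(new Uint8Array([highByte, lowByte]).buffer).getUint16(0, false);
```

For wider integers, `DataView` is usually the safer choice, because `<<` works on signed 32-bit values and can produce negative results once the top bit is set.
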
+
+ ### Lookahead and Backtracking
+
+ ```typescript
+ import { createLookaheadParser, createUnionParser } from '@futpib/parser';
+
+ // Try to peek ahead without consuming input
+ const peekParser = createLookaheadParser(someParser);
+
+ // Union parser automatically backtracks on failure
+ const eitherParser = createUnionParser([
+   parserA, // Try this first
+   parserB, // If parserA fails, try this
+   parserC, // If parserB fails, try this
+ ]);
+ ```
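
What "backtracks on failure" and "without consuming input" mean can be illustrated with a small cursor-based model. `ByteCursor`, `exactBytes`, `firstOf`, and `lookahead` are hypothetical stand-ins written for this sketch; they are synchronous and are not the library's implementation of `createUnionParser` or `createLookaheadParser`.

```typescript
type ByteCursor = { input: Uint8Array; position: number };
type ByteParser<Output> = (cursor: ByteCursor) => Output;

// Match a fixed run of bytes. On mismatch it throws after having
// already advanced the cursor, which is why callers need to backtrack.
const exactBytes = (expected: number[]): ByteParser<number[]> => (cursor) => {
  for (const byte of expected) {
    if (cursor.input[cursor.position] !== byte) {
      throw new Error(`Mismatch at position ${cursor.position}`);
    }
    cursor.position += 1;
  }
  return expected;
};

// Try each alternative from the same start position; first success wins.
// This sketch treats any thrown error as a parse failure.
const firstOf = <Output>(parsers: Array<ByteParser<Output>>): ByteParser<Output> => (cursor) => {
  const start = cursor.position;
  for (const parser of parsers) {
    try {
      return parser(cursor);
    } catch {
      cursor.position = start; // backtrack before trying the next alternative
    }
  }
  throw new Error(`No alternative matched at position ${start}`);
};

// Run a parser, then rewind so that no input is consumed.
const lookahead = <Output>(parser: ByteParser<Output>): ByteParser<Output> => (cursor) => {
  const start = cursor.position;
  try {
    return parser(cursor);
  } finally {
    cursor.position = start;
  }
};

const byteCursor: ByteCursor = { input: new Uint8Array([1, 2, 9]), position: 0 };
// The first alternative consumes two bytes and then fails on the third.
const matched = firstOf([exactBytes([1, 2, 3]), exactBytes([1, 2])])(byteCursor);
const positionAfterUnion = byteCursor.position; // 2
const peeked = lookahead(exactBytes([9]))(byteCursor);
const positionAfterPeek = byteCursor.position; // still 2
```

Rewinding is trivial here because the whole input is in memory; with streamed input, a parser has to buffer whatever it may need to revisit.
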
+
+ ### Conditional Parsing
+
+ ```typescript
+ import {
+   createElementParser,
+   createObjectParser,
+   createParserAccessorParser,
+ } from '@futpib/parser';
+
+ // int32Parser and stringParser stand for parsers defined elsewhere
+ const conditionalParser = createObjectParser({
+   type: createElementParser(),
+   value: createParserAccessorParser((ctx) =>
+     // Choose parser based on previously parsed 'type'
+     ctx.type === 1 ? int32Parser : stringParser
+   ),
+ });
+ ```
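
The underlying pattern, where a tag read first decides how the rest is decoded, can be sketched independently of the library. `TagCursor`, `nextByte`, and `readTagged` are hypothetical names, and the two tag values are invented for this example.

```typescript
type TagCursor = { input: Uint8Array; position: number };

// Read one byte and advance; throw at end of input.
const nextByte = (cursor: TagCursor): number => {
  const byte = cursor.input[cursor.position];
  if (byte === undefined) {
    throw new Error('Unexpected end of input');
  }
  cursor.position += 1;
  return byte;
};

// Invented format: tag 1 is followed by one byte,
// tag 2 by a 16-bit big-endian integer.
function readTagged(cursor: TagCursor): { type: number; value: number } {
  const type = nextByte(cursor);
  switch (type) {
    case 1: {
      return { type, value: nextByte(cursor) };
    }
    case 2: {
      const high = nextByte(cursor);
      const low = nextByte(cursor);
      return { type, value: (high << 8) | low };
    }
    default: {
      throw new Error(`Unknown type tag ${type}`);
    }
  }
}

const tagCursor: TagCursor = { input: new Uint8Array([2, 0x01, 0x00, 1, 7]), position: 0 };
const firstRecord = readTagged(tagCursor); // type 2, value 0x0100
const secondRecord = readTagged(tagCursor); // type 1, value 7
```

Rejecting unknown tags explicitly keeps a malformed input from being silently misread as one of the known variants.
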
+
+ ## Development
+
+ ```bash
+ # Install dependencies
+ yarn install
+
+ # Build the project
+ yarn build
+
+ # Run tests
+ yarn test
+
+ # Run tests in watch mode
+ yarn dev
+
+ # Lint code
+ yarn xo
+
+ # Type check
+ yarn tsd
+ ```
+
+ ## Architecture
+
+ The library uses:
+ - **Async Iterables** - For streaming large files without loading everything into memory
+ - **Parser Combinators** - Composable building blocks for complex parsers
+ - **fp-ts** - Functional programming utilities
+ - **TypeScript Generics** - Type-safe parser composition
+
+ ## Contributing
+
+ Contributions are welcome! This project uses:
+ - TypeScript with strict mode
+ - AVA for testing
+ - xo for linting
+ - Yarn 4 with Plug'n'Play
+
+ ## License
+
+ GPL-3.0-only
+
+ ## Links
+
+ - [GitHub Repository](https://github.com/futpib/parser)
+ - [npm Package](https://www.npmjs.com/package/@futpib/parser)
+ - [Issue Tracker](https://github.com/futpib/parser/issues)