meta-parser-generator 1.0.4 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,26 +4,43 @@
  npm install meta-parser-generator
  ```

- Meta Parser Generator will help you generate an efficient parser using a grammar and a token definition.
- Meta programming is used to generate a single self contained parser file.
-
- This code has been extracted from https://github.com/batiste/blop-language
+ Meta Parser Generator will help you generate an efficient parser using a grammar and a token definition.
+ Meta programming is used to generate a single self-contained parser file.

  ## Characteristics

- * LL parser (Left to Right parser), arbitrary look ahead
- * Direct Left recursion support (no indirect)
+ * PEG parser (Parsing Expression Grammar) with ordered choice
+ * Packrat parsing with memoization for linear time complexity
+ * Direct left recursion support using Guido van Rossum's algorithm
  * Parser code is generated from a grammar
- * Good parsing performance (provided your grammar is efficient)
- * Decent error reporting on parsing error
- * Memoization
- * Small source code (~500 lines of code), no dependencies
+ * Good parsing performance (O(n) with memoization)
+ * Excellent error reporting with context
+ * Small source code (~600 lines of code), no dependencies
+
+ ### Important: Grammar Order Matters
+
+ Unlike LL or LR parsers, PEG parsers use **ordered choice**: alternatives are tried in order and the first one that matches is selected. Once an alternative succeeds, the parser commits to it and never revisits the later alternatives. This means:
+
+ ```javascript
+ // This grammar will NEVER match 'number' because 'name' matches first!
+ 'VALUE': [
+   ['name'],   // matches ANY identifier including '123abc'
+   ['number'], // NEVER reached if name is defined as /^[\w]+/
+ ]
+
+ // Correct order: more specific rules first
+ 'VALUE': [
+   ['number'], // try number first
+   ['name'],   // then try name
+ ]
+ ```
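The ordered-choice behaviour above can be demonstrated in a few lines of standalone JavaScript. This is an illustrative sketch, not this package's API; the `name` and `number` regexes are assumptions matching the example:

```javascript
// Standalone sketch of PEG ordered choice (illustrative; not this package's API).
// Alternatives are tried in order; the first match wins at a given position.
const tokens = {
  name: /^[\w]+/,    // matches any identifier, including '123abc'
  number: /^[0-9]+/,
};

function orderedChoice(input, alternatives) {
  for (const tokenName of alternatives) {
    const match = input.match(tokens[tokenName]);
    if (match) return { type: tokenName, value: match[0] };
  }
  return null;
}

// With 'name' listed first, 'number' is unreachable:
console.log(orderedChoice('123', ['name', 'number']).type); // 'name'
// Listing the more specific alternative first fixes it:
console.log(orderedChoice('123', ['number', 'name']).type); // 'number'
```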

  ## How to generate and use a parser

  This will generate a mathematical operation parser

  ```javascript
+ // generator.js
  const { generateParser } = require('meta-parser-generator');
  const path = require('path');

@@ -53,7 +70,13 @@ const grammar = {
    ['number'],
  ],
};
+ ```
+
+ Then execute this script: `node generate.js`

+ ```javascript
+ // generate.js
+ const { generateParser } = require('meta-parser-generator');
+ const path = require('path');
+ const { tokensDefinition, grammar } = require('./generator');
  // this generates the executable parser file
  generateParser(grammar, tokensDefinition, path.resolve(__dirname, './parser.js'));
  console.log('parser generated');
@@ -79,41 +102,49 @@ let ast = parse('9+10-190.3');
  console.log(ast)
  ```

- ### How does generated parser works?
+ ### How does the generated parser work?
+
+ Each grammar rule you write is transformed into a function, and those grammar functions call each other until the input parsing is successful. The parser uses:

- Each grammar rule you write is transformed into a function, and those grammar functions call each other until the input parsing is sucessful. Therefor the JavaScript call stack is used by the generated parser. So if you design a very recursive grammar, you might trigger a "Maximum call stack size exceeded" error for a large input.
+ 1. **PEG ordered choice**: for each rule with multiple alternatives, tries them in order and returns the first match
+ 2. **Packrat parsing**: memoization prevents re-parsing the same position, guaranteeing O(n) time complexity
+ 3. **Left recursion handling**: uses a special memoization strategy based on Guido van Rossum's algorithm

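To make the packrat point concrete, here is a minimal sketch of per-position memoization, simplified from the `memoize` helper the generator emits. The `NUMBER` rule and token stream below are made up for the demonstration:

```javascript
// Packrat memoization sketch, simplified from the generator's emitted helper.
// Results (including failures) are cached per (rule, position) pair, so each
// rule body runs at most once per input position.
const cache = {};

function memoize(name, parseFn) {
  return function (stream, index) {
    const key = `${name}-${index}`;
    if (cache[key] !== undefined) return cache[key]; // cache hit: no re-parse
    const value = parseFn(stream, index);
    cache[key] = value; // failed attempts (false) are cached too
    return value;
  };
}

let calls = 0;
const NUMBER = memoize('NUMBER', (stream, index) => {
  calls += 1;
  if (stream[index] && stream[index].type === 'number') {
    return { type: 'NUMBER', last_index: index + 1 };
  }
  return false;
});

const stream = [{ type: 'number', value: '9' }];
NUMBER(stream, 0);
NUMBER(stream, 0); // answered from the cache
console.log(calls); // 1
```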
- In our example `MATH` grammar rule above you have a left recursion. It means you can parse expressions such as 1+2+3+4+5+...X, where X is the maximum stack size of V8.
+ The JavaScript call stack is used by the generated parser. So if you design a very recursive grammar, you might trigger a "Maximum call stack size exceeded" error for a large input.

- To know the default maximum stack size of V8 you can run `node --v8-options | grep stack-size`. If the default size is not enough for your grammar, use this option to extend the size. You can also try to rewrite your grammar in order to be less recursive.
+ In our example, the `MATH` grammar rule has left recursion, allowing you to parse expressions like 1+2+3+4+5+...X, where X is limited by V8's stack size.

- Anything that can be handled by a modifier rather than recursion will not use the call stack and should be preffered.
+ To find out the default maximum stack size of V8, run `node --v8-options | grep stack-size`. If the default size is not enough, you can extend it or rewrite your grammar.
+
+ **Best practice**: use modifiers (`*`, `+`, `?`) instead of recursion when possible; they don't use the call stack and handle large inputs better.
+
+ **Note**: for very large files, the memoization cache can grow significantly. The parser clears the cache between parse calls, but memory usage during parsing is proportional to input size × grammar complexity.

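The left-recursion handling mentioned above can also be sketched in isolation. The loop below mirrors the seed-growing strategy of the emitted `memoize_left_recur` helper; the toy `MATH` rule and token stream are assumptions for the demo, not the generated parser's API:

```javascript
// Seed-growing sketch of direct left recursion handling, simplified from the
// generator's emitted memoize_left_recur helper (based on Guido van Rossum's
// algorithm). The rule is primed with a failure, then re-run while each pass
// consumes more tokens than the last.
const cacheR = {};

function memoizeLeftRecur(name, parseFn) {
  return function (stream, index) {
    const key = `${name}-${index}`;
    if (cacheR[key] !== undefined) return cacheR[key];
    cacheR[key] = false; // seed: the first recursive call fails
    let lastValue = false;
    let lastPos = -1;
    while (true) {
      const value = parseFn(stream, index);
      if (!value || value.last_index <= lastPos) break; // stopped growing
      lastPos = value.last_index;
      lastValue = value;
      cacheR[key] = value; // the next recursive call sees a bigger seed
    }
    return lastValue;
  };
}

// Toy left-recursive rule: MATH: [['MATH', 'op', 'num'], ['num']]
const MATH = memoizeLeftRecur('MATH', (stream, i) => {
  const left = MATH(stream, i); // left-recursive alternative first
  if (left
      && stream[left.last_index] && stream[left.last_index].type === 'op'
      && stream[left.last_index + 1] && stream[left.last_index + 1].type === 'num') {
    return { type: 'MATH', last_index: left.last_index + 2 };
  }
  if (stream[i] && stream[i].type === 'num') {
    return { type: 'MATH', last_index: i + 1 };
  }
  return false;
});

// Tokens for "1+2+3":
const stream = [
  { type: 'num' }, { type: 'op' }, { type: 'num' }, { type: 'op' }, { type: 'num' },
];
console.log(MATH(stream, 0).last_index); // 5: all five tokens consumed
```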
  ### AST interface

  ```typescript
  type ASTNode = RuleNode | Token

- interface RuleNode {
-   stream_index: number // position of the first token for this rule in the token stream
-   type: str // name of the rule
-   subRule: number // index of this grammar rule in the subrule array
-   children: [ASTNode] // list of children
-   named: { [key: string]: ASTNode; } // named elements withing this rule, see named aliases
+ export interface RuleNode {
+   stream_index: number // position of the first token for this rule in the token stream
+   type: string // name of the rule
+   sub_rule_index: number // index of this alternative in the rule's list of sub-rules
+   children: [ASTNode] // list of children
+   named: { [key: string]: ASTNode; } // named elements within this rule, see named aliases
  }
  ```

  At the leaf of the AST you will find the final tokens. They have a slightly different interface:

  ```typescript
- interface Token {
-   stream_index: number // position of the token in the token stream
-   type: str // name of token
-   value: str // the value of the token
-   len: number // shortcut for value.length
-   lineStart: number // line start position in the input
-   columnStart: number // column start position in the input
-   start: number // character start position in the input
+ export interface Token {
+   stream_index: number // position of the token in the token stream
+   type: string // name of the token
+   value: string // the value of the token
+   len: number // shortcut for value.length
+   line_start: number // line start position in the input
+   column_start: number // column start position in the input
+   start: number // character start position in the input
  }
  ```

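As a quick illustration of walking nodes with this shape, the sketch below collects the leaf token values from a hand-built tree. The node values are made up for the example; a real tree comes from `parse()`:

```javascript
// Collect the leaf token values of an AST with the interfaces above.
// A Token carries a `value`; a RuleNode carries `children` to recurse into.
function collectTokens(node, out = []) {
  if (node.value !== undefined) {
    out.push(node.value); // leaf: Token
  } else {
    node.children.forEach((child) => collectTokens(child, out));
  }
  return out;
}

// Hand-built node for illustration only:
const ast = {
  type: 'MATH', stream_index: 0, sub_rule_index: 0, named: {},
  children: [
    { type: 'number', value: '9', stream_index: 0, len: 1, line_start: 0, column_start: 0, start: 0 },
    { type: 'math_operator', value: '+', stream_index: 1, len: 1, line_start: 0, column_start: 1, start: 1 },
    { type: 'number', value: '10', stream_index: 2, len: 2, line_start: 0, column_start: 2, start: 2 },
  ],
};

console.log(collectTokens(ast).join('')); // '9+10'
```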
@@ -139,8 +170,9 @@ To facilitate your work with the AST, you can name a rule or a token using a col

  ```typescript
  'MATH': [
-   ['MATH', 'math_operator:operator', 'number:num'], // math_operator and number token are named
-   ['number:num'], // here only number is named
+   ['MATH', 'math_operator:operator', 'number:num'], // the math_operator and number tokens
+                                                     // are named operator and num
+   ['number:num'], // here only number is named, as num
  ]
  ```

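With those aliases, matched elements can be read off `node.named` instead of being hunted down by position in `node.children`. A sketch with a hand-built node (illustrative values only):

```javascript
// Reading aliased children from node.named (node is hand-built for illustration).
const node = {
  type: 'MATH', stream_index: 0, sub_rule_index: 0, children: [],
  named: {
    operator: { type: 'math_operator', value: '+' },
    num: { type: 'number', value: '10' },
  },
};

// Aliases avoid positional lookups in node.children:
console.log(node.named.operator.value); // '+'
console.log(node.named.num.value);      // '10'
```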
@@ -168,3 +200,7 @@ The util function `displayError` will display detailed informations about a toke

  is based on the first grammar rule found that consumes the most tokens from the stream.

  <img src="/error.png" width="800">
+
+ ## Projects using this parser
+
+ * The Blop language: https://github.com/batiste/blop-language

package/dist/metaParserGenerator.d.ts ADDED
@@ -0,0 +1,22 @@
+ import { TokensDefinition, Grammar } from './types';
+ /**
+  * Generates the tokenizer function code as an array of strings
+  * @param tokenDef - Token definitions mapping token names to patterns
+  * @returns Array of code lines for the tokenizer
+  */
+ export declare function generateTokenizer(tokenDef: TokensDefinition): string[];
+ /**
+  * Generates the complete parser code from grammar and token definitions
+  * @param grammar - Grammar rules defining the language structure
+  * @param tokensDef - Token definitions mapping token names to patterns
+  * @param debug - Whether to include debug logging in generated code
+  * @returns Array of code lines for the complete parser
+  */
+ export declare function generate(grammar: Grammar, tokensDef: TokensDefinition, debug: boolean): string[];
+ /**
+  * Generates a parser file from grammar and token definitions
+  * @param grammar - Grammar rules defining the language structure
+  * @param tokensDefinition - Token definitions mapping token names to patterns
+  * @param filename - Output path for the generated parser file
+  */
+ export declare function generateParser(grammar: Grammar, tokensDefinition: TokensDefinition, filename: string): void;

package/dist/metaParserGenerator.js ADDED
@@ -0,0 +1,377 @@
+ "use strict";
+ var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+     if (k2 === undefined) k2 = k;
+     var desc = Object.getOwnPropertyDescriptor(m, k);
+     if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+         desc = { enumerable: true, get: function() { return m[k]; } };
+     }
+     Object.defineProperty(o, k2, desc);
+ }) : (function(o, m, k, k2) {
+     if (k2 === undefined) k2 = k;
+     o[k2] = m[k];
+ }));
+ var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+     Object.defineProperty(o, "default", { enumerable: true, value: v });
+ }) : function(o, v) {
+     o["default"] = v;
+ });
+ var __importStar = (this && this.__importStar) || function (mod) {
+     if (mod && mod.__esModule) return mod;
+     var result = {};
+     if (mod != null) for (var k in mod) if (k !== "default" && Object.prototype.hasOwnProperty.call(mod, k)) __createBinding(result, mod, k);
+     __setModuleDefault(result, mod);
+     return result;
+ };
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.generateParser = exports.generate = exports.generateTokenizer = void 0;
+ const fs = __importStar(require("fs"));
+ const utils_1 = require("./utils");
+ const recordFailure = `
+ let best_failure;
+ let best_failure_array = [];
+ let best_failure_index = 0;
+
+ // Records parsing failures at the deepest position reached
+ // Collects all failures at the same position to potentially show "expected: X, Y, or Z"
+ function record_failure(failure, i) {
+   // New deepest position reached - reset tracking
+   if (i > best_failure_index) {
+     best_failure_array = [];
+     best_failure = null;
+     best_failure_index = i;
+   }
+   // Record this failure
+   best_failure_array.push(failure);
+   // Keep first failure as primary for error messages
+   if (!best_failure) {
+     best_failure = failure;
+   }
+ }
+
+
+ // Memoization cache for regular rules
+ // Note: For very large inputs, this cache can grow to O(n * m) where:
+ //   n = input size, m = number of grammar rules
+ // Cache is cleared between parse() calls to prevent memory leaks
+ let cache = {};
+
+ function memoize(name, func) {
+   return function memoize_inner(stream, index) {
+     const key = \`\${name}-\${index}\`;
+     let value = cache[key];
+     if (value !== undefined) {
+       return value;
+     }
+     value = func(stream, index);
+     cache[key] = value;
+     return value;
+   };
+ }
+
+ // Separate cache for left-recursive rules
+ let cacheR = {};
+
+ // based on https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1
+ function memoize_left_recur(name, func) {
+   return function memoize_inner(stream, index) {
+     const key = \`\${name}-\${index}\`;
+     let value = cacheR[key];
+     if (value !== undefined) {
+       return value;
+     }
+     // prime this rule with a failure
+     cacheR[key] = false;
+     let lastpos;
+     let lastvalue = value;
+     while (true) {
+       value = func(stream, index);
+       if (!value) break;
+       if (value.last_index <= lastpos) break;
+       lastpos = value.last_index;
+       lastvalue = value;
+       cacheR[key] = value;
+     }
+     return lastvalue;
+   };
+ }
+
+ `;
+ /**
+  * Generates the tokenizer function code as an array of strings
+  * @param tokenDef - Token definitions mapping token names to patterns
+  * @returns Array of code lines for the tokenizer
+  */
+ function generateTokenizer(tokenDef) {
+     const output = [];
+     const keys = Object.keys(tokenDef);
+     for (let i = 0; i < keys.length; i++) {
+         const key = keys[i];
+         if ((/:|\?/g).test(key)) {
+             throw new Error('Reserved word in token name');
+         }
+     }
+     output.push('function _tokenize(tokenDef, input, char, stream) {');
+     output.push(' let match;');
+     let key;
+     for (let i = 0; i < keys.length; i++) {
+         key = keys[i];
+         const token = tokenDef[key];
+         if (token.str) {
+             const strLen = token.str.length;
+             if (token.str.indexOf("'") > -1 || token.str.indexOf('\n') > -1) {
+                 output.push(` if (input.substr(char, ${strLen}) === \`${token.str}\`) {`);
+                 output.push(` return [\`${token.str}\`, '${key}'];`);
+             }
+             else {
+                 output.push(` if (input.substr(char, ${strLen}) === '${token.str}') {`);
+                 output.push(` return ['${token.str}', '${key}'];`);
+             }
+             output.push(' }');
+         }
+         else if (token.reg) {
+             output.push(` match = input.substring(char).match(tokenDef.${key}.reg);`);
+             output.push(' if (match !== null) {');
+             output.push(` return [match[0], '${key}'];`);
+             output.push(' }');
+         }
+         else if (token.func) {
+             output.push(` match = tokenDef.${key}.func(input.substring(char), stream);`);
+             output.push(' if (match !== undefined) {');
+             output.push(` return [match, '${key}'];`);
+             output.push(' }');
+         }
+         else {
+             throw new Error(`Tokenizer error: Invalid token ${key} without a reg, str or func property`);
+         }
+     }
+     output.push(` return [null, '${key}'];`);
+     output.push('}');
+     output.push('function tokenize(tokenDef, input) {');
+     output.push(` const stream = [];
+   const originalInput = input;
+   let lastToken;
+   let key;
+   let candidate = null;
+   const len = input.length;
+   let char = 0;
+   let index = 0;
+   let line = 0;
+   let column = 0;
+   while (char < len) {
+     [candidate, key] = _tokenize(tokenDef, originalInput, char, stream);
+     if (candidate !== null) {
+       const candidateLen = candidate.length;
+       lastToken = {
+         type: key,
+         value: candidate,
+         start: char,
+         stream_index: index,
+         len: candidateLen,
+         line_start: line,
+         column_start: column,
+       };
+       // Only split if there might be newlines (optimization)
+       if (candidate.indexOf('\\n') !== -1) {
+         const lines = candidate.split('\\n');
+         line += lines.length - 1;
+         column = lines[lines.length - 1].length;
+       } else {
+         column += candidateLen;
+       }
+       lastToken.lineEnd = line;
+       lastToken.columnEnd = column;
+       stream.push(lastToken);
+       index++;
+       char += candidateLen;
+     } else {
+       if (stream.length === 0) {
+         throw new Error('Tokenizer error: total match failure');
+       }
+       if (lastToken) {
+         lastToken.pointer += lastToken.len;
+       }
+       let msg = \`Tokenizer error, no matching token found for \${originalInput.slice(char, char + 26)}\`;
+       if (lastToken) {
+         msg += \` After token of type \${lastToken.type}: \${lastToken.value}\`;
+       }
+       const error = new Error(msg);
+       error.token = lastToken;
+       throw error;
+     }
+   }
+   stream.push({
+     type: 'EOS', value: '<End Of Stream>', char, index,
+   });
+   return stream;
+ }
+ `);
+     return output;
+ }
+ exports.generateTokenizer = generateTokenizer;
+ /**
+  * Generates code for a specific grammar rule alternative
+  * @param name - Name of the grammar rule
+  * @param index - Index of this alternative in the rule
+  * @param ruleItems - Array of rule items (tokens and sub-rules) in this alternative
+  * @param tokensDef - Token definitions
+  * @param debug - Whether to include debug logging
+  * @returns Array of code lines for this rule function
+  */
+ function generatesub_rule_index(name, index, ruleItems, tokensDef, debug) {
+     const output = [];
+     output.push(`let ${name}_${index} = (stream, index) => {`);
+     let i = 0;
+     output.push(' let i = index;');
+     output.push(' const children = [];');
+     output.push(' const named = {};');
+     output.push(` const node = {
+   children, stream_index: index, name: '${name}',
+   sub_rule_index: ${index}, type: '${name}', named,
+ };`);
+     ruleItems.forEach((rule) => {
+         // terminal rule
+         if (tokensDef[rule.value] || rule.value === 'EOS') {
+             debug ? output.push(' console.log(i, stream[i])') : null;
+             if (rule.repeatable) {
+                 output.push(` while(stream[i].type === '${rule.value}') {`);
+                 if (rule.alias) {
+                     output.push(` named['${rule.alias}'] ? null : named['${rule.alias}'] = []`);
+                     output.push(` named['${rule.alias}'].push(stream[i])`);
+                 }
+                 output.push(' children.push(stream[i]); i++;');
+                 output.push(' }');
+             }
+             else if (rule.optional) {
+                 output.push(` if (stream[i].type === '${rule.value}') {`);
+                 rule.alias ? output.push(` named['${rule.alias}'] = stream[i];`) : null;
+                 output.push(' children.push(stream[i]); i++;');
+                 output.push(' }');
+             }
+             else {
+                 output.push(`
+   if (stream[i].type !== '${rule.value}') {
+     if (i >= best_failure_index) {
+       const failure = {
+         type: '${name}', sub_rule_index: ${index},
+         sub_rule_stream_index: i - index, sub_rule_token_index: ${i},
+         stream_index: i, token: stream[i], first_token: stream[index], success: false,
+       };
+       record_failure(failure, i);
+     }
+     return false;
+   }
+ `);
+                 rule.alias ? output.push(` named['${rule.alias}'] = stream[i];`) : null;
+                 output.push(' children.push(stream[i]); i++;');
+             }
+             i++;
+             // calling another rule in the grammar
+         }
+         else {
+             if (rule.function) {
+                 output.push(` if (!(${rule.value})(node)) { return false; }`);
+             }
+             else if (rule.repeatable) {
+                 output.push(` let _rule_${i} = ${rule.value}(stream, i);`); // doing the call
+                 output.push(` while (_rule_${i}) {`);
+                 if (rule.alias) {
+                     output.push(` named['${rule.alias}'] ? null : named['${rule.alias}'] = [];`);
+                     output.push(` named['${rule.alias}'].push(_rule_${i});`);
+                 }
+                 output.push(` children.push(_rule_${i});`);
+                 output.push(` i = _rule_${i}.last_index;`);
+                 output.push(` _rule_${i} = ${rule.value}(stream, i);`);
+                 output.push(' }');
+             }
+             else if (!rule.optional) {
+                 output.push(` const _rule_${i} = ${rule.value}(stream, i);`); // doing the call
+                 output.push(` if (!_rule_${i}) return false;`);
+                 rule.alias ? output.push(` named['${rule.alias}'] = _rule_${i};`) : null;
+                 output.push(` children.push(_rule_${i});`);
+                 output.push(` i = _rule_${i}.last_index;`);
+             }
+             else {
+                 output.push(` const _rule_${i} = ${rule.value}(stream, i);`); // doing the call
+                 output.push(` if (_rule_${i}) {`);
+                 output.push(` children.push(_rule_${i});`);
+                 rule.alias ? output.push(` named['${rule.alias}'] = _rule_${i};`) : null;
+                 output.push(` i = _rule_${i}.last_index;`);
+                 output.push(' }');
+             }
+             i++;
+         }
+     });
+     output.push(' node.success = i === stream.length; node.last_index = i;');
+     output.push(' return node;');
+     output.push('};');
+     if (ruleItems[0].leftRecursion) {
+         output.push(`${name}_${index} = memoize_left_recur('${name}_${index}', ${name}_${index});`);
+     }
+     else {
+         output.push(`${name}_${index} = memoize('${name}_${index}', ${name}_${index});`);
+     }
+     output.push('\n');
+     return output;
+ }
+ /**
+  * Generates the complete parser code from grammar and token definitions
+  * @param grammar - Grammar rules defining the language structure
+  * @param tokensDef - Token definitions mapping token names to patterns
+  * @param debug - Whether to include debug logging in generated code
+  * @returns Array of code lines for the complete parser
+  */
+ function generate(grammar, tokensDef, debug) {
+     let output = [];
+     (0, utils_1.checkGrammarAndTokens)(grammar, tokensDef);
+     const newGrammar = (0, utils_1.preprocessGrammar)(grammar);
+     const entries = Object.keys(newGrammar);
+     output.push('// This code is automatically generated by the meta parser, do not modify');
+     output.push('// produced with metaParserGenerator.js');
+     output.push(recordFailure);
+     entries.forEach((key) => {
+         let i = 0;
+         const metaSub = [];
+         newGrammar[key].forEach((ruleItems) => {
+             output = output.concat(generatesub_rule_index(key, i, ruleItems, tokensDef, debug));
+             metaSub.push(`${key}_${i}`);
+             i++;
+         });
+         output.push(`function ${key}(stream, index) {`);
+         const st = metaSub.map(sub => `${sub}(stream, index)`).join('\n || ');
+         output.push(` return ${st};`);
+         output.push('}');
+     });
+     output = output.concat(generateTokenizer(tokensDef));
+     output.push(`module.exports = {
+   parse: (stream) => {
+     best_failure = null;
+     best_failure_index = 0;
+     best_failure_array = [];
+     cache = {};
+     cacheR = {};
+     const result = START(stream, 0);
+     if (!result) {
+       return {
+         ...best_failure,
+         best_failure_array,
+       }
+     }
+     return result;
+   },
+   tokenize,
+ };
+ `);
+     return output;
+ }
+ exports.generate = generate;
+ /**
+  * Generates a parser file from grammar and token definitions
+  * @param grammar - Grammar rules defining the language structure
+  * @param tokensDefinition - Token definitions mapping token names to patterns
+  * @param filename - Output path for the generated parser file
+  */
+ function generateParser(grammar, tokensDefinition, filename) {
+     fs.writeFileSync(filename, generate(grammar, tokensDefinition, false).join('\n'));
+ }
+ exports.generateParser = generateParser;
+ //# sourceMappingURL=metaParserGenerator.js.map

package/dist/metaParserGenerator.js.map ADDED
@@ -0,0 +1 @@
+ {"version":3,"file":"metaParserGenerator.js","sourceRoot":"","sources":["../metaParserGenerator.ts"],"names":[],"mappings":";;;;;;;;;;;;;;;;;;;;;;;;;;AACA,uCAAyB;AACzB,mCAAmE;AAInE,MAAM,aAAa,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAqErB,CAAC;AAEF;;;;GAIG;AACH,SAAgB,iBAAiB,CAAC,QAA0B;IAC1D,MAAM,MAAM,GAAa,EAAE,CAAC;IAC5B,MAAM,IAAI,GAAG,MAAM,CAAC,IAAI,CAAC,QAAQ,CAAC,CAAC;IACnC,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE;QACpC,MAAM,GAAG,GAAG,IAAI,CAAC,CAAC,CAAC,CAAC;QACpB,IAAI,CAAC,OAAO,CAAC,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE;YACvB,MAAM,IAAI,KAAK,CAAC,6BAA6B,CAAC,CAAC;SAChD;KACF;IAED,MAAM,CAAC,IAAI,CAAC,qDAAqD,CAAC,CAAC;IACnE,MAAM,CAAC,IAAI,CAAC,cAAc,CAAC,CAAC;IAC5B,IAAI,GAAG,CAAC;IACR,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE;QACpC,GAAG,GAAG,IAAI,CAAC,CAAC,CAAC,CAAC;QACd,MAAM,KAAK,GAAG,QAAQ,CAAC,GAAG,CAAC,CAAC;QAC5B,IAAI,KAAK,CAAC,GAAG,EAAE;YACb,MAAM,MAAM,GAAG,KAAK,CAAC,GAAG,CAAC,MAAM,CAAC;YAChC,IAAI,KAAK,CAAC,GAAG,CAAC,OAAO,CAAC,GAAG,CAAC,GAAG,CAAC,CAAC,IAAI,KAAK,CAAC,GAAG,CAAC,OAAO,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC,EAAE;gBAC/D,MAAM,CAAC,IAAI,CAAC,4BAA4B,MAAM,WAAW,KAAK,CAAC,GAAG,OAAO,CAAC,CAAC;gBAC3E,MAAM,CAAC,IAAI,CAAC,iBAAiB,KAAK,CAAC,GAAG,QAAQ,GAAG,KAAK,CAAC,CAAC;aACzD;iBAAM;gBACL,MAAM,CAAC,IAAI,CAAC,4BAA4B,MAAM,UAAU,KAAK,CAAC,GAAG,MAAM,CAAC,CAAC;gBACzE,MAAM,CAAC,IAAI,CAAC,gBAAgB,KAAK,CAAC,GAAG,OAAO,GAAG,KAAK,CAAC,CAAC;aACvD;YACD,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;SACpB;aAAM,IAAI,KAAK,CAAC,GAAG,EAAE;YACpB,MAAM,CAAC,IAAI,CAAC,kDAAkD,GAAG,QAAQ,CAAC,CAAC;YAC3E,MAAM,CAAC,IAAI,CAAC,yBAAyB,CAAC,CAAC;YACvC,MAAM,CAAC,IAAI,CAAC,0BAA0B,GAAG,KAAK,CAAC,CAAC;YAChD,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;SACpB;aAAM,IAAI,KAAK,CAAC,IAAI,EAAE;YACrB,MAAM,CAAC,IAAI,CAAC,sBAAsB,GAAG,uCAAuC,CAAC,CAAC;YAC9E,MAAM,CAAC,IAAI,CAAC,8BAA8B,CAAC,CAAC;YAC5C,MAAM,CAAC,IAAI,CAAC,uBAAuB,GAAG,KAAK,CAAC,CAAC;YAC7C,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;SACpB;aAAM;YACL,MAAM,IAAI,KAAK,CAAC,kCAAkC,GAAG,sCAAsC,CAAC,CAAC;SA
C9F;KACF;IACD,MAAM,CAAC,IAAI,CAAC,oBAAoB,GAAG,KAAK,CAAC,CAAC;IAC1C,MAAM,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;IAEjB,MAAM,CAAC,IAAI,CAAC,sCAAsC,CAAC,CAAC;IACpD,MAAM,CAAC,IAAI,CAAC;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAyDb,CAAC,CAAC;IACD,OAAO,MAAM,CAAC;AAChB,CAAC;AAvGD,8CAuGC;AAED;;;;;;;;GAQG;AACH,SAAS,sBAAsB,CAC7B,IAAY,EACZ,KAAa,EACb,SAA0B,EAC1B,SAA2B,EAC3B,KAAc;IAEd,MAAM,MAAM,GAAa,EAAE,CAAC;IAC5B,MAAM,CAAC,IAAI,CAAC,OAAO,IAAI,IAAI,KAAK,yBAAyB,CAAC,CAAC;IAC3D,IAAI,CAAC,GAAG,CAAC,CAAC;IACV,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,CAAC;IAChC,MAAM,CAAC,IAAI,CAAC,wBAAwB,CAAC,CAAC;IACtC,MAAM,CAAC,IAAI,CAAC,qBAAqB,CAAC,CAAC;IACnC,MAAM,CAAC,IAAI,CAAC;4CAC8B,IAAI;sBAC1B,KAAK,YAAY,IAAI;KACtC,CAAC,CAAC;IACL,SAAS,CAAC,OAAO,CAAC,CAAC,IAAI,EAAE,EAAE;QACzB,gBAAgB;QAChB,IAAI,SAAS,CAAC,IAAI,CAAC,KAAK,CAAC,IAAI,IAAI,CAAC,KAAK,KAAK,KAAK,EAAE;YACjD,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC,IAAI,CAAC,6BAA6B,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;YAC1D,IAAI,IAAI,CAAC,UAAU,EAAE;gBACnB,MAAM,CAAC,IAAI,CAAC,+BAA+B,IAAI,CAAC,KAAK,MAAM,CAAC,CAAC;gBAC7D,IAAI,IAAI,CAAC,KAAK,EAAE;oBACd,MAAM,CAAC,IAAI,CAAC,cAAc,IAAI,CAAC,KAAK,sBAAsB,IAAI,CAAC,KAAK,SAAS,CAAC,CAAC;oBAC/E,MAAM,CAAC,IAAI,CAAC,cAAc,IAAI,CAAC,KAAK,oBAAoB,CAAC,CAAC;iBAC3D;gBACD,MAAM,CAAC,IAAI,CAAC,oCAAoC,CAAC,CAAC;gBAClD,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;aACpB;iBAAM,IAAI,IAAI,CAAC,QAAQ,EAAE;gBACxB,MAAM,CAAC,IAAI,CAAC,6BAA6B,IAAI,CAAC,KAAK,MAAM,CAAC,CAAC;gBAC3D,IAAI,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC,IAAI,CAAC,cAAc,IAAI,CAAC,KAAK,iBAAiB,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;gBAC3E,MAAM,CAAC,IAAI,CAAC,oCAAoC,CAAC,CAAC;gBAClD,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;aACpB;iBAAM;gBACL,MAAM,CAAC,IAAI,CAAC;4BACQ,IAAI,CAAC,KAAK;;;iBAGrB,IAAI,sBAAsB,KAAK;kEACkB,CAAC;;;;;;;CAOlE,CAAC,CAAC;gBACK,IAAI,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC,IAAI,CAAC,YAAY,IAAI,CAAC,KAAK,iBAAiB,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;gBACzE,MAAM,CAAC,IAAI,CAAC,kCAAkC,CAAC,CAAC;aACjD;YACD,CAAC,EAAE,CAAC;YACN,sCAAsC;SACrC;aAAM;YACL,IAAI,IAAI,CAAC,QAAQ,EAAE;gBACjB,MAAM,CAAC,IAAI,CAAC,WAAW,IAAI,CAAC,KAAK,4BAA4B,CAA
C,CAAC;aAChE;iBAAM,IAAI,IAAI,CAAC,UAAU,EAAE;gBAC1B,MAAM,CAAC,IAAI,CAAC,eAAe,CAAC,MAAM,IAAI,CAAC,KAAK,cAAc,CAAC,CAAC,CAAC,iBAAiB;gBAC9E,MAAM,CAAC,IAAI,CAAC,kBAAkB,CAAC,KAAK,CAAC,CAAC;gBACtC,IAAI,IAAI,CAAC,KAAK,EAAE;oBACd,MAAM,CAAC,IAAI,CAAC,cAAc,IAAI,CAAC,KAAK,sBAAsB,IAAI,CAAC,KAAK,UAAU,CAAC,CAAC;oBAChF,MAAM,CAAC,IAAI,CAAC,cAAc,IAAI,CAAC,KAAK,iBAAiB,CAAC,IAAI,CAAC,CAAC;iBAC7D;gBACD,MAAM,CAAC,IAAI,CAAC,2BAA2B,CAAC,IAAI,CAAC,CAAC;gBAC9C,MAAM,CAAC,IAAI,CAAC,iBAAiB,CAAC,cAAc,CAAC,CAAC;gBAC9C,MAAM,CAAC,IAAI,CAAC,aAAa,CAAC,MAAM,IAAI,CAAC,KAAK,cAAc,CAAC,CAAC;gBAC1D,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;aACpB;iBAAM,IAAI,CAAC,IAAI,CAAC,QAAQ,EAAE;gBACzB,MAAM,CAAC,IAAI,CAAC,iBAAiB,CAAC,MAAM,IAAI,CAAC,KAAK,cAAc,CAAC,CAAC,CAAC,iBAAiB;gBAChF,MAAM,CAAC,IAAI,CAAC,gBAAgB,CAAC,iBAAiB,CAAC,CAAC;gBAChD,IAAI,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC,IAAI,CAAC,YAAY,IAAI,CAAC,KAAK,cAAc,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;gBAC1E,MAAM,CAAC,IAAI,CAAC,yBAAyB,CAAC,IAAI,CAAC,CAAC;gBAC5C,MAAM,CAAC,IAAI,CAAC,eAAe,CAAC,cAAc,CAAC,CAAC;aAC7C;iBAAM;gBACL,MAAM,CAAC,IAAI,CAAC,iBAAiB,CAAC,MAAM,IAAI,CAAC,KAAK,cAAc,CAAC,CAAC,CAAC,iBAAiB;gBAChF,MAAM,CAAC,IAAI,CAAC,eAAe,CAAC,KAAK,CAAC,CAAC;gBACnC,MAAM,CAAC,IAAI,CAAC,2BAA2B,CAAC,IAAI,CAAC,CAAC;gBAC9C,IAAI,CAAC,KAAK,CAAC,CAAC,CAAC,MAAM,CAAC,IAAI,CAAC,cAAc,IAAI,CAAC,KAAK,cAAc,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,CAAC;gBAC5E,MAAM,CAAC,IAAI,CAAC,iBAAiB,CAAC,cAAc,CAAC,CAAC;gBAC9C,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;aACpB;YACD,CAAC,EAAE,CAAC;SACL;IACH,CAAC,CAAC,CAAC;IACH,MAAM,CAAC,IAAI,CAAC,4DAA4D,CAAC,CAAC;IAC1E,MAAM,CAAC,IAAI,CAAC,gBAAgB,CAAC,CAAC;IAC9B,MAAM,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;IAClB,IAAI,SAAS,CAAC,CAAC,CAAC,CAAC,aAAa,EAAE;QAC9B,MAAM,CAAC,IAAI,CAAC,GAAG,IAAI,IAAI,KAAK,0BAA0B,IAAI,IAAI,KAAK,MAAM,IAAI,IAAI,KAAK,IAAI,CAAC,CAAC;KAC7F;SAAM;QACL,MAAM,CAAC,IAAI,CAAC,GAAG,IAAI,IAAI,KAAK,eAAe,IAAI,IAAI,KAAK,MAAM,IAAI,IAAI,KAAK,IAAI,CAAC,CAAC;KAClF;IACD,MAAM,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;IAClB,OAAO,MAAM,CAAC;AAChB,CAAC;AAED;;;;;;GAMG;AACH,SAAgB,QAAQ,CAAC,OAAgB,EAAE,SAA2B,
EAAE,KAAc;IACpF,IAAI,MAAM,GAAa,EAAE,CAAC;IAC1B,IAAA,6BAAqB,EAAC,OAAO,EAAE,SAAS,CAAC,CAAC;IAC1C,MAAM,UAAU,GAAG,IAAA,yBAAiB,EAAC,OAAO,CAAC,CAAC;IAC9C,MAAM,OAAO,GAAG,MAAM,CAAC,IAAI,CAAC,UAAU,CAAC,CAAC;IACxC,MAAM,CAAC,IAAI,CAAC,2EAA2E,CAAC,CAAC;IACzF,MAAM,CAAC,IAAI,CAAC,yCAAyC,CAAC,CAAC;IACvD,MAAM,CAAC,IAAI,CAAC,aAAa,CAAC,CAAC;IAC3B,OAAO,CAAC,OAAO,CAAC,CAAC,GAAG,EAAE,EAAE;QACtB,IAAI,CAAC,GAAG,CAAC,CAAC;QACV,MAAM,OAAO,GAAa,EAAE,CAAC;QAC7B,UAAU,CAAC,GAAG,CAAC,CAAC,OAAO,CAAC,CAAC,SAAS,EAAE,EAAE;YACpC,MAAM,GAAG,MAAM,CAAC,MAAM,CAAC,sBAAsB,CAAC,GAAG,EAAE,CAAC,EAAE,SAAS,EAAE,SAAS,EAAE,KAAK,CAAC,CAAC,CAAC;YACpF,OAAO,CAAC,IAAI,CAAC,GAAG,GAAG,IAAI,CAAC,EAAE,CAAC,CAAC;YAC5B,CAAC,EAAE,CAAC;QACN,CAAC,CAAC,CAAC;QACH,MAAM,CAAC,IAAI,CAAC,YAAY,GAAG,mBAAmB,CAAC,CAAC;QAChD,MAAM,EAAE,GAAG,OAAO,CAAC,GAAG,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,GAAG,iBAAiB,CAAC,CAAC,IAAI,CAAC,WAAW,CAAC,CAAC;QACzE,MAAM,CAAC,IAAI,CAAC,YAAY,EAAE,GAAG,CAAC,CAAC;QAC/B,MAAM,CAAC,IAAI,CAAC,GAAG,CAAC,CAAC;IACnB,CAAC,CAAC,CAAC;IACH,MAAM,GAAG,MAAM,CAAC,MAAM,CAAC,iBAAiB,CAAC,SAAS,CAAC,CAAC,CAAC;IACrD,MAAM,CAAC,IAAI,CAAC;;;;;;;;;;;;;;;;;;CAkBb,CAAC,CAAC;IACD,OAAO,MAAM,CAAC;AAChB,CAAC;AA1CD,4BA0CC;AAED;;;;;GAKG;AACH,SAAgB,cAAc,CAAC,OAAgB,EAAE,gBAAkC,EAAE,QAAgB;IACnG,EAAE,CAAC,aAAa,CAAC,QAAQ,EACvB,QAAQ,CAAC,OAAO,EAAE,gBAAgB,EAAE,KAAK,CAAC,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC;AAC3D,CAAC;AAHD,wCAGC"}
@@ -0,0 +1,47 @@
+ export type ASTNode = RuleNode | Token;
+ export interface RuleNode {
+     stream_index: number;
+     type: string;
+     sub_rule_index: number;
+     children: [ASTNode];
+     named: {
+         [key: string]: ASTNode;
+     };
+ }
+ export interface Token {
+     stream_index: number;
+     type: string;
+     value: string;
+     len: number;
+     line_start: number;
+     column_start: number;
+     start: number;
+ }
+ export interface TokenDefinition {
+     str?: string;
+     reg?: RegExp;
+     func?: (input: string, stream: Token[]) => string | undefined;
+     verbose?: string;
+ }
+ export type TokensDefinition = Record<string, TokenDefinition>;
+ export type Grammar = Record<string, string[][]>;
+ export interface ProcessedRule {
+     value: string;
+     alias?: string;
+     optional: boolean;
+     repeatable: boolean;
+     leftRecursion: boolean;
+     function?: boolean;
+ }
+ export type ProcessedGrammar = Record<string, ProcessedRule[][]>;
+ export interface ParseFailure {
+     type: string;
+     sub_rule_index: number;
+     sub_rule_stream_index: number;
+     sub_rule_token_index: number;
+     stream_index: number;
+     token: Token;
+     first_token: Token;
+     success: false;
+     best_failure_array?: ParseFailure[];
+ }
package/dist/types.js ADDED
@@ -0,0 +1,3 @@
+ "use strict";
+ Object.defineProperty(exports, "__esModule", { value: true });
+ //# sourceMappingURL=types.js.map

package/dist/types.js.map ADDED
@@ -0,0 +1 @@
+ {"version":3,"file":"types.js","sourceRoot":"","sources":["../types.ts"],"names":[],"mappings":""}

package/dist/utils.d.ts ADDED
@@ -0,0 +1,29 @@
+ import { ASTNode, Token, TokensDefinition, Grammar, ProcessedGrammar, ParseFailure } from "./types";
+ declare function streamContext(token: Token, firstToken: Token, stream: Token[]): string;
+ /**
+  * Displays a formatted error message for parsing failures
+  * @param stream - Token stream
+  * @param tokensDefinition - Token definitions
+  * @param grammar - Grammar rules
+  * @param bestFailure - The parse failure with details
+  */
+ declare function displayError(stream: Token[], tokensDefinition: TokensDefinition, grammar: Grammar, bestFailure: ParseFailure): never;
+ /**
+  * Prints the AST tree structure to console
+  * @param node - Root AST node to print
+  * @param sp - Spacing/indentation string
+  */
+ declare function printTree(node: ASTNode, sp: string): void;
+ /**
+  * Validates that grammar and token definitions don't have overlapping keys
+  * @param grammar - Grammar rules
+  * @param tokensDefinition - Token definitions
+  */
+ declare function checkGrammarAndTokens(grammar: Grammar, tokensDefinition: TokensDefinition): void;
+ /**
+  * Preprocesses grammar rules to extract modifiers and aliases
+  * @param rules - Raw grammar rules
+  * @returns Processed grammar with parsed modifiers and metadata
+  */
+ declare function preprocessGrammar(rules: Grammar): ProcessedGrammar;
+ export { streamContext, preprocessGrammar, checkGrammarAndTokens, displayError, printTree, };