meta-parser-generator 1.0.3 → 1.0.4

Files changed (2)
  1. package/README.md +157 -6
  2. package/package.json +3 -3
package/README.md CHANGED
@@ -1,19 +1,170 @@
1
- # meta-parser-generator
1
+ # Meta Parser Generator
2
2
 
3
- Generate an efficient parser using a grammar and and token definition.
4
- Meta programming is used to generate an efficient output.
5
- The JavaScript call stack is used within the parser. So if you design a very recursive grammar you might trigger a "Maximum call stack size exceeded" error for a large input.
3
+ ```bash
4
+ npm install meta-parser-generator
5
+ ```
6
+
7
+ Meta Parser Generator will help you generate an efficient parser using a grammar and a token definition.
8
+ Meta programming is used to generate a single self-contained parser file.
6
9
 
7
10
  This code has been extracted from https://github.com/batiste/blop-language
8
11
 
9
- Characterisitcs
12
+ ## Characteristics
10
13
 
11
14
  * LL parser (Left to Right parser), arbitrary look ahead
12
15
  * Direct Left recursion support (no indirect)
13
16
  * Parser code is generated from a grammar
14
- * Good parsing performance
17
+ * Good parsing performance (provided your grammar is efficient)
15
18
  * Decent error reporting on parsing error
16
19
  * Memoization
17
20
  * Small source code (~500 lines of code), no dependencies
21
+
22
+ ## How to generate and use a parser
23
+
24
+ The following script generates a parser for mathematical operations:
25
+
26
+ ```javascript
27
+ const { generateParser } = require('meta-parser-generator');
28
+ const path = require('path');
29
+
30
+ // only 3 possible tokens
31
+ const tokensDefinition = {
32
+ 'number': { reg: /^[0-9]+(\.[0-9]*)?/ },
33
+ 'math_operator': { reg: /^(\+|-|\*|%)/ },
34
+ 'newline': { str: '\n' }
35
+ }
36
+
37
+ const grammar = {
38
+ // START is the convention keyword for the entry point of the grammar
39
+ 'START': [
40
+ // necessary so that the first line can be a MATH expression
41
+ ['MATH', 'LINE*', 'EOS'], // EOS is the End Of Stream token, added automatically by the tokenizer
42
+ // * is the repeating modifier {0,∞}. Better than recursion as it does not use the call stack
43
+ ['LINE*', 'EOS'],
44
+ ],
45
+ 'LINE': [
46
+ // we define a line as always starting with a newline
47
+ ['newline', 'MATH'],
48
+ ['newline'],
49
+ ],
50
+ 'MATH': [
51
+ // direct left recursion here
52
+ ['MATH', 'math_operator', 'number'],
53
+ ['number'],
54
+ ],
55
+ };
56
+
57
+ // this generates the executable parser file
58
+ generateParser(grammar, tokensDefinition, path.resolve(__dirname, './parser.js'));
59
+ console.log('parser generated');
60
+ ```
61
+
62
+ Then you can use the generated parser this way
63
+
64
+ ```javascript
65
+ const parser = require('./parser');
66
+ const { tokensDefinition, grammar } = require('./generator'); // assumes generator.js exports both
67
+ const { displayError } = require('meta-parser-generator');
68
+
69
+ function parse(input) {
70
+ const tokens = parser.tokenize(tokensDefinition, input);
71
+ const ast = parser.parse(tokens);
72
+ if (!ast.success) {
73
+ displayError(tokens, tokensDefinition, grammar, ast);
74
+ }
75
+ return ast;
76
+ }
77
+
78
+ let ast = parse('9+10-190.3');
79
+ console.log(ast)
80
+ ```
81
+
82
+ ### How does the generated parser work?
83
+
84
+ Each grammar rule you write is transformed into a function, and those grammar functions call each other until the input is parsed successfully. The generated parser therefore uses the JavaScript call stack, so if you design a very recursive grammar, you might trigger a "Maximum call stack size exceeded" error for a large input.
85
+
86
+ The `MATH` grammar rule in the example above is left recursive. It can parse expressions such as 1+2+3+4+5+...+X, where the number of terms is limited by the maximum stack size of V8.
87
+
88
+ To find the default maximum stack size of V8, run `node --v8-options | grep stack-size`. If the default size is not enough for your grammar, use the `--stack-size` option to increase it. You can also try to rewrite your grammar to be less recursive.
89
+
90
+ Anything that can be handled by a modifier rather than by recursion will not use the call stack and should be preferred.
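
As an illustration of the advice above, here is a minimal sketch of the example grammar with the left-recursive `MATH` rule rewritten to use the `*` modifier. The rule name `MATH_TAIL` is hypothetical and introduced only for this sketch; it relies on modifiers being applicable to rule names, as `LINE*` already does in the example.

```javascript
// Hypothetical rewrite of the example grammar: MATH no longer refers to itself.
// MATH_TAIL is a name invented for this sketch, not part of the package.
const iterativeGrammar = {
  'START': [
    ['MATH', 'LINE*', 'EOS'],
    ['LINE*', 'EOS'],
  ],
  'LINE': [
    ['newline', 'MATH'],
    ['newline'],
  ],
  'MATH': [
    // one number followed by zero or more (operator, number) pairs
    ['number', 'MATH_TAIL*'],
  ],
  'MATH_TAIL': [
    ['math_operator', 'number'],
  ],
};

module.exports = { iterativeGrammar };
```

The trade-off is the shape of the resulting tree: a `MATH` node would hold a flat sequence of `MATH_TAIL` children instead of a left-nested chain of `MATH` nodes, so code that walks the AST has to be adapted accordingly.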
91
+
92
+ ### AST interface
93
+
94
+ ```typescript
95
+ type ASTNode = RuleNode | Token
96
+
97
+ interface RuleNode {
98
+ stream_index: number // position of the first token for this rule in the token stream
99
+ type: string // name of the rule
100
+ subRule: number // index of this grammar rule in the subrule array
101
+ children: ASTNode[] // list of children
102
+ named: { [key: string]: ASTNode; } // named elements within this rule, see named aliases
103
+ }
104
+ ```
105
+
106
+ At the leaves of the AST you will find the final tokens. They have a slightly different interface:
107
+
108
+ ```typescript
109
+ interface Token {
110
+ stream_index: number // position of the token in the token stream
111
+ type: string // name of the token
112
+ value: string // the value of the token
113
+ len: number // shortcut for value.length
114
+ lineStart: number // line start position in the input
115
+ columnStart: number // column start position in the input
116
+ start: number // character start position in the input
117
+ }
118
+ ```
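
To show how the two node shapes can be told apart in practice, here is a minimal sketch that collects the leaf tokens of an AST. The function name `collectTokens` and the hand-built `sampleAst` are assumptions made for illustration; the sample only mimics a subset of the fields described above and is not actual parser output.

```javascript
// Collect the leaf tokens of an AST in stream order.
// Assumption: a node with a `children` array is a RuleNode, anything else is a Token.
function collectTokens(node, out = []) {
  if (Array.isArray(node.children)) {
    for (const child of node.children) {
      collectTokens(child, out);
    }
  } else {
    out.push(node);
  }
  return out;
}

// Hand-built AST for "9+10", shaped like the interfaces above (illustrative only)
const sampleAst = {
  type: 'MATH', stream_index: 0, subRule: 0, named: {},
  children: [
    {
      type: 'MATH', stream_index: 0, subRule: 1, named: {},
      children: [
        { type: 'number', value: '9', stream_index: 0, len: 1, start: 0 },
      ],
    },
    { type: 'math_operator', value: '+', stream_index: 1, len: 1, start: 1 },
    { type: 'number', value: '10', stream_index: 2, len: 2, start: 2 },
  ],
};

const text = collectTokens(sampleAst).map((token) => token.value).join('');
console.log(text); // prints "9+10"
```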
119
+
120
+ ### Modifiers
121
+
122
+ There are 3 modifiers you can add at the end of a rule or token:
123
+
124
+ ```
125
+ * is the {0,∞} repeating modifier
126
+ + is the {1,∞} repeating modifier
127
+ ? is the {0,1} conditional modifier
128
+ ```
129
+
130
+ #### Example
131
+
132
+ ```typescript
133
+ ['PREPOSITION', 'ADJECTIVE*', 'NAME']
134
+ ```
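
The bounds of the three modifiers can be sketched as a loop over the token stream. This is not the code that the generator emits: `matchWithModifier` is a hypothetical helper, it assumes greedy matching, and for simplicity it treats `PREPOSITION`, `ADJECTIVE` and `NAME` as plain token types.

```javascript
// Illustrative only: counts how many consecutive tokens of `type` start at `index`,
// then checks that count against the bounds of the modifier.
function matchWithModifier(tokens, index, type, modifier) {
  const bounds = { '*': [0, Infinity], '+': [1, Infinity], '?': [0, 1] };
  const [min, max] = bounds[modifier];
  let count = 0;
  while (
    count < max &&
    index + count < tokens.length &&
    tokens[index + count].type === type
  ) {
    count++;
  }
  // -1 signals that the element did not match
  return count >= min ? count : -1;
}

const stream = [
  { type: 'PREPOSITION' },
  { type: 'ADJECTIVE' },
  { type: 'ADJECTIVE' },
  { type: 'NAME' },
];

console.log(matchWithModifier(stream, 1, 'ADJECTIVE', '*')); // 2
console.log(matchWithModifier(stream, 3, 'ADJECTIVE', '*')); // 0
console.log(matchWithModifier(stream, 3, 'ADJECTIVE', '+')); // -1
console.log(matchWithModifier(stream, 1, 'ADJECTIVE', '?')); // 1
```

Because the repetition is a loop, the number of repeated elements does not add frames to the call stack.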
135
+
136
+ ### Named alias
137
+
138
+ To facilitate your work with the AST, you can name a rule or a token using a colon
139
+
140
+ ```typescript
141
+ 'MATH': [
142
+ ['MATH', 'math_operator:operator', 'number:num'], // math_operator and number token are named
143
+ ['number:num'], // here only number is named
144
+ ]
145
+ ```
146
+
147
+ Then in the corresponding `RuleNode` you will find the `math_operator` in the children, but also in the named object.
148
+ This makes it easier to handle and differentiate your grammar rules:
149
+
150
+ ```typescript
151
+ // a function that handles both MATH grammar rules defined above
152
+ function handle_MATH_node(node: RuleNode) {
153
+ const named = node.named
154
+ // if there is an operator, we are dealing with sub rule 0
155
+ if(named['operator']) {
156
+ const left_recursion = handle_MATH_node(node.children[0])
157
+ return `${left_recursion} ${named['operator'].value} ${named['num'].value}`
158
+ } else {
159
+ return named['num'].value
160
+ }
161
+ }
162
+ ```
163
+
164
+
165
+ ### Errors
166
+
167
+ The utility function `displayError` displays detailed information about a tokenizer or parsing error. The hint given
168
+ is based on the first grammar rule found that consumes the most tokens from the stream.
18
169
 
19
170
  <img src="/error.png" width="800">
package/package.json CHANGED
@@ -1,11 +1,11 @@
1
1
  {
2
2
  "name": "meta-parser-generator",
3
- "version": "1.0.3",
3
+ "version": "1.0.4",
4
4
  "description": "A peg parser generator written in JavaScript for JavaScript",
5
5
  "main": "metaParserGenerator.js",
6
6
  "scripts": {
7
7
  "test": "jest --no-cache",
8
- "ttest": "node ./tests/generateParser.js && jest --no-cache"
8
+ "gentest": "node ./tests/generateParser.js && jest --no-cache"
9
9
  },
10
10
  "repository": {
11
11
  "type": "git",
@@ -24,6 +24,6 @@
24
24
  },
25
25
  "homepage": "https://github.com/batiste/meta-parser-generator#readme",
26
26
  "devDependencies": {
27
- "jest": "^24.9.0"
27
+ "jest": "^28.1.3"
28
28
  }
29
29
  }