occam-parsers 17.0.2 → 17.0.3

Files changed (2)
  1. package/README.md +247 -246
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -1,246 +1,247 @@
- # Occam Parsers
-
- [Occam](https://github.com/djalbat/occam)'s parsers.
-
- ### Contents
-
- - [Introduction](#introduction)
- - [Installation](#installation)
- - [Usage](#usage)
- - [Examples](#examples)
- - [Features](#features)
- - [Building](#building)
- - [Contact](#contact)
-
- ## Introduction
-
- Three parsers are documented:
-
- * A BNF parser, actually a variant of [extended BNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form).
- * A basic parser, for illustrative purposes, and for developing new grammars.
- * The common parser, which can be extended.
-
- All parsers share common functionality. The last two parse content according to rules defined in the aforementioned variant of extended BNF. The BNF parser on the other hand has its rules hard-coded. These rules can be defined in the selfsame variant that they implement:
-
- document ::= ( rule | error )+ ;
-
- rule ::= name ambiguousModifier? "::=" definitions ";" ;
-
- name ::= [name] ;
-
- definitions ::= definition ( "|" definition )* ;
-
- definition ::= part+ ;
-
- part ::= nonTerminalPart quantifier*
-
- | terminalPart quantifier*
-
- | noWhitespacePart
-
- ;
-
- nonTerminalPart ::= choiceOfParts
-
- | sequenceOfParts
-
- | ruleName lookAheadModifier?
-
- ;
-
- terminalPart ::= significantTokenType
-
- | regularExpression
-
- | terminalSymbol
-
- | endOfLine
-
- | wildcard
-
- | epsilon
-
- ;
-
- noWhitespacePart ::= "<NO_WHITESPACE>" ;
-
- choiceOfParts ::= "(" part ( "|" part )+ ")" ;
-
- sequenceOfParts ::= "(" part part+ ")" ;
-
- ruleName ::= [name] ;
-
- significantTokenType ::= [type] ;
-
- regularExpression ::= [regular-expression] ;
-
- terminalSymbol ::= [string-literal] ;
-
- endOfLine ::= "<END_OF_LINE>" ;
-
- wildcard ::= "." ;
-
- epsilon ::= "ε" ;
-
- quantifier ::= optionalQuantifier
-
- | oneOrMoreQuantifier
-
- | zeroOrMoreQuantifier
-
- ;
-
- ambiguousModifier ::= <NO_WHITESPACE>"!" ;
-
- lookAheadModifier ::= <NO_WHITESPACE>"..." ;
-
- optionalQuantifier ::= <NO_WHITESPACE>"?" ;
-
- oneOrMoreQuantifier ::= <NO_WHITESPACE>"+" ;
-
- zeroOrMoreQuantifier ::= <NO_WHITESPACE>"*" ;
-
- error ::= . ;
-
- ## Installation
-
- With [npm](https://www.npmjs.com/):
-
- npm install occam-parsers
-
- You can also clone the repository with [Git](https://git-scm.com/)...
-
- git clone https://github.com/djalbat/occam-parsers.git
-
- ...and then install the dependencies with npm from within the project's root directory:
-
- npm install
-
- You can also run a development server; see the section on building later on.
-
- ## Usage
-
- Import the requisite parser and its corresponding lexer from this package and the `occam-lexers` package, respectively. Then call their `fromNothing(...)` factory methods.
-
- ```
- import { BasicLexer } from "occam-lexers";
- import { BasicParser } from "occam-parsers";
-
- const basicLexer = BasicLexer.fromNothing(),
- basicParser = BasicParser.fromNothing();
-
- const content = `
-
- ...
-
- `,
- tokens = basicLexer.tokenise(content),
- node = basicParser.parse(tokens);
-
- ...
- ```
- The tokens returned from the lexer's `tokenise(...)` method can be passed directly to the parser's `parse(...)` method, which itself returns a node or `null`.
-
- ## Examples
-
- There are three examples, one for each parser. To view them, open the `index.html` file in the root of the repository. Each example shows a representation of the parse tree, which is useful for debugging.
-
- ## Features
-
- ### Quantifiers
-
- - `*` zero or more
- - `+` one or more
- - `?` optional
-
- These bind tightly to the symbols to their left and can be chained. Take note that both the `*+` and `?+` chains will cause an infinite loop and must be avoided.
-
- ### Regular expressions
-
- A regular expression is distinguished by the usual leading and trailing forward slashes:
-
- /\d+/
-
- ### Matching significant token types
-
- This can be done with a symbol that is identical to the significant token type in question, contained within square brackets. For example:
-
- name ::= [unassigned] ;
-
- ### Matching end of line tokens
-
- This can be done with the `<END_OF_LINE>` special symbol. For example:
-
- verticalSpace ::= <END_OF_LINE>+ ;
-
- ### Matching no whitespace
-
- This can be done with the `<NO_WHITESPACE>` special symbol. For example:
-
- parenthesisedTerms ::= <NO_WHITESPACE>"(" terms <NO_WHITESPACE>")" ;
-
- It is conventional to leave no whitespace between the symbol and its subsequent part.
-
- ### Sequences of parts
-
- This can be done with brackets. For example:
-
- terms ::= term ( "," term )* ;
-
- ### Choosing between parts
-
- The vertical bar symbol `|` is overloaded and can be used in conjunction with brackets to choose between parts as opposed to definitions. For example:
-
- justifiedStatement ::= statement ( "by" | "from" ) reference <END_OF_LINE> ;
-
-
- ### Look-ahead
-
- Consider the following rules:
-
- ABC ::= AAB BC ;
-
- AAB ::= "a" "b" | "a" ;
-
- BC ::= "b" "c" ;
-
- These will not parse the tokens `a`, `b`, `c` because the first definition of the `AAB` rule will parse the `a` and `b` tokens, leaving only the `c` token for the `BC` rule to parse. This situation can be addressed by making the `AAB` rule look ahead, that is, try each of its definitions in turn until one is found that allows the remainder of the parent rule to parse. The look-ahead modifier is an ellipsis, thus the rules above become:
-
- ABC ::= AAB... BC ;
-
- AAB ::= "a" "b" | "a" ;
-
- BC ::= "b" "c" ;
-
- Now the `ABC` rule will indeed parse the tokens `a`, `b`, `c`, because the second definition of the `AAB` rule will be tried after the first definition fails to allow the `BC` rule name part to parse.
-
- Also bear in mind that look-ahead is carried out to arbitrary depth and that this affects the behaviour of the `?`, `*` and `+` quantifiers, which become lazy. For example:
-
- ABC ::= AAB... ;
-
- AAB ::= "a" "b"+ "b" "c" ;
-
- Here the look-ahead modifier on the `AAB` rule name part forces the `+` quantifier on the `"b"` terminal part to be lazy, allowing the following to parse:
-
- a b b b c
-
- Without look-ahead, the `"b"+` part would consume all of the `b` tokens, leaving none for the subsequent `"b"` terminal part.
-
- It seems that the parser parses in time that is roughly directly proportional to the length of the input. On the other hand, it is most likely that look-ahead takes exponential time given its nested nature. For this reason, look-ahead should be used sparingly.
-
- ## Building
-
- Automation is done with [npm scripts](https://docs.npmjs.com/misc/scripts); have a look at the `package.json` file. The pertinent commands are:
-
- npm run build-debug
- npm run watch-debug
-
- You can also start a small development server:
-
- npm start
-
- The example will then be available at http://localhost:8888 and will reload automatically when changes are made.
-
- ## Contact
-
- * james.smith@openmathematics.org
+ # Occam Parsers
+
+ [Occam](https://github.com/djalbat/occam)'s parsers.
+
+ ### Contents
+
+ - [Introduction](#introduction)
+ - [Installation](#installation)
+ - [Usage](#usage)
+ - [Examples](#examples)
+ - [Features](#features)
+ - [Building](#building)
+ - [Contact](#contact)
+
+ ## Introduction
+
+ Three parsers are documented:
+
+ * A BNF parser, actually a variant of [extended BNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form).
+ * A basic parser, for illustrative purposes, and for developing new grammars.
+ * The common parser, which can be extended.
+
+ All parsers share common functionality. The last two parse content according to rules defined in the aforementioned variant of extended BNF. The BNF parser on the other hand has its rules hard-coded. These rules can be defined in the selfsame variant that they implement:
+
+ document ::= ( rule | error )+ ;
+
+ rule ::= name ambiguousModifier? "::=" definitions ";" ;
+
+ name ::= [name] ;
+
+ definitions ::= definition ( "|" definition )* ;
+
+ definition ::= part+ ;
+
+ part ::= nonTerminalPart quantifier*
+
+ | terminalPart quantifier*
+
+ | noWhitespacePart
+
+ ;
+
+ nonTerminalPart ::= choiceOfParts
+
+ | sequenceOfParts
+
+ | ruleName lookAheadModifier?
+
+ ;
+
+ terminalPart ::= significantTokenType
+
+ | regularExpression
+
+ | terminalSymbol
+
+ | endOfLine
+
+ | wildcard
+
+ | epsilon
+
+ ;
+
+ noWhitespacePart ::= "<NO_WHITESPACE>" ;
+
+ choiceOfParts ::= "(" part ( "|" part )+ ")" ;
+
+ sequenceOfParts ::= "(" part part+ ")" ;
+
+ ruleName ::= [name] ;
+
+ significantTokenType ::= [type] ;
+
+ regularExpression ::= [regular-expression] ;
+
+ terminalSymbol ::= [string-literal] ;
+
+ endOfLine ::= "<END_OF_LINE>" ;
+
+ wildcard ::= "." ;
+
+ epsilon ::= "ε" ;
+
+ quantifier ::= optionalQuantifier
+
+ | oneOrMoreQuantifier
+
+ | zeroOrMoreQuantifier
+
+ ;
+
+ ambiguousModifier ::= <NO_WHITESPACE>"!" ;
+
+ lookAheadModifier ::= <NO_WHITESPACE>"..." ;
+
+ optionalQuantifier ::= <NO_WHITESPACE>"?" ;
+
+ oneOrMoreQuantifier ::= <NO_WHITESPACE>"+" ;
+
+ zeroOrMoreQuantifier ::= <NO_WHITESPACE>"*" ;
+
+ error ::= . ;
+
+ ## Installation
+
+ With [npm](https://www.npmjs.com/):
+
+ npm install occam-parsers
+
+ You can also clone the repository with [Git](https://git-scm.com/)...
+
+ git clone https://github.com/djalbat/occam-parsers.git
+
+ ...and then install the dependencies with npm from within the project's root directory:
+
+ npm install
+
+ You can also run a development server; see the section on building later on.
+
+ ## Usage
+
+ Import the requisite parser and its corresponding lexer from this package and the `occam-lexers` package, respectively. Then call their `fromNothing(...)` factory methods.
+
+ ```
+ import { BasicLexer } from "occam-lexers";
+ import { BasicParser } from "occam-parsers";
+
+ const basicLexer = BasicLexer.fromNothing(),
+ basicParser = BasicParser.fromNothing();
+
+ const content = `
+
+ ...
+
+ `,
+ tokens = basicLexer.tokenise(content),
+ node = basicParser.parse(tokens);
+
+ ...
+ ```
+ The tokens returned from the lexer's `tokenise(...)` method can be passed directly to the parser's `parse(...)` method, which itself returns a node or `null`.
+
+ ## Examples
+
+ There are three examples, one for each parser. To view them, open the `index.html` file in the root of the repository. Each example shows a representation of the parse tree, which is useful for debugging.
+
+ ## Features
+
+ ### Quantifiers
+
+ - `*` zero or more
+ - `+` one or more
+ - `?` optional
+
+ These bind tightly to the symbols to their left and can be chained. Take note that both the `*+` and `?+` chains will cause an infinite loop and must be avoided.
+
+ ### Regular expressions
+
+ A regular expression is distinguished by the usual leading and trailing forward slashes:
+
+ /\d+/
+
+ ### Matching significant token types
+
+ This can be done with a symbol that is identical to the significant token type in question, contained within square brackets. For example:
+
+ name ::= [unassigned] ;
+
+ ### Matching end of line tokens
+
+ This can be done with the `<END_OF_LINE>` special symbol. For example:
+
+ verticalSpace ::= <END_OF_LINE>+ ;
+
+ ### Matching no whitespace
+
+ This can be done with the `<NO_WHITESPACE>` special symbol. For example:
+
+ parenthesisedTerms ::= <NO_WHITESPACE>"(" terms <NO_WHITESPACE>")" ;
+
+ It is conventional to leave no whitespace between the symbol and its subsequent part.
+
+ ### Sequences of parts
+
+ This can be done with brackets. For example:
+
+ terms ::= term ( "," term )* ;
+
+ ### Choosing between parts
+
+ The vertical bar symbol `|` is overloaded and can be used in conjunction with brackets to choose between parts as opposed to definitions. For example:
+
+ justifiedStatement ::= statement ( "by" | "from" ) reference <END_OF_LINE> ;
+
+
+ ### Look-ahead
+
+ Consider the following rules:
+
+ ABC ::= AAB BC ;
+
+ AAB ::= "a" "b" | "a" ;
+
+ BC ::= "b" "c" ;
+
+ These will not parse the tokens `a`, `b`, `c` because the first definition of the `AAB` rule will parse the `a` and `b` tokens, leaving only the `c` token for the `BC` rule to parse. This situation can be addressed by making the `AAB` rule look ahead, that is, try each of its definitions in turn until one is found that allows the remainder of the parent rule to parse. The look-ahead modifier is an ellipsis, thus the rules above become:
+
+ ABC ::= AAB... BC ;
+
+ AAB ::= "a" "b" | "a" ;
+
+ BC ::= "b" "c" ;
+
+ Now the `ABC` rule will indeed parse the tokens `a`, `b`, `c`, because the second definition of the `AAB` rule will be tried after the first definition fails to allow the `BC` rule name part to parse.
+
+ Also bear in mind that look-ahead is carried out to arbitrary depth and that this affects the behaviour of the `?`, `*` and `+` quantifiers, which become lazy. For example:
+
+ ABC ::= AAB... ;
+
+ AAB ::= "a" "b"+ "b" "c" ;
+
+ Here the look-ahead modifier on the `AAB` rule name part forces the `+` quantifier on the `"b"` terminal part to be lazy, allowing the following to parse:
+
+ a b b b c
+
+ Without look-ahead, the `"b"+` part would consume all of the `b` tokens, leaving none for the subsequent `"b"` terminal part.
+
+ It seems that the parser parses in time that is roughly directly proportional to the length of the input. On the other hand, it is most likely that look-ahead takes exponential time given its nested nature. For this reason, look-ahead should be used sparingly.
+
+ ## Building
+
+ Automation is done with [npm scripts](https://docs.npmjs.com/misc/scripts); have a look at the `package.json` file. The pertinent commands are:
+
+ npm run build-debug
+ npm run watch-debug
+
+ You can also start a small development server:
+
+ npm start
+
+ The example will then be available at http://localhost:8888 and will reload automatically when changes are made.
+
+ ## Contact
+
+ * james.smith@djalbat.com
+
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "occam-parsers",
  "author": "James Smith",
- "version": "17.0.2",
+ "version": "17.0.3",
  "license": "MIT, Anti-996",
  "homepage": "https://github.com/djalbat/occam-parsers",
  "description": "Occam's parsers.",
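
The Look-ahead section of the README in this diff describes a rule that tries "each of its definitions in turn until one is found that allows the remainder of the parent rule to parse". The following is a minimal sketch of that idea as a toy backtracking matcher over the README's `ABC`, `AAB` and `BC` rules. It is not the occam-parsers implementation: the `matchParts` and `parses` functions, the `rules` table and the `{ rule, lookAhead }` part shape are all hypothetical names chosen for illustration, and quantifiers are not modelled.

```javascript
// Toy backtracking matcher; a hypothetical sketch, not the occam-parsers implementation.
// A definition is an array of parts. A part is either a terminal string or
// an object { rule, lookAhead } that names another rule.
const rules = {
  AAB: [["a", "b"], ["a"]],
  BC: [["b", "c"]],
};

// Matches `parts` starting at `index`, then hands the resulting index to `rest`,
// which stands for whatever still has to parse afterwards.
function matchParts(parts, tokens, index, rest) {
  if (parts.length === 0) {
    return rest(index);
  }

  const [part, ...remaining] = parts;
  const next = (nextIndex) => matchParts(remaining, tokens, nextIndex, rest);

  if (typeof part === "string") {
    return tokens[index] === part && next(index + 1);
  }

  const definitions = rules[part.rule];

  if (part.lookAhead) {
    // Look-ahead: try each definition until one lets the remainder parse.
    return definitions.some((definition) => matchParts(definition, tokens, index, next));
  }

  // No look-ahead: commit to the first definition that parses on its own.
  for (const definition of definitions) {
    let end = -1;

    const parsed = matchParts(definition, tokens, index, (endIndex) => {
      end = endIndex;
      return true;
    });

    if (parsed) {
      return next(end);
    }
  }

  return false;
}

// True if `parts` consume every token.
function parses(parts, tokens) {
  return matchParts(parts, tokens, 0, (index) => index === tokens.length);
}

const tokens = ["a", "b", "c"];

// ABC ::= AAB BC ;
const withoutLookAhead = parses([{ rule: "AAB" }, { rule: "BC" }], tokens);

// ABC ::= AAB... BC ;
const withLookAhead = parses([{ rule: "AAB", lookAhead: true }, { rule: "BC" }], tokens);

console.log(withoutLookAhead, withLookAhead);
```

Run with Node.js, this logs `false true`: without the modifier the first `AAB` definition takes both `a` and `b` and `BC` is left with only `c`, whereas with it the second definition is tried once the first fails to let `BC` parse. Passing the remainder as a continuation keeps the sketch short, and it also hints at why nested look-ahead can become expensive, since every alternative may re-run everything that follows it.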