@zigsterz/parzing 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +24 -18
- package/dist/builder.d.ts +2 -0
- package/dist/builder.js +4 -0
- package/dist/core.d.ts +5 -1
- package/dist/core.js +17 -0
- package/dist/parsers/RegexParser.d.ts +7 -0
- package/dist/parsers/RegexParser.js +21 -0
- package/package.json +1 -1
package/README.md
CHANGED
~~~diff
@@ -1,7 +1,9 @@
 # Parzing: TypesSript Parser Combinator Library
 
 ## Overview
-This package
+This package is Parzing: a parser combinator library, allowing client code to easily create parsers in JavaScript or TypeScript. When used with TypeScript accurate types are computed for parsed and intermediate results, allowing easy and safe implementation and use of parsers.
+
+This page provides instructions on how to use and customize Parzing. For more insights about the library, [see this blog](https://www.imonlydoingthis.benhaim.net/home/categories/parzing).
 
 ## Installation
 
~~~
~~~diff
@@ -13,7 +15,7 @@ npm install --save @zigsterz/parzing
 
 Parzing exposes a `parse` function for invoking a parser on content. To use it, we first construct a parser, and then pass the parser along with the content to parse. `parse` will either return a the result of succesfuly parsing the content, or throw an error describing a failure to parse.
 
-The `ParserBuilder` class exposes a set of helper factory functions for constructing
+The parser passed to `parse` is usually built using the `ParserBuilder` class. This class exposes a set of helper factory functions for constructing parsers.
 
 The example below demonstrates how to parse a sequence of 1 to 3 digits by first constructing a `ParserBuilder`, then using it to create an `AnyOfParser` parser and finally running the parser using `parse`.
 
~~~
~~~diff
@@ -29,11 +31,11 @@ assert(result == "123");
 
 ### Parsing Results
 The result of parsing content may be a value of any type. A Parzing parser has an associated result type that describes the type of the parsing result returned by that parser.
-
+The return type from `parse` will match the result type of the parser passed to it.
 
 ### Basic Parsers
 
-`pb.anyOf` creates the Any Of *basic parser*. Basic parsers are the atomic building blocks for parsing. They may be combined using [*parser combinators*](#parser-combinators) to construct more complex parsers.
+In the example above, `pb.anyOf` creates the Any Of *basic parser*. Basic parsers are the atomic building blocks for parsing. They may be combined using [*parser combinators*](#parser-combinators) to construct more complex parsers.
 
 Parzing offers the following basic parsers out of the box:
 
~~~
~~~diff
@@ -41,17 +43,18 @@ Parzing offers the following basic parsers out of the box:
 | ------------------ | ----------- | ----------- |
 | `ParserBuilder.anyOf(chars, min, max)` | Parses a minimum of *min* characters, and up to *max* characters, all out the characters listed in *chars*. | `string` |
 | `ParserBuilder.token(token)` | Parses the exact string specified by *token*. | `string` |
+| `ParserBuilder.regex(regex)` | Parses the regular expression specified by *regex*. | `string` |
 | `ParserBuilder.pass()` | This is a no-op parser. It consumes no input and always succeeds. | `void` |
 | `ParserBuilder.fail(message)` | Fail parsing with the error message provided in *message*. | `void` |
 
-In addition to these parsers, you can create [custom parsers](#custom-parsers) to parse arbitrary complex "atoms".
+In addition to these parsers, you can create [custom parsers](#custom-parsers) to parse arbitrary complex "atoms". Custom parsers may provide any result type.
 
 ## Creating Complex Parsers
 
 ### Parser Combinators
-*Parser Combinators* are parsers
+*Parser Combinators* are parsers constructed based on other parsers that combine these parsers in some form to generate a more complex parser.
 
-Perhaps the simplest example of a
+Perhaps the simplest example of a parser combinator is the Sequence combinator.
 The sequence combinator is constructed based on a sequence of underlying parsers, using the `ParserBuilder.sequence(...)` factory method.
 When parsing input content, the combinator will invoke each of the underlying parsers to parse consecutive fragments of the content.
 If any underlying parser fails, the sequence parser fails as well.
~~~
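The sequence semantics described in this hunk (each underlying parser consumes a consecutive fragment of the input, and any failure fails the whole sequence) can be sketched without the library. The function-based `Parser<T>` type and the names `token`, `oneOf`, `sequence`, and `runParser` below are hypothetical stand-ins, not Parzing's API; the mapped `ResultsOf` type only illustrates how a combinator's result type can be derived from the result types of its underlying parsers.

```typescript
// Hypothetical, simplified parser type: not Parzing's `Parser<T>` interface.
type Result<T> =
  | { ok: true; value: T; pos: number }
  | { ok: false; pos: number; message: string };
type Parser<T> = (text: string, pos: number) => Result<T>;

// Basic parser: matches an exact string at the current position.
function token(tok: string): Parser<string> {
  return (text, pos) =>
    text.startsWith(tok, pos)
      ? { ok: true, value: tok, pos: pos + tok.length }
      : { ok: false, pos, message: `Expecting '${tok}' at ${pos}` };
}

// Basic parser: matches a single character out of `chars`.
function oneOf(chars: string): Parser<string> {
  return (text, pos) =>
    pos < text.length && chars.includes(text[pos])
      ? { ok: true, value: text[pos], pos: pos + 1 }
      : { ok: false, pos, message: `Expecting one of ${chars} at ${pos}` };
}

// Maps a tuple of parsers to the tuple of their result types.
type ResultsOf<Ps extends Parser<unknown>[]> = {
  [K in keyof Ps]: Ps[K] extends Parser<infer T> ? T : never;
};

// Sequence combinator: runs each parser on consecutive fragments of the input.
function sequence<Ps extends Parser<unknown>[]>(...parsers: Ps): Parser<ResultsOf<Ps>> {
  return (text, pos) => {
    const values: unknown[] = [];
    let cur = pos;
    for (const p of parsers) {
      const r = p(text, cur);
      if (!r.ok) return r; // any failing element fails the whole sequence
      values.push(r.value);
      cur = r.pos;
    }
    return { ok: true, value: values as unknown as ResultsOf<Ps>, pos: cur };
  };
}

// Runs a parser over the whole input; throws on failure or leftover input.
function runParser<T>(p: Parser<T>, text: string): T {
  const r = p(text, 0);
  if (!r.ok) throw new Error(r.message);
  if (r.pos !== text.length) throw new Error(`Unexpected input at ${r.pos}`);
  return r.value;
}

const digit = oneOf("0123456789");
const sample = sequence(digit, token("-"), digit, token("-"), digit);
const parts = runParser(sample, "1-2-3"); // typed as [string, string, string, string, string]
console.log(parts); // [ '1', '-', '2', '-', '3' ]
```

Deriving the result tuple from the argument tuple is one way a TypeScript library can give `parse` a precise return type; whether Parzing does it exactly this way is not shown in this diff.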
~~~diff
@@ -76,7 +79,7 @@ const result = parse(sample_parser, "1-2-3", true);
 assert.deepEqual(result, ["1", "-", "2", "-", "3"]);
 ```
 
-In addition to the `sequence` parser combinator, the following parser combinators offered by Parzing out-of-the box:
+In addition to the `sequence` parser combinator, the following parser combinators are offered by Parzing out-of-the box:
 
 | Parser constructor | Description | Result type |
 | ------------------ | ----------- | ----------- |
~~~
~~~diff
@@ -91,13 +94,13 @@ In addition to the `sequence` parser combinator, the following parser combinator
 
 ### Whitespace Support
 
-Parsers that derive from `ParserWithInternalWhitespaceSupport` support ignoring whitespace within
+Parsers that derive from `ParserWithInternalWhitespaceSupport` support ignoring whitespace within parsed content. Exactly where whitespace is ignored depends on the specific parser as per the table below.
 
 For all of these parsers, the ignored "whitespace" is defined as content that can be parsed by the *whitespace parser*. The whitespace parser can be set by invoking `target_parser.whitespace(whitespace_parser)`.
-If this method is not invoked on the parser, and the parser was created using `ParserBuilder`, then the whitespace parser is set as the default whitespace parser for the builder. The default whitespace parser for a `ParserBuilder` can be set by passing it on construction.
-
 If there's no set whitespace parser on a `ParserWithInternalWhitespaceSupport`, no whitespace will be ignored.
 
+If the `whitespace` method is not invoked on a parser, and the parser was created using `ParserBuilder`, then the whitespace parser is set as the default whitespace parser for the builder. The default whitespace parser for a `ParserBuilder` can be set by passing it on construction.
+
 The `WhitespaceParser` class implements a parser that accepts common whitespace patterns.
 
 The following code example illustrates a few ways to set whitespace parsers:
~~~
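The whitespace rules in this hunk amount to a precedence order: an explicit `whitespace(...)` call wins; otherwise a parser created by `ParserBuilder` gets the builder's default; otherwise nothing is skipped. A minimal sketch of that order follows. `Skipper`, `commonWhitespace`, `TokenSeq`, and `Builder` are hypothetical names, and modelling a whitespace parser as a function that advances a position is a simplification, not how Parzing represents parsers. In this sketch whitespace is skipped only between a sequence's elements.

```typescript
// Hypothetical sketch of the precedence rules; none of these names are Parzing APIs.

// A whitespace "parser" reduced to its effect: given a position, return the
// position just past any whitespace found there.
type Skipper = (text: string, pos: number) => number;

// Loosely in the spirit of `WhitespaceParser`: skip spaces, tabs, and newlines.
const commonWhitespace: Skipper = (text, pos) => {
  while (pos < text.length && " \t\r\n".includes(text[pos])) pos++;
  return pos;
};

// A sequence of literal tokens that ignores whitespace *between* its elements.
class TokenSeq {
  private ws: Skipper | null = null;
  private readonly tokens: string[];

  constructor(tokens: string[]) {
    this.tokens = tokens;
  }

  // Explicitly sets (or clears) the whitespace parser for this instance.
  whitespace(ws: Skipper | null): this {
    this.ws = ws;
    return this;
  }

  parse(text: string): boolean {
    let pos = 0;
    for (let i = 0; i < this.tokens.length; i++) {
      if (i > 0 && this.ws !== null) pos = this.ws(text, pos);
      const tok = this.tokens[i];
      if (!text.startsWith(tok, pos)) return false;
      pos += tok.length;
    }
    return pos === text.length; // require the whole input to be consumed
  }
}

// A builder whose default whitespace parser is passed on construction.
class Builder {
  private readonly defaultWs: Skipper | null;

  constructor(defaultWs: Skipper | null = null) {
    this.defaultWs = defaultWs;
  }

  // New parsers start with the builder's default; a later `.whitespace(...)`
  // call overrides it for that parser only.
  seq(...tokens: string[]): TokenSeq {
    return new TokenSeq(tokens).whitespace(this.defaultWs);
  }
}

const plain = new Builder(); // no default: no whitespace is ignored
const relaxed = new Builder(commonWhitespace);

console.log(plain.seq("a", "b").parse("a b")); // false
console.log(relaxed.seq("a", "b").parse("a b")); // true
console.log(relaxed.seq("a", "b").whitespace(null).parse("a b")); // false
```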
~~~diff
@@ -167,9 +170,10 @@ This parser will fail, but the failure will be reported by the `choice` combinat
 ParseError { message: 'Parser rejected input' }
 ```
 
-Clearly, for the input `number a123` a more
-
-
+Clearly, for the input `number a123` a more reasonable behavior would be if the parser didn't even try the second alternative in the `choice` combinator above, and immediately bail out if we've encountered the `number` token. In Parzing, This kind of behavior can be achieved using *cuts*.
+
+A cut is a special parser, constructed using `ParserBuilder.cut`, that doesn't attempt to consume any input. Rather,
+When a cut 'parses', the fact that it was encountered is recored in the parsing context. Backtracking parsers, such as the ones listed above, will not attempt to backtrack parsing if a cut was encountered by one of their underlying parsers. Rather they will immediately fail with whatever failure that would have caused them to backtrack.
 
 Fixing the example above using cuts, we can write:
 
~~~
~~~diff
@@ -202,7 +206,7 @@ which would now result in the following exception:
 ParseError { message: "Expecting AnyOf 0123456789 at 7 ('a123')" }
 ```
 
-Clearly a more useful error message. Note that in addition to yielding clearer errors, cuts may also improve parsing performance
+Clearly a more useful error message. Note that in addition to yielding clearer errors, cuts may also improve parsing performance by preventing backtracks.
 
 There are cases where you may want to reuse the same parser in different contexts -- where in some contexts you want the cut to appear but in others you want the cut to be ignored. This is achieved by invoking ``ParserBuilder.attempt`` on the parser which will return an ``AttemptCombinator``. This combinator parser will "swallow" any cut encountered indication within the underlying parser.
 
~~~
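The cut and attempt behavior described in these two hunks can be modelled with a mutable parsing context that carries a cut flag. The sketch below is an illustration under stated assumptions, not Parzing's implementation: every name is hypothetical, and scoping each cut to its nearest enclosing `choice` is a design choice made here for simplicity. It mirrors the shape of the `number a123` example: without a cut the reported failure points at the start of the input, with a cut it points at the digits, and wrapping the parser in `attempt` restores backtracking.

```typescript
// Hypothetical model of cuts; not Parzing's code. The "parsing context" is a
// mutable object holding the input, the current position, and a cut flag.
interface Ctx {
  text: string;
  pos: number;
  cut: boolean;
}
type Result<T> = { ok: true; value: T } | { ok: false; message: string };
type P<T> = (ctx: Ctx) => Result<T>;

const tok = (t: string): P<string> => (ctx) => {
  if (!ctx.text.startsWith(t, ctx.pos)) {
    return { ok: false, message: `Expecting '${t}' at ${ctx.pos}` };
  }
  ctx.pos += t.length;
  return { ok: true, value: t };
};

const digits: P<string> = (ctx) => {
  let end = ctx.pos;
  while (end < ctx.text.length && ctx.text[end] >= "0" && ctx.text[end] <= "9") end++;
  if (end === ctx.pos) return { ok: false, message: `Expecting digits at ${ctx.pos}` };
  const value = ctx.text.slice(ctx.pos, end);
  ctx.pos = end;
  return { ok: true, value };
};

// A cut consumes no input; it only records that it was reached.
const cut: P<null> = (ctx) => {
  ctx.cut = true;
  return { ok: true, value: null };
};

function seq(...ps: P<unknown>[]): P<unknown[]> {
  return (ctx) => {
    const values: unknown[] = [];
    for (const p of ps) {
      const r = p(ctx);
      if (!r.ok) return r;
      values.push(r.value);
    }
    return { ok: true, value: values };
  };
}

// Backtracking choice: rewinds between alternatives unless the failed one reached a cut.
function choice<T>(...alts: P<T>[]): P<T> {
  return (ctx) => {
    const start = ctx.pos;
    const outerCut = ctx.cut;
    let result: Result<T> = { ok: false, message: "No alternatives" };
    for (const alt of alts) {
      ctx.pos = start;
      ctx.cut = false;
      result = alt(ctx);
      if (result.ok || ctx.cut) break; // success, or committed by a cut
    }
    ctx.cut = outerCut; // in this sketch a cut only affects its nearest enclosing choice
    return result;
  };
}

// Attempt: runs a parser but hides any cut it reached from enclosing choices.
function attempt<T>(p: P<T>): P<T> {
  return (ctx) => {
    const outerCut = ctx.cut;
    const r = p(ctx);
    ctx.cut = outerCut;
    return r;
  };
}

function run<T>(p: P<T>, text: string): Result<T> {
  return p({ text, pos: 0, cut: false });
}

function report(r: Result<unknown>): string {
  return r.ok ? "ok" : r.message;
}

const fallback = seq(tok("word "), tok("x"));
const noCut = choice(seq(tok("number "), digits), fallback);
const withCut = choice(seq(tok("number "), cut, digits), fallback);
console.log(report(run(noCut, "number a123"))); // Expecting 'word ' at 0
console.log(report(run(withCut, "number a123"))); // Expecting digits at 7

// The same cut-containing parser, reused where backtracking is still wanted.
const numberDecl = seq(tok("number "), cut, digits);
const lenient = choice(attempt(numberDecl), seq(tok("number "), tok("a123")));
console.log(report(run(lenient, "number a123"))); // ok
```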
~~~diff
@@ -250,12 +254,14 @@ Parzing offers the following operators out of the box. Note that you can also cr
 
 ### Recursive Parsers
 
-
+Grammars often include recursive definitions. Consider for example the following simple expresion parser grammer:
 
+```
 expression := addition | subtraction
 addition := term '+' term
 subtraction := term '-' term
 term := number | '(' expression ')'
+````
 
 How would we define this using Parzing?
 
~~~
~~~diff
@@ -289,9 +295,9 @@ const subtraction = pb.sequence(
 const expression = pb.choice(addition, subtraction);
 ```
 
-Note the comment "Ooops!" above. The recursive nature of the parser creates a circular declaration
+Note the comment "Ooops!" above. The recursive nature of the parser creates a circular declaration, which is disallowed in Typescript.
 
-The `ParserBuilder.ref` method allows
+The `ParserBuilder.ref` method allows solving this problem by receiving a parameterless function returning a parser, and creating a parser that lazily resolves to the function's return value.
 Using this mechanism, our recursive parser becomes possible by modifying the code above as follows:
 
 
~~~
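The lazy-resolution idea behind `ParserBuilder.ref` (take a parameterless function returning a parser, and only call it when parsing) can be sketched with plain closures. This is not Parzing's `RefParser`: `ref`, `seq3`, `choice`, `num`, `ch`, and `evaluate` are hypothetical helpers, and these parsers compute the arithmetic value directly instead of returning matched fragments. The grammar is the one from the hunk above, so a lone number such as `1` is not a valid `expression`.

```typescript
// Hypothetical minimal combinators; `ref` illustrates lazy resolution only.
type Result<T> = { ok: true; value: T; pos: number } | { ok: false };
type Parser<T> = (text: string, pos: number) => Result<T>;

// Defers calling `getParser` until the first parse, so it may refer to a
// constant that is declared further down in the file.
function ref<T>(getParser: () => Parser<T>): Parser<T> {
  let resolved: Parser<T> | null = null;
  return (text, pos) => {
    if (resolved === null) resolved = getParser();
    return resolved(text, pos);
  };
}

const num: Parser<number> = (text, pos) => {
  let end = pos;
  while (end < text.length && text[end] >= "0" && text[end] <= "9") end++;
  return end > pos ? { ok: true, value: Number(text.slice(pos, end)), pos: end } : { ok: false };
};

const ch = (c: string): Parser<string> => (text, pos) =>
  text[pos] === c ? { ok: true, value: c, pos: pos + 1 } : { ok: false };

function choice<T>(...alts: Parser<T>[]): Parser<T> {
  return (text, pos) => {
    for (const alt of alts) {
      const r = alt(text, pos);
      if (r.ok) return r;
    }
    return { ok: false };
  };
}

// Three parsers in sequence, with their results combined by `f`.
function seq3<A, B, C, R>(pa: Parser<A>, pb: Parser<B>, pc: Parser<C>, f: (a: A, b: B, c: C) => R): Parser<R> {
  return (text, pos) => {
    const a = pa(text, pos);
    if (!a.ok) return a;
    const b = pb(text, a.pos);
    if (!b.ok) return b;
    const c = pc(text, b.pos);
    if (!c.ok) return c;
    return { ok: true, value: f(a.value, b.value, c.value), pos: c.pos };
  };
}

// term := number | '(' expression ')'   (`expression` is declared below)
const term: Parser<number> = choice(num, seq3(ch("("), ref(() => expression), ch(")"), (_o, e, _c) => e));
const addition = seq3(term, ch("+"), term, (a, _op, b) => a + b);
const subtraction = seq3(term, ch("-"), term, (a, _op, b) => a - b);
// The explicit annotation keeps TypeScript from inferring a circular type.
const expression: Parser<number> = choice(addition, subtraction);

function evaluate(text: string): number | null {
  const r = expression(text, 0);
  return r.ok && r.pos === text.length ? r.value : null;
}

console.log(evaluate("1+2")); // 3
console.log(evaluate("(1+2)-3")); // 0
console.log(evaluate("10-(4-1)")); // 7
console.log(evaluate("1")); // null
```

Two things work together here: the closure passed to `ref` is not invoked while `term` is being constructed, so `expression` can be declared later without a runtime error, and the type annotation on `expression` breaks the type-inference cycle that a plain declaration would create.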
package/dist/builder.d.ts
CHANGED
```diff
@@ -6,6 +6,7 @@ import { OptionalCombinator } from "./combinators/OptionalCombinator";
 import { SequenceCombinator } from "./combinators/SequenceCombinator";
 import { CutParser, FailParser, Parser, ParserType, PassParser, RefParser } from "./core";
 import { AnyOfParser } from "./parsers/AnyOfParser";
+import { RegexParser } from "./parsers/RegexParser";
 import { TokenParser } from "./parsers/TokenParser";
 declare type WithPostfixSupport<T> = T & {
     _<R>(f: (target: WithPostfixSupport<T>) => R): WithPostfixSupport<R>;
```
```diff
@@ -17,6 +18,7 @@ export declare class ParserBuilder {
     parser<T extends Parser<any>>(p: T): WithPostfixSupport<T>;
     token(tok: string): WithPostfixSupport<TokenParser>;
     anyOf(alts: string, minLen?: number | null, maxLen?: number | null): WithPostfixSupport<AnyOfParser>;
+    regex(re: RegExp): WithPostfixSupport<RegexParser>;
     fail(message?: string): WithPostfixSupport<FailParser>;
     pass(): WithPostfixSupport<PassParser>;
     cut(): WithPostfixSupport<CutParser>;
```
package/dist/builder.js
CHANGED
```diff
@@ -9,6 +9,7 @@ var OptionalCombinator_1 = require("./combinators/OptionalCombinator");
 var SequenceCombinator_1 = require("./combinators/SequenceCombinator");
 var core_1 = require("./core");
 var AnyOfParser_1 = require("./parsers/AnyOfParser");
+var RegexParser_1 = require("./parsers/RegexParser");
 var TokenParser_1 = require("./parsers/TokenParser");
 function addPostfixSupport(who) {
     var ret = who;
```
```diff
@@ -41,6 +42,9 @@ var ParserBuilder = /** @class */ (function () {
         if (maxLen === void 0) { maxLen = null; }
         return this.postProcessParser(new AnyOfParser_1.AnyOfParser(alts, minLen, maxLen));
     };
+    ParserBuilder.prototype.regex = function (re) {
+        return this.postProcessParser(new RegexParser_1.RegexParser(re));
+    };
     ParserBuilder.prototype.fail = function (message) {
         if (message === void 0) { message = null; }
         return this.postProcessParser(new core_1.FailParser(message));
```
package/dist/core.d.ts
CHANGED
```diff
@@ -3,6 +3,8 @@ export interface ParserInputBookmark {
 export interface ParserInput {
     read(readLen: number): string;
     peek(peekLen: number): string;
+    readRegex?(regex: RegExp): string | null;
+    peekRegex?(regex: RegExp): string | null;
     getBookmark(): ParserInputBookmark;
     seekToBookmark(bm: ParserInputBookmark): any;
     eof(): boolean;
```
```diff
@@ -14,6 +16,8 @@ export declare class StringParserInput implements ParserInput {
     constructor(_text: String);
     read(readLen: number): string;
     peek(readLen: number): string;
+    readRegex(regex: RegExp): string | null;
+    peekRegex(regex: RegExp): string | null;
     getBookmark(): ParserInputBookmark;
     seekToBookmark(bm: ParserInputBookmark): void;
     eof(): boolean;
```
```diff
@@ -48,7 +52,7 @@ export interface Parser<T> {
     parse(parserContext: ParserContext): ParseResult<T>;
 }
 export declare function isParser(p: any): p is Parser<unknown>;
-export declare class FailParser implements Parser<
+export declare class FailParser implements Parser<unknown> {
     private _message;
     constructor(_message?: string);
     parse(parserContext: ParserContext): any;
```
package/dist/core.js
CHANGED
```diff
@@ -18,6 +18,23 @@ var StringParserInput = /** @class */ (function () {
     StringParserInput.prototype.peek = function (readLen) {
         return this._text.substr(this._index, readLen);
     };
+    StringParserInput.prototype.readRegex = function (regex) {
+        var ret = this.peekRegex(regex);
+        if (ret) {
+            this._index += ret.length;
+        }
+        return ret;
+    };
+    StringParserInput.prototype.peekRegex = function (regex) {
+        var matchResult = regex.exec(this._text.substring(this._index));
+        if (!matchResult) {
+            return null;
+        }
+        if (matchResult.index != 0) {
+            return null;
+        }
+        return matchResult[0];
+    };
     StringParserInput.prototype.getBookmark = function () {
         return this._index;
     };
```
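The new `peekRegex` accepts a match only if it begins exactly at the current index: it runs `exec` on the remaining substring and rejects any match whose `index` is not 0. Two consequences follow from standard `RegExp` behavior. A pattern that does not match at the current position may still scan the rest of the input before being rejected, and if the caller's regex carries the `g` or `y` flag, `exec` reads and updates its `lastIndex`, so results may depend on earlier calls. The sketch below restates the published logic as a standalone function and, for comparison, a sticky-flag variant. The sticky variant is a swapped-in technique, not something the package does, and the two differ for patterns that use `^` (or lookbehind); the function names are hypothetical.

```typescript
// Restates the logic of the published `StringParserInput.peekRegex`: search the
// remaining text and accept only a match that starts at the current index.
function peekRegexSubstring(text: string, index: number, regex: RegExp): string | null {
  const m = regex.exec(text.substring(index));
  if (m === null || m.index !== 0) return null;
  return m[0];
}

// Alternative technique (not what the package does): a sticky copy of the regex
// attempts a match only at `index`, without slicing the text or scanning ahead.
function peekRegexSticky(text: string, index: number, regex: RegExp): string | null {
  const sticky = new RegExp(regex.source, regex.flags.replace(/[gy]/g, "") + "y");
  sticky.lastIndex = index;
  const m = sticky.exec(text);
  return m === null ? null : m[0];
}

const sampleText = "number a123";
console.log(peekRegexSubstring(sampleText, 7, /[a-z]\d+/)); // a123
console.log(peekRegexSticky(sampleText, 7, /[a-z]\d+/)); // a123
console.log(peekRegexSubstring(sampleText, 0, /\d+/)); // null ("123" exists, but not at index 0)
console.log(peekRegexSticky(sampleText, 0, /\d+/)); // null
// `^` anchors to the sliced text in the first version, but to the start of the
// whole string in the sticky version (without the m flag).
console.log(peekRegexSubstring(sampleText, 7, /^a/)); // a
console.log(peekRegexSticky(sampleText, 7, /^a/)); // null
```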
package/dist/parsers/RegexParser.d.ts
ADDED
```diff
@@ -0,0 +1,7 @@
+import { Parser, ParserContext, ParseResult } from "../core";
+export declare class RegexParser implements Parser<string> {
+    private _regex;
+    _charBitmap: number[] | null;
+    constructor(_regex: RegExp);
+    parse(parserContext: ParserContext): ParseResult<string>;
+}
```
package/dist/parsers/RegexParser.js
ADDED
```diff
@@ -0,0 +1,21 @@
+"use strict";
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.RegexParser = void 0;
+var core_1 = require("../core");
+var RegexParser = /** @class */ (function () {
+    function RegexParser(_regex) {
+        this._regex = _regex;
+    }
+    RegexParser.prototype.parse = function (parserContext) {
+        if (!parserContext.input.readRegex) {
+            throw core_1.ParseError.parserRejected(this, parserContext, "Input doesn't support regex parsing");
+        }
+        var reResult = parserContext.input.readRegex(this._regex);
+        if (!reResult) {
+            return core_1.ParseResult.failed(core_1.ParseError.parserRejected(this, parserContext, "Expected " + this._regex.source));
+        }
+        return core_1.ParseResult.successful(reResult);
+    };
+    return RegexParser;
+}());
+exports.RegexParser = RegexParser;
```
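`RegexParser.parse` relies on the optional `readRegex` method added to `ParserInput`: an input without it makes the parser throw, while an input that has it but finds no match yields a failed result. The sketch below follows that control flow with local, simplified types. `RegexInput`, `PlainInput`, `RegexCapableInput`, `Outcome`, and `parseRegex` are hypothetical names, not exports of the package, and the real `ParserInput` also declares bookmark and `eof` members that are omitted here. One detail carried over from the published code: both success checks test truthiness, so a regex that matches the empty string (such as `/\d*/`) is reported as a failure.

```typescript
// Local, simplified re-declaration of the ParserInput members relevant here
// (the interface in core.d.ts also has getBookmark, seekToBookmark, and eof).
interface RegexInput {
  read(readLen: number): string;
  peek(peekLen: number): string;
  readRegex?(regex: RegExp): string | null;
  peekRegex?(regex: RegExp): string | null;
}

// A hypothetical custom input that does not opt into regex support.
class PlainInput implements RegexInput {
  protected index = 0;
  protected readonly text: string;
  constructor(text: string) {
    this.text = text;
  }
  peek(peekLen: number): string {
    return this.text.substring(this.index, this.index + peekLen);
  }
  read(readLen: number): string {
    const s = this.peek(readLen);
    this.index += s.length;
    return s;
  }
}

// The same input with the optional regex methods, following the published logic.
class RegexCapableInput extends PlainInput {
  peekRegex(regex: RegExp): string | null {
    const m = regex.exec(this.text.substring(this.index));
    return m !== null && m.index === 0 ? m[0] : null;
  }
  readRegex(regex: RegExp): string | null {
    const ret = this.peekRegex(regex);
    if (ret) this.index += ret.length; // an empty match does not advance
    return ret;
  }
}

type Outcome = { ok: true; value: string } | { ok: false; message: string };

// Mirrors the control flow of the published RegexParser.parse, with a simplified result type.
function parseRegex(input: RegexInput, regex: RegExp): Outcome {
  if (!input.readRegex) throw new Error("Input doesn't support regex parsing");
  const match = input.readRegex(regex);
  if (!match) return { ok: false, message: "Expected " + regex.source };
  return { ok: true, value: match };
}

const src = new RegexCapableInput("abc123");
console.log(parseRegex(src, /[a-z]+/)); // { ok: true, value: 'abc' }
console.log(parseRegex(src, /\d+/)); // { ok: true, value: '123' }
console.log(parseRegex(new RegexCapableInput("xyz"), /\d*/).ok); // false: the empty match "" is falsy
try {
  parseRegex(new PlainInput("abc"), /a/);
} catch (e) {
  console.log((e as Error).message); // Input doesn't support regex parsing
}
```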