@the_dissidents/libemmm 0.0.8 → 0.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,38 +1,45 @@
1
1
  # libemmm
2
2
 
3
- This package contains the parser and language server for the `emmm` markup language.
4
-
5
- ```sh
6
- npm install @the_dissidents/libemmm
7
- ```
3
+ This package contains the parser and a default configuration of the `emmm` markup language.
8
4
 
9
5
  ## Usage
10
6
 
11
7
  `emmm` is an extensible language. The parser by itself only handles the basic syntax; it accepts a `Configuration` object that defines most of the features.
12
8
 
9
+ To parse a source, create a `ParseContext` object from a `Configuration`; the context object holds the parser state, and you can use the same context to parse multiple sources to make definitions persist across them.
10
+
13
11
  ```typescript
14
12
  import * as emmm from '@the_dissidents/libemmm';
13
+
15
14
  let config = new emmm.Configuration(emmm.BuiltinConfiguration);
16
- // add definitions to config here
15
+ // optionally, add definitions to config here
16
+
17
+ let context = new emmm.ParseContext(config);
17
18
  ```
18
19
 
19
- The parser reads from a very simple scanner interface that only goes forward, without backtracking. Usually you can use the default implementation. The parser returns a `Document` object.
20
+ The parser reads from a very simple scanner interface that only goes forward, without backtracking. Usually you can use the default implementation. Parsing yields a `Document` object.
20
21
 
21
22
  ```typescript
22
23
  let scanner = new emmm.SimpleScanner(source);
23
- let doc = emmm.parse(scanner, config);
24
- ```
24
+ let doc = context.parse(scanner);
25
25
 
26
- - `doc.root` is the AST root node.
27
- - `doc.context` is a `ParseContext` object containing the state of the language that the extensions need to know, such as variables and modifier definitions. This is its state at the end of the parse.
28
- - `doc.messages` is the array of diagnostic messages.
29
- - You may want to call `doc.debugPrint(source)` to get a pretty-printed debug string of the AST.
26
+ // `doc.root` is the AST root node
27
+ // `doc.messages` is an array of diagnostic messages
28
+ ```
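The scanner contract described above can be pictured with a toy sketch. The `ForwardScanner` interface and `StringScanner` class below are hypothetical. They only illustrate what "forward-only, no backtracking" means and are not libemmm's `SimpleScanner` API.

```typescript
// Hypothetical forward-only scanner (not libemmm's API). The position only
// ever increases: a caller can test what comes next, but cannot rewind.
interface ForwardScanner {
  position(): number;
  isEOF(): boolean;
  peek(s: string): boolean;   // does the remaining input start with `s`?
  accept(s: string): boolean; // consume `s` if it comes next
  acceptChar(): string;       // consume and return a single character
}

class StringScanner implements ForwardScanner {
  private readonly src: string;
  private pos = 0;

  constructor(src: string) {
    this.src = src;
  }

  position(): number {
    return this.pos;
  }

  isEOF(): boolean {
    return this.pos >= this.src.length;
  }

  peek(s: string): boolean {
    return this.src.startsWith(s, this.pos);
  }

  accept(s: string): boolean {
    if (!this.peek(s)) return false; // nothing is consumed on failure
    this.pos += s.length;
    return true;
  }

  acceptChar(): string {
    if (this.isEOF()) throw new Error("unexpected end of input");
    return this.src[this.pos++];
  }
}
```

A parser built on such an interface has to decide what a construct is from the characters it has already consumed, which is why the shorthand rules later in this document insist on escaping.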
30
29
 
31
30
  ## A Semi-Technical Reference to `emmm` Syntax
32
31
 
33
32
  ![AST Structure](./doc-images/ast.svg)
34
33
 
35
- Block-level entities are usually separated by a blank line (two newline characters). One newline does not create a new block, and is preserved along with other whitespaces inside the block.
34
+ ### 1. Block entities
35
+
36
+ #### 1.1. Paragraphs
37
+
38
+ The most basic type of **block-level** entity is the **paragraph**.
39
+
40
+ Block-level entities are usually separated by a blank line (two newline characters). One newline does not create a new block and is preserved. Whitespace and newlines at the beginning of a block are usually ignored. However, whitespace *inside* the block is preserved [^1].
41
+
42
+ [^1]: There is also an option in `KernelConfiguration` that tells the parser to collapse consecutive whitespaces to a single space.
36
43
 
37
44
  ```
38
45
  This is a paragraph.
@@ -41,11 +48,23 @@ This is another paragraph.
41
48
  Still in the same paragraph, but after a newline.
42
49
  ```
43
50
 
44
- A block that is not modified can be either a **paragraph** or a **preformatted block**, depending on the modifier that encloses it; if there is no modifier enclosing it, it is a normal paragraph. The contents of preformatted blocks are treated as plain text. No parsing of modifiers and escape sequences is performed. Whitespaces and newlines at the beginning of a block is usually ignored, but in preformatted blocks, only the first newline (if any) is ignored.
51
+ In paragraphs, you can use a backslash `\` to **escape** a character immediately after it, so that it will not be interpreted as a special character, such as the beginning of a modifier.
52
+
53
+ > You can even use this to put multiple consecutive newlines in a single paragraph: just put a backslash before each newline (or at least every two newlines). However, this may look confusing.
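To make the blank-line and escape rules concrete, here is a toy splitter. It is not how libemmm's parser is structured: `splitParagraphs` is a hypothetical helper that only knows about blank lines and backslash escapes (no modifiers), ignores leading whitespace of each block, and keeps the character after `\` literally, so an escaped newline never counts toward a paragraph break.

```typescript
// Toy illustration only: split text into paragraphs at unescaped blank lines.
// A backslash makes the next character literal, so "\<newline>" never ends a paragraph.
function splitParagraphs(src: string): string[] {
  const paragraphs: string[] = [];
  let current = "";
  const flush = (): void => {
    const block = current.trimStart(); // leading whitespace/newlines are ignored
    if (block !== "") paragraphs.push(block);
    current = "";
  };
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "\\" && i + 1 < src.length) {
      current += src[i + 1]; // escaped character, kept literally
      i += 2;
    } else if (ch === "\n" && src[i + 1] === "\n") {
      flush(); // an unescaped blank line ends the block
      i += 2;
    } else {
      current += ch; // single newlines and inner whitespace are preserved
      i += 1;
    }
  }
  flush();
  return paragraphs;
}
```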
54
+
55
+ #### 1.2. Block modifiers
45
56
 
46
- The construct `[.foo]` or `[.foo args]` before a block signals a **block modifier**, with `args` being an optional `:`-separated list of arguments (more on that later). It always starts a new block, even when at a position normally not expected to do so (but this will trigger a warning).
57
+ The construct `[.foo]` or `[.foo args]` (called the **head** of the modifier) signals a **block modifier**, with `args` being an optional `|`-separated list of arguments (more on that later). It always starts a new block, even at a position where a new block is not normally expected (this will trigger a warning).
47
58
 
48
- Some block modifiers don't accept any content. For those that accept, their scope is limited to *the immediately following block*, unless a pair of brackets (`:--` and `--:`) is used to group blocks together.
59
+ ##### 1.2.1. Normal content
60
+
61
+ Most block modifiers accept block-level entities as **content**. They always try to find the content, even when it is separated from the head by multiple newlines. Block modifiers can be nested; once there is no further nested modifier, the content is a paragraph.
62
+
63
+ By default, a block modifier's scope is limited to *the immediately following block*, unless a pair of brackets (`:--` and `--:`) is used to group blocks together.
64
+
65
+ Examples:
66
+
67
+ > Note: the code below may look confusing and difficult to read. However, `emmm` is designed with a [GUI editor](../../apps/editor/README.md) in mind; with a graphical gutter, syntax highlighting and automatic hanging indentation, the structures become fairly intuitive.
49
68
 
50
69
  ```
51
70
  [.foo] This is under foo (whitespace after ] is optional).
@@ -60,7 +79,12 @@ This paragraph is NOT under foo, [.foo] but this immediately starts a new one un
60
79
  [.foo] ... and this is another block of foo.
61
80
 
62
81
  [.foo]
63
- [.foo] However, this is foo inside foo, since the outer foo hadn't encountered any block before the parser met the inner foo, which became the content of the outer one.
82
+
83
+
84
+ You can actually add a lot of initial newlines and whitespace and still be inside foo. This will trigger a warning.
85
+
86
+ [.foo]
87
+ [.foo] This is a paragraph inside foo inside foo, since the outer foo hadn't encountered any block before the parser met the inner foo, which became the content of the outer one.
64
88
 
65
89
  [.foo]
66
90
  :--
@@ -73,14 +97,18 @@ This is still in foo.
73
97
  [.foo] :--
74
98
  You can have nested brackets. Not exactly beautiful looking, though.
75
99
 
76
- Note that closing brackets have to be on its own line, but the opening ones do not. But you must have a newline after it.
100
+ Note that closing brackets have to be on their own line, but the opening ones do not. On the other hand, you must have a newline after a closing bracket.
77
101
  --:
78
102
  --:
79
103
  ```
80
104
 
81
105
  You can also use the brackets without a modifier. However, this has little effect.
82
106
 
83
- Suppose the modifier `[.pre]` accepts a preformatted block:
107
+ ##### 1.2.2. Preformatted content
108
+
109
+ Some block modifiers accept a **preformatted block**. The contents of preformatted blocks are treated as plain text: modifiers and escape sequences are not parsed, and whitespace is not collapsed. However, as with normal content, newlines and whitespace immediately following the modifier head are ignored.
110
+
111
+ Examples, supposing the modifier `[.pre]` accepts a preformatted block:
84
112
 
85
113
  ```
86
114
  [.pre] This is preformatted content, suitable for code and ASCII art. Always treated as plain text, even if I write [.foo] or [/foo] or \[.
@@ -97,14 +125,52 @@ export function setDebugLevel(level: DebugLevel) {
97
125
  --:
98
126
  ```
99
127
 
100
- Use a `;` before `]` to signify empty content. Modifiers that don't accept content can also be written with `;]`, but this is not required.
128
+ ##### 1.2.3. Empty content or no content
129
+
130
+ Use a `;` before `]` to signify empty content.
101
131
 
102
132
  ```
103
133
  [.foo;]
104
134
  [.pre;]
105
135
  ```
106
136
 
107
- In normal paragraphs, you can use a backslash `\` to **escape** a character immediately after it, so that it will not be interpreted as a special character (e.g. the beginning of a modifier).
137
+ Some block modifiers don't accept any content. In that case, `;` is optional.
138
+
139
+ Example, supposing `[.boo]` doesn't accept content:
140
+
141
+ ```
142
+ [.boo]
143
+ This is a regular paragraph and not in boo!
144
+
145
+ [.boo] This will trigger a warning.
146
+ [.boo;] Actually, this will still trigger a warning. It's better to put things on a new line.
147
+ ```
148
+
149
+ #### 1.3. Block shorthands
150
+
151
+ **Block shorthands** can be defined via custom configuration or the `[-block-shorthand]` system modifier (see below). They're just syntactic sugar for block modifiers.
152
+
153
+ A block shorthand consists of a **prefix** and some **interfixes**. Arguments are placed between pre- and interfixes, and the content (if it accepts one) is after the last interfix.
154
+
155
+ The following example shows a shorthand with prefix `:: ` and a single interfix ` =`:
156
+
157
+ ```
158
+ :: author = J. Mustermann
159
+ ```
160
+
161
+ It is equivalent to
162
+
163
+ ```
164
+ [.metadata|author] J. Mustermann
165
+ ```
166
+
167
+ except that modifiers must have a name (here "metadata") but block shorthands have no names.
168
+
169
+ Note that `emmm` shorthands are parsed without backtracking. Whenever the parser sees a shorthand prefix at the start of a line, it assumes a shorthand (except, of course, in preformatted blocks). If the shorthand can't be parsed successfully (for example, if the ` =` is missing), it will trigger an error.
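The shorthand in the example above can be read as a plain rewrite into block-modifier syntax. The sketch below is a hypothetical illustration, not part of libemmm: `BlockShorthand`, `desugarLine` and the `target` field are invented here, and the modifier name `metadata` simply mirrors the README example. It also shows the no-backtracking rule: once the prefix matches, a missing interfix is an error rather than a fallback to an ordinary paragraph.

```typescript
// Hypothetical shape of a block shorthand, mirroring the README example.
interface BlockShorthand {
  prefix: string;       // e.g. ":: "
  interfixes: string[]; // e.g. [" ="]
  target: string;       // illustrative modifier name, e.g. "metadata"
}

// Rewrite one line that may start with the shorthand into block-modifier syntax.
// Returns null when the line does not start with the prefix.
function desugarLine(line: string, sh: BlockShorthand): string | null {
  if (!line.startsWith(sh.prefix)) return null;
  let rest = line.slice(sh.prefix.length);
  const args: string[] = [];
  for (const interfix of sh.interfixes) {
    const at = rest.indexOf(interfix);
    // No backtracking: the prefix has already committed us to a shorthand.
    if (at < 0) throw new Error(`shorthand is missing "${interfix}"`);
    args.push(rest.slice(0, at));
    rest = rest.slice(at + interfix.length);
  }
  const head = args.length > 0 ? `${sh.target}|${args.join("|")}` : sh.target;
  return `[.${head}]${rest}`; // whatever follows the last interfix is the content
}
```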
170
+
171
+ ### 2. Inline entities
172
+
173
+ #### 2.1. Inline modifiers
108
174
 
109
175
  **Inline modifiers** are similar to block modifiers, but occur in paragraphs. They are written as `[/baa]` or `[/baa args]`. If accepting content, use `[;]` to mark the end of their scope.
110
176
 
@@ -114,87 +180,163 @@ This one is without content: [/baa;].
114
180
  Baa inside a baa: [/baa]one [/baa]two[;] three[;].
115
181
  ```
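Because `[;]` always closes the innermost open inline modifier, nesting can be tracked with a stack. The toy parser below works under simplifying assumptions (no escapes, no shorthands, and the whole modifier head, arguments included, is kept as one string); the `InlineNode` type and `parseInline` function are hypothetical and do not reflect libemmm's AST.

```typescript
type Inline = string | InlineNode;
interface InlineNode {
  head: string;      // modifier name plus any arguments, unparsed
  content: Inline[];
}

function parseInline(src: string): Inline[] {
  const root: Inline[] = [];
  const stack: Inline[][] = [root]; // innermost open content list is on top
  let text = "";
  const top = (): Inline[] => stack[stack.length - 1];
  const flush = (): void => {
    if (text !== "") top().push(text);
    text = "";
  };

  let i = 0;
  while (i < src.length) {
    if (src.startsWith("[;]", i)) {
      if (stack.length === 1) throw new Error(`unmatched [;] at ${i}`);
      flush();
      stack.pop(); // close the innermost modifier
      i += 3;
    } else if (src.startsWith("[/", i)) {
      const end = src.indexOf("]", i);
      if (end < 0) throw new Error(`unterminated modifier head at ${i}`);
      flush();
      const head = src.slice(i + 2, end);
      if (head.endsWith(";")) {
        top().push({ head: head.slice(0, -1), content: [] }); // [/baa;]: empty content
      } else {
        const node: InlineNode = { head, content: [] };
        top().push(node);
        stack.push(node.content); // following text belongs to this modifier
      }
      i = end + 1;
    } else {
      text += src[i];
      i += 1;
    }
  }
  flush();
  if (stack.length !== 1) throw new Error("missing [;]");
  return root;
}
```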
116
182
 
117
- Some modifiers **expand** to something. For example, the built-in inline modifier `[/$]` expands to the value of a variable.
183
+ #### 2.2. Inline shorthands
184
+
185
+ Inline shorthands are similar to their block-level counterparts; however, they can appear anywhere in a paragraph, not only at the beginning of a line, and if they accept any parameter or content, they must also have a **postfix** (functioning like `[;]`).
186
+
187
+ The content slot doesn't have to come last. For example, a link shorthand can be defined with prefix `<`, interfix `>(` and postfix `)` and with content at the first position:
188
+
189
+ ```
190
+ Check out <this>(myurl).
191
+ ```
192
+
193
+ Roughly equivalent to:
118
194
 
119
- **System modifiers** are very similar to block modifiers in terms of parsing, except they begins with `[-` and never expand to anything. They modify the state of the `ParseContent`, e.g. assigning variables or creating new modifiers.
195
+ ```
196
+ Check out [/link myurl]this[;].
197
+ ````
198
+
199
+ > This is **intended behavior** but **not yet implemented**. Currently, the content slot must be the last one.
200
+
201
+ Again, `libemmm` parses inline shorthands without backtracking. In this example, whenever you need to use the character `<` in a paragraph that doesn't constitute a link shorthand, you must escape it. For example, `(a+b) * (a-b) <= a^2` will produce an error.
202
+
203
+ > This shows why you should be careful defining inline shorthands. Only use characters that aren't used in regular writing, or use a combination of characters.
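The commitment described above can be sketched as a rewrite from the intended `<text>(url)` form into the `[/link url]text[;]` form used in the example. This is a hypothetical illustration, not libemmm code: `desugarLinks` is invented here and ignores the current restriction that the content slot must come last. As soon as it reads an unescaped `<`, it commits to a link and throws if `>(` or `)` never follows.

```typescript
// Hypothetical link shorthand: prefix "<", interfix ">(", postfix ")".
function desugarLinks(src: string): string {
  let out = "";
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "\\" && i + 1 < src.length) {
      out += src.slice(i, i + 2); // keep escape sequences untouched
      i += 2;
      continue;
    }
    if (ch !== "<") {
      out += ch;
      i += 1;
      continue;
    }
    // Prefix seen: commit to the shorthand, with no fallback to plain text.
    const mid = src.indexOf(">(", i + 1);
    if (mid < 0) throw new Error(`position ${i}: "<" starts a link but ">(" never follows`);
    const close = src.indexOf(")", mid + 2);
    if (close < 0) throw new Error(`position ${i}: link shorthand is missing ")"`);
    const text = src.slice(i + 1, mid);
    const url = src.slice(mid + 2, close);
    out += `[/link ${url}]${text}[;]`;
    i = close + 1;
  }
  return out;
}
```

With this definition, `Check out <this>(myurl).` becomes `Check out [/link myurl]this[;].`, while `(a+b) * (a-b) <= a^2` throws unless the `<` is written as `\<`.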
204
+
205
+ ### 3. Expansion of modifiers
206
+
207
+ Some modifiers **expand** to something. For example, the built-in inline modifier `[/$]` expands to the value of a variable, and all user-defined modifiers expand to a copy of their definition with "slots" filled in with content and parameters filled in with arguments.
208
+
209
+ After expanding, the new entities are reparsed as if they're part of the original source code. For a verbose walkthrough of how this works, see the [Parser reference](https://github.com/the-dissidents/emmm/wiki/Parser-reference#expanding-and-reparsing) in the wiki.
210
+
211
+ ### 4. System modifiers
212
+
213
+ **System modifiers** are a special type of modifier. They are similar to block modifiers in terms of parsing, but they begin with `[-` and never expand to anything. Usually, they modify the state of the `ParseContext`, e.g. assigning variables or creating new modifiers.
120
214
 
121
215
  > The AST definition specifies that `SystemModifierNode`s can appear as either block-level or inline-level entities. The reason behind this is that we may want them to appear inside `[-define-inline]` definitions and thus expand into inline entities:
122
216
  > ```
123
217
  > [-define-inline foo]
124
218
  > :--
125
- > [-var xyz:123]
219
+ > [-var xyz=123]
126
220
  > xyz is now 123
127
221
  > --:
128
222
  > ```
129
223
  > However, in parsing they are treated only as block-level modifiers, meaning that using them inline *directly* is not supported. Also note that inside `[-define-inline]` definitions they are still technically distinct blocks, only transformed into inline entities at expand time. **This is indeed awkward. We will change it if we think of a better approach.**
130
224
 
131
- The **arguments** for modifiers are basically `:`-delimited sequences. Each argument can contain **interpolations**, whose syntaxes are defined by an opening string and a closing string (there isn't a fixed form). For example, the built-in interpolator for variable reference opens with `$(` and closes with `)`. Interpolations expand to plain strings. They can also be nested.
225
+ ### 5. Modifier arguments
226
+
227
+ #### 5.0. Introduction
132
228
 
133
- As in paragraphs, use `\` to **escape** characters in arguments.
229
+ The **arguments** for modifiers are basically `|`-delimited sequences. They are fundamentally simple strings and cannot contain modifiers.
230
+
231
+ As in paragraphs, use `\` to escape characters in arguments.
134
232
 
135
233
  ```
136
- [/baa anything can be arguments:they can even
234
+ [/baa anything can be arguments|they can even
137
235
  span
138
236
  many
139
237
 
140
- lines:but colons (\:), semicolons (\;) and square brackets (\[\]) need escaping;]
141
-
142
- Suppose the variables are "x" = "y", "y" = "1".
143
-
144
- [.foo $(x)] Argument is "y"
145
- [.foo $(x)$(y)] Argument is "y1"
146
- [.foo $($(x))] Argument is "1"
147
- [.foo $(invalid)] Will fail
238
+ lines (but there is no concept of paragraphs)|note that pipes (\|), semicolons (\;) and square brackets (\[\]) need escaping;]
148
239
  ```
149
240
 
150
241
  A `|` before the first argument explicitly marks the beginning of that argument, so that any following whitespace is not trimmed. In fact, it is not even required to have *any* whitespace after the modifier name, and the built-in `[/$]` makes use of this (you can write `[/$myvar]` instead of `[/$ myvar]`). However, omitting the space in most other cases is, obviously, not recommended.
151
242
 
152
243
  ```
153
244
  [.foo abc] Argument is "abc"
154
- [.foo: abc] Argument is " abc"
245
+ [.foo| abc] Argument is " abc"
155
246
  [.fooabc] Argument is "abc" (argh!)
156
247
  ```
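The splitting rules in this section (an unescaped `|` separates arguments, `\` escapes the next character, and a leading `|` keeps the whitespace that would otherwise be trimmed) can be sketched as follows. `splitHeadArguments` is a hypothetical helper, not a libemmm function; its input is assumed to be the text between the modifier name and the closing `]`, and interpolations are not handled.

```typescript
// Hypothetical: split the argument part of a modifier head into arguments.
function splitHeadArguments(afterName: string): string[] {
  let raw: string;
  if (afterName.startsWith("|")) {
    raw = afterName.slice(1); // explicit start: keep the following whitespace
  } else {
    raw = afterName.trimStart(); // implicit start: leading whitespace is trimmed
    if (raw === "") return [];   // no arguments at all
  }
  const args: string[] = [];
  let current = "";
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (ch === "\\" && i + 1 < raw.length) {
      i += 1;
      current += raw[i]; // escaped character, e.g. "\|", is kept literally
    } else if (ch === "|") {
      args.push(current); // an unescaped pipe ends the current argument
      current = "";
    } else {
      current += ch;
    }
  }
  args.push(current);
  return args;
}
```

Applied to the examples above, `" abc"` yields `["abc"]`, `"| abc"` yields `[" abc"]`, and the name-glued `"abc"` from `[.fooabc]` also yields `["abc"]`.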
157
248
 
249
+ Although the parser doesn't do this, many modifiers' implementations internally trim whitespace around arguments to make the syntax more flexible.
250
+
251
+ #### 5.1. Named arguments
252
+
253
+ Arguments can be **named**. Named arguments are in the form `name=value`, where `name` is not allowed to contain `:`, `/`, `[`, `=`, whitespaces, escape sequences or interpolations.
254
+
255
+ > This is experimental and subject to change. In particular, the non-allowed characters in names still feel arbitrary.
256
+
257
+ Arguments containing `=` are only interpreted as named if the name is valid. Otherwise they're treated as normal arguments.
258
+
259
+ ```
260
+ [.foo baa=www] One named argument "baa" with value www
261
+ [.foo example.com/?query=123] No named arguments!
262
+ ```
263
+
264
+ You can mix named and unnamed arguments. Internally, named arguments are unordered and they are accessed separately. For example, the following instances of `[.foo]` are equivalent:
265
+
266
+ ```
267
+ [.foo unnamed1|unnamed2|baa=www|boo=qqq;]
268
+ [.foo unnamed1|baa=www|unnamed2|boo=qqq;]
269
+ [.foo boo=qqq|unnamed1|baa=www|unnamed2;]
270
+
271
+ etc., etc.
272
+ ```
273
+
274
+ *Un*named arguments are also called **positional arguments**.
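The rule for telling named from positional arguments can be sketched like this. `collectArguments` is hypothetical; the character check follows the list in this section, with escape sequences approximated by `\` and interpolations by the built-in `$(` opener (both assumptions), and a repeated name simply overwrites the earlier value (also an assumption).

```typescript
interface CollectedArguments {
  positional: string[];
  named: Map<string, string>;
}

// Characters this section lists as invalid in names, plus "\" for escapes.
const INVALID_NAME = /[:\/\[=\s\\]/;

function collectArguments(args: string[]): CollectedArguments {
  const positional: string[] = [];
  const named = new Map<string, string>();
  for (const arg of args) {
    const eq = arg.indexOf("=");
    const name = eq > 0 ? arg.slice(0, eq) : "";
    if (name !== "" && !INVALID_NAME.test(name) && !name.includes("$(")) {
      named.set(name, arg.slice(eq + 1)); // valid name: a named argument
    } else {
      positional.push(arg); // otherwise it stays positional, in order
    }
  }
  return { positional, named };
}
```

Because named arguments go into a map and positional ones keep their relative order, every `[.foo ...]` variant in the example above produces the same result.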
275
+
276
+ #### 5.2. Argument interpolations
277
+
278
+ Each argument can contain **interpolations**, whose syntaxes are defined by an opening string and a closing string (there isn't a fixed form). For example, the built-in interpolator for variable reference opens with `$(` and closes with `)`. Interpolations expand to plain strings. They can also be nested.
279
+
280
+ Suppose the variables are "x" = "y", "y" = "1":
281
+
282
+ ```
283
+ [.foo $(x)] Argument is "y"
284
+ [.foo $(x)$(y)] Argument is "y1"
285
+ [.foo $($(x))] Argument is "1"
286
+ [.foo $(invalid)] Triggers a warning, argument is empty string
287
+ ```
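The nesting and failure behavior in these examples can be reproduced with a small recursive sketch. `interpolate` is hypothetical: it knows only the `$(`...`)` variable form, ignores escapes, expands the inner text first, and reports an undefined variable through a callback while substituting an empty string, matching the last example.

```typescript
// Hypothetical expansion of "$(...)" variable interpolations, innermost first.
function interpolate(
  arg: string,
  vars: Map<string, string>,
  warn: (msg: string) => void,
): string {
  let out = "";
  let i = 0;
  while (i < arg.length) {
    if (!arg.startsWith("$(", i)) {
      out += arg[i];
      i += 1;
      continue;
    }
    // Find the ")" matching this "$(", skipping over nested "$(...)".
    let depth = 1;
    let j = i + 2;
    while (j < arg.length && depth > 0) {
      if (arg.startsWith("$(", j)) { depth += 1; j += 2; }
      else if (arg[j] === ")") { depth -= 1; j += 1; }
      else { j += 1; }
    }
    if (depth !== 0) throw new Error(`unclosed "$(" at ${i}`);
    const name = interpolate(arg.slice(i + 2, j - 1), vars, warn); // expand inner part
    const value = vars.get(name);
    if (value === undefined) warn(`undefined variable: ${name}`);
    out += value ?? ""; // undefined variables become the empty string
    i = j;
  }
  return out;
}
```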
288
+
158
289
  ## A Synopsis of the Built-in Configuration
159
290
 
160
291
  ### System modifiers
161
292
 
162
- [**-define-block** *name*:*args...*] *content*
163
- [**-define-block** *name*:*args...*:(*slot*)] *content*
164
- [**-define-inline** *name*:*args...*] *content*
165
- [**-define-inline** *name*:*args...*:(*slot*)] *content*
293
+ [**-define-block** *name* | *args...*] *content*
294
+ [**-define-block** *name* | *args...* | (*slot*)] *content*
295
+ [**-define-inline** *name* | *args...*] *content*
296
+ [**-define-inline** *name* | *args...* | (*slot*)] *content*
166
297
 
167
- > Define a new modifier. The first argument is the name. If one or more arguments exist, and the last is enclosed in `()`, it is taken as the **slot name** (more on that later). The rest in the middle are names for the arguments.
168
- >
169
- > Take content as the definition of the new modifier.
298
+ > Define a new modifier, taking the content as the definition. The first argument is the name. If one or more arguments exist, and the last is enclosed in `()`, it is taken as the **slot name** (more on that later). The rest in the middle are names for the arguments.
299
+ >
300
+ > Currently, custom modifiers **always have a slot** even if you don't explicitly give a slot name. This is inconsistent with shorthands, which can be slotless (see below). We're considering changing this.
301
+ >
302
+ > You can define named arguments for your modifier using, well, named arguments:
303
+ >
304
+ > ```
305
+ > [-define-block foo|pos1|pos2|named=default]
306
+ > ...
307
+ > ```
308
+ > Named arguments for custom modifiers are **always optional** and you must specify a default value.
170
309
 
171
- [**-var** *id*:*value*]
310
+ [**-var** *id* | *value*]
311
+ [**-var** *id*=*value*]
172
312
 
173
313
  > Assigns `value` to a variable.
314
+ >
315
+ > The two syntaxes are equivalent *except that* in the second one, the name must obey the restrictions on argument names. For example, you can't use interpolations.
174
316
  >
175
317
  > You can't reassign arguments, only variables. Since arguments always take precedence over variables, "reassigning" them has no effect inside a definition and can only confuse the rest of the code.
176
318
 
177
- [**-define-block-prefix** *prefix*] *content*
178
- [**-define-block-prefix** *prefix*:(*slot*)] *content*
319
+ [**-block-shorthand** *prefix*] *content*
320
+ [**-block-shorthand** *prefix* | (*slot*)] *content*
179
321
 
180
- > Not implemented yet
181
-
182
- [**-define-inline-shorthand** *prefix*] *content*
183
- [**-define-inline-shorthand** *prefix*:(*slot*):*postfix*] *content*
184
- [**-define-inline-shorthand** *prefix*:*arg1*:*mid1*:*arg2*:*mid2*...] *content*
185
- [**-define-inline-shorthand** *prefix*:*arg1*:*mid1*:*arg2*:*mid2*...:(*slot*):*postfix*] *content*
322
+ [**-inline-shorthand** *prefix*] *content*
323
+ [**-inline-shorthand** *prefix* | (*slot*) | *postfix*] *content*
324
+ [**-inline-shorthand** *prefix* | *arg1* | *mid1* | *arg2* | *mid2*...] *content*
325
+ [**-inline-shorthand** *prefix* | *arg1* | *mid1* | *arg2* | *mid2*... | (*slot*) | *postfix*] *content*
186
326
 
187
- > Defines an inline shorthand. A shorthand notation consists of a prefix, zero or more pairs of argument and middle part, and optionally a slot and a postfix. You must specify a slot name if you want to use one, although you can specify an empty one using `()`. You may also specify an *empty* last argument, i.e. a `:` before the `]` that ends the modifier head, to make the postfix stand out better.
327
+ > Define shorthands. A shorthand notation consists of a prefix, zero or more pairs of argument and middle part, and optionally a slot and a postfix. You can specify a slot name if you want to use one, or just use `()`. You may also specify an *empty* last argument, i.e. a `|` before the `]` that ends the modifier head, to make the postfix stand out better.
188
328
  > ```
189
- > [-inline-shorthand:\[!:url:|:():\]:] content
329
+ > [-inline-shorthand|\[!|url|\||()|\]|] content
190
330
  > ```
191
- > This creates: `[!` argument:url `|` slot `]`
331
+ > This creates: `[!` argument|url `|` slot `]`
192
332
  > ```
193
- > [-inline-shorthand:\[!:url:|:text:\]:] content
333
+ > [-inline-shorthand|\[!|url|\||text|\]:] content
194
334
  > ```
195
- > This creates: `[!` argument:url `|` argument:text `]`
335
+ > This creates: `[!` argument|url `|` argument|text `]`
196
336
  >
197
- > Note the first shorthand has a slot, while the second doesn't. This means you can't put formatted content as text in the second shorthand.
337
+ > Note the second shorthand is **slotless**. This means you can't put formatted content as text in the second shorthand. This also applies to slotless block shorthands: they can't have any content.
338
+ >
339
+ > You **can't define** named arguments in shorthands.
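One reading of the argument layout described above, sketched as code. `shapeFromArgs` and `ShorthandShape` are hypothetical; the function assumes the arguments have already been split and unescaped (so `\|` arrives as `|`), and it is an interpretation of this description rather than libemmm's implementation.

```typescript
interface ShorthandShape {
  prefix: string;
  pairs: { arg: string; mid: string }[]; // argument name followed by its middle part
  slot?: string;    // slot name when a "(...)" argument is present ("" for "()")
  postfix?: string; // only present together with a slot
}

function shapeFromArgs(args: string[]): ShorthandShape {
  const a = [...args];
  // An empty last argument is allowed purely for readability; drop it.
  if (a.length > 1 && a[a.length - 1] === "") a.pop();
  if (a.length === 0) throw new Error("a shorthand needs a prefix");

  const prefix = a[0];
  const pairs: { arg: string; mid: string }[] = [];
  let i = 1;
  while (i < a.length) {
    const slot = /^\((.*)\)$/.exec(a[i]);
    if (slot !== null) {
      if (i + 2 !== a.length) throw new Error("a slot must be followed by exactly one postfix");
      return { prefix, pairs, slot: slot[1], postfix: a[i + 1] };
    }
    if (i + 1 >= a.length) throw new Error(`argument "${a[i]}" has no middle part`);
    pairs.push({ arg: a[i], mid: a[i + 1] });
    i += 2;
  }
  return { prefix, pairs }; // slotless shorthand
}
```

Under this reading, the first example above yields one `url`/`|` pair plus an unnamed slot and postfix `]`, while the second yields two pairs and no slot, which is why it cannot carry formatted content.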
198
340
 
199
341
  [**-use** *module-name*]
200
342
 
@@ -205,9 +347,9 @@ A colon before the first argument states explicitly the beginning of that argume
205
347
  [**.slot**]
206
348
  [**.slot** *name*]
207
349
 
208
- > Only used in block modifier definitons. When the new modifier is being used, expands to its content. You can use the slot name to specify *which* modifier's content you mean, in case of ambiguity. By default it refers to the nearest one.
350
+ > Only used in block-level definitions. When the new modifier or shorthand is used, this expands to its content. You can use the slot name to specify *which* modifier's content you mean, in case of ambiguity. By default it refers to the nearest one.
209
351
  > ```
210
- > [-define-block p:(0)]
352
+ > [-define-block p|(0)]
211
353
  > [-define-block q]
212
354
  > :--
213
355
  > [.slot]
@@ -243,7 +385,7 @@ A colon before the first argument states explicitly the beginning of that argume
243
385
  [**/slot**]
244
386
  [**/slot** *name*]
245
387
 
246
- > Same as `[.slot]`, but for inline modifier definitions.
388
+ > Same as `[.slot]` but for inline definitions.
247
389
 
248
390
  [**/$** *id*]
249
391
 
@@ -0,0 +1,13 @@
1
+ //#region rolldown:runtime
2
+ var __defProp = Object.defineProperty;
3
+ var __export = (all) => {
4
+ let target = {};
5
+ for (var name in all) __defProp(target, name, {
6
+ get: all[name],
7
+ enumerable: true
8
+ });
9
+ return target;
10
+ };
11
+
12
+ //#endregion
13
+ export { __export as t };