npm - @the_dissidents/libemmm - Versions diffs - 0.0.8 → 0.0.9 - Mend

@the_dissidents/libemmm 0.0.8 → 0.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md CHANGED

@@ -1,38 +1,45 @@
 # libemmm
-This package contains the parser and language server for the `emmm` markup language.
-```sh
-npm install @the_dissidents/libemmm
-```
+This package contains the parser and a default configuration of the `emmm` markup language.
 ## Usage
 `emmm` is an extensible language. The parser by itself only handles the basic syntax; it accepts a `Configuration` object that defines most of the features.
+To parse a source, create a `ParseContext` object from a `Configuration`; the context object holds the parser state, and you can use the same context to parse multiple sources to make definitions persist across them.
 ```typescript
 import * as emmm from '@the_dissidents/libemmm';
 let config = new emmm.Configuration(emmm.BuiltinConfiguration);
-// add definitions to config here
+// optionally, add definitions to config here
+let context = new emmm.ParseContext(config);
 ```
-The parser reads from a very simple scanner interface that only goes forward, without backtracking. Usually you can use the default implementation. The parser returns a `Document` object.
+The parser reads from a very simple scanner interface that only goes forward, without backtracking. Usually you can use the default implementation. Parsing yields a `Document` object.
 ```typescript
 let scanner = new emmm.SimpleScanner(source);
-let doc = emmm.parse(scanner, config);
-```
+let doc = context.parse(scanner);
-- `doc.root` is the AST root node.
-- `doc.context` is a `ParseContext` object containing the state of the language that the extensions need to know, such as variables and modifier definitions. This is its state at the end of the parse.
-- `doc.messages` is the array of diagnostic messages.
-- You may want to call `doc.debugPrint(source)` to get a pretty-printed debug string of the AST.
+// `doc.root` is the AST root node
+// `doc.messages` is an array of diagnostic messages
+```
 ## A Semi-Technical Reference to `emmm` Syntax
 ![AST Structure](./doc-images/ast.svg)
-Block-level entities are usually separated by a blank line (two newline characters). One newline does not create a new block, and is preserved along with other whitespaces inside the block.
+### 1. Block entities
+#### 1.1. Paragraphs
+The most basic type of **block-level** entity is the **paragraph**.
+Block-level entities are usually separated by a blank line (two newline characters). One newline does not create a new block and is preserved. Whitespace and newlines at the beginning of a block are usually ignored. However, whitespace *inside* the block is preserved [^1].
+[^1]: There is also an option in `KernelConfiguration` that tells the parser to collapse consecutive whitespace into a single space.
 ```
 This is a paragraph.
@@ -41,11 +48,23 @@ This is another paragraph.
 Still in the same paragraph, but after a newline.
 ```
-A block that is not modified can be either a **paragraph** or a **preformatted block**, depending on the modifier that encloses it; if there is no modifier enclosing it, it is a normal paragraph. The contents of preformatted blocks are treated as plain text. No parsing of modifiers and escape sequences is performed. Whitespaces and newlines at the beginning of a block is usually ignored, but in preformatted blocks, only the first newline (if any) is ignored.
+In paragraphs, you can use a backslash `\` to **escape** a character immediately after it, so that it will not be interpreted as a special character, such as the beginning of a modifier.
+> You can even use this to put multiple consecutive newlines in a single paragraph: just put a backslash before each newline (or at least every two newlines). However, this may look confusing.
+#### 1.2. Block modifiers
-The construct `[.foo]` or `[.foo args]` before a block signals a **block modifier**, with `args` being an optional `:`-separated list of arguments (more on that later). It always starts a new block, even when at a position normally not expected to do so (but this will trigger a warning).
+The construct `[.foo]` or `[.foo args]` (called the **head** of the modifier) signals a **block modifier**, with `args` being an optional `:`-separated list of arguments (more on that later). It always starts a new block, even when at a position normally not expected to have a new block (this will trigger a warning).
-Some block modifiers don't accept any content. For those that accept, their scope is limited to *the immediately following block*, unless a pair of brackets (`:--` and `--:`) is used to group blocks together.
+##### 1.2.1. Normal content
+Most block modifiers accept block-level entities as **content**. They will always try to find the content, even when it's separated from the head by multiple newlines. Block modifiers can be nested; when no further modifier is nested, the content is a paragraph.
+By default, a block modifier's scope is limited to *the immediately following block*, unless a pair of brackets (`:--` and `--:`) is used to group blocks together.
+Examples:
+> Note: the code below may appear very confusing and difficult to read. However, `emmm` is designed with a [GUI editor](../../apps/editor/README.md) in mind -- with a graphical gutter, syntax highlighting and automatic hanging indentation, the structures can be fairly intuitive.
 ```
 [.foo] This is under foo (whitespace after ] is optional).
@@ -60,7 +79,12 @@ This paragraph is NOT under foo, [.foo] but this immediately starts a new one un
 [.foo] ... and this is another block of foo.
 [.foo]
-[.foo] However, this is foo inside foo, since the outer foo hadn't encountered any block before the parser met the inner foo, which became the content of the outer one.
+    You can actually add a lot of initial newlines and whitespaces and still be inside foo. This will trigger a warning.
+[.foo]
+[.foo] This is a paragraph inside foo inside foo, since the outer foo hadn't encountered any block before the parser met the inner foo, which became the content of the outer one.
 [.foo]
 :--
@@ -73,14 +97,18 @@ This is still in foo.
 [.foo] :--
 You can have nested brackets. Not exactly beautiful looking, though.
-Note that closing brackets have to be on its own line, but the opening ones do not. But you must have a newline after it.
+Note that closing brackets have to be on their own line, but the opening ones do not. On the other hand, you must have a newline after a closing bracket.
 --:
 --:
 ```
 You can also use the brackets without a modifier. However, this has little effect.
-Suppose the modifier `[.pre]` accepts a preformatted block:
+##### 1.2.2. Preformatted content
+Some block modifiers accept a **preformatted block**. The contents of preformatted blocks are treated as plain text: modifiers and escape sequences are not parsed, and whitespace is not collapsed. However, as in normal content, newlines and whitespace immediately following the modifier head are ignored.
+Examples, supposing the modifier `[.pre]` accepts a preformatted block:
 ```
 [.pre] This is preformatted content, suitable for code and ASCII art. Always treated as plain text, even if I write [.foo] or [/foo] or \[.
@@ -97,14 +125,52 @@ export function setDebugLevel(level: DebugLevel) {
 --:
 ```
-Use a `;` before `]` to signify empty content. Modifiers that don't accept content can also be written with `;]`, but this is not required.
+##### 1.2.3. Empty content or no content
+Use a `;` before `]` to signify empty content.
 ```
 [.foo;]
 [.pre;]
 ```
-In normal paragraphs, you can use a backslash `\` to **escape** a character immediately after it, so that it will not be interpreted as a special character (e.g. the beginning of a modifier).
+Some block modifiers don't accept any content. In that case, `;` is optional.
+Example, supposing `[.boo]` doesn't accept content:
+```
+[.boo]
+This is a regular paragraph and not in boo!
+[.boo]  This will trigger a warning.
+[.boo;] Actually, this will still trigger a warning. It's better to put things on a new line.
+```
+#### 1.3. Block shorthands
+**Block shorthands** can be defined via custom configuration or the `[-block-shorthand]` system modifier (see below). They're just syntactic sugar for block modifiers.
+A block shorthand consists of a **prefix** and some **interfixes**. Arguments are placed between pre- and interfixes, and the content (if it accepts one) is after the last interfix.
+The following example shows a shorthand with prefix `:: ` and a single interfix ` =`:
+```
+:: author = J. Mustermann
+```
+It is equivalent to
+```
+[.metadata|author] J. Mustermann
+```
+except that modifiers must have a name (here "metadata") but block shorthands have no names.
+Note that `emmm` shorthands are parsed without backtracking. Whenever the parser sees a shorthand prefix at the start of a line, it assumes a shorthand (except, of course, in preformatted blocks). If the shorthand can't be successfully parsed (for example, because the ` =` interfix is missing), it will trigger an error.
+### 2. Inline entities
+#### 2.1. Inline modifiers
 **Inline modifiers** are similar to block modifiers, but occur in paragraphs. They are written as `[/baa]` or `[/baa args]`. If accepting content, use `[;]` to mark the end of their scope.
@@ -114,87 +180,163 @@ This one is without content: [/baa;].
 Baa inside a baa: [/baa]one [/baa]two[;] three[;].
 ```
-Some modifiers **expand** to something. For example, the built-in inline modifier `[/$]` expands to the value of a variable.
+#### 2.2. Inline shorthands
+Inline shorthands are similar to their block-level counterparts. However, they can appear anywhere in a paragraph, not only at the beginning of a line, and if they accept any parameter or content, they must also have a **postfix** (functioning like `[;]`).
+The content slot doesn't have to come last. For example, a link shorthand can be defined with prefix `<`, interfix `>(` and postfix `)` and with content at the first position:
+```
+Check out <this>(myurl).
+```
+Roughly equivalent to:
-**System modifiers** are very similar to block modifiers in terms of parsing, except they begins with `[-` and never expand to anything. They modify the state of the `ParseContent`, e.g. assigning variables or creating new modifiers.
+```
+Check out [/link myurl]this[;].
+```
+> This is **intended behavior** but **not yet implemented**. Currently, the content slot must be the last one.
+Again, `libemmm` parses inline shorthands without backtracking. In this example, whenever you need to use the character `<` in a paragraph that doesn't constitute a link shorthand, you must escape it. For example, `(a+b) * (a-b) <= a^2` will produce an error.
+> This shows why you should be careful defining inline shorthands. Only use characters that aren't used in regular writing, or use a combination of characters.
+### 3. Expansion of modifiers
+Some modifiers **expand** to something. For example, the built-in inline modifier `[/$]` expands to the value of a variable, and all user-defined modifiers expand to a copy of their definition with "slots" filled in with content and parameters filled in with arguments.
+After expanding, the new entities are reparsed as if they're part of the original source code. For a verbose walkthrough of how this works, see the [Parser reference](https://github.com/the-dissidents/emmm/wiki/Parser-reference#expanding-and-reparsing) in the wiki.
+### 4. System modifiers
+**System modifiers** are a special type of modifier. They are similar to block modifiers in terms of parsing, but they begin with `[-` and never expand to anything. Usually, they modify the state of the `ParseContext`, e.g. assigning variables or creating new modifiers.
 > The AST definition specifies that `SystemModifierNode`s can appear as either block-level or inline-level entities. The reason behind this is that we may want them to appear inside `[-define-inline]` definitions and thus expand into inline entities:
 > ```
 > [-define-inline foo]
 > :--
-> [-var xyz:123]
+> [-var xyz=123]
 > xyz is now 123
 > --:
 > ```
 > However, in parsing they are treated only as block-level modifiers, meaning that it's not supported to use them inline *directly*. Also note that inside `[-define-inline]` definitions they are still technically distinct blocks, only transformed into inline entities at expand time. **This is indeed awkward. We will change it if we think of a better approach.**
-The **arguments** for modifiers are basically `:`-delimited sequences. Each argument can contain **interpolations**, whose syntaxes are defined by an opening string and a closing string (there isn't a fixed form). For example, the built-in interpolator for variable reference opens with `$(` and closes with `)`. Interpolations expand to plain strings. They can also be nested.
+### 5. Modifier arguments
+#### 5.0. Introduction
-As in paragraphs, use `\` to **escape** characters in arguments.
+The **arguments** for modifiers are basically `|`-delimited sequences. They are fundamentally simple strings and cannot contain modifiers.
+As in paragraphs, use `\` to escape characters in arguments.
 ```
-[/baa anything can be arguments:they can even
+[/baa anything can be arguments|they can even
 span
 many
-lines:but colons (\:), semicolons (\;) and square brackets (\[\]) need escaping;]
-Suppose the variables are "x" = "y", "y" = "1".
-[.foo $(x)]       Argument is "y"
-[.foo $(x)$(y)]   Argument is "y1"
-[.foo $($(x))]    Argument is "1"
-[.foo $(invalid)] Will fail
+lines (but there is no concept of paragraphs)|note that pipes (\|), semicolons (\;) and square brackets (\[\]) need escaping;]
 ```
 A pipe before the first argument states explicitly the beginning of that argument, so that any following whitespace is not trimmed. In fact, it is not even required to have *any* whitespace after the modifier name, and the built-in `[/$]` makes use of this (you can write `[/$myvar]` instead of `[/$ myvar]`). However, omitting the space in most other cases is, obviously, not recommended.
 ```
 [.foo   abc] Argument is "abc"
-[.foo:  abc] Argument is "  abc"
+[.foo|  abc] Argument is "  abc"
 [.fooabc]    Argument is "abc" (argh!)
 ```
+Although the parser doesn't do this, many modifiers' implementations internally trim whitespace around arguments in order to make the syntax more flexible.
+#### 5.1. Named arguments
+Arguments can be **named**. Named arguments are in the form `name=value`, where `name` is not allowed to contain `:`, `/`, `[`, `=`, whitespaces, escape sequences or interpolations.
+> This is experimental and subject to change. In particular, the non-allowed characters in names still feel arbitrary.
+Arguments containing `=` are only interpreted as named if the name is valid. Otherwise they're treated as normal arguments.
+```
+[.foo baa=www] One named argument "baa" with value www
+[.foo example.com/?query=123] No named arguments!
+```
+You can mix named and unnamed arguments. Internally, named arguments are unordered and they are accessed separately. For example, the following instances of `[.foo]` are equivalent:
+```
+[.foo unnamed1|unnamed2|baa=www|boo=qqq;]
+[.foo unnamed1|baa=www|unnamed2|boo=qqq;]
+[.foo boo=qqq|unnamed1|baa=www|unnamed2;]
+etc., etc.
+```
+*Un*named arguments are also called **positional arguments**.
+#### 5.2. Argument interpolations
+Each argument can contain **interpolations**, whose syntaxes are defined by an opening string and a closing string (there isn't a fixed form). For example, the built-in interpolator for variable reference opens with `$(` and closes with `)`. Interpolations expand to plain strings. They can also be nested.
+Suppose the variables are "x" = "y", "y" = "1":
+```
+[.foo $(x)]       Argument is "y"
+[.foo $(x)$(y)]   Argument is "y1"
+[.foo $($(x))]    Argument is "1"
+[.foo $(invalid)] Triggers a warning, argument is empty string
+```
 ## A Synopsis of the Built-in Configuration
 ### System modifiers
-[**-define-block** *name*:*args...*] *content*
-[**-define-block** *name*:*args...*:(*slot*)] *content*
-[**-define-inline** *name*:*args...*] *content*
-[**-define-inline** *name*:*args...*:(*slot*)] *content*
+[**-define-block** *name* | *args...*] *content*
+[**-define-block** *name* | *args...* | (*slot*)] *content*
+[**-define-inline** *name* | *args...*] *content*
+[**-define-inline** *name* | *args...* | (*slot*)] *content*
-> Define a new modifier. The first argument is the name. If one or more arguments exist, and the last is enclosed in `()`, it is taken as the **slot name** (more on that later). The rest in the middle are names for the arguments.
->
-> Take content as the definition of the new modifier.
+> Define a new modifier, taking the content as the definition. The first argument is the name. If one or more arguments exist, and the last is enclosed in `()`, it is taken as the **slot name** (more on that later). The rest in the middle are names for the arguments.
+>
+> Currently, custom modifiers **always have a slot** even if you don't explicitly give a slot name. This is inconsistent with shorthands which can be slotless (see below). We're considering changing this.
+>
+> You can define named arguments for your modifier using, well, named arguments:
+>
+> ```
+> [-define-block foo|pos1|pos2|named=default]
+> ...
+> ```
+> Named arguments for custom modifiers are **always optional** and you must specify a default value.
-[**-var** *id*:*value*]
+[**-var** *id* | *value*]
+[**-var** *id*=*value*]
 > Assigns `value` to a variable.
+>
+> The two syntaxes are equivalent *except that* in the second one, you must obey the limitation for argument names. For example, you can't use interpolations.
 >
 > You can't reassign arguments, only variables. Since arguments always take precedence over variables, "reassigning" them has no effect inside a definition and can only confuse the rest of the code.
-[**-define-block-prefix** *prefix*] *content*
-[**-define-block-prefix** *prefix*:(*slot*)] *content*
+[**-block-shorthand** *prefix*] *content*
+[**-block-shorthand** *prefix* | (*slot*)] *content*
-> Not implemented yet
-[**-define-inline-shorthand** *prefix*] *content*
-[**-define-inline-shorthand** *prefix*:(*slot*):*postfix*] *content*
-[**-define-inline-shorthand** *prefix*:*arg1*:*mid1*:*arg2*:*mid2*...] *content*
-[**-define-inline-shorthand** *prefix*:*arg1*:*mid1*:*arg2*:*mid2*...:(*slot*):*postfix*] *content*
+[**-inline-shorthand** *prefix*] *content*
+[**-inline-shorthand** *prefix* | (*slot*) | *postfix*] *content*
+[**-inline-shorthand** *prefix* | *arg1* | *mid1* | *arg2* | *mid2*...] *content*
+[**-inline-shorthand** *prefix* | *arg1* | *mid1* | *arg2* | *mid2*...|(*slot*) | *postfix*] *content*
-> Defines an inline shorthand. A shorthand notation consists of a prefix, zero or more pairs of argument and middle part, and optionally a slot and a postfix. You must specify a slot name if you want to use one, although you can specify an empty one using `()`. You may also specify an *empty* last argument, i.e. a `:` before the `]` that ends the modifier head, to make the postfix stand out better.
+> Define shorthands. A shorthand notation consists of a prefix, zero or more pairs of argument and middle part, and optionally a slot and a postfix. You can specify a slot name if you want to use one, or just use `()`. You may also specify an *empty* last argument, i.e. a `|` before the `]` that ends the modifier head, to make the postfix stand out better.
 > ```
-> [-inline-shorthand:\[!:url:|:():\]:] content
+> [-inline-shorthand|\[!|url|\||()|\]|] content
 > ```
-> This creates: `[!` argument:url `|` slot `]`
+> This creates: `[!` argument|url `|` slot `]`
 > ```
-> [-inline-shorthand:\[!:url:|:text:\]:] content
+> [-inline-shorthand|\[!|url|\||text|\]|] content
 > ```
-> This creates: `[!` argument:url `|` argument:text `]`
+> This creates: `[!` argument|url `|` argument|text `]`
 >
-> Note the first shorthand has a slot, while the second doesn't. This means you can't put formatted content as text in the second shorthand.
+> Note the second shorthand is **slotless**. This means you can't put formatted content as text in the second shorthand. This also applies to slotless block shorthands: they can't have any content.
+>
+> You **can't define** named arguments in shorthands.
 [**-use** *module-name*]
@@ -205,9 +347,9 @@ A colon before the first argument states explicitly the beginning of that argume
 [**.slot**]
 [**.slot** *name*]
-> Only used in block modifier definitons. When the new modifier is being used, expands to its content. You can use the slot name to specify *which* modifier's content you mean, in case of ambiguity. By default it refers to the nearest one.
+> Only used in block-level definitions. When the new modifier or shorthand is being used, expands to its content. You can use the slot name to specify *which* modifier's content you mean, in case of ambiguity. By default it refers to the nearest one.
 > ```
-> [-define-block p:(0)]
+> [-define-block p|(0)]
 > [-define-block q]
 > :--
 > [.slot]
@@ -243,7 +385,7 @@ A colon before the first argument states explicitly the beginning of that argume
 [**/slot**]
 [**/slot** *name*]
-> Same as `[.slot]`, but for inline modifier definitions.
+> Same as `[.slot]` but for inline definitions.
 [**/$** *id*]
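
The README hunks above change the argument delimiter from `:` to `|` and introduce named arguments of the form `name=value`. The following is a minimal sketch of how such an argument list could be split and classified, under stated assumptions: `splitRaw`, `stripEscapes` and `parseArguments` are hypothetical names, not part of libemmm's API, and the sketch ignores interpolations apart from rejecting the built-in `$(` opener inside a name.

```typescript
// Hypothetical illustration only: this is NOT libemmm's actual parser.
type ParsedArgs = { positional: string[]; named: Map<string, string> };

// Split on unescaped `|`, keeping backslash escapes intact for later checks.
function splitRaw(raw: string): string[] {
  const pieces: string[] = [];
  let current = '';
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (ch === '\\' && i + 1 < raw.length) {
      current += ch + raw[i + 1]; // copy the escape pair verbatim
      i++;
    } else if (ch === '|') {
      pieces.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  pieces.push(current);
  return pieces;
}

// Replace each `\x` with `x`.
function stripEscapes(text: string): string {
  return text.replace(/\\([\s\S])/g, '$1');
}

// A name may not contain `:`, `/`, `[`, `=`, whitespace or escapes
// (per the README), nor the built-in interpolation opener `$(`.
function parseArguments(raw: string): ParsedArgs {
  const result: ParsedArgs = { positional: [], named: new Map() };
  for (const piece of splitRaw(raw)) {
    const eq = piece.indexOf('=');
    const name = eq > 0 ? piece.slice(0, eq) : '';
    const validName = /^[^:\/\[=\s\\]+$/.test(name) && !name.includes('$(');
    if (validName) {
      result.named.set(name, stripEscapes(piece.slice(eq + 1)));
    } else {
      result.positional.push(stripEscapes(piece));
    }
  }
  return result;
}
```

With this sketch, `parseArguments('unnamed1|baa=www|unnamed2|boo=qqq')` yields two positional arguments and two named ones regardless of their relative order, and `example.com/?query=123` stays positional because the candidate name contains `/`.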

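The interpolation examples in the README hunk (`$(x)`, `$(x)$(y)`, `$($(x))`, `$(invalid)`) can be reproduced with a small recursive expander. This is a sketch under stated assumptions: `expandInterpolations` is a hypothetical function, it only knows the built-in `$(` ... `)` form (the README says interpolation syntaxes are configurable), and it does not handle escapes or a literal `)` inside a name.

```typescript
// Hypothetical illustration only: this is NOT libemmm's implementation.
function expandInterpolations(
  text: string,
  variables: Map<string, string>,
  warnings: string[] = [],
): string {
  let out = '';
  let i = 0;
  while (i < text.length) {
    if (!text.startsWith('$(', i)) {
      out += text[i++];
      continue;
    }
    // Find the `)` that closes this `$(`, counting nested openers.
    let depth = 1;
    let j = i + 2;
    while (j < text.length && depth > 0) {
      if (text.startsWith('$(', j)) {
        depth++;
        j += 2;
      } else {
        if (text[j] === ')') depth--;
        j++;
      }
    }
    if (depth > 0) {
      // Unterminated interpolation: keep the rest as literal text.
      out += text.slice(i);
      break;
    }
    // Expand the inner text first, so `$($(x))` looks up the value of x.
    const name = expandInterpolations(text.slice(i + 2, j - 1), variables, warnings);
    const value = variables.get(name);
    if (value === undefined) {
      warnings.push(`unknown variable: ${name}`); // expands to an empty string
    } else {
      out += value;
    }
    i = j;
  }
  return out;
}

const vars = new Map([['x', 'y'], ['y', '1']]);
const seenWarnings: string[] = [];
const nested = expandInterpolations('$($(x))', vars, seenWarnings);
const missing = expandInterpolations('$(invalid)', vars, seenWarnings);
```

Here `nested` is `"1"` and `missing` is the empty string with one warning recorded, matching the behaviour the README describes for the built-in variable interpolator.
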
package/dist/chunk-Bp6m_JJh.js ADDED

@@ -0,0 +1,13 @@
+//#region rolldown:runtime
+var __defProp = Object.defineProperty;
+var __export = (all) => {
+	let target = {};
+	for (var name in all) __defProp(target, name, {
+		get: all[name],
+		enumerable: true
+	});
+	return target;
+};
+//#endregion
+export { __export as t };
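
The added chunk contains a bundler runtime helper, `__export`, which copies a map of getter functions onto a fresh object as enumerable accessor properties. The sketch below is a typed re-statement of that helper for illustration; `exportAll`, `ns` and `counter` are hypothetical names, and the values are placeholders rather than anything taken from the package.

```typescript
// Typed re-statement of the helper from the hunk above, for illustration.
function exportAll<T extends Record<string, () => unknown>>(
  all: T,
): { readonly [K in keyof T]: ReturnType<T[K]> } {
  const target = {} as { [K in keyof T]: ReturnType<T[K]> };
  for (const name in all) {
    // Each property is a getter, so reads always see the current value.
    Object.defineProperty(target, name, { get: all[name], enumerable: true });
  }
  return target;
}

let counter = 0;
const ns = exportAll({
  version: () => '0.0.9', // placeholder value
  counter: () => counter,
});

counter = 5;
const snapshot = [ns.version, ns.counter, Object.keys(ns).join(',')];
```

Because the properties are getters instead of copied values, `ns.counter` reads `5` after the reassignment, which is how a bundler can emulate live ES module bindings on a plain namespace object.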