@mlightcad/mtext-parser 1.0.0

package/README.md ADDED
@@ -0,0 +1,193 @@
1
+ # AutoCAD MText Parser
2
+
3
+ A TypeScript version of the AutoCAD MText parser. It is based on the [ezdxf dxf mtext parser](https://github.com/mozman/ezdxf/blob/aaa2d1b302c78a47fe1159bd6007254d1c2ebd22/src/ezdxf/tools/text.py) and was ported by [Cursor](https://www.cursor.com/). Moreover, unit tests were added based on the use cases in the specification in the next section.
4
+
5
+ ## AutoCAD MText Specification
6
+
7
+ The text formatting is done by inline codes. You can get more information from [this page](https://ezdxf.mozman.at/docs/dxfinternals/entities/mtext.html).
8
+
9
+ ### caret encoded characters:
10
+ - “^I” tabulator
11
+ - “^J” (LF) is a valid line break like “\P”
12
+ - “^M” (CR) is ignored
13
+ - other characters render as empty square “▯”
14
+ - a space “ ” after the caret renders the caret glyph: “1^ 2” renders “1^2”
15
+
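The caret rules above can be sketched as a small decoder. This is only an illustration of the listed rules, not the package's implementation; `decodeCarets` is a hypothetical name, and keeping a lone caret at the end of the string is an assumption the specification does not cover.

```typescript
// Hypothetical helper illustrating the caret rules; not part of the package API.
function decodeCarets(text: string): string {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch !== '^' || i + 1 >= text.length) {
      // Ordinary character, or a lone caret at the very end (assumption: kept as-is).
      out += ch;
      continue;
    }
    const code = text.charAt(++i); // the character following the caret
    if (code === 'I') out += '\t';      // tabulator
    else if (code === 'J') out += '\n'; // line break
    else if (code === 'M') continue;    // CR is ignored
    else if (code === ' ') out += '^';  // "^ " renders the caret glyph
    else out += '▯';                    // anything else: replacement square
  }
  return out;
}
```

For example, `decodeCarets('1^ 2')` yields `1^2`, matching the last rule in the list.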
16
+ ### special encoded characters:
17
+ - “%%c” and “%%C” renders “Ø” (alt-0216)
18
+ - “%%d” and “%%D” renders “°” (alt-0176)
19
+ - “%%p” and “%%P” renders “±” (alt-0177)
20
+
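A minimal sketch of the three special codes, assuming that unknown `%%` sequences are left untouched (the list above does not say what happens to them). `decodeSpecial` is a hypothetical name used only for illustration.

```typescript
// Lookup table for the three documented codes (keys are lower case).
const SPECIAL_CHARS: Record<string, string> = { c: 'Ø', d: '°', p: '±' };

// Hypothetical helper: replaces %%c, %%d and %%p case-insensitively.
function decodeSpecial(text: string): string {
  return text.replace(/%%([cdp])/gi, (_match: string, code: string) =>
    SPECIAL_CHARS[code.toLowerCase()]
  );
}
```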
21
+ ### Alignment command “\A”: argument “0”, “1” or “2” is expected
22
+ - the terminator symbol “;” is optional
23
+ - the arguments “3”, “4”, “5”, “6”, “7”, “8”, “9” and “-” default to 0
24
+ - other characters terminate the command and will be printed: “\AX”, renders “X”
25
+
26
+ ### ACI color command “\C”: int argument is expected
27
+ - the terminator symbol “;” is optional
28
+ - a leading “-” or “+” terminates the command, “\C+5” renders “\C+5”
29
+ - arguments > 255 are ignored but still consumed: “\C1000” renders nothing, not even a “0”
30
+ - a trailing “;” after integers is always consumed, even for values that are much too big: “\C10000;” renders nothing
31
+
32
+ ### RGB color command “\c”: int argument is expected
33
+ - the terminator symbol “;” is optional
34
+ - a leading “-” or “+” terminates the command, “\c+255” renders “\c+255”
35
+ - arguments >= 16777216 are masked by: value & 0xFFFFFF
36
+ - a trailing “;” after integers is always consumed, even for values that are much too big: “\c9999999999;” renders nothing and switches the color to yellow (255, 227, 11)
37
+
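The masking rule can be checked with a short worked example. The byte order used here (red in the lowest byte, blue in the highest) is an assumption inferred from the yellow (255, 227, 11) result quoted above; `decodeMTextRgb` is a hypothetical name, not a function of this package.

```typescript
// Hypothetical helper: decodes the integer argument of "\c" into [r, g, b].
function decodeMTextRgb(value: number): [number, number, number] {
  // For non-negative integers, value % 0x1000000 equals value & 0xFFFFFF
  // and stays exact even above the 32-bit range.
  const masked = value % 0x1000000;
  const r = masked & 0xff;         // lowest byte (assumed red)
  const g = (masked >> 8) & 0xff;  // middle byte
  const b = (masked >> 16) & 0xff; // highest byte (assumed blue)
  return [r, g, b];
}
```

With this byte order, `decodeMTextRgb(9999999999)` gives `[255, 227, 11]`, and `16711680` (0xFF0000) decodes to pure blue.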
38
+ ### Height command “\H” and “\H…x”: float argument is expected
39
+ - the terminator symbol “;” is optional
40
+ - a leading “-” is valid, but negative values are ignored
41
+ - a leading “+” is valid
42
+ - a leading “.” is valid like “\H.5x” for height factor 0.5
43
+ - exponential format is valid like “\H1e2” for height factor 100 and “\H1e-2” for 0.01
44
+ - an invalid floating point value terminates the command, “\H1..5” renders “\H1..5”
45
+
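One way to read the float rules above: collect the candidate characters greedily, then validate them as a whole, so that “1..5” is rejected instead of being cut off after “1.”. The sketch below assumes that approach; `parseFloatArg` and `FloatArg` are hypothetical names, and the caller is expected to ignore negative values for “\H”.

```typescript
interface FloatArg {
  value: number;     // parsed number
  relative: boolean; // true when the number is followed by "x"
  rest: string;      // text remaining after the command
}

// Optional sign, a run of digits and dots, optional exponent, optional "x", optional ";".
const FLOAT_ARG = /^([+-]?[\d.]+(?:[eE][+-]?\d+)?)(x?);?/;

// Hypothetical helper: parses the text that follows "\H", "\W", "\T" or "\Q".
// Returns null when the command should be rendered as literal text.
function parseFloatArg(afterCommand: string): FloatArg | null {
  const m = FLOAT_ARG.exec(afterCommand);
  if (m === null) return null;
  const value = Number(m[1]); // NaN for malformed runs such as "1..5"
  if (Number.isNaN(value)) return null;
  return {
    value,
    relative: m[2] === 'x',
    rest: afterCommand.slice(m[0].length),
  };
}
```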
46
+ ### Other commands with floating point arguments like the height command:
47
+ - Width commands “\W” and “\W…x”
48
+ - Character tracking commands “\T” and “\T…x”; unlike the height command, negative values are applied
49
+ - Slanting (oblique) command “\Q”
50
+
51
+ ### Stacking command “\S”:
52
+ - build fractions: “numerator (upr)” + “stacking type char (t)” + “denominator (lwr)” + “;”
53
+ - divider chars: “^”, “/” or “#”
54
+ - a space “ ” after the divider char “^” is mandatory to avoid caret decoding: “\S1^ 2;”
55
+ - the terminator symbol “;” is mandatory to end the command, all chars beyond the “\S” until the next “;” or the end of the string are part of the fraction
56
+ - backslash escape “\;” to render the terminator char
57
+ - a space “ ” after the divider chars “/” and “#” is rendered as space “ ” in front of the denominator
58
+ - the numerator and denominator can contain spaces
59
+ - backslashes “\” inside the stacking command are ignored (except “\;”) “\S\N^ \P” render “N” over “P”, therefore property changes (color, text height, …) are not possible inside the stacking command
60
+ - grouping chars “{” and “}” render as simple curly braces
61
+ - caret encoded chars are decoded “^I”, “^J”, “^M”, but render as a simple space “ ” or as the replacement char “▯” plus a space
62
+ - a divider char after the first divider char, renders as the char itself: “\S1/2/3” renders the horizontal fraction “1” / “2/3”
63
+
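The splitting rules for the stacking command can be sketched as follows. This is a simplified illustration, not the package's code: it treats every first “^” as the divider and leaves out the caret decoding of “^I”, “^J” and “^M”. `parseStack` and `Stack` are hypothetical names.

```typescript
interface Stack {
  numerator: string;
  denominator: string;
  divider: string; // "^", "/", "#", or "" when no divider was found
  rest: string;    // text after the terminating ";"
}

// Hypothetical helper: parses the text that follows "\S".
function parseStack(afterS: string): Stack {
  let numerator = '';
  let denominator = '';
  let divider = '';
  let i = 0;
  for (; i < afterS.length; i++) {
    let ch = afterS.charAt(i);
    if (ch === '\\') {
      // Backslashes are dropped; the following char is kept.
      i++;
      if (i >= afterS.length) break;
      ch = afterS.charAt(i);
      if (ch === ';') {
        // "\;" renders the terminator char instead of ending the command.
        if (divider === '') numerator += ch;
        else denominator += ch;
        continue;
      }
    } else if (ch === ';') {
      i++; // consume the terminator
      break;
    }
    if (divider === '' && (ch === '^' || ch === '/' || ch === '#')) {
      divider = ch;
      // The mandatory space after "^" is not rendered.
      if (ch === '^' && afterS.charAt(i + 1) === ' ') i++;
      continue;
    }
    // Later divider chars fall through and render as themselves.
    if (divider === '') numerator += ch;
    else denominator += ch;
  }
  return { numerator, denominator, divider, rest: afterS.slice(i) };
}
```

For instance, `parseStack('1/2/3')` returns the numerator `1` and the denominator `2/3`, as described in the last rule.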
64
+ ### Font command “\f” and “\F”: export only “\f”, parse both, “\F” ignores some arguments
65
+ - the terminator symbol “;” is mandatory to end the command, all chars beyond the “\f” until the next “;” or the end of the string are part of the command
66
+ - the command arguments are separated by the pipe char “|”
67
+ - arguments: “font family name” | “bold” | “italic” | “codepage” | “pitch”; example “\fArial|b0|i0|c0|p0;”
68
+ - only the “font family name” argument is required, fonts which are not available on the system are replaced by the “TXT.SHX” shape font
69
+ - the “font family name” is the font name shown in font selection widgets in desktop applications
70
+ - “b1” to use the bold font style, any other second char is interpreted as “non bold”
71
+ - “i1” to use an italic font style, any other second char is interpreted as “non italic”
72
+ - “c???” changes the codepage, “c0” uses the default codepage; in the age of Unicode this was not investigated further, and the argument also seems to be ignored by AutoCAD and BricsCAD
73
+ - “p???” change pitch size, “p0” means don’t change, ignored by AutoCAD and BricsCAD, to change the text height use the “\H” command
74
+ - the order is not important, but export always in the shown order: “\fArial|b0|i0;” the arguments “c0” and “p0” are not required
75
+
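A minimal sketch of the font argument rules, assuming the family name always comes first and that the “c” and “p” arguments can be skipped. `FontArgs`, `parseFontArgs` and `formatFontCommand` are hypothetical names for illustration, not the package's API.

```typescript
interface FontArgs {
  family: string;
  bold: boolean;
  italic: boolean;
}

// Hypothetical helper: parses the text between "\f" and ";".
function parseFontArgs(args: string): FontArgs {
  const [family = '', ...flags] = args.split('|');
  const font: FontArgs = { family, bold: false, italic: false };
  for (const flag of flags) {
    // Only "b1" and "i1" switch a style on; any other second char means "off".
    if (flag.startsWith('b')) font.bold = flag.charAt(1) === '1';
    else if (flag.startsWith('i')) font.italic = flag.charAt(1) === '1';
    // "c…" (codepage) and "p…" (pitch) are skipped in this sketch.
  }
  return font;
}

// Hypothetical helper: exports in the documented order, without "c0" and "p0".
function formatFontCommand(font: FontArgs): string {
  return `\\f${font.family}|b${font.bold ? 1 : 0}|i${font.italic ? 1 : 0};`;
}
```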
76
+ ### Paragraph properties command “\p”
77
+ - the terminator symbol “;” is mandatory to end the command, all chars beyond the “\p” until the next “;” or the end of the string are part of the command
78
+ - the command arguments are separated by commas “,”
79
+ - all values are factors for the initial char height of the MTEXT entity, example: char height = 2.5, “\pl1;” set the left paragraph indentation to 1 x 2.5 = 2.5 drawing units.
80
+ - all values are floating point values, see height command
81
+ - arguments are “i”, “l”, “r”, “q”, “t”
82
+ - a “*” as argument value, resets the argument to the initial value: “i0”, “l0”, “r0”, the “q” argument most likely depends on the text direction; I haven’t seen “t*”. The sequence used by BricsCAD to reset all values is "\pi*,l*,r*,q*,t;"
83
+ - “i” indentation of the first line relative to the “l” argument as floating point value, “\pi1.5”
84
+ - “l” left paragraph indentation as floating point value, “\pl1.5”
85
+ - “r” right paragraph indentation as floating point value, “\pr1.5”
86
+ - “x” is required if a “q” or a “t” argument is present, the placement of the “x” has no obvious rules
87
+ - “q” paragraph alignment
88
+ - “ql” left paragraph alignment
89
+ - “qr” right paragraph alignment
90
+ - “qc” center paragraph alignment
91
+ - “qj” justified paragraph alignment
92
+ - “qd” distributed paragraph alignment
93
+ - “t” tabulator stops as a comma-separated list. The default tabulator stops are located at 4, 8, 12, …; defining at least one tabulator stop overrides the defaults. There are three kinds of tabulator stops: left, right and center adjusted stops, e.g. “pxt1,r5,c8”:
94
+ - a left adjusted stop has no leading char, two left adjusted stops “\pxt1,2;”
95
+ - a right adjusted stop has a preceding “r” char, “\pxtr1,r2;”
96
+ - a center adjusted stop has a preceding “c” char, “\pxtc1,c2;”
97
+
98
+ - complex example to create a numbered list with two items: "pxi-3,l4t4;1.^Ifirst item\P2.^Isecond item"
99
+ - a parser should be very flexible, I have seen several different orders of the arguments and placing the sometimes required “x” has no obvious rules.
100
+ - when exporting, it seems safe to follow these three rules:
101
+ - the command starts with "\\px", the "x" does no harm, if not required
102
+ - argument order "i", "l", "r", "q", "t", any of the arguments can be left off
103
+ - terminate the command with a ";"
104
+
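The three export rules can be sketched as a small formatter. This is an illustration under the stated rules only; `ParagraphProps` and `formatParagraphCommand` are hypothetical names, and tab stops are passed as already formatted strings such as `"4"`, `"r5"` or `"c8"`.

```typescript
interface ParagraphProps {
  indent?: number;                      // "i": first-line indentation
  left?: number;                        // "l": left indentation
  right?: number;                       // "r": right indentation
  align?: 'l' | 'r' | 'c' | 'j' | 'd';  // "q": paragraph alignment
  tabStops?: string[];                  // "t": e.g. ["1", "r5", "c8"]
}

// Hypothetical helper: always starts with "\px", keeps the order i, l, r, q, t,
// and terminates the command with ";".
function formatParagraphCommand(p: ParagraphProps): string {
  const args: string[] = [];
  if (p.indent !== undefined) args.push(`i${p.indent}`);
  if (p.left !== undefined) args.push(`l${p.left}`);
  if (p.right !== undefined) args.push(`r${p.right}`);
  if (p.align !== undefined) args.push(`q${p.align}`);
  if (p.tabStops !== undefined && p.tabStops.length > 0) {
    args.push(`t${p.tabStops.join(',')}`);
  }
  return `\\px${args.join(',')};`;
}
```

Any argument can be left off, so `formatParagraphCommand({ align: 'c' })` produces `\pxqc;`.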
105
+ ## Usage
106
+
107
+ ```bash
108
+ npm install @mlightcad/mtext-parser
109
+ ```
110
+
111
+ Here's how to use the MText parser in your TypeScript/JavaScript project:
112
+
113
+ ```typescript
114
+ import { MTextParser, MTextContext, TokenType } from '@mlightcad/mtext-parser';
115
+
116
+ // Basic usage
117
+ const parser = new MTextParser('Hello World');
118
+ const tokens = Array.from(parser.parse());
119
+ // tokens will contain:
120
+ // - WORD token with "Hello"
121
+ // - SPACE token
122
+ // - WORD token with "World"
123
+
124
+ // Parsing formatted text
125
+ const formattedParser = new MTextParser('\\H2.5;Large\\H.5x;Small');
126
+ const formattedTokens = Array.from(formattedParser.parse());
127
+ // formattedTokens will contain:
128
+ // - WORD token with "Large" and capHeight = 2.5
129
+ // - WORD token with "Small" and capHeight = 0.5
130
+
131
+ // Parsing special characters
132
+ const specialParser = new MTextParser('Diameter: %%c, Angle: %%d, Tolerance: %%p');
133
+ const specialTokens = Array.from(specialParser.parse());
134
+ // specialTokens will contain:
135
+ // - WORD and SPACE tokens that together spell "Diameter: Ø, Angle: °, Tolerance: ±"
136
+
137
+ // Parsing with context
138
+ const ctx = new MTextContext();
139
+ ctx.capHeight = 2.0;
140
+ ctx.widthFactor = 1.5;
141
+ const contextParser = new MTextParser('Text with context', ctx);
142
+ const contextTokens = Array.from(contextParser.parse());
143
+ // contextTokens will contain tokens with the specified context
144
+
145
+ // Parsing with property commands
146
+ const propertyParser = new MTextParser('\\C1;Red Text', undefined, true);
147
+ const propertyTokens = Array.from(propertyParser.parse());
148
+ // propertyTokens will contain:
149
+ // - PROPERTIES_CHANGED token with the color command
150
+ // - WORD and SPACE tokens for "Red Text", each with aci = 1
151
+ ```
152
+
153
+ ### Token Types
154
+
155
+ The parser produces tokens of the following types:
156
+
157
+ - `WORD`: Text content with associated formatting context
158
+ - `SPACE`: Space character
159
+ - `NBSP`: Non-breaking space
160
+ - `TABULATOR`: Tab character
161
+ - `NEW_PARAGRAPH`: Paragraph break
162
+ - `NEW_COLUMN`: Column break
163
+ - `WRAP_AT_DIMLINE`: Wrap at dimension line
164
+ - `STACK`: Stacked fraction with [numerator, denominator, type] data
165
+ - `PROPERTIES_CHANGED`: Property change command (only when yieldPropertyCommands is true)
166
+
167
+ ### Context Properties
168
+
169
+ Each token includes a context object (`MTextContext`) that contains the current formatting state:
170
+
171
+ - `underline`: Whether text is underlined
172
+ - `overline`: Whether text has overline
173
+ - `strikeThrough`: Whether text has strike-through
174
+ - `aci`: ACI color value (0-256)
175
+ - `rgb`: RGB color value [r, g, b]
176
+ - `align`: Line alignment (BOTTOM, MIDDLE, TOP)
177
+ - `fontFace`: Font properties (family, style, weight)
178
+ - `capHeight`: Capital letter height
179
+ - `widthFactor`: Character width factor
180
+ - `charTrackingFactor`: Character tracking factor
181
+ - `oblique`: Oblique angle
182
+ - `paragraph`: Paragraph properties (indent, margins, alignment, tab stops)
183
+
184
+ ### Error Handling
185
+
186
+ The parser handles invalid commands gracefully:
187
+
188
+ - Invalid floating point values are treated as literal text
189
+ - Invalid special character codes are ignored
190
+ - Invalid property commands are treated as literal text
191
+ - Invalid stacking commands are treated as literal text
192
+
193
+ This makes the parser robust for handling real-world MText content that may contain errors or unexpected formatting.
@@ -0,0 +1 @@
1
+ export {};
@@ -0,0 +1,125 @@
1
+ "use strict";
2
+ Object.defineProperty(exports, "__esModule", { value: true });
3
+ const parser_1 = require("./parser");
4
+ // Example 1: Basic text with formatting
5
+ const basicExample = `
6
+ This is a test of the MText parser\\P
7
+ \\C1;This text is red\\P
8
+ \\H2x;This text is twice the height\\P
9
+ \\S1/2;This is a fraction\\P
10
+ `;
11
+ // Example 2: Paragraph formatting
12
+ const paragraphExample = `
13
+ \\pi2,l4;This paragraph has indentation\\P
14
+ \\pqc;This paragraph is centered\\P
15
+ \\pqr;This paragraph is right-aligned\\P
16
+ `;
17
+ // Example 3: Text styling
18
+ const stylingExample = `
19
+ \\LThis text is underlined\\l\\P
20
+ \\OThis text has an overline\\o\\P
21
+ \\KThis text is struck through\\k\\P
22
+ \\Q15;This text is oblique\\P
23
+ \\T2;This text has increased tracking\\P
24
+ `;
25
+ // Example 4: Colors and special characters
26
+ const colorExample = `
27
+ \\c16711680;This text is blue (RGB)\\P
28
+ %%c This is a diameter symbol\\P
29
+ %%d This is a degree symbol\\P
30
+ %%p This is a plus-minus symbol\\P
31
+ `;
32
+ // Example 5: Caret encoded characters
33
+ const caretExample = `
34
+ ^I This is a tab\\P
35
+ ^J This is a line break\\P
36
+ 1^ 2 This shows a caret\\P
37
+ `;
38
+ // Example 6: Complex formatting with context
39
+ const complexExample = `
40
+ {\\fArial|b1|i0;
41
+ This text is in Arial Bold\\P
42
+ \\H2.5;This text is larger\\P
43
+ \\H.5x;This text is smaller\\P
44
+ }
45
+ `;
46
+ // Function to process tokens and display their properties
47
+ function processTokens(parser, title) {
48
+ console.log(`\n=== ${title} ===\n`);
49
+ for (const token of parser.parse()) {
50
+ // Print token type and data
51
+ switch (token.type) {
52
+ case parser_1.TokenType.WORD:
53
+ console.log('Word:', token.data);
54
+ break;
55
+ case parser_1.TokenType.SPACE:
56
+ console.log('Space');
57
+ break;
58
+ case parser_1.TokenType.NEW_PARAGRAPH:
59
+ console.log('New Paragraph');
60
+ break;
61
+ case parser_1.TokenType.STACK: {
62
+ const [numerator, denominator, type] = token.data;
63
+ console.log('Stack:', { numerator, denominator, type });
64
+ break;
65
+ }
66
+ case parser_1.TokenType.PROPERTIES_CHANGED:
67
+ console.log('Properties Changed:', token.data);
68
+ break;
69
+ case parser_1.TokenType.NBSP:
70
+ console.log('Non-breaking space');
71
+ break;
72
+ case parser_1.TokenType.TABULATOR:
73
+ console.log('Tab');
74
+ break;
75
+ case parser_1.TokenType.NEW_COLUMN:
76
+ console.log('New Column');
77
+ break;
78
+ case parser_1.TokenType.WRAP_AT_DIMLINE:
79
+ console.log('Wrap at dimension line');
80
+ break;
81
+ }
82
+ // Print context properties if they differ from defaults
83
+ const ctx = token.ctx;
84
+ const contextProps = {
85
+ font: ctx.fontFace.family ? ctx.fontFace : undefined,
86
+ color: ctx.rgb ? `RGB(${ctx.rgb.join(',')})` : ctx.aci !== 7 ? `ACI(${ctx.aci})` : undefined,
87
+ height: ctx.capHeight !== 1.0 ? ctx.capHeight : undefined,
88
+ width: ctx.widthFactor !== 1.0 ? ctx.widthFactor : undefined,
89
+ tracking: ctx.charTrackingFactor !== 1.0 ? ctx.charTrackingFactor : undefined,
90
+ oblique: ctx.oblique !== 0.0 ? ctx.oblique : undefined,
91
+ underline: ctx.underline ? true : undefined,
92
+ overline: ctx.overline ? true : undefined,
93
+ strikeThrough: ctx.strikeThrough ? true : undefined,
94
+ paragraph: Object.keys(ctx.paragraph).some(key => ctx.paragraph[key] !==
95
+ new parser_1.MTextContext().paragraph[key])
96
+ ? ctx.paragraph
97
+ : undefined,
98
+ };
99
+ // Only print non-default context properties
100
+ const nonDefaultProps = Object.entries(contextProps)
101
+ .filter(([_, value]) => value !== undefined)
102
+ .reduce((obj, [key, value]) => ({ ...obj, [key]: value }), {});
103
+ if (Object.keys(nonDefaultProps).length > 0) {
104
+ console.log('Context:', nonDefaultProps);
105
+ }
106
+ console.log('---');
107
+ }
108
+ }
109
+ // Process all examples
110
+ processTokens(new parser_1.MTextParser(basicExample), 'Basic Formatting');
111
+ processTokens(new parser_1.MTextParser(paragraphExample), 'Paragraph Formatting');
112
+ processTokens(new parser_1.MTextParser(stylingExample), 'Text Styling');
113
+ processTokens(new parser_1.MTextParser(colorExample), 'Colors and Special Characters');
114
+ processTokens(new parser_1.MTextParser(caretExample), 'Caret Encoded Characters');
115
+ processTokens(new parser_1.MTextParser(complexExample), 'Complex Formatting with Context');
116
+ // Example of using custom context
117
+ const customContext = new parser_1.MTextContext();
118
+ customContext.capHeight = 2.0;
119
+ customContext.widthFactor = 1.5;
120
+ customContext.fontFace = { family: 'Times New Roman', style: 'Italic', weight: 400 };
121
+ const customParser = new parser_1.MTextParser('Text with custom context', customContext);
122
+ processTokens(customParser, 'Custom Context');
123
+ // Example of parsing with property commands
124
+ const propertyParser = new parser_1.MTextParser('\\C1;Red Text', undefined, true);
125
+ processTokens(propertyParser, 'Property Commands');