styled-text 0.1.0__py3-none-any.whl

@@ -0,0 +1,202 @@
1
+ Metadata-Version: 2.4
2
+ Name: styled-text
3
+ Version: 0.1.0
4
+ Summary: Library to convert a text with custom markings to html (or anything else).
5
+ License-Expression: MIT
6
+ Requires-Python: >=3.14
7
+ Description-Content-Type: text/markdown
8
+
9
+ # styled-text (Python version)
10
+
11
+ The Python version of the `styled-text` library. Designed for **custom markup transformations**.
12
+
13
+ This library is for anyone who wants to create styled text **_like_** markdown, but with total **flexibility** to create their own rules.
14
+
15
+ ## Installation
16
+
17
+ `pip install styled-text`
18
+
19
+ ## Usage
20
+
21
+ ```python
22
+ import re
23
+
24
+ from text_styler import TextStyler, TextStylerRegexRule, TextStylerRule, html_tag
25
+
26
+ # Let's style this text:
27
+ text = "_Welcome_ to _<~my library~>*styled-text*_ version 0.0.1"
28
+
29
+ # Create the rules (only need to do this once)
30
+ style_rules = [
31
+ TextStylerRule(start="*", transform=html_tag("strong")),
32
+ TextStylerRule(start="_", transform=html_tag("em")),
33
+ TextStylerRule(start="<~", transform=html_tag("del"), end="~>"),
34
+ TextStylerRegexRule(
35
+ regex=re.compile(r"(\d+\.\d+\.\d+)"),
36
+ replace=r"<span style='color: red'>\1</span>",
37
+ ),
38
+ ]
39
+
40
+ # Create the styler:
41
+ styler = TextStyler(style_rules)
42
+
43
+ # Process text
44
+ html = styler.process_text(text)
45
+
46
+ # `html` looks like this now:
47
+ # <em>Welcome</em> to <em><del>my library</del><strong>styled-text</strong></em> version <span style='color: red'>0.0.1</span>
48
+ ```
49
+
50
+ ## Examples
51
+
52
+ #### Simple bold
53
+ ```python
54
+ TextStylerRule(
55
+ start='*',
56
+ transform=html_tag("strong")
57
+ )
58
+ ```
59
+ Input: `My *bolded* text`<br>
60
+ Output (raw): `My <strong>bolded</strong> text`<br>
61
+ Output (visual): My <strong>bolded</strong> text<br>
62
+
63
+ #### Nested bold/italic
64
+ ```python
65
+ TextStylerRule(
66
+ start='*',
67
+ transform=html_tag("strong")
68
+ ),
69
+ TextStylerRule(
70
+ start='_',
71
+ transform=html_tag("em")
72
+ )
73
+ ```
74
+ Input: `My *bolded and _italicized_ text*`<br>
75
+ Output (raw): `My <strong>bolded and <em>italicized</em> text</strong>`<br>
76
+ Output (visual): My <strong>bolded and <em>italicized</em> text</strong><br>
77
+
78
+ Input: `Three *asterisks* matches* eagerly`<br>
79
+ Output (raw): `Three <strong>asterisks</strong> matches* eagerly`<br>
80
+ Output (visual): Three <strong>asterisks</strong> matches* eagerly<br>
81
+
82
+ Input: `Overlapping * tags _ also * matches _ eagerly`<br>
83
+ Output (raw): `Overlapping <strong> tags _ also </strong> matches _ eagerly`<br>
84
+ Output (visual): Overlapping <strong> tags _ also </strong> matches _ eagerly<br>
85
+
86
+ #### Nested / Conflicting Tags
87
+ Here we show two things:
88
+ 1. `start` can be multiple characters (`~~` for strikethrough)
89
+ 2. one rule can be a subset of another, and it still works as expected (`~` for subscript)
90
+
91
+ ```python
92
+ TextStylerRule(
93
+ start="~",
94
+ transform=html_tag("sub")
95
+ ),
96
+ TextStylerRule(
97
+ start="~~",
98
+ transform=html_tag("del")
99
+ )
100
+ ```
101
+ Input: `H\~\~\~3\~\~2\~O`<br>
102
+ Output (raw): `H<sub><del>3</del>2</sub>O`<br>
103
+ Output (visual): H<sub><del>3</del>2</sub>O<br>
104
+
105
+ Input: `A \~\~\~[sic]\~tyop\~\~ typo is...`<br>
106
+ Output (raw): `A <del><sub>[sic]</sub>tyop</del> typo is...`<br>
107
+ Output (visual): A <del><sub>[sic]</sub>tyop</del> typo is...<br>
108
+
109
+ #### Regexes
110
+
111
+ Regexes are the best way to build a complex replacement strategy, for example when you need to parse the inner text into pieces or use the inner text multiple times. In this example, the matched URL is used both as the `href` attribute and as the link text:
112
+
113
+ ```python
114
+ TextStylerRegexRule(
115
+ regex=re.compile(r"https://www\.[^.]+\.com"),
116
+ replace=r"<a href='\g<0>'>\g<0></a>"
117
+ )
118
+ ```
119
+
120
+ Input: `My link https://www.google.com`<br>
121
+ Output (raw): `My link <a href='https://www.google.com'>https://www.google.com</a>`<br>
122
+ Output (visual): My link <a href='https://www.google.com'>https://www.google.com</a><br>
123
+
124
+ However, regexes are matched like literal strings, meaning that any styling within them is not matched by any other rules.<br>
125
+ For example, even if we included the asterisk-to-\<strong> rule used earlier, it would not be applied inside the regex match:
126
+
127
+ Input: `My link https://www.*google*.com`<br>
128
+ Output (raw): `My link <a href='https://www.*google*.com'>https://www.*google*.com</a>`<br>
129
+ Output (visual): My link <a href='https://www.*google*.com'>https://www.*google*.com</a><br>
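The replacement behaviour in this section can be checked with the standard `re` module alone. The sketch below does not go through `TextStyler`. `linkify` and `linkify_double_backslash` are hypothetical helpers, and the pattern is only modelled on the example above. It illustrates that `\g<0>` in a raw replacement string expands to the whole match, that asterisks inside the match pass through untouched, and that a doubled backslash produces the literal text `\g<0>` instead.

```python
import re

# Pattern modelled on the example above (dots escaped so they match literally).
LINK_PATTERN = re.compile(r"https://www\.[^.]+\.com")

# In a raw string, a single backslash before g<0> refers to the whole match.
LINK_TEMPLATE = r"<a href='\g<0>'>\g<0></a>"


def linkify(text: str) -> str:
    """Hypothetical helper: wrap every matched URL in an anchor tag."""
    return LINK_PATTERN.sub(LINK_TEMPLATE, text)


def linkify_double_backslash(text: str) -> str:
    """Same substitution with a doubled backslash, which re reads as a literal backslash."""
    return LINK_PATTERN.sub(r"<\\g<0>>", text)


if __name__ == "__main__":
    print(linkify("My link https://www.google.com"))
    print(linkify("My link https://www.*google*.com"))
    print(linkify_double_backslash("https://www.google.com"))
```

Keeping the template in a raw string avoids having to reason about two layers of backslash escaping at once.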
130
+
131
+ #### Preserving the special characters
132
+
133
+ By default, the special characters are removed from the output, but they can be preserved on the inside or on the outside:
134
+
135
+ ```python
136
+ TextStylerRule(
137
+ start='*',
138
+ transform=html_tag("strong"),
139
+ consume_start=ConsumptionStyle.OUTSIDE,
140
+ consume_end=ConsumptionStyle.OUTSIDE,
141
+ ),
142
+ TextStylerRule(
143
+ start='_',
144
+ transform=html_tag("em"),
145
+ consume_start=ConsumptionStyle.INSIDE,
146
+ consume_end=ConsumptionStyle.INSIDE,
147
+ )
148
+ ```
149
+
150
+ Input: `My *bolded* text, my _italicized_ text`<br>
151
+ Output (raw): `My <strong>*bolded*</strong> text, my _<em>italicized</em>_ text`<br>
152
+ Output (visual): My <strong>\*bolded\*</strong> text, my \_<em>italicized</em>\_ text<br>
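As a rough illustration of the three consumption modes, the standalone sketch below reproduces the marker placement shown in the output above without importing the library. `Consumption`, `wrap`, `strong` and `em` are hypothetical stand-ins, and the sketch assumes the start and end markers are identical.

```python
from enum import Enum
from typing import Callable


class Consumption(Enum):
    """Stand-in for the library's ConsumptionStyle values."""
    REPLACE = "REPLACE"
    OUTSIDE = "OUTSIDE"
    INSIDE = "INSIDE"


def wrap(marker: str, inner: str, transform: Callable[[str], str],
         style: Consumption) -> str:
    """Hypothetical helper: place `marker` relative to the transformed text."""
    if style is Consumption.OUTSIDE:
        # The marker is kept and the transform is applied around it.
        return transform(marker + inner + marker)
    if style is Consumption.INSIDE:
        # The marker is kept and the transform is applied within it.
        return marker + transform(inner) + marker
    # REPLACE: the marker is dropped from the output.
    return transform(inner)


def strong(text: str) -> str:
    return f"<strong>{text}</strong>"


def em(text: str) -> str:
    return f"<em>{text}</em>"


if __name__ == "__main__":
    print(wrap("*", "bolded", strong, Consumption.OUTSIDE))
    print(wrap("_", "italicized", em, Consumption.INSIDE))
    print(wrap("*", "bolded", strong, Consumption.REPLACE))
```

Read this way, `OUTSIDE` and `INSIDE` describe where the generated tag sits relative to the preserved marker.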
153
+
154
+ #### Disallowing self-nesting
155
+
156
+ By default, a rule nesting within itself is allowed, but this can be disabled in two ways:
157
+ 1. Completely disallowed, at any depth
158
+ 2. A direct parent-child is disallowed, but grandparent-grandchild (or more distant) is allowed
159
+
160
+ ```python
161
+ TextStylerRule(
162
+ start='*',
163
+ transform=html_tag("strong"),
164
+ allow_inner=InnerStyle.DISALLOW_DIRECT,
165
+ ),
166
+ TextStylerRule(
167
+ start='^',
168
+ transform=html_tag("sup"),
169
+ allow_inner=InnerStyle.DISALLOW_ANCESTOR,
170
+ ),
171
+ TextStylerRule(
172
+ start='~',
173
+ transform=html_tag("sub"),
174
+ allow_inner=InnerStyle.DISALLOW_DIRECT,
175
+ )
176
+ ```
177
+
178
+ Input: `Subscript ~cannot exist ~directly~ within subscript, but *can exist ~within~ the bolded* region~`<br>
179
+ Output (raw): `Subscript <sub>cannot exist ~directly~ within subscript, but <strong>can exist <sub>within</sub> the bolded</strong> region</sub>`<br>
180
+ Output (visual): Subscript <sub>cannot exist \~directly\~ within subscript, but <strong>can exist <sub>within</sub> the bolded</strong> region</sub><br>
181
+
182
+ Input: `Superscript ^of multiple depths is ^disallowed^, *even if we ^wrap^ it in a bolded* region^`<br>
183
+ Output (raw): `Superscript <sup>of multiple depths is ^disallowed^, <strong>even if we ^wrap^ it in a bolded</strong> region</sup>`<br>
184
+ Output (visual): Superscript <sup>of multiple depths is ^disallowed^, <strong>even if we ^wrap^ it in a bolded</strong> region</sup><br>
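The two restrictions can be expressed as a small predicate over the chain of enclosing rules. The sketch below is a standalone approximation, not the library's implementation. `Nesting` and `printed_raw` are hypothetical names, rules are represented by their marker strings, and `ancestors` is assumed to be ordered outermost first.

```python
from enum import Enum
from typing import Sequence


class Nesting(Enum):
    """Stand-in for the library's InnerStyle values."""
    ALLOW = "ALLOW"
    DISALLOW_DIRECT = "DISALLOW_DIRECT"
    DISALLOW_ANCESTOR = "DISALLOW_ANCESTOR"


def printed_raw(rule: str, ancestors: Sequence[str], mode: Nesting) -> bool:
    """Return True when a nested occurrence of `rule` should be left unstyled."""
    if mode is Nesting.ALLOW or not ancestors:
        return False
    if mode is Nesting.DISALLOW_DIRECT:
        # Only the immediate parent matters.
        return ancestors[-1] == rule
    # DISALLOW_ANCESTOR: any enclosing occurrence of the same rule blocks it.
    return rule in ancestors


if __name__ == "__main__":
    # "~" directly inside "~" stays raw under DISALLOW_DIRECT.
    print(printed_raw("~", ["~"], Nesting.DISALLOW_DIRECT))
    # "~" inside "*" inside "~" is styled under DISALLOW_DIRECT.
    print(printed_raw("~", ["~", "*"], Nesting.DISALLOW_DIRECT))
    # "^" inside "*" inside "^" stays raw under DISALLOW_ANCESTOR.
    print(printed_raw("^", ["^", "*"], Nesting.DISALLOW_ANCESTOR))
```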
185
+
186
+
187
+ ## Reference
188
+ To use the library, just set up a list of "rules", create a `TextStyler` object, then call `process_text`.
189
+
190
+ | Class / Function | Parameter | Type | Default | Description |
191
+ | :--------------- | :-------- | :--- | :------ | :---------- |
192
+ |TextStyler|rules|list|Required|A list of TextStylerRule or TextStylerRegexRule objects.|
193
+ |TextStylerRegexRule|regex|re.Pattern[str]|Required|The compiled regular expression to match (e.g., from `re.compile`).|
194
+ ||replace|str|Required|The replacement string (supports regex capture groups like \1).|
195
+ |TextStylerRule|start|str|Required|The marker string that begins the rule.|
196
+ ||transform|Callable[[str], str]|Required|Function to process inner content (e.g., html_tag).|
197
+ ||end|str|start|The marker string that terminates the rule.|
198
+ ||consume_start|ConsumptionStyle|REPLACE|Determines if start is included in output (INSIDE, OUTSIDE, REPLACE).|
199
+ ||consume_end|ConsumptionStyle|REPLACE|Determines if end is included in output (INSIDE, OUTSIDE, REPLACE).|
200
+ ||allow_inner|InnerStyle|ALLOW|Determines if self-nesting is allowed (ALLOW, DISALLOW_DIRECT, DISALLOW_ANCESTOR).|
201
+ |html_tag|tag|str|Required|The HTML tag name (e.g., `"strong"`).|
202
+ ||attributes|dict[str, str]|None|Optional HTML attributes (e.g., `{"class": "my-css-class"}`).|
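To make the `html_tag` rows concrete, here is a standalone sketch that mirrors the `html_tag` function as it appears in `text_styler.py` in this package. The name `html_tag_sketch` is hypothetical and exists only so the example runs without the package installed. Attribute values are interpolated without escaping, so a value containing a single quote would break the generated attribute.

```python
from typing import Callable, Dict, Optional


def html_tag_sketch(
    tag: str,
    attributes: Optional[Dict[str, str]] = None,
    auto_close_empty: bool = True,
) -> Callable[[str], str]:
    """Build a transform that wraps text in <tag ...>...</tag>."""
    # Render attributes as " key='value'" pairs; values are not escaped here.
    attrs = "".join(f" {k}='{v}'" for k, v in attributes.items()) if attributes else ""
    opening = f"{tag}{attrs}"

    def transform(text: str) -> str:
        # Empty content collapses to a self-closing tag unless disabled.
        if auto_close_empty and not text:
            return f"<{opening} />"
        return f"<{opening}>{text}</{tag}>"

    return transform


if __name__ == "__main__":
    print(html_tag_sketch("strong")("bolded"))
    print(html_tag_sketch("span", {"class": "my-css-class"})("text"))
    print(html_tag_sketch("br")(""))
```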
@@ -0,0 +1,5 @@
1
+ text_styler.py,sha256=CDGc3aejiIxOgzuyr1Hkl8J3UO2R3USZX0c-awBDyxo,10255
2
+ styled_text-0.1.0.dist-info/METADATA,sha256=_oSNLzcwpSs8jmj9jXnjnZ-sw5-wCMGpfUA_0qZdt9I,7477
3
+ styled_text-0.1.0.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
4
+ styled_text-0.1.0.dist-info/top_level.txt,sha256=PUtzcegVzmbDMqYOkgthFRimdbqFK5Mm0NU96j0BFwo,12
5
+ styled_text-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (82.0.1)
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
5
+
@@ -0,0 +1 @@
1
+ text_styler
text_styler.py ADDED
@@ -0,0 +1,309 @@
1
+ import html
2
+ import re
3
+ from collections.abc import Callable
4
+ from dataclasses import dataclass
5
+ from enum import StrEnum
6
+ from re import Match, Pattern, sub
7
+ from typing import NamedTuple, override
8
+
9
+
10
+ class ConsumptionStyle(StrEnum):
11
+ REPLACE = "REPLACE"
12
+ OUTSIDE = "OUTSIDE"
13
+ INSIDE = "INSIDE"
14
+
15
+
16
+ class InnerStyle(StrEnum):
17
+ ALLOW = "ALLOW"
18
+ DISALLOW_DIRECT = "DISALLOW_DIRECT"
19
+ DISALLOW_ANCESTOR = "DISALLOW_ANCESTOR"
20
+
21
+
22
+ @dataclass
23
+ class TextStylerRegexRule:
24
+ regex: Pattern[str]
25
+ replace: str
26
+
27
+
28
+ def html_tag(
29
+ tag: str, attributes: dict[str, str] | None = None, auto_close_empty: bool = True
30
+ ) -> Callable[[str], str]:
31
+ attrs = "".join(f" {k}='{v}'" for k, v in attributes.items()) if attributes else ""
32
+ start = f"{tag}{attrs}"
33
+ return lambda text: (
34
+ f"<{start} />" if auto_close_empty and not text else f"<{start}>{text}</{tag}>"
35
+ )
36
+
37
+
38
+ @dataclass
39
+ class TextStylerRule:
40
+ start: str
41
+ transform: Callable[[str], str]
42
+ end: str | None = None
43
+
44
+ consume_start: ConsumptionStyle = ConsumptionStyle.REPLACE
45
+ consume_end: ConsumptionStyle = ConsumptionStyle.REPLACE
46
+ allow_inner: InnerStyle = InnerStyle.ALLOW
47
+
48
+ def get_end(self) -> str:
49
+ return html.escape(self.end or self.start)
50
+
51
+ def get_start(self) -> str:
52
+ return html.escape(self.start)
53
+
54
+ def get_wrappers(self) -> tuple[str, str, str, str]:
55
+ outer_prefix, inner_prefix, inner_suffix, outer_suffix = "", "", "", ""
56
+ if self.consume_start == ConsumptionStyle.INSIDE:
57
+ outer_prefix = self.get_start()
58
+ if self.consume_start == ConsumptionStyle.OUTSIDE:
59
+ inner_prefix = self.get_start()
60
+
61
+ if self.consume_end == ConsumptionStyle.INSIDE:
62
+ outer_suffix = self.get_end()
63
+ if self.consume_end == ConsumptionStyle.OUTSIDE:
64
+ inner_suffix = self.get_end()
65
+ return (outer_prefix, inner_prefix, inner_suffix, outer_suffix)
66
+
67
+
68
+ @dataclass
69
+ class TextAction:
70
+ text: str
71
+
72
+
73
+ @dataclass
74
+ class PushAction:
75
+ rule: TextStylerRule
76
+
77
+
78
+ @dataclass
79
+ class PopAction:
80
+ pass
81
+
82
+
83
+ @dataclass
84
+ class RegexAction:
85
+ rule: TextStylerRegexRule
86
+ match: re.Match[str]
87
+
88
+
89
+ # Used in _find_next
90
+ class NextStyle(NamedTuple):
91
+ rule: TextStylerRule
92
+ position: int
93
+ is_start: bool
94
+ is_end: bool
95
+
96
+
97
+ class NextRegex(NamedTuple):
98
+ rule: TextStylerRegexRule
99
+ position: int
100
+ match: Match[str]
101
+
102
+
103
+ type Action = TextAction | PushAction | PopAction | RegexAction
104
+
105
+
106
+ @dataclass(frozen=True)
107
+ class Path:
108
+ actions: tuple[Action, ...] = ()
109
+ stack: tuple[TextStylerRule, ...] = ()
110
+ num_skips: int = 0
111
+
112
+ @property
113
+ def num_pushes(self) -> int:
114
+ return sum(1 for a in self.actions if isinstance(a, (PushAction, RegexAction)))
115
+
116
+ def peek(self) -> TextStylerRule | None:
117
+ return self.stack[-1] if len(self.stack) > 0 else None
118
+
119
+ def copy_and_push(self, action: Action, extra_skip: int = 0) -> Path:
120
+ new_actions = self.actions
121
+ if not isinstance(action, TextAction) or action.text:
122
+ new_actions += (action,)
123
+
124
+ if isinstance(action, PushAction):
125
+ new_stack = self.stack + (action.rule,)
126
+ elif isinstance(action, PopAction):
127
+ new_stack = self.stack[:-1]
128
+ else:
129
+ new_stack = self.stack
130
+
131
+ return Path(new_actions, new_stack, self.num_skips + extra_skip)
132
+
133
+
134
+ class TextStyler:
135
+ def __init__(self, rules: list[TextStylerRule | TextStylerRegexRule]):
136
+ self.rules: list[TextStylerRule | TextStylerRegexRule] = rules
137
+ self.min_skips: int | None = None
138
+
139
+ def process_text(self, text: str, multiline: bool = False):
140
+ self.min_skips = None
141
+ if multiline:
142
+ return self._process_text(text)
143
+ return "".join(map(self._process_text, re.findall(r".*?\n|.+", text)))
144
+
145
+ def _process_text(self, text: str) -> str:
146
+ if text == "":
147
+ return text
148
+ text = html.escape(text, quote=False)
149
+ paths = self._helper(text, 0, Path())
150
+
151
+ # First we pick the paths with the fewest skipped markings (the min_skips check in _helper already pruned most of the rest)
152
+ # Then tie-break to the fewest blocks created
153
+ best_path = min(paths, key=lambda p: (p.num_skips, p.num_pushes))
154
+
155
+ # Build the tree
156
+ ast = SyntaxTree()
157
+ for action in best_path.actions:
158
+ if isinstance(action, TextAction):
159
+ ast.push_str(action.text)
160
+ elif isinstance(action, PushAction):
161
+ ast.push(action.rule)
162
+ elif isinstance(action, PopAction):
163
+ ast.pop()
164
+ else:
165
+ ast.push_regex(action.rule, action.match)
166
+
167
+ return str(ast)
168
+
169
+ def _helper(self, text: str, start: int, path: Path) -> list[Path]:
170
+ if self.min_skips is not None and path.num_skips > self.min_skips:
171
+ return []
172
+ if text == "":
173
+ return [Path()]
174
+
175
+ # Get the next token(s)
176
+ nexts = self._find_next(text, start)
177
+
178
+ # Base case, if there aren't any more tokens, return success or fail
179
+ if start >= len(text) or len(nexts) == 0:
180
+ if len(path.stack) > 0:
181
+ return []
182
+ self.min_skips = min(self.min_skips or path.num_skips, path.num_skips)
183
+ return [path.copy_and_push(TextAction(text[start:]))]
184
+
185
+ paths: list[Path] = []
186
+ for next in nexts:
187
+ new_start = next.position
188
+ new_path = path.copy_and_push(TextAction(text[start:new_start]))
189
+
190
+ if isinstance(next, NextRegex):
191
+ new_start += len(next.match.group(0))
192
+ new_path = new_path.copy_and_push(RegexAction(next.rule, next.match))
193
+ else:
194
+ rule, _, is_start, is_end = next
195
+ new_start += len(rule.get_start() if is_start else rule.get_end())
196
+ if is_end and len(path.stack) > 0 and path.peek() == rule:
197
+ new_path = new_path.copy_and_push(PopAction())
198
+ elif is_start:
199
+ new_path = new_path.copy_and_push(PushAction(rule))
200
+ # else is_end but the top of the stack doesn't match? new_start moves forward but stack stays the same
201
+
202
+ paths.extend(self._helper(text, new_start, new_path))
203
+
204
+ # Fallback branch: skip the current set of tokens entirely
205
+ new_start = nexts[-1].position + 1
206
+ new_path = path.copy_and_push(TextAction(text[start:new_start]), 1)
207
+ paths.extend(self._helper(text, new_start, new_path))
208
+ return paths
209
+
210
+ def _find_next(self, text: str, start: int) -> list[NextStyle | NextRegex]:
211
+ nexts: list[NextStyle | NextRegex] = []
212
+ escaped = False
213
+ for index in range(start, len(text)):
214
+ for marking in self.rules:
215
+ if isinstance(marking, TextStylerRegexRule):
216
+ if match := marking.regex.match(text, index):
217
+ nexts.append(NextRegex(marking, index, match))
218
+ elif not escaped:
219
+ is_start = text.startswith(marking.get_start(), index)
220
+ is_end = text.startswith(marking.get_end(), index)
221
+ if is_start or is_end:
222
+ nexts.append(NextStyle(marking, index, is_start, is_end))
223
+
224
+ escaped = text[index] == "\\" and not escaped
225
+ if len(nexts) > 0:
226
+ return nexts
227
+ return []
228
+
229
+
230
+ class SyntaxTree:
231
+ def __init__(self):
232
+ self.children: list[SyntaxTreeNode | str] = []
233
+ self.curr: SyntaxTreeNode | None = None
234
+
235
+ def push(self, rule: TextStylerRule):
236
+ new_node = SyntaxTreeNode(self.curr, rule)
237
+ self._push(new_node)
238
+ self.curr = new_node
239
+
240
+ def push_regex(self, rule: TextStylerRegexRule, match: re.Match[str]):
241
+ new_node = SyntaxTreeNode(self.curr, rule, match)
242
+ self._push(new_node)
243
+
244
+ def push_str(self, text: str):
245
+ if text:
246
+ self._push(re.sub(r"\\(.)", r"\1", text))
247
+
248
+ def _push(self, node: SyntaxTreeNode | str):
249
+ if self.curr is None:
250
+ self.children.append(node)
251
+ else:
252
+ self.curr.push(node)
253
+
254
+ def pop(self):
255
+ if self.curr is None:
256
+ raise ValueError("Attempted to pop() when already at root")
257
+ self.curr = self.curr.parent
258
+
259
+ @override
260
+ def __str__(self) -> str:
261
+ return "".join(map(str, self.children))
262
+
263
+
264
+ class SyntaxTreeNode:
265
+ def __init__(
266
+ self,
267
+ parent: SyntaxTreeNode | None,
268
+ rule: TextStylerRule | TextStylerRegexRule,
269
+ match: re.Match[str] | None = None,
270
+ ):
271
+ self.parent: SyntaxTreeNode | None = parent
272
+ self.rule: TextStylerRule | TextStylerRegexRule = rule
273
+ self.match: re.Match[str] | None = match
274
+
275
+ self.children: list[str | SyntaxTreeNode] = []
276
+ self.path: tuple[TextStylerRule, ...] = ()
277
+ if parent is not None and isinstance(parent.rule, TextStylerRule):
278
+ self.path = parent.path + (parent.rule,)
279
+
280
+ def push(self, child: str | SyntaxTreeNode):
281
+ self.children.append(child)
282
+
283
+ @override
284
+ def __str__(self):
285
+ if isinstance(self.rule, TextStylerRule):
286
+ inner = "".join(map(str, self.children))
287
+ if self._should_print_raw():
288
+ return self.rule.get_start() + inner + self.rule.get_end()
289
+
290
+ outer_prefix, inner_prefix, inner_suffix, outer_suffix = (
291
+ self.rule.get_wrappers()
292
+ )
293
+ inner = inner_prefix + inner + inner_suffix
294
+ return outer_prefix + self.rule.transform(inner) + outer_suffix
295
+ elif self.match is not None:
296
+ return sub(self.rule.regex, self.rule.replace, self.match.group(0))
297
+ raise ValueError("TextStylerRegexRule provided without a valid `match`")
298
+
299
+ def _should_print_raw(self) -> bool:
300
+ if isinstance(self.rule, TextStylerRegexRule):
301
+ return False
302
+
303
+ allow_inner: InnerStyle = self.rule.allow_inner
304
+ if allow_inner == InnerStyle.ALLOW or self.parent is None:
305
+ return False
306
+ if allow_inner == InnerStyle.DISALLOW_DIRECT:
307
+ return self.parent.rule == self.rule
308
+ if allow_inner == InnerStyle.DISALLOW_ANCESTOR:
309
+ return self.rule in self.path