PyPI - styled-text - Versions diffs - 0.1.0__py3-none-any.whl - Mend

styled-text 0.1.0__py3-none-any.whl


Files changed (5)

styled_text-0.1.0.dist-info/METADATA +202 -0
styled_text-0.1.0.dist-info/RECORD +5 -0
styled_text-0.1.0.dist-info/WHEEL +5 -0
styled_text-0.1.0.dist-info/top_level.txt +1 -0
text_styler.py +309 -0

styled_text-0.1.0.dist-info/METADATA ADDED

@@ -0,0 +1,202 @@
+Metadata-Version: 2.4
+Name: styled-text
+Version: 0.1.0
+Summary: Library to convert text with custom markup to HTML (or anything else).
+License-Expression: MIT
+Requires-Python: >=3.14
+Description-Content-Type: text/markdown
+# styled-text (Python version)
+The Python version of the `styled-text` library. Designed for **custom markup transformations**.
+This library is for anyone who wants to create styled text **_like_** markdown, but with total **flexibility** to create their own rules.
+## Installation
+`pip install styled-text`
+## Usage
+```python
+import re
+from text_styler import TextStyler, TextStylerRegexRule, TextStylerRule, html_tag
+# Let's style this text:
+text = "_Welcome_ to _<~my library~>*styled-text*_ version 0.0.1"
+# Create the rules (only need to do this once)
+style_rules = [
+    TextStylerRule(start="*", transform=html_tag("strong")),
+    TextStylerRule(start="_", transform=html_tag("em")),
+    TextStylerRule(start="<~", transform=html_tag("del"), end="~>"),
+    TextStylerRegexRule(
+        regex=re.compile(r"(\d+\.\d+\.\d+)"),
+        replace=r"<span style='color: red'>\1</span>",
+    ),
+]
+# Create the styler:
+styler = TextStyler(style_rules)
+# Process text
+html = styler.process_text(text)
+# `html` looks like this now:
+# <em>Welcome</em> to <em><del>my library</del><strong>styled-text</strong></em> version <span style='color: red'>0.0.1</span>
+```
+## Examples
+#### Simple bold
+```python
+TextStylerRule(
+  start='*',
+  transform=html_tag("strong")
+)
+```
+Input: `My *bolded* text`<br>
+Output (raw): `My <strong>bolded</strong> text`<br>
+Output (visual): My <strong>bolded</strong> text<br>
+#### Nested bold/italic
+```python
+TextStylerRule(
+  start='*',
+  transform=html_tag("strong")
+),
+TextStylerRule(
+  start='_',
+  transform=html_tag("em")
+)
+```
+Input: `My *bolded and _italicized_ text*`<br>
+Output (raw): `My <strong>bolded and <em>italicized</em> text</strong>`<br>
+Output (visual): My <strong>bolded and <em>italicized</em> text</strong><br>
+Input: `Three *asterisks* matches* eagerly`<br>
+Output (raw): `Three <strong>asterisks</strong> matches* eagerly`<br>
+Output (visual): Three <strong>asterisks</strong> matches* eagerly<br>
+Input: `Overlapping * tags _ also * matches _ eagerly`<br>
+Output (raw): `Overlapping <strong> tags _ also </strong> matches _ eagerly`<br>
+Output (visual): Overlapping <strong> tags _ also </strong> matches _ eagerly<br>
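The last two inputs show eager, left-to-right pairing. Below is a deliberately simplified sketch of that idea for a single symmetric marker. `pair_markers` is a hypothetical helper, not part of `text_styler`. The library instead explores alternative parses and keeps the one with the fewest skipped markers, then the fewest elements.

```python
def pair_markers(text: str, marker: str, tag: str) -> str:
    """Hypothetical toy: wrap the text between consecutive pairs of `marker` in <tag>."""
    parts = text.split(marker)
    out = [parts[0]]
    i = 1
    # parts[i] sits between an opening and a closing marker; parts[i + 1] follows the closing one
    while i + 1 < len(parts):
        out.append(f"<{tag}>{parts[i]}</{tag}>")
        out.append(parts[i + 1])
        i += 2
    # An odd number of markers leaves one unpaired marker, which stays literal
    if i < len(parts):
        out.append(marker + parts[i])
    return "".join(out)


print(pair_markers("Three *asterisks* matches* eagerly", "*", "strong"))
# → Three <strong>asterisks</strong> matches* eagerly
```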
+#### Nested / Conflicting Tags
+Here we show two things:
+1. `start` can be more than one character (`~~` for strikethrough)
+2. one rule's marker can be a prefix of another rule's marker, and matching still works as expected (`~` for subscript)
+```python
+TextStylerRule(
+  start="~",
+  transform=html_tag("sub")
+),
+TextStylerRule(
+  start="~~",
+  transform=html_tag("del")
+)
+```
+Input: `H~~~3~~2~O`<br>
+Output (raw): `H<sub><del>3</del>2</sub>O`<br>
+Output (visual): H<sub><del>3</del>2</sub>O<br>
+Input: `A ~~~[sic]~tyop~~ typo is...`<br>
+Output (raw): `A <del><sub>[sic]</sub>tyop</del> typo is...`<br>
+Output (visual): A <del><sub>[sic]</sub>tyop</del> typo is...<br>
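Because `~` is a prefix of `~~`, both rules match at the same position, and the parser cannot commit to either one up front. The sketch below is loosely modeled on `TextStyler._find_next`. It collects every marker that starts at the first position where any marker starts. `first_candidates` is a hypothetical name, and the sketch leaves out escapes and regex rules. In the library, each candidate then becomes a separate branch of the search.

```python
from typing import Optional


def first_candidates(text: str, markers: list[str], start: int = 0) -> Optional[tuple[int, list[str]]]:
    """Return the first index >= start where some marker begins, plus every marker beginning there."""
    for index in range(start, len(text)):
        hits = [m for m in markers if text.startswith(m, index)]
        if hits:
            return index, hits
    return None


text = "H~~~3~~2~O"
print(first_candidates(text, ["~", "~~"]))     # → (1, ['~', '~~'])
print(first_candidates(text, ["~", "~~"], 7))  # → (8, ['~'])
```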
+#### Regexes
+Regexes are the best way to build a complex replacement, for example when you need to split the inner text into pieces or use it more than once. In this example, the matched URL is used both as the `href` attribute and as the link text:
+```python
+TextStylerRegexRule(
+  regex=re.compile(r"https://www\.[^.]+\.com"),
+  replace=r"<a href='\g<0>'>\g<0></a>"
+)
+```
+Input: `My link https://www.google.com`<br>
+Output (raw): `My link <a href='https://www.google.com'>https://www.google.com</a>`<br>
+Output (visual): My link <a href='https://www.google.com'>https://www.google.com</a><br>
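The `replace` string follows Python's standard `re` template syntax, where `\g<0>` stands for the whole match. This standard-library check is independent of `text_styler`. It also shows that a raw string needs only one backslash. In `r"\\g<0>"`, the doubled backslash is an escaped literal backslash, so the match is not inserted.

```python
import re

url = re.compile(r"https://www\.[^.]+\.com")
match = url.search("My link https://www.google.com")
assert match is not None

# Single backslash in a raw string: \g<0> expands to the whole match
print(match.expand(r"<a href='\g<0>'>\g<0></a>"))
# → <a href='https://www.google.com'>https://www.google.com</a>

# Doubled backslash: \\ becomes a literal backslash and "g<0>" stays plain text
print(match.expand(r"\\g<0>"))
# → \g<0>
```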
+However, text matched by a regex is treated as literal: no other rule is applied inside the match.<br>
+For example, even if the rule list also contains the rule that turns `*` into `<strong>` from the earlier examples, the asterisks inside the matched URL are left untouched:
+Input: `My link https://www.*google*.com`<br>
+Output (raw): `My link <a href='https://www.*google*.com'>https://www.*google*.com</a>`<br>
+Output (visual): My link <a href='https://www.*google*.com'>https://www.*google*.com</a><br>
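One way to picture this is to apply the ordinary rules only to the text between regex matches and to render each match from its own template. The `style` helper and its two hard-coded patterns below are hypothetical illustrations. They use plain `re` substitutions, not the library's algorithm:

```python
import re

URL = re.compile(r"https://www\.[^.]+\.com")
BOLD = re.compile(r"\*([^*]+)\*")


def style(text: str) -> str:
    """Hypothetical: apply BOLD outside URL matches; render each URL match from its template only."""
    out: list[str] = []
    pos = 0
    for m in URL.finditer(text):
        # Text before the match still gets the ordinary rule
        out.append(BOLD.sub(r"<strong>\1</strong>", text[pos:m.start()]))
        # The match itself is rendered from its template; BOLD never sees it
        out.append(m.expand(r"<a href='\g<0>'>\g<0></a>"))
        pos = m.end()
    out.append(BOLD.sub(r"<strong>\1</strong>", text[pos:]))
    return "".join(out)


print(style("My *bold* link https://www.*google*.com"))
# → My <strong>bold</strong> link <a href='https://www.*google*.com'>https://www.*google*.com</a>
```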
+#### Preserving the special characters
+By default (`ConsumptionStyle.REPLACE`), the markers are removed from the output, but they can be kept inside the generated element (`OUTSIDE`) or outside it (`INSIDE`):
+```python
+TextStylerRule(
+  start='*',
+  transform=html_tag("strong"),
+  consume_start=ConsumptionStyle.OUTSIDE,
+  consume_end=ConsumptionStyle.OUTSIDE,
+),
+TextStylerRule(
+  start='_',
+  transform=html_tag("em"),
+  consume_start=ConsumptionStyle.INSIDE,
+  consume_end=ConsumptionStyle.INSIDE,
+)
+```
+Input: `My *bolded* text, my _italicized_ text`<br>
+Output (raw): `My <strong>*bolded*</strong> text, my _<em>italicized</em>_ text`<br>
+Output (visual): My <strong>\*bolded\*</strong> text, my \_<em>italicized</em>\_ text<br>
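The consumption styles only decide where each marker ends up relative to the transformed element. The following self-contained sketch of that placement is modeled on `TextStylerRule.get_wrappers`. The `Keep` enum and `wrap` function are hypothetical stand-ins, not the library's API:

```python
from enum import Enum
from typing import Callable


class Keep(Enum):
    """Hypothetical stand-in mirroring the members of ConsumptionStyle."""
    REPLACE = "REPLACE"  # marker is dropped
    OUTSIDE = "OUTSIDE"  # marker is kept inside the element
    INSIDE = "INSIDE"    # marker is kept outside the element


def wrap(inner: str, start: str, end: str, transform: Callable[[str], str],
         keep_start: Keep = Keep.REPLACE, keep_end: Keep = Keep.REPLACE) -> str:
    outer_prefix = start if keep_start is Keep.INSIDE else ""
    inner_prefix = start if keep_start is Keep.OUTSIDE else ""
    inner_suffix = end if keep_end is Keep.OUTSIDE else ""
    outer_suffix = end if keep_end is Keep.INSIDE else ""
    return outer_prefix + transform(inner_prefix + inner + inner_suffix) + outer_suffix


def strong(s: str) -> str:
    return f"<strong>{s}</strong>"


def em(s: str) -> str:
    return f"<em>{s}</em>"


print(wrap("bolded", "*", "*", strong, Keep.OUTSIDE, Keep.OUTSIDE))  # → <strong>*bolded*</strong>
print(wrap("italicized", "_", "_", em, Keep.INSIDE, Keep.INSIDE))    # → _<em>italicized</em>_
```

For an asymmetric rule such as `<~`/`~>`, the kept prefix is the start marker and the kept suffix is the end marker.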
+#### Disallowing self-nesting
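+By default, a rule may nest within itself, but this can be disabled in two ways (a disallowed occurrence is output with its markers instead of being transformed):
+1. `InnerStyle.DISALLOW_ANCESTOR`: completely disallowed, at any depth
+2. `InnerStyle.DISALLOW_DIRECT`: a direct parent-child is disallowed, but grandparent-grandchild (or more distant) is allowed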
+```python
+TextStylerRule(
+  start='*',
+  transform=html_tag("strong"),
+  allow_inner=InnerStyle.DISALLOW_DIRECT,
+),
+TextStylerRule(
+  start='^',
+  transform=html_tag("sup"),
+  allow_inner=InnerStyle.DISALLOW_ANCESTOR,
+),
+TextStylerRule(
+  start='~',
+  transform=html_tag("sub"),
+  allow_inner=InnerStyle.DISALLOW_DIRECT,
+)
+```
+Input: `Subscript ~cannot exist ~directly~ within subscript, but *can exist ~within~ the bolded* region~`<br>
+Output (raw): `Subscript <sub>cannot exist ~directly~ within subscript, but <strong>can exist <sub>within</sub> the bolded</strong> region</sub>`<br>
+Output (visual): Subscript <sub>cannot exist \~directly\~ within subscript, but <strong>can exist <sub>within</sub> the bolded</strong> region</sub><br>
+Input: `Superscript ^of multiple depths is ^disallowed^, *even if we ^wrap^ it in a bolded* region^`<br>
+Output (raw): `Superscript <sup>of multiple depths is ^disallowed^, <strong>even if we ^wrap^ it in a bolded</strong> region</sup>`<br>
+Output (visual): Superscript <sup>of multiple depths is ^disallowed^, <strong>even if we ^wrap^ it in a bolded</strong> region</sup><br>
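The nesting check can be sketched as a small predicate modeled on `SyntaxTreeNode._should_print_raw`. The `Inner` enum and `print_raw` function are hypothetical stand-ins. `ancestors` lists the enclosing rules from outermost to innermost, and the three calls reproduce the decisions in the examples above:

```python
from enum import Enum


class Inner(Enum):
    """Hypothetical stand-in mirroring the members of InnerStyle."""
    ALLOW = "ALLOW"
    DISALLOW_DIRECT = "DISALLOW_DIRECT"
    DISALLOW_ANCESTOR = "DISALLOW_ANCESTOR"


def print_raw(rule: str, ancestors: list[str], policy: Inner) -> bool:
    """True if this occurrence of `rule` is output untransformed, markers included.

    `ancestors` holds the enclosing rules, outermost first; the last entry is the direct parent.
    """
    if policy is Inner.ALLOW or not ancestors:
        return False
    if policy is Inner.DISALLOW_DIRECT:
        return ancestors[-1] == rule
    return rule in ancestors  # DISALLOW_ANCESTOR


print(print_raw("~", ["~"], Inner.DISALLOW_DIRECT))         # → True  (~directly~)
print(print_raw("~", ["~", "*"], Inner.DISALLOW_DIRECT))    # → False (~within~)
print(print_raw("^", ["^", "*"], Inner.DISALLOW_ANCESTOR))  # → True  (^wrap^)
```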
+## Reference
+To use the library, just set up a list of "rules", create a `TextStyler` object, then call `process_text`.
+| Class / Function | Parameter | Type | Default | Description |
+| :--------------- | :-------- | :--- | :------ | :---------- |
+|TextStyler|rules|list|Required|A list of `TextStylerRule` or `TextStylerRegexRule` objects.|
+|TextStyler.process_text|text|str|Required|The text to style. It is HTML-escaped before any rule is applied.|
+||multiline|bool|`False`|If `False`, each line is styled independently, so styles cannot span line breaks. If `True`, the whole text is styled at once.|
+|TextStylerRegexRule|regex|re.Pattern[str]|Required|The compiled regular expression to match.|
+||replace|str|Required|The replacement template (supports group references such as `\1` and `\g<0>`).|
+|TextStylerRule|start|str|Required|The marker string that begins the rule.|
+||transform|Callable[[str], str]|Required|Function that renders the inner content (e.g., `html_tag("strong")`).|
+||end|str \| None|`None` (same as `start`)|The marker string that terminates the rule.|
+||consume_start|ConsumptionStyle|`REPLACE`|Whether the start marker is removed (`REPLACE`), kept inside the element (`OUTSIDE`), or kept outside it (`INSIDE`).|
+||consume_end|ConsumptionStyle|`REPLACE`|Same as `consume_start`, for the end marker.|
+||allow_inner|InnerStyle|`ALLOW`|Whether the rule may nest within itself (`ALLOW`, `DISALLOW_DIRECT`, `DISALLOW_ANCESTOR`).|
+|html_tag|tag|str|Required|The HTML tag name (e.g., `"strong"`).|
+||attributes|dict[str, str] \| None|`None`|Optional HTML attributes (e.g., `{"class": "my-css-class"}`).|
+||auto_close_empty|bool|`True`|If the content is empty, emit a self-closing tag such as `<br />`.|
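Since `transform` can be any `Callable[[str], str]`, `html_tag` is just one convenient factory. The sketch below re-implements the same idea as a hypothetical `tag_transform` function, so the behavior can be tried without installing the package (the package requires Python 3.14). It follows `html_tag` in `text_styler.py`, including the `auto_close_empty` flag. It also escapes attribute values with `html.escape`. Any function of the same shape could be passed as `transform` for non-HTML output.

```python
import html
from typing import Callable, Optional


def tag_transform(tag: str, attributes: Optional[dict[str, str]] = None,
                  auto_close_empty: bool = True) -> Callable[[str], str]:
    """Hypothetical sketch mirroring text_styler.html_tag; attribute values are HTML-escaped."""
    attrs = "".join(f" {k}='{html.escape(v)}'" for k, v in (attributes or {}).items())
    opening = f"{tag}{attrs}"

    def transform(text: str) -> str:
        # Empty content becomes a self-closing tag unless auto_close_empty is False
        if auto_close_empty and not text:
            return f"<{opening} />"
        return f"<{opening}>{text}</{tag}>"

    return transform


print(tag_transform("span", {"class": "my-css-class"})("hi"))  # → <span class='my-css-class'>hi</span>
print(tag_transform("br")(""))                                  # → <br />
```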

styled_text-0.1.0.dist-info/RECORD ADDED

@@ -0,0 +1,5 @@
+text_styler.py,sha256=CDGc3aejiIxOgzuyr1Hkl8J3UO2R3USZX0c-awBDyxo,10255
+styled_text-0.1.0.dist-info/METADATA,sha256=_oSNLzcwpSs8jmj9jXnjnZ-sw5-wCMGpfUA_0qZdt9I,7477
+styled_text-0.1.0.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
+styled_text-0.1.0.dist-info/top_level.txt,sha256=PUtzcegVzmbDMqYOkgthFRimdbqFK5Mm0NU96j0BFwo,12
+styled_text-0.1.0.dist-info/RECORD,,

styled_text-0.1.0.dist-info/WHEEL ADDED

@@ -0,0 +1,5 @@
+Wheel-Version: 1.0
+Generator: setuptools (82.0.1)
+Root-Is-Purelib: true
+Tag: py3-none-any

styled_text-0.1.0.dist-info/top_level.txt ADDED

@@ -0,0 +1 @@
+text_styler

text_styler.py ADDED

@@ -0,0 +1,309 @@
+import html
+import re
+from collections.abc import Callable
+from dataclasses import dataclass
+from enum import StrEnum
+from re import Match, Pattern, sub
+from typing import NamedTuple, override
+class ConsumptionStyle(StrEnum):
+    REPLACE = "REPLACE"
+    OUTSIDE = "OUTSIDE"
+    INSIDE = "INSIDE"
+class InnerStyle(StrEnum):
+    ALLOW = "ALLOW"
+    DISALLOW_DIRECT = "DISALLOW_DIRECT"
+    DISALLOW_ANCESTOR = "DISALLOW_ANCESTOR"
+@dataclass
+class TextStylerRegexRule:
+    regex: Pattern[str]
+    replace: str
+def html_tag(
+    tag: str, attributes: dict[str, str] | None = None, auto_close_empty: bool = True
+) -> Callable[[str], str]:
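+    # Escape attribute values so a quote inside a value can't break out of the attribute
+    attrs = "".join(f" {k}='{html.escape(v)}'" for k, v in attributes.items()) if attributes else ""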
+    start = f"{tag}{attrs}"
+    return lambda text: (
+        f"<{start} />" if auto_close_empty and not text else f"<{start}>{text}</{tag}>"
+    )
+@dataclass
+class TextStylerRule:
+    start: str
+    transform: Callable[[str], str]
+    end: str | None = None
+    consume_start: ConsumptionStyle = ConsumptionStyle.REPLACE
+    consume_end: ConsumptionStyle = ConsumptionStyle.REPLACE
+    allow_inner: InnerStyle = InnerStyle.ALLOW
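+    def get_end(self) -> str:
+        # Escape exactly like the input text (quote=False) so markers containing quotes still match
+        return html.escape(self.end or self.start, quote=False)
+    def get_start(self) -> str:
+        return html.escape(self.start, quote=False)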
+    def get_wrappers(self) -> tuple[str, str, str, str]:
+        outer_prefix, inner_prefix, inner_suffix, outer_suffix = "", "", "", ""
+        if self.consume_start == ConsumptionStyle.INSIDE:
+            outer_prefix = self.get_start()
+        if self.consume_start == ConsumptionStyle.OUTSIDE:
+            inner_prefix = self.get_start()
+        if self.consume_end == ConsumptionStyle.INSIDE:
+            outer_suffix = self.get_end()
+        if self.consume_end == ConsumptionStyle.OUTSIDE:
+            inner_suffix = self.get_end()
+        return (outer_prefix, inner_prefix, inner_suffix, outer_suffix)
+@dataclass
+class TextAction:
+    text: str
+@dataclass
+class PushAction:
+    rule: TextStylerRule
+@dataclass
+class PopAction:
+    pass
+@dataclass
+class RegexAction:
+    rule: TextStylerRegexRule
+    match: re.Match[str]
+# Used in _find_next
+class NextStyle(NamedTuple):
+    rule: TextStylerRule
+    position: int
+    is_start: bool
+    is_end: bool
+class NextRegex(NamedTuple):
+    rule: TextStylerRegexRule
+    position: int
+    match: Match[str]
+type Action = TextAction | PushAction | PopAction | RegexAction
+@dataclass(frozen=True)
+class Path:
+    actions: tuple[Action, ...] = ()
+    stack: tuple[TextStylerRule, ...] = ()
+    num_skips: int = 0
+    @property
+    def num_pushes(self) -> int:
+        return sum(1 for a in self.actions if isinstance(a, (PushAction, RegexAction)))
+    def peek(self) -> TextStylerRule | None:
+        return self.stack[-1] if len(self.stack) > 0 else None
+    def copy_and_push(self, action: Action, extra_skip: int = 0) -> Path:
+        new_actions = self.actions
+        if not isinstance(action, TextAction) or action.text:
+            new_actions += (action,)
+        if isinstance(action, PushAction):
+            new_stack = self.stack + (action.rule,)
+        elif isinstance(action, PopAction):
+            new_stack = self.stack[:-1]
+        else:
+            new_stack = self.stack
+        return Path(new_actions, new_stack, self.num_skips + extra_skip)
+class TextStyler:
+    def __init__(self, rules: list[TextStylerRule | TextStylerRegexRule]):
+        self.rules: list[TextStylerRule | TextStylerRegexRule] = rules
+        self.min_skips: int | None = None
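+    def process_text(self, text: str, multiline: bool = False) -> str:
+        if multiline:
+            return self._process_text(text)
+        # Style each line on its own; the newline stays attached to its line
+        return "".join(map(self._process_text, re.findall(r".*?\n|.+", text)))
+    def _process_text(self, text: str) -> str:
+        # Reset the pruning bound for every chunk: otherwise a line that needs no skipped
+        # markers leaves min_skips at 0 and prunes every parse of a later line
+        self.min_skips = None
+        if text == "":
+            return text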
+        text = html.escape(text, quote=False)
+        paths = self._helper(text, 0, Path())
+        # First we pick the paths with the lowest skipped markings (memoization already pruned out most of these)
+        # Then tie-break to the fewest blocks created
+        best_path = min(paths, key=lambda p: (p.num_skips, p.num_pushes))
+        # Build the tree
+        ast = SyntaxTree()
+        for action in best_path.actions:
+            if isinstance(action, TextAction):
+                ast.push_str(action.text)
+            elif isinstance(action, PushAction):
+                ast.push(action.rule)
+            elif isinstance(action, PopAction):
+                ast.pop()
+            else:
+                ast.push_regex(action.rule, action.match)
+        return str(ast)
+    def _helper(self, text: str, start: int, path: Path) -> list[Path]:
+        if self.min_skips is not None and path.num_skips > self.min_skips:
+            return []
+        if text == "":
+            return [Path()]
+        # Get the next token(s)
+        nexts = self._find_next(text, start)
+        # Base case, if there aren't any more tokens, return success or fail
+        if start >= len(text) or len(nexts) == 0:
+            if len(path.stack) > 0:
+                return []
+            self.min_skips = path.num_skips if self.min_skips is None else min(self.min_skips, path.num_skips)
+            return [path.copy_and_push(TextAction(text[start:]))]
+        paths: list[Path] = []
+        for next in nexts:
+            new_start = next.position
+            new_path = path.copy_and_push(TextAction(text[start:new_start]))
+            if isinstance(next, NextRegex):
+                new_start += len(next.match.group(0))
+                new_path = new_path.copy_and_push(RegexAction(next.rule, next.match))
+            else:
+                rule, _, is_start, is_end = next
+                if is_end and path.peek() == rule:
+                    # Close the innermost open rule
+                    new_start += len(rule.get_end())
+                    new_path = new_path.copy_and_push(PopAction())
+                elif is_start:
+                    # Open a new rule
+                    new_start += len(rule.get_start())
+                    new_path = new_path.copy_and_push(PushAction(rule))
+                else:
+                    # An end marker that doesn't close the innermost open rule: instead of dropping
+                    # it from the output at no cost, leave it to the fallback branch below,
+                    # which keeps it as literal text
+                    continue
+            paths.extend(self._helper(text, new_start, new_path))
+        # Fallback branch: skip the current set of tokens entirely
+        new_start = nexts[-1].position + 1
+        new_path = path.copy_and_push(TextAction(text[start:new_start]), 1)
+        paths.extend(self._helper(text, new_start, new_path))
+        return paths
+    def _find_next(self, text: str, start: int) -> list[NextStyle | NextRegex]:
+        nexts: list[NextStyle | NextRegex] = []
+        escaped = False
+        for index in range(start, len(text)):
+            for marking in self.rules:
+                if isinstance(marking, TextStylerRegexRule):
+                    if match := marking.regex.match(text, index):
+                        nexts.append(NextRegex(marking, index, match))
+                elif not escaped:
+                    is_start = text.startswith(marking.get_start(), index)
+                    is_end = text.startswith(marking.get_end(), index)
+                    if is_start or is_end:
+                        nexts.append(NextStyle(marking, index, is_start, is_end))
+            escaped = text[index] == "\\" and not escaped
+            if len(nexts) > 0:
+                return nexts
+        return []
+class SyntaxTree:
+    def __init__(self):
+        self.children: list[SyntaxTreeNode | str] = []
+        self.curr: SyntaxTreeNode | None = None
+    def push(self, rule: TextStylerRule):
+        new_node = SyntaxTreeNode(self.curr, rule)
+        self._push(new_node)
+        self.curr = new_node
+    def push_regex(self, rule: TextStylerRegexRule, match: re.Match[str]):
+        new_node = SyntaxTreeNode(self.curr, rule, match)
+        self._push(new_node)
+    def push_str(self, text: str):
+        if text:
+            self._push(re.sub(r"\\(.)", r"\1", text))
+    def _push(self, node: SyntaxTreeNode | str):
+        if self.curr is None:
+            self.children.append(node)
+        else:
+            self.curr.push(node)
+    def pop(self):
+        if self.curr is None:
+            raise ValueError("Attempted to pop() when already at root")
+        self.curr = self.curr.parent
+    @override
+    def __str__(self) -> str:
+        return "".join(map(str, self.children))
+class SyntaxTreeNode:
+    def __init__(
+        self,
+        parent: SyntaxTreeNode | None,
+        rule: TextStylerRule | TextStylerRegexRule,
+        match: re.Match[str] | None = None,
+    ):
+        self.parent: SyntaxTreeNode | None = parent
+        self.rule: TextStylerRule | TextStylerRegexRule = rule
+        self.match: re.Match[str] | None = match
+        self.children: list[str | SyntaxTreeNode] = []
+        self.path: tuple[TextStylerRule, ...] = ()
+        if parent is not None and isinstance(parent.rule, TextStylerRule):
+            self.path = parent.path + (parent.rule,)
+    def push(self, child: str | SyntaxTreeNode):
+        self.children.append(child)
+    @override
+    def __str__(self):
+        if isinstance(self.rule, TextStylerRule):
+            inner = "".join(map(str, self.children))
+            if self._should_print_raw():
+                return self.rule.get_start() + inner + self.rule.get_end()
+            outer_prefix, inner_prefix, inner_suffix, outer_suffix = (
+                self.rule.get_wrappers()
+            )
+            inner = inner_prefix + inner + inner_suffix
+            return outer_prefix + self.rule.transform(inner) + outer_suffix
+        elif self.match is not None:
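+            # Expand the template against the original match; re-running the regex on the
+            # matched substring alone can fail for patterns with lookarounds
+            return self.match.expand(self.rule.replace)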
+        raise ValueError("TextStylerRegexRule provided without a valid `match`")
+    def _should_print_raw(self) -> bool:
+        if isinstance(self.rule, TextStylerRegexRule):
+            return False
+        allow_inner: InnerStyle = self.rule.allow_inner
+        if allow_inner == InnerStyle.ALLOW or self.parent is None:
+            return False
+        if allow_inner == InnerStyle.DISALLOW_DIRECT:
+            return self.parent.rule == self.rule
+        if allow_inner == InnerStyle.DISALLOW_ANCESTOR:
+            return self.rule in self.path
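`text_styler.py` also supports backslash escapes, which the README does not describe. `_find_next` ignores a `TextStylerRule` marker whose first character follows an unescaped backslash, although regex rules are still checked. `SyntaxTree.push_str` later removes the escaping backslashes. Below is a standalone sketch of those two steps. `escaped_positions` and `unescape` are hypothetical names:

```python
import re


def escaped_positions(text: str) -> set[int]:
    """Indices whose character follows an unescaped backslash (same toggle as _find_next)."""
    result: set[int] = set()
    escaped = False
    for index, char in enumerate(text):
        if escaped:
            result.add(index)  # a style marker starting here would be ignored
        # A backslash escapes the next character unless it is itself escaped
        escaped = char == "\\" and not escaped
    return result


def unescape(text: str) -> str:
    """Drop each escaping backslash and keep the character after it (as push_str does)."""
    return re.sub(r"\\(.)", r"\1", text)


print(escaped_positions(r"a\*b*"))  # → {2}: only the first asterisk is escaped
print(unescape(r"a\*b"))            # → a*b
```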