slack-markdown-parser 2.4.1__tar.gz → 2.4.3__tar.gz
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/CHANGELOG.md +13 -0
- {slack_markdown_parser-2.4.1/slack_markdown_parser.egg-info → slack_markdown_parser-2.4.3}/PKG-INFO +1 -1
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/docs/spec-ja.md +2 -1
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/docs/spec.md +2 -1
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/pyproject.toml +1 -1
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser/__init__.py +1 -1
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser/converter.py +92 -6
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3/slack_markdown_parser.egg-info}/PKG-INFO +1 -1
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/LICENSE +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/MANIFEST.in +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/README-ja.md +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/README.md +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/setup.cfg +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser/py.typed +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser.egg-info/SOURCES.txt +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser.egg-info/dependency_links.txt +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser.egg-info/requires.txt +0 -0
- {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser.egg-info/top_level.txt +0 -0
{slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/CHANGELOG.md
@@ -6,6 +6,19 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
 
 ## [Unreleased]
 
+## [2.4.3] - 2026-05-29
+
+### Fixed
+
+- Stopped bare-URL autolinking from greedily swallowing trailing text. `normalize_bare_urls_for_slack_markdown` matched `https?://[^\s<]+`, so a scheme URL glued directly to following CJK text (e.g. `(https://example.com)**。に句点を直結。`) — common in Japanese, which puts no space after a URL — captured the closing paren, the `**` markers, the CJK punctuation, and the rest of the sentence into one `<…>` autolink, over-extending the link and exposing the literal `**`. The matched URL is now trimmed GFM-style: it stops at a doubled emphasis run (`**`/`~~`), at code/angle/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK punctuation (`、`/`。`/`」` …), and trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, and an unbalanced `)`) is dropped while balanced parentheses are kept. `;` and quotes are kept (URL-legal), and a lone `*` (URL wildcards/queries) and CJK letters (IRIs / Unicode IDN hosts) are preserved.
+
+## [2.4.2] - 2026-05-29
+
+### Fixed
+
+- Stopped an unbalanced emphasis delimiter from corrupting unrelated, well-formed spans in the same block. The bold/italic/strikethrough patterns are matched with `re.DOTALL`, so a single stray `**` (for example a whitespace-flanked literal `**` in `閉じ ** が`, or an unclosed marker) shifted marker pairing across the whole block and flipped the protective ZWSP of nearby punctuation-terminated bold to the broken *outer* position, re-exposing the literal markers on Slack. `EMPHASIS_PATTERNS` now enforces CommonMark's minimal flanking requirement — an opening run is not followed by whitespace and a closing run is not preceded by whitespace — so a non-flanking stray marker stays literal and no longer disturbs its neighbours.
+- Bounded the `**` and `~~` emphasis bodies to a single delimiter run so a dangling opener with no valid closer of its own (for example `**oops **` or `**: x **` before a later `**…%**`) can no longer scan past the literal stray and steal a following well-formed span's closing marker, which had misplaced that span's protective ZWSP. The single-`*` italic body is intentionally left unbounded because italics legitimately wrap `**bold**`.
+
 ## [2.4.1] - 2026-05-29
 
 ### Fixed
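{slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/docs/spec-ja.md
@@ -72,7 +72,7 @@ On 2026-03-06, Slack's official documentation for the `markdown` block
 ### What this parser corrects and stabilizes
 
 - Normalizes `_..._` / `__...__` into Slack-compatible `*...*` / `**...**`
-- Bare URLs into the Slack-stable `<https://...>`
+- Normalizes bare URLs into the Slack-stable `<https://...>` form. The URL is first trimmed to its real extent (GFM-style): it stops at a doubled emphasis run (`**`/`~~`), at code/angle-bracket/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK / full-width punctuation (`、` `。` `」` `)` `!` …), and trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, plus an unbalanced `)`) is excluded (`;` and quotes are kept because they are legal in URLs). A lone `*` (URL wildcards/queries) and CJK **letters** (including the iteration mark `々`; IRIs / Unicode IDN hosts, e.g. `https://ja.wikipedia.org/wiki/人々`) are preserved. As a result, even when CJK body text follows a URL with no space, as in Japanese, the URL no longer greedily swallows everything up to the end of the line (closing `**` included) into the link.
 - Repairs malformed Markdown tables and converts them into `table` blocks
 - Converts unambiguous standalone Markdown constructs into native Slack blocks
 - standalone image syntax `` → `image`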
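@@ -178,6 +178,7 @@ LLMs: omitted outer pipes, missing separator rows, mismatched column counts
 - The start and end of a chunk (line start, line end, text edge, or the boundary of a fenced code block) are treated as safe, and no zero-width space is added there.
 - When one outer side is tight against preceding or following non-boundary text, a zero-width space is added on that side only. The safe (boundary) side is left as-is.
 - When the inner side of an emphasis marker (`**`, `*`, `~~`) is tight against punctuation (e.g. `**注意:**` or `**70.9%→83.0%**`), a zero-width space is inserted inside the marker. This makes the character adjacent to the marker's inner side a non-punctuation character, so Slack's CommonMark right-/left-flanking check succeeds no matter what follows. It also works directly before CJK text and CJK punctuation (`、` / `。`), which Slack does not accept as flanking neighbors. Inline code is not subject to flanking rules, so it is excluded from this rule.
+- Emphasis delimiters are recognized only when they satisfy CommonMark's minimal flanking condition: an opening run is not immediately followed by whitespace, and a closing run is not immediately preceded by whitespace. A lone marker with whitespace on both sides (e.g. the literal `**` in `閉じ ** が`) and any other unpaired marker are left as-is. This prevents a single extra marker from shifting the pairing of nearby well-formed emphasis and inserting zero-width spaces at the wrong position.
 
 Exception:
 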
{slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/docs/spec.md
@@ -72,7 +72,7 @@ Slack still controls when those newer features appear and how they look, so trea
 ### Things this parser corrects or stabilizes
 
 - `_..._` and `__...__` are normalized into Slack-friendly `*...*` and `**...**`
-- Bare URLs are wrapped into Slack-friendly `<https://...>` form before `markdown` block delivery
+- Bare URLs are wrapped into Slack-friendly `<https://...>` form before `markdown` block delivery. The URL is trimmed to its real extent first (GFM-style): it stops at a doubled emphasis run (`**`/`~~`), at code/angle/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK / full-width punctuation (`、` `。` `」` `)` `!` …); trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, and an unbalanced `)`) is excluded — `;` and quotes are kept because they are URL-legal. A lone `*` (URL wildcards/queries) and CJK *letters* — including iteration marks like `々` (IRIs / Unicode IDN hosts such as `https://ja.wikipedia.org/wiki/人々`) — are preserved. This keeps a scheme URL glued directly to following CJK text — common in Japanese, where no space separates them — from greedily swallowing the rest of the line (including a closing `**`) into the autolink.
 - Malformed Markdown tables are repaired before `table` block generation
 - Unambiguous standalone Markdown constructs are promoted into native Slack blocks:
 - standalone image syntax `` to `image`
@@ -177,6 +177,7 @@ Rules:
 - The start and end of a chunk (a line/text boundary, or the edge of a fenced code block) are treated as safe; no zero-width space is added there.
 - When an outer edge is tight against surrounding non-boundary text, only that edge is padded with a zero-width space. The safe (boundary) edge is left clean.
 - When an emphasis marker (`**`, `*`, `~~`) sits directly against punctuation on its inner side (for example `**注意:**` or `**70.9%→83.0%**`), a zero-width space is inserted just *inside* the marker. This makes the marker's inner neighbor a non-punctuation character, so Slack's CommonMark right-/left-flanking check succeeds regardless of what surrounds the token — including before CJK text and CJK punctuation (`、` / `。`), which Slack does not accept as a flanking neighbor. Inline code spans are exempt from this rule because they do not obey flanking rules.
+- Emphasis delimiters are recognized only when they satisfy CommonMark's minimal flanking rule: an opening run is not immediately followed by whitespace, and a closing run is not immediately preceded by whitespace. A stray, whitespace-flanked marker (for example the literal `**` in `閉じ ** が`), or an otherwise unbalanced marker, is left untouched. This prevents one dangling marker from shifting the pairing of nearby well-formed spans and misplacing their zero-width spaces.
 
 Exception:
 
{slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser/converter.py
RENAMED
@@ -32,10 +32,24 @@ STANDALONE_IMAGE_PATTERN = re.compile(
 )
 MARKDOWN_LINK_PATTERN = re.compile(r"\[[^\]\n]+\]\([^\)\n]+\)")
 INLINE_CODE_SPAN_PATTERN = re.compile(r"(?<!`)`[^`\n]+`(?!`)", flags=re.DOTALL)
+# Emphasis delimiters must satisfy CommonMark's minimal flanking requirement:
+# an opening run is not followed by whitespace and a closing run is not preceded
+# by whitespace. Enforcing this keeps a stray, whitespace-flanked delimiter
+# (e.g. the literal ``**`` in ``閉じ ** が``) from being paired at all.
+#
+# For ``**`` and ``~~`` the body additionally may not contain the same delimiter
+# run (``(?:(?!\*\*).)+?`` / ``(?:(?!~~).)+?``). Without this, a dangling opener
+# with no valid closer of its own (``**oops ** and **70.9%→83.0%**``) would scan
+# past the literal stray and steal a *later* well-formed span's closing marker,
+# shifting the pairing and corrupting that span's ZWSP placement. Bounding the
+# body to a single run makes the regex pair the same markers CommonMark does.
+# (The single-``*`` italic body is intentionally not bounded this way: italics
+# legitimately wrap ``**bold**`` and ``*`` is heavily overloaded, so it keeps the
+# whitespace guard only.)
 EMPHASIS_PATTERNS = (
-    re.compile(r"(?<!\*)\*\*(
-    re.compile(r"(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)", flags=re.DOTALL),
-    re.compile(r"~~(
+    re.compile(r"(?<!\*)\*\*(?!\s)((?:(?!\*\*).)+?)(?<!\s)\*\*(?!\*)", flags=re.DOTALL),
+    re.compile(r"(?<!\*)\*(?!\*)(?!\s)(.+?)(?<!\s)(?<!\*)\*(?!\*)", flags=re.DOTALL),
+    re.compile(r"~~(?!\s)((?:(?!~~).)+?)(?<!\s)~~", flags=re.DOTALL),
 )
 INLINE_CODE_PLACEHOLDER_PATTERN = re.compile(r"\ufff0code\d+\ufff1")
 PROTECTED_UNDERSCORE_SPAN_PATTERN = re.compile(
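The effect of the flanking guard and the bounded body can be checked in isolation. The sketch below copies the new `**` pattern from the hunk above and compares it with an unguarded pattern. `UNGUARDED_BOLD`, `bold_bodies`, and the two sample strings are illustrative assumptions; in particular `UNGUARDED_BOLD` is not the 2.4.1 pattern, which is truncated in this diff.

```python
import re

# Bold pattern copied from the added EMPHASIS_PATTERNS entry in the hunk above.
BOLD = re.compile(
    r"(?<!\*)\*\*(?!\s)((?:(?!\*\*).)+?)(?<!\s)\*\*(?!\*)", flags=re.DOTALL
)
# Hypothetical unguarded pattern, for contrast only (NOT the 2.4.1 source).
UNGUARDED_BOLD = re.compile(r"(?<!\*)\*\*(.+?)\*\*(?!\*)", flags=re.DOTALL)


def bold_bodies(pattern: re.Pattern, text: str) -> list:
    """Return the captured body of every match of ``pattern`` in ``text``."""
    return [match.group(1) for match in pattern.finditer(text)]


# A whitespace-flanked literal ``**`` followed by a well-formed bold span.
stray = "閉じ ** が **70.9%→83.0%**"
# A dangling opener whose only candidate closer is preceded by whitespace.
dangling = "**oops ** and **70.9%→83.0%**"

# The unguarded pattern pairs the stray marker with the real opener.
print(bold_bodies(UNGUARDED_BOLD, stray))
# The guarded pattern leaves the stray literal and pairs the real span.
print(bold_bodies(BOLD, stray))
print(bold_bodies(BOLD, dangling))
```

In both samples the guarded pattern captures only `70.9%→83.0%`, while the unguarded one treats the text between the stray marker and the real opener as a bold body.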
@@ -129,6 +143,72 @@ def _is_han_or_kana_char(char: str) -> bool:
 )
 
 
+# Code/angle/pipe markers that never appear inside a bare URL in this library's
+# prose context. (A single ``*`` and CJK letters are intentionally NOT here: a
+# URL may legally contain a wildcard/query ``*`` and an IRI/IDN may contain CJK
+# letters, so those must be preserved.)
+_URL_STOP_CHARS = frozenset("`<>|")
+# Trailing punctuation stripped from the end of a bare URL. This is exactly
+# GFM's autolink-extension set (``! ? . , : * _ ~``); a closing paren is handled
+# separately, with balancing. ``;`` and quotes are intentionally NOT included —
+# ``;`` is URL-legal in matrix/path parameters and quotes are sub-delimiters, so
+# trimming them could change the link target rather than just shedding prose.
+_URL_TRAILING_PUNCTUATION = frozenset("!?.,:*_~")
+# CJK and full/half-width punctuation/brackets that terminate prose, so a bare
+# URL is cut here. This is an explicit set rather than the whole U+3000–U+303F
+# block on purpose: letter-like CJK iteration marks (々 U+3005, 〻 U+303B),
+# ditto/closure marks (〆 U+3006) and the ideographic number zero (〇 U+3007)
+# are *excluded* so IRIs such as ``https://ja.wikipedia.org/wiki/人々`` survive.
+_URL_CJK_BOUNDARY_CHARS = frozenset(
+    "、。〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞・…"  # CJK punctuation & brackets
+    "!?,.:;()[]{}<>|"  # full-width punctuation & brackets
+    "。「」、"  # half-width CJK punctuation & brackets
+)
+
+
+def _is_url_boundary_char(char: str) -> bool:
+    """Return True when ``char`` is a hard boundary where a bare URL must stop.
+
+    Only unambiguous prose/markup boundaries qualify: code/angle/pipe markers
+    and CJK/full-width *punctuation* (``、``/``。``/``」``/``)`` …). CJK
+    *letters* (including iteration marks like ``々``) are not a boundary, so
+    IRIs such as ``https://ja.wikipedia.org/wiki/人々`` survive.
+    """
+    return char in _URL_STOP_CHARS or char in _URL_CJK_BOUNDARY_CHARS
+
+
+def _trim_bare_url(url: str) -> str:
+    """Trim a greedily matched bare URL down to its real extent.
+
+    In CJK writing a URL is usually glued directly to the following text with no
+    whitespace, so the greedy ``[^\\s<]+`` match would otherwise swallow the
+    trailing ``)``/``**``/``。`` and the rest of the sentence. This stops the URL
+    at the first hard boundary or doubled emphasis run (``**``/``~~``) — single
+    ``*`` and CJK letters are preserved — then drops GFM-style trailing
+    punctuation and unbalanced closing parens, so ``https://example.com)**。``
+    becomes ``https://example.com``.
+    """
+    for index, char in enumerate(url):
+        nxt = url[index + 1] if index + 1 < len(url) else ""
+        if _is_url_boundary_char(char) or (char in "*~" and nxt == char):
+            url = url[:index]
+            break
+
+    while url:
+        last = url[-1]
+        if last == ")":
+            if url.count(")") <= url.count("("):
+                break
+            url = url[:-1]
+            continue
+        if last in _URL_TRAILING_PUNCTUATION:
+            url = url[:-1]
+            continue
+        break
+
+    return url
+
+
 def _nested_code_space_strategy(
     source: str,
     start: int,
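A condensed, self-contained sketch of the two-phase trim described in the hunk above may help when reading it: first cut at a hard boundary or doubled emphasis run, then shed trailing punctuation and unbalanced closing parentheses. The names `trim_bare_url`, `STOP_CHARS`, `TRAILING_PUNCTUATION`, and `CJK_BOUNDARY_CHARS` are illustrative, and the boundary set here is a reduced subset chosen as an assumption, not the module's full list.

```python
STOP_CHARS = frozenset("`<>|")
TRAILING_PUNCTUATION = frozenset("!?.,:*_~")
# Assumption: a reduced boundary set with a few CJK marks plus full-width
# ")" (U+FF09) and "!" (U+FF01); the real module lists many more characters.
CJK_BOUNDARY_CHARS = frozenset("、。「」\uff09\uff01")


def trim_bare_url(url: str) -> str:
    """Trim a greedily matched URL to its likely real extent (sketch)."""
    # Phase 1: cut at the first hard boundary or doubled ``**`` / ``~~`` run.
    for index, char in enumerate(url):
        nxt = url[index + 1] if index + 1 < len(url) else ""
        hard_stop = char in STOP_CHARS or char in CJK_BOUNDARY_CHARS
        if hard_stop or (char in "*~" and nxt == char):
            url = url[:index]
            break
    # Phase 2: drop trailing punctuation and unbalanced closing parens.
    while url:
        last = url[-1]
        if last == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        elif last in TRAILING_PUNCTUATION:
            url = url[:-1]
        else:
            break
    return url


samples = [
    "https://example.com)**。に句点を直結。",
    "https://ja.wikipedia.org/wiki/人々",
    "https://en.wikipedia.org/wiki/Foo_(bar).",
    "https://example.com/a*b?x=1;y=2,",
]
for sample in samples:
    print(repr(sample), "->", repr(trim_bare_url(sample)))
```

The second sample passes through unchanged because `々` is a letter-like mark rather than punctuation, and the third keeps its balanced `(bar)` while losing only the final period.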
@@ -582,9 +662,15 @@ def normalize_bare_urls_for_slack_markdown(text: str) -> str:
 
             url_match = BARE_URL_PATTERN.match(chunk, cursor)
             if url_match:
-
-
-
+                url = _trim_bare_url(url_match.group(0))
+                scheme = re.match(r"https?://", url, re.IGNORECASE)
+                # Only autolink when something host-like survives the trim;
+                # a bare ``https://`` followed straight by CJK would otherwise
+                # produce an empty ``<https://>`` autolink.
+                if scheme and len(url) > scheme.end():
+                    parts.append(f"<{url}>")
+                    cursor += len(url)
+                    continue
 
             parts.append(char)
             cursor += 1
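The cursor loop and its empty-host guard can be exercised with a simplified stand-alone sketch. Everything here is an assumption made for illustration: `autolink` is a reduced version of the scanning loop (the surrounding function is not shown in this diff), `trim_stub` is a hypothetical stand-in for `_trim_bare_url` that only cuts at `。` or `**`, and `BARE_URL` follows the `https?://[^\s<]+` pattern quoted in the CHANGELOG with case-insensitivity assumed.

```python
import re

# Assumed shape of the bare-URL pattern, per the CHANGELOG entry.
BARE_URL = re.compile(r"https?://[^\s<]+", re.IGNORECASE)
SCHEME = re.compile(r"https?://", re.IGNORECASE)


def trim_stub(url: str) -> str:
    """Hypothetical stand-in for ``_trim_bare_url``: cut at ``。`` or ``**``."""
    cut = re.search(r"。|\*\*", url)
    if cut:
        url = url[: cut.start()]
    return url.rstrip("!?.,:*_~")


def autolink(text: str) -> str:
    """Wrap bare URLs in ``<...>``, skipping URLs that trim down to a scheme."""
    parts = []
    cursor = 0
    while cursor < len(text):
        url_match = BARE_URL.match(text, cursor)
        if url_match:
            url = trim_stub(url_match.group(0))
            scheme = SCHEME.match(url)
            # Guard: only link when something follows the scheme.
            if scheme and len(url) > scheme.end():
                parts.append(f"<{url}>")
                cursor += len(url)  # advance past the trimmed URL only
                continue
        parts.append(text[cursor])
        cursor += 1
    return "".join(parts)


print(autolink("詳細は**https://example.com**。を参照"))
print(autolink("https://。だけ"))
```

Advancing the cursor by the trimmed length rather than the matched length is what lets the closing `**` and the following CJK text be emitted as ordinary characters.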
File without changes