slack-markdown-parser 2.4.1__tar.gz → 2.4.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (18)
  1. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/CHANGELOG.md +13 -0
  2. {slack_markdown_parser-2.4.1/slack_markdown_parser.egg-info → slack_markdown_parser-2.4.3}/PKG-INFO +1 -1
  3. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/docs/spec-ja.md +2 -1
  4. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/docs/spec.md +2 -1
  5. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/pyproject.toml +1 -1
  6. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser/__init__.py +1 -1
  7. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser/converter.py +92 -6
  8. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3/slack_markdown_parser.egg-info}/PKG-INFO +1 -1
  9. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/LICENSE +0 -0
  10. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/MANIFEST.in +0 -0
  11. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/README-ja.md +0 -0
  12. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/README.md +0 -0
  13. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/setup.cfg +0 -0
  14. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser/py.typed +0 -0
  15. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser.egg-info/SOURCES.txt +0 -0
  16. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser.egg-info/dependency_links.txt +0 -0
  17. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser.egg-info/requires.txt +0 -0
  18. {slack_markdown_parser-2.4.1 → slack_markdown_parser-2.4.3}/slack_markdown_parser.egg-info/top_level.txt +0 -0
@@ -6,6 +6,19 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
 
  ## [Unreleased]
 
+ ## [2.4.3] - 2026-05-29
+
+ ### Fixed
+
+ - Stopped bare-URL autolinking from greedily swallowing trailing text. `normalize_bare_urls_for_slack_markdown` matched `https?://[^\s<]+`, so a scheme URL glued directly to following CJK text (e.g. `(https://example.com)**。に句点を直結。`) — common in Japanese, which puts no space after a URL — captured the closing paren, the `**` markers, the CJK punctuation, and the rest of the sentence into one `<…>` autolink, over-extending the link and exposing the literal `**`. The matched URL is now trimmed GFM-style: it stops at a doubled emphasis run (`**`/`~~`), at code/angle/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK punctuation (`、`/`。`/`」` …), and trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, and an unbalanced `)`) is dropped while balanced parentheses are kept. `;` and quotes are kept (URL-legal), and a lone `*` (URL wildcards/queries) and CJK letters (IRIs / Unicode IDN hosts) are preserved.
+
+ ## [2.4.2] - 2026-05-29
+
+ ### Fixed
+
+ - Stopped an unbalanced emphasis delimiter from corrupting unrelated, well-formed spans in the same block. The bold/italic/strikethrough patterns are matched with `re.DOTALL`, so a single stray `**` (for example a whitespace-flanked literal `**` in `閉じ ** が`, or an unclosed marker) shifted marker pairing across the whole block and flipped the protective ZWSP of nearby punctuation-terminated bold to the broken *outer* position, re-exposing the literal markers on Slack. `EMPHASIS_PATTERNS` now enforces CommonMark's minimal flanking requirement — an opening run is not followed by whitespace and a closing run is not preceded by whitespace — so a non-flanking stray marker stays literal and no longer disturbs its neighbours.
+ - Bounded the `**` and `~~` emphasis bodies to a single delimiter run so a dangling opener with no valid closer of its own (for example `**oops **` or `**: x **` before a later `**…%**`) can no longer scan past the literal stray and steal a following well-formed span's closing marker, which had misplaced that span's protective ZWSP. The single-`*` italic body is intentionally left unbounded because italics legitimately wrap `**bold**`.
+
  ## [2.4.1] - 2026-05-29
 
  ### Fixed
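The over-extension described in the 2.4.3 entry can be reproduced by running the `https?://[^\s<]+` pattern that the entry quotes against the entry's own example text. This is a minimal sketch; the variable names are illustrative and not part of the package:

```python
import re

# The pre-2.4.3 bare-URL pattern, as quoted in the 2.4.3 changelog entry.
GREEDY_BARE_URL = re.compile(r"https?://[^\s<]+")

sample = "(https://example.com)**。に句点を直結。"
greedy = GREEDY_BARE_URL.search(sample).group(0)

# No whitespace follows the URL, so the match runs to the end of the line,
# taking the closing paren, the ** markers and the Japanese sentence with it.
print(greedy)         # https://example.com)**。に句点を直結。
print(f"<{greedy}>")  # the over-extended autolink wrapping used before 2.4.3
```

Because `[^\s<]+` only stops at whitespace or `<`, nothing in Japanese prose ends the match, which is why the fix has to trim the match afterwards rather than rely on the pattern alone.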
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: slack-markdown-parser
- Version: 2.4.1
+ Version: 2.4.3
  Summary: Convert LLM Markdown into Slack Block Kit messages
  Author: darkgaldragon
  License-Expression: MIT
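@@ -72,7 +72,7 @@ Slack, on 2026-03-06, the official documentation for the `markdown` block
  ### What this parser corrects and stabilizes
 
  - Normalize `_..._` / `__...__` into Slack-compatible `*...*` / `**...**`
- - Align bare URLs to the `<https://...>` form, which is more stable on Slack
+ - Align bare URLs to the `<https://...>` form, which is more stable on Slack. The URL is first trimmed to its actual extent (GFM-style): it stops at doubled emphasis markers (`**`/`~~`), code/angle-bracket/pipe markers (`` ` ``, `<`, `>`, `|`), and CJK / full-width punctuation (`、` `。` `」` `）` `！` …), and trailing punctuation (GFM's autolink set `! ? . , : * _ ~` and an unbalanced `)`) is excluded (`;` and quotes are kept because they are valid in URLs). A lone `*` (URL wildcards/queries) and CJK **letters** (including the iteration mark `々`; IRIs / Unicode IDN hosts, e.g. `https://ja.wikipedia.org/wiki/人々`) are kept. This prevents the accident in which, when CJK body text follows a URL with no space in between, as is usual in Japanese, the URL greedily swallows everything to the end of the line (including the closing `**`) and turns it into a link.
  - Fill in broken Markdown tables and convert them into `table` blocks
  - Convert unambiguous standalone Markdown syntax into Slack-native blocks
  - Standalone-line image syntax `![alt](https://...)` → `image`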
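@@ -178,6 +178,7 @@ LLMs, with omitted outer pipes, missing delimiter rows, mismatched column counts, and the like
  - The start and end of a chunk (start of line, end of line, edge of the text, or the boundary of a fenced code block) are treated as safe, and no zero-width space is added there.
  - When one outer side is tight against the adjacent non-boundary text, a zero-width space is added on that side only. The safe (boundary) side is left unchanged.
  - When the inner side of an emphasis marker (`**`, `*`, `~~`) is tight against punctuation (e.g. `**注意:**` or `**70.9%→83.0%**`), a zero-width space is inserted inside the marker. The character adjacent to the marker's inner side then becomes a non-punctuation character, so Slack's CommonMark right-/left-flanking check succeeds whatever follows. This also works immediately before CJK text and CJK punctuation (`、` / `。`), which Slack does not accept as flanking neighbors. Inline code is not subject to the flanking rules, so it is excluded from this rule.
+ - Emphasis delimiters are recognized only when they satisfy CommonMark's minimal flanking condition: the opening run is not immediately followed by whitespace, and the closing run is not immediately preceded by whitespace. A lone marker with whitespace on both sides (e.g. the literal `**` in `閉じ ** が`) and any other unpaired marker are left as is. This prevents one extra marker from shifting the pairing of nearby well-formed emphasis and inserting zero-width spaces in the wrong position.
 
  Exception:
 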
@@ -72,7 +72,7 @@ Slack still controls when those newer features appear and how they look, so trea
  ### Things this parser corrects or stabilizes
 
  - `_..._` and `__...__` are normalized into Slack-friendly `*...*` and `**...**`
- - Bare URLs are wrapped into Slack-friendly `<https://...>` form before `markdown` block delivery
+ - Bare URLs are wrapped into Slack-friendly `<https://...>` form before `markdown` block delivery. The URL is trimmed to its real extent first (GFM-style): it stops at a doubled emphasis run (`**`/`~~`), at code/angle/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK / full-width punctuation (`、` `。` `」` `）` `！` …); trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, and an unbalanced `)`) is excluded — `;` and quotes are kept because they are URL-legal. A lone `*` (URL wildcards/queries) and CJK *letters* — including iteration marks like `々` (IRIs / Unicode IDN hosts such as `https://ja.wikipedia.org/wiki/人々`) — are preserved. This keeps a scheme URL glued directly to following CJK text — common in Japanese, where no space separates them — from greedily swallowing the rest of the line (including a closing `**`) into the autolink.
  - Malformed Markdown tables are repaired before `table` block generation
  - Unambiguous standalone Markdown constructs are promoted into native Slack blocks:
  - standalone image syntax `![alt](https://...)` to `image`
@@ -177,6 +177,7 @@ Rules:
  - The start and end of a chunk (a line/text boundary, or the edge of a fenced code block) are treated as safe; no zero-width space is added there.
  - When an outer edge is tight against surrounding non-boundary text, only that edge is padded with a zero-width space. The safe (boundary) edge is left clean.
  - When an emphasis marker (`**`, `*`, `~~`) sits directly against punctuation on its inner side (for example `**注意:**` or `**70.9%→83.0%**`), a zero-width space is inserted just *inside* the marker. This makes the marker's inner neighbor a non-punctuation character, so Slack's CommonMark right-/left-flanking check succeeds regardless of what surrounds the token — including before CJK text and CJK punctuation (`、` / `。`), which Slack does not accept as a flanking neighbor. Inline code spans are exempt from this rule because they do not obey flanking rules.
+ - Emphasis delimiters are recognized only when they satisfy CommonMark's minimal flanking rule: an opening run is not immediately followed by whitespace, and a closing run is not immediately preceded by whitespace. A stray, whitespace-flanked marker (for example the literal `**` in `閉じ ** が`), or an otherwise unbalanced marker, is left untouched. This prevents one dangling marker from shifting the pairing of nearby well-formed spans and misplacing their zero-width spaces.
 
  Exception:
 
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "slack-markdown-parser"
- version = "2.4.1"
+ version = "2.4.3"
  description = "Convert LLM Markdown into Slack Block Kit messages"
  readme = "README.md"
  requires-python = ">=3.10"
@@ -1,6 +1,6 @@
  """slack-markdown-parser public package API."""
 
- __version__ = "2.4.1"
+ __version__ = "2.4.3"
  __license__ = "MIT"
 
  from .converter import (
@@ -32,10 +32,24 @@ STANDALONE_IMAGE_PATTERN = re.compile(
  )
  MARKDOWN_LINK_PATTERN = re.compile(r"\[[^\]\n]+\]\([^\)\n]+\)")
  INLINE_CODE_SPAN_PATTERN = re.compile(r"(?<!`)`[^`\n]+`(?!`)", flags=re.DOTALL)
+ # Emphasis delimiters must satisfy CommonMark's minimal flanking requirement:
+ # an opening run is not followed by whitespace and a closing run is not preceded
+ # by whitespace. Enforcing this keeps a stray, whitespace-flanked delimiter
+ # (e.g. the literal ``**`` in ``閉じ ** が``) from being paired at all.
+ #
+ # For ``**`` and ``~~`` the body additionally may not contain the same delimiter
+ # run (``(?:(?!\*\*).)+?`` / ``(?:(?!~~).)+?``). Without this, a dangling opener
+ # with no valid closer of its own (``**oops ** and **70.9%→83.0%**``) would scan
+ # past the literal stray and steal a *later* well-formed span's closing marker,
+ # shifting the pairing and corrupting that span's ZWSP placement. Bounding the
+ # body to a single run makes the regex pair the same markers CommonMark does.
+ # (The single-``*`` italic body is intentionally not bounded this way: italics
+ # legitimately wrap ``**bold**`` and ``*`` is heavily overloaded, so it keeps the
+ # whitespace guard only.)
  EMPHASIS_PATTERNS = (
-     re.compile(r"(?<!\*)\*\*(.+?)\*\*(?!\*)", flags=re.DOTALL),
-     re.compile(r"(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)", flags=re.DOTALL),
-     re.compile(r"~~(.+?)~~", flags=re.DOTALL),
+     re.compile(r"(?<!\*)\*\*(?!\s)((?:(?!\*\*).)+?)(?<!\s)\*\*(?!\*)", flags=re.DOTALL),
+     re.compile(r"(?<!\*)\*(?!\*)(?!\s)(.+?)(?<!\s)(?<!\*)\*(?!\*)", flags=re.DOTALL),
+     re.compile(r"~~(?!\s)((?:(?!~~).)+?)(?<!\s)~~", flags=re.DOTALL),
  )
  INLINE_CODE_PLACEHOLDER_PATTERN = re.compile(r"\ufff0code\d+\ufff1")
  PROTECTED_UNDERSCORE_SPAN_PATTERN = re.compile(
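The effect of the two 2.4.2 changes on the bold pattern can be checked directly with `re.findall`. The sketch below copies the 2.4.1 and 2.4.3 bold patterns from the hunk above and adds `GUARD_ONLY_BOLD`, a hypothetical intermediate (whitespace guard without the single-run bound) that is not in the package, to show why both parts are needed. The sample strings are modeled on the examples in the code comments and changelog:

```python
import re

# Bold patterns from the hunk above (2.4.1 and 2.4.3), plus a hypothetical
# intermediate that has the whitespace guard but not the single-run bound.
OLD_BOLD = re.compile(r"(?<!\*)\*\*(.+?)\*\*(?!\*)", flags=re.DOTALL)
GUARD_ONLY_BOLD = re.compile(
    r"(?<!\*)\*\*(?!\s)(.+?)(?<!\s)\*\*(?!\*)", flags=re.DOTALL
)
NEW_BOLD = re.compile(
    r"(?<!\*)\*\*(?!\s)((?:(?!\*\*).)+?)(?<!\s)\*\*(?!\*)", flags=re.DOTALL
)

stray = "閉じ ** が **注意:**です"         # whitespace-flanked literal ** first
dangling = "**oops ** and **70.9%→83.0%**"  # opener without a valid closer

# 2.4.1 pairs the stray ** with the next opener and captures " が ".
print(OLD_BOLD.findall(stray))            # [' が ']
# The whitespace guard alone fixes that case...
print(GUARD_ONLY_BOLD.findall(stray))     # ['注意:']
# ...but a dangling opener still scans ahead and steals the final closer.
print(GUARD_ONLY_BOLD.findall(dangling))  # ['oops ** and **70.9%→83.0%']
# Bounding the body to a single ** run leaves only the well-formed span.
print(NEW_BOLD.findall(dangling))         # ['70.9%→83.0%']
print(NEW_BOLD.findall(stray))            # ['注意:']
```

The guard rejects a `**` that is followed by a space as an opener, and the `(?:(?!\*\*).)+?` body refuses to run past another `**`, so a failed opener can no longer reach the closer of a later span.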
@@ -129,6 +143,72 @@ def _is_han_or_kana_char(char: str) -> bool:
  )
 
 
+ # Code/angle/pipe markers that never appear inside a bare URL in this library's
+ # prose context. (A single ``*`` and CJK letters are intentionally NOT here: a
+ # URL may legally contain a wildcard/query ``*`` and an IRI/IDN may contain CJK
+ # letters, so those must be preserved.)
+ _URL_STOP_CHARS = frozenset("`<>|")
+ # Trailing punctuation stripped from the end of a bare URL. This is exactly
+ # GFM's autolink-extension set (``! ? . , : * _ ~``); a closing paren is handled
+ # separately, with balancing. ``;`` and quotes are intentionally NOT included —
+ # ``;`` is URL-legal in matrix/path parameters and quotes are sub-delimiters, so
+ # trimming them could change the link target rather than just shedding prose.
+ _URL_TRAILING_PUNCTUATION = frozenset("!?.,:*_~")
+ # CJK and full/half-width punctuation/brackets that terminate prose, so a bare
+ # URL is cut here. This is an explicit set rather than the whole U+3000–U+303F
+ # block on purpose: letter-like CJK iteration marks (々 U+3005, 〻 U+303B),
+ # ditto/closure marks (〆 U+3006) and the ideographic number zero (〇 U+3007)
+ # are *excluded* so IRIs such as ``https://ja.wikipedia.org/wiki/人々`` survive.
+ _URL_CJK_BOUNDARY_CHARS = frozenset(
+     "、。〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞・…"  # CJK punctuation & brackets
+     "！？，．：；（）［］｛｝＜＞｜"  # full-width punctuation & brackets
+     "｡｢｣､"  # half-width CJK punctuation & brackets
+ )
+
+
+ def _is_url_boundary_char(char: str) -> bool:
+     """Return True when ``char`` is a hard boundary where a bare URL must stop.
+
+     Only unambiguous prose/markup boundaries qualify: code/angle/pipe markers
+     and CJK/full-width *punctuation* (``、``/``。``/``」``/``）`` …). CJK
+     *letters* (including iteration marks like ``々``) are not a boundary, so
+     IRIs such as ``https://ja.wikipedia.org/wiki/人々`` survive.
+     """
+     return char in _URL_STOP_CHARS or char in _URL_CJK_BOUNDARY_CHARS
+
+
+ def _trim_bare_url(url: str) -> str:
+     """Trim a greedily matched bare URL down to its real extent.
+
+     In CJK writing a URL is usually glued directly to the following text with no
+     whitespace, so the greedy ``[^\\s<]+`` match would otherwise swallow the
+     trailing ``)``/``**``/``。`` and the rest of the sentence. This stops the URL
+     at the first hard boundary or doubled emphasis run (``**``/``~~``) — single
+     ``*`` and CJK letters are preserved — then drops GFM-style trailing
+     punctuation and unbalanced closing parens, so ``https://example.com)**。``
+     becomes ``https://example.com``.
+     """
+     for index, char in enumerate(url):
+         nxt = url[index + 1] if index + 1 < len(url) else ""
+         if _is_url_boundary_char(char) or (char in "*~" and nxt == char):
+             url = url[:index]
+             break
+
+     while url:
+         last = url[-1]
+         if last == ")":
+             if url.count(")") <= url.count("("):
+                 break
+             url = url[:-1]
+             continue
+         if last in _URL_TRAILING_PUNCTUATION:
+             url = url[:-1]
+             continue
+         break
+
+     return url
+
+
  def _nested_code_space_strategy(
      source: str,
      start: int,
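The trimming rules can be exercised in isolation. The following is a condensed sketch of `_trim_bare_url` from the hunk above, not the shipped code: `trim_bare_url` and its constants are illustrative names, the boundary helper is inlined, and the CJK boundary set is abbreviated. It assumes, as the constant's comment says, that the set holds full-width forms such as `（` and `．`, so ASCII `(` and `.` inside a URL are not boundaries:

```python
# Abbreviated stand-ins for the 2.4.3 constants (the real CJK set is larger).
STOP_CHARS = frozenset("`<>|")
CJK_BOUNDARY_CHARS = frozenset("、。「」『』【】・…！？，．：；（）［］")
TRAILING_PUNCTUATION = frozenset("!?.,:*_~")


def trim_bare_url(url: str) -> str:
    # Pass 1: cut at the first hard boundary or doubled ** / ~~ run.
    for index, char in enumerate(url):
        nxt = url[index + 1] if index + 1 < len(url) else ""
        boundary = char in STOP_CHARS or char in CJK_BOUNDARY_CHARS
        if boundary or (char in "*~" and nxt == char):
            url = url[:index]
            break
    # Pass 2: shed GFM trailing punctuation and unbalanced closing parens;
    # a ")" that closes an earlier "(" is kept.
    while url:
        last = url[-1]
        if last == ")" and url.count(")") > url.count("("):
            url = url[:-1]
        elif last in TRAILING_PUNCTUATION:
            url = url[:-1]
        else:
            break
    return url


cases = {
    "https://example.com)**。に句点を直結。": "https://example.com",
    "https://ja.wikipedia.org/wiki/人々。": "https://ja.wikipedia.org/wiki/人々",
    "https://en.wikipedia.org/wiki/Foo_(bar)).": "https://en.wikipedia.org/wiki/Foo_(bar)",
    "https://example.com/a;b=1,": "https://example.com/a;b=1",
    "https://example.com/*.txt**": "https://example.com/*.txt",
}
for raw, expected in cases.items():
    assert trim_bare_url(raw) == expected, (raw, trim_bare_url(raw))
```

Cutting first and stripping second matters: the `**` cut exposes the stray `)` that the balancing pass then removes, while the Wikipedia-style `Foo_(bar)` keeps its balanced paren.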
@@ -582,9 +662,15 @@ def normalize_bare_urls_for_slack_markdown(text: str) -> str:
 
              url_match = BARE_URL_PATTERN.match(chunk, cursor)
              if url_match:
-                 parts.append(f"<{url_match.group(0)}>")
-                 cursor = url_match.end()
-                 continue
+                 url = _trim_bare_url(url_match.group(0))
+                 scheme = re.match(r"https?://", url, re.IGNORECASE)
+                 # Only autolink when something host-like survives the trim;
+                 # a bare ``https://`` followed straight by CJK would otherwise
+                 # produce an empty ``<https://>`` autolink.
+                 if scheme and len(url) > scheme.end():
+                     parts.append(f"<{url}>")
+                     cursor += len(url)
+                     continue
 
              parts.append(char)
              cursor += 1
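The new control flow advances `cursor` by the *trimmed* length, so whatever was cut off is re-scanned as ordinary text, and it falls back to copying a single character when only the scheme survives. A self-contained sketch of that loop follows; `wrap_bare_urls` and the one-line `trim` placeholder are hypothetical stand-ins, and `BARE_URL` is assumed to be the `https?://[^\s<]+` pattern that the 2.4.3 changelog entry attributes to the function:

```python
import re

# Assumed to match BARE_URL_PATTERN (the changelog quotes this regex).
BARE_URL = re.compile(r"https?://[^\s<]+")
SCHEME = re.compile(r"https?://", re.IGNORECASE)


def trim(url: str) -> str:
    # Hypothetical placeholder for _trim_bare_url: cut at the first "。"
    # and drop trailing ASCII full stops.
    return url.split("。", 1)[0].rstrip(".")


def wrap_bare_urls(chunk: str) -> str:
    parts: list[str] = []
    cursor = 0
    while cursor < len(chunk):
        match = BARE_URL.match(chunk, cursor)
        if match:
            url = trim(match.group(0))
            scheme = SCHEME.match(url)
            # Autolink only when something host-like survives the trim.
            if scheme and len(url) > scheme.end():
                parts.append(f"<{url}>")
                cursor += len(url)  # the trimmed tail is re-scanned as text
                continue
        # No usable URL here: copy one character and move on.
        parts.append(chunk[cursor])
        cursor += 1
    return "".join(parts)


print(wrap_bare_urls("詳細は https://example.com。次へ"))  # 詳細は <https://example.com>。次へ
print(wrap_bare_urls("https://。です"))                    # https://。です
```

Falling through to the single-character branch, rather than `continue`-ing without moving `cursor`, is what keeps a scheme-only match from looping forever.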
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: slack-markdown-parser
- Version: 2.4.1
+ Version: 2.4.3
  Summary: Convert LLM Markdown into Slack Block Kit messages
  Author: darkgaldragon
  License-Expression: MIT