slack-markdown-parser 2.4.2__tar.gz → 2.4.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (18)
  1. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/CHANGELOG.md +13 -0
  2. {slack_markdown_parser-2.4.2/slack_markdown_parser.egg-info → slack_markdown_parser-2.4.4}/PKG-INFO +1 -1
  3. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/docs/spec-ja.md +1 -1
  4. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/docs/spec.md +2 -1
  5. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/pyproject.toml +1 -1
  6. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser/__init__.py +1 -1
  7. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser/converter.py +209 -89
  8. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4/slack_markdown_parser.egg-info}/PKG-INFO +1 -1
  9. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/LICENSE +0 -0
  10. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/MANIFEST.in +0 -0
  11. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/README-ja.md +0 -0
  12. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/README.md +0 -0
  13. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/setup.cfg +0 -0
  14. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser/py.typed +0 -0
  15. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser.egg-info/SOURCES.txt +0 -0
  16. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser.egg-info/dependency_links.txt +0 -0
  17. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser.egg-info/requires.txt +0 -0
  18. {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser.egg-info/top_level.txt +0 -0
@@ -6,6 +6,19 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
 
  ## [Unreleased]
 
+ ## [2.4.4] - 2026-06-10
+
+ ### Fixed
+
+ - Stopped a link wrapped entirely in emphasis (`**[text](url)**`, and the `*`/`_`/`__`/`~~`/`***` variants) from rendering as dead, literal text in `rich_text` contexts such as list items and table cells. The emphasis token pattern matched the whole `**[text](url)**` span before the link pattern got a chance, so `_create_rich_text_inline_elements` emitted the inner `[text](url)` as a styled plain-text run and the link was never clickable. When a styled (non-code) token's inner content is itself a complete markdown link, it now becomes a `link` element carrying that style (e.g. a bold link), matching the already-working emphasis-inside-the-brackets form (`[**text**](url)`).
+ - Stopped Slack mention tokens from rendering as literal text inside promoted list items. Since 2.4.0 a simple list is emitted as a `rich_text` block, but the inline builder only tokenized links, code, and emphasis — so a `<#C123>` / `<@U123>` / `<!subteam^S123>` / `<!here>` token in a list item fell through as a plain `text` run. In a `rich_text` block a mention has to be a structured element (`channel`, `user`, `usergroup`, `broadcast`), so Slack showed the raw `<#C123>` rather than a pill. (Prose was unaffected: it stays in a `markdown` block, where Slack resolves the token itself.) These tokens are now converted to the matching rich_text elements, an optional `|label` display suffix is dropped (Slack renders the element from the id), and the plain-text fallback re-emits the canonical `<#C123>` token so a downgraded mrkdwn fallback still links and notifies. The same applies inside table cells: `extract_plain_text_from_table_cell` now delegates inline runs to the shared rich_text downgrade path instead of a separate near-copy, so a mention in a cell also survives into the fallback text rather than vanishing.
+
+ ## [2.4.3] - 2026-05-29
+
+ ### Fixed
+
+ - Stopped bare-URL autolinking from greedily swallowing trailing text. `normalize_bare_urls_for_slack_markdown` matched `https?://[^\s<]+`, so a scheme URL glued directly to following CJK text (e.g. `(https://example.com)**。に句点を直結。`) — common in Japanese, which puts no space after a URL — captured the closing paren, the `**` markers, the CJK punctuation, and the rest of the sentence into one `<…>` autolink, over-extending the link and exposing the literal `**`. The matched URL is now trimmed GFM-style: it stops at a doubled emphasis run (`**`/`~~`), at code/angle/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK punctuation (`、`/`。`/`」` …), and trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, and an unbalanced `)`) is dropped while balanced parentheses are kept. `;` and quotes are kept (URL-legal), and a lone `*` (URL wildcards/queries) and CJK letters (IRIs / Unicode IDN hosts) are preserved.
+
  ## [2.4.2] - 2026-05-29
 
  ### Fixed
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: slack-markdown-parser
- Version: 2.4.2
+ Version: 2.4.4
  Summary: Convert LLM Markdown into Slack Block Kit messages
  Author: darkgaldragon
  License-Expression: MIT
@@ -72,7 +72,7 @@ Slack は 2026-03-06 に `markdown` ブロックの公式ドキュメントを
  ### このパーサーが補正・安定化するもの
 
  - `_..._` / `__...__` を Slack 互換の `*...*` / `**...**` に正規化する
- - bare URL を Slack で安定しやすい `<https://...>` 形式にそろえる
+ - bare URL を Slack で安定しやすい `<https://...>` 形式にそろえる。まず URL を実際の範囲にトリミングする(GFM 風): 二重の強調記号(`**`/`~~`)・コード/山かっこ/パイプ記号(`` ` ``・`<`・`>`・`|`)・CJK / 全角の句読点(`、` `。` `」` `)` `!` …)で停止し、末尾の句読点(GFM の autolink 集合 `! ? . , : * _ ~` と不均衡な `)`)を除外する(`;` と引用符は URL で正当なため保持)。単独の `*`(URL のワイルドカード/クエリ)と CJK の**文字**(反復記号 `々` を含む。IRI / Unicode IDN ホスト。例 `https://ja.wikipedia.org/wiki/人々`)は保持する。これにより、日本語のように URL の直後へ空白なしで CJK 本文が続く場合でも、URL が行末まで(閉じの `**` ごと)貪欲に飲み込んでリンク化する事故を防ぐ。
  - 崩れた Markdown テーブルを補って `table` ブロックへ変換する
  - 意味が明確な単独 Markdown 構文を Slack ネイティブのブロックへ変換する
  - 単独行の画像構文 `![alt](https://...)` → `image`
@@ -72,7 +72,7 @@ Slack still controls when those newer features appear and how they look, so trea
  ### Things this parser corrects or stabilizes
 
  - `_..._` and `__...__` are normalized into Slack-friendly `*...*` and `**...**`
- - Bare URLs are wrapped into Slack-friendly `<https://...>` form before `markdown` block delivery
+ - Bare URLs are wrapped into Slack-friendly `<https://...>` form before `markdown` block delivery. The URL is trimmed to its real extent first (GFM-style): it stops at a doubled emphasis run (`**`/`~~`), at code/angle/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK / full-width punctuation (`、` `。` `」` `)` `!` …); trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, and an unbalanced `)`) is excluded — `;` and quotes are kept because they are URL-legal. A lone `*` (URL wildcards/queries) and CJK *letters* — including iteration marks like `々` (IRIs / Unicode IDN hosts such as `https://ja.wikipedia.org/wiki/人々`) — are preserved. This keeps a scheme URL glued directly to following CJK text — common in Japanese, where no space separates them — from greedily swallowing the rest of the line (including a closing `**`) into the autolink.
  - Malformed Markdown tables are repaired before `table` block generation
  - Unambiguous standalone Markdown constructs are promoted into native Slack blocks:
  - standalone image syntax `![alt](https://...)` to `image`
@@ -81,6 +81,7 @@ Slack still controls when those newer features appear and how they look, so trea
  - simple one-level quotes to `rich_text_quote`
  - simple bullet and ordered lists to `rich_text_list`
  - Lists are promoted only when the list starts at the beginning of the text region or after a blank line, each non-blank line in the run is a list item, the list does not use ambiguous 1-3-space nested indentation, the item text does not rely on Markdown backslash escapes, and the run is not followed by an indented continuation paragraph.
+ - Slack mention tokens inside a promoted list item are converted to their structured `rich_text` elements — `<@U…>`/`<@W…>` to `user`, `<#C…>`/`<#G…>` to `channel`, `<!subteam^S…>` to `usergroup`, and `<!here>`/`<!channel>`/`<!everyone>` to `broadcast` — since a `rich_text` block does not resolve a raw token. An optional `|label` display suffix is dropped (Slack renders the element from the id).
  - Table-like rows inside fenced code blocks are kept out of table parsing
  - Internal blank lines can optionally be rewritten into placeholder lines so Slack keeps visible paragraph separation
  - Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "slack-markdown-parser"
- version = "2.4.2"
+ version = "2.4.4"
  description = "Convert LLM Markdown into Slack Block Kit messages"
  readme = "README.md"
  requires-python = ">=3.10"
@@ -1,6 +1,6 @@
  """slack-markdown-parser public package API."""
 
- __version__ = "2.4.2"
+ __version__ = "2.4.4"
  __license__ = "MIT"
 
  from .converter import (
@@ -84,6 +84,12 @@ LOOSE_TABLE_SEPARATOR_PATTERN = re.compile(
  TABLE_TOKEN_PATTERN = re.compile(
      r"\[(?P<markdown_label>[^\]\n]+)\]\((?P<markdown_url>https?://[^\s)]+)\)"
      r"|<(?P<angle_url>https?://[^>\s|]+)(?:\|(?P<angle_label>[^>\n]+))?>"
+     # Slack mention tokens: user (<@U…>/<@W…>), channel (<#C…>/<#G…>), user
+     # group (<!subteam^S…>), and broadcast (<!here>/<!channel>/<!everyone>).
+     # An optional ``|label`` is the human-readable display the author saw; the
+     # rich_text element is rendered by Slack from the id, so the label is dropped.
+     r"|<(?P<mention>@[UW][A-Z0-9]+|#[CG][A-Z0-9]+|!subteam\^[A-Z0-9]+"
+     r"|!(?:here|channel|everyone))(?:\|[^>\n]+)?>"
      r"|(?P<token>"
      r"(?P<code>(?P<code_delimiter>`+)(?P<code_text>[^\n]+?)(?P=code_delimiter))"
      r"|~~[^~]+~~"
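The mention alternative added above can be tried in isolation. The following is a minimal sketch, assuming the alternative is lifted verbatim out of `TABLE_TOKEN_PATTERN` into a standalone pattern; `MENTION_PATTERN`, `find_mentions`, and the sample ids are hypothetical names used only for illustration and are not part of the package.

```python
import re

# Assumption: a standalone copy of the mention alternative from the hunk above.
# In the package it is one branch of the larger TABLE_TOKEN_PATTERN.
MENTION_PATTERN = re.compile(
    r"<(?P<mention>@[UW][A-Z0-9]+|#[CG][A-Z0-9]+|!subteam\^[A-Z0-9]+"
    r"|!(?:here|channel|everyone))(?:\|[^>\n]+)?>"
)


def find_mentions(text: str) -> list[str]:
    """Return each token interior, without angle brackets or ``|label``."""
    return [match.group("mention") for match in MENTION_PATTERN.finditer(text)]


sample = (
    "ping <@U123ABC|alice> in <#C456|general>, "
    "cc <!subteam^S789> and <!here>; skip <foo>"
)
print(find_mentions(sample))
```

The optional `|label` group is non-capturing, so the label never reaches the caller, which matches the comment in the hunk that the label is dropped. An unsupported token such as `<foo>` does not match this branch.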
@@ -143,6 +149,72 @@ def _is_han_or_kana_char(char: str) -> bool:
      )
 
 
+ # Code/angle/pipe markers that never appear inside a bare URL in this library's
+ # prose context. (A single ``*`` and CJK letters are intentionally NOT here: a
+ # URL may legally contain a wildcard/query ``*`` and an IRI/IDN may contain CJK
+ # letters, so those must be preserved.)
+ _URL_STOP_CHARS = frozenset("`<>|")
+ # Trailing punctuation stripped from the end of a bare URL. This is exactly
+ # GFM's autolink-extension set (``! ? . , : * _ ~``); a closing paren is handled
+ # separately, with balancing. ``;`` and quotes are intentionally NOT included —
+ # ``;`` is URL-legal in matrix/path parameters and quotes are sub-delimiters, so
+ # trimming them could change the link target rather than just shedding prose.
+ _URL_TRAILING_PUNCTUATION = frozenset("!?.,:*_~")
+ # CJK and full/half-width punctuation/brackets that terminate prose, so a bare
+ # URL is cut here. This is an explicit set rather than the whole U+3000–U+303F
+ # block on purpose: letter-like CJK iteration marks (々 U+3005, 〻 U+303B),
+ # ditto/closure marks (〆 U+3006) and the ideographic number zero (〇 U+3007)
+ # are *excluded* so IRIs such as ``https://ja.wikipedia.org/wiki/人々`` survive.
+ _URL_CJK_BOUNDARY_CHARS = frozenset(
+     "、。〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞・…"  # CJK punctuation & brackets
+     "!?,.:;()[]{}<>|"  # full-width punctuation & brackets
+     "。「」、"  # half-width CJK punctuation & brackets
+ )
+
+
+ def _is_url_boundary_char(char: str) -> bool:
+     """Return True when ``char`` is a hard boundary where a bare URL must stop.
+
+     Only unambiguous prose/markup boundaries qualify: code/angle/pipe markers
+     and CJK/full-width *punctuation* (``、``/``。``/``」``/``)`` …). CJK
+     *letters* (including iteration marks like ``々``) are not a boundary, so
+     IRIs such as ``https://ja.wikipedia.org/wiki/人々`` survive.
+     """
+     return char in _URL_STOP_CHARS or char in _URL_CJK_BOUNDARY_CHARS
+
+
+ def _trim_bare_url(url: str) -> str:
+     """Trim a greedily matched bare URL down to its real extent.
+
+     In CJK writing a URL is usually glued directly to the following text with no
+     whitespace, so the greedy ``[^\\s<]+`` match would otherwise swallow the
+     trailing ``)``/``**``/``。`` and the rest of the sentence. This stops the URL
+     at the first hard boundary or doubled emphasis run (``**``/``~~``) — single
+     ``*`` and CJK letters are preserved — then drops GFM-style trailing
+     punctuation and unbalanced closing parens, so ``https://example.com)**。``
+     becomes ``https://example.com``.
+     """
+     for index, char in enumerate(url):
+         nxt = url[index + 1] if index + 1 < len(url) else ""
+         if _is_url_boundary_char(char) or (char in "*~" and nxt == char):
+             url = url[:index]
+             break
+
+     while url:
+         last = url[-1]
+         if last == ")":
+             if url.count(")") <= url.count("("):
+                 break
+             url = url[:-1]
+             continue
+         if last in _URL_TRAILING_PUNCTUATION:
+             url = url[:-1]
+             continue
+         break
+
+     return url
+
+
  def _nested_code_space_strategy(
      source: str,
      start: int,
@@ -596,9 +668,15 @@ def normalize_bare_urls_for_slack_markdown(text: str) -> str:
 
              url_match = BARE_URL_PATTERN.match(chunk, cursor)
              if url_match:
-                 parts.append(f"<{url_match.group(0)}>")
-                 cursor = url_match.end()
-                 continue
+                 url = _trim_bare_url(url_match.group(0))
+                 scheme = re.match(r"https?://", url, re.IGNORECASE)
+                 # Only autolink when something host-like survives the trim;
+                 # a bare ``https://`` followed straight by CJK would otherwise
+                 # produce an empty ``<https://>`` autolink.
+                 if scheme and len(url) > scheme.end():
+                     parts.append(f"<{url}>")
+                     cursor += len(url)
+                     continue
 
              parts.append(char)
              cursor += 1
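The guard added in this hunk only wraps a URL when something follows the scheme. A minimal sketch of that check on its own, assuming the input has already been trimmed; `wrap_trimmed_url` is a hypothetical helper name, and returning `None` stands in for the original's fall-through to the character-by-character path.

```python
import re
from typing import Optional


def wrap_trimmed_url(url: str) -> Optional[str]:
    """Return ``<url>`` when a host-like remainder follows the scheme."""
    scheme = re.match(r"https?://", url, re.IGNORECASE)
    if scheme and len(url) > scheme.end():
        return f"<{url}>"
    # Nothing survives after the scheme: the caller should emit the text as-is.
    return None


print(wrap_trimmed_url("https://example.com"))
print(wrap_trimmed_url("https://"))
```

In the hunk itself, the cursor advances by `len(url)` (the trimmed length) rather than to `url_match.end()`, so the text that was trimmed off is processed again as ordinary prose.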
@@ -1245,6 +1323,40 @@ def looks_like_markdown_table(text: str) -> bool:
      return table_like_lines >= 2
 
 
+ def _slack_mention_element(mention: str) -> dict[str, Any]:
+     """Map a Slack mention token body to its rich_text element.
+
+     ``mention`` is the token interior without the angle brackets or ``|label``
+     (e.g. ``@U123``, ``#C123``, ``!subteam^S123``, ``!here``). Slack renders
+     these from the id alone, so no display text is carried.
+     """
+     sigil, body = mention[0], mention[1:]
+     if sigil == "@":
+         return {"type": "user", "user_id": body}
+     if sigil == "#":
+         return {"type": "channel", "channel_id": body}
+     if body.startswith("subteam^"):
+         return {"type": "usergroup", "usergroup_id": body[len("subteam^") :]}
+     return {"type": "broadcast", "range": body}
+
+
+ def _slack_mention_element_to_token(element: dict[str, Any]) -> str:
+     """Inverse of :func:`_slack_mention_element` for plain-text fallbacks.
+
+     Emitting the canonical ``<#C…>`` / ``<@U…>`` token (rather than an empty
+     string) keeps the mention live when a rich_text block is downgraded to a
+     mrkdwn fallback, so it still links and notifies.
+     """
+     element_type = element.get("type")
+     if element_type == "user":
+         return f"<@{element.get('user_id', '')}>"
+     if element_type == "channel":
+         return f"<#{element.get('channel_id', '')}>"
+     if element_type == "usergroup":
+         return f"<!subteam^{element.get('usergroup_id', '')}>"
+     return f"<!{element.get('range', '')}>"
+
+
  def _create_rich_text_inline_elements(
      text: str, *, empty_text: str = ""
  ) -> list[dict[str, Any]]:
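The two helpers above are inverses of each other for canonical tokens. A short round-trip sketch, assuming standalone copies of both functions with the leading underscores dropped; `mention_to_element`, `element_to_token`, and the ids are illustrative names, not the package's public API.

```python
from typing import Any


def mention_to_element(mention: str) -> dict[str, Any]:
    """Map a token interior such as ``@U123`` to a rich_text element."""
    sigil, body = mention[0], mention[1:]
    if sigil == "@":
        return {"type": "user", "user_id": body}
    if sigil == "#":
        return {"type": "channel", "channel_id": body}
    if body.startswith("subteam^"):
        return {"type": "usergroup", "usergroup_id": body[len("subteam^"):]}
    return {"type": "broadcast", "range": body}


def element_to_token(element: dict[str, Any]) -> str:
    """Rebuild the canonical angle-bracket token for a plain-text fallback."""
    element_type = element.get("type")
    if element_type == "user":
        return f"<@{element.get('user_id', '')}>"
    if element_type == "channel":
        return f"<#{element.get('channel_id', '')}>"
    if element_type == "usergroup":
        return f"<!subteam^{element.get('usergroup_id', '')}>"
    return f"<!{element.get('range', '')}>"


for interior in ("@U123", "#C123", "!subteam^S123", "!here"):
    element = mention_to_element(interior)
    print(interior, "->", element, "->", element_to_token(element))
```

Because the `|label` suffix is discarded before `mention_to_element` is called, the rebuilt token is always the label-free form, for example `<#C123>` even when the source text was `<#C123|general>`.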
@@ -1268,6 +1380,7 @@ def _create_rich_text_inline_elements(
          markdown_url = match.group("markdown_url")
          angle_url = match.group("angle_url")
          angle_label = match.group("angle_label")
+         mention = match.group("mention")
          token = match.group("token") or ""
 
          if markdown_label and markdown_url:
@@ -1278,6 +1391,8 @@ def _create_rich_text_inline_elements(
                  "url": angle_url,
                  "text": angle_label or angle_url,
              }
+         elif mention:
+             element = _slack_mention_element(mention)
          else:
              style: dict[str, bool] = {}
              content = token
@@ -1296,9 +1411,26 @@ def _create_rich_text_inline_elements(
                  content = content[1:-1]
                  style["italic"] = True
 
-             element = {"type": "text", "text": content}
-             if style:
-                 element["style"] = style
+             # A link wrapped entirely in emphasis (``**[text](url)**``) is matched
+             # by the emphasis branch above, not the link branch, so its inner
+             # content is a bare ``[text](url)``. Emit a styled ``link`` element
+             # rather than a literal text run, otherwise the link is dead in Slack.
+             inner_link = (
+                 TABLE_TOKEN_PATTERN.fullmatch(content)
+                 if style and not style.get("code")
+                 else None
+             )
+             if inner_link is not None and inner_link.group("markdown_url"):
+                 element = {
+                     "type": "link",
+                     "url": inner_link.group("markdown_url"),
+                     "text": inner_link.group("markdown_label"),
+                     "style": style,
+                 }
+             else:
+                 element = {"type": "text", "text": content}
+                 if style:
+                     element["style"] = style
          elements.append(element)
          last_index = match.end()
 
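The styled-link branch above can be reduced to a small function for experimentation. This is a sketch, not the package's code: it assumes the emphasis markers have already been stripped and the style dict already built, and it uses only the Markdown-link alternative of the token pattern. `LINK_PATTERN` and `styled_element` are hypothetical names.

```python
import re
from typing import Any

# Assumption: just the ``[label](url)`` alternative, copied from the first
# branch of TABLE_TOKEN_PATTERN with shorter group names.
LINK_PATTERN = re.compile(r"\[(?P<label>[^\]\n]+)\]\((?P<url>https?://[^\s)]+)\)")


def styled_element(content: str, style: dict[str, bool]) -> dict[str, Any]:
    """Build a link element when styled, non-code content is one whole link."""
    inner = LINK_PATTERN.fullmatch(content) if style and not style.get("code") else None
    if inner is not None:
        return {
            "type": "link",
            "url": inner.group("url"),
            "text": inner.group("label"),
            "style": style,
        }
    element: dict[str, Any] = {"type": "text", "text": content}
    if style:
        element["style"] = style
    return element


print(styled_element("[docs](https://example.com)", {"bold": True}))
print(styled_element("[docs](https://example.com)", {"code": True}))
```

`fullmatch` is what restricts the promotion to content that is a link and nothing else; emphasis around mixed text such as `see [docs](https://example.com)` stays a styled text run. Code spans are excluded so that link syntax inside backticks is shown literally.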
@@ -1338,12 +1470,11 @@ def extract_plain_text_from_table_cell(cell: dict[str, Any]) -> str:
          if not isinstance(element, dict):
              continue
          if element.get("type") == "rich_text_section":
-             for child in element.get("elements", []):
-                 if isinstance(child, dict):
-                     if child.get("type") == "link":
-                         texts.append(str(child.get("text") or child.get("url", "")))
-                     else:
-                         texts.append(child.get("text", ""))
+             texts.append(
+                 _rich_text_inline_elements_to_plain_text(
+                     element.get("elements", [])
+                 )
+             )
          elif "text" in element:
              texts.append(str(element.get("text", "")))
      return "".join(texts)
@@ -1410,6 +1541,8 @@ def _rich_text_inline_elements_to_plain_text(elements: list[dict[str, Any]]) ->
          element_type = element.get("type")
          if element_type == "link":
              texts.append(str(element.get("text") or element.get("url", "")))
+         elif element_type in {"user", "channel", "usergroup", "broadcast"}:
+             texts.append(_slack_mention_element_to_token(element))
          else:
              texts.append(str(element.get("text", "")))
      return "".join(texts)
@@ -1945,44 +2078,73 @@ def convert_markdown_to_slack_payloads(
      return payloads
 
 
- def blocks_to_plain_text(blocks: list[dict[str, Any]]) -> str:
-     """Build plain text representation from Slack blocks."""
+ def _markdown_block_to_plain_text(block: dict[str, Any]) -> str:
+     """Downgrade a ``markdown`` block, preferring the build-time annotation."""
+     text = getattr(block, "_plain_text", None) or ""
+     if text:
+         return str(text)
+     raw_text = block.get("text", "")
+     if not raw_text:
+         return ""
+     return _normalize_markdown_block_plain_text(
+         _strip_synthetic_blank_line_placeholders(
+             _strip_synthetic_spaces_from_plain_text(
+                 strip_zero_width_spaces(raw_text),
+                 getattr(block, "_synthetic_space_indices", None),
+             ),
+             getattr(block, "_synthetic_blank_line_indices", None),
+         )
+     )
+
+
+ def _blocks_to_downgrade_parts(
+     blocks: list[dict[str, Any]], *, fallback: bool
+ ) -> list[str]:
+     """Shared block walker behind :func:`blocks_to_plain_text` and
+     :func:`build_fallback_text_from_blocks`.
+
+     The two public functions intentionally differ in a few policies, kept
+     explicit on the ``fallback`` flag: fallback keeps table cells verbatim
+     (empty cells preserve column alignment) and emits a whole table as one
+     part, while the plain-text view strips zero-width spaces, drops empty
+     cells, emits one part per row, and surfaces ``text`` from unknown blocks.
+     """
      parts: list[str] = []
 
      for block in blocks or []:
-         block_type = block.get("type") if isinstance(block, dict) else None
+         if not isinstance(block, dict):
+             continue
+         block_type = block.get("type")
 
          if block_type == "markdown":
-             text = getattr(block, "_plain_text", None) or ""
-             if not text:
-                 raw_text = block.get("text", "")
-                 if raw_text:
-                     text = _normalize_markdown_block_plain_text(
-                         _strip_synthetic_blank_line_placeholders(
-                             _strip_synthetic_spaces_from_plain_text(
-                                 strip_zero_width_spaces(raw_text),
-                                 getattr(block, "_synthetic_space_indices", None),
-                             ),
-                             getattr(block, "_synthetic_blank_line_indices", None),
-                         )
-                     )
-             if text:
+             text = _markdown_block_to_plain_text(block)
+             if text.strip() if fallback else text:
                  parts.append(text)
          elif block_type == "table":
-             rows = block.get("rows") or []
-             for row in rows:
-                 cell_texts: list[str] = []
+             row_texts: list[str] = []
+             for row in block.get("rows") or []:
                  if not isinstance(row, list):
                      continue
-                 for cell in row:
-                     cell_text = extract_plain_text_from_table_cell(cell)
-                     if cell_text:
-                         cell_texts.append(strip_zero_width_spaces(cell_text))
+                 if fallback:
+                     cell_texts = [
+                         extract_plain_text_from_table_cell(cell) for cell in row
+                     ]
+                 else:
+                     cell_texts = []
+                     for cell in row:
+                         cell_text = extract_plain_text_from_table_cell(cell)
+                         if cell_text:
+                             cell_texts.append(strip_zero_width_spaces(cell_text))
                  if cell_texts:
-                     parts.append(" | ".join(cell_texts))
+                     row_texts.append(" | ".join(cell_texts))
+             if fallback:
+                 if row_texts:
+                     parts.append("\n".join(row_texts))
+             else:
+                 parts.extend(row_texts)
          elif block_type == "rich_text":
              text = _rich_text_block_to_plain_text(block)
-             if text:
+             if text.strip() if fallback else text:
                  parts.append(text)
          elif block_type == "header":
              text = block.get("text", {})
@@ -1998,66 +2160,24 @@ def blocks_to_plain_text(blocks: list[dict[str, Any]]) -> str:
                  parts.append(image_text)
          elif block_type == "divider":
              parts.append(getattr(block, "_plain_text", None) or "---")
-         elif isinstance(block, dict):
+         elif not fallback:
              text = block.get("text", "")
              if text:
                  parts.append(str(text))
 
+     return parts
+
+
+ def blocks_to_plain_text(blocks: list[dict[str, Any]]) -> str:
+     """Build plain text representation from Slack blocks."""
+     parts = _blocks_to_downgrade_parts(blocks, fallback=False)
      return "\n".join([p for p in parts if p]).strip()
 
 
  def build_fallback_text_from_blocks(blocks: list[dict[str, Any]]) -> str:
      """Build Slack fallback text from block structure."""
-     plain_parts: list[str] = []
-
-     for block in blocks or []:
-         if not isinstance(block, dict):
-             continue
-
-         if block.get("type") == "markdown":
-             text = getattr(block, "_plain_text", None) or ""
-             if not text:
-                 text = _normalize_markdown_block_plain_text(
-                     _strip_synthetic_blank_line_placeholders(
-                         _strip_synthetic_spaces_from_plain_text(
-                             strip_zero_width_spaces(block.get("text", "")),
-                             getattr(block, "_synthetic_space_indices", None),
-                         ),
-                         getattr(block, "_synthetic_blank_line_indices", None),
-                     ),
-                 )
-             if text.strip():
-                 plain_parts.append(text)
-         elif block.get("type") == "table":
-             table_lines: list[str] = []
-             for row in block.get("rows", []):
-                 if not isinstance(row, list):
-                     continue
-                 cells = [extract_plain_text_from_table_cell(cell) for cell in row]
-                 if cells:
-                     table_lines.append(" | ".join(cells))
-             if table_lines:
-                 plain_parts.append("\n".join(table_lines))
-         elif block.get("type") == "rich_text":
-             text = _rich_text_block_to_plain_text(block)
-             if text.strip():
-                 plain_parts.append(text)
-         elif block.get("type") == "header":
-             text = block.get("text", {})
-             if isinstance(text, dict) and text.get("text"):
-                 plain_parts.append(str(text.get("text", "")))
-         elif block.get("type") == "image":
-             alt_text = str(block.get("alt_text", "")).strip()
-             image_url = str(block.get("image_url", "")).strip()
-             image_text = alt_text or image_url
-             if alt_text and image_url:
-                 image_text = f"{alt_text} ({image_url})"
-             if image_text:
-                 plain_parts.append(image_text)
-         elif block.get("type") == "divider":
-             plain_parts.append(getattr(block, "_plain_text", None) or "---")
-
-     return "\n\n".join([part for part in plain_parts if part.strip()])
+     parts = _blocks_to_downgrade_parts(blocks, fallback=True)
+     return "\n\n".join([part for part in parts if part.strip()])
 
 
  # Backward-compatible helper retained for existing imports.
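The refactor keeps the two table policies behind one `fallback` flag. The sketch below isolates only that table branch so the difference is easy to see. It assumes each cell has already been reduced to a plain string (the package extracts text from cell dicts first) and it omits the zero-width-space stripping of the plain-text view; `table_rows_to_parts` is a hypothetical name.

```python
def table_rows_to_parts(rows: list[list[str]], *, fallback: bool) -> list[str]:
    """Join table rows the way the shared walker's table branch does."""
    row_texts: list[str] = []
    for row in rows:
        if fallback:
            # Fallback keeps every cell, so empty cells hold their column.
            cell_texts = list(row)
        else:
            # Plain-text view drops empty cells.
            cell_texts = [cell for cell in row if cell]
        if cell_texts:
            row_texts.append(" | ".join(cell_texts))
    if fallback:
        # One part for the whole table.
        return ["\n".join(row_texts)] if row_texts else []
    # One part per row.
    return row_texts


rows = [["a", "", "c"], ["d", "e", "f"]]
print(table_rows_to_parts(rows, fallback=True))
print(table_rows_to_parts(rows, fallback=False))
```

The part granularity matters because the two public functions join differently: `blocks_to_plain_text` joins parts with a single newline, while `build_fallback_text_from_blocks` joins them with a blank line, so emitting a table as one part keeps its rows together in the fallback text.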
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: slack-markdown-parser
- Version: 2.4.2
+ Version: 2.4.4
  Summary: Convert LLM Markdown into Slack Block Kit messages
  Author: darkgaldragon
  License-Expression: MIT