slack-markdown-parser 2.4.2__tar.gz → 2.4.4__tar.gz
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/CHANGELOG.md +13 -0
- {slack_markdown_parser-2.4.2/slack_markdown_parser.egg-info → slack_markdown_parser-2.4.4}/PKG-INFO +1 -1
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/docs/spec-ja.md +1 -1
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/docs/spec.md +2 -1
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/pyproject.toml +1 -1
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser/__init__.py +1 -1
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser/converter.py +209 -89
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4/slack_markdown_parser.egg-info}/PKG-INFO +1 -1
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/LICENSE +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/MANIFEST.in +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/README-ja.md +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/README.md +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/setup.cfg +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser/py.typed +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser.egg-info/SOURCES.txt +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser.egg-info/dependency_links.txt +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser.egg-info/requires.txt +0 -0
- {slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser.egg-info/top_level.txt +0 -0
|
@@ -6,6 +6,19 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
|
|
|
6
6
|
|
|
7
7
|
## [Unreleased]
|
|
8
8
|
|
|
9
|
+
## [2.4.4] - 2026-06-10
|
|
10
|
+
|
|
11
|
+
### Fixed
|
|
12
|
+
|
|
13
|
+
- Stopped a link wrapped entirely in emphasis (`**[text](url)**`, and the `*`/`_`/`__`/`~~`/`***` variants) from rendering as dead, literal text in `rich_text` contexts such as list items and table cells. The emphasis token pattern matched the whole `**[text](url)**` span before the link pattern got a chance, so `_create_rich_text_inline_elements` emitted the inner `[text](url)` as a styled plain-text run and the link was never clickable. When a styled (non-code) token's inner content is itself a complete markdown link, it now becomes a `link` element carrying that style (e.g. a bold link), matching the already-working emphasis-inside-the-brackets form (`[**text**](url)`).
|
|
14
|
+
- Stopped Slack mention tokens from rendering as literal text inside promoted list items. Since 2.4.0 a simple list is emitted as a `rich_text` block, but the inline builder only tokenized links, code, and emphasis — so a `<#C123>` / `<@U123>` / `<!subteam^S123>` / `<!here>` token in a list item fell through as a plain `text` run. In a `rich_text` block a mention has to be a structured element (`channel`, `user`, `usergroup`, `broadcast`), so Slack showed the raw `<#C123>` rather than a pill. (Prose was unaffected: it stays in a `markdown` block, where Slack resolves the token itself.) These tokens are now converted to the matching rich_text elements, an optional `|label` display suffix is dropped (Slack renders the element from the id), and the plain-text fallback re-emits the canonical `<#C123>` token so a downgraded mrkdwn fallback still links and notifies. The same applies inside table cells: `extract_plain_text_from_table_cell` now delegates inline runs to the shared rich_text downgrade path instead of a separate near-copy, so a mention in a cell also survives into the fallback text rather than vanishing.
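The mention handling described in the 2.4.4 entry can be sketched as a small standalone tokenizer. This is only an illustration modelled on the pattern and helpers visible in the converter.py hunks of this diff. The names `MENTION_PATTERN`, `mention_to_element`, `element_to_token`, `tokenize_mentions`, and `elements_to_plain_text` are hypothetical, and the sketch ignores links, code, and emphasis, which the real inline builder also handles.

```python
import re
from typing import Any

# Same alternation as the mention branch added to TABLE_TOKEN_PATTERN in the diff;
# an optional "|label" suffix is matched but not captured, so it is dropped.
MENTION_PATTERN = re.compile(
    r"<(?P<mention>@[UW][A-Z0-9]+|#[CG][A-Z0-9]+|!subteam\^[A-Z0-9]+"
    r"|!(?:here|channel|everyone))(?:\|[^>\n]+)?>"
)


def mention_to_element(mention: str) -> dict[str, Any]:
    """Map a token interior such as "@U123" to a rich_text element."""
    sigil, body = mention[0], mention[1:]
    if sigil == "@":
        return {"type": "user", "user_id": body}
    if sigil == "#":
        return {"type": "channel", "channel_id": body}
    if body.startswith("subteam^"):
        return {"type": "usergroup", "usergroup_id": body[len("subteam^"):]}
    return {"type": "broadcast", "range": body}


def element_to_token(element: dict[str, Any]) -> str:
    """Re-emit the canonical token so a plain-text fallback keeps the mention."""
    kind = element.get("type")
    if kind == "user":
        return f"<@{element.get('user_id', '')}>"
    if kind == "channel":
        return f"<#{element.get('channel_id', '')}>"
    if kind == "usergroup":
        return f"<!subteam^{element.get('usergroup_id', '')}>"
    return f"<!{element.get('range', '')}>"


def tokenize_mentions(text: str) -> list[dict[str, Any]]:
    """Split text into plain text runs and structured mention elements."""
    elements: list[dict[str, Any]] = []
    last_index = 0
    for match in MENTION_PATTERN.finditer(text):
        if match.start() > last_index:
            elements.append({"type": "text", "text": text[last_index:match.start()]})
        elements.append(mention_to_element(match.group("mention")))
        last_index = match.end()
    if last_index < len(text):
        elements.append({"type": "text", "text": text[last_index:]})
    return elements


def elements_to_plain_text(elements: list[dict[str, Any]]) -> str:
    """Downgrade elements to text, turning mentions back into tokens."""
    mention_types = {"user", "channel", "usergroup", "broadcast"}
    return "".join(
        element_to_token(e) if e.get("type") in mention_types else str(e.get("text", ""))
        for e in elements
    )
```

With this sketch, `tokenize_mentions("ask <@U123|alice> in <#C456>")` yields a `user` and a `channel` element between plain text runs, and `elements_to_plain_text` turns them back into `<@U123>` and `<#C456>` without the label.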
|
|
15
|
+
|
|
16
|
+
## [2.4.3] - 2026-05-29
|
|
17
|
+
|
|
18
|
+
### Fixed
|
|
19
|
+
|
|
20
|
+
- Stopped bare-URL autolinking from greedily swallowing trailing text. `normalize_bare_urls_for_slack_markdown` matched `https?://[^\s<]+`, so a scheme URL glued directly to following CJK text (e.g. `(https://example.com)**。に句点を直結。`) — common in Japanese, which puts no space after a URL — captured the closing paren, the `**` markers, the CJK punctuation, and the rest of the sentence into one `<…>` autolink, over-extending the link and exposing the literal `**`. The matched URL is now trimmed GFM-style: it stops at a doubled emphasis run (`**`/`~~`), at code/angle/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK punctuation (`、`/`。`/`」` …), and trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, and an unbalanced `)`) is dropped while balanced parentheses are kept. `;` and quotes are kept (URL-legal), and a lone `*` (URL wildcards/queries) and CJK letters (IRIs / Unicode IDN hosts) are preserved.
|
|
21
|
+
|
|
9
22
|
## [2.4.2] - 2026-05-29
|
|
10
23
|
|
|
11
24
|
### Fixed
|
|
@@ -72,7 +72,7 @@ Slack は 2026-03-06 に `markdown` ブロックの公式ドキュメントを
|
|
|
72
72
|
### このパーサーが補正・安定化するもの
|
|
73
73
|
|
|
74
74
|
- `_..._` / `__...__` を Slack 互換の `*...*` / `**...**` に正規化する
|
|
75
|
-
- bare URL を Slack で安定しやすい `<https://...>`
|
|
75
|
+
- bare URL を Slack で安定しやすい `<https://...>` 形式にそろえる。まず URL を実際の範囲にトリミングする(GFM 風): 二重の強調記号(`**`/`~~`)・コード/山かっこ/パイプ記号(`` ` ``・`<`・`>`・`|`)・CJK / 全角の句読点(`、` `。` `」` `)` `!` …)で停止し、末尾の句読点(GFM の autolink 集合 `! ? . , : * _ ~` と不均衡な `)`)を除外する(`;` と引用符は URL で正当なため保持)。単独の `*`(URL のワイルドカード/クエリ)と CJK の**文字**(反復記号 `々` を含む。IRI / Unicode IDN ホスト。例 `https://ja.wikipedia.org/wiki/人々`)は保持する。これにより、日本語のように URL の直後へ空白なしで CJK 本文が続く場合でも、URL が行末まで(閉じの `**` ごと)貪欲に飲み込んでリンク化する事故を防ぐ。
|
|
76
76
|
- 崩れた Markdown テーブルを補って `table` ブロックへ変換する
|
|
77
77
|
- 意味が明確な単独 Markdown 構文を Slack ネイティブのブロックへ変換する
|
|
78
78
|
- 単独行の画像構文 `` → `image`
|
|
@@ -72,7 +72,7 @@ Slack still controls when those newer features appear and how they look, so trea
|
|
|
72
72
|
### Things this parser corrects or stabilizes
|
|
73
73
|
|
|
74
74
|
- `_..._` and `__...__` are normalized into Slack-friendly `*...*` and `**...**`
|
|
75
|
-
- Bare URLs are wrapped into Slack-friendly `<https://...>` form before `markdown` block delivery
|
|
75
|
+
- Bare URLs are wrapped into Slack-friendly `<https://...>` form before `markdown` block delivery. The URL is trimmed to its real extent first (GFM-style): it stops at a doubled emphasis run (`**`/`~~`), at code/angle/pipe markers (`` ` ``, `<`, `>`, `|`), and at CJK / full-width punctuation (`、` `。` `」` `)` `!` …); trailing punctuation (GFM's autolink set `! ? . , : * _ ~`, and an unbalanced `)`) is excluded — `;` and quotes are kept because they are URL-legal. A lone `*` (URL wildcards/queries) and CJK *letters* — including iteration marks like `々` (IRIs / Unicode IDN hosts such as `https://ja.wikipedia.org/wiki/人々`) — are preserved. This keeps a scheme URL glued directly to following CJK text — common in Japanese, where no space separates them — from greedily swallowing the rest of the line (including a closing `**`) into the autolink.
|
|
76
76
|
- Malformed Markdown tables are repaired before `table` block generation
|
|
77
77
|
- Unambiguous standalone Markdown constructs are promoted into native Slack blocks:
|
|
78
78
|
- standalone image syntax `` to `image`
|
|
@@ -81,6 +81,7 @@ Slack still controls when those newer features appear and how they look, so trea
|
|
|
81
81
|
- simple one-level quotes to `rich_text_quote`
|
|
82
82
|
- simple bullet and ordered lists to `rich_text_list`
|
|
83
83
|
- Lists are promoted only when the list starts at the beginning of the text region or after a blank line, each non-blank line in the run is a list item, the list does not use ambiguous 1-3-space nested indentation, the item text does not rely on Markdown backslash escapes, and the run is not followed by an indented continuation paragraph.
|
|
84
|
+
- Slack mention tokens inside a promoted list item are converted to their structured `rich_text` elements — `<@U…>`/`<@W…>` to `user`, `<#C…>`/`<#G…>` to `channel`, `<!subteam^S…>` to `usergroup`, and `<!here>`/`<!channel>`/`<!everyone>` to `broadcast` — since a `rich_text` block does not resolve a raw token. An optional `|label` display suffix is dropped (Slack renders the element from the id).
|
|
84
85
|
- Table-like rows inside fenced code blocks are kept out of table parsing
|
|
85
86
|
- Internal blank lines can optionally be rewritten into placeholder lines so Slack keeps visible paragraph separation
|
|
86
87
|
- Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized
|
{slack_markdown_parser-2.4.2 → slack_markdown_parser-2.4.4}/slack_markdown_parser/converter.py
RENAMED
|
@@ -84,6 +84,12 @@ LOOSE_TABLE_SEPARATOR_PATTERN = re.compile(
|
|
|
84
84
|
TABLE_TOKEN_PATTERN = re.compile(
|
|
85
85
|
r"\[(?P<markdown_label>[^\]\n]+)\]\((?P<markdown_url>https?://[^\s)]+)\)"
|
|
86
86
|
r"|<(?P<angle_url>https?://[^>\s|]+)(?:\|(?P<angle_label>[^>\n]+))?>"
|
|
87
|
+
# Slack mention tokens: user (<@U…>/<@W…>), channel (<#C…>/<#G…>), user
|
|
88
|
+
# group (<!subteam^S…>), and broadcast (<!here>/<!channel>/<!everyone>).
|
|
89
|
+
# An optional ``|label`` is the human-readable display the author saw; the
|
|
90
|
+
# rich_text element is rendered by Slack from the id, so the label is dropped.
|
|
91
|
+
r"|<(?P<mention>@[UW][A-Z0-9]+|#[CG][A-Z0-9]+|!subteam\^[A-Z0-9]+"
|
|
92
|
+
r"|!(?:here|channel|everyone))(?:\|[^>\n]+)?>"
|
|
87
93
|
r"|(?P<token>"
|
|
88
94
|
r"(?P<code>(?P<code_delimiter>`+)(?P<code_text>[^\n]+?)(?P=code_delimiter))"
|
|
89
95
|
r"|~~[^~]+~~"
|
|
@@ -143,6 +149,72 @@ def _is_han_or_kana_char(char: str) -> bool:
|
|
|
143
149
|
)
|
|
144
150
|
|
|
145
151
|
|
|
152
|
+
# Code/angle/pipe markers that never appear inside a bare URL in this library's
|
|
153
|
+
# prose context. (A single ``*`` and CJK letters are intentionally NOT here: a
|
|
154
|
+
# URL may legally contain a wildcard/query ``*`` and an IRI/IDN may contain CJK
|
|
155
|
+
# letters, so those must be preserved.)
|
|
156
|
+
_URL_STOP_CHARS = frozenset("`<>|")
|
|
157
|
+
# Trailing punctuation stripped from the end of a bare URL. This is exactly
|
|
158
|
+
# GFM's autolink-extension set (``! ? . , : * _ ~``); a closing paren is handled
|
|
159
|
+
# separately, with balancing. ``;`` and quotes are intentionally NOT included —
|
|
160
|
+
# ``;`` is URL-legal in matrix/path parameters and quotes are sub-delimiters, so
|
|
161
|
+
# trimming them could change the link target rather than just shedding prose.
|
|
162
|
+
_URL_TRAILING_PUNCTUATION = frozenset("!?.,:*_~")
|
|
163
|
+
# CJK and full/half-width punctuation/brackets that terminate prose, so a bare
|
|
164
|
+
# URL is cut here. This is an explicit set rather than the whole U+3000–U+303F
|
|
165
|
+
# block on purpose: letter-like CJK iteration marks (々 U+3005, 〻 U+303B),
|
|
166
|
+
# ditto/closure marks (〆 U+3006) and the ideographic number zero (〇 U+3007)
|
|
167
|
+
# are *excluded* so IRIs such as ``https://ja.wikipedia.org/wiki/人々`` survive.
|
|
168
|
+
_URL_CJK_BOUNDARY_CHARS = frozenset(
|
|
169
|
+
"、。〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞・…" # CJK punctuation & brackets
|
|
170
|
+
"!?,.:;()[]{}<>|" # full-width punctuation & brackets
|
|
171
|
+
"。「」、" # half-width CJK punctuation & brackets
|
|
172
|
+
)
|
|
173
|
+
|
|
174
|
+
|
|
175
|
+
def _is_url_boundary_char(char: str) -> bool:
|
|
176
|
+
"""Return True when ``char`` is a hard boundary where a bare URL must stop.
|
|
177
|
+
|
|
178
|
+
Only unambiguous prose/markup boundaries qualify: code/angle/pipe markers
|
|
179
|
+
and CJK/full-width *punctuation* (``、``/``。``/``」``/``)`` …). CJK
|
|
180
|
+
*letters* (including iteration marks like ``々``) are not a boundary, so
|
|
181
|
+
IRIs such as ``https://ja.wikipedia.org/wiki/人々`` survive.
|
|
182
|
+
"""
|
|
183
|
+
return char in _URL_STOP_CHARS or char in _URL_CJK_BOUNDARY_CHARS
|
|
184
|
+
|
|
185
|
+
|
|
186
|
+
def _trim_bare_url(url: str) -> str:
|
|
187
|
+
"""Trim a greedily matched bare URL down to its real extent.
|
|
188
|
+
|
|
189
|
+
In CJK writing a URL is usually glued directly to the following text with no
|
|
190
|
+
whitespace, so the greedy ``[^\\s<]+`` match would otherwise swallow the
|
|
191
|
+
trailing ``)``/``**``/``。`` and the rest of the sentence. This stops the URL
|
|
192
|
+
at the first hard boundary or doubled emphasis run (``**``/``~~``) — single
|
|
193
|
+
``*`` and CJK letters are preserved — then drops GFM-style trailing
|
|
194
|
+
punctuation and unbalanced closing parens, so ``https://example.com)**。``
|
|
195
|
+
becomes ``https://example.com``.
|
|
196
|
+
"""
|
|
197
|
+
for index, char in enumerate(url):
|
|
198
|
+
nxt = url[index + 1] if index + 1 < len(url) else ""
|
|
199
|
+
if _is_url_boundary_char(char) or (char in "*~" and nxt == char):
|
|
200
|
+
url = url[:index]
|
|
201
|
+
break
|
|
202
|
+
|
|
203
|
+
while url:
|
|
204
|
+
last = url[-1]
|
|
205
|
+
if last == ")":
|
|
206
|
+
if url.count(")") <= url.count("("):
|
|
207
|
+
break
|
|
208
|
+
url = url[:-1]
|
|
209
|
+
continue
|
|
210
|
+
if last in _URL_TRAILING_PUNCTUATION:
|
|
211
|
+
url = url[:-1]
|
|
212
|
+
continue
|
|
213
|
+
break
|
|
214
|
+
|
|
215
|
+
return url
|
|
216
|
+
|
|
217
|
+
|
|
146
218
|
def _nested_code_space_strategy(
|
|
147
219
|
source: str,
|
|
148
220
|
start: int,
|
|
@@ -596,9 +668,15 @@ def normalize_bare_urls_for_slack_markdown(text: str) -> str:
|
|
|
596
668
|
|
|
597
669
|
url_match = BARE_URL_PATTERN.match(chunk, cursor)
|
|
598
670
|
if url_match:
|
|
599
|
-
|
|
600
|
-
|
|
601
|
-
|
|
671
|
+
url = _trim_bare_url(url_match.group(0))
|
|
672
|
+
scheme = re.match(r"https?://", url, re.IGNORECASE)
|
|
673
|
+
# Only autolink when something host-like survives the trim;
|
|
674
|
+
# a bare ``https://`` followed straight by CJK would otherwise
|
|
675
|
+
# produce an empty ``<https://>`` autolink.
|
|
676
|
+
if scheme and len(url) > scheme.end():
|
|
677
|
+
parts.append(f"<{url}>")
|
|
678
|
+
cursor += len(url)
|
|
679
|
+
continue
|
|
602
680
|
|
|
603
681
|
parts.append(char)
|
|
604
682
|
cursor += 1
|
|
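The guard added in the hunk above, which skips the autolink when nothing host-like survives the trim, can be isolated as follows. This is a sketch only; `wrap_autolink` is a hypothetical name and is not a function of the package.

```python
import re
from typing import Optional


def wrap_autolink(trimmed_url: str) -> Optional[str]:
    """Return "<url>" only when something follows the scheme, else None."""
    scheme = re.match(r"https?://", trimmed_url, re.IGNORECASE)
    if scheme and len(trimmed_url) > scheme.end():
        return f"<{trimmed_url}>"
    # A bare "https://" (for example when CJK text followed the scheme directly
    # and was trimmed away) would otherwise become an empty "<https://>" link.
    return None
```

Returning `None` lets the caller fall through and copy the characters verbatim, which matches the fall-through to `parts.append(char)` visible in the hunk.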
@@ -1245,6 +1323,40 @@ def looks_like_markdown_table(text: str) -> bool:
|
|
|
1245
1323
|
return table_like_lines >= 2
|
|
1246
1324
|
|
|
1247
1325
|
|
|
1326
|
+
def _slack_mention_element(mention: str) -> dict[str, Any]:
|
|
1327
|
+
"""Map a Slack mention token body to its rich_text element.
|
|
1328
|
+
|
|
1329
|
+
``mention`` is the token interior without the angle brackets or ``|label``
|
|
1330
|
+
(e.g. ``@U123``, ``#C123``, ``!subteam^S123``, ``!here``). Slack renders
|
|
1331
|
+
these from the id alone, so no display text is carried.
|
|
1332
|
+
"""
|
|
1333
|
+
sigil, body = mention[0], mention[1:]
|
|
1334
|
+
if sigil == "@":
|
|
1335
|
+
return {"type": "user", "user_id": body}
|
|
1336
|
+
if sigil == "#":
|
|
1337
|
+
return {"type": "channel", "channel_id": body}
|
|
1338
|
+
if body.startswith("subteam^"):
|
|
1339
|
+
return {"type": "usergroup", "usergroup_id": body[len("subteam^") :]}
|
|
1340
|
+
return {"type": "broadcast", "range": body}
|
|
1341
|
+
|
|
1342
|
+
|
|
1343
|
+
def _slack_mention_element_to_token(element: dict[str, Any]) -> str:
|
|
1344
|
+
"""Inverse of :func:`_slack_mention_element` for plain-text fallbacks.
|
|
1345
|
+
|
|
1346
|
+
Emitting the canonical ``<#C…>`` / ``<@U…>`` token (rather than an empty
|
|
1347
|
+
string) keeps the mention live when a rich_text block is downgraded to a
|
|
1348
|
+
mrkdwn fallback, so it still links and notifies.
|
|
1349
|
+
"""
|
|
1350
|
+
element_type = element.get("type")
|
|
1351
|
+
if element_type == "user":
|
|
1352
|
+
return f"<@{element.get('user_id', '')}>"
|
|
1353
|
+
if element_type == "channel":
|
|
1354
|
+
return f"<#{element.get('channel_id', '')}>"
|
|
1355
|
+
if element_type == "usergroup":
|
|
1356
|
+
return f"<!subteam^{element.get('usergroup_id', '')}>"
|
|
1357
|
+
return f"<!{element.get('range', '')}>"
|
|
1358
|
+
|
|
1359
|
+
|
|
1248
1360
|
def _create_rich_text_inline_elements(
|
|
1249
1361
|
text: str, *, empty_text: str = ""
|
|
1250
1362
|
) -> list[dict[str, Any]]:
|
|
@@ -1268,6 +1380,7 @@ def _create_rich_text_inline_elements(
|
|
|
1268
1380
|
markdown_url = match.group("markdown_url")
|
|
1269
1381
|
angle_url = match.group("angle_url")
|
|
1270
1382
|
angle_label = match.group("angle_label")
|
|
1383
|
+
mention = match.group("mention")
|
|
1271
1384
|
token = match.group("token") or ""
|
|
1272
1385
|
|
|
1273
1386
|
if markdown_label and markdown_url:
|
|
@@ -1278,6 +1391,8 @@ def _create_rich_text_inline_elements(
|
|
|
1278
1391
|
"url": angle_url,
|
|
1279
1392
|
"text": angle_label or angle_url,
|
|
1280
1393
|
}
|
|
1394
|
+
elif mention:
|
|
1395
|
+
element = _slack_mention_element(mention)
|
|
1281
1396
|
else:
|
|
1282
1397
|
style: dict[str, bool] = {}
|
|
1283
1398
|
content = token
|
|
@@ -1296,9 +1411,26 @@ def _create_rich_text_inline_elements(
|
|
|
1296
1411
|
content = content[1:-1]
|
|
1297
1412
|
style["italic"] = True
|
|
1298
1413
|
|
|
1299
|
-
|
|
1300
|
-
|
|
1301
|
-
|
|
1414
|
+
# A link wrapped entirely in emphasis (``**[text](url)**``) is matched
|
|
1415
|
+
# by the emphasis branch above, not the link branch, so its inner
|
|
1416
|
+
# content is a bare ``[text](url)``. Emit a styled ``link`` element
|
|
1417
|
+
# rather than a literal text run, otherwise the link is dead in Slack.
|
|
1418
|
+
inner_link = (
|
|
1419
|
+
TABLE_TOKEN_PATTERN.fullmatch(content)
|
|
1420
|
+
if style and not style.get("code")
|
|
1421
|
+
else None
|
|
1422
|
+
)
|
|
1423
|
+
if inner_link is not None and inner_link.group("markdown_url"):
|
|
1424
|
+
element = {
|
|
1425
|
+
"type": "link",
|
|
1426
|
+
"url": inner_link.group("markdown_url"),
|
|
1427
|
+
"text": inner_link.group("markdown_label"),
|
|
1428
|
+
"style": style,
|
|
1429
|
+
}
|
|
1430
|
+
else:
|
|
1431
|
+
element = {"type": "text", "text": content}
|
|
1432
|
+
if style:
|
|
1433
|
+
element["style"] = style
|
|
1302
1434
|
elements.append(element)
|
|
1303
1435
|
last_index = match.end()
|
|
1304
1436
|
|
|
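The emphasis-wrapped link handling in the hunk above can be sketched in isolation. This is a simplified illustration, not the package's code: `LINK_PATTERN`, `EMPHASIS_MARKERS`, and `styled_token_to_element` are hypothetical names, and the marker-to-style table is an assumption based on the variants listed in the changelog (`*`, `_`, `__`, `~~`, `***`).

```python
import re
from typing import Any

# Markdown link with an http(s) target, mirroring the link branch of the diff.
LINK_PATTERN = re.compile(r"\[(?P<label>[^\]\n]+)\]\((?P<url>https?://[^\s)]+)\)")

# Assumed marker table; longest markers first so "***" is tried before "**".
EMPHASIS_MARKERS: tuple[tuple[str, dict[str, bool]], ...] = (
    ("***", {"bold": True, "italic": True}),
    ("**", {"bold": True}),
    ("__", {"bold": True}),
    ("~~", {"strike": True}),
    ("*", {"italic": True}),
    ("_", {"italic": True}),
)


def styled_token_to_element(token: str) -> dict[str, Any]:
    """Turn an emphasis token into a styled text or styled link element."""
    style: dict[str, bool] = {}
    content = token
    for marker, marker_style in EMPHASIS_MARKERS:
        wrapped = token.startswith(marker) and token.endswith(marker)
        if wrapped and len(token) > 2 * len(marker):
            content = token[len(marker):-len(marker)]
            style = dict(marker_style)
            break

    # If the styled content is itself one complete link, emit a link element
    # that carries the style instead of a literal "[text](url)" text run.
    inner_link = LINK_PATTERN.fullmatch(content) if style else None
    if inner_link is not None:
        return {
            "type": "link",
            "url": inner_link.group("url"),
            "text": inner_link.group("label"),
            "style": style,
        }

    element: dict[str, Any] = {"type": "text", "text": content}
    if style:
        element["style"] = style
    return element
```

Using `fullmatch` matters here: a styled run that merely contains a link among other words stays a text element, so only the "link wrapped entirely in emphasis" case changes behaviour.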
@@ -1338,12 +1470,11 @@ def extract_plain_text_from_table_cell(cell: dict[str, Any]) -> str:
|
|
|
1338
1470
|
if not isinstance(element, dict):
|
|
1339
1471
|
continue
|
|
1340
1472
|
if element.get("type") == "rich_text_section":
|
|
1341
|
-
|
|
1342
|
-
|
|
1343
|
-
|
|
1344
|
-
|
|
1345
|
-
|
|
1346
|
-
texts.append(child.get("text", ""))
|
|
1473
|
+
texts.append(
|
|
1474
|
+
_rich_text_inline_elements_to_plain_text(
|
|
1475
|
+
element.get("elements", [])
|
|
1476
|
+
)
|
|
1477
|
+
)
|
|
1347
1478
|
elif "text" in element:
|
|
1348
1479
|
texts.append(str(element.get("text", "")))
|
|
1349
1480
|
return "".join(texts)
|
|
@@ -1410,6 +1541,8 @@ def _rich_text_inline_elements_to_plain_text(elements: list[dict[str, Any]]) ->
|
|
|
1410
1541
|
element_type = element.get("type")
|
|
1411
1542
|
if element_type == "link":
|
|
1412
1543
|
texts.append(str(element.get("text") or element.get("url", "")))
|
|
1544
|
+
elif element_type in {"user", "channel", "usergroup", "broadcast"}:
|
|
1545
|
+
texts.append(_slack_mention_element_to_token(element))
|
|
1413
1546
|
else:
|
|
1414
1547
|
texts.append(str(element.get("text", "")))
|
|
1415
1548
|
return "".join(texts)
|
|
@@ -1945,44 +2078,73 @@ def convert_markdown_to_slack_payloads(
|
|
|
1945
2078
|
return payloads
|
|
1946
2079
|
|
|
1947
2080
|
|
|
1948
|
-
def
|
|
1949
|
-
"""
|
|
2081
|
+
def _markdown_block_to_plain_text(block: dict[str, Any]) -> str:
|
|
2082
|
+
"""Downgrade a ``markdown`` block, preferring the build-time annotation."""
|
|
2083
|
+
text = getattr(block, "_plain_text", None) or ""
|
|
2084
|
+
if text:
|
|
2085
|
+
return str(text)
|
|
2086
|
+
raw_text = block.get("text", "")
|
|
2087
|
+
if not raw_text:
|
|
2088
|
+
return ""
|
|
2089
|
+
return _normalize_markdown_block_plain_text(
|
|
2090
|
+
_strip_synthetic_blank_line_placeholders(
|
|
2091
|
+
_strip_synthetic_spaces_from_plain_text(
|
|
2092
|
+
strip_zero_width_spaces(raw_text),
|
|
2093
|
+
getattr(block, "_synthetic_space_indices", None),
|
|
2094
|
+
),
|
|
2095
|
+
getattr(block, "_synthetic_blank_line_indices", None),
|
|
2096
|
+
)
|
|
2097
|
+
)
|
|
2098
|
+
|
|
2099
|
+
|
|
2100
|
+
def _blocks_to_downgrade_parts(
|
|
2101
|
+
blocks: list[dict[str, Any]], *, fallback: bool
|
|
2102
|
+
) -> list[str]:
|
|
2103
|
+
"""Shared block walker behind :func:`blocks_to_plain_text` and
|
|
2104
|
+
:func:`build_fallback_text_from_blocks`.
|
|
2105
|
+
|
|
2106
|
+
The two public functions intentionally differ in a few policies, kept
|
|
2107
|
+
explicit on the ``fallback`` flag: fallback keeps table cells verbatim
|
|
2108
|
+
(empty cells preserve column alignment) and emits a whole table as one
|
|
2109
|
+
part, while the plain-text view strips zero-width spaces, drops empty
|
|
2110
|
+
cells, emits one part per row, and surfaces ``text`` from unknown blocks.
|
|
2111
|
+
"""
|
|
1950
2112
|
parts: list[str] = []
|
|
1951
2113
|
|
|
1952
2114
|
for block in blocks or []:
|
|
1953
|
-
|
|
2115
|
+
if not isinstance(block, dict):
|
|
2116
|
+
continue
|
|
2117
|
+
block_type = block.get("type")
|
|
1954
2118
|
|
|
1955
2119
|
if block_type == "markdown":
|
|
1956
|
-
text =
|
|
1957
|
-
if
|
|
1958
|
-
raw_text = block.get("text", "")
|
|
1959
|
-
if raw_text:
|
|
1960
|
-
text = _normalize_markdown_block_plain_text(
|
|
1961
|
-
_strip_synthetic_blank_line_placeholders(
|
|
1962
|
-
_strip_synthetic_spaces_from_plain_text(
|
|
1963
|
-
strip_zero_width_spaces(raw_text),
|
|
1964
|
-
getattr(block, "_synthetic_space_indices", None),
|
|
1965
|
-
),
|
|
1966
|
-
getattr(block, "_synthetic_blank_line_indices", None),
|
|
1967
|
-
)
|
|
1968
|
-
)
|
|
1969
|
-
if text:
|
|
2120
|
+
text = _markdown_block_to_plain_text(block)
|
|
2121
|
+
if text.strip() if fallback else text:
|
|
1970
2122
|
parts.append(text)
|
|
1971
2123
|
elif block_type == "table":
|
|
1972
|
-
|
|
1973
|
-
for row in rows:
|
|
1974
|
-
cell_texts: list[str] = []
|
|
2124
|
+
row_texts: list[str] = []
|
|
2125
|
+
for row in block.get("rows") or []:
|
|
1975
2126
|
if not isinstance(row, list):
|
|
1976
2127
|
continue
|
|
1977
|
-
|
|
1978
|
-
|
|
1979
|
-
|
|
1980
|
-
|
|
2128
|
+
if fallback:
|
|
2129
|
+
cell_texts = [
|
|
2130
|
+
extract_plain_text_from_table_cell(cell) for cell in row
|
|
2131
|
+
]
|
|
2132
|
+
else:
|
|
2133
|
+
cell_texts = []
|
|
2134
|
+
for cell in row:
|
|
2135
|
+
cell_text = extract_plain_text_from_table_cell(cell)
|
|
2136
|
+
if cell_text:
|
|
2137
|
+
cell_texts.append(strip_zero_width_spaces(cell_text))
|
|
1981
2138
|
if cell_texts:
|
|
1982
|
-
|
|
2139
|
+
row_texts.append(" | ".join(cell_texts))
|
|
2140
|
+
if fallback:
|
|
2141
|
+
if row_texts:
|
|
2142
|
+
parts.append("\n".join(row_texts))
|
|
2143
|
+
else:
|
|
2144
|
+
parts.extend(row_texts)
|
|
1983
2145
|
elif block_type == "rich_text":
|
|
1984
2146
|
text = _rich_text_block_to_plain_text(block)
|
|
1985
|
-
if text:
|
|
2147
|
+
if text.strip() if fallback else text:
|
|
1986
2148
|
parts.append(text)
|
|
1987
2149
|
elif block_type == "header":
|
|
1988
2150
|
text = block.get("text", {})
|
|
@@ -1998,66 +2160,24 @@ def blocks_to_plain_text(blocks: list[dict[str, Any]]) -> str:
|
|
|
1998
2160
|
parts.append(image_text)
|
|
1999
2161
|
elif block_type == "divider":
|
|
2000
2162
|
parts.append(getattr(block, "_plain_text", None) or "---")
|
|
2001
|
-
elif
|
|
2163
|
+
elif not fallback:
|
|
2002
2164
|
text = block.get("text", "")
|
|
2003
2165
|
if text:
|
|
2004
2166
|
parts.append(str(text))
|
|
2005
2167
|
|
|
2168
|
+
return parts
|
|
2169
|
+
|
|
2170
|
+
|
|
2171
|
+
def blocks_to_plain_text(blocks: list[dict[str, Any]]) -> str:
|
|
2172
|
+
"""Build plain text representation from Slack blocks."""
|
|
2173
|
+
parts = _blocks_to_downgrade_parts(blocks, fallback=False)
|
|
2006
2174
|
return "\n".join([p for p in parts if p]).strip()
|
|
2007
2175
|
|
|
2008
2176
|
|
|
2009
2177
|
def build_fallback_text_from_blocks(blocks: list[dict[str, Any]]) -> str:
|
|
2010
2178
|
"""Build Slack fallback text from block structure."""
|
|
2011
|
-
|
|
2012
|
-
|
|
2013
|
-
for block in blocks or []:
|
|
2014
|
-
if not isinstance(block, dict):
|
|
2015
|
-
continue
|
|
2016
|
-
|
|
2017
|
-
if block.get("type") == "markdown":
|
|
2018
|
-
text = getattr(block, "_plain_text", None) or ""
|
|
2019
|
-
if not text:
|
|
2020
|
-
text = _normalize_markdown_block_plain_text(
|
|
2021
|
-
_strip_synthetic_blank_line_placeholders(
|
|
2022
|
-
_strip_synthetic_spaces_from_plain_text(
|
|
2023
|
-
strip_zero_width_spaces(block.get("text", "")),
|
|
2024
|
-
getattr(block, "_synthetic_space_indices", None),
|
|
2025
|
-
),
|
|
2026
|
-
getattr(block, "_synthetic_blank_line_indices", None),
|
|
2027
|
-
),
|
|
2028
|
-
)
|
|
2029
|
-
if text.strip():
|
|
2030
|
-
plain_parts.append(text)
|
|
2031
|
-
elif block.get("type") == "table":
|
|
2032
|
-
table_lines: list[str] = []
|
|
2033
|
-
for row in block.get("rows", []):
|
|
2034
|
-
if not isinstance(row, list):
|
|
2035
|
-
continue
|
|
2036
|
-
cells = [extract_plain_text_from_table_cell(cell) for cell in row]
|
|
2037
|
-
if cells:
|
|
2038
|
-
table_lines.append(" | ".join(cells))
|
|
2039
|
-
if table_lines:
|
|
2040
|
-
plain_parts.append("\n".join(table_lines))
|
|
2041
|
-
elif block.get("type") == "rich_text":
|
|
2042
|
-
text = _rich_text_block_to_plain_text(block)
|
|
2043
|
-
if text.strip():
|
|
2044
|
-
plain_parts.append(text)
|
|
2045
|
-
elif block.get("type") == "header":
|
|
2046
|
-
text = block.get("text", {})
|
|
2047
|
-
if isinstance(text, dict) and text.get("text"):
|
|
2048
|
-
plain_parts.append(str(text.get("text", "")))
|
|
2049
|
-
elif block.get("type") == "image":
|
|
2050
|
-
alt_text = str(block.get("alt_text", "")).strip()
|
|
2051
|
-
image_url = str(block.get("image_url", "")).strip()
|
|
2052
|
-
image_text = alt_text or image_url
|
|
2053
|
-
if alt_text and image_url:
|
|
2054
|
-
image_text = f"{alt_text} ({image_url})"
|
|
2055
|
-
if image_text:
|
|
2056
|
-
plain_parts.append(image_text)
|
|
2057
|
-
elif block.get("type") == "divider":
|
|
2058
|
-
plain_parts.append(getattr(block, "_plain_text", None) or "---")
|
|
2059
|
-
|
|
2060
|
-
return "\n\n".join([part for part in plain_parts if part.strip()])
|
|
2179
|
+
parts = _blocks_to_downgrade_parts(blocks, fallback=True)
|
|
2180
|
+
return "\n\n".join([part for part in parts if part.strip()])
|
|
2061
2181
|
|
|
2062
2182
|
|
|
2063
2183
|
# Backward-compatible helper retained for existing imports.
|
|
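The two table policies that `_blocks_to_downgrade_parts` selects with its `fallback` flag can be illustrated on already-extracted cell strings. This is a sketch under stated assumptions: `table_rows_to_parts` is a hypothetical name, cells are plain strings rather than Slack cell dicts, and zero-width-space stripping is approximated by removing U+200B only.

```python
def table_rows_to_parts(rows: list[list[str]], *, fallback: bool) -> list[str]:
    """Downgrade table rows to text parts under either policy."""
    row_texts: list[str] = []
    for row in rows:
        if fallback:
            # Fallback keeps every cell, so empty cells preserve column alignment.
            cell_texts = list(row)
        else:
            # Plain-text view drops empty cells and strips zero-width spaces.
            cell_texts = [cell.replace("\u200b", "") for cell in row if cell]
        if cell_texts:
            row_texts.append(" | ".join(cell_texts))

    if fallback:
        # The whole table becomes a single part.
        return ["\n".join(row_texts)] if row_texts else []
    # One part per row.
    return row_texts
```

The difference shows up once the caller joins the parts: the fallback path joins parts with a blank line, so a table stays one paragraph, while the plain-text path joins with single newlines.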
File without changes