slack-markdown-parser 2.4.4__tar.gz → 2.5.0__tar.gz
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/CHANGELOG.md +12 -0
- {slack_markdown_parser-2.4.4/slack_markdown_parser.egg-info → slack_markdown_parser-2.5.0}/PKG-INFO +10 -7
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/README-ja.md +9 -6
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/README.md +9 -6
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/docs/spec-ja.md +28 -5
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/docs/spec.md +28 -5
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/pyproject.toml +1 -1
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/__init__.py +1 -1
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/converter.py +544 -66
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0/slack_markdown_parser.egg-info}/PKG-INFO +10 -7
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/LICENSE +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/MANIFEST.in +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/setup.cfg +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/py.typed +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/SOURCES.txt +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/dependency_links.txt +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/requires.txt +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/top_level.txt +0 -0
|
@@ -6,6 +6,18 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
|
|
|
6
6
|
|
|
7
7
|
## [Unreleased]
|
|
8
8
|
|
|
9
|
+
## [2.5.0] - 2026-06-11
|
|
10
|
+
|
|
11
|
+
### Added
|
|
12
|
+
|
|
13
|
+
- Added automatic size splitting so long or heading-dense LLM output no longer fails `chat.postMessage` outright. Three Slack-side hard limits were measured against a real workspace on 2026-06-11 and are now enforced at conversion time: a `markdown` block's `text` accepts exactly 12,000 characters (`msg_too_long` beyond that); Slack expands `markdown` blocks server-side and enforces "no more than 50 items" per message on the expanded result (`invalid_blocks`), where each heading and each thematic break is one item while paragraph/list/quote/fence runs between them merge into one; and one message's blocks may carry at most 13,200 characters of text in total across block types (`msg_blocks_too_long`). Oversized regions are split preferring paragraph boundaries, then line and word boundaries, with a hard cut as a last resort for space-less CJK; a cut inside an unclosed fence re-opens the fence in the continuation block so both halves keep rendering as code. Pieces are re-checked after ZWSP/NBSP formatting and re-split with shrinking budgets when they still overflow. `convert_markdown_to_slack_messages` packs blocks under all three budgets in addition to the existing one-table-per-message rule; documents already within every limit are returned unchanged. Note that the same input can now produce more blocks and more messages than 2.4.x when it previously exceeded Slack's limits (which used to fail delivery entirely).
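The boundary preference described in this entry (paragraph, then line, then word, then a hard cut) can be sketched as below. This is only an illustration under stated assumptions, not the library's implementation: `split_text` and `_pack` are hypothetical names, the budget is a plain character count, and fence re-opening and post-formatting re-checks are left out.

```python
def split_text(text: str, budget: int) -> list[str]:
    """Split text into pieces of at most `budget` characters.

    Hypothetical sketch: prefers paragraph boundaries, then line
    boundaries, then word boundaries, and hard-cuts only when no
    boundary is left (for example space-less CJK text).
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if len(text) <= budget:
        return [text] if text else []
    for separator in ("\n\n", "\n", " "):
        if separator in text:
            return _pack(text.split(separator), separator, budget)
    # No boundary of any kind: cut at fixed character positions.
    return [text[i:i + budget] for i in range(0, len(text), budget)]


def _pack(parts: list[str], separator: str, budget: int) -> list[str]:
    """Greedily join parts with `separator` while the result fits the budget."""
    pieces: list[str] = []
    current = ""
    for part in parts:
        candidate = part if not current else current + separator + part
        if len(candidate) <= budget:
            current = candidate
            continue
        if current:
            pieces.append(current)
            current = ""
        if len(part) <= budget:
            current = part
        else:
            # A single oversized unit: recurse to the next finer boundary.
            sub = split_text(part, budget)
            pieces.extend(sub[:-1])
            current = sub[-1]
    if current:
        pieces.append(current)
    return pieces
```

The separator at a chosen cut is dropped rather than carried into either piece, which mirrors the note that adjacent blocks already render visually separated.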
|
|
14
|
+
|
|
15
|
+
### Fixed
|
|
16
|
+
|
|
17
|
+
- Stopped corrupting code samples during sanitization. `decode_html_entities` and the angle-token neutralization inside `sanitize_slack_text` ran over the whole text, so a fenced code block or inline code span containing `<div>` or `&amp;` reached Slack as `＜div＞` / `&` even though Slack renders code content literally. Both passes now skip fenced code blocks and inline code spans; ANSI/control-character removal still applies everywhere. For this purpose a code span is recognized within a single line only and closes only on a backtick run of equal length (CommonMark pairing), so a stray unpaired backtick stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`` <foo `bar` baz> ``) is still neutralized as a whole while the span content stays verbatim.
|
|
18
|
+
- Stopped crafted input from colliding with internal placeholders. Input carrying this library's reserved in-band marker code points (`U+2063`, `U+FFF0`–`U+FFF3`, e.g. a literal `code0` sequence) could crash conversion with `KeyError` or get substituted with another code span's content. The markers are now stripped during sanitization and at direct-call entry points of the placeholder machinery, and placeholder restoration passes unknown sequences through instead of raising.
|
|
19
|
+
- Consolidated the three duplicated fenced-code tracking loops onto a single `_iter_fence_states` helper so fence semantics cannot drift between passes again — that drift is exactly how the sanitize corruption happened.
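The placeholder fix above has two parts: stripping the reserved code points from input, and restoring placeholders without raising on unknown ones. A minimal sketch follows. The code points come from the entry itself, but the placeholder shape (`U+FFF0` + `code<N>` + `U+FFF1`) and the function names are assumptions made for illustration only.

```python
import re

# Reserved code points named in the changelog entry.
RESERVED_MARKERS = "\u2063\ufff0\ufff1\ufff2\ufff3"
_STRIP_TABLE = {ord(ch): None for ch in RESERVED_MARKERS}

# Hypothetical placeholder shape, chosen only for this sketch.
_PLACEHOLDER = re.compile("\ufff0code(\\d+)\ufff1")


def strip_reserved_markers(text: str) -> str:
    """Remove every reserved marker code point from untrusted input."""
    return text.translate(_STRIP_TABLE)


def restore_placeholders(text: str, saved: dict[int, str]) -> str:
    """Swap known placeholders back in; pass unknown ones through unchanged."""
    def _restore(match: re.Match) -> str:
        # dict.get with a default avoids the KeyError described above.
        return saved.get(int(match.group(1)), match.group(0))

    return _PLACEHOLDER.sub(_restore, text)
```

Stripping at the entry point means a crafted marker can never reach restoration, and the pass-through default keeps restoration safe even if one does.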
|
|
20
|
+
|
|
9
21
|
## [2.4.4] - 2026-06-10
|
|
10
22
|
|
|
11
23
|
### Fixed
|
{slack_markdown_parser-2.4.4/slack_markdown_parser.egg-info → slack_markdown_parser-2.5.0}/PKG-INFO
RENAMED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: slack-markdown-parser
|
|
3
|
-
Version: 2.
|
|
3
|
+
Version: 2.5.0
|
|
4
4
|
Summary: Convert LLM Markdown into Slack Block Kit messages
|
|
5
5
|
Author: darkgaldragon
|
|
6
6
|
License-Expression: MIT
|
|
@@ -62,7 +62,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
|
|
|
62
62
|
- Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
|
|
63
63
|
- Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
|
|
64
64
|
- Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
|
|
65
|
-
-
|
|
65
|
+
- Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
|
|
66
|
+
- Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
|
|
66
67
|
- Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
|
|
67
68
|
- Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
|
|
68
69
|
- Support Markdown links and Slack-style links inside table cells
|
|
@@ -115,7 +116,8 @@ What this library compensates for:
|
|
|
115
116
|
- Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
|
|
116
117
|
- Keeps table-like rows inside fenced code blocks out of table normalization
|
|
117
118
|
- Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
|
|
118
|
-
- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
|
|
119
|
+
- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
|
|
120
|
+
- Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
|
|
119
121
|
|
|
120
122
|
## Requirements
|
|
121
123
|
|
|
@@ -151,7 +153,7 @@ for payload in convert_markdown_to_slack_payloads(
|
|
|
151
153
|
print(payload)
|
|
152
154
|
```
|
|
153
155
|
|
|
154
|
-
`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
|
|
156
|
+
`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
|
|
155
157
|
Set `preserve_visual_blank_lines=True` when you want the parser to compensate
|
|
156
158
|
for Slack's currently tight paragraph spacing inside `markdown` blocks.
|
|
157
159
|
The blank-line workaround is intentionally narrow: it skips table segments and
|
|
@@ -205,7 +207,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
|
|
|
205
207
|
|
|
206
208
|
| Function | Description |
|
|
207
209
|
|---|---|
|
|
208
|
-
| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
|
|
210
|
+
| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
|
|
209
211
|
| `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
|
|
210
212
|
| `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
|
|
211
213
|
| `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
|
|
@@ -225,9 +227,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
|
|
|
225
227
|
| Function | Description |
|
|
226
228
|
|---|---|
|
|
227
229
|
| `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
|
|
230
|
+
| `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
|
|
228
231
|
| `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
|
|
229
|
-
| `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
|
|
230
|
-
| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
|
|
232
|
+
| `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
|
|
233
|
+
| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
|
|
231
234
|
| `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
|
|
232
235
|
|
|
233
236
|
### Lower-level exported helpers
|
|
@@ -32,7 +32,8 @@ Slack の `markdown` ブロック自体が対応していない構文は、古
|
|
|
32
32
|
- Convert standalone Markdown constructs that can be identified safely into `image` / `divider` / `rich_text` blocks
|
|
33
33
|
- Repair table breakage common in LLM output (missing outer pipes, missing separator rows, mismatched column counts, empty cells)
|
|
34
34
|
- Automatically split messages when needed to satisfy Slack's "one table per message" constraint and the per-message block-count limit
|
|
35
|
-
-
|
|
35
|
+
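- Split long or heading-dense output into multiple `markdown` blocks and messages, preferring paragraph boundaries, so that it fits all of Slack's measured hard limits: 12,000 characters per `markdown` block (`msg_too_long`), 50 expansion items per message with each heading and divider counted as one (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)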
|
|
36
|
+
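- Remove ANSI escapes, control characters, and the library's reserved internal marker characters, and neutralize invalid Slack angle-bracket tokens in prose (the contents of code fences and inline code are kept verbatim)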
|
|
36
37
|
- Outside fenced code blocks, insert zero-width spaces around formatting markers to reduce rendering breakage
|
|
37
38
|
- In dense Japanese, Chinese, and Korean text, add visible spaces to stabilize some cases where formatting that contains inline code breaks
|
|
38
39
|
- Recognize Markdown links and Slack-style links inside table cells
|
|
@@ -74,7 +75,8 @@ Constraints that remain on the Slack side:
|
|
|
74
75
|
- Convert unambiguous standalone Markdown constructs into Slack-native Block Kit blocks instead of relying on raw `markdown` rendering
|
|
75
76
|
- Exclude table-like rows inside fenced code from table processing
|
|
76
77
|
- Optionally replace internal blank lines with helper lines so paragraph breaks are easier to see
|
|
77
|
-
- `<...>` forms that are invalid as Slack special syntax, such as raw HTML-like tags
|
|
78
|
+
- Neutralize `<...>` forms that are invalid as Slack special syntax, such as raw HTML-like tags, in prose (content inside code fences and inline code stays verbatim)
|
|
79
|
+
- Split output that exceeds Slack's measured per-block character limit, per-message expansion-item limit, and per-message total-text limit, so that `chat.postMessage` does not fail outright
|
|
78
80
|
|
|
79
81
|
## Requirements
|
|
80
82
|
|
|
@@ -110,7 +112,7 @@ for payload in convert_markdown_to_slack_payloads(
|
|
|
110
112
|
print(payload)
|
|
111
113
|
```
|
|
112
114
|
|
|
113
|
-
`convert_markdown_to_slack_messages`
|
|
115
|
+
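`convert_markdown_to_slack_messages` automatically splits output into multiple messages not only when the input contains multiple tables but also when long or heading-dense content would exceed Slack's block and message size limits.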
|
|
114
116
|
If paragraph spacing is extremely tight in Slack Web's new `markdown` rendering, use `preserve_visual_blank_lines=True` to make only the internal blank lines more visible.
|
|
115
117
|
|
|
116
118
|
## Input and output examples
|
|
@@ -160,7 +162,7 @@ QA | ~~On hold~~ | Team C
|
|
|
160
162
|
|
|
161
163
|
| Function | Description |
|
|
162
164
|
|---|---|
|
|
163
|
-
| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Markdown
|
|
165
|
+
| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Convert Markdown into a set of messages already split along tables and Slack's measured size limits |
|
|
164
166
|
| `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Convert into Slack send-ready data containing `blocks` and preview `text` |
|
|
165
167
|
| `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Convert Markdown into a list of Block Kit blocks |
|
|
166
168
|
| `build_fallback_text_from_blocks(blocks) → str` | Generate the preview string for `chat.postMessage.text` |
|
|
@@ -175,9 +177,10 @@ QA | ~~On hold~~ | Team C
|
|
|
175
177
|
| Function | Description |
|
|
176
178
|
|---|---|
|
|
177
179
|
| `normalize_markdown_tables(markdown_text) → str` | Normalize table syntax (fill in pipes, generate separator rows, adjust column counts) |
|
|
180
|
+
| `normalize_underscore_emphasis(text) → str` | Convert `_..._` / `__...__` underscore emphasis into Slack-compatible asterisk emphasis |
|
|
178
181
|
| `add_zero_width_spaces_to_markdown(text) → str` | Insert zero-width spaces around formatting markers (fenced code blocks excluded) |
|
|
179
|
-
| `decode_html_entities(text) → str` | HTML
|
|
180
|
-
| `sanitize_slack_text(text) → str` | ANSI /
|
|
182
|
+
| `decode_html_entities(text) → str` | Decode HTML entities in prose (code regions stay verbatim) |
|
|
183
|
+
| `sanitize_slack_text(text) → str` | Remove ANSI / control characters / internal marker characters, and neutralize invalid Slack angle-bracket tokens outside code regions |
|
|
181
184
|
| `strip_zero_width_spaces(text) → str` | Remove zero-width spaces (U+200B) and BOM (U+FEFF) (join-control characters such as ZWJ are preserved) |
|
|
182
185
|
|
|
183
186
|
## Specification
|
|
@@ -32,7 +32,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
|
|
|
32
32
|
- Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
|
|
33
33
|
- Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
|
|
34
34
|
- Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
|
|
35
|
-
-
|
|
35
|
+
- Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
|
|
36
|
+
- Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
|
|
36
37
|
- Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
|
|
37
38
|
- Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
|
|
38
39
|
- Support Markdown links and Slack-style links inside table cells
|
|
@@ -85,7 +86,8 @@ What this library compensates for:
|
|
|
85
86
|
- Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
|
|
86
87
|
- Keeps table-like rows inside fenced code blocks out of table normalization
|
|
87
88
|
- Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
|
|
88
|
-
- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
|
|
89
|
+
- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
|
|
90
|
+
- Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
|
|
89
91
|
|
|
90
92
|
## Requirements
|
|
91
93
|
|
|
@@ -121,7 +123,7 @@ for payload in convert_markdown_to_slack_payloads(
|
|
|
121
123
|
print(payload)
|
|
122
124
|
```
|
|
123
125
|
|
|
124
|
-
`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
|
|
126
|
+
`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
|
|
125
127
|
Set `preserve_visual_blank_lines=True` when you want the parser to compensate
|
|
126
128
|
for Slack's currently tight paragraph spacing inside `markdown` blocks.
|
|
127
129
|
The blank-line workaround is intentionally narrow: it skips table segments and
|
|
@@ -175,7 +177,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
|
|
|
175
177
|
|
|
176
178
|
| Function | Description |
|
|
177
179
|
|---|---|
|
|
178
|
-
| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
|
|
180
|
+
| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
|
|
179
181
|
| `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
|
|
180
182
|
| `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
|
|
181
183
|
| `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
|
|
@@ -195,9 +197,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
|
|
|
195
197
|
| Function | Description |
|
|
196
198
|
|---|---|
|
|
197
199
|
| `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
|
|
200
|
+
| `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
|
|
198
201
|
| `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
|
|
199
|
-
| `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
|
|
200
|
-
| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
|
|
202
|
+
| `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
|
|
203
|
+
| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
|
|
201
204
|
| `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
|
|
202
205
|
|
|
203
206
|
### Lower-level exported helpers
|
|
@@ -10,7 +10,7 @@
|
|
|
10
10
|
## Output
|
|
11
11
|
|
|
12
12
|
- Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, `divider`)
|
|
13
|
-
-
|
|
13
|
+
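- For input containing multiple tables, many promoted blocks, or long text, a set of messages that satisfies "one table per message", Slack's per-message block-count limit, and the measured size limits described in "Markdown block size splitting"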
|
|
14
14
|
|
|
15
15
|
## Design goals
|
|
16
16
|
|
|
@@ -23,8 +23,8 @@ Markdown としての厳密さより Slack 上での読みやすさが重要に
|
|
|
23
23
|
|
|
24
24
|
Processing order of `convert_markdown_to_slack_blocks`:
|
|
25
25
|
|
|
26
|
-
1. Decode HTML entities such as `&gt;`, `&amp;`
|
|
27
|
-
2. Slack
|
|
26
|
+
1. Decode HTML entities in prose, restoring `&gt;`, `&amp;`, and the like to their original characters (the contents of code fences and inline code are not decoded)
|
|
27
|
+
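2. Clean text for Slack. ANSI / control characters and the library's reserved internal marker characters are removed everywhere, while neutralization of invalid Slack angle-bracket tokens applies only outside code fences and inline code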
|
|
28
28
|
3. Normalize underscore emphasis, converting `_..._` / `__...__` into Slack-compatible `*...*` / `**...**`
|
|
29
29
|
4. Normalize bare URLs into the `<https://...>` form, which tends to be more stable in Slack
|
|
30
30
|
5. Repair broken tables using the rules described below
|
|
@@ -33,8 +33,9 @@ Markdown としての厳密さより Slack 上での読みやすさが重要に
|
|
|
33
33
|
- Table regions: parse inline cell formatting and generate a `table` block. If conversion fails, fall back to a `markdown` block
|
|
34
34
|
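- Non-table regions: first convert standalone Markdown constructs that can be identified safely into rich blocks, then generate `markdown` blocks for the remaining text, adding zero-width spaces where needed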
|
|
35
35
|
- When `preserve_visual_blank_lines=True`, replace internal blank lines in the remaining `markdown` blocks with "lines containing only a non-breaking space" before building the `markdown` block
|
|
36
|
+
- A region whose formatted text exceeds Slack's `markdown` block limit (12,000 characters) is split into multiple `markdown` blocks using the rules in "Markdown block size splitting" below
|
|
36
37
|
|
|
37
|
-
`convert_markdown_to_slack_messages`
|
|
38
|
+
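`convert_markdown_to_slack_messages` splits the above result according to the "one table per message" constraint, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "Markdown block size splitting".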
|
|
38
39
|
`convert_markdown_to_slack_payloads` returns send-ready data that attaches a preview string for `chat.postMessage.text` to the same split result.
|
|
39
40
|
|
|
40
41
|
## Slack behavior based on measurements
|
|
@@ -83,7 +84,7 @@ Slack は 2026-03-06 に `markdown` ブロックの公式ドキュメントを
|
|
|
83
84
|
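- A list is promoted only when it starts at the beginning of a text region or right after a blank line, every consecutive non-blank line is a list item, it does not depend on ambiguous 1-3 space nesting indentation or Markdown backslash escapes, and no indented continuation paragraph follows immediately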
|
|
84
85
|
- Exclude table-like rows inside fenced code from table parsing
|
|
85
86
|
- Optionally replace internal blank lines with helper lines that make paragraph breaks easier to see
|
|
86
|
-
- `<...>` forms that are invalid as Slack special syntax, such as `<foo>` or raw HTML-like tags
|
|
87
|
+
- Neutralize `<...>` forms that are invalid as Slack special syntax, such as `<foo>` or raw HTML-like tags, in prose (the contents of code fences and inline code are kept verbatim)
|
|
87
88
|
|
|
88
89
|
## Text cleanup rules for Slack
|
|
89
90
|
|
|
@@ -91,10 +92,13 @@ Slack は 2026-03-06 に `markdown` ブロックの公式ドキュメントを
|
|
|
91
92
|
|
|
92
93
|
- Remove ANSI escapes
|
|
93
94
|
- Remove common control characters
|
|
95
|
+
- Remove the marker characters the library reserves for internal placeholders (`U+2063`, `U+FFF0` to `U+FFF3`) so that input cannot collide with the internal machinery
|
|
94
96
|
- Keep valid Slack angle-bracket tokens
|
|
95
97
|
- Examples: links, mentions, channel references, `<!here>`, `<!subteam^...>`, `<!date^...>`
|
|
96
98
|
- Tokens such as `<foo>` that cannot be interpreted as Slack special syntax are neutralized by converting them to `＜foo＞`
|
|
97
99
|
- This includes raw HTML-like tags such as `<div>` and `<span>`
|
|
100
|
+
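- Angle-bracket token neutralization applies only outside code fences and inline code. Code examples such as `` `<div>` `` reach Slack verbatim. Removal of ANSI / control characters / marker characters applies everywhere because they have no legitimate use as displayed content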
|
|
101
|
+
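- For this purpose an inline code span is limited to a single line and closes only on a backtick run of the same length as its opener. An isolated unpaired backtick is treated as a literal and does not prevent sanitization of later lines. An invalid angle-bracket token that spans a code span (`` <foo `bar` baz> ``) is neutralized as a whole while the span content stays verbatim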
|
|
98
102
|
|
|
99
103
|
## Underscore emphasis normalization rules
|
|
100
104
|
|
|
@@ -211,6 +215,25 @@ LLM は外枠パイプの省略、区切り行の欠落、列数の不一致な
|
|
|
211
215
|
| U+200C | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
|
|
212
216
|
| U+200D | ZWJ (zero-width joiner) | Required for joined emoji and other character joining |
|
|
213
217
|
|
|
218
|
+
## Markdown block size splitting
|
|
219
|
+
|
|
220
|
+
On 2026-06-11, three Slack-side hard limits were measured in a real workspace.
|
|
221
|
+
|
|
222
|
+
- A `markdown` block's `text` is accepted up to exactly 12,000 characters. At 12,001 characters the whole `chat.postMessage` call is rejected with `msg_too_long`
|
|
223
|
+
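- Slack expands `markdown` blocks server-side into a sequence of native blocks and validates "item count ≤ 50" on the expanded result per message (exceeding it gives `invalid_blocks`). Each heading and each divider becomes one item (50 headings were accepted, 51 rejected; 30 headings in each of two blocks were also rejected as a combined total). Paragraphs, lists, quotes, and code fences are merged into one item per contiguous run between headings/dividers (60 blank-line-separated paragraphs and 52 fences were accepted; blank lines alone do not split a run)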
|
|
224
|
+
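- The total text one message's blocks can carry is exactly 13,200 characters (1.1 times the single-block limit). 13,201 characters is rejected with `msg_blocks_too_long`. The total is counted across block types (an 11,900-character `markdown` block plus a 1,400-character `rich_text` was also rejected)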
|
|
225
|
+
|
|
226
|
+
For this reason, long or heading-dense non-table regions are split before sending.
|
|
227
|
+
|
|
228
|
+
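- First try the whole region as one block, and split only when the formatted text exceeds the character limit or the estimated number of expansion items exceeds the budget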
|
|
229
|
+
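- Because inserting zero-width spaces and helper lines inflates the text, and the item estimate is intentionally conservative, raw text is packed toward targets below the limits (11,500 characters / 45 items)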
|
|
230
|
+
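- Split points are chosen first at paragraph boundaries (runs of blank lines outside code fences). The blank lines used as a boundary are removed because adjacent blocks already render visually separated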
|
|
231
|
+
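- A single paragraph that exceeds the budget is split at line boundaries, and an overlong single line at word boundaries. When there are no spaces (for example dense CJK text), it is cut at a character position as a last resort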
|
|
232
|
+
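- When a cut falls inside an unclosed code fence, the original fence opening line is repeated at the start of the continuation block so that both parts keep rendering as code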
|
|
233
|
+
- Each piece is re-checked after formatting, and if it exceeds a hard limit it is re-split with a reduced packing budget
|
|
234
|
+
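- `convert_markdown_to_slack_messages` additionally bundles blocks so that the summed expansion-item estimate in a message stays within 50 and the total block text stays within 13,200 characters (non-`markdown` blocks count as one item, and the content of every block type counts toward the text total)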
|
|
235
|
+
- The top-level fallback `text` field is not subject to the character limit (Slack truncates rather than rejects), so the preview string is not split
|
|
236
|
+
|
|
214
237
|
## Option to compensate for how blank lines appear
|
|
215
238
|
|
|
216
239
|
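Passing `preserve_visual_blank_lines=True` to the main conversion API replaces only the blank lines sandwiched between visible lines in non-table regions with "lines containing only a non-breaking space" before generating Slack `markdown` blocks.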
|
|
@@ -10,7 +10,7 @@ This document describes how `slack-markdown-parser` converts Markdown into Slack
|
|
|
10
10
|
## Output
|
|
11
11
|
|
|
12
12
|
- Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, and `divider`)
|
|
13
|
-
- When the input contains multiple tables
|
|
13
|
+
- When the input contains multiple tables, many promoted blocks, or long content, a list of messages that satisfies the "one table per message" rule, Slack's per-message block-count limit, and the measured size limits described in "Markdown block size splitting"
|
|
14
14
|
|
|
15
15
|
## Design target
|
|
16
16
|
|
|
@@ -23,8 +23,8 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
|
|
|
23
23
|
|
|
24
24
|
`convert_markdown_to_slack_blocks` processes text in this order:
|
|
25
25
|
|
|
26
|
-
1. Decode HTML entities such as `&gt;` and `&amp;`
|
|
27
|
-
2. Clean Slack text
|
|
26
|
+
1. Decode HTML entities such as `&gt;` and `&amp;` in prose, leaving fenced code blocks and inline code spans verbatim
|
|
27
|
+
2. Clean Slack text: remove ANSI/control noise and this library's reserved internal marker code points everywhere, and neutralize invalid Slack angle-bracket tokens outside fenced code blocks and inline code spans
|
|
28
28
|
3. Normalize underscore emphasis by converting `_..._` / `__...__` into Slack-friendly `*...*` / `**...**`
|
|
29
29
|
4. Normalize bare URLs by wrapping them in Slack-friendly `<https://...>` form
|
|
30
30
|
5. Repair malformed tables using the rules below
|
|
@@ -33,8 +33,9 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
|
|
|
33
33
|
- Table regions: parse inline cell styling and generate a `table` block. If conversion fails, such as when there are fewer than two candidate lines or the parse result is empty, fall back to a `markdown` block.
|
|
34
34
|
- Non-table regions: first promote safe standalone Markdown constructs into richer Block Kit blocks, then add zero-width spaces where needed and generate `markdown` blocks for the remaining text.
|
|
35
35
|
- If `preserve_visual_blank_lines=True`, replace internal blank lines in remaining `markdown` blocks with lines that contain only a non-breaking space before emitting the `markdown` block.
|
|
36
|
+
- A remaining region whose formatted text would exceed Slack's 12,000-character `markdown` block limit is split into multiple `markdown` blocks using the rules in "Markdown block size splitting" below.
|
|
36
37
|
|
|
37
|
-
`convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule
|
|
38
|
+
`convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "Markdown block size splitting".
|
|
38
39
|
`convert_markdown_to_slack_payloads` returns the same split blocks plus preview `text` values ready for `chat.postMessage`.
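The per-message packing described above can be sketched as a greedy pass over blocks with precomputed costs. This is an illustration under assumptions, not the library's code: `BlockCost` and `pack_messages` are hypothetical names, the item and character costs are supplied by the caller, and Slack's separate block-count limit is not modeled.

```python
from dataclasses import dataclass


@dataclass
class BlockCost:
    block: dict   # a Block Kit block such as {"type": "markdown", "text": "..."}
    items: int    # estimated expansion items (1 for non-markdown blocks)
    chars: int    # text characters the block carries


def pack_messages(
    costs: list[BlockCost],
    max_items: int = 50,
    max_chars: int = 13_200,
) -> list[list[dict]]:
    """Greedily group blocks so each message stays under every budget."""
    messages: list[list[dict]] = []
    current: list[dict] = []
    used_items = used_chars = 0
    has_table = False
    for cost in costs:
        is_table = cost.block.get("type") == "table"
        overflow = (
            used_items + cost.items > max_items
            or used_chars + cost.chars > max_chars
            or (is_table and has_table)  # one table per message
        )
        if current and overflow:
            # Close the current message and start a fresh one.
            messages.append(current)
            current, used_items, used_chars, has_table = [], 0, 0, False
        current.append(cost.block)
        used_items += cost.items
        used_chars += cost.chars
        has_table = has_table or is_table
    if current:
        messages.append(current)
    return messages
```

A block that exceeds a budget on its own still gets a message to itself here; in the documented pipeline such a block would already have been split at the block level.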
|
|
39
40
|
|
|
40
41
|
## How Slack behaved in testing
|
|
@@ -84,7 +85,7 @@ Slack still controls when those newer features appear and how they look, so trea
|
|
|
84
85
|
- Slack mention tokens inside a promoted list item are converted to their structured `rich_text` elements — `<@U…>`/`<@W…>` to `user`, `<#C…>`/`<#G…>` to `channel`, `<!subteam^S…>` to `usergroup`, and `<!here>`/`<!channel>`/`<!everyone>` to `broadcast` — since a `rich_text` block does not resolve a raw token. An optional `|label` display suffix is dropped (Slack renders the element from the id).
|
|
85
86
|
- Table-like rows inside fenced code blocks are kept out of table parsing
|
|
86
87
|
- Internal blank lines can optionally be rewritten into placeholder lines so Slack keeps visible paragraph separation
|
|
87
|
-
- Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized
|
|
88
|
+
- Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized in prose, while fenced code blocks and inline code spans keep them verbatim
|
|
88
89
|
|
|
89
90
|
## Slack text cleanup rules
|
|
90
91
|
|
|
@@ -92,9 +93,12 @@ Behavior of `sanitize_slack_text`:
|
|
|
92
93
|
|
|
93
94
|
- Remove ANSI escape sequences
|
|
94
95
|
- Remove general control characters except line breaks and tabs already preserved by the regex
|
|
96
|
+
- Remove this library's reserved internal marker code points (`U+2063`, `U+FFF0`–`U+FFF3`) so input cannot collide with the internal placeholder machinery
|
|
95
97
|
- Keep valid Slack angle-bracket tokens such as links, mentions, channels, special mentions, subteam mentions, and `<!date^...>`
|
|
96
98
|
- Replace unsupported angle-bracket tokens such as `<foo>` with full-width brackets (`＜foo＞`) so Slack does not interpret them as malformed special syntax
|
|
97
99
|
- This also applies to raw HTML-like tags such as `<div>` or `<span>`
|
|
100
|
+
- Angle-token neutralization applies only outside fenced code blocks and inline code spans, so code samples such as `` `<div>` `` reach Slack verbatim; ANSI/control/marker removal applies everywhere because those characters are never legitimate content
|
|
101
|
+
- For this purpose an inline code span is recognized within a single line only, and it closes only on a backtick run of the same length as the opener. A stray unpaired backtick therefore stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`` <foo `bar` baz> ``) is still neutralized as a whole while the span content stays verbatim
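The single-line, equal-length backtick pairing rule can be sketched as follows. This is a simplified illustration, not `sanitize_slack_text` itself: the function names are hypothetical, the `_VALID_PREFIX` check is a rough stand-in for Slack's real token grammar, and fenced code blocks and angle tokens that span a code span are not handled.

```python
import re

_BACKTICK_RUN = re.compile(r"`+")
_ANGLE_TOKEN = re.compile(r"<([^<>\n]*)>")
# Rough stand-in for "valid Slack token": URLs, mailto, @user, #channel, !special.
_VALID_PREFIX = re.compile(r"https?://|mailto:|[@#!]")


def split_code_spans(line: str) -> list[tuple[str, bool]]:
    """Return (segment, is_code) pairs for a single line."""
    segments: list[tuple[str, bool]] = []
    pos = 0      # start of the prose not yet emitted
    search = 0   # where to look for the next opening run
    while True:
        opener = _BACKTICK_RUN.search(line, search)
        if opener is None:
            break
        closer = None
        for candidate in _BACKTICK_RUN.finditer(line, opener.end()):
            if len(candidate.group()) == len(opener.group()):
                closer = candidate
                break
        if closer is None:
            # Unpaired run: it stays literal prose, keep scanning after it.
            search = opener.end()
            continue
        if opener.start() > pos:
            segments.append((line[pos:opener.start()], False))
        segments.append((line[opener.start():closer.end()], True))
        pos = search = closer.end()
    if pos < len(line):
        segments.append((line[pos:], False))
    return segments


def _neutralize(match: re.Match) -> str:
    body = match.group(1)
    if _VALID_PREFIX.match(body):
        return match.group(0)
    return "\uff1c" + body + "\uff1e"  # full-width brackets


def neutralize_line(line: str) -> str:
    """Neutralize invalid angle tokens in prose, leaving code spans verbatim."""
    return "".join(
        seg if is_code else _ANGLE_TOKEN.sub(_neutralize, seg)
        for seg, is_code in split_code_spans(line)
    )
```

Because pairing is done per line, a stray backtick can at most affect the rest of its own line, which is the property the last rule above relies on.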
|
|
98
102
|
|
|
99
103
|
## Underscore emphasis normalization rules
|
|
100
104
|
|
|
@@ -211,6 +215,25 @@ Exception:
|
|
|
211
215
|
| `U+200C` | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
|
|
212
216
|
| `U+200D` | ZWJ (zero-width joiner) | Required for joined emoji and other grapheme composition |
|
|
213
217
|
|
|
218
|
+
## Markdown block size splitting
|
|
219
|
+
|
|
220
|
+
Three Slack-side hard limits were measured against a real workspace on 2026-06-11:
|
|
221
|
+
|
|
222
|
+
- A `markdown` block's `text` accepts exactly 12,000 characters; 12,001 fails the whole `chat.postMessage` call with `msg_too_long`.
|
|
223
|
+
- Slack expands `markdown` blocks server-side into native blocks and enforces "no more than 50 items" on the expanded result per message (`invalid_blocks`). Each heading and each thematic break becomes its own item (50 headings were accepted, 51 rejected; 30 headings in each of two blocks were rejected together), while paragraphs, lists, quotes, and fenced code merge into one item per contiguous run between those breakers (60 blank-separated paragraphs and 52 fences were accepted). Blank lines alone do not split a run.
|
|
224
|
+
- One message's blocks may carry at most 13,200 characters of text in total, which is exactly 1.1 × the single-block limit; 13,201 fails with `msg_blocks_too_long`. The total counts content across block types (an 11,900-character `markdown` block plus a 1,400-character `rich_text` was rejected).
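
The three measured limits can be written down as a small check. This is a sketch under assumptions: the constant and function names are hypothetical, block sizes are passed in directly rather than derived from real Slack payloads, and the order of the returned error names is arbitrary, since the source does not say which error Slack reports first when several limits are exceeded.

```python
MARKDOWN_TEXT_LIMIT = 12_000   # measured limit for one markdown block's text
MESSAGE_TEXT_LIMIT = 13_200    # measured total across one message's blocks
EXPANSION_ITEM_LIMIT = 50      # measured cap on server-side expanded items


def exceeded_limits(blocks: list[tuple[str, int]], expansion_items: int) -> list[str]:
    """Return the error names whose limit a message of these sizes would exceed.

    ``blocks`` holds hypothetical (block_type, text_length) pairs.
    """
    exceeded = []
    if any(kind == "markdown" and size > MARKDOWN_TEXT_LIMIT for kind, size in blocks):
        exceeded.append("msg_too_long")
    if expansion_items > EXPANSION_ITEM_LIMIT:
        exceeded.append("invalid_blocks")
    if sum(size for _, size in blocks) > MESSAGE_TEXT_LIMIT:
        exceeded.append("msg_blocks_too_long")
    return exceeded
```

For the rejected example above, 11,900 + 1,400 = 13,300, which is over the 13,200 total even though the `markdown` block alone is under 12,000.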
|
|
225
|
+
|
|
226
|
+
Long or heading-dense non-table regions are therefore split before delivery:
|
|
227
|
+
|
|
228
|
+
- The whole region is tried as a single block first; splitting happens only when the formatted text exceeds the character limit or the estimated expansion exceeds the per-message item budget
|
|
229
|
+
- Raw content is packed toward targets below the hard limits (11,500 characters, 45 estimated items), because zero-width-space insertion and placeholder lines inflate the formatted text and the item estimate is intentionally conservative
|
|
230
|
+
- Split points prefer paragraph boundaries (blank-line runs outside fenced code); the blank run at a chosen boundary is dropped, since adjacent Slack blocks already render visually separated
|
|
231
|
+
- A single paragraph longer than the budget is split at line boundaries, and a single overlong line at word boundaries, with a hard cut when no space exists (for example dense CJK text)
|
|
232
|
+
- When a cut lands inside an unclosed fenced code block, the continuation block re-opens the fence with the original delimiter line so both halves keep rendering as code
|
|
233
|
+
- Each piece is re-checked after formatting; when it still exceeds a hard limit, the packing budgets shrink and the piece is split again
|
|
234
|
+
- `convert_markdown_to_slack_messages` additionally packs blocks into messages so that the summed expansion estimate stays within the 50-item budget (non-`markdown` blocks count as one item each) and the summed block text stays within the 13,200-character per-message total
|
|
235
|
+
- The top-level fallback `text` field is not subject to the character limit (Slack truncates it instead of rejecting), so preview text is left whole
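
The message-level packing step can be illustrated with a greedy grouping. This is a minimal sketch, not the behavior of `convert_markdown_to_slack_messages` itself: `pack_blocks_into_messages` is a hypothetical name, each block is reduced to an assumed `(expansion_items, text_length)` pair, and the defaults are the hard limits quoted above, whereas the library may pack toward lower targets.

```python
def pack_blocks_into_messages(
    blocks: list[tuple[int, int]],
    max_items: int = 50,
    max_text: int = 13_200,
) -> list[list[tuple[int, int]]]:
    """Greedily group (expansion_items, text_length) pairs into messages."""
    messages: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    items = 0
    text = 0
    for block in blocks:
        block_items, block_text = block
        # Close the current message when this block would break either budget.
        if current and (
            items + block_items > max_items or text + block_text > max_text
        ):
            messages.append(current)
            current, items, text = [], 0, 0
        current.append(block)
        items += block_items
        text += block_text
    if current:
        messages.append(current)
    return messages
```

A block that exceeds a budget on its own still gets a message to itself here; in the described design the block-level splitting is what prevents that case from arising.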
|
|
236
|
+
|
|
214
237
|
## Optional blank-line visibility workaround
|
|
215
238
|
|
|
216
239
|
When `preserve_visual_blank_lines=True` is passed to the main conversion APIs,
|
{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/converter.py
RENAMED
|
@@ -8,6 +8,7 @@ from __future__ import annotations
|
|
|
8
8
|
|
|
9
9
|
import html
|
|
10
10
|
import re
|
|
11
|
+
from collections.abc import Callable, Iterable, Iterator
|
|
11
12
|
from typing import Any
|
|
12
13
|
from urllib.parse import urlparse
|
|
13
14
|
|
|
@@ -22,6 +23,14 @@ ANSI_ESCAPE_PATTERN = re.compile(
|
|
|
22
23
|
r"\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~]|\][^\x1B\x07]*(?:\x07|\x1B\\))"
|
|
23
24
|
)
|
|
24
25
|
CONTROL_CHAR_PATTERN = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]")
|
|
26
|
+
# In-band marker code points reserved by this module's placeholder/spacing
|
|
27
|
+
# machinery: SYNTH_SPACE_MARKER (U+2063), the inline-code placeholder
|
|
28
|
+
# delimiters (U+FFF0/U+FFF1), and the ZWSP-strip markers (U+FFF2/U+FFF3).
|
|
29
|
+
# They have no legitimate use in chat text, and input that carries them would
|
|
30
|
+
# collide with internal placeholders (a stray ``\ufff0code0\ufff1`` either
|
|
31
|
+
# crashes restoration with KeyError or gets substituted with another span's
|
|
32
|
+
# content), so they are removed up front together with control characters.
|
|
33
|
+
INTERNAL_MARKER_CHAR_PATTERN = re.compile("[\u2063\ufff0-\ufff3]")
|
|
25
34
|
SLACK_ANGLE_TOKEN_PATTERN = re.compile(r"<[^>\n]+>")
|
|
26
35
|
BARE_URL_PATTERN = re.compile(r"https?://[^\s<]+", re.IGNORECASE)
|
|
27
36
|
FENCE_OPEN_PATTERN = re.compile(r"^[ \t]{0,3}(`{3,}|~{3,})([^\n]*)$")
|
|
@@ -107,13 +116,47 @@ ALLOWED_SLACK_ANGLE_TOKEN_PATTERNS = (
|
|
|
107
116
|
re.compile(r"^<!date\^[^>\n]+>$"),
|
|
108
117
|
)
|
|
109
118
|
SLACK_MAX_BLOCKS_PER_MESSAGE = 50
|
|
119
|
+
# Verified against a real Slack workspace (2026-06-11): a ``markdown`` block's
|
|
120
|
+
# ``text`` accepts exactly 12,000 characters, while 12,001 fails the whole
|
|
121
|
+
# chat.postMessage call with ``msg_too_long``. The top-level fallback ``text``
|
|
122
|
+
# field is not subject to this limit (40,001 characters was accepted).
|
|
123
|
+
SLACK_MAX_MARKDOWN_TEXT_LENGTH = 12000
|
|
124
|
+
# Raw-content packing target used when an oversized markdown segment is split.
|
|
125
|
+
# Formatting inflates text (ZWSP padding, NBSP blank-line placeholders), so
|
|
126
|
+
# pieces are packed below the hard limit; the block builder re-splits any
|
|
127
|
+
# piece whose *formatted* text still exceeds the hard limit.
|
|
128
|
+
_MARKDOWN_SPLIT_TARGET_LENGTH = 11500
|
|
129
|
+
# Slack expands a ``markdown`` block server-side into native blocks and then
|
|
130
|
+
# enforces "no more than 50 items" on the expanded result *per message*.
|
|
131
|
+
# Measured against a real workspace (2026-06-11): each heading and each
|
|
132
|
+
# thematic break becomes its own item (50 headings accepted, 51 rejected;
|
|
133
|
+
# 30 headings in each of two blocks rejected), while paragraphs, lists,
|
|
134
|
+
# quotes, and fenced code merge into one item per run between those breakers
|
|
135
|
+
# (60 blank-separated paragraphs and 52 fences were accepted).
|
|
136
|
+
SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE = 50
|
|
137
|
+
# Per-block packing target, leaving headroom for estimation error.
|
|
138
|
+
_MARKDOWN_EXPANSION_ITEMS_TARGET = 45
|
|
139
|
+
# Third measured hard limit (2026-06-11): the text carried by one message's
|
|
140
|
+
# blocks *in total* — across block types; rich_text content counts too — may
|
|
141
|
+
# not exceed 13,200 characters (= 1.1 × the single-block limit). 13,201
|
|
142
|
+
# fails the whole call with ``msg_blocks_too_long``.
|
|
143
|
+
SLACK_MAX_MESSAGE_BLOCKS_TEXT_LENGTH = 13200
|
|
144
|
+
# Packing target, leaving headroom for structural fields the size proxy may
|
|
145
|
+
# not count exactly.
|
|
146
|
+
_MESSAGE_BLOCKS_TEXT_TARGET = 12800
|
|
147
|
+
ATX_HEADING_PATTERN = re.compile(r"^[ \t]{0,3}#{1,6}(?:[ \t]|$)")
|
|
110
148
|
|
|
111
149
|
|
|
112
150
|
def decode_html_entities(text: str) -> str:
|
|
113
|
-
"""Decode HTML entities
|
|
114
|
-
|
|
151
|
+
"""Decode HTML entities in prose while leaving code regions verbatim.
|
|
152
|
+
|
|
153
|
+
Prose entities such as ``&gt;`` are decoded for natural reading, but a
|
|
154
|
+
fenced code block or inline code span showing ``&amp;`` keeps the literal
|
|
155
|
+
entity: code samples are content, not markup to repair.
|
|
156
|
+
"""
|
|
157
|
+
if not text or "&" not in text:
|
|
115
158
|
return text
|
|
116
|
-
return html.unescape
|
|
159
|
+
return _transform_outside_code_regions(text, html.unescape)
|
|
117
160
|
|
|
118
161
|
|
|
119
162
|
def strip_zero_width_spaces(text: str) -> str:
|
|
@@ -566,15 +609,90 @@ def _is_allowed_slack_angle_token(token: str) -> bool:
|
|
|
566
609
|
|
|
567
610
|
|
|
568
611
|
def _find_inline_code_span_end(text: str, start: int) -> int | None:
|
|
612
|
+
"""Find the end of the inline code span opened at ``start``.
|
|
613
|
+
|
|
614
|
+
Per CommonMark, a code span closes only with a backtick run of *equal*
|
|
615
|
+
length: a lone `` ` `` must not pair with the first backtick of a later
|
|
616
|
+
``` `` ``` run. Runs of a different length are skipped whole.
|
|
617
|
+
"""
|
|
569
618
|
delimiter_end = start
|
|
570
619
|
while delimiter_end < len(text) and text[delimiter_end] == "`":
|
|
571
620
|
delimiter_end += 1
|
|
621
|
+
delimiter_length = delimiter_end - start
|
|
572
622
|
|
|
573
|
-
|
|
574
|
-
|
|
575
|
-
|
|
576
|
-
|
|
577
|
-
|
|
623
|
+
cursor = delimiter_end
|
|
624
|
+
while True:
|
|
625
|
+
closing = text.find("`", cursor)
|
|
626
|
+
if closing == -1:
|
|
627
|
+
return None
|
|
628
|
+
run_end = closing
|
|
629
|
+
while run_end < len(text) and text[run_end] == "`":
|
|
630
|
+
run_end += 1
|
|
631
|
+
if run_end - closing == delimiter_length:
|
|
632
|
+
return run_end
|
|
633
|
+
cursor = run_end
|
|
634
|
+
|
|
635
|
+
|
|
636
|
+
def _transform_outside_inline_code(text: str, transform: Callable[[str], str]) -> str:
|
|
637
|
+
"""Apply ``transform`` to text while keeping inline code spans verbatim.
|
|
638
|
+
|
|
639
|
+
Spans are bounded to a single line, matching this module's span model
|
|
640
|
+
(``INLINE_CODE_SPAN_PATTERN``). Without that bound, one stray backtick
|
|
641
|
+
would pair with a backtick on a much later line and silently suppress
|
|
642
|
+
sanitization for everything in between.
|
|
643
|
+
|
|
644
|
+
Spans are replaced with placeholder tokens (which contain no backticks or
|
|
645
|
+
angle brackets) rather than split out, so the transform still sees any
|
|
646
|
+
construct that *spans* a code span — e.g. an invalid angle token such as
|
|
647
|
+
``<foo `bar` baz>`` is neutralized as a whole while the span content
|
|
648
|
+
itself stays verbatim. Reserved marker code points are stripped from the
|
|
649
|
+
input first, so crafted input cannot collide with the placeholders.
|
|
650
|
+
"""
|
|
651
|
+
text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
|
|
652
|
+
|
|
653
|
+
spans: list[str] = []
|
|
654
|
+
parts: list[str] = []
|
|
655
|
+
plain_start = 0
|
|
656
|
+
cursor = text.find("`")
|
|
657
|
+
|
|
658
|
+
while cursor != -1:
|
|
659
|
+
span_end = _find_inline_code_span_end(text, cursor)
|
|
660
|
+
if span_end is None or "\n" in text[cursor:span_end]:
|
|
661
|
+
# No same-line closing run: the backticks are literal text.
|
|
662
|
+
delimiter_end = cursor
|
|
663
|
+
while delimiter_end < len(text) and text[delimiter_end] == "`":
|
|
664
|
+
delimiter_end += 1
|
|
665
|
+
cursor = text.find("`", delimiter_end)
|
|
666
|
+
continue
|
|
667
|
+
parts.append(text[plain_start:cursor])
|
|
668
|
+
parts.append(f"\ufff0code{len(spans)}\ufff1")
|
|
669
|
+
spans.append(text[cursor:span_end])
|
|
670
|
+
plain_start = span_end
|
|
671
|
+
cursor = text.find("`", span_end)
|
|
672
|
+
|
|
673
|
+
parts.append(text[plain_start:])
|
|
674
|
+
transformed = transform("".join(parts))
|
|
675
|
+
|
|
676
|
+
if not spans:
|
|
677
|
+
return transformed
|
|
678
|
+
placeholder_map = {f"\ufff0code{idx}\ufff1": span for idx, span in enumerate(spans)}
|
|
679
|
+
return INLINE_CODE_PLACEHOLDER_PATTERN.sub(
|
|
680
|
+
lambda match: placeholder_map.get(match.group(0), match.group(0)),
|
|
681
|
+
transformed,
|
|
682
|
+
)
|
|
683
|
+
|
|
684
|
+
|
|
685
|
+
def _transform_outside_code_regions(text: str, transform: Callable[[str], str]) -> str:
|
|
686
|
+
"""Apply ``transform`` outside fenced code blocks and inline code spans.
|
|
687
|
+
|
|
688
|
+
Code samples must reach Slack verbatim: neither Slack's ``markdown`` block
|
|
689
|
+
renderer nor ``rich_text_preformatted`` interprets their content, so any
|
|
690
|
+
rewrite inside a code region is visible corruption.
|
|
691
|
+
"""
|
|
692
|
+
return "".join(
|
|
693
|
+
chunk if is_fenced else _transform_outside_inline_code(chunk, transform)
|
|
694
|
+
for is_fenced, chunk in _split_fenced_code_chunks(text)
|
|
695
|
+
)
|
|
578
696
|
|
|
579
697
|
|
|
580
698
|
def _is_punctuation_like(char: str, boundary_chars: set[str]) -> bool:
|
|
@@ -690,12 +808,21 @@ def normalize_bare_urls_for_slack_markdown(text: str) -> str:
|
|
|
690
808
|
|
|
691
809
|
|
|
692
810
|
def sanitize_slack_text(text: str) -> str:
|
|
693
|
-
"""Remove control noise and neutralize invalid Slack angle tokens.
|
|
811
|
+
"""Remove control noise and neutralize invalid Slack angle tokens.
|
|
812
|
+
|
|
813
|
+
ANSI escapes, control characters, and this module's reserved in-band
|
|
814
|
+
marker code points are removed everywhere — including code regions —
|
|
815
|
+
because they are never legitimate visible content. Angle-token
|
|
816
|
+
neutralization rewrites visible text, so it skips fenced code blocks and
|
|
817
|
+
inline code spans: a code sample containing ``<div>`` must reach Slack
|
|
818
|
+
verbatim.
|
|
819
|
+
"""
|
|
694
820
|
if not text:
|
|
695
821
|
return text
|
|
696
822
|
|
|
697
823
|
cleaned = ANSI_ESCAPE_PATTERN.sub("", text)
|
|
698
824
|
cleaned = CONTROL_CHAR_PATTERN.sub("", cleaned)
|
|
825
|
+
cleaned = INTERNAL_MARKER_CHAR_PATTERN.sub("", cleaned)
|
|
699
826
|
|
|
700
827
|
def replace_invalid_token(match: re.Match[str]) -> str:
|
|
701
828
|
token = match.group(0)
|
|
@@ -703,7 +830,10 @@ def sanitize_slack_text(text: str) -> str:
|
|
|
703
830
|
return token
|
|
704
831
|
return f"＜{token[1:-1]}＞"
|
|
705
832
|
|
|
706
|
-
|
|
833
|
+
def neutralize_angle_tokens(segment: str) -> str:
|
|
834
|
+
return SLACK_ANGLE_TOKEN_PATTERN.sub(replace_invalid_token, segment)
|
|
835
|
+
|
|
836
|
+
return _transform_outside_code_regions(cleaned, neutralize_angle_tokens)
|
|
707
837
|
|
|
708
838
|
|
|
709
839
|
def _match_fence_open(line: str) -> tuple[str, int] | None:
|
|
@@ -723,38 +853,58 @@ def _is_fence_close(line: str, fence: tuple[str, int]) -> bool:
|
|
|
723
853
|
)
|
|
724
854
|
|
|
725
855
|
|
|
856
|
+
def _iter_fence_states(lines: Iterable[str]) -> Iterator[tuple[str, bool, bool]]:
|
|
857
|
+
"""Yield ``(line, is_fenced, is_opening)`` for each line.
|
|
858
|
+
|
|
859
|
+
Single source of truth for fenced-code tracking across this module.
|
|
860
|
+
``is_fenced`` covers the fence delimiter lines themselves and the body of
|
|
861
|
+
an unclosed trailing fence; ``is_opening`` marks the opening delimiter
|
|
862
|
+
line so callers can flush per-fence state. Works with or without trailing
|
|
863
|
+
newlines on the lines.
|
|
864
|
+
"""
|
|
865
|
+
active_fence: tuple[str, int] | None = None
|
|
866
|
+
for line in lines:
|
|
867
|
+
if active_fence is None:
|
|
868
|
+
opening_fence = _match_fence_open(line)
|
|
869
|
+
if opening_fence is not None:
|
|
870
|
+
active_fence = opening_fence
|
|
871
|
+
yield line, True, True
|
|
872
|
+
continue
|
|
873
|
+
yield line, False, False
|
|
874
|
+
continue
|
|
875
|
+
yield line, True, False
|
|
876
|
+
if _is_fence_close(line, active_fence):
|
|
877
|
+
active_fence = None
|
|
878
|
+
|
|
879
|
+
|
|
726
880
|
def _split_fenced_code_chunks(text: str) -> list[tuple[bool, str]]:
|
|
727
881
|
chunks: list[tuple[bool, str]] = []
|
|
728
882
|
if not text:
|
|
729
883
|
return chunks
|
|
730
884
|
|
|
731
885
|
current: list[str] = []
|
|
732
|
-
|
|
886
|
+
current_is_fenced = False
|
|
733
887
|
|
|
734
|
-
for line in
|
|
735
|
-
|
|
736
|
-
|
|
737
|
-
if
|
|
738
|
-
|
|
739
|
-
chunks.append((False, "".join(current)))
|
|
740
|
-
current = []
|
|
741
|
-
current.append(line)
|
|
742
|
-
active_fence = opening_fence
|
|
743
|
-
continue
|
|
744
|
-
|
|
745
|
-
current.append(line)
|
|
746
|
-
if active_fence and _is_fence_close(line, active_fence):
|
|
747
|
-
chunks.append((True, "".join(current)))
|
|
888
|
+
for line, is_fenced, is_opening in _iter_fence_states(
|
|
889
|
+
text.splitlines(keepends=True)
|
|
890
|
+
):
|
|
891
|
+
if current and (is_opening or is_fenced != current_is_fenced):
|
|
892
|
+
chunks.append((current_is_fenced, "".join(current)))
|
|
748
893
|
current = []
|
|
749
|
-
|
|
894
|
+
current.append(line)
|
|
895
|
+
current_is_fenced = is_fenced
|
|
750
896
|
|
|
751
897
|
if current:
|
|
752
|
-
chunks.append((
|
|
898
|
+
chunks.append((current_is_fenced, "".join(current)))
|
|
753
899
|
|
|
754
900
|
return chunks
|
|
755
901
|
|
|
756
902
|
|
|
757
903
|
def _normalize_underscore_emphasis_chunk(text: str) -> str:
|
|
904
|
+
# Same defense as _format_markdown_with_spacing_metadata: reserved marker
|
|
905
|
+
# code points in direct-call input must not collide with the numbered
|
|
906
|
+
# placeholders below.
|
|
907
|
+
text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
|
|
758
908
|
protected_spans: list[str] = []
|
|
759
909
|
|
|
760
910
|
def protect(match: re.Match[str]) -> str:
|
|
@@ -798,6 +948,11 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
|
|
|
798
948
|
if not text:
|
|
799
949
|
return text, []
|
|
800
950
|
|
|
951
|
+
# Defense in depth for direct calls that bypass sanitize_slack_text: input
|
|
952
|
+
# carrying our reserved marker code points would collide with the inline
|
|
953
|
+
# placeholder machinery below.
|
|
954
|
+
text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
|
|
955
|
+
|
|
801
956
|
boundary_chars = {*VISIBLE_BOUNDARY_CHARS, ZWSP, SYNTH_SPACE_MARKER}
|
|
802
957
|
|
|
803
958
|
def wrap_match(match: re.Match[str], source: str) -> str:
|
|
@@ -871,9 +1026,16 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
|
|
|
871
1026
|
before_char = source[start - 1] if start > 0 else ""
|
|
872
1027
|
after_char = source[end] if end < len(source) else ""
|
|
873
1028
|
strategy = _nested_code_space_strategy(source, start, end, boundary_chars)
|
|
1029
|
+
|
|
1030
|
+
def resolve_placeholder_raw(placeholder_match: re.Match[str]) -> str:
|
|
1031
|
+
# Unknown placeholder-shaped sequences pass through unchanged
|
|
1032
|
+
# (belt-and-braces against in-band collisions; the markers are
|
|
1033
|
+
# already stripped at every entry point).
|
|
1034
|
+
entry = replacements.get(placeholder_match.group(0))
|
|
1035
|
+
return entry["raw"] if entry else placeholder_match.group(0)
|
|
1036
|
+
|
|
874
1037
|
resolved_text = INLINE_CODE_PLACEHOLDER_PATTERN.sub(
|
|
875
|
-
|
|
876
|
-
match.group(0),
|
|
1038
|
+
resolve_placeholder_raw, match.group(0)
|
|
877
1039
|
)
|
|
878
1040
|
has_ascii_word = bool(re.search(r"[A-Za-z0-9]", resolved_text))
|
|
879
1041
|
adjusted_text = match.group(0)
|
|
@@ -981,11 +1143,12 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
|
|
|
981
1143
|
protected_segment,
|
|
982
1144
|
)
|
|
983
1145
|
|
|
1146
|
+
def restore_placeholder(placeholder_match: re.Match[str]) -> str:
|
|
1147
|
+
entry = placeholder_map.get(placeholder_match.group(0))
|
|
1148
|
+
return entry["wrapped"] if entry else placeholder_match.group(0)
|
|
1149
|
+
|
|
984
1150
|
protected_segment = INLINE_CODE_PLACEHOLDER_PATTERN.sub(
|
|
985
|
-
|
|
986
|
-
"wrapped"
|
|
987
|
-
],
|
|
988
|
-
protected_segment,
|
|
1151
|
+
restore_placeholder, protected_segment
|
|
989
1152
|
)
|
|
990
1153
|
|
|
991
1154
|
protected_segment = re.sub(
|
|
@@ -1275,20 +1438,11 @@ def normalize_markdown_tables(markdown_text: str) -> str:
|
|
|
1275
1438
|
normalized.extend(buffer)
|
|
1276
1439
|
buffer = []
|
|
1277
1440
|
|
|
1278
|
-
|
|
1279
|
-
|
|
1280
|
-
|
|
1281
|
-
|
|
1282
|
-
if opening_fence:
|
|
1283
|
-
flush_buffer()
|
|
1284
|
-
normalized.append(line)
|
|
1285
|
-
active_fence = opening_fence
|
|
1286
|
-
continue
|
|
1287
|
-
|
|
1288
|
-
if active_fence:
|
|
1441
|
+
for idx, (line, is_fenced, is_opening) in enumerate(_iter_fence_states(lines)):
|
|
1442
|
+
if is_fenced:
|
|
1443
|
+
if is_opening:
|
|
1444
|
+
flush_buffer()
|
|
1289
1445
|
normalized.append(line)
|
|
1290
|
-
if _is_fence_close(line, active_fence):
|
|
1291
|
-
active_fence = None
|
|
1292
1446
|
continue
|
|
1293
1447
|
|
|
1294
1448
|
stripped = line.strip()
|
|
@@ -1608,9 +1762,283 @@ def _create_markdown_block(
|
|
|
1608
1762
|
block._plain_text = plain_text
|
|
1609
1763
|
block._synthetic_space_indices = synthetic_indices
|
|
1610
1764
|
block._synthetic_blank_line_indices = synthetic_blank_line_indices
|
|
1765
|
+
block._expansion_items = _estimate_markdown_expansion_items(formatted)
|
|
1611
1766
|
return block
|
|
1612
1767
|
|
|
1613
1768
|
|
|
1769
|
+
def _is_markdown_expansion_breaker_line(line: str) -> bool:
|
|
1770
|
+
"""Return True for lines Slack expands into their own top-level block.
|
|
1771
|
+
|
|
1772
|
+
ATX headings and thematic breaks each become one expansion item and end
|
|
1773
|
+
the surrounding content run. A setext ``===`` underline is counted too;
|
|
1774
|
+
that can over-count by one (heading text + underline), which only makes
|
|
1775
|
+
the estimate conservative.
|
|
1776
|
+
"""
|
|
1777
|
+
if ATX_HEADING_PATTERN.match(line):
|
|
1778
|
+
return True
|
|
1779
|
+
if _is_thematic_break_line(line):
|
|
1780
|
+
return True
|
|
1781
|
+
stripped = line.strip()
|
|
1782
|
+
return bool(stripped) and set(stripped) == {"="}
|
|
1783
|
+
|
|
1784
|
+
|
|
1785
|
+
def _estimate_markdown_expansion_items(text: str) -> int:
|
|
1786
|
+
"""Estimate how many native blocks Slack expands this markdown text into.
|
|
1787
|
+
|
|
1788
|
+
Model measured against a real workspace (see the constants above): each
|
|
1789
|
+
heading / thematic break is one item, and each maximal run of any other
|
|
1790
|
+
content *between* those breakers is one item — blank lines inside a run
|
|
1791
|
+
do not split it. Fenced lines always count as run content.
|
|
1792
|
+
"""
|
|
1793
|
+
items = 0
|
|
1794
|
+
in_run = False
|
|
1795
|
+
for line, is_fenced, _ in _iter_fence_states(text.split("\n")):
|
|
1796
|
+
if is_fenced:
|
|
1797
|
+
if not in_run:
|
|
1798
|
+
items += 1
|
|
1799
|
+
in_run = True
|
|
1800
|
+
continue
|
|
1801
|
+
if not line.strip():
|
|
1802
|
+
continue
|
|
1803
|
+
if _is_markdown_expansion_breaker_line(line):
|
|
1804
|
+
items += 1
|
|
1805
|
+
in_run = False
|
|
1806
|
+
continue
|
|
1807
|
+
if not in_run:
|
|
1808
|
+
items += 1
|
|
1809
|
+
in_run = True
|
|
1810
|
+
return max(1, items)
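
A worked example may make the counting model easier to check by hand. The sketch below is a simplified stand-in, not the function in this diff: `estimate_items` and its three regexes are assumptions written for illustration, fence tracking is reduced to a minimal rule, and the setext `===` underline case is left out.

```python
import re

HEADING = re.compile(r"^[ \t]{0,3}#{1,6}(?:[ \t]|$)")
THEMATIC_BREAK = re.compile(r"^[ \t]{0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")
FENCE = re.compile(r"^[ \t]{0,3}(`{3,}|~{3,})")


def estimate_items(text: str) -> int:
    """Count headings/breaks as one item each and every other run as one item."""
    items = 0
    in_run = False
    fence = None  # opening fence marker while inside fenced code
    for line in text.split("\n"):
        stripped = line.strip()
        if fence is None:
            opening = FENCE.match(line)
            fenced = opening is not None
            if opening:
                fence = opening.group(1)
        else:
            fenced = True
            # Simplified close rule: same character, at least as long.
            if stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
                fence = None
        if fenced:
            # Fenced lines always count as run content.
            if not in_run:
                items += 1
                in_run = True
            continue
        if not stripped:
            continue  # blank lines do not split a run
        if HEADING.match(line) or THEMATIC_BREAK.match(line):
            items += 1
            in_run = False
            continue
        if not in_run:
            items += 1
            in_run = True
    return max(1, items)
```

With this model, `"# A\n\ntext\n\nmore text\n\n## B\n\n---\n\ntail"` counts five items: two headings, one thematic break, and two content runs.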
|
|
1811
|
+
|
|
1812
|
+
|
|
1813
|
+
def _split_text_at_blank_lines(text: str, max_length: int, max_items: int) -> list[str]:
|
|
1814
|
+
"""Greedily pack paragraph units into pieces within both budgets.
|
|
1815
|
+
|
|
1816
|
+
Units are separated by blank-line runs outside fenced code (blank lines
|
|
1817
|
+
inside a fence never split). The blank run at a piece boundary is dropped
|
|
1818
|
+
— adjacent Slack blocks already render separated — while blank runs
|
|
1819
|
+
packed inside a piece are kept verbatim. A piece is closed when adding
|
|
1820
|
+
the next unit would exceed ``max_length`` characters or ``max_items``
|
|
1821
|
+
estimated expansion items. A single unit over either budget is returned
|
|
1822
|
+
oversized; the caller splits it harder.
|
|
1823
|
+
"""
|
|
1824
|
+
if (
|
|
1825
|
+
len(text) <= max_length
|
|
1826
|
+
and _estimate_markdown_expansion_items(text) <= max_items
|
|
1827
|
+
):
|
|
1828
|
+
return [text]
|
|
1829
|
+
|
|
1830
|
+
units: list[list[str]] = []
|
|
1831
|
+
content: list[str] = []
|
|
1832
|
+
blanks: list[str] = []
|
|
1833
|
+
for line, is_fenced, _ in _iter_fence_states(text.split("\n")):
|
|
1834
|
+
if not is_fenced and not line.strip():
|
|
1835
|
+
if content:
|
|
1836
|
+
blanks.append(line)
|
|
1837
|
+
else:
|
|
1838
|
+
# Leading blank lines stay attached to the first unit.
|
|
1839
|
+
content.append(line)
|
|
1840
|
+
continue
|
|
1841
|
+
if blanks:
|
|
1842
|
+
units.append(content)
|
|
1843
|
+
units.append(blanks)
|
|
1844
|
+
content, blanks = [], []
|
|
1845
|
+
content.append(line)
|
|
1846
|
+
if content:
|
|
1847
|
+
units.append(content)
|
|
1848
|
+
if blanks:
|
|
1849
|
+
units.append(blanks)
|
|
1850
|
+
|
|
1851
|
+
pieces: list[str] = []
|
|
1852
|
+
current: list[str] = []
|
|
1853
|
+
pending_blanks: list[str] = []
|
|
1854
|
+
for index in range(0, len(units), 2):
|
|
1855
|
+
unit = units[index]
|
|
1856
|
+
candidate = current + (pending_blanks if current else []) + unit
|
|
1857
|
+
candidate_text = "\n".join(candidate)
|
|
1858
|
+
if current and (
|
|
1859
|
+
len(candidate_text) > max_length
|
|
1860
|
+
or _estimate_markdown_expansion_items(candidate_text) > max_items
|
|
1861
|
+
):
|
|
1862
|
+
pieces.append("\n".join(current))
|
|
1863
|
+
current = list(unit)
|
|
1864
|
+
else:
|
|
1865
|
+
current = candidate
|
|
1866
|
+
pending_blanks = units[index + 1] if index + 1 < len(units) else []
|
|
1867
|
+
if current:
|
|
1868
|
+
pieces.append("\n".join(current))
|
|
1869
|
+
return pieces
|
|
1870
|
+
|
|
1871
|
+
|
|
1872
|
+
def _split_single_line_to_length(line: str, max_length: int) -> list[str]:
|
|
1873
|
+
"""Split one overlong line, preferring a space boundary near the limit."""
|
|
1874
|
+
parts: list[str] = []
|
|
1875
|
+
while len(line) > max_length:
|
|
1876
|
+
cut = line.rfind(" ", 1, max_length + 1)
|
|
1877
|
+
if cut <= 0:
|
|
1878
|
+
parts.append(line[:max_length])
|
|
1879
|
+
line = line[max_length:]
|
|
1880
|
+
else:
|
|
1881
|
+
parts.append(line[:cut])
|
|
1882
|
+
line = line[cut + 1 :]
|
|
1883
|
+
parts.append(line)
|
|
1884
|
+
return parts
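
The word-boundary behavior is easiest to see on small inputs. The block below repeats the function from this diff with indentation restored, under the assumed public name `split_single_line_to_length`, so it runs on its own.

```python
def split_single_line_to_length(line: str, max_length: int) -> list[str]:
    """Split one overlong line, preferring a space boundary near the limit."""
    parts: list[str] = []
    while len(line) > max_length:
        # Last space at index 1..max_length; index 0 is skipped so a piece
        # is never empty.
        cut = line.rfind(" ", 1, max_length + 1)
        if cut <= 0:
            # No usable space (for example dense CJK text): hard cut.
            parts.append(line[:max_length])
            line = line[max_length:]
        else:
            # Cut at the space and drop it from both pieces.
            parts.append(line[:cut])
            line = line[cut + 1 :]
    parts.append(line)
    return parts
```

Note that the space at a chosen cut is discarded, so joining the pieces with `""` does not reproduce the original line, while a hard cut loses nothing.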
|
|
1885
|
+
|
|
1886
|
+
|
|
1887
|
+
def _split_lines_to_length(text: str, max_length: int, max_items: int) -> list[str]:
|
|
1888
|
+
"""Split at line boundaries into pieces within both budgets.
|
|
1889
|
+
|
|
1890
|
+
Last-resort splitter for content that exceeds a budget without blank-line
|
|
1891
|
+
split points (a single huge paragraph, a fence body, or a long run of
|
|
1892
|
+
headings). When the cut lands inside an (unclosed) fence, the continuation
|
|
1893
|
+
piece re-opens the fence with the original delimiter line so both pieces
|
|
1894
|
+
keep rendering as code.
|
|
1895
|
+
"""
|
|
1896
|
+
pieces: list[str] = []
|
|
1897
|
+
current: list[str] = []
|
|
1898
|
+
current_len = 0
|
|
1899
|
+
current_items = 0
|
|
1900
|
+
in_run = False
|
|
1901
|
+
active_fence_open: str | None = None
|
|
1902
|
+
|
|
1903
|
+
def flush(next_fence_prefix: str | None) -> None:
|
|
1904
|
+
nonlocal current, current_len, current_items, in_run
|
|
1905
|
+
if current:
|
|
1906
|
+
pieces.append("\n".join(current))
|
|
1907
|
+
if next_fence_prefix:
|
|
1908
|
+
current = [next_fence_prefix]
|
|
1909
|
+
current_len = len(next_fence_prefix)
|
|
1910
|
+
current_items = 1
|
|
1911
|
+
in_run = True
|
|
1912
|
+
else:
|
|
1913
|
+
current = []
|
|
1914
|
+
current_len = 0
|
|
1915
|
+
current_items = 0
|
|
1916
|
+
in_run = False
|
|
1917
|
+
|
|
1918
|
+
for line, is_fenced, is_opening in _iter_fence_states(text.split("\n")):
|
|
1919
|
+
if is_opening:
|
|
1920
|
+
active_fence_open = line
|
|
1921
|
+
elif not is_fenced:
|
|
1922
|
+
active_fence_open = None
|
|
1923
|
+
|
|
1924
|
+
line_is_blank = not is_fenced and not line.strip()
|
|
1925
|
+
line_is_breaker = (
|
|
1926
|
+
not is_fenced
|
|
1927
|
+
and not line_is_blank
|
|
1928
|
+
and _is_markdown_expansion_breaker_line(line)
|
|
1929
|
+
)
|
|
1930
|
+
|
|
1931
|
+
for part_index, part in enumerate(
|
|
1932
|
+
[line]
|
|
1933
|
+
if len(line) <= max_length
|
|
1934
|
+
else _split_single_line_to_length(line, max_length)
|
|
1935
|
+
):
|
|
1936
|
+
# Word-split continuations of a breaker line render as plain
|
|
1937
|
+
# content, so only the first part keeps the breaker class.
|
|
1938
|
+
is_breaker = line_is_breaker and part_index == 0
|
|
1939
|
+
if line_is_blank:
|
|
1940
|
+
part_items = 0
|
|
1941
|
+
elif is_breaker:
|
|
1942
|
+
part_items = 1
|
|
1943
|
+
else:
|
|
1944
|
+
part_items = 0 if in_run else 1
|
|
1945
|
+
|
|
1946
|
+
added = len(part) + (1 if current else 0)
|
|
1947
|
+
if current and (
|
|
1948
|
+
current_len + added > max_length
|
|
1949
|
+
or current_items + part_items > max_items
|
|
1950
|
+
):
|
|
1951
|
+
reopen = active_fence_open if active_fence_open != part else None
|
|
1952
|
+
flush(reopen)
|
|
1953
|
+
added = len(part) + (1 if current else 0)
|
|
1954
|
+
if line_is_blank:
|
|
1955
|
+
part_items = 0
|
|
1956
|
+
elif is_breaker:
|
|
1957
|
+
part_items = 1
|
|
1958
|
+
else:
|
|
1959
|
+
part_items = 0 if in_run else 1
|
|
1960
|
+
current.append(part)
|
|
1961
|
+
current_len += added
|
|
1962
|
+
current_items += part_items
|
|
1963
|
+
if is_breaker:
|
|
1964
|
+
in_run = False
|
|
1965
|
+
elif not line_is_blank:
|
|
1966
|
+
in_run = True
|
|
1967
|
+
|
|
1968
|
+
if current:
|
|
1969
|
+
pieces.append("\n".join(current))
|
|
1970
|
+
return pieces
|
|
1971
|
+
|
|
1972
|
+
|
|
1973
|
+
def _markdown_block_fits_slack_limits(block: dict[str, Any]) -> bool:
|
|
1974
|
+
return (
|
|
1975
|
+
len(block["text"]) <= SLACK_MAX_MARKDOWN_TEXT_LENGTH
|
|
1976
|
+
and _estimate_markdown_expansion_items(block["text"])
|
|
1977
|
+
<= SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE
|
|
1978
|
+
)
|
|
1979
|
+
|
|
1980
|
+
|
|
1981
|
+
def _create_markdown_blocks(
|
|
1982
|
+
content: str, *, preserve_visual_blank_lines: bool = False
|
|
1983
|
+
) -> list[dict[str, Any]]:
|
|
1984
|
+
"""Build ``markdown`` blocks that fit Slack's measured hard limits.
|
|
1985
|
+
|
|
1986
|
+
The whole content is tried as a single block first. Only when the
|
|
1987
|
+
*formatted* text exceeds ``SLACK_MAX_MARKDOWN_TEXT_LENGTH`` or the
|
|
1988
|
+
estimated server-side expansion exceeds the per-message item budget is
|
|
1989
|
+
the raw content split — at paragraph boundaries when possible, then at
|
|
1990
|
+
line/word boundaries — and each piece re-checked after formatting,
|
|
1991
|
+
shrinking the packing budget geometrically until every block fits.
|
|
1992
|
+
"""
|
|
1993
|
+
|
|
1994
|
+
def build(piece: str) -> dict[str, Any] | None:
|
|
1995
|
+
return _create_markdown_block(
|
|
1996
|
+
piece, preserve_visual_blank_lines=preserve_visual_blank_lines
|
|
1997
|
+
)
|
|
1998
|
+
|
|
1999
|
+
whole = build(content)
|
|
2000
|
+
if whole is None:
|
|
2001
|
+
return []
|
|
2002
|
+
if _markdown_block_fits_slack_limits(whole):
|
|
2003
|
+
return [whole]
|
|
2004
|
+
|
|
2005
|
+
blocks: list[dict[str, Any]] = []
|
|
2006
|
+
worklist: list[tuple[str, int, int]] = [
|
|
2007
|
+
(content, _MARKDOWN_SPLIT_TARGET_LENGTH, _MARKDOWN_EXPANSION_ITEMS_TARGET)
|
|
2008
|
+
]
|
|
2009
|
+
while worklist:
|
|
2010
|
+
piece, budget, items_budget = worklist.pop(0)
|
|
2011
|
+
block = build(piece)
|
|
2012
|
+
if block is None:
|
|
2013
|
+
continue
|
|
2014
|
+
if _markdown_block_fits_slack_limits(block):
|
|
2015
|
+
blocks.append(block)
|
|
2016
|
+
continue
|
|
2017
|
+
|
|
2018
|
+
sub_pieces = _split_text_at_blank_lines(piece, budget, items_budget)
|
|
2019
|
+
if len(sub_pieces) == 1:
|
|
2020
|
+
sub_pieces = _split_lines_to_length(piece, budget, items_budget)
|
|
2021
|
+
if len(sub_pieces) > 1:
|
|
2022
|
+
worklist = [(sub, budget, items_budget) for sub in sub_pieces] + worklist
|
|
2023
|
+
continue
|
|
2024
|
+
if budget > 256 or items_budget > 8:
|
|
2025
|
+
# The piece fits the raw budgets but its *formatted* text overflows
|
|
2026
|
+
# (ZWSP/NBSP inflation or estimation drift): shrink and retry.
|
|
2027
|
+
worklist.insert(
|
|
2028
|
+
0,
|
|
2029
|
+
(
|
|
2030
|
+
piece,
|
|
2031
|
+
max(256, int(budget * 0.8)),
|
|
2032
|
+
max(8, int(items_budget * 0.8)),
|
|
2033
|
+
),
|
|
2034
|
+
)
|
|
2035
|
+
continue
|
|
2036
|
+
# A floor-budget piece cannot exceed the hard limits, so this is
|
|
2037
|
+
# unreachable; keep the block rather than loop forever.
|
|
2038
|
+
blocks.append(block)
|
|
2039
|
+
return blocks
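
The geometric shrinking of the packing budgets can be tabulated in isolation. This is a sketch that assumes the shrink step fires on every iteration, which is the worst case; in the function above it only fires when splitting fails to produce more than one piece. `shrink_schedule` is a hypothetical name, and the defaults mirror the constants in this diff (11,500 and 45, with floors of 256 and 8 and a factor of 0.8).

```python
def shrink_schedule(
    length_budget: int = 11_500,
    items_budget: int = 45,
    length_floor: int = 256,
    items_floor: int = 8,
    factor: float = 0.8,
) -> list[tuple[int, int]]:
    """List successive (length, items) budgets until both reach their floors."""
    schedule = [(length_budget, items_budget)]
    while length_budget > length_floor or items_budget > items_floor:
        length_budget = max(length_floor, int(length_budget * factor))
        items_budget = max(items_floor, int(items_budget * factor))
        schedule.append((length_budget, items_budget))
    return schedule
```

Because both budgets are clamped at their floors, the schedule is finite, which is what bounds the retry loop.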
|
|
2040
|
+
|
|
2041
|
+
|
|
1614
2042
|
def _create_rich_text_block(
|
|
1615
2043
|
elements: list[dict[str, Any]], *, plain_text: str | None = None
|
|
1616
2044
|
) -> dict[str, Any]:
|
|
@@ -1903,12 +2331,12 @@ def _convert_markdown_text_segment_to_blocks(
|
|
|
1903
2331
|
markdown_buffer.pop()
|
|
1904
2332
|
if not markdown_buffer:
|
|
1905
2333
|
return
|
|
1906
|
-
|
|
1907
|
-
|
|
1908
|
-
|
|
2334
|
+
blocks.extend(
|
|
2335
|
+
_create_markdown_blocks(
|
|
2336
|
+
"\n".join(markdown_buffer),
|
|
2337
|
+
preserve_visual_blank_lines=preserve_visual_blank_lines,
|
|
2338
|
+
)
|
|
1909
2339
|
)
|
|
1910
|
-
if markdown_block:
|
|
1911
|
-
blocks.append(markdown_block)
|
|
1912
2340
|
markdown_buffer = []
|
|
1913
2341
|
|
|
1914
2342
|
while cursor < len(lines):
|
|
@@ -1956,16 +2384,10 @@ def split_markdown_into_segments(markdown_text: str) -> list[dict[str, str]]:
|
|
|
1956
2384
|
current = []
|
|
1957
2385
|
current_is_table = None
|
|
1958
2386
|
|
|
1959
|
-
|
|
1960
|
-
|
|
1961
|
-
for line in lines:
|
|
2387
|
+
for line, is_fenced, _ in _iter_fence_states(lines):
|
|
1962
2388
|
stripped = line.strip()
|
|
1963
|
-
opening_fence = _match_fence_open(line) if active_fence is None else None
|
|
1964
|
-
is_fenced_line = active_fence is not None or opening_fence is not None
|
|
1965
2389
|
is_table_line = (
|
|
1966
|
-
False
|
|
1967
|
-
if is_fenced_line
|
|
1968
|
-
else stripped.startswith("|") and stripped.endswith("|")
|
|
2390
|
+
False if is_fenced else stripped.startswith("|") and stripped.endswith("|")
|
|
1969
2391
|
)
|
|
1970
2392
|
|
|
1971
2393
|
if current_is_table is None:
|
|
@@ -1978,11 +2400,6 @@ def split_markdown_into_segments(markdown_text: str) -> list[dict[str, str]]:
|
|
|
1978
2400
|
current_is_table = is_table_line
|
|
1979
2401
|
current.append(line)
|
|
1980
2402
|
|
|
1981
|
-
if opening_fence:
|
|
1982
|
-
active_fence = opening_fence
|
|
1983
|
-
elif active_fence and _is_fence_close(line, active_fence):
|
|
1984
|
-
active_fence = None
|
|
1985
|
-
|
|
1986
2403
|
flush()
|
|
1987
2404
|
return segments
|
|
1988
2405
|
|
|
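The hunk above replaces inline fence tracking with a shared `_iter_fence_states` iterator whose body is not shown in this part of the diff. As a rough illustration of the idea only, a generator along these lines could tag each line with whether it sits inside a fenced code block. This sketch is an assumption, not the library's implementation: it yields two values instead of three, the name `iter_fence_states` is illustrative, and the fence-matching rules are a simplified reading of CommonMark.

```python
import re
from typing import Iterator

# Assumption: a fence opens with 3+ backticks or tildes, indented at most 3 spaces.
_FENCE_OPEN = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def iter_fence_states(lines: list[str]) -> Iterator[tuple[str, bool]]:
    """Yield (line, is_fenced); fence delimiter lines count as fenced."""
    active: str | None = None  # the opening fence run, e.g. "```"
    for line in lines:
        if active is None:
            match = _FENCE_OPEN.match(line)
            if match:
                active = match.group(1)
                yield line, True
            else:
                yield line, False
            continue
        yield line, True
        stripped = line.strip()
        # A closing fence repeats the same character, at least as many times,
        # with nothing else on the line.
        if (
            stripped
            and set(stripped) == {active[0]}
            and len(stripped) >= len(active)
        ):
            active = None


sample = ["| a | b |", "```", "| x | y |", "```", "| c | d |"]
table_lines = [
    line
    for line, is_fenced in iter_fence_states(sample)
    if not is_fenced and line.strip().startswith("|") and line.strip().endswith("|")
]
```

With the state machine in one place, the table check reduces to a single condition per line, which matches the shape of the rewritten `is_table_line` expression in the hunk.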
@@ -2026,10 +2443,59 @@ def convert_markdown_to_slack_blocks(
|
|
|
2026
2443
|
convert_markdown_text_to_blocks = convert_markdown_to_slack_blocks
|
|
2027
2444
|
|
|
2028
2445
|
|
|
2446
|
+
def _block_expansion_weight(block: dict[str, Any]) -> int:
|
|
2447
|
+
"""Weight of one block against Slack's per-message expansion budget.
|
|
2448
|
+
|
|
2449
|
+
Slack expands ``markdown`` blocks server-side and enforces the 50-item
|
|
2450
|
+
limit on the expanded result, so a markdown block counts as its estimated
|
|
2451
|
+
expansion; every other block type posts as a single item.
|
|
2452
|
+
"""
|
|
2453
|
+
if not isinstance(block, dict) or block.get("type") != "markdown":
|
|
2454
|
+
return 1
|
|
2455
|
+
annotated = getattr(block, "_expansion_items", None)
|
|
2456
|
+
if isinstance(annotated, int) and annotated > 0:
|
|
2457
|
+
return annotated
|
|
2458
|
+
return _estimate_markdown_expansion_items(str(block.get("text", "")))
|
|
2459
|
+
|
|
2460
|
+
|
|
2461
|
+
_TEXT_SIZE_KEYS = frozenset({"text", "url", "alt_text", "image_url"})
|
|
2462
|
+
|
|
2463
|
+
|
|
2464
|
+
def _block_text_size(value: Any) -> int:
|
|
2465
|
+
"""Rough text payload of a block against the per-message total budget.
|
|
2466
|
+
|
|
2467
|
+
Slack's ``msg_blocks_too_long`` check counts content across block types
|
|
2468
|
+
(a 11,900-char markdown block plus a 1,400-char rich_text was rejected),
|
|
2469
|
+
so this sums every string under content-carrying keys, recursively.
|
|
2470
|
+
"""
|
|
2471
|
+
if isinstance(value, dict):
|
|
2472
|
+
total = 0
|
|
2473
|
+
for key, sub in value.items():
|
|
2474
|
+
if key in _TEXT_SIZE_KEYS and isinstance(sub, str):
|
|
2475
|
+
total += len(sub)
|
|
2476
|
+
else:
|
|
2477
|
+
total += _block_text_size(sub)
|
|
2478
|
+
return total
|
|
2479
|
+
if isinstance(value, list):
|
|
2480
|
+
return sum(_block_text_size(item) for item in value)
|
|
2481
|
+
return 0
|
|
2482
|
+
|
|
2483
|
+
|
|
2029
2484
|
def split_blocks_by_table(blocks: list[dict[str, Any]]) -> list[list[dict[str, Any]]]:
|
|
2030
|
-
"""Split blocks to satisfy Slack table and per-message
|
|
2485
|
+
"""Split blocks to satisfy Slack table and per-message constraints.
|
|
2486
|
+
|
|
2487
|
+
A message holds at most one ``table`` block, at most
|
|
2488
|
+
``SLACK_MAX_BLOCKS_PER_MESSAGE`` posted blocks, at most
|
|
2489
|
+
``SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE`` estimated post-expansion items
|
|
2490
|
+
(headings and thematic breaks inside ``markdown`` blocks count
|
|
2491
|
+
individually toward that budget), and at most
|
|
2492
|
+
``SLACK_MAX_MESSAGE_BLOCKS_TEXT_LENGTH`` characters of block text in
|
|
2493
|
+
total across block types.
|
|
2494
|
+
"""
|
|
2031
2495
|
messages: list[list[dict[str, Any]]] = []
|
|
2032
2496
|
current_message: list[dict[str, Any]] = []
|
|
2497
|
+
current_weight = 0
|
|
2498
|
+
current_text_size = 0
|
|
2033
2499
|
|
|
2034
2500
|
for block in blocks or []:
|
|
2035
2501
|
if isinstance(block, dict) and block.get("type") == "table":
|
|
@@ -2037,11 +2503,23 @@ def split_blocks_by_table(blocks: list[dict[str, Any]]) -> list[list[dict[str, A
|
|
|
2037
2503
|
messages.append(current_message)
|
|
2038
2504
|
messages.append([block])
|
|
2039
2505
|
current_message = []
|
|
2506
|
+
current_weight = 0
|
|
2507
|
+
current_text_size = 0
|
|
2040
2508
|
else:
|
|
2041
|
-
|
|
2509
|
+
weight = _block_expansion_weight(block)
|
|
2510
|
+
text_size = _block_text_size(block)
|
|
2511
|
+
if current_message and (
|
|
2512
|
+
len(current_message) >= SLACK_MAX_BLOCKS_PER_MESSAGE
|
|
2513
|
+
or current_weight + weight > SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE
|
|
2514
|
+
or current_text_size + text_size > _MESSAGE_BLOCKS_TEXT_TARGET
|
|
2515
|
+
):
|
|
2042
2516
|
messages.append(current_message)
|
|
2043
2517
|
current_message = []
|
|
2518
|
+
current_weight = 0
|
|
2519
|
+
current_text_size = 0
|
|
2044
2520
|
current_message.append(block)
|
|
2521
|
+
current_weight += weight
|
|
2522
|
+
current_text_size += text_size
|
|
2045
2523
|
|
|
2046
2524
|
if current_message:
|
|
2047
2525
|
messages.append(current_message)
|
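The packing rule in the `split_blocks_by_table` changes above can be read as a greedy grouping under three budgets. The sketch below restates it in isolation, assuming precomputed per-block weights and text sizes. `pack_messages` is a hypothetical name, the one-table-per-message branch is left out, and the constants are assumptions: 50 items and 13,200 characters are the figures quoted in the README text of this diff, while the block-count cap and the library's internal text target are not shown here and may differ.

```python
from typing import Any

MAX_BLOCKS = 50    # assumption: blocks-per-message cap, value not shown in the diff
MAX_ITEMS = 50     # expansion-item limit quoted in the README text
MAX_TEXT = 13_200  # total block-text limit quoted in the README text


def pack_messages(
    blocks: list[dict[str, Any]],
    weights: list[int],
    sizes: list[int],
) -> list[list[dict[str, Any]]]:
    """Greedily group blocks, starting a new message when a budget would overflow."""
    messages: list[list[dict[str, Any]]] = []
    current: list[dict[str, Any]] = []
    weight_sum = 0
    size_sum = 0
    for block, weight, size in zip(blocks, weights, sizes):
        overflow = (
            len(current) >= MAX_BLOCKS
            or weight_sum + weight > MAX_ITEMS
            or size_sum + size > MAX_TEXT
        )
        # Only flush a non-empty message, so an oversized block still gets
        # a message of its own instead of being dropped.
        if current and overflow:
            messages.append(current)
            current, weight_sum, size_sum = [], 0, 0
        current.append(block)
        weight_sum += weight
        size_sum += size
    if current:
        messages.append(current)
    return messages


demo_blocks = [{"type": "markdown", "text": str(i)} for i in range(4)]
demo = pack_messages(demo_blocks, [30, 30, 10, 5], [100, 100, 100, 13_150])
```

In the demo, the second block starts a new message because the expansion weights would reach 60, and the fourth starts another because the combined text size would pass the total-text budget.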
{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0/slack_markdown_parser.egg-info}/PKG-INFO
RENAMED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: slack-markdown-parser
|
|
3
|
-
Version: 2.
|
|
3
|
+
Version: 2.5.0
|
|
4
4
|
Summary: Convert LLM Markdown into Slack Block Kit messages
|
|
5
5
|
Author: darkgaldragon
|
|
6
6
|
License-Expression: MIT
|
|
@@ -62,7 +62,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
|
|
|
62
62
|
- Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
|
|
63
63
|
- Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
|
|
64
64
|
- Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
|
|
65
|
-
-
|
|
65
|
+
- Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
|
|
66
|
+
- Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
|
|
66
67
|
- Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
|
|
67
68
|
- Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
|
|
68
69
|
- Support Markdown links and Slack-style links inside table cells
|
|
@@ -115,7 +116,8 @@ What this library compensates for:
|
|
|
115
116
|
- Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
|
|
116
117
|
- Keeps table-like rows inside fenced code blocks out of table normalization
|
|
117
118
|
- Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
|
|
118
|
-
- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
|
|
119
|
+
- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
|
|
120
|
+
- Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
|
|
119
121
|
|
|
120
122
|
## Requirements
|
|
121
123
|
|
|
@@ -151,7 +153,7 @@ for payload in convert_markdown_to_slack_payloads(
|
|
|
151
153
|
print(payload)
|
|
152
154
|
```
|
|
153
155
|
|
|
154
|
-
`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
|
|
156
|
+
`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
|
|
155
157
|
Set `preserve_visual_blank_lines=True` when you want the parser to compensate
|
|
156
158
|
for Slack's currently tight paragraph spacing inside `markdown` blocks.
|
|
157
159
|
The blank-line workaround is intentionally narrow: it skips table segments and
|
|
@@ -205,7 +207,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
|
|
|
205
207
|
|
|
206
208
|
| Function | Description |
|
|
207
209
|
|---|---|
|
|
208
|
-
| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
|
|
210
|
+
| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
|
|
209
211
|
| `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
|
|
210
212
|
| `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
|
|
211
213
|
| `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
|
|
@@ -225,9 +227,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
|
|
|
225
227
|
| Function | Description |
|
|
226
228
|
|---|---|
|
|
227
229
|
| `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
|
|
230
|
+
| `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
|
|
228
231
|
| `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
|
|
229
|
-
| `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
|
|
230
|
-
| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
|
|
232
|
+
| `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
|
|
233
|
+
| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
|
|
231
234
|
| `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
|
|
232
235
|
|
|
233
236
|
### Lower-level exported helpers
|
|
File without changes
|
|