slack-markdown-parser 2.4.4__tar.gz → 2.5.0__tar.gz

Files changed (18)
  1. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/CHANGELOG.md +12 -0
  2. {slack_markdown_parser-2.4.4/slack_markdown_parser.egg-info → slack_markdown_parser-2.5.0}/PKG-INFO +10 -7
  3. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/README-ja.md +9 -6
  4. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/README.md +9 -6
  5. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/docs/spec-ja.md +28 -5
  6. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/docs/spec.md +28 -5
  7. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/pyproject.toml +1 -1
  8. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/__init__.py +1 -1
  9. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/converter.py +544 -66
  10. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0/slack_markdown_parser.egg-info}/PKG-INFO +10 -7
  11. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/LICENSE +0 -0
  12. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/MANIFEST.in +0 -0
  13. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/setup.cfg +0 -0
  14. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/py.typed +0 -0
  15. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/SOURCES.txt +0 -0
  16. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/dependency_links.txt +0 -0
  17. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/requires.txt +0 -0
  18. {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/top_level.txt +0 -0
@@ -6,6 +6,18 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
6
6
 
7
7
  ## [Unreleased]
8
8
 
9
+ ## [2.5.0] - 2026-06-11
10
+
11
+ ### Added
12
+
13
+ - Added automatic size splitting so long or heading-dense LLM output no longer fails `chat.postMessage` outright. Three Slack-side hard limits were measured against a real workspace on 2026-06-11 and are now enforced at conversion time: a `markdown` block's `text` accepts exactly 12,000 characters (`msg_too_long` beyond that); Slack expands `markdown` blocks server-side and enforces "no more than 50 items" per message on the expanded result (`invalid_blocks`), where each heading and each thematic break is one item while paragraph/list/quote/fence runs between them merge into one; and one message's blocks may carry at most 13,200 characters of text in total across block types (`msg_blocks_too_long`). Oversized regions are split preferring paragraph boundaries, then line and word boundaries, with a hard cut as a last resort for space-less CJK; a cut inside an unclosed fence re-opens the fence in the continuation block so both halves keep rendering as code. Pieces are re-checked after ZWSP/NBSP formatting and re-split with shrinking budgets when they still overflow. `convert_markdown_to_slack_messages` packs blocks under all three budgets in addition to the existing one-table-per-message rule; documents already within every limit are returned unchanged. Note that the same input can now produce more blocks and more messages than 2.4.x when it previously exceeded Slack's limits (which used to fail delivery entirely).
14
+
15
+ ### Fixed
16
+
17
+ - Stopped corrupting code samples during sanitization. `decode_html_entities` and the angle-token neutralization inside `sanitize_slack_text` ran over the whole text, so a fenced code block or inline code span containing `<div>` or `&amp;` reached Slack as `＜div＞` / `&` even though Slack renders code content literally. Both passes now skip fenced code blocks and inline code spans; ANSI/control-character removal still applies everywhere. For this purpose a code span is recognized within a single line only and closes only on a backtick run of equal length (CommonMark pairing), so a stray unpaired backtick stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`<foo `bar` baz>`) is still neutralized as a whole while the span content stays verbatim.
18
+ - Stopped crafted input from colliding with internal placeholders. Input carrying this library's reserved in-band marker code points (`U+2063`, `U+FFF0`–`U+FFF3`, e.g. a literal `￰code0￱` sequence) could crash conversion with `KeyError` or get substituted with another code span's content. The markers are now stripped during sanitization and at direct-call entry points of the placeholder machinery, and placeholder restoration passes unknown sequences through instead of raising.
19
+ - Consolidated the three duplicated fenced-code tracking loops onto a single `_iter_fence_states` helper so fence semantics cannot drift between passes again — that drift is exactly how the sanitize corruption happened.
20
+
9
21
  ## [2.4.4] - 2026-06-10
10
22
 
11
23
  ### Fixed
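Illustration for the 2.5.0 code-span rule in the changelog entry above: a span stays on one line and closes only on a backtick run of the same length as its opener. This is a minimal sketch under stated assumptions, not the library's code. `split_code_spans` and `transform_prose` are hypothetical names, fenced code blocks are not handled, and the case of an invalid angle token that spans a code span is left out.

```python
import re

_BACKTICK_RUN = re.compile(r"`+")


def split_code_spans(line: str) -> list[tuple[str, bool]]:
    """Split one line into (text, is_code) pieces.

    A span opens on a backtick run and closes only on a later run of the
    same length on the same line. An unpaired run stays literal prose.
    """
    pieces: list[tuple[str, bool]] = []
    prose_start = 0
    pos = 0
    while True:
        opener = _BACKTICK_RUN.search(line, pos)
        if opener is None:
            break
        closer = None
        for candidate in _BACKTICK_RUN.finditer(line, opener.end()):
            if len(candidate.group()) == len(opener.group()):
                closer = candidate
                break
        if closer is None:
            pos = opener.end()  # stray run: keep scanning, treat it as prose
            continue
        if opener.start() > prose_start:
            pieces.append((line[prose_start:opener.start()], False))
        pieces.append((line[opener.start():closer.end()], True))
        prose_start = pos = closer.end()
    if prose_start < len(line):
        pieces.append((line[prose_start:], False))
    return pieces


def transform_prose(line: str, transform) -> str:
    """Apply `transform` to prose pieces only; code spans pass through verbatim."""
    return "".join(
        text if is_code else transform(text)
        for text, is_code in split_code_spans(line)
    )
```

Calling `transform_prose(line, some_sanitizer)` once per line keeps a stray backtick from affecting later lines, which is the property the changelog entry describes.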
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: slack-markdown-parser
3
- Version: 2.4.4
3
+ Version: 2.5.0
4
4
  Summary: Convert LLM Markdown into Slack Block Kit messages
5
5
  Author: darkgaldragon
6
6
  License-Expression: MIT
@@ -62,7 +62,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
62
62
  - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
63
63
  - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
64
64
  - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
65
- - Remove ANSI/control characters and neutralize invalid Slack angle-bracket tokens before block generation
65
+ - Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
66
+ - Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
66
67
  - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
67
68
  - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
68
69
  - Support Markdown links and Slack-style links inside table cells
@@ -115,7 +116,8 @@ What this library compensates for:
115
116
  - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
116
117
  - Keeps table-like rows inside fenced code blocks out of table normalization
117
118
  - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
118
- - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
119
+ - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
120
+ - Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
119
121
 
120
122
  ## Requirements
121
123
 
@@ -151,7 +153,7 @@ for payload in convert_markdown_to_slack_payloads(
151
153
  print(payload)
152
154
  ```
153
155
 
154
- `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
156
+ `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
155
157
  Set `preserve_visual_blank_lines=True` when you want the parser to compensate
156
158
  for Slack's currently tight paragraph spacing inside `markdown` blocks.
157
159
  The blank-line workaround is intentionally narrow: it skips table segments and
@@ -205,7 +207,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
205
207
 
206
208
  | Function | Description |
207
209
  |---|---|
208
- | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
210
+ | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
209
211
  | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
210
212
  | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
211
213
  | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -225,9 +227,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
225
227
  | Function | Description |
226
228
  |---|---|
227
229
  | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
230
+ | `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
228
231
  | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
229
- | `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
230
- | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
232
+ | `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
233
+ | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
231
234
  | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
232
235
 
233
236
  ### Lower-level exported helpers
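The expansion-item rule quoted in the feature list above (each heading or divider is one item, and everything between them merges into one item) can be sketched as a line scan. This is only an approximation under stated assumptions: `estimate_expansion_items` is a hypothetical name, only ATX headings (`# ...`) are recognized, and the library's own estimate is described as intentionally more conservative.

```python
import re

_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
_HEADING = re.compile(r"^ {0,3}#{1,6}(?:[ \t]|$)")
_THEMATIC_BREAK = re.compile(r"^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")


def estimate_expansion_items(text: str) -> int:
    """Count headings and thematic breaks, plus one item per run between them."""
    items = 0
    in_run = False    # True while inside a paragraph/list/quote/fence run
    open_fence = ""   # delimiter of the currently open fence, "" if none
    for line in text.splitlines():
        fence = _FENCE.match(line)
        if open_fence:
            # Inside a fence nothing counts; look only for the closing line.
            closes = (
                fence is not None
                and fence.group(1)[0] == open_fence[0]
                and len(fence.group(1)) >= len(open_fence)
                and line.strip() == fence.group(1)
            )
            if closes:
                open_fence = ""
            continue
        if _HEADING.match(line) or _THEMATIC_BREAK.match(line):
            items += 1
            in_run = False  # a breaker ends the current run
            continue
        if fence is not None:
            open_fence = fence.group(1)
        if line.strip() and not in_run:
            items += 1      # first visible line of a new run
            in_run = True
    return items
```

Blank lines never reset `in_run`, which mirrors the measurement that blank-separated paragraphs still merge into a single item.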
@@ -32,7 +32,8 @@ Slack の `markdown` ブロック自体が対応していない構文は、古
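32
32
  - Convert standalone Markdown constructs that can be identified safely into `image` / `divider` / `rich_text` blocks
33
33
  - Repair breakage common in LLM-generated tables (missing outer pipes, missing separator rows, mismatched column counts, empty cells)
34
34
  - Automatically split messages when needed to satisfy Slack's "one table per message" constraint and the per-message block-count limit
35
- - Remove ANSI escapes / control characters and neutralize invalid Slack angle-bracket tokens
35
+ - Split long or heading-dense output into multiple `markdown` blocks and messages, preferring paragraph boundaries, so that it fits all of Slack's measured hard limits: 12,000 characters per `markdown` block (`msg_too_long`), 50 expansion items per message with each heading and divider counted as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
36
+ - Remove ANSI escapes, control characters, and the library's reserved internal marker characters, and neutralize invalid Slack angle-bracket tokens in prose (the contents of code fences and inline code are kept verbatim)
36
37
  - Outside fenced code blocks, insert zero-width spaces around formatting markers to reduce rendering issues
37
38
  - In dense Japanese, Chinese, and Korean text, add visible spaces to stabilize some cases where formatting that contains inline code breaks
38
39
  - Recognize Markdown links and Slack-style links inside table cells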
@@ -74,7 +75,8 @@ Slack 側の制約として残るもの:
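74
75
  - Convert unambiguous standalone Markdown constructs into Slack-native Block Kit blocks instead of relying on raw `markdown` rendering
75
76
  - Keep table-like rows inside fenced code out of table processing
76
77
  - Optionally replace internal blank lines with placeholder lines so paragraph breaks are easier to see
77
- - Neutralize `<...>` forms that are invalid as Slack special syntax, such as raw HTML-like tags
78
+ - Neutralize `<...>` forms that are invalid as Slack special syntax, such as raw HTML-like tags, in prose (content inside code fences and inline code stays verbatim)
79
+ - Split output that exceeds Slack's measured per-block character limit, per-message expansion-item limit, or per-message total-text limit, so that `chat.postMessage` does not fail outright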
78
80
 
79
81
  ## Requirements
80
82
 
@@ -110,7 +112,7 @@ for payload in convert_markdown_to_slack_payloads(
110
112
  print(payload)
111
113
  ```
112
114
 
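113
- `convert_markdown_to_slack_messages` splits input that contains multiple tables into multiple messages to match Slack's constraints.
115
+ `convert_markdown_to_slack_messages` automatically splits output into multiple messages for input that contains multiple tables, and also when long or heading-dense content would exceed Slack's block and message size limits.
114
116
  When the spacing between paragraphs is extremely small in Slack Web's new `markdown` rendering, `preserve_visual_blank_lines=True` makes just the internal blank lines easier to see.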
115
117
 
116
118
  ## Input and output example
@@ -160,7 +162,7 @@ QA | ~~保留~~ | Team C
160
162
 
161
163
  | Function | Description |
162
164
  |---|---|
163
- | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Convert Markdown into a set of messages already split around tables |
165
+ | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Convert Markdown into a set of messages already split around tables and Slack's measured size limits |
164
166
  | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Convert into Slack request data that includes `blocks` and preview `text` |
165
167
  | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Convert Markdown into a list of Block Kit blocks |
166
168
  | `build_fallback_text_from_blocks(blocks) → str` | Build a preview string for `chat.postMessage.text` |
@@ -175,9 +177,10 @@ QA | ~~保留~~ | Team C
175
177
  | Function | Description |
176
178
  |---|---|
177
179
  | `normalize_markdown_tables(markdown_text) → str` | Normalize table syntax (fill in pipes, generate separator rows, adjust column counts) |
180
+ | `normalize_underscore_emphasis(text) → str` | Convert `_..._` / `__...__` underscore emphasis into Slack-compatible asterisk emphasis |
178
181
  | `add_zero_width_spaces_to_markdown(text) → str` | Insert zero-width spaces around formatting markers (fenced code blocks are excluded) |
179
- | `decode_html_entities(text) → str` | Decode HTML entities |
180
- | `sanitize_slack_text(text) → str` | Remove ANSI / control characters and neutralize invalid Slack angle-bracket tokens |
182
+ | `decode_html_entities(text) → str` | Decode HTML entities in prose (code regions stay verbatim) |
183
+ | `sanitize_slack_text(text) → str` | Remove ANSI / control characters and internal marker characters, and neutralize invalid Slack angle-bracket tokens outside code regions |
181
184
  | `strip_zero_width_spaces(text) → str` | Remove zero-width spaces (U+200B) and BOM (U+FEFF) (join-control characters such as ZWJ are preserved) |
182
185
 
183
186
  ## Specification
@@ -32,7 +32,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
32
32
  - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
33
33
  - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
34
34
  - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
35
- - Remove ANSI/control characters and neutralize invalid Slack angle-bracket tokens before block generation
35
+ - Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
36
+ - Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
36
37
  - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
37
38
  - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
38
39
  - Support Markdown links and Slack-style links inside table cells
@@ -85,7 +86,8 @@ What this library compensates for:
85
86
  - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
86
87
  - Keeps table-like rows inside fenced code blocks out of table normalization
87
88
  - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
88
- - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
89
+ - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
90
+ - Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
89
91
 
90
92
  ## Requirements
91
93
 
@@ -121,7 +123,7 @@ for payload in convert_markdown_to_slack_payloads(
121
123
  print(payload)
122
124
  ```
123
125
 
124
- `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
126
+ `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
125
127
  Set `preserve_visual_blank_lines=True` when you want the parser to compensate
126
128
  for Slack's currently tight paragraph spacing inside `markdown` blocks.
127
129
  The blank-line workaround is intentionally narrow: it skips table segments and
@@ -175,7 +177,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
175
177
 
176
178
  | Function | Description |
177
179
  |---|---|
178
- | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
180
+ | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
179
181
  | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
180
182
  | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
181
183
  | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -195,9 +197,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
195
197
  | Function | Description |
196
198
  |---|---|
197
199
  | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
200
+ | `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
198
201
  | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
199
- | `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
200
- | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
202
+ | `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
203
+ | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
201
204
  | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
202
205
 
203
206
  ### Lower-level exported helpers
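The per-message packing that the README attributes to `convert_markdown_to_slack_messages` (at most 50 expansion items, at most 13,200 characters of block text, one table per message) can be pictured as a greedy pass over per-block costs. The sketch below is an assumption-laden illustration, not the library's implementation: `BlockCost` and `pack_messages` are hypothetical names, and the separate per-message block-count limit is omitted because this diff does not state its value.

```python
from dataclasses import dataclass

MAX_ITEMS_PER_MESSAGE = 50      # measured: "no more than 50 items"
MAX_TEXT_PER_MESSAGE = 13_200   # measured: msg_blocks_too_long above this


@dataclass
class BlockCost:
    items: int             # estimated expansion items for this block
    text_len: int          # characters of text the block carries
    is_table: bool = False


def pack_messages(costs: list[BlockCost]) -> list[list[int]]:
    """Greedily group block indexes into messages that respect every budget."""
    messages: list[list[int]] = []
    current: list[int] = []
    items = text = 0
    has_table = False
    for index, cost in enumerate(costs):
        overflows = (
            items + cost.items > MAX_ITEMS_PER_MESSAGE
            or text + cost.text_len > MAX_TEXT_PER_MESSAGE
            or (has_table and cost.is_table)
        )
        if current and overflows:
            # Close the current message and start a fresh one.
            messages.append(current)
            current, items, text, has_table = [], 0, 0, False
        current.append(index)
        items += cost.items
        text += cost.text_len
        has_table = has_table or cost.is_table
    if current:
        messages.append(current)
    return messages
```

A block that exceeds a budget on its own still gets a message to itself here; in the described design such blocks are split earlier, at block-generation time.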
@@ -10,7 +10,7 @@
10
10
  ## Output
11
11
 
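12
12
  - Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, `divider`)
13
- - For input with multiple tables or many promoted blocks, a set of messages that satisfies "one table per message" and Slack's per-message block-count limit
13
+ - For input with multiple tables, many promoted blocks, or long text, a set of messages that satisfies "one table per message", Slack's per-message block-count limit, and the measured size limits described in "Markdown block size splitting"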
14
14
 
15
15
  ## Design target
16
16
 
@@ -23,8 +23,8 @@ Markdown としての厳密さより Slack 上での読みやすさが重要に
23
23
 
24
24
  Processing order of `convert_markdown_to_slack_blocks`:
25
25
 
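26
- 1. Decode HTML entities, turning `&gt;`, `&amp;`, and similar back into the original characters
27
- 2. Clean the text for Slack: remove ANSI / control characters and neutralize invalid Slack angle-bracket tokens
26
+ 1. Decode HTML entities in prose, turning `&gt;`, `&amp;`, and similar back into the original characters (the contents of code fences and inline code are not decoded)
27
+ 2. Clean the text for Slack. ANSI / control characters and the library's reserved internal marker characters are removed everywhere, and neutralization of invalid Slack angle-bracket tokens applies only outside code fences and inline code
28
28
  3. Normalize underscore emphasis, converting `_..._` / `__...__` into Slack-compatible `*...*` / `**...**`
29
29
  4. Normalize bare URLs into the `<https://...>` form, which is more stable in Slack
30
30
  5. Repair malformed tables using the rules described below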
@@ -33,8 +33,9 @@ Markdown としての厳密さより Slack 上での読みやすさが重要に
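33
33
  - Table regions: parse inline cell formatting and generate a `table` block. If conversion fails, fall back to a `markdown` block
34
34
  - Non-table regions: first convert standalone Markdown constructs that can be identified safely into rich blocks, then generate `markdown` blocks from the remaining text after adding zero-width spaces where needed
35
35
  - If `preserve_visual_blank_lines=True`, replace internal blank lines in the remaining `markdown` blocks with lines that contain only a non-breaking space before building the `markdown` block
36
+ - A region whose formatted text exceeds Slack's `markdown` block limit (12,000 characters) is split into multiple `markdown` blocks using the rules in "Markdown block size splitting" below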
36
37
 
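37
- `convert_markdown_to_slack_messages` splits the result above according to the "one table per message" constraint and Slack's per-message block-count limit.
38
+ `convert_markdown_to_slack_messages` splits the result above according to the "one table per message" constraint, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "Markdown block size splitting".
38
39
  `convert_markdown_to_slack_payloads` returns request data that attaches a preview string for `chat.postMessage.text` to the same split result.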
39
40
 
40
41
  ## Slack behavior as measured
@@ -83,7 +84,7 @@ Slack は 2026-03-06 に `markdown` ブロックの公式ドキュメントを
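83
84
  - A list is promoted only when it starts at the beginning of a text region or right after a blank line, every consecutive non-blank line is a list item, it does not depend on ambiguous 1-3 space nesting indentation or Markdown backslash escapes, and no indented continuation paragraph follows it
84
85
  - Table-like rows inside fenced code are excluded from table parsing
85
86
  - Internal blank lines are optionally replaced with placeholder lines that make paragraph breaks easier to see
86
- - `<...>` forms that are invalid as Slack special syntax, such as `<foo>` or raw HTML-like tags, are neutralized
87
+ - `<...>` forms that are invalid as Slack special syntax, such as `<foo>` or raw HTML-like tags, are neutralized in prose (the contents of code fences and inline code are kept verbatim)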
87
88
 
88
89
  ## Slack text cleanup rules
89
90
 
@@ -91,10 +92,13 @@ Slack は 2026-03-06 に `markdown` ブロックの公式ドキュメントを
91
92
 
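92
93
  - Remove ANSI escapes
93
94
  - Remove general control characters
95
+ - Remove the marker characters that the library reserves for internal placeholders (`U+2063`, `U+FFF0`–`U+FFF3`) so that input cannot collide with the internal machinery
94
96
  - Keep valid Slack angle-bracket tokens
95
97
  - Examples: links, mentions, channel references, `<!here>`, `<!subteam^...>`, `<!date^...>`
96
98
  - Tokens such as `<foo>` that cannot be interpreted as Slack special syntax are neutralized by converting them to `＜foo＞`
97
99
  - This also includes raw HTML-like tags such as `<div>` and `<span>`
100
+ - Angle-bracket token neutralization applies only outside code fences and inline code. Code samples such as `` `<div>` `` reach Slack verbatim. Removal of ANSI / control / marker characters applies everywhere, because those characters have no legitimate use as displayed content
101
+ - For this check, an inline code span is limited to a single line and closes only on a backtick run of the same length as the opener. A stray unpaired backtick is treated as a literal and does not block sanitization of later lines. An invalid angle-bracket token that spans a code span (`<foo `bar` baz>`) is neutralized as a whole while the span content stays verbatim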
98
102
 
99
103
  ## Underscore emphasis normalization rules
100
104
 
@@ -211,6 +215,25 @@ LLM は外枠パイプの省略、区切り行の欠落、列数の不一致な
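211
215
  | U+200C | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
212
216
  | U+200D | ZWJ (zero-width joiner) | Required for joined emoji and other character joining |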
213
217
 
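218
+ ## Markdown block size splitting
219
+
220
+ On 2026-06-11, three Slack-side hard limits were measured in a real workspace.
221
+
222
+ - A `markdown` block's `text` is accepted up to exactly 12,000 characters. At 12,001 characters the whole `chat.postMessage` call is rejected with `msg_too_long`
223
+ - Slack expands `markdown` blocks server-side into a sequence of native blocks and validates "item count ≤ 50" on the expanded result per message (exceeding it gives `invalid_blocks`). Each heading and each divider becomes one item (50 headings were accepted, 51 were rejected; two blocks of 30 headings each were also rejected on the combined count). Paragraphs, lists, quotes, and code fences are merged into one item per contiguous run between headings/dividers (60 blank-separated paragraphs and 52 fences were accepted; blank lines alone do not split a run)
224
+ - The total text that one message's blocks can carry is exactly 13,200 characters (1.1 times the single-block limit). 13,201 characters is rejected with `msg_blocks_too_long`. The total is counted across block types (an 11,900-character `markdown` block plus a 1,400-character `rich_text` was also rejected)
225
+
226
+ For this reason, long or heading-dense non-table regions are split before sending.
227
+
228
+ - The whole region is first tried as a single block, and it is split only when the formatted text exceeds the character limit or the estimated expansion-item count exceeds the budget
229
+ - Because inserting zero-width spaces and placeholder lines inflates the text, and the item estimate is intentionally conservative, raw text is packed toward targets below the limits (11,500 characters / 45 items)
230
+ - Split points are chosen first at paragraph boundaries (runs of blank lines outside code fences). The blank lines used at a boundary are removed, because adjacent blocks already render visually separated
231
+ - A single paragraph that exceeds the budget is split at line boundaries, and a single overlong line at word boundaries. When there are no spaces (for example dense CJK text), it is cut at a character position as a last resort
232
+ - When a cut falls inside an unclosed code fence, the original fence opening line is repeated at the start of the continuation block so that both halves keep rendering as code
233
+ - Each piece is also re-checked after formatting, and when it exceeds a hard limit the packing budget is reduced and the piece is split again
234
+ - `convert_markdown_to_slack_messages` additionally bundles blocks so that the summed expansion-item estimate within a message stays at 50 or fewer and the total block text stays at 13,200 characters or fewer (non-`markdown` blocks count as one item each, and the text total includes the content of every block type)
235
+ - The top-level fallback `text` field is not subject to the character limit (Slack truncates it instead of rejecting), so the preview string is not split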
236
+
214
237
  ## Option for compensating blank-line visibility
215
238
 
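216
239
  Passing `preserve_visual_blank_lines=True` to the main conversion APIs replaces only the blank lines that sit between visible lines in non-table regions with lines that contain only a non-breaking space, and then generates the Slack `markdown` block.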
@@ -10,7 +10,7 @@ This document describes how `slack-markdown-parser` converts Markdown into Slack
10
10
  ## Output
11
11
 
12
12
  - Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, and `divider`)
13
- - When the input contains multiple tables or many promoted blocks, a list of messages that satisfies the "one table per message" rule and Slack's per-message block-count limit
13
+ - When the input contains multiple tables, many promoted blocks, or long content, a list of messages that satisfies the "one table per message" rule, Slack's per-message block-count limit, and the measured size limits described in "Markdown block size splitting"
14
14
 
15
15
  ## Design target
16
16
 
@@ -23,8 +23,8 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
23
23
 
24
24
  `convert_markdown_to_slack_blocks` processes text in this order:
25
25
 
26
- 1. Decode HTML entities such as `&gt;` and `&amp;`
27
- 2. Clean Slack text by removing ANSI/control noise and neutralizing invalid Slack angle-bracket tokens
26
+ 1. Decode HTML entities such as `&gt;` and `&amp;` in prose, leaving fenced code blocks and inline code spans verbatim
27
+ 2. Clean Slack text: remove ANSI/control noise and this library's reserved internal marker code points everywhere, and neutralize invalid Slack angle-bracket tokens outside fenced code blocks and inline code spans
28
28
  3. Normalize underscore emphasis by converting `_..._` / `__...__` into Slack-friendly `*...*` / `**...**`
29
29
  4. Normalize bare URLs by wrapping them in Slack-friendly `<https://...>` form
30
30
  5. Repair malformed tables using the rules below
@@ -33,8 +33,9 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
33
33
  - Table regions: parse inline cell styling and generate a `table` block. If conversion fails, such as when there are fewer than two candidate lines or the parse result is empty, fall back to a `markdown` block.
34
34
  - Non-table regions: first promote safe standalone Markdown constructs into richer Block Kit blocks, then add zero-width spaces where needed and generate `markdown` blocks for the remaining text.
35
35
  - If `preserve_visual_blank_lines=True`, replace internal blank lines in remaining `markdown` blocks with lines that contain only a non-breaking space before emitting the `markdown` block.
36
+ - A remaining region whose formatted text would exceed Slack's 12,000-character `markdown` block limit is split into multiple `markdown` blocks using the rules in "Markdown block size splitting" below.
36
37
 
37
- `convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule and Slack's per-message block-count limit.
38
+ `convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "Markdown block size splitting".
38
39
  `convert_markdown_to_slack_payloads` returns the same split blocks plus preview `text` values ready for `chat.postMessage`.
39
40
 
40
41
  ## How Slack behaved in testing
@@ -84,7 +85,7 @@ Slack still controls when those newer features appear and how they look, so trea
84
85
  - Slack mention tokens inside a promoted list item are converted to their structured `rich_text` elements — `<@U…>`/`<@W…>` to `user`, `<#C…>`/`<#G…>` to `channel`, `<!subteam^S…>` to `usergroup`, and `<!here>`/`<!channel>`/`<!everyone>` to `broadcast` — since a `rich_text` block does not resolve a raw token. An optional `|label` display suffix is dropped (Slack renders the element from the id).
85
86
  - Table-like rows inside fenced code blocks are kept out of table parsing
86
87
  - Internal blank lines can optionally be rewritten into placeholder lines so Slack keeps visible paragraph separation
87
- - Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized
88
+ - Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized in prose, while fenced code blocks and inline code spans keep them verbatim
88
89
 
89
90
  ## Slack text cleanup rules
90
91
 
@@ -92,9 +93,12 @@ Behavior of `sanitize_slack_text`:
92
93
 
93
94
  - Remove ANSI escape sequences
94
95
  - Remove general control characters except line breaks and tabs already preserved by the regex
96
+ - Remove this library's reserved internal marker code points (`U+2063`, `U+FFF0`–`U+FFF3`) so input cannot collide with the internal placeholder machinery
95
97
  - Keep valid Slack angle-bracket tokens such as links, mentions, channels, special mentions, subteam mentions, and `<!date^...>`
96
98
  - Replace unsupported angle-bracket tokens such as `<foo>` with full-width brackets (`＜foo＞`) so Slack does not interpret them as malformed special syntax
97
99
  - This also applies to raw HTML-like tags such as `<div>` or `<span>`
100
+ - Angle-token neutralization applies only outside fenced code blocks and inline code spans, so code samples such as `` `<div>` `` reach Slack verbatim; ANSI/control/marker removal applies everywhere because those characters are never legitimate content
101
+ - For this purpose an inline code span is recognized within a single line only, and it closes only on a backtick run of the same length as the opener. A stray unpaired backtick therefore stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`<foo `bar` baz>`) is still neutralized as a whole while the span content stays verbatim
98
102
 
99
103
  ## Underscore emphasis normalization rules
100
104
 
@@ -211,6 +215,25 @@ Exception:
211
215
  | `U+200C` | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
212
216
  | `U+200D` | ZWJ (zero-width joiner) | Required for joined emoji and other grapheme composition |
213
217
 
218
+ ## Markdown block size splitting
219
+
220
+ Three Slack-side hard limits were measured against a real workspace on 2026-06-11:
221
+
222
+ - A `markdown` block's `text` accepts exactly 12,000 characters; 12,001 fails the whole `chat.postMessage` call with `msg_too_long`.
223
+ - Slack expands `markdown` blocks server-side into native blocks and enforces "no more than 50 items" on the expanded result per message (`invalid_blocks`). Each heading and each thematic break becomes its own item (50 headings were accepted, 51 rejected; 30 headings in each of two blocks were rejected together), while paragraphs, lists, quotes, and fenced code merge into one item per contiguous run between those breakers (60 blank-separated paragraphs and 52 fences were accepted). Blank lines alone do not split a run.
224
+ - One message's blocks may carry at most 13,200 characters of text in total, exactly 1.1 × the single-block limit; 13,201 fails with `msg_blocks_too_long`. The total counts content across block types (an 11,900-character `markdown` block plus a 1,400-character `rich_text` was rejected).
225
+
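The two character limits above reduce to simple arithmetic, which the following sketch makes explicit. The function name `expected_rejection` is hypothetical, and the order in which the two checks are applied is an assumption of this sketch: the measurements above do not say which error Slack reports when both limits are exceeded.

```python
from __future__ import annotations

MAX_BLOCK_TEXT = 12_000    # measured single-block limit
MAX_MESSAGE_TEXT = 13_200  # measured per-message total

# The per-message total is exactly 1.1 times the single-block limit.
assert MAX_BLOCK_TEXT * 11 // 10 == MAX_MESSAGE_TEXT


def expected_rejection(block_text_lengths: list[int]) -> str | None:
    """Return the error the limits above predict, or None if the message fits."""
    if any(length > MAX_BLOCK_TEXT for length in block_text_lengths):
        return "msg_too_long"
    if sum(block_text_lengths) > MAX_MESSAGE_TEXT:
        return "msg_blocks_too_long"
    return None
```

For example, the rejected combination quoted above (11,900 plus 1,400 characters) sums to 13,300, which is 100 over the per-message total even though each block fits on its own.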
226
+ Long or heading-dense non-table regions are therefore split before delivery:
227
+
228
+ - The whole region is tried as a single block first; splitting happens only when the formatted text exceeds the character limit or the estimated expansion exceeds the per-message item budget
229
+ - Raw content is packed toward targets below the hard limits (11,500 characters, 45 estimated items), because zero-width-space insertion and placeholder lines inflate the formatted text and the item estimate is intentionally conservative
230
+ - Split points prefer paragraph boundaries (blank-line runs outside fenced code); the blank run at a chosen boundary is dropped, since adjacent Slack blocks already render visually separated
231
+ - A single paragraph longer than the budget is split at line boundaries, and a single overlong line at word boundaries, with a hard cut when no space exists (for example dense CJK text)
232
+ - When a cut lands inside an unclosed fenced code block, the continuation block re-opens the fence with the original delimiter line so both halves keep rendering as code
233
+ - Each piece is re-checked after formatting; when it still exceeds a hard limit, the packing budgets shrink and the piece is split again
234
+ - `convert_markdown_to_slack_messages` additionally packs blocks into messages so that the summed expansion estimate stays within the 50-item budget (non-`markdown` blocks count as one item each) and the summed block text stays within the 13,200-character per-message total
235
+ - The top-level fallback `text` field is not subject to the character limit (Slack truncates it instead of rejecting), so preview text is left whole
236
+
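The paragraph-boundary preference described in the list above can be sketched as a greedy packer. This is a reduced illustration under stated assumptions: `pack_paragraphs` is a hypothetical name, only the character budget is modelled (no expansion-item budget), fenced code is not tracked, and blank runs kept inside a piece are normalized to one blank line instead of being preserved verbatim.

```python
import re


def pack_paragraphs(text: str, max_length: int) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into pieces."""
    paragraphs = [p for p in re.split(r"\n{2,}", text) if p]
    pieces: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_length:
            # Close the piece; the blank run at this boundary is dropped.
            pieces.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


print(pack_paragraphs("aaaa\n\nbbbb\n\ncccc", max_length=10))
```

A single paragraph longer than the budget comes back oversized from this step, which is why the rules above fall back to line and word boundaries afterwards.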
214
237
  ## Optional blank-line visibility workaround
215
238
 
216
239
  When `preserve_visual_blank_lines=True` is passed to the main conversion APIs,
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "slack-markdown-parser"
7
- version = "2.4.4"
7
+ version = "2.5.0"
8
8
  description = "Convert LLM Markdown into Slack Block Kit messages"
9
9
  readme = "README.md"
10
10
  requires-python = ">=3.10"
@@ -1,6 +1,6 @@
1
1
  """slack-markdown-parser public package API."""
2
2
 
3
- __version__ = "2.4.4"
3
+ __version__ = "2.5.0"
4
4
  __license__ = "MIT"
5
5
 
6
6
  from .converter import (
@@ -8,6 +8,7 @@ from __future__ import annotations
8
8
 
9
9
  import html
10
10
  import re
11
+ from collections.abc import Callable, Iterable, Iterator
11
12
  from typing import Any
12
13
  from urllib.parse import urlparse
13
14
 
@@ -22,6 +23,14 @@ ANSI_ESCAPE_PATTERN = re.compile(
22
23
  r"\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~]|\][^\x1B\x07]*(?:\x07|\x1B\\))"
23
24
  )
24
25
  CONTROL_CHAR_PATTERN = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]")
26
+ # In-band marker code points reserved by this module's placeholder/spacing
27
+ # machinery: SYNTH_SPACE_MARKER (U+2063), the inline-code placeholder
28
+ # delimiters (U+FFF0/U+FFF1), and the ZWSP-strip markers (U+FFF2/U+FFF3).
29
+ # They have no legitimate use in chat text, and input that carries them would
30
+ # collide with internal placeholders (a stray ``\ufff0code0\ufff1`` either
31
+ # crashes restoration with KeyError or gets substituted with another span's
32
+ # content), so they are removed up front together with control characters.
33
+ INTERNAL_MARKER_CHAR_PATTERN = re.compile("[\u2063\ufff0-\ufff3]")
25
34
  SLACK_ANGLE_TOKEN_PATTERN = re.compile(r"<[^>\n]+>")
26
35
  BARE_URL_PATTERN = re.compile(r"https?://[^\s<]+", re.IGNORECASE)
27
36
  FENCE_OPEN_PATTERN = re.compile(r"^[ \t]{0,3}(`{3,}|~{3,})([^\n]*)$")
@@ -107,13 +116,47 @@ ALLOWED_SLACK_ANGLE_TOKEN_PATTERNS = (
107
116
  re.compile(r"^<!date\^[^>\n]+>$"),
108
117
  )
109
118
  SLACK_MAX_BLOCKS_PER_MESSAGE = 50
119
+ # Verified against a real Slack workspace (2026-06-11): a ``markdown`` block's
120
+ # ``text`` accepts exactly 12,000 characters, while 12,001 fails the whole
121
+ # chat.postMessage call with ``msg_too_long``. The top-level fallback ``text``
122
+ # field is not subject to this limit (40,001 characters was accepted).
123
+ SLACK_MAX_MARKDOWN_TEXT_LENGTH = 12000
124
+ # Raw-content packing target used when an oversized markdown segment is split.
125
+ # Formatting inflates text (ZWSP padding, NBSP blank-line placeholders), so
126
+ # pieces are packed below the hard limit; the block builder re-splits any
127
+ # piece whose *formatted* text still exceeds the hard limit.
128
+ _MARKDOWN_SPLIT_TARGET_LENGTH = 11500
129
+ # Slack expands a ``markdown`` block server-side into native blocks and then
130
+ # enforces "no more than 50 items" on the expanded result *per message*.
131
+ # Measured against a real workspace (2026-06-11): each heading and each
132
+ # thematic break becomes its own item (50 headings accepted, 51 rejected;
133
+ # 30 headings in each of two blocks rejected), while paragraphs, lists,
134
+ # quotes, and fenced code merge into one item per run between those breakers
135
+ # (60 blank-separated paragraphs and 52 fences were accepted).
136
+ SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE = 50
137
+ # Per-block packing target, leaving headroom for estimation error.
138
+ _MARKDOWN_EXPANSION_ITEMS_TARGET = 45
139
+ # Third measured hard limit (2026-06-11): the text carried by one message's
140
+ # blocks *in total* — across block types; rich_text content counts too — may
141
+ # not exceed 13,200 characters (= 1.1 × the single-block limit). 13,201
142
+ # fails the whole call with ``msg_blocks_too_long``.
143
+ SLACK_MAX_MESSAGE_BLOCKS_TEXT_LENGTH = 13200
144
+ # Packing target, leaving headroom for structural fields the size proxy may
145
+ # not count exactly.
146
+ _MESSAGE_BLOCKS_TEXT_TARGET = 12800
147
+ ATX_HEADING_PATTERN = re.compile(r"^[ \t]{0,3}#{1,6}(?:[ \t]|$)")
110
148
 
111
149
 
112
150
  def decode_html_entities(text: str) -> str:
113
- """Decode HTML entities that may appear in model output."""
114
- if not text:
151
+ """Decode HTML entities in prose while leaving code regions verbatim.
152
+
153
+ Prose entities such as ``&gt;`` are decoded for natural reading, but a
154
+ fenced code block or inline code span showing ``&amp;`` keeps the literal
155
+ entity: code samples are content, not markup to repair.
156
+ """
157
+ if not text or "&" not in text:
115
158
  return text
116
- return html.unescape(text)
159
+ return _transform_outside_code_regions(text, html.unescape)
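The effect of the new `decode_html_entities` behaviour can be illustrated with a much smaller stand-in. The sketch below is not the package's implementation: `decode_outside_inline_code` is a hypothetical name, and it only recognizes single-backtick spans on one line, with no fenced-code handling.

```python
import html
import re

# One capturing group, so re.split alternates plain text and code spans.
INLINE_CODE = re.compile(r"(`[^`\n]*`)")


def decode_outside_inline_code(text: str) -> str:
    """Unescape HTML entities in prose, leaving `code` spans untouched."""
    parts = INLINE_CODE.split(text)
    return "".join(
        part if index % 2 else html.unescape(part)
        for index, part in enumerate(parts)
    )


print(decode_outside_inline_code("a &gt; b, but `&amp;` stays"))
```

Odd indices of the split result are the captured code spans, so only the even-indexed prose parts pass through `html.unescape`.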
117
160
 
118
161
 
119
162
  def strip_zero_width_spaces(text: str) -> str:
@@ -566,15 +609,90 @@ def _is_allowed_slack_angle_token(token: str) -> bool:
566
609
 
567
610
 
568
611
  def _find_inline_code_span_end(text: str, start: int) -> int | None:
612
+ """Find the end of the inline code span opened at ``start``.
613
+
614
+ Per CommonMark, a code span closes only with a backtick run of *equal*
615
+ length: a lone `` ` `` must not pair with the first backtick of a later
616
+ ``` `` ``` run. Runs of a different length are skipped whole.
617
+ """
569
618
  delimiter_end = start
570
619
  while delimiter_end < len(text) and text[delimiter_end] == "`":
571
620
  delimiter_end += 1
621
+ delimiter_length = delimiter_end - start
572
622
 
573
- delimiter = text[start:delimiter_end]
574
- closing = text.find(delimiter, delimiter_end)
575
- if closing == -1:
576
- return None
577
- return closing + len(delimiter)
623
+ cursor = delimiter_end
624
+ while True:
625
+ closing = text.find("`", cursor)
626
+ if closing == -1:
627
+ return None
628
+ run_end = closing
629
+ while run_end < len(text) and text[run_end] == "`":
630
+ run_end += 1
631
+ if run_end - closing == delimiter_length:
632
+ return run_end
633
+ cursor = run_end
634
+
635
+
636
+ def _transform_outside_inline_code(text: str, transform: Callable[[str], str]) -> str:
637
+ """Apply ``transform`` to text while keeping inline code spans verbatim.
638
+
639
+ Spans are bounded to a single line, matching this module's span model
640
+ (``INLINE_CODE_SPAN_PATTERN``). Without that bound, one stray backtick
641
+ would pair with a backtick on a much later line and silently suppress
642
+ sanitization for everything in between.
643
+
644
+ Spans are replaced with placeholder tokens (which contain no backticks or
645
+ angle brackets) rather than split out, so the transform still sees any
646
+ construct that *spans* a code span — e.g. an invalid angle token such as
647
+ ``<foo `bar` baz>`` is neutralized as a whole while the span content
648
+ itself stays verbatim. Reserved marker code points are stripped from the
649
+ input first, so crafted input cannot collide with the placeholders.
650
+ """
651
+ text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
652
+
653
+ spans: list[str] = []
654
+ parts: list[str] = []
655
+ plain_start = 0
656
+ cursor = text.find("`")
657
+
658
+ while cursor != -1:
659
+ span_end = _find_inline_code_span_end(text, cursor)
660
+ if span_end is None or "\n" in text[cursor:span_end]:
661
+ # No same-line closing run: the backticks are literal text.
662
+ delimiter_end = cursor
663
+ while delimiter_end < len(text) and text[delimiter_end] == "`":
664
+ delimiter_end += 1
665
+ cursor = text.find("`", delimiter_end)
666
+ continue
667
+ parts.append(text[plain_start:cursor])
668
+ parts.append(f"\ufff0code{len(spans)}\ufff1")
669
+ spans.append(text[cursor:span_end])
670
+ plain_start = span_end
671
+ cursor = text.find("`", span_end)
672
+
673
+ parts.append(text[plain_start:])
674
+ transformed = transform("".join(parts))
675
+
676
+ if not spans:
677
+ return transformed
678
+ placeholder_map = {f"\ufff0code{idx}\ufff1": span for idx, span in enumerate(spans)}
679
+ return INLINE_CODE_PLACEHOLDER_PATTERN.sub(
680
+ lambda match: placeholder_map.get(match.group(0), match.group(0)),
681
+ transformed,
682
+ )
683
+
684
+
685
+ def _transform_outside_code_regions(text: str, transform: Callable[[str], str]) -> str:
686
+ """Apply ``transform`` outside fenced code blocks and inline code spans.
687
+
688
+ Code samples must reach Slack verbatim: neither Slack's ``markdown`` block
689
+ renderer nor ``rich_text_preformatted`` interprets their content, so any
690
+ rewrite inside a code region is visible corruption.
691
+ """
692
+ return "".join(
693
+ chunk if is_fenced else _transform_outside_inline_code(chunk, transform)
694
+ for is_fenced, chunk in _split_fenced_code_chunks(text)
695
+ )
578
696
 
579
697
 
580
698
  def _is_punctuation_like(char: str, boundary_chars: set[str]) -> bool:
@@ -690,12 +808,21 @@ def normalize_bare_urls_for_slack_markdown(text: str) -> str:
690
808
 
691
809
 
692
810
  def sanitize_slack_text(text: str) -> str:
693
- """Remove control noise and neutralize invalid Slack angle tokens."""
811
+ """Remove control noise and neutralize invalid Slack angle tokens.
812
+
813
+ ANSI escapes, control characters, and this module's reserved in-band
814
+ marker code points are removed everywhere — including code regions —
815
+ because they are never legitimate visible content. Angle-token
816
+ neutralization rewrites visible text, so it skips fenced code blocks and
817
+ inline code spans: a code sample containing ``<div>`` must reach Slack
818
+ verbatim.
819
+ """
694
820
  if not text:
695
821
  return text
696
822
 
697
823
  cleaned = ANSI_ESCAPE_PATTERN.sub("", text)
698
824
  cleaned = CONTROL_CHAR_PATTERN.sub("", cleaned)
825
+ cleaned = INTERNAL_MARKER_CHAR_PATTERN.sub("", cleaned)
699
826
 
700
827
  def replace_invalid_token(match: re.Match[str]) -> str:
701
828
  token = match.group(0)
@@ -703,7 +830,10 @@ def sanitize_slack_text(text: str) -> str:
703
830
  return token
704
831
  return f"＜{token[1:-1]}＞"
705
832
 
706
- return SLACK_ANGLE_TOKEN_PATTERN.sub(replace_invalid_token, cleaned)
833
+ def neutralize_angle_tokens(segment: str) -> str:
834
+ return SLACK_ANGLE_TOKEN_PATTERN.sub(replace_invalid_token, segment)
835
+
836
+ return _transform_outside_code_regions(cleaned, neutralize_angle_tokens)
707
837
 
708
838
 
709
839
  def _match_fence_open(line: str) -> tuple[str, int] | None:
@@ -723,38 +853,58 @@ def _is_fence_close(line: str, fence: tuple[str, int]) -> bool:
723
853
  )
724
854
 
725
855
 
856
+ def _iter_fence_states(lines: Iterable[str]) -> Iterator[tuple[str, bool, bool]]:
857
+ """Yield ``(line, is_fenced, is_opening)`` for each line.
858
+
859
+ Single source of truth for fenced-code tracking across this module.
860
+ ``is_fenced`` covers the fence delimiter lines themselves and the body of
861
+ an unclosed trailing fence; ``is_opening`` marks the opening delimiter
862
+ line so callers can flush per-fence state. Works with or without trailing
863
+ newlines on the lines.
864
+ """
865
+ active_fence: tuple[str, int] | None = None
866
+ for line in lines:
867
+ if active_fence is None:
868
+ opening_fence = _match_fence_open(line)
869
+ if opening_fence is not None:
870
+ active_fence = opening_fence
871
+ yield line, True, True
872
+ continue
873
+ yield line, False, False
874
+ continue
875
+ yield line, True, False
876
+ if _is_fence_close(line, active_fence):
877
+ active_fence = None
878
+
879
+
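A stand-alone approximation of the `(line, is_fenced, is_opening)` stream may help when reading the callers below. The helper name `iter_fence_states` and the closing rule used here (same fence character, at least as long as the opener, nothing else on the line) are assumptions of this sketch; the module's own `_match_fence_open` and `_is_fence_close` are not reproduced.

```python
from __future__ import annotations

import re
from collections.abc import Iterable, Iterator

FENCE_OPEN = re.compile(r"^[ \t]{0,3}(`{3,}|~{3,})")


def iter_fence_states(lines: Iterable[str]) -> Iterator[tuple[str, bool, bool]]:
    """Yield (line, is_fenced, is_opening) for each line."""
    active: tuple[str, int] | None = None  # (fence character, fence length)
    for line in lines:
        if active is None:
            match = FENCE_OPEN.match(line)
            if match:
                active = (match.group(1)[0], len(match.group(1)))
                yield line, True, True
            else:
                yield line, False, False
            continue
        # Inside a fence: body and closing delimiter both count as fenced.
        yield line, True, False
        stripped = line.strip()
        if stripped and set(stripped) == {active[0]} and len(stripped) >= active[1]:
            active = None


for state in iter_fence_states(["a", "```py", "x", "```", "b"]):
    print(state)
```

An unclosed fence keeps every following line fenced, which matches the "unclosed trailing fence" wording in the docstring above.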
726
880
  def _split_fenced_code_chunks(text: str) -> list[tuple[bool, str]]:
727
881
  chunks: list[tuple[bool, str]] = []
728
882
  if not text:
729
883
  return chunks
730
884
 
731
885
  current: list[str] = []
732
- active_fence: tuple[str, int] | None = None
886
+ current_is_fenced = False
733
887
 
734
- for line in text.splitlines(keepends=True):
735
- opening_fence = _match_fence_open(line) if active_fence is None else None
736
-
737
- if opening_fence:
738
- if current:
739
- chunks.append((False, "".join(current)))
740
- current = []
741
- current.append(line)
742
- active_fence = opening_fence
743
- continue
744
-
745
- current.append(line)
746
- if active_fence and _is_fence_close(line, active_fence):
747
- chunks.append((True, "".join(current)))
888
+ for line, is_fenced, is_opening in _iter_fence_states(
889
+ text.splitlines(keepends=True)
890
+ ):
891
+ if current and (is_opening or is_fenced != current_is_fenced):
892
+ chunks.append((current_is_fenced, "".join(current)))
748
893
  current = []
749
- active_fence = None
894
+ current.append(line)
895
+ current_is_fenced = is_fenced
750
896
 
751
897
  if current:
752
- chunks.append((active_fence is not None, "".join(current)))
898
+ chunks.append((current_is_fenced, "".join(current)))
753
899
 
754
900
  return chunks
755
901
 
756
902
 
757
903
  def _normalize_underscore_emphasis_chunk(text: str) -> str:
904
+ # Same defense as _format_markdown_with_spacing_metadata: reserved marker
905
+ # code points in direct-call input must not collide with the numbered
906
+ # placeholders below.
907
+ text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
758
908
  protected_spans: list[str] = []
759
909
 
760
910
  def protect(match: re.Match[str]) -> str:
@@ -798,6 +948,11 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
798
948
  if not text:
799
949
  return text, []
800
950
 
951
+ # Defense in depth for direct calls that bypass sanitize_slack_text: input
952
+ # carrying our reserved marker code points would collide with the inline
953
+ # placeholder machinery below.
954
+ text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
955
+
801
956
  boundary_chars = {*VISIBLE_BOUNDARY_CHARS, ZWSP, SYNTH_SPACE_MARKER}
802
957
 
803
958
  def wrap_match(match: re.Match[str], source: str) -> str:
@@ -871,9 +1026,16 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
871
1026
  before_char = source[start - 1] if start > 0 else ""
872
1027
  after_char = source[end] if end < len(source) else ""
873
1028
  strategy = _nested_code_space_strategy(source, start, end, boundary_chars)
1029
+
1030
+ def resolve_placeholder_raw(placeholder_match: re.Match[str]) -> str:
1031
+ # Unknown placeholder-shaped sequences pass through unchanged
1032
+ # (belt-and-braces against in-band collisions; the markers are
1033
+ # already stripped at every entry point).
1034
+ entry = replacements.get(placeholder_match.group(0))
1035
+ return entry["raw"] if entry else placeholder_match.group(0)
1036
+
874
1037
  resolved_text = INLINE_CODE_PLACEHOLDER_PATTERN.sub(
875
- lambda placeholder_match: replacements[placeholder_match.group(0)]["raw"],
876
- match.group(0),
1038
+ resolve_placeholder_raw, match.group(0)
877
1039
  )
878
1040
  has_ascii_word = bool(re.search(r"[A-Za-z0-9]", resolved_text))
879
1041
  adjusted_text = match.group(0)
@@ -981,11 +1143,12 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
981
1143
  protected_segment,
982
1144
  )
983
1145
 
1146
+ def restore_placeholder(placeholder_match: re.Match[str]) -> str:
1147
+ entry = placeholder_map.get(placeholder_match.group(0))
1148
+ return entry["wrapped"] if entry else placeholder_match.group(0)
1149
+
984
1150
  protected_segment = INLINE_CODE_PLACEHOLDER_PATTERN.sub(
985
- lambda placeholder_match: placeholder_map[placeholder_match.group(0)][
986
- "wrapped"
987
- ],
988
- protected_segment,
1151
+ restore_placeholder, protected_segment
989
1152
  )
990
1153
 
991
1154
  protected_segment = re.sub(
@@ -1275,20 +1438,11 @@ def normalize_markdown_tables(markdown_text: str) -> str:
1275
1438
  normalized.extend(buffer)
1276
1439
  buffer = []
1277
1440
 
1278
- active_fence: tuple[str, int] | None = None
1279
-
1280
- for idx, line in enumerate(lines):
1281
- opening_fence = _match_fence_open(line) if active_fence is None else None
1282
- if opening_fence:
1283
- flush_buffer()
1284
- normalized.append(line)
1285
- active_fence = opening_fence
1286
- continue
1287
-
1288
- if active_fence:
1441
+ for idx, (line, is_fenced, is_opening) in enumerate(_iter_fence_states(lines)):
1442
+ if is_fenced:
1443
+ if is_opening:
1444
+ flush_buffer()
1289
1445
  normalized.append(line)
1290
- if _is_fence_close(line, active_fence):
1291
- active_fence = None
1292
1446
  continue
1293
1447
 
1294
1448
  stripped = line.strip()
@@ -1608,9 +1762,283 @@ def _create_markdown_block(
1608
1762
  block._plain_text = plain_text
1609
1763
  block._synthetic_space_indices = synthetic_indices
1610
1764
  block._synthetic_blank_line_indices = synthetic_blank_line_indices
1765
+ block._expansion_items = _estimate_markdown_expansion_items(formatted)
1611
1766
  return block
1612
1767
 
1613
1768
 
1769
+ def _is_markdown_expansion_breaker_line(line: str) -> bool:
1770
+ """Return True for lines Slack expands into their own top-level block.
1771
+
1772
+ ATX headings and thematic breaks each become one expansion item and end
1773
+ the surrounding content run. A setext ``===`` underline is counted too;
1774
+ that can over-count by one (heading text + underline), which only makes
1775
+ the estimate conservative.
1776
+ """
1777
+ if ATX_HEADING_PATTERN.match(line):
1778
+ return True
1779
+ if _is_thematic_break_line(line):
1780
+ return True
1781
+ stripped = line.strip()
1782
+ return bool(stripped) and set(stripped) == {"="}
1783
+
1784
+
1785
+ def _estimate_markdown_expansion_items(text: str) -> int:
1786
+ """Estimate how many native blocks Slack expands this markdown text into.
1787
+
1788
+ Model measured against a real workspace (see the constants above): each
1789
+ heading / thematic break is one item, and each maximal run of any other
1790
+ content *between* those breakers is one item — blank lines inside a run
1791
+ do not split it. Fenced lines always count as run content.
1792
+ """
1793
+ items = 0
1794
+ in_run = False
1795
+ for line, is_fenced, _ in _iter_fence_states(text.split("\n")):
1796
+ if is_fenced:
1797
+ if not in_run:
1798
+ items += 1
1799
+ in_run = True
1800
+ continue
1801
+ if not line.strip():
1802
+ continue
1803
+ if _is_markdown_expansion_breaker_line(line):
1804
+ items += 1
1805
+ in_run = False
1806
+ continue
1807
+ if not in_run:
1808
+ items += 1
1809
+ in_run = True
1810
+ return max(1, items)
1811
+
1812
+
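The counting model in the docstring above can be checked against the measured cases with a reduced estimator. This sketch is an approximation under stated assumptions: `estimate_items` is a hypothetical name, fenced code and setext underlines are ignored, and the thematic-break regex is a simplified stand-in for the module's `_is_thematic_break_line`.

```python
import re

HEADING = re.compile(r"^[ \t]{0,3}#{1,6}(?:[ \t]|$)")
# Simplified: three or more of the same marker, optionally space-separated.
THEMATIC_BREAK = re.compile(r"^[ \t]{0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")


def estimate_items(text: str) -> int:
    """Count breakers individually and each run between them once."""
    items = 0
    in_run = False
    for line in text.split("\n"):
        if not line.strip():
            continue  # blank lines never split a run
        if HEADING.match(line) or THEMATIC_BREAK.match(line):
            items += 1
            in_run = False
        elif not in_run:
            items += 1
            in_run = True
    return max(1, items)


fifty_headings = "\n".join(f"# H{i}" for i in range(50))
sixty_paragraphs = "\n\n".join(f"para {i}" for i in range(60))
print(estimate_items(fifty_headings), estimate_items(sixty_paragraphs))
```

Fifty headings count as fifty items, while sixty blank-separated paragraphs collapse into a single run, consistent with the accepted and rejected cases recorded in the constants.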
1813
+ def _split_text_at_blank_lines(text: str, max_length: int, max_items: int) -> list[str]:
1814
+ """Greedily pack paragraph units into pieces within both budgets.
1815
+
1816
+ Units are separated by blank-line runs outside fenced code (blank lines
1817
+ inside a fence never split). The blank run at a piece boundary is dropped
1818
+ — adjacent Slack blocks already render separated — while blank runs
1819
+ packed inside a piece are kept verbatim. A piece is closed when adding
1820
+ the next unit would exceed ``max_length`` characters or ``max_items``
1821
+ estimated expansion items. A single unit over either budget is returned
1822
+ oversized; the caller splits it harder.
1823
+ """
1824
+ if (
1825
+ len(text) <= max_length
1826
+ and _estimate_markdown_expansion_items(text) <= max_items
1827
+ ):
1828
+ return [text]
1829
+
1830
+ units: list[list[str]] = []
1831
+ content: list[str] = []
1832
+ blanks: list[str] = []
1833
+ for line, is_fenced, _ in _iter_fence_states(text.split("\n")):
1834
+ if not is_fenced and not line.strip():
1835
+ if content:
1836
+ blanks.append(line)
1837
+ else:
1838
+ # Leading blank lines stay attached to the first unit.
1839
+ content.append(line)
1840
+ continue
1841
+ if blanks:
1842
+ units.append(content)
1843
+ units.append(blanks)
1844
+ content, blanks = [], []
1845
+ content.append(line)
1846
+ if content:
1847
+ units.append(content)
1848
+ if blanks:
1849
+ units.append(blanks)
1850
+
1851
+ pieces: list[str] = []
1852
+ current: list[str] = []
1853
+ pending_blanks: list[str] = []
1854
+ for index in range(0, len(units), 2):
1855
+ unit = units[index]
1856
+ candidate = current + (pending_blanks if current else []) + unit
1857
+ candidate_text = "\n".join(candidate)
1858
+ if current and (
1859
+ len(candidate_text) > max_length
1860
+ or _estimate_markdown_expansion_items(candidate_text) > max_items
1861
+ ):
1862
+ pieces.append("\n".join(current))
1863
+ current = list(unit)
1864
+ else:
1865
+ current = candidate
1866
+ pending_blanks = units[index + 1] if index + 1 < len(units) else []
1867
+ if current:
1868
+ pieces.append("\n".join(current))
1869
+ return pieces
1870
+
1871
+
1872
+ def _split_single_line_to_length(line: str, max_length: int) -> list[str]:
1873
+ """Split one overlong line, preferring a space boundary near the limit."""
1874
+ parts: list[str] = []
1875
+ while len(line) > max_length:
1876
+ cut = line.rfind(" ", 1, max_length + 1)
1877
+ if cut <= 0:
1878
+ parts.append(line[:max_length])
1879
+ line = line[max_length:]
1880
+ else:
1881
+ parts.append(line[:cut])
1882
+ line = line[cut + 1 :]
1883
+ parts.append(line)
1884
+ return parts
1885
+
1886
+
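Two worked cases show how the word-boundary split and the hard cut behave. The helper is restated under the hypothetical name `split_long_line` only so that the snippet runs on its own; it is meant to mirror the logic shown above, not replace it.

```python
def split_long_line(line: str, max_length: int) -> list[str]:
    """Split one line, preferring the last space at or before the limit."""
    parts: list[str] = []
    while len(line) > max_length:
        cut = line.rfind(" ", 1, max_length + 1)
        if cut <= 0:
            # No usable space: hard cut at the limit.
            parts.append(line[:max_length])
            line = line[max_length:]
        else:
            # Cut at the space and drop it from the continuation.
            parts.append(line[:cut])
            line = line[cut + 1:]
    parts.append(line)
    return parts


print(split_long_line("alpha beta gamma", 10))
print(split_long_line("あ" * 25, 10))
```

The first call cuts at the space after `beta`; the second has no spaces, so it falls back to fixed-width cuts of 10, 10, and 5 characters.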
1887
+ def _split_lines_to_length(text: str, max_length: int, max_items: int) -> list[str]:
1888
+ """Split at line boundaries into pieces within both budgets.
1889
+
1890
+ Last-resort splitter for content that exceeds a budget without blank-line
1891
+ split points (a single huge paragraph, a fence body, or a long run of
1892
+ headings). When the cut lands inside an (unclosed) fence, the continuation
1893
+ piece re-opens the fence with the original delimiter line so both pieces
1894
+ keep rendering as code.
1895
+ """
1896
+ pieces: list[str] = []
1897
+ current: list[str] = []
1898
+ current_len = 0
1899
+ current_items = 0
1900
+ in_run = False
1901
+ active_fence_open: str | None = None
1902
+
1903
+ def flush(next_fence_prefix: str | None) -> None:
1904
+ nonlocal current, current_len, current_items, in_run
1905
+ if current:
1906
+ pieces.append("\n".join(current))
1907
+ if next_fence_prefix:
1908
+ current = [next_fence_prefix]
1909
+ current_len = len(next_fence_prefix)
1910
+ current_items = 1
1911
+ in_run = True
1912
+ else:
1913
+ current = []
1914
+ current_len = 0
1915
+ current_items = 0
1916
+ in_run = False
1917
+
1918
+ for line, is_fenced, is_opening in _iter_fence_states(text.split("\n")):
1919
+ if is_opening:
1920
+ active_fence_open = line
1921
+ elif not is_fenced:
1922
+ active_fence_open = None
1923
+
1924
+ line_is_blank = not is_fenced and not line.strip()
1925
+ line_is_breaker = (
1926
+ not is_fenced
1927
+ and not line_is_blank
1928
+ and _is_markdown_expansion_breaker_line(line)
1929
+ )
1930
+
1931
+ for part_index, part in enumerate(
1932
+ [line]
1933
+ if len(line) <= max_length
1934
+ else _split_single_line_to_length(line, max_length)
1935
+ ):
1936
+ # Word-split continuations of a breaker line render as plain
1937
+ # content, so only the first part keeps the breaker class.
1938
+ is_breaker = line_is_breaker and part_index == 0
1939
+ if line_is_blank:
1940
+ part_items = 0
1941
+ elif is_breaker:
1942
+ part_items = 1
1943
+ else:
1944
+ part_items = 0 if in_run else 1
1945
+
1946
+ added = len(part) + (1 if current else 0)
1947
+ if current and (
1948
+ current_len + added > max_length
1949
+ or current_items + part_items > max_items
1950
+ ):
1951
+ reopen = active_fence_open if active_fence_open != part else None
1952
+ flush(reopen)
1953
+ added = len(part) + (1 if current else 0)
1954
+ if line_is_blank:
1955
+ part_items = 0
1956
+ elif is_breaker:
1957
+ part_items = 1
1958
+ else:
1959
+ part_items = 0 if in_run else 1
1960
+ current.append(part)
1961
+ current_len += added
1962
+ current_items += part_items
1963
+ if is_breaker:
1964
+ in_run = False
1965
+ elif not line_is_blank:
1966
+ in_run = True
1967
+
1968
+ if current:
1969
+ pieces.append("\n".join(current))
1970
+ return pieces
1971
+
1972
+
1973
+ def _markdown_block_fits_slack_limits(block: dict[str, Any]) -> bool:
1974
+ return (
1975
+ len(block["text"]) <= SLACK_MAX_MARKDOWN_TEXT_LENGTH
1976
+ and _estimate_markdown_expansion_items(block["text"])
1977
+ <= SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE
1978
+ )
1979
+
1980
+
1981
+ def _create_markdown_blocks(
1982
+ content: str, *, preserve_visual_blank_lines: bool = False
1983
+ ) -> list[dict[str, Any]]:
1984
+ """Build ``markdown`` blocks that fit Slack's measured hard limits.
1985
+
1986
+ The whole content is tried as a single block first. Only when the
1987
+ *formatted* text exceeds ``SLACK_MAX_MARKDOWN_TEXT_LENGTH`` or the
1988
+ estimated server-side expansion exceeds the per-message item budget is
1989
+ the raw content split — at paragraph boundaries when possible, then at
1990
+ line/word boundaries — and each piece re-checked after formatting,
1991
+ shrinking the packing budget geometrically until every block fits.
1992
+ """
1993
+
1994
+ def build(piece: str) -> dict[str, Any] | None:
1995
+ return _create_markdown_block(
1996
+ piece, preserve_visual_blank_lines=preserve_visual_blank_lines
1997
+ )
1998
+
1999
+ whole = build(content)
2000
+ if whole is None:
2001
+ return []
2002
+ if _markdown_block_fits_slack_limits(whole):
2003
+ return [whole]
2004
+
2005
+ blocks: list[dict[str, Any]] = []
2006
+ worklist: list[tuple[str, int, int]] = [
2007
+ (content, _MARKDOWN_SPLIT_TARGET_LENGTH, _MARKDOWN_EXPANSION_ITEMS_TARGET)
2008
+ ]
2009
+ while worklist:
2010
+ piece, budget, items_budget = worklist.pop(0)
2011
+ block = build(piece)
2012
+ if block is None:
2013
+ continue
2014
+ if _markdown_block_fits_slack_limits(block):
2015
+ blocks.append(block)
2016
+ continue
2017
+
2018
+ sub_pieces = _split_text_at_blank_lines(piece, budget, items_budget)
2019
+ if len(sub_pieces) == 1:
2020
+ sub_pieces = _split_lines_to_length(piece, budget, items_budget)
2021
+ if len(sub_pieces) > 1:
2022
+ worklist = [(sub, budget, items_budget) for sub in sub_pieces] + worklist
2023
+ continue
2024
+ if budget > 256 or items_budget > 8:
2025
+ # The piece fits the raw budgets but its *formatted* text overflows
2026
+ # (ZWSP/NBSP inflation or estimation drift): shrink and retry.
2027
+ worklist.insert(
2028
+ 0,
2029
+ (
2030
+ piece,
2031
+ max(256, int(budget * 0.8)),
2032
+ max(8, int(items_budget * 0.8)),
2033
+ ),
2034
+ )
2035
+ continue
2036
+ # A floor-budget piece cannot exceed the hard limits, so this is
2037
+ # unreachable; keep the block rather than loop forever.
2038
+ blocks.append(block)
2039
+ return blocks
2040
+
2041
+
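The geometric shrinking in the retry branch can be made concrete by listing the budgets it would step through. The function `shrink_schedule` is a hypothetical helper for illustration; in the code above a shrink step is taken only when a piece cannot be split further at the current budgets, so a real run usually visits only a few of these steps.

```python
def shrink_schedule(
    budget: int = 11_500,
    items: int = 45,
    factor: float = 0.8,
    floor_budget: int = 256,
    floor_items: int = 8,
) -> list[tuple[int, int]]:
    """List successive (character budget, item budget) pairs down to the floors."""
    steps = [(budget, items)]
    while budget > floor_budget or items > floor_items:
        budget = max(floor_budget, int(budget * factor))
        items = max(floor_items, int(items * factor))
        steps.append((budget, items))
    return steps


schedule = shrink_schedule()
print(schedule[:3], "...", schedule[-1])
```

Both budgets are clamped at their floors, so the sequence is finite and ends at (256, 8), after which the loop above keeps the block instead of retrying.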
1614
2042
  def _create_rich_text_block(
1615
2043
  elements: list[dict[str, Any]], *, plain_text: str | None = None
1616
2044
  ) -> dict[str, Any]:
@@ -1903,12 +2331,12 @@ def _convert_markdown_text_segment_to_blocks(
1903
2331
  markdown_buffer.pop()
1904
2332
  if not markdown_buffer:
1905
2333
  return
1906
- markdown_block = _create_markdown_block(
1907
- "\n".join(markdown_buffer),
1908
- preserve_visual_blank_lines=preserve_visual_blank_lines,
2334
+ blocks.extend(
2335
+ _create_markdown_blocks(
2336
+ "\n".join(markdown_buffer),
2337
+ preserve_visual_blank_lines=preserve_visual_blank_lines,
2338
+ )
1909
2339
  )
1910
- if markdown_block:
1911
- blocks.append(markdown_block)
1912
2340
  markdown_buffer = []
1913
2341
 
1914
2342
  while cursor < len(lines):
@@ -1956,16 +2384,10 @@ def split_markdown_into_segments(markdown_text: str) -> list[dict[str, str]]:
1956
2384
  current = []
1957
2385
  current_is_table = None
1958
2386
 
1959
- active_fence: tuple[str, int] | None = None
1960
-
1961
- for line in lines:
2387
+ for line, is_fenced, _ in _iter_fence_states(lines):
1962
2388
  stripped = line.strip()
1963
- opening_fence = _match_fence_open(line) if active_fence is None else None
1964
- is_fenced_line = active_fence is not None or opening_fence is not None
1965
2389
  is_table_line = (
1966
- False
1967
- if is_fenced_line
1968
- else stripped.startswith("|") and stripped.endswith("|")
2390
+ False if is_fenced else stripped.startswith("|") and stripped.endswith("|")
1969
2391
  )
1970
2392
 
1971
2393
  if current_is_table is None:
@@ -1978,11 +2400,6 @@ def split_markdown_into_segments(markdown_text: str) -> list[dict[str, str]]:
1978
2400
  current_is_table = is_table_line
1979
2401
  current.append(line)
1980
2402
 
1981
- if opening_fence:
1982
- active_fence = opening_fence
1983
- elif active_fence and _is_fence_close(line, active_fence):
1984
- active_fence = None
1985
-
1986
2403
  flush()
1987
2404
  return segments
1988
2405
 
@@ -2026,10 +2443,59 @@ def convert_markdown_to_slack_blocks(
2026
2443
  convert_markdown_text_to_blocks = convert_markdown_to_slack_blocks
2027
2444
 
2028
2445
 
2446
+ def _block_expansion_weight(block: dict[str, Any]) -> int:
2447
+ """Weight of one block against Slack's per-message expansion budget.
2448
+
2449
+ Slack expands ``markdown`` blocks server-side and enforces the 50-item
2450
+ limit on the expanded result, so a markdown block counts as its estimated
2451
+ expansion; every other block type posts as a single item.
2452
+ """
2453
+ if not isinstance(block, dict) or block.get("type") != "markdown":
2454
+ return 1
2455
+ annotated = getattr(block, "_expansion_items", None)
2456
+ if isinstance(annotated, int) and annotated > 0:
2457
+ return annotated
2458
+ return _estimate_markdown_expansion_items(str(block.get("text", "")))
2459
+
2460
+
2461
+ _TEXT_SIZE_KEYS = frozenset({"text", "url", "alt_text", "image_url"})
2462
+
2463
+
2464
+ def _block_text_size(value: Any) -> int:
2465
+ """Rough text payload of a block against the per-message total budget.
2466
+
2467
+ Slack's ``msg_blocks_too_long`` check counts content across block types
2468
+ (an 11,900-char markdown block plus a 1,400-char rich_text was rejected),
2469
+ so this sums every string under content-carrying keys, recursively.
2470
+ """
2471
+ if isinstance(value, dict):
2472
+ total = 0
2473
+ for key, sub in value.items():
2474
+ if key in _TEXT_SIZE_KEYS and isinstance(sub, str):
2475
+ total += len(sub)
2476
+ else:
2477
+ total += _block_text_size(sub)
2478
+ return total
2479
+ if isinstance(value, list):
2480
+ return sum(_block_text_size(item) for item in value)
2481
+ return 0
2482
+
2483
+
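Reviewer note: the recursive text-size count added above can be exercised on its own. The sketch below mirrors the logic of `_block_text_size` under public names chosen for illustration (`TEXT_SIZE_KEYS`, `block_text_size`); the sample blocks are hypothetical and are not taken from the package's tests.

```python
from typing import Any

# Same key set as the diff's _TEXT_SIZE_KEYS; the public name is illustrative.
TEXT_SIZE_KEYS = frozenset({"text", "url", "alt_text", "image_url"})


def block_text_size(value: Any) -> int:
    """Sum the lengths of strings stored under content-carrying keys."""
    if isinstance(value, dict):
        total = 0
        for key, sub in value.items():
            if key in TEXT_SIZE_KEYS and isinstance(sub, str):
                total += len(sub)
            else:
                # Non-string values (nested dicts, lists) are walked recursively;
                # strings under other keys such as "type" contribute nothing.
                total += block_text_size(sub)
        return total
    if isinstance(value, list):
        return sum(block_text_size(item) for item in value)
    return 0


# Hypothetical sample blocks, for illustration only.
markdown_block = {"type": "markdown", "text": "x" * 100}
section_block = {"type": "section", "text": {"type": "mrkdwn", "text": "abc"}}
rich_text_block = {
    "type": "rich_text",
    "elements": [
        {
            "type": "rich_text_section",
            "elements": [
                {"type": "text", "text": "hello"},
                {"type": "link", "url": "https://example.com", "text": "site"},
            ],
        }
    ],
}

print(block_text_size(markdown_block))   # 100
print(block_text_size(section_block))    # 3
print(block_text_size(rich_text_block))  # 28
```

Note that a `text` key holding a nested object (as in the section block) is not counted directly; the function descends into it and counts the inner string instead.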
2029
2484
  def split_blocks_by_table(blocks: list[dict[str, Any]]) -> list[list[dict[str, Any]]]:
2030
- """Split blocks to satisfy Slack table and per-message block constraints."""
2485
+ """Split blocks to satisfy Slack table and per-message constraints.
2486
+
2487
+ A message holds at most one ``table`` block, at most
2488
+ ``SLACK_MAX_BLOCKS_PER_MESSAGE`` posted blocks, at most
2489
+ ``SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE`` estimated post-expansion items
2490
+ (headings and thematic breaks inside ``markdown`` blocks count
2491
+ individually toward that budget), and at most
2492
+ ``SLACK_MAX_MESSAGE_BLOCKS_TEXT_LENGTH`` characters of block text in
2493
+ total across block types.
2494
+ """
2031
2495
  messages: list[list[dict[str, Any]]] = []
2032
2496
  current_message: list[dict[str, Any]] = []
2497
+ current_weight = 0
2498
+ current_text_size = 0
2033
2499
 
2034
2500
  for block in blocks or []:
2035
2501
  if isinstance(block, dict) and block.get("type") == "table":
@@ -2037,11 +2503,23 @@ def split_blocks_by_table(blocks: list[dict[str, Any]]) -> list[list[dict[str, A
2037
2503
  messages.append(current_message)
2038
2504
  messages.append([block])
2039
2505
  current_message = []
2506
+ current_weight = 0
2507
+ current_text_size = 0
2040
2508
  else:
2041
- if len(current_message) >= SLACK_MAX_BLOCKS_PER_MESSAGE:
2509
+ weight = _block_expansion_weight(block)
2510
+ text_size = _block_text_size(block)
2511
+ if current_message and (
2512
+ len(current_message) >= SLACK_MAX_BLOCKS_PER_MESSAGE
2513
+ or current_weight + weight > SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE
2514
+ or current_text_size + text_size > _MESSAGE_BLOCKS_TEXT_TARGET
2515
+ ):
2042
2516
  messages.append(current_message)
2043
2517
  current_message = []
2518
+ current_weight = 0
2519
+ current_text_size = 0
2044
2520
  current_message.append(block)
2521
+ current_weight += weight
2522
+ current_text_size += text_size
2045
2523
 
2046
2524
  if current_message:
2047
2525
  messages.append(current_message)
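Reviewer note: the packing loop in this hunk is a greedy first-fit over three budgets. The sketch below is a minimal standalone version, not the package's code. The constant values are assumptions (the diff does not show `SLACK_MAX_BLOCKS_PER_MESSAGE` or `_MESSAGE_BLOCKS_TEXT_TARGET`; only the 50-item expansion figure appears in the README text), the `weight` field on the sample blocks is hypothetical, and the handling of a pending message before a table is assumed from the lines elided between hunks.

```python
from typing import Any, Callable

Block = dict[str, Any]

MAX_BLOCKS = 50            # assumption: value not shown in the diff
MAX_EXPANSION_ITEMS = 50   # README text: 50 server-side expansion items
TEXT_TARGET = 13_000       # assumption: some target below the 13,200 limit


def pack_blocks(
    blocks: list[Block],
    weight_of: Callable[[Block], int],
    text_size_of: Callable[[Block], int],
) -> list[list[Block]]:
    """Greedily pack blocks into messages without exceeding any budget."""
    messages: list[list[Block]] = []
    current: list[Block] = []
    weight_total = 0
    text_total = 0

    for block in blocks:
        if block.get("type") == "table":
            # A table always travels alone; flush whatever came before it.
            if current:
                messages.append(current)
            messages.append([block])
            current, weight_total, text_total = [], 0, 0
            continue

        weight = weight_of(block)
        size = text_size_of(block)
        # The `current and` guard lets a single oversized block through
        # alone instead of emitting an empty message in front of it.
        if current and (
            len(current) >= MAX_BLOCKS
            or weight_total + weight > MAX_EXPANSION_ITEMS
            or text_total + size > TEXT_TARGET
        ):
            messages.append(current)
            current, weight_total, text_total = [], 0, 0

        current.append(block)
        weight_total += weight
        text_total += size

    if current:
        messages.append(current)
    return messages


def demo_weight(block: Block) -> int:
    return int(block.get("weight", 1))  # hypothetical annotation field


def demo_text_size(block: Block) -> int:
    return len(str(block.get("text", "")))


heavy = [{"type": "markdown", "text": "a", "weight": w} for w in (30, 30, 10)]
print([len(m) for m in pack_blocks(heavy, demo_weight, demo_text_size)])  # [1, 2]
```

With weights 30, 30, and 10, the second block would push the running total to 60, so it starts a new message; the third then fits alongside it at 40.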
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: slack-markdown-parser
3
- Version: 2.4.4
3
+ Version: 2.5.0
4
4
  Summary: Convert LLM Markdown into Slack Block Kit messages
5
5
  Author: darkgaldragon
6
6
  License-Expression: MIT
@@ -62,7 +62,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
62
62
  - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
63
63
  - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
64
64
  - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
65
- - Remove ANSI/control characters and neutralize invalid Slack angle-bracket tokens before block generation
65
+ - Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
66
+ - Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
66
67
  - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
67
68
  - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
68
69
  - Support Markdown links and Slack-style links inside table cells
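Reviewer note: the "preferring paragraph boundaries" behaviour described in the feature list can be pictured with a small sketch. This is not the package's splitter, which is not shown in this part of the diff; `split_markdown_text` is a hypothetical helper, and it ignores fenced code blocks, tables, and list structure that a real implementation would need to respect. Only the 12,000-character default comes from the README text.

```python
def split_markdown_text(text: str, limit: int = 12_000) -> list[str]:
    """Split text into chunks of at most `limit` chars, breaking at blank lines."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # A paragraph longer than the limit is cut hard, as a last resort.
        pieces = [
            paragraph[i : i + limit] for i in range(0, len(paragraph), limit)
        ] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= limit:
                current = candidate
            else:
                # The next paragraph does not fit: close the chunk here.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 10
print([len(chunk) for chunk in split_markdown_text(sample, limit=25)])  # [22, 10]
```

The first two paragraphs fit together in 22 characters; adding the third would need 34, so it opens a new chunk. Each chunk could then be wrapped in its own `markdown` block before the per-message budgets are applied.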
@@ -115,7 +116,8 @@ What this library compensates for:
115
116
  - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
116
117
  - Keeps table-like rows inside fenced code blocks out of table normalization
117
118
  - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
118
- - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
119
+ - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
120
+ - Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
119
121
 
120
122
  ## Requirements
121
123
 
@@ -151,7 +153,7 @@ for payload in convert_markdown_to_slack_payloads(
151
153
  print(payload)
152
154
  ```
153
155
 
154
- `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
156
+ `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
155
157
  Set `preserve_visual_blank_lines=True` when you want the parser to compensate
156
158
  for Slack's currently tight paragraph spacing inside `markdown` blocks.
157
159
  The blank-line workaround is intentionally narrow: it skips table segments and
@@ -205,7 +207,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
205
207
 
206
208
  | Function | Description |
207
209
  |---|---|
208
- | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
210
+ | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
209
211
  | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
210
212
  | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
211
213
  | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -225,9 +227,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
225
227
  | Function | Description |
226
228
  |---|---|
227
229
  | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
230
+ | `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
228
231
  | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
229
- | `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
230
- | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
232
+ | `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
233
+ | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
231
234
  | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
232
235
 
233
236
  ### Lower-level exported helpers