slack-markdown-parser 2.4.3.tar.gz → 2.5.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (18)
  1. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/CHANGELOG.md +19 -0
  2. {slack_markdown_parser-2.4.3/slack_markdown_parser.egg-info → slack_markdown_parser-2.5.0}/PKG-INFO +10 -7
  3. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/README-ja.md +9 -6
  4. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/README.md +9 -6
  5. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/docs/spec-ja.md +28 -5
  6. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/docs/spec.md +29 -5
  7. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/pyproject.toml +1 -1
  8. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/slack_markdown_parser/__init__.py +1 -1
  9. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/slack_markdown_parser/converter.py +678 -152
  10. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0/slack_markdown_parser.egg-info}/PKG-INFO +10 -7
  11. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/LICENSE +0 -0
  12. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/MANIFEST.in +0 -0
  13. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/setup.cfg +0 -0
  14. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/slack_markdown_parser/py.typed +0 -0
  15. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/SOURCES.txt +0 -0
  16. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/dependency_links.txt +0 -0
  17. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/requires.txt +0 -0
  18. {slack_markdown_parser-2.4.3 → slack_markdown_parser-2.5.0}/slack_markdown_parser.egg-info/top_level.txt +0 -0
@@ -6,6 +6,25 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
6
6
 
7
7
  ## [Unreleased]
8
8
 
9
+ ## [2.5.0] - 2026-06-11
10
+
11
+ ### Added
12
+
13
+ - Added automatic size splitting so long or heading-dense LLM output no longer fails `chat.postMessage` outright. Three Slack-side hard limits were measured against a real workspace on 2026-06-11 and are now enforced at conversion time: a `markdown` block's `text` accepts exactly 12,000 characters (`msg_too_long` beyond that); Slack expands `markdown` blocks server-side and enforces "no more than 50 items" per message on the expanded result (`invalid_blocks`), where each heading and each thematic break is one item while paragraph/list/quote/fence runs between them merge into one; and one message's blocks may carry at most 13,200 characters of text in total across block types (`msg_blocks_too_long`). Oversized regions are split preferring paragraph boundaries, then line and word boundaries, with a hard cut as a last resort for space-less CJK; a cut inside an unclosed fence re-opens the fence in the continuation block so both halves keep rendering as code. Pieces are re-checked after ZWSP/NBSP formatting and re-split with shrinking budgets when they still overflow. `convert_markdown_to_slack_messages` packs blocks under all three budgets in addition to the existing one-table-per-message rule; documents already within every limit are returned unchanged. Note that the same input can now produce more blocks and more messages than 2.4.x when it previously exceeded Slack's limits (which used to fail delivery entirely).
14
+
15
+ ### Fixed
16
+
17
+ - Stopped corrupting code samples during sanitization. `decode_html_entities` and the angle-token neutralization inside `sanitize_slack_text` ran over the whole text, so a fenced code block or inline code span containing `<div>` or `&amp;` reached Slack as `＜div＞` / `&` even though Slack renders code content literally. Both passes now skip fenced code blocks and inline code spans; ANSI/control-character removal still applies everywhere. For this purpose a code span is recognized within a single line only and closes only on a backtick run of equal length (CommonMark pairing), so a stray unpaired backtick stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`<foo `bar` baz>`) is still neutralized as a whole while the span content stays verbatim.
18
+ - Stopped crafted input from colliding with internal placeholders. Input carrying this library's reserved in-band marker code points (`U+2063`, `U+FFF0`–`U+FFF3`, e.g. a literal `￰code0￱` sequence) could crash conversion with `KeyError` or get substituted with another code span's content. The markers are now stripped during sanitization and at direct-call entry points of the placeholder machinery, and placeholder restoration passes unknown sequences through instead of raising.
19
+ - Consolidated the three duplicated fenced-code tracking loops onto a single `_iter_fence_states` helper so fence semantics cannot drift between passes again — that drift is exactly how the sanitize corruption happened.
20
+
21
+ ## [2.4.4] - 2026-06-10
22
+
23
+ ### Fixed
24
+
25
+ - Stopped a link wrapped entirely in emphasis (`**[text](url)**`, and the `*`/`_`/`__`/`~~`/`***` variants) from rendering as dead, literal text in `rich_text` contexts such as list items and table cells. The emphasis token pattern matched the whole `**[text](url)**` span before the link pattern got a chance, so `_create_rich_text_inline_elements` emitted the inner `[text](url)` as a styled plain-text run and the link was never clickable. When a styled (non-code) token's inner content is itself a complete markdown link, it now becomes a `link` element carrying that style (e.g. a bold link), matching the already-working emphasis-inside-the-brackets form (`[**text**](url)`).
26
+ - Stopped Slack mention tokens from rendering as literal text inside promoted list items. Since 2.4.0 a simple list is emitted as a `rich_text` block, but the inline builder only tokenized links, code, and emphasis — so a `<#C123>` / `<@U123>` / `<!subteam^S123>` / `<!here>` token in a list item fell through as a plain `text` run. In a `rich_text` block a mention has to be a structured element (`channel`, `user`, `usergroup`, `broadcast`), so Slack showed the raw `<#C123>` rather than a pill. (Prose was unaffected: it stays in a `markdown` block, where Slack resolves the token itself.) These tokens are now converted to the matching rich_text elements, an optional `|label` display suffix is dropped (Slack renders the element from the id), and the plain-text fallback re-emits the canonical `<#C123>` token so a downgraded mrkdwn fallback still links and notifies. The same applies inside table cells: `extract_plain_text_from_table_cell` now delegates inline runs to the shared rich_text downgrade path instead of a separate near-copy, so a mention in a cell also survives into the fallback text rather than vanishing.
27
+
9
28
  ## [2.4.3] - 2026-05-29
10
29
 
11
30
  ### Fixed
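As an illustration of the 2.4.4 mention fix described above, the sketch below maps Slack mention tokens to structured `rich_text` elements. It is not the library's implementation: the function name `inline_elements` and the ID patterns in the regular expression are assumptions made for this example. The element shapes follow Slack's `user`, `channel`, `usergroup`, and `broadcast` rich text elements.

```python
import re

# Assumed token shapes for illustration; real Slack IDs may vary.
MENTION = re.compile(
    r"<(?:@(?P<user>[UW][A-Z0-9]+)"
    r"|#(?P<channel>[CG][A-Z0-9]+)"
    r"|!subteam\^(?P<usergroup>S[A-Z0-9]+)"
    r"|!(?P<broadcast>here|channel|everyone))"
    r"(?:\|[^>]*)?>"  # optional |label suffix, dropped from the element
)


def inline_elements(text: str) -> list[dict]:
    """Split text into plain `text` runs and structured mention elements."""
    elements = []
    pos = 0
    for match in MENTION.finditer(text):
        if match.start() > pos:
            elements.append({"type": "text", "text": text[pos:match.start()]})
        if match.group("user"):
            elements.append({"type": "user", "user_id": match.group("user")})
        elif match.group("channel"):
            elements.append({"type": "channel", "channel_id": match.group("channel")})
        elif match.group("usergroup"):
            elements.append(
                {"type": "usergroup", "usergroup_id": match.group("usergroup")}
            )
        else:
            elements.append({"type": "broadcast", "range": match.group("broadcast")})
        pos = match.end()
    if pos < len(text):
        elements.append({"type": "text", "text": text[pos:]})
    return elements
```

Calling `inline_elements("ping <@U123> in <#C123|general>")` yields a `text` run, a `user` element, another `text` run, and a `channel` element without the `|general` label.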
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: slack-markdown-parser
3
- Version: 2.4.3
3
+ Version: 2.5.0
4
4
  Summary: Convert LLM Markdown into Slack Block Kit messages
5
5
  Author: darkgaldragon
6
6
  License-Expression: MIT
@@ -62,7 +62,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
62
62
  - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
63
63
  - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
64
64
  - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
65
- - Remove ANSI/control characters and neutralize invalid Slack angle-bracket tokens before block generation
65
+ - Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
66
+ - Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
66
67
  - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
67
68
  - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
68
69
  - Support Markdown links and Slack-style links inside table cells
@@ -115,7 +116,8 @@ What this library compensates for:
115
116
  - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
116
117
  - Keeps table-like rows inside fenced code blocks out of table normalization
117
118
  - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
118
- - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
119
+ - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
120
+ - Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
119
121
 
120
122
  ## Requirements
121
123
 
@@ -151,7 +153,7 @@ for payload in convert_markdown_to_slack_payloads(
151
153
  print(payload)
152
154
  ```
153
155
 
154
- `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
156
+ `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
155
157
  Set `preserve_visual_blank_lines=True` when you want the parser to compensate
156
158
  for Slack's currently tight paragraph spacing inside `markdown` blocks.
157
159
  The blank-line workaround is intentionally narrow: it skips table segments and
@@ -205,7 +207,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
205
207
 
206
208
  | Function | Description |
207
209
  |---|---|
208
- | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
210
+ | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
209
211
  | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
210
212
  | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
211
213
  | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -225,9 +227,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
225
227
  | Function | Description |
226
228
  |---|---|
227
229
  | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
230
+ | `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
228
231
  | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
229
- | `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
230
- | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
232
+ | `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
233
+ | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
231
234
  | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
232
235
 
233
236
  ### Lower-level exported helpers
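The revised `decode_html_entities` description above (decode in prose, keep code verbatim) can be sketched with the standard library. This is a hedged illustration, not the package's code: `split_inline_code` and `decode_entities_outside_code` are hypothetical names, and the fence tracking is deliberately simplified.

```python
import html
import re

_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
_RUN = re.compile(r"`+")


def split_inline_code(line: str) -> list[tuple[bool, str]]:
    """Return (is_code, segment) pairs for a single line.

    A span closes only on a later backtick run of the same length as its
    opener; an unpaired run is left in the surrounding prose.
    """
    segments = []
    prose_start = 0
    pos = 0
    while True:
        opener = _RUN.search(line, pos)
        if opener is None:
            break
        closer = next(
            (m for m in _RUN.finditer(line, opener.end())
             if len(m.group()) == len(opener.group())),
            None,
        )
        if closer is None:
            pos = opener.end()  # stray run: keep scanning, treat as prose
            continue
        if opener.start() > prose_start:
            segments.append((False, line[prose_start:opener.start()]))
        segments.append((True, line[opener.start():closer.end()]))
        prose_start = pos = closer.end()
    if prose_start < len(line):
        segments.append((False, line[prose_start:]))
    return segments


def decode_entities_outside_code(text: str) -> str:
    """Decode HTML entities in prose; keep fenced and inline code verbatim."""
    out = []
    fence = None  # opening fence marker while inside a fenced block
    for line in text.split("\n"):
        match = _FENCE.match(line)
        if fence is None:
            if match:
                fence = match.group(1)
                out.append(line)
            else:
                out.append("".join(
                    seg if is_code else html.unescape(seg)
                    for is_code, seg in split_inline_code(line)
                ))
        else:
            out.append(line)
            marker = match.group(1) if match else ""
            if (marker and marker[0] == fence[0]
                    and len(marker) >= len(fence) and line.strip() == marker):
                fence = None
    return "\n".join(out)
```

Because spans are recognized per line, a stray backtick on one line cannot switch off decoding for the lines that follow it.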
@@ -32,7 +32,8 @@ Slack の `markdown` ブロック自体が対応していない構文は、古
32
32
  - 安全に判定できる単独 Markdown 構文を `image` / `divider` / `rich_text` ブロックに変換
33
33
  - LLM が生成する表で起こりやすい崩れ(外枠パイプ不足、区切り行不足、列数不一致、空セル)を補正
34
34
  - 必要に応じてメッセージを自動分割し、Slack の「1メッセージ1テーブル」制約とメッセージあたりのブロック数制限に対応
35
- - ANSI escape / 制御文字を除去し、不正な Slack 角括弧トークンを無害化
35
+ - 長文や見出しの多い出力は段落境界を優先して複数の `markdown` ブロック・メッセージに分割し、実測した Slack のハード制限——`markdown` ブロックあたり 12,000 文字(`msg_too_long`)、見出し・区切り線を 1 個ずつ数えるメッセージあたり展開 50 アイテム(`invalid_blocks`)、メッセージあたりブロックテキスト総量 13,200 文字(`msg_blocks_too_long`)——のすべてに収める
36
+ - ANSI escape / 制御文字 / ライブラリ予約の内部マーカー文字を除去し、散文中の不正な Slack 角括弧トークンを無害化(コードフェンスとインラインコードの中身は原文のまま保持)
36
37
  - フェンスドコードブロック外では、装飾記号の前後にゼロ幅スペースを入れて表示崩れを減らす
37
38
  - 日本語・中国語・韓国語の詰まった文で、インラインコードを含む装飾が崩れる一部のケースでは可視スペースを補って安定化
38
39
  - テーブルセル内の Markdown リンク / Slack 形式リンクを認識
@@ -74,7 +75,8 @@ Slack 側の制約として残るもの:
74
75
  - 意味が明確な単独 Markdown 構文を、raw `markdown` 表示に頼らず Slack ネイティブの Block Kit ブロックへ変換
75
76
  - フェンスドコード内の table 風行をテーブル処理から除外
76
77
  - 必要に応じて、内部空行を補助用の行に置き換えて段落の区切りを見えやすくする
77
- - 生 HTML 風タグなど、Slack の特殊記法としては無効な `<...>` 形式を無害化
78
+ - 生 HTML 風タグなど、Slack の特殊記法としては無効な `<...>` 形式を散文中では無害化(コードフェンスとインラインコード内は原文のまま)
79
+ - 実測した Slack のブロック文字数上限・メッセージ展開アイテム上限・メッセージテキスト総量上限を超える出力を分割し、`chat.postMessage` ごと失敗するのを防ぐ
78
80
 
79
81
  ## 利用前提
80
82
 
@@ -110,7 +112,7 @@ for payload in convert_markdown_to_slack_payloads(
110
112
  print(payload)
111
113
  ```
112
114
 
113
- `convert_markdown_to_slack_messages` は、複数テーブルを含む入力を Slack 制約に合わせて複数メッセージへ分割します。
115
+ `convert_markdown_to_slack_messages` は、複数テーブルを含む入力に加えて、長文や見出しの多い内容が Slack のブロック・メッセージサイズ上限を超える場合も、自動的に複数メッセージへ分割します。
114
116
  Slack Web の新しい `markdown` 表示で段落間の余白が極端に小さい場合は、`preserve_visual_blank_lines=True` を使うと内部空行だけを見えやすく補えます。
115
117
 
116
118
  ## 入出力イメージ
@@ -160,7 +162,7 @@ QA | ~~保留~~ | Team C
160
162
 
161
163
  | 関数 | 説明 |
162
164
  |---|---|
163
- | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Markdown をテーブル分割済みのメッセージ群に変換 |
165
+ | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Markdown を、テーブルと Slack の実測サイズ上限に沿って分割済みのメッセージ群に変換 |
164
166
  | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | `blocks` とプレビュー用 `text` を含む Slack 送信用データへ変換 |
165
167
  | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Markdown を Block Kit ブロックのリストに変換 |
166
168
  | `build_fallback_text_from_blocks(blocks) → str` | `chat.postMessage.text` 用のプレビュー文字列を生成 |
@@ -175,9 +177,10 @@ QA | ~~保留~~ | Team C
175
177
  | 関数 | 説明 |
176
178
  |---|---|
177
179
  | `normalize_markdown_tables(markdown_text) → str` | テーブル記法を正規化(パイプ補完、区切り行生成、列数調整) |
180
+ | `normalize_underscore_emphasis(text) → str` | `_..._` / `__...__` の underscore 装飾を Slack 互換の asterisk 装飾へ変換 |
178
181
  | `add_zero_width_spaces_to_markdown(text) → str` | 装飾記号の前後にゼロ幅スペースを挿入(フェンスドコードブロック内は除外) |
179
- | `decode_html_entities(text) → str` | HTML エンティティをデコード |
180
- | `sanitize_slack_text(text) → str` | ANSI / 制御文字を除去し、不正な Slack 角括弧トークンを無害化 |
182
+ | `decode_html_entities(text) → str` | 散文中の HTML エンティティをデコード(コード領域は原文のまま) |
183
+ | `sanitize_slack_text(text) → str` | ANSI / 制御文字 / 内部マーカー文字を除去し、コード領域外の不正な Slack 角括弧トークンを無害化 |
181
184
  | `strip_zero_width_spaces(text) → str` | ゼロ幅スペース (U+200B) と BOM (U+FEFF) を除去(ZWJ 等の結合制御文字は保持) |
182
185
 
183
186
  ## 仕様
@@ -32,7 +32,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
32
32
  - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
33
33
  - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
34
34
  - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
35
- - Remove ANSI/control characters and neutralize invalid Slack angle-bracket tokens before block generation
35
+ - Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
36
+ - Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
36
37
  - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
37
38
  - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
38
39
  - Support Markdown links and Slack-style links inside table cells
@@ -85,7 +86,8 @@ What this library compensates for:
85
86
  - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
86
87
  - Keeps table-like rows inside fenced code blocks out of table normalization
87
88
  - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
88
- - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
89
+ - Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
90
+ - Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
89
91
 
90
92
  ## Requirements
91
93
 
@@ -121,7 +123,7 @@ for payload in convert_markdown_to_slack_payloads(
121
123
  print(payload)
122
124
  ```
123
125
 
124
- `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
126
+ `convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
125
127
  Set `preserve_visual_blank_lines=True` when you want the parser to compensate
126
128
  for Slack's currently tight paragraph spacing inside `markdown` blocks.
127
129
  The blank-line workaround is intentionally narrow: it skips table segments and
@@ -175,7 +177,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
175
177
 
176
178
  | Function | Description |
177
179
  |---|---|
178
- | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
180
+ | `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
179
181
  | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
180
182
  | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
181
183
  | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -195,9 +197,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
195
197
  | Function | Description |
196
198
  |---|---|
197
199
  | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
200
+ | `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
198
201
  | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
199
- | `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
200
- | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
202
+ | `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
203
+ | `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
201
204
  | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
202
205
 
203
206
  ### Lower-level exported helpers
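To make the revised `sanitize_slack_text` row concrete, here is a minimal single-line sketch of angle-token neutralization that leaves inline code spans alone. The names `code_span_ranges` and `neutralize_line` are hypothetical, and the validity check is a rough assumption (it accepts anything starting with `@`, `#`, `!`, a URL scheme, or `mailto:`); the library's actual rules are stricter and also handle fenced blocks.

```python
import re

_RUN = re.compile(r"`+")
_TOKEN = re.compile(r"<([^<>\n]*)>")
# Simplified assumption about what counts as a valid Slack token.
_VALID = re.compile(r"^(?:https?://|mailto:|[@#!])")


def code_span_ranges(line: str) -> list[tuple[int, int]]:
    """Return (start, end) ranges of inline code spans on one line."""
    ranges = []
    pos = 0
    while (opener := _RUN.search(line, pos)) is not None:
        closer = next(
            (m for m in _RUN.finditer(line, opener.end())
             if len(m.group()) == len(opener.group())),
            None,
        )
        if closer is None:
            pos = opener.end()  # unpaired backtick run stays literal
        else:
            ranges.append((opener.start(), closer.end()))
            pos = closer.end()
    return ranges


def neutralize_line(line: str) -> str:
    """Swap the brackets of invalid <...> tokens for full-width ones."""
    spans = code_span_ranges(line)

    def replace(match: re.Match) -> str:
        inside_code = any(
            start <= match.start() and match.end() <= end for start, end in spans
        )
        if inside_code or _VALID.match(match.group(1)):
            return match.group(0)
        # U+FF1C and U+FF1E are the full-width angle brackets.
        return "\uff1c" + match.group(1) + "\uff1e"

    return _TOKEN.sub(replace, line)
```

A token that merely contains a code span, such as `<foo `bar` baz>`, is not inside any span, so only its outer brackets change and the span content is untouched.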
@@ -10,7 +10,7 @@
10
10
  ## 出力
11
11
 
12
12
  - Slack Block Kit ブロック(`markdown`, `table`, `rich_text`, `image`, `divider`)
13
- - 複数テーブルや多数の昇格ブロックがある入力時は、「1メッセージ1テーブル」と Slack のメッセージあたりブロック数制限を満たすメッセージ群
13
+ - 複数テーブル・多数の昇格ブロック・長文を含む入力時は、「1メッセージ1テーブル」、Slack のメッセージあたりブロック数制限、および「markdown ブロックのサイズ分割」に記載の実測サイズ上限を満たすメッセージ群
14
14
 
15
15
  ## 設計目標
16
16
 
@@ -23,8 +23,8 @@ Markdown としての厳密さより Slack 上での読みやすさが重要に
23
23
 
24
24
  `convert_markdown_to_slack_blocks` の処理順序:
25
25
 
26
- 1. HTML エンティティをデコードし、`&gt;`, `&amp;` などを元の文字へ戻す
27
- 2. Slack 向けのテキスト掃除を行い、ANSI / 制御文字を除去し、不正な Slack 角括弧トークンを無害化する
26
+ 1. 散文中の HTML エンティティをデコードし、`&gt;`, `&amp;` などを元の文字へ戻す(コードフェンスとインラインコードの中身はデコードしない)
27
+ 2. Slack 向けのテキスト掃除を行う。ANSI / 制御文字とライブラリ予約の内部マーカー文字は全体から除去し、不正な Slack 角括弧トークンの無害化はコードフェンスとインラインコードの外側にのみ適用する
28
28
  3. underscore 装飾を正規化し、`_..._` / `__...__` を Slack 互換の `*...*` / `**...**` に変換する
29
29
  4. bare URL を Slack で安定しやすい `<https://...>` 形式にそろえる
30
30
  5. 崩れた表を、後述のルールで補う
@@ -33,8 +33,9 @@ Markdown としての厳密さより Slack 上での読みやすさが重要に
33
33
  - テーブル領域: セル内装飾を解析して `table` ブロックを生成。変換に失敗した場合は `markdown` ブロックに戻す
34
34
  - 非テーブル領域: 安全に判定できる単独 Markdown 構文を先にリッチブロックへ変換し、残りのテキストは必要に応じてゼロ幅スペースを加えた上で `markdown` ブロックを生成する
35
35
  - `preserve_visual_blank_lines=True` の場合は、残った `markdown` ブロックの内部空行を「ノーブレークスペースだけを含む行」に置き換えてから `markdown` ブロックを作る
36
+ - 整形後のテキストが Slack の `markdown` ブロック上限(12,000 文字)を超える領域は、後述の「markdown ブロックのサイズ分割」のルールで複数の `markdown` ブロックに分割する
36
37
 
37
- `convert_markdown_to_slack_messages` は上記の結果を「1メッセージ1テーブル」制約と Slack のメッセージあたりブロック数制限に沿って分割します。
38
+ `convert_markdown_to_slack_messages` は上記の結果を、「1メッセージ1テーブル」制約、Slack のメッセージあたりブロック数制限、および「markdown ブロックのサイズ分割」に記載のメッセージあたり展開アイテム・テキスト総量の予算に沿って分割します。
38
39
  `convert_markdown_to_slack_payloads` は、同じ分割結果に `chat.postMessage.text` 用のプレビュー文字列を付けた送信データを返します。
39
40
 
40
41
  ## 実測ベースの Slack の挙動
@@ -83,7 +84,7 @@ Slack は 2026-03-06 に `markdown` ブロックの公式ドキュメントを
83
84
  - リストは、テキスト領域の先頭または空行の直後から始まり、連続する非空行がすべてリスト項目で、1〜3スペースの曖昧なネストインデントや Markdown バックスラッシュエスケープに依存せず、直後にインデント付きの継続段落がない場合だけ昇格する
84
85
  - フェンスドコード内の table 風行をテーブル解析対象から除外する
85
86
  - 内部空行を、必要に応じて段落区切りを見えやすくする補助行へ置き換える
86
- - `<foo>` や生 HTML 風タグのような、Slack の特殊記法としては無効な `<...>` 形式を無害化する
87
+ - `<foo>` や生 HTML 風タグのような、Slack の特殊記法としては無効な `<...>` 形式を散文中では無害化する(コードフェンスとインラインコードの中身は原文のまま保持する)
87
88
 
88
89
  ## Slack 向けテキスト掃除のルール
89
90
 
@@ -91,10 +92,13 @@ Slack は 2026-03-06 に `markdown` ブロックの公式ドキュメントを
91
92
 
92
93
  - ANSI escape を除去する
93
94
  - 一般的な制御文字を除去する
95
+ - ライブラリが内部プレースホルダ用に予約しているマーカー文字(`U+2063`、`U+FFF0`〜`U+FFF3`)を除去し、入力が内部機構と衝突しないようにする
94
96
  - 有効な Slack 角括弧トークンは保持する
95
97
  - 例: リンク、メンション、チャンネル参照、`<!here>`、`<!subteam^...>`、`<!date^...>`
96
98
  - Slack の特殊記法として解釈できない `<foo>` のようなトークンは `＜foo＞` に変換して無害化する
97
99
  - これには `<div>` や `<span>` のような生 HTML 風タグも含まれる
100
+ - 角括弧トークンの無害化はコードフェンスとインラインコードの外側にのみ適用する。`` `<div>` `` のようなコード例は原文のまま Slack に届く。ANSI / 制御文字 / マーカー文字の除去は、表示内容として正当な用途がないため全体に適用する
101
+ - この判定でのインラインコードスパンは同一行内に限り、開始と同じ長さのバッククォート run でのみ閉じる。対になっていない孤立バッククォートはリテラルのまま扱われ、後続行のサニタイズを妨げない。また、コードスパンをまたぐ無効な角括弧トークン(`<foo `bar` baz>`)はスパン内容を原文のまま保ちつつ全体を無害化する
98
102
 
99
103
  ## underscore 装飾正規化ルール
100
104
 
@@ -211,6 +215,25 @@ LLM は外枠パイプの省略、区切り行の欠落、列数の不一致な
211
215
  | U+200C | ZWNJ(ゼロ幅非接合子) | ペルシャ語・ヒンディー語などの語形制御に使われる |
212
216
  | U+200D | ZWJ(ゼロ幅接合子) | 結合絵文字やその他の文字結合に必要 |
213
217
 
218
+ ## markdown ブロックのサイズ分割
219
+
220
+ 2026-06-11 に実ワークスペースで、Slack 側の 3 つのハード制限を実測しました。
221
+
222
+ - `markdown` ブロックの `text` はちょうど 12,000 文字まで受理。12,001 文字は `chat.postMessage` 全体が `msg_too_long` で拒否される
223
+ - Slack は `markdown` ブロックをサーバ側でネイティブブロック列に展開し、展開後の「アイテム数 ≤ 50」をメッセージ単位で検証する(超過は `invalid_blocks`)。見出しと区切り線は 1 個ずつアイテムになる(見出し 50 個は受理、51 個は拒否。30 見出し×2 ブロックも合算で拒否)。段落・リスト・引用・コードフェンスは、見出し/区切り線に挟まれた連続区間ごとに 1 アイテムへ集約される(空行区切りの段落 60 個やフェンス 52 個は受理。空行だけでは区間は分かれない)
224
+ - 1 メッセージのブロックが運べるテキスト総量はちょうど 13,200 文字(単一ブロック上限の 1.1 倍)。13,201 文字は `msg_blocks_too_long` で拒否される。総量はブロック種別をまたいで数えられる(11,900 文字の `markdown` ブロック + 1,400 文字の `rich_text` も拒否された)
225
+
226
+ このため、長い、または見出しの多い非テーブル領域は送信前に分割します。
227
+
228
+ - まず領域全体を 1 ブロックとして試し、整形後テキストが文字数上限を超えるか、展開アイテム数の見積もりが予算を超えた場合のみ分割する
229
+ - ゼロ幅スペースや補助行の挿入でテキストが膨らみ、アイテム数見積もりも意図的に保守的なため、生テキストは上限より低い目標値(11,500 文字 / 45 アイテム)に向けて詰める
230
+ - 分割点はまず段落境界(コードフェンス外の空行のまとまり)を選ぶ。境界に使った空行は、隣接ブロック自体が視覚的に分かれて表示されるため除去する
231
+ - 予算を超える単一段落は行境界で、超過する単一行は語境界で分割する。スペースが無い場合(密な CJK 文など)はやむを得ず文字位置で切る
232
+ - 未閉鎖のコードフェンス内で切れる場合は、続きのブロック先頭に元のフェンス開始行を再掲し、両方がコードとして表示され続けるようにする
233
+ - 分割後の各ピースも整形後に再チェックし、ハード制限を超える場合は詰め込み予算を縮めて再分割する
234
+ - `convert_markdown_to_slack_messages` はさらに、メッセージ内の展開アイテム見積もりの合計が 50 以内、ブロックテキストの総量が 13,200 文字以内に収まるようにブロックを束ねる(`markdown` 以外のブロックは 1 アイテムと数え、テキスト総量には全ブロック種別の内容を算入する)
235
+ - 最上位のフォールバック `text` フィールドには文字数上限は適用されない(Slack は拒否せず切り詰める)ため、プレビュー文字列は分割しない
236
+
214
237
  ## 空行の見え方を補うオプション
215
238
 
216
239
  メインの変換 API に `preserve_visual_blank_lines=True` を渡すと、非テーブル領域で見える行に挟まれた空行だけを「ノーブレークスペースだけを含む行」に置き換えてから Slack `markdown` ブロックを生成します。
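The split-point order listed in the "markdown ブロックのサイズ分割" section above (paragraph boundary, then line, then word, then a hard cut) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code in `converter.py`: `split_text` and `budget` are hypothetical names, and fence re-opening and the post-formatting re-check are left out.

```python
import re


def split_text(text: str, budget: int) -> list[str]:
    """Split text into pieces of at most `budget` characters.

    Tries paragraph breaks first, then line breaks, then spaces, and
    hard-cuts only when no separator is available within the budget.
    """
    if budget < 1:
        raise ValueError("budget must be positive")
    if len(text) <= budget:
        return [text] if text else []
    for pattern in (r"\n\s*\n", r"\n", r" +"):
        cut = None
        for match in re.finditer(pattern, text):
            if match.start() > budget:
                break
            if match.start() > 0:
                cut = match  # remember the last separator that still fits
        if cut is not None:
            # The separator itself is dropped, as with the blank lines
            # consumed at a paragraph boundary.
            return [text[:cut.start()]] + split_text(text[cut.end():], budget)
    # No usable separator (for example dense CJK text): hard cut.
    return [text[:budget]] + split_text(text[budget:], budget)
```

With the packing target quoted above, a call would look like `split_text(region, 11_500)`; the smaller budgets in the tests only keep the examples readable.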
@@ -10,7 +10,7 @@ This document describes how `slack-markdown-parser` converts Markdown into Slack
10
10
  ## Output
11
11
 
12
12
  - Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, and `divider`)
13
- - When the input contains multiple tables or many promoted blocks, a list of messages that satisfies the "one table per message" rule and Slack's per-message block-count limit
13
+ - When the input contains multiple tables, many promoted blocks, or long content, a list of messages that satisfies the "one table per message" rule, Slack's per-message block-count limit, and the measured size limits described in "Markdown block size splitting"
14
14
 
15
15
  ## Design target
16
16
 
@@ -23,8 +23,8 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
23
23
 
24
24
  `convert_markdown_to_slack_blocks` processes text in this order:
25
25
 
26
- 1. Decode HTML entities such as `&gt;` and `&amp;`
27
- 2. Clean Slack text by removing ANSI/control noise and neutralizing invalid Slack angle-bracket tokens
26
+ 1. Decode HTML entities such as `&gt;` and `&amp;` in prose, leaving fenced code blocks and inline code spans verbatim
27
+ 2. Clean Slack text: remove ANSI/control noise and this library's reserved internal marker code points everywhere, and neutralize invalid Slack angle-bracket tokens outside fenced code blocks and inline code spans
28
28
  3. Normalize underscore emphasis by converting `_..._` / `__...__` into Slack-friendly `*...*` / `**...**`
29
29
  4. Normalize bare URLs by wrapping them in Slack-friendly `<https://...>` form
30
30
  5. Repair malformed tables using the rules below
@@ -33,8 +33,9 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
33
33
  - Table regions: parse inline cell styling and generate a `table` block. If conversion fails, such as when there are fewer than two candidate lines or the parse result is empty, fall back to a `markdown` block.
34
34
  - Non-table regions: first promote safe standalone Markdown constructs into richer Block Kit blocks, then add zero-width spaces where needed and generate `markdown` blocks for the remaining text.
35
35
  - If `preserve_visual_blank_lines=True`, replace internal blank lines in remaining `markdown` blocks with lines that contain only a non-breaking space before emitting the `markdown` block.
36
+ - A remaining region whose formatted text would exceed Slack's 12,000-character `markdown` block limit is split into multiple `markdown` blocks using the rules in "Markdown block size splitting" below.
36
37
 
37
- `convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule and Slack's per-message block-count limit.
38
+ `convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "Markdown block size splitting".
38
39
  `convert_markdown_to_slack_payloads` returns the same split blocks plus preview `text` values ready for `chat.postMessage`.
39
40
 
40
41
  ## How Slack behaved in testing
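The per-message budgets mentioned above can be illustrated with a rough item estimator and a greedy packer. This is a sketch under assumptions, not the library's algorithm: `estimate_items` and `pack_messages` are hypothetical names, setext headings are ignored, and only `markdown` blocks are modeled.

```python
import re

HEADING = re.compile(r"^ {0,3}#{1,6}(\s|$)")
THEMATIC_BREAK = re.compile(r"^ {0,3}([-*_])( *\1){2,} *$")
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def estimate_items(markdown_text: str) -> int:
    """Estimate expansion items: one per heading or thematic break, plus
    one per contiguous run of other content between them."""
    items = 0
    in_run = False
    in_fence = False
    for line in markdown_text.split("\n"):
        if FENCE.match(line):
            in_fence = not in_fence  # simplified: any fence line toggles
            is_breaker = False
        else:
            is_breaker = not in_fence and bool(
                HEADING.match(line) or THEMATIC_BREAK.match(line)
            )
        if is_breaker:
            items += 1
            in_run = False
        elif line.strip() and not in_run:
            items += 1  # blank lines alone never start or end a run
            in_run = True
    return items


def pack_messages(texts, max_items=50, max_chars=13_200):
    """Greedily group markdown texts into messages under both budgets."""
    messages, current, items, chars = [], [], 0, 0
    for text in texts:
        count = estimate_items(text)
        if current and (items + count > max_items or chars + len(text) > max_chars):
            messages.append(current)
            current, items, chars = [], 0, 0
        current.append({"type": "markdown", "text": text})
        items += count
        chars += len(text)
    if current:
        messages.append(current)
    return messages
```

A single text that exceeds a budget on its own still lands in a message by itself here, so it would need to be split before packing.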
@@ -81,9 +82,10 @@ Slack still controls when those newer features appear and how they look, so trea
81
82
  - simple one-level quotes to `rich_text_quote`
82
83
  - simple bullet and ordered lists to `rich_text_list`
83
84
  - Lists are promoted only when the list starts at the beginning of the text region or after a blank line, each non-blank line in the run is a list item, the list does not use ambiguous 1-3-space nested indentation, the item text does not rely on Markdown backslash escapes, and the run is not followed by an indented continuation paragraph.
85
+ - Slack mention tokens inside a promoted list item are converted to their structured `rich_text` elements — `<@U…>`/`<@W…>` to `user`, `<#C…>`/`<#G…>` to `channel`, `<!subteam^S…>` to `usergroup`, and `<!here>`/`<!channel>`/`<!everyone>` to `broadcast` — since a `rich_text` block does not resolve a raw token. An optional `|label` display suffix is dropped (Slack renders the element from the id).
84
86
  - Table-like rows inside fenced code blocks are kept out of table parsing
85
87
  - Internal blank lines can optionally be rewritten into placeholder lines so Slack keeps visible paragraph separation
86
- - Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized
88
+ - Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized in prose, while fenced code blocks and inline code spans keep them verbatim
87
89
 
88
90
  ## Slack text cleanup rules
89
91
 
@@ -91,9 +93,12 @@ Behavior of `sanitize_slack_text`:
 
  - Remove ANSI escape sequences
  - Remove general control characters except line breaks and tabs already preserved by the regex
+ - Remove this library's reserved internal marker code points (`U+2063`, `U+FFF0`–`U+FFF3`) so input cannot collide with the internal placeholder machinery
  - Keep valid Slack angle-bracket tokens such as links, mentions, channels, special mentions, subteam mentions, and `<!date^...>`
  - Replace unsupported angle-bracket tokens such as `<foo>` with full-width brackets (`＜foo＞`) so Slack does not interpret them as malformed special syntax
  - This also applies to raw HTML-like tags such as `<div>` or `<span>`
+ - Angle-token neutralization applies only outside fenced code blocks and inline code spans, so code samples such as `` `<div>` `` reach Slack verbatim; ANSI/control/marker removal applies everywhere because those characters are never legitimate content
+ - For this purpose an inline code span is recognized within a single line only, and it closes only on a backtick run of the same length as the opener. A stray unpaired backtick therefore stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`<foo `bar` baz>`) is still neutralized as a whole while the span content stays verbatim
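
The scoping rules above (neutralize in prose, leave fenced code and same-line inline code spans verbatim) can be sketched roughly as below. This is an illustration under stated assumptions, not the behavior of `sanitize_slack_text`: the function names are hypothetical, the allow-list of valid token prefixes is a simplification, the full-width brackets are assumed to be `U+FF1C` and `U+FF1E`, fence tracking is a plain toggle, and the case of an invalid token that spans a code span is not handled.

```python
import re

# Assumed allow-list; the library's real validity rules are richer.
_VALID_PREFIXES = ("@", "#", "!", "http://", "https://", "mailto:")
_ANGLE_RE = re.compile(r"<([^<>\n]+)>")
_FENCE_RE = re.compile(r"^\s*(```|~~~)")


def _neutralize(text: str) -> str:
    """Swap unsupported <...> tokens for full-width brackets."""
    def repl(m: re.Match) -> str:
        inner = m.group(1)
        if inner.startswith(_VALID_PREFIXES):
            return m.group(0)
        return "\uFF1C" + inner + "\uFF1E"
    return _ANGLE_RE.sub(repl, text)


def _run_end(line: str, start: int) -> int:
    """Index just past the backtick run beginning at `start`."""
    end = start
    while end < len(line) and line[end] == "`":
        end += 1
    return end


def _split_inline_code(line: str):
    """Yield (is_code, segment) pairs; spans never cross a line."""
    pos = i = 0
    while i < len(line):
        if line[i] != "`":
            i += 1
            continue
        opener_end = _run_end(line, i)
        run = opener_end - i
        k, close = opener_end, -1
        while k < len(line):
            if line[k] == "`":
                e = _run_end(line, k)
                if e - k == run:  # closes only on an equal-length run
                    close = e
                    break
                k = e
            else:
                k += 1
        if close == -1:
            i = opener_end  # unpaired backticks stay literal prose
            continue
        if i > pos:
            yield False, line[pos:i]
        yield True, line[i:close]
        pos = i = close
    if pos < len(line):
        yield False, line[pos:]


def neutralize_outside_code(text: str) -> str:
    out, in_fence = [], False
    for line in text.split("\n"):
        if _FENCE_RE.match(line):
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)
        else:
            out.append("".join(
                seg if is_code else _neutralize(seg)
                for is_code, seg in _split_inline_code(line)
            ))
    return "\n".join(out)
```

Because span detection restarts on every line, a stray backtick on one line cannot switch off neutralization for the lines that follow, which is the property the last rule above calls out.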
 
  ## Underscore emphasis normalization rules
 
@@ -210,6 +215,25 @@ Exception:
  | `U+200C` | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
  | `U+200D` | ZWJ (zero-width joiner) | Required for joined emoji and other grapheme composition |
 
+ ## Markdown block size splitting
+
+ Three Slack-side hard limits were measured against a real workspace on 2026-06-11:
+
+ - A `markdown` block's `text` accepts exactly 12,000 characters; 12,001 fails the whole `chat.postMessage` call with `msg_too_long`.
+ - Slack expands `markdown` blocks server-side into native blocks and enforces "no more than 50 items" on the expanded result per message (`invalid_blocks`). Each heading and each thematic break becomes its own item (50 headings were accepted, 51 rejected; 30 headings in each of two blocks were rejected together), while paragraphs, lists, quotes, and fenced code merge into one item per contiguous run between those breakers (60 blank-separated paragraphs and 52 fences were accepted). Blank lines alone do not split a run.
+ - One message's blocks may carry at most 13,200 characters of text in total, exactly 1.1 × the single-block limit; 13,201 fails with `msg_blocks_too_long`. The total counts content across block types (an 11,900-character `markdown` block plus a 1,400-character `rich_text` block was rejected).
+
+ Long or heading-dense non-table regions are therefore split before delivery:
+
+ - The whole region is tried as a single block first; splitting happens only when the formatted text exceeds the character limit or the estimated expansion exceeds the per-message item budget
+ - Raw content is packed toward targets below the hard limits (11,500 characters, 45 estimated items), because zero-width-space insertion and placeholder lines inflate the formatted text and the item estimate is intentionally conservative
+ - Split points prefer paragraph boundaries (blank-line runs outside fenced code); the blank run at a chosen boundary is dropped, since adjacent Slack blocks already render visually separated
+ - A single paragraph longer than the budget is split at line boundaries, and a single overlong line at word boundaries, with a hard cut when no space exists (for example dense CJK text)
+ - When a cut lands inside an unclosed fenced code block, the continuation block re-opens the fence with the original delimiter line so both halves keep rendering as code
+ - Each piece is re-checked after formatting; when it still exceeds a hard limit, the packing budgets shrink and the piece is split again
+ - `convert_markdown_to_slack_messages` additionally packs blocks into messages so that the summed expansion estimate stays within the 50-item budget (non-`markdown` blocks count as one item each) and the summed block text stays within the 13,200-character per-message total
+ - The top-level fallback `text` field is not subject to the character limit (Slack truncates it instead of rejecting), so preview text is left whole
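
The per-message packing step in the list above can be pictured with a small greedy sketch. It is not the code in `converter.py`: `BlockCost` and `pack_blocks` are hypothetical names, each block's character count and estimated item count are assumed to be computed elsewhere, and the separate "one table per message" rule is left out. Only the two budgets (50 items, 13,200 characters) come from the spec text.

```python
from dataclasses import dataclass

MAX_ITEMS_PER_MESSAGE = 50     # expansion-item budget from the spec text
MAX_TEXT_PER_MESSAGE = 13_200  # summed block-text budget from the spec text


@dataclass
class BlockCost:
    """Hypothetical per-block cost record used only in this sketch."""
    chars: int  # characters of text carried by the block
    items: int  # estimated expanded items (1 for non-markdown blocks)


def pack_blocks(costs: list[BlockCost]) -> list[list[int]]:
    """Greedily group block indexes into messages within both budgets."""
    messages: list[list[int]] = []
    current: list[int] = []
    chars = items = 0
    for idx, cost in enumerate(costs):
        fits = (
            chars + cost.chars <= MAX_TEXT_PER_MESSAGE
            and items + cost.items <= MAX_ITEMS_PER_MESSAGE
        )
        # Start a new message when the next block would break a budget.
        if current and not fits:
            messages.append(current)
            current, chars, items = [], 0, 0
        current.append(idx)
        chars += cost.chars
        items += cost.items
    if current:
        messages.append(current)
    return messages
```

With the measured example from the limits list, an 11,900-character `markdown` block followed by a 1,400-character `rich_text` block lands in two messages, because their sum (13,300) exceeds the 13,200-character total. A block that alone exceeds a budget is still emitted as its own message here; the spec text handles that case earlier, by splitting the block itself.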
+
  ## Optional blank-line visibility workaround
 
  When `preserve_visual_blank_lines=True` is passed to the main conversion APIs,
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "slack-markdown-parser"
- version = "2.4.3"
+ version = "2.5.0"
  description = "Convert LLM Markdown into Slack Block Kit messages"
  readme = "README.md"
  requires-python = ">=3.10"
@@ -1,6 +1,6 @@
  """slack-markdown-parser public package API."""
 
- __version__ = "2.4.3"
+ __version__ = "2.5.0"
  __license__ = "MIT"
 
  from .converter import (