slack-markdown-parser 2.4.4.tar.gz → 2.5.0.tar.gz (PyPI version diff, via Mend)


Files changed (18)

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/CHANGELOG.md RENAMED

@@ -6,6 +6,18 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
 ## [Unreleased]
+## [2.5.0] - 2026-06-11
+### Added
+- Added automatic size splitting so long or heading-dense LLM output no longer fails `chat.postMessage` outright. Three Slack-side hard limits were measured against a real workspace on 2026-06-11 and are now enforced at conversion time: a `markdown` block's `text` accepts exactly 12,000 characters (`msg_too_long` beyond that); Slack expands `markdown` blocks server-side and enforces "no more than 50 items" per message on the expanded result (`invalid_blocks`), where each heading and each thematic break is one item while paragraph/list/quote/fence runs between them merge into one; and one message's blocks may carry at most 13,200 characters of text in total across block types (`msg_blocks_too_long`). Oversized regions are split preferring paragraph boundaries, then line and word boundaries, with a hard cut as a last resort for space-less CJK; a cut inside an unclosed fence re-opens the fence in the continuation block so both halves keep rendering as code. Pieces are re-checked after ZWSP/NBSP formatting and re-split with shrinking budgets when they still overflow. `convert_markdown_to_slack_messages` packs blocks under all three budgets in addition to the existing one-table-per-message rule; documents already within every limit are returned unchanged. Note that the same input can now produce more blocks and more messages than 2.4.x when it previously exceeded Slack's limits (which used to fail delivery entirely).
+### Fixed
+- Stopped corrupting code samples during sanitization. `decode_html_entities` and the angle-token neutralization inside `sanitize_slack_text` ran over the whole text, so a fenced code block or inline code span containing `<div>` or `&amp;` reached Slack as `＜div＞` / `&` even though Slack renders code content literally. Both passes now skip fenced code blocks and inline code spans; ANSI/control-character removal still applies everywhere. For this purpose a code span is recognized within a single line only and closes only on a backtick run of equal length (CommonMark pairing), so a stray unpaired backtick stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`<foo `bar` baz>`) is still neutralized as a whole while the span content stays verbatim.
+- Stopped crafted input from colliding with internal placeholders. Input carrying this library's reserved in-band marker code points (`U+2063`, `U+FFF0`–`U+FFF3`, e.g. a literal `￰code0￱` sequence) could crash conversion with `KeyError` or get substituted with another code span's content. The markers are now stripped during sanitization and at direct-call entry points of the placeholder machinery, and placeholder restoration passes unknown sequences through instead of raising.
+- Consolidated the three duplicated fenced-code tracking loops onto a single `_iter_fence_states` helper so fence semantics cannot drift between passes again — that drift is exactly how the sanitize corruption happened.
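
The boundary preference described in the 2.5.0 "Added" entry (paragraph, then line, then word, then a hard cut) can be illustrated with a minimal sketch. This is not the library's implementation: `split_text` and `_pack` are hypothetical names, the budget is a plain character count, and fence re-opening and post-formatting re-checks are deliberately left out.

```python
def split_text(text: str, budget: int) -> list[str]:
    """Split text into pieces of at most `budget` characters.

    Hypothetical sketch: prefers paragraph boundaries, then line
    boundaries, then word boundaries, and hard-cuts only when no
    separator exists (for example space-less CJK text).
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if len(text) <= budget:
        return [text] if text else []
    for sep in ("\n\n", "\n", " "):
        parts = [part for part in text.split(sep) if part]
        if len(parts) > 1:
            return _pack(parts, sep, budget)
    # No separator left to try: cut at fixed character positions.
    return [text[i:i + budget] for i in range(0, len(text), budget)]


def _pack(parts: list[str], sep: str, budget: int) -> list[str]:
    """Greedily join parts with `sep` while the result fits the budget."""
    pieces: list[str] = []
    current = ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= budget:
            current = candidate
            continue
        if current:
            pieces.append(current)
            current = ""
        if len(part) <= budget:
            current = part
        else:
            # The part alone is too long: retry with finer separators.
            sub = split_text(part, budget)
            pieces.extend(sub[:-1])
            current = sub[-1]
    if current:
        pieces.append(current)
    return pieces
```

With a budget of 12, a text made of two short paragraphs followed by 25 CJK characters comes back as word-aligned pieces for the paragraphs and fixed-width cuts for the CJK run; the separator at each chosen boundary is dropped, as the changelog describes for blank runs.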
 ## [2.4.4] - 2026-06-10
 ### Fixed

{slack_markdown_parser-2.4.4/slack_markdown_parser.egg-info → slack_markdown_parser-2.5.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: slack-markdown-parser
-Version: 2.4.4
+Version: 2.5.0
 Summary: Convert LLM Markdown into Slack Block Kit messages
 Author: darkgaldragon
 License-Expression: MIT
@@ -62,7 +62,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
 - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
 - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
 - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
-- Remove ANSI/control characters and neutralize invalid Slack angle-bracket tokens before block generation
+- Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
+- Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
 - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
 - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
 - Support Markdown links and Slack-style links inside table cells
@@ -115,7 +116,8 @@ What this library compensates for:
 - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
 - Keeps table-like rows inside fenced code blocks out of table normalization
 - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
-- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
+- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
+- Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
 ## Requirements
@@ -151,7 +153,7 @@ for payload in convert_markdown_to_slack_payloads(
     print(payload)
 ```
-`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
+`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
 Set `preserve_visual_blank_lines=True` when you want the parser to compensate
 for Slack's currently tight paragraph spacing inside `markdown` blocks.
 The blank-line workaround is intentionally narrow: it skips table segments and
@@ -205,7 +207,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
 | Function | Description |
 |---|---|
-| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
+| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
 | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
 | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
 | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -225,9 +227,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
 | Function | Description |
 |---|---|
 | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
+| `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
 | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
-| `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
-| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
+| `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
+| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
 | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
 ### Lower-level exported helpers

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/README-ja.md RENAMED

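@@ -32,7 +32,8 @@ Constructs that Slack's `markdown` block itself does not support, …
 - Convert standalone Markdown constructs that can be safely identified into `image` / `divider` / `rich_text` blocks
 - Repair table breakage common in LLM output (missing outer pipes, missing separator rows, mismatched column counts, empty cells)
 - Split messages automatically when needed to satisfy Slack's "one table per message" constraint and per-message block-count limit
-- Remove ANSI escapes / control characters and neutralize invalid Slack angle-bracket tokens
+- Split long or heading-dense output into multiple `markdown` blocks and messages, preferring paragraph boundaries, so that it fits all of Slack's measured hard limits: 12,000 characters per `markdown` block (`msg_too_long`), 50 expansion items per message with each heading or divider counted as one (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
+- Remove ANSI escapes / control characters / the library's reserved internal marker characters, and neutralize invalid Slack angle-bracket tokens in prose (the contents of code fences and inline code are kept verbatim)
 - Outside fenced code blocks, insert zero-width spaces around formatting markers to reduce rendering glitches
 - In dense Japanese, Chinese, and Korean text, add visible spaces to stabilize some cases where formatting that contains inline code breaks
 - Recognize Markdown links / Slack-style links inside table cells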
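@@ -74,7 +75,8 @@ What remains as Slack-side constraints:
 - Convert standalone Markdown constructs with a clear meaning into native Slack Block Kit blocks instead of relying on raw `markdown` rendering
 - Exclude table-like rows inside fenced code from table processing
 - Optionally replace internal blank lines with helper lines so paragraph breaks are easier to see
-- Neutralize `<...>` forms that are invalid as Slack special syntax, such as raw HTML-like tags
+- Neutralize `<...>` forms that are invalid as Slack special syntax, such as raw HTML-like tags, in prose (code fences and inline code stay verbatim)
+- Split output that exceeds Slack's measured per-block character limit, per-message expansion-item limit, and per-message total-text limit, so the whole `chat.postMessage` call does not fail
 ## Prerequisites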
@@ -110,7 +112,7 @@ for payload in convert_markdown_to_slack_payloads(
     print(payload)
 ```
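-`convert_markdown_to_slack_messages` splits input containing multiple tables into multiple messages to fit Slack's constraints.
+`convert_markdown_to_slack_messages` automatically splits output into multiple messages, both for input containing multiple tables and when long or heading-dense content exceeds Slack's block and message size limits.
 When paragraph spacing is extremely small in Slack Web's new `markdown` rendering, `preserve_visual_blank_lines=True` makes just the internal blank lines easier to see.
 ## Input and output examples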
@@ -160,7 +162,7 @@ QA | ~~On hold~~ | Team C
 | Function | Description |
 |---|---|
-| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Convert Markdown into messages already split around tables |
+| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Convert Markdown into messages already split along tables and Slack's measured size limits |
 | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Convert into Slack send-ready data containing `blocks` and preview `text` |
 | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Convert Markdown into a list of Block Kit blocks |
 | `build_fallback_text_from_blocks(blocks) → str` | Generate the preview string for `chat.postMessage.text` |
@@ -175,9 +177,10 @@ QA | ~~On hold~~ | Team C
 | Function | Description |
 |---|---|
 | `normalize_markdown_tables(markdown_text) → str` | Normalize table syntax (pipe completion, separator-row generation, column-count adjustment) |
+| `normalize_underscore_emphasis(text) → str` | Convert `_..._` / `__...__` underscore emphasis into Slack-compatible asterisk emphasis |
 | `add_zero_width_spaces_to_markdown(text) → str` | Insert zero-width spaces around formatting markers (excluding the inside of fenced code blocks) |
-| `decode_html_entities(text) → str` | Decode HTML entities |
-| `sanitize_slack_text(text) → str` | Remove ANSI / control characters and neutralize invalid Slack angle-bracket tokens |
+| `decode_html_entities(text) → str` | Decode HTML entities in prose (code regions stay verbatim) |
+| `sanitize_slack_text(text) → str` | Remove ANSI / control characters / internal marker characters, and neutralize invalid Slack angle-bracket tokens outside code regions |
 | `strip_zero_width_spaces(text) → str` | Remove zero-width spaces (U+200B) and BOM (U+FEFF) (join-control characters such as ZWJ are preserved) |
 ## Specification

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/README.md RENAMED

@@ -32,7 +32,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
 - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
 - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
 - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
-- Remove ANSI/control characters and neutralize invalid Slack angle-bracket tokens before block generation
+- Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
+- Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
 - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
 - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
 - Support Markdown links and Slack-style links inside table cells
@@ -85,7 +86,8 @@ What this library compensates for:
 - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
 - Keeps table-like rows inside fenced code blocks out of table normalization
 - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
-- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
+- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
+- Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
 ## Requirements
@@ -121,7 +123,7 @@ for payload in convert_markdown_to_slack_payloads(
     print(payload)
 ```
-`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
+`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
 Set `preserve_visual_blank_lines=True` when you want the parser to compensate
 for Slack's currently tight paragraph spacing inside `markdown` blocks.
 The blank-line workaround is intentionally narrow: it skips table segments and
@@ -175,7 +177,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
 | Function | Description |
 |---|---|
-| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
+| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
 | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
 | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
 | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -195,9 +197,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
 | Function | Description |
 |---|---|
 | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
+| `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
 | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
-| `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
-| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
+| `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
+| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
 | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
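
As a rough illustration of the reserved-marker handling that `sanitize_slack_text` now performs, the sketch below reuses the character class shown for `INTERNAL_MARKER_CHAR_PATTERN` in `converter.py`. The helper names `strip_reserved_markers` and `restore_placeholders`, and the exact placeholder shape, are assumptions made for this example; the library's own functions may differ.

```python
import re

# Same character class as INTERNAL_MARKER_CHAR_PATTERN in converter.py.
INTERNAL_MARKER_CHAR_PATTERN = re.compile("[\u2063\ufff0-\ufff3]")

# Assumed placeholder shape, based on the changelog's literal example:
# U+FFF0, the word "code", an index, then U+FFF1.
PLACEHOLDER_PATTERN = re.compile("\ufff0code(\\d+)\ufff1")


def strip_reserved_markers(text: str) -> str:
    """Drop reserved in-band marker code points from untrusted input."""
    return INTERNAL_MARKER_CHAR_PATTERN.sub("", text)


def restore_placeholders(text: str, spans: dict[int, str]) -> str:
    """Put code spans back; unknown placeholders pass through unchanged."""
    return PLACEHOLDER_PATTERN.sub(
        lambda match: spans.get(int(match.group(1)), match.group(0)),
        text,
    )
```

Stripping at the entry point means crafted input can no longer look like a placeholder, and the `dict.get` fallback shows one way restoration can tolerate an unknown index instead of raising `KeyError`.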
 ### Lower-level exported helpers

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/docs/spec-ja.md RENAMED

@@ -10,7 +10,7 @@
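 ## Output
 - Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, `divider`)
-- For input with multiple tables or many promoted blocks, a set of messages that satisfies "one table per message" and Slack's per-message block-count limit
+- For input containing multiple tables, many promoted blocks, or long text, a set of messages that satisfies "one table per message", Slack's per-message block-count limit, and the measured size limits described in "Markdown block size splitting"
 ## Design goals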
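@@ -23,8 +23,8 @@ When readability on Slack matters more than strictness as Markdown …
 `convert_markdown_to_slack_blocks` processing order:
-1. Decode HTML entities, restoring `&gt;`, `&amp;`, and the like to their original characters
-2. Clean text for Slack: remove ANSI / control characters and neutralize invalid Slack angle-bracket tokens
+1. Decode HTML entities in prose, restoring `&gt;`, `&amp;`, and the like to their original characters (the contents of code fences and inline code are not decoded)
+2. Clean text for Slack. ANSI / control characters and the library's reserved internal marker characters are removed everywhere, while neutralization of invalid Slack angle-bracket tokens applies only outside code fences and inline code
 3. Normalize underscore emphasis, converting `_..._` / `__...__` into Slack-compatible `*...*` / `**...**`
 4. Bring bare URLs into the `<https://...>` form that is more stable in Slack
 5. Repair broken tables using the rules described below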
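@@ -33,8 +33,9 @@ When readability on Slack matters more than strictness as Markdown …
    - Table regions: parse in-cell formatting and generate a `table` block. If conversion fails, fall back to a `markdown` block
    - Non-table regions: first convert standalone Markdown constructs that can be safely identified into rich blocks, then generate `markdown` blocks for the remaining text, adding zero-width spaces where needed
    - When `preserve_visual_blank_lines=True`, replace internal blank lines in the remaining `markdown` blocks with lines containing only a non-breaking space before creating the `markdown` block
+   - A region whose formatted text exceeds Slack's `markdown` block limit (12,000 characters) is split into multiple `markdown` blocks using the rules in "Markdown block size splitting" below
-`convert_markdown_to_slack_messages` splits the above result according to the "one table per message" constraint and Slack's per-message block-count limit.
+`convert_markdown_to_slack_messages` splits the above result according to the "one table per message" constraint, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "Markdown block size splitting".
 `convert_markdown_to_slack_payloads` returns send data that attaches a preview string for `chat.postMessage.text` to the same split result.
 ## Slack behavior based on measurement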
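@@ -83,7 +84,7 @@ Slack, on 2026-03-06, the official documentation for the `markdown` block …
     - A list is promoted only when it starts at the beginning of a text region or right after a blank line, every consecutive non-blank line is a list item, it does not depend on ambiguous 1-3 space nesting indentation or Markdown backslash escapes, and it is not immediately followed by an indented continuation paragraph
 - Exclude table-like rows inside fenced code from table parsing
 - Replace internal blank lines, when needed, with helper lines that make paragraph breaks easier to see
-- Neutralize `<...>` forms that are invalid as Slack special syntax, such as `<foo>` or raw HTML-like tags
+- Neutralize `<...>` forms that are invalid as Slack special syntax, such as `<foo>` or raw HTML-like tags, in prose (the contents of code fences and inline code are kept verbatim)
 ## Slack text cleanup rules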
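@@ -91,10 +92,13 @@ Slack, on 2026-03-06, the official documentation for the `markdown` block …
 - Remove ANSI escapes
 - Remove general control characters
+- Remove the marker characters the library reserves for internal placeholders (`U+2063`, `U+FFF0`–`U+FFF3`) so that input does not collide with the internal machinery
 - Keep valid Slack angle-bracket tokens
   - Examples: links, mentions, channel references, `<!here>`, `<!subteam^...>`, `<!date^...>`
 - Tokens such as `<foo>` that cannot be interpreted as Slack special syntax are neutralized by converting them to `＜foo＞`
 - This includes raw HTML-like tags such as `<div>` and `<span>`
+- Angle-bracket token neutralization applies only outside code fences and inline code. Code samples such as `` `<div>` `` reach Slack verbatim. Removal of ANSI / control / marker characters applies everywhere because they have no legitimate use as displayed content
+- For this purpose an inline code span is limited to a single line and closes only on a backtick run of the same length as its opener. A stray unpaired backtick is treated as a literal and does not prevent sanitization of later lines. Also, an invalid angle-bracket token that spans a code span (`<foo `bar` baz>`) is neutralized as a whole while the span content is kept verbatim
 ## Underscore emphasis normalization rules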
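@@ -211,6 +215,25 @@ LLMs: omitted outer pipes, missing separator rows, mismatched column counts, …
 | U+200C | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
 | U+200D | ZWJ (zero-width joiner) | Required for joined emoji and other character composition |
+## Markdown block size splitting
+On 2026-06-11, three Slack-side hard limits were measured against a real workspace.
+- A `markdown` block's `text` is accepted up to exactly 12,000 characters. At 12,001 characters the whole `chat.postMessage` call is rejected with `msg_too_long`
+- Slack expands `markdown` blocks server-side into a sequence of native blocks and validates "item count ≤ 50" on the expanded result per message (exceeding it gives `invalid_blocks`). Each heading and each divider becomes one item (50 headings were accepted, 51 rejected; 30 headings × 2 blocks were also rejected in total). Paragraphs, lists, quotes, and code fences are merged into one item per contiguous run between headings/dividers (60 blank-separated paragraphs and 52 fences were accepted; blank lines alone do not split a run)
+- The total text one message's blocks can carry is exactly 13,200 characters (1.1 times the single-block limit). 13,201 characters is rejected with `msg_blocks_too_long`. The total is counted across block types (an 11,900-character `markdown` block plus a 1,400-character `rich_text` block was also rejected)
+For this reason, long or heading-dense non-table regions are split before sending.
+- First try the whole region as one block; split only when the formatted text exceeds the character limit or the estimated expansion-item count exceeds the budget
+- Because inserting zero-width spaces and helper lines inflates the text and the item estimate is intentionally conservative, raw text is packed toward targets below the limits (11,500 characters / 45 items)
+- Split points first prefer paragraph boundaries (runs of blank lines outside code fences). The blank lines used as a boundary are removed, because adjacent blocks already render visually separated
+- A single paragraph over budget is split at line boundaries, and a single overlong line at word boundaries. When there are no spaces (such as dense CJK text), the cut falls at a character position as a last resort
+- When a cut lands inside an unclosed code fence, the original fence opening line is repeated at the start of the continuation block so both halves keep rendering as code
+- Each piece is re-checked after formatting; when it exceeds a hard limit, the packing budget is reduced and the piece is split again
+- `convert_markdown_to_slack_messages` additionally bundles blocks so that the summed expansion-item estimate within a message stays within 50 and the total block text stays within 13,200 characters (blocks other than `markdown` count as one item, and the text total includes the content of all block types)
+- The top-level fallback `text` field is not subject to the character limit (Slack truncates instead of rejecting), so the preview string is not split
 ## Option to compensate for blank-line visibility
 When `preserve_visual_blank_lines=True` is passed to the main conversion APIs, only blank lines between visible lines in non-table regions are replaced with lines containing only a non-breaking space before the Slack `markdown` block is generated.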

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/docs/spec.md RENAMED

@@ -10,7 +10,7 @@ This document describes how `slack-markdown-parser` converts Markdown into Slack
 ## Output
 - Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, and `divider`)
-- When the input contains multiple tables or many promoted blocks, a list of messages that satisfies the "one table per message" rule and Slack's per-message block-count limit
+- When the input contains multiple tables, many promoted blocks, or long content, a list of messages that satisfies the "one table per message" rule, Slack's per-message block-count limit, and the measured size limits described in "Markdown block size splitting"
 ## Design target
@@ -23,8 +23,8 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
 `convert_markdown_to_slack_blocks` processes text in this order:
-1. Decode HTML entities such as `&gt;` and `&amp;`
-2. Clean Slack text by removing ANSI/control noise and neutralizing invalid Slack angle-bracket tokens
+1. Decode HTML entities such as `&gt;` and `&amp;` in prose, leaving fenced code blocks and inline code spans verbatim
+2. Clean Slack text: remove ANSI/control noise and this library's reserved internal marker code points everywhere, and neutralize invalid Slack angle-bracket tokens outside fenced code blocks and inline code spans
 3. Normalize underscore emphasis by converting `_..._` / `__...__` into Slack-friendly `*...*` / `**...**`
 4. Normalize bare URLs by wrapping them in Slack-friendly `<https://...>` form
 5. Repair malformed tables using the rules below
@@ -33,8 +33,9 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
    - Table regions: parse inline cell styling and generate a `table` block. If conversion fails, such as when there are fewer than two candidate lines or the parse result is empty, fall back to a `markdown` block.
    - Non-table regions: first promote safe standalone Markdown constructs into richer Block Kit blocks, then add zero-width spaces where needed and generate `markdown` blocks for the remaining text.
    - If `preserve_visual_blank_lines=True`, replace internal blank lines in remaining `markdown` blocks with lines that contain only a non-breaking space before emitting the `markdown` block.
+   - A remaining region whose formatted text would exceed Slack's 12,000-character `markdown` block limit is split into multiple `markdown` blocks using the rules in "Markdown block size splitting" below.
-`convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule and Slack's per-message block-count limit.
+`convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "Markdown block size splitting".
 `convert_markdown_to_slack_payloads` returns the same split blocks plus preview `text` values ready for `chat.postMessage`.
 ## How Slack behaved in testing
@@ -84,7 +85,7 @@ Slack still controls when those newer features appear and how they look, so trea
     - Slack mention tokens inside a promoted list item are converted to their structured `rich_text` elements — `<@U…>`/`<@W…>` to `user`, `<#C…>`/`<#G…>` to `channel`, `<!subteam^S…>` to `usergroup`, and `<!here>`/`<!channel>`/`<!everyone>` to `broadcast` — since a `rich_text` block does not resolve a raw token. An optional `|label` display suffix is dropped (Slack renders the element from the id).
 - Table-like rows inside fenced code blocks are kept out of table parsing
 - Internal blank lines can optionally be rewritten into placeholder lines so Slack keeps visible paragraph separation
-- Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized
+- Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized in prose, while fenced code blocks and inline code spans keep them verbatim
 ## Slack text cleanup rules
@@ -92,9 +93,12 @@ Behavior of `sanitize_slack_text`:
 - Remove ANSI escape sequences
 - Remove general control characters except line breaks and tabs already preserved by the regex
+- Remove this library's reserved internal marker code points (`U+2063`, `U+FFF0`–`U+FFF3`) so input cannot collide with the internal placeholder machinery
 - Keep valid Slack angle-bracket tokens such as links, mentions, channels, special mentions, subteam mentions, and `<!date^...>`
 - Replace unsupported angle-bracket tokens such as `<foo>` with full-width brackets (`＜foo＞`) so Slack does not interpret them as malformed special syntax
 - This also applies to raw HTML-like tags such as `<div>` or `<span>`
+- Angle-token neutralization applies only outside fenced code blocks and inline code spans, so code samples such as `` `<div>` `` reach Slack verbatim; ANSI/control/marker removal applies everywhere because those characters are never legitimate content
+- For this purpose an inline code span is recognized within a single line only, and it closes only on a backtick run of the same length as the opener. A stray unpaired backtick therefore stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`<foo `bar` baz>`) is still neutralized as a whole while the span content stays verbatim
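
The single-line, equal-length backtick pairing described above can be sketched as follows. This is an illustrative approximation, not the library's code: `find_code_spans`, `sanitize_line`, and the reduced `ALLOWED_TOKEN` allow-list are hypothetical, fenced code blocks are not handled, and an invalid token that spans a code span is left alone here even though the spec says the library neutralizes it.

```python
import re

ANGLE_TOKEN = re.compile(r"<[^>\n]+>")  # same shape as SLACK_ANGLE_TOKEN_PATTERN
# Hypothetical, simplified allow-list: links, mentions, channels, "<!...>".
ALLOWED_TOKEN = re.compile(r"^<(?:https?://[^>\n]+|[@#!][^>\n]+)>$")
BACKTICK_RUN = re.compile(r"`+")


def find_code_spans(line: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of inline code spans within one line."""
    spans: list[tuple[int, int]] = []
    pos = 0
    while (opener := BACKTICK_RUN.search(line, pos)) is not None:
        scan = opener.end()
        closer = None
        while (run := BACKTICK_RUN.search(line, scan)) is not None:
            if len(run.group()) == len(opener.group()):
                closer = run  # only an equal-length run closes the span
                break
            scan = run.end()
        if closer is None:
            pos = opener.end()  # stray opener stays literal
        else:
            spans.append((opener.start(), closer.end()))
            pos = closer.end()
    return spans


def _neutralize(match: re.Match[str]) -> str:
    token = match.group()
    if ALLOWED_TOKEN.match(token):
        return token
    return "＜" + token[1:-1] + "＞"  # full-width brackets, as in the spec


def sanitize_line(line: str) -> str:
    """Neutralize invalid angle tokens outside inline code spans."""
    out: list[str] = []
    cursor = 0
    for start, end in find_code_spans(line):
        out.append(ANGLE_TOKEN.sub(_neutralize, line[cursor:start]))
        out.append(line[start:end])  # code span content stays verbatim
        cursor = end
    out.append(ANGLE_TOKEN.sub(_neutralize, line[cursor:]))
    return "".join(out)


def sanitize_text(text: str) -> str:
    """Apply the per-line rule so an unpaired backtick cannot leak forward."""
    return "\n".join(sanitize_line(line) for line in text.split("\n"))
```

Because spans are resolved one line at a time, a stray backtick on one line has no effect on how the next line is sanitized, which is the property the rule above is meant to guarantee.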
 ## Underscore emphasis normalization rules
@@ -211,6 +215,25 @@ Exception:
 | `U+200C` | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
 | `U+200D` | ZWJ (zero-width joiner) | Required for joined emoji and other grapheme composition |
+## Markdown block size splitting
+Three Slack-side hard limits were measured against a real workspace on 2026-06-11:
+- A `markdown` block's `text` accepts exactly 12,000 characters; 12,001 fails the whole `chat.postMessage` call with `msg_too_long`.
+- Slack expands `markdown` blocks server-side into native blocks and enforces "no more than 50 items" on the expanded result per message (`invalid_blocks`). Each heading and each thematic break becomes its own item (50 headings were accepted, 51 rejected; 30 headings in each of two blocks were rejected together), while paragraphs, lists, quotes, and fenced code merge into one item per contiguous run between those breakers (60 blank-separated paragraphs and 52 fences were accepted). Blank lines alone do not split a run.
+- One message's blocks may carry at most 13,200 characters of text in total, exactly 1.1 × the single-block limit; 13,201 fails with `msg_blocks_too_long`. The total counts content across block types (an 11,900-character `markdown` block plus a 1,400-character `rich_text` block was rejected).
+Long or heading-dense non-table regions are therefore split before delivery:
+- The whole region is tried as a single block first; splitting happens only when the formatted text exceeds the character limit or the estimated expansion exceeds the per-message item budget
+- Raw content is packed toward targets below the hard limits (11,500 characters, 45 estimated items), because zero-width-space insertion and placeholder lines inflate the formatted text and the item estimate is intentionally conservative
+- Split points prefer paragraph boundaries (blank-line runs outside fenced code); the blank run at a chosen boundary is dropped, since adjacent Slack blocks already render visually separated
+- A single paragraph longer than the budget is split at line boundaries, and a single overlong line at word boundaries, with a hard cut when no space exists (for example dense CJK text)
+- When a cut lands inside an unclosed fenced code block, the continuation block re-opens the fence with the original delimiter line so both halves keep rendering as code
+- Each piece is re-checked after formatting; when it still exceeds a hard limit, the packing budgets shrink and the piece is split again
+- `convert_markdown_to_slack_messages` additionally packs blocks into messages so that the summed expansion estimate stays within the 50-item budget (non-`markdown` blocks count as one item each) and the summed block text stays within the 13,200-character per-message total
+- The top-level fallback `text` field is not subject to the character limit (Slack truncates it instead of rejecting), so preview text is left whole
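
The item-counting rule and the per-message packing described in this section can be approximated with a short sketch. Everything here is an assumption for illustration: `estimate_items` and `pack_messages` are hypothetical names, the regexes are simplified CommonMark-style patterns, setext headings are ignored, and only `markdown` blocks are modeled.

```python
import re

MAX_ITEMS_PER_MESSAGE = 50       # measured limit quoted above
MAX_TEXT_PER_MESSAGE = 13_200    # measured limit quoted above

HEADING = re.compile(r"^ {0,3}#{1,6}(?:\s|$)")
THEMATIC_BREAK = re.compile(
    r"^ {0,3}(?:(?:\*[ \t]*){3,}|(?:-[ \t]*){3,}|(?:_[ \t]*){3,})$"
)
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def estimate_items(text: str) -> int:
    """Estimate how many items one markdown block expands to."""
    items = 0
    in_run = False            # inside a paragraph/list/quote/fence run
    fence: str | None = None  # delimiter of the currently open fence
    for line in text.split("\n"):
        marker = FENCE.match(line)
        if fence is not None:
            # Lines inside a fence never count as headings or breaks.
            if (marker and marker.group(1)[0] == fence[0]
                    and len(marker.group(1)) >= len(fence)
                    and line.strip() == marker.group(1)):
                fence = None
            continue
        if marker:
            fence = marker.group(1)
            if not in_run:
                items += 1
                in_run = True
            continue
        if HEADING.match(line) or THEMATIC_BREAK.match(line):
            items += 1        # each breaker is its own item
            in_run = False
            continue
        if line.strip() and not in_run:
            items += 1        # first content line opens a new run
            in_run = True
        # Blank lines alone do not end a run.
    return items


def pack_messages(block_texts: list[str]) -> list[list[dict]]:
    """Group markdown blocks into messages under both budgets."""
    messages: list[list[dict]] = []
    current: list[dict] = []
    items = chars = 0
    for text in block_texts:
        block_items = max(1, estimate_items(text))
        over = (items + block_items > MAX_ITEMS_PER_MESSAGE
                or chars + len(text) > MAX_TEXT_PER_MESSAGE)
        if current and over:
            messages.append(current)
            current, items, chars = [], 0, 0
        current.append({"type": "markdown", "text": text})
        items += block_items
        chars += len(text)
    if current:
        messages.append(current)
    return messages
```

Under this sketch, two blocks of 30 headings each land in separate messages, matching the measurement that they were rejected when sent together, and an 11,900-character block is kept apart from a following 1,400-character block.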
 ## Optional blank-line visibility workaround
 When `preserve_visual_blank_lines=True` is passed to the main conversion APIs,

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "slack-markdown-parser"
-version = "2.4.4"
+version = "2.5.0"
 description = "Convert LLM Markdown into Slack Block Kit messages"
 readme = "README.md"
 requires-python = ">=3.10"

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/__init__.py RENAMED

@@ -1,6 +1,6 @@
 """slack-markdown-parser public package API."""
-__version__ = "2.4.4"
+__version__ = "2.5.0"
 __license__ = "MIT"
 from .converter import (

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0}/slack_markdown_parser/converter.py RENAMED

@@ -8,6 +8,7 @@ from __future__ import annotations
 import html
 import re
+from collections.abc import Callable, Iterable, Iterator
 from typing import Any
 from urllib.parse import urlparse
@@ -22,6 +23,14 @@ ANSI_ESCAPE_PATTERN = re.compile(
     r"\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~]|\][^\x1B\x07]*(?:\x07|\x1B\\))"
 )
 CONTROL_CHAR_PATTERN = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]")
+# In-band marker code points reserved by this module's placeholder/spacing
+# machinery: SYNTH_SPACE_MARKER (U+2063), the inline-code placeholder
+# delimiters (U+FFF0/U+FFF1), and the ZWSP-strip markers (U+FFF2/U+FFF3).
+# They have no legitimate use in chat text, and input that carries them would
+# collide with internal placeholders (a stray ``\ufff0code0\ufff1`` either
+# crashes restoration with KeyError or gets substituted with another span's
+# content), so they are removed up front together with control characters.
+INTERNAL_MARKER_CHAR_PATTERN = re.compile("[\u2063\ufff0-\ufff3]")
 SLACK_ANGLE_TOKEN_PATTERN = re.compile(r"<[^>\n]+>")
 BARE_URL_PATTERN = re.compile(r"https?://[^\s<]+", re.IGNORECASE)
 FENCE_OPEN_PATTERN = re.compile(r"^[ \t]{0,3}(`{3,}|~{3,})([^\n]*)$")
@@ -107,13 +116,47 @@ ALLOWED_SLACK_ANGLE_TOKEN_PATTERNS = (
     re.compile(r"^<!date\^[^>\n]+>$"),
 )
 SLACK_MAX_BLOCKS_PER_MESSAGE = 50
+# Verified against a real Slack workspace (2026-06-11): a ``markdown`` block's
+# ``text`` accepts exactly 12,000 characters, while 12,001 fails the whole
+# chat.postMessage call with ``msg_too_long``. The top-level fallback ``text``
+# field is not subject to this limit (40,001 characters was accepted).
+SLACK_MAX_MARKDOWN_TEXT_LENGTH = 12000
+# Raw-content packing target used when an oversized markdown segment is split.
+# Formatting inflates text (ZWSP padding, NBSP blank-line placeholders), so
+# pieces are packed below the hard limit; the block builder re-splits any
+# piece whose *formatted* text still exceeds the hard limit.
+_MARKDOWN_SPLIT_TARGET_LENGTH = 11500
+# Slack expands a ``markdown`` block server-side into native blocks and then
+# enforces "no more than 50 items" on the expanded result *per message*.
+# Measured against a real workspace (2026-06-11): each heading and each
+# thematic break becomes its own item (50 headings accepted, 51 rejected;
+# 30 headings in each of two blocks rejected), while paragraphs, lists,
+# quotes, and fenced code merge into one item per run between those breakers
+# (60 blank-separated paragraphs and 52 fences were accepted).
+SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE = 50
+# Per-block packing target, leaving headroom for estimation error.
+_MARKDOWN_EXPANSION_ITEMS_TARGET = 45
+# Third measured hard limit (2026-06-11): the text carried by one message's
+# blocks *in total* — across block types; rich_text content counts too — may
+# not exceed 13,200 characters (= 1.1 × the single-block limit). 13,201
+# fails the whole call with ``msg_blocks_too_long``.
+SLACK_MAX_MESSAGE_BLOCKS_TEXT_LENGTH = 13200
+# Packing target, leaving headroom for structural fields the size proxy may
+# not count exactly.
+_MESSAGE_BLOCKS_TEXT_TARGET = 12800
+ATX_HEADING_PATTERN = re.compile(r"^[ \t]{0,3}#{1,6}(?:[ \t]|$)")
 def decode_html_entities(text: str) -> str:
-    """Decode HTML entities that may appear in model output."""
-    if not text:
+    """Decode HTML entities in prose while leaving code regions verbatim.
+    Prose entities such as ``&gt;`` are decoded for natural reading, but a
+    fenced code block or inline code span showing ``&amp;`` keeps the literal
+    entity: code samples are content, not markup to repair.
+    """
+    if not text or "&" not in text:
         return text
-    return html.unescape(text)
+    return _transform_outside_code_regions(text, html.unescape)
 def strip_zero_width_spaces(text: str) -> str:
@@ -566,15 +609,90 @@ def _is_allowed_slack_angle_token(token: str) -> bool:
 def _find_inline_code_span_end(text: str, start: int) -> int | None:
+    """Find the end of the inline code span opened at ``start``.
+    Per CommonMark, a code span closes only with a backtick run of *equal*
+    length: a lone `` ` `` must not pair with the first backtick of a later
+    ``` `` ``` run. Runs of a different length are skipped whole.
+    """
     delimiter_end = start
     while delimiter_end < len(text) and text[delimiter_end] == "`":
         delimiter_end += 1
+    delimiter_length = delimiter_end - start
-    delimiter = text[start:delimiter_end]
-    closing = text.find(delimiter, delimiter_end)
-    if closing == -1:
-        return None
-    return closing + len(delimiter)
+    cursor = delimiter_end
+    while True:
+        closing = text.find("`", cursor)
+        if closing == -1:
+            return None
+        run_end = closing
+        while run_end < len(text) and text[run_end] == "`":
+            run_end += 1
+        if run_end - closing == delimiter_length:
+            return run_end
+        cursor = run_end
+def _transform_outside_inline_code(text: str, transform: Callable[[str], str]) -> str:
+    """Apply ``transform`` to text while keeping inline code spans verbatim.
+    Spans are bounded to a single line, matching this module's span model
+    (``INLINE_CODE_SPAN_PATTERN``). Without that bound, one stray backtick
+    would pair with a backtick on a much later line and silently suppress
+    sanitization for everything in between.
+    Spans are replaced with placeholder tokens (which contain no backticks or
+    angle brackets) rather than split out, so the transform still sees any
+    construct that *spans* a code span — e.g. an invalid angle token such as
+    ``<foo `bar` baz>`` is neutralized as a whole while the span content
+    itself stays verbatim. Reserved marker code points are stripped from the
+    input first, so crafted input cannot collide with the placeholders.
+    """
+    text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
+    spans: list[str] = []
+    parts: list[str] = []
+    plain_start = 0
+    cursor = text.find("`")
+    while cursor != -1:
+        span_end = _find_inline_code_span_end(text, cursor)
+        if span_end is None or "\n" in text[cursor:span_end]:
+            # No same-line closing run: the backticks are literal text.
+            delimiter_end = cursor
+            while delimiter_end < len(text) and text[delimiter_end] == "`":
+                delimiter_end += 1
+            cursor = text.find("`", delimiter_end)
+            continue
+        parts.append(text[plain_start:cursor])
+        parts.append(f"\ufff0code{len(spans)}\ufff1")
+        spans.append(text[cursor:span_end])
+        plain_start = span_end
+        cursor = text.find("`", span_end)
+    parts.append(text[plain_start:])
+    transformed = transform("".join(parts))
+    if not spans:
+        return transformed
+    placeholder_map = {f"\ufff0code{idx}\ufff1": span for idx, span in enumerate(spans)}
+    return INLINE_CODE_PLACEHOLDER_PATTERN.sub(
+        lambda match: placeholder_map.get(match.group(0), match.group(0)),
+        transformed,
+    )
+def _transform_outside_code_regions(text: str, transform: Callable[[str], str]) -> str:
+    """Apply ``transform`` outside fenced code blocks and inline code spans.
+    Code samples must reach Slack verbatim: neither Slack's ``markdown`` block
+    renderer nor ``rich_text_preformatted`` interprets their content, so any
+    rewrite inside a code region is visible corruption.
+    """
+    return "".join(
+        chunk if is_fenced else _transform_outside_inline_code(chunk, transform)
+        for is_fenced, chunk in _split_fenced_code_chunks(text)
+    )
 def _is_punctuation_like(char: str, boundary_chars: set[str]) -> bool:
@@ -690,12 +808,21 @@ def normalize_bare_urls_for_slack_markdown(text: str) -> str:
 def sanitize_slack_text(text: str) -> str:
-    """Remove control noise and neutralize invalid Slack angle tokens."""
+    """Remove control noise and neutralize invalid Slack angle tokens.
+    ANSI escapes, control characters, and this module's reserved in-band
+    marker code points are removed everywhere — including code regions —
+    because they are never legitimate visible content. Angle-token
+    neutralization rewrites visible text, so it skips fenced code blocks and
+    inline code spans: a code sample containing ``<div>`` must reach Slack
+    verbatim.
+    """
     if not text:
         return text
     cleaned = ANSI_ESCAPE_PATTERN.sub("", text)
     cleaned = CONTROL_CHAR_PATTERN.sub("", cleaned)
+    cleaned = INTERNAL_MARKER_CHAR_PATTERN.sub("", cleaned)
     def replace_invalid_token(match: re.Match[str]) -> str:
         token = match.group(0)
@@ -703,7 +830,10 @@ def sanitize_slack_text(text: str) -> str:
             return token
         return f"＜{token[1:-1]}＞"
-    return SLACK_ANGLE_TOKEN_PATTERN.sub(replace_invalid_token, cleaned)
+    def neutralize_angle_tokens(segment: str) -> str:
+        return SLACK_ANGLE_TOKEN_PATTERN.sub(replace_invalid_token, segment)
+    return _transform_outside_code_regions(cleaned, neutralize_angle_tokens)
 def _match_fence_open(line: str) -> tuple[str, int] | None:
@@ -723,38 +853,58 @@ def _is_fence_close(line: str, fence: tuple[str, int]) -> bool:
     )
+def _iter_fence_states(lines: Iterable[str]) -> Iterator[tuple[str, bool, bool]]:
+    """Yield ``(line, is_fenced, is_opening)`` for each line.
+    Single source of truth for fenced-code tracking across this module.
+    ``is_fenced`` covers the fence delimiter lines themselves and the body of
+    an unclosed trailing fence; ``is_opening`` marks the opening delimiter
+    line so callers can flush per-fence state. Works with or without trailing
+    newlines on the lines.
+    """
+    active_fence: tuple[str, int] | None = None
+    for line in lines:
+        if active_fence is None:
+            opening_fence = _match_fence_open(line)
+            if opening_fence is not None:
+                active_fence = opening_fence
+                yield line, True, True
+                continue
+            yield line, False, False
+            continue
+        yield line, True, False
+        if _is_fence_close(line, active_fence):
+            active_fence = None
 def _split_fenced_code_chunks(text: str) -> list[tuple[bool, str]]:
     chunks: list[tuple[bool, str]] = []
     if not text:
         return chunks
     current: list[str] = []
-    active_fence: tuple[str, int] | None = None
+    current_is_fenced = False
-    for line in text.splitlines(keepends=True):
-        opening_fence = _match_fence_open(line) if active_fence is None else None
-        if opening_fence:
-            if current:
-                chunks.append((False, "".join(current)))
-                current = []
-            current.append(line)
-            active_fence = opening_fence
-            continue
-        current.append(line)
-        if active_fence and _is_fence_close(line, active_fence):
-            chunks.append((True, "".join(current)))
+    for line, is_fenced, is_opening in _iter_fence_states(
+        text.splitlines(keepends=True)
+    ):
+        if current and (is_opening or is_fenced != current_is_fenced):
+            chunks.append((current_is_fenced, "".join(current)))
             current = []
-            active_fence = None
+        current.append(line)
+        current_is_fenced = is_fenced
     if current:
-        chunks.append((active_fence is not None, "".join(current)))
+        chunks.append((current_is_fenced, "".join(current)))
     return chunks
 def _normalize_underscore_emphasis_chunk(text: str) -> str:
+    # Same defense as _format_markdown_with_spacing_metadata: reserved marker
+    # code points in direct-call input must not collide with the numbered
+    # placeholders below.
+    text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
     protected_spans: list[str] = []
     def protect(match: re.Match[str]) -> str:
@@ -798,6 +948,11 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
     if not text:
         return text, []
+    # Defense in depth for direct calls that bypass sanitize_slack_text: input
+    # carrying our reserved marker code points would collide with the inline
+    # placeholder machinery below.
+    text = INTERNAL_MARKER_CHAR_PATTERN.sub("", text)
     boundary_chars = {*VISIBLE_BOUNDARY_CHARS, ZWSP, SYNTH_SPACE_MARKER}
     def wrap_match(match: re.Match[str], source: str) -> str:
@@ -871,9 +1026,16 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
         before_char = source[start - 1] if start > 0 else ""
         after_char = source[end] if end < len(source) else ""
         strategy = _nested_code_space_strategy(source, start, end, boundary_chars)
+        def resolve_placeholder_raw(placeholder_match: re.Match[str]) -> str:
+            # Unknown placeholder-shaped sequences pass through unchanged
+            # (belt-and-braces against in-band collisions; the markers are
+            # already stripped at every entry point).
+            entry = replacements.get(placeholder_match.group(0))
+            return entry["raw"] if entry else placeholder_match.group(0)
         resolved_text = INLINE_CODE_PLACEHOLDER_PATTERN.sub(
-            lambda placeholder_match: replacements[placeholder_match.group(0)]["raw"],
-            match.group(0),
+            resolve_placeholder_raw, match.group(0)
         )
         has_ascii_word = bool(re.search(r"[A-Za-z0-9]", resolved_text))
         adjusted_text = match.group(0)
@@ -981,11 +1143,12 @@ def _format_markdown_with_spacing_metadata(text: str) -> tuple[str, list[int]]:
                 protected_segment,
             )
+        def restore_placeholder(placeholder_match: re.Match[str]) -> str:
+            entry = placeholder_map.get(placeholder_match.group(0))
+            return entry["wrapped"] if entry else placeholder_match.group(0)
         protected_segment = INLINE_CODE_PLACEHOLDER_PATTERN.sub(
-            lambda placeholder_match: placeholder_map[placeholder_match.group(0)][
-                "wrapped"
-            ],
-            protected_segment,
+            restore_placeholder, protected_segment
         )
         protected_segment = re.sub(
@@ -1275,20 +1438,11 @@ def normalize_markdown_tables(markdown_text: str) -> str:
             normalized.extend(buffer)
         buffer = []
-    active_fence: tuple[str, int] | None = None
-    for idx, line in enumerate(lines):
-        opening_fence = _match_fence_open(line) if active_fence is None else None
-        if opening_fence:
-            flush_buffer()
-            normalized.append(line)
-            active_fence = opening_fence
-            continue
-        if active_fence:
+    for idx, (line, is_fenced, is_opening) in enumerate(_iter_fence_states(lines)):
+        if is_fenced:
+            if is_opening:
+                flush_buffer()
             normalized.append(line)
-            if _is_fence_close(line, active_fence):
-                active_fence = None
             continue
         stripped = line.strip()
@@ -1608,9 +1762,283 @@ def _create_markdown_block(
     block._plain_text = plain_text
     block._synthetic_space_indices = synthetic_indices
     block._synthetic_blank_line_indices = synthetic_blank_line_indices
+    block._expansion_items = _estimate_markdown_expansion_items(formatted)
     return block
+def _is_markdown_expansion_breaker_line(line: str) -> bool:
+    """Return True for lines Slack expands into their own top-level block.
+    ATX headings and thematic breaks each become one expansion item and end
+    the surrounding content run. A setext ``===`` underline is counted too;
+    that can over-count by one (heading text + underline), which only makes
+    the estimate conservative.
+    """
+    if ATX_HEADING_PATTERN.match(line):
+        return True
+    if _is_thematic_break_line(line):
+        return True
+    stripped = line.strip()
+    return bool(stripped) and set(stripped) == {"="}
+def _estimate_markdown_expansion_items(text: str) -> int:
+    """Estimate how many native blocks Slack expands this markdown text into.
+    Model measured against a real workspace (see the constants above): each
+    heading / thematic break is one item, and each maximal run of any other
+    content *between* those breakers is one item — blank lines inside a run
+    do not split it. Fenced lines always count as run content.
+    """
+    items = 0
+    in_run = False
+    for line, is_fenced, _ in _iter_fence_states(text.split("\n")):
+        if is_fenced:
+            if not in_run:
+                items += 1
+                in_run = True
+            continue
+        if not line.strip():
+            continue
+        if _is_markdown_expansion_breaker_line(line):
+            items += 1
+            in_run = False
+            continue
+        if not in_run:
+            items += 1
+            in_run = True
+    return max(1, items)
+def _split_text_at_blank_lines(text: str, max_length: int, max_items: int) -> list[str]:
+    """Greedily pack paragraph units into pieces within both budgets.
+    Units are separated by blank-line runs outside fenced code (blank lines
+    inside a fence never split). The blank run at a piece boundary is dropped
+    — adjacent Slack blocks already render separated — while blank runs
+    packed inside a piece are kept verbatim. A piece is closed when adding
+    the next unit would exceed ``max_length`` characters or ``max_items``
+    estimated expansion items. A single unit over either budget is returned
+    oversized; the caller splits it harder.
+    """
+    if (
+        len(text) <= max_length
+        and _estimate_markdown_expansion_items(text) <= max_items
+    ):
+        return [text]
+    units: list[list[str]] = []
+    content: list[str] = []
+    blanks: list[str] = []
+    for line, is_fenced, _ in _iter_fence_states(text.split("\n")):
+        if not is_fenced and not line.strip():
+            if content:
+                blanks.append(line)
+            else:
+                # Leading blank lines stay attached to the first unit.
+                content.append(line)
+            continue
+        if blanks:
+            units.append(content)
+            units.append(blanks)
+            content, blanks = [], []
+        content.append(line)
+    if content:
+        units.append(content)
+    if blanks:
+        units.append(blanks)
+    pieces: list[str] = []
+    current: list[str] = []
+    pending_blanks: list[str] = []
+    for index in range(0, len(units), 2):
+        unit = units[index]
+        candidate = current + (pending_blanks if current else []) + unit
+        candidate_text = "\n".join(candidate)
+        if current and (
+            len(candidate_text) > max_length
+            or _estimate_markdown_expansion_items(candidate_text) > max_items
+        ):
+            pieces.append("\n".join(current))
+            current = list(unit)
+        else:
+            current = candidate
+        pending_blanks = units[index + 1] if index + 1 < len(units) else []
+    if current:
+        pieces.append("\n".join(current))
+    return pieces
+def _split_single_line_to_length(line: str, max_length: int) -> list[str]:
+    """Split one overlong line, preferring a space boundary near the limit."""
+    parts: list[str] = []
+    while len(line) > max_length:
+        cut = line.rfind(" ", 1, max_length + 1)
+        if cut <= 0:
+            parts.append(line[:max_length])
+            line = line[max_length:]
+        else:
+            parts.append(line[:cut])
+            line = line[cut + 1 :]
+    parts.append(line)
+    return parts
+def _split_lines_to_length(text: str, max_length: int, max_items: int) -> list[str]:
+    """Split at line boundaries into pieces within both budgets.
+    Last-resort splitter for content that exceeds a budget without blank-line
+    split points (a single huge paragraph, a fence body, or a long run of
+    headings). When the cut lands inside an (unclosed) fence, the continuation
+    piece re-opens the fence with the original delimiter line so both pieces
+    keep rendering as code.
+    """
+    pieces: list[str] = []
+    current: list[str] = []
+    current_len = 0
+    current_items = 0
+    in_run = False
+    active_fence_open: str | None = None
+    def flush(next_fence_prefix: str | None) -> None:
+        nonlocal current, current_len, current_items, in_run
+        if current:
+            pieces.append("\n".join(current))
+        if next_fence_prefix:
+            current = [next_fence_prefix]
+            current_len = len(next_fence_prefix)
+            current_items = 1
+            in_run = True
+        else:
+            current = []
+            current_len = 0
+            current_items = 0
+            in_run = False
+    for line, is_fenced, is_opening in _iter_fence_states(text.split("\n")):
+        if is_opening:
+            active_fence_open = line
+        elif not is_fenced:
+            active_fence_open = None
+        line_is_blank = not is_fenced and not line.strip()
+        line_is_breaker = (
+            not is_fenced
+            and not line_is_blank
+            and _is_markdown_expansion_breaker_line(line)
+        )
+        for part_index, part in enumerate(
+            [line]
+            if len(line) <= max_length
+            else _split_single_line_to_length(line, max_length)
+        ):
+            # Word-split continuations of a breaker line render as plain
+            # content, so only the first part keeps the breaker class.
+            is_breaker = line_is_breaker and part_index == 0
+            if line_is_blank:
+                part_items = 0
+            elif is_breaker:
+                part_items = 1
+            else:
+                part_items = 0 if in_run else 1
+            added = len(part) + (1 if current else 0)
+            if current and (
+                current_len + added > max_length
+                or current_items + part_items > max_items
+            ):
+                reopen = active_fence_open if active_fence_open != part else None
+                flush(reopen)
+                added = len(part) + (1 if current else 0)
+                if line_is_blank:
+                    part_items = 0
+                elif is_breaker:
+                    part_items = 1
+                else:
+                    part_items = 0 if in_run else 1
+            current.append(part)
+            current_len += added
+            current_items += part_items
+            if is_breaker:
+                in_run = False
+            elif not line_is_blank:
+                in_run = True
+    if current:
+        pieces.append("\n".join(current))
+    return pieces
+def _markdown_block_fits_slack_limits(block: dict[str, Any]) -> bool:
+    return (
+        len(block["text"]) <= SLACK_MAX_MARKDOWN_TEXT_LENGTH
+        and _estimate_markdown_expansion_items(block["text"])
+        <= SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE
+    )
+def _create_markdown_blocks(
+    content: str, *, preserve_visual_blank_lines: bool = False
+) -> list[dict[str, Any]]:
+    """Build ``markdown`` blocks that fit Slack's measured hard limits.
+    The whole content is tried as a single block first. Only when the
+    *formatted* text exceeds ``SLACK_MAX_MARKDOWN_TEXT_LENGTH`` or the
+    estimated server-side expansion exceeds the per-message item budget is
+    the raw content split — at paragraph boundaries when possible, then at
+    line/word boundaries — and each piece re-checked after formatting,
+    shrinking the packing budget geometrically until every block fits.
+    """
+    def build(piece: str) -> dict[str, Any] | None:
+        return _create_markdown_block(
+            piece, preserve_visual_blank_lines=preserve_visual_blank_lines
+        )
+    whole = build(content)
+    if whole is None:
+        return []
+    if _markdown_block_fits_slack_limits(whole):
+        return [whole]
+    blocks: list[dict[str, Any]] = []
+    worklist: list[tuple[str, int, int]] = [
+        (content, _MARKDOWN_SPLIT_TARGET_LENGTH, _MARKDOWN_EXPANSION_ITEMS_TARGET)
+    ]
+    while worklist:
+        piece, budget, items_budget = worklist.pop(0)
+        block = build(piece)
+        if block is None:
+            continue
+        if _markdown_block_fits_slack_limits(block):
+            blocks.append(block)
+            continue
+        sub_pieces = _split_text_at_blank_lines(piece, budget, items_budget)
+        if len(sub_pieces) == 1:
+            sub_pieces = _split_lines_to_length(piece, budget, items_budget)
+        if len(sub_pieces) > 1:
+            worklist = [(sub, budget, items_budget) for sub in sub_pieces] + worklist
+            continue
+        if budget > 256 or items_budget > 8:
+            # The piece fits the raw budgets but its *formatted* text overflows
+            # (ZWSP/NBSP inflation or estimation drift): shrink and retry.
+            worklist.insert(
+                0,
+                (
+                    piece,
+                    max(256, int(budget * 0.8)),
+                    max(8, int(items_budget * 0.8)),
+                ),
+            )
+            continue
+        # A floor-budget piece cannot exceed the hard limits, so this is
+        # unreachable; keep the block rather than loop forever.
+        blocks.append(block)
+    return blocks
 def _create_rich_text_block(
     elements: list[dict[str, Any]], *, plain_text: str | None = None
 ) -> dict[str, Any]:
@@ -1903,12 +2331,12 @@ def _convert_markdown_text_segment_to_blocks(
             markdown_buffer.pop()
         if not markdown_buffer:
             return
-        markdown_block = _create_markdown_block(
-            "\n".join(markdown_buffer),
-            preserve_visual_blank_lines=preserve_visual_blank_lines,
+        blocks.extend(
+            _create_markdown_blocks(
+                "\n".join(markdown_buffer),
+                preserve_visual_blank_lines=preserve_visual_blank_lines,
+            )
         )
-        if markdown_block:
-            blocks.append(markdown_block)
         markdown_buffer = []
     while cursor < len(lines):
@@ -1956,16 +2384,10 @@ def split_markdown_into_segments(markdown_text: str) -> list[dict[str, str]]:
         current = []
         current_is_table = None
-    active_fence: tuple[str, int] | None = None
-    for line in lines:
+    for line, is_fenced, _ in _iter_fence_states(lines):
         stripped = line.strip()
-        opening_fence = _match_fence_open(line) if active_fence is None else None
-        is_fenced_line = active_fence is not None or opening_fence is not None
         is_table_line = (
-            False
-            if is_fenced_line
-            else stripped.startswith("|") and stripped.endswith("|")
+            False if is_fenced else stripped.startswith("|") and stripped.endswith("|")
         )
         if current_is_table is None:
@@ -1978,11 +2400,6 @@ def split_markdown_into_segments(markdown_text: str) -> list[dict[str, str]]:
             current_is_table = is_table_line
             current.append(line)
-        if opening_fence:
-            active_fence = opening_fence
-        elif active_fence and _is_fence_close(line, active_fence):
-            active_fence = None
     flush()
     return segments
@@ -2026,10 +2443,59 @@ def convert_markdown_to_slack_blocks(
 convert_markdown_text_to_blocks = convert_markdown_to_slack_blocks
+def _block_expansion_weight(block: dict[str, Any]) -> int:
+    """Weight of one block against Slack's per-message expansion budget.
+    Slack expands ``markdown`` blocks server-side and enforces the 50-item
+    limit on the expanded result, so a markdown block counts as its estimated
+    expansion; every other block type posts as a single item.
+    """
+    if not isinstance(block, dict) or block.get("type") != "markdown":
+        return 1
+    annotated = getattr(block, "_expansion_items", None)
+    if isinstance(annotated, int) and annotated > 0:
+        return annotated
+    return _estimate_markdown_expansion_items(str(block.get("text", "")))
+_TEXT_SIZE_KEYS = frozenset({"text", "url", "alt_text", "image_url"})
+def _block_text_size(value: Any) -> int:
+    """Rough text payload of a block against the per-message total budget.
+    Slack's ``msg_blocks_too_long`` check counts content across block types
+    (an 11,900-char markdown block plus a 1,400-char rich_text was rejected),
+    so this sums every string under content-carrying keys, recursively.
+    """
+    if isinstance(value, dict):
+        total = 0
+        for key, sub in value.items():
+            if key in _TEXT_SIZE_KEYS and isinstance(sub, str):
+                total += len(sub)
+            else:
+                total += _block_text_size(sub)
+        return total
+    if isinstance(value, list):
+        return sum(_block_text_size(item) for item in value)
+    return 0
 def split_blocks_by_table(blocks: list[dict[str, Any]]) -> list[list[dict[str, Any]]]:
-    """Split blocks to satisfy Slack table and per-message block constraints."""
+    """Split blocks to satisfy Slack table and per-message constraints.
+    A message holds at most one ``table`` block, at most
+    ``SLACK_MAX_BLOCKS_PER_MESSAGE`` posted blocks, at most
+    ``SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE`` estimated post-expansion items
+    (headings and thematic breaks inside ``markdown`` blocks count
+    individually toward that budget), and at most
+    ``SLACK_MAX_MESSAGE_BLOCKS_TEXT_LENGTH`` characters of block text in
+    total across block types.
+    """
     messages: list[list[dict[str, Any]]] = []
     current_message: list[dict[str, Any]] = []
+    current_weight = 0
+    current_text_size = 0
     for block in blocks or []:
         if isinstance(block, dict) and block.get("type") == "table":
@@ -2037,11 +2503,23 @@ def split_blocks_by_table(blocks: list[dict[str, Any]]) -> list[list[dict[str, A
                 messages.append(current_message)
             messages.append([block])
             current_message = []
+            current_weight = 0
+            current_text_size = 0
         else:
-            if len(current_message) >= SLACK_MAX_BLOCKS_PER_MESSAGE:
+            weight = _block_expansion_weight(block)
+            text_size = _block_text_size(block)
+            if current_message and (
+                len(current_message) >= SLACK_MAX_BLOCKS_PER_MESSAGE
+                or current_weight + weight > SLACK_MAX_EXPANSION_ITEMS_PER_MESSAGE
+                or current_text_size + text_size > _MESSAGE_BLOCKS_TEXT_TARGET
+            ):
                 messages.append(current_message)
                 current_message = []
+                current_weight = 0
+                current_text_size = 0
             current_message.append(block)
+            current_weight += weight
+            current_text_size += text_size
     if current_message:
         messages.append(current_message)

{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.0/slack_markdown_parser.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: slack-markdown-parser
-Version: 2.4.4
+Version: 2.5.0
 Summary: Convert LLM Markdown into Slack Block Kit messages
 Author: darkgaldragon
 License-Expression: MIT
@@ -62,7 +62,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
 - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
 - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
 - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
-- Remove ANSI/control characters and neutralize invalid Slack angle-bracket tokens before block generation
+- Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
+- Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
 - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
 - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
 - Support Markdown links and Slack-style links inside table cells
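
The splitting behaviour described in the added bullet (prefer paragraph boundaries, then fall back to finer boundaries) can be sketched as below. This is a hedged illustration, not the library's implementation: `split_text` is a hypothetical helper, the line/word/hard-cut fallback order is taken from the 2.5.0 changelog entry, and the sketch ignores fence re-opening and the ZWSP/NBSP re-check that the changelog also mentions.

```python
def split_text(text: str, limit: int) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Hypothetical helper: prefers a paragraph break, then a line break,
    then a space, and hard-cuts only when none is available.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    pieces: list[str] = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        cut = limit  # hard cut as the last resort (e.g. space-less CJK)
        for sep in ("\n\n", "\n", " "):
            idx = window.rfind(sep)
            if idx > 0:
                # Keep the separator with the left piece so that joining
                # the pieces reproduces the input exactly.
                cut = idx + len(sep)
                break
        pieces.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        pieces.append(rest)
    return pieces


if __name__ == "__main__":
    for piece in split_text("para one.\n\npara two is here.", 15):
        print(repr(piece))
```

Keeping each separator attached to the preceding piece makes the split lossless, which is convenient for checking that nothing was dropped; a real converter would likely trim or normalise whitespace at block edges instead.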
@@ -115,7 +116,8 @@ What this library compensates for:
 - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
 - Keeps table-like rows inside fenced code blocks out of table normalization
 - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
-- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
+- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
+- Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
 ## Requirements
@@ -151,7 +153,7 @@ for payload in convert_markdown_to_slack_payloads(
     print(payload)
 ```
-`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
+`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
 Set `preserve_visual_blank_lines=True` when you want the parser to compensate
 for Slack's currently tight paragraph spacing inside `markdown` blocks.
 The blank-line workaround is intentionally narrow: it skips table segments and
@@ -205,7 +207,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
 | Function | Description |
 |---|---|
-| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
+| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
 | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
 | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
 | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -225,9 +227,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
 | Function | Description |
 |---|---|
 | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
+| `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
 | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
-| `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
-| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
+| `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
+| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
 | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
 ### Lower-level exported helpers