slack-markdown-parser 2.4.4__tar.gz → 2.5.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/CHANGELOG.md +18 -0
- {slack_markdown_parser-2.4.4/slack_markdown_parser.egg-info → slack_markdown_parser-2.5.1}/PKG-INFO +10 -7
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/README-ja.md +9 -6
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/README.md +9 -6
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/docs/spec-ja.md +29 -5
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/docs/spec.md +29 -5
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/pyproject.toml +1 -1
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/slack_markdown_parser/__init__.py +1 -1
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/slack_markdown_parser/converter.py +574 -77
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1/slack_markdown_parser.egg-info}/PKG-INFO +10 -7
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/LICENSE +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/MANIFEST.in +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/setup.cfg +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/slack_markdown_parser/py.typed +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/slack_markdown_parser.egg-info/SOURCES.txt +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/slack_markdown_parser.egg-info/dependency_links.txt +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/slack_markdown_parser.egg-info/requires.txt +0 -0
- {slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/slack_markdown_parser.egg-info/top_level.txt +0 -0
{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/CHANGELOG.md
@@ -6,6 +6,24 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
 
 ## [Unreleased]
 
+## [2.5.1] - 2026-06-11
+
+### Fixed
+
+- Stopped a lone `<` in a table cell from swallowing later cell separators. The cell splitter tracked angle brackets with a stateful flag that any `<` turned on and only a `>` turned off, so a bare `<` with no closing `>` on the same line — a comparison like `x < y` or a threshold like `< 100ms`, both common in LLM-generated tables — silently merged the remaining cells into one, shifted the columns, and filled the lost trailing cell with `-`. Pipes are now protected only inside valid Slack angle tokens (links, mentions, `<!date^...>`), which are consumed whole; a bare `<` stays literal. The heading+table-row splitter shares the same scan, so a `# Heading |a|b|` line now also splits when the heading contains a lone `<`, and standalone `parse_markdown_table` / `normalize_markdown_tables` now match the sanitized pipeline's behavior.
+
+## [2.5.0] - 2026-06-11
+
+### Added
+
+- Added automatic size splitting so long or heading-dense LLM output no longer fails `chat.postMessage` outright. Three Slack-side hard limits were measured against a real workspace on 2026-06-11 and are now enforced at conversion time: a `markdown` block's `text` accepts exactly 12,000 characters (`msg_too_long` beyond that); Slack expands `markdown` blocks server-side and enforces "no more than 50 items" per message on the expanded result (`invalid_blocks`), where each heading and each thematic break is one item while paragraph/list/quote/fence runs between them merge into one; and one message's blocks may carry at most 13,200 characters of text in total across block types (`msg_blocks_too_long`). Oversized regions are split preferring paragraph boundaries, then line and word boundaries, with a hard cut as a last resort for space-less CJK; a cut inside an unclosed fence re-opens the fence in the continuation block so both halves keep rendering as code. Pieces are re-checked after ZWSP/NBSP formatting and re-split with shrinking budgets when they still overflow. `convert_markdown_to_slack_messages` packs blocks under all three budgets in addition to the existing one-table-per-message rule; documents already within every limit are returned unchanged. Note that the same input can now produce more blocks and more messages than 2.4.x when it previously exceeded Slack's limits (which used to fail delivery entirely).
+
+### Fixed
+
+- Stopped corrupting code samples during sanitization. `decode_html_entities` and the angle-token neutralization inside `sanitize_slack_text` ran over the whole text, so a fenced code block or inline code span containing a tag such as `<div>` or an HTML entity such as `&amp;` reached Slack neutralized or decoded, even though Slack renders code content literally. Both passes now skip fenced code blocks and inline code spans; ANSI/control-character removal still applies everywhere. For this purpose a code span is recognized within a single line only and closes only on a backtick run of equal length (CommonMark pairing), so a stray unpaired backtick stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`<foo `bar` baz>`) is still neutralized as a whole while the span content stays verbatim.
+- Stopped crafted input from colliding with internal placeholders. Input carrying this library's reserved in-band marker code points (`U+2063`, `U+FFF0`–`U+FFF3`, e.g. a literal `code0` sequence) could crash conversion with `KeyError` or get substituted with another code span's content. The markers are now stripped during sanitization and at direct-call entry points of the placeholder machinery, and placeholder restoration passes unknown sequences through instead of raising.
+- Consolidated the three duplicated fenced-code tracking loops onto a single `_iter_fence_states` helper so fence semantics cannot drift between passes again — that drift is exactly how the sanitize corruption happened.
+
 ## [2.4.4] - 2026-06-10
 
 ### Fixed
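The 2.5.0 entry attributes the sanitize corruption to three fence-tracking loops that drifted apart, and consolidates them onto `_iter_fence_states`. That helper's real signature is not part of this diff, so the block below is only a minimal sketch of what a shared fence-state iterator could look like, assuming CommonMark-style fences (three or more backticks or tildes, closed by a run of the same character that is at least as long). The name `iter_fence_states` is hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical sketch; the library's own _iter_fence_states may differ.
# Up to three spaces of indentation, then a run of 3+ backticks or tildes.
_FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def iter_fence_states(lines: Iterable[str]) -> Iterator[Tuple[str, bool]]:
    """Yield (line, in_fence) pairs; fence delimiter lines count as in_fence."""
    opener: Optional[str] = None  # backtick/tilde run that opened the fence
    for line in lines:
        m = _FENCE_RE.match(line)
        if opener is None:
            # A backtick fence's info string may not contain backticks.
            if m and not (m.group(1)[0] == "`" and "`" in m.group(2)):
                opener = m.group(1)
                yield line, True
            else:
                yield line, False
        else:
            # Close on the same character, at least as long, nothing after it.
            if (m and m.group(1)[0] == opener[0]
                    and len(m.group(1)) >= len(opener)
                    and not m.group(2).strip()):
                opener = None
            yield line, True
```

With one iterator like this, every pass that must leave code alone (entity decoding, angle-token neutralization, zero-width-space insertion) branches on the same `in_fence` flag, so the passes cannot disagree about where a fence starts or ends.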
{slack_markdown_parser-2.4.4/slack_markdown_parser.egg-info → slack_markdown_parser-2.5.1}/PKG-INFO
RENAMED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: slack-markdown-parser
-Version: 2.
+Version: 2.5.1
 Summary: Convert LLM Markdown into Slack Block Kit messages
 Author: darkgaldragon
 License-Expression: MIT
@@ -62,7 +62,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
 - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
 - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
 - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
--
+- Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
+- Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
 - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
 - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
 - Support Markdown links and Slack-style links inside table cells
@@ -115,7 +116,8 @@ What this library compensates for:
 - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
 - Keeps table-like rows inside fenced code blocks out of table normalization
 - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
-- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
+- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
+- Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
 
 ## Requirements
 
@@ -151,7 +153,7 @@ for payload in convert_markdown_to_slack_payloads(
     print(payload)
 ```
 
-`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
+`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
 Set `preserve_visual_blank_lines=True` when you want the parser to compensate
 for Slack's currently tight paragraph spacing inside `markdown` blocks.
 The blank-line workaround is intentionally narrow: it skips table segments and
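The README now says messages are split around tables and around Slack's size limits. A minimal sketch of that packing step, assuming each block already carries its text and an estimated expansion-item count, is a greedy loop that starts a new message whenever adding the next block would break any budget. `Block`, `pack_messages`, and the constants are hypothetical names. `MAX_ITEMS` and `MAX_TEXT` are the measured limits quoted in this diff; `MAX_BLOCKS` is an assumed value for the per-message block-count limit, which the diff mentions without a number.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of budget-based packing, not the library's implementation.
MAX_ITEMS = 50       # measured: expansion items per message (invalid_blocks)
MAX_TEXT = 13_200    # measured: total block text per message (msg_blocks_too_long)
MAX_BLOCKS = 50      # assumed value for the per-message block-count limit


@dataclass
class Block:
    kind: str   # e.g. "markdown", "table", "divider"
    text: str   # text that counts toward the per-message total
    items: int  # estimated expansion items (non-markdown blocks count as 1)


def pack_messages(blocks: List[Block]) -> List[List[Block]]:
    """Greedily group blocks into messages that respect every budget.

    Assumes each block was already split so it fits the budgets on its own;
    an oversized block simply ends up alone in its own message.
    """
    messages: List[List[Block]] = []
    current: List[Block] = []
    items = chars = tables = 0
    for block in blocks:
        is_table = block.kind == "table"
        overflow = (
            items + block.items > MAX_ITEMS
            or chars + len(block.text) > MAX_TEXT
            or len(current) >= MAX_BLOCKS
            or (is_table and tables > 0)   # one table per message
        )
        if current and overflow:
            messages.append(current)
            current, items, chars, tables = [], 0, 0, 0
        current.append(block)
        items += block.items
        chars += len(block.text)
        tables += 1 if is_table else 0
    if current:
        messages.append(current)
    return messages
```

A greedy pass keeps the original block order. A document that already fits every budget comes back as a single message, which matches the changelog's note that such documents are returned unchanged.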
@@ -205,7 +207,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
 
 | Function | Description |
 |---|---|
-| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
+| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
 | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
 | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
 | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -225,9 +227,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
 | Function | Description |
 |---|---|
 | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
+| `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
 | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
-| `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
-| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
+| `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
+| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
 | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
 
 ### Lower-level exported helpers
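The helper table now lists `normalize_underscore_emphasis`, which rewrites `_..._` / `__...__` into asterisk emphasis. The library's exact boundary rules live in its "Underscore emphasis normalization rules" section and are not shown here. The block below is only a simplified regex sketch, assuming that intraword underscores (as in `snake_case`) must be left alone. It does not protect code spans, and the function name is hypothetical.

```python
import re

# Simplified, hypothetical sketch; not the library's rules.
# "__x__" is handled before "_x_" so a strong run is not read as nested italics.
# (?<!\w) / (?!\w) reject intraword underscores such as snake_case_names.
_STRONG = re.compile(r"(?<!\w)__(?=\S)(.+?)(?<=\S)__(?!\w)")
_EM = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")


def normalize_underscore_emphasis_sketch(text: str) -> str:
    """Rewrite _x_ as *x* and __x__ as **x** (no code-span protection)."""
    text = _STRONG.sub(r"**\1**", text)
    return _EM.sub(r"*\1*", text)
```

For example, `"__bold__ and _it_"` becomes `"**bold** and *it*"`, while `snake_case_name` is left untouched.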
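{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/README-ja.md
@@ -32,7 +32,8 @@ For constructs that Slack's `markdown` block itself does not support, the old
 - Convert standalone Markdown constructs that can be identified safely into `image` / `divider` / `rich_text` blocks
 - Repair breakage common in LLM-generated tables (missing outer pipes, missing separator rows, mismatched column counts, empty cells)
 - Automatically split messages when needed to satisfy Slack's "one table per message" constraint and per-message block-count limit
--
+- Split long or heading-dense output into multiple `markdown` blocks and messages, preferring paragraph boundaries, so it fits all of Slack's measured hard limits: 12,000 characters per `markdown` block (`msg_too_long`), 50 expansion items per message with each heading and divider counted as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
+- Remove ANSI escapes, control characters, and the library's reserved internal marker characters, and neutralize invalid Slack angle-bracket tokens in prose (the contents of code fences and inline code are kept verbatim)
 - Outside fenced code blocks, insert zero-width spaces around formatting markers to reduce rendering breakage
 - In dense Japanese, Chinese, and Korean text, add visible spaces to stabilize some cases where formatting that contains inline code breaks
 - Recognize Markdown links and Slack-style links inside table cells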
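@@ -74,7 +75,8 @@ What remains as Slack-side constraints:
 - Convert standalone Markdown constructs with unambiguous meaning into native Slack Block Kit blocks instead of relying on raw `markdown` rendering
 - Exclude table-like rows inside fenced code from table processing
 - Optionally replace internal blank lines with helper lines so paragraph breaks are easier to see
-- Raw HTML-like tags and other `<...>` forms that are invalid as Slack special syntax
+- Neutralize `<...>` forms that are invalid as Slack special syntax, such as raw HTML-like tags, in prose (code fences and inline code stay verbatim)
+- Split output that exceeds Slack's measured per-block character limit, per-message expansion-item limit, or per-message total-text limit, so the whole `chat.postMessage` call does not fail
 
 ## Prerequisites
 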
@@ -110,7 +112,7 @@ for payload in convert_markdown_to_slack_payloads(
     print(payload)
 ```
 
-`convert_markdown_to_slack_messages`
+`convert_markdown_to_slack_messages` automatically splits output into multiple messages not only when the input contains multiple tables, but also when long or heading-dense content would exceed Slack's block and message size limits.
 If paragraph spacing is extremely tight in Slack Web's new `markdown` rendering, use `preserve_visual_blank_lines=True` to make only the internal blank lines easier to see.
 
 ## Input/output examples
@@ -160,7 +162,7 @@ QA | ~~On hold~~ | Team C
 
 | Function | Description |
 |---|---|
-| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Markdown
+| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) → list[list[dict]]` | Convert Markdown into a list of messages already split along tables and Slack's measured size limits |
 | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Convert into Slack-ready send data containing `blocks` and preview `text` |
 | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) → list[dict]` | Convert Markdown into a list of Block Kit blocks |
 | `build_fallback_text_from_blocks(blocks) → str` | Generate preview text for `chat.postMessage.text` |
@@ -175,9 +177,10 @@ QA | ~~On hold~~ | Team C
 | Function | Description |
 |---|---|
 | `normalize_markdown_tables(markdown_text) → str` | Normalize table syntax (pipe completion, separator-row generation, column-count adjustment) |
+| `normalize_underscore_emphasis(text) → str` | Convert `_..._` / `__...__` underscore emphasis into Slack-compatible asterisk emphasis |
 | `add_zero_width_spaces_to_markdown(text) → str` | Insert zero-width spaces around formatting markers (fenced code blocks are excluded) |
-| `decode_html_entities(text) → str` | HTML
-| `sanitize_slack_text(text) → str` | ANSI /
+| `decode_html_entities(text) → str` | Decode HTML entities in prose (code regions stay verbatim) |
+| `sanitize_slack_text(text) → str` | Remove ANSI escapes, control characters, and internal marker characters, and neutralize invalid Slack angle-bracket tokens outside code regions |
 | `strip_zero_width_spaces(text) → str` | Remove zero-width spaces (U+200B) and BOM (U+FEFF) (join-control characters such as ZWJ are kept) |
 
 ## Specification
{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/README.md
@@ -32,7 +32,8 @@ If Slack itself does not support a construct in `markdown` blocks, this library
 - Promote safe standalone Markdown constructs into richer Block Kit blocks: `image`, `divider`, and `rich_text`
 - Repair common LLM table issues such as missing outer pipes, missing separator rows, mismatched column counts, and empty cells
 - Split output into multiple Slack messages when needed to satisfy Slack's "one table per message" and per-message block-count constraints
--
+- Split oversized Markdown text into multiple `markdown` blocks and messages, preferring paragraph boundaries, so output fits Slack's measured hard limits — 12,000 characters per `markdown` block (`msg_too_long`), 50 server-side expansion items per message where each heading or divider counts as one item (`invalid_blocks`), and 13,200 characters of total block text per message (`msg_blocks_too_long`)
+- Remove ANSI/control characters and reserved internal marker code points, and neutralize invalid Slack angle-bracket tokens in prose while keeping fenced code blocks and inline code spans verbatim
 - Add zero-width spaces around inline formatting markers to reduce rendering issues outside fenced code blocks, while preserving English-like punctuation-only boundaries that Slack already renders reliably
 - Add visible spaces for a small set of nested inline-code cases in dense Japanese, Chinese, and Korean text when zero-width spaces alone are not enough
 - Support Markdown links and Slack-style links inside table cells
@@ -85,7 +86,8 @@ What this library compensates for:
 - Converts unambiguous standalone Markdown constructs into native Block Kit blocks when that is safer than relying on raw `markdown` rendering
 - Keeps table-like rows inside fenced code blocks out of table normalization
 - Optionally turns internal blank lines into placeholder lines that keep paragraphs visibly separated in Slack `markdown` blocks
-- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags
+- Neutralizes invalid Slack angle-bracket tokens such as raw HTML-like tags in prose, while code regions keep their content verbatim
+- Splits oversized output across Slack's measured per-block character limit and per-message expansion-item and total-text limits, instead of letting `chat.postMessage` fail outright
 
 ## Requirements
 
@@ -121,7 +123,7 @@ for payload in convert_markdown_to_slack_payloads(
     print(payload)
 ```
 
-`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables.
+`convert_markdown_to_slack_messages` automatically splits output into multiple messages when the input contains multiple tables, and also when long or heading-dense content would exceed Slack's per-block and per-message size limits.
 Set `preserve_visual_blank_lines=True` when you want the parser to compensate
 for Slack's currently tight paragraph spacing inside `markdown` blocks.
 The blank-line workaround is intentionally narrow: it skips table segments and
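The `preserve_visual_blank_lines=True` option described here replaces internal blank lines with lines that contain only a non-breaking space; the spec words this as blank lines sandwiched between visible lines. A minimal sketch of that rule for a single non-table text region follows. It assumes every internal blank line is replaced individually and that the region contains no fenced code. The function name is hypothetical.

```python
# Hypothetical sketch of the blank-line workaround for one non-table region.
NBSP = "\u00a0"


def preserve_blank_lines_sketch(region: str) -> str:
    """Replace blank lines strictly between the first and last visible line."""
    lines = region.split("\n")
    visible = [i for i, line in enumerate(lines) if line.strip()]
    if not visible:
        return region
    first, last = visible[0], visible[-1]
    out = []
    for i, line in enumerate(lines):
        if first < i < last and not line.strip():
            out.append(NBSP)  # a line holding only a non-breaking space
        else:
            out.append(line)  # leading/trailing blank lines stay untouched
    return "\n".join(out)
```

Leading and trailing blank lines are not "internal", so this sketch leaves them alone.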
@@ -175,7 +177,7 @@ Example Slack bot rendering (`markdown` + `table` blocks):
 
 | Function | Description |
 |---|---|
-| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks. |
+| `convert_markdown_to_slack_messages(markdown_text, *, preserve_visual_blank_lines=False) -> list[list[dict]]` | Convert Markdown into Slack messages already split around table blocks and Slack's measured size limits. |
 | `convert_markdown_to_slack_payloads(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into Slack-ready request data with both `blocks` and preview `text`. |
 | `convert_markdown_to_slack_blocks(markdown_text, *, preserve_visual_blank_lines=False) -> list[dict]` | Convert Markdown into a flat Block Kit block list. |
 | `build_fallback_text_from_blocks(blocks) -> str` | Build preview text suitable for `chat.postMessage.text`. |
@@ -195,9 +197,10 @@ Slack Markdown rendering or keep list formatting open in some clients.
 | Function | Description |
 |---|---|
 | `normalize_markdown_tables(markdown_text) -> str` | Normalize Markdown table syntax before conversion. |
+| `normalize_underscore_emphasis(text) -> str` | Convert `_..._` / `__...__` underscore emphasis into Slack-friendly asterisk emphasis. |
 | `add_zero_width_spaces_to_markdown(text) -> str` | Insert zero-width spaces around formatting tokens where Slack needs stronger boundaries. |
-| `decode_html_entities(text) -> str` | Decode HTML entities before parsing. |
-| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and neutralize invalid Slack angle-bracket tokens. |
+| `decode_html_entities(text) -> str` | Decode HTML entities in prose before parsing; fenced code and inline code stay verbatim. |
+| `sanitize_slack_text(text) -> str` | Remove ANSI/control noise and reserved marker code points, and neutralize invalid Slack angle-bracket tokens outside code regions. |
 | `strip_zero_width_spaces(text) -> str` | Remove zero-width spaces (`U+200B`) and BOM (`U+FEFF`) while preserving join-control characters such as ZWJ. |
 
 ### Lower-level exported helpers
{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/docs/spec-ja.md
@@ -10,7 +10,7 @@
 ## Output
 
 - Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, `divider`)
--
+- For input containing multiple tables, many promoted blocks, or long text, a list of messages that satisfies the "one table per message" rule, Slack's per-message block-count limit, and the measured size limits described in "markdown block size splitting"
 
 ## Design goals
 
@@ -23,8 +23,8 @@ Readability on Slack becomes more important than strict Markdown fidelity
 
 Processing order of `convert_markdown_to_slack_blocks`:
 
-1. Decode HTML entities, `&gt;`, `&amp;`
-2. Slack
+1. Decode HTML entities in prose, turning `&gt;`, `&amp;`, and similar entities back into their original characters (the contents of code fences and inline code are not decoded)
+2. Clean text for Slack: remove ANSI escapes, control characters, and the library's reserved internal marker characters everywhere, and neutralize invalid Slack angle-bracket tokens only outside code fences and inline code
 3. Normalize underscore emphasis, converting `_..._` / `__...__` into Slack-compatible `*...*` / `**...**`
 4. Normalize bare URLs into the `<https://...>` form, which renders more reliably in Slack
 5. Repair malformed tables using the rules described below
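Step 1 in this hunk now decodes HTML entities only in prose. A minimal sketch of that scoping uses Python's standard `html.unescape` on the stretches of each line that lie outside fenced code and outside single-line inline code spans. It assumes CommonMark-style fences and the equal-length backtick pairing described in the cleanup rules. The function name is hypothetical and this is not the library's `decode_html_entities`.

```python
import html
import re

# Hypothetical sketch; the library's decode_html_entities may differ.
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
# Single-line code span closed by a backtick run of exactly the opener's length.
_CODE_SPAN = re.compile(r"(?<!`)(`+)(?!`)[^\n]*?(?<!`)\1(?!`)")


def decode_entities_outside_code(text: str) -> str:
    out = []
    fence = None  # run that opened the current fence, if any
    for line in text.split("\n"):
        m = _FENCE.match(line)
        if fence is not None:
            out.append(line)  # fenced code stays verbatim
            if (m and m.group(1)[0] == fence[0] and len(m.group(1)) >= len(fence)
                    and not line[m.end():].strip()):
                fence = None
            continue
        if m:
            fence = m.group(1)
            out.append(line)
            continue
        # Prose line: unescape only the stretches between inline code spans.
        pieces, pos = [], 0
        for span in _CODE_SPAN.finditer(line):
            pieces.append(html.unescape(line[pos:span.start()]))
            pieces.append(span.group(0))
            pos = span.end()
        pieces.append(html.unescape(line[pos:]))
        out.append("".join(pieces))
    return "\n".join(out)
```

For example, `a &amp; b `&amp;`` decodes to `a & b `&amp;``: the prose entity is decoded while the one inside backticks reaches Slack as written.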
@@ -33,8 +33,9 @@ Readability on Slack becomes more important than strict Markdown fidelity
 - Table regions: parse inline cell styling and generate a `table` block; if conversion fails, fall back to a `markdown` block
 - Non-table regions: first convert safely identifiable standalone Markdown constructs into rich blocks, then add zero-width spaces to the remaining text where needed and generate `markdown` blocks
 - If `preserve_visual_blank_lines=True`, replace internal blank lines in the remaining `markdown` blocks with "lines containing only a non-breaking space" before building the `markdown` blocks
+- A region whose formatted text would exceed Slack's `markdown` block limit (12,000 characters) is split into multiple `markdown` blocks using the rules in "markdown block size splitting" below
 
-`convert_markdown_to_slack_messages`
+`convert_markdown_to_slack_messages` splits the above result according to the "one table per message" constraint, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "markdown block size splitting".
 `convert_markdown_to_slack_payloads` returns send data consisting of the same split result plus preview text for `chat.postMessage.text`.
 
 ## Slack behavior observed in measurements
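The split rule referenced in this hunk prefers paragraph boundaries, then line boundaries, then word boundaries, and hard-cuts only when no boundary exists. A minimal recursive sketch of that preference order follows, assuming a plain character budget. It ignores the fence awareness and ZWSP/NBSP re-checks that the spec also describes, and `split_text` / `_pack` are hypothetical names. Blank lines used as a split boundary are dropped, as the spec says, because the resulting blocks are already visually separate.

```python
import re
from typing import List

# Hypothetical sketch of the boundary preference order; not the library's code.
# Boundary patterns in order of preference, with the separator used to
# re-join pieces that still fit: paragraphs, then lines, then words.
_LEVELS = ((r"\n(?:[ \t]*\n)+", "\n\n"), (r"\n", "\n"), (r" ", " "))


def _pack(parts: List[str], sep: str, budget: int) -> List[str]:
    """Greedily join consecutive parts with sep while the result fits."""
    pieces: List[str] = []
    current = ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= budget:
            current = candidate
        else:
            if current:
                pieces.append(current)
            current = part
    if current:
        pieces.append(current)
    return pieces


def split_text(text: str, budget: int) -> List[str]:
    if len(text) <= budget:
        return [text]
    for pattern, sep in _LEVELS:
        parts = [p for p in re.split(pattern, text) if p]
        if len(parts) > 1:
            # Split any still-oversized part at a finer level, then re-pack.
            atoms = [a for p in parts for a in split_text(p, budget)]
            return _pack(atoms, sep, budget)
    # No boundary at all (e.g. dense CJK text): hard cut as a last resort.
    return [text[i:i + budget] for i in range(0, len(text), budget)]
```

Because the greedy pack at each level is maximal, pieces produced by a finer split never get rejoined at a coarser level, so every returned piece stays within `budget`.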
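@@ -83,7 +84,7 @@ On 2026-03-06, Slack's official documentation for the `markdown` block
 - A list is promoted only when it starts at the beginning of a text region or right after a blank line, every consecutive non-blank line is a list item, it does not depend on ambiguous 1–3-space nesting indentation or Markdown backslash escapes, and no indented continuation paragraph immediately follows it
 - Exclude table-like rows inside fenced code from table parsing
 - Optionally replace internal blank lines with helper lines that make paragraph breaks easier to see
-- `<...>` forms that are invalid as Slack special syntax, such as `<foo>` or raw HTML-like tags
+- Neutralize `<...>` forms that are invalid as Slack special syntax, such as `<foo>` or raw HTML-like tags, in prose (the contents of code fences and inline code are kept verbatim)
 
 ## Slack text cleanup rules
 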
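@@ -91,10 +92,13 @@ On 2026-03-06, Slack's official documentation for the `markdown` block
 
 - Remove ANSI escapes
 - Remove general control characters
+- Remove the marker characters the library reserves for internal placeholders (`U+2063`, `U+FFF0`–`U+FFF3`) so input cannot collide with the internal machinery
 - Keep valid Slack angle-bracket tokens
 - Examples: links, mentions, channel references, `<!here>`, `<!subteam^...>`, `<!date^...>`
 - Tokens such as `<foo>` that cannot be interpreted as Slack special syntax are neutralized by converting them to `＜foo＞`
 - This includes raw HTML-like tags such as `<div>` and `<span>`
+- Angle-bracket token neutralization applies only outside code fences and inline code, so a code sample such as `` `<div>` `` reaches Slack verbatim. Removal of ANSI escapes, control characters, and marker characters applies everywhere, because these characters have no legitimate use as displayed content
+- For this purpose, an inline code span is recognized only within a single line and closes only on a backtick run of the same length as the opener. An unpaired stray backtick stays literal and does not prevent sanitization of later lines. An invalid angle-bracket token that spans a code span (`<foo `bar` baz>`) is neutralized as a whole while the span content stays verbatim
 
 ## Underscore emphasis normalization rules
 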
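The new marker rule exists because input containing the library's reserved code points could otherwise look like an internal placeholder. The changelog reports that this either raised `KeyError` or substituted another code span's content. A minimal sketch of both defenses follows: strip the reserved code points from untrusted input, and make restoration pass unknown placeholders through instead of raising. The placeholder shape (`U+FFF0`, `code`, index, `U+FFF1`) is a hypothetical format chosen for illustration, and both function names are hypothetical.

```python
import re
from typing import Dict

# Hypothetical sketch; the real placeholder format is not shown in this diff.
# Reserved in-band marker code points named in the changelog and spec.
_MARKERS = re.compile(r"[\u2063\ufff0-\ufff3]")
# Hypothetical placeholder shape: U+FFF0 "code" <index> U+FFF1.
_PLACEHOLDER = re.compile(r"\ufff0code(\d+)\ufff1")


def strip_reserved_markers(text: str) -> str:
    """Drop marker characters from untrusted input before any placeholder pass."""
    return _MARKERS.sub("", text)


def restore_placeholders(text: str, saved: Dict[int, str]) -> str:
    """Swap placeholders back, passing unknown indices through unchanged."""
    def _swap(m) -> str:
        # dict.get with the match as default: no KeyError on unknown indices.
        return saved.get(int(m.group(1)), m.group(0))
    return _PLACEHOLDER.sub(_swap, text)
```

Stripping first means a literal placeholder-like sequence in user text degrades to plain `code0`. The pass-through default is a second line of defense for direct calls that skip sanitization.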
@@ -134,6 +138,7 @@ LLMs omit outer pipes, drop separator rows, mismatch column counts, and
 - Pipes inside Slack links `<url|text>` are not treated as cell separators
 - Pipes inside inline code `` `...` `` are not treated as cell separators either
 - An escaped pipe `\|` is not treated as a cell separator; the backslash is removed at display time and it is output as `|`
+- Only valid Slack angle-bracket tokens (links, mentions, `<!date^...>`) open a pipe-protected region. A bare `<`, such as the comparison operator in `x < y` or a threshold like `< 100ms`, is treated as literal and does not swallow later cell separators
 
 ## Table cell styling
 
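The 2.5.1 fix replaces a stateful "inside `<`" flag with a scan that protects pipes only inside valid Slack tokens, which are consumed whole. A minimal sketch of such a cell splitter follows. It also handles the other protection rules listed in this hunk: inline code paired by equal-length backtick runs, and escaped `\|`. The token regex is a rough stand-in for Slack's grammar, not the library's pattern, and `split_table_row` is a hypothetical name.

```python
import re
from typing import List

# Hypothetical sketch of a stateless cell splitter; not the library's code.
# Rough stand-in for "valid Slack angle tokens": links, user/channel mentions,
# and <!...> specials such as <!here> or <!date^...>.
_SLACK_TOKEN = re.compile(r"<(?:https?://|mailto:|@[UW]|#[CG]|!)[^<>\n]*>")
_BACKTICKS = re.compile(r"`+")


def split_table_row(row: str) -> List[str]:
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|") and not row.endswith("\\|"):
        row = row[:-1]
    cells: List[str] = []
    buf: List[str] = []
    i = 0
    while i < len(row):
        ch = row[i]
        if ch == "\\" and row.startswith("|", i + 1):
            buf.append("|")            # escaped pipe: literal, backslash dropped
            i += 2
        elif ch == "`":
            run = _BACKTICKS.match(row, i).group(0)
            close = next((m for m in _BACKTICKS.finditer(row, i + len(run))
                          if len(m.group(0)) == len(run)), None)
            end = close.end() if close else i + len(run)
            buf.append(row[i:end])     # code span kept whole; lone run stays literal
            i = end
        elif ch == "<" and _SLACK_TOKEN.match(row, i):
            token = _SLACK_TOKEN.match(row, i).group(0)
            buf.append(token)          # valid token consumed whole, pipes included
            i += len(token)
        elif ch == "|":
            cells.append("".join(buf).strip())
            buf = []
            i += 1
        else:
            buf.append(ch)             # a bare "<" lands here and stays literal
            i += 1
    cells.append("".join(buf).strip())
    return cells
```

No state survives past a token, so a bare `<` in `x < y` cannot leak into later cells. A row such as `| x < y | < 100ms | ok |` therefore yields three cells instead of one merged cell.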
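@@ -211,6 +216,25 @@ LLMs omit outer pipes, drop separator rows, mismatch column counts, and
 | U+200C | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
 | U+200D | ZWJ (zero-width joiner) | Required for combined emoji and other character joining |
 
+## markdown block size splitting
+
+On 2026-06-11, three Slack-side hard limits were measured in a real workspace.
+
+- A `markdown` block's `text` is accepted up to exactly 12,000 characters. At 12,001 characters, the entire `chat.postMessage` call is rejected with `msg_too_long`
+- Slack expands `markdown` blocks server-side into a sequence of native blocks and checks "item count ≤ 50" on the expanded result per message (exceeding it returns `invalid_blocks`). Each heading and each divider becomes one item (50 headings are accepted and 51 are rejected; two blocks of 30 headings each are also rejected because the counts add up). Paragraphs, lists, quotes, and code fences collapse into one item per contiguous run between headings/dividers (60 paragraphs separated by blank lines, or 52 fences, are accepted; blank lines alone do not split a run)
+- The total text that one message's blocks can carry is exactly 13,200 characters (1.1 times the single-block limit); 13,201 characters are rejected with `msg_blocks_too_long`. The total is counted across block types (an 11,900-character `markdown` block plus 1,400 characters of `rich_text` was also rejected)
+
+For this reason, long or heading-dense non-table regions are split before sending.
+
+- First try the whole region as one block, and split only if the formatted text exceeds the character limit or the estimated expansion-item count exceeds the budget
+- Because inserting zero-width spaces and helper lines inflates the text, and the item-count estimate is intentionally conservative, raw text is packed toward targets below the limits (11,500 characters / 45 items)
+- Split points prefer paragraph boundaries (runs of blank lines outside code fences). Blank lines used as a boundary are removed, because adjacent blocks are already displayed as visually separate
+- A single paragraph that exceeds the budget is split at line boundaries, and a single line that still exceeds it is split at word boundaries. When there are no spaces (for example, dense CJK text), it is cut at a character position as a last resort
+- When a cut falls inside an unclosed code fence, the original fence opening line is repeated at the start of the continuation block so both parts keep rendering as code
+- Each resulting piece is rechecked after formatting, and if it still exceeds a hard limit it is re-split with a smaller packing budget
+- `convert_markdown_to_slack_messages` additionally groups blocks so that, within one message, the sum of estimated expansion items stays within 50 and the total block text stays within 13,200 characters (each non-`markdown` block counts as one item, and text from every block type counts toward the total)
+- The top-level fallback `text` field is not subject to the character limit (Slack truncates it instead of rejecting the message), so preview text is not split
+
 ## Option to compensate for how blank lines look
 
 Passing `preserve_visual_blank_lines=True` to the main conversion APIs replaces only the blank lines sandwiched between visible lines in non-table regions with "lines containing only a non-breaking space" before generating Slack `markdown` blocks.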
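The measured item rule in the spec-ja.md hunk above counts each heading and divider as one item and merges everything between them into one item, with blank lines not splitting a run. A minimal line-based estimator under those rules follows. It assumes ATX headings and simple thematic breaks only; setext headings and other edge cases are ignored. Like the library's own estimate, it is meant to err on the conservative side. `estimate_items` is a hypothetical name.

```python
import re

# Hypothetical sketch of the expansion-item estimate; not the library's code.
_HEADING = re.compile(r"^ {0,3}#{1,6}(?:[ \t]|$)")
_BREAK = re.compile(r"^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def estimate_items(text: str) -> int:
    """Count headings/dividers as one item each and merge everything between."""
    items = 0
    in_run = False   # inside a paragraph/list/quote/fence run
    fence = None     # run that opened the current code fence, if any
    for line in text.split("\n"):
        if fence is not None:
            m = _FENCE.match(line)
            if m and m.group(1)[0] == fence[0] and len(m.group(1)) >= len(fence):
                fence = None
            continue                 # fence lines belong to the current run
        if _HEADING.match(line) or _BREAK.match(line):
            items += 1               # a heading or divider is its own item
            in_run = False           # and ends the current run
        elif line.strip():
            m = _FENCE.match(line)
            if m:
                fence = m.group(1)   # headings inside the fence are not counted
            if not in_run:
                items += 1           # first content line after a heading/divider
                in_run = True
        # Blank lines fall through: they do not end a run.
    return items
```

This reproduces the measured cases listed above: 60 blank-line-separated paragraphs count as 1 item, and 51 headings count as 51, which is over the 50-item limit.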
{slack_markdown_parser-2.4.4 → slack_markdown_parser-2.5.1}/docs/spec.md
@@ -10,7 +10,7 @@ This document describes how `slack-markdown-parser` converts Markdown into Slack
 ## Output
 
 - Slack Block Kit blocks (`markdown`, `table`, `rich_text`, `image`, and `divider`)
-- When the input contains multiple tables
+- When the input contains multiple tables, many promoted blocks, or long content, a list of messages that satisfies the "one table per message" rule, Slack's per-message block-count limit, and the measured size limits described in "Markdown block size splitting"
 
 ## Design target
 
@@ -23,8 +23,8 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
 
 `convert_markdown_to_slack_blocks` processes text in this order:
 
-1. Decode HTML entities such as `&gt;` and `&amp;`
-2. Clean Slack text
+1. Decode HTML entities such as `&gt;` and `&amp;` in prose, leaving fenced code blocks and inline code spans verbatim
+2. Clean Slack text: remove ANSI/control noise and this library's reserved internal marker code points everywhere, and neutralize invalid Slack angle-bracket tokens outside fenced code blocks and inline code spans
 3. Normalize underscore emphasis by converting `_..._` / `__...__` into Slack-friendly `*...*` / `**...**`
 4. Normalize bare URLs by wrapping them in Slack-friendly `<https://...>` form
 5. Repair malformed tables using the rules below
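Step 4 wraps bare URLs in Slack's `<https://...>` form. A minimal regex sketch follows, assuming three things: a URL already written as `<url>`, `<url|text>`, or a Markdown link target `(url)` must be left alone; trailing sentence punctuation belongs outside the brackets; and URLs containing parentheses or sitting inside inline code are out of scope. `wrap_bare_urls` is a hypothetical name, not the library's API.

```python
import re

# Hypothetical sketch; not the library's bare-URL normalizer.
# A bare http(s) URL that is not already wrapped as <url>, <url|text>, or (url).
_BARE_URL = re.compile(r"(?<![<(|\w])https?://[^\s<>()]+")
_TRAILING = ".,;:!?'\""   # sentence punctuation that should stay outside


def wrap_bare_urls(text: str) -> str:
    def _wrap(m) -> str:
        url = m.group(0)
        core = url.rstrip(_TRAILING)
        return "<" + core + ">" + url[len(core):]
    return _BARE_URL.sub(_wrap, text)
```

The negative lookbehind is what makes the step safe to run after sanitization: URLs that Slack already treats as link tokens are not wrapped a second time.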
@@ -33,8 +33,9 @@ When exact Markdown fidelity conflicts with Slack readability, readable Slack ou
 - Table regions: parse inline cell styling and generate a `table` block. If conversion fails, such as when there are fewer than two candidate lines or the parse result is empty, fall back to a `markdown` block.
 - Non-table regions: first promote safe standalone Markdown constructs into richer Block Kit blocks, then add zero-width spaces where needed and generate `markdown` blocks for the remaining text.
 - If `preserve_visual_blank_lines=True`, replace internal blank lines in remaining `markdown` blocks with lines that contain only a non-breaking space before emitting the `markdown` block.
+- A remaining region whose formatted text would exceed Slack's 12,000-character `markdown` block limit is split into multiple `markdown` blocks using the rules in "Markdown block size splitting" below.
 
-`convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule
+`convert_markdown_to_slack_messages` then splits the resulting block list to satisfy the "one table per message" rule, Slack's per-message block-count limit, and the per-message expansion-item and total-text budgets described in "Markdown block size splitting".
 `convert_markdown_to_slack_payloads` returns the same split blocks plus preview `text` values ready for `chat.postMessage`.
 
 ## How Slack behaved in testing
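When a region is split inside an unclosed code fence, the spec says the continuation block repeats the original opening fence line so both halves keep rendering as code. In CommonMark, an unclosed fence runs to the end of the document, which is presumably why the first half needs no closing line; this diff does not say whether the library adds one anyway. A minimal sketch of the re-open step over already-split pieces follows. `reopen_fences` is a hypothetical name.

```python
import re
from typing import List, Optional

# Hypothetical sketch of the fence re-open step; not the library's code.
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def reopen_fences(pieces: List[str]) -> List[str]:
    """Repeat the opening fence line at the start of a piece that continues
    a fence left open by the previous piece."""
    result: List[str] = []
    open_line: Optional[str] = None  # opener still open after the previous piece
    for piece in pieces:
        if open_line is not None:
            piece = open_line + "\n" + piece
        # Re-scan this piece to see whether a fence is open at its end.
        open_line, open_run = None, None
        for line in piece.split("\n"):
            m = _FENCE.match(line)
            if open_run is None:
                if m:
                    open_line, open_run = line, m.group(1)
            elif (m and m.group(1)[0] == open_run[0]
                  and len(m.group(1)) >= len(open_run)
                  and not line[m.end():].strip()):
                open_line, open_run = None, None
        result.append(piece)
    return result
```

Repeating the full opener line keeps the info string (such as `python`), so syntax highlighting, where Slack applies it, also continues across the split.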
@@ -84,7 +85,7 @@ Slack still controls when those newer features appear and how they look, so trea
 - Slack mention tokens inside a promoted list item are converted to their structured `rich_text` elements — `<@U…>`/`<@W…>` to `user`, `<#C…>`/`<#G…>` to `channel`, `<!subteam^S…>` to `usergroup`, and `<!here>`/`<!channel>`/`<!everyone>` to `broadcast` — since a `rich_text` block does not resolve a raw token. An optional `|label` display suffix is dropped (Slack renders the element from the id).
 - Table-like rows inside fenced code blocks are kept out of table parsing
 - Internal blank lines can optionally be rewritten into placeholder lines so Slack keeps visible paragraph separation
-- Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized
+- Unsupported Slack angle-bracket tokens such as `<foo>` or raw HTML-like tags are neutralized in prose, while fenced code blocks and inline code spans keep them verbatim
 
 ## Slack text cleanup rules
 
@@ -92,9 +93,12 @@ Behavior of `sanitize_slack_text`:

 - Remove ANSI escape sequences
 - Remove general control characters except line breaks and tabs already preserved by the regex
+- Remove this library's reserved internal marker code points (`U+2063`, `U+FFF0`–`U+FFF3`) so input cannot collide with the internal placeholder machinery
 - Keep valid Slack angle-bracket tokens such as links, mentions, channels, special mentions, subteam mentions, and `<!date^...>`
 - Replace unsupported angle-bracket tokens such as `<foo>` with full-width brackets (`＜foo＞`) so Slack does not interpret them as malformed special syntax
 - This also applies to raw HTML-like tags such as `<div>` or `<span>`
+- Angle-token neutralization applies only outside fenced code blocks and inline code spans, so code samples such as `` `<div>` `` reach Slack verbatim; ANSI/control/marker removal applies everywhere because those characters are never legitimate content
+- For this purpose an inline code span is recognized within a single line only, and it closes only on a backtick run of the same length as the opener. A stray unpaired backtick therefore stays literal and cannot suppress sanitization of later lines, and an invalid angle token that spans a code span (`<foo `bar` baz>`) is still neutralized as a whole while the span content stays verbatim

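The cleanup order and the code-aware neutralization in the list above can be sketched as follows. This is not `sanitize_slack_text` itself. The function names are hypothetical, the "valid token" pattern is an assumed approximation, and fence handling is simplified: a fence opens on three or more backticks or tildes and closes on a run of the same character that is at least as long.

```python
import re

ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")  # keeps \t, \n, \r
MARKER_RE = re.compile("[\u2063\ufff0-\ufff3]")  # reserved internal markers
# Assumed shape of valid Slack tokens: links, mentions, special mentions, dates.
VALID_TOKEN_RE = re.compile(
    r"<(?:https?://|mailto:|@[UW]|#[CG]|!(?:here|channel|everyone|subteam\^|date\^))[^<>]*>"
)
ANGLE_RE = re.compile(r"<[^<>\n]*>")
# Single-line code span that closes only on a backtick run of the opener's length.
CODE_SPAN_RE = re.compile(r"(?<!`)(`+)(?!`)(.*?)(?<!`)\1(?!`)")
FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})")


def _neutralize_line(line: str) -> str:
    spans = [m.span() for m in CODE_SPAN_RE.finditer(line)]

    def in_code(i: int) -> bool:
        return any(start <= i < end for start, end in spans)

    chars = list(line)
    for m in ANGLE_RE.finditer(line):
        start, end = m.start(), m.end() - 1  # positions of '<' and '>'
        if in_code(start) or in_code(end) or VALID_TOKEN_RE.fullmatch(m.group()):
            continue
        # Swap only the brackets; any code span inside the token stays verbatim.
        chars[start], chars[end] = "\uff1c", "\uff1e"
    return "".join(chars)


def sanitize(text: str) -> str:
    # ANSI first, because its ESC byte would otherwise be stripped as a control char.
    text = MARKER_RE.sub("", CONTROL_RE.sub("", ANSI_RE.sub("", text)))
    out, fence = [], None
    for line in text.split("\n"):
        m = FENCE_RE.match(line)
        if fence is None and m:
            fence = m.group(1)  # opening fence line, kept verbatim
        elif fence is not None:
            if m and m.group(1)[0] == fence[0] and len(m.group(1)) >= len(fence):
                fence = None  # closing fence line
        else:
            line = _neutralize_line(line)  # prose line
        out.append(line)
    return "\n".join(out)
```

Because code spans are detected per line, an unpaired backtick in one line has no effect on the lines after it.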
 ## Underscore emphasis normalization rules

@@ -134,6 +138,7 @@ LLMs often emit tables with omitted outer pipes, missing separator rows, or inco
 - Treat pipes inside Slack links such as `<url|text>` as literal content, not as cell separators
 - Treat pipes inside inline code `` `...` `` as literal content, not as cell separators
 - Treat escaped pipes `\|` as literal content and remove the backslash in the final displayed text
+- Only valid Slack angle tokens (links, mentions, `<!date^...>`) open a pipe-protected region; a bare `<` — such as the comparison in `x < y` or a threshold like `< 100ms` — stays literal and does not swallow later cell separators

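The pipe rules above can be sketched as a left-to-right row splitter. `split_row` and both patterns are hypothetical, and the valid-token pattern is an assumed approximation of what the package accepts. The last rule is the key one. A `<` protects pipes only when a complete valid token starts at that position, so text such as `x < y` is handled as ordinary characters.

```python
import re

# Assumed approximation of valid Slack angle tokens.
VALID_TOKEN_RE = re.compile(
    r"<(?:https?://|mailto:|@[UW]|#[CG]|!(?:here|channel|everyone|subteam\^|date\^))[^<>\n]*>"
)
# Inline code span that closes on a backtick run of the same length.
CODE_SPAN_RE = re.compile(r"(?<!`)(`+)(?!`)(.*?)(?<!`)\1(?!`)")


def split_row(row: str) -> list[str]:
    """Split one table row into cells, honouring protected regions and \\| escapes."""
    row = row.strip()
    if row.startswith("|"):
        row = row[1:]  # optional outer pipe
    if row.endswith("|") and not row.endswith("\\|"):
        row = row[:-1]
    cells, buf, i = [], [], 0
    while i < len(row):
        ch = row[i]
        if ch == "\\" and row[i + 1 : i + 2] == "|":
            buf.append("|")  # escaped pipe: keep the pipe, drop the backslash
            i += 2
            continue
        m = None
        if ch == "<":
            m = VALID_TOKEN_RE.match(row, i)  # bare '<' yields no match
        elif ch == "`":
            m = CODE_SPAN_RE.match(row, i)
        if m:
            buf.append(m.group())  # copy the protected region whole
            i = m.end()
            continue
        if ch == "|":
            cells.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    cells.append("".join(buf).strip())
    return cells
```

A naive scanner that opened a region at every `<` would read `x < y | < 100ms | ok` as a single cell. This sketch returns three cells instead.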
 ## Table cell styling

@@ -211,6 +216,25 @@ Exception:
 | `U+200C` | ZWNJ (zero-width non-joiner) | Used for word-shape control in languages such as Persian and Hindi |
 | `U+200D` | ZWJ (zero-width joiner) | Required for joined emoji and other grapheme composition |

+## Markdown block size splitting
+
+Three Slack-side hard limits were measured against a real workspace on 2026-06-11:
+
+- A `markdown` block's `text` accepts exactly 12,000 characters; 12,001 fails the whole `chat.postMessage` call with `msg_too_long`.
+- Slack expands `markdown` blocks server-side into native blocks and enforces "no more than 50 items" on the expanded result per message (`invalid_blocks`). Each heading and each thematic break becomes its own item (50 headings were accepted, 51 rejected; 30 headings in each of two blocks were rejected together), while paragraphs, lists, quotes, and fenced code merge into one item per contiguous run between those breakers (60 blank-separated paragraphs and 52 fences were accepted). Blank lines alone do not split a run.
+- One message's blocks may carry at most 13,200 characters of text in total, exactly 1.1 × the single-block limit; 13,201 fails with `msg_blocks_too_long`. The total counts content across block types (an 11,900-character `markdown` block plus a 1,400-character `rich_text` block was rejected).
+
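The measured behaviour suggests a simple estimator. Count each heading and each thematic break as one item, and count every other contiguous run (paragraphs, lists, quotes, fences, with blank lines not breaking the run) as one more. The sketch below is an assumption built from those measurements, not the package's estimator. `estimate_items` and `fits_one_message` are hypothetical names. The sketch ignores setext headings, and it counts characters with Python's `len`, which may not match Slack's own counting.

```python
import re

HEADING_RE = re.compile(r"^ {0,3}#{1,6}(?:\s|$)")
THEMATIC_BREAK_RE = re.compile(r"^ {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*$")
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")

MAX_BLOCK_CHARS = 12_000      # per markdown block
MAX_ITEMS_PER_MESSAGE = 50    # expanded items per message
MAX_MESSAGE_CHARS = 13_200    # summed block text per message


def estimate_items(markdown: str) -> int:
    """Estimate how many items Slack expands one markdown block into."""
    items, in_run, fence = 0, False, None
    for line in markdown.split("\n"):
        m = FENCE_RE.match(line)
        if fence is not None:
            # Inside a fence everything is code; only a matching close ends it.
            if m and m.group(1)[0] == fence[0] and len(m.group(1)) >= len(fence):
                fence = None
            continue
        if not line.strip():
            continue  # blank lines alone do not split a run
        if HEADING_RE.match(line) or THEMATIC_BREAK_RE.match(line):
            items += 1      # each breaker is its own item
            in_run = False  # and ends the current run
            continue
        if m:
            fence = m.group(1)  # an opening fence joins or starts a run
        if not in_run:
            items += 1
            in_run = True
    return items


def fits_one_message(blocks: list[str]) -> bool:
    """Check a list of markdown block texts against the three measured limits."""
    return (
        all(len(b) <= MAX_BLOCK_CHARS for b in blocks)
        and sum(estimate_items(b) for b in blocks) <= MAX_ITEMS_PER_MESSAGE
        and sum(len(b) for b in blocks) <= MAX_MESSAGE_CHARS
    )
```

With these rules, the measured cases come out as observed: 51 headings count as 51 items, two blocks of 30 headings together exceed the budget, and 60 blank-separated paragraphs count as a single item.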
+Long or heading-dense non-table regions are therefore split before delivery:
+
+- The whole region is tried as a single block first; splitting happens only when the formatted text exceeds the character limit or the estimated expansion exceeds the per-message item budget
+- Raw content is packed toward targets below the hard limits (11,500 characters, 45 estimated items), because zero-width-space insertion and placeholder lines inflate the formatted text and the item estimate is intentionally conservative
+- Split points prefer paragraph boundaries (blank-line runs outside fenced code); the blank run at a chosen boundary is dropped, since adjacent Slack blocks already render visually separated
+- A single paragraph longer than the budget is split at line boundaries, and a single overlong line at word boundaries, with a hard cut when no space exists (for example dense CJK text)
+- When a cut lands inside an unclosed fenced code block, the continuation block re-opens the fence with the original delimiter line so both halves keep rendering as code
+- Each piece is re-checked after formatting; when it still exceeds a hard limit, the packing budgets shrink and the piece is split again
+- `convert_markdown_to_slack_messages` additionally packs blocks into messages so that the summed expansion estimate stays within the 50-item budget (non-`markdown` blocks count as one item each) and the summed block text stays within the 13,200-character per-message total
+- The top-level fallback `text` field is not subject to the character limit (Slack truncates it instead of rejecting), so preview text is left whole
+
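The paragraph-first splitting rules above can be sketched as follows. This is a simplified illustration, not the package's splitter. All names are hypothetical, and the default budget is the 11,500-character target. Packed paragraphs are rejoined with a single blank line. The sketch leaves out the post-formatting re-check with shrinking budgets and the item-count budget.

```python
import re

FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def _fence_update(line, fence):
    """Return the fence delimiter still open after `line` (None when closed)."""
    m = FENCE_RE.match(line)
    if not m:
        return fence
    if fence is None:
        return m.group(1)
    if m.group(1)[0] == fence[0] and len(m.group(1)) >= len(fence):
        return None
    return fence


def _paragraphs(text):
    """Group lines into paragraphs; only blank runs outside fences separate them."""
    paras, cur, fence = [], [], None
    for line in text.split("\n"):
        if fence is None and not line.strip():
            if cur:
                paras.append(cur)
                cur = []
            continue  # the blank run at a boundary is dropped
        cur.append(line)
        fence = _fence_update(line, fence)
    if cur:
        paras.append(cur)
    return paras


def _split_line(line, budget):
    """Split an overlong line at word boundaries, hard-cutting when no space exists."""
    parts = []
    while len(line) > budget:
        cut = line.rfind(" ", 0, budget + 1)
        if cut <= 0:
            cut = budget  # e.g. dense CJK text with no spaces
        parts.append(line[:cut])
        line = line[cut:].lstrip(" ")
    parts.append(line)
    return parts


def _split_paragraph(lines, budget):
    """Split one paragraph at line boundaries, re-opening an unclosed fence."""
    chunks, cur, fence, fence_line = [], [], None, None
    for line in lines:
        for part in _split_line(line, budget):
            if cur and len("\n".join(cur + [part])) > budget:
                chunks.append("\n".join(cur))
                cur = [fence_line] if fence is not None else []
            cur.append(part)
        new_fence = _fence_update(line, fence)
        if fence is None and new_fence is not None:
            fence_line = line  # remember the original delimiter line
        fence = new_fence
    if cur:
        chunks.append("\n".join(cur))
    return chunks


def split_region(text, budget=11_500):
    """Pack paragraphs greedily into pieces of at most `budget` characters."""
    pieces, current = [], ""
    for para in _paragraphs(text):
        for chunk in _split_paragraph(para, budget):
            candidate = f"{current}\n\n{chunk}" if current else chunk
            if len(candidate) <= budget:
                current = candidate
            else:
                if current:
                    pieces.append(current)
                current = chunk
    if current:
        pieces.append(current)
    return pieces
```

With a tiny budget, a cut inside a fence shows the re-open rule. `split_region("```py\nl1\nl2\nl3\n```", budget=12)` yields two pieces, and both begin with the original "```py" line.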
 ## Optional blank-line visibility workaround

 When `preserve_visual_blank_lines=True` is passed to the main conversion APIs,