chatgpt-md-converter 0.3.7__tar.gz → 0.3.8__tar.gz

Files changed (21)
  1. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/PKG-INFO +1 -1
  2. chatgpt_md_converter-0.3.8/README.md +147 -0
  3. chatgpt_md_converter-0.3.8/chatgpt_md_converter/__init__.py +4 -0
  4. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/extractors.py +19 -18
  5. chatgpt_md_converter-0.3.8/chatgpt_md_converter/html_splitter.py +239 -0
  6. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter.egg-info/PKG-INFO +1 -1
  7. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter.egg-info/SOURCES.txt +1 -0
  8. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/setup.py +2 -2
  9. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/tests/test_parser.py +20 -0
  10. chatgpt_md_converter-0.3.8/tests/test_splitter.py +103 -0
  11. chatgpt_md_converter-0.3.7/chatgpt_md_converter/__init__.py +0 -3
  12. chatgpt_md_converter-0.3.7/chatgpt_md_converter/html_splitter.py +0 -114
  13. chatgpt_md_converter-0.3.7/tests/test_splitter.py +0 -436
  14. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/LICENSE +0 -0
  15. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/converters.py +0 -0
  16. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/formatters.py +0 -0
  17. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/helpers.py +0 -0
  18. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/telegram_formatter.py +0 -0
  19. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter.egg-info/dependency_links.txt +0 -0
  20. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter.egg-info/top_level.txt +0 -0
  21. {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: chatgpt_md_converter
3
- Version: 0.3.7
3
+ Version: 0.3.8
4
4
  Summary: A package for converting markdown to HTML for chat Telegram bots
5
5
  Home-page: https://github.com/botfather-dev/formatter-chatgpt-telegram
6
6
  Author: Kostiantyn Kriuchkov
@@ -0,0 +1,147 @@
1
+ # ChatGPT Markdown to Telegram HTML Parser
2
+
3
+ ## Overview
4
+
5
+ This project provides a solution for converting Telegram-style Markdown formatted text into HTML markup supported by the Telegram Bot API, specifically tailored for use in ChatGPT bots developed with the OpenAI API. It includes features for handling various Markdown elements and ensures proper tag closure, making it suitable for streaming mode applications.
6
+
7
+ ## Features
8
+
9
+ - Converts Telegram-style Markdown syntax to Telegram-compatible HTML
10
+ - Supports text styling:
11
+ - Bold: `**text**` → `<b>text</b>`
12
+ - Italic: `*text*` or `_text_` → `<i>text</i>`
13
+ - Underline: `__text__` → `<u>text</u>`
14
+ - Strikethrough: `~~text~~` → `<s>text</s>`
15
+ - Spoiler: `||text||` → `<span class="tg-spoiler">text</span>`
16
+ - Inline code: `` `code` `` → `<code>code</code>`
17
+ - Handles nested text styling
18
+ - Converts links: `[text](URL)` → `<a href="URL">text</a>`
19
+ - Processes code blocks with language specification
20
+ - Supports blockquotes:
21
+ - Regular blockquotes: `> text` → `<blockquote>text</blockquote>`
22
+ - Expandable blockquotes: `**> text` → `<blockquote expandable>text</blockquote>`
23
+ - Automatically appends missing closing delimiters for code blocks
24
+ - Escapes HTML special characters to prevent unwanted HTML rendering
25
+
26
+ ## Usage
27
+
28
+ To use the Markdown to Telegram HTML Parser in your ChatGPT bot, integrate the provided Python functions into your bot's processing pipeline. Here is a brief overview of how to incorporate the parser:
29
+
30
+ 1. **Ensure Closing Delimiters**: Automatically appends missing closing delimiters for backticks to ensure proper parsing.
31
+
32
+ 2. **Extract and Convert Code Blocks**: Extracts Markdown code blocks, converts them to HTML `<pre><code>` format, and replaces them with placeholders to prevent formatting within code blocks.
33
+
34
+ 3. **Markdown to HTML Conversion**: Applies various regex substitutions and custom logic to convert supported Markdown formatting to Telegram-compatible HTML tags.
35
+
36
+ 4. **Reinsert Code Blocks**: Reinserts the previously extracted and converted code blocks back into the main text, replacing placeholders with the appropriate HTML content.
37
+
38
+ Simply call the `telegram_format(text: str) -> str` function with your Markdown-formatted text as input to receive the converted HTML output ready for use with the Telegram Bot API.
39
+
40
+ ## Installation
41
+
42
+ ```sh
43
+ pip install chatgpt-md-converter
44
+ ```
45
+
46
+ ## Example
47
+
48
+ ```python
49
+ from chatgpt_md_converter import telegram_format
50
+
51
+ # Basic formatting example
52
+ text = """
53
+ Here is some **bold**, __underline__, and `inline code`.
54
+ This is a ||spoiler text|| and *italic*.
55
+
56
+ Code example:
57
+ print('Hello, world!')
58
+ """
59
+
60
+ # Blockquotes example
61
+ blockquote_text = """
62
+ > Regular blockquote
63
+ > Multiple lines
64
+
65
+ **> Expandable blockquote
66
+ > Hidden by default
67
+ > Multiple lines
68
+ """
69
+
70
+ formatted_text = telegram_format(text)
71
+ formatted_blockquote = telegram_format(blockquote_text)
72
+
73
+ print(formatted_text)
74
+ print(formatted_blockquote)
75
+ ```
76
+
77
+ ### Output:
78
+
79
+ ```
80
+ Here is some <b>bold</b>, <u>underline</u>, and <code>inline code</code>.
81
+ This is a <span class="tg-spoiler">spoiler text</span> and <i>italic</i>.
82
+
83
+ Code example:
84
+ print('Hello, world!')
85
+
86
+ <blockquote>Regular blockquote
87
+ Multiple lines</blockquote>
88
+
89
+ <blockquote expandable>Expandable blockquote
90
+ Hidden by default
91
+ Multiple lines</blockquote>
92
+ ```
93
+
94
+ ## Requirements
95
+
96
+ - Python 3.x
97
+ - No external libraries required (uses built-in `re` module for regex operations)
98
+
99
+ ## Contribution
100
+
101
+ Feel free to contribute to this project by submitting pull requests or opening issues for bugs, feature requests, or improvements.
102
+
103
+ ## Prompting LLMs for Telegram-Specific Formatting
104
+
105
+ > **Note**:
106
+ > Since standard Markdown doesn't include Telegram-specific features like spoilers (`||text||`) and expandable blockquotes (`**> text`), you'll need to explicitly instruct LLMs to use these formats. Here's a suggested prompt addition to include in your system message or initial instructions:
107
+
108
+ ````
109
+ When formatting your responses for Telegram, please use these special formatting conventions:
110
+
111
+ 1. For content that should be hidden as a spoiler (revealed only when users click):
112
+ Use: ||spoiler content here||
113
+ Example: This is visible, but ||this is hidden until clicked||.
114
+
115
+ 2. For lengthy explanations or optional content that should be collapsed:
116
+ Use: **> Expandable section title
117
+
118
+ > Content line 1
119
+ > Content line 2
120
+ > (Each line of the expandable blockquote should start with ">")
121
+
122
+ 3. Continue using standard markdown for other formatting:
123
+ - **bold text**
124
+ - *italic text*
125
+ - __underlined text__
126
+ - ~~strikethrough~~
127
+ - `inline code`
128
+ - ```code blocks```
129
+ - [link text](URL)
130
+
131
+ Apply spoilers for:
132
+
133
+ - Solution reveals
134
+ - Potential plot spoilers
135
+ - Sensitive information
136
+ - Surprising facts
137
+
138
+ Use expandable blockquotes for:
139
+
140
+ - Detailed explanations
141
+ - Long examples
142
+ - Optional reading
143
+ - Technical details
144
+ - Additional context not needed by all users
145
+ ````
146
+
147
+ You can add this prompt to your system message when initializing your ChatGPT interactions to ensure the model properly formats content for optimal display in Telegram.
@@ -0,0 +1,4 @@
1
+ from .telegram_formatter import telegram_format
2
+ from .html_splitter import split_html_for_telegram
3
+
4
+ __all__ = ["telegram_format", "split_html_for_telegram"]
@@ -2,31 +2,32 @@ import re
2
2
 
3
3
 
4
4
  def ensure_closing_delimiters(text: str) -> str:
5
- """Append missing closing backtick delimiters."""
5
+ # Append missing closing backtick delimiters.
6
6
 
7
7
  code_block_re = re.compile(
8
8
  r"(?P<fence>`{3,})(?P<lang>\w+)?\n?[\s\S]*?(?<=\n)?(?P=fence)",
9
9
  flags=re.DOTALL,
10
10
  )
11
11
 
12
- # Remove complete code blocks from consideration so inner backticks
13
- # don't affect delimiter balancing.
14
- cleaned = code_block_re.sub("", text)
15
-
16
- # Detect unclosed fences by tracking opening fence lengths.
17
- stack = []
18
- for line in cleaned.splitlines():
19
- m = re.match(r"^(?P<fence>`{3,})(?P<lang>\w+)?$", line.strip())
20
- if not m:
21
- continue
22
- fence = m.group("fence")
23
- if stack and fence == stack[-1]:
24
- stack.pop()
12
+ # Track an open fence. Once a fence is opened, everything until the same
13
+ # fence is encountered again is treated as plain text. This mimics how
14
+ # Markdown handles fences and allows fence-like strings inside code blocks.
15
+ open_fence = None
16
+ for line in text.splitlines():
17
+ stripped = line.strip()
18
+ if open_fence is None:
19
+ m = re.match(r"^(?P<fence>`{3,})(?P<lang>\w+)?$", stripped)
20
+ if m:
21
+ open_fence = m.group("fence")
25
22
  else:
26
- stack.append(fence)
23
+ if stripped.endswith(open_fence):
24
+ open_fence = None
27
25
 
28
- if stack:
29
- text += "\n" + stack[-1]
26
+ # If a fence was left open, append a matching closing fence.
27
+ if open_fence is not None:
28
+ if not text.endswith("\n"):
29
+ text += "\n"
30
+ text += open_fence
30
31
 
31
32
  cleaned_inline = code_block_re.sub("", text)
32
33
 
@@ -91,4 +92,4 @@ def reinsert_code_blocks(text: str, code_blocks: dict) -> str:
91
92
  """
92
93
  for placeholder, html_code_block in code_blocks.items():
93
94
  text = text.replace(placeholder, html_code_block, 1)
94
- return text
95
+ return text
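The fence-tracking change to `ensure_closing_delimiters` above can be sketched in isolation. This is a minimal, standalone approximation rather than the package's function: the name `close_open_fence` is hypothetical, and the inline-backtick balancing that follows in the real function (the `cleaned_inline` step) is left out.

```python
import re

# Same opening-fence pattern as in the diff: three or more backticks,
# optionally followed by a language tag, alone on the line.
FENCE_OPEN_RE = re.compile(r"^(?P<fence>`{3,})(?P<lang>\w+)?$")


def close_open_fence(text: str) -> str:
    """Append a closing fence if a code fence in text was left open.

    Hypothetical helper for illustration only.
    """
    open_fence = None
    for line in text.splitlines():
        stripped = line.strip()
        if open_fence is None:
            m = FENCE_OPEN_RE.match(stripped)
            if m:
                open_fence = m.group("fence")
        elif stripped.endswith(open_fence):
            # Only a line ending with the opening fence closes the block, so
            # shorter fence-like lines inside it stay plain text.
            open_fence = None

    if open_fence is not None:
        if not text.endswith("\n"):
            text += "\n"
        text += open_fence
    return text


if __name__ == "__main__":
    fence = "`" * 3
    print(close_open_fence(fence + "python\nx = 1"))
```

Tracking a single open fence, instead of the previous stack, means shorter backtick runs nested inside a longer fence never open or close anything. That is the case the new `test_some_new` test exercises.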
@@ -0,0 +1,239 @@
1
+ import re
2
+ from html.parser import HTMLParser
3
+
4
+ MAX_LENGTH = 4096
5
+ MIN_LENGTH = 500
6
+
7
+
8
+ class HTMLTagTracker(HTMLParser):
9
+ def __init__(self):
10
+ super().__init__()
11
+ self.open_tags = []
12
+
13
+ def handle_starttag(self, tag, attrs):
14
+ # saving tags
15
+ if tag in (
16
+ "b", "i", "u", "s", "code", "pre", "a", "span", "blockquote",
17
+ "strong", "em", "ins", "strike", "del", "tg-spoiler", "tg-emoji"
18
+ ):
19
+ self.open_tags.append((tag, attrs))
20
+
21
+ def handle_endtag(self, tag):
22
+ for i in range(len(self.open_tags) - 1, -1, -1):
23
+ if self.open_tags[i][0] == tag:
24
+ del self.open_tags[i]
25
+ break
26
+
27
+ def get_open_tags_html(self):
28
+ parts = []
29
+ for tag, attrs in self.open_tags:
30
+ attr_str = ""
31
+ if attrs:
32
+ attr_str = " " + " ".join(f'{k}="{v}"' for k, v in attrs)
33
+ parts.append(f"<{tag}{attr_str}>")
34
+ return "".join(parts)
35
+
36
+ def get_closing_tags_html(self):
37
+ return "".join(f"</{tag}>" for tag, _ in reversed(self.open_tags))
38
+
39
+
40
+ def split_pre_block(pre_block: str, max_length) -> list[str]:
41
+ """
42
+ Splits a single <pre>...</pre> or <pre><code ...>...</code></pre> block on line
43
+ boundaries, re-wrapping every chunk in the block's original opening and closing tags.
44
+
45
+ Args:
46
+ pre_block (str): A complete <pre> block, optionally with an inner <code> element.
47
+ max_length (int): Maximum length of a chunk, wrapping tags included. Lines are
48
+ never broken, so a single line longer than the limit still produces an
49
+ oversized chunk.
50
+
51
+ Returns:
52
+ list[str]: A list of <pre>-wrapped chunks.
53
+ """
54
+
55
+ # language-aware: <pre><code class="language-python">...</code></pre>
56
+ match = re.match(r"<pre><code(.*?)>(.*)</code></pre>", pre_block, re.DOTALL)
57
+ if match:
58
+ attr, content = match.groups()
59
+ lines = content.splitlines(keepends=True)
60
+ chunks, buf = [], ""
61
+ overhead = len(f"<pre><code{attr}></code></pre>")
62
+ for line in lines:
63
+ if len(buf) + len(line) + overhead > max_length:
64
+ chunks.append(f"<pre><code{attr}>{buf}</code></pre>")
65
+ buf = ""
66
+ buf += line
67
+ if buf:
68
+ chunks.append(f"<pre><code{attr}>{buf}</code></pre>")
69
+ return chunks
70
+ else:
71
+ # regular <pre>...</pre>
72
+ inner = pre_block[5:-6]
73
+ lines = inner.splitlines(keepends=True)
74
+ chunks, buf = [], ""
75
+ overhead = len('<pre></pre>')
76
+ for line in lines:
77
+ if len(buf) + len(line) + overhead > max_length:
78
+ chunks.append(f"<pre>{buf}</pre>")
79
+ buf = ""
80
+ buf += line
81
+ if buf:
82
+ chunks.append(f"<pre>{buf}</pre>")
83
+ return chunks
84
+
85
+
86
+ def _is_only_tags(block: str) -> bool:
87
+ return bool(re.fullmatch(r'(?:\s*<[^>]+>\s*)+', block))
88
+
89
+
90
+ def _effective_length(content: str) -> int:
91
+ tracker = HTMLTagTracker()
92
+ tracker.feed(content)
93
+ return len(tracker.get_open_tags_html()) + len(content) + len(tracker.get_closing_tags_html())
94
+
95
+
96
+ def split_html_for_telegram(text: str, trim_empty_leading_lines: bool = False, max_length: int = MAX_LENGTH) -> list[str]:
97
+ """Split long HTML-formatted text into Telegram-compatible chunks.
98
+
99
+ Parameters
100
+ ----------
101
+ text: str
102
+ Input HTML text.
103
+ trim_empty_leading_lines: bool, optional
104
+ If True, removes leading `\n` symbols from the start of chunks.
105
+ max_length: int, optional
106
+ Maximum allowed length for a single chunk (must be >= ``MIN_LENGTH = 500``).
107
+ Default = 4096 (symbols)
108
+
109
+ Returns
110
+ -------
111
+ list[str]
112
+ List of HTML chunks.
113
+ """
114
+
115
+ if max_length < MIN_LENGTH:
116
+ raise ValueError("max_length should be at least %d" % MIN_LENGTH)
117
+
118
+ pattern = re.compile(r"(<pre>.*?</pre>|<pre><code.*?</code></pre>)", re.DOTALL)
119
+ parts = pattern.split(text)
120
+
121
+ chunks: list[str] = []
122
+ prefix = ""
123
+ current = ""
124
+ whitespace_re = re.compile(r"(\s+)")
125
+ tag_re = re.compile(r"(<[^>]+>)")
126
+
127
+ def finalize():
128
+ nonlocal current, prefix
129
+ tracker = HTMLTagTracker()
130
+ tracker.feed(prefix + current)
131
+ chunk = prefix + current + tracker.get_closing_tags_html()
132
+ chunks.append(chunk)
133
+ prefix = tracker.get_open_tags_html()
134
+ current = ""
135
+
136
+ def append_piece(piece: str):
137
+ nonlocal current, prefix
138
+
139
+ def split_on_whitespace(chunk: str) -> list[str] | None:
140
+ parts = [part for part in whitespace_re.split(chunk) if part]
141
+ if len(parts) <= 1:
142
+ return None
143
+ return parts
144
+
145
+ def split_on_tags(chunk: str) -> list[str] | None:
146
+ parts = [part for part in tag_re.split(chunk) if part]
147
+ if len(parts) <= 1:
148
+ return None
149
+ return parts
150
+
151
+ def fittable_prefix_length(chunk: str) -> int:
152
+ low, high = 1, len(chunk)
153
+ best = 0
154
+ while low <= high:
155
+ mid = (low + high) // 2
156
+ candidate = chunk[:mid]
157
+ if _effective_length(prefix + current + candidate) <= max_length:
158
+ best = mid
159
+ low = mid + 1
160
+ else:
161
+ high = mid - 1
162
+ return best
163
+
164
+ while piece:
165
+ if _effective_length(prefix + current + piece) <= max_length:
166
+ current += piece
167
+ return
168
+
169
+ if len(piece) > max_length:
170
+ if _is_only_tags(piece):
171
+ raise ValueError("block contains only html tags")
172
+ splitted = split_on_whitespace(piece)
173
+ if splitted:
174
+ for part in splitted:
175
+ append_piece(part)
176
+ return
177
+ tag_split = split_on_tags(piece)
178
+ if tag_split:
179
+ for part in tag_split:
180
+ append_piece(part)
181
+ return
182
+ elif current:
183
+ finalize()
184
+ continue
185
+ else:
186
+ splitted = split_on_whitespace(piece)
187
+ if splitted:
188
+ for part in splitted:
189
+ append_piece(part)
190
+ return
191
+ tag_split = split_on_tags(piece)
192
+ if tag_split:
193
+ for part in tag_split:
194
+ append_piece(part)
195
+ return
196
+
197
+ fitted = fittable_prefix_length(piece)
198
+ if fitted == 0:
199
+ if current:
200
+ finalize()
201
+ continue
202
+ raise ValueError("unable to split content within max_length")
203
+
204
+ current += piece[:fitted]
205
+ piece = piece[fitted:]
206
+
207
+ if piece:
208
+ finalize()
209
+
210
+
211
+ for part in parts:
212
+ if not part:
213
+ continue
214
+ if part.startswith("<pre>") or part.startswith("<pre><code"):
215
+ pre_chunks = split_pre_block(part, max_length=max_length)
216
+ for pc in pre_chunks:
217
+ append_piece(pc)
218
+ continue
219
+ blocks = re.split(r"(\n\s*\n|<br\s*/?>|\n)", part)
220
+ for block in blocks:
221
+ if block:
222
+ append_piece(block)
223
+
224
+ if current:
225
+ finalize()
226
+
227
+ merged: list[str] = []
228
+ buf = ""
229
+ for chunk in chunks:
230
+ if len(buf) + len(chunk) <= max_length:
231
+ buf += chunk
232
+ else:
233
+ if buf:
234
+ merged.append(buf)
235
+ buf = chunk.lstrip("\n") if trim_empty_leading_lines and merged else chunk
236
+ if buf:
237
+ merged.append(buf.lstrip("\n") if trim_empty_leading_lines and merged else buf)
238
+
239
+ return merged
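The new `finalize()` step closes whatever tags are still open at a cut point and reopens them at the start of the next chunk. The sketch below illustrates that idea with the standard-library `HTMLParser` only. The names `OpenTagTracker` and `balance_fragment` are hypothetical, and the sketch tracks every tag instead of the Telegram allow-list used in the diff. It also renders valueless attributes such as `expandable` without a value, which the diffed `get_open_tags_html` does not special-case.

```python
from html.parser import HTMLParser


class OpenTagTracker(HTMLParser):
    """Records which tags are still open after feeding an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, attrs))

    def handle_endtag(self, tag):
        # Close the most recently opened tag with this name.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i]
                break


def balance_fragment(fragment: str) -> tuple[str, str]:
    """Return (fragment with closing tags appended, prefix that reopens them)."""
    tracker = OpenTagTracker()
    tracker.feed(fragment)

    # Close in reverse order of opening so nesting stays valid.
    closing = "".join(f"</{tag}>" for tag, _ in reversed(tracker.open_tags))

    reopening = ""
    for tag, attrs in tracker.open_tags:
        # HTMLParser reports valueless attributes with value None.
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{value}"'
            for name, value in attrs
        )
        reopening += f"<{tag}{rendered}>"
    return fragment + closing, reopening


if __name__ == "__main__":
    chunk, prefix = balance_fragment('<b><a href="https://example.com">link text')
    print(chunk)   # the cut fragment with its open tags closed
    print(prefix)  # tags to prepend to the next chunk
```

Because the reopening prefix and the closing tags both count toward the length of a chunk, the diff measures candidates with `_effective_length` rather than plain `len`.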
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: chatgpt_md_converter
3
- Version: 0.3.7
3
+ Version: 0.3.8
4
4
  Summary: A package for converting markdown to HTML for chat Telegram bots
5
5
  Home-page: https://github.com/botfather-dev/formatter-chatgpt-telegram
6
6
  Author: Kostiantyn Kriuchkov
@@ -1,4 +1,5 @@
1
1
  LICENSE
2
+ README.md
2
3
  setup.py
3
4
  chatgpt_md_converter/__init__.py
4
5
  chatgpt_md_converter/converters.py
@@ -2,11 +2,11 @@ from setuptools import setup
2
2
 
3
3
  setup(
4
4
  name="chatgpt_md_converter",
5
- version="0.3.7",
5
+ version="0.3.8",
6
6
  author="Kostiantyn Kriuchkov",
7
7
  author_email="latand666@gmail.com",
8
8
  description="A package for converting markdown to HTML for chat Telegram bots",
9
- long_description=open("README.MD").read(),
9
+ long_description=open("README.md").read(),
10
10
  long_description_content_type="text/markdown",
11
11
  url="https://github.com/botfather-dev/formatter-chatgpt-telegram",
12
12
  classifiers=[
@@ -915,3 +915,23 @@ print("hello world ```"')
915
915
  print(f"Expected was: \n\n{expected_output}\n\n")
916
916
  print(f"output was: \n\n{output}")
917
917
  assert output == expected_output, show_output()
918
+
919
+ def test_some_new():
920
+ input_text = """
921
+ ``````markdown
922
+ `````
923
+ ````python
924
+ print("hello world ```")
925
+ ```
926
+ """ # Markdown code wasn't closed
927
+
928
+ expected_output = """<pre><code class=\"language-markdown\">`````
929
+ ````python
930
+ print("hello world ```")
931
+ ```
932
+ </code></pre>""" # But the output closes it correctly
933
+ output = telegram_format(input_text)
934
+ def show_output():
935
+ print(f"Expected was: \n\n{expected_output}\n\n")
936
+ print(f"output was: \n\n{output}")
937
+ assert output == expected_output, show_output()
@@ -0,0 +1,103 @@
1
+ import re
2
+
3
+ import pytest
4
+
5
+ from chatgpt_md_converter.html_splitter import (MIN_LENGTH,
6
+ split_html_for_telegram)
7
+
8
+ from . import html_examples
9
+
10
+
11
+ def test_html_splitter():
12
+ chunks = split_html_for_telegram(html_examples.input_text)
13
+ valid_chunks = [
14
+ html_examples.valid_chunk_1,
15
+ html_examples.valid_chunk_2,
16
+ html_examples.valid_chunk_3,
17
+ ]
18
+ for index, chunk in enumerate(chunks):
19
+ assert chunk == valid_chunks[index], (
20
+ f"expected: \n\n{valid_chunks[index]} \n\n got: \n\n{chunk}"
21
+ )
22
+
23
+ def test_html_splitter__remove_leading_brakes():
24
+ chunks = split_html_for_telegram(html_examples.input_text, trim_empty_leading_lines=True)
25
+ valid_chunks = [
26
+ html_examples.valid_chunk_1,
27
+ html_examples.valid_chunk_2,
28
+ html_examples.valid_chunk_3_remove_leading_brakes,
29
+ ]
30
+ for index, chunk in enumerate(chunks):
31
+ assert chunk == valid_chunks[index], (
32
+ f"expected: \n\n{valid_chunks[index]} \n\n got: \n\n{chunk}"
33
+ )
34
+
35
+ def test_html_splitter_max_length_550():
36
+ chunks = split_html_for_telegram(
37
+ html_examples.long_code_input, max_length=550, trim_empty_leading_lines=True
38
+ )
39
+
40
+ def load_expected_chunks_550():
41
+ raw = re.split(r"END\n?", html_examples.expected_550)
42
+ chunks = []
43
+ for part in raw:
44
+ if not part.strip():
45
+ continue
46
+ lines = part.splitlines()
47
+ chunks.append("\n".join(lines[1:]))
48
+ return chunks
49
+
50
+ valid_chunks = load_expected_chunks_550()
51
+ for index, chunk in enumerate(chunks):
52
+ assert chunk == valid_chunks[index], (
53
+ f"expected: \n\n{valid_chunks[index]} \n\n got: \n\n{chunk}"
54
+ )
55
+ assert len(chunk) <= 550
56
+
57
+ def test_split_html_respects_max_length_by_words():
58
+ text = "<b>" + "<i>word</i> " * 100 + "</b>"
59
+ chunks = split_html_for_telegram(text, max_length=550)
60
+ assert len(chunks) > 1
61
+ for chunk in chunks:
62
+ assert len(chunk) <= 550
63
+ assert chunk.startswith("<b>")
64
+ assert chunk.endswith("</b>")
65
+ assert chunk.count("<i>") == chunk.count("</i>")
66
+
67
+
68
+ def test_split_html_only_tags_raises():
69
+ text = "<b></b>" * 200
70
+ with pytest.raises(ValueError):
71
+ split_html_for_telegram(text, max_length=600)
72
+
73
+
74
+ def test_split_html_min_length_enforced():
75
+ with pytest.raises(ValueError):
76
+ split_html_for_telegram("hello", max_length=MIN_LENGTH - 1)
77
+
78
+
79
+ def test_split_html_long_word_exceeds_limit():
80
+ text = "a" * 600
81
+ chunks = split_html_for_telegram(text, max_length=550)
82
+ assert chunks == ["a" * 550, "a" * 50]
83
+
84
+
85
+ LONG_TEXT = "<b><i>" + "word " * 96 + "word!" + "</i></b>"
86
+
87
+ SHORT_TEXT = "<u>" + "another " * 9 + "another" + "</u>"
88
+
89
+
90
+ def test_split_html_keeps_newline_without_trim():
91
+ text = LONG_TEXT + "\n\n" + SHORT_TEXT
92
+ chunks = split_html_for_telegram(text, max_length=500, trim_empty_leading_lines=False)
93
+ assert chunks[0] == LONG_TEXT
94
+ assert chunks[1].startswith("\n")
95
+ assert chunks[1].endswith(SHORT_TEXT)
96
+ assert chunks[1].lstrip("\n").startswith("<u>")
97
+ assert chunks[1].lstrip("\n").endswith("</u>")
98
+
99
+
100
+ def test_split_html_trims_leading_newline_on_new_chunk():
101
+ text = LONG_TEXT + "\n\n" + SHORT_TEXT
102
+ chunks = split_html_for_telegram(text, max_length=500, trim_empty_leading_lines=True)
103
+ assert chunks == [LONG_TEXT, SHORT_TEXT]
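The `test_split_html_long_word_exceeds_limit` case above relies on the binary search in `fittable_prefix_length`. Here is a rough standalone sketch of that search, assuming the fit check is monotonic (if a prefix fits, every shorter prefix fits too) and using plain `len` in place of the tag-aware `_effective_length`. The names `longest_fitting_prefix` and `hard_split` are hypothetical.

```python
from typing import Callable


def longest_fitting_prefix(piece: str, fits: Callable[[str], bool]) -> int:
    """Binary-search the length of the longest prefix of piece accepted by fits.

    Returns 0 when not even a one-character prefix fits.
    """
    low, high, best = 1, len(piece), 0
    while low <= high:
        mid = (low + high) // 2
        if fits(piece[:mid]):
            best = mid       # this prefix fits, try a longer one
            low = mid + 1
        else:
            high = mid - 1   # too long, try a shorter one
    return best


def hard_split(text: str, max_length: int) -> list[str]:
    """Cut text with no natural break points into pieces of at most max_length."""
    chunks = []
    while text:
        cut = longest_fitting_prefix(text, lambda s: len(s) <= max_length)
        if cut == 0:
            raise ValueError("unable to split content within max_length")
        chunks.append(text[:cut])
        text = text[cut:]
    return chunks


if __name__ == "__main__":
    print([len(c) for c in hard_split("a" * 600, 550)])
```

In the diff this path is a last resort: whitespace and tag boundaries are tried first, and a raw character cut is used only when neither yields more than one part.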
@@ -1,3 +0,0 @@
1
- from .telegram_formatter import telegram_format
2
-
3
- __all__ = ["telegram_format"]
@@ -1,114 +0,0 @@
1
- import re
2
- from html.parser import HTMLParser
3
-
4
- MAX_LENGTH = 4096
5
-
6
-
7
- class HTMLTagTracker(HTMLParser):
8
- def __init__(self):
9
- super().__init__()
10
- self.open_tags = []
11
-
12
- def handle_starttag(self, tag, attrs):
13
- # saving tags
14
- if tag in ("b", "i", "u", "s", "code", "pre", "a", "span", "blockquote"):
15
- self.open_tags.append((tag, attrs))
16
-
17
- def handle_endtag(self, tag):
18
- for i in range(len(self.open_tags) - 1, -1, -1):
19
- if self.open_tags[i][0] == tag:
20
- del self.open_tags[i]
21
- break
22
-
23
- def get_open_tags_html(self):
24
- parts = []
25
- for tag, attrs in self.open_tags:
26
- attr_str = ""
27
- if attrs:
28
- attr_str = " " + " ".join(f'{k}="{v}"' for k, v in attrs)
29
- parts.append(f"<{tag}{attr_str}>")
30
- return "".join(parts)
31
-
32
- def get_closing_tags_html(self):
33
- return "".join(f"</{tag}>" for tag, _ in reversed(self.open_tags))
34
-
35
-
36
- def split_pre_block(pre_block: str) -> list[str]:
37
- # language-aware: <pre><code class="language-python">...</code></pre>
38
- match = re.match(r"<pre><code(.*?)>(.*)</code></pre>", pre_block, re.DOTALL)
39
- if match:
40
- attr, content = match.groups()
41
- lines = content.splitlines(keepends=True)
42
- chunks, buf = [], ""
43
- for line in lines:
44
- if len(buf) + len(line) + len('<pre><code></code></pre>') > MAX_LENGTH:
45
- chunks.append(f"<pre><code{attr}>{buf}</code></pre>")
46
- buf = ""
47
- buf += line
48
- if buf:
49
- chunks.append(f"<pre><code{attr}>{buf}</code></pre>")
50
- return chunks
51
- else:
52
- # regular <pre>...</pre>
53
- inner = pre_block[5:-6]
54
- lines = inner.splitlines(keepends=True)
55
- chunks, buf = [], ""
56
- for line in lines:
57
- if len(buf) + len(line) + len('<pre></pre>') > MAX_LENGTH:
58
- chunks.append(f"<pre>{buf}</pre>")
59
- buf = ""
60
- buf += line
61
- if buf:
62
- chunks.append(f"<pre>{buf}</pre>")
63
- return chunks
64
-
65
-
66
- def split_html_for_telegram(text: str) -> list[str]:
67
- chunks = []
68
- pattern = re.compile(r"(<pre>.*?</pre>|<pre><code.*?</code></pre>)", re.DOTALL)
69
- parts = pattern.split(text)
70
-
71
- for part in parts:
72
- if not part:
73
- continue
74
- if part.startswith("<pre>") or part.startswith("<pre><code"):
75
- pre_chunks = split_pre_block(part)
76
- chunks.extend(pre_chunks)
77
- else:
78
- # breaking down regular HTML
79
- tracker = HTMLTagTracker()
80
- current = ""
81
- blocks = re.split(r"(\n\s*\n|<br\s*/?>|\n)", part)
82
- for block in blocks:
83
- prospective = current + block
84
- if len(prospective) > MAX_LENGTH:
85
- tracker.feed(current)
86
- open_tags = tracker.get_open_tags_html()
87
- close_tags = tracker.get_closing_tags_html()
88
- chunks.append(open_tags + current + close_tags)
89
- current = block
90
- tracker = HTMLTagTracker()
91
- else:
92
- current = prospective
93
- if current.strip():
94
- tracker.feed(current)
95
- open_tags = tracker.get_open_tags_html()
96
- close_tags = tracker.get_closing_tags_html()
97
- chunks.append(open_tags + current + close_tags)
98
-
99
- # post-unification: combine chunks if they don't exceed the limit in total
100
- merged_chunks = []
101
- buf = ""
102
- for chunk in chunks:
103
- # chunk = chunk.lstrip("\n") # removing leading line breaks
104
-
105
- if len(buf) + len(chunk) <= MAX_LENGTH:
106
- buf += chunk
107
- else:
108
- if buf:
109
- merged_chunks.append(buf)
110
- buf = chunk
111
- if buf:
112
- merged_chunks.append(buf)
113
-
114
- return merged_chunks
@@ -1,436 +0,0 @@
1
- from chatgpt_md_converter.html_splitter import split_html_for_telegram
2
-
3
- input_text ="""
4
- Absolutely! Here’s a Markdown-formatted message exceeding 5,000 characters, exploring <b>The History and Impact of Computer Programming</b>. (You can verify the character count using any online tool.)
5
-
6
- ---
7
-
8
- <b>The History and Impact of Computer Programming</b>
9
-
10
- <i>“The computer was born to solve problems that did not exist before.”</i>
11
- — Bill Gates
12
-
13
- ---
14
-
15
- <b>Table of Contents</b>
16
-
17
- 1. <a href="#introduction">Introduction</a>
18
- 2. <a href="#ancient-beginnings-from-algorithms-to-machines">Ancient Beginnings: From Algorithms to Machines</a>
19
- • <a href="#al-khwarizmi-and-the-algorithm">Al-Khwarizmi and the Algorithm</a>
20
- • <a href="#the-analytical-engine">The Analytical Engine</a>
21
- • <a href="#punch-cards-and-the-jacquard-loom">Punch Cards and the Jacquard Loom</a>
22
- 3. <a href="#20th-century-the-birth-of-modern-programming">20th Century: The Birth of Modern Programming</a>
23
- • <a href="#eniac-and-early-programmers">ENIAC and Early Programmers</a>
24
- • <a href="#assembly-language-and-early-high-level-languages">Assembly Language and Early High-level Languages</a>
25
- • <a href="#cobol-fortran-and-the-expansion">COBOL, FORTRAN, and the Expansion</a>
26
- 4. <a href="#modern-era-languages-paradigms-and-the-internet">Modern Era: Languages, Paradigms, and the Internet</a>
27
- • <a href="#object-oriented-programming">Object-Oriented Programming</a>
28
- • <a href="#internet-and-open-source">Internet and Open Source</a>
29
- • <a href="#mobile-and-cloud-computing">Mobile and Cloud Computing</a>
30
- 5. <a href="#programmings-societal-impact">Programming’s Societal Impact</a>
31
- 6. <a href="#ethics-challenges-and-the-future">Ethics, Challenges, and the Future</a>
32
- 7. <a href="#conclusion">Conclusion</a>
33
- 8. <a href="#useful-resources">Useful Resources</a>
34
-
35
- ---
36
-
37
- <b>Introduction</b>
38
-
39
- Computer programming is the science and art of giving computers instructions to perform specific tasks. Today, it's impossible to imagine a world without software: from banking systems and mobile applications to traffic lights and airplanes, programming is everywhere.
40
-
41
- But how did programming begin, and what has it become today? This document explores the journey of programming, from ancient mathematical roots to the future of artificial intelligence.
42
-
43
- ---
44
-
45
- <b>Ancient Beginnings: From Algorithms to Machines</b>
46
-
47
- <b>Al-Khwarizmi and the Algorithm</b>
48
-
49
- The term "<b>algorithm</b>" (the foundation of programming) comes from Abu Abdullah Muhammad ibn Musa Al-Khwarizmi, a 9th-century Persian mathematician. His works on systematic procedures laid the groundwork for computational thinking.
50
-
51
- <b>The Analytical Engine</b>
52
-
53
- In the 19th century, <b>Charles Babbage</b> designed the Analytical Engine, a mechanical general-purpose computer. Though never built in his lifetime, it could—in theory—read instructions from punched cards.
54
-
55
- <b>Ada Lovelace</b>, Babbage's collaborator, is often called the first computer programmer. She wrote notes describing algorithms (in essence, programs) for the Analytical Engine to compute Bernoulli numbers.
56
-
57
- <blockquote>"That brain of mine is something more than merely mortal; as time will show."
58
- – Ada Lovelace</blockquote>
59
-
60
- <b>Punch Cards and the Jacquard Loom</b>
61
-
62
- The concept of programming a machine with punched cards predates computers. <b>Joseph Marie Jacquard</b> invented a loom in 1804 that used punched cards to control patterns in woven fabric—an early example of machine automation.
63
-
64
- ---
65
-
66
- <b>20th Century: The Birth of Modern Programming</b>
67
-
68
- <b>ENIAC and Early Programmers</b>
69
-
70
- ENIAC (Electronic Numerical Integrator and Computer), completed in 1945, is often cited as the first electronic general-purpose computer.
71
-
72
- Early programming was entirely manual and physically laborious—think patch cables and switches!
73
-
74
- Notably, many of the earliest programmers were women, such as <b>Kathleen McNulty</b>, <b>Jean Jennings</b>, and <b>Grace Hopper</b>.
75
-
76
- <b>Assembly Language and Early High-level Languages</b>
77
-
78
- The problem of complexity led to <b>assembly languages</b>, where mnemonics like <code>MOV</code> and <code>ADD</code> replaced binary codes. Programming became more accessible, but code was still hardware-specific.
79
-
80
- The 1950s saw the creation of:
81
-
82
- • <b>FORTRAN</b> (FORmula TRANslation) for scientific computation
83
- • <b>COBOL</b> (COmmon Business-Oriented Language) for business applications
84
-
85
- <b>Code Example: Hello World in COBOL</b>
86
- <pre><code class="language-cobol">IDENTIFICATION DIVISION.
87
- PROGRAM-ID. HELLO-WORLD.
88
- PROCEDURE DIVISION.
89
- DISPLAY "Hello, World!".
90
- STOP RUN.
91
- </code></pre>
92
-
93
- <b>COBOL, FORTRAN, and the Expansion</b>
94
-
95
- With the advent of high-level languages, programming became less about circuitry and more about solving problems. Standardized languages allowed code to run on multiple machines.
96
-
97
- Other languages soon emerged:
98
-
99
- • <b>LISP</b> (for AI research)
100
- • <b>ALGOL</b> (basis for many future languages)
101
- • <b>BASIC</b> (for beginners and education)
102
-
103
- ---
104
-
105
- <b>Modern Era: Languages, Paradigms, and the Internet</b>
106
-
107
- <b>Object-Oriented Programming</b>
108
-
109
- The 1970s and 1980s introduced <b>object-oriented programming</b> (OOP), where data and behavior are bundled together. The most influential languages here include:
110
-
111
- • <b>Smalltalk</b>: pioneered OOP concepts
112
- • <b>C++</b>: combined OOP with the efficiency of C
113
- • <b>Java</b>: “Write Once, Run Anywhere” with the Java Virtual Machine
114
-
115
- <b>Code Example: Simple Class in Java</b>
116
- <pre><code class="language-java">public class HelloWorld {
117
- public static void main(String[] args) {
118
- System.out.println("Hello, World!");
119
- }
120
- }
121
- </code></pre>
122
-
123
- <b>Internet and Open Source</b>
124
-
125
- The rise of the World Wide Web transformed programming. JavaScript, PHP, and Python became staples for Internet-connected software.
126
-
127
- <b>Open source</b> projects like Linux, Apache, and MySQL changed collaboration forever—developers worldwide could contribute to shared codebases.
128
-
129
- | Year | Technology | Impact |
130
- |------|-------------|-----------------------------------------|
131
- | 1991 | Linux | Free, open-source operating systems |
132
- | 1995 | JavaScript | Interactive web applications |
133
- | 2001 | Wikipedia | Collaborative knowledge base |
134
-
135
- <b>Mobile and Cloud Computing</b>
136
-
137
- Smartphones spawned new languages and frameworks (Swift, Kotlin, React Native).
138
-
139
- <b>Cloud computing</b> and <b>APIs</b> mean programs can collaborate on a global scale, in real-time.
140
-
141
- ---
142
-
143
- <b>Programming’s Societal Impact</b>
144
-
145
- Programming is reshaping society in profound ways:
146
-
147
- • <b>Healthcare</b>: Medical imaging, diagnostics, record management
148
- • <b>Finance</b>: Online banking, stock trading algorithms
149
- • <b>Entertainment</b>: Gaming, music streaming, social networks
150
- • <b>Education</b>: E-learning, interactive simulations, content platforms
151
- • <b>Transportation</b>: Navigation, ride-sharing apps, autonomous vehicles
152
- • <b>Science</b>: Processing large datasets, running complex simulations
153
-
154
- <blockquote>“Software is eating the world.”
155
- – Marc Andreessen</blockquote>
156
-
157
- <b>Programming Jobs In Demand</b>
158
-
159
- <pre><code class="language-mermaid">pie
160
- title Programming Job Market (2024)
161
- "Web Development" : 31
162
- "Data Science" : 22
163
- "Mobile Development" : 12
164
- "Embedded Systems" : 8
165
- "Cybersecurity" : 9
166
- "Other": 18
167
- </code></pre>
168
-
169
- <b>Note:</b> Mermaid diagrams require compatible renderers (e.g., GitHub, Obsidian).
170
-
171
- ---
172
-
173
- <b>Ethics, Challenges, and the Future</b>
174
-
175
- With great power comes responsibility. Programmers face new challenges:
176
-
177
- • <b>Bias in Algorithms:</b> Unintentional biases in data can lead to unfair outcomes (e.g., in hiring software or criminal justice prediction).
178
- • <b>Privacy:</b> Handling personal data securely is more critical than ever.
179
- • <b>Safety:</b> In fields like self-driving cars or medical devices, software bugs can have real-world consequences.
180
- • <b>Sustainability:</b> Software should be efficient, minimizing environmental impact in data centers.
181
-
182
- <b>Emerging Trends:</b>
183
-
184
- • <b>Artificial Intelligence:</b> Programs that learn, adapt, and sometimes surprise their creators.
185
- • <b>Quantum Computing:</b> New paradigms for solving currently intractable problems.
186
- • <b>No-Code/Low-Code:</b> Empowering more people to harness computational power.
187
-
188
- ---
189
-
190
- <b>Conclusion</b>
191
-
192
- From mechanical looms to neural networks, programming continues to redefine what’s possible. It’s not just for professional engineers: millions of people use programming as a tool for art, science, business, and personal growth.
193
-
194
- <b>Everyone can learn to code.</b> It might change your life—or even the world.
195
-
196
- <blockquote>"Any sufficiently advanced technology is indistinguishable from magic."
197
- — Arthur C. Clarke</blockquote>
198
-
199
- ---
200
-
201
- <b>Useful Resources</b>
202
-
203
- • <a href="https://www-cs-faculty.stanford.edu/~knuth/taocp.html">The Art of Computer Programming (Donald Knuth)</a>
204
- • <a href="https://www.khanacademy.org/computing/computer-programming">Khan Academy Computer Programming</a>
205
- • <a href="https://www.w3schools.com/">W3Schools Online Tutorials</a>
206
- • <a href="https://www.freecodecamp.org/">freeCodeCamp</a>
207
- • <a href="https://stackoverflow.com/">Stack Overflow</a>
208
- • <a href="https://guides.github.com/">GitHub Guides</a>
209
-
210
- ---
211
-
212
- <i>Thank you for reading! If you’re inspired to begin your coding journey, there has never been a better time to start.</i>
213
-
214
- ---"""
215
-
216
- valid_chunk_1 = """
217
- Absolutely! Here’s a Markdown-formatted message exceeding 5,000 characters, exploring <b>The History and Impact of Computer Programming</b>. (You can verify the character count using any online tool.)
218
-
219
- ---
220
-
221
- <b>The History and Impact of Computer Programming</b>
222
-
223
- <i>“The computer was born to solve problems that did not exist before.”</i>
224
- — Bill Gates
225
-
226
- ---
227
-
228
- <b>Table of Contents</b>
229
-
230
- 1. <a href="#introduction">Introduction</a>
231
- 2. <a href="#ancient-beginnings-from-algorithms-to-machines">Ancient Beginnings: From Algorithms to Machines</a>
232
- • <a href="#al-khwarizmi-and-the-algorithm">Al-Khwarizmi and the Algorithm</a>
233
- • <a href="#the-analytical-engine">The Analytical Engine</a>
234
- • <a href="#punch-cards-and-the-jacquard-loom">Punch Cards and the Jacquard Loom</a>
235
- 3. <a href="#20th-century-the-birth-of-modern-programming">20th Century: The Birth of Modern Programming</a>
236
- • <a href="#eniac-and-early-programmers">ENIAC and Early Programmers</a>
237
- • <a href="#assembly-language-and-early-high-level-languages">Assembly Language and Early High-level Languages</a>
238
- • <a href="#cobol-fortran-and-the-expansion">COBOL, FORTRAN, and the Expansion</a>
239
- 4. <a href="#modern-era-languages-paradigms-and-the-internet">Modern Era: Languages, Paradigms, and the Internet</a>
240
- • <a href="#object-oriented-programming">Object-Oriented Programming</a>
241
- • <a href="#internet-and-open-source">Internet and Open Source</a>
242
- • <a href="#mobile-and-cloud-computing">Mobile and Cloud Computing</a>
243
- 5. <a href="#programmings-societal-impact">Programming’s Societal Impact</a>
244
- 6. <a href="#ethics-challenges-and-the-future">Ethics, Challenges, and the Future</a>
245
- 7. <a href="#conclusion">Conclusion</a>
246
- 8. <a href="#useful-resources">Useful Resources</a>
247
-
248
- ---
249
-
250
- <b>Introduction</b>
251
-
252
- Computer programming is the science and art of giving computers instructions to perform specific tasks. Today, it's impossible to imagine a world without software: from banking systems and mobile applications to traffic lights and airplanes, programming is everywhere.
253
-
254
- But how did programming begin, and what has it become today? This document explores the journey of programming, from ancient mathematical roots to the future of artificial intelligence.
255
-
256
- ---
257
-
258
- <b>Ancient Beginnings: From Algorithms to Machines</b>
259
-
260
- <b>Al-Khwarizmi and the Algorithm</b>
261
-
262
- The term "<b>algorithm</b>" (the foundation of programming) comes from Abu Abdullah Muhammad ibn Musa Al-Khwarizmi, a 9th-century Persian mathematician. His works on systematic procedures laid the groundwork for computational thinking.
263
-
264
- <b>The Analytical Engine</b>
265
-
266
- In the 19th century, <b>Charles Babbage</b> designed the Analytical Engine, a mechanical general-purpose computer. Though never built in his lifetime, it could—in theory—read instructions from punched cards.
267
-
268
- <b>Ada Lovelace</b>, Babbage's collaborator, is often called the first computer programmer. She wrote notes describing algorithms (in essence, programs) for the Analytical Engine to compute Bernoulli numbers.
269
-
270
- <blockquote>"That brain of mine is something more than merely mortal; as time will show."
271
- – Ada Lovelace</blockquote>
272
-
273
- <b>Punch Cards and the Jacquard Loom</b>
274
-
275
- The concept of programming a machine with punched cards predates computers. <b>Joseph Marie Jacquard</b> invented a loom in 1804 that used punched cards to control patterns in woven fabric—an early example of machine automation.
276
-
277
- ---
278
-
279
- <b>20th Century: The Birth of Modern Programming</b>
280
-
281
- <b>ENIAC and Early Programmers</b>
282
-
283
- ENIAC (Electronic Numerical Integrator and Computer), completed in 1945, is often cited as the first electronic general-purpose computer.
284
-
285
- Early programming was entirely manual and physically laborious—think patch cables and switches!
286
-
287
- Notably, many of the earliest programmers were women, such as <b>Kathleen McNulty</b>, <b>Jean Jennings</b>, and <b>Grace Hopper</b>.
288
-
289
- <b>Assembly Language and Early High-level Languages</b>
290
-
291
- """
292
-
293
- valid_chunk_2 = """The problem of complexity led to <b>assembly languages</b>, where mnemonics like <code>MOV</code> and <code>ADD</code> replaced binary codes. Programming became more accessible, but code was still hardware-specific.
294
-
295
- The 1950s saw the creation of:
296
-
297
- • <b>FORTRAN</b> (FORmula TRANslation) for scientific computation
298
- • <b>COBOL</b> (COmmon Business-Oriented Language) for business applications
299
-
300
- <b>Code Example: Hello World in COBOL</b>
301
- <pre><code class="language-cobol">IDENTIFICATION DIVISION.
302
- PROGRAM-ID. HELLO-WORLD.
303
- PROCEDURE DIVISION.
304
- DISPLAY "Hello, World!".
305
- STOP RUN.
306
- </code></pre>
307
-
308
- <b>COBOL, FORTRAN, and the Expansion</b>
309
-
310
- With the advent of high-level languages, programming became less about circuitry and more about solving problems. Standardized languages allowed code to run on multiple machines.
311
-
312
- Other languages soon emerged:
313
-
314
- • <b>LISP</b> (for AI research)
315
- • <b>ALGOL</b> (basis for many future languages)
316
- • <b>BASIC</b> (for beginners and education)
317
-
318
- ---
319
-
320
- <b>Modern Era: Languages, Paradigms, and the Internet</b>
321
-
322
- <b>Object-Oriented Programming</b>
323
-
324
- The 1970s and 1980s introduced <b>object-oriented programming</b> (OOP), where data and behavior are bundled together. The most influential languages here include:
325
-
326
- • <b>Smalltalk</b>: pioneered OOP concepts
327
- • <b>C++</b>: combined OOP with the efficiency of C
328
- • <b>Java</b>: “Write Once, Run Anywhere” with the Java Virtual Machine
329
-
330
- <b>Code Example: Simple Class in Java</b>
331
- <pre><code class="language-java">public class HelloWorld {
332
- public static void main(String[] args) {
333
- System.out.println("Hello, World!");
334
- }
335
- }
336
- </code></pre>
337
-
338
- <b>Internet and Open Source</b>
339
-
340
- The rise of the World Wide Web transformed programming. JavaScript, PHP, and Python became staples for Internet-connected software.
341
-
342
- <b>Open source</b> projects like Linux, Apache, and MySQL changed collaboration forever—developers worldwide could contribute to shared codebases.
343
-
344
- | Year | Technology | Impact |
345
- |------|-------------|-----------------------------------------|
346
- | 1991 | Linux | Free, open-source operating systems |
347
- | 1995 | JavaScript | Interactive web applications |
348
- | 2001 | Wikipedia | Collaborative knowledge base |
349
-
350
- <b>Mobile and Cloud Computing</b>
351
-
352
- Smartphones spawned new languages and frameworks (Swift, Kotlin, React Native).
353
-
354
- <b>Cloud computing</b> and <b>APIs</b> mean programs can collaborate on a global scale, in real-time.
355
-
356
- ---
357
-
358
- <b>Programming’s Societal Impact</b>
359
-
360
- Programming is reshaping society in profound ways:
361
-
362
- • <b>Healthcare</b>: Medical imaging, diagnostics, record management
363
- • <b>Finance</b>: Online banking, stock trading algorithms
364
- • <b>Entertainment</b>: Gaming, music streaming, social networks
365
- • <b>Education</b>: E-learning, interactive simulations, content platforms
366
- • <b>Transportation</b>: Navigation, ride-sharing apps, autonomous vehicles
367
- • <b>Science</b>: Processing large datasets, running complex simulations
368
-
369
- <blockquote>“Software is eating the world.”
370
- – Marc Andreessen</blockquote>
371
-
372
- <b>Programming Jobs In Demand</b>
373
-
374
- <pre><code class="language-mermaid">pie
375
- title Programming Job Market (2024)
376
- "Web Development" : 31
377
- "Data Science" : 22
378
- "Mobile Development" : 12
379
- "Embedded Systems" : 8
380
- "Cybersecurity" : 9
381
- "Other": 18
382
- </code></pre>"""
383
-
384
- valid_chunk_3 = """
385
-
386
- <b>Note:</b> Mermaid diagrams require compatible renderers (e.g., GitHub, Obsidian).
387
-
388
- ---
389
-
390
- <b>Ethics, Challenges, and the Future</b>
391
-
392
- With great power comes responsibility. Programmers face new challenges:
393
-
394
- • <b>Bias in Algorithms:</b> Unintentional biases in data can lead to unfair outcomes (e.g., in hiring software or criminal justice prediction).
395
- • <b>Privacy:</b> Handling personal data securely is more critical than ever.
396
- • <b>Safety:</b> In fields like self-driving cars or medical devices, software bugs can have real-world consequences.
397
- • <b>Sustainability:</b> Software should be efficient, minimizing environmental impact in data centers.
398
-
399
- <b>Emerging Trends:</b>
400
-
401
- • <b>Artificial Intelligence:</b> Programs that learn, adapt, and sometimes surprise their creators.
402
- • <b>Quantum Computing:</b> New paradigms for solving currently intractable problems.
403
- • <b>No-Code/Low-Code:</b> Empowering more people to harness computational power.
404
-
405
- ---
406
-
407
- <b>Conclusion</b>
408
-
409
- From mechanical looms to neural networks, programming continues to redefine what’s possible. It’s not just for professional engineers: millions of people use programming as a tool for art, science, business, and personal growth.
410
-
411
- <b>Everyone can learn to code.</b> It might change your life—or even the world.
412
-
413
- <blockquote>"Any sufficiently advanced technology is indistinguishable from magic."
414
- — Arthur C. Clarke</blockquote>
415
-
416
- ---
417
-
418
- <b>Useful Resources</b>
419
-
420
- • <a href="https://www-cs-faculty.stanford.edu/~knuth/taocp.html">The Art of Computer Programming (Donald Knuth)</a>
421
- • <a href="https://www.khanacademy.org/computing/computer-programming">Khan Academy Computer Programming</a>
422
- • <a href="https://www.w3schools.com/">W3Schools Online Tutorials</a>
423
- • <a href="https://www.freecodecamp.org/">freeCodeCamp</a>
424
- • <a href="https://stackoverflow.com/">Stack Overflow</a>
425
- • <a href="https://guides.github.com/">GitHub Guides</a>
426
-
427
- ---
428
-
429
- <i>Thank you for reading! If you’re inspired to begin your coding journey, there has never been a better time to start.</i>
430
-
431
- ---"""
432
-
433
- def test_splitter_test():
434
- chunks = split_html_for_telegram(input_text)
435
- valid_chunks = [valid_chunk_1, valid_chunk_2, valid_chunk_3]
436
- assert chunks == valid_chunks
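The removed test above compares the splitter's output against three hand-written chunks, so any change in where the text is cut means rewriting the whole fixture. A lighter check is sketched below under stated assumptions: `naive_split` and `check_chunks` are hypothetical helpers written for illustration, not part of `chatgpt_md_converter`, and the 4096-character limit is assumed from Telegram's message cap rather than taken from this diff. The sketch only checks two invariants: every chunk fits the limit, and the chunks rejoin to the original text.

```python
from typing import List

MAX_LEN = 4096  # assumed limit (Telegram's message cap); not stated in this diff


def naive_split(text: str, max_len: int = MAX_LEN) -> List[str]:
    """Hypothetical stand-in splitter: cut at the last newline within the limit.

    It ignores HTML tags entirely, so it is NOT a substitute for
    split_html_for_telegram; it only gives the checks something to run on.
    """
    chunks = []
    while len(text) > max_len:
        cut = text.rfind("\n", 0, max_len)
        if cut <= 0:
            cut = max_len  # no usable newline; fall back to a hard cut
        else:
            cut += 1  # keep the newline with the preceding chunk
        chunks.append(text[:cut])
        text = text[cut:]
    if text:
        chunks.append(text)
    return chunks


def check_chunks(original: str, chunks: List[str], max_len: int = MAX_LEN) -> None:
    """Assert size and round-trip invariants for a list of chunks."""
    assert all(len(chunk) <= max_len for chunk in chunks)
    assert "".join(chunks) == original
```

A real HTML-aware splitter may close and reopen tags at chunk boundaries, in which case the round-trip assertion would need to be relaxed; the size check should still apply.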