chatgpt-md-converter 0.3.7__tar.gz → 0.3.8__tar.gz
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/PKG-INFO +1 -1
- chatgpt_md_converter-0.3.8/README.md +147 -0
- chatgpt_md_converter-0.3.8/chatgpt_md_converter/__init__.py +4 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/extractors.py +19 -18
- chatgpt_md_converter-0.3.8/chatgpt_md_converter/html_splitter.py +239 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter.egg-info/PKG-INFO +1 -1
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter.egg-info/SOURCES.txt +1 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/setup.py +2 -2
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/tests/test_parser.py +20 -0
- chatgpt_md_converter-0.3.8/tests/test_splitter.py +103 -0
- chatgpt_md_converter-0.3.7/chatgpt_md_converter/__init__.py +0 -3
- chatgpt_md_converter-0.3.7/chatgpt_md_converter/html_splitter.py +0 -114
- chatgpt_md_converter-0.3.7/tests/test_splitter.py +0 -436
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/LICENSE +0 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/converters.py +0 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/formatters.py +0 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/helpers.py +0 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/telegram_formatter.py +0 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter.egg-info/dependency_links.txt +0 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter.egg-info/top_level.txt +0 -0
- {chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/setup.cfg +0 -0
chatgpt_md_converter-0.3.8/README.md
@@ -0,0 +1,147 @@
+# ChatGPT Markdown to Telegram HTML Parser
+
+## Overview
+
+This project provides a solution for converting Telegram-style Markdown formatted text into HTML markup supported by the Telegram Bot API, specifically tailored for use in ChatGPT bots developed with the OpenAI API. It includes features for handling various Markdown elements and ensures proper tag closure, making it suitable for streaming mode applications.
+
+## Features
+
+- Converts Telegram-style Markdown syntax to Telegram-compatible HTML
+- Supports text styling:
+  - Bold: `**text**` → `<b>text</b>`
+  - Italic: `*text*` or `_text_` → `<i>text</i>`
+  - Underline: `__text__` → `<u>text</u>`
+  - Strikethrough: `~~text~~` → `<s>text</s>`
+  - Spoiler: `||text||` → `<span class="tg-spoiler">text</span>`
+  - Inline code: `` `code` `` → `<code>code</code>`
+- Handles nested text styling
+- Converts links: `[text](URL)` → `<a href="URL">text</a>`
+- Processes code blocks with language specification
+- Supports blockquotes:
+  - Regular blockquotes: `> text` → `<blockquote>text</blockquote>`
+  - Expandable blockquotes: `**> text` → `<blockquote expandable>text</blockquote>`
+- Automatically appends missing closing delimiters for code blocks
+- Escapes HTML special characters to prevent unwanted HTML rendering
+
+## Usage
+
+To use the Markdown to Telegram HTML Parser in your ChatGPT bot, integrate the provided Python functions into your bot's processing pipeline. Here is a brief overview of how to incorporate the parser:
+
+1. **Ensure Closing Delimiters**: Automatically appends missing closing delimiters for backticks to ensure proper parsing.
+
+2. **Extract and Convert Code Blocks**: Extracts Markdown code blocks, converts them to HTML `<pre><code>` format, and replaces them with placeholders to prevent formatting within code blocks.
+
+3. **Markdown to HTML Conversion**: Applies various regex substitutions and custom logic to convert supported Markdown formatting to Telegram-compatible HTML tags.
+
+4. **Reinsert Code Blocks**: Reinserts the previously extracted and converted code blocks back into the main text, replacing placeholders with the appropriate HTML content.
+
+Simply call the `telegram_format(text: str) -> str` function with your Markdown-formatted text as input to receive the converted HTML output ready for use with the Telegram Bot API.
+
+## Installation
+
+```sh
+pip install chatgpt-md-converter
+```
+
+## Example
+
+```python
+from chatgpt_md_converter import telegram_format
+
+# Basic formatting example
+text = """
+Here is some **bold**, __underline__, and `inline code`.
+This is a ||spoiler text|| and *italic*.
+
+Code example:
+print('Hello, world!')
+"""
+
+# Blockquotes example
+blockquote_text = """
+> Regular blockquote
+> Multiple lines
+
+**> Expandable blockquote
+> Hidden by default
+> Multiple lines
+"""
+
+formatted_text = telegram_format(text)
+formatted_blockquote = telegram_format(blockquote_text)
+
+print(formatted_text)
+print(formatted_blockquote)
+```
+
+### Output:
+
+```
+Here is some <b>bold</b>, <u>underline</u>, and <code>inline code</code>.
+This is a <span class="tg-spoiler">spoiler text</span> and <i>italic</i>.
+
+Code example:
+print('Hello, world!')
+
+<blockquote>Regular blockquote
+Multiple lines</blockquote>
+
+<blockquote expandable>Expandable blockquote
+Hidden by default
+Multiple lines</blockquote>
+```
+
+## Requirements
+
+- Python 3.x
+- No external libraries required (uses built-in `re` module for regex operations)
+
+## Contribution
+
+Feel free to contribute to this project by submitting pull requests or opening issues for bugs, feature requests, or improvements.
+
+## Prompting LLMs for Telegram-Specific Formatting
+
+> **Note**:
+> Since standard Markdown doesn't include Telegram-specific features like spoilers (`||text||`) and expandable blockquotes (`**> text`), you'll need to explicitly instruct LLMs to use these formats. Here's a suggested prompt addition to include in your system message or initial instructions:
+
+````
+When formatting your responses for Telegram, please use these special formatting conventions:
+
+1. For content that should be hidden as a spoiler (revealed only when users click):
+Use: ||spoiler content here||
+Example: This is visible, but ||this is hidden until clicked||.
+
+2. For lengthy explanations or optional content that should be collapsed:
+Use: **> Expandable section title
+
+> Content line 1
+> Content line 2
+> (Each line of the expandable blockquote should start with ">")
+
+3. Continue using standard markdown for other formatting:
+- **bold text**
+- *italic text*
+- __underlined text__
+- ~~strikethrough~~
+- `inline code`
+- ```code blocks```
+- [link text](URL)
+
+Apply spoilers for:
+
+- Solution reveals
+- Potential plot spoilers
+- Sensitive information
+- Surprising facts
+
+Use expandable blockquotes for:
+
+- Detailed explanations
+- Long examples
+- Optional reading
+- Technical details
+- Additional context not needed by all users
+````
+
+You can add this prompt to your system message when initializing your ChatGPT interactions to ensure the model properly formats content for optimal display in Telegram.
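The four steps listed under Usage in the README above can be illustrated with a toy pipeline. This is only a sketch of the placeholder technique, not the package's implementation: `sketch_format`, `FENCE_RE`, and the placeholder naming are hypothetical, only a few inline styles are handled, and the fence check is a simple parity count.

```python
import html
import re

# Hypothetical pattern: a fenced block with an optional language tag.
FENCE_RE = re.compile(r"```(\w+)?\n(.*?)```", re.DOTALL)


def sketch_format(text: str) -> str:
    """Toy version of the four README steps (illustration only)."""
    # Step 1: close an unterminated fence (odd number of ``` markers).
    if text.count("```") % 2 == 1:
        text += "\n```"

    # Step 2: swap each code block for a placeholder so that the inline
    # regexes below cannot touch its contents.
    blocks = {}

    def stash(match):
        lang = match.group(1)
        body = html.escape(match.group(2), quote=False)
        attr = f' class="language-{lang}"' if lang else ""
        key = f"CODEBLOCKPLACEHOLDER{len(blocks)}"
        blocks[key] = f"<pre><code{attr}>{body}</code></pre>"
        return key

    text = FENCE_RE.sub(stash, text)

    # Step 3: escape HTML, then convert a few inline styles.
    text = html.escape(text, quote=False)
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    text = re.sub(r"__(.+?)__", r"<u>\1</u>", text)
    text = re.sub(r"\|\|(.+?)\|\|", r'<span class="tg-spoiler">\1</span>', text)
    text = re.sub(r"`(.+?)`", r"<code>\1</code>", text)

    # Step 4: put the converted code blocks back.
    for key, block in blocks.items():
        text = text.replace(key, block, 1)
    return text


print(sketch_format("**bold** and ||secret||\n```python\nprint('<hi>')"))
```

Stashing code blocks before the inline substitutions is what keeps characters such as `*` or `_` inside code from being treated as formatting.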
{chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/extractors.py
RENAMED
@@ -2,31 +2,32 @@ import re
 
 
 def ensure_closing_delimiters(text: str) -> str:
-
+    # Append missing closing backtick delimiters.
 
     code_block_re = re.compile(
         r"(?P<fence>`{3,})(?P<lang>\w+)?\n?[\s\S]*?(?<=\n)?(?P=fence)",
         flags=re.DOTALL,
     )
 
-    #
-    #
-
-
-
-
-
-
-
-
-        fence = m.group("fence")
-        if stack and fence == stack[-1]:
-            stack.pop()
+    # Track an open fence. Once a fence is opened, everything until the same
+    # fence is encountered again is treated as plain text. This mimics how
+    # Markdown handles fences and allows fence-like strings inside code blocks.
+    open_fence = None
+    for line in text.splitlines():
+        stripped = line.strip()
+        if open_fence is None:
+            m = re.match(r"^(?P<fence>`{3,})(?P<lang>\w+)?$", stripped)
+            if m:
+                open_fence = m.group("fence")
         else:
-
+            if stripped.endswith(open_fence):
+                open_fence = None
 
-
-
+    # If a fence was left open, append a matching closing fence.
+    if open_fence is not None:
+        if not text.endswith("\n"):
+            text += "\n"
+        text += open_fence
 
     cleaned_inline = code_block_re.sub("", text)
 
@@ -91,4 +92,4 @@ def reinsert_code_blocks(text: str, code_blocks: dict) -> str:
     """
     for placeholder, html_code_block in code_blocks.items():
         text = text.replace(placeholder, html_code_block, 1)
-    return text
+    return text
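The fence-tracking lines added to `ensure_closing_delimiters` above can be lifted into a standalone function to try the behavior in isolation. The name `close_open_fence` is hypothetical, and the rest of the original function (which continues with inline backtick handling) is left out, so treat this as a sketch of the added logic only.

```python
import re

# Same opening-fence pattern as in the added lines of the diff.
FENCE_OPEN_RE = re.compile(r"^(?P<fence>`{3,})(?P<lang>\w+)?$")


def close_open_fence(text: str) -> str:
    """Append a closing fence if the last opened fence was never closed."""
    open_fence = None
    for line in text.splitlines():
        stripped = line.strip()
        if open_fence is None:
            match = FENCE_OPEN_RE.match(stripped)
            if match:
                open_fence = match.group("fence")
        elif stripped.endswith(open_fence):
            # Only a line ending with the same run of backticks closes it.
            open_fence = None
    if open_fence is not None:
        if not text.endswith("\n"):
            text += "\n"
        text += open_fence
    return text


sample = '``````markdown\n`````\n````python\nprint("hello world ```")\n```\n'
print(close_open_fence(sample))
```

With the sample input, which mirrors the new `test_some_new` case, the six-backtick fence stays open because none of the shorter fence-like lines ends with six backticks, so a six-backtick closing fence is appended.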
chatgpt_md_converter-0.3.8/chatgpt_md_converter/html_splitter.py
@@ -0,0 +1,239 @@
+import re
+from html.parser import HTMLParser
+
+MAX_LENGTH = 4096
+MIN_LENGTH = 500
+
+
+class HTMLTagTracker(HTMLParser):
+    def __init__(self):
+        super().__init__()
+        self.open_tags = []
+
+    def handle_starttag(self, tag, attrs):
+        # saving tags
+        if tag in (
+            "b", "i", "u", "s", "code", "pre", "a", "span", "blockquote",
+            "strong", "em", "ins", "strike", "del", "tg-spoiler", "tg-emoji"
+        ):
+            self.open_tags.append((tag, attrs))
+
+    def handle_endtag(self, tag):
+        for i in range(len(self.open_tags) - 1, -1, -1):
+            if self.open_tags[i][0] == tag:
+                del self.open_tags[i]
+                break
+
+    def get_open_tags_html(self):
+        parts = []
+        for tag, attrs in self.open_tags:
+            attr_str = ""
+            if attrs:
+                attr_str = " " + " ".join(f'{k}="{v}"' for k, v in attrs)
+            parts.append(f"<{tag}{attr_str}>")
+        return "".join(parts)
+
+    def get_closing_tags_html(self):
+        return "".join(f"</{tag}>" for tag, _ in reversed(self.open_tags))
+
+
+def split_pre_block(pre_block: str, max_length) -> list[str]:
+    """
+    Splits long HTML-formatted text into chunks suitable for sending via Telegram,
+    preserving valid HTML tag nesting and handling <pre>/<code> blocks separately.
+
+    Args:
+        text (str): The input HTML-formatted string.
+        trim_leading_newlines (bool): If True, removes leading newline characters (`\\n`)
+            from each resulting chunk before sending. This is useful to avoid
+            unnecessary blank space at the beginning of messages in Telegram.
+
+    Returns:
+        list[str]: A list of HTML-formatted message chunks, each within Telegram's length limit.
+    """
+
+    # language-aware: <pre><code class="language-python">...</code></pre>
+    match = re.match(r"<pre><code(.*?)>(.*)</code></pre>", pre_block, re.DOTALL)
+    if match:
+        attr, content = match.groups()
+        lines = content.splitlines(keepends=True)
+        chunks, buf = [], ""
+        overhead = len(f"<pre><code{attr}></code></pre>")
+        for line in lines:
+            if len(buf) + len(line) + overhead > max_length:
+                chunks.append(f"<pre><code{attr}>{buf}</code></pre>")
+                buf = ""
+            buf += line
+        if buf:
+            chunks.append(f"<pre><code{attr}>{buf}</code></pre>")
+        return chunks
+    else:
+        # regular <pre>...</pre>
+        inner = pre_block[5:-6]
+        lines = inner.splitlines(keepends=True)
+        chunks, buf = [], ""
+        overhead = len('<pre></pre>')
+        for line in lines:
+            if len(buf) + len(line) + overhead > max_length:
+                chunks.append(f"<pre>{buf}</pre>")
+                buf = ""
+            buf += line
+        if buf:
+            chunks.append(f"<pre>{buf}</pre>")
+        return chunks
+
+
+def _is_only_tags(block: str) -> bool:
+    return bool(re.fullmatch(r'(?:\s*<[^>]+>\s*)+', block))
+
+
+def _effective_length(content: str) -> int:
+    tracker = HTMLTagTracker()
+    tracker.feed(content)
+    return len(tracker.get_open_tags_html()) + len(content) + len(tracker.get_closing_tags_html())
+
+
+def split_html_for_telegram(text: str, trim_empty_leading_lines: bool = False, max_length: int = MAX_LENGTH) -> list[str]:
+    """Split long HTML-formatted text into Telegram-compatible chunks.
+
+    Parameters
+    ----------
+    text: str
+        Input HTML text.
+    trim_empty_leading_lines: bool, optional
+        If True, removes `\n` sybmols from start of chunks.
+    max_length: int, optional
+        Maximum allowed length for a single chunk (must be >= ``MIN_LENGTH = 500``).
+        Default = 4096 (symbols)
+
+    Returns
+    -------
+    list[str]
+        List of HTML chunks.
+    """
+
+    if max_length < MIN_LENGTH:
+        raise ValueError("max_length should be at least %d" % MIN_LENGTH)
+
+    pattern = re.compile(r"(<pre>.*?</pre>|<pre><code.*?</code></pre>)", re.DOTALL)
+    parts = pattern.split(text)
+
+    chunks: list[str] = []
+    prefix = ""
+    current = ""
+    whitespace_re = re.compile(r"(\\s+)")
+    tag_re = re.compile(r"(<[^>]+>)")
+
+    def finalize():
+        nonlocal current, prefix
+        tracker = HTMLTagTracker()
+        tracker.feed(prefix + current)
+        chunk = prefix + current + tracker.get_closing_tags_html()
+        chunks.append(chunk)
+        prefix = tracker.get_open_tags_html()
+        current = ""
+
+    def append_piece(piece: str):
+        nonlocal current, prefix
+
+        def split_on_whitespace(chunk: str) -> list[str] | None:
+            parts = [part for part in whitespace_re.split(chunk) if part]
+            if len(parts) <= 1:
+                return None
+            return parts
+
+        def split_on_tags(chunk: str) -> list[str] | None:
+            parts = [part for part in tag_re.split(chunk) if part]
+            if len(parts) <= 1:
+                return None
+            return parts
+
+        def fittable_prefix_length(chunk: str) -> int:
+            low, high = 1, len(chunk)
+            best = 0
+            while low <= high:
+                mid = (low + high) // 2
+                candidate = chunk[:mid]
+                if _effective_length(prefix + current + candidate) <= max_length:
+                    best = mid
+                    low = mid + 1
+                else:
+                    high = mid - 1
+            return best
+
+        while piece:
+            if _effective_length(prefix + current + piece) <= max_length:
+                current += piece
+                return
+
+            if len(piece) > max_length:
+                if _is_only_tags(piece):
+                    raise ValueError("block contains only html tags")
+                splitted = split_on_whitespace(piece)
+                if splitted:
+                    for part in splitted:
+                        append_piece(part)
+                    return
+                tag_split = split_on_tags(piece)
+                if tag_split:
+                    for part in tag_split:
+                        append_piece(part)
+                    return
+            elif current:
+                finalize()
+                continue
+            else:
+                splitted = split_on_whitespace(piece)
+                if splitted:
+                    for part in splitted:
+                        append_piece(part)
+                    return
+                tag_split = split_on_tags(piece)
+                if tag_split:
+                    for part in tag_split:
+                        append_piece(part)
+                    return
+
+            fitted = fittable_prefix_length(piece)
+            if fitted == 0:
+                if current:
+                    finalize()
+                    continue
+                raise ValueError("unable to split content within max_length")
+
+            current += piece[:fitted]
+            piece = piece[fitted:]
+
+            if piece:
+                finalize()
+
+
+    for part in parts:
+        if not part:
+            continue
+        if part.startswith("<pre>") or part.startswith("<pre><code"):
+            pre_chunks = split_pre_block(part, max_length=max_length)
+            for pc in pre_chunks:
+                append_piece(pc)
+            continue
+        blocks = re.split(r"(\n\s*\n|<br\s*/?>|\n)", part)
+        for block in blocks:
+            if block:
+                append_piece(block)
+
+    if current:
+        finalize()
+
+    merged: list[str] = []
+    buf = ""
+    for chunk in chunks:
+        if len(buf) + len(chunk) <= max_length:
+            buf += chunk
+        else:
+            if buf:
+                merged.append(buf)
+            buf = chunk.lstrip("\n") if trim_empty_leading_lines and merged else chunk
+    if buf:
+        merged.append(buf.lstrip("\n") if trim_empty_leading_lines and merged else buf)
+
+    return merged
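The way `finalize()` closes still-open tags at the end of a chunk and carries them over as the prefix of the next chunk can be sketched with the standard-library `HTMLParser` alone. The names `OpenTagTracker`, `render_attrs`, and `close_and_reopen` are hypothetical; unlike the tracker in the diff, this sketch tracks every tag name and renders valueless attributes such as `expandable` without a value, which is an assumption about the desired output rather than something the package is confirmed to do.

```python
from html.parser import HTMLParser


class OpenTagTracker(HTMLParser):
    """Records which tags are still open after feeding a fragment."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, attrs))

    def handle_endtag(self, tag):
        # Drop the most recently opened tag with this name, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i]
                break


def render_attrs(attrs):
    # HTMLParser reports valueless attributes with a value of None.
    return "".join(f" {k}" if v is None else f' {k}="{v}"' for k, v in attrs)


def close_and_reopen(fragment):
    """Return (fragment with closing tags appended, opening tags for the next chunk)."""
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    closing = "".join(f"</{tag}>" for tag, _ in reversed(tracker.open_tags))
    reopening = "".join(f"<{tag}{render_attrs(attrs)}>" for tag, attrs in tracker.open_tags)
    return fragment + closing, reopening


closed, prefix = close_and_reopen('<b>bold <a href="https://example.com">link')
print(closed)
print(prefix)
```

Closing tags are emitted in reverse order of opening so that each chunk stays properly nested, while the reopening prefix keeps the original order.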
{chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/setup.py
@@ -2,11 +2,11 @@ from setuptools import setup
 
 setup(
     name="chatgpt_md_converter",
-    version="0.3.
+    version="0.3.8",
     author="Kostiantyn Kriuchkov",
     author_email="latand666@gmail.com",
     description="A package for converting markdown to HTML for chat Telegram bots",
-    long_description=open("README.
+    long_description=open("README.md").read(),
     long_description_content_type="text/markdown",
     url="https://github.com/botfather-dev/formatter-chatgpt-telegram",
     classifiers=[
{chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/tests/test_parser.py
@@ -915,3 +915,23 @@ print("hello world ```"')
         print(f"Expected was: \n\n{expected_output}\n\n")
         print(f"output was: \n\n{output}")
     assert output == expected_output, show_output()
+
+def test_some_new():
+    input_text = """
+``````markdown
+`````
+````python
+print("hello world ```")
+```
+""" # Markdown code wasn't closed
+
+    expected_output = """<pre><code class=\"language-markdown\">`````
+````python
+print("hello world ```")
+```
+</code></pre>""" # But after closed correctly
+    output = telegram_format(input_text)
+    def show_output():
+        print(f"Expected was: \n\n{expected_output}\n\n")
+        print(f"output was: \n\n{output}")
+    assert output == expected_output, show_output()
chatgpt_md_converter-0.3.8/tests/test_splitter.py
@@ -0,0 +1,103 @@
+import re
+
+import pytest
+
+from chatgpt_md_converter.html_splitter import (MIN_LENGTH,
+                                                split_html_for_telegram)
+
+from . import html_examples
+
+
+def test_html_splitter():
+    chunks = split_html_for_telegram(html_examples.input_text)
+    valid_chunks = [
+        html_examples.valid_chunk_1,
+        html_examples.valid_chunk_2,
+        html_examples.valid_chunk_3,
+    ]
+    for index, chunk in enumerate(chunks):
+        assert chunk == valid_chunks[index], (
+            f"expected: \n\n{valid_chunks[index]} \n\n got: \n\n{chunk}"
+        )
+
+def test_html_splitter__remove_leading_brakes():
+    chunks = split_html_for_telegram(html_examples.input_text, trim_empty_leading_lines=True)
+    valid_chunks = [
+        html_examples.valid_chunk_1,
+        html_examples.valid_chunk_2,
+        html_examples.valid_chunk_3_remove_leading_brakes,
+    ]
+    for index, chunk in enumerate(chunks):
+        assert chunk == valid_chunks[index], (
+            f"expected: \n\n{valid_chunks[index]} \n\n got: \n\n{chunk}"
+        )
+
+def test_html_splitter_max_length_550():
+    chunks = split_html_for_telegram(
+        html_examples.long_code_input, max_length=550, trim_empty_leading_lines=True
+    )
+
+    def load_expected_chunks_550():
+        raw = re.split(r"END\n?", html_examples.expected_550)
+        chunks = []
+        for part in raw:
+            if not part.strip():
+                continue
+            lines = part.splitlines()
+            chunks.append("\n".join(lines[1:]))
+        return chunks
+
+    valid_chunks = load_expected_chunks_550()
+    for index, chunk in enumerate(chunks):
+        assert chunk == valid_chunks[index], (
+            f"expected: \n\n{valid_chunks[index]} \n\n got: \n\n{chunk}"
+        )
+        assert len(chunk) <= 550
+
+def test_split_html_respects_max_length_by_words():
+    text = "<b>" + "<i>word</i> " * 100 + "</b>"
+    chunks = split_html_for_telegram(text, max_length=550)
+    assert len(chunks) > 1
+    for chunk in chunks:
+        assert len(chunk) <= 550
+        assert chunk.startswith("<b>")
+        assert chunk.endswith("</b>")
+        assert chunk.count("<i>") == chunk.count("</i>")
+
+
+def test_split_html_only_tags_raises():
+    text = "<b></b>" * 200
+    with pytest.raises(ValueError):
+        split_html_for_telegram(text, max_length=600)
+
+
+def test_split_html_min_length_enforced():
+    with pytest.raises(ValueError):
+        split_html_for_telegram("hello", max_length=MIN_LENGTH - 1)
+
+
+def test_split_html_long_word_exceeds_limit():
+    text = "a" * 600
+    chunks = split_html_for_telegram(text, max_length=550)
+    assert chunks == ["a" * 550, "a" * 50]
+
+
+LONG_TEXT = "<b><i>" + "word " * 96 + "word!" + "</i></b>"
+
+SHORT_TEXT = "<u>" + "another " * 9 + "another" + "</u>"
+
+
+def test_split_html_keeps_newline_without_trim():
+    text = LONG_TEXT + "\n\n" + SHORT_TEXT
+    chunks = split_html_for_telegram(text, max_length=500, trim_empty_leading_lines=False)
+    assert chunks[0] == LONG_TEXT
+    assert chunks[1].startswith("\n")
+    assert chunks[1].endswith(SHORT_TEXT)
+    assert chunks[1].lstrip("\n").startswith("<u>")
+    assert chunks[1].lstrip("\n").endswith("</u>")
+
+
+def test_split_html_trims_leading_newline_on_new_chunk():
+    text = LONG_TEXT + "\n\n" + SHORT_TEXT
+    chunks = split_html_for_telegram(text, max_length=500, trim_empty_leading_lines=True)
+    assert chunks == [LONG_TEXT, SHORT_TEXT]
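The behavior checked by `test_split_html_long_word_exceeds_limit` above (a 600-character word split into 550 + 50) relies on finding the longest prefix that still fits. A minimal sketch of that binary search, assuming the cost of a prefix never decreases as the prefix grows: `longest_fitting_prefix` and `hard_split` are hypothetical names, and the default cost here is plain `len`, whereas the 0.3.8 splitter measures an effective length that also counts the tags it would have to add.

```python
from typing import Callable, List


def longest_fitting_prefix(text: str, limit: int, cost: Callable[[str], int] = len) -> int:
    """Binary-search the largest n such that cost(text[:n]) <= limit (0 if none)."""
    low, high, best = 1, len(text), 0
    while low <= high:
        mid = (low + high) // 2
        if cost(text[:mid]) <= limit:
            best = mid       # this prefix fits, try a longer one
            low = mid + 1
        else:
            high = mid - 1   # too long, try a shorter one
    return best


def hard_split(text: str, limit: int) -> List[str]:
    """Cut text into consecutive pieces that each fit within limit."""
    chunks = []
    while text:
        cut = longest_fitting_prefix(text, limit)
        if cut == 0:
            raise ValueError("nothing fits within limit")
        chunks.append(text[:cut])
        text = text[cut:]
    return chunks


print([len(c) for c in hard_split("a" * 600, 550)])
```

A binary search needs roughly log2(len(text)) cost evaluations per cut, which matters when each evaluation parses HTML rather than just measuring a string.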
chatgpt_md_converter-0.3.7/chatgpt_md_converter/html_splitter.py
@@ -1,114 +0,0 @@
-import re
-from html.parser import HTMLParser
-
-MAX_LENGTH = 4096
-
-
-class HTMLTagTracker(HTMLParser):
-    def __init__(self):
-        super().__init__()
-        self.open_tags = []
-
-    def handle_starttag(self, tag, attrs):
-        # saving tags
-        if tag in ("b", "i", "u", "s", "code", "pre", "a", "span", "blockquote"):
-            self.open_tags.append((tag, attrs))
-
-    def handle_endtag(self, tag):
-        for i in range(len(self.open_tags) - 1, -1, -1):
-            if self.open_tags[i][0] == tag:
-                del self.open_tags[i]
-                break
-
-    def get_open_tags_html(self):
-        parts = []
-        for tag, attrs in self.open_tags:
-            attr_str = ""
-            if attrs:
-                attr_str = " " + " ".join(f'{k}="{v}"' for k, v in attrs)
-            parts.append(f"<{tag}{attr_str}>")
-        return "".join(parts)
-
-    def get_closing_tags_html(self):
-        return "".join(f"</{tag}>" for tag, _ in reversed(self.open_tags))
-
-
-def split_pre_block(pre_block: str) -> list[str]:
-    # language-aware: <pre><code class="language-python">...</code></pre>
-    match = re.match(r"<pre><code(.*?)>(.*)</code></pre>", pre_block, re.DOTALL)
-    if match:
-        attr, content = match.groups()
-        lines = content.splitlines(keepends=True)
-        chunks, buf = [], ""
-        for line in lines:
-            if len(buf) + len(line) + len('<pre><code></code></pre>') > MAX_LENGTH:
-                chunks.append(f"<pre><code{attr}>{buf}</code></pre>")
-                buf = ""
-            buf += line
-        if buf:
-            chunks.append(f"<pre><code{attr}>{buf}</code></pre>")
-        return chunks
-    else:
-        # regular <pre>...</pre>
-        inner = pre_block[5:-6]
-        lines = inner.splitlines(keepends=True)
-        chunks, buf = [], ""
-        for line in lines:
-            if len(buf) + len(line) + len('<pre></pre>') > MAX_LENGTH:
-                chunks.append(f"<pre>{buf}</pre>")
-                buf = ""
-            buf += line
-        if buf:
-            chunks.append(f"<pre>{buf}</pre>")
-        return chunks
-
-
-def split_html_for_telegram(text: str) -> list[str]:
-    chunks = []
-    pattern = re.compile(r"(<pre>.*?</pre>|<pre><code.*?</code></pre>)", re.DOTALL)
-    parts = pattern.split(text)
-
-    for part in parts:
-        if not part:
-            continue
-        if part.startswith("<pre>") or part.startswith("<pre><code"):
-            pre_chunks = split_pre_block(part)
-            chunks.extend(pre_chunks)
-        else:
-            # breaking down regular HTML
-            tracker = HTMLTagTracker()
-            current = ""
-            blocks = re.split(r"(\n\s*\n|<br\s*/?>|\n)", part)
-            for block in blocks:
-                prospective = current + block
-                if len(prospective) > MAX_LENGTH:
-                    tracker.feed(current)
-                    open_tags = tracker.get_open_tags_html()
-                    close_tags = tracker.get_closing_tags_html()
-                    chunks.append(open_tags + current + close_tags)
-                    current = block
-                    tracker = HTMLTagTracker()
-                else:
-                    current = prospective
-            if current.strip():
-                tracker.feed(current)
-                open_tags = tracker.get_open_tags_html()
-                close_tags = tracker.get_closing_tags_html()
-                chunks.append(open_tags + current + close_tags)
-
-    # post-unification: combine chunks if they don't exceed the limit in total
-    merged_chunks = []
-    buf = ""
-    for chunk in chunks:
-        # chunk = chunk.lstrip("\n") # removing leading line breaks
-
-        if len(buf) + len(chunk) <= MAX_LENGTH:
-            buf += chunk
-        else:
-            if buf:
-                merged_chunks.append(buf)
-            buf = chunk
-    if buf:
-        merged_chunks.append(buf)
-
-    return merged_chunks
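One difference between the removed `split_pre_block` above and its 0.3.8 replacement is how the wrapper overhead is counted: the old check used the fixed `MAX_LENGTH` and a constant `<pre><code></code></pre>` length that ignored the `class="language-…"` attribute, while the new version takes `max_length` and measures the wrapper including the attribute. A minimal sketch of line-based splitting with the attribute counted follows; `split_code_lines` is a hypothetical name, and the extra `buf and` guard (which avoids emitting an empty chunk) is an addition of this sketch, not something present in either version.

```python
from typing import List


def split_code_lines(code: str, max_length: int, attr: str = "") -> List[str]:
    """Wrap code in <pre><code> chunks, breaking only at line boundaries."""
    opening, closing = f"<pre><code{attr}>", "</code></pre>"
    overhead = len(opening) + len(closing)  # wrapper cost, attribute included
    chunks, buf = [], ""
    for line in code.splitlines(keepends=True):
        # Flush the buffer before it would exceed the limit with this line.
        if buf and len(buf) + len(line) + overhead > max_length:
            chunks.append(opening + buf + closing)
            buf = ""
        buf += line
    if buf:
        chunks.append(opening + buf + closing)
    return chunks


for piece in split_code_lines("aaaa\nbbbb\ncccc\n", max_length=35):
    print(len(piece), repr(piece))
```

A single line longer than the limit still produces an oversized chunk in this sketch; handling that case needs a further splitting pass like the one in `append_piece`.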
@@ -1,436 +0,0 @@
|
|
|
1
|
-
from chatgpt_md_converter.html_splitter import split_html_for_telegram
|
|
2
|
-
|
|
3
|
-
input_text ="""
|
|
4
|
-
Absolutely! Here’s a Markdown-formatted message exceeding 5,000 characters, exploring <b>The History and Impact of Computer Programming</b>. (You can verify the character count using any online tool.)
|
|
5
|
-
|
|
6
|
-
---
|
|
7
|
-
|
|
8
|
-
<b>The History and Impact of Computer Programming</b>
|
|
9
|
-
|
|
10
|
-
<i>“The computer was born to solve problems that did not exist before.”</i>
|
|
11
|
-
— Bill Gates
|
|
12
|
-
|
|
13
|
-
---
|
|
14
|
-
|
|
15
|
-
<b>Table of Contents</b>
|
|
16
|
-
|
|
17
|
-
1. <a href="#introduction">Introduction</a>
|
|
18
|
-
2. <a href="#ancient-beginnings-from-algorithms-to-machines">Ancient Beginnings: From Algorithms to Machines</a>
|
|
19
|
-
• <a href="#al-khwarizmi-and-the-algorithm">Al-Khwarizmi and the Algorithm</a>
|
|
20
|
-
• <a href="#the-analytical-engine">The Analytical Engine</a>
|
|
21
|
-
• <a href="#punch-cards-and-the-jacquard-loom">Punch Cards and the Jacquard Loom</a>
|
|
22
|
-
3. <a href="#20th-century-the-birth-of-modern-programming">20th Century: The Birth of Modern Programming</a>
|
|
23
|
-
• <a href="#eniac-and-early-programmers">ENIAC and Early Programmers</a>
|
|
24
|
-
• <a href="#assembly-language-and-early-high-level-languages">Assembly Language and Early High-level Languages</a>
|
|
25
|
-
• <a href="#cobol-fortran-and-the-expansion">COBOL, FORTRAN, and the Expansion</a>
|
|
26
|
-
4. <a href="#modern-era-languages-paradigms-and-the-internet">Modern Era: Languages, Paradigms, and the Internet</a>
|
|
27
|
-
• <a href="#object-oriented-programming">Object-Oriented Programming</a>
|
|
28
|
-
• <a href="#internet-and-open-source">Internet and Open Source</a>
|
|
29
|
-
• <a href="#mobile-and-cloud-computing">Mobile and Cloud Computing</a>
|
|
30
|
-
5. <a href="#programmings-societal-impact">Programming’s Societal Impact</a>
|
|
31
|
-
6. <a href="#ethics-challenges-and-the-future">Ethics, Challenges, and the Future</a>
|
|
32
|
-
7. <a href="#conclusion">Conclusion</a>
|
|
33
|
-
8. <a href="#useful-resources">Useful Resources</a>
|
|
34
|
-
|
|
35
|
-
---
|
|
36
|
-
|
|
37
|
-
<b>Introduction</b>
|
|
38
|
-
|
|
39
|
-
Computer programming is the science and art of giving computers instructions to perform specific tasks. Today, it's impossible to imagine a world without software: from banking systems and mobile applications to traffic lights and airplanes, programming is everywhere.
|
|
40
|
-
|
|
41
|
-
But how did programming begin, and what has it become today? This document explores the journey of programming, from ancient mathematical roots to the future of artificial intelligence.
|
|
42
|
-
|
|
43
|
-
---
|
|
-
-<b>Ancient Beginnings: From Algorithms to Machines</b>
-
-<b>Al-Khwarizmi and the Algorithm</b>
-
-The term "<b>algorithm</b>" (the foundation of programming) comes from Abu Abdullah Muhammad ibn Musa Al-Khwarizmi, a 9th-century Persian mathematician. His works on systematic procedures laid the groundwork for computational thinking.
-
-<b>The Analytical Engine</b>
-
-In the 19th century, <b>Charles Babbage</b> designed the Analytical Engine, a mechanical general-purpose computer. Though never built in his lifetime, it could—in theory—read instructions from punched cards.
-
-<b>Ada Lovelace</b>, Babbage's collaborator, is often called the first computer programmer. She wrote notes describing algorithms (in essence, programs) for the Analytical Engine to compute Bernoulli numbers.
-
-<blockquote>"That brain of mine is something more than merely mortal; as time will show."
-– Ada Lovelace</blockquote>
-
-<b>Punch Cards and the Jacquard Loom</b>
-
-The concept of programming a machine with punched cards predates computers. <b>Joseph Marie Jacquard</b> invented a loom in 1804 that used punched cards to control patterns in woven fabric—an early example of machine automation.
-
----
-
-<b>20th Century: The Birth of Modern Programming</b>
-
-<b>ENIAC and Early Programmers</b>
-
-ENIAC (Electronic Numerical Integrator and Computer), completed in 1945, is often cited as the first electronic general-purpose computer.
-
-Early programming was entirely manual and physically laborious—think patch cables and switches!
-
-Notably, many of the earliest programmers were women, such as <b>Kathleen McNulty</b>, <b>Jean Jennings</b>, and <b>Grace Hopper</b>.
-
-<b>Assembly Language and Early High-level Languages</b>
-
-The problem of complexity led to <b>assembly languages</b>, where mnemonics like <code>MOV</code> and <code>ADD</code> replaced binary codes. Programming became more accessible, but code was still hardware-specific.
-
-The 1950s saw the creation of:
-
-• <b>FORTRAN</b> (FORmula TRANslation) for scientific computation
-• <b>COBOL</b> (COmmon Business-Oriented Language) for business applications
-
-<b>Code Example: Hello World in COBOL</b>
-<pre><code class="language-cobol">IDENTIFICATION DIVISION.
-PROGRAM-ID. HELLO-WORLD.
-PROCEDURE DIVISION.
-DISPLAY "Hello, World!".
-STOP RUN.
-</code></pre>
-
-<b>COBOL, FORTRAN, and the Expansion</b>
-
-With the advent of high-level languages, programming became less about circuitry and more about solving problems. Standardized languages allowed code to run on multiple machines.
-
-Other languages soon emerged:
-
-• <b>LISP</b> (for AI research)
-• <b>ALGOL</b> (basis for many future languages)
-• <b>BASIC</b> (for beginners and education)
-
----
-
-<b>Modern Era: Languages, Paradigms, and the Internet</b>
-
-<b>Object-Oriented Programming</b>
-
-The 1970s and 1980s introduced <b>object-oriented programming</b> (OOP), where data and behavior are bundled together. The most influential languages here include:
-
-• <b>Smalltalk</b>: pioneered OOP concepts
-• <b>C++</b>: combined OOP with the efficiency of C
-• <b>Java</b>: “Write Once, Run Anywhere” with the Java Virtual Machine
-
-<b>Code Example: Simple Class in Java</b>
-<pre><code class="language-java">public class HelloWorld {
-public static void main(String[] args) {
-System.out.println("Hello, World!");
-}
-}
-</code></pre>
-
-<b>Internet and Open Source</b>
-
-The rise of the World Wide Web transformed programming. JavaScript, PHP, and Python became staples for Internet-connected software.
-
-<b>Open source</b> projects like Linux, Apache, and MySQL changed collaboration forever—developers worldwide could contribute to shared codebases.
-
-| Year | Technology | Impact |
-|------|-------------|-----------------------------------------|
-| 1991 | Linux | Free, open-source operating systems |
-| 1995 | JavaScript | Interactive web applications |
-| 2001 | Wikipedia | Collaborative knowledge base |
-
-<b>Mobile and Cloud Computing</b>
-
-Smartphones spawned new languages and frameworks (Swift, Kotlin, React Native).
-
-<b>Cloud computing</b> and <b>APIs</b> mean programs can collaborate on a global scale, in real-time.
-
----
-
-<b>Programming’s Societal Impact</b>
-
-Programming is reshaping society in profound ways:
-
-• <b>Healthcare</b>: Medical imaging, diagnostics, record management
-• <b>Finance</b>: Online banking, stock trading algorithms
-• <b>Entertainment</b>: Gaming, music streaming, social networks
-• <b>Education</b>: E-learning, interactive simulations, content platforms
-• <b>Transportation</b>: Navigation, ride-sharing apps, autonomous vehicles
-• <b>Science</b>: Processing large datasets, running complex simulations
-
-<blockquote>“Software is eating the world.”
-– Marc Andreessen</blockquote>
-
-<b>Programming Jobs In Demand</b>
-
-<pre><code class="language-mermaid">pie
-title Programming Job Market (2024)
-"Web Development" : 31
-"Data Science" : 22
-"Mobile Development" : 12
-"Embedded Systems" : 8
-"Cybersecurity" : 9
-"Other": 18
-</code></pre>
-
-<b>Note:</b> Mermaid diagrams require compatible renderers (e.g., GitHub, Obsidian).
-
----
-
-<b>Ethics, Challenges, and the Future</b>
-
-With great power comes responsibility. Programmers face new challenges:
-
-• <b>Bias in Algorithms:</b> Unintentional biases in data can lead to unfair outcomes (e.g., in hiring software or criminal justice prediction).
-• <b>Privacy:</b> Handling personal data securely is more critical than ever.
-• <b>Safety:</b> In fields like self-driving cars or medical devices, software bugs can have real-world consequences.
-• <b>Sustainability:</b> Software should be efficient, minimizing environmental impact in data centers.
-
-<b>Emerging Trends:</b>
-
-• <b>Artificial Intelligence:</b> Programs that learn, adapt, and sometimes surprise their creators.
-• <b>Quantum Computing:</b> New paradigms for solving currently intractable problems.
-• <b>No-Code/Low-Code:</b> Empowering more people to harness computational power.
-
----
-
-<b>Conclusion</b>
-
-From mechanical looms to neural networks, programming continues to redefine what’s possible. It’s not just for professional engineers: millions of people use programming as a tool for art, science, business, and personal growth.
-
-<b>Everyone can learn to code.</b> It might change your life—or even the world.
-
-<blockquote>"Any sufficiently advanced technology is indistinguishable from magic."
-— Arthur C. Clarke</blockquote>
-
----
-
-<b>Useful Resources</b>
-
-• <a href="https://www-cs-faculty.stanford.edu/~knuth/taocp.html">The Art of Computer Programming (Donald Knuth)</a>
-• <a href="https://www.khanacademy.org/computing/computer-programming">Khan Academy Computer Programming</a>
-• <a href="https://www.w3schools.com/">W3Schools Online Tutorials</a>
-• <a href="https://www.freecodecamp.org/">freeCodeCamp</a>
-• <a href="https://stackoverflow.com/">Stack Overflow</a>
-• <a href="https://guides.github.com/">GitHub Guides</a>
-
----
-
-<i>Thank you for reading! If you’re inspired to begin your coding journey, there has never been a better time to start.</i>
-
----"""
-
-valid_chunk_1 = """
-Absolutely! Here’s a Markdown-formatted message exceeding 5,000 characters, exploring <b>The History and Impact of Computer Programming</b>. (You can verify the character count using any online tool.)
-
----
-
-<b>The History and Impact of Computer Programming</b>
-
-<i>“The computer was born to solve problems that did not exist before.”</i>
-— Bill Gates
-
----
-
-<b>Table of Contents</b>
-
-1. <a href="#introduction">Introduction</a>
-2. <a href="#ancient-beginnings-from-algorithms-to-machines">Ancient Beginnings: From Algorithms to Machines</a>
-• <a href="#al-khwarizmi-and-the-algorithm">Al-Khwarizmi and the Algorithm</a>
-• <a href="#the-analytical-engine">The Analytical Engine</a>
-• <a href="#punch-cards-and-the-jacquard-loom">Punch Cards and the Jacquard Loom</a>
-3. <a href="#20th-century-the-birth-of-modern-programming">20th Century: The Birth of Modern Programming</a>
-• <a href="#eniac-and-early-programmers">ENIAC and Early Programmers</a>
-• <a href="#assembly-language-and-early-high-level-languages">Assembly Language and Early High-level Languages</a>
-• <a href="#cobol-fortran-and-the-expansion">COBOL, FORTRAN, and the Expansion</a>
-4. <a href="#modern-era-languages-paradigms-and-the-internet">Modern Era: Languages, Paradigms, and the Internet</a>
-• <a href="#object-oriented-programming">Object-Oriented Programming</a>
-• <a href="#internet-and-open-source">Internet and Open Source</a>
-• <a href="#mobile-and-cloud-computing">Mobile and Cloud Computing</a>
-5. <a href="#programmings-societal-impact">Programming’s Societal Impact</a>
-6. <a href="#ethics-challenges-and-the-future">Ethics, Challenges, and the Future</a>
-7. <a href="#conclusion">Conclusion</a>
-8. <a href="#useful-resources">Useful Resources</a>
-
----
-
-<b>Introduction</b>
-
-Computer programming is the science and art of giving computers instructions to perform specific tasks. Today, it's impossible to imagine a world without software: from banking systems and mobile applications to traffic lights and airplanes, programming is everywhere.
-
-But how did programming begin, and what has it become today? This document explores the journey of programming, from ancient mathematical roots to the future of artificial intelligence.
-
----
-
-<b>Ancient Beginnings: From Algorithms to Machines</b>
-
-<b>Al-Khwarizmi and the Algorithm</b>
-
-The term "<b>algorithm</b>" (the foundation of programming) comes from Abu Abdullah Muhammad ibn Musa Al-Khwarizmi, a 9th-century Persian mathematician. His works on systematic procedures laid the groundwork for computational thinking.
-
-<b>The Analytical Engine</b>
-
-In the 19th century, <b>Charles Babbage</b> designed the Analytical Engine, a mechanical general-purpose computer. Though never built in his lifetime, it could—in theory—read instructions from punched cards.
-
-<b>Ada Lovelace</b>, Babbage's collaborator, is often called the first computer programmer. She wrote notes describing algorithms (in essence, programs) for the Analytical Engine to compute Bernoulli numbers.
-
-<blockquote>"That brain of mine is something more than merely mortal; as time will show."
-– Ada Lovelace</blockquote>
-
-<b>Punch Cards and the Jacquard Loom</b>
-
-The concept of programming a machine with punched cards predates computers. <b>Joseph Marie Jacquard</b> invented a loom in 1804 that used punched cards to control patterns in woven fabric—an early example of machine automation.
-
----
-
-<b>20th Century: The Birth of Modern Programming</b>
-
-<b>ENIAC and Early Programmers</b>
-
-ENIAC (Electronic Numerical Integrator and Computer), completed in 1945, is often cited as the first electronic general-purpose computer.
-
-Early programming was entirely manual and physically laborious—think patch cables and switches!
-
-Notably, many of the earliest programmers were women, such as <b>Kathleen McNulty</b>, <b>Jean Jennings</b>, and <b>Grace Hopper</b>.
-
-<b>Assembly Language and Early High-level Languages</b>
-
-"""
-
-valid_chunk_2 = """The problem of complexity led to <b>assembly languages</b>, where mnemonics like <code>MOV</code> and <code>ADD</code> replaced binary codes. Programming became more accessible, but code was still hardware-specific.
-
-The 1950s saw the creation of:
-
-• <b>FORTRAN</b> (FORmula TRANslation) for scientific computation
-• <b>COBOL</b> (COmmon Business-Oriented Language) for business applications
-
-<b>Code Example: Hello World in COBOL</b>
-<pre><code class="language-cobol">IDENTIFICATION DIVISION.
-PROGRAM-ID. HELLO-WORLD.
-PROCEDURE DIVISION.
-DISPLAY "Hello, World!".
-STOP RUN.
-</code></pre>
-
-<b>COBOL, FORTRAN, and the Expansion</b>
-
-With the advent of high-level languages, programming became less about circuitry and more about solving problems. Standardized languages allowed code to run on multiple machines.
-
-Other languages soon emerged:
-
-• <b>LISP</b> (for AI research)
-• <b>ALGOL</b> (basis for many future languages)
-• <b>BASIC</b> (for beginners and education)
-
----
-
-<b>Modern Era: Languages, Paradigms, and the Internet</b>
-
-<b>Object-Oriented Programming</b>
-
-The 1970s and 1980s introduced <b>object-oriented programming</b> (OOP), where data and behavior are bundled together. The most influential languages here include:
-
-• <b>Smalltalk</b>: pioneered OOP concepts
-• <b>C++</b>: combined OOP with the efficiency of C
-• <b>Java</b>: “Write Once, Run Anywhere” with the Java Virtual Machine
-
-<b>Code Example: Simple Class in Java</b>
-<pre><code class="language-java">public class HelloWorld {
-public static void main(String[] args) {
-System.out.println("Hello, World!");
-}
-}
-</code></pre>
-
-<b>Internet and Open Source</b>
-
-The rise of the World Wide Web transformed programming. JavaScript, PHP, and Python became staples for Internet-connected software.
-
-<b>Open source</b> projects like Linux, Apache, and MySQL changed collaboration forever—developers worldwide could contribute to shared codebases.
-
-| Year | Technology | Impact |
-|------|-------------|-----------------------------------------|
-| 1991 | Linux | Free, open-source operating systems |
-| 1995 | JavaScript | Interactive web applications |
-| 2001 | Wikipedia | Collaborative knowledge base |
-
-<b>Mobile and Cloud Computing</b>
-
-Smartphones spawned new languages and frameworks (Swift, Kotlin, React Native).
-
-<b>Cloud computing</b> and <b>APIs</b> mean programs can collaborate on a global scale, in real-time.
-
----
-
-<b>Programming’s Societal Impact</b>
-
-Programming is reshaping society in profound ways:
-
-• <b>Healthcare</b>: Medical imaging, diagnostics, record management
-• <b>Finance</b>: Online banking, stock trading algorithms
-• <b>Entertainment</b>: Gaming, music streaming, social networks
-• <b>Education</b>: E-learning, interactive simulations, content platforms
-• <b>Transportation</b>: Navigation, ride-sharing apps, autonomous vehicles
-• <b>Science</b>: Processing large datasets, running complex simulations
-
-<blockquote>“Software is eating the world.”
-– Marc Andreessen</blockquote>
-
-<b>Programming Jobs In Demand</b>
-
-<pre><code class="language-mermaid">pie
-title Programming Job Market (2024)
-"Web Development" : 31
-"Data Science" : 22
-"Mobile Development" : 12
-"Embedded Systems" : 8
-"Cybersecurity" : 9
-"Other": 18
-</code></pre>"""
-
-valid_chunk_3 = """
-
-<b>Note:</b> Mermaid diagrams require compatible renderers (e.g., GitHub, Obsidian).
-
----
-
-<b>Ethics, Challenges, and the Future</b>
-
-With great power comes responsibility. Programmers face new challenges:
-
-• <b>Bias in Algorithms:</b> Unintentional biases in data can lead to unfair outcomes (e.g., in hiring software or criminal justice prediction).
-• <b>Privacy:</b> Handling personal data securely is more critical than ever.
-• <b>Safety:</b> In fields like self-driving cars or medical devices, software bugs can have real-world consequences.
-• <b>Sustainability:</b> Software should be efficient, minimizing environmental impact in data centers.
-
-<b>Emerging Trends:</b>
-
-• <b>Artificial Intelligence:</b> Programs that learn, adapt, and sometimes surprise their creators.
-• <b>Quantum Computing:</b> New paradigms for solving currently intractable problems.
-• <b>No-Code/Low-Code:</b> Empowering more people to harness computational power.
-
----
-
-<b>Conclusion</b>
-
-From mechanical looms to neural networks, programming continues to redefine what’s possible. It’s not just for professional engineers: millions of people use programming as a tool for art, science, business, and personal growth.
-
-<b>Everyone can learn to code.</b> It might change your life—or even the world.
-
-<blockquote>"Any sufficiently advanced technology is indistinguishable from magic."
-— Arthur C. Clarke</blockquote>
-
----
-
-<b>Useful Resources</b>
-
-• <a href="https://www-cs-faculty.stanford.edu/~knuth/taocp.html">The Art of Computer Programming (Donald Knuth)</a>
-• <a href="https://www.khanacademy.org/computing/computer-programming">Khan Academy Computer Programming</a>
-• <a href="https://www.w3schools.com/">W3Schools Online Tutorials</a>
-• <a href="https://www.freecodecamp.org/">freeCodeCamp</a>
-• <a href="https://stackoverflow.com/">Stack Overflow</a>
-• <a href="https://guides.github.com/">GitHub Guides</a>
-
----
-
-<i>Thank you for reading! If you’re inspired to begin your coding journey, there has never been a better time to start.</i>
-
----"""
-
-def test_splitter_test():
-    chunks = split_html_for_telegram(input_text)
-    valid_chunks = [valid_chunk_1, valid_chunk_2, valid_chunk_3]
-    assert chunks == valid_chunks
File without changes
{chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/converters.py
RENAMED
File without changes
{chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/formatters.py
RENAMED
File without changes
File without changes
{chatgpt_md_converter-0.3.7 → chatgpt_md_converter-0.3.8}/chatgpt_md_converter/telegram_formatter.py
RENAMED
File without changes
File without changes
File without changes
File without changes