py-text-toolkit 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 Dawood Afzal
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,262 @@
1
+ Metadata-Version: 2.4
2
+ Name: py-text-toolkit
3
+ Version: 0.1.0
4
+ Summary: A comprehensive string utility library for Python.
5
+ Author-email: Dawood Afzal <dawoodafzal.62138@gmail.com>
6
+ Classifier: Programming Language :: Python :: 3
7
+ Classifier: License :: OSI Approved :: MIT License
8
+ Classifier: Operating System :: OS Independent
9
+ Requires-Python: >=3.8
10
+ Description-Content-Type: text/markdown
11
+ License-File: LICENSE
12
+ Requires-Dist: emoji>=2.0.0
13
+ Dynamic: license-file
14
+
15
+ # py-text-toolkit
16
+
17
+ A lightweight, dependency-minimal Python library for everyday string operations — cleaning, validation, analysis, case conversion, and generation.
18
+
19
+ ---
20
+
21
+ ## Installation
22
+
23
+ ```bash
24
+ pip install py-text-toolkit
25
+ ```
26
+
27
+ > **Requires:** Python 3.8+
28
+ > **Dependency:** `emoji>=2.0.0` (declared in `Requires-Dist`, so pip installs it automatically; used only by `cleaning.remove_emojis`)
29
+
30
+ ---
31
+
32
+ ## Modules at a Glance
33
+
34
+ | Module | What it does |
35
+ |---|---|
36
+ | `py-text-toolkit.cleaning` | Strip, replace, and normalize raw text |
37
+ | `py-text-toolkit.validation` | Validate emails, URLs, passwords, and character sets |
38
+ | `py-text-toolkit.analysis` | Count, compare, and measure strings |
39
+ | `py-text-toolkit.format_cases` | Convert between naming conventions and formatting styles |
40
+ | `py-text-toolkit.generation` | Generate slugs, masks, ciphers, and reversed strings |
41
+
42
+ ---
43
+
44
+ ## Quick Start
45
+
46
+ ```python
47
+ from py-text-toolkit.cleaning import remove_html_tags, remove_urls
48
+ from py-text-toolkit.validation import is_email, is_strong_password
49
+ from py-text-toolkit.analysis import word_count, is_palindrome
50
+ from py-text-toolkit.format_cases import to_snake_case, to_camel_case
51
+ from py-text-toolkit.generation import generate_slug, mask_range
52
+
53
+ # Clean
54
+ remove_html_tags("<p>Hello <b>world</b></p>") # "Hello world"
55
+ remove_urls("Visit https://example.com today") # "Visit today"
56
+
57
+ # Validate
58
+ is_email("user@example.com") # True
59
+ is_strong_password("Passw0rd!") # True
60
+
61
+ # Analyse
62
+ word_count("Hello, world!") # 2
63
+ is_palindrome("A man a plan a canal Panama") # True
64
+
65
+ # Convert case
66
+ to_snake_case("camelCaseText") # "camel_case_text"
67
+ to_camel_case("hello_world") # "helloWorld"
68
+
69
+ # Generate
70
+ generate_slug("Hello World!") # "hello-world"
71
+ mask_range("1234-5678-9012", 5, 9, "*") # "1234-****-9012"
72
+ ```
73
+
74
+ ---
75
+
76
+ ## Module Reference
77
+
78
+ ### `py-text-toolkit.cleaning`
79
+
80
+ Functions for sanitising and normalising raw text.
81
+
82
+ | Function | Signature | Description |
83
+ |---|---|---|
84
+ | `normalize_whitespace` | `(text) → str` | Collapse all whitespace runs to a single space and strip ends |
85
+ | `remove_punctuation` | `(text, replace="") → str` | Remove or replace all punctuation characters |
86
+ | `remove_digits` | `(text, replace="") → str` | Remove or replace all digit characters |
87
+ | `remove_html_tags` | `(text, replace="") → str` | Strip or replace HTML tags |
88
+ | `remove_urls` | `(text, replace="") → str` | Remove or replace HTTP/HTTPS and `www.` URLs |
89
+ | `remove_emojis` | `(text, replace="") → str` | Remove or replace emoji characters (requires `emoji`) |
90
+ | `collapse_spaces` | `(text) → str` | Remove **all** whitespace, despite the name (use `normalize_whitespace` to collapse runs to a single space) |
91
+
92
+ Every `remove_*` function accepts an optional `replace` argument: the string substituted for each removed element (defaults to `""`). `normalize_whitespace` and `collapse_spaces` take only `text`. After replacement, the `remove_*` functions always normalize whitespace.
93
+
94
+ ```python
95
+ from py-text-toolkit.cleaning import remove_punctuation, remove_html_tags, remove_emojis
96
+
97
+ remove_punctuation("Hello, world!") # "Hello world"
98
+ remove_punctuation("Hello, world!", replace=" ") # "Hello world"
99
+
100
+ remove_html_tags("<p>Hello <b>world</b></p>") # "Hello world"
101
+ remove_html_tags("<br/>line1<br/>line2", replace=" ") # "line1 line2"
102
+
103
+ remove_emojis("Great job! 🎉") # "Great job!"
104
+ remove_emojis("Hello 😊", replace="[emoji]") # "Hello [emoji]"
105
+ ```
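The replace-then-normalize behaviour described above can be sketched with the standard library alone. The functions below are hypothetical stand-ins written for illustration, not the package's source; in particular, the tag pattern `<[^>]+>` and the use of `string.punctuation` are assumptions.

```python
import re
import string

# Assumed patterns; the package may define "punctuation" and "tag" differently.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")
_TAG = re.compile(r"<[^>]+>")


def normalize_whitespace_sketch(text: str) -> str:
    """Collapse every whitespace run to one space and strip both ends."""
    return " ".join(text.split())


def _replace_and_normalize(pattern: "re.Pattern[str]", text: str, replace: str) -> str:
    # A callable replacement keeps backslashes in `replace` literal.
    substituted = pattern.sub(lambda _match: replace, text)
    return normalize_whitespace_sketch(substituted)


def remove_punctuation_sketch(text: str, replace: str = "") -> str:
    """Remove (or replace) ASCII punctuation, then normalize whitespace."""
    return _replace_and_normalize(_PUNCT, text, replace)


def remove_html_tags_sketch(text: str, replace: str = "") -> str:
    """Remove (or replace) anything that looks like an HTML tag."""
    return _replace_and_normalize(_TAG, text, replace)
```

Because whitespace is normalized last, `replace=""` and `replace=" "` give the same result whenever the removed element already sits next to a space.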
106
+
107
+ ---
108
+
109
+ ### `py-text-toolkit.validation`
110
+
111
+ Boolean predicates for common string formats.
112
+
113
+ | Function | Signature | Description |
114
+ |---|---|---|
115
+ | `is_email` | `(text) → bool` | Check for a valid email address |
116
+ | `is_url` | `(text) → bool` | Check for a valid HTTP or HTTPS URL |
117
+ | `contains_only` | `(text, allowed_chars) → bool` | Check that every character is in the allowed set |
118
+ | `is_strong_password` | `(text) → bool` | Check that a password meets strength requirements |
119
+
120
+ **Password requirements** (`is_strong_password`):
121
+ - Minimum 8 characters
122
+ - At least one lowercase letter
123
+ - At least one uppercase letter
124
+ - At least one digit
125
+ - At least one special character from `@$!%*?&`
126
+
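The five requirements listed above map naturally onto regex lookaheads. This is a minimal sketch under stated assumptions: `is_strong_password_sketch` is a hypothetical name, and the README does not say whether characters outside the listed classes are allowed, so this version permits them.

```python
import re

# One lookahead per requirement, then a minimum length of 8.
_STRONG = re.compile(
    r"(?=.*[a-z])"       # at least one lowercase letter
    r"(?=.*[A-Z])"       # at least one uppercase letter
    r"(?=.*[0-9])"       # at least one digit
    r"(?=.*[@$!%*?&])"   # at least one special character from the listed set
    r".{8,}"             # minimum 8 characters (any character; an assumption)
)


def is_strong_password_sketch(text: str) -> bool:
    """Return True when `text` satisfies all five listed requirements."""
    return _STRONG.fullmatch(text) is not None
```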
127
+ ```python
128
+ from py-text-toolkit.validation import is_email, is_url, contains_only, is_strong_password
129
+
130
+ is_email("user@example.com") # True
131
+ is_email("not-an-email") # False
132
+
133
+ is_url("https://api.service.io/v1") # True
134
+ is_url("ftp://files.example.com") # False
135
+
136
+ contains_only("12345", "0123456789") # True
137
+ contains_only("hello!", "a-z") # False (literal chars only, not a range)
138
+
139
+ is_strong_password("Passw0rd!") # True
140
+ is_strong_password("weakpass") # False
141
+ ```
142
+
143
+ > **Note on `contains_only`:** `allowed_chars` is treated as a set of literal characters. Special regex characters are escaped automatically, so `"a-z"` matches only the three characters `a`, `-`, and `z`, **not** a range.
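The literal-set behaviour in the note can be reproduced with `re.escape`, which neutralises `-`, `]`, `^`, and `\` before the characters are placed in a character class. The function name below is hypothetical, and treating an empty `allowed_chars` as "only the empty string passes" is an assumption rather than documented behaviour.

```python
import re


def contains_only_sketch(text: str, allowed_chars: str) -> bool:
    """Return True if every character of `text` appears in `allowed_chars`."""
    if not allowed_chars:
        # An empty character class is not valid regex, so handle it directly.
        return text == ""
    # re.escape turns "a-z" into "a\\-z", so the hyphen is literal, not a range.
    pattern = "[" + re.escape(allowed_chars) + "]*"
    return re.fullmatch(pattern, text) is not None
```

With this approach `contains_only_sketch("abc", "a-z")` is `False`, because `b` and `c` are not among the three literal characters.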
144
+
145
+ ---
146
+
147
+ ### `py-text-toolkit.analysis`
148
+
149
+ Functions that measure and compare strings.
150
+
151
+ | Function | Signature | Description |
152
+ |---|---|---|
153
+ | `word_count` | `(text) → int` | Count words using regex word-boundary matching |
154
+ | `char_frequency` | `(text, char) → int` | Count non-overlapping occurrences of a character or substring |
155
+ | `count_vowels` | `(text) → int` | Count English vowels (a e i o u), case-insensitive |
156
+ | `longest_word` | `(text) → int` | Return the length of the longest whitespace-delimited word |
157
+ | `is_palindrome` | `(text, case_sensitive=False, ignore_formatting=True) → bool` | Check if a string is a palindrome |
158
+ | `is_anagram` | `(word1, word2) → bool` | Check if two strings are anagrams (case-insensitive, ignores spaces) |
159
+
160
+ ```python
161
+ from py-text-toolkit.analysis import word_count, is_palindrome, is_anagram, char_frequency
162
+
163
+ word_count("Hello, world!") # 2
164
+ word_count(" spaces everywhere ") # 2
165
+
166
+ char_frequency("banana", "an") # 2
167
+
168
+ is_palindrome("racecar") # True
169
+ is_palindrome("A man a plan a canal Panama") # True
170
+ is_palindrome("Racecar", case_sensitive=True) # False
171
+
172
+ is_anagram("listen", "silent") # True
173
+ is_anagram("Astronomer", "Moon starer") # True
174
+ ```
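The palindrome and anagram checks above can be sketched in a few lines. These are hypothetical stand-ins, not the package's code; the reading of `ignore_formatting` as "drop every non-alphanumeric character" is an assumption inferred from the examples.

```python
from collections import Counter


def is_palindrome_sketch(
    text: str, case_sensitive: bool = False, ignore_formatting: bool = True
) -> bool:
    """Check whether `text` reads the same forwards and backwards."""
    if ignore_formatting:
        # Assumed meaning: keep letters and digits only.
        text = "".join(ch for ch in text if ch.isalnum())
    if not case_sensitive:
        text = text.lower()
    return text == text[::-1]


def is_anagram_sketch(word1: str, word2: str) -> bool:
    """Compare letter counts, ignoring case and whitespace."""
    def letters(word: str) -> Counter:
        return Counter("".join(word.split()).lower())

    return letters(word1) == letters(word2)
```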
175
+
176
+ ---
177
+
178
+ ### `py-text-toolkit.format_cases`
179
+
180
+ Convert strings between naming conventions and apply text formatting.
181
+
182
+ | Function | Signature | Description |
183
+ |---|---|---|
184
+ | `to_snake_case` | `(text) → str` | Convert to `snake_case` |
185
+ | `to_camel_case` | `(text) → str` | Convert to `camelCase` |
186
+ | `to_pascal_case` | `(text) → str` | Convert to `PascalCase` |
187
+ | `to_kebab_case` | `(text) → str` | Convert to `kebab-case` |
188
+ | `to_title_case` | `(text) → str` | Convert to `Title Case` |
189
+ | `truncate` | `(text, max_length, suffix="...") → str` | Truncate to a maximum length with a suffix |
190
+ | `pad_center` | `(text, width, fillchar=" ") → str` | Center-pad to a given width |
191
+
192
+ All case converters handle mixed input (camelCase, PascalCase, snake_case, kebab-case, spaces).
193
+
194
+ ```python
195
+ from py-text-toolkit.format_cases import to_snake_case, to_camel_case, truncate, pad_center
196
+
197
+ to_snake_case("camelCaseText") # "camel_case_text"
198
+ to_snake_case("Hello World!") # "hello_world"
199
+
200
+ to_camel_case("hello_world") # "helloWorld"
201
+ to_camel_case("PascalCaseText") # "pascalCaseText"
202
+
203
+ to_pascal_case("kebab-case-text") # "KebabCaseText"
204
+ to_kebab_case("camelCaseText") # "camel-case-text"
205
+ to_title_case("hello_world") # "Hello World"
206
+
207
+ truncate("Hello, World!", 8) # "Hello..."
208
+ truncate("Hi", 10) # "Hi"
209
+
210
+ pad_center("hello", 11) # "   hello   "
211
+ pad_center("hi", 10, "-") # "----hi----"
212
+ ```
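One way to handle mixed input, as the converters above are said to do, is to split the text into words first and then join the words in the target style. The sketch below makes that idea concrete; `split_words` and the `*_sketch` names are hypothetical, and the splitting rule (a lowercase letter or digit followed by an uppercase letter starts a new word) is an assumption, not the package's documented algorithm.

```python
import re
from typing import List


def split_words(text: str) -> List[str]:
    """Split camelCase, PascalCase, snake_case, kebab-case, or spaced text."""
    # Insert a space at each lower/digit -> upper boundary: "camelCase" -> "camel Case".
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Everything that is not a letter or digit acts as a separator.
    return re.findall(r"[A-Za-z0-9]+", spaced)


def _capitalize(word: str) -> str:
    return word[:1].upper() + word[1:].lower()


def to_snake_case_sketch(text: str) -> str:
    return "_".join(word.lower() for word in split_words(text))


def to_camel_case_sketch(text: str) -> str:
    words = split_words(text)
    if not words:
        return ""
    return words[0].lower() + "".join(_capitalize(w) for w in words[1:])


def to_pascal_case_sketch(text: str) -> str:
    return "".join(_capitalize(w) for w in split_words(text))
```

Splitting once and joining per style keeps each converter to a single line and makes the treatment of separators consistent across styles.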
213
+
214
+ ---
215
+
216
+ ### `py-text-toolkit.generation`
217
+
218
+ Functions that produce new strings from existing ones.
219
+
220
+ | Function | Signature | Description |
221
+ |---|---|---|
222
+ | `generate_slug` | `(text) → str` | Convert to a URL-friendly slug |
223
+ | `reverse_word` | `(text) → str` | Reverse all characters |
224
+ | `mask_range` | `(text, start_index, end_index, placeholder="X") → str` | Mask a character range with a placeholder |
225
+ | `ceasar_cipher` | `(text, shift) → str` | Encrypt with a Caesar cipher; a negative `shift` decrypts (note the `ceasar` spelling of the function name) |
226
+
227
+ ```python
228
+ from py-text-toolkit.generation import generate_slug, mask_range, ceasar_cipher, reverse_word
229
+
230
+ generate_slug("Hello World!") # "hello-world"
231
+ generate_slug("Python 3.11 -- Release Notes") # "python-3-11-release-notes"
232
+
233
+ reverse_word("hello") # "olleh"
234
+
235
+ mask_range("1234-5678-9012", 5, 9, "*") # "1234-****-9012"
236
+ mask_range("secret", -3, -1) # "secXXt"
237
+
238
+ ceasar_cipher("Hello, World!", 3) # "Khoor, Zruog!"
239
+ ceasar_cipher("Khoor, Zruog!", -3) # "Hello, World!" (decrypt)
240
+ ```
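The `mask_range` and `ceasar_cipher` examples above are consistent with Python slice semantics and a 26-letter rotation respectively. The sketch below reproduces those outputs; both functions are hypothetical stand-ins, and the choices to leave non-ASCII-letter characters untouched and to return the text unchanged for an empty range are assumptions.

```python
def mask_range_sketch(
    text: str, start_index: int, end_index: int, placeholder: str = "X"
) -> str:
    """Mask text[start_index:end_index], with negative indices as in slicing."""
    # slice.indices clamps the bounds and resolves negative indices for us.
    start, stop, _step = slice(start_index, end_index).indices(len(text))
    if stop <= start:
        return text  # empty range: nothing to mask (an assumption)
    return text[:start] + placeholder * (stop - start) + text[stop:]


def caesar_cipher_sketch(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift`; a negative shift reverses the rotation."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            result.append(ch)  # digits, punctuation, and spaces pass through
            continue
        # Python's % always returns a non-negative value, so negative shifts work.
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(result)
```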
241
+
242
+ ---
243
+
244
+ ## Dependencies
245
+
246
+ | Package | Required | Used by |
247
+ |---|---|---|
248
+ | `re` (stdlib) | Always | All modules |
249
+ | `string` (stdlib) | Always | `cleaning` |
250
+ | `emoji` (>=2.0.0) | Always (declared in `Requires-Dist`) | `cleaning.remove_emojis` only |
251
+
252
+ No extra is declared in the package metadata, so a plain install pulls in `emoji` as well:
253
+
254
+ ```bash
255
+ pip install py-text-toolkit
256
+ ```
257
+
258
+ ---
259
+
260
+ ## License
261
+
262
+ MIT License — see [LICENSE](LICENSE) for details.
@@ -0,0 +1,248 @@
1
+ # py-text-toolkit
2
+
3
+ A lightweight, dependency-minimal Python library for everyday string operations — cleaning, validation, analysis, case conversion, and generation.
4
+
5
+ ---
6
+
7
+ ## Installation
8
+
9
+ ```bash
10
+ pip install py-text-toolkit
11
+ ```
12
+
13
+ > **Requires:** Python 3.8+
14
+ > **Dependency:** `emoji>=2.0.0` (declared in `Requires-Dist`, so pip installs it automatically; used only by `cleaning.remove_emojis`)
15
+
16
+ ---
17
+
18
+ ## Modules at a Glance
19
+
20
+ | Module | What it does |
21
+ |---|---|
22
+ | `py-text-toolkit.cleaning` | Strip, replace, and normalize raw text |
23
+ | `py-text-toolkit.validation` | Validate emails, URLs, passwords, and character sets |
24
+ | `py-text-toolkit.analysis` | Count, compare, and measure strings |
25
+ | `py-text-toolkit.format_cases` | Convert between naming conventions and formatting styles |
26
+ | `py-text-toolkit.generation` | Generate slugs, masks, ciphers, and reversed strings |
27
+
28
+ ---
29
+
30
+ ## Quick Start
31
+
32
+ ```python
33
+ from py-text-toolkit.cleaning import remove_html_tags, remove_urls
34
+ from py-text-toolkit.validation import is_email, is_strong_password
35
+ from py-text-toolkit.analysis import word_count, is_palindrome
36
+ from py-text-toolkit.format_cases import to_snake_case, to_camel_case
37
+ from py-text-toolkit.generation import generate_slug, mask_range
38
+
39
+ # Clean
40
+ remove_html_tags("<p>Hello <b>world</b></p>") # "Hello world"
41
+ remove_urls("Visit https://example.com today") # "Visit today"
42
+
43
+ # Validate
44
+ is_email("user@example.com") # True
45
+ is_strong_password("Passw0rd!") # True
46
+
47
+ # Analyse
48
+ word_count("Hello, world!") # 2
49
+ is_palindrome("A man a plan a canal Panama") # True
50
+
51
+ # Convert case
52
+ to_snake_case("camelCaseText") # "camel_case_text"
53
+ to_camel_case("hello_world") # "helloWorld"
54
+
55
+ # Generate
56
+ generate_slug("Hello World!") # "hello-world"
57
+ mask_range("1234-5678-9012", 5, 9, "*") # "1234-****-9012"
58
+ ```
59
+
60
+ ---
61
+
62
+ ## Module Reference
63
+
64
+ ### `py-text-toolkit.cleaning`
65
+
66
+ Functions for sanitising and normalising raw text.
67
+
68
+ | Function | Signature | Description |
69
+ |---|---|---|
70
+ | `normalize_whitespace` | `(text) → str` | Collapse all whitespace runs to a single space and strip ends |
71
+ | `remove_punctuation` | `(text, replace="") → str` | Remove or replace all punctuation characters |
72
+ | `remove_digits` | `(text, replace="") → str` | Remove or replace all digit characters |
73
+ | `remove_html_tags` | `(text, replace="") → str` | Strip or replace HTML tags |
74
+ | `remove_urls` | `(text, replace="") → str` | Remove or replace HTTP/HTTPS and `www.` URLs |
75
+ | `remove_emojis` | `(text, replace="") → str` | Remove or replace emoji characters (requires `emoji`) |
76
+ | `collapse_spaces` | `(text) → str` | Remove **all** whitespace, despite the name (use `normalize_whitespace` to collapse runs to a single space) |
77
+
78
+ Every `remove_*` function accepts an optional `replace` argument: the string substituted for each removed element (defaults to `""`). `normalize_whitespace` and `collapse_spaces` take only `text`. After replacement, the `remove_*` functions always normalize whitespace.
79
+
80
+ ```python
81
+ from py-text-toolkit.cleaning import remove_punctuation, remove_html_tags, remove_emojis
82
+
83
+ remove_punctuation("Hello, world!") # "Hello world"
84
+ remove_punctuation("Hello, world!", replace=" ") # "Hello world"
85
+
86
+ remove_html_tags("<p>Hello <b>world</b></p>") # "Hello world"
87
+ remove_html_tags("<br/>line1<br/>line2", replace=" ") # "line1 line2"
88
+
89
+ remove_emojis("Great job! 🎉") # "Great job!"
90
+ remove_emojis("Hello 😊", replace="[emoji]") # "Hello [emoji]"
91
+ ```
92
+
93
+ ---
94
+
95
+ ### `py-text-toolkit.validation`
96
+
97
+ Boolean predicates for common string formats.
98
+
99
+ | Function | Signature | Description |
100
+ |---|---|---|
101
+ | `is_email` | `(text) → bool` | Check for a valid email address |
102
+ | `is_url` | `(text) → bool` | Check for a valid HTTP or HTTPS URL |
103
+ | `contains_only` | `(text, allowed_chars) → bool` | Check that every character is in the allowed set |
104
+ | `is_strong_password` | `(text) → bool` | Check that a password meets strength requirements |
105
+
106
+ **Password requirements** (`is_strong_password`):
107
+ - Minimum 8 characters
108
+ - At least one lowercase letter
109
+ - At least one uppercase letter
110
+ - At least one digit
111
+ - At least one special character from `@$!%*?&`
112
+
113
+ ```python
114
+ from py-text-toolkit.validation import is_email, is_url, contains_only, is_strong_password
115
+
116
+ is_email("user@example.com") # True
117
+ is_email("not-an-email") # False
118
+
119
+ is_url("https://api.service.io/v1") # True
120
+ is_url("ftp://files.example.com") # False
121
+
122
+ contains_only("12345", "0123456789") # True
123
+ contains_only("hello!", "a-z") # False (literal chars only, not a range)
124
+
125
+ is_strong_password("Passw0rd!") # True
126
+ is_strong_password("weakpass") # False
127
+ ```
128
+
129
+ > **Note on `contains_only`:** `allowed_chars` is treated as a set of literal characters. Special regex characters are escaped automatically, so `"a-z"` matches only the three characters `a`, `-`, and `z`, **not** a range.
130
+
131
+ ---
132
+
133
+ ### `py-text-toolkit.analysis`
134
+
135
+ Functions that measure and compare strings.
136
+
137
+ | Function | Signature | Description |
138
+ |---|---|---|
139
+ | `word_count` | `(text) → int` | Count words using regex word-boundary matching |
140
+ | `char_frequency` | `(text, char) → int` | Count non-overlapping occurrences of a character or substring |
141
+ | `count_vowels` | `(text) → int` | Count English vowels (a e i o u), case-insensitive |
142
+ | `longest_word` | `(text) → int` | Return the length of the longest whitespace-delimited word |
143
+ | `is_palindrome` | `(text, case_sensitive=False, ignore_formatting=True) → bool` | Check if a string is a palindrome |
144
+ | `is_anagram` | `(word1, word2) → bool` | Check if two strings are anagrams (case-insensitive, ignores spaces) |
145
+
146
+ ```python
147
+ from py-text-toolkit.analysis import word_count, is_palindrome, is_anagram, char_frequency
148
+
149
+ word_count("Hello, world!") # 2
150
+ word_count(" spaces everywhere ") # 2
151
+
152
+ char_frequency("banana", "an") # 2
153
+
154
+ is_palindrome("racecar") # True
155
+ is_palindrome("A man a plan a canal Panama") # True
156
+ is_palindrome("Racecar", case_sensitive=True) # False
157
+
158
+ is_anagram("listen", "silent") # True
159
+ is_anagram("Astronomer", "Moon starer") # True
160
+ ```
161
+
162
+ ---
163
+
164
+ ### `py-text-toolkit.format_cases`
165
+
166
+ Convert strings between naming conventions and apply text formatting.
167
+
168
+ | Function | Signature | Description |
169
+ |---|---|---|
170
+ | `to_snake_case` | `(text) → str` | Convert to `snake_case` |
171
+ | `to_camel_case` | `(text) → str` | Convert to `camelCase` |
172
+ | `to_pascal_case` | `(text) → str` | Convert to `PascalCase` |
173
+ | `to_kebab_case` | `(text) → str` | Convert to `kebab-case` |
174
+ | `to_title_case` | `(text) → str` | Convert to `Title Case` |
175
+ | `truncate` | `(text, max_length, suffix="...") → str` | Truncate to a maximum length with a suffix |
176
+ | `pad_center` | `(text, width, fillchar=" ") → str` | Center-pad to a given width |
177
+
178
+ All case converters handle mixed input (camelCase, PascalCase, snake_case, kebab-case, spaces).
179
+
180
+ ```python
181
+ from py-text-toolkit.format_cases import to_snake_case, to_camel_case, truncate, pad_center
182
+
183
+ to_snake_case("camelCaseText") # "camel_case_text"
184
+ to_snake_case("Hello World!") # "hello_world"
185
+
186
+ to_camel_case("hello_world") # "helloWorld"
187
+ to_camel_case("PascalCaseText") # "pascalCaseText"
188
+
189
+ to_pascal_case("kebab-case-text") # "KebabCaseText"
190
+ to_kebab_case("camelCaseText") # "camel-case-text"
191
+ to_title_case("hello_world") # "Hello World"
192
+
193
+ truncate("Hello, World!", 8) # "Hello..."
194
+ truncate("Hi", 10) # "Hi"
195
+
196
+ pad_center("hello", 11) # "   hello   "
197
+ pad_center("hi", 10, "-") # "----hi----"
198
+ ```
199
+
200
+ ---
201
+
202
+ ### `py-text-toolkit.generation`
203
+
204
+ Functions that produce new strings from existing ones.
205
+
206
+ | Function | Signature | Description |
207
+ |---|---|---|
208
+ | `generate_slug` | `(text) → str` | Convert to a URL-friendly slug |
209
+ | `reverse_word` | `(text) → str` | Reverse all characters |
210
+ | `mask_range` | `(text, start_index, end_index, placeholder="X") → str` | Mask a character range with a placeholder |
211
+ | `ceasar_cipher` | `(text, shift) → str` | Encrypt with a Caesar cipher; a negative `shift` decrypts (note the `ceasar` spelling of the function name) |
212
+
213
+ ```python
214
+ from py-text-toolkit.generation import generate_slug, mask_range, ceasar_cipher, reverse_word
215
+
216
+ generate_slug("Hello World!") # "hello-world"
217
+ generate_slug("Python 3.11 -- Release Notes") # "python-3-11-release-notes"
218
+
219
+ reverse_word("hello") # "olleh"
220
+
221
+ mask_range("1234-5678-9012", 5, 9, "*") # "1234-****-9012"
222
+ mask_range("secret", -3, -1) # "secXXt"
223
+
224
+ ceasar_cipher("Hello, World!", 3) # "Khoor, Zruog!"
225
+ ceasar_cipher("Khoor, Zruog!", -3) # "Hello, World!" (decrypt)
226
+ ```
227
+
228
+ ---
229
+
230
+ ## Dependencies
231
+
232
+ | Package | Required | Used by |
233
+ |---|---|---|
234
+ | `re` (stdlib) | Always | All modules |
235
+ | `string` (stdlib) | Always | `cleaning` |
236
+ | `emoji` (>=2.0.0) | Always (declared in `Requires-Dist`) | `cleaning.remove_emojis` only |
236
+
237
+ No extra is declared in the package metadata, so a plain install pulls in `emoji` as well:
238
+
239
+ ```bash
240
+ pip install py-text-toolkit
242
+ ```
243
+
244
+ ---
245
+
246
+ ## License
247
+
248
+ MIT License — see [LICENSE](LICENSE) for details.