aize-0.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
aize-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Emmanuel Okoaze
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
aize-0.1.0/MANIFEST.in ADDED
@@ -0,0 +1,4 @@
+ include README.md
+ include LICENSE
+ include requirements.txt
+ recursive-include aize *.py
aize-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,433 @@
+ Metadata-Version: 2.4
+ Name: aize
+ Version: 0.1.0
+ Summary: aize — lightweight NLP analysis toolkit (Zipf, Heap's law, TF-IDF, sentiment, readability & more)
+ Author: eokoaze
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/eokoaze/aize
+ Project-URL: Repository, https://github.com/eokoaze/aize
+ Project-URL: Bug Tracker, https://github.com/eokoaze/aize/issues
+ Keywords: nlp,natural-language-processing,text-analysis,zipf,tfidf,sentiment,readability,wordcloud
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Topic :: Text Processing :: Linguistic
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: nltk>=3.8
+ Requires-Dist: scikit-learn>=1.2
+ Requires-Dist: wordcloud>=1.9
+ Requires-Dist: pandas>=1.5
+ Provides-Extra: dashboard
+ Requires-Dist: streamlit>=1.28; extra == "dashboard"
+ Requires-Dist: plotly>=5.0; extra == "dashboard"
+ Requires-Dist: Pillow>=9.0; extra == "dashboard"
+ Provides-Extra: api
+ Requires-Dist: fastapi>=0.100; extra == "api"
+ Requires-Dist: uvicorn>=0.23; extra == "api"
+ Requires-Dist: python-multipart>=0.0.6; extra == "api"
+ Provides-Extra: all
+ Requires-Dist: aize[dashboard]; extra == "all"
+ Requires-Dist: aize[api]; extra == "all"
+ Provides-Extra: dev
+ Requires-Dist: aize[all]; extra == "dev"
+ Requires-Dist: build>=1.0; extra == "dev"
+ Requires-Dist: twine>=5.0; extra == "dev"
+ Dynamic: license-file
+
+ # aize · NLP Analysis Toolkit
+
+ [![PyPI version](https://img.shields.io/pypi/v/aize.svg)](https://pypi.org/project/aize/)
+ [![Python](https://img.shields.io/pypi/pyversions/aize.svg)](https://pypi.org/project/aize/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ > A lightweight, pip-installable Python library for deep text analysis — covering everything from Zipf's law to sentiment, readability, TF-IDF, and more. Comes with a Streamlit dashboard and a FastAPI backend out of the box.
+
+ ---
+
+ ## Table of Contents
+
+ - [Features](#features)
+ - [Installation](#installation)
+ - [Quick Start](#quick-start)
+ - [Module Reference](#module-reference)
+   - [compute_stats](#compute_stats)
+   - [analyze_groupwords](#analyze_groupwords)
+   - [analyze_zipf](#analyze_zipf)
+   - [analyze_heaps](#analyze_heaps)
+   - [calculate_density](#calculate_density)
+   - [compare_vocab](#compare_vocab)
+   - [compute_tfidf](#compute_tfidf)
+   - [compute_ngrams](#compute_ngrams)
+   - [analyze_sentiment](#analyze_sentiment)
+   - [compute_readability](#compute_readability)
+   - [analyze_pos](#analyze_pos)
+   - [generate_wordcloud](#generate_wordcloud)
+ - [Streamlit Dashboard](#streamlit-dashboard)
+ - [FastAPI Backend](#fastapi-backend)
+ - [Dependencies](#dependencies)
+ - [Project Structure](#project-structure)
+ - [License](#license)
+
+ ---
+
+ ## Features
+
+ | Category | Capability |
+ |---|---|
+ | 📊 **Statistics** | Word count, unique words, avg word length, sentence count |
+ | 📏 **Word Grouping** | Frequency distribution grouped by word length |
+ | 📉 **Zipf's Law** | Rank-frequency distribution, hapax & dis legomena percentages |
+ | 📈 **Heaps' Law** | Vocabulary growth curve as corpus size increases |
+ | 🚫 **Stopwords** | Stopword density analysis |
+ | 🔤 **Vocabulary** | Side-by-side vocabulary comparison across multiple texts |
+ | 🔍 **TF-IDF** | Top keyword extraction per document in a corpus |
+ | 🔗 **N-grams** | Most common bigrams and trigrams |
+ | 💬 **Sentiment** | VADER-based positive / negative / neutral / compound scoring |
+ | 📖 **Readability** | Flesch Reading Ease & Flesch-Kincaid Grade Level |
+ | 🏷️ **POS Tagging** | Part-of-speech frequency breakdown |
+ | ☁️ **Word Cloud** | Generates word cloud images from any text |
+ | 🖥️ **Dashboard** | Interactive Streamlit UI for all analyses |
+ | ⚡ **API** | FastAPI REST backend for programmatic access |
+
+ ---
+
+ ## Installation
+
+ ### Core library
+
+ ```bash
+ pip install aize
+ ```
+
+ ### With the Streamlit dashboard
+
+ ```bash
+ pip install aize[dashboard]
+ ```
+
+ ### With the FastAPI backend
+
+ ```bash
+ pip install aize[api]
+ ```
+
+ ### Everything (dashboard + API)
+
+ ```bash
+ pip install aize[all]
+ ```
+
+ ### From source (development)
+
+ ```bash
+ git clone https://github.com/eokoaze/aize.git
+ cd aize
+ pip install -e .[all]
+ ```
+
+ > **Python 3.9+** is required.
+
+ ---
+
+ ## Quick Start
+
+ ```python
+ import aize
+
+ text = """
+ Natural language processing is a subfield of linguistics and artificial intelligence.
+ It is primarily concerned with giving computers the ability to understand text and speech.
+ """
+
+ # Basic stats
+ print(aize.compute_stats(text))
+
+ # Sentiment
+ print(aize.analyze_sentiment(text))
+
+ # Readability
+ print(aize.compute_readability(text))
+
+ # Zipf's Law
+ print(aize.analyze_zipf(text))
+ ```
+
+ ---
+
+ ## Module Reference
+
+ ### `compute_stats`
+
+ ```python
+ from aize import compute_stats
+
+ result = compute_stats(text)
+ ```
+
+ Returns basic corpus statistics.
+
+ | Key | Type | Description |
+ |---|---|---|
+ | `word_count` | `int` | Total number of words |
+ | `unique_words` | `int` | Number of distinct words |
+ | `avg_word_length` | `float` | Average characters per word |
+ | `sentence_count` | `int` | Number of sentences |
+
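+ The documented keys can be combined directly for quick corpus summaries. A minimal sketch, assuming the dict-style return shown above:
+
+ ```python
+ from aize import compute_stats
+
+ stats = compute_stats(text)
+
+ # Lexical diversity: distinct words as a share of all words
+ diversity = stats["unique_words"] / stats["word_count"]
+ print(f"{stats['word_count']} words, {diversity:.1%} unique, "
+       f"avg {stats['avg_word_length']:.1f} chars/word, {stats['sentence_count']} sentences")
+ ```
+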
+ ---
+
+ ### `analyze_groupwords`
+
+ ```python
+ from aize import analyze_groupwords
+
+ result = analyze_groupwords(text)
+ ```
+
+ Groups words by their character length and returns frequency counts per length bucket.
+
+ ---
+
+ ### `analyze_zipf`
+
+ ```python
+ from aize import analyze_zipf
+
+ result = analyze_zipf(text)
+ ```
+
+ Computes Zipf's Law statistics over the text.
+
+ | Key | Type | Description |
+ |---|---|---|
+ | `frequency` | `dict` | `{word: count}` sorted most → least frequent |
+ | `rank_freq` | `list[tuple]` | `[(rank, count)]` for rank-frequency plotting |
+ | `hapax_pct` | `float` | % of words appearing exactly once |
+ | `dis_pct` | `float` | % of words appearing exactly twice |
+ | `freq_gt2_pct` | `float` | % of words appearing more than twice |
+
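+ Zipf's law predicts that the count of the rank-`r` word falls off roughly as `1/r`, so the `rank_freq` pairs should lie close to a straight line of slope `-1` on a log-log plot. A minimal sketch (not part of the library) that estimates the exponent with an ordinary log-log least-squares fit:
+
+ ```python
+ import math
+ from aize import analyze_zipf
+
+ result = analyze_zipf(text)
+ points = [(r, c) for r, c in result["rank_freq"] if r > 0 and c > 0]
+
+ # Fit log(count) ≈ log(C) - s * log(rank); s is close to 1 for Zipfian text
+ xs = [math.log(r) for r, _ in points]
+ ys = [math.log(c) for _, c in points]
+ mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
+ num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
+ den = sum((x - mx) ** 2 for x in xs)
+ print(f"Estimated Zipf exponent: {-num / den:.2f}")
+ ```
+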
+ ---
+
+ ### `analyze_heaps`
+
+ ```python
+ from aize import analyze_heaps
+
+ result = analyze_heaps(text)
+ ```
+
+ Returns a vocabulary growth curve (Heaps' Law). Useful for visualising how the vocabulary expands as more text is read.
+
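+ Heaps' law says the number of distinct words `V` grows roughly as `K * N**beta` for `N` tokens read, with `beta` typically around 0.4 to 0.6 for natural language. A short illustration of the law itself; the constants below are illustrative, not values returned by `analyze_heaps`:
+
+ ```python
+ # Heaps' law: V(N) ≈ K * N**beta
+ def heaps_vocab(n_tokens: int, k: float = 10.0, beta: float = 0.5) -> float:
+     """Predicted vocabulary size after reading n_tokens words (illustrative constants)."""
+     return k * n_tokens ** beta
+
+ for n in (1_000, 10_000, 100_000, 1_000_000):
+     print(f"{n:>9,} tokens -> ~{heaps_vocab(n):,.0f} distinct words")
+ ```
+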
+ ---
+
+ ### `calculate_density`
+
+ ```python
+ from aize import calculate_density
+
+ result = calculate_density(text)
+ ```
+
+ Calculates the proportion of stopwords in the text, returning a stopword density percentage and associated word lists.
+
+ ---
+
+ ### `compare_vocab`
+
+ ```python
+ from aize import compare_vocab
+
+ result = compare_vocab({"doc1": text1, "doc2": text2})
+ ```
+
+ Compares vocabulary across multiple documents — unique words per document, shared vocabulary, and overlap statistics.
+
+ ---
+
+ ### `compute_tfidf`
+
+ ```python
+ from aize import compute_tfidf
+
+ result = compute_tfidf(
+     texts=["text of doc1...", "text of doc2..."],
+     labels=["doc1", "doc2"],
+     top_n=15
+ )
+ # Returns: {"doc1": [("word", score), ...], "doc2": [...]}
+ ```
+
+ Extracts the top `top_n` TF-IDF keywords for each document in a corpus. Uses scikit-learn under the hood with English stopword filtering.
+
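+ Because the return value maps each label to a ranked list of `(word, score)` pairs, per-document keyword summaries are a short loop away. A small sketch reusing the call above:
+
+ ```python
+ from aize import compute_tfidf
+
+ keywords = compute_tfidf(
+     texts=["text of doc1...", "text of doc2..."],
+     labels=["doc1", "doc2"],
+     top_n=5,
+ )
+ for label, pairs in keywords.items():
+     summary = ", ".join(f"{word} ({score:.2f})" for word, score in pairs)
+     print(f"{label}: {summary}")
+ ```
+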
+ ---
+
+ ### `compute_ngrams`
+
+ ```python
+ from aize import compute_ngrams
+
+ bigrams = compute_ngrams(text, n=2, top_n=20)
+ trigrams = compute_ngrams(text, n=3, top_n=20)
+ # Returns: [("phrase here", count), ...]
+ ```
+
+ Returns the most frequent n-grams (bigrams, trigrams, etc.) from the text.
+
+ ---
+
+ ### `analyze_sentiment`
+
+ ```python
+ from aize import analyze_sentiment
+
+ result = analyze_sentiment(text)
+ ```
+
+ Runs VADER sentiment analysis. NLTK's `vader_lexicon` is auto-downloaded on first use.
+
+ | Key | Type | Description |
+ |---|---|---|
+ | `positive` | `float` | Proportion of positive sentiment |
+ | `negative` | `float` | Proportion of negative sentiment |
+ | `neutral` | `float` | Proportion of neutral sentiment |
+ | `compound` | `float` | Overall score from `-1.0` (most negative) to `+1.0` (most positive) |
+ | `label` | `str` | `"Positive"`, `"Negative"`, or `"Neutral"` |
+
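+ For a single-number polarity check, the `compound` score is the usual summary. The cutoffs below are the conventional VADER rule of thumb (±0.05); the library's own `label` field may apply its own threshold:
+
+ ```python
+ from aize import analyze_sentiment
+
+ result = analyze_sentiment("The dashboard is great, but setup was a little confusing.")
+
+ # Conventional VADER interpretation of the compound score
+ if result["compound"] >= 0.05:
+     polarity = "positive"
+ elif result["compound"] <= -0.05:
+     polarity = "negative"
+ else:
+     polarity = "neutral"
+ print(polarity, result["compound"])
+ ```
+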
+ ---
+
+ ### `compute_readability`
+
+ ```python
+ from aize import compute_readability
+
+ result = compute_readability(text)
+ ```
+
+ Computes Flesch-Kincaid readability metrics.
+
+ | Key | Type | Description |
+ |---|---|---|
+ | `flesch_reading_ease` | `float` | 0–100 score; higher = easier to read |
+ | `fk_grade_level` | `float` | Approximate US school grade level |
+ | `sentences` | `int` | Sentence count |
+ | `words` | `int` | Word count |
+ | `syllables` | `int` | Total syllables |
+ | `interpretation` | `str` | `"Very Easy"` → `"Very Confusing"` |
+
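+ For reference, the two scores are conventionally derived from the three counts above via the standard Flesch formulas. A sketch of that arithmetic (the library's own implementation may differ in rounding or syllable counting):
+
+ ```python
+ from aize import compute_readability
+
+ r = compute_readability(text)
+ words_per_sentence = r["words"] / r["sentences"]
+ syllables_per_word = r["syllables"] / r["words"]
+
+ # Standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas
+ reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
+ grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
+
+ print(round(reading_ease, 1), r["flesch_reading_ease"])
+ print(round(grade_level, 1), r["fk_grade_level"])
+ ```
+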
+ ---
+
+ ### `analyze_pos`
+
+ ```python
+ from aize import analyze_pos
+
+ result = analyze_pos(text)
+ ```
+
+ Returns a part-of-speech frequency breakdown (nouns, verbs, adjectives, adverbs, etc.) using NLTK's POS tagger.
+
+ ---
+
+ ### `generate_wordcloud`
+
+ ```python
+ from aize import generate_wordcloud
+
+ image = generate_wordcloud(text)
+ ```
+
+ Generates a word cloud image from the input text. Returns a PIL `Image` object that can be displayed or saved.
+
+ ```python
+ image.save("wordcloud.png")
+ ```
+
+ ---
+
+ ## Streamlit Dashboard
+
+ An interactive, browser-based UI for all analyses is included. From the repository root, launch it with:
+
+ ```bash
+ streamlit run nlp_dashboard.py
+ ```
+
+ The dashboard lets you upload one or more `.txt` files and interactively explore all analysis modules with charts and tables powered by Plotly.
+
+ ---
+
+ ## FastAPI Backend
+
+ A REST API is included for programmatic or remote access to the toolkit. From the repository root, start it with:
+
+ ```bash
+ uvicorn api:app --reload
+ ```
+
+ The API will be available at `http://127.0.0.1:8000`. Interactive docs are auto-generated at:
+
+ - **Swagger UI**: `http://127.0.0.1:8000/docs`
+ - **ReDoc**: `http://127.0.0.1:8000/redoc`
+
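+ The individual routes aren't listed in this README, but FastAPI also serves its OpenAPI schema at `/openapi.json`, so the available endpoints can be discovered programmatically. A minimal sketch using `requests` (not a declared dependency of this package):
+
+ ```python
+ import requests
+
+ schema = requests.get("http://127.0.0.1:8000/openapi.json", timeout=5).json()
+ for path, methods in schema["paths"].items():
+     print(", ".join(m.upper() for m in methods), path)
+ ```
+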
+ ---
+
+ ## Dependencies
+
+ | Package | Purpose |
+ |---|---|
+ | `nltk >= 3.8` | Tokenisation, POS tagging, VADER sentiment |
+ | `scikit-learn >= 1.2` | TF-IDF vectorisation |
+ | `wordcloud >= 1.9` | Word cloud image generation |
+ | `pandas >= 1.5` | Data manipulation |
+ | `plotly >= 5.0` | Interactive charts in the dashboard |
+ | `streamlit >= 1.28` | Web dashboard UI |
+ | `fastapi >= 0.100` | REST API framework |
+ | `uvicorn >= 0.23` | ASGI server for FastAPI |
+ | `python-multipart >= 0.0.6` | File upload support for FastAPI |
+
+ ---
+
+ ## Project Structure
+
+ ```
+ aize/
+ ├── aize/                     # Core library package
+ │   ├── __init__.py           # Public API surface
+ │   └── analysis/
+ │       ├── stats.py          # Basic text statistics
+ │       ├── groupwords.py     # Word length grouping
+ │       ├── zipf.py           # Zipf's law analysis
+ │       ├── heaps.py          # Heaps' law analysis
+ │       ├── stopwords.py      # Stopword density
+ │       ├── vocab.py          # Vocabulary comparison
+ │       ├── tfidf.py          # TF-IDF & n-grams
+ │       ├── sentiment.py      # VADER sentiment
+ │       ├── readability.py    # Flesch-Kincaid scores
+ │       ├── pos.py            # POS tagging
+ │       └── wordcloud_gen.py  # Word cloud generation
+ ├── .github/workflows/
+ │   └── publish.yml           # Auto-publish to PyPI on version tags
+ ├── nlp_dashboard.py          # Streamlit dashboard
+ ├── api.py                    # FastAPI REST backend
+ ├── pyproject.toml            # Package config & dependency extras
+ ├── MANIFEST.in               # Source distribution file rules
+ ├── requirements.txt          # All-inclusive dev requirements
+ └── README.md
+ ```
+
+ ---
+
+ ## License
+
+ This project is licensed under the **MIT License**. See [LICENSE](LICENSE) for details.
+
+ ---
+
+ <p align="center">Built with ❤️ using Python, NLTK, scikit-learn, Streamlit & FastAPI</p>