pystylometry 1.1.0__py3-none-any.whl → 1.3.1__py3-none-any.whl

@@ -1,278 +0,0 @@
- Metadata-Version: 2.4
- Name: pystylometry
- Version: 1.1.0
- Summary: Comprehensive Python package for stylometric analysis
- License: MIT
- Keywords: stylometry,nlp,text-analysis,authorship,readability,lexical-diversity,readability-metrics
- Author: Craig Trim
- Author-email: craigtrim@gmail.com
- Requires-Python: >=3.11,<4.0
- Classifier: Development Status :: 4 - Beta
- Classifier: Intended Audience :: Developers
- Classifier: Intended Audience :: Science/Research
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.11
- Classifier: Programming Language :: Python :: 3.12
- Classifier: Programming Language :: Python :: 3.13
- Classifier: Programming Language :: Python :: 3.14
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Programming Language :: Python :: 3.9
- Classifier: Topic :: Scientific/Engineering :: Information Analysis
- Classifier: Topic :: Text Processing :: Linguistic
- Classifier: Typing :: Typed
- Requires-Dist: stylometry-ttr (>=1.0.3,<2.0.0)
- Project-URL: Homepage, https://github.com/craigtrim/pystylometry
- Project-URL: Issues, https://github.com/craigtrim/pystylometry/issues
- Project-URL: Repository, https://github.com/craigtrim/pystylometry
- Description-Content-Type: text/markdown
-
- # pystylometry
-
- [![Python Version](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
- [![PyPI version](https://badge.fury.io/py/pystylometry.svg)](https://badge.fury.io/py/pystylometry)
- [![Downloads](https://pepy.tech/badge/pystylometry)](https://pepy.tech/project/pystylometry)
-
- A comprehensive Python package for stylometric analysis with modular architecture and optional dependencies.
-
- ## Features
-
- **pystylometry** provides 50+ metrics across five analysis domains:
-
- - **Lexical Diversity**: TTR, MTLD, Yule's K, Hapax ratios, and more
- - **Readability**: Flesch, SMOG, Gunning Fog, Coleman-Liau, ARI
- - **Syntactic Analysis**: POS ratios, sentence statistics (requires spaCy)
- - **Authorship Attribution**: Burrows' Delta, Cosine Delta, Zeta scores
- - **N-gram Analysis**: Character and word bigram entropy, perplexity
-
- ## Installation
-
- Install only what you need:
-
- ```bash
- # Core package (lexical metrics only)
- pip install pystylometry
-
- # With readability metrics
- pip install pystylometry[readability]
-
- # With syntactic metrics (requires spaCy)
- pip install pystylometry[syntactic]
-
- # With authorship metrics
- pip install pystylometry[authorship]
-
- # With n-gram analysis
- pip install pystylometry[ngrams]
-
- # Everything
- pip install pystylometry[all]
- ```
-
- ## Quick Start
-
- ### Using Individual Modules
-
- ```python
- from pystylometry.lexical import compute_mtld, compute_yule
- from pystylometry.readability import compute_flesch
-
- text = "Your text here..."
-
- # Lexical diversity
- mtld = compute_mtld(text)
- print(f"MTLD: {mtld.mtld_average:.2f}")
-
- yule = compute_yule(text)
- print(f"Yule's K: {yule.yule_k:.2f}")
-
- # Readability
- flesch = compute_flesch(text)
- print(f"Reading Ease: {flesch.reading_ease:.1f}")
- print(f"Grade Level: {flesch.grade_level:.1f}")
- ```
-
- ### Using the Unified API
-
- ```python
- from pystylometry import analyze
-
- text = "Your text here..."
-
- # Analyze with multiple metrics at once
- results = analyze(text, lexical=True, readability=True)
-
- # Access results
- print(f"MTLD: {results.lexical['mtld'].mtld_average:.2f}")
- print(f"Flesch: {results.readability['flesch'].reading_ease:.1f}")
- ```
-
- ### Checking Available Modules
-
- ```python
- from pystylometry import get_available_modules
-
- available = get_available_modules()
- print(available)
- # {'lexical': True, 'readability': True, 'syntactic': False, ...}
- ```
-
- ## API Design
-
- ### Clean, Consistent Interface
-
- Every metric function:
- - Takes text as input
- - Returns a rich result object (never just a float)
- - Includes metadata about the computation
- - Has comprehensive docstrings with formulas and references
-
- ```python
- from pystylometry.lexical import compute_yule
-
- result = compute_yule(text)
- # Returns: YuleResult(yule_k=..., yule_i=..., metadata={...})
- ```
-
- ## Available Metrics
-
- ### Lexical Diversity
- - **TTR** - Type-Token Ratio (via stylometry-ttr)
- - **MTLD** - Measure of Textual Lexical Diversity
- - **Yule's K** - Vocabulary repetitiveness
- - **Hapax Legomena** - Words appearing once/twice
- - **Sichel's S** - Hapax-based richness
- - **Honoré's R** - Vocabulary richness constant
-
- ### Readability
- - **Flesch Reading Ease** - 0-100 difficulty scale
- - **Flesch-Kincaid Grade** - US grade level
- - **SMOG Index** - Years of education needed
- - **Gunning Fog** - NLP-enhanced readability complexity (see below)
- - **Coleman-Liau** - Character-based grade level
- - **ARI** - Automated Readability Index
-
- #### Gunning Fog Index - NLP Enhancement
-
- The Gunning Fog Index implementation includes advanced NLP features when spaCy is available:
-
- **Enhanced Mode** (with spaCy):
- - Accurate proper noun detection via POS tagging (PROPN)
- - True morphological analysis via lemmatization
- - Component-based hyphenated word analysis
- - Handles edge cases: acronyms, irregular verbs, compound nouns
-
- **Basic Mode** (without spaCy):
- - Capitalization-based proper noun detection
- - Simple suffix stripping for inflections (-es, -ed, -ing)
- - Component-based hyphenated word analysis
- - Works without external dependencies
-
- ```python
- from pystylometry.readability import compute_gunning_fog
-
- text = "Understanding computational linguistics requires significant dedication."
- result = compute_gunning_fog(text)
-
- print(f"Fog Index: {result.fog_index:.1f}")
- print(f"Grade Level: {result.grade_level}")
- print(f"Detection Mode: {result.metadata['mode']}") # "enhanced" or "basic"
- ```
-
- **To enable enhanced mode:**
- ```bash
- pip install pystylometry[readability]
- python -m spacy download en_core_web_sm
- ```
-
- **Reference:** Gunning, R. (1952). The Technique of Clear Writing. McGraw-Hill.
-
- **Implementation Details:** See [GitHub PR #4](https://github.com/craigtrim/pystylometry/pull/4) for the rationale behind NLP enhancements.
-
- ### Syntactic (requires spaCy)
- - **POS Ratios** - Noun/verb/adjective/adverb ratios
- - **Lexical Density** - Content vs function words
- - **Sentence Statistics** - Length, variation, complexity
-
- ### Authorship (requires scikit-learn, scipy)
- - **Burrows' Delta** - Author distance measure
- - **Cosine Delta** - Angular distance
- - **Zeta Scores** - Distinctive word usage
-
- ### N-grams (requires nltk)
- - **Character Bigram Entropy** - Character predictability
- - **Word Bigram Entropy** - Word sequence predictability
- - **Perplexity** - Language model fit
-
- ## Dependencies
-
- **Core (always installed):**
- - stylometry-ttr
-
- **Optional:**
- - `readability`: pronouncing (syllable counting), spacy>=3.8.0 (NLP-enhanced Gunning Fog)
- - `syntactic`: spacy>=3.8.0 (POS tagging and syntactic analysis)
- - `authorship`: None (pure Python + stdlib)
- - `ngrams`: None (pure Python + stdlib)
-
- **Note:** spaCy is shared between `readability` and `syntactic` groups. For enhanced Gunning Fog accuracy, download a language model:
- ```bash
- python -m spacy download en_core_web_sm # Small model (13MB)
- python -m spacy download en_core_web_md # Medium model (better accuracy)
- ```
-
- ## Development
-
- ```bash
- # Clone the repository
- git clone https://github.com/craigtrim/pystylometry
- cd pystylometry
-
- # Install with dev dependencies
- pip install -e ".[dev,all]"
-
- # Run tests
- make test
-
- # Run linters
- make lint
-
- # Format code
- make format
- ```
-
- ## Project Status
-
- 🚧 **Phase 1 - Core Lexical Metrics** (In Progress)
- - [x] Project structure
- - [ ] MTLD implementation
- - [ ] Yule's K implementation
- - [ ] Hapax ratios implementation
- - [ ] Tests
- - [ ] v0.1.0 release
-
-
- ## Why pystylometry?
-
- - **Modular**: Install only what you need
- - **Consistent**: Uniform API across all metrics
- - **Rich Results**: Dataclass objects with metadata, not just numbers
- - **Well-Documented**: Formulas, references, and interpretations
- - **Type-Safe**: Full type hints for IDE support
- - **Tested**: Comprehensive test suite
-
-
- ## License
-
- MIT License - see LICENSE file for details.
-
- ## Author
-
- Craig Trim (craigtrim@gmail.com)
-
- ## Contributing
-
- Contributions welcome! Please open an issue or PR on GitHub.
-