pystylometry 1.1.0__py3-none-any.whl → 1.3.1__py3-none-any.whl
- pystylometry/README.md +42 -0
- pystylometry/__init__.py +17 -1
- pystylometry/_types.py +206 -0
- pystylometry/authorship/README.md +21 -0
- pystylometry/authorship/__init__.py +9 -6
- pystylometry/authorship/additional_methods.py +262 -17
- pystylometry/authorship/compression.py +175 -0
- pystylometry/authorship/kilgarriff.py +8 -1
- pystylometry/character/README.md +17 -0
- pystylometry/consistency/README.md +27 -0
- pystylometry/dialect/README.md +26 -0
- pystylometry/lexical/README.md +23 -0
- pystylometry/lexical/__init__.py +3 -0
- pystylometry/lexical/repetition.py +506 -0
- pystylometry/ngrams/README.md +18 -0
- pystylometry/ngrams/extended_ngrams.py +314 -69
- pystylometry/prosody/README.md +17 -0
- pystylometry/prosody/rhythm_prosody.py +773 -11
- pystylometry/readability/README.md +23 -0
- pystylometry/stylistic/README.md +20 -0
- pystylometry/stylistic/cohesion_coherence.py +669 -13
- pystylometry/stylistic/genre_register.py +1560 -17
- pystylometry/stylistic/markers.py +611 -17
- pystylometry/stylistic/vocabulary_overlap.py +354 -13
- pystylometry/syntactic/README.md +20 -0
- pystylometry/viz/README.md +27 -0
- pystylometry-1.3.1.dist-info/LICENSE +21 -0
- pystylometry-1.3.1.dist-info/METADATA +79 -0
- {pystylometry-1.1.0.dist-info → pystylometry-1.3.1.dist-info}/RECORD +31 -16
- {pystylometry-1.1.0.dist-info → pystylometry-1.3.1.dist-info}/WHEEL +1 -1
- pystylometry-1.1.0.dist-info/METADATA +0 -278
- {pystylometry-1.1.0.dist-info → pystylometry-1.3.1.dist-info}/entry_points.txt +0 -0
@@ -1,278 +0,0 @@
-Metadata-Version: 2.4
-Name: pystylometry
-Version: 1.1.0
-Summary: Comprehensive Python package for stylometric analysis
-License: MIT
-Keywords: stylometry,nlp,text-analysis,authorship,readability,lexical-diversity,readability-metrics
-Author: Craig Trim
-Author-email: craigtrim@gmail.com
-Requires-Python: >=3.11,<4.0
-Classifier: Development Status :: 4 - Beta
-Classifier: Intended Audience :: Developers
-Classifier: Intended Audience :: Science/Research
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Classifier: Programming Language :: Python :: 3.14
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Topic :: Scientific/Engineering :: Information Analysis
-Classifier: Topic :: Text Processing :: Linguistic
-Classifier: Typing :: Typed
-Requires-Dist: stylometry-ttr (>=1.0.3,<2.0.0)
-Project-URL: Homepage, https://github.com/craigtrim/pystylometry
-Project-URL: Issues, https://github.com/craigtrim/pystylometry/issues
-Project-URL: Repository, https://github.com/craigtrim/pystylometry
-Description-Content-Type: text/markdown
-
-# pystylometry
-
-[](https://www.python.org/downloads/)
-[](https://opensource.org/licenses/MIT)
-[](https://github.com/astral-sh/ruff)
-[](https://badge.fury.io/py/pystylometry)
-[](https://pepy.tech/project/pystylometry)
-
-A comprehensive Python package for stylometric analysis with modular architecture and optional dependencies.
-
-## Features
-
-**pystylometry** provides 50+ metrics across five analysis domains:
-
-- **Lexical Diversity**: TTR, MTLD, Yule's K, Hapax ratios, and more
-- **Readability**: Flesch, SMOG, Gunning Fog, Coleman-Liau, ARI
-- **Syntactic Analysis**: POS ratios, sentence statistics (requires spaCy)
-- **Authorship Attribution**: Burrows' Delta, Cosine Delta, Zeta scores
-- **N-gram Analysis**: Character and word bigram entropy, perplexity
-
-## Installation
-
-Install only what you need:
-
-```bash
-# Core package (lexical metrics only)
-pip install pystylometry
-
-# With readability metrics
-pip install pystylometry[readability]
-
-# With syntactic metrics (requires spaCy)
-pip install pystylometry[syntactic]
-
-# With authorship metrics
-pip install pystylometry[authorship]
-
-# With n-gram analysis
-pip install pystylometry[ngrams]
-
-# Everything
-pip install pystylometry[all]
-```
-
-## Quick Start
-
-### Using Individual Modules
-
-```python
-from pystylometry.lexical import compute_mtld, compute_yule
-from pystylometry.readability import compute_flesch
-
-text = "Your text here..."
-
-# Lexical diversity
-mtld = compute_mtld(text)
-print(f"MTLD: {mtld.mtld_average:.2f}")
-
-yule = compute_yule(text)
-print(f"Yule's K: {yule.yule_k:.2f}")
-
-# Readability
-flesch = compute_flesch(text)
-print(f"Reading Ease: {flesch.reading_ease:.1f}")
-print(f"Grade Level: {flesch.grade_level:.1f}")
-```
-
-### Using the Unified API
-
-```python
-from pystylometry import analyze
-
-text = "Your text here..."
-
-# Analyze with multiple metrics at once
-results = analyze(text, lexical=True, readability=True)
-
-# Access results
-print(f"MTLD: {results.lexical['mtld'].mtld_average:.2f}")
-print(f"Flesch: {results.readability['flesch'].reading_ease:.1f}")
-```
-
-### Checking Available Modules
-
-```python
-from pystylometry import get_available_modules
-
-available = get_available_modules()
-print(available)
-# {'lexical': True, 'readability': True, 'syntactic': False, ...}
-```
-
-## API Design
-
-### Clean, Consistent Interface
-
-Every metric function:
-- Takes text as input
-- Returns a rich result object (never just a float)
-- Includes metadata about the computation
-- Has comprehensive docstrings with formulas and references
-
-```python
-from pystylometry.lexical import compute_yule
-
-result = compute_yule(text)
-# Returns: YuleResult(yule_k=..., yule_i=..., metadata={...})
-```
-
-## Available Metrics
-
-### Lexical Diversity
-- **TTR** - Type-Token Ratio (via stylometry-ttr)
-- **MTLD** - Measure of Textual Lexical Diversity
-- **Yule's K** - Vocabulary repetitiveness
-- **Hapax Legomena** - Words appearing once/twice
-- **Sichel's S** - Hapax-based richness
-- **Honoré's R** - Vocabulary richness constant
-
-### Readability
-- **Flesch Reading Ease** - 0-100 difficulty scale
-- **Flesch-Kincaid Grade** - US grade level
-- **SMOG Index** - Years of education needed
-- **Gunning Fog** - NLP-enhanced readability complexity (see below)
-- **Coleman-Liau** - Character-based grade level
-- **ARI** - Automated Readability Index
-
-#### Gunning Fog Index - NLP Enhancement
-
-The Gunning Fog Index implementation includes advanced NLP features when spaCy is available:
-
-**Enhanced Mode** (with spaCy):
-- Accurate proper noun detection via POS tagging (PROPN)
-- True morphological analysis via lemmatization
-- Component-based hyphenated word analysis
-- Handles edge cases: acronyms, irregular verbs, compound nouns
-
-**Basic Mode** (without spaCy):
-- Capitalization-based proper noun detection
-- Simple suffix stripping for inflections (-es, -ed, -ing)
-- Component-based hyphenated word analysis
-- Works without external dependencies
-
-```python
-from pystylometry.readability import compute_gunning_fog
-
-text = "Understanding computational linguistics requires significant dedication."
-result = compute_gunning_fog(text)
-
-print(f"Fog Index: {result.fog_index:.1f}")
-print(f"Grade Level: {result.grade_level}")
-print(f"Detection Mode: {result.metadata['mode']}") # "enhanced" or "basic"
-```
-
-**To enable enhanced mode:**
-```bash
-pip install pystylometry[readability]
-python -m spacy download en_core_web_sm
-```
-
-**Reference:** Gunning, R. (1952). The Technique of Clear Writing. McGraw-Hill.
-
-**Implementation Details:** See [GitHub PR #4](https://github.com/craigtrim/pystylometry/pull/4) for the rationale behind NLP enhancements.
-
-### Syntactic (requires spaCy)
-- **POS Ratios** - Noun/verb/adjective/adverb ratios
-- **Lexical Density** - Content vs function words
-- **Sentence Statistics** - Length, variation, complexity
-
-### Authorship (requires scikit-learn, scipy)
-- **Burrows' Delta** - Author distance measure
-- **Cosine Delta** - Angular distance
-- **Zeta Scores** - Distinctive word usage
-
-### N-grams (requires nltk)
-- **Character Bigram Entropy** - Character predictability
-- **Word Bigram Entropy** - Word sequence predictability
-- **Perplexity** - Language model fit
-
-## Dependencies
-
-**Core (always installed):**
-- stylometry-ttr
-
-**Optional:**
-- `readability`: pronouncing (syllable counting), spacy>=3.8.0 (NLP-enhanced Gunning Fog)
-- `syntactic`: spacy>=3.8.0 (POS tagging and syntactic analysis)
-- `authorship`: None (pure Python + stdlib)
-- `ngrams`: None (pure Python + stdlib)
-
-**Note:** spaCy is shared between `readability` and `syntactic` groups. For enhanced Gunning Fog accuracy, download a language model:
-```bash
-python -m spacy download en_core_web_sm # Small model (13MB)
-python -m spacy download en_core_web_md # Medium model (better accuracy)
-```
-
-## Development
-
-```bash
-# Clone the repository
-git clone https://github.com/craigtrim/pystylometry
-cd pystylometry
-
-# Install with dev dependencies
-pip install -e ".[dev,all]"
-
-# Run tests
-make test
-
-# Run linters
-make lint
-
-# Format code
-make format
-```
-
-## Project Status
-
-🚧 **Phase 1 - Core Lexical Metrics** (In Progress)
-- [x] Project structure
-- [ ] MTLD implementation
-- [ ] Yule's K implementation
-- [ ] Hapax ratios implementation
-- [ ] Tests
-- [ ] v0.1.0 release
-
-
-## Why pystylometry?
-
-- **Modular**: Install only what you need
-- **Consistent**: Uniform API across all metrics
-- **Rich Results**: Dataclass objects with metadata, not just numbers
-- **Well-Documented**: Formulas, references, and interpretations
-- **Type-Safe**: Full type hints for IDE support
-- **Tested**: Comprehensive test suite
-
-
-## License
-
-MIT License - see LICENSE file for details.
-
-## Author
-
-Craig Trim (craigtrim@gmail.com)
-
-## Contributing
-
-Contributions welcome! Please open an issue or PR on GitHub.
-

File without changes