trigram-llm 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. trigram_llm-0.1.0/LICENSE +21 -0
  2. trigram_llm-0.1.0/MANIFEST.in +6 -0
  3. trigram_llm-0.1.0/PKG-INFO +306 -0
  4. trigram_llm-0.1.0/README.md +273 -0
  5. trigram_llm-0.1.0/README_PYTHON.md +273 -0
  6. trigram_llm-0.1.0/pyproject.toml +46 -0
  7. trigram_llm-0.1.0/setup.cfg +4 -0
  8. trigram_llm-0.1.0/setup.py +149 -0
  9. trigram_llm-0.1.0/tests/test_advanced.py +203 -0
  10. trigram_llm-0.1.0/tests/test_basic.py +235 -0
  11. trigram_llm-0.1.0/tests/test_edge_cases.py +228 -0
  12. trigram_llm-0.1.0/tests/test_persistence.py +160 -0
  13. trigram_llm-0.1.0/trigram/__init__.py +25 -0
  14. trigram_llm-0.1.0/trigram/_lib.py +205 -0
  15. trigram_llm-0.1.0/trigram/_trigram_c.dylib +0 -0
  16. trigram_llm-0.1.0/trigram/model.py +756 -0
  17. trigram_llm-0.1.0/trigram/utils.py +80 -0
  18. trigram_llm-0.1.0/trigram_frontend_api/generate_graphs.py +238 -0
  19. trigram_llm-0.1.0/trigram_llm/include/hashmap.h +31 -0
  20. trigram_llm-0.1.0/trigram_llm/include/queue.h +23 -0
  21. trigram_llm-0.1.0/trigram_llm/include/reader.h +9 -0
  22. trigram_llm-0.1.0/trigram_llm/include/sll.h +23 -0
  23. trigram_llm-0.1.0/trigram_llm/include/tree.h +52 -0
  24. trigram_llm-0.1.0/trigram_llm/include/trigram.h +17 -0
  25. trigram_llm-0.1.0/trigram_llm/include/trigram_py.h +99 -0
  26. trigram_llm-0.1.0/trigram_llm/src/hashmap.c +166 -0
  27. trigram_llm-0.1.0/trigram_llm/src/main.c +219 -0
  28. trigram_llm-0.1.0/trigram_llm/src/queue.c +114 -0
  29. trigram_llm-0.1.0/trigram_llm/src/reader.c +63 -0
  30. trigram_llm-0.1.0/trigram_llm/src/sll.c +79 -0
  31. trigram_llm-0.1.0/trigram_llm/src/tree.c +780 -0
  32. trigram_llm-0.1.0/trigram_llm/src/trigram.c +177 -0
  33. trigram_llm-0.1.0/trigram_llm/src/trigram_py.c +209 -0
  34. trigram_llm-0.1.0/trigram_llm.egg-info/PKG-INFO +306 -0
  35. trigram_llm-0.1.0/trigram_llm.egg-info/SOURCES.txt +37 -0
  36. trigram_llm-0.1.0/trigram_llm.egg-info/dependency_links.txt +1 -0
  37. trigram_llm-0.1.0/trigram_llm.egg-info/not-zip-safe +1 -0
  38. trigram_llm-0.1.0/trigram_llm.egg-info/requires.txt +4 -0
  39. trigram_llm-0.1.0/trigram_llm.egg-info/top_level.txt +5 -0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Raghottam Girish Nadgoudar
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,6 @@
1
+ include trigram_llm/src/*.c
2
+ include trigram_llm/include/*.h
3
+ include README.md
4
+ include README_PYTHON.md
5
+ include LICENSE
6
+ include pyproject.toml
@@ -0,0 +1,306 @@
1
+ Metadata-Version: 2.4
2
+ Name: trigram-llm
3
+ Version: 0.1.0
4
+ Summary: Fast C-backed trigram language model for word prediction and sentence completion
5
+ Author: Raghottam Girish Nadgoudar
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/ROHITH-KUMAR-L/Trigrams
8
+ Project-URL: Repository, https://github.com/ROHITH-KUMAR-L/Trigrams
9
+ Project-URL: Bug Tracker, https://github.com/ROHITH-KUMAR-L/Trigrams/issues
10
+ Keywords: nlp,language-model,trigram,autocomplete,prediction,nlp,text
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Intended Audience :: Education
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.8
17
+ Classifier: Programming Language :: Python :: 3.9
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Programming Language :: Python :: 3.13
22
+ Classifier: Programming Language :: C
23
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
24
+ Classifier: Topic :: Text Processing :: Linguistic
25
+ Requires-Python: >=3.8
26
+ Description-Content-Type: text/markdown
27
+ License-File: LICENSE
28
+ Provides-Extra: dev
29
+ Requires-Dist: pytest>=7.0; extra == "dev"
30
+ Requires-Dist: pytest-cov; extra == "dev"
31
+ Dynamic: license-file
32
+ Dynamic: requires-python
33
+
34
+ # trigram-llm 🧠
35
+
36
+ A **fast, production-ready Python library** for next-word prediction and sentence completion, powered by a hand-written C engine using a Prefix Trie, DJB2 HashMap, and Stupid Backoff smoothing.
37
+
38
+ > Sub-millisecond predictions · Zero dependencies · Pure ctypes · Thread-safe
39
+
40
+ ---
41
+
42
+ ## Features
43
+
44
+ | Feature | Description |
45
+ |---|---|
46
+ | `train_from_text(text)` | Train from any Python string |
47
+ | `train_from_file(path)` | Train from a text file (incremental) |
48
+ | `train_from_list(words)` | Train from a pre-tokenised word list |
49
+ | `predict_next(w1, w2)` | Greedy single-word prediction (< 1ms) |
50
+ | `predict_top_n(w1, w2, n, temperature)` | Top-N predictions with probabilities |
51
+ | `complete_sentence(prompt, num_words, beam_width)` | Beam search sentence generation |
52
+ | `greedy_generate(prompt, num_words)` | Fastest sentence completion |
53
+ | `perplexity(text)` | Evaluate model quality on held-out text |
54
+ | `vocabulary()` | Returns all known words as a Python `set` |
55
+ | `get_stats()` | Dict with trigram count, vocab size, etc. |
56
+ | `save(path)` / `TrigramModel.load(path)` | Binary model persistence |
57
+ | `reset()` | Clear the model so it can be retrained from scratch |
58
+ | `"the quick" in model` | Check if a bigram context was seen |
59
+ | `len(model)` | Total number of stored trigrams |
60
+ | Thread-safe | All predictions guarded by a `threading.Lock` |
61
+ | Context manager | `with TrigramModel.load(path) as m:` |
62
+
63
+ ---
64
+
65
+ ## Installation
66
+
67
+ ### Prerequisites
68
+ - Python 3.8+
69
+ - GCC (macOS: `xcode-select --install`, Ubuntu: `sudo apt install gcc`)
70
+
71
+ ### Install (one command)
72
+
73
+ ```bash
74
+ cd /path/to/Trigrams
75
+ pip install -e .
76
+ ```
77
+
78
+ This compiles the C engine into `trigram/_trigram_c.dylib` (or `.so` on Linux) and installs the package in editable mode.
79
+
80
+ ---
81
+
82
+ ## Quickstart
83
+
84
+ ```python
85
+ from trigram import TrigramModel
86
+
87
+ # 1. Create and train
88
+ model = TrigramModel()
89
+ model.train_from_text("""
90
+ The quick brown fox jumps over the lazy dog.
91
+ The quick brown fox was nimble and swift.
92
+ The lazy dog slept peacefully under the old oak tree.
93
+ """)
94
+
95
+ # 2. Predict next word (greedy)
96
+ word = model.predict_next("the", "quick")
97
+ print(word)  # → "brown"
98
+
99
+ # 3. Top-N predictions with probabilities
100
+ preds = model.predict_top_n("the", "quick", n=3, temperature=1.0)
101
+ # Illustrative format only; the words and numbers depend on the training data:
102
+ # [{"word": "brown", "probability": 0.75, "count": 2}, {"word": "red", "probability": 0.25, "count": 1}]
103
+
104
+ # 4. Sentence completion (beam search)
105
+ completions = model.complete_sentence("the quick", num_words=4, beam_width=3)
106
+ # [{"sentence": "the quick brown fox jumps", "probability": 0.012}, ...]
107
+
108
+ # 5. Greedy generation (fastest)
109
+ sentence = model.greedy_generate("the quick", num_words=3)
110
+ # "the quick brown fox"
111
+
112
+ # 6. Evaluate quality
113
+ ppl = model.perplexity("the quick brown fox")
114
+ print(f"Perplexity: {ppl:.2f}")
115
+
116
+ # 7. Inspect model
117
+ print(len(model))            # → total trigrams
118
+ print("the quick" in model)  # → True
119
+ print(model.vocabulary())    # → {"the", "quick", "brown", ...}
120
+ print(model.get_stats())     # → {"total_trigrams": 7, "unique_first_words": 3, ...}
121
+ ```
122
+
123
+ ---
124
+
125
+ ## Training from a File
126
+
127
+ ```python
128
+ model = TrigramModel()
129
+ model.train_from_file("path/to/my_corpus.txt")
130
+
131
+ # Incremental training: add more data later
132
+ model.train_from_file("path/to/more_data.txt")
133
+ ```
134
+
135
+ ---
136
+
137
+ ## Saving and Loading Models
138
+
139
+ ```python
140
+ # Save
141
+ model.save("my_model.bin")
142
+
143
+ # Load (class method)
144
+ model2 = TrigramModel.load("my_model.bin")
145
+
146
+ # Context manager (auto-frees on exit)
147
+ with TrigramModel.load("my_model.bin") as m:
148
+     print(m.predict_next("the", "quick"))
149
+ ```
150
+
151
+ ---
152
+
153
+ ## Temperature Sampling
154
+
155
+ The `temperature` parameter controls how creative predictions are:
156
+
157
+ ```python
158
+ # Near-deterministic: strongly favours the most common word
159
+ model.predict_top_n("the", "quick", temperature=0.1)
160
+
161
+ # Standard probability distribution
162
+ model.predict_top_n("the", "quick", temperature=1.0)
163
+
164
+ # More diverse / creative
165
+ model.predict_top_n("the", "quick", temperature=2.0)
166
+ ```
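
The README does not say how `temperature` is applied inside the C engine. A common convention, assumed here, is to raise each probability to the power `1 / temperature` and renormalise. The helper `apply_temperature` and the words "red" and "grey" are hypothetical and only illustrate the effect of the three settings above.

```python
def apply_temperature(probs, temperature=1.0):
    """Rescale a {word: probability} mapping by a sampling temperature.

    Assumption: p ** (1 / T) followed by renormalisation, which is the same
    as dividing log-probabilities by T. The library may do something else.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {w: p ** (1.0 / temperature) for w, p in probs.items() if p > 0}
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}


# Hypothetical base distribution over next words.
base = {"brown": 0.6, "red": 0.3, "grey": 0.1}

for t in (0.1, 1.0, 2.0):
    rescaled = apply_temperature(base, t)
    print(t, {w: round(p, 3) for w, p in rescaled.items()})
```

Under this convention a low temperature concentrates almost all mass on the most frequent word, `1.0` leaves the distribution unchanged, and a higher temperature flattens it.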
167
+
168
+ ---
169
+
170
+ ## Advanced Usage
171
+
172
+ ### Train from a word list (custom tokenisation)
173
+
174
+ ```python
175
+ import nltk
176
+ tokens = nltk.word_tokenize("The quick brown fox")
177
+ tokens = [t.lower() for t in tokens if t.isalpha()]
178
+
179
+ model = TrigramModel()
180
+ model.train_from_list(tokens)
181
+ ```
182
+
183
+ ### Thread-safe batch prediction
184
+
185
+ ```python
186
+ import threading
187
+
188
+ def worker(model, results, idx):
189
+     results[idx] = model.predict_top_n("the", "quick", n=5)
190
+
191
+ model = TrigramModel.load("model.bin")
192
+ results = [None] * 10
193
+ threads = [threading.Thread(target=worker, args=(model, results, i)) for i in range(10)]
194
+ for t in threads: t.start()
195
+ for t in threads: t.join()
196
+ ```
197
+
198
+ ### Check if a context exists before predicting
199
+
200
+ ```python
201
+ if "the quick" in model:
202
+     result = model.predict_next("the", "quick")
203
+ ```
204
+
205
+ ---
206
+
207
+ ## API Reference
208
+
209
+ ### `TrigramModel()`
210
+ Creates a new empty model.
211
+
212
+ ### `train_from_text(text: str) → int`
213
+ Train on a raw text string. Returns trigrams inserted.
214
+
215
+ ### `train_from_file(path) → int`
216
+ Train from a text file. Returns trigrams inserted.
217
+
218
+ ### `train_from_list(words: list) → int`
219
+ Train from a pre-tokenised word list. Returns trigrams inserted.
220
+
221
+ ### `predict_next(w1, w2) → str | None`
222
+ Return the single most-likely next word or `None`.
223
+
224
+ ### `predict_top_n(w1, w2, n=5, temperature=1.0) → list[dict]`
225
+ Return up to N predictions sorted by probability descending.
226
+ Each dict: `{"word": str, "probability": float, "count": int}`.
227
+
228
+ ### `complete_sentence(prompt, num_words=5, beam_width=3) → list[dict]`
229
+ Generate sentence completions via beam search.
230
+ Each dict: `{"sentence": str, "probability": float}`.
231
+
232
+ ### `greedy_generate(prompt, num_words=5) → str`
233
+ Fastest sentence completion using greedy decoding.
234
+
235
+ ### `perplexity(text) → float`
236
+ Compute per-token perplexity on held-out text. Lower = better.
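
For reference, per-token perplexity is conventionally the exponential of the average negative log-probability of the tokens. The sketch below shows that standard formula on a plain list of probabilities; `perplexity_from_probs` is a hypothetical helper, and how the library obtains each token's probability (including how it treats unseen trigrams) is not documented here.

```python
import math


def perplexity_from_probs(token_probs):
    """Perplexity of a sequence, given the model probability of each token.

    Standard definition: exp(-(1/N) * sum(log p_i)). Every probability must
    be strictly positive, otherwise the perplexity is unbounded.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0 for p in token_probs):
        raise ValueError("probabilities must be > 0")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


# A model that is always torn between four equally likely words
# has a perplexity of about 4.
print(perplexity_from_probs([0.25, 0.25, 0.25, 0.25]))
```

A lower value means the model assigned higher probability to the held-out text, which is why lower is better.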
237
+
238
+ ### `vocabulary() → set[str]`
239
+ All words seen in the first-word position of training trigrams.
240
+
241
+ ### `get_stats() → dict`
242
+ `{"total_trigrams": int, "unique_first_words": int, "vocabulary_size": int}`.
243
+
244
+ ### `save(path) → None`
245
+ Save model to binary file. Compatible with the C CLI tool.
246
+
247
+ ### `TrigramModel.load(path) → TrigramModel` (classmethod)
248
+ Load a pre-trained binary model. Supports context manager protocol.
249
+
250
+ ### `reset() → None`
251
+ Clear all training data.
252
+
253
+ ### `len(model)` → int
254
+ Total stored trigrams.
255
+
256
+ ### `"w1 w2" in model` / `("w1", "w2") in model` → bool
257
+ Check if a bigram context exists.
258
+
259
+ ### `repr(model)`
260
+ `TrigramModel(trigrams=11,062,203, vocab=97,277)`
261
+
262
+ ---
263
+
264
+ ## Performance
265
+
266
+ | Operation | Latency |
267
+ |---|---|
268
+ | Single word prediction | < 1ms |
269
+ | Top-5 predictions | 1–2ms |
270
+ | Beam search (5 words, width 3) | 5–10ms |
271
+ | Training (1M words) | ~30s |
272
+
273
+ ---
274
+
275
+ ## Running Tests
276
+
277
+ ```bash
278
+ pip install pytest
279
+ pytest tests/ -v
280
+ ```
281
+
282
+ ---
283
+
284
+ ## Project Structure
285
+
286
+ ```
287
+ Trigrams/
288
+ ├── trigram/                  # Python library
289
+ │   ├── __init__.py
290
+ │   ├── _lib.py               # ctypes bindings
291
+ │   ├── model.py              # TrigramModel class
292
+ │   ├── utils.py              # Text preprocessing
293
+ │   └── _trigram_c.dylib      # Compiled C engine (auto-generated)
294
+ ├── trigram_llm/
295
+ │   ├── src/                  # C source files
296
+ │   └── include/              # C headers
297
+ ├── tests/                    # pytest test suite
298
+ ├── setup.py                  # Build script
299
+ └── pyproject.toml
300
+ ```
301
+
302
+ ---
303
+
304
+ ## License
305
+
306
+ MIT License. Feel free to use, modify, and distribute.
@@ -0,0 +1,273 @@
1
+ # trigram-llm 🧠
2
+
3
+ A **fast, production-ready Python library** for next-word prediction and sentence completion, powered by a hand-written C engine using a Prefix Trie, DJB2 HashMap, and Stupid Backoff smoothing.
4
+
5
+ > Sub-millisecond predictions · Zero dependencies · Pure ctypes · Thread-safe
6
+
7
+ ---
8
+
9
+ ## Features
10
+
11
+ | Feature | Description |
12
+ |---|---|
13
+ | `train_from_text(text)` | Train from any Python string |
14
+ | `train_from_file(path)` | Train from a text file (incremental) |
15
+ | `train_from_list(words)` | Train from a pre-tokenised word list |
16
+ | `predict_next(w1, w2)` | Greedy single-word prediction (< 1ms) |
17
+ | `predict_top_n(w1, w2, n, temperature)` | Top-N predictions with probabilities |
18
+ | `complete_sentence(prompt, num_words, beam_width)` | Beam search sentence generation |
19
+ | `greedy_generate(prompt, num_words)` | Fastest sentence completion |
20
+ | `perplexity(text)` | Evaluate model quality on held-out text |
21
+ | `vocabulary()` | Returns all known words as a Python `set` |
22
+ | `get_stats()` | Dict with trigram count, vocab size, etc. |
23
+ | `save(path)` / `TrigramModel.load(path)` | Binary model persistence |
24
+ | `reset()` | Clear the model so it can be retrained from scratch |
25
+ | `"the quick" in model` | Check if a bigram context was seen |
26
+ | `len(model)` | Total number of stored trigrams |
27
+ | Thread-safe | All predictions guarded by a `threading.Lock` |
28
+ | Context manager | `with TrigramModel.load(path) as m:` |
29
+
30
+ ---
31
+
32
+ ## Installation
33
+
34
+ ### Prerequisites
35
+ - Python 3.8+
36
+ - GCC (macOS: `xcode-select --install`, Ubuntu: `sudo apt install gcc`)
37
+
38
+ ### Install (one command)
39
+
40
+ ```bash
41
+ cd /path/to/Trigrams
42
+ pip install -e .
43
+ ```
44
+
45
+ This compiles the C engine into `trigram/_trigram_c.dylib` (or `.so` on Linux) and installs the package in editable mode.
46
+
47
+ ---
48
+
49
+ ## Quickstart
50
+
51
+ ```python
52
+ from trigram import TrigramModel
53
+
54
+ # 1. Create and train
55
+ model = TrigramModel()
56
+ model.train_from_text("""
57
+ The quick brown fox jumps over the lazy dog.
58
+ The quick brown fox was nimble and swift.
59
+ The lazy dog slept peacefully under the old oak tree.
60
+ """)
61
+
62
+ # 2. Predict next word (greedy)
63
+ word = model.predict_next("the", "quick")
64
+ print(word)  # → "brown"
65
+
66
+ # 3. Top-N predictions with probabilities
67
+ preds = model.predict_top_n("the", "quick", n=3, temperature=1.0)
68
+ # Illustrative format only; the words and numbers depend on the training data:
69
+ # [{"word": "brown", "probability": 0.75, "count": 2}, {"word": "red", "probability": 0.25, "count": 1}]
70
+
71
+ # 4. Sentence completion (beam search)
72
+ completions = model.complete_sentence("the quick", num_words=4, beam_width=3)
73
+ # [{"sentence": "the quick brown fox jumps", "probability": 0.012}, ...]
74
+
75
+ # 5. Greedy generation (fastest)
76
+ sentence = model.greedy_generate("the quick", num_words=3)
77
+ # "the quick brown fox"
78
+
79
+ # 6. Evaluate quality
80
+ ppl = model.perplexity("the quick brown fox")
81
+ print(f"Perplexity: {ppl:.2f}")
82
+
83
+ # 7. Inspect model
84
+ print(len(model))            # → total trigrams
85
+ print("the quick" in model)  # → True
86
+ print(model.vocabulary())    # → {"the", "quick", "brown", ...}
87
+ print(model.get_stats())     # → {"total_trigrams": 7, "unique_first_words": 3, ...}
88
+ ```
89
+
90
+ ---
91
+
92
+ ## Training from a File
93
+
94
+ ```python
95
+ model = TrigramModel()
96
+ model.train_from_file("path/to/my_corpus.txt")
97
+
98
+ # Incremental training: add more data later
99
+ model.train_from_file("path/to/more_data.txt")
100
+ ```
101
+
102
+ ---
103
+
104
+ ## Saving and Loading Models
105
+
106
+ ```python
107
+ # Save
108
+ model.save("my_model.bin")
109
+
110
+ # Load (class method)
111
+ model2 = TrigramModel.load("my_model.bin")
112
+
113
+ # Context manager (auto-frees on exit)
114
+ with TrigramModel.load("my_model.bin") as m:
115
+     print(m.predict_next("the", "quick"))
116
+ ```
117
+
118
+ ---
119
+
120
+ ## Temperature Sampling
121
+
122
+ The `temperature` parameter controls how creative predictions are:
123
+
124
+ ```python
125
+ # Near-deterministic: strongly favours the most common word
126
+ model.predict_top_n("the", "quick", temperature=0.1)
127
+
128
+ # Standard probability distribution
129
+ model.predict_top_n("the", "quick", temperature=1.0)
130
+
131
+ # More diverse / creative
132
+ model.predict_top_n("the", "quick", temperature=2.0)
133
+ ```
134
+
135
+ ---
136
+
137
+ ## Advanced Usage
138
+
139
+ ### Train from a word list (custom tokenisation)
140
+
141
+ ```python
142
+ import nltk
143
+ tokens = nltk.word_tokenize("The quick brown fox")
144
+ tokens = [t.lower() for t in tokens if t.isalpha()]
145
+
146
+ model = TrigramModel()
147
+ model.train_from_list(tokens)
148
+ ```
149
+
150
+ ### Thread-safe batch prediction
151
+
152
+ ```python
153
+ import threading
154
+
155
+ def worker(model, results, idx):
156
+     results[idx] = model.predict_top_n("the", "quick", n=5)
157
+
158
+ model = TrigramModel.load("model.bin")
159
+ results = [None] * 10
160
+ threads = [threading.Thread(target=worker, args=(model, results, i)) for i in range(10)]
161
+ for t in threads: t.start()
162
+ for t in threads: t.join()
163
+ ```
164
+
165
+ ### Check if a context exists before predicting
166
+
167
+ ```python
168
+ if "the quick" in model:
169
+     result = model.predict_next("the", "quick")
170
+ ```
171
+
172
+ ---
173
+
174
+ ## API Reference
175
+
176
+ ### `TrigramModel()`
177
+ Creates a new empty model.
178
+
179
+ ### `train_from_text(text: str) → int`
180
+ Train on a raw text string. Returns trigrams inserted.
181
+
182
+ ### `train_from_file(path) → int`
183
+ Train from a text file. Returns trigrams inserted.
184
+
185
+ ### `train_from_list(words: list) → int`
186
+ Train from a pre-tokenised word list. Returns trigrams inserted.
187
+
188
+ ### `predict_next(w1, w2) → str | None`
189
+ Return the single most-likely next word or `None`.
190
+
191
+ ### `predict_top_n(w1, w2, n=5, temperature=1.0) → list[dict]`
192
+ Return up to N predictions sorted by probability descending.
193
+ Each dict: `{"word": str, "probability": float, "count": int}`.
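
The introduction names Stupid Backoff as the smoothing scheme. The sketch below shows the textbook form of that technique in pure Python, not the library's C implementation. The back-off factor `0.4` is the value commonly used in the literature and is an assumption here, as are the helper names `count_ngrams` and `stupid_backoff_score`.

```python
from collections import Counter

BACKOFF = 0.4  # assumed back-off factor; the C engine may use another value


def count_ngrams(tokens):
    """Count unigrams, bigrams and trigrams in a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri


def stupid_backoff_score(w1, w2, w3, uni, bi, tri, alpha=BACKOFF):
    """Score w3 after the context (w1, w2).

    Use the trigram relative frequency when the trigram was seen, otherwise
    back off to the bigram and then the unigram, multiplying by alpha at each
    step. The result is a score, not a normalised probability.
    """
    if tri[(w1, w2, w3)] > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        return alpha * bi[(w2, w3)] / uni[w2]
    total = sum(uni.values())
    return alpha * alpha * uni[w3] / total


tokens = ("the quick brown fox jumps over the lazy dog "
          "the quick brown fox was nimble").split()
uni, bi, tri = count_ngrams(tokens)

print(stupid_backoff_score("the", "quick", "brown", uni, bi, tri))   # seen trigram
print(stupid_backoff_score("lazy", "quick", "brown", uni, bi, tri))  # bigram back-off
print(stupid_backoff_score("a", "b", "dog", uni, bi, tri))           # unigram back-off
```

Ranking candidate words by this score and keeping the best N would give a list shaped like the one `predict_top_n` returns, although the library's exact scoring and normalisation may differ.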
194
+
195
+ ### `complete_sentence(prompt, num_words=5, beam_width=3) → list[dict]`
196
+ Generate sentence completions via beam search.
197
+ Each dict: `{"sentence": str, "probability": float}`.
198
+
199
+ ### `greedy_generate(prompt, num_words=5) → str`
200
+ Fastest sentence completion using greedy decoding.
201
+
202
+ ### `perplexity(text) → float`
203
+ Compute per-token perplexity on held-out text. Lower = better.
204
+
205
+ ### `vocabulary() → set[str]`
206
+ All words seen in the first-word position of training trigrams.
207
+
208
+ ### `get_stats() → dict`
209
+ `{"total_trigrams": int, "unique_first_words": int, "vocabulary_size": int}`.
210
+
211
+ ### `save(path) → None`
212
+ Save model to binary file. Compatible with the C CLI tool.
213
+
214
+ ### `TrigramModel.load(path) → TrigramModel` (classmethod)
215
+ Load a pre-trained binary model. Supports context manager protocol.
216
+
217
+ ### `reset() → None`
218
+ Clear all training data.
219
+
220
+ ### `len(model)` → int
221
+ Total stored trigrams.
222
+
223
+ ### `"w1 w2" in model` / `("w1", "w2") in model` → bool
224
+ Check if a bigram context exists.
225
+
226
+ ### `repr(model)`
227
+ `TrigramModel(trigrams=11,062,203, vocab=97,277)`
228
+
229
+ ---
230
+
231
+ ## Performance
232
+
233
+ | Operation | Latency |
234
+ |---|---|
235
+ | Single word prediction | < 1ms |
236
+ | Top-5 predictions | 1–2ms |
237
+ | Beam search (5 words, width 3) | 5–10ms |
238
+ | Training (1M words) | ~30s |
239
+
240
+ ---
241
+
242
+ ## Running Tests
243
+
244
+ ```bash
245
+ pip install pytest
246
+ pytest tests/ -v
247
+ ```
248
+
249
+ ---
250
+
251
+ ## Project Structure
252
+
253
+ ```
254
+ Trigrams/
255
+ ├── trigram/                  # Python library
255
+ │   ├── __init__.py
256
+ │   ├── _lib.py               # ctypes bindings
257
+ │   ├── model.py              # TrigramModel class
258
+ │   ├── utils.py              # Text preprocessing
259
+ │   └── _trigram_c.dylib      # Compiled C engine (auto-generated)
260
+ ├── trigram_llm/
261
+ │   ├── src/                  # C source files
262
+ │   └── include/              # C headers
263
+ ├── tests/                    # pytest test suite
264
+ ├── setup.py                  # Build script
265
+ └── pyproject.toml
267
+ ```
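
The C sources under `trigram_llm/src/` include a hash map that the introduction describes as DJB2-based. For readers unfamiliar with it, the classic DJB2 string hash can be sketched in Python as follows. The 32-bit mask, the UTF-8 encoding and the `num_buckets` parameter are assumptions for illustration and may not match `hashmap.c`.

```python
def djb2(key, num_buckets=None):
    """Classic DJB2 string hash: start at 5381, then h = h * 33 + byte.

    The 32-bit mask mimics unsigned overflow in C (assumed width).
    If num_buckets is given, the hash is reduced to a bucket index.
    """
    h = 5381
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h if num_buckets is None else h % num_buckets


# Map a few words to bucket indices in a hypothetical 1024-slot table.
for word in ("the", "quick", "brown"):
    print(word, djb2(word, num_buckets=1024))
```

The multiply-by-33 step is often written as `(h << 5) + h` in C, which gives the same result.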
268
+
269
+ ---
270
+
271
+ ## License
272
+
273
+ MIT License โ€” feel free to use, modify, and distribute.