cortex-llm 1.0.4__py3-none-any.whl → 1.0.6__py3-none-any.whl

cortex/__init__.py CHANGED
@@ -5,7 +5,7 @@ A high-performance terminal interface for running Hugging Face LLMs locally
  with exclusive GPU acceleration via Metal Performance Shaders (MPS) and MLX.
  """
 
- __version__ = "1.0.4"
+ __version__ = "1.0.6"
  __author__ = "Cortex Development Team"
  __license__ = "MIT"
 
cortex_llm-1.0.6.dist-info/METADATA ADDED
@@ -0,0 +1,155 @@
+ Metadata-Version: 2.4
+ Name: cortex-llm
+ Version: 1.0.6
+ Summary: GPU-Accelerated LLM Terminal for Apple Silicon
+ Home-page: https://github.com/faisalmumtaz/Cortex
+ Author: Cortex Development Team
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/faisalmumtaz/Cortex
+ Project-URL: Bug Tracker, https://github.com/faisalmumtaz/Cortex/issues
+ Project-URL: Documentation, https://github.com/faisalmumtaz/Cortex/wiki
+ Keywords: llm,gpu,metal,mps,apple-silicon,ai,machine-learning,terminal,mlx,pytorch
+ Platform: darwin
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Operating System :: MacOS
+ Classifier: Environment :: Console
+ Classifier: Environment :: GPU
+ Requires-Python: >=3.11
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: torch>=2.1.0
+ Requires-Dist: mlx>=0.30.4
+ Requires-Dist: mlx-lm>=0.30.5
+ Requires-Dist: transformers>=4.36.0
+ Requires-Dist: safetensors>=0.4.0
+ Requires-Dist: huggingface-hub>=0.19.0
+ Requires-Dist: accelerate>=0.25.0
+ Requires-Dist: llama-cpp-python>=0.2.0
+ Requires-Dist: pyyaml>=6.0
+ Requires-Dist: pydantic>=2.5.0
+ Requires-Dist: rich>=13.0.0
+ Requires-Dist: psutil>=5.9.0
+ Requires-Dist: numpy>=1.24.0
+ Requires-Dist: packaging>=23.0
+ Requires-Dist: requests>=2.31.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.4.0; extra == "dev"
+ Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
+ Requires-Dist: black>=23.0.0; extra == "dev"
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
+ Requires-Dist: mypy>=1.8.0; extra == "dev"
+ Provides-Extra: optional
+ Requires-Dist: sentencepiece>=0.1.99; extra == "optional"
+ Requires-Dist: auto-gptq>=0.7.0; extra == "optional"
+ Requires-Dist: autoawq>=0.2.0; extra == "optional"
+ Requires-Dist: bitsandbytes>=0.41.0; extra == "optional"
+ Requires-Dist: optimum>=1.16.0; extra == "optional"
+ Requires-Dist: torchvision>=0.16.0; extra == "optional"
+ Requires-Dist: torchaudio>=2.1.0; extra == "optional"
+ Dynamic: home-page
+ Dynamic: license-file
+ Dynamic: platform
+ Dynamic: requires-python
+
+ # Cortex
+
+ GPU-accelerated local LLMs on Apple Silicon, built for the terminal.
+
+ Cortex is a fast, native CLI for running and fine-tuning LLMs on Apple Silicon using MLX and Metal. It automatically detects chat templates, supports multiple model formats, and keeps your workflow inside the terminal.
+
+ ## Highlights
+
+ - Apple Silicon GPU acceleration via MLX (primary) and PyTorch MPS
+ - Multi-format model support: MLX, GGUF, SafeTensors, PyTorch, GPTQ, AWQ
+ - Built-in LoRA fine-tuning wizard
+ - Chat template auto-detection (ChatML, Llama, Alpaca, Gemma, Reasoning)
+ - Conversation history with branching
+
+ ## Quick Start
+
+ ```bash
+ pipx install cortex-llm
+ cortex
+ ```
+
+ Inside Cortex:
+
+ - `/download` to fetch a model from HuggingFace
+ - `/model` to load or manage models
+ - `/status` to confirm GPU acceleration and current settings
+
+ ## Installation
+
+ ### Option A: pipx (recommended)
+
+ ```bash
+ pipx install cortex-llm
+ ```
+
+ ### Option B: from source
+
+ ```bash
+ git clone https://github.com/faisalmumtaz/Cortex.git
+ cd Cortex
+ ./install.sh
+ ```
+
+ The installer checks Apple Silicon compatibility, creates a venv, installs dependencies from `pyproject.toml`, and sets up the `cortex` command.
+
+ ## Requirements
+
+ - Apple Silicon Mac (M1/M2/M3/M4)
+ - macOS 13.3+
+ - Python 3.11+
+ - 16GB+ unified memory (24GB+ recommended for larger models)
+ - Xcode Command Line Tools
+
+ ## Model Support
+
+ Cortex supports:
+
+ - **MLX** (recommended)
+ - **GGUF** (llama.cpp + Metal)
+ - **SafeTensors**
+ - **PyTorch** (Transformers + MPS)
+ - **GPTQ** / **AWQ** quantized models
+
+ ## Configuration
+
+ Cortex reads `config.yaml` from the current working directory. For tuning GPU memory limits, quantization defaults, and inference parameters, see:
+
+ - `docs/configuration.md`
+
+ ## Documentation
+
+ Start here:
+
+ - `docs/installation.md`
+ - `docs/cli.md`
+ - `docs/model-management.md`
+ - `docs/troubleshooting.md`
+
+ Advanced topics:
+
+ - `docs/mlx-acceleration.md`
+ - `docs/inference-engine.md`
+ - `docs/template-registry.md`
+ - `docs/fine-tuning.md`
+ - `docs/development.md`
+
+ ## Contributing
+
+ Contributions are welcome. See `docs/development.md` for setup and workflow.
+
+ ## License
+
+ MIT License. See `LICENSE`.
+
+ ---
+
+ Note: Cortex requires Apple Silicon. Intel Macs are not supported.
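The dependency changes between the two METADATA files are easy to miss among the unchanged `Requires-Dist` lines. As a minimal sketch (the helper names `parse_floors` and `changed_floors` are hypothetical, and the parser only handles the simple `name>=version` form used in this METADATA file), the minimum versions of two releases could be compared like this:

```python
import re

# Matches simple "Requires-Dist: name>=version" lines. Anything after the
# version (such as '; extra == "dev"') is ignored.
_REQ = re.compile(r"^Requires-Dist:\s*([A-Za-z0-9._-]+)>=([0-9][0-9A-Za-z.]*)")


def parse_floors(metadata_text):
    """Return {package: minimum version} for each simple >= requirement."""
    floors = {}
    for line in metadata_text.splitlines():
        match = _REQ.match(line.strip())
        if match:
            floors[match.group(1)] = match.group(2)
    return floors


def changed_floors(old_text, new_text):
    """Return {package: (old_floor, new_floor)} where the floor differs."""
    old, new = parse_floors(old_text), parse_floors(new_text)
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


# Excerpts copied from the 1.0.4 and 1.0.6 METADATA files in this diff.
OLD = """Requires-Dist: torch>=2.1.0
Requires-Dist: mlx>=0.10.0
Requires-Dist: mlx-lm>=0.10.0
"""
NEW = """Requires-Dist: torch>=2.1.0
Requires-Dist: mlx>=0.30.4
Requires-Dist: mlx-lm>=0.30.5
"""

for name, (old, new) in sorted(changed_floors(OLD, NEW).items()):
    print(f"{name}: >={old} -> >={new}")
# prints:
# mlx: >=0.10.0 -> >=0.30.4
# mlx-lm: >=0.10.0 -> >=0.30.5
```

In this diff, `mlx` and `mlx-lm` are the only requirements whose minimum version moves. The other metadata changes are the switch from `License: MIT` to `License-Expression: MIT` and the removal of the `License :: OSI Approved :: MIT License` classifier.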
{cortex_llm-1.0.4.dist-info → cortex_llm-1.0.6.dist-info}/RECORD CHANGED
@@ -1,4 +1,4 @@
- cortex/__init__.py,sha256=6KYzL3KARjSGTCZnrmxFxVlfuMUFtIwCL4cK2ekXOAs,2202
+ cortex/__init__.py,sha256=HQeri23e7w2It4MeziwPP2gTDfF9GgmBp9A0A2Zmrn0,2202
  cortex/__main__.py,sha256=I7Njt7BjGoHtPhftDoA44OyOYbwWNNaPwP_qlJSn0J4,2857
  cortex/config.py,sha256=txmpJXy3kUEKULZyu1OWb_jkNQRHZClm5ovZfCTX_Zc,13444
  cortex/conversation_manager.py,sha256=aSTdGjVttsMKIiRPzztP0tOXlqZBkWtgZDNCZGyaR-c,17177
@@ -41,9 +41,9 @@ cortex/ui/__init__.py,sha256=t3GrHJMHTVgBEKh2_qt4B9mS594V5jriTDqc3eZKMGc,3409
  cortex/ui/cli.py,sha256=ExzP56n1yV4bdA1EOqHSDFRWhpgpX0lkghq0H0FXw7Q,74661
  cortex/ui/markdown_render.py,sha256=bXt60vkNYT_jbpKeIg_1OlcrxssmdbMO7RB2E1sWw3E,5759
  cortex/ui/terminal_app.py,sha256=SF3KqcGFyZ4hpTmgX21idPzOTJLdKGkt4QdA-wwUBNE,18317
- cortex_llm-1.0.4.dist-info/licenses/LICENSE,sha256=_frJ3VsZWQGhMznZw2Tgjk7xwfAfDZRcBl43uZh8_4E,1070
- cortex_llm-1.0.4.dist-info/METADATA,sha256=rX0lVqvlXVaLNMfn3QWJH2rYSShxAiH7v6d_fWKvkYg,10087
- cortex_llm-1.0.4.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- cortex_llm-1.0.4.dist-info/entry_points.txt,sha256=g83Nuz3iFrNdMLHxGLR2LnscdM7rdQRchuL3WGobQC8,48
- cortex_llm-1.0.4.dist-info/top_level.txt,sha256=79LAeTJJ_pMIBy3mkF7uNaN0mdBRt5tGrnne5N_iAio,7
- cortex_llm-1.0.4.dist-info/RECORD,,
+ cortex_llm-1.0.6.dist-info/licenses/LICENSE,sha256=_frJ3VsZWQGhMznZw2Tgjk7xwfAfDZRcBl43uZh8_4E,1070
+ cortex_llm-1.0.6.dist-info/METADATA,sha256=6lu4S6Jq8ijbV8MqFFjRU8b0dEp7QcJwPEPo7VFvtBk,4447
+ cortex_llm-1.0.6.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ cortex_llm-1.0.6.dist-info/entry_points.txt,sha256=g83Nuz3iFrNdMLHxGLR2LnscdM7rdQRchuL3WGobQC8,48
+ cortex_llm-1.0.6.dist-info/top_level.txt,sha256=79LAeTJJ_pMIBy3mkF7uNaN0mdBRt5tGrnne5N_iAio,7
+ cortex_llm-1.0.6.dist-info/RECORD,,
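Each RECORD line has the form `path,sha256=<digest>,size`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. A minimal sketch of checking one entry with only the standard library follows. The function names `record_digest` and `check_record_line` and the path `example/hello.txt` are hypothetical, and the sketch assumes the path needs no CSV quoting.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Encode the SHA-256 of data the way wheel RECORD files do."""
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def check_record_line(line: str, data: bytes) -> bool:
    """Return True if data matches the hash and size in one RECORD line."""
    path, hash_field, size_field = line.strip().rsplit(",", 2)
    if not hash_field:
        # RECORD lists itself with an empty hash and size ("...RECORD,,").
        raise ValueError(f"{path} has no recorded hash")
    algorithm, _, expected = hash_field.partition("=")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return expected == record_digest(data) and int(size_field) == len(data)


# Build a made-up entry for illustration, then verify it.
payload = b"hello\n"
line = f"example/hello.txt,sha256={record_digest(payload)},{len(payload)}"
print(check_record_line(line, payload))         # True
print(check_record_line(line, payload + b"!"))  # False
```

This also explains the first RECORD change above: `cortex/__init__.py` keeps its size of 2202 bytes, because `1.0.4` and `1.0.6` are the same length, but the edited version string gives it a new digest.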
cortex_llm-1.0.4.dist-info/METADATA REMOVED
@@ -1,275 +0,0 @@
- Metadata-Version: 2.4
- Name: cortex-llm
- Version: 1.0.4
- Summary: GPU-Accelerated LLM Terminal for Apple Silicon
- Home-page: https://github.com/faisalmumtaz/Cortex
- Author: Cortex Development Team
- License: MIT
- Project-URL: Homepage, https://github.com/faisalmumtaz/Cortex
- Project-URL: Bug Tracker, https://github.com/faisalmumtaz/Cortex/issues
- Project-URL: Documentation, https://github.com/faisalmumtaz/Cortex/wiki
- Keywords: llm,gpu,metal,mps,apple-silicon,ai,machine-learning,terminal,mlx,pytorch
- Platform: darwin
- Classifier: Development Status :: 4 - Beta
- Classifier: Intended Audience :: Developers
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.11
- Classifier: Programming Language :: Python :: 3.12
- Classifier: Operating System :: MacOS
- Classifier: Environment :: Console
- Classifier: Environment :: GPU
- Requires-Python: >=3.11
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: torch>=2.1.0
- Requires-Dist: mlx>=0.10.0
- Requires-Dist: mlx-lm>=0.10.0
- Requires-Dist: transformers>=4.36.0
- Requires-Dist: safetensors>=0.4.0
- Requires-Dist: huggingface-hub>=0.19.0
- Requires-Dist: accelerate>=0.25.0
- Requires-Dist: llama-cpp-python>=0.2.0
- Requires-Dist: pyyaml>=6.0
- Requires-Dist: pydantic>=2.5.0
- Requires-Dist: rich>=13.0.0
- Requires-Dist: psutil>=5.9.0
- Requires-Dist: numpy>=1.24.0
- Requires-Dist: packaging>=23.0
- Requires-Dist: requests>=2.31.0
- Provides-Extra: dev
- Requires-Dist: pytest>=7.4.0; extra == "dev"
- Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
- Requires-Dist: black>=23.0.0; extra == "dev"
- Requires-Dist: ruff>=0.1.0; extra == "dev"
- Requires-Dist: mypy>=1.8.0; extra == "dev"
- Provides-Extra: optional
- Requires-Dist: sentencepiece>=0.1.99; extra == "optional"
- Requires-Dist: auto-gptq>=0.7.0; extra == "optional"
- Requires-Dist: autoawq>=0.2.0; extra == "optional"
- Requires-Dist: bitsandbytes>=0.41.0; extra == "optional"
- Requires-Dist: optimum>=1.16.0; extra == "optional"
- Requires-Dist: torchvision>=0.16.0; extra == "optional"
- Requires-Dist: torchaudio>=2.1.0; extra == "optional"
- Dynamic: home-page
- Dynamic: license-file
- Dynamic: platform
- Dynamic: requires-python
-
- # Cortex - LLM Terminal Client for Apple Silicon
-
- Cortex is an LLM terminal interface designed for Apple Silicon, using MLX and PyTorch MPS frameworks for GPU-accelerated inference.
-
- ## What It Does
-
- - **GPU-accelerated inference** via MLX (primary) and PyTorch MPS backends
- - **Apple Silicon required** - leverages unified memory architecture
- - **Multiple model formats** - MLX, GGUF, SafeTensors, PyTorch, GPTQ, AWQ
- - **Built-in fine-tuning** - LoRA-based model customization via interactive wizard
- - **Chat template auto-detection** - automatic format detection with confidence scoring
- - **Conversation persistence** - SQLite-backed chat history with branching
-
- ## Features
-
- - **GPU-Accelerated Inference** - Delegates to MLX and PyTorch MPS for Metal-based execution
- - **Apple Silicon Only** - Requires Metal GPU; exits if GPU acceleration is unavailable
- - **Model Format Support**:
-   - MLX (Apple's format, loaded via `mlx_lm`)
-   - GGUF (via `llama-cpp-python` with Metal backend)
-   - SafeTensors (via HuggingFace `transformers`)
-   - PyTorch models (via HuggingFace `transformers` with MPS device)
-   - GPTQ quantized (via `auto-gptq`)
-   - AWQ quantized (via `awq`)
- - **Quantization** - 4-bit, 5-bit, 8-bit, and mixed-precision quantization via MLX conversion pipeline
- - **Model Conversion** - Convert HuggingFace models to MLX format with configurable quantization recipes
- - **Template Registry** - Automatic detection of chat templates (ChatML, Llama, Alpaca, Gemma, Reasoning) with confidence scoring and real-time token filtering for reasoning models
- - **Rotating KV Cache** - MLX-based KV cache for long context handling (default 4096 tokens)
- - **Fine-Tuning** - LoRA-based model customization with interactive 6-step wizard
- - **Terminal UI** - ANSI terminal interface with streaming output
-
- ## Installation
-
- ```bash
- # Clone and install
- git clone https://github.com/faisalmumtaz/Cortex.git
- cd Cortex
- ./install.sh
- ```
-
- The installer:
- - Checks for Apple Silicon (arm64) compatibility
- - Creates a Python virtual environment
- - Installs dependencies via `pip install -e .` (from `pyproject.toml`)
- - Sets up the `cortex` command in your PATH
-
- ### Quick Install (pipx)
-
- If you just want the CLI without cloning the repo, use pipx:
-
- ```bash
- pipx install cortex-llm
- ```
-
- ## Quick Start
-
- ```bash
- # After installation, just run:
- cortex
- ```
-
- ### Downloading Models
-
- ```bash
- # Inside Cortex, use the download command:
- cortex
- # Then type: /download
- ```
-
- The download feature:
- - **HuggingFace integration** - download any model by repository ID
- - **Automatic loading** - option to load model immediately after download
-
- ## Documentation
-
- ### User Documentation
- - **[Installation Guide](docs/installation.md)** - Complete setup instructions
- - **[CLI Reference](docs/cli.md)** - Commands and user interface
- - **[Configuration](docs/configuration.md)** - System settings and optimization
- - **[Model Management](docs/model-management.md)** - Loading and managing models
- - **[Template Registry](docs/template-registry.md)** - Automatic chat template detection and management
- - **[Fine-Tuning Guide](docs/fine-tuning.md)** - Customize models with LoRA
- - **[Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions
-
- ### Technical Documentation
- - **[MLX Acceleration](docs/mlx-acceleration.md)** - MLX framework integration and optimization
- - **[GPU Validation](docs/gpu-validation.md)** - Hardware requirements and detection
- - **[Inference Engine](docs/inference-engine.md)** - Text generation architecture
- - **[Conversation Management](docs/conversation-management.md)** - Chat history and persistence
- - **[Development Guide](docs/development.md)** - Contributing and architecture
-
- ## System Requirements
-
- - Apple Silicon Mac (M1/M2/M3/M4 - all variants supported)
- - macOS 13.3+ (required by MLX framework)
- - Python 3.11+
- - 16GB+ unified memory (24GB+ recommended for larger models)
- - Xcode Command Line Tools
-
- ## Performance
-
- Performance depends on your Apple Silicon chip, model size, and quantization level. The inference engine measures tokens/second, first-token latency, and memory usage at runtime.
-
- To check that GPU acceleration is working:
-
- ```bash
- source venv/bin/activate
- python tests/test_apple_silicon.py
- ```
-
- You should see:
- - All validation checks passing
- - Measured GFLOPS from matrix operations
- - Confirmation of Metal and MLX availability
-
- ## GPU Acceleration Architecture
-
- Cortex uses a multi-layer approach, delegating all GPU computation to established frameworks:
-
- 1. **MLX Framework (Primary Backend)**
-    - Apple's ML framework with native Metal support
-    - Quantization support (4-bit, 5-bit, 8-bit, mixed-precision)
-    - Rotating KV cache for long contexts
-    - JIT compilation via `mx.compile`
-    - Operation fusion for reduced kernel launches
-
- 2. **PyTorch MPS Backend**
-    - Metal Performance Shaders for PyTorch models
-    - FP16 optimization and channels-last tensor format
-
- 3. **llama.cpp (GGUF Backend)**
-    - Metal-accelerated inference for GGUF models
-
- 4. **Memory Management**
-    - Pre-allocated memory pools with best-fit/first-fit allocation strategies
-    - Automatic pool sizing (60% of available memory, capped at 75% of total)
-    - Defragmentation support
-
- ### Understanding "Skipping Kernel" Messages
-
- When loading GGUF models, you may see messages like:
- ```
- ggml_metal_init: skipping kernel_xxx_bf16 (not supported)
- ```
-
- **These are NORMAL!** They indicate:
- - BF16 kernels being skipped (your GPU uses FP16 instead)
- - GPU acceleration is still fully active
- - The system automatically uses optimal alternatives
-
- ## Troubleshooting
-
- If you suspect GPU isn't being used:
-
- 1. **Run validation**: `python tests/test_apple_silicon.py`
- 2. **Check output**: Should see passing checks and measured GFLOPS
- 3. **Monitor tokens/sec**: Displayed during inference
- 4. **Verify Metal**: Ensure Xcode Command Line Tools installed
-
- Common issues:
- - **Low performance**: Run `python tests/test_apple_silicon.py` to diagnose
- - **Memory errors**: Reduce `gpu_memory_fraction` in config.yaml
-
- ## MLX Model Conversion
-
- Cortex includes an MLX model converter:
-
- ```python
- from cortex.metal.mlx_converter import MLXConverter, ConversionConfig, QuantizationRecipe
-
- converter = MLXConverter()
- config = ConversionConfig(
-     quantization=QuantizationRecipe.SPEED_4BIT,  # 4-bit quantization
-     compile_model=True  # JIT compilation
- )
-
- success, message, output_path = converter.convert_model(
-     "microsoft/DialoGPT-medium",
-     config=config
- )
- ```
-
- ### Quantization Options
-
- - **4-bit**: Maximum speed, 75% size reduction
- - **5-bit**: Balanced speed and quality
- - **8-bit**: Higher quality, 50% size reduction
- - **Mixed Precision**: Custom per-layer quantization
-
- ## MLX as Primary Backend
-
- Cortex uses MLX (Apple's machine learning framework) as the primary acceleration backend:
- - **Metal Support**: GPU execution via MLX's built-in Metal operations
- - **Quantization**: Support for 4-bit, 5-bit, 8-bit, and mixed-precision quantization
- - **Model Conversion**: Convert HuggingFace models to MLX format
-
- ## Built With
-
- - [MLX](https://github.com/ml-explore/mlx) - Apple's machine learning framework
- - [mlx-lm](https://github.com/ml-explore/mlx-examples) - LLM utilities and LoRA fine-tuning for MLX
- - [PyTorch](https://pytorch.org/) - With Metal Performance Shaders backend
- - [llama.cpp](https://github.com/ggerganov/llama.cpp) - Metal-accelerated GGUF support
- - [Rich](https://github.com/Textualize/rich) - Terminal formatting
- - [HuggingFace](https://huggingface.co/) - Model hub and transformers
-
- ## Contributing
-
- We welcome contributions! Please see the [Development Guide](docs/development.md) for contributing guidelines and setup instructions.
-
- ## License
-
- MIT License - See [LICENSE](LICENSE) for details.
-
- ---
-
- **Note**: Cortex requires Apple Silicon. Intel Macs are not supported.