@aladac/hu 0.1.0-a1 → 0.1.0-a2

# Implementation Plan: Local Model Integration

## Overview

Extend the existing `models/` module to support MLX LLM inference (text and vision), HuggingFace model discovery, and LM Studio cache scanning.

**Key Principle**: Extend existing infrastructure; don't replace it.

---
## Phase 1: Dependencies & Cache Schema

### 1.1 Update `pyproject.toml`
Add new dependencies:
```toml
"mlx-lm>=0.20",           # Text LLM generation
"huggingface_hub>=0.25",  # HuggingFace downloads
```

### 1.2 Extend `CachedModel` in `src/prompts/models/cache.py`
Add fields (backward compatible):
- `hf_repo_id: Optional[str]` - e.g., `"mlx-community/Qwen2.5-VL-3B"`
- `hf_revision: Optional[str]` - git revision hash
- `memory_gb: Optional[float]` - estimated VRAM usage

Update the `to_dict()` and `from_dict()` methods to round-trip the new fields.
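A minimal sketch of the backward-compatible extension. The existing `CachedModel` fields are elided; `model_id` here is a stand-in for whatever identifier the real dataclass already carries:

```python
from dataclasses import dataclass, asdict
from typing import Any, Optional


@dataclass
class CachedModel:
    # Existing fields elided; only the proposed additions are shown.
    model_id: str
    hf_repo_id: Optional[str] = None   # e.g. "mlx-community/Qwen2.5-VL-3B"
    hf_revision: Optional[str] = None  # git revision hash
    memory_gb: Optional[float] = None  # estimated VRAM usage

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "CachedModel":
        # .get() keeps old manifests, written before these keys existed, loadable.
        return cls(
            model_id=data["model_id"],
            hf_repo_id=data.get("hf_repo_id"),
            hf_revision=data.get("hf_revision"),
            memory_gb=data.get("memory_gb"),
        )
```

Defaulting every new field to `None` is what makes the change backward compatible: old manifest entries deserialize unchanged.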
### 1.3 Update `src/prompts/models/__init__.py`
Add path constants:
```python
LMSTUDIO_MODELS_PATH = Path.home() / ".lmstudio" / "models" / "mlx-community"
PROMPTS_MLX_CACHE = Path.home() / ".cache" / "prompts" / "mlx"
```

---
## Phase 2: LM Studio Scanner

### 2.1 Create `src/prompts/models/lmstudio.py`
New file with:
- `LMStudioModel` dataclass
- `scan_lmstudio()` function - iterates MLX model directories
- Parses `config.json` to detect text vs. vision models
- Calculates directory size for memory estimation
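A sketch of the scanner under two stated assumptions: the scan root is passed in as a parameter (the real function would default to `LMSTUDIO_MODELS_PATH`), and vision models are detected by the presence of a `vision_config` section in `config.json`, which holds for Qwen2-VL-style checkpoints but may not for every layout:

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LMStudioModel:
    name: str
    path: Path
    is_vision: bool
    memory_gb: float


def scan_lmstudio(root: Path) -> list[LMStudioModel]:
    """Scan an LM Studio MLX cache directory for downloaded models."""
    models: list[LMStudioModel] = []
    if not root.is_dir():
        return models
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        config_path = model_dir / "config.json"
        if not config_path.is_file():
            continue  # not a model directory
        config = json.loads(config_path.read_text())
        # Assumed heuristic: vision checkpoints carry a vision_config section.
        is_vision = "vision_config" in config
        # On-disk size as a rough proxy for load-time memory.
        size_bytes = sum(f.stat().st_size for f in model_dir.rglob("*") if f.is_file())
        models.append(
            LMStudioModel(
                name=model_dir.name,
                path=model_dir,
                is_vision=is_vision,
                memory_gb=round(size_bytes / 1024**3, 2),
            )
        )
    return models
```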
### 2.2 Add `scan` command to `src/prompts/models/cli.py`
```
prompts models scan             # Scan all sources
prompts models scan --lmstudio  # LM Studio only
prompts models scan --sm        # StabilityMatrix only
```

---
## Phase 3: HuggingFace Integration

### 3.1 Create `src/prompts/models/huggingface.py`
New file with:
- `HFModelInfo` dataclass
- `search_mlx_models(query)` - search the mlx-community namespace
- `download_mlx_model(repo_id)` - use `snapshot_download()`
- `check_existing_caches(repo_id)` - check the LM Studio and prompts caches first
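A sketch of the three functions. The path constants are repeated here so the block is self-contained (the real ones live in `prompts.models`), the cache layout is assumed to be one directory per model named after the repo basename and containing a `config.json`, and the `huggingface_hub` imports are deferred so the module loads without the optional dependency:

```python
from pathlib import Path
from typing import Optional

# Repeated for self-containment; the real constants live in prompts.models.
LMSTUDIO_MODELS_PATH = Path.home() / ".lmstudio" / "models" / "mlx-community"
PROMPTS_MLX_CACHE = Path.home() / ".cache" / "prompts" / "mlx"


def check_existing_caches(repo_id: str, roots: Optional[list[Path]] = None) -> Optional[Path]:
    """Return a local path for repo_id if some cache already holds it."""
    name = repo_id.split("/")[-1]  # "mlx-community/Foo" -> "Foo"
    for root in roots or [LMSTUDIO_MODELS_PATH, PROMPTS_MLX_CACHE]:
        candidate = root / name
        if (candidate / "config.json").is_file():
            return candidate
    return None


def search_mlx_models(query: str, limit: int = 20):
    """Search the mlx-community namespace on the Hub."""
    from huggingface_hub import HfApi  # deferred: optional dependency

    return list(HfApi().list_models(search=query, author="mlx-community", limit=limit))


def download_mlx_model(repo_id: str) -> Path:
    """Download via huggingface_hub, unless a cache already has the model."""
    cached = check_existing_caches(repo_id)
    if cached is not None:
        return cached
    from huggingface_hub import snapshot_download  # deferred: optional dependency

    target = PROMPTS_MLX_CACHE / repo_id.split("/")[-1]
    return Path(snapshot_download(repo_id, local_dir=target))
```

Checking the caches before downloading avoids re-fetching multi-gigabyte weights that LM Studio already has on disk.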
### 3.2 Extend `search` command in `src/prompts/models/cli.py`
Add a `--provider hf` option to the existing search command.
### 3.3 Add `download` command to `src/prompts/models/cli.py`
```
prompts models download mlx-community/Qwen2.5-VL-3B
prompts models download 827184 --provider civitai
```
Auto-detect the provider from the ID format.
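The auto-detection could be as simple as the following hypothetical helper (the name `detect_provider` is an assumption, not an existing function): an all-digit ID looks like a Civitai model ID, while an `owner/name` path looks like a HuggingFace repo:

```python
def detect_provider(model_id: str) -> str:
    """Guess the download provider from the shape of the ID."""
    if model_id.isdigit():
        return "civitai"  # Civitai uses numeric model IDs, e.g. 827184
    if "/" in model_id:
        return "hf"  # HuggingFace repos are "owner/name" paths
    raise ValueError(f"Cannot infer provider from {model_id!r}; pass --provider explicitly")
```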
---

## Phase 4: MLX Runtime Classes

### 4.1 Create `src/prompts/models/runtime/` directory
New submodule structure:
```
runtime/
├── __init__.py   # Convenience functions
├── text.py       # TextLLM singleton class
└── vision.py     # VisionLLM singleton class
```
### 4.2 Create `src/prompts/models/runtime/text.py`
`TextLLM` class:
- Singleton pattern with a thread lock
- `load(model_path)` - load via `mlx_lm.load()`
- `generate(prompt)` - generate via `mlx_lm.generate()`
- `unload()` - free memory with `gc.collect()`
### 4.3 Create `src/prompts/models/runtime/vision.py`
`VisionLLM` class:
- Singleton pattern with a thread lock
- `load(model_path)` - load via `mlx_vlm.load()`
- `describe(image_path, prompt)` - generate an image description
- `unload()` - free memory
### 4.4 Create `src/prompts/models/runtime/__init__.py`
Convenience functions:
- `generate(prompt, model=None)` - text generation
- `describe_image(image_path, prompt, model=None)` - vision description
- `_resolve_model_path(model_id)` - check caches for a local copy
---

## Phase 5: Additional CLI Commands

### 5.1 Add `convert` command
```
prompts models convert Qwen/Qwen2.5-7B-Instruct --quantize 4bit
```
Use `mlx_lm.convert` or `mlx_vlm.convert`.
### 5.2 Add `remove` command
```
prompts models remove <model_id>           # Remove from manifest
prompts models remove <model_id> --delete  # Also delete files
```
Skip file deletion for external sources (`lmstudio`, `sm`).
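The external-source guard might look like this. The manifest shape (a dict of IDs to entries with `source` and `path` keys) is an assumption for illustration; the real schema lives in `cache.py`:

```python
import shutil

EXTERNAL_SOURCES = {"lmstudio", "sm"}  # never delete files another tool owns


def remove_model(manifest: dict, model_id: str, delete_files: bool = False) -> bool:
    """Drop model_id from the manifest; optionally delete its files on disk."""
    entry = manifest.pop(model_id, None)
    if entry is None:
        return False  # unknown ID; nothing removed
    if delete_files and entry.get("source") not in EXTERNAL_SOURCES:
        shutil.rmtree(entry["path"], ignore_errors=True)
    return True
```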
### 5.3 Add `nuke` command (top-level or under `models`)
```
prompts nuke --cache   # Delete ~/.cache/prompts/
prompts nuke --config  # Delete ~/.config/prompts/
prompts nuke --all     # Delete both
```
Require confirmation unless the `--yes` flag is passed.
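A sketch of the destructive core with the confirmation gate; the CLI layer would map `--cache`/`--config`/`--all` to the target list and `--yes` to `assume_yes`:

```python
import shutil
from pathlib import Path


def nuke(targets: list[Path], assume_yes: bool = False) -> list[Path]:
    """Delete the given directories, prompting for confirmation first."""
    if not assume_yes:
        names = ", ".join(str(t) for t in targets)
        # Default answer is "no": anything but an explicit "y" aborts.
        if input(f"Delete {names}? [y/N] ").strip().lower() != "y":
            return []
    removed = []
    for target in targets:
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(target)
    return removed
```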
---

## Files to Modify

| File | Action |
|------|--------|
| `pyproject.toml` | Add `mlx-lm`, `huggingface_hub` deps |
| `src/prompts/models/__init__.py` | Add path constants |
| `src/prompts/models/cache.py` | Add `hf_repo_id`, `hf_revision`, `memory_gb` fields |
| `src/prompts/models/cli.py` | Add `scan`, `download`, `convert`, `remove` commands |
## Files to Create

| File | Purpose |
|------|---------|
| `src/prompts/models/lmstudio.py` | LM Studio MLX scanner |
| `src/prompts/models/huggingface.py` | HuggingFace search/download |
| `src/prompts/models/runtime/__init__.py` | Convenience functions |
| `src/prompts/models/runtime/text.py` | `TextLLM` singleton |
| `src/prompts/models/runtime/vision.py` | `VisionLLM` singleton |
---

## Verification

1. **Scan test**: `uv run prompts models scan` - should discover LM Studio models
2. **Search test**: `uv run prompts models search "qwen" --provider hf` - should return HuggingFace results
3. **Download test**: `uv run prompts models download mlx-community/Qwen2.5-0.5B-Instruct-4bit`
4. **Runtime test**: `from prompts.models.runtime import generate, describe_image` should import cleanly
---

## Deferred (Not in This Implementation)

- **SD Runtime (Diffusers)**: deferred, since ComfyUI already handles SD generation via StabilityMatrix
- **MLX Conversion**: lower priority; can be added later
- **Unit Tests**: add after the core implementation works
---

## Implementation Order

1. Dependencies (`pyproject.toml`)
2. Cache schema extension (`cache.py`, `__init__.py`)
3. LM Studio scanner + `scan` command
4. HuggingFace integration + `download` command
5. MLX runtime classes
6. Additional CLI commands (`remove`, `nuke`)