free_gpu-0.1.1.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- free_gpu-0.1.1/PKG-INFO +317 -0
- free_gpu-0.1.1/README.md +299 -0
- free_gpu-0.1.1/free_gpu/__init__.py +2 -0
- free_gpu-0.1.1/free_gpu/cli.py +217 -0
- free_gpu-0.1.1/free_gpu/data.py +68 -0
- free_gpu-0.1.1/free_gpu/gpu_compute_database.csv +62 -0
- free_gpu-0.1.1/free_gpu/http_app.py +35 -0
- free_gpu-0.1.1/free_gpu/llmfit_adapter.py +140 -0
- free_gpu-0.1.1/free_gpu/mcp_server.py +206 -0
- free_gpu-0.1.1/free_gpu/models.py +145 -0
- free_gpu-0.1.1/free_gpu/planner.py +482 -0
- free_gpu-0.1.1/free_gpu/tui.py +470 -0
- free_gpu-0.1.1/free_gpu.egg-info/PKG-INFO +317 -0
- free_gpu-0.1.1/free_gpu.egg-info/SOURCES.txt +19 -0
- free_gpu-0.1.1/free_gpu.egg-info/dependency_links.txt +1 -0
- free_gpu-0.1.1/free_gpu.egg-info/entry_points.txt +3 -0
- free_gpu-0.1.1/free_gpu.egg-info/requires.txt +7 -0
- free_gpu-0.1.1/free_gpu.egg-info/top_level.txt +1 -0
- free_gpu-0.1.1/pyproject.toml +38 -0
- free_gpu-0.1.1/setup.cfg +4 -0
- free_gpu-0.1.1/tests/test_planner.py +93 -0
free_gpu-0.1.1/PKG-INFO
ADDED
@@ -0,0 +1,317 @@
Metadata-Version: 2.4
Name: free-gpu
Version: 0.1.1
Summary: Plan local and free-tier GPU workflows around llmfit and curated provider data.
Author: Francesco Piccolo
Project-URL: Homepage, https://francescoopiccolo.github.io/free-gpu/
Project-URL: Repository, https://github.com/francescoopiccolo/free-gpu
Project-URL: Documentation, https://francescoopiccolo.github.io/free-gpu/
Project-URL: Issues, https://github.com/francescoopiccolo/free-gpu/issues
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: fastapi<1,>=0.115
Requires-Dist: mcp<2,>=1.27
Requires-Dist: textual<0.90,>=0.80
Provides-Extra: publish
Requires-Dist: build>=1.2; extra == "publish"
Requires-Dist: twine>=5; extra == "publish"

# free-gpu

`free-gpu` is a terminal-first planner for free and near-free compute.

It is designed to sit on top of [`llmfit`](https://www.llmfit.org/):

- `llmfit` answers: what models fit my local hardware?
- `free-gpu` answers: given this workload and compute need, which providers should I use, and how should I split work across local and remote stages?

The goal is not to clone `llmfit` but to use it as the local-fit engine, then layer provider filtering, role-aware ranking, and workflow planning for free, cheap, and grant-style compute on top.

## What users actually get

`free-gpu` helps answer questions like:

- "I need quick inference for a small coding task. Which free-tier provider is the least painful?"
- "I need a few hours of GPU time for LoRA fine-tuning. Should I look at credits, trials, or a cloud free tier?"
- "This task is too heavy for casual free tiers. Which grant or program lane should I consider instead?"
- "What should stay local, and what should move to remote compute?"

## What the repo includes

- The original provider dataset in [`free_gpu/gpu_compute_database.csv`](./free_gpu/gpu_compute_database.csv)
- A Python CLI for provider ranking and workflow planning
- A Textual TUI focused on provider selection rather than local model browsing
- A small MCP server so external agents can ask for provider plans programmatically
- A GitHub Pages-ready project page in [`docs/index.html`](./docs/index.html)

## Core product rules

- Role is a ranking lens, not a hard exclusion filter.
- Budget buckets are semantic UX groupings, not literal accounting truth.
- Grant-like providers behave like card-required options.
- The planner should surface the right provider lane for the task instead of treating every task as the same generic ranking problem.

## Provider lanes

`free-gpu` is not only about "free" in the narrow sense. It plans across several practical lanes:

- `free tier`: browser notebooks, starter quotas, short session access
- `under-25`: credits, trials, starter plans, light paid-but-cheap access
- `grant`: startup programs, research allocations, application-based access, and other heavier support paths

That matters because different tasks naturally fall into different lanes:

- quick demos and lightweight inference often fit the free-tier lane
- medium notebook work and moderate fine-tunes often fit the under-25 or credit lane
- heavier training and long-running work often belong in the grant lane

## Workflow logic

The planner estimates a compute lane from:

- workload
- model size
- target VRAM
- estimated task hours
- parallel jobs
- API needs

Then it schedules providers accordingly:

- `burst`: short runs, quick inference, fast-start options
- `session`: notebook or credit-backed work that lasts longer
- `heavy`: bigger VRAM or sustained remote compute
- `grant-scale`: tasks that look more like allocations, programs, or heavy research/startup support

Each workflow step carries its own compute summary, so a multi-stage plan can recommend different provider types for prep, fine-tune, eval, and serving.
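
The lane estimation above can be sketched roughly as follows. This is a minimal illustration with made-up thresholds and an invented function name, not the actual `free_gpu.planner` logic, which also weighs workload type, model size, and API needs:

```python
def estimate_lane(task_hours: float, min_vram_gb: int, parallel_jobs: int = 1) -> str:
    """Map a task's compute need to one of the planner's lanes (illustrative only)."""
    # Parallel jobs multiply the effective compute demand.
    effective_hours = task_hours * max(parallel_jobs, 1)
    if min_vram_gb >= 40 or effective_hours >= 24:
        return "grant-scale"   # allocation/program territory
    if min_vram_gb >= 16 or effective_hours >= 6:
        return "heavy"         # sustained remote compute
    if effective_hours >= 1:
        return "session"       # notebook or credit-backed work
    return "burst"             # quick runs, fast-start options

print(estimate_lane(0.5, 8))   # short run on modest VRAM
print(estimate_lane(6, 16))    # sustained fine-tune
print(estimate_lane(24, 40))   # allocation-scale work
```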

## How Pages and MCP fit together

The project has two different surfaces:

- GitHub Pages hosts the public project site and docs
- the MCP server runs locally on the user's own machine

GitHub Pages cannot run the planner logic or host the Python MCP server. It is only the public website.

The actual MCP workflow is:

1. a user installs `free-gpu`
2. the user runs `free-gpu-mcp` locally
3. their coding agent connects to that local MCP server
4. the agent can call planner tools such as `plan_provider_workflow`

That means:

- no hosting cost for you
- no central backend to maintain
- users keep control because the tool runs locally
- any MCP-capable coding agent can use it if it supports local MCP servers

This repository also supports an optional hosted HTTP deployment. If you deploy it on Vercel, the MCP endpoint is exposed at `/mcp`.

## Install

```bash
python -m pip install free-gpu
```

For local development from the repository:

```bash
python -m pip install -e .
```

## CLI

### Local profile

```bash
free-gpu local
free-gpu local --ram-gb 32 --vram-gb 12
```

### Provider ranking

```bash
free-gpu providers --workload inference --budget free
free-gpu providers --workload agent-loop --budget under-25 --task-hours 3 --parallel-jobs 4 --requires-api
```

### Workflow planning

```bash
free-gpu plan --workload inference --model qwen2.5-coder-7b --ram-gb 32 --vram-gb 8
free-gpu plan --workload finetune-lora --model llama-3.1-8b --budget under-25 --task-hours 6 --min-vram-gb 16
free-gpu plan --workload scratch-train --budget grant --task-hours 24 --min-vram-gb 40
```

Useful planning flags:

- `--task-hours`
- `--min-vram-gb`
- `--parallel-jobs`
- `--requires-api`
- `--budget any|free|under-25|grant`

Every command also accepts `--json`.

## Terminal UI

Run:

```bash
free-gpu ui
```

The TUI is inspired by `llmfit`'s visual grammar, but it stays focused on provider planning:

- a top system bar with local hardware context from `llmfit`
- broad provider browsing by default
- role, workload, budget, and payment filters
- a central provider table
- bottom panes for links, recommendation context, and workflow summary

Current budget options in the TUI:

- `Budget Any`
- `Free`
- `<25`
- `Grant`

## llmfit integration

If `llmfit` is installed, `free-gpu` will try to use:

- `llmfit system --json`
- `llmfit recommend -n N --json`

The adapter uses structured JSON output rather than scraping terminal text. If `llmfit` is missing or parsing fails, `free-gpu` continues in provider-first mode and reports what it could not infer.

## MCP server

Run:

```bash
free-gpu-mcp
```

The MCP server exposes tools for compute-aware planning, including:

- `plan_provider_workflow`
- `rank_providers_for_task`
- `assess_task_compute`

It also exposes a small dataset summary resource:

- `providers://snapshot`

### What the MCP is for

The MCP server lets an agent ask questions such as:

- "Plan a cheap inference workflow for this task"
- "Rank providers for a 6-hour fine-tune that needs about 16 GB of VRAM"
- "Does this task look like free-tier, credit-tier, or grant-scale work?"

### Generic local MCP setup

If your coding agent supports local MCP servers over stdio, the setup is conceptually:

```json
{
  "mcpServers": {
    "free-gpu": {
      "command": "free-gpu-mcp"
    }
  }
}
```

The exact config file depends on the agent, but the idea is the same: point the client at the local `free-gpu-mcp` command.

### Hosted HTTP MCP on Vercel

This repository also includes a Vercel-friendly HTTP entrypoint via [`app.py`](./app.py).

When deployed on Vercel:

- `/` returns a small service description
- `/health` returns a simple health check
- `/mcp` is the MCP endpoint to connect to
- the live hosted endpoint for this repo is `https://free-gpu.vercel.app/mcp`

That means an MCP-capable client that supports remote HTTP MCP can connect to:

```text
https://free-gpu.vercel.app/mcp
```

If you open `/mcp` in a browser, it may return a protocol-level error such as `406 Not Acceptable`. That is expected: the route is meant for MCP clients, not normal browser navigation.

Example MCP-style request shape:

```json
{
  "tool": "plan_provider_workflow",
  "arguments": {
    "workload": "agent-loop",
    "budget": "under-25",
    "task_hours": 3,
    "parallel_jobs": 4,
    "requires_api": true
  }
}
```
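
On the wire, MCP carries tool calls inside a JSON-RPC 2.0 envelope, so the request shape above would travel roughly like this. The sketch below builds only the envelope; it does not perform the MCP initialize handshake or any network I/O, and `mcp_tools_call` is an invented helper name:

```python
import json

def mcp_tools_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Wrap a tool invocation in the JSON-RPC 2.0 envelope MCP uses."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

payload = mcp_tools_call("plan_provider_workflow", {
    "workload": "agent-loop",
    "budget": "under-25",
    "task_hours": 3,
    "parallel_jobs": 4,
    "requires_api": True,
})
print(payload)
```

In practice an MCP client library handles this framing for you; the point is that the "tool plus arguments" shape maps directly onto a `tools/call` request.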

Example agent flow:

1. You ask your coding agent: "I need to fine-tune an 8B model for about 6 hours and want to stay near free."
2. The agent calls `plan_provider_workflow`.
3. `free-gpu` estimates the compute lane.
4. The agent gets back a structured plan with:
   - local vs remote recommendation
   - stage-by-stage workflow
   - top providers for that compute need
   - whether the task fits free tier, cheap credits, or a grant-style path

## GitHub Pages

A project page is included in [`docs/index.html`](./docs/index.html).

On GitHub, enable Pages and point it at:

- Branch: `main`
- Folder: `/docs`

## Tests

Run:

```bash
python -m unittest tests.test_planner -v
```

## Packaging and publishing

The project is structured so end users do not need to clone the repository.

After publishing to PyPI, users can install it with:

```bash
pip install free-gpu
```

To build distribution artifacts locally:

```bash
python -m pip install ".[publish]"
python -m build
python -m twine check dist/*
```

To publish to PyPI once you have a token for the `free-gpu` project:

```bash
python -m twine upload dist/*
```
free_gpu-0.1.1/README.md
ADDED