llmstack-cli 0.1.0__tar.gz → 0.2.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- llmstack_cli-0.2.0/.github/ISSUE_TEMPLATE/bug_report.yml +38 -0
- llmstack_cli-0.2.0/.github/ISSUE_TEMPLATE/feature_request.yml +23 -0
- llmstack_cli-0.2.0/.github/PULL_REQUEST_TEMPLATE.md +17 -0
- llmstack_cli-0.2.0/.github/SECURITY.md +17 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/CHANGELOG.md +13 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/PKG-INFO +102 -44
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/README.md +101 -43
- llmstack_cli-0.2.0/assets/social-preview.png +0 -0
- llmstack_cli-0.2.0/demo.gif +0 -0
- llmstack_cli-0.2.0/demo.tape +51 -0
- llmstack_cli-0.2.0/examples/output.txt +86 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/pyproject.toml +4 -1
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/__init__.py +1 -1
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/app.py +18 -0
- llmstack_cli-0.2.0/src/llmstack/cli/commands/chat.py +104 -0
- llmstack_cli-0.2.0/src/llmstack/cli/commands/export.py +190 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/core/stack.py +12 -1
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/docker/manager.py +23 -7
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/Dockerfile +3 -1
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/gateway/service.py +16 -1
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/observe/prometheus.py +76 -23
- llmstack_cli-0.2.0/tests/unit/test_export.py +65 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/tests/unit/test_services.py +21 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/.github/workflows/ci.yml +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/.github/workflows/release.yml +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/.gitignore +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/CONTRIBUTING.md +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/LICENSE +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/Makefile +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/__main__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/commands/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/commands/doctor.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/commands/down.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/commands/init.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/commands/logs.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/commands/status.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/commands/up.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/cli/console.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/config/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/config/loader.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/config/presets/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/config/presets/agent.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/config/presets/chat.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/config/presets/rag.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/config/schema.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/core/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/core/hardware.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/core/health.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/core/resolver.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/docker/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/main.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/middleware/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/middleware/auth.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/middleware/metrics.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/proxy.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/routes/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/routes/chat.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/routes/embeddings.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/routes/health.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/gateway/routes/models.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/plugins/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/plugins/loader.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/plugins/spec.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/base.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/cache/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/cache/redis.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/embeddings/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/embeddings/tei.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/gateway/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/inference/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/inference/ollama.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/inference/vllm.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/observe/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/registry.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/vectordb/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/src/llmstack/services/vectordb/qdrant.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/tests/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/tests/unit/__init__.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/tests/unit/test_config.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/tests/unit/test_gateway.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/tests/unit/test_hardware.py +0 -0
- {llmstack_cli-0.1.0 → llmstack_cli-0.2.0}/tests/unit/test_resolver.py +0 -0
.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,38 @@
+name: Bug Report
+description: Report a bug or unexpected behavior
+labels: ["bug"]
+body:
+  - type: textarea
+    id: description
+    attributes:
+      label: What happened?
+      description: A clear description of the bug.
+    validations:
+      required: true
+  - type: textarea
+    id: reproduce
+    attributes:
+      label: Steps to reproduce
+      description: Minimal steps to reproduce the issue.
+      placeholder: |
+        1. llmstack init --preset rag
+        2. llmstack up
+        3. ...
+    validations:
+      required: true
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+      description: What did you expect to happen?
+  - type: textarea
+    id: environment
+    attributes:
+      label: Environment
+      description: Paste the output of `llmstack doctor`
+      render: shell
+  - type: input
+    id: version
+    attributes:
+      label: llmstack version
+      placeholder: "0.1.0"
.github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,23 @@
+name: Feature Request
+description: Suggest a new feature or improvement
+labels: ["enhancement"]
+body:
+  - type: textarea
+    id: problem
+    attributes:
+      label: Problem
+      description: What problem does this solve? What's your use case?
+    validations:
+      required: true
+  - type: textarea
+    id: solution
+    attributes:
+      label: Proposed solution
+      description: How would you like this to work?
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives considered
+      description: Any other approaches you've considered?
.github/SECURITY.md
@@ -0,0 +1,17 @@
+# Security Policy
+
+## Reporting a Vulnerability
+
+If you discover a security vulnerability, please report it responsibly:
+
+1. **Do not** open a public issue
+2. Email: [open a private security advisory](https://github.com/mara-werils/llmstack/security/advisories/new)
+3. Include steps to reproduce and potential impact
+
+We will respond within 48 hours and aim to release a fix within 7 days.
+
+## Supported Versions
+
+| Version | Supported |
+|---------|-----------|
+| 0.1.x   | Yes       |
CHANGELOG.md
@@ -1,5 +1,18 @@
 # Changelog
 
+## [0.2.0] - 2026-05-07
+
+### Added
+- `llmstack chat` — interactive terminal chat with streaming responses
+- `llmstack export` — generate standalone docker-compose.yml from llmstack.yaml
+- GitHub issue templates, PR template, security policy
+
+### Fixed
+- Gateway Docker image now builds locally (no longer requires ghcr.io)
+- Prometheus and Grafana configs are written to disk before container start
+- Generated API keys persist to llmstack.yaml across restarts
+- Clear error messages for port conflicts
+
 ## [0.1.0] - 2026-05-07
 
 ### Added
PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: llmstack-cli
-Version: 0.1.0
+Version: 0.2.0
 Summary: One command. Full LLM stack. Zero config.
 Author: mara-werils
 License-Expression: Apache-2.0
@@ -44,21 +44,43 @@ Description-Content-Type: text/markdown
 <a href="https://github.com/mara-werils/llmstack/actions/workflows/ci.yml"><img src="https://github.com/mara-werils/llmstack/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
 <a href="https://github.com/mara-werils/llmstack/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-green" alt="License"></a>
 <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.11+-blue" alt="Python"></a>
+<a href="https://github.com/mara-werils/llmstack/stargazers"><img src="https://img.shields.io/github/stars/mara-werils/llmstack?style=social" alt="Stars"></a>
 </p>
 
 ---
 
-
+<p align="center">
+  <img src="demo.gif" alt="llmstack demo" width="800">
+</p>
+
+## Quick Start
 
 ```bash
 pip install llmstack-cli
-llmstack init
+llmstack init --preset rag
 llmstack up
 ```
 
-That's it. You now have a full LLM API running locally.
+That's it. You now have **7 services** running: inference, embeddings, vector DB, cache, API gateway, Prometheus, and Grafana.
+
+```bash
+# Test it immediately
+curl http://localhost:8000/v1/chat/completions \
+  -H "Authorization: Bearer YOUR_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}'
+```
+
+Works with **any OpenAI-compatible client**: LangChain, LlamaIndex, Vercel AI SDK, openai-python.
+
+## Who is this for?
 
-
+- **AI app developers** who want local inference without Docker boilerplate
+- **Teams** who need an OpenAI-compatible API backed by local models
+- **Hobbyists** running LLMs locally who want vector search, caching, and monitoring out of the box
+- **Anyone** tired of writing 200+ lines of docker-compose.yml every time
+
+## What you get
 
 ```
 llmstack up
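The Quick Start added above ends with a raw curl call; the same gateway endpoint accepts any OpenAI-style SDK once the base URL is overridden. Below is a minimal sketch using LangChain, one of the clients the README names. The `langchain-openai` package and the assumption that the `YOUR_KEY` placeholder maps to the key llmstack writes into llmstack.yaml are illustrative assumptions, not something this diff verifies.

```python
# Sketch: pointing a LangChain chat model at the llmstack gateway.
# Assumes `pip install langchain-openai` and a running `llmstack up` stack;
# the key and model name are the placeholders used in the README above.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # llmstack gateway, not api.openai.com
    api_key="YOUR_KEY",                   # key generated into llmstack.yaml
    model="llama3.2",
)

print(llm.invoke("Say hello in one sentence.").content)
```

LlamaIndex, the Vercel AI SDK, and plain openai-python should follow the same pattern: keep the model name, swap the base URL and key.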
@@ -84,8 +106,6 @@ That's it. You now have a full LLM API running locally.
 :8080
 ```
 
-## What you get
-
 | Layer | Service | Default | Port |
 |-------|---------|---------|------|
 | Inference | Ollama / vLLM (auto) | llama3.2 | 11434 |
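The table restored in this hunk lists an embeddings layer alongside inference, and the file list above shows a gateway route for it (src/llmstack/gateway/routes/embeddings.py). A small sketch of calling that layer through the gateway with openai-python follows; the standard `/v1/embeddings` path and the `bge-m3` model name (taken from examples/output.txt) are assumptions, not guarantees of this release.

```python
# Sketch: calling the stack's embeddings layer through the gateway.
# Assumes the gateway proxies the standard OpenAI /v1/embeddings route and
# that the TEI model is addressed as "bge-m3" (the name shown in
# examples/output.txt); both are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")
resp = client.embeddings.create(model="bge-m3", input=["quantum computing"])
print(len(resp.data[0].embedding))  # dimensionality of the returned vector
```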
@@ -97,7 +117,7 @@ That's it. You now have a full LLM API running locally.
 
 ## How it works
 
-```
+```bash
 llmstack init        # Detects hardware, generates llmstack.yaml
                      # Picks optimal backend: vLLM for NVIDIA 16GB+, Ollama otherwise
 
@@ -109,27 +129,6 @@ llmstack logs ollama # Stream inference logs
 llmstack down        # Stops everything
 ```
 
-### Use the API
-
-```bash
-curl http://localhost:8000/v1/chat/completions \
-  -H "Authorization: Bearer YOUR_KEY" \
-  -H "Content-Type: application/json" \
-  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}'
-```
-
-Works with **any OpenAI-compatible client**: LangChain, LlamaIndex, Vercel AI SDK, openai-python.
-
-```python
-from openai import OpenAI
-
-client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")
-response = client.chat.completions.create(
-    model="llama3.2",
-    messages=[{"role": "user", "content": "Explain quantum computing"}]
-)
-```
-
 ## Auto hardware detection
 
 | Your hardware | Backend | Why |
@@ -181,6 +180,50 @@ observe:
   dashboard_port: 8080
 ```
 
+## Interactive Chat
+
+```bash
+llmstack chat
+```
+
+```
+LLMStack Chat — model: llama3.2
+Type 'exit' or Ctrl+C to quit. '/clear' to reset conversation.
+
+You: What is quantum computing?
+Assistant: Quantum computing uses quantum mechanical phenomena like
+superposition and entanglement to process information...
+
+You: /clear
+Conversation cleared.
+```
+
+Streaming responses, conversation history, works with any model in your stack.
+
+## Export to Docker Compose
+
+Don't want to install llmstack? Generate a standalone `docker-compose.yml`:
+
+```bash
+llmstack export
+# Exported 7 services to docker-compose.yml
+# Run with: docker compose up -d
+```
+
+Share the generated file with your team — no llmstack dependency required.
+
+## Use the API
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")
+response = client.chat.completions.create(
+    model="llama3.2",
+    messages=[{"role": "user", "content": "Explain quantum computing"}]
+)
+```
+
 ## CLI
 
 | Command | Description |
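The new Interactive Chat section above advertises streaming responses, and the same behaviour is reachable from code, since the gateway exposes the OpenAI chat protocol. A minimal sketch follows, assuming the gateway forwards `stream=True` requests as standard SSE chunks the way the openai-python client expects; `YOUR_KEY` and `llama3.2` are the README's placeholders.

```python
# Sketch: streaming a completion from the llmstack gateway token by token,
# roughly what the `llmstack chat` command does in the terminal.
# Assumption: the gateway relays OpenAI-style streaming chunks unchanged.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")
stream = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```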
@@ -189,6 +232,8 @@ observe:
 | `llmstack up [--attach]` | Start all services |
 | `llmstack down [--volumes]` | Stop and clean up |
 | `llmstack status` | Health check all services |
+| `llmstack chat [--model]` | Interactive terminal chat |
+| `llmstack export [--output]` | Generate docker-compose.yml |
 | `llmstack logs <service>` | Stream service logs |
 | `llmstack doctor` | Diagnose system issues |
 
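The CLI table now lists `llmstack export [--output]`, which writes a standalone docker-compose.yml. A quick sketch for sanity-checking that file after export; it only assumes the output is ordinary Compose YAML with a top-level `services` mapping (the generated file's exact contents are not shown in this diff) and that PyYAML is installed.

```python
# Sketch: verifying how many services `llmstack export` wrote out.
# Assumption: the export produces standard Compose YAML with a "services" key.
import yaml  # pip install pyyaml

with open("docker-compose.yml") as f:
    compose = yaml.safe_load(f)

services = compose.get("services", {})
print(f"{len(services)} services:", ", ".join(sorted(services)))
```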
@@ -204,6 +249,34 @@ When `observe.metrics: true`, llmstack boots Prometheus + Grafana with a pre-built
 
 Access at `http://localhost:8080` (login: admin / llmstack)
 
+## Why not just Docker Compose?
+
+Here's what llmstack replaces:
+
+```yaml
+# Without llmstack: ~200 lines of docker-compose.yml
+# You have to configure each service, write health checks,
+# set up networking, manage GPU passthrough, create Prometheus
+# scrape configs, provision Grafana dashboards...
+
+# With llmstack:
+llmstack init && llmstack up
+```
+
+## Comparison
+
+| | llmstack | Ollama | LocalAI | AnythingLLM | LiteLLM |
+|---|---|---|---|---|---|
+| One-command full stack | **Yes** | No (inference only) | No | Partial | No (proxy only) |
+| Auto hardware detection | **Yes** | No | No | No | No |
+| OpenAI-compatible API | **Yes** | Yes | Yes | No | Yes |
+| Built-in vector DB | **Yes** | No | No | Bundled | No |
+| Built-in embeddings | **Yes** | No | No | Bundled | No |
+| Caching (Redis) | **Yes** | No | No | No | No |
+| Auth + rate limiting | **Yes** | No | No | Yes | Yes |
+| Observability dashboard | **Yes** | No | Partial | No | Partial |
+| Plugin ecosystem | **Yes** | No | No | No | No |
+
 ## Plugins
 
 Extend llmstack with new backends via pip:
@@ -215,21 +288,6 @@ pip install llmstack-cli-plugin-chromadb
 
 Create your own: implement `ServiceBase`, register via entry_points. See [CONTRIBUTING.md](CONTRIBUTING.md).
 
-## Why llmstack?
-
-| | llmstack | Ollama | Harbor | AnythingLLM | LiteLLM |
-|---|---|---|---|---|---|
-| One-command full stack | Yes | No (inference only) | Partial | Partial | No (proxy only) |
-| Auto hardware detection | Yes | No | No | No | No |
-| OpenAI-compatible API | Yes | Yes | Varies | No | Yes |
-| Built-in vector DB | Yes | No | Config needed | Bundled | No |
-| Built-in embeddings | Yes | No | No | Bundled | No |
-| Caching (Redis) | Yes | No | No | No | No |
-| Auth + rate limiting | Yes | No | No | Yes | Yes |
-| Observability dashboard | Yes | No | Partial | No | Partial |
-| Plugin ecosystem | Yes | No | No | No | No |
-| SSE streaming | Yes | Yes | Yes | Yes | Yes |
-
 ## Tech stack
 
 - **CLI**: [Typer](https://typer.tiangolo.com/) + [Rich](https://rich.readthedocs.io/)
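The Plugins section kept in this hunk says plugins implement `ServiceBase` and register via entry_points. The real loader (src/llmstack/plugins/loader.py) is unchanged in this release and not shown here, so the sketch below only illustrates the generic entry-point discovery pattern; the group name `llmstack.plugins` is a placeholder for illustration, not necessarily the project's actual group.

```python
# Sketch: how entry_points-based plugin discovery generally works on Python 3.11+.
# Assumption: the group name "llmstack.plugins" is hypothetical; the actual
# loader and ServiceBase contract live in this package's source, not this diff.
from importlib.metadata import entry_points

for ep in entry_points(group="llmstack.plugins"):
    plugin_cls = ep.load()  # e.g. a ServiceBase subclass provided by the plugin
    print(f"discovered plugin: {ep.name} -> {plugin_cls.__qualname__}")
```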
README.md
@@ -9,21 +9,43 @@
 <a href="https://github.com/mara-werils/llmstack/actions/workflows/ci.yml"><img src="https://github.com/mara-werils/llmstack/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
 <a href="https://github.com/mara-werils/llmstack/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-green" alt="License"></a>
 <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.11+-blue" alt="Python"></a>
+<a href="https://github.com/mara-werils/llmstack/stargazers"><img src="https://img.shields.io/github/stars/mara-werils/llmstack?style=social" alt="Stars"></a>
 </p>
 
 ---
 
-
+<p align="center">
+  <img src="demo.gif" alt="llmstack demo" width="800">
+</p>
+
+## Quick Start
 
 ```bash
 pip install llmstack-cli
-llmstack init
+llmstack init --preset rag
 llmstack up
 ```
 
-That's it. You now have a full LLM API running locally.
+That's it. You now have **7 services** running: inference, embeddings, vector DB, cache, API gateway, Prometheus, and Grafana.
+
+```bash
+# Test it immediately
+curl http://localhost:8000/v1/chat/completions \
+  -H "Authorization: Bearer YOUR_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}'
+```
+
+Works with **any OpenAI-compatible client**: LangChain, LlamaIndex, Vercel AI SDK, openai-python.
+
+## Who is this for?
 
-
+- **AI app developers** who want local inference without Docker boilerplate
+- **Teams** who need an OpenAI-compatible API backed by local models
+- **Hobbyists** running LLMs locally who want vector search, caching, and monitoring out of the box
+- **Anyone** tired of writing 200+ lines of docker-compose.yml every time
+
+## What you get
 
 ```
 llmstack up
@@ -49,8 +71,6 @@ That's it. You now have a full LLM API running locally.
 :8080
 ```
 
-## What you get
-
 | Layer | Service | Default | Port |
 |-------|---------|---------|------|
 | Inference | Ollama / vLLM (auto) | llama3.2 | 11434 |
@@ -62,7 +82,7 @@ That's it. You now have a full LLM API running locally.
 
 ## How it works
 
-```
+```bash
 llmstack init        # Detects hardware, generates llmstack.yaml
                      # Picks optimal backend: vLLM for NVIDIA 16GB+, Ollama otherwise
 
@@ -74,27 +94,6 @@ llmstack logs ollama # Stream inference logs
 llmstack down        # Stops everything
 ```
 
-### Use the API
-
-```bash
-curl http://localhost:8000/v1/chat/completions \
-  -H "Authorization: Bearer YOUR_KEY" \
-  -H "Content-Type: application/json" \
-  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}'
-```
-
-Works with **any OpenAI-compatible client**: LangChain, LlamaIndex, Vercel AI SDK, openai-python.
-
-```python
-from openai import OpenAI
-
-client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")
-response = client.chat.completions.create(
-    model="llama3.2",
-    messages=[{"role": "user", "content": "Explain quantum computing"}]
-)
-```
-
 ## Auto hardware detection
 
 | Your hardware | Backend | Why |
@@ -146,6 +145,50 @@ observe:
   dashboard_port: 8080
 ```
 
+## Interactive Chat
+
+```bash
+llmstack chat
+```
+
+```
+LLMStack Chat — model: llama3.2
+Type 'exit' or Ctrl+C to quit. '/clear' to reset conversation.
+
+You: What is quantum computing?
+Assistant: Quantum computing uses quantum mechanical phenomena like
+superposition and entanglement to process information...
+
+You: /clear
+Conversation cleared.
+```
+
+Streaming responses, conversation history, works with any model in your stack.
+
+## Export to Docker Compose
+
+Don't want to install llmstack? Generate a standalone `docker-compose.yml`:
+
+```bash
+llmstack export
+# Exported 7 services to docker-compose.yml
+# Run with: docker compose up -d
+```
+
+Share the generated file with your team — no llmstack dependency required.
+
+## Use the API
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")
+response = client.chat.completions.create(
+    model="llama3.2",
+    messages=[{"role": "user", "content": "Explain quantum computing"}]
+)
+```
+
 ## CLI
 
 | Command | Description |
@@ -154,6 +197,8 @@ observe:
 | `llmstack up [--attach]` | Start all services |
 | `llmstack down [--volumes]` | Stop and clean up |
 | `llmstack status` | Health check all services |
+| `llmstack chat [--model]` | Interactive terminal chat |
+| `llmstack export [--output]` | Generate docker-compose.yml |
 | `llmstack logs <service>` | Stream service logs |
 | `llmstack doctor` | Diagnose system issues |
 
@@ -169,6 +214,34 @@ When `observe.metrics: true`, llmstack boots Prometheus + Grafana with a pre-built
 
 Access at `http://localhost:8080` (login: admin / llmstack)
 
+## Why not just Docker Compose?
+
+Here's what llmstack replaces:
+
+```yaml
+# Without llmstack: ~200 lines of docker-compose.yml
+# You have to configure each service, write health checks,
+# set up networking, manage GPU passthrough, create Prometheus
+# scrape configs, provision Grafana dashboards...
+
+# With llmstack:
+llmstack init && llmstack up
+```
+
+## Comparison
+
+| | llmstack | Ollama | LocalAI | AnythingLLM | LiteLLM |
+|---|---|---|---|---|---|
+| One-command full stack | **Yes** | No (inference only) | No | Partial | No (proxy only) |
+| Auto hardware detection | **Yes** | No | No | No | No |
+| OpenAI-compatible API | **Yes** | Yes | Yes | No | Yes |
+| Built-in vector DB | **Yes** | No | No | Bundled | No |
+| Built-in embeddings | **Yes** | No | No | Bundled | No |
+| Caching (Redis) | **Yes** | No | No | No | No |
+| Auth + rate limiting | **Yes** | No | No | Yes | Yes |
+| Observability dashboard | **Yes** | No | Partial | No | Partial |
+| Plugin ecosystem | **Yes** | No | No | No | No |
+
 ## Plugins
 
 Extend llmstack with new backends via pip:
@@ -180,21 +253,6 @@ pip install llmstack-cli-plugin-chromadb
 
 Create your own: implement `ServiceBase`, register via entry_points. See [CONTRIBUTING.md](CONTRIBUTING.md).
 
-## Why llmstack?
-
-| | llmstack | Ollama | Harbor | AnythingLLM | LiteLLM |
-|---|---|---|---|---|---|
-| One-command full stack | Yes | No (inference only) | Partial | Partial | No (proxy only) |
-| Auto hardware detection | Yes | No | No | No | No |
-| OpenAI-compatible API | Yes | Yes | Varies | No | Yes |
-| Built-in vector DB | Yes | No | Config needed | Bundled | No |
-| Built-in embeddings | Yes | No | No | Bundled | No |
-| Caching (Redis) | Yes | No | No | No | No |
-| Auth + rate limiting | Yes | No | No | Yes | Yes |
-| Observability dashboard | Yes | No | Partial | No | Partial |
-| Plugin ecosystem | Yes | No | No | No | No |
-| SSE streaming | Yes | Yes | Yes | Yes | Yes |
-
 ## Tech stack
 
 - **CLI**: [Typer](https://typer.tiangolo.com/) + [Rich](https://rich.readthedocs.io/)
assets/social-preview.png: Binary file (added)
demo.gif: Binary file (added)
demo.tape
@@ -0,0 +1,51 @@
+# VHS tape for recording llmstack demo GIF
+# Pre-requisite: cp scripts/fake-llmstack /tmp/llmstack
+# Run: vhs demo.tape
+
+Output demo.gif
+
+Set FontSize 16
+Set Width 1200
+Set Height 800
+Set Theme "Catppuccin Mocha"
+Set Padding 20
+Set TypingSpeed 40ms
+Set Shell "bash"
+
+# Set PATH so /tmp/llmstack is found, then clear screen
+Hide
+Type "export PATH=/tmp:$PATH"
+Enter
+Sleep 300ms
+Type "export PS1='$ '"
+Enter
+Sleep 300ms
+Type "clear"
+Enter
+Sleep 500ms
+Show
+
+# --- Scene 1: Show version ---
+Type "llmstack --version"
+Enter
+Sleep 1.5s
+
+# --- Scene 2: Initialize with RAG preset ---
+Type "llmstack init --preset rag"
+Enter
+Sleep 3s
+
+# --- Scene 3: Start the stack ---
+Type "llmstack up"
+Enter
+Sleep 4s
+
+# --- Scene 4: Check status ---
+Type "llmstack status"
+Enter
+Sleep 3s
+
+# --- Scene 5: Tear down ---
+Type "llmstack down"
+Enter
+Sleep 3s
examples/output.txt
@@ -0,0 +1,86 @@
+$ llmstack --version
+llmstack 0.1.0
+
+$ llmstack init --preset rag
+
+Hardware detected:
+  CPU: 10 cores
+  RAM: 32 GB
+  GPU: Apple M2 Pro (16 GB VRAM)
+
+Using preset: rag
+Backend: Ollama
+
+Created llmstack.yaml
+Next: edit the config if needed, then run llmstack up
+
+$ llmstack up
+
+Starting LLMStack...
+
+✓ qdrant      running  :6333
+✓ redis       running  :6379
+✓ ollama      running  :11434  (pulling llama3.2...)
+✓ tei         running  :8002   (loading bge-m3...)
+✓ gateway     running  :8000
+✓ prometheus  running  :9090
+✓ grafana     running  :8080
+
+Stack is ready! 7 services running.
+API:       http://localhost:8000/v1
+Dashboard: http://localhost:8080
+
+$ llmstack status
+
+LLMStack Status
+┌─────────────┬──────────┬─────────┬───────────────┐
+│ Service     │ Container│ Status  │ Ports         │
+├─────────────┼──────────┼─────────┼───────────────┤
+│ qdrant      │ a3f1..   │ running │ 6333->6333    │
+│ redis       │ b7e2..   │ running │ 6379->6379    │
+│ ollama      │ c9d4..   │ running │ 11434->11434  │
+│ tei         │ d2a8..   │ running │ 8002->8002    │
+│ gateway     │ e5c1..   │ running │ 8000->8000    │
+│ prometheus  │ f8b3..   │ running │ 9090->9090    │
+│ grafana     │ 1a7e..   │ running │ 8080->8080    │
+└─────────────┴──────────┴─────────┴───────────────┘
+
+$ curl -s http://localhost:8000/v1/chat/completions \
+  -H "Authorization: Bearer llmstack-key" \
+  -H "Content-Type: application/json" \
+  -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}' | jq .
+{
+  "id": "chatcmpl-abc123",
+  "object": "chat.completion",
+  "created": 1715090400,
+  "model": "llama3.2",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "Hello! How can I help you today?"
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 11,
+    "completion_tokens": 9,
+    "total_tokens": 20
+  }
+}
+
+$ llmstack down
+
+Stopping LLMStack...
+
+✓ grafana stopped
+✓ prometheus stopped
+✓ gateway stopped
+✓ tei stopped
+✓ ollama stopped
+✓ redis stopped
+✓ qdrant stopped
+
+All services stopped.