PyPI - programasweights - Versions diffs - 0.1.0.dev6__tar.gz → 0.1.0.dev7__tar.gz - Mend

programasweights 0.1.0.dev6.tar.gz → 0.1.0.dev7.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (509)

programasweights-0.1.0.dev7/.cursor/rules/vllm-deployment.mdc ADDED

@@ -0,0 +1,108 @@
+---
+description: vLLM deployment rules and known issues for PAW servers
+alwaysApply: true
+---
+# vLLM Deployment Rules
+## Critical Environment Variables
+### VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
+**MUST** be set when starting any vLLM inference server with `--enable-lora`.
+Without this, `/v1/load_lora_adapter` and `/v1/unload_lora_adapter` endpoints
+are NOT registered, and dynamic LoRA loading silently fails.
+```bash
+VLLM_ALLOW_RUNTIME_LORA_UPDATING=1 python -m vllm.entrypoints.openai.api_server \
+    --model ... --enable-lora --max-loras 100 --max-lora-rank 64 ...
+```
+### VLLM_TORCH_COMPILE_LEVEL=0
+Set this on servers that don't have full CUDA dev toolchain installed.
+Without it, vLLM tries to compile CUDA kernels via gcc which may fail
+with missing headers.
+## vLLM Multi-LoRA Inference Flow
+1. Start server with `--enable-lora` + `VLLM_ALLOW_RUNTIME_LORA_UPDATING=1`
+2. Load adapter: `POST /v1/load_lora_adapter {"lora_name": "my_adapter", "lora_path": "/path/to/peft/dir"}`
+3. Infer: `POST /v1/completions {"model": "my_adapter", "prompt": "...", ...}`
+4. The adapter dir must contain `adapter_config.json` + `adapter_model.safetensors` (PEFT format)
+## vLLM Compiler Pooling (PawCompilerForPooling)
+- Must use `--enforce-eager` flag (CUDA graph capture hangs otherwise)
+- Must use `--runner pooling` flag
+- The model must be registered in vLLM's registry. Use `scripts/patch_vllm.sh` after every pip install.
+- Endpoint: `POST /v1/embeddings` (NOT `/pooling`)
+## Post-Install Patch (scripts/patch_vllm.sh)
+Run after every `pip install vllm` or `pip install --upgrade vllm`:
+```bash
+cd server && bash scripts/patch_vllm.sh /root/paw-venv
+```
+This patches:
+1. GPT-2 LoRA support (adds `SupportsLoRA` mixin to `GPT2LMHeadModel`)
+2. PawCompilerForPooling registration in vLLM model registry
+## Server Architecture
+- Main server (95.133.252.12): API gateway + Qwen3 provider (port 9000) + vLLM services
+- GPT-2 server (95.133.253.173): GPT-2 provider (port 9000) + vLLM services
+- Pseudo-gen (Qwen3-4B) runs on main server GPU 0 only; GPT-2 server shares it remotely
+- Each provider is self-contained: owns checkpoint, tokenizers, compilation pipeline
+## GPT-2 Specifics
+- Interpreter uses NO chat template (use_chat_template: false in config)
+- LoRA target modules: c_attn, attn_c_proj, c_fc, mlp_c_proj (disambiguated names)
+- GGUF arch: "gpt2" (not "qwen3")
+- MODULE_TO_PEFT mapping differs from Qwen3 (transformer.h.{} vs model.layers.{})
+- Position embeddings extended to 2048 (saved in checkpoint interpreter/)
+## Deployment: Git Push/Pull Only
+NEVER use rsync for code. Always:
+```bash
+# Local
+git push origin main
+# Server
+cd /data/paw-repo && git pull
+```
+Model weight files (checkpoints, GGUFs) use scp between servers — they don't belong in git.
+## Service Dependencies on Main Server (95.133.252.12)
+The API gateway (port 8000) depends on the Qwen3 CompileProvider (port 9000).
+When restarting the API, ALWAYS check that the provider is also running.
+Services that must be running:
+- vLLM pseudo-gen: port 8001 (GPU 0)
+- vLLM compiler: port 8002 (GPU 1, --runner pooling)
+- vLLM inference: port 8003 (GPU 1, --enable-lora)
+- CompileProvider: port 9000 (depends on 8001, 8002, 8003)
+- API gateway: port 8000 (depends on 9000)
+After restarting the API, verify the provider:
+```bash
+curl -s http://localhost:9000/provider/health
+```
+If not running, start it:
+```bash
+cd /data/paw-server && PYTHONPATH=. PROVIDER_CONFIG=provider_configs/qwen3-0.6b.json \
+  nohup /root/paw-venv/bin/uvicorn api.services.compile_provider:create_app \
+  --factory --host 0.0.0.0 --port 9000 > /tmp/paw-provider-9000.log 2>&1 &
+```
+## Testing: Use 36 Hard Examples
+When testing compile+infer, use the 36 handcrafted examples from
+`server/benchmarks/handcrafted_specs.json`. Simple tasks work without LoRA.
+Hard tasks (Caesar cipher, JSON extraction) require LoRA to function correctly.
+Expected baselines:
+- Qwen3-0.6B: 30/36 (83.3%) via server, 32/36 (88.9%) via local HF
+- GPT-2: 24/36 (66.7%) via local HF (server TBD)
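
Reviewer sketch (not part of the package): the multi-LoRA inference flow described in `vllm-deployment.mdc` above can be illustrated by building the two HTTP requests it names. The endpoint paths and JSON fields come from the rules file; the base URL (port 8003 is the inference service listed there), the helper names, and the `max_tokens` value are assumptions made for illustration. The requests are constructed but not sent, so the block runs without a server.

```python
import json
import urllib.request

# Assumption: the --enable-lora inference server listens on port 8003.
BASE_URL = "http://localhost:8003"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the vLLM server."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_adapter_request(name: str, peft_dir: str) -> urllib.request.Request:
    # Step 2 of the flow: register a PEFT adapter directory under a name.
    return build_post(
        "/v1/load_lora_adapter",
        {"lora_name": name, "lora_path": peft_dir},
    )


def completion_request(
    adapter_name: str, prompt: str, max_tokens: int = 64
) -> urllib.request.Request:
    # Step 3 of the flow: the adapter name is passed in the "model" field.
    return build_post(
        "/v1/completions",
        {"model": adapter_name, "prompt": prompt, "max_tokens": max_tokens},
    )


if __name__ == "__main__":
    requests_to_send = (
        load_adapter_request("my_adapter", "/path/to/peft/dir"),
        completion_request("my_adapter", "Hello"),
    )
    for req in requests_to_send:
        # Print instead of sending; urllib.request.urlopen(req) would send it
        # and requires a server started with VLLM_ALLOW_RUNTIME_LORA_UPDATING=1.
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payloads easy to inspect when the load endpoint is not registered, which is the failure mode the rules file warns about.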

programasweights-0.1.0.dev7/.git ADDED

@@ -0,0 +1 @@
+gitdir: ../.git/modules/programasweights

programasweights-0.1.0.dev7/.github/workflows/web-sdk.yml ADDED

@@ -0,0 +1,51 @@
+name: Web SDK CI
+on:
+  push:
+    paths: ['web-sdk/**']
+  pull_request:
+    paths: ['web-sdk/**']
+jobs:
+  lint-and-test:
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: web-sdk
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: npm
+          cache-dependency-path: web-sdk/package-lock.json
+      - run: npm ci
+      - run: npm run typecheck
+      - run: npm test
+      - run: npm run build
+  publish:
+    if: startsWith(github.ref, 'refs/tags/web-sdk-v')
+    needs: [lint-and-test]
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: web-sdk
+    permissions:
+      contents: read
+      id-token: write
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          submodules: recursive
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          registry-url: https://registry.npmjs.org
+      - run: npm ci
+      - run: npm run build
+      - run: npm publish --access public --provenance
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}

programasweights-0.1.0.dev7/.gitmodules ADDED

@@ -0,0 +1,3 @@
+[submodule "web-sdk/wllama"]
+	path = web-sdk/wllama
+	url = git@github.com:programasweights/wllama.git

programasweights-0.1.0.dev7/.readthedocs.yaml ADDED

@@ -0,0 +1,13 @@
+version: 2
+build:
+  os: ubuntu-24.04
+  tools:
+    python: "3.12"
+mkdocs:
+  configuration: mkdocs.yml
+python:
+  install:
+    - requirements: docs/requirements.txt

programasweights-0.1.0.dev7/AGENTS.md ADDED

@@ -0,0 +1,112 @@
+# ProgramAsWeights (PAW)
+PAW compiles natural language specifications into tiny neural functions that run locally. Use it when you need fuzzy text processing — classification, extraction, format repair, search, triage — that regex can't handle but a full LLM is overkill for.
+Website: https://programasweights.com
+Docs: https://programasweights.readthedocs.io
+## Install
+```bash
+pip install --pre programasweights
+```
+## Usage
+```python
+import programasweights as paw
+# Use a pre-compiled function (downloads once, runs locally forever)
+fn = paw.function("email-triage")
+fn("Urgent: server is down!")  # "immediate"
+fn("Newsletter: spring picnic")  # "wait"
+# Compile your own from a description
+program = paw.compile(
+    "Fix malformed JSON: repair missing quotes and trailing commas",
+    compiler="paw-4b-qwen3-0.6b"  # or "paw-4b-gpt2" for smaller/faster
+)
+fn = paw.function(program.id)
+fn("{name: 'Alice',}")  # '{"name": "Alice"}'
+```
+## Two Compilers
+| | Qwen3 0.6B | GPT-2 124M |
+|---|---|---|
+| Compiler name | `paw-4b-qwen3-0.6b` | `paw-4b-gpt2` |
+| Accuracy | Higher | Lower |
+| Base model size | 594 MB | 105 MB |
+| Program size | ~22 MB | ~5 MB |
+| Inference speed | ~90ms (server) | ~50ms (server) |
+| Runs in browser | No | Yes |
+Default is Qwen3 0.6B. Use GPT-2 when you need smaller files or browser deployment.
+## Browser / JavaScript SDK
+Programs compiled with GPT-2 also run entirely in the browser via WebAssembly — no server needed. Users' data never leaves their device.
+```bash
+npm install @programasweights/web
+```
+```javascript
+import paw from '@programasweights/web';
+const fn = await paw.function('programasweights/email-triage');
+const result = await fn('Urgent: server is down!');
+// result: "immediate"
+```
+Use this when you want to offload all computation to the user's browser: zero server costs, zero API keys, full data privacy. The 105 MB base model downloads once and is cached.
+## When to Use PAW
+- **Fuzzy search** — typo-tolerant matching, semantic search, near-duplicate detection
+- **Format repair** — fix broken JSON, normalize dates, repair malformed inputs
+- **Classification** — sentiment, urgency, categories defined in your own words
+- **Extraction** — emails, names, dates from messy unstructured text
+- **Log triage** — extract errors from verbose output, filter noise
+- **Agent preprocessing** — parse tool calls, validate outputs, route tasks
+## Authentication
+```bash
+# Option 1: environment variable (recommended)
+export PAW_API_KEY=paw_sk_...
+# Option 2: CLI login (opens browser to generate key)
+paw login
+```
+Generate API keys at https://programasweights.com/settings
+The SDK automatically reads `PAW_API_KEY` from the environment. Authenticated users get higher rate limits (60 compiles/hr vs 5 for anonymous).
+## CLI
+```bash
+paw compile --spec "Classify sentiment as positive or negative" --json
+paw run --program <program_id> --input "I love this!" --json
+paw login  # Save API key for higher rate limits
+```
+`--json` gives structured output for programmatic use.
+## API
+```python
+paw.compile(spec, compiler="paw-4b-qwen3-0.6b")  # Compile a spec
+paw.function(name_or_id)                           # Load a compiled program
+paw.login()                                        # Save API key
+```
+## Browse Programs
+https://programasweights.com/hub
+## Add PAW to Your Project
+Copy this file into your project as AGENTS.md:
+https://programasweights.com/agents

programasweights-0.1.0.dev7/PKG-INFO ADDED

@@ -0,0 +1,141 @@
+Metadata-Version: 2.4
+Name: programasweights
+Version: 0.1.0.dev7
+Summary: Compile natural language specifications into neural programs that run locally via llama.cpp.
+Project-URL: Homepage, https://programasweights.com
+Project-URL: Repository, https://github.com/programasweights/programasweights-python
+Project-URL: Documentation, https://programasweights.readthedocs.io
+Project-URL: Bug Tracker, https://github.com/programasweights/programasweights-python/issues
+Author-email: ProgramAsWeights <support@programasweights.com>
+License: MIT
+Keywords: inference,llama-cpp,lora,neural-programs,nlp
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Python: >=3.9
+Requires-Dist: httpx<1.0,>=0.27.0
+Requires-Dist: llama-cpp-python<1.0,>=0.3.0
+Provides-Extra: test
+Requires-Dist: pytest; extra == 'test'
+Description-Content-Type: text/markdown
+# ProgramAsWeights
+**Compile natural language specs into tiny neural functions that run locally.**
+Define what a function should do in plain English. PAW compiles it into a small neural program that runs on your machine — no API keys at runtime, no internet needed after setup, fully deterministic.
+## Install
+```bash
+pip install programasweights
+```
+## Quick Start
+```python
+import programasweights as paw
+# Use a pre-compiled function (downloads once, runs locally forever)
+fn = paw.function("email-triage")
+fn("Urgent: the server is down!")        # "immediate"
+fn("Newsletter: spring picnic")          # "wait"
+# Compile your own from a description
+program = paw.compile(
+    "Fix malformed JSON: repair missing quotes and trailing commas",
+    compiler="paw-4b-qwen3-0.6b"  # or "paw-4b-gpt2" for smaller/faster
+)
+fn = paw.function(program.id)
+fn("{name: 'Alice',}")  # '{"name": "Alice"}'
+```
+## Two Compilers
+|                    | Qwen3 0.6B              | GPT-2 124M             |
+|--------------------|-------------------------|------------------------|
+| Compiler name      | `paw-4b-qwen3-0.6b`    | `paw-4b-gpt2`          |
+| Accuracy           | Higher                  | Lower                  |
+| Base model size    | 594 MB                  | 105 MB                 |
+| Program size       | ~22 MB                  | ~5 MB                  |
+| Inference speed    | ~90ms (server)          | ~50ms (server)         |
+| Runs in browser    | No                      | Yes                    |
+Default is Qwen3 0.6B. Use GPT-2 when you need smaller files or browser deployment.
+## Browser SDK
+Programs compiled with GPT-2 also run entirely in the browser via WebAssembly — no server needed, data never leaves the user's device.
+```bash
+npm install @programasweights/web
+```
+```javascript
+import paw from '@programasweights/web';
+const fn = await paw.function('programasweights/email-triage');
+const result = await fn('Urgent: the server is down!');
+// result: "immediate"
+```
+See the [browser SDK repo](https://github.com/programasweights/programasweights-js) for full documentation.
+## Use with AI Agents
+PAW works with Cursor, Claude, Codex, and other AI coding assistants. Paste this into your agent's chat:
+> I want to use ProgramAsWeights (PAW) to create fuzzy text functions that run locally. Read the instructions at https://programasweights.com/agents and help me integrate it.
+Or save [`AGENTS.md`](https://programasweights.com/agents) to your project root — agents read it automatically.
+## When to Use PAW
+- **Fuzzy search** — typo-tolerant matching, semantic search, near-duplicate detection
+- **Format repair** — fix broken JSON, normalize dates, repair malformed inputs
+- **Classification** — sentiment, urgency, categories defined in your own words
+- **Extraction** — emails, names, dates from messy unstructured text
+- **Log triage** — extract errors from verbose output, filter noise
+- **Agent preprocessing** — parse tool calls, validate outputs, route tasks
+## Authentication
+```bash
+# Option 1: environment variable (recommended)
+export PAW_API_KEY=paw_sk_...
+# Option 2: CLI login (opens browser to generate key)
+paw login
+```
+Generate API keys at [programasweights.com/settings](https://programasweights.com/settings). Authenticated users get higher rate limits.
+## CLI
+```bash
+paw compile --spec "Extract error lines from logs" --json
+paw run --program <program_id> --input "[ERROR] timeout" --json
+paw login
+```
+`--json` gives structured output for programmatic use.
+## Links
+- **Website**: [programasweights.com](https://programasweights.com)
+- **Documentation**: [programasweights.readthedocs.io](https://programasweights.readthedocs.io)
+- **Python SDK**: [github.com/programasweights/programasweights-python](https://github.com/programasweights/programasweights-python)
+- **Browser SDK**: [github.com/programasweights/programasweights-js](https://github.com/programasweights/programasweights-js)
+- **Program Hub**: [programasweights.com/hub](https://programasweights.com/hub)
+## License
+MIT

programasweights-0.1.0.dev7/PYPI_README.md ADDED

@@ -0,0 +1,112 @@
+# ProgramAsWeights
+**Compile natural language specs into tiny neural functions that run locally.**
+Define what a function should do in plain English. PAW compiles it into a small neural program that runs on your machine — no API keys at runtime, no internet needed after setup, fully deterministic.
+## Install
+```bash
+pip install programasweights
+```
+## Quick Start
+```python
+import programasweights as paw
+# Use a pre-compiled function (downloads once, runs locally forever)
+fn = paw.function("email-triage")
+fn("Urgent: the server is down!")        # "immediate"
+fn("Newsletter: spring picnic")          # "wait"
+# Compile your own from a description
+program = paw.compile(
+    "Fix malformed JSON: repair missing quotes and trailing commas",
+    compiler="paw-4b-qwen3-0.6b"  # or "paw-4b-gpt2" for smaller/faster
+)
+fn = paw.function(program.id)
+fn("{name: 'Alice',}")  # '{"name": "Alice"}'
+```
+## Two Compilers
+|                    | Qwen3 0.6B              | GPT-2 124M             |
+|--------------------|-------------------------|------------------------|
+| Compiler name      | `paw-4b-qwen3-0.6b`    | `paw-4b-gpt2`          |
+| Accuracy           | Higher                  | Lower                  |
+| Base model size    | 594 MB                  | 105 MB                 |
+| Program size       | ~22 MB                  | ~5 MB                  |
+| Inference speed    | ~90ms (server)          | ~50ms (server)         |
+| Runs in browser    | No                      | Yes                    |
+Default is Qwen3 0.6B. Use GPT-2 when you need smaller files or browser deployment.
+## Browser SDK
+Programs compiled with GPT-2 also run entirely in the browser via WebAssembly — no server needed, data never leaves the user's device.
+```bash
+npm install @programasweights/web
+```
+```javascript
+import paw from '@programasweights/web';
+const fn = await paw.function('programasweights/email-triage');
+const result = await fn('Urgent: the server is down!');
+// result: "immediate"
+```
+See the [browser SDK repo](https://github.com/programasweights/programasweights-js) for full documentation.
+## Use with AI Agents
+PAW works with Cursor, Claude, Codex, and other AI coding assistants. Paste this into your agent's chat:
+> I want to use ProgramAsWeights (PAW) to create fuzzy text functions that run locally. Read the instructions at https://programasweights.com/agents and help me integrate it.
+Or save [`AGENTS.md`](https://programasweights.com/agents) to your project root — agents read it automatically.
+## When to Use PAW
+- **Fuzzy search** — typo-tolerant matching, semantic search, near-duplicate detection
+- **Format repair** — fix broken JSON, normalize dates, repair malformed inputs
+- **Classification** — sentiment, urgency, categories defined in your own words
+- **Extraction** — emails, names, dates from messy unstructured text
+- **Log triage** — extract errors from verbose output, filter noise
+- **Agent preprocessing** — parse tool calls, validate outputs, route tasks
+## Authentication
+```bash
+# Option 1: environment variable (recommended)
+export PAW_API_KEY=paw_sk_...
+# Option 2: CLI login (opens browser to generate key)
+paw login
+```
+Generate API keys at [programasweights.com/settings](https://programasweights.com/settings). Authenticated users get higher rate limits.
+## CLI
+```bash
+paw compile --spec "Extract error lines from logs" --json
+paw run --program <program_id> --input "[ERROR] timeout" --json
+paw login
+```
+`--json` gives structured output for programmatic use.
+## Links
+- **Website**: [programasweights.com](https://programasweights.com)
+- **Documentation**: [programasweights.readthedocs.io](https://programasweights.readthedocs.io)
+- **Python SDK**: [github.com/programasweights/programasweights-python](https://github.com/programasweights/programasweights-python)
+- **Browser SDK**: [github.com/programasweights/programasweights-js](https://github.com/programasweights/programasweights-js)
+- **Program Hub**: [programasweights.com/hub](https://programasweights.com/hub)
+## License
+MIT

programasweights-0.1.0.dev7/docs/advanced/adrs.md ADDED

@@ -0,0 +1,66 @@
+# Architecture Decision Records
+Concise records of major technical choices. Full ADR files may live elsewhere in the repository; this page is the canonical summary for documentation.
+---
+## ADR 001: llama.cpp instead of PyTorch for the SDK runtime
+**Decision:** Ship local inference via **llama-cpp-python** (llama.cpp), not a PyTorch stack.
+**Context:** A PyTorch install typically exceeds **2 GB** of dependencies. llama.cpp bindings add on the order of **80 MB**, which keeps the SDK viable as a lightweight dependency.
+**Consequence:** Adapter weights must be converted to **GGUF** and loaded through the llama.cpp path. Training and server-side tooling may still use PyTorch where needed; the **end-user runtime** is GGUF-centric.
+---
+## ADR 002: Q4_0 adapter and Q6_K base quantization
+**Decision:** Use **Q4_0** for adapters and **Q6_K** for the base model in shipped bundles.
+**Context:** Empirical evaluation on **4096** held-out examples across quantization settings informed the trade-off.
+**Consequence:**
+- **Q6_K base** — quality is preserved while the footprint is roughly **60% smaller** than fp16.
+- **Q4_0 adapter** — quality loss is negligible; adapter size drops to about **23 MB** versus **78 MB** for a heavier format at comparable settings in prior experiments.
+---
+## ADR 003: Single specification field in the API
+**Decision:** The compile API accepts **one text field** for the specification. There is **no separate “examples” field** in the contract.
+**Context:** The compiler was trained on specs that naturally embed examples inside the same prose block.
+**Consequence:** The web UI may offer structured fields for examples or hints, but the client **merges** them into a single string before calling the API.
+---
+## ADR 004: Two-level compiler naming
+**Decision:** Expose **pretty aliases** (for example `paw-4b-qwen3-0.6b`) that resolve to **dated snapshots** (for example `paw-4b-qwen3-0.6b-20260325`).
+**Context:** Mirrors patterns such as OpenAI’s `gpt-4o` mapping to dated model IDs like `gpt-4o-2024-11-20`.
+**Consequence:** Users get stable marketing names while the platform can roll forward immutable snapshots without breaking references that pin the dated id.
+---
+## ADR 005: vLLM for GPU services
+**Decision:** Run **all** GPU-heavy paths (pseudo-program generation, hidden-state extraction / pooling, multi-LoRA inference) on **vLLM** instead of hand-rolled Hugging Face inference loops in production.
+**Context:** vLLM improves **throughput**, **batching**, and **memory efficiency** for serving and batched extraction workloads.
+**Consequence:** Operations that might classically be expressed as “run HuggingFace model X” are implemented as vLLM-managed models and schedules in the deployed stack.
+---
+## ADR 006: GitHub OAuth for authentication
+**Decision:** Use **GitHub OAuth** rather than email verification or password accounts for primary sign-in.
+**Context:** The target audience already maintains GitHub accounts; OAuth removes friction for naming programs, voting, and submitting cases.
+**Consequence:** Identity is tied to GitHub; users without GitHub need an alternative path if one is offered separately.
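
Reviewer sketch (not part of the package): ADR 004 in `adrs.md` above describes pretty aliases that resolve to dated snapshots. A minimal illustration, assuming the mapping is a plain lookup table, might look like the following. The alias and snapshot id are the examples given in the ADR; the table, the function name, and the error behavior are hypothetical.

```python
# Hypothetical alias table; the single entry is the example from ADR 004.
ALIASES = {
    "paw-4b-qwen3-0.6b": "paw-4b-qwen3-0.6b-20260325",
}
# Dated snapshot ids that callers may pin directly.
SNAPSHOTS = set(ALIASES.values())


def resolve_compiler(name: str) -> str:
    """Return the immutable snapshot id for an alias or a pinned snapshot id."""
    if name in SNAPSHOTS:
        return name  # already pinned to a dated snapshot
    if name in ALIASES:
        return ALIASES[name]  # alias follows whatever snapshot is current
    raise KeyError(f"unknown compiler: {name}")


if __name__ == "__main__":
    print(resolve_compiler("paw-4b-qwen3-0.6b"))
    print(resolve_compiler("paw-4b-qwen3-0.6b-20260325"))
```

Under this scheme, rolling forward means changing only the alias entry, so references that pin the dated id keep resolving to the same snapshot.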

programasweights-0.1.0.dev7/docs/advanced/architecture.md ADDED

@@ -0,0 +1,47 @@
+# Architecture
+This page summarizes how the ProgramAsWeights production system is structured end to end.
+## Components
+| Layer | Technology | Role |
+|--------|------------|------|
+| Frontend | React, Vite | Web UI; static assets served by nginx |
+| API | FastAPI, uvicorn | REST endpoints, orchestration, auth integration |
+| Database | PostgreSQL | Users, programs, aliases, votes, cases, operational logs |
+| GPU services | Three vLLM instances | Pseudo-program generation, compiler (including hidden-state work), inference |
+| Storage | Hugging Face, local disk | `.paw` bundles on Hugging Face; PEFT adapter artifacts on server disk |
+| Auth | GitHub OAuth | Sign-in; session backed by HTTP cookies |
+### GPU layout
+Typical allocation:
+- **GPU 0** — pseudo-program generation
+- **GPU 1** — compiler workload (including pooling / hidden-state extraction used in the pipeline)
+- **GPU 2** — multi-LoRA inference
+Exact mapping may vary by deployment; the important split is dedicated vLLM roles per stage.
+## Compile pipeline
+High-level flow:
+1. **Pseudo-generation** (vLLM) — turn the natural-language spec into a pseudo-program representation.
+2. **LoRA extraction** — derive adapter weights from vLLM hidden states / pooling as implemented in the compiler stack.
+3. **Quantization** — convert adapters to **Q4_0 GGUF** for the bundle format used by the runtime.
+4. **Bundle** — assemble the **`.paw`** package with metadata and weights.
+5. **Upload** — publish the `.paw` artifact to Hugging Face for CDN-backed distribution.
+## Caching
+The system uses **two-level caching**:
+1. **Pseudo-generation cache** — avoid recomputing pseudo-programs for identical or equivalent spec inputs where the cache key applies.
+2. **Program-level disk cache** — reuse compiled artifacts and intermediate state on the server when the same content-addressed program is requested again.
+Together these reduce redundant GPU work and speed up repeat compiles.
+## Downloads and the SDK
+The Python SDK **downloads `.paw` files from the Hugging Face CDN** (or equivalent object storage fronted as a CDN). Programs are **not** served as large binary payloads from the ProgramAsWeights API host, which keeps the API focused on metadata, auth, and orchestration.
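
Reviewer sketch (not part of the package): the two-level caching described in `architecture.md` above can be illustrated as follows. Everything here is an assumption made for illustration: the class and function names, the SHA-256 cache keys, and the in-memory dictionaries, which stand in for the pseudo-generation cache and the program-level disk cache. The real cache keys and storage are not shown in the diff.

```python
import hashlib
from typing import Callable, Dict


def content_key(*parts: str) -> str:
    """Stable cache key: SHA-256 over the parts, NUL-separated."""
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


class TwoLevelCache:
    """Level 1 caches pseudo-programs per spec; level 2 caches built artifacts."""

    def __init__(
        self,
        pseudo_gen: Callable[[str], str],
        build: Callable[[str, str], bytes],
    ) -> None:
        self._pseudo_gen = pseudo_gen  # expensive GPU step (stubbed by caller)
        self._build = build            # extraction + quantization + bundling
        self._pseudo: Dict[str, str] = {}
        self._programs: Dict[str, bytes] = {}  # stand-in for the disk cache

    def compile(self, spec: str, compiler: str) -> bytes:
        program_key = content_key(compiler, spec)
        if program_key in self._programs:
            return self._programs[program_key]  # level-2 hit: no work at all
        pseudo_key = content_key(spec)
        if pseudo_key not in self._pseudo:
            # Level-1 miss: run pseudo-generation once per distinct spec.
            self._pseudo[pseudo_key] = self._pseudo_gen(spec)
        artifact = self._build(self._pseudo[pseudo_key], compiler)
        self._programs[program_key] = artifact
        return artifact


if __name__ == "__main__":
    cache = TwoLevelCache(
        pseudo_gen=lambda spec: "pseudo:" + spec,
        build=lambda pseudo, compiler: (compiler + "|" + pseudo).encode("utf-8"),
    )
    first = cache.compile("Classify sentiment", "paw-4b-gpt2")
    second = cache.compile("Classify sentiment", "paw-4b-gpt2")
    print(first == second)
```

In this sketch a repeat compile with the same spec and compiler skips both stages, while the same spec with a different compiler reuses only the cached pseudo-program.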
