PyPI - shiftgate - Versions diffs - 0.2.0__tar.gz → 0.2.1__tar.gz - Mend

shiftgate 0.2.0tar.gz → 0.2.1tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34) hide show

{shiftgate-0.2.0 → shiftgate-0.2.1}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: shiftgate
-Version: 0.2.0
+Version: 0.2.1
 Summary: Intelligent routing layer that automatically selects the right LoRA adapter for each task in your local agent loop.
 Project-URL: Homepage, https://github.com/shiftgate-ai/shiftgate
 Project-URL: Repository, https://github.com/shiftgate-ai/shiftgate
@@ -356,10 +356,43 @@ shiftgate adapter add llama3.1-8b --runtime llama3.1-8b --tags general --base ll
 shiftgate run "write a python sorting function"
 ```
-shiftgate auto-detects backends in the order **Ollama → vLLM → Cerebras**, so local backends always win and Cerebras is used only when no local backend is running.
+shiftgate auto-detects backends in the order **Ollama → vLLM → Cerebras → Cloudflare**, so local backends always win and the cloud backends are used only when no local backend is running.
 > **Honest status:** shiftgate routes to Cerebras' base-model inference today. When Cerebras Multi-LoRA goes public, register your adapter with `--runtime <cerebras-lora-id>` and it just works — no shiftgate update needed.
+### Option 5 — Cloudflare Workers AI (cloud, LoRA-native)
+[Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai/) serves your own LoRA finetunes on top of supported base models.
+```bash
+# 1. Upload your LoRA to Cloudflare (one-time)
+npx wrangler ai finetune create @cf/mistral/mistral-7b-instruct-v0.2-lora my-sql-lora ./adapter-folder
+# 2. Set credentials
+export CLOUDFLARE_ACCOUNT_ID=...
+export CLOUDFLARE_API_TOKEN=...
+# 3. Register in shiftgate — note --base is the Cloudflare model name
+shiftgate adapter add my-sql-lora \
+  --runtime my-sql-lora \
+  --base @cf/mistral/mistral-7b-instruct-v0.2-lora \
+  --tags sql
+# 4. Run
+shiftgate run "write a sql join query"
+```
+You can also pass credentials per-run with the `--cf-account-id` and `--cf-api-token` global flags.
+**Architectural difference (handled transparently):** Cloudflare keeps the base model in the URL and accepts the LoRA name as a separate `lora` field — not as the `model` value like vLLM/Cerebras. shiftgate handles this transparently; your routing logic doesn't change. The `--base` you register **must** be a Cloudflare model name starting with `@cf/`.
+**Limitations** (from [Cloudflare's docs](https://developers.cloudflare.com/workers-ai/features/fine-tunes/)):
+- Up to **100 LoRAs** per account.
+- LoRA file must be **< 300 MB**.
+- Must be trained with rank **r ≤ 8** (up to 32 on some models).
+- **Streaming is not yet supported** through shiftgate for Cloudflare — you get a single response. (Streaming requests to `shiftgate serve` against Cloudflare return HTTP 501.)
 ---
 ## How to contribute adapters

{shiftgate-0.2.0 → shiftgate-0.2.1}/README.md RENAMED Viewed

@@ -320,10 +320,43 @@ shiftgate adapter add llama3.1-8b --runtime llama3.1-8b --tags general --base ll
 shiftgate run "write a python sorting function"
 ```
-shiftgate auto-detects backends in the order **Ollama → vLLM → Cerebras**, so local backends always win and Cerebras is used only when no local backend is running.
+shiftgate auto-detects backends in the order **Ollama → vLLM → Cerebras → Cloudflare**, so local backends always win and the cloud backends are used only when no local backend is running.
 > **Honest status:** shiftgate routes to Cerebras' base-model inference today. When Cerebras Multi-LoRA goes public, register your adapter with `--runtime <cerebras-lora-id>` and it just works — no shiftgate update needed.
+### Option 5 — Cloudflare Workers AI (cloud, LoRA-native)
+[Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai/) serves your own LoRA finetunes on top of supported base models.
+```bash
+# 1. Upload your LoRA to Cloudflare (one-time)
+npx wrangler ai finetune create @cf/mistral/mistral-7b-instruct-v0.2-lora my-sql-lora ./adapter-folder
+# 2. Set credentials
+export CLOUDFLARE_ACCOUNT_ID=...
+export CLOUDFLARE_API_TOKEN=...
+# 3. Register in shiftgate — note --base is the Cloudflare model name
+shiftgate adapter add my-sql-lora \
+  --runtime my-sql-lora \
+  --base @cf/mistral/mistral-7b-instruct-v0.2-lora \
+  --tags sql
+# 4. Run
+shiftgate run "write a sql join query"
+```
+You can also pass credentials per-run with the `--cf-account-id` and `--cf-api-token` global flags.
+**Architectural difference (handled transparently):** Cloudflare keeps the base model in the URL and accepts the LoRA name as a separate `lora` field — not as the `model` value like vLLM/Cerebras. shiftgate handles this transparently; your routing logic doesn't change. The `--base` you register **must** be a Cloudflare model name starting with `@cf/`.
+**Limitations** (from [Cloudflare's docs](https://developers.cloudflare.com/workers-ai/features/fine-tunes/)):
+- Up to **100 LoRAs** per account.
+- LoRA file must be **< 300 MB**.
+- Must be trained with rank **r ≤ 8** (up to 32 on some models).
+- **Streaming is not yet supported** through shiftgate for Cloudflare — you get a single response. (Streaming requests to `shiftgate serve` against Cloudflare return HTTP 501.)
 ---
 ## How to contribute adapters

{shiftgate-0.2.0 → shiftgate-0.2.1}/pyproject.toml RENAMED Viewed

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "shiftgate"
-version = "0.2.0"
+version = "0.2.1"
 description = "Intelligent routing layer that automatically selects the right LoRA adapter for each task in your local agent loop."
 readme = "README.md"
 requires-python = ">=3.10"

{shiftgate-0.2.0 → shiftgate-0.2.1}/shiftgate/cli.py RENAMED Viewed

@@ -52,12 +52,47 @@ def _main(
             ),
         ),
     ] = None,
+    cf_account_id: Annotated[
+        Optional[str],
+        typer.Option(
+            "--cf-account-id",
+            help="Cloudflare account ID. Sets CLOUDFLARE_ACCOUNT_ID for this run.",
+        ),
+    ] = None,
+    cf_api_token: Annotated[
+        Optional[str],
+        typer.Option(
+            "--cf-api-token",
+            help="Cloudflare API token. Sets CLOUDFLARE_API_TOKEN for this run.",
+        ),
+    ] = None,
+    verbose: Annotated[
+        bool,
+        typer.Option(
+            "--verbose",
+            "-v",
+            help="Enable DEBUG logging (shows routing internals like runtime filtering).",
+        ),
+    ] = False,
 ) -> None:
     """Global options applied before any command runs."""
-    if cerebras_key:
-        import os
+    import os
+    if cerebras_key:
         os.environ["CEREBRAS_API_KEY"] = cerebras_key
+    if cf_account_id:
+        os.environ["CLOUDFLARE_ACCOUNT_ID"] = cf_account_id
+    if cf_api_token:
+        os.environ["CLOUDFLARE_API_TOKEN"] = cf_api_token
+    if verbose:
+        import logging
+        logging.basicConfig(
+            level=logging.DEBUG,
+            format="%(levelname)s [%(name)s] %(message)s",
+        )
+        logging.getLogger("shiftgate").setLevel(logging.DEBUG)
 # ---------------------------------------------------------------------------
@@ -83,17 +118,33 @@ def _get_embedder():
     return Embedder()
-def _active_runtimes(backend_router) -> set[str] | None:
-    """Return the set of runtime names loaded on the active backend, or None.
+def _active_runtimes(backend_router, adapter_reg=None) -> set[str] | None:
+    """Return the set of runtime names usable on the active backend, or None.
     ``None`` means no backend is active → the router should not filter
     (preview behaviour).  An empty set means a backend is active but reports no
-    loaded models.
+    usable models.
+    For Cloudflare, base models are always available without any finetune
+    upload, so every registered ``@cf/`` adapter that has no finetune
+    ``runtime_name`` is considered usable (base-model inference).
     """
     active = backend_router.active_backend
     if active is None:
         return None
-    return set(active.list_loaded_adapters())
+    usable = set(active.list_loaded_adapters_cached())
+    from shiftgate.runtime.backend import CloudflareBackend
+    if isinstance(active, CloudflareBackend) and adapter_reg is not None:
+        for adapter in adapter_reg.list_adapters():
+            is_cf_base = (adapter.base_model or "").startswith("@cf/")
+            has_finetune = bool((adapter.runtime_name or "").strip())
+            if is_cf_base and not has_finetune:
+                usable.add(adapter.effective_backend_name())
+    return usable
 def _auto_link_adapter(adapter: AdapterEntry, task_reg) -> list[str]:
@@ -160,9 +211,15 @@ def _verify_runtime_adapter(adapter: AdapterEntry, adapter_reg) -> None:
     """
     from shiftgate.runtime.backend import BackendRouter
+    # A Cloudflare base model (@cf/...) implies this is a Cloudflare adapter —
+    # prefer the Cloudflare backend for verification when it's reachable.
+    is_cloudflare = (adapter.base_model or "").startswith("@cf/")
     try:
         with console.status("[cyan]Verifying adapter against running backend…[/cyan]"):
             router = BackendRouter()
+            if is_cloudflare and router._cloudflare.is_available():
+                router.select("cloudflare")
             is_loaded, backend_name = router.verify_adapter(adapter)
     except Exception as exc:  # pragma: no cover - defensive, should not happen
         logger_msg = f"verification error: {exc}"
@@ -179,9 +236,14 @@ def _verify_runtime_adapter(adapter: AdapterEntry, adapter_reg) -> None:
         console.print(f"   [green]Backend: {backend_name} ✓ verified[/green]")
     else:
         adapter.verified = False
+        hint = (
+            "— upload it with `npx wrangler ai finetune create`"
+            if backend_name == "cloudflare"
+            else "— did you pass --lora-modules?"
+        )
         console.print(
             f"   [yellow]Backend: {backend_name} ⚠ runtime '{runtime}' not loaded "
-            "— did you pass --lora-modules?[/yellow]"
+            f"{hint}[/yellow]"
         )
     adapter_reg.save()
@@ -207,12 +269,23 @@ def init() -> None:
     task_reg = TaskRegistry.load()
     if task_reg.embeddings_ready():
-        console.print("[dim]Embeddings already computed. Skipping (delete embeddings_cache.npy to force refresh).[/dim]")
+        console.print(
+            "[dim]Task centroids already computed — skipping re-embed "
+            "(delete embeddings_cache.npy to force refresh).[/dim]"
+        )
     else:
         console.print("[cyan]Computing task embeddings (first run — model download may take a moment)…[/cyan]")
         embedder = _get_embedder()
         task_reg.compute_embeddings(embedder)
-        console.print("[green]✓[/green]  Embeddings computed.")
+        console.print("[green]OK[/green]  Embeddings computed.")
+    # Centroids can exist while the ONNX runtime model is missing (e.g. after
+    # deleting fastembed_cache).  Always warm-load so routing works.
+    console.print("[cyan]Loading embedder model (one-time ~30 MB download if not cached)…[/cyan]")
+    from shiftgate.router.embedder import warm_up
+    dim = warm_up()
+    console.print(f"[green]OK[/green]  Embedder ready (dim={dim}).")
     task_reg.save()
     console.print(f"[green]✓[/green]  Task registry saved to {shiftgate_dir}")
@@ -301,6 +374,7 @@ def adapter_add(
     """
     from shiftgate.registry.adapter_registry import (
         AdapterRegistry,
+        adapter_from_base_model,
         adapter_from_hf,
         adapter_from_local,
         adapter_from_runtime,
@@ -347,12 +421,22 @@ def adapter_add(
                 **shared_kwargs,
             )
-    # --- Ambiguous: no '/', no --local, no --runtime ---
+    # --- Mode D: Cloudflare base model (always available, no finetune) ---
+    elif (base or "").startswith("@cf/"):
+        base_model_kwargs = {k: v for k, v in shared_kwargs.items() if k != "base_model"}
+        adapter = adapter_from_base_model(
+            base_model=base,
+            adapter_id=adapter_id or identifier,
+            **base_model_kwargs,
+        )
+    # --- Ambiguous: no '/', no --local, no --runtime, no @cf/ base ---
     else:
         console.print(
             f"[red]Error:[/red] '{identifier}' doesn't look like a HuggingFace repo ID (missing '/').\n"
-            "  Use [cyan]--local /path/to/adapter[/cyan] to register a local adapter, or\n"
-            "  use [cyan]--runtime <backend-name>[/cyan] for a runtime-registered adapter."
+            "  Use [cyan]--local /path/to/adapter[/cyan] to register a local adapter,\n"
+            "  use [cyan]--runtime <backend-name>[/cyan] for a runtime-registered adapter, or\n"
+            "  use [cyan]--base @cf/...[/cyan] for a Cloudflare Workers AI base model."
         )
         raise typer.Exit(1)
@@ -363,6 +447,11 @@ def adapter_add(
     # has it loaded. Purely informational — never fails the add command.
     if adapter.runtime_name:
         _verify_runtime_adapter(adapter, adapter_reg)
+    elif (adapter.base_model or "").startswith("@cf/"):
+        console.print(
+            "   [green]Backend:[/green] cloudflare "
+            "[green]✓[/green] base model is always available (no upload needed)"
+        )
 @adapter_app.command("list")
@@ -490,7 +579,7 @@ def route(
     backend_router = BackendRouter()
     backend_name = backend_router.detect()
-    available_runtimes = _active_runtimes(backend_router)
+    available_runtimes = _active_runtimes(backend_router, adapter_reg)
     try:
         trace, match_result = routing.route(
@@ -549,7 +638,7 @@ def run(
     backend_router = BackendRouter()
     backend_name = backend_router.detect()
-    available_runtimes = _active_runtimes(backend_router)
+    available_runtimes = _active_runtimes(backend_router, adapter_reg)
     try:
         trace, match_result = routing.route(
@@ -716,9 +805,11 @@ def doctor() -> None:
         router = BackendRouter()
         backend_name = router.detect()
         backend_url = router.active_backend_url
-        loaded_adapters: list[str] = []
+        loaded_adapters: set[str] = set()
         if backend_name is not None and router._active is not None:
-            loaded_adapters = router._active.list_loaded_adapters()
+            # Cloudflare base models are always available, so use the same
+            # usable-runtime computation the router uses when filtering.
+            loaded_adapters = _active_runtimes(router, adapter_reg) or set()
     # --- 3. Per-adapter runtime availability ---
     adapter_rows = []
@@ -774,7 +865,7 @@ def serve(
         str,
         typer.Option(
             "--backend",
-            help="Backend to forward to: auto | ollama | vllm | cerebras.",
+            help="Backend to forward to: auto | ollama | vllm | cerebras | cloudflare.",
         ),
     ] = "auto",
 ) -> None:

{shiftgate-0.2.0 → shiftgate-0.2.1}/shiftgate/registry/adapter_registry.py RENAMED Viewed

@@ -244,6 +244,32 @@ def adapter_from_runtime(
     )
+def adapter_from_base_model(
+    base_model: str,
+    *,
+    adapter_id: str,
+    name: str | None = None,
+    tags: list[str] | None = None,
+    description: str | None = None,
+    benchmark_score: float | None = None,
+) -> AdapterEntry:
+    """Build an AdapterEntry that routes to a base model with no finetune.
+    Used for backends whose base models are always available without any
+    upload (e.g. Cloudflare Workers AI ``@cf/`` models).  ``runtime_name`` is
+    left unset so the backend runs the base model directly.
+    """
+    slug = _slugify(adapter_id)
+    return AdapterEntry(
+        id=slug,
+        name=name or slug.replace("-", " ").title(),
+        base_model=base_model,
+        task_tags=list(tags or []),
+        description=description or f"Base model: {base_model}",
+        benchmark_score=benchmark_score,
+    )
 # ---------------------------------------------------------------------------
 # Internal helpers
 # ---------------------------------------------------------------------------

shiftgate-0.2.1/shiftgate/router/embedder.py ADDED Viewed

@@ -0,0 +1,174 @@
+"""
+Text embedder backed by fastembed.
+Uses ``BAAI/bge-small-en-v1.5`` — a compact (33 M param) model that runs
+efficiently on CPU.  The model is downloaded once by fastembed and cached in
+``~/.shiftgate/fastembed_cache`` (avoids Windows ``%TEMP%`` corruption issues).
+A module-level singleton (``_MODEL``) is created lazily on first use so that
+importing this module is cheap.  The model is NOT re-created between calls.
+"""
+from __future__ import annotations
+import logging
+import time
+from pathlib import Path
+from typing import Any
+import numpy as np
+logger = logging.getLogger(__name__)
+# Stable cache location — fastembed defaults to %TEMP% on Windows, which is
+# prone to partial downloads and "file sizes do not match" corruption.
+FASTEMBED_CACHE_DIR = Path.home() / ".shiftgate" / "fastembed_cache"
+# -------------------------------------------------------------------------
+# Default model — small, CPU-friendly, strong quality/speed trade-off.
+# -------------------------------------------------------------------------
+DEFAULT_MODEL = "BAAI/bge-small-en-v1.5"
+# HuggingFace downloads can flake; retry a few times before giving up.
+_LOAD_RETRIES = 3
+_LOAD_RETRY_DELAY_S = 2.0
+# Module-level singleton; populated on first call to `_get_model()`.
+_MODEL: Any | None = None
+def _is_retryable_download_error(exc: BaseException) -> bool:
+    """Return True for transient HuggingFace / network failures."""
+    msg = str(exc).lower()
+    needles = (
+        "server disconnected",
+        "connection reset",
+        "connection aborted",
+        "timed out",
+        "timeout",
+        "temporary failure",
+        "503",
+        "502",
+        "429",
+    )
+    return any(n in msg for n in needles)
+def _format_load_error(model_name: str, exc: BaseException) -> str:
+    cache_hint = str(FASTEMBED_CACHE_DIR)
+    if _is_retryable_download_error(exc):
+        return (
+            f"Failed to download embedding model '{model_name}' from HuggingFace: {exc}\n"
+            "This is usually a transient network/rate-limit issue. Retry:\n"
+            f"  uv run shiftgate init\n"
+            "Optional: set HF_TOKEN for higher HuggingFace rate limits.\n"
+            f"If downloads keep failing, delete '{cache_hint}' and retry."
+        )
+    return (
+        f"Failed to load embedding model '{model_name}': {exc}\n"
+        "If you see NO_SUCHFILE or 'file sizes do not match', delete the cache "
+        "and retry:\n"
+        f"  Remove-Item -Recurse -Force '{cache_hint}'\n"
+        "Also clear any stale copy at $env:TEMP\\fastembed_cache, then run "
+        "`shiftgate init` again."
+    )
+def _get_model(model_name: str = DEFAULT_MODEL) -> Any:
+    """Return the fastembed TextEmbedding singleton, creating it if needed.
+    The model is loaded once per process.  If you need a different model,
+    call ``reset_model()`` first.
+    """
+    global _MODEL
+    if _MODEL is None:
+        try:
+            from fastembed import TextEmbedding  # type: ignore
+        except ImportError as exc:
+            raise ImportError(
+                "fastembed is required for shiftgate routing. "
+                "Install it with: pip install fastembed"
+            ) from exc
+        FASTEMBED_CACHE_DIR.mkdir(parents=True, exist_ok=True)
+        logger.info(
+            "Loading embedding model '%s' (first use — one-time download may occur)…",
+            model_name,
+        )
+        last_exc: BaseException | None = None
+        for attempt in range(1, _LOAD_RETRIES + 1):
+            try:
+                _MODEL = TextEmbedding(
+                    model_name=model_name,
+                    cache_dir=str(FASTEMBED_CACHE_DIR),
+                )
+                break
+            except Exception as exc:
+                last_exc = exc
+                if attempt < _LOAD_RETRIES and _is_retryable_download_error(exc):
+                    logger.warning(
+                        "Embedder load attempt %d/%d failed (%s); retrying…",
+                        attempt,
+                        _LOAD_RETRIES,
+                        exc,
+                    )
+                    time.sleep(_LOAD_RETRY_DELAY_S * attempt)
+                    continue
+                raise RuntimeError(_format_load_error(model_name, exc)) from exc
+        else:
+            assert last_exc is not None
+            raise RuntimeError(_format_load_error(model_name, last_exc)) from last_exc
+        logger.info("Embedding model loaded.")
+    return _MODEL
+def warm_up(model_name: str = DEFAULT_MODEL) -> int:
+    """Load the embedder and run a dummy embed. Returns embedding dimension."""
+    vec = Embedder(model_name).embed("warmup")
+    return int(vec.shape[0])
+def reset_model() -> None:
+    """Force the next embed call to recreate the model singleton.
+    Useful in tests or when switching models at runtime.
+    """
+    global _MODEL
+    _MODEL = None
+class Embedder:
+    """Thin wrapper around the fastembed TextEmbedding model.
+    All embedding operations are synchronous and run on CPU.  The model
+    is shared across all ``Embedder`` instances via the module-level singleton.
+    """
+    def __init__(self, model_name: str = DEFAULT_MODEL) -> None:
+        self._model_name = model_name
+    @property
+    def _model(self) -> Any:
+        return _get_model(self._model_name)
+    def embed(self, text: str) -> np.ndarray:
+        """Embed a single text string.
+        Returns a 1-D float32 numpy array of shape ``(dim,)``.
+        The vector is **not** L2-normalised here; normalisation is done
+        where appropriate (e.g. when computing task centroids).
+        """
+        # fastembed returns a generator of numpy arrays.
+        results = list(self._model.embed([text]))
+        return np.array(results[0], dtype=np.float32)
+    def embed_batch(self, texts: list[str]) -> np.ndarray:
+        """Embed a list of strings.
+        Returns a 2-D float32 numpy array of shape ``(n, dim)`` where
+        ``n = len(texts)``.
+        """
+        if not texts:
+            raise ValueError("embed_batch received an empty list.")
+        results = list(self._model.embed(texts))
+        return np.array(results, dtype=np.float32)

{shiftgate-0.2.0 → shiftgate-0.2.1}/shiftgate/router/router.py RENAMED Viewed

@@ -80,6 +80,15 @@ def route(
     query_embedding = embedder.embed(query)
     all_tasks = task_registry.get_all_tasks()
     ranked = top_k_tasks(query_embedding, all_tasks, k=top_k)
+    if available_runtimes is not None:
+        logger.debug(
+            "filtering adapters to backend runtimes: %s",
+            sorted(available_runtimes),
+        )
+    else:
+        logger.debug("no active backend — adapter runtime filtering disabled")
     result = select_adapter(ranked, adapter_registry, available_runtimes=available_runtimes)
     selected_id = result.selected_adapter.id if result.selected_adapter else None

shiftgate 0.2.0__tar.gz → 0.2.1__tar.gz

shiftgate 0.2.0tar.gz → 0.2.1tar.gz