PyPI - effgen - Versions diffs - 0.2.2__tar.gz → 0.2.4__tar.gz - Mend

effgen 0.2.2__tar.gz → 0.2.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (269)

{effgen-0.2.2/effgen.egg-info → effgen-0.2.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: effgen
-Version: 0.2.2
+Version: 0.2.4
 Summary: A comprehensive framework for building agents with Small Language Models
 Home-page: https://github.com/ctrl-gaurav/effGen
 Author: Gaurav Srivastava
@@ -108,6 +108,16 @@ Provides-Extra: gguf
 Requires-Dist: llama-cpp-python>=0.2.0; extra == "gguf"
 Provides-Extra: cerebras
 Requires-Dist: cerebras-cloud-sdk>=1.0; extra == "cerebras"
+Provides-Extra: groq
+Requires-Dist: groq>=0.15; extra == "groq"
+Provides-Extra: together
+Requires-Dist: together>=1.3; extra == "together"
+Provides-Extra: fireworks
+Requires-Dist: fireworks-ai>=0.15; extra == "fireworks"
+Provides-Extra: replicate
+Requires-Dist: replicate>=1.0; extra == "replicate"
+Provides-Extra: hf
+Requires-Dist: huggingface_hub>=0.26; extra == "hf"
 Provides-Extra: flash-attn
 Requires-Dist: flash-attn>=2.3.0; extra == "flash-attn"
 Provides-Extra: vector-db
@@ -127,6 +137,11 @@ Provides-Extra: monitoring
 Requires-Dist: wandb>=0.16.0; extra == "monitoring"
 Requires-Dist: tensorboard>=2.15.0; extra == "monitoring"
 Provides-Extra: all
+Requires-Dist: pytest>=7.4.0; extra == "all"
+Requires-Dist: pytest-asyncio>=0.21.0; extra == "all"
+Requires-Dist: pytest-cov>=4.1.0; extra == "all"
+Requires-Dist: pytest-timeout>=2.2.0; extra == "all"
+Requires-Dist: pytest-forked>=1.6.0; extra == "all"
 Requires-Dist: vllm>=0.2.7; extra == "all"
 Requires-Dist: faiss-cpu>=1.7.4; extra == "all"
 Requires-Dist: chromadb>=0.4.18; extra == "all"
@@ -148,6 +163,11 @@ Requires-Dist: rouge-score>=0.1.2; extra == "all"
 Requires-Dist: nltk>=3.8.0; extra == "all"
 Requires-Dist: llama-cpp-python>=0.2.0; extra == "all"
 Requires-Dist: cerebras-cloud-sdk>=1.0; extra == "all"
+Requires-Dist: groq>=0.15; extra == "all"
+Requires-Dist: together>=1.3; extra == "all"
+Requires-Dist: fireworks-ai>=0.15; extra == "all"
+Requires-Dist: replicate>=1.0; extra == "all"
+Requires-Dist: huggingface_hub>=0.26; extra == "all"
 Requires-Dist: bitsandbytes>=0.46.1; extra == "all"
 Requires-Dist: datasets>=2.14.0; extra == "all"
 Dynamic: author
@@ -194,6 +214,8 @@ Dynamic: requires-python
 | | Date | Update |
 |:---:|:---|:---|
+| 🚀 | **14 May 2026** | **v0.2.4 Released**: ModelRouter with CostBased/LatencyBased/FirstAvailable policies, transparent provider failover, cross-process SQLite rate-limit coordination, persistent cost tracker + `effgen cost` dashboard CLI. [See changelog](https://github.com/ctrl-gaurav/effGen/blob/main/CHANGELOG.md#024---2026-05-14) |
+| 🚀 | **4 May 2026** | **v0.2.3 Released**: 5 new cloud backends (Groq, Together AI, Fireworks, Replicate, HuggingFace Inference) — 9 providers total. Unified ProviderRegistry, `effgen doctor` auth check, backend parity matrix. [See changelog](https://github.com/ctrl-gaurav/effGen/blob/main/CHANGELOG.md#023---2026-05-04) |
 | 🚀 | **25 Apr 2026** | **v0.2.1 Released**: Cerebras backend (4 free-tier models, streaming, native tool-calling, rate-limit coordinator, cost tracking) + OpenAI gpt-5/gpt-5.4-nano/o-series with `reasoning_effort`, prompt caching, structured outputs v2, and OpenAI native tools (web_search, code_interpreter, file_search). [See changelog](https://github.com/ctrl-gaurav/effGen/blob/main/CHANGELOG.md#021---2026-04-25) |
 | 🚀 | **9 Apr 2026** | **v0.2.0 Released**: Major release — native tool calling, guardrails, multi-agent orchestration, RAG pipeline, 31 tools, eval framework, production API server, MLX Apple Silicon support, Python & TypeScript SDKs. [See changelog](https://github.com/ctrl-gaurav/effGen/blob/main/CHANGELOG.md#020---2026-04-09) |
 | 🍎 | **8 Apr 2026** | **MLX & Apple Silicon support merged** (PR #4): Native Metal GPU acceleration via MLX & MLX-VLM backends. `pip install effgen[mlx]` |
@@ -392,6 +414,41 @@ Production API<br/>
 ---
+## 🆕 ModelRouter — Smart Multi-Provider Routing (v0.2.4)
+Route requests across 9 cloud providers automatically — pick the cheapest, fastest, or first available:
+```python
+from effgen import PolicyBasedRouter, RoutingContext, CostBasedPolicy, LatencyBasedPolicy
+from effgen.models.capabilities import Capability
+# Build a router: try fastest first, fall back to cheapest
+router = PolicyBasedRouter(policies=[LatencyBasedPolicy(), CostBasedPolicy()])
+ctx = RoutingContext(
+    prompt_tokens_estimate=500,
+    user_budget_usd=0.01,       # stay within $0.01
+    latency_budget_ms=3000,     # need response in under 3s
+    required_capabilities={Capability.chat},
+)
+decision = router.route(ctx)
+print(decision.chosen)      # e.g., ProviderModelPair("cerebras", "llama3.1-8b")
+print(decision.eliminated)  # [(pair, reason), ...] — fully explainable
+```
+**Transparent failover** — `route_and_execute` retries on rate-limits, 5xx errors, or timeouts and seamlessly moves to the next-best provider.
+**Cost dashboard** — track every API call:
+```bash
+effgen cost today          # per-provider per-model table
+effgen cost week           # rolling 7-day view
+effgen cost set-budget 1.0 # set $1/day cap
+```
+---
 ## 🎯 Agent Presets
 Get started instantly with ready-to-use agent configurations:
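
Note on the routing section added to PKG-INFO above: the new text describes choosing a provider under a cost budget and a latency budget while recording why every other candidate was dropped. The following is a minimal standalone sketch of that idea and not effgen's implementation. `Candidate`, `route`, the provider names, prices, and latencies are all hypothetical; the reason strings `cost_exceeds_budget` and `latency_exceeds_sla` are the ones quoted in the README changes in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical provider/model pair with illustrative numbers."""
    provider: str
    model: str
    usd_per_1k_tokens: float
    p50_latency_ms: float


def route(candidates, prompt_tokens, budget_usd, latency_budget_ms):
    """Return (chosen, eliminated): the cheapest candidate that fits both budgets."""
    eligible, eliminated = [], []
    for c in candidates:
        est_cost = c.usd_per_1k_tokens * prompt_tokens / 1000
        if est_cost > budget_usd:
            eliminated.append((c, "cost_exceeds_budget"))
        elif c.p50_latency_ms > latency_budget_ms:
            eliminated.append((c, "latency_exceeds_sla"))
        else:
            eligible.append((est_cost, c.p50_latency_ms, c))
    if not eligible:
        return None, eliminated
    # Cheapest first; lower latency breaks ties.
    eligible.sort(key=lambda t: (t[0], t[1]))
    return eligible[0][2], eliminated


# Illustrative data only; none of these figures come from the package.
candidates = [
    Candidate("provider_a", "small-model", 0.0001, 400),
    Candidate("provider_b", "large-model", 0.0500, 900),
    Candidate("provider_c", "slow-model", 0.0002, 5000),
]
chosen, eliminated = route(
    candidates, prompt_tokens=500, budget_usd=0.01, latency_budget_ms=3000
)
print(chosen.provider)
print([(c.provider, reason) for c, reason in eliminated])
```

Returning the eliminated list alongside the choice is what makes a decision explainable: a caller can log or display the reason each candidate was rejected.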

{effgen-0.2.2 → effgen-0.2.4}/README.md RENAMED

@@ -36,6 +36,8 @@
 | | Date | Update |
 |:---:|:---|:---|
+| 🚀 | **14 May 2026** | **v0.2.4 Released**: ModelRouter with CostBased/LatencyBased/FirstAvailable policies, transparent provider failover, cross-process SQLite rate-limit coordination, persistent cost tracker + `effgen cost` dashboard CLI. [See changelog](CHANGELOG.md#024---2026-05-14) |
+| 🚀 | **4 May 2026** | **v0.2.3 Released**: 5 new cloud backends (Groq, Together AI, Fireworks, Replicate, HuggingFace Inference) — 9 providers total. Unified ProviderRegistry, `effgen doctor` auth check, backend parity matrix. [See changelog](CHANGELOG.md#023---2026-05-04) |
 | 🚀 | **28 Apr 2026** | **v0.2.2 Released**: Gemini 3.x/2.5/2.0 registry, `thinking_budget`, Google Search grounding, Files API, Gemini native tools (GoogleSearch, UrlContext, CodeExecution). Anthropic Claude 4.7 registry, extended thinking, prompt caching (`cache_control`), streaming polish, experimental native tools. [See changelog](CHANGELOG.md#022---2026-04-28) |
 | 🚀 | **25 Apr 2026** | **v0.2.1 Released**: Cerebras backend (4 free-tier models, streaming, native tool-calling, rate-limit coordinator, cost tracking) + OpenAI gpt-5/gpt-5.4-nano/o-series with `reasoning_effort`, prompt caching, structured outputs v2, and OpenAI native tools (web_search, code_interpreter, file_search). [See changelog](CHANGELOG.md#021---2026-04-25) |
 | 🚀 | **9 Apr 2026** | **v0.2.0 Released**: Major release — native tool calling, guardrails, multi-agent orchestration, RAG pipeline, 31 tools, eval framework, production API server, MLX Apple Silicon support, Python & TypeScript SDKs. [See changelog](CHANGELOG.md#020---2026-04-09) |
@@ -270,10 +272,91 @@ Production API<br/>
 ---
-## 🆕 What's New in v0.2.2
+## 🆕 What's New in v0.2.4
 <details open>
-<summary><b>Top 5 features in v0.2.2</b></summary>
+<summary><b>Top 5 features in v0.2.4 — ModelRouter & Cost Optimizer</b></summary>
+1. **`PolicyBasedRouter`** — composable routing engine with three built-in policies. Pick the cheapest provider within your budget, the fastest under your SLA, or simply the first available — and combine them freely.
+   ```python
+   from effgen import PolicyBasedRouter, RoutingContext, CostBasedPolicy, LatencyBasedPolicy
+   from effgen.models.capabilities import Capability
+   router = PolicyBasedRouter(policies=[LatencyBasedPolicy(), CostBasedPolicy()])
+   ctx = RoutingContext(
+       prompt_tokens_estimate=500,
+       user_budget_usd=0.01,
+       latency_budget_ms=3000,
+       required_capabilities={Capability.chat},
+   )
+   decision = router.route(ctx)
+   print(decision.chosen)      # e.g., ProviderModelPair("cerebras", "llama3.1-8b")
+   print(decision.eliminated)  # [(pair, reason), ...] — fully explainable
+   ```
+2. **Transparent failover** — `route_and_execute(ctx, fn)` retries on rate-limits / 5xx / timeouts and seamlessly moves to the next-best provider. Each hop fires a `RouterEvent` to registered subscribers.
+   ```python
+   from effgen import load_model
+   def call_provider(pair):
+       model = load_model(pair.model_id, provider=pair.provider)
+       return model.generate("Hello!").text
+   router.subscribe(
+       lambda event: print(
+           f"Failover: {event.from_provider}/{event.from_model} "
+           f"→ {event.to_provider}/{event.to_model}"
+       )
+   )
+   result = router.route_and_execute(ctx, call_provider)
+   ```
+3. **Cross-process SQLite rate-limit coordination** — share a single rate-limit budget across multiple workers:
+   ```python
+   from effgen import RateLimitCoordinator, SQLiteRateLimitStore
+   store = SQLiteRateLimitStore("~/.effgen/rate_limits.sqlite")
+   coordinator = RateLimitCoordinator(storage=store)  # WAL-mode, BEGIN IMMEDIATE
+   ```
+4. **Persistent cost tracking + `effgen cost` CLI** — every API call persists to SQLite; query spend instantly:
+   ```bash
+   effgen cost today          # per-provider per-model table
+   effgen cost week           # rolling 7-day view
+   effgen cost by-provider    # lifetime totals
+   effgen cost set-budget 1.0 # set $1/day cap (BudgetExceededError at 100%)
+   ```
+5. **Fully explainable decisions + budget guard** — `RouterDecision` records every eliminated provider and why (`"rate_limited"`, `"no_key"`, `"cost_exceeds_budget"`, `"latency_exceeds_sla"`). Configure a daily spend cap; the router automatically fails over to a free-tier provider when the budget is hit.
+</details>
+<details>
+<summary><b>Top 5 features from v0.2.3</b></summary>
+1. **5 new cloud backends** — `GroqAdapter`, `TogetherAdapter`, `FireworksAdapter`, `ReplicateAdapter`, `HFInferenceAdapter` — each with streaming, native tools, rate-limit coordination, and cost tracking. 9 providers total.
+   ```python
+   model = load_model("llama-3.1-8b-instant", provider="groq")
+   model = load_model("Qwen/Qwen2.5-72B-Instruct", provider="hf")
+   ```
+2. **Unified ProviderRegistry** — `list_providers()`, `list_models(provider)`, `lookup(model_id)` consolidated across all 9 adapters. `AmbiguousModelError` on bare IDs shared across providers.
+3. **`effgen doctor`** — new CLI command showing which providers have API keys configured.
+4. **Backend parity matrix** — canonical agentic task ("(17 × 23) + sqrt(144) = 403") runs identically across all providers; streaming and error surfaces verified uniform. See `docs/providers/parity.md`.
+5. **HuggingFace Router support** — `HFInferenceAdapter` with 124-model dynamic catalog, `refresh_models()` + `check_drift()`, `ModelUnavailableError` with `suggest_alternatives()`, and custom Inference Endpoint URL.
+</details>
+<details>
+<summary><b>Top 5 features from v0.2.2 (and earlier)</b></summary>
 1. **Gemini 3.x/2.5/2.0 + Gemma families** — full model registry with correct context windows, output limits, and feature flags; SDK migrated to `google-genai>=1.0.0`.
@@ -586,15 +669,47 @@ result = agent.run("What does the documentation say about configuration?")
 ## 🤖 Multi-Model Support
-effGen supports **7 inference backends** and is tested across 11+ model families:
+effGen supports **9 cloud inference providers** + 4 local backends, tested across 11+ model families:
+| Backend | Platform | Install | Best For |
+|---------|----------|---------|----------|
+| **MLX** | Apple Silicon (M1/M2/M3/M4) | `effgen[mlx]` | Native Metal GPU, unified memory, 4/8-bit quantization |
+| **MLX-VLM** | Apple Silicon | `effgen[mlx-vlm]` | Vision-Language models (Qwen2-VL, LLaVA, Phi-3 Vision, 30+ architectures) |
+| **vLLM** | NVIDIA GPU | `effgen[vllm]` | High-throughput batch inference |
+| **Transformers** | Any (CPU/GPU) | *(bundled)* | Universal compatibility, local models |
+| **OpenAI** | Cloud API | *(bundled)* | gpt-5/gpt-5.4/o-series, reasoning_effort, structured outputs, native tools |
+| **Anthropic** | Cloud API | *(bundled)* | Claude 4.7/4.x, extended thinking, prompt caching, native tools |
+| **Google Gemini** | Cloud API | *(bundled)* | Gemini 3.x/2.5/2.0, thinking_budget, grounding, Files API, native tools |
+| **Cerebras** | Cloud API | `effgen[cerebras]` | 4 free-tier models (llama3.1-8b, qwen-3-235b), ultra-low latency |
+| **Groq** | Cloud API | `effgen[groq]` | 16 models (llama-3.3-70b, mixtral, qwen3-32b), ultra-fast free-tier inference |
+| **Together AI** | Cloud API | `effgen[together]` | 163-model catalog (llama, deepseek, qwen, mistral), per-model pricing |
+| **Fireworks** | Cloud API | `effgen[fireworks]` | 80 chat models (54 tool-capable), serverless + dedicated |
+| **Replicate** | Cloud API | `effgen[replicate]` | 38 models, async run-poll, SSE streaming, compute-second billing |
+| **HuggingFace** | Cloud API | `effgen[hf]` | 124-model HF Router catalog, custom Inference Endpoints, free serverless tier |
+### Provider Auth Check
+```bash
+# See which API keys are configured
+effgen doctor
+```
+### Quick Cloud Start
-| Backend | Platform | Best For |
-|---------|----------|----------|
-| **MLX** | Apple Silicon (M1/M2/M3/M4) | Native Metal GPU, unified memory, 4/8-bit quantization |
-| **MLX-VLM** | Apple Silicon | Vision-Language models (Qwen2-VL, LLaVA, Phi-3 Vision, 30+ architectures) |
-| **vLLM** | NVIDIA GPU | High-throughput batch inference |
-| **Transformers** | Any (CPU/GPU) | Universal compatibility |
-| **API** | Cloud | OpenAI (gpt-5/gpt-5.4/o-series + reasoning_effort), Anthropic (Claude 4.7/4.x + thinking + caching), Google Gemini (3.x/2.5/2.0 + thinking_budget + grounding + Files API + native tools), Cerebras (4 free-tier models, streaming + native tools) |
+```python
+from effgen import load_model, Agent
+from effgen.core.agent import AgentConfig
+from effgen.tools.builtin import Calculator
+# Any of the 9 cloud providers
+model = load_model("llama-3.1-8b-instant", provider="groq")          # Groq
+# model = load_model("meta-llama/Llama-3.3-70B-Instruct-Turbo", provider="together")
+# model = load_model("Qwen/Qwen2.5-72B-Instruct", provider="hf")
+agent = Agent(config=AgentConfig(name="agent", model=model, tools=[Calculator()]))
+result = agent.run("What is (17 * 23) + sqrt(144)?")
+print(result.output)  # → 403
+```
 ### Top Recommended Models
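
Note on the cross-process rate-limit item in the README changes above: the added text mentions a SQLite store in WAL mode using `BEGIN IMMEDIATE`. The sketch below shows, under stated assumptions, how a shared counter can be kept consistent across workers with only the standard-library `sqlite3` module. It is not the code behind `SQLiteRateLimitStore`; the table name `rate_window`, the functions `open_store` and `try_acquire`, and the fixed-window scheme are hypothetical choices for illustration.

```python
import os
import sqlite3
import tempfile
import time


def open_store(path):
    # isolation_level=None puts sqlite3 in autocommit mode, so BEGIN is issued explicitly.
    conn = sqlite3.connect(path, timeout=5.0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rate_window ("
        "provider TEXT PRIMARY KEY, window_start REAL, used INTEGER)"
    )
    return conn


def try_acquire(conn, provider, limit, window_s=60.0, now=None):
    """Reserve one request slot; return True if under the limit for this window."""
    now = time.time() if now is None else now
    # BEGIN IMMEDIATE takes the write lock up front, so the read-then-update
    # below cannot interleave with another writer on the same file.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT window_start, used FROM rate_window WHERE provider = ?",
            (provider,),
        ).fetchone()
        if row is None or now - row[0] >= window_s:
            start, used = now, 0  # start a fresh window
        else:
            start, used = row
        allowed = used < limit
        if allowed:
            used += 1
        conn.execute(
            "INSERT OR REPLACE INTO rate_window (provider, window_start, used) "
            "VALUES (?, ?, ?)",
            (provider, start, used),
        )
        conn.execute("COMMIT")
        return allowed
    except Exception:
        conn.execute("ROLLBACK")
        raise


# Two connections to one file stand in for two worker processes.
path = os.path.join(tempfile.mkdtemp(), "rate_limits.sqlite")
worker_a, worker_b = open_store(path), open_store(path)
results = [
    try_acquire(worker_a, "groq", limit=2, now=0.0),
    try_acquire(worker_b, "groq", limit=2, now=1.0),
    try_acquire(worker_a, "groq", limit=2, now=2.0),   # third call in the window
    try_acquire(worker_b, "groq", limit=2, now=61.0),  # a new window has started
]
print(results)
```

The `now` parameter exists only so the example is deterministic; a real caller would leave it unset and rely on the wall clock.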

{effgen-0.2.2 → effgen-0.2.4}/README_PYPI.md RENAMED

@@ -37,6 +37,8 @@
 | | Date | Update |
 |:---:|:---|:---|
+| 🚀 | **14 May 2026** | **v0.2.4 Released**: ModelRouter with CostBased/LatencyBased/FirstAvailable policies, transparent provider failover, cross-process SQLite rate-limit coordination, persistent cost tracker + `effgen cost` dashboard CLI. [See changelog](https://github.com/ctrl-gaurav/effGen/blob/main/CHANGELOG.md#024---2026-05-14) |
+| 🚀 | **4 May 2026** | **v0.2.3 Released**: 5 new cloud backends (Groq, Together AI, Fireworks, Replicate, HuggingFace Inference) — 9 providers total. Unified ProviderRegistry, `effgen doctor` auth check, backend parity matrix. [See changelog](https://github.com/ctrl-gaurav/effGen/blob/main/CHANGELOG.md#023---2026-05-04) |
 | 🚀 | **25 Apr 2026** | **v0.2.1 Released**: Cerebras backend (4 free-tier models, streaming, native tool-calling, rate-limit coordinator, cost tracking) + OpenAI gpt-5/gpt-5.4-nano/o-series with `reasoning_effort`, prompt caching, structured outputs v2, and OpenAI native tools (web_search, code_interpreter, file_search). [See changelog](https://github.com/ctrl-gaurav/effGen/blob/main/CHANGELOG.md#021---2026-04-25) |
 | 🚀 | **9 Apr 2026** | **v0.2.0 Released**: Major release — native tool calling, guardrails, multi-agent orchestration, RAG pipeline, 31 tools, eval framework, production API server, MLX Apple Silicon support, Python & TypeScript SDKs. [See changelog](https://github.com/ctrl-gaurav/effGen/blob/main/CHANGELOG.md#020---2026-04-09) |
 | 🍎 | **8 Apr 2026** | **MLX & Apple Silicon support merged** (PR #4): Native Metal GPU acceleration via MLX & MLX-VLM backends. `pip install effgen[mlx]` |
@@ -235,6 +237,41 @@ Production API<br/>
 ---
+## 🆕 ModelRouter — Smart Multi-Provider Routing (v0.2.4)
+Route requests across 9 cloud providers automatically — pick the cheapest, fastest, or first available:
+```python
+from effgen import PolicyBasedRouter, RoutingContext, CostBasedPolicy, LatencyBasedPolicy
+from effgen.models.capabilities import Capability
+# Build a router: try fastest first, fall back to cheapest
+router = PolicyBasedRouter(policies=[LatencyBasedPolicy(), CostBasedPolicy()])
+ctx = RoutingContext(
+    prompt_tokens_estimate=500,
+    user_budget_usd=0.01,       # stay within $0.01
+    latency_budget_ms=3000,     # need response in under 3s
+    required_capabilities={Capability.chat},
+)
+decision = router.route(ctx)
+print(decision.chosen)      # e.g., ProviderModelPair("cerebras", "llama3.1-8b")
+print(decision.eliminated)  # [(pair, reason), ...] — fully explainable
+```
+**Transparent failover** — `route_and_execute` retries on rate-limits, 5xx errors, or timeouts and seamlessly moves to the next-best provider.
+**Cost dashboard** — track every API call:
+```bash
+effgen cost today          # per-provider per-model table
+effgen cost week           # rolling 7-day view
+effgen cost set-budget 1.0 # set $1/day cap
+```
+---
 ## 🎯 Agent Presets
 Get started instantly with ready-to-use agent configurations:
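
Note on the failover sentence added to README_PYPI.md above: it says `route_and_execute` retries on rate limits, 5xx errors, or timeouts and then moves to the next-best provider. A generic version of that pattern can be sketched as follows. This is an illustration only, not the package's code: `TransientError`, `execute_with_failover`, the retry count, and the provider names are hypothetical.

```python
class TransientError(Exception):
    """Stand-in for a rate-limit, 5xx, or timeout failure (hypothetical)."""


def execute_with_failover(ranked, fn, attempts_per_provider=2, on_failover=None):
    """Call fn(provider) on each provider in rank order until one succeeds."""
    last_error = None
    for i, provider in enumerate(ranked):
        for _ in range(attempts_per_provider):
            try:
                return fn(provider)
            except TransientError as exc:
                last_error = exc  # retry this provider, then move on
        # Notify a subscriber about the hop, if there is a next provider.
        if on_failover is not None and i + 1 < len(ranked):
            on_failover(provider, ranked[i + 1])
    raise RuntimeError("all providers exhausted") from last_error


calls = []


def flaky_call(provider):
    """Simulated provider call: the primary always fails transiently."""
    calls.append(provider)
    if provider == "primary":
        raise TransientError("429 rate limited")
    return f"ok from {provider}"


hops = []
result = execute_with_failover(
    ["primary", "backup"],
    flaky_call,
    on_failover=lambda src, dst: hops.append((src, dst)),
)
print(result)
print(calls)
print(hops)
```

Only `TransientError` is caught, so a non-retryable failure such as an invalid request propagates immediately instead of being masked by a hop to another provider.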

{effgen-0.2.2 → effgen-0.2.4}/effgen/__init__.py RENAMED

@@ -9,7 +9,7 @@ This framework enables SLMs to function as powerful agentic systems through:
 - Comprehensive configuration management
 """
-__version__ = "0.2.2"
+__version__ = "0.2.4"
 __author__ = "effGen Team"
 __license__ = "Apache-2.0"
@@ -74,30 +74,94 @@ from effgen.models import (
     AnthropicAdapter,
     BaseModel,
     CerebrasAdapter,
+    CostBasedPolicy,
+    CostTracker,
+    FireworksAdapter,
+    FirstAvailablePolicy,
     GeminiAdapter,
     GenerationConfig,
     GenerationResult,
+    GroqAdapter,
+    HFInferenceAdapter,
+    LatencyBasedPolicy,
+    LatencyTracker,
     ModelLoader,
     OpenAIAdapter,
+    PolicyBasedRouter,
+    ProviderModelPair,
+    ReplicateAdapter,
+    RetryPolicy,
+    RouterDecision,
+    RouterEvent,
+    RoutingContext,
+    RoutingPolicy,
+    SQLiteCostStore,
     StreamChunk,
+    TogetherAdapter,
     TransformersEngine,
     VLLMEngine,
     load_model,
 )
 from effgen.models._rate_limit import RateLimitCoordinator, RateLimitExceeded  # noqa: I001
+from effgen.models._rate_limit_store import SQLiteRateLimitStore  # noqa: I001
+from effgen.models.auth import check_keys
 from effgen.models.cerebras_models import available_models as cerebras_available_models
 from effgen.models.cerebras_models import free_tier_models as cerebras_free_tier_models
 from effgen.models.cerebras_models import model_info as cerebras_model_info
-from effgen.models.errors import ModelRefusalError, ToolIncompatibleError
+from effgen.models.errors import (  # noqa: I001
+    AllCandidatesExhaustedError,
+    AmbiguousModelError,
+    BudgetExceededError,
+    InvalidRequestError,
+    ModelAuthError,
+    ModelNotFoundError,
+    ModelRefusalError,
+    ModelTimeoutError,
+    ModelUnavailableError,
+    NoCandidateWithinBudgetError,
+    ProviderTransientError,
+    ToolIncompatibleError,
+)
+from effgen.models.fireworks_models import available_models as fireworks_available_models
+from effgen.models.fireworks_models import chat_models as fireworks_chat_models
+from effgen.models.fireworks_models import pricing_table as fireworks_pricing_table
+from effgen.models.fireworks_models import refresh_models as fireworks_refresh_models
+from effgen.models.fireworks_models import tool_capable_models as fireworks_tool_capable_models
 from effgen.models.gemini_models import available_models as gemini_available_models
 from effgen.models.gemini_models import free_tier_models as gemini_free_tier_models
 from effgen.models.gemini_models import model_info as gemini_model_info
 from effgen.models.gemini_models import recommended_models as gemini_recommended_models
+from effgen.models.groq_models import available_models as groq_available_models
+from effgen.models.groq_models import chat_models as groq_chat_models
+from effgen.models.groq_models import tool_capable_models as groq_tool_capable_models
+from effgen.models.hf_inference_models import available_models as hf_available_models
+from effgen.models.hf_inference_models import catalog_summary as hf_catalog_summary
+from effgen.models.hf_inference_models import chat_models as hf_chat_models
+from effgen.models.hf_inference_models import cheapest_provider as hf_cheapest_provider
+from effgen.models.hf_inference_models import check_drift as hf_check_drift
+from effgen.models.hf_inference_models import get_model_info as hf_get_model_info
+from effgen.models.hf_inference_models import list_providers_for as hf_list_providers_for
+from effgen.models.hf_inference_models import refresh_models as hf_refresh_models
+from effgen.models.hf_inference_models import serverless_models as hf_serverless_models
+from effgen.models.hf_inference_models import suggest_alternatives as hf_suggest_alternatives
+from effgen.models.hf_inference_models import tool_capable_models as hf_tool_capable_models
 from effgen.models.openai_models import available_models as openai_available_models
 from effgen.models.openai_models import chat_models as openai_chat_models
 from effgen.models.openai_models import model_info as openai_model_info
 from effgen.models.openai_models import reasoning_models as openai_reasoning_models  # noqa: I001
 from effgen.models.openai_schema import to_openai_schema
+from effgen.models.registry import ProviderRegistry, list_models, list_providers, lookup
+from effgen.models.replicate_models import available_models as replicate_available_models
+from effgen.models.replicate_models import get_model_info as replicate_get_model_info
+from effgen.models.replicate_models import refresh_models as replicate_refresh_models
+from effgen.models.replicate_models import streaming_models as replicate_streaming_models
+from effgen.models.replicate_models import tool_capable_models as replicate_tool_capable_models
+from effgen.models.together_models import available_models as together_available_models
+from effgen.models.together_models import chat_models as together_chat_models
+from effgen.models.together_models import pricing_table as together_pricing_table
+from effgen.models.together_models import refresh_models as together_refresh_models
+from effgen.models.together_models import serverless_models as together_serverless_models
+from effgen.models.together_models import tool_capable_models as together_tool_capable_models
 # Preset imports
 from effgen.presets import create_agent, list_presets
@@ -204,17 +268,60 @@ __all__ = [
     "StreamChunk",
     "GeminiAdapter",
     "CerebrasAdapter",
+    "GroqAdapter",
+    "TogetherAdapter",
+    "FireworksAdapter",
+    "ReplicateAdapter",
+    "HFInferenceAdapter",
     "ModelLoader",
     "GenerationConfig",
     "GenerationResult",
+    # Router (v0.2.4+)
+    "PolicyBasedRouter",
+    "RoutingPolicy",
+    "RoutingContext",
+    "RouterDecision",
+    "RouterEvent",
+    "ProviderModelPair",
+    "FirstAvailablePolicy",
+    "CostBasedPolicy",
+    "LatencyBasedPolicy",
+    "RetryPolicy",
+    # Tracking (v0.2.4+)
+    "LatencyTracker",
+    "CostTracker",
+    "SQLiteCostStore",
     "RateLimitCoordinator",
     "RateLimitExceeded",
+    "SQLiteRateLimitStore",
+    # Errors
     "ModelRefusalError",
+    "ModelAuthError",
+    "ModelTimeoutError",
+    "ModelUnavailableError",
+    "ModelNotFoundError",
+    "AmbiguousModelError",
+    "NoCandidateWithinBudgetError",
+    "ToolIncompatibleError",
+    "AllCandidatesExhaustedError",
+    "BudgetExceededError",
+    "ProviderTransientError",
+    "InvalidRequestError",
     "to_openai_schema",
+    # Provider registry + auth
+    "ProviderRegistry",
+    "list_providers",
+    "list_models",
+    "lookup",
+    "check_keys",
     # Cerebras helpers
     "cerebras_available_models",
     "cerebras_free_tier_models",
     "cerebras_model_info",
+    # Groq helpers
+    "groq_available_models",
+    "groq_chat_models",
+    "groq_tool_capable_models",
     # OpenAI helpers
     "openai_available_models",
     "openai_chat_models",
@@ -225,6 +332,37 @@ __all__ = [
     "gemini_free_tier_models",
     "gemini_model_info",
     "gemini_recommended_models",
+    # Together helpers
+    "together_available_models",
+    "together_chat_models",
+    "together_tool_capable_models",
+    "together_pricing_table",
+    "together_refresh_models",
+    "together_serverless_models",
+    # Fireworks helpers
+    "fireworks_available_models",
+    "fireworks_chat_models",
+    "fireworks_tool_capable_models",
+    "fireworks_pricing_table",
+    "fireworks_refresh_models",
+    # Replicate helpers
+    "replicate_available_models",
+    "replicate_streaming_models",
+    "replicate_tool_capable_models",
+    "replicate_refresh_models",
+    "replicate_get_model_info",
+    # HF Inference helpers
+    "hf_available_models",
+    "hf_chat_models",
+    "hf_tool_capable_models",
+    "hf_serverless_models",
+    "hf_suggest_alternatives",
+    "hf_get_model_info",
+    "hf_refresh_models",
+    "hf_check_drift",
+    "hf_catalog_summary",
+    "hf_list_providers_for",
+    "hf_cheapest_provider",
     # Tools
     "BaseTool",

{effgen-0.2.2 → effgen-0.2.4}/effgen/api/__init__.py RENAMED

@@ -1,6 +1,6 @@
 """effGen API Server v2 — Production Gateway.
-Phase 12 modules:
+Modules:
 - openai_compat: OpenAI-compatible /v1/chat/completions and /v1/completions
 - queue: RequestQueue with priority, fair scheduling, backpressure
 - pool: AgentPool with min/max size and auto-scaling

{effgen-0.2.2 → effgen-0.2.4}/effgen/cache/__init__.py RENAMED

@@ -1,4 +1,4 @@
-"""effGen caching subsystem (Phase 14).
+"""effGen caching subsystem.
 Provides prompt-prefix caching and result caching for tools and agents.
 All components are pure-Python and have no required external dependencies.
