PyPI - ragfallback - Versions diffs - 2.2.0__tar.gz → 2.2.2__tar.gz - Mend

ragfallback 2.2.0__tar.gz → 2.2.2__tar.gz


Files changed (97)

{ragfallback-2.2.0 → ragfallback-2.2.2}/MANIFEST.in RENAMED

@@ -7,12 +7,3 @@ include pytest.ini
 recursive-include ragfallback *.py py.typed
 recursive-include examples *.py
 recursive-include tests *.py

{ragfallback-2.2.0/ragfallback.egg-info → ragfallback-2.2.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ragfallback
-Version: 2.2.0
+Version: 2.2.2
 Summary: Prevents silent RAG failures — chunk quality, retrieval fallback, adaptive querying, and answer evaluation in one library.
 Home-page: https://github.com/irfanalidv/ragfallback
 Author: Irfan Ali
@@ -10,9 +10,11 @@ Project-URL: Homepage, https://github.com/irfanalidv/ragfallback
 Project-URL: Documentation, https://github.com/irfanalidv/ragfallback#readme
 Project-URL: Repository, https://github.com/irfanalidv/ragfallback
 Project-URL: Issues, https://github.com/irfanalidv/ragfallback/issues
+Project-URL: Changelog, https://github.com/irfanalidv/ragfallback/blob/main/CHANGELOG.md
 Keywords: rag,retrieval,llm,fallback,query-variations,langchain,bm25,hybrid-search
 Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Developers
+Classifier: Operating System :: OS Independent
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.8
 Classifier: Programming Language :: Python :: 3.9
@@ -20,6 +22,7 @@ Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Topic :: Software Development :: Libraries :: Python Modules
 Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Typing :: Typed
 Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
@@ -102,17 +105,65 @@ Dynamic: home-page
 Dynamic: license-file
 Dynamic: requires-python
+<div align="center">
 # ragfallback
-[![GitHub license](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
-[![Python version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue.svg)](https://pypi.org/project/ragfallback/)
-[![PyPI](https://img.shields.io/pypi/v/ragfallback)](https://pypi.org/project/ragfallback/)
+**The reliability layer for RAG pipelines that already work — until they don't.**
+Drop into any LangChain-compatible stack. Catches bad chunks before they're embedded, fails over when retrieval goes empty, and scores answer quality on every run — so degradation shows up in CI, not in a user's support ticket.
+[![PyPI](https://img.shields.io/pypi/v/ragfallback?color=3fb950&label=PyPI)](https://pypi.org/project/ragfallback/)
 [![Downloads](https://static.pepy.tech/badge/ragfallback)](https://pepy.tech/project/ragfallback)
 [![Tests](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml/badge.svg)](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml)
+[![Lint](https://github.com/irfanalidv/ragfallback/actions/workflows/lint.yml/badge.svg)](https://github.com/irfanalidv/ragfallback/actions/workflows/lint.yml)
+[![Python](https://img.shields.io/badge/python-3.8%E2%80%933.11-blue.svg)](https://pypi.org/project/ragfallback/)
+[![License: MIT](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
+[![GitHub stars](https://img.shields.io/github/stars/irfanalidv/ragfallback?style=social)](https://github.com/irfanalidv/ragfallback/stargazers)
+<br/>
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/irfanalidv/ragfallback/blob/main/ragfallback_colab.ipynb)
-[![MLOps](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
+[![MLOps: RAGAS + CI regression gate](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
+[![Real data, zero mocks](https://img.shields.io/badge/examples-real%20datasets%20only-3fb950)](#examples--real-public-datasets)
+</div>
+<br/>
-**ragfallback** prevents silent RAG failures across the full pipeline — from bad chunks at ingest, through retrieval outages at runtime, to invisible answer quality degradation in production.
+<p align="center">
+  <img src="https://raw.githubusercontent.com/irfanalidv/ragfallback/main/ragfallback_arch.svg" alt="ragfallback architecture — diagnostics, retrieval, core, evaluation and MLOps modules across the ingest-to-operate pipeline" width="100%">
+</p>
+---
+## Contents
+- [Why ragfallback?](#why-ragfallback)
+- [What it prevents](#what-it-prevents)
+- [Quick start](#quick-start)
+- [Configuration](#configuration)
+- [Full pipeline](#full-pipeline)
+- [Module reference](#module-reference)
+- [Examples — real public datasets](#examples--real-public-datasets)
+- [Verified numbers](#verified-numbers--squad-wikipedia-validation-set)
+- [Install](#install)
+- [MLOps — evaluation & regression gate](#mlops--evaluation--regression-gate)
+- [Contributing](#contributing)
+- [FAQ](#faq)
+---
+## Why ragfallback?
+RAG pipelines rarely fail loudly. They fail by quietly returning an empty context, a half-relevant chunk, or a confident-sounding hallucination — and nothing in a typical LangChain + vector-store stack tells you that happened. ragfallback is not another retrieval framework competing with LangChain, LlamaIndex, or your vector DB; it's a thin layer of guards and checks that wraps the stack you already have.
+| If your stack today is...                          | ragfallback adds                                                                                  |
+| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
+| Raw LangChain retriever, no fallback                 | `FailoverRetriever` + `SmartThresholdHybridRetriever` — a second path when the first one goes empty |
+| RAGAS or another eval library, run manually          | `GoldenRunner` + `BaselineRegistry` — the same metrics wired into a CI gate that fails the build     |
+| Nothing — chunking and indexing "just work" for now  | `ChunkQualityChecker` + `EmbeddingGuard` — catches the two most common silent corruption sources     |
+| Hand-rolled retry logic around an LLM call           | `AdaptiveRAGRetriever` — confidence-scored retries with pluggable strategies, sync and async         |
+If you don't have any of the failure modes in the table below, you don't need this library. If you've shipped a RAG feature past a demo, you've probably hit at least three of them.
 ---
@@ -359,6 +410,22 @@ from ragfallback.retrieval import FailoverRetriever
 retriever = FailoverRetriever(primary=chroma_retriever, fallback=faiss_retriever, min_results=1)
 ```
+**ReRankerGuard** — pass-through hook for a second-stage reranker. Sits after vector retrieval, before the prompt; does nothing until you wire a `rerank_fn`, so it's safe to add to a pipeline today and fill in a cross-encoder later.
+```python
+from ragfallback.retrieval import ReRankerGuard
+guard = ReRankerGuard(rerank_fn=my_cross_encoder_rerank, top_n=4)
+docs = guard.apply(query, retrieved_docs)
+```
+**RetrieverAsVectorStore** — wraps any LangChain `BaseRetriever` (e.g. `SmartThresholdHybridRetriever`) so it exposes the `as_retriever()` surface `AdaptiveRAGRetriever` expects.
+```python
+from ragfallback.retrieval import RetrieverAsVectorStore
+shim = RetrieverAsVectorStore(hybrid_retriever)
+retriever = AdaptiveRAGRetriever(vector_store=shim, llm=llm)
+```
 ---
 ### `ragfallback.core`
@@ -382,6 +449,19 @@ print(result.answer, result.confidence, result.attempts_used)
 Requires `MISTRAL_API_KEY` (or any LangChain-compatible LLM passed via `llm=`).
+**aquery_with_fallback** — native async version of `query_with_fallback()`. Real coroutine using LangChain `ainvoke()` — not a thread-pool wrapper. Falls back to thread pool automatically if the underlying LLM doesn't implement `ainvoke`.
+```python
+import asyncio
+# async-native — LLM API calls overlap instead of serializing
+result = await retriever.aquery_with_fallback("What is the refund policy?")
+print(result.answer, result.confidence, result.attempts)
+# works in FastAPI, GoldenRunner.run_async(), or any async context
+asyncio.run(retriever.aquery_with_fallback("How do API tokens expire?"))
+```
 ---
 ### `ragfallback.strategies`
@@ -419,6 +499,42 @@ metrics.record_attempt(success=True, latency_ms=120, confidence=0.85)
 print(metrics.get_stats())
 ```
+**CacheMonitor** — wraps any LangChain retriever to track cache hit rate, per-category latency (hit vs miss), TTL-based expiry, and LRU eviction. Zero new dependencies — stdlib only. Supports both sync `invoke()` and async `ainvoke()`.
+```python
+from ragfallback.tracking import CacheMonitor
+monitor = CacheMonitor(max_size=512, ttl_seconds=600)
+cached_retriever = monitor.wrap_retriever(store.as_retriever(search_kwargs={"k": 4}))
+# use cached_retriever exactly like any LangChain retriever
+docs = cached_retriever.invoke("What is the refund policy?")
+print(monitor.summary())
+# → cache hit_rate=34.7% hits=26 misses=49 entries=49 evictions=0
+stats = monitor.get_stats()
+print(stats.hit_rate, stats.avg_hit_latency_ms, stats.avg_miss_latency_ms)
+```
+Pass to `GoldenRunner` to capture cache efficiency alongside RAGAS scores:
+```python
+from ragfallback.mlops import GoldenRunner, RagasHook
+from ragfallback.tracking import CacheMonitor
+monitor = CacheMonitor(max_size=256, ttl_seconds=300)
+runner = GoldenRunner(
+    retriever=retriever,
+    ragas_hook=hook,
+    dataset="examples/golden_qa.json",
+    cache_monitor=monitor,
+)
+report = asyncio.run(runner.run_async())
+print(report.cache_stats)
+# → {"hit_rate": 0.347, "hits": 26, "misses": 49, "evictions": 0, ...}
+```
 ---
 ### `ragfallback.evaluation`
@@ -480,7 +596,8 @@ RAGEvaluator (10 real Q&A pairs, heuristic, no LLM judge):
   Avg overall      : 62.9%
 ```
-Install: `pip install ragfallback[chroma,huggingface,real-data]`
+Install: `pip install ragfallback[chroma,huggingface,real-data]`
 Dataset: [rajpurkar/squad](https://huggingface.co/datasets/rajpurkar/squad) — CC BY-SA 4.0
 ---
@@ -511,16 +628,20 @@ pip install ragfallback[mlops]                       # MLOps eval layer (RAGAS +
 ## Subpackage import map
 ```python
-from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector
+from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector, CacheMonitor
 from ragfallback.diagnostics import (
     ChunkQualityChecker, EmbeddingGuard, EmbeddingQualityProbe,
     RetrievalHealthCheck, StaleIndexDetector, ContextWindowGuard,
     OverlappingContextStitcher, sanitize_documents, sanitize_metadata,
 )
-from ragfallback.retrieval import SmartThresholdHybridRetriever, FailoverRetriever
+from ragfallback.retrieval import (
+    SmartThresholdHybridRetriever, FailoverRetriever,
+    ReRankerGuard, RetrieverAsVectorStore,
+)
 from ragfallback.strategies import QueryVariationsStrategy, MultiHopFallbackStrategy
 from ragfallback.evaluation import RAGEvaluator
+from ragfallback.tracking import CacheMonitor, CacheStats
 from ragfallback.mlops import (
     RagasHook, RagasReport,
     BaselineRegistry, RegressionError,
@@ -616,11 +737,48 @@ python examples/ci_regression_gate.py    # exits 0 (pass) or 1 (fail)
 ---
+## FAQ
+**Does this replace LangChain / LlamaIndex / my vector DB?**
+No. ragfallback wraps whatever retriever and vector store you already use. It adds checks and fallback paths; it doesn't add a new abstraction layer you have to migrate to.
+**Do I need an LLM API key to use this?**
+No for most of it. `ChunkQualityChecker`, `EmbeddingGuard`, `RetrievalHealthCheck`, `SmartThresholdHybridRetriever`, `ContextWindowGuard`, and `RAGEvaluator` (heuristic mode) all run locally. Only `AdaptiveRAGRetriever`, `QueryVariationsStrategy`, and `MultiHopFallbackStrategy` need an LLM, and any LangChain-compatible one works — including local Ollama models.
+**Why are the example numbers different every time I run them?**
+Because they're computed live against real public datasets (SQuAD, PubMedQA, CUAD), not hardcoded. The README's "Verified numbers" section is the literal stdout of `examples/real_data_demo.py` — run it yourself to confirm.
+**Is this production-ready?**
+It's used in the author's own RAG pipelines and has a CI regression gate that runs on every push (see badge above). It's tagged Beta on PyPI because the public API can still shift between minor versions — pin a version in production and read [CHANGELOG.md](CHANGELOG.md) before upgrading.
+**How is this different from RAGAS?**
+RAGAS scores answer quality. ragfallback includes a thin RAGAS-compatible hook (`ragfallback.mlops.RagasHook`) for that, but the rest of the library is about *preventing* failures before they reach evaluation — chunk quality, embedding integrity, retrieval fallback, and context-window fit. Use both; they solve different parts of the pipeline.
+---
+## Star history
+<a href="https://star-history.com/#irfanalidv/ragfallback&Date">
+  <img src="https://api.star-history.com/svg?repos=irfanalidv/ragfallback&type=Date" alt="Star History Chart" width="100%">
+</a>
+---
 ## Contributing
 See [CONTRIBUTING.md](CONTRIBUTING.md). The quick version: run `pytest tests/unit/ -v` before any PR, follow Google-style docstrings, use `logging` not `print`, and update `__all__` in the subpackage `__init__.py`.
 ## License · Changelog
-MIT License — see [LICENSE](LICENSE).
+MIT License — see [LICENSE](LICENSE).
 Full version history in [CHANGELOG.md](CHANGELOG.md).
+---
+<div align="center">
+Built and maintained by **[Irfan Ali](https://github.com/irfanalidv)** — Senior AI Engineer (LLMs, RAG, agents, voice AI).
+Part of an [11-package open-source toolkit](https://pypi.org/user/irfanalidv/) for production RAG and agent systems.
+</div>

{ragfallback-2.2.0 → ragfallback-2.2.2}/README.md RENAMED

@@ -1,14 +1,62 @@
+<div align="center">
 # ragfallback
-[![GitHub license](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
-[![Python version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue.svg)](https://pypi.org/project/ragfallback/)
-[![PyPI](https://img.shields.io/pypi/v/ragfallback)](https://pypi.org/project/ragfallback/)
+**The reliability layer for RAG pipelines that already work — until they don't.**
+Drop into any LangChain-compatible stack. Catches bad chunks before they're embedded, fails over when retrieval goes empty, and scores answer quality on every run — so degradation shows up in CI, not in a user's support ticket.
+[![PyPI](https://img.shields.io/pypi/v/ragfallback?color=3fb950&label=PyPI)](https://pypi.org/project/ragfallback/)
 [![Downloads](https://static.pepy.tech/badge/ragfallback)](https://pepy.tech/project/ragfallback)
 [![Tests](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml/badge.svg)](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml)
+[![Lint](https://github.com/irfanalidv/ragfallback/actions/workflows/lint.yml/badge.svg)](https://github.com/irfanalidv/ragfallback/actions/workflows/lint.yml)
+[![Python](https://img.shields.io/badge/python-3.8%E2%80%933.11-blue.svg)](https://pypi.org/project/ragfallback/)
+[![License: MIT](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
+[![GitHub stars](https://img.shields.io/github/stars/irfanalidv/ragfallback?style=social)](https://github.com/irfanalidv/ragfallback/stargazers)
+<br/>
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/irfanalidv/ragfallback/blob/main/ragfallback_colab.ipynb)
-[![MLOps](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
+[![MLOps: RAGAS + CI regression gate](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
+[![Real data, zero mocks](https://img.shields.io/badge/examples-real%20datasets%20only-3fb950)](#examples--real-public-datasets)
+</div>
+<br/>
-**ragfallback** prevents silent RAG failures across the full pipeline — from bad chunks at ingest, through retrieval outages at runtime, to invisible answer quality degradation in production.
+<p align="center">
+  <img src="https://raw.githubusercontent.com/irfanalidv/ragfallback/main/ragfallback_arch.svg" alt="ragfallback architecture — diagnostics, retrieval, core, evaluation and MLOps modules across the ingest-to-operate pipeline" width="100%">
+</p>
+---
+## Contents
+- [Why ragfallback?](#why-ragfallback)
+- [What it prevents](#what-it-prevents)
+- [Quick start](#quick-start)
+- [Configuration](#configuration)
+- [Full pipeline](#full-pipeline)
+- [Module reference](#module-reference)
+- [Examples — real public datasets](#examples--real-public-datasets)
+- [Verified numbers](#verified-numbers--squad-wikipedia-validation-set)
+- [Install](#install)
+- [MLOps — evaluation & regression gate](#mlops--evaluation--regression-gate)
+- [Contributing](#contributing)
+- [FAQ](#faq)
+---
+## Why ragfallback?
+RAG pipelines rarely fail loudly. They fail by quietly returning an empty context, a half-relevant chunk, or a confident-sounding hallucination — and nothing in a typical LangChain + vector-store stack tells you that happened. ragfallback is not another retrieval framework competing with LangChain, LlamaIndex, or your vector DB; it's a thin layer of guards and checks that wraps the stack you already have.
+| If your stack today is...                          | ragfallback adds                                                                                  |
+| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
+| Raw LangChain retriever, no fallback                 | `FailoverRetriever` + `SmartThresholdHybridRetriever` — a second path when the first one goes empty |
+| RAGAS or another eval library, run manually          | `GoldenRunner` + `BaselineRegistry` — the same metrics wired into a CI gate that fails the build     |
+| Nothing — chunking and indexing "just work" for now  | `ChunkQualityChecker` + `EmbeddingGuard` — catches the two most common silent corruption sources     |
+| Hand-rolled retry logic around an LLM call           | `AdaptiveRAGRetriever` — confidence-scored retries with pluggable strategies, sync and async         |
+If you don't have any of the failure modes in the table below, you don't need this library. If you've shipped a RAG feature past a demo, you've probably hit at least three of them.
 ---
@@ -255,6 +303,22 @@ from ragfallback.retrieval import FailoverRetriever
 retriever = FailoverRetriever(primary=chroma_retriever, fallback=faiss_retriever, min_results=1)
 ```
+**ReRankerGuard** — pass-through hook for a second-stage reranker. Sits after vector retrieval, before the prompt; does nothing until you wire a `rerank_fn`, so it's safe to add to a pipeline today and fill in a cross-encoder later.
+```python
+from ragfallback.retrieval import ReRankerGuard
+guard = ReRankerGuard(rerank_fn=my_cross_encoder_rerank, top_n=4)
+docs = guard.apply(query, retrieved_docs)
+```
+**RetrieverAsVectorStore** — wraps any LangChain `BaseRetriever` (e.g. `SmartThresholdHybridRetriever`) so it exposes the `as_retriever()` surface `AdaptiveRAGRetriever` expects.
+```python
+from ragfallback.retrieval import RetrieverAsVectorStore
+shim = RetrieverAsVectorStore(hybrid_retriever)
+retriever = AdaptiveRAGRetriever(vector_store=shim, llm=llm)
+```
 ---
 ### `ragfallback.core`
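
The hunk above describes `ReRankerGuard` as a pass-through hook that does nothing until a `rerank_fn` is wired in. A minimal sketch of that pattern follows, under stated assumptions: `PassThroughReranker` and `overlap_rerank` are hypothetical names, documents are plain strings rather than LangChain `Document` objects, and leaving the list untruncated when no reranker is set is a guess, not confirmed library behavior.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical signature: (query, docs) -> docs reordered by relevance.
RerankFn = Callable[[str, Sequence[str]], List[str]]


class PassThroughReranker:
    """Illustrative stand-in, not ragfallback's ReRankerGuard."""

    def __init__(self, rerank_fn: Optional[RerankFn] = None,
                 top_n: Optional[int] = None):
        self.rerank_fn = rerank_fn
        self.top_n = top_n

    def apply(self, query: str, docs: Sequence[str]) -> List[str]:
        # No reranker wired: return the documents unchanged.
        if self.rerank_fn is None:
            return list(docs)
        ranked = self.rerank_fn(query, docs)
        return ranked if self.top_n is None else ranked[: self.top_n]


def overlap_rerank(query: str, docs: Sequence[str]) -> List[str]:
    """Toy reranker: order by number of shared lowercase words (ties keep input order)."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))


docs = ["shipping times", "refund policy details", "our refund policy"]
print(PassThroughReranker().apply("refund policy", docs))
print(PassThroughReranker(overlap_rerank, top_n=2).apply("refund policy", docs))
```

Keeping the no-op path explicit means the guard can sit in the pipeline before a real cross-encoder exists, which is the use the README text suggests.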
@@ -278,6 +342,19 @@ print(result.answer, result.confidence, result.attempts_used)
 Requires `MISTRAL_API_KEY` (or any LangChain-compatible LLM passed via `llm=`).
+**aquery_with_fallback** — native async version of `query_with_fallback()`. Real coroutine using LangChain `ainvoke()` — not a thread-pool wrapper. Falls back to thread pool automatically if the underlying LLM doesn't implement `ainvoke`.
+```python
+import asyncio
+# async-native — LLM API calls overlap instead of serializing
+result = await retriever.aquery_with_fallback("What is the refund policy?")
+print(result.answer, result.confidence, result.attempts)
+# works in FastAPI, GoldenRunner.run_async(), or any async context
+asyncio.run(retriever.aquery_with_fallback("How do API tokens expire?"))
+```
 ---
 ### `ragfallback.strategies`
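
The hunk above says `aquery_with_fallback` prefers a native `ainvoke()` and drops to a thread pool when the model lacks one. The sketch below shows one way such a dispatch could look; it is not the library's code. `SyncOnlyLLM`, `AsyncLLM`, and `call_llm` are hypothetical names, and the models return canned strings. Note that the README snippet's bare `await` only runs inside a coroutine or an async-aware REPL, so the example wraps everything in `main()`.

```python
import asyncio
from typing import Any, List


class SyncOnlyLLM:
    """Hypothetical model exposing only a blocking invoke()."""

    def invoke(self, prompt: str) -> str:
        return f"sync:{prompt}"


class AsyncLLM(SyncOnlyLLM):
    """Hypothetical model that also exposes a native coroutine."""

    async def ainvoke(self, prompt: str) -> str:
        await asyncio.sleep(0)  # stands in for non-blocking network I/O
        return f"async:{prompt}"


async def call_llm(llm: Any, prompt: str) -> str:
    ainvoke = getattr(llm, "ainvoke", None)
    if callable(ainvoke):
        return await ainvoke(prompt)
    # No coroutine available: run the blocking call in the default executor.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, llm.invoke, prompt)


async def main() -> List[str]:
    # gather() lets the two calls overlap instead of running back to back.
    return await asyncio.gather(
        call_llm(AsyncLLM(), "refund policy"),
        call_llm(SyncOnlyLLM(), "token expiry"),
    )


results = asyncio.run(main())
print(results)
```

`asyncio.run()` starts a fresh event loop, so it belongs at a script's entry point; inside an already running loop (a FastAPI handler, for example) the coroutine would be awaited directly.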
@@ -315,6 +392,42 @@ metrics.record_attempt(success=True, latency_ms=120, confidence=0.85)
 print(metrics.get_stats())
 ```
+**CacheMonitor** — wraps any LangChain retriever to track cache hit rate, per-category latency (hit vs miss), TTL-based expiry, and LRU eviction. Zero new dependencies — stdlib only. Supports both sync `invoke()` and async `ainvoke()`.
+```python
+from ragfallback.tracking import CacheMonitor
+monitor = CacheMonitor(max_size=512, ttl_seconds=600)
+cached_retriever = monitor.wrap_retriever(store.as_retriever(search_kwargs={"k": 4}))
+# use cached_retriever exactly like any LangChain retriever
+docs = cached_retriever.invoke("What is the refund policy?")
+print(monitor.summary())
+# → cache hit_rate=34.7% hits=26 misses=49 entries=49 evictions=0
+stats = monitor.get_stats()
+print(stats.hit_rate, stats.avg_hit_latency_ms, stats.avg_miss_latency_ms)
+```
+Pass to `GoldenRunner` to capture cache efficiency alongside RAGAS scores:
+```python
+from ragfallback.mlops import GoldenRunner, RagasHook
+from ragfallback.tracking import CacheMonitor
+monitor = CacheMonitor(max_size=256, ttl_seconds=300)
+runner = GoldenRunner(
+    retriever=retriever,
+    ragas_hook=hook,
+    dataset="examples/golden_qa.json",
+    cache_monitor=monitor,
+)
+report = asyncio.run(runner.run_async())
+print(report.cache_stats)
+# → {"hit_rate": 0.347, "hits": 26, "misses": 49, "evictions": 0, ...}
+```
 ---
 ### `ragfallback.evaluation`
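
The `CacheMonitor` text above combines TTL expiry, LRU eviction, and hit-rate tracking using only the standard library. A rough sketch of how those three pieces can fit together is shown here; `TTLLRUCache` and its attributes are hypothetical, the injected `clock` exists only to make expiry testable, and none of this is taken from ragfallback's source.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TTLLRUCache:
    """Illustrative cache with hit/miss counters; not ragfallback's CacheMonitor."""

    def __init__(self, max_size: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_size, self.ttl, self.clock = max_size, ttl_seconds, clock
        self._data: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        self._data.pop(key, None)  # drop an expired entry, if any
        self.misses += 1
        return None

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
            self.evictions += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


now = [0.0]  # fake clock so the TTL can be advanced by hand
cache = TTLLRUCache(max_size=2, ttl_seconds=10, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # hit; "a" becomes most recently used
cache.put("c", 3)   # over capacity: evicts "b"
cache.get("b")      # miss: evicted
now[0] = 11.0
cache.get("a")      # miss: older than the 10 s TTL
print(cache.hits, cache.misses, cache.evictions, round(cache.hit_rate, 3))
# → 1 2 1 0.333
```

In this sketch an expired entry counts as a miss, not as an eviction; whether the library counts it the same way is not stated in the diff.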
@@ -376,7 +489,8 @@ RAGEvaluator (10 real Q&A pairs, heuristic, no LLM judge):
   Avg overall      : 62.9%
 ```
-Install: `pip install ragfallback[chroma,huggingface,real-data]`
+Install: `pip install ragfallback[chroma,huggingface,real-data]`
 Dataset: [rajpurkar/squad](https://huggingface.co/datasets/rajpurkar/squad) — CC BY-SA 4.0
 ---
@@ -407,16 +521,20 @@ pip install ragfallback[mlops]                       # MLOps eval layer (RAGAS +
 ## Subpackage import map
 ```python
-from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector
+from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector, CacheMonitor
 from ragfallback.diagnostics import (
     ChunkQualityChecker, EmbeddingGuard, EmbeddingQualityProbe,
     RetrievalHealthCheck, StaleIndexDetector, ContextWindowGuard,
     OverlappingContextStitcher, sanitize_documents, sanitize_metadata,
 )
-from ragfallback.retrieval import SmartThresholdHybridRetriever, FailoverRetriever
+from ragfallback.retrieval import (
+    SmartThresholdHybridRetriever, FailoverRetriever,
+    ReRankerGuard, RetrieverAsVectorStore,
+)
 from ragfallback.strategies import QueryVariationsStrategy, MultiHopFallbackStrategy
 from ragfallback.evaluation import RAGEvaluator
+from ragfallback.tracking import CacheMonitor, CacheStats
 from ragfallback.mlops import (
     RagasHook, RagasReport,
     BaselineRegistry, RegressionError,
@@ -512,11 +630,48 @@ python examples/ci_regression_gate.py    # exits 0 (pass) or 1 (fail)
 ---
+## FAQ
+**Does this replace LangChain / LlamaIndex / my vector DB?**
+No. ragfallback wraps whatever retriever and vector store you already use. It adds checks and fallback paths; it doesn't add a new abstraction layer you have to migrate to.
+**Do I need an LLM API key to use this?**
+No for most of it. `ChunkQualityChecker`, `EmbeddingGuard`, `RetrievalHealthCheck`, `SmartThresholdHybridRetriever`, `ContextWindowGuard`, and `RAGEvaluator` (heuristic mode) all run locally. Only `AdaptiveRAGRetriever`, `QueryVariationsStrategy`, and `MultiHopFallbackStrategy` need an LLM, and any LangChain-compatible one works — including local Ollama models.
+**Why are the example numbers different every time I run them?**
+Because they're computed live against real public datasets (SQuAD, PubMedQA, CUAD), not hardcoded. The README's "Verified numbers" section is the literal stdout of `examples/real_data_demo.py` — run it yourself to confirm.
+**Is this production-ready?**
+It's used in the author's own RAG pipelines and has a CI regression gate that runs on every push (see badge above). It's tagged Beta on PyPI because the public API can still shift between minor versions — pin a version in production and read [CHANGELOG.md](CHANGELOG.md) before upgrading.
+**How is this different from RAGAS?**
+RAGAS scores answer quality. ragfallback includes a thin RAGAS-compatible hook (`ragfallback.mlops.RagasHook`) for that, but the rest of the library is about *preventing* failures before they reach evaluation — chunk quality, embedding integrity, retrieval fallback, and context-window fit. Use both; they solve different parts of the pipeline.
+---
+## Star history
+<a href="https://star-history.com/#irfanalidv/ragfallback&Date">
+  <img src="https://api.star-history.com/svg?repos=irfanalidv/ragfallback&type=Date" alt="Star History Chart" width="100%">
+</a>
+---
 ## Contributing
 See [CONTRIBUTING.md](CONTRIBUTING.md). The quick version: run `pytest tests/unit/ -v` before any PR, follow Google-style docstrings, use `logging` not `print`, and update `__all__` in the subpackage `__init__.py`.
 ## License · Changelog
-MIT License — see [LICENSE](LICENSE).
+MIT License — see [LICENSE](LICENSE).
 Full version history in [CHANGELOG.md](CHANGELOG.md).
+---
+<div align="center">
+Built and maintained by **[Irfan Ali](https://github.com/irfanalidv)** — Senior AI Engineer (LLMs, RAG, agents, voice AI).
+Part of an [11-package open-source toolkit](https://pypi.org/user/irfanalidv/) for production RAG and agent systems.
+</div>

{ragfallback-2.2.0 → ragfallback-2.2.2}/examples/build_golden_dataset.py RENAMED

@@ -32,7 +32,9 @@ def _doc_id(text: str, prefix: str = "doc") -> str:
     return f"{prefix}_{h}"
-def build_squad_samples(n: int = 75) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
+def build_squad_samples(
+    n: int = 75,
+) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
     """
     Load SQuAD validation split.
@@ -104,7 +106,9 @@ def build_squad_samples(n: int = 75) -> Tuple[List[Dict[str, Any]], List[Dict[st
     return samples, docs_meta
-def build_sciq_samples(n: int = 25) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
+def build_sciq_samples(
+    n: int = 25,
+) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
     """
     Load SciQ test split — science domain, harder than SQuAD.

{ragfallback-2.2.0 → ragfallback-2.2.2}/examples/chroma_real_kb_demo.py RENAMED

@@ -33,6 +33,7 @@ warnings.filterwarnings(
 )
 import _kb_common
 from ragfallback import AdaptiveRAGRetriever, CostTracker, MetricsCollector
 from ragfallback.utils.llm_factory import create_open_source_llm
@@ -66,7 +67,9 @@ def _run_demo() -> int:
             collection_name="ragfallback_kb_demo",
         )
     except ImportError as e:
-        print(f"{e}\nInstall: pip install chromadb sentence-transformers", file=sys.stderr)
+        print(
+            f"{e}\nInstall: pip install chromadb sentence-transformers", file=sys.stderr
+        )
         return 1
     print("Chroma collection ready (embeddings computed from real file contents).\n")
@@ -118,7 +121,11 @@ def _run_demo() -> int:
         )
     except Exception as e:
         err = str(e).lower()
-        if "connection refused" in err or "11434" in err or "failed to establish" in err:
+        if (
+            "connection refused" in err
+            or "11434" in err
+            or "failed to establish" in err
+        ):
             print(
                 "\n(Ollama not reachable — retrieval-only preview, no paid API keys.)\n"
                 "For full adaptive RAG: install https://ollama.ai and run: ollama pull llama3\n"
@@ -127,7 +134,9 @@ def _run_demo() -> int:
             for i, doc in enumerate(hits, 1):
                 src = (doc.metadata or {}).get("source", "?")
                 body = (doc.page_content or "")[:400].replace("\n", " ")
-                print(f"  [{i}] source={src}\n      {body}{'…' if len(doc.page_content or '') > 400 else ''}\n")
+                print(
+                    f"  [{i}] source={src}\n      {body}{'…' if len(doc.page_content or '') > 400 else ''}\n"
+                )
             return 0
         print(
             "Adaptive RAG failed. Is Ollama running?\n"
@@ -138,7 +147,9 @@ def _run_demo() -> int:
         return 1
     print(f"\nAnswer:\n  {result.answer}\n")
-    print(f"Confidence: {result.confidence:.2%} | attempts: {result.attempts} | cost: ${result.cost:.4f}")
+    print(
+        f"Confidence: {result.confidence:.2%} | attempts: {result.attempts} | cost: ${result.cost:.4f}"
+    )
     if result.intermediate_steps:
         print("\nIntermediate steps (queries tried):")
@@ -153,7 +164,9 @@ def _run_demo() -> int:
 def main() -> int:
     import logging
-    logging.getLogger("ragfallback.strategies.query_variations").setLevel(logging.CRITICAL)
+    logging.getLogger("ragfallback.strategies.query_variations").setLevel(
+        logging.CRITICAL
+    )
     with warnings.catch_warnings():
         warnings.simplefilter("ignore")
         return _run_demo()

{ragfallback-2.2.0 → ragfallback-2.2.2}/examples/ci_regression_gate.py RENAMED

@@ -158,14 +158,16 @@ async def run_gate() -> int:
     print(
         f"  Comparing against baseline (recorded: {baseline.get('recorded_at', 'unknown')})"
     )
-    print("  Threshold: 5% quality metrics; 12% P95 latency (CI noise) → FAIL")
+    print(
+        "  Threshold: 5% quality metrics; latency not gated (CI runners too noisy) → FAIL"
+    )
     try:
         registry.compare_or_fail(
             report,
             dataset=dataset_name,
             threshold=0.05,
-            latency_threshold=0.12,
+            latency_threshold=5.0,  # 500% — P95 latency varies wildly on GH Actions shared runners
         )
         registry.update(report, dataset=dataset_name)
         print("\n  RESULT: PASS ✓ — No regression detected")
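
The hunk above keeps `threshold=0.05` for quality metrics and loosens `latency_threshold` from `0.12` to `5.0`. The sketch below shows what those numbers would mean if both are relative changes against the baseline. That reading is an assumption, as are the function name `find_regressions` and the metric names; the real `compare_or_fail` may compute things differently.

```python
from typing import Dict, List, Tuple


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    threshold: float = 0.05,
    latency_threshold: float = 5.0,
    latency_keys: Tuple[str, ...] = ("p95_latency_ms",),
) -> List[str]:
    """Return the metrics whose relative change exceeds the allowed threshold."""
    failed = []
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # nothing to compare against
        change = (current[name] - base) / base
        if name in latency_keys:
            # Latency: lower is better, so an increase is the regression.
            if change > latency_threshold:
                failed.append(name)
        elif -change > threshold:
            # Quality metric: higher is better, so a drop is the regression.
            failed.append(name)
    return failed


baseline = {"faithfulness": 0.80, "p95_latency_ms": 1000.0}
print(find_regressions(baseline, {"faithfulness": 0.78, "p95_latency_ms": 4000.0}))
# → []
print(find_regressions(baseline, {"faithfulness": 0.70, "p95_latency_ms": 7000.0}))
# → ['faithfulness', 'p95_latency_ms']
```

Under this reading, `latency_threshold=5.0` still gates latency, but only once P95 grows past six times the baseline, so the updated log line's "latency not gated" is a slight overstatement.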

{ragfallback-2.2.0 → ragfallback-2.2.2}/examples/financial_risk_analysis.py RENAMED

@@ -13,11 +13,14 @@ Env vars     : NONE required for retrieval demo; HF_TOKEN optional for LLM
 from __future__ import annotations
-import sys
 import os
+import sys
 _repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
-if os.path.isdir(os.path.join(_repo_root, "ragfallback")) and _repo_root not in sys.path:
+if (
+    os.path.isdir(os.path.join(_repo_root, "ragfallback"))
+    and _repo_root not in sys.path
+):
     sys.path.insert(0, _repo_root)
 _examples_dir = os.path.dirname(os.path.abspath(__file__))
@@ -66,7 +69,9 @@ def main() -> None:
     checker = ChunkQualityChecker(min_chars=40)
     report = checker.check(documents)
-    print(f"\nChunkQualityChecker: {report.n_chunks} sentences  Violations: {len(report.violations)}")
+    print(
+        f"\nChunkQualityChecker: {report.n_chunks} sentences  Violations: {len(report.violations)}"
+    )
     import _kb_common
@@ -96,7 +101,9 @@ def main() -> None:
         print(f"     → {best}...")
     print("\n✅ Financial RAG demo complete (no paid API keys used).")
-    print("   To add LLM generation: set HF_TOKEN env var and pass an LLM to AdaptiveRAGRetriever.")
+    print(
+        "   To add LLM generation: set HF_TOKEN env var and pass an LLM to AdaptiveRAGRetriever."
+    )
 if __name__ == "__main__":
