ragfallback 2.2.0__tar.gz → 2.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (97) hide show
  1. {ragfallback-2.2.0/ragfallback.egg-info → ragfallback-2.2.1}/PKG-INFO +164 -9
  2. {ragfallback-2.2.0 → ragfallback-2.2.1}/README.md +160 -8
  3. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/ci_regression_gate.py +2 -2
  4. {ragfallback-2.2.0 → ragfallback-2.2.1}/pyproject.toml +4 -1
  5. {ragfallback-2.2.0 → ragfallback-2.2.1}/pytest.ini +1 -0
  6. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/__init__.py +1 -1
  7. {ragfallback-2.2.0 → ragfallback-2.2.1/ragfallback.egg-info}/PKG-INFO +164 -9
  8. {ragfallback-2.2.0 → ragfallback-2.2.1}/INSTALL_AND_RUN.md +0 -0
  9. {ragfallback-2.2.0 → ragfallback-2.2.1}/LICENSE +0 -0
  10. {ragfallback-2.2.0 → ragfallback-2.2.1}/MANIFEST.in +0 -0
  11. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/_kb_common.py +0 -0
  12. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/build_golden_dataset.py +0 -0
  13. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/chroma_real_kb_demo.py +0 -0
  14. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/financial_risk_analysis.py +0 -0
  15. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/legal_document_analysis.py +0 -0
  16. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/medical_research_synthesis.py +0 -0
  17. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/mlops_demo.py +0 -0
  18. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/production_reliability_example.py +0 -0
  19. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/qdrant_local_demo.py +0 -0
  20. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/real_data_demo.py +0 -0
  21. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc10_metadata_sanitizer.py +0 -0
  22. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc1_retrieval_health.py +0 -0
  23. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc2_embedding_guard.py +0 -0
  24. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc3_chunk_quality.py +0 -0
  25. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc4_context_window.py +0 -0
  26. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc5_hybrid_failover.py +0 -0
  27. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc6_adaptive_rag.py +0 -0
  28. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc6_multi_hop_demo.py +0 -0
  29. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc7_rag_evaluator.py +0 -0
  30. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc8_context_stitcher.py +0 -0
  31. {ragfallback-2.2.0 → ragfallback-2.2.1}/examples/uc9_embedding_probe.py +0 -0
  32. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/core/__init__.py +0 -0
  33. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/core/adaptive_retriever.py +0 -0
  34. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/__init__.py +0 -0
  35. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/chunking.py +0 -0
  36. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/context_stitcher.py +0 -0
  37. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/context_window.py +0 -0
  38. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/embedding_guard.py +0 -0
  39. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/embedding_probe.py +0 -0
  40. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/embedding_validator.py +0 -0
  41. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/retrieval_health.py +0 -0
  42. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/schema_sanitizer.py +0 -0
  43. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/diagnostics/stale_index.py +0 -0
  44. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/evaluation/__init__.py +0 -0
  45. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/evaluation/rag_evaluator.py +0 -0
  46. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/exceptions.py +0 -0
  47. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/mlops/__init__.py +0 -0
  48. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/mlops/baseline_registry.py +0 -0
  49. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/mlops/golden_runner.py +0 -0
  50. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/mlops/locust_template.py +0 -0
  51. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/mlops/mlflow_logger.py +0 -0
  52. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/mlops/query_simulator.py +0 -0
  53. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/mlops/ragas_hook.py +0 -0
  54. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/py.typed +0 -0
  55. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/retrieval/__init__.py +0 -0
  56. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/retrieval/failover.py +0 -0
  57. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/retrieval/rerank_guard.py +0 -0
  58. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/retrieval/smart_hybrid.py +0 -0
  59. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/retrieval/wrappers.py +0 -0
  60. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/strategies/__init__.py +0 -0
  61. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/strategies/base.py +0 -0
  62. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/strategies/multi_hop.py +0 -0
  63. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/strategies/query_variations.py +0 -0
  64. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/tracking/__init__.py +0 -0
  65. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/tracking/cache_monitor.py +0 -0
  66. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/tracking/cost_tracker.py +0 -0
  67. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/tracking/metrics.py +0 -0
  68. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/utils/__init__.py +0 -0
  69. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/utils/confidence_scorer.py +0 -0
  70. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/utils/embedding_factory.py +0 -0
  71. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/utils/env.py +0 -0
  72. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/utils/llm_factory.py +0 -0
  73. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback/utils/vector_store_factory.py +0 -0
  74. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback.egg-info/SOURCES.txt +0 -0
  75. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback.egg-info/dependency_links.txt +0 -0
  76. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback.egg-info/requires.txt +0 -0
  77. {ragfallback-2.2.0 → ragfallback-2.2.1}/ragfallback.egg-info/top_level.txt +0 -0
  78. {ragfallback-2.2.0 → ragfallback-2.2.1}/requirements-dev.txt +0 -0
  79. {ragfallback-2.2.0 → ragfallback-2.2.1}/setup.cfg +0 -0
  80. {ragfallback-2.2.0 → ragfallback-2.2.1}/setup.py +0 -0
  81. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/__init__.py +0 -0
  82. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/conftest.py +0 -0
  83. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/integration/__init__.py +0 -0
  84. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/integration/test_adaptive_workflow.py +0 -0
  85. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/integration/test_chroma_pipeline.py +0 -0
  86. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/__init__.py +0 -0
  87. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_adaptive_multi_hop_bridge.py +0 -0
  88. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_async_retriever.py +0 -0
  89. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_cache_monitor.py +0 -0
  90. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_confidence_scorer.py +0 -0
  91. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_cost_tracker.py +0 -0
  92. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_diagnostics.py +0 -0
  93. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_hybrid_retrieval.py +0 -0
  94. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_metrics.py +0 -0
  95. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_multi_hop.py +0 -0
  96. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_query_variations.py +0 -0
  97. {ragfallback-2.2.0 → ragfallback-2.2.1}/tests/unit/test_retrieval.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: ragfallback
3
- Version: 2.2.0
3
+ Version: 2.2.1
4
4
  Summary: Prevents silent RAG failures — chunk quality, retrieval fallback, adaptive querying, and answer evaluation in one library.
5
5
  Home-page: https://github.com/irfanalidv/ragfallback
6
6
  Author: Irfan Ali
@@ -10,9 +10,11 @@ Project-URL: Homepage, https://github.com/irfanalidv/ragfallback
10
10
  Project-URL: Documentation, https://github.com/irfanalidv/ragfallback#readme
11
11
  Project-URL: Repository, https://github.com/irfanalidv/ragfallback
12
12
  Project-URL: Issues, https://github.com/irfanalidv/ragfallback/issues
13
+ Project-URL: Changelog, https://github.com/irfanalidv/ragfallback/blob/main/CHANGELOG.md
13
14
  Keywords: rag,retrieval,llm,fallback,query-variations,langchain,bm25,hybrid-search
14
15
  Classifier: Development Status :: 4 - Beta
15
16
  Classifier: Intended Audience :: Developers
17
+ Classifier: Operating System :: OS Independent
16
18
  Classifier: Programming Language :: Python :: 3
17
19
  Classifier: Programming Language :: Python :: 3.8
18
20
  Classifier: Programming Language :: Python :: 3.9
@@ -20,6 +22,7 @@ Classifier: Programming Language :: Python :: 3.10
20
22
  Classifier: Programming Language :: Python :: 3.11
21
23
  Classifier: Topic :: Software Development :: Libraries :: Python Modules
22
24
  Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
25
+ Classifier: Typing :: Typed
23
26
  Requires-Python: >=3.8
24
27
  Description-Content-Type: text/markdown
25
28
  License-File: LICENSE
@@ -102,17 +105,64 @@ Dynamic: home-page
102
105
  Dynamic: license-file
103
106
  Dynamic: requires-python
104
107
 
108
+ <div align="center">
109
+
105
110
  # ragfallback
106
111
 
107
- [![GitHub license](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
108
- [![Python version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue.svg)](https://pypi.org/project/ragfallback/)
109
- [![PyPI](https://img.shields.io/pypi/v/ragfallback)](https://pypi.org/project/ragfallback/)
112
+ **The reliability layer for RAG pipelines that already work — until they don't.**
113
+
114
+ Drop into any LangChain-compatible stack. Catches bad chunks before they're embedded, fails over when retrieval goes empty, and scores answer quality on every run — so degradation shows up in CI, not in a user's support ticket.
115
+
116
+ [![PyPI](https://img.shields.io/pypi/v/ragfallback?color=3fb950&label=PyPI)](https://pypi.org/project/ragfallback/)
110
117
  [![Downloads](https://static.pepy.tech/badge/ragfallback)](https://pepy.tech/project/ragfallback)
111
118
  [![Tests](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml/badge.svg)](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml)
119
+ [![Python](https://img.shields.io/badge/python-3.8%E2%80%933.11-blue.svg)](https://pypi.org/project/ragfallback/)
120
+ [![License: MIT](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
121
+ [![GitHub stars](https://img.shields.io/github/stars/irfanalidv/ragfallback?style=social)](https://github.com/irfanalidv/ragfallback/stargazers)
122
+ <br/>
112
123
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/irfanalidv/ragfallback/blob/main/ragfallback_colab.ipynb)
113
- [![MLOps](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
124
+ [![MLOps: RAGAS + CI regression gate](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
125
+ [![Real data, zero mocks](https://img.shields.io/badge/examples-real%20datasets%20only-3fb950)](#examples--real-public-datasets)
126
+
127
+ </div>
128
+
129
+ <br/>
114
130
 
115
- **ragfallback** prevents silent RAG failures across the full pipeline — from bad chunks at ingest, through retrieval outages at runtime, to invisible answer quality degradation in production.
131
+ <p align="center">
132
+ <img src="https://raw.githubusercontent.com/irfanalidv/ragfallback/main/ragfallback_arch.svg" alt="ragfallback architecture — diagnostics, retrieval, core, evaluation and MLOps modules across the ingest-to-operate pipeline" width="100%">
133
+ </p>
134
+
135
+ ---
136
+
137
+ ## Contents
138
+
139
+ - [Why ragfallback?](#why-ragfallback)
140
+ - [What it prevents](#what-it-prevents)
141
+ - [Quick start](#quick-start)
142
+ - [Configuration](#configuration)
143
+ - [Full pipeline](#full-pipeline)
144
+ - [Module reference](#module-reference)
145
+ - [Examples — real public datasets](#examples--real-public-datasets)
146
+ - [Verified numbers](#verified-numbers--squad-wikipedia-validation-set)
147
+ - [Install](#install)
148
+ - [MLOps — evaluation & regression gate](#mlops--evaluation--regression-gate)
149
+ - [Contributing](#contributing)
150
+ - [FAQ](#faq)
151
+
152
+ ---
153
+
154
+ ## Why ragfallback?
155
+
156
+ RAG pipelines rarely fail loudly. They fail by quietly returning an empty context, a half-relevant chunk, or a confident-sounding hallucination — and nothing in a typical LangChain + vector-store stack tells you that happened. ragfallback is not another retrieval framework competing with LangChain, LlamaIndex, or your vector DB; it's a thin layer of guards and checks that wraps the stack you already have.
157
+
158
+ | If your stack today is... | ragfallback adds |
159
+ | ---------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
160
+ | Raw LangChain retriever, no fallback | `FailoverRetriever` + `SmartThresholdHybridRetriever` — a second path when the first one goes empty |
161
+ | RAGAS or another eval library, run manually | `GoldenRunner` + `BaselineRegistry` — the same metrics wired into a CI gate that fails the build |
162
+ | Nothing — chunking and indexing "just work" for now | `ChunkQualityChecker` + `EmbeddingGuard` — catches the two most common silent corruption sources |
163
+ | Hand-rolled retry logic around an LLM call | `AdaptiveRAGRetriever` — confidence-scored retries with pluggable strategies, sync and async |
164
+
165
+ If you don't have any of the failure modes in the table below, you don't need this library. If you've shipped a RAG feature past a demo, you've probably hit at least three of them.
116
166
 
117
167
  ---
118
168
 
@@ -359,6 +409,22 @@ from ragfallback.retrieval import FailoverRetriever
359
409
  retriever = FailoverRetriever(primary=chroma_retriever, fallback=faiss_retriever, min_results=1)
360
410
  ```
361
411
 
412
+ **ReRankerGuard** — pass-through hook for a second-stage reranker. Sits after vector retrieval, before the prompt; does nothing until you wire a `rerank_fn`, so it's safe to add to a pipeline today and fill in a cross-encoder later.
413
+
414
+ ```python
415
+ from ragfallback.retrieval import ReRankerGuard
416
+ guard = ReRankerGuard(rerank_fn=my_cross_encoder_rerank, top_n=4)
417
+ docs = guard.apply(query, retrieved_docs)
418
+ ```
419
+
420
+ **RetrieverAsVectorStore** — wraps any LangChain `BaseRetriever` (e.g. `SmartThresholdHybridRetriever`) so it exposes the `as_retriever()` surface `AdaptiveRAGRetriever` expects.
421
+
422
+ ```python
423
+ from ragfallback.retrieval import RetrieverAsVectorStore
424
+ shim = RetrieverAsVectorStore(hybrid_retriever)
425
+ retriever = AdaptiveRAGRetriever(vector_store=shim, llm=llm)
426
+ ```
427
+
362
428
  ---
363
429
 
364
430
  ### `ragfallback.core`
@@ -382,6 +448,19 @@ print(result.answer, result.confidence, result.attempts_used)
382
448
 
383
449
  Requires `MISTRAL_API_KEY` (or any LangChain-compatible LLM passed via `llm=`).
384
450
 
451
+ **aquery_with_fallback** — native async version of `query_with_fallback()`. Real coroutine using LangChain `ainvoke()` — not a thread-pool wrapper. Falls back to thread pool automatically if the underlying LLM doesn't implement `ainvoke`.
452
+
453
+ ```python
454
+ import asyncio
455
+
456
+ # async-native — LLM API calls overlap instead of serializing
457
+ result = await retriever.aquery_with_fallback("What is the refund policy?")
458
+ print(result.answer, result.confidence, result.attempts)
459
+
460
+ # works in FastAPI, GoldenRunner.run_async(), or any async context
461
+ asyncio.run(retriever.aquery_with_fallback("How do API tokens expire?"))
462
+ ```
463
+
385
464
  ---
386
465
 
387
466
  ### `ragfallback.strategies`
@@ -419,6 +498,42 @@ metrics.record_attempt(success=True, latency_ms=120, confidence=0.85)
419
498
  print(metrics.get_stats())
420
499
  ```
421
500
 
501
+ **CacheMonitor** — wraps any LangChain retriever to track cache hit rate, per-category latency (hit vs miss), TTL-based expiry, and LRU eviction. Zero new dependencies — stdlib only. Supports both sync `invoke()` and async `ainvoke()`.
502
+
503
+ ```python
504
+ from ragfallback.tracking import CacheMonitor
505
+
506
+ monitor = CacheMonitor(max_size=512, ttl_seconds=600)
507
+ cached_retriever = monitor.wrap_retriever(store.as_retriever(search_kwargs={"k": 4}))
508
+
509
+ # use cached_retriever exactly like any LangChain retriever
510
+ docs = cached_retriever.invoke("What is the refund policy?")
511
+
512
+ print(monitor.summary())
513
+ # → cache hit_rate=34.7% hits=26 misses=49 entries=49 evictions=0
514
+
515
+ stats = monitor.get_stats()
516
+ print(stats.hit_rate, stats.avg_hit_latency_ms, stats.avg_miss_latency_ms)
517
+ ```
518
+
519
+ Pass to `GoldenRunner` to capture cache efficiency alongside RAGAS scores:
520
+
521
+ ```python
522
+ from ragfallback.mlops import GoldenRunner, RagasHook
523
+ from ragfallback.tracking import CacheMonitor
524
+
525
+ monitor = CacheMonitor(max_size=256, ttl_seconds=300)
526
+ runner = GoldenRunner(
527
+ retriever=retriever,
528
+ ragas_hook=hook,
529
+ dataset="examples/golden_qa.json",
530
+ cache_monitor=monitor,
531
+ )
532
+ report = asyncio.run(runner.run_async())
533
+ print(report.cache_stats)
534
+ # → {"hit_rate": 0.347, "hits": 26, "misses": 49, "evictions": 0, ...}
535
+ ```
536
+
422
537
  ---
423
538
 
424
539
  ### `ragfallback.evaluation`
@@ -511,16 +626,20 @@ pip install ragfallback[mlops] # MLOps eval layer (RAGAS +
511
626
  ## Subpackage import map
512
627
 
513
628
  ```python
514
- from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector
629
+ from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector, CacheMonitor
515
630
 
516
631
  from ragfallback.diagnostics import (
517
632
  ChunkQualityChecker, EmbeddingGuard, EmbeddingQualityProbe,
518
633
  RetrievalHealthCheck, StaleIndexDetector, ContextWindowGuard,
519
634
  OverlappingContextStitcher, sanitize_documents, sanitize_metadata,
520
635
  )
521
- from ragfallback.retrieval import SmartThresholdHybridRetriever, FailoverRetriever
636
+ from ragfallback.retrieval import (
637
+ SmartThresholdHybridRetriever, FailoverRetriever,
638
+ ReRankerGuard, RetrieverAsVectorStore,
639
+ )
522
640
  from ragfallback.strategies import QueryVariationsStrategy, MultiHopFallbackStrategy
523
641
  from ragfallback.evaluation import RAGEvaluator
642
+ from ragfallback.tracking import CacheMonitor, CacheStats
524
643
  from ragfallback.mlops import (
525
644
  RagasHook, RagasReport,
526
645
  BaselineRegistry, RegressionError,
@@ -616,11 +735,47 @@ python examples/ci_regression_gate.py # exits 0 (pass) or 1 (fail)
616
735
 
617
736
  ---
618
737
 
738
+ ## FAQ
739
+
740
+ **Does this replace LangChain / LlamaIndex / my vector DB?**
741
+ No. ragfallback wraps whatever retriever and vector store you already use. It adds checks and fallback paths; it doesn't add a new abstraction layer you have to migrate to.
742
+
743
+ **Do I need an LLM API key to use this?**
744
+ No for most of it. `ChunkQualityChecker`, `EmbeddingGuard`, `RetrievalHealthCheck`, `SmartThresholdHybridRetriever`, `ContextWindowGuard`, and `RAGEvaluator` (heuristic mode) all run locally. Only `AdaptiveRAGRetriever`, `QueryVariationsStrategy`, and `MultiHopFallbackStrategy` need an LLM, and any LangChain-compatible one works — including local Ollama models.
745
+
746
+ **Why are the example numbers different every time I run them?**
747
+ Because they're computed live against real public datasets (SQuAD, PubMedQA, CUAD), not hardcoded. The README's "Verified numbers" section is the literal stdout of `examples/real_data_demo.py` — run it yourself to confirm.
748
+
749
+ **Is this production-ready?**
750
+ It's used in the author's own RAG pipelines and has a CI regression gate that runs on every push (see badge above). It's tagged Beta on PyPI because the public API can still shift between minor versions — pin a version in production and read [CHANGELOG.md](CHANGELOG.md) before upgrading.
751
+
752
+ **How is this different from RAGAS?**
753
+ RAGAS scores answer quality. ragfallback includes a thin RAGAS-compatible hook (`ragfallback.mlops.RagasHook`) for that, but the rest of the library is about *preventing* failures before they reach evaluation — chunk quality, embedding integrity, retrieval fallback, and context-window fit. Use both; they solve different parts of the pipeline.
754
+
755
+ ---
756
+
757
+ ## Star history
758
+
759
+ <a href="https://star-history.com/#irfanalidv/ragfallback&Date">
760
+ <img src="https://api.star-history.com/svg?repos=irfanalidv/ragfallback&type=Date" alt="Star History Chart" width="100%">
761
+ </a>
762
+
763
+ ---
764
+
619
765
  ## Contributing
620
766
 
621
767
  See [CONTRIBUTING.md](CONTRIBUTING.md). The quick version: run `pytest tests/unit/ -v` before any PR, follow Google-style docstrings, use `logging` not `print`, and update `__all__` in the subpackage `__init__.py`.
622
768
 
623
769
  ## License · Changelog
624
770
 
625
- MIT License — see [LICENSE](LICENSE).
771
+ MIT License — see [LICENSE](LICENSE).
626
772
  Full version history in [CHANGELOG.md](CHANGELOG.md).
773
+
774
+ ---
775
+
776
+ <div align="center">
777
+
778
+ Built and maintained by **[Irfan Ali](https://github.com/irfanalidv)** — Senior AI Engineer (LLMs, RAG, agents, voice AI).
779
+ Part of an [11-package open-source toolkit](https://pypi.org/user/irfanalidv/) for production RAG and agent systems.
780
+
781
+ </div>
@@ -1,14 +1,61 @@
1
+ <div align="center">
2
+
1
3
  # ragfallback
2
4
 
3
- [![GitHub license](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
4
- [![Python version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue.svg)](https://pypi.org/project/ragfallback/)
5
- [![PyPI](https://img.shields.io/pypi/v/ragfallback)](https://pypi.org/project/ragfallback/)
5
+ **The reliability layer for RAG pipelines that already work — until they don't.**
6
+
7
+ Drop into any LangChain-compatible stack. Catches bad chunks before they're embedded, fails over when retrieval goes empty, and scores answer quality on every run — so degradation shows up in CI, not in a user's support ticket.
8
+
9
+ [![PyPI](https://img.shields.io/pypi/v/ragfallback?color=3fb950&label=PyPI)](https://pypi.org/project/ragfallback/)
6
10
  [![Downloads](https://static.pepy.tech/badge/ragfallback)](https://pepy.tech/project/ragfallback)
7
11
  [![Tests](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml/badge.svg)](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml)
12
+ [![Python](https://img.shields.io/badge/python-3.8%E2%80%933.11-blue.svg)](https://pypi.org/project/ragfallback/)
13
+ [![License: MIT](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
14
+ [![GitHub stars](https://img.shields.io/github/stars/irfanalidv/ragfallback?style=social)](https://github.com/irfanalidv/ragfallback/stargazers)
15
+ <br/>
8
16
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/irfanalidv/ragfallback/blob/main/ragfallback_colab.ipynb)
9
- [![MLOps](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
17
+ [![MLOps: RAGAS + CI regression gate](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
18
+ [![Real data, zero mocks](https://img.shields.io/badge/examples-real%20datasets%20only-3fb950)](#examples--real-public-datasets)
19
+
20
+ </div>
21
+
22
+ <br/>
10
23
 
11
- **ragfallback** prevents silent RAG failures across the full pipeline — from bad chunks at ingest, through retrieval outages at runtime, to invisible answer quality degradation in production.
24
+ <p align="center">
25
+ <img src="https://raw.githubusercontent.com/irfanalidv/ragfallback/main/ragfallback_arch.svg" alt="ragfallback architecture — diagnostics, retrieval, core, evaluation and MLOps modules across the ingest-to-operate pipeline" width="100%">
26
+ </p>
27
+
28
+ ---
29
+
30
+ ## Contents
31
+
32
+ - [Why ragfallback?](#why-ragfallback)
33
+ - [What it prevents](#what-it-prevents)
34
+ - [Quick start](#quick-start)
35
+ - [Configuration](#configuration)
36
+ - [Full pipeline](#full-pipeline)
37
+ - [Module reference](#module-reference)
38
+ - [Examples — real public datasets](#examples--real-public-datasets)
39
+ - [Verified numbers](#verified-numbers--squad-wikipedia-validation-set)
40
+ - [Install](#install)
41
+ - [MLOps — evaluation & regression gate](#mlops--evaluation--regression-gate)
42
+ - [Contributing](#contributing)
43
+ - [FAQ](#faq)
44
+
45
+ ---
46
+
47
+ ## Why ragfallback?
48
+
49
+ RAG pipelines rarely fail loudly. They fail by quietly returning an empty context, a half-relevant chunk, or a confident-sounding hallucination — and nothing in a typical LangChain + vector-store stack tells you that happened. ragfallback is not another retrieval framework competing with LangChain, LlamaIndex, or your vector DB; it's a thin layer of guards and checks that wraps the stack you already have.
50
+
51
+ | If your stack today is... | ragfallback adds |
52
+ | ---------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
53
+ | Raw LangChain retriever, no fallback | `FailoverRetriever` + `SmartThresholdHybridRetriever` — a second path when the first one goes empty |
54
+ | RAGAS or another eval library, run manually | `GoldenRunner` + `BaselineRegistry` — the same metrics wired into a CI gate that fails the build |
55
+ | Nothing — chunking and indexing "just work" for now | `ChunkQualityChecker` + `EmbeddingGuard` — catches the two most common silent corruption sources |
56
+ | Hand-rolled retry logic around an LLM call | `AdaptiveRAGRetriever` — confidence-scored retries with pluggable strategies, sync and async |
57
+
58
+ If you don't have any of the failure modes in the table below, you don't need this library. If you've shipped a RAG feature past a demo, you've probably hit at least three of them.
12
59
 
13
60
  ---
14
61
 
@@ -255,6 +302,22 @@ from ragfallback.retrieval import FailoverRetriever
255
302
  retriever = FailoverRetriever(primary=chroma_retriever, fallback=faiss_retriever, min_results=1)
256
303
  ```
257
304
 
305
+ **ReRankerGuard** — pass-through hook for a second-stage reranker. Sits after vector retrieval, before the prompt; does nothing until you wire a `rerank_fn`, so it's safe to add to a pipeline today and fill in a cross-encoder later.
306
+
307
+ ```python
308
+ from ragfallback.retrieval import ReRankerGuard
309
+ guard = ReRankerGuard(rerank_fn=my_cross_encoder_rerank, top_n=4)
310
+ docs = guard.apply(query, retrieved_docs)
311
+ ```
312
+
313
+ **RetrieverAsVectorStore** — wraps any LangChain `BaseRetriever` (e.g. `SmartThresholdHybridRetriever`) so it exposes the `as_retriever()` surface `AdaptiveRAGRetriever` expects.
314
+
315
+ ```python
316
+ from ragfallback.retrieval import RetrieverAsVectorStore
317
+ shim = RetrieverAsVectorStore(hybrid_retriever)
318
+ retriever = AdaptiveRAGRetriever(vector_store=shim, llm=llm)
319
+ ```
320
+
258
321
  ---
259
322
 
260
323
  ### `ragfallback.core`
@@ -278,6 +341,19 @@ print(result.answer, result.confidence, result.attempts_used)
278
341
 
279
342
  Requires `MISTRAL_API_KEY` (or any LangChain-compatible LLM passed via `llm=`).
280
343
 
344
+ **aquery_with_fallback** — native async version of `query_with_fallback()`. Real coroutine using LangChain `ainvoke()` — not a thread-pool wrapper. Falls back to thread pool automatically if the underlying LLM doesn't implement `ainvoke`.
345
+
346
+ ```python
347
+ import asyncio
348
+
349
+ # async-native — LLM API calls overlap instead of serializing
350
+ result = await retriever.aquery_with_fallback("What is the refund policy?")
351
+ print(result.answer, result.confidence, result.attempts)
352
+
353
+ # works in FastAPI, GoldenRunner.run_async(), or any async context
354
+ asyncio.run(retriever.aquery_with_fallback("How do API tokens expire?"))
355
+ ```
356
+
281
357
  ---
282
358
 
283
359
  ### `ragfallback.strategies`
@@ -315,6 +391,42 @@ metrics.record_attempt(success=True, latency_ms=120, confidence=0.85)
315
391
  print(metrics.get_stats())
316
392
  ```
317
393
 
394
+ **CacheMonitor** — wraps any LangChain retriever to track cache hit rate, per-category latency (hit vs miss), TTL-based expiry, and LRU eviction. Zero new dependencies — stdlib only. Supports both sync `invoke()` and async `ainvoke()`.
395
+
396
+ ```python
397
+ from ragfallback.tracking import CacheMonitor
398
+
399
+ monitor = CacheMonitor(max_size=512, ttl_seconds=600)
400
+ cached_retriever = monitor.wrap_retriever(store.as_retriever(search_kwargs={"k": 4}))
401
+
402
+ # use cached_retriever exactly like any LangChain retriever
403
+ docs = cached_retriever.invoke("What is the refund policy?")
404
+
405
+ print(monitor.summary())
406
+ # → cache hit_rate=34.7% hits=26 misses=49 entries=49 evictions=0
407
+
408
+ stats = monitor.get_stats()
409
+ print(stats.hit_rate, stats.avg_hit_latency_ms, stats.avg_miss_latency_ms)
410
+ ```
411
+
412
+ Pass to `GoldenRunner` to capture cache efficiency alongside RAGAS scores:
413
+
414
+ ```python
415
+ from ragfallback.mlops import GoldenRunner, RagasHook
416
+ from ragfallback.tracking import CacheMonitor
417
+
418
+ monitor = CacheMonitor(max_size=256, ttl_seconds=300)
419
+ runner = GoldenRunner(
420
+ retriever=retriever,
421
+ ragas_hook=hook,
422
+ dataset="examples/golden_qa.json",
423
+ cache_monitor=monitor,
424
+ )
425
+ report = asyncio.run(runner.run_async())
426
+ print(report.cache_stats)
427
+ # → {"hit_rate": 0.347, "hits": 26, "misses": 49, "evictions": 0, ...}
428
+ ```
429
+
318
430
  ---
319
431
 
320
432
  ### `ragfallback.evaluation`
@@ -407,16 +519,20 @@ pip install ragfallback[mlops] # MLOps eval layer (RAGAS +
407
519
  ## Subpackage import map
408
520
 
409
521
  ```python
410
- from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector
522
+ from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector, CacheMonitor
411
523
 
412
524
  from ragfallback.diagnostics import (
413
525
  ChunkQualityChecker, EmbeddingGuard, EmbeddingQualityProbe,
414
526
  RetrievalHealthCheck, StaleIndexDetector, ContextWindowGuard,
415
527
  OverlappingContextStitcher, sanitize_documents, sanitize_metadata,
416
528
  )
417
- from ragfallback.retrieval import SmartThresholdHybridRetriever, FailoverRetriever
529
+ from ragfallback.retrieval import (
530
+ SmartThresholdHybridRetriever, FailoverRetriever,
531
+ ReRankerGuard, RetrieverAsVectorStore,
532
+ )
418
533
  from ragfallback.strategies import QueryVariationsStrategy, MultiHopFallbackStrategy
419
534
  from ragfallback.evaluation import RAGEvaluator
535
+ from ragfallback.tracking import CacheMonitor, CacheStats
420
536
  from ragfallback.mlops import (
421
537
  RagasHook, RagasReport,
422
538
  BaselineRegistry, RegressionError,
@@ -512,11 +628,47 @@ python examples/ci_regression_gate.py # exits 0 (pass) or 1 (fail)
512
628
 
513
629
  ---
514
630
 
631
+ ## FAQ
632
+
633
+ **Does this replace LangChain / LlamaIndex / my vector DB?**
634
+ No. ragfallback wraps whatever retriever and vector store you already use. It adds checks and fallback paths; it doesn't add a new abstraction layer you have to migrate to.
635
+
636
+ **Do I need an LLM API key to use this?**
637
+ No for most of it. `ChunkQualityChecker`, `EmbeddingGuard`, `RetrievalHealthCheck`, `SmartThresholdHybridRetriever`, `ContextWindowGuard`, and `RAGEvaluator` (heuristic mode) all run locally. Only `AdaptiveRAGRetriever`, `QueryVariationsStrategy`, and `MultiHopFallbackStrategy` need an LLM, and any LangChain-compatible one works — including local Ollama models.
638
+
639
+ **Why are the example numbers different every time I run them?**
640
+ Because they're computed live against real public datasets (SQuAD, PubMedQA, CUAD), not hardcoded. The README's "Verified numbers" section is the literal stdout of `examples/real_data_demo.py` — run it yourself to confirm.
641
+
642
+ **Is this production-ready?**
643
+ It's used in the author's own RAG pipelines and has a CI regression gate that runs on every push (see badge above). It's tagged Beta on PyPI because the public API can still shift between minor versions — pin a version in production and read [CHANGELOG.md](CHANGELOG.md) before upgrading.
644
+
645
+ **How is this different from RAGAS?**
646
+ RAGAS scores answer quality. ragfallback includes a thin RAGAS-compatible hook (`ragfallback.mlops.RagasHook`) for that, but the rest of the library is about *preventing* failures before they reach evaluation — chunk quality, embedding integrity, retrieval fallback, and context-window fit. Use both; they solve different parts of the pipeline.
647
+
648
+ ---
649
+
650
+ ## Star history
651
+
652
+ <a href="https://star-history.com/#irfanalidv/ragfallback&Date">
653
+ <img src="https://api.star-history.com/svg?repos=irfanalidv/ragfallback&type=Date" alt="Star History Chart" width="100%">
654
+ </a>
655
+
656
+ ---
657
+
515
658
  ## Contributing
516
659
 
517
660
  See [CONTRIBUTING.md](CONTRIBUTING.md). The quick version: run `pytest tests/unit/ -v` before any PR, follow Google-style docstrings, use `logging` not `print`, and update `__all__` in the subpackage `__init__.py`.
518
661
 
519
662
  ## License · Changelog
520
663
 
521
- MIT License — see [LICENSE](LICENSE).
664
+ MIT License — see [LICENSE](LICENSE).
522
665
  Full version history in [CHANGELOG.md](CHANGELOG.md).
666
+
667
+ ---
668
+
669
+ <div align="center">
670
+
671
+ Built and maintained by **[Irfan Ali](https://github.com/irfanalidv)** — Senior AI Engineer (LLMs, RAG, agents, voice AI).
672
+ Part of an [11-package open-source toolkit](https://pypi.org/user/irfanalidv/) for production RAG and agent systems.
673
+
674
+ </div>
@@ -158,14 +158,14 @@ async def run_gate() -> int:
158
158
  print(
159
159
  f" Comparing against baseline (recorded: {baseline.get('recorded_at', 'unknown')})"
160
160
  )
161
- print(" Threshold: 5% quality metrics; 12% P95 latency (CI noise) → FAIL")
161
+ print(" Threshold: 5% quality metrics; latency not gated (CI runners too noisy) → FAIL")
162
162
 
163
163
  try:
164
164
  registry.compare_or_fail(
165
165
  report,
166
166
  dataset=dataset_name,
167
167
  threshold=0.05,
168
- latency_threshold=0.12,
168
+ latency_threshold=5.0, # 500% — P95 latency varies wildly on GH Actions shared runners
169
169
  )
170
170
  registry.update(report, dataset=dataset_name)
171
171
  print("\n RESULT: PASS ✓ — No regression detected")
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "ragfallback"
7
- version = "2.2.0"
7
+ version = "2.2.1"
8
8
  description = "Prevents silent RAG failures — chunk quality, retrieval fallback, adaptive querying, and answer evaluation in one library."
9
9
  readme = "README.md"
10
10
  requires-python = ">=3.8"
@@ -16,6 +16,7 @@ keywords = ["rag", "retrieval", "llm", "fallback", "query-variations", "langchai
16
16
  classifiers = [
17
17
  "Development Status :: 4 - Beta",
18
18
  "Intended Audience :: Developers",
19
+ "Operating System :: OS Independent",
19
20
  "Programming Language :: Python :: 3",
20
21
  "Programming Language :: Python :: 3.8",
21
22
  "Programming Language :: Python :: 3.9",
@@ -23,6 +24,7 @@ classifiers = [
23
24
  "Programming Language :: Python :: 3.11",
24
25
  "Topic :: Software Development :: Libraries :: Python Modules",
25
26
  "Topic :: Scientific/Engineering :: Artificial Intelligence",
27
+ "Typing :: Typed",
26
28
  ]
27
29
 
28
30
  dependencies = [
@@ -103,6 +105,7 @@ Homepage = "https://github.com/irfanalidv/ragfallback"
103
105
  Documentation = "https://github.com/irfanalidv/ragfallback#readme"
104
106
  Repository = "https://github.com/irfanalidv/ragfallback"
105
107
  Issues = "https://github.com/irfanalidv/ragfallback/issues"
108
+ Changelog = "https://github.com/irfanalidv/ragfallback/blob/main/CHANGELOG.md"
106
109
 
107
110
  [tool.setuptools.packages.find]
108
111
  where = ["."]
@@ -3,6 +3,7 @@
3
3
  # lines below cover the same runtime noise.
4
4
 
5
5
  [pytest]
6
+ asyncio_mode = auto
6
7
  testpaths = tests
7
8
  python_files = test_*.py
8
9
  python_classes = Test*
@@ -9,7 +9,7 @@ This module exposes a small curated shortcut only (see ``__all__``).
9
9
 
10
10
  from __future__ import annotations
11
11
 
12
- __version__ = "2.2.0"
12
+ __version__ = "2.2.1"
13
13
  __author__ = "Irfan Ali"
14
14
 
15
15
  from ragfallback.core.adaptive_retriever import AdaptiveRAGRetriever, QueryResult
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: ragfallback
3
- Version: 2.2.0
3
+ Version: 2.2.1
4
4
  Summary: Prevents silent RAG failures — chunk quality, retrieval fallback, adaptive querying, and answer evaluation in one library.
5
5
  Home-page: https://github.com/irfanalidv/ragfallback
6
6
  Author: Irfan Ali
@@ -10,9 +10,11 @@ Project-URL: Homepage, https://github.com/irfanalidv/ragfallback
10
10
  Project-URL: Documentation, https://github.com/irfanalidv/ragfallback#readme
11
11
  Project-URL: Repository, https://github.com/irfanalidv/ragfallback
12
12
  Project-URL: Issues, https://github.com/irfanalidv/ragfallback/issues
13
+ Project-URL: Changelog, https://github.com/irfanalidv/ragfallback/blob/main/CHANGELOG.md
13
14
  Keywords: rag,retrieval,llm,fallback,query-variations,langchain,bm25,hybrid-search
14
15
  Classifier: Development Status :: 4 - Beta
15
16
  Classifier: Intended Audience :: Developers
17
+ Classifier: Operating System :: OS Independent
16
18
  Classifier: Programming Language :: Python :: 3
17
19
  Classifier: Programming Language :: Python :: 3.8
18
20
  Classifier: Programming Language :: Python :: 3.9
@@ -20,6 +22,7 @@ Classifier: Programming Language :: Python :: 3.10
20
22
  Classifier: Programming Language :: Python :: 3.11
21
23
  Classifier: Topic :: Software Development :: Libraries :: Python Modules
22
24
  Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
25
+ Classifier: Typing :: Typed
23
26
  Requires-Python: >=3.8
24
27
  Description-Content-Type: text/markdown
25
28
  License-File: LICENSE
@@ -102,17 +105,64 @@ Dynamic: home-page
102
105
  Dynamic: license-file
103
106
  Dynamic: requires-python
104
107
 
108
+ <div align="center">
109
+
105
110
  # ragfallback
106
111
 
107
- [![GitHub license](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
108
- [![Python version](https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue.svg)](https://pypi.org/project/ragfallback/)
109
- [![PyPI](https://img.shields.io/pypi/v/ragfallback)](https://pypi.org/project/ragfallback/)
112
+ **The reliability layer for RAG pipelines that already work — until they don't.**
113
+
114
+ Drop into any LangChain-compatible stack. Catches bad chunks before they're embedded, fails over when retrieval goes empty, and scores answer quality on every run — so degradation shows up in CI, not in a user's support ticket.
115
+
116
+ [![PyPI](https://img.shields.io/pypi/v/ragfallback?color=3fb950&label=PyPI)](https://pypi.org/project/ragfallback/)
110
117
  [![Downloads](https://static.pepy.tech/badge/ragfallback)](https://pepy.tech/project/ragfallback)
111
118
  [![Tests](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml/badge.svg)](https://github.com/irfanalidv/ragfallback/actions/workflows/test.yml)
119
+ [![Python](https://img.shields.io/badge/python-3.8%E2%80%933.11-blue.svg)](https://pypi.org/project/ragfallback/)
120
+ [![License: MIT](https://img.shields.io/github/license/irfanalidv/ragfallback)](https://github.com/irfanalidv/ragfallback/blob/main/LICENSE)
121
+ [![GitHub stars](https://img.shields.io/github/stars/irfanalidv/ragfallback?style=social)](https://github.com/irfanalidv/ragfallback/stargazers)
122
+ <br/>
112
123
  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/irfanalidv/ragfallback/blob/main/ragfallback_colab.ipynb)
113
- [![MLOps](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
124
+ [![MLOps: RAGAS + CI regression gate](https://img.shields.io/badge/MLOps-RAGAS%20%2B%20CI%20Gate-blueviolet)](https://github.com/irfanalidv/ragfallback/tree/main/ragfallback/mlops)
125
+ [![Real data, zero mocks](https://img.shields.io/badge/examples-real%20datasets%20only-3fb950)](#examples--real-public-datasets)
126
+
127
+ </div>
128
+
129
+ <br/>
114
130
 
115
- **ragfallback** prevents silent RAG failures across the full pipeline — from bad chunks at ingest, through retrieval outages at runtime, to invisible answer quality degradation in production.
131
+ <p align="center">
132
+ <img src="https://raw.githubusercontent.com/irfanalidv/ragfallback/main/ragfallback_arch.svg" alt="ragfallback architecture — diagnostics, retrieval, core, evaluation and MLOps modules across the ingest-to-operate pipeline" width="100%">
133
+ </p>
134
+
135
+ ---
136
+
137
+ ## Contents
138
+
139
+ - [Why ragfallback?](#why-ragfallback)
140
+ - [What it prevents](#what-it-prevents)
141
+ - [Quick start](#quick-start)
142
+ - [Configuration](#configuration)
143
+ - [Full pipeline](#full-pipeline)
144
+ - [Module reference](#module-reference)
145
+ - [Examples — real public datasets](#examples--real-public-datasets)
146
+ - [Verified numbers](#verified-numbers--squad-wikipedia-validation-set)
147
+ - [Install](#install)
148
+ - [MLOps — evaluation & regression gate](#mlops--evaluation--regression-gate)
149
+ - [Contributing](#contributing)
150
+ - [FAQ](#faq)
151
+
152
+ ---
153
+
154
+ ## Why ragfallback?
155
+
156
+ RAG pipelines rarely fail loudly. They fail by quietly returning an empty context, a half-relevant chunk, or a confident-sounding hallucination — and nothing in a typical LangChain + vector-store stack tells you that happened. ragfallback is not another retrieval framework competing with LangChain, LlamaIndex, or your vector DB; it's a thin layer of guards and checks that wraps the stack you already have.
157
+
158
+ | If your stack today is... | ragfallback adds |
159
+ | ---------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
160
+ | Raw LangChain retriever, no fallback | `FailoverRetriever` + `SmartThresholdHybridRetriever` — a second path when the first one goes empty |
161
+ | RAGAS or another eval library, run manually | `GoldenRunner` + `BaselineRegistry` — the same metrics wired into a CI gate that fails the build |
162
+ | Nothing — chunking and indexing "just work" for now | `ChunkQualityChecker` + `EmbeddingGuard` — catches the two most common silent corruption sources |
163
+ | Hand-rolled retry logic around an LLM call | `AdaptiveRAGRetriever` — confidence-scored retries with pluggable strategies, sync and async |
164
+
165
+ If you don't have any of the failure modes in the table below, you don't need this library. If you've shipped a RAG feature past a demo, you've probably hit at least three of them.
116
166
 
117
167
  ---
118
168
 
@@ -359,6 +409,22 @@ from ragfallback.retrieval import FailoverRetriever
359
409
  retriever = FailoverRetriever(primary=chroma_retriever, fallback=faiss_retriever, min_results=1)
360
410
  ```
361
411
 
412
+ **ReRankerGuard** — pass-through hook for a second-stage reranker. Sits after vector retrieval, before the prompt; does nothing until you wire a `rerank_fn`, so it's safe to add to a pipeline today and fill in a cross-encoder later.
413
+
414
+ ```python
415
+ from ragfallback.retrieval import ReRankerGuard
416
+ guard = ReRankerGuard(rerank_fn=my_cross_encoder_rerank, top_n=4)
417
+ docs = guard.apply(query, retrieved_docs)
418
+ ```
419
+
420
+ **RetrieverAsVectorStore** — wraps any LangChain `BaseRetriever` (e.g. `SmartThresholdHybridRetriever`) so it exposes the `as_retriever()` surface `AdaptiveRAGRetriever` expects.
421
+
422
+ ```python
423
+ from ragfallback.retrieval import RetrieverAsVectorStore
424
+ shim = RetrieverAsVectorStore(hybrid_retriever)
425
+ retriever = AdaptiveRAGRetriever(vector_store=shim, llm=llm)
426
+ ```
427
+
362
428
  ---
363
429
 
364
430
  ### `ragfallback.core`
@@ -382,6 +448,19 @@ print(result.answer, result.confidence, result.attempts_used)
382
448
 
383
449
  Requires `MISTRAL_API_KEY` (or any LangChain-compatible LLM passed via `llm=`).
384
450
 
451
+ **aquery_with_fallback** — native async version of `query_with_fallback()`. Real coroutine using LangChain `ainvoke()` — not a thread-pool wrapper. Falls back to thread pool automatically if the underlying LLM doesn't implement `ainvoke`.
452
+
453
+ ```python
454
+ import asyncio
455
+
456
+ # async-native — LLM API calls overlap instead of serializing
457
+ result = await retriever.aquery_with_fallback("What is the refund policy?")
458
+ print(result.answer, result.confidence, result.attempts)
459
+
460
+ # works in FastAPI, GoldenRunner.run_async(), or any async context
461
+ asyncio.run(retriever.aquery_with_fallback("How do API tokens expire?"))
462
+ ```
463
+
385
464
  ---
386
465
 
387
466
  ### `ragfallback.strategies`
@@ -419,6 +498,42 @@ metrics.record_attempt(success=True, latency_ms=120, confidence=0.85)
419
498
  print(metrics.get_stats())
420
499
  ```
421
500
 
501
+ **CacheMonitor** — wraps any LangChain retriever to track cache hit rate, per-category latency (hit vs miss), TTL-based expiry, and LRU eviction. Zero new dependencies — stdlib only. Supports both sync `invoke()` and async `ainvoke()`.
502
+
503
+ ```python
504
+ from ragfallback.tracking import CacheMonitor
505
+
506
+ monitor = CacheMonitor(max_size=512, ttl_seconds=600)
507
+ cached_retriever = monitor.wrap_retriever(store.as_retriever(search_kwargs={"k": 4}))
508
+
509
+ # use cached_retriever exactly like any LangChain retriever
510
+ docs = cached_retriever.invoke("What is the refund policy?")
511
+
512
+ print(monitor.summary())
513
+ # → cache hit_rate=34.7% hits=26 misses=49 entries=49 evictions=0
514
+
515
+ stats = monitor.get_stats()
516
+ print(stats.hit_rate, stats.avg_hit_latency_ms, stats.avg_miss_latency_ms)
517
+ ```
518
+
519
+ Pass to `GoldenRunner` to capture cache efficiency alongside RAGAS scores:
520
+
521
+ ```python
522
+ from ragfallback.mlops import GoldenRunner, RagasHook
523
+ from ragfallback.tracking import CacheMonitor
524
+
525
+ monitor = CacheMonitor(max_size=256, ttl_seconds=300)
526
+ runner = GoldenRunner(
527
+ retriever=retriever,
528
+ ragas_hook=hook,
529
+ dataset="examples/golden_qa.json",
530
+ cache_monitor=monitor,
531
+ )
532
+ report = asyncio.run(runner.run_async())
533
+ print(report.cache_stats)
534
+ # → {"hit_rate": 0.347, "hits": 26, "misses": 49, "evictions": 0, ...}
535
+ ```
536
+
422
537
  ---
423
538
 
424
539
  ### `ragfallback.evaluation`
@@ -511,16 +626,20 @@ pip install ragfallback[mlops] # MLOps eval layer (RAGAS +
511
626
  ## Subpackage import map
512
627
 
513
628
  ```python
514
- from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector
629
+ from ragfallback import AdaptiveRAGRetriever, QueryResult, CostTracker, MetricsCollector, CacheMonitor
515
630
 
516
631
  from ragfallback.diagnostics import (
517
632
  ChunkQualityChecker, EmbeddingGuard, EmbeddingQualityProbe,
518
633
  RetrievalHealthCheck, StaleIndexDetector, ContextWindowGuard,
519
634
  OverlappingContextStitcher, sanitize_documents, sanitize_metadata,
520
635
  )
521
- from ragfallback.retrieval import SmartThresholdHybridRetriever, FailoverRetriever
636
+ from ragfallback.retrieval import (
637
+ SmartThresholdHybridRetriever, FailoverRetriever,
638
+ ReRankerGuard, RetrieverAsVectorStore,
639
+ )
522
640
  from ragfallback.strategies import QueryVariationsStrategy, MultiHopFallbackStrategy
523
641
  from ragfallback.evaluation import RAGEvaluator
642
+ from ragfallback.tracking import CacheMonitor, CacheStats
524
643
  from ragfallback.mlops import (
525
644
  RagasHook, RagasReport,
526
645
  BaselineRegistry, RegressionError,
@@ -616,11 +735,47 @@ python examples/ci_regression_gate.py # exits 0 (pass) or 1 (fail)
616
735
 
617
736
  ---
618
737
 
738
+ ## FAQ
739
+
740
+ **Does this replace LangChain / LlamaIndex / my vector DB?**
741
+ No. ragfallback wraps whatever retriever and vector store you already use. It adds checks and fallback paths; it doesn't add a new abstraction layer you have to migrate to.
742
+
743
+ **Do I need an LLM API key to use this?**
744
+ No for most of it. `ChunkQualityChecker`, `EmbeddingGuard`, `RetrievalHealthCheck`, `SmartThresholdHybridRetriever`, `ContextWindowGuard`, and `RAGEvaluator` (heuristic mode) all run locally. Only `AdaptiveRAGRetriever`, `QueryVariationsStrategy`, and `MultiHopFallbackStrategy` need an LLM, and any LangChain-compatible one works — including local Ollama models.
745
+
746
+ **Why are the example numbers different every time I run them?**
747
+ Because they're computed live against real public datasets (SQuAD, PubMedQA, CUAD), not hardcoded. The README's "Verified numbers" section is the literal stdout of `examples/real_data_demo.py` — run it yourself to confirm.
748
+
749
+ **Is this production-ready?**
750
+ It's used in the author's own RAG pipelines and has a CI regression gate that runs on every push (see badge above). It's tagged Beta on PyPI because the public API can still shift between minor versions — pin a version in production and read [CHANGELOG.md](CHANGELOG.md) before upgrading.
751
+
752
+ **How is this different from RAGAS?**
753
+ RAGAS scores answer quality. ragfallback includes a thin RAGAS-compatible hook (`ragfallback.mlops.RagasHook`) for that, but the rest of the library is about *preventing* failures before they reach evaluation — chunk quality, embedding integrity, retrieval fallback, and context-window fit. Use both; they solve different parts of the pipeline.
754
+
755
+ ---
756
+
757
+ ## Star history
758
+
759
+ <a href="https://star-history.com/#irfanalidv/ragfallback&Date">
760
+ <img src="https://api.star-history.com/svg?repos=irfanalidv/ragfallback&type=Date" alt="Star History Chart" width="100%">
761
+ </a>
762
+
763
+ ---
764
+
619
765
  ## Contributing
620
766
 
621
767
  See [CONTRIBUTING.md](CONTRIBUTING.md). The quick version: run `pytest tests/unit/ -v` before any PR, follow Google-style docstrings, use `logging` not `print`, and update `__all__` in the subpackage `__init__.py`.
622
768
 
623
769
  ## License · Changelog
624
770
 
625
- MIT License — see [LICENSE](LICENSE).
771
+ MIT License — see [LICENSE](LICENSE).
626
772
  Full version history in [CHANGELOG.md](CHANGELOG.md).
773
+
774
+ ---
775
+
776
+ <div align="center">
777
+
778
+ Built and maintained by **[Irfan Ali](https://github.com/irfanalidv)** — Senior AI Engineer (LLMs, RAG, agents, voice AI).
779
+ Part of an [11-package open-source toolkit](https://pypi.org/user/irfanalidv/) for production RAG and agent systems.
780
+
781
+ </div>
File without changes
File without changes
File without changes
File without changes