
ragpeek 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

ragpeek-0.1.0/.github/workflows/ci.yaml +50 -0
ragpeek-0.1.0/.gitignore +221 -0
ragpeek-0.1.0/.python-version +1 -0
ragpeek-0.1.0/CHANGELOG.md +33 -0
ragpeek-0.1.0/LICENSE +21 -0
ragpeek-0.1.0/PKG-INFO +311 -0
ragpeek-0.1.0/README.md +280 -0
ragpeek-0.1.0/examples/async_rag.py +148 -0
ragpeek-0.1.0/examples/corpus.py +70 -0
ragpeek-0.1.0/examples/data/jupiter.txt +5 -0
ragpeek-0.1.0/examples/data/saturn.txt +5 -0
ragpeek-0.1.0/examples/data/terrestrial_planets.txt +7 -0
ragpeek-0.1.0/examples/simple_rag.py +46 -0
ragpeek-0.1.0/pyproject.toml +52 -0
ragpeek-0.1.0/ragpeek/__init__.py +22 -0
ragpeek-0.1.0/ragpeek/__main__.py +3 -0
ragpeek-0.1.0/ragpeek/analyzers/__init__.py +56 -0
ragpeek-0.1.0/ragpeek/analyzers/context.py +213 -0
ragpeek-0.1.0/ragpeek/analyzers/generation.py +55 -0
ragpeek-0.1.0/ragpeek/analyzers/retrieval.py +89 -0
ragpeek-0.1.0/ragpeek/cli.py +210 -0
ragpeek-0.1.0/ragpeek/collector.py +60 -0
ragpeek-0.1.0/ragpeek/config.py +23 -0
ragpeek-0.1.0/ragpeek/decorators.py +146 -0
ragpeek-0.1.0/ragpeek/logging.py +103 -0
ragpeek-0.1.0/ragpeek/py.typed +0 -0
ragpeek-0.1.0/ragpeek/renderers/__init__.py +0 -0
ragpeek-0.1.0/ragpeek/renderers/html.py +178 -0
ragpeek-0.1.0/ragpeek/renderers/terminal.py +140 -0
ragpeek-0.1.0/ragpeek/serialization.py +107 -0
ragpeek-0.1.0/ragpeek/session.py +133 -0
ragpeek-0.1.0/tests/__init__.py +0 -0
ragpeek-0.1.0/tests/conftest.py +59 -0
ragpeek-0.1.0/tests/fixtures/sample_session.json +43 -0
ragpeek-0.1.0/tests/test_analyzers.py +294 -0
ragpeek-0.1.0/tests/test_cli.py +158 -0
ragpeek-0.1.0/tests/test_decorators.py +215 -0
ragpeek-0.1.0/tests/test_renderers.py +91 -0
ragpeek-0.1.0/tests/test_session.py +107 -0
ragpeek-0.1.0/uv.lock +3272 -0

ragpeek-0.1.0/.github/workflows/ci.yaml ADDED

@@ -0,0 +1,50 @@
+name: CI
+on:
+    push:
+        branches: [main]
+    pull_request:
+        branches: [main]
+jobs:
+    lint:
+        runs-on: ubuntu-latest
+        steps:
+            - uses: actions/checkout@v4
+              env:
+                  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
+            - name: Install uv
+              uses: astral-sh/setup-uv@v5
+              env:
+                  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
+            - name: Ruff lint
+              run: uvx ruff check .
+    test:
+        runs-on: ubuntu-latest
+        strategy:
+            matrix:
+                python-version: ["3.10", "3.11", "3.12", "3.13"]
+        steps:
+            - uses: actions/checkout@v4
+              env:
+                  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
+            - name: Install uv
+              uses: astral-sh/setup-uv@v5
+              env:
+                  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
+            - name: Set up Python ${{ matrix.python-version }}
+              run: uv python install ${{ matrix.python-version }}
+            - name: Install dependencies
+              run: uv sync --all-extras
+            - name: Run tests
+              run: uv run pytest tests/ -v

ragpeek-0.1.0/.gitignore ADDED

@@ -0,0 +1,221 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[codz]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#   Usually these files are written by a python script from a template
+#   before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py.cover
+.hypothesis/
+.pytest_cache/
+cover/
+# ragpeek example output
+async_report.html
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+.pybuilder/
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+# Pipfile.lock
+# UV
+#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+# uv.lock
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+# poetry.lock
+# poetry.toml
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#   pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
+#   https://pdm-project.org/en/latest/usage/project/#working-with-version-control
+# pdm.lock
+# pdm.toml
+.pdm-python
+.pdm-build/
+# pixi
+#   Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
+# pixi.lock
+#   Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
+#   in the .venv directory. It is recommended not to include this directory in version control.
+.pixi
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+# Redis
+*.rdb
+*.aof
+*.pid
+# RabbitMQ
+mnesia/
+rabbitmq/
+rabbitmq-data/
+# ActiveMQ
+activemq-data/
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+# Pyre type checker
+.pyre/
+# pytype static type analyzer
+.pytype/
+# Cython debug symbols
+cython_debug/
+# PyCharm
+#   JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#   be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#   and can be added to the global gitignore or merged into this file.  For a more nuclear
+#   option (not recommended) you can uncomment the following to ignore the entire idea folder.
+# .idea/
+# Abstra
+#   Abstra is an AI-powered process automation framework.
+#   Ignore directories containing user credentials, local state, and settings.
+#   Learn more at https://abstra.io/docs
+.abstra/
+# Visual Studio Code
+#   Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
+#   that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
+#   and can be added to the global gitignore or merged into this file. However, if you prefer,
+#   you could uncomment the following to ignore the entire vscode folder
+# .vscode/
+# Temporary file for partial code execution
+tempCodeRunnerFile.py
+# Ruff stuff:
+.ruff_cache/
+# PyPI configuration file
+.pypirc
+# Marimo
+marimo/_static/
+marimo/_lsp/
+__marimo__/
+# Streamlit
+.streamlit/secrets.toml

ragpeek-0.1.0/.python-version ADDED

@@ -0,0 +1 @@
+3.12.3

ragpeek-0.1.0/CHANGELOG.md ADDED

@@ -0,0 +1,33 @@
+# Changelog
+All notable changes to this project are documented here. The format is based on
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres
+to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [Unreleased]
+## [0.1.0] - 2026-06-22
+Initial release.
+### Added
+- `@trace` decorator that instruments sync **and** async RAG pipelines, with
+  `log_retrieval`, `log_generation`, and `link_retrieval_to_generation`. The active
+  session rides a `contextvars.ContextVar`, so concurrent traces stay isolated.
+- Retrieval, context, and generation analyzers that produce within-set,
+  calibration-aware **signals** (low-relevance padding, sharp rank-1 precision, flat
+  distribution, k mismatch, rank disagreement, low context utilisation, hedging
+  language) rather than absolute verdicts.
+- Terminal and HTML trace renderers; `serialize_trace` / `deserialize_trace`.
+- `ragpeek` command line:
+  - `ragpeek demo` — ask a question, retrieve over a built-in corpus with real
+    embeddings, generate via a local Ollama server if available, and render the
+    trace.
+  - `ragpeek <trace.json>` — view and diagnose a saved trace.
+- `TracerConfig` for tuning thresholds; `py.typed` so downstream type checkers use
+  the inline type hints.
+- Optional extras: `semantic` (embedding-based context analysis) and `examples`.
+[Unreleased]: https://github.com/meutsabdahal/ragpeek/compare/v0.1.0...HEAD
+[0.1.0]: https://github.com/meutsabdahal/ragpeek/releases/tag/v0.1.0

ragpeek-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Utsab Dahal
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

ragpeek-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,311 @@
+Metadata-Version: 2.4
+Name: ragpeek
+Version: 0.1.0
+Summary: A lightweight debugger for RAG pipelines
+Project-URL: Homepage, https://github.com/meutsabdahal/ragpeek
+Project-URL: Repository, https://github.com/meutsabdahal/ragpeek
+Project-URL: Issues, https://github.com/meutsabdahal/ragpeek/issues
+Author: Utsab Dahal
+License: MIT
+License-File: LICENSE
+Keywords: debugging,developer-tools,llm,observability,rag,retrieval-augmented-generation
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Software Development :: Debuggers
+Requires-Python: >=3.10
+Requires-Dist: rich>=13.0
+Provides-Extra: examples
+Requires-Dist: chromadb>=1.5.9; extra == 'examples'
+Requires-Dist: httpx>=0.27; extra == 'examples'
+Provides-Extra: semantic
+Requires-Dist: scikit-learn>=1.3; extra == 'semantic'
+Requires-Dist: sentence-transformers>=3.0; extra == 'semantic'
+Description-Content-Type: text/markdown
+# ragpeek
+[![CI](https://github.com/meutsabdahal/ragpeek/actions/workflows/ci.yaml/badge.svg)](https://github.com/meutsabdahal/ragpeek/actions/workflows/ci.yaml)
+**A lightweight debugger for RAG pipelines.**
+When a RAG pipeline returns a bad answer, the usual move is to print the retrieved
+chunks and squint at them. ragpeek replaces the squinting: wrap your pipeline in one
+decorator and it shows you, per query, what was retrieved, the score of every chunk,
+the exact prompt sent to the model, and a plain-English read on where things went
+sideways: retrieval, context ranking, or generation.
+Ask a question in one command, no code (output depends on your question and the LLM):
+```
+$ ragpeek demo
+Question> How hot is Venus?
+Retrieval  k=4/4
+  ✓ 0.77  Venus is the hottest planet, with surface temperatures…
+  ✗ 0.39  Mercury is the smallest planet and the closest to the Sun.
+  ✗ 0.34  Neptune is the most distant planet from the Sun…
+  ✗ 0.31  Mars hosts Olympus Mons, the tallest volcano…
+  ⚠ 3 of 4 chunks sit in the lower half of this result's score range
+    (top 0.77, bottom 0.31) - possible low-relevance padding.
+  ✓ Sharp rank-1 separation (0.77 vs 0.39): the retriever cleanly
+    separates the top match - a precision signal.
+Generation  model=llama3.2
+  Venus's average surface temperature is around 465 °C…
+  ✓ Generation looks healthy - no obvious signals.
+```
+> **Score convention:** ragpeek assumes **higher scores mean more relevant** chunks.
+> If your vector store returns distances, convert them to similarities first; see
+> [Works with any vector store](#works-with-any-vector-store).
+---
+## Install
+```bash
+pip install ragpeek
+```
+The default install is lightweight: only [`rich`](https://github.com/Textualize/rich)
+at runtime. For the embedding-based context analyzer (and `ragpeek demo`, which
+retrieves with real embeddings), add the `semantic` extra:
+```bash
+pip install "ragpeek[semantic]"
+```
+Requires Python 3.10+. On first semantic run, ragpeek downloads a small embedding
+model (~80MB) once. `ragpeek demo` also generates an answer if a local
+[Ollama](https://ollama.com) server is running; without one it shows retrieval only.
+**From source:**
+```bash
+git clone https://github.com/meutsabdahal/ragpeek
+cd ragpeek
+uv sync --group dev        # create the env + install dev deps
+uv run pytest tests/ -v
+```
+---
+## Command line
+Once installed, `ragpeek` is a command:
+```bash
+ragpeek demo                       # prompts for a question, then retrieves + answers + traces it
+ragpeek demo "How hot is Venus?"   # or pass the question directly
+ragpeek demo --model mistral       # choose the Ollama model (default: llama3.2)
+ragpeek demo --html report.html    # also save a shareable HTML report
+ragpeek path/to/trace.json         # view a saved trace (from @trace(output=...) / serialize_trace)
+ragpeek                            # help
+```
+`ragpeek demo` retrieves over a small built-in corpus with real embeddings (needs the
+`semantic` extra) and answers via a local Ollama server if one is running. Running
+from a source checkout instead of an install? Prefix with `uv run`:
+```bash
+uv run ragpeek demo "How hot is Venus?"
+uv run ragpeek demo --html report.html              # also save an HTML report
+uv run ragpeek tests/fixtures/sample_session.json   # view a saved trace
+uv run ragpeek                                      # help
+```
+---
+## Instrument your pipeline
+Tracing your own pipeline is two imports and two log calls; ragpeek never
+monkey-patches your stack, so it works with any retriever and any model.
+```python
+from ragpeek import trace, log_retrieval, log_generation
+@trace
+def answer_question(query: str) -> str:
+    docs, scores = retriever.search(query, k=5)
+    log_retrieval(query=query, chunks=docs, scores=scores)
+    prompt = build_prompt(docs, query)
+    response = llm.generate(prompt)
+    log_generation(prompt=prompt, response=response, model="llama3.2")
+    return response
+```
+Call the function exactly as before; the trace prints automatically:
+```python
+answer_question("Which is the largest planet in the Solar System?")
+```
+Async pipelines work the same way; the active session follows your coroutines
+through every `await` (it rides a `contextvars.ContextVar`), so concurrent
+queries never cross-contaminate:
+```python
+@trace
+async def answer(query: str) -> str:
+    docs, scores = await retriever.asearch(query, k=5)
+    log_retrieval(query=query, chunks=docs, scores=scores)
+    prompt = build_prompt(docs, query)  # build once so the logged prompt is the one sent
+    response = await llm.acomplete(prompt)
+    log_generation(prompt=prompt, response=response, model="llama3.2")
+    return response
+```
+---
+## Configuration
+Pass a `TracerConfig` to tune thresholds, or flip decorator flags for common cases:
+```python
+from ragpeek import trace, TracerConfig
+config = TracerConfig(
+    score_gap_threshold=0.3,     # rank-1→rank-2 gap that reads as precision
+    semantic=True,               # embedding-based context analysis
+    show_prompt=False,           # hide the full prompt in terminal output
+    # min_score_threshold=0.6,   # opt-in absolute floor — only set once you've
+    #                            # calibrated a cutoff for your own embedder
+)
+@trace(config=config)
+def answer(query: str) -> str:
+    ...
+```
+```python
+@trace(semantic=False)              # skip the embedding model (faster, no download)
+@trace(output="report.html")        # save a shareable HTML report
+@trace(render=False)                # don't print — just populate session.analysis_report
+```
+With `render=False` the analyzers still run; grab the finalized session and hand it
+to downstream tooling with `serialize_trace(...)` (and `deserialize_trace(...)` to
+read it back, e.g. `ragpeek trace.json`).
+---
+## Works with any vector store
+`log_retrieval` takes similarity **scores** (higher = better). Most stores return
+those directly; some return distances, which you convert first.
+```python
+# ChromaDB (cosine space): distance ∈ [0, 2] → similarity = 1 - distance
+results = collection.query(query_texts=[query], n_results=5)
+log_retrieval(query=query,
+              chunks=results["documents"][0],
+              scores=[1.0 - d for d in results["distances"][0]])
+# FAISS IndexFlatL2 returns *squared* L2 distances; with normalized vectors,
+# similarity = 1 - d_squared / 2 (do not square again)
+distances, indices = index.search(query_embedding, k=5)
+log_retrieval(query=query,
+              chunks=[corpus[i] for i in indices[0]],
+              scores=[1.0 - d / 2 for d in distances[0].tolist()])
+# Qdrant (cosine): .score is already a similarity — use it as-is
+results = client.search("docs", query_vector=embedding, limit=5)
+log_retrieval(query=query,
+              chunks=[r.payload["text"] for r in results],
+              scores=[r.score for r in results])
+```
+> **Note on scores:** ragpeek assumes higher score = more relevant. There is
+> no single distance→similarity formula; convert per metric:
+>
+> | Store returns | Correct conversion |
+> |---|---|
+> | Cosine distance (∈ [0, 2]) | `score = 1.0 - distance` (exact) |
+> | L2 / Euclidean, normalized vectors | `score = 1.0 - distance ** 2 / 2` (exact) |
+> | L2 / Euclidean, un-normalized | `score = 1.0 / (1.0 + distance)` (monotonic squash) |
+> | Inner product / dot product | already a similarity; use as-is (negate if returned as a distance) |
+>
+> `score = 1.0 - distance` is **only** correct for cosine distance; using it on
+> raw L2 distances silently produces wrong (often negative) similarities.
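The conversions in the table can be written as small helpers. This is a sketch: the function names are illustrative and not part of ragpeek, and `distance` in the L2 helpers means the true Euclidean distance (if your index reports squared L2 distances, skip the squaring). The check at the end uses two unit vectors 60 degrees apart, for which ‖a − b‖² = 2 − 2·cos θ, so both exact conversions should recover the cosine similarity.

```python
import math


def from_cosine_distance(distance: float) -> float:
    """Cosine distance in [0, 2] -> cosine similarity in [-1, 1] (exact)."""
    return 1.0 - distance


def from_l2_normalized(distance: float) -> float:
    """True L2 distance between unit vectors -> cosine similarity (exact)."""
    return 1.0 - distance ** 2 / 2


def from_l2_unnormalized(distance: float) -> float:
    """L2 distance between arbitrary vectors -> (0, 1] monotonic squash."""
    return 1.0 / (1.0 + distance)


# Two unit vectors separated by 60 degrees.
a = (1.0, 0.0)
b = (math.cos(math.pi / 3), math.sin(math.pi / 3))

cosine_similarity = sum(x * y for x, y in zip(a, b))
l2_distance = math.dist(a, b)

recovered_from_cosine = from_cosine_distance(1.0 - cosine_similarity)
recovered_from_l2 = from_l2_normalized(l2_distance)
wrong_for_l2 = from_cosine_distance(l2_distance)  # the mistake the note warns about
```

Here `recovered_from_l2` matches `cosine_similarity`, while `wrong_for_l2` does not, which is why the conversion has to be chosen per metric.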
+Need a non-default retrieval→generation association? Keep the returned span objects
+and pair them explicitly:
+```python
+from ragpeek import trace, log_retrieval, log_generation, link_retrieval_to_generation
+@trace(render=False)
+def answer(query: str) -> str:
+    retrieval = log_retrieval(query=query, chunks=["chunk"], scores=[0.9])
+    response = llm.complete(query)
+    generation = log_generation(prompt=query, response=response, model="llama3.2")
+    link_retrieval_to_generation(retrieval, generation)
+    return response
+```
+---
+## What it surfaces
+These are **signals to calibrate**, not verdicts. Scores are read within each
+result set, so they don't assume an absolute scale; tune thresholds to your
+own embedder.
+| Signal | What it means |
+|---|---|
+| Within-set padding | Most chunks fall in the lower half of *this result's* score range (relative, not an absolute cutoff) |
+| Sharp rank-1 separation | The retriever cleanly separates the top match: a **precision** signal, not noise |
+| Flat distribution | Scores barely differ; the retriever can't discriminate (query too vague / chunks too broad) |
+| k mismatch | Retriever returned fewer chunks than requested |
+| Rank disagreement | The answer aligns with a chunk the retriever didn't rank first: a reranking signal |
+| Low context utilisation | The response is semantically dissimilar to every retrieved chunk |
+| Hedging language | Phrase-level signal that the model may be answering from training weights, not context |
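To make the within-set idea concrete, here is a rough sketch of how the retrieval rows of this table could be computed from a list of scores. It is an illustration, not ragpeek's analyzer: `retrieval_signals` and `flat_spread` are hypothetical, "most chunks" is read as "more than half", and `gap_threshold=0.3` simply mirrors the `score_gap_threshold` value shown in the configuration example.

```python
def retrieval_signals(scores, requested_k, gap_threshold=0.3, flat_spread=0.05):
    """Return within-set signal names for one result set (higher score = better)."""
    signals = []
    if len(scores) < requested_k:
        signals.append("k_mismatch")
    if len(scores) < 2:
        return signals  # nothing to compare within the set

    ranked = sorted(scores, reverse=True)
    top, bottom = ranked[0], ranked[-1]
    spread = top - bottom

    # Flat distribution: the scores barely differ (hypothetical cutoff).
    if spread < flat_spread:
        signals.append("flat_distribution")
        return signals

    # Padding: more than half the chunks fall below the midpoint of this
    # result's own score range, not below any absolute cutoff.
    midpoint = bottom + spread / 2
    in_lower_half = sum(1 for score in ranked if score < midpoint)
    if in_lower_half > len(ranked) / 2:
        signals.append("within_set_padding")

    # Sharp rank-1 separation: a large gap between rank 1 and rank 2.
    if ranked[0] - ranked[1] >= gap_threshold:
        signals.append("sharp_rank1_separation")
    return signals


venus = retrieval_signals([0.77, 0.39, 0.34, 0.31], requested_k=4)
vague = retrieval_signals([0.52, 0.51, 0.50], requested_k=5)
```

With the scores from the `ragpeek demo` example, the midpoint of the range is 0.54, so three of the four chunks count as padding and the 0.38 gap between ranks 1 and 2 counts as sharp separation.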
+---
+## How it works
+1. `@trace` wraps your function and opens a `TraceSession`.
+2. The session id lives in a `contextvars.ContextVar`, so it propagates through both
+   sync and async code without you threading anything through your call stack.
+3. `log_retrieval()` and `log_generation()` read that `ContextVar` and append spans
+   to the active session.
+4. When your function returns, three analyzers run over the collected spans:
+   - **Retrieval**: within-set score distribution, low-relevance padding, rank-1 precision, k mismatch.
+   - **Context**: chunk↔response similarity and the rank-disagreement (reranking) signal.
+   - **Generation**: hedging language and response-length anomalies.
+5. The terminal renderer prints the trace; the HTML renderer saves a shareable report.
+The embedding model runs entirely on your machine; your data never leaves it.
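The context step in the list above (chunk↔response similarity and rank disagreement) can be sketched without any model download. The code below is a simplified illustration under stated assumptions: it swaps the embedding model for a toy bag-of-words cosine similarity, and `context_signals` and the `low_utilisation` cutoff are hypothetical, not ragpeek's API.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def context_signals(chunks, response, low_utilisation=0.1):
    """Compare the response to each chunk; chunks are in retriever rank order."""
    response_vector = bag_of_words(response)
    similarities = [cosine(bag_of_words(chunk), response_vector) for chunk in chunks]
    signals = []
    if not similarities:
        return similarities, signals

    best = max(range(len(similarities)), key=similarities.__getitem__)
    if similarities[best] < low_utilisation:
        # The response resembles none of the retrieved chunks.
        signals.append("low_context_utilisation")
    elif best != 0:
        # The response aligns best with a chunk that was not ranked first.
        signals.append("rank_disagreement")
    return similarities, signals


chunks = [
    "Mercury is the smallest planet and the closest to the Sun.",
    "Venus is the hottest planet, with very high surface temperatures.",
]
sims, signals = context_signals(chunks, "Venus is the hottest planet.")
_, unused = context_signals(["Mars hosts Olympus Mons."], "I cannot answer that question")
```

Token overlap is a crude proxy for meaning, so a real analyzer would compare embeddings instead; the control flow (find the best-aligned chunk, then compare its rank and similarity) is the part this sketch is meant to show.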
+---
+## Limitations
+- **Explicit, not magic.** You call `log_retrieval` / `log_generation` yourself;
+  ragpeek doesn't patch framework internals. That's three lines of instrumentation
+  per pipeline, traded for working with any stack.
+- **Signals, not truth.** Retrieval signals are computed *within* each result set and
+  assume higher = better, but they can't know your embedder's absolute scale. Treat
+  every diagnosis as a prompt to calibrate, and convert distances to similarities
+  per metric (table above) before calling `log_retrieval`.
+---
+## Contributing
+Issues and PRs welcome. If a vector-store integration doesn't work or a diagnosis
+looks wrong, open an issue with a minimal reproduction.
+## License
+MIT