PyPI - traceredact - Versions diffs - 0.2.2__tar.gz → 0.2.3__tar.gz - Mend

traceredact 0.2.2tar.gz → 0.2.3tar.gz


Files changed (59)

{traceredact-0.2.2 → traceredact-0.2.3}/CHANGELOG.md RENAMED

@@ -6,6 +6,13 @@ All notable changes to this project are documented here. The format is based on
 ## [Unreleased]
+## [0.2.3] — 2026-06-07
+### Added
+- `examples/` — runnable, heavily-commented scenarios: drop-in `logging.Filter`,
+  FastAPI/ASGI body-logging middleware, streaming, redacting traces before a
+  DB/observability sink, custom policies, SDK wrappers, LangChain, CI gate.
 ## [0.2.2] — 2026-06-07
 ### Changed
@@ -80,7 +87,8 @@ Initial release.
 - Policy file (`traceredact.yml`): detector toggles, entropy thresholds,
   allowlist, custom patterns, placeholder template, optional HMAC correlation.
-[Unreleased]: https://github.com/traceredact/traceredact/compare/v0.2.2...HEAD
+[Unreleased]: https://github.com/traceredact/traceredact/compare/v0.2.3...HEAD
+[0.2.3]: https://github.com/traceredact/traceredact/compare/v0.2.2...v0.2.3
 [0.2.2]: https://github.com/traceredact/traceredact/compare/v0.2.1...v0.2.2
 [0.2.1]: https://github.com/traceredact/traceredact/compare/v0.2.0...v0.2.1
 [0.2.0]: https://github.com/traceredact/traceredact/compare/v0.1.3...v0.2.0

{traceredact-0.2.2 → traceredact-0.2.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: traceredact
-Version: 0.2.2
+Version: 0.2.3
 Summary: Redact PII and secrets from AI prompts, traces and tool-call arguments before they reach your loggers.
 Project-URL: Homepage, https://traceredact.com
 Project-URL: Documentation, https://traceredact.com
@@ -134,6 +134,15 @@ pydantic models, dataclasses and attrs instances are traversed automatically
 `Policy(decode_payloads=True)` base64-decodes blobs one layer and, if the decoded
 text contains a high-confidence secret, redacts the whole blob.
+## Examples
+Runnable, heavily-commented scenarios live in
+**[`examples/`](https://github.com/traceredact/traceredact/tree/main/examples)**:
+a drop-in [`logging.Filter`](https://github.com/traceredact/traceredact/blob/main/examples/04_logging_filter.py),
+[FastAPI/ASGI middleware](https://github.com/traceredact/traceredact/blob/main/examples/09_fastapi_middleware.py),
+[streaming](https://github.com/traceredact/traceredact/blob/main/examples/05_streaming.py),
+redacting traces before your DB/Langfuse, custom policies, and a CI gate.
 ## Policy file (`traceredact.yml`)
 Drop a `traceredact.yml` in your repo root (auto-discovered) or pass `--policy`:

{traceredact-0.2.2 → traceredact-0.2.3}/README.md RENAMED

@@ -97,6 +97,15 @@ pydantic models, dataclasses and attrs instances are traversed automatically
 `Policy(decode_payloads=True)` base64-decodes blobs one layer and, if the decoded
 text contains a high-confidence secret, redacts the whole blob.
+## Examples
+Runnable, heavily-commented scenarios live in
+**[`examples/`](https://github.com/traceredact/traceredact/tree/main/examples)**:
+a drop-in [`logging.Filter`](https://github.com/traceredact/traceredact/blob/main/examples/04_logging_filter.py),
+[FastAPI/ASGI middleware](https://github.com/traceredact/traceredact/blob/main/examples/09_fastapi_middleware.py),
+[streaming](https://github.com/traceredact/traceredact/blob/main/examples/05_streaming.py),
+redacting traces before your DB/Langfuse, custom policies, and a CI gate.
 ## Policy file (`traceredact.yml`)
 Drop a `traceredact.yml` in your repo root (auto-discovered) or pass `--policy`:

traceredact-0.2.3/examples/01_basics.py ADDED

@@ -0,0 +1,61 @@
+"""01 · Basics — redact a string and a nested payload, and inspect findings.
+Run it:
+    pip install traceredact
+    python examples/01_basics.py
+Everything here uses only the core API; no third-party SDKs required.
+"""
+from traceredact import redact
+# ---------------------------------------------------------------------------
+# 1) Redacting a free-form string.
+#
+# `redact()` returns a RedactionResult with three things you care about:
+#   - .value         the redacted copy (same shape as the input)
+#   - .findings      a list of what was found (detector id, category, json_path)
+#   - .has_findings  True if anything was redacted (handy for CI gates)
+# ---------------------------------------------------------------------------
+text = "email me at alice@acme.io, my key is sk-1234567890abcdefABCDEFGH"
+result = redact(text)
+print("redacted :", result.value)
+# -> email me at [REDACTED:pii], my key is [REDACTED:secret]
+print("has_findings:", result.has_findings)
+for f in result.findings:
+    # `preview` is a non-reversible masked sample, safe to log/print.
+    print(f"  - {f.detector_id:<22} conf={f.confidence:<4} preview={f.preview}")
+# ---------------------------------------------------------------------------
+# 2) Redacting a nested structure (e.g. an agent trace / tool-call payload).
+#
+# The engine walks dicts, lists, tuples and sets recursively. The INPUT IS
+# NEVER MUTATED — `result.value` is a redacted copy. Each finding carries a
+# `json_path` so you know exactly where the sensitive value lived.
+# ---------------------------------------------------------------------------
+trace = {
+    "user": {"email": "bob@example.com", "plan": "pro"},   # plan is untouched
+    "tool_call": {
+        "name": "charge_card",
+        "arguments": {"card": "4111 1111 1111 1111", "amount": 4200},
+    },
+    "config": {"openai_api_key": "sk-proj-Tt3kZ9qRsuVwXyZ012345abcd"},
+}
+result = redact(trace)
+import json  # noqa: E402  (imported here just to pretty-print the example)
+print("\nredacted trace:")
+print(json.dumps(result.value, indent=2))
+print("\nfindings by path:")
+for f in result.findings:
+    print(f"  {f.json_path:<40} -> {f.detector_id}")
+# Proof that the original object was not mutated:
+assert trace["user"]["email"] == "bob@example.com"
+print("\noriginal input untouched ✓")
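The nested-structure behaviour described in `01_basics.py` (walk containers recursively, return a redacted copy, record a path per finding, never mutate the input) can be sketched with the standard library alone. This is a minimal illustration and not traceredact's implementation: `walk`, the `EMAIL` regex, and the `items[0]` style for list indices are all assumptions made for the example.

```python
import re
from typing import Any

# Stand-in detector for illustration only; NOT traceredact's real email pattern.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def walk(value: Any, path: str, findings: list) -> Any:
    """Return a redacted copy of `value`, appending the path of each hit to `findings`."""
    if isinstance(value, dict):
        # Build a new dict so the caller's object is never modified.
        return {
            k: walk(v, f"{path}.{k}" if path else str(k), findings)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type; the index notation is an assumption.
        return type(value)(
            walk(v, f"{path}[{i}]", findings) for i, v in enumerate(value)
        )
    if isinstance(value, str):
        redacted, hits = EMAIL.subn("[REDACTED:pii]", value)
        if hits:
            findings.append(path)
        return redacted
    return value  # ints, bools, None, etc. pass through unchanged
```

Because every container is rebuilt rather than edited, the "input is never mutated" guarantee falls out of the structure of the walk and needs no defensive deep copy.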

traceredact-0.2.3/examples/02_policy.py ADDED

@@ -0,0 +1,72 @@
+"""02 · Policy — tune what gets redacted and how.
+A `Policy` is an explicit, inspectable config object. Pass it as the second
+argument to `redact()`. This example walks the knobs you'll actually use.
+    python examples/02_policy.py
+"""
+from traceredact import Policy, redact
+SAMPLE = "user a@b.com, ip 10.0.0.5, key sk-1234567890abcdefABCDEFGH"
+# ---------------------------------------------------------------------------
+# 1) Turn OFF detectors you don't care about (cuts false positives / noise).
+#    Detector ids look like "pii.ipv4", "secrets.openai_key", etc.
+# ---------------------------------------------------------------------------
+policy = Policy(disabled_detectors={"pii.ipv4"})
+print("ip kept:", redact(SAMPLE, policy).value)
+# -> the 10.0.0.5 stays; email + key still redacted
+# ...or run ONLY an explicit allow-set (everything else is skipped):
+only_secrets = Policy(enabled_detectors={"secrets.openai_key"})
+print("only key:", redact(SAMPLE, only_secrets).value)
+# ---------------------------------------------------------------------------
+# 2) Allowlisting — never redact known-safe values (e.g. docs/test fixtures).
+#    `allowlist` matches exact strings; `allow_patterns` matches regexes.
+# ---------------------------------------------------------------------------
+policy = Policy(allow_patterns=[r".*@example\.com"])
+print("allowed:", redact("real@gmail.com and demo@example.com", policy).value)
+# -> demo@example.com is kept, real@gmail.com is redacted
+# ---------------------------------------------------------------------------
+# 3) Custom placeholder template. `{category}` is substituted per finding.
+# ---------------------------------------------------------------------------
+policy = Policy(placeholder="«redacted {category}»")
+print("placeholder:", redact("a@b.com", policy).value)  # -> «redacted pii»
+# ---------------------------------------------------------------------------
+# 4) Organisation-specific detectors via `custom_patterns`.
+#    Regexes are length-bounded and rejected at load if they look ReDoS-prone.
+# ---------------------------------------------------------------------------
+policy = Policy(custom_patterns=[
+    {"id": "custom.employee_id", "category": "pii", "regex": r"EMP-\d{6}"},
+])
+print("custom:", redact("employee EMP-123456 logged in", policy).value)
+# ---------------------------------------------------------------------------
+# 5) Correlation hashing — replace a secret with a STABLE tag so you can join
+#    occurrences across traces WITHOUT storing the secret. Requires a key
+#    (fail-closed: omitting the key raises, so you never emit unkeyed hashes).
+# ---------------------------------------------------------------------------
+policy = Policy(
+    placeholder="[REDACTED:{category}:{hash}]",
+    hash_correlation=True,
+    hash_key="load-me-from-an-env-var",   # keep this out of source in real code
+)
+r1 = redact("key sk-1234567890abcdefABCDEFGH", policy)
+r2 = redact("again sk-1234567890abcdefABCDEFGH", policy)
+print("hash#1:", r1.value)
+print("hash#2:", r2.value)
+# Same secret -> same tag in both, so you can correlate without the plaintext.
+# ---------------------------------------------------------------------------
+# 6) Load a policy from a file (auto-discovered as ./traceredact.yml by the CLI).
+# ---------------------------------------------------------------------------
+# policy = Policy.load("traceredact.yml")

traceredact-0.2.3/examples/03_structured_args.py ADDED

@@ -0,0 +1,59 @@
+"""03 · Structured tool-call arguments — the case field-name filters miss.
+Agents pass nested JSON to tools, and secrets/PII end up deep inside under
+arbitrary keys. traceredact walks the structure and redacts by *value*, then
+tells you the exact `json_path` of every hit. It also traverses pydantic
+models, dataclasses and attrs instances.
+    python examples/03_structured_args.py
+"""
+from dataclasses import dataclass
+from traceredact import redact
+# ---------------------------------------------------------------------------
+# 1) A realistic tool-call payload. Note the secret sits under an innocuous
+#    key ("metadata.note") — a key-name denylist would sail right past it.
+# ---------------------------------------------------------------------------
+payload = {
+    "tool": "send_invoice",
+    "args": {
+        "to": "client@firm.co",                       # pii.email
+        "iban": "DE89 3704 0044 0532 0130 00",        # pii.iban (mod-97 checked)
+        "items": [{"sku": "A1", "note": "ok"}],        # untouched
+    },
+    "metadata": {"note": "their key is sk-1234567890abcdefABCDEFGH"},  # secret in prose
+}
+result = redact(payload)
+for f in result.findings:
+    print(f"{f.json_path:<28} {f.detector_id}")
+# args.to                      pii.email
+# args.iban                    pii.iban
+# metadata.note                secrets.openai_key
+# ---------------------------------------------------------------------------
+# 2) Sensitive KEY NAMES force-redact their value, even with no detectable
+#    signal in the value itself (e.g. a low-entropy password). This is the
+#    structured counterpart to scanning values.
+# ---------------------------------------------------------------------------
+result = redact({"password": "hunter2", "retries": 3})
+print("\nby key-name:", result.value)
+# -> {"password": "[REDACTED:secret]", "retries": 3}   (3 is left alone)
+# ---------------------------------------------------------------------------
+# 3) pydantic / dataclass / attrs instances are traversed automatically
+#    (redacted to a dict). Toggle with Policy(traverse_objects=False).
+# ---------------------------------------------------------------------------
+@dataclass
+class ToolCall:
+    user_email: str
+    api_key: str
+result = redact({"call": ToolCall("a@b.com", "ghp_abcdefghijklmnopqrstuvwxyz0123456789")})
+print("dataclass:", result.value)
+# -> {"call": {"user_email": "[REDACTED:pii]", "api_key": "[REDACTED:secret]"}}

traceredact-0.2.3/examples/04_logging_filter.py ADDED

@@ -0,0 +1,100 @@
+"""04 · Drop-in logging filter — redact every log record before it's emitted.
+This is the highest-leverage integration: attach one `logging.Filter` and every
+`logger.info(...)` across your app is redacted before it reaches any handler
+(stdout, JSON logs, Datadog, files...). No call-site changes.
+    python examples/04_logging_filter.py
+"""
+from __future__ import annotations
+import logging
+from typing import Any
+from traceredact import Policy, redact
+class RedactingFilter(logging.Filter):
+    """A logging.Filter that redacts a record's message, args and `extra` fields.
+    Key design points (these are the easy-to-get-wrong bits):
+    * We redact ``record.msg`` and ``record.args`` SEPARATELY, instead of calling
+      ``record.getMessage()``. This preserves logging's *deferred* %-formatting:
+      ``logger.info("token=%s", secret)`` keeps the secret in ``args`` until we
+      redact it, so it never gets formatted into the message in the clear.
+    * Redaction must NEVER raise — logging failures shouldn't crash your app — so
+      every redaction is wrapped and falls back to a marker.
+    * We deliberately skip ``exc_info``: redacting tracebacks is expensive and
+      fragile. If exception text can contain secrets, redact the rendered string
+      in a custom ``Formatter`` instead (see note at the bottom).
+    """
+    # LogRecord's own attributes — never treat these as user `extra=` data.
+    _RESERVED = frozenset(
+        "name msg args levelname levelno pathname filename module exc_info "
+        "exc_text stack_info lineno funcName created msecs relativeCreated "
+        "thread threadName processName process taskName".split()
+    )
+    def __init__(self, policy: Policy | None = None) -> None:
+        super().__init__()
+        self.policy = policy
+    def _redact(self, value: Any) -> Any:
+        try:
+            return redact(value, policy=self.policy).value
+        except Exception:  # logging must never break the caller
+            return "[REDACTION_ERROR]"
+    def filter(self, record: logging.LogRecord) -> bool:
+        record.msg = self._redact(record.msg)
+        if isinstance(record.args, tuple):
+            record.args = tuple(self._redact(a) for a in record.args)
+        elif isinstance(record.args, dict):  # %(name)s style
+            record.args = {k: self._redact(v) for k, v in record.args.items()}
+        elif record.args:
+            record.args = self._redact(record.args)
+        # Redact custom fields passed via logger.info(..., extra={...}).
+        for key, value in list(record.__dict__.items()):
+            if key in self._RESERVED or key.startswith("_"):
+                continue
+            record.__dict__[key] = self._redact(value)
+        return True  # keep the record (we redacted it, we don't drop it)
+if __name__ == "__main__":
+    import sys
+    # IMPORTANT: attach the filter to the HANDLER, not to a logger. A filter on a
+    # logger is only consulted for records logged *at that logger* — it is NOT
+    # applied to records propagated up from child loggers. A handler filter sees
+    # every record the handler emits, which is what you want for redaction.
+    handler = logging.StreamHandler(sys.stdout)
+    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
+    handler.addFilter(RedactingFilter())
+    root = logging.getLogger()
+    root.handlers.clear()
+    root.addHandler(handler)
+    root.setLevel(logging.INFO)
+    log = logging.getLogger("demo")
+    # Deferred %-formatting: the secret lives in args until the filter sees it.
+    log.info("calling %s with token=%s", "openai", "sk-1234567890abcdefABCDEFGH")
+    # dict args:
+    log.info("payload=%(p)s", {"p": {"email": "a@b.com", "ok": True}})
+    # structured extra= (avoid reserved LogRecord names like "args"/"msg"):
+    log.info("tool call", extra={"tool_args": {"card": "4111 1111 1111 1111"}})
+    # NOTE: an f-string formats BEFORE logging sees it, so the filter can't help:
+    #   log.info(f"token={secret}")   # <-- avoid; pass as an arg instead.
+    # For exceptions, redact the rendered text in a Formatter, e.g.:
+    #   class RedactingFormatter(logging.Formatter):
+    #       def format(self, record):
+    #           return redact(super().format(record)).value

traceredact-0.2.3/examples/05_streaming.py ADDED

@@ -0,0 +1,53 @@
+"""05 · Streaming — redact token streams without buffering the whole response.
+When you stream an LLM response, a secret like ``sk-...`` can be split across
+several token deltas. Scanning each delta on its own would leak it at the seam.
+``redact_stream`` keeps a small carry-over window so cross-chunk secrets are
+still caught.
+    python examples/05_streaming.py
+"""
+import asyncio
+from traceredact import StreamRedactor, redact_stream
+from traceredact.streaming import redact_stream_async
+# ---------------------------------------------------------------------------
+# 1) Sync: redact an iterable of text chunks. The secret below is deliberately
+#    split across two chunks to show the carry-over window working.
+# ---------------------------------------------------------------------------
+chunks = ["Here is the key sk-12345", "67890abcdefABCDEFGH — keep it safe"]
+out = "".join(redact_stream(chunks))
+print("streamed:", out)
+assert "sk-1234567890abcdefABCDEFGH" not in out  # caught across the boundary
+# ---------------------------------------------------------------------------
+# 2) Manual control with StreamRedactor: feed deltas, then flush at the end.
+#    Use ONE redactor per stream; never share carry between conversations.
+# ---------------------------------------------------------------------------
+r = StreamRedactor()
+emitted = []
+for delta in ["my email is al", "ice@acme.io done"]:
+    emitted.append(r.feed(delta))   # returns redacted text safe to emit now
+emitted.append(r.flush())           # flush the remaining carry at end-of-stream
+print("manual  :", "".join(emitted))
+# ---------------------------------------------------------------------------
+# 3) Async (e.g. an OpenAI async token stream — see 07_openai.py to plug it in).
+# ---------------------------------------------------------------------------
+async def main() -> None:
+    async def token_source():
+        for piece in ["secret ghp_abcdefghij", "klmnopqrstuvwxyz0123456789!"]:
+            yield piece
+    parts = [p async for p in redact_stream_async(token_source())]
+    print("async   :", "".join(parts))
+asyncio.run(main())
+# Tip: the carry-over window (default 512) must be >= the longest secret AND its
+# context. For huge PEM blocks streamed token-by-token, prefer one-shot redact().
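The carry-over window that `05_streaming.py` relies on can be sketched as follows. This is a simplified illustration, not `redact_stream`'s actual algorithm: the `SECRET` regex is a stand-in detector, `carry_over_redact` is a hypothetical name, and the sketch assumes the window is at least as long as the shortest string the pattern can match.

```python
import re
from typing import Iterable, Iterator

# Stand-in detector for illustration only; NOT traceredact's real pattern.
SECRET = re.compile(r"sk-[A-Za-z0-9]{20,}")
PLACEHOLDER = "[REDACTED:secret]"


def carry_over_redact(chunks: Iterable[str], window: int = 64) -> Iterator[str]:
    """Redact a chunked stream while holding back a tail so split secrets are caught."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Hold back the last `window` characters: they may be the start of a secret.
        cut = max(0, len(buf) - window)
        for match in SECRET.finditer(buf):
            if match.end() > cut:
                # The match reaches into the held-back tail and may still grow,
                # so keep the whole match in the carry for the next round.
                cut = min(cut, match.start())
                break
        if cut:
            yield SECRET.sub(PLACEHOLDER, buf[:cut])
            buf = buf[cut:]
    # End of stream: nothing more can arrive, so redact and emit the carry.
    yield SECRET.sub(PLACEHOLDER, buf)
```

The trade-off is latency versus safety: a larger window delays more text but tolerates longer secrets split across more chunks, which matches the tip above about preferring one-shot redaction for very long blocks.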

traceredact-0.2.3/examples/06_before_db_or_logger.py ADDED

@@ -0,0 +1,47 @@
+"""06 · Redact agent traces before persisting them (DB / Langfuse / Datadog).
+The privacy-by-design pattern: redact in-process *before* the trace leaves you,
+so the sensitive data never lands in your store or observability vendor.
+    python examples/06_before_db_or_logger.py
+"""
+from traceredact import redact
+# A span you're about to insert into Postgres / ship to Langfuse / log as JSON.
+span = {
+    "trace_id": "t_8842",
+    "input": {"messages": [{"role": "user", "content": "charge card 4111111111111111"}]},
+    "tool_calls": [
+        {"name": "create_payment",
+         "args": {"card": "4111111111111111", "email": "a@b.com"}},
+    ],
+    "retrieved_context": ["doc#1: contact bob@corp.io for the AWS key AKIAIOSFODNN7EXAMPLE"],
+    "output": "Done. A receipt was sent.",
+}
+def store(_span: dict) -> None:
+    """Stand-in for your real sink: db.insert(...) / langfuse.trace(...) / log.info(...)."""
+# --- the one line that matters -------------------------------------------------
+result = redact(span)
+store(result.value)            # only the redacted copy is ever persisted
+# ------------------------------------------------------------------------------
+print(f"persisted with {len(result.findings)} value(s) redacted:")
+for f in result.findings:
+    print(f"  {f.json_path}  ({f.detector_id})")
+# Optional: emit a privacy metric / alert when secrets show up in traces.
+if any(f.category == "secret" for f in result.findings):
+    print("\n⚠️  secret detected in a trace — investigate the source.")
+# Belt-and-braces check you can assert in tests: no original survived.
+import json  # noqa: E402
+blob = json.dumps(result.value)
+for leaked in ("4111111111111111", "a@b.com", "AKIAIOSFODNN7EXAMPLE", "bob@corp.io"):
+    assert leaked not in blob
+print("\nzero originals present in the stored payload ✓")

traceredact-0.2.3/examples/07_openai.py ADDED

@@ -0,0 +1,43 @@
+"""07 · OpenAI wrapper — redact prompts going out and content coming back.
+`wrap_openai(client)` patches `chat.completions.create` so:
+  * outbound `messages` are redacted before the request leaves your process, and
+  * the returned assistant `content` is redacted before it reaches your logs.
+Requires the OpenAI SDK:  pip install "traceredact[openai]"
+This file is illustrative — it needs real credentials to actually call the API.
+"""
+from traceredact import Policy
+from traceredact.integrations.openai import wrap_openai
+# --- sync ---------------------------------------------------------------------
+# from openai import OpenAI
+# client = wrap_openai(OpenAI(), policy=Policy())   # policy is optional
+#
+# resp = client.chat.completions.create(
+#     model="gpt-4o-mini",
+#     messages=[{"role": "user", "content": "my email is a@b.com, summarise it"}],
+# )
+# The model receives the REDACTED prompt, and resp.choices[0].message.content
+# is redacted before you log it.
+# --- async --------------------------------------------------------------------
+# from openai import AsyncOpenAI
+# from traceredact.integrations.openai import wrap_async_openai
+# client = wrap_async_openai(AsyncOpenAI())
+# resp = await client.chat.completions.create(model="gpt-4o-mini", messages=[...])
+# --- streaming ----------------------------------------------------------------
+# For stream=True, wrap the returned async stream to get redacted text pieces
+# (carry-over across token deltas):
+#
+# from traceredact.integrations.openai import redact_content_stream
+# stream = await AsyncOpenAI().chat.completions.create(model="gpt-4o-mini",
+#                                                      messages=[...], stream=True)
+# async for safe_text in redact_content_stream(stream):
+#     log.info(safe_text)        # only redacted text is ever logged
+_ = (wrap_openai, Policy)  # keep the imports referenced for a clean `python -c`
+print("See the comments — this example needs the OpenAI SDK + credentials to run.")

traceredact-0.2.3/examples/08_langchain.py ADDED

@@ -0,0 +1,28 @@
+"""08 · LangChain — redact prompts flowing through the callback system.
+Attach `RedactingCallbackHandler` and the prompts your chains/agents send are
+redacted, with findings collected for inspection. Degrades gracefully if
+`langchain-core` isn't installed (so importing never hard-fails).
+Requires:  pip install "traceredact[langchain]"
+"""
+from traceredact.integrations.langchain import RedactingCallbackHandler
+handler = RedactingCallbackHandler()
+# Pass it to any LangChain call:
+#   llm.invoke("...", config={"callbacks": [handler]})
+#   chain.invoke({...}, config={"callbacks": [handler]})
+#
+# After a run, the redacted prompts and findings are available on the handler.
+# --- standalone demo (no LangChain needed) ------------------------------------
+handler.on_llm_start({}, ["my OpenAI key is sk-1234567890abcdefABCDEFGH"])
+print("redacted prompt :", handler.redacted_prompts[0])
+print("findings        :", [f.detector_id for f in handler.findings])
+# redacted prompt : my OpenAI key is [REDACTED:secret]
+# findings        : ['secrets.openai_key']
+# You can also redact arbitrary text yourself via the handler:
+print("ad-hoc          :", handler.redact_text("ping a@b.com"))

traceredact-0.2.3/examples/09_fastapi_middleware.py ADDED

@@ -0,0 +1,102 @@
+"""09 · FastAPI / ASGI middleware — log redacted request & response bodies.
+This middleware logs a REDACTED COPY of JSON request/response bodies. It does
+NOT modify what the app receives or what the client gets back — it only sanitises
+what you log. Pure-ASGI so it works with FastAPI, Starlette, etc.
+Requires Starlette (bundled with FastAPI):  pip install "traceredact" starlette
+Pitfalls handled below:
+  * only JSON content-types are parsed (skip multipart/files/SSE/protobuf/gzip)
+  * bodies are bounded (`max_body_bytes`) so a huge upload can't blow memory
+  * original ASGI messages are forwarded UNCHANGED — never break the response
+"""
+from __future__ import annotations
+import json
+import logging
+from typing import Any
+from traceredact import Policy, redact
+# These types come from Starlette; the import is guarded so the file is importable
+# even without it installed (the class is illustrative).
+try:
+    from starlette.types import ASGIApp, Message, Receive, Scope, Send
+except Exception:  # pragma: no cover
+    ASGIApp = Message = Receive = Scope = Send = Any  # type: ignore
+class RedactedBodyLoggingMiddleware:
+    def __init__(self, app: ASGIApp, logger: logging.Logger,
+                 policy: Policy | None = None, max_body_bytes: int = 1_000_000) -> None:
+        self.app = app
+        self.logger = logger
+        self.policy = policy
+        self.max_body_bytes = max_body_bytes
+    def _is_json(self, headers: list[tuple[bytes, bytes]]) -> bool:
+        for k, v in headers:
+            if k.lower() == b"content-type":
+                return b"application/json" in v.lower() or b"+json" in v.lower()
+        return False
+    def _redacted_json(self, body: bytes) -> Any:
+        if not body or len(body) > self.max_body_bytes:
+            return {"_skipped": True, "bytes": len(body)}
+        try:
+            parsed = json.loads(body)
+        except (ValueError, UnicodeDecodeError):
+            return {"_unparsed": True}
+        return redact(parsed, policy=self.policy).value  # redact the parsed body
+    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
+        if scope["type"] != "http":
+            await self.app(scope, receive, send)   # pass through websockets etc.
+            return
+        req_chunks: list[bytes] = []
+        res_chunks: list[bytes] = []
+        res_status = 0
+        res_headers: list[tuple[bytes, bytes]] = []
+        async def wrapped_receive() -> Message:
+            message = await receive()
+            if message["type"] == "http.request":
+                if sum(map(len, req_chunks)) <= self.max_body_bytes:
+                    req_chunks.append(message.get("body", b""))
+            return message  # forward the ORIGINAL message to the app
+        async def wrapped_send(message: Message) -> None:
+            nonlocal res_status, res_headers
+            if message["type"] == "http.response.start":
+                res_status = message["status"]
+                res_headers = list(message.get("headers", []))
+            elif message["type"] == "http.response.body":
+                if sum(map(len, res_chunks)) <= self.max_body_bytes:
+                    res_chunks.append(message.get("body", b""))
+                if not message.get("more_body", False):
+                    self.logger.info("http_exchange", extra={
+                        "method": scope.get("method"),
+                        "path": scope.get("path"),
+                        "status": res_status,
+                        "request_body": self._redacted_json(b"".join(req_chunks))
+                            if self._is_json(list(scope.get("headers", []))) else None,
+                        "response_body": self._redacted_json(b"".join(res_chunks))
+                            if self._is_json(res_headers) else None,
+                    })
+            await send(message)  # forward the ORIGINAL message to the client
+        await self.app(scope, wrapped_receive, wrapped_send)
+# Usage:
+#   app.add_middleware(RedactedBodyLoggingMiddleware,
+#                      logger=logging.getLogger("http"), policy=Policy())
+#
+# For true token-by-token streaming responses, log via redact_stream/
+# redact_content_stream instead (this middleware logs after the body completes).
+if __name__ == "__main__":
+    print("Illustrative ASGI middleware — wire it into a FastAPI/Starlette app.")

traceredact-0.2.3/examples/10_ci_gate.md ADDED

@@ -0,0 +1,51 @@
+# 10 · CI gate — fail the build if secrets/PII appear
+`traceredact scan` exits **non-zero** when it finds anything, so you can gate a
+job on it. Point it at fixtures, prompt files, exported traces, log dumps — any
+file or directory.
+## Locally
+```bash
+traceredact scan ./tests/fixtures/        # pretty table; exit 1 if findings
+traceredact scan trace.json --format json # machine-readable for tooling
+traceredact scan . --policy traceredact.yml
+```
+## GitHub Actions
+```yaml
+name: redaction-check
+on: [pull_request]
+jobs:
+  scan:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - run: pipx install traceredact
+      # Fails the PR if any secret/PII is committed under these paths.
+      - run: traceredact scan ./fixtures ./examples --format json
+```
+## pre-commit hook
+```yaml
+# .pre-commit-config.yaml
+repos:
+  - repo: local
+    hooks:
+      - id: traceredact
+        name: traceredact scan
+        entry: traceredact scan
+        language: system
+        pass_filenames: true
+```
+## Write a redacted copy of a file (CI artifact sanitisation)
+```bash
+traceredact redact prod-trace.json --output prod-trace.redacted.json
+```
+Use `traceredact.yml` to tune detectors/allowlist so the gate matches your repo
+(see `../traceredact.yml` for a fully-commented policy).

traceredact-0.2.3/examples/README.md ADDED

@@ -0,0 +1,25 @@
+# Examples
+Runnable, heavily-commented examples for common scenarios. Install first:
+```bash
+pip install traceredact          # core
+pip install "traceredact[openai]" "traceredact[langchain]"   # for the SDK examples
+```
+Run any pure-Python example directly, e.g. `python examples/01_basics.py`.
+| # | File | Scenario |
+|---|------|----------|
+| 01 | [`01_basics.py`](01_basics.py) | Redact a string & a nested payload; inspect findings |
+| 02 | [`02_policy.py`](02_policy.py) | Tune detectors, allowlist, custom patterns, placeholder, correlation hashing |
+| 03 | [`03_structured_args.py`](03_structured_args.py) | Tool-call args by JSON path; sensitive keys; pydantic/dataclass |
+| 04 | [`04_logging_filter.py`](04_logging_filter.py) | **Drop-in `logging.Filter`** — redact every log record, no call-site changes |
+| 05 | [`05_streaming.py`](05_streaming.py) | Redact streamed token deltas (secrets that span chunks) |
+| 06 | [`06_before_db_or_logger.py`](06_before_db_or_logger.py) | Redact agent traces before DB / Langfuse / Datadog |
+| 07 | [`07_openai.py`](07_openai.py) | OpenAI wrapper (sync / async / streaming) |
+| 08 | [`08_langchain.py`](08_langchain.py) | LangChain `RedactingCallbackHandler` |
+| 09 | [`09_fastapi_middleware.py`](09_fastapi_middleware.py) | ASGI/FastAPI middleware: log redacted request/response bodies |
+| 10 | [`10_ci_gate.md`](10_ci_gate.md) | `traceredact scan` as a CI gate / pre-commit hook |
+Pure-Python (no SDK needed): 01–06. Illustrative (need an SDK/framework): 07–09.

{traceredact-0.2.2 → traceredact-0.2.3}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "traceredact"
-version = "0.2.2"
+version = "0.2.3"
 description = "Redact PII and secrets from AI prompts, traces and tool-call arguments before they reach your loggers."
 readme = "README.md"
 requires-python = ">=3.11"