RubyGems - diogenes - Versions diffs - 0.1.0 → 0.1.1 - Mend

diogenes 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

checksums.yaml +4 -4
data/.release-please-config.json +16 -0
data/.release-please-manifest.json +3 -0
data/CHANGELOG.md +7 -0
data/CLAUDE.md +138 -0
data/README.md +192 -19
data/docs/context.md +60 -0
data/docs/contributing.md +228 -0
data/docs/dashboard.md +365 -0
data/docs/examples.md +162 -0
data/docs/framework.md +146 -0
data/lib/diogenes/version.rb +1 -1
data/mise.lock +48 -0
data/mise.toml +6 -0
metadata +42 -17

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 5c657d944e41834630bc23193840875d8d28204089b2f0358759759df125137f
-  data.tar.gz: 5775eabc0ad2f0e56aa9d62c8bcc8654aaf1272d8c49d8d4c6d28c738947eb3d
+  metadata.gz: ea9a9823e0e6c2f671b528ed7815059180e9a6991798aa3e43c5bf48b185f7bb
+  data.tar.gz: acfc44625d69616c0b7282aa8b052ae8fc580b28dac6e570167779a979ddf14c
 SHA512:
-  metadata.gz: 7c89894cb19f74497b2f3c6b75fbbef0617563eaff3f0e66731a4940630ecba431417b2dff2d608b4a2fd00c1e4047b2036b93a09bb34d6c5584752ba3072acb
-  data.tar.gz: beb0b006c1f9c9201c09661d4f9e58add17ea473e6d4b635707e80b5ba4b2aefe1b7c064cbc615ebdb6b8d1124e4ac3dde4bf7537572dce472d4d512c9e38a9d
+  metadata.gz: 790dd1c3aa427d6c1f650a0dc8cd095b632386c11b7739cb1448a2f12f88fda359cbd5bdd2da2d5d361bc340ef1cce04c63cdc1f992b36c251eea3af19562503
+  data.tar.gz: 4f3f8ce2210e8ad92c49701490c60091780e733ac96f2c2b2d73d421eec1194414e7043e253fa1f942a1a57c485b7c21963804f64f67fcea2ee7d04f3eaa4657

data/.release-please-config.json ADDED

@@ -0,0 +1,16 @@
+{
+    "release-type": "ruby",
+    "packages": {
+        ".": {
+            "release-type": "ruby",
+            "package-name": "diogenes",
+            "version-file": "lib/diogenes/version.rb",
+            "changelog-path": "CHANGELOG.md",
+            "bump-minor-pre-major": true,
+            "bump-patch-for-minor-pre-major": true,
+            "draft": false,
+            "prerelease": false
+        }
+    },
+    "$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json"
+}

data/.release-please-manifest.json ADDED

@@ -0,0 +1,3 @@
+{
+    ".": "0.1.1"
+}

data/CHANGELOG.md CHANGED

@@ -1,5 +1,12 @@
 ## [Unreleased]
+## [0.1.1](https://github.com/meaganewaller/diogenes/compare/diogenes-v0.1.0...diogenes/v0.1.1) (2026-06-27)
+### Features
+* initial commit ([19265fa](https://github.com/meaganewaller/diogenes/commit/19265fa2e33901b799e0cb30c84bd23f476db7de))
 ## [0.1.0] - 2026-06-24
 - Initial release

data/CLAUDE.md ADDED

@@ -0,0 +1,138 @@
+# Diogenes — Claude Context
+This file provides context for working in the Diogenes codebase. Read it before making changes.
+---
+## What This Gem Does
+Diogenes is a responsible AI accountability layer for Rails applications. It has five distinct responsibilities:
+1. **Gate enforcement** — declaring and validating constraints on AI features at boot
+2. **Grounding verification** — actively checking that AI output is supported by retrieved context
+3. **Drift detection** — tracking when indexed documents go stale relative to their sources
+4. **Eval running** — testing AI features against golden pairs on a schedule, alerting on regression
+5. **Dashboard** — surfacing all of the above in a mounted Rails engine
+It is not a RAG toolkit, not an LLM client wrapper, and not a general observability tool. Every feature should serve one of those five responsibilities.
+---
+## Architecture
+```
+lib/
+  diogenes/
+    feature.rb              # Main module; DSL entry point for gate declarations
+    gates/
+      base.rb               # Abstract gate interface
+      failure_mode.rb
+      user_calibration.rb
+      human_in_loop.rb
+      observability.rb
+      necessity.rb
+      compatibility.rb      # Cross-gate incompatibility matrix
+    grounding/
+      verifier.rb           # Second-pass LLM grounding check
+      result.rb             # Structured grounding result (supported/unsupported/contradicted)
+      prompt.rb             # The verifier prompt; versioned separately
+    drift/
+      tracker.rb            # Source update ingestion and staleness calculation
+      staleness_score.rb    # Scoring logic; separate from storage
+      reindex_job.rb        # Base job class; host app subclasses with embedding logic
+    evals/
+      runner.rb             # Executes eval suites against live features
+      suite.rb              # DSL for defining golden pairs
+      matchers.rb           # grounded_in, contains, semantically_similar_to, etc.
+      result.rb             # Per-run result with pass/fail per eval
+      regression.rb         # Regression detection and diffing
+    audit/
+      log.rb                # Audit record creation and querying
+      record.rb             # Plain Ruby struct; no ActiveRecord dependency
+    review/
+      queue.rb              # Human review queue logic; framework-agnostic
+    engine.rb               # Rails engine; only loaded when Rails is present
+    configuration.rb        # Diogenes.configure block
+    errors.rb               # UnsafeFeatureError and friends
+```
+### Engine (Dashboard)
+```
+lib/diogenes/engine/
+  app/
+    controllers/diogenes/
+      dashboard_controller.rb
+      dashboard/
+        grounding_controller.rb
+        drift_controller.rb
+        evals_controller.rb
+    views/diogenes/dashboard/
+      overview.html.erb
+      grounding/
+      drift/
+      evals/
+    assets/
+      diogenes.css          # Minimal; no framework dependency
+  config/
+    routes.rb
+  db/
+    migrate/                # Audit log, drift tracking, eval results tables
+```
+---
+## Key Decisions and Why
+**Gates are boot-time, not runtime.** Misconfiguration should be caught before anything reaches production. Runtime gate checks sound useful but create a false sense of safety — a feature that fails a gate at runtime has already been partially executed.
+**No LLM dependency in core.** The grounding verifier accepts any callable. This avoids version conflicts and keeps Diogenes focused on accountability rather than API clients.
+**Grounding verification has a versioned prompt.** `grounding/prompt.rb` is versioned separately from the verifier logic. When the prompt changes, old audit records remain interpretable because they store the prompt version used.
+**The necessity gate is intentionally soft.** It requires documentation but enforces nothing programmatically. Some constraints are better enforced by code review and culture. The necessity gate creates a paper trail.
+**Audit records are structs, not ActiveRecord.** The core audit layer has no database dependency. The engine provides an ActiveRecord-backed persistence layer, but the host app can substitute any store that accepts the struct.
+**Drift detection is passive by default.** Diogenes doesn't poll source documents or crawl file systems. The host app informs Diogenes when sources change via `Diogenes::Drift.source_updated`. Active polling is opt-in and configured via `config.drift.check_interval`.
+**Eval golden pairs live in code, not in a database.** They are version-controlled, reviewed in PRs, and treated as tests. Storing them in the database would make them invisible to code review and prone to silent modification.
+**The dashboard has no authentication.** Authentication is the host app's concern. The engine documents patterns for common setups but enforces nothing.
+---
+## Conventions
+- Gates inherit from `Diogenes::Gates::Base` and implement `#valid?` and `#failure_message`
+- Failure messages always include the feature class name, the gate name, and a plain-English instruction for what to change — not what went wrong
+- Errors raised by Diogenes are never rescued internally — they propagate to the host app
+- Audit records never store raw query or response content — only hashes and structured metadata
+- All public-facing configuration goes through `Diogenes.configure`; no environment variable reading inside lib/
+- The engine and all ActiveRecord code lives under `lib/diogenes/engine/` and is never required by the core
+---
+## Testing Approach
+- Gate logic is unit tested in isolation without Rails
+- The grounding verifier has a stub: `Diogenes::Grounding::Verifier.stub(result: :pass)`
+- Drift tracker tests use fixed timestamps; never `Time.now` directly
+- Eval matchers are tested against fixture responses, not live LLM calls
+- The engine is tested with a minimal Rails app in `spec/dummy/`
+- `Diogenes::TestMode` disables gate enforcement while still recording gate declarations — use in specs that test the host app's AI features, not Diogenes itself
+Never silence a Diogenes error in a test. If a gate is raising, fix the configuration.
+---
+## What Not to Build Here
+- LLM client integrations for any specific provider
+- A RAG pipeline or embedding implementation
+- General observability tooling unrelated to AI feature health
+- Anything that makes a gate failure non-fatal
+- Database-backed golden pairs for the eval runner
+- Authentication for the dashboard
+If a feature request requires Diogenes to know about a specific LLM provider, vendor API, or embedding model, it belongs in the host app.
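
The gate convention described in CLAUDE.md above (gates implement `#valid?` and `#failure_message`, and a failed gate raises rather than degrading) can be illustrated with a minimal sketch. It is not part of the 0.1.1 package contents: the `GateSketch` module, the `validate!` and `gate_name` helpers, and the message wording are assumptions made for illustration, and the gem's real `Diogenes::Gates::Base` may look quite different.

```ruby
# Hypothetical sketch; names mirror the conventions in CLAUDE.md but are
# not taken from the gem's source.
module GateSketch
  class UnsafeFeatureError < StandardError; end

  # Abstract gate: subclasses implement #valid? and #failure_message.
  class Base
    attr_reader :feature_name, :options

    def initialize(feature_name, **options)
      @feature_name = feature_name
      @options = options
    end

    # Short gate name derived from the class name, e.g. "HumanInLoop".
    def gate_name
      self.class.name.split("::").last
    end

    def valid?
      raise NotImplementedError, "#{self.class} must implement #valid?"
    end

    def failure_message
      raise NotImplementedError, "#{self.class} must implement #failure_message"
    end

    # Raises instead of returning false, so a failed gate is never soft.
    def validate!
      raise UnsafeFeatureError, failure_message unless valid?

      true
    end
  end

  # Example gate (assumed rule): passes only when a reviewer is confirmed.
  class HumanInLoop < Base
    def valid?
      options[:verified] == true
    end

    # Names the feature and the gate, then says what to change.
    def failure_message
      "#{feature_name} / #{gate_name}: set `verified: true` once a human " \
        "reviewer is confirmed for this feature."
    end
  end
end
```

Calling `GateSketch::HumanInLoop.new("SupportAssistant", verified: false).validate!` raises `GateSketch::UnsafeFeatureError`. Because nothing in the sketch rescues that error, it propagates to the caller, which matches the stated convention that Diogenes errors are never rescued internally.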

data/README.md CHANGED

@@ -1,43 +1,216 @@
 # Diogenes
-TODO: Delete this and the text below, and describe your gem
+> "I am looking for an honest man." — Diogenes of Sinope
-Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/diogenes`. To experiment with that code, run `bin/console` for an interactive prompt.
+Diogenes is a Ruby gem that helps engineering teams make and defend decisions about when AI belongs in a feature — and when it doesn't.
-## Installation
+It encodes a responsible AI decision framework directly into your Rails application as executable, auditable gates, and then actively monitors your AI features in production through grounding verification, document drift detection, and a regression-aware eval runner.
+A mounted dashboard surfaces all of this in one place. Think Sidekiq Web for AI accountability.
+---
+## The Problem
+Most teams make two kinds of mistakes with AI features:
+**Mistake one:** Deciding to build them based on excitement or pressure rather than defensible criteria. When something goes wrong, nobody can explain why the decision was made or what safeguards were in place.
+**Mistake two:** Shipping them and assuming they continue to work. AI features degrade silently — documents go stale, retrieval quality drifts, models change. Traditional monitoring misses all of it because wrong-but-fluent outputs don't raise exceptions.
+Diogenes addresses both.
-TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
+---
-Install the gem and add to the application's Gemfile by executing:
+## Installation
+```ruby
+gem 'diogenes'
+```
 ```bash
-bundle add UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+bundle install
+rails generate diogenes:install
+bundle exec rake db:migrate
+```
+---
+## What Diogenes Does
+### 1. The Decision Framework (Gates)
+Before an AI feature can serve output to a user, it must pass a set of declared gates. Gates are validated at boot — misconfiguration fails loudly before anything reaches production.
+```ruby
+class SupportAssistant
+  include Diogenes::Feature
+  gate :failure_mode,      severity: :recoverable
+  gate :user_calibration,  audience: :trained_agent
+  gate :human_in_loop,     verified: true, max_daily_reviews: 80
+  gate :observability,     logging: :full, alerting: :enabled
+  gate :necessity,         alternatives_considered: true
+  def answer(query, agent:)
+    # your implementation
+  end
+end
+```
+A feature that cannot satisfy a gate raises `Diogenes::UnsafeFeatureError` at boot with a plain-English explanation of what needs to change.
+**The five gates:** `:failure_mode`, `:user_calibration`, `:human_in_loop`, `:observability`, `:necessity`. See [docs/framework.md](docs/framework.md) for full documentation.
+---
+### 2. Grounding Verification
+For RAG pipelines, Diogenes ships a grounding verifier that runs a second LLM pass to check that AI output is actually supported by retrieved context — not confabulated.
+```ruby
+class SupportAssistant
+  include Diogenes::Feature
+  include Diogenes::Grounding
+  verify_grounding threshold: 0.8, on_failure: :flag_for_review
+  def answer(query, agent:)
+    context = retriever.retrieve(query)
+    response = llm.complete(query, context: context)
+    verify_and_return(response, context: context, reviewed_by: agent)
+  end
+end
+```
+The verifier returns a structured verdict — which claims are supported, unsupported, or contradicted by the retrieved context — and acts on it according to your configuration. Flag rates and verdicts are tracked in the audit log and surfaced in the dashboard.
+Configure any LLM callable as the verifier backend — Diogenes has no opinion on which one:
+```ruby
+Diogenes.configure do |config|
+  config.grounding.verifier_llm = -> (prompt) { Anthropic::Client.new.complete(prompt) }
+end
+```
+---
+### 3. Drift Detection
+Documents get indexed once and go stale. Policies change, prices change, features change. Diogenes tracks when source documents were last updated versus when their embeddings were created, surfaces a staleness score, and can trigger re-indexing automatically.
+```ruby
+# Inform Diogenes that a source document has changed
+Diogenes::Drift.source_updated(
+  document_id: 'refund-policy-v2',
+  updated_at: Time.current,
+  diff_size: :major
+)
+```
+```ruby
+# config/initializers/diogenes.rb
+Diogenes.configure do |config|
+  config.drift.reindex_job = ReindexDocumentJob
+  config.drift.staleness_thresholds = { warning: 7.days, critical: 30.days }
+  config.drift.alert_webhook = ENV['DIOGENES_ALERT_WEBHOOK']
+end
 ```
-If bundler is not being used to manage dependencies, install the gem by executing:
+Stale documents surface in the dashboard drift tab, ranked by severity. Re-indexing queues your job with one click or via a Rake task.
+---
+### 4. Eval Runner
+The hardest unsolved problem in production AI is knowing whether your feature is getting better or worse over time. Diogenes ships a lightweight eval framework: define golden question/answer pairs, run them on a schedule, track pass rates over time, and alert on regression.
+```ruby
+# test/diogenes/evals/support_assistant_evals.rb
+Diogenes::Evals.define(SupportAssistant) do
+  eval "basic refund question" do
+    query    "How do I request a refund?"
+    expects  all_of(
+      grounded_in("refund-policy"),
+      contains("billing page"),
+      does_not_contain("24 hours")
+    )
+  end
+  eval "question with no good answer" do
+    query    "What is the API rate limit for legacy v1 endpoints?"
+    expects  one_of(
+      low_confidence_response,
+      routes_to_human_review
+    )
+  end
+end
+```
 ```bash
-gem install UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+bundle exec rake diogenes:evals:run[SupportAssistant]
+```
+When a passing eval starts failing, Diogenes records the regression point and diffs the last passing response against the first failing one. In most cases it can correlate the regression directly to a stale document in the drift tracker.
+---
+### 5. The Dashboard
+Mount the Diogenes engine to get a live view of all of the above in one place:
+```ruby
+# config/routes.rb
+authenticate :user, ->(u) { u.admin? } do
+  mount Diogenes::Engine => '/diogenes'
+end
 ```
-## Usage
+The overview tab shows one row per gated feature — gates declared, grounding flag rate, drift score, and eval pass rate. A feature that is passing all its gates but has 11 stale documents and a declining eval pass rate is visible before it becomes a production incident.
-TODO: Write usage instructions here
+See [docs/dashboard.md](docs/dashboard.md) for the full dashboard documentation including route structure, controller layout, and configuration reference.
-## Development
+---
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+## The Audit Trail
-To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+Every AI call made through a Diogenes-gated feature produces an audit record:
-## Contributing
+```ruby
+Diogenes::AuditLog.for_feature(SupportAssistant)
+# => [
+#   {
+#     feature:         "SupportAssistant",
+#     gate_config:     { failure_mode: :recoverable, ... },
+#     query_hash:      "sha256:...",
+#     context_sources: ["refund-policy.md", "enterprise-terms.md"],
+#     grounding:       { supported: [...], unsupported: [], contradicted: [] },
+#     verified_by:     "agent@company.com",
+#     timestamp:       2024-01-15 14:23:01 UTC
+#   }
+# ]
+```
-Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/diogenes. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/diogenes/blob/main/CODE_OF_CONDUCT.md).
+Audit records store hashes, not raw content — PII never enters the audit log directly. The host app controls content storage and retention.
-## License
+---
-The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+## Philosophy
-## Code of Conduct
+Diogenes takes no position on whether AI is good or bad for your product. It takes one strong position: **that decision should be made deliberately, defensibly, and with receipts.**
+A feature that passes all five gates and fails in production is a recoverable engineering problem. A feature that never asked the questions is a different kind of problem entirely.
+See [docs/framework.md](docs/framework.md) for the full decision framework and [docs/examples.md](docs/examples.md) for two worked examples — one that passes, one that doesn't.
+---
+## Contributing
+See [docs/contributing.md](docs/contributing.md).
+## License
-Everyone interacting in the Diogenes project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/diogenes/blob/main/CODE_OF_CONDUCT.md).
+MIT
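
The drift section of the README above compares when a source document last changed with when its embedding was created, then applies `warning` and `critical` thresholds. The diff does not show the scoring logic, so the following is only a sketch under stated assumptions: the `staleness_level` function, its return symbols, and the choice to measure staleness from the source's update time are all hypothetical. Thresholds are written in seconds so the example runs without ActiveSupport.

```ruby
# Hypothetical sketch: thresholds mirror the README's
# `{ warning: 7.days, critical: 30.days }`, expressed in plain seconds.
DAY = 24 * 60 * 60
THRESHOLDS = { warning: 7 * DAY, critical: 30 * DAY }.freeze

# Returns :fresh when the embedding is at least as new as the source.
# Otherwise returns a level based on how long the source has been
# newer than the embedding, measured up to `now`.
def staleness_level(source_updated_at:, embedded_at:, now:, thresholds: THRESHOLDS)
  return :fresh if embedded_at >= source_updated_at

  stale_for = now - source_updated_at # Time - Time yields seconds as a Float
  return :critical if stale_for >= thresholds[:critical]
  return :warning  if stale_for >= thresholds[:warning]

  :stale
end

# Usage with fixed timestamps, never Time.now, as CLAUDE.md recommends.
now      = Time.utc(2026, 3, 1)
embedded = Time.utc(2026, 1, 1)
level = staleness_level(
  source_updated_at: Time.utc(2026, 2, 10),
  embedded_at: embedded,
  now: now
)
puts level
```

Passing `now` in explicitly keeps the function deterministic, which is one way to honour the testing note that drift tests should use fixed timestamps.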

data/docs/context.md ADDED

@@ -0,0 +1,60 @@
+# Supplementary Context for Contributors
+This file extends CLAUDE.md with additional context for common contribution scenarios. Read CLAUDE.md first.
+---
+## Adding a New Gate
+Gates are the core of Diogenes. Adding one requires:
+1. A class in `lib/diogenes/gates/` inheriting `Diogenes::Gates::Base`
+2. Implementation of `#valid?` and `#failure_message`
+3. Registration in `Diogenes::Feature`
+4. Documentation in `docs/framework.md`
+5. An entry in the gate compatibility matrix (see below)
+Gates should be conservative by default. If configuration is ambiguous, fail loudly rather than pass silently.
+### Gate Compatibility Matrix
+Some gate configurations are incompatible with each other. The matrix lives in `lib/diogenes/gates/compatibility.rb` and is checked after all individual gates are validated.
+Current incompatibilities:
+- `:failure_mode severity: :financial_dispute` + `:human_in_loop verified: false` → always raises
+- `:failure_mode severity: :safety_risk` + `:user_calibration audience: :general_consumer` → always raises
+- `:user_calibration audience: :general_consumer` + `:human_in_loop verified: false` → raises unless `:failure_mode severity: :cosmetic`
+When adding a new gate, consider whether it creates new incompatibilities and add them to the matrix.
+---
+## Working on the Audit Log
+The audit log has two layers:
+**Core layer** (`lib/diogenes/audit/`) — plain Ruby, no ActiveRecord. Produces `Diogenes::Audit::Record` structs. This must remain framework-agnostic.
+**Rails layer** (`lib/diogenes/engine.rb`) — ActiveRecord-backed persistence, only loaded when Rails is detected. Writes records from the core layer to the database.
+If you're adding fields to audit records, add them to the struct first, then to the Rails migration. Never add Rails-specific code to the core layer.
+---
+## Working on the Review Engine
+The review engine is a Rails engine mounted at a configurable path. It has no opinions about authentication — that's the host app's job. Document this clearly in any UI-related PRs.
+The queue logic in `lib/diogenes/review/queue.rb` is tested without Rails. The engine views and controllers are tested with a minimal Rails app in `spec/dummy/`.
+---
+## Common Mistakes
+**Making gates runtime checks.** Gates are boot-time. If you find yourself writing code that evaluates a gate during a request, reconsider the design.
+**Adding LLM-specific code.** Diogenes doesn't know what LLM you're using. If a PR requires a specific client gem, it belongs in the user's codebase.
+**Soft failures.** Diogenes raises when gates fail. It does not return nil, log a warning, or degrade gracefully. A feature that fails a gate should not serve AI output under any circumstances.
+**Storing content in audit records.** Audit records store metadata and hashes, not raw content. Raw content may contain PII. The host app is responsible for its own content storage and retention.
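
The three incompatibilities listed under "Gate Compatibility Matrix" in docs/context.md above can be expressed as a small rule table that is checked after the individual gates pass. This diff does not include the contents of `lib/diogenes/gates/compatibility.rb`, so the sketch below is an assumption: the `CompatibilitySketch` module, the `IncompatibleGatesError` class, the `check!` method, and the hash shape used for gate options are all hypothetical.

```ruby
# Hypothetical sketch of the three documented incompatibilities.
# `gates` maps gate name => options, for example:
#   { failure_mode: { severity: :cosmetic }, human_in_loop: { verified: true } }
module CompatibilitySketch
  class IncompatibleGatesError < StandardError; end

  # Each rule pairs a description with a predicate that returns true
  # when the combination is NOT allowed.
  RULES = [
    [
      "failure_mode severity: :financial_dispute requires human_in_loop verified: true",
      lambda do |g|
        g.dig(:failure_mode, :severity) == :financial_dispute &&
          g.dig(:human_in_loop, :verified) == false
      end
    ],
    [
      "failure_mode severity: :safety_risk cannot target audience: :general_consumer",
      lambda do |g|
        g.dig(:failure_mode, :severity) == :safety_risk &&
          g.dig(:user_calibration, :audience) == :general_consumer
      end
    ],
    [
      "audience: :general_consumer without human review is only allowed for severity: :cosmetic",
      lambda do |g|
        g.dig(:user_calibration, :audience) == :general_consumer &&
          g.dig(:human_in_loop, :verified) == false &&
          g.dig(:failure_mode, :severity) != :cosmetic
      end
    ]
  ].freeze

  # Raises with every violated rule listed; returns true when none apply.
  def self.check!(gates)
    violations = RULES.select { |_, violated| violated.call(gates) }.map(&:first)
    raise IncompatibleGatesError, violations.join("; ") unless violations.empty?

    true
  end
end
```

Keeping the rules as data rather than nested conditionals means a new gate's incompatibilities can be added as one more entry, which fits the instruction to extend the matrix whenever a gate is added.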