diogenes 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 5c657d944e41834630bc23193840875d8d28204089b2f0358759759df125137f
-   data.tar.gz: 5775eabc0ad2f0e56aa9d62c8bcc8654aaf1272d8c49d8d4c6d28c738947eb3d
+   metadata.gz: ea9a9823e0e6c2f671b528ed7815059180e9a6991798aa3e43c5bf48b185f7bb
+   data.tar.gz: acfc44625d69616c0b7282aa8b052ae8fc580b28dac6e570167779a979ddf14c
  SHA512:
-   metadata.gz: 7c89894cb19f74497b2f3c6b75fbbef0617563eaff3f0e66731a4940630ecba431417b2dff2d608b4a2fd00c1e4047b2036b93a09bb34d6c5584752ba3072acb
-   data.tar.gz: beb0b006c1f9c9201c09661d4f9e58add17ea473e6d4b635707e80b5ba4b2aefe1b7c064cbc615ebdb6b8d1124e4ac3dde4bf7537572dce472d4d512c9e38a9d
+   metadata.gz: 790dd1c3aa427d6c1f650a0dc8cd095b632386c11b7739cb1448a2f12f88fda359cbd5bdd2da2d5d361bc340ef1cce04c63cdc1f992b36c251eea3af19562503
+   data.tar.gz: 4f3f8ce2210e8ad92c49701490c60091780e733ac96f2c2b2d73d421eec1194414e7043e253fa1f942a1a57c485b7c21963804f64f67fcea2ee7d04f3eaa4657
@@ -0,0 +1,16 @@
+ {
+   "release-type": "ruby",
+   "packages": {
+     ".": {
+       "release-type": "ruby",
+       "package-name": "diogenes",
+       "version-file": "lib/diogenes/version.rb",
+       "changelog-path": "CHANGELOG.md",
+       "bump-minor-pre-major": true,
+       "bump-patch-for-minor-pre-major": true,
+       "draft": false,
+       "prerelease": false
+     }
+   },
+   "$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json"
+ }
@@ -0,0 +1,3 @@
+ {
+   ".": "0.1.1"
+ }
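Both `bump-minor-pre-major` and `bump-patch-for-minor-pre-major` are `true`, so release-please lowers each bump by one level while the version is below 1.0.0. A breaking change bumps the minor version, and a feature bumps only the patch version. That rule explains why the `initial commit` entry filed under Features in the CHANGELOG hunk below became 0.1.1 rather than 0.2.0.

The Ruby model below covers only that rule. It is a simplification, not release-please's implementation, and `next_version` and the `:breaking`/`:feat`/`:fix` symbols are illustrative names.

```ruby
# Simplified model of release-please's pre-1.0 bump flags (not its real code).
# commit_types: symbols such as :breaking, :feat, :fix for the pending commits.
def next_version(current, commit_types, bump_minor_pre_major: false,
                 bump_patch_for_minor_pre_major: false)
  major, minor, patch = current.split(".").map(&:to_i)
  pre_major = major.zero?

  level =
    if commit_types.include?(:breaking)
      :major
    elsif commit_types.include?(:feat)
      :minor
    elsif commit_types.include?(:fix)
      :patch
    end
  return current if level.nil? # nothing releasable

  # Below 1.0.0 each flag lowers one kind of bump by a single level.
  case level
  when :major
    level = :minor if pre_major && bump_minor_pre_major
  when :minor
    level = :patch if pre_major && bump_patch_for_minor_pre_major
  end

  case level
  when :major then "#{major + 1}.0.0"
  when :minor then "#{major}.#{minor + 1}.0"
  else "#{major}.#{minor}.#{patch + 1}"
  end
end

flags = { bump_minor_pre_major: true, bump_patch_for_minor_pre_major: true }
puts next_version("0.1.0", [:feat], **flags)     # 0.1.1
puts next_version("0.1.0", [:breaking], **flags) # 0.2.0
puts next_version("0.1.0", [:feat])              # 0.2.0 (flags off)
```

Once the version reaches 1.0.0, both flags stop having any effect and bumps follow plain semver.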
data/CHANGELOG.md CHANGED
@@ -1,5 +1,12 @@
  ## [Unreleased]
 
+ ## [0.1.1](https://github.com/meaganewaller/diogenes/compare/diogenes-v0.1.0...diogenes/v0.1.1) (2026-06-27)
+
+
+ ### Features
+
+ * initial commit ([19265fa](https://github.com/meaganewaller/diogenes/commit/19265fa2e33901b799e0cb30c84bd23f476db7de))
+
  ## [0.1.0] - 2026-06-24
 
  - Initial release
data/CLAUDE.md ADDED
@@ -0,0 +1,138 @@
+ # Diogenes — Claude Context
+
+ This file provides context for working in the Diogenes codebase. Read it before making changes.
+
+ ---
+
+ ## What This Gem Does
+
+ Diogenes is a responsible AI accountability layer for Rails applications. It has five distinct responsibilities:
+
+ 1. **Gate enforcement** — declaring and validating constraints on AI features at boot
+ 2. **Grounding verification** — actively checking that AI output is supported by retrieved context
+ 3. **Drift detection** — tracking when indexed documents go stale relative to their sources
+ 4. **Eval running** — testing AI features against golden pairs on a schedule, alerting on regression
+ 5. **Dashboard** — surfacing all of the above in a mounted Rails engine
+
+ It is not a RAG toolkit, not an LLM client wrapper, and not a general observability tool. Every feature should serve one of those five responsibilities.
+
+ ---
+
+ ## Architecture
+
+ ```
+ lib/
+   diogenes/
+     feature.rb            # Main module; DSL entry point for gate declarations
+     gates/
+       base.rb             # Abstract gate interface
+       failure_mode.rb
+       user_calibration.rb
+       human_in_loop.rb
+       observability.rb
+       necessity.rb
+       compatibility.rb    # Cross-gate incompatibility matrix
+     grounding/
+       verifier.rb         # Second-pass LLM grounding check
+       result.rb           # Structured grounding result (supported/unsupported/contradicted)
+       prompt.rb           # The verifier prompt; versioned separately
+     drift/
+       tracker.rb          # Source update ingestion and staleness calculation
+       staleness_score.rb  # Scoring logic; separate from storage
+       reindex_job.rb      # Base job class; host app subclasses with embedding logic
+     evals/
+       runner.rb           # Executes eval suites against live features
+       suite.rb            # DSL for defining golden pairs
+       matchers.rb         # grounded_in, contains, semantically_similar_to, etc.
+       result.rb           # Per-run result with pass/fail per eval
+       regression.rb       # Regression detection and diffing
+     audit/
+       log.rb              # Audit record creation and querying
+       record.rb           # Plain Ruby struct; no ActiveRecord dependency
+     review/
+       queue.rb            # Human review queue logic; framework-agnostic
+     engine.rb             # Rails engine; only loaded when Rails is present
+     configuration.rb      # Diogenes.configure block
+     errors.rb             # UnsafeFeatureError and friends
+ ```
+
+ ### Engine (Dashboard)
+
+ ```
+ lib/diogenes/engine/
+   app/
+     controllers/diogenes/
+       dashboard_controller.rb
+       dashboard/
+         grounding_controller.rb
+         drift_controller.rb
+         evals_controller.rb
+     views/diogenes/dashboard/
+       overview.html.erb
+       grounding/
+       drift/
+       evals/
+     assets/
+       diogenes.css        # Minimal; no framework dependency
+   config/
+     routes.rb
+   db/
+     migrate/              # Audit log, drift tracking, eval results tables
+ ```
+
+ ---
+
+ ## Key Decisions and Why
+
+ **Gates are boot-time, not runtime.** Misconfiguration should be caught before anything reaches production. Runtime gate checks sound useful but create a false sense of safety — a feature that fails a gate at runtime has already been partially executed.
+
+ **No LLM dependency in core.** The grounding verifier accepts any callable. This avoids version conflicts and keeps Diogenes focused on accountability rather than API clients.
+
+ **Grounding verification has a versioned prompt.** `grounding/prompt.rb` is versioned separately from the verifier logic. When the prompt changes, old audit records remain interpretable because they store the prompt version used.
+
+ **The necessity gate is intentionally soft.** It requires documentation but enforces nothing programmatically. Some constraints are better enforced by code review and culture. The necessity gate creates a paper trail.
+
+ **Audit records are structs, not ActiveRecord.** The core audit layer has no database dependency. The engine provides an ActiveRecord-backed persistence layer, but the host app can substitute any store that accepts the struct.
+
+ **Drift detection is passive by default.** Diogenes doesn't poll source documents or crawl file systems. The host app informs Diogenes when sources change via `Diogenes::Drift.source_updated`. Active polling is opt-in and configured via `config.drift.check_interval`.
+
+ **Eval golden pairs live in code, not in a database.** They are version-controlled, reviewed in PRs, and treated as tests. Storing them in the database would make them invisible to code review and prone to silent modification.
+
+ **The dashboard has no authentication.** Authentication is the host app's concern. The engine documents patterns for common setups but enforces nothing.
+
+ ---
+
+ ## Conventions
+
+ - Gates inherit from `Diogenes::Gates::Base` and implement `#valid?` and `#failure_message`
+ - Failure messages always include the feature class name, the gate name, and a plain-English instruction for what to change — not what went wrong
+ - Errors raised by Diogenes are never rescued internally — they propagate to the host app
+ - Audit records never store raw query or response content — only hashes and structured metadata
+ - All public-facing configuration goes through `Diogenes.configure`; no environment variable reading inside lib/
+ - The engine and all ActiveRecord code live under `lib/diogenes/engine/` and are never required by the core
+
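The gate conventions above can be illustrated with a small plain-Ruby sketch. It is hypothetical: `GateSketch`, its `HumanInLoop` rule, and the `GATES` registry are illustrative names, not Diogenes' classes. The sketch shows three things:

- A gate class that answers `#valid?` and `#failure_message`.
- A failure message that names the feature and gate and says what to change.
- A class-level `gate` macro that validates while the class body runs. The error therefore surfaces at class-load time (boot, in an eager-loaded app) rather than during a request.

```ruby
# Hypothetical sketch of the gate conventions; GateSketch is not Diogenes' API.
module GateSketch
  class UnsafeFeatureError < StandardError; end

  module Gates
    # Abstract interface: every gate answers #valid? and #failure_message.
    class Base
      attr_reader :feature, :options

      def initialize(feature, **options)
        @feature = feature
        @options = options
      end

      def valid?
        raise NotImplementedError, "#{self.class} must implement #valid?"
      end

      def failure_message
        raise NotImplementedError, "#{self.class} must implement #failure_message"
      end
    end

    # Assumed rule for illustration: reviewers must be verified and a positive
    # daily review budget declared. Missing options fail (conservative default).
    class HumanInLoop < Base
      def valid?
        budget = options[:max_daily_reviews]
        options[:verified] == true && budget.is_a?(Integer) && budget.positive?
      end

      # Names the feature and the gate, then says what to change.
      def failure_message
        "#{feature.name} (gate :human_in_loop): set verified: true and " \
          "declare a positive integer max_daily_reviews."
      end
    end
  end

  GATES = { human_in_loop: Gates::HumanInLoop }.freeze

  module Feature
    def self.included(base)
      base.extend(ClassMethods)
    end

    module ClassMethods
      def gates
        @gates ||= []
      end

      # Runs while the class body is evaluated, so a bad declaration fails
      # when the class is loaded, never mid-request.
      def gate(gate_name, **options)
        gate_class = GATES.fetch(gate_name) do
          raise UnsafeFeatureError,
                "#{name} (gate #{gate_name.inspect}): use one of #{GATES.keys.inspect}."
        end
        instance = gate_class.new(self, **options)
        raise UnsafeFeatureError, instance.failure_message unless instance.valid?

        gates << instance
      end
    end
  end
end

class SupportAssistantSketch
  include GateSketch::Feature

  gate :human_in_loop, verified: true, max_daily_reviews: 80
end
```

Declaring `gate :human_in_loop, verified: false` in a class body would raise `GateSketch::UnsafeFeatureError` as soon as that class loads.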
+ ---
+
+ ## Testing Approach
+
+ - Gate logic is unit tested in isolation without Rails
+ - The grounding verifier has a stub: `Diogenes::Grounding::Verifier.stub(result: :pass)`
+ - Drift tracker tests use fixed timestamps; never `Time.now` directly
+ - Eval matchers are tested against fixture responses, not live LLM calls
+ - The engine is tested with a minimal Rails app in `spec/dummy/`
+ - `Diogenes::TestMode` disables gate enforcement while still recording gate declarations — use in specs that test the host app's AI features, not Diogenes itself
+
+ Never silence a Diogenes error in a test. If a gate is raising, fix the configuration.
+
+ ---
+
+ ## What Not to Build Here
+
+ - LLM client integrations for any specific provider
+ - A RAG pipeline or embedding implementation
+ - General observability tooling unrelated to AI feature health
+ - Anything that makes a gate failure non-fatal
+ - Database-backed golden pairs for the eval runner
+ - Authentication for the dashboard
+
+ If a feature request requires Diogenes to know about a specific LLM provider, vendor API, or embedding model, it belongs in the host app.
data/README.md CHANGED
@@ -1,43 +1,216 @@
  # Diogenes
 
- TODO: Delete this and the text below, and describe your gem
+ > "I am looking for an honest man." (Diogenes of Sinope)
 
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/diogenes`. To experiment with that code, run `bin/console` for an interactive prompt.
+ Diogenes is a Ruby gem that helps engineering teams make and defend decisions about when AI belongs in a feature and when it doesn't.
 
- ## Installation
+ It encodes a responsible AI decision framework directly into your Rails application as executable, auditable gates, and then actively monitors your AI features in production through grounding verification, document drift detection, and a regression-aware eval runner.
+
+ A mounted dashboard surfaces all of this in one place. Think Sidekiq Web for AI accountability.
+
+ ---
+
+ ## The Problem
+
+ Most teams make two kinds of mistakes with AI features:
+
+ **Mistake one:** Deciding to build them based on excitement or pressure rather than defensible criteria. When something goes wrong, nobody can explain why the decision was made or what safeguards were in place.
+
+ **Mistake two:** Shipping them and assuming they continue to work. AI features degrade silently — documents go stale, retrieval quality drifts, models change. Traditional monitoring misses all of it because wrong-but-fluent outputs don't raise exceptions.
+
+ Diogenes addresses both.
 
- TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
+ ---
 
- Install the gem and add to the application's Gemfile by executing:
+ ## Installation
+
+ ```ruby
+ gem 'diogenes'
+ ```
 
  ```bash
- bundle add UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+ bundle install
+ rails generate diogenes:install
+ bundle exec rake db:migrate
+ ```
+
+ ---
+
+ ## What Diogenes Does
+
+ ### 1. The Decision Framework (Gates)
+
+ Before an AI feature can serve output to a user, it must pass a set of declared gates. Gates are validated at boot — misconfiguration fails loudly before anything reaches production.
+
+ ```ruby
+ class SupportAssistant
+   include Diogenes::Feature
+
+   gate :failure_mode, severity: :recoverable
+   gate :user_calibration, audience: :trained_agent
+   gate :human_in_loop, verified: true, max_daily_reviews: 80
+   gate :observability, logging: :full, alerting: :enabled
+   gate :necessity, alternatives_considered: true
+
+   def answer(query, agent:)
+     # your implementation
+   end
+ end
+ ```
+
+ A feature that cannot satisfy a gate raises `Diogenes::UnsafeFeatureError` at boot with a plain-English explanation of what needs to change.
+
+ **The five gates:** `:failure_mode`, `:user_calibration`, `:human_in_loop`, `:observability`, `:necessity`. See [docs/framework.md](docs/framework.md) for full documentation.
+
+ ---
+
+ ### 2. Grounding Verification
+
+ For RAG pipelines, Diogenes ships a grounding verifier that runs a second LLM pass to check that AI output is actually supported by retrieved context — not confabulated.
+
+ ```ruby
+ class SupportAssistant
+   include Diogenes::Feature
+   include Diogenes::Grounding
+
+   verify_grounding threshold: 0.8, on_failure: :flag_for_review
+
+   def answer(query, agent:)
+     context = retriever.retrieve(query)
+     response = llm.complete(query, context: context)
+
+     verify_and_return(response, context: context, reviewed_by: agent)
+   end
+ end
+ ```
+
+ The verifier returns a structured verdict — which claims are supported, unsupported, or contradicted by the retrieved context — and acts on it according to your configuration. Flag rates and verdicts are tracked in the audit log and surfaced in the dashboard.
+
+ Configure any LLM callable as the verifier backend — Diogenes has no opinion on which one:
+
+ ```ruby
+ Diogenes.configure do |config|
+   config.grounding.verifier_llm = -> (prompt) { Anthropic::Client.new.complete(prompt) }
+ end
+ ```
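Because the verifier backend is just a callable, the grounding check can be modeled without any LLM client. The sketch below is hypothetical and makes these assumptions:

- The callable returns a list of labeled claims (`:supported`, `:unsupported`, `:contradicted`).
- The response is scored as the fraction of supported claims.
- The response is flagged when that fraction falls below `threshold` or when any claim is contradicted.

The scoring rule, the `GroundingSketch`/`GroundingVerdict` names, and the verdict format are not Diogenes' API.

```ruby
# Hypothetical grounding check; names and verdict format are assumptions.
GroundingVerdict = Struct.new(:supported, :unsupported, :contradicted, keyword_init: true) do
  # Fraction of claims the context supports; 0.0 when no claims were found.
  def score
    total = supported.size + unsupported.size + contradicted.size
    total.zero? ? 0.0 : supported.size.fdiv(total)
  end
end

class GroundingSketch
  def initialize(verifier_llm:, threshold:)
    @verifier_llm = verifier_llm # any object responding to #call(prompt)
    @threshold = threshold
  end

  # Returns [:pass or :flag_for_review, GroundingVerdict].
  def verify(response, context:)
    prompt = "Context:\n#{context.join("\n")}\n\nResponse:\n#{response}\n\n" \
             "Label each claim in the response as supported, unsupported, or contradicted."
    # Assumed callable output: [{ claim: "...", label: :supported }, ...]
    labeled = @verifier_llm.call(prompt).group_by { |c| c[:label] }
    claims_for = ->(label) { labeled.fetch(label, []).map { |c| c[:claim] } }
    verdict = GroundingVerdict.new(
      supported: claims_for.call(:supported),
      unsupported: claims_for.call(:unsupported),
      contradicted: claims_for.call(:contradicted)
    )
    passed = verdict.contradicted.empty? && verdict.score >= @threshold
    [passed ? :pass : :flag_for_review, verdict]
  end
end

# A stub stands in for a real LLM client.
fake_llm = lambda do |_prompt|
  [
    { claim: "Refunds are requested from the billing page", label: :supported },
    { claim: "Refunds arrive within 24 hours", label: :unsupported }
  ]
end
verifier = GroundingSketch.new(verifier_llm: fake_llm, threshold: 0.8)
action, verdict = verifier.verify("Use the billing page; refunds arrive within 24 hours.",
                                  context: ["Refunds are requested from the billing page."])
puts action # flag_for_review (score 0.5 is below 0.8)
```

Treating any contradicted claim as a failure regardless of score is a deliberately conservative choice in this sketch. The text above does not say how `threshold` and contradictions interact.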
+
+ ---
+
+ ### 3. Drift Detection
+
+ Documents get indexed once and go stale. Policies change, prices change, features change. Diogenes tracks when source documents were last updated versus when their embeddings were created, surfaces a staleness score, and can trigger re-indexing automatically.
+
+ ```ruby
+ # Inform Diogenes that a source document has changed
+ Diogenes::Drift.source_updated(
+   document_id: 'refund-policy-v2',
+   updated_at: Time.current,
+   diff_size: :major
+ )
+ ```
+
+ ```ruby
+ # config/initializers/diogenes.rb
+ Diogenes.configure do |config|
+   config.drift.reindex_job = ReindexDocumentJob
+   config.drift.staleness_thresholds = { warning: 7.days, critical: 30.days }
+   config.drift.alert_webhook = ENV['DIOGENES_ALERT_WEBHOOK']
+ end
  ```
 
- If bundler is not being used to manage dependencies, install the gem by executing:
+ Stale documents surface in the dashboard drift tab, ranked by severity. Re-indexing queues your job with one click or via a Rake task.
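One simple reading of the staleness score: a document becomes stale once its source changed after its embedding was built. The time since that change is then bucketed against the configured thresholds. The plain-Ruby sketch below assumes that reading. Several details are illustrative rather than taken from the gem:

- `DriftSketch` and its `embedded` method.
- The `:stale` level below the warning threshold.
- Seconds in place of ActiveSupport durations.

`diff_size` is ignored. `now` is passed in explicitly so the logic can be checked with fixed timestamps.

```ruby
# Hypothetical staleness model; DriftSketch is not Diogenes' tracker.
DAY = 24 * 60 * 60 # seconds; stands in for ActiveSupport's 1.day

class DriftSketch
  Doc = Struct.new(:id, :source_updated_at, :embedded_at)

  def initialize(warning:, critical:)
    @warning = warning   # seconds
    @critical = critical # seconds
    @docs = {}
  end

  # The index reports when a document's embedding was (re)built.
  def embedded(id, at:)
    doc(id).embedded_at = at
  end

  # Passive detection: the host app reports source changes; nothing is polled.
  def source_updated(id, at:)
    doc(id).source_updated_at = at
  end

  # :fresh, or :stale / :warning / :critical by how long the index has lagged.
  def severity(id, now:)
    d = @docs.fetch(id)
    return :fresh if d.source_updated_at.nil?
    return :fresh if d.embedded_at && d.embedded_at >= d.source_updated_at

    lag = now - d.source_updated_at
    if lag >= @critical then :critical
    elsif lag >= @warning then :warning
    else :stale
    end
  end

  # Non-fresh documents, longest-lagging first.
  def ranked(now:)
    @docs.keys
         .map { |id| [id, severity(id, now: now)] }
         .reject { |_id, level| level == :fresh }
         .sort_by { |id, _level| @docs[id].source_updated_at }
  end

  private

  def doc(id)
    @docs[id] ||= Doc.new(id, nil, nil)
  end
end

t0 = Time.utc(2026, 6, 1)
drift = DriftSketch.new(warning: 7 * DAY, critical: 30 * DAY)
drift.embedded("refund-policy-v2", at: t0)
drift.embedded("enterprise-terms", at: t0)
drift.source_updated("refund-policy-v2", at: t0 + 2 * DAY)
p drift.severity("refund-policy-v2", now: t0 + 12 * DAY) # :warning (10 days behind)
p drift.ranked(now: t0 + 12 * DAY) # [["refund-policy-v2", :warning]]
```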
+
+ ---
+
+ ### 4. Eval Runner
+
+ The hardest unsolved problem in production AI is knowing whether your feature is getting better or worse over time. Diogenes ships a lightweight eval framework: define golden question/answer pairs, run them on a schedule, track pass rates over time, and alert on regression.
+
+ ```ruby
+ # test/diogenes/evals/support_assistant_evals.rb
+
+ Diogenes::Evals.define(SupportAssistant) do
+   eval "basic refund question" do
+     query "How do I request a refund?"
+     expects all_of(
+       grounded_in("refund-policy"),
+       contains("billing page"),
+       does_not_contain("24 hours")
+     )
+   end
+
+   eval "question with no good answer" do
+     query "What is the API rate limit for legacy v1 endpoints?"
+     expects one_of(
+       low_confidence_response,
+       routes_to_human_review
+     )
+   end
+ end
+ ```
 
  ```bash
- gem install UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+ bundle exec rake "diogenes:evals:run[SupportAssistant]"
+ ```
+
+ When a passing eval starts failing, Diogenes records the regression point and diffs the last passing response against the first failing one. In most cases it can correlate the regression directly to a stale document in the drift tracker.
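The matcher vocabulary in the eval DSL and the regression point can both be modeled with plain lambdas. In this hypothetical sketch, `contains`, `does_not_contain`, `all_of`, and `one_of` are illustrative stand-ins, with `one_of` read as "at least one matcher passes". `regression_point` picks the most recent pass-to-fail transition. `grounded_in` and the review-routing matchers are left out because they need retrieval and review data. Diffing the two responses and correlating with drift are not shown.

```ruby
# Hypothetical matcher and regression helpers, not Diogenes' matchers.rb/regression.rb.
module EvalSketch
  module_function

  # Each matcher is a lambda: response string -> true/false.
  def contains(text)
    ->(response) { response.include?(text) }
  end

  def does_not_contain(text)
    ->(response) { !response.include?(text) }
  end

  def all_of(*matchers)
    ->(response) { matchers.all? { |m| m.call(response) } }
  end

  # Read here as "at least one matcher passes".
  def one_of(*matchers)
    ->(response) { matchers.any? { |m| m.call(response) } }
  end

  # runs: chronological [run_at, response] pairs for one eval.
  # If the latest run fails, returns the last passing run and the failing run
  # right after it; returns nil if the eval is passing or never passed.
  def regression_point(runs, matcher)
    return nil if runs.empty? || matcher.call(runs.last[1])

    last_pass = runs.rindex { |_at, response| matcher.call(response) }
    return nil if last_pass.nil?

    { last_pass: runs[last_pass], first_fail: runs[last_pass + 1] }
  end
end

refund_check = EvalSketch.all_of(
  EvalSketch.contains("billing page"),
  EvalSketch.does_not_contain("24 hours")
)
runs = [
  ["2026-06-20", "Request a refund from the billing page."],
  ["2026-06-21", "Request a refund from the billing page."],
  ["2026-06-22", "Refunds are processed within 24 hours."]
]
point = EvalSketch.regression_point(runs, refund_check)
# point[:last_pass] is the 2026-06-21 run; point[:first_fail] is the 2026-06-22 run
```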
+
+ ---
+
+ ### 5. The Dashboard
+
+ Mount the Diogenes engine to get a live view of all of the above in one place:
+
+ ```ruby
+ # config/routes.rb
+ authenticate :user, ->(u) { u.admin? } do
+   mount Diogenes::Engine => '/diogenes'
+ end
  ```
 
- ## Usage
+ The overview tab shows one row per gated feature — gates declared, grounding flag rate, drift score, and eval pass rate. A feature that is passing all its gates but has 11 stale documents and a declining eval pass rate is visible before it becomes a production incident.
 
- TODO: Write usage instructions here
+ See [docs/dashboard.md](docs/dashboard.md) for the full dashboard documentation including route structure, controller layout, and configuration reference.
 
- ## Development
+ ---
 
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ ## The Audit Trail
 
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+ Every AI call made through a Diogenes-gated feature produces an audit record:
 
- ## Contributing
+ ```ruby
+ Diogenes::AuditLog.for_feature(SupportAssistant)
+ # => [
+ #   {
+ #     feature: "SupportAssistant",
+ #     gate_config: { failure_mode: :recoverable, ... },
+ #     query_hash: "sha256:...",
+ #     context_sources: ["refund-policy.md", "enterprise-terms.md"],
+ #     grounding: { supported: [...], unsupported: [], contradicted: [] },
+ #     verified_by: "agent@company.com",
+ #     timestamp: 2024-01-15 14:23:01 UTC
+ #   }
+ # ]
+ ```
 
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/diogenes. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/diogenes/blob/main/CODE_OF_CONDUCT.md).
+ Audit records store hashes, not raw content; PII never enters the audit log directly. The host app controls content storage and retention.
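A core record needs no ActiveRecord, so a plain `Struct` is enough to sketch one. The fields below mirror the audit output above, plus a `prompt_version` field, since records are said to store the grounding prompt version they were checked with. `AuditRecordSketch`, its `build` helper, and the `sha256:` prefix format are assumptions for illustration.

```ruby
require "digest"

# Hypothetical record shape; fields mirror the README output plus prompt_version.
AuditRecordSketch = Struct.new(
  :feature, :gate_config, :query_hash, :context_sources, :grounding,
  :prompt_version, :verified_by, :timestamp,
  keyword_init: true
) do
  # Keeps only a SHA-256 digest of the query; the raw text is not stored.
  def self.build(feature:, query:, **fields)
    new(feature: feature.to_s,
        query_hash: "sha256:#{Digest::SHA256.hexdigest(query)}",
        **fields)
  end
end

record = AuditRecordSketch.build(
  feature: "SupportAssistant",
  query: "How do I request a refund?",
  gate_config: { failure_mode: :recoverable },
  context_sources: ["refund-policy.md"],
  grounding: { supported: ["billing page"], unsupported: [], contradicted: [] },
  prompt_version: "v1",
  verified_by: "agent@company.com",
  timestamp: Time.utc(2024, 1, 15, 14, 23, 1)
)
puts record.query_hash # "sha256:" followed by 64 hex characters
```

A bare SHA-256 of a short, predictable query can be reversed by hashing candidate texts and comparing. A keyed digest such as `OpenSSL::HMAC.hexdigest("SHA256", key, query)` resists that; the text above does not say whether Diogenes keys its hashes.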
 
- ## License
+ ---
 
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+ ## Philosophy
 
- ## Code of Conduct
+ Diogenes takes no position on whether AI is good or bad for your product. It takes one strong position: **that decision should be made deliberately, defensibly, and with receipts.**
+
+ A feature that passes all five gates and fails in production is a recoverable engineering problem. A feature that never asked the questions is a different kind of problem entirely.
+
+ See [docs/framework.md](docs/framework.md) for the full decision framework and [docs/examples.md](docs/examples.md) for two worked examples — one that passes, one that doesn't.
+
+ ---
+
+ ## Contributing
+
+ See [docs/contributing.md](docs/contributing.md).
+
+ ## License
 
- Everyone interacting in the Diogenes project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/diogenes/blob/main/CODE_OF_CONDUCT.md).
+ MIT
data/docs/context.md ADDED
@@ -0,0 +1,60 @@
+ # Supplementary Context for Contributors
+
+ This file extends CLAUDE.md with additional context for common contribution scenarios. Read CLAUDE.md first.
+
+ ---
+
+ ## Adding a New Gate
+
+ Gates are the core of Diogenes. Adding one requires:
+
+ 1. A class in `lib/diogenes/gates/` inheriting from `Diogenes::Gates::Base`
+ 2. Implementation of `#valid?` and `#failure_message`
+ 3. Registration in `Diogenes::Feature`
+ 4. Documentation in `docs/framework.md`
+ 5. An entry in the gate compatibility matrix (see below)
+
+ Gates should be conservative by default. If configuration is ambiguous, fail loudly rather than pass silently.
+
+ ### Gate Compatibility Matrix
+
+ Some gate configurations are incompatible with each other. The matrix lives in `lib/diogenes/gates/compatibility.rb` and is checked after all individual gates are validated.
+
+ Current incompatibilities:
+ - `:failure_mode severity: :financial_dispute` + `:human_in_loop verified: false` → always raises
+ - `:failure_mode severity: :safety_risk` + `:user_calibration audience: :general_consumer` → always raises
+ - `:user_calibration audience: :general_consumer` + `:human_in_loop verified: false` → raises unless `:failure_mode severity: :cosmetic`
+
+ When adding a new gate, consider whether it creates new incompatibilities and add them to the matrix.
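The three incompatibilities listed above can be expressed as data plus a single check that runs after the individual gates validate. The pieces named in this sketch are hypothetical: the `COMPATIBILITY_RULES` table, the `check_compatibility` method, and the nested `config` hash shape. Each rule encodes one bullet from the list, including the `:cosmetic` exception on the third. One further assumption: a missing `verified` option counts as unverified, in line with the fail-loudly default.

```ruby
# Hypothetical encoding of the three listed rules; not Diogenes' compatibility.rb.
class IncompatibleGatesError < StandardError; end

# Assumed config shape: { gate_name => { option => value } }.
# A missing `verified` counts as unverified (conservative assumption).
COMPATIBILITY_RULES = [
  {
    applies: ->(c) {
      c.dig(:failure_mode, :severity) == :financial_dispute &&
        c.dig(:human_in_loop, :verified) != true
    },
    fix: "financial-dispute features need human_in_loop verified: true"
  },
  {
    applies: ->(c) {
      c.dig(:failure_mode, :severity) == :safety_risk &&
        c.dig(:user_calibration, :audience) == :general_consumer
    },
    fix: "safety-risk features must target a narrower user_calibration audience"
  },
  {
    applies: ->(c) {
      c.dig(:user_calibration, :audience) == :general_consumer &&
        c.dig(:human_in_loop, :verified) != true &&
        c.dig(:failure_mode, :severity) != :cosmetic
    },
    fix: "set human_in_loop verified: true, or limit failure_mode to severity: :cosmetic"
  }
].freeze

# Runs after every individual gate has validated; raises on the first conflict.
def check_compatibility(feature_name, config)
  COMPATIBILITY_RULES.each do |rule|
    next unless rule[:applies].call(config)

    raise IncompatibleGatesError, "#{feature_name} (gate compatibility): #{rule[:fix]}"
  end
  true
end

check_compatibility("SupportAssistant", {
  failure_mode: { severity: :recoverable },
  user_calibration: { audience: :trained_agent },
  human_in_loop: { verified: true }
})
```

Keeping rules as data means a new incompatibility is one more table entry rather than another branch in the checking code.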
+
+ ---
+
+ ## Working on the Audit Log
+
+ The audit log has two layers:
+
+ **Core layer** (`lib/diogenes/audit/`) — plain Ruby, no ActiveRecord. Produces `Diogenes::Audit::Record` structs. This must remain framework-agnostic.
+
+ **Rails layer** (`lib/diogenes/engine.rb`) — ActiveRecord-backed persistence, only loaded when Rails is detected. Writes records from the core layer to the database.
+
+ If you're adding fields to audit records, add them to the struct first, then to the Rails migration. Never add Rails-specific code to the core layer.
+
+ ---
+
+ ## Working on the Review Engine
+
+ The review engine is a Rails engine mounted at a configurable path. It has no opinions about authentication — that's the host app's job. Document this clearly in any UI-related PRs.
+
+ The queue logic in `lib/diogenes/review/queue.rb` is tested without Rails. The engine views and controllers are tested with a minimal Rails app in `spec/dummy/`.
+
+ ---
+
+ ## Common Mistakes
+
+ **Making gates runtime checks.** Gates are boot-time. If you find yourself writing code that evaluates a gate during a request, reconsider the design.
+
+ **Adding LLM-specific code.** Diogenes doesn't know what LLM you're using. If a PR requires a specific client gem, it belongs in the user's codebase.
+
+ **Soft failures.** Diogenes raises when gates fail. It does not return nil, log a warning, or degrade gracefully. A feature that fails a gate should not serve AI output under any circumstances.
+
+ **Storing content in audit records.** Audit records store metadata and hashes, not raw content. Raw content may contain PII. The host app is responsible for its own content storage and retention.