diogenes 0.1.2 → 0.1.3 (RubyGems version diff)

Files changed (37)

checksums.yaml +4 -4
data/.mise/config.toml +72 -0
data/.mise/mise.lock +179 -0
data/.mise/tasks/update-hk-import +79 -0
data/.release-please-config.json +1 -1
data/.release-please-manifest.json +2 -2
data/CHANGELOG.md +7 -0
data/CLAUDE.md +107 -99
data/CONTRIBUTING.md +206 -0
data/README.md +157 -134
data/Rakefile +15 -1
data/Steepfile +11 -0
data/docs/gates.md +178 -0
data/docs/targets.md +11 -0
data/exe/diogenes +6 -0
data/hk.pkl +46 -0
data/lib/diogenes/cli/init.rb +88 -0
data/lib/diogenes/cli.rb +95 -0
data/lib/diogenes/templates/init/artifacts/decision_record.md.erb +53 -0
data/lib/diogenes/templates/init/diogenes.rb +13 -0
data/lib/diogenes/templates/init/hooks/README.md +15 -0
data/lib/diogenes/templates/init/rules/five_gates.rb +33 -0
data/lib/diogenes/templates/init/skills/example_skill.rb +33 -0
data/lib/diogenes/version.rb +2 -1
data/lib/diogenes.rb +27 -2
data/sig/generated/diogenes/cli/init.rbs +34 -0
data/sig/generated/diogenes/cli.rbs +34 -0
data/sig/generated/diogenes/version.rbs +5 -0
data/sig/generated/diogenes.rbs +26 -0
metadata +23 -9
data/docs/context.md +0 -60
data/docs/contributing.md +0 -228
data/docs/dashboard.md +0 -365
data/docs/examples.md +0 -162
data/docs/framework.md +0 -146
data/mise.lock +0 -48
data/mise.toml +0 -6

data/CONTRIBUTING.md ADDED

@@ -0,0 +1,206 @@
+# Contributing to Diogenes
+Thanks for contributing. Diogenes has opinions — about AI, about Ruby, and about how this gem should be built. This doc explains both the practical setup and the philosophy that should guide contributions.
+## Setup
+```bash
+git clone https://github.com/meaganewaller/diogenes
+cd diogenes
+bundle install
+```
+Run the test suite:
+```bash
+bundle exec rspec
+```
+Run the CLI locally:
+```bash
+bundle exec exe/diogenes --help
+```
+## The Gem Eats Its Own Cooking
+Diogenes ships with its own `.diogenes/` configuration. When you're working on this gem, your AI agent is configured with skills and rules that reflect the gem's own philosophy.
+After cloning, build the agent configs for your setup:
+```bash
+bundle exec exe/diogenes build --all
+```
+This generates:
+* `.claude/commands/` + `CLAUDE.md` if you use Claude Code
+* `.cursor/rules/` if you use Cursor
+* `.github/copilot-instructions.md` if you use Copilot
+The skills available to you while developing:
+| Command | What it does |
+|---------|-------------|
+| `/evaluate-feature` | Run a proposed gem feature through the five gates |
+| `/new-target <name>` | Scaffold a new agent build target |
+| `/review-philosophy` | Check whether a change aligns with Diogenes' principles |
+| `/generate-stories` | Draft user stories for a proposed feature |
+| `/gate-schema` | Help define options and validation for a new gate |
+## Before Adding a Feature
+Run the proposed feature through the gates. Yes, the gem's own gates. Yes, seriously.
+```bash
+bundle exec exe/diogenes evaluate "your proposed feature description"
+```
+Commit the decision record. If a gate fails, that's either a reason not to build it, or a reason to write down why you're overriding the gate and what you're doing to mitigate the risk.
+This isn't bureaucracy. It's the gem practicing what it preaches.
+## Adding a New Build Target
+Each build target lives in `lib/diogenes/targets/`. A target is a class that:
+1. Accepts the compiled source (skills, rules, hooks, artifacts)
+2. Knows where to write its output files
+3. Emits idiomatic configuration for that agent
+To scaffold a new target:
+```bash
+bundle exec exe/diogenes /new-target <agent-name>
+```
+Or manually:
+```ruby
+# lib/diogenes/targets/my_agent.rb
+module Diogenes
+  module Targets
+    class MyAgent < Base
+      TARGET_NAME = :my_agent
+      def build
+        emit_file ".myagent/instructions.md", render_rules
+        emit_file ".myagent/commands/", render_skills
+      end
+      private
+      def render_rules
+        # transform source rules into target format
+      end
+      def render_skills
+        # transform source skills into target format
+      end
+    end
+  end
+end
+```
+Register it in `lib/diogenes/targets.rb`:
+```ruby
+autoload :MyAgent, "diogenes/targets/my_agent"
+REGISTRY[:my_agent] = Targets::MyAgent
+```
+Add tests in `spec/diogenes/targets/my_agent_spec.rb`. See existing targets for patterns.
+Update `docs/targets.md` with the new target's output format and any quirks.
+## Adding a New Gate Option
+Gate options are defined in `lib/diogenes/gates/`. Each gate has:
+* A name (`:failure_mode`, etc.)
+* A schema of valid options and values
+* Validation logic that determines pass/fail
+* A plain-English failure message
+To add a new option to an existing gate:
+1. Find the gate in `lib/diogenes/gates/<gate_name>.rb`
+2. Add the option to the schema
+3. Add validation logic
+4. Add a descriptive failure message
+5. Add tests in `spec/diogenes/gates/<gate_name>_spec.rb`
+6. Update `docs/gates.md`
+The failure message should:
+* Say clearly what the problem is
+* Suggest a concrete alternative or mitigation
+* Sound like a thoughtful colleague, not a compiler error
+## Code Conventions
+This is a Ruby gem. It should feel like one.
+* Follow the existing style — `standard` is configured and enforced in CI
+* Error messages should be human. `Diogenes::GateFailed` should tell you what happened and what to do, not just what went wrong.
+* Public APIs should be minimal. Less surface area means fewer things to break.
+* If you're adding a class, it probably needs tests. If you're not sure, add them anyway.
+* Prefer explicit over clever. The person reading this at 11pm before a deploy should understand it immediately.
+## Testing
+The test suite uses RSpec. Run it:
+```bash
+bundle exec rspec
+```
+Run a specific file:
+```bash
+bundle exec rspec spec/diogenes/gates/failure_mode_spec.rb
+```
+Test structure mirrors `lib/`:
+```text
+spec/
+├── diogenes/
+│   ├── cli/
+│   ├── gates/
+│   ├── targets/
+│   └── runtime/
+└── spec_helper.rb
+```
+Aim for:
+* Unit tests on individual gate validators
+* Integration tests on the CLI commands (use `TTY::Testing` or similar)
+* Target output tests that snapshot the emitted files
+* RSpec matcher tests that verify matcher messages, not just pass/fail
+## Pull Requests
+* One thing per PR. Seriously.
+* Include the decision record if you ran `diogenes evaluate` — link it in the PR description.
+* Update `CHANGELOG.md` under `[Unreleased]`.
+* Update docs if you changed behavior.
+* CI must pass. Don't open a PR you know is red.
+PR title format: `[gate|target|cli|runtime|docs] short description`
+Examples:
+* `[gate] Add :data_sensitivity option to user_verifiable gate`
+* `[target] Add Gemini build target`
+* `[cli] Add --dry-run flag to build command`
+## Changelog
+We use [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format. Add your change under `[Unreleased]` in the appropriate section: `Added`, `Changed`, `Deprecated`, `Removed`, `Fixed`, `Security`.
+## Questions
+Open an issue. Tag it `question`. We'll answer and, if it's a common one, fold the answer into this doc.
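
Note on the "Adding a New Build Target" section of CONTRIBUTING.md: the diff shows a target subclass calling `emit_file` and being registered in `REGISTRY`, but `Targets::Base` itself is not part of this diff. The following is a minimal, self-contained sketch of how such a base class and registry could fit together. The `TargetSketch` module, the `root:` argument, the shape of the source hash, and the one-file-per-skill layout are all assumptions made for illustration, not the gem's actual implementation.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical sketch only: names echo the CONTRIBUTING.md example, but this
# is not the gem's real Targets::Base or registry.
module TargetSketch
  # Maps a target name (Symbol) to the class that builds it.
  REGISTRY = {}

  class Base
    # source: assumed shape { rules: [String], skills: { name => prompt } }
    # root:   directory under which output files are written
    def initialize(source, root:)
      @source = source
      @root = root
    end

    private

    # Write content to a path relative to the root, creating parent
    # directories as needed. Returns the absolute path written.
    def emit_file(relative_path, content)
      path = File.join(@root, relative_path)
      FileUtils.mkdir_p(File.dirname(path))
      File.write(path, content)
      path
    end
  end

  class MyAgent < Base
    TARGET_NAME = :my_agent

    # Emit one instructions file plus one command file per skill.
    def build
      written = [emit_file(".myagent/instructions.md", render_rules)]
      @source.fetch(:skills).each do |name, prompt|
        written << emit_file(".myagent/commands/#{name}.md", prompt)
      end
      written
    end

    private

    # Render rules as a Markdown bullet list.
    def render_rules
      @source.fetch(:rules).map { |rule| "- #{rule}" }.join("\n") + "\n"
    end
  end

  REGISTRY[MyAgent::TARGET_NAME] = MyAgent
end

# Usage: build into a temporary directory and list what was written.
Dir.mktmpdir do |dir|
  source = { rules: ["Prefer explicit over clever"], skills: { "evaluate-feature" => "Walk the gates" } }
  paths = TargetSketch::REGISTRY.fetch(:my_agent).new(source, root: dir).build
  puts paths.map { |path| path.delete_prefix("#{dir}/") }
end
```

Looking targets up through a registry keyed by name means a `--target` style option can resolve to a class without a case statement, which is presumably why the contributing guide asks for both the `autoload` and the `REGISTRY` entry.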

data/README.md CHANGED

@@ -1,215 +1,238 @@
 # Diogenes
-> "I am looking for an honest man." — Diogenes of Sinope
+> *Diogenes walked through Athens with a lantern in broad daylight, looking for an honest man.*
-Diogenes is a Ruby gem that helps engineering teams make and defend decisions about when AI belongs in a feature — and when it doesn't.
+**Diogenes** is a Ruby gem that holds your AI features to the same light. It has two jobs:
-It encodes a responsible AI decision framework directly into your Rails application as executable, auditable gates, and then actively monitors your AI features in production through grounding verification, document drift detection, and a regression-aware eval runner.
+1. A five-gate gauntlet decision framework that determines whether something should be an AI feature in production, or whether it's a software problem in disguise. Derived from Ruby's core principles: least surprise, programmer happiness, and human-centered design.
+2. An agent configuration build tool that lets you write your AI agent skills, rules, and hooks once in a canonical Ruby DSL, and Diogenes builds the right configuration for every agent you use: Claude Code, Cursor, Copilot, Codex, Gemini, and more.
-A mounted dashboard surfaces all of this in one place. Think Sidekiq Web for AI accountability.
----
+## Why Diogenes?
-## The Problem
+The current pressure to ship AI features is real. Your PM read something. Your CEO saw a demo. And you're a Ruby developer — you know how to build it, the APIs are cheap, and a proof of concept takes an afternoon.
-Most teams make two kinds of mistakes with AI features:
+But Ruby has always had opinions about how software should feel, who it should serve, and what it means to write something you're proud of. Those opinions are exactly the framework we need right now.
-**Mistake one:** Deciding to build them based on excitement or pressure rather than defensible criteria. When something goes wrong, nobody can explain why the decision was made or what safeguards were in place.
-**Mistake two:** Shipping them and assuming they continue to work. AI features degrade silently — documents go stale, retrieval quality drifts, models change. Traditional monitoring misses all of it because wrong-but-fluent outputs don't raise exceptions.
-Diogenes addresses both.
----
+Diogenes encodes those opinions into something you can run, test, and commit.
 ## Installation
 ```ruby
-gem 'diogenes'
+# Gemfile
+gem "diogenes"
 ```
 ```bash
 bundle install
-rails generate diogenes:install
-bundle exec rake db:migrate
 ```
----
-## What Diogenes Does
+Or, install globally to use the CLI anywhere:
-### 1. The Decision Framework (Gates)
+```bash
+gem install diogenes
+```
-Before an AI feature can serve output to a user, it must pass a set of declared gates. Gates are validated at boot — misconfiguration fails loudly before anything reaches production.
+## The CLI - Agent Configuration as Code
-```ruby
-class SupportAssistant
-  include Diogenes::Feature
+### Initialize a project
-  gate :failure_mode,      severity: :recoverable
-  gate :user_calibration,  audience: :trained_agent
-  gate :human_in_loop,     verified: true, max_daily_reviews: 80
-  gate :observability,     logging: :full, alerting: :enabled
-  gate :necessity,         alternatives_considered: true
-  def answer(query, agent:)
-    # your implementation
-  end
-end
+```bash
+diogenes init
 ```
-A feature that cannot satisfy a gate raises `Diogenes::UnsafeFeatureError` at boot with a plain-English explanation of what needs to change.
-**The five gates:** `:failure_mode`, `:user_calibration`, `:human_in_loop`, `:observability`, `:necessity`. See [docs/framework.md](docs/framework.md) for full documentation.
----
+Creates a `.diogenes/` directory with a canonical source structure:
-### 2. Grounding Verification
+```text
+.diogenes/
+├── diogenes.rb          # root config — declare your targets
+├── skills/              # things you invoke explicitly (/commands)
+├── rules/               # standing instructions for every session
+├── hooks/               # event-triggered behaviors
+└── artifacts/           # output templates (decision records, ADRs, etc.)
+```
-For RAG pipelines, Diogenes ships a grounding verifier that runs a second LLM pass to check that AI output is actually supported by retrieved context — not confabulated.
+### Define skills in the Ruby DSL
 ```ruby
-class SupportAssistant
-  include Diogenes::Feature
-  include Diogenes::Grounding
-  verify_grounding threshold: 0.8, on_failure: :flag_for_review
-  def answer(query, agent:)
-    context = retriever.retrieve(query)
-    response = llm.complete(query, context: context)
-    verify_and_return(response, context: context, reviewed_by: agent)
-  end
+# .diogenes/skills/evaluate_feature.rb
+Diogenes.skill "evaluate_feature" do
+  command     "/evaluate-feature"
+  description "Walk a proposed AI feature through the five Diogenes gates"
+  prompt      <<~PROMPT
+    You are helping a Ruby developer evaluate whether a proposed AI feature
+    should be built. Walk them through each of the five Diogenes gates in order,
+    asking focused questions and surfacing the failure mode clearly if a gate fails.
+    The five gates are:
+    1. Is the failure mode acceptable at scale?
+    2. Can the average user tell when it's wrong?
+    3. Is there a real human in the loop — actually checking?
+    4. Do you have the observability to know when it's going wrong?
+    5. Is AI actually the right tool, or just the exciting one?
+    At the end, generate a structured decision record.
+  PROMPT
 end
 ```
-The verifier returns a structured verdict — which claims are supported, unsupported, or contradicted by the retrieved context — and acts on it according to your configuration. Flag rates and verdicts are tracked in the audit log and surfaced in the dashboard.
+### Build for your agents
+```bash
+diogenes build --all                  # build for every configured target
+diogenes build --target claude-code   # .claude/commands/ + CLAUDE.md
+diogenes build --target cursor        # .cursor/rules/
+diogenes build --target copilot       # .github/copilot-instructions.md
+diogenes build --target codex         # codex-instructions.md + tools.json
+diogenes build --target gemini        # .gemini/instructions.md
+```
-Configure any LLM callable as the verifier backend — Diogenes has no opinion on which one:
+Configure your targets in `diogenes.rb`:
 ```ruby
-Diogenes.configure do |config|
-  config.grounding.verifier_llm = -> (prompt) { Anthropic::Client.new.complete(prompt) }
+Diogenes.configure do
+  targets :claude_code, :cursor, :copilot
 end
 ```
----
+### Evaluate a feature interactively
-### 3. Drift Detection
+```bash
+diogenes evaluate "AI assistant to explain billing history"
+```
-Documents get indexed once and go stale. Policies change, prices change, features change. Diogenes tracks when source documents were last updated versus when their embeddings were created, surfaces a staleness score, and can trigger re-indexing automatically.
+Walks through each gate as a conversation:
-```ruby
-# Inform Diogenes that a source document has changed
-Diogenes::Drift.source_updated(
-  document_id: 'refund-policy-v2',
-  updated_at: Time.current,
-  diff_size: :major
-)
-```
+```text
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+  DIOGENES — Feature Gate Evaluation
+  "AI assistant to explain billing history"
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-```ruby
-# config/initializers/diogenes.rb
-Diogenes.configure do |config|
-  config.drift.reindex_job = ReindexDocumentJob
-  config.drift.staleness_thresholds = { warning: 7.days, critical: 30.days }
-  config.drift.alert_webhook = ENV['DIOGENES_ALERT_WEBHOOK']
-end
+Gate 1 of 5: Failure Mode
+→ Ruby principle: Least surprise — at scale
+What happens when this feature is wrong?
+A wrong answer here is a billing dispute the user
+can't resolve without contacting support.
+  Severity: [recoverable / embarrassing / catastrophic]
+  > catastrophic
+  ✗ GATE FAILED
+  Catastrophic failure modes don't pass Gate 1.
+  Consider: is there a software solution that makes
+  the billing data clearer without AI interpretation?
+Continue evaluating remaining gates? (y/N)
 ```
-Stale documents surface in the dashboard drift tab, ranked by severity. Re-indexing queues your job with one click or via a Rake task.
+At the end, generates a decision record you can commit:
----
+```bash
+  ✓ Decision record written to docs/decisions/ai_billing_assistant_decision.md
+```
-### 4. Eval Runner
+## The Runtime Library — Gates in Your Code
-The hardest unsolved problem in production AI is knowing whether your feature is getting better or worse over time. Diogenes ships a lightweight eval framework: define golden question/answer pairs, run them on a schedule, track pass rates over time, and alert on regression.
+### Include Diogenes::Gated
 ```ruby
-# test/diogenes/evals/support_assistant_evals.rb
-Diogenes::Evals.define(SupportAssistant) do
-  eval "basic refund question" do
-    query    "How do I request a refund?"
-    expects  all_of(
-      grounded_in("refund-policy"),
-      contains("billing page"),
-      does_not_contain("24 hours")
-    )
-  end
+class BillingAssistant
+  include Diogenes::Gated
+  gate :failure_mode,    severity: :catastrophic
+  gate :user_verifiable, domain: :financial
+  gate :human_in_loop,   capacity: :real
+  gate :observability,   monitoring: :required
+  gate :right_tool
-  eval "question with no good answer" do
-    query    "What is the API rate limit for legacy v1 endpoints?"
-    expects  one_of(
-      low_confidence_response,
-      routes_to_human_review
-    )
+  def explain(invoice)
+    # implementation
   end
 end
 ```
-```bash
-bundle exec rake diogenes:evals:run[SupportAssistant]
-```
+### Development — loud failures
-When a passing eval starts failing, Diogenes records the regression point and diffs the last passing response against the first failing one. In most cases it can correlate the regression directly to a stale document in the drift tracker.
+In development, gate configuration is validated at class load time.
+A failing gate raises immediately so you can't start a server with a misconfigured AI feature:
----
+```text
+Diogenes::GateFailed: Gate 1 (failure_mode) failed.
+  severity: :catastrophic — catastrophic failures are not acceptable at scale.
+  Consider a software alternative: clearer UI, explicit error states,
+  or rule-based logic with predictable output.
+```
-### 5. The Dashboard
+### Production — structured failures
-Mount the Diogenes engine to get a live view of all of the above in one place:
+In production, gate failures return a `Diogenes::GateResult` rather than raising:
 ```ruby
-# config/routes.rb
-authenticate :user, ->(u) { u.admin? } do
-  mount Diogenes::Engine => '/diogenes'
+result = BillingAssistant.new.explain(invoice)
+if result.passed?
+  render json: result.value
+else
+  render json: { error: result.reason }, status: :service_unavailable
 end
 ```
-The overview tab shows one row per gated feature — gates declared, grounding flag rate, drift score, and eval pass rate. A feature that is passing all its gates but has 11 stale documents and a declining eval pass rate is visible before it becomes a production incident.
+### Testing — gates are testable
-See [docs/dashboard.md](docs/dashboard.md) for the full dashboard documentation including route structure, controller layout, and configuration reference.
+```ruby
+# spec/models/billing_assistant_spec.rb
+require "diogenes/rspec"
+RSpec.describe BillingAssistant do
+  it "declares all five gates" do
+    expect(described_class).to have_gate(:failure_mode)
+    expect(described_class).to have_gate(:user_verifiable)
+    expect(described_class).to have_gate(:human_in_loop)
+    expect(described_class).to have_gate(:observability)
+    expect(described_class).to have_gate(:right_tool)
+  end
----
+  it "fails the failure mode gate — catastrophic severity is not acceptable" do
+    expect(described_class).to fail_gate(:failure_mode)
+  end
+end
+```
-## The Audit Trail
+## The Five Gates
-Every AI call made through a Diogenes-gated feature produces an audit record:
+| Gate | Ruby Principle | The Question |
+|------|---------------|--------------|
+| 1. Failure Mode | Least surprise — at scale | What happens when it's wrong? Is that acceptable? |
+| 2. User Verifiable | Trust — you can't trust what you can't verify | Can your average user tell when it's wrong? |
+| 3. Human in the Loop | Human-centered design, genuinely | Is there a human checking — actually checking? |
+| 4. Observability | Craftsmanship — you wouldn't ship a sort blind | Do you have the monitoring to know when it drifts? |
+| 5. Right Tool | Convention over configuration | Is AI the right answer, or just the exciting one? |
-```ruby
-Diogenes::AuditLog.for_feature(SupportAssistant)
-# => [
-#   {
-#     feature:         "SupportAssistant",
-#     gate_config:     { failure_mode: :recoverable, ... },
-#     query_hash:      "sha256:...",
-#     context_sources: ["refund-policy.md", "enterprise-terms.md"],
-#     grounding:       { supported: [...], unsupported: [], contradicted: [] },
-#     verified_by:     "agent@company.com",
-#     timestamp:       2024-01-15 14:23:01 UTC
-#   }
-# ]
-```
+## Gate Reference
-Audit records store hashes, not raw content — PII never enters the audit log directly. The host app controls content storage and retention.
+See [`docs/gates.md`](docs/gates.md) for the full schema of gate options and what passes/fails each gate.
----
+## Agent Targets
-## Philosophy
+See [`docs/targets.md`](docs/targets.md) for the full list of supported build targets, what files they emit, and how to add a new target.
+## Contributing
-Diogenes takes no position on whether AI is good or bad for your product. It takes one strong position: **that decision should be made deliberately, defensibly, and with receipts.**
+See [`CONTRIBUTING.md`](CONTRIBUTING.md).
-A feature that passes all five gates and fails in production is a recoverable engineering problem. A feature that never asked the questions is a different kind of problem entirely.
+Diogenes uses itself. The `.diogenes/` directory in this repo contains the canonical source for the skills, rules, and hooks used when developing the gem. Run `diogenes build --all` after changing them.
-See [docs/framework.md](docs/framework.md) for the full decision framework and [docs/examples.md](docs/examples.md) for two worked examples — one that passes, one that doesn't.
+## Philosophy
----
+Diogenes is not anti-AI. It's pro-Ruby.
-## Contributing
+Ruby has always prioritized the human — the programmer, the user, the person reading the code at 2am. That priority is the framework. The gates aren't a checklist to defeat; they're the questions you'd ask if you had time to think.
-See [docs/contributing.md](docs/contributing.md).
+Most AI features should be software features. The ones that survive all five gates are worth building well.
 ## License
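
Note on the "Runtime Library" section of the new README: it describes a `gate` class macro, a `Diogenes::GateFailed` error raised in development, and a `Diogenes::GateResult` returned in production, but the implementation of `Diogenes::Gated` is not shown in this diff. The sketch below illustrates one way a mixin could provide those three pieces. Everything in it (the `GatedSketch` module, `validate_gates!`, the `gated` helper, and the single hard-coded severity check) is an assumption for illustration and should not be read as the gem's API.

```ruby
# Hypothetical sketch only: this is not Diogenes::Gated.
module GatedSketch
  class GateFailed < StandardError; end

  # Result object returned instead of raising (the "production" style).
  GateResult = Struct.new(:passed, :value, :reason, keyword_init: true) do
    def passed?
      passed
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Gates declared on this class, keyed by gate name.
    def gates
      @gates ||= {}
    end

    # Class macro: record a gate and its options.
    def gate(name, **options)
      gates[name] = options
    end

    # Human-readable failure messages. Only one rule is sketched here:
    # a catastrophic failure mode never passes.
    def gate_failures
      gates.filter_map do |name, options|
        next unless name == :failure_mode && options[:severity] == :catastrophic
        "#{name}: catastrophic failures are not acceptable at scale"
      end
    end

    # Stand-in for the load-time check: raise if any gate fails.
    def validate_gates!
      failures = gate_failures
      raise GateFailed, failures.join("; ") unless failures.empty?
    end
  end

  # Wrap a method body; the block runs only when every gate passes.
  def gated
    failures = self.class.gate_failures
    return GateResult.new(passed: false, reason: failures.first) unless failures.empty?

    GateResult.new(passed: true, value: yield)
  end
end

# Usage: a class whose declared failure mode cannot pass.
class SketchBillingAssistant
  include GatedSketch

  gate :failure_mode, severity: :catastrophic

  def explain(invoice)
    gated { "Explanation for invoice #{invoice}" }
  end
end

result = SketchBillingAssistant.new.explain(42)
puts result.passed? ? result.value : "blocked: #{result.reason}"
```

Keeping the failure list in one method lets the raising path and the result-returning path share the same rules, so the two environments cannot disagree about what counts as a failed gate.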

data/Rakefile CHANGED

@@ -7,4 +7,18 @@ Minitest::TestTask.create
 require "standard/rake"
-task default: %i[test standard]
+namespace :rbs do
+  desc "Generate RBS type signatures from inline annotations"
+  task :generate do
+    sh "bundle exec rbs-inline --output sig/generated lib"
+  end
+end
+namespace :steep do
+  desc "Run Steep type checks"
+  task :check do
+    sh "bundle exec steep check"
+  end
+end
+task default: %i[test standard rbs:generate steep:check]
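
Note on the new `rbs:generate` task: it runs `rbs-inline` over `lib` and writes signatures to `sig/generated`, which the Steepfile then checks. For readers unfamiliar with inline annotations, here is a small sketch of what an annotated source file might look like. The `RbsSketch::Version` class is hypothetical and does not come from the gem; the comment forms follow the rbs-inline syntax as I understand it and may differ between rbs-inline versions, so check them against the version pinned in the Gemfile.

```ruby
# rbs_inline: enabled

# Hypothetical example class; not part of the diogenes gem.
module RbsSketch
  class Version
    # @rbs major: Integer
    # @rbs minor: Integer
    # @rbs patch: Integer
    # @rbs return: void
    def initialize(major, minor, patch)
      @major = major
      @minor = minor
      @patch = patch
    end

    # Dotted version string built from the three components.
    #: () -> String
    def to_s
      "#{@major}.#{@minor}.#{@patch}"
    end
  end
end

puts RbsSketch::Version.new(0, 1, 3).to_s
```

The annotations are ordinary comments, so the file runs unchanged under plain Ruby; only the generator and the type checker read them.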

data/Steepfile ADDED

@@ -0,0 +1,11 @@
+D = Steep::Diagnostic
+target :lib do
+  signature "sig/generated"
+  check "lib"
+  library "logger"
+  configure_code_diagnostics(D::Ruby.lenient)
+end