prompt_canary 0.3.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +86 -0
- data/CODE_OF_CONDUCT.md +132 -0
- data/CONTRIBUTING.md +45 -0
- data/LICENSE.txt +21 -0
- data/README.md +338 -0
- data/Rakefile +12 -0
- data/app/controllers/prompt_canary/application_controller.rb +6 -0
- data/app/controllers/prompt_canary/dashboard/prompts_controller.rb +69 -0
- data/app/views/layouts/prompt_canary/application.html.erb +42 -0
- data/app/views/prompt_canary/dashboard/prompts/index.html.erb +50 -0
- data/app/views/prompt_canary/dashboard/prompts/show.html.erb +114 -0
- data/config/routes.rb +12 -0
- data/examples/auto_rollback.rb +105 -0
- data/examples/demo.rb +83 -0
- data/exe/prompt_canary +6 -0
- data/lib/generators/prompt_canary/install_generator.rb +39 -0
- data/lib/generators/prompt_canary/templates/create_prompt_canary_calls.rb +62 -0
- data/lib/prompt_canary/adapter_factory.rb +16 -0
- data/lib/prompt_canary/adapters/anthropic.rb +39 -0
- data/lib/prompt_canary/adapters/base.rb +11 -0
- data/lib/prompt_canary/cli/commands/history.rb +63 -0
- data/lib/prompt_canary/cli/commands/status.rb +55 -0
- data/lib/prompt_canary/cli.rb +69 -0
- data/lib/prompt_canary/configuration.rb +31 -0
- data/lib/prompt_canary/deployment.rb +186 -0
- data/lib/prompt_canary/engine.rb +7 -0
- data/lib/prompt_canary/monitor.rb +30 -0
- data/lib/prompt_canary/monitor_job.rb +13 -0
- data/lib/prompt_canary/prompt.rb +13 -0
- data/lib/prompt_canary/prompt_executor.rb +27 -0
- data/lib/prompt_canary/promptable.rb +50 -0
- data/lib/prompt_canary/railtie.rb +28 -0
- data/lib/prompt_canary/recorder.rb +55 -0
- data/lib/prompt_canary/result.rb +18 -0
- data/lib/prompt_canary/rollback_rule.rb +22 -0
- data/lib/prompt_canary/router.rb +61 -0
- data/lib/prompt_canary/storage/active_record_adapter.rb +58 -0
- data/lib/prompt_canary/storage/memory.rb +21 -0
- data/lib/prompt_canary/storage/sqlite.rb +64 -0
- data/lib/prompt_canary/storage_factory.rb +24 -0
- data/lib/prompt_canary/version.rb +5 -0
- data/lib/prompt_canary/version_builder.rb +52 -0
- data/lib/prompt_canary/version_object.rb +63 -0
- data/lib/prompt_canary.rb +101 -0
- data/sig/prompt_canary.rbs +4 -0
- metadata +95 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA256:
|
|
3
|
+
metadata.gz: 792f396d1e355446cdc4855cf306d531b04f1dc4638414a7a7ac11cc51f8dde0
|
|
4
|
+
data.tar.gz: 7066dc3b189eb3ca99965b49893cc540ae55ede627b8110020ae10abc4dbaa82
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: 5cb8a51f28ae5666f2e1bf3f6f0213905cf736fc56612ff465d0dadd57df8596292579452e843cef7de5e8e1db8c5bfbd02c9b0ad40a88379c99c885bb084eec
|
|
7
|
+
data.tar.gz: c0a37b35f57769a453d479ffc4aab381339b50c641ccd906312c760409826131798afe8aefa75cd66a3fc8a4595645849aeea358cfad8d08ffd5cf2607947e88
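The entries above are hex digests of the two archives packed inside the `.gem` file. A minimal sketch of checking one by hand, assuming the archive has already been extracted to a local path. The helper name `digest_matches?` is hypothetical and is not part of RubyGems or this gem.

```ruby
require "digest"

# Hypothetical helper: compares a file's SHA256 hex digest with an expected
# value such as the one listed for data.tar.gz in checksums.yaml.
def digest_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, `digest_matches?("data.tar.gz", "7066dc3b189eb3ca99965b49893cc540ae55ede627b8110020ae10abc4dbaa82")` would be expected to hold for an untampered 0.3.0 archive.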
|
data/CHANGELOG.md
ADDED
|
@@ -0,0 +1,86 @@
|
|
|
1
|
+
## [Unreleased]
|
|
2
|
+
|
|
3
|
+
## [0.3.0] - 2026-06-03
|
|
4
|
+
|
|
5
|
+
### Added
|
|
6
|
+
|
|
7
|
+
- `PromptCanary.promote` — marks a version as primary at runtime; the previous primary becomes a candidate. Writes `PrimaryOverride` to DB so the designation survives restarts. Does not change traffic percentages.
|
|
8
|
+
- `PromptCanary.set_canary` — adjusts a version's canary traffic percentage at runtime without a deploy. Independent of status. Raises `ArgumentError` if percent is zero — use `demote` to stop traffic.
|
|
9
|
+
- `PromptCanary::PromptEvent` AR model — persists a full audit trail to `prompt_canary_events` with event type, previous/new status, previous/new percent, reason, triggered-by, and monitor metric fields.
|
|
10
|
+
- All deployment operations (`promote`, `demote`, `restore`, `set_canary`) write audit events to `prompt_canary_events`.
|
|
11
|
+
- `promote` writes two events: one for the promoted version and a `superseded` event for the displaced primary.
|
|
12
|
+
- Monitor passes `triggered_by: "monitor"` and metric metadata (`triggering_metric`, `triggering_value`, `triggering_threshold`) through `demote` so auto-rollbacks are distinguishable from manual ones in the audit trail.
|
|
13
|
+
- `CannotDemotePrimaryError` — raised when attempting to demote the primary version with no other viable candidate, preventing the system from being left without a route target.
|
|
14
|
+
- `DemotedVersionError` — raised when `set_canary` targets a demoted version. Call `restore` first.
|
|
15
|
+
- `Version#demoted?` — tracks demoted state in memory for non-AR storage paths.
|
|
16
|
+
- `Deployment` module — extracted from `PromptCanary` to group the four runtime operations (`promote`, `demote`, `restore`, `set_canary`) with a clear SRP boundary.
|
|
17
|
+
- CLI `promote` subcommand — `prompt_canary promote PromptClass version [--reason "..."]`
|
|
18
|
+
- CLI `history` subcommand — `prompt_canary history PromptClass [--since 7d]`; validates period format.
|
|
19
|
+
- CLI `status` subcommand — `prompt_canary status PromptClass`; shows current status and traffic for each version.
|
|
20
|
+
- Dashboard Promote button — appears on the show page for candidate versions; absent for primary and demoted versions.
|
|
21
|
+
- Dashboard deployment history — show page surfaces the last 10 `PromptEvent` rows so operators can see what happened to a prompt without leaving the dashboard.
|
|
22
|
+
- Generator now creates all four tables in a single migration: `prompt_canary_calls`, `prompt_canary_rollout_overrides`, `prompt_canary_primary_overrides`, `prompt_canary_events`.
|
|
23
|
+
|
|
24
|
+
### Changed
|
|
25
|
+
|
|
26
|
+
- `demote` is now idempotent via `find_or_initialize_by` — demoting an already-demoted version does not error or create duplicate rows.
|
|
27
|
+
- `restore` restores the pre-demotion traffic percentage (stored at demotion time) rather than defaulting to zero.
|
|
28
|
+
- Router reads `PrimaryOverride` before falling back to first-declared default — DB-promoted versions survive restarts.
|
|
29
|
+
- `--since` validation in `history` command — invalid formats (e.g. `abc`, `0d`) exit with a clear error instead of silently querying all records.
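The `--since` validation in the last entry above can be sketched as follows. This is an illustration only: the helper name `parse_since` is hypothetical, and the assumption that a period is a positive integer followed by `d` (days) comes from the `7d` example, so the real command may accept other units.

```ruby
# Hypothetical sketch: parse a "--since" period such as "7d" into seconds.
# Assumes the only unit is "d" (days); the real command may support more.
SECONDS_PER_DAY = 86_400

def parse_since(value)
  match = /\A(\d+)d\z/.match(value.to_s)
  raise ArgumentError, "invalid --since period: #{value.inspect}" unless match

  days = match[1].to_i
  raise ArgumentError, "--since period must be greater than zero" if days.zero?

  days * SECONDS_PER_DAY
end
```

Rejecting `abc` and `0d` up front is what keeps a typo from silently turning into an unbounded query.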
|
|
30
|
+
|
|
31
|
+
### Breaking Changes
|
|
32
|
+
|
|
33
|
+
| Change | Migration |
|
|
34
|
+
|---|---|
|
|
35
|
+
| `stable: true` removed from version DSL | Delete `stable: true` from all version declarations. The first declared version is primary automatically. |
|
|
36
|
+
| `set_canary(prompt, version, 0)` raises `ArgumentError` | Use `PromptCanary.demote` to stop traffic to a version. |
|
|
37
|
+
|
|
38
|
+
## [0.2.0] - 2026-05-30
|
|
39
|
+
|
|
40
|
+
### Added
|
|
41
|
+
|
|
42
|
+
- `PromptCanary::Promptable` module — compose prompt behavior via `include` without consuming the host class's superclass slot
|
|
43
|
+
- `PromptCanary::AdapterFactory` and `StorageFactory` — registry pattern replacing case statements; raises `ConfigurationError` immediately for unknown values
|
|
44
|
+
- `PromptCanary::PromptExecutor` — extracts router → adapter → recorder → result orchestration out of `Promptable#call`
|
|
45
|
+
- `Configuration#validate!` — centralizes configuration validation; called at end of `configure` block
|
|
46
|
+
- Dynamic system prompts — `system` DSL field accepts a block receiving call-time args, enabling runtime data in prompt text
|
|
47
|
+
- `RollbackRule` value object — replaces plain hashes; owns comparison logic via `violated_by?`
|
|
48
|
+
- `Recorder#stats` — returns `{ call_count:, error_rate:, latency_p95:, last_called_at: }` over a configurable window
|
|
49
|
+
- `PromptCanary.stats` — convenience method for Rails console access; no setup required
|
|
50
|
+
- `PromptCanary.register_prompt` / `registered_prompts` — prompt class registry populated automatically when `Promptable` is included
|
|
51
|
+
- `PromptCanary::Railtie` — loads prompt classes from `app/prompts/` at boot; warns when `:sqlite` is configured in a Rails context
|
|
52
|
+
- `PromptCanary::MonitorJob` — `ActiveJob::Base` subclass that iterates registered prompts and runs the monitor; host app only schedules it
|
|
53
|
+
- `Storage::ActiveRecord` — AR-backed storage using the host app's existing database connection and migration system
|
|
54
|
+
- `PromptCanary::RolloutOverride` — AR model persisting demotions to `prompt_canary_rollout_overrides`; survives restarts and redeploys
|
|
55
|
+
- `PromptCanary.restore` — clears a demotion override and emits `prompt_canary.restored`; router immediately resumes class-defined rollout
|
|
56
|
+
- `rails generate prompt_canary:install` — creates both `prompt_canary_calls` and `prompt_canary_rollout_overrides` migrations and mounts the engine
|
|
57
|
+
- `PromptCanary::Engine` — mountable Rails engine with read-only dashboard; index and show views display per-version stats, active/inactive row styling, and demoted badge
|
|
58
|
+
- Router reads `prompt_canary_rollout_overrides` on every request when AR is available — demoted versions receive zero traffic without a redeploy
|
|
59
|
+
- Dashboard active/inactive styling — inactive versions (zero rollout or demoted) rendered at reduced opacity; order is stable so a status change is visible without reordering
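The dynamic system prompt entry in the list above amounts to accepting either a fixed string or a block and resolving it at call time. A minimal sketch of that idea follows. The `resolve_system` helper and the use of keyword arguments are assumptions for illustration; the changelog does not show the gem's internals.

```ruby
# Hypothetical sketch: a system prompt is either a fixed String or a callable
# that receives call-time keyword arguments and returns the prompt text.
def resolve_system(system, **call_args)
  system.respond_to?(:call) ? system.call(**call_args) : system
end

static_prompt  = "Extract structured data from this invoice."
dynamic_prompt = ->(locale:) { "Extract structured data. Reply in #{locale}." }

resolve_system(static_prompt, locale: "fr")  # the string is returned unchanged
resolve_system(dynamic_prompt, locale: "fr") # the lambda is called with locale: "fr"
```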
|
|
60
|
+
|
|
61
|
+
### Changed
|
|
62
|
+
|
|
63
|
+
- `PromptCanary::Prompt` is deprecated — `include PromptCanary::Promptable` is the correct pattern; `Prompt` emits a deprecation warning from `inherited`
|
|
64
|
+
- `PromptCanary.demote` writes a `RolloutOverride` record when using AR storage instead of mutating in-memory version state — class-defined rollout is preserved so restore requires no knowledge of the original value
|
|
65
|
+
- Adapter gems (`anthropic`, etc.) are the caller's dependency — not declared in gemspec to avoid forcing unused adapters on callers using custom implementations
|
|
66
|
+
|
|
67
|
+
### Fixed
|
|
68
|
+
|
|
69
|
+
- `frozen_string_literal` consistency across all files
|
|
70
|
+
|
|
71
|
+
## [0.1.0] - 2026-05-27
|
|
72
|
+
|
|
73
|
+
### Added
|
|
74
|
+
|
|
75
|
+
- `Prompt` DSL — declare versioned LLM prompts as Ruby classes with `version`, `stable`, `model`, `system`, `rollout`, `rollout_to`, and `rollback_if`
|
|
76
|
+
- `Router` — deterministic CRC32-based percentage routing and predicate routing; falls back to stable on missing `call_id` or predicate exception
|
|
77
|
+
- `Adapters::Anthropic` — Anthropic SDK v1.43.0 adapter; captures latency, token usage, and errors in a uniform telemetry hash
|
|
78
|
+
- `Recorder` — writes call telemetry to storage; computes `error_rate` and `latency_p95` (nearest-rank) over configurable request windows
|
|
79
|
+
- `Storage::Memory` — in-process storage for tests and development
|
|
80
|
+
- `Storage::SQLite` — persistent SQLite storage for production use
|
|
81
|
+
- `Monitor` — evaluates `rollback_if` rules against recorded metrics; calls `PromptCanary.demote` when a rule fires
|
|
82
|
+
- `PromptCanary.demote` — zeros a version's rollout percent and emits a `prompt_canary.demoted` notification
|
|
83
|
+
- `PromptCanary.subscribe` — simple pub/sub for demotion notifications; no ActiveSupport dependency
|
|
84
|
+
- `Result` — immutable value object returned by `Prompt.call`: `text`, `version_used`, `model`, `latency_ms`, `tokens`, `error`, `recorded_at`
|
|
85
|
+
- `CLI` — `prompt_canary demote PromptClass version --reason "..."` for manual rollbacks from scripts and cron jobs
|
|
86
|
+
- Configuration validation at `configure` time — raises `ConfigurationError` immediately if `adapter` or `storage` is not set
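The `Recorder` entry above mentions a nearest-rank `latency_p95`. As a worked illustration of that method (the function name is hypothetical and this is not the gem's code): sort the samples, compute the 1-based rank as the ceiling of `percentile * n / 100`, and return the sample at that rank.

```ruby
# Hypothetical sketch of the nearest-rank percentile method.
def nearest_rank_percentile(samples, percentile)
  raise ArgumentError, "samples must not be empty" if samples.empty?
  unless percentile.positive? && percentile <= 100
    raise ArgumentError, "percentile must be in (0, 100]"
  end

  sorted = samples.sort
  rank = (percentile * sorted.size / 100.0).ceil # 1-based rank
  sorted[rank - 1]
end
```

Nearest-rank always returns an observed sample rather than an interpolated value, which keeps the reported latency easy to trace back to a real call.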
|
data/CODE_OF_CONDUCT.md
ADDED
|
@@ -0,0 +1,132 @@
|
|
|
1
|
+
# Contributor Covenant Code of Conduct
|
|
2
|
+
|
|
3
|
+
## Our Pledge
|
|
4
|
+
|
|
5
|
+
We as members, contributors, and leaders pledge to make participation in our
|
|
6
|
+
community a harassment-free experience for everyone, regardless of age, body
|
|
7
|
+
size, visible or invisible disability, ethnicity, sex characteristics, gender
|
|
8
|
+
identity and expression, level of experience, education, socio-economic status,
|
|
9
|
+
nationality, personal appearance, race, caste, color, religion, or sexual
|
|
10
|
+
identity and orientation.
|
|
11
|
+
|
|
12
|
+
We pledge to act and interact in ways that contribute to an open, welcoming,
|
|
13
|
+
diverse, inclusive, and healthy community.
|
|
14
|
+
|
|
15
|
+
## Our Standards
|
|
16
|
+
|
|
17
|
+
Examples of behavior that contributes to a positive environment for our
|
|
18
|
+
community include:
|
|
19
|
+
|
|
20
|
+
* Demonstrating empathy and kindness toward other people
|
|
21
|
+
* Being respectful of differing opinions, viewpoints, and experiences
|
|
22
|
+
* Giving and gracefully accepting constructive feedback
|
|
23
|
+
* Accepting responsibility and apologizing to those affected by our mistakes,
|
|
24
|
+
and learning from the experience
|
|
25
|
+
* Focusing on what is best not just for us as individuals, but for the overall
|
|
26
|
+
community
|
|
27
|
+
|
|
28
|
+
Examples of unacceptable behavior include:
|
|
29
|
+
|
|
30
|
+
* The use of sexualized language or imagery, and sexual attention or advances of
|
|
31
|
+
any kind
|
|
32
|
+
* Trolling, insulting or derogatory comments, and personal or political attacks
|
|
33
|
+
* Public or private harassment
|
|
34
|
+
* Publishing others' private information, such as a physical or email address,
|
|
35
|
+
without their explicit permission
|
|
36
|
+
* Other conduct which could reasonably be considered inappropriate in a
|
|
37
|
+
professional setting
|
|
38
|
+
|
|
39
|
+
## Enforcement Responsibilities
|
|
40
|
+
|
|
41
|
+
Community leaders are responsible for clarifying and enforcing our standards of
|
|
42
|
+
acceptable behavior and will take appropriate and fair corrective action in
|
|
43
|
+
response to any behavior that they deem inappropriate, threatening, offensive,
|
|
44
|
+
or harmful.
|
|
45
|
+
|
|
46
|
+
Community leaders have the right and responsibility to remove, edit, or reject
|
|
47
|
+
comments, commits, code, wiki edits, issues, and other contributions that are
|
|
48
|
+
not aligned to this Code of Conduct, and will communicate reasons for moderation
|
|
49
|
+
decisions when appropriate.
|
|
50
|
+
|
|
51
|
+
## Scope
|
|
52
|
+
|
|
53
|
+
This Code of Conduct applies within all community spaces, and also applies when
|
|
54
|
+
an individual is officially representing the community in public spaces.
|
|
55
|
+
Examples of representing our community include using an official email address,
|
|
56
|
+
posting via an official social media account, or acting as an appointed
|
|
57
|
+
representative at an online or offline event.
|
|
58
|
+
|
|
59
|
+
## Enforcement
|
|
60
|
+
|
|
61
|
+
Instances of abusive, harassing, or otherwise unacceptable behavior may be
|
|
62
|
+
reported to the community leaders responsible for enforcement at
|
|
63
|
+
[INSERT CONTACT METHOD].
|
|
64
|
+
All complaints will be reviewed and investigated promptly and fairly.
|
|
65
|
+
|
|
66
|
+
All community leaders are obligated to respect the privacy and security of the
|
|
67
|
+
reporter of any incident.
|
|
68
|
+
|
|
69
|
+
## Enforcement Guidelines
|
|
70
|
+
|
|
71
|
+
Community leaders will follow these Community Impact Guidelines in determining
|
|
72
|
+
the consequences for any action they deem in violation of this Code of Conduct:
|
|
73
|
+
|
|
74
|
+
### 1. Correction
|
|
75
|
+
|
|
76
|
+
**Community Impact**: Use of inappropriate language or other behavior deemed
|
|
77
|
+
unprofessional or unwelcome in the community.
|
|
78
|
+
|
|
79
|
+
**Consequence**: A private, written warning from community leaders, providing
|
|
80
|
+
clarity around the nature of the violation and an explanation of why the
|
|
81
|
+
behavior was inappropriate. A public apology may be requested.
|
|
82
|
+
|
|
83
|
+
### 2. Warning
|
|
84
|
+
|
|
85
|
+
**Community Impact**: A violation through a single incident or series of
|
|
86
|
+
actions.
|
|
87
|
+
|
|
88
|
+
**Consequence**: A warning with consequences for continued behavior. No
|
|
89
|
+
interaction with the people involved, including unsolicited interaction with
|
|
90
|
+
those enforcing the Code of Conduct, for a specified period of time. This
|
|
91
|
+
includes avoiding interactions in community spaces as well as external channels
|
|
92
|
+
like social media. Violating these terms may lead to a temporary or permanent
|
|
93
|
+
ban.
|
|
94
|
+
|
|
95
|
+
### 3. Temporary Ban
|
|
96
|
+
|
|
97
|
+
**Community Impact**: A serious violation of community standards, including
|
|
98
|
+
sustained inappropriate behavior.
|
|
99
|
+
|
|
100
|
+
**Consequence**: A temporary ban from any sort of interaction or public
|
|
101
|
+
communication with the community for a specified period of time. No public or
|
|
102
|
+
private interaction with the people involved, including unsolicited interaction
|
|
103
|
+
with those enforcing the Code of Conduct, is allowed during this period.
|
|
104
|
+
Violating these terms may lead to a permanent ban.
|
|
105
|
+
|
|
106
|
+
### 4. Permanent Ban
|
|
107
|
+
|
|
108
|
+
**Community Impact**: Demonstrating a pattern of violation of community
|
|
109
|
+
standards, including sustained inappropriate behavior, harassment of an
|
|
110
|
+
individual, or aggression toward or disparagement of classes of individuals.
|
|
111
|
+
|
|
112
|
+
**Consequence**: A permanent ban from any sort of public interaction within the
|
|
113
|
+
community.
|
|
114
|
+
|
|
115
|
+
## Attribution
|
|
116
|
+
|
|
117
|
+
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
|
|
118
|
+
version 2.1, available at
|
|
119
|
+
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
|
|
120
|
+
|
|
121
|
+
Community Impact Guidelines were inspired by
|
|
122
|
+
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
|
|
123
|
+
|
|
124
|
+
For answers to common questions about this code of conduct, see the FAQ at
|
|
125
|
+
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
|
|
126
|
+
[https://www.contributor-covenant.org/translations][translations].
|
|
127
|
+
|
|
128
|
+
[homepage]: https://www.contributor-covenant.org
|
|
129
|
+
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
|
|
130
|
+
[Mozilla CoC]: https://github.com/mozilla/diversity
|
|
131
|
+
[FAQ]: https://www.contributor-covenant.org/faq
|
|
132
|
+
[translations]: https://www.contributor-covenant.org/translations
|
data/CONTRIBUTING.md
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
# Contributing to PromptCanary
|
|
2
|
+
|
|
3
|
+
Bug reports and pull requests are welcome on GitHub. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](CODE_OF_CONDUCT.md).
|
|
4
|
+
|
|
5
|
+
## Getting started
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
git clone https://github.com/[USERNAME]/prompt_canary
|
|
9
|
+
cd prompt_canary
|
|
10
|
+
bin/setup
|
|
11
|
+
bundle exec rake # runs tests + lint; everything should be green before you start
|
|
12
|
+
```
|
|
13
|
+
|
|
14
|
+
## Commit message format
|
|
15
|
+
|
|
16
|
+
This project uses a consistent commit message format. Copy and paste the template below for each commit — fill in the plain values, no labels:
|
|
17
|
+
|
|
18
|
+
```
|
|
19
|
+
FIX: Add router fallback for zero-percent rollout
|
|
20
|
+
|
|
21
|
+
(G)
|
|
22
|
+
|
|
23
|
+
Ensure the router returns the primary version when rollout percent is
|
|
24
|
+
explicitly set to 0, not just when omitted.
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
Fields in order:
|
|
28
|
+
1. **Subject** — `TYPE: Short summary of the change`. Type is one of: `FEATURE`, `FIX`, `DOCUMENTATION`, `STYLE`, `REFACTOR`, `CHORE`
|
|
29
|
+
2. **Status** — almost always `(G)`. Only commit when all tests are passing. `(R)` exists but should be used only in rare exceptional circumstances.
|
|
30
|
+
3. **Description** — fuller explanation of what changed and why
|
|
31
|
+
|
|
32
|
+
**We only commit green.** Run `bundle exec rake` before every commit and confirm the suite passes. A red commit should be the exception, not the norm.
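The commit format described in this section can be checked mechanically, for example from a local `commit-msg` hook. The sketch below is only an illustration: the helper name is hypothetical, it is not shipped with the project, and it assumes all three fields are required and separated by blank lines.

```ruby
# Hypothetical sketch: check a commit message against the format above.
COMMIT_TYPES = %w[FEATURE FIX DOCUMENTATION STYLE REFACTOR CHORE].freeze

def valid_commit_message?(message)
  # Fields are separated by blank lines: subject, status, description.
  subject, status, description = message.strip.split(/\n\s*\n/, 3)
  return false unless subject&.match?(/\A(?:#{COMMIT_TYPES.join("|")}): \S.*\z/)
  return false unless ["(G)", "(R)"].include?(status&.strip)

  !description.to_s.strip.empty?
end
```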
|
|
33
|
+
|
|
34
|
+
## Development workflow
|
|
35
|
+
|
|
36
|
+
This project is built test-first. Before writing any production code, write a failing test that requires it. See [claude/PLAN.md](claude/PLAN.md) §4 for the full TDD methodology this project follows.
|
|
37
|
+
|
|
38
|
+
## Running tests
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
bundle exec rake spec # full suite
|
|
42
|
+
bundle exec rspec spec/path/to/foo_spec.rb # single file
|
|
43
|
+
bundle exec rspec spec/path/to/foo_spec.rb:42 # single example
|
|
44
|
+
bundle exec rubocop # lint
|
|
45
|
+
```
|
data/LICENSE.txt
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
The MIT License (MIT)
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Rockwell Windsor Rice
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in
|
|
13
|
+
all copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
|
21
|
+
THE SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,338 @@
|
|
|
1
|
+
# PromptCanary
|
|
2
|
+
|
|
3
|
+
[CI status](https://github.com/rockwellwindsor/prompt_canary/actions/workflows/main.yml)
|
|
4
|
+
[Gem Version](https://badge.fury.io/rb/prompt_canary)
|
|
5
|
+
|
|
6
|
+
Canary deployments for LLM prompts in Ruby. Declare prompts as versioned Ruby classes, route traffic by percentage or predicate, record telemetry, and automatically roll back misbehaving versions when error rate or latency exceeds a configured threshold.
|
|
7
|
+
|
|
8
|
+
## Design philosophy
|
|
9
|
+
|
|
10
|
+
PromptCanary's value is in routing, telemetry, and rollback — not in where your prompt text lives. You can declare versions directly in Ruby classes, load them from database records at boot, or both. Either way the gem handles traffic splitting, call recording, and automatic demotion identically.
|
|
11
|
+
|
|
12
|
+
## Installation
|
|
13
|
+
|
|
14
|
+
```bash
|
|
15
|
+
bundle add prompt_canary
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
Or add to your Gemfile:
|
|
19
|
+
|
|
20
|
+
```ruby
|
|
21
|
+
gem "prompt_canary"
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
## Rails Setup
|
|
25
|
+
|
|
26
|
+
Run the install generator:
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
rails generate prompt_canary:install
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
This creates a single migration that sets up all four tables (`prompt_canary_calls`, `prompt_canary_rollout_overrides`, `prompt_canary_primary_overrides`, `prompt_canary_events`) and mounts the engine in `config/routes.rb`. Then run:
|
|
33
|
+
|
|
34
|
+
```bash
|
|
35
|
+
rails db:migrate
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
Add the adapter gem to your Gemfile — PromptCanary does not pull it in automatically:
|
|
39
|
+
|
|
40
|
+
```ruby
|
|
41
|
+
gem "anthropic" # required when using adapter: :anthropic
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
Configure in `config/initializers/prompt_canary.rb`:
|
|
45
|
+
|
|
46
|
+
```ruby
|
|
47
|
+
PromptCanary.configure do |c|
|
|
48
|
+
c.adapter = :anthropic # required
|
|
49
|
+
c.api_key = ENV["ANTHROPIC_API_KEY"]
|
|
50
|
+
c.storage = :active_record # recommended for Rails
|
|
51
|
+
end
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
## Configuration
|
|
55
|
+
|
|
56
|
+
```ruby
|
|
57
|
+
PromptCanary.configure do |c|
|
|
58
|
+
c.adapter = :anthropic # required
|
|
59
|
+
c.storage = :sqlite # :sqlite, :active_record, or :memory (tests)
|
|
60
|
+
end
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
Both `adapter` and `storage` are required. `ConfigurationError` is raised immediately if either is missing or unknown — not at first call.
|
|
64
|
+
|
|
65
|
+
Use `:active_record` in Rails apps, `:sqlite` for standalone scripts, and `:memory` in tests.
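The fail-fast behavior described above pairs naturally with a registry lookup, which the changelog names as the pattern behind `AdapterFactory` and `StorageFactory`. A minimal sketch of that pattern follows. All class names here are hypothetical stand-ins and do not mirror the gem's internals.

```ruby
# Hypothetical sketch of a registry-based factory; names are illustrative.
class SketchConfigurationError < StandardError; end

class SketchRegistry
  def initialize(kind)
    @kind = kind
    @builders = {}
  end

  # Registers a builder block under a symbolic name such as :memory.
  def register(name, &builder)
    @builders[name] = builder
  end

  # Builds the registered object, failing fast on missing or unknown names.
  def build(name)
    raise SketchConfigurationError, "#{@kind} is required" if name.nil?

    builder = @builders.fetch(name) do
      raise SketchConfigurationError, "unknown #{@kind}: #{name.inspect}"
    end
    builder.call
  end
end

storage_registry = SketchRegistry.new("storage")
storage_registry.register(:memory) { [] } # stand-in for a real storage object
storage_registry.build(:memory)
```

Compared with a `case` statement, a registry lets a new adapter or storage backend be added without editing the lookup code.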
|
|
66
|
+
|
|
67
|
+
## Defining a Prompt
|
|
68
|
+
|
|
69
|
+
Include `PromptCanary::Promptable` in any class and declare versions using the DSL:
|
|
70
|
+
|
|
71
|
+
```ruby
|
|
72
|
+
class InvoiceExtractor
|
|
73
|
+
include PromptCanary::Promptable
|
|
74
|
+
|
|
75
|
+
version "v1" do
|
|
76
|
+
model "claude-opus-4-7"
|
|
77
|
+
system "Extract structured data from this invoice."
|
|
78
|
+
end
|
|
79
|
+
end
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
The first declared version is automatically treated as primary — no flag needed. Place prompt classes in `app/prompts/` — the Railtie loads them automatically on boot.
|
|
83
|
+
|
|
84
|
+
## Loading Prompts from the Database
|
|
85
|
+
|
|
86
|
+
Declaring versions in the class body and loading them from database records are equally supported patterns. The DSL works the same either way — it is registering `Version` objects regardless of where the data comes from.
|
|
87
|
+
|
|
88
|
+
To load from the DB, call the DSL in an initializer after the connection is established:
|
|
89
|
+
|
|
90
|
+
```ruby
|
|
91
|
+
# config/initializers/prompt_canary.rb
|
|
92
|
+
PromptCanary.configure do |c|
|
|
93
|
+
c.adapter = :anthropic
|
|
94
|
+
c.storage = :active_record
|
|
95
|
+
end
|
|
96
|
+
|
|
97
|
+
PromptRecord.all.each do |record|
|
|
98
|
+
klass = record.prompt_class.constantize
|
|
99
|
+
klass.version(record.version_name) do
|
|
100
|
+
model record.model
|
|
101
|
+
system record.system_prompt
|
|
102
|
+
end
|
|
103
|
+
end
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
Choose whichever approach fits your team's operational needs. The gem has no opinion on where prompt text lives.
|
|
107
|
+
|
|
108
|
+
## Calling a Prompt
|
|
109
|
+
|
|
110
|
+
```ruby
|
|
111
|
+
result = InvoiceExtractor.call(user_message: "Invoice #123")
|
|
112
|
+
|
|
113
|
+
result.text # => "Here is the extracted data..."
|
|
114
|
+
result.version_used # => "v1"
|
|
115
|
+
result.model # => "claude-opus-4-7"
|
|
116
|
+
result.latency_ms # => 312
|
|
117
|
+
result.tokens # => { input: 50, output: 120 }
|
|
118
|
+
result.error # => nil (or the exception if the adapter failed)
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
`call` always returns a `Result` — errors are captured in `result.error`, not raised.
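The "capture, do not raise" contract can be sketched with a frozen struct and a wrapper method. This is an illustration under stated assumptions: `SketchResult` and `capture_result` are hypothetical names, and the real `Result` carries more fields than shown here.

```ruby
# Hypothetical sketch: an immutable result that carries either text or an error.
SketchResult = Struct.new(:text, :error, :latency_ms, keyword_init: true) do
  def success?
    error.nil?
  end
end

# Runs the block, timing it and capturing any StandardError instead of raising.
def capture_result
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  text = yield
  SketchResult.new(text: text, error: nil, latency_ms: elapsed_ms(started)).freeze
rescue StandardError => e
  SketchResult.new(text: nil, error: e, latency_ms: elapsed_ms(started)).freeze
end

# Milliseconds elapsed since the given monotonic timestamp.
def elapsed_ms(started)
  ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
end
```

Callers then branch on `result.error` (or `success?` in this sketch) instead of wrapping every call in `begin`/`rescue`.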
|
|
122
|
+
|
|
123
|
+
## Routing Traffic
|
|
124
|
+
|
|
125
|
+
### Percentage rollout
|
|
126
|
+
|
|
127
|
+
```ruby
|
|
128
|
+
class InvoiceExtractor
|
|
129
|
+
include PromptCanary::Promptable
|
|
130
|
+
|
|
131
|
+
version "v1" do
|
|
132
|
+
model "claude-opus-4-7"
|
|
133
|
+
system "Extract structured data from this invoice."
|
|
134
|
+
end
|
|
135
|
+
|
|
136
|
+
version "v2" do
|
|
137
|
+
model "claude-opus-4-7"
|
|
138
|
+
system "Extract structured data. Return JSON."
|
|
139
|
+
rollout percent: 10
|
|
140
|
+
end
|
|
141
|
+
end
|
|
142
|
+
```
|
|
143
|
+
|
|
144
|
+
Pass a `call_id` in context for deterministic routing — the same `call_id` always produces the same version:
|
|
145
|
+
|
|
146
|
+
```ruby
|
|
147
|
+
InvoiceExtractor.call(user_message: "Invoice #123", context: { call_id: current_user.id })
|
|
148
|
+
```
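The changelog describes the router as CRC32-based and says it falls back when `call_id` is missing. A minimal sketch of how a `call_id` could map to a stable bucket is shown below. The method names and the "CRC32 modulo 100" bucketing are assumptions for illustration; the gem's exact computation may differ.

```ruby
require "zlib"

# Hypothetical sketch of deterministic percentage routing. Assumes the bucket
# is CRC32(call_id) modulo 100; the gem's exact bucketing may differ.
def canary_bucket(call_id)
  Zlib.crc32(call_id.to_s) % 100
end

# Returns the candidate when the call falls inside its rollout percentage,
# otherwise the primary. A missing call_id falls back to the primary.
def route(call_id, primary:, candidate:, percent:)
  return primary if call_id.nil?

  canary_bucket(call_id) < percent ? candidate : primary
end
```

Because the bucket depends only on the `call_id`, a given user keeps seeing the same version for as long as the percentage is unchanged, and raising the percentage only moves additional buckets onto the candidate.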
|
|
149
|
+
|
|
150
|
+
### Predicate rollout
|
|
151
|
+
|
|
152
|
+
Route to a version based on any condition:
|
|
153
|
+
|
|
154
|
+
```ruby
|
|
155
|
+
version "v2" do
|
|
156
|
+
model "claude-opus-4-7"
|
|
157
|
+
system "Extract structured data. Return JSON."
|
|
158
|
+
rollout_to { |ctx| ctx[:user]&.fetch(:beta, false) }
|
|
159
|
+
end
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
```ruby
|
|
163
|
+
InvoiceExtractor.call(
|
|
164
|
+
user_message: "Invoice #123",
|
|
165
|
+
context: { user: { id: 42, beta: true } }
|
|
166
|
+
)
|
|
167
|
+
# => routes to v2 for beta users
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
If the predicate raises, the router falls back to the primary version — always safe.
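One way to get that safety is to treat any exception raised by the predicate as "no match". The sketch below illustrates the idea with a hypothetical `predicate_matches?` helper; it is not the gem's router code.

```ruby
# Hypothetical sketch: evaluate a rollout predicate, treating any exception as
# "do not route to the candidate" so the caller falls back to the primary.
def predicate_matches?(predicate, context)
  predicate.call(context) ? true : false
rescue StandardError
  false
end

beta_only = ->(ctx) { ctx[:user]&.fetch(:beta, false) }

predicate_matches?(beta_only, { user: { id: 42, beta: true } }) # matches
predicate_matches?(beta_only, { user: "not a hash" })           # raises inside, treated as no match
```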
|
|
171
|
+
|
|
172
|
+
## Canary Traffic
|
|
173
|
+
|
|
174
|
+
Traffic percentage and version status are independent controls. `rollout percent:` sets the initial split at declaration time. `set_canary` adjusts it at runtime without a deploy:
|
|
175
|
+
|
|
176
|
+
```ruby
|
|
177
|
+
PromptCanary.set_canary(InvoiceExtractor, "v2", 20) # send 20% to v2
|
|
178
|
+
PromptCanary.set_canary(InvoiceExtractor, "v2", 50) # ramp up to 50%
|
|
179
|
+
PromptCanary.set_canary(InvoiceExtractor, "v2", 80) # almost there
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
Passing `0` to `set_canary` raises `ArgumentError`; use `demote` to stop traffic. This keeps the audit trail unambiguous.
|
|
183
|
+
|
|
184
|
+
### Typical canary ramp
|
|
185
|
+
|
|
186
|
+
```
|
|
187
|
+
v1: primary, 100% traffic
|
|
188
|
+
v2: candidate, 20% traffic ← set_canary(InvoiceExtractor, "v2", 20)
|
|
189
|
+
← watch telemetry
|
|
190
|
+
v2: candidate, 50% traffic ← set_canary(InvoiceExtractor, "v2", 50)
|
|
191
|
+
← looks good
|
|
192
|
+
v2: primary, 50% traffic ← promote(InvoiceExtractor, "v2")
|
|
193
|
+
v2: primary, 100% traffic ← demote(InvoiceExtractor, "v1")
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
Promoting v2 changes its status only — it does not flip traffic to 100%. The traffic ramp is a separate deliberate step.
|
|
197
|
+
|
|
198
|
+
## Promote and Demote
|
|
199
|
+
|
|
200
|
+
**Status** (primary / candidate / demoted) and **traffic percentage** are separate, independent concepts.
|
|
201
|
+
|
|
202
|
+
- **promote** — marks a version as primary. The previous primary becomes a candidate. Traffic percentages are not changed.
|
|
203
|
+
- **demote** — sets status to demoted and zeros traffic immediately. This is the emergency brake.
|
|
204
|
+
- **restore** — returns a demoted version to candidate status and restores its pre-demotion traffic percentage.
|
|
205
|
+
|
|
206
|
+
```ruby
|
|
207
|
+
PromptCanary.promote(InvoiceExtractor, "v2")
|
|
208
|
+
PromptCanary.promote(InvoiceExtractor, "v2", reason: "canary passed")
|
|
209
|
+
|
|
210
|
+
PromptCanary.demote(InvoiceExtractor, "v2")
|
|
211
|
+
PromptCanary.demote(InvoiceExtractor, "v2", reason: "error rate spike")
|
|
212
|
+
|
|
213
|
+
PromptCanary.restore(InvoiceExtractor, "v2")
|
|
214
|
+
```
|
|
215
|
+
|
|
216
|
+
Attempting to demote the primary version when no other viable candidate exists raises `CannotDemotePrimaryError` — the system is never left without a route target.
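The rules in this section can be modeled with a small in-memory state machine. The sketch below is illustrative only: `SketchDeployment` is a hypothetical class, not the gem's `Deployment` module, and it assumes "viable candidate" means any version currently in candidate status. The text does not say which version becomes primary when a primary with a viable candidate is demoted, so the sketch leaves that decision to the caller.

```ruby
# Hypothetical in-memory model of the status and traffic rules described above.
class SketchDeployment
  class CannotDemotePrimary < StandardError; end

  Entry = Struct.new(:status, :percent, :saved_percent)

  # versions: { "v1" => 100, "v2" => 10 }; the first key is primary.
  def initialize(versions)
    @entries = {}
    versions.each_with_index do |(name, percent), index|
      @entries[name] = Entry.new(index.zero? ? :primary : :candidate, percent, nil)
    end
  end

  def status(name)
    @entries.fetch(name).status
  end

  def percent(name)
    @entries.fetch(name).percent
  end

  # Swaps statuses only; traffic percentages are left untouched.
  def promote(name)
    @entries.each_value { |e| e.status = :candidate if e.status == :primary }
    @entries.fetch(name).status = :primary
  end

  # Zeros traffic and remembers the old percentage; a repeat call is a no-op.
  def demote(name)
    entry = @entries.fetch(name)
    return if entry.status == :demoted

    others_viable = @entries.any? { |n, e| n != name && e.status == :candidate }
    if entry.status == :primary && !others_viable
      raise CannotDemotePrimary, "no other viable version to route to"
    end

    entry.saved_percent = entry.percent
    entry.percent = 0
    entry.status = :demoted
  end

  # Returns a demoted version to candidate with its pre-demotion percentage.
  def restore(name)
    entry = @entries.fetch(name)
    return unless entry.status == :demoted

    entry.percent = entry.saved_percent
    entry.saved_percent = nil
    entry.status = :candidate
  end
end
```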
|
|
217
|
+
|
|
218
|
+
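All deployment operations (`promote`, `demote`, `restore`, `set_canary`) write audit events to `prompt_canary_events`; `promote` writes two, one for the promoted version and a `superseded` event for the displaced primary.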
|
|
219
|
+
|
|
220
|
+
## Auto-Rollback
|
|
221
|
+
|
|
222
|
+
Define rollback rules on a version:
|
|
223
|
+
|
|
224
|
+
```ruby
|
|
225
|
+
version "v2" do
|
|
226
|
+
model "claude-opus-4-7"
|
|
227
|
+
system "Extract structured data. Return JSON."
|
|
228
|
+
rollout percent: 10
|
|
229
|
+
rollback_if :error_rate, greater_than: 0.05, over: 100
|
|
230
|
+
rollback_if :latency_p95, greater_than: 2000, over: 100
|
|
231
|
+
end
|
|
232
|
+
```
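A rule such as `rollback_if :error_rate, greater_than: 0.05, over: 100` can be thought of as a small value object compared against recorded stats. The sketch below is a hedged illustration: `SketchRollbackRule` is a hypothetical name, the stats hash shape follows the keys listed for `Recorder#stats` in the changelog, and the choice to stay silent until `over` calls have been recorded is an assumption.

```ruby
# Hypothetical sketch of a rollback rule as a small value object. The field
# names mirror the DSL arguments above; the class itself is illustrative.
SketchRollbackRule = Struct.new(:metric, :greater_than, :over, keyword_init: true) do
  # stats: { call_count: Integer, error_rate: Float, latency_p95: Numeric }
  # Assumption: the rule only fires once at least `over` calls are recorded.
  def violated_by?(stats)
    return false if stats.fetch(:call_count, 0) < over

    value = stats[metric]
    !value.nil? && value > greater_than
  end
end

rule = SketchRollbackRule.new(metric: :error_rate, greater_than: 0.05, over: 100)
rule.violated_by?({ call_count: 100, error_rate: 0.08 }) # threshold exceeded over a full window
```

Keeping the comparison inside the rule means the monitor only has to ask each rule whether it is violated, without knowing which metric it inspects.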

### In Rails

`PromptCanary::MonitorJob` is included and ready to enqueue:

```ruby
# config/initializers/prompt_canary.rb or a scheduler
PromptCanary::MonitorJob.set(wait: 5.minutes).perform_later
```

Schedule it with any background job backend (Sidekiq, GoodJob, Solid Queue, etc.). Note that `set(wait: 5.minutes)` enqueues a single delayed run, so periodic evaluation typically needs a recurring schedule from your backend.

### Standalone

```ruby
recorder = PromptCanary::Recorder.new(storage: PromptCanary::Storage::SQLite.new)
PromptCanary::Monitor.new(recorder: recorder).evaluate(InvoiceExtractor)
```

When a rule fires, `PromptCanary.demote` is called automatically: the version's rollout is zeroed, a `prompt_canary.demoted` notification is emitted, and a monitor-triggered audit event is written.

## CLI

```bash
# Promote a version to primary
prompt_canary promote InvoiceExtractor v2
prompt_canary promote InvoiceExtractor v2 --reason "canary passed"

# Demote a version (emergency stop)
prompt_canary demote InvoiceExtractor v2 --reason "error rate spike"

# Show current status and traffic for all versions
prompt_canary status InvoiceExtractor

# Show deployment history
prompt_canary history InvoiceExtractor
prompt_canary history InvoiceExtractor --since 7d
```

## Dashboard

The engine mounts a web dashboard at the path configured in your routes (default `/prompt_canary`):

- **Index**: all registered prompt classes with per-version call counts, error rates, P95 latency, and last-called timestamps
- **Show**: version breakdown with a Promote button for candidate versions, deployment history from the audit trail, and the 50 most recent calls with per-call latency, token counts, and error detail

No authentication is wired in by default. Because the Show page can promote versions, protect the mount point with your app's existing auth. The example below uses Devise's `authenticate` routing helper:

```ruby
authenticate :user, ->(u) { u.admin? } do
  mount PromptCanary::Engine, at: "/prompt_canary"
end
```

## Notifications

Subscribe to deployment events:

```ruby
PromptCanary.subscribe("prompt_canary.promoted") do |payload|
  puts "#{payload[:prompt]} #{payload[:version]} promoted"
end

PromptCanary.subscribe("prompt_canary.demoted") do |payload|
puts "#{payload[:prompt]} #{payload[:version]} demoted — #{payload[:reason]}"
end

PromptCanary.subscribe("prompt_canary.restored") do |payload|
  puts "#{payload[:prompt]} #{payload[:version]} restored"
end
```
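
In practice a subscriber will usually do more than `puts`. As a sketch, the helper below turns a payload into a single structured log line. `deployment_log_line` is a hypothetical helper, not part of the gem; it assumes only the `:prompt`, `:version`, and `:reason` payload keys used in the examples above.

```ruby
# Hypothetical helper: formats a deployment payload as one log line.
# It relies only on the :prompt, :version and :reason keys shown above.
def deployment_log_line(event, payload)
  line = "#{event} prompt=#{payload[:prompt]} version=#{payload[:version]}"
  line += " reason=#{payload[:reason]}" if payload[:reason]
  line
end

# With the gem loaded, wiring it up might look like this:
#   PromptCanary.subscribe("prompt_canary.demoted") do |payload|
#     Rails.logger.warn(deployment_log_line("prompt_canary.demoted", payload))
#   end

sample = { prompt: "InvoiceExtractor", version: "v2", reason: "error rate spike" }
puts deployment_log_line("prompt_canary.demoted", sample)
```

Keeping the formatting in a plain method makes it easy to unit-test without triggering a real promotion or demotion.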

## Examples

Two runnable scripts are included in `examples/`. Both use a stubbed adapter and require no API key:

```bash
# Full call flow: routing, result structure, version distribution
bundle exec ruby examples/demo.rb

# Auto-rollback demo: seeds synthetic errors, runs monitor, watches demotion fire
bundle exec ruby examples/auto_rollback.rb
```

Pass `--real` to `demo.rb` to hit the Anthropic API directly (requires `ANTHROPIC_API_KEY`):

```bash
ANTHROPIC_API_KEY=sk-... bundle exec ruby examples/demo.rb --real
```

## Development

```bash
bin/setup                              # install dependencies
bundle exec rake                       # run tests + lint
bundle exec rspec spec/foo_spec.rb:42  # run a single example
bin/console                            # interactive prompt with gem loaded
```

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md).

## License

MIT. See [LICENSE.txt](LICENSE.txt).
data/Rakefile
ADDED