mcpeye 0.1.0
- checksums.yaml +7 -0
- data/README.md +265 -0
- data/lib/mcpeye/intent.rb +94 -0
- data/lib/mcpeye/redaction.rb +112 -0
- data/lib/mcpeye/request_capability.rb +74 -0
- data/lib/mcpeye/tracker.rb +844 -0
- data/lib/mcpeye/version.rb +5 -0
- data/lib/mcpeye.rb +66 -0
- metadata +55 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA256:
|
|
3
|
+
metadata.gz: 01f13d2a4a0f1671c2fe3cec6feef771c84da1e88f101915972330c5c28692ce
|
|
4
|
+
data.tar.gz: 811c321338035cc6c1d89ae1a4c51a5389d2a5e10c60d006c8404dbd045c98f1
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: 258a88c0365ba40bd06487be165223e88d002432ca7820a2b79c986bbc81799758936d1d593aefcc908e46b4038374c2058c1557ba3cef89200e99fdbd1dd9e0
|
|
7
|
+
data.tar.gz: 04fa49d61cdb2552b7c6a1cb38db0f5108921897a84db135dc2637ef2b3befc2de728312961734ce6d21dcb32fe7550f271816b73e2ff2df15f0e19bd9c8a919
|
data/README.md
ADDED
|
@@ -0,0 +1,265 @@
|
|
|
1
|
+
# mcpeye (Ruby gem)
|
|
2
|
+
|
|
3
|
+
> See why your agent is failing.
|
|
4
|
+
|
|
5
|
+
Product analytics for **Ruby / Rails MCP servers**. `mcpeye` captures what
|
|
6
|
+
agents try to do through your MCP tools — and, crucially, the asks your tools
|
|
7
|
+
**could not** fulfill — then ships them to your self-hosted [mcpeye](https://github.com/mcpeye/mcpeye)
|
|
8
|
+
instance. There the worker clusters sessions into the **Intent Gap Report**: the
|
|
9
|
+
top user asks your tools attempted but failed to deliver.
|
|
10
|
+
|
|
11
|
+
This gem is the dogfooding SDK for the mcpeye team's own Rails-based MCP server,
|
|
12
|
+
and for any Ruby shop running one. It is **pure stdlib at runtime** (`net/http`,
|
|
13
|
+
`json`, `securerandom`) and **never raises into, or alters, the host server**.
|
|
14
|
+
Capture is O(1); set `flush_interval:` to ship telemetry off the tool-call thread
|
|
15
|
+
(see [Options](#options) for the zero-thread default's one caveat).
|
|
16
|
+
|
|
17
|
+
## How it works
|
|
18
|
+
|
|
19
|
+
1. **Inject** an optional `mcpeyeIntent` parameter into every tool's input schema.
|
|
20
|
+
The agent self-reports — in its own words — why it is calling the tool and any
|
|
21
|
+
blocker the user hit. Capture is near-zero cost: **no per-call LLM**.
|
|
22
|
+
2. **Capture** each tool call (name, arguments, result, error, duration, the
|
|
23
|
+
self-reported intent).
|
|
24
|
+
3. **Add** the reserved `mcpeye_request_capability` tool (active missing-capability
|
|
25
|
+
capture). When the agent wants a capability none of your tools cover, it calls
|
|
26
|
+
this tool to say so in the user's words; mcpeye answers it locally with a canned
|
|
27
|
+
acknowledgement (never forwarding it to your server) and records it as a normal
|
|
28
|
+
tool call with `tool_name = "mcpeye_request_capability"`. The report folds these
|
|
29
|
+
into "Top missing capabilities" as high-confidence, explicitly-requested
|
|
30
|
+
entries — catching the *silent miss*, where the right move is to call no tool at
|
|
31
|
+
all. Disable with `capture_missing_capabilities: false`.
|
|
32
|
+
4. **Redact** secrets/PII client-side (regex-based) *before* anything leaves the
|
|
33
|
+
process, and size-bound every field. Self-hosting is the real privacy control;
|
|
34
|
+
redaction shrinks the blast radius of obvious secrets in free-text args.
|
|
35
|
+
5. **Buffer + POST** the events as a single `IngestPayload` JSON to
|
|
36
|
+
`"#{ingest_url}/ingest"` with an `x-mcpeye-secret` header.
|
|
37
|
+
|
|
38
|
+
The wire payload is byte-compatible across every mcpeye SDK (TS / Python / Ruby):
|
|
39
|
+
|
|
40
|
+
```json
|
|
41
|
+
{
|
|
42
|
+
"projectId": "my-project-id",
|
|
43
|
+
"identity": { "userId": "u_123", "client": "claude-desktop/0.7.1", "serverVersion": "1.4.0" },
|
|
44
|
+
"events": [
|
|
45
|
+
{
|
|
46
|
+
"callId": "f7c1...uuid",
|
|
47
|
+
"toolName": "search_products",
|
|
48
|
+
"arguments": { "q": "headphones" },
|
|
49
|
+
"result": { "count": 3 },
|
|
50
|
+
"isError": false,
|
|
51
|
+
"intent": "The user is searching for wireless headphones to add to their cart.",
|
|
52
|
+
"durationMs": 42,
|
|
53
|
+
"timestamp": 1718323200000
|
|
54
|
+
}
|
|
55
|
+
]
|
|
56
|
+
}
|
|
57
|
+
```
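
The payload above, together with step 5, suggests roughly the following request shape. This is a minimal sketch, not the gem's implementation: `build_ingest_request` is a hypothetical helper, the `content-type` header is an assumption, and the request is only constructed, never sent.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper (not part of mcpeye): builds the POST described in
# step 5 without sending it.
def build_ingest_request(ingest_url, secret, payload)
  uri = URI("#{ingest_url}/ingest")            # base URL + "/ingest"
  request = Net::HTTP::Post.new(uri.request_uri)
  request["content-type"] = "application/json" # assumed, not confirmed by the README
  request["x-mcpeye-secret"] = secret          # shared-secret header
  request.body = JSON.generate(payload)
  [uri, request]
end

payload = {
  "projectId" => "my-project-id",
  "identity" => { "userId" => "u_123" },
  "events" => [
    { "callId" => "example-call-id", "toolName" => "search_products",
      "arguments" => { "q" => "headphones" }, "isError" => false,
      "durationMs" => 42, "timestamp" => 1_718_323_200_000 }
  ]
}

uri, request = build_ingest_request("http://localhost:3001", "example-secret", payload)
puts "#{request.method} #{uri}"
```

Sending it would be a matter of handing `request` to a `Net::HTTP` session with open/read timeouts set; that part is left out so the sketch has no network side effects.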
|
|
58
|
+
|
|
59
|
+
## Install
|
|
60
|
+
|
|
61
|
+
Add to your `Gemfile`:
|
|
62
|
+
|
|
63
|
+
```ruby
|
|
64
|
+
gem "mcpeye"
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
Then `bundle install`. No runtime dependencies.
|
|
68
|
+
|
|
69
|
+
## Configuration
|
|
70
|
+
|
|
71
|
+
Reads the standard mcpeye env vars when arguments are omitted:
|
|
72
|
+
|
|
73
|
+
| Env var | Purpose |
|
|
74
|
+
| ----------------------- | -------------------------------------------------------- |
|
|
75
|
+
| `MCPEYE_INGEST_URL` | Base URL of your self-hosted mcpeye API (no `/ingest`). |
|
|
76
|
+
| `MCPEYE_INGEST_SECRET` | Shared secret sent as the `x-mcpeye-secret` header. |
|
|
77
|
+
|
|
78
|
+
## Quick start
|
|
79
|
+
|
|
80
|
+
```ruby
|
|
81
|
+
require "mcpeye"
|
|
82
|
+
|
|
83
|
+
tracker = Mcpeye.track(
|
|
84
|
+
server, # your MCP server object
|
|
85
|
+
"my-project-id",
|
|
86
|
+
ingest_url: ENV["MCPEYE_INGEST_URL"], # e.g. "http://localhost:3001"
|
|
87
|
+
ingest_secret: ENV["MCPEYE_INGEST_SECRET"]
|
|
88
|
+
)
|
|
89
|
+
|
|
90
|
+
# That's it. A drain is auto-registered via at_exit, so you don't have to
|
|
91
|
+
# remember it. (An explicit `at_exit { tracker.flush }` still works — it's
|
|
92
|
+
# idempotent.)
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
`Mcpeye.track` injects `mcpeyeIntent` into discoverable tool schemas and wraps
|
|
96
|
+
discoverable handlers automatically. Ruby MCP server shapes vary, so when the
|
|
97
|
+
internals can't be introspected `track` returns a working tracker unchanged (and
|
|
98
|
+
reports once via `on_error`) — instrument manually with `#wrap` / `#record`.
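
To make the schema injection concrete, here is a minimal sketch of the merge. `add_intent_param` is a hypothetical name, the description text is a placeholder, and only string keys are handled; treat it as an illustration of the idea rather than the gem's code.

```ruby
# Hypothetical sketch: return a copy of a tool input schema with an optional
# `mcpeyeIntent` string property added. String keys only.
def add_intent_param(schema, name: "mcpeyeIntent", description: "Why the agent is calling this tool.")
  return schema unless schema.is_a?(Hash)

  type = schema["type"]
  object_like = type == "object" || schema.key?("properties") || (type.nil? && schema.empty?)
  return schema unless object_like # e.g. { "type" => "array" } is left alone

  props = (schema["properties"] || {}).dup
  # Never clobber a real tool parameter that already uses the name.
  props[name] ||= { "type" => "string", "description" => description }
  schema.merge("type" => type || "object", "properties" => props)
end

search = { "type" => "object", "properties" => { "q" => { "type" => "string" } } }
augmented = add_intent_param(search)
puts augmented["properties"].keys.inspect # → ["q", "mcpeyeIntent"]
```

The input schema is copied rather than mutated, so a registry that shares schema Hashes between tools is not affected by the injection.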
|
|
99
|
+
|
|
100
|
+
## Options
|
|
101
|
+
|
|
102
|
+
All optional except `project_id`:
|
|
103
|
+
|
|
104
|
+
| Option | Default | What it does |
|
|
105
|
+
| ---------------- | ----------------------------- | --------------------------------------------------------------------------- |
|
|
106
|
+
| `ingest_url:` | `ENV["MCPEYE_INGEST_URL"]` | Base URL; `/ingest` is appended. |
|
|
107
|
+
| `ingest_secret:` | `ENV["MCPEYE_INGEST_SECRET"]` | Sent as `x-mcpeye-secret`. Missing → ingest 401 (warned once). |
|
|
108
|
+
| `redact:` | `true` | Scrub secrets/PII from arguments/result/intent/error. `false` = verbatim. |
|
|
109
|
+
| `identity:` | `{}` | Static `{ userId:, client:, serverVersion: }`. |
|
|
110
|
+
| `identify:` | `nil` | Callable evaluated **once per flush** for per-request identity. If it raises, identity falls back to `{}`. |
|
|
111
|
+
| `flush_interval:`| `nil` (no thread) | Seconds between background flushes. Set it to drain low-traffic servers. |
|
|
112
|
+
| `flush_threshold:` | `20` | Eager-flush once this many events buffer. |
|
|
113
|
+
| `denylist_fields:` | `[]` | Extra field names whose values are always dropped (case-insensitive). |
|
|
114
|
+
| `max_buffer:` | `10_000` | Hard cap; oldest events drop past it while the API is down (warned once). |
|
|
115
|
+
| `capture_missing_capabilities:` | `true` | Add + locally answer the reserved `mcpeye_request_capability` tool. `false` keeps it out of your manifest. |
|
|
116
|
+
| `on_error:` | `warn "[mcpeye] ..."` | Diagnostics sink for every swallowed error. Wrapped so it can never throw. |
|
|
117
|
+
|
|
118
|
+
> **Manifest cost.** With `capture_missing_capabilities: true`, your server's tool
|
|
119
|
+
> list gains one extra tool — a few hundred tokens in any model context that lists
|
|
120
|
+
> tools, and one more entry in any tool picker / doc generator. That is the price
|
|
121
|
+
> of seeing silent misses; pass `false` to keep it out. Auto-add works for Hash /
|
|
122
|
+
> Array tool registries; for other server shapes the constant + descriptor live in
|
|
123
|
+
> `Mcpeye::RequestCapability` so you can register it yourself.
|
|
124
|
+
|
|
125
|
+
When `flush_interval` is set, a single background flush thread ships batches off
|
|
126
|
+
the tool-call thread, so capture never blocks. Without it the gem is
|
|
127
|
+
**zero-thread**: events flush eagerly at `flush_threshold`, on a manual `#flush`,
|
|
128
|
+
and via the auto `at_exit` drain — but that threshold flush runs **synchronously on
|
|
129
|
+
the calling thread** (one `Net::HTTP` POST, bounded by the 5s/10s open/read
|
|
130
|
+
timeouts), so the Nth tool call can block briefly. For a busy or latency-sensitive
|
|
131
|
+
server, set `flush_interval:` so capture is fully non-blocking.
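
The zero-thread behaviour can be pictured with a small buffer like the one below. It is a sketch under stated assumptions: `SketchBuffer` is a hypothetical class, the sender block stands in for the HTTP POST, and re-queueing a failed batch is one reading of how `max_buffer` applies while the API is down.

```ruby
# Hypothetical sketch of threshold-based flushing with a hard buffer cap.
class SketchBuffer
  def initialize(flush_threshold: 20, max_buffer: 10_000, &sender)
    @flush_threshold = flush_threshold
    @max_buffer = max_buffer
    @sender = sender
    @events = []
    @lock = Mutex.new
  end

  # O(1) append; the push that reaches the threshold pays for the flush,
  # synchronously, on the calling thread.
  def push(event)
    batch = nil
    @lock.synchronize do
      @events << event
      @events.shift while @events.size > @max_buffer # drop oldest past the cap
      batch = @events.slice!(0, @events.size) if @events.size >= @flush_threshold
    end
    deliver(batch) if batch
  end

  def flush
    batch = @lock.synchronize { @events.slice!(0, @events.size) }
    deliver(batch) unless batch.empty?
  end

  def pending
    @lock.synchronize { @events.size }
  end

  private

  # Never raises: a failed send puts the batch back and trims to the cap.
  def deliver(batch)
    @sender.call(batch)
  rescue StandardError => e
    warn "[sketch] flush failed, re-queueing: #{e.message}"
    @lock.synchronize do
      @events.unshift(*batch)
      @events.shift while @events.size > @max_buffer
    end
  end
end

sent = []
buffer = SketchBuffer.new(flush_threshold: 3) { |batch| sent << batch }
3.times { |i| buffer.push(i) }
puts sent.inspect # → [[0, 1, 2]]
```

A background timer would simply call `flush` on an interval from its own thread, which is why setting `flush_interval:` moves the POST off the tool-call path.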
|
|
132
|
+
|
|
133
|
+
Identity values (`userId`/`client`/`serverVersion`) are coerced to strings before
|
|
134
|
+
they're sent (the ingest contract requires strings), so an integer id is fine —
|
|
135
|
+
but pass an **opaque, non-PII** value.
|
|
136
|
+
|
|
137
|
+
## Dogfooding in a Rails MCP server
|
|
138
|
+
|
|
139
|
+
`config/initializers/mcpeye.rb`:
|
|
140
|
+
|
|
141
|
+
```ruby
|
|
142
|
+
require "mcpeye"
|
|
143
|
+
|
|
144
|
+
MCPEYE = Mcpeye.track(
|
|
145
|
+
MyMcpServer.instance, # your server object
|
|
146
|
+
ENV.fetch("MCPEYE_PROJECT_ID"),
|
|
147
|
+
ingest_url: ENV.fetch("MCPEYE_INGEST_URL", "http://localhost:3001"),
|
|
148
|
+
ingest_secret: ENV["MCPEYE_INGEST_SECRET"],
|
|
149
|
+
# Per-request identity: evaluated once per flush, so a thread/request-local
|
|
150
|
+
# value is attributed correctly. Pass an OPAQUE, non-PII id (coerced to a string).
|
|
151
|
+
identify: -> { { userId: Current.user_id&.to_s, client: Current.client, serverVersion: MyApp::VERSION } },
|
|
152
|
+
# Drop your own domain-sensitive fields on top of the built-in denylist:
|
|
153
|
+
denylist_fields: %w[ssn account_number],
|
|
154
|
+
# Forward diagnostics into your logger instead of stderr:
|
|
155
|
+
on_error: ->(e) { Rails.logger.warn("[mcpeye] #{e}") }
|
|
156
|
+
)
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
### Puma / Unicorn (forking servers)
|
|
160
|
+
|
|
161
|
+
A thread does **not** survive `fork`, so in a clustered server start the flush
|
|
162
|
+
timer **after** each worker boots, and drain on shutdown:
|
|
163
|
+
|
|
164
|
+
```ruby
|
|
165
|
+
# config/puma.rb
|
|
166
|
+
on_worker_boot do
|
|
167
|
+
# Re-start the background flush thread inside each forked worker.
|
|
168
|
+
MCPEYE.start_flush_thread
|
|
169
|
+
end
|
|
170
|
+
|
|
171
|
+
on_worker_shutdown do
|
|
172
|
+
MCPEYE.stop # stops the timer + does a final flush
|
|
173
|
+
end
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
Pass `flush_interval:` to `Mcpeye.track` (e.g. `flush_interval: 5`) so
|
|
177
|
+
`start_flush_thread` has an interval to use. Even without the timer, the eager
|
|
178
|
+
threshold flush + the auto `at_exit` drain still ship per worker; at worst a tiny
|
|
179
|
+
tail is lost on `SIGKILL` (same as the other SDKs).
|
|
180
|
+
|
|
181
|
+
## Manual capture (when auto-wrap can't attach)
|
|
182
|
+
|
|
183
|
+
mcpeye duck-types the common Ruby MCP shapes — a server exposing `tools` /
|
|
184
|
+
`registered_tools` (a method or `@tools` / `@registered_tools` ivar) whose entries
|
|
185
|
+
are Hashes carrying an input schema (`"inputSchema"`, `:input_schema`, `"schema"`)
|
|
186
|
+
and, for auto-wrapping, a `name` plus a callable (`"handler"` / `"call"`). When
|
|
187
|
+
your server doesn't match (a custom Rack handler, a frozen tool registry, a future
|
|
188
|
+
framework), `instrument` is a safe no-op and you capture calls yourself:
|
|
189
|
+
|
|
190
|
+
```ruby
|
|
191
|
+
class SearchContactsTool
|
|
192
|
+
def self.handler
|
|
193
|
+
MCPEYE.wrap("search_contacts") do |args|
|
|
194
|
+
Contact.search(args["q"]).as_json # your real logic
|
|
195
|
+
end
|
|
196
|
+
end
|
|
197
|
+
end
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
`#wrap` strips `mcpeyeIntent` out of the incoming arguments, records it as
|
|
201
|
+
`intent`, runs your handler with the cleaned args, and returns the result
|
|
202
|
+
unchanged. A handler that raises is recorded as `isError` and the **identical
|
|
203
|
+
exception is re-raised**. Or record fully by hand:
|
|
204
|
+
|
|
205
|
+
```ruby
|
|
206
|
+
MCPEYE.record(
|
|
207
|
+
"place_order",
|
|
208
|
+
{ "items" => [{ "sku" => "p_7720", "qty" => 1 }] },
|
|
209
|
+
result: { "id" => "ord_123" },
|
|
210
|
+
intent: "User is placing an order but couldn't find a way to apply a discount code.",
|
|
211
|
+
duration_ms: 88
|
|
212
|
+
)
|
|
213
|
+
MCPEYE.flush
|
|
214
|
+
```
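
The `#wrap` contract described above (strip the intent, run the handler with cleaned args, return the result unchanged, re-raise the same exception) can be sketched as follows. `sketch_wrap`, `safe_record`, and the event field names are hypothetical and chosen for illustration only.

```ruby
# Hypothetical: recording must never break the host tool call.
def safe_record(recorder, event)
  recorder.call(event)
rescue StandardError
  nil
end

# Hypothetical sketch of a handler wrapper in the spirit of `#wrap`.
def sketch_wrap(tool_name, recorder, &handler)
  lambda do |args|
    cleaned = args.dup
    intent = cleaned.delete("mcpeyeIntent") # the handler never sees it
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    elapsed = -> { ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round }
    begin
      result = handler.call(cleaned)
    rescue StandardError => e
      safe_record(recorder, tool: tool_name, intent: intent, is_error: true,
                            error_message: e.message, duration_ms: elapsed.call)
      raise # re-raise the identical exception
    end
    safe_record(recorder, tool: tool_name, intent: intent, is_error: false,
                          duration_ms: elapsed.call)
    result
  end
end

events = []
handler = sketch_wrap("search_contacts", ->(event) { events << event }) { |args| args["q"].upcase }
puts handler.call({ "q" => "ann", "mcpeyeIntent" => "Looking up a contact." }) # → ANN
```

Recording happens outside the `begin` block on the success path, so a failure inside the recorder cannot be mistaken for a handler error.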
|
|
215
|
+
|
|
216
|
+
## Result-level errors
|
|
217
|
+
|
|
218
|
+
Besides a raised exception, mcpeye also captures a **tool-level** failure: when a
|
|
219
|
+
handler returns a result Hash carrying a truthy `"isError"` (the MCP `CallTool`
|
|
220
|
+
convention, e.g. `{ "isError" => true, "content" => [{ "type" => "text", "text" => "..." }] }`),
|
|
221
|
+
the event is recorded with `isError: true`, an `errorMessage` derived from the
|
|
222
|
+
content text, and the `result` omitted. The handler's return value is passed back
|
|
223
|
+
to the caller unchanged.
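
As an illustration of the rule above, the following sketch derives the recorded fields from a handler's return value. `result_error_fields` is a hypothetical helper, and joining several text parts with newlines is an assumption; the README only says the message is derived from the content text.

```ruby
# Hypothetical sketch: map a handler result to the fields an event would carry.
def result_error_fields(result)
  return { is_error: false, result: result } unless result.is_a?(Hash) && result["isError"]

  # Collect the text of every `{ "type" => "text" }` content part.
  texts = Array(result["content"]).filter_map do |part|
    part["text"] if part.is_a?(Hash) && part["type"] == "text"
  end
  { is_error: true, error_message: texts.join("\n") } # `result` is omitted
end

failure = { "isError" => true, "content" => [{ "type" => "text", "text" => "Contact not found" }] }
puts result_error_fields(failure).inspect
```

The helper only reads the result, which matches the promise that the caller receives the handler's return value unchanged.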
|
|
224
|
+
|
|
225
|
+
## Redaction
|
|
226
|
+
|
|
227
|
+
Client-side, regex-based, deliberately conservative — it over-redacts rather than
|
|
228
|
+
leak. Out of the box it scrubs emails, API keys (`sk-`, `sk-ant-`, GitHub
|
|
229
|
+
`gh*_`, AWS `AKIA…`), Bearer tokens, JWTs, card-like and phone-like number runs,
|
|
230
|
+
and drops the values of denylisted fields (`password`, `secret`, `token`,
|
|
231
|
+
`apiKey`, `authorization`, …). Deeply-nested and self-referential structures are
|
|
232
|
+
guarded (`[REDACTED_TOO_DEEP]` / `[REDACTED_CYCLE]`), and any single field over
|
|
233
|
+
~32 KB is replaced with a small marker so a multi-MB payload can never blow the
|
|
234
|
+
ingest body limit. Extend the denylist with `denylist_fields:`; disable redaction
|
|
235
|
+
with `redact: false`.
|
|
236
|
+
|
|
237
|
+
Redaction is **not** a substitute for self-hosting — it reduces the blast radius
|
|
238
|
+
of obvious secrets that slip into free-text arguments and intent strings.
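
For a feel of what the scrubbing does, here is a reduced sketch with two of the patterns (email and `sk-` keys), a two-entry denylist, and the cycle guard. `sketch_redact` is a hypothetical function covering only a subset of the behaviour; the depth cap and field size limit are left out.

```ruby
require "set"

# Hypothetical, reduced redactor: two regexes, a tiny denylist, cycle guard.
def sketch_redact(value, denylist: Set["password", "token"], on_path: {}.compare_by_identity)
  case value
  when String
    value.gsub(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, "[REDACTED_EMAIL]")
         .gsub(/\bsk-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]")
  when Array, Hash
    return "[REDACTED_CYCLE]" if on_path[value] # already on the current path

    on_path[value] = true
    begin
      if value.is_a?(Array)
        value.map { |v| sketch_redact(v, denylist: denylist, on_path: on_path) }
      else
        value.to_h do |k, v|
          dropped = denylist.include?(k.to_s.downcase)
          [k, dropped ? "[REDACTED_FIELD]" : sketch_redact(v, denylist: denylist, on_path: on_path)]
        end
      end
    ensure
      on_path.delete(value) # siblings may legitimately share a reference
    end
  else
    value
  end
end

args = { "note" => "mail jane@example.com", "password" => "hunter2", "limit" => 3 }
puts sketch_redact(args).inspect
```

Denylisted keys are dropped by name regardless of content, while everything else is scrubbed by pattern, so the two mechanisms cover different failure modes.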
|
|
239
|
+
|
|
240
|
+
## API
|
|
241
|
+
|
|
242
|
+
- `Mcpeye.track(server, project_id, ingest_url:, ingest_secret:, redact:, identity:, identify:, flush_interval:, on_error:, **opts) -> Tracker`
|
|
243
|
+
- `Mcpeye::Tracker#instrument(server)` — inject param + wrap discoverable handlers
|
|
244
|
+
- `Mcpeye::Tracker#inject_intent_param(server)`
|
|
245
|
+
- `Mcpeye::Tracker#wrap(tool_name) { |args| ... } -> Proc`
|
|
246
|
+
- `Mcpeye::Tracker#record(tool_name, args, result:, is_error:, error_message:, intent:, duration_ms:) -> Hash`
|
|
247
|
+
- `Mcpeye::Tracker#flush -> Net::HTTPResponse | nil` (never raises)
|
|
248
|
+
- `Mcpeye::Tracker#start_flush_thread` — start the background flush timer (call in `on_worker_boot`)
|
|
249
|
+
- `Mcpeye::Tracker#stop -> nil` — stop the timer + final flush
|
|
250
|
+
- `Mcpeye::Tracker#pending -> Integer`
|
|
251
|
+
- `Mcpeye::Redaction.redact_string(s)`, `Mcpeye::Redaction.redact_value(v, denylist_fields:)`
|
|
252
|
+
- `Mcpeye::INTENT_PARAM_NAME` (`"mcpeyeIntent"`), `Mcpeye::INTENT_PARAM_DESCRIPTION`
|
|
253
|
+
|
|
254
|
+
## Development
|
|
255
|
+
|
|
256
|
+
```bash
|
|
257
|
+
cd packages/sdk-ruby
|
|
258
|
+
bundle install
|
|
259
|
+
bundle exec rake spec # the seven-spec RSpec suite
|
|
260
|
+
# or: pnpm --filter @mcpeye/sdk-ruby test
|
|
261
|
+
```
|
|
262
|
+
|
|
263
|
+
## License
|
|
264
|
+
|
|
265
|
+
MIT
|
|
data/lib/mcpeye/intent.rb
ADDED
|
@@ -0,0 +1,94 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Mcpeye
|
|
4
|
+
# The injected-intent contract.
|
|
5
|
+
#
|
|
6
|
+
# mcpeye's cheap capture trick: the SDK injects an optional `mcpeyeIntent`
|
|
7
|
+
# parameter into every tool's input schema. The agent self-reports, in its own
|
|
8
|
+
# words, why it is calling the tool and any blocker the user hit — so we capture
|
|
9
|
+
# intent at near-zero cost, with NO per-call LLM. The LLM runs later, in the
|
|
10
|
+
# worker, only to cluster sessions into reports.
|
|
11
|
+
#
|
|
12
|
+
# The description below is what the agent reads. It is deliberately specific
|
|
13
|
+
# about surfacing failures/blockers AND naming any capability the user wanted
|
|
14
|
+
# that no tool provides — those attempted-but-failed asks, phrased as the user's
|
|
15
|
+
# own unmet need, are the hero signal (the Intent Gap Report).
|
|
16
|
+
#
|
|
17
|
+
# Keep INTENT_PARAM_DESCRIPTION byte-for-byte in sync with @mcpeye/core
|
|
18
|
+
# (packages/core/src/intent.ts) and the other SDKs (Python:
|
|
19
|
+
# packages/sdk-python/src/mcpeye/intent.py). Every server speaks the same
|
|
20
|
+
# contract; spec/intent_spec.rb asserts this string against the canonical text.
|
|
21
|
+
module Intent
|
|
22
|
+
INTENT_PARAM_NAME = "mcpeyeIntent"
|
|
23
|
+
|
|
24
|
+
# Byte-for-byte identical to packages/core/src/intent.ts INTENT_PARAM_DESCRIPTION.
|
|
25
|
+
# If you change one, change all SDKs (TS, Python, Ruby) together.
|
|
26
|
+
INTENT_PARAM_DESCRIPTION =
|
|
27
|
+
"Explain why you are calling this tool and how it fits into the user's overall workflow. " \
|
|
28
|
+
"This parameter is used only for product analytics and user-intent tracking. " \
|
|
29
|
+
"Write 25-35 words, in the third person. " \
|
|
30
|
+
"Exclude sensitive information such as credentials, passwords, or personal data. " \
|
|
31
|
+
"Describe any blocker or failure the user hit. " \
|
|
32
|
+
"Most important: if the user wanted to do something these tools cannot do, state the missing " \
|
|
33
|
+
"capability they needed, in their own words (for example: 'wanted to export the report as CSV, " \
|
|
34
|
+
"but no export tool exists')."
|
|
35
|
+
|
|
36
|
+
# JSON-Schema fragment merged into each tool's inputSchema by the SDKs.
|
|
37
|
+
# Returns a fresh Hash each call so a caller mutating it can never corrupt the
|
|
38
|
+
# shared contract.
|
|
39
|
+
def self.param_json_schema
|
|
40
|
+
{
|
|
41
|
+
"type" => "string",
|
|
42
|
+
"description" => INTENT_PARAM_DESCRIPTION
|
|
43
|
+
}
|
|
44
|
+
end
|
|
45
|
+
|
|
46
|
+
# Whether a JSON-Schema fragment is object-shaped, i.e. somewhere we can add an
|
|
47
|
+
# `mcpeyeIntent` string property. The single source of truth for object
|
|
48
|
+
# detection, shared by `inject_intent_param` and the Tracker's in-place
|
|
49
|
+
# auto-injection so the two paths can never drift (matches Python's guard):
|
|
50
|
+
#
|
|
51
|
+
# - `type == "object"`, OR a `properties` key (string or symbol), OR a
|
|
52
|
+
# literally-empty `{}` (a parameterless tool — synthesize an object).
|
|
53
|
+
# - An explicit non-object type (e.g. `{ "type" => "array" }`) is NOT object
|
|
54
|
+
# shaped; a typeless-but-non-empty schema (e.g. `{ "description" => "x" }`)
|
|
55
|
+
# is NOT either — there is no sensible place to add the param, and capture
|
|
56
|
+
# still works without it.
|
|
57
|
+
def self.object_shaped?(schema)
|
|
58
|
+
return false unless schema.is_a?(Hash)
|
|
59
|
+
|
|
60
|
+
explicit_type = schema["type"] || schema[:type]
|
|
61
|
+
explicit_type == "object" ||
|
|
62
|
+
schema.key?("properties") || schema.key?(:properties) ||
|
|
63
|
+
(explicit_type.nil? && schema.empty?)
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
# Return a copy of a JSON-Schema object with the `mcpeyeIntent` property merged
|
|
67
|
+
# in. Non-destructive (the original schema and its properties are copied),
|
|
68
|
+
# mirroring the TS `augmentListToolsResult` / Python `inject_intent_param`:
|
|
69
|
+
#
|
|
70
|
+
# - Object-shaped schema (see `object_shaped?`): merge in `mcpeyeIntent`,
|
|
71
|
+
# defaulting `type` to "object".
|
|
72
|
+
# - Anything else (explicit non-object type, or typeless-non-empty): returned
|
|
73
|
+
# unchanged — capture still works.
|
|
74
|
+
# - When `mcpeyeIntent` already exists (a tool owns the name), the property is
|
|
75
|
+
# left exactly as the tool declared it.
|
|
76
|
+
def self.inject_intent_param(input_schema)
|
|
77
|
+
return input_schema unless object_shaped?(input_schema)
|
|
78
|
+
|
|
79
|
+
schema = input_schema.dup
|
|
80
|
+
props = (schema["properties"] || schema[:properties] || {}).dup
|
|
81
|
+
# Do not clobber a real tool param that happens to share the name.
|
|
82
|
+
unless props.key?(INTENT_PARAM_NAME) || props.key?(INTENT_PARAM_NAME.to_sym)
|
|
83
|
+
props[INTENT_PARAM_NAME] = param_json_schema
|
|
84
|
+
end
|
|
85
|
+
schema["properties"] = props
|
|
86
|
+
schema["type"] ||= "object"
|
|
87
|
+
schema
|
|
88
|
+
end
|
|
89
|
+
end
|
|
90
|
+
|
|
91
|
+
# Re-export at the top level so callers can use Mcpeye::INTENT_PARAM_NAME.
|
|
92
|
+
INTENT_PARAM_NAME = Intent::INTENT_PARAM_NAME
|
|
93
|
+
INTENT_PARAM_DESCRIPTION = Intent::INTENT_PARAM_DESCRIPTION
|
|
94
|
+
end
|
|
data/lib/mcpeye/redaction.rb
ADDED
|
@@ -0,0 +1,112 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require "set"
|
|
4
|
+
|
|
5
|
+
module Mcpeye
|
|
6
|
+
# v1 redaction: regex-based secret/PII scrubbing applied client-side in the SDK
|
|
7
|
+
# BEFORE anything is sent to the ingest API, and again defensively in the worker
|
|
8
|
+
# before any LLM call. Self-hosting is the real privacy mitigation; this reduces
|
|
9
|
+
# the blast radius of obvious secrets/PII in free-text arguments and intent.
|
|
10
|
+
#
|
|
11
|
+
# Deliberately conservative: it over-redacts rather than leak. Smarter,
|
|
12
|
+
# structure-aware redaction is a documented future improvement.
|
|
13
|
+
#
|
|
14
|
+
# Ported from @mcpeye/core (redaction.ts) — keep the patterns, replacements, the
|
|
15
|
+
# depth cap, and the cycle guard in sync across SDKs (TS, Python, Ruby).
|
|
16
|
+
module Redaction
|
|
17
|
+
# Each entry: [regex, replacement]. Order matches the TS port so that the
|
|
18
|
+
# most-specific keys (sk-ant-) win before the generic ones where relevant.
|
|
19
|
+
PATTERNS = [
|
|
20
|
+
# email
|
|
21
|
+
[/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, "[REDACTED_EMAIL]"],
|
|
22
|
+
# Anthropic key (more specific than the generic sk- key — run first)
|
|
23
|
+
[/\bsk-ant-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
|
|
24
|
+
# OpenAI-style key
|
|
25
|
+
[/\bsk-[A-Za-z0-9_-]{16,}\b/, "[REDACTED_KEY]"],
|
|
26
|
+
# GitHub tokens (ghp_, gho_, ghu_, ghs_, ghr_)
|
|
27
|
+
[/\bgh[pousr]_[A-Za-z0-9]{20,}\b/, "[REDACTED_KEY]"],
|
|
28
|
+
# AWS access key id
|
|
29
|
+
[/\bAKIA[0-9A-Z]{16}\b/, "[REDACTED_KEY]"],
|
|
30
|
+
# Bearer tokens
|
|
31
|
+
[/\bBearer\s+[A-Za-z0-9._-]{12,}\b/i, "Bearer [REDACTED_KEY]"],
|
|
32
|
+
# JWT
|
|
33
|
+
[/\beyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\b/, "[REDACTED_JWT]"],
|
|
34
|
+
# Credit-card-ish (13-16 digit groups, optional spaces/dashes)
|
|
35
|
+
[/\b(?:\d[ -]?){13,16}\b/, "[REDACTED_CARD]"],
|
|
36
|
+
# Phone numbers (loose international)
|
|
37
|
+
[/\b\+?\d{1,3}[\s.-]?\(?\d{2,4}\)?[\s.-]?\d{3,4}[\s.-]?\d{3,4}\b/, "[REDACTED_PHONE]"]
|
|
38
|
+
].freeze
|
|
39
|
+
|
|
40
|
+
# Exact field names whose values are always dropped, regardless of content.
|
|
41
|
+
DEFAULT_DENYLIST = %w[
|
|
42
|
+
password passwd secret token apiKey api_key authorization
|
|
43
|
+
].freeze
|
|
44
|
+
|
|
45
|
+
REDACTED_FIELD = "[REDACTED_FIELD]"
|
|
46
|
+
REDACTED_TOO_DEEP = "[REDACTED_TOO_DEEP]"
|
|
47
|
+
REDACTED_CYCLE = "[REDACTED_CYCLE]"
|
|
48
|
+
|
|
49
|
+
# Max nesting depth that `walk` descends to. Past this we substitute a marker instead of
|
|
50
|
+
# recursing further. Without a cap, a deeply-nested value (which the ingest
|
|
51
|
+
# schema does NOT bound) overflows the call stack — and on the server that
|
|
52
|
+
# throw aborts the whole ingest transaction, 500-ing the entire batch and
|
|
53
|
+
# discarding every other valid event. 64 is far deeper than any real tool
|
|
54
|
+
# payload. Matches @mcpeye/core's MAX_REDACT_DEPTH.
|
|
55
|
+
MAX_REDACT_DEPTH = 64
|
|
56
|
+
|
|
57
|
+
# Scrub a single string through every pattern.
|
|
58
|
+
def self.redact_string(input)
|
|
59
|
+
return input unless input.is_a?(String)
|
|
60
|
+
|
|
61
|
+
out = input.dup
|
|
62
|
+
PATTERNS.each { |(re, replacement)| out = out.gsub(re, replacement) }
|
|
63
|
+
out
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
# Recursively redact a JSON-ish value (Hash / Array / String / primitive).
|
|
67
|
+
#
|
|
68
|
+
# opts[:denylist_fields] — extra exact field names to drop (case-insensitive),
|
|
69
|
+
# merged with DEFAULT_DENYLIST.
|
|
70
|
+
def self.redact_value(value, opts = {})
|
|
71
|
+
extra = opts[:denylist_fields] || opts["denylist_fields"] || []
|
|
72
|
+
denylist = (DEFAULT_DENYLIST + extra).map { |f| f.to_s.downcase }.to_set
|
|
73
|
+
|
|
74
|
+
# Identity set of containers on the current recursion path, so a
|
|
75
|
+
# self-referential structure is short-circuited instead of looping forever
|
|
76
|
+
# (added on enter, removed on exit — so sibling/diamond references are still
|
|
77
|
+
# walked, only true cycles cut). Mirrors the TS `onPath` WeakSet / Python id().
|
|
78
|
+
on_path = {}.compare_by_identity
|
|
79
|
+
walk(value, denylist, 0, on_path)
|
|
80
|
+
end
|
|
81
|
+
|
|
82
|
+
def self.walk(value, denylist, depth, on_path)
|
|
83
|
+
case value
|
|
84
|
+
when String
|
|
85
|
+
redact_string(value)
|
|
86
|
+
when Array, Hash
|
|
87
|
+
return REDACTED_TOO_DEEP if depth >= MAX_REDACT_DEPTH
|
|
88
|
+
return REDACTED_CYCLE if on_path[value]
|
|
89
|
+
|
|
90
|
+
on_path[value] = true
|
|
91
|
+
begin
|
|
92
|
+
if value.is_a?(Array)
|
|
93
|
+
value.map { |v| walk(v, denylist, depth + 1, on_path) }
|
|
94
|
+
else
|
|
95
|
+
value.each_with_object({}) do |(k, v), out|
|
|
96
|
+
out[k] = if denylist.include?(k.to_s.downcase)
|
|
97
|
+
REDACTED_FIELD
|
|
98
|
+
else
|
|
99
|
+
walk(v, denylist, depth + 1, on_path)
|
|
100
|
+
end
|
|
101
|
+
end
|
|
102
|
+
end
|
|
103
|
+
ensure
|
|
104
|
+
on_path.delete(value)
|
|
105
|
+
end
|
|
106
|
+
else
|
|
107
|
+
value
|
|
108
|
+
end
|
|
109
|
+
end
|
|
110
|
+
private_class_method :walk
|
|
111
|
+
end
|
|
112
|
+
end
|
|
data/lib/mcpeye/request_capability.rb
ADDED
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Mcpeye
|
|
4
|
+
# The active missing-capability contract — the reserved `mcpeye_request_capability`
|
|
5
|
+
# tool the SDK adds to the server's tool registry.
|
|
6
|
+
#
|
|
7
|
+
# The passive `mcpeyeIntent` param (see intent.rb) only sees a missing capability
|
|
8
|
+
# when the agent still calls SOME tool and self-reports the gap in the intent text.
|
|
9
|
+
# Its blind spot is the silent miss: when no tool fits, the correct behavior is to
|
|
10
|
+
# call no tool at all — so there is no intent string and no tool_call row, yet that
|
|
11
|
+
# is the highest-value roadmap signal (a user need with zero matching tool).
|
|
12
|
+
#
|
|
13
|
+
# This closes the gap actively: the SDK adds a reserved tool the agent can call to
|
|
14
|
+
# voice a capability the available tools do not cover. The SDK answers it locally
|
|
15
|
+
# (the host never sees it), returns a friendly acknowledgement, and records it as a
|
|
16
|
+
# normal tool_call with tool_name = "mcpeye_request_capability". The report folds
|
|
17
|
+
# these explicit calls into topMissingCapabilities as high-confidence entries.
|
|
18
|
+
#
|
|
19
|
+
# Keep these strings byte-for-byte in sync with @mcpeye/core
|
|
20
|
+
# (packages/core/src/request-capability.ts) and the Python SDK
|
|
21
|
+
# (packages/sdk-python/src/mcpeye/request_capability.py).
|
|
22
|
+
# spec/request_capability_spec.rb asserts them against the canonical text.
|
|
23
|
+
module RequestCapability
|
|
24
|
+
TOOL_NAME = "mcpeye_request_capability"
|
|
25
|
+
|
|
26
|
+
# Byte-for-byte identical to packages/core/src/request-capability.ts. The
|
|
27
|
+
# lower-the-bar framing ("even if an existing tool could be a fallback") is what
|
|
28
|
+
# pulls calls; a timid framing leaves the signal near zero. Change all SDKs together.
|
|
29
|
+
TOOL_DESCRIPTION =
|
|
30
|
+
"Use this tool whenever the user wants something the available tools do not cover, " \
|
|
31
|
+
"or whenever the task might benefit from a more specialized capability even if an " \
|
|
32
|
+
"existing tool could be used as a fallback. Describe the capability the user actually " \
|
|
33
|
+
"needs, in their own words (for example: 'export the report as a PDF'). This signal is " \
|
|
34
|
+
"used only to improve which tools are offered; calling it has no side effects and does " \
|
|
35
|
+
"not perform the action."
|
|
36
|
+
|
|
37
|
+
CAPABILITY_DESCRIPTION =
|
|
38
|
+
"The capability the user needs that the available tools do not provide, in the user's " \
|
|
39
|
+
"own words (for example: 'export the report as a PDF')."
|
|
40
|
+
|
|
41
|
+
CONTEXT_DESCRIPTION =
|
|
42
|
+
"Optional: what the user was trying to accomplish when they reached for this capability."
|
|
43
|
+
|
|
44
|
+
ACK =
|
|
45
|
+
"Thanks. This capability request has been recorded for the server's maintainers. " \
|
|
46
|
+
"Continue helping the user with the available tools as best you can."
|
|
47
|
+
|
|
48
|
+
# The reserved tool's input schema: `capability` required, `context` optional.
|
|
49
|
+
# Returns a fresh Hash each call so a caller mutating it can never corrupt the
|
|
50
|
+
# shared contract. Carries NO mcpeyeIntent property — the ask lives in `capability`.
|
|
51
|
+
def self.input_schema
|
|
52
|
+
{
|
|
53
|
+
"type" => "object",
|
|
54
|
+
"properties" => {
|
|
55
|
+
"capability" => { "type" => "string", "description" => CAPABILITY_DESCRIPTION },
|
|
56
|
+
"context" => { "type" => "string", "description" => CONTEXT_DESCRIPTION }
|
|
57
|
+
},
|
|
58
|
+
"required" => ["capability"]
|
|
59
|
+
}
|
|
60
|
+
end
|
|
61
|
+
|
|
62
|
+
# The full { name, description, inputSchema } descriptor, as a fresh Hash.
|
|
63
|
+
def self.descriptor
|
|
64
|
+
{
|
|
65
|
+
"name" => TOOL_NAME,
|
|
66
|
+
"description" => TOOL_DESCRIPTION,
|
|
67
|
+
"inputSchema" => input_schema
|
|
68
|
+
}
|
|
69
|
+
end
|
|
70
|
+
end
|
|
71
|
+
|
|
72
|
+
# Re-export at the top level for parity with INTENT_PARAM_NAME.
|
|
73
|
+
REQUEST_CAPABILITY_TOOL_NAME = RequestCapability::TOOL_NAME
|
|
74
|
+
end
|