eve 0.6.0-beta.9 → 0.7.2
- package/CHANGELOG.md +281 -0
- package/README.md +9 -6
- package/dist/docs/public/README.md +17 -12
- package/dist/docs/public/agent-config.md +10 -10
- package/dist/docs/public/channels/custom.mdx +4 -4
- package/dist/docs/public/channels/discord.mdx +1 -1
- package/dist/docs/public/channels/eve.mdx +10 -10
- package/dist/docs/public/channels/github.mdx +1 -1
- package/dist/docs/public/channels/overview.mdx +21 -15
- package/dist/docs/public/channels/slack.mdx +16 -4
- package/dist/docs/public/channels/teams.mdx +1 -1
- package/dist/docs/public/channels/telegram.mdx +1 -1
- package/dist/docs/public/channels/twilio.mdx +1 -1
- package/dist/docs/public/{advanced → concepts}/context-control.md +3 -3
- package/dist/docs/public/{advanced → concepts}/default-harness.md +5 -5
- package/dist/docs/public/{advanced → concepts}/execution-model-and-durability.md +3 -1
- package/dist/docs/public/concepts/meta.json +10 -0
- package/dist/docs/public/{advanced → concepts}/security-model.md +3 -3
- package/dist/docs/public/{advanced → concepts}/sessions-runs-and-streaming.md +7 -7
- package/dist/docs/public/connections.mdx +6 -4
- package/dist/docs/public/evals/assertions.mdx +108 -0
- package/dist/docs/public/evals/cases.mdx +143 -0
- package/dist/docs/public/evals/judge.mdx +94 -0
- package/dist/docs/public/evals/meta.json +4 -0
- package/dist/docs/public/evals/overview.mdx +118 -0
- package/dist/docs/public/evals/reporters.mdx +62 -0
- package/dist/docs/public/evals/running.mdx +63 -0
- package/dist/docs/public/evals/targets.mdx +54 -0
- package/dist/docs/public/getting-started.mdx +38 -33
- package/dist/docs/public/{advanced → guides}/auth-and-route-protection.md +5 -3
- package/dist/docs/public/{client → guides/client}/continuations.mdx +2 -2
- package/dist/docs/public/{client → guides/client}/messages.mdx +1 -1
- package/dist/docs/public/{client → guides/client}/meta.json +1 -1
- package/dist/docs/public/{client → guides/client}/output-schema.mdx +2 -2
- package/dist/docs/public/{client → guides/client}/overview.mdx +5 -5
- package/dist/docs/public/{client → guides/client}/streaming.mdx +1 -1
- package/dist/docs/public/{advanced → guides}/deployment.md +9 -1
- package/dist/docs/public/guides/dev-tui.md +50 -0
- package/dist/docs/public/{advanced → guides}/dynamic-capabilities.md +1 -1
- package/dist/docs/public/{advanced → guides}/dynamic-workflows.md +1 -1
- package/dist/docs/public/{frontend → guides/frontend}/nextjs.mdx +16 -7
- package/dist/docs/public/{frontend → guides/frontend}/nuxt.mdx +7 -7
- package/dist/docs/public/{frontend → guides/frontend}/overview.mdx +6 -6
- package/dist/docs/public/{frontend → guides/frontend}/sveltekit.mdx +5 -5
- package/dist/docs/public/{frontend → guides/frontend}/use-eve-agent-svelte.mdx +2 -2
- package/dist/docs/public/{frontend → guides/frontend}/use-eve-agent-vue.mdx +2 -2
- package/dist/docs/public/{advanced → guides}/hooks.md +2 -2
- package/dist/docs/public/{advanced → guides}/instrumentation.md +3 -1
- package/dist/docs/public/{advanced → guides}/meta.json +8 -12
- package/dist/docs/public/{advanced → guides}/session-context.md +3 -3
- package/dist/docs/public/{advanced → guides}/state.md +1 -1
- package/dist/docs/public/instructions.mdx +2 -2
- package/dist/docs/public/introduction.md +5 -2
- package/dist/docs/public/meta.json +4 -3
- package/dist/docs/public/reference/cli.md +35 -19
- package/dist/docs/public/reference/meta.json +1 -1
- package/dist/docs/public/reference/project-layout.md +5 -1
- package/dist/docs/public/reference/typescript-api.md +27 -23
- package/dist/docs/public/sandbox.mdx +1 -1
- package/dist/docs/public/schedules.mdx +2 -2
- package/dist/docs/public/skills.mdx +3 -3
- package/dist/docs/public/subagents.mdx +3 -3
- package/dist/docs/public/tools.mdx +4 -8
- package/dist/docs/public/tutorial/connect-a-warehouse.mdx +3 -3
- package/dist/docs/public/tutorial/first-agent.mdx +6 -3
- package/dist/docs/public/tutorial/guard-the-spend.mdx +1 -1
- package/dist/docs/public/tutorial/how-it-runs.mdx +2 -2
- package/dist/docs/public/tutorial/meta.json +1 -1
- package/dist/docs/public/tutorial/query-sample-data.mdx +1 -1
- package/dist/docs/public/tutorial/remember-definitions.mdx +3 -3
- package/dist/docs/public/tutorial/run-analysis.mdx +1 -1
- package/dist/docs/public/tutorial/ship-it.mdx +4 -4
- package/dist/docs/public/tutorial/team-playbooks.mdx +3 -3
- package/dist/src/chunks/{use-eve-agent-DCZbkLG7.js → use-eve-agent-DErQj5hs.js} +125 -37
- package/dist/src/chunks/{use-eve-agent-DoheC4_o.js → use-eve-agent-DoR8C4i6.js} +125 -37
- package/dist/src/cli/banner.d.ts +7 -0
- package/dist/src/cli/banner.js +1 -0
- package/dist/src/cli/commands/channel-add-conflicts.d.ts +1 -1
- package/dist/src/cli/commands/channels.d.ts +9 -6
- package/dist/src/cli/commands/channels.js +1 -1
- package/dist/src/cli/commands/deploy.d.ts +21 -0
- package/dist/src/cli/commands/deploy.js +1 -0
- package/dist/src/cli/commands/init-git.d.ts +15 -0
- package/dist/src/cli/commands/init-git.js +1 -0
- package/dist/src/cli/commands/init.d.ts +29 -0
- package/dist/src/cli/commands/init.js +1 -0
- package/dist/src/cli/commands/link.d.ts +21 -0
- package/dist/src/cli/commands/link.js +1 -0
- package/dist/src/cli/commands/preconditions.d.ts +7 -0
- package/dist/src/cli/commands/preconditions.js +1 -0
- package/dist/src/cli/commands/register-project-commands.d.ts +12 -0
- package/dist/src/cli/commands/register-project-commands.js +1 -0
- package/dist/src/cli/dev/tui/agent-header.d.ts +15 -9
- package/dist/src/cli/dev/tui/agent-header.js +1 -1
- package/dist/src/cli/dev/tui/blocks.d.ts +1 -1
- package/dist/src/cli/dev/tui/blocks.js +3 -2
- package/dist/src/cli/dev/tui/command-typeahead.d.ts +47 -0
- package/dist/src/cli/dev/tui/command-typeahead.js +1 -0
- package/dist/src/cli/dev/tui/dev-rebuild-status.d.ts +21 -0
- package/dist/src/cli/dev/tui/dev-rebuild-status.js +1 -0
- package/dist/src/cli/dev/tui/errors.d.ts +18 -0
- package/dist/src/cli/dev/tui/errors.js +1 -1
- package/dist/src/cli/dev/tui/prompt-command-handler.d.ts +14 -0
- package/dist/src/cli/dev/tui/prompt-command-handler.js +1 -0
- package/dist/src/cli/dev/tui/prompt-commands.d.ts +54 -0
- package/dist/src/cli/dev/tui/prompt-commands.js +2 -0
- package/dist/src/cli/dev/tui/runner.d.ts +64 -7
- package/dist/src/cli/dev/tui/runner.js +1 -1
- package/dist/src/cli/dev/tui/setup-commands.d.ts +48 -0
- package/dist/src/cli/dev/tui/setup-commands.js +2 -0
- package/dist/src/cli/dev/tui/setup-flow.d.ts +35 -0
- package/dist/src/cli/dev/tui/setup-issues.d.ts +40 -0
- package/dist/src/cli/dev/tui/setup-issues.js +1 -0
- package/dist/src/cli/dev/tui/setup-panel.d.ts +103 -0
- package/dist/src/cli/dev/tui/setup-panel.js +1 -0
- package/dist/src/cli/dev/tui/status-line.d.ts +25 -0
- package/dist/src/cli/dev/tui/status-line.js +1 -0
- package/dist/src/cli/dev/tui/stream-format.d.ts +16 -1
- package/dist/src/cli/dev/tui/stream-format.js +1 -1
- package/dist/src/cli/dev/tui/terminal-renderer.d.ts +32 -3
- package/dist/src/cli/dev/tui/terminal-renderer.js +5 -2
- package/dist/src/cli/dev/tui/test/index.d.ts +3 -1
- package/dist/src/cli/dev/tui/test/index.js +1 -1
- package/dist/src/cli/dev/tui/test/mock-terminal.d.ts +1 -0
- package/dist/src/cli/dev/tui/test/mock-terminal.js +1 -1
- package/dist/src/cli/dev/tui/theme.d.ts +10 -0
- package/dist/src/cli/dev/tui/theme.js +1 -1
- package/dist/src/cli/dev/tui/tui-prompter.d.ts +20 -0
- package/dist/src/cli/dev/tui/tui-prompter.js +1 -0
- package/dist/src/cli/dev/tui/tui.d.ts +6 -8
- package/dist/src/cli/dev/tui/tui.js +1 -1
- package/dist/src/cli/dev/tui/types.d.ts +4 -3
- package/dist/src/cli/dev/tui/vercel-status.d.ts +47 -0
- package/dist/src/cli/dev/tui/vercel-status.js +1 -0
- package/dist/src/cli/run.d.ts +9 -18
- package/dist/src/cli/run.js +2 -2
- package/dist/src/client/client.d.ts +8 -0
- package/dist/src/client/client.js +1 -1
- package/dist/src/client/file-parts.d.ts +18 -0
- package/dist/src/client/file-parts.js +1 -0
- package/dist/src/client/index.d.ts +3 -2
- package/dist/src/client/index.js +1 -1
- package/dist/src/client/message-response.js +1 -1
- package/dist/src/client/open-stream.d.ts +6 -0
- package/dist/src/client/open-stream.js +1 -1
- package/dist/src/client/session-utils.d.ts +5 -0
- package/dist/src/client/session-utils.js +1 -1
- package/dist/src/client/session.js +1 -1
- package/dist/src/client/types.d.ts +9 -2
- package/dist/src/compiled/.vendor-stamp.json +8 -8
- package/dist/src/compiled/@ai-sdk/anthropic/index.d.ts +56 -31
- package/dist/src/compiled/@ai-sdk/anthropic/index.js +2 -2
- package/dist/src/compiled/@ai-sdk/google/index.js +1 -1
- package/dist/src/compiled/@ai-sdk/mcp/index.js +1 -1
- package/dist/src/compiled/@ai-sdk/openai/index.d.ts +16 -9
- package/dist/src/compiled/@ai-sdk/openai/index.js +2 -2
- package/dist/src/compiled/@ai-sdk/otel/index.js +2 -2
- package/dist/src/compiled/@vercel/sandbox/index.js +1 -1
- package/dist/src/compiled/@workflow/core/capabilities.d.ts +19 -1
- package/dist/src/compiled/@workflow/core/class-serialization.d.ts +32 -0
- package/dist/src/compiled/@workflow/core/create-hook.d.ts +37 -0
- package/dist/src/compiled/@workflow/core/global.d.ts +11 -1
- package/dist/src/compiled/@workflow/core/index.js +2 -2
- package/dist/src/compiled/@workflow/core/runtime/helpers.d.ts +4 -2
- package/dist/src/compiled/@workflow/core/runtime/start.d.ts +6 -0
- package/dist/src/compiled/@workflow/core/runtime/suspension-handler.d.ts +15 -2
- package/dist/src/compiled/@workflow/core/runtime/wait-continuation.d.ts +84 -0
- package/dist/src/compiled/@workflow/core/runtime/wait-until.d.ts +18 -0
- package/dist/src/compiled/@workflow/core/runtime.d.ts +3 -1
- package/dist/src/compiled/@workflow/core/runtime.js +28 -28
- package/dist/src/compiled/@workflow/core/serialization/types.d.ts +21 -0
- package/dist/src/compiled/@workflow/core/serialization.d.ts +113 -6
- package/dist/src/compiled/@workflow/core/symbols.d.ts +2 -0
- package/dist/src/compiled/@workflow/core/util.d.ts +0 -5
- package/dist/src/compiled/@workflow/core/version.d.ts +1 -1
- package/dist/src/compiled/@workflow/core/workflow/attribute-dispatcher.d.ts +6 -0
- package/dist/src/compiled/@workflow/core/workflow/set-attributes.d.ts +3 -4
- package/dist/src/compiled/@workflow/core/workflow.js +1 -1
- package/dist/src/compiled/@workflow/world/events.d.ts +48 -0
- package/dist/src/compiled/@workflow/world/index.d.ts +3 -3
- package/dist/src/compiled/@workflow/world/queue.d.ts +31 -2
- package/dist/src/compiled/@workflow/world/runs.d.ts +2 -0
- package/dist/src/compiled/@workflow/world/spec-version.d.ts +2 -1
- package/dist/src/compiled/_chunks/workflow/attribute-changes-DGVGRGfw.js +59 -0
- package/dist/src/compiled/_chunks/workflow/{dist-gEXVSMPU.js → dist-CkMRLaRV.js} +1 -1
- package/dist/src/compiled/_chunks/workflow/functions-DuPjIvMH.js +1 -0
- package/dist/src/compiled/_chunks/workflow/resume-hook-DMSadN9o.js +1 -0
- package/dist/src/compiled/_chunks/workflow/run-BRdn7zy_.js +1 -0
- package/dist/src/compiled/_chunks/workflow/sleep-CpXfoXLF.js +1 -0
- package/dist/src/compiled/just-bash/index.d.ts +4 -4
- package/dist/src/compiler/artifacts.js +1 -1
- package/dist/src/compiler/manifest.d.ts +8 -8
- package/dist/src/compiler/normalize-agent-config.js +1 -1
- package/dist/src/compiler/normalize-channel.d.ts +2 -1
- package/dist/src/compiler/normalize-channel.js +1 -1
- package/dist/src/compiler/normalize-connection.d.ts +2 -1
- package/dist/src/compiler/normalize-connection.js +1 -1
- package/dist/src/compiler/normalize-helpers.d.ts +5 -0
- package/dist/src/compiler/normalize-helpers.js +1 -1
- package/dist/src/compiler/normalize-instructions.d.ts +3 -2
- package/dist/src/compiler/normalize-instructions.js +1 -1
- package/dist/src/compiler/normalize-manifest.js +2 -2
- package/dist/src/compiler/normalize-sandbox.d.ts +2 -1
- package/dist/src/compiler/normalize-sandbox.js +1 -1
- package/dist/src/compiler/normalize-schedule.d.ts +2 -1
- package/dist/src/compiler/normalize-schedule.js +1 -1
- package/dist/src/compiler/normalize-skill.d.ts +2 -1
- package/dist/src/compiler/normalize-skill.js +1 -1
- package/dist/src/compiler/normalize-subagent.d.ts +4 -1
- package/dist/src/compiler/normalize-subagent.js +1 -1
- package/dist/src/compiler/normalize-tool.d.ts +2 -1
- package/dist/src/compiler/normalize-tool.js +1 -1
- package/dist/src/compiler/workspace-resources.js +1 -1
- package/dist/src/context/node.d.ts +1 -1
- package/dist/src/evals/assertions/collector.d.ts +43 -0
- package/dist/src/evals/assertions/collector.js +1 -0
- package/dist/src/evals/assertions/run.d.ts +72 -0
- package/dist/src/evals/assertions/run.js +2 -0
- package/dist/src/evals/autoevals-client.js +2 -0
- package/dist/src/evals/cli/eval-client.d.ts +22 -0
- package/dist/src/evals/cli/eval-client.js +1 -0
- package/dist/src/evals/cli/eval.d.ts +8 -5
- package/dist/src/evals/cli/eval.js +1 -1
- package/dist/src/evals/context.d.ts +19 -0
- package/dist/src/evals/context.js +1 -0
- package/dist/src/evals/define-eval-config.d.ts +16 -0
- package/dist/src/evals/define-eval-config.js +1 -0
- package/dist/src/evals/define-eval.d.ts +20 -0
- package/dist/src/evals/define-eval.js +1 -0
- package/dist/src/evals/expect/index.d.ts +25 -0
- package/dist/src/evals/expect/index.js +1 -0
- package/dist/src/evals/index.d.ts +6 -2
- package/dist/src/evals/index.js +1 -1
- package/dist/src/evals/judge.d.ts +20 -0
- package/dist/src/evals/judge.js +1 -0
- package/dist/src/evals/{checks/match.d.ts → match.d.ts} +17 -18
- package/dist/src/evals/match.js +1 -0
- package/dist/src/evals/reporters/index.d.ts +1 -0
- package/dist/src/evals/reporters/index.js +1 -1
- package/dist/src/evals/requirements.d.ts +3 -0
- package/dist/src/evals/requirements.js +1 -0
- package/dist/src/evals/runner/artifacts.d.ts +7 -6
- package/dist/src/evals/runner/artifacts.js +3 -3
- package/dist/src/evals/runner/discover.d.ts +31 -10
- package/dist/src/evals/runner/discover.js +1 -1
- package/dist/src/evals/runner/execute-eval.d.ts +25 -0
- package/dist/src/evals/runner/execute-eval.js +1 -0
- package/dist/src/evals/runner/execute-task.d.ts +31 -0
- package/dist/src/evals/runner/execute-task.js +1 -0
- package/dist/src/evals/runner/reporters/braintrust.d.ts +7 -5
- package/dist/src/evals/runner/reporters/braintrust.js +2 -2
- package/dist/src/evals/runner/reporters/console.d.ts +4 -4
- package/dist/src/evals/runner/reporters/console.js +1 -1
- package/dist/src/evals/runner/reporters/junit.d.ts +10 -0
- package/dist/src/evals/runner/reporters/junit.js +4 -0
- package/dist/src/evals/runner/reporters/types.d.ts +14 -8
- package/dist/src/evals/runner/run-evals.d.ts +38 -0
- package/dist/src/evals/runner/run-evals.js +1 -0
- package/dist/src/evals/runner/verdict.d.ts +10 -15
- package/dist/src/evals/runner/verdict.js +1 -1
- package/dist/src/evals/session.d.ts +52 -0
- package/dist/src/evals/session.js +1 -0
- package/dist/src/evals/target.d.ts +23 -0
- package/dist/src/evals/target.js +1 -0
- package/dist/src/evals/types.d.ts +294 -219
- package/dist/src/execution/compaction.d.ts +14 -0
- package/dist/src/execution/compaction.js +1 -0
- package/dist/src/execution/delegated-parent-notification.js +1 -1
- package/dist/src/execution/dispatch-runtime-actions-step.js +1 -1
- package/dist/src/execution/node-step.js +1 -1
- package/dist/src/execution/sandbox/bash-tool.d.ts +6 -6
- package/dist/src/execution/sandbox/bash-tool.js +1 -1
- package/dist/src/execution/sandbox/bindings/local.js +1 -1
- package/dist/src/execution/sandbox/bindings/vercel.d.ts +2 -6
- package/dist/src/execution/sandbox/bindings/vercel.js +1 -1
- package/dist/src/execution/sandbox/glob-tool.js +3 -3
- package/dist/src/execution/sandbox/grep-tool.js +3 -3
- package/dist/src/execution/sandbox/read-file-tool.js +1 -1
- package/dist/src/execution/subagent-adapter.js +1 -1
- package/dist/src/execution/tool-auth.js +1 -1
- package/dist/src/execution/turn-workflow.js +1 -1
- package/dist/src/execution/workflow-runtime.d.ts +2 -2
- package/dist/src/execution/workflow-runtime.js +1 -1
- package/dist/src/execution/workflow-steps.js +1 -1
- package/dist/src/harness/action-result-helpers.js +1 -1
- package/dist/src/harness/authorization.d.ts +26 -0
- package/dist/src/harness/authorization.js +1 -1
- package/dist/src/harness/code-mode-lifecycle.js +1 -1
- package/dist/src/harness/emission.d.ts +12 -5
- package/dist/src/harness/emission.js +1 -1
- package/dist/src/harness/model-call-error.d.ts +35 -6
- package/dist/src/harness/model-call-error.js +1 -1
- package/dist/src/harness/step-hooks.d.ts +10 -4
- package/dist/src/harness/step-hooks.js +1 -1
- package/dist/src/harness/tool-loop.js +1 -1
- package/dist/src/harness/tools.d.ts +4 -6
- package/dist/src/harness/tools.js +1 -1
- package/dist/src/harness/turn-tag-state.d.ts +4 -0
- package/dist/src/harness/turn-tag-state.js +1 -1
- package/dist/src/harness/types.d.ts +4 -15
- package/dist/src/internal/application/cache-metadata.js +1 -1
- package/dist/src/internal/application/compiled-artifacts.js +1 -1
- package/dist/src/internal/application/package.js +1 -1
- package/dist/src/internal/application/paths.js +1 -1
- package/dist/src/internal/authored-definition/schema-backed.js +1 -1
- package/dist/src/internal/authored-module-loader.d.ts +4 -1
- package/dist/src/internal/authored-module-loader.js +2 -2
- package/dist/src/internal/authored-module-map-loader.js +1 -1
- package/dist/src/internal/nitro/dev-runtime-artifacts.js +1 -1
- package/dist/src/internal/nitro/host/build-application.js +1 -1
- package/dist/src/internal/nitro/host/build-vercel-agent-summary.js +1 -1
- package/dist/src/internal/nitro/host/configure-nitro-routes.js +3 -3
- package/dist/src/internal/nitro/host/create-application-nitro.js +1 -1
- package/dist/src/internal/nitro/host/dev-authored-source-watcher.js +1 -1
- package/dist/src/internal/nitro/host/dev-watcher-log.d.ts +37 -0
- package/dist/src/internal/nitro/host/dev-watcher-log.js +1 -0
- package/dist/src/internal/nitro/host/ports.d.ts +8 -0
- package/dist/src/internal/nitro/host/ports.js +1 -0
- package/dist/src/internal/nitro/host/prepare-application-host.js +1 -1
- package/dist/src/internal/nitro/host/server-external-packages.d.ts +1 -1
- package/dist/src/internal/nitro/host/server-external-packages.js +1 -1
- package/dist/src/internal/nitro/host/start-development-server.js +1 -1
- package/dist/src/internal/nitro/host/start-production-server.js +1 -1
- package/dist/src/internal/nitro/routes/agent-info/build-agent-info-response-from-manifest.d.ts +5 -0
- package/dist/src/internal/nitro/routes/agent-info/build-agent-info-response-from-manifest.js +1 -0
- package/dist/src/internal/nitro/routes/agent-info/build-agent-info-response.d.ts +31 -2
- package/dist/src/internal/nitro/routes/agent-info/build-agent-info-response.js +1 -1
- package/dist/src/internal/nitro/routes/agent-info/load-agent-info-data.d.ts +13 -0
- package/dist/src/internal/nitro/routes/agent-info/load-agent-info-data.js +1 -1
- package/dist/src/internal/nitro/routes/info.d.ts +2 -2
- package/dist/src/internal/nitro/routes/info.js +1 -1
- package/dist/src/internal/workflow/queue-namespace.d.ts +5 -0
- package/dist/src/internal/workflow/queue-namespace.js +1 -0
- package/dist/src/internal/workflow-bundle/builder-support.js +2 -2
- package/dist/src/internal/workflow-bundle/builder.js +3 -5
- package/dist/src/internal/workflow-bundle/vercel-workflow-output.js +1 -1
- package/dist/src/internal/workflow-bundle/workflow-builders.d.ts +1 -1
- package/dist/src/internal/workflow-bundle/workflow-builders.js +1 -1
- package/dist/src/node_modules/.pnpm/@clack_core@1.3.1/node_modules/@clack/core/dist/index.js +4 -4
- package/dist/src/protocol/message.d.ts +15 -0
- package/dist/src/protocol/message.js +2 -2
- package/dist/src/public/channels/slack/api.d.ts +8 -0
- package/dist/src/public/channels/slack/api.js +1 -1
- package/dist/src/public/channels/slack/connections.d.ts +26 -18
- package/dist/src/public/channels/slack/connections.js +1 -1
- package/dist/src/public/channels/slack/defaults.d.ts +5 -2
- package/dist/src/public/channels/slack/defaults.js +1 -1
- package/dist/src/public/channels/slack/index.d.ts +1 -1
- package/dist/src/public/channels/slack/slackChannel.d.ts +65 -5
- package/dist/src/public/channels/slack/slackChannel.js +1 -1
- package/dist/src/public/channels/teams/defaults.js +1 -1
- package/dist/src/public/connections/errors.d.ts +8 -0
- package/dist/src/public/definitions/tool.d.ts +0 -33
- package/dist/src/public/next/index.d.ts +7 -1
- package/dist/src/public/next/index.js +1 -1
- package/dist/src/public/next/server.d.ts +1 -0
- package/dist/src/public/next/server.js +1 -1
- package/dist/src/public/nuxt/dev-server.js +1 -1
- package/dist/src/public/sveltekit/dev-server.js +1 -1
- package/dist/src/public/sveltekit/index.d.ts +1 -1
- package/dist/src/public/tools/defaults.d.ts +2 -4
- package/dist/src/public/tools/defaults.js +1 -1
- package/dist/src/public/tools/define-bash-tool.d.ts +3 -3
- package/dist/src/public/tools/define-bash-tool.js +1 -1
- package/dist/src/public/tools/define-read-file-tool.d.ts +0 -6
- package/dist/src/public/tools/define-read-file-tool.js +1 -1
- package/dist/src/public/tools/index.d.ts +2 -2
- package/dist/src/public/tools/index.js +1 -1
- package/dist/src/public/tools/internal.js +1 -1
- package/dist/src/runtime/actions/types.d.ts +11 -11
- package/dist/src/runtime/agent/mock-model-adapter.js +1 -1
- package/dist/src/runtime/agent/mock-model-fixtures.js +3 -2
- package/dist/src/runtime/agent/mock-model-skill-selection.js +3 -4
- package/dist/src/runtime/connections/callback-route.js +1 -1
- package/dist/src/runtime/connections/mcp-client.js +1 -1
- package/dist/src/runtime/connections/scoped-authorization.d.ts +21 -5
- package/dist/src/runtime/connections/scoped-authorization.js +1 -1
- package/dist/src/runtime/connections/types.d.ts +33 -0
- package/dist/src/runtime/connections/validate-authorization.js +1 -1
- package/dist/src/runtime/framework-tools/bash.d.ts +3 -3
- package/dist/src/runtime/framework-tools/bash.js +1 -1
- package/dist/src/runtime/framework-tools/connection-search-dynamic.d.ts +1 -1
- package/dist/src/runtime/framework-tools/connection-search-dynamic.js +1 -1
- package/dist/src/runtime/framework-tools/file-state.d.ts +3 -3
- package/dist/src/runtime/framework-tools/index.js +1 -1
- package/dist/src/runtime/framework-tools/read-file.js +2 -2
- package/dist/src/runtime/framework-tools/todo.d.ts +7 -0
- package/dist/src/runtime/framework-tools/todo.js +2 -2
- package/dist/src/runtime/governance/auth/http-basic.js +1 -1
- package/dist/src/runtime/input/types.d.ts +1 -1
- package/dist/src/runtime/resolve-tool.d.ts +2 -2
- package/dist/src/runtime/resolve-tool.js +1 -1
- package/dist/src/runtime/sandbox/keys.js +1 -1
- package/dist/src/runtime/session-callback-route.js +1 -1
- package/dist/src/runtime/types.d.ts +1 -7
- package/dist/src/services/dev-client/client-options.d.ts +8 -0
- package/dist/src/services/dev-client/client-options.js +1 -0
- package/dist/src/services/dev-client/runtime-artifacts.d.ts +13 -0
- package/dist/src/services/dev-client/runtime-artifacts.js +1 -0
- package/dist/src/services/dev-client.d.ts +13 -46
- package/dist/src/services/dev-client.js +1 -1
- package/dist/src/setup/ask.d.ts +205 -0
- package/dist/src/setup/ask.js +1 -0
- package/dist/src/setup/boxes/add-channels.d.ts +100 -16
- package/dist/src/setup/boxes/add-channels.js +2 -1
- package/dist/src/setup/boxes/add-connections.d.ts +13 -23
- package/dist/src/setup/boxes/add-connections.js +1 -1
- package/dist/src/setup/boxes/apply-ai-gateway-credential.d.ts +2 -2
- package/dist/src/setup/boxes/apply-ai-gateway-credential.js +1 -1
- package/dist/src/setup/boxes/deploy-project.d.ts +46 -14
- package/dist/src/setup/boxes/deploy-project.js +1 -1
- package/dist/src/setup/boxes/detect-ai-gateway.d.ts +10 -3
- package/dist/src/setup/boxes/detect-ai-gateway.js +1 -1
- package/dist/src/setup/boxes/link-project.d.ts +3 -3
- package/dist/src/setup/boxes/link-project.js +1 -1
- package/dist/src/setup/boxes/one-shot-next-steps.d.ts +18 -0
- package/dist/src/setup/boxes/one-shot-next-steps.js +2 -0
- package/dist/src/setup/boxes/preflight.d.ts +14 -6
- package/dist/src/setup/boxes/preflight.js +1 -1
- package/dist/src/setup/boxes/resolve-provisioning.d.ts +36 -8
- package/dist/src/setup/boxes/resolve-provisioning.js +1 -1
- package/dist/src/setup/boxes/resolve-target.d.ts +25 -8
- package/dist/src/setup/boxes/resolve-target.js +1 -1
- package/dist/src/setup/boxes/scaffold.d.ts +12 -6
- package/dist/src/setup/boxes/scaffold.js +1 -1
- package/dist/src/setup/boxes/select-channels.d.ts +38 -9
- package/dist/src/setup/boxes/select-channels.js +1 -1
- package/dist/src/setup/boxes/select-chat.d.ts +15 -11
- package/dist/src/setup/boxes/select-chat.js +1 -1
- package/dist/src/setup/boxes/select-connections.d.ts +30 -0
- package/dist/src/setup/boxes/select-connections.js +1 -0
- package/dist/src/setup/boxes/select-model.d.ts +18 -14
- package/dist/src/setup/boxes/select-model.js +1 -1
- package/dist/src/setup/boxes/select-setup-mode.d.ts +32 -0
- package/dist/src/setup/boxes/select-setup-mode.js +1 -0
- package/dist/src/setup/channel-add-conflicts.d.ts +28 -0
- package/dist/src/setup/channel-add-conflicts.js +1 -0
- package/dist/src/setup/cli/channel-setup-prompter.d.ts +23 -0
- package/dist/src/setup/cli/channel-setup-prompter.js +1 -0
- package/dist/src/setup/cli/connection-add-prompter.d.ts +8 -0
- package/dist/src/setup/cli/connection-add-prompter.js +1 -0
- package/dist/src/setup/{scaffold/cli → cli}/index.d.ts +4 -3
- package/dist/src/setup/cli/index.js +1 -0
- package/dist/src/setup/{scaffold/cli → cli}/prompt-ui.d.ts +39 -15
- package/dist/src/setup/cli/prompt-ui.js +5 -0
- package/dist/src/setup/{scaffold/cli → cli}/rail-log.d.ts +2 -0
- package/dist/src/setup/{scaffold/cli → cli}/rail-log.js +2 -2
- package/dist/src/setup/{scaffold/cli → cli}/select-component.d.ts +18 -3
- package/dist/src/setup/cli/select-component.js +1 -0
- package/dist/src/setup/cli/select-option-codec.d.ts +12 -0
- package/dist/src/setup/cli/select-option-codec.js +1 -0
- package/dist/src/setup/{scaffold/cli → cli}/select-state.d.ts +13 -1
- package/dist/src/setup/cli/select-state.js +1 -0
- package/dist/src/setup/cli/whimsy.d.ts +16 -0
- package/dist/src/setup/cli/whimsy.js +1 -0
- package/dist/src/setup/{scaffold/steps/setup-connection.d.ts → connection-connector.d.ts} +3 -2
- package/dist/src/setup/connection-connector.js +1 -0
- package/dist/src/setup/flows/channels.d.ts +43 -0
- package/dist/src/setup/flows/channels.js +1 -0
- package/dist/src/setup/flows/deploy.d.ts +40 -0
- package/dist/src/setup/flows/deploy.js +1 -0
- package/dist/src/setup/flows/in-project.d.ts +16 -0
- package/dist/src/setup/flows/in-project.js +1 -0
- package/dist/src/setup/flows/link.d.ts +43 -0
- package/dist/src/setup/flows/link.js +1 -0
- package/dist/src/setup/flows/model.d.ts +112 -0
- package/dist/src/setup/flows/model.js +1 -0
- package/dist/src/setup/flows/vercel.d.ts +31 -0
- package/dist/src/setup/flows/vercel.js +2 -0
- package/dist/src/setup/gateway-models.js +1 -1
- package/dist/src/setup/headless.d.ts +1 -1
- package/dist/src/setup/index.d.ts +10 -4
- package/dist/src/setup/index.js +1 -1
- package/dist/src/setup/onboarding.d.ts +7 -4
- package/dist/src/setup/onboarding.js +1 -1
- package/dist/src/setup/package-manager.d.ts +27 -0
- package/dist/src/setup/package-manager.js +1 -0
- package/dist/src/setup/primitives/index.d.ts +3 -0
- package/dist/src/setup/primitives/index.js +1 -0
- package/dist/src/setup/primitives/pm/bun.d.ts +10 -0
- package/dist/src/setup/primitives/pm/bun.js +1 -0
- package/dist/src/setup/primitives/pm/index.d.ts +11 -0
- package/dist/src/setup/primitives/pm/index.js +1 -0
- package/dist/src/setup/primitives/pm/npm.d.ts +10 -0
- package/dist/src/setup/primitives/pm/npm.js +1 -0
- package/dist/src/setup/primitives/pm/pnpm.d.ts +27 -0
- package/dist/src/setup/primitives/pm/pnpm.js +8 -0
- package/dist/src/setup/primitives/pm/run.d.ts +23 -0
- package/dist/src/setup/primitives/pm/run.js +1 -0
- package/dist/src/setup/primitives/pm/shared.d.ts +8 -0
- package/dist/src/setup/primitives/pm/shared.js +1 -0
- package/dist/src/setup/primitives/pm/types.d.ts +37 -0
- package/dist/src/setup/primitives/pm/types.js +1 -0
- package/dist/src/setup/primitives/pm/yarn.d.ts +10 -0
- package/dist/src/setup/primitives/pm/yarn.js +1 -0
- package/dist/src/setup/primitives/run-pnpm.d.ts +1 -0
- package/dist/src/setup/primitives/run-pnpm.js +1 -0
- package/dist/src/setup/{scaffold/primitives → primitives}/run-vercel.d.ts +7 -0
- package/dist/src/setup/primitives/run-vercel.js +1 -0
- package/dist/src/setup/project-name.d.ts +4 -0
- package/dist/src/setup/project-name.js +1 -0
- package/dist/src/setup/project-resolution.d.ts +54 -0
- package/dist/src/setup/project-resolution.js +1 -0
- package/dist/src/setup/prompter.d.ts +52 -4
- package/dist/src/setup/prompter.js +1 -1
- package/dist/src/setup/quit-guard.d.ts +1 -1
- package/dist/src/setup/run-vercel-link.d.ts +1 -1
- package/dist/src/setup/run-vercel-link.js +1 -1
- package/dist/src/setup/runner.d.ts +5 -4
- package/dist/src/setup/runner.js +1 -1
- package/dist/src/setup/scaffold/channels-catalog.d.ts +3 -3
- package/dist/src/setup/scaffold/channels-catalog.js +1 -1
- package/dist/src/setup/scaffold/create/add-to-project.d.ts +26 -0
- package/dist/src/setup/scaffold/create/add-to-project.js +1 -0
- package/dist/src/setup/scaffold/create/project.d.ts +54 -0
- package/dist/src/setup/scaffold/create/project.js +80 -0
- package/dist/src/setup/scaffold/index.d.ts +4 -4
- package/dist/src/setup/scaffold/index.js +1 -1
- package/dist/src/setup/scaffold/{channels.d.ts → update/channels.d.ts} +11 -0
- package/dist/src/setup/scaffold/update/channels.js +7 -0
- package/dist/src/setup/scaffold/{connections.d.ts → update/connections.d.ts} +1 -1
- package/dist/src/setup/scaffold/update/connections.js +21 -0
- package/dist/src/setup/scaffold/version-tokens.d.ts +11 -0
- package/dist/src/setup/scaffold/version-tokens.js +1 -0
- package/dist/src/setup/{scaffold/steps/setup-slackbot.d.ts → slackbot.d.ts} +24 -20
- package/dist/src/setup/slackbot.js +1 -0
- package/dist/src/setup/state.d.ts +62 -15
- package/dist/src/setup/state.js +1 -1
- package/dist/src/setup/step.d.ts +9 -18
- package/dist/src/setup/vercel-project.d.ts +15 -8
- package/dist/src/setup/vercel-project.js +1 -1
- package/dist/src/shared/agent-definition.d.ts +5 -3
- package/dist/src/shared/default-agent-model.d.ts +5 -0
- package/dist/src/shared/default-agent-model.js +1 -0
- package/dist/src/source-change/apply-model-name.d.ts +25 -0
- package/dist/src/source-change/apply-model-name.js +2 -0
- package/dist/src/source-change/static-source-change.d.ts +36 -0
- package/dist/src/source-change/static-source-change.js +1 -0
- package/dist/src/svelte/index.js +1 -1
- package/dist/src/svelte/use-eve-agent.js +1 -1
- package/dist/src/vue/index.js +1 -1
- package/dist/src/vue/use-eve-agent.js +1 -1
- package/package.json +22 -42
- package/dist/docs/evals-v2-plan.md +0 -939
- package/dist/docs/public/advanced/dev-tui.md +0 -52
- package/dist/docs/public/advanced/evals.md +0 -158
- package/dist/docs/public/reference/faqs.md +0 -48
- package/dist/src/cli/commands/setup.d.ts +0 -55
- package/dist/src/cli/commands/setup.js +0 -1
- package/dist/src/cli/dev/repl/input-requests.d.ts +0 -38
- package/dist/src/cli/dev/repl/input-requests.js +0 -1
- package/dist/src/cli/dev/repl/input.d.ts +0 -19
- package/dist/src/cli/dev/repl/input.js +0 -1
- package/dist/src/cli/dev/repl/repl.d.ts +0 -62
- package/dist/src/cli/dev/repl/repl.js +0 -2
- package/dist/src/cli/dev/repl/terminal.d.ts +0 -21
- package/dist/src/cli/dev/repl/terminal.js +0 -5
- package/dist/src/compiled/_chunks/workflow/resume-hook-0Zk0zSvq.js +0 -12
- package/dist/src/compiled/_chunks/workflow/sleep-DXZr2BgM.js +0 -1
- package/dist/src/compiled/_chunks/workflow/symbols-BWCAoPHE.js +0 -48
- package/dist/src/evals/checks/checks.d.ts +0 -66
- package/dist/src/evals/checks/checks.js +0 -2
- package/dist/src/evals/checks/index.d.ts +0 -21
- package/dist/src/evals/checks/index.js +0 -1
- package/dist/src/evals/checks/match.js +0 -1
- package/dist/src/evals/define-eval-suite.d.ts +0 -18
- package/dist/src/evals/define-eval-suite.js +0 -1
- package/dist/src/evals/runner/execute-case.d.ts +0 -23
- package/dist/src/evals/runner/execute-case.js +0 -1
- package/dist/src/evals/runner/execute-suite.d.ts +0 -24
- package/dist/src/evals/runner/execute-suite.js +0 -1
- package/dist/src/evals/scorers/autoevals-client.js +0 -2
- package/dist/src/evals/scorers/autoevals.d.ts +0 -58
- package/dist/src/evals/scorers/autoevals.js +0 -1
- package/dist/src/evals/scorers/json.d.ts +0 -10
- package/dist/src/evals/scorers/json.js +0 -1
- package/dist/src/evals/scorers/model-marker.d.ts +0 -12
- package/dist/src/evals/scorers/model-marker.js +0 -1
- package/dist/src/evals/scorers/run.d.ts +0 -24
- package/dist/src/evals/scorers/run.js +0 -1
- package/dist/src/evals/scorers/sql.d.ts +0 -9
- package/dist/src/evals/scorers/sql.js +0 -1
- package/dist/src/evals/scorers/text.d.ts +0 -18
- package/dist/src/evals/scorers/text.js +0 -1
- package/dist/src/evals/scores/index.d.ts +0 -72
- package/dist/src/evals/scores/index.js +0 -1
- package/dist/src/execution/tool-compaction.d.ts +0 -9
- package/dist/src/execution/tool-compaction.js +0 -1
- package/dist/src/services/dev-client/stream.d.ts +0 -5
- package/dist/src/services/dev-client/stream.js +0 -1
- package/dist/src/services/dev-client/url.d.ts +0 -11
- package/dist/src/services/dev-client/url.js +0 -1
- package/dist/src/setup/channel-setup-prompter.d.ts +0 -8
- package/dist/src/setup/channel-setup-prompter.js +0 -1
- package/dist/src/setup/scaffold/channels.js +0 -7
- package/dist/src/setup/scaffold/cli/channel-add-prompter.d.ts +0 -12
- package/dist/src/setup/scaffold/cli/channel-add-prompter.js +0 -1
- package/dist/src/setup/scaffold/cli/channel-setup-prompter.d.ts +0 -56
- package/dist/src/setup/scaffold/cli/connection-add-prompter.d.ts +0 -44
- package/dist/src/setup/scaffold/cli/connection-add-prompter.js +0 -1
- package/dist/src/setup/scaffold/cli/index.js +0 -1
- package/dist/src/setup/scaffold/cli/prompt-ui.js +0 -5
- package/dist/src/setup/scaffold/cli/select-component.js +0 -1
- package/dist/src/setup/scaffold/cli/select-state.js +0 -1
- package/dist/src/setup/scaffold/connections.js +0 -21
- package/dist/src/setup/scaffold/pnpm-workspace.d.ts +0 -3
- package/dist/src/setup/scaffold/pnpm-workspace.js +0 -11
- package/dist/src/setup/scaffold/primitives/detect-deployment.d.ts +0 -13
- package/dist/src/setup/scaffold/primitives/detect-deployment.js +0 -1
- package/dist/src/setup/scaffold/primitives/index.d.ts +0 -3
- package/dist/src/setup/scaffold/primitives/index.js +0 -1
- package/dist/src/setup/scaffold/primitives/pnpm-invocation.d.ts +0 -12
- package/dist/src/setup/scaffold/primitives/pnpm-invocation.js +0 -1
- package/dist/src/setup/scaffold/primitives/run-pnpm.d.ts +0 -17
- package/dist/src/setup/scaffold/primitives/run-pnpm.js +0 -1
- package/dist/src/setup/scaffold/primitives/run-vercel.js +0 -1
- package/dist/src/setup/scaffold/project.d.ts +0 -21
- package/dist/src/setup/scaffold/project.js +0 -80
- package/dist/src/setup/scaffold/steps/deploy-to-vercel.d.ts +0 -17
- package/dist/src/setup/scaffold/steps/deploy-to-vercel.js +0 -1
- package/dist/src/setup/scaffold/steps/index.d.ts +0 -4
- package/dist/src/setup/scaffold/steps/index.js +0 -1
- package/dist/src/setup/scaffold/steps/project-resolution.d.ts +0 -19
- package/dist/src/setup/scaffold/steps/project-resolution.js +0 -1
- package/dist/src/setup/scaffold/steps/run-add-connection.d.ts +0 -40
- package/dist/src/setup/scaffold/steps/run-add-connection.js +0 -1
- package/dist/src/setup/scaffold/steps/run-add-to-agent.d.ts +0 -81
- package/dist/src/setup/scaffold/steps/run-add-to-agent.js +0 -2
- package/dist/src/setup/scaffold/steps/setup-connection.js +0 -1
- package/dist/src/setup/scaffold/steps/setup-slackbot.js +0 -1
- /package/dist/docs/public/{frontend → guides/frontend}/meta.json +0 -0
- /package/dist/docs/public/{advanced → guides}/remote-agents.md +0 -0
- /package/dist/src/{setup/scaffold/cli/channel-setup-prompter.js → cli/dev/tui/setup-flow.js} +0 -0
- /package/dist/src/evals/{scorers/autoevals-client.d.ts → autoevals-client.d.ts} +0 -0
- /package/dist/src/setup/{scaffold/cli → cli}/command-output.d.ts +0 -0
- /package/dist/src/setup/{scaffold/cli → cli}/command-output.js +0 -0
- /package/dist/src/setup/{scaffold/human-action.d.ts → human-action.d.ts} +0 -0
- /package/dist/src/setup/{scaffold/human-action.js → human-action.js} +0 -0
- /package/dist/src/setup/{scaffold/primitives → primitives}/process-output.d.ts +0 -0
- /package/dist/src/setup/{scaffold/primitives → primitives}/process-output.js +0 -0
- /package/dist/src/setup/scaffold/{web-template.d.ts → create/web-template.d.ts} +0 -0
- /package/dist/src/setup/scaffold/{web-template.js → create/web-template.js} +0 -0
- /package/dist/src/setup/scaffold/{module-files.d.ts → update/module-files.d.ts} +0 -0
- /package/dist/src/setup/scaffold/{module-files.js → update/module-files.js} +0 -0
- /package/dist/src/setup/scaffold/{package-json.d.ts → update/package-json.d.ts} +0 -0
- /package/dist/src/setup/scaffold/{package-json.js → update/package-json.js} +0 -0
- /package/dist/src/setup/scaffold/{primitives → update}/update-connection-connector.d.ts +0 -0
- /package/dist/src/setup/scaffold/{primitives → update}/update-connection-connector.js +0 -0
- /package/dist/src/setup/scaffold/{primitives → update}/update-slack-channel.d.ts +0 -0
- /package/dist/src/setup/scaffold/{primitives → update}/update-slack-channel.js +0 -0
|
@@ -0,0 +1,94 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: "Judge"
|
|
3
|
+
description: "Grade evals with an LLM judge via t.judge.autoevals, set thresholds on the assertion, and configure the judge model."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
When no deterministic [assertion](./assertions) captures what "good" means (factual correctness, summary quality, free-form criteria), grade the run with an LLM judge. The `t.judge.*` assertions are the only model-backed ones. They use a judge model that is resolved separately from the agent under test: Eve uses it only for scoring, never to swap out the agent.
|
|
7
|
+
|
|
8
|
+
```ts
|
|
9
|
+
import { defineEval } from "eve/evals";
|
|
10
|
+
|
|
11
|
+
export default defineEval({
|
|
12
|
+
async test(t) {
|
|
13
|
+
await t.send("Explain quantum tunneling to a 10-year-old.");
|
|
14
|
+
t.completed();
|
|
15
|
+
t.judge.autoevals.closedQA("uses no math beyond arithmetic").atLeast(0.8);
|
|
16
|
+
},
|
|
17
|
+
});
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
## The graders
|
|
21
|
+
|
|
22
|
+
The judges live under `t.judge.autoevals`. The namespace names the [Braintrust autoevals](https://github.com/braintrustdata/autoevals) grader family, so the `factuality` and `closedQA` semantics come from autoevals rather than from Eve. Each grader grades `t.reply` by default and is soft by default (tracked, not gated):
|
|
23
|
+
|
|
24
|
+
| Grader | Grades |
|
|
25
|
+
| ---------------------------------------- | -------------------------------------------------------------------------------------- |
|
|
26
|
+
| `t.judge.autoevals.factuality(expected)` | Factual consistency of the reply against an expected answer (A–E buckets) |
|
|
27
|
+
| `t.judge.autoevals.summarizes(expected)` | How well the reply summarizes the expected text |
|
|
28
|
+
| `t.judge.autoevals.closedQA(criteria)` | Whether the reply satisfies a free-form yes/no criterion (no expected answer to match) |
|
|
29
|
+
| `t.judge.autoevals.sql(expected)` | Semantic equivalence of two SQL statements |
|
|
30
|
+
|
|
31
|
+
The reference text or criterion is the positional argument. An optional options object follows:
|
|
32
|
+
|
|
33
|
+
- `on` — the value to grade, defaulting to `t.reply`. Pass an intermediate draft or parsed value to grade it instead.
|
|
34
|
+
- `model` / `modelOptions` — a per-call judge override (see below).
|
|
35
|
+
|
|
36
|
+
```ts
|
|
37
|
+
const draft = await t.send("Draft the welcome email.");
|
|
38
|
+
t.judge.autoevals.closedQA("professional tone", { on: draft.message }).atLeast(0.6);
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
## Soft scoring and thresholds
|
|
42
|
+
|
|
43
|
+
Judge assertions are soft, so the threshold rides on the assertion handle — there is no separate thresholds map:
|
|
44
|
+
|
|
45
|
+
- **No threshold** — tracked-only. The score lands in reports and artifacts and never fails the eval. Use it to watch a metric without gating on it.
|
|
46
|
+
- `.atLeast(threshold)` — a soft bar. A below-threshold score marks the eval `scored`, fatal only under `eve eval --strict`.
|
|
47
|
+
- `.gate(threshold)` — promote a judge to a hard gate that fails the eval outright.
|
|
48
|
+
|
|
49
|
+
```ts
|
|
50
|
+
t.judge.autoevals.closedQA("cites a source"); // tracked, never fails
|
|
51
|
+
t.judge.autoevals.closedQA("cites a source").atLeast(0.6); // soft, fails under --strict below 0.6
|
|
52
|
+
t.judge.autoevals.factuality(reference).gate(0.8); // hard gate at 0.8
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
A judge runs once per assertion and burns tokens, so reach for one only when nothing deterministic will do. Several slow judge calls in one eval can fan out with `await Promise.all([...])`.
|
|
56
|
+
|
|
57
|
+
## Configuring the judge model
|
|
58
|
+
|
|
59
|
+
The judge model is resolved once when the runner builds `t`. It is **never** the model under test. Three levels resolve innermost-wins:
|
|
60
|
+
|
|
61
|
+
1. **Per-call** — `t.judge.autoevals.closedQA("…", { model, modelOptions })`.
|
|
62
|
+
2. **Per-eval** — `defineEval({ judge: { model, modelOptions }, test })`.
|
|
63
|
+
3. **Project default** — `defineEvalConfig({ judge: { model, modelOptions } })` in `evals.config.ts`.
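
The innermost-wins order above can be sketched as a small lookup. This is an illustration only, not Eve's implementation: `JudgeConfig` and `resolveJudge` are hypothetical names, the model is simplified to a string id, and the sketch assumes the winning level is taken whole rather than merged with outer levels.

```ts
// Hypothetical shape for illustration; Eve's real types may differ.
type JudgeConfig = { model?: string; modelOptions?: { [key: string]: unknown } };

// Innermost wins: per-call, then per-eval, then the project default.
// Assumption: the first level that names a model is used as-is (no merging).
function resolveJudge(
  perCall?: JudgeConfig,
  perEval?: JudgeConfig,
  projectDefault?: JudgeConfig,
): JudgeConfig {
  const levels = [perCall, perEval, projectDefault];
  for (const level of levels) {
    if (level !== undefined && level.model !== undefined) {
      return level;
    }
  }
  // Mirrors the documented fail-fast behavior when nothing is configured.
  throw new Error("t.judge.* was called but no judge model is configured");
}
```

For example, `resolveJudge(undefined, { model: "eval-judge" }, { model: "default-judge" })` picks the per-eval level because no per-call override was given.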
|
|
64
|
+
|
|
65
|
+
```ts title="evals/evals.config.ts"
|
|
66
|
+
import { defineEvalConfig } from "eve/evals";
|
|
67
|
+
|
|
68
|
+
export default defineEvalConfig({
|
|
69
|
+
judge: { model: "openai/gpt-5.4-mini" }, // the default judge for every eval in this tree
|
|
70
|
+
});
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
```ts title="evals/quantum.eval.ts"
|
|
74
|
+
import { defineEval } from "eve/evals";
|
|
75
|
+
|
|
76
|
+
export default defineEval({
|
|
77
|
+
judge: { model: "anthropic/claude-opus-4.8" }, // a stronger judge for this eval
|
|
78
|
+
async test(t) {
|
|
79
|
+
await t.send("Explain quantum tunneling to a 10-year-old.");
|
|
80
|
+
t.judge.autoevals.factuality(reference).atLeast(0.7);
|
|
81
|
+
t.judge.autoevals.closedQA("is concise", { model: "anthropic/claude-haiku-4-5" }); // cheaper, per-call
|
|
82
|
+
},
|
|
83
|
+
});
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
`judge` in `evals.config.ts` is optional — a tree of fully deterministic evals can omit it. But calling `t.judge.*` with no judge model resolved is a fail-fast error at eval definition time.
|
|
87
|
+
|
|
88
|
+
A **string model id** (e.g. `"anthropic/claude-opus-4.8"`) routes through the Vercel AI Gateway and needs `AI_GATEWAY_API_KEY` or `VERCEL_OIDC_TOKEN` in the environment; an **AI SDK `LanguageModel` instance** is used directly. With a model configured but no credentials, a judge-backed eval **skips visibly** like other real-model legs, so mock-model fixture runs stay green. For provider-specific judge settings, use `modelOptions.providerOptions`.
|
|
89
|
+
|
|
90
|
+
## What to read next
|
|
91
|
+
|
|
92
|
+
- [Assertions](./assertions): deterministic run-level and value assertions
|
|
93
|
+
- [Reporters](./reporters): ship judged scores to Braintrust experiments
|
|
94
|
+
- [Targets and requirements](./targets): gating judge-backed evals on credentials
|
|
@@ -0,0 +1,118 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: "Overview"
|
|
3
|
+
description: "Define repeatable scored checks for an Eve agent with defineEval and run them with eve eval."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
An eval is a scored check that runs your agent against real sessions and grades the result. Use it to catch regressions when you change a prompt or a tool: drive the agent through one or more turns, then assert on what it did — the run completed, the right tool ran, the reply contains the right text — and optionally ship the results to Braintrust.
|
|
7
|
+
|
|
8
|
+
Evals exercise the same HTTP surface your users hit. The runner boots (or targets) a real agent server, drives sessions through the [TypeScript client](../guides/client/overview) protocol, and grades what comes back — so a passing eval means the agent actually booted, accepted a request, and produced the result you asserted.
|
|
9
|
+
|
|
10
|
+
## `defineEval`
|
|
11
|
+
|
|
12
|
+
Eve discovers evals under the app-root `evals/` directory, in `.eval.ts` files. Each file is exactly one eval — one graded case. The file path is the eval's identity, so you don't author an `id` or `name`; directories group related evals (`evals/weather/brooklyn-forecast.eval.ts` → id `weather/brooklyn-forecast`).
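
The path-to-id mapping described above can be sketched as follows. `evalIdFromPath` is a hypothetical helper written for this illustration, not an Eve export, and it assumes the path is given relative to the app root.

```ts
// Hypothetical helper: derive an eval id from a path relative to the app root.
function evalIdFromPath(relativePath: string): string {
  return relativePath
    .replace(/\\/g, "/") // normalize Windows separators
    .replace(/^evals\//, "") // drop the evals/ root directory
    .replace(/\.eval\.ts$/, ""); // drop the .eval.ts suffix
}

// "evals/weather/brooklyn-forecast.eval.ts" maps to "weather/brooklyn-forecast"
const exampleId = evalIdFromPath("evals/weather/brooklyn-forecast.eval.ts");
```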
|
|
13
|
+
|
|
14
|
+
```text
|
|
15
|
+
my-agent/
|
|
16
|
+
├── agent/
|
|
17
|
+
├── evals/
|
|
18
|
+
│ ├── evals.config.ts
|
|
19
|
+
│ ├── smoke.eval.ts
|
|
20
|
+
│ └── weather/
|
|
21
|
+
│ ├── brooklyn-forecast.eval.ts
|
|
22
|
+
│ └── no-tools-for-greetings.eval.ts
|
|
23
|
+
└── package.json
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
An eval is a single `async test(t)` function. You drive the agent with `t` and assert on the run with the same `t`:
|
|
27
|
+
|
|
28
|
+
```ts title="evals/weather/brooklyn-forecast.eval.ts"
|
|
29
|
+
import { defineEval } from "eve/evals";
|
|
30
|
+
import { includes } from "eve/evals/expect";
|
|
31
|
+
|
|
32
|
+
export default defineEval({
|
|
33
|
+
description: "Basic message and tool-usage coverage for the weather agent.",
|
|
34
|
+
async test(t) {
|
|
35
|
+
await t.send("What is the weather in Brooklyn?");
|
|
36
|
+
t.completed();
|
|
37
|
+
t.calledTool("get_weather");
|
|
38
|
+
t.check(t.reply, includes("Sunny"));
|
|
39
|
+
},
|
|
40
|
+
});
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
`test` is the only required field. The rest are optional: `description`, `requires`, `judge`, `tags`, `metadata`, `timeoutMs`, `reporters`. The init template adds `evals/**/*.ts` to `tsconfig.json`, so your eval code type-checks alongside the app.
|
|
44
|
+
|
|
45
|
+
## `evals.config.ts`
|
|
46
|
+
|
|
47
|
+
Every `evals/` directory needs exactly one `evals.config.ts` at its root. It declares the defaults every eval shares:
|
|
48
|
+
|
|
49
|
+
```ts title="evals/evals.config.ts"
|
|
50
|
+
import { defineEvalConfig } from "eve/evals";
|
|
51
|
+
import { Braintrust } from "eve/evals/reporters";
|
|
52
|
+
|
|
53
|
+
export default defineEvalConfig({
|
|
54
|
+
judge: { model: "openai/gpt-5.4-mini" },
|
|
55
|
+
reporters: [Braintrust({ projectName: "my-agent" })],
|
|
56
|
+
});
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
Everything is optional. `judge` sets the default model for [LLM-as-judge](./judge) assertions (`t.judge.*`) — only evals that use them need it, so a tree of fully deterministic evals can omit it entirely. `reporters`, `maxConcurrency`, and `timeoutMs` round out the defaults. Config `reporters` observe every eval in the run — set one `Braintrust()` here instead of adding it to each eval. CLI flags (`--max-concurrency`, `--timeout`) and per-eval values take precedence over the config defaults.
|
|
60
|
+
|
|
61
|
+
## The `t` context
|
|
62
|
+
|
|
63
|
+
`t` is both the driver and the assertion surface. There are no separate `input`, `run`, `checks`, or `scores` fields — you write ordinary control flow, sending turns and asserting inline.
|
|
64
|
+
|
|
65
|
+
- **Drive** the agent: `t.send(...)`, `t.respond(...)`, `t.respondAll(...)`, `t.sendFile(...)`, `t.expectInputRequests(...)`, `t.newSession()`. Read what came back with `t.reply` (the last assistant message), `t.sessionId`, and `t.events`. See [Cases](./cases).
|
|
66
|
+
- **Assert** with three surfaces, covered next.
|
|
67
|
+
|
|
68
|
+
## Three assertion surfaces
|
|
69
|
+
|
|
70
|
+
Each surface matches a genuinely different kind of judgment:
|
|
71
|
+
|
|
72
|
+
- **Run-level methods** read the whole run — `t.completed()`, `t.calledTool("get_weather")`, `t.usedNoTools()`, `t.toolOrder([...])`. They take no value because they observe the run itself. See [Assertions](./assertions).
|
|
73
|
+
- **`t.check(value, assertion)`** grades an explicit value with a deterministic builder from `eve/evals/expect` — `t.check(t.reply, includes("sunny"))`. Grade `t.reply`, an intermediate draft, parsed JSON — anything. See [Assertions](./assertions).
|
|
74
|
+
- **`t.judge.autoevals.*`** is the LLM-as-judge surface — `t.judge.autoevals.closedQA("cites a source")`. It grades `t.reply` by default and uses the configured judge model, never the agent under test. See [Judge](./judge).
|
|
75
|
+
|
|
76
|
+
## Gate vs soft
|
|
77
|
+
|
|
78
|
+
Every assertion returns a chainable handle, so severity rides on the assertion itself — there is no separate thresholds map:
|
|
79
|
+
|
|
80
|
+
- **Gates** are hard. A failed gate marks the eval `failed` and `eve eval` exits non-zero. Run-level methods, `includes`, `equals`, and `matches` are gates by default.
|
|
81
|
+
- **Soft** assertions are tracked data. They land in reports and artifacts, and a below-threshold soft assertion marks the eval `scored` — visible but not fatal, unless you pass `--strict`. `similarity` and every `t.judge.*` assertion are soft by default. A soft assertion with no threshold is tracked-only and never fails.
|
|
82
|
+
|
|
83
|
+
Override per assertion: `.gate(threshold?)` promotes to a hard gate, `.soft(threshold?)` demotes to tracked, and `.atLeast(threshold)` is a soft assertion with a bar.
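
As a rough mental model of how severity and thresholds combine into a verdict, consider the sketch below. It is not Eve's implementation: the `AssertionResult` shape, the `"passed"` verdict string, and the rule that a gate without a threshold needs a score of 1 are all assumptions made for illustration.

```ts
// Hypothetical result shape; scores are assumed to lie in [0, 1].
type Severity = "gate" | "soft";
type AssertionResult = { severity: Severity; score: number; threshold?: number };
type EvalVerdict = "passed" | "scored" | "failed";

function verdictFor(results: AssertionResult[]): EvalVerdict {
  let verdict: EvalVerdict = "passed";
  for (const result of results) {
    // Assumption: a gate with no explicit threshold requires a perfect score.
    const threshold =
      result.threshold !== undefined
        ? result.threshold
        : result.severity === "gate"
          ? 1
          : undefined;
    // A soft assertion with no threshold is tracked-only and never fails.
    if (threshold === undefined || result.score >= threshold) continue;
    if (result.severity === "gate") return "failed"; // a failed gate is fatal
    verdict = "scored"; // a soft miss is visible but not fatal without --strict
  }
  return verdict;
}
```

A single failed gate dominates: any number of soft misses yields `scored`, but one gate miss yields `failed`.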
|
|
84
|
+
|
|
85
|
+
```ts
|
|
86
|
+
t.completed(); // gate
|
|
87
|
+
t.calledTool("get_weather").soft(); // record as a metric, don't gate
|
|
88
|
+
t.judge.autoevals.closedQA("cites a source"); // soft, tracked (no threshold)
|
|
89
|
+
t.judge.autoevals.factuality(reference).atLeast(0.7); // soft, gated under --strict at 0.7
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
## Run it
|
|
93
|
+
|
|
94
|
+
```bash
|
|
95
|
+
eve eval # run all discovered evals against a local dev server
|
|
96
|
+
eve eval weather # run one eval, or every eval under evals/weather/
|
|
97
|
+
eve eval --url https://<app> # target an existing server or deployment
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
Exit code `0` means every eval passed its gates. See [Running evals](./running) for the full flag list, exit codes, and CI guidance.
|
|
101
|
+
|
|
102
|
+
## A good baseline
|
|
103
|
+
|
|
104
|
+
Most apps do fine with a few small smoke evals. Assert behavior with `t.completed()` plus one or two content checks, keep dataset fixtures in `evals/data/`, and only reach for a judge or Braintrust once you actually need fuzzy grading or shared result review. In CI, run `eve eval --strict` so soft threshold misses fail the build too.
|
|
105
|
+
|
|
106
|
+
The rest of this section covers each piece:
|
|
107
|
+
|
|
108
|
+
- [Cases](./cases): single-turn evals, scripted multi-turn evals, and dataset fan-out
|
|
109
|
+
- [Assertions](./assertions): run-level methods and `t.check` value assertions, with matchers and severity
|
|
110
|
+
- [Judge](./judge): LLM-as-judge grading and the judge model
|
|
111
|
+
- [Targets and requirements](./targets): local vs remote targets, and gating evals on capabilities
|
|
112
|
+
- [Reporters](./reporters): Braintrust experiments and JUnit XML
|
|
113
|
+
- [Running evals](./running): the `eve eval` CLI, exit codes, and artifacts
|
|
114
|
+
|
|
115
|
+
## What to read next
|
|
116
|
+
|
|
117
|
+
- [Cases](./cases): author your first evals
|
|
118
|
+
- [Tools](../tools): the surface most evals assert on
|
|
@@ -0,0 +1,62 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: "Reporters"
|
|
3
|
+
description: "Ship eval results to Braintrust experiments or JUnit XML — Eve runs and scores everything itself."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
Eve runs and grades everything itself; reporters just ship the results out. The CLI prints a console summary by default — one line per eval, failed assertions with their messages — and reporters from `eve/evals/reporters` add destinations on top.
|
|
7
|
+
|
|
8
|
+
Reporters attach in two places. Declare them in `evals.config.ts` to observe **every** eval in the run — the usual choice for a shared destination like one Braintrust experiment, so you don't repeat the reporter in each file. Or list them on an individual eval's `reporters` to scope a destination to that eval (or to a group of evals that share one instance).
|
|
9
|
+
|
|
10
|
+
## Braintrust
|
|
11
|
+
|
|
12
|
+
`Braintrust(...)` uploads eval results to Braintrust experiments. Put one instance in the config so it covers the whole run:
|
|
13
|
+
|
|
14
|
+
```ts title="evals/evals.config.ts"
|
|
15
|
+
import { defineEvalConfig } from "eve/evals";
|
|
16
|
+
import { Braintrust } from "eve/evals/reporters";
|
|
17
|
+
|
|
18
|
+
export default defineEvalConfig({
|
|
19
|
+
judge: { model: "openai/gpt-5.4-mini" },
|
|
20
|
+
reporters: [Braintrust({ projectName: "weather-agent" })],
|
|
21
|
+
});
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
Need a destination for only some evals? Attach it per eval instead:
|
|
25
|
+
|
|
26
|
+
```ts title="evals/brooklyn-forecast.eval.ts"
|
|
27
|
+
import { defineEval } from "eve/evals";
|
|
28
|
+
import { Braintrust } from "eve/evals/reporters";
|
|
29
|
+
|
|
30
|
+
export default defineEval({
|
|
31
|
+
reporters: [Braintrust({ projectName: "weather-agent" })],
|
|
32
|
+
async test(t) {
|
|
33
|
+
await t.send("What is the weather in Brooklyn?");
|
|
34
|
+
t.completed();
|
|
35
|
+
},
|
|
36
|
+
});
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
The reporter config takes an optional `projectName` and `experimentName`, plus a base experiment (by name or id) to diff against. Gate assertions log as binary scores under a `gate:` prefix so experiments diff gate regressions the same way they diff soft-score regressions. Eval `metadata` rides along to reporters.
|
|
40
|
+
|
|
41
|
+
A reporter instance observes the evals that reference it: share one instance across several evals — the config, a `shared.ts` export, or every entry of a dataset array — and their results land in a single experiment. Listing the same config reporter on an eval too does not double-report it.
|
|
42
|
+
|
|
43
|
+
Braintrust needs its SDK installed in the app and credentials in the environment. Pass `--skip-report` to run the eval without shipping results (this also suppresses config reporters) — useful locally when iterating.
|
|
44
|
+
|
|
45
|
+
## JUnit
|
|
46
|
+
|
|
47
|
+
`JUnit({ filePath })` writes JUnit XML for CI annotations. The `--junit <path>` CLI flag does the same thing without touching the eval file, which is usually the better fit — CI owns the output path, not the eval:
|
|
48
|
+
|
|
49
|
+
```bash
|
|
50
|
+
eve eval --strict --junit .eve/junit.xml
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
Each eval becomes one `<testcase>` named by its path-derived id; failed gates and execution errors land as failure messages on the matching test case, so CI surfaces them inline.
|
|
54
|
+
|
|
55
|
+
## Custom reporters
|
|
56
|
+
|
|
57
|
+
A reporter implements the `EvalReporter` interface from `eve/evals/reporters` and receives the same structured results the built-ins do. Reach for one only when a destination isn't covered — the per-run artifacts under `.eve/evals/` already capture everything for ad-hoc inspection.
|
|
58
|
+
|
|
59
|
+
## What to read next
|
|
60
|
+
|
|
61
|
+
- [Running evals](./running): console output, `--json`, and artifacts
|
|
62
|
+
- [Judge](./judge): what the reported numbers mean
|
|
@@ -0,0 +1,63 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: "Running Evals"
|
|
3
|
+
description: "The eve eval CLI: flags, filters, exit codes, artifacts, and how to wire evals into CI."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
`eve eval` discovers every `.eval.ts` file under `evals/`, boots a local dev server (or targets a remote one), runs the evals concurrently, and prints a per-eval summary.
|
|
7
|
+
|
|
8
|
+
```bash
|
|
9
|
+
eve eval # run all discovered evals locally
|
|
10
|
+
eve eval weather smoke # run selected evals (an id, or a directory prefix)
|
|
11
|
+
eve eval --url https://<app> # target a remote app instead of a local host
|
|
12
|
+
eve eval --mock-models # local dev target uses deterministic mock models
|
|
13
|
+
eve eval --tag fast # only evals carrying a tag
|
|
14
|
+
eve eval --strict # soft below-threshold assertions also fail the exit code
|
|
15
|
+
eve eval --no-skips # unmet requirements fail instead of skipping
|
|
16
|
+
eve eval --timeout 60000 # per-eval timeout in milliseconds
|
|
17
|
+
eve eval --max-concurrency 4 # cap concurrent eval executions (default 8)
|
|
18
|
+
eve eval --junit .eve/junit.xml # write JUnit XML
|
|
19
|
+
eve eval --list # print discovered evals without running
|
|
20
|
+
eve eval --verbose                   # stream per-eval t.log lines to stdout
|
|
21
|
+
eve eval --json # machine-readable output
|
|
22
|
+
eve eval --skip-report # skip config and eval-defined reporters (e.g. Braintrust)
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
Positional ids match exactly or by directory prefix: `eve eval weather` runs `evals/weather.eval.ts`, every eval under `evals/weather/`, and every entry of an array-exported `weather.eval.ts`.
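
The exact-or-prefix rule can be sketched as below. `matchesSelector` and `selectEvals` are hypothetical names used only for this illustration, and the sketch does not model how entries of an array-exported file are identified.

```ts
// A selector matches an eval id exactly, or as a directory prefix.
function matchesSelector(evalId: string, selector: string): boolean {
  // The trailing slash keeps "weather" from matching "weather-alerts/...".
  return evalId === selector || evalId.indexOf(selector + "/") === 0;
}

// With no selectors, every discovered eval is selected.
function selectEvals(ids: string[], selectors: string[]): string[] {
  if (selectors.length === 0) return ids;
  return ids.filter((id) => selectors.some((selector) => matchesSelector(id, selector)));
}
```

The prefix check is done on whole path segments so that a sibling directory with a similar name is not picked up by accident.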
|
|
26
|
+
|
|
27
|
+
## Exit codes
|
|
28
|
+
|
|
29
|
+
| Code | Means |
|
|
30
|
+
| ---- | ------------------------------------------------------------------------------- |
|
|
31
|
+
| `0` | Every eval passed its gates (and soft thresholds, under `--strict`) |
|
|
32
|
+
| `1` | Any eval failed — a failed gate, an execution error, or a strict threshold miss |
|
|
33
|
+
| `2` | Configuration error |
|
|
34
|
+
|
|
35
|
+
Unmet [requirements](./targets) skip visibly without affecting the exit code unless you pass `--no-skips`.
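
The table and the skip rule above can be read as the following mapping from per-eval verdicts to a process exit code. This is a sketch under assumptions, not the CLI's actual logic: the verdict strings and the `Flags` shape are hypothetical, exit code `2` (configuration error) is not modeled, and `--no-skips` is assumed to turn a skip into an ordinary failure.

```ts
// Hypothetical verdict and flag shapes for illustration.
type RunVerdict = "passed" | "scored" | "failed" | "skipped";
type Flags = { strict?: boolean; noSkips?: boolean };

function exitCodeFor(verdicts: RunVerdict[], flags: Flags = {}): 0 | 1 {
  const anyFatal = verdicts.some(
    (verdict) =>
      verdict === "failed" || // a failed gate or an execution error
      (verdict === "scored" && flags.strict === true) || // --strict: soft misses fail
      (verdict === "skipped" && flags.noSkips === true), // --no-skips: skips fail
  );
  return anyFatal ? 1 : 0;
}
```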
|
|
36
|
+
|
|
37
|
+
## Artifacts
|
|
38
|
+
|
|
39
|
+
Each run drops artifacts under `.eve/evals/<timestamp>/`: a run `summary.json`, a `results.jsonl` index, and per-eval assertion results, verdicts, captured event streams, and `t.log` lines under `evals/`. The console output stays tight on purpose; when an eval fails, the artifact has the full story.
|
|
40
|
+
|
|
41
|
+
## CI
|
|
42
|
+
|
|
43
|
+
A solid CI invocation is strict, deterministic, and machine-reportable:
|
|
44
|
+
|
|
45
|
+
```bash
|
|
46
|
+
eve eval --strict --mock-models --junit .eve/junit.xml
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
- `--strict` turns soft threshold misses into failures, so score regressions block the merge.
|
|
50
|
+
- `--mock-models` keeps the default leg deterministic and credential-free. Put real-model evals in their own files gated on `requires: ["env:..."]`, and add `--no-skips` on legs that must prove those ran.
|
|
51
|
+
- `--junit` gives the CI provider per-eval annotations; upload the `.eve/evals/` directory as a failure artifact for the full event streams.
|
|
52
|
+
|
|
53
|
+
Against a deployed app, swap `--mock-models` for `--url`:
|
|
54
|
+
|
|
55
|
+
```bash
|
|
56
|
+
eve eval --strict --url "$DEPLOY_URL" --junit .eve/junit.xml
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
## What to read next
|
|
60
|
+
|
|
61
|
+
- [Targets and requirements](./targets): what `--url`, `--mock-models`, and `--no-skips` interact with
|
|
62
|
+
- [Reporters](./reporters): Braintrust and JUnit output
|
|
63
|
+
- [CLI reference](../reference/cli): the rest of the `eve` CLI
|
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: "Targets and Requirements"
|
|
3
|
+
description: "Point evals at a local dev server or a deployment, and gate evals on target capabilities with requires."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
An eval target is always an HTTP URL. `eve eval` starts a local dev server, while `eve eval --url <url>` runs against an existing server or deployment — the same eval files work for both, which is what makes evals usable as end-to-end tests in CI.
|
|
7
|
+
|
|
8
|
+
The runner polls `/eve/v1/health`, verifies `/eve/v1/info`, and exposes the live target as `t.target` inside the `test` function.
|
|
9
|
+
|
|
10
|
+
## Target helpers
|
|
11
|
+
|
|
12
|
+
```ts title="evals/heartbeat.eval.ts"
|
|
13
|
+
import { defineEval } from "eve/evals";
|
|
14
|
+
|
|
15
|
+
export default defineEval({
|
|
16
|
+
requires: ["mockModels", "devRoutes"],
|
|
17
|
+
async test(t) {
|
|
18
|
+
const { sessionIds } = await t.target.dispatchSchedule("heartbeat");
|
|
19
|
+
await t.target.attachSession(sessionIds[0]!);
|
|
20
|
+
t.completed();
|
|
21
|
+
t.calledTool("send_report");
|
|
22
|
+
},
|
|
23
|
+
});
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
- `t.target.fetch(path, init)` performs an authenticated fetch against the target — useful for channel and webhook ingress.
|
|
27
|
+
- `t.target.dispatchSchedule(id)` triggers a [schedule](../schedules) through the dev-only schedule route and returns the session ids it created. It requires the `devRoutes` capability.
|
|
28
|
+
- `t.target.attachSession(sessionId, { startIndex? })` consumes one turn from a session created outside the eval — by a channel or a schedule — so its events feed the run-level assertions.
|
|
29
|
+
|
|
30
|
+
Sessions attached this way are full `EveEvalSession`s: you can keep driving them with `send` and read their event streams. The run-level assertions on `t` (`t.completed()`, `t.calledTool(...)`) read the whole run, attached sessions included.
|
|
31
|
+
|
|
32
|
+
## Requirements
|
|
33
|
+
|
|
34
|
+
Use `requires` to declare assumptions the runner verifies before executing an eval:
|
|
35
|
+
|
|
36
|
+
| Requirement | Means |
|
|
37
|
+
| -------------- | --------------------------------------------------------------------- |
|
|
38
|
+
| `"mockModels"` | `/eve/v1/info` reports the deterministic mock model adapter is active |
|
|
39
|
+
| `"devRoutes"` | `/eve/v1/info` reports dev-only routes are mounted |
|
|
40
|
+
| `"env:NAME"` | The eval process has environment variable `NAME` set |
|
|
41
|
+
|
|
42
|
+
Unmet requirements produce a visible `skipped` verdict and do not affect the exit code. Pass `--no-skips` when a CI leg must prove full coverage.
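
A requirement check along these lines could look like the sketch below. The `TargetInfo` shape is hypothetical (the real `/eve/v1/info` payload is not shown here), `unmetRequirements` is an illustrative name, and treating an unknown requirement string as unmet is an assumption.

```ts
// Hypothetical view of what /eve/v1/info reports; field names are assumptions.
type TargetInfo = { mockModels: boolean; devRoutes: boolean };
type Env = { [name: string]: string | undefined };

// Returns the requirements that are not satisfied; an empty list means "run it".
function unmetRequirements(requires: string[], info: TargetInfo, env: Env): string[] {
  return requires.filter((requirement) => {
    if (requirement === "mockModels") return !info.mockModels;
    if (requirement === "devRoutes") return !info.devRoutes;
    if (requirement.indexOf("env:") === 0) {
      const name = requirement.slice("env:".length);
      return env[name] === undefined; // unmet when the variable is not set
    }
    return true; // assumption: unknown requirement strings count as unmet
  });
}
```

In a Node process the third argument would typically be `process.env`; a non-empty result corresponds to the `skipped` verdict described above.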
|
|
43
|
+
|
|
44
|
+
## Mock models
|
|
45
|
+
|
|
46
|
+
Deterministic evals — the kind you want in CI — should not depend on a live model. `eve eval --mock-models` starts the local dev server with deterministic authored models, and `requires: ["mockModels"]` makes the dependency explicit so the eval skips instead of flaking anywhere else.
|
|
47
|
+
|
|
48
|
+
`--mock-models` is invalid with `--url` because remote target capabilities are discovered, not set by the runner. For evals that genuinely need a real model — judging nuanced behavior, exercising provider-side tools — gate them on credentials instead (`requires: ["env:AI_GATEWAY_API_KEY"]`) and keep them in their own eval files so a tag filter can select or exclude them.
|
|
49
|
+
|
|
50
|
+
## What to read next
|
|
51
|
+
|
|
52
|
+
- [Running evals](./running): `--url`, `--mock-models`, and `--no-skips` in practice
|
|
53
|
+
- [Schedules](../schedules): the surface `dispatchSchedule` drives
|
|
54
|
+
- [Channels](../channels/overview): ingress you can exercise with `target.fetch`
|
|
@@ -5,52 +5,56 @@ description: "Install Eve, scaffold your first agent, give it a tool, and run it
|
|
|
5
5
|
|
|
6
6
|
Eve is a filesystem-first framework for durable agents: you write capabilities under `agent/`, and Eve runs the model loop, persists every session, and serves the agent over HTTP and platform channels. This guide gets a small app running locally and walks the current request loop end to end: build, run, message, stream, and follow up.
|
|
7
7
|
|
|
8
|
+
## Quick start
|
|
9
|
+
|
|
10
|
+
Run `eve init` with `npx` before Eve is installed locally:
|
|
11
|
+
|
|
12
|
+
```bash
|
|
13
|
+
npx eve@latest init my-agent
|
|
14
|
+
```
|
|
15
|
+
|
|
16
|
+
The command creates an npm-managed child directory, uses Eve's default model, installs dependencies, initializes Git, and starts the development server. The interactive [terminal UI](./guides/dev-tui) then opens; type a message and watch the model loop run. Pass `--channel-web-nextjs` to add the Web Chat application; every app ships the built-in HTTP channel (`agent/channels/eve.ts`) regardless. Stop the server before editing the generated agent. The command does not create a Vercel project or deploy.
|
|
17
|
+
|
|
8
18
|
## Prerequisites
|
|
9
19
|
|
|
10
20
|
- Node `24.x`
|
|
11
|
-
-
|
|
21
|
+
- npm (bundled with Node)
|
|
12
22
|
|
|
13
23
|
You also need a model credential. Set the provider or gateway key your model string requires (gateway ids like `anthropic/claude-opus-4.8` route through the Vercel AI Gateway), or link a Vercel project that supplies one.
|
|
14
24
|
|
|
15
|
-
##
|
|
16
|
-
|
|
17
|
-
The fastest path is the `create` CLI. It scaffolds the project, prompts for a model, and wires up an optional channel:
|
|
18
|
-
|
|
19
|
-
```bash
|
|
20
|
-
pnpm create eve@beta
|
|
21
|
-
```
|
|
25
|
+
## Manual installation
|
|
22
26
|
|
|
23
|
-
The
|
|
27
|
+
The quick start uses `eve init` for a guided scaffold. The target can also be an existing project directory (`eve init .`): the project must have a `package.json`, the `agent/` files must not exist yet, and the missing `eve`, `ai`, and `zod` dependencies are added without touching anything else the project owns. Either way the final handoff runs the `eve dev` binary through the project's package manager, never the project's own `dev` script.
|
|
24
28
|
|
|
25
|
-
To
|
|
29
|
+
To wire Eve into an existing app yourself instead, add only the dependency and author the two files the runtime needs:
|
|
26
30
|
|
|
27
31
|
```bash
|
|
28
|
-
|
|
32
|
+
npm install eve@latest
|
|
29
33
|
```
|
|
30
34
|
|
|
31
|
-
|
|
35
|
+
### Project files
|
|
32
36
|
|
|
33
|
-
|
|
37
|
+
A minimal agent is two files; you add tools as you need them.
|
|
34
38
|
|
|
35
|
-
`agent/instructions.md`
|
|
39
|
+
`agent/instructions.md` is the always-on system prompt:
|
|
36
40
|
|
|
37
41
|
```md
|
|
38
42
|
You are a concise assistant. Use tools when they are available.
|
|
39
43
|
```
|
|
40
44
|
|
|
41
|
-
`agent/agent.ts`
|
|
45
|
+
`agent/agent.ts` holds runtime config:
|
|
42
46
|
|
|
43
47
|
```ts
|
|
44
48
|
import { defineAgent } from "eve";
|
|
45
49
|
|
|
46
50
|
export default defineAgent({
|
|
47
|
-
model: "anthropic/claude-
|
|
51
|
+
model: "anthropic/claude-sonnet-4.6",
|
|
48
52
|
});
|
|
49
53
|
```
|
|
50
54
|
|
|
51
|
-
Even at this size the agent can already do real work. The default harness gives it file, shell, web, and delegation tools out of the box. See [Default harness](./
|
|
55
|
+
Even at this size the agent can already do real work. The default harness gives it file, shell, web, and delegation tools out of the box. See [Default harness](./concepts/default-harness) for the full set and how to override or disable any of them.
|
|
52
56
|
|
|
53
|
-
### Add
|
|
57
|
+
### Add your first tool
|
|
54
58
|
|
|
55
59
|
Whatever you name the file becomes the tool name the model sees. Create `agent/tools/get_weather.ts`:
|
|
56
60
|
|
|
@@ -71,12 +75,12 @@ export default defineTool({
|
|
|
71
75
|
|
|
72
76
|
Tools run in your app runtime with full `process.env`, not inside the [sandbox](./sandbox). More in [Tools](./tools).
|
|
73
77
|
|
|
74
|
-
## Run
|
|
78
|
+
## Run the app
|
|
75
79
|
|
|
76
80
|
From the app root:
|
|
77
81
|
|
|
78
82
|
```bash
|
|
79
|
-
|
|
83
|
+
npm run dev
|
|
80
84
|
```
|
|
81
85
|
|
|
82
86
|
Useful commands:
|
|
@@ -84,13 +88,13 @@ Useful commands:
|
|
|
84
88
|
- `eve info`: show the active routes and compiled artifacts
|
|
85
89
|
- `eve build`: compile the agent into `.eve/` and build the host output
|
|
86
90
|
- `eve start`: serve the built output
|
|
87
|
-
- `eve dev`: start the local runtime and open the interactive [terminal UI](./
|
|
91
|
+
- `eve dev`: start the local runtime and open the interactive [terminal UI](./guides/dev-tui)
|
|
88
92
|
|
|
89
93
|
In the dev TUI, type a message and watch it happen in order: the `get_weather` call, its result, then the reply.
|
|
90
94
|
|
|
91
|
-
The same CLI can point at a deployment. `eve dev https://your-app.vercel.app` drives a deployed app, which is handy for preview and production smoke tests. See [Deployment](./
|
|
95
|
+
The same CLI can point at a deployment. `eve dev https://your-app.vercel.app` drives a deployed app, which is handy for preview and production smoke tests. See [Deployment](./guides/deployment).
|
|
92
96
|
|
|
93
|
-
## Send
|
|
97
|
+
## Send a message
|
|
94
98
|
|
|
95
99
|
Every Eve app exposes the same stable HTTP API. Start a durable session:
|
|
96
100
|
|
|
@@ -105,7 +109,7 @@ The response comes back with two things you'll reuse:
|
|
|
105
109
|
- a `continuationToken` in the JSON body, to resume this conversation
|
|
106
110
|
- an `x-eve-session-id` header that identifies the run to stream
|
|
107
111
|
|
|
108
|
-
## Stream
|
|
112
|
+
## Stream the session
|
|
109
113
|
|
|
110
114
|
Attach to the session stream:
|
|
111
115
|
|
|
@@ -137,7 +141,7 @@ The stream is NDJSON. Expect lifecycle events such as:
|
|
|
137
141
|
|
|
138
142
|
`message.completed.data.finishReason` tells you whether assistant text is interim tool-call narration or a terminal reply, and `step.completed.data.usage` carries token usage. When a parent delegates to a subagent, `subagent.called.data.childSessionId` gives you the child session id, so you can subscribe to that child stream and watch the delegated work.
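
A minimal sketch of reading those three fields out of an NDJSON stream follows. It assumes each line is a JSON object with a `type` string and a `data` object; that envelope, along with the names `StreamEvent`, `StreamSummary`, and `summarizeStream`, is an assumption for illustration and should be checked against the real stream contract.

```typescript
// Assumed envelope: one JSON object per line, with `type` and `data` keys.
interface StreamEvent {
  type: string;
  data?: Record<string, unknown>;
}

interface StreamSummary {
  finishReasons: string[]; // from message.completed.data.finishReason
  usage: unknown[]; // from step.completed.data.usage
  childSessionIds: string[]; // from subagent.called.data.childSessionId
}

// Parses NDJSON text and collects the fields called out in the paragraph above.
function summarizeStream(ndjson: string): StreamSummary {
  const summary: StreamSummary = { finishReasons: [], usage: [], childSessionIds: [] };
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines between events
    const event = JSON.parse(line) as StreamEvent;
    const data = event.data ?? {};
    if (event.type === "message.completed") {
      const finishReason = data["finishReason"];
      if (typeof finishReason === "string") summary.finishReasons.push(finishReason);
    } else if (event.type === "step.completed") {
      const usage = data["usage"];
      if (usage !== undefined) summary.usage.push(usage);
    } else if (event.type === "subagent.called") {
      const childSessionId = data["childSessionId"];
      if (typeof childSessionId === "string") summary.childSessionIds.push(childSessionId);
    }
  }
  return summary;
}
```

Each collected `childSessionId` can then be used to subscribe to the child stream in the same way as the parent.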
|
|
139
143
|
|
|
140
|
-
## Send
|
|
144
|
+
## Send a follow-up message
|
|
141
145
|
|
|
142
146
|
When the session is waiting for the next user message, post a follow-up with the token:
|
|
143
147
|
|
|
@@ -147,16 +151,16 @@ curl -X POST http://127.0.0.1:3000/eve/v1/session/<sessionId> \
|
|
|
147
151
|
-d '{"continuationToken":"<token>","message":"Now do Queens."}'
|
|
148
152
|
```
|
|
149
153
|
|
|
150
|
-
See [Sessions, runs & streaming](./
|
|
154
|
+
See [Sessions, runs & streaming](./concepts/sessions-runs-and-streaming) for the full contract.
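
The follow-up call can also be sketched in TypeScript. The helper below only builds the request, so it can be inspected without a running server; `buildFollowUp` and `FollowUpRequest` are hypothetical names, and the `content-type` header is an assumption rather than something the excerpt above confirms.

```typescript
interface FollowUpRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Builds the POST for a follow-up turn on an existing session.
// The path and body keys mirror the curl example; the header is assumed.
function buildFollowUp(
  baseUrl: string,
  sessionId: string,
  continuationToken: string,
  message: string,
): FollowUpRequest {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/eve/v1/session/${encodeURIComponent(sessionId)}`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ continuationToken, message }),
    },
  };
}
```

With a local dev server running, the result could be passed straight to `fetch(request.url, request.init)`, using the `continuationToken` and session id returned by the previous turn.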
|
|
151
155
|
|
|
152
156
|
## Setting up with a coding agent
|
|
153
157
|
|
|
154
158
|
If a coding agent (Claude Code, Cursor, and the like) is doing the setup, hand it this prompt:
|
|
155
159
|
|
|
156
|
-
<CopyPrompt text="Set up an Eve agent for the user. Eve is a filesystem-first TypeScript framework for durable agents, published as the npm package eve. Read its docs: once eve is installed they are bundled in the package at node_modules/eve/dist/docs/public; before eve is installed, read the published Introduction and Getting Started pages. If the project has no Eve app, scaffold one with `
|
|
160
|
+
<CopyPrompt text="Set up an Eve agent for the user. Eve is a filesystem-first TypeScript framework for durable agents, published as the npm package eve. Read its docs: once eve is installed they are bundled in the package at node_modules/eve/dist/docs/public; before eve is installed, read the published Introduction and Getting Started pages. If the project has no Eve app, scaffold one with `npx eve@latest init <name>`; add `--channel-web-nextjs` only when the user wants Web Chat. The init command installs dependencies, initializes Git, and starts the dev server, so run it in a controllable process and stop it before editing. To add Eve to an existing app, run `npm install eve@latest`. Make sure agent/agent.ts and agent/instructions.md exist, then add a first typed tool at agent/tools/get_weather.ts using defineTool from eve/tools with a Zod inputSchema and an inline execute. Start the dev server again, then exercise the HTTP API: create a session with POST /eve/v1/session, attach to GET /eve/v1/session/:id/stream, and send a follow-up with the returned continuationToken. Verify with the project's typecheck, adapt model and provider choices to the project, and do not commit unless the user asks.">
|
|
157
161
|
Set up an Eve agent: read the Eve docs (bundled at node_modules/eve/dist/docs/public once eve is
|
|
158
|
-
installed), scaffold with `
|
|
159
|
-
a typed tool at agent/tools/get_weather.ts, run it with `
|
|
162
|
+
installed), scaffold with `npx eve@latest init <name>` (or `npm install eve@latest` in an existing app), add
|
|
163
|
+
a typed tool at agent/tools/get_weather.ts, run it with `npm run dev`, then create a session, stream
|
|
160
164
|
it, and send a follow-up.
|
|
161
165
|
</CopyPrompt>
|
|
162
166
|
|
|
@@ -164,13 +168,14 @@ Once `eve` is a dependency, the full docs are bundled in the package, so the age
|
|
|
164
168
|
|
|
165
169
|
- Docs: `node_modules/eve/dist/docs/public/`
|
|
166
170
|
|
|
167
|
-
|
|
171
|
+
`eve init <name>` creates the base agent; `eve init .` adds one to an existing project. Add `--channel-web-nextjs` for Web Chat, or run
|
|
172
|
+
`eve channels add slack` later from an interactive terminal.
|
|
168
173
|
|
|
169
174
|
## What to read next
|
|
170
175
|
|
|
171
176
|
- [Instructions](./instructions) and [Tools](./tools): the core building blocks
|
|
172
177
|
- [Channels](./channels/overview): reach the agent from Slack, Discord, or a web UI
|
|
173
|
-
- [Frontend](./frontend/overview): browser chat with `useEveAgent`
|
|
174
|
-
- [TypeScript
|
|
175
|
-
- [Sessions, runs & streaming](./
|
|
178
|
+
- [Frontend](./guides/frontend/overview): browser chat with `useEveAgent`
|
|
179
|
+
- [TypeScript SDK](./guides/client/overview): call the agent from scripts or server-side code
|
|
180
|
+
- [Sessions, runs & streaming](./concepts/sessions-runs-and-streaming): the durable session model
|
|
176
181
|
- [Build an agent](./tutorial/first-agent): the full end-to-end walkthrough
|
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
---
|
|
2
|
-
title: "Auth &
|
|
2
|
+
title: "Auth & Route Protection"
|
|
3
3
|
description: "Secure your agent's HTTP routes with an ordered auth walk, verifier helpers, and connection OAuth via Vercel Connect."
|
|
4
4
|
---
|
|
5
5
|
|
|
@@ -184,7 +184,7 @@ export default defineChannel({
|
|
|
184
184
|
|
|
185
185
|
## Replace `placeholderAuth` before production
|
|
186
186
|
|
|
187
|
-
`
|
|
187
|
+
`eve init` scaffolds `agent/channels/eve.ts` with a `placeholderAuth()` guardrail:
|
|
188
188
|
|
|
189
189
|
```ts
|
|
190
190
|
import { eveChannel } from "eve/channels/eve";
|
|
@@ -263,8 +263,10 @@ Declaring `auth` adds two accessors to the tool's `ctx`:
|
|
|
263
263
|
|
|
264
264
|
Throw `ConnectionAuthorizationRequiredError` anywhere in `execute` (directly, via `requireAuth()`, or implicitly from `getToken()`) and you trigger the consent flow, keyed by the tool's name. Calling either accessor on a tool that does not declare `auth` throws.
|
|
265
265
|
|
|
266
|
+
By default the sign-in affordance title-cases the tool's path-derived name: a tool file named `sfdc_lookup.ts` renders "Sign in with Sfdc_lookup". Set `displayName` on the `auth` definition to control what users see instead: `auth: { ...connect("sfdc"), displayName: "Salesforce" }`. The field is presentation-only: the tool's name still keys the authorization scope, token cache, and callback URL. A definition-level `displayName` takes precedence over one the strategy stamps on the challenge.
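
To make the fallback concrete, here is a small sketch of how the label might be derived. `signInLabel` is a hypothetical helper, and capitalizing only the first character is an assumption chosen to match the `sfdc_lookup` example; the real title-casing rule may differ.

```typescript
// Hypothetical helper: derives the sign-in label from the tool name,
// unless a presentation-only displayName is supplied.
function signInLabel(toolName: string, displayName?: string): string {
  // Assumed fallback: uppercase the first character, leave the rest untouched.
  const fallback = toolName.charAt(0).toUpperCase() + toolName.slice(1);
  return `Sign in with ${displayName ?? fallback}`;
}
```

Calling `signInLabel("sfdc_lookup")` yields the default label, while `signInLabel("sfdc_lookup", "Salesforce")` shows the override; in both cases the tool name itself is unchanged, which is why it can keep keying the authorization scope.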
|
|
267
|
+
|
|
266
268
|
## What to read next
|
|
267
269
|
|
|
268
|
-
- [Security model](
|
|
270
|
+
- [Security model](../concepts/security-model): trust boundaries and the pre-production checklist
|
|
269
271
|
- [Connections](../connections): connection auth shapes (`connect()` vs static token)
|
|
270
272
|
- [Deployment](./deployment): where route-auth secrets live in production
|
|
@@ -122,5 +122,5 @@ for await (const event of session.stream({ startIndex: 0 })) {
|
|
|
122
122
|
## What to read next
|
|
123
123
|
|
|
124
124
|
- [Streaming](./streaming): stream events and reconnect by index
|
|
125
|
-
- [Sessions, runs & streaming](
|
|
126
|
-
- [Eve channel](
|
|
125
|
+
- [Sessions, runs & streaming](../../concepts/sessions-runs-and-streaming): the raw HTTP contract
|
|
126
|
+
- [Eve channel](../../channels/eve): where continuation tokens come from
|
|
@@ -149,4 +149,4 @@ Don't do both on the same response. Once the stream is consumed, the `ClientSess
|
|
|
149
149
|
|
|
150
150
|
- [Continuations](./continuations): how the session cursor advances
|
|
151
151
|
- [Streaming](./streaming): handle events live instead of using `result()`
|
|
152
|
-
- [Tools](
|
|
152
|
+
- [Tools](../../tools): configure approvals and question prompts
|
|
@@ -126,10 +126,10 @@ const followUp = await followUpResponse.result();
|
|
|
126
126
|
console.log(followUp.data); // undefined unless this turn also requested a schema
|
|
127
127
|
```
|
|
128
128
|
|
|
129
|
-
For task-mode output that belongs to the agent or subagent definition itself, see [`agent.ts`](
|
|
129
|
+
For task-mode output that belongs to the agent or subagent definition itself, see [`agent.ts`](../../agent-config#outputschema) and [Subagents](../../subagents).
|
|
130
130
|
|
|
131
131
|
## What to read next
|
|
132
132
|
|
|
133
133
|
- [Messages](./messages): send turns with `send()`
|
|
134
134
|
- [Streaming](./streaming): handle `result.completed` live
|
|
135
|
-
- [`agent.ts`](
|
|
135
|
+
- [`agent.ts`](../../agent-config#outputschema): configured task-mode output
|