authsome 0.3.0__tar.gz → 0.3.2__tar.gz
- authsome-0.3.2/.claude/commands/run-evals.md +495 -0
- {authsome-0.3.0 → authsome-0.3.2}/.claude-plugin/marketplace.json +1 -1
- authsome-0.3.2/.github/release-please-manifest.json +3 -0
- {authsome-0.3.0 → authsome-0.3.2}/.gitignore +7 -0
- {authsome-0.3.0 → authsome-0.3.2}/CHANGELOG.md +62 -0
- {authsome-0.3.0 → authsome-0.3.2}/CONTRIBUTING.md +1 -1
- {authsome-0.3.0 → authsome-0.3.2}/PKG-INFO +38 -20
- {authsome-0.3.0 → authsome-0.3.2}/README.md +36 -19
- authsome-0.3.2/docs/internal/cli-design-review.md +253 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/internal/manual-testing.md +18 -12
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/README.md +1 -1
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/changelog.mdx +5 -5
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/compared.mdx +1 -1
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/credential-storage.mdx +24 -23
- authsome-0.3.2/docs/site/concepts/profiles-vs-connections.mdx +86 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/provider-registry.mdx +4 -4
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/the-daemon.mdx +3 -3
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/docs.json +1 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/custom-providers.mdx +10 -10
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/headless-device-code.mdx +4 -4
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/login-with-oauth.mdx +12 -12
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/multiple-connections.mdx +14 -14
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/profiles.mdx +1 -1
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/run-agents-with-proxy.mdx +6 -6
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/use-api-keys.mdx +15 -15
- authsome-0.3.2/docs/site/images/login-github-authsome.png +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/index.mdx +2 -2
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/installation.mdx +29 -25
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/anthropic-sdk.mdx +5 -5
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/claude-code.mdx +23 -19
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/codex.mdx +18 -14
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/cowork.mdx +4 -4
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/cursor.mdx +17 -9
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/index.mdx +3 -3
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/langchain.mdx +3 -3
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/llamaindex.mdx +6 -6
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/nanoclaw.mdx +3 -3
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/openai-agents-sdk.mdx +5 -5
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/opencode.mdx +11 -7
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/python.mdx +3 -3
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/ahrefs.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/apollo.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/ashby.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/beehiiv.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/brevo.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/buffer.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/calendly.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/clearbit.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/dub.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/g2.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/hunter.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/index.mdx +5 -5
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/instantly.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/intercom.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/keywords-everywhere.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/klaviyo.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/lemlist.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/livestorm.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/mailchimp.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/mention-me.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/openai.mdx +15 -15
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/optimizely.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/postmark.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/resend.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/rewardful.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/savvycal.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/semrush.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/sendgrid.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/tolt.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/typeform.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/wistia.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/zapier.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/atlassian.mdx +10 -10
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/discord.mdx +10 -10
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/github.mdx +52 -27
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/gitlab.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/google.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/hubspot.mdx +10 -10
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/index.mdx +3 -3
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/klaviyo-oauth.mdx +10 -10
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/linear.mdx +10 -10
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/microsoft.mdx +11 -11
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/notion-dcr.mdx +8 -8
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/notion.mdx +8 -8
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/postiz.mdx +8 -8
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/slack.mdx +10 -10
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/x.mdx +10 -10
- authsome-0.3.2/docs/site/quickstart.mdx +303 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/audit-log.mdx +3 -3
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/bundled-providers.mdx +5 -5
- authsome-0.3.2/docs/site/reference/cli.mdx +298 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/daemon-api.mdx +3 -3
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/environment-variables.mdx +14 -14
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/file-layout.mdx +7 -7
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/python-library.mdx +4 -4
- authsome-0.3.2/docs/site/roadmap.mdx +55 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/disclosure.mdx +1 -1
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/encryption.mdx +4 -4
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/hosted-deployment.mdx +1 -1
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/threat-model.mdx +1 -1
- authsome-0.3.2/docs/site/troubleshooting/auth-errors.mdx +91 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/daemon-issues.mdx +13 -13
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/doctor.mdx +5 -5
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/oauth-callbacks.mdx +7 -7
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/proxy-networking.mdx +7 -7
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/token-refresh.mdx +12 -12
- authsome-0.3.2/evals/.gitignore +2 -0
- authsome-0.3.2/evals/evals.json +85 -0
- authsome-0.3.2/evals/generate_report.py +278 -0
- {authsome-0.3.0 → authsome-0.3.2}/pyproject.toml +5 -1
- authsome-0.3.2/skills/authsome/SKILL.md +112 -0
- authsome-0.3.2/skills/authsome/references/adding-provider.md +19 -0
- authsome-0.3.2/skills/authsome/references/feedback.md +85 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/service.py +54 -39
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/client.py +8 -3
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/client_config.py +17 -1
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/context.py +6 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/daemon_control.py +58 -10
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/main.py +150 -37
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/runner.py +4 -2
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/server.py +79 -14
- authsome-0.3.2/src/authsome/server/analytics.py +37 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/app.py +3 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/auth.py +67 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/connections.py +24 -1
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/health.py +11 -4
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/identities.py +8 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/providers.py +18 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/proxy.py +14 -3
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/_layout.html +2 -2
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/overview.html +1 -1
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_client_signing.py +2 -1
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_daemon.py +11 -1
- {authsome-0.3.0 → authsome-0.3.2}/tests/proxy/test_proxy.py +80 -4
- {authsome-0.3.0 → authsome-0.3.2}/tests/server/test_pop_auth.py +33 -0
- {authsome-0.3.0 → authsome-0.3.2}/uv.lock +4 -1
- authsome-0.3.0/.github/release-please-manifest.json +0 -3
- authsome-0.3.0/docs/site/concepts/profiles-vs-connections.mdx +0 -102
- authsome-0.3.0/docs/site/quickstart.mdx +0 -109
- authsome-0.3.0/docs/site/reference/cli.mdx +0 -259
- authsome-0.3.0/docs/site/roadmap.mdx +0 -103
- authsome-0.3.0/skills/authsome/SKILL.md +0 -84
- authsome-0.3.0/skills/authsome/evals/evals.json +0 -29
- {authsome-0.3.0 → authsome-0.3.2}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/dependabot.yml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/pull_request_template.md +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/release-please-config.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/pr-title.yml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/publish-rc.yml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/publish.yml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/release-please.yml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/test.yml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/.pre-commit-config.yaml +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/AGENTS.md +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/CLAUDE.md +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/LICENSE +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/assets/authsome-how-it-works-dark.svg +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/assets/authsome-how-it-works-light.svg +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/assets/authsome-logo-dark.svg +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/assets/authsome-logo-light.svg +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/UBIQUITOUS_LANGUAGE.md +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/adr/0001-provider-client-record-server-scope.md +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/adr/0002-server-registered-identities.md +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/internal/authsome-design.md +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/register-provider.md +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/architecture.mdx +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/proxy-injection.mdx +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/favicon.svg +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/logo/dark.svg +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/logo/light.svg +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/provider-schema.mdx +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/daemon-trust-boundary.mdx +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/snippets/masked-input-note.mdx +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/snippets/multi-connections-cta.mdx +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/snippets/whats-next-apikey.mdx +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/docs/site/snippets/whats-next-oauth.mdx +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/audit/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/ahrefs.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/apollo.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/ashby.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/atlassian.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/beehiiv.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/brevo.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/buffer.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/calendly.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/clearbit.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/discord.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/dub.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/g2.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/github.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/gitlab.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/google.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/hubspot.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/hunter.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/instantly.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/intercom.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/keywords-everywhere.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/klaviyo-oauth.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/klaviyo.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/lemlist.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/linear.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/livestorm.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/mailchimp.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/mention-me.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/microsoft.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/notion.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/notion_dcr.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/openai.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/optimizely.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/postiz.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/postmark.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/resend.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/rewardful.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/savvycal.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/semrush.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/sendgrid.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/slack.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/tolt.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/typeform.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/wistia.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/x.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/zapier.json +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/api_key.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/base.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/dcr_pkce.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/device_code.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/pkce.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/input_provider.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/config.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/connection.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/enums.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/provider.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/sessions.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/utils.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/helpers.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/errors.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/identity/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/identity/keys.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/identity/proof.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/identity/registry.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/paths.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/certs.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/router.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/py.typed +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/daemon.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/dependencies.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/_deps.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/ui.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/schemas.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/ui/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/ui/pages.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/ui/web_theme.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/ui_sessions.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/urls.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/store/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/store/interfaces.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/store/local.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/static/app.js +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/static/style.css +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/_app_detail_shell.html +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/app_detail_apikey.html +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/app_detail_disconnected.html +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/app_detail_oauth.html +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/connections.html +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/utils.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/vault/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/src/authsome/vault/crypto.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/auth/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_flows.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_models.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_service.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_service_provider_clients.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_url_template.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/conftest.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_doctor.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_get.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_helpers.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_identity.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_import_env.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_init.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_list.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_login.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_logout.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_register.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_revoke.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_ui.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_whoami.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/common/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/common/test_audit.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/common/test_errors.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/common/test_logging.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/common/test_utils.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/conftest.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/identity/test_identity.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/identity/test_proof.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/proxy/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/server/test_auth_sessions.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/server/test_provider_operation_policy.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/server/test_ui_sessions.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/vault/__init__.py +0 -0
- {authsome-0.3.0 → authsome-0.3.2}/tests/vault/test_crypto.py +0 -0
@@ -0,0 +1,495 @@
+# Run Authsome Evals
+
+Interactive eval runner for the authsome skill. You orchestrate everything
+inline: agent invocation, transcript parsing, grading, and result saving.
+There is no separate Python runner script.
+
+## Pre-session setup
+
+Run this once before starting an eval session.
+
+**1. Install the latest authsome CLI:**
+
+```bash
+uv sync
+```
+
+Verify:
+
+```bash
+uv run authsome --version
+```
+
+**2. Create a fresh identity and verify the existing one still works:**
+
+```bash
+# Check the current identity is healthy
+uv run authsome doctor
+
+# Create a new identity for the eval session
+uv run authsome profile create --json
+```
+
+Save the new `profile` handle. Then switch to it:
+
+```bash
+uv run authsome profile use <new-handle>
+```
+
+**3. Confirm the new identity starts clean:**
+
+```bash
+uv run authsome list
+```
+
+Expected: no providers connected. If any show `connected`, the wrong
+profile may be active; check with `cat ~/.authsome/client/config.json`.
+
+**4. Restart the daemon using the dev version:**
+
+The daemon may be running as a globally tool-installed binary while the CLI runs via `uv run`. This version mismatch causes PoP auth failures (spurious 401s) that confuse agents into running `authsome init` mid-eval, corrupting the eval profile. Restart to ensure both CLI and daemon use the same code:
+
+```bash
+uv run authsome daemon restart
+```
+
+Verify:
+
+```bash
+uv run authsome list
+```
+
+Expected: the same clean state as before. If the daemon fails to restart, check for port conflicts with `lsof -i :7998`.
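Where `lsof` is not available, a similar port check can be approximated from Python. The following is only a sketch: it assumes the daemon listens on TCP port 7998 on localhost (the port is taken from the `lsof` command above), and the helper name `port_in_use` is hypothetical, not part of authsome.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


# Demo: open a throwaway listener on an OS-chosen port, then probe it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
busy = port_in_use(demo_port)
listener.close()
print(f"port {demo_port} in use while listening: {busy}")
```

Calling `port_in_use(7998)` before the restart would then indicate whether some process is already bound to the daemon port. Unlike `lsof`, this does not say which process holds it.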
+
+**5. Remove hermes GitHub skills to avoid interference with authsome triggering:**
+
+```bash
+rm -rf ~/.hermes/skills/github
+```
+
+This prevents hermes from using its bundled GitHub skills instead of loading authsome.
+
+**6. Verify hermes and claude are working:**
+
+```bash
+hermes chat -Q -q "reply with the single word OK" -t ""
+claude -p "reply with the single word OK" --output-format text
+```
+
+Expected: both respond with `OK`. Hermes runs the eval agents; claude is the LLM judge.
+
+---
+
+## Arguments
+
+- No args: run all non-optional evals (ids 1–6)
+- `--id N`: run only eval with that id
+- `--all`: include optional evals (currently id 7: Agentic Installation)
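The selection rules in the list above can be sketched as a small filter. This is an illustration only: it assumes each eval entry is a dict with an `id` key and an `optional` flag, and the function name `select_evals` is hypothetical. It also assumes `--id` takes precedence, so `--id 7` would select the optional eval directly.

```python
def select_evals(evals, eval_id=None, include_optional=False):
    """Pick evals according to the argument rules.

    evals: list of dicts with an "id" key and an "optional" flag
    (an assumed shape, not confirmed by the source).
    """
    if eval_id is not None:
        # --id N: only the eval with that id, optional or not.
        return [e for e in evals if e.get("id") == eval_id]
    if include_optional:
        # --all: everything, including optional evals.
        return list(evals)
    # No args: skip anything flagged optional.
    return [e for e in evals if not e.get("optional", False)]


# Sample data mirroring the ids mentioned above: 1-6 required, 7 optional.
sample = [{"id": i, "optional": i == 7} for i in range(1, 8)]
default_ids = [e["id"] for e in select_evals(sample)]
print(default_ids)
```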
+
+---
+
+## Steps
+
+### 1. Load evals
+
+Read `evals/evals.json`. Show the user a table of which
+evals will run (id, name, agent, requires_human, optional).
+
+Read `~/.authsome/client/config.json` and save `active_identity` as
+`EVAL_HANDLE`. This is the fresh profile created during pre-session setup.
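Reading the handle could look roughly like the sketch below. It assumes `active_identity` is a top-level key in the JSON file, which the source does not confirm. The helper name `read_active_identity` is hypothetical, and the demo runs against a throwaway file rather than the real config.

```python
import json
import tempfile
from pathlib import Path

# Path from the step above; the file layout is an assumption.
DEFAULT_CONFIG = Path.home() / ".authsome" / "client" / "config.json"


def read_active_identity(config_path=DEFAULT_CONFIG):
    """Return the active_identity value from the client config file."""
    with open(config_path) as f:
        return json.load(f)["active_identity"]


# Demo against a temporary file standing in for config.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_config = Path(tmp) / "config.json"
    demo_config.write_text(json.dumps({"active_identity": "eval-demo"}))
    EVAL_HANDLE = read_active_identity(demo_config)
print(EVAL_HANDLE)
```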
+
+Create the run directory and save as `RUN_DIR`:
+
+```bash
+mkdir -p "evals/results/$(date +%Y%m%d_%H%M%S)"
+```
+
+---
+
+### 2. Per-eval loop
+
+For each eval to run, **in order**:
+
+#### a. State-check
+
+Run `uv run authsome list` and **show the full output to the user**.
+Compare it against the eval's `environment` field and explicitly state
+whether it matches. If it matches, proceed. If not, show the mismatch
+and fix it inline using `uv run authsome` commands (e.g.
+`uv run authsome logout github`). Re-check until state matches.
+
+If the required state cannot be reached automatically (e.g. gh CLI login
+requires interactive browser auth that isn't part of the eval), ask the
+user whether to skip. If they say skip, write a null verdict and record
+it in grading.json, then move to the next eval:
+
+```bash
+# Write null verdict
+cat > RUN_DIR/verdict_N.json <<'EOF'
+{
+  "outcome": {"passed": null, "evidence": "skipped by user"},
+  "trajectory_efficiency": {"passed": null, "evidence": "skipped by user"}
+}
+EOF
+
+# Capture authsome state
+uv run authsome list > RUN_DIR/authsome_state_N.txt 2>&1
+
+# Append to grading.json (same save script as step f, RATE_LIMITED=false)
+```
+
+Then run the step-f save script for this eval and continue to the next one.
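For comparison, the null verdict written by the heredoc above can also be produced from Python, which avoids hand-writing JSON. This is a sketch, not the runner's actual save logic: the function name `write_null_verdict` is hypothetical, and the demo uses a temporary directory in place of `RUN_DIR`.

```python
import json
import tempfile
from pathlib import Path


def write_null_verdict(run_dir, eval_id, reason="skipped by user"):
    """Write verdict_<id>.json with null pass/fail fields and return its path."""
    verdict = {
        "outcome": {"passed": None, "evidence": reason},
        "trajectory_efficiency": {"passed": None, "evidence": reason},
    }
    path = Path(run_dir) / f"verdict_{eval_id}.json"
    # json.dumps turns Python None into JSON null.
    path.write_text(json.dumps(verdict, indent=2) + "\n")
    return path


# Demo in a throwaway directory standing in for RUN_DIR.
with tempfile.TemporaryDirectory() as run_dir:
    verdict_path = write_null_verdict(run_dir, 3)
    written = json.loads(verdict_path.read_text())
print(verdict_path.name, written["outcome"])
```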
+
+For `requires_human` evals, also show `human_instructions` now.
+
+#### b. Install the skill for the agent
+
+```bash
+# For hermes evals
+rm -rf ~/.hermes/skills/authsome
+cp -r skills/authsome ~/.hermes/skills/authsome
+
+# For claude evals
+rm -rf .claude/skills/authsome
+mkdir -p .claude/skills
+cp -r skills/authsome .claude/skills/authsome
+```
+
+#### c. Run the agent
+
+Before running the agent, read `max_turns` from the eval object (default
+`12` if absent) and store it as `MAX_TURNS`.
+
+**Hermes evals:**
+
+```bash
+hermes chat -v -q "PROMPT" --yolo --max-turns MAX_TURNS \
+  2>&1 | tee RUN_DIR/transcript_N.txt
+```
+
+The combined stdout+stderr is the transcript. Save `RATE_LIMITED=false`
+unless the output contains: `rate limit`, `429`, `too many requests`,
+`usage limit`, or `quota exceeded`.
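The rate-limit rule above amounts to a case-insensitive substring scan over the transcript. A minimal sketch, using the signal strings listed in the text; the function name `is_rate_limited` is hypothetical:

```python
# Signal strings taken from the rule above.
RATE_LIMIT_SIGNALS = [
    "rate limit",
    "429",
    "too many requests",
    "usage limit",
    "quota exceeded",
]


def is_rate_limited(transcript: str) -> bool:
    """Return True if any rate-limit signal appears in the transcript."""
    lowered = transcript.lower()
    return any(signal in lowered for signal in RATE_LIMIT_SIGNALS)


print(is_rate_limited("HTTP 429 Too Many Requests"))
print(is_rate_limited("connected to github"))
```

Because this is a plain substring match, a bare `429` anywhere in the output (for example inside an unrelated id) would also set the flag, so a flagged run may deserve a quick manual look.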
|
|
173
|
+
|
|
174
|
+
**Claude evals — turn 1:**
|
|
175
|
+
|
|
176
|
+
```bash
|
|
177
|
+
claude --dangerously-skip-permissions --verbose --output-format stream-json \
|
|
178
|
+
--max-turns MAX_TURNS -p "PROMPT" > RUN_DIR/raw_N_t1.jsonl 2>&1
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
Then parse the raw stream-json, extract the human-readable transcript,
|
|
182
|
+
and detect whether the agent is waiting for a human action:
|
|
183
|
+
|
|
184
|
+
```bash
|
|
185
|
+
uv run python - RUN_DIR/raw_N_t1.jsonl > RUN_DIR/transcript_N.txt 2> RUN_DIR/meta_N.txt <<'PYEOF'
|
|
186
|
+
import sys, json, re
|
|
187
|
+
|
|
188
|
+
RATE_LIMIT_SIGNALS = ["rate limit", "429", "too many requests", "usage limit", "quota exceeded"]
|
|
189
|
+
path = sys.argv[1]
|
|
190
|
+
lines_out = []
|
|
191
|
+
session_id = None
|
|
192
|
+
|
|
193
|
+
for line in open(path):
|
|
194
|
+
line = line.strip()
|
|
195
|
+
if not line:
|
|
196
|
+
continue
|
|
197
|
+
    try:
|
|
198
|
+
        ev = json.loads(line)
|
|
199
|
+
    except json.JSONDecodeError:
|
|
200
|
+
        lines_out.append(line)
|
|
201
|
+
        continue
|
|
202
|
+
    t = ev.get("type", "")
|
|
203
|
+
    if t == "assistant" and "message" in ev:
|
|
204
|
+
        for block in ev["message"].get("content", []):
|
|
205
|
+
            if block.get("type") == "text":
|
|
206
|
+
                lines_out.append(f"[assistant] {block['text']}")
|
|
207
|
+
            elif block.get("type") == "tool_use":
|
|
208
|
+
                inp = json.dumps(block.get("input", {}))
|
|
209
|
+
                lines_out.append(f"[tool_use] {block['name']}({inp})")
|
|
210
|
+
    elif t == "user" and "message" in ev:
|
|
211
|
+
        for block in ev["message"].get("content", []):
|
|
212
|
+
            if block.get("type") == "tool_result":
|
|
213
|
+
                content = block.get("content", "")
|
|
214
|
+
                if isinstance(content, list):
|
|
215
|
+
                    content = " ".join(
|
|
216
|
+
                        c.get("text", "") for c in content if isinstance(c, dict)
|
|
217
|
+
                    )
|
|
218
|
+
                lines_out.append(f"[tool_result] {str(content)[:800]}")
|
|
219
|
+
    elif t == "result":
|
|
220
|
+
        if ev.get("result"):
|
|
221
|
+
            lines_out.append(f"[result] {ev['result']}")
|
|
222
|
+
        if ev.get("error"):
|
|
223
|
+
            lines_out.append(f"[error] {ev['error']}")
|
|
224
|
+
        if ev.get("session_id"):
|
|
225
|
+
            session_id = ev["session_id"]
|
|
226
|
+
|
|
227
|
+
transcript = "\n".join(lines_out)
|
|
228
|
+
print(transcript)
|
|
229
|
+
|
|
230
|
+
# Emit metadata to stderr for the caller to read
|
|
231
|
+
url_match = re.search(r'http://127\.0\.0\.1:\d+/\S+', transcript)
|
|
232
|
+
if url_match:
|
|
233
|
+
    print(f"WAITING_URL={url_match.group()}", file=sys.stderr)
|
|
234
|
+
if session_id:
|
|
235
|
+
    print(f"SESSION_ID={session_id}", file=sys.stderr)
|
|
236
|
+
rate_limited = any(sig in transcript.lower() for sig in RATE_LIMIT_SIGNALS)
|
|
237
|
+
print(f"RATE_LIMITED={'true' if rate_limited else 'false'}", file=sys.stderr)
|
|
238
|
+
PYEOF
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
Read `RUN_DIR/meta_N.txt` and extract:
|
|
242
|
+
|
|
243
|
+
```bash
|
|
244
|
+
SESSION_ID=$(grep "^SESSION_ID=" RUN_DIR/meta_N.txt | cut -d= -f2)
|
|
245
|
+
WAITING_URL=$(grep "^WAITING_URL=" RUN_DIR/meta_N.txt | cut -d= -f2-)
|
|
246
|
+
RATE_LIMITED=$(grep "^RATE_LIMITED=" RUN_DIR/meta_N.txt | cut -d= -f2)
|
|
247
|
+
```
|
|
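For comparison, the same extraction can be sketched in Python. Splitting each `KEY=VALUE` line on the first `=` only keeps a value intact even when it contains `=` itself, as a URL with query parameters would. The function name `parse_meta` and the fallback values for missing keys are assumptions for illustration.

```python
def parse_meta(text: str) -> dict:
    """Parse KEY=VALUE lines as emitted on stderr by the parse script."""
    # Fallbacks for keys that were not emitted (assumption).
    meta = {"SESSION_ID": "", "WAITING_URL": "", "RATE_LIMITED": "false"}
    for line in text.splitlines():
        # partition splits on the first "=" only, so the value stays whole.
        key, sep, value = line.partition("=")
        if sep and key in meta:
            meta[key] = value.strip()
    return meta
```

An empty `WAITING_URL` here corresponds to the "no auth flow pending" case handled in the next step.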
248
|
+
|
|
249
|
+
#### d. Human handoff (requires_human evals only)
|
|
250
|
+
|
|
251
|
+
**Case 1 — Agent-initiated interrupt (`expected_interrupt` is set):**
|
|
252
|
+
|
|
253
|
+
If the eval has an `expected_interrupt` field, read `RUN_DIR/transcript_N.txt` and judge
|
|
254
|
+
whether the agent's final message matches the described interrupt — i.e. the agent paused
|
|
255
|
+
mid-task to ask the user a clarifying question or request input instead of proceeding
|
|
256
|
+
autonomously. If it matches, auto-resume without human input by sending `next_turn_instruction`
|
|
257
|
+
back to the session:
|
|
258
|
+
|
|
259
|
+
```bash
|
|
260
|
+
claude --resume SESSION_ID \
|
|
261
|
+
--dangerously-skip-permissions --verbose --output-format stream-json \
|
|
262
|
+
--max-turns MAX_TURNS -p "NEXT_TURN_INSTRUCTION" > RUN_DIR/raw_N_t2.jsonl 2>&1
|
|
263
|
+
```
|
|
264
|
+
|
|
265
|
+
Parse the continuation with the same parse script (substitute `raw_N_t2.jsonl` and
|
|
266
|
+
`meta_N_t2.txt`) and append to `RUN_DIR/transcript_N.txt`. Update `RATE_LIMITED` from
|
|
267
|
+
`meta_N_t2.txt`. Then continue to step e for grading — do not prompt the human unless
|
|
268
|
+
`WAITING_URL` is non-empty in the resumed turn.
|
|
269
|
+
|
|
270
|
+
**Case 2 — Browser auth flow (`WAITING_URL` is non-empty):**
|
|
271
|
+
|
|
272
|
+
If `WAITING_URL` is non-empty, the agent started an auth flow and is
|
|
273
|
+
suspended at its session boundary. Wait 5 seconds, then check whether
|
|
274
|
+
the auth flow completed automatically:
|
|
275
|
+
|
|
276
|
+
```bash
|
|
277
|
+
sleep 5 && uv run authsome list
|
|
278
|
+
```
|
|
279
|
+
|
|
280
|
+
If the relevant provider now shows `connected`, proceed directly to resuming
|
|
281
|
+
the session (skip the user prompt). If it is still `not_connected`, show the user:
|
|
282
|
+
|
|
283
|
+
> The agent is waiting. Please complete the auth flow at: `WAITING_URL`
|
|
284
|
+
> Tell me "done" when finished.
|
|
285
|
+
|
|
286
|
+
Wait for the user to reply "done". Then resume the claude session and
|
|
287
|
+
append the continuation to the transcript:
|
|
288
|
+
|
|
289
|
+
```bash
|
|
290
|
+
claude --resume SESSION_ID \
|
|
291
|
+
--dangerously-skip-permissions --verbose --output-format stream-json \
|
|
292
|
+
--max-turns MAX_TURNS -p "done" > RUN_DIR/raw_N_t2.jsonl 2>&1
|
|
293
|
+
```
|
|
294
|
+
|
|
295
|
+
Parse turn 2 with the same script above (substitute `raw_N_t2.jsonl` and
|
|
296
|
+
`meta_N_t2.txt`), then append its transcript to `RUN_DIR/transcript_N.txt`:
|
|
297
|
+
|
|
298
|
+
```bash
|
|
299
|
+
uv run python - RUN_DIR/raw_N_t2.jsonl >> RUN_DIR/transcript_N.txt 2> RUN_DIR/meta_N_t2.txt <<'PYEOF'
|
|
300
|
+
# ... same parse script as above ...
|
|
301
|
+
PYEOF
|
|
302
|
+
|
|
303
|
+
# Update RATE_LIMITED if turn 2 was rate-limited
|
|
304
|
+
RATE_LIMITED_T2=$(grep "^RATE_LIMITED=" RUN_DIR/meta_N_t2.txt | cut -d= -f2)
|
|
305
|
+
[ "$RATE_LIMITED_T2" = "true" ] && RATE_LIMITED=true
|
|
306
|
+
```
|
|
307
|
+
|
|
308
|
+
If `WAITING_URL` is empty and no agent-initiated interrupt was detected for a `requires_human`
|
|
309
|
+
eval, the agent finished in one turn (e.g. it polled for completion itself) — no resume needed.
|
|
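The handoff rules in this step can be summarised as a small decision function. This is only a sketch under stated assumptions: the function name and the returned labels are hypothetical, the "does the final message match `expected_interrupt`" judgement is passed in as a boolean rather than computed, and Case 1 is checked before Case 2 because that is the order in which the text presents them.

```python
def handoff_action(
    has_expected_interrupt: bool,
    interrupt_matched: bool,
    waiting_url: str,
    provider_connected: bool,
) -> str:
    """Pick the next step for a requires_human eval after turn 1."""
    # Case 1: agent-initiated interrupt that matches the eval's description.
    if has_expected_interrupt and interrupt_matched:
        return "resume_with_next_turn_instruction"
    # Case 2: browser auth flow detected via WAITING_URL.
    if waiting_url:
        if provider_connected:
            # Auth completed on its own: resume without prompting the user.
            return "resume_with_done"
        return "ask_human_then_resume"
    # Neither case applies: the agent finished in one turn.
    return "no_resume"
```

Keeping the decision separate from the shell commands makes it easy to check each branch against the prose before wiring it to `claude --resume`.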
310
|
+
|
|
311
|
+
#### e. Grade the transcript
|
|
312
|
+
|
|
313
|
+
Call claude as the LLM judge with the full eval criteria and the transcript:
|
|
314
|
+
|
|
315
|
+
```bash
|
|
316
|
+
uv run python - RUN_DIR/transcript_N.txt EVAL_ID <<'PYEOF' > RUN_DIR/verdict_N.json
|
|
317
|
+
import sys, json, subprocess
|
|
318
|
+
from pathlib import Path
|
|
319
|
+
|
|
320
|
+
transcript = Path(sys.argv[1]).read_text()
|
|
321
|
+
eval_id = int(sys.argv[2])
|
|
322
|
+
evals = json.loads(Path("evals/evals.json").read_text())["evals"]
|
|
323
|
+
eval_ = next(e for e in evals if e["id"] == eval_id)
|
|
324
|
+
|
|
325
|
+
JUDGE_PROMPT = """\
|
|
326
|
+
You are an eval grader for an agent called Authsome. You receive:
|
|
327
|
+
- An agent transcript (stdout+stderr from a live agent run)
|
|
328
|
+
- Environment pre-conditions describing the starting state
|
|
329
|
+
- An outcome criterion (did the task succeed?)
|
|
330
|
+
- An optional trajectory_efficiency criterion (did the agent take the right number of meaningful steps?)
|
|
331
|
+
|
|
332
|
+
Return a JSON object with this exact structure:
|
|
333
|
+
{
|
|
334
|
+
"outcome": {"passed": true, "evidence": "one sentence quoting or describing transcript evidence"},
|
|
335
|
+
"trajectory_efficiency": {"passed": true, "evidence": "one sentence quoting or describing transcript evidence"}
|
|
336
|
+
}
|
|
337
|
+
|
|
338
|
+
Rules:
|
|
339
|
+
- Grade outcome and trajectory_efficiency independently.
|
|
340
|
+
- When counting meaningful steps for trajectory_efficiency, **ignore scaffolding steps**:
|
|
341
|
+
skill loading or calling skill tool, using one extra step to parse and format a
|
|
342
|
+
response, returning results to the user, reading --help, version checks, and similar
|
|
343
|
+
overhead. Only task-relevant actions count (API calls, auth flows, etc).
|
|
344
|
+
- The actual number of LLM calls will be higher than the expected step count — this is normal.
|
|
345
|
+
- If trajectory_efficiency criterion is absent, return {"passed": null, "evidence": "not evaluated"} for it.
|
|
346
|
+
- Be strict: burden of proof to pass is on the transcript.
|
|
347
|
+
- evidence must quote or specifically reference the transcript, not repeat the criterion.\
|
|
348
|
+
"""
|
|
349
|
+
|
|
350
|
+
prompt = f"""{JUDGE_PROMPT}
|
|
351
|
+
|
|
352
|
+
Environment: {eval_["environment"]}
|
|
353
|
+
|
|
354
|
+
Outcome criterion: {eval_["outcome"]}
|
|
355
|
+
|
|
356
|
+
Trajectory efficiency criterion: {eval_.get("trajectory_efficiency", "(not provided — skip this grade)")}
|
|
357
|
+
|
|
358
|
+
Full transcript:
|
|
359
|
+
---
|
|
360
|
+
{transcript}
|
|
361
|
+
---
|
|
362
|
+
|
|
363
|
+
Return ONLY valid JSON, no markdown fences."""
|
|
364
|
+
|
|
365
|
+
result = subprocess.run(
|
|
366
|
+
    ["claude", "-p", prompt, "--output-format", "text"],
|
|
367
|
+
    capture_output=True, text=True, timeout=120,
|
|
368
|
+
)
|
|
369
|
+
|
|
370
|
+
if result.returncode != 0:
|
|
371
|
+
    raise RuntimeError(f"claude judge failed (exit {result.returncode}): {result.stderr[:300]}")
|
|
372
|
+
|
|
373
|
+
raw = result.stdout.strip()
|
|
374
|
+
if "```" in raw:
|
|
375
|
+
    for part in raw.split("```"):
|
|
376
|
+
        part = part.strip().removeprefix("json").strip()  # drop a leading "json" fence tag only
|
|
377
|
+
        if part.startswith("{"):
|
|
378
|
+
            raw = part
|
|
379
|
+
            break
|
|
380
|
+
|
|
381
|
+
verdict = json.loads(raw)
|
|
382
|
+
if "trajectory_efficiency" not in eval_:
|
|
383
|
+
    verdict["trajectory_efficiency"] = {"passed": None, "evidence": "not evaluated"}
|
|
384
|
+
|
|
385
|
+
print(json.dumps(verdict, indent=2))
|
|
386
|
+
PYEOF
|
|
387
|
+
```
|
|
388
|
+
|
|
389
|
+
Read `RUN_DIR/verdict_N.json` and print the result line:
|
|
390
|
+
|
|
391
|
+
```
|
|
392
|
+
[result] outcome=✓/✗ trajectory=✓/✗/—
|
|
393
|
+
outcome : <evidence>
|
|
394
|
+
trajectory: <evidence>
|
|
395
|
+
```
|
|
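A minimal sketch of how the result line might be rendered from `verdict_N.json`, assuming the verdict has the structure the judge prompt asks for. The helper names `mark` and `format_result` and the exact column alignment are illustrative, not part of the command.

```python
def mark(passed) -> str:
    """Map a verdict's passed value to the symbol used in the result line."""
    if passed is True:
        return "\u2713"  # check mark
    if passed is False:
        return "\u2717"  # ballot x
    return "\u2014"      # dash: not evaluated (passed is null)


def format_result(verdict: dict) -> str:
    """Render the three-line summary for one graded eval."""
    outcome = verdict["outcome"]
    trajectory = verdict["trajectory_efficiency"]
    return "\n".join([
        f"[result] outcome={mark(outcome['passed'])} trajectory={mark(trajectory['passed'])}",
        f"  outcome   : {outcome['evidence']}",
        f"  trajectory: {trajectory['evidence']}",
    ])
```

Comparing with `is True` / `is False` keeps a `null` verdict distinct from a failed one, which mirrors how the grading script counts skipped results.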
396
|
+
|
|
397
|
+
#### f. Append result to grading.json
|
|
398
|
+
|
|
399
|
+
```bash
|
|
400
|
+
uv run python - RUN_DIR/grading.json EVAL_ID "$RATE_LIMITED" <<'PYEOF'
|
|
401
|
+
import sys, json
|
|
402
|
+
from datetime import datetime
|
|
403
|
+
from pathlib import Path
|
|
404
|
+
|
|
405
|
+
grading_path = Path(sys.argv[1])
|
|
406
|
+
eval_id = int(sys.argv[2])
|
|
407
|
+
rate_limited = sys.argv[3] == "true"
|
|
408
|
+
|
|
409
|
+
evals = json.loads(Path("evals/evals.json").read_text())["evals"]
|
|
410
|
+
eval_ = next(e for e in evals if e["id"] == eval_id)
|
|
411
|
+
|
|
412
|
+
run_dir = grading_path.parent
|
|
413
|
+
verdict = json.loads((run_dir / f"verdict_{eval_id}.json").read_text())
|
|
414
|
+
authsome_state = (run_dir / f"authsome_state_{eval_id}.txt").read_text() \
|
|
415
|
+
if (run_dir / f"authsome_state_{eval_id}.txt").exists() else ""
|
|
416
|
+
|
|
417
|
+
result_entry = {
|
|
418
|
+
"id": eval_id,
|
|
419
|
+
"name": eval_.get("name", ""),
|
|
420
|
+
"prompt": eval_["prompt"],
|
|
421
|
+
"agent": eval_.get("agent", "claude"),
|
|
422
|
+
"environment": eval_["environment"],
|
|
423
|
+
"authsome_state": authsome_state,
|
|
424
|
+
"requires_human": eval_.get("requires_human", False),
|
|
425
|
+
"rate_limited": rate_limited,
|
|
426
|
+
**verdict,
|
|
427
|
+
}
|
|
428
|
+
|
|
429
|
+
existing = {"results": []}
|
|
430
|
+
if grading_path.exists():
|
|
431
|
+
    existing = json.loads(grading_path.read_text())
|
|
432
|
+
|
|
433
|
+
all_results = existing["results"] + [result_entry]
|
|
434
|
+
passed = sum(1 for r in all_results if r["outcome"]["passed"] is True)
|
|
435
|
+
failed = sum(1 for r in all_results if r["outcome"]["passed"] is False)
|
|
436
|
+
skipped = sum(1 for r in all_results if r["outcome"]["passed"] is None)
|
|
437
|
+
|
|
438
|
+
grading = {
|
|
439
|
+
"skill_name": "authsome",
|
|
440
|
+
"timestamp": datetime.now().isoformat(timespec="seconds"),
|
|
441
|
+
"summary": {"passed": passed, "failed": failed, "skipped": skipped, "total": len(all_results)},
|
|
442
|
+
"results": all_results,
|
|
443
|
+
}
|
|
444
|
+
grading_path.write_text(json.dumps(grading, indent=2))
|
|
445
|
+
|
|
446
|
+
summary = grading["summary"]
|
|
447
|
+
print(f"Done: {summary['passed']} passed / {summary['failed']} failed / {summary['skipped']} skipped out of {summary['total']}")
|
|
448
|
+
print(f"Results: {grading_path}")
|
|
449
|
+
PYEOF
|
|
450
|
+
```
|
|
451
|
+
|
|
452
|
+
Before running the save script, write the current authsome state to
|
|
453
|
+
`RUN_DIR/authsome_state_N.txt` so it's captured in the grading record:
|
|
454
|
+
|
|
455
|
+
```bash
|
|
456
|
+
uv run authsome list > RUN_DIR/authsome_state_N.txt 2>&1
|
|
457
|
+
```
|
|
458
|
+
|
|
459
|
+
#### g. State-check and continue
|
|
460
|
+
|
|
461
|
+
After showing the verdict, immediately prepare for the next eval:
|
|
462
|
+
|
|
463
|
+
1. Run `uv run authsome list` and compare against the next eval's `environment` field.
|
|
464
|
+
2. Fix any mismatches inline (e.g. `uv run authsome revoke github`). Re-check until state matches.
|
|
465
|
+
3. If you need to log in (e.g. to provider X) and the environment says provider X is NOT connected, you may run `uv run authsome login <provider>`, then poll `uv run authsome list` a few seconds later to check whether the login succeeded.
|
|
466
|
+
4. Only pause and ask the user when:
|
|
467
|
+
- Polling fails during login (show the URL, wait for "done")
|
|
468
|
+
- A `requires_human` eval where the user must act during the run
|
|
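The "re-check until state matches" and "poll a few seconds later" steps above amount to a bounded polling loop, sketched here. The check itself is passed in as a callable, so nothing is assumed about the output format of `uv run authsome list`; the name `poll_until` and the default attempt count and delay are hypothetical.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, sleeping between tries.

    Returns True as soon as check() succeeds, False if every attempt fails.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no sleep after the final attempt
    return False
```

A `False` result corresponds to the "polling fails during login" case, where the runner shows the URL and waits for the user to reply "done".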
469
|
+
|
|
470
|
+
---
|
|
471
|
+
|
|
472
|
+
### 3. Teardown
|
|
473
|
+
|
|
474
|
+
Delete the eval profile's key files:
|
|
475
|
+
|
|
476
|
+
```bash
|
|
477
|
+
rm ~/.authsome/client/identities/EVAL_HANDLE.json
|
|
478
|
+
rm ~/.authsome/client/identities/EVAL_HANDLE.key
|
|
479
|
+
```
|
|
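The same cleanup can be sketched in Python in a way that tolerates files that are already gone. The `.json` and `.key` suffixes come from the commands above; the function name `remove_identity_files` is hypothetical, and the identities directory is passed in explicitly rather than hard-coded.

```python
from pathlib import Path


def remove_identity_files(handle: str, identities_dir: Path) -> list:
    """Delete <handle>.json and <handle>.key if present; return what was removed."""
    removed = []
    for suffix in (".json", ".key"):
        path = identities_dir / f"{handle}{suffix}"
        if path.exists():
            path.unlink()
            removed.append(path)
    return removed
```

For the layout shown above, the call would look like `remove_identity_files("EVAL_HANDLE", Path.home() / ".authsome/client/identities")`.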
480
|
+
|
|
481
|
+
Then switch back to the user's original profile:
|
|
482
|
+
|
|
483
|
+
```bash
|
|
484
|
+
uv run authsome profile use <original-handle>
|
|
485
|
+
```
|
|
486
|
+
|
|
487
|
+
---
|
|
488
|
+
|
|
489
|
+
### 4. Generate report
|
|
490
|
+
|
|
491
|
+
```bash
|
|
492
|
+
uv run python evals/generate_report.py RUN_DIR/grading.json
|
|
493
|
+
```
|
|
494
|
+
|
|
495
|
+
Tell the user the report path. The script opens it automatically.
|
|
@@ -18,7 +18,7 @@
|
|
|
18
18
|
"name": "Agentr",
|
|
19
19
|
"url": "https://github.com/agentrhq"
|
|
20
20
|
},
|
|
21
|
-
"homepage": "https://authsome.
|
|
21
|
+
"homepage": "https://authsome.ai",
|
|
22
22
|
"license": "MIT",
|
|
23
23
|
"keywords": ["auth", "oauth2", "credentials", "agent-identity", "broker"],
|
|
24
24
|
"category": "auth"
|
|
@@ -244,5 +244,12 @@ __marimo__/
|
|
|
244
244
|
# Git worktrees
|
|
245
245
|
.worktrees/
|
|
246
246
|
.claude/worktrees/
|
|
247
|
+
|
|
247
248
|
# Local authsome home directory
|
|
248
249
|
.authsome/
|
|
250
|
+
|
|
251
|
+
# Superpowers agent docs (local planning artifacts)
|
|
252
|
+
docs/superpowers/
|
|
253
|
+
|
|
254
|
+
# Authsome skill (generated at eval time)
|
|
255
|
+
.claude/skills/authsome/
|
|
@@ -1,5 +1,67 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [0.3.2](https://github.com/agentrhq/authsome/compare/authsome-v0.3.1...authsome-v0.3.2) (2026-05-20)
|
|
4
|
+
|
|
5
|
+
|
|
6
|
+
### Features
|
|
7
|
+
|
|
8
|
+
* Add posthog telemetry events ([575a464](https://github.com/agentrhq/authsome/commit/575a4648fbfa8c4de7b46c50311c2b426587977c))
|
|
9
|
+
* Add posthog telemetry events ([8797a94](https://github.com/agentrhq/authsome/commit/8797a94b406fe109b0a2216d693b74dd17446892))
|
|
10
|
+
* **evals:** /run-evals command + profile/run-dir flags ([bdeae58](https://github.com/agentrhq/authsome/commit/bdeae583a2d5bd9b4a28e1b717c74240469c890a))
|
|
11
|
+
* **evals:** add expected_interrupt and next_turn_instruction eval fields ([c4b2a93](https://github.com/agentrhq/authsome/commit/c4b2a9304ab53773c4b76cd4245b5663e744e4fc))
|
|
12
|
+
* **evals:** capture real claude transcripts via stream-json subprocess ([1b28f5a](https://github.com/agentrhq/authsome/commit/1b28f5ab4f9cd702e2ccf8b2bf805b0a58418158))
|
|
13
|
+
* **evals:** move new evals schema to evals/evals.json, restore skills copy ([5bb73ba](https://github.com/agentrhq/authsome/commit/5bb73ba509dc3a1b5d55c1356210f724d7c4b130))
|
|
14
|
+
* **evals:** profile isolation + authsome state check per eval ([73d3e70](https://github.com/agentrhq/authsome/commit/73d3e70b18387ec5510714e3ef4254f8fb33c49e))
|
|
15
|
+
* **proxy:** configurable intercept scope and unmatched policy ([987e312](https://github.com/agentrhq/authsome/commit/987e312aeeb680a4910697a18abb6eadb62d2b95))
|
|
16
|
+
* update health check to validate connections based on active identity and add test coverage ([9290169](https://github.com/agentrhq/authsome/commit/929016909595a68c4452c2291564241277df5ec6))
|
|
17
|
+
* update health check to validate connections based on active identity and add test coverage ([3fa9f97](https://github.com/agentrhq/authsome/commit/3fa9f97c8b3236b57649ae8b8a8e5ec538da8ca2))
|
|
18
|
+
|
|
19
|
+
|
|
20
|
+
### Bug Fixes
|
|
21
|
+
|
|
22
|
+
* copy full skill folder in evals and fix login flow links in authsome skill ([e8918a9](https://github.com/agentrhq/authsome/commit/e8918a97369f483f12b66e34d2eac5156ebc51f2))
|
|
23
|
+
* **evals:** use claude --system-prompt for judge, grade on rate limit ([b790c63](https://github.com/agentrhq/authsome/commit/b790c6329890cc0ccc824b51ca8b428ad6bca1c6))
|
|
24
|
+
* **evals:** use hermes as LLM judge instead of claude -p ([1157837](https://github.com/agentrhq/authsome/commit/1157837e4bda094d3f8f804886ecc4820fa62a5c))
|
|
25
|
+
* **marketplace:** point plugin homepage to authsome.ai ([11ffd80](https://github.com/agentrhq/authsome/commit/11ffd80da7e0c05c998236bfa3a7a9c59fb17de1))
|
|
26
|
+
* **marketplace:** point plugin homepage to authsome.ai ([a767cbc](https://github.com/agentrhq/authsome/commit/a767cbcea2464ed8fc7469219d33df3c5df9d794))
|
|
27
|
+
* **proxy:** address PR review feedback on mode validation and route defaults ([83ab469](https://github.com/agentrhq/authsome/commit/83ab4691e63ff62c8db9168af4e2cbbfdba55401))
|
|
28
|
+
|
|
29
|
+
|
|
30
|
+
### Documentation
|
|
31
|
+
|
|
32
|
+
* add GitHub OAuth app setup walkthrough to quickstart ([f45d9c9](https://github.com/agentrhq/authsome/commit/f45d9c9ece6bb7ba017b6ad6c1041b3bac53a4f7))
|
|
33
|
+
* add Roadmap, Contributing, and Links sections to README ([07534a4](https://github.com/agentrhq/authsome/commit/07534a4cdcec7f05f552b49f5d8d5a032232a630))
|
|
34
|
+
* add Roadmap, Contributing, and Links sections; rewrite roadmap.mdx ([03990b2](https://github.com/agentrhq/authsome/commit/03990b28db0460ea18c5c16eea7848ab262d235d))
|
|
35
|
+
* correct roadmap against changelog as source of truth ([eec08e3](https://github.com/agentrhq/authsome/commit/eec08e3b1605ac542153481b6faf49ab0a904676))
|
|
36
|
+
* **evals:** add hermes smoke test to pre-session setup ([bc4ef39](https://github.com/agentrhq/authsome/commit/bc4ef3917d74747fbeddec3c0f540f28b9f895d7))
|
|
37
|
+
* **evals:** add skip handling and per-eval max_turns config ([68ae0ed](https://github.com/agentrhq/authsome/commit/68ae0ed68b49df4d45083a911d8482dece1ae7a3))
|
|
38
|
+
* **evals:** merge setup.md into run-evals command, delete setup.md ([072836f](https://github.com/agentrhq/authsome/commit/072836f2d8a95822023f6f6f428d00bc21430875))
|
|
39
|
+
* **evals:** remove profile creation from run-evals command ([33ab525](https://github.com/agentrhq/authsome/commit/33ab5258b8b7144c02d12c5d3b71fa0aba4fa707))
|
|
40
|
+
* **evals:** update design spec and plan to reflect as-built state ([2558f5c](https://github.com/agentrhq/authsome/commit/2558f5c0de134bc26fee4d59a2c1120b736563c9))
|
|
41
|
+
* mark policy layer and firewall rules as shipped, add multi-user to coming next ([b1487e8](https://github.com/agentrhq/authsome/commit/b1487e87bcf5c99912181331296441190c4c2f05))
|
|
42
|
+
* **quickstart:** add provider tabs and a runnable agent example ([7c577ab](https://github.com/agentrhq/authsome/commit/7c577abe64fb9b4db34fa8b0bbc43f72289dd8da))
|
|
43
|
+
* **quickstart:** GitHub OAuth setup walkthrough and provider tabs ([63ffdec](https://github.com/agentrhq/authsome/commit/63ffdec8abe91c90966bfb61b8b547903b945bcf))
|
|
44
|
+
* reframe roadmap as end-user capabilities, not implementation work ([0740764](https://github.com/agentrhq/authsome/commit/07407646f6c88c8c1b72d8a43f947f80e24492dd))
|
|
45
|
+
* rewrite profile storage model to match current architecture ([457d1b2](https://github.com/agentrhq/authsome/commit/457d1b2ee7d54ce03ea697beb4913a5ce01ce0c7))
|
|
46
|
+
* rewrite profile storage model to match current architecture ([773323a](https://github.com/agentrhq/authsome/commit/773323ab73bd8a05e251f5ef121b2b072dc3259a))
|
|
47
|
+
* rewrite roadmap.mdx, remove ROADMAP.md, link README to docs ([a165d37](https://github.com/agentrhq/authsome/commit/a165d372fc5b44d7ee34066e40d6ade24b9ed575))
|
|
48
|
+
* simplify hosted daemon mode description in roadmap ([5458bf6](https://github.com/agentrhq/authsome/commit/5458bf62a5ededf7cfa2030d104bcf94b41178e6))
|
|
49
|
+
* **site:** adopt skill-driven CLI conventions across the docs ([8ba6967](https://github.com/agentrhq/authsome/commit/8ba6967f490502ab55bd401de25e8328bf476067))
|
|
50
|
+
* **site:** adopt skill-driven CLI conventions across the docs ([71b89fd](https://github.com/agentrhq/authsome/commit/71b89fd036dac67c7cfc761295b39889bf3ec9f2))
|
|
51
|
+
|
|
52
|
+
## [0.3.1](https://github.com/agentrhq/authsome/compare/authsome-v0.3.0...authsome-v0.3.1) (2026-05-17)
|
|
53
|
+
|
|
54
|
+
|
|
55
|
+
### Bug Fixes
|
|
56
|
+
|
|
57
|
+
* **cli:** resolve three CLI bugs and improve audit log command ([3b990e3](https://github.com/agentrhq/authsome/commit/3b990e3cae998d9ccaccb421b5cb577c2bb89de3))
|
|
58
|
+
* **cli:** resolve three CLI bugs, improve audit log, and sync docs ([dd9cad3](https://github.com/agentrhq/authsome/commit/dd9cad326cb2817826eb8e5b8bb42f5f3df8e2a2))
|
|
59
|
+
|
|
60
|
+
|
|
61
|
+
### Documentation
|
|
62
|
+
|
|
63
|
+
* **cli:** sync reference and manual-testing guide with 0.3.0 implementation ([bdf659f](https://github.com/agentrhq/authsome/commit/bdf659ffffc0a0c2100cc21508daec042161656e))
|
|
64
|
+
|
|
3
65
|
## [0.3.0](https://github.com/agentrhq/authsome/compare/authsome-v0.2.4...authsome-v0.3.0) (2026-05-15)
|
|
4
66
|
|
|
5
67
|
|
|
@@ -86,7 +86,7 @@ The diff shows what changed. The commit message should say why. Future readers (
|
|
|
86
86
|
## Getting started
|
|
87
87
|
|
|
88
88
|
```bash
|
|
89
|
-
git clone https://github.com/
|
|
89
|
+
git clone https://github.com/agentrhq/authsome.git
|
|
90
90
|
cd authsome
|
|
91
91
|
uv pip install -e ".[dev]"
|
|
92
92
|
pre-commit install # runs ruff automatically on every commit
|