authsome 0.3.0__tar.gz → 0.3.2__tar.gz

Files changed (314)
  1. authsome-0.3.2/.claude/commands/run-evals.md +495 -0
  2. {authsome-0.3.0 → authsome-0.3.2}/.claude-plugin/marketplace.json +1 -1
  3. authsome-0.3.2/.github/release-please-manifest.json +3 -0
  4. {authsome-0.3.0 → authsome-0.3.2}/.gitignore +7 -0
  5. {authsome-0.3.0 → authsome-0.3.2}/CHANGELOG.md +62 -0
  6. {authsome-0.3.0 → authsome-0.3.2}/CONTRIBUTING.md +1 -1
  7. {authsome-0.3.0 → authsome-0.3.2}/PKG-INFO +38 -20
  8. {authsome-0.3.0 → authsome-0.3.2}/README.md +36 -19
  9. authsome-0.3.2/docs/internal/cli-design-review.md +253 -0
  10. {authsome-0.3.0 → authsome-0.3.2}/docs/internal/manual-testing.md +18 -12
  11. {authsome-0.3.0 → authsome-0.3.2}/docs/site/README.md +1 -1
  12. {authsome-0.3.0 → authsome-0.3.2}/docs/site/changelog.mdx +5 -5
  13. {authsome-0.3.0 → authsome-0.3.2}/docs/site/compared.mdx +1 -1
  14. {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/credential-storage.mdx +24 -23
  15. authsome-0.3.2/docs/site/concepts/profiles-vs-connections.mdx +86 -0
  16. {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/provider-registry.mdx +4 -4
  17. {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/the-daemon.mdx +3 -3
  18. {authsome-0.3.0 → authsome-0.3.2}/docs/site/docs.json +1 -0
  19. {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/custom-providers.mdx +10 -10
  20. {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/headless-device-code.mdx +4 -4
  21. {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/login-with-oauth.mdx +12 -12
  22. {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/multiple-connections.mdx +14 -14
  23. {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/profiles.mdx +1 -1
  24. {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/run-agents-with-proxy.mdx +6 -6
  25. {authsome-0.3.0 → authsome-0.3.2}/docs/site/guides/use-api-keys.mdx +15 -15
  26. authsome-0.3.2/docs/site/images/login-github-authsome.png +0 -0
  27. {authsome-0.3.0 → authsome-0.3.2}/docs/site/index.mdx +2 -2
  28. {authsome-0.3.0 → authsome-0.3.2}/docs/site/installation.mdx +29 -25
  29. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/anthropic-sdk.mdx +5 -5
  30. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/claude-code.mdx +23 -19
  31. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/codex.mdx +18 -14
  32. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/cowork.mdx +4 -4
  33. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/cursor.mdx +17 -9
  34. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/index.mdx +3 -3
  35. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/langchain.mdx +3 -3
  36. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/llamaindex.mdx +6 -6
  37. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/nanoclaw.mdx +3 -3
  38. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/openai-agents-sdk.mdx +5 -5
  39. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/opencode.mdx +11 -7
  40. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/agents/python.mdx +3 -3
  41. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/ahrefs.mdx +11 -11
  42. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/apollo.mdx +11 -11
  43. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/ashby.mdx +11 -11
  44. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/beehiiv.mdx +11 -11
  45. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/brevo.mdx +11 -11
  46. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/buffer.mdx +11 -11
  47. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/calendly.mdx +11 -11
  48. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/clearbit.mdx +11 -11
  49. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/dub.mdx +11 -11
  50. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/g2.mdx +11 -11
  51. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/hunter.mdx +11 -11
  52. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/index.mdx +5 -5
  53. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/instantly.mdx +11 -11
  54. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/intercom.mdx +11 -11
  55. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/keywords-everywhere.mdx +11 -11
  56. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/klaviyo.mdx +11 -11
  57. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/lemlist.mdx +11 -11
  58. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/livestorm.mdx +11 -11
  59. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/mailchimp.mdx +11 -11
  60. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/mention-me.mdx +11 -11
  61. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/openai.mdx +15 -15
  62. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/optimizely.mdx +11 -11
  63. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/postmark.mdx +11 -11
  64. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/resend.mdx +11 -11
  65. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/rewardful.mdx +11 -11
  66. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/savvycal.mdx +11 -11
  67. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/semrush.mdx +11 -11
  68. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/sendgrid.mdx +11 -11
  69. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/tolt.mdx +11 -11
  70. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/typeform.mdx +11 -11
  71. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/wistia.mdx +11 -11
  72. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/api-key/zapier.mdx +11 -11
  73. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/atlassian.mdx +10 -10
  74. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/discord.mdx +10 -10
  75. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/github.mdx +52 -27
  76. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/gitlab.mdx +11 -11
  77. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/google.mdx +11 -11
  78. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/hubspot.mdx +10 -10
  79. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/index.mdx +3 -3
  80. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/klaviyo-oauth.mdx +10 -10
  81. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/linear.mdx +10 -10
  82. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/microsoft.mdx +11 -11
  83. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/notion-dcr.mdx +8 -8
  84. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/notion.mdx +8 -8
  85. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/postiz.mdx +8 -8
  86. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/slack.mdx +10 -10
  87. {authsome-0.3.0 → authsome-0.3.2}/docs/site/integrations/oauth/x.mdx +10 -10
  88. authsome-0.3.2/docs/site/quickstart.mdx +303 -0
  89. {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/audit-log.mdx +3 -3
  90. {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/bundled-providers.mdx +5 -5
  91. authsome-0.3.2/docs/site/reference/cli.mdx +298 -0
  92. {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/daemon-api.mdx +3 -3
  93. {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/environment-variables.mdx +14 -14
  94. {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/file-layout.mdx +7 -7
  95. {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/python-library.mdx +4 -4
  96. authsome-0.3.2/docs/site/roadmap.mdx +55 -0
  97. {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/disclosure.mdx +1 -1
  98. {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/encryption.mdx +4 -4
  99. {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/hosted-deployment.mdx +1 -1
  100. {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/threat-model.mdx +1 -1
  101. authsome-0.3.2/docs/site/troubleshooting/auth-errors.mdx +91 -0
  102. {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/daemon-issues.mdx +13 -13
  103. {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/doctor.mdx +5 -5
  104. {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/oauth-callbacks.mdx +7 -7
  105. {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/proxy-networking.mdx +7 -7
  106. {authsome-0.3.0 → authsome-0.3.2}/docs/site/troubleshooting/token-refresh.mdx +12 -12
  107. authsome-0.3.2/evals/.gitignore +2 -0
  108. authsome-0.3.2/evals/evals.json +85 -0
  109. authsome-0.3.2/evals/generate_report.py +278 -0
  110. {authsome-0.3.0 → authsome-0.3.2}/pyproject.toml +5 -1
  111. authsome-0.3.2/skills/authsome/SKILL.md +112 -0
  112. authsome-0.3.2/skills/authsome/references/adding-provider.md +19 -0
  113. authsome-0.3.2/skills/authsome/references/feedback.md +85 -0
  114. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/service.py +54 -39
  115. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/client.py +8 -3
  116. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/client_config.py +17 -1
  117. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/context.py +6 -0
  118. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/daemon_control.py +58 -10
  119. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/main.py +150 -37
  120. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/runner.py +4 -2
  121. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/server.py +79 -14
  122. authsome-0.3.2/src/authsome/server/analytics.py +37 -0
  123. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/app.py +3 -0
  124. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/auth.py +67 -0
  125. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/connections.py +24 -1
  126. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/health.py +11 -4
  127. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/identities.py +8 -0
  128. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/providers.py +18 -0
  129. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/proxy.py +14 -3
  130. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/_layout.html +2 -2
  131. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/overview.html +1 -1
  132. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_client_signing.py +2 -1
  133. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_daemon.py +11 -1
  134. {authsome-0.3.0 → authsome-0.3.2}/tests/proxy/test_proxy.py +80 -4
  135. {authsome-0.3.0 → authsome-0.3.2}/tests/server/test_pop_auth.py +33 -0
  136. {authsome-0.3.0 → authsome-0.3.2}/uv.lock +4 -1
  137. authsome-0.3.0/.github/release-please-manifest.json +0 -3
  138. authsome-0.3.0/docs/site/concepts/profiles-vs-connections.mdx +0 -102
  139. authsome-0.3.0/docs/site/quickstart.mdx +0 -109
  140. authsome-0.3.0/docs/site/reference/cli.mdx +0 -259
  141. authsome-0.3.0/docs/site/roadmap.mdx +0 -103
  142. authsome-0.3.0/skills/authsome/SKILL.md +0 -84
  143. authsome-0.3.0/skills/authsome/evals/evals.json +0 -29
  144. {authsome-0.3.0 → authsome-0.3.2}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
  145. {authsome-0.3.0 → authsome-0.3.2}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
  146. {authsome-0.3.0 → authsome-0.3.2}/.github/dependabot.yml +0 -0
  147. {authsome-0.3.0 → authsome-0.3.2}/.github/pull_request_template.md +0 -0
  148. {authsome-0.3.0 → authsome-0.3.2}/.github/release-please-config.json +0 -0
  149. {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/pr-title.yml +0 -0
  150. {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/publish-rc.yml +0 -0
  151. {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/publish.yml +0 -0
  152. {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/release-please.yml +0 -0
  153. {authsome-0.3.0 → authsome-0.3.2}/.github/workflows/test.yml +0 -0
  154. {authsome-0.3.0 → authsome-0.3.2}/.pre-commit-config.yaml +0 -0
  155. {authsome-0.3.0 → authsome-0.3.2}/AGENTS.md +0 -0
  156. {authsome-0.3.0 → authsome-0.3.2}/CLAUDE.md +0 -0
  157. {authsome-0.3.0 → authsome-0.3.2}/LICENSE +0 -0
  158. {authsome-0.3.0 → authsome-0.3.2}/assets/authsome-how-it-works-dark.svg +0 -0
  159. {authsome-0.3.0 → authsome-0.3.2}/assets/authsome-how-it-works-light.svg +0 -0
  160. {authsome-0.3.0 → authsome-0.3.2}/assets/authsome-logo-dark.svg +0 -0
  161. {authsome-0.3.0 → authsome-0.3.2}/assets/authsome-logo-light.svg +0 -0
  162. {authsome-0.3.0 → authsome-0.3.2}/docs/UBIQUITOUS_LANGUAGE.md +0 -0
  163. {authsome-0.3.0 → authsome-0.3.2}/docs/adr/0001-provider-client-record-server-scope.md +0 -0
  164. {authsome-0.3.0 → authsome-0.3.2}/docs/adr/0002-server-registered-identities.md +0 -0
  165. {authsome-0.3.0 → authsome-0.3.2}/docs/internal/authsome-design.md +0 -0
  166. {authsome-0.3.0 → authsome-0.3.2}/docs/register-provider.md +0 -0
  167. {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/architecture.mdx +0 -0
  168. {authsome-0.3.0 → authsome-0.3.2}/docs/site/concepts/proxy-injection.mdx +0 -0
  169. {authsome-0.3.0 → authsome-0.3.2}/docs/site/favicon.svg +0 -0
  170. {authsome-0.3.0 → authsome-0.3.2}/docs/site/logo/dark.svg +0 -0
  171. {authsome-0.3.0 → authsome-0.3.2}/docs/site/logo/light.svg +0 -0
  172. {authsome-0.3.0 → authsome-0.3.2}/docs/site/reference/provider-schema.mdx +0 -0
  173. {authsome-0.3.0 → authsome-0.3.2}/docs/site/security/daemon-trust-boundary.mdx +0 -0
  174. {authsome-0.3.0 → authsome-0.3.2}/docs/site/snippets/masked-input-note.mdx +0 -0
  175. {authsome-0.3.0 → authsome-0.3.2}/docs/site/snippets/multi-connections-cta.mdx +0 -0
  176. {authsome-0.3.0 → authsome-0.3.2}/docs/site/snippets/whats-next-apikey.mdx +0 -0
  177. {authsome-0.3.0 → authsome-0.3.2}/docs/site/snippets/whats-next-oauth.mdx +0 -0
  178. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/__init__.py +0 -0
  179. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/audit/__init__.py +0 -0
  180. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/__init__.py +0 -0
  181. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/__init__.py +0 -0
  182. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/ahrefs.json +0 -0
  183. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/apollo.json +0 -0
  184. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/ashby.json +0 -0
  185. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/atlassian.json +0 -0
  186. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/beehiiv.json +0 -0
  187. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/brevo.json +0 -0
  188. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/buffer.json +0 -0
  189. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/calendly.json +0 -0
  190. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/clearbit.json +0 -0
  191. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/discord.json +0 -0
  192. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/dub.json +0 -0
  193. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/g2.json +0 -0
  194. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/github.json +0 -0
  195. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/gitlab.json +0 -0
  196. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/google.json +0 -0
  197. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/hubspot.json +0 -0
  198. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/hunter.json +0 -0
  199. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/instantly.json +0 -0
  200. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/intercom.json +0 -0
  201. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/keywords-everywhere.json +0 -0
  202. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/klaviyo-oauth.json +0 -0
  203. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/klaviyo.json +0 -0
  204. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/lemlist.json +0 -0
  205. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/linear.json +0 -0
  206. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/livestorm.json +0 -0
  207. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/mailchimp.json +0 -0
  208. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/mention-me.json +0 -0
  209. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/microsoft.json +0 -0
  210. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/notion.json +0 -0
  211. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/notion_dcr.json +0 -0
  212. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/openai.json +0 -0
  213. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/optimizely.json +0 -0
  214. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/postiz.json +0 -0
  215. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/postmark.json +0 -0
  216. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/resend.json +0 -0
  217. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/rewardful.json +0 -0
  218. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/savvycal.json +0 -0
  219. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/semrush.json +0 -0
  220. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/sendgrid.json +0 -0
  221. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/slack.json +0 -0
  222. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/tolt.json +0 -0
  223. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/typeform.json +0 -0
  224. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/wistia.json +0 -0
  225. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/x.json +0 -0
  226. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/bundled_providers/zapier.json +0 -0
  227. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/__init__.py +0 -0
  228. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/api_key.py +0 -0
  229. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/base.py +0 -0
  230. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/dcr_pkce.py +0 -0
  231. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/device_code.py +0 -0
  232. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/flows/pkce.py +0 -0
  233. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/input_provider.py +0 -0
  234. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/__init__.py +0 -0
  235. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/config.py +0 -0
  236. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/connection.py +0 -0
  237. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/enums.py +0 -0
  238. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/models/provider.py +0 -0
  239. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/sessions.py +0 -0
  240. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/auth/utils.py +0 -0
  241. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/__init__.py +0 -0
  242. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/cli/helpers.py +0 -0
  243. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/errors.py +0 -0
  244. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/identity/__init__.py +0 -0
  245. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/identity/keys.py +0 -0
  246. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/identity/proof.py +0 -0
  247. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/identity/registry.py +0 -0
  248. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/paths.py +0 -0
  249. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/__init__.py +0 -0
  250. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/certs.py +0 -0
  251. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/proxy/router.py +0 -0
  252. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/py.typed +0 -0
  253. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/__init__.py +0 -0
  254. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/daemon.py +0 -0
  255. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/dependencies.py +0 -0
  256. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/__init__.py +0 -0
  257. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/_deps.py +0 -0
  258. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/routes/ui.py +0 -0
  259. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/schemas.py +0 -0
  260. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/ui/__init__.py +0 -0
  261. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/ui/pages.py +0 -0
  262. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/ui/web_theme.py +0 -0
  263. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/ui_sessions.py +0 -0
  264. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/server/urls.py +0 -0
  265. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/store/__init__.py +0 -0
  266. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/store/interfaces.py +0 -0
  267. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/store/local.py +0 -0
  268. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/__init__.py +0 -0
  269. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/static/app.js +0 -0
  270. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/static/style.css +0 -0
  271. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/_app_detail_shell.html +0 -0
  272. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/app_detail_apikey.html +0 -0
  273. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/app_detail_disconnected.html +0 -0
  274. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/app_detail_oauth.html +0 -0
  275. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/ui/templates/connections.html +0 -0
  276. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/utils.py +0 -0
  277. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/vault/__init__.py +0 -0
  278. {authsome-0.3.0 → authsome-0.3.2}/src/authsome/vault/crypto.py +0 -0
  279. {authsome-0.3.0 → authsome-0.3.2}/tests/__init__.py +0 -0
  280. {authsome-0.3.0 → authsome-0.3.2}/tests/auth/__init__.py +0 -0
  281. {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_flows.py +0 -0
  282. {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_models.py +0 -0
  283. {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_service.py +0 -0
  284. {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_service_provider_clients.py +0 -0
  285. {authsome-0.3.0 → authsome-0.3.2}/tests/auth/test_url_template.py +0 -0
  286. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/__init__.py +0 -0
  287. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/conftest.py +0 -0
  288. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_doctor.py +0 -0
  289. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_get.py +0 -0
  290. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_helpers.py +0 -0
  291. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_identity.py +0 -0
  292. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_import_env.py +0 -0
  293. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_init.py +0 -0
  294. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_list.py +0 -0
  295. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_login.py +0 -0
  296. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_logout.py +0 -0
  297. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_register.py +0 -0
  298. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_revoke.py +0 -0
  299. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_ui.py +0 -0
  300. {authsome-0.3.0 → authsome-0.3.2}/tests/cli/test_whoami.py +0 -0
  301. {authsome-0.3.0 → authsome-0.3.2}/tests/common/__init__.py +0 -0
  302. {authsome-0.3.0 → authsome-0.3.2}/tests/common/test_audit.py +0 -0
  303. {authsome-0.3.0 → authsome-0.3.2}/tests/common/test_errors.py +0 -0
  304. {authsome-0.3.0 → authsome-0.3.2}/tests/common/test_logging.py +0 -0
  305. {authsome-0.3.0 → authsome-0.3.2}/tests/common/test_utils.py +0 -0
  306. {authsome-0.3.0 → authsome-0.3.2}/tests/conftest.py +0 -0
  307. {authsome-0.3.0 → authsome-0.3.2}/tests/identity/test_identity.py +0 -0
  308. {authsome-0.3.0 → authsome-0.3.2}/tests/identity/test_proof.py +0 -0
  309. {authsome-0.3.0 → authsome-0.3.2}/tests/proxy/__init__.py +0 -0
  310. {authsome-0.3.0 → authsome-0.3.2}/tests/server/test_auth_sessions.py +0 -0
  311. {authsome-0.3.0 → authsome-0.3.2}/tests/server/test_provider_operation_policy.py +0 -0
  312. {authsome-0.3.0 → authsome-0.3.2}/tests/server/test_ui_sessions.py +0 -0
  313. {authsome-0.3.0 → authsome-0.3.2}/tests/vault/__init__.py +0 -0
  314. {authsome-0.3.0 → authsome-0.3.2}/tests/vault/test_crypto.py +0 -0
@@ -0,0 +1,495 @@
1
+ # Run Authsome Evals
2
+
3
+ Interactive eval runner for the authsome skill. You orchestrate everything
4
+ inline — agent invocation, transcript parsing, grading, and result saving.
5
+ There is no separate Python runner script.
6
+
7
+ ## Pre-session setup
8
+
9
+ Run this once before starting an eval session.
10
+
11
+ **1. Install the latest authsome CLI:**
12
+
13
+ ```bash
14
+ uv sync
15
+ ```
16
+
17
+ Verify:
18
+
19
+ ```bash
20
+ uv run authsome --version
21
+ ```
22
+
23
+ **2. Verify the existing identity still works, then create a fresh one:**
24
+
25
+ ```bash
26
+ # Check the current identity is healthy
27
+ uv run authsome doctor
28
+
29
+ # Create a new identity for the eval session
30
+ uv run authsome profile create --json
31
+ ```
32
+
33
+ Save the new `profile` handle. Then switch to it:
34
+
35
+ ```bash
36
+ uv run authsome profile use <new-handle>
37
+ ```
38
+
39
+ **3. Confirm the new identity starts clean:**
40
+
41
+ ```bash
42
+ uv run authsome list
43
+ ```
44
+
45
+ Expected: no providers connected. If any show `connected`, the wrong
46
+ profile may be active — check with `cat ~/.authsome/client/config.json`.
47
+
48
+ **4. Restart the daemon using the dev version:**
49
+
50
+ The daemon may be running as a globally tool-installed binary while the CLI runs via `uv run`. This version mismatch causes PoP auth failures (spurious 401s) that confuse agents into running `authsome init` mid-eval, corrupting the eval profile. Restart to ensure both CLI and daemon use the same code:
51
+
52
+ ```bash
53
+ uv run authsome daemon restart
54
+ ```
55
+
56
+ Verify:
57
+
58
+ ```bash
59
+ uv run authsome list
60
+ ```
61
+
62
+ Expected: the same clean state as before. If the daemon fails to restart, check for port conflicts with `lsof -i :7998`.
63
+
64
+ **5. Remove hermes GitHub skills to avoid interference with authsome triggering:**
65
+
66
+ ```bash
67
+ rm -rf ~/.hermes/skills/github
68
+ ```
69
+
70
+ This prevents hermes from using its bundled GitHub skills instead of loading authsome.
71
+
72
+ **6. Verify hermes and claude are working:**
73
+
74
+ ```bash
75
+ hermes chat -Q -q "reply with the single word OK" -t ""
76
+ claude -p "reply with the single word OK" --output-format text
77
+ ```
78
+
79
+ Expected: both respond with `OK`. Hermes runs the eval agents; claude is the LLM judge.
80
+
81
+ ---
82
+
83
+ ## Arguments
84
+
85
+ - No args — run all non-optional evals (ids 1–6)
86
+ - `--id N` — run only eval with that id
87
+ - `--all` — include optional evals (currently id 7: Agentic Installation)
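The argument handling above can be sketched as a small filter. This is only an illustration: it assumes the evals are already loaded as a list of dicts with `id` and `optional` fields (the field names come from the table in step 1), and that `--id N` selects an eval even when it is optional. `select_evals` is a hypothetical helper, not part of authsome.

```python
def select_evals(evals, only_id=None, include_optional=False):
    """Return the evals to run, preserving their order in the file."""
    if only_id is not None:
        # --id N: run only the eval with that id (assumed to ignore `optional`)
        return [e for e in evals if e.get("id") == only_id]
    if include_optional:
        # --all: every eval, optional ones included
        return list(evals)
    # No args: skip anything flagged optional
    return [e for e in evals if not e.get("optional", False)]


# Hypothetical data shaped like the description: ids 1-6 required, id 7 optional
sample_evals = [{"id": i, "optional": i == 7} for i in range(1, 8)]
default_ids = [e["id"] for e in select_evals(sample_evals)]
```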
88
+
89
+ ---
90
+
91
+ ## Steps
92
+
93
+ ### 1. Load evals
94
+
95
+ Read `evals/evals.json`. Show the user a table of which
96
+ evals will run (id, name, agent, requires_human, optional).
97
+
98
+ Read `~/.authsome/client/config.json` and save `active_identity` as
99
+ `EVAL_HANDLE` — this is the fresh profile created during pre-session setup.
100
+
101
+ Create the run directory and save its path as `RUN_DIR`:
102
+
103
+ ```bash
104
+ mkdir -p "evals/results/$(date +%Y%m%d_%H%M%S)"
105
+ ```
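The `mkdir` command creates the directory but does not capture its path. A minimal sketch of reading `EVAL_HANDLE` and creating `RUN_DIR` in one place might look like the following. It assumes `config.json` is a flat JSON object with an `active_identity` key, as the text implies; `read_eval_handle` and `create_run_dir` are hypothetical names.

```python
import json
from datetime import datetime
from pathlib import Path


def read_eval_handle(config_path):
    """Return the active identity handle from the client config file."""
    config = json.loads(Path(config_path).read_text())
    return config["active_identity"]


def create_run_dir(results_root="evals/results", now=None):
    """Create a timestamped run directory and return its path."""
    # Same layout as `date +%Y%m%d_%H%M%S` in the shell command
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(results_root) / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

In a session this would be called as `read_eval_handle(Path.home() / ".authsome/client/config.json")` followed by `create_run_dir()`, keeping both return values for the rest of the run.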
106
+
107
+ ---
108
+
109
+ ### 2. Per-eval loop
110
+
111
+ For each eval to run, **in order**:
112
+
113
+ #### a. State-check
114
+
115
+ Run `uv run authsome list` and **show the full output to the user**.
116
+ Compare it against the eval's `environment` field and explicitly state
117
+ whether it matches. If it matches, proceed. If not, show the mismatch
118
+ and fix it inline using `uv run authsome` commands (e.g.
119
+ `uv run authsome logout github`). Re-check until state matches.
120
+
121
+ If the required state cannot be reached automatically (e.g. gh CLI login
122
+ requires interactive browser auth that isn't part of the eval), ask the
123
+ user whether to skip. If they say skip, write a null verdict and record
124
+ it in grading.json, then move to the next eval:
125
+
126
+ ```bash
127
+ # Write null verdict
128
+ cat > RUN_DIR/verdict_N.json <<'EOF'
129
+ {
130
+ "outcome": {"passed": null, "evidence": "skipped by user"},
131
+ "trajectory_efficiency": {"passed": null, "evidence": "skipped by user"}
132
+ }
133
+ EOF
134
+
135
+ # Capture authsome state
136
+ uv run authsome list > RUN_DIR/authsome_state_N.txt 2>&1
137
+
138
+ # Append to grading.json (same save script as step f, RATE_LIMITED=false)
139
+ ```
140
+
141
+ Then run the step-f save script for this eval and continue to the next one.
142
+
143
+ For `requires_human` evals, also show `human_instructions` now.
144
+
145
+ #### b. Install the skill for the agent
146
+
147
+ ```bash
148
+ # For hermes evals
149
+ rm -rf ~/.hermes/skills/authsome
150
+ cp -r skills/authsome ~/.hermes/skills/authsome
151
+
152
+ # For claude evals
153
+ rm -rf .claude/skills/authsome
154
+ mkdir -p .claude/skills
155
+ cp -r skills/authsome .claude/skills/authsome
156
+ ```
157
+
158
+ #### c. Run the agent
159
+
160
+ Before running the agent, read `max_turns` from the eval object (default
161
+ `12` if absent) and store it as `MAX_TURNS`.
162
+
163
+ **Hermes evals:**
164
+
165
+ ```bash
166
+ hermes chat -v -q "PROMPT" --yolo --max-turns MAX_TURNS \
167
+ 2>&1 | tee RUN_DIR/transcript_N.txt
168
+ ```
169
+
170
+ The combined stdout+stderr is the transcript. Save `RATE_LIMITED=false`
171
+ unless the output contains: `rate limit`, `429`, `too many requests`,
172
+ `usage limit`, or `quota exceeded`.
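The rate-limit check described above amounts to a substring scan over the transcript. A minimal sketch follows, assuming case-insensitive matching (the text does not say whether case matters for hermes output); `is_rate_limited` is a hypothetical helper name.

```python
RATE_LIMIT_SIGNALS = (
    "rate limit",
    "429",
    "too many requests",
    "usage limit",
    "quota exceeded",
)


def is_rate_limited(transcript):
    """Return True if the transcript contains any rate-limit signal."""
    lowered = transcript.lower()
    return any(signal in lowered for signal in RATE_LIMIT_SIGNALS)
```

A bare `429` match is deliberately loose: it also fires on unrelated numbers that happen to contain those digits, so a positive result is worth confirming by eye before discarding a run.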
173
+
174
+ **Claude evals — turn 1:**
175
+
176
+ ```bash
177
+ claude --dangerously-skip-permissions --verbose --output-format stream-json \
178
+ --max-turns MAX_TURNS -p "PROMPT" > RUN_DIR/raw_N_t1.jsonl 2>&1
179
+ ```
180
+
181
+ Then parse the raw stream-json, extract the human-readable transcript,
182
+ and detect whether the agent is waiting for a human action:
183
+
184
+ ```bash
185
+ uv run python - RUN_DIR/raw_N_t1.jsonl > RUN_DIR/transcript_N.txt 2> RUN_DIR/meta_N.txt <<'PYEOF'
186
+ import sys, json, re
187
+
188
+ RATE_LIMIT_SIGNALS = ["rate limit", "429", "too many requests", "usage limit", "quota exceeded"]
189
+ path = sys.argv[1]
190
+ lines_out = []
191
+ session_id = None
192
+
193
+ for line in open(path):
194
+     line = line.strip()
195
+     if not line:
196
+         continue
197
+     try:
198
+         ev = json.loads(line)
199
+     except json.JSONDecodeError:
200
+         lines_out.append(line)
201
+         continue
202
+     t = ev.get("type", "")
203
+     if t == "assistant" and "message" in ev:
204
+         for block in ev["message"].get("content", []):
205
+             if block.get("type") == "text":
206
+                 lines_out.append(f"[assistant] {block['text']}")
207
+             elif block.get("type") == "tool_use":
208
+                 inp = json.dumps(block.get("input", {}))
209
+                 lines_out.append(f"[tool_use] {block['name']}({inp})")
210
+     elif t == "user" and "message" in ev:
211
+         for block in ev["message"].get("content", []):
212
+             if block.get("type") == "tool_result":
213
+                 content = block.get("content", "")
214
+                 if isinstance(content, list):
215
+                     content = " ".join(
216
+                         c.get("text", "") for c in content if isinstance(c, dict)
217
+                     )
218
+                 lines_out.append(f"[tool_result] {str(content)[:800]}")
219
+     elif t == "result":
220
+         if ev.get("result"):
221
+             lines_out.append(f"[result] {ev['result']}")
222
+         if ev.get("error"):
223
+             lines_out.append(f"[error] {ev['error']}")
224
+     if ev.get("session_id"):
225
+         session_id = ev["session_id"]
226
+
227
+ transcript = "\n".join(lines_out)
228
+ print(transcript)
229
+
230
+ # Emit metadata to stderr for the caller to read
231
+ url_match = re.search(r'http://127\.0\.0\.1:\d+/\S+', transcript)
232
+ if url_match:
232
+     print(f"WAITING_URL={url_match.group()}", file=sys.stderr)
233
+ if session_id:
234
+     print(f"SESSION_ID={session_id}", file=sys.stderr)
236
+ rate_limited = any(sig in transcript.lower() for sig in RATE_LIMIT_SIGNALS)
237
+ print(f"RATE_LIMITED={'true' if rate_limited else 'false'}", file=sys.stderr)
238
+ PYEOF
239
+ ```
240
+
241
+ Read `RUN_DIR/meta_N.txt` and extract:
242
+
243
+ ```bash
244
+ SESSION_ID=$(grep "^SESSION_ID=" RUN_DIR/meta_N.txt | cut -d= -f2)
245
+ WAITING_URL=$(grep "^WAITING_URL=" RUN_DIR/meta_N.txt | cut -d= -f2-)
246
+ RATE_LIMITED=$(grep "^RATE_LIMITED=" RUN_DIR/meta_N.txt | cut -d= -f2)
247
+ ```
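Values in the metadata file, such as a URL with a query string, can themselves contain `=`, so a parser should split each line on the first `=` only. A minimal Python sketch of the same extraction, assuming the `KEY=value` layout the parse script writes to stderr; `parse_meta` is a hypothetical helper, and the URL in the usage note is made up for illustration:

```python
def parse_meta(text: str) -> dict[str, str]:
    """Parse KEY=value lines, splitting on the first '=' only."""
    meta: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not KEY=value pairs
            meta[key.strip()] = value.strip()
    return meta
```

For example, `parse_meta("WAITING_URL=http://127.0.0.1:7998/auth?session=abc")["WAITING_URL"]` keeps the whole URL, including the `=abc` suffix.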
248
+
249
+ #### d. Human handoff (requires_human evals only)
250
+
251
+ **Case 1 — Agent-initiated interrupt (`expected_interrupt` is set):**
252
+
253
+ If the eval has an `expected_interrupt` field, read `RUN_DIR/transcript_N.txt` and judge
254
+ whether the agent's final message matches the described interrupt — i.e. the agent paused
255
+ mid-task to ask the user a clarifying question or request input instead of proceeding
256
+ autonomously. If it matches, auto-resume without human input by sending `next_turn_instruction`
257
+ back to the session:
258
+
259
+ ```bash
260
+ claude --resume SESSION_ID \
261
+ --dangerously-skip-permissions --verbose --output-format stream-json \
262
+ --max-turns MAX_TURNS -p "NEXT_TURN_INSTRUCTION" > RUN_DIR/raw_N_t2.jsonl 2>&1
263
+ ```
264
+
265
+ Parse the continuation with the same parse script (substitute `raw_N_t2.jsonl` and
266
+ `meta_N_t2.txt`) and append to `RUN_DIR/transcript_N.txt`. Update `RATE_LIMITED` from
267
+ `meta_N_t2.txt`. Then continue to step e for grading — do not prompt the human unless
268
+ `WAITING_URL` is non-empty in the resumed turn.
269
+
270
+ **Case 2 — Browser auth flow (`WAITING_URL` is non-empty):**
271
+
272
+ If `WAITING_URL` is non-empty, the agent started an auth flow and is
273
+ suspended at its session boundary. Wait 5 seconds, then check whether
274
+ the auth flow completed automatically:
275
+
276
+ ```bash
277
+ sleep 5 && uv run authsome list
278
+ ```
279
+
280
+ If the relevant provider now shows `connected`, proceed directly to resuming
281
+ the session (skip the user prompt). If it is still `not_connected`, show the user:
282
+
283
+ > The agent is waiting. Please complete the auth flow at: `WAITING_URL`
284
+ > Tell me "done" when finished.
285
+
286
+ Wait for the user to reply "done". Then resume the claude session and
287
+ append the continuation to the transcript:
288
+
289
+ ```bash
290
+ claude --resume SESSION_ID \
291
+ --dangerously-skip-permissions --verbose --output-format stream-json \
292
+ --max-turns MAX_TURNS -p "done" > RUN_DIR/raw_N_t2.jsonl 2>&1
293
+ ```
294
+
295
+ Parse turn 2 with the same script above (substitute `raw_N_t2.jsonl` and
296
+ `meta_N_t2.txt`), then append its transcript to `RUN_DIR/transcript_N.txt`:
297
+
298
+ ```bash
299
+ uv run python - RUN_DIR/raw_N_t2.jsonl >> RUN_DIR/transcript_N.txt 2> RUN_DIR/meta_N_t2.txt <<'PYEOF'
300
+ # ... same parse script as above ...
301
+ PYEOF
302
+
303
+ # Update RATE_LIMITED if turn 2 was rate-limited
304
+ RATE_LIMITED_T2=$(grep "^RATE_LIMITED=" RUN_DIR/meta_N_t2.txt | cut -d= -f2)
305
+ [ "$RATE_LIMITED_T2" = "true" ] && RATE_LIMITED=true
306
+ ```
307
+
308
+ If `WAITING_URL` is empty and no agent-initiated interrupt was detected for a `requires_human`
309
+ eval, the agent finished in one turn (e.g. it polled for completion itself) — no resume needed.
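The handoff rules in step d can be summarized as a small decision function. This is a sketch under stated assumptions, not code from the package: the function name, argument names, and returned labels are all hypothetical, and checking the agent-initiated interrupt before the browser auth flow is an assumption based only on the order in which the two cases are listed.

```python
from typing import Optional


def handoff_action(
    interrupt_matched: bool,
    waiting_url: Optional[str],
    provider_connected: bool,
) -> str:
    """Pick the next step for a requires_human eval after turn 1."""
    if interrupt_matched:
        # Case 1: the agent paused as described by expected_interrupt.
        return "resume_with_next_turn_instruction"
    if waiting_url:
        # Case 2: an auth flow was started; skip the user prompt if it
        # already completed on its own.
        if provider_connected:
            return "resume_with_done"
        return "prompt_user_then_resume"
    # Neither case applies: the agent finished in one turn.
    return "no_resume"
```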
310
+
311
+ #### e. Grade the transcript
312
+
313
+ Call claude as the LLM judge with the full eval criteria and the transcript:
314
+
315
+ ```bash
316
+ uv run python - RUN_DIR/transcript_N.txt EVAL_ID <<'PYEOF' > RUN_DIR/verdict_N.json
317
+ import sys, json, subprocess
318
+ from pathlib import Path
319
+
320
+ transcript = Path(sys.argv[1]).read_text()
321
+ eval_id = int(sys.argv[2])
322
+ evals = json.loads(Path("evals/evals.json").read_text())["evals"]
323
+ eval_ = next(e for e in evals if e["id"] == eval_id)
324
+
325
+ JUDGE_PROMPT = """\
326
+ You are an eval grader for an agent called Authsome. You receive:
327
+ - An agent transcript (stdout+stderr from a live agent run)
328
+ - Environment pre-conditions describing the starting state
329
+ - An outcome criterion (did the task succeed?)
330
+ - An optional trajectory_efficiency criterion (did the agent take the right number of meaningful steps?)
331
+
332
+ Return a JSON object with this exact structure:
333
+ {
334
+ "outcome": {"passed": true, "evidence": "one sentence quoting or describing transcript evidence"},
335
+ "trajectory_efficiency": {"passed": true, "evidence": "one sentence quoting or describing transcript evidence"}
336
+ }
337
+
338
+ Rules:
339
+ - Grade outcome and trajectory_efficiency independently.
340
+ - When counting meaningful steps for trajectory_efficiency, **ignore scaffolding steps**:
341
+ skill loading or calling skill tool, using one extra step to parse and format a
342
+ response, returning results to the user, reading --help, version checks, and similar
343
+ overhead. Only task-relevant actions count (API calls, auth flows, etc).
344
+ - The actual number of LLM calls will be higher than the expected step count — this is normal.
345
+ - If trajectory_efficiency criterion is absent, return {"passed": null, "evidence": "not evaluated"} for it.
346
+ - Be strict: burden of proof to pass is on the transcript.
347
+ - evidence must quote or specifically reference the transcript, not repeat the criterion.\
348
+ """
349
+
350
+ prompt = f"""{JUDGE_PROMPT}
351
+
352
+ Environment: {eval_["environment"]}
353
+
354
+ Outcome criterion: {eval_["outcome"]}
355
+
356
+ Trajectory efficiency criterion: {eval_.get("trajectory_efficiency", "(not provided — skip this grade)")}
357
+
358
+ Full transcript:
359
+ ---
360
+ {transcript}
361
+ ---
362
+
363
+ Return ONLY valid JSON, no markdown fences."""
364
+
365
+ result = subprocess.run(
365
+     ["claude", "-p", prompt, "--output-format", "text"],
366
+     capture_output=True, text=True, timeout=120,
367
+ )
368
+
369
+ if result.returncode != 0:
370
+     raise RuntimeError(f"claude judge failed (exit {result.returncode}): {result.stderr[:300]}")
372
+
373
+ raw = result.stdout.strip()
374
+ if "```" in raw:
375
+     for part in raw.split("```"):
375
+         part = part.strip().removeprefix("json").strip()
376
+         if part.startswith("{"):
377
+             raw = part
378
+             break
380
+
381
+ verdict = json.loads(raw)
382
+ if "trajectory_efficiency" not in eval_:
383
+     verdict["trajectory_efficiency"] = {"passed": None, "evidence": "not evaluated"}
384
+
385
+ print(json.dumps(verdict, indent=2))
386
+ PYEOF
387
+ ```
388
+
389
+ Read `RUN_DIR/verdict_N.json` and print the result line:
390
+
391
+ ```
392
+ [result] outcome=✓/✗ trajectory=✓/✗/—
393
+ outcome : <evidence>
394
+ trajectory: <evidence>
395
+ ```
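The result line maps each verdict's `passed` value to a symbol: `true` to ✓, `false` to ✗, and `null` to —. A minimal sketch of that formatting, assuming the verdict has the two-key structure the judge is asked to return; `mark` and `format_result` are illustrative names, not part of the package:

```python
def mark(passed):
    """Map a verdict's passed value (True, False, or None) to its symbol."""
    if passed is True:
        return "✓"
    if passed is False:
        return "✗"
    return "—"  # None: skipped or not evaluated


def format_result(verdict: dict) -> str:
    """Render the three-line result summary for one eval."""
    outcome = verdict["outcome"]
    trajectory = verdict["trajectory_efficiency"]
    return "\n".join([
        f"[result] outcome={mark(outcome['passed'])} trajectory={mark(trajectory['passed'])}",
        f"  outcome   : {outcome['evidence']}",
        f"  trajectory: {trajectory['evidence']}",
    ])
```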
396
+
397
+ #### f. Append result to grading.json
398
+
399
+ ```bash
400
+ uv run python - RUN_DIR/grading.json EVAL_ID "$RATE_LIMITED" <<'PYEOF'
401
+ import sys, json
402
+ from datetime import datetime
403
+ from pathlib import Path
404
+
405
+ grading_path = Path(sys.argv[1])
406
+ eval_id = int(sys.argv[2])
407
+ rate_limited = sys.argv[3] == "true"
408
+
409
+ evals = json.loads(Path("evals/evals.json").read_text())["evals"]
410
+ eval_ = next(e for e in evals if e["id"] == eval_id)
411
+
412
+ run_dir = grading_path.parent
413
+ verdict = json.loads((run_dir / f"verdict_{eval_id}.json").read_text())
414
+ authsome_state = (run_dir / f"authsome_state_{eval_id}.txt").read_text() \
415
+ if (run_dir / f"authsome_state_{eval_id}.txt").exists() else ""
416
+
417
+ result_entry = {
417
+     "id": eval_id,
418
+     "name": eval_.get("name", ""),
419
+     "prompt": eval_["prompt"],
420
+     "agent": eval_.get("agent", "claude"),
421
+     "environment": eval_["environment"],
422
+     "authsome_state": authsome_state,
423
+     "requires_human": eval_.get("requires_human", False),
424
+     "rate_limited": rate_limited,
425
+     **verdict,
427
+ }
428
+
429
+ existing = {"results": []}
430
+ if grading_path.exists():
431
+     existing = json.loads(grading_path.read_text())
432
+
433
+ all_results = existing["results"] + [result_entry]
434
+ passed = sum(1 for r in all_results if r["outcome"]["passed"] is True)
435
+ failed = sum(1 for r in all_results if r["outcome"]["passed"] is False)
436
+ skipped = sum(1 for r in all_results if r["outcome"]["passed"] is None)
437
+
438
+ grading = {
438
+     "skill_name": "authsome",
439
+     "timestamp": datetime.now().isoformat(timespec="seconds"),
440
+     "summary": {"passed": passed, "failed": failed, "skipped": skipped, "total": len(all_results)},
441
+     "results": all_results,
443
+ }
444
+ grading_path.write_text(json.dumps(grading, indent=2))
445
+
446
+ summary = grading["summary"]
447
+ print(f"Done: {summary['passed']} passed / {summary['failed']} failed / {summary['skipped']} skipped out of {summary['total']}")
448
+ print(f"Results: {grading_path}")
449
+ PYEOF
450
+ ```
451
+
452
+ Before running the save script, write the current authsome state to
453
+ `RUN_DIR/authsome_state_N.txt` so it's captured in the grading record:
454
+
455
+ ```bash
456
+ uv run authsome list > RUN_DIR/authsome_state_N.txt 2>&1
457
+ ```
458
+
459
+ #### g. State-check and continue
460
+
461
+ After showing the verdict, immediately prepare for the next eval:
462
+
463
+ 1. Run `uv run authsome list` and compare against the next eval's `environment` field.
464
+ 2. Fix any mismatches inline (e.g. `uv run authsome revoke github`). Re-check until state matches.
465
+ 3. If you need to log in (e.g. for provider X) and the environment says provider X is NOT connected, it is okay to run `uv run authsome login <provider>` and then poll with `uv run authsome list` a few seconds later to check whether the login succeeded.
466
+ 4. Only pause and ask the user when:
467
+ - Polling fails during login (show the URL, wait for "done")
468
+ - A `requires_human` eval where the user must act during the run
469
+
470
+ ---
471
+
472
+ ### 3. Teardown
473
+
474
+ Delete the eval profile's key files:
475
+
476
+ ```bash
477
+ rm ~/.authsome/client/identities/EVAL_HANDLE.json
478
+ rm ~/.authsome/client/identities/EVAL_HANDLE.key
479
+ ```
480
+
481
+ Then switch back to the user's original profile:
482
+
483
+ ```bash
484
+ uv run authsome profile use <original-handle>
485
+ ```
486
+
487
+ ---
488
+
489
+ ### 4. Generate report
490
+
491
+ ```bash
492
+ uv run python evals/generate_report.py RUN_DIR/grading.json
493
+ ```
494
+
495
+ Tell the user the report path. The script opens it automatically.
@@ -18,7 +18,7 @@
18
18
  "name": "Agentr",
19
19
  "url": "https://github.com/agentrhq"
20
20
  },
21
- "homepage": "https://authsome.agentr.dev",
21
+ "homepage": "https://authsome.ai",
22
22
  "license": "MIT",
23
23
  "keywords": ["auth", "oauth2", "credentials", "agent-identity", "broker"],
24
24
  "category": "auth"
@@ -0,0 +1,3 @@
1
+ {
2
+ ".": "0.3.2"
3
+ }
@@ -244,5 +244,12 @@ __marimo__/
244
244
  # Git worktrees
245
245
  .worktrees/
246
246
  .claude/worktrees/
247
+
247
248
  # Local authsome home directory
248
249
  .authsome/
250
+
251
+ # Superpowers agent docs (local planning artifacts)
252
+ docs/superpowers/
253
+
254
+ # Authsome skill (generated at eval time)
255
+ .claude/skills/authsome/
@@ -1,5 +1,67 @@
1
1
  # Changelog
2
2
 
3
+ ## [0.3.2](https://github.com/agentrhq/authsome/compare/authsome-v0.3.1...authsome-v0.3.2) (2026-05-20)
4
+
5
+
6
+ ### Features
7
+
8
+ * Add posthog telemetry events ([575a464](https://github.com/agentrhq/authsome/commit/575a4648fbfa8c4de7b46c50311c2b426587977c))
9
+ * Add posthog telemetry events ([8797a94](https://github.com/agentrhq/authsome/commit/8797a94b406fe109b0a2216d693b74dd17446892))
10
+ * **evals:** /run-evals command + profile/run-dir flags ([bdeae58](https://github.com/agentrhq/authsome/commit/bdeae583a2d5bd9b4a28e1b717c74240469c890a))
11
+ * **evals:** add expected_interrupt and next_turn_instruction eval fields ([c4b2a93](https://github.com/agentrhq/authsome/commit/c4b2a9304ab53773c4b76cd4245b5663e744e4fc))
12
+ * **evals:** capture real claude transcripts via stream-json subprocess ([1b28f5a](https://github.com/agentrhq/authsome/commit/1b28f5ab4f9cd702e2ccf8b2bf805b0a58418158))
13
+ * **evals:** move new evals schema to evals/evals.json, restore skills copy ([5bb73ba](https://github.com/agentrhq/authsome/commit/5bb73ba509dc3a1b5d55c1356210f724d7c4b130))
14
+ * **evals:** profile isolation + authsome state check per eval ([73d3e70](https://github.com/agentrhq/authsome/commit/73d3e70b18387ec5510714e3ef4254f8fb33c49e))
15
+ * **proxy:** configurable intercept scope and unmatched policy ([987e312](https://github.com/agentrhq/authsome/commit/987e312aeeb680a4910697a18abb6eadb62d2b95))
16
+ * update health check to validate connections based on active identity and add test coverage ([9290169](https://github.com/agentrhq/authsome/commit/929016909595a68c4452c2291564241277df5ec6))
17
+ * update health check to validate connections based on active identity and add test coverage ([3fa9f97](https://github.com/agentrhq/authsome/commit/3fa9f97c8b3236b57649ae8b8a8e5ec538da8ca2))
18
+
19
+
20
+ ### Bug Fixes
21
+
22
+ * copy full skill folder in evals and fix login flow links in authsome skill ([e8918a9](https://github.com/agentrhq/authsome/commit/e8918a97369f483f12b66e34d2eac5156ebc51f2))
23
+ * **evals:** use claude --system-prompt for judge, grade on rate limit ([b790c63](https://github.com/agentrhq/authsome/commit/b790c6329890cc0ccc824b51ca8b428ad6bca1c6))
24
+ * **evals:** use hermes as LLM judge instead of claude -p ([1157837](https://github.com/agentrhq/authsome/commit/1157837e4bda094d3f8f804886ecc4820fa62a5c))
25
+ * **marketplace:** point plugin homepage to authsome.ai ([11ffd80](https://github.com/agentrhq/authsome/commit/11ffd80da7e0c05c998236bfa3a7a9c59fb17de1))
26
+ * **marketplace:** point plugin homepage to authsome.ai ([a767cbc](https://github.com/agentrhq/authsome/commit/a767cbcea2464ed8fc7469219d33df3c5df9d794))
27
+ * **proxy:** address PR review feedback on mode validation and route defaults ([83ab469](https://github.com/agentrhq/authsome/commit/83ab4691e63ff62c8db9168af4e2cbbfdba55401))
28
+
29
+
30
+ ### Documentation
31
+
32
+ * add GitHub OAuth app setup walkthrough to quickstart ([f45d9c9](https://github.com/agentrhq/authsome/commit/f45d9c9ece6bb7ba017b6ad6c1041b3bac53a4f7))
33
+ * add Roadmap, Contributing, and Links sections to README ([07534a4](https://github.com/agentrhq/authsome/commit/07534a4cdcec7f05f552b49f5d8d5a032232a630))
34
+ * add Roadmap, Contributing, and Links sections; rewrite roadmap.mdx ([03990b2](https://github.com/agentrhq/authsome/commit/03990b28db0460ea18c5c16eea7848ab262d235d))
35
+ * correct roadmap against changelog as source of truth ([eec08e3](https://github.com/agentrhq/authsome/commit/eec08e3b1605ac542153481b6faf49ab0a904676))
36
+ * **evals:** add hermes smoke test to pre-session setup ([bc4ef39](https://github.com/agentrhq/authsome/commit/bc4ef3917d74747fbeddec3c0f540f28b9f895d7))
37
+ * **evals:** add skip handling and per-eval max_turns config ([68ae0ed](https://github.com/agentrhq/authsome/commit/68ae0ed68b49df4d45083a911d8482dece1ae7a3))
38
+ * **evals:** merge setup.md into run-evals command, delete setup.md ([072836f](https://github.com/agentrhq/authsome/commit/072836f2d8a95822023f6f6f428d00bc21430875))
39
+ * **evals:** remove profile creation from run-evals command ([33ab525](https://github.com/agentrhq/authsome/commit/33ab5258b8b7144c02d12c5d3b71fa0aba4fa707))
40
+ * **evals:** update design spec and plan to reflect as-built state ([2558f5c](https://github.com/agentrhq/authsome/commit/2558f5c0de134bc26fee4d59a2c1120b736563c9))
41
+ * mark policy layer and firewall rules as shipped, add multi-user to coming next ([b1487e8](https://github.com/agentrhq/authsome/commit/b1487e87bcf5c99912181331296441190c4c2f05))
42
+ * **quickstart:** add provider tabs and a runnable agent example ([7c577ab](https://github.com/agentrhq/authsome/commit/7c577abe64fb9b4db34fa8b0bbc43f72289dd8da))
43
+ * **quickstart:** GitHub OAuth setup walkthrough and provider tabs ([63ffdec](https://github.com/agentrhq/authsome/commit/63ffdec8abe91c90966bfb61b8b547903b945bcf))
44
+ * reframe roadmap as end-user capabilities, not implementation work ([0740764](https://github.com/agentrhq/authsome/commit/07407646f6c88c8c1b72d8a43f947f80e24492dd))
45
+ * rewrite profile storage model to match current architecture ([457d1b2](https://github.com/agentrhq/authsome/commit/457d1b2ee7d54ce03ea697beb4913a5ce01ce0c7))
46
+ * rewrite profile storage model to match current architecture ([773323a](https://github.com/agentrhq/authsome/commit/773323ab73bd8a05e251f5ef121b2b072dc3259a))
47
+ * rewrite roadmap.mdx, remove ROADMAP.md, link README to docs ([a165d37](https://github.com/agentrhq/authsome/commit/a165d372fc5b44d7ee34066e40d6ade24b9ed575))
48
+ * simplify hosted daemon mode description in roadmap ([5458bf6](https://github.com/agentrhq/authsome/commit/5458bf62a5ededf7cfa2030d104bcf94b41178e6))
49
+ * **site:** adopt skill-driven CLI conventions across the docs ([8ba6967](https://github.com/agentrhq/authsome/commit/8ba6967f490502ab55bd401de25e8328bf476067))
50
+ * **site:** adopt skill-driven CLI conventions across the docs ([71b89fd](https://github.com/agentrhq/authsome/commit/71b89fd036dac67c7cfc761295b39889bf3ec9f2))
51
+
52
+ ## [0.3.1](https://github.com/agentrhq/authsome/compare/authsome-v0.3.0...authsome-v0.3.1) (2026-05-17)
53
+
54
+
55
+ ### Bug Fixes
56
+
57
+ * **cli:** resolve three CLI bugs and improve audit log command ([3b990e3](https://github.com/agentrhq/authsome/commit/3b990e3cae998d9ccaccb421b5cb577c2bb89de3))
58
+ * **cli:** resolve three CLI bugs, improve audit log, and sync docs ([dd9cad3](https://github.com/agentrhq/authsome/commit/dd9cad326cb2817826eb8e5b8bb42f5f3df8e2a2))
59
+
60
+
61
+ ### Documentation
62
+
63
+ * **cli:** sync reference and manual-testing guide with 0.3.0 implementation ([bdf659f](https://github.com/agentrhq/authsome/commit/bdf659ffffc0a0c2100cc21508daec042161656e))
64
+
3
65
  ## [0.3.0](https://github.com/agentrhq/authsome/compare/authsome-v0.2.4...authsome-v0.3.0) (2026-05-15)
4
66
 
5
67
 
@@ -86,7 +86,7 @@ The diff shows what changed. The commit message should say why. Future readers (
86
86
  ## Getting started
87
87
 
88
88
  ```bash
89
- git clone https://github.com/manojbajaj95/authsome.git
89
+ git clone https://github.com/agentrhq/authsome.git
90
90
  cd authsome
91
91
  uv pip install -e ".[dev]"
92
92
  pre-commit install # runs ruff automatically on every commit