tokenjam 0.2.3__tar.gz → 0.3.1__tar.gz

Files changed (272)
  1. tokenjam-0.3.1/CHANGELOG.md +209 -0
  2. {tokenjam-0.2.3 → tokenjam-0.3.1}/CLAUDE.md +57 -56
  3. {tokenjam-0.2.3 → tokenjam-0.3.1}/PKG-INFO +4 -2
  4. {tokenjam-0.2.3 → tokenjam-0.3.1}/README.md +1 -1
  5. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/architecture.md +46 -0
  6. tokenjam-0.3.1/docs/backfill/helicone.md +62 -0
  7. tokenjam-0.3.1/docs/backfill/langfuse.md +63 -0
  8. tokenjam-0.3.1/docs/backfill/otlp.md +46 -0
  9. tokenjam-0.3.1/docs/backfill/overview.md +36 -0
  10. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/configuration.md +19 -6
  11. tokenjam-0.3.1/docs/installation.md +65 -0
  12. tokenjam-0.3.1/docs/optimize/cache.md +67 -0
  13. tokenjam-0.3.1/docs/optimize/downsize.md +46 -0
  14. tokenjam-0.3.1/docs/optimize/script.md +89 -0
  15. tokenjam-0.3.1/docs/optimize/trim.md +86 -0
  16. tokenjam-0.3.1/docs/policy/overview.md +64 -0
  17. tokenjam-0.3.1/examples/alerts_and_drift/_shared.py +126 -0
  18. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/alerts_and_drift/budget_breach_demo.py +6 -1
  19. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/alerts_and_drift/drift_demo.py +8 -0
  20. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/alerts_and_drift/sensitive_actions_demo.py +8 -1
  21. {tokenjam-0.2.3 → tokenjam-0.3.1}/pyproject.toml +5 -1
  22. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/package.json +1 -1
  23. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/factories.py +18 -2
  24. tokenjam-0.3.1/tests/fixtures/helicone_real_response.json +79 -0
  25. tokenjam-0.3.1/tests/fixtures/langfuse_real_response.json +88 -0
  26. tokenjam-0.3.1/tests/fixtures/otlp_sample.json +76 -0
  27. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/integration/test_cli.py +172 -7
  28. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/integration/test_db.py +3 -3
  29. tokenjam-0.3.1/tests/manual-new-release-tests.md +180 -0
  30. tokenjam-0.3.1/tests/manual-pre-release-testing.md +418 -0
  31. tokenjam-0.3.1/tests/unit/test_cache_efficacy.py +149 -0
  32. tokenjam-0.3.1/tests/unit/test_cache_recommend.py +151 -0
  33. tokenjam-0.3.1/tests/unit/test_cmd_policy.py +224 -0
  34. tokenjam-0.3.1/tests/unit/test_compare.py +163 -0
  35. tokenjam-0.3.1/tests/unit/test_config_secret_divergence.py +103 -0
  36. tokenjam-0.3.1/tests/unit/test_export_claude_code.py +122 -0
  37. tokenjam-0.3.1/tests/unit/test_ingest_helicone.py +194 -0
  38. tokenjam-0.3.1/tests/unit/test_ingest_langfuse.py +167 -0
  39. tokenjam-0.3.1/tests/unit/test_ingest_otlp.py +155 -0
  40. tokenjam-0.3.1/tests/unit/test_prompt_bloat.py +180 -0
  41. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_spans_stats_repair.py +2 -1
  42. tokenjam-0.3.1/tests/unit/test_transport_401.py +123 -0
  43. tokenjam-0.3.1/tests/unit/test_workflow_restructure.py +219 -0
  44. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/app.py +4 -0
  45. tokenjam-0.3.1/tokenjam/api/routes/cost_compare.py +75 -0
  46. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/logs.py +4 -0
  47. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/metrics.py +25 -5
  48. tokenjam-0.3.1/tokenjam/api/routes/optimize.py +104 -0
  49. tokenjam-0.3.1/tokenjam/api/routes/spans.py +71 -0
  50. tokenjam-0.3.1/tokenjam/cli/cmd_backfill.py +263 -0
  51. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_budget.py +21 -6
  52. tokenjam-0.3.1/tokenjam/cli/cmd_cost.py +319 -0
  53. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_doctor.py +21 -0
  54. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_drift.py +9 -1
  55. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_onboard.py +192 -15
  56. tokenjam-0.3.1/tokenjam/cli/cmd_optimize.py +912 -0
  57. tokenjam-0.3.1/tokenjam/cli/cmd_policy.py +251 -0
  58. tokenjam-0.3.1/tokenjam/cli/cmd_report.py +196 -0
  59. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_status.py +18 -0
  60. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/main.py +5 -1
  61. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/api_backend.py +105 -0
  62. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/backfill.py +10 -2
  63. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/config.py +79 -0
  64. tokenjam-0.3.1/tokenjam/core/cost.py +327 -0
  65. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/db.py +40 -4
  66. tokenjam-0.3.1/tokenjam/core/export/__init__.py +11 -0
  67. tokenjam-0.3.1/tokenjam/core/export/claude_code.py +122 -0
  68. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/ingest.py +34 -0
  69. tokenjam-0.3.1/tokenjam/core/ingest_adapters/__init__.py +12 -0
  70. tokenjam-0.3.1/tokenjam/core/ingest_adapters/helicone.py +358 -0
  71. tokenjam-0.3.1/tokenjam/core/ingest_adapters/langfuse.py +325 -0
  72. tokenjam-0.3.1/tokenjam/core/ingest_adapters/otlp.py +143 -0
  73. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/models.py +34 -0
  74. tokenjam-0.3.1/tokenjam/core/optimize/README.md +41 -0
  75. tokenjam-0.3.1/tokenjam/core/optimize/__init__.py +76 -0
  76. tokenjam-0.3.1/tokenjam/core/optimize/analyzers/__init__.py +16 -0
  77. tokenjam-0.3.1/tokenjam/core/optimize/analyzers/budget_projection.py +181 -0
  78. tokenjam-0.3.1/tokenjam/core/optimize/analyzers/cache_efficacy.py +125 -0
  79. tokenjam-0.3.1/tokenjam/core/optimize/analyzers/cache_recommend.py +187 -0
  80. tokenjam-0.3.1/tokenjam/core/optimize/analyzers/model_downgrade.py +212 -0
  81. tokenjam-0.3.1/tokenjam/core/optimize/analyzers/prompt_bloat.py +350 -0
  82. tokenjam-0.3.1/tokenjam/core/optimize/analyzers/workflow_restructure.py +206 -0
  83. tokenjam-0.3.1/tokenjam/core/optimize/registry.py +26 -0
  84. tokenjam-0.3.1/tokenjam/core/optimize/runner.py +388 -0
  85. tokenjam-0.3.1/tokenjam/core/optimize/types.py +116 -0
  86. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/mcp/server.py +19 -8
  87. tokenjam-0.2.3/tokenjam/api/routes/spans.py → tokenjam-0.3.1/tokenjam/otel/otlp_parsing.py +111 -110
  88. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/otel/provider.py +29 -0
  89. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/otel/semconv.py +23 -0
  90. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/http_exporter.py +29 -1
  91. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/transport.py +34 -1
  92. tokenjam-0.3.1/tokenjam/utils/__init__.py +0 -0
  93. tokenjam-0.2.3/CHANGELOG.md +0 -150
  94. tokenjam-0.2.3/examples/alerts_and_drift/_shared.py +0 -55
  95. tokenjam-0.2.3/tests/manual-new-release-tests.md +0 -144
  96. tokenjam-0.2.3/tests/manual-pre-release-testing.md +0 -307
  97. tokenjam-0.2.3/tokenjam/cli/cmd_backfill.py +0 -110
  98. tokenjam-0.2.3/tokenjam/cli/cmd_cost.py +0 -90
  99. tokenjam-0.2.3/tokenjam/cli/cmd_optimize.py +0 -232
  100. tokenjam-0.2.3/tokenjam/core/cost.py +0 -102
  101. tokenjam-0.2.3/tokenjam/core/optimize.py +0 -570
  102. {tokenjam-0.2.3 → tokenjam-0.3.1}/.github/CODEOWNERS +0 -0
  103. {tokenjam-0.2.3 → tokenjam-0.3.1}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  104. {tokenjam-0.2.3 → tokenjam-0.3.1}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  105. {tokenjam-0.2.3 → tokenjam-0.3.1}/.github/ISSUE_TEMPLATE/integration_request.md +0 -0
  106. {tokenjam-0.2.3 → tokenjam-0.3.1}/.github/pull_request_template.md +0 -0
  107. {tokenjam-0.2.3 → tokenjam-0.3.1}/.github/workflows/ci.yml +0 -0
  108. {tokenjam-0.2.3 → tokenjam-0.3.1}/.github/workflows/publish-npm.yml +0 -0
  109. {tokenjam-0.2.3 → tokenjam-0.3.1}/.github/workflows/publish-pypi.yml +0 -0
  110. {tokenjam-0.2.3 → tokenjam-0.3.1}/.gitignore +0 -0
  111. {tokenjam-0.2.3 → tokenjam-0.3.1}/AGENTS.md +0 -0
  112. {tokenjam-0.2.3 → tokenjam-0.3.1}/CONTRIBUTING.md +0 -0
  113. {tokenjam-0.2.3 → tokenjam-0.3.1}/LICENSE +0 -0
  114. {tokenjam-0.2.3 → tokenjam-0.3.1}/Makefile +0 -0
  115. {tokenjam-0.2.3 → tokenjam-0.3.1}/SECURITY.md +0 -0
  116. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/alerts.md +0 -0
  117. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/claude-code-integration.md +0 -0
  118. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/cli-reference.md +0 -0
  119. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/export.md +0 -0
  120. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/framework-support.md +0 -0
  121. /tokenjam-0.2.3/tests/__init__.py → /tokenjam-0.3.1/docs/internal/specs/.gitkeep +0 -0
  122. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/nemoclaw-integration.md +0 -0
  123. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/openclaw.md +0 -0
  124. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/screenshots/tj-alerts.png +0 -0
  125. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/screenshots/tj-budget.png +0 -0
  126. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/screenshots/tj-cost.png +0 -0
  127. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/screenshots/tj-status.png +0 -0
  128. {tokenjam-0.2.3 → tokenjam-0.3.1}/docs/screenshots/tj-traces.png +0 -0
  129. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/README.md +0 -0
  130. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/multi/rag_pipeline.py +0 -0
  131. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/multi/research_team.py +0 -0
  132. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/multi/router_agent.py +0 -0
  133. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/multi/sample_docs/agent_patterns.txt +0 -0
  134. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/multi/sample_docs/cost_management.txt +0 -0
  135. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/multi/sample_docs/observability.txt +0 -0
  136. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/multi/sample_docs/safety.txt +0 -0
  137. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/openclaw/README.md +0 -0
  138. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_framework/autogen_agent.py +0 -0
  139. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_framework/crewai_agent.py +0 -0
  140. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_framework/langchain_agent.py +0 -0
  141. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_framework/langgraph_agent.py +0 -0
  142. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_framework/llamaindex_agent.py +0 -0
  143. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_provider/anthropic_agent.py +0 -0
  144. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_provider/bedrock_agent.py +0 -0
  145. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_provider/gemini_agent.py +0 -0
  146. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_provider/litellm_agent.py +0 -0
  147. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_provider/openai_agent.py +0 -0
  148. {tokenjam-0.2.3 → tokenjam-0.3.1}/examples/single_provider/openai_agents_sdk_agent.py +0 -0
  149. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/hallucination-drift/BLOG.md +0 -0
  150. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/hallucination-drift/README.md +0 -0
  151. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/hallucination-drift/scenario.py +0 -0
  152. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/retry-loop/BLOG.md +0 -0
  153. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/retry-loop/README.md +0 -0
  154. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/retry-loop/scenario.py +0 -0
  155. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/surprise-cost/BLOG.md +0 -0
  156. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/surprise-cost/README.md +0 -0
  157. {tokenjam-0.2.3 → tokenjam-0.3.1}/incidents/surprise-cost/scenario.py +0 -0
  158. {tokenjam-0.2.3 → tokenjam-0.3.1}/pricing/models.toml +0 -0
  159. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/README.md +0 -0
  160. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/package-lock.json +0 -0
  161. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/src/client.test.ts +0 -0
  162. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/src/client.ts +0 -0
  163. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/src/index.ts +0 -0
  164. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/src/semconv.test.ts +0 -0
  165. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/src/semconv.ts +0 -0
  166. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/src/span-builder.test.ts +0 -0
  167. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/src/span-builder.ts +0 -0
  168. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/src/types.ts +0 -0
  169. {tokenjam-0.2.3 → tokenjam-0.3.1}/sdk-ts/tsconfig.json +0 -0
  170. {tokenjam-0.2.3/tests/agents → tokenjam-0.3.1/tests}/__init__.py +0 -0
  171. {tokenjam-0.2.3/tests/e2e → tokenjam-0.3.1/tests/agents}/__init__.py +0 -0
  172. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/agents/email_agent_budget_breach.py +0 -0
  173. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/agents/email_agent_drift.py +0 -0
  174. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/agents/email_agent_loop.py +0 -0
  175. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/agents/email_agent_normal.py +0 -0
  176. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/agents/mock_llm.py +0 -0
  177. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/agents/test_mock_scenarios.py +0 -0
  178. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/conftest.py +0 -0
  179. {tokenjam-0.2.3/tests/integration → tokenjam-0.3.1/tests/e2e}/__init__.py +0 -0
  180. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/e2e/conftest.py +0 -0
  181. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/e2e/test_real_llm.py +0 -0
  182. {tokenjam-0.2.3/tests/synthetic → tokenjam-0.3.1/tests/integration}/__init__.py +0 -0
  183. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/integration/test_api.py +0 -0
  184. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/integration/test_demos.py +0 -0
  185. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/integration/test_full_pipeline.py +0 -0
  186. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/integration/test_logs_api.py +0 -0
  187. {tokenjam-0.2.3/tests/unit → tokenjam-0.3.1/tests/synthetic}/__init__.py +0 -0
  188. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/synthetic/test_alert_rules.py +0 -0
  189. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/synthetic/test_cost_tracking.py +0 -0
  190. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/synthetic/test_drift_detection.py +0 -0
  191. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/synthetic/test_ingest.py +0 -0
  192. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/synthetic/test_schema_validation.py +0 -0
  193. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/toy_agent/toy_agent.py +0 -0
  194. {tokenjam-0.2.3/tokenjam/api → tokenjam-0.3.1/tests/unit}/__init__.py +0 -0
  195. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_alerts.py +0 -0
  196. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_backfill.py +0 -0
  197. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_cmd_stop.py +0 -0
  198. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_config.py +0 -0
  199. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_cost.py +0 -0
  200. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_demo_env.py +0 -0
  201. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_demo_scenarios.py +0 -0
  202. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_drift.py +0 -0
  203. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_formatting.py +0 -0
  204. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_litellm_client.py +0 -0
  205. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_litellm_integration.py +0 -0
  206. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_logs_converter.py +0 -0
  207. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_mcp_server.py +0 -0
  208. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_models.py +0 -0
  209. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_onboard_codex.py +0 -0
  210. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_onboard_daemon.py +0 -0
  211. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_openclaw_ingest.py +0 -0
  212. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_optimize.py +0 -0
  213. {tokenjam-0.2.3 → tokenjam-0.3.1}/tests/unit/test_time_parse.py +0 -0
  214. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/__init__.py +0 -0
  215. {tokenjam-0.2.3/tokenjam/api/routes → tokenjam-0.3.1/tokenjam/api}/__init__.py +0 -0
  216. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/deps.py +0 -0
  217. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/middleware.py +0 -0
  218. {tokenjam-0.2.3/tokenjam/cli → tokenjam-0.3.1/tokenjam/api/routes}/__init__.py +0 -0
  219. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/agents.py +0 -0
  220. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/alerts.py +0 -0
  221. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/budget.py +0 -0
  222. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/cost.py +0 -0
  223. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/drift.py +0 -0
  224. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/otlp.py +0 -0
  225. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/status.py +0 -0
  226. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/tools.py +0 -0
  227. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/api/routes/traces.py +0 -0
  228. {tokenjam-0.2.3/tokenjam/core → tokenjam-0.3.1/tokenjam/cli}/__init__.py +0 -0
  229. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_alerts.py +0 -0
  230. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_demo.py +0 -0
  231. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_export.py +0 -0
  232. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_mcp.py +0 -0
  233. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_serve.py +0 -0
  234. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_stop.py +0 -0
  235. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_tools.py +0 -0
  236. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_traces.py +0 -0
  237. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/cli/cmd_uninstall.py +0 -0
  238. {tokenjam-0.2.3/tokenjam/demo → tokenjam-0.3.1/tokenjam/core}/__init__.py +0 -0
  239. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/alerts.py +0 -0
  240. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/drift.py +0 -0
  241. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/pricing.py +0 -0
  242. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/retention.py +0 -0
  243. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/core/schema_validator.py +0 -0
  244. {tokenjam-0.2.3/tokenjam/mcp → tokenjam-0.3.1/tokenjam/demo}/__init__.py +0 -0
  245. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/demo/env.py +0 -0
  246. {tokenjam-0.2.3/tokenjam/otel → tokenjam-0.3.1/tokenjam/mcp}/__init__.py +0 -0
  247. {tokenjam-0.2.3/tokenjam/sdk/integrations → tokenjam-0.3.1/tokenjam/otel}/__init__.py +0 -0
  248. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/otel/exporters.py +0 -0
  249. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/pricing/models.toml +0 -0
  250. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/py.typed +0 -0
  251. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/__init__.py +0 -0
  252. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/agent.py +0 -0
  253. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/bootstrap.py +0 -0
  254. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/client.py +0 -0
  255. {tokenjam-0.2.3/tokenjam/utils → tokenjam-0.3.1/tokenjam/sdk/integrations}/__init__.py +0 -0
  256. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/anthropic.py +0 -0
  257. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/autogen.py +0 -0
  258. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/base.py +0 -0
  259. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/bedrock.py +0 -0
  260. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/crewai.py +0 -0
  261. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/gemini.py +0 -0
  262. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/langchain.py +0 -0
  263. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/langgraph.py +0 -0
  264. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/litellm.py +0 -0
  265. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/llamaindex.py +0 -0
  266. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/nemoclaw.py +0 -0
  267. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/openai.py +0 -0
  268. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/sdk/integrations/openai_agents_sdk.py +0 -0
  269. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/ui/index.html +0 -0
  270. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/utils/formatting.py +0 -0
  271. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/utils/ids.py +0 -0
  272. {tokenjam-0.2.3 → tokenjam-0.3.1}/tokenjam/utils/time_parse.py +0 -0
@@ -0,0 +1,209 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/).
7
+
8
+ ## Unreleased
9
+
10
+ This release pivots TokenJam toward cost-optimization for autonomous agents. Four `tj optimize` analyzers — **Downsize**, **Trim**, **Cache**, and **Script** — read the same spans you'd otherwise pay LangSmith or Langfuse to host, then point at concrete cost cuts. Plan-tier-aware rendering means subscription users (Claude Pro, Max, ChatGPT Plus/Team) see "implied API value" framing instead of dollar "spend" claims they didn't pay. New backfill adapters for Langfuse, Helicone, and raw OTLP make the analyzers work against telemetry you already have elsewhere.
11
+
12
+ ### Added
13
+
14
+ **Cost-optimization analyzers (four new `tj optimize` findings).**
15
+
16
+ - **Downsize** (`--finding model-downgrade`, ships in base install). Existing analyzer, now with the v1.1 honest-output rendering described below. Flags sessions whose structural shape matches a class of work where a cheaper alternative model is worth reviewing — never claims quality equivalence.
17
+ - **Cache** — two findings under one product:
18
+ - `--finding cache-efficacy` (no content capture required). Measures current prompt-caching usage per (provider, model). Anthropic fully supported; OpenAI / Gemini best-effort; Bedrock / LiteLLM / Cohere unsupported in v1. Flags rows with ≥100K input tokens and <30% caching efficacy.
19
+ - `--finding cache-recommend` (Anthropic-only v1, requires `[capture] prompts = true`). Walks captured prompts, hashes the first ~2000 chars, flags prefixes shared by ≥3 calls as `cache_control` breakpoint candidates.
20
+ - **Script** (`--finding workflow-restructure`). Clusters sessions by `(tool_name, arg_shape)` signature. Flags clusters with ≥20 instances as candidates for replacement with deterministic shell scripts. `arg_shape` classifies args by type (`file_path` / `command_string` / `json_object` / `array` / `number` / `boolean` / `string`) so structural patterns cluster even when values vary. Degrades to tool-names-only when `capture.tool_inputs = false`.
21
+ - **Trim** (`--finding prompt-bloat`, requires optional `tokenjam[bloat]` extra). LLMLingua-2 token-significance classifier identifies long low-significance regions in captured prompts. Self-registers without the extra installed and surfaces a clear install hint on first run if missing. Model downloads on first use (~110MB) and caches under `~/.cache/tokenjam/models/`. Never auto-rewrites prompts — recommendations only.
22
+
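The `cache-recommend` entry above describes hashing the first ~2000 characters of each captured prompt and flagging prefixes shared by at least 3 calls. A minimal sketch of that idea follows. The function names, the choice of SHA-256, and the plain list-of-strings input are assumptions made for illustration; the changelog does not state how the analyzer implements this.

```python
import hashlib
from collections import Counter

PREFIX_CHARS = 2000    # "first ~2000 chars" from the changelog entry
MIN_SHARED_CALLS = 3   # "shared by >=3 calls" from the changelog entry


def prefix_key(prompt: str, n: int = PREFIX_CHARS) -> str:
    """Hash the leading n characters of a prompt (SHA-256 is an assumption)."""
    return hashlib.sha256(prompt[:n].encode("utf-8")).hexdigest()


def shared_prefixes(prompts: list[str], min_calls: int = MIN_SHARED_CALLS) -> dict[str, int]:
    """Return {prefix_hash: call_count} for prefixes seen in at least min_calls prompts."""
    counts = Counter(prefix_key(p) for p in prompts)
    return {key: n for key, n in counts.items() if n >= min_calls}
```

Prompts that share a long system preamble and differ only in the trailing user turn hash to the same key, which is what would make the shared prefix a plausible `cache_control` breakpoint candidate.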
23
+ **Plan-tier-aware rendering (v1.1 honest output).**
24
+
25
+ - New `plan_tier` column on `SessionRecord` (`api` / `pro` / `max_5x` / `max_20x` / `plus` / `team` / `enterprise` / `local` / `unknown`). `tj onboard --claude-code` and `tj onboard --codex` prompt for the user's plan and write it to `[budget.<provider>] plan = "..."`. `tj onboard --reconfigure` re-runs the prompts against an existing config. `--plan` CLI flag bypasses the interactive prompt for scripted onboards.
26
+ - New `billing_account` attribute on every span (`anthropic` / `openai` / `google` / `bedrock` / `local.ollama`). Provider-only identifier set by every integration that writes spans (OTel patches, Claude Code JSONL backfill, OTLP HTTP ingest, OTLP logs ingest, Langfuse / Helicone backfill adapters).
27
+ - New `pricing_mode` derived property on `SessionRecord` (`local` / `subscription` / `api` / `unknown`). Single source of truth for renderer branching.
28
+ - `tj optimize` renderers branch on `pricing_mode`: subscription users see "implied API value" framing — never a dollar "spend" figure they didn't pay. Header surfaces plan label + monthly fee multiplier; downgrade body reframes savings as token-share against the plan's allocation. Local users see token-only framing. API users see the existing dollar-denominated rendering. JSON output carries top-level `plan` and `pricing_mode` fields plus a `monthly_tokens_freed` field on downgrade findings for non-API plans. Budget projections suppressed for subscription users (no dollar-denominated cap).
29
+
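The plan-tier entries above say `pricing_mode` is derived from `plan_tier` and takes one of four values. One way such a derivation could look is sketched below. The membership of the subscription set is an assumption inferred from the tier list in the entry (the actual `SUBSCRIPTION_PLAN_TIERS` contents are not shown in this diff), and `pricing_mode` is written here as a free function rather than a property on `SessionRecord`.

```python
# Assumed membership; the real SUBSCRIPTION_PLAN_TIERS frozenset is not shown in the diff.
ASSUMED_SUBSCRIPTION_TIERS = frozenset(
    {"pro", "max_5x", "max_20x", "plus", "team", "enterprise"}
)


def pricing_mode(plan_tier: str) -> str:
    """Map a plan tier onto one of: local / subscription / api / unknown."""
    if plan_tier == "local":
        return "local"
    if plan_tier == "api":
        return "api"
    if plan_tier in ASSUMED_SUBSCRIPTION_TIERS:
        return "subscription"
    # Anything unrecognized, including the literal "unknown", stays unknown.
    return "unknown"
```

Keeping the mapping in one place matches the entry's "single source of truth" framing: renderers can branch on the four modes without knowing the individual tiers.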
30
+ **Backfill adapters.**
31
+
32
+ - `tj backfill langfuse` — `--source-url` (live API) and `--source-file` (JSON dump) modes. Maps Langfuse `Observation` records onto `NormalizedSpan` with deterministic span IDs for idempotent re-runs. `billing_account` derived from model name. Supports `{data: [...]}`, bare list, and NDJSON input shapes. See `docs/backfill/langfuse.md`.
33
+ - `tj backfill helicone` — same two-mode pattern. POSTs `/v1/request/query` against Helicone with Bearer auth in live mode. See `docs/backfill/helicone.md`.
34
+ - `tj backfill otlp` — generic OTLP-JSON ingestion for sources Langfuse and Helicone don't cover. Reuses the same parser as the live `POST /api/v1/spans` route via `tokenjam/otel/otlp_parsing.py`. See `docs/backfill/otlp.md`.
35
+
36
+ **Other surfaces.**
37
+
38
+ - `tj report --bloat [<agent_id>]` — generates an HTML visualization of the Trim analyzer's findings. Output saved under `~/.cache/tokenjam/reports/` and opened in the user's default browser. `--no-open` writes without opening.
39
+ - `tj policy list` — read-only preview of a unified `policy` view over existing alerts, drift, schema, sensitive-actions, capture, and per-provider budget configuration. Each row points back to the TOML section it was read from. `--json` supported. The full `tj policy add | edit | apply | remove | test` surface (and the underlying unified `[policy]` config migration) lands next sprint. See `docs/policy/overview.md`.
40
+ - `--compare` flag on `tj cost` and `tj optimize` — surfaces a window-cost diff against a prior period. Accepts `previous` / `last-week` / `last-month` / `last-7d` / `last-30d` keywords (equal-length prior window) or `YYYY-MM-DD:YYYY-MM-DD` for explicit ranges. Output includes spend delta, token delta, and top per-agent/per-model shifts with ▲/▼ indicators. `--json` returns a structured `CostDiff` payload.
41
+
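The `--compare` entry above mentions an equal-length prior window and an explicit `YYYY-MM-DD:YYYY-MM-DD` range. The sketch below shows one way to compute both. Treating the dates as inclusive and placing the prior window immediately before the current one are assumptions, as are the function names; the changelog does not specify how each keyword resolves.

```python
from datetime import date, timedelta


def prior_window(start: date, end: date) -> tuple[date, date]:
    """Return the equal-length window ending the day before `start` (inclusive dates)."""
    length = (end - start) + timedelta(days=1)
    prior_end = start - timedelta(days=1)
    prior_start = prior_end - length + timedelta(days=1)
    return prior_start, prior_end


def parse_explicit_range(spec: str) -> tuple[date, date]:
    """Parse 'YYYY-MM-DD:YYYY-MM-DD' into a (start, end) pair."""
    left, sep, right = spec.partition(":")
    if not sep:
        raise ValueError(f"expected START:END, got {spec!r}")
    start, end = date.fromisoformat(left), date.fromisoformat(right)
    if end < start:
        raise ValueError("range end precedes range start")
    return start, end
```

For a seven-day window from 2026-04-08 through 2026-04-14, `prior_window` returns 2026-04-01 through 2026-04-07.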
42
+ **Documentation.**
43
+
44
+ - `docs/installation.md` documents the new `tokenjam[bloat]` optional extra (pulls torch + transformers, ~2GB) and the rest of the optional-extras matrix.
45
+ - `docs/configuration.md` now documents the four-flag `[capture]` config (`prompts` / `completions` / `tool_inputs` / `tool_outputs`), the strip-on-ingest gate in `IngestPipeline.process()`, and its precedence with `alerts.include_captured_content`.
46
+ - Per-analyzer pages: `docs/optimize/cache.md` (per-provider support table), `docs/optimize/script.md` (worked example of signature definition), `docs/optimize/trim.md` (install + capture requirements + performance numbers).
47
+ - `docs/architecture.md` gains a section on the `billing_account` and `plan_tier` semconv extensions, including the `pricing_mode` derivation rules.
48
+ - `docs/backfill/overview.md` lists all four backfill sources (claude-code, langfuse, helicone, otlp) with a partnership-posture note (we ingest from these tools; we don't replace them).
49
+ - `docs/internal/specs/v1.1-honest-output.md` committed as the canonical reference for Wave-1 rendering decisions.
50
+
51
+ ### Changed
52
+ - **`tj optimize --finding` replaces `--only`.** Registry-driven valid choices (`model-downgrade`, `budget-projection`, plus any new analyzer). Repeatable. The `--only model|budget` flag has been removed (no backwards-compat per the no-external-users decision).
53
+ - **`tj onboard` no longer auto-writes `[budget.anthropic] usd = 200`.** Subscription users see no auto-written ceiling; API users are explicitly prompted for an optional self-imposed monthly ceiling.
54
+ - **`tj status` surfaces unknown plan tiers.** When sessions exist with `plan_tier = 'unknown'`, prints a one-line note pointing the user at `tj onboard --reconfigure`. Exit code unchanged.
55
+ - **`tj optimize` plan-tier-aware rendering.** When every session in the window has `plan_tier = 'unknown'`, dollar figures are suppressed and a header note explains why. Mixed / partial-unknown windows render normally with an advisory note.
56
+ - **MCP `get_optimize_report` tool.** Now accepts `findings: list[str]` (was `only: str`). Docstring surfaces for both API-billing and subscription-plan-efficiency phrasings.
57
+
58
+ ### Internal
59
+ - **Registry-driven optimize analyzers.** `tokenjam/core/optimize.py` split into `tokenjam/core/optimize/` package with `registry.py`, `runner.py`, `types.py`, and `analyzers/` subpackage using `pkgutil` auto-discovery. New analyzers drop a file under `analyzers/` with a `@register("name")` decorator — nothing else needs editing. See `tokenjam/core/optimize/README.md`.
60
+ - **`OptimizeReport.findings` generic dict.** Wave 2 analyzers attach their results here keyed by registration name. Adding a new analyzer no longer requires a typed slot on `OptimizeReport`. The existing typed slots (`downgrade`, `budgets`) stay for backwards compatibility with `cmd_optimize` and the MCP server.
61
+ - **`TjAttributes.BILLING_ACCOUNT` and `TjAttributes.PLAN_TIER`** semconv constants, plus `VALID_PLAN_TIERS` / `SUBSCRIPTION_PLAN_TIERS` frozensets in `tokenjam.otel.semconv`.
62
+ - **`tokenjam[bloat]` optional dependency.** Pulls `llmlingua>=0.2` (and transitively torch + transformers, ~2GB). Kept out of the base install.
63
+ - **DuckDB migration 4** adds `spans.billing_account TEXT` and `sessions.plan_tier TEXT DEFAULT 'unknown'`. New columns use `ALTER TABLE ADD COLUMN`; no backfill heuristics (product has no external users).
64
+ - **`IngestPipeline._build_or_update_session`** late-resolves `plan_tier` when a session starts on a tool span (no billing signal) and a later LLM span carries `billing_account`. Once set to a known value, `plan_tier` is never demoted back to `unknown`.
65
+ - **Test factories** `make_session(plan_tier="api")` and `make_llm_span(billing_account="anthropic")` carry safe defaults so existing tests behave as before.
66
+
67
+ ## [0.1.7] - 2026-04-13
68
+
69
+ ### Added
70
+ - **MCP server (`tj mcp`)** — stdio-based Model Context Protocol server giving Claude Code direct access to TokenJam observability data. 13 tool handlers: status, traces, alerts, budget headroom, cost summary, drift report, tool stats, trace detail, acknowledge alerts, setup project, list sessions, open dashboard. Dual-mode operation: routes queries through REST API when `tj serve` is running, falls back to read-only DuckDB otherwise. Auto-starts `tj serve` on demand.
71
+ - **Claude Code integration (`tj onboard --claude-code`)** — one-command setup for Claude Code telemetry. Configures OTLP log exporter in `~/.claude/settings.json`, sets project-level `OTEL_RESOURCE_ATTRIBUTES`, adds Docker-compatible endpoint to shell env, and optionally installs background daemon. Re-runs resync the auth header to fix 401s without manual setup.
72
+ - **Logs ingestion (`POST /v1/logs`)** — new OTLP log endpoint that converts Claude Code log events (`api_request`, `tool_result`, `api_error`, `user_prompt`, `tool_decision`) into NormalizedSpans with deterministic trace/span IDs. Spans flow through the standard ingest pipeline for cost, alerts, and drift.
73
+ - **`tj drift` CLI** — behavioral drift report with Rich table output showing baseline vs latest session Z-scores per dimension (input tokens, output tokens, duration, tool call count, tool sequence similarity). Color-coded thresholds, `--json` support, exit code 1 if drift detected.
74
+ - **`tj budget` CLI + API** — view and set per-agent daily/session cost limits. `GET/POST /api/v1/budget` endpoints. `resolve_effective_budget()` with per-field fallback so each budget dimension independently falls back to defaults.
75
+ - **Architecture documentation** (`docs/architecture.md`) — comprehensive architecture doc covering design principles, data flow, SDK internals, alert system, drift detection, MCP server, Claude Code pipeline, and testing architecture.
76
+ - `ClaudeCodeEvents` semantic conventions in `tokenjam/otel/semconv.py` for Claude Code log event attributes
77
+
78
+ ### Fixed
79
+ - Budget resolution inconsistency between AlertEngine enforcement and CLI display — both now use `resolve_effective_budget()` with field-level merge
80
+ - Drift display threshold bug in Z-score comparison
81
+ - `tj stop` now passes `-w` to `launchctl unload` to prevent auto-restart on macOS; added Linux systemd support
82
+ - Waterfall tooltip clipping for right-edge spans in web UI
83
+ - CLAUDE.md install command corrected from `pip install tokenjam` to `pip install tokenjam`
84
+
85
+ ### Improved
86
+ - README updated with Claude Code integration section, budget/drift CLI references, MCP server docs
87
+ - Web UI: budget headroom display, cost-today column in active sessions table, tooltip polish
88
+ - Onboard wizard: expanded for Claude Code workflow, status command enhancements
89
+ - MCP tool descriptions optimized for better agent tool selection
90
+
91
+ ### Changed
92
+ - Removed historical task specs from `.claude/specs/` (design intent preserved in `docs/architecture.md`)
93
+ - CLAUDE.md "Task Specs" section replaced with "Further Reading" linking to architecture doc
94
+ - 338 tests passing (up from 223)
95
+
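The per-field fallback mentioned for `resolve_effective_budget()` in this release can be illustrated with a short sketch. The `Budget` dataclass and `merge_budget` helper below are hypothetical stand-ins; the real function's signature is not shown in this changelog, so treat this only as one plausible shape of a field-level merge.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Budget:
    """Illustrative budget record; None means 'not set at this level'."""
    daily_usd: Optional[float] = None
    session_usd: Optional[float] = None


def merge_budget(agent: Optional[Budget], defaults: Budget) -> Budget:
    """Resolve each field independently: agent value if set, else the default."""
    if agent is None:
        return defaults
    merged = {}
    for f in fields(Budget):
        value = getattr(agent, f.name)
        merged[f.name] = value if value is not None else getattr(defaults, f.name)
    return Budget(**merged)


defaults = Budget(daily_usd=5.0, session_usd=1.0)
# The agent overrides only the daily limit; the session limit falls back.
effective = merge_budget(Budget(daily_usd=20.0), defaults)
```

Merging field by field, rather than picking one whole budget object, is what lets enforcement and display agree when an agent overrides only one dimension.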
96
+ ## [0.1.6] - 2026-04-08
97
+
98
+ ### Improved
99
+ - **`tj onboard` UX overhaul**
100
+ - Removed agent ID prompt — agents are auto-discovered when spans arrive
101
+ - Budget is now a global default (`[defaults.budget]`) that applies to all agents
102
+ - Per-agent `[agents.X.budget]` overrides the default when configured
103
+ - Cleaner budget prompt: "Daily budget in USD per agent (0 = no limit, default 5)"
104
+ - Daemon installs automatically (skip with `--no-daemon`)
105
+ - Rich next-steps output with instrumentation code example
106
+ - Minimal config file with commented per-agent example
107
+
108
+ ## [0.1.5] - 2026-04-08
109
+
110
+ ### Fixed
111
+ - **Pricing file missing from pip wheel** — `pricing/models.toml` was at the repo root, outside the `tokenjam/` package. Moved to `tokenjam/pricing/models.toml` so it's included in the wheel. All costs showed `$0.000000` in v0.1.4.
112
+
113
+ ### Improved
114
+ - **Web UI polish** — custom hover tooltips on waterfall bars (cost, duration, model), back arrow on trace detail, agent name heading, tighter layout, hint text on Status and Traces views
115
+ - **Waterfall bar labels** — now show cost alongside duration and model name
116
+
117
+ ### Added
118
+ - Manual release testing checklist (`tests/manual-new-release-tests.md`)
119
+ - Pre-release testing checklist (`tests/manual-pre-release-testing.md`)
120
+
121
+ ### Changed
122
+ - Task specs moved from `.claude/` to `.claude/specs/`
123
+
124
+ ## [0.1.4] - 2026-04-08
125
+
126
+ ### Fixed
127
+ - SDK DuckDB lock error when `tj serve` is running — bootstrap now detects the server and sends spans via HTTP (`TjHttpExporter`) instead of opening DuckDB directly
128
+ - LiteLLM model names no longer include provider prefix (`gpt-4o-mini` not `openai/gpt-4o-mini`), fixing pricing lookup failures
129
+ - LiteLLM streaming wrappers now correctly attribute provider and stripped model name
130
+
131
+ ### Added
132
+ - OpenClaw integration — zero-code OTLP ingestion for OpenClaw agents (PR #15)
133
+ - Web UI restyled to opencla.watch palette (deep navy + electric blue, IBM Plex Mono, Bricolage Grotesque)
134
+ - Inline SVG logo in web UI sidebar
135
+
136
+ ### Changed
137
+ - Node.js upgraded from 20 to 22 in CI and publish workflows
138
+ - npm SDK bumped to 0.1.4 (matching Python release)
139
+ - README: added Web UI section, updated roadmap (4 items complete, 5 new)
140
+
141
+ ## [0.1.3] - 2026-04-07
142
+
143
+ ### Added
144
+ - **Web UI** — local dashboard served by `tj serve` at `http://127.0.0.1:7391/`
145
+ - Status view with agent cards, cost, tokens, alerts (auto-refresh 5s)
146
+ - Traces view with span waterfall visualization and click-to-inspect detail
147
+ - Cost view with breakdown by day/agent/model/tool and summary totals
148
+ - Alerts view with severity filtering and expandable JSON detail
149
+ - Drift view with baseline vs latest session Z-score pass/fail
150
+ - `GET /api/v1/status` endpoint — agent status data (mirrors `tj status --json`)
151
+ - Drift endpoint now lists all agents when `agent_id` is omitted
152
+ - LiteLLM provider integration (`patch_litellm()`)
153
+ - Single-file Preact SPA — no build step, dark theme, JetBrains Mono
154
+
155
+ ### Changed
156
+ - CORS updated to regex matching for `localhost:*` ports
157
+ - API key injected into UI via `<meta>` tag (no user prompt needed)
158
+
159
+ ## [0.1.2] - 2026-04-07
160
+
161
+ ### Fixed
162
+ - `tj serve` printing wrong metrics port (9464 instead of 7391)
163
+ - `tj onboard` launchd daemon install now degrades gracefully on failure instead of crashing
164
+ - CLI commands now fall back to REST API when DuckDB is locked by `tj serve`
165
+
166
+ ### Added
167
+ - `tj stop` command — graceful shutdown of daemon or background process
168
+ - `tj uninstall` command — clean removal of all TokenJam data, config, and daemon
169
+ - 16 runnable example agents across 4 tiers: single provider, single framework, multi-agent, and alerts/drift demos
170
+ - API fallback backend (`ApiBackend`) so CLI works while `tj serve` holds the DB lock
171
+
172
+ ### Changed
173
+ - README: added toy agent quick-start, example agents section, corrected metrics URL, updated CLI reference
174
+ - CLAUDE.md: updated CLI command table, repo layout, added PyPI package name rule
175
+
176
+ ## [0.1.1] - 2026-04-06
177
+
178
+ ### Fixed
179
+ - `tj export` returning empty output due to corrupted DuckDB span indexes
180
+ - `tj status` showing `?` instead of `●` for completed sessions
181
+ - `tj status` showing `$0.000000` cost due to `date.today()` vs UTC date mismatch
182
+ - `tj cost` showing spurious `$0.000000` row from session-level spans with no model
183
+
184
+ ### Added
185
+ - `tj trace` prefix matching — short trace IDs now resolve like git short hashes
186
+ - PyPI and npm publish workflows (`publish-pypi.yml`, `publish-npm.yml`)
187
+ - PyPI metadata: README as long description, classifiers, project URLs
188
+ - `CODEOWNERS` requiring review from @anilmurty
189
+
190
+ ### Changed
191
+ - Renamed npm package from `@tokenjam/sdk` to `@tokenjam/sdk`
192
+ - Consolidated `AGENTS.md` to point at `CLAUDE.md` as source of truth
193
+
194
+ ## [0.1.0] - 2026-04-05
195
+
196
+ ### Added
197
+ - Core observability pipeline: span ingestion, session tracking, cost calculation
198
+ - DuckDB storage backend with migration runner
199
+ - 13 alert types with 6 dispatch channels (stdout, file, ntfy, webhook, Discord, Telegram)
200
+ - Z-score behavioral drift detection with automatic baseline building
201
+ - JSON Schema validation for tool outputs (declared or genson-inferred)
202
+ - CLI commands: `onboard`, `status`, `traces`, `cost`, `alerts`, `drift`, `tools`, `export`, `serve`, `doctor`
203
+ - REST API with OTLP JSON ingest endpoint and Prometheus metrics
204
+ - Python SDK: `@watch()` decorator, `patch_anthropic()`, `patch_openai()`, and 9 more provider/framework integrations
205
+ - TypeScript SDK (`@tokenjam/sdk`): `TjClient` and `SpanBuilder` for Node.js agents
206
+ - Auto-bootstrap: TracerProvider initializes lazily on first `@watch()` or `patch_*()` call
207
+ - Community-maintained model pricing table (`pricing/models.toml`)
208
+ - Session continuity via `conversation_id` across process restarts
209
+ - GitHub Actions CI (Python 3.10/3.11/3.12 + TypeScript)
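The Z-score drift detection listed in the changelog above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the threshold value, the use of population standard deviation, and the zero-variance handling are choices made for this example, not documented TokenJam behavior.

```python
from statistics import mean, pstdev
from typing import Sequence

Z_THRESHOLD = 2.0  # hypothetical threshold, chosen only for this example


def z_score(baseline: Sequence[float], latest: float) -> float:
    """Distance of the latest value from the baseline mean, in standard deviations."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A flat baseline: any deviation at all is treated as unbounded drift.
        return 0.0 if latest == mu else float("inf")
    return (latest - mu) / sigma


def has_drifted(baseline: Sequence[float], latest: float,
                threshold: float = Z_THRESHOLD) -> bool:
    """True when the latest session is further than `threshold` sigmas from baseline."""
    return abs(z_score(baseline, latest)) > threshold


# Input-token counts from earlier sessions (made-up numbers).
baseline_input_tokens = [1000, 1100, 900, 1000]
```

With this baseline, a session using 1500 input tokens would be flagged while one using 1050 would not; the same check would be repeated per dimension (output tokens, duration, tool call count).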
@@ -37,34 +37,6 @@ cd sdk-ts && npm install && npm test
37
37
  ```
38
38
 
39
39
 
40
- ## Repo Layout
41
-
42
- ```
43
- tokenjam/
44
- ├── tokenjam/ Python package
45
- │ ├── cli/ Click CLI commands (one file per command)
46
- │ ├── core/ Domain logic — NO CLI or HTTP imports allowed here
47
- │ ├── otel/ OTel SDK wiring + semantic conventions
48
- │ ├── api/ FastAPI local REST API
49
- │ ├── mcp/ MCP stdio server (Claude Code integration)
50
- │ ├── sdk/ Python instrumentation SDK
51
- │ └── utils/ Formatting, time parsing, ID generation
52
- ├── examples/ Runnable example agents (see examples/README.md)
53
- │ ├── single_provider/ One file per LLM provider integration
54
- │ ├── single_framework/ One file per framework integration
55
- │ ├── multi/ Multi-provider/framework examples + sample_docs/
56
- │ └── alerts_and_drift/ Alert and drift demos (no API keys needed)
57
- ├── sdk-ts/ TypeScript SDK (@tokenjam/sdk)
58
- ├── pricing/ models.toml — community-maintained model pricing (USD per million tokens)
59
- └── tests/
60
- ├── factories.py Span factory — use this in ALL tests
61
- ├── unit/ Pure logic tests, no I/O
62
- ├── synthetic/ Span injection tests via factories.py
63
- ├── agents/ Mock agent scenario scripts
64
- ├── integration/ CLI + API integration tests
65
- └── e2e/ Real LLM tests — skipped without API key env vars
66
- ```
67
-
68
40
  ## Architecture
69
41
 
70
42
  ### Data Flow
@@ -92,18 +64,21 @@ Post-ingest hooks run synchronously after each span is written to DB:
92
64
  - **`tokenjam/core/cost.py`**: `calculate_cost()` (pure function, rounds to 8dp) + `CostEngine` (post-ingest hook that updates `spans.cost_usd` and `sessions.total_cost_usd` via `db.conn` — see db.py note). Pricing loaded from `pricing/models.toml`.
93
65
  - **`tokenjam/core/alerts.py`**: `AlertEngine` with 13 alert types, `CooldownTracker` (in-memory, per agent+type, resets on restart), `AlertDispatcher` routing to 6 channel types (stdout, file, ntfy, webhook, Discord, Telegram). `AlertEngine.fire()` is the external entry point for other modules (SchemaValidator, DriftDetector) to fire alerts. Suppressed alerts are still persisted to DB but not dispatched to channels. Hardcoded thresholds: retry loop fires at 4+ identical tool calls in last 6 spans; failure rate fires at >20% errors in last 20 spans (checked every 5th error); session duration default 3600s. Stdout and file channels always include full detail regardless of `include_captured_content` config.
94
66
  - **`tokenjam/core/drift.py`**: `DriftDetector` — Z-score based behavioral drift detection, fires at session end.
95
- - **`tokenjam/core/optimize.py`**: Two analyzers used by `tj optimize` and the `get_optimize_report` MCP tool. `analyze_model_downgrade()` flags sessions whose structural shape (input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5) matches a class of work where a cheaper alternative model is worth reviewing; it never claims quality equivalence. `MODEL_DOWNGRADE_CAVEAT` is in the dataclass default so it cannot be removed accidentally. `project_budget()` projects current cycle spend against a `[budget.<provider>]` ceiling; only fires when budget > 0. Both functions operate on `db.conn` directly.
67
+ - **`tokenjam/core/optimize/`**: Package powering `tj optimize` and the `get_optimize_report` MCP tool. Public API re-exported from `__init__.py`: `build_report()` (orchestrator), `report_to_dict()`, `ANALYZER_REGISTRY`, `ANALYZER_ORDER`, plus result dataclasses. Architecture: `registry.py` holds the `@register("name")` decorator and `ANALYZER_REGISTRY` dict; `runner.py` defines `ANALYZER_ORDER` and orchestrates execution; `types.py` holds `AnalyzerContext` + result dataclasses + `MODEL_DOWNGRADE_CAVEAT`. Individual analyzers live in `analyzers/`, each as a single file registering via `@register`: `model_downgrade.py` (structural candidates: input < 5K tokens AND output < 500 tokens AND tool_calls ≤ 5; never claims quality equivalence, caveat baked into dataclass default), `budget_projection.py` (per-provider cycle spend vs `[budget.<provider>]` ceiling; only fires when budget > 0), `cache_efficacy.py`, `cache_recommend.py`, `prompt_bloat.py`, `workflow_restructure.py`. Analyzers receive an `AnalyzerContext` and operate on `db.conn` directly. To add a new analyzer: drop a file under `analyzers/`, decorate with `@register("name")`, append to `ANALYZER_ORDER` if ordering matters — `cmd_optimize --finding` choices auto-derive from the registry.
68
+ - **`tokenjam/core/ingest_adapters/`**: Third-party trace-export adapters that normalize external payloads (`langfuse.py`, `helicone.py`, `otlp.py`) into `NormalizedSpan` for ingest. Each is reachable as a `tj backfill <name>` subcommand and accepts `--source-url` (live API) or `--source-file` (offline JSON dump). Adapters write deterministic span IDs derived from the source's identifiers so re-runs are idempotent. `otlp.py` shares span-mapping logic with the live `POST /api/v1/spans` route via `tokenjam/otel/otlp_parsing.py`.
69
+ - **`tokenjam/core/export/`**: Routing-config snippet generators for `tj optimize --export-config`. Currently `claude_code.py` emits a JSONC fragment under a `tokenjam.routing_recommendations` namespace with honest-framing caveat comments baked in. Writes to `~/.config/tokenjam/exports/`; never touches `~/.claude/settings.json` or other external configs (no `--apply` flag — Claude Code doesn't currently honor TokenJam routing keys, so auto-writing would change nothing and erode trust).
96
70
  - **`tokenjam/core/backfill.py`**: Parses Claude Code on-disk session JSONL files into `NormalizedSpan`s. Cost is recomputed from `pricing/models.toml` because the on-disk format has no `cost_usd`. The parser tolerates the dated `claude-<family>-<ver>-YYYYMMDD` model-name suffixes Anthropic ships (handled by `core/pricing.py.get_rates()`, which strips the trailing 8-digit date suffix when no exact pricing match exists). Idempotency relies on deterministic span IDs derived from `(session_id, message uuid)` / `(session_id, tool_use id)`.
97
71
  - **`tokenjam/core/schema_validator.py`**: Validates tool outputs against declared or genson-inferred JSON Schema. Only fires on `gen_ai.tool.call` spans with `gen_ai.tool.output` in attributes. Schema priority: 1) declared file from agent config `output_schema`, 2) inferred schema from `DriftBaseline.output_schema_inferred`. Caches schemas in-memory per agent.
98
- - **`tokenjam/core/models.py`**: All domain dataclasses — `NormalizedSpan`, `SessionRecord`, `Alert`, `DriftBaseline`, filter types, etc.
99
- - **`tokenjam/core/config.py`**: `TjConfig` dataclass tree, TOML loading/writing, config file discovery.
72
+ - **`tokenjam/core/models.py`**: All domain dataclasses — `NormalizedSpan`, `SessionRecord`, `Alert`, `DriftBaseline`, filter types, etc. `NormalizedSpan` carries `billing_account` (provider-only: `anthropic` / `openai` / `google` / `bedrock` / `local.ollama`). `SessionRecord` carries `plan_tier` (api / pro / max_5x / max_20x / plus / team / enterprise / local / unknown) plus a derived `pricing_mode` property (`local` / `subscription` / `api` / `unknown`). Spans inherit plan via the session FK — analyzers JOIN through `SessionRecord` when they need plan context. See [`docs/architecture.md`](docs/architecture.md) → "OTel semconv extensions" for the full derivation rules.
73
+ - **`tokenjam/core/config.py`**: `TjConfig` dataclass tree, TOML loading/writing, config file discovery. `ProviderBudget` carries an optional `plan` field (set by `tj onboard`'s plan-tier prompt) that `IngestPipeline._build_or_update_session` reads to populate `SessionRecord.plan_tier` at session creation. `CaptureConfig` has four fine-grained content-capture toggles (`prompts` / `completions` / `tool_inputs` / `tool_outputs`); `strip_captured_content()` in `core/ingest.py` enforces them at the single ingest-pipeline gate.
100
74
  - **`tokenjam/sdk/agent.py`**: `@watch()` decorator creates session spans only. `record_llm_call()` and `record_tool_call()` create child spans for manual instrumentation. LLM call spans from provider clients require `patch_anthropic()`, `patch_openai()`, etc.
101
75
  - **`tokenjam/sdk/transport.py`**: `HttpTransport` — buffers up to 1000 spans, retries with exponential backoff (3 attempts, 2s base). Used when `tj serve` runs as a separate process.
102
76
  - **`tokenjam/sdk/bootstrap.py`**: `ensure_initialised()` — lazy, thread-safe, idempotent bootstrap of config -> DB -> IngestPipeline -> TracerProvider. Called automatically by `@watch()` and all `patch_*()` functions. Registers atexit flush.
103
77
  - **`tokenjam/sdk/integrations/`**: `Integration` protocol in `base.py`. Provider patches (anthropic, openai, gemini, bedrock, litellm) monkey-patch client methods to create OTel spans with token usage. `litellm.py` covers 100+ providers via LiteLLM's unified interface and uses a `contextvars.ContextVar` (`_tj_litellm_active`) to suppress inner provider patches (openai, anthropic) when active — prevents double-counted spans. Framework patches (langchain, langgraph, crewai, autogen) wrap LLM/tool methods. `llamaindex.py` and `openai_agents_sdk.py` are thin wrappers around those SDKs' native OTel support. `nemoclaw.py` is a WebSocket observer for OpenShell Gateway sandbox events.
104
78
  - **`tokenjam/otel/provider.py`**: `TjSpanExporter` (custom `SpanExporter` that feeds spans into `IngestPipeline`), `convert_otel_span()` (OTel `ReadableSpan` → `NormalizedSpan`), `build_tracer_provider()` (sets up global `TracerProvider` with local + optional OTLP exporters).
105
79
  - **`tokenjam/otel/exporters.py`**: Prometheus metric reader setup via `build_prometheus_exporter()`.
106
- - **`tokenjam/otel/semconv.py`**: `GenAIAttributes` and `TjAttributes` — OTel GenAI semantic convention constants.
80
+ - **`tokenjam/otel/otlp_parsing.py`**: Shared OTLP JSON → `NormalizedSpan` parser. Two callers: `api/routes/spans.py` (live `POST /api/v1/spans`) and `core/ingest_adapters/otlp.py` (`tj backfill otlp`). Keep parsing in this one place: the live receive path and the backfill adapter must agree on attribute extraction, billing_account derivation, and timestamp handling.
81
+ - **`tokenjam/otel/semconv.py`**: `GenAIAttributes`, `TjAttributes` (includes `BILLING_ACCOUNT` and `PLAN_TIER`), `VALID_PLAN_TIERS` and `SUBSCRIPTION_PLAN_TIERS` frozensets — OTel GenAI semantic convention constants plus tj-specific extensions.
107
82
  - **`tokenjam/api/app.py`**: FastAPI app factory. `tj serve` starts it with uvicorn. Accepts `db`, `config`, `ingest_pipeline` for testability. Registers all routers under `/api/v1` plus `/metrics`.
108
83
  - **`tokenjam/api/middleware.py`**: `IngestAuthMiddleware` — protects `POST /api/v1/spans` with Bearer token. Returns `JSONResponse(401)` directly (not `HTTPException`, which doesn't propagate from `BaseHTTPMiddleware.dispatch`).
109
84
  - **`tokenjam/api/deps.py`**: `require_api_key` — FastAPI dependency for optional API key auth on GET endpoints. Only enforced when `api.auth.enabled = true` in config.
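The dated-suffix fallback described for `core/backfill.py` above (try an exact pricing match first, then strip a trailing 8-digit date) can be sketched like this. The `PRICING` table, the model name, and `lookup_rates` are hypothetical placeholders; the real logic is in `get_rates()` and its exact behavior is not shown here.

```python
import re
from typing import Dict, Optional

# Hypothetical pricing table (USD per million tokens); names and values are placeholders.
PRICING: Dict[str, Dict[str, float]] = {
    "claude-example-4": {"input": 1.0, "output": 2.0},
}

# Matches a trailing "-YYYYMMDD" style suffix: a hyphen followed by exactly 8 digits.
_DATE_SUFFIX = re.compile(r"-\d{8}$")


def lookup_rates(model: str) -> Optional[Dict[str, float]]:
    """Return rates for `model`, retrying without a dated suffix if needed."""
    if model in PRICING:
        return PRICING[model]  # an exact match always wins
    stripped = _DATE_SUFFIX.sub("", model)
    return PRICING.get(stripped)


dated = lookup_rates("claude-example-4-20250101")
exact = lookup_rates("claude-example-4")
missing = lookup_rates("not-a-model")
```

Checking the exact name before stripping keeps a dated model with its own pricing entry from being silently mapped onto the undated one.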
@@ -113,31 +88,25 @@ Post-ingest hooks run synchronously after each span is written to DB:
113
88
 
114
89
  ### CLI Commands
115
90
 
116
- | Command | File | Description |
117
- |---|---|---|
118
- | `tj onboard` | `cmd_onboard.py` | Setup wizard: agent ID, budget, ingest secret, optional daemon install (launchd/systemd) |
119
- | `tj status` | `cmd_status.py` | Agent overview: session, cost, tokens, alerts. Exit 1 if active alerts |
120
- | `tj traces` | `cmd_traces.py` | List recent traces in table format |
121
- | `tj trace <id>` | `cmd_traces.py` | Span waterfall tree for a single trace |
122
- | `tj cost` | `cmd_cost.py` | Cost breakdown by day/agent/model/tool with `--json` support |
123
- | `tj alerts` | `cmd_alerts.py` | Alert history with severity/type filtering |
124
- | `tj tools` | `cmd_tools.py` | Tool call summary: call counts, avg duration |
125
- | `tj export` | `cmd_export.py` | Export spans as json (NDJSON), csv, otlp, or openevals format |
126
- | `tj serve` | `cmd_serve.py` | Start FastAPI + uvicorn server with retention cleanup cron |
127
- | `tj stop` | `cmd_stop.py` | Stop background daemon or tj serve process |
128
- | `tj budget` | `cmd_budget.py` | Get/set daily and session budget limits per agent or globally |
129
- | `tj drift` | `cmd_drift.py` | Show drift baselines and Z-scores for recent sessions |
130
- | `tj demo [scenario]` | `cmd_demo.py` | Run Agent Incident Library scenarios (zero-config, no API keys). `tj demo` lists all; `tj demo retry-loop` runs one |
131
- | `tj mcp` | `cmd_mcp.py` | Start the stdio MCP server for Claude Code integration |
132
- | `tj uninstall` | `cmd_uninstall.py` | Remove all TokenJam data, config, and daemon |
133
- | `tj doctor` | `cmd_doctor.py` | Health checks (config, DB, secrets, webhooks, drift readiness, schema-vs-capture consistency). Exit 0 = ok, 1 = warnings, 2 = errors |
134
- | `tj optimize` | `cmd_optimize.py` | Two analyzers: model-downgrade candidates + per-provider budget projection. `--since 30d`, `--only model\|budget`, `--budget <provider>`, `--budget-usd <amount>`. JSON output supported. Opens the live DB read-only so it works alongside a running `tj serve`. |
135
- | `tj backfill claude-code` | `cmd_backfill.py` | Parse `~/.claude/projects/*.jsonl` and ingest historical sessions. Idempotent — deterministic span IDs (SHA-256 of `session_id + uuid`) mean re-runs skip already-ingested rows. Auto-invoked at the end of `tj onboard --claude-code`. Future agent log formats (Codex, etc.) plug in as additional subcommands. |
91
+ `tj --help` lists all commands; most are self-explanatory. Non-obvious ones:
92
+
93
+ - **`tj demo [scenario]`** (`cmd_demo.py`): runs Agent Incident Library scenarios (zero-config, no API keys). `tj demo` lists all; `tj demo retry-loop` runs one.
94
+ - **`tj doctor`** (`cmd_doctor.py`): health checks (config, DB, secrets, webhooks, drift readiness, schema-vs-capture consistency). Exit 0 = ok, 1 = warnings, 2 = errors.
95
+ - **`tj optimize`** (`cmd_optimize.py`): six registry-driven analyzers (`model-downgrade`, `budget-projection`, `cache-efficacy`, `cache-recommend`, `workflow-restructure`, `prompt-bloat`). Flags: `--since 30d`, `--finding <name>` (repeatable; choices auto-derive from `ANALYZER_REGISTRY` at click decoration time), `--budget <provider>`, `--budget-usd <amount>`, `--compare <period>` (window-cost diff vs prior period; accepts `previous` / `last-week` / `last-month` / `last-7d` / `last-30d` / `YYYY-MM-DD:YYYY-MM-DD`), `--export-config <target>` (writes a routing snippet — currently `claude-code` — under `~/.config/tokenjam/exports/`; no `--apply` flag by design). Plan-tier-aware rendering: subscription users see "implied API value" framing and token-share savings (never dollar "spend"); local users see token-only framing; unknown-plan users see dollar figures suppressed with a `tj onboard --reconfigure` hint. Opens the live DB read-only so it works alongside a running `tj serve`.
96
+ - **`tj cost`** (`cmd_cost.py`) — cost breakdown by `--group-by agent|model|day|tool`. Same `--compare <period>` flag as `tj optimize` for window-over-window diffs (▲/▼ indicators, per-agent and per-model top-shifts, dollar + token deltas).
97
+ - **`tj backfill <source>`** (`cmd_backfill.py`): ingests historical telemetry from external sources. Subcommands: `claude-code` (parses `~/.claude/projects/*.jsonl`, auto-invoked at the end of `tj onboard --claude-code`), `langfuse` (live API or JSON dump), `helicone` (live API or JSON dump), `otlp` (raw OTLP JSON via URL or file — reuses the same parser as the live `POST /api/v1/spans` route). All idempotent via deterministic span IDs.
98
+ - **`tj onboard`** (`cmd_onboard.py`): `--claude-code` and `--codex` flags trigger integration-specific flows. Prompts for plan tier (api / pro / max_5x / max_20x for Anthropic; api / plus / team / enterprise for OpenAI) and writes it to `[budget.<provider>] plan = "..."`. Supports `--reconfigure` to re-prompt against an existing config, and `--plan <tier>` for non-interactive use. Does NOT auto-write a default `usd = 200` cycle ceiling — subscription users get only the `plan` field; API users are explicitly asked whether they want a self-imposed ceiling.
99
+ - **`tj report`** (`cmd_report.py`): generates standalone HTML visualizations of analyzer findings (e.g. `tj report --bloat [<agent_id>]` renders the prompt-bloat analyzer's per-token significance). Writes to `~/.cache/tokenjam/reports/` (override via `TOKENJAM_REPORT_DIR`) and opens in the default browser.
100
+ - **`tj policy list`** (`cmd_policy.py`) — read-only preview of the unified policy surface. Consolidates existing `[alerts]`, `[alerts.channels]`, `[defaults.budget]`, `[budget.<provider>]`, per-agent `budget`/`drift`/`sensitive_actions`/`output_schema`, and `[capture]` config into one table; each row carries its source TOML section. Supports `--json`. `tj policy add | edit | apply | remove | test` are intentionally absent this sprint — the unified config migration is next sprint's work. `policy` is in `no_db_commands` in `cli/main.py` so it doesn't open the DB. Rich source-section strings (`[budget.anthropic]`, `[[alerts.channels]]`) must be passed through `rich.markup.escape()` before rendering — otherwise Rich consumes them as style tags.
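The idempotency that `tj backfill` gets from deterministic span IDs, as noted in the list above, can be illustrated with a small sketch. The `:` separator, the 16-character truncation, and the `ingest` helper with its dict-backed store are assumptions made for this example; only the idea of hashing the source's own identifiers comes from the text.

```python
import hashlib
from typing import Dict


def deterministic_span_id(session_id: str, source_id: str) -> str:
    """Derive a stable span ID from identifiers the source already provides."""
    payload = f"{session_id}:{source_id}".encode("utf-8")
    # Truncated to 16 hex chars (64 bits), the width of an OTel span ID.
    return hashlib.sha256(payload).hexdigest()[:16]


def ingest(store: Dict[str, str], session_id: str, source_id: str, body: str) -> bool:
    """Insert a span unless its ID already exists; return True if written."""
    span_id = deterministic_span_id(session_id, source_id)
    if span_id in store:
        return False  # re-run: the row is already there, so skip it
    store[span_id] = body
    return True


store: Dict[str, str] = {}
first_run = ingest(store, "sess-1", "msg-uuid-1", "hello")
second_run = ingest(store, "sess-1", "msg-uuid-1", "hello")
```

Because the ID depends only on the source's identifiers, running the same backfill twice produces the same IDs and the second pass writes nothing.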
136
101
 
137
102
  All commands support `--json` for machine-readable output. Commands that query alerts use exit code 1 if active (unacknowledged, unsuppressed) alerts exist.
138
103
 
139
104
  **CLI testing pattern:** Tests use `click.testing.CliRunner` with `unittest.mock.patch` on `tokenjam.cli.main.load_config` and `tokenjam.cli.main.open_db` to inject an `InMemoryBackend` and test config. See `tests/integration/test_cli.py`. Note: `cmd_doctor` opens its own DuckDB connection via `config.storage.path` to verify writability — in tests you must set this to a real temp path (e.g. `tmp_path / "test.duckdb"`).
140
105
 
106
+ **`no_db_commands` in `cli/main.py`:** Commands that don't open the DB at startup — currently `{stop, uninstall, onboard, mcp, demo, policy}`. New commands that read only from config (or do their own DB connection later) should be added to this set so they work when `tj serve` holds the write lock. Tests for these commands can patch `open_db` with `side_effect=AssertionError(...)` to verify they never touch the DB.
107
+
108
+ **Test factories:** `tests/factories.py` provides `make_llm_span(billing_account="anthropic", ...)` and `make_session(plan_tier="api", ...)` with safe defaults that preserve existing test behavior. Tests exercising subscription / local / unknown plan-tier rendering paths should pass the field explicitly.
109
+
141
110
  ### REST API
142
111
 
143
112
  The API has two auth layers:
@@ -162,7 +131,7 @@ When a span has a `conversation_id` matching an existing session, it's attribute
162
131
  2. **TOML binary mode** — `tomllib.load()` requires `open(path, "rb")` not `"r"`. Text mode raises `TypeError` at runtime. Use the conditional import: `tomllib` (3.11+) or `tomli` (3.10). Writing config uses `tomli_w`.
163
132
  3. **`@watch()` alone does NOT create LLM spans** — only session start/end. Provider patches (`patch_anthropic()`, `patch_openai()`, etc.) are needed for individual LLM call spans.
164
133
  4. **Ingest auth** — `POST /api/v1/spans` requires `Authorization: Bearer <ingest_secret>` from `security.ingest_secret` in `tj.toml`.
165
- 5. **Alert content stripping** — remove `prompt_content`, `completion_content`, `tool_input`, `tool_output` from alert payloads sent to external channels unless `alerts.include_captured_content = true`. Stdout and file channels always get full payload.
134
+ 5. **Alert content stripping** — remove `gen_ai.prompt.content`, `gen_ai.completion.content`, `gen_ai.tool.input`, `gen_ai.tool.output` from alert payloads sent to external channels unless `alerts.include_captured_content = true`. Stdout and file channels always get full payload. Note: content is also stripped at *ingest* (before DB write) by `strip_captured_content()` in `core/ingest.py` per the four `[capture]` toggles (`prompts` / `completions` / `tool_inputs` / `tool_outputs`) — so the alert flag is moot when the corresponding capture flag is off.
166
135
  6. **No unicode bullets** — never hardcode `•` or `\u2022`; Rich handles bullet formatting.
167
136
  7. **Parameterised SQL only** — never use f-string SQL.
168
137
  8. **All test spans via factory** — never construct `NormalizedSpan` directly in tests; use `tests/factories.py` (`make_llm_span`, `make_session`, `make_tool_span`, `make_session_with_spans`).
@@ -171,8 +140,10 @@ When a span has a `conversation_id` matching an existing session, it's attribute
171
140
  11. **OTel TracerProvider is global and set-once** — `trace.set_tracer_provider()` only works once per process. In tests, set the provider once at module level (not per-test in a fixture) and clear spans between tests. Use a custom `_CollectingExporter(SpanExporter)` since `InMemorySpanExporter` is not available in the installed OTel version. See `tests/agents/test_mock_scenarios.py` for the SDK test pattern and `tests/integration/test_full_pipeline.py` for the pipeline pattern.
172
141
  12. **New SDK integrations must call `ensure_initialised()`** — every `patch_*()` convenience function must call `from tokenjam.sdk.bootstrap import ensure_initialised; ensure_initialised()` before installing hooks. This lazily bootstraps the TracerProvider + IngestPipeline on first use.
173
142
  13. **PyPI package name is `tokenjam`, not `ocw`** — `pip install tokenjam` is the correct install command. The CLI command is `tj` and the Python package directory is `tokenjam/`. The published package name on PyPI is `tokenjam`. Never write `pip install ocw` in docs, examples, or comments.
174
- 14. **`tj optimize` output must never claim quality equivalence** — the model-downgrade finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. Equivalent honesty applies to future optimize analyzers (cache-opportunity, prompt-bloat).
143
+ 14. **`tj optimize` output must never claim quality equivalence** — the model-downgrade finding flags structural candidates only. Every user-visible string says "looks like" / "candidate" / "review before switching" — never "safe to downgrade" or "would have worked." The `MODEL_DOWNGRADE_CAVEAT` constant lives on `DowngradeFinding` as a dataclass default so it can't be removed by accident; it must also appear in human-readable CLI output. The same honesty discipline applies to all other analyzers — `cache-efficacy` ("you're getting X% of available caching"), `cache-recommend` (Anthropic-only, structural prefix detection), `workflow-restructure` ("structural shape matches", "review before replacing with a script"), `prompt-bloat` ("predicted low-significance regions; review before editing"). `tj optimize --export-config` snippets bake the caveat block into the JSONC output as comments.
175
144
  15. **Version bump on release** — both `pyproject.toml` (`version = "X.Y.Z"`) and `sdk-ts/package.json` (`"version": "X.Y.Z"`) must be bumped to the new version before creating a GitHub release. The publish workflows (`publish-pypi.yml`, `publish-npm.yml`) trigger on `release published` events and will fail with 403 if the version already exists on PyPI/npm.
145
+ 16. **New optimize analyzers self-register** — drop a `.py` file under `tokenjam/core/optimize/analyzers/` with a function decorated `@register("name")` taking `AnalyzerContext`. Auto-discovery in `analyzers/__init__.py` walks the directory at import time. `cmd_optimize.py`'s `--finding` choices read from `ANALYZER_REGISTRY.keys()` at click decoration — no edits needed there. If your analyzer depends on (or is depended on by) another, append it to `ANALYZER_ORDER` in `runner.py` at the right position. Wave-2 analyzers attach their findings to `OptimizeReport.findings[name]` (generic dict); the older `model-downgrade` / `budget-projection` analyzers retain typed slots on `OptimizeReport` for backwards compat with `cmd_optimize` and the MCP server.
146
+ 17. **OTLP parsing has one home** — `tokenjam/otel/otlp_parsing.py`. Both the live `POST /api/v1/spans` route and the `tj backfill otlp` adapter import `parse_otlp_span` and `extract_resource_attrs` from there. If you need to extend OTLP attribute extraction, do it once in that module; do not copy-paste into either caller.
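A minimal sketch of the self-registration pattern that rule 16 describes. The names `register`, `ANALYZER_REGISTRY`, and `AnalyzerContext` come from the text above, but their bodies here are assumptions made for illustration, as is the `span-count` analyzer; the real module may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AnalyzerContext:
    # Hypothetical stand-in: the real AnalyzerContext fields are not shown in the source.
    spans: List[Dict[str, Any]] = field(default_factory=list)


AnalyzerFn = Callable[[AnalyzerContext], Any]

# Name -> analyzer function. A CLI could read its --finding choices from the keys.
ANALYZER_REGISTRY: Dict[str, AnalyzerFn] = {}


def register(name: str) -> Callable[[AnalyzerFn], AnalyzerFn]:
    """Decorator that records the analyzer under `name` when the module is imported."""
    def decorator(fn: AnalyzerFn) -> AnalyzerFn:
        if name in ANALYZER_REGISTRY:
            raise ValueError(f"analyzer {name!r} is already registered")
        ANALYZER_REGISTRY[name] = fn
        return fn
    return decorator


@register("span-count")  # hypothetical analyzer name, for illustration only
def span_count(ctx: AnalyzerContext) -> Dict[str, int]:
    return {"spans": len(ctx.spans)}
```

Because registration happens as a side effect of import, walking the analyzers directory at import time is enough to populate the registry without editing the command module.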
176
147
 
177
148
  ## Config
178
149
 
@@ -180,7 +151,7 @@ Config is TOML, discovered at: `tj.toml` -> `.tj/config.toml` -> `~/.config/tj/c
180
151
 
181
152
  Two distinct budget concepts coexist — do not conflate:
182
153
  - **`[defaults.budget]` / `[agents.<id>.budget]`** (`daily_usd`, `session_usd`) — per-agent alert thresholds checked on every span by `AlertEngine`.
183
- - **`[budget.<provider>]`** (`usd`, `cycle_start_day`, `applies_to_services`) — periodic monthly ceilings used only by `tj optimize` projections. Read-only at projection time; no alerts fire from these. `tj onboard --claude-code` writes a default `[budget.anthropic] usd = 200` if no provider budget is configured. The analyzer scopes spend by `provider` column and (optionally) by `agent_id IN applies_to_services`.
154
+ - **`[budget.<provider>]`** (`plan`, `usd`, `cycle_start_day`, `applies_to_services`) — per-provider budget config. `plan` is the user's declared plan tier (api / pro / max_5x / max_20x / plus / team / enterprise / local), prompted for by `tj onboard` and used by `IngestPipeline` to populate `SessionRecord.plan_tier` at session creation. `usd` is a periodic monthly ceiling used only by `tj optimize` budget-projection (read-only; no alerts fire from it). Onboard does NOT auto-write `usd = 200`: subscription users get only the `plan` field; API users are explicitly asked whether they want a self-imposed ceiling. The budget-projection analyzer scopes spend by `provider` column and (optionally) by `agent_id IN applies_to_services`.
184
155
 
185
156
  `tj onboard --claude-code` and `tj onboard --codex` always write to the **global** config (`~/.config/tj/config.toml`) regardless of cwd. This is intentional: each coding-agent integration reads one ingest secret from a single global location (`~/.claude/settings.json` or `~/.codex/config.toml`), and per-project configs would rotate that secret on every onboard, breaking auth for previously onboarded projects. Onboarded Claude Code project paths are tracked in `~/.config/tj/projects.json` for clean uninstall. Codex onboarding is fully project-agnostic — Codex hardcodes `service.name=codex_exec` in its binary, so there is one Codex agent ID for all projects.
186
157
 
@@ -222,6 +193,8 @@ The Agent Incident Library at `incidents/` is separate: each scenario is a `scen
222
193
 
223
194
  Model pricing lives in `pricing/models.toml` (USD per million tokens). Structure: `[provider.model_name]` with `input_per_mtok`, `output_per_mtok`, and optional `cache_read_per_mtok`/`cache_write_per_mtok`. Unknown models fall back to default rates ($0.50/$2.00 per MTok) with a logged warning. The pricing table is LRU-cached at process startup — restart to pick up changes.
224
195
 
196
+ Pricing is community-maintained: submit a PR editing `pricing/models.toml` when provider prices change. No code changes needed — the file is loaded at runtime.
197
+
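As a rough illustration of the lookup and fallback described above, the sketch below prices a call from a table shaped like the parsed `pricing/models.toml`. The function names, the `claude-example` model, and its rates are hypothetical placeholders, not real prices; only the field names and the $0.50/$2.00 default rates come from the text.

```python
import logging
from typing import Dict

logger = logging.getLogger("pricing_sketch")

# Default rates from the text: USD per million tokens for unknown models.
DEFAULT_RATES = {"input_per_mtok": 0.50, "output_per_mtok": 2.00}

# Stand-in for the parsed TOML table, shaped as [provider.model_name].
# The model name and rates below are placeholders, not real prices.
PRICING: Dict[str, Dict[str, Dict[str, float]]] = {
    "anthropic": {
        "claude-example": {"input_per_mtok": 3.0, "output_per_mtok": 15.0},
    },
}


def lookup_rates(table, provider: str, model: str) -> Dict[str, float]:
    """Return the rates for provider/model, falling back to the defaults with a warning."""
    rates = table.get(provider, {}).get(model)
    if rates is None:
        logger.warning("unknown model %s/%s, using default rates", provider, model)
        return DEFAULT_RATES
    return rates


def estimate_cost_usd(table, provider: str, model: str,
                      input_tokens: int, output_tokens: int) -> float:
    """Cost in USD: tokens divided by one million, times the per-MTok rate."""
    rates = lookup_rates(table, provider, model)
    return (input_tokens / 1_000_000 * rates["input_per_mtok"]
            + output_tokens / 1_000_000 * rates["output_per_mtok"])
```

Cache read/write rates are left out to keep the sketch short; they would add two more terms of the same shape.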
225
198
  ## CI
226
199
 
227
200
  GitHub Actions workflow at `.github/workflows/ci.yml` runs on push/PR to `main`:
@@ -230,12 +203,40 @@ GitHub Actions workflow at `.github/workflows/ci.yml` runs on push/PR to `main`:
230
203
 
231
204
  All steps are blocking — lint, typecheck, and tests must pass for CI to go green.
232
205
 
206
+ There is no pre-commit configuration in this repo; `ruff` and `mypy` only run in CI. Run them locally before pushing.
207
+
208
+ ## Releases
209
+
210
+ PyPI and npm publishes are triggered by GitHub Release events (`.github/workflows/publish-pypi.yml`, `publish-npm.yml`, both `on: release: types: [published]`). Release flow:
211
+
212
+ 1. Bump both `pyproject.toml` `version` and `sdk-ts/package.json` `"version"` to the new `X.Y.Z` (see Critical Rule 15).
213
+ 2. Merge to `main`.
214
+ 3. Create a GitHub Release with tag `vX.Y.Z` (e.g. via `gh release create vX.Y.Z --generate-notes`). Publishing the release fires both workflows.
215
+
216
+ If a version already exists on PyPI or npm, the publish workflow fails with 403 — bump again rather than retrying.
217
+
233
218
  ## Packaging
234
219
 
235
220
  Build system is hatchling. The `pyproject.toml` requires `[tool.hatch.build.targets.wheel] packages = ["tj"]` because the package name (`tokenjam`) differs from the directory name (`tj`). Without this, `pip install -e .` fails.
236
221
 
237
222
  Key runtime dependency: `pytz` is required by DuckDB for `TIMESTAMPTZ` column handling — it's listed explicitly in `dependencies` because DuckDB doesn't declare it on all platforms.
238
223
 
224
+ **Optional extras** (declared under `[project.optional-dependencies]`):
225
+ - `tokenjam[bloat]` — `llmlingua>=0.2`, used by the Trim analyzer. Pulls PyTorch + transformers (~2GB). Kept out of base install. The analyzer self-registers without the extra installed; the deferred `import llmlingua` inside the analysis function body raises a typed error whose message points the user at the install command.
226
+ - Framework extras `[langchain]`, `[crewai]`, `[autogen]`, `[litellm]` for SDK patches.
227
+ - `[dev]` for local development (`pytest`, `ruff`, `mypy`, `httpx`).
228
+ - `[mcp]` for the FastMCP stdio server.
229
+
239
230
  ## Further Reading
240
231
 
241
- - **[docs/architecture.md](docs/architecture.md)** — comprehensive architecture document covering design principles, system overview, data flow, SDK internals, alert system, drift detection, MCP server, Claude Code integration, budget system, and testing architecture.
232
+ - **[docs/architecture.md](docs/architecture.md)** — design principles, system overview, data flow, SDK internals, alert system, drift detection, MCP server, Claude Code integration, budget system, testing architecture, and the **OTel semconv extensions** section documenting `tokenjam.billing_account` (span attribute) and `tokenjam.plan_tier` (session-level), the `pricing_mode` derivation rules, and why `plan_tier` lives on `SessionRecord` rather than each span.
233
+ - **[docs/installation.md](docs/installation.md)** — base install vs optional extras matrix. Documents `tokenjam[bloat]` (the ~2GB torch + transformers extra used by the Trim analyzer), framework adapter extras (`[langchain]` / `[crewai]` / `[autogen]` / `[litellm]`), and the MCP / dev extras.
234
+ - **[docs/configuration.md](docs/configuration.md)** — full TOML config surface plus the "Content capture and privacy" section explaining the four `[capture]` toggles and how they interact with `alerts.include_captured_content`.
235
+ - **Optimize product pages** — one per user-facing product, all under `docs/optimize/`:
236
+ - [`downsize.md`](docs/optimize/downsize.md) — model-downgrade candidate flagging (internal: `model-downgrade`)
237
+ - [`cache.md`](docs/optimize/cache.md) — `cache-efficacy` (current caching ratio) + `cache-recommend` (Anthropic-only breakpoint suggestions)
238
+ - [`script.md`](docs/optimize/script.md) — `workflow-restructure` clustering by `(tool_name, arg_shape)` signature
239
+ - [`trim.md`](docs/optimize/trim.md) — LLMLingua-2 token-significance classifier (`prompt-bloat`), install + capture requirements, performance numbers
240
+ - **Backfill adapters** — `docs/backfill/overview.md` lists the four sources (`claude-code` / `langfuse` / `helicone` / `otlp`) with the partnership-posture framing; per-adapter pages document modes (URL / file), field mapping, idempotency, and v1 limitations.
241
+ - **[docs/policy/overview.md](docs/policy/overview.md)** — read-only preview of the unified policy surface (`tj policy list`). Notes that the `add` / `edit` / `apply` subcommands and the underlying `[policy]` config migration land next sprint.
242
+ - **Internal specs** — `docs/internal/specs/` is reserved for canonical specs that production code references long-term. Currently empty (sprint specs have been cleaned up after merge); add new ones here when a feature needs a stable, code-referenced source of truth.
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: tokenjam
3
- Version: 0.2.3
3
+ Version: 0.3.1
4
4
  Summary: TokenJam — local-first OTel-native observability for Autonomous AI agents
5
5
  Project-URL: Homepage, https://opencla.watch
6
6
  Project-URL: Repository, https://github.com/Metabuilder-Labs/openclawwatch
@@ -38,6 +38,8 @@ Requires-Dist: uvicorn>=0.27
38
38
  Requires-Dist: websockets>=12.0
39
39
  Provides-Extra: autogen
40
40
  Requires-Dist: pyautogen>=0.2; extra == 'autogen'
41
+ Provides-Extra: bloat
42
+ Requires-Dist: llmlingua>=0.2; extra == 'bloat'
41
43
  Provides-Extra: crewai
42
44
  Requires-Dist: crewai>=0.28; extra == 'crewai'
43
45
  Provides-Extra: dev
@@ -111,7 +113,7 @@ example sessions before changing models.
111
113
  Run rate $160.3500/mo — 19% of cycle budget unused.
112
114
  ```
113
115
 
114
- Two analyzers reading the same spans you'd otherwise pay LangSmith to host: structural model-downgrade candidate flagging (never claims quality equivalence — surfaces examples to review) and per-provider monthly budget projection. Works with **any** agent already sending TokenJam data, not just Claude Code.
116
+ Two analyzers reading the same spans you'd otherwise pay LangSmith to host: structural downsize candidate flagging (never claims quality equivalence — surfaces examples to review) and per-provider monthly budget projection. Works with **any** agent already sending TokenJam data, not just Claude Code.
115
117
 
116
118
  Try a tighter budget to see the over-budget renderer:
117
119
 
@@ -55,7 +55,7 @@ example sessions before changing models.
55
55
  Run rate $160.3500/mo — 19% of cycle budget unused.
56
56
  ```
57
57
 
58
- Two analyzers reading the same spans you'd otherwise pay LangSmith to host: structural model-downgrade candidate flagging (never claims quality equivalence — surfaces examples to review) and per-provider monthly budget projection. Works with **any** agent already sending TokenJam data, not just Claude Code.
58
+ Two analyzers reading the same spans you'd otherwise pay LangSmith to host: structural downsize candidate flagging (never claims quality equivalence — surfaces examples to review) and per-provider monthly budget projection. Works with **any** agent already sending TokenJam data, not just Claude Code.
59
59
 
60
60
  Try a tighter budget to see the over-budget renderer:
61
61
 
@@ -377,6 +377,52 @@ The converted spans flow through the standard `IngestPipeline.process()` path
377
377
 
378
378
  ---
379
379
 
380
+ ## OTel semconv extensions: `billing_account` and `plan_tier`
381
+
382
+ TokenJam extends the standard OTel GenAI semconv with two attributes for plan-tier-aware cost rendering. Both live in `TjAttributes` in `tokenjam/otel/semconv.py`:
383
+
384
+ | Attribute | Constant | Lives on | Set by |
385
+ |---|---|---|---|
386
+ | `tokenjam.billing_account` | `TjAttributes.BILLING_ACCOUNT` | Spans | Every integration (OTel patches, Claude Code JSONL backfill, OTLP HTTP ingest, OTLP logs ingest, Langfuse/Helicone backfill adapters) |
387
+ | `tokenjam.plan_tier` | `TjAttributes.PLAN_TIER` | `SessionRecord` (DB column, not a span attribute) | `IngestPipeline._build_or_update_session` reads `ProviderBudget.plan` for the session's `billing_account` |
388
+
389
+ ### `billing_account`
390
+
391
+ Provider-only identifier. Valid values: `anthropic`, `openai`, `google`, `bedrock`, `local.ollama`. Distinct from `gen_ai.system` (which can include framework-level wrappers like `litellm` or `langchain`) — `billing_account` is whichever provider actually billed the call.
392
+
393
+ It is **not** a composite. No API-key fingerprint, no plan tier, no account-name suffix. Multi-key disambiguation is deferred until someone asks for it.
394
+
395
+ ### `plan_tier`
396
+
397
+ The user's declared plan for the relevant provider. Valid values are enumerated in `VALID_PLAN_TIERS`:
398
+
399
+ ```python
400
+ {"api", "pro", "max_5x", "max_20x", "plus", "team", "enterprise", "local", "unknown"}
401
+ ```
402
+
403
+ `SUBSCRIPTION_PLAN_TIERS` is a subset: the plans where "spend" is a flat fee and the user isn't paying per-token. Renderers branch on this to avoid presenting a per-token dollar "spend" claim against a subscription plan.
404
+
405
+ ### `pricing_mode` derived property
406
+
407
+ `SessionRecord.pricing_mode` is a derived Python property — **not** a stored DB column. It maps `plan_tier` (together with `billing_account`, which the first rule checks) to one of four rendering modes, evaluated top-to-bottom (first match wins):
408
+
409
+ 1. `local` if `billing_account == "local.ollama"`
410
+ 2. `subscription` if `plan_tier in SUBSCRIPTION_PLAN_TIERS`
411
+ 3. `api` if `plan_tier == "api"`
412
+ 4. `unknown` if `plan_tier == "unknown"`
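The four rules above can be sketched as a plain function. This is an illustration under assumptions: the membership of `SUBSCRIPTION_PLAN_TIERS` is not listed in the text, so the set below is a guess, and the final fallback to `unknown` for inputs that match no rule is also assumed.

```python
from typing import Optional

# Assumed membership: the text only says this is the flat-fee subset of VALID_PLAN_TIERS.
SUBSCRIPTION_PLAN_TIERS = frozenset(
    {"pro", "max_5x", "max_20x", "plus", "team", "enterprise"}
)


def pricing_mode(billing_account: Optional[str], plan_tier: str) -> str:
    """Derive the rendering mode; rules are checked top to bottom, first match wins."""
    if billing_account == "local.ollama":
        return "local"
    if plan_tier in SUBSCRIPTION_PLAN_TIERS:
        return "subscription"
    if plan_tier == "api":
        return "api"
    # Rule 4 covers plan_tier == "unknown"; treating every other
    # unmatched value the same way is an assumption of this sketch.
    return "unknown"
```

Rule order matters: a local Ollama session renders as `local` even if a subscription tier was declared for it.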
413
+
414
+ Renderers (`tj optimize`, `tj cost`, the web UI cost views) read `pricing_mode` and pick the appropriate framing. `tj optimize` suppresses dollar figures entirely when the entire window is `pricing_mode = unknown`.
415
+
416
+ ### Why a session-level column, not a span attribute
417
+
418
+ `plan_tier` doesn't change call-to-call within a session. Storing it on each span would duplicate the value across thousands of rows and create skew if a provider plan changes mid-session. Storing on `SessionRecord` keeps it normalized and lets analyzers JOIN through to it.
419
+
420
+ ### Backfilled sessions
421
+
422
+ `SessionRecord.plan_tier` defaults to `unknown` for backfilled rows (no plan signal in the source data). `tj status` surfaces a one-line note when unknown-tier sessions exist; `tj optimize` refuses to render dollar figures for those sessions until the user resolves them via `tj onboard --reconfigure`.
423
+
424
+ ---
425
+
380
426
  ## Budget system
381
427
 
382
428
  ### CLI (`tj budget`)
@@ -0,0 +1,62 @@
1
+ # `tj backfill helicone`
2
+
3
+ Imports [Helicone](https://helicone.ai) request records into the local TokenJam DB. Two input modes — live API or local JSON dump.
4
+
5
+ TokenJam doesn't replace Helicone. Keep Helicone wherever you have it; point `tj backfill helicone` at the same data so the local cost-optimization analyzers (`tj optimize`) can read it.
6
+
7
+ ## Live API ingestion
8
+
9
+ ```bash
10
+ tj backfill helicone \
11
+ --source-url https://api.helicone.ai \
12
+ --api-key hc_pk_... \
13
+ --since 30d
14
+ ```
15
+
16
+ POSTs `/v1/request/query` against Helicone with Bearer auth and follows pagination. Self-hosted Helicone instances work the same way — point `--source-url` at the base URL of your deployment.
17
+
18
+ `--since` accepts the same syntax as `tj cost --since`: `30d`, `24h`, or an ISO-8601 timestamp.
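One way the three accepted `--since` forms could be parsed is sketched below. The function name `parse_since`, the choice to treat timezone-naive timestamps as UTC, and the handling of a trailing `Z` are assumptions of this sketch, not documented TokenJam behavior.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_RELATIVE = re.compile(r"^(\d+)([dh])$")


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn '30d', '24h', or an ISO-8601 timestamp into an aware UTC-based cutoff."""
    now = now or datetime.now(timezone.utc)
    text = value.strip()
    match = _RELATIVE.match(text)
    if match:
        amount = int(match.group(1))
        unit = match.group(2)
        delta = timedelta(days=amount) if unit == "d" else timedelta(hours=amount)
        return now - delta
    # Older Python versions reject a trailing 'Z', so normalize it first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```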
19
+
20
+ ## File ingestion
21
+
22
+ ```bash
23
+ tj backfill helicone --source-file ./helicone-export.json
24
+ ```
25
+
26
+ Accepts three input shapes:
27
+
28
+ 1. `{"data": [...]}` — the format returned by the live `/v1/request/query` endpoint.
29
+ 2. `[...]` — a bare JSON array of records.
30
+ 3. NDJSON — one JSON record per line.
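A minimal sketch of how the three input shapes might be told apart. The function name `load_records` and the detection order are assumptions; the adapter's actual logic is not shown in this document.

```python
import json
from typing import Any, Dict, List


def load_records(text: str) -> List[Dict[str, Any]]:
    """Return request records from a {"data": [...]} wrapper, a bare array, or NDJSON."""
    stripped = text.strip()
    if not stripped:
        return []
    try:
        doc = json.loads(stripped)
    except json.JSONDecodeError:
        # More than one JSON document in the text: treat as NDJSON, one record per line.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    if isinstance(doc, dict) and "data" in doc:
        return list(doc["data"])  # shape 1: live endpoint response
    if isinstance(doc, list):
        return doc  # shape 2: bare array
    if isinstance(doc, dict):
        return [doc]  # NDJSON file that happens to hold a single record
    raise ValueError("unrecognized input shape")
```

Trying a whole-document parse first keeps the common cases cheap; NDJSON is only attempted when that parse fails.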
31
+
32
+ The file mode is the right choice for testing, offline analysis, or scripted ingestion from a snapshot.
33
+
34
+ ## What gets mapped
35
+
36
+ Each Helicone request record becomes one TokenJam span:
37
+
38
+ | Helicone field | TokenJam field |
39
+ |---|---|
40
+ | `request.id` | `span_id` (deterministic hash) |
41
+ | `Helicone-Property-Session` (fallback: `request.id`) | `conversation_id` |
42
+ | `request.user_id` (fallback: `"helicone"`) | `agent_id` |
43
+ | `request.model` | `model` |
44
+ | `request.provider` (fallback: derived from model) | `provider` + `billing_account` |
45
+ | `request.created_at` | `start_time` |
46
+ | `request.created_at + response.delay_ms` | `end_time` |
47
+ | `request.prompt_tokens` | `input_tokens` |
48
+ | `response.completion_tokens` | `output_tokens` |
49
+ | `cost_usd` / `costUSD` | `cost_usd` |
50
+ | `properties` | merged into `attrs` |
51
+
52
+ `billing_account` is derived from `request.provider` when present, or from the model name otherwise. Unknown providers leave it `NULL`; affected sessions will surface as `plan_tier = 'unknown'` in `tj optimize`.
53
+
54
+ ## Idempotency
55
+
56
+ The TokenJam `span_id` is a deterministic SHA-256 hash of `("helicone", request.id)`. Re-running the same backfill skips rows already present — the output reports `spans_written` vs `spans_skipped`. Safe to schedule nightly.
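The idempotency property can be illustrated with a short sketch. How the `("helicone", request.id)` pair is serialized before hashing, and whether the digest is truncated, are not stated here, so the separator and the full-length hex digest below are assumptions, as are the function names.

```python
import hashlib
from typing import Iterable, Set, Tuple


def deterministic_span_id(source: str, record_id: str) -> str:
    """SHA-256 over the (source, record_id) pair; the same input always gives the same id."""
    # Assumption: join the two parts with a unit-separator byte before hashing.
    payload = f"{source}\x1f{record_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def backfill(existing_ids: Set[str], request_ids: Iterable[str]) -> Tuple[int, int]:
    """Write ids not seen before; return (spans_written, spans_skipped)."""
    written = skipped = 0
    for request_id in request_ids:
        span_id = deterministic_span_id("helicone", request_id)
        if span_id in existing_ids:
            skipped += 1
        else:
            existing_ids.add(span_id)
            written += 1
    return written, skipped
```

Running `backfill` twice over the same request ids writes everything on the first pass and skips everything on the second, which is what makes a nightly schedule safe.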
57
+
58
+ ## Limitations in v1
59
+
60
+ - Helicone's per-request prompt/response bodies are not extracted into `gen_ai.prompt.content` / `gen_ai.completion.content`. Token counts and structural metadata only.
61
+ - Multi-tenant Helicone instances aren't filtered by org — the API key's scope determines what's returned.
62
+ - If `cost_usd` is missing from a record, TokenJam recomputes cost from `pricing/models.toml` using the model name.