rubino-agent 0.5.0 → 0.5.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (262)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +6 -0
  3. data/.rubocop_todo.yml +1 -0
  4. data/CHANGELOG.md +388 -0
  5. data/README.md +56 -7
  6. data/Rakefile +17 -0
  7. data/docs/agents.md +37 -30
  8. data/docs/api/v1.md +2 -0
  9. data/docs/architecture.md +2 -9
  10. data/docs/commands.md +17 -8
  11. data/docs/configuration.md +166 -12
  12. data/docs/mcp.md +3 -3
  13. data/docs/memory.md +3 -3
  14. data/docs/oauth-providers.md +21 -0
  15. data/docs/security.md +1 -1
  16. data/docs/tools.md +45 -49
  17. data/ext/landlock/extconf.rb +78 -0
  18. data/ext/landlock/landlock.c +253 -0
  19. data/lib/rubino/agent/action_claim_guard.rb +61 -29
  20. data/lib/rubino/agent/definition.rb +3 -19
  21. data/lib/rubino/agent/iteration_budget.rb +1 -1
  22. data/lib/rubino/agent/loop.rb +188 -22
  23. data/lib/rubino/agent/prompts/build.txt +43 -5
  24. data/lib/rubino/agent/prompts/general.txt +8 -3
  25. data/lib/rubino/agent/runner.rb +179 -10
  26. data/lib/rubino/agent/tool_executor.rb +205 -20
  27. data/lib/rubino/agent/truncation_continuation.rb +7 -4
  28. data/lib/rubino/api/operations/approvals/decide_operation.rb +0 -4
  29. data/lib/rubino/api/operations/clarifications/decide_operation.rb +0 -4
  30. data/lib/rubino/api/operations/cron_jobs/create_operation.rb +0 -4
  31. data/lib/rubino/api/operations/cron_jobs/delete_operation.rb +0 -4
  32. data/lib/rubino/api/operations/cron_jobs/list_operation.rb +0 -4
  33. data/lib/rubino/api/operations/cron_jobs/pause_operation.rb +1 -5
  34. data/lib/rubino/api/operations/cron_jobs/resume_operation.rb +1 -5
  35. data/lib/rubino/api/operations/cron_jobs/show_operation.rb +0 -4
  36. data/lib/rubino/api/operations/cron_jobs/trigger_operation.rb +0 -4
  37. data/lib/rubino/api/operations/cron_jobs/update_operation.rb +0 -4
  38. data/lib/rubino/api/operations/files/read_operation.rb +1 -5
  39. data/lib/rubino/api/operations/files/upload_operation.rb +0 -4
  40. data/lib/rubino/api/operations/health_operation.rb +1 -5
  41. data/lib/rubino/api/operations/memory/delete_operation.rb +0 -4
  42. data/lib/rubino/api/operations/memory/index_operation.rb +0 -4
  43. data/lib/rubino/api/operations/memory/stats_operation.rb +0 -4
  44. data/lib/rubino/api/operations/metrics_operation.rb +1 -1
  45. data/lib/rubino/api/operations/mode/show_operation.rb +0 -4
  46. data/lib/rubino/api/operations/mode/update_operation.rb +0 -4
  47. data/lib/rubino/api/operations/models/list_operation.rb +0 -4
  48. data/lib/rubino/api/operations/oauth/connections/disconnect_operation.rb +0 -4
  49. data/lib/rubino/api/operations/oauth/connections/list_operation.rb +0 -4
  50. data/lib/rubino/api/operations/oauth/providers/callback_operation.rb +0 -4
  51. data/lib/rubino/api/operations/oauth/providers/connect_operation.rb +0 -4
  52. data/lib/rubino/api/operations/oauth/providers/list_operation.rb +0 -4
  53. data/lib/rubino/api/operations/runs/create_operation.rb +0 -4
  54. data/lib/rubino/api/operations/runs/events_operation.rb +0 -4
  55. data/lib/rubino/api/operations/runs/stop_operation.rb +0 -4
  56. data/lib/rubino/api/operations/sessions/create_operation.rb +0 -4
  57. data/lib/rubino/api/operations/sessions/delete_operation.rb +0 -4
  58. data/lib/rubino/api/operations/sessions/index_operation.rb +0 -4
  59. data/lib/rubino/api/operations/sessions/retry_operation.rb +0 -4
  60. data/lib/rubino/api/operations/sessions/show_operation.rb +0 -4
  61. data/lib/rubino/api/operations/sessions/undo_operation.rb +0 -4
  62. data/lib/rubino/api/operations/skills/list_operation.rb +0 -4
  63. data/lib/rubino/api/operations/skills/toggle_operation.rb +0 -4
  64. data/lib/rubino/api/operations/tasks/index_operation.rb +0 -4
  65. data/lib/rubino/api/operations/tasks/show_operation.rb +0 -4
  66. data/lib/rubino/api/operations/tasks/stop_operation.rb +0 -7
  67. data/lib/rubino/api/router.rb +2 -2
  68. data/lib/rubino/attachments/classify.rb +0 -1
  69. data/lib/rubino/attachments/policy.rb +8 -0
  70. data/lib/rubino/attachments/preamble.rb +16 -8
  71. data/lib/rubino/cli/chat/completion_builder.rb +2 -10
  72. data/lib/rubino/cli/chat/session_resolver.rb +100 -30
  73. data/lib/rubino/cli/chat_command.rb +781 -236
  74. data/lib/rubino/cli/commands.rb +93 -1
  75. data/lib/rubino/cli/config_command.rb +54 -7
  76. data/lib/rubino/cli/doctor_command.rb +73 -20
  77. data/lib/rubino/cli/jobs_command.rb +38 -11
  78. data/lib/rubino/cli/memory_command.rb +29 -9
  79. data/lib/rubino/cli/onboarding_wizard.rb +6 -1
  80. data/lib/rubino/cli/server_command.rb +43 -1
  81. data/lib/rubino/cli/session_command.rb +129 -29
  82. data/lib/rubino/cli/setup_command.rb +166 -4
  83. data/lib/rubino/cli/skills_command.rb +21 -0
  84. data/lib/rubino/commands/built_ins.rb +1 -2
  85. data/lib/rubino/commands/executor.rb +17 -18
  86. data/lib/rubino/commands/handlers/agents.rb +108 -158
  87. data/lib/rubino/commands/handlers/config.rb +4 -0
  88. data/lib/rubino/commands/handlers/display.rb +50 -0
  89. data/lib/rubino/commands/handlers/help.rb +2 -9
  90. data/lib/rubino/commands/handlers/mcp.rb +7 -32
  91. data/lib/rubino/commands/handlers/memory.rb +10 -35
  92. data/lib/rubino/commands/handlers/sessions.rb +64 -50
  93. data/lib/rubino/commands/handlers/skills.rb +47 -28
  94. data/lib/rubino/commands/handlers/status.rb +56 -6
  95. data/lib/rubino/compression/compression_result.rb +35 -0
  96. data/lib/rubino/compression/compressor.rb +109 -0
  97. data/lib/rubino/compression/content_router.rb +240 -0
  98. data/lib/rubino/compression/diff_compressor.rb +252 -0
  99. data/lib/rubino/compression/javascript_code_skeleton.rb +15 -0
  100. data/lib/rubino/compression/json_compressor.rb +274 -0
  101. data/lib/rubino/compression/line_skeleton.rb +92 -0
  102. data/lib/rubino/compression/log_compressor.rb +299 -0
  103. data/lib/rubino/compression/python_code_skeleton.rb +122 -0
  104. data/lib/rubino/compression/ruby_code_skeleton.rb +80 -0
  105. data/lib/rubino/compression/tree_sitter_code_skeleton.rb +118 -0
  106. data/lib/rubino/compression/tsx_code_skeleton.rb +15 -0
  107. data/lib/rubino/compression/typescript_code_skeleton.rb +15 -0
  108. data/lib/rubino/config/configuration.rb +88 -92
  109. data/lib/rubino/config/defaults.rb +270 -27
  110. data/lib/rubino/config/loader.rb +9 -1
  111. data/lib/rubino/config/reasoning_prefs.rb +23 -0
  112. data/lib/rubino/config/validator.rb +50 -7
  113. data/lib/rubino/context/compressor.rb +1 -1
  114. data/lib/rubino/context/file_discovery.rb +0 -8
  115. data/lib/rubino/context/message_boundary.rb +2 -7
  116. data/lib/rubino/context/project_languages.rb +0 -7
  117. data/lib/rubino/context/prompt_assembler.rb +7 -2
  118. data/lib/rubino/context/summary_builder.rb +34 -25
  119. data/lib/rubino/context/token_budget.rb +2 -7
  120. data/lib/rubino/database/migrations/001_create_initial_schema.rb +1 -1
  121. data/lib/rubino/database/migrator.rb +0 -26
  122. data/lib/rubino/errors.rb +2 -2
  123. data/lib/rubino/files/workspace.rb +2 -2
  124. data/lib/rubino/interaction/events.rb +0 -3
  125. data/lib/rubino/interaction/input_queue.rb +11 -0
  126. data/lib/rubino/interaction/lifecycle.rb +144 -25
  127. data/lib/rubino/interaction/polishing.rb +8 -0
  128. data/lib/rubino/interaction/probe.rb +1 -1
  129. data/lib/rubino/jobs/cron_job_repository.rb +0 -4
  130. data/lib/rubino/jobs/handlers/distill_skill_job.rb +3 -13
  131. data/lib/rubino/jobs/queue.rb +70 -5
  132. data/lib/rubino/jobs/worker.rb +1 -1
  133. data/lib/rubino/llm/adapter_factory.rb +1 -1
  134. data/lib/rubino/llm/anthropic_role_merge.rb +75 -0
  135. data/lib/rubino/llm/auxiliary_client.rb +63 -3
  136. data/lib/rubino/llm/cache_breakpoint_middleware.rb +194 -0
  137. data/lib/rubino/llm/credential_check.rb +61 -4
  138. data/lib/rubino/llm/error_classifier.rb +175 -121
  139. data/lib/rubino/llm/fake_provider.rb +3 -7
  140. data/lib/rubino/llm/inline_think_filter.rb +34 -3
  141. data/lib/rubino/llm/reasoning_manager.rb +3 -26
  142. data/lib/rubino/llm/request.rb +0 -16
  143. data/lib/rubino/llm/ruby_llm_adapter.rb +257 -44
  144. data/lib/rubino/llm/scenario_loader.rb +10 -17
  145. data/lib/rubino/llm/scenarios/glued-table-prose.yml +36 -0
  146. data/lib/rubino/llm/scenarios/growing-table.yml +49 -0
  147. data/lib/rubino/llm/scenarios/narrow-terminal-table.yml +47 -0
  148. data/lib/rubino/llm/scenarios/streamed-table.yml +55 -0
  149. data/lib/rubino/llm/scenarios/table-then-prose.yml +34 -0
  150. data/lib/rubino/llm/scenarios/too-wide-table.yml +47 -0
  151. data/lib/rubino/llm/scenarios/wide-table.yml +1 -1
  152. data/lib/rubino/llm/stream_tool_call_recovery.rb +91 -0
  153. data/lib/rubino/llm/thinking_support.rb +17 -12
  154. data/lib/rubino/llm/tool_bridge.rb +101 -37
  155. data/lib/rubino/llm/tool_call_recovery.rb +177 -0
  156. data/lib/rubino/mcp/manager.rb +53 -9
  157. data/lib/rubino/mcp/mcp_tool_wrapper.rb +24 -0
  158. data/lib/rubino/memory/backends/sqlite.rb +43 -35
  159. data/lib/rubino/memory/backends.rb +3 -3
  160. data/lib/rubino/memory/deduplicator.rb +22 -0
  161. data/lib/rubino/memory/flusher.rb +35 -1
  162. data/lib/rubino/memory/salience_gate.rb +26 -0
  163. data/lib/rubino/memory/sqlite_extraction_prompt.rb +5 -3
  164. data/lib/rubino/memory/store.rb +29 -48
  165. data/lib/rubino/memory/threat_scanner.rb +8 -0
  166. data/lib/rubino/memory.rb +47 -0
  167. data/lib/rubino/oauth/provider.rb +0 -5
  168. data/lib/rubino/run/event_store.rb +1 -6
  169. data/lib/rubino/run/repository.rb +0 -14
  170. data/lib/rubino/security/approval_policy.rb +116 -30
  171. data/lib/rubino/security/command_normalizer.rb +36 -0
  172. data/lib/rubino/security/dangerous_patterns.rb +17 -4
  173. data/lib/rubino/security/hardline_guard.rb +4 -3
  174. data/lib/rubino/security/pattern_matcher.rb +0 -2
  175. data/lib/rubino/security/readonly_commands.rb +299 -15
  176. data/lib/rubino/security/redactor.rb +272 -0
  177. data/lib/rubino/security/sandbox.rb +460 -0
  178. data/lib/rubino/security/secret_detector.rb +110 -0
  179. data/lib/rubino/security/secret_path.rb +151 -10
  180. data/lib/rubino/session/lock.rb +91 -0
  181. data/lib/rubino/session/message.rb +38 -3
  182. data/lib/rubino/session/picker.rb +95 -0
  183. data/lib/rubino/session/repository.rb +57 -40
  184. data/lib/rubino/session/store.rb +0 -11
  185. data/lib/rubino/skills/registry.rb +30 -7
  186. data/lib/rubino/skills/skill.rb +31 -10
  187. data/lib/rubino/skills/skill_tool.rb +3 -18
  188. data/lib/rubino/skills/state_repository.rb +0 -4
  189. data/lib/rubino/tools/background_tasks.rb +157 -209
  190. data/lib/rubino/tools/base.rb +87 -89
  191. data/lib/rubino/tools/edit_tool.rb +50 -20
  192. data/lib/rubino/tools/fuzzy_match.rb +212 -0
  193. data/lib/rubino/tools/glob_tool.rb +5 -1
  194. data/lib/rubino/tools/grep_tool.rb +30 -52
  195. data/lib/rubino/tools/multi_edit_tool.rb +32 -19
  196. data/lib/rubino/tools/patch_tool.rb +51 -10
  197. data/lib/rubino/tools/probe_tool.rb +0 -20
  198. data/lib/rubino/tools/question_tool.rb +53 -2
  199. data/lib/rubino/tools/read_attachment_tool.rb +21 -11
  200. data/lib/rubino/tools/read_tool.rb +131 -25
  201. data/lib/rubino/tools/read_tracker.rb +36 -0
  202. data/lib/rubino/tools/registry.rb +63 -44
  203. data/lib/rubino/tools/result.rb +43 -12
  204. data/lib/rubino/tools/retrieve_output_tool.rb +70 -0
  205. data/lib/rubino/tools/ruby_tool.rb +0 -0
  206. data/lib/rubino/tools/shell_kill_tool.rb +6 -2
  207. data/lib/rubino/tools/shell_output_tool.rb +7 -1
  208. data/lib/rubino/tools/shell_registry.rb +169 -15
  209. data/lib/rubino/tools/shell_tail_tool.rb +6 -1
  210. data/lib/rubino/tools/shell_tool.rb +483 -53
  211. data/lib/rubino/tools/steer_tool.rb +3 -23
  212. data/lib/rubino/tools/subagent_probe.rb +1 -1
  213. data/lib/rubino/tools/summarize_file_tool.rb +6 -0
  214. data/lib/rubino/tools/task_result_tool.rb +8 -2
  215. data/lib/rubino/tools/task_stop_tool.rb +10 -13
  216. data/lib/rubino/tools/task_tool.rb +196 -116
  217. data/lib/rubino/tools/vision_tool.rb +32 -4
  218. data/lib/rubino/tools/webfetch_tool.rb +145 -0
  219. data/lib/rubino/tools/write_tool.rb +23 -3
  220. data/lib/rubino/ui/agent_menu.rb +177 -0
  221. data/lib/rubino/ui/api.rb +2 -2
  222. data/lib/rubino/ui/base.rb +2 -2
  223. data/lib/rubino/ui/bottom_composer.rb +976 -151
  224. data/lib/rubino/ui/cli.rb +943 -320
  225. data/lib/rubino/ui/completion_menu.rb +24 -43
  226. data/lib/rubino/ui/composer/input_line.rb +131 -0
  227. data/lib/rubino/ui/composer/subagent_panel.rb +35 -0
  228. data/lib/rubino/ui/headless_trace.rb +1 -1
  229. data/lib/rubino/ui/input_history.rb +90 -10
  230. data/lib/rubino/ui/live_region.rb +30 -1
  231. data/lib/rubino/ui/markdown_renderer.rb +150 -44
  232. data/lib/rubino/ui/markdown_repair.rb +114 -0
  233. data/lib/rubino/ui/menu_view.rb +117 -0
  234. data/lib/rubino/ui/notifier.rb +4 -10
  235. data/lib/rubino/ui/null.rb +1 -1
  236. data/lib/rubino/ui/paste_store.rb +33 -1
  237. data/lib/rubino/ui/printer_base.rb +135 -8
  238. data/lib/rubino/ui/streaming_markdown.rb +101 -0
  239. data/lib/rubino/ui/subagent_cards.rb +113 -39
  240. data/lib/rubino/util/atomic_file.rb +12 -0
  241. data/lib/rubino/util/duration.rb +8 -5
  242. data/lib/rubino/util/ignore_rules.rb +18 -2
  243. data/lib/rubino/util/output.rb +55 -10
  244. data/lib/rubino/util/secrets_mask.rb +0 -9
  245. data/lib/rubino/version.rb +7 -1
  246. data/lib/rubino/workspace.rb +65 -2
  247. data/lib/rubino.rb +62 -29
  248. data/rubino-agent.gemspec +28 -1
  249. metadata +95 -19
  250. data/docs/plugins.md +0 -195
  251. data/lib/rubino/interaction/state.rb +0 -56
  252. data/lib/rubino/memory/backends/default.rb +0 -101
  253. data/lib/rubino/memory/extractor.rb +0 -85
  254. data/lib/rubino/memory/retriever.rb +0 -50
  255. data/lib/rubino/plugins/registry.rb +0 -75
  256. data/lib/rubino/plugins.rb +0 -86
  257. data/lib/rubino/tools/answer_child_tool.rb +0 -83
  258. data/lib/rubino/tools/ask_parent_tool.rb +0 -232
  259. data/lib/rubino/tools/git_tool.rb +0 -71
  260. data/lib/rubino/tools/github_tool.rb +0 -233
  261. data/lib/rubino/tools/test_tool.rb +0 -454
  262. data/lib/rubino/ui/subagent_view.rb +0 -280
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8a3a48a6f7deb104446c624354a271ff1a1671c9b3ea846cee1144b3f55538db
- data.tar.gz: cbc407cd78db827d75f150f61045b2cce759b266816e32fdba874c30eb3647c4
+ metadata.gz: 45e009503b875320e560be8ce46459d7c2a70b0c5744043b70f2121c5e747ab2
+ data.tar.gz: c26d1f586ae8578ed3d7298d1f7f8765d16ed5328b6d063d77e0df41d32a710a
  SHA512:
- metadata.gz: 3342a4c8b1856691788ac9625b81eb4058870a0eb5a75946e99e6570ce40a05a7b44475a798863ac0fc92c9fcb505cf6caf27fcedc10a8e1bd9653ba18edf440
- data.tar.gz: e54077b30f385942dd3591fb7e6c4e7afc0f301e5fea9d3111732c91ce56e63a4c7d192704ebd74ea97ee4a04437784d73a8d398bf02ce6143cdd69b2227a841
+ metadata.gz: dc1cfd93fb51e6236daf8e839a8df248fdcd299aef3028c258fefc93c585bf15e322b5d41ebf4d1b504795a9d24d6e8668f5dece2089b9454f6432dabcace482
+ data.tar.gz: 5495d4862d7b082826205ca4ecb30701df4a103ce801233df6a3dc91c59ce423ea4cb6cab4e41b9b0327f72e911ad310bf9433aaf6d218d8803bb0ffd56bec65
data/.rubocop.yml CHANGED
@@ -27,6 +27,12 @@ AllCops:
  # Test fixtures are sample input documents (e.g. a .rb code sample for the
  # plain-text converter), not project source -- they must not be linted.
  - "spec/fixtures/**/*"
+ # Eval-harness fixtures are deliberately tiny/imperfect sample projects the
+ # agent edits at eval time (INPUT, not source); results/ is generated output.
+ # The eval/.rubocop.yml excludes these for an in-eval run; mirror it here so
+ # the whole-repo lint from the root is clean too.
+ - "eval/fixtures/**/*"
+ - "eval/results/**/*"
 
  # --- House style: strings ----------------------------------------------------
 
data/.rubocop_todo.yml CHANGED
@@ -538,6 +538,7 @@ RSpec/DescribeClass:
  - 'spec/rubino/skills/skills_spec.rb'
  - 'spec/rubino/tools/edit_read_gate_spec.rb'
  - 'spec/rubino/tools/shell_background_spec.rb'
+ - 'spec/rubino/tools/shell_background_completion_spec.rb'
  - 'spec/rubino/tools/shell_input_spec.rb'
  - 'spec/rubino/tools/tool_fixes_spec.rb'
  - 'spec/rubino/ui/bottom_composer_approval_handoff_pty_spec.rb'
data/CHANGELOG.md CHANGED
@@ -1,5 +1,393 @@
  # Changelog
 
+ ## [0.5.2.1] - 2026-06-26
+
+ ### Fixed
+
+ - **Symlinked workspace roots broke three path checks.** Several modules compared
+ a symlink-resolved path against a NON-resolved root, so a workspace reached
+ through a symlink (macOS `/etc` → `/private/etc`, `/var` → `/private/var`, or
+ any symlinked checkout) defeated the match:
+ - **SecretPath** — `secret?("/etc/sudoers")` returned `false` and the
+ `~/.ssh`/`~/.aws`/… credential read-gate classified nothing, silently
+ no-op'ing the write-approval gate and read-block for those paths.
+ - **IgnoreRules** — `git rev-parse --show-toplevel` returns the realpath, so
+ the allowed-set rebase dropped *every* file and the whole tree read as
+ git-ignored; `grep`/`glob` then returned nothing under a symlinked checkout.
+ - **Skills::Registry** — an untrusted repo's project-local `.rubino/skills` was
+ not recognised as project-local, so the trust gate failed to drop it (hostile
+ project skills could load in an untrusted directory).
+
+ All three now resolve both sides of the comparison through `realpath` /
+ `canonical_path`. Defense-in-depth — not security boundaries.
+ - **`grep` Ruby fallback now matches dotfiles.** Without ripgrep on PATH, the
+ fallback globbed `**/<include>` without `FNM_DOTMATCH`, so an include like
+ `*.env` never matched `.env`/`.envrc` — exactly the secret-bearing files. The
+ include glob now matches dotfiles, mirroring `rg --glob`.
+
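Reviewer note on the 0.5.2.1 fixes: both entries describe comparison bugs that are easy to reproduce with Ruby's core library alone. The sketch below is an illustration under stated assumptions, not the code shipped in `secret_path.rb`, `ignore_rules.rb`, or `grep_tool.rb`. The helper names `inside_root?`, `inside_root_unresolved?`, `demo_symlinked_root`, and `include_matches?` are hypothetical; only `File.realpath`, `File.fnmatch`, and `File::FNM_DOTMATCH` are real Ruby APIs.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not a rubino API): true when `path` lives under `root`
# once BOTH sides have been resolved through symlinks.
def inside_root?(path, root)
  real_path = File.realpath(path)
  real_root = File.realpath(root)
  real_path == real_root || real_path.start_with?(real_root + File::SEPARATOR)
end

# The buggy shape the changelog describes: a resolved path compared against
# a root that was never resolved.
def inside_root_unresolved?(path, root)
  File.realpath(path).start_with?(root + File::SEPARATOR)
end

# Builds a temporary directory reached through a symlink and returns
# [buggy_result, fixed_result] for a file inside it.
def demo_symlinked_root
  Dir.mktmpdir do |dir|
    real = File.join(dir, "real")
    link = File.join(dir, "link")
    FileUtils.mkdir_p(real)
    File.symlink(real, link)
    file = File.join(link, "notes.txt")
    File.write(file, "x")
    [inside_root_unresolved?(file, link), inside_root?(file, link)]
  end
end

# Hypothetical helper: matches a basename against an include pattern, with or
# without FNM_DOTMATCH. Without the flag, a wildcard never matches a leading dot.
def include_matches?(pattern, basename, dotmatch:)
  flags = dotmatch ? File::FNM_DOTMATCH : 0
  File.fnmatch(pattern, basename, flags)
end
```

With these definitions, the unresolved comparison misses a file reached through the symlink while the resolved one finds it, and `include_matches?("*.env", ".env", dotmatch: false)` is false until the flag is turned on.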
28
+ ## [0.5.2] - 2026-06-26
29
+
30
+ ### Added
31
+
32
+ - **Live formatted-markdown streaming.** The in-flight model stream now renders
33
+ as formatted markdown while it arrives (Stage 1), painted as atomic frames via
34
+ DEC-2026 synchronized output so a fast stream never tears mid-update (Stage 2),
35
+ with committed code blocks syntax-highlighted through Rouge (Stage 3). (#592,
36
+ #593, #594)
37
+ - **Leaked tool-call recovery.** Models that emit a tool call as plain text or
38
+ garbled XML/JSON markup instead of a structured call (MiniMax-M3 and other
39
+ tool-loop models) now have those calls re-parsed into real `tool_calls` at the
40
+ transport layer so they actually execute, including a garbled `<invoke">`
41
+ variant.
42
+ - **`write` content preview.** The `write` tool box now shows a preview of the
43
+ content being written.
44
+
45
+ ### Changed
46
+
47
+ - Raise the default `max_tool_iterations` from 25 to 90 (Hermes-aligned), so long
48
+ tool-driven turns no longer hit the ceiling mid-task.
49
+ - Teach the agent (via the build prompt) to read the compressed tool-output
50
+ markers introduced in 0.5.1.
51
+
52
+ ### Fixed
53
+
54
+ - Render any unterminated code fence as a code box, matching CommonMark's
55
+ end-of-file fence auto-close, instead of leaking the raw backticks. (#595)
56
+ - Merge consecutive same-role messages on the Anthropic-family wire so the
57
+ request shape stays valid. (#597)
58
+ - Give MiniMax its full output ceiling so a long thinking block no longer starves
59
+ the visible output (a root cause of heavy-turn "invalid params" death).
60
+ - Keep a 5xx-wrapped "invalid params" response on the retryable path.
61
+ - Multi-line `ask()` prompts no longer erase terminal scrollback.
62
+ - Exclude synthetic `[harness control]` injections from the rewind picker.
63
+ - Fix an installed-gem launch crash (`uninitialized constant Rubino::TAGLINE`).
64
+
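Reviewer note on the same-role merge fix (#597): the entry describes collapsing adjacent messages that share a role before the request goes out. A minimal sketch of that idea follows. It assumes a hypothetical message shape of `{ role:, content: }` with string content, and the function name `merge_same_role` is invented for illustration. The contents of `anthropic_role_merge.rb` are not shown in this part of the diff and may handle structured content blocks differently.

```ruby
# Hypothetical sketch: collapse runs of adjacent messages with the same role
# into one message, joining their string contents with a separator.
# Input hashes are not mutated; merged entries are shallow copies.
def merge_same_role(messages, separator: "\n\n")
  messages.each_with_object([]) do |msg, merged|
    last = merged.last
    if last && last[:role] == msg[:role]
      # Reassign rather than append in place so the caller's strings stay intact.
      last[:content] = [last[:content], msg[:content]].join(separator)
    else
      merged << msg.dup
    end
  end
end
```

Joining with a blank line is an arbitrary choice for the sketch; any separator that keeps the merged turns readable would serve.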
+ ### Removed
+
+ - Drop the dead `server.*` config section, the orphaned `ask_parent` takeover and
+ ask/reply substrate, and dead code surfaced by the post-removal audit.
+
+ ### Docs
+
+ - Mark native OAuth as not wired end-to-end (WIP).
+
+ ## [0.5.1] - 2026-06-25
+
+ ### Added
+
+ - **Tool-output compression (deterministic, off by default).** A no-LLM content
+ router at the single `Agent::ToolExecutor` seam compresses high-volume tool
+ output before it reaches the model: test/build/lint logs are reduced to their
+ failures + summary (≈97% fewer tokens on a failing suite, every failure kept),
+ and a whole-file source read can be returned as a skeleton (signatures kept,
+ large bodies elided behind a `read offset:/limit:` pointer). Diffs, grep/search
+ results, JSON, and short output pass through **byte-identical**. Reversibility
+ reuses the existing spill: the full original is written to
+ `tool-results/<call_id>.txt` and the compressed output points the model there —
+ no separate store/tool. When enabled, `read` and `shell` expose a `compress`
+ parameter (default true) so the model can opt a single call out and get the
+ verbatim output. Master switch `tool_output_compression.enabled` (default
+ `false`); `rubino setup` offers to turn it on. See
+ [configuration.md](docs/configuration.md#tool_output_compression).
+ - **Multi-language code compression.** The whole-file source-skeleton compressor
+ now covers more than Ruby. `tool_output_compression.code.languages` (default
+ `["ruby"]`) selects which languages get skeletonised: Ruby (built-in Prism
+ parser), Python (stdlib `ast` via your `python3` — a no-op if `python3` isn't
+ on PATH), and JavaScript / TypeScript / TSX (via the optional
+ `tree_sitter_language_pack` gem — a no-op until it's installed). A read in an
+ unlisted language passes through verbatim. `rubino setup` adds a language
+ picker and, if you choose JS/TS, offers to install the parser gem.
+ - **Agent-attach view.** At the idle prompt, `↓` opens the subagent picker and
+ `Enter` now **attaches** to the highlighted background subagent: the screen
+ switches to that agent's OWN full timeline (its tool calls and what it said,
+ replayed from its session) and the input prompt becomes scoped — `sa_xxxx ❯`.
+ While attached, typed text steers the running child (or answers it when it's
+ blocked on you); `←` on the empty prompt (or the picker's `◂ main` row) returns
+ to the main timeline, and the picker doubles as a switcher between agents. This
+ replaces the bounded registry snapshot the picker's Enter used to show with the
+ agent's real conversation, and makes the global `/agents <id> steer/probe` and
+ `/reply <id>` forms redundant while attached. The attached view **live-tails**
+ the child's stream (tool rows and streaming prose) exactly like the main agent
+ instead of freezing on a snapshot, and `/back` / `/detach` return to the main
+ agent regardless of composer-draft state (#82, #85, #87).
+ - **`api.allow_public_bind` gate.** Because the API server can execute shell
+ tools, binding it to a non-loopback address (`--host 0.0.0.0`,
+ `RUBINO_API_HOST`) now **refuses to boot** unless `api.allow_public_bind: true`
+ is set in `config.yml`; when opted in, the server prints a one-time exposure
+ warning. Loopback binds are unaffected (#577).
+ - **MCP tool transparency + parallel startup.** An MCP tool's display label now
+ carries its source — the live tool card and the approval card both show
+ `<bare> (mcp:<server>)`, so you can tell at a glance that an out-of-process
+ server is running (the model-facing tool name is unchanged) (#582). MCP
+ servers also now connect **in parallel** at boot, so one hanging server no
+ longer serializes startup (#576).
+ - **Read-only meta-commands run immediately while a turn is active.** A small
+ set of non-mutating slash commands (`/agents`, `/tasks`, `/stop`, `/status`,
+ `/jobs`, `/help`, `/commands`, `/dirs`) now execute **immediately** mid-turn
+ instead of queuing — so you can drill into a sub-agent, stop the run, or check
+ status without interrupting. State-mutating commands (`/model`, `/clear`,
+ `/new`, `/config`, `/mode`, …) show a transient `⚠ <cmd> is not available
+ during an active turn — press Esc to interrupt first` notice; plain text still
+ queues, and `Esc` interrupts.
+ - **Interactive CLI session picker.** A bare `rubino sessions` on a TTY opens an
+ interactive picker (id, title, message count, dir, age; arrow-key highlight,
+ type-to-filter, `Esc` cancels) and `Enter` resumes the chosen session. On a
+ pipe / non-TTY it prints a script-safe list; `sessions list` stays list-only.
+ The picker is cwd-scoped by default; `--all` unscopes it.
+ - **`/sessions rename <id|title> <new title>`.** Rename a session from the REPL
+ (#45).
+ - **Aux-LLM session titles.** When `auxiliary.title` names a concrete backend,
+ new sessions get an LLM-generated, length-capped summary title; the
+ deterministic derivation stays the default and the fallback (#45).
+ - **Streaming GFM table rendering (#89).** A markdown table now renders as a
+ live, correctly-fitted table as it streams — a sliding window of recent rows
+ grows in place — instead of leaking raw `| col | col |` pipes that only snap
+ into a table once the message completes.
+
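Reviewer note on the `api.allow_public_bind` gate: the entry describes a boot-time check that lets loopback binds through and refuses anything else unless the operator has opted in. A minimal sketch of such a check follows, under loud assumptions: `PublicBindRefused`, `loopback_host?`, and `check_bind!` are hypothetical names, treating the literal `localhost` as loopback is a simplification, and the real logic in `server_command.rb` is not visible in this part of the diff. `IPAddr#loopback?` and `IPAddr::InvalidAddressError` are real standard-library APIs.

```ruby
require "ipaddr"

# Hypothetical error type for a refused non-loopback bind.
class PublicBindRefused < StandardError; end

# True for 127.0.0.0/8, ::1, and (as a simplifying assumption) the literal
# hostname "localhost". Anything that is not a parseable IP counts as public.
def loopback_host?(host)
  return true if host == "localhost"

  IPAddr.new(host).loopback?
rescue IPAddr::InvalidAddressError
  false
end

# Returns :loopback for a loopback bind, :public_with_warning when the caller
# opted in, and raises otherwise so the server never starts exposed by accident.
def check_bind!(host, allow_public_bind: false)
  return :loopback if loopback_host?(host)

  unless allow_public_bind
    raise PublicBindRefused,
          "refusing to bind #{host}: set api.allow_public_bind: true to allow it"
  end

  :public_with_warning
end
```

Failing closed on unparseable hosts is a design choice of the sketch: a hostname that cannot be proven loopback is treated as public.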
+ ### Changed
+
+ - **Provider auto-routing.** With `model.provider: "auto"` (the default), the
+ concrete provider is derived from the model id (`openai/*` → OpenAI); the
+ setup wizard / auto-detect write an explicit provider when a non-OpenAI
+ backend is chosen.
+ - **Credential check uses provider-specific env vars.** The credential check
+ and key resolution now read the env var for the configured provider
+ (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `BEDROCK_API_KEY`,
+ `MINIMAX_API_KEY`, and `<PROVIDER>_API_KEY` for anything else, e.g.
+ `DEEPSEEK_API_KEY`). A non-OpenAI provider no longer silently falls back to
+ `OPENAI_API_KEY` (only providers explicitly marked `openai_compatible` /
+ `anthropic_compatible` fall back to `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`).
+ - **`security.confirm_policy` default is `dangerous_only`.** Safe shell commands
+ run unprompted; only commands matching a dangerous pattern prompt. Set
+ `confirm_policy: confirm_all` to restore prompt-on-everything. The
+ non-bypassable hardline floor and `permissions: deny` always run first
+ regardless of policy.
+ - **Removed the built-in `run_tests` and `github` tools.** Running tests and
+ GitHub/git operations now go through the generic `shell` tool (with its
+ hardened git arg parsing), matching the field norm and shrinking the tool
+ surface.
+ - **Blocked-tool results are now typed errors.** When a tool call is blocked
+ (denied by approval, sandbox, or policy), its result is returned to the model
+ as a typed error with explicit anti-confabulation wording, so the model is told
+ the action did NOT happen instead of being free to assume success (#583).
+ - **Single status bar during a turn.** The animated facet activity row is folded
+ into the model/ctx footer (one bar, not two); the "esc to interrupt" hint shows
+ exactly once, and a mid-stream **waiting indicator** resurfaces beneath the
+ in-flight tail after a short window of model/transport silence and drops away
+ the instant tokens resume (#21, #56b — `/status` now also shows the workspace
+ cwd line).
+ - **FIFO approval queue for concurrent subagents.** When multiple subagents need
+ approval at once, one modal shows at a time with an "(N more queued)"
+ indicator that dequeues on resolve, and async-completion notices no longer
+ print over an active modal. Subagent approvals also **escalate to the parent's
+ approval card at any nesting depth**, so a nested child no longer fail-closes
+ with a noninteractive block (#86).
+ - **Slash commands dispatch while attached to a subagent** (`/stop <id>`,
+ `/agents`, `/status`, …) instead of being steered into the child as text;
+ `/skills list` / `/skills ls` show the skills list rather than trying to
+ activate a skill named `list`; `/think off` hides the reasoning aside for
+ always-thinking models unless an explicit `/reasoning` is set; `/config <key>`
+ resolves the short labels `/status` advertises (`reasoning`, `effort`,
+ `think`) (#62, #66, #87).
+ - **`/new` returns instantly.** The end-of-session memory flush is enqueued as a
+ background job instead of running a synchronous aux-LLM extract, so starting a
+ new session no longer freezes the prompt for 2–3s.
+ - **Headless one-shot drains only its own jobs.** `rubino -q` now emits and
+ flushes the JSON result envelope before draining, and scopes the post-turn job
+ drain to the run's own session, so a one-shot returns immediately even with a
+ background job backlog.
+ - **Subagent cards are distinguishable + carry the task id.** Concurrent
+ subagent cards label by a dimension drawn from the task prompt (rather than the
+ bare agent type), background "done" markers carry the task id, and the live
+ elapsed counter shows seconds (`1m05s`) so it visibly advances (#44, #570).
+ - **Pastes coalesce into a single placeholder**, input history is recalled and
+ persisted, `Enter` accepts the highlighted dropdown candidate, and
+ `task_result` running-polls no longer flood the transcript (#524, #525).
+ - **System-prompt grounding for control + tools.** The cap / continuation /
+ summary control is framed as trusted `[harness control]` so MiniMax-M3 stops
+ treating it as prompt-injection (#75); the background-shell lifecycle is primed
+ so the model uses `shell_output` / `shell_kill` correctly; the verification
+ step is scoped to never modify the environment and to stop honestly.
+ - **Memory-flush best-effort boundary** made airtight (#471), so a failure
+ flushing memory at shutdown can't take down the run.
+
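Reviewer note on the credential-check change: the entry says each provider reads its own `<PROVIDER>_API_KEY`, and only providers explicitly marked as compatible fall back to the OpenAI or Anthropic variable. The sketch below illustrates that rule and nothing more. `api_key_env_var`, `resolve_api_key`, `COMPAT_FALLBACKS`, and the `compat:` keyword are hypothetical names, and replacing non-alphanumeric characters with underscores is an assumption about how a provider id maps to an env var name. The real logic in `credential_check.rb` is not shown in this part of the diff.

```ruby
# Hypothetical mapping from a compatibility marker to its fallback variable.
COMPAT_FALLBACKS = {
  openai_compatible: "OPENAI_API_KEY",
  anthropic_compatible: "ANTHROPIC_API_KEY"
}.freeze

# Derives the provider-specific variable name, e.g. "deepseek" gives
# "DEEPSEEK_API_KEY". The character normalisation is an assumption.
def api_key_env_var(provider)
  "#{provider.to_s.upcase.gsub(/[^A-Z0-9]/, "_")}_API_KEY"
end

# Looks up the provider's own variable first, then the compat fallback only
# when the provider was explicitly marked compatible. Returns nil when no
# non-empty value is found. `env` is any Hash-like object such as ENV.
def resolve_api_key(provider, env:, compat: nil)
  names = [api_key_env_var(provider)]
  names << COMPAT_FALLBACKS.fetch(compat) if compat
  names.each do |name|
    value = env[name]
    return value if value && !value.empty?
  end
  nil
end
```

Passing `env:` explicitly keeps the sketch testable without touching the real process environment; a caller would typically pass `ENV`.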
+ ### Removed
+
+ - **Child→parent `ask_parent` / `answer_child` tools.** Subagents are
+ non-blocking background workers and can no longer pause mid-task to ask their
+ parent (or the human) a question; instead they make sensible default calls and
+ surface open decisions in their result. The two model-facing tools that
+ implemented that channel — `ask_parent` (the child→parent escalation) and
+ `answer_child` (the parent's reply) — are gone. The parent→child `steer` /
+ `probe` tools and the human approval gate (`/reply` for a child parked on an
+ approval) are unchanged. `tasks.ask_parent_timeout` is now vestigial.
+ - **`streaming.cursor` config key.** It was dead config (assigned, never read)
+ and is no longer accepted — remove it from any `config.yml`.
+ - **`security.require_confirmation_for_shell` config key.** Replaced by
+ `security.confirm_policy` (`dangerous_only` | `confirm_all`); the old key is no
+ longer honored.
+
+ ### Security
+
+ - **Hermes-style secret handling (#506).** Adopts the Hermes secret model across
+ the agent: the structured `read` tool blocks `.env` and credential files
+ outright, and secret **values** are redacted in the output of `read`, `grep`,
+ `shell` (including the live stream seam, not just the final buffer, #507),
+ `summarize`, and `read_attachment` (#511/#512). A `security.redact_secrets`
+ toggle (default **on**) controls redaction. The earlier per-read secret-file
+ approval gate was removed in favour of this block-list + redaction model
+ (#480). Over-broad redaction was then narrowed: the `ENV_ASSIGN` pattern is
+ anchored so `AUTHORS` / `SECRETARY` pass through while `API_KEY` / `AUTH_TOKEN`
+ still redact, the Telegram-token pattern is pinned to its canonical shape, and
+ fully-masked secrets carry an explicit marker rather than a bare `***`
+ (#67, #516).
+ - **Secrets are no longer persisted to memory (#99).** A `Security::SecretDetector`
+ is wired into the memory write path (it refuses an explicit save and the
+ auto-extract persist path) and into the redactor, catching prefixed key
+ shapes, prefix-less AWS secret keys, and a high-entropy heuristic — previously
+ an `sk-proj-…` key could be saved verbatim and re-injected into every future
+ system prompt.
+ - **Removed the dedicated `git` tool (RCE bypass).** Git now runs through the
+ hardened `shell` with strict arg parsing that rejects exec vectors
+ (`--ext-diff`, `-c`, textconv, …) plus a `GIT_HARDENED_ENV`, instead of a tool
+ that could be steered into arbitrary command execution (#536/#553).
+ - **Dangerous write/exec flag-forms prompt under the default gate (#61).**
+ `git -c` / `--output`, `sed -i`, `sort -o`, `find -delete` / `-exec`,
+ `tar --to-command`, `tee`, interpreter `-c` / `-e` / `--eval`, etc. no longer
+ auto-run under `dangerous_only`, while bare interpreters and read-only forms
+ still auto-run. A shared `Security::CommandNormalizer` also closes
+ line-continuation evasion (e.g. `rm -r\<newline>f` no longer slips past the
+ danger/approval layer).
+ - **Extended HOME credential read-block.** Reading credential stores under HOME
262
+ is blocked and a base64-decode-pipe-to-shell (`echo … | base64 -d | sh`) is
263
+ flagged dangerous (#519); the denylist now covers `.ssh`, `.aws`, `.netrc`,
264
+ `.git-credentials`, `.kube`, `.docker`, `.gnupg`, `.azure`, and `.config/gh`
265
+ (#537). A write through a **dangling in-workspace symlink** can no longer
266
+ escape the sandbox — the link target is resolved before the create-new-file
267
+ fallback (#62).
268
+ - **Tightened the `ruby_llm` floor to `>= 1.16` (#508).** The adapter wires native
269
+ providers through ruby_llm's generic `<provider>_api_base=` setters
270
+ (deepseek/mistral/etc., #482), which only exist from ruby_llm 1.16.0. The
271
+ gemspec previously allowed `~> 1.0`, so a fresh `gem install` could resolve
272
+ ruby_llm 1.15 and crash at runtime with `NoMethodError`. The dependency is now
273
+ `>= 1.16, < 2.0`.
274
+ - **Secret masking on `config set`.** `rubino config set` now masks the echoed
275
+ value when the key looks secret (`api_key`, `token`, `password`, `secret`,
276
+ `authorization`, …) and when the value itself contains inline credentials
277
+ (`key=value`, `Bearer …`, URL userinfo, `curl -u`, `mysql -p…`), so keys are
278
+ not printed in the clear to the terminal/scrollback.
279
+ - **Sanitized untrusted text rendered to the terminal (CWE-150).** Text that
280
+ originates from the model, tools, or filenames (subagent cards, `/`-palette and
281
+ `@`-picker menu labels, and the remaining CLI aside sinks — probe, reasoning,
282
+ open-fence, branch title) is now defanged of ANSI/OSC escape sequences before
283
+ it is written, closing an escape-injection class (#563/#564/#565–#568).
284
+ - **Vision egress hardening.** The `vision` tool now honours
285
+ `attachments.policy.aux_vision_egress` (default `true`): set it to `false` and
286
+ the tool refuses to send an image to an external auxiliary model, returning a
287
+ clean error instead of egressing the bytes (#578). Before any egress it also
288
+ **content-sniffs** the file (magic bytes win over the extension, fail-closed),
289
+ so a mislabelled or non-image file can't be smuggled to the external host
290
+ (#579).
291
+ - **OS sandbox covers more executors.** The OS write-jail (Landlock / Seatbelt)
292
+ now also confines background shells, `ruby`, and `run_tests`, with relaxation
293
+ gated on verified enforcement; a write-jail `EACCES` outside the workspace
294
+ produces an attributable "blocked by write-jail" hint (#74).
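The line-continuation evasion mentioned in the `Security::CommandNormalizer` entry can be illustrated with a minimal sketch. This is not the gem's normalizer: `join_line_continuations` and the `RM_RF` pattern are hypothetical, and the sketch ignores shell quoting rules.

```ruby
# Hypothetical danger pattern, for illustration only: `rm` with both -r and -f
# in the same flag cluster.
RM_RF = /\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r/

# Removes backslash-newline pairs, which an unquoted shell command line treats
# as a line continuation, so pattern checks see the command the shell will run.
def join_line_continuations(command)
  command.gsub(/\\\n/, "")
end

def recursive_force_rm?(command)
  join_line_continuations(command).match?(RM_RF)
end

# "rm -r", a backslash, a newline, then "f /tmp/scratch"
evasive = "rm -r\\\nf /tmp/scratch"

puts evasive.match?(RM_RF)           # raw text: the flag cluster is split
puts recursive_force_rm?(evasive)    # normalized text: checked as one command
```

Normalizing before matching means every check downstream sees a single canonical form, which is the property the entry above describes as shared between the danger and approval layers.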
295
+
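The effect of the `ruby_llm` constraint change described above can be checked with RubyGems' own requirement classes. The version strings are the ones named in the entry, plus `2.0.0` to show the upper bound; this is a sketch of the resolution rule, not the gemspec itself.

```ruby
require "rubygems"

old_req = Gem::Requirement.new("~> 1.0")           # previous constraint
new_req = Gem::Requirement.new(">= 1.16", "< 2.0") # constraint after #508

%w[1.15.0 1.16.0 2.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-7s old=%-5s new=%s", v, old_req.satisfied_by?(version), new_req.satisfied_by?(version))
end
```

Because `~> 1.0` accepts any 1.x release, 1.15 satisfied the old constraint even though the setters the adapter relies on are only present from 1.16.0.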
296
+ ### Fixed
297
+
298
+ - **MiniMax-M3 pre-tool-call "freeze".** Thinking/reasoning now defaults ON for
299
+ every provider (it was deliberately off for MiniMax-family ids). On the
300
+ anthropic-compatible path rubino now sends `thinking: {type: enabled,
301
+ budget_tokens: …}` and streams the model's reasoning deltas — so the multi-
302
+ second window where M3 reasons toward a tool-call is filled with visible
303
+ streamed reasoning instead of dead air (the symptom that read as the agent
304
+ "freezing" when it spawned subagents). Matches the reference agent's default
305
+ `reasoning_effort: medium`. A backend that rejects the budget is caught and
306
+ retried once without it (#75), so default-on is safe; set
307
+ `providers.<name>.supports_thinking: false` to opt out.
308
+ - **MCP `degraded` server state.** `/mcp` and `rubino doctor` now distinguish a
309
+ reachable server (`●`) from a **degraded** one (`⚠` — the process is alive but
310
+ a protocol call such as `tools/list` failed), instead of reporting it as plain
311
+ reachable (#575).
312
+ - **Session-title length cap.** A renamed session title is now length-capped at
313
+ rename and truncated on render, so an over-long title can't disrupt status /
314
+ session-list layout (#581).
315
+ - **Streaming fidelity.** A streaming turn no longer re-executes or re-surfaces
316
+ tool calls it already ran (no double "started" line or duplicate final tool)
317
+ (#53), and a split think/fence sentinel is held across the message-boundary
318
+ flush so reasoning no longer leaks into the body and prose isn't torn apart
319
+ (#43/#54). A committed markdown table glued to trailing prose no longer leaks
320
+ raw pipes, and a too-wide table fits the pane instead of tearing the border.
321
+ - **Subagent / multiplexer UI.** A running `blocked_on_parent` sub stays visible
322
+ in the footer while listed; cap-rejected delegation renders a neutral
323
+ "at capacity" row instead of a phantom failed card; the close-row / replay use
324
+ the per-call subagent name instead of a shared stale one (#35); the agent
325
+ picker opens reliably on `↓` and `←`/`↑` backs out; picking `◂ main` returns to
326
+ main immediately mid-turn; a nested child's menu no longer crashes it; and the
327
+ parent autonomously resumes at idle when background subagents finish while
328
+ detached (#37, #44, #51, #561).
329
+ - **Interrupt handling.** `Esc` at the tool-dispatch boundary raises a clean
330
+ interrupt instead of a malformed continuation that the backend rejects as
331
+ "invalid params"; a stray `Ctrl-C` exits cleanly (130) with no raw `net/http`
332
+ backtrace; and a background thread never dumps a backtrace on death.
333
+ - **Input papercuts.** Backspace (`DEL 0x7f`) deletes instead of inserting a
334
+ space (#522); a single `Ctrl-D` at an idle empty composer no longer hangs, and
335
+ fast input bursts coalesce their redraws (#520). Several composer
336
+ render/input races and resize-while-typing reflows that duplicated the
337
+ in-progress input into the scrollback are fixed, including chained resizes and
338
+ the resize REPAINT path (#481/#485/#486/#499/#500/#501/#503).
339
+ - **`edit` no longer crashes on non-UTF-8 / binary buffers.** Fuzzy-match
340
+ normalization passes invalid-encoding bytes through verbatim (#47), atomic
341
+ writes are binmode'd so binary buffers never transcode (the intermittent
342
+ in-session edit crash on accented files) (#65), and `clean_slice` reinterprets
343
+ binary as UTF-8 rather than calling `.encode` (#58). A failed edit / read /
344
+ write now shows `✗` instead of a green `✓`.
345
+ - **Background jobs and shells.** The job queue drains reliably — stale `running`
346
+ rows are reclaimed after the lease expires (#76) and `ExtractMemoryJob` is
347
+ prioritized over `SummarizeSessionJob` so save→recall doesn't lag (#79);
348
+ finished background shells are retired with their buffer and exit status
349
+ retained, so `shell_output` / `shell_tail` / `shell_kill` stay reachable next
350
+ turn (#78); shell cancel no longer orphans the child process group, and a
351
+ finished background shell auto-wakes the model.
352
+ - **Turn-ledger honesty.** Blocked / errored tools no longer count toward the
353
+ "N tools ran / M edits" ledger, so a turn whose only tool was refused stops
354
+ telling you to review nonexistent changes; the force-summary and closing-summary
355
+ nudges are grounded in the truthful turn ledger so the model can't confabulate
356
+ having done nothing (#36/#84). MiniMax HTTP 429 / quota errors are categorized
357
+ as retryable rate-limit (honouring `Retry-After`) instead of "Invalid request",
358
+ and the anti-confabulation note no longer over-fires on accurate local caveats.
359
+ - **Sessions / resume / doctor.** A per-session `flock` guard stops a concurrent
360
+ `--continue` from forking a moving transcript (#543), replay renders only the
361
+ new tail of a restated final message (#542), `--resume <id>` is validated
362
+ before the boot banner (#521), and `doctor` warns instead of false-green when
363
+ no usable credential exists and no longer implies an unverified key is
364
+ validated (#541/#546).
365
+ - **Non-native provider wiring (#482).** Fixed the preflight that falsely
366
+ reported non-native providers (deepseek/mistral/…) as ready; they are now
367
+ wired through the generic `<provider>_api_base=` setters and the run stops
368
+ on an unreachable endpoint instead of failing later. Transient name-resolution
369
+ failures (`EAI_AGAIN`) are retried rather than fatal, and a stream that ends
370
+ without a finish signal is recovered instead of failing the turn.
371
+ - **Parent-death reaps child shells (#478).** When the agent process dies, the
372
+ long-running child shells it spawned are reaped instead of being orphaned,
373
+ using a trap-safe SIGTERM/SIGHUP handler (no `Mutex` inside the signal trap).
374
+ - **Compaction no-op loop (#484).** Stopped a busy-loop on an over-budget
375
+ session that has too few messages to compact. The `doom_loop.threshold`
376
+ default is also no longer rejected by its own validator (#60).
377
+ - **Memory polish indicator no longer flashes every turn (#59).** The polish
378
+ worker starts only when a row was actually enqueued, the indicator composes
379
+ alongside the ctx bar instead of replacing it, and a verbatim repeat
380
+ short-circuits to the existing row at the write seam.
381
+ - **`/exit` and exit codes.** `/exit` routes through the quit-guard, and an
382
+ interactive session exits non-zero on an auth/credential error (#154).
383
+ - **CLI DX papercuts.** Fixed the bare-`rubino "prompt"` one-shot path, help-
384
+ session clutter, a bare-prompt did-you-mean edge case, and a `read_attachment`
385
+ hint that suggested markitdown for raster images instead of OCR.
386
+ - **Input hardening.** Fixed a raw SQLite3 exception on session input with
387
+ hostile/NUL bytes (#498) and cleaned up `Errno` error messages on the failure
388
+ paths; tightened mcp args validation and assorted low-severity
389
+ config/sessions/resume/CLI papercuts.
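The per-session `flock` guard from the sessions entry can be sketched with Ruby's `File#flock`. The helper `with_session_lock`, the `:busy` return value, and the lock-file name are assumptions for illustration; the changelog does not say how the gem names or places its lock.

```ruby
require "tmpdir"

# Runs the block only if an exclusive, non-blocking lock on lock_path can be
# taken. Returns :busy when another open handle already holds the lock.
def with_session_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o600) do |file|
    return :busy unless file.flock(File::LOCK_EX | File::LOCK_NB)

    begin
      yield
    ensure
      file.flock(File::LOCK_UN)
    end
  end
end

Dir.mktmpdir do |dir|
  lock_path = File.join(dir, "session-42.lock") # hypothetical lock-file name
  result = with_session_lock(lock_path) do
    # A second attempt while the first holder is active is refused
    # instead of continuing the same transcript.
    with_session_lock(lock_path) { :continued }
  end
  puts result.inspect
end
```

Using `LOCK_NB` makes the second caller fail fast rather than wait, which suits a CLI that should report the conflict to the user.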
390
+
3
391
  ## [0.5.0] - 2026-06-15
4
392
 
5
393
  ### Added
data/README.md CHANGED
@@ -5,12 +5,62 @@ A coding & automation **agent** — small, self-contained, and built to run *whe
5
5
  ## Why rubino
6
6
 
7
7
  - **Runs where the work is** — a single gem on the machine (or VM) that holds the code, not a remote service you pipe files to.
8
- - **Persistent memory** — a tiny SQLite "Zep"-style fact store that learns about you and the project across sessions.
8
+ - **Persistent memory** — a tiny SQLite fact store that learns about you and the project across sessions.
9
9
  - **Context compaction** — automatic compression with session lineage when the conversation outgrows the window.
10
10
  - **CLI *and* HTTP API** — an interactive terminal session for humans, a bearer-protected JSON + SSE API for programs.
11
- - **Real tools, gated** — read/write/edit, shell, ruby, git/github, grep/glob, a structured test runner, vision, and more, behind an approval model with a non-bypassable hardline floor.
11
+ - **Real tools, gated** — read/write/edit, shell, ruby, grep/glob, apply_patch, vision, and more (git, GitHub, and tests run through the hardened shell), behind an approval model with a non-bypassable hardline floor.
12
12
  - **Built on ruby_llm** — provider-agnostic: MiniMax, OpenAI, Anthropic, Gemini, or an OpenAI-compatible gateway.
13
13
 
14
+ ## Cache-friendly compaction (measured)
15
+
16
+ A long agent session only stays cheap if the cached prompt prefix survives
17
+ compaction. rubino is built so that when the conversation is compressed into a
18
+ summary, the summary lands *after* the cached head (system + tools + stable
19
+ history) — so the provider's prompt cache keeps **hitting** the head instead of
20
+ re-encoding it cold every time the session is compacted.
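The placement described above can be sketched as follows. This is a simplified illustration, not rubino's compaction code: the `compact` helper, its `head_size` / `keep_tail` parameters, and the placeholder summary text are all hypothetical, and a real summary would come from a model call.

```ruby
# Messages are plain hashes. `head_size` marks the stable prefix (system prompt,
# tool definitions, early history) that the provider is assumed to have cached.
def compact(messages, head_size:, keep_tail:)
  head = messages.first(head_size)
  tail = messages.last(keep_tail)
  middle = messages[head_size...(messages.size - keep_tail)] || []
  return messages if middle.empty?

  # Stand-in for a model-written summary of the dropped middle section.
  summary = { role: "user", content: "[summary of #{middle.size} earlier messages]" }

  # The summary goes after the head, so the head stays identical to what was
  # sent on earlier turns.
  head + [summary] + tail
end

history = [{ role: "system", content: "You are a coding agent." }] +
          (1..8).map { |i| { role: i.odd? ? "user" : "assistant", content: "turn #{i}" } }

compacted = compact(history, head_size: 3, keep_tail: 2)
puts compacted.map { |m| m[:content] }
puts compacted.first(3) == history.first(3)
```

Placing the summary ahead of the head would change the first tokens of every later request, which is the cold re-encoding the paragraph above says the design avoids.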
21
+
22
+ Measured with the model held fixed (local oMLX `Qwen3.6-35B-A3B`,
23
+ Anthropic-style `cache_control`) on a 25-turn coding session that triggers
24
+ compaction **9 times**:
25
+
26
+ | metric | rubino |
27
+ |---|---|
28
+ | cached prefix retained right after each compaction | **44–94%** (survives — never resets to 0) |
29
+ | cumulative cache-read over the whole session | **88%** |
30
+ | prefix byte-stability across turns | **0.95** |
31
+ | task solved through all 9 compactions | **10/10** hidden tests, 0 wasted work |
32
+
33
+ Holding the model fixed isolates the **engine** — any difference is the
34
+ scaffolding (prompt assembly, where the compaction summary is placed, cache
35
+ breakpoints), not the model. This is a single model and a single scenario:
36
+ indicative of the design, not a leaderboard. The harness lives in a separate
37
+ benchmark project.
38
+
39
+ ## Tool-output compression (measured)
40
+
41
+ Test logs, diffs and large command dumps are mostly noise. rubino can route
42
+ each tool output through a **deterministic (no-ML)** compressor that keeps the
43
+ signal and drops the rest — opt-in (`tool_output_compression`), with a
44
+ byte-identical passthrough for anything already small and a `retrieve_output`
45
+ pointer back to the full text. Token-honest: counts are the **exact**
46
+ `prompt_tokens` reported by the server (local oMLX `Qwen3.6-35B-A3B`), not
47
+ chars/4 estimates.
48
+
49
+ | tool output | reduction | fidelity (verified) |
50
+ |---|---:|---|
51
+ | rspec full suite (21 failures, ~8k lines) | **97%** | all 21 failures + the tally kept |
52
+ | `git log --stat` / `ls -R` | **94%** | boundary/keyword lines kept |
53
+ | large source diff (9 files) | **42%** | all 575 ± lines, 13 hunks, 9 headers |
54
+ | `package-lock.json` diff (60 bumps) | **99%** | file header + summary (body elided) |
55
+ | whole-file Ruby read → skeleton | **27%** | signatures + structure kept |
56
+ | JSON (kubectl / docker / gh, uniform rows) | **40–88%** | error rows + outliers always kept |
57
+ | rubocop (already signal-dense) | 11% | floor — every offense kept |
58
+
59
+ End-to-end A/B on real edit tasks: **12/12 tasks passed with compression ON and
60
+ OFF** — it never broke a task, and every forced-failure run still recovered the
61
+ single failing line out of a long log. Routing is verified (each output goes to
62
+ the right strategy) and small inputs pass through **byte-identical**.
63
+
14
64
  ## Install
15
65
 
16
66
  One line, Linux and macOS (x86_64 / arm64). Installs a compatible Ruby, then the gem — all in user space, no sudo:
@@ -111,7 +161,7 @@ agent:
111
161
 
112
162
  memory:
113
163
  enabled: true
114
- backend: "sqlite" # tiny-Zep FTS5 + graph-lite recall (default)
164
+ backend: "sqlite" # SQLite FTS5 + graph-lite recall (default)
115
165
  auto_extract: true
116
166
 
117
167
  compression:
@@ -126,7 +176,7 @@ tools:
126
176
  git: true
127
177
  shell: true # ON by default; every command is still approval-gated
128
178
  ruby: true
129
- web: false # gates BOTH webfetch and websearch
179
+ web: true # ON by default (keyless DuckDuckGo backend); gates BOTH webfetch and websearch
130
180
  memory: true
131
181
  ```
132
182
 
@@ -142,7 +192,7 @@ Full reference (every key, env vars, precedence): **[docs/configuration.md](docs
142
192
  - **[Configuration](docs/configuration.md)** — full config + env vars + precedence
143
193
  - **[Tools](docs/tools.md)** — the built-in tool set and approval behavior
144
194
  - **[Skills](docs/skills.md)** — reusable instruction packs, the 3-level disclosure, and `SKILL_LOADED` observability
145
- - **[Memory](docs/memory.md)** — the SQLite tiny-Zep backend
195
+ - **[Memory](docs/memory.md)** — the SQLite memory backend
146
196
  - **[Security](docs/security.md)** — approval model, hardline floor, TLS
147
197
  - **[Troubleshooting](docs/troubleshooting.md)** — keyed on the exact error strings
148
198
  - **[HTTP API](docs/api/v1.md)** · **[Jobs & cron](docs/jobs.md)** · **[OAuth providers](docs/oauth-providers.md)** · **[Architecture](docs/architecture.md)**
@@ -150,7 +200,7 @@ Full reference (every key, env vars, precedence): **[docs/configuration.md](docs
150
200
 
151
201
  ## Built-in tools
152
202
 
153
- The agent ships **27 built-in tools** (the set `rubino tools` lists): `read`, `read_attachment`, `summarize_file`, `write`, `edit`, `multi_edit`, `apply_patch`, `grep`, `glob`, `git`, `github`, `shell`, `shell_output`, `shell_tail`, `shell_input`, `shell_kill`, `ruby`, `run_tests`, `web`, `question`, `todowrite`, `memory`, `session_search`, `attach_file`, `vision`, `skill`, `task`. A single `web` tool gates both fetching a URL and searching (config key `tools.web`, off by default). Each tool is gated by a `tools.<key>` config flag (opt-out) and the approval model. See **[docs/tools.md](docs/tools.md)**.
203
+ The agent ships **27 built-in tools** (the set `rubino tools` lists): `read`, `read_attachment`, `summarize_file`, `write`, `edit`, `multi_edit`, `apply_patch`, `grep`, `glob`, `git`, `github`, `shell`, `shell_output`, `shell_tail`, `shell_input`, `shell_kill`, `ruby`, `run_tests`, `web`, `question`, `todowrite`, `memory`, `session_search`, `attach_file`, `vision`, `skill`, `task`. A single `web` tool gates both fetching a URL and searching (config key `tools.web`, on by default via the keyless DuckDuckGo backend; it degrades gracefully when no search backend is reachable). Each tool is gated by a `tools.<key>` config flag (opt-out) and the approval model. See **[docs/tools.md](docs/tools.md)**.
154
204
 
155
205
  ## Skills
156
206
 
@@ -191,7 +241,6 @@ These are designed-in but not fully wired yet — don't depend on them in produc
191
241
 
192
242
  - **MCP Support** — connect to Model Context Protocol servers via [ruby_llm-mcp](https://github.com/patvice/ruby_llm-mcp) ([docs/mcp.md](docs/mcp.md)).
193
243
  - **Multi-Agent** — Build / Plan / Explore agents with `@mention` routing ([docs/agents.md](docs/agents.md)).
194
- - **Plugin Hooks** — event hooks for extending behavior ([docs/plugins.md](docs/plugins.md)).
195
244
 
196
245
  ## Development
197
246
 
data/Rakefile CHANGED
@@ -7,6 +7,23 @@ RSpec::Core::RakeTask.new(:spec)
7
7
 
8
8
  task default: :spec
9
9
 
10
+ # API documentation. `rake rdoc` regenerates the HTML API docs into doc/rdoc
11
+ # (gitignored); the .github/workflows/docs.yml workflow publishes the same
12
+ # output to GitHub Pages. Guarded so the Rakefile still loads if the `rdoc`
13
+ # default gem is somehow absent.
14
+ begin
15
+ require "rdoc/task"
16
+
17
+ RDoc::Task.new(:rdoc) do |rdoc|
18
+ rdoc.rdoc_dir = "doc/rdoc"
19
+ rdoc.main = "README.md"
20
+ rdoc.title = "rubino-agent API documentation"
21
+ rdoc.rdoc_files.include("lib/**/*.rb", "exe/*", "README.md", "CHANGELOG.md", "docs/*.md")
22
+ end
23
+ rescue LoadError
24
+ # `rdoc` unavailable -> the `rake rdoc` task is simply not defined.
25
+ end
26
+
10
27
  # Parallel test execution across CPU cores via the `parallel_tests` gem.
11
28
  #
12
29
  # rake parallel:spec # auto: one worker per core