pikuri-core 0.0.3

Files changed (42)
  1. checksums.yaml +7 -0
  2. data/README.md +67 -0
  3. data/lib/pikuri/agent/chat_transport.rb +41 -0
  4. data/lib/pikuri/agent/configurator.rb +270 -0
  5. data/lib/pikuri/agent/context_window_detector.rb +111 -0
  6. data/lib/pikuri/agent/control/cancellable.rb +128 -0
  7. data/lib/pikuri/agent/control/interloper.rb +167 -0
  8. data/lib/pikuri/agent/control/step_limit.rb +93 -0
  9. data/lib/pikuri/agent/control.rb +45 -0
  10. data/lib/pikuri/agent/event.rb +190 -0
  11. data/lib/pikuri/agent/extension.rb +82 -0
  12. data/lib/pikuri/agent/listener/in_memory_event_list.rb +34 -0
  13. data/lib/pikuri/agent/listener/rate_limited.rb +172 -0
  14. data/lib/pikuri/agent/listener/terminal.rb +264 -0
  15. data/lib/pikuri/agent/listener/token_log.rb +216 -0
  16. data/lib/pikuri/agent/listener.rb +54 -0
  17. data/lib/pikuri/agent/listener_list.rb +102 -0
  18. data/lib/pikuri/agent/synthesizer.rb +145 -0
  19. data/lib/pikuri/agent.rb +731 -0
  20. data/lib/pikuri/subprocess.rb +166 -0
  21. data/lib/pikuri/tool/calculator.rb +82 -0
  22. data/lib/pikuri/tool/fetch.rb +171 -0
  23. data/lib/pikuri/tool/parameters.rb +314 -0
  24. data/lib/pikuri/tool/scraper/fetch_error.rb +16 -0
  25. data/lib/pikuri/tool/scraper/html.rb +285 -0
  26. data/lib/pikuri/tool/scraper/pdf.rb +54 -0
  27. data/lib/pikuri/tool/scraper/simple.rb +183 -0
  28. data/lib/pikuri/tool/search/brave.rb +184 -0
  29. data/lib/pikuri/tool/search/duckduckgo.rb +196 -0
  30. data/lib/pikuri/tool/search/engines.rb +163 -0
  31. data/lib/pikuri/tool/search/exa.rb +217 -0
  32. data/lib/pikuri/tool/search/rate_limiter.rb +92 -0
  33. data/lib/pikuri/tool/search/result.rb +29 -0
  34. data/lib/pikuri/tool/sub_agent.rb +150 -0
  35. data/lib/pikuri/tool/web_scrape.rb +121 -0
  36. data/lib/pikuri/tool/web_search.rb +38 -0
  37. data/lib/pikuri/tool.rb +118 -0
  38. data/lib/pikuri/url_cache.rb +112 -0
  39. data/lib/pikuri/version.rb +10 -0
  40. data/lib/pikuri-core.rb +177 -0
  41. data/prompts/pikuri-chat.txt +15 -0
  42. metadata +251 -0
@@ -0,0 +1,121 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pikuri
4
+ class Tool
5
+ # Truncation policy and Tool spec for the +web_scrape+ tool. The actual
6
+ # scraping lives in {Tool::Scraper::Simple}; this module is a thin
7
+ # wrapper that picks the scraper, applies a character cap so the LLM
8
+ # doesn't drown in long-form content, and exposes the result to the
9
+ # agent loop in OpenAI tool-call shape.
10
+ module WebScrape
11
+ # @return [Integer] default character cap on the Markdown returned
12
+ # by {.visit}. Sized to cover most post-readability article bodies
13
+ # in full on the first call, so the LLM doesn't have to re-request
14
+ # with a larger cap and pollute its context with the same prefix
15
+ # twice. ~5K tokens at the typical char/token ratio — light even
16
+ # for small local models. The genuinely long pages (long Wikipedia
17
+ # entries, multi-section docs) still get cut, and the truncation
18
+ # marker invites a deliberate larger {.visit} call when needed.
19
+ DEFAULT_MAX_CHARS = 20_000
20
+
21
+ # @return [Integer] hard ceiling on the +max_chars+ argument to
22
+ # {.visit}. Requests above this are clamped silently so the LLM
23
+ # cannot dump an arbitrarily large page into the conversation.
24
+ MAX_MAX_CHARS = 100_000
25
+
26
+ # On-disk cache used by {.visit} to memoize fetched pages. Read it
27
+ # through the {.cache} accessor below so specs can swap in an isolated
28
+ # cache or {UrlCache::NULL} without touching the shared instance.
29
+ #
30
+ # @return [UrlCache, #fetch]
31
+ CACHE = UrlCache.new(ttl: UrlCache::DEFAULT_TTL, dir: "#{UrlCache::ROOT_DIR}/web_scrape")
32
+ # Accessor for {CACHE}; specs override this to swap in
33
+ # {UrlCache::NULL} or an isolated cache.
34
+ #
35
+ # @return [UrlCache, #fetch]
36
+ def self.cache
37
+ CACHE
38
+ end
39
+
40
+ # Fetch +url+ via {Tool::Scraper::Simple} and truncate the rendered
41
+ # Markdown to +max_chars+ characters.
42
+ #
43
+ # The full extracted Markdown is cached on disk via {.cache}, keyed
44
+ # by URL, so repeat visits within the cache TTL skip the network
45
+ # and the extraction pass entirely. +max_chars+ is not part of the
46
+ # cache key — different values for the same URL share one entry,
47
+ # and truncation runs after the cache lookup.
48
+ #
49
+ # {Tool::Scraper::FetchError} (HTTP non-2xx, network failure,
50
+ # redirect-loop, missing +Location+ header) is caught and returned
51
+ # as +"Error: ..."+ in the calculator-style convention so the agent
52
+ # loop feeds the failure back to the model as the next observation
53
+ # instead of crashing — the LLM can then try a different URL or
54
+ # search again. The rescue lives outside {.cache}'s +fetch+ block,
55
+ # so failure strings are never persisted: a retry on the next call
56
+ # hits the network again. Other exceptions (parser bugs in our own
57
+ # code) bubble up unchanged.
58
+ #
59
+ # @param url [String] absolute HTTP(S) URL of the page to download
60
+ # @param max_chars [Integer] character cap on the returned Markdown.
61
+ # Clamped to +[1, {MAX_MAX_CHARS}]+; defaults to
62
+ # {DEFAULT_MAX_CHARS}. When the full page exceeds the cap, output
63
+ # is cut and a marker noting the original length is appended.
64
+ # @return [String] Markdown representation of the page, possibly
65
+ # truncated, or +"Error: ..."+ on a recoverable fetch failure
66
+ def self.visit(url, max_chars: DEFAULT_MAX_CHARS)
67
+ max_chars = max_chars.clamp(1, MAX_MAX_CHARS)
68
+ markdown = cache.fetch(url) { Scraper::Simple.visit(url) }
69
+ truncate(markdown, max_chars)
70
+ rescue Scraper::FetchError => e
71
+ "Error: #{e.message}"
72
+ end
73
+
74
+ # Cut +markdown+ to at most +max_chars+ characters, appending a
75
+ # marker describing the original length when truncation actually
76
+ # happens. Returns +markdown+ unchanged if it already fits.
77
+ #
78
+ # @param markdown [String] full Markdown text
79
+ # @param max_chars [Integer] character cap; assumed already clamped
80
+ # @return [String]
81
+ def self.truncate(markdown, max_chars)
82
+ return markdown if markdown.length <= max_chars
83
+
84
+ "#{markdown[0, max_chars]}\n\n" \
85
+ "... [truncated at #{max_chars} of #{markdown.length} chars; " \
86
+ 'call again with a larger `max_chars` to see more]'
87
+ end
88
+ end
89
+
90
+ # Webpage download + Markdown conversion tool. Thin wrapper over
91
+ # {Tool::WebScrape.visit} that exposes it to the agent loop in OpenAI
92
+ # tool-call shape.
93
+ #
94
+ # @return [Tool]
95
+ WEB_SCRAPE = new(
96
+ name: 'web_scrape',
97
+ description: <<~DESC,
98
+ Scrapes the rendered webpage, PDF, or text file at the given URL and returns its main content as Markdown.
99
+
100
+ Usage:
101
+ - Use for HTML pages or PDFs where you want readable content — readability extraction strips nav, sidebars, and boilerplate.
102
+ - For raw textual payloads (JSON, CSV, robots.txt, source files), use fetch instead — it returns bytes verbatim, while web_scrape would corrupt them with a Markdown pass.
103
+ - A Single Page App may return very little or no content. Do NOT retry with a larger max_chars; try a different URL instead.
104
+ DESC
105
+ parameters: Parameters.build { |p|
106
+ p.required_string :url,
107
+ 'Absolute URL of the webpage to scrape, including ' \
108
+ 'the scheme, e.g. "https://example.com/article".'
109
+ p.optional_integer :max_chars,
110
+ 'Maximum number of characters of Markdown to ' \
111
+ 'return. Defaults to 20000; hard-capped at ' \
112
+ '100000. When the page is longer than this, ' \
113
+ 'output is cut and a marker reports the full ' \
114
+ 'length.'
115
+ },
116
+ execute: ->(url:, max_chars: WebScrape::DEFAULT_MAX_CHARS) {
117
+ WebScrape.visit(url, max_chars: max_chars)
118
+ }
119
+ )
120
+ end
121
+ end
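The clamp-then-truncate policy documented in the file above can be tried in isolation. What follows is a minimal sketch, not the gem's code: `TruncateSketch` and `cap` are hypothetical names, the two constants and the marker wording are copied from the diff, and the scraping and caching steps are left out entirely.

```ruby
# Hypothetical stand-in for the clamp-then-truncate policy; only the
# constants and the marker text come from the diff above.
module TruncateSketch
  DEFAULT_MAX_CHARS = 20_000
  MAX_MAX_CHARS = 100_000

  # Clamp the requested cap into [1, MAX_MAX_CHARS], then cut the text and
  # append a marker reporting the original length if anything was dropped.
  def self.cap(markdown, max_chars: DEFAULT_MAX_CHARS)
    max_chars = max_chars.clamp(1, MAX_MAX_CHARS)
    return markdown if markdown.length <= max_chars

    "#{markdown[0, max_chars]}\n\n" \
      "... [truncated at #{max_chars} of #{markdown.length} chars; " \
      'call again with a larger `max_chars` to see more]'
  end
end

page = 'a' * 50
puts TruncateSketch.cap(page, max_chars: 10)  # first 10 chars plus the marker
puts TruncateSketch.cap(page, max_chars: 0)   # a cap of 0 is clamped up to 1
puts TruncateSketch.cap('short page')         # fits, so returned unchanged
```

Because the cap is applied after the text is produced, the same full text can serve calls with different `max_chars` values, which matches the note above that `max_chars` is not part of the cache key.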
@@ -0,0 +1,38 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pikuri
4
+ class Tool
5
+ # Namespace marker matching the file path. The actual search
6
+ # orchestration lives in {Tool::Search::Engines}; this file owns only
7
+ # the LLM-facing {Tool::WEB_SEARCH} constant below.
8
+ module WebSearch; end
9
+
10
+ # Web-search tool exposed to the agent loop in OpenAI tool-call shape.
11
+ # Calls {Tool::Search::Engines.search}, which cascades through whichever
12
+ # providers are configured (DuckDuckGo always, Brave when its API key is
13
+ # present) in random order, falling back on temporary-unavailability
14
+ # errors. Providers return structured {Tool::Search::Result} rows;
15
+ # +Engines.search+ renders the winning provider's rows into the
16
+ # smolagents-style Markdown shape the LLM sees, so the format stays
17
+ # stable regardless of which provider ran.
18
+ #
19
+ # @return [Tool]
20
+ WEB_SEARCH = new(
21
+ name: 'web_search',
22
+ description: <<~DESC,
23
+ Searches the web for a query and returns the top results as a Markdown list of titles, URLs, and short snippets.
24
+
25
+ Usage:
26
+ - Use this to find candidate URLs, then call web_scrape on the most promising one(s) for full content. Snippets alone rarely answer a question.
27
+ DESC
28
+ parameters: Parameters.build { |p|
29
+ p.required_string :query,
30
+ 'The search query, e.g. "BigDecimal precision Ruby".'
31
+ p.optional_integer :max_results,
32
+ 'Maximum number of result entries to return. ' \
33
+ 'Defaults to 10; most providers cap this at 20.'
34
+ },
35
+ execute: ->(query:, max_results: 10) { Search::Engines.search(query, max_results: max_results) }
36
+ )
37
+ end
38
+ end
@@ -0,0 +1,118 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'ruby_llm'
4
+
5
+ module Pikuri
6
+ # A tool the LLM can request via OpenAI-style tool calling.
7
+ #
8
+ # {Tool} is a plain value object: +name+, +description+, +parameters+,
9
+ # and an +execute+ Proc that produces the observation. Pikuri's own
10
+ # code talks to this surface (the bundled tools, the sub-agent factory,
11
+ # the +Agent+ constructor); the conversion to ruby_llm's runtime shape
12
+ # happens at exactly one named seam — {#to_ruby_llm_tool} — which the
13
+ # +Agent+ calls when wiring tools into the underlying +RubyLLM::Chat+.
14
+ #
15
+ # Bundled tool implementations live under +lib/pikuri/tool/+ and are
16
+ # eager-loaded by +lib/pikuri-core.rb+; +lib/pikuri/tool.rb+ itself only
17
+ # introduces the {Tool} class and {Tool::Parameters} machinery.
18
+ #
19
+ # == Validation ordering
20
+ #
21
+ # Pikuri's {Parameters#validate} — DidYouMean suggestions, type coercion,
22
+ # all-errors-collected — is what the LLM must see when it emits bad
23
+ # arguments, not ruby_llm's keyword-presence checker. The synthetic
24
+ # class produced by {#to_ruby_llm_tool} defines +execute(**args)+: the
25
+ # keyrest signature makes +RubyLLM::Tool#validate_keyword_arguments+
26
+ # no-op (it bails out at the +accepts_extra_keywords+ check), so
27
+ # pikuri's validator runs inside {#run} before any user code does.
28
+ #
29
+ # == Error handling convention
30
+ #
31
+ # Tools split failures into two buckets:
32
+ #
33
+ # * *Recoverable* failures — anything the LLM can react to by retrying with
34
+ # different inputs (bad arguments, HTTP 4xx/5xx, network blips, search
35
+ # provider rate-limits, division by zero in the calculator, ...). These
36
+ # come back as a +"Error: <message>"+ String and become the next
37
+ # observation; the model sees them and self-corrects on the following
38
+ # turn instead of crashing the agent loop.
39
+ # * *Bugs* in pikuri's own code (parser regressions, schema misuse,
40
+ # misconfigured clients, ...). These keep raising. The LLM cannot fix
41
+ # them; a human needs to.
42
+ #
43
+ # Argument-validation failures from {Parameters#validate} are caught by
44
+ # {#run} and turned into +"Error: ..."+ observations. The +execute+ Proc
45
+ # owns the rest — it should follow the same convention internally for
46
+ # tool-specific recoverable failures, and let bugs raise.
47
+ class Tool
48
+ # @return [String] function name advertised to the LLM
49
+ attr_reader :name
50
+
51
+ # @return [String] human-readable description used by the LLM to decide
52
+ # when to call the tool
53
+ attr_reader :description
54
+
55
+ # @return [Tool::Parameters] declared schema; validates incoming
56
+ # arguments and serializes to the JSON Schema shape advertised to the
57
+ # LLM
58
+ attr_reader :parameters
59
+
60
+ # @return [Proc] callable invoked once arguments have been validated;
61
+ # receives validated keyword arguments and returns a +String+
62
+ # observation
63
+ attr_reader :execute
64
+
65
+ # @param name [String] function name advertised to the LLM
66
+ # @param description [String] human-readable description used by the LLM
67
+ # to decide when to call the tool
68
+ # @param parameters [Tool::Parameters] declared schema
69
+ # @param execute [Proc] callable invoked with validated keyword arguments
70
+ # that returns a +String+ observation. Recoverable failures should be
71
+ # returned as +"Error: <message>"+ Strings rather than raised — see
72
+ # "Error handling convention" above.
73
+ # @return [Tool]
74
+ def initialize(name:, description:, parameters:, execute:)
75
+ @name = name
76
+ @description = description
77
+ @parameters = parameters
78
+ @execute = execute
79
+ end
80
+
81
+ # Validate +args+ against {#parameters} and forward them as keyword
82
+ # arguments to {#execute}. Validation failures are caught and rendered
83
+ # as +"Error: <message>"+ Strings so the agent loop can feed them back
84
+ # to the LLM as the next observation; everything else bubbles up.
85
+ #
86
+ # @param args [Hash] raw arguments supplied by the LLM
87
+ # @return [String] tool observation, or +"Error: ..."+ on validation
88
+ # failure
89
+ def run(args)
90
+ validated = @parameters.validate(args)
91
+ @execute.call(**validated)
92
+ rescue Tool::Parameters::ValidationError => e
93
+ "Error: #{e.message}"
94
+ end
95
+
96
+ # Build a synthetic +RubyLLM::Tool+ subclass that wraps this Tool. The
97
+ # subclass is what +RubyLLM::Chat#with_tool+ accepts: ruby_llm
98
+ # instantiates it (+tool.new+) and routes tool calls through
99
+ # +instance.call(args)+, which lands in +#execute(**args)+ and
100
+ # delegates back to {#run} on this instance.
101
+ #
102
+ # @return [Class] anonymous +RubyLLM::Tool+ subclass
103
+ def to_ruby_llm_tool
104
+ pikuri_tool = self
105
+ schema = @parameters.to_h
106
+ tool_name = @name
107
+ tool_desc = @description
108
+
109
+ Class.new(RubyLLM::Tool) do
110
+ description(tool_desc)
111
+ params(schema)
112
+
113
+ define_singleton_method(:name) { tool_name }
114
+ define_method(:execute) { |**args| pikuri_tool.run(args) }
115
+ end
116
+ end
117
+ end
118
+ end
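The two-bucket error convention described in the file above is easy to see in a small stand-in. This is a sketch under stated assumptions: `MiniTool`, its `required:` argument, and its missing-key check are hypothetical and far simpler than `Tool::Parameters`; only the shape of `run` (validate, call the Proc with keywords, rescue validation errors into an `"Error: ..."` String) follows the diff.

```ruby
# Hypothetical miniature of the Tool#run contract. MiniTool and its
# validator are illustrative names, not part of pikuri.
class MiniTool
  class ValidationError < StandardError; end

  def initialize(required:, execute:)
    @required = required
    @execute = execute
  end

  # Validation failures become "Error: ..." observations; anything raised
  # by the execute Proc itself propagates to the caller as a bug.
  def run(args)
    missing = @required - args.keys
    raise ValidationError, "missing: #{missing.join(', ')}" unless missing.empty?

    @execute.call(**args)
  rescue ValidationError => e
    "Error: #{e.message}"
  end
end

divide = MiniTool.new(
  required: %i[a b],
  # Division by zero is recoverable, so the Proc returns an Error String.
  execute: ->(a:, b:) { b.zero? ? 'Error: division by zero' : (a / b).to_s }
)

puts divide.run({ a: 6, b: 3 })  # normal observation
puts divide.run({ a: 6 })        # validation failure rendered as a String
puts divide.run({ a: 6, b: 0 })  # tool-level recoverable failure
```

Only `ValidationError` is rescued, so an exception raised inside the Proc still reaches the caller, which is the behavior the convention asks for when the fault is a bug rather than bad input.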
@@ -0,0 +1,112 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'digest'
4
+ require 'fileutils'
5
+
6
+ module Pikuri
7
+ # On-disk cache for string-keyed text payloads. Used by the bundled tools
8
+ # to avoid re-fetching the same page or re-issuing the same web-search
9
+ # query within a TTL window: {Tool::WebScrape.visit} caches the rendered
10
+ # Markdown for a URL, and {Tool::Search::Engines.search} caches the
11
+ # rendered result list for a query (the query string itself acts as the
12
+ # key — keys are SHA-256 hashed, so any opaque string works).
13
+ #
14
+ # Each tool wires its own {UrlCache} instance against a dedicated
15
+ # subdirectory under {ROOT_DIR}, so a +web_search+ query string and a
16
+ # +web_scrape+ URL string can never collide on the same cache file. There
17
+ # is no global default singleton — pass a fresh instance to whichever
18
+ # code needs caching, or use {NULL} to disable caching entirely.
19
+ #
20
+ # One file per entry, named +<sha256>.txt+ under {#initialize}'s +dir+.
21
+ # Freshness is tracked via the file's mtime; there is no sidecar metadata.
22
+ # Stale entries are simply overwritten the next time {#fetch} is called
23
+ # with the same key. To clear the cache, +rm -rf+ the directory.
24
+ #
25
+ # Not thread-safe: if two callers race on the same cold key, both compute
26
+ # and both write the same file. That is the intended tradeoff to keep this
27
+ # under a few dozen lines — the worst-case cost is a duplicate fetch.
28
+ class UrlCache
29
+ # Root directory under which per-tool cache subdirectories live.
30
+ # Follows the XDG Base Directory spec: +$XDG_CACHE_HOME/pikuri/url_cache+
31
+ # if the env var is set to a non-empty value, else
32
+ # +~/.cache/pikuri/url_cache+. Each tool picks its own subdir
33
+ # (e.g. +"#{ROOT_DIR}/web_scrape"+) so keys from different tools cannot
34
+ # collide. The directory is created lazily on first cache write; pikuri
35
+ # does not pre-create it.
36
+ # @return [String]
37
+ ROOT_DIR = begin
38
+ xdg = ENV['XDG_CACHE_HOME']
39
+ cache_home = xdg && !xdg.empty? ? xdg : File.join(Dir.home, '.cache')
40
+ File.join(cache_home, 'pikuri', 'url_cache')
41
+ end.freeze
42
+
43
+ # Default freshness window: 2 hours, in seconds.
44
+ #
45
+ # Long enough to cover a single interactive session — revisiting
46
+ # a scraped page or re-running a similar search within the same
47
+ # working window hits the cache. Short enough that resuming the
48
+ # next day doesn't serve stale news, docs, or search results.
49
+ # Reference points: opencode keeps no cache, the +pi-web-fetch+
50
+ # community extension uses 15 minutes, +pi-web-search+ uses 5;
51
+ # 2 hours sits comfortably above the "single follow-up" window
52
+ # those numbers are aimed at without holding content across days.
53
+ # @return [Integer]
54
+ DEFAULT_TTL = 2 * 60 * 60
55
+
56
+ # @param ttl [Integer] freshness window in seconds; entries with an
57
+ # mtime older than this are treated as misses
58
+ # @param dir [String] directory under which cache files live; created
59
+ # lazily on first write
60
+ def initialize(ttl:, dir:)
61
+ @ttl = ttl
62
+ @dir = dir
63
+ end
64
+
65
+ # Return the cached payload for +url+ if a fresh entry exists, otherwise
66
+ # yield to compute it, persist the result, and return it.
67
+ #
68
+ # The block is only invoked on a miss. If the block raises, no file is
69
+ # written — errors are not cached.
70
+ #
71
+ # @param url [String] cache key; a URL or any opaque string identifier
72
+ # @yieldreturn [String] payload to store and return on a miss
73
+ # @return [String] cached or freshly-computed payload
74
+ def fetch(url)
75
+ path = path_for(url)
76
+ return File.read(path) if fresh?(path)
77
+
78
+ content = yield
79
+ FileUtils.mkdir_p(@dir)
80
+ File.write(path, content)
81
+ content
82
+ end
83
+
84
+ # @param path [String]
85
+ # @return [Boolean] true when +path+ exists and was written within the
86
+ # TTL window
87
+ def fresh?(path)
88
+ File.exist?(path) && Time.now - File.mtime(path) < @ttl
89
+ end
90
+
91
+ # @param url [String]
92
+ # @return [String] absolute path of the cache file for +url+
93
+ def path_for(url)
94
+ File.join(@dir, "#{Digest::SHA256.hexdigest(url)}.txt")
95
+ end
96
+
97
+ # Null cache: a drop-in replacement that always misses and never
98
+ # persists. Use this in tests (or anywhere else you want caching off)
99
+ # without giving up the {UrlCache#fetch} contract.
100
+ NULL = Object.new
101
+ # Singleton +fetch+ on {NULL} that mirrors {UrlCache#fetch}'s shape:
102
+ # ignores +_key+, always yields, never persists.
103
+ #
104
+ # @param _key [String] cache key; ignored
105
+ # @yieldreturn [String] freshly-computed payload
106
+ # @return [String] whatever the block returned
107
+ def NULL.fetch(_key)
108
+ yield
109
+ end
110
+ NULL.freeze
111
+ end
112
+ end
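The freshness rule in the file above (one file per SHA-256 key, mtime compared against a TTL, nothing written when the block raises) can be exercised against a temporary directory. The sketch below is a hypothetical re-creation: `SketchCache`, the 60-second TTL, and the example keys are assumptions chosen for illustration, and it uses only Ruby's standard library.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Hypothetical re-creation of the mtime-based freshness check; SketchCache
# is an illustrative name and the TTL and keys below are made up.
class SketchCache
  def initialize(ttl:, dir:)
    @ttl = ttl
    @dir = dir
  end

  def fetch(key)
    path = File.join(@dir, "#{Digest::SHA256.hexdigest(key)}.txt")
    return File.read(path) if File.exist?(path) && Time.now - File.mtime(path) < @ttl

    content = yield # if the block raises, the write below never runs
    FileUtils.mkdir_p(@dir)
    File.write(path, content)
    content
  end
end

Dir.mktmpdir do |dir|
  cache = SketchCache.new(ttl: 60, dir: dir)
  calls = 0
  2.times { cache.fetch('https://example.com/') { calls += 1; 'body' } }
  puts "block ran #{calls} time(s)" # the second call is served from disk

  begin
    cache.fetch('https://example.com/broken') { raise IOError, 'fetch failed' }
  rescue IOError
    puts "files on disk: #{Dir.children(dir).length}" # the failure left no file
  end
end
```

Hashing the key is what lets a URL and a free-form search query share the same class: the file name never depends on which characters the key contains.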
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Pikuri
4
+ # Gem version, advertised in +pikuri.gemspec+. Bump on every release
5
+ # following semver: patch for bug fixes, minor for backward-compatible
6
+ # additions to the public surface (+Pikuri::Tool+ / +Pikuri::Agent+ /
7
+ # listeners / bundled tools), major for breaking changes to that
8
+ # surface or to the +bin/pikuri-*+ CLIs.
9
+ VERSION = '0.0.3'
10
+ end
@@ -0,0 +1,177 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'logger'
4
+ require 'zeitwerk'
5
+
6
+ # +Pikuri::VERSION+ lives in its own tiny file so the gemspec can read
7
+ # the version constant via +require_relative+ at build time without
8
+ # loading Zeitwerk and every runtime dep. Requiring it here makes the
9
+ # constant available to ordinary +require 'pikuri-core'+ callers too —
10
+ # without this line, Zeitwerk's ignore-rule below would leave
11
+ # +Pikuri::VERSION+ undefined for anyone using the installed gem.
12
+ require_relative 'pikuri/version'
13
+
14
+ # Boot file: configures the Zeitwerk autoloader for every file under
15
+ # +pikuri-core/lib/pikuri/+ and eager-loads them all. After
16
+ # +require 'pikuri-core'+, every constant pikuri-core ships
17
+ # (+Pikuri::Agent+, +Pikuri::Tool+, +Pikuri::Tool::Calculator+, ...)
18
+ # is defined and resolvable. Sibling gems (+pikuri-skills+,
19
+ # +pikuri-mcp+, ...) set up their own Zeitwerk loaders rooted at
20
+ # their own +lib/+ and contribute to the same +Pikuri::+ namespace.
21
+ #
22
+ # Beyond loading, the +Pikuri+ module owns the logging surface — see
23
+ # {.logger_for}, {.log_io=}, and the +PIKURI_LOG+ /
24
+ # +PIKURI_LOG_<NAME>+ env vars. Each subsystem holds its own
25
+ # memoized +Logger+, all writing through a shared IO that
26
+ # {.log_io=} can swap in one shot (handy in tests, daemons, or
27
+ # anywhere stderr isn't the right sink).
28
+ #
29
+ # == Why eager-load
30
+ #
31
+ # Tool implementations (+Pikuri::Tool::CALCULATOR+,
32
+ # +Pikuri::Tool::WEB_SEARCH+, +Pikuri::Tool::WEB_SCRAPE+,
33
+ # +Pikuri::Tool::FETCH+) are +ALL_CAPS+ value constants rather than
34
+ # classes/modules, and Zeitwerk only auto-loads constants that match
35
+ # its filename-↔-CamelCase convention. Eager-loading at boot guarantees
36
+ # the files defining those values run, so the bin script can drop them
37
+ # straight into the +Agent.new+ block via +c.add_tool+ without per-file
38
+ # +require+ ceremony. The cost is a few milliseconds of startup —
39
+ # negligible compared to a single LLM round-trip.
40
+ module Pikuri
41
+ # Search path for bundled system prompts. Mutable list: each pikuri
42
+ # gem appends its own +prompts/+ directory when it boots, so a
43
+ # +Pikuri.prompt(name)+ call from a host that requires +pikuri-code+
44
+ # (for example) finds +coding-system-prompt.txt+ in
45
+ # +pikuri-code/prompts/+. The core gem registers its own
46
+ # +pikuri-core/prompts/+ below; sibling gems do the equivalent in
47
+ # their own entry files.
48
+ #
49
+ # Exposed publicly so a downstream library user can read pikuri's
50
+ # prompts as a starting point for their own system prompt (or
51
+ # use them verbatim). Prefer {.prompt} for the common case of
52
+ # loading one by name.
53
+ #
54
+ # @return [Array<String>]
55
+ PROMPT_DIRS = [File.expand_path('../prompts', __dir__)]
56
+
57
+ # Mapping from +PIKURI_LOG+ env-var values (lowercased) to
58
+ # {Logger} level constants. Anything else falls back to +INFO+.
59
+ #
60
+ # @return [Hash{String=>Integer}]
61
+ LOG_LEVELS = {
62
+ 'debug' => Logger::DEBUG,
63
+ 'info' => Logger::INFO,
64
+ 'warn' => Logger::WARN,
65
+ 'error' => Logger::ERROR,
66
+ 'fatal' => Logger::FATAL
67
+ }.freeze
68
+
69
+ @log_io = $stderr
70
+ @log_loggers = {} # name → Logger; lets {.log_io=} rewire all of them
71
+ @log_default = LOG_LEVELS.fetch(ENV['PIKURI_LOG'].to_s.downcase, Logger::INFO)
72
+
73
+ class << self
74
+ # @return [IO] shared sink every {.logger_for} writes through
75
+ attr_reader :log_io
76
+
77
+ # Swap the shared sink (e.g. a +StringIO+ in tests, a file in a
78
+ # daemon). Re-points every logger already handed out by
79
+ # {.logger_for} so callers don't have to re-fetch them.
80
+ #
81
+ # Why we don't use +Logger#reopen+: that method closes the previous
82
+ # sink before installing the new one, which makes "swap to a
83
+ # capture +StringIO+, then swap back to the original" impossible —
84
+ # the original handle would be dead the second time around. Instead
85
+ # we swap each logger's +@logdev+ to a fresh +Logger::LogDevice+
86
+ # over the new +io+, leaving the previous device untouched so the
87
+ # caller can hand the same +IO+ back in later. This uses a documented
88
+ # but internal Logger ivar; if a future Ruby renames it the
89
+ # +Engines+ logging specs will fail loudly.
90
+ #
91
+ # @param io [IO] new sink for every Pikuri logger
92
+ # @return [IO]
93
+ def log_io=(io)
94
+ @log_io = io
95
+ @log_loggers.each_value do |lg|
96
+ lg.instance_variable_set(:@logdev, Logger::LogDevice.new(io))
97
+ end
98
+ io
99
+ end
100
+
101
+ # Memoized {Logger} tagged with +name+ in its +progname+, so each
102
+ # subsystem's lines stand out in the shared sink. Level resolves
103
+ # in this order: +PIKURI_LOG_<NAME>+ (e.g. +PIKURI_LOG_ENGINES=debug+),
104
+ # then +PIKURI_LOG+, then +INFO+.
105
+ #
106
+ # Repeated calls with the same +name+ return the same instance so
107
+ # there is one logger per subsystem and {.log_io=} can rewire them
108
+ # all in one shot.
109
+ #
110
+ # @param name [String] subsystem tag (rendered as +progname+)
111
+ # @return [Logger]
112
+ def logger_for(name)
113
+ @log_loggers[name] ||= begin
114
+ lg = Logger.new(@log_io, progname: name)
115
+ override = ENV["PIKURI_LOG_#{name.upcase}"].to_s.downcase
116
+ lg.level = LOG_LEVELS.fetch(override, @log_default)
117
+ lg
118
+ end
119
+ end
120
+
121
+ # Read a bundled prompt by basename. Searches every directory in
122
+ # {PROMPT_DIRS} in order, returning the first match. +.txt+ is
123
+ # auto-appended if absent. Symbols are accepted as a convenience
124
+ # (+:pikuri-chat+ / +'pikuri-chat'+ / +'pikuri-chat.txt'+ all
125
+ # resolve to the same file).
126
+ #
127
+ # Intended for downstream library users who want to bootstrap their
128
+ # own +Pikuri::Agent+ wiring from pikuri's defaults — read the
129
+ # prompt, customize the bits they care about, hand the result to
130
+ # +Agent.new(system_prompt: ...)+.
131
+ #
132
+ # @example
133
+ # chat_prompt = Pikuri.prompt(:'pikuri-chat')
134
+ # agent = Pikuri::Agent.new(system_prompt: chat_prompt, ...)
135
+ #
136
+ # @param name [String, Symbol] basename of the prompt file
137
+ # @return [String] file contents
138
+ # @raise [ArgumentError] if no matching file exists in any
139
+ # directory in {PROMPT_DIRS}
140
+ def prompt(name)
141
+ basename = name.to_s
142
+ basename += '.txt' unless basename.end_with?('.txt')
143
+ PROMPT_DIRS.each do |dir|
144
+ path = File.join(dir, basename)
145
+ return File.read(path) if File.exist?(path)
146
+ end
147
+ available = PROMPT_DIRS.flat_map { |dir| Dir.exist?(dir) ? Dir.children(dir) : [] }.sort.uniq
148
+ raise ArgumentError, "Unknown pikuri prompt #{name.inspect}; available: #{available.join(', ')}"
149
+ end
150
+ end
151
+
152
+ # Zeitwerk loader managing every constant under
153
+ # +pikuri-core/lib/pikuri/+. Exposed as a constant (rather than scoped
154
+ # to the boot block) so a downstream host that wants to add ignore
155
+ # rules can reach it without monkey-patching. Sibling gems
156
+ # (+pikuri-skills+, +pikuri-mcp+, ...) set up their own loaders
157
+ # rooted at their own +lib/+ dirs — see each gem's entry file.
158
+ #
159
+ # @return [Zeitwerk::Loader]
160
+ Loader = Zeitwerk::Loader.new
161
+ Loader.tag = 'pikuri-core'
162
+ Loader.push_dir(File.expand_path('.', __dir__))
163
+ Loader.ignore(__FILE__)
164
+ # +pikuri/version.rb+ defines +Pikuri::VERSION+ (an ALL_CAPS value
165
+ # constant), not +Pikuri::Version+ as Zeitwerk would expect from the
166
+ # filename — so tell the loader to skip it. The file is +require_relative+'d
167
+ # at the top of this file (and also by the gemspec at build time), so
168
+ # the constant is defined before Zeitwerk runs.
169
+ Loader.ignore(File.expand_path('pikuri/version.rb', __dir__))
170
+ Loader.inflector.inflect(
171
+ 'html' => 'HTML',
172
+ 'pdf' => 'PDF',
173
+ 'duckduckgo' => 'DuckDuckGo'
174
+ )
175
+ Loader.setup
176
+ Loader.eager_load
177
+ end
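The sink-swapping approach described for `log_io=` in the file above can be reproduced with plain `Logger` objects. This is a sketch under assumptions: `LogHub` is a hypothetical module, level handling and the `PIKURI_LOG` env vars are omitted, and, as in the original, it reaches into Logger's internal `@logdev` ivar, so it may break if a future Ruby renames that ivar.

```ruby
require 'logger'
require 'stringio'

# Hypothetical miniature of the shared-sink idea; LogHub is an
# illustrative name. Swapping @logdev relies on a Logger internal.
module LogHub
  @io = $stderr
  @loggers = {}

  class << self
    # One memoized logger per subsystem name.
    def logger_for(name)
      @loggers[name] ||= Logger.new(@io, progname: name)
    end

    # Point every logger already handed out at the new sink without
    # closing the previous one.
    def io=(io)
      @io = io
      @loggers.each_value do |lg|
        lg.instance_variable_set(:@logdev, Logger::LogDevice.new(io))
      end
    end
  end
end

first = StringIO.new
second = StringIO.new

LogHub.io = first
log = LogHub.logger_for('engines')
log.info('to first')

LogHub.io = second     # rewires the logger that was already handed out
log.info('to second')

LogHub.io = first      # the earlier sink was never closed, so this works
log.info('back to first')

puts first.string
puts second.string
```

Swapping back to `first` is the case the original comment says `Logger#reopen` would make impossible, since the first sink would already have been closed by then.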
@@ -0,0 +1,15 @@
1
+ You are an expert assistant who solves tasks by calling tools when needed.
2
+
3
+ You have access to tools described in the API's tool list. To call one, use the standard tool-call mechanism — do not write tool calls as text.
4
+
5
+ If several next steps are independent (e.g. two unrelated lookups), emit them as parallel tool calls in a single turn rather than one at a time.
6
+
7
+ Choosing a tool:
8
+ - Default to `web_search` whenever a question asks for a specific fact about a named entity — company HQs and headcounts, product versions and release dates, prices, people's bios and roles, anything tying a proper noun to a specific attribute — and for anything time-sensitive or obscure. Your training will *feel* certain on these and is often wrong; check rather than guess. One well-chosen search beats three exploratory ones.
9
+ - Use `calculator` for any arithmetic beyond simple mental math. Don't search the web for "what is X * Y".
10
+ - Reply directly, without any tool, for language and translation, definitions of common terms, and widely-known general history.
11
+
12
+ Other guidelines:
13
+ - Don't repeat a tool call with identical arguments — re-read the previous observation instead.
14
+ - On a tool error (observation starting with `Error:`): use the data you already have to answer if you can. If you can't, reply to the user that you weren't able to find the answer and briefly say why (e.g. "web search hit a rate limit", "the page wasn't reachable"). Do not retry the same call hoping for a different result, and do not loop on rephrased variants of the same failing call.
15
+ - When you have the answer, reply in plain text with no tool call. That is how you finish.