tavily 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: a7b9e2b04dccf49bff52ed85008cc143ec009b95f289efed38cd230c165ee2fa
4
+ data.tar.gz: '089bd278c94dcf7ae30976a4e596dd36b7e462efda363369030592df105b0844'
5
+ SHA512:
6
+ metadata.gz: 1943c54bdfa88cbc213565714c523523bf389ba3319d18ba3f8a9b4a38cb34a07151d6a5691c8f7bbb935171817265a9744b30b9a9cd6a9c1e92c9b76538bbd8
7
+ data.tar.gz: 0a5c3869a3493867a68ddfe5c0c7c7c463492cbfa2391e61c77a2f457e793349d1e3539001946ada94b618ad76a103f951db15088e80a3d31b0bb79cc9e1dd14
data/CHANGELOG.md ADDED
@@ -0,0 +1,30 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project are documented here. The format is based on
4
+ [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
5
+ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
6
+
7
+ ## [Unreleased]
8
+
9
+ ## [0.1.0] - 2026-06-02
10
+
11
+ ### Added
12
+ - Initial release.
13
+ - `Tavily::Client` covering the Tavily REST API:
14
+ - `#search` (plus `#qna_search` and `#search_context` helpers)
15
+ - `#extract`
16
+ - `#crawl`
17
+ - `#map`
18
+ - `#research`, `#research_task`, `#wait_for_research`, and streaming research
19
+ via a block.
20
+ - Typed response objects (`SearchResponse`, `ExtractResponse`, `CrawlResponse`,
21
+ `MapResponse`, `ResearchTask`, and friends) with raw-hash access preserved.
22
+ - Global configuration (`Tavily.configure`) and per-client overrides.
23
+ - Automatic retries with exponential backoff for transient failures, honoring
24
+ the `Retry-After` header on `429` responses.
25
+ - Granular error hierarchy, including Tavily's non-standard `432` (plan limit)
26
+ and `433` (pay-as-you-go limit) status codes.
27
+ - Built entirely on the Ruby standard library, no runtime dependencies.
28
+
29
+ [Unreleased]: https://github.com/main-path/tavily/compare/v0.1.0...HEAD
30
+ [0.1.0]: https://github.com/main-path/tavily/releases/tag/v0.1.0
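The changelog entry on automatic retries names exponential backoff and the `Retry-After` header but not the delay formula. The sketch below shows one common way such a delay could be computed. It is an illustration under assumptions, not the gem's code: `backoff_delay`, the doubling factor, and the 50% jitter range are all hypothetical, and only the seconds form of `Retry-After` is handled.

```ruby
# Hypothetical helper, not part of the gem: how long to sleep before
# retry number `attempt` (0-based).
def backoff_delay(attempt, base_delay: 0.5, retry_after: nil, jitter: rand)
  # A server-supplied Retry-After takes precedence when present.
  # Assumption: the value is given in seconds; an HTTP-date value would
  # make Float() raise ArgumentError.
  return Float(retry_after) if retry_after

  # Exponential growth: base, 2 * base, 4 * base, ...
  exponential = base_delay * (2**attempt)
  # Add up to 50% random jitter so concurrent clients do not retry in lockstep.
  exponential * (1 + (0.5 * jitter))
end

# With jitter pinned to zero the schedule is deterministic.
delays = (0..2).map { |attempt| backoff_delay(attempt, jitter: 0.0) }
puts delays.inspect
puts backoff_delay(1, retry_after: "7")
```

Passing `jitter:` explicitly keeps the function easy to test; in real use the default `rand` would spread retries out.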
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2026 ned
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,319 @@
1
+ # Tavily
2
+
3
+ [![CI](https://github.com/main-path/tavily/actions/workflows/ci.yml/badge.svg)](https://github.com/main-path/tavily/actions/workflows/ci.yml)
4
+ [![Gem Version](https://badge.fury.io/rb/tavily.svg)](https://rubygems.org/gems/tavily)
5
+
6
+ A lightweight, dependency-free Ruby client for the [Tavily](https://docs.tavily.com/welcome) API — the web-access layer built for LLMs and AI agents.
7
+
8
+ It wraps every Tavily endpoint with typed response objects, automatic retries, streaming research, and a granular error hierarchy:
9
+
10
+ - **Search** — `search`, plus the `qna_search` and `search_context` helpers
11
+ - **Extract** — clean content from one or many URLs
12
+ - **Crawl** — follow links from a root URL
13
+ - **Map** — discover a site's URL structure
14
+ - **Research** — start, poll, or live-stream an asynchronous research task
15
+
16
+ Built entirely on the Ruby standard library (`net/http`) — **no runtime dependencies**.
17
+
18
+ ## Installation
19
+
20
+ Add it to your Gemfile:
21
+
22
+ ```ruby
23
+ gem "tavily"
24
+ ```
25
+
26
+ Then run `bundle install`. Or install it directly:
27
+
28
+ ```sh
29
+ gem install tavily
30
+ ```
31
+
32
+ Requires Ruby 3.1+.
33
+
34
+ ## Quick start
35
+
36
+ ```ruby
37
+ require "tavily"
38
+
39
+ client = Tavily::Client.new(api_key: "tvly-YOUR_API_KEY")
40
+
41
+ response = client.search("who won the 2022 FIFA World Cup?", include_answer: true)
42
+
43
+ response.answer # => "Argentina won the 2022 FIFA World Cup..."
44
+ response.results.first.url # => "https://en.wikipedia.org/wiki/..."
45
+ response.results.first.title
46
+ response.credits # credit usage, when include_usage: true
47
+ ```
48
+
49
+ Get your API key from the [Tavily dashboard](https://app.tavily.com).
50
+
51
+ ## Configuration
52
+
53
+ You can configure a global default (used by `Tavily.search` and friends, and as the default for every new client):
54
+
55
+ ```ruby
56
+ Tavily.configure do |config|
57
+ config.api_key = ENV["TAVILY_API_KEY"] # default: ENV["TAVILY_API_KEY"]
58
+ config.base_url = "https://api.tavily.com"
59
+ config.timeout = 60 # read timeout (seconds)
60
+ config.open_timeout = 10 # connection-open timeout (seconds)
61
+ config.max_retries = 2 # automatic retries for transient failures
62
+ config.retry_base_delay = 0.5 # base seconds for exponential backoff
63
+ config.proxy = nil # "http://user:pass@host:port"
64
+ config.ca_file = nil # path to a PEM CA bundle (see "Windows / TLS")
65
+ config.logger = Logger.new($stdout) # optional request logging
66
+ end
67
+
68
+ Tavily.search("latest ruby release")
69
+ ```
70
+
71
+ Every option can also be overridden per client:
72
+
73
+ ```ruby
74
+ client = Tavily::Client.new(api_key: "tvly-...", timeout: 120, max_retries: 5)
75
+ ```
76
+
77
+ The following environment variables are read automatically: `TAVILY_API_KEY`, `TAVILY_BASE_URL`, `TAVILY_TIMEOUT`, `TAVILY_MAX_RETRIES`, `TAVILY_PROXY`, and `SSL_CERT_FILE`.
78
+
79
+ ## Endpoints
80
+
81
+ ### Search
82
+
83
+ ```ruby
84
+ response = client.search(
85
+ "embedded systems news",
86
+ topic: "news", # "general" (default), "news", or "finance"
87
+ search_depth: "advanced", # "basic" (default), "advanced", "fast", "ultra-fast"
88
+ max_results: 10, # 0–20 (default 5)
89
+ time_range: "week", # "day" | "week" | "month" | "year"
90
+ include_answer: "advanced", # true/false, "basic", or "advanced"
91
+ include_raw_content: "markdown",
92
+ include_images: true,
93
+ include_domains: ["arxiv.org"],
94
+ exclude_domains: ["example.com"],
95
+ include_usage: true
96
+ )
97
+
98
+ response.query
99
+ response.answer
100
+ response.results # => [Tavily::SearchResult, ...]
101
+ response.results.first.title
102
+ response.results.first.content
103
+ response.results.first.score
104
+ response.urls # => convenience array of result URLs
105
+ response.images # => [Tavily::Image, ...] (url + optional description)
106
+ response.credits # => Integer (when include_usage: true)
107
+ response.request_id
108
+ ```
109
+
110
+ #### `qna_search` — just the answer
111
+
112
+ ```ruby
113
+ client.qna_search("what is the capital of France?")
114
+ # => "Paris is the capital of France."
115
+ ```
116
+
117
+ #### `search_context` — a RAG-ready context string
118
+
119
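+ Returns a JSON string encoding an array of `{"url": ..., "content": ...}` objects, trimmed to roughly `max_tokens` tokens (estimated at about 4 characters per token). The top result is always included, even if it alone exceeds the budget.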
120
+
121
+ ```ruby
122
+ context = client.search_context("ruby concurrency", max_tokens: 4000)
123
+ ```
124
+
125
+ ### Extract
126
+
127
+ ```ruby
128
+ response = client.extract(
129
+ ["https://docs.tavily.com/welcome", "https://example.com"],
130
+ extract_depth: "advanced", # "basic" (default) or "advanced"
131
+ format: "markdown", # "markdown" (default) or "text"
132
+ include_images: true,
133
+ include_usage: true
134
+ )
135
+
136
+ response.results # => [Tavily::ExtractResult, ...]
137
+ response.results.first.raw_content
138
+ response.failed_results # => [Tavily::FailedResult, ...] (url + error)
139
+ ```
140
+
141
+ A single URL string works too: `client.extract("https://example.com")`. Up to 20 URLs per request.
142
+
143
+ ### Crawl
144
+
145
+ ```ruby
146
+ response = client.crawl(
147
+ "https://docs.tavily.com",
148
+ instructions: "Find all pages about pricing", # natural-language guidance
149
+ max_depth: 2, # 1–5
150
+ max_breadth: 50, # links per level (1–500)
151
+ limit: 100, # total links to process
152
+ select_paths: ["/documentation/.*"], # regex allowlist
153
+ exclude_paths: ["/blog/.*"], # regex blocklist
154
+ extract_depth: "basic"
155
+ )
156
+
157
+ response.base_url
158
+ response.results # => [Tavily::CrawlResult, ...] (url, raw_content, favicon)
159
+ ```
160
+
161
+ ### Map
162
+
163
+ ```ruby
164
+ response = client.map("https://docs.tavily.com", max_depth: 2, limit: 100)
165
+
166
+ response.base_url
167
+ response.results # => ["https://docs.tavily.com/...", ...] (array of URLs)
168
+ ```
169
+
170
+ ### Research (asynchronous)
171
+
172
+ Research is an asynchronous endpoint. Start a task, then either poll for the result or stream it live.
173
+
174
+ **Create + poll:**
175
+
176
+ ```ruby
177
+ task = client.research(
178
+ "Compare the leading Ruby HTTP clients in 2025",
179
+ model: "mini", # "mini", "pro", or "auto" (default)
180
+ output_length: "standard" # "short", "standard", or "long"
181
+ )
182
+
183
+ task.request_id
184
+ task.status # => "pending"
185
+
186
+ # Block until it finishes (raises on failure / timeout):
187
+ result = client.wait_for_research(task.request_id, poll_interval: 3, timeout: 600)
188
+ result.content # => the final report (String, or Hash if output_schema given)
189
+ result.sources # => [Tavily::ResearchSource, ...] (title, url, favicon)
190
+
191
+ # Or check once, yourself:
192
+ client.research_task(task.request_id).completed?
193
+ ```
194
+
195
+ **Stream live** by passing a block to receive Server-Sent Events. Each `event` is a `Tavily::ResearchEvent` with an `.event` name and parsed `.data`. The stream is OpenAI-compatible: events are `chat.completion.chunk`s whose `choices[0].delta` carries either report text (`content`) or a research `tool_calls` step. The stream ends with a `done` event, or with an `error` event on failure:
196
+
197
+ ```ruby
198
+ client.research("Latest developments in fusion energy", model: "mini") do |event|
199
+ case event.event
200
+ when "chat.completion.chunk"
201
+ delta = event.data.dig("choices", 0, "delta")
202
+ print delta["content"] if delta && delta["content"]
203
+ when "error"
204
+ warn event.data["error"]
205
+ when "done"
206
+ puts "\n[done]"
207
+ end
208
+ end
209
+ ```
210
+
211
+ The gem yields every event exactly as the API sends it, so new or changed event types reach your block without a gem update; only your own `case` branches may need adjusting.
212
+
213
+ You can also request structured output with a JSON Schema:
214
+
215
+ ```ruby
216
+ client.research(
217
+ "Top 3 EVs by range in 2025",
218
+ output_schema: {
219
+ "type" => "object",
220
+ "properties" => { "cars" => { "type" => "array", "items" => { "type" => "string" } } },
221
+ "required" => ["cars"]
222
+ }
223
+ )
224
+ ```
225
+
226
+ ## Response objects
227
+
228
+ Every endpoint returns a typed object that wraps the raw JSON. Declared fields are exposed as methods, and the full payload is always reachable:
229
+
230
+ ```ruby
231
+ response = client.search("ruby")
232
+
233
+ response.results.first.title # typed accessor
234
+ response["request_id"] # raw access by key (String or Symbol)
235
+ response.dig("usage", "credits")
236
+ response.to_h # the complete parsed Hash
237
+ ```
238
+
239
+ Because access falls through to the raw hash, any new field Tavily adds is reachable immediately — and any new request parameter can be passed through as a keyword argument without waiting for a gem update:
240
+
241
+ ```ruby
242
+ client.search("ruby", some_new_param: true) # forwarded straight to the API
243
+ ```
244
+
245
+ ## Error handling
246
+
247
+ Non-2xx responses raise a subclass of `Tavily::APIError`, which carries the HTTP status, parsed body, and request id:
248
+
249
+ ```ruby
250
+ begin
251
+ client.search("ruby")
252
+ rescue Tavily::RateLimitError => e
253
+ retry_later # placeholder: your own requeue or back-off logic (not a gem method)
254
+ rescue Tavily::APIError => e
255
+ e.status # => 401
256
+ e.message # => "[401] Unauthorized: missing or invalid API key."
257
+ e.request_id # => "..." (quote this in support tickets)
258
+ e.body # => parsed response body
259
+ end
260
+ ```
261
+
262
+ | Exception | Status | Meaning |
263
+ |---|---|---|
264
+ | `Tavily::BadRequestError` | 400 | Invalid request or parameter value |
265
+ | `Tavily::AuthenticationError` | 401 | Missing or invalid API key |
266
+ | `Tavily::ForbiddenError` | 403 | Not permitted (e.g. unsupported URL) |
267
+ | `Tavily::NotFoundError` | 404 | Resource not found |
268
+ | `Tavily::UnprocessableEntityError` | 422 | Request body failed validation |
269
+ | `Tavily::RateLimitError` | 429 | Rate limit exceeded (honors `Retry-After`) |
270
+ | `Tavily::PlanLimitError` | 432 | Plan/key credit quota exceeded |
271
+ | `Tavily::PayAsYouGoLimitError` | 433 | Pay-as-you-go limit exceeded |
272
+ | `Tavily::ServerError` | 5xx | Tavily server-side error |
273
+
274
+ `Tavily::PlanLimitError` and `Tavily::PayAsYouGoLimitError` share the ancestor `Tavily::UsageLimitError`. Network problems raise `Tavily::TimeoutError` or `Tavily::ConnectionError`, and a missing key raises `Tavily::ConfigurationError`. Everything ultimately descends from `Tavily::Error`.
275
+
276
+ ## Retries
277
+
278
+ Transient failures (HTTP 408/409/425/429/5xx and network timeouts) are retried automatically up to `max_retries` times with exponential backoff and jitter. On a `429`, the `Retry-After` header is respected. Streaming research requests are not retried.
279
+
280
+ ## Windows / TLS certificates
281
+
282
+ Some Windows Ruby builds (including MSVC builds) ship without a usable default OpenSSL certificate store, which causes `certificate verify failed (unable to get local issuer certificate)`. Point the client at a CA bundle:
283
+
284
+ ```ruby
285
+ Tavily.configure { |c| c.ca_file = 'C:\path\to\cacert.pem' }
286
+ ```
287
+
288
+ Or set it for the whole process before requiring the gem:
289
+
290
+ ```bat
291
+ rem A bundle ships with Git for Windows, for example:
292
+ set SSL_CERT_FILE=C:\Program Files\Git\mingw64\etc\ssl\certs\ca-bundle.crt
293
+ ```
294
+
295
+ You can also download an up-to-date bundle from <https://curl.se/ca/cacert.pem>.
296
+
297
+ ## Development
298
+
299
+ ```sh
300
+ bin/setup # bundle install + create .env
301
+ bundle exec rake # run the specs and RuboCop
302
+ bin/console # an IRB session with the gem loaded
303
+ ```
304
+
305
+ The default `rake` task runs RSpec and RuboCop. The offline suite uses [WebMock](https://github.com/bblimke/webmock) and makes no network calls.
306
+
307
+ To run the live integration suite (consumes credits):
308
+
309
+ ```sh
310
+ TAVILY_LIVE=1 TAVILY_API_KEY=tvly-... bundle exec rspec spec/live_spec.rb
311
+ ```
312
+
313
+ ## Contributing
314
+
315
+ Bug reports and pull requests are welcome at <https://github.com/main-path/tavily>.
316
+
317
+ ## License
318
+
319
+ Released under the [MIT License](LICENSE.txt).
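The error table in the README above maps HTTP statuses to exception classes. As a rough illustration of how such a mapping can work, the sketch below rebuilds a small part of it. Everything here is assumed for the example: the `TavilySketch` namespace, the `STATUS_ERRORS` table, and `error_for` are hypothetical, and placing `UsageLimitError` directly under `APIError` is a guess based on the README's description rather than the gem's source.

```ruby
# Illustrative only: a stand-in namespace so this does not clash with the gem.
module TavilySketch
  class Error < StandardError; end

  class APIError < Error
    attr_reader :status

    def initialize(message = nil, status: nil)
      super(message)
      @status = status
    end
  end

  class RateLimitError < APIError; end
  class UsageLimitError < APIError; end # assumed parent; see lead-in
  class PlanLimitError < UsageLimitError; end
  class PayAsYouGoLimitError < UsageLimitError; end
  class ServerError < APIError; end

  STATUS_ERRORS = {
    429 => RateLimitError,
    432 => PlanLimitError,
    433 => PayAsYouGoLimitError
  }.freeze

  # Pick the most specific class for a status, falling back to APIError.
  def self.error_for(status)
    klass = STATUS_ERRORS.fetch(status) do
      (500..599).cover?(status) ? ServerError : APIError
    end
    klass.new("[#{status}] request failed", status: status)
  end
end

begin
  raise TavilySketch.error_for(432)
rescue TavilySketch::UsageLimitError => e
  # One rescue clause covers both the 432 and 433 cases.
  puts "out of credits (#{e.class.name.split('::').last}, status #{e.status})"
end
```

Rescuing the shared ancestor is the practical payoff of the hierarchy: callers that only care about "credits exhausted" do not need to list both subclasses.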
@@ -0,0 +1,276 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "configuration"
4
+ require_relative "connection"
5
+ require_relative "responses"
6
+
7
+ module Tavily
8
+ # The main entry point for talking to the Tavily API.
9
+ #
10
+ # @example
11
+ # client = Tavily::Client.new(api_key: "tvly-...")
12
+ # response = client.search("who won the 2022 world cup?", include_answer: true)
13
+ # response.answer # => "Argentina ..."
14
+ #
15
+ # Every endpoint method accepts the documented parameters as keyword
16
+ # arguments and forwards any additional keywords (`**extra`) straight into the
17
+ # request body, so newly released API parameters work without a gem upgrade.
18
+ class Client
19
+ # Default polling interval (seconds) for {#wait_for_research}.
20
+ DEFAULT_POLL_INTERVAL = 3.0
21
+ # Default overall timeout (seconds) for {#wait_for_research}.
22
+ DEFAULT_POLL_TIMEOUT = 600
23
+
24
+ # @return [Configuration] the resolved configuration for this client.
25
+ attr_reader :config
26
+
27
+ # @param api_key [String, nil] overrides {Configuration#api_key}
28
+ # @param options [Hash] any other {Configuration} attribute to override
29
+ # (e.g. +base_url:+, +timeout:+, +max_retries:+, +logger:+)
30
+ def initialize(api_key: nil, **options)
31
+ @config = build_config(api_key, options)
32
+ @connection = Connection.new(@config)
33
+ end
34
+
35
+ # Execute a web search.
36
+ #
37
+ # @param query [String] the search query (required)
38
+ # @param search_depth [String, nil] "basic", "advanced", "fast", or "ultra-fast"
39
+ # @param topic [String, nil] "general", "news", or "finance"
40
+ # @param max_results [Integer, nil] number of results (0–20, default 5)
41
+ # @param chunks_per_source [Integer, nil] 1–3, advanced depth only
42
+ # @param time_range [String, nil] "day"/"week"/"month"/"year" (or d/w/m/y)
43
+ # @param days [Integer, nil] limit to the last N days (news topic)
44
+ # @param start_date [String, nil] "YYYY-MM-DD"
45
+ # @param end_date [String, nil] "YYYY-MM-DD"
46
+ # @param include_answer [Boolean, String, nil] true/false, "basic", or "advanced"
47
+ # @param include_raw_content [Boolean, String, nil] true/false, "markdown", or "text"
48
+ # @param include_images [Boolean, nil]
49
+ # @param include_image_descriptions [Boolean, nil] requires +include_images: true+
50
+ # @param include_favicon [Boolean, nil]
51
+ # @param include_domains [Array<String>, nil] up to 300 domains
52
+ # @param exclude_domains [Array<String>, nil] up to 150 domains
53
+ # @param country [String, nil] boost results from a country (general topic)
54
+ # @param auto_parameters [Boolean, nil] let Tavily choose parameters
55
+ # @param include_usage [Boolean, nil] include credit usage in the response
56
+ # @param extra [Hash] any additional request-body parameters
57
+ # @return [SearchResponse]
58
+ def search(query, search_depth: nil, topic: nil, max_results: nil, chunks_per_source: nil,
59
+ time_range: nil, days: nil, start_date: nil, end_date: nil, include_answer: nil,
60
+ include_raw_content: nil, include_images: nil, include_image_descriptions: nil,
61
+ include_favicon: nil, include_domains: nil, exclude_domains: nil, country: nil,
62
+ auto_parameters: nil, include_usage: nil, **extra)
63
+ body = {
64
+ query: query, search_depth: search_depth, topic: topic, max_results: max_results,
65
+ chunks_per_source: chunks_per_source, time_range: time_range, days: days,
66
+ start_date: start_date, end_date: end_date, include_answer: include_answer,
67
+ include_raw_content: include_raw_content, include_images: include_images,
68
+ include_image_descriptions: include_image_descriptions, include_favicon: include_favicon,
69
+ include_domains: include_domains, exclude_domains: exclude_domains, country: country,
70
+ auto_parameters: auto_parameters, include_usage: include_usage
71
+ }.merge(extra)
72
+ SearchResponse.new(@connection.post("/search", body))
73
+ end
74
+
75
+ # Convenience: run a search and return only the generated answer string.
76
+ # @param query [String]
77
+ # @param options [Hash] forwarded to {#search}
78
+ # @return [String, nil] the answer, or nil if none was produced
79
+ def qna_search(query, **options)
80
+ options[:include_answer] = true unless options.key?(:include_answer)
81
+ search(query, **options).answer
82
+ end
83
+
84
+ # Convenience: run a search and return a compact context string suitable for
85
+ # RAG prompts: a JSON array of {url, content} objects trimmed to roughly
86
+ # +max_tokens+ tokens (estimated at ~4 characters per token). The top result
87
+ # is always included, even if it alone exceeds the budget.
88
+ # @param query [String]
89
+ # @param max_tokens [Integer] approximate token budget for the context
90
+ # @param options [Hash] forwarded to {#search}
91
+ # @return [String] JSON string of [{ "url" =>, "content" => }, ...]
92
+ def search_context(query, max_tokens: 4000, **options)
93
+ budget = max_tokens * 4
94
+ used = 0
95
+ sources = []
96
+ search(query, **options).results.each do |result|
97
+ entry = { "url" => result.url, "content" => result.content }
98
+ sources << entry
99
+ used += JSON.generate(entry).length
100
+ break if used >= budget
101
+ end
102
+ JSON.generate(sources)
103
+ end
104
+
105
+ # Extract clean content from one or more URLs.
106
+ #
107
+ # @param urls [String, Array<String>] one URL or an array (max 20)
108
+ # @param query [String, nil] rerank extracted chunks by this intent
109
+ # @param chunks_per_source [Integer, nil] 1–5, only with +query+
110
+ # @param extract_depth [String, nil] "basic" or "advanced"
111
+ # @param include_images [Boolean, nil]
112
+ # @param include_favicon [Boolean, nil]
113
+ # @param format [String, nil] "markdown" or "text"
114
+ # @param timeout [Numeric, nil] per-request extraction timeout (1.0–60.0s)
115
+ # @param include_usage [Boolean, nil]
116
+ # @param extra [Hash] any additional request-body parameters
117
+ # @return [ExtractResponse]
118
+ def extract(urls, query: nil, chunks_per_source: nil, extract_depth: nil, include_images: nil,
119
+ include_favicon: nil, format: nil, timeout: nil, include_usage: nil, **extra)
120
+ body = {
121
+ urls: urls, query: query, chunks_per_source: chunks_per_source, extract_depth: extract_depth,
122
+ include_images: include_images, include_favicon: include_favicon, format: format,
123
+ timeout: timeout, include_usage: include_usage
124
+ }.merge(extra)
125
+ ExtractResponse.new(@connection.post("/extract", body))
126
+ end
127
+
128
+ # Crawl a site starting from a root URL, following links.
129
+ #
130
+ # @param url [String] root URL (required)
131
+ # @param instructions [String, nil] natural-language crawl guidance
132
+ # @param chunks_per_source [Integer, nil] 1–5, only with +instructions+
133
+ # @param max_depth [Integer, nil] 1–5
134
+ # @param max_breadth [Integer, nil] 1–500 links per level
135
+ # @param limit [Integer, nil] total links to process
136
+ # @param select_paths [Array<String>, nil] regex path allowlist
137
+ # @param select_domains [Array<String>, nil] regex domain allowlist
138
+ # @param exclude_paths [Array<String>, nil] regex path blocklist
139
+ # @param exclude_domains [Array<String>, nil] regex domain blocklist
140
+ # @param allow_external [Boolean, nil] follow external domains (default true)
141
+ # @param include_images [Boolean, nil]
142
+ # @param extract_depth [String, nil] "basic" or "advanced"
143
+ # @param format [String, nil] "markdown" or "text"
144
+ # @param include_favicon [Boolean, nil]
145
+ # @param timeout [Numeric, nil] 10–150s
146
+ # @param include_usage [Boolean, nil]
147
+ # @param extra [Hash] any additional request-body parameters
148
+ # @return [CrawlResponse]
149
+ def crawl(url, instructions: nil, chunks_per_source: nil, max_depth: nil, max_breadth: nil,
150
+ limit: nil, select_paths: nil, select_domains: nil, exclude_paths: nil,
151
+ exclude_domains: nil, allow_external: nil, include_images: nil, extract_depth: nil,
152
+ format: nil, include_favicon: nil, timeout: nil, include_usage: nil, **extra)
153
+ body = {
154
+ url: url, instructions: instructions, chunks_per_source: chunks_per_source,
155
+ max_depth: max_depth, max_breadth: max_breadth, limit: limit, select_paths: select_paths,
156
+ select_domains: select_domains, exclude_paths: exclude_paths, exclude_domains: exclude_domains,
157
+ allow_external: allow_external, include_images: include_images, extract_depth: extract_depth,
158
+ format: format, include_favicon: include_favicon, timeout: timeout, include_usage: include_usage
159
+ }.merge(extra)
160
+ CrawlResponse.new(@connection.post("/crawl", body))
161
+ end
162
+
163
+ # Map the structure of a site, returning discovered URLs.
164
+ #
165
+ # @param url [String] root URL (required)
166
+ # @param instructions [String, nil] natural-language mapping guidance
167
+ # @param max_depth [Integer, nil] 1–5
168
+ # @param max_breadth [Integer, nil] 1–500
169
+ # @param limit [Integer, nil] 1–500
170
+ # @param select_paths [Array<String>, nil] regex path allowlist
171
+ # @param select_domains [Array<String>, nil] regex domain allowlist
172
+ # @param exclude_paths [Array<String>, nil] regex path blocklist
173
+ # @param exclude_domains [Array<String>, nil] regex domain blocklist
174
+ # @param allow_external [Boolean, nil] follow external domains (default true)
175
+ # @param timeout [Numeric, nil] 10–150s
176
+ # @param include_usage [Boolean, nil]
177
+ # @param extra [Hash] any additional request-body parameters
178
+ # @return [MapResponse]
179
+ def map(url, instructions: nil, max_depth: nil, max_breadth: nil, limit: nil, select_paths: nil,
180
+ select_domains: nil, exclude_paths: nil, exclude_domains: nil, allow_external: nil,
181
+ timeout: nil, include_usage: nil, **extra)
182
+ body = {
183
+ url: url, instructions: instructions, max_depth: max_depth, max_breadth: max_breadth,
184
+ limit: limit, select_paths: select_paths, select_domains: select_domains,
185
+ exclude_paths: exclude_paths, exclude_domains: exclude_domains, allow_external: allow_external,
186
+ timeout: timeout, include_usage: include_usage
187
+ }.merge(extra)
188
+ MapResponse.new(@connection.post("/map", body))
189
+ end
190
+
191
+ # Start an asynchronous research task, or stream it live.
192
+ #
193
+ # Without a block this creates the task and returns immediately with a
194
+ # {ResearchTask} whose +status+ is "pending"; poll it with
195
+ # {#research_task} or {#wait_for_research}. With a block, the task is
196
+ # streamed and each Server-Sent Event is yielded as a {ResearchEvent}.
197
+ #
198
+ # @param input [String] the research question (required)
199
+ # @param model [String, nil] "mini", "pro", or "auto"
200
+ # @param output_schema [Hash, nil] JSON Schema for structured output
201
+ # @param citation_format [String, nil] "numbered", "mla", "apa", or "chicago"
202
+ # @param include_domains [Array<String>, nil] up to 20 preferred domains
203
+ # @param exclude_domains [Array<String>, nil] up to 20 blocked domains
204
+ # @param output_length [String, nil] "short", "standard", or "long"
205
+ # @param files [Array<Hash>, nil] up to 5 base64 file objects
206
+ # @param extra [Hash] any additional request-body parameters
207
+ # @yieldparam event [ResearchEvent] (streaming mode only)
208
+ # @return [ResearchTask, nil] the queued task, or nil when streaming
209
+ def research(input, model: nil, output_schema: nil, citation_format: nil, include_domains: nil,
210
+ exclude_domains: nil, output_length: nil, files: nil, **extra, &block)
211
+ body = {
212
+ input: input, model: model, output_schema: output_schema, citation_format: citation_format,
213
+ include_domains: include_domains, exclude_domains: exclude_domains,
214
+ output_length: output_length, files: files
215
+ }.merge(extra)
216
+
217
+ if block
218
+ @connection.stream("/research", body.merge(stream: true), &block)
219
+ nil
220
+ else
221
+ ResearchTask.new(@connection.post("/research", body))
222
+ end
223
+ end
224
+
225
+ # Fetch the current status and (when finished) result of a research task.
226
+ # @param request_id [String]
227
+ # @return [ResearchTask]
228
+ def research_task(request_id)
229
+ ResearchTask.new(@connection.get("/research/#{request_id}"))
230
+ end
231
+
232
+ # Poll a research task until it completes or fails.
233
+ #
234
+ # @param request_id [String]
235
+ # @param poll_interval [Numeric] seconds between polls
236
+ # @param timeout [Numeric] overall timeout in seconds
237
+ # @yieldparam task [ResearchTask] optional progress callback on each poll
238
+ # @raise [Tavily::Error, Tavily::TimeoutError] Error if the task fails, TimeoutError if the timeout is exceeded
239
+ # @return [ResearchTask] the completed task
240
+ def wait_for_research(request_id, poll_interval: DEFAULT_POLL_INTERVAL,
241
+ timeout: DEFAULT_POLL_TIMEOUT)
242
+ deadline = monotonic_now + timeout
243
+ loop do
244
+ task = research_task(request_id)
245
+ yield task if block_given?
246
+ return task if task.completed?
247
+ raise Error, "Research task #{request_id} failed" if task.failed?
248
+
249
+ if monotonic_now >= deadline
250
+ raise TimeoutError,
251
+ "Research task #{request_id} did not complete within #{timeout}s"
252
+ end
253
+
254
+ sleep(poll_interval)
255
+ end
256
+ end
257
+
258
+ private
259
+
260
+ def build_config(api_key, options)
261
+ cfg = Tavily.configuration.dup
262
+ cfg.api_key = api_key unless api_key.nil?
263
+ options.each do |key, value|
264
+ setter = "#{key}="
265
+ raise ConfigurationError, "Unknown configuration option: #{key.inspect}" unless cfg.respond_to?(setter)
266
+
267
+ cfg.public_send(setter, value)
268
+ end
269
+ cfg
270
+ end
271
+
272
+ def monotonic_now
273
+ Process.clock_gettime(Process::CLOCK_MONOTONIC)
274
+ end
275
+ end
276
+ end
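The budget loop in `search_context` above is easy to misread, so here it is as a stand-alone sketch that runs without the gem or a network call. `build_context` and the sample data are hypothetical; the loop body mirrors the method shown in the diff, with plain hashes standing in for `SearchResult` objects.

```ruby
require "json"

# Stand-alone version of the budget loop from search_context.
# Hypothetical helper: takes plain hashes instead of SearchResult objects.
def build_context(results, max_tokens: 4000)
  budget = max_tokens * 4 # rough estimate: ~4 characters per token
  used = 0
  sources = []
  results.each do |result|
    entry = { "url" => result.fetch(:url), "content" => result.fetch(:content) }
    # Appended before the budget check, so the first result always survives
    # and the entry that crosses the budget is still kept.
    sources << entry
    used += JSON.generate(entry).length
    break if used >= budget
  end
  JSON.generate(sources)
end

sample = [
  { url: "https://example.com/a", content: "a" * 60 },
  { url: "https://example.com/b", content: "b" * 60 },
  { url: "https://example.com/c", content: "c" * 60 }
]

# A 30-token budget is 120 characters; each serialized entry is a bit over 100.
context = build_context(sample, max_tokens: 30)
puts JSON.parse(context).length
```

Because the check runs after the append, the output can overshoot the budget by up to one entry, which is why the documentation says "roughly" `max_tokens`.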