tavily 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +30 -0
- data/LICENSE.txt +21 -0
- data/README.md +319 -0
- data/lib/tavily/client.rb +276 -0
- data/lib/tavily/configuration.rb +69 -0
- data/lib/tavily/connection.rb +298 -0
- data/lib/tavily/errors.rb +95 -0
- data/lib/tavily/object.rb +80 -0
- data/lib/tavily/responses.rb +197 -0
- data/lib/tavily/version.rb +5 -0
- data/lib/tavily.rb +78 -0
- data/sig/tavily/client.rbs +107 -0
- data/sig/tavily/configuration.rbs +23 -0
- data/sig/tavily/connection.rbs +12 -0
- data/sig/tavily/errors.rbs +57 -0
- data/sig/tavily/object.rbs +18 -0
- data/sig/tavily/responses.rbs +108 -0
- data/sig/tavily.rbs +19 -0
- metadata +65 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: a7b9e2b04dccf49bff52ed85008cc143ec009b95f289efed38cd230c165ee2fa
+  data.tar.gz: '089bd278c94dcf7ae30976a4e596dd36b7e462efda363369030592df105b0844'
+SHA512:
+  metadata.gz: 1943c54bdfa88cbc213565714c523523bf389ba3319d18ba3f8a9b4a38cb34a07151d6a5691c8f7bbb935171817265a9744b30b9a9cd6a9c1e92c9b76538bbd8
+  data.tar.gz: 0a5c3869a3493867a68ddfe5c0c7c7c463492cbfa2391e61c77a2f457e793349d1e3539001946ada94b618ad76a103f951db15088e80a3d31b0bb79cc9e1dd14
data/CHANGELOG.md
ADDED
@@ -0,0 +1,30 @@
+# Changelog
+
+All notable changes to this project are documented here. The format is based on
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
+adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+## [0.1.0] - 2026-06-02
+
+### Added
+- Initial release.
+- `Tavily::Client` covering the Tavily REST API:
+  - `#search` (plus `#qna_search` and `#search_context` helpers)
+  - `#extract`
+  - `#crawl`
+  - `#map`
+  - `#research`, `#research_task`, `#wait_for_research`, and streaming research
+    via a block.
+- Typed response objects (`SearchResponse`, `ExtractResponse`, `CrawlResponse`,
+  `MapResponse`, `ResearchTask`, and friends) with raw-hash access preserved.
+- Global configuration (`Tavily.configure`) and per-client overrides.
+- Automatic retries with exponential backoff for transient failures, honoring
+  the `Retry-After` header on `429` responses.
+- Granular error hierarchy, including Tavily's non-standard `432` (plan limit)
+  and `433` (pay-as-you-go limit) status codes.
+- Built entirely on the Ruby standard library, no runtime dependencies.
+
+[Unreleased]: https://github.com/main-path/tavily/compare/v0.1.0...HEAD
+[0.1.0]: https://github.com/main-path/tavily/releases/tag/v0.1.0
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2026 ned
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
|
@@ -0,0 +1,319 @@
|
|
|
1
|
+
# Tavily
|
|
2
|
+
|
|
3
|
+
[CI status](https://github.com/main-path/tavily/actions/workflows/ci.yml)
|
|
4
|
+
[tavily on RubyGems](https://rubygems.org/gems/tavily)
|
|
5
|
+
|
|
6
|
+
A lightweight, dependency-free Ruby client for the [Tavily](https://docs.tavily.com/welcome) API — the web-access layer built for LLMs and AI agents.
|
|
7
|
+
|
|
8
|
+
It wraps every Tavily endpoint with typed response objects, automatic retries, streaming research, and a granular error hierarchy:
|
|
9
|
+
|
|
10
|
+
- **Search** — `search`, plus the `qna_search` and `search_context` helpers
|
|
11
|
+
- **Extract** — clean content from one or many URLs
|
|
12
|
+
- **Crawl** — follow links from a root URL
|
|
13
|
+
- **Map** — discover a site's URL structure
|
|
14
|
+
- **Research** — start, poll, or live-stream an asynchronous research task
|
|
15
|
+
|
|
16
|
+
Built entirely on the Ruby standard library (`net/http`) — **no runtime dependencies**.
|
|
17
|
+
|
|
18
|
+
## Installation
|
|
19
|
+
|
|
20
|
+
Add it to your Gemfile:
|
|
21
|
+
|
|
22
|
+
```ruby
|
|
23
|
+
gem "tavily"
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
Then run `bundle install`. Or install it directly:
|
|
27
|
+
|
|
28
|
+
```sh
|
|
29
|
+
gem install tavily
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
Requires Ruby 3.1+.
|
|
33
|
+
|
|
34
|
+
## Quick start
|
|
35
|
+
|
|
36
|
+
```ruby
|
|
37
|
+
require "tavily"
|
|
38
|
+
|
|
39
|
+
client = Tavily::Client.new(api_key: "tvly-YOUR_API_KEY")
|
|
40
|
+
|
|
41
|
+
response = client.search("who won the 2022 FIFA World Cup?", include_answer: true)
|
|
42
|
+
|
|
43
|
+
response.answer # => "Argentina won the 2022 FIFA World Cup..."
|
|
44
|
+
response.results.first.url # => "https://en.wikipedia.org/wiki/..."
|
|
45
|
+
response.results.first.title
|
|
46
|
+
response.credits # credit usage, when include_usage: true
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
Get your API key from the [Tavily dashboard](https://app.tavily.com).
|
|
50
|
+
|
|
51
|
+
## Configuration
|
|
52
|
+
|
|
53
|
+
You can configure a global default (used by `Tavily.search` and friends, and as the default for every new client):
|
|
54
|
+
|
|
55
|
+
```ruby
|
|
56
|
+
Tavily.configure do |config|
|
|
57
|
+
config.api_key = ENV["TAVILY_API_KEY"] # default: ENV["TAVILY_API_KEY"]
|
|
58
|
+
config.base_url = "https://api.tavily.com"
|
|
59
|
+
config.timeout = 60 # read timeout (seconds)
|
|
60
|
+
config.open_timeout = 10 # connection-open timeout (seconds)
|
|
61
|
+
config.max_retries = 2 # automatic retries for transient failures
|
|
62
|
+
config.retry_base_delay = 0.5 # base seconds for exponential backoff
|
|
63
|
+
config.proxy = nil # "http://user:pass@host:port"
|
|
64
|
+
config.ca_file = nil # path to a PEM CA bundle (see "Windows / TLS")
|
|
65
|
+
config.logger = Logger.new($stdout) # optional request logging
|
|
66
|
+
end
|
|
67
|
+
|
|
68
|
+
Tavily.search("latest ruby release")
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
Every option can also be overridden per client:
|
|
72
|
+
|
|
73
|
+
```ruby
|
|
74
|
+
client = Tavily::Client.new(api_key: "tvly-...", timeout: 120, max_retries: 5)
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
The following environment variables are read automatically: `TAVILY_API_KEY`, `TAVILY_BASE_URL`, `TAVILY_TIMEOUT`, `TAVILY_MAX_RETRIES`, `TAVILY_PROXY`, and `SSL_CERT_FILE`.
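
As a rough illustration of how environment defaults and per-client overrides can fit together, here is a minimal sketch. `DemoConfig`, `config_from_env`, and `with_overrides` are hypothetical names invented for this example, and the defaults are the ones shown in the configuration block above. The gem's own `Tavily::Configuration` may resolve values differently.

```ruby
# Hypothetical stand-in for a configuration object (not the gem's class).
DemoConfig = Struct.new(:api_key, :base_url, :timeout, :max_retries, keyword_init: true)

# Build a config from an environment-like hash, falling back to defaults.
def config_from_env(env = ENV)
  DemoConfig.new(
    api_key: env["TAVILY_API_KEY"],
    base_url: env.fetch("TAVILY_BASE_URL", "https://api.tavily.com"),
    timeout: Float(env.fetch("TAVILY_TIMEOUT", 60)),
    max_retries: Integer(env.fetch("TAVILY_MAX_RETRIES", 2))
  )
end

# Per-client overrides: copy the defaults, then apply only the given keys.
# Struct#[]= raises NameError for a key that is not a known option.
def with_overrides(config, **overrides)
  copy = config.dup
  overrides.each { |key, value| copy[key] = value }
  copy
end
```

Copying before applying overrides keeps the shared defaults untouched, so one client's `max_retries: 5` does not leak into another client.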
|
|
78
|
+
|
|
79
|
+
## Endpoints
|
|
80
|
+
|
|
81
|
+
### Search
|
|
82
|
+
|
|
83
|
+
```ruby
|
|
84
|
+
response = client.search(
|
|
85
|
+
"embedded systems news",
|
|
86
|
+
topic: "news", # "general" (default), "news", or "finance"
|
|
87
|
+
search_depth: "advanced", # "basic" (default), "advanced", "fast", "ultra-fast"
|
|
88
|
+
max_results: 10, # 0–20 (default 5)
|
|
89
|
+
time_range: "week", # "day" | "week" | "month" | "year"
|
|
90
|
+
include_answer: "advanced", # true/false, "basic", or "advanced"
|
|
91
|
+
include_raw_content: "markdown",
|
|
92
|
+
include_images: true,
|
|
93
|
+
include_domains: ["arxiv.org"],
|
|
94
|
+
exclude_domains: ["example.com"],
|
|
95
|
+
include_usage: true
|
|
96
|
+
)
|
|
97
|
+
|
|
98
|
+
response.query
|
|
99
|
+
response.answer
|
|
100
|
+
response.results # => [Tavily::SearchResult, ...]
|
|
101
|
+
response.results.first.title
|
|
102
|
+
response.results.first.content
|
|
103
|
+
response.results.first.score
|
|
104
|
+
response.urls # => convenience array of result URLs
|
|
105
|
+
response.images # => [Tavily::Image, ...] (url + optional description)
|
|
106
|
+
response.credits # => Integer (when include_usage: true)
|
|
107
|
+
response.request_id
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
#### `qna_search` — just the answer
|
|
111
|
+
|
|
112
|
+
```ruby
|
|
113
|
+
client.qna_search("what is the capital of France?")
|
|
114
|
+
# => "Paris is the capital of France."
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
#### `search_context` — a RAG-ready context string
|
|
118
|
+
|
|
119
|
+
Returns a JSON string of `[{ "url" =>, "content" => }, ...]`, trimmed to roughly `max_tokens` tokens.
|
|
120
|
+
|
|
121
|
+
```ruby
|
|
122
|
+
context = client.search_context("ruby concurrency", max_tokens: 4000)
|
|
123
|
+
```
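
The trimming rule can be sketched without the gem. The loop below follows the same budget logic as `search_context` in `lib/tavily/client.rb` (about 4 characters per token, stopping once the budget is reached), but `build_context` and its plain-hash input are hypothetical stand-ins for illustration.

```ruby
require "json"

# Standalone sketch: `results` is an array of hashes with "url" and "content"
# keys (an assumed input shape, not the gem's SearchResult objects).
def build_context(results, max_tokens: 4000)
  budget = max_tokens * 4 # estimate: ~4 characters per token
  used = 0
  sources = []
  results.each do |result|
    entry = { "url" => result["url"], "content" => result["content"] }
    sources << entry                     # the entry is added before the check,
    used += JSON.generate(entry).length  # so the first result is always kept
    break if used >= budget
  end
  JSON.generate(sources)
end
```

Because the entry is appended before the budget check, the output can run slightly over `max_tokens`. Treat the value as an approximate target, not a hard cap.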
|
|
124
|
+
|
|
125
|
+
### Extract
|
|
126
|
+
|
|
127
|
+
```ruby
|
|
128
|
+
response = client.extract(
|
|
129
|
+
["https://docs.tavily.com/welcome", "https://example.com"],
|
|
130
|
+
extract_depth: "advanced", # "basic" (default) or "advanced"
|
|
131
|
+
format: "markdown", # "markdown" (default) or "text"
|
|
132
|
+
include_images: true,
|
|
133
|
+
include_usage: true
|
|
134
|
+
)
|
|
135
|
+
|
|
136
|
+
response.results # => [Tavily::ExtractResult, ...]
|
|
137
|
+
response.results.first.raw_content
|
|
138
|
+
response.failed_results # => [Tavily::FailedResult, ...] (url + error)
|
|
139
|
+
```
|
|
140
|
+
|
|
141
|
+
A single URL string works too: `client.extract("https://example.com")`. Up to 20 URLs per request.
|
|
142
|
+
|
|
143
|
+
### Crawl
|
|
144
|
+
|
|
145
|
+
```ruby
|
|
146
|
+
response = client.crawl(
|
|
147
|
+
"https://docs.tavily.com",
|
|
148
|
+
instructions: "Find all pages about pricing", # natural-language guidance
|
|
149
|
+
max_depth: 2, # 1–5
|
|
150
|
+
max_breadth: 50, # links per level (1–500)
|
|
151
|
+
limit: 100, # total links to process
|
|
152
|
+
select_paths: ["/documentation/.*"], # regex allowlist
|
|
153
|
+
exclude_paths: ["/blog/.*"], # regex blocklist
|
|
154
|
+
extract_depth: "basic"
|
|
155
|
+
)
|
|
156
|
+
|
|
157
|
+
response.base_url
|
|
158
|
+
response.results # => [Tavily::CrawlResult, ...] (url, raw_content, favicon)
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
### Map
|
|
162
|
+
|
|
163
|
+
```ruby
|
|
164
|
+
response = client.map("https://docs.tavily.com", max_depth: 2, limit: 100)
|
|
165
|
+
|
|
166
|
+
response.base_url
|
|
167
|
+
response.results # => ["https://docs.tavily.com/...", ...] (array of URLs)
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
### Research (asynchronous)
|
|
171
|
+
|
|
172
|
+
Research is an asynchronous endpoint. Start a task, then either poll for the result or stream it live.
|
|
173
|
+
|
|
174
|
+
**Create + poll:**
|
|
175
|
+
|
|
176
|
+
```ruby
|
|
177
|
+
task = client.research(
|
|
178
|
+
"Compare the leading Ruby HTTP clients in 2025",
|
|
179
|
+
model: "mini", # "mini", "pro", or "auto" (default)
|
|
180
|
+
output_length: "standard" # "short", "standard", or "long"
|
|
181
|
+
)
|
|
182
|
+
|
|
183
|
+
task.request_id
|
|
184
|
+
task.status # => "pending"
|
|
185
|
+
|
|
186
|
+
# Block until it finishes (raises on failure / timeout):
|
|
187
|
+
result = client.wait_for_research(task.request_id, poll_interval: 3, timeout: 600)
|
|
188
|
+
result.content # => the final report (String, or Hash if output_schema given)
|
|
189
|
+
result.sources # => [Tavily::ResearchSource, ...] (title, url, favicon)
|
|
190
|
+
|
|
191
|
+
# Or check once, yourself:
|
|
192
|
+
client.research_task(task.request_id).completed?
|
|
193
|
+
```
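
For readers who want to see the shape of the polling loop, here is a generic sketch modelled on `wait_for_research`. `poll_until_done` and its `fetch`/`sleeper` arguments are hypothetical, and the `"completed"` and `"failed"` status strings are assumptions inferred from the `completed?` and `failed?` predicates.

```ruby
# `fetch` is any callable returning the current status string;
# `sleeper` is injectable so the loop can be exercised without real waiting.
def poll_until_done(fetch, poll_interval: 3.0, timeout: 600, sleeper: method(:sleep))
  # A monotonic clock is immune to wall-clock adjustments during the wait.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    status = fetch.call
    return status if status == "completed"
    raise "task failed" if status == "failed"
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "timed out after #{timeout}s"
    end

    sleeper.call(poll_interval)
  end
end
```

The deadline is checked after each poll, so at least one status check always happens, even with `timeout: 0`.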
|
|
194
|
+
|
|
195
|
+
**Stream live** (pass a block to receive Server-Sent Events). Each `event` is a `Tavily::ResearchEvent` with an `.event` name and parsed `.data`. The stream is OpenAI-compatible: events are `chat.completion.chunk`s whose `choices/0/delta` carries either report text (`content`) or a research `tool_calls` step, ending with a `done` event (or an `error` event on failure):
|
|
196
|
+
|
|
197
|
+
```ruby
|
|
198
|
+
client.research("Latest developments in fusion energy", model: "mini") do |event|
|
|
199
|
+
case event.event
|
|
200
|
+
when "chat.completion.chunk"
|
|
201
|
+
delta = event.data.dig("choices", 0, "delta")
|
|
202
|
+
print delta["content"] if delta && delta["content"]
|
|
203
|
+
when "error"
|
|
204
|
+
warn event.data["error"]
|
|
205
|
+
when "done"
|
|
206
|
+
puts "\n[done]"
|
|
207
|
+
end
|
|
208
|
+
end
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
The gem yields every event exactly as the API sends it, so it keeps working even if Tavily changes the event schema.
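
To make the pass-through idea concrete, here is a small sketch of turning Server-Sent Events text into events. `DemoEvent` and `parse_sse` are hypothetical names, and the sketch assumes the event name arrives on an `event:` line. The gem's own parser in `lib/tavily/connection.rb` is not shown in this section and may work differently.

```ruby
require "json"

# Hypothetical stand-in for Tavily::ResearchEvent.
DemoEvent = Struct.new(:event, :data)

# Parse a complete SSE payload (frames separated by blank lines).
def parse_sse(text)
  text.split(/\r?\n\r?\n/).filter_map do |frame|
    name = "message" # SSE default when no `event:` line is present
    data_lines = []
    frame.each_line(chomp: true) do |line|
      next if line.empty? || line.start_with?(":") # comments and keep-alives

      field, value = line.split(":", 2)
      value = value.to_s.delete_prefix(" ")
      case field
      when "event" then name = value
      when "data"  then data_lines << value
      end
    end
    next if data_lines.empty?

    payload = data_lines.join("\n")
    data = begin
      JSON.parse(payload)
    rescue JSON::ParserError
      payload # leave non-JSON payloads untouched
    end
    DemoEvent.new(name, data)
  end
end
```

Nothing in the parser inspects the payload's keys, which is why a schema change upstream would not break it. Only the consuming block needs to know what a `delta` looks like.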
|
|
212
|
+
|
|
213
|
+
You can also request structured output with a JSON Schema:
|
|
214
|
+
|
|
215
|
+
```ruby
|
|
216
|
+
client.research(
|
|
217
|
+
"Top 3 EVs by range in 2025",
|
|
218
|
+
output_schema: {
|
|
219
|
+
"type" => "object",
|
|
220
|
+
"properties" => { "cars" => { "type" => "array", "items" => { "type" => "string" } } },
|
|
221
|
+
"required" => ["cars"]
|
|
222
|
+
}
|
|
223
|
+
)
|
|
224
|
+
```
|
|
225
|
+
|
|
226
|
+
## Response objects
|
|
227
|
+
|
|
228
|
+
Every endpoint returns a typed object that wraps the raw JSON. Declared fields are exposed as methods, and the full payload is always reachable:
|
|
229
|
+
|
|
230
|
+
```ruby
|
|
231
|
+
response = client.search("ruby")
|
|
232
|
+
|
|
233
|
+
response.results.first.title # typed accessor
|
|
234
|
+
response["request_id"] # raw access by key (String or Symbol)
|
|
235
|
+
response.dig("usage", "credits")
|
|
236
|
+
response.to_h # the complete parsed Hash
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
Because access falls through to the raw hash, any new field Tavily adds is reachable immediately — and any new request parameter can be passed through as a keyword argument without waiting for a gem update:
|
|
240
|
+
|
|
241
|
+
```ruby
|
|
242
|
+
client.search("ruby", some_new_param: true) # forwarded straight to the API
|
|
243
|
+
```
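
One way to get typed accessors with raw-hash fallthrough is sketched below. `RawBackedObject`, `field`, and `DemoResponse` are hypothetical names for illustration. The gem's real base class (`lib/tavily/object.rb`) is not shown in this section and may be built differently.

```ruby
# Hypothetical wrapper: declared fields become methods, everything else
# stays reachable through the underlying parsed hash.
class RawBackedObject
  def self.field(*names)
    names.each { |name| define_method(name) { @raw[name.to_s] } }
  end

  def initialize(raw)
    @raw = raw
  end

  # Raw access by String or Symbol key.
  def [](key)
    @raw[key.to_s]
  end

  # Symbols are normalised to strings; integer indexes pass through.
  def dig(*keys)
    @raw.dig(*keys.map { |k| k.is_a?(Symbol) ? k.to_s : k })
  end

  def to_h
    @raw
  end
end

class DemoResponse < RawBackedObject
  field :query, :request_id
end

response = DemoResponse.new(
  "query" => "ruby",
  "request_id" => "req-123",
  "usage" => { "credits" => 1 },
  "brand_new_field" => "still reachable"
)
```

Here `brand_new_field` has no declared accessor, yet `response["brand_new_field"]` still returns it, which is the property the paragraph above describes.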
|
|
244
|
+
|
|
245
|
+
## Error handling
|
|
246
|
+
|
|
247
|
+
Non-2xx responses raise a subclass of `Tavily::APIError`, which carries the HTTP status, parsed body, and request id:
|
|
248
|
+
|
|
249
|
+
```ruby
|
|
250
|
+
begin
|
|
251
|
+
client.search("ruby")
|
|
252
|
+
rescue Tavily::RateLimitError => e
|
|
253
|
+
retry_later
|
|
254
|
+
rescue Tavily::APIError => e
|
|
255
|
+
e.status # => 401
|
|
256
|
+
e.message # => "[401] Unauthorized: missing or invalid API key."
|
|
257
|
+
e.request_id # => "..." (quote this in support tickets)
|
|
258
|
+
e.body # => parsed response body
|
|
259
|
+
end
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
| Exception | Status | Meaning |
|
|
263
|
+
|---|---|---|
|
|
264
|
+
| `Tavily::BadRequestError` | 400 | Invalid request or parameter value |
|
|
265
|
+
| `Tavily::AuthenticationError` | 401 | Missing or invalid API key |
|
|
266
|
+
| `Tavily::ForbiddenError` | 403 | Not permitted (e.g. unsupported URL) |
|
|
267
|
+
| `Tavily::NotFoundError` | 404 | Resource not found |
|
|
268
|
+
| `Tavily::UnprocessableEntityError` | 422 | Request body failed validation |
|
|
269
|
+
| `Tavily::RateLimitError` | 429 | Rate limit exceeded (honors `Retry-After`) |
|
|
270
|
+
| `Tavily::PlanLimitError` | 432 | Plan/key credit quota exceeded |
|
|
271
|
+
| `Tavily::PayAsYouGoLimitError` | 433 | Pay-as-you-go limit exceeded |
|
|
272
|
+
| `Tavily::ServerError` | 5xx | Tavily server-side error |
|
|
273
|
+
|
|
274
|
+
`Tavily::PlanLimitError` and `Tavily::PayAsYouGoLimitError` share the ancestor `Tavily::UsageLimitError`. Network problems raise `Tavily::TimeoutError` or `Tavily::ConnectionError`, and a missing key raises `Tavily::ConfigurationError`. Everything ultimately descends from `Tavily::Error`.
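
The practical upshot of the hierarchy is that rescue clauses can be ordered from specific to general. The sketch below uses hypothetical stand-in classes in a `DemoErrors` module so it runs without the gem. It assumes `UsageLimitError` itself descends from `APIError`, which the text above implies but does not state outright.

```ruby
# Stand-ins for part of the hierarchy in the table above (not the gem's code).
module DemoErrors
  class Error < StandardError; end

  class APIError < Error
    attr_reader :status

    def initialize(message = nil, status: nil)
      super(message)
      @status = status
    end
  end

  class RateLimitError < APIError; end
  class UsageLimitError < APIError; end # assumed parent, see lead-in
  class PlanLimitError < UsageLimitError; end
  class PayAsYouGoLimitError < UsageLimitError; end
  class ServerError < APIError; end

  BY_STATUS = {
    429 => RateLimitError, 432 => PlanLimitError, 433 => PayAsYouGoLimitError
  }.freeze

  # Map an HTTP status to an error class, defaulting to APIError.
  def self.for_status(status)
    BY_STATUS.fetch(status) { status >= 500 ? ServerError : APIError }
  end
end

# One rescue on the shared ancestor covers both 432 and 433.
def describe_failure(status)
  raise DemoErrors.for_status(status).new("[#{status}] request failed", status: status)
rescue DemoErrors::UsageLimitError => e
  "quota exhausted (#{e.status})"
rescue DemoErrors::RateLimitError
  "rate limited"
rescue DemoErrors::APIError => e
  "api error (#{e.status})"
end
```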
|
|
275
|
+
|
|
276
|
+
## Retries
|
|
277
|
+
|
|
278
|
+
Transient failures (HTTP 408/409/425/429/5xx and network timeouts) are retried automatically up to `max_retries` times with exponential backoff and jitter. On a `429`, the `Retry-After` header is respected. Streaming research requests are not retried.
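
A backoff schedule of this kind can be sketched as follows. `retryable_status?` and `retry_delay` are hypothetical helpers, and the 25% jitter and 30-second cap are assumptions chosen for the example. The gem's actual constants live in `lib/tavily/connection.rb`, which is not shown in this section.

```ruby
RETRYABLE_STATUSES = [408, 409, 425, 429].freeze

def retryable_status?(status)
  RETRYABLE_STATUSES.include?(status) || (500..599).cover?(status)
end

# Seconds to wait before retry number `attempt` (0-based).
def retry_delay(attempt, base_delay: 0.5, retry_after: nil, max_delay: 30.0, rng: Random.new)
  # A numeric Retry-After (in seconds) takes priority over computed backoff.
  # Non-numeric values such as HTTP dates fall through to the backoff path.
  seconds = retry_after && Float(retry_after.to_s, exception: false)
  return seconds.clamp(0.0, max_delay) if seconds

  backoff = base_delay * (2**attempt)  # 0.5, 1.0, 2.0, ...
  jitter = rng.rand * backoff * 0.25   # up to 25% extra to spread out clients
  [backoff + jitter, max_delay].min
end
```

With the default `retry_base_delay` of 0.5 and `max_retries` of 2, the two waits start at roughly 0.5s and 1.0s before jitter.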
|
|
279
|
+
|
|
280
|
+
## Windows / TLS certificates
|
|
281
|
+
|
|
282
|
+
Some Windows Ruby builds (including MSVC builds) ship without a usable default OpenSSL certificate store, which causes `certificate verify failed (unable to get local issuer certificate)`. Point the client at a CA bundle:
|
|
283
|
+
|
|
284
|
+
```ruby
|
|
285
|
+
Tavily.configure { |c| c.ca_file = 'C:\path\to\cacert.pem' }
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
Or set it for the whole process before requiring the gem:
|
|
289
|
+
|
|
290
|
+
```bat
|
|
291
|
+
rem A bundle ships with Git for Windows, for example:
|
|
292
|
+
set SSL_CERT_FILE=C:\Program Files\Git\mingw64\etc\ssl\certs\ca-bundle.crt
|
|
293
|
+
```
|
|
294
|
+
|
|
295
|
+
You can also download an up-to-date bundle from <https://curl.se/ca/cacert.pem>.
|
|
296
|
+
|
|
297
|
+
## Development
|
|
298
|
+
|
|
299
|
+
```sh
|
|
300
|
+
bin/setup # bundle install + create .env
|
|
301
|
+
bundle exec rake # run the specs and RuboCop
|
|
302
|
+
bin/console # an IRB session with the gem loaded
|
|
303
|
+
```
|
|
304
|
+
|
|
305
|
+
The default `rake` task runs RSpec and RuboCop. The offline suite uses [WebMock](https://github.com/bblimke/webmock) and makes no network calls.
|
|
306
|
+
|
|
307
|
+
To run the live integration suite (consumes credits):
|
|
308
|
+
|
|
309
|
+
```sh
|
|
310
|
+
TAVILY_LIVE=1 TAVILY_API_KEY=tvly-... bundle exec rspec spec/live_spec.rb
|
|
311
|
+
```
|
|
312
|
+
|
|
313
|
+
## Contributing
|
|
314
|
+
|
|
315
|
+
Bug reports and pull requests are welcome at <https://github.com/main-path/tavily>.
|
|
316
|
+
|
|
317
|
+
## License
|
|
318
|
+
|
|
319
|
+
Released under the [MIT License](LICENSE.txt).
|
data/lib/tavily/client.rb
ADDED
|
@@ -0,0 +1,276 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require_relative "configuration"
|
|
4
|
+
require_relative "connection"
|
|
5
|
+
require_relative "responses"
|
|
6
|
+
|
|
7
|
+
module Tavily
|
|
8
|
+
# The main entry point for talking to the Tavily API.
|
|
9
|
+
#
|
|
10
|
+
# @example
|
|
11
|
+
# client = Tavily::Client.new(api_key: "tvly-...")
|
|
12
|
+
# response = client.search("who won the 2022 world cup?", include_answer: true)
|
|
13
|
+
# response.answer # => "Argentina ..."
|
|
14
|
+
#
|
|
15
|
+
# Every endpoint method accepts the documented parameters as keyword
|
|
16
|
+
# arguments and forwards any additional keywords (`**extra`) straight into the
|
|
17
|
+
# request body, so newly released API parameters work without a gem upgrade.
|
|
18
|
+
class Client
|
|
19
|
+
# Default polling interval (seconds) for {#wait_for_research}.
|
|
20
|
+
DEFAULT_POLL_INTERVAL = 3.0
|
|
21
|
+
# Default overall timeout (seconds) for {#wait_for_research}.
|
|
22
|
+
DEFAULT_POLL_TIMEOUT = 600
|
|
23
|
+
|
|
24
|
+
# @return [Configuration] the resolved configuration for this client.
|
|
25
|
+
attr_reader :config
|
|
26
|
+
|
|
27
|
+
# @param api_key [String, nil] overrides {Configuration#api_key}
|
|
28
|
+
# @param options [Hash] any other {Configuration} attribute to override
|
|
29
|
+
# (e.g. +base_url:+, +timeout:+, +max_retries:+, +logger:+)
|
|
30
|
+
def initialize(api_key: nil, **options)
|
|
31
|
+
@config = build_config(api_key, options)
|
|
32
|
+
@connection = Connection.new(@config)
|
|
33
|
+
end
|
|
34
|
+
|
|
35
|
+
# Execute a web search.
|
|
36
|
+
#
|
|
37
|
+
# @param query [String] the search query (required)
|
|
38
|
+
# @param search_depth [String, nil] "basic", "advanced", "fast", or "ultra-fast"
|
|
39
|
+
# @param topic [String, nil] "general", "news", or "finance"
|
|
40
|
+
# @param max_results [Integer, nil] number of results (0–20, default 5)
|
|
41
|
+
# @param chunks_per_source [Integer, nil] 1–3, advanced depth only
|
|
42
|
+
# @param time_range [String, nil] "day"/"week"/"month"/"year" (or d/w/m/y)
|
|
43
|
+
# @param days [Integer, nil] limit to the last N days (news topic)
|
|
44
|
+
# @param start_date [String, nil] "YYYY-MM-DD"
|
|
45
|
+
# @param end_date [String, nil] "YYYY-MM-DD"
|
|
46
|
+
# @param include_answer [Boolean, String, nil] true/false, "basic", or "advanced"
|
|
47
|
+
# @param include_raw_content [Boolean, String, nil] true/false, "markdown", or "text"
|
|
48
|
+
# @param include_images [Boolean, nil]
|
|
49
|
+
# @param include_image_descriptions [Boolean, nil] requires +include_images: true+
|
|
50
|
+
# @param include_favicon [Boolean, nil]
|
|
51
|
+
# @param include_domains [Array<String>, nil] up to 300 domains
|
|
52
|
+
# @param exclude_domains [Array<String>, nil] up to 150 domains
|
|
53
|
+
# @param country [String, nil] boost results from a country (general topic)
|
|
54
|
+
# @param auto_parameters [Boolean, nil] let Tavily choose parameters
|
|
55
|
+
# @param include_usage [Boolean, nil] include credit usage in the response
|
|
56
|
+
# @param extra [Hash] any additional request-body parameters
|
|
57
|
+
# @return [SearchResponse]
|
|
58
|
+
def search(query, search_depth: nil, topic: nil, max_results: nil, chunks_per_source: nil,
|
|
59
|
+
time_range: nil, days: nil, start_date: nil, end_date: nil, include_answer: nil,
|
|
60
|
+
include_raw_content: nil, include_images: nil, include_image_descriptions: nil,
|
|
61
|
+
include_favicon: nil, include_domains: nil, exclude_domains: nil, country: nil,
|
|
62
|
+
auto_parameters: nil, include_usage: nil, **extra)
|
|
63
|
+
body = {
|
|
64
|
+
query: query, search_depth: search_depth, topic: topic, max_results: max_results,
|
|
65
|
+
chunks_per_source: chunks_per_source, time_range: time_range, days: days,
|
|
66
|
+
start_date: start_date, end_date: end_date, include_answer: include_answer,
|
|
67
|
+
include_raw_content: include_raw_content, include_images: include_images,
|
|
68
|
+
include_image_descriptions: include_image_descriptions, include_favicon: include_favicon,
|
|
69
|
+
include_domains: include_domains, exclude_domains: exclude_domains, country: country,
|
|
70
|
+
auto_parameters: auto_parameters, include_usage: include_usage
|
|
71
|
+
}.merge(extra)
|
|
72
|
+
SearchResponse.new(@connection.post("/search", body))
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
# Convenience: run a search and return only the generated answer string.
|
|
76
|
+
# @param query [String]
|
|
77
|
+
# @param options [Hash] forwarded to {#search}
|
|
78
|
+
# @return [String, nil] the answer, or nil if none was produced
|
|
79
|
+
def qna_search(query, **options)
|
|
80
|
+
options[:include_answer] = true unless options.key?(:include_answer)
|
|
81
|
+
search(query, **options).answer
|
|
82
|
+
end
|
|
83
|
+
|
|
84
|
+
# Convenience: run a search and return a compact context string suitable for
|
|
85
|
+
# RAG prompts: a JSON array of {url, content} objects trimmed to roughly
|
|
86
|
+
# +max_tokens+ tokens (estimated at ~4 characters per token). The top result
|
|
87
|
+
# is always included, even if it alone exceeds the budget.
|
|
88
|
+
# @param query [String]
|
|
89
|
+
# @param max_tokens [Integer] approximate token budget for the context
|
|
90
|
+
# @param options [Hash] forwarded to {#search}
|
|
91
|
+
# @return [String] JSON string of [{ "url" =>, "content" => }, ...]
|
|
92
|
+
def search_context(query, max_tokens: 4000, **options)
|
|
93
|
+
budget = max_tokens * 4
|
|
94
|
+
used = 0
|
|
95
|
+
sources = []
|
|
96
|
+
search(query, **options).results.each do |result|
|
|
97
|
+
entry = { "url" => result.url, "content" => result.content }
|
|
98
|
+
sources << entry
|
|
99
|
+
used += JSON.generate(entry).length
|
|
100
|
+
break if used >= budget
|
|
101
|
+
end
|
|
102
|
+
JSON.generate(sources)
|
|
103
|
+
end
|
|
104
|
+
|
|
105
|
+
# Extract clean content from one or more URLs.
|
|
106
|
+
#
|
|
107
|
+
# @param urls [String, Array<String>] one URL or an array (max 20)
|
|
108
|
+
# @param query [String, nil] rerank extracted chunks by this intent
|
|
109
|
+
# @param chunks_per_source [Integer, nil] 1–5, only with +query+
|
|
110
|
+
# @param extract_depth [String, nil] "basic" or "advanced"
|
|
111
|
+
# @param include_images [Boolean, nil]
|
|
112
|
+
# @param include_favicon [Boolean, nil]
|
|
113
|
+
# @param format [String, nil] "markdown" or "text"
|
|
114
|
+
# @param timeout [Numeric, nil] per-request extraction timeout (1.0–60.0s)
|
|
115
|
+
# @param include_usage [Boolean, nil]
|
|
116
|
+
# @param extra [Hash] any additional request-body parameters
|
|
117
|
+
# @return [ExtractResponse]
|
|
118
|
+
def extract(urls, query: nil, chunks_per_source: nil, extract_depth: nil, include_images: nil,
|
|
119
|
+
include_favicon: nil, format: nil, timeout: nil, include_usage: nil, **extra)
|
|
120
|
+
body = {
|
|
121
|
+
urls: urls, query: query, chunks_per_source: chunks_per_source, extract_depth: extract_depth,
|
|
122
|
+
include_images: include_images, include_favicon: include_favicon, format: format,
|
|
123
|
+
timeout: timeout, include_usage: include_usage
|
|
124
|
+
}.merge(extra)
|
|
125
|
+
ExtractResponse.new(@connection.post("/extract", body))
|
|
126
|
+
end
|
|
127
|
+
|
|
128
|
+
# Crawl a site starting from a root URL, following links.
|
|
129
|
+
#
|
|
130
|
+
# @param url [String] root URL (required)
|
|
131
|
+
# @param instructions [String, nil] natural-language crawl guidance
|
|
132
|
+
# @param chunks_per_source [Integer, nil] 1–5, only with +instructions+
|
|
133
|
+
# @param max_depth [Integer, nil] 1–5
|
|
134
|
+
# @param max_breadth [Integer, nil] 1–500 links per level
|
|
135
|
+
# @param limit [Integer, nil] total links to process
|
|
136
|
+
# @param select_paths [Array<String>, nil] regex path allowlist
|
|
137
|
+
# @param select_domains [Array<String>, nil] regex domain allowlist
|
|
138
|
+
# @param exclude_paths [Array<String>, nil] regex path blocklist
|
|
139
|
+
# @param exclude_domains [Array<String>, nil] regex domain blocklist
|
|
140
|
+
# @param allow_external [Boolean, nil] follow external domains (default true)
|
|
141
|
+
# @param include_images [Boolean, nil]
|
|
142
|
+
# @param extract_depth [String, nil] "basic" or "advanced"
|
|
143
|
+
# @param format [String, nil] "markdown" or "text"
|
|
144
|
+
# @param include_favicon [Boolean, nil]
|
|
145
|
+
# @param timeout [Numeric, nil] 10–150s
|
|
146
|
+
# @param include_usage [Boolean, nil]
|
|
147
|
+
# @param extra [Hash] any additional request-body parameters
|
|
148
|
+
# @return [CrawlResponse]
|
|
149
|
+
def crawl(url, instructions: nil, chunks_per_source: nil, max_depth: nil, max_breadth: nil,
|
|
150
|
+
limit: nil, select_paths: nil, select_domains: nil, exclude_paths: nil,
|
|
151
|
+
exclude_domains: nil, allow_external: nil, include_images: nil, extract_depth: nil,
|
|
152
|
+
format: nil, include_favicon: nil, timeout: nil, include_usage: nil, **extra)
|
|
153
|
+
body = {
|
|
154
|
+
url: url, instructions: instructions, chunks_per_source: chunks_per_source,
|
|
155
|
+
max_depth: max_depth, max_breadth: max_breadth, limit: limit, select_paths: select_paths,
|
|
156
|
+
select_domains: select_domains, exclude_paths: exclude_paths, exclude_domains: exclude_domains,
|
|
157
|
+
allow_external: allow_external, include_images: include_images, extract_depth: extract_depth,
|
|
158
|
+
format: format, include_favicon: include_favicon, timeout: timeout, include_usage: include_usage
|
|
159
|
+
}.merge(extra)
|
|
160
|
+
CrawlResponse.new(@connection.post("/crawl", body))
|
|
161
|
+
end
|
|
162
|
+
|
|
163
|
+
# Map the structure of a site, returning discovered URLs.
|
|
164
|
+
#
|
|
165
|
+
# @param url [String] root URL (required)
|
|
166
|
+
# @param instructions [String, nil] natural-language mapping guidance
|
|
167
|
+
# @param max_depth [Integer, nil] 1–5
|
|
168
|
+
# @param max_breadth [Integer, nil] 1–500
|
|
169
|
+
# @param limit [Integer, nil] 1–500
|
|
170
|
+
# @param select_paths [Array<String>, nil] regex path allowlist
|
|
171
|
+
# @param select_domains [Array<String>, nil] regex domain allowlist
|
|
172
|
+
# @param exclude_paths [Array<String>, nil] regex path blocklist
|
|
173
|
+
# @param exclude_domains [Array<String>, nil] regex domain blocklist
|
|
174
|
+
# @param allow_external [Boolean, nil] follow external domains (default true)
|
|
175
|
+
# @param timeout [Numeric, nil] 10–150s
|
|
176
|
+
# @param include_usage [Boolean, nil]
|
|
177
|
+
# @param extra [Hash] any additional request-body parameters
|
|
178
|
+
# @return [MapResponse]
|
|
179
|
+
def map(url, instructions: nil, max_depth: nil, max_breadth: nil, limit: nil, select_paths: nil,
|
|
180
|
+
select_domains: nil, exclude_paths: nil, exclude_domains: nil, allow_external: nil,
|
|
181
|
+
timeout: nil, include_usage: nil, **extra)
|
|
182
|
+
body = {
|
|
183
|
+
url: url, instructions: instructions, max_depth: max_depth, max_breadth: max_breadth,
|
|
184
|
+
limit: limit, select_paths: select_paths, select_domains: select_domains,
|
|
185
|
+
exclude_paths: exclude_paths, exclude_domains: exclude_domains, allow_external: allow_external,
|
|
186
|
+
timeout: timeout, include_usage: include_usage
|
|
187
|
+
}.merge(extra)
|
|
188
|
+
MapResponse.new(@connection.post("/map", body))
|
|
189
|
+
end
|
|
190
|
+
|
|
191
|
+
# Start an asynchronous research task, or stream it live.
|
|
192
|
+
#
|
|
193
|
+
# Without a block this creates the task and returns immediately with a
|
|
194
|
+
# {ResearchTask} whose +status+ is "pending"; poll it with
|
|
195
|
+
# {#research_task} or {#wait_for_research}. With a block, the task is
|
|
196
|
+
# streamed and each Server-Sent Event is yielded as a {ResearchEvent}.
|
|
197
|
+
#
|
|
198
|
+
# @param input [String] the research question (required)
|
|
199
|
+
# @param model [String, nil] "mini", "pro", or "auto"
|
|
200
|
+
# @param output_schema [Hash, nil] JSON Schema for structured output
|
|
201
|
+
# @param citation_format [String, nil] "numbered", "mla", "apa", or "chicago"
|
|
202
|
+
# @param include_domains [Array<String>, nil] up to 20 preferred domains
|
|
203
|
+
# @param exclude_domains [Array<String>, nil] up to 20 blocked domains
|
|
204
|
+
# @param output_length [String, nil] "short", "standard", or "long"
|
|
205
|
+
# @param files [Array<Hash>, nil] up to 5 base64 file objects
|
|
206
|
+
# @param extra [Hash] any additional request-body parameters
|
|
207
|
+
# @yieldparam event [ResearchEvent] (streaming mode only)
|
|
208
|
+
# @return [ResearchTask, nil] the queued task, or nil when streaming
|
|
209
|
+
def research(input, model: nil, output_schema: nil, citation_format: nil, include_domains: nil,
|
|
210
|
+
exclude_domains: nil, output_length: nil, files: nil, **extra, &block)
|
|
211
|
+
body = {
|
|
212
|
+
input: input, model: model, output_schema: output_schema, citation_format: citation_format,
|
|
213
|
+
include_domains: include_domains, exclude_domains: exclude_domains,
|
|
214
|
+
output_length: output_length, files: files
|
|
215
|
+
}.merge(extra)
|
|
216
|
+
|
|
217
|
+
if block
|
|
218
|
+
@connection.stream("/research", body.merge(stream: true), &block)
|
|
219
|
+
nil
|
|
220
|
+
else
|
|
221
|
+
ResearchTask.new(@connection.post("/research", body))
|
|
222
|
+
end
|
|
223
|
+
end
|
|
224
|
+
|
|
225
|
+
# Fetch the current status and (when finished) result of a research task.
|
|
226
|
+
# @param request_id [String]
|
|
227
|
+
# @return [ResearchTask]
|
|
228
|
+
def research_task(request_id)
|
|
229
|
+
ResearchTask.new(@connection.get("/research/#{request_id}"))
|
|
230
|
+
end
|
|
231
|
+
|
|
232
|
+
# Poll a research task until it completes or fails.
|
|
233
|
+
#
|
|
234
|
+
# @param request_id [String]
|
|
235
|
+
# @param poll_interval [Numeric] seconds between polls
|
|
236
|
+
# @param timeout [Numeric] overall timeout in seconds
|
|
237
|
+
# @yieldparam task [ResearchTask] optional progress callback on each poll
|
|
238
|
+
# @raise [Tavily::Error] if the task fails or the timeout is exceeded
|
|
239
|
+
# @return [ResearchTask] the completed task
|
|
240
|
+
def wait_for_research(request_id, poll_interval: DEFAULT_POLL_INTERVAL,
|
|
241
|
+
timeout: DEFAULT_POLL_TIMEOUT)
|
|
242
|
+
deadline = monotonic_now + timeout
|
|
243
|
+
loop do
|
|
244
|
+
task = research_task(request_id)
|
|
245
|
+
yield task if block_given?
|
|
246
|
+
return task if task.completed?
|
|
247
|
+
raise Error, "Research task #{request_id} failed" if task.failed?
|
|
248
|
+
|
|
249
|
+
if monotonic_now >= deadline
|
|
250
|
+
raise TimeoutError,
|
|
251
|
+
"Research task #{request_id} did not complete within #{timeout}s"
|
|
252
|
+
end
|
|
253
|
+
|
|
254
|
+
sleep(poll_interval)
|
|
255
|
+
end
|
|
256
|
+
end
|
|
257
|
+
|
|
258
|
+
private
|
|
259
|
+
|
|
260
|
+
def build_config(api_key, options)
|
|
261
|
+
cfg = Tavily.configuration.dup
|
|
262
|
+
cfg.api_key = api_key unless api_key.nil?
|
|
263
|
+
options.each do |key, value|
|
|
264
|
+
setter = "#{key}="
|
|
265
|
+
raise ConfigurationError, "Unknown configuration option: #{key.inspect}" unless cfg.respond_to?(setter)
|
|
266
|
+
|
|
267
|
+
cfg.public_send(setter, value)
|
|
268
|
+
end
|
|
269
|
+
cfg
|
|
270
|
+
end
|
|
271
|
+
|
|
272
|
+
def monotonic_now
|
|
273
|
+
Process.clock_gettime(Process::CLOCK_MONOTONIC)
|
|
274
|
+
end
|
|
275
|
+
end
|
|
276
|
+
end
|