firecrawl 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +67 -52
- data/firecrawl.gemspec +1 -1
- data/lib/firecrawl/batch_scrape_request.rb +52 -16
- data/lib/firecrawl/crawl_request.rb +24 -25
- data/lib/firecrawl/error_result.rb +1 -1
- data/lib/firecrawl/map_request.rb +4 -4
- data/lib/firecrawl/module_methods.rb +1 -1
- data/lib/firecrawl/scrape_request.rb +4 -4
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7e55dc5e433f0632ab0c11feda818bf82ddf77ae8ea2fdaa624c5a9af1dccf4c
+  data.tar.gz: 2781378e0a6b62c2e7befb0ebd0298b0b91b2cf249fe33115dc675606aed7bfe
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6fa88114c36df02f9cd261159132298e9a44b01464334daeb31ea2d8d5b11122321066ddf55207fddc3bef72704b353cf42d7aebaa46ac70298ea1efe19c6885
+  data.tar.gz: 622fd277c01854131b4a21742c915359c97fffa54b4dc41aff47d09b06d4d7c6c485361ff6409dc1361cf561f313d42096674a426987b5bbf696dcba1f52cd96
data/README.md
CHANGED
@@ -11,7 +11,7 @@ provide markdown information for real time information lookup as well as groundi
 require 'firecrawl'
 
 Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]
-response = Firecrawl.scrape( 'https://example.com'
+response = Firecrawl.scrape( 'https://example.com' )
 if response.success?
   result = response.result
   puts result.metadata[ 'title' ]
@@ -45,18 +45,44 @@ $ gem install firecrawl
 
 ## Usage
 
-### 
+### Scraping
 
-The simplest way to use Firecrawl is to scrape a single page
+The simplest way to use Firecrawl is to `scrape`, which will scrape the content of a single page
+at the given url and optionally convert it to markdown as well as create a screenshot. You can
+choose to scrape the entire page or only the main content.
 
 ```ruby
-Firecrawl.api_key ENV['FIRECRAWL_API_KEY']
-response = Firecrawl.scrape('https://example.com', format: :markdown )
+Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]
+response = Firecrawl.scrape( 'https://example.com', format: :markdown )
 
 if response.success?
   result = response.result
   if result.success?
-    puts result.metadata['title']
+    puts result.metadata[ 'title' ]
+    puts result.markdown
+  end
+else
+  puts response.result.error_description
+end
+```
+
+In this basic example we have globally set the `Firecrawl.api_key` from the environment and then
+used the `Firecrawl.scrape` convenience method to make a request to the Firecrawl API to scrape
+the `https://example.com` page and return markdown ( markdown and the main content of the page
+are returned by default so we could have omitted the options entirely ).
+
+The `Firecrawl.scrape` method instantiates a `Firecrawl::ScrapeRequest` instance and then calls
+its `submit` method. The following is the equivalent code which makes explicit use of the
+`Firecrawl::ScrapeRequest` class.
+
+```ruby
+request = Firecrawl::ScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
+response = request.submit( 'https://example.com', format: :markdown )
+
+if response.success?
+  result = response.result
+  if result.success?
+    puts result.metadata[ 'title' ]
     puts result.markdown
   end
 else
@@ -64,9 +90,14 @@ else
 end
 ```
 
-
+Notice also that in this example we've directly passed the `api_key` to the individual request.
+This is optional. If you set the key globally and omit it in the request constructor the
+`ScrapeRequest` instance will use the globally assigned `api_key`.
+
+#### Scrape Options
 
-You can customize scraping behavior using
+You can customize scraping behavior using options, either by passing an option hash to the
+`submit` method, as we have done above, or by building a `ScrapeOptions` instance:
 
 ```ruby
 options = Firecrawl::ScrapeOptions.build do
@@ -78,9 +109,28 @@ options = Firecrawl::ScrapeOptions.build do
 end
 
 request = Firecrawl::ScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
-response = request.
+response = request.submit( 'https://example.com', options )
 ```
 
+#### Scrape Response
+
+The `Firecrawl` gem is based on the `Faraday` gem, which permits you to customize the request
+orchestration, up to and including changing the actual HTTP implementation used to make the
+request. See Connections below for additional details.
+
+Any `Firecrawl` request, including the `submit` method as used above, will thus return a
+`Faraday::Response`. This response includes a `success?` method which indicates if the request
+was successful. If the request was successful, the `response.result` method will be an instance
+of `Firecrawl::ScrapeResult` that will encapsulate the scraping result. This instance, in turn,
+has a `success?` method which will return `true` if Firecrawl successfully scraped the page.
+
+A successful result will include html, markdown, screenshot, as well as any action and LLM
+results and related metadata.
+
+If the response is not successful ( if `response.success?` is `false` ) then `response.result`
+will be an instance of `Firecrawl::ErrorResult` which will provide additional details about the
+nature of the failure.
+
 ### Batch Scraping
 
 For scraping multiple URLs efficiently:
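The two-level check described above ( transport success via `response.success?`, then scraping success via `result.success?` ) can be sketched with plain-Ruby stubs. The `StubResponse` and `StubResult` classes below are illustrative stand-ins, not part of the gem; only the control flow mirrors the README:

```ruby
# Sketch of the double success? check: transport-level first, API-level second.
# StubResponse stands in for Faraday::Response; StubResult stands in for
# ScrapeResult / ErrorResult.
StubResult = Struct.new( :ok, :markdown, :error_description ) do
  def success?
    ok
  end
end

StubResponse = Struct.new( :ok, :result ) do
  def success?
    ok
  end
end

def handle( response )
  return "request failed: #{response.result.error_description}" unless response.success?
  result = response.result
  result.success? ? result.markdown : "scrape failed: #{result.error_description}"
end

good = StubResponse.new( true, StubResult.new( true, '# Example Domain', nil ) )
bad  = StubResponse.new( false, StubResult.new( false, nil, 'unauthorized' ) )

puts handle( good )   # => # Example Domain
puts handle( bad )    # => request failed: unauthorized
```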
@@ -94,7 +144,7 @@ options = Firecrawl::ScrapeOptions.build do
   only_main_content true
 end
 
-response = request.
+response = request.submit( urls, options )
 while response.success?
   batch_result = response.result
   batch_result.scrape_results.each do |result|
@@ -104,7 +154,7 @@ while response.success?
   end
   break unless batch_result.status?( :scraping )
   sleep 0.5
-  response = request.
+  response = request.retrieve( batch_result )
 end
 ```
 
@@ -120,7 +170,7 @@ options = Firecrawl::MapOptions.build do
   ignore_subdomains true
 end
 
-response = request.
+response = request.submit( 'https://example.com', options )
 if response.success?
   result = response.result
   result.links.each do |link|
@@ -145,51 +195,16 @@ options = Firecrawl::CrawlOptions.build do
   end
 end
 
-response = request.
+response = request.submit( 'https://example.com', options )
 while response.success?
   crawl_result = response.result
-  crawl_result.scrape_results.each do |result|
-    puts result.metadata['title']
+  crawl_result.scrape_results.each do | result |
+    puts result.metadata[ 'title' ]
     puts result.markdown
   end
-  break unless crawl_result.status?(:scraping)
+  break unless crawl_result.status?( :scraping )
   sleep 0.5
-  response = request.
-end
-```
-
-## Response Structure
-
-All Firecrawl requests return a Faraday response with an added `result` method. The result will
-be one of:
-
-- `ScrapeResult`: Contains the scraped content and metadata
-- `BatchScrapeResult`: Contains multiple scrape results
-- `MapResult`: Contains discovered links
-- `CrawlResult`: Contains scrape results from crawled pages
-- `ErrorResult`: Contains error information if the request failed
-
-### Working with Results
-
-```ruby
-response = request.scrape(url, options)
-if response.success?
-  result = response.result
-  if result.success?
-    # Access scraped content
-    puts result.metadata['title']
-    puts result.markdown
-    puts result.html
-    puts result.raw_html
-    puts result.screenshot_url
-    puts result.links
-
-    # Check for warnings
-    puts result.warning if result.warning
-  end
-else
-  error = response.result
-  puts "#{error.error_type}: #{error.error_description}"
+  response = request.retrieve( crawl_result )
 end
 ```
 
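Both the batch scrape and crawl examples in the README diff above follow the same shape: `submit` once, then loop on `retrieve` until the status moves past `:scraping`. A self-contained sketch of that polling loop, using a hypothetical `PollRequest` in place of `Firecrawl::BatchScrapeRequest` so it runs without the API:

```ruby
# Self-contained model of the submit/retrieve polling loop used above.
# PollRequest and PollResult are hypothetical stand-ins for the gem's
# BatchScrapeRequest and BatchScrapeResult.
PollResult = Struct.new( :status, :pages ) do
  def status?( name )
    status == name
  end
end

class PollRequest
  def initialize
    @calls = 0
  end

  # each retrieve delivers one more completed page until the batch finishes
  def retrieve( _result )
    @calls += 1
    PollResult.new( @calls < 3 ? :scraping : :completed, [ "page #{@calls}" ] )
  end
end

request = PollRequest.new
result  = PollResult.new( :scraping, [] )
pages   = []
while result.status?( :scraping )
  result = request.retrieve( result )
  pages.concat( result.pages )
end

p pages   # => ["page 1", "page 2", "page 3"]
```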
data/lib/firecrawl/batch_scrape_request.rb
CHANGED
@@ -3,8 +3,8 @@ module Firecrawl
   ##
   # The +BatchScrapeRequest+ class encapsulates a batch scrape request to the Firecrawl API.
   # After creating a new +BatchScrapeRequest+ instance you can begin batch scraping by calling
-  # the +
-  # +
+  # the +submit+ method and then subsequently retrieve the results by calling the
+  # +retrieve+ method.
   #
   # === examples
   #
@@ -18,7 +18,7 @@ module Firecrawl
   #     only_main_content true
   #   end
   #
-  #   batch_response = request.
+  #   batch_response = request.submit( urls, options )
   #   while response.success?
   #     batch_result = batch_response.result
   #     if batch_result.success?
@@ -30,7 +30,7 @@ module Firecrawl
   #       end
   #     end
   #     break unless batch_result.status?( :scraping )
-  #     batch_response = request.
+  #     batch_response = request.retrieve( batch_result )
   #   end
   #
   #   unless batch_response.success?
@@ -40,7 +40,7 @@ module Firecrawl
   class BatchScrapeRequest < Request
 
     ##
-    # The +
+    # The +submit+ method makes a Firecrawl '/batch/scrape' POST request which will initiate
     # batch scraping of the given urls.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
@@ -51,7 +51,7 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def submit( urls, options = nil, &block )
       if options
         options = options.is_a?( ScrapeOptions ) ? options : ScrapeOptions.build( options.to_h )
         options = options.to_h
@@ -61,23 +61,23 @@ module Firecrawl
       options[ :urls ] = [ urls ].flatten
       response = post( "#{BASE_URI}/batch/scrape", options, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-        attributes = ( JSON.parse( response.body, symbolize_names: true ) rescue nil )
         attributes ||= { success: false, status: :failed }
         result = BatchScrapeResult.new( attributes[ :success ], attributes )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
     end
 
     ##
-    # The +
-    #
-    #
-    #
-    #
+    # The +retrieve+ method makes a Firecrawl '/batch/scrape' GET request which will return the
+    # scrape results that were completed since the previous call to this method ( or, if this is
+    # the first call to this method, since the batch scrape was started ). Note that there is no
+    # guarantee that there are any new batch scrape results at the time you make this call
+    # ( scrape_results may be empty ).
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is +true+,
     # then +response.result+ will be an instance +BatchScrapeResult+. If the request is not
@@ -87,17 +87,53 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def retrieve( batch_result, &block )
       raise ArgumentError, "The first argument must be an instance of BatchScrapeResult." \
         unless batch_result.is_a?( BatchScrapeResult )
       response = get( batch_result.next_url, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-        attributes = ( JSON.parse( response.body, symbolize_names: true ) rescue nil )
         attributes ||= { success: false, status: :failed }
         result = batch_result.merge( attributes )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
+      end
+
+      ResponseMethods.install( response, result )
+    end
+
+    ##
+    # The +retrieve_all+ method makes a Firecrawl '/batch/scrape' GET request which will return
+    # the scrape results that were completed at the time of this call. Repeated calls to this
+    # method will retrieve the scrape results previously returned as well as any scrape results
+    # that have accumulated since.
+    #
+    # Note that there is no guarantee that there are any new batch scrape results at the time you
+    # make this call ( scrape_results may be empty ).
+    #
+    # The response is always an instance of +Faraday::Response+. If +response.success?+ is +true+,
+    # then +response.result+ will be an instance of +BatchScrapeResult+. If the request is not
+    # successful then +response.result+ will be an instance of +ErrorResult+.
+    #
+    # Remember that you should call +response.success?+ to validate that the call to the API was
+    # successful and then +response.result.success?+ to validate that the API processed the
+    # request successfully.
+    #
+    def retrieve_all( batch_result, &block )
+      raise ArgumentError, "The first argument must be an instance of BatchScrapeResult." \
+        unless batch_result.is_a?( BatchScrapeResult )
+      response = get( batch_result.url, &block )
+      result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
+      if response.success?
+        attributes ||= { success: false, status: :failed }
+        # the next url should not be set by this method so that retrieve and retrieve_all do
+        # not impact each other
+        attributes.delete( :next )
+        result = batch_result.merge( attributes )
+      else
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
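The practical difference between +retrieve+ and +retrieve_all+ above is cursor handling: +retrieve+ follows the result's +next_url+, so each call yields only results completed since the last call, while +retrieve_all+ re-requests the base +url+ and drops the +:next+ cursor so the two methods do not disturb each other. A toy plain-Ruby model of the two access patterns ( no Firecrawl API involved; the lambdas are illustrative, not the gem's implementation ):

```ruby
# Toy model of incremental ( retrieve-style ) versus cumulative
# ( retrieve_all-style ) result fetching over a list of completed results.
completed = %w[ a b c d ]   # results the server has finished so far
cursor    = 0               # how far incremental reads have progressed

# retrieve-style: only results added since the last call; advances the cursor
incremental = lambda do
  fresh  = completed[ cursor.. ]
  cursor = completed.length
  fresh
end

# retrieve_all-style: everything completed so far; leaves the cursor alone
cumulative = lambda { completed.dup }

first      = incremental.call   # => ["a", "b", "c", "d"]
second     = incremental.call   # => []
everything = cumulative.call    # => ["a", "b", "c", "d"]
```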
data/lib/firecrawl/crawl_request.rb
CHANGED
@@ -2,9 +2,10 @@ module Firecrawl
 
   ##
   # The +CrawlRequest+ class encapsulates a crawl request to the Firecrawl API. After creating
-  # a new +CrawlRequest+ instance you can begin crawling by calling the +
-  # then subsequently retrieving the results by calling the +
-  #
+  # a new +CrawlRequest+ instance you can begin crawling by calling the +submit+ method and
+  # then subsequently retrieving the results by calling the +retrieve+ method.
+  #
+  # You can also optionally cancel the crawling operation by calling +cancel+.
   #
   # === examples
   #
@@ -19,7 +20,7 @@ module Firecrawl
   #     end
   #   end
   #
-  #   crawl_response = request.
+  #   crawl_response = request.submit( urls, options )
   #   while crawl_response.success?
   #     crawl_result = crawl_response.result
   #     if crawl_result.success?
@@ -31,7 +32,7 @@ module Firecrawl
   #       end
   #     end
   #     break unless crawl_result.status?( :scraping )
-  #     crawl_response = request.
+  #     crawl_response = request.retrieve( crawl_result )
   #   end
   #
   #   unless crawl_response.success?
@@ -41,7 +42,7 @@ module Firecrawl
   class CrawlRequest < Request
 
     ##
-    # The +
+    # The +submit+ method makes a Firecrawl '/crawl' POST request which will initiate crawling
     # of the given url.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
@@ -52,18 +53,18 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def submit( url, options = nil, &block )
       if options
         options = options.is_a?( CrawlOptions ) ? options : CrawlOptions.build( options.to_h )
         options = options.to_h
       else
        options = {}
       end
-      options[ url ] = url
+      options[ :url ] = url
       response = post( "#{BASE_URI}/crawl", options, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-        attributes = ( JSON.parse( response.body, symbolize_names: true ) rescue nil )
         attributes ||= { success: false, status: :failed }
         result = CrawlResult.new( attributes[ :success ], attributes )
       else
@@ -74,10 +75,10 @@ module Firecrawl
     end
 
     ##
-    # The +
-    #
-    #
-    #
+    # The +retrieve+ method makes a Firecrawl '/crawl/{id}' GET request which will return the
+    # crawl results that were completed since the previous call to this method ( or, if this is
+    # the first call to this method, since the crawl was started ). Note that there is no
+    # guarantee that there are any new crawl results at the time you make this call
     # ( scrape_results may be empty ).
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is
@@ -88,25 +89,24 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def retrieve( crawl_result, &block )
       raise ArgumentError, "The first argument must be an instance of CrawlResult." \
         unless crawl_result.is_a?( CrawlResult )
       response = get( crawl_result.next_url, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-
-        attributes ||= { success: false, status: :failed }
-        result = crawl_result.merge( attributes )
+        result = crawl_result.merge( attributes || { success: false, status: :failed } )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
     end
 
     ##
-    # The +
-    #
+    # The +cancel+ method makes a Firecrawl '/crawl/{id}' DELETE request which will cancel a
+    # previously submitted crawl.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is
     # +true+, then +response.result+ will be an instance +CrawlResult+. If the request is not
@@ -116,17 +116,16 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def cancel( crawl_result, &block )
       raise ArgumentError, "The first argument must be an instance of CrawlResult." \
         unless crawl_result.is_a?( CrawlResult )
       response = get( crawl_result.url, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-
-        attributes ||= { success: false, status: :failed }
-        result = crawl_result.merge( attributes )
+        result = crawl_result.merge( attributes || { success: false, status: :failed } )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
data/lib/firecrawl/error_result.rb
CHANGED
@@ -5,7 +5,7 @@ module Firecrawl
 
    def initialize( status_code, attributes = nil )
      @error_code, @error_description = status_code_to_error( status_code )
-      @error_description = attributes[ :error ] if
+      @error_description = attributes[ :error ] if attributes&.respond_to?( :[] )
    end
 
    private
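The +ErrorResult+ change above makes the body lookup nil-safe: a description derived from the HTTP status code is used unless the response body actually supplied an +:error+ field. A condensed illustration of that fallback; the `STATUS_ERRORS` table here is hypothetical, not the gem's actual mapping:

```ruby
# Condensed model of ErrorResult's description fallback. STATUS_ERRORS is a
# hypothetical mapping; the gem derives its description from the HTTP status.
STATUS_ERRORS = { 404 => 'not found', 429 => 'rate limited' }

def describe_error( status_code, attributes = nil )
  description = STATUS_ERRORS.fetch( status_code, 'unexpected error' )
  # same guard as the gem: override only when attributes supports [] lookup
  description = attributes[ :error ] if attributes&.respond_to?( :[] )
  description
end

describe_error( 404 )                            # => "not found"
describe_error( 429, { error: 'slow down' } )    # => "slow down"
```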
data/lib/firecrawl/map_request.rb
CHANGED
@@ -10,7 +10,7 @@ module Firecrawl
   #
   #   request = Firecrawl::MapRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
   #
-  #   response = request.
+  #   response = request.submit( 'https://example.com', { limit: 100 } )
   #   if response.success?
   #     result = response.result
   #     if result.success?
@@ -25,14 +25,14 @@ module Firecrawl
   class MapRequest < Request
 
     ##
-    # The +
-    # given url.
+    # The +submit+ method makes a Firecrawl '/map' POST request which will scrape the site with
+    # the given url and return links to all hosted pages related to that url.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
     # then +response.result+ will be an instance +MapResult+. If the request is not successful
     # then +response.result+ will be an instance of +ErrorResult+.
     #
-    def
+    def submit( url, options = nil, &block )
       if options
         options = options.is_a?( MapOptions ) ? options : MapOptions.build( options.to_h )
         options = options.to_h
data/lib/firecrawl/scrape_request.rb
CHANGED
@@ -1,7 +1,7 @@
 module Firecrawl
   ##
   # The +ScrapeRequest+ class encapsulates a '/scrape' POST request to the Firecrawl API. After
-  # creating a new +ScrapeRequest+ instance you can initiate the request by calling the +
+  # creating a new +ScrapeRequest+ instance you can initiate the request by calling the +submit+
   # method to perform synchronous scraping.
   #
   # === examples
@@ -15,7 +15,7 @@ module Firecrawl
   #     only_main_content true
   #   end
   #
-  #   response = request.
+  #   response = request.submit( 'https://example.com', options )
   #   if response.success?
   #     result = response.result
   #     puts response.metadata[ 'title' ]
@@ -28,13 +28,13 @@ module Firecrawl
   class ScrapeRequest < Request
 
     ##
-    # The +
+    # The +submit+ method makes a Firecrawl '/scrape' POST request which will scrape the given url.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
     # then +response.result+ will be an instance +ScrapeResult+. If the request is not successful
     # then +response.result+ will be an instance of +ErrorResult+.
     #
-    def
+    def submit( url, options = nil, &block )
       if options
         options = options.is_a?( ScrapeOptions ) ? options : ScrapeOptions.build( options.to_h )
         options = options.to_h
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: firecrawl
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Kristoph Cichocki-Romanov
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-11-
+date: 2024-11-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: faraday