firecrawl 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +67 -52
- data/firecrawl.gemspec +1 -1
- data/lib/firecrawl/batch_scrape_request.rb +52 -16
- data/lib/firecrawl/crawl_request.rb +24 -25
- data/lib/firecrawl/error_result.rb +1 -1
- data/lib/firecrawl/map_request.rb +4 -4
- data/lib/firecrawl/module_methods.rb +1 -1
- data/lib/firecrawl/scrape_request.rb +4 -4
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7e55dc5e433f0632ab0c11feda818bf82ddf77ae8ea2fdaa624c5a9af1dccf4c
+  data.tar.gz: 2781378e0a6b62c2e7befb0ebd0298b0b91b2cf249fe33115dc675606aed7bfe
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6fa88114c36df02f9cd261159132298e9a44b01464334daeb31ea2d8d5b11122321066ddf55207fddc3bef72704b353cf42d7aebaa46ac70298ea1efe19c6885
+  data.tar.gz: 622fd277c01854131b4a21742c915359c97fffa54b4dc41aff47d09b06d4d7c6c485361ff6409dc1361cf561f313d42096674a426987b5bbf696dcba1f52cd96
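checksums.yaml records SHA256 and SHA512 digests of the gem's metadata.gz and data.tar.gz members. A minimal sketch of recomputing such digests for an extracted member with Ruby's standard Digest library (the method name and any paths are illustrative, not part of this diff):

```ruby
require 'digest'

# Compute the SHA256 and SHA512 hex digests recorded in checksums.yaml for a
# given file, e.g. a metadata.gz or data.tar.gz extracted from the .gem archive.
def gem_digests( path )
  data = File.binread( path )
  {
    sha256: Digest::SHA256.hexdigest( data ),
    sha512: Digest::SHA512.hexdigest( data )
  }
end
```

Comparing the returned values against the `+` lines above verifies the downloaded 0.2.0 members.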
data/README.md
CHANGED
@@ -11,7 +11,7 @@ provide markdown information for real time information lookup as well as groundi
 require 'firecrawl'
 
 Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]
-response = Firecrawl.scrape( 'https://example.com'
+response = Firecrawl.scrape( 'https://example.com' )
 if response.success?
   result = response.result
   puts result.metadata[ 'title' ]
@@ -45,18 +45,44 @@ $ gem install firecrawl
 
 ## Usage
 
-###
+### Scraping
 
-The simplest way to use Firecrawl is to scrape a single page
+The simplest way to use Firecrawl is to `scrape`, which will scrape the content of a single page
+at the given url and optionally convert it to markdown as well as create a screenshot. You can
+chose to scrape the entire page or only the main content.
 
 ```ruby
-Firecrawl.api_key ENV['FIRECRAWL_API_KEY']
-response = Firecrawl.scrape('https://example.com', format: :markdown )
+Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]
+response = Firecrawl.scrape( 'https://example.com', format: :markdown )
 
 if response.success?
   result = response.result
   if result.success?
-    puts result.metadata['title']
+    puts result.metadata[ 'title' ]
+    puts result.markdown
+  end
+else
+  puts response.result.error_description
+end
+```
+
+In this basic example we have globally set the `Firecrawl.api_key` from the environment and then
+used the `Firecrawl.scrape` convenience method to make a request to the Firecrawl API to scrape
+the `https://example.com` page and return markdown ( markdown and the main content of the page
+are returned by default so we could have ommitted the options entirelly ).
+
+The `Firecrawl.scrape` method instantiates a `Firecrawl::ScrapeRequest` instance and then calls
+it's `submit` method. The following is the equivalent code which makes explict use of the
+`Firecrawl::ScrapeRequest` class.
+
+```ruby
+request = Firecrawl::ScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
+response = request.submit( 'https://example.com', format: :markdown )
+
+if response.success?
+  result = response.result
+  if result.success?
+    puts result.metadata[ 'title' ]
     puts result.markdown
   end
 else
@@ -64,9 +90,14 @@ else
 end
 ```
 
-
+Notice also that in this example we've directly passed the `api_key` to the individual request.
+This is optional. If you set the key globally and omit it in the request constructor the
+`ScrapeRequest` instance will use the globally assigned `api_key`.
+
+#### Scrape Options
 
-You can customize scraping behavior using
+You can customize scraping behavior using options, either by passing an option hash to
+`submit` method, as we have done above, or by building a `ScrapeOptions` instance:
 
 ```ruby
 options = Firecrawl::ScrapeOptions.build do
@@ -78,9 +109,28 @@ options = Firecrawl::ScrapeOptions.build do
 end
 
 request = Firecrawl::ScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
-response = request.
+response = request.submit( 'https://example.com', options )
 ```
 
+#### Scrape Response
+
+The `Firecrawl` gem is based on the `Faraday` gem, which permits you to customize the request
+orchestration, up to and including changing the actual HTTP implementation used to make the
+request. See Connections below for additional details.
+
+Any `Firecrawl` request, including the `submit` method as used above, will thus return a
+`Faraday::Response`. This response includes a `success?` method which indicates if the request
+was successful. If the request was successful, the `response.result` method will be an instance
+of `Firecrawl::ScrapeResult` that will encapsulate the scraping result. This instance, in turn,
+has a `success?` method which will return `true` if Firecrawl successfully scraped the page.
+
+A successful result will include html, markdown, screenshot, as well as any action and llm
+results and related metadata.
+
+If the response is not successful ( if `response.success?` is `false` ) then `response.result`
+will be an instance of Firecrawl::ErrorResult which will provide additional details about the
+nature of the failure.
+
 ### Batch Scraping
 
 For scraping multiple URLs efficiently:
@@ -94,7 +144,7 @@ options = Firecrawl::ScrapeOptions.build do
   only_main_content true
 end
 
-response = request.
+response = request.submit( urls, options )
 while response.success?
   batch_result = response.result
   batch_result.scrape_results.each do |result|
|
|
104
154
|
end
|
105
155
|
break unless batch_result.status?( :scraping )
|
106
156
|
sleep 0.5
|
107
|
-
response = request.
|
157
|
+
response = request.retrieve( batch_result )
|
108
158
|
end
|
109
159
|
```
|
110
160
|
|
@@ -120,7 +170,7 @@ options = Firecrawl::MapOptions.build do
   ignore_subdomains true
 end
 
-response = request.
+response = request.submit( 'https://example.com', options )
 if response.success?
   result = response.result
   result.links.each do |link|
@@ -145,51 +195,16 @@ options = Firecrawl::CrawlOptions.build do
   end
 end
 
-response = request.
+response = request.submit( 'https://example.com', options )
 while response.success?
   crawl_result = response.result
-  crawl_result.scrape_results.each do |result|
-    puts result.metadata['title']
+  crawl_result.scrape_results.each do | result |
+    puts result.metadata[ 'title' ]
     puts result.markdown
   end
-  break unless crawl_result.status?(:scraping)
+  break unless crawl_result.status?( :scraping )
   sleep 0.5
-  response = request.
-end
-```
-
-## Response Structure
-
-All Firecrawl requests return a Faraday response with an added `result` method. The result will
-be one of:
-
-- `ScrapeResult`: Contains the scraped content and metadata
-- `BatchScrapeResult`: Contains multiple scrape results
-- `MapResult`: Contains discovered links
-- `CrawlResult`: Contains scrape results from crawled pages
-- `ErrorResult`: Contains error information if the request failed
-
-### Working with Results
-
-```ruby
-response = request.scrape(url, options)
-if response.success?
-  result = response.result
-  if result.success?
-    # Access scraped content
-    puts result.metadata['title']
-    puts result.markdown
-    puts result.html
-    puts result.raw_html
-    puts result.screenshot_url
-    puts result.links
-
-    # Check for warnings
-    puts result.warning if result.warning
-  end
-else
-  error = response.result
-  puts "#{error.error_type}: #{error.error_description}"
+  response = request.retrieve( crawl_result )
 end
 ```
 
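The batch and crawl examples in the README above share one polling shape: submit, consume the partial results, stop once the status leaves `:scraping`, otherwise call retrieve again. A self-contained sketch of that loop using a stub in place of a live request (all `Stub*` names are illustrative, not part of the gem):

```ruby
# Stand-ins for the gem's response/result objects, so the polling loop can
# run without the Firecrawl API. Each retrieve yields one pending url's result.
class StubResponse
  attr_reader :result
  def initialize( result )
    @result = result
  end
  def success?
    true
  end
end

class StubResult
  attr_reader :scrape_results
  def initialize( scrape_results, status )
    @scrape_results = scrape_results
    @status = status
  end
  def status?( value )
    @status == value
  end
end

class StubBatchRequest
  def submit( urls, _options = nil )
    @pending = urls.dup
    retrieve( nil )
  end

  def retrieve( _previous_result )
    batch  = [ @pending.shift ].compact.map { | url | "markdown for #{url}" }
    status = @pending.empty? ? :completed : :scraping
    StubResponse.new( StubResult.new( batch, status ) )
  end
end

# The README's polling loop, shape-for-shape.
collected = []
request   = StubBatchRequest.new
response  = request.submit( [ 'https://example.com', 'https://example.org' ] )
while response.success?
  batch_result = response.result
  collected.concat( batch_result.scrape_results )
  break unless batch_result.status?( :scraping )
  response = request.retrieve( batch_result )
end
```

With a real `BatchScrapeRequest` the loop is identical, plus the `sleep` between retrieves shown in the README.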
data/lib/firecrawl/batch_scrape_request.rb
CHANGED
@@ -3,8 +3,8 @@ module Firecrawl
   ##
   # The +BatchScrapeRequest+ class encapsulates a batch scrape request to the Firecrawl API.
   # After creating a new +BatchScrapeRequest+ instance you can begin batch scraping by calling
-  # the +
-  # +
+  # the +submit+ method and then subsequently retrieve the results by calling the
+  # +retrieve' method.
   #
   # === examples
   #
@@ -18,7 +18,7 @@ module Firecrawl
   #   only_main_content true
   # end
   #
-  # batch_response = request.
+  # batch_response = request.submit( urls, options )
   # while response.success?
   #   batch_result = batch_response.result
   #   if batch_result.success?
@@ -30,7 +30,7 @@ module Firecrawl
   #     end
   #   end
   #   break unless batch_result.status?( :scraping )
-  #   batch_response = request.
+  #   batch_response = request.retrieve( batch_result )
   # end
   #
   # unless batch_response.success?
@@ -40,7 +40,7 @@ module Firecrawl
   class BatchScrapeRequest < Request
 
     ##
-    # The +
+    # The +submit+ method makes a Firecrawl '/batch/scrape/{id}' POST request which will initiate
     # batch scraping of the given urls.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
@@ -51,7 +51,7 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def submit( urls, options = nil, &block )
       if options
         options = options.is_a?( ScrapeOptions ) ? options : ScrapeOptions.build( options.to_h )
         options = options.to_h
@@ -61,23 +61,23 @@ module Firecrawl
       options[ :urls ] = [ urls ].flatten
       response = post( "#{BASE_URI}/batch/scrape", options, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-        attributes = ( JSON.parse( response.body, symbolize_names: true ) rescue nil )
         attributes ||= { success: false, status: :failed }
         result = BatchScrapeResult.new( attributes[ :success ], attributes )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
     end
 
     ##
-    # The +
-    #
-    #
-    #
-    #
+    # The +retrieve+ method makes a Firecrawl '/batch/scrape' GET request which will return the
+    # scrape results that were completed since the previous call to this method ( or, if this is
+    # the first call to this method, since the batch scrape was started ). Note that there is no
+    # guarantee that there are any new batch scrape results at the time you make this call
+    # ( scrape_results may be empty ).
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is +true+,
     # then +response.result+ will be an instance +BatchScrapeResult+. If the request is not
@@ -87,17 +87,53 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def retrieve( batch_result, &block )
       raise ArgumentError, "The first argument must be an instance of BatchScrapeResult." \
         unless batch_result.is_a?( BatchScrapeResult )
       response = get( batch_result.next_url, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-        attributes = ( JSON.parse( response.body, symbolize_names: true ) rescue nil )
         attributes ||= { success: false, status: :failed }
         result = batch_result.merge( attributes )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
+      end
+
+      ResponseMethods.install( response, result )
+    end
+
+    ##
+    # The +retrieve_all+ method makes a Firecrawl '/batch/scrape' GET request which will return
+    # the scrape results that were completed at the time of this call. Repeated calls to this
+    # method will retrieve the scrape results previouslly returned as well as any scrape results
+    # that have accumulated since.
+    #
+    # Note that there is no guarantee that there are any new batch scrape results at the time you
+    # make this call ( scrape_results may be empty ).
+    #
+    # The response is always an instance of +Faraday::Response+. If +response.success?+ is +true+,
+    # then +response.result+ will be an instance +BatchScrapeResult+. If the request is not
+    # successful then +response.result+ will be an instance of +ErrorResult+.
+    #
+    # Remember that you should call +response.success?+ to valida that the call to the API was
+    # successful and then +response.result.success?+ to validate that the API processed the
+    # request successfuly.
+    #
+    def retrieve_all( batch_result, &block )
+      raise ArgumentError, "The first argument must be an instance of BatchScrapeResult." \
+        unless batch_result.is_a?( BatchScrapeResult )
+      response = get( batch_result.url, &block )
+      result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
+      if response.success?
+        attributes ||= { success: false, status: :failed }
+        # the next url should not be set by this method so that retrieve and retrieve_all do
+        # not impact each other
+        attributes.delete( :next )
+        result = batch_result.merge( attributes )
+      else
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
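A pattern repeated throughout these request methods: parse the response body once with a `rescue nil` modifier, then substitute a failure placeholder (or an empty hash for `ErrorResult`) when the body is not valid JSON. A standalone sketch of that idiom (the method name is illustrative, not part of the gem):

```ruby
require 'json'

# Parse a response body defensively, as the request classes above do:
# invalid JSON yields nil, which is replaced by a generic failure hash.
def parse_attributes( body )
  attributes = JSON.parse( body, symbolize_names: true ) rescue nil
  attributes || { success: false, status: :failed }
end
```

This keeps a malformed or empty API response from raising inside the request method; the caller still sees `success: false` on the result.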
data/lib/firecrawl/crawl_request.rb
CHANGED
@@ -2,9 +2,10 @@ module Firecrawl
 
   ##
   # The +CrawlRequest+ class encapsulates a crawl request to the Firecrawl API. After creating
-  # a new +CrawlRequest+ instance you can begin crawling by calling the +
-  # then subsequently retrieving the results by calling the +
-  #
+  # a new +CrawlRequest+ instance you can begin crawling by calling the +submit+ method and
+  # then subsequently retrieving the results by calling the +retrieve+ method.
+  #
+  # You can also optionally cancel the crawling operation by calling +cancel+.
   #
   # === examples
   #
@@ -19,7 +20,7 @@ module Firecrawl
   #   end
   # end
   #
-  # crawl_response = request.
+  # crawl_response = request.submit( urls, options )
   # while crawl_response.success?
   #   crawl_result = crawl_response.result
   #   if crawl_result.success?
@@ -31,7 +32,7 @@ module Firecrawl
   #     end
   #   end
   #   break unless crawl_result.status?( :scraping )
-  #   crawl_response = request.
+  #   crawl_response = request.retrieve( crawl_result )
   # end
   #
   # unless crawl_response.success?
@@ -41,7 +42,7 @@ module Firecrawl
   class CrawlRequest < Request
 
     ##
-    # The +
+    # The +submit+ method makes a Firecrawl '/crawl' POST request which will initiate crawling
     # of the given url.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
@@ -52,18 +53,18 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def submit( url, options = nil, &block )
       if options
         options = options.is_a?( CrawlOptions ) ? options : CrawlOptions.build( options.to_h )
         options = options.to_h
       else
         options = {}
       end
-      options[ url ] = url
+      options[ :url ] = url
       response = post( "#{BASE_URI}/crawl", options, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-        attributes = ( JSON.parse( response.body, symbolize_names: true ) rescue nil )
         attributes ||= { success: false, status: :failed }
         result = CrawlResult.new( attributes[ :success ], attributes )
       else
@@ -74,10 +75,10 @@ module Firecrawl
     end
 
     ##
-    # The +
-    #
-    #
-    #
+    # The +retrieve+ method makes a Firecrawl '/crawl/{id}' GET request which will return the
+    # crawl results that were completed since the previous call to this method( or, if this is
+    # the first call to this method, since the crawl was started ). Note that there is no
+    # guarantee that there are any new crawl results at the time you make this call
     # ( scrape_results may be empty ).
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is
@@ -88,25 +89,24 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfuly.
     #
-    def
+    def retrieve( crawl_result, &block )
       raise ArgumentError, "The first argument must be an instance of CrawlResult." \
         unless crawl_result.is_a?( CrawlResult )
       response = get( crawl_result.next_url, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-
-        attributes ||= { success: false, status: :failed }
-        result = crawl_result.merge( attributes )
+        result = crawl_result.merge( attributes || { success: false, status: :failed } )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
     end
 
     ##
-    # The +
-    #
+    # The +cancel+ method makes a Firecrawl '/crawl/{id}' DELETE request which will cancel a
+    # previouslly submitted crawl.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is
     # +true+, then +response.result+ will be an instance +CrawlResult+. If the request is not
|
|
116
116
|
# successful and then +response.result.success?+ to validate that the API processed the
|
117
117
|
# request successfuly.
|
118
118
|
#
|
119
|
-
def
|
119
|
+
def cancel( crawl_result, &block )
|
120
120
|
raise ArgumentError, "The first argument must be an instance of CrawlResult." \
|
121
121
|
unless crawl_result.is_a?( CrawlResult )
|
122
122
|
response = get( crawl_result.url, &block )
|
123
123
|
result = nil
|
124
|
+
attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
|
124
125
|
if response.success?
|
125
|
-
|
126
|
-
attributes ||= { success: false, status: :failed }
|
127
|
-
result = crawl_result.merge( attributes )
|
126
|
+
result = crawl_result.merge( attributes || { success: false, status: :failed } )
|
128
127
|
else
|
129
|
-
result = ErrorResult.new( response.status, attributes )
|
128
|
+
result = ErrorResult.new( response.status, attributes || {} )
|
130
129
|
end
|
131
130
|
|
132
131
|
ResponseMethods.install( response, result )
|
data/lib/firecrawl/error_result.rb
CHANGED
@@ -5,7 +5,7 @@ module Firecrawl
 
     def initialize( status_code, attributes = nil )
       @error_code, @error_description = status_code_to_error( status_code )
-      @error_description = attributes[ :error ] if
+      @error_description = attributes[ :error ] if attributes&.respond_to?( :[] )
     end
 
     private
data/lib/firecrawl/map_request.rb
CHANGED
@@ -10,7 +10,7 @@ module Firecrawl
   #
   #   request = Firecrawl::MapRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' )
   #
-  #   response = request.
+  #   response = request.submit( 'https://example.com', { limit: 100 } )
   #   if response.success?
   #     result = response.result
   #     if result.success?
@@ -25,14 +25,14 @@ module Firecrawl
   class MapRequest < Request
 
     ##
-    # The +
-    # given url.
+    # The +submit+ method makes a Firecrawl '/map' POST request which will scrape the site with
+    # given url and return links to all hosted pages related to that url.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
     # then +response.result+ will be an instance +MapResult+. If the request is not successful
     # then +response.result+ will be an instance of +ErrorResult+.
     #
-    def
+    def submit( url, options = nil, &block )
       if options
         options = options.is_a?( MapOptions ) ? options : MapOptions.build( options.to_h )
         options = options.to_h
data/lib/firecrawl/scrape_request.rb
CHANGED
@@ -1,7 +1,7 @@
 module Firecrawl
   ##
   # The +ScrapeRequest+ class encapsulates a '/scrape' POST request to the Firecrawl API. After
-  # creating a new +ScrapeRequest+ instance you can initiate the request by calling the +
+  # creating a new +ScrapeRequest+ instance you can initiate the request by calling the +submit+
   # method to perform synchronous scraping.
   #
   # === examples
@@ -15,7 +15,7 @@ module Firecrawl
   #   only_main_content true
   # end
   #
-  # response = request.
+  # response = request.submit( 'https://example.com', options )
   # if response.success?
   #   result = response.result
   #   puts response.metadata[ 'title ]
@@ -28,13 +28,13 @@ module Firecrawl
   class ScrapeRequest < Request
 
     ##
-    # The +
+    # The +submit+ method makes a Firecrawl '/scrape' POST request which will scrape the given url.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
     # then +response.result+ will be an instance +ScrapeResult+. If the request is not successful
     # then +response.result+ will be an instance of +ErrorResult+.
     #
-    def
+    def submit( url, options = nil, &block )
       if options
         options = options.is_a?( ScrapeOptions ) ? options : ScrapeOptions.build( options.to_h )
         options = options.to_h
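Every submit method in this release normalizes its options argument the same way: a prebuilt options object is kept as-is, anything else is passed through `to_h` and rebuilt, and nil becomes an empty hash. A standalone sketch of that duck-typed coercion (`StubOptions` stands in for `Firecrawl::ScrapeOptions` and is illustrative only):

```ruby
# Stand-in for an options-builder class with the interface the coercion needs:
# a .build constructor and a #to_h accessor.
class StubOptions
  def initialize( hash )
    @hash = hash
  end

  def self.build( hash )
    new( hash )
  end

  def to_h
    @hash.dup
  end
end

# The coercion idiom used by the submit methods above, extracted: accept a
# prebuilt options object, a plain hash, or nil, and normalize to a Hash.
def normalize_options( options )
  return {} unless options
  options = options.is_a?( StubOptions ) ? options : StubOptions.build( options.to_h )
  options.to_h
end
```

This lets callers pass either `ScrapeOptions.build { ... }` or a bare `{ format: :markdown }` hash to the same method.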
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: firecrawl
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Kristoph Cichocki-Romanov
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-11-
+date: 2024-11-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: faraday