firecrawl 0.0.1 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +213 -0
- data/firecrawl.gemspec +3 -2
- data/lib/firecrawl/batch_scrape_request.rb +55 -18
- data/lib/firecrawl/crawl_options.rb +5 -9
- data/lib/firecrawl/crawl_request.rb +135 -0
- data/lib/firecrawl/crawl_result.rb +63 -0
- data/lib/firecrawl/error_result.rb +1 -1
- data/lib/firecrawl/map_request.rb +4 -4
- data/lib/firecrawl/module_methods.rb +18 -0
- data/lib/firecrawl/request.rb +13 -3
- data/lib/firecrawl/scrape_options.rb +2 -2
- data/lib/firecrawl/scrape_request.rb +4 -4
- data/lib/firecrawl/scrape_result.rb +14 -14
- data/lib/firecrawl.rb +7 -3
- metadata +22 -4
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7e55dc5e433f0632ab0c11feda818bf82ddf77ae8ea2fdaa624c5a9af1dccf4c
+  data.tar.gz: 2781378e0a6b62c2e7befb0ebd0298b0b91b2cf249fe33115dc675606aed7bfe
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6fa88114c36df02f9cd261159132298e9a44b01464334daeb31ea2d8d5b11122321066ddf55207fddc3bef72704b353cf42d7aebaa46ac70298ea1efe19c6885
+  data.tar.gz: 622fd277c01854131b4a21742c915359c97fffa54b4dc41aff47d09b06d4d7c6c485361ff6409dc1361cf561f313d42096674a426987b5bbf696dcba1f52cd96
```
data/README.md
ADDED
@@ -0,0 +1,213 @@

# Firecrawl

Firecrawl is a lightweight Ruby gem that provides a semantically straightforward interface to
the Firecrawl.dev API, allowing you to easily scrape web content, take screenshots, as well as
crawl entire web domains.

The gem is particularly useful when working with Large Language Models (LLMs) as it can
provide markdown information for real-time information lookup as well as grounding.

```ruby
require 'firecrawl'

Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]
response = Firecrawl.scrape( 'https://example.com' )
if response.success?
  result = response.result
  puts result.metadata[ 'title' ]
  puts '---'
  puts result.markdown
  puts "Screenshot URL: #{ result.screenshot_url }"
else
  puts response.result.error_description
end
```

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'firecrawl'
```

Then execute:

```bash
$ bundle install
```

Or install it directly:

```bash
$ gem install firecrawl
```

## Usage

### Scraping

The simplest way to use Firecrawl is to `scrape`, which will scrape the content of a single page
at the given URL and optionally convert it to markdown as well as create a screenshot. You can
choose to scrape the entire page or only the main content.

```ruby
Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]
response = Firecrawl.scrape( 'https://example.com', format: :markdown )

if response.success?
  result = response.result
  if result.success?
    puts result.metadata[ 'title' ]
    puts result.markdown
  end
else
  puts response.result.error_description
end
```

In this basic example we have globally set the `Firecrawl.api_key` from the environment and then
used the `Firecrawl.scrape` convenience method to make a request to the Firecrawl API to scrape
the `https://example.com` page and return markdown ( markdown and the main content of the page
are returned by default, so we could have omitted the options entirely ).

The `Firecrawl.scrape` method instantiates a `Firecrawl::ScrapeRequest` instance and then calls
its `submit` method. The following is the equivalent code which makes explicit use of the
`Firecrawl::ScrapeRequest` class.

```ruby
request = Firecrawl::ScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
response = request.submit( 'https://example.com', format: :markdown )

if response.success?
  result = response.result
  if result.success?
    puts result.metadata[ 'title' ]
    puts result.markdown
  end
else
  puts response.result.error_description
end
```

Notice also that in this example we've directly passed the `api_key` to the individual request.
This is optional. If you set the key globally and omit it in the request constructor, the
`ScrapeRequest` instance will use the globally assigned `api_key`.

#### Scrape Options

You can customize scraping behavior using options, either by passing an options hash to the
`submit` method, as we have done above, or by building a `ScrapeOptions` instance:

```ruby
options = Firecrawl::ScrapeOptions.build do
  formats [ :html, :markdown, :screenshot ]
  only_main_content true
  include_tags [ 'article', 'main' ]
  exclude_tags [ 'nav', 'footer' ]
  wait_for 5000 # milliseconds
end

request = Firecrawl::ScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
response = request.submit( 'https://example.com', options )
```

#### Scrape Response

The `Firecrawl` gem is based on the `Faraday` gem, which permits you to customize the request
orchestration, up to and including changing the actual HTTP implementation used to make the
request. See Connections below for additional details.

Any `Firecrawl` request, including the `submit` method as used above, will thus return a
`Faraday::Response`. This response includes a `success?` method which indicates if the request
was successful. If the request was successful, the `response.result` method will return an
instance of `Firecrawl::ScrapeResult` that encapsulates the scraping result. This instance, in
turn, has a `success?` method which will return `true` if Firecrawl successfully scraped the
page.

A successful result will include html, markdown, and screenshot output, as well as any action
and LLM results and related metadata.

If the response is not successful ( if `response.success?` is `false` ), then `response.result`
will be an instance of `Firecrawl::ErrorResult` which will provide additional details about the
nature of the failure.

### Batch Scraping

For scraping multiple URLs efficiently:

```ruby
request = Firecrawl::BatchScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

urls = [ 'https://example.com', 'https://example.org' ]
options = Firecrawl::ScrapeOptions.build do
  format :markdown
  only_main_content true
end

response = request.submit( urls, options )
while response.success?
  batch_result = response.result
  batch_result.scrape_results.each do | result |
    puts result.metadata[ 'title' ]
    puts result.markdown
    puts "\n---\n"
  end
  break unless batch_result.status?( :scraping )
  sleep 0.5
  response = request.retrieve( batch_result )
end
```

### Site Mapping

To retrieve a site's structure:

```ruby
request = Firecrawl::MapRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

options = Firecrawl::MapOptions.build do
  limit 100
  ignore_subdomains true
end

response = request.submit( 'https://example.com', options )
if response.success?
  result = response.result
  result.links.each do | link |
    puts link
  end
end
```

### Site Crawling

For comprehensive site crawling:

```ruby
request = Firecrawl::CrawlRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

options = Firecrawl::CrawlOptions.build do
  maximum_depth 2
  limit 10
  scrape_options do
    format :markdown
    only_main_content true
  end
end

response = request.submit( 'https://example.com', options )
while response.success?
  crawl_result = response.result
  crawl_result.scrape_results.each do | result |
    puts result.metadata[ 'title' ]
    puts result.markdown
  end
  break unless crawl_result.status?( :scraping )
  sleep 0.5
  response = request.retrieve( crawl_result )
end
```

## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/firecrawl.gemspec
CHANGED

```diff
@@ -1,7 +1,7 @@
 Gem::Specification.new do | spec |
 
   spec.name = 'firecrawl'
-  spec.version = '0.0.1'
+  spec.version = '0.2.0'
   spec.authors = [ 'Kristoph Cichocki-Romanov' ]
   spec.email = [ 'rubygems.org@kristoph.net' ]
 
@@ -29,9 +29,10 @@ Gem::Specification.new do | spec |
   spec.require_paths = [ "lib" ]
 
   spec.add_runtime_dependency 'faraday', '~> 2.7'
-  spec.add_runtime_dependency 'dynamicschema', '~> 1.0.0.
+  spec.add_runtime_dependency 'dynamicschema', '~> 1.0.0.beta04'
 
   spec.add_development_dependency 'rspec', '~> 3.13'
   spec.add_development_dependency 'debug', '~> 1.9'
+  spec.add_development_dependency 'vcr', '~> 6.3'
 
 end
```
data/lib/firecrawl/batch_scrape_request.rb
CHANGED

```diff
@@ -3,8 +3,8 @@ module Firecrawl
   ##
   # The +BatchScrapeRequest+ class encapsulates a batch scrape request to the Firecrawl API.
   # After creating a new +BatchScrapeRequest+ instance you can begin batch scraping by calling
-  # the +
-  # +
+  # the +submit+ method and then subsequently retrieve the results by calling the
+  # +retrieve+ method.
   #
   # === examples
   #
@@ -18,7 +18,7 @@ module Firecrawl
   #     only_main_content true
   #   end
   #
-  #   batch_response = request.
+  #   batch_response = request.submit( urls, options )
   #   while batch_response.success?
   #     batch_result = batch_response.result
   #     if batch_result.success?
@@ -30,17 +30,18 @@ module Firecrawl
   #       end
   #     end
   #     break unless batch_result.status?( :scraping )
+  #     batch_response = request.retrieve( batch_result )
   #   end
   #
-  #   unless
-  #   puts
+  #   unless batch_response.success?
+  #     puts batch_response.result.error_description
   #   end
   #
   class BatchScrapeRequest < Request
 
     ##
-    # The +
-    #
+    # The +submit+ method makes a Firecrawl '/batch/scrape' POST request which will initiate
+    # batch scraping of the given urls.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
     # then +response.result+ will be an instance of +BatchScrapeResult+. If the request is not
@@ -50,7 +51,7 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfully.
     #
-    def
+    def submit( urls, options = nil, &block )
       if options
         options = options.is_a?( ScrapeOptions ) ? options : ScrapeOptions.build( options.to_h )
         options = options.to_h
@@ -58,25 +59,25 @@ module Firecrawl
         options = {}
       end
       options[ :urls ] = [ urls ].flatten
-
       response = post( "#{BASE_URI}/batch/scrape", options, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-        attributes = ( JSON.parse( response.body, symbolize_names: true ) rescue nil )
         attributes ||= { success: false, status: :failed }
         result = BatchScrapeResult.new( attributes[ :success ], attributes )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
     end
 
     ##
-    # The +
-    #
-    #
-    #
+    # The +retrieve+ method makes a Firecrawl '/batch/scrape/{id}' GET request which will return
+    # the scrape results that were completed since the previous call to this method ( or, if
+    # this is the first call to this method, since the batch scrape was started ). Note that
+    # there is no guarantee that there are any new batch scrape results at the time you make
+    # this call ( scrape_results may be empty ).
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is +true+,
     # then +response.result+ will be an instance of +BatchScrapeResult+. If the request is not
@@ -86,17 +87,53 @@ module Firecrawl
     # successful and then +response.result.success?+ to validate that the API processed the
     # request successfully.
     #
-    def
+    def retrieve( batch_result, &block )
       raise ArgumentError, "The first argument must be an instance of BatchScrapeResult." \
         unless batch_result.is_a?( BatchScrapeResult )
       response = get( batch_result.next_url, &block )
       result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
+      if response.success?
+        attributes ||= { success: false, status: :failed }
+        result = batch_result.merge( attributes )
+      else
+        result = ErrorResult.new( response.status, attributes || {} )
+      end
+
+      ResponseMethods.install( response, result )
+    end
+
+    ##
+    # The +retrieve_all+ method makes a Firecrawl '/batch/scrape/{id}' GET request which will
+    # return the scrape results that were completed at the time of this call. Repeated calls to
+    # this method will retrieve the scrape results previously returned as well as any scrape
+    # results that have accumulated since.
+    #
+    # Note that there is no guarantee that there are any new batch scrape results at the time
+    # you make this call ( scrape_results may be empty ).
+    #
+    # The response is always an instance of +Faraday::Response+. If +response.success?+ is +true+,
+    # then +response.result+ will be an instance of +BatchScrapeResult+. If the request is not
+    # successful then +response.result+ will be an instance of +ErrorResult+.
+    #
+    # Remember that you should call +response.success?+ to validate that the call to the API was
+    # successful and then +response.result.success?+ to validate that the API processed the
+    # request successfully.
+    #
+    def retrieve_all( batch_result, &block )
+      raise ArgumentError, "The first argument must be an instance of BatchScrapeResult." \
+        unless batch_result.is_a?( BatchScrapeResult )
+      response = get( batch_result.url, &block )
+      result = nil
+      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
       if response.success?
-        attributes = ( JSON.parse( response.body, symbolize_names: true ) rescue nil )
         attributes ||= { success: false, status: :failed }
+        # the next url should not be set by this method so that retrieve and retrieve_all do
+        # not impact each other
+        attributes.delete( :next )
         result = batch_result.merge( attributes )
       else
-        result = ErrorResult.new( response.status, attributes )
+        result = ErrorResult.new( response.status, attributes || {} )
       end
 
       ResponseMethods.install( response, result )
```
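The practical difference between the two polling methods: `retrieve` follows `next_url`, yielding only the results completed since the last call, while `retrieve_all` re-fetches the batch's base `url` and discards the `:next` attribute so the two methods do not interfere. A minimal polling sketch against this API ( the URLs and sleep interval are illustrative ):

```ruby
request = Firecrawl::BatchScrapeRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
response = request.submit( [ 'https://example.com', 'https://example.org' ] )

while response.success?
  batch_result = response.result
  break unless batch_result.status?( :scraping )
  sleep 0.5
  # retrieve_all re-fetches everything completed so far; retrieve( batch_result )
  # would instead return only the results completed since the previous call
  response = request.retrieve_all( batch_result )
end

if response.success?
  response.result.scrape_results.each { | result | puts result.metadata[ 'title' ] }
else
  puts response.result.error_description
end
```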
data/lib/firecrawl/crawl_options.rb
CHANGED

```diff
@@ -1,21 +1,17 @@
 module Firecrawl
   class CrawlOptions
     include DynamicSchema::Definable
-    include
-
-    FORMATS = [ :markdown, :links, :html, :raw_html, :screenshot ]
-
-    ACTIONS = [ :wait, :click, :write, :press, :screenshot, :scrape ]
+    include Helpers
 
     schema do
       exclude_paths String, as: :excludePaths, array: true
       include_paths String, as: :includePaths, array: true
       maximum_depth Integer, as: :maxDepth
       ignore_sitemap [ TrueClass, FalseClass ], as: :ignoreSitemap
-      limit Integer
+      limit Integer, in: (0..)
       allow_backward_links [ TrueClass, FalseClass ], as: :allowBackwardLinks
       allow_external_links [ TrueClass, FalseClass ], as: :allowExternalLinks
-      webhook
+      webhook_uri URI, as: :webhook
       scrape_options as: :scrapeOptions, &ScrapeOptions.schema
     end
 
@@ -27,13 +23,13 @@ module Firecrawl
       new( api_options: builder.build!( options, &block ) )
     end
 
-    def initialize( options, api_options: nil )
+    def initialize( options = nil, api_options: nil )
       @options = self.class.builder.build( options || {} )
       @options = api_options.merge( @options ) if api_options
 
       scrape_options = @options[ :scrapeOptions ]
       if scrape_options
-        scrape_options[ :formats ]&.map!
+        scrape_options[ :formats ]&.map! { | format | string_camelize( format.to_s ) }
       end
     end
```
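With the reworked schema, `limit` is validated against `(0..)` and the webhook is declared as `webhook_uri` with a `URI` type, serialized under the API's `webhook` key. A sketch of building options against this schema ( the values are illustrative, and the explicit `URI( ... )` conversion is an assumption about how the `URI`-typed option is supplied ):

```ruby
options = Firecrawl::CrawlOptions.build do
  maximum_depth 3
  limit 50                                                # must satisfy (0..)
  ignore_sitemap true
  webhook_uri URI( 'https://example.com/firecrawl/hook' ) # sent as :webhook
  scrape_options do
    format :markdown
    only_main_content true
  end
end
```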
data/lib/firecrawl/crawl_request.rb
ADDED
@@ -0,0 +1,135 @@

```ruby
module Firecrawl

  ##
  # The +CrawlRequest+ class encapsulates a crawl request to the Firecrawl API. After creating
  # a new +CrawlRequest+ instance you can begin crawling by calling the +submit+ method and
  # then subsequently retrieving the results by calling the +retrieve+ method.
  #
  # You can also optionally cancel the crawling operation by calling +cancel+.
  #
  # === examples
  #
  #   require 'firecrawl'
  #
  #   request = Firecrawl::CrawlRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
  #
  #   url = 'https://icann.org'
  #   options = Firecrawl::CrawlOptions.build do
  #     scrape_options do
  #       only_main_content true
  #     end
  #   end
  #
  #   crawl_response = request.submit( url, options )
  #   while crawl_response.success?
  #     crawl_result = crawl_response.result
  #     if crawl_result.success?
  #       crawl_result.scrape_results.each do | result |
  #         puts result.metadata[ 'title' ]
  #         puts '---'
  #         puts result.markdown
  #         puts "\n\n"
  #       end
  #     end
  #     break unless crawl_result.status?( :scraping )
  #     crawl_response = request.retrieve( crawl_result )
  #   end
  #
  #   unless crawl_response.success?
  #     puts crawl_response.result.error_description
  #   end
  #
  class CrawlRequest < Request

    ##
    # The +submit+ method makes a Firecrawl '/crawl' POST request which will initiate crawling
    # of the given url.
    #
    # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
    # then +response.result+ will be an instance of +CrawlResult+. If the request is not
    # successful then +response.result+ will be an instance of +ErrorResult+.
    #
    # Remember that you should call +response.success?+ to validate that the call to the API was
    # successful and then +response.result.success?+ to validate that the API processed the
    # request successfully.
    #
    def submit( url, options = nil, &block )
      if options
        options = options.is_a?( CrawlOptions ) ? options : CrawlOptions.build( options.to_h )
        options = options.to_h
      else
        options = {}
      end
      options[ :url ] = url
      response = post( "#{BASE_URI}/crawl", options, &block )
      result = nil
      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
      if response.success?
        attributes ||= { success: false, status: :failed }
        result = CrawlResult.new( attributes[ :success ], attributes )
      else
        result = ErrorResult.new( response.status, attributes )
      end

      ResponseMethods.install( response, result )
    end

    ##
    # The +retrieve+ method makes a Firecrawl '/crawl/{id}' GET request which will return the
    # crawl results that were completed since the previous call to this method ( or, if this is
    # the first call to this method, since the crawl was started ). Note that there is no
    # guarantee that there are any new crawl results at the time you make this call
    # ( scrape_results may be empty ).
    #
    # The response is always an instance of +Faraday::Response+. If +response.success?+ is
    # +true+, then +response.result+ will be an instance of +CrawlResult+. If the request is not
    # successful then +response.result+ will be an instance of +ErrorResult+.
    #
    # Remember that you should call +response.success?+ to validate that the call to the API was
    # successful and then +response.result.success?+ to validate that the API processed the
    # request successfully.
    #
    def retrieve( crawl_result, &block )
      raise ArgumentError, "The first argument must be an instance of CrawlResult." \
        unless crawl_result.is_a?( CrawlResult )
      response = get( crawl_result.next_url, &block )
      result = nil
      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
      if response.success?
        result = crawl_result.merge( attributes || { success: false, status: :failed } )
      else
        result = ErrorResult.new( response.status, attributes || {} )
      end

      ResponseMethods.install( response, result )
    end

    ##
    # The +cancel+ method makes a Firecrawl '/crawl/{id}' DELETE request which will cancel a
    # previously submitted crawl.
    #
    # The response is always an instance of +Faraday::Response+. If +response.success?+ is
    # +true+, then +response.result+ will be an instance of +CrawlResult+. If the request is not
    # successful then +response.result+ will be an instance of +ErrorResult+.
    #
    # Remember that you should call +response.success?+ to validate that the call to the API was
    # successful and then +response.result.success?+ to validate that the API processed the
    # request successfully.
    #
    def cancel( crawl_result, &block )
      raise ArgumentError, "The first argument must be an instance of CrawlResult." \
        unless crawl_result.is_a?( CrawlResult )
      response = delete( crawl_result.url, &block )
      result = nil
      attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
      if response.success?
        result = crawl_result.merge( attributes || { success: false, status: :failed } )
      else
        result = ErrorResult.new( response.status, attributes || {} )
      end

      ResponseMethods.install( response, result )
    end

  end
end
```
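A short sketch of the cancellation flow; `cancel` takes the `CrawlResult` returned by `submit`, and the usual two-level `success?` checks apply:

```ruby
request = Firecrawl::CrawlRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
response = request.submit( 'https://example.com', { limit: 100 } )

if response.success?
  crawl_result = response.result
  # ... later, if the remaining pages are no longer needed ...
  cancel_response = request.cancel( crawl_result )
  puts cancel_response.result.error_description unless cancel_response.success?
end
```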
data/lib/firecrawl/crawl_result.rb
ADDED
@@ -0,0 +1,63 @@

```ruby
module Firecrawl
  class CrawlResult

    def initialize( success, attributes )
      @success = success
      @attributes = attributes || {}
    end

    def success?
      @success || false
    end

    def status
      # the initial Firecrawl response does not have a status so we synthesize a 'crawling'
      # status if the operation was otherwise successful
      @attributes[ :status ]&.to_sym || ( @success ? :scraping : :failed )
    end

    def status?( status )
      self.status == status
    end

    def id
      @attributes[ :id ]
    end

    def total
      @attributes[ :total ] || 0
    end

    def completed
      @attributes[ :completed ] || 0
    end

    def credits_used
      @attributes[ :creditsUsed ] || 0
    end

    def expires_at
      Date.parse( @attributes[ :expiresAt ] ) rescue nil
    end

    def url
      @attributes[ :url ]
    end

    def next_url
      @attributes[ :next ] || @attributes[ :url ]
    end

    def scrape_results
      success = @attributes[ :success ]
      # note the &.compact is here because I've noted null entries in the data
      ( @attributes[ :data ]&.compact || [] ).map do | attr |
        ScrapeResult.new( success, attr )
      end
    end

    def merge( attributes )
      self.class.new( attributes[ :success ], @attributes.merge( attributes ) )
    end
  end
end
```
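Because `total`, `completed`, and `credits_used` default to 0 and `status` is synthesized as `:scraping` until the API reports otherwise, progress reporting can be written defensively. A hypothetical helper, not part of the gem:

```ruby
# hypothetical helper, not part of the gem
def report_progress( crawl_result )
  percent = crawl_result.total.zero? ? 0 : ( 100.0 * crawl_result.completed / crawl_result.total ).round
  puts "#{crawl_result.status}: #{crawl_result.completed}/#{crawl_result.total} pages " \
       "(#{percent}%), #{crawl_result.credits_used} credits used"
end
```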
data/lib/firecrawl/error_result.rb
CHANGED

```diff
@@ -5,7 +5,7 @@ module Firecrawl
 
     def initialize( status_code, attributes = nil )
       @error_code, @error_description = status_code_to_error( status_code )
-      @error_description = attributes[ :error ] if
+      @error_description = attributes[ :error ] if attributes&.respond_to?( :[] )
     end
 
     private
```
data/lib/firecrawl/map_request.rb
CHANGED

```diff
@@ -10,7 +10,7 @@ module Firecrawl
   #
   #   request = Firecrawl::MapRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )
   #
-  #   response = request.
+  #   response = request.submit( 'https://example.com', { limit: 100 } )
   #   if response.success?
   #     result = response.result
   #     if result.success?
@@ -25,14 +25,14 @@ module Firecrawl
   class MapRequest < Request
 
     ##
-    # The +
-    # given url.
+    # The +submit+ method makes a Firecrawl '/map' POST request which will scrape the site with
+    # the given url and return links to all hosted pages related to that url.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
     # then +response.result+ will be an instance of +MapResult+. If the request is not successful
     # then +response.result+ will be an instance of +ErrorResult+.
     #
-    def
+    def submit( url, options = nil, &block )
       if options
         options = options.is_a?( MapOptions ) ? options : MapOptions.build( options.to_h )
         options = options.to_h
```
data/lib/firecrawl/module_methods.rb
ADDED
@@ -0,0 +1,18 @@

```ruby
module Firecrawl
  module ModuleMethods
    DEFAULT_CONNECTION = Faraday.new { | builder | builder.adapter Faraday.default_adapter }

    def connection( connection = nil )
      @connection = connection || @connection || DEFAULT_CONNECTION
    end

    def api_key( api_key = nil )
      @api_key = api_key || @api_key
      @api_key
    end

    def scrape( url, options = nil, &block )
      Firecrawl::ScrapeRequest.new.submit( url, options, &block )
    end
  end
end
```
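These module methods back the `Firecrawl.api_key` and `Firecrawl.scrape` convenience calls shown in the README; `connection` works the same way, installing a shared Faraday connection once. A configuration sketch ( the logging middleware is illustrative ):

```ruby
require 'firecrawl'

Firecrawl.api_key ENV[ 'FIRECRAWL_API_KEY' ]
Firecrawl.connection Faraday.new { | builder |
  builder.response :logger                 # illustrative middleware
  builder.adapter Faraday.default_adapter
}

response = Firecrawl.scrape( 'https://example.com' )
puts response.result.markdown if response.success?
```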
data/lib/firecrawl/request.rb
CHANGED

```diff
@@ -28,8 +28,6 @@ module Firecrawl
   #
   class Request
 
-    DEFAULT_CONNECTION = Faraday.new { | builder | builder.adapter Faraday.default_adapter }
-
     BASE_URI = 'https://api.firecrawl.dev/v1'
 
     ##
@@ -37,7 +35,7 @@ module Firecrawl
     # and optionally a (Faraday) +connection+.
     #
     def initialize( connection: nil, api_key: nil )
-      @connection = connection ||
+      @connection = connection || Firecrawl.connection
       @api_key = api_key || Firecrawl.api_key
       raise ArgumentError, "An 'api_key' is required unless configured using 'Firecrawl.api_key'." \
         unless @api_key
@@ -70,6 +68,18 @@ module Firecrawl
       end
     end
 
+    def delete( uri, &block )
+      headers = {
+        'Authorization' => "Bearer #{@api_key}",
+        'Content-Type' => 'application/json'
+      }
+
+      @connection.delete( uri ) do | request |
+        headers.each { | key, value | request.headers[ key ] = value }
+        block.call( request ) if block
+      end
+    end
+
   end
 
 end
```
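Since `initialize` still accepts a per-request `connection:`, the global connection is only a default. A sketch of overriding it for a single request, for example to allow a longer timeout ( the timeout value is illustrative ):

```ruby
slow_connection = Faraday.new( request: { timeout: 120 } ) do | builder |
  builder.adapter Faraday.default_adapter
end

request = Firecrawl::ScrapeRequest.new(
  connection: slow_connection,
  api_key: ENV[ 'FIRECRAWL_API_KEY' ]
)
```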
data/lib/firecrawl/scrape_options.rb
CHANGED

```diff
@@ -9,7 +9,7 @@ module Firecrawl
 
     schema do
       # note: both format and formats are defined as a semantic convenience
-      format String, as: :formats, array: true, in: FORMATS
+      format String, as: :formats, array: true, in: FORMATS
       formats String, array: true, in: FORMATS
       only_main_content [ TrueClass, FalseClass ], as: :onlyMainContent
       include_tags String, as: :includeTags, array: true
@@ -17,7 +17,7 @@ module Firecrawl
       wait_for Integer
       timeout Integer
       extract do
-
+        schema Hash
         system_prompt String, as: :systemPrompt
         prompt String
       end
```
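With `schema Hash` accepted inside the `extract` block, structured extraction options can now carry a JSON-Schema-style hash alongside the prompts. A sketch ( the schema shape is illustrative; the parentheses keep Ruby from parsing the hash literal as a block ):

```ruby
options = Firecrawl::ScrapeOptions.build do
  extract do
    schema( {
      'type' => 'object',
      'properties' => { 'title' => { 'type' => 'string' } }
    } )
    system_prompt 'You extract structured data from web pages.'
    prompt 'Extract the page title.'
  end
end
```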
data/lib/firecrawl/scrape_request.rb
CHANGED

```diff
@@ -1,7 +1,7 @@
 module Firecrawl
   ##
   # The +ScrapeRequest+ class encapsulates a '/scrape' POST request to the Firecrawl API. After
-  # creating a new +ScrapeRequest+ instance you can initiate the request by calling the +
+  # creating a new +ScrapeRequest+ instance you can initiate the request by calling the +submit+
   # method to perform synchronous scraping.
   #
   # === examples
@@ -15,7 +15,7 @@ module Firecrawl
   #     only_main_content true
   #   end
   #
-  #   response = request.
+  #   response = request.submit( 'https://example.com', options )
   #   if response.success?
   #     result = response.result
   #     puts result.metadata[ 'title' ]
@@ -28,13 +28,13 @@ module Firecrawl
   class ScrapeRequest < Request
 
     ##
-    # The +
+    # The +submit+ method makes a Firecrawl '/scrape' POST request which will scrape the given url.
     #
     # The response is always an instance of +Faraday::Response+. If +response.success?+ is true,
     # then +response.result+ will be an instance of +ScrapeResult+. If the request is not successful
     # then +response.result+ will be an instance of +ErrorResult+.
     #
-    def
+    def submit( url, options = nil, &block )
       if options
         options = options.is_a?( ScrapeOptions ) ? options : ScrapeOptions.build( options.to_h )
         options = options.to_h
```
data/lib/firecrawl/scrape_result.rb
CHANGED

```diff
@@ -16,6 +16,20 @@ module Firecrawl
       @success || false
     end
 
+    def metadata
+      unless @metadata
+        metadata = @attributes[ :metadata ] || {}
+        @metadata = metadata.transform_keys do | key |
+          key.to_s.gsub( /([a-z])([A-Z])/, '\1_\2' ).downcase
+        end
+        # remove the camelCase forms injected by Firecrawl
+        @metadata.delete_if do | key, _ |
+          key.start_with?( 'og_' ) && @metadata.key?( key.sub( 'og_', 'og:' ) )
+        end
+      end
+      @metadata
+    end
+
     ##
     # The +markdown+ method returns scraped content that has been converted to markdown. The
     # markdown content is present only if the request options +formats+ included +markdown+.
@@ -66,20 +80,6 @@ module Firecrawl
       @attributes[ :actions ] || {}
     end
 
-    def metadata
-      unless @metadata
-        metadata = @attributes[ :metadata ] || {}
-        @metadata = metadata.transform_keys do | key |
-          key.to_s.gsub( /([a-z])([A-Z])/, '\1_\2' ).downcase
-        end
-        # remove the camelCase forms injected by Firecrawl
-        @metadata.delete_if do | key, _ |
-          key.start_with?( 'og_' ) && @metadata.key?( key.sub( 'og_', 'og:' ) )
-        end
-      end
-      @metadata
-    end
-
     def llm_extraction
       @attributes[ :llm_extraction ] || {}
     end
```
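The relocated `metadata` accessor still normalizes Firecrawl's camelCase keys to snake_case and then drops any `og_*` key that duplicates a colon-form `og:*` key. Illustratively, with made-up attributes and assuming the two-argument constructor used by the other result classes:

```ruby
result = Firecrawl::ScrapeResult.new( true, {
  metadata: { ogTitle: 'Example', 'og:title': 'Example', statusCode: 200 }
} )
result.metadata
# => { "og:title" => "Example", "status_code" => 200 }
```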
data/lib/firecrawl.rb
CHANGED

```diff
@@ -18,10 +18,14 @@ require_relative 'firecrawl/batch_scrape_request'
 require_relative 'firecrawl/map_options'
 require_relative 'firecrawl/map_result'
 require_relative 'firecrawl/map_request'
+require_relative 'firecrawl/crawl_options'
+require_relative 'firecrawl/crawl_result'
+require_relative 'firecrawl/crawl_request'
+
+require_relative 'firecrawl/module_methods'
 
 module Firecrawl
-
-  attr_accessor :api_key
-end
+  extend ModuleMethods
 end
 
+
```
metadata
CHANGED

```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: firecrawl
 version: !ruby/object:Gem::Version
-  version: 0.0.1
+  version: 0.2.0
 platform: ruby
 authors:
 - Kristoph Cichocki-Romanov
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-11-
+date: 2024-11-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: faraday
@@ -30,14 +30,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.0.0.
+        version: 1.0.0.beta04
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.0.0.
+        version: 1.0.0.beta04
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
@@ -66,6 +66,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '1.9'
+- !ruby/object:Gem::Dependency
+  name: vcr
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '6.3'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '6.3'
 description: |-
   The Firecrawl gem implements a lightweight interface to the Firecrawl.dev API. Firecrawl can take a URL, scrape the page contents and return the whole page or principal content as html, markdown, or structured data.
 
@@ -77,16 +91,20 @@ extensions: []
 extra_rdoc_files: []
 files:
 - LICENSE
+- README.md
 - firecrawl.gemspec
 - lib/firecrawl.rb
 - lib/firecrawl/batch_scrape_request.rb
 - lib/firecrawl/batch_scrape_result.rb
 - lib/firecrawl/crawl_options.rb
+- lib/firecrawl/crawl_request.rb
+- lib/firecrawl/crawl_result.rb
 - lib/firecrawl/error_result.rb
 - lib/firecrawl/helpers.rb
 - lib/firecrawl/map_options.rb
 - lib/firecrawl/map_request.rb
 - lib/firecrawl/map_result.rb
+- lib/firecrawl/module_methods.rb
 - lib/firecrawl/request.rb
 - lib/firecrawl/response_methods.rb
 - lib/firecrawl/scrape_options.rb
```