crawlbase 1.0.0
- checksums.yaml +7 -0
- data/.gitignore +9 -0
- data/CODE_OF_CONDUCT.md +74 -0
- data/Gemfile +3 -0
- data/LICENSE.txt +21 -0
- data/README.md +364 -0
- data/Rakefile +2 -0
- data/bin/console +14 -0
- data/bin/setup +8 -0
- data/crawlbase.gemspec +31 -0
- data/lib/crawlbase/api.rb +82 -0
- data/lib/crawlbase/leads_api.rb +41 -0
- data/lib/crawlbase/scraper_api.rb +23 -0
- data/lib/crawlbase/screenshots_api.rb +52 -0
- data/lib/crawlbase/storage_api.rb +116 -0
- data/lib/crawlbase/version.rb +5 -0
- data/lib/crawlbase.rb +11 -0
- metadata +116 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: 50c42144b472e240907828a2656215a7f1e5004f07a3288d3eafca184b4dc16c
  data.tar.gz: fdbbd3ebe2a64ecde61e34b94bd5716574d04d5f1bffb5e1a9225f6500b0a793
SHA512:
  metadata.gz: c29927980f6cf82b431c7385e78429802916ebe188a40e789ca17e1c5a63d7183f4f8b5b30d8b9b7bd42c84fa6f1223eda48703cf317395e359a422a88cddfee
  data.tar.gz: db7cf49a5dc174920a76bb35fecb92097b61025e3b2f7fb372a7143282fba93edd748763020001b0cdd928da4500cfe1a948e8123a2232a614cbfb399f3a4843
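These are the SHA-256 and SHA-512 digests of the packaged gem's `metadata.gz` and `data.tar.gz`. As a minimal sketch using Ruby's standard library, you can recompute and compare a digest yourself (the local file path below is a placeholder for a component extracted from the fetched `.gem` archive):

```ruby
require 'digest'

# Recompute a component's SHA-256 and compare it with the value published
# in checksums.yaml ('metadata.gz' is a placeholder path to the extracted file).
actual   = Digest::SHA256.file('metadata.gz').hexdigest
expected = '50c42144b472e240907828a2656215a7f1e5004f07a3288d3eafca184b4dc16c'
puts(actual == expected ? 'checksum OK' : 'checksum MISMATCH')
```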
data/.gitignore
ADDED
data/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,74 @@
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
nationality, personal appearance, race, religion, or sexual identity and
orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at info@crawlbase.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at [http://contributor-covenant.org/version/1/4][version]

[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2020 Crawlbase

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,364 @@
# Crawlbase

Dependency-free gem for scraping and crawling websites using the Crawlbase API.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'crawlbase'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install crawlbase

## Crawling API Usage

Require the gem in your project

```ruby
require 'crawlbase'
```

Initialize the API with one of your account tokens, either the normal or the JavaScript token, then make GET or POST requests accordingly.

You can get a token for free by [creating a Crawlbase account](https://crawlbase.com/signup), which includes 1000 free testing requests. You can use them for TCP calls, JavaScript calls, or both.

```ruby
api = Crawlbase::API.new(token: 'YOUR_TOKEN')
```

### GET requests

Pass the url that you want to scrape plus any options from the ones available in the [API documentation](https://crawlbase.com/dashboard/docs).

```ruby
api.get(url, options)
```

Example:

```ruby
begin
  response = api.get('https://www.facebook.com/britneyspears')
  puts response.status_code
  puts response.original_status
  puts response.pc_status
  puts response.body
rescue => exception
  puts exception.backtrace
end
```

You can pass any of the options that the Crawlbase API supports, using the exact parameter names.

Example:

```ruby
options = {
  user_agent: 'Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/30.0',
  format: 'json'
}

response = api.get('https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/', options)

puts response.status_code
puts response.body # read the API json response
```
### POST requests

Pass the url that you want to scrape and the data that you want to send, which can be either JSON or a string, plus any options from the ones available in the [API documentation](https://crawlbase.com/dashboard/docs).

```ruby
api.post(url, data, options)
```

Example:

```ruby
api.post('https://producthunt.com/search', { text: 'example search' })
```

You can send the data as application/json instead of x-www-form-urlencoded by setting the `post_content_type` option to `json`.

```ruby
response = api.post('https://httpbin.org/post', { some_json: 'with some value' }, { post_content_type: 'json' })

puts response.status_code
puts response.body
```

### Javascript requests

If you need to scrape any website built with JavaScript, like React, Angular, Vue, etc., you just need to pass your JavaScript token and use the same calls. Note that only `.get` is available for JavaScript, not `.post`.

```ruby
api = Crawlbase::API.new(token: 'YOUR_JAVASCRIPT_TOKEN')
```

```ruby
response = api.get('https://www.nfl.com')
puts response.status_code
puts response.body
```

In the same way, you can pass additional JavaScript options.

```ruby
response = api.get('https://www.freelancer.com', page_wait: 5000)
puts response.status_code
```

## Original status

You can always get the original status and the Crawlbase status from the response. Read the [Crawlbase documentation](https://crawlbase.com/dashboard/docs) to learn more about those statuses.

```ruby
response = api.get('https://sfbay.craigslist.org/')

puts response.original_status
puts response.pc_status
```

## Scraper API usage

Initialize the Scraper API using your normal token and call the `get` method.

```ruby
scraper_api = Crawlbase::ScraperAPI.new(token: 'YOUR_TOKEN')
```

Pass the url that you want to scrape plus any options from the ones available in the [Scraper API documentation](https://crawlbase.com/docs/scraper-api/parameters).

```ruby
scraper_api.get(url, options)
```

Example:

```ruby
begin
  response = scraper_api.get('https://www.amazon.com/Halo-SleepSack-Swaddle-Triangle-Neutral/dp/B01LAG1TOS')
  puts response.remaining_requests
  puts response.status_code
  puts response.body
rescue => exception
  puts exception.backtrace
end
```

## Leads API usage

Initialize with your Leads API token and call the `get` method.

For more details on the implementation, please visit the [Leads API documentation](https://crawlbase.com/docs/leads-api).

```ruby
leads_api = Crawlbase::LeadsAPI.new(token: 'YOUR_TOKEN')

begin
  response = leads_api.get('stripe.com')
  puts response.success
  puts response.remaining_requests
  puts response.status_code
  puts response.body
rescue => exception
  puts exception.backtrace
end
```

If you have questions or need help using the library, please open an issue or [contact us](https://crawlbase.com/contact).

## Screenshots API usage

Initialize with your Screenshots API token and call the `get` method.

```ruby
screenshots_api = Crawlbase::ScreenshotsAPI.new(token: 'YOUR_TOKEN')

begin
  response = screenshots_api.get('https://www.apple.com')
  puts response.success
  puts response.remaining_requests
  puts response.status_code
  puts response.screenshot_path # do something with screenshot_path here
rescue => exception
  puts exception.backtrace
end
```

or using a block:

```ruby
screenshots_api = Crawlbase::ScreenshotsAPI.new(token: 'YOUR_TOKEN')

begin
  response = screenshots_api.get('https://www.apple.com') do |file|
    # do something (reading/writing) with the image file here
  end
  puts response.success
  puts response.remaining_requests
  puts response.status_code
rescue => exception
  puts exception.backtrace
end
```

or specifying a file path:

```ruby
screenshots_api = Crawlbase::ScreenshotsAPI.new(token: 'YOUR_TOKEN')

begin
  response = screenshots_api.get('https://www.apple.com', save_to_path: '~/screenshot.jpg') do |file|
    # do something (reading/writing) with the image file here
  end
  puts response.success
  puts response.remaining_requests
  puts response.status_code
rescue => exception
  puts exception.backtrace
end
```

Note that the `screenshots_api.get(url, options)` method accepts an [options](https://crawlbase.com/docs/screenshots-api/parameters) hash.
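Any other keys in that hash are forwarded into the request's query string, just like in the Crawling API client. A minimal sketch (`page_wait` is an assumed parameter shown for illustration; check the parameters page for the actual supported keys):

```ruby
# `page_wait` is an assumption for illustration only; consult the
# Screenshots API parameters documentation for the supported options.
response = screenshots_api.get('https://www.apple.com', page_wait: 2000)
puts response.screenshot_url
```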
## Storage API usage

Initialize the Storage API using your private token.

```ruby
storage_api = Crawlbase::StorageAPI.new(token: 'YOUR_TOKEN')
```

Pass the [url](https://crawlbase.com/docs/storage-api/parameters/#url) that you want to get from [Crawlbase Storage](https://crawlbase.com/dashboard/storage).

```ruby
begin
  response = storage_api.get('https://www.apple.com')
  puts response.original_status
  puts response.pc_status
  puts response.url
  puts response.status_code
  puts response.rid
  puts response.body
  puts response.stored_at
rescue => exception
  puts exception.backtrace
end
```

or you can use the [RID](https://crawlbase.com/docs/storage-api/parameters/#rid)

```ruby
begin
  response = storage_api.get(RID)
  puts response.original_status
  puts response.pc_status
  puts response.url
  puts response.status_code
  puts response.rid
  puts response.body
  puts response.stored_at
rescue => exception
  puts exception.backtrace
end
```

Note: either the URL or the RID must be sent. Both parameters are optional on their own, but every request must include one of the two.
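Internally, the client inspects the argument to decide which parameter to send, so both calls below hit the same endpoint (the RID value is a placeholder):

```ruby
# Arguments starting with http:// or https:// are sent as url=...,
# anything else is sent as rid=...
storage_api.get('https://www.apple.com') # queried by URL
storage_api.get('s0m3_r1d_v4lu3')        # queried by RID (placeholder value)
```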
### [Delete](https://crawlbase.com/docs/storage-api/delete/) request

To delete a storage item from your storage area, use the corresponding RID:

```ruby
if storage_api.delete(RID)
  puts 'delete success'
else
  puts "Unable to delete: #{storage_api.body['error']}"
end
```

### [Bulk](https://crawlbase.com/docs/storage-api/bulk/) request

To make a bulk request, pass the list of RIDs as an array:

```ruby
begin
  response = storage_api.bulk([RID1, RID2, RID3, ...])
  puts response.original_status
  puts response.pc_status
  puts response.url
  puts response.status_code
  puts response.rid
  puts response.body
  puts response.stored_at
rescue => exception
  puts exception.backtrace
end
```

### [RIDs](https://crawlbase.com/docs/storage-api/rids) request

To request a list of RIDs from your storage area:

```ruby
begin
  rids = storage_api.rids # returns the array of RIDs
  puts storage_api.status_code
  puts rids
rescue => exception
  puts exception.backtrace
end
```

You can also specify a limit as a parameter:

```ruby
storage_api.rids(100)
```

### [Total Count](https://crawlbase.com/docs/storage-api/total_count)

To get the total number of documents in your storage area:

```ruby
total_count = storage_api.total_count
puts "total_count: #{total_count}"
```

If you have questions or need help using the library, please open an issue or [contact us](https://crawlbase.com/contact).
## Development

After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
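A typical local session, using the commands named above:

    $ bin/setup
    $ bin/console
    $ bundle exec rake install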
## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/crawlbase-source/crawlbase-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.

## License

The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).

## Code of Conduct

Everyone interacting in the Crawlbase project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/crawlbase-source/crawlbase-ruby/blob/master/CODE_OF_CONDUCT.md).

---

Copyright 2023 Crawlbase
data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,14 @@
#!/usr/bin/env ruby

require "bundler/setup"
require "crawlbase"

# You can add fixtures and/or initialization code here to make experimenting
# with your gem easier. You can also use a different console, if you like.

# (If you use this, don't forget to add pry to your Gemfile!)
# require "pry"
# Pry.start

require "irb"
IRB.start(__FILE__)
data/bin/setup
ADDED
data/crawlbase.gemspec
ADDED
@@ -0,0 +1,31 @@
# coding: utf-8
lib = File.expand_path("../lib", __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require "crawlbase/version"

Gem::Specification.new do |spec|
  spec.name          = "crawlbase"
  spec.version       = Crawlbase::VERSION
  spec.platform      = Gem::Platform::RUBY
  spec.authors       = ["crawlbase"]
  spec.email         = ["info@crawlbase.com"]
  spec.summary       = %q{Crawlbase API client for web scraping and crawling}
  spec.description   = %q{Ruby based client for the Crawlbase API that helps developers crawl or scrape thousands of web pages anonymously}
  spec.homepage      = "https://github.com/crawlbase-source/crawlbase-ruby"
  spec.license       = "MIT"

  spec.files = `git ls-files -z`.split("\x0").reject do |f|
    f.match(%r{^(test|spec|features)/})
  end

  spec.required_ruby_version = '>= 2.0'

  spec.bindir        = "exe"
  spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ["lib"]

  spec.add_development_dependency "rspec", "~> 3.2"
  spec.add_development_dependency "webmock", "~> 3.4"
  spec.add_development_dependency "bundler", "~> 2.0"
  spec.add_development_dependency "rake", "~> 12.3.3"
end
data/lib/crawlbase/api.rb
ADDED
@@ -0,0 +1,82 @@
# frozen_string_literal: true

require 'net/http'
require 'json'
require 'uri'

module Crawlbase
  class API
    attr_reader :token, :body, :status_code, :original_status, :pc_status, :url, :storage_url

    INVALID_TOKEN = 'Token is required'
    INVALID_URL = 'URL is required'

    def initialize(options = {})
      raise INVALID_TOKEN if options[:token].nil?

      @token = options[:token]
    end

    def get(url, options = {})
      raise INVALID_URL if url.empty?

      uri = prepare_uri(url, options)

      response = Net::HTTP.get_response(uri)

      prepare_response(response, options[:format])

      self
    end

    def post(url, data, options = {})
      raise INVALID_URL if url.empty?

      uri = prepare_uri(url, options)

      http = Net::HTTP.new(uri.host, uri.port)

      http.use_ssl = true

      content_type = options[:post_content_type].to_s.include?('json') ? { 'Content-Type': 'text/json' } : nil

      request = Net::HTTP::Post.new(uri.request_uri, content_type)

      if options[:post_content_type].to_s.include?('json')
        request.body = data.to_json
      else
        request.set_form_data(data)
      end

      response = http.request(request)

      prepare_response(response, options[:format])

      self
    end

    private

    def base_url
      'https://api.crawlbase.com'
    end

    def prepare_uri(url, options)
      uri = URI(base_url)
      uri.query = URI.encode_www_form({ token: @token, url: url }.merge(options))

      uri
    end

    def prepare_response(response, format)
      res = format == 'json' || base_url.include?('/scraper') ? JSON.parse(response.body) : response

      @original_status = res['original_status'].to_i
      @pc_status = res['pc_status'].to_i
      @url = res['url']
      @storage_url = res['storage_url']
      @status_code = response.code.to_i
      @body = response.body
    end
  end
end
data/lib/crawlbase/leads_api.rb
ADDED
@@ -0,0 +1,41 @@
# frozen_string_literal: true

require 'net/http'
require 'json'
require 'uri'

module Crawlbase
  class LeadsAPI
    attr_reader :token, :body, :status_code, :success, :remaining_requests

    INVALID_TOKEN = 'Token is required'
    INVALID_DOMAIN = 'Domain is required'

    def initialize(options = {})
      raise INVALID_TOKEN if options[:token].nil? || options[:token].empty?

      @token = options[:token]
    end

    def get(domain)
      raise INVALID_DOMAIN if domain.empty?

      uri = URI('https://api.crawlbase.com/leads')
      uri.query = URI.encode_www_form({ token: token, domain: domain })

      response = Net::HTTP.get_response(uri)
      @status_code = response.code.to_i
      @body = response.body

      json_body = JSON.parse(response.body)
      @success = json_body['success']
      @remaining_requests = json_body['remaining_requests'].to_i

      self
    end

    def post
      raise 'Only GET is allowed for the LeadsAPI'
    end
  end
end
data/lib/crawlbase/scraper_api.rb
ADDED
@@ -0,0 +1,23 @@
# frozen_string_literal: true

module Crawlbase
  class ScraperAPI < Crawlbase::API
    attr_reader :remaining_requests

    def post
      raise 'Only GET is allowed for the ScraperAPI'
    end

    private

    def prepare_response(response, format)
      super(response, format)
      json_body = JSON.parse(response.body)
      @remaining_requests = json_body['remaining_requests'].to_i
    end

    def base_url
      'https://api.crawlbase.com/scraper'
    end
  end
end
data/lib/crawlbase/screenshots_api.rb
ADDED
@@ -0,0 +1,52 @@
# frozen_string_literal: true

require 'securerandom'
require 'tmpdir'

module Crawlbase
  class ScreenshotsAPI < Crawlbase::API
    attr_reader :screenshot_path, :success, :remaining_requests, :screenshot_url

    INVALID_SAVE_TO_PATH_FILENAME = 'Filename must end with .jpg or .jpeg'
    SAVE_TO_PATH_FILENAME_PATTERN = /.+\.(jpg|JPG|jpeg|JPEG)$/.freeze

    def post
      raise 'Only GET is allowed for the ScreenshotsAPI'
    end

    def get(url, options = {})
      screenshot_path = options.delete(:save_to_path) || generate_file_path
      raise INVALID_SAVE_TO_PATH_FILENAME unless SAVE_TO_PATH_FILENAME_PATTERN =~ screenshot_path

      response = super(url, options)
      file = File.open(screenshot_path, 'w+')
      file.write(response.body&.force_encoding('UTF-8'))
      @screenshot_path = screenshot_path
      yield(file) if block_given?
      response
    ensure
      file&.close
    end

    private

    def prepare_response(response, format)
      super(response, format)
      @remaining_requests = response['remaining_requests'].to_i
      @success = response['success'] == 'true'
      @screenshot_url = response['screenshot_url']
    end

    def base_url
      'https://api.crawlbase.com/screenshots'
    end

    def generate_file_name
      "#{SecureRandom.urlsafe_base64}.jpg"
    end

    def generate_file_path
      File.join(Dir.tmpdir, generate_file_name)
    end
  end
end
data/lib/crawlbase/storage_api.rb
ADDED
@@ -0,0 +1,116 @@
# frozen_string_literal: true

require 'net/http'
require 'json'
require 'uri'

module Crawlbase
  class StorageAPI
    attr_reader :token, :original_status, :pc_status, :url, :status_code, :rid, :body, :stored_at

    INVALID_TOKEN = 'Token is required'
    INVALID_RID = 'RID is required'
    INVALID_RID_ARRAY = 'One or more RIDs are required'
    INVALID_URL_OR_RID = 'Either URL or RID is required'
    BASE_URL = 'https://api.crawlbase.com/storage'

    def initialize(options = {})
      raise INVALID_TOKEN if options[:token].nil? || options[:token].empty?

      @token = options[:token]
    end

    def get(url_or_rid, format = 'html')
      raise INVALID_URL_OR_RID if url_or_rid.nil? || url_or_rid.empty?

      uri = URI(BASE_URL)
      uri.query = URI.encode_www_form({ token: token, format: format }.merge(decide_url_or_rid(url_or_rid)))
      response = Net::HTTP.get_response(uri)

      res = format == 'json' ? JSON.parse(response.body) : response

      @original_status = res['original_status'].to_i
      @pc_status = res['pc_status'].to_i
      @url = res['url']
      @rid = res['rid']
      @stored_at = res['stored_at']

      @status_code = response.code.to_i
      @body = response.body

      self
    end

    def delete(rid)
      raise INVALID_RID if rid.nil? || rid.empty?

      uri = URI(BASE_URL)
      uri.query = URI.encode_www_form(token: token, rid: rid)
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true
      request = Net::HTTP::Delete.new(uri.request_uri)
      response = http.request(request)

      @url, @original_status, @pc_status, @stored_at = nil
      @status_code = response.code.to_i
      @rid = rid
      @body = JSON.parse(response.body)

      @body.key?('success')
    end

    def bulk(rids_array = [])
      raise INVALID_RID_ARRAY if rids_array.empty?

      uri = URI("#{BASE_URL}/bulk")
      uri.query = URI.encode_www_form(token: token)
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = true
      request = Net::HTTP::Post.new(uri.request_uri, { 'Content-Type': 'application/json' })
      request.body = { rids: rids_array }.to_json
      response = http.request(request)

      @body = JSON.parse(response.body)
      @original_status = @body.map { |item| item['original_status'].to_i }
      @status_code = response.code.to_i
      @pc_status = @body.map { |item| item['pc_status'].to_i }
      @url = @body.map { |item| item['url'] }
      @rid = @body.map { |item| item['rid'] }
      @stored_at = @body.map { |item| item['stored_at'] }

      self
    end

    def rids(limit = -1)
      uri = URI("#{BASE_URL}/rids")
      query_hash = { token: token }
      query_hash.merge!({ limit: limit }) if limit >= 0
      uri.query = URI.encode_www_form(query_hash)

      response = Net::HTTP.get_response(uri)
      @url, @original_status, @pc_status, @stored_at = nil
      @status_code = response.code.to_i
      @body = JSON.parse(response.body)
      @rid = @body

      @body
    end

    def total_count
      uri = URI("#{BASE_URL}/total_count")
      uri.query = URI.encode_www_form(token: token)

      response = Net::HTTP.get_response(uri)
      @url, @original_status, @pc_status, @stored_at = nil
      @status_code = response.code.to_i
      @rid = rid
      @body = JSON.parse(response.body)

      body['totalCount']
    end

    private

    def decide_url_or_rid(url_or_rid)
      %r{^https?://} =~ url_or_rid ? { url: url_or_rid } : { rid: url_or_rid }
    end
  end
end
data/lib/crawlbase.rb
ADDED
metadata
ADDED
@@ -0,0 +1,116 @@
--- !ruby/object:Gem::Specification
name: crawlbase
version: !ruby/object:Gem::Version
  version: 1.0.0
platform: ruby
authors:
- crawlbase
autorequire:
bindir: exe
cert_chain: []
date: 2023-06-29 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '3.2'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '3.2'
- !ruby/object:Gem::Dependency
  name: webmock
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '3.4'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '3.4'
- !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '2.0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '2.0'
- !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: 12.3.3
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: 12.3.3
description: Ruby based client for the Crawlbase API that helps developers crawl or
  scrape thousands of web pages anonymously
email:
- info@crawlbase.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- ".gitignore"
- CODE_OF_CONDUCT.md
- Gemfile
- LICENSE.txt
- README.md
- Rakefile
- bin/console
- bin/setup
- crawlbase.gemspec
- lib/crawlbase.rb
- lib/crawlbase/api.rb
- lib/crawlbase/leads_api.rb
- lib/crawlbase/scraper_api.rb
- lib/crawlbase/screenshots_api.rb
- lib/crawlbase/storage_api.rb
- lib/crawlbase/version.rb
homepage: https://github.com/crawlbase-source/crawlbase-ruby
licenses:
- MIT
metadata: {}
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '2.0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubygems_version: 3.1.2
signing_key:
specification_version: 4
summary: Crawlbase API client for web scraping and crawling
test_files: []