RubyGems - twitterscraper-ruby - Versions diffs - 0.4.0 → 0.9.0 - Mend

twitterscraper-ruby 0.4.0 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

checksums.yaml +4 -4
data/Gemfile.lock +3 -1
data/README.md +131 -16
data/lib/twitterscraper.rb +1 -2
data/lib/twitterscraper/cli.rb +58 -9
data/lib/twitterscraper/http.rb +0 -1
data/lib/twitterscraper/proxy.rb +10 -8
data/lib/twitterscraper/query.rb +102 -34
data/lib/twitterscraper/tweet.rb +66 -5
data/lib/version.rb +1 -1
data/twitterscraper-ruby.gemspec +1 -0
metadata +16 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: eda9826c0c4afe8f4ee557a309d82330b0e970882e19d38d917d854ea4bd308b
-  data.tar.gz: 11b36f581640e7ab492b15364ed0521e7a15ad4d9b0e94d5b9d5aece36541d6a
+  metadata.gz: 59b71fc6129f6d8c5a441981dc1577fa9b761380ff119bed4985cfcd88ccb31b
+  data.tar.gz: 2de3fcadc334ee2689d3083ea9324127c3b22ec94cf1b08dec920f9c95771445
 SHA512:
-  metadata.gz: 990044f929c9dbcca4f17eb21730094cdc8d9aaf6b0a53eb012e55cd2738a26d3bd18dcc75456a8dd4d00a132faa1d32e4d04c2bcec5385ee1cfa554b4e7cfab
-  data.tar.gz: 6f50f5add0359866a2c4fa7f2ae78fb5dd96cbf3ab7525be847daee1a40015df2836b5900468159b34e24641cde7dc07267f53ee1e29ccea0401b5f85080f44b
+  metadata.gz: b1e392bc021f6f758b79b7bdcd099af2ac391863f8712dadb5fd19248946867cfd89f140b836532fb40554c82697b26ef3af00b7cbb2cb13b0d5a8e2a38c87e7
+  data.tar.gz: 8c0e81589202e4a094c17604354f0f23a08b4536fe60b58ffe616cf1233c0531547ef02b8e88b6f70b1870ce2d134e4518ee093a5349144e2edfce3b1088e06c

data/Gemfile.lock CHANGED

@@ -1,8 +1,9 @@
 PATH
   remote: .
   specs:
-    twitterscraper-ruby (0.4.0)
+    twitterscraper-ruby (0.9.0)
       nokogiri
+      parallel
 GEM
   remote: https://rubygems.org/
@@ -11,6 +12,7 @@ GEM
     minitest (5.14.1)
     nokogiri (1.10.10)
       mini_portile2 (~> 2.4.0)
+    parallel (1.19.2)
     rake (12.3.3)
 PLATFORMS

data/README.md CHANGED

@@ -1,46 +1,161 @@
 # twitterscraper-ruby
-Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/twitterscraper/ruby`. To experiment with that code, run `bin/console` for an interactive prompt.
+[![Gem Version](https://badge.fury.io/rb/twitterscraper-ruby.svg)](https://badge.fury.io/rb/twitterscraper-ruby)
-TODO: Delete this and the text above, and describe your gem
+A gem to scrape https://twitter.com/search. This gem is inspired by [taspinar/twitterscraper](https://github.com/taspinar/twitterscraper).
-## Installation
-Add this line to your application's Gemfile:
+## Twitter Search API vs. twitterscraper-ruby
-```ruby
-gem 'twitterscraper-ruby'
-```
+### Twitter Search API
-And then execute:
+- The number of tweets: 180 - 450 requests/15 minutes (18,000 - 45,000 tweets/15 minutes)
+- The time window: the past 7 days
-    $ bundle install
+### twitterscraper-ruby
-Or install it yourself as:
+- The number of tweets: Unlimited
+- The time window: from 2006-3-21 to today
-    $ gem install twitterscraper-ruby
+## Installation
+First install the library:
+```shell script
+$ gem install twitterscraper-ruby
+````
 ## Usage
+Command-line interface:
+```shell script
+$ twitterscraper --query KEYWORD --start_date 2020-06-01 --end_date 2020-06-30 --lang ja \
+      --limit 100 --threads 10 --proxy --output output.json
+```
+From Within Ruby:
 ```ruby
 require 'twitterscraper'
+options = {
+  start_date: '2020-06-01',
+  end_date:   '2020-06-30',
+  lang:       'ja',
+  limit:      100,
+  threads:    10,
+  proxy:      true
+}
+client = Twitterscraper::Client.new
+tweets = client.query_tweets(KEYWORD, options)
+tweets.each do |tweet|
+  puts tweet.tweet_id
+  puts tweet.text
+  puts tweet.tweet_url
+  puts tweet.created_at
+  hash = tweet.attrs
+  puts hash.keys
+end
 ```
-## Development
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+## Attributes
+### Tweet
+- screen_name
+- name
+- user_id
+- tweet_id
+- text
+- links
+- hashtags
+- image_urls
+- video_url
+- has_media
+- likes
+- retweets
+- replies
+- is_replied
+- is_reply_to
+- parent_tweet_id
+- reply_to_users
+- tweet_url
+- created_at
+## Search operators
+| Operator | Finds Tweets... |
+| ------------- | ------------- |
+| watching now | containing both "watching" and "now". This is the default operator. |
+| "happy hour" | containing the exact phrase "happy hour". |
+| love OR hate | containing either "love" or "hate" (or both). |
+| beer -root | containing "beer" but not "root". |
+| #haiku | containing the hashtag "haiku". |
+| from:interior | sent from Twitter account "interior". |
+| to:NASA | a Tweet authored in reply to Twitter account "NASA". |
+| @NASA | mentioning Twitter account "NASA". |
+| puppy filter:media | containing "puppy" and an image or video. |
+| puppy -filter:retweets | containing "puppy", filtering out retweets |
+| superhero since:2015-12-21 | containing "superhero" and sent since date "2015-12-21" (year-month-day). |
+| puppy until:2015-12-21 | containing "puppy" and sent before the date "2015-12-21". |
+Search operators documentation is in [Standard search operators](https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators).
+## Examples
+```shell script
+$ twitterscraper --query twitter --limit 1000
+$ cat tweets.json | jq . | less
+```
+```json
+[
+  {
+    "screen_name": "@screenname",
+    "name": "name",
+    "user_id": 1194529546483000000,
+    "tweet_id": 1282659891992000000,
+    "tweet_url": "https://twitter.com/screenname/status/1282659891992000000",
+    "created_at": "2020-07-13 12:00:00 +0000",
+    "text": "Thanks Twitter!"
+  }
+]
+```
+## CLI Options
+| Option | Description | Default |
+| ------------- | ------------- | ------------- |
+| `-h`, `--help` | This option displays a summary of twitterscraper. | |
+| `--query` | Specify a keyword used during the search. | |
+| `--start_date` | Set the date from which twitterscraper-ruby should start scraping for your query. | |
+| `--end_date` | Set the enddate which twitterscraper-ruby should use to stop scraping for your query. | |
+| `--lang` | Retrieve tweets written in a specific language. | |
+| `--limit` | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
+| `--threads` | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
+| `--proxy` | Scrape https://twitter.com/search via proxies. | false |
+| `--output` | The name of the output file. | tweets.json |
-To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
 ## Contributing
-Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/twitterscraper-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).
+Bug reports and pull requests are welcome on GitHub at https://github.com/ts-3156/twitterscraper-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/ts-3156/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).
 ## License
 The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
 ## Code of Conduct
-Everyone interacting in the Twitterscraper::Ruby project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).
+Everyone interacting in the twitterscraper-ruby project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/ts-3156/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).
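
The new README's Examples section shows the JSON array the CLI writes. As a rough illustration of consuming that output, the sketch below parses an array of the same shape and orders it newest first. The `newest_first` helper and the inline sample are hypothetical; only the field names (`screen_name`, `tweet_id`, `created_at`, `text`) come from the README.

```ruby
require 'json'
require 'time'

# Hypothetical sample mirroring the output format shown in the README.
SAMPLE_JSON = <<~JSON
  [
    {"screen_name": "@a", "tweet_id": 1, "created_at": "2020-07-13 12:00:00 +0000", "text": "first"},
    {"screen_name": "@b", "tweet_id": 2, "created_at": "2020-07-13 13:00:00 +0000", "text": "second"}
  ]
JSON

# Hypothetical helper: parse the JSON array and sort by created_at, newest first.
def newest_first(json_text)
  JSON.parse(json_text).sort_by { |tweet| Time.parse(tweet['created_at']) }.reverse
end

newest_first(SAMPLE_JSON).each do |tweet|
  puts "#{tweet['created_at']} #{tweet['screen_name']}: #{tweet['text']}"
end
```

With a real run, the same helper could presumably be fed `File.read('tweets.json')`, the default output file name listed under CLI Options.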

data/lib/twitterscraper.rb CHANGED

@@ -9,10 +9,9 @@ require 'version'
 module Twitterscraper
   class Error < StandardError; end
-  # Your code goes here...
   def self.logger
-    @logger ||= ::Logger.new(STDOUT)
+    @logger ||= ::Logger.new(STDOUT, level: ::Logger::INFO)
   end
   def self.logger=(logger)

data/lib/twitterscraper/cli.rb CHANGED

@@ -8,17 +8,24 @@ module Twitterscraper
   class Cli
     def parse
       @options = parse_options(ARGV)
+      initialize_logger
     end
     def run
-      client = Twitterscraper::Client.new
-      limit = options['limit'] ? options['limit'].to_i : 100
-      tweets = client.query_tweets(options['query'], limit: limit, start_date: options['start_date'], end_date: options['end_date'])
-      File.write('tweets.json', generate_json(tweets))
-    end
+      print_help || return if print_help?
+      print_version || return if print_version?
-    def options
-      @options
+      query_options = {
+          start_date: options['start_date'],
+          end_date: options['end_date'],
+          lang: options['lang'],
+          limit: options['limit'],
+          threads: options['threads'],
+          proxy: options['proxy']
+      }
+      client = Twitterscraper::Client.new
+      tweets = client.query_tweets(options['query'], query_options)
+      File.write(options['output'], generate_json(tweets)) unless tweets.empty?
     end
     def generate_json(tweets)
@@ -29,15 +36,57 @@ module Twitterscraper
       end
     end
+    def options
+      @options
+    end
     def parse_options(argv)
-      argv.getopts(
+      options = argv.getopts(
           'h',
+          'help',
+          'v',
+          'version',
           'query:',
-          'limit:',
           'start_date:',
           'end_date:',
+          'lang:',
+          'limit:',
+          'threads:',
+          'output:',
+          'proxy',
           'pretty',
+          'verbose',
       )
+      options['lang'] ||= ''
+      options['limit'] = (options['limit'] || 100).to_i
+      options['threads'] = (options['threads'] || 2).to_i
+      options['output'] ||= 'tweets.json'
+      options
+    end
+    def initialize_logger
+      Twitterscraper.logger.level = ::Logger::DEBUG if options['verbose']
+    end
+    def print_help?
+      options['h'] || options['help']
+    end
+    def print_help
+      puts <<~'SHELL'
+        Usage:
+          twitterscraper --query KEYWORD --limit 100 --threads 10 --start_date 2020-07-01 --end_date 2020-07-10 --lang ja --proxy --output output.json
+      SHELL
+    end
+    def print_version?
+      options['v'] || options['version']
+    end
+    def print_version
+      puts "twitterscraper-#{Twitterscraper::VERSION}"
     end
   end
 end
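
The reworked `parse_options` fills in defaults after calling `getopts`. A minimal sketch of that pattern follows, assuming only Ruby's standard `optparse`; the `parse_cli_options` name and the reduced option list are illustrative and not part of the gem.

```ruby
require 'optparse'

# Hypothetical helper mirroring the defaulting done in parse_options.
def parse_cli_options(argv)
  # Parse a copy so the caller's array is left untouched.
  options = OptionParser.new.getopts(
    argv.dup,
    'h',          # single-character flags
    'query:',     # a trailing colon means the option takes a value
    'limit:',
    'threads:',
    'output:',
    'proxy'       # boolean flag, false unless given
  )
  # Values arrive as strings (or nil when absent), so coerce and default them.
  options['limit'] = (options['limit'] || 100).to_i
  options['threads'] = (options['threads'] || 2).to_i
  options['output'] ||= 'tweets.json'
  options
end

p parse_cli_options(%w[--query ruby --limit 50 --proxy])
```

Doing the coercion in one place means the rest of the CLI can treat `limit` and `threads` as integers, which is what lets `run` pass the options straight through to `query_tweets` in the diff above.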

data/lib/twitterscraper/http.rb CHANGED

@@ -9,7 +9,6 @@ module Twitterscraper
       if proxy
         ip, port = proxy.split(':')
         http_class = Net::HTTP::Proxy(ip, port.to_i)
-        Twitterscraper.logger.info("Using proxy #{proxy}")
       else
         http_class = Net::HTTP
       end

data/lib/twitterscraper/proxy.rb CHANGED

@@ -6,9 +6,9 @@ module Twitterscraper
     class RetryExhausted < StandardError
     end
-    class Result
-      def initialize(items)
-        @items = items.shuffle
+    class Pool
+      def initialize
+        @items = Proxy.get_proxies
         @cur_index = 0
       end
@@ -17,7 +17,9 @@ module Twitterscraper
           reload
         end
         @cur_index += 1
-        @items[@cur_index - 1]
+        item = @items[@cur_index - 1]
+        Twitterscraper.logger.info("Using proxy #{item}")
+        item
       end
       def size
@@ -27,9 +29,8 @@ module Twitterscraper
       private
       def reload
-        @items = Proxy.get_proxies.shuffle
+        @items = Proxy.get_proxies
         @cur_index = 0
-        Twitterscraper.logger.debug "Reload #{proxies.size} proxies"
       end
     end
@@ -44,13 +45,14 @@ module Twitterscraper
       table.xpath('tbody/tr').each do |tr|
         cells = tr.xpath('td')
-        ip, port, https = [0, 1, 6].map { |i| cells[i].text.strip }
+        ip, port, anonymity, https = [0, 1, 4, 6].map { |i| cells[i].text.strip }
+        next unless ['elite proxy', 'anonymous'].include?(anonymity)
         next if https == 'no'
         proxies << ip + ':' + port
       end
       Twitterscraper.logger.debug "Fetch #{proxies.size} proxies"
-      Result.new(proxies)
+      proxies.shuffle
     rescue => e
       if (retries -= 1) > 0
         retry
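
The `Result` to `Pool` change moves proxy fetching inside the pool, which now refills itself when its list runs out. The hunks show only part of the class, so the sketch below is an assumption-laden approximation: `RotatingPool`, its injected loader block, and the exact reload condition are hypothetical stand-ins for `Pool` and `Proxy.get_proxies`.

```ruby
# Hypothetical rotating pool: hands out items in order and refills itself
# from the supplied loader once every item has been used.
class RotatingPool
  def initialize(&loader)
    @loader = loader
    reload
  end

  # Return the next item, reloading first if the current list is exhausted.
  def sample
    reload if @cur_index >= @items.size
    @cur_index += 1
    @items[@cur_index - 1]
  end

  def size
    @items.size
  end

  private

  def reload
    @items = @loader.call
    @cur_index = 0
  end
end

# Placeholder addresses; a real loader would fetch a fresh proxy list.
pool = RotatingPool.new { ['10.0.0.1:8080', '10.0.0.2:3128'] }
3.times { puts pool.sample }
```

Injecting the loader keeps the sketch testable without network access; in the gem the list is shuffled inside `get_proxies`, so the pool itself no longer needs to call `shuffle`.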

data/lib/twitterscraper/query.rb CHANGED

@@ -1,7 +1,10 @@
+require 'resolv-replace'
 require 'net/http'
 require 'nokogiri'
 require 'date'
 require 'json'
+require 'erb'
+require 'parallel'
 module Twitterscraper
   module Query
@@ -14,7 +17,6 @@ module Twitterscraper
         'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
         'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre',
     ]
-    USER_AGENT = USER_AGENT_LIST.sample
     INIT_URL = 'https://twitter.com/search?f=tweets&vertical=default&q=__QUERY__&l=__LANG__'
     RELOAD_URL = 'https://twitter.com/i/search/timeline?f=tweets&vertical=' +
@@ -40,7 +42,8 @@ module Twitterscraper
       end
     end
-    def get_single_page(url, headers, proxies, timeout = 10, retries = 30)
+    def get_single_page(url, headers, proxies, timeout = 6, retries = 30)
+      return nil if stop_requested?
       Twitterscraper::Http.get(url, headers, proxies.sample, timeout)
     rescue => e
       logger.debug "query_single_page: #{e.inspect}"
@@ -53,26 +56,30 @@ module Twitterscraper
     end
     def parse_single_page(text, html = true)
+      return [nil, nil] if text.nil? || text == ''
       if html
         json_resp = nil
         items_html = text
       else
         json_resp = JSON.parse(text)
         items_html = json_resp['items_html'] || ''
-        logger.debug json_resp['message'] if json_resp['message'] # Sorry, you are rate limited.
+        logger.warn json_resp['message'] if json_resp['message'] # Sorry, you are rate limited.
       end
       [items_html, json_resp]
     end
     def query_single_page(query, lang, pos, from_user = false, headers: [], proxies: [])
-      query = query.gsub(' ', '%20').gsub('#', '%23').gsub(':', '%3A').gsub('&', '%26')
       logger.info("Querying #{query}")
+      query = ERB::Util.url_encode(query)
       url = build_query_url(query, lang, pos, from_user)
       logger.debug("Scraping tweets from #{url}")
       response = get_single_page(url, headers, proxies)
+      return [], nil if response.nil?
       html, json_resp = parse_single_page(response, pos.nil?)
       tweets = Tweet.from_html(html)
@@ -90,51 +97,112 @@ module Twitterscraper
       end
     end
-    def query_tweets(query, start_date: nil, end_date: nil, limit: 100, threads: 2, lang: '')
-      start_date = start_date ? Date.parse(start_date) : Date.parse('2006-3-21')
-      end_date = end_date ? Date.parse(end_date) : Date.today
-      if start_date == end_date
-        raise 'Please specify different values for :start_date and :end_date.'
-      elsif start_date > end_date
-        raise ':start_date must occur before :end_date.'
+    OLDEST_DATE = Date.parse('2006-03-21')
+    def validate_options!(query, start_date:, end_date:, lang:, limit:, threads:, proxy:)
+      if query.nil? || query == ''
+        raise 'Please specify a search query.'
+      end
+      if ERB::Util.url_encode(query).length >= 500
+        raise ':query must be a UTF-8, URL-encoded search query of 500 characters maximum, including operators.'
+      end
+      if start_date && end_date
+        if start_date == end_date
+          raise 'Please specify different values for :start_date and :end_date.'
+        elsif start_date > end_date
+          raise ':start_date must occur before :end_date.'
+        end
+      end
+      if start_date
+        if start_date < OLDEST_DATE
+          raise ":start_date must be greater than or equal to #{OLDEST_DATE}"
+        end
       end
-      # TODO parallel
+      if end_date
+        today = Date.today
+        if end_date > Date.today
+          raise ":end_date must be less than or equal to today(#{today})"
+        end
+      end
+    end
+    def build_queries(query, start_date, end_date)
+      if start_date && end_date
+        date_range = start_date.upto(end_date - 1)
+        date_range.map { |date| query + " since:#{date} until:#{date + 1}" }
+      elsif start_date
+        [query + " since:#{start_date}"]
+      elsif end_date
+        [query + " until:#{end_date}"]
+      else
+        [query]
+      end
+    end
+    def main_loop(query, lang, limit, headers, proxies)
       pos = nil
-      all_tweets = []
-      proxies = Twitterscraper::Proxy.get_proxies
+      while true
+        new_tweets, new_pos = query_single_page(query, lang, pos, headers: headers, proxies: proxies)
+        unless new_tweets.empty?
+          @mutex.synchronize {
+            @all_tweets.concat(new_tweets)
+            @all_tweets.uniq! { |t| t.tweet_id }
+          }
+        end
+        logger.info("Got #{new_tweets.size} tweets (total #{@all_tweets.size})")
-      headers = {'User-Agent': USER_AGENT, 'X-Requested-With': 'XMLHttpRequest'}
-      logger.info("Headers #{headers}")
+        break unless new_pos
+        break if @all_tweets.size >= limit
-      start_date.upto(end_date) do |date|
-        break if date == end_date
+        pos = new_pos
+      end
-        queries = query + " since:#{date} until:#{date + 1}"
+      if @all_tweets.size >= limit
+        logger.info("Limit reached #{@all_tweets.size}")
+        @stop_requested = true
+      end
+    end
-        while true
-          new_tweets, new_pos = query_single_page(queries, lang, pos, headers: headers, proxies: proxies)
-          unless new_tweets.empty?
-            all_tweets.concat(new_tweets)
-            all_tweets.uniq! { |t| t.tweet_id }
-          end
-          logger.info("Got #{new_tweets.size} tweets (total #{all_tweets.size})")
+    def stop_requested?
+      @stop_requested
+    end
-          break unless new_pos
-          break if all_tweets.size >= limit
+    def query_tweets(query, start_date: nil, end_date: nil, lang: '', limit: 100, threads: 2, proxy: false)
+      start_date = Date.parse(start_date) if start_date && start_date.is_a?(String)
+      end_date = Date.parse(end_date) if end_date && end_date.is_a?(String)
+      queries = build_queries(query, start_date, end_date)
+      threads = queries.size if threads > queries.size
+      proxies = proxy ? Twitterscraper::Proxy::Pool.new : []
-          pos = new_pos
-        end
+      validate_options!(queries[0], start_date: start_date, end_date: end_date, lang: lang, limit: limit, threads: threads, proxy: proxy)
+      logger.info("The number of threads #{threads}")
+      headers = {'User-Agent': USER_AGENT_LIST.sample, 'X-Requested-With': 'XMLHttpRequest'}
+      logger.info("Headers #{headers}")
-        if all_tweets.size >= limit
-          logger.info("Reached limit #{all_tweets.size}")
-          break
+      @all_tweets = []
+      @mutex = Mutex.new
+      @stop_requested = false
+      if threads > 1
+        Parallel.each(queries, in_threads: threads) do |query|
+          main_loop(query, lang, limit, headers, proxies)
+          raise Parallel::Break if stop_requested?
+        end
+      else
+        queries.each do |query|
+          main_loop(query, lang, limit, headers, proxies)
+          break if stop_requested?
         end
       end
-      all_tweets
+      @all_tweets.sort_by { |tweet| -tweet.created_at.to_i }
     end
   end
 end
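
The new `build_queries` method is what makes the threaded path possible: a date range becomes one independent query per day. The following is a standalone copy of that logic for illustration, lightly restyled with string interpolation; it assumes `Date` arguments, as `query_tweets` parses strings before calling it.

```ruby
require 'date'

# Standalone copy of the splitting logic, for illustration only.
def build_queries(query, start_date, end_date)
  if start_date && end_date
    # One query per day; the end date itself is exclusive.
    start_date.upto(end_date - 1).map do |date|
      "#{query} since:#{date} until:#{date + 1}"
    end
  elsif start_date
    ["#{query} since:#{start_date}"]
  elsif end_date
    ["#{query} until:#{end_date}"]
  else
    [query]
  end
end

puts build_queries('ruby', Date.new(2020, 6, 1), Date.new(2020, 6, 3))
# prints:
#   ruby since:2020-06-01 until:2020-06-02
#   ruby since:2020-06-02 until:2020-06-03
```

Because each element is self-contained, `query_tweets` can hand the array to `Parallel.each`, and it caps `threads` at `queries.size` since extra threads would have nothing to do.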

data/lib/twitterscraper/tweet.rb CHANGED

@@ -2,7 +2,28 @@ require 'time'
 module Twitterscraper
   class Tweet
-    KEYS = [:screen_name, :name, :user_id, :tweet_id, :tweet_url, :created_at, :text]
+    KEYS = [
+        :screen_name,
+        :name,
+        :user_id,
+        :tweet_id,
+        :text,
+        :links,
+        :hashtags,
+        :image_urls,
+        :video_url,
+        :has_media,
+        :likes,
+        :retweets,
+        :replies,
+        :is_replied,
+        :is_reply_to,
+        :parent_tweet_id,
+        :reply_to_users,
+        :tweet_url,
+        :timestamp,
+        :created_at,
+    ]
     attr_reader *KEYS
     def initialize(attrs)
@@ -11,10 +32,14 @@ module Twitterscraper
       end
     end
-    def to_json(options = {})
+    def attrs
       KEYS.map do |key|
         [key, send(key)]
-      end.to_h.to_json
+      end.to_h
+    end
+    def to_json(options = {})
+      attrs.to_json
     end
     class << self
@@ -31,15 +56,51 @@ module Twitterscraper
       def from_tweet_html(html)
         inner_html = Nokogiri::HTML(html.inner_html)
+        tweet_id = html.attr('data-tweet-id').to_i
+        text = inner_html.xpath("//div[@class[contains(., 'js-tweet-text-container')]]/p[@class[contains(., 'js-tweet-text')]]").first.text
+        links = inner_html.xpath("//a[@class[contains(., 'twitter-timeline-link')]]").map { |elem| elem.attr('data-expanded-url') }.select { |link| link && !link.include?('pic.twitter') }
+        image_urls = inner_html.xpath("//div[@class[contains(., 'AdaptiveMedia-photoContainer')]]").map { |elem| elem.attr('data-image-url') }
+        video_url = inner_html.xpath("//div[@class[contains(., 'PlayableMedia-container')]]/a").map { |elem| elem.attr('href') }[0]
+        has_media = !image_urls.empty? || (video_url && !video_url.empty?)
+        actions = inner_html.xpath("//div[@class[contains(., 'ProfileTweet-actionCountList')]]")
+        likes = actions.xpath("//span[@class[contains(., 'ProfileTweet-action--favorite')]]/span[@class[contains(., 'ProfileTweet-actionCount')]]").first.attr('data-tweet-stat-count').to_i || 0
+        retweets = actions.xpath("//span[@class[contains(., 'ProfileTweet-action--retweet')]]/span[@class[contains(., 'ProfileTweet-actionCount')]]").first.attr('data-tweet-stat-count').to_i || 0
+        replies = actions.xpath("//span[@class[contains(., 'ProfileTweet-action--reply u-hiddenVisually')]]/span[@class[contains(., 'ProfileTweet-actionCount')]]").first.attr('data-tweet-stat-count').to_i || 0
+        is_replied = replies != 0
+        parent_tweet_id = inner_html.xpath('//*[@data-conversation-id]').first.attr('data-conversation-id').to_i
+        if tweet_id == parent_tweet_id
+          is_reply_to = false
+          parent_tweet_id = nil
+          reply_to_users = []
+        else
+          is_reply_to = true
+          reply_to_users = inner_html.xpath("//div[@class[contains(., 'ReplyingToContextBelowAuthor')]]/a").map { |user| {screen_name: user.text.delete_prefix('@'), user_id: user.attr('data-user-id')} }
+        end
         timestamp = inner_html.xpath("//span[@class[contains(., 'js-short-timestamp')]]").first.attr('data-time').to_i
         new(
             screen_name: html.attr('data-screen-name'),
             name: html.attr('data-name'),
             user_id: html.attr('data-user-id').to_i,
-            tweet_id: html.attr('data-tweet-id').to_i,
+            tweet_id: tweet_id,
+            text: text,
+            links: links,
+            hashtags: text.scan(/#\w+/).map { |tag| tag.delete_prefix('#') },
+            image_urls: image_urls,
+            video_url: video_url,
+            has_media: has_media,
+            likes: likes,
+            retweets: retweets,
+            replies: replies,
+            is_replied: is_replied,
+            is_reply_to: is_reply_to,
+            parent_tweet_id: parent_tweet_id,
+            reply_to_users: reply_to_users,
             tweet_url: 'https://twitter.com' + html.attr('data-permalink-path'),
+            timestamp: timestamp,
             created_at: Time.at(timestamp, in: '+00:00'),
-            text: inner_html.xpath("//div[@class[contains(., 'js-tweet-text-container')]]/p[@class[contains(., 'js-tweet-text')]]").first.text,
         )
       end
     end
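
The `Tweet` changes split serialization into `attrs` (a plain hash) and `to_json`, and derive `hashtags` from the tweet text. Here is a trimmed-down sketch of that pattern, assuming Ruby 2.6 or later for `Time.at(..., in:)`. The `MiniTweet` class, its four keys, and its initializer are hypothetical, since the gem's initializer body is not visible in the hunk.

```ruby
require 'json'

# Hypothetical, trimmed-down value object following the KEYS/attrs pattern.
class MiniTweet
  KEYS = [:tweet_id, :text, :hashtags, :created_at]
  attr_reader(*KEYS)

  def initialize(attrs)
    attrs.each { |key, value| instance_variable_set("@#{key}", value) }
  end

  # Plain hash of every key, useful for callers that want raw data.
  def attrs
    KEYS.map { |key| [key, send(key)] }.to_h
  end

  # Accept and ignore generator arguments so JSON.generate can call this.
  def to_json(*_args)
    attrs.to_json
  end
end

text = 'Learning #ruby and #nokogiri today'
tweet = MiniTweet.new(
  tweet_id: 1,
  text: text,
  # Same extraction as the diff: find #words, then drop the leading '#'.
  hashtags: text.scan(/#\w+/).map { |tag| tag.delete_prefix('#') },
  created_at: Time.at(1_594_641_600, in: '+00:00')
)
puts JSON.generate([tweet])
```

Exposing `attrs` separately is what lets the README example call `tweet.attrs` and inspect `hash.keys` without a round trip through JSON.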

data/lib/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Twitterscraper
-  VERSION = "0.4.0"
+  VERSION = '0.9.0'
 end

data/twitterscraper-ruby.gemspec CHANGED

@@ -27,4 +27,5 @@ Gem::Specification.new do |spec|
   spec.required_ruby_version = ">= 2.6.4"
   spec.add_dependency "nokogiri"
+  spec.add_dependency "parallel"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: twitterscraper-ruby
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 0.9.0
 platform: ruby
 authors:
 - ts-3156
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-07-12 00:00:00.000000000 Z
+date: 2020-07-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -24,6 +24,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: parallel
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: A gem  to scrape Tweets
 email:
 - ts_3156@yahoo.co.jp