twitterscraper-ruby 0.2.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 6791ebfd82694e768350ec33a19d3a34336c26c5344e57ef92af6bf02a0dddf1
-   data.tar.gz: 351bf02ad483c60993114a828a4a1e39b936ab8e1373237aa66de6d0a3c809a6
+   metadata.gz: 9cfd03782734642da8ac29839788f142399d2a3f4ec601e8b6f47ae1ca38c17f
+   data.tar.gz: 07a398e51fd2fbdc735ae27008d9a23e97dc390632179738045db4c81bd4fcad
  SHA512:
-   metadata.gz: 236d01eaaf4ed8c5c016fff35b5794e1609e840d8d27edfba92e0fc63138dfced1a10f4e2952d919360d9cc5111bc4987422a9e709c3e798434614b734b3b029
-   data.tar.gz: 6831c32b358651e8c75af0772afbd0f2888934e5ef314112ecaa2dab1bcaeb681dc6a350d473eab79e36a83da57059b35dd88d693cb3a2f894789cb03ceb1e8c
+   metadata.gz: 6f417fe3379a3d9d134c308a9ea9d4e01b458018c9c5a3f8508a85e7f5890d01991838cfcabe87b8246f69edf4458c66d17924359798017907862071353f643d
+   data.tar.gz: 758bcb55ded936c3696f99647f64bc9921386b3cb0c783c218510c0e36991ae6b95a9d08fa071e02072c8b727bbadb6674ceeb19a74e356a842d62c1ec4c038f
data/Gemfile.lock CHANGED
@@ -1,8 +1,9 @@
  PATH
    remote: .
    specs:
-     twitterscraper-ruby (0.2.0)
+     twitterscraper-ruby (0.7.0)
        nokogiri
+       parallel

  GEM
    remote: https://rubygems.org/
@@ -11,6 +12,7 @@ GEM
      minitest (5.14.1)
      nokogiri (1.10.10)
        mini_portile2 (~> 2.4.0)
+     parallel (1.19.2)
      rake (12.3.3)

  PLATFORMS
data/README.md CHANGED
@@ -1,46 +1,127 @@
  # twitterscraper-ruby

- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/twitterscraper/ruby`. To experiment with that code, run `bin/console` for an interactive prompt.
+ [![Gem Version](https://badge.fury.io/rb/twitterscraper-ruby.svg)](https://badge.fury.io/rb/twitterscraper-ruby)

- TODO: Delete this and the text above, and describe your gem
+ A gem to scrape https://twitter.com/search. This gem is inspired by [taspinar/twitterscraper](https://github.com/taspinar/twitterscraper).

- ## Installation

- Add this line to your application's Gemfile:
+ ## Twitter Search API vs. twitterscraper-ruby

- ```ruby
- gem 'twitterscraper-ruby'
- ```
+ ### Twitter Search API
+
+ - The number of tweets: 180 - 450 requests/15 minutes (18,000 - 45,000 tweets/15 minutes)
+ - The time window: the past 7 days
+
+ ### twitterscraper-ruby
+
+ - The number of tweets: Unlimited
+ - The time window: from 2006-3-21 to today

- And then execute:

- $ bundle install
+ ## Installation

- Or install it yourself as:
+ First install the library:

- $ gem install twitterscraper-ruby
+ ```shell script
+ $ gem install twitterscraper-ruby
+ ```
+

  ## Usage

+ Command-line interface:
+
+ ```shell script
+ $ twitterscraper --query KEYWORD --start_date 2020-06-01 --end_date 2020-06-30 --lang ja \
+     --limit 100 --threads 10 --proxy --output output.json
+ ```
+
+ From within Ruby:
+
  ```ruby
  require 'twitterscraper'
+
+ options = {
+   start_date: '2020-06-01',
+   end_date: '2020-06-30',
+   lang: 'ja',
+   limit: 100,
+   threads: 10,
+   proxy: true
+ }
+
+ client = Twitterscraper::Client.new
+ tweets = client.query_tweets(KEYWORD, options)
+
+ tweets.each do |tweet|
+   puts tweet.tweet_id
+   puts tweet.text
+   puts tweet.created_at
+   puts tweet.tweet_url
+ end
  ```

- ## Development

- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ ## Examples
+
+ ```shell script
+ $ twitterscraper --query twitter --limit 1000
+ $ cat tweets.json | jq . | less
+ ```
+
+ ```json
+ [
+   {
+     "screen_name": "@screenname",
+     "name": "name",
+     "user_id": 1194529546483000000,
+     "tweet_id": 1282659891992000000,
+     "tweet_url": "https://twitter.com/screenname/status/1282659891992000000",
+     "created_at": "2020-07-13 12:00:00 +0000",
+     "text": "Thanks Twitter!"
+   },
+   ...
+ ]
+ ```
+
+ ## Attributes
+
+ ### Tweet
+
+ - tweet_id
+ - text
+ - user_id
+ - screen_name
+ - name
+ - tweet_url
+ - created_at
+
+
+ ## CLI Options
+
+ | Option | Description | Default |
+ | ------------- | ------------- | ------------- |
+ | `-h`, `--help` | This option displays a summary of twitterscraper. | |
+ | `--query` | Specify a keyword used during the search. | |
+ | `--start_date` | Set the date from which twitterscraper-ruby should start scraping for your query. | |
+ | `--end_date` | Set the end date which twitterscraper-ruby should use to stop scraping for your query. | |
+ | `--lang` | Retrieve tweets written in a specific language. | |
+ | `--limit` | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
+ | `--threads` | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
+ | `--proxy` | Scrape https://twitter.com/search via proxies. | false |
+ | `--output` | The name of the output file. | tweets.json |

- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

  ## Contributing

- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/twitterscraper-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/[USERNAME]/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).
+ Bug reports and pull requests are welcome on GitHub at https://github.com/ts-3156/twitterscraper-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/ts-3156/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).


  ## License

  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).

+
  ## Code of Conduct

- Everyone interacting in the Twitterscraper::Ruby project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).
+ Everyone interacting in the twitterscraper-ruby project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/ts-3156/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).
data/bin/twitterscraper ADDED
@@ -0,0 +1,13 @@
+ #!/usr/bin/env ruby
+
+ require_relative '../lib/twitterscraper/cli'
+
+ begin
+   cli = Twitterscraper::Cli.new
+   cli.parse
+   cli.run
+ rescue => e
+   STDERR.puts e.message
+   STDERR.puts e.backtrace.join("\n")
+   exit 1
+ end
data/lib/twitterscraper.rb CHANGED
@@ -9,10 +9,9 @@ require 'version'

  module Twitterscraper
    class Error < StandardError; end
-   # Your code goes here...

    def self.logger
-     @logger ||= ::Logger.new(STDOUT)
+     @logger ||= ::Logger.new(STDOUT, level: ::Logger::INFO)
    end

    def self.logger=(logger)
data/lib/twitterscraper/cli.rb ADDED
@@ -0,0 +1,92 @@
+ $stdout.sync = true
+
+ require 'json'
+ require 'optparse'
+ require 'twitterscraper'
+
+ module Twitterscraper
+   class Cli
+     def parse
+       @options = parse_options(ARGV)
+       initialize_logger
+     end
+
+     def run
+       print_help || return if print_help?
+       print_version || return if print_version?
+
+       query_options = {
+         start_date: options['start_date'],
+         end_date: options['end_date'],
+         lang: options['lang'],
+         limit: options['limit'],
+         threads: options['threads'],
+         proxy: options['proxy']
+       }
+       client = Twitterscraper::Client.new
+       tweets = client.query_tweets(options['query'], query_options)
+       File.write(options['output'], generate_json(tweets))
+     end
+
+     def generate_json(tweets)
+       if options['pretty']
+         ::JSON.pretty_generate(tweets)
+       else
+         ::JSON.generate(tweets)
+       end
+     end
+
+     def options
+       @options
+     end
+
+     def parse_options(argv)
+       options = argv.getopts(
+         'h',
+         'help',
+         'v',
+         'version',
+         'query:',
+         'start_date:',
+         'end_date:',
+         'lang:',
+         'limit:',
+         'threads:',
+         'output:',
+         'proxy',
+         'pretty',
+         'verbose',
+       )
+
+       options['lang'] ||= ''
+       options['limit'] = (options['limit'] || 100).to_i
+       options['threads'] = (options['threads'] || 2).to_i
+       options['output'] ||= 'tweets.json'
+
+       options
+     end
+
+     def initialize_logger
+       Twitterscraper.logger.level = ::Logger::DEBUG if options['verbose']
+     end
+
+     def print_help?
+       options['h'] || options['help']
+     end
+
+     def print_help
+       puts <<~'SHELL'
+         Usage:
+           twitterscraper --query KEYWORD --limit 100 --threads 10 --start_date 2020-07-01 --end_date 2020-07-10 --lang ja --proxy --output output.json
+       SHELL
+     end
+
+     def print_version?
+       options['v'] || options['version']
+     end
+
+     def print_version
+       puts "twitterscraper-#{Twitterscraper::VERSION}"
+     end
+   end
+ end
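A note on the option parsing above: `Array#getopts` (added by requiring `optparse`) returns a plain string-keyed hash, which is why `parse_options` has to cast `limit` and `threads` with `to_i`. A minimal sketch of its behavior, with hypothetical argv values:

```ruby
require 'optparse'

# Switches declared without a trailing ':' default to false; options declared
# with ':' take a value and come back as strings (nil if absent).
argv = %w[--query twitter --limit 500 --proxy]
opts = argv.getopts('h', 'query:', 'limit:', 'proxy')
opts # => {"h"=>false, "query"=>"twitter", "limit"=>"500", "proxy"=>true}
```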
data/lib/twitterscraper/http.rb CHANGED
@@ -24,7 +24,8 @@ module Twitterscraper
          req[key] = value
        end

-       http.request(req).body
+       res = http.start { http.request(req) }
+       res.body
      end
    end
  end
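The change above opens the connection explicitly instead of letting `Net::HTTP#request` open an implicit one per call. A minimal standalone sketch of the same pattern (example.com is a placeholder host):

```ruby
require 'net/http'

http = Net::HTTP.new('example.com', 443)
http.use_ssl = true
req = Net::HTTP::Get.new('/')

# With a block, Net::HTTP#start yields the open session, returns the block's
# value, and closes the socket when the block exits.
res = http.start { http.request(req) }
puts res.body
```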
data/lib/twitterscraper/proxy.rb CHANGED
@@ -6,22 +6,53 @@ module Twitterscraper
    class RetryExhausted < StandardError
    end

+   class Pool
+     def initialize
+       @items = Proxy.get_proxies
+       @cur_index = 0
+     end
+
+     def sample
+       if @cur_index >= @items.size
+         reload
+       end
+       @cur_index += 1
+       item = @items[@cur_index - 1]
+       Twitterscraper.logger.info("Using proxy #{item}")
+       item
+     end
+
+     def size
+       @items.size
+     end
+
+     private
+
+     def reload
+       @items = Proxy.get_proxies
+       @cur_index = 0
+     end
+   end
+
    module_function

    def get_proxies(retries = 3)
      response = Twitterscraper::Http.get(PROXY_URL)
      html = Nokogiri::HTML(response)
-     table = html.xpath('//*[@id="proxylisttable"]').first
+     table = html.xpath('//table[@id="proxylisttable"]').first

      proxies = []

      table.xpath('tbody/tr').each do |tr|
        cells = tr.xpath('td')
-       ip, port = cells[0].text.strip, cells[1].text.strip
+       ip, port, anonymity, https = [0, 1, 4, 6].map { |i| cells[i].text.strip }
+       next unless ['elite proxy', 'anonymous'].include?(anonymity)
+       next if https == 'no'
        proxies << ip + ':' + port
      end

-     proxies
+     Twitterscraper.logger.debug "Fetch #{proxies.size} proxies"
+     proxies.shuffle
    rescue => e
      if (retries -= 1) > 0
        retry
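Despite its name, `Pool#sample` above is round-robin rather than random: it hands out proxies in order and re-fetches the list once every entry has been used. A minimal usage sketch (the proxy address shown is illustrative):

```ruby
pool = Twitterscraper::Proxy::Pool.new
pool.size    # number of proxies fetched from PROXY_URL

# Each call logs and returns the next "ip:port" string; after pool.size
# calls, the list is reloaded via Proxy.get_proxies.
proxy = pool.sample # => e.g. "203.0.113.10:8080"
```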
data/lib/twitterscraper/query.rb CHANGED
@@ -1,7 +1,10 @@
+ require 'resolv-replace'
  require 'net/http'
  require 'nokogiri'
  require 'date'
  require 'json'
+ require 'erb'
+ require 'parallel'

  module Twitterscraper
    module Query
@@ -14,7 +17,6 @@ module Twitterscraper
        'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16',
        'Mozilla/5.0 (Windows NT 5.2; RW; rv:7.0a1) Gecko/20091211 SeaMonkey/9.23a1pre',
      ]
-     USER_AGENT = USER_AGENT_LIST.sample

      INIT_URL = 'https://twitter.com/search?f=tweets&vertical=default&q=__QUERY__&l=__LANG__'
      RELOAD_URL = 'https://twitter.com/i/search/timeline?f=tweets&vertical=' +
@@ -25,7 +27,7 @@ module Twitterscraper
        'include_available_features=1&include_entities=1&' +
        'max_position={pos}&reset_error_state=false'

-     def get_query_url(query, lang, pos, from_user = false)
+     def build_query_url(query, lang, pos, from_user = false)
        # if from_user
        #   if !pos
        #     INIT_URL_USER.format(u = query)
@@ -40,52 +42,50 @@ module Twitterscraper
        end
      end

-     def query_single_page(query, lang, pos, retries = 30, from_user = false, timeout = 3, headers: [], proxies: [])
-       query = query.gsub(' ', '%20').gsub('#', '%23').gsub(':', '%3A').gsub('&', '%26')
-       logger.info("Querying #{query}")
-
-       url = get_query_url(query, lang, pos, from_user)
-       logger.debug("Scraping tweets from #{url}")
-
-       response = nil
-       begin
-         proxy = proxies.sample
-         logger.info("Using proxy #{proxy}")
-
-         response = Twitterscraper::Http.get(url, headers, proxy, timeout)
-       rescue => e
-         logger.debug "query_single_page: #{e.inspect}"
-         if (retries -= 1) > 0
-           logger.info("Retrying... (Attempts left: #{retries - 1})")
-           retry
-         else
-           raise
-         end
+     def get_single_page(url, headers, proxies, timeout = 6, retries = 30)
+       return nil if stop_requested?
+       Twitterscraper::Http.get(url, headers, proxies.sample, timeout)
+     rescue => e
+       logger.debug "query_single_page: #{e.inspect}"
+       if (retries -= 1) > 0
+         logger.info("Retrying... (Attempts left: #{retries - 1})")
+         retry
+       else
+         raise
        end
+     end

-       html = ''
-       json_resp = nil
+     def parse_single_page(text, html = true)
+       return [nil, nil] if text.nil? || text == ''

-       if pos
-         begin
-           json_resp = JSON.parse(response)
-           html = json_resp['items_html'] || ''
-         rescue => e
-           logger.warn("Failed to parse JSON #{e.inspect} while requesting #{url}")
-         end
+       if html
+         json_resp = nil
+         items_html = text
        else
-         html = response || ''
+         json_resp = JSON.parse(text)
+         items_html = json_resp['items_html'] || ''
+         logger.warn json_resp['message'] if json_resp['message'] # Sorry, you are rate limited.
        end

+       [items_html, json_resp]
+     end
+
+     def query_single_page(query, lang, pos, from_user = false, headers: [], proxies: [])
+       logger.info("Querying #{query}")
+       query = ERB::Util.url_encode(query)
+
+       url = build_query_url(query, lang, pos, from_user)
+       logger.debug("Scraping tweets from #{url}")
+
+       response = get_single_page(url, headers, proxies)
+       return [], nil if response.nil?
+
+       html, json_resp = parse_single_page(response, pos.nil?)
+
        tweets = Tweet.from_html(html)

        if tweets.empty?
-         if json_resp && json_resp['has_more_items']
-           pos = json_resp['min_position']
-         else
-           pos = nil
-         end
-         return [], pos
+         return [], (json_resp && json_resp['has_more_items'] && json_resp['min_position'])
        end

        if json_resp
@@ -97,51 +97,112 @@ module Twitterscraper
        end
      end

-     def query_tweets(query, start_date: nil, end_date: nil, limit: 100, threads: 2, lang: '')
-       start_date = start_date ? Date.parse(start_date) : Date.parse('2006-3-21')
-       end_date = end_date ? Date.parse(end_date) : Date.today
-       if start_date == end_date
-         raise 'Please specify different values for :start_date and :end_date.'
-       elsif start_date > end_date
-         raise 'The :start_date must occur before :end_date.'
+     OLDEST_DATE = Date.parse('2006-3-21')
+
+     def validate_options!(query, start_date:, end_date:, lang:, limit:, threads:, proxy:)
+       if query.nil? || query == ''
+         raise 'Please specify a search query.'
+       end
+
+       if ERB::Util.url_encode(query).length >= 500
+         raise ':query must be a UTF-8, URL-encoded search query of 500 characters maximum, including operators.'
        end

-       # TODO parallel
+       if start_date && end_date
+         if start_date == end_date
+           raise 'Please specify different values for :start_date and :end_date.'
+         elsif start_date > end_date
+           raise ':start_date must occur before :end_date.'
+         end
+       end

+       if start_date
+         if start_date < OLDEST_DATE
+           raise ":start_date must be greater than or equal to #{OLDEST_DATE}"
+         end
+       end
+
+       if end_date
+         today = Date.today
+         if end_date > Date.today
+           raise ":end_date must be less than or equal to today(#{today})"
+         end
+       end
+     end
+
+     def build_queries(query, start_date, end_date)
+       if start_date && end_date
+         date_range = start_date.upto(end_date - 1)
+         date_range.map { |date| query + " since:#{date} until:#{date + 1}" }
+       elsif start_date
+         [query + " since:#{start_date}"]
+       elsif end_date
+         [query + " until:#{end_date}"]
+       else
+         [query]
+       end
+     end
+
+     def main_loop(query, lang, limit, headers, proxies)
        pos = nil
-       all_tweets = []

-       proxies = Twitterscraper::Proxy.get_proxies
-       logger.info "Using #{proxies.size} proxies"
+       while true
+         new_tweets, new_pos = query_single_page(query, lang, pos, headers: headers, proxies: proxies)
+         unless new_tweets.empty?
+           @mutex.synchronize {
+             @all_tweets.concat(new_tweets)
+             @all_tweets.uniq! { |t| t.tweet_id }
+           }
+         end
+         logger.info("Got #{new_tweets.size} tweets (total #{@all_tweets.size})")

-       headers = {'User-Agent': USER_AGENT, 'X-Requested-With': 'XMLHttpRequest'}
-       logger.info("Headers #{headers}")
+         break unless new_pos
+         break if @all_tweets.size >= limit

-       start_date.upto(end_date) do |date|
-         break if date == end_date
+         pos = new_pos
+       end

-         queries = query + " since:#{date} until:#{date + 1}"
+       if @all_tweets.size >= limit
+         logger.info("Limit reached #{@all_tweets.size}")
+         @stop_requested = true
+       end
+     end

-         while true
-           new_tweets, new_pos = query_single_page(queries, lang, pos, headers: headers, proxies: proxies)
-           logger.info("Got #{new_tweets.size} tweets")
-           logger.debug("new_pos=#{new_pos}")
+     def stop_requested?
+       @stop_requested
+     end

-           unless new_tweets.empty?
-             all_tweets.concat(new_tweets)
-             all_tweets.uniq! { |t| t.tweet_id }
-           end
+     def query_tweets(query, start_date: nil, end_date: nil, lang: '', limit: 100, threads: 2, proxy: false)
+       start_date = Date.parse(start_date) if start_date && start_date.is_a?(String)
+       end_date = Date.parse(end_date) if end_date && end_date.is_a?(String)
+       queries = build_queries(query, start_date, end_date)
+       threads = queries.size if threads > queries.size
+       proxies = proxy ? Twitterscraper::Proxy::Pool.new : []

-           break unless new_pos
-           break if all_tweets.size >= limit
+       validate_options!(queries[0], start_date: start_date, end_date: end_date, lang: lang, limit: limit, threads: threads, proxy: proxy)

-           pos = new_pos
-         end
+       logger.info("The number of threads #{threads}")

-         break if all_tweets.size >= limit
+       headers = {'User-Agent': USER_AGENT_LIST.sample, 'X-Requested-With': 'XMLHttpRequest'}
+       logger.info("Headers #{headers}")
+
+       @all_tweets = []
+       @mutex = Mutex.new
+       @stop_requested = false
+
+       if threads > 1
+         Parallel.each(queries, in_threads: threads) do |query|
+           main_loop(query, lang, limit, headers, proxies)
+           raise Parallel::Break if stop_requested?
+         end
+       else
+         queries.each do |query|
+           main_loop(query, lang, limit, headers, proxies)
+           break if stop_requested?
+         end
        end

-       all_tweets
+       @all_tweets.sort_by { |tweet| -tweet.created_at.to_i }
      end
    end
  end
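To see how the parallelism above is organized: `build_queries` turns a date range into one single-day query per date, `Parallel.each` runs `main_loop` for each of those queries on its own thread, and the shared `@all_tweets` array is guarded by `@mutex`. A sketch of the query splitting:

```ruby
require 'date'

# Equivalent to build_queries('ruby', Date.parse('2020-07-01'), Date.parse('2020-07-03')):
start_date = Date.parse('2020-07-01')
date_range = start_date.upto(Date.parse('2020-07-03') - 1)
date_range.map { |date| "ruby since:#{date} until:#{date + 1}" }
# => ["ruby since:2020-07-01 until:2020-07-02",
#     "ruby since:2020-07-02 until:2020-07-03"]
```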
data/lib/twitterscraper/tweet.rb CHANGED
@@ -2,7 +2,8 @@ require 'time'

  module Twitterscraper
    class Tweet
-     attr_reader :screen_name, :name, :user_id, :tweet_id, :tweet_url, :timestamp, :created_at, :text
+     KEYS = [:screen_name, :name, :user_id, :tweet_id, :tweet_url, :created_at, :text]
+     attr_reader *KEYS

      def initialize(attrs)
        attrs.each do |key, value|
@@ -10,6 +11,12 @@ module Twitterscraper
        end
      end

+     def to_json(options = {})
+       KEYS.map do |key|
+         [key, send(key)]
+       end.to_h.to_json
+     end
+
      class << self
        def from_html(text)
          html = Nokogiri::HTML(text)
@@ -31,7 +38,6 @@ module Twitterscraper
            user_id: html.attr('data-user-id').to_i,
            tweet_id: html.attr('data-tweet-id').to_i,
            tweet_url: 'https://twitter.com' + html.attr('data-permalink-path'),
-           timestamp: timestamp,
            created_at: Time.at(timestamp, in: '+00:00'),
            text: inner_html.xpath("//div[@class[contains(., 'js-tweet-text-container')]]/p[@class[contains(., 'js-tweet-text')]]").first.text,
          )
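The new `Tweet#to_json` above is what makes the CLI's `JSON.generate(tweets)` call work: `Array#to_json` invokes `to_json` on each element, so every tweet serializes to exactly the `KEYS` fields. A minimal sketch with made-up attribute values (unset attributes come back as null):

```ruby
require 'json'

tweet = Twitterscraper::Tweet.new(tweet_id: 1, screen_name: '@user', text: 'hi')
JSON.generate([tweet])
# => '[{"screen_name":"@user","name":null,"user_id":null,"tweet_id":1,
#      "tweet_url":null,"created_at":null,"text":"hi"}]'
```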
data/lib/twitterscraper/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Twitterscraper
-   VERSION = "0.2.0"
+   VERSION = '0.7.0'
  end
data/twitterscraper-ruby.gemspec CHANGED
@@ -21,9 +21,11 @@ Gem::Specification.new do |spec|
    spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
      `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
    end
-   spec.bindir = "exe"
-   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.executables = ["twitterscraper"]
    spec.require_paths = ["lib"]

+   spec.required_ruby_version = ">= 2.6.4"
+
    spec.add_dependency "nokogiri"
+   spec.add_dependency "parallel"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: twitterscraper-ruby
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.7.0
  platform: ruby
  authors:
  - ts-3156
  autorequire:
- bindir: exe
+ bindir: bin
  cert_chain: []
- date: 2020-07-12 00:00:00.000000000 Z
+ date: 2020-07-13 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: nokogiri
@@ -24,10 +24,25 @@ dependencies:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
+ - !ruby/object:Gem::Dependency
+   name: parallel
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
  description: A gem to scrape Tweets
  email:
  - ts_3156@yahoo.co.jp
- executables: []
+ executables:
+ - twitterscraper
  extensions: []
  extra_rdoc_files: []
  files:
@@ -43,8 +58,10 @@ files:
  - Rakefile
  - bin/console
  - bin/setup
+ - bin/twitterscraper
  - lib/twitterscraper-ruby.rb
  - lib/twitterscraper.rb
+ - lib/twitterscraper/cli.rb
  - lib/twitterscraper/client.rb
  - lib/twitterscraper/http.rb
  - lib/twitterscraper/lang.rb
@@ -69,7 +86,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
-       version: 2.3.0
+       version: 2.6.4
  required_rubygems_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="