twitterscraper-ruby 0.15.0 → 0.15.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: a950fb24329aaa1020441e258a8a2144100d732142b6c227bb9b026b8bb73996
- data.tar.gz: 1f64f31e43189e2ee439f5ef6f6d54bc6ea58895adbed67cb8ddbe91af07681a
+ metadata.gz: 7f04cb0ba394884918271b5485b596c07203b7a6e9f4fec42d074ef4f02b6a0a
+ data.tar.gz: a4f618df53d1e8b54954619e87d383e43dbe5a63bbf83b33ee38f975998f2678
  SHA512:
- metadata.gz: 8573affbc9a5faa05e5e489364bb2ba0da1aa4f12af35445e5de8b1f8c399eb0575cc9f408b2ba96c3d7fd8b2a74b7dd703229053a33c1f8a883856818033cb9
- data.tar.gz: 2b2b3ad0b2dd9d089a7b6127ed1b0db21e7f4fa5f0c31e6b366d9b5ae444e2244d4200c813b7a3257f43702d2caa9f264515e701602c24f4482a746b89d41328
+ metadata.gz: fa9f02cf3ef0bf280f45b18ebacaec0b06dbd610477355602fcc59d382b5590c990695297e1e793457fdcff4cb7dd037f076c1f0fa4706eb69c67c3a165243e4
+ data.tar.gz: 9c08d9e4d1ee56fa133675bc73a50f502040cc9a2844d9a46a39c38ccdffdf43c15b17c2e4a8b74561f523493ccbc4a055f0add239574d2f5129ee4abe1f5ed9
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- twitterscraper-ruby (0.15.0)
+ twitterscraper-ruby (0.15.1)
  nokogiri
  parallel
 
data/README.md CHANGED
@@ -5,15 +5,17 @@
 
  A gem to scrape https://twitter.com/search. This gem is inspired by [taspinar/twitterscraper](https://github.com/taspinar/twitterscraper).
 
+ Please feel free to ask [@ts_3156](https://twitter.com/ts_3156) if you have any questions.
+
 
  ## Twitter Search API vs. twitterscraper-ruby
 
- ### Twitter Search API
+ #### Twitter Search API
 
  - The number of tweets: 180 - 450 requests/15 minutes (18,000 - 45,000 tweets/15 minutes)
  - The time window: the past 7 days
 
- ### twitterscraper-ruby
+ #### twitterscraper-ruby
 
  - The number of tweets: Unlimited
  - The time window: from 2006-3-21 to today
@@ -30,37 +32,49 @@ $ gem install twitterscraper-ruby
 
  ## Usage
 
- Command-line interface:
+ #### Command-line interface:
+
+ Returns a collection of relevant tweets matching a specified query.
 
  ```shell script
- # Returns a collection of relevant tweets matching a specified query.
  $ twitterscraper --type search --query KEYWORD --start_date 2020-06-01 --end_date 2020-06-30 --lang ja \
  --limit 100 --threads 10 --output tweets.json
  ```
 
+ Returns a collection of the most recent tweets posted by the user indicated by the screen_name
+
  ```shell script
- # Returns a collection of the most recent tweets posted by the user indicated by the screen_name
  $ twitterscraper --type user --query SCREEN_NAME --limit 100 --output tweets.json
  ```
 
- From Within Ruby:
+ #### From Within Ruby:
 
  ```ruby
  require 'twitterscraper'
  client = Twitterscraper::Client.new(cache: true, proxy: true)
  ```
 
+ Returns a collection of relevant tweets matching a specified query.
+
  ```ruby
- # Returns a collection of relevant tweets matching a specified query.
  tweets = client.search(KEYWORD, start_date: '2020-06-01', end_date: '2020-06-30', lang: 'ja', limit: 100, threads: 10)
  ```
 
+ Returns a collection of the most recent tweets posted by the user indicated by the screen_name
+
  ```ruby
- # Returns a collection of the most recent tweets posted by the user indicated by the screen_name
  tweets = client.user_timeline(SCREEN_NAME, limit: 100)
  ```
 
 
+ ## Examples
+
+ ```shell script
+ $ twitterscraper --query twitter --limit 1000
+ $ cat tweets.json | jq . | less
+ ```
+
+
  ## Attributes
 
  ### Tweet
@@ -72,11 +86,39 @@ tweets.each do |tweet|
  puts tweet.tweet_url
  puts tweet.created_at
 
+ attr_names = hash.keys
  hash = tweet.attrs
- puts hash.keys
+ json = tweet.to_json
  end
  ```
 
+ ```json
+ [
+ {
+ "screen_name": "@name",
+ "name": "Name",
+ "user_id": 12340000,
+ "tweet_id": 1234000000000000,
+ "text": "Thanks Twitter!",
+ "links": [],
+ "hashtags": [],
+ "image_urls": [],
+ "video_url": null,
+ "has_media": null,
+ "likes": 10,
+ "retweets": 20,
+ "replies": 0,
+ "is_replied": false,
+ "is_reply_to": false,
+ "parent_tweet_id": null,
+ "reply_to_users": [],
+ "tweet_url": "https://twitter.com/name/status/1234000000000000",
+ "timestamp": 1594793000,
+ "created_at": "2020-07-15 00:00:00 +0000"
+ }
+ ]
+ ```
+
  - screen_name
  - name
  - user_id
@@ -118,45 +160,24 @@ end
  Search operators documentation is in [Standard search operators](https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators).
 
 
- ## Examples
-
- ```shell script
- $ twitterscraper --query twitter --limit 1000
- $ cat tweets.json | jq . | less
- ```
-
- ```json
- [
- {
- "screen_name": "@screenname",
- "name": "name",
- "user_id": 1194529546483000000,
- "tweet_id": 1282659891992000000,
- "tweet_url": "https://twitter.com/screenname/status/1282659891992000000",
- "created_at": "2020-07-13 12:00:00 +0000",
- "text": "Thanks Twitter!"
- }
- ]
- ```
-
  ## CLI Options
 
- | Option | Description | Default |
- | ------------- | ------------- | ------------- |
- | `-h`, `--help` | This option displays a summary of twitterscraper. | |
- | `--type` | Specify a search type. | search |
- | `--query` | Specify a keyword used during the search. | |
- | `--start_date` | Used as "since:yyyy-mm-dd for your query. This means "since the date". | |
- | `--end_date` | Used as "until:yyyy-mm-dd for your query. This means "before the date". | |
- | `--lang` | Retrieve tweets written in a specific language. | |
- | `--limit` | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
- | `--order` | Sort order of the results. | desc |
- | `--threads` | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
- | `--proxy` | Scrape https://twitter.com/search via proxies. | true |
- | `--cache` | Enable caching. | true |
- | `--format` | The format of the output. | json |
- | `--output` | The name of the output file. | tweets.json |
- | `--verbose` | Print debug messages. | tweets.json |
+ | Option | Type | Description | Value |
+ | ------------- | ------------- | ------------- | ------------- |
+ | `--help` | string | This option displays a summary of twitterscraper. | |
+ | `--type` | string | Specify a search type. | search(default) or user |
+ | `--query` | string | Specify a keyword used during the search. | |
+ | `--start_date` | string | Used as "since:yyyy-mm-dd for your query. This means "since the date". | |
+ | `--end_date` | string | Used as "until:yyyy-mm-dd for your query. This means "before the date". | |
+ | `--lang` | string | Retrieve tweets written in a specific language. | |
+ | `--limit` | integer | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
+ | `--order` | string | Sort a order of the results. | desc(default) or asc |
+ | `--threads` | integer | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
+ | `--proxy` | boolean | Scrape https://twitter.com/search via proxies. | true(default) or false |
+ | `--cache` | boolean | Enable caching. | true(default) or false |
+ | `--format` | string | The format of the output. | json(default) or html |
+ | `--output` | string | The name of the output file. | tweets.json |
+ | `--verbose` | | Print debug messages. | |
 
 
  ## Contributing
@@ -27,8 +27,8 @@ module Twitterscraper
  'include_available_features=1&include_entities=1&' +
  'max_position=__POS__&reset_error_state=false'
 
- def build_query_url(query, lang, from_user, pos)
- if from_user
+ def build_query_url(query, lang, type, pos)
+ if type == 'user'
  if pos
  RELOAD_URL_USER.sub('__USER__', query).sub('__POS__', pos.to_s)
  else
@@ -51,7 +51,7 @@ module Twitterscraper
  end
  Http.get(url, headers, proxy, timeout)
  rescue => e
- logger.debug "query_single_page: #{e.inspect}"
+ logger.debug "get_single_page: #{e.inspect}"
  if (retries -= 1) > 0
  logger.info "Retrying... (Attempts left: #{retries - 1})"
  retry
@@ -79,7 +79,7 @@ module Twitterscraper
  logger.info "Querying #{query}"
  query = ERB::Util.url_encode(query)
 
- url = build_query_url(query, lang, type == 'user', pos)
+ url = build_query_url(query, lang, type, pos)
  http_request = lambda do
  logger.debug "Scraping tweets from #{url}"
  get_single_page(url, headers, proxies)
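
The hunks above change `build_query_url` to take the `type` string directly instead of a precomputed `from_user` boolean. A minimal Ruby sketch of that refactor follows. The URL templates, the `fill` helper, and the `_old`/`_new` method names are hypothetical stand-ins (the gem's real constants are only partly visible in this diff), so treat it as an illustration rather than the gem's implementation.

```ruby
# Hypothetical URL templates; the gem's real constants are not fully shown in the diff.
SEARCH_URL = 'https://example.invalid/search?q=__QUERY__&l=__LANG__&pos=__POS__'
USER_URL   = 'https://example.invalid/user/__USER__?pos=__POS__'

# Hypothetical helper: substitutes whichever placeholders the template contains.
def fill(template, query, lang, pos)
  template
    .sub('__USER__', query)
    .sub('__QUERY__', query)
    .sub('__LANG__', lang.to_s)
    .sub('__POS__', pos.to_s)
end

# 0.15.0 style: the caller collapses the type into a boolean before the call.
def build_query_url_old(query, lang, from_user, pos)
  fill(from_user ? USER_URL : SEARCH_URL, query, lang, pos)
end

# 0.15.1 style: the method receives the type string and compares it itself.
def build_query_url_new(query, lang, type, pos)
  fill(type == 'user' ? USER_URL : SEARCH_URL, query, lang, pos)
end

type = 'user'
old_url = build_query_url_old('ts_3156', nil, type == 'user', 10)
new_url = build_query_url_new('ts_3156', nil, type, 10)
search_url = build_query_url_new('ruby', 'ja', 'search', nil)

puts old_url == new_url
puts search_url
```

Passing the string through keeps the call site simpler and leaves room for the method to branch on further type values later without another signature change.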
@@ -1,3 +1,3 @@
  module Twitterscraper
- VERSION = '0.15.0'
+ VERSION = '0.15.1'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: twitterscraper-ruby
  version: !ruby/object:Gem::Version
- version: 0.15.0
+ version: 0.15.1
  platform: ruby
  authors:
  - ts-3156