twitterscraper-ruby 0.15.0 → 0.15.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: a950fb24329aaa1020441e258a8a2144100d732142b6c227bb9b026b8bb73996
- data.tar.gz: 1f64f31e43189e2ee439f5ef6f6d54bc6ea58895adbed67cb8ddbe91af07681a
+ metadata.gz: 7f04cb0ba394884918271b5485b596c07203b7a6e9f4fec42d074ef4f02b6a0a
+ data.tar.gz: a4f618df53d1e8b54954619e87d383e43dbe5a63bbf83b33ee38f975998f2678
  SHA512:
- metadata.gz: 8573affbc9a5faa05e5e489364bb2ba0da1aa4f12af35445e5de8b1f8c399eb0575cc9f408b2ba96c3d7fd8b2a74b7dd703229053a33c1f8a883856818033cb9
- data.tar.gz: 2b2b3ad0b2dd9d089a7b6127ed1b0db21e7f4fa5f0c31e6b366d9b5ae444e2244d4200c813b7a3257f43702d2caa9f264515e701602c24f4482a746b89d41328
+ metadata.gz: fa9f02cf3ef0bf280f45b18ebacaec0b06dbd610477355602fcc59d382b5590c990695297e1e793457fdcff4cb7dd037f076c1f0fa4706eb69c67c3a165243e4
+ data.tar.gz: 9c08d9e4d1ee56fa133675bc73a50f502040cc9a2844d9a46a39c38ccdffdf43c15b17c2e4a8b74561f523493ccbc4a055f0add239574d2f5129ee4abe1f5ed9
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- twitterscraper-ruby (0.15.0)
+ twitterscraper-ruby (0.15.1)
  nokogiri
  parallel
 
data/README.md CHANGED
@@ -5,15 +5,17 @@
 
  A gem to scrape https://twitter.com/search. This gem is inspired by [taspinar/twitterscraper](https://github.com/taspinar/twitterscraper).
 
+ Please feel free to ask [@ts_3156](https://twitter.com/ts_3156) if you have any questions.
+
 
  ## Twitter Search API vs. twitterscraper-ruby
 
- ### Twitter Search API
+ #### Twitter Search API
 
  - The number of tweets: 180 - 450 requests/15 minutes (18,000 - 45,000 tweets/15 minutes)
  - The time window: the past 7 days
 
- ### twitterscraper-ruby
+ #### twitterscraper-ruby
 
  - The number of tweets: Unlimited
  - The time window: from 2006-3-21 to today
@@ -30,37 +32,49 @@ $ gem install twitterscraper-ruby
 
  ## Usage
 
- Command-line interface:
+ #### Command-line interface:
+
+ Returns a collection of relevant tweets matching a specified query.
 
  ```shell script
- # Returns a collection of relevant tweets matching a specified query.
  $ twitterscraper --type search --query KEYWORD --start_date 2020-06-01 --end_date 2020-06-30 --lang ja \
  --limit 100 --threads 10 --output tweets.json
  ```
 
+ Returns a collection of the most recent tweets posted by the user indicated by the screen_name.
+
  ```shell script
- # Returns a collection of the most recent tweets posted by the user indicated by the screen_name
  $ twitterscraper --type user --query SCREEN_NAME --limit 100 --output tweets.json
  ```
 
- From Within Ruby:
+ #### From Within Ruby:
 
  ```ruby
  require 'twitterscraper'
  client = Twitterscraper::Client.new(cache: true, proxy: true)
  ```
 
+ Returns a collection of relevant tweets matching a specified query.
+
  ```ruby
- # Returns a collection of relevant tweets matching a specified query.
  tweets = client.search(KEYWORD, start_date: '2020-06-01', end_date: '2020-06-30', lang: 'ja', limit: 100, threads: 10)
  ```
 
+ Returns a collection of the most recent tweets posted by the user indicated by the screen_name.
+
  ```ruby
- # Returns a collection of the most recent tweets posted by the user indicated by the screen_name
  tweets = client.user_timeline(SCREEN_NAME, limit: 100)
  ```
 
 
+ ## Examples
+
+ ```shell script
+ $ twitterscraper --query twitter --limit 1000
+ $ cat tweets.json | jq . | less
+ ```
+
+
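The Ruby API can produce the same output file as the CLI's `--output` option. The following is a hedged sketch (not taken from the gem's README): it combines the calls shown above with the `attrs` hash documented in the Attributes section, and assumes `client.search` returns tweet objects responding to `#attrs`.

```ruby
require 'twitterscraper'
require 'json'

# Hedged sketch: fetch tweets via the Ruby API and write them to tweets.json,
# roughly what the CLI does with --output tweets.json.
# Assumes client.search returns tweet objects that respond to #attrs
# (see the Attributes section below).
client = Twitterscraper::Client.new(cache: true, proxy: true)
tweets = client.search('twitter', start_date: '2020-06-01', end_date: '2020-06-30',
                       lang: 'ja', limit: 100, threads: 10)

File.write('tweets.json', JSON.pretty_generate(tweets.map(&:attrs)))
```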
  ## Attributes
 
  ### Tweet
@@ -72,11 +86,39 @@ tweets.each do |tweet|
  puts tweet.tweet_url
  puts tweet.created_at
 
  hash = tweet.attrs
- puts hash.keys
+ attr_names = hash.keys
+ json = tweet.to_json
  end
  ```
 
+ ```json
+ [
+ {
+ "screen_name": "@name",
+ "name": "Name",
+ "user_id": 12340000,
+ "tweet_id": 1234000000000000,
+ "text": "Thanks Twitter!",
+ "links": [],
+ "hashtags": [],
+ "image_urls": [],
+ "video_url": null,
+ "has_media": null,
+ "likes": 10,
+ "retweets": 20,
+ "replies": 0,
+ "is_replied": false,
+ "is_reply_to": false,
+ "parent_tweet_id": null,
+ "reply_to_users": [],
+ "tweet_url": "https://twitter.com/name/status/1234000000000000",
+ "timestamp": 1594793000,
+ "created_at": "2020-07-15 00:00:00 +0000"
+ }
+ ]
+ ```
+
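Since every key in the sample above is also listed as a tweet attribute, simple post-processing can stay in plain Ruby. A hedged sketch, assuming the attributes are exposed as reader methods in the same way `tweet_url` and `created_at` are used in the example above:

```ruby
# Hedged sketch: filter the tweets returned by client.search/user_timeline.
# Assumes likes, retweets and tweet_url are reader methods on each tweet,
# matching the attribute names in the sample JSON above.
popular = tweets.select { |tweet| tweet.likes.to_i >= 10 }
popular.each do |tweet|
  puts "#{tweet.tweet_url} (#{tweet.likes} likes, #{tweet.retweets} retweets)"
end
```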
  - screen_name
  - name
  - user_id
@@ -118,45 +160,24 @@ end
  Search operators documentation is in [Standard search operators](https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators).
 
 
- ## Examples
-
- ```shell script
- $ twitterscraper --query twitter --limit 1000
- $ cat tweets.json | jq . | less
- ```
-
- ```json
- [
- {
- "screen_name": "@screenname",
- "name": "name",
- "user_id": 1194529546483000000,
- "tweet_id": 1282659891992000000,
- "tweet_url": "https://twitter.com/screenname/status/1282659891992000000",
- "created_at": "2020-07-13 12:00:00 +0000",
- "text": "Thanks Twitter!"
- }
- ]
- ```
-
  ## CLI Options
 
- | Option | Description | Default |
- | ------------- | ------------- | ------------- |
- | `-h`, `--help` | This option displays a summary of twitterscraper. | |
- | `--type` | Specify a search type. | search |
- | `--query` | Specify a keyword used during the search. | |
- | `--start_date` | Used as "since:yyyy-mm-dd for your query. This means "since the date". | |
- | `--end_date` | Used as "until:yyyy-mm-dd for your query. This means "before the date". | |
- | `--lang` | Retrieve tweets written in a specific language. | |
- | `--limit` | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
- | `--order` | Sort order of the results. | desc |
- | `--threads` | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
- | `--proxy` | Scrape https://twitter.com/search via proxies. | true |
- | `--cache` | Enable caching. | true |
- | `--format` | The format of the output. | json |
- | `--output` | The name of the output file. | tweets.json |
- | `--verbose` | Print debug messages. | tweets.json |
+ | Option | Type | Description | Value |
+ | ------------- | ------------- | ------------- | ------------- |
+ | `--help` | string | This option displays a summary of twitterscraper. | |
+ | `--type` | string | Specify a search type. | search (default) or user |
+ | `--query` | string | Specify a keyword used during the search. | |
+ | `--start_date` | string | Used as "since:yyyy-mm-dd" for your query. This means "since the date". | |
+ | `--end_date` | string | Used as "until:yyyy-mm-dd" for your query. This means "before the date". | |
+ | `--lang` | string | Retrieve tweets written in a specific language. | |
+ | `--limit` | integer | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
+ | `--order` | string | Sort order of the results. | desc (default) or asc |
+ | `--threads` | integer | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
+ | `--proxy` | boolean | Scrape https://twitter.com/search via proxies. | true (default) or false |
+ | `--cache` | boolean | Enable caching. | true (default) or false |
+ | `--format` | string | The format of the output. | json (default) or html |
+ | `--output` | string | The name of the output file. | tweets.json |
+ | `--verbose` | | Print debug messages. | |
 
 
  ## Contributing
@@ -27,8 +27,8 @@ module Twitterscraper
  'include_available_features=1&include_entities=1&' +
  'max_position=__POS__&reset_error_state=false'
 
- def build_query_url(query, lang, from_user, pos)
- if from_user
+ def build_query_url(query, lang, type, pos)
+ if type == 'user'
  if pos
  RELOAD_URL_USER.sub('__USER__', query).sub('__POS__', pos.to_s)
  else
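This hunk replaces the `from_user` boolean with the `type` string that the CLI already passes (`--type search` or `--type user`). A standalone sketch of the dispatch is below; the URL templates are placeholders, since only `RELOAD_URL_USER` and its `__USER__`/`__POS__` substitution are visible in the diff.

```ruby
# Hedged sketch of the type-based dispatch in build_query_url.
# The two constants below are hypothetical stand-ins for the gem's templates.
INIT_URL_USER   = 'https://example.com/__USER__'
RELOAD_URL_USER = 'https://example.com/__USER__?max_position=__POS__'

def build_query_url(query, lang, type, pos)
  if type == 'user'
    if pos
      RELOAD_URL_USER.sub('__USER__', query).sub('__POS__', pos.to_s)
    else
      INIT_URL_USER.sub('__USER__', query)
    end
  else
    # The search branch (not shown in this hunk) builds its URL from the
    # encoded query, lang and pos in the same way, using the search templates.
    "https://example.com/search?q=#{query}&l=#{lang}&max_position=#{pos}"
  end
end

build_query_url('ts_3156', nil, 'user', nil)
# => "https://example.com/ts_3156"
```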
@@ -51,7 +51,7 @@ module Twitterscraper
  end
  Http.get(url, headers, proxy, timeout)
  rescue => e
- logger.debug "query_single_page: #{e.inspect}"
+ logger.debug "get_single_page: #{e.inspect}"
  if (retries -= 1) > 0
  logger.info "Retrying... (Attempts left: #{retries - 1})"
  retry
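The rescue block above relies on Ruby's `retry` keyword and a decrementing counter. A self-contained sketch of the same pattern follows, with a placeholder request standing in for `Http.get`; what happens once the retries are exhausted is not visible in this hunk, so the final `raise` is an assumption.

```ruby
require 'logger'

logger = Logger.new($stdout)

# Placeholder for the gem's Http.get call; fails randomly to trigger retries.
fetch_page = -> { rand < 0.5 ? '<html>...</html>' : raise('temporary network error') }

retries = 3
begin
  page = fetch_page.call
  logger.info "Fetched #{page.bytesize} bytes"
rescue => e
  logger.debug "get_single_page: #{e.inspect}"
  if (retries -= 1) > 0
    logger.info "Retrying... (Attempts left: #{retries - 1})"
    retry
  else
    raise # assumption: give up once the retries are used up
  end
end
```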
@@ -79,7 +79,7 @@ module Twitterscraper
  logger.info "Querying #{query}"
  query = ERB::Util.url_encode(query)
 
- url = build_query_url(query, lang, type == 'user', pos)
+ url = build_query_url(query, lang, type, pos)
  http_request = lambda do
  logger.debug "Scraping tweets from #{url}"
  get_single_page(url, headers, proxies)
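For context on the `ERB::Util.url_encode(query)` call above: it comes from Ruby's standard library and percent-encodes the query before it is substituted into the URL templates. A quick illustration:

```ruby
require 'erb'

# ERB::Util.url_encode percent-encodes reserved characters,
# so spaces become %20 (not +) and '#'/':' become %23/%3A.
ERB::Util.url_encode('ruby on rails')  # => "ruby%20on%20rails"
ERB::Util.url_encode('lang:ja #ruby')  # => "lang%3Aja%20%23ruby"
```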
@@ -1,3 +1,3 @@
  module Twitterscraper
- VERSION = '0.15.0'
+ VERSION = '0.15.1'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: twitterscraper-ruby
  version: !ruby/object:Gem::Version
- version: 0.15.0
+ version: 0.15.1
  platform: ruby
  authors:
  - ts-3156