twitterscraper-ruby 0.15.0 → 0.15.1
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +67 -46
- data/lib/twitterscraper/query.rb +4 -4
- data/lib/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7f04cb0ba394884918271b5485b596c07203b7a6e9f4fec42d074ef4f02b6a0a
+  data.tar.gz: a4f618df53d1e8b54954619e87d383e43dbe5a63bbf83b33ee38f975998f2678
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fa9f02cf3ef0bf280f45b18ebacaec0b06dbd610477355602fcc59d382b5590c990695297e1e793457fdcff4cb7dd037f076c1f0fa4706eb69c67c3a165243e4
+  data.tar.gz: 9c08d9e4d1ee56fa133675bc73a50f502040cc9a2844d9a46a39c38ccdffdf43c15b17c2e4a8b74561f523493ccbc4a055f0add239574d2f5129ee4abe1f5ed9
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -5,15 +5,17 @@
 
 A gem to scrape https://twitter.com/search. This gem is inspired by [taspinar/twitterscraper](https://github.com/taspinar/twitterscraper).
 
+Please feel free to ask [@ts_3156](https://twitter.com/ts_3156) if you have any questions.
+
 
 ## Twitter Search API vs. twitterscraper-ruby
 
-
+#### Twitter Search API
 
 - The number of tweets: 180 - 450 requests/15 minutes (18,000 - 45,000 tweets/15 minutes)
 - The time window: the past 7 days
 
-
+#### twitterscraper-ruby
 
 - The number of tweets: Unlimited
 - The time window: from 2006-3-21 to today
@@ -30,37 +32,49 @@ $ gem install twitterscraper-ruby
 
 ## Usage
 
-Command-line interface:
+#### Command-line interface:
+
+Returns a collection of relevant tweets matching a specified query.
 
 ```shell script
-# Returns a collection of relevant tweets matching a specified query.
 $ twitterscraper --type search --query KEYWORD --start_date 2020-06-01 --end_date 2020-06-30 --lang ja \
     --limit 100 --threads 10 --output tweets.json
 ```
 
+Returns a collection of the most recent tweets posted by the user indicated by the screen_name
+
 ```shell script
-# Returns a collection of the most recent tweets posted by the user indicated by the screen_name
 $ twitterscraper --type user --query SCREEN_NAME --limit 100 --output tweets.json
 ```
 
-From Within Ruby:
+#### From Within Ruby:
 
 ```ruby
 require 'twitterscraper'
 client = Twitterscraper::Client.new(cache: true, proxy: true)
 ```
 
+Returns a collection of relevant tweets matching a specified query.
+
 ```ruby
-# Returns a collection of relevant tweets matching a specified query.
 tweets = client.search(KEYWORD, start_date: '2020-06-01', end_date: '2020-06-30', lang: 'ja', limit: 100, threads: 10)
 ```
 
+Returns a collection of the most recent tweets posted by the user indicated by the screen_name
+
 ```ruby
-# Returns a collection of the most recent tweets posted by the user indicated by the screen_name
 tweets = client.user_timeline(SCREEN_NAME, limit: 100)
 ```
 
 
+## Examples
+
+```shell script
+$ twitterscraper --query twitter --limit 1000
+$ cat tweets.json | jq . | less
+```
+
+
 ## Attributes
 
 ### Tweet
@@ -72,11 +86,39 @@ tweets.each do |tweet|
   puts tweet.tweet_url
   puts tweet.created_at
 
+  attr_names = hash.keys
   hash = tweet.attrs
-
+  json = tweet.to_json
 end
 ```
 
+```json
+[
+  {
+    "screen_name": "@name",
+    "name": "Name",
+    "user_id": 12340000,
+    "tweet_id": 1234000000000000,
+    "text": "Thanks Twitter!",
+    "links": [],
+    "hashtags": [],
+    "image_urls": [],
+    "video_url": null,
+    "has_media": null,
+    "likes": 10,
+    "retweets": 20,
+    "replies": 0,
+    "is_replied": false,
+    "is_reply_to": false,
+    "parent_tweet_id": null,
+    "reply_to_users": [],
+    "tweet_url": "https://twitter.com/name/status/1234000000000000",
+    "timestamp": 1594793000,
+    "created_at": "2020-07-15 00:00:00 +0000"
+  }
+]
+```
+
 - screen_name
 - name
 - user_id
@@ -118,45 +160,24 @@ end
 Search operators documentation is in [Standard search operators](https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators).
 
 
-## Examples
-
-```shell script
-$ twitterscraper --query twitter --limit 1000
-$ cat tweets.json | jq . | less
-```
-
-```json
-[
-  {
-    "screen_name": "@screenname",
-    "name": "name",
-    "user_id": 1194529546483000000,
-    "tweet_id": 1282659891992000000,
-    "tweet_url": "https://twitter.com/screenname/status/1282659891992000000",
-    "created_at": "2020-07-13 12:00:00 +0000",
-    "text": "Thanks Twitter!"
-  }
-]
-```
-
 ## CLI Options
 
-| Option | Description |
-| ------------- | ------------- | ------------- |
-|
-| `--type` | Specify a search type. | search |
-| `--query` | Specify a keyword used during the search. | |
-| `--start_date` | Used as "since:yyyy-mm-dd for your query. This means "since the date". | |
-| `--end_date` | Used as "until:yyyy-mm-dd for your query. This means "before the date". | |
-| `--lang` | Retrieve tweets written in a specific language. | |
-| `--limit` | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
-| `--order` | Sort order of the results. | desc |
-| `--threads` | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
-| `--proxy` | Scrape https://twitter.com/search via proxies. | true |
-| `--cache` | Enable caching. | true |
-| `--format` | The format of the output. | json |
-| `--output` | The name of the output file. | tweets.json |
-| `--verbose`
+| Option | Type | Description | Value |
+| ------------- | ------------- | ------------- | ------------- |
+| `--help` | string | This option displays a summary of twitterscraper. | |
+| `--type` | string | Specify a search type. | search(default) or user |
+| `--query` | string | Specify a keyword used during the search. | |
+| `--start_date` | string | Used as "since:yyyy-mm-dd for your query. This means "since the date". | |
+| `--end_date` | string | Used as "until:yyyy-mm-dd for your query. This means "before the date". | |
+| `--lang` | string | Retrieve tweets written in a specific language. | |
+| `--limit` | integer | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
+| `--order` | string | Sort a order of the results. | desc(default) or asc |
+| `--threads` | integer | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
+| `--proxy` | boolean | Scrape https://twitter.com/search via proxies. | true(default) or false |
+| `--cache` | boolean | Enable caching. | true(default) or false |
+| `--format` | string | The format of the output. | json(default) or html |
+| `--output` | string | The name of the output file. | tweets.json |
+| `--verbose` | | Print debug messages. | |
 
 
 ## Contributing
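As a rough illustration of consuming the JSON output documented in the README changes, the Ruby sketch below parses an array shaped like the README sample and lists each tweet's attribute names. The inline `SAMPLE` data and the `summarize` helper are hypothetical stand-ins and not part of the gem; in practice the array would presumably be read from the file named by `--output`.

```ruby
require 'json'

# Hypothetical data in the shape of the README's JSON sample (placeholder values).
SAMPLE = <<~JSON
  [
    {
      "screen_name": "@name",
      "name": "Name",
      "tweet_id": 1234000000000000,
      "text": "Thanks Twitter!",
      "likes": 10,
      "retweets": 20,
      "tweet_url": "https://twitter.com/name/status/1234000000000000"
    }
  ]
JSON

# Hypothetical helper (not part of the gem): one summary line per tweet hash.
def summarize(tweet)
  "#{tweet['screen_name']}: #{tweet['text']} (likes=#{tweet['likes']}, retweets=#{tweet['retweets']})"
end

tweets = JSON.parse(SAMPLE)

tweets.each do |tweet|
  # The hash exists before its keys are read.
  attr_names = tweet.keys
  puts summarize(tweet)
  puts "attributes: #{attr_names.join(', ')}"
end
```

The sketch reads the keys only after the hash exists. In the README snippet shown in this diff, `attr_names = hash.keys` appears one line before `hash = tweet.attrs`, which looks like it would fail if run in that order.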
data/lib/twitterscraper/query.rb
CHANGED
@@ -27,8 +27,8 @@ module Twitterscraper
         'include_available_features=1&include_entities=1&' +
         'max_position=__POS__&reset_error_state=false'
 
-    def build_query_url(query, lang,
-      if
+    def build_query_url(query, lang, type, pos)
+      if type == 'user'
         if pos
           RELOAD_URL_USER.sub('__USER__', query).sub('__POS__', pos.to_s)
         else
@@ -51,7 +51,7 @@ module Twitterscraper
       end
       Http.get(url, headers, proxy, timeout)
     rescue => e
-      logger.debug "
+      logger.debug "get_single_page: #{e.inspect}"
       if (retries -= 1) > 0
         logger.info "Retrying... (Attempts left: #{retries - 1})"
         retry
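The hunk above touches a rescue-and-retry block. A minimal, self-contained sketch of that pattern follows, assuming a hypothetical `with_retries` helper that is not part of the gem. It logs the exception with a label, decrements a retry budget, and re-raises once the budget is exhausted. Unlike the diff, which prints `retries - 1`, this sketch prints the remaining count directly.

```ruby
require 'logger'

# Hypothetical helper (not the gem's API): runs the block and retries on
# StandardError until the retry budget is used up, then re-raises.
def with_retries(retries: 3, logger: Logger.new($stderr))
  begin
    yield
  rescue StandardError => e
    # Log the failure with a label, as the fixed debug line in the diff does.
    logger.debug "with_retries: #{e.inspect}"
    if (retries -= 1) > 0
      logger.info "Retrying... (Attempts left: #{retries})"
      retry
    end
    raise
  end
end

attempts = 0
result = with_retries(retries: 3, logger: Logger.new(nil)) do
  attempts += 1
  raise IOError, 'temporary failure' if attempts < 3
  :ok
end
puts "#{result} after #{attempts} attempts"
```

With `retries: 3` the block runs at most three times in total; a caller that wants the original request to be followed by three retries would pass `retries: 4`.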
@@ -79,7 +79,7 @@ module Twitterscraper
       logger.info "Querying #{query}"
       query = ERB::Util.url_encode(query)
 
-      url = build_query_url(query, lang, type
+      url = build_query_url(query, lang, type, pos)
       http_request = lambda do
         logger.debug "Scraping tweets from #{url}"
         get_single_page(url, headers, proxies)
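To make the signature change easier to follow, here is a sketch of a four-argument `build_query_url` built on placeholder substitution. Only the `type == 'user'` and `pos` branch with `RELOAD_URL_USER` is taken from the diff. The URL templates, the other constant names, and the remaining branches are assumptions made for illustration, since the gem's real values are only partly visible here.

```ruby
require 'erb'

# Hypothetical templates: the gem's real constants are only partly visible in the diff.
INIT_URL_USER   = 'https://example.com/__USER__'
RELOAD_URL_USER = 'https://example.com/timeline/__USER__?max_position=__POS__'
INIT_URL        = 'https://example.com/search?q=__QUERY__&l=__LANG__'
RELOAD_URL      = 'https://example.com/search/timeline?q=__QUERY__&l=__LANG__&max_position=__POS__'

# Same four-argument signature as the fixed method in the diff. Only the
# user/pos branch mirrors the diff; every other branch is assumed.
def build_query_url(query, lang, type, pos)
  if type == 'user'
    if pos
      RELOAD_URL_USER.sub('__USER__', query).sub('__POS__', pos.to_s)
    else
      INIT_URL_USER.sub('__USER__', query)
    end
  elsif pos
    RELOAD_URL.sub('__QUERY__', query).sub('__LANG__', lang.to_s).sub('__POS__', pos.to_s)
  else
    INIT_URL.sub('__QUERY__', query).sub('__LANG__', lang.to_s)
  end
end

# The query is URL-encoded before substitution, as in the call site above.
query = ERB::Util.url_encode('ruby lang')
puts build_query_url(query, 'ja', 'search', nil)
puts build_query_url('SCREEN_NAME', nil, 'user', 42)
```

Passing `type` and `pos` explicitly keeps the method free of hidden state, which matches the call `build_query_url(query, lang, type, pos)` in the last hunk.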
data/lib/version.rb
CHANGED