twitterscraper-ruby 0.15.0 → 0.15.1
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +67 -46
- data/lib/twitterscraper/query.rb +4 -4
- data/lib/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7f04cb0ba394884918271b5485b596c07203b7a6e9f4fec42d074ef4f02b6a0a
+  data.tar.gz: a4f618df53d1e8b54954619e87d383e43dbe5a63bbf83b33ee38f975998f2678
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fa9f02cf3ef0bf280f45b18ebacaec0b06dbd610477355602fcc59d382b5590c990695297e1e793457fdcff4cb7dd037f076c1f0fa4706eb69c67c3a165243e4
+  data.tar.gz: 9c08d9e4d1ee56fa133675bc73a50f502040cc9a2844d9a46a39c38ccdffdf43c15b17c2e4a8b74561f523493ccbc4a055f0add239574d2f5129ee4abe1f5ed9
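Note: the values in this hunk are the SHA256 and SHA512 digests of the gem's `metadata.gz` and `data.tar.gz` members. As a rough sketch (not part of the package), a digest of this kind could be checked in Ruby as follows. The temp file only stands in for a real archive member, and `sha256_matches?` is a hypothetical helper name.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when the file's SHA256 hex digest equals `expected`.
def sha256_matches?(path, expected)
  Digest::SHA256.file(path).hexdigest == expected.downcase
end

# Stand-in for an archive member such as data.tar.gz (assumption: any file is hashed the same way).
sample = Tempfile.new('data.tar.gz')
sample.write('example bytes')
sample.flush

expected = Digest::SHA256.hexdigest('example bytes')
puts sha256_matches?(sample.path, expected)  # true
puts sha256_matches?(sample.path, '0' * 64)  # false
```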
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -5,15 +5,17 @@
 
 A gem to scrape https://twitter.com/search. This gem is inspired by [taspinar/twitterscraper](https://github.com/taspinar/twitterscraper).
 
+Please feel free to ask [@ts_3156](https://twitter.com/ts_3156) if you have any questions.
+
 
 ## Twitter Search API vs. twitterscraper-ruby
 
-
+#### Twitter Search API
 
 - The number of tweets: 180 - 450 requests/15 minutes (18,000 - 45,000 tweets/15 minutes)
 - The time window: the past 7 days
 
-
+#### twitterscraper-ruby
 
 - The number of tweets: Unlimited
 - The time window: from 2006-3-21 to today
@@ -30,37 +32,49 @@ $ gem install twitterscraper-ruby
 
 ## Usage
 
-Command-line interface:
+#### Command-line interface:
+
+Returns a collection of relevant tweets matching a specified query.
 
 ```shell script
-# Returns a collection of relevant tweets matching a specified query.
 $ twitterscraper --type search --query KEYWORD --start_date 2020-06-01 --end_date 2020-06-30 --lang ja \
     --limit 100 --threads 10 --output tweets.json
 ```
 
+Returns a collection of the most recent tweets posted by the user indicated by the screen_name
+
 ```shell script
-# Returns a collection of the most recent tweets posted by the user indicated by the screen_name
 $ twitterscraper --type user --query SCREEN_NAME --limit 100 --output tweets.json
 ```
 
-From Within Ruby:
+#### From Within Ruby:
 
 ```ruby
 require 'twitterscraper'
 client = Twitterscraper::Client.new(cache: true, proxy: true)
 ```
 
+Returns a collection of relevant tweets matching a specified query.
+
 ```ruby
-# Returns a collection of relevant tweets matching a specified query.
 tweets = client.search(KEYWORD, start_date: '2020-06-01', end_date: '2020-06-30', lang: 'ja', limit: 100, threads: 10)
 ```
 
+Returns a collection of the most recent tweets posted by the user indicated by the screen_name
+
 ```ruby
-# Returns a collection of the most recent tweets posted by the user indicated by the screen_name
 tweets = client.user_timeline(SCREEN_NAME, limit: 100)
 ```
 
 
+## Examples
+
+```shell script
+$ twitterscraper --query twitter --limit 1000
+$ cat tweets.json | jq . | less
+```
+
+
 ## Attributes
 
 ### Tweet
@@ -72,11 +86,39 @@ tweets.each do |tweet|
   puts tweet.tweet_url
   puts tweet.created_at
 
+  attr_names = hash.keys
   hash = tweet.attrs
-
+  json = tweet.to_json
 end
 ```
 
+```json
+[
+  {
+    "screen_name": "@name",
+    "name": "Name",
+    "user_id": 12340000,
+    "tweet_id": 1234000000000000,
+    "text": "Thanks Twitter!",
+    "links": [],
+    "hashtags": [],
+    "image_urls": [],
+    "video_url": null,
+    "has_media": null,
+    "likes": 10,
+    "retweets": 20,
+    "replies": 0,
+    "is_replied": false,
+    "is_reply_to": false,
+    "parent_tweet_id": null,
+    "reply_to_users": [],
+    "tweet_url": "https://twitter.com/name/status/1234000000000000",
+    "timestamp": 1594793000,
+    "created_at": "2020-07-15 00:00:00 +0000"
+  }
+]
+```
+
 - screen_name
 - name
 - user_id
@@ -118,45 +160,24 @@ end
 Search operators documentation is in [Standard search operators](https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators).
 
 
-## Examples
-
-```shell script
-$ twitterscraper --query twitter --limit 1000
-$ cat tweets.json | jq . | less
-```
-
-```json
-[
-  {
-    "screen_name": "@screenname",
-    "name": "name",
-    "user_id": 1194529546483000000,
-    "tweet_id": 1282659891992000000,
-    "tweet_url": "https://twitter.com/screenname/status/1282659891992000000",
-    "created_at": "2020-07-13 12:00:00 +0000",
-    "text": "Thanks Twitter!"
-  }
-]
-```
-
 ## CLI Options
 
-| Option | Description |
-| ------------- | ------------- | ------------- |
-|
-| `--type` | Specify a search type. | search |
-| `--query` | Specify a keyword used during the search. | |
-| `--start_date` | Used as "since:yyyy-mm-dd for your query. This means "since the date". | |
-| `--end_date` | Used as "until:yyyy-mm-dd for your query. This means "before the date". | |
-| `--lang` | Retrieve tweets written in a specific language. | |
-| `--limit` | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
-| `--order` | Sort order of the results. | desc |
-| `--threads` | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
-| `--proxy` | Scrape https://twitter.com/search via proxies. | true |
-| `--cache` | Enable caching. | true |
-| `--format` | The format of the output. | json |
-| `--output` | The name of the output file. | tweets.json |
-| `--verbose`
+| Option | Type | Description | Value |
+| ------------- | ------------- | ------------- | ------------- |
+| `--help` | string | This option displays a summary of twitterscraper. | |
+| `--type` | string | Specify a search type. | search(default) or user |
+| `--query` | string | Specify a keyword used during the search. | |
+| `--start_date` | string | Used as "since:yyyy-mm-dd for your query. This means "since the date". | |
+| `--end_date` | string | Used as "until:yyyy-mm-dd for your query. This means "before the date". | |
+| `--lang` | string | Retrieve tweets written in a specific language. | |
+| `--limit` | integer | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
+| `--order` | string | Sort a order of the results. | desc(default) or asc |
+| `--threads` | integer | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
+| `--proxy` | boolean | Scrape https://twitter.com/search via proxies. | true(default) or false |
+| `--cache` | boolean | Enable caching. | true(default) or false |
+| `--format` | string | The format of the output. | json(default) or html |
+| `--output` | string | The name of the output file. | tweets.json |
+| `--verbose` | | Print debug messages. | |
 
 
 ## Contributing
data/lib/twitterscraper/query.rb
CHANGED
@@ -27,8 +27,8 @@ module Twitterscraper
       'include_available_features=1&include_entities=1&' +
       'max_position=__POS__&reset_error_state=false'
 
-    def build_query_url(query, lang,
-      if
+    def build_query_url(query, lang, type, pos)
+      if type == 'user'
         if pos
           RELOAD_URL_USER.sub('__USER__', query).sub('__POS__', pos.to_s)
         else
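Note: the visible part of `build_query_url` fills `__USER__` and `__POS__` placeholders in a URL template with `String#sub`. The following is a minimal sketch of that substitution pattern only. The two templates and the `build_user_url` name are hypothetical, since the real constants are not fully shown in this diff.

```ruby
# Hypothetical templates; not the gem's actual URLs.
SKETCH_INIT_URL   = 'https://example.invalid/__USER__'
SKETCH_RELOAD_URL = 'https://example.invalid/__USER__/timeline?max_position=__POS__'

# Pick the reload template when a position is known, otherwise the initial one,
# then substitute each placeholder once.
def build_user_url(query, pos)
  if pos
    SKETCH_RELOAD_URL.sub('__USER__', query).sub('__POS__', pos.to_s)
  else
    SKETCH_INIT_URL.sub('__USER__', query)
  end
end

puts build_user_url('ts_3156', nil)
puts build_user_url('ts_3156', 12345)
```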
@@ -51,7 +51,7 @@ module Twitterscraper
       end
       Http.get(url, headers, proxy, timeout)
     rescue => e
-      logger.debug "
+      logger.debug "get_single_page: #{e.inspect}"
       if (retries -= 1) > 0
         logger.info "Retrying... (Attempts left: #{retries - 1})"
         retry
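Note: this hunk sits inside a method-level `rescue` that decrements a counter and calls `retry`. A self-contained sketch of that shape is given here for orientation. `with_retries` is a hypothetical wrapper, not a method of the gem, and the logger is passed in as an assumption.

```ruby
require 'logger'

# Hypothetical wrapper: runs the block, and on any StandardError retries it
# until the counter is used up, then re-raises the last error.
def with_retries(retries: 3, logger: Logger.new($stderr))
  yield
rescue => e
  logger.debug "with_retries: #{e.inspect}"
  if (retries -= 1) > 0
    logger.info "Retrying... (Attempts left: #{retries})"
    retry # restarts the method body; `retries` keeps its decremented value
  else
    raise
  end
end

attempts = 0
result = with_retries(retries: 3, logger: Logger.new(nil)) do
  attempts += 1
  raise 'flaky' if attempts < 3
  :ok
end
puts result   # ok
puts attempts # 3
```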
@@ -79,7 +79,7 @@ module Twitterscraper
       logger.info "Querying #{query}"
       query = ERB::Util.url_encode(query)
 
-      url = build_query_url(query, lang, type
+      url = build_query_url(query, lang, type, pos)
       http_request = lambda do
         logger.debug "Scraping tweets from #{url}"
         get_single_page(url, headers, proxies)
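Note: the query is passed through `ERB::Util.url_encode` before the URL is built, so spaces and reserved characters are percent-encoded. A small illustration with a made-up query string:

```ruby
require 'erb'

# Made-up query; spaces, ':' and '#' are all percent-encoded.
raw = 'ruby lang:ja #rails'
encoded = ERB::Util.url_encode(raw)
puts encoded # ruby%20lang%3Aja%20%23rails
```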
data/lib/version.rb
CHANGED