wikipedia_twitterbot 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 1330f2a58c111f7cee263a8746cd241424ce78cd
+   data.tar.gz: 4ba11d3e6b833f66de96f57bb77150af90216f74
+ SHA512:
+   metadata.gz: 87da8c83217ea2f27a2150cdbfd9ca4505daf0493e18d96a61fadcadc58b8bf038d954ba0c64018fba1c2a308a8afb2909ff5fd5c8abc6551f21549d21386881
+   data.tar.gz: f82241415cfa16ec0f9c8c1670583e5ce8a2defbe85f00491f1f4b25945cf4192d0dc8a837f52f2474f841cae82fc451289e987e3eb3ac3698ed0ef2231e1503
data/.gitignore ADDED
@@ -0,0 +1,9 @@
+ /.bundle/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
+ # Contributor Covenant Code of Conduct
+
+ ## Our Pledge
+
+ In the interest of fostering an open and welcoming environment, we as
+ contributors and maintainers pledge to making participation in our project and
+ our community a harassment-free experience for everyone, regardless of age, body
+ size, disability, ethnicity, gender identity and expression, level of experience,
+ nationality, personal appearance, race, religion, or sexual identity and
+ orientation.
+
+ ## Our Standards
+
+ Examples of behavior that contributes to creating a positive environment
+ include:
+
+ * Using welcoming and inclusive language
+ * Being respectful of differing viewpoints and experiences
+ * Gracefully accepting constructive criticism
+ * Focusing on what is best for the community
+ * Showing empathy towards other community members
+
+ Examples of unacceptable behavior by participants include:
+
+ * The use of sexualized language or imagery and unwelcome sexual attention or
+   advances
+ * Trolling, insulting/derogatory comments, and personal or political attacks
+ * Public or private harassment
+ * Publishing others' private information, such as a physical or electronic
+   address, without explicit permission
+ * Other conduct which could reasonably be considered inappropriate in a
+   professional setting
+
+ ## Our Responsibilities
+
+ Project maintainers are responsible for clarifying the standards of acceptable
+ behavior and are expected to take appropriate and fair corrective action in
+ response to any instances of unacceptable behavior.
+
+ Project maintainers have the right and responsibility to remove, edit, or
+ reject comments, commits, code, wiki edits, issues, and other contributions
+ that are not aligned to this Code of Conduct, or to ban temporarily or
+ permanently any contributor for other behaviors that they deem inappropriate,
+ threatening, offensive, or harmful.
+
+ ## Scope
+
+ This Code of Conduct applies both within project spaces and in public spaces
+ when an individual is representing the project or its community. Examples of
+ representing a project or community include using an official project e-mail
+ address, posting via an official social media account, or acting as an appointed
+ representative at an online or offline event. Representation of a project may be
+ further defined and clarified by project maintainers.
+
+ ## Enforcement
+
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
+ reported by contacting the project team at ragesoss@gmail.com. All
+ complaints will be reviewed and investigated and will result in a response that
+ is deemed necessary and appropriate to the circumstances. The project team is
+ obligated to maintain confidentiality with regard to the reporter of an incident.
+ Further details of specific enforcement policies may be posted separately.
+
+ Project maintainers who do not follow or enforce the Code of Conduct in good
+ faith may face temporary or permanent repercussions as determined by other
+ members of the project's leadership.
+
+ ## Attribution
+
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+ available at [http://contributor-covenant.org/version/1/4][version]
+
+ [homepage]: http://contributor-covenant.org
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
+ source 'https://rubygems.org'
+
+ git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }
+
+ # Specify your gem's dependencies in wikipedia_twitterbot.gemspec
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2017 Sage Ross
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,62 @@
+ # WikipediaTwitterbot
+
+ Gem for creating Twitter bots related to Wikipedia
+
+ ## Get Twitter API credentials
+
+ Create a Twitter account for your bot, then register an app and put the credentials in `twitter.yml`:
+
+ ```yaml
+ twitter_consumer_key: ohai
+ twitter_consumer_secret: kthxbai
+ twitter_access_token: isee
+ twitter_access_token_secret: whatyoudidthere
+ ```
+
+ For more info, see https://github.com/sferik/twitter#configuration
+
18
+ ## Set up a database
19
+
20
+ Use this gem to create an article database, via irb:
21
+
22
+ ```ruby
23
+ require 'wikipedia_twitterbot'
24
+ ArticleDatabase.create 'your_bot_name'
25
+ ```
26
+
27
+ ## Write your bot code
28
+
29
+ Now you can write a bot. Here's what a basic one might look like:
30
+
31
+ ```ruby
32
+ require 'wikipedia_twitterbot'
33
+ Article.connect_to_database 'braggingvandalbot'
34
+
35
+ class TrivialWikipediaBot
36
+ def self.tweet(article)
37
+ tweet_text = "#{article.title} is here: #{article.url}"
38
+ article.tweet tweet_text
39
+ end
40
+
41
+ # adds random articles to the database matching the given criteria
42
+ def self.find_articles
43
+ options = {
44
+ max_w10: 30,
45
+ min_views: 300
46
+ }
47
+ Article.import_at_random(options)
48
+ end
49
+ end
50
+ ```
51
+
52
+ `Article` provides both class methods for fetching and importing Wikipedia articles and metadata, and instance methods for supplying info about a particular article that you can use in tweets. See `article.rb` for more details.
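For example, `Article#url` builds the article link by swapping spaces for underscores before CGI-escaping (plain `CGI.escape` would turn spaces into `+`, which breaks Wikipedia URLs). A standalone sketch of that logic:

```ruby
require 'cgi'

# Mirrors Article#escaped_title and Article#url: replace spaces with
# underscores first, then escape any remaining special characters.
def wikipedia_url(title)
  "https://en.wikipedia.org/wiki/#{CGI.escape(title.tr(' ', '_'))}"
end

puts wikipedia_url('Ada Lovelace') # => https://en.wikipedia.org/wiki/Ada_Lovelace
```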
+
+ Make your bot run by configuring cron jobs to import articles and tweet about them.
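For example, a crontab might pair a periodic import with a more frequent tweet job (the paths and script names here are hypothetical; each script would call your bot's `find_articles` or tweet method):

```
# Import new article prospects every six hours; tweet one every two hours.
0 */6 * * * cd /path/to/bot && bundle exec ruby import.rb
0 */2 * * * cd /path/to/bot && bundle exec ruby tweet.rb
```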
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
+
+ ## Code of Conduct
+
+ Everyone interacting in the WikipediaTwitterbot project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/wikipedia_twitterbot/blob/master/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,2 @@
+ require 'bundler/gem_tasks'
+ task default: :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ require 'bundler/setup'
+ require 'wikipedia_twitterbot'
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ # (If you use this, don't forget to add pry to your Gemfile!)
+ # require "pry"
+ # Pry.start
+
+ require 'irb'
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ IFS=$'\n\t'
+ set -vx
+
+ bundle install
+
+ # Do any other automated setup that you need to do here
data/lib/wikipedia_twitterbot/article.rb ADDED
@@ -0,0 +1,152 @@
+ require 'active_record'
+ require 'activerecord-import'
+ require 'sqlite3'
+ require 'logger'
+ require 'cgi'
+ require_relative 'tweet'
+ require_relative 'twitter_client'
+ require_relative 'find_images'
+
+ class Article < ActiveRecord::Base
+   class << self
+     attr_reader :bot_name
+
+     def connect_to_database(bot_name)
+       @bot_name = bot_name
+       ActiveRecord::Base.logger = Logger.new('debug.log')
+       ActiveRecord::Base.establish_connection(
+         adapter: 'sqlite3',
+         database: "#{bot_name}.sqlite3",
+         encoding: 'utf8'
+       )
+     end
+   end
+
+   serialize :ores_data, Hash
+
+   #################
+   # Class methods #
+   #################
+
+   def self.import_at_random(opts)
+     import fetch_at_random(opts)
+   end
+
+   DEFAULT_OPTS = {
+     count: 10_000,
+     discard_redirects: true,
+     min_views: 0,
+     max_wp10: nil,
+     discard_dabs: true
+   }.freeze
+
+   def self.fetch_at_random(opts)
+     options = DEFAULT_OPTS.merge opts
+
+     articles = FindArticles.at_random(count: options[:count])
+     puts "#{articles.count} mainspace articles found"
+
+     if options[:discard_redirects]
+       articles = DiscardRedirects.from(articles)
+       puts "#{articles.count} are not redirects"
+     end
+
+     if options[:min_views].positive?
+       articles = HighPageviews.from_among(articles, min_views: options[:min_views])
+       puts "#{articles.count} of those have high page views"
+     end
+
+     if options[:max_wp10]
+       articles = Ores.discard_high_revision_scores(articles, max_wp10: options[:max_wp10])
+       puts "#{articles.count} of those have low revision scores"
+     end
+
+     if options[:discard_dabs]
+       articles = CategoryFilter.discard_disambiguation_pages(articles)
+       puts "#{articles.count} of those are not disambiguation pages"
+     end
+
+     if articles.count.positive?
+       puts "#{articles.count} tweetable prospect(s) found!"
+     else
+       puts 'no tweetable articles found'
+     end
+
+     articles
+   end
+
+   def self.last_tweetable
+     tweetable.last
+   end
+
+   def self.first_tweetable
+     tweetable.first
+   end
+
+   def self.tweetable
+     where(tweeted: nil, failed_tweet_at: nil)
+   end
+
+   ####################
+   # Instance methods #
+   ####################
+   def tweet(tweet_text)
+     Tweet.new(tweet_text, filename: @image)
+     self.tweeted = true
+     save
+     'tweeted'
+   rescue StandardError => e
+     self.failed_tweet_at = Time.now
+     save
+     raise e
+   end
+
+   def screenshot_path
+     "screenshots/#{escaped_title}.png"
+   end
+
+   def commons_link(image)
+     "https://commons.wikimedia.org/wiki/#{CGI.escape(image.tr(' ', '_'))}"
+   end
+
+   def escaped_title
+     # CGI.escape alone would convert spaces to '+', which would break the URL,
+     # so replace them with underscores first.
+     CGI.escape(title.tr(' ', '_'))
+   end
+
+   def views
+     average_views.to_i
+   end
+
+   def quality
+     wp10.to_i
+   end
+
+   def url
+     "https://en.wikipedia.org/wiki/#{escaped_title}"
+   end
+
+   def mobile_url
+     "https://en.m.wikipedia.org/wiki/#{escaped_title}"
+   end
+
+   def edit_url
+     # Includes the summary preload #FixmeBot, so that edits can be tracked:
+     # http://tools.wmflabs.org/hashtags/search/wikiphotofight
+     "https://en.wikipedia.org/wiki/#{escaped_title}?veaction=edit&summary=%23#{bot_name}"
+   end
+
+   def make_screenshot
+     # Requires the optional webshot gem, which is not a declared dependency.
+     webshot = Webshot::Screenshot.instance
+     webshot.capture mobile_url, "public/#{screenshot_path}",
+                     width: 800, height: 800, allowed_status_codes: [404]
+   end
+
+   def hashtag
+     TwitterClient.new.top_hashtag(title)
+   end
+
+   def bot_name
+     self.class.bot_name
+   end
+
+   class NoImageError < StandardError; end
+ end
data/lib/wikipedia_twitterbot/category_filter.rb ADDED
@@ -0,0 +1,34 @@
+ #= Gets the categories for articles and filters based on that data
+ class CategoryFilter
+   ################
+   # Entry points #
+   ################
+   def self.discard_disambiguation_pages(articles)
+     articles.reject! { |article| disambiguation_page?(article) }
+     articles
+   end
+
+   ###############
+   # Other stuff #
+   ###############
+   def self.category_query(page_id)
+     { prop: 'categories',
+       cllimit: 500,
+       pageids: page_id }
+   end
+
+   def self.categories_for(article)
+     article_id = article.id
+     response = Wiki.query category_query(article_id)
+     categories = response.data['pages'][article_id.to_s]['categories']
+     return unless categories
+     categories.map { |cat| cat['title'] }
+   end
+
+   def self.disambiguation_page?(article)
+     categories = categories_for(article)
+     return false unless categories
+     categories.include?('Category:Disambiguation pages')
+   end
+ end
data/lib/wikipedia_twitterbot/db/001_create_articles.rb ADDED
@@ -0,0 +1,21 @@
+ class CreateArticles < ActiveRecord::Migration[5.1]
+   def change
+     create_table :articles, force: true do |t|
+       t.string :title
+       t.integer :latest_revision
+       t.datetime :latest_revision_datetime
+       t.string :rating
+       t.text :ores_data
+       t.float :wp10
+       t.float :average_views
+       t.date :average_views_updated_at
+       t.boolean :tweeted
+       t.timestamp :tweeted_at
+       t.boolean :redirect
+       t.timestamps null: false
+       t.integer :image_count
+       t.string :first_image_url
+       t.timestamp :failed_tweet_at
+     end
+   end
+ end
data/lib/wikipedia_twitterbot/db/bootstrap.rb ADDED
@@ -0,0 +1,15 @@
+ # Creates a fresh article database for a bot, from scratch.
+ class ArticleDatabase
+   def self.create(bot_name)
+     require 'sqlite3'
+     require 'active_record'
+     require_relative '001_create_articles'
+
+     SQLite3::Database.new("#{bot_name}.sqlite3")
+     ActiveRecord::Base.establish_connection(adapter: 'sqlite3',
+                                             database: "#{bot_name}.sqlite3")
+     CreateArticles.new.migrate(:up)
+   end
+ end
data/lib/wikipedia_twitterbot/discard_redirects.rb ADDED
@@ -0,0 +1,28 @@
+ class DiscardRedirects
+   def self.from(articles)
+     pages = {}
+     articles.each_slice(50) do |fifty_articles|
+       ids = fifty_articles.map(&:id)
+       page_info_response = Wiki.query page_info_query(ids)
+       pages.merge! page_info_response.data['pages']
+     end
+
+     articles.each do |article|
+       info = pages[article.id.to_s]
+       next unless info
+       # The API includes a 'redirect' key only for pages that are redirects.
+       article.redirect = !info['redirect'].nil?
+     end
+
+     articles.select! { |article| article.redirect == false }
+     articles
+   end
+
+   def self.page_info_query(page_ids)
+     { prop: 'info',
+       pageids: page_ids }
+   end
+ end
data/lib/wikipedia_twitterbot/find_articles.rb ADDED
@@ -0,0 +1,59 @@
+ require_relative 'wiki'
+
+ class FindArticles
+   ################
+   # Entry points #
+   ################
+
+   def self.by_ids(ids)
+     existing_ids = Article.all.pluck(:id)
+     ids -= existing_ids
+     page_data = get_pages(ids)
+     article_data = page_data.select { |page| page['ns'] == 0 }
+     article_data.select! { |page| existing_ids.exclude?(page['pageid']) }
+
+     articles = []
+     article_data.each do |article|
+       revision = article['revisions'][0]
+       articles << Article.new(id: article['pageid'],
+                               title: article['title'],
+                               latest_revision: revision['revid'],
+                               latest_revision_datetime: revision['timestamp'])
+     end
+     articles
+   end
+
+   def self.at_random(count: 100)
+     # As of December 2015, recently created articles have page ids under
+     # 50_000_000.
+     ids = Array.new(count) { Random.rand(60_000_000) }
+     by_ids(ids)
+   end
+
+   ####################
+   # Internal methods #
+   ####################
+
+   def self.revisions_query(article_ids)
+     { prop: 'revisions',
+       pageids: article_ids,
+       rvprop: 'userid|ids|timestamp' }
+   end
+
+   def self.get_pages(article_ids)
+     pages = {}
+     merge_lock = Mutex.new
+     threads = article_ids.in_groups(10, false).each_with_index.map do |group_of_ids, i|
+       Thread.new(i) do
+         group_of_ids.each_slice(50) do |fifty_ids|
+           rev_response = Wiki.query revisions_query(fifty_ids)
+           # The threads share one hash, so merge results under a lock.
+           merge_lock.synchronize { pages.merge! rev_response.data['pages'] }
+         end
+       end
+     end
+
+     threads.each(&:join)
+     pages.values
+   end
+ end
data/lib/wikipedia_twitterbot/find_images.rb ADDED
@@ -0,0 +1,6 @@
+ class FindImages
+   def self.first(article)
+     page_text = Wiki.get_page_content article.title
+     return unless page_text
+     page_text[/File:.{,60}\.jpg/]
+   end
+ end
data/lib/wikipedia_twitterbot/high_pageviews.rb ADDED
@@ -0,0 +1,30 @@
+ require_relative 'wiki_pageviews'
+
+ class HighPageviews
+   def self.from_among(articles, min_views: 300)
+     average_views = {}
+
+     articles.each_slice(50) do |fifty_articles|
+       threads = fifty_articles.each_with_index.map do |article, i|
+         Thread.new(i) do
+           title = article.title.tr(' ', '_')
+           average_views[article.id] = WikiPageviews.average_views_for_article(title)
+         end
+       end
+       threads.each(&:join)
+     end
+
+     timestamp = Time.now.utc
+     update_average_views(articles, average_views, timestamp)
+     articles.reject! { |article| article.average_views.nil? }
+     articles.select! { |article| article.average_views > min_views }
+     articles
+   end
+
+   def self.update_average_views(articles, average_views, average_views_updated_at)
+     articles.each do |article|
+       article.average_views_updated_at = average_views_updated_at
+       article.average_views = average_views[article.id]
+     end
+   end
+ end
data/lib/wikipedia_twitterbot/ores.rb ADDED
@@ -0,0 +1,63 @@
+ require 'json'
+ require 'faraday'
+ require 'net/http'
+
+ #= Imports revision scoring data from ores.wmflabs.org
+ class Ores
+   ################
+   # Entry points #
+   ################
+   def self.select_by_image_count(articles, image_count: 1)
+     @ores = new
+     articles.each do |article|
+       article.ores_data = @ores.get_revision_data(article.latest_revision)
+       puts article.ores_data.dig('scores', 'enwiki', 'wp10', 'features',
+                                  article.latest_revision.to_s,
+                                  'feature.enwiki.revision.image_links')
+     end
+     articles.select do |article|
+       article.ores_data.dig('scores', 'enwiki', 'wp10', 'features',
+                             article.latest_revision.to_s,
+                             'feature.enwiki.revision.image_links') == image_count
+     end
+   end
+
+   def initialize
+     @project_code = 'enwiki'
+   end
+
+   def get_revision_data(rev_id)
+     # TODO: i18n
+     response = ores_server.get query_url(rev_id)
+     JSON.parse(response.body)
+   rescue StandardError => error
+     raise error unless TYPICAL_ERRORS.include?(error.class)
+     {}
+   end
+
+   TYPICAL_ERRORS = [
+     Errno::ETIMEDOUT,
+     Net::ReadTimeout,
+     Errno::ECONNREFUSED,
+     JSON::ParserError,
+     Errno::EHOSTUNREACH,
+     Faraday::ConnectionFailed,
+     Faraday::TimeoutError
+   ].freeze
+
+   class InvalidProjectError < StandardError
+   end
+
+   private
+
+   def query_url(rev_id)
+     base_url = "/v2/scores/#{@project_code}/wp10/"
+     url = base_url + rev_id.to_s + '/?features'
+     URI.encode url
+   end
+
+   def ores_server
+     conn = Faraday.new(url: 'https://ores.wikimedia.org')
+     conn.headers['User-Agent'] = '@WikiPhotoFight by ragesoss'
+     conn
+   end
+ end
data/lib/wikipedia_twitterbot/tweet.rb ADDED
@@ -0,0 +1,40 @@
+ require 'twitter'
+
+ # Finds tweetable articles, tweets them
+ class Tweet
+   # Find an article to tweet and tweet it
+   def self.anything
+     # Randomly tweet either the earliest tweetable Article in the database
+     # or the latest.
+     # Wikipedia increments page ids over time, so the first ids are the oldest
+     # articles and the last ids are the latest.
+     article = if coin_flip
+                 Article.last_tweetable
+               else
+                 Article.first_tweetable
+               end
+     article.tweet
+     puts "Tweeted #{article.title}"
+   end
+
+   ###############
+   # Twitter API #
+   ###############
+   def initialize(tweet, filename: nil)
+     if filename
+       Wiki.save_commons_image filename
+       TwitterClient.new.client.update_with_media(tweet, File.new(filename))
+       File.delete filename
+     else
+       TwitterClient.new.client.update(tweet)
+     end
+   end
+
+   ###########
+   # Helpers #
+   ###########
+
+   def self.coin_flip
+     [true, false].sample
+   end
+ end
data/lib/wikipedia_twitterbot/twitter_client.rb ADDED
@@ -0,0 +1,33 @@
+ require 'twitter'
+ require 'yaml'
+
+ class TwitterClient
+   attr_reader :client
+
+   def initialize
+     twitter_secrets = YAML.safe_load File.read('twitter.yml')
+     @client = Twitter::REST::Client.new do |config|
+       config.consumer_key        = twitter_secrets['twitter_consumer_key']
+       config.consumer_secret     = twitter_secrets['twitter_consumer_secret']
+       config.access_token        = twitter_secrets['twitter_access_token']
+       config.access_token_secret = twitter_secrets['twitter_access_token_secret']
+     end
+   end
+
+   def top_hashtag(search_query)
+     top_with_count = related_hashtags(search_query).max_by { |_hashtag, count| count }
+     top_with_count[0] unless top_with_count.nil?
+   end
+
+   def related_hashtags(search_query)
+     @texts = @client.search(search_query).first(200).map(&:text)
+     @hashtags = Hash.new { |hash, key| hash[key] = 0 }
+     @texts.select! { |text| text.match(/#/) }
+     @texts.each do |text|
+       hashtags_in(text).each do |hashtag|
+         @hashtags[hashtag] += 1
+       end
+     end
+     @hashtags
+   end
+
+   def hashtags_in(text)
+     text.scan(/\s(#\w+)/).flatten
+   end
+ end
data/lib/wikipedia_twitterbot/version.rb ADDED
@@ -0,0 +1,3 @@
+ module WikipediaTwitterbot
+   VERSION = '0.1.0'.freeze
+ end
data/lib/wikipedia_twitterbot/wiki.rb ADDED
@@ -0,0 +1,66 @@
+ require 'mediawiki_api'
+ require 'open-uri'
+
+ #= This class is for getting data directly from the Wikipedia API.
+ class Wiki
+   ################
+   # Entry points #
+   ################
+
+   # General entry point for making arbitrary queries of the Wikipedia API
+   def self.query(query_parameters, opts = {})
+     wikipedia('query', query_parameters, opts)
+   end
+
+   def self.get_page_content(page_title, opts = {})
+     response = wikipedia('get_wikitext', page_title, opts)
+     response&.status == 200 ? response.body : nil
+   end
+
+   def self.save_commons_image(filename)
+     opts = { site: 'commons.wikimedia.org' }
+     query = { prop: 'imageinfo',
+               iiprop: 'url',
+               iiurlheight: 1000,
+               titles: filename }
+     response = query(query, opts)
+     url = response.data['pages'].values.first['imageinfo'].first['thumburl']
+
+     File.write filename, open(url).read
+   end
+
+   ###################
+   # Private methods #
+   ###################
+   class << self
+     private
+
+     def wikipedia(action, query, opts = {})
+       tries ||= 3
+       @mediawiki = api_client(opts)
+       @mediawiki.send(action, query)
+     rescue StandardError => e
+       tries -= 1
+       typical_errors = [Faraday::TimeoutError,
+                         Faraday::ConnectionFailed,
+                         MediawikiApi::HttpError]
+       if typical_errors.include?(e.class)
+         retry if tries >= 0
+       else
+         raise e
+       end
+     end
+
+     def api_client(opts)
+       site = opts[:site]
+       language = opts[:language] || 'en'
+
+       url = if site
+               "https://#{site}/w/api.php"
+             else
+               "https://#{language}.wikipedia.org/w/api.php"
+             end
+       MediawikiApi::Client.new url
+     end
+   end
+ end
data/lib/wikipedia_twitterbot/wiki_pageviews.rb ADDED
@@ -0,0 +1,86 @@
+ require 'net/http'
+ require 'json'
+ require 'date'
+ require 'cgi'
+
+ # Fetches pageview data from the Wikimedia pageviews REST API
+ # Documentation: https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
+ class WikiPageviews
+   ################
+   # Entry points #
+   ################
+
+   # Given an article title, return the number of page views for each day in
+   # the requested range (by default, the last 30 days).
+   #
+   # [title] title of a Wikipedia page (including namespace, if applicable)
+   # [opts]  optional :language, :start_date, and :end_date
+   def self.views_for_article(title, opts = {})
+     language = opts[:language] || 'en'
+     start_date = opts[:start_date] || (Date.today - 30)
+     end_date = opts[:end_date] || Date.today
+     url = query_url(title, start_date, end_date, language)
+     data = api_get url
+     return unless data
+     data = JSON.parse data
+     return unless data.include?('items')
+     daily_view_data = data['items']
+     views = {}
+     daily_view_data.each do |day_data|
+       date = day_data['timestamp'][0..7]
+       views[date] = day_data['views']
+     end
+     views
+   end
+
+   def self.average_views_for_article(title, opts = {})
+     language = opts[:language] || 'en'
+     data = recent_views(title, language)
+     # TODO: better handling of unexpected or empty responses, including logging
+     return unless data
+     data = JSON.parse data
+     return unless data.include?('items')
+     daily_view_data = data['items']
+     days = daily_view_data.count
+     total_views = 0
+     daily_view_data.each do |day_data|
+       total_views += day_data['views']
+     end
+     return if total_views.zero?
+     total_views.to_f / days
+   end
+
+   ##################
+   # Helper methods #
+   ##################
+   def self.recent_views(title, language)
+     start_date = DateTime.now - 50
+     end_date = DateTime.now
+     url = query_url(title, start_date, end_date, language)
+     api_get url
+   end
+
+   def self.query_url(title, start_date, end_date, language)
+     title = CGI.escape(title)
+     base_url = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/'
+     configuration_params = "per-article/#{language}.wikipedia/all-access/user/"
+     start_param = start_date.strftime('%Y%m%d')
+     end_param = end_date.strftime('%Y%m%d')
+     title_and_date_params = "#{title}/daily/#{start_param}00/#{end_param}00"
+     base_url + configuration_params + title_and_date_params
+   end
+
+   ###################
+   # Private methods #
+   ###################
+   class << self
+     private
+
+     def api_get(url)
+       tries ||= 3
+       Net::HTTP.get(URI.parse(url))
+     rescue Errno::ETIMEDOUT
+       tries -= 1
+       retry unless tries.zero?
+     end
+   end
+ end
data/lib/wikipedia_twitterbot.rb ADDED
@@ -0,0 +1,16 @@
+ require 'wikipedia_twitterbot/version'
+
+ module WikipediaTwitterbot
+   require_relative 'wikipedia_twitterbot/article'
+   require_relative 'wikipedia_twitterbot/wiki'
+   require_relative 'wikipedia_twitterbot/wiki_pageviews'
+   require_relative 'wikipedia_twitterbot/ores'
+   require_relative 'wikipedia_twitterbot/twitter_client'
+   require_relative 'wikipedia_twitterbot/find_articles'
+   require_relative 'wikipedia_twitterbot/find_images'
+   require_relative 'wikipedia_twitterbot/high_pageviews'
+   require_relative 'wikipedia_twitterbot/category_filter'
+   require_relative 'wikipedia_twitterbot/discard_redirects'
+   require_relative 'wikipedia_twitterbot/tweet'
+   require_relative 'wikipedia_twitterbot/db/bootstrap'
+ end
data/wikipedia_twitterbot.gemspec ADDED
@@ -0,0 +1,33 @@
+ # coding: utf-8
+
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'wikipedia_twitterbot/version'
+
+ Gem::Specification.new do |spec|
+   spec.name          = 'wikipedia_twitterbot'
+   spec.version       = WikipediaTwitterbot::VERSION
+   spec.authors       = ['Sage Ross']
+   spec.email         = ['sage@ragesoss.com']
+
+   spec.summary       = 'Tools for building Wikipedia-focused Twitter bots'
+   spec.homepage      = 'https://github.com/ragesoss/WikipediaTwitterbot'
+   spec.license       = 'MIT'
+
+   spec.files = `git ls-files -z`.split("\x0").reject do |f|
+     f.match(%r{^(test|spec|features)/})
+   end
+   spec.bindir        = 'exe'
+   spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.require_paths = ['lib']
+
+   spec.add_development_dependency 'bundler', '~> 1.15'
+   spec.add_development_dependency 'rake', '~> 10.0'
+
+   spec.add_runtime_dependency 'sqlite3'
+   spec.add_runtime_dependency 'activerecord'
+   spec.add_runtime_dependency 'activerecord-import'
+   spec.add_runtime_dependency 'twitter'
+   spec.add_runtime_dependency 'mediawiki_api'
+   spec.add_runtime_dependency 'logger'
+ end
metadata ADDED
@@ -0,0 +1,180 @@
+ --- !ruby/object:Gem::Specification
+ name: wikipedia_twitterbot
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Sage Ross
+ autorequire:
+ bindir: exe
+ cert_chain: []
+ date: 2017-11-27 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.15'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.15'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ - !ruby/object:Gem::Dependency
+   name: sqlite3
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: activerecord
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: activerecord-import
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: twitter
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: mediawiki_api
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: logger
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ description:
+ email:
+ - sage@ragesoss.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".gitignore"
+ - CODE_OF_CONDUCT.md
+ - Gemfile
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - bin/console
+ - bin/setup
+ - lib/wikipedia_twitterbot.rb
+ - lib/wikipedia_twitterbot/article.rb
+ - lib/wikipedia_twitterbot/category_filter.rb
+ - lib/wikipedia_twitterbot/db/001_create_articles.rb
+ - lib/wikipedia_twitterbot/db/bootstrap.rb
+ - lib/wikipedia_twitterbot/discard_redirects.rb
+ - lib/wikipedia_twitterbot/find_articles.rb
+ - lib/wikipedia_twitterbot/find_images.rb
+ - lib/wikipedia_twitterbot/high_pageviews.rb
+ - lib/wikipedia_twitterbot/ores.rb
+ - lib/wikipedia_twitterbot/tweet.rb
+ - lib/wikipedia_twitterbot/twitter_client.rb
+ - lib/wikipedia_twitterbot/version.rb
+ - lib/wikipedia_twitterbot/wiki.rb
+ - lib/wikipedia_twitterbot/wiki_pageviews.rb
+ - wikipedia_twitterbot.gemspec
+ homepage: https://github.com/ragesoss/WikipediaTwitterbot
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.6.8
+ signing_key:
+ specification_version: 4
+ summary: Tools for building Wikipedia-focused Twitter bots
+ test_files: []