wikipedia_twitterbot 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 1330f2a58c111f7cee263a8746cd241424ce78cd
+   data.tar.gz: 4ba11d3e6b833f66de96f57bb77150af90216f74
+ SHA512:
+   metadata.gz: 87da8c83217ea2f27a2150cdbfd9ca4505daf0493e18d96a61fadcadc58b8bf038d954ba0c64018fba1c2a308a8afb2909ff5fd5c8abc6551f21549d21386881
+   data.tar.gz: f82241415cfa16ec0f9c8c1670583e5ce8a2defbe85f00491f1f4b25945cf4192d0dc8a837f52f2474f841cae82fc451289e987e3eb3ac3698ed0ef2231e1503
data/.gitignore ADDED
@@ -0,0 +1,9 @@
+ /.bundle/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
+ # Contributor Covenant Code of Conduct
+
+ ## Our Pledge
+
+ In the interest of fostering an open and welcoming environment, we as
+ contributors and maintainers pledge to making participation in our project and
+ our community a harassment-free experience for everyone, regardless of age, body
+ size, disability, ethnicity, gender identity and expression, level of experience,
+ nationality, personal appearance, race, religion, or sexual identity and
+ orientation.
+
+ ## Our Standards
+
+ Examples of behavior that contributes to creating a positive environment
+ include:
+
+ * Using welcoming and inclusive language
+ * Being respectful of differing viewpoints and experiences
+ * Gracefully accepting constructive criticism
+ * Focusing on what is best for the community
+ * Showing empathy towards other community members
+
+ Examples of unacceptable behavior by participants include:
+
+ * The use of sexualized language or imagery and unwelcome sexual attention or
+   advances
+ * Trolling, insulting/derogatory comments, and personal or political attacks
+ * Public or private harassment
+ * Publishing others' private information, such as a physical or electronic
+   address, without explicit permission
+ * Other conduct which could reasonably be considered inappropriate in a
+   professional setting
+
+ ## Our Responsibilities
+
+ Project maintainers are responsible for clarifying the standards of acceptable
+ behavior and are expected to take appropriate and fair corrective action in
+ response to any instances of unacceptable behavior.
+
+ Project maintainers have the right and responsibility to remove, edit, or
+ reject comments, commits, code, wiki edits, issues, and other contributions
+ that are not aligned to this Code of Conduct, or to ban temporarily or
+ permanently any contributor for other behaviors that they deem inappropriate,
+ threatening, offensive, or harmful.
+
+ ## Scope
+
+ This Code of Conduct applies both within project spaces and in public spaces
+ when an individual is representing the project or its community. Examples of
+ representing a project or community include using an official project e-mail
+ address, posting via an official social media account, or acting as an appointed
+ representative at an online or offline event. Representation of a project may be
+ further defined and clarified by project maintainers.
+
+ ## Enforcement
+
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
+ reported by contacting the project team at ragesoss@gmail.com. All
+ complaints will be reviewed and investigated and will result in a response that
+ is deemed necessary and appropriate to the circumstances. The project team is
+ obligated to maintain confidentiality with regard to the reporter of an incident.
+ Further details of specific enforcement policies may be posted separately.
+
+ Project maintainers who do not follow or enforce the Code of Conduct in good
+ faith may face temporary or permanent repercussions as determined by other
+ members of the project's leadership.
+
+ ## Attribution
+
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+ available at [http://contributor-covenant.org/version/1/4][version]
+
+ [homepage]: http://contributor-covenant.org
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
+ source 'https://rubygems.org'
+
+ git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }
+
+ # Specify your gem's dependencies in wikipedia_twitterbot.gemspec
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2017 Sage Ross
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,62 @@
+ # WikipediaTwitterbot
+
+ Gem for creating Twitter bots related to Wikipedia
+
+ ## Get Twitter API credentials
+
+ Create a Twitter account for your bot, then register an app and put the credentials in `twitter.yml`:
+
+ ```yaml
+ twitter_consumer_key: ohai
+ twitter_consumer_secret: kthxbai
+ twitter_access_token: isee
+ twitter_access_token_secret: whatyoudidthere
+ ```
+
+ For more info, see https://github.com/sferik/twitter#configuration
+
18
+ ## Set up a database
19
+
20
+ Use this gem to create an article database, via irb:
21
+
22
+ ```ruby
23
+ require 'wikipedia_twitterbot'
24
+ ArticleDatabase.create 'your_bot_name'
25
+ ```
26
+
27
+ ## Write your bot code
28
+
29
+ Now you can write a bot. Here's what a basic one might look like:
30
+
31
+ ```ruby
32
+ require 'wikipedia_twitterbot'
33
+ Article.connect_to_database 'braggingvandalbot'
34
+
35
+ class TrivialWikipediaBot
36
+ def self.tweet(article)
37
+ tweet_text = "#{article.title} is here: #{article.url}"
38
+ article.tweet tweet_text
39
+ end
40
+
41
+ # adds random articles to the database matching the given criteria
42
+ def self.find_articles
43
+ options = {
44
+ max_w10: 30,
45
+ min_views: 300
46
+ }
47
+ Article.import_at_random(options)
48
+ end
49
+ end
50
+ ```
51
+
52
+ `Article` provides both class methods for fetching and importing Wikipedia articles and metadata, and instance methods for supplying info about a particular article that you can use in tweets. See `article.rb` for more details.
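For example, `Article#url` builds the article link by swapping spaces for underscores before CGI-escaping (plain `CGI.escape` would turn spaces into `+`, which breaks Wikipedia URLs). A standalone sketch of that logic:

```ruby
require 'cgi'

# Mirrors Article#escaped_title and Article#url: replace spaces with
# underscores first, then escape any remaining special characters.
def wikipedia_url(title)
  "https://en.wikipedia.org/wiki/#{CGI.escape(title.tr(' ', '_'))}"
end

puts wikipedia_url('Ada Lovelace') # => https://en.wikipedia.org/wiki/Ada_Lovelace
```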
+
+ Make your bot run by configuring cron jobs to import articles and tweet about them.
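For example, a crontab might pair a periodic import with a more frequent tweet job (the paths and script names here are hypothetical; each script would call your bot's `find_articles` or tweet method):

```
# Import new article prospects every six hours; tweet one every two hours.
0 */6 * * * cd /path/to/bot && bundle exec ruby import.rb
0 */2 * * * cd /path/to/bot && bundle exec ruby tweet.rb
```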
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
+
+ ## Code of Conduct
+
+ Everyone interacting in the WikipediaTwitterbot project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/wikipedia_twitterbot/blob/master/CODE_OF_CONDUCT.md).
data/Rakefile ADDED
@@ -0,0 +1,2 @@
+ require 'bundler/gem_tasks'
+ task default: :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ require 'bundler/setup'
+ require 'wikipedia_twitterbot'
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ # (If you use this, don't forget to add pry to your Gemfile!)
+ # require "pry"
+ # Pry.start
+
+ require 'irb'
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ IFS=$'\n\t'
+ set -vx
+
+ bundle install
+
+ # Do any other automated setup that you need to do here
data/lib/wikipedia_twitterbot/article.rb ADDED
@@ -0,0 +1,152 @@
+ require 'active_record'
+ require 'activerecord-import'
+ require 'sqlite3'
+ require 'logger'
+ require 'cgi'
+ require_relative 'tweet'
+ require_relative 'twitter_client'
+ require_relative 'find_images'
+
+ class Article < ActiveRecord::Base
+   class << self
+     attr_reader :bot_name
+
+     def connect_to_database(bot_name)
+       @bot_name = bot_name
+       ActiveRecord::Base.logger = Logger.new('debug.log')
+       ActiveRecord::Base.establish_connection(
+         adapter: 'sqlite3',
+         database: "#{bot_name}.sqlite3",
+         encoding: 'utf8'
+       )
+     end
+   end
+
+   serialize :ores_data, Hash
+
+   #################
+   # Class methods #
+   #################
+
+   def self.import_at_random(opts)
+     import fetch_at_random(opts)
+   end
+
+   DEFAULT_OPTS = {
+     count: 10_000,
+     discard_redirects: true,
+     min_views: 0,
+     max_wp10: nil,
+     discard_dabs: true
+   }.freeze
+
+   def self.fetch_at_random(opts)
+     options = DEFAULT_OPTS.merge opts
+
+     articles = FindArticles.at_random(count: options[:count])
+     puts "#{articles.count} mainspace articles found"
+
+     if options[:discard_redirects]
+       articles = DiscardRedirects.from(articles)
+       puts "#{articles.count} are not redirects"
+     end
+
+     if options[:min_views].positive?
+       articles = HighPageviews.from_among(articles, min_views: options[:min_views])
+       puts "#{articles.count} of those have high page views"
+     end
+
+     if options[:max_wp10]
+       articles = Ores.discard_high_revision_scores(articles, max_wp10: options[:max_wp10])
+       puts "#{articles.count} of those have low revision scores"
+     end
+
+     if options[:discard_dabs]
+       articles = CategoryFilter.discard_disambiguation_pages(articles)
+       puts "#{articles.count} of those are not disambiguation pages"
+     end
+
+     if articles.count.positive?
+       puts "#{articles.count} tweetable prospect(s) found!"
+     else
+       puts 'no tweetable articles found'
+     end
+
+     articles
+   end
+
+   def self.last_tweetable
+     tweetable.last
+   end
+
+   def self.first_tweetable
+     tweetable.first
+   end
+
+   def self.tweetable
+     where(tweeted: nil, failed_tweet_at: nil)
+   end
+
+   ####################
+   # Instance methods #
+   ####################
+   def tweet(tweet_text)
+     Tweet.new(tweet_text, filename: @image)
+     self.tweeted = true
+     save
+     'tweeted'
+   rescue StandardError => e
+     self.failed_tweet_at = Time.now
+     save
+     raise e
+   end
+
+   def screenshot_path
+     "screenshots/#{escaped_title}.png"
+   end
+
+   def commons_link(image)
+     "https://commons.wikimedia.org/wiki/#{CGI.escape(image.tr(' ', '_'))}"
+   end
+
+   def escaped_title
+     # CGI.escape alone would convert spaces to '+', which would break the URL,
+     # so replace them with underscores first.
+     CGI.escape(title.tr(' ', '_'))
+   end
+
+   def views
+     average_views.to_i
+   end
+
+   def quality
+     wp10.to_i
+   end
+
+   def url
+     "https://en.wikipedia.org/wiki/#{escaped_title}"
+   end
+
+   def mobile_url
+     "https://en.m.wikipedia.org/wiki/#{escaped_title}"
+   end
+
+   def edit_url
+     # Includes the summary preload #FixmeBot, so that edits can be tracked:
+     # http://tools.wmflabs.org/hashtags/search/wikiphotofight
+     "https://en.wikipedia.org/wiki/#{escaped_title}?veaction=edit&summary=%23#{bot_name}"
+   end
+
+   def make_screenshot
+     # Requires the optional webshot gem, which is not a declared dependency.
+     webshot = Webshot::Screenshot.instance
+     webshot.capture mobile_url, "public/#{screenshot_path}",
+                     width: 800, height: 800, allowed_status_codes: [404]
+   end
+
+   def hashtag
+     TwitterClient.new.top_hashtag(title)
+   end
+
+   def bot_name
+     self.class.bot_name
+   end
+
+   class NoImageError < StandardError; end
+ end
data/lib/wikipedia_twitterbot/category_filter.rb ADDED
@@ -0,0 +1,34 @@
+ #= Gets the categories for articles and filters based on that data
+ class CategoryFilter
+   ################
+   # Entry points #
+   ################
+   def self.discard_disambiguation_pages(articles)
+     articles.reject! { |article| disambiguation_page?(article) }
+     articles
+   end
+
+   ###############
+   # Other stuff #
+   ###############
+   def self.category_query(page_id)
+     { prop: 'categories',
+       cllimit: 500,
+       pageids: page_id }
+   end
+
+   def self.categories_for(article)
+     article_id = article.id
+     response = Wiki.query category_query(article_id)
+     categories = response.data['pages'][article_id.to_s]['categories']
+     return unless categories
+     categories.map { |cat| cat['title'] }
+   end
+
+   def self.disambiguation_page?(article)
+     categories = categories_for(article)
+     return false unless categories
+     categories.include?('Category:Disambiguation pages')
+   end
+ end
data/lib/wikipedia_twitterbot/db/001_create_articles.rb ADDED
@@ -0,0 +1,21 @@
+ class CreateArticles < ActiveRecord::Migration[5.1]
+   def change
+     create_table :articles, force: true do |t|
+       t.string :title
+       t.integer :latest_revision
+       t.datetime :latest_revision_datetime
+       t.string :rating
+       t.text :ores_data
+       t.float :wp10
+       t.float :average_views
+       t.date :average_views_updated_at
+       t.boolean :tweeted
+       t.timestamp :tweeted_at
+       t.boolean :redirect
+       t.timestamps null: false
+       t.integer :image_count
+       t.string :first_image_url
+       t.timestamp :failed_tweet_at
+     end
+   end
+ end
data/lib/wikipedia_twitterbot/db/bootstrap.rb ADDED
@@ -0,0 +1,15 @@
+ # Creates a fresh article database for a bot, from scratch.
+ class ArticleDatabase
+   def self.create(bot_name)
+     require 'sqlite3'
+     require 'active_record'
+     require_relative '001_create_articles'
+
+     SQLite3::Database.new("#{bot_name}.sqlite3")
+     ActiveRecord::Base.establish_connection(adapter: 'sqlite3',
+                                             database: "#{bot_name}.sqlite3")
+     CreateArticles.new.migrate(:up)
+   end
+ end
data/lib/wikipedia_twitterbot/discard_redirects.rb ADDED
@@ -0,0 +1,28 @@
+ class DiscardRedirects
+   def self.from(articles)
+     pages = {}
+     articles.each_slice(50) do |fifty_articles|
+       ids = fifty_articles.map(&:id)
+       page_info_response = Wiki.query page_info_query(ids)
+       pages.merge! page_info_response.data['pages']
+     end
+
+     articles.each do |article|
+       info = pages[article.id.to_s]
+       next unless info
+       # The API includes a 'redirect' key only for pages that are redirects.
+       article.redirect = !info['redirect'].nil?
+     end
+
+     articles.select! { |article| article.redirect == false }
+     articles
+   end
+
+   def self.page_info_query(page_ids)
+     { prop: 'info',
+       pageids: page_ids }
+   end
+ end
data/lib/wikipedia_twitterbot/find_articles.rb ADDED
@@ -0,0 +1,59 @@
+ require_relative 'wiki'
+
+ class FindArticles
+   ################
+   # Entry points #
+   ################
+
+   def self.by_ids(ids)
+     existing_ids = Article.all.pluck(:id)
+     ids -= existing_ids
+     page_data = get_pages(ids)
+     article_data = page_data.select { |page| page['ns'] == 0 }
+     article_data.select! { |page| existing_ids.exclude?(page['pageid']) }
+
+     articles = []
+     article_data.each do |article|
+       revision = article['revisions'][0]
+       articles << Article.new(id: article['pageid'],
+                               title: article['title'],
+                               latest_revision: revision['revid'],
+                               latest_revision_datetime: revision['timestamp'])
+     end
+     articles
+   end
+
+   def self.at_random(count: 100)
+     # As of December 2015, recently created articles have page ids under
+     # 50_000_000.
+     ids = Array.new(count) { Random.rand(60_000_000) }
+     by_ids(ids)
+   end
+
+   ####################
+   # Internal methods #
+   ####################
+
+   def self.revisions_query(article_ids)
+     { prop: 'revisions',
+       pageids: article_ids,
+       rvprop: 'userid|ids|timestamp' }
+   end
+
+   def self.get_pages(article_ids)
+     pages = {}
+     merge_lock = Mutex.new
+     threads = article_ids.in_groups(10, false).each_with_index.map do |group_of_ids, i|
+       Thread.new(i) do
+         group_of_ids.each_slice(50) do |fifty_ids|
+           rev_response = Wiki.query revisions_query(fifty_ids)
+           # The threads share one hash, so merge results under a lock.
+           merge_lock.synchronize { pages.merge! rev_response.data['pages'] }
+         end
+       end
+     end
+
+     threads.each(&:join)
+     pages.values
+   end
+ end
data/lib/wikipedia_twitterbot/find_images.rb ADDED
@@ -0,0 +1,6 @@
+ class FindImages
+   def self.first(article)
+     page_text = Wiki.get_page_content article.title
+     return unless page_text
+     page_text[/File:.{,60}\.jpg/]
+   end
+ end
data/lib/wikipedia_twitterbot/high_pageviews.rb ADDED
@@ -0,0 +1,30 @@
+ require_relative 'wiki_pageviews'
+
+ class HighPageviews
+   def self.from_among(articles, min_views: 300)
+     average_views = {}
+
+     articles.each_slice(50) do |fifty_articles|
+       threads = fifty_articles.each_with_index.map do |article, i|
+         Thread.new(i) do
+           title = article.title.tr(' ', '_')
+           average_views[article.id] = WikiPageviews.average_views_for_article(title)
+         end
+       end
+       threads.each(&:join)
+     end
+
+     timestamp = Time.now.utc
+     update_average_views(articles, average_views, timestamp)
+     articles.reject! { |article| article.average_views.nil? }
+     articles.select! { |article| article.average_views > min_views }
+     articles
+   end
+
+   def self.update_average_views(articles, average_views, average_views_updated_at)
+     articles.each do |article|
+       article.average_views_updated_at = average_views_updated_at
+       article.average_views = average_views[article.id]
+     end
+   end
+ end
data/lib/wikipedia_twitterbot/ores.rb ADDED
@@ -0,0 +1,63 @@
+ require 'json'
+ require 'faraday'
+ require 'net/http'
+
+ #= Imports revision scoring data from ores.wmflabs.org
+ class Ores
+   ################
+   # Entry points #
+   ################
+   def self.select_by_image_count(articles, image_count: 1)
+     @ores = new
+     articles.each do |article|
+       article.ores_data = @ores.get_revision_data(article.latest_revision)
+       puts article.ores_data.dig('scores', 'enwiki', 'wp10', 'features',
+                                  article.latest_revision.to_s,
+                                  'feature.enwiki.revision.image_links')
+     end
+     articles.select do |article|
+       article.ores_data.dig('scores', 'enwiki', 'wp10', 'features',
+                             article.latest_revision.to_s,
+                             'feature.enwiki.revision.image_links') == image_count
+     end
+   end
+
+   def initialize
+     @project_code = 'enwiki'
+   end
+
+   def get_revision_data(rev_id)
+     # TODO: i18n
+     response = ores_server.get query_url(rev_id)
+     JSON.parse(response.body)
+   rescue StandardError => error
+     raise error unless TYPICAL_ERRORS.include?(error.class)
+     {}
+   end
+
+   TYPICAL_ERRORS = [
+     Errno::ETIMEDOUT,
+     Net::ReadTimeout,
+     Errno::ECONNREFUSED,
+     JSON::ParserError,
+     Errno::EHOSTUNREACH,
+     Faraday::ConnectionFailed,
+     Faraday::TimeoutError
+   ].freeze
+
+   class InvalidProjectError < StandardError
+   end
+
+   private
+
+   def query_url(rev_id)
+     base_url = "/v2/scores/#{@project_code}/wp10/"
+     url = base_url + rev_id.to_s + '/?features'
+     URI.encode url
+   end
+
+   def ores_server
+     conn = Faraday.new(url: 'https://ores.wikimedia.org')
+     conn.headers['User-Agent'] = '@WikiPhotoFight by ragesoss'
+     conn
+   end
+ end
data/lib/wikipedia_twitterbot/tweet.rb ADDED
@@ -0,0 +1,40 @@
+ require 'twitter'
+
+ # Finds tweetable articles, tweets them
+ class Tweet
+   # Find an article to tweet and tweet it
+   def self.anything
+     # Randomly tweet either the earliest tweetable Article in the database
+     # or the latest.
+     # Wikipedia increments page ids over time, so the first ids are the oldest
+     # articles and the last ids are the latest.
+     article = if coin_flip
+                 Article.last_tweetable
+               else
+                 Article.first_tweetable
+               end
+     article.tweet
+     puts "Tweeted #{article.title}"
+   end
+
+   ###############
+   # Twitter API #
+   ###############
+   def initialize(tweet, filename: nil)
+     if filename
+       Wiki.save_commons_image filename
+       TwitterClient.new.client.update_with_media(tweet, File.new(filename))
+       File.delete filename
+     else
+       TwitterClient.new.client.update(tweet)
+     end
+   end
+
+   ###########
+   # Helpers #
+   ###########
+
+   def self.coin_flip
+     [true, false].sample
+   end
+ end
data/lib/wikipedia_twitterbot/twitter_client.rb ADDED
@@ -0,0 +1,33 @@
+ require 'twitter'
+ require 'yaml'
+
+ class TwitterClient
+   attr_reader :client
+
+   def initialize
+     twitter_secrets = YAML.safe_load File.read('twitter.yml')
+     @client = Twitter::REST::Client.new do |config|
+       config.consumer_key        = twitter_secrets['twitter_consumer_key']
+       config.consumer_secret     = twitter_secrets['twitter_consumer_secret']
+       config.access_token        = twitter_secrets['twitter_access_token']
+       config.access_token_secret = twitter_secrets['twitter_access_token_secret']
+     end
+   end
+
+   def top_hashtag(search_query)
+     top_with_count = related_hashtags(search_query).max_by { |_hashtag, count| count }
+     top_with_count[0] unless top_with_count.nil?
+   end
+
+   def related_hashtags(search_query)
+     @texts = @client.search(search_query).first(200).map(&:text)
+     @hashtags = Hash.new { |hash, key| hash[key] = 0 }
+     @texts.select! { |text| text.match(/#/) }
+     @texts.each do |text|
+       hashtags_in(text).each do |hashtag|
+         @hashtags[hashtag] += 1
+       end
+     end
+     @hashtags
+   end
+
+   def hashtags_in(text)
+     text.scan(/\s(#\w+)/).flatten
+   end
+ end
data/lib/wikipedia_twitterbot/version.rb ADDED
@@ -0,0 +1,3 @@
+ module WikipediaTwitterbot
+   VERSION = '0.1.0'.freeze
+ end
data/lib/wikipedia_twitterbot/wiki.rb ADDED
@@ -0,0 +1,66 @@
+ require 'mediawiki_api'
+ require 'open-uri'
+
+ #= This class is for getting data directly from the Wikipedia API.
+ class Wiki
+   ################
+   # Entry points #
+   ################
+
+   # General entry point for making arbitrary queries of the Wikipedia API
+   def self.query(query_parameters, opts = {})
+     wikipedia('query', query_parameters, opts)
+   end
+
+   def self.get_page_content(page_title, opts = {})
+     response = wikipedia('get_wikitext', page_title, opts)
+     response&.status == 200 ? response.body : nil
+   end
+
+   def self.save_commons_image(filename)
+     opts = { site: 'commons.wikimedia.org' }
+     query = { prop: 'imageinfo',
+               iiprop: 'url',
+               iiurlheight: 1000,
+               titles: filename }
+     response = query(query, opts)
+     url = response.data['pages'].values.first['imageinfo'].first['thumburl']
+
+     File.write filename, open(url).read
+   end
+
+   ###################
+   # Private methods #
+   ###################
+   class << self
+     private
+
+     def wikipedia(action, query, opts = {})
+       tries ||= 3
+       @mediawiki = api_client(opts)
+       @mediawiki.send(action, query)
+     rescue StandardError => e
+       tries -= 1
+       typical_errors = [Faraday::TimeoutError,
+                         Faraday::ConnectionFailed,
+                         MediawikiApi::HttpError]
+       if typical_errors.include?(e.class)
+         retry if tries >= 0
+       else
+         raise e
+       end
+     end
+
+     def api_client(opts)
+       site = opts[:site]
+       language = opts[:language] || 'en'
+
+       url = if site
+               "https://#{site}/w/api.php"
+             else
+               "https://#{language}.wikipedia.org/w/api.php"
+             end
+       MediawikiApi::Client.new url
+     end
+   end
+ end
data/lib/wikipedia_twitterbot/wiki_pageviews.rb ADDED
@@ -0,0 +1,86 @@
+ require 'net/http'
+ require 'json'
+ require 'date'
+ require 'cgi'
+
+ # Fetches pageview data from the Wikimedia pageviews REST API
+ # Documentation: https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
+ class WikiPageviews
+   ################
+   # Entry points #
+   ################
+
+   # Given an article title, return the number of page views for each day in
+   # the requested range (by default, the last 30 days).
+   #
+   # [title] title of a Wikipedia page (including namespace, if applicable)
+   # [opts]  optional :language, :start_date, and :end_date
+   def self.views_for_article(title, opts = {})
+     language = opts[:language] || 'en'
+     start_date = opts[:start_date] || (Date.today - 30)
+     end_date = opts[:end_date] || Date.today
+     url = query_url(title, start_date, end_date, language)
+     data = api_get url
+     return unless data
+     data = JSON.parse data
+     return unless data.include?('items')
+     daily_view_data = data['items']
+     views = {}
+     daily_view_data.each do |day_data|
+       date = day_data['timestamp'][0..7]
+       views[date] = day_data['views']
+     end
+     views
+   end
+
+   def self.average_views_for_article(title, opts = {})
+     language = opts[:language] || 'en'
+     data = recent_views(title, language)
+     # TODO: better handling of unexpected or empty responses, including logging
+     return unless data
+     data = JSON.parse data
+     return unless data.include?('items')
+     daily_view_data = data['items']
+     days = daily_view_data.count
+     total_views = 0
+     daily_view_data.each do |day_data|
+       total_views += day_data['views']
+     end
+     return if total_views.zero?
+     total_views.to_f / days
+   end
+
+   ##################
+   # Helper methods #
+   ##################
+   def self.recent_views(title, language)
+     start_date = DateTime.now - 50
+     end_date = DateTime.now
+     url = query_url(title, start_date, end_date, language)
+     api_get url
+   end
+
+   def self.query_url(title, start_date, end_date, language)
+     title = CGI.escape(title)
+     base_url = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/'
+     configuration_params = "per-article/#{language}.wikipedia/all-access/user/"
+     start_param = start_date.strftime('%Y%m%d')
+     end_param = end_date.strftime('%Y%m%d')
+     title_and_date_params = "#{title}/daily/#{start_param}00/#{end_param}00"
+     base_url + configuration_params + title_and_date_params
+   end
+
+   ###################
+   # Private methods #
+   ###################
+   class << self
+     private
+
+     def api_get(url)
+       tries ||= 3
+       Net::HTTP.get(URI.parse(url))
+     rescue Errno::ETIMEDOUT
+       tries -= 1
+       retry unless tries.zero?
+     end
+   end
+ end
data/lib/wikipedia_twitterbot.rb ADDED
@@ -0,0 +1,16 @@
+ require 'wikipedia_twitterbot/version'
+
+ module WikipediaTwitterbot
+   require_relative 'wikipedia_twitterbot/article'
+   require_relative 'wikipedia_twitterbot/wiki'
+   require_relative 'wikipedia_twitterbot/wiki_pageviews'
+   require_relative 'wikipedia_twitterbot/ores'
+   require_relative 'wikipedia_twitterbot/twitter_client'
+   require_relative 'wikipedia_twitterbot/find_articles'
+   require_relative 'wikipedia_twitterbot/find_images'
+   require_relative 'wikipedia_twitterbot/high_pageviews'
+   require_relative 'wikipedia_twitterbot/category_filter'
+   require_relative 'wikipedia_twitterbot/discard_redirects'
+   require_relative 'wikipedia_twitterbot/tweet'
+   require_relative 'wikipedia_twitterbot/db/bootstrap'
+ end
data/wikipedia_twitterbot.gemspec ADDED
@@ -0,0 +1,33 @@
+ # coding: utf-8
+
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'wikipedia_twitterbot/version'
+
+ Gem::Specification.new do |spec|
+   spec.name          = 'wikipedia_twitterbot'
+   spec.version       = WikipediaTwitterbot::VERSION
+   spec.authors       = ['Sage Ross']
+   spec.email         = ['sage@ragesoss.com']
+
+   spec.summary       = 'Tools for building Wikipedia-focused Twitter bots'
+   spec.homepage      = 'https://github.com/ragesoss/WikipediaTwitterbot'
+   spec.license       = 'MIT'
+
+   spec.files = `git ls-files -z`.split("\x0").reject do |f|
+     f.match(%r{^(test|spec|features)/})
+   end
+   spec.bindir        = 'exe'
+   spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.require_paths = ['lib']
+
+   spec.add_development_dependency 'bundler', '~> 1.15'
+   spec.add_development_dependency 'rake', '~> 10.0'
+
+   spec.add_runtime_dependency 'sqlite3'
+   spec.add_runtime_dependency 'activerecord'
+   spec.add_runtime_dependency 'activerecord-import'
+   spec.add_runtime_dependency 'twitter'
+   spec.add_runtime_dependency 'mediawiki_api'
+   spec.add_runtime_dependency 'logger'
+ end
metadata ADDED
@@ -0,0 +1,180 @@
+ --- !ruby/object:Gem::Specification
+ name: wikipedia_twitterbot
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Sage Ross
+ autorequire:
+ bindir: exe
+ cert_chain: []
+ date: 2017-11-27 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.15'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.15'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ - !ruby/object:Gem::Dependency
+   name: sqlite3
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: activerecord
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: activerecord-import
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: twitter
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: mediawiki_api
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: logger
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ description:
+ email:
+ - sage@ragesoss.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".gitignore"
+ - CODE_OF_CONDUCT.md
+ - Gemfile
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - bin/console
+ - bin/setup
+ - lib/wikipedia_twitterbot.rb
+ - lib/wikipedia_twitterbot/article.rb
+ - lib/wikipedia_twitterbot/category_filter.rb
+ - lib/wikipedia_twitterbot/db/001_create_articles.rb
+ - lib/wikipedia_twitterbot/db/bootstrap.rb
+ - lib/wikipedia_twitterbot/discard_redirects.rb
+ - lib/wikipedia_twitterbot/find_articles.rb
+ - lib/wikipedia_twitterbot/find_images.rb
+ - lib/wikipedia_twitterbot/high_pageviews.rb
+ - lib/wikipedia_twitterbot/ores.rb
+ - lib/wikipedia_twitterbot/tweet.rb
+ - lib/wikipedia_twitterbot/twitter_client.rb
+ - lib/wikipedia_twitterbot/version.rb
+ - lib/wikipedia_twitterbot/wiki.rb
+ - lib/wikipedia_twitterbot/wiki_pageviews.rb
+ - wikipedia_twitterbot.gemspec
+ homepage: https://github.com/ragesoss/WikipediaTwitterbot
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.6.8
+ signing_key:
+ specification_version: 4
+ summary: Tools for building Wikipedia-focused Twitter bots
+ test_files: []