RubyGems - reddit_get - Versions diffs - 0.1.0 - Mend

reddit_get 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 10171c0f2446ba66686862c922b20a8bcd9212f15c3134e0af72652fa4e6f0dd
+  data.tar.gz: 61563a46f484e8dc9125febbb68c8e988774506b6037beb784c54e686c42dbed
+SHA512:
+  metadata.gz: 602c9bdd9377205c787d6c573ac27d14781b30c1505fc5ac2beabde6a607fb27c7274745ee958567612c2413bc231c20dbd9f7b352117f4615be375baa73d17a
+  data.tar.gz: e32b9aeeb505c44f0d1573e0cdf33748cb0c203216a51a87e6fc006d82fd7503c0f398c9356a44914b357b8729381d57b92d78dfbb85893b54f844f4d8df19fe
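
The digests recorded in checksums.yaml can be compared against the archives packed inside the .gem file. The following is a minimal sketch, assuming those archives have already been extracted into a local directory. The `verify_checksums` helper is hypothetical and is not part of this gem or of RubyGems.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: checks every file named in a checksums.yaml-style
# document against its recorded digest. `dir` is assumed to contain the
# files listed under each algorithm (for example metadata.gz, data.tar.gz).
def verify_checksums(checksums_yaml, dir)
  algorithms = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }
  YAML.safe_load(checksums_yaml).all? do |algorithm, files|
    digest_class = algorithms.fetch(algorithm)
    files.all? do |name, expected|
      # to_s guards against YAML reading an all-digit digest as a number.
      digest_class.file(File.join(dir, name)).hexdigest == expected.to_s
    end
  end
end
```

The helper returns `true` only when every listed file matches every recorded digest, so a single tampered archive is enough to make it return `false`.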

data/.gitignore ADDED

@@ -0,0 +1,8 @@
+/.bundle/
+/.yardoc
+/_yardoc/
+/coverage/
+/doc/
+/pkg/
+/spec/reports/
+/tmp/

data/.rubocop.yml ADDED

@@ -0,0 +1,6 @@
+AllCops:
+  NewCops: enable
+  Exclude:
+    - lib/scheduler.rb
+require: rubocop-performance

data/Gemfile ADDED

@@ -0,0 +1,12 @@
+# frozen_string_literal: true
+source 'https://rubygems.org'
+ruby '3.0.0'
+gemspec
+group :development do
+  gem 'rubocop'
+  gem 'rubocop-performance'
+end

data/Gemfile.lock ADDED

@@ -0,0 +1,45 @@
+PATH
+  remote: .
+  specs:
+    reddit_get (0.1.0)
+GEM
+  remote: https://rubygems.org/
+  specs:
+    ast (2.4.2)
+    parallel (1.20.1)
+    parser (3.0.0.0)
+      ast (~> 2.4.1)
+    rainbow (3.0.0)
+    regexp_parser (2.0.3)
+    rexml (3.2.4)
+    rubocop (1.9.1)
+      parallel (~> 1.10)
+      parser (>= 3.0.0.0)
+      rainbow (>= 2.2.2, < 4.0)
+      regexp_parser (>= 1.8, < 3.0)
+      rexml
+      rubocop-ast (>= 1.2.0, < 2.0)
+      ruby-progressbar (~> 1.7)
+      unicode-display_width (>= 1.4.0, < 3.0)
+    rubocop-ast (1.4.1)
+      parser (>= 2.7.1.5)
+    rubocop-performance (1.9.2)
+      rubocop (>= 0.90.0, < 2.0)
+      rubocop-ast (>= 0.4.0)
+    ruby-progressbar (1.11.0)
+    unicode-display_width (2.0.0)
+PLATFORMS
+  x86_64-darwin-18
+DEPENDENCIES
+  reddit_get!
+  rubocop
+  rubocop-performance
+RUBY VERSION
+   ruby 3.0.0p0
+BUNDLED WITH
+   2.2.3

data/LICENSE.txt ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright (c) 2021 Alessandro
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,67 @@
+# RedditGet
+This gem lets you grab posts and comments from Reddit without any authentication.
+It fetches multiple subreddits concurrently to make full use of your machine and increase throughput.
+No setup and a clean interface make this gem ideal when you just want to process public Reddit data.
+Zero dependencies.
+The [Redd gem](https://github.com/avinashbot/redd) seems to be abandoned, so I created this gem to meet my needs.
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem 'reddit_get'
+```
+And then execute:
+    $ bundle install
+Or install it yourself as:
+    $ gem install reddit_get
+## Usage
+### You want to grab many subreddits
+```ruby
+results = RedditGet::Subreddit.collect_all %w[gaming videos movies funny]
+results # will hold RedditGet::Data which acts like a hash
+results['gaming'].each do |post|
+  puts post['title']
+end
+results.gaming
+```
+### You want to grab one subreddit
+```ruby
+results = RedditGet::Subreddit.collect('gaming')
+results.gaming.each do |post|
+  puts post.title # all gaming posts titles
+end
+results['gaming'] # works too!
+```
+### You want to grab comments for each post
+```ruby
+results = RedditGet::Subreddit.collect_all %w[gaming videos movies funny], with_comments: true
+results.gaming.each do |post|
+  puts post.title
+  post.comments.each do |comment|
+    puts comment.body
+  end
+end
+# also works with a single subreddit
+RedditGet::Subreddit.collect 'gaming', with_comments: true
+```
+## Contributing
+Bug reports and pull requests are welcome on GitHub at https://github.com/AlessandroMinali/reddit_get.
+## License
+The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).

data/lib/reddit_get.rb ADDED

@@ -0,0 +1,122 @@
+# frozen_string_literal: true
+require 'net/http'
+require 'json'
+require_relative 'scheduler'
+require_relative 'reddit_get/version'
+module RedditGet
+  class Error < StandardError; end
+  # Allows method-call chains in place of Hash key navigation
+  class Data
+    def initialize(data)
+      @data = data
+    end
+    def objectify(data)
+      data.transform_values! do |v|
+        case v
+        when Hash
+          Data.new(v)
+        when Array
+          v.map { |i| Data.new(i) }
+        else
+          v
+        end
+      end
+    end
+    def [](key)
+      @data.fetch(key)
+    end
+    def method_missing(method, *_args)
+      @data.send(method)
+    rescue NoMethodError
+      out = @data.fetch(method.to_s)
+      case out
+      when Hash
+        Data.new(out)
+      when Array
+        out.map { |i| Data.new(i) }
+      else
+        out
+      end
+    end
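+    # Mirrors method_missing: true for Hash methods and for keys present in the data.
+    def respond_to_missing?(method_name, include_private = false)
+      @data.respond_to?(method_name, include_private) || @data.key?(method_name.to_s) || super
+    end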
+  end
+  # Grab subreddit top page as json
+  class Subreddit
+    BASE_URL = 'https://old.reddit.com'
+    def self.collect_all(subreddits, with_comments: false)
+      raise TypeError, 'Must pass an array of subreddits' unless subreddits.is_a?(Array)
+      results = Hash[subreddits.zip([])]
+      subreddits.uniq.each do |subreddit|
+        grab_posts(results, subreddit, with_comments: with_comments)
+      end
+      scheduler_run
+      Data.new(results)
+    end
+    def self.collect(subreddit, with_comments: false)
+      collect_all([subreddit], with_comments: with_comments)
+    end
+    class << self
+      private
+      def scheduler_run
+        scheduler = Scheduler.new
+        Fiber.set_scheduler scheduler
+        scheduler.run
+      end
+      def grab_posts(results, subreddit, with_comments:)
+        Fiber.new do
+          results[subreddit] = get_reddit_posts(subreddit).map! do |post|
+            grab_comments(post) if with_comments
+            post['data']
+          end
+        end.resume
+      end
+      def grab_comments(post)
+        url = post['data']['permalink']
+        Fiber.new do
+          post['data']['comments'] = get_reddit_comments(url).map! do |comment|
+            comment['data']
+          end
+        end.resume
+      end
+      def get_reddit_posts(subreddit)
+        get_json(URI("#{BASE_URL}/r/#{subreddit}.json")).dig('data', 'children')
+      end
+      def get_reddit_comments(url)
+        get_json("#{BASE_URL}#{url}.json")[1].dig('data', 'children')
+      end
+      def get_json(uri)
+        req = Net::HTTP::Get.new(
+          uri,
+          { 'User-Agent':
+              'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0' }
+        )
+        body = Net::HTTP.start('old.reddit.com', 443, use_ssl: true) do |http|
+          http.request(req)
+        end.body
+        JSON.parse(body)
+      end
+    end
+  end
+end
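
The `Data` class above resolves unknown method names as Hash keys. Below is a condensed sketch of that pattern, with `respond_to_missing?` taking the standard two-argument signature. The class name `HashChain` and the sample payload are illustrative only. Unlike the gem's class, this sketch does not delegate to Hash methods.

```ruby
# Illustrative wrapper (not the gem's class): unknown methods are looked up
# as string keys, and nested Hashes and Arrays are wrapped on the way out.
class HashChain
  def initialize(data)
    @data = data
  end

  def [](key)
    @data.fetch(key)
  end

  def method_missing(name, *args, &block)
    key = name.to_s
    return super unless @data.key?(key)

    wrap(@data.fetch(key))
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end

  private

  def wrap(value)
    case value
    when Hash then HashChain.new(value)
    when Array then value.map { |item| wrap(item) }
    else value
    end
  end
end

payload = { 'gaming' => [{ 'title' => 'First post', 'comments' => [{ 'body' => 'Nice' }] }] }
results = HashChain.new(payload)
puts results.gaming.first.title                # prints "First post"
puts results.gaming.first.comments.first.body  # prints "Nice"
puts results.respond_to?(:gaming)              # prints "true"
```

Keeping `respond_to_missing?` in step with `method_missing` means `respond_to?` gives the same answer as an actual call would, which matters for code that probes objects before calling them.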

data/lib/reddit_get/version.rb ADDED

@@ -0,0 +1,5 @@
+# frozen_string_literal: true
+module RedditGet
+  VERSION = '0.1.0'
+end

data/lib/scheduler.rb ADDED

@@ -0,0 +1,186 @@
+# frozen_string_literal: true
+# This is an example and simplified scheduler for test purposes.
+# It is not efficient for a large number of file descriptors as it uses IO.select().
+# Production Fiber schedulers should use epoll/kqueue/etc.
+require 'fiber'
+require 'socket'
+begin
+  require 'io/nonblock'
+rescue LoadError
+  # Ignore.
+end
+module RedditGet
+  class Scheduler
+    def initialize
+      @readable = {}
+      @writable = {}
+      @waiting = {}
+      @closed = false
+      @lock = Mutex.new
+      @blocking = 0
+      @ready = []
+      @urgent = IO.pipe
+    end
+    attr :readable, :writable, :waiting
+    def next_timeout
+      _fiber, timeout = @waiting.min_by { |_key, value| value }
+      if timeout
+        offset = timeout - current_time
+        if offset.negative?
+          0
+        else
+          offset
+        end
+      end
+    end
+    def run
+      while @readable.any? || @writable.any? || @waiting.any? || @blocking.positive?
+        # Can only handle file descriptors up to 1024...
+        readable, writable = IO.select(@readable.keys + [@urgent.first], @writable.keys, [],
+                                       next_timeout)
+        # puts "readable: #{readable}" if readable&.any?
+        # puts "writable: #{writable}" if writable&.any?
+        readable&.each do |io|
+          if fiber = @readable.delete(io)
+            fiber.resume
+          elsif io == @urgent.first
+            @urgent.first.read_nonblock(1024)
+          end
+        end
+        writable&.each do |io|
+          if fiber = @writable.delete(io)
+            fiber.resume
+          end
+        end
+        if @waiting.any?
+          time = current_time
+          waiting = @waiting
+          @waiting = {}
+          waiting.each do |fiber, timeout|
+            if timeout <= time
+              fiber.resume
+            else
+              @waiting[fiber] = timeout
+            end
+          end
+        end
+        next unless @ready.any?
+        ready = nil
+        @lock.synchronize do
+          ready = @ready
+          @ready = []
+        end
+        ready.each(&:resume)
+      end
+    end
+    def close
+      raise 'Scheduler already closed!' if @closed
+      run
+    ensure
+      @urgent.each(&:close)
+      @urgent = nil
+      @closed = true
+      # We freeze to detect any unintended modifications after the scheduler is closed:
+      freeze
+    end
+    def closed?
+      @closed
+    end
+    def current_time
+      Process.clock_gettime(Process::CLOCK_MONOTONIC)
+    end
+    def process_wait(pid, flags)
+      # This is a very simple way to implement a non-blocking wait:
+      Thread.new do
+        Process::Status.wait(pid, flags)
+      end.value
+    end
+    def io_wait(io, events, _duration)
+      @readable[io] = Fiber.current unless (events & IO::READABLE).zero?
+      @writable[io] = Fiber.current unless (events & IO::WRITABLE).zero?
+      Fiber.yield
+      events
+    end
+    # Used for Kernel#sleep and Mutex#sleep
+    def kernel_sleep(duration = nil)
+      block(:sleep, duration)
+      true
+    end
+    # Used when blocking on synchronization (Mutex#lock, Queue#pop, SizedQueue#push, ...)
+    def block(_blocker, timeout = nil)
+      # $stderr.puts [__method__, blocker, timeout].inspect
+      if timeout
+        @waiting[Fiber.current] = current_time + timeout
+        begin
+          Fiber.yield
+        ensure
+          # Remove from @waiting in the case #unblock was called before the timeout expired:
+          @waiting.delete(Fiber.current)
+        end
+      else
+        @blocking += 1
+        begin
+          Fiber.yield
+        ensure
+          @blocking -= 1
+        end
+      end
+    end
+    # Used when synchronization wakes up a previously-blocked fiber (Mutex#unlock, Queue#push, ...).
+    # This might be called from another thread.
+    def unblock(_blocker, fiber)
+      # $stderr.puts [__method__, blocker, fiber].inspect
+      @lock.synchronize do
+        @ready << fiber
+      end
+      io = @urgent.last
+      io.write_nonblock('.')
+    end
+    def fiber(&block)
+      fiber = Fiber.new(blocking: false, &block)
+      fiber.resume
+      fiber
+    end
+  end
+end
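
The core of the scheduler above is a loop that parks each fiber on an IO object and resumes it once `IO.select` reports that object as ready. The sketch below strips the idea down to plain fibers and pipes and does not use `Fiber.set_scheduler`. The `MiniLoop` class and its method names are hypothetical and exist only for illustration.

```ruby
# Fiber.current needs this require before Ruby 3.1.
require 'fiber' unless Fiber.respond_to?(:current)

# Hypothetical mini event loop: a fiber suspends while waiting for a readable
# IO, and the loop resumes it when IO.select says the IO has data.
class MiniLoop
  def initialize
    @readable = {} # io => fiber waiting on it
  end

  # Called from inside a fiber: register interest, then suspend.
  def wait_readable(io)
    @readable[io] = Fiber.current
    Fiber.yield
  end

  # Start a task; it runs until it first waits on IO.
  def spawn(&block)
    Fiber.new(&block).resume
  end

  # Resume waiting fibers until none are left.
  def run
    until @readable.empty?
      ready, = IO.select(@readable.keys)
      ready.each { |io| @readable.delete(io).resume }
    end
  end
end

event_loop = MiniLoop.new
received = []
pipes = Array.new(2) { IO.pipe }

pipes.each_with_index do |(reader, _writer), index|
  event_loop.spawn do
    event_loop.wait_readable(reader)
    received << [index, reader.read_nonblock(16)]
  end
end

# Both fibers are now parked; make their pipes readable and run the loop.
pipes[1][1].write('b')
pipes[0][1].write('a')
event_loop.run
pipes.flatten.each(&:close)
p received.sort # prints [[0, "a"], [1, "b"]]
```

As the comments in scheduler.rb note, `IO.select` does not scale to a large number of file descriptors, so this shape suits small batches of requests rather than production workloads.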

data/reddit_get.gemspec ADDED

@@ -0,0 +1,30 @@
+# frozen_string_literal: true
+require_relative 'lib/reddit_get/version'
+Gem::Specification.new do |spec|
+  spec.name          = 'reddit_get'
+  spec.version       = RedditGet::VERSION
+  spec.authors       = ['Alessandro']
+  spec.email         = ['4143332+AlessandroMinali@users.noreply.github.com']
+  spec.summary       = 'Simply grab subreddit posts and their comments'
+  spec.description   = 'A clean interface to handle reddit data without auth.'
+  spec.homepage      = 'https://github.com/AlessandroMinali/reddit_get'
+  spec.license       = 'MIT'
+  spec.required_ruby_version = Gem::Requirement.new('>= 3.0.0')
+  spec.metadata['homepage_uri'] = spec.homepage
+  spec.metadata['source_code_uri'] = 'https://github.com/AlessandroMinali/reddit_get'
+  # Specify which files should be added to the gem when it is released.
+  # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
+  spec.files = Dir.chdir(File.expand_path(__dir__)) do
+    `git ls-files -z`.split("\x0").reject { |f| f.match(%r{\A(?:test|spec|features)/}) }
+  end
+  spec.bindir        = 'exe'
+  spec.executables   = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
+  spec.require_paths = ['lib']
+  spec.add_development_dependency 'rubocop'
+end

data/spec.rb ADDED

@@ -0,0 +1,35 @@
+# frozen_string_literal: true
+require_relative 'lib/reddit_get'
+# RedditGet::Subreddit.collect
+expect = RedditGet::Subreddit.collect('gaming')
+raise 'Not a hash' unless expect.is_a? RedditGet::Data
+raise 'No results returned' unless expect.values.any?
+begin
+  RedditGet::Subreddit.collect(['gaming'])
+rescue URI::InvalidURIError
+  raise_error = true
+ensure
+  raise 'Should fail' unless raise_error
+end
+# RedditGet::Subreddit.collect_all
+expect = RedditGet::Subreddit.collect_all(%w[gaming videos])
+raise 'Not a hash' unless expect.is_a? RedditGet::Data
+raise 'No results returned' unless expect.values.any?
+expect = RedditGet::Subreddit.collect_all(%w[gaming videos gaming])
+raise 'Must remove dups' unless expect.keys.count == 2
+raise_error = false # reset: still true from the collect check above
+begin
+  RedditGet::Subreddit.collect_all('gaming')
+rescue TypeError
+  raise_error = true
+ensure
+  raise 'Should fail' unless raise_error
+end
+expect = RedditGet::Subreddit.collect_all(%w[gaming videos], with_comments: true)
+raise 'Must have comments' unless expect.gaming.all? { |i| i.comments.any? }

metadata ADDED

@@ -0,0 +1,70 @@
+--- !ruby/object:Gem::Specification
+name: reddit_get
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Alessandro
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2021-02-13 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rubocop
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: A clean interface to handle reddit data without auth.
+email:
+- 4143332+AlessandroMinali@users.noreply.github.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".rubocop.yml"
+- Gemfile
+- Gemfile.lock
+- LICENSE.txt
+- README.md
+- lib/reddit_get.rb
+- lib/reddit_get/version.rb
+- lib/scheduler.rb
+- reddit_get.gemspec
+- spec.rb
+homepage: https://github.com/AlessandroMinali/reddit_get
+licenses:
+- MIT
+metadata:
+  homepage_uri: https://github.com/AlessandroMinali/reddit_get
+  source_code_uri: https://github.com/AlessandroMinali/reddit_get
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 3.0.0
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.2.3
+signing_key:
+specification_version: 4
+summary: Simply grab subreddit posts and their comments
+test_files: []