RubyGems - burly - Versions diffs - 0.1.0 → 0.2.0 - Mend

burly 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml +4 -4
data/README.md +28 -2
data/burly.gemspec +2 -2
data/lib/burly/parser.rb +2 -2
data/lib/burly/parsers/html_parser.rb +15 -2
data/lib/burly.rb +17 -3
metadata +4 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 5012d619b0f133bb3b826065339370fb825ad444bf832987bdb01485daa5c232
-  data.tar.gz: 74921dc5b75e56b3e38aec36b175176a790cbb81c5a8fc67e73f40f6f61ba505
+  metadata.gz: 03d52fe37c2dc18c93b9e2bf74d149760264f2bb4014a0d9cc54fae78d3e1b63
+  data.tar.gz: 5e984dbdd20148684b992dfa19ea3638fc6218abcb451db556422e9f614e7ddb
 SHA512:
-  metadata.gz: 1d5c5f24ab94231093ebc53a9e2c47d595d1ff3a1a4810b8dc300467ae6475263e0f05a14f6f9051216d97dd77009018bb379d6519414603e254192bb09947f9
-  data.tar.gz: b0ad732e42aa2d84693eb727c4cca9e61fd85be214323fed5a79cc12663c953d5145a934fa3920fecd8b708a626e00322d1c60e84237fa60e335510a291039f9
+  metadata.gz: 837debbc2f3ddcf3b4c53663e6a0a5100ce96d89adc9fd5ce160a6c93f1d6ac38026ba00567e1e2c07920902bc63c2435240c61b4a2c975826a45168ce669996
+  data.tar.gz: da0af5413e7c00d4903d536db24bf64fc90ad2beea353cb1747a0455d3ef68bf98f42fe7847dfee651190154c47cb6cc37414b00213ff4d3c45ef2f9a7e21d1e

data/README.md CHANGED

@@ -33,9 +33,9 @@ Burly.parse(File.read("example.txt"))
 Parsing JSON or HTML documents is only slightly more complicated:
 ```ruby
-Burly.parse(File.read("example.json"), mime_type: "application/json")
+Burly.parse(File.read("example.html"), mime_type: "text/html")
-Burly.parse(File.read("example.html", mime_type: "text/html"))
+Burly.parse(File.read("example.json"), mime_type: "application/json")
 ```
 Burly uses _slightly_ different parsing rules for each supported MIME type:
@@ -46,6 +46,32 @@ Burly uses _slightly_ different parsing rules for each supported MIME type:
 In all cases, neither order nor uniqueness is guaranteed. You may also consider converting relative URLs extract from HTML documents to absolute URLs using the document's source URL and/or the `<base>` element's `href` attribute value (Ruby's [`URI.join` class method](https://docs.ruby-lang.org/en/master/URI.html#method-c-join) is good for this!).
+## Parser Options
+Burly's HTML parser supports a single option, `context`, which accepts either a String or an Array of Strings. The values may be either CSS or XPath selectors
+```ruby
+Burly.parse(File.read("example.html"), context: "main", mime_type: "text/html")
+Burly.parse(File.read("example.html"), context: ["//main", "//div"], mime_type: "text/html")
+```
+In all cases, Burly will search for nodes matching the provided selector(s) and use the _first_ match as the context within which to search for URLs. The `context` option is a great way to refine the list of extracted URLs based on their presence within the source document.
+> [!NOTE]
+> If Burly can't locate a node matching the provided selector(s), the context is reset to the document root.
+> [!TIP]
+> Passing an Array of Strings can be used to achieve an effect similar to conditional logic with fallback behavior.
+>
+> ```ruby
+> require "net/http"
+>
+> response = Net::HTTP.get(URI.parse("https://jgarber.example"))
+>
+> Burly.parse(response, context: [".h-entry .e-content", ".h-entry", "body"], mime_type: "text/html")
+> ```
 ## License
 Burly is freely available under the [MIT License](https://opensource.org/license/MIT).
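
Note on the README hunk above: it suggests converting extracted relative URLs to absolute ones with `URI.join`. A minimal sketch of that step, using only Ruby's standard library, might look like the following. The source URL and the list of extracted values are hypothetical placeholders, not output from Burly.

```ruby
require "uri"

# Hypothetical inputs for illustration only: the document's source URL and
# a few values of the kind an HTML parse might return.
source_url = "https://example.com/blog/post.html"
extracted = ["/about", "images/cat.png", "https://other.example/feed"]

# URI.join resolves each value against the source URL. Values that are
# already absolute pass through unchanged.
absolute = extracted.map { |url| URI.join(source_url, url).to_s }

puts absolute
```

If the document declares a `<base>` element, its `href` value would presumably replace `source_url` as the base for resolution, as the README suggests.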

data/burly.gemspec CHANGED

@@ -4,7 +4,7 @@ Gem::Specification.new do |spec|
   spec.required_ruby_version = ">= 2.6"
   spec.name = "burly"
-  spec.version = "0.1.0"
+  spec.version = "0.2.0"
   spec.authors = ["Jason Garber"]
   spec.email = ["jason@sixtwothree.org"]
@@ -25,7 +25,7 @@ Gem::Specification.new do |spec|
     "documentation_uri" => "https://rubydoc.info/gems/#{spec.name}/#{spec.version}",
     "homepage_uri" => spec.homepage,
     "rubygems_mfa_required" => "true",
-    "source_code_uri" => "#{spec.homepage}/tree/v#{spec.version}",
+    "source_code_uri" => "#{spec.homepage}/src/tag/v#{spec.version}",
   }
   spec.add_dependency "nokogiri", ">= 1.13"

data/lib/burly/parser.rb CHANGED

@@ -12,8 +12,8 @@ module Burly
       attr_reader :mime_types
     end
-    # @param document [String]
-    def initialize(document)
+    # @param document (see Burly.parse)
+    def initialize(document, **_kwargs)
       @document = document
     end

data/lib/burly/parsers/html_parser.rb CHANGED

@@ -33,9 +33,17 @@ module Burly
       ATTRIBUTES_XPATHS =
         URL_ATTRIBUTES_MAP.merge(SRCSET_ATTRIBUTES_MAP).flat_map do |attribute, names|
-          names.map { |name| "//#{name} / @#{attribute}" }
+          names.map { |name| ".//#{name} / @#{attribute}" }
         end
+      # @param document (see Burly.parse)
+      # @param context [String, Array<String>]
+      def initialize(document, context: nil)
+        @context = context
+        super
+      end
       # Parse an HTML document for absolute or relative URLs.
       #
       # @return [Array<String>]
@@ -53,7 +61,12 @@ module Burly
       # @return [Nokogiri::XML::NodeSet]
       def attr_nodes
-        @attr_nodes ||= doc.xpath(*ATTRIBUTES_XPATHS)
+        @attr_nodes ||= context_node.xpath(*ATTRIBUTES_XPATHS)
+      end
+      # @return [Nokogiri::HTML5::Document, Nokogiri::XML::Element]
+      def context_node
+        @context_node ||= doc.search(*Array(@context)).first || doc
       end
       # @return [Nokogiri::HTML5::Document]

data/lib/burly.rb CHANGED

@@ -16,15 +16,29 @@ module Burly
     attr_reader :registered_parsers
   end
-  # @param document [String]
+  # Parse a document for URLs.
+  #
+  # @example Parse a plaintext document.
+  #   Burly.parse(File.read("example.txt"))
+  #
+  # @example Parse an HTML document
+  #   Burly.parse(File.read("example.html", mime_type: "text/html"))
+  #
+  # @example Parse a JSON document.
+  #   Burly.parse(File.read("example.json"), mime_type: "application/json")
+  #
+  # @param document [String] The document to parse for URLs.
+  #
+  # @raise [UnsupportedMimeType]
+  #   Raised when an unsupported MIME type is passed as an option.
   #
   # @return [Array<String>]
-  def self.parse(document, mime_type: "text/plain")
+  def self.parse(document, mime_type: "text/plain", **options)
     parser = registered_parsers[mime_type]
     raise UnsupportedMimeType unless parser
-    parser.new(document).parse
+    parser.new(document, **options).parse
   end
   # @api private
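
Note on the option forwarding in this release: `Burly.parse` now passes `**options` through to the selected parser, the base parser accepts and discards extra keywords via `**_kwargs`, and the HTML parser declares `context:` before calling bare `super`. The sketch below reproduces that pattern with stand-in classes so the interaction can be tried in isolation. Every name beginning with `Sketch` or `sketch_` is hypothetical and not part of the gem.

```ruby
# Stand-in for the base parser: extra keywords are accepted and ignored.
class SketchParser
  def initialize(document, **_kwargs)
    @document = document
  end
end

# Stand-in for the HTML parser: it declares the one option it understands.
class SketchHtmlParser < SketchParser
  attr_reader :context

  def initialize(document, context: nil)
    @context = context
    # Bare `super` re-sends both `document` and `context:` to the parent,
    # which is why the parent needs to tolerate keyword arguments.
    super
  end
end

# Stand-in for a parser that defines no options of its own.
class SketchTextParser < SketchParser; end

# Stand-in for the dispatch step: forward whatever options were given.
def sketch_parse(document, parser_class, **options)
  parser_class.new(document, **options)
end

html = sketch_parse("<main></main>", SketchHtmlParser, context: "main")
text = sketch_parse("plain text", SketchTextParser, context: "main")
```

Under this arrangement, a parser without its own `initialize` silently ignores options it does not use, while a parser that declares explicit keywords rejects unknown ones with an `ArgumentError`.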

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: burly
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Jason Garber
@@ -44,11 +44,11 @@ licenses:
 - MIT
 metadata:
   bug_tracker_uri: https://codeberg.org/jgarber/burly/issues
-  changelog_uri: https://codeberg.org/jgarber/burly/releases/tag/v0.1.0
-  documentation_uri: https://rubydoc.info/gems/burly/0.1.0
+  changelog_uri: https://codeberg.org/jgarber/burly/releases/tag/v0.2.0
+  documentation_uri: https://rubydoc.info/gems/burly/0.2.0
   homepage_uri: https://codeberg.org/jgarber/burly
   rubygems_mfa_required: 'true'
-  source_code_uri: https://codeberg.org/jgarber/burly/tree/v0.1.0
+  source_code_uri: https://codeberg.org/jgarber/burly/src/tag/v0.2.0
 rdoc_options: []
 require_paths:
 - lib