RubyGems - camdict - version diff 1.0.3 → 2.0.0

Files changed (32)

checksums.yaml +4 -4
data/README.md +28 -33
data/lib/camdict/array_ext.rb +37 -0
data/lib/camdict/client.rb +133 -97
data/lib/camdict/common.rb +25 -143
data/lib/camdict/definition.rb +65 -596
data/lib/camdict/entry.rb +76 -0
data/lib/camdict/exception.rb +5 -0
data/lib/camdict/explanation.rb +29 -66
data/lib/camdict/http_client.rb +14 -10
data/lib/camdict/ipa.rb +52 -0
data/lib/camdict/pronunciation.rb +53 -0
data/lib/camdict/sentence.rb +38 -0
data/lib/camdict/string_ext.rb +141 -0
data/lib/camdict/word.rb +83 -17
data/test/debug.rb +60 -0
data/test/helper.rb +2 -0
data/test/itest_client.rb +39 -8
data/test/itest_definition.rb +24 -75
data/test/itest_entry.rb +37 -0
data/test/itest_explanation.rb +41 -20
data/test/itest_ipa.rb +105 -0
data/test/itest_pronunciation.rb +74 -0
data/test/itest_word.rb +49 -0
data/test/test_array_ext.rb +23 -0
data/test/test_client.rb +35 -42
data/test/test_common.rb +22 -78
data/test/test_explanation.rb +21 -25
data/test/test_http_client.rb +27 -13
data/test/test_string_ext.rb +95 -0
metadata +42 -7
data/test/test_definition.rb +0 -345

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 9375049d96a36f304ae7262da7047f6c4d6e0402
-  data.tar.gz: a1256a02286a23bbd204bb0cd5de8a5670c5d12b
+  metadata.gz: 65f86517bb1f1674118ec92b379e0436fe9fdbec
+  data.tar.gz: 05b6e68ba21c6d9dbee96863ed2146ceea9d12e1
 SHA512:
-  metadata.gz: 857e3d9ee01bdf43511371ec3f1b6bf6df45185f4078e2e1e245243307322a85b9e3bda3b2a77523fa3884b14dd27c62541c1d27a02d718b447e750546bdbf02
-  data.tar.gz: b69da41647e14adefa47ee049df7b7967354f212479e6b3669d3c4e19e6a8385ae6222bd2df73becb745e3758af6e62df75c1652286f3cde459d5c796964f5d4
+  metadata.gz: f93f6da6b914e84efc540f1fc753b75034402f218048cbc825e478dc6310a609d4282ffb4edb31e34bd572a43093ce66c25c82b37436a9a1b66d0970724e4acc
+  data.tar.gz: bb5cc16f332a3c6c28b66f31c4e08b59e11d2aea7086c4ee87b65a58997116b9f8bd635f504c964c5e950b957b8ade3e985f7380f63de8124e5470e59d233a94

data/README.md CHANGED

@@ -1,8 +1,10 @@
 # A ruby gem - camdict
+![Build Status][travis-image][travis-link]
+![Code Climate][climate-image][climate-link]
-## Introduction
+## Introduction
-The ruby gem camdict is a [Cambridge online dictionary][1] client.
+The ruby gem camdict is a [Cambridge online dictionary][1] client.
 You could use this excellent dictionary with a browser, but now it is possible
 to use it with this ruby API in your code.
@@ -10,52 +12,45 @@ to use it with this ruby API in your code.
 `gem install camdict`
 ## Verification
-The gem can be tested by below commands in the directory where it's installed.
-`rake`         - run all the testcases which don't need internet connection.
-`rake itest`   - run all the testcases that need internet connection.
+The gem can be tested by below commands in the directory where it's installed.
+`rake`         - run all the testcases which don't need internet connection.
+`rake itest`   - run all the testcases that need internet connection.
 `rake testall` - run all above tests.
-One test may fail if the gem nokogiri hasn't pulled in the fix [here][2]. But
-it is safe to apply the patch to your nokogiri copy.
 ## Usage
 ```ruby
     require 'camdict'
     # Look up a new word
-    word = Camdict::Word.new "health"
-    # get all definitions for this word from remote dictionary and select the
-    # first one. A word usually has many definitions.
-    health = word.definitions.first
+    word = Camdict::Word.new 'health'
     # Print the part of speech
     puts health.part_of_speech   #=> noun
-    # One definition may have more than one explanations.
-    # Just look at the details of the first one.
-    explanation1 = health.explanations.first
-    # What's the meaning
-    puts explanation1.meaning    #=>
-    # the condition of the body and the degree to which it is free from
-    # illness, or the state of being well:
-    # And it may have some useful example sentences.
-    explanation1.examples.each { |e|
-      puts e.sentence            #=>
-      # to be in good/poor health
-      # Regular exercise is good for your health.
-      # I had to give up drinking for health reasons.
-      # He gave up work because of ill health.
-    }
+    # What's the first meaning
+    puts health.meaning          #=>
+    # the condition of the body and the degree to which it is free from
+    # illness, or the state of being well:
+    # all meanings
+    puts health.meanings         #=> in addition to above meaning, it prints
+    # the condition of something that changes or develops, such as an
+    # organization or system:
 ```
-There are some useful testing examples in test directory of this gem.
+Need more? try `health.print` to show more data in a friendly format.
+## Versioning
+The release of this gem follows the [semantic versioning rules][2].
 ## Licence MIT
-Copyright (c) 2014 Pan Gaoyong
+Copyright (c) 2014-2017 Pan Gaoyong
 [1]: http://dictionary.cambridge.com "Cambridge"
-[2]: https://github.com/sparklemotion/nokogiri/pull/1020 "My Nokogiri Bug Fix"
+[2]: http://semver.org
+[travis-image]: https://travis-ci.org/pan/camdict.svg?branch=master
+[travis-link]: https://travis-ci.org/pan/camdict
+[climate-image]: https://codeclimate.com/github/pan/camdict/badges/gpa.svg
+[climate-link]: https://codeclimate.com/github/pan/camdict

data/lib/camdict/array_ext.rb ADDED

@@ -0,0 +1,37 @@
+# frozen_string_literal: true
+require 'camdict/string_ext'
+module Camdict
+  # Extention: Refine Array class.
+  module ArrayExt
+    refine Array do
+      # Iterate an array and return two elements +a+ +b+ each time for handling.
+      def each_pair
+        len = length
+        i = 0
+        while i < len
+          a = at(i)
+          b = at(i + 1)
+          yield(a, b)
+          i += 2
+        end
+      end
+      # Test if a phrase array includes a +word+.
+      #   ['blow your nose', 'blow a kiss to/at sb'].has?("a kiss at") #=> true
+      def has?(word)
+        expand.each { |phr| return true if phr.include? word }
+        false
+      end
+      using Camdict::StringExt
+      # Expand a phrase array into a flattened one. Example,
+      #   ['blow your nose', 'blow a kiss to/at sb'] #=>
+      #   ['blow your nose', 'blow a kiss to sb', 'blow a kiss at sb']
+      def expand
+        map { |p| p&.flatten || p }.flatten
+      end
+    end
+  end
+end
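
Note on the hunk above: 2.0.0 adds `each_pair`, `has?` and `expand` through `refine Array`, so they are visible only where a file or module calls `using`. The sketch below shows that scoping with a copy of the `each_pair` logic. `PairDemo` and `PairUser` are hypothetical names chosen for illustration and are not part of the gem.

```ruby
# Hypothetical stand-alone refinement; mirrors each_pair from the hunk above.
module PairDemo
  refine Array do
    # Yield the elements two at a time; the second is nil for odd lengths.
    def each_pair
      i = 0
      while i < length
        yield(at(i), at(i + 1))
        i += 2
      end
    end
  end
end

# The refinement is active only inside a scope that calls `using`.
module PairUser
  using PairDemo

  def self.pairs_of(list)
    result = []
    list.each_pair { |a, b| result << [a, b] }
    result
  end
end

p PairUser.pairs_of(%w[a 1 b 2 c])
# => [["a", "1"], ["b", "2"], ["c", nil]]

# Outside PairUser, a plain Array has no each_pair.
begin
  [1, 2].each_pair { |a, b| [a, b] }
rescue NoMethodError
  puts 'each_pair is not visible outside PairUser'
end
```

The design trade-off is that callers must opt in with `using`, whereas a global patch changes `Array` for every library loaded in the process.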

data/lib/camdict/client.rb CHANGED

@@ -1,117 +1,171 @@
-require "camdict/http_client"
+# frozen_string_literal: true
+require 'camdict/http_client'
+require 'camdict/string_ext'
+require 'camdict/exception'
 module Camdict
-  # The client downloads all the useful data about a word or phrase from
-  # remote Cambridge dictionaries, but not includes the extended data.
+  # The client downloads all the useful data about a word or phrase from
+  # remote Cambridge dictionaries, but not includes the extended data.
   # For example,
-  # when the word "mind" is searched, all its four exactly matched entries are
-  # downloaded. However, separated entries like "turn of mind" & "open mind"
+  # when the word "mind" is searched, the exactly matched entry is downloaded.
+  # However, other related entries like "turn of mind" & "open mind"
   # are not included.
-  class Client
-    # Default dictionary is english-chinese-simplified.
-    # Other possible +dict+ values:
-    # british, american-english, business-english, learner-english.
-    def initialize(dict=nil)
-      @dictionary = dict || "english-chinese-simplified"
-    end
-    # Get a word's html definition(s) by searching it from the web dictionary.
-    # The returned result could be an empty array when nothing is found, or
-    # is an array with a hash element,
-    #   [{ word => html definition }],
-    # or many hash elements when it has multiple entries,
-    #   [{ entry_id => html definition }, ...].
-    # Normally, when a +word+ has more than one meanings, its entry ID format is
-    # like word_nn. Otherwise it's just the word itself.
+  class Client < HTTP::Client
+    attr_reader :dictionary
+    # Default dictionary is British english.
+    # Other possible +dict+ values:
+    # english-chinese-simplified, learner-english,
+    # essential-british-english, essential-american-english, etc.
+    def initialize(dict = nil)
+      @dictionary = dict || 'english'
+    end
+    # Get a word's html definition from the web dictionary.
+    # The returned result could be an empty string when nothing is found, or
+    # its html definition
     def html_definition(word)
       html = fetch(word)
-      return [] if html.nil?
-      html_defs = []
-      # some words return their only definition directly, such as aluminium.
-      if definition_page? html
-        # entry id is just the word when there is only one definition
-        html_defs << { word => di_head(html) + di_body(html) }
+      if html
+        di_extracted(html)
       else
-        # returned page could be a spelling check suggestion page in case it is
-        # not found, or the found page with all matched entries and related.
-        # when entry urls are not found, they are empty and spelling suggestion
-        # pages. So mentry_links() returns an empty array. Otherwise, it returns
-        # all the exactly matched entry links.
-        matched_urls = mentry_links(word, html)
-        unless matched_urls.empty?
-          matched_urls.each { |url|
-            html_defs << { entry_id(url) => get_htmldef(url) }
-          }
-        end
+        search(word)
       end
-      html_defs
     end
-    # Get a word html page source by its entry +url+.
+    # Get a word html definition page by its entry +url+.
     def get_htmldef(url)
-      html = Camdict::HTTP::Client.get_html(url)
-      di_head(html) + di_body(html)
+      di_extracted get_html(url)
+    end
+    # search a word with this URL
+    def search_url(word)
+      "#{host}/search/#{@dictionary}/?q=#{word}"
+    end
+    def word_url(word)
+      "#{host}/dictionary/#{@dictionary}/#{encode(word).downcase}"
     end
     private
+    def search(word)
+      html = try_search(word)
+      return '' unless html
+      # some words return their only definition directly, such as plagiarism.
+      if single_def?(html)
+        di_extracted(html)
+      else
+        multiple_entries(word, html).join
+      end
+    end
+    def host
+      'http://dictionary.cambridge.org'
+    end
+    # returned page could be a spelling check suggestion page in case it is
+    # not found, or the found page with all matched entries and related.
+    # when entry urls are not found, they are empty and spelling suggestion
+    # pages. It returns all the exactly matched entry links otherwise, raise
+    # exception WordNotFound.
+    def multiple_entries(word, html)
+      html_defs = []
+      mentry_links(word, html).each do |url|
+        html_content = get_htmldef(url)
+        html_defs << html_content if html_content
+      end
+      raise WordNotFound, "#{word} not found" if html_defs.empty?
+      html_defs
+    end
+    def try_search(word)
+      get_html(search_url(word))
+    rescue OpenURI::HTTPError => e
+      # When a word does not match any definitions, it returns 404 not found.
+      return if e.message[0..2] == '404'
+    end
     # Fetch word searching result page.
     # Returned result is either just a single definition page if there is only
-    # one entry, or a result page listing all possible entries, or spelling
+    # one entry, or a result page listing all possible entries, or spelling
     # check result. All results are objects of Nokogiri::HTML.
     def fetch(w)
-      # search a word with this URL
-      search_url = "http://dictionary.cambridge.org/search/#{@dictionary}/?q="
-      url = search_url + w
-      begin
-        Camdict::HTTP::Client.get_html(url)
-      rescue OpenURI::HTTPError => e
-        # "404" == e.message[0..2], When a word is not found, it returns 404
-        # Not Found and spelling suggestions page.
-      end
+      ret = get_html(word_url(w))
+      ret if definition_page? ret
     end
-    # To determine whether or not the input object of Nokogiri::HTML is a page
+    # To determine whether or not the input object of Nokogiri::HTML is a page
     # of a word definition. Return true if it has a source structure like this,
     # <div class="di-head">
     #   <div class="di-title">
     #     <h1 class="hw">
     # This works for the translation page too, like English-Spanish.
     def single_def?(html)
-      node = html.css(".di-head .di-title .hw")
-      ! node.empty?
+      node = html.css('.di-head .di-title .hw')
+      !node.empty?
     end
     # Find out matched entry links from search result page
-    # <ul class="result-list">
-    #   <li><a href="entry_link1">
-    #   <li><a href="entry_link2">
-    # The search result html page should include above piece of code.
-    # The extended links are filtered out and the matched word or phrase's
+    # <ul class="prefix-block">
+    #   <li><a href="entry_link">
+    # The search result html page should include above piece of code.
+    # The extended links are filtered out and the matched word or phrase's
     # links are kept. An array of them are returned.
-    # For example, when the searched word is "related", entry links are like,
-    #   http://dictionary.cambridge.org/dictionary/british/related_1
-    #   http://dictionary.cambridge.org/dictionary/british/related_2
-    #   http://dictionary.cambridge.org/dictionary/british/stress-related
+    # For example, when the searched word is "related", entry links are like,
+    #   http://dictionary.cambridge.org/dictionary/english/related
+    #   http://dictionary.cambridge.org/dictionary/english/relate
     #   ...
-    # Returned result should only contain the first two.
+    # Returned result should only contain the first one.
     # Input html is an object of Nokogiri::HTML.
     def mentry_links(word, html)
       # suppose the word is not found in the dictionary, so it is empty.
       links = []
-      nodes = html.css(".result-list a")
-      # when found
-      unless nodes.empty?
-        nodes.each { |a|
-          links << a['href'] if matched_word?(word, a)
-        }
-      end
+      nodes = html.css('.prefix-block a')
+      nodes.each { |a| links << a['href'] if matched_word?(word, a) }
       links
     end
+    # Extract definition head and body from Nokogiri::HTML, discard share links
+    def di_extracted(html)
+      body = di_body(html)
+      # searching aluminium returns an American or British english page
+      # saparately, below condition filter out American english result
+      return if body.empty?
+      body.css('.share').each { |s| body.delete s }
+      body
+    end
+    # Return definition body in html source
+    def di_body(html)
+      html.css("#{tab_css} .di-body")
+    end
+    # the css selecting a tab
+    def tab_css
+      "[#{tab}]"
+    end
+    # the tab attributes according to dictionary name
+    def tab
+      case @dictionary
+      when 'english'
+        'data-tab="ds-british"'
+      end
+    end
+    # get the last part of http://dictionary.cambridge.org/british/related_1
+    def entry_id(url)
+      url.split('/').last
+    end
+    # phrase with space and single quote has to be replaced with dash
+    def encode(word)
+      word.gsub(/[ ']/, '-')
+    end
+    alias definition_page? single_def?
+    using Camdict::StringExt
     # Return true if the searched word matches the one on result page.
     # Node is an object of Nokogiri::Node
     # <li>
@@ -119,36 +173,18 @@ module Camdict
     #     <b class="phrase">out of mind, or
     #     <b class="hw">turn of mind, or
     #     <b class="w">mind-numbingly
-    # Match criterion: the queried word should equal to the result word;
+    # Match criterion: the queried word should equal to the result word;
     #   the result phrase should be flattened, which should equal to the
     #   queried phrase.
     def matched_word?(word, node)
-      li = node.css(".base")
+      li = node.css('.base')
+      return false if li.empty?
       resword = li.size == 1 ? li.text : li[0].text
-      if resword.include? '/' or resword.include? ';'
+      if resword.include?('/') || resword.include?(';')
         resword.flatten.include?(word)
       else
         word == resword
       end
     end
-    # Return definition head in html source
-    def di_head(html)
-      html.css(".cdo-section-title-hw").to_html(:save_with=>0) +
-      html.css(".di-info").to_html(:save_with=>0)
-    end
-    # Return definition body in html source
-    def di_body(html)
-      html.css(".di-body").to_html(:save_with=>0)
-    end
-    # get the last part of http://dictionary.cambridge.org/british/related_1
-    def entry_id(url)
-      url.split('/').last
-    end
-    alias :definition_page? :single_def?
   end
 end
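
Note on the hunk above: the new `word_url`, `search_url` and `encode` methods build request URLs from the dictionary name and the queried word. The sketch below reproduces that string handling outside the class so the resulting URLs can be inspected without network access. `UrlSketch` is a hypothetical module and passing the dictionary as an argument is an assumption made for illustration; the gem keeps it in `@dictionary`.

```ruby
# Hypothetical stand-alone copy of the URL helpers added in client.rb.
module UrlSketch
  HOST = 'http://dictionary.cambridge.org'

  module_function

  # Spaces and single quotes are replaced with dashes, as in Client#encode.
  def encode(word)
    word.gsub(/[ ']/, '-')
  end

  # Direct entry URL, tried first by Client#fetch.
  def word_url(word, dictionary = 'english')
    "#{HOST}/dictionary/#{dictionary}/#{encode(word).downcase}"
  end

  # Search URL, used by Client#try_search when the direct lookup fails.
  def search_url(word, dictionary = 'english')
    "#{HOST}/search/#{dictionary}/?q=#{word}"
  end
end

puts UrlSketch.word_url('Open mind')
# => http://dictionary.cambridge.org/dictionary/english/open-mind
puts UrlSketch.search_url('mind')
# => http://dictionary.cambridge.org/search/english/?q=mind
```

As in the diff, only the entry URL is encoded and downcased; the search URL interpolates the word unchanged.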

data/lib/camdict/common.rb CHANGED

@@ -1,161 +1,43 @@
+# frozen_string_literal: true
+require 'camdict/string_ext'
 module Camdict
+  # some common private methods used to extract Nokogiri nodes
   module Common
-    # Extend String class.
-    String.class_eval do
-      # 'blow a kiss to/at sb'.flatten =>
-      # %q(blow a kiss to sb, blow a kiss at sb)
-      # if it doesn't include a slash, returns stripped string
-      def flatten
-        str = self.strip
-        # remove the space surrounding '/'
-        str = str.gsub /\s*\/\s*/, '/'
-        return str unless str.include? '/'
-        len = str.length
-        ret = []
-        # when two strings are passed in separated with ';', then separate them
-        if pos = str.index(';')
-          ret += str[0..pos-1].flatten
-          ret += str[pos+1..len-1].flatten
-          return ret
-        end
-        # when a string has round brackets meaning optional part
-        if str.include? '('
-          head, bracket, tail = str.partition(/\(.*\)/)
-          unless bracket.empty?
-            ret << (head.strip + tail).flatten
-            result = bracket.delete("()").flatten
-            result = [result] if result.is_a? String
-            result.each { |s|
-              ret << (head + s + tail).flatten
-            }
-          end
-          return ret.flatten
-        end
-        j=0     # count of the alternative words, 'to/at' has two.
-        b=[]    # b[]/e[] index of the beginning/end of alternative words
-        e=[]
-        # set this flag when next word is expected an alternate word after slash
-        include_next = false
-        for i in 0..len-1
-          c = str[i]
-          case c
-          # valid char in a word
-          when /[[:alnum:]\-']/
-            if b[j].nil?
-              b[j] = i
-              e[j] = i
-            else
-              e[j] = i
-            end
-          # char means a word has ended
-          when " ", "!", "?", ",", "."
-            if include_next
-              break
-            else
-              b[j] = nil
-              e[j] = nil
-            end
-          # 'or' separator
-          when "/"
-            j += 1
-            include_next = true
-          else
-            raise NotImplementedError, "char '#{c}' found in '#{self}'."
-          end
-        end
-        if j > 0
-          for i in (0..j)
-            # alternative word is not the last word and not at the beginning
-            if (e[j]+1 < len) && (b[0] > 0)
-              ret << str[0..b[0]-1] + str[b[i]..e[i]] + str[e[j]+1..len-1]
-            elsif (e[j]+1 == len) && (b[0] > 0)
-              ret << str[0..b[0]-1] + str[b[i]..e[i]]
-            elsif (e[j]+1 < len) && (b[0] == 0)
-              ret << str[b[i]..e[i]] + str[e[j]+1..len-1]
-            else
-              ret << str[b[i]..e[i]]
-            end
-          end
-        end
-        ret
-      end
-      # Test whether a String includes the +word+. It's useful while testing
-      # a variable which might be an array of phrase or just a single phrase.
-      def has?(word)
-        self.include? word
-      end
-    end
-    # Extend Array class.
-    Array.class_eval do
-      # Expand a phrase array into a flattened one. Example,
-      #   ['blow your nose', 'blow a kiss to/at sb'] #=>
-      #   ['blow your nose', 'blow a kiss to sb', 'blow a kiss at sb']
-      def expand
-        ret = self.map { |p|
-          p.flatten if p.is_a? String
-        }
-        ret.flatten
-      end
-      # Test if a phrase array includes a +word+.
-      #   ['blow your nose', 'blow a kiss to/at sb'].has?("a kiss at") #=>true
-      def has?(word)
-        self.expand.each { |phr|
-          return true if phr.include? word
-        }
-        false
-      end
-      # Iterate an array and return two elements +a+ +b+ each time for handling.
-      def each_pair
-        len = self.length
-        i = 0
-        while (i < len)
-          a = self.at(i)
-          b = self.at(i+1)
-          yield(a, b)
-          i += 2
-        end
-      end
-    end
     private
     # Get the text selected by the css +selector+.
-    def css_text(selector)
-      node = @html.css(selector)
+    def css_text(html, selector)
+      node = html.css(selector)
       node.text unless node.empty?
     end
-    # Get sth by the css +selector+ for the derived word inside its runon node
-    def derived_css(selector)
-      runon = @html.css(".runon")
-      runon.each { |r|
+    # Get sth by the css +selector+ for the derived word inside its runon node
+    def derived_css(html, selector)
+      runon = html.css('.runon')
+      runon.each do |r|
         n = r.css('[title="Derived word"]')
-        if n.text == @word
+        if n.text == @word
           node = r.css(selector)
           yield(node)
         end
-      }
+      end
     end
+    using Camdict::StringExt
     # Get sth by the css +selector+ for the phrase inside the node phrase-block
-    def phrase_css(selector)
-      phbs = @html.css(".phrase-block")
-      phbs.each { |phb|
+    def phrase_css(html, selector)
+      phbs = html.css('.phrase-block')
+      phbs.each do |phb|
         nodes = phb.css('.phrase, .v[title="Variant form"]')
-        nodes.each { |n|
-          if n.text.flatten.has? @word
-            node = phb.css(selector)
-            yield(node)
-            break
-          end
-        }
-      }
+        nodes.each do |n|
+          next unless n.text.flatten.has? @word
+          node = phb.css(selector)
+          yield(node)
+          break
+        end
+      end
     end
   end
 end
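
Note on the hunk above: the removed `String#flatten` expanded slash-separated alternatives, for example turning 'blow a kiss to/at sb' into two phrases, and `phrase_css` still relies on that behaviour through `Camdict::StringExt`. The sketch below is a simplified illustration of the idea, not the gem's implementation. `PhraseSketch` and `expand_alternatives` are hypothetical names, only one slash group is handled, semicolons and bracketed optional parts are ignored, and an array is always returned (the removed method returned a plain string when no slash was present).

```ruby
# Simplified, hypothetical sketch of slash-alternative expansion.
module PhraseSketch
  module_function

  def expand_alternatives(phrase)
    # Strip the phrase and remove spaces around '/', as the removed code did.
    words = phrase.strip.gsub(%r{\s*/\s*}, '/').split
    idx = words.index { |w| w.include?('/') }
    return [words.join(' ')] unless idx

    # Emit one phrase per alternative, keeping the surrounding words.
    words[idx].split('/').map do |alt|
      (words[0...idx] + [alt] + words[(idx + 1)..-1]).join(' ')
    end
  end
end

p PhraseSketch.expand_alternatives('blow a kiss to/at sb')
# => ["blow a kiss to sb", "blow a kiss at sb"]
p PhraseSketch.expand_alternatives(' blow your nose ')
# => ["blow your nose"]
```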