string_utility_belt 0.2.5 → 0.3.0

Files changed (28)

data/.gitignore +6 -0
data/Gemfile +12 -0
data/Gemfile.lock +28 -0
data/README.markdown +145 -0
data/Rakefile +41 -33
data/lib/string_utility_belt/entities.rb +23 -0
data/lib/string_utility_belt/general.rb +72 -0
data/lib/{match_rank → string_utility_belt}/match_rank.rb +30 -24
data/lib/string_utility_belt/regex_me_helper.rb +100 -0
data/lib/string_utility_belt/regex_me_to_search.rb +107 -0
data/lib/string_utility_belt/tags.rb +22 -0
data/lib/string_utility_belt/version.rb +1 -6
data/lib/string_utility_belt.rb +6 -17
data/test/string_utility_belt/entities_test.rb +17 -0
data/test/string_utility_belt/general_test.rb +73 -0
data/test/string_utility_belt/match_rank_test.rb +64 -0
data/test/string_utility_belt/regex_me_helper_test.rb +117 -0
data/test/string_utility_belt/regex_me_to_search_test.rb +106 -0
data/test/string_utility_belt/tags_test.rb +25 -0
data/test/test_helper.rb +5 -0
metadata +30 -22
data/lib/general/general.rb +0 -36
data/lib/html_and_aml/helpers/entities.rb +0 -16
data/lib/html_and_aml/helpers/tags.rb +0 -13
data/lib/html_and_aml/html_and_aml.rb +0 -10
data/lib/regex_me/helpers/string/regex_me.rb +0 -73
data/lib/regex_me/regex_me.rb +0 -84
data/string_utility_belt.gemspec +0 -10

data/.gitignore ADDED

@@ -0,0 +1,6 @@
+*.gemspec
+pkg/**/*
+nbproject/**/*
+*.swp
+coverage*

data/Gemfile ADDED

@@ -0,0 +1,12 @@
+#!/usr/bin/env ruby
+source "http://rubygems.org"
+# Specify your gem's dependencies in string_utility_belt.gemspec
+gemspec
+gem "htmlentities", "4.3.0"
+group "test", "developement" do
+  gem "rcov"
+  gem "ruby-debug"
+end

data/Gemfile.lock ADDED

@@ -0,0 +1,28 @@
+PATH
+  remote: .
+  specs:
+    string_utility_belt (0.2.3)
+GEM
+  remote: http://rubygems.org/
+  specs:
+    columnize (0.3.3)
+    htmlentities (4.3.0)
+    linecache (0.46)
+      rbx-require-relative (> 0.0.4)
+    rbx-require-relative (0.0.5)
+    rcov (0.9.9)
+    ruby-debug (0.10.4)
+      columnize (>= 0.1)
+      ruby-debug-base (~> 0.10.4.0)
+    ruby-debug-base (0.10.4)
+      linecache (>= 0.3)
+PLATFORMS
+  ruby
+DEPENDENCIES
+  htmlentities (= 4.3.0)
+  rcov
+  ruby-debug
+  string_utility_belt!

data/README.markdown ADDED

@@ -0,0 +1,145 @@
+# serradura-string_utility_belt
+## Links
+<a href='http://rubygems.org/gems/string_utility_belt'>http://rubygems.org/gems/string_utility_belt</a>
+<a href="http://github.com/serradura/string_utility_belt">http://github.com/serradura/string_utility_belt</a>
+## Install
+    gem install string_utility_belt
+## Let's code!
+Folks,
+I gathered a pile of String methods I had been developing, and this gem is the result!
+One of its modules turns a String into a Regex; the module is called RegexMe! :p
+Here are some examples:
+    >> require "string_utility_belt"
+    >> "coca cola".regex_me_to_search_ruby
+    => /(coca|cola)/
+What is this useful for?
+Imagine you have the following collection:
+    minha_colecao = %w{carro caminhão moto lancha avião banana bonono benene}
+And you want to select the words that contain "car" or "mo":
+     minha_colecao.select { |item| item =~ "car mo".regex_me_to_search_ruby }
+     #=> ["carro", "moto"]
+But what if you want the words that end with the letter "a" or that match b*n*n*?
+     minha_colecao.select { |item| item =~ "*a b*n*n*".regex_me_to_search_ruby }
+     #=> ["lancha", "banana", "bonono", "benene"]
+Let's try only the words that end with the letter "a":
+     minha_colecao.select { |item| item =~ "*a".regex_me_to_search_ruby }
+     #=> ["lancha", "banana"]
+Words that start with the letter "m":
+     minha_colecao.select { |item| item =~ "m*".regex_me_to_search_ruby }
+     #=> ["moto"]
+Wow, do you see the possibilities?
+You can also do:
+     minha_colecao.select { |item| item =~ "m* car *a b*n*".regex_me_to_search_ruby }
+     #=> ["carro", "moto", "lancha", "banana", "bonono", "benene"]
+You can also pass some parameters to build smarter Regexes:
+Regexes that ignore case:
+      minha_colecao.select { |item| item =~ "N".regex_me_to_search_ruby(:case_insensitive => true) }
+      #=> ["caminhão", "lancha", "banana", "bonono", "benene"]
+Matching exact phrases regardless of upper or lower case:
+      ["Ruby Rails", "Ruby on Rails", "Ruby - Rails"].select { |item| item =~ "ruby rails".regex_me_to_search_ruby(:case_insensitive => true, :exact_phrase => true) }
+      #=> ["Ruby Rails", "Ruby - Rails"]
+With :exact_phrase, the words of the phrase may be separated by any run of characters
+other than letters (upper or lower case), digits and the "_" char.
+I based this on the rule Twitter and Google apply when we search with double quotes.
+Ex: "Ruby Rails"
+Matching exact words:
+      minha_colecao.select { |item| item =~ "car".regex_me_to_search_ruby(:exact_word => true) }
+      #=> []
+If I want the words that contain "car":
+      minha_colecao.select { |item| item =~ "car".regex_me_to_search_ruby }
+      #=> ["carro"]
+Now suppose the user wants to match the word "estágio",
+but noticed that the texts being searched contain the word both with and without accents:
+      palavras = %w{estagio estágio éstágio estagió estagios}
+      palavras.select { |palavra| palavra =~ "estágio".regex_me_to_search_ruby }
+      #=> ["estágio"]
+And what if I want to match the words regardless of accents?
+      palavras.select { |palavra| palavra =~ "estágio".regex_me_to_search_ruby(:latin_chars_variation => true)}
+      #=> ["estagio", "estágio", "éstágio", "estagió", "estagios"]
+But what if I want only "estágio", ignoring "estagios" for example?
+      palavras.select { |palavra| palavra =~ "estágio".regex_me_to_search_ruby(:latin_chars_variation => true, :exact_word => true)}
+      #=> ["estagio", "estágio", "éstágio", "estagió"]
+Now let's step away from plain Ruby and think about a Rails application.
+Imagine you have a search field where the user can type several words (just like Google), and the result should return the texts that contain what the user typed.
+Imagine you are in the controller (although this logic should live in the model! :D)
+       @textos = Texto.all(:conditions => ["texto REGEXP ?", params[:busca].regex_me_to_search_mysql])
+For now the regexes are ready for MySQL, and you can use all the parameters presented above!
+Do you see what the application gains?
+The user can type in the form:
+       car* *a c*r*
+With just that, you give the user the power to run smarter searches, and you only call one method!
+I'm not sure I should say this... but many programmers build dynamic searches with the LIKE operator, and monstrosities like this come out:
+        SELECT * FROM TEXTOS
+        WHERE texto LIKE "%CARRO%"
+        OR texto LIKE "%MOTO%"
+        OR texto LIKE "%AVIAO%"
+With <b>string_utility_belt</b>, this is generated instead:
+      SELECT * FROM TEXTOS WHERE texto REGEXP "(CARRO|MOTO|AVIAO)"
+Which is much smarter and more powerful!
+That's it!
+The API has other quite interesting features...
+But to start with I will document only the RegexMe module!
+And if you want to contribute...
+Fork the project, send your code and I will publish it in the gem.
+Cheers,
+Serradura

data/Rakefile CHANGED

@@ -1,35 +1,19 @@
-#
-# To change this template, choose Tools | Templates
-# and open the template in the editor.
-require 'rubygems'
+require 'rubygems' if RUBY_VERSION < '1.9'
 require 'rake'
 require 'rake/clean'
-require 'rake/gempackagetask'
-require 'rake/rdoctask'
 require 'rake/testtask'
-require 'spec/rake/spectask'
-spec = Gem::Specification.new do |s|
-  s.name = 'serradura-string_utility_belt'
-  s.version = '0.0.1'
-  s.has_rdoc = true
-  s.extra_rdoc_files = ['README', 'LICENSE']
-  s.summary = 'Your summary here'
-  s.description = s.summary
-  s.author = ''
-  s.email = ''
-  # s.executables = ['your_executable_here']
-  s.files = %w(LICENSE README Rakefile) + Dir.glob("{bin,lib,spec}/**/*")
-  s.require_path = "lib"
-  s.bindir = "bin"
+require 'bundler/gem_tasks'
+begin
+  require 'rake/rdoctask'
+rescue
+  require 'rdoc/task'
 end
-Rake::GemPackageTask.new(spec) do |p|
-  p.gem_spec = spec
-  p.need_tar = true
-  p.need_zip = true
+begin
+  require 'rcov/rcovtask'
+rescue
+  require 'rcov/task'
 end
 Rake::RDocTask.new do |rdoc|
@@ -41,11 +25,35 @@ Rake::RDocTask.new do |rdoc|
   rdoc.options << '--line-numbers'
 end
-Rake::TestTask.new do |t|
-  t.test_files = FileList['test/**/*.rb']
+namespace :test do
+  Rake::TestTask.new do |t|
+    t.test_files = FileList['test/**/*.rb']
+    t.name = 'all'
+  end
 end
-Spec::Rake::SpecTask.new do |t|
-  t.spec_files = FileList['spec/**/*.rb']
-  t.libs << Dir["lib"]
-end
+def run_coverage(files)
+  rm_f "coverage"
+  rm_f "coverage.data"
+  # turn the files we want to run into a  string
+  if files.length == 0
+    puts "No files were specified for testing"
+    return
+  end
+  files = files.join(" ")
+  exclude = '--exclude "usr/*"'
+  rcov = "rcov -Ilib:test --sort coverage --text-report #{exclude} --aggregate coverage.data"
+  cmd = "#{rcov} #{files}"
+  sh cmd
+end
+namespace :test do
+  desc 'Measures test coverage'
+  task :rcov do
+    run_coverage Dir["test/string_utility_belt/**/*.rb"]
+  end
+end
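
A note on the two `begin`/`rescue` fallbacks in this Rakefile: a bare `rescue` only catches `StandardError`, while a failed `require` raises `LoadError`, which is not a `StandardError`, so the fallback branches would likely never run. The sketch below shows one way such a fallback could be written so that it triggers. The helper `require_first_available` and the library name `no_such_library_for_this_sketch` are hypothetical and not part of the gem.

```ruby
# Hypothetical helper (not from the gem): try each library name in order
# and return the first one that loads.
def require_first_available(*names)
  names.each do |name|
    begin
      require name
      return name
    rescue LoadError
      # LoadError descends from ScriptError, so a bare `rescue` would miss it.
      next
    end
  end
  raise LoadError, "none of #{names.join(', ')} could be loaded"
end

# The first name is deliberately bogus; "set" ships with Ruby.
loaded = require_first_available("no_such_library_for_this_sketch", "set")
puts loaded
```

Applied to the Rakefile, the same idea would amount to writing `rescue LoadError` in place of the bare `rescue`.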

data/lib/string_utility_belt/entities.rb ADDED

@@ -0,0 +1,23 @@
+require 'htmlentities'
+module StringUtilityBelt
+  module Entities
+    CODER = HTMLEntities.new
+    def generate_entities
+      CODER.encode(self)
+    end
+    def decode_entities
+      CODER.decode(self)
+    end
+    def decode_entities_and_cleaner
+      decode_entities.tag_cleaner
+    end
+  end
+end
+class String
+  include StringUtilityBelt::Entities
+end

data/lib/string_utility_belt/general.rb ADDED

@@ -0,0 +1,72 @@
+require 'string_utility_belt/regex_me_to_search'
+module StringUtilityBelt
+  module General
+    class GENERAL
+      CASE_INSENSITIVE_OPT = {:case_insensitive => true}
+      def have_this_words?(string, words_to_match, options)
+        @string    = string
+        @arguments = options
+        for word in words_to_match
+          return false if string_does_not_match_with_this_word_pattern?(word)
+        end
+        return true
+      end
+      private
+      def string_does_not_match_with_this_word_pattern?(word)
+        @string !~ word.regex_me_to_search_ruby(arguments)
+      end
+      def arguments
+        if is_boolean?
+          CASE_INSENSITIVE_OPT.merge({:exact_word => @arguments})
+        elsif is_hash?
+          @arguments.merge(CASE_INSENSITIVE_OPT)
+        end
+      end
+      def is_boolean?
+        @arguments.instance_of?(FalseClass) || @arguments.instance_of?(TrueClass)
+      end
+      def is_hash?
+        @arguments.instance_of?(Hash)
+      end
+    end
+    WORD_PATTERN      = /\w[\w\'\-]*/
+    ANY_SPACE_PATTERN = /\s+/
+    SIMPLE_SPACE = " "
+    def words
+      self.scan(WORD_PATTERN)
+    end
+    def simple_space
+      self.gsub(ANY_SPACE_PATTERN, SIMPLE_SPACE)
+    end
+    def simple_space!
+      self.gsub!(ANY_SPACE_PATTERN, SIMPLE_SPACE)
+    end
+    def have_this_words?(words_to_match, options = false)
+      i = GENERAL.new
+      i.have_this_words?(self, words_to_match, options)
+    end
+    def not_have_this_words?(words_to_match, options = false)
+      i = GENERAL.new
+      !i.have_this_words?(self, words_to_match, options)
+    end
+  end
+end
+class String
+  include StringUtilityBelt::General
+end
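
The `words` and `simple_space` helpers added in this file boil down to one `scan` and one `gsub`. A minimal standalone sketch, using the same two patterns but plain methods instead of the `String` mixin; the names `words_of` and `squeeze_spaces` are made up for illustration:

```ruby
# Same patterns as StringUtilityBelt::General, copied from the diff.
WORD_PATTERN      = /\w[\w\'\-]*/
ANY_SPACE_PATTERN = /\s+/

# Hypothetical name: returns the words of a text, keeping inner ' and -.
def words_of(text)
  text.scan(WORD_PATTERN)
end

# Hypothetical name: collapses every run of whitespace into one space.
def squeeze_spaces(text)
  text.gsub(ANY_SPACE_PATTERN, " ")
end

p words_of("don't stop  -  well-known trick")
# => ["don't", "stop", "well-known", "trick"]
p squeeze_spaces("a \t b\n\nc")
# => "a b c"
```

One thing to keep in mind with the in-place variant: `String#gsub!` returns `nil` when nothing was substituted, so `simple_space!` should not be chained on a string that contains no whitespace.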

data/lib/{match_rank → string_utility_belt}/match_rank.rb RENAMED

@@ -1,16 +1,33 @@
+require 'string_utility_belt/regex_me_to_search'
-module MatchRank
+module StringUtilityBelt
+  module MatchRank
-  def match_and_score_by words_to_match
-    freq = self.total_frequency_by words_to_match
-    statistic = {:exact => freq[:exact].to_f, :matched => freq[:matched].to_f, :precision => 0.0}
-    statistic[:precision] = (statistic[:exact] / statistic[:matched]) * 100
+    def total_frequency_by words_to_match
+      frequency_by(words_to_match, 0, 0) do |freq, word_to_match, word|
+        freq[:exact]   += 1 if word =~ word_to_match.regex_me_to_search_ruby(:exact_word => true  , :case_insensitive => true)
+        freq[:matched] += 1 if word =~ word_to_match.regex_me_to_search_ruby(:exact_word => false , :case_insensitive => true)
+      end
+    end
-    return statistic
-  end
+    def words_frequency_by words_to_match
+      frequency_by(words_to_match, Hash.new(0), Hash.new(0)) do |freq, word_to_match, word|
+        freq[:exact][word_to_match]   += 1 if word =~ word_to_match.regex_me_to_search_ruby(:exact_word => true  , :case_insensitive => true)
+        freq[:matched][word_to_match] += 1 if word =~ word_to_match.regex_me_to_search_ruby(:exact_word => false , :case_insensitive => true)
+      end
+    end
+    def match_and_score_by words_to_match
+      freq = self.total_frequency_by words_to_match
+      statistic = {:exact => freq[:exact].to_f, :matched => freq[:matched].to_f, :precision => 0.0}
+      statistic[:precision] = (statistic[:exact] / statistic[:matched]) * 100
+      return statistic
+    end
+    private
-  private
     def frequency_by words_to_match, frequency_object_a, frequency_object_b
       self_words = self.words
       freq = {:exact => frequency_object_a, :matched => frequency_object_b}
@@ -24,20 +41,9 @@ module MatchRank
       return freq
     end
-  public
-    def words_frequency_by words_to_match
-      frequency_by(words_to_match, Hash.new(0), Hash.new(0)) do |freq, word_to_match, word|
-          freq[:exact][word_to_match]   += 1 if word =~ word_to_match.regex_me_to_search_ruby(:exact_word => true  , :case_insensitive => true)
-          freq[:matched][word_to_match] += 1 if word =~ word_to_match.regex_me_to_search_ruby(:exact_word => false , :case_insensitive => true)
-      end
-    end
-    def total_frequency_by words_to_match
-      frequency_by(words_to_match, 0, 0) do |freq, word_to_match, word|
-        freq[:exact]   += 1 if word =~ word_to_match.regex_me_to_search_ruby(:exact_word => true  , :case_insensitive => true)
-        freq[:matched] += 1 if word =~ word_to_match.regex_me_to_search_ruby(:exact_word => false , :case_insensitive => true)
-      end
-    end
+  end
 end
+class String
+  include StringUtilityBelt::MatchRank
+end
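
`match_and_score_by` computes precision as exact matches divided by all matches, times 100. Because both counts go through `to_f`, a string with no matches at all yields `0.0 / 0.0`, which is `NaN` in Ruby rather than an error. Below is a hedged sketch of the same arithmetic with an explicit guard; the helper name `precision_percent` and the choice to return `0.0` for the empty case are assumptions, not the gem's behavior.

```ruby
# Hypothetical helper: percentage of matches that were exact-word matches.
def precision_percent(exact, matched)
  return 0.0 if matched.zero?   # assumption: treat "no matches" as 0%
  (exact.to_f / matched) * 100
end

puts precision_percent(1, 4)      # 25.0
puts precision_percent(0, 0)      # 0.0
puts((0.0 / 0.0).nan?)            # true: what the unguarded division gives
```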

data/lib/string_utility_belt/regex_me_helper.rb ADDED

@@ -0,0 +1,100 @@
+# coding: utf-8
+module RegexMe
+  module Helper
+    A_VARIATIONS = "(a|à|á|â|ã|ä)"
+    E_VARIATIONS = "(e|è|é|ê|ë)"
+    I_VARIATIONS = "(i|ì|í|î|ï)"
+    O_VARIATIONS = "(o|ò|ó|ô|õ|ö)"
+    U_VARIATIONS = "(u|ù|ú|û|ü)"
+    C_VARIATIONS = "(c|ç)"
+    N_VARIATIONS = "(n|ñ)"
+    LATIN_CHARS_VARIATIONS = [A_VARIATIONS,
+                              E_VARIATIONS,
+                              I_VARIATIONS,
+                              O_VARIATIONS,
+                              U_VARIATIONS,
+                              C_VARIATIONS,
+                              N_VARIATIONS]
+   BORDER_TO = {
+                 :ruby => {:left => '\b', :right => '\b' },
+                 :mysql => {:left => '[[:<:]]', :right => '[[:>:]]' }
+               }
+    def regex_latin_ci_list
+      memo = ""
+      self.each_char do |char|
+        changed = false
+        for variations in LATIN_CHARS_VARIATIONS
+          variations_pattern = Regexp.new(variations, Regexp::IGNORECASE)
+          if char =~ variations_pattern
+            changed = true
+            memo.insert(-1, variations)
+            break
+          end
+        end
+        memo.insert(-1, char) unless changed
+      end
+      self.replace(memo)
+    end
+    def regex_builder(options)
+      if options[:any]
+        replace_the_any_char_per_any_pattern
+      end
+      if options[:latin_chars_variations]
+        replace_chars_includeds_in_latin_variation_list
+      end
+      if options[:border]
+        insert_border(options[:border])
+      end
+      if options[:or]
+        insert_OR
+      end
+      return self
+    end
+    private
+    def replace_the_any_char_per_any_pattern
+      self.gsub!(/\*/, '.*')
+    end
+    def replace_chars_includeds_in_latin_variation_list
+      self.regex_latin_ci_list
+    end
+    def insert_border(options)
+      border = BORDER_TO[options[:to]]
+      case options[:direction]
+      when :left
+        self.insert(0, border[:left])
+      when :right
+        self.insert(-1, border[:right])
+      when :both
+        self.insert(0, border[:left]).insert(-1, border[:right])
+      else
+        self
+      end
+    end
+    def insert_OR
+      self.insert(-1, "|")
+    end
+  end
+end
+class String
+  include RegexMe::Helper
+end
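
To make the accent expansion in `regex_latin_ci_list` easier to follow, here is a small standalone sketch of the same idea: every character that belongs to one of the variation groups is replaced by an alternation over the whole group. It is not the gem's code. It does not mutate the receiver, it escapes all other characters with `Regexp.escape` (the original inserts them as they are), and the names `LATIN_GROUPS` and `latin_variation_source` are hypothetical. It also assumes precomposed accented characters and a Ruby version whose `String#downcase` handles non-ASCII letters.

```ruby
# One string per variation group, mirroring the *_VARIATIONS constants.
LATIN_GROUPS = %w[aàáâãä eèéêë iìíîï oòóôõö uùúûü cç nñ].freeze

# Hypothetical helper: builds a regex source in which each accented or
# plain vowel (plus c and n) matches any member of its group.
def latin_variation_source(text)
  text.each_char.map do |char|
    group = LATIN_GROUPS.find { |chars| chars.include?(char.downcase) }
    group ? "(#{group.chars.join('|')})" : Regexp.escape(char)
  end.join
end

pattern = Regexp.new(latin_variation_source("estágio"))
p %w[estagio estágio éstágio estagió].select { |word| word =~ pattern }
```

Every word in the list should be selected, since each differs from "estágio" only by accents on characters covered by a group.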

data/lib/string_utility_belt/regex_me_to_search.rb ADDED

@@ -0,0 +1,107 @@
+require 'string_utility_belt/regex_me_helper'
+module StringUtilityBelt
+  module RegexMe
+    EMPTYs = {:ruby => //, :mysql => ''}
+    WORDS_INTERVAL_PATTERN_FOR_EXACT_PHRASES = '[^0-9a-zA-Z\_]+'
+    module To
+      module Search
+        def regex_me_to_search_ruby(options = {})
+          regex_me_to_search(:ruby, options)
+        end
+        def regex_me_to_search_mysql(options = {})
+          regex_me_to_search(:mysql, options)
+        end
+        private
+        def options_handler(options)
+          handled = \
+          {:case_insensitive  => (options[:case_insensitive] ? Regexp::IGNORECASE : nil ),
+           :multiline  => (options[:multiline] ? Regexp::MULTILINE : nil ),
+           :or => (options[:or] == false ? false : true)}
+          return options.merge(handled)
+        end
+        def regex_me_to_search(env, options)
+          return EMPTYs[env] if self.strip.empty?
+          execute_builder(env, options)
+        end
+        def execute_builder(env, options)
+          opt_handled = options_handler(options)
+          builder_result = builder(env, opt_handled)
+          case env
+          when :ruby
+            options = [opt_handled[:case_insensitive], opt_handled[:multiline]].compact
+            Regexp.new(builder_result, *options)
+          when :mysql
+            builder_result
+          end
+        end
+        def builder(border_to, options)
+          string = self
+          lcv = options[:latin_chars_variations]
+          if options[:exact_phrase]
+            @regexp = \
+              string \
+               .strip.simple_space \
+               .regex_latin_ci_list \
+               .gsub(/\s/, WORDS_INTERVAL_PATTERN_FOR_EXACT_PHRASES) \
+               .regex_builder(:or => false,
+                              :border => {:to => border_to,
+                                          :direction  => :both})
+          else
+            @regexp = '('
+            for word in string.strip.split
+              if options[:exact_word]
+                @regexp << word.regex_builder(:border => {:to => border_to, :direction => :both}, :latin_chars_variations => lcv, :or => true)
+              elsif have_the_any_char?(word)
+                @regexp << word.regex_builder(:any => true, :border => border(border_to, word) , :latin_chars_variations => lcv, :or => true)
+              else
+                @regexp << word.regex_builder(:latin_chars_variations => lcv, :or => true)
+              end
+            end
+            @regexp = (@regexp << ')').sub!(/\|\)/,')')
+          end
+          return @regexp
+        end
+       def have_the_any_char?(string)
+         string.include?('*')
+       end
+        def border(to, word)
+          direction = nil
+          case word
+          when/^\*/
+            direction = :right
+          when /\*$/
+            direction = :left
+          when /^.*\*.*$/
+            direction = :both
+          end
+          {:to => to, :direction => direction}
+        end
+      end
+    end
+  end
+end
+class String
+  include StringUtilityBelt::RegexMe::To::Search
+end
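
The wildcard and border rules in `builder` and `border` can be summarized in a much smaller standalone function. The sketch below is a simplified re-implementation for illustration only: it covers the `:ruby` target, the `*` wildcard, `exact_word` and `case_insensitive`, and leaves out `:exact_phrase`, the accent expansion and the MySQL borders. The name `search_regex` and its keyword arguments are hypothetical. Like the original, it does not escape other regex metacharacters in the query.

```ruby
# Hypothetical, simplified version of the search-regex construction.
def search_regex(query, exact_word: false, case_insensitive: false)
  return // if query.strip.empty?

  alternatives = query.strip.split.map do |word|
    if exact_word
      "\\b#{word}\\b"
    elsif word.include?("*")
      body = word.gsub("*", ".*")
      if word.start_with?("*")
        "#{body}\\b"          # leading *: only the right edge is anchored
      elsif word.end_with?("*")
        "\\b#{body}"          # trailing *: only the left edge is anchored
      else
        "\\b#{body}\\b"       # * in the middle: both edges are anchored
      end
    else
      word                    # plain word: substring match
    end
  end

  flags = case_insensitive ? Regexp::IGNORECASE : 0
  Regexp.new("(#{alternatives.join('|')})", flags)
end

colecao = %w[carro caminhão moto lancha avião banana bonono benene]
p colecao.select { |item| item =~ search_regex("car mo") }  # ["carro", "moto"]
p colecao.select { |item| item =~ search_regex("*a") }      # ["lancha", "banana"]
p colecao.select { |item| item =~ search_regex("m*") }      # ["moto"]
```

One detail worth checking against the README in this same diff: `builder` reads `options[:latin_chars_variations]` (plural), while the README examples pass `:latin_chars_variation`, so those calls as written would not appear to switch the accent expansion on.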

data/lib/string_utility_belt/tags.rb ADDED

@@ -0,0 +1,22 @@
+module StringUtilityBelt
+  module Tags
+    EMPTY_STR = ''
+    TAG_PATTERN = /<[^<]*?>/
+    # Tags available as of 09/2010 - SOURCE: http://www.w3schools.com/tags/default.asp
+    ANY_HTML_TAG_PATTERN = /<\/?(a|p|abbr|acronym|address|applet|area|b|base|basefont|bdo|big|blockquote|body|br|button|caption|center|cite|code|col|colgroup|dd|del|dfn|dir|div|dl|dt|em|fieldset|font|form|frame|frameset|h6|head|hr|html|i|iframe|img|input|ins|isindex|kbd|label|legend|li|link|map|menu|meta|noframes|noscript|object)[^>]+??>/im
+    def tag_cleaner
+      self.gsub(TAG_PATTERN, EMPTY_STR)
+    end
+    def html_tag_cleaner
+      self.gsub(ANY_HTML_TAG_PATTERN, EMPTY_STR)
+    end
+  end
+end
+class String
+  include StringUtilityBelt::Tags
+end
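
`tag_cleaner` is a single `gsub` with `TAG_PATTERN`. A short standalone sketch of what that pattern removes, with a plain method in place of the `String` mixin; the name `strip_tags` is hypothetical:

```ruby
# Same pattern as StringUtilityBelt::Tags::TAG_PATTERN in the diff.
TAG_PATTERN = /<[^<]*?>/

# Hypothetical name: removes anything that looks like <...>.
def strip_tags(text)
  text.gsub(TAG_PATTERN, "")
end

p strip_tags("<p>Hello <b>world</b></p>")  # => "Hello world"
p strip_tags("1 < 2 and 3 > 2")            # => "1  2"
```

The second call shows the trade-off of a purely pattern-based cleaner: any `<` followed later by a `>` is treated as a tag, so plain text containing comparison signs loses everything in between.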

data/lib/string_utility_belt/version.rb CHANGED

@@ -1,8 +1,3 @@
 module StringUtilityBelt
-	module Version
-		MAJOR  = 0
-		MINOR  = 2
-		PATCH  = 5
-		STRING = "#{MAJOR}.#{MINOR}.#{PATCH}"
-	end
+  VERSION = "0.3.0"
 end