RubyGems - lingua-it-readability - Versions diffs - 1.0.0 - Mend

lingua-it-readability 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

checksums.yaml +7 -0
data/.gitignore +9 -0
data/.rspec +2 -0
data/.travis.yml +4 -0
data/CHANGELOG.md +17 -0
data/Gemfile +4 -0
data/LICENSE.txt +21 -0
data/README.md +44 -0
data/Rakefile +6 -0
data/bin/console +14 -0
data/bin/setup +8 -0
data/lib/lingua/it/paragraph.rb +13 -0
data/lib/lingua/it/readability/version.rb +7 -0
data/lib/lingua/it/readability.rb +140 -0
data/lib/lingua/it/sentence.rb +64 -0
data/lib/lingua/it/syllable.rb +46 -0
data/lingua-it-readability.gemspec +25 -0
metadata +104 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: ca7c5b85336a54accf315881255133c101aef7cd
+  data.tar.gz: d1ffbc35b45fa73951cddad360cfc1b696ce8ea2
+SHA512:
+  metadata.gz: ccc18e702fd8487276d79542117377c436cdd5b7b4d46f108e583290430051f1750808d424f40d9217ba6173d3bf0f0d08f31d6ce47833d15435c1e77176d763
+  data.tar.gz: 59d73c1511929d8bb9ce5a9fe38017e717025612e868e8415d0d6ab9c49e0fcae0d5f35d4a61444d034208b03cf045f274b744af978d7b53808024688b8c4444
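
`checksums.yaml` records SHA1 and SHA512 digests of the `metadata.gz` and `data.tar.gz` entries packed inside the `.gem` archive. The sketch below shows one way to recompute one of them. It assumes the `.gem` file has already been unpacked (it is a plain tar archive), so that `data.tar.gz` is on disk. The helper names `sha512_of` and `checksum_matches?` are hypothetical.

```ruby
require "digest"

# Hex SHA512 digest of a file, read from disk by Digest.
def sha512_of(path)
  Digest::SHA512.file(path).hexdigest
end

# True when the file matches a digest copied from checksums.yaml
# (comparison is case-insensitive and ignores surrounding whitespace).
def checksum_matches?(path, expected_hex)
  sha512_of(path) == expected_hex.strip.downcase
end

# Hypothetical usage, assuming data.tar.gz was extracted from the .gem:
# checksum_matches?("data.tar.gz",
#   "59d73c1511929d8bb9ce5a9fe38017e717025612e868e8415d0d6ab9c49e0fcae0d5f35d4a61444d034208b03cf045f274b744af978d7b53808024688b8c4444")
```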

data/.gitignore ADDED

@@ -0,0 +1,9 @@
+/.bundle/
+/.yardoc
+/Gemfile.lock
+/_yardoc/
+/coverage/
+/doc/
+/pkg/
+/spec/reports/
+/tmp/

data/.rspec ADDED

@@ -0,0 +1,2 @@
+--format documentation
+--color

data/.travis.yml ADDED

@@ -0,0 +1,4 @@
+language: ruby
+rvm:
+  - 2.3.0
+before_install: gem install bundler -v 1.11.2

data/CHANGELOG.md ADDED

@@ -0,0 +1,17 @@
+#### 1.0.0 - 2016-02-09
+###### Added
+- Some more tests.
+#### 0.6.0 - 2016-02-08
+###### Added
+- Types of text.
+- Some more tests.
+#### 0.5.0 - 2016-02-05
+###### Added
+- Initial release.
+- Sentence recognition
+- Italian abbreviations
+- Syllable recognition
+- Gulpease readability index
+- Italian Flesch readability index

data/Gemfile ADDED

@@ -0,0 +1,4 @@
+source 'https://rubygems.org'
+# Specify your gem's dependencies in lingua-it-readability.gemspec
+gemspec

data/LICENSE.txt ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright (c) 2016 Andrea Giacomo Baldan
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,44 @@
+[![Build Status](https://travis-ci.org/codepr/lingua.svg?branch=master)](https://travis-ci.org/codepr/lingua)
+# Lingua::It::Readability
+Inspired by Lingua::EN::Readability and its original Perl counterpart, Lingua::EN::Fathom, this gem focuses on the readability of Italian texts.
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem 'lingua-it-readability'
+```
+And then execute:
+    $ bundle
+Or install it yourself as:
+    $ gem install lingua-it-readability
+## Usage
+TODO: Write usage instructions here
+## Development
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+## Changelog
+See the [CHANGELOG](CHANGELOG.md) file.
+## Contributing
+Bug reports and pull requests are welcome on GitHub at https://github.com/codepr/lingua-it-readability.
+## License
+The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).

data/Rakefile ADDED

@@ -0,0 +1,6 @@
+require "bundler/gem_tasks"
+require "rspec/core/rake_task"
+RSpec::Core::RakeTask.new(:spec)
+task :default => :spec

data/bin/console ADDED

@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+require "bundler/setup"
+require "lingua/it/readability"
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+require "irb"
+IRB.start

data/bin/setup ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+IFS=$'\n\t'
+set -vx
+bundle install
+# Do any other automated setup that you need to do here

data/lib/lingua/it/paragraph.rb ADDED

@@ -0,0 +1,13 @@
+module Lingua
+  module IT
+    module Paragraph
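+      # Split the sample into paragraphs. The text is split on every run of
+      # newlines (each optionally followed by \r, tab or space characters),
+      # so a single newline already starts a new paragraph; each piece is
+      # then stripped, which also removes the \r of Windows line endings.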
+      def self.paragraphs(text)
+        text.dup.split(/(?:\n[\r\t ]*)+/).collect { |p| p.strip }
+      end
+    end
+  end
+end
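
The pattern `(?:\n[\r\t ]*)+` matches any run of one or more newlines. A single newline is therefore enough to start a new paragraph, and blank lines are not required. The sketch below reproduces the same regex in a standalone method so the behaviour is easy to check. The method name `split_paragraphs` is hypothetical and not part of the gem.

```ruby
# Same pattern as Lingua::IT::Paragraph.paragraphs: split on every run of
# newlines (each optionally followed by \r, tab or space), then strip.
def split_paragraphs(text)
  text.split(/(?:\n[\r\t ]*)+/).map(&:strip)
end

# split_paragraphs("Prima riga.\nSeconda riga.\n\n  Terzo blocco.")
# => ["Prima riga.", "Seconda riga.", "Terzo blocco."]
```

The first two lines come out as separate paragraphs even though no blank line separates them.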

data/lib/lingua/it/readability/version.rb ADDED

@@ -0,0 +1,7 @@
+module Lingua
+  module It
+    module Readability
+      VERSION = "1.0.0"
+    end
+  end
+end

data/lib/lingua/it/readability.rb ADDED

@@ -0,0 +1,140 @@
+# coding: utf-8
+prefix = File.dirname(__FILE__) + "/"
+$LOAD_PATH.unshift prefix
+Dir.glob(prefix + "**/*.rb").each do |f|
+  require File.expand_path(f)
+end
+module Lingua
+  module IT
+    class Readability
+      attr_reader :text
+      attr_reader :type
+      attr_reader :paragraphs
+      attr_reader :sentences
+      attr_reader :words
+      attr_reader :frequencies
+      # Initialize the sample with +text+
+      def initialize(text, type = 'standard')
+        @text                = text.dup
+        @type                = type
+        @paragraphs          = Lingua::IT::Paragraph.paragraphs(self.text)
+        @sentences           = Lingua::IT::Sentence.sentences(self.text, self.type)
+        @words               = []
+        @frequencies         = {}
+        @frequencies.default = 0
+        @syllables           = Lingua::IT::Syllable.syllables(self.text)
+        count_words
+      end
+      # The number of paragraphs in the sample, as split by
+      # Lingua::IT::Paragraph (every run of newlines starts a new paragraph).
+      def num_paragraphs
+        paragraphs.length
+      end
+      # The number of sentences in the sample. The meaning of a "sentence" is
+      # defined by Lingua::IT::Sentence.
+      def num_sentences
+        @sentences.length
+      end
+      # The number of characters in the sample: letters, digits, underscores,
+      # Italian accented vowels and brackets, ignoring punctuation and spaces.
+      def num_chars
+        @text.dup.gsub(/[[:punct:]][[:space:]]/, '').scan(/[a-zA-Z0-9_Èàòèéìù\(\)\[\]\{\}]/i).length
+      end
+      alias :num_characters :num_chars
+      # The number of words in the sample. A word is a sequence of
+      # characters delimited by punctuation and whitespace; see the private
+      # method +count_words+ for the exact definition.
+      def num_words
+        words.length
+      end
+      # The total number of syllables in the text sample. Syllables are defined
+      # in Lingua::IT::Syllable.
+      def num_syllables
+        @syllables.length
+      end
+      # The number of unique words used in the text sample.
+      def num_unique_words
+        @frequencies.keys.length
+      end
+      # An array containing each unique word used in the text sample.
+      def unique_words
+        @frequencies.keys
+      end
+      # The number of occurrences of the word +word+ in the text sample.
+      def occurrences(word)
+        @frequencies[word]
+      end
+      # The average number of words per sentence.
+      def words_per_sentence
+        ((words.length.to_f / sentences.length.to_f) * 100).round / 100.0
+      end
+      # The average number of syllables per word. The syllable count is
+      # performed by Lingua::IT::Syllable, and so may not be completely
+      # accurate
+      def syllables_per_word
+        ((@syllables.length.to_f / words.length.to_f) * 100).round / 100.0
+      end
+      # Gulpease readability index, calibrated specifically for Italian
+      # text samples.
+      # A score below 40 indicates a hard-to-read sample, 40 to 60 a
+      # moderately readable one, and above 60 a well-written sample that
+      # readers under 16 can read easily.
+      def gulpease
+        89 + (((300 * num_sentences) - (10 * num_chars)) / num_words)
+      end
+      # Flesch readability index adapted to Italian text samples, derived
+      # from the U.S. Flesch index.
+      # A score below 40 indicates a hard-to-read sample, 40 to 60 a
+      # moderately readable one, and above 60 a well-written sample that
+      # readers under 16 can read easily.
+      def flesch
+        ((206.0 - (65.0 * (num_syllables.to_f / num_words.to_f)) -
+          ((num_words.to_f / num_sentences.to_f))) * 100).round / 100.0
+      end
+      # A nicely formatted report on the sample, showing the most useful
+      # stats.
+      def report
+        sprintf "Number of paragraphs           %d \n" <<
+                "Number of sentences            %d \n" <<
+                "Number of words                %d \n" <<
+                "Number of characters           %d \n\n" <<
+                "Average words per sentence     %.2f \n" <<
+                "Average syllables per word     %.2f \n\n" <<
+                "Gulpease score                 %2.2f \n" <<
+                "Flesch score                   %2.2f \n",
+                num_paragraphs, num_sentences, num_words, num_characters,
+                words_per_sentence, syllables_per_word, gulpease,
+                flesch
+      end
+      private
+      # Split the sample into words and record each word's frequency. A word
+      # is a sequence of characters excluding punctuation, except for
+      # brackets of every kind: (), [] and {}. Being calibrated for Italian,
+      # it also takes accented characters into account.
+      def count_words
+        @words = @text.dup.gsub(/[^\wÈèòàù\(\)\[\]\{\}]/i, ' ').strip.split(/\s+/)
+        @words.each do |word|
+          @frequencies[word] += 1
+        end
+      end
+    end
+  end
+end
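
`num_sentences`, `num_chars` and `num_words` all return Integers. As a result, `gulpease` above performs integer division, which Ruby rounds toward negative infinity, and the score is always a whole number even though `report` prints it with `%2.2f`. The sketch below computes both indexes from raw counts in floating point. The method and keyword names are hypothetical and not part of the gem.

```ruby
# Gulpease index: 89 + (300 * sentences - 10 * letters) / words.
# Floating-point division keeps the fractional part of the score.
def gulpease_index(letters:, words:, sentences:)
  89 + (300.0 * sentences - 10.0 * letters) / words
end

# Italian Flesch index as written in #flesch above:
# 206 - 65 * (syllables / words) - (words / sentences), rounded to 2 places.
def flesch_it(syllables:, words:, sentences:)
  (206.0 - 65.0 * syllables / words - words.to_f / sentences).round(2)
end

# gulpease_index(letters: 45, words: 7, sentences: 1)  # about 67.57
# 89 + (300 * 1 - 10 * 45) / 7                          # Integer division gives 67
# flesch_it(syllables: 20, words: 10, sentences: 2)     # => 71.0
```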

data/lib/lingua/it/sentence.rb ADDED

@@ -0,0 +1,64 @@
+module Lingua
+  module IT
+    class Sentence
+      # Takes Italian text and splits it into sentences, respecting common
+      # abbreviations. More abbreviations can be added to the set with
+      # +abbreviation+.
+      class << self
+        attr_reader :abbreviations
+        attr_reader :abbr_regex
+      end
+      # Common abbreviations
+      TITLES = %w(sig sigg dott preg prof mr jr amn avv co stim dr egr geom ing mons on rag rev soc spett card ill gent cav) unless defined?(TITLES)
+      MISC   = %w(p v femm dim ecc etc corr cc bcc all es fatt g gg id int lett ogg pag pagg cap pp tel ind v n num min sec ms abbr agg art aus) unless defined?(MISC)
+      MONTHS = %w(gen feb mar apr mag giu lug ago set sett ott nov dic) unless defined?(MONTHS)
+      DAYS   = %w(lun mar mer gio ven sab dom) unless defined?(DAYS)
+      # Text types
+      TYPES = {
+        'standard'   => /["']?[A-Z][^.?!]+((?![.?!]['"]?\s["']?[A-Z][^.?!]).)+[.?!'"]+/,
+        'scientific' => /["']?[A-Z][^.;:?!]+((?![.;:?!]['"]?\s["']?[A-Z][^.;:?!]).)+[.;:?!'"]+/
+      }
+      TYPES.default_proc = proc { |hash, key| hash[key] = /["']?[A-Z][^.?!]+((?![.?!]['"]?\s["']?[A-Z][^.?!]).)+[.?!'"]+/ }
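+      # Split the text into sentences. Periods that end a known
+      # abbreviation are first replaced with the temporary marker 0002,
+      # each match of the sentence regex for +type+ gets the end marker 01
+      # appended, the abbreviation periods are restored, and the text is
+      # split on 01. The marker is a safeguard: the regex alone should
+      # already tell most abbreviation periods from real stops. A standard
+      # sentence ends only with ., ? or !; a scientific one may also end
+      # with ; or :.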
+      def self.sentences(text, type = 'standard')
+        txt = text.dup
+        txt.gsub!(/\b(#{@abbr_regex})(\.)\B/i, '\10002')
+        txt.gsub!(/#{TYPES[type]}/, '\2\001')
+        txt.gsub!(/\b(#{@abbr_regex})(0002)/i, '\1.')
+        txt.split(/01/).map { |sentence| sentence.strip }
+      end
+      # Add custom abbreviations to the standard set.
+      def self.abbreviation(*abbreviations)
+        @abbreviations += abbreviations
+        @abbreviations.uniq!
+        set_abbr_regex!
+        @abbreviations
+      end
+      private
+      # Utility method: concatenate all the abbreviation constant arrays.
+      def self.initialize_abbreviations!
+        @abbreviations = TITLES + MISC + MONTHS + DAYS
+        set_abbr_regex!
+      end
+      # Utility method: join all abbreviations with | so they can be used
+      # as a regex alternation.
+      def self.set_abbr_regex!
+        @abbr_regex = "#{@abbreviations.join('|')}"
+      end
+      initialize_abbreviations!
+    end
+  end
+end
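
The gem marks abbreviation periods with `0002` and sentence ends with `01`, then splits on `/01/`. Any `01` already present in the text, such as the year 2001 or a date like 01/02, is therefore also treated as a sentence boundary. The sketch below applies the same protect, split and restore idea. It assumes a Unicode private-use character never occurs in the input, and it replaces the `TYPES` regexes with a simpler lookbehind split. `split_sentences` and the short abbreviation list are hypothetical.

```ruby
# Hypothetical sample; the gem ships a much longer list (TITLES, MISC, ...).
SAMPLE_ABBREVIATIONS = %w[sig dott prof ecc pag].freeze
# Private-use character, assumed never to appear in real input.
PLACEHOLDER = "\uE000"

def split_sentences(text, abbreviations = SAMPLE_ABBREVIATIONS)
  alternation = abbreviations.map { |a| Regexp.escape(a) }.join("|")
  # 1. Protect abbreviation periods: "dott." becomes "dott" + PLACEHOLDER.
  guarded = text.gsub(/\b(#{alternation})\./i) { "#{Regexp.last_match(1)}#{PLACEHOLDER}" }
  # 2. Split after ., ? or ! when followed by whitespace.
  pieces = guarded.split(/(?<=[.?!])\s+/)
  # 3. Restore the protected periods and drop empty fragments.
  pieces.map { |s| s.tr(PLACEHOLDER, ".").strip }.reject(&:empty?)
end

# split_sentences("Il dott. Rossi è arrivato. Poi è andato via!")
# => ["Il dott. Rossi è arrivato.", "Poi è andato via!"]
```

A digit sequence such as 2001 no longer causes a split, because the markers cannot collide with ordinary text.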

data/lib/lingua/it/syllable.rb ADDED

@@ -0,0 +1,46 @@
+# coding: utf-8
+module Lingua
+  module IT
+    module Syllable
+      # This module is inspired by the Perl Lingua::IT::Hyphenation module.
+      # However, it uses a different (though not larger) set of patterns to
+      # compensate for the 'special cases' which arise out of Italian's
+      # irregular orthography. A number of extra patterns (particularly for
+      # derived word forms) means that this module is somewhat more accurate
+      # than the Perl original.
+      V = "[aeiouàèéìòù]"
+      C = "[b-df-hj-np-tv-z]"
+      S = "iut"
+      X = "fi|aci"
+      Y = "#{C}e"
+      Z = "i[aeo]"
+      def self.syllables(text)
+        words = text.dup.split(/[^a-zA-Zàèéìòù'0-9]+/)
+        hyphenation = ""
+        words.each do |word|
+          word.gsub!(/(#{V})(#{S})/i, '\1=iu=t')
+          word.gsub!(/(#{V})(#{Z})/i, '\1=\2')
+          word.gsub!(/(#{X})(#{V})/i, '\1=\2')
+          word.gsub!(/(#{Y})(#{V})/i, '\1=\2')
+          word.gsub!(/(#{V})([bcfgptv][lr])/i, '\1=\2')
+          word.gsub!(/(#{V})([cg]h)/i, '\1=\2')
+          word.gsub!(/(#{V})(gn)/i, '\1=\2')
+          word.gsub!(/(#{C})\1/i, '\1=\1')
+          word.gsub!(/(s#{C})/i, '=\1')
+          1 while word.gsub!(/(#{V}*#{C}+#{V}+)(#{C}#{V})/i, '\1=\2')
+          1 while word.gsub!(/(#{V}*#{C}+#{V}+#{C})(#{C})/i, '\1=\2')
+          word.gsub!(/^(#{V}+#{C})(#{C})/i, '\1=\2')
+          word.gsub!(/^(#{V}+)(#{C}#{V})/i, '\1=\2')
+          word.sub!(/^=/, '')
+          word.sub!(/=$/, '')
+          word.gsub!(/=+/, '=')
+          hyphenation += "#{word}="
+        end
+        hyphenation.split('=')
+      end
+    end
+  end
+end
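
One of the central rules in `syllables` is the loop that repeatedly breaks a vowel group away from a following consonant plus vowel, so `casa` becomes `ca=sa`. The sketch below isolates just that rule, using the module's own `V` and `C` character classes. It deliberately skips the other special cases (diphthongs, `gn`, double consonants and so on), so it is an illustration, not a full syllabifier. The constant and method names are hypothetical.

```ruby
# Same character classes as Lingua::IT::Syllable::V and ::C.
VOWEL     = "[aeiouàèéìòù]"
CONSONANT = "[b-df-hj-np-tv-z]"

# Apply only the "vowels, consonants, vowels | consonant, vowel" rule,
# repeating until gsub! finds nothing more to split (it then returns nil).
def vcv_breaks(word)
  w = word.dup
  1 while w.gsub!(/(#{VOWEL}*#{CONSONANT}+#{VOWEL}+)(#{CONSONANT}#{VOWEL})/i, '\1=\2')
  w
end

# vcv_breaks("casa")                    # => "ca=sa"
# vcv_breaks("patata")                  # => "pa=ta=ta"
# vcv_breaks("casa").split("=").length  # => 2 syllables
```

The loop is needed because `gsub!` does not revisit text it has already matched. On `patata`, the first pass yields only `pa=tata`, and the second pass inserts the remaining break.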

data/lingua-it-readability.gemspec ADDED

@@ -0,0 +1,25 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'lingua/it/readability/version'
+Gem::Specification.new do |spec|
+  spec.name          = "lingua-it-readability"
+  spec.version       = Lingua::It::Readability::VERSION
+  spec.authors       = ["Andrea Giacomo Baldan"]
+  spec.email         = ["a.g.baldan@gmail.com"]
+  spec.summary       = %q{Text readability indexes and stats calibrated on Italian language.}
+  spec.description   = %q{Text readability indexes and stats calibrated on Italian language. Inspired by Lingua::EN::Readability and the original perl module Lingua::EN::Fathom. Gulpease and Flesch for italian text is calculated.}
+  spec.homepage      = "https://github.com/codepr/lingua-it-readability"
+  spec.license       = "MIT"
+  spec.files         = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+  spec.bindir        = "exe"
+  spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+  spec.require_paths = ["lib"]
+  spec.add_development_dependency "bundler", "~> 1.11"
+  spec.add_development_dependency "rake", "~> 10.0"
+  spec.add_development_dependency "rspec", "~> 3.0"
+end

metadata ADDED

@@ -0,0 +1,104 @@
+--- !ruby/object:Gem::Specification
+name: lingua-it-readability
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Andrea Giacomo Baldan
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2016-02-08 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.11'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.11'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+description: Text readability indexes and stats calibrated on Italian language. Inspired
+  by Lingua::EN::Readability and the original perl module Lingua::EN::Fathom. Gulpease
+  and Flesch for italian text is calculated.
+email:
+- a.g.baldan@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".rspec"
+- ".travis.yml"
+- CHANGELOG.md
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- bin/console
+- bin/setup
+- lib/lingua/it/paragraph.rb
+- lib/lingua/it/readability.rb
+- lib/lingua/it/readability/version.rb
+- lib/lingua/it/sentence.rb
+- lib/lingua/it/syllable.rb
+- lingua-it-readability.gemspec
+homepage: https://github.com/codepr/lingua-it-readability
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.5.2
+signing_key:
+specification_version: 4
+summary: Text readability indexes and stats calibrated on Italian language.
+test_files: []