linguakit_ruby 0.1.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 9e669c35d8ace7919707cb34d1c2357d467139a6631861934b9aac8cebe631ca
4
+ data.tar.gz: 9a764b1406cf6c1df1ce6ea50cb72e4b068dc378484ac8af4048101d33c5ea03
5
+ SHA512:
6
+ metadata.gz: 622b1daf4932cf3a957e65cf1c5884c4c29592405297f580d0b8a535185eb55b218856b93aa3cc6fbc9ca607f7253f15ee020c7cfde4d9f3c91ea1ff59114dae
7
+ data.tar.gz: aa87c63e5988040cd66ed9ac36e7a1b526f55d02a6b72d5674c45cfb0e12167099a113156856c9c686122f920110b0deab9b689705f7073c49d11eee9e23554d
data/.gitignore ADDED
@@ -0,0 +1,8 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/*
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at cpqm07@gmail.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source "https://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in linguakit_ruby.gemspec
4
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,22 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ linguakit_ruby (0.1.0)
5
+ fuzzy_match (~> 2.1)
6
+
7
+ GEM
8
+ remote: https://rubygems.org/
9
+ specs:
10
+ fuzzy_match (2.1.0)
11
+ rake (10.5.0)
12
+
13
+ PLATFORMS
14
+ ruby
15
+
16
+ DEPENDENCIES
17
+ bundler (~> 2.0)
18
+ linguakit_ruby!
19
+ rake (~> 10.0)
20
+
21
+ BUNDLED WITH
22
+ 2.1.4
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2020 Christopher Quezada
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,66 @@
1
+ # LinguakitRuby
2
+
3
+ ## Installation
4
+
5
+ Move to the tmp directory
6
+
7
+ ```bash
8
+ cd /tmp
9
+ ```
10
+
11
+ Clone the Linguakit NLP project
12
+
13
+ ```bash
14
+ git clone https://github.com/4Talent/Linguakit
15
+ ```
16
+
17
+ Install with Make (this installs Linguakit into an accessible bin directory):
18
+
19
+ ```bash
20
+ cd Linguakit
21
+ sudo make deps
22
+ sudo make install
23
+ ```
24
+
25
+ Add this line to your application's Gemfile:
26
+
27
+ ```ruby
28
+ gem 'linguakit_ruby'
29
+ ```
30
+
31
+ And then execute:
32
+
33
+ ```bash
34
+ bundle
35
+ ```
36
+
37
+ ## Usage
38
+
39
+ ```ruby
40
+ Linguakit.get_score(principal_items, secondary_items, **opts)
41
+ Linguakit.keyword(input, **args)
42
+ ```
43
+
44
+ ## Example
45
+
46
+ ```ruby
47
+ Linguakit.keyword("Estoy en Santiago, en la casa de mi amigo de Pedro. Luego voy a Valparaiso para pasear en el parque y conocer la plaza.")
48
+ Linguakit.get_score({ data: "FOO" }, { data: ["FOO", "BAR"], type: :arr }, score: 0.8)
49
+ ```
50
+
51
+ ## Contributing
52
+
53
+ Bug reports and pull requests are welcome on GitHub at https://github.com/4talent/linguakit_ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
54
+
55
+ ## License
56
+
57
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
58
+
59
+ ## Code of Conduct
60
+
61
+ Everyone interacting in the LinguakitRuby project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/4talent/linguakit_ruby/blob/master/CODE_OF_CONDUCT.md).
62
+
63
+ ## Thanks to
64
+
65
+ [@citiususc - Linguakit](https://github.com/citiususc/Linguakit)
66
+ [@seamusabshere - FuzzyMatch](https://github.com/seamusabshere/fuzzy_match)
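For orientation, the shape of the value returned by `Linguakit.keyword` can be pictured with the sketch below. It assumes, based on the library source in this release, that the `linguakit` command prints one tab-separated line per result (phrase, rank, composition). The helper name `parse_ranked_lines` and the sample text are hypothetical and only illustrate the structure; real output depends on the installed Linguakit.

```ruby
# Hypothetical helper: turns tab-separated result lines into hashes,
# mirroring how this release maps each line to :phrase, :rank and :composition.
def parse_ranked_lines(raw_output)
  raw_output.split("\n").map do |line|
    phrase, rank, composition = line.split("\t")
    { phrase: phrase, rank: rank.to_f, composition: composition }
  end
end

# Made-up sample text, not captured from a real linguakit run.
sample = "santiago\t12.5\tNOUN\nparque\t7.0\tNOUN\n"

parse_ranked_lines(sample).each do |item|
  puts "#{item[:phrase]} (rank #{item[:rank]})"
end
```

Only the `:phrase` values are used later when two inputs are compared with `get_score`.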
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "linguakit_ruby"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/linguakit_ruby/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module LinguakitRuby
2
+ VERSION = "0.1.1"
3
+ end
data/lib/linguakit_ruby.rb ADDED
@@ -0,0 +1,112 @@
1
+ require "linguakit_ruby/version"
2
+ require "fuzzy_match"
3
+ require 'tempfile'
4
+ require 'pry'
5
+ require 'open3'
6
+ require 'awesome_print'
7
+
8
+ module Linguakit
9
+ require_relative './railtie' if defined? Rails
10
+
11
+ class << self
12
+ DEFAULT_COMMAND = "linguakit %{module} %{lang} %{input}"
13
+ DEFAULT_COMMAND_STR = "linguakit %{module} %{lang} '%{input}' %{options}"
14
+
15
+ def sentiment input, **args
16
+ # -s = input is a string and not a file
17
+ config = {
18
+ module: 'sent',
19
+ input: input,
20
+ lang: args[:lang] || 'es',
21
+ options: args[:opts]
22
+ }
23
+ command = args[:opts] == '-s' ? DEFAULT_COMMAND_STR : DEFAULT_COMMAND
24
+ result = Open3.capture3 command % config
25
+ {
26
+ emotion: result[0].split("\t")[1],
27
+ point: result[0].split("\t")[2].split("\n")[0].to_f
28
+ }
29
+ end
30
+
31
+ def keyphrases input, **args
32
+ # -s = input is a string and not a file
33
+ # -chi = chi-square co-occurrence measure
34
+ # -log = loglikelihood
35
+ # -scp = symmetrical conditional probability
36
+ # -mi = mutual information
37
+ # -cooc = co-occurrence counting
38
+ config = {
39
+ module: 'mwe',
40
+ input: str_to_file(input),
41
+ lang: args[:lang] || 'es',
42
+ options: args[:opts] || '-chi'
43
+ }
44
+ result = Open3.capture3 DEFAULT_COMMAND % config
45
+ items = result[0].split("\n")
46
+ items.map{|item|
47
+ object = item.split("\t")
48
+ {
49
+ phrase: object[0],
50
+ rank: object[1].to_f,
51
+ composition: object[2]
52
+ }
53
+ }
54
+ end
55
+
56
+ def keyword input, **args
57
+ config = {
58
+ module: 'key',
59
+ input: str_to_file(input),
60
+ lang: args[:lang] || 'es'
61
+ }
62
+ result = Open3.capture3 DEFAULT_COMMAND % config
63
+ items = result[0].split("\n")
64
+ items.map{|item|
65
+ object = item.split("\t")
66
+ {
67
+ phrase: object[0],
68
+ rank: object[1].to_f,
69
+ composition: object[2]
70
+ }
71
+ }
72
+ end
73
+
74
+ def items_to_array items
75
+ items.map{|item| item[:phrase]}
76
+ end
77
+
78
+ def str_to_file str
79
+ file = Tempfile.new(['data', '.txt'], "#{Dir.pwd}/tmp", encoding: 'utf-8')
80
+ file.write str
81
+ file.close
82
+ file.path
83
+ end
84
+
85
+ def item_config item
86
+ {
87
+ data: item[:data] || "",
88
+ type: item[:type] || :str
89
+ }
90
+ end
91
+
92
+ def get_phrases item
93
+ case item_config(item)[:type]
94
+ when :str
95
+ items_to_array keyword(item[:data])
96
+ when :arr
97
+ item_config(item)[:data]
98
+ end
99
+ end
100
+
101
+ def get_score(principal_items, secondary_items, **args)
102
+ _options = { score: args[:score] || 0.8 }
103
+ principal_phrases = get_phrases principal_items
104
+ secondary_phrases = get_phrases secondary_items
105
+ final_score = secondary_phrases.map{ |phrase|
106
+ match = FuzzyMatch.new(principal_phrases).find(phrase, {find_with_score: true})
107
+ match[1] if match && match[1] >= _options[:score]
108
+ }.reject(&:nil?).sum
109
+ (final_score * 100) / principal_phrases.length
110
+ end
111
+ end
112
+ end
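The arithmetic in `get_score` can be followed in isolation with the sketch below. It is not the gem's code: the similarity function is a simple stand-in (Dice coefficient over character bigrams) chosen so the example runs without the `fuzzy_match` gem, and the names `bigrams`, `dice_similarity` and `overlap_score` are hypothetical. Scores from FuzzyMatch itself may differ. The empty-list guard is also an addition of this sketch.

```ruby
# Stand-in similarity (an assumption; the gem delegates matching to FuzzyMatch).
def bigrams(str)
  str.downcase.chars.each_cons(2).map(&:join)
end

def dice_similarity(a, b)
  return 1.0 if a.downcase == b.downcase
  pairs_a = bigrams(a)
  pairs_b = bigrams(b)
  return 0.0 if pairs_a.empty? || pairs_b.empty?

  remaining = pairs_b.dup
  # Count bigrams of a that can be paired off with a bigram of b.
  shared = pairs_a.count do |pair|
    idx = remaining.index(pair)
    idx ? remaining.delete_at(idx) : false
  end
  (2.0 * shared) / (pairs_a.length + pairs_b.length)
end

# Same shape as get_score: for each secondary phrase take its best match among
# the principal phrases, keep it only if it reaches the threshold, sum the kept
# scores, and normalise by the number of principal phrases.
def overlap_score(principal_phrases, secondary_phrases, threshold: 0.8)
  return 0.0 if principal_phrases.empty? # guard added here against dividing by zero

  total = secondary_phrases.sum(0.0) do |phrase|
    best = principal_phrases.map { |candidate| dice_similarity(candidate, phrase) }.max
    best >= threshold ? best : 0.0
  end
  (total * 100) / principal_phrases.length
end

puts overlap_score(["FOO"], ["FOO", "BAR"])
```

Because the sum is divided by the number of principal phrases rather than secondary ones, the result can exceed 100 when several secondary phrases match the same principal phrase.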
data/lib/railtie.rb ADDED
@@ -0,0 +1,14 @@
1
+ # lib/railtie.rb
2
+ require 'linguakit_ruby'
3
+ require 'rails'
4
+
5
+ module LinguakitRuby
6
+ class Railtie < Rails::Railtie
7
+ railtie_name :linguakit_ruby
8
+
9
+ rake_tasks do
10
+ path = File.expand_path(__dir__)
11
+ Dir.glob("#{path}/tasks/**/*.rake").each { |f| load f }
12
+ end
13
+ end
14
+ end
data/linguakit_ruby.gemspec ADDED
@@ -0,0 +1,33 @@
1
+ lib = File.expand_path('lib', __dir__)
2
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
3
+ require 'linguakit_ruby/version'
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = 'linguakit_ruby'
7
+ spec.version = LinguakitRuby::VERSION
8
+ spec.authors = ['Franco Cabanas', 'Christopher Quezada']
9
+ spec.email = ['cpqm07@gmail.com']
10
+
11
+ spec.summary = 'LinguaKit is a Natural Language Processing tool containing several NLP modules. Fuzzy matching is a technique used in computer-assisted translation as a special case of record linkage.'
12
+ spec.description = 'This project uses Natural Language Processing tool and FuzzyMatch to find similarity in some words'
13
+ spec.homepage = 'https://github.com/4Talent/linguakit_ruby'
14
+ spec.license = 'MIT'
15
+
16
+ spec.metadata['homepage_uri'] = spec.homepage
17
+ spec.metadata['source_code_uri'] = spec.homepage
18
+ spec.metadata['changelog_uri'] = spec.homepage
19
+
20
+ # Specify which files should be added to the gem when it is released.
21
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
22
+ spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
23
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
24
+ end
25
+ spec.bindir = 'exe'
26
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
27
+ spec.require_paths = ['lib']
28
+
29
+ spec.add_development_dependency 'bundler', '~> 2.0'
30
+ spec.add_development_dependency 'rake', '~> 10.0'
31
+
32
+ spec.add_runtime_dependency 'fuzzy_match', '~> 2.1'
33
+ end
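The `~>` constraints in the gemspec above are pessimistic version constraints. As a small illustration, the sketch below uses RubyGems' own `Gem::Requirement` to check which versions `'~> 2.1'` would accept; the version strings tested are arbitrary examples, not releases known to exist.

```ruby
require "rubygems"

# '~> 2.1' accepts 2.1 and later, but stops before 3.0.
requirement = Gem::Requirement.new("~> 2.1")

# Arbitrary example versions, chosen only to show both sides of the range.
%w[2.0.9 2.1.0 2.9.3 3.0.0].each do |version|
  accepted = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{accepted ? 'accepted' : 'rejected'}"
end
```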
metadata ADDED
@@ -0,0 +1,105 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: linguakit_ruby
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.1
5
+ platform: ruby
6
+ authors:
7
+ - Franco Cabanas
8
+ - Christopher Quezada
9
+ autorequire:
10
+ bindir: exe
11
+ cert_chain: []
12
+ date: 2020-01-09 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: bundler
16
+ requirement: !ruby/object:Gem::Requirement
17
+ requirements:
18
+ - - "~>"
19
+ - !ruby/object:Gem::Version
20
+ version: '2.0'
21
+ type: :development
22
+ prerelease: false
23
+ version_requirements: !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - "~>"
26
+ - !ruby/object:Gem::Version
27
+ version: '2.0'
28
+ - !ruby/object:Gem::Dependency
29
+ name: rake
30
+ requirement: !ruby/object:Gem::Requirement
31
+ requirements:
32
+ - - "~>"
33
+ - !ruby/object:Gem::Version
34
+ version: '10.0'
35
+ type: :development
36
+ prerelease: false
37
+ version_requirements: !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - "~>"
40
+ - !ruby/object:Gem::Version
41
+ version: '10.0'
42
+ - !ruby/object:Gem::Dependency
43
+ name: fuzzy_match
44
+ requirement: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - "~>"
47
+ - !ruby/object:Gem::Version
48
+ version: '2.1'
49
+ type: :runtime
50
+ prerelease: false
51
+ version_requirements: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - "~>"
54
+ - !ruby/object:Gem::Version
55
+ version: '2.1'
56
+ description: This project uses Natural Language Processing tool and FuzzyMatch to
57
+ find similarity in some words
58
+ email:
59
+ - cpqm07@gmail.com
60
+ executables: []
61
+ extensions: []
62
+ extra_rdoc_files: []
63
+ files:
64
+ - ".gitignore"
65
+ - CODE_OF_CONDUCT.md
66
+ - Gemfile
67
+ - Gemfile.lock
68
+ - LICENSE.txt
69
+ - README.md
70
+ - Rakefile
71
+ - bin/console
72
+ - bin/setup
73
+ - lib/linguakit_ruby.rb
74
+ - lib/linguakit_ruby/version.rb
75
+ - lib/railtie.rb
76
+ - linguakit_ruby.gemspec
77
+ homepage: https://github.com/4Talent/linguakit_ruby
78
+ licenses:
79
+ - MIT
80
+ metadata:
81
+ homepage_uri: https://github.com/4Talent/linguakit_ruby
82
+ source_code_uri: https://github.com/4Talent/linguakit_ruby
83
+ changelog_uri: https://github.com/4Talent/linguakit_ruby
84
+ post_install_message:
85
+ rdoc_options: []
86
+ require_paths:
87
+ - lib
88
+ required_ruby_version: !ruby/object:Gem::Requirement
89
+ requirements:
90
+ - - ">="
91
+ - !ruby/object:Gem::Version
92
+ version: '0'
93
+ required_rubygems_version: !ruby/object:Gem::Requirement
94
+ requirements:
95
+ - - ">="
96
+ - !ruby/object:Gem::Version
97
+ version: '0'
98
+ requirements: []
99
+ rubygems_version: 3.0.3
100
+ signing_key:
101
+ specification_version: 4
102
+ summary: LinguaKit is a Natural Language Processing tool containing several NLP modules.
103
+ Fuzzy matching is a technique used in computer-assisted translation as a special
104
+ case of record linkage.
105
+ test_files: []