logstash-filter-stanford-nlp 0.0.1-java

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: ff7cc127d889fca197a10767b6090798e15c26f3
4
+ data.tar.gz: 43502d6e32406d82ed13487239bc99054102c733
5
+ SHA512:
6
+ metadata.gz: 297bb707f2876d2c3f6a38a31725ec814e9dd3d45833cfb1584d73e39d9ea85e7aa8dec59f8f0bf3d5c7fe8931b1137e7dc00ba5275cabae3043914244e25e10
7
+ data.tar.gz: ce2d3fa01e89d595c89b1f760c66d4e435fe16afd7f78b32cc4973a116102c76208b6a5e13853786def230ad2356aa66471355c04b7c7984d4db958ea7467c7c
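The digests above cover the two archives packed inside the `.gem` file. A minimal sketch of how they could be checked, assuming the gem (a plain tar archive) has already been unpacked into a directory; the helper name `mismatched_files` and the directory name are hypothetical and not part of RubyGems:

```ruby
require "digest/sha1"
require "digest/sha2"

# Expected SHA1 digests, copied from the checksums listing above.
EXPECTED_SHA1 = {
  "metadata.gz" => "ff7cc127d889fca197a10767b6090798e15c26f3",
  "data.tar.gz" => "43502d6e32406d82ed13487239bc99054102c733",
}.freeze

# Hypothetical helper: returns the names of files under `dir` that are
# missing or whose digest differs from the expected hex string.
def mismatched_files(dir, expected, digest_class = Digest::SHA1)
  expected.reject do |name, hex|
    path = File.join(dir, name)
    File.file?(path) && digest_class.file(path).hexdigest == hex
  end.keys
end
```

Calling `mismatched_files("unpacked_gem", EXPECTED_SHA1)` would return an empty array when both archives match; passing `Digest::SHA512` as the third argument checks the longer digests the same way.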
@@ -0,0 +1,65 @@
1
+ # Contributing to Logstash
2
+
3
+ All contributions are welcome: ideas, patches, documentation, bug reports,
4
+ complaints, etc!
5
+
6
+ Programming is not a required skill, and there are many ways to help out!
7
+ It is more important to us that you are able to contribute.
8
+
9
+ That said, some basic guidelines, which you are free to ignore :)
10
+
11
+ ## Want to learn?
12
+
13
+ Want to lurk about and see what others are doing with Logstash?
14
+
15
+ * The IRC channel (#logstash on irc.freenode.org) is a good place for this.
16
+ * The [forum](https://discuss.elastic.co/c/logstash) is also
17
+ great for learning from others.
18
+
19
+ ## Got Questions?
20
+
21
+ Have a problem you want Logstash to solve for you?
22
+
23
+ * You can ask a question in the [forum](https://discuss.elastic.co/c/logstash)
24
+ * Alternately, you are welcome to join the IRC channel #logstash on
25
+ irc.freenode.org and ask for help there!
26
+
27
+ ## Have an Idea or Feature Request?
28
+
29
+ * File a ticket on [GitHub](https://github.com/elastic/logstash/issues). Please remember that GitHub is used only for issues and feature requests. If you have a general question, the [forum](https://discuss.elastic.co/c/logstash) or IRC would be the best place to ask.
30
+
31
+ ## Something Not Working? Found a Bug?
32
+
33
+ If you think you found a bug, it probably is a bug.
34
+
35
+ * If it is a general Logstash or pipeline issue, file it in [Logstash GitHub](https://github.com/elastic/logstash/issues)
36
+ * If it is specific to a plugin, please file it in the respective repository under [logstash-plugins](https://github.com/logstash-plugins)
37
+ * or ask the [forum](https://discuss.elastic.co/c/logstash).
38
+
39
+ # Contributing Documentation and Code Changes
40
+
41
+ If you have a bugfix or new feature that you would like to contribute to
42
+ Logstash, and you think it will take more than a few minutes to produce the fix
43
+ (i.e., write code), it is worth discussing the change with the Logstash users and developers first! You can reach us via [GitHub](https://github.com/elastic/logstash/issues), the [forum](https://discuss.elastic.co/c/logstash), or IRC (#logstash on Freenode).
44
+ Please note that pull requests without tests will not be merged. If you would like to contribute but do not have experience writing tests, please ping us on IRC or the forum, or create a PR and ask for our help.
45
+
46
+ ## Contributing to plugins
47
+
48
+ Check our [documentation](https://www.elastic.co/guide/en/logstash/current/contributing-to-logstash.html) on how to contribute to plugins or write your own! It is super easy!
49
+
50
+ ## Contribution Steps
51
+
52
+ 1. Test your changes! [Run](https://github.com/elastic/logstash#testing) the test suite
53
+ 2. Please make sure you have signed our [Contributor License
54
+ Agreement](https://www.elastic.co/contributor-agreement/). We are not
55
+ asking you to assign copyright to us, but to give us the right to distribute
56
+ your code without restriction. We ask this of all contributors in order to
57
+ assure our users of the origin and continuing existence of the code. You
58
+ only need to sign the CLA once.
59
+ 3. Send a pull request! Push your changes to your fork of the repository and
60
+ [submit a pull
61
+ request](https://help.github.com/articles/using-pull-requests). In the pull
62
+ request, describe what your changes do and mention any bugs/issues related
63
+ to the pull request.
64
+
65
+
@@ -0,0 +1,9 @@
1
+ Please post all product and debugging questions on our [forum](https://discuss.elastic.co/c/logstash). Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.
2
+
3
+ For all general issues, please provide the following details for fast resolution:
4
+
5
+ - Version:
6
+ - Operating System:
7
+ - Config File (if you have sensitive info, please remove it):
8
+ - Sample Data:
9
+ - Steps to Reproduce:
@@ -0,0 +1 @@
1
+ Thanks for contributing to Logstash! If you haven't already signed our CLA, here's a handy link: https://www.elastic.co/contributor-agreement/
@@ -0,0 +1,7 @@
1
+ *.gem
2
+ Gemfile.lock
3
+ Gemfile.bak
4
+ .bundle
5
+ vendor
6
+
7
+ lib/**/*.jar
@@ -0,0 +1 @@
1
+ JRUBY_OPTS=-J-Xmx2048m
@@ -0,0 +1 @@
1
+ logstash_filter_stanford_nlp
@@ -0,0 +1 @@
1
+ jruby-1.7.19
@@ -0,0 +1,7 @@
1
+ sudo: false
2
+ language: ruby
3
+ cache: bundler
4
+ rvm:
5
+ - jruby-1.7.23
6
+ script:
7
+ - bundle exec rspec spec
@@ -0,0 +1,5 @@
1
+ ## 2.0.0
2
+ - Plugins were updated to follow the new shutdown semantics. This mainly allows Logstash to instruct input plugins to terminate gracefully,
3
+ instead of using Thread.raise on the plugins' threads. Ref: https://github.com/elastic/logstash/pull/3895
4
+ - Dependency on logstash-core updated to 2.0
5
+
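The changelog entry contrasts graceful termination with raising an exception inside a plugin's thread. The idea can be sketched in plain Ruby as a worker that checks a stop flag between units of work. This is an illustration only: `PollingWorker` is a hypothetical stand-in, not Logstash's plugin base class, and a plain boolean flag is a simplification.

```ruby
# Illustrative stand-in for an input plugin; not Logstash's actual API.
class PollingWorker
  attr_reader :iterations

  def initialize
    @stop_requested = false
    @iterations = 0
  end

  # Called from another thread to request a graceful shutdown.
  def stop
    @stop_requested = true
  end

  def stop?
    @stop_requested
  end

  # Main loop: finish the current unit of work, then check the flag.
  def run
    until stop?
      @iterations += 1
      sleep 0.01 # placeholder for reading and emitting one event
    end
    :stopped_cleanly
  end
end

worker = PollingWorker.new
thread = Thread.new { worker.run }
sleep 0.05   # let the worker run for a moment
worker.stop  # cooperative request, in contrast to thread.raise
result = thread.value # waits for the loop to exit and returns its value
puts result
```

Because the loop exits at a point of its own choosing, the worker never gets interrupted halfway through handling an event, which is the main risk with `Thread#raise`.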
@@ -0,0 +1,11 @@
1
+ The following is a list of people who have contributed ideas, code, bug
2
+ reports, or in general have helped logstash along its way.
3
+
4
+ Contributors:
5
+ * Aaron Mildenstein (untergeek)
6
+ * Pier-Hugues Pellerin (ph)
7
+
8
+ Note: If you've sent us patches, bug reports, or otherwise contributed to
9
+ Logstash, and you aren't on the list above and want to be, please let us know
10
+ and we'll make sure you're here. Contributions from folks like you are what make
11
+ open source awesome.
@@ -0,0 +1,2 @@
1
+ # logstash-filter-example
2
+ Example filter plugin. This should help bootstrap your effort to write your own filter plugin!
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2012–2016 Elasticsearch <http://www.elastic.co>
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
@@ -0,0 +1,5 @@
1
+ Elasticsearch
2
+ Copyright 2012-2015 Elasticsearch
3
+
4
+ This product includes software developed by The Apache Software
5
+ Foundation (http://www.apache.org/).
@@ -0,0 +1,99 @@
1
+ # Logstash Plugin
2
+
3
+ This is a plugin for [Logstash](https://github.com/elastic/logstash). It integrates
4
+ Logstash with the [Stanford NLP library](http://nlp.stanford.edu/software).
5
+
6
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
7
+
8
+ ## Developing
9
+
10
+ ### 1. Plugin Development and Testing
11
+
12
+ #### Code
13
+ - To get started, you'll need JRuby with the Bundler gem installed.
14
+
15
+ - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
16
+
17
+ - Install dependencies
18
+ ```sh
19
+ bundle install
20
+ ```
21
+
22
+ #### Test
23
+
24
+ The Stanford NLP library relies on ***large*** (~400 MB) model files stored in
25
+ JAR files. You will need to increase the size of your Java heap to run the tests
26
+ without crashing.
27
+
28
+ ```sh
29
+ export JRUBY_OPTS="-J-Xmx2048m"
30
+ ```
31
+ - Update your dependencies
32
+
33
+ ```sh
34
+ bundle install
35
+ ```
36
+
37
+ - Run tests
38
+
39
+ ```sh
40
+ bundle exec rspec
41
+ ```
42
+
43
+ ### 2. Running your unpublished Plugin in Logstash
44
+
45
+ #### 2.1 Run in a local Logstash clone
46
+
47
+ - Edit Logstash `Gemfile` and add the local plugin path, for example:
48
+ ```ruby
49
+ gem "logstash-filter-stanford-nlp", :path => "/your/local/logstash-filter-nlp"
50
+ ```
51
+ - Install plugin
52
+ ```sh
53
+ # Logstash 2.3 and higher
54
+ bin/logstash-plugin install --no-verify
55
+
56
+ # need to install a dependency
57
+ mkdir -p lib/edu/stanford/nlp/stanford-corenlp/3.6.0/ && curl http://nlp.stanford.edu/software/stanford-english-corenlp-2016-01-10-models.jar -o lib/edu/stanford/nlp/stanford-corenlp/3.6.0/stanford-corenlp-3.6.0-models.jar
58
+
59
+ # Prior to Logstash 2.3 - not supported
60
+ ```
61
+ - Run Logstash with your plugin
62
+
63
+ You may need to increase Logstash's heap size. To do so, prepend `LS_HEAP_SIZE=2048m`
64
+ to the Logstash invocation.
65
+
66
+ ```sh
67
+ bin/logstash -p lib -e 'input { stdin {} } filter { ner {} } output { stdout { codec => rubydebug } }'
68
+ ```
69
+ At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
70
+
71
+ #### 2.2 Run in an installed Logstash
72
+
73
+ You can use the same method as in **2.1** to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory. Alternatively, you can build the gem and install it as follows:
74
+
75
+ - Build your plugin gem
76
+ ```sh
77
+ gem build logstash-filter-stanford-nlp.gemspec
78
+ ```
79
+ - Install the plugin from the Logstash home
80
+ ```sh
81
+ # Logstash 2.3 and higher
82
+ bin/logstash-plugin install --no-verify /your/local/logstash-filter-nlp/logstash-filter-stanford-nlp-0.0.1-java.gem
83
+
84
+ # need to install a dependency
85
+ mkdir -p lib/edu/stanford/nlp/stanford-corenlp/3.6.0/ && curl http://nlp.stanford.edu/software/stanford-english-corenlp-2016-01-10-models.jar -o lib/edu/stanford/nlp/stanford-corenlp/3.6.0/stanford-corenlp-3.6.0-models.jar
86
+
87
+ # Prior to Logstash 2.3 - not supported
88
+ ```
89
+ - Start Logstash and proceed to test the plugin
90
+
91
+ ## Contributing
92
+
93
+ All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.
94
+
95
+ Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.
96
+
97
+ It is more important to the community that you are able to contribute.
98
+
99
+ For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file.
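The destination path in the README's `curl` commands appears to follow the Maven repository layout: group segments become directories, followed by the artifact, the version, and a file named `artifact-version[-classifier].jar`. A small sketch of that mapping, where `maven_jar_path` is a hypothetical helper written for illustration and not part of jar-dependencies:

```ruby
# Hypothetical helper: builds a Maven-layout path for a JAR under `root`.
def maven_jar_path(group, artifact, version, classifier = nil, root = "lib")
  # File name is artifact-version.jar, with an optional -classifier suffix.
  file = [artifact, version, classifier].compact.join("-") + ".jar"
  File.join(root, *group.split("."), artifact, version, file)
end

models_path = maven_jar_path("edu.stanford.nlp", "stanford-corenlp", "3.6.0", "models")
puts models_path
# prints "lib/edu/stanford/nlp/stanford-corenlp/3.6.0/stanford-corenlp-3.6.0-models.jar"
```

The coordinates used here match the `require_jar( 'edu.stanford.nlp', 'stanford-corenlp', 'models', '3.6.0' )` call in the filter source, which is presumably why the models JAR has to be saved under exactly this name.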
@@ -0,0 +1 @@
1
+ require "logstash/devutils/rake"
@@ -0,0 +1,16 @@
1
+ # this is a generated file, to avoid over-writing it just delete this comment
2
+ require 'jar_dependencies'
3
+
4
+ require_jar( 'com.io7m.xom', 'xom', '1.2.10' )
5
+ require_jar( 'javax.xml.bind', 'jaxb-api', '2.2.7' )
6
+ require_jar( 'javax.json', 'javax.json-api', '1.0' )
7
+ require_jar( 'org.slf4j', 'slf4j-simple', '1.7.21' )
8
+ require_jar( 'xerces', 'xercesImpl', '2.8.0' )
9
+ require_jar( 'xalan', 'xalan', '2.7.0' )
10
+ require_jar( 'joda-time', 'joda-time', '2.9' )
11
+ require_jar( 'com.googlecode.efficient-java-matrix-library', 'ejml', '0.23' )
12
+ require_jar( 'org.slf4j', 'slf4j-api', '1.7.12' )
13
+ require_jar( 'xml-apis', 'xml-apis', '1.3.03' )
14
+ require_jar( 'edu.stanford.nlp', 'stanford-corenlp', '3.6.0' )
15
+ require_jar( 'de.jollyday', 'jollyday', '0.4.7' )
16
+ require_jar( 'com.google.protobuf', 'protobuf-java', '2.6.1' )
@@ -0,0 +1,45 @@
1
+ # encoding: utf-8
2
+ require "logstash/filters/base"
3
+ require "logstash/namespace"
4
+ require "logstash-filter-stanford-nlp_jars.rb"
5
+ require_jar( 'edu.stanford.nlp', 'stanford-corenlp', 'models', '3.6.0' )
6
+
7
+
8
+ class LogStash::Filters::Ner < LogStash::Filters::Base
9
+
10
+ config_name "ner"
11
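+ config :message, :validate => :string, :default => "" # NOTE: only checked for truthiness in #filter (always true for a string); the text itself is read from event["message"]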
12
+
13
+ module NlpSimple
14
+ include_package "edu.stanford.nlp.simple"
15
+ end
16
+
17
+ public
18
+ def register
19
+ end
20
+
21
+ public
22
+ def filter(event)
23
+ if @message
24
+ names = []
25
+ locations = []
26
+ organizations = []
27
+ dates = []
28
+
29
+ document = NlpSimple::Document.new(event["message"])
30
+ document.sentences().each do |sentence|
31
+ names.concat(sentence.mentions("PERSON"))
32
+ locations.concat(sentence.mentions("LOCATION"))
33
+ organizations.concat(sentence.mentions("ORGANIZATION"))
34
+ dates.concat(sentence.mentions("DATE"))
35
+ end
36
+ event["ner.names"] = names
37
+ event["ner.locations"] = locations
38
+ event["ner.organizations"] = organizations
39
+ event["ner.dates"] = dates
40
+ end
41
+
42
+ # filter_matched should go in the last line of our successful code
43
+ filter_matched(event)
44
+ end
45
+ end
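The aggregation step of the filter above can be sketched without CoreNLP or Logstash on the load path. In this sketch `FakeSentence` is a hypothetical stand-in for the CoreNLP sentence objects, and `collect_entities` is an illustrative name; only the field names and entity tags are taken from the plugin source.

```ruby
# Hypothetical stand-in for a CoreNLP sentence: maps an entity tag to mentions.
FakeSentence = Struct.new(:tagged) do
  def mentions(tag)
    tagged.fetch(tag, [])
  end
end

# Event field => entity tag, as used in the filter above.
NER_FIELDS = {
  "ner.names"         => "PERSON",
  "ner.locations"     => "LOCATION",
  "ner.organizations" => "ORGANIZATION",
  "ner.dates"         => "DATE",
}.freeze

# Collects the mentions of every sentence into one array per event field.
def collect_entities(sentences)
  NER_FIELDS.each_with_object({}) do |(field, tag), result|
    result[field] = sentences.flat_map { |sentence| sentence.mentions(tag).to_a }
  end
end

sentences = [
  FakeSentence.new({ "PERSON" => ["Jeffrey Alan Mott"], "DATE" => ["04/04/2016"] }),
  FakeSentence.new({ "ORGANIZATION" => ["Aldan , Inc."], "LOCATION" => ["Brea"] }),
]
entities = collect_entities(sentences)
puts entities.inspect
```

Each key of `entities` corresponds to one of the `event["ner.*"]` assignments in the plugin; a tag with no mentions yields an empty array rather than a missing field.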
@@ -0,0 +1,29 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = 'logstash-filter-stanford-nlp'
3
+ s.version = '0.0.1'
4
+ s.licenses = ['Apache License (2.0)']
5
+ s.summary = "This filter extracts named entities from the message and adds them as attributes to the message."
6
+ s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
7
+ s.authors = ["Jonathan Randall"]
8
+ s.email = 'jonathar@users.noreply.github.com'
9
+ s.homepage = "http://www.github.com/rahtanoj/logstash-filter-stanford-nlp"
10
+ s.require_paths = ["lib"]
11
+ s.platform = 'java'
12
+
13
+ # Files
14
+ s.files = `git ls-files -z`.split("\x0")
15
+
16
+ # Tests
17
+ s.test_files = s.files.grep(%r{^(test|spec|features)/})
18
+
19
+ # Special flag to let us know this is actually a logstash plugin
20
+ s.metadata = { "logstash_plugin" => "true", "logstash_group" => "filter" }
21
+
22
+ # Gem dependencies
23
+ s.requirements << "jar 'edu.stanford.nlp:stanford-corenlp', '3.6.0'"
24
+ s.requirements << "jar 'org.slf4j:slf4j-simple', '1.7.21'"
25
+ s.requirements << "jar 'com.google.protobuf:protobuf-java', '2.6.1'"
26
+ s.add_runtime_dependency "jar-dependencies", "~> 0"
27
+ s.add_runtime_dependency "logstash-core", ">= 2.0.0", "< 3.0.0"
28
+ s.add_development_dependency "logstash-devutils", "= 0.0.19"
29
+ end
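The three `s.requirements` strings in the gemspec use the same Maven coordinates as three of the `require_jar` calls in the generated `lib/logstash-filter-stanford-nlp_jars.rb`; the remaining entries there are presumably transitive dependencies. A simplified sketch of that correspondence follows. The regular expression and `parse_jar_requirement` are illustrative assumptions, not how jar-dependencies parses requirements internally.

```ruby
# Illustrative pattern for strings such as: jar 'group:artifact', 'version'
JAR_REQUIREMENT = /\Ajar\s+'([^:']+):([^:']+)',\s*'([^']+)'\z/

# Hypothetical helper: returns a hash of coordinates, or nil if no match.
def parse_jar_requirement(requirement)
  match = JAR_REQUIREMENT.match(requirement)
  return nil unless match
  { group: match[1], artifact: match[2], version: match[3] }
end

# Requirement strings copied from the gemspec above.
requirements = [
  "jar 'edu.stanford.nlp:stanford-corenlp', '3.6.0'",
  "jar 'org.slf4j:slf4j-simple', '1.7.21'",
  "jar 'com.google.protobuf:protobuf-java', '2.6.1'",
]

requirements.each do |requirement|
  jar = parse_jar_requirement(requirement)
  puts "require_jar( '#{jar[:group]}', '#{jar[:artifact]}', '#{jar[:version]}' )"
end
```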
@@ -0,0 +1,43 @@
1
+ # encoding: utf-8
2
+ require 'spec_helper'
3
+ require "logstash/filters/ner"
4
+
5
+ describe LogStash::Filters::Ner do
6
+ describe "should extract named entities" do
7
+ let(:config) { {} }
8
+ subject { described_class.new(config) }
9
+
10
+ describe "from the message attribute and expose" do
11
+ let(:data) { "Jeffrey Alan Mott and Michelle Mott, individuals Dda Integrity Landscape 3756 Independence Avenue Sanger, CA 93637 CSLB#774222 Decision 04/04/2016. Aldan, Inc. P.O. Box 9428, Brea, CA 92822 CSLB #949229 Decision"}
12
+ let(:event) { LogStash::Event.new("message" => data) }
13
+
14
+ it "a list of PERSONs" do
15
+ subject.register
16
+ subject.filter(event)
17
+ expect(event['message']).to eq(data)
18
+ expect(event['ner.names']).to include('Jeffrey Alan Mott')
19
+ end
20
+
21
+ it "a list of LOCATIONS" do
22
+ subject.register
23
+ subject.filter(event)
24
+ expect(event['message']).to eq(data)
25
+ expect(event['ner.locations']).to include('Brea')
26
+ end
27
+
28
+ it "a list of ORGANIZATIONs" do
29
+ subject.register
30
+ subject.filter(event)
31
+ expect(event['message']).to eq(data)
32
+ expect(event['ner.organizations']).to include('Aldan , Inc.')
33
+ end
34
+
35
+ it "a list of DATEs" do
36
+ subject.register
37
+ subject.filter(event)
38
+ expect(event['message']).to eq(data)
39
+ expect(event['ner.dates']).to include('04/04/2016')
40
+ end
41
+ end
42
+ end
43
+ end
@@ -0,0 +1,2 @@
1
+ # encoding: utf-8
2
+ require "logstash/devutils/rspec/spec_helper"
metadata ADDED
@@ -0,0 +1,119 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: logstash-filter-stanford-nlp
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: java
6
+ authors:
7
+ - Jonathan Randall
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-05-10 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - ~>
17
+ - !ruby/object:Gem::Version
18
+ version: '0'
19
+ name: jar-dependencies
20
+ prerelease: false
21
+ type: :runtime
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ~>
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - '>='
31
+ - !ruby/object:Gem::Version
32
+ version: 2.0.0
33
+ - - <
34
+ - !ruby/object:Gem::Version
35
+ version: 3.0.0
36
+ name: logstash-core
37
+ prerelease: false
38
+ type: :runtime
39
+ version_requirements: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - '>='
42
+ - !ruby/object:Gem::Version
43
+ version: 2.0.0
44
+ - - <
45
+ - !ruby/object:Gem::Version
46
+ version: 3.0.0
47
+ - !ruby/object:Gem::Dependency
48
+ requirement: !ruby/object:Gem::Requirement
49
+ requirements:
50
+ - - '='
51
+ - !ruby/object:Gem::Version
52
+ version: 0.0.19
53
+ name: logstash-devutils
54
+ prerelease: false
55
+ type: :development
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - '='
59
+ - !ruby/object:Gem::Version
60
+ version: 0.0.19
61
+ description: This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program
62
+ email: jonathar@users.noreply.github.com
63
+ executables: []
64
+ extensions: []
65
+ extra_rdoc_files: []
66
+ files:
67
+ - .github/CONTRIBUTING.md
68
+ - .github/ISSUE_TEMPLATE.md
69
+ - .github/PULL_REQUEST_TEMPLATE.md
70
+ - .gitignore
71
+ - .ruby-env
72
+ - .ruby-gemset
73
+ - .ruby-version
74
+ - .travis.yml
75
+ - CHANGELOG.md
76
+ - CONTRIBUTORS
77
+ - DEVELOPER.md
78
+ - Gemfile
79
+ - LICENSE
80
+ - NOTICE.TXT
81
+ - README.md
82
+ - Rakefile
83
+ - lib/logstash-filter-stanford-nlp_jars.rb
84
+ - lib/logstash/filters/ner.rb
85
+ - logstash-filter-stanford-nlp.gemspec
86
+ - spec/filters/ner_spec.rb
87
+ - spec/spec_helper.rb
88
+ homepage: http://www.github.com/rahtanoj/logstash-filter-stanford-nlp
89
+ licenses:
90
+ - Apache License (2.0)
91
+ metadata:
92
+ logstash_plugin: 'true'
93
+ logstash_group: filter
94
+ post_install_message:
95
+ rdoc_options: []
96
+ require_paths:
97
+ - lib
98
+ required_ruby_version: !ruby/object:Gem::Requirement
99
+ requirements:
100
+ - - '>='
101
+ - !ruby/object:Gem::Version
102
+ version: '0'
103
+ required_rubygems_version: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - '>='
106
+ - !ruby/object:Gem::Version
107
+ version: '0'
108
+ requirements:
109
+ - jar 'edu.stanford.nlp:stanford-corenlp', '3.6.0'
110
+ - jar 'org.slf4j:slf4j-simple', '1.7.21'
111
+ - jar 'com.google.protobuf:protobuf-java', '2.6.1'
112
+ rubyforge_project:
113
+ rubygems_version: 2.4.5
114
+ signing_key:
115
+ specification_version: 4
116
+ summary: This filter extracts named entities from the message and adds them as attributes to the message.
117
+ test_files:
118
+ - spec/filters/ner_spec.rb
119
+ - spec/spec_helper.rb