logstash-filter-stanford-nlp 0.0.1-java

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: ff7cc127d889fca197a10767b6090798e15c26f3
4
+ data.tar.gz: 43502d6e32406d82ed13487239bc99054102c733
5
+ SHA512:
6
+ metadata.gz: 297bb707f2876d2c3f6a38a31725ec814e9dd3d45833cfb1584d73e39d9ea85e7aa8dec59f8f0bf3d5c7fe8931b1137e7dc00ba5275cabae3043914244e25e10
7
+ data.tar.gz: ce2d3fa01e89d595c89b1f760c66d4e435fe16afd7f78b32cc4973a116102c76208b6a5e13853786def230ad2356aa66471355c04b7c7984d4db958ea7467c7c
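The digests listed above can be compared against the `metadata.gz` and `data.tar.gz` members of the unpacked gem. The following is a minimal sketch, assuming both files have already been extracted into the current directory; the helper name `checksum_matches?` is hypothetical, and only Ruby's standard `digest` library is used.

```ruby
require "digest"

# Hypothetical helper: compares the digest of some bytes with the
# hex string recorded in the checksums listing.
def checksum_matches?(bytes, expected_hex, algorithm = Digest::SHA512)
  algorithm.hexdigest(bytes) == expected_hex.strip.downcase
end

# SHA1 values copied from the checksums listing above.
expected_sha1 = {
  "metadata.gz" => "ff7cc127d889fca197a10767b6090798e15c26f3",
  "data.tar.gz" => "43502d6e32406d82ed13487239bc99054102c733"
}

expected_sha1.each do |name, hex|
  next unless File.exist?(name) # skip members that were not extracted
  ok = checksum_matches?(File.binread(name), hex, Digest::SHA1)
  puts "#{name}: #{ok ? 'OK' : 'MISMATCH'}"
end
```

The same helper works for the SHA512 entries by leaving the third argument at its default.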
@@ -0,0 +1,65 @@
1
+ # Contributing to Logstash
2
+
3
+ All contributions are welcome: ideas, patches, documentation, bug reports,
4
+ complaints, etc!
5
+
6
+ Programming is not a required skill, and there are many ways to help out!
7
+ It is more important to us that you are able to contribute.
8
+
9
+ That said, some basic guidelines, which you are free to ignore :)
10
+
11
+ ## Want to learn?
12
+
13
+ Want to lurk about and see what others are doing with Logstash?
14
+
15
+ * The irc channel (#logstash on irc.freenode.org) is a good place for this
16
+ * The [forum](https://discuss.elastic.co/c/logstash) is also
17
+ great for learning from others.
18
+
19
+ ## Got Questions?
20
+
21
+ Have a problem you want Logstash to solve for you?
22
+
23
+ * You can ask a question in the [forum](https://discuss.elastic.co/c/logstash)
24
+ * Alternatively, you are welcome to join the IRC channel #logstash on
25
+ irc.freenode.org and ask for help there!
26
+
27
+ ## Have an Idea or Feature Request?
28
+
29
+ * File a ticket on [GitHub](https://github.com/elastic/logstash/issues). Please remember that GitHub is used only for issues and feature requests. If you have a general question, the [forum](https://discuss.elastic.co/c/logstash) or IRC would be the best place to ask.
30
+
31
+ ## Something Not Working? Found a Bug?
32
+
33
+ If you think you found a bug, it probably is a bug.
34
+
35
+ * If it is a general Logstash or pipeline issue, file it in [Logstash GitHub](https://github.com/elastic/logstash/issues)
36
+ * If it is specific to a plugin, please file it in the respective repository under [logstash-plugins](https://github.com/logstash-plugins)
37
+ * or ask the [forum](https://discuss.elastic.co/c/logstash).
38
+
39
+ # Contributing Documentation and Code Changes
40
+
41
+ If you have a bugfix or new feature that you would like to contribute to
42
+ logstash, and you think it will take more than a few minutes to produce the fix
43
+ (i.e., write code), it is worth discussing the change with the Logstash users and developers first! You can reach us via [GitHub](https://github.com/elastic/logstash/issues), the [forum](https://discuss.elastic.co/c/logstash), or via IRC (#logstash on Freenode IRC).
44
+ Please note that Pull Requests without tests will not be merged. If you would like to contribute but do not have experience with writing tests, please ping us on IRC or the forum, or create a PR and ask for our help.
45
+
46
+ ## Contributing to plugins
47
+
48
+ Check our [documentation](https://www.elastic.co/guide/en/logstash/current/contributing-to-logstash.html) on how to contribute to plugins or write your own! It is super easy!
49
+
50
+ ## Contribution Steps
51
+
52
+ 1. Test your changes! [Run](https://github.com/elastic/logstash#testing) the test suite
53
+ 2. Please make sure you have signed our [Contributor License
54
+ Agreement](https://www.elastic.co/contributor-agreement/). We are not
55
+ asking you to assign copyright to us, but to give us the right to distribute
56
+ your code without restriction. We ask this of all contributors in order to
57
+ assure our users of the origin and continuing existence of the code. You
58
+ only need to sign the CLA once.
59
+ 3. Send a pull request! Push your changes to your fork of the repository and
60
+ [submit a pull
61
+ request](https://help.github.com/articles/using-pull-requests). In the pull
62
+ request, describe what your changes do and mention any bugs/issues related
63
+ to the pull request.
64
+
65
+
@@ -0,0 +1,9 @@
1
+ Please post all product and debugging questions on our [forum](https://discuss.elastic.co/c/logstash). Your questions will reach our wider community members there, and if we confirm that there is a bug, then we can open a new issue here.
2
+
3
+ For all general issues, please provide the following details for fast resolution:
4
+
5
+ - Version:
6
+ - Operating System:
7
+ - Config File (if you have sensitive info, please remove it):
8
+ - Sample Data:
9
+ - Steps to Reproduce:
@@ -0,0 +1 @@
1
+ Thanks for contributing to Logstash! If you haven't already signed our CLA, here's a handy link: https://www.elastic.co/contributor-agreement/
@@ -0,0 +1,7 @@
1
+ *.gem
2
+ Gemfile.lock
3
+ Gemfile.bak
4
+ .bundle
5
+ vendor
6
+
7
+ lib/**/*.jar
@@ -0,0 +1 @@
1
+ JRUBY_OPTS=-J-Xmx2048m
@@ -0,0 +1 @@
1
+ logstash_filter_stanford_nlp
@@ -0,0 +1 @@
1
+ jruby-1.7.19
@@ -0,0 +1,7 @@
1
+ sudo: false
2
+ language: ruby
3
+ cache: bundler
4
+ rvm:
5
+ - jruby-1.7.23
6
+ script:
7
+ - bundle exec rspec spec
@@ -0,0 +1,5 @@
1
+ ## 2.0.0
2
+ - Plugins were updated to follow the new shutdown semantic, this mainly allows Logstash to instruct input plugins to terminate gracefully,
3
+ instead of using Thread.raise on the plugins' threads. Ref: https://github.com/elastic/logstash/pull/3895
4
+ - Dependency on logstash-core updated to 2.0
5
+
@@ -0,0 +1,11 @@
1
+ The following is a list of people who have contributed ideas, code, bug
2
+ reports, or in general have helped logstash along its way.
3
+
4
+ Contributors:
5
+ * Aaron Mildenstein (untergeek)
6
+ * Pier-Hugues Pellerin (ph)
7
+
8
+ Note: If you've sent us patches, bug reports, or otherwise contributed to
9
+ Logstash, and you aren't on the list above and want to be, please let us know
10
+ and we'll make sure you're here. Contributions from folks like you are what make
11
+ open source awesome.
@@ -0,0 +1,2 @@
1
+ # logstash-filter-example
2
+ Example filter plugin. This should help bootstrap your effort to write your own filter plugin!
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2012–2016 Elasticsearch <http://www.elastic.co>
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
@@ -0,0 +1,5 @@
1
+ Elasticsearch
2
+ Copyright 2012-2015 Elasticsearch
3
+
4
+ This product includes software developed by The Apache Software
5
+ Foundation (http://www.apache.org/).
@@ -0,0 +1,99 @@
1
+ # Logstash Plugin
2
+
3
+ This is a plugin for [Logstash](https://github.com/elastic/logstash). It integrates
4
+ Logstash with the [Stanford NLP library](http://nlp.stanford.edu/software).
5
+
6
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are free to use it however you want.
7
+
8
+ ## Developing
9
+
10
+ ### 1. Plugin Development and Testing
11
+
12
+ #### Code
13
+ - To get started, you'll need JRuby with the Bundler gem installed.
14
+
15
+ - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
16
+
17
+ - Install dependencies
18
+ ```sh
19
+ bundle install
20
+ ```
21
+
22
+ #### Test
23
+
24
+ The Stanford NLP library relies on ***large*** (~400 MB) model files stored in
25
+ JAR files. You will need to increase the size of your Java heap to run the tests
26
+ without crashing.
27
+
28
+ ```sh
29
+ export JRUBY_OPTS="-J-Xmx2048m"
30
+ ```
31
+ - Update your dependencies
32
+
33
+ ```sh
34
+ bundle install
35
+ ```
36
+
37
+ - Run tests
38
+
39
+ ```sh
40
+ bundle exec rspec
41
+ ```
42
+
43
+ ### 2. Running your unpublished Plugin in Logstash
44
+
45
+ #### 2.1 Run in a local Logstash clone
46
+
47
+ - Edit Logstash `Gemfile` and add the local plugin path, for example:
48
+ ```ruby
49
+ gem "logstash-filter-stanford-nlp", :path => "/your/local/logstash-filter-nlp"
50
+ ```
51
+ - Install plugin
52
+ ```sh
53
+ # Logstash 2.3 and higher
54
+ bin/logstash-plugin install --no-verify
55
+
56
+ # need to install a dependency
57
+ mkdir -p lib/edu/stanford/nlp/stanford-corenlp/3.6.0/ && curl http://nlp.stanford.edu/software/stanford-english-corenlp-2016-01-10-models.jar -o lib/edu/stanford/nlp/stanford-corenlp/3.6.0/stanford-corenlp-3.6.0-models.jar
58
+
59
+ # Prior to Logstash 2.3 - not supported
60
+ ```
61
+ - Run Logstash with your plugin
62
+
63
+ You may need to increase the heap size of Logstash. To do so, prepend `LS_HEAP_SIZE=2048m`
64
+ to the logstash invocation.
65
+
66
+ ```sh
67
+ bin/logstash -p lib -e 'input { stdin {} } filter { ner {} } output { stdout { codec => rubydebug } }'
68
+ ```
69
+ At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
70
+
71
+ #### 2.2 Run in an installed Logstash
72
+
73
+ You can use the same method as in **2.1** to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory. Alternatively, you can build the gem and install it as follows:
74
+
75
+ - Build your plugin gem
76
+ ```sh
77
+ gem build logstash-filter-stanford-nlp.gemspec
78
+ ```
79
+ - Install the plugin from the Logstash home
80
+ ```sh
81
+ # Logstash 2.3 and higher
82
+ bin/logstash-plugin install --no-verify /your/local/plugin/logstash-filter-stanford-nlp-0.0.1-java.gem
83
+
84
+ # need to install a dependency
85
+ mkdir -p lib/edu/stanford/nlp/stanford-corenlp/3.6.0/ && curl http://nlp.stanford.edu/software/stanford-english-corenlp-2016-01-10-models.jar -o lib/edu/stanford/nlp/stanford-corenlp/3.6.0/stanford-corenlp-3.6.0-models.jar
86
+
87
+ # Prior to Logstash 2.3 - not supported
88
+ ```
89
+ - Start Logstash and proceed to test the plugin
90
+
91
+ ## Contributing
92
+
93
+ All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.
94
+
95
+ Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.
96
+
97
+ It is more important to the community that you are able to contribute.
98
+
99
+ For more information about contributing, see the [CONTRIBUTING](https://github.com/elastic/logstash/blob/master/CONTRIBUTING.md) file.
@@ -0,0 +1 @@
1
+ require "logstash/devutils/rake"
@@ -0,0 +1,16 @@
1
+ # this is a generated file, to avoid over-writing it just delete this comment
2
+ require 'jar_dependencies'
3
+
4
+ require_jar( 'com.io7m.xom', 'xom', '1.2.10' )
5
+ require_jar( 'javax.xml.bind', 'jaxb-api', '2.2.7' )
6
+ require_jar( 'javax.json', 'javax.json-api', '1.0' )
7
+ require_jar( 'org.slf4j', 'slf4j-simple', '1.7.21' )
8
+ require_jar( 'xerces', 'xercesImpl', '2.8.0' )
9
+ require_jar( 'xalan', 'xalan', '2.7.0' )
10
+ require_jar( 'joda-time', 'joda-time', '2.9' )
11
+ require_jar( 'com.googlecode.efficient-java-matrix-library', 'ejml', '0.23' )
12
+ require_jar( 'org.slf4j', 'slf4j-api', '1.7.12' )
13
+ require_jar( 'xml-apis', 'xml-apis', '1.3.03' )
14
+ require_jar( 'edu.stanford.nlp', 'stanford-corenlp', '3.6.0' )
15
+ require_jar( 'de.jollyday', 'jollyday', '0.4.7' )
16
+ require_jar( 'com.google.protobuf', 'protobuf-java', '2.6.1' )
@@ -0,0 +1,45 @@
+ # encoding: utf-8
+ require "logstash/filters/base"
+ require "logstash/namespace"
+ require "logstash-filter-stanford-nlp_jars.rb"
+ require_jar( 'edu.stanford.nlp', 'stanford-corenlp', 'models', '3.6.0' )
+
+
+ class LogStash::Filters::Ner < LogStash::Filters::Base
+
+   config_name "ner"
+   config :message, :validate => :string, :default => ""
+
+   module NlpSimple
+     include_package "edu.stanford.nlp.simple"
+   end
+
+   public
+   def register
+   end
+
+   public
+   def filter(event)
+     if @message
+       names = []
+       locations = []
+       organizations = []
+       dates = []
+
+       document = NlpSimple::Document.new(event["message"])
+       document.sentences().each do |sentence|
+         names.concat(sentence.mentions("PERSON"))
+         locations.concat(sentence.mentions("LOCATION"))
+         organizations.concat(sentence.mentions("ORGANIZATION"))
+         dates.concat(sentence.mentions("DATE"))
+       end
+       event["ner.names"] = names
+       event["ner.locations"] = locations
+       event["ner.organizations"] = organizations
+       event["ner.dates"] = dates
+     end
+
+     # filter_matched should go in the last line of our successful code
+     filter_matched(event)
+   end
+ end
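The grouping logic in `filter` can be tried without JRuby or the CoreNLP JARs. The following is a minimal sketch, not the plugin itself: `FakeSentence`, `FIELDS`, and `collect_entities` are hypothetical stand-ins. `FakeSentence#mentions` imitates the `mentions(tag)` call used above by returning canned strings, here borrowed from the plugin's spec rather than produced by a model.

```ruby
# Stand-in for a CoreNLP sentence; it returns canned mentions per tag.
FakeSentence = Struct.new(:tagged) do
  def mentions(tag)
    tagged.fetch(tag, [])
  end
end

# Maps the event field names used by the filter to CoreNLP entity tags.
FIELDS = {
  "ner.names"         => "PERSON",
  "ner.locations"     => "LOCATION",
  "ner.organizations" => "ORGANIZATION",
  "ner.dates"         => "DATE"
}.freeze

# Collects the mentions of every sentence into one array per field.
def collect_entities(sentences)
  FIELDS.each_with_object({}) do |(field, tag), result|
    result[field] = sentences.flat_map { |s| s.mentions(tag) }
  end
end

sentences = [
  FakeSentence.new({ "PERSON" => ["Jeffrey Alan Mott"], "DATE" => ["04/04/2016"] }),
  FakeSentence.new({ "LOCATION" => ["Brea"], "ORGANIZATION" => ["Aldan , Inc."] })
]
p collect_entities(sentences)
```

Driving the four fields from one table is a design choice of this sketch; the plugin itself keeps four separate arrays.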
@@ -0,0 +1,29 @@
+ Gem::Specification.new do |s|
+   s.name = 'logstash-filter-stanford-nlp'
+   s.version = '0.0.1'
+   s.licenses = ['Apache License (2.0)']
+   s.summary = "This filter extracts named entities from the message and adds them as attributes to the message."
+   s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
+   s.authors = ["Jonathan Randall"]
+   s.email = 'jonathar@users.noreply.github.com'
+   s.homepage = "http://www.github.com/rahtanoj/logstash-filter-stanford-nlp"
+   s.require_paths = ["lib"]
+   s.platform = 'java'
+
+   # Files
+   s.files = `git ls-files -z`.split("\x0")
+
+   # Tests
+   s.test_files = s.files.grep(%r{^(test|spec|features)/})
+
+   # Special flag to let us know this is actually a logstash plugin
+   s.metadata = { "logstash_plugin" => "true", "logstash_group" => "filter" }
+
+   # Gem dependencies
+   s.requirements << "jar 'edu.stanford.nlp:stanford-corenlp', '3.6.0'"
+   s.requirements << "jar 'org.slf4j:slf4j-simple', '1.7.21'"
+   s.requirements << "jar 'com.google.protobuf:protobuf-java', '2.6.1'"
+   s.add_runtime_dependency "jar-dependencies", "~> 0"
+   s.add_runtime_dependency "logstash-core", ">= 2.0.0", "< 3.0.0"
+   s.add_development_dependency "logstash-devutils", "= 0.0.19"
+ end
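The version constraints in the gemspec can be checked with RubyGems' own `Gem::Requirement` class. This is a small sketch of how the two runtime constraints behave; the version numbers being tested are arbitrary samples chosen for illustration, not releases the plugin is known to have been run against.

```ruby
require "rubygems"

# Same constraints as the logstash-core runtime dependency above.
core = Gem::Requirement.new(">= 2.0.0", "< 3.0.0")
puts core.satisfied_by?(Gem::Version.new("2.3.4"))  # true
puts core.satisfied_by?(Gem::Version.new("5.0.0"))  # false

# "~> 0" on jar-dependencies: any version from 0 up to, but not including, 1.
jars = Gem::Requirement.new("~> 0")
puts jars.satisfied_by?(Gem::Version.new("0.3.2"))  # true
puts jars.satisfied_by?(Gem::Version.new("1.0.0"))  # false
```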
@@ -0,0 +1,43 @@
+ # encoding: utf-8
+ require 'spec_helper'
+ require "logstash/filters/ner"
+
+ describe LogStash::Filters::Ner do
+   describe "should extract named entities" do
+     let(:config) { {} }
+     subject { described_class.new(config) }
+
+     describe "from the message attribute and expose" do
+       let(:data) { "Jeffrey Alan Mott and Michelle Mott, individuals Dda Integrity Landscape 3756 Independence Avenue Sanger, CA 93637 CSLB#774222 Decision 04/04/2016. Aldan, Inc. P.O. Box 9428, Brea, CA 92822 CSLB #949229 Decision"}
+       let(:event) { LogStash::Event.new("message" => data) }
+
+       it "a list of PERSONs" do
+         subject.register
+         subject.filter(event)
+         expect(event['message']).to eq(data)
+         expect(event['ner.names']).to include('Jeffrey Alan Mott')
+       end
+
+       it "a list of LOCATIONS" do
+         subject.register
+         subject.filter(event)
+         expect(event['message']).to eq(data)
+         expect(event['ner.locations']).to include('Brea')
+       end
+
+       it "a list of ORGANIZATIONs" do
+         subject.register
+         subject.filter(event)
+         expect(event['message']).to eq(data)
+         expect(event['ner.organizations']).to include('Aldan , Inc.')
+       end
+
+       it "a list of DATEs" do
+         subject.register
+         subject.filter(event)
+         expect(event['message']).to eq(data)
+         expect(event['ner.dates']).to include('04/04/2016')
+       end
+     end
+   end
+ end
@@ -0,0 +1,2 @@
1
+ # encoding: utf-8
2
+ require "logstash/devutils/rspec/spec_helper"
metadata ADDED
@@ -0,0 +1,119 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: logstash-filter-stanford-nlp
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: java
6
+ authors:
7
+ - Jonathan Randall
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-05-10 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - ~>
17
+ - !ruby/object:Gem::Version
18
+ version: '0'
19
+ name: jar-dependencies
20
+ prerelease: false
21
+ type: :runtime
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ~>
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - '>='
31
+ - !ruby/object:Gem::Version
32
+ version: 2.0.0
33
+ - - <
34
+ - !ruby/object:Gem::Version
35
+ version: 3.0.0
36
+ name: logstash-core
37
+ prerelease: false
38
+ type: :runtime
39
+ version_requirements: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - '>='
42
+ - !ruby/object:Gem::Version
43
+ version: 2.0.0
44
+ - - <
45
+ - !ruby/object:Gem::Version
46
+ version: 3.0.0
47
+ - !ruby/object:Gem::Dependency
48
+ requirement: !ruby/object:Gem::Requirement
49
+ requirements:
50
+ - - '='
51
+ - !ruby/object:Gem::Version
52
+ version: 0.0.19
53
+ name: logstash-devutils
54
+ prerelease: false
55
+ type: :development
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - '='
59
+ - !ruby/object:Gem::Version
60
+ version: 0.0.19
61
+ description: This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program
62
+ email: jonathar@users.noreply.github.com
63
+ executables: []
64
+ extensions: []
65
+ extra_rdoc_files: []
66
+ files:
67
+ - .github/CONTRIBUTING.md
68
+ - .github/ISSUE_TEMPLATE.md
69
+ - .github/PULL_REQUEST_TEMPLATE.md
70
+ - .gitignore
71
+ - .ruby-env
72
+ - .ruby-gemset
73
+ - .ruby-version
74
+ - .travis.yml
75
+ - CHANGELOG.md
76
+ - CONTRIBUTORS
77
+ - DEVELOPER.md
78
+ - Gemfile
79
+ - LICENSE
80
+ - NOTICE.TXT
81
+ - README.md
82
+ - Rakefile
83
+ - lib/logstash-filter-stanford-nlp_jars.rb
84
+ - lib/logstash/filters/ner.rb
85
+ - logstash-filter-stanford-nlp.gemspec
86
+ - spec/filters/ner_spec.rb
87
+ - spec/spec_helper.rb
88
+ homepage: http://www.github.com/rahtanoj/logstash-filter-stanford-nlp
89
+ licenses:
90
+ - Apache License (2.0)
91
+ metadata:
92
+ logstash_plugin: 'true'
93
+ logstash_group: filter
94
+ post_install_message:
95
+ rdoc_options: []
96
+ require_paths:
97
+ - lib
98
+ required_ruby_version: !ruby/object:Gem::Requirement
99
+ requirements:
100
+ - - '>='
101
+ - !ruby/object:Gem::Version
102
+ version: '0'
103
+ required_rubygems_version: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - '>='
106
+ - !ruby/object:Gem::Version
107
+ version: '0'
108
+ requirements:
109
+ - jar 'edu.stanford.nlp:stanford-corenlp', '3.6.0'
110
+ - jar 'org.slf4j:slf4j-simple', '1.7.21'
111
+ - jar 'com.google.protobuf:protobuf-java', '2.6.1'
112
+ rubyforge_project:
113
+ rubygems_version: 2.4.5
114
+ signing_key:
115
+ specification_version: 4
116
+ summary: This filter extracts named entities from the message and adds them as attributes to the message.
117
+ test_files:
118
+ - spec/filters/ner_spec.rb
119
+ - spec/spec_helper.rb