logstash-input-multirss 1.1.0 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 48be80f785a5000cf522c4717ba16872e8547082844d38d032698b2813ddf2af
-  data.tar.gz: 3e37d5fcddafc3cff0cb9dd6c4d488ae6a1fd40c9aad1219f8938771f7fd1ed2
+  metadata.gz: e1cb1975f468ee41df996eab224de416ee1c1389d4508ab153eba11e3d9ec601
+  data.tar.gz: 66adf18634c47c4a2bf3aebc31b071e112bf54b7f1eb40a34e0f47ecbebcedad
 SHA512:
-  metadata.gz: b2795603d6db7056798272912fc7986e410aad8cabdfda6fdce74a5e7dc26ccbfe621a13817f191ff8a04bfa50ffe193b060a661e80ec9d6ba53cf70acc8f38c
-  data.tar.gz: 1b83d04df5d8f0a35003f4c1287f7f800e1a4a3792c87ac58b7a6ca151f60ef0a31c20d940a663468aae62327820a2540995756e63684d3ad302d7e4e9310ab7
+  metadata.gz: '0401495068a213d7d689859c390b3ec1dd9a95e82c6e3af42d49a39780770eb9558c2558c9cf3b1951d1ba76ee7462d568c39384488b48b16f1ac6cf30219d39'
+  data.tar.gz: a65c76f4dce2c18bf85074328c30c1dc5ffbfaecd127bae27a1664cb1e2ac94c13a79bcd4c77a97c03741c62bdb458b3a506f9b9d8737f8a54c6e1a4488e9d1f
data/README.md CHANGED
@@ -4,7 +4,9 @@ This is a plugin for [Logstash](https://github.com/elastic/logstash).
 
 It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
 
-# Install
+# Developing
+
+## Install
 
 U can install the plugin from https://rubygems.org/gems/logstash-input-multirss , or build it yuouself in a logstash service or container with :
 
@@ -16,15 +18,28 @@ ruby -S gem build logstash-input-multirss.gemspec
 
 logstash-plugin install logstash-input-multirss-[nº_version].gem
 
-# Pipeline Example
+### Pipeline Example
 
 You can see a example in https://github.com/felixramirezgarcia/logstash-input-multirss/blob/master/example-pipeline.conf
 
 The difference between the attributes multi_feed and one_feed is that the multi_feed is the URI of the parent address where several rss (xml) are found. For the case where you want to explore only one of those links you can use the one_feed attribute. A visual example can be seen by visiting the following links:
 
 Father (multi_feed) => http://rss.elmundo.es/rss/
+
 Son (one_feed) => http://estaticos.elmundo.es/elmundo/rss/portada.xml
 
+All the params are :
+
+1) multi_feed => [array] URI parent with more rss links inside , something like this: http://rss.elmundo.es/rss/
+
+2) one_feed => [array] childs URIS with XML content inside , something like this: http://estaticos.elmundo.es/elmundo/rss/portada.xml
+
+3) blacklist => [array] strings , links, text ... what you dont want explored
+
+4) Interval => [int] Set the Stoppable_sleep interval for the pipe
+
+5) keywords => [array] If you use this parameter will only compile those news that contain in any of its attributes a word from this array
+
 ## Documentation
 
 Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation so any comments in the source code will be first converted into asciidoc and then into html. All plugin documentation are placed under one [central location](http://www.elastic.co/guide/en/logstash/current/).
@@ -36,37 +51,9 @@ Logstash provides infrastructure to automatically generate documentation for thi
 
 Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/logstash discussion forum.
 
-## Developing
-
-### 1. Plugin Developement and Testing
-
-#### Code
-- To get started, you'll need JRuby with the Bundler gem installed.
-
-- Create a new plugin or clone and existing from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization. We also provide [example plugins](https://github.com/logstash-plugins?query=example).
-
-- Install dependencies
-```sh
-bundle install
-```
-
-#### Test
-
-- Update your dependencies
-
-```sh
-bundle install
-```
-
-- Run tests
-
-```sh
-bundle exec rspec
-```
-
-### 2. Running your unpublished Plugin in Logstash
+# Running your unpublished Plugin in Logstash
 
-#### 2.1 Run in a local Logstash clone
+## Run in a local Logstash clone
 
 - Edit Logstash `Gemfile` and add the local plugin path, for example:
 ```ruby
@@ -82,9 +69,9 @@ bin/logstash -e 'filter {awesome {}}'
 ```
 At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
 
-#### 2.2 Run in an installed Logstash
+## Run in an installed Logstash
 
-You can use the same **2.1** method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory or you can build the gem and install it using:
+You can use the same method to run your plugin in an installed Logstash by editing its `Gemfile` and pointing the `:path` to your local plugin development directory or you can build the gem and install it using:
 
 - Build your plugin gem
 ```sh
data/lib/logstash/inputs/multirss.rb CHANGED
@@ -7,31 +7,37 @@ require "uri"
 require "mechanize"
 require "rss"
 require "nokogiri"
+require "fileutils"
 
 # if you want to debug it you just have to uncomment the puts and build the gem with
 # ruby -S gem build logstash-input-multirss.gemspec
-# and install the gem in a logstash service with
+# and install the gem in a logstash service or container with
 # logstash-plugin install logstash-input-multirss-x.x.x.gem
 
 class LogStash::Inputs::Multirss < LogStash::Inputs::Base
-  config_name "multirss"
+  config_name "multirss" #Plugin name
 
-  default :codec, "plain"
+  default :codec, "plain" #Codec
 
-  # The rss array list to use in the pipe
-  config :multi_feed, :validate => :array, :required => true
+  # The rss parent array list to use in the pipe (link with a lot rss links inside)
+  config :multi_feed, :validate => :array, :default => []
 
-  # The rss array list to use in the pipe
+  # The rss childs array list to use in the pipe (simple rss link)
   config :one_feed, :validate => :array, :default => []
 
   #Set de interval for stoppable_sleep
   config :interval, :validate => :number, :default => 3600
 
-  #Set de black list to forget read
+  #Set de black list to forget read and get content
   config :blacklist, :validate => :array, :default => []
 
+  #Set de keywords to ONLY get content whit it
+  config :keywords, :validate => :array, :default => []
+
   public
-  def register
+
+  def register #initialize
+    #Mechanize agent
     @agent = Mechanize.new
     @agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
   end # def register
@@ -41,11 +47,13 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
     # we can abort the loop if stop? becomes true
     urls = []
 
+    #Don't stop, keep going.
     while !stop?
-
-      @multi_feed.each do |rss|
-        str = "Read parent: " + rss
-        #puts str
+
+      manage_tempdir
+
+      @multi_feed.each do |rss| #get the father's children
+        #puts "Read parent: " + rss
         begin
           page = @agent.get(rss)
           page.links.each do |link|
@@ -62,11 +70,9 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
       links.each do |link|
         begin
           response_link(link,queue)
-          str = "Read clidren: " + link
-          #puts str
+          #puts "Read clidren: " + link
         rescue
-          str = "Fail to get " + link + " children"
-          #puts str
+          #puts "Fail to get " + link + " children"
           next
         end # end begin
       end # end each links
@@ -82,17 +88,23 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
       all_links.each do |link|
         begin
           response_link(link,queue)
-          str = "Read clidren: " + link
-          #puts str
+          #puts "Read clidren: " + link
         rescue
-          str = "Fail to get " + link
-          #puts str
+          #puts "Fail to get " + link
           next
         end # begin
       end # all_links loop
 
       urls.clear
 
+      # Remove the tempfiles
+      if (File::directory?(@d))
+        ENV.delete("TMPDIR")
+        FileUtils.rm_rf @d
+        #puts "Remove temp dir"
+      end
+
+      #Stoppable_sleep interval
       Stud.stoppable_sleep(@interval) { stop? }
     end # end while
   end # end def run
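
The hunk above deletes `@d` and unsets `TMPDIR` at the end of every pass, after `manage_tempdir` recreated them at the start of the pass. A minimal sketch of the same create/use/remove lifecycle, assuming the standard library's `Dir.mktmpdir` is acceptable in place of a fixed `~/.tmp` path. `with_scratch_tmpdir` is a hypothetical helper, not part of the plugin:

```ruby
require "tmpdir"

# Hypothetical helper (not in the plugin): run a block with TMPDIR pointing
# at a freshly created scratch directory, then put the old value back.
def with_scratch_tmpdir
  previous = ENV["TMPDIR"]
  # The block form of Dir.mktmpdir removes the directory when the block exits.
  Dir.mktmpdir("multirss") do |dir|
    ENV["TMPDIR"] = dir
    begin
      yield dir
    ensure
      # Restore (or unset) TMPDIR even if the block raised.
      if previous.nil?
        ENV.delete("TMPDIR")
      else
        ENV["TMPDIR"] = previous
      end
    end
  end
end

# Usage: anything written under the scratch directory is gone afterwards.
seen = nil
with_scratch_tmpdir do |dir|
  seen = dir
  File.write(File.join(dir, "feed.xml"), "<rss/>")
end
```

Because the directory name is unique per call, two pipelines running under the same user would not remove each other's files, which a shared `~/.tmp` cannot guarantee.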
@@ -129,19 +141,53 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
 
   def link_rss_response(queue, item)
     event = LogStash::Event.new()
-    item.element_children.each do |x|
-      if x.inner_html.to_s.chars.first(9).join == "<![CDATA["
-        eve = LogStash::Event.new( x.name => x.inner_html.to_s[9..x.inner_html.to_s.length-4])
-        event.append( eve )
-      else
-        eve = LogStash::Event.new( x.name => x.inner_html.to_s )
-        event.append( eve )
-      end # end if
-    end # end loop
-    decorate(event)
-    queue << event
+
+    if @keywords.size.to_s.to_i > 0 # "Have keywords
+      haskey = false
+
+      item.element_children.each do |x|
+        if include_keywords(x.inner_html.to_s)
+          #puts "--------------Finded notice with the keyword---------------"
+          haskey = true
+        end
+      end # end loop
+
+      if haskey == true
+        item.element_children.each do |x|
+          #puts "The notice " + x.name + " is " + x.inner_html.to_s
+          if x.inner_html.to_s.chars.first(9).join == "<![CDATA["
+            eve = LogStash::Event.new( x.name => x.inner_html.to_s[9..x.inner_html.to_s.length-4] )
+            event.append( eve )
+          else
+            eve = LogStash::Event.new( x.name => x.inner_html.to_s )
+            event.append( eve )
+          end # end if else
+        end # end loop
+      elsif haskey == false # havent haskey
+        event = nil
+      end # if haskey
+
+    else # havent keywords!
+      #puts "Havent keywords, go to get all items"
+      item.element_children.each do |x|
+        if x.inner_html.to_s.chars.first(9).join == "<![CDATA["
+          eve = LogStash::Event.new( x.name => x.inner_html.to_s[9..x.inner_html.to_s.length-4])
+          event.append( eve )
+        else
+          eve = LogStash::Event.new( x.name => x.inner_html.to_s )
+          event.append( eve )
+        end # end if
+      end # end loop
+    end # end if have keywords
+
+    if event != nil
+      decorate(event)
+      queue << event
+    end # end if
+
   end # def link_rss_response
 
+
   def not_include_blacklist(link)
     for i in 0..@blacklist.length-1
       if link.href.include?(@blacklist[i])
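
Both branches of `link_rss_response` in the hunk above repeat the same CDATA unwrapping. A minimal sketch of that one step as a standalone helper, assuming plain strings in place of Nokogiri nodes so it runs outside Logstash. `strip_cdata` and the two constants are hypothetical names, not part of the plugin. Unlike the slice in the diff, this version also checks for the closing `]]>` before trimming:

```ruby
CDATA_OPEN  = "<![CDATA[".freeze
CDATA_CLOSE = "]]>".freeze

# Hypothetical helper (not in the plugin): return the text inside a CDATA
# wrapper, or the input unchanged when it is not wrapped.
def strip_cdata(raw)
  text = raw.to_s
  if text.start_with?(CDATA_OPEN) && text.end_with?(CDATA_CLOSE)
    # Drop the 9-character opener and the 3-character closer.
    text[CDATA_OPEN.length...-CDATA_CLOSE.length]
  else
    text
  end
end

wrapped = strip_cdata("<![CDATA[Hello]]>")  # "Hello"
plain   = strip_cdata("Hello")              # "Hello"
```

With a helper like this, each of the three `item.element_children.each` loops could reduce to a single `event.append` call per child.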
@@ -152,4 +198,33 @@ class LogStash::Inputs::Multirss < LogStash::Inputs::Base
   end # def not_include_blacklist
 
 
+  def include_keywords(key)
+    for i in 0..@keywords.length-1
+      if key.include?(@keywords[i])
+        return true
+      end # end if
+    end # end for
+    return false
+  end # def include_keywords
+
+
+  def manage_tempdir
+    #set the tempfile to openUri output
+    @d = "#{Dir.home}/.tmp"
+    #if exists
+    if (File::directory?(@d))
+      #puts "Dir exists , removed and create again"
+      ENV.delete("TMPDIR")
+      FileUtils.rm_rf @d
+      #create new
+      Dir.mkdir @d #create in /usr/share/logstash
+      ENV["TMPDIR"] = @d
+    else
+      Dir.mkdir @d #create in /usr/share/logstash
+      ENV["TMPDIR"] = @d
+      #puts "Dir no exist , created...."
+    end
+  end
+
+
 end # class LogStash::Inputs::Crawler
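
`include_keywords` above is a case-sensitive substring test against each entry of `@keywords`. A sketch of the same check written with `Enumerable#any?`, taking the keyword list as an argument so it can run outside Logstash. `keyword_match?` and `keep_item?` are hypothetical names, and the empty-list case mirrors the "no keywords, keep every item" branch of `link_rss_response`:

```ruby
# Hypothetical standalone version of the keyword check (not the plugin's API).
# True when any keyword occurs as a substring of text (case-sensitive).
def keyword_match?(text, keywords)
  keywords.any? { |keyword| text.include?(keyword) }
end

# Decide whether an RSS item should become an event, given the raw values
# of its child elements. An empty keyword list keeps every item.
def keep_item?(field_values, keywords)
  return true if keywords.empty?
  field_values.any? { |value| keyword_match?(value.to_s, keywords) }
end

fields = ["Elecciones en Madrid", "<![CDATA[Resumen del dia]]>"]
kept    = keep_item?(fields, ["Madrid"])     # a field contains "Madrid"
dropped = keep_item?(fields, ["Barcelona"])  # no field contains "Barcelona"
all     = keep_item?(fields, [])             # no keywords configured
```

Since `String#include?` is case-sensitive, a keyword `"madrid"` would not match the first field; downcasing both sides would be one way to relax that, at the cost of changing the behaviour shown in the diff.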
data/logstash-input-multirss.gemspec CHANGED
@@ -1,9 +1,14 @@
 Gem::Specification.new do |s|
   s.name = 'logstash-input-multirss'
-  s.version = '1.1.0'
+  s.version = '1.2.0'
   s.licenses = ['Apache-2.0']
   s.summary = 'Simple multi rss plugin'
-  s.description = 'This plugin needs a list of links of different rss. Get all the links of the main feed pages and get all the content of each of the links.'
+  s.description = 'This plugin get the feed rss content (being able to use keywords to get the feed) , the params are:
+  1) multi_feed => [array] URI parent with more rss links inside , something like this: http://rss.elmundo.es/rss/
+  2) one_feed => [array] (optionally) childs URIS with XML content inside , something like this: http://estaticos.elmundo.es/elmundo/rss/portada.xml
+  3) blacklist => [array] (optionally) strings , links, text ... what you dont want explored
+  4) Interval => [int] Set the Stoppable_sleep interval for the pipe
+  5) keywords => [array] if you use this parameter will only compile those news that contain in any of its attributes a word from this array'
   s.homepage = 'https://github.com/felixramirezgarcia/logstash-input-multirss'
   s.authors = ['Felix R G']
   s.email = 'felixramirezgarcia@correo.ugr.es'
@@ -24,4 +29,5 @@ Gem::Specification.new do |s|
   s.add_development_dependency 'logstash-devutils', '>= 0.0.16'
   s.add_runtime_dependency "mechanize"
   s.add_runtime_dependency "nokogiri"
+  s.add_runtime_dependency "fileutils"
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-input-multirss
 version: !ruby/object:Gem::Version
-  version: 1.1.0
+  version: 1.2.0
 platform: ruby
 authors:
 - Felix R G
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-08-27 00:00:00.000000000 Z
+date: 2018-09-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -94,8 +94,30 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-description: This plugin needs a list of links of different rss. Get all the links
-  of the main feed pages and get all the content of each of the links.
+- !ruby/object:Gem::Dependency
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  name: fileutils
+  prerelease: false
+  type: :runtime
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: "This plugin get the feed rss content (being able to use keywords to\
+  \ get the feed) , the params are: \n 1) multi_feed => [array]\
+  \ URI parent with more rss links inside , something like this: http://rss.elmundo.es/rss/\
+  \ \n 2) one_feed => [array] (optionally) childs URIS with\
+  \ XML content inside , something like this: http://estaticos.elmundo.es/elmundo/rss/portada.xml\
+  \ \n 3) blacklist => [array] (optionally) strings , links,\
+  \ text ... what you dont want explored\n 4) Interval => [int]\
+  \ Set the Stoppable_sleep interval for the pipe\n 5) keywords\
+  \ => [array] if you use this parameter will only compile those news that contain\
+  \ in any of its attributes a word from this array"
 email: felixramirezgarcia@correo.ugr.es
 executables: []
 extensions: []