log_sense 1.9.0 → 2.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ccf396e27466411bfa603709787126290e24d0652ac5cc6b96bad008d11ccd6d
4
- data.tar.gz: 4cee6242192c1c38273c7a9fa55c2d55ea18aa65c6328932a27f944fa8b25cbd
3
+ metadata.gz: 288570381159c730801985845d064e62fa8ad08ee6b44f48ebf2928e60e80e47
4
+ data.tar.gz: 64d80612d568f4fd1991257d2755ec38ef94543b1fd451097efd6e9006ab551f
5
5
  SHA512:
6
- metadata.gz: 25874781f2e012a40de832d553a1a3bd0cfa0fd232988e482a2e6ceaf6c3cdd632bbe5059bbb134f4fc3029add42ff0e51311d9c6c3a61feedebf69df1f9c8b6
7
- data.tar.gz: 0e9733a78e5b972ed20c657983e04f7d238a14646e91a0c68e2d4594c4fbaecacc5271ec1c3e15949d590b86f04049b837e448b1473f9db60a8c2a3673074a02
6
+ metadata.gz: 6352f42ccbd453e9adf4af83372ad8c7113f58d419e502e444dad2b3f3ad5e54b1f31f077631040ea473a9b2976e38ff2b8960cb91091f65830ed03b987cb3e8
7
+ data.tar.gz: 4f01dc9e7ea6a53d983d51508e69710999741f6e5c74b49c2ab42e45a312f3364a07f7ff58fe6dd8e852f08d67d1df110a86c7e0330c6301e522bf8c595839f3
data/CHANGELOG.org CHANGED
@@ -2,6 +2,21 @@
2
2
  #+AUTHOR: Adolfo Villafiorita
3
3
  #+STARTUP: showall
4
4
 
5
+ * 2.0.1
6
+
7
+ - Add GitHub action for publishing to RubyGems after repeated failures
8
+ with authentication from the command line
9
+
10
+ * 2.0.0 (Not released)
11
+
12
+ - World Map
13
+ - Dark mode
14
+ - Fix link colors in sidebar
15
+ - Bars in the statuses bar plot are now colored according to status
16
+ - Add "statuses by day" in Rails report
17
+ - Enlarge "errors" and "potential attacks" reports
18
+ - Various smaller fixes
19
+
5
20
  * 1.9.0
6
21
 
7
22
  - Perform calculation on HTML pages only
data/Gemfile CHANGED
@@ -1,6 +1,6 @@
1
1
  source "https://rubygems.org"
2
2
 
3
- # Specify your gem's dependencies in apache_log_report.gemspec
3
+ # Specify your gem's dependencies in log_sense.gemspec
4
4
  gemspec
5
5
 
6
- gem "rake", "~> 12.0"
6
+ gem "rake", "~> 13.0"
data/Gemfile.lock CHANGED
@@ -1,12 +1,12 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- log_sense (1.7.0)
5
- browser
6
- ipaddr
7
- iso_country_codes
8
- sqlite3
9
- terminal-table
4
+ log_sense (2.0.0)
5
+ browser (~> 5.3.0)
6
+ ipaddr (~> 1.2.0)
7
+ iso_country_codes (~> 0.7.0)
8
+ sqlite3 (~> 2.0.0)
9
+ terminal-table (~> 3.0.0)
10
10
 
11
11
  GEM
12
12
  remote: https://rubygems.org/
@@ -17,22 +17,22 @@ GEM
17
17
  reline (>= 0.3.8)
18
18
  io-console (0.7.2)
19
19
  ipaddr (1.2.6)
20
- irb (1.13.1)
20
+ irb (1.14.0)
21
21
  rdoc (>= 4.0.0)
22
22
  reline (>= 0.4.2)
23
23
  iso_country_codes (0.7.8)
24
24
  mini_portile2 (2.8.7)
25
- minitest (5.23.1)
25
+ minitest (5.24.1)
26
26
  psych (5.1.2)
27
27
  stringio
28
- rake (12.3.3)
28
+ rake (13.2.1)
29
29
  rdoc (6.7.0)
30
30
  psych (>= 4.0.0)
31
- reline (0.5.8)
31
+ reline (0.5.9)
32
32
  io-console (~> 0.5)
33
- sqlite3 (2.0.2)
33
+ sqlite3 (2.0.3)
34
34
  mini_portile2 (~> 2.8.0)
35
- stringio (3.1.0)
35
+ stringio (3.1.1)
36
36
  terminal-table (3.0.2)
37
37
  unicode-display_width (>= 1.1.1, < 3)
38
38
  unicode-display_width (2.5.0)
@@ -41,10 +41,10 @@ PLATFORMS
41
41
  ruby
42
42
 
43
43
  DEPENDENCIES
44
- debug
44
+ debug (~> 1.9.0)
45
45
  log_sense!
46
- minitest
47
- rake (~> 12.0)
46
+ minitest (~> 5.24.0)
47
+ rake (~> 13.0)
48
48
 
49
49
  BUNDLED WITH
50
50
  2.5.3
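The new lockfile records a pessimistic (`~>`) constraint for every runtime dependency, where 1.x listed bare gem names. As a rough illustration of what these constraints admit, the sketch below asks RubyGems' own `Gem::Requirement`. The helper name `allows?` is ours and is not part of log_sense.

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy `constraint`?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 2.0.0" admits patch releases only (>= 2.0.0, < 2.1)
puts allows?("~> 2.0.0", "2.0.3")   # the sqlite3 version resolved in the lockfile
puts allows?("~> 2.0.0", "2.1.0")

# "~> 13.0" admits any 13.x release (>= 13.0, < 14)
puts allows?("~> 13.0", "13.2.1")   # the rake version resolved in the lockfile
puts allows?("~> 13.0", "14.0.0")
```

With a three-segment constraint such as `~> 2.0.0`, a future sqlite3 2.1 would not be picked up until the gemspec is updated.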
data/README.org CHANGED
@@ -4,58 +4,89 @@
4
4
 
5
5
  * Introduction
6
6
 
7
- LogSense generates reports and statistics from Apache and Ruby on Rails log
8
- files. All the statistics you need to monitor your application, its
9
- performances, and how users access your app. Since it collects data from logs,
10
- there is no need for cookies or other tracking technologies.
11
-
12
- LogSense is Written in Ruby, it runs from the command line, it is
13
- fast, and it can be installed on any system with a relatively recent
14
- version of Ruby. We tested on Ruby 2.6.9, Ruby 3.0.x and later.
15
-
16
- When generating reports, LogSense reports the following data:
17
-
18
- - Visitors, hits, unique visitors, bandwidth used
19
- - Most accessed HTML pages
20
- - Most accessed resources
21
- - Missed resources (also by IP) which helps highlight
22
- potential attacks
23
- - Response statuses
24
- - Referers
25
- - OS, browsers, and devices
26
- - IP Country location, thanks to the DP-IP lite country DB
27
- - Streaks: resources accessed by a given IP over time
28
- - Performance of Rails requests
29
- - Rails Fatal Errors (with reference to the logs)
7
+ LogSense generates reports and statistics from Ruby on Rails and Apache/Nginx
8
+ log files.
9
+
10
+ Main features:
11
+
12
+ - Statistics for Rails app in production and Web server logs (combined format,
13
+ which can be produced both by Apache and Nginx)
14
+ - Reports on performances, errors, visitors, and devices used to access your
15
+ websites and webapps[fn:: LogSense also parses the data generated by the
16
+ BrowserInfo gem, providing additional information for Rails apps, including
17
+ devices, platforms and number of accesses to methods by device type.].
18
+ - Can combine one or more log files
19
+ - No need for cookies or other tracking technologies (but you need access to
20
+ your log files)
21
+ - Filters allow you to analyze specific periods and distinguish traffic generated by
22
+ self polls and crawlers.
23
+ - Reports can be generated in HTML, txt, ufw, and SQLite. HTML reports are
24
+ responsive and come with dark and light themes.
30
25
 
31
- LogSense parses also the data generated by BrowserInfo, providing additional
32
- information for Rails apps, including devices and platforms and number of
33
- accesses to methods by device type.
26
+ LogSense is written in Ruby, it runs from the command line, it is fast, and it
27
+ can be installed on any system with a relatively recent version of Ruby. We
28
+ use it with Ruby 3.1.4 and 3.3.0.
34
29
 
35
- A special output format =ufw= generates rules for the [[https://launchpad.net/ufw][Uncomplicated
36
- Firewall]] to blacklist IPs requesting URLs matching a specific pattern.
37
-
38
- Filters from the command line allow to analyze specific periods and
39
- distinguish traffic generated by self polls and crawlers.
30
+ It is fast. On a ThinkPad P16, a 277M log file is parsed in 15 seconds,
31
+ that is, about 7740 events per second; a 569M log file is parsed in
32
+ 50 seconds, that is, about 4700 events per second.
40
33
 
41
- LogSense generates HTML, txt, ufw, and SQLite outputs.
42
34
 
43
- ** Rails Report Structure
35
+ ** Rails Production Report
44
36
 
45
37
  #+ATTR_HTML: :width 80%
46
38
  [[file:./screenshots/rails-screenshot.png]]
47
39
 
48
-
49
- ** Apache Report Structure
40
+ LogSense understands the Rails *production log* and generates the following
41
+ reports in TXT and HTML:
42
+
43
+ - Daily Distribution
44
+ - Time Distribution
45
+ - Statuses
46
+ - Statuses by Day
47
+ - Rails Performance
48
+ - Controller and Methods by Device
49
+ - Fatal Events
50
+ - Internal Server Errors
51
+ - Errors
52
+ - Potential Attacks
53
+ - Browsers
54
+ - Platforms
55
+ - IPs
56
+ - Countries
57
+ - IP per hour
58
+ - Sessions
59
+
60
+ ** Apache/Nginx Report
50
61
 
51
62
  #+ATTR_HTML: :width 80%
52
- [[file:./screenshots/apache-screenshot.png]]
53
-
63
+ [[file:./screenshots/combined_log-screenshot.png]]
64
+
65
+ LogSense reads the Apache/Nginx *combined log* format and generates the
66
+ following reports in TXT and HTML:
67
+
68
+ - Time Distribution
69
+ - 20_ and 30_ on HTML pages
70
+ - 20_ and 30_ on other resources
71
+ - 40_ and 50_x on HTML pages
72
+ - 40_ and 50_ on other resources
73
+ - 40_ and 50_x on HTML pages by IP
74
+ - 40_ and 50_ on other resources by IP
75
+ - Statuses
76
+ - Statuses by Day
77
+ - Browsers
78
+ - Platforms
79
+ - IPs
80
+ - Countries
81
+ - IP per hour
82
+ - Combined Platform Data
83
+ - Referers
84
+ - Sessions
54
85
 
55
86
  ** UFW Report
56
87
 
57
- The output format =ufw= generates directives for Uncomplicated
58
- Firewall blacklisting IPs requesting URLs matching a given pattern.
88
+ The =ufw= output format generates directives for Uncomplicated Firewall,
89
+ blacklisting IPs requesting URLs matching a given pattern.
59
90
 
60
91
  We use it to blacklist IPs requesting WordPress login pages on our
61
92
  websites... since we don't use WordPress for our websites.
@@ -73,40 +104,55 @@ ufw deny from 185.255.134.18
73
104
  ...
74
105
  #+end_src
75
106
 
76
-
77
- * An important word of warning
78
-
79
- [[https://owasp.org/www-community/attacks/Log_Injection][Log poisoning]] is a technique whereby attackers send requests with invalidated
80
- user input to forge log entries or inject malicious content into the logs.
81
-
82
- log_sense sanitizes entries of HTML reports, to try and protect from log
83
- poisoning. *Log entries and URLs in SQLite3, however, are not sanitized*:
84
- they are stored and read from the log. This is not, in general, an issue,
85
- unless you use the data from SQLite in environments in which URLs can be
86
- opened or code executed.
87
-
88
- * Motivation
89
-
90
- LogSense moves along the lines of tools such as [[https://goaccess.io/][GoAccess]] and [[https://umami.is/][Umami]], focusing on
91
- *privacy*, *data-ownership*, and *simplicity*: no need to install JavaScript
92
- snippets, no tracking cookies, just plain and simple log analysis.
93
-
94
- LogSense is also inspired by *static websites generators*: statistics are
95
- generated from the command line and accessed as static HTML files. This
96
- significantly reduces the attack surface of your web server and installation
97
- headaches. We have, for instance, a cron job running on our servers, generating
98
- statistics at night. The generated files are then made available on a private
99
- area on the web.
100
-
101
107
  * Installation
102
108
 
103
109
  #+begin_src bash
104
110
  gem install log_sense
105
111
  #+end_src
106
112
 
113
+ If you want to collect information about browsers, platform and devices when
114
+ generating Rails reports, add the =browser= gem to your bundle and the
115
+ following code to =application_controller.rb=:
116
+
117
+ #+begin_example ruby
118
+ # Gemfile
119
+ gem "browser"
120
+ #+end_example
121
+
122
+ #+begin_example ruby
123
+ # application_controller.rb
124
+ class ApplicationController < ActionController::Base
125
+
126
+ # [...]
127
+
128
+ before_action do |controller|
129
+ user_agent = request.env['HTTP_USER_AGENT']
130
+ ip = request.env['REMOTE_ADDR']
131
+
132
+ hashed_ip = Digest::SHA256.hexdigest ip
133
+ b = Browser.new(user_agent)
134
+ now = DateTime.now
135
+
136
+ logger = Rails.logger
137
+ browser_data = [
138
+ b.name, b.platform, b.device.name,
139
+ controller.class.name, controller.action_name,
140
+ request.format.symbol,
141
+ hashed_ip,
142
+ now
143
+ ]
144
+
145
+ browser_data_str = browser_data.map { |x| "\"#{x}\"" }.join(',')
146
+ logger.info "BrowserInfo: #{browser_data_str}"
147
+ end
148
+
149
+ # [...]
150
+ end
151
+ #+end_example
152
+
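The `before_action` above writes one `BrowserInfo:` line per request, made of eight double-quoted, comma-separated values. A minimal sketch of reading such a line back is shown here, assuming no value contains a double quote. The names `FIELDS` and `parse_browser_info`, the field labels, and the sample log line are all invented for illustration. This is not LogSense's actual parser.

```ruby
# Labels follow the order of `browser_data` in the controller snippet;
# the symbols themselves are our own choice.
FIELDS = %i[browser platform device controller action format hashed_ip time].freeze

# Hypothetical helper: returns a Hash for a BrowserInfo line, nil otherwise.
def parse_browser_info(line)
  marker = "BrowserInfo: "
  start = line.index(marker)
  return nil unless start

  # Collect the text between each pair of double quotes after the marker.
  values = line[(start + marker.length)..].scan(/"([^"]*)"/).flatten
  return nil unless values.size == FIELDS.size

  FIELDS.zip(values).to_h
end

# Made-up log line; the prefix before "BrowserInfo:" depends on the logger setup.
sample = 'I, [2024-07-01T10:00:00.000000 #123]  INFO -- : BrowserInfo: ' \
         '"Chrome","macOS","Unknown","PagesController","index","html",' \
         '"ab12","2024-07-01T10:00:00+00:00"'

p parse_browser_info(sample)
p parse_browser_info("Started GET / for 203.0.113.7")
```

Because only the SHA256 digest of the IP is logged, a reader of these lines can count distinct visitors without seeing raw addresses.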
107
153
  * Usage
108
154
 
109
- #+begin_src bash :results raw output :wrap example
155
+ #+begin_src bash :results raw output :wrap example :exports both
110
156
  log_sense --help
111
157
  #+end_src
112
158
 
@@ -131,7 +177,7 @@ area on the web.
131
177
  -v, --version Prints version information
132
178
  -h, --help Prints this help
133
179
 
134
- This is version 1.8.0
180
+ This is version 2.0.0
135
181
 
136
182
  Output formats:
137
183
 
@@ -146,6 +192,51 @@ log_sense -f apache -i access.log -t txt > access-data.txt
146
192
  log_sense -f rails -i production.log -t html -o performance.html
147
193
  #+end_example
148
194
 
195
+ * Motivation
196
+
197
+ LogSense focuses on *privacy*, *data-ownership*, and *simplicity*: no need to
198
+ install JavaScript snippets, no tracking cookies, just plain and simple log
199
+ analysis.
200
+
201
+ LogSense is also inspired by *static websites generators*: statistics are
202
+ generated from the command line and accessed as static HTML files. This
203
+ significantly reduces the attack surface of your web server and installation
204
+ headaches. We have a cron job running on our servers, generating statistics at
205
+ night. The generated files are then made available on a private area on the
206
+ web and rotated monthly.
207
+
208
+ * An important word of warning on SQLite3 output
209
+
210
+ [[https://owasp.org/www-community/attacks/Log_Injection][Log poisoning]] is a technique whereby attackers send requests with unvalidated
211
+ user input to forge log entries or inject malicious content into the logs.
212
+
213
+ log_sense sanitizes entries of HTML reports, to try and protect from log
214
+ poisoning. *Log entries and URLs in SQLite3 tables, however, are not
215
+ sanitized*: they are read and stored from the log as they are. This is not, in
216
+ general, an issue, unless you use the unsanitized data from SQLite as it is in
217
+ environments where URLs can be opened or code executed using the URLs as
218
+ argument.
219
+
220
+ * Change Log
221
+
222
+ See the [[file:CHANGELOG.org][CHANGELOG]] file.
223
+
224
+ * Compatibility
225
+
226
+ LogSense should run on any system on which a recent version of Ruby
227
+ runs. We tested it with Ruby 2.6.9, Ruby 3.0.x, and Ruby 3.3.x.
228
+
229
+ * Author and Contributors
230
+
231
+ [[https://shair.tech][Shair.Tech]]
232
+
233
+ * Credits
234
+
235
+ - HTML reports use [[https://get.foundation/][Zurb Foundation]], [[https://www.datatables.net/][Data Tables]], and [[https://echarts.apache.org/en/index.html][Apache ECharts]]
236
+ - The textual format is compatible with [[https://orgmode.org/][Org Mode]] and can be further processed to
237
+ any format [[https://orgmode.org/][Org Mode]] can be exported to, including HTML and PDF, with the word
238
+ of warning in the section above concerning log poisoning.
239
+
149
240
  * Code Structure
150
241
 
151
242
  The code implements a pipeline, with the following steps:
@@ -164,64 +255,26 @@ The code implements a pipeline, with the following steps:
164
255
  building the reports.
165
256
  5. *Emitter* generates reports from shaped data using ERB.
166
257
 
167
- The architecture and the structure of the code is far from being nice,
168
- for historical reason and for a bunch of small differences existing
169
- between the input and the outputs to be generated. This usually ends
170
- up with modifications to the code that have to be replicated in
171
- different parts of the code and in interferences.
172
-
173
- Among the points I would like to address:
174
-
175
- - The execution pipeline in the main script has a few exceptions to
176
- manage SQLite reading/dumping and ufw report. A linear structure
177
- would be a lot nicer.
178
- - Two different classes are defined for steps 1, 2, and 4, to manage,
179
- respectively, Apache and Rails logs. These classes inherit from a
180
- common ancestor (e.g. ApacheParser and RailsParser both inherit from
181
- Parser), but there is still too little code shared. A nicer
182
- approach would be that of identifying a common DB structure and
183
- unify the pipeline up to (or including) the generation of
184
- reports. There are a bunch of small different things to highlight in
185
- reports, which still make this difficult. For instance, the country
186
- report for Apache reports size of TX data, which is not available
187
- for Rail reports.
188
- - Geolocation could become a lot more efficient if performed in
189
- SQLite, rather than in Ruby
190
- - The distinction between Aggregation, Shaping, and Emission is a too
191
- fine-grained and it would be nice to be able to cleanly remove one
192
- of the steps.
193
-
194
-
195
- * Change Log
196
-
197
- See the [[file:CHANGELOG.org][CHANGELOG]] file.
198
-
199
- * Compatibility
200
-
201
- LogSense should run on any system on which a recent version of Ruby
202
- runs. We tested it with Ruby 2.6.9 and Ruby 3.x.x.
203
-
204
- Concerning the outputs:
205
258
 
206
- - HTML reports use [[https://get.foundation/][Zurb Foundation]], [[https://www.datatables.net/][Data Tables]], and [[https://vega.github.io/vega-lite/][Vega Light]], which
207
- are all downloaded from a CDN
208
- - The textual format is compatible with [[https://orgmode.org/][Org Mode]] and can be further
209
- processed to any format [[https://orgmode.org/][Org Mode]] can be exported to, including HTML
210
- and PDF, with the word of warning in the section above.
259
+ * Todo
211
260
 
212
- * Author and Contributors
213
-
214
- [[https://shair.tech][Shair.Tech]]
261
+ See [[todo.org]]
215
262
 
216
263
  * Known Bugs
217
264
 
218
265
  We have been running LogSense for quite a few years with no particular issues.
219
266
  There are no known bugs; there is an unknown number of unknown bugs.
220
267
 
221
- * License
268
+ You are most welcome to report issues and missing features, using the Issue
269
+ tracker.
270
+
271
+ * Licenses
222
272
 
223
- Source code distributed under the terms of the [[http://opensource.org/licenses/MIT][MIT License]].
273
+ LogSense is distributed under the terms of the [[http://opensource.org/licenses/MIT][MIT License]].
224
274
 
225
- Geolocation is made possible by the DB-IP.com IP to City database,
226
- released under a CC license.
275
+ Geolocation is made possible by [[https://db-ip.com/][dbip]]'s IP to City database, released under a
276
+ CC license.
227
277
 
278
+ The world map is distributed under the terms of the [[http://opensource.org/licenses/MIT][MIT License]] by Pareto
279
+ Software, [[https://simplemaps.com/][Simplemaps.com]]. It is used in LogSense with some changes to the class
280
+ names and ids.
data/exe/log_sense CHANGED
@@ -114,7 +114,7 @@ elsif @options[:output_format] == "ufw"
114
114
  }
115
115
  ips_and_urls.each do |ip, urls|
116
116
  puts "# #{urls[0..10].uniq.join(' ')}"
117
- puts "ufw deny from #{ip}"
117
+ puts "ufw insert 1 deny from #{ip}"
118
118
  puts
119
119
  end
120
120
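The hunk above switches the emitted directive from `ufw deny from IP` to `ufw insert 1 deny from IP`. With `insert 1`, ufw places the rule at position 1 of the ruleset instead of appending it, so the deny is matched before any allow rule added earlier. The sketch below mirrors the loop but returns the lines instead of printing them. The helper name `ufw_rules` and the sample data are invented for illustration.

```ruby
# Hypothetical helper mirroring the loop in exe/log_sense.
# Each IP yields a comment with some of the requested URLs, the deny rule,
# and an empty separator line.
def ufw_rules(ips_and_urls)
  ips_and_urls.flat_map do |ip, urls|
    [
      "# #{urls[0..10].uniq.join(' ')}",
      "ufw insert 1 deny from #{ip}",
      ""
    ]
  end
end

# Made-up sample using a documentation-range address.
sample = {
  "203.0.113.7" => ["/wp-login.php", "/wp-login.php", "/xmlrpc.php"]
}

puts ufw_rules(sample)
```

Since every generated rule is inserted at position 1, the last rule applied ends up first in the ruleset. For a plain list of deny rules this ordering should not matter.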
 
@@ -132,6 +132,7 @@ else
132
132
 
133
133
  warn "Grouping IPs by country ..." if @options[:verbose]
134
134
  country_col = geolocated_data[0].size - 1
135
+ @data[:ips] = geolocated_data
135
136
  @data[:countries] = geolocated_data.group_by { |x| x[country_col] }
136
137
  elsif @options[:geolocation] && @data[:ips].size == 0
137
138
  warn "Skipping geolocation: no IP found" if @options[:verbose]
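The added assignment stores the geolocated rows back into `@data[:ips]` before they are grouped by country, presumably so that the IP report can show the country as well. The grouping step itself can be sketched on made-up rows as follows. The three-column layout is an assumption for illustration, and the real rows may carry more columns.

```ruby
# Made-up rows: [ip, hits, country]. As in the hunk above, the country is
# assumed to sit in the last column.
geolocated_data = [
  ["203.0.113.7", 12, "IT"],
  ["198.51.100.4", 3, "DE"],
  ["203.0.113.9", 5, "IT"]
]

country_col = geolocated_data[0].size - 1

# Hash of country => rows for that country, in order of first appearance.
countries = geolocated_data.group_by { |row| row[country_col] }

countries.each do |country, rows|
  hits = rows.sum { |row| row[1] }
  puts "#{country}: #{rows.size} IP(s), #{hits} hits"
end
```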
@@ -78,7 +78,8 @@ module LogSense
78
78
  extra_cols = ""
79
79
  end
80
80
 
81
- @ips = @db.execute %(SELECT ip, count(ip) #{extra_cols} from #{@table}
81
+ @ips = @db.execute %(SELECT ip, count(ip) #{extra_cols}
82
+ from #{@table}
82
83
  where #{filter}
83
84
  group by ip
84
85
  order by count(ip) desc
@@ -169,7 +170,7 @@ module LogSense
169
170
  # name is used to give the name to the column with formatted time
170
171
  def ip_by_time_query(name, format_string)
171
172
  %(SELECT ip,
172
- strftime("%H", #{@date_field}) as #{name},
173
+ strftime('#{format_string}', #{@date_field}) as #{name},
173
174
  count(#{@url_field}) from #{@table}
174
175
  where #{filter} and ip != "" and
175
176
  #{@url_field} != "" and