log_sense 1.5.2 → 1.6.0

Files changed (39)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.org +27 -0
  3. data/Gemfile.lock +6 -4
  4. data/README.org +108 -34
  5. data/Rakefile +6 -6
  6. data/exe/log_sense +110 -39
  7. data/ip_locations/dbip-country-lite.sqlite3 +0 -0
  8. data/lib/log_sense/aggregator.rb +191 -0
  9. data/lib/log_sense/apache_aggregator.rb +122 -0
  10. data/lib/log_sense/apache_log_line_parser.rb +23 -21
  11. data/lib/log_sense/apache_log_parser.rb +15 -12
  12. data/lib/log_sense/apache_report_shaper.rb +309 -0
  13. data/lib/log_sense/emitter.rb +55 -553
  14. data/lib/log_sense/ip_locator.rb +24 -12
  15. data/lib/log_sense/options_checker.rb +24 -0
  16. data/lib/log_sense/options_parser.rb +81 -51
  17. data/lib/log_sense/rails_aggregator.rb +69 -0
  18. data/lib/log_sense/rails_log_parser.rb +82 -68
  19. data/lib/log_sense/rails_report_shaper.rb +183 -0
  20. data/lib/log_sense/report_shaper.rb +105 -0
  21. data/lib/log_sense/templates/_cdn_links.html.erb +11 -0
  22. data/lib/log_sense/templates/_command_invocation.html.erb +4 -0
  23. data/lib/log_sense/templates/_log_structure.html.erb +7 -1
  24. data/lib/log_sense/templates/_output_table.html.erb +6 -2
  25. data/lib/log_sense/templates/_rails.css.erb +7 -0
  26. data/lib/log_sense/templates/_summary.html.erb +9 -7
  27. data/lib/log_sense/templates/_summary.txt.erb +2 -2
  28. data/lib/log_sense/templates/{rails.html.erb → report_html.erb} +19 -37
  29. data/lib/log_sense/templates/{apache.txt.erb → report_txt.erb} +1 -1
  30. data/lib/log_sense/version.rb +1 -1
  31. data/lib/log_sense.rb +19 -9
  32. data/log_sense.gemspec +1 -1
  33. data/{apache-screenshot.png → screenshots/apache-screenshot.png} +0 -0
  34. data/screenshots/rails-screenshot.png +0 -0
  35. metadata +17 -11
  36. data/lib/log_sense/apache_data_cruncher.rb +0 -147
  37. data/lib/log_sense/rails_data_cruncher.rb +0 -141
  38. data/lib/log_sense/templates/apache.html.erb +0 -115
  39. data/lib/log_sense/templates/rails.txt.erb +0 -22
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 5f37b2247014af9bccfb8bdd205b54eb51cbef35495904444765648a96e0b9ac
-  data.tar.gz: a9cd03afb6770bf854791b8ec93e87d09532828d894b28f5dde670f1af1b1b1d
+  metadata.gz: 7d269dedfbb6ec6eae3a77491cc5ec7ca6241f388f2658964541cdd3983b8298
+  data.tar.gz: 6f24d23c8d06430b3605aad90522e08818cd18fd6c71c5fe90823cdc9483c81e
 SHA512:
-  metadata.gz: 62981216e38b92e1c3ea227726dccd592c441de32519a4df6f39bac127f55da32e82b4a1a14897b09b7032c7b3f9506a14e80e49dc43bfe8c9d86bfaa340da82
-  data.tar.gz: e5e6180008a12561d688668cffb2ef3d885d7733cf877aafaed0397b9fa0e91dd12a2998b19006dad4fe6b202bd203a00757f86d5d37efee777dbc7be43e5ab1
+  metadata.gz: a63f715b281101a6f61029da3e2bcf5db4d47a537af562507b0184d6c6755eed436afed729c31cc95255bf064cbe9f487917c96d78199bbee25e6d4189469951
+  data.tar.gz: 77c62a24c3c81067dd0288ad606b732e0f112d0d1710b326be9e894aa72a93db4cb0a188c93b54ada728c2eeb0cf7378fb7033e13982c9a0932414c8455d2703
data/CHANGELOG.org CHANGED
@@ -2,6 +2,33 @@
 #+AUTHOR: Adolfo Villafiorita
 #+STARTUP: showall
 
+* 1.6.0
+
+- [User] New output format =ufw= generates directives to blacklist IPs
+  requesting URLs matching a pattern. For users of the Uncomplicated
+  Firewall.
+- [User] New option =--no-geo= skips geolocation, which is terribly
+  costly in the current implementation.
+- [User] Updated DB-IP country file to the Dec 2022 version.
+- [User] Changed the name of the SQLite output format to sqlite3.
+- [User] It is now possible to start the analysis from a sqlite3 DB
+  generated by log_sense, breaking parsing and report generation into
+  two steps.
+- [User] Check the correctness of I/O formats before launching the
+  analysis.
+- [User] The Streak report has been renamed Session. Limited the number
+  of URLs shown in each session, to avoid buffer/memory overflows
+  when an IP requests a massive amount of URLs.
+- [User] Added an IP-per-hour visits report.
+- [Code] A rather extensive refactoring of the source code to
+  remove code duplication and improve code structure.
+- [Code] Rubocop-ped various files.
+- [Code] Added a text renderer to DataTable, which sanitizes input and
+  further reduces the risk of XSS and log poisoning attacks.
+- [Code] CDN links have been moved into the Emitter module and used
+  in the Embedded Ruby Templates (ERBs). This simplifies version
+  updates of the JavaScript libraries used in reports.
+
 * 1.5.2
 
 - [User] Updated DB-IP country file.
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    log_sense (1.5.2)
+    log_sense (1.5.3)
      browser
      ipaddr
      iso_country_codes
@@ -16,18 +16,20 @@ GEM
     irb (>= 1.3.6)
     reline (>= 0.3.1)
     io-console (0.5.11)
-    ipaddr (1.2.4)
+    ipaddr (1.2.5)
     irb (1.4.1)
       reline (>= 0.3.0)
     iso_country_codes (0.7.8)
+    mini_portile2 (2.8.0)
     minitest (5.15.0)
     rake (12.3.3)
     reline (0.3.1)
       io-console (~> 0.5)
-    sqlite3 (1.4.4)
+    sqlite3 (1.5.4)
+      mini_portile2 (~> 2.8.0)
     terminal-table (3.0.2)
       unicode-display_width (>= 1.1.1, < 3)
-    unicode-display_width (2.2.0)
+    unicode-display_width (2.3.0)
 
 PLATFORMS
   ruby
data/README.org CHANGED
@@ -9,7 +9,7 @@ Rails logs. Written in Ruby, it runs from the command line, it is
 fast, and it can be installed on any system with a relatively recent
 version of Ruby. We tested on Ruby 2.6.9, Ruby 3.0.x and later.
 
-LogSense reports the following data:
+When generating reports, LogSense reports the following data:
 
 - Visitors, hits, unique visitors, bandwidth used
 - Most accessed HTML pages
@@ -22,18 +22,49 @@ LogSense reports the following data:
 - IP Country location, thanks to the DP-IP lite country DB
 - Streaks: resources accessed by a given IP over time
 - Performance of Rails requests
+
+A special output format =ufw= generates rules for the [[https://launchpad.net/ufw][Uncomplicated
+Firewall]] to blacklist IPs requesting URLs matching a specific pattern.
 
 Filters from the command line allow to analyze specific periods and
 distinguish traffic generated by self polls and crawlers.
 
-LogSense generates HTML, txt, and SQLite outputs.
+LogSense generates HTML, txt, ufw, and SQLite outputs.
 
-And, of course, the compulsory screenshot:
+** Apache Report Structure
 
 #+ATTR_HTML: :width 80%
-[[file:./apache-screenshot.png]]
+[[file:./screenshots/apache-screenshot.png]]
+
+
+** Rails Report Structure
+
+#+ATTR_HTML: :width 80%
+[[file:./screenshots/rails-screenshot.png]]
+
+
+** UFW Report
 
+The output format =ufw= generates directives for the Uncomplicated
+Firewall to blacklist IPs requesting URLs matching a given pattern.
 
+We use it to blacklist IPs requesting WordPress login pages on our
+websites... since we don't use WordPress for our websites.
+
+*Example*
+
+#+begin_src
+$ log_sense -f apache -t ufw -i apache.log
+# /users/sign_in/xmlrpc.php?rsd
+ufw deny from 20.212.3.206
+
+# /wp-login.php /wordpress/wp-login.php /blog/wp-login.php /wp/wp-login.php
+ufw deny from 185.255.134.18
+
+...
+#+end_src
+
+
 * An important word of warning
 
 [[https://owasp.org/www-community/attacks/Log_Injection][Log poisoning]] is a technique whereby attackers send requests with invalidated
@@ -48,9 +79,10 @@ opened or code executed.
 * Motivation
 
 LogSense moves along the lines of tools such as [[https://goaccess.io/][GoAccess]] (which
-strongly inspired the development of Log Sense) and [[https://umami.is/][Umami]], focusing on
-*privacy* and *data-ownership*: the data generated by LogSense is
-stored on your computer and owned by you (like it should be)[fn:1].
+strongly inspired the development of Log Sense) and [[https://umami.is/][Umami]], both
+focusing on *privacy* and *data-ownership*: the data generated by
+LogSense is stored on your computer and owned by you (like it should
+be)[fn:1].
 
 LogSense is also inspired by *static websites generators*: statistics
 are generated from the command line and accessed as static HTML files.
@@ -76,33 +108,30 @@ generated files are then made available on a private area on the web.
 #+RESULTS:
 #+begin_example
 Usage: log_sense [options] [logfile ...]
-        --title=TITLE                Title to use in the report
-    -f, --input-format=FORMAT       Input format (either rails or apache)
-    -i, --input-files=file,file,    Input files (can also be passed directly)
-    -t, --output-format=FORMAT      Output format: html, org, txt, sqlite. See below for available formats
-    -o, --output-file=OUTPUT_FILE   Output file
-    -b, --begin=DATE                Consider entries after or on DATE
-    -e, --end=DATE                  Consider entries before or on DATE
-    -l, --limit=N                   Limit to the N most requested resources (defaults to 100)
-    -w, --width=WIDTH               Maximum width of long columns in textual reports
-    -r, --rows=ROWS                 Maximum number of rows for columns with multiple entries in textual reports
-    -c, --crawlers=POLICY           Decide what to do with crawlers (applies to Apache Logs)
-    -n, --no-selfpolls              Ignore self poll entries (requests from ::1; applies to Apache Logs)
-        --verbose                   Inform about progress (prints to STDERR)
-    -v, --version                   Prints version information
-    -h, --help                      Prints this help
-
-This is version 1.5.2
-
-Output formats
-apache parsing can produce the following outputs:
-- sqlite
-- html
-- txt
-rails parsing can produce the following outputs:
-- sqlite
-- html
-- txt
+        --title=TITLE                Title to use in the report
+    -f, --input-format=FORMAT       Input format (either rails or apache)
+    -i, --input-files=file,file,    Input files (can also be passed directly)
+    -t, --output-format=FORMAT      Output format: html, org, txt, sqlite.
+    -o, --output-file=OUTPUT_FILE   Output file
+    -b, --begin=DATE                Consider entries after or on DATE
+    -e, --end=DATE                  Consider entries before or on DATE
+    -l, --limit=N                   Limit to the N most requested resources (defaults to 100)
+    -w, --width=WIDTH               Maximum width of long columns in textual reports
+    -r, --rows=ROWS                 Maximum number of rows for columns with multiple entries in textual reports
+    -p, --pattern=PATTERN           Pattern to use with ufw report to decide IP to blacklist
+    -c, --crawlers=POLICY           Decide what to do with crawlers (applies to Apache Logs)
+        --no-selfpolls              Ignore self poll entries (requests from ::1; applies to Apache Logs)
+    -n, --no-geog                   Do not geolocate entries
+        --verbose                   Inform about progress (output to STDERR)
+    -v, --version                   Prints version information
+    -h, --help                      Prints this help
+
+This is version 1.6.0
+
+Output formats:
+
+- rails: txt, html, sqlite3, ufw
+- apache: txt, html, sqlite3, ufw
 #+end_example
 
 Examples:
@@ -112,6 +141,51 @@ log_sense -f apache -i access.log -t txt > access-data.txt
 log_sense -f rails -i production.log -t html -o performance.html
 #+end_example
 
+* Code Structure
+
+The code implements a pipeline with the following steps:
+
+1. *Parser:* parses a log into a SQLite3 database. The database
+   contains a table with the list of events and, in the case of Rails
+   reports, a table with the errors.
+2. *Aggregator:* takes a SQLite DB as input and aggregates data,
+   typically performing "group by" operations, which are simpler to
+   generate in Ruby than in SQL. The module outputs a Hash with the
+   different reporting data.
+3. *GeoLocator:* adds country information to all the reporting data
+   which has an IP as one of the fields.
+4. *Shaper:* turns (geolocated) aggregated data (e.g. Hashes and
+   such) into Arrays of Arrays, simplifying the structure of the code
+   building the reports.
+5. *Emitter:* generates reports from shaped data using ERB.
+
+The architecture and the structure of the code are far from nice,
+for historical reasons and because of a bunch of small differences
+between the inputs and the outputs to be generated. This usually means
+that modifications have to be replicated in different parts of the
+code, and that changes in one place interfere with others.
+
+Among the points I would like to address:
+
+- The execution pipeline in the main script has a few exceptions to
+  manage SQLite reading/dumping and the ufw report. A linear structure
+  would be a lot nicer.
+- Two different classes are defined for steps 1, 2, and 4 to manage,
+  respectively, Apache and Rails logs. These classes inherit from a
+  common ancestor (e.g. ApacheParser and RailsParser both inherit from
+  Parser), but there is still too little shared code. A nicer
+  approach would be to identify a common DB structure and unify the
+  pipeline up to (or including) the generation of reports. A bunch of
+  small differences in what the reports highlight still makes this
+  difficult. For instance, the country report for Apache includes the
+  size of TX data, which is not available for Rails reports.
+- Geolocation could become a lot more efficient if performed in
+  SQLite rather than in Ruby.
+- The distinction between Aggregation, Shaping, and Emission is too
+  fine-grained, and it would be nice to be able to cleanly remove one
+  of the steps.
+
 
 * Change Log
 
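For orientation, the five steps above map one-to-one onto the calls made by the rewritten data/exe/log_sense shown further below. A condensed sketch of that flow, with the sqlite3/ufw special cases, error handling, and the extra bookkeeping merged into the data Hash (command line, timings, width) omitted:

#+begin_src ruby
require "log_sense"

options = LogSense::OptionsParser.parse ARGV
files   = options[:input_filenames].map { |name| File.open(name, "r") }
format  = options[:input_format].capitalize            # "Apache" or "Rails"

# 1. Parser: log file(s) -> in-memory SQLite3 DB (events, plus errors for Rails)
db = Object.const_get("LogSense::#{format}LogParser").new.parse(files)

# 2. Aggregator: DB -> Hash of aggregated reporting data
data = Object.const_get("LogSense::#{format}Aggregator").new(db, options).aggregate

# 3. GeoLocator: add country information to the IP-based entries
data = LogSense::IpLocator.geolocate(data) if options[:geolocation]

# 4. Shaper: Hashes -> Arrays of Arrays the templates can iterate over
reports = Object.const_get("LogSense::#{format}ReportShaper").new.shape(data)

# 5. Emitter: render the shaped data through the ERB templates
puts LogSense::Emitter.emit(reports, data, options)
#+end_src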
data/Rakefile CHANGED
@@ -9,18 +9,18 @@ end
 require_relative './lib/log_sense/ip_locator.rb'
 
 desc "Convert Geolocation DB to sqlite"
-task :dbip_to_sqlite3, [:year_month] do |tasks, args|
-  filename = "./ip_locations/dbip-country-lite-#{args[:year_month]}.csv"
+task :dbip, [:filename] do |tasks, args|
+  filename = args[:filename]
 
   if !File.exist? filename
     puts "Error. Could not find: #{filename}"
     puts
     puts 'I see the following files:'
     puts Dir.glob("ip_locations/dbip-country-lite*").map { |x| "- #{x}\n" }
-    puts ''
-    puts '1. Download (if necessary) a more recent version from: https://db-ip.com/db/download/ip-to-country-lite'
-    puts '2. Save downloaded file to ip_locations/'
-    puts '3. Relaunch with YYYY-MM'
+    puts
+    puts "1. Download (if necessary) a more recent version from: https://db-ip.com/db/download/ip-to-country-lite"
+    puts "2. Save downloaded file to ip_locations/"
+    puts "3. Relaunch with YYYY-MM"
 
     exit
   else
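With this change the renamed task takes the CSV path directly instead of composing it from a YYYY-MM suffix (note that the quoted help text still says "Relaunch with YYYY-MM"). An invocation would now look something like the following, with a purely hypothetical file name:

#+begin_src sh
rake "dbip[ip_locations/dbip-country-lite-2022-12.csv]"
#+end_src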
data/exe/log_sense CHANGED
@@ -1,82 +1,153 @@
 #!/usr/bin/env ruby
 
-require 'log_sense.rb'
+require "log_sense"
+require "sqlite3"
 
 #
 # Parse Command Line Arguments
 #
 
 # this better be here... OptionsParser consumes ARGV
-@command_line = ARGV.join(' ')
-@options = LogSense::OptionsParser.parse ARGV
-@output_file = @options[:output_file]
+@command_line = ARGV.join(" ")
+@options = LogSense::OptionsParser.parse ARGV
+@input_filenames = @options[:input_filenames] + ARGV
+@output_filename = @options[:output_filename]
 
 #
-# Input files can be gotten from an option and from what remains in
-# ARGV
+# Check correctness of input data.
+#
+
+#
+# Check input files
 #
-@input_filenames = @options[:input_filenames] + ARGV
 @non_existing = @input_filenames.reject { |x| File.exist?(x) }
 
-unless @non_existing.empty?
-  $stderr.puts "Error: input file(s) '#{@non_existing.join(', ')}' do not exist"
+if @non_existing.any?
+  warn "Error: some input file(s) \"#{@non_existing.join(", ")}\" do not exist"
+  exit 1
+end
+
+#
+# Special condition: sqlite3 requires a single file as input
+#
+if @input_filenames.size > 0 &&
+   File.extname(@input_filenames.first) == "sqlite3" &&
+   @input_filenames.size > 1
+  warn "Error: you can pass only one sqlite3 file as input"
+  exit 1
+end
+
+#
+# Supported input/output chains
+#
+iformat = @options[:input_format]
+oformat = @options[:output_format]
+
+if !LogSense::OptionsChecker::compatible?(iformat, oformat)
+  warn "Error: don't know how to make #{iformat} into #{oformat}."
+  warn "Possible transformation chains:"
+  warn LogSense::OptionsChecker.chains_to_s
   exit 1
 end
-@input_files = @input_filenames.empty? ? [$stdin] : @input_filenames.map { |x| File.open(x, 'r') }
 
 #
-# Parse Log and Track Statistics
+# Do the work
 #
 
 @started_at = Time.now
 
-case @options[:input_format]
-when 'apache'
-  parser_klass = LogSense::ApacheLogParser
-  cruncher_klass = LogSense::ApacheDataCruncher
-when 'rails'
-  parser_klass = LogSense::RailsLogParser
-  cruncher_klass = LogSense::RailsDataCruncher
+if @input_filenames.size > 0 &&
+   File.extname(@input_filenames.first) == ".sqlite3"
+  warn "Reading SQLite3 DB ..." if @options[:verbose]
+  @db = SQLite3::Database.open @input_filenames.first
 else
-  $stderr.puts "Error: input format #{@options[:input_format]} not understood."
-  exit 1
+  warn "Parsing ..." if @options[:verbose]
+  @input_files = if @input_filenames.empty?
+                   [$stdin]
+                 else
+                   @input_filenames.map { |fname| File.open(fname, "r") }
+                 end
+  class_name = "LogSense::#{@options[:input_format].capitalize}LogParser"
+  parser_class = Object.const_get class_name
+  parser = parser_class.new
+  @db = parser.parse @input_files
 end
 
-$stderr.puts "Parsing input files..." if @options[:verbose]
-@db = parser_klass.parse @input_files
+if @options[:output_format] == "sqlite3"
+  warn "Saving SQLite3 DB ..." if @options[:verbose]
 
-if @options[:output_format] == 'sqlite'
-  $stderr.puts "Saving to SQLite3..." if @options[:verbose]
-  ddb = SQLite3::Database.new(@output_file || 'db.sqlite3')
-  b = SQLite3::Backup.new(ddb, 'main', @db, 'main')
+  ddb = SQLite3::Database.new(@output_filename || "db.sqlite3")
+  b = SQLite3::Backup.new(ddb, "main", @db, "main")
   b.step(-1) #=> DONE
   b.finish
+
+  exit 0
+elsif @options[:output_format] == "ufw"
+  pattern = @options[:pattern] || "php"
+
+  if @options[:input_format] == "rails"
+    query = "select distinct event.ip,event.url
+             from error join event
+             where event.log_id = error.log_id and
+                   event.url like '%#{pattern}%'"
+  else
+    query = "select distinct ip,path from logline
+             where path like '%#{pattern}%'"
+  end
+
+  ips = @db.execute query
+  ips_and_urls = ips.group_by { |x| x[0] }.transform_values { |x|
+    x.map { |y| y[1..-1] }.flatten
+  }
+  ips_and_urls.each do |ip, urls|
+    puts "# #{urls[0..10].uniq.join(' ')}"
+    puts "ufw deny from #{ip}"
+    puts
+  end
+
+  exit 0
 else
-  $stderr.puts "Aggregating data..." if @options[:verbose]
-  @data = cruncher_klass.crunch @db, @options
+  warn "Aggregating data ..." if @options[:verbose]
+  class_name = "LogSense::#{@options[:input_format].capitalize}Aggregator"
+  aggr_class = Object.const_get class_name
+  aggr = aggr_class.new(@db, @options)
+  @data = aggr.aggregate
 
-  $stderr.puts "Geolocating..." if @options[:verbose]
-  @data = LogSense::IpLocator.geolocate @data
+  if @options[:geolocation]
+    warn "Geolocating ..." if @options[:verbose]
+    @data = LogSense::IpLocator.geolocate @data
 
-  $stderr.puts "Grouping by country..." if @options[:verbose]
-  country_col = @data[:ips][0].size - 1
-  @data[:countries] = @data[:ips].group_by { |x| x[country_col] }
+    warn "Grouping IPs by country ..." if @options[:verbose]
+    country_col = @data[:ips][0].size - 1
+    @data[:countries] = @data[:ips].group_by { |x| x[country_col] }
+  else
+    @data[:countries] = {}
+  end
 
   @ended_at = Time.now
   @duration = @ended_at - @started_at
 
   @data = @data.merge({
                         command: @command_line,
-                        filenames: ARGV,
+                        filenames: @input_filenames,
                         log_files: @input_files,
                         started_at: @started_at,
                         ended_at: @ended_at,
                         duration: @duration,
                         width: @options[:width]
                       })
-  #
-  # Emit Output
-  #
-  $stderr.puts "Emitting..." if @options[:verbose]
-  puts LogSense::Emitter.emit @data, @options
+
+  if @options[:verbose]
+    warn "I have the following keys in data: "
+    warn @data.keys.sort.map { |key| "#{key}: #{@data[key].class}" }.join("\n")
+  end
+
+  warn "Shaping data for output ..." if @options[:verbose]
+  class_name = "LogSense::#{@options[:input_format].capitalize}ReportShaper"
+  shaper_class = Object.const_get class_name
+  shaper = shaper_class.new
+  @reports = shaper.shape @data
+
+  warn "Emitting..." if @options[:verbose]
+  puts LogSense::Emitter.emit @reports, @data, @options
 end
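The sqlite3 input/output handling above enables the two-step workflow mentioned in the changelog: parse a log once into a database, then produce as many reports as needed from it. A sketch of such a session (file names are examples only):

#+begin_src sh
# Step 1: parse the log once and save the parsed events to a SQLite3 database
log_sense -f apache -i access.log -t sqlite3 -o access.sqlite3

# Step 2: generate reports from the saved database, without re-parsing the log
log_sense -f apache -i access.sqlite3 -t html -o report.html
log_sense -f apache -i access.sqlite3 -t ufw --pattern=wp-login > blacklist.sh
#+end_src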
data/ip_locations/dbip-country-lite.sqlite3 CHANGED
Binary file
data/lib/log_sense/aggregator.rb ADDED
@@ -0,0 +1,191 @@
+module LogSense
+  class Aggregator
+    def initialize
+      # not meant to be used directly
+      raise StandardError
+    end
+
+    protected
+
+    def logged_query(query)
+      puts query
+      @db.execute query
+    end
+
+    def aggregate_log_info
+      first_day_s = @db.execute "SELECT #{@date_field} from #{@table}
+                                 where #{@date_field} not NULL
+                                 order by #{@date_field}
+                                 limit 1"
+      last_day_s = @db.execute "SELECT #{@date_field} from #{@table}
+                                where #{@date_field} not NULL
+                                order by #{@date_field} desc
+                                limit 1"
+
+      # make first and last day into dates or nil
+      @first_day = first_day_s&.first&.first ? Date.parse(first_day_s[0][0]) : nil
+      @last_day = last_day_s&.first&.first ? Date.parse(last_day_s[0][0]) : nil
+
+      @total_days = 0
+      @total_days = (@last_day - @first_day).to_i if @first_day && @last_day
+
+      evs = @db.execute "SELECT count(#{@date_field}) from #{@table}"
+      @events_in_log = @log_size = evs[0][0]
+
+      evs = @db.execute "SELECT count(#{@date_field}) from #{@table} where #{filter}"
+      @events = evs[0][0]
+
+      @source_files = @db.execute "SELECT distinct(source_file) from #{@table}"
+
+      tuv = @db.execute "SELECT count(distinct(unique_visitor)) from #{@table}
+                         where #{filter}"
+      @total_unique_visits = tuv[0][0]
+
+      @first_day_requested = @options[:from_date]
+      @last_day_requested = @options[:to_date]
+
+      @first_day_in_analysis = date_sel @first_day_requested, @first_day, :max
+      @last_day_in_analysis = date_sel @last_day_requested, @last_day, :min
+
+      @total_days_in_analysis = 0
+      if @first_day_in_analysis && @last_day_in_analysis
+        diff = (@last_day_in_analysis - @first_day_in_analysis).to_i
+        @total_days_in_analysis = diff
+      end
+    end
+
+    def aggregate_statuses
+      @statuses = @db.execute %(SELECT status, count(status) from #{@table}
+                                where #{filter}
+                                group by status
+                                order by status)
+
+      @by_day_5xx = @db.execute status_query(5)
+      @by_day_4xx = @db.execute status_query(4)
+      @by_day_3xx = @db.execute status_query(3)
+      @by_day_2xx = @db.execute status_query(2)
+
+      all_statuses = @by_day_2xx + @by_day_3xx + @by_day_4xx + @by_day_5xx
+      @statuses_by_day = all_statuses.group_by { |x| x[0] }.to_a.map { |x|
+        [x[0], x[1].map { |y| y[1] }].flatten
+      }
+    end
+
+    def aggregate_ips
+      if @table == "LogLine"
+        extra_cols = ", count(distinct(unique_visitor)), #{human_readable_size}"
+      else
+        extra_cols = ""
+      end
+
+      @ips = @db.execute %(SELECT ip, count(ip) #{extra_cols} from #{@table}
+                           where #{filter}
+                           group by ip
+                           order by count(ip) desc
+                           limit #{@options[:limit]}).gsub("\n", "")
+
+      @ips_per_hour = @db.execute ip_by_time_query("hour", "%H")
+      @ips_per_day = @db.execute ip_by_time_query("day", "%Y-%m-%d")
+      @ips_per_week = @db.execute ip_by_time_query("week", "%Y-%W")
+
+      @ips_per_day_detailed = @db.execute %(
+        SELECT ip,
+               strftime("%Y-%m-%d", #{@date_field}) as day,
+               #{@url_field}
+        from #{@table}
+        where #{filter} and ip != "" and #{@url_field} != "" and
+              #{@date_field} != ""
+        order by ip, #{@date_field}).gsub("\n", "")
+    end
+
+    def instance_vars_to_hash
+      data = {}
+      instance_variables.each do |variable|
+        var_as_symbol = variable.to_s[1..].to_sym
+        data[var_as_symbol] = instance_variable_get(variable)
+      end
+      data
+    end
+
+    def human_readable_size
+      mega = 1024 * 1024
+      giga = mega * 1024
+      tera = giga * 1024
+
+      %(CASE
+        WHEN sum(size) < 1024 THEN sum(size) || ' B'
+        WHEN sum(size) >= 1024 AND sum(size) < (#{mega})
+          THEN ROUND((CAST(sum(size) AS REAL) / 1024), 2) || ' KB'
+        WHEN sum(size) >= (#{mega}) AND sum(size) < (#{giga})
+          THEN ROUND((CAST(sum(size) AS REAL) / (#{mega})), 2) || ' MB'
+        WHEN sum(size) >= (#{giga}) AND sum(size) < (#{tera})
+          THEN ROUND((CAST(sum(size) AS REAL) / (#{giga})), 2) || ' GB'
+        WHEN sum(size) >= (#{tera})
+          THEN ROUND((CAST(sum(size) AS REAL) / (#{tera})), 2) || ' TB'
+        END AS size).gsub("\n", "")
+    end
+
+    def human_readable_day
+      %(case cast (strftime('%w', #{@date_field}) as integer)
+        when 0 then 'Sunday'
+        when 1 then 'Monday'
+        when 2 then 'Tuesday'
+        when 3 then 'Wednesday'
+        when 4 then 'Thursday'
+        when 5 then 'Friday'
+        when 6 then 'Saturday'
+        else 'not specified'
+        end as dow).gsub("\n", "")
+    end
+
+    #
+    # generate the where clause corresponding to the command line options to filter data
+    #
+    def filter
+      from = @options[:from_date]
+      to = @options[:to_date]
+
+      [
+        (from ? "date(#{@date_field}) >= '#{from}'" : nil),
+        (to ? "date(#{@date_field}) <= '#{to}'" : nil),
+        (@options[:only_crawlers] ? "bot == 1" : nil),
+        (@options[:ignore_crawlers] ? "bot == 0" : nil),
+        (@options[:no_selfpolls] ? "ip != '::1'" : nil),
+        "true"
+      ].compact.join " and "
+    end
+
+    private
+
+    # given 5 builds the query to get all lines with status 5xx
+    def status_query(status)
+      %(SELECT date(#{@date_field}), count(#{@date_field}) from #{@table}
+        where substr(status, 1,1) == '#{status}' and #{filter}
+        group by date(#{@date_field})).gsub("\n", "")
+    end
+
+    # given format string, group ip by time formatted with format string
+    # (e.g. by hour if format string is "%H")
+    # name is used to give the name to the column with formatted time
+    def ip_by_time_query(name, format_string)
+      %(SELECT ip,
+               strftime("%H", #{@date_field}) as #{name},
+               count(#{@url_field}) from #{@table}
+        where #{filter} and ip != "" and
+              #{@url_field} != "" and
+              #{@date_field} != ""
+        group by ip, #{name}
+        order by ip, #{@date_field}).gsub("\n", "")
+    end
+
+    def date_sel(date1, date2, method)
+      if date1 && date2
+        [date1, date2].send(method)
+      elsif date1
+        date1
+      else
+        date2
+      end
+    end
+  end
+end
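To illustrate the filter helper that every query above embeds: with from_date/to_date set and ignore_crawlers enabled, it produces a WHERE fragment along these lines, where <date_field> stands for whatever column the concrete Apache or Rails aggregator configures:

#+begin_example
date(<date_field>) >= '2022-12-01' and date(<date_field>) <= '2022-12-31' and bot == 0 and true
#+end_example

The trailing =true= keeps the clause valid even when no filter option is given on the command line.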