log_slice 0.1 → 0.2

Sign up to get free protection for your applications and to get access to all the features.
data/README.md CHANGED
@@ -5,46 +5,59 @@ Uses binary search to find a line quickly in a large log file. O(log2(n))
5
5
 
6
6
  Can only search sorted data, which means it's probably only useful for searching by timestamp.
7
7
 
8
+ ## Installation
9
+
10
+ gem install log_slice
11
+
8
12
  ## Example
9
13
 
10
14
  something-interesting.log:
11
15
 
12
- [2012-08-29 18:41:12] (9640) something something something else 1
13
- [2012-08-29 18:41:14] (9640) something something something else 2
14
- [2012-08-29 18:41:14] (9640) something something something else 3
15
- [2012-08-29 18:41:14] (9640) something something something else 4
16
- [2012-08-29 18:42:18] (9640) something something something else 5
17
- [2012-08-29 18:42:18] (9640) something something something else 6
18
- [2012-08-29 18:42:18] (9640) something something something else 7
19
- [2012-08-29 18:42:20] (9640) something something something else 8
20
- [2012-08-29 18:42:20] (9640) something something something else 9
21
- [2012-08-29 18:42:20] (9640) something something something else 10
16
+ [2012-08-29 18:41:12] (9640) 1 something something something else 1
17
+ [2012-08-29 18:41:14] (9640) 2 something something something else 2
18
+ [2012-08-29 18:41:14] (9640) 3 something something something else 3
19
+ [2012-08-29 18:41:14] (9640) 4 something something something else 4
20
+ [2012-08-29 18:42:18] (9640) 5 something something something else 5
21
+ [2012-08-29 18:42:18] (9640) 6 something something something else 6
22
+ [2012-08-29 18:42:18] (9640) 7 something something something else 7
23
+ [2012-08-29 18:42:20] (9640) 8 something something something else 8
24
+ [2012-08-29 18:42:20] (9640) 9 something something something else 9
25
+ [2012-08-29 18:42:20] (9640) 10 something something something else 10
26
+
27
+ extract everything that happened at or after 18:42:00:
28
+ ```ruby
29
+ find_date = DateTime.parse("2012-08-29 18:42:00")
30
+ file = LogSlice.new("something-interesting.log").find do |line|
31
+ date_string = line.match(/^\[([^\]]+)\]/)[1]
32
+ find_date <=> DateTime.parse(date_string)
33
+ end
22
34
 
23
- extract everything that happened at or after 18:42:18:
35
+ # this will yield an instance of File
36
+ # the position in the file is the first byte of the found line
24
37
 
25
- find_date = DateTime.parse("2012-08-29 18:42:18")
26
- file = LogSlice.new("something-interesting.log").find do |line|
27
- date_string = line.match(/^\[([^\]]+)\]/)[1]
28
- find_date <=> DateTime.parse(date_string)
29
- end
38
+ file.readline
39
+ #=> "[2012-08-29 18:42:18] (9640) 5 something something something else 5"
30
40
 
31
- # this will yield an instance of File
32
- # the position is the file is the first byte of the found line
41
+ # Once you found the line you were after,
42
+ # you can continue to read subsequent lines:
33
43
 
34
- file.readline
35
- #=> "[2012-08-29 18:42:18] (9640) something something something else 5"
44
+ file.readline
45
+ #=> "[2012-08-29 18:42:18] (9640) 6 something something something else 6"
46
+ ```
36
47
 
37
- file.readline
38
- #=> "[2012-08-29 18:42:18] (9640) something something something else 6"
48
+ LogSlice.new takes a File or file path, and a comparison function, passed as a block.
49
+ When passed a line, the block must return -1 if the value represented by the line
50
+ is too high, 1 if it's too low, or 0 if it's just right.
39
51
 
40
- LogSlice.new takes a File or file path, and a block. When passed a line,
41
- the block must return -1 if the value represented by the line is too high,
42
- 1 if it's too low, or 0 if it's just right.
52
+ ```ruby
53
+ LogSlice.new(file_or_file_path).find(&comparison_function) #=> File or nil
54
+ ```
43
55
 
44
56
  ## Limitations
45
57
 
46
- * Can only search sorted data. At the moment, if the data isn't sorting, it won't detect it and it will loop forever.
47
- * Can only search for a known value. For example, searching for 18:42:19 in the example above will yield nothing.
58
+ * Can only search sorted data. If the data isn't sorted, it will most likely not find anything (ie return nil).
59
+ In very rare cases it may find value the anyway by chance, so it's not guaranteed that unsorted
60
+ input will always yield nil.
48
61
 
49
62
  ## Disclaimer
50
63
 
@@ -0,0 +1,34 @@
1
+ class LogSlice
2
+ class SearchBoundary
3
+
4
+ attr_reader :cursor
5
+
6
+ def initialize file_size
7
+ @file_size = file_size
8
+ end
9
+
10
+ # reset the search boundary to cover the entire file
11
+ def reset
12
+ @lower = 0
13
+ @upper = @file_size
14
+ @cursor = 0
15
+ cursor_forward
16
+ self
17
+ end
18
+
19
+ # Move cursor forward.
20
+ # The cursor is moved half way between it's start location and the upper boundary.
21
+ def cursor_forward
22
+ @lower = @cursor
23
+ @cursor = @cursor + (@upper - @cursor) / 2
24
+ end
25
+
26
+ # Move cursor backward.
27
+ # The cursor is moved half way between it's start location and the lower boundary.
28
+ def cursor_back
29
+ @upper = @cursor
30
+ @cursor = @cursor - (@cursor - @lower) / 2
31
+ end
32
+
33
+ end
34
+ end
data/lib/log_slice.rb CHANGED
@@ -1,129 +1,142 @@
1
+ require File.expand_path("log_slice/search_boundary", File.dirname(__FILE__))
2
+
1
3
  class LogSlice
2
4
 
5
+ NEWLINE = "\n"
6
+ NEWLINE_CHAR = "\n"[0]
7
+
3
8
  # @param log_file [File, String]
4
- def initialize log_file
9
+ # @param options [Hash] :exact_match default false
10
+ def initialize log_file, options={}
5
11
  @file = log_file.respond_to?(:seek) ? log_file : File.open(log_file, 'r')
6
- @size = @file.stat.size
7
- @lower = 0
8
- @upper = @size
9
- @char_cursor = nil
12
+ @exact_match = options[:exact_match] || false
13
+ @search_boundary = SearchBoundary.new(@file.stat.size)
10
14
  @line_cursor = nil
11
15
  end
12
16
 
13
- # Depends on lines being sorted
14
- # @return [File] file after seeking to start of line
17
+ # Find line in the file using the comparison function.
18
+ # Depends on lines being sorted.
19
+ # The comparison function will be passed lines from the file. It must
20
+ # return -1 if the line is later than the one it's looking for, 1 if
21
+ # the line is earlier than the one it's looking for, and 0 if it is
22
+ # the line it's looking for.
23
+ # @param compare [Proc] comparison function
24
+ # @return [File, nil] file after seeking to start of line or nil if line not found
15
25
  def find &compare
16
- direction = :forward
17
- line_cursor = nil
18
- loop do
19
- line = next_line direction
20
- if line_cursor == @line_cursor
21
- return nil
22
- end
23
- line_cursor = @line_cursor
24
- case compare.call(line)
25
- when 0 # found
26
- walk_up_to_first_match compare
27
- return @file
28
- when -1
29
- direction = :back
30
- when 1
31
- direction = :forward
32
- else
33
- raise ArgumentError
26
+ reset_progress_check
27
+ @search_boundary.reset
28
+ line = find_next_newline
29
+ while making_progress?
30
+ comp_value = compare.call(line)
31
+ if comp_value == 0 # found matching line
32
+ backtrack_to_first_line_match compare
33
+ return @file
34
+ else
35
+ @search_boundary.send(comp_value < 0 ? :cursor_back : :cursor_forward)
36
+ line = find_next_newline
34
37
  end
35
38
  end
39
+ if @exact_match
40
+ nil
41
+ else
42
+ backtrack_to_gap compare
43
+ return @file.eof? ? nil : @file
44
+ end
36
45
  end
37
46
 
38
47
  private
39
48
 
40
- # @param direction [Symbol] direction in file to move, :forward or :back
41
- # @return [String] line
42
- def next_line direction
43
- move_char_cursor direction
44
- find_next_newline
49
+ # whether the cursor has moved since previous call
50
+ def making_progress?
51
+ return false if @previous_cursor_position == @line_cursor
52
+ @previous_cursor_position = @line_cursor
53
+ true
45
54
  end
46
55
 
56
+ def reset_progress_check
57
+ @previous_cursor_position = nil
58
+ end
59
+
60
+
47
61
  # once the line has been found, we must check the lines above it -
48
62
  # if a line above also matches, we should seek to it.
49
63
  # (this make search on some files O(n/2) instead of O(log2(n))) )
50
- def walk_up_to_first_match compare
51
- move_to_previous_line compare
64
+ # @param compare [Proc] comparison function
65
+ def backtrack_to_first_line_match compare
66
+ previous_cursor_position = @line_cursor
67
+ each_line_reverse do |line|
68
+ if compare.call(line) != 0
69
+ # we've found a non-matching line,
70
+ # so we set @line_cursor back to the previous matching line
71
+ @line_cursor = previous_cursor_position
72
+ break
73
+ end
74
+ previous_cursor_position = @line_cursor
75
+ end
52
76
  @file.seek(@line_cursor)
53
77
  end
54
78
 
55
- def move_to_previous_line compare
56
- last_cursor_position = @line_cursor
79
+ # if no match was found, we're sitting at too-high.
80
+ # backtrack up to the first too-high
81
+ def backtrack_to_gap compare
82
+ @line_cursor = @file.pos
83
+ previous_cursor_position = @line_cursor
57
84
  each_line_reverse do |line|
58
- if compare.call(line) != 0
59
- @line_cursor = last_cursor_position
85
+ if compare.call(line) == 1
86
+ @line_cursor = previous_cursor_position
60
87
  break
61
88
  end
62
- last_cursor_position = @line_cursor
89
+ previous_cursor_position = @line_cursor
63
90
  end
91
+ @file.seek(@line_cursor)
64
92
  end
65
93
 
94
+ # iterate over each line from the current cursor position, in reverse.
66
95
  def each_line_reverse
67
96
  chunk_size = 512
68
- left_over = ""
69
97
  cursor = @line_cursor
98
+ left_over = ""
70
99
  loop do
71
- cursor = cursor - chunk_size
72
- if cursor < 0
73
- chunk_size = chunk_size + cursor
100
+ if chunk_size > cursor
101
+ chunk_size = cursor
74
102
  cursor = 0
103
+ else
104
+ cursor -= chunk_size
75
105
  end
76
106
  break if chunk_size == 0
77
- #puts "seeking to #{cursor}, chunk size #{chunk_size}, left over #{left_over.length}"
78
107
  @file.seek(cursor)
79
108
  chunk = @file.read(chunk_size) + left_over
80
- lines = chunk.split("\n")
109
+ lines = chunk.split(NEWLINE)
81
110
  while lines.length > 1
82
111
  line = lines.pop || ""
83
- @line_cursor = @line_cursor - (line.length + 1)
112
+ @line_cursor -= (line.length + NEWLINE.length)
84
113
  yield(line)
85
114
  end
86
115
  left_over = lines[0] || ""
87
116
  lines = []
88
117
  end
118
+ @line_cursor -= (left_over.length + NEWLINE.length)
89
119
  yield left_over unless left_over == ''
90
120
  end
91
121
 
122
+ # After the search is moved by cursor search_boundary.cursor_*, it's position
123
+ # is probably not at the start of a line, but somewhere within a line.
124
+ # find_next_newline advances the cursor until we're at the start of the
125
+ # next line.
92
126
  def find_next_newline
93
- newline_char = "\n"[0]
94
- @line_cursor = @char_cursor
127
+ @line_cursor = @search_boundary.cursor
95
128
  @file.seek(@line_cursor)
96
- current_char = nil
97
- while (current_char = @file.getc) != newline_char && !current_char.nil?
98
- @line_cursor = @line_cursor + 1
129
+ while (current_char = @file.getc) != NEWLINE_CHAR && !current_char.nil?
130
+ @line_cursor += 1
99
131
  end
100
- if current_char.nil?
101
- # eof
132
+ if @file.eof?
102
133
  ""
103
134
  else
104
- @line_cursor = @line_cursor + 1
135
+ @line_cursor += 1
105
136
  @file.seek(@line_cursor)
106
137
  @file.readline
107
138
  end
108
139
  end
109
140
 
110
- # @param direction [Symbol] direction in file to move the cursor, :forward or :back
111
- def move_char_cursor direction
112
- if @char_cursor
113
- if direction == :forward
114
- distance = (@upper - @char_cursor) / 2
115
- old_cursor = @char_cursor
116
- @char_cursor = @char_cursor + distance
117
- @lower = old_cursor
118
- else
119
- distance = (@char_cursor - @lower) / 2
120
- old_cursor = @char_cursor
121
- @char_cursor = @char_cursor - distance
122
- @upper = old_cursor
123
- end
124
- else
125
- @char_cursor = @size / 2
126
- end
127
- end
128
141
 
129
142
  end
data/log_slice.gemspec CHANGED
@@ -1,6 +1,6 @@
1
1
  Gem::Specification.new do |s|
2
2
  s.name = 'log_slice'
3
- s.version = '0.1'
3
+ s.version = '0.2'
4
4
  s.authors = ["Joel Plane"]
5
5
  s.email = ["joel.plane@gmail.com"]
6
6
  s.homepage = 'https://github.com/joelplane/log_slice'
data/spec/helper.rb CHANGED
@@ -3,7 +3,7 @@ require 'tempfile'
3
3
  RSpec.configure do |c|
4
4
  c.include(Module.new do
5
5
 
6
- def range_to_file range
6
+ def enumerable_to_file range
7
7
  file = Tempfile.new("test-#{range}")
8
8
  file.write(range.to_a.join("\n"))
9
9
  file.flush
@@ -4,7 +4,7 @@ require 'helper'
4
4
  describe LogSlice do
5
5
 
6
6
  it "finds the line" do
7
- log_slice = LogSlice.new(range_to_file 1..100)
7
+ log_slice = LogSlice.new enumerable_to_file(1..100), :exact_match => true
8
8
  file = log_slice.find do |line|
9
9
  42 <=> line.to_i
10
10
  end
@@ -12,7 +12,7 @@ describe LogSlice do
12
12
  end
13
13
 
14
14
  it "finds the first line when there are many matching lines" do
15
- log_slice = LogSlice.new(string_to_file (["1", "2"] + ["3"]*20).join("\n"))
15
+ log_slice = LogSlice.new string_to_file((["1", "2"] + ["3"]*20).join("\n")), :exact_match => true
16
16
  file = log_slice.find do |line|
17
17
  3 <=> line.to_i
18
18
  end
@@ -22,7 +22,7 @@ describe LogSlice do
22
22
 
23
23
  it "finds a matching line with log2(lines)+1 calls to comparison function" do
24
24
  [100, 10_000, 100_000].each do |total_lines|
25
- log_slice = LogSlice.new(range_to_file 1..total_lines)
25
+ log_slice = LogSlice.new enumerable_to_file(1..total_lines), :exact_match => true
26
26
  comparisons_count = 0
27
27
  log_slice.find do |line|
28
28
  comparisons_count = comparisons_count + 1
@@ -32,24 +32,69 @@ describe LogSlice do
32
32
  end
33
33
  end
34
34
 
35
- it "nil when no matching line is found (1)" do
36
- log_slice = LogSlice.new(range_to_file 1..100)
35
+ it "nil when no matching line is found (all values lower)" do
36
+ log_slice = LogSlice.new enumerable_to_file(1..100), :exact_match => true
37
37
  file = log_slice.find do |line|
38
38
  1
39
39
  end
40
40
  file.should be_nil
41
41
  end
42
42
 
43
- it "nil when no matching line is found (2)" do
44
- log_slice = LogSlice.new(range_to_file 1..100)
43
+ it "nil when no matching line is found (all values higher)" do
44
+ log_slice = LogSlice.new enumerable_to_file(1..100), :exact_match => true
45
45
  file = log_slice.find do |line|
46
46
  -1
47
47
  end
48
48
  file.should be_nil
49
49
  end
50
50
 
51
+ it "exact_match nil when no matching line is found (higher and lower values)" do
52
+ log_slice = LogSlice.new enumerable_to_file((1..100).to_a-[42]), :exact_match => true
53
+ file = log_slice.find do |line|
54
+ 42 <=> line.to_i
55
+ end
56
+ file.should be_nil
57
+ end
58
+
59
+ it "non-exact_match find when no matching line is found (higher and lower values)" do
60
+ log_slice = LogSlice.new enumerable_to_file((1..100).to_a-[42])
61
+ file = log_slice.find do |line|
62
+ 42 <=> line.to_i
63
+ end
64
+ file.readline.should == "43\n"
65
+ end
66
+
67
+ it "non-exact_match find when no matching line is found, many dups" do
68
+ [0,2,4,6].each do |n|
69
+ log_slice = LogSlice.new enumerable_to_file([1,1,1,3,3,3,5,5,5,7,7,7])
70
+ file = log_slice.find do |line|
71
+ n <=> line.to_i
72
+ end
73
+ 3.times do
74
+ file.readline.strip.should == "#{n+1}"
75
+ end
76
+ end
77
+ end
78
+
79
+ it "non-exact_match find when no matching line is found, beyond" do
80
+ log_slice = LogSlice.new enumerable_to_file([1,1,1,3,3,3,5,5,5,7,7,7])
81
+ file = log_slice.find do |line|
82
+ 8 <=> line.to_i
83
+ end
84
+ file.should be_nil
85
+ end
86
+
87
+ it "nil when lines are not sorted" do
88
+ unsorted = [1,99,4,96,7,70,15,67,24,45,30,40]
89
+ log_slice = LogSlice.new enumerable_to_file(unsorted), :exact_match => true
90
+ file = log_slice.find do |line|
91
+ 42 <=> line.to_i
92
+ end
93
+ file.should be_nil
94
+ end
95
+
51
96
  it "nil when acting on an empty file" do
52
- log_slice = LogSlice.new(string_to_file "")
97
+ log_slice = LogSlice.new string_to_file(""), :exact_match => true
53
98
  file = log_slice.find do |line|
54
99
  42 <=> line.to_i
55
100
  end
@@ -57,8 +102,8 @@ describe LogSlice do
57
102
  end
58
103
 
59
104
  it "#each_line_reverse" do
60
- log_slice = LogSlice.new(range_to_file 1..10000)
61
- log_slice.instance_eval { @line_cursor = @size }
105
+ log_slice = LogSlice.new enumerable_to_file(1..10000), :exact_match => true
106
+ log_slice.instance_eval { @line_cursor = @file.stat.size }
62
107
  lines = []
63
108
  file = log_slice.send(:each_line_reverse) do |line|
64
109
  lines << line.strip.to_i
@@ -67,8 +112,8 @@ describe LogSlice do
67
112
  end
68
113
 
69
114
  it "#each_line_reverse when file is empty" do
70
- log_slice = LogSlice.new(string_to_file "")
71
- log_slice.instance_eval { @line_cursor = @size }
115
+ log_slice = LogSlice.new string_to_file(""), :exact_match => true
116
+ log_slice.instance_eval { @line_cursor = @file.stat.size }
72
117
  lines = []
73
118
  file = log_slice.send(:each_line_reverse) do |line|
74
119
  lines << line.strip.to_i
@@ -77,8 +122,8 @@ describe LogSlice do
77
122
  end
78
123
 
79
124
  it "#each_line_reverse when file has single newline char" do
80
- log_slice = LogSlice.new(string_to_file "\n")
81
- log_slice.instance_eval { @line_cursor = @size }
125
+ log_slice = LogSlice.new string_to_file("\n"), :exact_match => true
126
+ log_slice.instance_eval { @line_cursor = @file.stat.size }
82
127
  lines = []
83
128
  file = log_slice.send(:each_line_reverse) do |line|
84
129
  lines << line.strip.to_i
metadata CHANGED
@@ -1,12 +1,12 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: log_slice
3
3
  version: !ruby/object:Gem::Version
4
- hash: 9
4
+ hash: 15
5
5
  prerelease:
6
6
  segments:
7
7
  - 0
8
- - 1
9
- version: "0.1"
8
+ - 2
9
+ version: "0.2"
10
10
  platform: ruby
11
11
  authors:
12
12
  - Joel Plane
@@ -44,6 +44,7 @@ files:
44
44
  - Gemfile
45
45
  - README.md
46
46
  - lib/log_slice.rb
47
+ - lib/log_slice/search_boundary.rb
47
48
  - log_slice.gemspec
48
49
  - spec/helper.rb
49
50
  - spec/log_slice_spec.rb