sort_index 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 44dd87ebf9e2728f47050cfa271ed013f7af834d
4
+ data.tar.gz: cf3f99237d86fa6caed70a84ab57b5df06eb23c5
5
+ SHA512:
6
+ metadata.gz: 867b0ba9295f54cedcb616da6101e7aeac773c9688a42255d5ee4d3feab137767c767c7d7ba42ce87ed16e7302e8b66b0601eaee8b04b6df15162f771222d26e
7
+ data.tar.gz: 0bfdeed7a173e601e4d036586000bf66175a9bb7bcc72fa1e5971f5b750aa21d7351dfeebda4bdeec4b897dc6a52389522b4b390795d8c464cfa72d87dcb182d
data/.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,4 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.2.3
4
+ before_install: gem install bundler -v 1.10.6
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in sort_index.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2015 Scott Pierce
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,75 @@
1
+ # SortIndex
2
+ Proof of concept to maintain a file with sorted and unique values.
3
+ This could be helpful for building indexes.
4
+
5
+ `Range#bsearch` is used to determine if a line is already in the file and
6
+ to determine where a line should be inserted. This means Ruby 2.0 or later is required.
7
+
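As a rough illustration of the lookup described above, here is a minimal sketch of `Range#bsearch` in find-minimum mode over an in-memory sorted array. The `locate` helper is hypothetical and not part of the gem; it only shows how one call can answer both "is the value present?" and "where would it go?". Note that it compares the element at the returned index, not the last element the search happened to probe.

```ruby
# Hypothetical helper (not part of the gem): locate `value` in a sorted array.
# Returns [found, index]; index is nil when value sorts after every element.
def locate(sorted, value)
  # find-minimum mode: the block is false for a prefix of the range, then true
  idx = (0...sorted.size).bsearch { |i| sorted[i] >= value }
  found = !idx.nil? && sorted[idx] == value
  [found, idx]
end

animals = %w[ant bat dog]
locate(animals, 'bat') # => [true, 1]
locate(animals, 'cat') # => [false, 2]
locate(animals, 'eel') # => [false, nil]
```

A `nil` index means the value belongs at the end, which is the append case.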
8
+ ## Installation
9
+
10
+ Add this line to your application's Gemfile:
11
+
12
+ ```ruby
13
+ gem 'sort_index'
14
+ ```
15
+
16
+ And then execute:
17
+
18
+ $ bundle
19
+
20
+ Or install it yourself as:
21
+
22
+ $ gem install sort_index
23
+
24
+ ## Usage
25
+
26
+ Use `sorted_puts` on the File instance instead of `write`, `<<`, etc.
27
+
28
+ ```ruby
29
+ SortIndex::File.open('animals', 'w+') do |f|
30
+ f.sorted_puts 'cat' # add stuff out of order
31
+ f.sorted_puts 'bat'
32
+ f.sorted_puts 'dog'
33
+ f.sorted_puts 'ant'
34
+ f.sorted_puts 'cat' # duplicate on purpose
35
+ end
36
+
37
+ IO.read('animals')
38
+ => "ant\nbat\ncat\ndog\n"
39
+ ```
40
+
41
+ ## Gotchas
42
+ Performance of writes is probably the biggest problem. There really isn't a
43
+ great way to insert lines in the middle of a file. So we do it the naive way.
44
+
45
+ 1. save current position `IO#tell`
46
+ 2. read the remaining bytes of the file using `IO#read`
47
+ 3. write the line at the current position
48
+ 4. write the rest of the contents from step 2.
49
+
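The four steps above can be sketched roughly as follows. The `insert_line_at` helper and the temporary file are assumptions made for illustration and are not the gem's API. The sketch holds the whole tail of the file in memory, which is why this approach does not scale.

```ruby
require 'tempfile'

# Hypothetical helper (not the gem's API): insert `line` at byte `offset`.
def insert_line_at(io, offset, line)
  io.seek(offset, IO::SEEK_SET)
  position = io.tell              # 1. save the current position
  rest = io.read                  # 2. read the remaining bytes (all in memory)
  io.seek(position, IO::SEEK_SET)
  io.write(line + "\n")           # 3. write the line at the saved position
  io.write(rest)                  # 4. write back the contents from step 2
  io.flush
end

result = Tempfile.create('animals') do |f|
  f.write("ant\ncat\n")
  insert_line_at(f, 4, 'bat')     # byte 4 is where "cat\n" starts
  f.rewind
  f.read
end
result # => "ant\nbat\ncat\n"
```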
50
+ Common sense would say don't create gigabyte files this way. Use unix calls
51
+ instead:
52
+
53
+ ```sh
54
+ echo 'cat' >> animals
55
+ echo 'bat' >> animals
56
+ echo 'dog' >> animals
57
+ echo 'ant' >> animals
58
+ sort -u animals
59
+ ```
60
+
61
+ ## Development
62
+
63
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
64
+
65
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
66
+
67
+ ## Contributing
68
+
69
+ Bug reports and pull requests are welcome on GitHub at https://github.com/ddrscott/sort_index.
70
+
71
+
72
+ ## License
73
+
74
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
75
+
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,10 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bundler/setup'
4
+ require 'sort_index'
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ require 'pry'
10
+ Pry.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
data/lib/sort_index/file.rb ADDED
@@ -0,0 +1,104 @@
1
+ module SortIndex
2
+ class File < ::File
3
+
4
+ # adds the line to the file while maintaining the data's sort order
5
+ #
6
+ # @param [String] line to add to the file; it should not have its own line ending.
7
+ # @return [Nil] always returns nil to match standard #puts method
8
+ def sorted_puts(line)
9
+ if line.nil? || line.empty?
10
+ raise ArgumentError, 'Line cannot be blank!'
11
+ end
12
+ if line.index($/)
13
+ raise ArgumentError, "Cannot `puts` a line with extra line endings. Make sure the line does not contain `#{$/.inspect}`"
14
+ end
15
+
16
+ matched, idx = binary_seek(line)
17
+
18
+ if matched
19
+ # an exact match was found, nothing to do
20
+ else
21
+ if idx == nil
22
+ # append to end of file
23
+ self.seek(0, IO::SEEK_END)
24
+ puts(line)
25
+ else
26
+ self.seek(cached_positions[idx][0], IO::SEEK_SET)
27
+ do_at_current_position{puts(line)}
28
+ end
29
+ update_cached_position(idx, line)
30
+ end
31
+ nil
32
+ end
33
+
34
+ private
35
+
36
+ # Builds an Array of [position, length] pairs, one per line of the current file.
37
+ # @return [Array[Array[Fixnum,Fixnum]]] array of position, line length pairs
38
+ def index_each_line
39
+ positions = []
40
+ size = 0
41
+ each_line do |line|
42
+ positions << [size, line.bytesize]
43
+ size += line.bytesize
44
+ end
45
+ rewind
46
+ positions
47
+ end
48
+
49
+ # remembers the current file position, reads everything after it,
50
+ # executes the block, then writes the saved contents back.
51
+ # This routine is really bad for huge files since it could run out of
52
+ # memory.
53
+ def do_at_current_position(&block)
54
+ current_position = self.tell
55
+ huge_buffer = self.read
56
+ self.seek(current_position, IO::SEEK_SET)
57
+ block.call
58
+ ensure
59
+ self.write huge_buffer
60
+ end
61
+
62
+ def update_cached_position(idx, line)
63
+ line_size_with_ending = line.bytesize + $/.bytesize
64
+ if idx
65
+ # add an entry to the positions
66
+ prev_pos = cached_positions[idx][0]
67
+ cached_positions.insert(idx, [prev_pos, line_size_with_ending])
68
+
69
+ # move all other entries ahead by the new entries length
70
+ (idx+1).upto(cached_positions.size-1).each do |i|
71
+ prev_pos = cached_positions[i][0]
72
+ cached_positions[i][0] = prev_pos + line_size_with_ending
73
+ end
74
+ elsif cached_positions.empty?
75
+ cached_positions << [0, line_size_with_ending]
76
+ else
77
+ last_pos, last_size = *cached_positions.last
78
+ cached_positions << [last_pos + last_size, line_size_with_ending]
79
+ end
80
+ end
81
+
82
+ def cached_positions
83
+ @cached_positions ||= index_each_line
84
+ end
85
+
86
+ # @return [Boolean, Fixnum] whether an exact match was found, and the index of the first line >= val (nil if none)
87
+ def binary_seek(val)
88
+ return nil if cached_positions.empty?
89
+
90
+ last_line = nil
91
+ idx = (0...@cached_positions.size).bsearch do |i|
92
+ last_pair = @cached_positions[i]
93
+ pos, len = *last_pair
94
+ self.seek(pos, IO::SEEK_SET)
95
+
96
+ # we don't want to compare the line ending
97
+ last_line = self.read(len).strip
98
+ last_line >= val
99
+ end
100
+ return last_line == val, idx
101
+ end
102
+ # END private
103
+ end
104
+ end
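The cached index pairs each line's starting offset with its length, and every entry after an insertion point has to shift by the length of the new line. A minimal stand-alone sketch of that bookkeeping follows. The names `build_index` and `insert_entry` are hypothetical, and using `bytesize` is an assumption made here because `IO#seek` and `IO#read(length)` work in bytes.

```ruby
require 'stringio'

# Hypothetical stand-alone line index: one [byte offset, byte length] per line.
def build_index(io)
  positions = []
  offset = 0
  io.each_line do |line|
    positions << [offset, line.bytesize]
    offset += line.bytesize
  end
  io.rewind
  positions
end

# Record a new line of `length` bytes inserted before entry `idx`,
# then shift every later entry forward by that length.
def insert_entry(positions, idx, length)
  start = positions[idx][0]
  positions.insert(idx, [start, length])
  (idx + 1...positions.size).each { |i| positions[i][0] += length }
  positions
end

index = build_index(StringIO.new("ant\ncat\n")) # => [[0, 4], [4, 4]]
insert_entry(index, 1, "bat\n".bytesize)         # => [[0, 4], [4, 4], [8, 4]]
```

Keeping the index in memory avoids rescanning the file on every insert, at the cost of an O(n) shift per insertion.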
data/lib/sort_index/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module SortIndex
2
+ VERSION = '0.1.0'
3
+ end
data/lib/sort_index.rb ADDED
@@ -0,0 +1,5 @@
1
+ require 'sort_index/version'
2
+ require 'sort_index/file'
3
+
4
+ module SortIndex
5
+ end
data/sort_index.gemspec ADDED
@@ -0,0 +1,32 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'sort_index/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.required_ruby_version = '>= 2.0.0'
8
+ spec.name = 'sort_index'
9
+ spec.version = SortIndex::VERSION
10
+ spec.authors = ['Scott Pierce']
11
+ spec.email = ['ddrscott@gmail.com']
12
+
13
+ spec.summary = %q{Simple File wrapper to keep file contents unique and sorted as lines are added.}
14
+ spec.description = <<-RDOC
15
+ == Description ==
16
+ Proof of concept to maintain a file with sorted and unique values.
17
+ This could be helpful for building indexes.
18
+
19
+ Range#bsearch is used to determine if a line is already in the file and
20
+ to determine where a line should be inserted. This means Ruby 2.0 is required.
21
+ RDOC
22
+ spec.homepage = 'https://github.com/ddrscott/sort_index'
23
+ spec.license = 'MIT'
24
+
25
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
26
+ spec.require_paths = ['lib']
27
+
28
+ spec.add_development_dependency 'bundler', '~> 1.10'
29
+ spec.add_development_dependency 'rake', '~> 10.0'
30
+ spec.add_development_dependency 'rspec', '~> 3.2.0'
31
+ spec.add_development_dependency 'pry', '~> 0.10.3'
32
+ end
metadata ADDED
@@ -0,0 +1,121 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: sort_index
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Scott Pierce
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-12-26 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.10'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.10'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: 3.2.0
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: 3.2.0
55
+ - !ruby/object:Gem::Dependency
56
+ name: pry
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: 0.10.3
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: 0.10.3
69
+ description: |
70
+ == Description ==
71
+ Proof of concept to maintain a file with sorted and unique values.
72
+ This could be helpful for building indexes.
73
+
74
+ Range#bsearch is used to determine if a line is already in the file and
75
+ to determine where a line should be inserted. This means Ruby 2.0 is required.
76
+ email:
77
+ - ddrscott@gmail.com
78
+ executables: []
79
+ extensions: []
80
+ extra_rdoc_files: []
81
+ files:
82
+ - ".gitignore"
83
+ - ".rspec"
84
+ - ".travis.yml"
85
+ - Gemfile
86
+ - LICENSE.txt
87
+ - README.md
88
+ - Rakefile
89
+ - bin/console
90
+ - bin/setup
91
+ - lib/sort_index.rb
92
+ - lib/sort_index/file.rb
93
+ - lib/sort_index/version.rb
94
+ - sort_index.gemspec
95
+ homepage: https://github.com/ddrscott/sort_index
96
+ licenses:
97
+ - MIT
98
+ metadata: {}
99
+ post_install_message:
100
+ rdoc_options: []
101
+ require_paths:
102
+ - lib
103
+ required_ruby_version: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - ">="
106
+ - !ruby/object:Gem::Version
107
+ version: 2.0.0
108
+ required_rubygems_version: !ruby/object:Gem::Requirement
109
+ requirements:
110
+ - - ">="
111
+ - !ruby/object:Gem::Version
112
+ version: '0'
113
+ requirements: []
114
+ rubyforge_project:
115
+ rubygems_version: 2.4.5.1
116
+ signing_key:
117
+ specification_version: 4
118
+ summary: Simple File wrapper to keep file contents unique and sorted as lines are
119
+ added.
120
+ test_files: []
121
+ has_rdoc: