georgi-git_store 0.2.4 → 0.3

data/README.md CHANGED
@@ -10,25 +10,13 @@ in-memory representation, which can be modified and finally committed.
  GitStore supports transactions, so that updates to the store either
  fail or succeed completely.
 
- GitStore manages concurrent access by a file locking scheme. So only
- one process can start a transaction at one time. This is implemented
- by locking the `refs/head/<branch>.lock` file, which is also
- respected by the git binary.
-
  ### Installation
 
- GitStore can be installed as gem easily, if you have RubyGems 1.2.0:
+ GitStore can be installed as gem easily:
 
  $ gem sources -a http://gems.github.com
  $ sudo gem install georgi-git_store
 
- If you don't have RubyGems 1.2.0, you may download the package on the
- [github page][4] and build the gem yourself:
-
- $ gem build git_store.gemspec
- $ sudo gem install git_store
-
-
  ### Usage Example
 
  First thing you should do, is to initialize a new git repository.
@@ -41,8 +29,6 @@ Now you can instantiate a GitStore instance and store some data. The
  data will be serialized depending on the file extension. So for YAML
  storage you can use the 'yml' extension:
 
- @@ruby
-
  store = GitStore.new('/path/to/repo')
 
  store['users/matthias.yml'] = User.new('Matthias')
@@ -50,24 +36,17 @@ storage you can use the 'yml' extension:
 
  store.commit 'Added user and page'
 
- # Note, that directories will be created automatically.
- # Another way to access a path is:
-
- store['config', 'wiki.yml'] = { 'name' => 'My Personal Wiki' }
-
- # Finally you can access the git store as a Hash of Hashes, but in
- # this case you have to create the Tree objects manually:
-
- puts store['users']['wiki.yml']['name']
-
-
  ### Transactions
 
- If you access the repository from different processes, you should
- write to your store using transactions. If something goes wrong inside
- a transaction, all changes will be rolled back to the original state.
+ GitStore manages concurrent access by a file locking scheme. So only
+ one process can start a transaction at one time. This is implemented
+ by locking the `refs/head/<branch>.lock` file, which is also
+ respected by the git binary.
 
- @@ruby
+ If you access the repository from different processes or threads, you
+ should write to the store using transactions. If something goes wrong
+ inside a transaction, all changes will be rolled back to the original
+ state.
 
  store = GitStore.new('/path/to/repo')
 
@@ -76,7 +55,8 @@ a transaction, all changes will be rolled back to the original state.
  store['pages/home.yml'] = Page.new('matthias', 'Home')
  end
 
- # transaction without a block
+
+ A transaction without a block looks like this:
 
  store.start_transaction
 
@@ -85,76 +65,16 @@ a transaction, all changes will be rolled back to the original state.
  store.rollback # This will restore the original state
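
The locking scheme described in the Transactions text can be illustrated with a small sketch. It assumes the lock is taken by creating the `.lock` file exclusively, which is how git's own lock files behave; the helper name `with_lock_file` and the temporary directory are hypothetical and not part of GitStore.

```ruby
require 'tmpdir'

# Hypothetical helper, not GitStore's API: take an exclusive lock by
# creating "<path>.lock" and remove it again when the block finishes.
def with_lock_file(path)
  lock_path = "#{path}.lock"
  # File::EXCL makes the open fail with Errno::EEXIST if the lock file exists.
  lock = File.open(lock_path, File::WRONLY | File::CREAT | File::EXCL)
  begin
    yield
  ensure
    lock.close
    File.unlink(lock_path)
  end
end

LOCK_DEMO = Dir.mktmpdir do |dir|
  head = File.join(dir, 'master')   # stands in for a branch ref file
  results = {}
  with_lock_file(head) do
    results[:held] = File.exist?("#{head}.lock")
    begin
      # A second attempt while the lock is held is refused.
      with_lock_file(head) { results[:nested] = :acquired }
    rescue Errno::EEXIST
      results[:nested] = :refused
    end
  end
  results[:released] = !File.exist?("#{head}.lock")
  results
end
```

Because the lock is just a file, a crashed process can leave it behind; a real implementation would need some policy for stale locks.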
 
 
- ### Performance
-
- Maintaining 1000 objects in one folder seems to yield quite usable
- results. If I run the following benchmark:
-
- @@ruby
-
- Benchmark.bm 20 do |x|
- x.report 'store 1000 objects' do
- store.transaction { 'aaa'.upto('jjj') { |key| store[key] = rand.to_s } }
- end
- x.report 'commit one object' do
- store.transaction { store['aa'] = rand.to_s }
- end
- x.report 'load 1000 objects' do
- GitStore.new('.')
- end
- x.report 'load 1000 with grit' do
- Grit::Repo.new('.').tree.contents.each { |e| e.data }
- end
- end
-
-
- I get following results:
-
- user system total real
- store 1000 objects 4.150000 0.880000 5.030000 ( 5.035804)
- commit one object 0.070000 0.020000 0.090000 ( 0.082252)
- load 1000 objects 0.630000 0.120000 0.750000 ( 0.750765)
- load 1000 with grit 1.960000 0.260000 2.220000 ( 2.228583)
-
-
- In a real world scenario, you should partition your data. For example,
- my blog engine [Shinmun][7], stores posts in folders by month.
-
- One nice thing about the results is, that GitStore loads large
- directories three times faster than [Grit][2].
-
-
- ### Where is my data?
+ ### Data Storage
 
  When you call the `commit` method, your data is written back straight
  into the git repository. No intermediate file representation. So if
- you want to look into your data, you can use some git browser like
- [git-gui][6] or just checkout the files:
+ you want to have a look at your data, you can use a git browser like
+ [git-gui][6] or checkout the files:
 
  $ git checkout
 
 
- ### Development Mode
-
- There is also some kind of development mode, which is convenient to
- use. Imagine you are tweaking the design of your blog, which is
- storing its pages in a GitStore. You don't want to commit each change
- to some change in your browser. FileStore helps you here:
-
- @@ruby
-
- store = GitStore::FileStore.new('.')
-
- # Access the file 'posts/2009/1/git-store.md'
-
- p store['posts', 2009, 1, 'git-store.md']
-
-
- FileStore forbids you to write to the disk, as this makes no sense. If
- you want to store something programmatically, you have to use the real
- GitStore.
-
-
  ### Iteration
 
  Iterating over the data objects is quite easy. Furthermore you can
@@ -162,15 +82,15 @@ iterate over trees and subtrees, so you can partition your data in a
  meaningful way. For example you may separate the config files and the
  pages of a wiki:
 
- @@ruby
-
  store['pages/home.yml'] = Page.new('matthias', 'Home')
  store['pages/about.yml'] = Page.new('matthias', 'About')
- store['pages/links.yml'] = WikiPage.new('matthias', 'Links')
  store['config/wiki.yml'] = { 'name' => 'My Personal Wiki' }
 
- store.each { |obj| ... } # yields all pages and the config file
- store['pages'].each { |page| ... } # yields only the pages
+ # Enumerate all objects
+ store.each { |obj| ... }
+
+ # Enumerate only pages
+ store['pages'].each { |page| ... }
 
 
  ### Serialization
@@ -178,42 +98,39 @@ pages of a wiki:
  Serialization is dependent on the filename extension. You can add more
  handlers if you like, the interface is like this:
 
- @@ruby
-
  class YAMLHandler
- def read(path, data)
+ def read(data)
  YAML.load(data)
  end
 
- def write(path, data)
+ def write(data)
  data.to_yaml
  end
  end
 
- GitStore::Handler['yml'] = YAMLHandler.new
-
-
  Shinmun uses its own handler for files with `md` extension:
 
- @@ruby
-
  class PostHandler
- def read(path, data)
- Post.new(:path => path, :src => data)
+ def read(data)
+ Post.new(:src => data)
  end
 
- def write(path, post)
+ def write(post)
  post.dump
  end
  end
 
- GitStore::Handler['md'] = PostHandler.new
+ store = GitStore.new('.')
+ store.handler['md'] = PostHandler.new
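
As a further illustration of the one-argument handler interface introduced in 0.3, here is a minimal sketch of a JSON handler. The class name `JSONHandler` is hypothetical and does not ship with the gem; only the `read(data)` / `write(data)` shape is taken from the README.

```ruby
require 'json'

# Hypothetical handler following the 0.3 interface: read(data) and write(data).
class JSONHandler
  # Turn the raw blob content into a Ruby object.
  def read(data)
    JSON.parse(data)
  end

  # Turn a Ruby object into the string that gets stored in the blob.
  def write(data)
    JSON.generate(data)
  end
end

handler    = JSONHandler.new
config     = { 'name' => 'My Personal Wiki' }
serialized = handler.write(config)
restored   = handler.read(serialized)
```

Following the README's `md` example, such a handler would presumably be registered per store with `store.handler['json'] = JSONHandler.new`.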
 
 
  ### GitStore on GitHub
 
  Download or fork the project on its [Github page][5]
 
+ ### Mailing List
+
+ Please join the [GitStore Google Group][3] for further discussion.
 
  ### Related Work
 
@@ -223,6 +140,7 @@ John Wiegley already has done [something similar for Python][4].
 
  [1]: http://git.or.cz/
  [2]: http://github.com/mojombo/grit
+ [3]: http://groups.google.com/group/gitstore
  [4]: http://www.newartisans.com/blog_files/git.versioned.data.store.php
  [5]: http://github.com/georgi/git_store
  [6]: http://www.kernel.org/pub/software/scm/git/docs/git-gui.html
@@ -1,6 +1,6 @@
  Gem::Specification.new do |s|
  s.name = 'git_store'
- s.version = '0.2.4'
+ s.version = '0.3'
  s.summary = 'a simple data store based on git'
  s.author = 'Matthias Georgi'
  s.email = 'matti.georgi@gmail.com'
@@ -17,9 +17,13 @@ README.md
  git_store.gemspec
  lib/git_store.rb
  lib/git_store/blob.rb
+ lib/git_store/commit.rb
+ lib/git_store/diff.rb
  lib/git_store/tree.rb
  lib/git_store/handlers.rb
  lib/git_store/pack.rb
+ test/tree_spec.rb
+ test/commit_spec.rb
  test/git_store_spec.rb
  test/benchmark.rb
  }
@@ -2,11 +2,14 @@ require 'rubygems'
  require 'zlib'
  require 'digest/sha1'
  require 'yaml'
+ require 'fileutils'
 
  require 'git_store/blob'
+ require 'git_store/diff'
  require 'git_store/tree'
- require 'git_store/handlers'
  require 'git_store/pack'
+ require 'git_store/commit'
+ require 'git_store/handlers'
 
  # GitStore implements a versioned data store based on the revision
  # management system git. You can store object hierarchies as nested
@@ -36,23 +39,41 @@ require 'git_store/pack'
  class GitStore
  include Enumerable
 
- attr_reader :path, :index, :root, :branch, :lock_file, :head, :packs
+ TYPE_CLASS = {
+ 'tree' => Tree,
+ 'blob' => Blob,
+ 'commit' => Commit
+ }
+
+ attr_reader :path, :index, :root, :branch, :user, :lock_file, :head, :packs, :handler
 
  # Initialize a store.
  def initialize(path, branch = 'master')
- @path = path.chomp('/')
- @branch = branch
- @root = Tree.new(self)
- @packs = {}
+ if not File.exists?("#{path}/.git")
+ raise ArgumentError, "first argument must be a valid Git repository: `#{path}'"
+ end
+
+ @path = path.chomp('/')
+ @branch = branch
+ @root = Tree.new(self)
+ @packs = {}
+
+ init_handler
+
+ name = IO.popen("git config user.name") { |io| io.gets.chomp }
+ email = IO.popen("git config user.email") { |io| io.gets.chomp }
+
+ @user = "#{name} <#{email}>"
 
- raise(ArgumentError, "first argument must be a valid Git repository") unless valid?
-
  load_packs("#{path}/.git/objects/pack")
  load
  end
 
- def valid?
- File.exists?("#{path}/.git")
+ def init_handler
+ @handler = {
+ 'yml' => YAMLHandler.new
+ }
+ @handler.default = DefaultHandler.new
  end
 
  # The path to the current head file.
@@ -68,30 +89,43 @@ class GitStore
  # Read the id of the head commit.
  #
  # Returns the object id of the last commit.
- def read_head
+ def read_head_id
  File.read(head_path).strip if File.exists?(head_path)
  end
 
+ def handler_for(path)
+ handler[ path.split('.').last ]
+ end
+
  # Read an object for the specified path.
- #
- # Use multiple arguments or a string with slashes.
- def [](*args)
- root[*args]
+ def [](path)
+ root[path]
  end
 
  # Write an object to the specified path.
- #
- # Use multiple arguments or a string with slashes.
- def []=(*args)
- value = args.pop
- root[*args] = value
+ def []=(path, data)
+ root[path] = data
  end
 
- # Delete the specified path.
- #
- # Use multiple arguments or a string with slashes.
- def delete(*args)
- root.delete(*args)
+ # Iterate over all key-values pairs found in this store.
+ def each(&block)
+ root.each(&block)
+ end
+
+ def paths
+ root.paths
+ end
+
+ def values
+ root.values
+ end
+
+ def delete(path)
+ root.delete(path)
+ end
+
+ def tree(name)
+ root.tree(name)
  end
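
The handler table built in `init_handler` and the extension lookup in `handler_for` can be sketched in isolation. The two handler classes below are empty stand-ins, since `lib/git_store/handlers.rb` is not shown in this part of the diff; only the hash-with-default and `split('.').last` logic is taken from the hunks.

```ruby
# Stand-in classes; the real YAMLHandler and DefaultHandler are not shown here.
class SketchYAMLHandler; end
class SketchDefaultHandler; end

# Extension => handler, with a fallback for unknown extensions.
handlers = { 'yml' => SketchYAMLHandler.new }
handlers.default = SketchDefaultHandler.new

# Same lookup rule as handler_for: everything after the last dot is the key.
lookup = lambda { |path| handlers[path.split('.').last] }

yml  = lookup.call('users/matthias.yml')
txt  = lookup.call('notes/readme.txt')
bare = lookup.call('README')   # no dot: the whole name becomes the key
```

Paths without a dot, or with a dot only in a directory name, simply miss the table and fall through to the default handler.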
 
  # Returns the store as a hash tree.
@@ -101,30 +135,38 @@ class GitStore
 
  # Inspect the store.
  def inspect
- "#<GitStore #{path} #{branch} #{root.to_hash.inspect}>"
- end
-
- # Iterate over all values found in this store.
- def each(&block)
- root.each(&block)
+ "#<GitStore #{path} #{branch}>"
  end
 
  # Has our store been changed on disk?
  def changed?
- head != read_head
- end
-
- def refresh!
- load if changed?
+ head.nil? or head.id != read_head_id
  end
 
  # Load the current head version from repository.
- def load
- if @head = read_head
- commit = get_object(head)[0]
- root.id = commit.split(/[ \n]/, 3)[1].strip
- root.data = get_object(root.id)[0]
- root.load_from_store
+ def load(from_disk = false)
+ if id = read_head_id
+ @head = get(id)
+ @root = get(@head.tree)
+ end
+
+ load_from_disk if from_disk
+ end
+
+ def load_from_disk
+ @mtime ||= {}
+
+ root.each_blob do |path, blob|
+ file = "#{self.path}/#{path}"
+
+ if File.file?(file)
+ mtime = File.mtime(file)
+
+ if @mtime[path] != mtime
+ @mtime[path] = mtime
+ blob.data = File.read(file)
+ end
+ end
  end
  end
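
The `@mtime` bookkeeping in `load_from_disk` amounts to a modification-time cache: a file is re-read only when its mtime differs from the one seen last. A minimal standalone sketch of that idea follows; the class `MtimeCache` and its `reads` counter are hypothetical and exist only for this example.

```ruby
require 'tmpdir'

# Hypothetical cache mirroring the @mtime logic: re-read a file only when
# its modification time differs from the cached one.
class MtimeCache
  attr_reader :reads

  def initialize
    @mtime = {}
    @data  = {}
    @reads = 0
  end

  def fetch(file)
    mtime = File.mtime(file)
    if @mtime[file] != mtime
      @mtime[file] = mtime
      @data[file]  = File.read(file)
      @reads += 1
    end
    @data[file]
  end
end

MTIME_DEMO = Dir.mktmpdir do |dir|
  file  = File.join(dir, 'home.yml')
  cache = MtimeCache.new
  File.write(file, 'v1')
  first  = cache.fetch(file)
  second = cache.fetch(file)        # unchanged mtime, served from the cache
  File.write(file, 'v2')
  later = Time.now + 60
  File.utime(later, later, file)    # force a clearly different mtime
  third = cache.fetch(file)
  [first, second, third, cache.reads]
end
```

The explicit `File.utime` call is there because two writes in quick succession can end up with the same mtime on filesystems with coarse timestamps, which is also a limitation of any mtime-based refresh.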
 
@@ -133,7 +175,7 @@ class GitStore
  load if changed?
  end
 
- # Do we have a current transacation?
+ # Is there any transaction going on?
  def in_transaction?
  Thread.current['git_store_lock']
  end
@@ -174,7 +216,7 @@ class GitStore
  #
  # Any changes made to the store are discarded.
  def rollback
- root.load_from_store
+ load
  finish_transaction
  end
 
@@ -188,25 +230,56 @@ class GitStore
  File.unlink("#{head_path}.lock") rescue nil
  end
 
+ def user_info(user, time)
+ "#{ user } #{ time.to_i } #{ time.to_s.split[4] }"
+ end
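
`user_info` builds the `<user> <epoch seconds> <utc offset>` string that git expects in author and committer lines, taking the offset from the fifth whitespace-separated field of `Time#to_s`. On current Ruby versions `Time#to_s` has only three such fields, so that lookup yields `nil`. The variation below is a sketch that swaps in `strftime('%z')`; the method name `user_info_sketch` and the example identity are made up, and this is not what 0.3 itself does.

```ruby
# Variation on user_info: same layout, but the UTC offset comes from
# strftime('%z') instead of a field of Time#to_s.
def user_info_sketch(user, time)
  "#{user} #{time.to_i} #{time.strftime('%z')}"
end

stamp = user_info_sketch('Jane Doe <jane@example.com>', Time.at(0).utc)
```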
+
  # Write the commit object to disk and set the head of the current branch.
  #
  # Returns the id of the commit object
- def commit(message = '', author = 'ruby', committer = 'ruby')
- time = "#{ Time.now.to_i } #{ Time.now.to_s.split[4] }"
- tree = root.write_to_store
+ def commit(message = '', author = "#{user_info user, Time.now}", committer = "#{user_info user, Time.now}")
+ commit = Commit.new(self)
+ commit.tree = root.write
+ commit.parent << head.id if head
+ commit.author = author
+ commit.committer = committer
+ commit.message = message
+ commit.write
 
- contents = [ "tree #{tree}", (head and "parent #{head}"),
- "author #{author} #{time}",
- "committer #{committer} #{time}", '', message
- ].compact.join("\n")
+ open(head_path, "wb") do |file|
+ file.write(commit.id)
+ end
 
- id = put_object(contents, 'commit')
+ @head = commit
+ end
 
- open(head_path, "wb") do |file|
- file.write(id)
+ def commits(limit = 10, start = head)
+ entries = []
+ current = start
+
+ while current and entries.size < limit
+ entries << current
+ current = get(current.parent.first)
  end
 
- @head = id
+ entries
+ end
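
The history walk in `commits` can be tried out without a repository. The sketch below replaces the object database with a plain hash and the `Commit` class with a `Struct`; the ids `c1` to `c3` and all `*_sketch` names are invented for the example.

```ruby
# Minimal stand-in for a commit: an id plus a list of parent ids.
SketchCommit = Struct.new(:id, :parent)

# In-memory "object database" keyed by id (ids are made up).
OBJECTS = {
  'c3' => SketchCommit.new('c3', ['c2']),
  'c2' => SketchCommit.new('c2', ['c1']),
  'c1' => SketchCommit.new('c1', [])
}

# Like get: nil in, nil out.
def get_sketch(id)
  id && OBJECTS[id]
end

# Follow first parents from `start` until the limit or the root is reached.
def commits_sketch(start, limit = 10)
  entries = []
  current = start
  while current && entries.size < limit
    entries << current
    current = get_sketch(current.parent.first)
  end
  entries
end

all_ids = commits_sketch(OBJECTS['c3']).map(&:id)
two_ids = commits_sketch(OBJECTS['c3'], 2).map(&:id)
```

Only the first parent is followed, so commits reachable solely through the second parent of a merge would not appear in the result.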
+
+ def get(id)
+ return nil if id.nil?
+ type, content = get_object(id)
+
+ klass = TYPE_CLASS[type]
+ klass.new(self, id, content)
+ end
+
+ # Returns the hash value of an object string.
+ def sha(str)
+ Digest::SHA1.hexdigest(str)[0, 40]
+ end
+
+ def id_for(type, content)
+ sha "#{type} #{content.length}\0#{content}"
  end
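
`id_for` follows git's object naming rule: the id is the SHA-1 of `"<type> <size>\0<content>"`. The sketch below reproduces that rule standalone. It uses `bytesize` rather than `length`, which is an assumption on my part for Ruby 1.9 and later, where `length` counts characters instead of bytes; the method name is hypothetical.

```ruby
require 'digest/sha1'

# Git hashes "<type> <size>\0<content>"; bytesize is used so that
# multi-byte characters are counted in bytes.
def object_id_sketch(type, content)
  Digest::SHA1.hexdigest("#{type} #{content.bytesize}\0#{content}")
end

empty_blob = object_id_sketch('blob', '')
hello_blob = object_id_sketch('blob', "hello world\n")
```

The two ids can be compared against `git hash-object`, for example `echo 'hello world' | git hash-object --stdin`.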
 
  # Read the raw object with the given id from the repository.
@@ -218,32 +291,26 @@ class GitStore
  if File.exists?(path)
  buf = open(path, "rb") { |f| f.read }
 
- raise if not legacy_loose_object?(buf)
+ raise "not a loose object: #{id}" if not legacy_loose_object?(buf)
 
  header, content = Zlib::Inflate.inflate(buf).split(/\0/, 2)
  type, size = header.split(/ /, 2)
+
+ raise "bad object: #{id}" if content.length != size.to_i
  else
  content, type = get_object_from_pack(id)
  end
 
- return content, type
+ return type, content
  end
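
The loose-object branch of `get_object` inflates the file, splits the header from the content at the first NUL byte, and (new in 0.3) checks the declared size. A self-contained round trip of that format might look like the following sketch; `pack_loose` and `unpack_loose` are hypothetical names, and nothing is read from or written to a repository.

```ruby
require 'zlib'

# Build the bytes of a loose object: zlib-deflated "<type> <size>\0<content>".
def pack_loose(type, content)
  Zlib::Deflate.deflate("#{type} #{content.bytesize}\0#{content}")
end

# Inverse: inflate, split at the first NUL, then verify the declared size.
def unpack_loose(buf)
  header, content = Zlib::Inflate.inflate(buf).split("\0", 2)
  type, size = header.split(' ', 2)
  raise 'bad object' if content.bytesize != size.to_i
  [type, content]
end

type, content = unpack_loose(pack_loose('blob', "hello\n"))
```

The limit of 2 on the first `split` matters: object content may itself contain NUL bytes, and only the first one terminates the header.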
-
- # Returns the hash value of an object string.
- def sha(str)
- Digest::SHA1.hexdigest(str)[0, 40]
- end
-
+
  # Write a raw object to the repository.
  #
  # Returns the object id.
- def put_object(content, type)
- size = content.length.to_s
- header = "#{type} #{size}\0"
- data = header + content
-
+ def put_object(type, content)
+ data = "#{type} #{content.length}\0#{content}"
  id = sha(data)
- path = object_path(id)
+ path = object_path(id)
 
  unless File.exists?(path)
  FileUtils.mkpath(File.dirname(path))
@@ -263,7 +330,7 @@ class GitStore
 
  def get_object_from_pack(id)
  pack, offset = @packs[id]
-
+
  pack.parse_object(offset) if pack
  end
 
@@ -281,53 +348,5 @@ class GitStore
  end
  end
  end
-
- # FileStore reads a working copy out of a directory. Changes made to
- # the store will not be written to a repository. This is useful, if
- # you want to read a filesystem without having a git repository.
- class FileStore < GitStore
-
- def initialize(path)
- @mtime = {}
- super
- rescue ArgumentError
- end
-
- def load
- root.load_from_disk
-
- each_blob_in(root) do |blob|
- @mtime[blob.path] = File.mtime("#{path}/#{blob.path}")
- end
- end
-
- def each_blob_in(tree, &blob)
- tree.table.each do |name, entry|
- case entry
- when Blob; yield entry
- when Tree; each_blob_in(entry, &blob)
- end
- end
- end
-
- def refresh!
- each_blob_in(root) do |blob|
- path = "#{self.path}/#{blob.path}"
- if File.exist?(path)
- mtime = File.mtime(path)
- if @mtime[blob.path] != mtime
- @mtime[blob.path] = mtime
- blob.load_from_disk
- end
- else
- delete blob.path
- end
- end
- end
-
- def commit(message="")
- end
-
- end
 
  end