json-stream 0.1.2 → 0.1.3
- checksums.yaml +7 -0
- data/LICENSE +1 -1
- data/{README → README.md} +29 -22
- data/Rakefile +8 -25
- data/json-stream.gemspec +20 -0
- data/lib/json/stream/buffer.rb +0 -1
- data/lib/json/stream/builder.rb +1 -2
- data/lib/json/stream/parser.rb +2 -4
- data/lib/json/stream/version.rb +1 -1
- data/test/parser_test.rb +3 -1
- metadata +44 -46
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 6d404ae0a1e8e03bb1551ead94eda3e413c1f492
+  data.tar.gz: 0f14c4b8a6de9f53b4bddaf824a0c0184f969399
+SHA512:
+  metadata.gz: 3640958bef5a32726c09c723a459d15986564bc7afe70bfc0db9faa15037bddd6539fa04a8d79bb95674b56a08901771f33118d3c43e862cc51984d9eca12b39
+  data.tar.gz: 260eac9e6ff3b440fce89231e0f013bb490ed5666ace3e2c5ee28c3d17cd2eb36a589a5340c42862cf4317a9737e80a976e3ce805b4979ed9dc1df3e9b39ae70
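The checksums above could be checked against the unpacked gem contents along these lines. This is a minimal sketch, not RubyGems' own verification code: `checksum_mismatches` and `DIGESTS` are hypothetical names, and the sketch assumes `metadata.gz` and `data.tar.gz` have already been extracted into one directory.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Maps the section names used in checksums.yaml to Ruby digest classes.
DIGESTS = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper: given the text of a checksums.yaml and a directory
# holding the files it names, return the [algorithm, filename] pairs whose
# computed digest differs from the recorded one.
def checksum_mismatches(checksums_yaml, dir)
  YAML.safe_load(checksums_yaml).flat_map do |algorithm, files|
    digest_class = DIGESTS.fetch(algorithm)
    files.reject { |name, expected|
      digest_class.file(File.join(dir, name)).hexdigest == expected.to_s
    }.keys.map { |name| [algorithm, name] }
  end
end
```

An empty result means every recorded digest matched; an unknown algorithm name raises `KeyError` rather than being skipped silently.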
data/LICENSE
CHANGED
data/{README → README.md}
RENAMED
@@ -1,42 +1,48 @@
-
+# JSON::Stream

-JSON::Stream is a finite state machine
-for each state change. This allows
-memory and the parsed object graph out of memory to some other process.
-is much like an XML SAX parser that generates events during parsing.
-no requirement for the document
-memory.
-For example, streaming and processing large map/reduce views from Apache
+JSON::Stream is a JSON parser, based on a finite state machine, that generates
+events for each state change. This allows streaming both the JSON document into
+memory and the parsed object graph out of memory to some other process. This
+is much like an XML SAX parser that generates events during parsing. There is
+no requirement for the document, or the object graph, to be fully buffered in
+memory. This is best suited for huge JSON documents that won't fit in memory.
+For example, streaming and processing large map/reduce views from Apache
+CouchDB.

-
+## Usage

 The simplest way to parse is to read the full JSON document into memory
-and then parse it into a full object graph.
+and then parse it into a full object graph. This is fine for small documents
 because we have room for both the document and parsed object in memory.

+```ruby
 require 'json/stream'
 json = File.read('/tmp/test.json')
 obj = JSON::Stream::Parser.parse(json)
+```

 While it's possible to do this with JSON::Stream, we really want to use the json
-gem for documents like this.
+gem for documents like this. JSON.parse() is much faster than this parser,
 because it can rely on having the entire document in memory to analyze.

 For larger documents we can use an IO object to stream it into the parser.
 We still need room for the parsed object, but the document itself is never
 fully read into memory.

+```ruby
 require 'json/stream'
 stream = File.open('/tmp/test.json')
 obj = JSON::Stream::Parser.parse(stream)
+```

-Again, while
+Again, while JSON::Stream can be used this way, if we just need to stream the
 document from disk or the network, we're better off using the yajl-ruby gem.

 Huge documents arriving over the network in small chunks to an EventMachine
-receive_data loop is where JSON::Stream is really useful.
+receive_data loop is where JSON::Stream is really useful. Inside an
 EventMachine::Connection subclass we might have:

+```ruby
 def post_init
   @parser = JSON::Stream::Parser.new do
     start_document { puts "start document" }
@@ -57,26 +63,27 @@ def receive_data(data)
     close_connection
   end
 end
+```

 Notice how the parser accepts chunks of the JSON document and parses up
-
-parse from the prior state.
+to the end of the available buffer. Passing in more data resumes the
+parse from the prior state. When an interesting state change happens, the
 parser notifies all registered callback procs of the event.

 The event callback is where we can do interesting data filtering and passing
-to other processes.
+to other processes. The above example simply prints state changes, but
 imagine the callbacks looking for an array named "rows" and processing sets
-of these row objects in small batches.
-
+of these row objects in small batches. Millions of rows, streaming over the
+network, can be processed in constant memory space this way.

-
+## Dependencies

-* ruby >= 1.9.
+* ruby >= 1.9.2

-
+## Contact

 Project contact: David Graham <david.malcom.graham@gmail.com>

-
+## License

 JSON::Stream is released under the MIT license. Check the LICENSE file for details.
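The README's idea of processing "rows" in small batches can be sketched independently of the parser. `RowBatcher` below is a hypothetical helper, not part of json-stream. The assumption is that the parser callbacks would call `<<` each time a row object completes and `flush` when the "rows" array or the document ends.

```ruby
# Hypothetical helper, not part of json-stream: collects rows and hands them
# to a block in fixed-size batches, so at most batch_size rows are held.
class RowBatcher
  def initialize(batch_size, &handler)
    raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?
    raise ArgumentError, 'a handler block is required' unless handler

    @batch_size = batch_size
    @handler = handler
    @rows = []
  end

  # Add one row; emit a batch once enough rows have accumulated.
  def <<(row)
    @rows << row
    flush if @rows.size >= @batch_size
    self
  end

  # Emit any remaining rows, e.g. when the "rows" array ends.
  def flush
    return if @rows.empty?

    batch, @rows = @rows, []
    @handler.call(batch)
  end
end

batches = []
batcher = RowBatcher.new(2) { |batch| batches << batch }
%w[a b c].each { |row| batcher << row }
batcher.flush
# batches is now [["a", "b"], ["c"]]
```

Because each batch is handed off and the internal buffer is replaced, memory use depends on the batch size rather than on the total number of rows.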
data/Rakefile
CHANGED
@@ -1,37 +1,20 @@
 require 'rake'
 require 'rake/clean'
-require 'rake/gempackagetask'
 require 'rake/testtask'
-require_relative 'lib/json/stream/version'

-
-
-
-  s.date = Time.now.strftime("%Y-%m-%d")
-  s.summary = "A streaming JSON parser that generates SAX-like events."
-  s.description = "A finite state machine based JSON parser that generates events
-for each state change. This allows us to stream both the JSON document into
-memory and the parsed object graph out of memory to some other process. This
-is much like an XML SAX parser that generates events during parsing. There is
-no requirement for the document nor the object graph to be fully buffered in
-memory. This is best suited for huge JSON documents that won't fit in memory.
-For example, streaming and processing large map/reduce views from Apache CouchDB."
-  s.email = "david.malcom.graham@gmail.com"
-  s.homepage = "http://dgraham.github.com/json-stream/"
-  s.authors = ["David Graham"]
-  s.files = FileList['[A-Z]*', "{lib}/**/*"]
-  s.require_path = "lib"
-  s.test_files = FileList["{test}/**/*test.rb"]
-  s.required_ruby_version = '>= 1.9.1'
-end
+CLOBBER.include('pkg')
+
+directory 'pkg'

-
-
+desc 'Build distributable packages'
+task :build => [:pkg] do
+  system 'gem build json-stream.gemspec && mv json-*.gem pkg/'
 end

 Rake::TestTask.new(:test) do |test|
+  test.libs << 'test'
   test.pattern = 'test/**/*_test.rb'
   test.warning = true
 end

-task :default => [:clobber, :test, :
+task :default => [:clobber, :test, :build]
data/json-stream.gemspec
ADDED
@@ -0,0 +1,20 @@
+require './lib/json/stream/version'
+
+Gem::Specification.new do |s|
+  s.name = 'json-stream'
+  s.version = JSON::Stream::VERSION
+  s.summary = %q[A streaming JSON parser that generates SAX-like events.]
+  s.description = %q[A parser best suited for huge JSON documents that don't fit in memory.]
+
+  s.authors = ['David Graham']
+  s.email = %w[david.malcom.graham@gmail.com]
+  s.homepage = 'http://dgraham.github.io/json-stream/'
+  s.license = 'MIT'
+
+  s.files = Dir['[A-Z]*', 'json-stream.gemspec', '{lib}/**/*']
+  s.test_files = Dir['test/**/*']
+  s.require_path = 'lib'
+
+  s.add_development_dependency 'rake'
+  s.required_ruby_version = '>= 1.9.2'
+end
data/lib/json/stream/buffer.rb
CHANGED
data/lib/json/stream/builder.rb
CHANGED
data/lib/json/stream/parser.rb
CHANGED
@@ -2,7 +2,6 @@

 module JSON
   module Stream
-
     class ParserError < RuntimeError; end

     # A streaming JSON parser that generates SAX-like events for
@@ -10,7 +9,7 @@ module JSON
     # for huge documents that won't fit in memory.
     class Parser
       BUF_SIZE = 512
-      CONTROL = /[
+      CONTROL = /[\x00-\x1F]/
       WS = /\s/
       HEX = /[0-9a-fA-F]/
       DIGIT = /[0-9]/
@@ -194,7 +193,7 @@ module JSON
           end
         when :start_surrogate_pair
           case ch
-          when BACKSLASH
+          when BACKSLASH
             @state = :start_surrogate_pair_u
           else
             error('Expected low surrogate pair half')
@@ -425,6 +424,5 @@ module JSON
         raise ParserError, "#{message}: char #{@pos}"
       end
     end
-
   end
 end
data/lib/json/stream/version.rb
CHANGED
data/test/parser_test.rb
CHANGED
@@ -220,6 +220,9 @@ class ParserTest < Test::Unit::TestCase

     expected = [:start_document, :start_object, :error]
     assert_equal(expected, events("{\" \u0000 \":12}"))
+
+    expected = [:start_document, :start_array, [:value, " \u007F "], :end_array, :end_document]
+    assert_equal(expected, events("[\" \u007f \"]"))
   end

   def test_unicode_escape
@@ -447,5 +450,4 @@ class ParserTest < Test::Unit::TestCase
       @events << :error
     end
   end
-
 end
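The new test, together with the new `CONTROL` constant in parser.rb, indicates that U+0000 inside a string is still an error while DEL (U+007F) is accepted as part of a value. JSON only requires U+0000 through U+001F to be escaped inside strings, so this matches the grammar. A quick standalone check of the character class, outside the parser (`control_char?` is a hypothetical helper name):

```ruby
# Same character class as the new CONTROL constant in parser.rb.
CONTROL = /[\x00-\x1F]/

# Hypothetical helper: true if the single-character string is one of the
# control characters that may not appear unescaped in a JSON string.
def control_char?(ch)
  CONTROL.match?(ch)
end

control_char?("\u0000") # NUL, rejected by the parser
control_char?("\u001F") # last character in the class
control_char?("\u007F") # DEL, outside the class
```

The first two calls return `true` and the last returns `false`, which is consistent with the `:error` and `[:value, " \u007F "]` expectations in the test.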
metadata
CHANGED
@@ -1,38 +1,40 @@
---- !ruby/object:Gem::Specification
+--- !ruby/object:Gem::Specification
 name: json-stream
-version: !ruby/object:Gem::Version
-
-  version: 0.1.2
+version: !ruby/object:Gem::Version
+  version: 0.1.3
 platform: ruby
-authors:
+authors:
 - David Graham
 autorequire:
 bindir: bin
 cert_chain: []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+date: 2013-10-15 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+description: A parser best suited for huge JSON documents that don't fit in memory.
+email:
+- david.malcom.graham@gmail.com
 executables: []
-
 extensions: []
-
 extra_rdoc_files: []
-
-files:
+files:
 - LICENSE
 - Rakefile
-- README
+- README.md
+- json-stream.gemspec
 - lib/json/stream/buffer.rb
 - lib/json/stream/builder.rb
 - lib/json/stream/parser.rb
@@ -41,35 +43,31 @@ files:
 - test/buffer_test.rb
 - test/builder_test.rb
 - test/parser_test.rb
-
-
-
-
+homepage: http://dgraham.github.io/json-stream/
+licenses:
+- MIT
+metadata: {}
 post_install_message:
 rdoc_options: []
-
-require_paths:
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
-
-
-
-
-
-
-
-
-
-    - !ruby/object:Gem::Version
-      version: "0"
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: 1.9.2
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
 requirements: []
-
 rubyforge_project:
-rubygems_version:
+rubygems_version: 2.0.3
 signing_key:
-specification_version:
+specification_version: 4
 summary: A streaming JSON parser that generates SAX-like events.
-test_files:
+test_files:
 - test/buffer_test.rb
 - test/builder_test.rb
 - test/parser_test.rb