json-stream 0.1.2 → 0.1.3
- checksums.yaml +7 -0
- data/LICENSE +1 -1
- data/{README → README.md} +29 -22
- data/Rakefile +8 -25
- data/json-stream.gemspec +20 -0
- data/lib/json/stream/buffer.rb +0 -1
- data/lib/json/stream/builder.rb +1 -2
- data/lib/json/stream/parser.rb +2 -4
- data/lib/json/stream/version.rb +1 -1
- data/test/parser_test.rb +3 -1
- metadata +44 -46
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+metadata.gz: 6d404ae0a1e8e03bb1551ead94eda3e413c1f492
+data.tar.gz: 0f14c4b8a6de9f53b4bddaf824a0c0184f969399
+SHA512:
+metadata.gz: 3640958bef5a32726c09c723a459d15986564bc7afe70bfc0db9faa15037bddd6539fa04a8d79bb95674b56a08901771f33118d3c43e862cc51984d9eca12b39
+data.tar.gz: 260eac9e6ff3b440fce89231e0f013bb490ed5666ace3e2c5ee28c3d17cd2eb36a589a5340c42862cf4317a9737e80a976e3ce805b4979ed9dc1df3e9b39ae70
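The new checksums.yaml lists SHA1 and SHA512 digests for the two archives inside the gem. A minimal sketch of checking a file against such a pair of digests, using only Ruby's standard `digest` library, might look like the following. The helper name `checksums_match?` and the throwaway demo file are assumptions for illustration; they are not part of json-stream or RubyGems.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper (not part of the gem): returns true when the file at
# `path` hashes to both expected hex digests, the two algorithms that
# checksums.yaml lists.
def checksums_match?(path, sha1:, sha512:)
  data = File.binread(path)
  Digest::SHA1.hexdigest(data) == sha1 &&
    Digest::SHA512.hexdigest(data) == sha512
end

# Demo with a throwaway file so the sketch runs without the real archive.
Tempfile.create('data.tar.gz') do |file|
  file.binmode
  file.write('example payload')
  file.flush

  expected = {
    sha1: Digest::SHA1.hexdigest('example payload'),
    sha512: Digest::SHA512.hexdigest('example payload')
  }
  puts checksums_match?(file.path, **expected)
end
```

To check the real archives, the expected values would be the hex strings listed under `SHA1:` and `SHA512:` in the hunk above.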
data/LICENSE
CHANGED
data/{README → README.md}
RENAMED
@@ -1,42 +1,48 @@
-
+# JSON::Stream
 
-JSON::Stream is a finite state machine
-for each state change. This allows
-memory and the parsed object graph out of memory to some other process.
-is much like an XML SAX parser that generates events during parsing.
-no requirement for the document
-memory.
-For example, streaming and processing large map/reduce views from Apache
+JSON::Stream is a JSON parser, based on a finite state machine, that generates
+events for each state change. This allows streaming both the JSON document into
+memory and the parsed object graph out of memory to some other process. This
+is much like an XML SAX parser that generates events during parsing. There is
+no requirement for the document, or the object graph, to be fully buffered in
+memory. This is best suited for huge JSON documents that won't fit in memory.
+For example, streaming and processing large map/reduce views from Apache
+CouchDB.
 
-
+## Usage
 
 The simplest way to parse is to read the full JSON document into memory
-and then parse it into a full object graph.
+and then parse it into a full object graph. This is fine for small documents
 because we have room for both the document and parsed object in memory.
 
+```ruby
 require 'json/stream'
 json = File.read('/tmp/test.json')
 obj = JSON::Stream::Parser.parse(json)
+```
 
 While it's possible to do this with JSON::Stream, we really want to use the json
-gem for documents like this.
+gem for documents like this. JSON.parse() is much faster than this parser,
 because it can rely on having the entire document in memory to analyze.
 
 For larger documents we can use an IO object to stream it into the parser.
 We still need room for the parsed object, but the document itself is never
 fully read into memory.
 
+```ruby
 require 'json/stream'
 stream = File.open('/tmp/test.json')
 obj = JSON::Stream::Parser.parse(stream)
+```
 
-Again, while
+Again, while JSON::Stream can be used this way, if we just need to stream the
 document from disk or the network, we're better off using the yajl-ruby gem.
 
 Huge documents arriving over the network in small chunks to an EventMachine
-receive_data loop is where JSON::Stream is really useful.
+receive_data loop is where JSON::Stream is really useful. Inside an
 EventMachine::Connection subclass we might have:
 
+```ruby
 def post_init
 @parser = JSON::Stream::Parser.new do
 start_document { puts "start document" }
@@ -57,26 +63,27 @@ def receive_data(data)
 close_connection
 end
 end
+```
 
 Notice how the parser accepts chunks of the JSON document and parses up
-
-parse from the prior state.
+to the end of the available buffer. Passing in more data resumes the
+parse from the prior state. When an interesting state change happens, the
 parser notifies all registered callback procs of the event.
 
 The event callback is where we can do interesting data filtering and passing
-to other processes.
+to other processes. The above example simply prints state changes, but
 imagine the callbacks looking for an array named "rows" and processing sets
-of these row objects in small batches.
-
+of these row objects in small batches. Millions of rows, streaming over the
+network, can be processed in constant memory space this way.
 
-
+## Dependencies
 
-* ruby >= 1.9.
+* ruby >= 1.9.2
 
-
+## Contact
 
 Project contact: David Graham <david.malcom.graham@gmail.com>
 
-
+## License
 
 JSON::Stream is released under the MIT license. Check the LICENSE file for details.
data/Rakefile
CHANGED
@@ -1,37 +1,20 @@
 require 'rake'
 require 'rake/clean'
-require 'rake/gempackagetask'
 require 'rake/testtask'
-require_relative 'lib/json/stream/version'
 
-
-
-
-s.date = Time.now.strftime("%Y-%m-%d")
-s.summary = "A streaming JSON parser that generates SAX-like events."
-s.description = "A finite state machine based JSON parser that generates events
-for each state change. This allows us to stream both the JSON document into
-memory and the parsed object graph out of memory to some other process. This
-is much like an XML SAX parser that generates events during parsing. There is
-no requirement for the document nor the object graph to be fully buffered in
-memory. This is best suited for huge JSON documents that won't fit in memory.
-For example, streaming and processing large map/reduce views from Apache CouchDB."
-s.email = "david.malcom.graham@gmail.com"
-s.homepage = "http://dgraham.github.com/json-stream/"
-s.authors = ["David Graham"]
-s.files = FileList['[A-Z]*', "{lib}/**/*"]
-s.require_path = "lib"
-s.test_files = FileList["{test}/**/*test.rb"]
-s.required_ruby_version = '>= 1.9.1'
-end
+CLOBBER.include('pkg')
+
+directory 'pkg'
 
-
-
+desc 'Build distributable packages'
+task :build => [:pkg] do
+system 'gem build json-stream.gemspec && mv json-*.gem pkg/'
 end
 
 Rake::TestTask.new(:test) do |test|
+test.libs << 'test'
 test.pattern = 'test/**/*_test.rb'
 test.warning = true
 end
 
-task :default => [:clobber, :test, :
+task :default => [:clobber, :test, :build]
data/json-stream.gemspec
ADDED
@@ -0,0 +1,20 @@
+require './lib/json/stream/version'
+
+Gem::Specification.new do |s|
+s.name = 'json-stream'
+s.version = JSON::Stream::VERSION
+s.summary = %q[A streaming JSON parser that generates SAX-like events.]
+s.description = %q[A parser best suited for huge JSON documents that don't fit in memory.]
+
+s.authors = ['David Graham']
+s.email = %w[david.malcom.graham@gmail.com]
+s.homepage = 'http://dgraham.github.io/json-stream/'
+s.license = 'MIT'
+
+s.files = Dir['[A-Z]*', 'json-stream.gemspec', '{lib}/**/*']
+s.test_files = Dir['test/**/*']
+s.require_path = 'lib'
+
+s.add_development_dependency 'rake'
+s.required_ruby_version = '>= 1.9.2'
+end
data/lib/json/stream/buffer.rb
CHANGED
data/lib/json/stream/builder.rb
CHANGED
data/lib/json/stream/parser.rb
CHANGED
@@ -2,7 +2,6 @@
 
 module JSON
 module Stream
-
 class ParserError < RuntimeError; end
 
 # A streaming JSON parser that generates SAX-like events for
@@ -10,7 +9,7 @@ module JSON
 # for huge documents that won't fit in memory.
 class Parser
 BUF_SIZE = 512
-CONTROL = /[
+CONTROL = /[\x00-\x1F]/
 WS = /\s/
 HEX = /[0-9a-fA-F]/
 DIGIT = /[0-9]/
@@ -194,7 +193,7 @@ module JSON
 end
 when :start_surrogate_pair
 case ch
-when BACKSLASH
+when BACKSLASH
 @state = :start_surrogate_pair_u
 else
 error('Expected low surrogate pair half')
@@ -425,6 +424,5 @@ module JSON
 raise ParserError, "#{message}: char #{@pos}"
 end
 end
-
 end
 end
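The `:start_surrogate_pair` state in the hunk above expects a backslash that begins the low half of a `\uXXXX\uXXXX` escape. As background, here is a minimal sketch of how two UTF-16 surrogate halves combine into one code point. The helper `combine_surrogates` is hypothetical and is not taken from parser.rb, whose own implementation is not shown in this diff.

```ruby
# Hypothetical helper (not from parser.rb): combines a UTF-16 high and low
# surrogate into a single Unicode code point.
def combine_surrogates(high, low)
  unless (0xD800..0xDBFF).cover?(high) && (0xDC00..0xDFFF).cover?(low)
    raise ArgumentError, 'Expected a high surrogate followed by a low surrogate'
  end
  # Each half carries 10 bits; the pair encodes code points from U+10000 up.
  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

# "\uD834\uDD1E" in a JSON string encodes U+1D11E (musical symbol G clef).
codepoint = combine_surrogates(0xD834, 0xDD1E)
puts format('U+%X', codepoint)
```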
data/lib/json/stream/version.rb
CHANGED
data/test/parser_test.rb
CHANGED
@@ -220,6 +220,9 @@ class ParserTest < Test::Unit::TestCase
 
 expected = [:start_document, :start_object, :error]
 assert_equal(expected, events("{\" \u0000 \":12}"))
+
+expected = [:start_document, :start_array, [:value, " \u007F "], :end_array, :end_document]
+assert_equal(expected, events("[\" \u007f \"]"))
 end
 
 def test_unicode_escape
@@ -447,5 +450,4 @@ class ParserTest < Test::Unit::TestCase
 @events << :error
 end
 end
-
 end
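The added test expects U+007F (DEL) inside a string to come through as an ordinary value, while U+0000 still produces an error. That matches the new `CONTROL` pattern in parser.rb, which covers only U+0000 through U+001F. A small sketch of what that pattern matches is below; the `control_char?` helper is an assumption for illustration, since the parser's own use of the constant is not shown in this diff.

```ruby
# Pattern copied from the added line in the parser.rb hunk.
CONTROL = /[\x00-\x1F]/

# Hypothetical helper for illustration only.
def control_char?(ch)
  !!(ch =~ CONTROL)
end

puts control_char?("\u0000") # NUL falls inside the range
puts control_char?("\u001F") # upper bound of the range
puts control_char?("\u007F") # DEL falls outside the range
```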
metadata
CHANGED
@@ -1,38 +1,40 @@
---- !ruby/object:Gem::Specification
+--- !ruby/object:Gem::Specification
 name: json-stream
-version: !ruby/object:Gem::Version
-
-version: 0.1.2
+version: !ruby/object:Gem::Version
+version: 0.1.3
 platform: ruby
-authors:
+authors:
 - David Graham
 autorequire:
 bindir: bin
 cert_chain: []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+date: 2013-10-15 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+name: rake
+requirement: !ruby/object:Gem::Requirement
+requirements:
+- - '>='
+- !ruby/object:Gem::Version
+version: '0'
+type: :development
+prerelease: false
+version_requirements: !ruby/object:Gem::Requirement
+requirements:
+- - '>='
+- !ruby/object:Gem::Version
+version: '0'
+description: A parser best suited for huge JSON documents that don't fit in memory.
+email:
+- david.malcom.graham@gmail.com
 executables: []
-
 extensions: []
-
 extra_rdoc_files: []
-
-files:
+files:
 - LICENSE
 - Rakefile
-- README
+- README.md
+- json-stream.gemspec
 - lib/json/stream/buffer.rb
 - lib/json/stream/builder.rb
 - lib/json/stream/parser.rb
@@ -41,35 +43,31 @@ files:
 - test/buffer_test.rb
 - test/builder_test.rb
 - test/parser_test.rb
-
-
-
-
+homepage: http://dgraham.github.io/json-stream/
+licenses:
+- MIT
+metadata: {}
 post_install_message:
 rdoc_options: []
-
-require_paths:
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
-
-
-
-
-
-
-
-
-
-- !ruby/object:Gem::Version
-version: "0"
+required_ruby_version: !ruby/object:Gem::Requirement
+requirements:
+- - '>='
+- !ruby/object:Gem::Version
+version: 1.9.2
+required_rubygems_version: !ruby/object:Gem::Requirement
+requirements:
+- - '>='
+- !ruby/object:Gem::Version
+version: '0'
 requirements: []
-
 rubyforge_project:
-rubygems_version:
+rubygems_version: 2.0.3
 signing_key:
-specification_version:
+specification_version: 4
 summary: A streaming JSON parser that generates SAX-like events.
-test_files:
+test_files:
 - test/buffer_test.rb
 - test/builder_test.rb
 - test/parser_test.rb