fast_jsonparser 0.3.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: cb4380ffb8ced606931028f66c05ddb3af40498a9aada56833be2f5ef0bc47b4
4
- data.tar.gz: 3a3159926d6f1b1b0d90431171b7c97a93df4082bc1371313c4fe31e6c4de0c8
3
+ metadata.gz: d87b77e2dd63a557d8a32fdf47c962a52476aa921e981dd0120a3d57a5873453
4
+ data.tar.gz: b859d5d018cb9b7ce88d1bf6fe662ebf80dfbf57f5aed831e1fc22059dde0ceb
5
5
  SHA512:
6
- metadata.gz: ffa4a69c6550db893fd93c1f13df5778004188f1f3b54944b2d20049366026bbf2720fb0d62dce124d279fab650ed4e2f12c763c48207323ec9a900da240b32a
7
- data.tar.gz: fe75932c18f3cf0d896536ffc1d4f8f30067fe85046c7c1bf1806e819674f3f0a48816240cc634ac5dd530eb06e80f7b2a35f63a91e5c4ff1e432e8bf5310df5
6
+ metadata.gz: f3c9ef4836bfae90bb7cfce534a108cd102c621c90a23cd783bacbf65bfe830b863eebb89ee92abf26549b954c247857f32f3d00b44f1e0c37b327393b69161d
7
+ data.tar.gz: acc8fca8c8110f4f6c88b74cad2a73357d0664db214b8fed5ca3503f0c3290f528d394c9d7abedf6c677ec1f5608667358ec56f0995fa7aa7ce8ada060ddbfcd
data/.github/workflows/ruby.yml ADDED
@@ -0,0 +1,36 @@
1
+ # This workflow uses actions that are not certified by GitHub.
2
+ # They are provided by a third-party and are governed by
3
+ # separate terms of service, privacy policy, and support
4
+ # documentation.
5
+ # This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake
6
+ # For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby
7
+
8
+ name: Ruby
9
+
10
+ on:
11
+ push:
12
+ branches: [ master ]
13
+ pull_request:
14
+ branches: [ master ]
15
+
16
+ jobs:
17
+ test:
18
+
19
+ runs-on: ubuntu-latest
20
+
21
+ strategy:
22
+ matrix:
23
+ ruby-version: [3.0.1, 2.7.3, 2.6.7, 2.6.5]
24
+
25
+ steps:
26
+ - uses: actions/checkout@v2
27
+ - name: Set up Ruby ${{ matrix.ruby-version }}
28
+ uses: ruby/setup-ruby@v1
29
+ with:
30
+ ruby-version: ${{ matrix.ruby-version }}
31
+ - name: Install dependencies
32
+ run: bundle install
33
+ - name: Compile
34
+ run: bundle exec rake compile
35
+ - name: Run tests
36
+ run: bundle exec rake
data/.gitignore CHANGED
@@ -6,3 +6,5 @@
6
6
  /pkg/
7
7
  /spec/reports/
8
8
  /tmp/
9
+ *.so
10
+ *.bundle
data/CHANGELOG.md ADDED
@@ -0,0 +1,9 @@
1
+ # 0.6.0
2
+ * Fix performance on Ruby 3.0 [Issue #20](https://github.com/anilmaurya/fast_jsonparser/issues/20), thanks to [Watson1978](https://github.com/Watson1978)
3
+ # 0.5.0
4
+ * Handle concurrent use of the parser in [PR #15](https://github.com/anilmaurya/fast_jsonparser/pull/15), thanks to [casperisfine](https://github.com/casperisfine)
5
+
6
+ # 0.4.0
7
+ * `load_many` accepts a `batch_size` parameter to parse documents larger than 1 MB in [PR #5](https://github.com/anilmaurya/fast_jsonparser/pull/5), thanks to [casperisfine](https://github.com/casperisfine)
8
+ * Add a `symbolize_keys` option, defaulting to true, in [PR #9](https://github.com/anilmaurya/fast_jsonparser/pull/9), thanks to [casperisfine](https://github.com/casperisfine)
9
+ * Parse string values as UTF-8 in [PR #10](https://github.com/anilmaurya/fast_jsonparser/pull/10), thanks to [casperisfine](https://github.com/casperisfine)
data/Gemfile.lock CHANGED
@@ -1,20 +1,21 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- fast_jsonparser (0.2.0)
4
+ fast_jsonparser (0.5.0)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
8
8
  specs:
9
- minitest (5.14.1)
10
- oj (3.10.6)
11
- rake (13.0.1)
12
- rake-compiler (1.1.0)
9
+ minitest (5.14.4)
10
+ oj (3.11.7)
11
+ rake (13.0.3)
12
+ rake-compiler (1.1.1)
13
13
  rake
14
14
  yajl-ruby (1.4.1)
15
15
 
16
16
  PLATFORMS
17
17
  ruby
18
+ x86_64-linux
18
19
 
19
20
  DEPENDENCIES
20
21
  bundler (~> 2.0)
@@ -26,4 +27,4 @@ DEPENDENCIES
26
27
  yajl-ruby
27
28
 
28
29
  BUNDLED WITH
29
- 2.0.1
30
+ 2.2.3
data/README.md CHANGED
@@ -107,14 +107,36 @@ Example: logs.json with following content
107
107
  "17/May/2015:08:05:23 +0000"
108
108
  "17/May/2015:08:05:24 +0000"
109
109
  ```
110
+ If a JSON batch is larger than 1 MB, use the `batch_size` option:
110
111
 
112
+ ```
113
+ FastJsonparser.load_many(f.path, batch_size: 2_000) {}
114
+ ```
115
+
116
+ 4. Accept an optional `symbolize_keys` param (default: `symbolize_keys: true`)
111
117
 
112
- 4. Raise FastJsonparser::ParseError when invalid JSON provided for parsing
118
+ If string keys are expected in the parsed result, use:
119
+
120
+ ```
121
+ FastJsonparser.parse('{"one": 1, "two": 2}', symbolize_keys: false)
122
+
123
+ ```
124
+
125
+ 5. Raise `FastJsonparser::ParseError` when invalid JSON is provided for parsing
113
126
 
114
127
  ```
115
128
  FastJsonparser.parse("123: 1") # FastJsonparser::ParseError (parse error)
116
129
  ```
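A minimal sketch of how a caller might handle the `FastJsonparser::ParseError` behaviour shown above. It assumes the gem may not be installed and falls back to stdlib `JSON` in that case; `parse_or_nil` and `PARSE_ERRORS` are hypothetical names used only for illustration and are not part of the gem.

```ruby
require "json"

begin
  require "fast_jsonparser"
rescue LoadError
  # Assumption: the gem is optional here; stdlib JSON is used when it is missing.
end

# Error classes that signal invalid JSON for whichever parser is available.
PARSE_ERRORS = [JSON::ParserError]
PARSE_ERRORS << FastJsonparser::ParseError if defined?(FastJsonparser::ParseError)

# Hypothetical helper: returns the parsed document, or nil for invalid JSON.
def parse_or_nil(source)
  if defined?(FastJsonparser)
    FastJsonparser.parse(source)
  else
    JSON.parse(source, symbolize_names: true)
  end
rescue *PARSE_ERRORS
  nil
end

invalid = parse_or_nil("123: 1")      # invalid JSON
valid   = parse_or_nil('{"one": 1}')  # valid JSON, symbol keys
```

Returning `nil` is only one possible policy; a caller that needs to distinguish a JSON `null` document from a failure would probably re-raise or wrap the error instead.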
117
130
 
131
+ ### Known Incompatibilities with stdlib JSON
132
+
133
+ `FastJsonparser` behaves mostly like stdlib's `JSON`, but there are a few corner cases:
134
+
135
+ - `FastJsonparser` will use symbols for hash keys by default. Pass `symbolize_keys: false` to get string keys instead, like `JSON`.
136
+ - `FastJsonparser` will raise on integers outside the 64-bit range (`-9223372036854775808..18446744073709551615`); `JSON` will parse them fine.
137
+ - `FastJsonparser` will raise on invalid string escapes (`"\x15"`); `JSON` will often handle some of them.
138
+ - `FastJsonparser` will raise on `/**/` comments. `JSON` will sometimes ignore them, sometimes raise.
139
+
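The first two incompatibilities listed above can be illustrated from the stdlib side. This is a minimal sketch that uses only stdlib `JSON`, so it runs without the gem; `FAST_JSONPARSER_INTEGER_RANGE` and `fits_64_bits?` are hypothetical names introduced here, built from the range quoted in the list.

```ruby
require "json"

# Range quoted in the README: signed 64-bit minimum up to unsigned 64-bit maximum.
# Hypothetical constant, not defined by the gem.
FAST_JSONPARSER_INTEGER_RANGE = (-(2**63))..(2**64 - 1)

# Hypothetical helper: true when an integer lies inside the range above.
def fits_64_bits?(integer)
  FAST_JSONPARSER_INTEGER_RANGE.cover?(integer)
end

too_big = 2**64 # one past the upper bound

# stdlib JSON parses arbitrarily large integers and returns string keys by default.
string_keys = JSON.parse(%({"id": #{too_big}}))

# symbolize_names: true is the stdlib counterpart of FastJsonparser's default.
symbol_keys = JSON.parse(%({"id": #{too_big}}), symbolize_names: true)
```

A check such as `fits_64_bits?` could serve as a pre-flight guard when generating test fixtures that must parse identically under both libraries.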
118
140
  ### Example
119
141
 
120
142
  ```
@@ -124,9 +146,9 @@ FastJsonparser.parse("123: 1") # FastJsonparser::ParseError (parse error)
124
146
  ```
125
147
  ## Development
126
148
 
127
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
149
+ After checking out the repo, run `bundle install` to install dependencies and `rake compile` to build the native extension. Then run `rake test` to run the tests.
128
150
 
129
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
151
+ For more options, refer to https://github.com/rake-compiler/rake-compiler
130
152
 
131
153
  ## Contributing
132
154
 
data/Rakefile CHANGED
@@ -3,6 +3,7 @@ require "rake/testtask"
3
3
  require "rake/extensiontask"
4
4
 
5
5
  Rake::ExtensionTask.new("fast_jsonparser") do |ext|
6
+ ext.ext_dir = 'ext/fast_jsonparser'
6
7
  ext.lib_dir = "lib/fast_jsonparser"
7
8
  end
8
9
 
data/ext/fast_jsonparser/extconf.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  require 'mkmf'
2
- $CXXFLAGS += ' -std=c++1z -Wno-register '
2
+ $CXXFLAGS += ' $(optflags) $(debugflags) -std=c++1z -Wno-register '
3
3
 
4
4
 
5
5
  create_makefile 'fast_jsonparser/fast_jsonparser'
@@ -2,116 +2,153 @@
2
2
 
3
3
  #include "simdjson.h"
4
4
 
5
- VALUE rb_mFastJsonparser;
6
-
7
- VALUE rb_eFastJsonparserParseError;
5
+ VALUE rb_eFastJsonparserUnknownError, rb_eFastJsonparserParseError;
8
6
 
9
7
  using namespace simdjson;
10
8
 
9
+ typedef struct {
10
+ dom::parser *parser;
11
+ } parser_t;
12
+
13
+ static void Parser_delete(void *ptr) {
14
+ parser_t *data = (parser_t*) ptr;
15
+ delete data->parser;
16
+ }
17
+
18
+ static size_t Parser_memsize(const void *parser) {
19
+ return sizeof(dom::parser); // TODO: low priority, figure the real size, e.g. internal buffers etc.
20
+ }
21
+
22
+ static const rb_data_type_t parser_data_type = {
23
+ "Parser",
24
+ { 0, Parser_delete, Parser_memsize, },
25
+ 0, 0, RUBY_TYPED_FREE_IMMEDIATELY
26
+ };
27
+
28
+ static VALUE parser_allocate(VALUE klass) {
29
+ parser_t *data;
30
+ VALUE obj = TypedData_Make_Struct(klass, parser_t, &parser_data_type, data);
31
+ data->parser = new dom::parser;
32
+ return obj;
33
+ }
34
+
35
+ static inline dom::parser * get_parser(VALUE self) {
36
+ parser_t *data;
37
+ TypedData_Get_Struct(self, parser_t, &parser_data_type, data);
38
+ return data->parser;
39
+ }
40
+
11
41
  // Convert tape to Ruby's Object
12
- static VALUE make_ruby_object(dom::element element)
42
+ static VALUE make_ruby_object(dom::element element, bool symbolize_keys)
13
43
  {
14
- auto t = element.type();
15
- if (t == dom::element_type::ARRAY)
44
+ switch (element.type())
45
+ {
46
+ case dom::element_type::ARRAY:
16
47
  {
17
48
  VALUE ary = rb_ary_new();
18
49
  for (dom::element x : element)
19
50
  {
20
- VALUE e = make_ruby_object(x);
51
+ VALUE e = make_ruby_object(x, symbolize_keys);
21
52
  rb_ary_push(ary, e);
22
53
  }
23
54
  return ary;
24
55
  }
25
- else if (t == dom::element_type::OBJECT)
56
+ case dom::element_type::OBJECT:
26
57
  {
27
58
  VALUE hash = rb_hash_new();
28
59
  for (dom::key_value_pair field : dom::object(element))
29
60
  {
30
61
  std::string_view view(field.key);
31
- VALUE k = rb_intern(view.data());
32
- VALUE v = make_ruby_object(field.value);
33
- rb_hash_aset(hash, ID2SYM(k), v);
62
+ VALUE k = rb_utf8_str_new(view.data(), view.size());
63
+ if (symbolize_keys)
64
+ {
65
+ k = ID2SYM(rb_intern_str(k));
66
+ }
67
+ VALUE v = make_ruby_object(field.value, symbolize_keys);
68
+ rb_hash_aset(hash, k, v);
34
69
  }
35
70
  return hash;
36
71
  }
37
- else if (t == dom::element_type::INT64)
72
+ case dom::element_type::INT64:
38
73
  {
39
74
  return LONG2NUM(element.get<int64_t>());
40
75
  }
41
- else if (t == dom::element_type::UINT64)
76
+ case dom::element_type::UINT64:
42
77
  {
43
78
  return ULONG2NUM(element.get<uint64_t>());
44
79
  }
45
- else if (t == dom::element_type::DOUBLE)
80
+ case dom::element_type::DOUBLE:
46
81
  {
47
82
  return DBL2NUM(double(element));
48
83
  }
49
- else if (t == dom::element_type::STRING)
84
+ case dom::element_type::STRING:
50
85
  {
51
86
  std::string_view view(element);
52
- return rb_str_new(view.data(), view.size());
87
+ return rb_utf8_str_new(view.data(), view.size());
53
88
  }
54
- else if (t == dom::element_type::BOOL)
89
+ case dom::element_type::BOOL:
55
90
  {
56
91
  return bool(element) ? Qtrue : Qfalse;
57
92
  }
58
- else if (t == dom::element_type::NULL_VALUE)
93
+ case dom::element_type::NULL_VALUE:
59
94
  {
60
95
  return Qnil;
61
96
  }
97
+ }
62
98
  // unknown case (bug)
63
99
  rb_raise(rb_eException, "[BUG] must not happen");
64
100
  }
65
101
 
66
- static VALUE rb_fast_jsonparser_parse(VALUE self, VALUE arg)
102
+ static VALUE rb_fast_jsonparser_parse(VALUE self, VALUE arg, VALUE symbolize_keys)
67
103
  {
68
104
  Check_Type(arg, T_STRING);
105
+ dom::parser *parser = get_parser(self);
69
106
 
70
- dom::parser parser;
71
- auto [doc, error] = parser.parse(RSTRING_PTR(arg), RSTRING_LEN(arg));
72
- if (error == SUCCESS)
107
+ auto [doc, error] = parser->parse(RSTRING_PTR(arg), RSTRING_LEN(arg));
108
+ if (error != SUCCESS)
73
109
  {
74
- return make_ruby_object(doc);
110
+ rb_raise(rb_eFastJsonparserParseError, "%s", error_message(error));
75
111
  }
76
- // TODO better error handling
77
- rb_raise(rb_eFastJsonparserParseError, "parse error");
78
- return Qnil;
112
+ return make_ruby_object(doc, RTEST(symbolize_keys));
79
113
  }
80
114
 
81
- static VALUE rb_fast_jsonparser_load(VALUE self, VALUE arg)
115
+ static VALUE rb_fast_jsonparser_load(VALUE self, VALUE arg, VALUE symbolize_keys)
82
116
  {
83
117
  Check_Type(arg, T_STRING);
118
+ dom::parser *parser = get_parser(self);
84
119
 
85
- dom::parser parser;
86
- auto [doc, error] = parser.load(RSTRING_PTR(arg));
87
- if (error == SUCCESS)
120
+ auto [doc, error] = parser->load(RSTRING_PTR(arg));
121
+ if (error != SUCCESS)
88
122
  {
89
- return make_ruby_object(doc);
123
+ rb_raise(rb_eFastJsonparserParseError, "%s", error_message(error));
90
124
  }
91
- // TODO better error handling
92
- rb_raise(rb_eFastJsonparserParseError, "parse error");
93
- return Qnil;
125
+ return make_ruby_object(doc, RTEST(symbolize_keys));
94
126
  }
95
127
 
96
- static VALUE rb_fast_jsonparser_load_many(VALUE self, VALUE arg)
128
+ static VALUE rb_fast_jsonparser_load_many(VALUE self, VALUE arg, VALUE symbolize_keys, VALUE batch_size)
97
129
  {
98
130
  Check_Type(arg, T_STRING);
131
+ Check_Type(batch_size, T_FIXNUM);
132
+ dom::parser *parser = get_parser(self);
133
+
134
+ try {
135
+ auto [docs, error] = parser->load_many(RSTRING_PTR(arg), FIX2INT(batch_size));
136
+ if (error != SUCCESS)
137
+ {
138
+ rb_raise(rb_eFastJsonparserParseError, "%s", error_message(error));
139
+ }
99
140
 
100
- dom::parser parser;
101
- auto [docs, error] = parser.load_many(RSTRING_PTR(arg));
102
- if (error == SUCCESS)
103
- {
104
141
  for (dom::element doc : docs)
105
142
  {
106
- if (rb_block_given_p())
107
- {
108
- rb_yield(make_ruby_object(doc));
109
- }
143
+ rb_yield(make_ruby_object(doc, RTEST(symbolize_keys)));
110
144
  }
145
+
111
146
  return Qnil;
112
147
  }
113
- rb_raise(rb_eFastJsonparserParseError, "parse error");
114
- return Qnil;
148
+ catch (const simdjson::simdjson_error &error)
149
+ {
150
+ rb_raise(rb_eFastJsonparserUnknownError, "%s", error.what());
151
+ }
115
152
  }
116
153
 
117
154
  extern "C"
@@ -119,10 +156,17 @@ extern "C"
119
156
 
120
157
  void Init_fast_jsonparser(void)
121
158
  {
122
- rb_mFastJsonparser = rb_define_module("FastJsonparser");
123
- rb_eFastJsonparserParseError = rb_define_class_under(rb_mFastJsonparser, "ParseError", rb_eStandardError);
124
- rb_define_module_function(rb_mFastJsonparser, "parse", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_parse), 1);
125
- rb_define_module_function(rb_mFastJsonparser, "load", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_load), 1);
126
- rb_define_module_function(rb_mFastJsonparser, "load_many", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_load_many), 1);
159
+ VALUE rb_mFastJsonparser = rb_const_get(rb_cObject, rb_intern("FastJsonparser"));
160
+ VALUE rb_cFastJsonparserNative = rb_const_get(rb_mFastJsonparser, rb_intern("Native"));
161
+
162
+ rb_define_alloc_func(rb_cFastJsonparserNative, parser_allocate);
163
+ rb_define_method(rb_cFastJsonparserNative, "_parse", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_parse), 2);
164
+ rb_define_method(rb_cFastJsonparserNative, "_load", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_load), 2);
165
+ rb_define_method(rb_cFastJsonparserNative, "_load_many", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_load_many), 3);
166
+
167
+ rb_eFastJsonparserParseError = rb_const_get(rb_mFastJsonparser, rb_intern("ParseError"));
168
+ rb_global_variable(&rb_eFastJsonparserParseError);
169
+ rb_eFastJsonparserUnknownError = rb_const_get(rb_mFastJsonparser, rb_intern("UnknownError"));
170
+ rb_global_variable(&rb_eFastJsonparserUnknownError);
127
171
  }
128
172
  }
data/ext/fast_jsonparser/simdjson.h CHANGED
@@ -2308,7 +2308,7 @@ using ErrorValues [[deprecated("This is an alias and will be removed, use error_
2308
2308
  * @deprecated Error codes should be stored and returned as `error_code`, use `error_message()` instead.
2309
2309
  */
2310
2310
  [[deprecated("Error codes should be stored and returned as `error_code`, use `error_message()` instead.")]]
2311
- inline const std::string &error_message(int error) noexcept;
2311
+ inline const std::string error_message(int error) noexcept;
2312
2312
 
2313
2313
  } // namespace simdjson
2314
2314
 
@@ -6367,7 +6367,7 @@ namespace internal {
6367
6367
  // We store the error code so we can validate the error message is associated with the right code
6368
6368
  struct error_code_info {
6369
6369
  error_code code;
6370
- std::string message;
6370
+ const char* message;
6371
6371
  };
6372
6372
  // These MUST match the codes in error_code. We check this constraint in basictests.
6373
6373
  extern SIMDJSON_DLLIMPORTEXPORT const error_code_info error_codes[];
@@ -6376,10 +6376,10 @@ namespace internal {
6376
6376
 
6377
6377
  inline const char *error_message(error_code error) noexcept {
6378
6378
  // If you're using error_code, we're trusting you got it from the enum.
6379
- return internal::error_codes[int(error)].message.c_str();
6379
+ return internal::error_codes[int(error)].message;
6380
6380
  }
6381
6381
 
6382
- inline const std::string &error_message(int error) noexcept {
6382
+ inline const std::string error_message(int error) noexcept {
6383
6383
  if (error < 0 || error >= error_code::NUM_ERROR_CODES) {
6384
6384
  return internal::error_codes[UNEXPECTED_ERROR].message;
6385
6385
  }
data/lib/fast_jsonparser/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module FastJsonparser
2
- VERSION = "0.3.0"
2
+ VERSION = "0.6.0"
3
3
  end
data/lib/fast_jsonparser.rb CHANGED
@@ -1,8 +1,46 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require "fast_jsonparser/version"
2
4
 
3
5
  module FastJsonparser
4
- class Error < StandardError; end
5
- # Your code goes here...
6
- end
6
+ Error = Class.new(StandardError)
7
+ ParseError = Class.new(Error)
8
+ UnknownError = Class.new(Error)
9
+ BatchSizeTooSmall = Class.new(Error)
10
+
11
+ DEFAULT_BATCH_SIZE = 1_000_000 # from include/simdjson/dom/parser.h
12
+
13
+ class << self
14
+ def parse(source, symbolize_keys: true)
15
+ parser._parse(source, symbolize_keys)
16
+ end
17
+
18
+ def load(source, symbolize_keys: true)
19
+ parser._load(source, symbolize_keys)
20
+ end
21
+
22
+ def load_many(source, symbolize_keys: true, batch_size: DEFAULT_BATCH_SIZE, &block)
23
+ Native.new._load_many(source, symbolize_keys, batch_size, &block)
24
+ rescue UnknownError => error
25
+ case error.message
26
+ when "This parser can't support a document that big"
27
+ raise BatchSizeTooSmall, "One of the documents was bigger than the batch size (#{batch_size}B), try increasing it."
28
+ else
29
+ raise
30
+ end
31
+ end
7
32
 
8
- require "fast_jsonparser/fast_jsonparser" # loads cpp extension
33
+ private
34
+
35
+ def parser
36
+ @parser ||= Native.new
37
+ end
38
+ end
39
+
40
+ class Native
41
+ end
42
+
43
+ require "fast_jsonparser/fast_jsonparser" # loads cpp extension
44
+
45
+ private_constant :Native
46
+ end
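The `BatchSizeTooSmall` error raised by `load_many` above asks the caller to increase `batch_size`. A minimal sketch of one way to do that, retrying with a doubled size up to a limit. It uses a stand-in error class so it runs without the gem; `with_growing_batch_size` and `BatchSizeTooSmallStandIn` are hypothetical names, not part of the library.

```ruby
# Stand-in for FastJsonparser::BatchSizeTooSmall so the sketch runs without the gem.
BatchSizeTooSmallStandIn = Class.new(StandardError)

# Hypothetical helper: yields a batch size and doubles it on each failure,
# re-raising once the limit has been reached.
def with_growing_batch_size(initial:, limit:, error_class:)
  batch_size = initial
  begin
    yield batch_size
  rescue error_class
    raise if batch_size >= limit
    batch_size = [batch_size * 2, limit].min
    retry
  end
end

attempts = []
result = with_growing_batch_size(
  initial: 1_000_000,
  limit: 8_000_000,
  error_class: BatchSizeTooSmallStandIn
) do |size|
  attempts << size
  # Simulate a document that only fits once the batch size reaches 4 MB.
  raise BatchSizeTooSmallStandIn, "document larger than #{size}B" if size < 4_000_000
  size
end
```

With the gem installed, the block would presumably wrap `FastJsonparser.load_many(path, batch_size: size) { |doc| ... }` and pass `FastJsonparser::BatchSizeTooSmall` as `error_class`. Each retry starts the file over, so a block with side effects may see earlier documents more than once.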
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fast_jsonparser
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Anil Maurya
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2020-07-14 00:00:00.000000000 Z
11
+ date: 2022-07-07 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -102,8 +102,10 @@ extensions:
102
102
  - ext/fast_jsonparser/extconf.rb
103
103
  extra_rdoc_files: []
104
104
  files:
105
+ - ".github/workflows/ruby.yml"
105
106
  - ".gitignore"
106
107
  - ".travis.yml"
108
+ - CHANGELOG.md
107
109
  - CODE_OF_CONDUCT.md
108
110
  - Gemfile
109
111
  - Gemfile.lock
@@ -119,7 +121,6 @@ files:
119
121
  - ext/fast_jsonparser/simdjson.h
120
122
  - fast_jsonparser.gemspec
121
123
  - lib/fast_jsonparser.rb
122
- - lib/fast_jsonparser/fast_jsonparser.bundle
123
124
  - lib/fast_jsonparser/version.rb
124
125
  homepage: https://github.com/anilmaurya/fast_jsonparser
125
126
  licenses:
@@ -140,7 +141,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
140
141
  - !ruby/object:Gem::Version
141
142
  version: '0'
142
143
  requirements: []
143
- rubygems_version: 3.0.3
144
+ rubygems_version: 3.2.22
144
145
  signing_key:
145
146
  specification_version: 4
146
147
  summary: Fast Json Parser