fast_jsonparser 0.3.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: cb4380ffb8ced606931028f66c05ddb3af40498a9aada56833be2f5ef0bc47b4
-   data.tar.gz: 3a3159926d6f1b1b0d90431171b7c97a93df4082bc1371313c4fe31e6c4de0c8
+   metadata.gz: d87b77e2dd63a557d8a32fdf47c962a52476aa921e981dd0120a3d57a5873453
+   data.tar.gz: b859d5d018cb9b7ce88d1bf6fe662ebf80dfbf57f5aed831e1fc22059dde0ceb
  SHA512:
-   metadata.gz: ffa4a69c6550db893fd93c1f13df5778004188f1f3b54944b2d20049366026bbf2720fb0d62dce124d279fab650ed4e2f12c763c48207323ec9a900da240b32a
-   data.tar.gz: fe75932c18f3cf0d896536ffc1d4f8f30067fe85046c7c1bf1806e819674f3f0a48816240cc634ac5dd530eb06e80f7b2a35f63a91e5c4ff1e432e8bf5310df5
+   metadata.gz: f3c9ef4836bfae90bb7cfce534a108cd102c621c90a23cd783bacbf65bfe830b863eebb89ee92abf26549b954c247857f32f3d00b44f1e0c37b327393b69161d
+   data.tar.gz: acc8fca8c8110f4f6c88b74cad2a73357d0664db214b8fed5ca3503f0c3290f528d394c9d7abedf6c677ec1f5608667358ec56f0995fa7aa7ce8ada060ddbfcd
data/.github/workflows/ruby.yml ADDED
@@ -0,0 +1,36 @@
+ # This workflow uses actions that are not certified by GitHub.
+ # They are provided by a third-party and are governed by
+ # separate terms of service, privacy policy, and support
+ # documentation.
+ # This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake
+ # For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby
+
+ name: Ruby
+
+ on:
+   push:
+     branches: [ master ]
+   pull_request:
+     branches: [ master ]
+
+ jobs:
+   test:
+
+     runs-on: ubuntu-latest
+
+     strategy:
+       matrix:
+         ruby-version: [3.0.1, 2.7.3, 2.6.7, 2.6.5]
+
+     steps:
+     - uses: actions/checkout@v2
+     - name: Set up Ruby ${{ matrix.ruby-version }}
+       uses: ruby/setup-ruby@v1
+       with:
+         ruby-version: ${{ matrix.ruby-version }}
+     - name: Install dependencies
+       run: bundle install
+     - name: Compile
+       run: bundle exec rake compile
+     - name: Run tests
+       run: bundle exec rake
data/.gitignore CHANGED
@@ -6,3 +6,5 @@
  /pkg/
  /spec/reports/
  /tmp/
+ *.so
+ *.bundle
data/CHANGELOG.md ADDED
@@ -0,0 +1,9 @@
+ # 0.6.0
+ * Fix performance on Ruby 3.0 [Issue #20](https://github.com/anilmaurya/fast_jsonparser/issues/20), thanks to [Watson1978](https://github.com/Watson1978)
+ # 0.5.0
+ * Handle concurrent use of the parser in [Issue #15](https://github.com/anilmaurya/fast_jsonparser/pull/15), thanks to [casperisfine](https://github.com/casperisfine)
+
+ # 0.4.0
+ * load_many accept batch_size parameter to parse documents larger than 1 MB in [PR #5](https://github.com/anilmaurya/fast_jsonparser/pull/5), thanks to [casperisfine](https://github.com/casperisfine)
+ * Add option for symbolize_keys, default to true in [PR #9](https://github.com/anilmaurya/fast_jsonparser/pull/9), thanks to [casperisfine](https://github.com/casperisfine)
+ * Parse string values as UTF-8 in [PR #10](https://github.com/anilmaurya/fast_jsonparser/pull/10), thanks to [casperisfine](https://github.com/casperisfine)
data/Gemfile.lock CHANGED
@@ -1,20 +1,21 @@
  PATH
    remote: .
    specs:
-     fast_jsonparser (0.2.0)
+     fast_jsonparser (0.5.0)
 
  GEM
    remote: https://rubygems.org/
    specs:
-     minitest (5.14.1)
-     oj (3.10.6)
-     rake (13.0.1)
-     rake-compiler (1.1.0)
+     minitest (5.14.4)
+     oj (3.11.7)
+     rake (13.0.3)
+     rake-compiler (1.1.1)
        rake
      yajl-ruby (1.4.1)
 
  PLATFORMS
    ruby
+   x86_64-linux
 
  DEPENDENCIES
    bundler (~> 2.0)
@@ -26,4 +27,4 @@ DEPENDENCIES
    yajl-ruby
 
  BUNDLED WITH
-    2.0.1
+    2.2.3
data/README.md CHANGED
@@ -107,14 +107,36 @@ Example: logs.json with following content
  "17/May/2015:08:05:23 +0000"
  "17/May/2015:08:05:24 +0000"
  ```
+ If size of json batch is greater than 1 MB then use `batch_size` option
 
+ ```
+ FastJsonparser.load_many(f.path, batch_size: 2_000) {}
+ ```
+
+ 4. Accept optional param :symbolize_keys (default symbolize_keys: true)
 
- 4. Raise FastJsonparser::ParseError when invalid JSON provided for parsing
+ If string key is expected in parsed result then use
+
+ ```
+ FastJsonparser.parse('{"one": 1, "two": 2}', symbolize_keys: false)
+
+ ```
+
+ 5. Raise FastJsonparser::ParseError when invalid JSON provided for parsing
 
  ```
  FastJsonparser.parse("123: 1") # FastJsonparser::ParseError (parse error)
  ```
 
+ ### Known Incompatibilities with stdlib JSON
+
+ `FastJsonparser` behaves mostly like stdlib's `JSON`, but there are a few corner cases:
+
+ - `FastJsonparser` will use symbols for hash keys by default. You can pass `symbolize_keys: false` to have strings instead like `JSON`.
+ - `FastJsonparser` will raise on integers outside of the 64bits range (`-9223372036854775808..18446744073709551615`), `JSON` will parse them fine.
+ - `FastJsonparser` will raise on invalid string escapings (`"\x15"`), `JSON` will often handle some of them.
+ - `FastJsonparser` will raise on `/**/` comments. `JSON` will sometimes ignore them, sometimes raise.
+
  ### Example
 
  ```
@@ -124,9 +146,9 @@ FastJsonparser.parse("123: 1") # FastJsonparser::ParseError (parse error)
  ```
  ## Development
 
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+ After checking out the repo, run `rake compile` to install dependencies. Then, run `rake test` to run the tests.
 
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+ For more option, refer https://github.com/rake-compiler/rake-compiler
 
  ## Contributing
 
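Reviewer note on the README hunk: the first two entries of the "Known Incompatibilities" list can be pictured from the stdlib side alone. The following is a minimal sketch that only calls stdlib `JSON` (it does not load `fast_jsonparser`); the variable names are illustrative, and the integer bounds are copied from the README text rather than checked against the extension.

```ruby
require "json"

doc = '{"one": 1, "two": 2}'

# stdlib JSON returns String keys unless told otherwise.
string_keys = JSON.parse(doc)

# symbolize_names: true is the stdlib option that yields Symbol keys,
# which the README describes as FastJsonparser's default behaviour.
symbol_keys = JSON.parse(doc, symbolize_names: true)

# Range quoted in the README for integers FastJsonparser accepts.
supported_integers = (-9223372036854775808..18446744073709551615)

# stdlib JSON parses a value just past that range into an ordinary Integer.
big = JSON.parse("[18446744073709551616]").first
in_range = supported_integers.cover?(big)
```

Code that switches between the two parsers would, under these assumptions, need to pass `symbolize_keys: false` to keep String keys and guard any integers outside the quoted range.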
data/Rakefile CHANGED
@@ -3,6 +3,7 @@ require "rake/testtask"
  require "rake/extensiontask"
 
  Rake::ExtensionTask.new("fast_jsonparser") do |ext|
+   ext.ext_dir = 'ext/fast_jsonparser'
    ext.lib_dir = "lib/fast_jsonparser"
  end
 
data/ext/fast_jsonparser/extconf.rb CHANGED
@@ -1,5 +1,5 @@
  require 'mkmf'
- $CXXFLAGS += ' -std=c++1z -Wno-register '
+ $CXXFLAGS += ' $(optflags) $(debugflags) -std=c++1z -Wno-register '
 
 
  create_makefile 'fast_jsonparser/fast_jsonparser'
@@ -2,116 +2,153 @@
 
  #include "simdjson.h"
 
- VALUE rb_mFastJsonparser;
-
- VALUE rb_eFastJsonparserParseError;
+ VALUE rb_eFastJsonparserUnknownError, rb_eFastJsonparserParseError;
 
  using namespace simdjson;
 
+ typedef struct {
+     dom::parser *parser;
+ } parser_t;
+
+ static void Parser_delete(void *ptr) {
+     parser_t *data = (parser_t*) ptr;
+     delete data->parser;
+ }
+
+ static size_t Parser_memsize(const void *parser) {
+     return sizeof(dom::parser); // TODO: low priority, figure the real size, e.g. internal buffers etc.
+ }
+
+ static const rb_data_type_t parser_data_type = {
+     "Parser",
+     { 0, Parser_delete, Parser_memsize, },
+     0, 0, RUBY_TYPED_FREE_IMMEDIATELY
+ };
+
+ static VALUE parser_allocate(VALUE klass) {
+     parser_t *data;
+     VALUE obj = TypedData_Make_Struct(klass, parser_t, &parser_data_type, data);
+     data->parser = new dom::parser;
+     return obj;
+ }
+
+ static inline dom::parser * get_parser(VALUE self) {
+     parser_t *data;
+     TypedData_Get_Struct(self, parser_t, &parser_data_type, data);
+     return data->parser;
+ }
+
  // Convert tape to Ruby's Object
- static VALUE make_ruby_object(dom::element element)
+ static VALUE make_ruby_object(dom::element element, bool symbolize_keys)
  {
-     auto t = element.type();
-     if (t == dom::element_type::ARRAY)
+     switch (element.type())
+     {
+     case dom::element_type::ARRAY:
      {
          VALUE ary = rb_ary_new();
          for (dom::element x : element)
          {
-             VALUE e = make_ruby_object(x);
+             VALUE e = make_ruby_object(x, symbolize_keys);
              rb_ary_push(ary, e);
          }
          return ary;
      }
-     else if (t == dom::element_type::OBJECT)
+     case dom::element_type::OBJECT:
      {
          VALUE hash = rb_hash_new();
          for (dom::key_value_pair field : dom::object(element))
          {
              std::string_view view(field.key);
-             VALUE k = rb_intern(view.data());
-             VALUE v = make_ruby_object(field.value);
-             rb_hash_aset(hash, ID2SYM(k), v);
+             VALUE k = rb_utf8_str_new(view.data(), view.size());
+             if (symbolize_keys)
+             {
+                 k = ID2SYM(rb_intern_str(k));
+             }
+             VALUE v = make_ruby_object(field.value, symbolize_keys);
+             rb_hash_aset(hash, k, v);
          }
          return hash;
      }
-     else if (t == dom::element_type::INT64)
+     case dom::element_type::INT64:
      {
          return LONG2NUM(element.get<int64_t>());
      }
-     else if (t == dom::element_type::UINT64)
+     case dom::element_type::UINT64:
      {
          return ULONG2NUM(element.get<uint64_t>());
      }
-     else if (t == dom::element_type::DOUBLE)
+     case dom::element_type::DOUBLE:
      {
          return DBL2NUM(double(element));
      }
-     else if (t == dom::element_type::STRING)
+     case dom::element_type::STRING:
      {
          std::string_view view(element);
-         return rb_str_new(view.data(), view.size());
+         return rb_utf8_str_new(view.data(), view.size());
      }
-     else if (t == dom::element_type::BOOL)
+     case dom::element_type::BOOL:
      {
          return bool(element) ? Qtrue : Qfalse;
      }
-     else if (t == dom::element_type::NULL_VALUE)
+     case dom::element_type::NULL_VALUE:
      {
          return Qnil;
      }
+     }
      // unknown case (bug)
      rb_raise(rb_eException, "[BUG] must not happen");
  }
 
- static VALUE rb_fast_jsonparser_parse(VALUE self, VALUE arg)
+ static VALUE rb_fast_jsonparser_parse(VALUE self, VALUE arg, VALUE symbolize_keys)
  {
      Check_Type(arg, T_STRING);
+     dom::parser *parser = get_parser(self);
 
-     dom::parser parser;
-     auto [doc, error] = parser.parse(RSTRING_PTR(arg), RSTRING_LEN(arg));
-     if (error == SUCCESS)
+     auto [doc, error] = parser->parse(RSTRING_PTR(arg), RSTRING_LEN(arg));
+     if (error != SUCCESS)
      {
-         return make_ruby_object(doc);
+         rb_raise(rb_eFastJsonparserParseError, "%s", error_message(error));
      }
-     // TODO better error handling
-     rb_raise(rb_eFastJsonparserParseError, "parse error");
-     return Qnil;
+     return make_ruby_object(doc, RTEST(symbolize_keys));
  }
 
- static VALUE rb_fast_jsonparser_load(VALUE self, VALUE arg)
+ static VALUE rb_fast_jsonparser_load(VALUE self, VALUE arg, VALUE symbolize_keys)
  {
      Check_Type(arg, T_STRING);
+     dom::parser *parser = get_parser(self);
 
-     dom::parser parser;
-     auto [doc, error] = parser.load(RSTRING_PTR(arg));
-     if (error == SUCCESS)
+     auto [doc, error] = parser->load(RSTRING_PTR(arg));
+     if (error != SUCCESS)
      {
-         return make_ruby_object(doc);
+         rb_raise(rb_eFastJsonparserParseError, "%s", error_message(error));
      }
-     // TODO better error handling
-     rb_raise(rb_eFastJsonparserParseError, "parse error");
-     return Qnil;
+     return make_ruby_object(doc, RTEST(symbolize_keys));
  }
 
- static VALUE rb_fast_jsonparser_load_many(VALUE self, VALUE arg)
+ static VALUE rb_fast_jsonparser_load_many(VALUE self, VALUE arg, VALUE symbolize_keys, VALUE batch_size)
  {
      Check_Type(arg, T_STRING);
+     Check_Type(batch_size, T_FIXNUM);
+     dom::parser *parser = get_parser(self);
+
+     try {
+         auto [docs, error] = parser->load_many(RSTRING_PTR(arg), FIX2INT(batch_size));
+         if (error != SUCCESS)
+         {
+             rb_raise(rb_eFastJsonparserParseError, "%s", error_message(error));
+         }
 
-     dom::parser parser;
-     auto [docs, error] = parser.load_many(RSTRING_PTR(arg));
-     if (error == SUCCESS)
-     {
          for (dom::element doc : docs)
          {
-             if (rb_block_given_p())
-             {
-                 rb_yield(make_ruby_object(doc));
-             }
+             rb_yield(make_ruby_object(doc, RTEST(symbolize_keys)));
          }
+
          return Qnil;
      }
-     rb_raise(rb_eFastJsonparserParseError, "parse error");
-     return Qnil;
+     catch (simdjson::simdjson_error error)
+     {
+         rb_raise(rb_eFastJsonparserUnknownError, "%s", error.what());
+     }
  }
 
  extern "C"
@@ -119,10 +156,17 @@ extern "C"
 
      void Init_fast_jsonparser(void)
      {
-         rb_mFastJsonparser = rb_define_module("FastJsonparser");
-         rb_eFastJsonparserParseError = rb_define_class_under(rb_mFastJsonparser, "ParseError", rb_eStandardError);
-         rb_define_module_function(rb_mFastJsonparser, "parse", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_parse), 1);
-         rb_define_module_function(rb_mFastJsonparser, "load", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_load), 1);
-         rb_define_module_function(rb_mFastJsonparser, "load_many", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_load_many), 1);
+         VALUE rb_mFastJsonparser = rb_const_get(rb_cObject, rb_intern("FastJsonparser"));
+         VALUE rb_cFastJsonparserNative = rb_const_get(rb_mFastJsonparser, rb_intern("Native"));
+
+         rb_define_alloc_func(rb_cFastJsonparserNative, parser_allocate);
+         rb_define_method(rb_cFastJsonparserNative, "_parse", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_parse), 2);
+         rb_define_method(rb_cFastJsonparserNative, "_load", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_load), 2);
+         rb_define_method(rb_cFastJsonparserNative, "_load_many", reinterpret_cast<VALUE (*)(...)>(rb_fast_jsonparser_load_many), 3);
+
+         rb_eFastJsonparserParseError = rb_const_get(rb_mFastJsonparser, rb_intern("ParseError"));
+         rb_global_variable(&rb_eFastJsonparserParseError);
+         rb_eFastJsonparserUnknownError = rb_const_get(rb_mFastJsonparser, rb_intern("UnknownError"));
+         rb_global_variable(&rb_eFastJsonparserUnknownError);
      }
  }
data/ext/fast_jsonparser/simdjson.h CHANGED
@@ -2308,7 +2308,7 @@ using ErrorValues [[deprecated("This is an alias and will be removed, use error_
   * @deprecated Error codes should be stored and returned as `error_code`, use `error_message()` instead.
   */
  [[deprecated("Error codes should be stored and returned as `error_code`, use `error_message()` instead.")]]
- inline const std::string &error_message(int error) noexcept;
+ inline const std::string error_message(int error) noexcept;
 
  } // namespace simdjson
 
@@ -6367,7 +6367,7 @@ namespace internal {
    // We store the error code so we can validate the error message is associated with the right code
    struct error_code_info {
      error_code code;
-     std::string message;
+     const char* message;
    };
    // These MUST match the codes in error_code. We check this constraint in basictests.
    extern SIMDJSON_DLLIMPORTEXPORT const error_code_info error_codes[];
@@ -6376,10 +6376,10 @@ namespace internal {
 
  inline const char *error_message(error_code error) noexcept {
    // If you're using error_code, we're trusting you got it from the enum.
-   return internal::error_codes[int(error)].message.c_str();
+   return internal::error_codes[int(error)].message;
  }
 
- inline const std::string &error_message(int error) noexcept {
+ inline const std::string error_message(int error) noexcept {
    if (error < 0 || error >= error_code::NUM_ERROR_CODES) {
      return internal::error_codes[UNEXPECTED_ERROR].message;
    }
data/lib/fast_jsonparser/version.rb CHANGED
@@ -1,3 +1,3 @@
  module FastJsonparser
-   VERSION = "0.3.0"
+   VERSION = "0.6.0"
  end
data/lib/fast_jsonparser.rb CHANGED
@@ -1,8 +1,46 @@
+ # frozen_string_literal: true
+
  require "fast_jsonparser/version"
 
  module FastJsonparser
-   class Error < StandardError; end
-   # Your code goes here...
- end
+   Error = Class.new(StandardError)
+   ParseError = Class.new(Error)
+   UnknownError = Class.new(Error)
+   BatchSizeTooSmall = Class.new(Error)
+
+   DEFAULT_BATCH_SIZE = 1_000_000 # from include/simdjson/dom/parser.h
+
+   class << self
+     def parse(source, symbolize_keys: true)
+       parser._parse(source, symbolize_keys)
+     end
+
+     def load(source, symbolize_keys: true)
+       parser._load(source, symbolize_keys)
+     end
+
+     def load_many(source, symbolize_keys: true, batch_size: DEFAULT_BATCH_SIZE, &block)
+       Native.new._load_many(source, symbolize_keys, batch_size, &block)
+     rescue UnknownError => error
+       case error.message
+       when "This parser can't support a document that big"
+         raise BatchSizeTooSmall, "One of the documents was bigger than the batch size (#{batch_size}B), try increasing it."
+       else
+         raise
+       end
+     end
 
- require "fast_jsonparser/fast_jsonparser" # loads cpp extension
+     private
+
+     def parser
+       @parser ||= Native.new
+     end
+   end
+
+   class Native
+   end
+
+   require "fast_jsonparser/fast_jsonparser" # loads cpp extension
+
+   private_constant :Native
+ end
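Reviewer note on the `load_many` wrapper: the hunk above rescues `UnknownError` and re-raises the "document that big" case as `BatchSizeTooSmall`. The following is a minimal pure-Ruby sketch of that translation pattern, runnable without the extension. `BatchSketch` and `fake_native_load_many` are hypothetical stand-ins for the gem's module and for `Native#_load_many`; the size check inside the fake is an assumption made for illustration, not how simdjson decides.

```ruby
module BatchSketch
  Error = Class.new(StandardError)
  UnknownError = Class.new(Error)
  BatchSizeTooSmall = Class.new(Error)

  # Message text matched by the wrapper in the diff above.
  TOO_BIG = "This parser can't support a document that big"

  # Hypothetical stand-in for Native#_load_many: yields each document,
  # raising UnknownError when one exceeds the batch size.
  def self.fake_native_load_many(docs, batch_size)
    docs.each do |doc|
      raise UnknownError, TOO_BIG if doc.bytesize > batch_size
      yield doc
    end
    nil
  end

  # Same shape as the wrapper in the diff: translate one known low-level
  # message into a specific error, re-raise anything else unchanged.
  def self.load_many(docs, batch_size:, &block)
    fake_native_load_many(docs, batch_size, &block)
  rescue UnknownError => error
    case error.message
    when TOO_BIG
      raise BatchSizeTooSmall, "One of the documents was bigger than the batch size (#{batch_size}B), try increasing it."
    else
      raise
    end
  end
end
```

Matching on the message string ties the wrapper to simdjson's exact wording, so a vendored `simdjson.h` update that rephrases the message would fall through to the plain `raise` branch.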
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: fast_jsonparser
  version: !ruby/object:Gem::Version
-   version: 0.3.0
+   version: 0.6.0
  platform: ruby
  authors:
  - Anil Maurya
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2020-07-14 00:00:00.000000000 Z
+ date: 2022-07-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: bundler
@@ -102,8 +102,10 @@ extensions:
  - ext/fast_jsonparser/extconf.rb
  extra_rdoc_files: []
  files:
+ - ".github/workflows/ruby.yml"
  - ".gitignore"
  - ".travis.yml"
+ - CHANGELOG.md
  - CODE_OF_CONDUCT.md
  - Gemfile
  - Gemfile.lock
@@ -119,7 +121,6 @@ files:
  - ext/fast_jsonparser/simdjson.h
  - fast_jsonparser.gemspec
  - lib/fast_jsonparser.rb
- - lib/fast_jsonparser/fast_jsonparser.bundle
  - lib/fast_jsonparser/version.rb
  homepage: https://github.com/anilmaurya/fast_jsonparser
  licenses:
@@ -140,7 +141,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.0.3
+ rubygems_version: 3.2.22
  signing_key:
  specification_version: 4
  summary: Fast Json Parser