oj 0.9.0 → 1.0.0



data/README.md CHANGED
@@ -16,15 +16,17 @@ A fast JSON parser and Object marshaller as a Ruby gem.
16
16
 
17
17
  ## <a name="links">Links of Interest</a>
18
18
 
19
+ [Need for Speed](http://www.ohler.com/software/thoughts/Blog/Entries/2012/3/13_Need_for_Speed.html) for an overview of how Oj::Doc was designed.
20
+
19
21
  *Fast XML parser and marshaller on RubyGems*: https://rubygems.org/gems/ox
20
22
 
21
23
  *Fast XML parser and marshaller on GitHub*: https://rubygems.org/gems/ox
22
24
 
23
25
  ## <a name="release">Release Notes</a>
24
26
 
25
- ### Release 0.9.0
27
+ ### Release 1.0.0
26
28
 
27
- - Added support for circular references.
29
+ - Added the screaming fast Oj::Doc parser.
28
30
 
29
31
  ## <a name="description">Description</a>
30
32
 
@@ -57,15 +59,100 @@ Oj is compatible with Ruby 1.8.7, 1.9.2, 1.9.3, JRuby, and RBX.
57
59
 
58
60
  ## <a name="plans">Planned Releases</a>
59
61
 
60
- - Release 1.0: A JSON stream parser.
62
+ - Release 1.0.1: Optimize the Oj::Doc dump() method to be native.
63
+
64
+ - Release 1.1: A JSON stream parser. Pushed out for the Oj::Doc parser.
61
65
 
62
66
  ## <a name="compare">Comparisons</a>
63
67
 
68
+ ### Fast Oj::Doc parser comparisons
69
+
70
+ The fast Oj::Doc parser is compared to the Yajl and JSON::Pure parsers with
71
+ strict JSON documents. No object conversions are included, just simple JSON.
72
+
73
+ Since Oj::Doc deviates from the conventional parsers, comparisons of not
74
+ only parsing but also data access are included. These tests use the
75
+ perf_fast.rb test file. The first benchmark is for just parsing. The second is
76
+ for doing a get on every leaf value in the JSON data structure. The third
77
+ fetches a value from a specific spot in the document. With Yajl and JSON this
78
+ is done with a set of calls to fetch() for each level in the document. For
79
+ Oj::Doc a single fetch with a path is used.
80
+
81
+ The benchmark results are:
82
+
83
+ > perf_fast.rb -g 1 -f
84
+ --------------------------------------------------------------------------------
85
+ Parse Performance
86
+ Oj::Doc.parse 100000 times in 0.164 seconds or 609893.696 parse/sec.
87
+ Yajl.parse 100000 times in 3.168 seconds or 31569.902 parse/sec.
88
+ JSON::Ext.parse 100000 times in 3.282 seconds or 30464.826 parse/sec.
89
+
90
+ Summary:
91
+ System time (secs) rate (ops/sec)
92
+ --------- ----------- --------------
93
+ Oj::Doc 0.164 609893.696
94
+ Yajl 3.168 31569.902
95
+ JSON::Ext 3.282 30464.826
96
+
97
+ Comparison Matrix
98
+ (performance factor, 2.0 row is means twice as fast as column)
99
+ Oj::Doc Yajl JSON::Ext
100
+ --------- --------- --------- ---------
101
+ Oj::Doc 1.00 19.32 20.02
102
+ Yajl 0.05 1.00 1.04
103
+ JSON::Ext 0.05 0.96 1.00
104
+
105
+ --------------------------------------------------------------------------------
106
+ Parse and get all values Performance
107
+ Oj::Doc.parse 100000 times in 0.417 seconds or 240054.540 parse/sec.
108
+ Yajl.parse 100000 times in 5.159 seconds or 19384.191 parse/sec.
109
+ JSON::Ext.parse 100000 times in 5.269 seconds or 18978.638 parse/sec.
110
+
111
+ Summary:
112
+ System time (secs) rate (ops/sec)
113
+ --------- ----------- --------------
114
+ Oj::Doc 0.417 240054.540
115
+ Yajl 5.159 19384.191
116
+ JSON::Ext 5.269 18978.638
117
+
118
+ Comparison Matrix
119
+ (performance factor, 2.0 row is means twice as fast as column)
120
+ Oj::Doc Yajl JSON::Ext
121
+ --------- --------- --------- ---------
122
+ Oj::Doc 1.00 12.38 12.65
123
+ Yajl 0.08 1.00 1.02
124
+ JSON::Ext 0.08 0.98 1.00
125
+
126
+ --------------------------------------------------------------------------------
127
+ fetch nested Performance
128
+ Oj::Doc.fetch 100000 times in 0.094 seconds or 1059995.760 fetch/sec.
129
+ Ruby.fetch 100000 times in 0.503 seconds or 198851.434 fetch/sec.
130
+
131
+ Summary:
132
+ System time (secs) rate (ops/sec)
133
+ ------- ----------- --------------
134
+ Oj::Doc 0.094 1059995.760
135
+ Ruby 0.503 198851.434
136
+
137
+ Comparison Matrix
138
+ (performance factor, 2.0 row is means twice as fast as column)
139
+ Oj::Doc Ruby
140
+ ------- ------- -------
141
+ Oj::Doc 1.00 5.33
142
+ Ruby 0.19 1.00
143
+
144
+ The results mean that for getting just a few values from a JSON
145
+ document Oj::Doc is about 20 times faster than the other parsers tested, and for
146
+ accessing all values it is still over 12 times faster than the other Ruby JSON parsers tested.
147
+
148
+ ### Conventional Oj parser comparisons
149
+
64
150
  The following table shows the difference in speeds between several
65
- serialization packages. The tests had to be scaled back due to limitation of
66
- some of the gems. I finally gave up trying to get JSON Pure to serialize
67
- without errors with Ruby 1.9.3. It had internal errors on anything other than
68
- a simple JSON structure. The errors encountered were:
151
+ serialization packages compared to the more conventional Oj parser. The tests
152
+ had to be scaled back due to limitations of some of the gems. I finally gave up
153
+ trying to get JSON Pure to serialize without errors with Ruby 1.9.3. It had
154
+ internal errors on anything other than a simple JSON structure. The errors
155
+ encountered were:
69
156
 
70
157
  - MessagePack fails to convert Bignum to JSON
71
158
 
@@ -84,7 +171,7 @@ It is also worth noting that although Oj is slightly behind MessagePack for
84
171
  parsing, Oj serialization is much faster than MessagePack even though Oj uses
85
172
  human readable JSON vs the binary MessagePack format.
86
173
 
87
- UOj supports circular references when in :object mode and when the :circular
174
+ Oj supports circular references when in :object mode and when the :circular
88
175
  flag is true. None of the other gems tested supported circular
89
176
  references. They failed in the following manners when the input included
90
177
  circular references.
@@ -0,0 +1,1540 @@
1
+ /* fast.c
2
+ * Copyright (c) 2012, Peter Ohler
3
+ * All rights reserved.
4
+ *
5
+ * Redistribution and use in source and binary forms, with or without
6
+ * modification, are permitted provided that the following conditions are met:
7
+ *
8
+ * - Redistributions of source code must retain the above copyright notice, this
9
+ * list of conditions and the following disclaimer.
10
+ *
11
+ * - Redistributions in binary form must reproduce the above copyright notice,
12
+ * this list of conditions and the following disclaimer in the documentation
13
+ * and/or other materials provided with the distribution.
14
+ *
15
+ * - Neither the name of Peter Ohler nor the names of its contributors may be
16
+ * used to endorse or promote products derived from this software without
17
+ * specific prior written permission.
18
+ *
19
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22
+ * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29
+ */
30
+
31
+ #include <stdlib.h>
32
+ #include <stdio.h>
33
+ #include <string.h>
34
+ #include <math.h>
35
+ #include <errno.h>
36
+
37
+ #include "ruby.h"
38
+ #include "oj.h"
39
+
40
+ #define MAX_STACK 100
41
+
42
+ enum {
43
+ STR_VAL = 0x00,
44
+ COL_VAL = 0x01,
45
+ RUBY_VAL = 0x02
46
+ };
47
+
48
+ typedef struct _Leaf {
49
+ struct _Leaf *next;
50
+ union {
51
+ const char *key; // hash key
52
+ size_t index; // array index, 0 is not set
53
+ };
54
+ union {
55
+ char *str; // pointer to location in json string
56
+ struct _Leaf *elements; // array and hash elements
57
+ VALUE value;
58
+ };
59
+ uint8_t type;
60
+ uint8_t parent_type;
61
+ uint8_t value_type;
62
+ } *Leaf;
63
+
64
+ //#define BATCH_SIZE (4096 / sizeof(struct _Leaf) - 1)
65
+ #define BATCH_SIZE 100
66
+
67
+ typedef struct _Batch {
68
+ struct _Batch *next;
69
+ int next_avail;
70
+ struct _Leaf leaves[BATCH_SIZE];
71
+ } *Batch;
72
+
73
+ typedef struct _Doc {
74
+ Leaf data;
75
+ Leaf *where; // points to current location
76
+ Leaf where_path[MAX_STACK]; // points to head of path
77
+ #ifdef HAVE_RUBY_ENCODING_H
78
+ rb_encoding *encoding;
79
+ #else
80
+ void *encoding;
81
+ #endif
82
+ unsigned long size; // number of leaves/branches in the doc
83
+ VALUE self;
84
+ Batch batches;
85
+ //Leaf where_array[MAX_STACK];
86
+ //size_t where_len; // length of allocated if longer than where_array
87
+ struct _Batch batch0;
88
+ } *Doc;
89
+
90
+ typedef struct _ParseInfo {
91
+ char *str; /* buffer being read from */
92
+ char *s; /* current position in buffer */
93
+ Doc doc;
94
+ } *ParseInfo;
95
+
96
+ static void leaf_init(Leaf leaf, int type);
97
+ static Leaf leaf_new(Doc doc, int type);
98
+ static void leaf_append_element(Leaf parent, Leaf element);
99
+ static VALUE leaf_value(Doc doc, Leaf leaf);
100
+ static void leaf_fixnum_value(Leaf leaf);
101
+ static void leaf_float_value(Leaf leaf);
102
+ static VALUE leaf_array_value(Doc doc, Leaf leaf);
103
+ static VALUE leaf_hash_value(Doc doc, Leaf leaf);
104
+
105
+ static Leaf read_next(ParseInfo pi);
106
+ static Leaf read_obj(ParseInfo pi);
107
+ static Leaf read_array(ParseInfo pi);
108
+ static Leaf read_str(ParseInfo pi);
109
+ static Leaf read_num(ParseInfo pi);
110
+ static Leaf read_true(ParseInfo pi);
111
+ static Leaf read_false(ParseInfo pi);
112
+ static Leaf read_nil(ParseInfo pi);
113
+ static void next_non_white(ParseInfo pi);
114
+ static char* read_quoted_value(ParseInfo pi);
115
+
116
+ static VALUE protect_open_proc(VALUE x);
117
+ static VALUE parse_json(VALUE clas, char *json);
118
+ static void each_leaf(Doc doc, VALUE self);
119
+ static int move_step(Doc doc, const char *path, int loc);
120
+ static Leaf get_doc_leaf(Doc doc, const char *path);
121
+ static Leaf get_leaf(Leaf *stack, Leaf *lp, const char *path);
122
+ static void each_value(Doc doc, Leaf leaf);
123
+
124
+ static void doc_init(Doc doc);
125
+ static void doc_free(Doc doc);
126
+ static VALUE doc_open(VALUE clas, VALUE str);
127
+ static VALUE doc_open_file(VALUE clas, VALUE filename);
128
+ static VALUE doc_where(VALUE self);
129
+ static VALUE doc_local_key(VALUE self);
130
+ static VALUE doc_home(VALUE self);
131
+ static VALUE doc_type(int argc, VALUE *argv, VALUE self);
132
+ static VALUE doc_fetch(int argc, VALUE *argv, VALUE self);
133
+ static VALUE doc_each_leaf(int argc, VALUE *argv, VALUE self);
134
+ static VALUE doc_move(VALUE self, VALUE str);
135
+ static VALUE doc_each_child(int argc, VALUE *argv, VALUE self);
136
+ static VALUE doc_each_value(int argc, VALUE *argv, VALUE self);
137
+ static VALUE doc_dump(int argc, VALUE *argv, VALUE self);
138
+ static VALUE doc_size(VALUE self);
139
+
140
+ VALUE oj_doc_class = 0;
141
+
142
+ inline static void
143
+ next_non_white(ParseInfo pi) {
144
+ for (; 1; pi->s++) {
145
+ switch(*pi->s) {
146
+ case ' ':
147
+ case '\t':
148
+ case '\f':
149
+ case '\n':
150
+ case '\r':
151
+ break;
152
+ default:
153
+ return;
154
+ }
155
+ }
156
+ }
157
+
158
+ inline static void
159
+ next_white(ParseInfo pi) {
160
+ for (; 1; pi->s++) {
161
+ switch(*pi->s) {
162
+ case ' ':
163
+ case '\t':
164
+ case '\f':
165
+ case '\n':
166
+ case '\r':
167
+ case '\0':
168
+ return;
169
+ default:
170
+ break;
171
+ }
172
+ }
173
+ }
174
+
175
+ inline static char*
176
+ ulong_fill(char *s, size_t num) {
177
+ char buf[32];
178
+ char *b = buf + sizeof(buf) - 1;
179
+
180
+ *b-- = '\0';
181
+ for (; 0 < num; num /= 10, b--) {
182
+ *b = (num % 10) + '0';
183
+ }
184
+ b++;
185
+ if ('\0' == *b) {
186
+ b--;
187
+ *b = '0';
188
+ }
189
+ for (; '\0' != *b; b++, s++) {
190
+ *s = *b;
191
+ }
192
+ return s;
193
+ }
194
+
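Review note on `ulong_fill()`: decimal digits come out least-significant first, so the function fills a scratch buffer from the end and then copies forward into the destination. The sketch below shows the same idea in isolation. The name `fill_decimal` is hypothetical and not part of the diff, and a `do`/`while` loop replaces the special case for zero, so treat it as an illustration rather than the author's code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for ulong_fill() in the diff.
 * Writes the decimal digits of num at s and returns a pointer just past the
 * last digit written. The output is NOT NUL-terminated; the caller decides
 * what follows the number. */
static char *
fill_decimal(char *s, unsigned long num) {
    char  buf[32];                 /* large enough for a 64-bit value */
    char *b = buf + sizeof(buf);   /* start one past the end, fill backwards */

    do {
        *--b = (char)('0' + num % 10);
        num /= 10;
    } while (0 < num);             /* do/while emits "0" for num == 0 */

    for (; b < buf + sizeof(buf); b++, s++) {
        *s = *b;                   /* copy digits forward into the target */
    }
    return s;
}
```

A caller that wants a C string appends the terminator itself, for example `*fill_decimal(out, 42UL) = '\0';`.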
195
+ inline static void
196
+ leaf_init(Leaf leaf, int type) {
197
+ leaf->next = 0;
198
+ leaf->type = type;
199
+ leaf->parent_type = T_NONE;
200
+ switch (type) {
201
+ case T_ARRAY:
202
+ case T_HASH:
203
+ leaf->elements = 0;
204
+ leaf->value_type = COL_VAL;
205
+ break;
206
+ case T_NIL:
207
+ leaf->value = Qnil;
208
+ leaf->value_type = RUBY_VAL;
209
+ break;
210
+ case T_TRUE:
211
+ leaf->value = Qtrue;
212
+ leaf->value_type = RUBY_VAL;
213
+ break;
214
+ case T_FALSE:
215
+ leaf->value = Qfalse;
216
+ leaf->value_type = RUBY_VAL;
217
+ break;
218
+ case T_FIXNUM:
219
+ case T_FLOAT:
220
+ case T_STRING:
221
+ default:
222
+ leaf->value_type = STR_VAL;
223
+ break;
224
+ }
225
+ }
226
+
227
+ inline static Leaf
228
+ leaf_new(Doc doc, int type) {
229
+ Leaf leaf;
230
+
231
+ if (0 == doc->batches || BATCH_SIZE == doc->batches->next_avail) {
232
+ Batch b = ALLOC(struct _Batch);
233
+
234
+ b->next = doc->batches;
235
+ doc->batches = b;
236
+ b->next_avail = 0;
237
+ }
238
+ leaf = &doc->batches->leaves[doc->batches->next_avail];
239
+ doc->batches->next_avail++;
240
+ leaf_init(leaf, type);
241
+
242
+ return leaf;
243
+ }
244
+
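Review note on `leaf_new()`: leaves are carved out of fixed-size batches, with the first batch embedded in the document struct so small documents need no heap allocation. The following is a minimal sketch of that pattern using plain `malloc`/`free` instead of Ruby's `ALLOC`/`xfree`. The names `Arena`, `Slot`, and `SLOTS_PER_BATCH` are assumptions made for the illustration and do not appear in the diff.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOTS_PER_BATCH 4   /* deliberately small; the diff uses 100 */

typedef struct Slot {
    int value;
} Slot;

typedef struct Batch {
    struct Batch *next;
    int           next_avail;            /* index of the next unused slot */
    Slot          slots[SLOTS_PER_BATCH];
} Batch;

typedef struct Arena {
    Batch *batches;   /* newest batch first */
    Batch  batch0;    /* embedded first batch, never passed to free() */
} Arena;

static void
arena_init(Arena *a) {
    a->batch0.next = NULL;
    a->batch0.next_avail = 0;
    a->batches = &a->batch0;
}

/* Hand out the next slot, adding a heap batch when the current one is full.
 * Returns NULL only if malloc fails. */
static Slot *
arena_alloc(Arena *a) {
    if (SLOTS_PER_BATCH == a->batches->next_avail) {
        Batch *b = malloc(sizeof(Batch));

        if (NULL == b) {
            return NULL;
        }
        b->next = a->batches;
        b->next_avail = 0;
        a->batches = b;
    }
    return &a->batches->slots[a->batches->next_avail++];
}

/* Release every heap batch, skipping the embedded one.
 * Returns the number of batches freed. */
static int
arena_free(Arena *a) {
    int    freed = 0;
    Batch *b;

    while (NULL != (b = a->batches)) {
        a->batches = b->next;
        if (&a->batch0 != b) {
            free(b);
            freed++;
        }
    }
    return freed;
}
```

Freeing is one pass over the batch list rather than one call per leaf, which is presumably part of why the approach suits a parser that discards the whole document at once.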
245
+ inline static void
246
+ leaf_append_element(Leaf parent, Leaf element) {
247
+ if (0 == parent->elements) {
248
+ parent->elements = element;
249
+ element->next = element;
250
+ } else {
251
+ element->next = parent->elements->next;
252
+ parent->elements->next = element;
253
+ parent->elements = element;
254
+ }
255
+ }
256
+
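Review note on `leaf_append_element()`: the parent keeps a pointer to the last child, and the last child's `next` points back to the first. That gives constant-time append and in-order traversal from a single pointer. A small sketch of the same ring structure, with a hypothetical `Node` type standing in for `Leaf`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; stands in for the Leaf struct in the diff. */
typedef struct Node {
    struct Node *next;
    int          id;
} Node;

/* *tail points at the most recently appended node, or NULL when empty.
 * The tail's next always points at the first node in the ring. */
static void
ring_append(Node **tail, Node *n) {
    if (NULL == *tail) {
        n->next = n;                /* a single node points to itself */
    } else {
        n->next = (*tail)->next;    /* new node points to the first node */
        (*tail)->next = n;          /* old tail links to the new node */
    }
    *tail = n;
}

/* Copy ids in insertion order into out (at most max of them).
 * Returns the number of ids written. */
static size_t
ring_ids(const Node *tail, int *out, size_t max) {
    size_t cnt = 0;

    if (NULL != tail) {
        const Node *first = tail->next;
        const Node *e = first;

        do {
            if (cnt >= max) {
                break;
            }
            out[cnt++] = e->id;
            e = e->next;
        } while (e != first);
    }
    return cnt;
}
```

The `do`/`while` traversal starting at `tail->next` is the same loop shape used by the array, hash, and path-walking functions elsewhere in the file.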
257
+ static VALUE
258
+ leaf_value(Doc doc, Leaf leaf) {
259
+ if (RUBY_VAL != leaf->value_type) {
260
+ switch (leaf->type) {
261
+ case T_NIL:
262
+ leaf->value = Qnil;
263
+ break;
264
+ case T_TRUE:
265
+ leaf->value = Qtrue;
266
+ break;
267
+ case T_FALSE:
268
+ leaf->value = Qfalse;
269
+ break;
270
+ case T_FIXNUM:
271
+ leaf_fixnum_value(leaf);
272
+ break;
273
+ case T_FLOAT:
274
+ leaf_float_value(leaf);
275
+ break;
276
+ case T_STRING:
277
+ leaf->value = rb_str_new2(leaf->str);
278
+ #ifdef HAVE_RUBY_ENCODING_H
279
+ if (0 != doc->encoding) {
280
+ rb_enc_associate(leaf->value, doc->encoding);
281
+ }
282
+ #endif
283
+ leaf->value_type = RUBY_VAL;
284
+ break;
285
+ case T_ARRAY:
286
+ return leaf_array_value(doc, leaf);
287
+ break;
288
+ case T_HASH:
289
+ return leaf_hash_value(doc, leaf);
290
+ break;
291
+ default:
292
+ rb_raise(rb_eTypeError, "Unexpected type %02x.", leaf->type);
293
+ break;
294
+ }
295
+ }
296
+ return leaf->value;
297
+ }
298
+
299
+ #ifdef RUBINIUS
300
+ #define NUM_MAX 0x07FFFFFF
301
+ #else
302
+ #define NUM_MAX (FIXNUM_MAX >> 8)
303
+ #endif
304
+
305
+
306
+ static void
307
+ leaf_fixnum_value(Leaf leaf) {
308
+ char *s = leaf->str;
309
+ int64_t n = 0;
310
+ int neg = 0;
311
+ int big = 0;
312
+
313
+ if ('-' == *s) {
314
+ s++;
315
+ neg = 1;
316
+ } else if ('+' == *s) {
317
+ s++;
318
+ }
319
+ for (; '0' <= *s && *s <= '9'; s++) {
320
+ n = n * 10 + (*s - '0');
321
+ if (NUM_MAX <= n) {
322
+ big = 1;
323
+ }
324
+ }
325
+ if (big) {
326
+ char c = *s;
327
+
328
+ *s = '\0';
329
+ leaf->value = rb_cstr_to_inum(leaf->str, 10, 0);
330
+ *s = c;
331
+ } else {
332
+ if (neg) {
333
+ n = -n;
334
+ }
335
+ leaf->value = LONG2NUM(n);
336
+ }
337
+ leaf->value_type = RUBY_VAL;
338
+ }
339
+
340
+ #if 1
341
+ static void
342
+ leaf_float_value(Leaf leaf) {
343
+ leaf->value = DBL2NUM(rb_cstr_to_dbl(leaf->str, 1));
344
+ leaf->value_type = RUBY_VAL;
345
+ }
346
+ #else
347
+ static void
348
+ leaf_float_value(Leaf leaf) {
349
+ char *s = leaf->str;
350
+ int64_t n = 0;
351
+ long a = 0;
352
+ long div = 1;
353
+ long e = 0;
354
+ int neg = 0;
355
+ int eneg = 0;
356
+ int big = 0;
357
+
358
+ if ('-' == *s) {
359
+ s++;
360
+ neg = 1;
361
+ } else if ('+' == *s) {
362
+ s++;
363
+ }
364
+ for (; '0' <= *s && *s <= '9'; s++) {
365
+ n = n * 10 + (*s - '0');
366
+ if (NUM_MAX <= n) {
367
+ big = 1;
368
+ }
369
+ }
370
+ if (big) {
371
+ char c = *s;
372
+
373
+ *s = '\0';
374
+ leaf->value = rb_cstr_to_inum(leaf->str, 10, 0);
375
+ *s = c;
376
+ } else {
377
+ double d;
378
+
379
+ if ('.' == *s) {
380
+ s++;
381
+ for (; '0' <= *s && *s <= '9'; s++) {
382
+ a = a * 10 + (*s - '0');
383
+ div *= 10;
384
+ }
385
+ }
386
+ if ('e' == *s || 'E' == *s) {
387
+ s++;
388
+ if ('-' == *s) {
389
+ s++;
390
+ eneg = 1;
391
+ } else if ('+' == *s) {
392
+ s++;
393
+ }
394
+ for (; '0' <= *s && *s <= '9'; s++) {
395
+ e = e * 10 + (*s - '0');
396
+ }
397
+ }
398
+ d = (double)n + (double)a / (double)div;
399
+ if (neg) {
400
+ d = -d;
401
+ }
402
+ if (0 != e) {
403
+ if (eneg) {
404
+ e = -e;
405
+ }
406
+ d *= pow(10.0, e);
407
+ }
408
+ leaf->value = DBL2NUM(d);
409
+ }
410
+ leaf->value_type = RUBY_VAL;
411
+ }
412
+ #endif
413
+
414
+ static VALUE
415
+ leaf_array_value(Doc doc, Leaf leaf) {
416
+ VALUE a = rb_ary_new();
417
+
418
+ if (0 != leaf->elements) {
419
+ Leaf first = leaf->elements->next;
420
+ Leaf e = first;
421
+
422
+ do {
423
+ rb_ary_push(a, leaf_value(doc, e));
424
+ e = e->next;
425
+ } while (e != first);
426
+ }
427
+ return a;
428
+ }
429
+
430
+ static VALUE
431
+ leaf_hash_value(Doc doc, Leaf leaf) {
432
+ VALUE h = rb_hash_new();
433
+
434
+ if (0 != leaf->elements) {
435
+ Leaf first = leaf->elements->next;
436
+ Leaf e = first;
437
+ VALUE key;
438
+
439
+ do {
440
+ key = rb_str_new2(e->key);
441
+ #ifdef HAVE_RUBY_ENCODING_H
442
+ if (0 != doc->encoding) {
443
+ rb_enc_associate(key, doc->encoding);
444
+ }
445
+ #endif
446
+ rb_hash_aset(h, key, leaf_value(doc, e));
447
+ e = e->next;
448
+ } while (e != first);
449
+ }
450
+ return h;
451
+ }
452
+
453
+ static Leaf
454
+ read_next(ParseInfo pi) {
455
+ Leaf leaf = 0;
456
+
457
+ next_non_white(pi); // skip white space
458
+ switch (*pi->s) {
459
+ case '{':
460
+ leaf = read_obj(pi);
461
+ break;
462
+ case '[':
463
+ leaf = read_array(pi);
464
+ break;
465
+ case '"':
466
+ leaf = read_str(pi);
467
+ break;
468
+ case '+':
469
+ case '-':
470
+ case '0':
471
+ case '1':
472
+ case '2':
473
+ case '3':
474
+ case '4':
475
+ case '5':
476
+ case '6':
477
+ case '7':
478
+ case '8':
479
+ case '9':
480
+ leaf = read_num(pi);
481
+ break;
482
+ case 't':
483
+ leaf = read_true(pi);
484
+ break;
485
+ case 'f':
486
+ leaf = read_false(pi);
487
+ break;
488
+ case 'n':
489
+ leaf = read_nil(pi);
490
+ break;
491
+ case '\0':
492
+ default:
493
+ break; // returns 0
494
+ }
495
+ pi->doc->size++;
496
+
497
+ return leaf;
498
+ }
499
+
500
+ static Leaf
501
+ read_obj(ParseInfo pi) {
502
+ Leaf h = leaf_new(pi->doc, T_HASH);
503
+ char *end;
504
+ const char *key = 0;
505
+ Leaf val = 0;
506
+
507
+ pi->s++;
508
+ next_non_white(pi);
509
+ if ('}' == *pi->s) {
510
+ pi->s++;
511
+ return h;
512
+ }
513
+ while (1) {
514
+ next_non_white(pi);
515
+ key = 0;
516
+ val = 0;
517
+ if ('"' != *pi->s || 0 == (key = read_quoted_value(pi))) {
518
+ raise_error("unexpected character", pi->str, pi->s);
519
+ }
520
+ next_non_white(pi);
521
+ if (':' == *pi->s) {
522
+ pi->s++;
523
+ } else {
524
+ raise_error("invalid format, expected :", pi->str, pi->s);
525
+ }
526
+ if (0 == (val = read_next(pi))) {
527
+ //printf("*** '%s'\n", pi->s);
528
+ raise_error("unexpected character", pi->str, pi->s);
529
+ }
530
+ end = pi->s;
531
+ val->key = key;
532
+ val->parent_type = T_HASH;
533
+ leaf_append_element(h, val);
534
+ next_non_white(pi);
535
+ if ('}' == *pi->s) {
536
+ pi->s++;
537
+ *end = '\0';
538
+ break;
539
+ } else if (',' == *pi->s) {
540
+ pi->s++;
541
+ } else {
542
+ //printf("*** '%s'\n", pi->s);
543
+ raise_error("invalid format, expected , or } while in an object", pi->str, pi->s);
544
+ }
545
+ *end = '\0';
546
+ }
547
+ return h;
548
+ }
549
+
550
+ static Leaf
551
+ read_array(ParseInfo pi) {
552
+ Leaf a = leaf_new(pi->doc, T_ARRAY);
553
+ Leaf e;
554
+ char *end;
555
+ int cnt = 0;
556
+
557
+ pi->s++;
558
+ next_non_white(pi);
559
+ if (']' == *pi->s) {
560
+ pi->s++;
561
+ return a;
562
+ }
563
+ while (1) {
564
+ next_non_white(pi);
565
+ if (0 == (e = read_next(pi))) {
566
+ raise_error("unexpected character", pi->str, pi->s);
567
+ }
568
+ cnt++;
569
+ e->index = cnt;
570
+ e->parent_type = T_ARRAY;
571
+ leaf_append_element(a, e);
572
+ end = pi->s;
573
+ next_non_white(pi);
574
+ if (',' == *pi->s) {
575
+ pi->s++;
576
+ } else if (']' == *pi->s) {
577
+ pi->s++;
578
+ *end = '\0';
579
+ break;
580
+ } else {
581
+ raise_error("invalid format, expected , or ] while in an array", pi->str, pi->s);
582
+ }
583
+ *end = '\0';
584
+ }
585
+ return a;
586
+ }
587
+
588
+ static Leaf
589
+ read_str(ParseInfo pi) {
590
+ Leaf leaf = leaf_new(pi->doc, T_STRING);
591
+
592
+ leaf->str = read_quoted_value(pi);
593
+
594
+ return leaf;
595
+ }
596
+
597
+ static Leaf
598
+ read_num(ParseInfo pi) {
599
+ char *start = pi->s;
600
+ int type = T_FIXNUM;
601
+ Leaf leaf; // allocated below, once the final type is known
602
+
603
+ if ('-' == *pi->s) {
604
+ pi->s++;
605
+ }
606
+ // digits
607
+ for (; '0' <= *pi->s && *pi->s <= '9'; pi->s++) {
608
+ }
609
+ if ('.' == *pi->s) {
610
+ type = T_FLOAT;
611
+ pi->s++;
612
+ for (; '0' <= *pi->s && *pi->s <= '9'; pi->s++) {
613
+ }
614
+ }
615
+ if ('e' == *pi->s || 'E' == *pi->s) {
616
+ pi->s++;
617
+ if ('-' == *pi->s || '+' == *pi->s) {
618
+ pi->s++;
619
+ }
620
+ for (; '0' <= *pi->s && *pi->s <= '9'; pi->s++) {
621
+ }
622
+ }
623
+ leaf = leaf_new(pi->doc, type);
624
+ leaf->str = start;
625
+
626
+ return leaf;
627
+ }
628
+
629
+ static Leaf
630
+ read_true(ParseInfo pi) {
631
+ Leaf leaf = leaf_new(pi->doc, T_TRUE);
632
+
633
+ pi->s++;
634
+ if ('r' != *pi->s || 'u' != *(pi->s + 1) || 'e' != *(pi->s + 2)) {
635
+ raise_error("invalid format, expected 'true'", pi->str, pi->s);
636
+ }
637
+ pi->s += 3;
638
+
639
+ return leaf;
640
+ }
641
+
642
+ static Leaf
643
+ read_false(ParseInfo pi) {
644
+ Leaf leaf = leaf_new(pi->doc, T_FALSE);
645
+
646
+ pi->s++;
647
+ if ('a' != *pi->s || 'l' != *(pi->s + 1) || 's' != *(pi->s + 2) || 'e' != *(pi->s + 3)) {
648
+ raise_error("invalid format, expected 'false'", pi->str, pi->s);
649
+ }
650
+ pi->s += 4;
651
+
652
+ return leaf;
653
+ }
654
+
655
+ static Leaf
656
+ read_nil(ParseInfo pi) {
657
+ Leaf leaf = leaf_new(pi->doc, T_NIL);
658
+
659
+ pi->s++;
660
+ if ('u' != *pi->s || 'l' != *(pi->s + 1) || 'l' != *(pi->s + 2)) {
661
+ raise_error("invalid format, expected 'null'", pi->str, pi->s);
662
+ }
663
+ pi->s += 3;
664
+
665
+ return leaf;
666
+ }
667
+
668
+ static char
669
+ read_hex(ParseInfo pi, char *h) {
670
+ uint8_t b = 0;
671
+
672
+ if ('0' <= *h && *h <= '9') {
673
+ b = *h - '0';
674
+ } else if ('A' <= *h && *h <= 'F') {
675
+ b = *h - 'A' + 10;
676
+ } else if ('a' <= *h && *h <= 'f') {
677
+ b = *h - 'a' + 10;
678
+ } else {
679
+ pi->s = h;
680
+ raise_error("invalid hex character", pi->str, pi->s);
681
+ }
682
+ h++;
683
+ b = b << 4;
684
+ if ('0' <= *h && *h <= '9') {
685
+ b += *h - '0';
686
+ } else if ('A' <= *h && *h <= 'F') {
687
+ b += *h - 'A' + 10;
688
+ } else if ('a' <= *h && *h <= 'f') {
689
+ b += *h - 'a' + 10;
690
+ } else {
691
+ pi->s = h;
692
+ raise_error("invalid hex character", pi->str, pi->s);
693
+ }
694
+ return (char)b;
695
+ }
696
+
697
+ /* Assume the value starts immediately and goes until the quote character is
698
+ * reached again. Do not read the character after the terminating quote.
699
+ */
700
+ static char*
701
+ read_quoted_value(ParseInfo pi) {
702
+ char *value = 0;
703
+ char *h = pi->s; // head
704
+ char *t = h; // tail
705
+
706
+ h++; // skip quote character
707
+ t++;
708
+ value = h;
709
+ for (; '"' != *h; h++, t++) {
710
+ if ('\0' == *h) {
711
+ pi->s = h;
712
+ raise_error("quoted string not terminated", pi->str, pi->s);
713
+ } else if ('\\' == *h) {
714
+ h++;
715
+ switch (*h) {
716
+ case 'n': *t = '\n'; break;
717
+ case 'r': *t = '\r'; break;
718
+ case 't': *t = '\t'; break;
719
+ case 'f': *t = '\f'; break;
720
+ case 'b': *t = '\b'; break;
721
+ case '"': *t = '"'; break;
722
+ case '/': *t = '/'; break;
723
+ case '\\': *t = '\\'; break;
724
+ case 'u':
725
+ h++;
726
+ *t = read_hex(pi, h);
727
+ h += 2;
728
+ if ('\0' != *t) {
729
+ t++;
730
+ }
731
+ *t = read_hex(pi, h);
732
+ h++;
733
+ break;
734
+ default:
735
+ pi->s = h;
736
+ raise_error("invalid escaped character", pi->str, pi->s);
737
+ break;
738
+ }
739
+ } else if (t != h) {
740
+ *t = *h;
741
+ }
742
+ }
743
+ *t = '\0'; // terminate value
744
+ pi->s = h + 1;
745
+
746
+ return value;
747
+ }
748
+
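Review note on `read_quoted_value()`: the string is unescaped in place with two pointers into the same buffer, a read head and a write tail. This works because an unescaped string is never longer than its escaped form. The sketch below isolates that technique for the single-character escapes only. `\uXXXX` handling is intentionally left out, and the function name and NULL-on-error convention are assumptions of this example (the diff raises a Ruby exception instead).

```c
#include <assert.h>
#include <string.h>

/* Unescape a JSON string body in place. s points just after the opening
 * quote. On success the buffer at s holds the NUL-terminated value and the
 * return value points just past the closing quote. Returns NULL for an
 * unterminated string or an escape this sketch does not support. */
static char *
unescape_in_place(char *s) {
    char *h = s;   /* head: read position */
    char *t = s;   /* tail: write position, never ahead of head */

    for (; '"' != *h; h++, t++) {
        if ('\0' == *h) {
            return NULL;           /* ran off the end before a closing quote */
        }
        if ('\\' == *h) {
            h++;
            switch (*h) {
            case 'n':  *t = '\n'; break;
            case 'r':  *t = '\r'; break;
            case 't':  *t = '\t'; break;
            case 'f':  *t = '\f'; break;
            case 'b':  *t = '\b'; break;
            case '"':  *t = '"';  break;
            case '/':  *t = '/';  break;
            case '\\': *t = '\\'; break;
            default:   return NULL;
            }
        } else {
            *t = *h;
        }
    }
    *t = '\0';                     /* terminate the value inside the buffer */
    return h + 1;
}
```

Because the terminator is written into the input buffer, the caller must own a writable copy of the JSON text, which matches how `doc_open()` copies the Ruby string before parsing.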
749
+ // doc support functions
750
+ inline static void
751
+ doc_init(Doc doc) {
752
+ //doc->where_path = doc->where_array;
753
+ //doc->where_len = 0;
754
+ doc->where = doc->where_path;
755
+ *doc->where = 0;
756
+ doc->data = 0;
757
+ doc->self = Qundef;
758
+ #ifdef HAVE_RUBY_ENCODING_H
759
+ doc->encoding = ('\0' == *oj_default_options.encoding) ? 0 : rb_enc_find(oj_default_options.encoding);
760
+ #else
761
+ doc->encoding = 0;
762
+ #endif
763
+ doc->size = 0;
764
+ doc->batches = &doc->batch0;
765
+ doc->batch0.next = 0;
766
+ doc->batch0.next_avail = 0;
767
+ }
768
+
769
+ static void
770
+ doc_free(Doc doc) {
771
+ if (0 != doc) {
772
+ Batch b;
773
+
774
+ while (0 != (b = doc->batches)) {
775
+ doc->batches = doc->batches->next;
776
+ if (&doc->batch0 != b) {
777
+ xfree(b);
778
+ }
779
+ }
780
+ /*
781
+ if (doc->where_array != doc->where_path) {
782
+ free(doc->where_path);
783
+ }
784
+ */
785
+ //xfree(f);
786
+ }
787
+ }
788
+
789
+ static VALUE
790
+ protect_open_proc(VALUE x) {
791
+ ParseInfo pi = (ParseInfo)x;
792
+
793
+ pi->doc->data = read_next(pi); // parse
794
+ *pi->doc->where = pi->doc->data;
795
+ pi->doc->where = pi->doc->where_path;
796
+ return rb_yield(pi->doc->self); // caller processing
797
+ }
798
+
799
+ static VALUE
800
+ parse_json(VALUE clas, char *json) {
801
+ struct _ParseInfo pi;
802
+ VALUE result = Qnil;
803
+ struct _Doc doc;
804
+ int ex = 0;
805
+
806
+ if (!rb_block_given_p()) {
807
+ rb_raise(rb_eArgError, "Block or Proc is required.");
808
+ }
809
+ pi.str = json;
810
+ pi.s = pi.str;
811
+ doc_init(&doc);
812
+ pi.doc = &doc;
813
+ doc.self = rb_obj_alloc(clas);
814
+ DATA_PTR(doc.self) = pi.doc;
815
+ result = rb_protect(protect_open_proc, (VALUE)&pi, &ex);
816
+ DATA_PTR(doc.self) = 0;
817
+ doc_free(pi.doc);
818
+ //xfree(pi.str);
819
+ if (0 != ex) {
820
+ rb_jump_tag(ex);
821
+ }
822
+ return result;
823
+ }
824
+
825
+ static Leaf
826
+ get_doc_leaf(Doc doc, const char *path) {
827
+ Leaf leaf = *doc->where;
828
+
829
+ if (0 != doc->data && 0 != path) {
830
+ Leaf stack[MAX_STACK];
831
+ Leaf *lp;
832
+
833
+ if ('/' == *path) {
834
+ path++;
835
+ *stack = doc->data;
836
+ lp = stack;
837
+ } else {
838
+ size_t cnt = doc->where - doc->where_path;
839
+
840
+ memcpy(stack, doc->where_path, sizeof(Leaf) * cnt);
841
+ lp = stack + cnt;
842
+ }
843
+ return get_leaf(stack, lp, path);
844
+ }
845
+ return leaf;
846
+ }
847
+
848
+ static Leaf
849
+ get_leaf(Leaf *stack, Leaf *lp, const char *path) {
850
+ Leaf leaf = *lp;
851
+
852
+ if ('\0' != *path) {
853
+ if ('.' == *path && '.' == *(path + 1)) {
854
+ path += 2;
855
+ if ('/' == *path) {
856
+ path++;
857
+ }
858
+ if (stack < lp) {
859
+ leaf = get_leaf(stack, lp - 1, path);
860
+ } else {
861
+ return 0;
862
+ }
863
+ } else if (COL_VAL == leaf->value_type && 0 != leaf->elements) {
864
+ Leaf first = leaf->elements->next;
865
+ Leaf e = first;
866
+ int type = leaf->type;
867
+
868
+ // TBD fail if stack too deep
869
+ leaf = 0;
870
+ if (T_ARRAY == type) {
871
+ int cnt = 0;
872
+
873
+ for (; '0' <= *path && *path <= '9'; path++) {
874
+ cnt = cnt * 10 + (*path - '0');
875
+ }
876
+ if ('/' == *path) {
877
+ path++;
878
+ }
879
+ do {
880
+ if (1 >= cnt) {
881
+ lp++;
882
+ *lp = e;
883
+ leaf = get_leaf(stack, lp, path);
884
+ break;
885
+ }
886
+ cnt--;
887
+ e = e->next;
888
+ } while (e != first);
889
+ } else if (T_HASH == type) {
890
+ const char *key = path;
891
+ const char *slash = strchr(path, '/');
892
+ int klen;
893
+
894
+ if (0 == slash) {
895
+ klen = (int)strlen(key);
896
+ path += klen;
897
+ } else {
898
+ klen = (int)(slash - key);
899
+ path += klen + 1;
900
+ }
901
+ do {
902
+ if (0 == strncmp(key, e->key, klen) && '\0' == e->key[klen]) {
903
+ lp++;
904
+ *lp = e;
905
+ leaf = get_leaf(stack, lp, path);
906
+ break;
907
+ }
908
+ e = e->next;
909
+ } while (e != first);
910
+ }
911
+ }
912
+ }
913
+ return leaf;
914
+ }
915
+
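Review note on `get_leaf()`: hash keys are matched against one `/`-delimited path segment using `strncmp` plus a check that the key ends exactly where the segment does. Without the second check, a segment `ab` would match a key `abc`. A stand-alone sketch of that comparison, where `segment_matches` is a hypothetical helper name not found in the diff:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the first '/'-delimited segment of path equals key.
 * *rest is set to the remainder of the path after the segment and its
 * slash, if any. */
static int
segment_matches(const char *path, const char *key, const char **rest) {
    const char *slash = strchr(path, '/');
    size_t      klen;

    if (NULL == slash) {
        klen = strlen(path);
        *rest = path + klen;       /* points at the terminating NUL */
    } else {
        klen = (size_t)(slash - path);
        *rest = slash + 1;
    }
    /* strncmp alone accepts a prefix; the NUL check requires equal length.
     * key[klen] is only read after strncmp confirmed klen matching bytes. */
    return 0 == strncmp(path, key, klen) && '\0' == key[klen];
}
```

One consequence of this scheme, visible in the original as well, is that keys containing `/` cannot be addressed by a path.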
916
+ static void
917
+ each_leaf(Doc doc, VALUE self) {
918
+ if (COL_VAL == (*doc->where)->value_type) {
919
+ if (0 != (*doc->where)->elements) {
920
+ Leaf first = (*doc->where)->elements->next;
921
+ Leaf e = first;
922
+
923
+ doc->where++;
924
+ do {
925
+ *doc->where = e;
926
+ each_leaf(doc, self);
927
+ e = e->next;
928
+ } while (e != first);
929
+ }
930
+ } else {
931
+ rb_yield(self);
932
+ }
933
+ }
934
+
935
+ static int
936
+ move_step(Doc doc, const char *path, int loc) {
937
+ // TBD raise if too deep
938
+ if ('\0' == *path) {
939
+ loc = 0;
940
+ } else {
941
+ Leaf leaf;
942
+
943
+ if (0 == doc->where || 0 == (leaf = *doc->where)) {
944
+ printf("*** Internal error at %s\n", path);
945
+ return loc;
946
+ }
947
+ if ('.' == *path && '.' == *(path + 1)) {
948
+ Leaf init = *doc->where;
949
+
950
+ path += 2;
951
+ if (doc->where == doc->where_path) {
952
+ return loc;
953
+ }
954
+ if ('/' == *path) {
955
+ path++;
956
+ }
957
+ *doc->where = 0;
958
+ doc->where--;
959
+ loc = move_step(doc, path, loc + 1);
960
+ if (0 != loc) {
961
+ *doc->where = init;
962
+ doc->where++;
963
+ }
964
+ } else if (COL_VAL == leaf->value_type && 0 != leaf->elements) {
965
+ Leaf first = leaf->elements->next;
966
+ Leaf e = first;
967
+
968
+ if (T_ARRAY == leaf->type) {
969
+ int cnt = 0;
970
+
971
+ for (; '0' <= *path && *path <= '9'; path++) {
972
+ cnt = cnt * 10 + (*path - '0');
973
+ }
974
+ if ('/' == *path) {
975
+ path++;
976
+ } else if ('\0' != *path) {
977
+ return loc;
978
+ }
979
+ do {
980
+ if (1 >= cnt) {
981
+ doc->where++;
982
+ *doc->where = e;
983
+ loc = move_step(doc, path, loc + 1);
984
+ if (0 != loc) {
985
+ *doc->where = 0;
986
+ doc->where--;
987
+ }
988
+ break;
989
+ }
990
+ cnt--;
991
+ e = e->next;
992
+ } while (e != first);
993
+ } else if (T_HASH == leaf->type) {
994
+ const char *key = path;
995
+ const char *slash = strchr(path, '/');
996
+ int klen;
997
+
998
+ if (0 == slash) {
999
+ klen = (int)strlen(key);
1000
+ path += klen;
1001
+ } else {
1002
+ klen = (int)(slash - key);
1003
+ path += klen + 1;
1004
+ }
1005
+ do {
1006
+ if (0 == strncmp(key, e->key, klen) && '\0' == e->key[klen]) {
1007
+ doc->where++;
1008
+ *doc->where = e;
1009
+ loc = move_step(doc, path, loc + 1);
1010
+ if (0 != loc) {
1011
+ *doc->where = 0;
1012
+ doc->where--;
1013
+ }
1014
+ break;
1015
+ }
1016
+ e = e->next;
1017
+ } while (e != first);
1018
+ }
1019
+ }
1020
+ }
1021
+ return loc;
1022
+ }
1023
+
1024
+ static void
1025
+ each_value(Doc doc, Leaf leaf) {
1026
+ if (COL_VAL == leaf->value_type) {
1027
+ if (0 != leaf->elements) {
1028
+ Leaf first = leaf->elements->next;
1029
+ Leaf e = first;
1030
+
1031
+ do {
1032
+ each_value(doc, e);
1033
+ e = e->next;
1034
+ } while (e != first);
1035
+ }
1036
+ } else {
1037
+ VALUE args[1];
1038
+
1039
+ *args = leaf_value(doc, leaf);
1040
+ rb_yield_values2(1, args);
1041
+ }
1042
+ }
1043
+
1044
+ // doc functions
1045
+
1046
+ /* call-seq: open(json) { |doc| ... } => Object
1047
+ *
1048
+ * Parses a JSON document String and then yields to the provided block with an
1049
+ * instance of the Oj::Doc as the single yield parameter.
1050
+ *
1051
+ * @param [String] json JSON document string
1052
+ * @yieldparam [Oj::Doc] doc parsed JSON document
1053
+ * @yieldreturn [Object] returns the result of the yield as the result of the method call
1054
+ * @example
1055
+ * Oj::Doc.open('[1,2,3]') { |doc| doc.size() } #=> 4
1056
+ */
1057
+ static VALUE
1058
+ doc_open(VALUE clas, VALUE str) {
1059
+ char *json;
1060
+ size_t len;
1061
+
1062
+ Check_Type(str, T_STRING);
1063
+ len = RSTRING_LEN(str) + 1;
1064
+ json = ALLOCA_N(char, len);
1065
+ memcpy(json, StringValuePtr(str), len);
1066
+
1067
+ return parse_json(clas, json);
1068
+ }
1069
+
1070
+ /* call-seq: open_file(filename) { |doc| ... } => Object
1071
+ *
1072
+ * Parses a JSON document from a file and then yields to the provided block
1073
+ * with an instance of the Oj::Doc as the single yield parameter.
1074
+ *
1075
+ * @param [String] filename name of file that contains a JSON document
1076
+ * @yieldparam [Oj::Doc] doc parsed JSON document
1077
+ * @yieldreturn [Object] returns the result of the yield as the result of the method call
1078
+ * @example
1079
+ * File.open('array.json', 'w') { |f| f.write('[1,2,3]') }
1080
+ * Oj::Doc.open_file('array.json') { |doc| doc.size() } #=> 4
1081
+ */
1082
+ static VALUE
1083
+ doc_open_file(VALUE clas, VALUE filename) {
1084
+ char *path;
1085
+ char *json;
1086
+ FILE *f;
1087
+ size_t len;
1088
+
1089
+ Check_Type(filename, T_STRING);
1090
+ path = StringValuePtr(filename);
1091
+ if (0 == (f = fopen(path, "r"))) {
1092
+ rb_raise(rb_eIOError, "%s\n", strerror(errno));
1093
+ }
1094
+ fseek(f, 0, SEEK_END);
1095
+ len = ftell(f);
1096
+ json = ALLOCA_N(char, len + 1);
1097
+ fseek(f, 0, SEEK_SET);
1098
+ if (len != fread(json, 1, len, f)) {
1099
+ fclose(f);
1100
+ rb_raise(rb_eLoadError, "Failed to read %lu bytes from %s.\n", (unsigned long)len, path);
1101
+ }
1102
+ fclose(f);
1103
+ json[len] = '\0';
1104
+
1105
+ return parse_json(clas, json);
1106
+ }
1107
+
1108
+ /* Document-method: parse
1109
+ * @see Oj::Doc.open
1110
+ */
1111
+
1112
+ /* call-seq: where?() => String
1113
+ *
1114
+ * Returns a String that describes the absolute path to the current location
1115
+ * in the JSON document.
1116
+ */
1117
+ static VALUE
1118
+ doc_where(VALUE self) {
1119
+ Doc doc = DATA_PTR(self);
1120
+
1121
+ if (0 == *doc->where_path || doc->where == doc->where_path) {
1122
+ return oj_slash_string;
1123
+ } else {
1124
+ Leaf *lp;
1125
+ Leaf leaf;
1126
+ size_t size = 3; // leading / and terminating \0
1127
+ char *path;
1128
+ char *p;
1129
+
1130
+ for (lp = doc->where_path; lp <= doc->where; lp++) {
1131
+ leaf = *lp;
1132
+ if (T_HASH == leaf->parent_type) {
1133
+ size += strlen((*lp)->key) + 1;
1134
+ } else if (T_ARRAY == leaf->parent_type) {
1135
+ size += ((*lp)->index < 100) ? 3 : 11;
1136
+ }
1137
+ }
1138
+ path = ALLOCA_N(char, size);
1139
+ p = path;
1140
+ for (lp = doc->where_path; lp <= doc->where; lp++) {
1141
+ leaf = *lp;
1142
+ if (T_HASH == leaf->parent_type) {
1143
+ p = stpcpy(p, (*lp)->key);
1144
+ } else if (T_ARRAY == leaf->parent_type) {
1145
+ p = ulong_fill(p, (*lp)->index);
1146
+ }
1147
+ *p++ = '/';
1148
+ }
1149
+ *--p = '\0';
1150
+ return rb_str_new2(path);
1151
+ }
1152
+ }
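The path built by `doc_where` is the list of keys and indexes from the root to the current element, joined with slashes. A minimal Ruby sketch of that idea, using a hypothetical `where_path` helper that is not part of Oj and takes the steps as a plain Array:

```ruby
# Hypothetical helper, not part of Oj: joins the hash keys and 1-based array
# indexes on the way from the root to the current element into an absolute
# path. An empty list of steps stands for the root itself.
def where_path(steps)
  '/' + steps.map(&:to_s).join('/')
end

puts where_path([])          # /
puts where_path(['one', 2])  # /one/2
```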
1153
+
1154
+ /* call-seq: local_key() => String, Fixnum, nil
1155
+ *
1156
+ * Returns the final key to the current location.
1157
+ * @example
1158
+ * Oj::Doc.open('[1,2,3]') { |doc| doc.move('/2'); doc.local_key() } #=> 2
1159
+ * Oj::Doc.open('{"one":3}') { |doc| doc.move('/one'); doc.local_key() } #=> "one"
1160
+ * Oj::Doc.open('[1,2,3]') { |doc| doc.local_key() } #=> nil
1161
+ */
1162
+ static VALUE
1163
+ doc_local_key(VALUE self) {
1164
+ Doc doc = DATA_PTR(self);
1165
+ Leaf leaf = *doc->where;
1166
+ VALUE key = Qnil;
1167
+
1168
+ if (T_HASH == leaf->parent_type) {
1169
+ key = rb_str_new2(leaf->key);
1170
+ #ifdef HAVE_RUBY_ENCODING_H
1171
+ if (0 != doc->encoding) {
1172
+ rb_enc_associate(key, doc->encoding);
1173
+ }
1174
+ #endif
1175
+ } else if (T_ARRAY == leaf->parent_type) {
1176
+ key = LONG2NUM(leaf->index);
1177
+ }
1178
+ return key;
1179
+ }
1180
+
1181
+ /* call-seq: home() => String
1182
+ *
1183
+ * Moves the document marker or location to the root or home position. The
1184
+ * same operation can be performed with an Oj::Doc.move('/').
1185
+ * @example
1186
+ * Oj::Doc.open('[1,2,3]') { |doc| doc.move('/2'); doc.home(); doc.where? } #=> '/'
1187
+ */
1188
+ static VALUE
1189
+ doc_home(VALUE self) {
1190
+ Doc doc = DATA_PTR(self);
1191
+
1192
+ *doc->where_path = doc->data;
1193
+ doc->where = doc->where_path;
1194
+
1195
+ return oj_slash_string;
1196
+ }
1197
+
1198
+ /* call-seq: type(path=nil) => Class
1199
+ *
1200
+ * Returns the Class of the data value at the location identified by the path
1201
+ * or the current location if the path is nil or not provided. This method
1202
+ * does not create the Ruby Object at the location specified so the overhead
1203
+ * is low.
1204
+ * @param [String] path path to the location to get the type of if provided
1205
+ * @example
1206
+ * Oj::Doc.open('[1,2]') { |doc| doc.type() } #=> Array
1207
+ * Oj::Doc.open('[1,2]') { |doc| doc.type('/1') } #=> Fixnum
1208
+ */
1209
+ static VALUE
1210
+ doc_type(int argc, VALUE *argv, VALUE self) {
1211
+ Doc doc = DATA_PTR(self);
1212
+ Leaf leaf;
1213
+ const char *path = 0;
1214
+ VALUE type = Qnil;
1215
+
1216
+ if (1 <= argc) {
1217
+ Check_Type(*argv, T_STRING);
1218
+ path = StringValuePtr(*argv);
1219
+ }
1220
+ if (0 != (leaf = get_doc_leaf(doc, path))) {
1221
+ switch (leaf->type) {
1222
+ case T_NIL: type = rb_cNilClass; break;
1223
+ case T_TRUE: type = rb_cTrueClass; break;
1224
+ case T_FALSE: type = rb_cFalseClass; break;
1225
+ case T_STRING: type = rb_cString; break;
1226
+ case T_FIXNUM: type = rb_cFixnum; break;
1227
+ case T_FLOAT: type = rb_cFloat; break;
1228
+ case T_ARRAY: type = rb_cArray; break;
1229
+ case T_HASH: type = rb_cHash; break;
1230
+ default: break;
1231
+ }
1232
+ }
1233
+ return type;
1234
+ }
1235
+
1236
+ /* call-seq: fetch(path=nil) => nil, true, false, Fixnum, Float, String, Array, Hash
1237
+ *
1238
+ * Returns the value at the location identified by the path or the current
1239
+ * location if the path is nil or not provided. This method will create and
1240
+ * return an Array or Hash if that is the type of Object at the location
1241
+ * specified. This is more expensive than navigating to the leaves of the JSON
1242
+ * document.
1243
+ * @param [String] path path to the location to get the value of if provided
1244
+ * @example
1245
+ * Oj::Doc.open('[1,2]') { |doc| doc.fetch() } #=> [1, 2]
1246
+ * Oj::Doc.open('[1,2]') { |doc| doc.fetch('/1') } #=> 1
1247
+ */
1248
+ static VALUE
1249
+ doc_fetch(int argc, VALUE *argv, VALUE self) {
1250
+ Doc doc = DATA_PTR(self);
1251
+ Leaf leaf;
1252
+ VALUE val = Qnil;
1253
+ const char *path = 0;
1254
+
1255
+ if (1 <= argc) {
1256
+ Check_Type(*argv, T_STRING);
1257
+ path = StringValuePtr(*argv);
1258
+ if (2 == argc) {
1259
+ val = argv[1];
1260
+ }
1261
+ }
1262
+ if (0 != (leaf = get_doc_leaf(doc, path))) {
1263
+ val = leaf_value(doc, leaf);
1264
+ }
1265
+ return val;
1266
+ }
1267
+
1268
+ /* call-seq: each_leaf(path=nil) { |doc| ... } => nil
1269
+ *
1270
+ * Yields to the provided block for each leaf node with the identified
1271
+ * location of the JSON document as the root. The parameter passed to the
1272
+ * block on yield is the Doc instance after moving to the child location.
1273
+ * @param [String] path if provided it identifies the top of the branch to process the leaves of
1274
+ * @yieldparam [Doc] Doc at the child location
1275
+ * @example
1276
+ * Oj::Doc.open('[3,[2,1]]') { |doc|
1277
+ * result = {}
1278
+ * doc.each_leaf() { |d| result[d.where?] = d.fetch() }
1279
+ * result
1280
+ * }
1281
+ * #=> {"/1" => 3, "/2/1" => 2, "/2/2" => 1}
1282
+ */
1283
+ static VALUE
1284
+ doc_each_leaf(int argc, VALUE *argv, VALUE self) {
1285
+ if (rb_block_given_p()) {
1286
+ Leaf save_path[MAX_STACK];
1287
+ Doc doc = DATA_PTR(self);
1288
+ const char *path = 0;
1289
+ size_t wlen;
1290
+
1291
+ wlen = doc->where - doc->where_path;
1292
+ memcpy(save_path, doc->where_path, sizeof(Leaf) * wlen);
1293
+ if (1 <= argc) {
1294
+ Check_Type(*argv, T_STRING);
1295
+ path = StringValuePtr(*argv);
1296
+ if ('/' == *path) {
1297
+ doc->where = doc->where_path;
1298
+ path++;
1299
+ }
1300
+ if (0 != move_step(doc, path, 1)) {
1301
+ memcpy(doc->where_path, save_path, sizeof(Leaf) * wlen);
1302
+ return Qnil;
1303
+ }
1304
+ }
1305
+ each_leaf(doc, self);
1306
+ memcpy(doc->where_path, save_path, sizeof(Leaf) * wlen);
1307
+ }
1308
+ return Qnil;
1309
+ }
1310
+
1311
+ /* call-seq: move(path) => nil
1312
+ *
1313
+ * Moves the document marker to the path specified. The path can be an absolute
1314
+ * path or a relative path.
1315
+ * @param [String] path path to the location to move to
1316
+ * @example
1317
+ * Oj::Doc.open('{"one":[1,2]}') { |doc| doc.move('/one/2'); doc.where? } #=> "/one/2"
1318
+ */
1319
+ static VALUE
1320
+ doc_move(VALUE self, VALUE str) {
1321
+ Doc doc = DATA_PTR(self);
1322
+ const char *path;
1323
+ int loc;
1324
+
1325
+ Check_Type(str, T_STRING);
1326
+ path = StringValuePtr(str);
1327
+ if ('/' == *path) {
1328
+ doc->where = doc->where_path;
1329
+ path++;
1330
+ }
1331
+ if (0 != (loc = move_step(doc, path, 1))) {
1332
+ rb_raise(rb_eArgError, "Failed to locate element %d of the path %s.", loc, path);
1333
+ }
1334
+ return Qnil;
1335
+ }
1336
+
1337
+ /* call-seq: each_child(path=nil) { |doc| ... } => nil
1338
+ *
1339
+ * Yields to the provided block for each immediate child node with the
1340
+ * identified location of the JSON document as the root. The parameter passed
1341
+ * to the block on yield is the Doc instance after moving to the child
1342
+ * location.
1343
+ * @param [String] path if provided it identifies the top of the branch to process the children of
1344
+ * @yieldparam [Doc] Doc at the child location
1345
+ * @example
1346
+ * Oj::Doc.open('[3,[2,1]]') { |doc|
1347
+ * result = []
1348
+ * doc.each_child('/2') { |d| result << d.where? }
1349
+ * result
1350
+ * }
1351
+ * #=> ["/2/1", "/2/2"]
1352
+ */
1353
+ static VALUE
1354
+ doc_each_child(int argc, VALUE *argv, VALUE self) {
1355
+ if (rb_block_given_p()) {
1356
+ Leaf save_path[MAX_STACK];
1357
+ Doc doc = DATA_PTR(self);
1358
+ const char *path = 0;
1359
+ size_t wlen;
1360
+
1361
+ wlen = doc->where - doc->where_path;
1362
+ memcpy(save_path, doc->where_path, sizeof(Leaf) * wlen);
1363
+ if (1 <= argc) {
1364
+ Check_Type(*argv, T_STRING);
1365
+ path = StringValuePtr(*argv);
1366
+ if ('/' == *path) {
1367
+ doc->where = doc->where_path;
1368
+ path++;
1369
+ }
1370
+ if (0 != move_step(doc, path, 1)) {
1371
+ memcpy(doc->where_path, save_path, sizeof(Leaf) * wlen);
1372
+ return Qnil;
1373
+ }
1374
+ }
1375
+ if (COL_VAL == (*doc->where)->value_type && 0 != (*doc->where)->elements) {
1376
+ Leaf first = (*doc->where)->elements->next;
1377
+ Leaf e = first;
1378
+ VALUE args[1];
1379
+
1380
+ *args = self;
1381
+ doc->where++;
1382
+ do {
1383
+ *doc->where = e;
1384
+ rb_yield_values2(1, args);
1385
+ e = e->next;
1386
+ } while (e != first);
1387
+ }
1388
+ memcpy(doc->where_path, save_path, sizeof(Leaf) * wlen);
1389
+ }
1390
+ return Qnil;
1391
+ }
1392
+
1393
+ /* call-seq: each_value(path=nil) { |val| ... } => nil
1394
+ *
1395
+ * Yields to the provided block for each leaf value in the identified location
1396
+ * of the JSON document. The parameter passed to the block on yield is the
1397
+ * value of the leaf. Only those leaves below the element specified by the
1398
+ * path parameter are processed.
1399
+ * @param [String] path if provided it identifies the top of the branch to process the leaf values of
1400
+ * @yieldparam [Object] val each leaf value
1401
+ * @example
1402
+ * Oj::Doc.open('[3,[2,1]]') { |doc|
1403
+ * result = []
1404
+ * doc.each_value() { |v| result << v }
1405
+ * result
1406
+ * }
1407
+ * #=> [3, 2, 1]
1408
+ *
1409
+ * Oj::Doc.open('[3,[2,1]]') { |doc|
1410
+ * result = []
1411
+ * doc.each_value('/2') { |v| result << v }
1412
+ * result
1413
+ * }
1414
+ * #=> [2, 1]
1415
+ */
1416
+ static VALUE
1417
+ doc_each_value(int argc, VALUE *argv, VALUE self) {
1418
+ if (rb_block_given_p()) {
1419
+ Doc doc = DATA_PTR(self);
1420
+ const char *path = 0;
1421
+ Leaf leaf;
1422
+
1423
+ if (1 <= argc) {
1424
+ Check_Type(*argv, T_STRING);
1425
+ path = StringValuePtr(*argv);
1426
+ }
1427
+ if (0 != (leaf = get_doc_leaf(doc, path))) {
1428
+ each_value(doc, leaf);
1429
+ }
1430
+ }
1431
+ return Qnil;
1432
+ }
1433
+
1434
+ // TBD improve to be more direct for higher performance
1435
+
1436
+ /* call-seq: dump(path=nil) => String
1437
+ *
1438
+ * Dumps the document or nodes to a new JSON document. It uses the default
1439
+ * options for generating the JSON.
1440
+ * @param [String] path if provided it identifies the top of the branch to dump to JSON
1441
+ * @example
1442
+ * Oj::Doc.open('[3,[2,1]]') { |doc|
1443
+ * doc.dump('/2')
1444
+ * }
1445
+ * #=> "[2,1]"
1446
+ */
1447
+ static VALUE
1448
+ doc_dump(int argc, VALUE *argv, VALUE self) {
1449
+ Doc doc = DATA_PTR(self);
1450
+ Leaf leaf;
1451
+ const char *path = 0;
1452
+ const char *json;
1453
+
1454
+ if (1 <= argc) {
1455
+ Check_Type(*argv, T_STRING);
1456
+ path = StringValuePtr(*argv);
1457
+ }
1458
+ if (0 != (leaf = get_doc_leaf(doc, path))) {
1459
+ json = oj_write_obj_to_str(leaf_value(doc, leaf), &oj_default_options);
1460
+
1461
+ return rb_str_new2(json);
1462
+ }
1463
+ return Qnil;
1464
+ }
1465
+
1466
+ /* call-seq: size() => Fixnum
1467
+ *
1468
+ * Returns the number of nodes in the JSON document where a node is any one of
1469
+ * the basic JSON components.
1470
+ * @return Returns the size of the JSON document.
1471
+ * @example
1472
+ * Oj::Doc.open('[1,2,3]') { |doc| doc.size() } #=> 4
1473
+ */
1474
+ static VALUE
1475
+ doc_size(VALUE self) {
1476
+ return ULONG2NUM(((Doc)DATA_PTR(self))->size);
1477
+ }
1478
+
1479
+ /* Document-class: Oj::Doc
1480
+ *
1481
+ * The Doc class is used to parse and navigate a JSON document. The model it
1482
+ * employs is that of a document that while open can be navigated and values
1483
+ * extracted. Once the document is closed the document can no longer be
1484
+ * accessed. This allows the parsing and data extraction to be extremely fast
1485
+ * compared to other JSON parsers.
1486
+ *
1487
+ * An Oj::Doc instance is not created directly but the _open()_ class method is
1488
+ * used to open a document and the yield parameter to the block of the #open()
1489
+ * call is the Doc instance. The Doc instance can be moved across, up, and
1490
+ * down the JSON document. At each element the data associated with the
1491
+ * element can be extracted. It is also possible to just provide a path to the
1492
+ * data to be extracted and retrieve the data in that manner.
1493
+ *
1494
+ * For many of the methods a path is used to describe the location of an
1495
+ * element. Paths follow a subset of the XPath syntax. The slash ('/')
1496
+ * character is the separator. Each step in the path identifies the next
1497
+ * branch to take through the document. A JSON object will expect a key string
1498
+ * while an array will expect a positive index. A .. step indicates a move up
1499
+ * the JSON document.
1500
+ *
1501
+ * @example
1502
+ * json = %{[
1503
+ * {
1504
+ * "one" : 1,
1505
+ * "two" : 2
1506
+ * },
1507
+ * {
1508
+ * "three" : 3,
1509
+ * "four" : 4
1510
+ * }
1511
+ * ]}
1512
+ * # move and get value
1513
+ * Oj::Doc.open(json) do |doc|
1514
+ * doc.move('/1/two')
1515
+ * # doc location is now at the 'two' element of the hash that is the first element of the array.
1516
+ * doc.fetch()
1517
+ * end
1518
+ * #=> 2
1519
+ *
1520
+ * # Now try again using a path to Oj::Doc.fetch() directly.
1521
+ * Oj::Doc.open(json) { |doc| doc.fetch('/2/three') } #=> 3
1522
+ */
1523
+ void
1524
+ oj_init_doc() {
1525
+ oj_doc_class = rb_define_class_under(Oj, "Doc", rb_cObject);
1526
+ rb_define_singleton_method(oj_doc_class, "open", doc_open, 1);
1527
+ rb_define_singleton_method(oj_doc_class, "open_file", doc_open_file, 1);
1528
+ rb_define_singleton_method(oj_doc_class, "parse", doc_open, 1);
1529
+ rb_define_method(oj_doc_class, "where?", doc_where, 0);
1530
+ rb_define_method(oj_doc_class, "local_key", doc_local_key, 0);
1531
+ rb_define_method(oj_doc_class, "home", doc_home, 0);
1532
+ rb_define_method(oj_doc_class, "type", doc_type, -1);
1533
+ rb_define_method(oj_doc_class, "fetch", doc_fetch, -1);
1534
+ rb_define_method(oj_doc_class, "each_leaf", doc_each_leaf, -1);
1535
+ rb_define_method(oj_doc_class, "move", doc_move, 1);
1536
+ rb_define_method(oj_doc_class, "each_child", doc_each_child, -1);
1537
+ rb_define_method(oj_doc_class, "each_value", doc_each_value, -1);
1538
+ rb_define_method(oj_doc_class, "dump", doc_dump, -1);
1539
+ rb_define_method(oj_doc_class, "size", doc_size, 0);
1540
+ }