isomorfeus-ferret 0.13.12 → 0.14.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 60ce50bdaf067d95199abf381bd96e0b92fcce70ac7f64ea7e2f73ac1564ff64
4
- data.tar.gz: 9c6eba0bc2630a6d95b55ea5887466a041e50425f1472c9008ef82a841194cf6
3
+ metadata.gz: e6893e7012cf75189d3ff378b6e869a831a5281472f84ea5ab4e354bd92bfcee
4
+ data.tar.gz: 0a4cad49faae062c29e0bed8fd7f87c5e3875c548cbb6e168907719b52777306
5
5
  SHA512:
6
- metadata.gz: 878e93d18fe9a2847599be12478baea42fd010d4aa52f639977a7fb4bc8901771035908c9ceebfd34508ccd02f5d909569d06f88e55df36d8345e5bf35942d3e
7
- data.tar.gz: c00f2909a8ef647a81deff946bedbfe5e4bfd86c175537a114866ca278a388a36c885a612f3cb1c49b582aaea9c9751f86b439d111e1b72205f46f653d7913cc
6
+ metadata.gz: 4e48ec64d99af7fe0440480f11f22fd79dd9fca0c6dd09ce58bc0953f7f556ee4f61bdf18810f5cb2f39c8f80c99b406c5319c93235d06974009cefb1c73fccb
7
+ data.tar.gz: 254960eb7543fb59e1d12087f83feaeb4abd957314f76b56778baf7a2fef2e090922b4b99d653e8532e30690a14d90c661ec5f849588a7e384b2a65346f7f04d
data/README.md CHANGED
@@ -11,13 +11,29 @@ At the [Isomorfeus Framework Project](https://isomorfeus.com)
11
11
 
12
12
  ## About this project
13
13
 
14
- Isomorfeus-Ferret is a revived version of the original ferret gem created by Dave Balmain, [https://github.com/dbalmain/ferret](https://github.com/dbalmain/ferret).
14
+ Isomorfeus-Ferret is a revived version of the original ferret gem created by Dave Balmain,
15
+ [https://github.com/dbalmain/ferret](https://github.com/dbalmain/ferret).
15
16
  During revival many things have been fixed: all tests now pass, there are no crashes, and it
16
17
  successfully compiles and runs with rubys >3. It's no longer a goal to have
17
18
  a C library available; instead, it is meant to be used as a Ruby gem with a C extension only.
18
19
 
19
20
  It works on *nixes, *nuxes, *BSDs and also works on Windows and RaspberryPi.
20
21
 
22
+ ## Improvements and Changes in Version 0.14
23
+
24
+ ### Breaking
25
+
26
+ - The LazyDoc API has changed: LazyDocs are now read-only. LazyDoc#to_h may be used to create a hash that can be modified and reindexed as a document.
27
+
28
+ ### Performance
29
+
30
+ - LazyDoc is now truly lazy: fields are retrieved automatically when accessed. LazyDoc#load is no longer required, but may still be used to preload all fields.
31
+ - Index#each is now multiple times faster, depending on use case.
32
+
33
+ ### Other
34
+
35
+ - The Index class now includes Enumerable
36
+
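The LazyDoc changes listed above (read-only documents, fields loaded on access, `to_h` for an editable copy) can be illustrated with a small, self-contained Ruby sketch. `SketchLazyDoc` and its `loaded_fields` helper are hypothetical names written for this example only; the class models the behaviour the README describes and is not the gem's implementation, which lives in the C extension.

```ruby
# Hypothetical stand-in for a read-only, lazily loading document.
# It only models the behaviour described in the README; it is NOT the gem's LazyDoc.
class SketchLazyDoc
  include Enumerable

  # stored: plays the role of the field data held in the index.
  def initialize(stored)
    @stored = stored
    @loaded = {}
  end

  # Field names are known without loading any values.
  def fields
    @stored.keys
  end

  # Load a single field on first access and cache it; unknown fields give nil.
  def [](key)
    return @loaded[key] if @loaded.key?(key)
    return nil unless @stored.key?(key)
    @loaded[key] = @stored[key].dup.freeze
  end

  # Optional: preload every field at once.
  def load
    fields.each { |f| self[f] }
    self
  end

  # Iterating needs all fields, so it loads them first.
  def each(&block)
    return to_enum(:each) unless block
    load
    @loaded.each(&block)
    self
  end

  # Returns a fresh, mutable Hash; the document itself defines no []= method.
  def to_h
    load
    @loaded.transform_values(&:dup)
  end

  # Helper for this sketch only: which fields have been loaded so far.
  def loaded_fields
    @loaded.keys
  end
end

doc = SketchLazyDoc.new(title: "the title", content: "the content")
doc[:title]                        # loads only :title
partially_loaded = doc.loaded_fields

editable = doc.to_h                # mutable copy, suitable for changing and reindexing
editable[:title] = "a new title"
```

In this sketch the original document keeps its stored values after the copy is modified, which mirrors the README's suggestion to change the hash and reindex it rather than mutate the LazyDoc.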
21
37
  ## Improvements and Changes in Version 0.13
22
38
 
23
39
  ### Breaking
@@ -99,37 +115,42 @@ Ensure your locale is set to C.UTF-8, because the internal c tests don't know ho
99
115
 
100
116
  A recent Java JDK must be installed to compile and run lucene benchmarks.
101
117
 
102
- Results, Ferret 0.13.7 vs. Lucene 9.1.0, WhitespaceAnalyzer,
103
- Linux Ubuntu 20.04, FreeBSD 13.0 and Windows 10 on old Intel Core i5 from 2015,
118
+ Results, Ferret 0.14.0 vs. Lucene 9.1.0, WhitespaceAnalyzer,
119
+ Linux Ubuntu 20.04, FreeBSD 13.1 and Windows 10 on old Intel Core i5 from 2015,
104
120
  LinuxPi on RaspberryPi 400:
105
121
 
106
122
  | OS | Task | Ferret | Lucene* |
107
123
  |---------|------------|-----------------|----------------|
108
- | Linux | Indexing | 4905 docs/s | 4785 docs/s |
109
- | FreeBSD | Indexing | 4516 docs/s | - |
110
- | Windows | Indexing | 2361 docs/s | 2395 docs/s |
111
- | LinuxPi | Indexing | 1161 docs/s | 707 docs/s |
112
- | Linux | Searching | 25664 queries/s | 4708 queries/s |
113
- | FreeBSD | Searching | 25073 queries/s | - |
114
- | Windows | Searching | 3646 queries/s | 935 queries/s |
115
- | LinuxPi | Searching | 5768 queries/s | 680 queries/s |
124
+ | Linux | Indexing | 5125 docs/s | 4671 docs/s |
125
+ | FreeBSD | Indexing | 4537 docs/s | 3831 docs/s |
126
+ | Windows | Indexing | 2488 docs/s | 2588 docs/s |
127
+ | LinuxPi | Indexing | 1200 docs/s | 551 docs/s |
128
+ | Linux | Searching | 26610 queries/s | 7165 queries/s |
129
+ | FreeBSD | Searching | 24167 queries/s | 4288 queries/s |
130
+ | Windows | Searching | 3901 queries/s | 1033 queries/s |
131
+ | LinuxPi | Searching | 6194 queries/s | 769 queries/s |
116
132
  | | Index Size | 28 MB | 35 MB |
117
133
 
118
- *Lucene 9.1.0 on JVM 11.0.14.1 (Ubuntu)
134
+ * JVM Versions:
135
+ OpenJDK Runtime Environment (build 18-ea+36-Ubuntu-1) (Linux)
136
+ OpenJDK Runtime Environment (build 17.0.3+7-Raspbian-1deb11u1rpt1) (LinuxPi)
137
+ OpenJDK Runtime Environment Temurin-18.0.1+10 (build 18.0.1+10) (Windows)
138
+ OpenJDK Runtime Environment (build 17.0.2+8-1) (FreeBSD)
119
139
 
120
140
  ### Storing Fields with Compression, Indexing and Retrieval
141
+
121
142
  - clone repo
122
143
  - bundle install
123
144
  - rake ferret_compression_benchmark
124
145
 
125
- Results on Linux, 0.13.7, on old Intel Core i5 from 2015:
146
+ Results on Linux, 0.14.0, on old Intel Core i5 from 2015:
126
147
 
127
- | Compression | Index & Store | Retrieve | Index size |
128
- |-------------|---------------|---------------|------------|
129
- | none | 4866 docs/s | 153853 docs/s | 43 MB |
130
- | brotli | 3539 docs/s | 58315 docs/s | 36 MB |
131
- | bzip2 | 2624 docs/s | 15382 docs/s | 38 MB |
132
- | lz4 | 4639 docs/s | 127100 docs/s | 41 MB |
148
+ | Compression | Index & Store | Retrieve Title | Index size |
149
+ |-------------|---------------|----------------|------------|
150
+ | none | 4862 docs/s | 278827 docs/s | 43 MB |
151
+ | brotli | 3559 docs/s | 178170 docs/s | 36 MB |
152
+ | bzip2 | 2628 docs/s | 81877 docs/s | 38 MB |
153
+ | lz4 | 4648 docs/s | 232236 docs/s | 41 MB |
133
154
 
134
155
  ## Future
135
156
 
@@ -1,6 +1,5 @@
1
1
  #include "frt_index.h"
2
2
  #include "isomorfeus_ferret.h"
3
- #include <ruby.h>
4
3
 
5
4
  // #undef close
6
5
 
@@ -16,8 +15,6 @@ VALUE cTermVector;
16
15
  VALUE cTermEnum;
17
16
  VALUE cTermDocEnum;
18
17
 
19
- VALUE cLazyDoc;
20
- VALUE cLazyDocData;
21
18
  VALUE cIndexWriter;
22
19
  VALUE cIndexReader;
23
20
 
@@ -59,15 +56,17 @@ static VALUE sym_with_positions_offsets;
59
56
  static ID fsym_content;
60
57
 
61
58
  static ID id_term;
62
- static ID id_fields;
63
59
  static ID id_fld_num_map;
64
60
  static ID id_field_num;
65
61
  static ID id_boost;
66
62
 
63
+ extern VALUE sym_each;
67
64
  extern rb_encoding *utf8_encoding;
68
65
  extern void frb_set_term(VALUE rterm, FrtTerm *t);
69
66
  extern FrtAnalyzer *frb_get_cwrapped_analyzer(VALUE ranalyzer);
70
67
  extern VALUE frb_get_analyzer(FrtAnalyzer *a);
68
+ extern VALUE frb_get_lazy_doc(FrtLazyDoc *lazy_doc);
69
+ extern void Init_LazyDoc(void);
71
70
 
72
71
  /****************************************************************************
73
72
  *
@@ -1987,125 +1986,6 @@ frb_iw_set_use_compound_file(VALUE self, VALUE rval)
1987
1986
  return rval;
1988
1987
  }
1989
1988
 
1990
- /****************************************************************************
1991
- *
1992
- * LazyDoc Methods
1993
- *
1994
- ****************************************************************************/
1995
-
1996
- static void frb_lzd_data_free(void *p) {
1997
- frt_lazy_doc_close((FrtLazyDoc *)p);
1998
- }
1999
-
2000
- static size_t frb_lazy_doc_size(const void *p) {
2001
- return sizeof(FrtLazyDoc);
2002
- (void)p;
2003
- }
2004
-
2005
- const rb_data_type_t frb_lazy_doc_t = {
2006
- .wrap_struct_name = "FrbLazyDoc",
2007
- .function = {
2008
- .dmark = NULL,
2009
- .dfree = frb_lzd_data_free,
2010
- .dsize = frb_lazy_doc_size,
2011
- .dcompact = NULL,
2012
- .reserved = {0},
2013
- },
2014
- .parent = NULL,
2015
- .data = NULL,
2016
- .flags = RUBY_TYPED_FREE_IMMEDIATELY
2017
- };
2018
-
2019
- static VALUE frb_lzd_alloc(VALUE klass) {
2020
- FrtLazyDoc *ld = FRT_ALLOC(FrtLazyDoc);
2021
- return TypedData_Wrap_Struct(klass, &frb_lazy_doc_t, ld);
2022
- }
2023
-
2024
- static VALUE frb_lazy_df_load(VALUE self, VALUE rkey, FrtLazyDocField *lazy_df) {
2025
- VALUE rdata = Qnil;
2026
- if (lazy_df) {
2027
- if (lazy_df->size == 1) {
2028
- char *data = frt_lazy_df_get_data(lazy_df, 0);
2029
- rdata = rb_str_new(data, lazy_df->data[0].length);
2030
- rb_enc_associate(rdata, lazy_df->data[0].encoding);
2031
- } else {
2032
- int i;
2033
- VALUE rstr;
2034
- rdata = rb_ary_new2(lazy_df->size);
2035
- for (i = 0; i < lazy_df->size; i++) {
2036
- char *data = frt_lazy_df_get_data(lazy_df, i);
2037
- rstr = rb_str_new(data, lazy_df->data[i].length);
2038
- rb_enc_associate(rstr, lazy_df->data[i].encoding);
2039
- rb_ary_store(rdata, i, rstr);
2040
- }
2041
- }
2042
- rb_hash_aset(self, rkey, rdata);
2043
- }
2044
- return rdata;
2045
- }
2046
-
2047
- /*
2048
- * call-seq:
2049
- * lazy_doc.default(key) -> string
2050
- *
2051
- * This method is used internally to lazily load fields. You should never
2052
- * really need to call it yourself.
2053
- */
2054
- static VALUE frb_lzd_default(VALUE self, VALUE rkey) {
2055
- FrtLazyDoc *lazy_doc = (FrtLazyDoc *)DATA_PTR(rb_ivar_get(self, id_data));
2056
- ID field = frb_field(rkey);
2057
- VALUE rfield = ID2SYM(field);
2058
-
2059
- return frb_lazy_df_load(self, rfield, frt_lazy_doc_get(lazy_doc, field));
2060
- }
2061
-
2062
- /*
2063
- * call-seq:
2064
- * lazy_doc.fields -> array of available fields
2065
- *
2066
- * Returns the list of fields stored for this particular document. If you try
2067
- * to access any of these fields in the document the field will be loaded.
2068
- * Try to access any other field an nil will be returned.
2069
- */
2070
- static VALUE frb_lzd_fields(VALUE self) {
2071
- return rb_ivar_get(self, id_fields);
2072
- }
2073
-
2074
- /*
2075
- * call-seq:
2076
- * lazy_doc.load -> lazy_doc
2077
- *
2078
- * Load all unloaded fields in the document from the index.
2079
- */
2080
- static VALUE frb_lzd_load(VALUE self) {
2081
- FrtLazyDoc *lazy_doc = (FrtLazyDoc *)DATA_PTR(rb_ivar_get(self, id_data));
2082
- int i;
2083
- for (i = 0; i < lazy_doc->size; i++) {
2084
- FrtLazyDocField *lazy_df = lazy_doc->fields[i];
2085
- frb_lazy_df_load(self, ID2SYM(lazy_df->name), lazy_df);
2086
- }
2087
- return self;
2088
- }
2089
-
2090
- VALUE frb_get_lazy_doc(FrtLazyDoc *lazy_doc) {
2091
- int i;
2092
- VALUE rfields = rb_ary_new2(lazy_doc->size);
2093
-
2094
- VALUE self, rdata;
2095
- self = rb_hash_new();
2096
- OBJSETUP(self, cLazyDoc, T_HASH);
2097
-
2098
- rdata = TypedData_Wrap_Struct(cLazyDocData, &frb_lazy_doc_t, lazy_doc);
2099
- rb_ivar_set(self, id_data, rdata);
2100
-
2101
- for (i = 0; i < lazy_doc->size; i++) {
2102
- rb_ary_store(rfields, i, ID2SYM(lazy_doc->fields[i]->name));
2103
- }
2104
- rb_ivar_set(self, id_fields, rfields);
2105
-
2106
- return self;
2107
- }
2108
-
2109
1989
  /****************************************************************************
2110
1990
  *
2111
1991
  * IndexReader Methods
@@ -2743,12 +2623,33 @@ frb_ir_tk_fields(VALUE self)
2743
2623
  * Returns the current version of the index reader.
2744
2624
  */
2745
2625
  static VALUE
2746
- frb_ir_version(VALUE self)
2747
- {
2626
+ frb_ir_version(VALUE self) {
2748
2627
  FrtIndexReader *ir = (FrtIndexReader *)DATA_PTR(self);
2749
2628
  return ULL2NUM(ir->sis->version);
2750
2629
  }
2751
2630
 
2631
+ static VALUE frb_ir_to_enum(VALUE self) {
2632
+ return rb_enumeratorize(self, sym_each, 0, NULL);
2633
+ }
2634
+
2635
+ static VALUE frb_ir_each(VALUE self) {
2636
+ FrtIndexReader *ir = (FrtIndexReader *)DATA_PTR(self);
2637
+ if (rb_block_given_p()) {
2638
+ long i;
2639
+ long max_doc = ir->max_doc(ir);
2640
+ VALUE rld;
2641
+ for (i = 0; i < max_doc; i++) {
2642
+ if (ir->is_deleted(ir, i)) continue;
2643
+ rld = frb_get_lazy_doc(ir->get_lazy_doc(ir, i));
2644
+ rb_yield(rld);
2645
+ }
2646
+ return self;
2647
+ } else {
2648
+ return frb_ir_to_enum(self);
2649
+ }
2650
+
2651
+ }
2652
+
2752
2653
  /****************************************************************************
2753
2654
  *
2754
2655
  * Init Functions
@@ -3350,48 +3251,6 @@ void Init_IndexWriter(void) {
3350
3251
  rb_define_method(cIndexWriter, "use_compound_file=", frb_iw_set_use_compound_file, 1);
3351
3252
  }
3352
3253
 
3353
- /*
3354
- * Document-class: Ferret::Index::LazyDoc
3355
- *
3356
- * == Summary
3357
- *
3358
- * When a document is retrieved from the index a LazyDoc is returned.
3359
- * Actually, LazyDoc is just a modified Hash object which lazily adds fields
3360
- * to itself when they are accessed. You should note that the keys method
3361
- * will return nothing until you actually access one of the fields. To see
3362
- * what fields are available use LazyDoc#fields rather than LazyDoc#keys. To
3363
- * load all fields use the LazyDoc#load method.
3364
- *
3365
- * == Example
3366
- *
3367
- * doc = index_reader[0]
3368
- *
3369
- * doc.keys #=> []
3370
- * doc.values #=> []
3371
- * doc.fields #=> [:title, :content]
3372
- *
3373
- * title = doc[:title] #=> "the title"
3374
- * doc.keys #=> [:title]
3375
- * doc.values #=> ["the title"]
3376
- * doc.fields #=> [:title, :content]
3377
- *
3378
- * doc.load
3379
- * doc.keys #=> [:title, :content]
3380
- * doc.values #=> ["the title", "the content"]
3381
- * doc.fields #=> [:title, :content]
3382
- */
3383
- void Init_LazyDoc(void) {
3384
- id_fields = rb_intern("@fields");
3385
-
3386
- cLazyDoc = rb_define_class_under(mIndex, "LazyDoc", rb_cHash);
3387
- rb_define_method(cLazyDoc, "default", frb_lzd_default, 1);
3388
- rb_define_method(cLazyDoc, "load", frb_lzd_load, 0);
3389
- rb_define_method(cLazyDoc, "fields", frb_lzd_fields, 0);
3390
-
3391
- cLazyDocData = rb_define_class_under(cLazyDoc, "LazyDocData", rb_cObject);
3392
- rb_define_alloc_func(cLazyDocData, frb_lzd_alloc);
3393
- }
3394
-
3395
3254
  /*
3396
3255
  * Document-class: Ferret::Index::IndexReader
3397
3256
  *
@@ -3405,36 +3264,38 @@ void Init_LazyDoc(void) {
3405
3264
  void Init_IndexReader(void) {
3406
3265
  cIndexReader = rb_define_class_under(mIndex, "IndexReader", rb_cObject);
3407
3266
  rb_define_alloc_func(cIndexReader, frb_ir_alloc);
3408
- rb_define_method(cIndexReader, "initialize", frb_ir_init, 1);
3409
- rb_define_method(cIndexReader, "set_norm", frb_ir_set_norm, 3);
3410
- rb_define_method(cIndexReader, "norms", frb_ir_norms, 1);
3267
+ rb_define_method(cIndexReader, "initialize", frb_ir_init, 1);
3268
+ rb_define_method(cIndexReader, "set_norm", frb_ir_set_norm, 3);
3269
+ rb_define_method(cIndexReader, "norms", frb_ir_norms, 1);
3411
3270
  rb_define_method(cIndexReader, "get_norms_into", frb_ir_get_norms_into, 3);
3412
- rb_define_method(cIndexReader, "commit", frb_ir_commit, 0);
3413
- rb_define_method(cIndexReader, "close", frb_ir_close, 0);
3271
+ rb_define_method(cIndexReader, "commit", frb_ir_commit, 0);
3272
+ rb_define_method(cIndexReader, "close", frb_ir_close, 0);
3414
3273
  rb_define_method(cIndexReader, "has_deletions?", frb_ir_has_deletions, 0);
3415
- rb_define_method(cIndexReader, "delete", frb_ir_delete, 1);
3416
- rb_define_method(cIndexReader, "deleted?", frb_ir_is_deleted, 1);
3417
- rb_define_method(cIndexReader, "max_doc", frb_ir_max_doc, 0);
3418
- rb_define_method(cIndexReader, "num_docs", frb_ir_num_docs, 0);
3419
- rb_define_method(cIndexReader, "undelete_all", frb_ir_undelete_all, 0);
3420
- rb_define_method(cIndexReader, "latest?", frb_ir_is_latest, 0);
3421
- rb_define_method(cIndexReader, "get_document", frb_ir_get_doc, -1);
3422
- rb_define_method(cIndexReader, "[]", frb_ir_get_doc, -1);
3423
- rb_define_method(cIndexReader, "term_vector", frb_ir_term_vector, 2);
3424
- rb_define_method(cIndexReader, "term_vectors", frb_ir_term_vectors, 1);
3425
- rb_define_method(cIndexReader, "term_docs", frb_ir_term_docs, 0);
3274
+ rb_define_method(cIndexReader, "delete", frb_ir_delete, 1);
3275
+ rb_define_method(cIndexReader, "deleted?", frb_ir_is_deleted, 1);
3276
+ rb_define_method(cIndexReader, "max_doc", frb_ir_max_doc, 0);
3277
+ rb_define_method(cIndexReader, "num_docs", frb_ir_num_docs, 0);
3278
+ rb_define_method(cIndexReader, "undelete_all", frb_ir_undelete_all, 0);
3279
+ rb_define_method(cIndexReader, "latest?", frb_ir_is_latest, 0);
3280
+ rb_define_method(cIndexReader, "get_document", frb_ir_get_doc, -1);
3281
+ rb_define_method(cIndexReader, "[]", frb_ir_get_doc, -1);
3282
+ rb_define_method(cIndexReader, "term_vector", frb_ir_term_vector, 2);
3283
+ rb_define_method(cIndexReader, "term_vectors", frb_ir_term_vectors, 1);
3284
+ rb_define_method(cIndexReader, "term_docs", frb_ir_term_docs, 0);
3426
3285
  rb_define_method(cIndexReader, "term_positions", frb_ir_term_positions, 0);
3427
3286
  rb_define_method(cIndexReader, "term_docs_for", frb_ir_term_docs_for, 2);
3428
3287
  rb_define_method(cIndexReader, "term_positions_for", frb_ir_t_pos_for, 2);
3429
- rb_define_method(cIndexReader, "doc_freq", frb_ir_doc_freq, 2);
3430
- rb_define_method(cIndexReader, "terms", frb_ir_terms, 1);
3431
- rb_define_method(cIndexReader, "terms_from", frb_ir_terms_from, 2);
3432
- rb_define_method(cIndexReader, "term_count", frb_ir_term_count, 1);
3433
- rb_define_method(cIndexReader, "fields", frb_ir_fields, 0);
3434
- rb_define_method(cIndexReader, "field_names", frb_ir_fields, 0);
3435
- rb_define_method(cIndexReader, "field_infos", frb_ir_field_infos, 0);
3436
- rb_define_method(cIndexReader, "tokenized_fields", frb_ir_tk_fields, 0);
3437
- rb_define_method(cIndexReader, "version", frb_ir_version, 0);
3288
+ rb_define_method(cIndexReader, "doc_freq", frb_ir_doc_freq, 2);
3289
+ rb_define_method(cIndexReader, "terms", frb_ir_terms, 1);
3290
+ rb_define_method(cIndexReader, "terms_from", frb_ir_terms_from, 2);
3291
+ rb_define_method(cIndexReader, "term_count", frb_ir_term_count, 1);
3292
+ rb_define_method(cIndexReader, "fields", frb_ir_fields, 0);
3293
+ rb_define_method(cIndexReader, "field_names", frb_ir_fields, 0);
3294
+ rb_define_method(cIndexReader, "field_infos", frb_ir_field_infos, 0);
3295
+ rb_define_method(cIndexReader, "tokenized_fields", frb_ir_tk_fields, 0);
3296
+ rb_define_method(cIndexReader, "version", frb_ir_version, 0);
3297
+ rb_define_method(cIndexReader, "each", frb_ir_each, 0);
3298
+ rb_define_method(cIndexReader, "to_enum", frb_ir_to_enum, 0);
3438
3299
  }
3439
3300
 
3440
3301
  /* rdoc hack
@@ -0,0 +1,705 @@
1
+ #include "frt_index.h"
2
+ #include "isomorfeus_ferret.h"
3
+
4
+ extern VALUE rb_hash_update(int argc, VALUE *argv, VALUE self);
5
+
6
+ extern VALUE sym_each;
7
+ extern ID id_eql;
8
+
9
+ static VALUE sym_each_key;
10
+ static VALUE sym_each_value;
11
+ static ID id_compact;
12
+ static ID id_equal;
13
+ static ID id_except;
14
+ static ID id_fields;
15
+ static ID id_flatten;
16
+ static ID id_ge;
17
+ static ID id_get;
18
+ static ID id_gt;
19
+ static ID id_inspect;
20
+ static ID id_invert;
21
+ static ID id_le;
22
+ static ID id_merge_bang;
23
+ static ID id_reject;
24
+ static ID id_select;
25
+ static ID id_size;
26
+ static ID id_slice;
27
+ static ID id_to_h;
28
+ static ID id_to_proc;
29
+ static ID id_transform_keys;
30
+ static ID id_transform_values;
31
+
32
+ FrtLazyDoc empty_lazy_doc = {0};
33
+ VALUE cLazyDoc;
34
+
35
+ typedef struct rLazyDoc {
36
+ FrtHash *hash;
37
+ FrtLazyDoc *doc;
38
+ } rLazyDoc;
39
+
40
+ /****************************************************************************
41
+ *
42
+ * LazyDoc Methods
43
+ *
44
+ ****************************************************************************/
45
+
46
+ static void frb_ld_free(void *p) {
47
+ rLazyDoc *rld = (rLazyDoc *)p;
48
+ if (rld->doc != &empty_lazy_doc) {
49
+ frt_lazy_doc_close(rld->doc);
50
+ }
51
+ frt_h_destroy(rld->hash);
52
+ free(rld);
53
+ }
54
+
55
+ static size_t frb_ld_size(const void *p) {
56
+ return sizeof(rLazyDoc);
57
+ (void)p;
58
+ }
59
+
60
+ void rld_mark(void *key, void *value, void *arg) {
61
+ rb_gc_mark((VALUE)value);
62
+ }
63
+
64
+ static void frb_ld_mark(void *p) {
65
+ frt_h_each(((rLazyDoc *)p)->hash, rld_mark, NULL);
66
+ }
67
+
68
+ const rb_data_type_t frb_ld_t = {
69
+ .wrap_struct_name = "FrbLazyDoc",
70
+ .function = {
71
+ .dmark = frb_ld_mark,
72
+ .dfree = frb_ld_free,
73
+ .dsize = frb_ld_size,
74
+ .dcompact = NULL,
75
+ .reserved = {0},
76
+ },
77
+ .parent = NULL,
78
+ .data = NULL,
79
+ .flags = RUBY_TYPED_FREE_IMMEDIATELY
80
+ };
81
+
82
+ VALUE frb_get_lazy_doc(FrtLazyDoc *lazy_doc) {
83
+ rLazyDoc *rld = FRT_ALLOC(rLazyDoc);
84
+ rld->hash = frt_h_new_ptr(NULL);
85
+ rld->doc = lazy_doc;
86
+ return TypedData_Wrap_Struct(cLazyDoc, &frb_ld_t, rld);
87
+ }
88
+
89
+ static VALUE frb_ld_alloc(VALUE rclass) {
90
+ rLazyDoc *rld = FRT_ALLOC(rLazyDoc);
91
+ rld->hash = frt_h_new_ptr(NULL);
92
+ rld->doc = &empty_lazy_doc;
93
+ return TypedData_Wrap_Struct(rclass, &frb_ld_t, rld);
94
+ }
95
+
96
+ static VALUE frb_ld_df_load(VALUE self, VALUE rkey, FrtLazyDocField *lazy_df) {
97
+ rLazyDoc *rld = DATA_PTR(self);
98
+ VALUE rdata;
99
+ if (lazy_df->size == 1) {
100
+ char *data = frt_lazy_df_get_data(lazy_df, 0);
101
+ rdata = rb_str_new(data, lazy_df->data[0].length);
102
+ rb_enc_associate(rdata, lazy_df->data[0].encoding);
103
+ } else {
104
+ int i;
105
+ VALUE rstr;
106
+ rdata = rb_ary_new2(lazy_df->size);
107
+ for (i = 0; i < lazy_df->size; i++) {
108
+ char *data = frt_lazy_df_get_data(lazy_df, i);
109
+ rstr = rb_str_new(data, lazy_df->data[i].length);
110
+ rb_enc_associate(rstr, lazy_df->data[i].encoding);
111
+ rb_ary_store(rdata, i, rstr);
112
+ }
113
+ }
114
+ frt_h_set(rld->hash, (void *)rkey, (void *)rdata);
115
+ return rdata;
116
+ }
117
+
118
+ /*
119
+ * call-seq:
120
+ * lazy_doc.load -> lazy_doc
121
+ *
122
+ * Load all unloaded fields in the document from the index.
123
+ */
124
+ static VALUE frb_ld_load(VALUE self) {
125
+ rLazyDoc *rld = DATA_PTR(self);
126
+ FrtLazyDoc *ld = rld->doc;
127
+ if (ld->loaded) return self;
128
+ int i;
129
+ FrtLazyDocField *lazy_df;
130
+ for (i = 0; i < ld->size; i++) {
131
+ lazy_df = ld->fields[i];
132
+ if (!(lazy_df->loaded)) frb_ld_df_load(self, ID2SYM(lazy_df->name), lazy_df);
133
+ }
134
+ ld->loaded = true;
135
+ return self;
136
+ }
137
+
138
+ /*
139
+ * call-seq:
140
+ * lazy_doc.fields -> array of available fields
141
+ *
142
+ * Returns the list of fields stored for this particular document. If you try
143
+ * to access any of these fields in the document the field will be loaded.
144
+ * If you try to access any other field, nil will be returned.
145
+ */
146
+ static VALUE frb_ld_fields(VALUE self) {
147
+ FrtLazyDoc *ld = ((rLazyDoc *)DATA_PTR(self))->doc;
148
+ VALUE rfields = rb_ivar_get(self, id_fields);
149
+ if (rfields == Qnil) {
150
+ int i;
151
+ rfields = rb_ary_new2(ld->size);
152
+ for (i = 0; i < ld->size; i++) {
153
+ rb_ary_store(rfields, i, ID2SYM(ld->fields[i]->name));
154
+ }
155
+ rb_ivar_set(self, id_fields, rfields);
156
+ }
157
+ return rfields;
158
+ }
159
+
160
+ void rld_to_hash(void *key, void *value, void *arg) {
161
+ rb_hash_aset((VALUE)arg, (VALUE)key, (VALUE)value);
162
+ }
163
+
164
+ static VALUE frb_ld_to_h(VALUE self) {
165
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
166
+ if (!rld->doc->loaded) frb_ld_load(self);
167
+ VALUE hash = rb_hash_new();
168
+ frt_h_each(rld->hash, rld_to_hash, (void *)hash);
169
+ return hash;
170
+ }
171
+
172
+ static VALUE frb_ld_lt(VALUE self, VALUE other) {
173
+ VALUE other_h;
174
+ if (TYPE(other) == T_HASH) {
175
+ other_h = other;
176
+ } else {
177
+ rLazyDoc *other_rld;
178
+ TypedData_Get_Struct(other, rLazyDoc, &frb_ld_t, other_rld);
179
+ other_h = frb_ld_to_h(other);
180
+ }
181
+ VALUE self_h = frb_ld_to_h(self);
182
+ return rb_funcall(self_h, id_lt, 1, other_h);
183
+ }
184
+
185
+ static VALUE frb_ld_le(VALUE self, VALUE other) {
186
+ VALUE other_h;
187
+ if (TYPE(other) == T_HASH) {
188
+ other_h = other;
189
+ } else {
190
+ rLazyDoc *other_rld;
191
+ TypedData_Get_Struct(other, rLazyDoc, &frb_ld_t, other_rld);
192
+ other_h = frb_ld_to_h(other);
193
+ }
194
+ VALUE self_h = frb_ld_to_h(self);
195
+ return rb_funcall(self_h, id_le, 1, other_h);
196
+ }
197
+
198
+ static VALUE frb_ld_equal(VALUE self, VALUE other) {
199
+ FrtLazyDoc *ld = ((rLazyDoc *)DATA_PTR(self))->doc;
200
+ int other_size;
201
+ VALUE other_h;
202
+ if (TYPE(other) == T_HASH) {
203
+ other_h = other;
204
+ other_size = FIX2INT(rb_funcall(other_h, id_size, 0));
205
+ } else {
206
+ rLazyDoc *other_rld;
207
+ TypedData_Get_Struct(other, rLazyDoc, &frb_ld_t, other_rld);
208
+ other_h = frb_ld_to_h(other);
209
+ other_size = other_rld->doc->size;
210
+ }
211
+ if (ld->size == other_size) {
212
+ VALUE self_h = frb_ld_to_h(self);
213
+ return rb_funcall(self_h, id_equal, 1, other_h);
214
+ }
215
+ return Qfalse;
216
+ }
217
+
218
+ static VALUE frb_ld_gt(VALUE self, VALUE other) {
219
+ VALUE other_h;
220
+ if (TYPE(other) == T_HASH) {
221
+ other_h = other;
222
+ } else {
223
+ rLazyDoc *other_rld;
224
+ TypedData_Get_Struct(other, rLazyDoc, &frb_ld_t, other_rld);
225
+ other_h = frb_ld_to_h(other);
226
+ }
227
+ VALUE self_h = frb_ld_to_h(self);
228
+ return rb_funcall(self_h, id_gt, 1, other_h);
229
+ }
230
+
231
+ static VALUE frb_ld_ge(VALUE self, VALUE other) {
232
+ VALUE other_h;
233
+ if (TYPE(other) == T_HASH) {
234
+ other_h = other;
235
+ } else {
236
+ rLazyDoc *other_rld;
237
+ TypedData_Get_Struct(other, rLazyDoc, &frb_ld_t, other_rld);
238
+ other_h = frb_ld_to_h(other);
239
+ }
240
+ VALUE self_h = frb_ld_to_h(self);
241
+ return rb_funcall(self_h, id_ge, 1, other_h);
242
+ }
243
+
244
+ static VALUE frb_ld_get(VALUE self, VALUE key) {
245
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
246
+ VALUE rval = (VALUE)frt_h_get(rld->hash, (void *)key);
247
+ if (rval) return rval;
248
+ if (TYPE(key) != T_SYMBOL) rb_raise(rb_eArgError, "key, must be a symbol");
249
+ FrtLazyDocField *df = frt_h_get(rld->doc->field_dictionary, (void *)SYM2ID(key));
250
+ if (df) return frb_ld_df_load(self, key, df);
251
+ return Qnil;
252
+ }
253
+
254
+ void rld_any(void *key, void *value, void *arg) {
255
+ VALUE *v = arg;
256
+ *v = rb_yield_values(2, (VALUE)key, (VALUE)value);
257
+ }
258
+
259
+ static VALUE frb_ld_assoc(VALUE self, VALUE key) {
260
+ rLazyDoc *rld = DATA_PTR(self);
261
+ VALUE value = (VALUE)frt_h_get(rld->hash, (void *)key);
262
+ if (!value) {
263
+ FrtLazyDoc *ld = rld->doc;
264
+ FrtLazyDocField *df = frt_h_get(ld->field_dictionary, (void *)SYM2ID(key));
265
+ if (!df) return Qnil;
266
+ if (df && !df->loaded) value = frb_ld_df_load(self, key, df);
267
+ }
268
+ VALUE a[2] = {key, value};
269
+ return rb_ary_new_from_values(2, a);
270
+ }
271
+
272
+ static VALUE frb_ld_any(int argc, VALUE *argv, VALUE self) {
273
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
274
+ FrtLazyDoc *ld = rld->doc;
275
+ if (argc == 0) {
276
+ if (!rb_block_given_p()) {
277
+ return (ld->size > 0) ? Qtrue : Qfalse;
278
+ } else {
279
+ if (!ld->loaded) frb_ld_load(self);
280
+ VALUE res = Qnil;
281
+ frt_h_each(rld->hash, rld_any, &res);
282
+ if (RTEST(res)) return Qtrue;
283
+ else return Qfalse;
284
+ }
285
+ } else if (argc == 1) {
286
+ VALUE obj = argv[0];
287
+ VALUE key = rb_funcall(obj, id_get, 1, 0);
288
+ VALUE a = frb_ld_assoc(self, key);
289
+ return rb_funcall(a, id_equal, 1, obj);
290
+ }
291
+ rb_raise(rb_eArgError, "at most one arg may be given");
292
+ return Qfalse;
293
+ }
294
+
295
+ static VALUE frb_ld_compact(VALUE self) {
296
+ VALUE hash = frb_ld_to_h(self);
297
+ return rb_funcall(hash, id_compact, 0);
298
+ }
299
+
300
+ static VALUE frb_ld_dig(int argc, VALUE *argv, VALUE self) {
301
+ if (argc == 0) rb_raise(rb_eArgError, "at least a key must be given");
302
+ VALUE key = argv[0];
303
+ if (TYPE(key) != T_SYMBOL) rb_raise(rb_eArgError, "first arg, key, must be a symbol");
304
+ VALUE value = frb_ld_get(self, key);
305
+ if (argc == 1) return value;
306
+ if (TYPE(value) == T_ARRAY && argc == 2) {
307
+ return rb_ary_entry(value, NUM2LONG(argv[1]));
308
+ }
309
+ return Qnil;
310
+ }
311
+
312
+ static VALUE frb_ld_to_enum(VALUE self) {
313
+ return rb_enumeratorize(self, sym_each, 0, NULL);
314
+ }
315
+
316
+ static VALUE frb_ld_to_key_enum(VALUE self) {
317
+ return rb_enumeratorize(self, sym_each_key, 0, NULL);
318
+ }
319
+
320
+ static VALUE frb_ld_to_value_enum(VALUE self) {
321
+ return rb_enumeratorize(self, sym_each_value, 0, NULL);
322
+ }
323
+
324
+ void rld_each(void *key, void *value, void *arg) {
325
+ rb_yield_values(2, (VALUE)key, (VALUE)value);
326
+ }
327
+
328
+ static VALUE frb_ld_each(VALUE self) {
329
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
330
+ FrtLazyDoc *ld = rld->doc;
331
+ if (!ld->loaded) frb_ld_load(self);
332
+ if (rb_block_given_p()) {
333
+ frt_h_each(rld->hash, rld_each, NULL);
334
+ return self;
335
+ } else {
336
+ return frb_ld_to_enum(self);
337
+ }
338
+ }
339
+
340
+ void rld_each_key(void *key, void *value, void *arg) {
341
+ rb_yield((VALUE)key);
342
+ }
343
+
344
+ static VALUE frb_ld_each_key(VALUE self) {
345
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
346
+ FrtLazyDoc *ld = rld->doc;
347
+ if (!ld->loaded) frb_ld_load(self);
348
+ if (rb_block_given_p()) {
349
+ frt_h_each(rld->hash, rld_each_key, NULL);
350
+ return self;
351
+ } else {
352
+ return frb_ld_to_key_enum(self);
353
+ }
354
+ }
355
+
356
+ void rld_each_value(void *key, void *value, void *arg) {
357
+ rb_yield((VALUE)value);
358
+ }
359
+
360
+ static VALUE frb_ld_each_value(VALUE self) {
361
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
362
+ FrtLazyDoc *ld = rld->doc;
363
+ if (!ld->loaded) frb_ld_load(self);
364
+ if (rb_block_given_p()) {
365
+ frt_h_each(rld->hash, rld_each_value, NULL);
366
+ return self;
367
+ } else {
368
+ return frb_ld_to_value_enum(self);
369
+ }
370
+ }
371
+
372
+ static VALUE frb_ld_empty(VALUE self) {
373
+ FrtLazyDoc *ld = ((rLazyDoc *)DATA_PTR(self))->doc;
374
+ return (ld->size == 0) ? Qtrue : Qfalse;
375
+ }
376
+
377
+ static VALUE frb_ld_eql(VALUE self, VALUE other) {
378
+ FrtLazyDoc *ld = ((rLazyDoc *)DATA_PTR(self))->doc;
379
+ rLazyDoc *other_rld;
380
+ int other_size;
381
+ VALUE other_h;
382
+ if (TYPE(other) == T_HASH) {
383
+ other_h = other;
384
+ other_size = FIX2INT(rb_funcall(other_h, id_size, 0));
385
+ } else {
386
+ TypedData_Get_Struct(other, rLazyDoc, &frb_ld_t, other_rld);
387
+ other_h = frb_ld_to_h(other);
388
+ other_size = other_rld->doc->size;
389
+ }
390
+ if (ld->size == other_size) {
391
+ VALUE self_h = frb_ld_to_h(self);
392
+ return rb_funcall(self_h, id_eql, 1, other_h);
393
+ }
394
+ return Qfalse;
395
+ }
396
+
397
+ static VALUE frb_ld_except(int argc, VALUE *argv, VALUE self) {
398
+ VALUE hash = frb_ld_to_h(self);
399
+ return rb_funcallv(hash, id_except, argc, argv);
400
+ }
401
+
402
+ static VALUE frb_ld_fetch(int argc, VALUE *argv, VALUE self) {
403
+ VALUE key = argv[0];
404
+ if (TYPE(key) != T_SYMBOL) rb_raise(rb_eArgError, "first arg must be a symbol");
405
+ VALUE res = frb_ld_get(self, key);
406
+ if (argc == 1) {
407
+ if (res == Qnil && rb_block_given_p()) return rb_yield(key);
408
+ return res;
409
+ }
410
+ if (argc == 2) {
411
+ if (res == Qnil) return argv[1];
412
+ return res;
413
+ }
414
+ rb_raise(rb_eArgError, "too many args, only two allowed: key, default_value");
415
+ }
416
+
417
+ static VALUE frb_ld_fetch_values(int argc, VALUE *argv, VALUE self) {
418
+ rLazyDoc *rld = DATA_PTR(self);
419
+ if (!rld->doc->loaded) frb_ld_load(self);
420
+ VALUE ary = rb_ary_new();
421
+ int i;
422
+ VALUE value;
423
+ for (i=0; i<argc; i++) {
424
+ value = (VALUE)frt_h_get(rld->hash, (void *)argv[i]);
425
+ if (value) rb_ary_push(ary, value);
426
+ else if (rb_block_given_p()) {
427
+ value = rb_yield(argv[i]);
428
+ rb_ary_push(ary, value);
429
+ }
430
+ }
431
+ if (FIX2INT(rb_funcall(ary, id_size, 0)) == 0) rb_raise(rb_eException, "nothing found for given keys");
432
+ return ary;
433
+ }
434
+
435
+ static VALUE frb_ld_filter(VALUE self) {
436
+ VALUE hash = frb_ld_to_h(self);
437
+ return rb_funcall_passing_block(hash, id_select, 0, NULL);
438
+ }
439
+
440
+ void rld_flatten(void *key, void *value, void *arg) {
441
+ rb_ary_push((VALUE)arg, (VALUE)key);
442
+ rb_ary_push((VALUE)arg, (VALUE)value);
443
+ }
444
+
445
+ static VALUE frb_ld_flatten(int argc, VALUE *argv, VALUE self) {
446
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
447
+ if (!rld->doc->loaded) frb_ld_load(self);
448
+ VALUE ary = rb_ary_new();
449
+ frt_h_each(rld->hash, rld_flatten, (void *)ary);
450
+ if (argc == 1) {
451
+ int level = FIX2INT(argv[0]) - 1;
452
+ VALUE rlevel = INT2FIX(level);
453
+ rb_funcall(ary, id_flatten, 1, rlevel);
454
+ }
455
+ return ary;
456
+ }
457
+
458
+ static VALUE frb_ld_has_key(VALUE self, VALUE key) {
459
+ if (TYPE(key) != T_SYMBOL) rb_raise(rb_eArgError, "arg must be a symbol");
460
+ VALUE hk = Qfalse;
461
+ FrtLazyDoc *ld = ((rLazyDoc *)DATA_PTR(self))->doc;
462
+ ID dfkey = SYM2ID(key);
463
+ FrtLazyDocField *df = frt_h_get(ld->field_dictionary, (void *)dfkey);
464
+ if (df) hk = Qtrue;
465
+ return hk;
466
+ }
467
+
468
+ static VALUE frb_ld_has_value(VALUE self, VALUE value) {
469
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
470
+ FrtLazyDoc *ld = rld->doc;
471
+ if (!ld->loaded) frb_ld_load(self);
472
+ int i;
473
+ VALUE hvalue;
474
+ for (i=0; i<ld->size; i++) {
475
+ hvalue = (VALUE)frt_h_get(rld->hash, (void *)ID2SYM(ld->fields[i]->name));
476
+ hvalue = rb_funcall(hvalue, id_equal, 1, value);
477
+ if (hvalue == Qtrue) return Qtrue;
478
+ }
479
+ return Qfalse;
480
+ }
481
+
482
+ static VALUE frb_ld_inspect(VALUE self) {
483
+ VALUE hash = frb_ld_to_h(self);
484
+ return rb_funcall(hash, id_inspect, 0);
485
+ }
486
+
487
+ static VALUE frb_ld_invert(VALUE self) {
488
+ VALUE hash = frb_ld_to_h(self);
489
+ return rb_funcall(hash, id_invert, 0);
490
+ }
491
+
492
+ static VALUE frb_ld_key(VALUE self, VALUE value) {
493
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
494
+ FrtLazyDoc *ld = rld->doc;
495
+ if (!ld->loaded) frb_ld_load(self);
496
+ int i;
497
+ VALUE hvalue;
498
+ for (i=0; i<ld->size; i++) {
499
+ hvalue = (VALUE)frt_h_get(rld->hash, (void *)ID2SYM(ld->fields[i]->name));
500
+ hvalue = rb_funcall(hvalue, id_equal, 1, value);
501
+ if (hvalue == Qtrue) return ID2SYM(ld->fields[i]->name);
502
+ }
503
+ return Qnil;
504
+ }
505
+
506
+ static VALUE frb_ld_length(VALUE self) {
507
+ FrtLazyDoc *ld = ((rLazyDoc *)DATA_PTR(self))->doc;
508
+ return INT2FIX(ld->size);
509
+ }
510
+
511
+ static VALUE frb_ld_merge(int argc, VALUE *argv, VALUE self) {
512
+ rLazyDoc *rld = (rLazyDoc *)DATA_PTR(self);
513
+ if (!rld->doc->loaded) frb_ld_load(self);
514
+ VALUE hash = frb_ld_to_h(self);
515
+ return rb_funcall_passing_block(hash, id_merge_bang, argc, argv);
516
+ }
517
+
518
+ static VALUE frb_ld_rassoc(VALUE self, VALUE value) {
519
+ VALUE key = frb_ld_key(self, value);
520
+ if (key == Qnil) return Qnil;
521
+ VALUE a[2] = {key, value};
522
+ return rb_ary_new_from_values(2, a);
523
+ }
524
+
525
+ static VALUE frb_ld_reject(VALUE self) {
526
+ VALUE hash = frb_ld_to_h(self);
527
+ return rb_funcall_passing_block(hash, id_reject, 0, NULL);
528
+ }
529
+
530
+ static VALUE frb_ld_slice(int argc, VALUE *argv, VALUE self) {
531
+ VALUE hash = frb_ld_to_h(self);
532
+ return rb_funcallv(hash, id_slice, argc, argv);
533
+ }
534
+
535
+ void rld_to_a(void *key, void *value, void *arg) {
536
+ VALUE ary = rb_ary_new();
537
+ rb_ary_push(ary, (VALUE)key);
538
+ rb_ary_push(ary, (VALUE)value);
539
+ rb_ary_push((VALUE)arg, ary);
540
+ }
541
+
542
+ static VALUE frb_ld_to_a(VALUE self) {
543
+ rLazyDoc *rld = DATA_PTR(self);
544
+ if (!rld->doc->loaded) frb_ld_load(self);
545
+ VALUE ary = rb_ary_new();
546
+ frt_h_each(rld->hash, rld_to_a, (void *)ary);
547
+ return ary;
548
+ }
549
+
550
+ static VALUE frb_ld_to_ha(VALUE self) {
551
+ VALUE hash = frb_ld_to_h(self);
552
+ if (!rb_block_given_p()) return hash;
553
+ return rb_funcall_passing_block(hash, id_to_h, 0, NULL);
554
+ }
555
+
556
+ static VALUE frb_ld_to_proc(VALUE self) {
557
+ VALUE hash = frb_ld_to_h(self);
558
+ return rb_funcall(hash, id_to_proc, 0);
559
+ }
560
+
561
+ static VALUE frb_ld_transform_keys(int argc, VALUE *argv, VALUE self) {
562
+ VALUE hash = frb_ld_to_h(self);
563
+ return rb_funcall_passing_block(hash, id_transform_keys, argc, argv);
564
+ }
565
+
566
+ static VALUE frb_ld_transform_values(VALUE self) {
567
+ VALUE hash = frb_ld_to_h(self);
568
+ return rb_funcall_passing_block(hash, id_transform_values, 0, NULL);
569
+ }
570
+
571
+ void rld_values(void *key, void *value, void *arg) {
572
+ rb_ary_push((VALUE)arg, (VALUE)value);
573
+ }
574
+
575
+ static VALUE frb_ld_values(VALUE self) {
576
+ rLazyDoc *rld = DATA_PTR(self);
577
+ if (!rld->doc->loaded) frb_ld_load(self);
578
+ VALUE ary = rb_ary_new();
579
+ frt_h_each(rld->hash, rld_values, (void *)ary);
580
+ return ary;
581
+ }
582
+
583
+ static VALUE frb_ld_values_at(int argc, VALUE *argv, VALUE self) {
584
+ rLazyDoc *rld = DATA_PTR(self);
585
+ if (!rld->doc->loaded) frb_ld_load(self);
586
+ VALUE ary = rb_ary_new();
587
+ int i;
588
+ VALUE value;
589
+ for (i=0; i<argc; i++) {
590
+ value = (VALUE)frt_h_get(rld->hash, (void *)argv[i]);
591
+ if (value) rb_ary_push(ary, value);
592
+ else rb_ary_push(ary, Qnil);
593
+ }
594
+ return ary;
595
+ }
596
+
597
+ /*
598
+ * Document-class: Ferret::Index::LazyDoc
599
+ *
600
+ * == Summary
601
+ *
602
+ * When a document is retrieved from the index a LazyDoc is returned.
603
+ * It mimics Ruby's Hash class without inheriting from it, and it is read only.
604
+ * LazyDoc lazily adds fields to itself when they are accessed or
605
+ * automatically loads all fields if needed.
606
+ * To load all fields use the LazyDoc#load method.
607
+ * Methods from the Hash class, that would modify the LazyDoc itself,
608
+ * are not supported.
609
+ *
610
+ * == Example
611
+ *
612
+ * doc = index_reader[0]
613
+ *
614
+ * doc.keys #=> []
615
+ * doc.values #=> []
616
+ * doc.fields #=> [:title, :content]
617
+ *
618
+ * title = doc[:title] #=> "the title"
619
+ * doc.keys #=> [:title]
620
+ * doc.values #=> ["the title"]
621
+ * doc.fields #=> [:title, :content]
622
+ *
623
+ * doc.load
624
+ * doc.keys #=> [:title, :content]
625
+ * doc.values #=> ["the title", "the content"]
626
+ * doc.fields #=> [:title, :content]
627
+ */
628
+ void Init_LazyDoc(void) {
629
+ sym_each_key = ID2SYM(rb_intern("each_key"));
630
+ sym_each_value = ID2SYM(rb_intern("each_value"));
631
+ id_compact = rb_intern("compact");
632
+ id_equal = rb_intern("==");
633
+ id_except = rb_intern("except");
634
+ id_fields = rb_intern("@fields");
635
+ id_flatten = rb_intern("flatten");
636
+ id_ge = rb_intern(">=");
637
+ id_get = rb_intern("[]");
638
+ id_gt = rb_intern(">");
639
+ id_inspect = rb_intern("inspect");
640
+ id_invert = rb_intern("invert");
641
+ id_le = rb_intern("<=");
642
+ id_merge_bang = rb_intern("merge!");
643
+ id_reject = rb_intern("reject");
644
+ id_select = rb_intern("select");
645
+ id_size = rb_intern("size");
646
+ id_slice = rb_intern("slice");
647
+ id_to_h = rb_intern("to_h");
648
+ id_to_proc = rb_intern("to_proc");
649
+ id_transform_keys = rb_intern("transform_keys");
650
+ id_transform_values = rb_intern("transform_values");
651
+
652
+ cLazyDoc = rb_define_class_under(mIndex, "LazyDoc", rb_cObject);
653
+ rb_include_module(cLazyDoc, rb_mEnumerable);
654
+ rb_define_alloc_func(cLazyDoc, frb_ld_alloc);
655
+ rb_define_method(cLazyDoc, "load", frb_ld_load, 0);
656
+ rb_define_method(cLazyDoc, "fields", frb_ld_fields, 0);
657
+ rb_define_method(cLazyDoc, "keys", frb_ld_fields, 0);
658
+ rb_define_method(cLazyDoc, "<", frb_ld_lt, 1);
659
+ rb_define_method(cLazyDoc, "<=", frb_ld_le, 1);
660
+ rb_define_method(cLazyDoc, "==", frb_ld_equal, 1);
661
+ rb_define_method(cLazyDoc, ">", frb_ld_gt, 1);
662
+ rb_define_method(cLazyDoc, ">=", frb_ld_ge, 1);
663
+ rb_define_method(cLazyDoc, "[]", frb_ld_get, 1);
664
+ rb_define_method(cLazyDoc, "any?", frb_ld_any, -1);
665
+ rb_define_method(cLazyDoc, "assoc", frb_ld_assoc, 1);
666
+ rb_define_method(cLazyDoc, "compact", frb_ld_compact, 0);
667
+ rb_define_method(cLazyDoc, "dig", frb_ld_dig, -1);
668
+ rb_define_method(cLazyDoc, "each", frb_ld_each, 0);
669
+ rb_define_method(cLazyDoc, "each_key", frb_ld_each_key, 0);
670
+ rb_define_method(cLazyDoc, "each_pair", frb_ld_each, 0);
671
+ rb_define_method(cLazyDoc, "each_value", frb_ld_each_value, 0);
672
+ rb_define_method(cLazyDoc, "empty?", frb_ld_empty, 0);
673
+ rb_define_method(cLazyDoc, "eql?", frb_ld_eql, 1);
674
+ rb_define_method(cLazyDoc, "except", frb_ld_except, -1);
675
+ rb_define_method(cLazyDoc, "fetch", frb_ld_fetch, -1);
676
+ rb_define_method(cLazyDoc, "fetch_values", frb_ld_fetch_values, -1);
677
+ rb_define_method(cLazyDoc, "filter", frb_ld_filter, 0);
678
+ rb_define_method(cLazyDoc, "flatten", frb_ld_flatten, -1);
679
+ rb_define_method(cLazyDoc, "has_key?", frb_ld_has_key, 1);
680
+ rb_define_method(cLazyDoc, "has_value?", frb_ld_has_value, 1);
681
+ rb_define_method(cLazyDoc, "include?", frb_ld_has_key, 1);
682
+ rb_define_method(cLazyDoc, "inspect", frb_ld_inspect, 0);
683
+ rb_define_method(cLazyDoc, "invert", frb_ld_invert, 0);
684
+ rb_define_method(cLazyDoc, "key", frb_ld_key, 1);
685
+ rb_define_method(cLazyDoc, "key?", frb_ld_has_key, 1);
686
+ rb_define_method(cLazyDoc, "length", frb_ld_length, 0);
687
+ rb_define_method(cLazyDoc, "member?", frb_ld_has_key, 1);
688
+ rb_define_method(cLazyDoc, "merge", frb_ld_merge, -1);
689
+ rb_define_method(cLazyDoc, "rassoc", frb_ld_rassoc, 1);
690
+ rb_define_method(cLazyDoc, "reject", frb_ld_reject, 0);
691
+ rb_define_method(cLazyDoc, "select", frb_ld_filter, 0);
692
+ rb_define_method(cLazyDoc, "size", frb_ld_length, 0);
693
+ rb_define_method(cLazyDoc, "slice", frb_ld_slice, -1);
694
+ rb_define_method(cLazyDoc, "to_a", frb_ld_to_a, 0);
695
+ rb_define_method(cLazyDoc, "to_enum", frb_ld_to_enum, 0);
696
+ rb_define_method(cLazyDoc, "to_h", frb_ld_to_ha, 0);
697
+ rb_define_method(cLazyDoc, "to_hash", frb_ld_to_h, 0);
698
+ rb_define_method(cLazyDoc, "to_proc", frb_ld_to_proc, 0);
699
+ rb_define_method(cLazyDoc, "to_s", frb_ld_inspect, 0);
700
+ rb_define_method(cLazyDoc, "transform_keys", frb_ld_transform_keys, -1);
701
+ rb_define_method(cLazyDoc, "transform_values", frb_ld_transform_values, 0);
702
+ rb_define_method(cLazyDoc, "value?", frb_ld_has_value, 1);
703
+ rb_define_method(cLazyDoc, "values", frb_ld_values, 0);
704
+ rb_define_method(cLazyDoc, "values_at", frb_ld_values_at, -1);
705
+ }
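The lazy field loading described in the comment, and the branches of `frb_ld_fetch` shown earlier in this file, can be sketched in plain Ruby. This is a minimal illustration only: `LazyDocSketch`, its `loaded_fields` helper, and the in-memory `stored` hash are hypothetical stand-ins and not part of the gem, which does this work in C against the index files.

```ruby
# Hypothetical stand-in for Ferret::Index::LazyDoc; not the gem's implementation.
class LazyDocSketch
  def initialize(stored)
    @stored = stored # assumption: stands in for the field data kept in the index
    @loaded = {}     # fields materialized so far
  end

  # Hypothetical helper to observe which fields have been loaded.
  def loaded_fields
    @loaded.keys
  end

  # Load a single field on first access.
  def [](key)
    return @loaded[key] if @loaded.key?(key)
    return nil unless @stored.key?(key)
    @loaded[key] = @stored[key]
  end

  # Load every field at once, like LazyDoc#load.
  def load
    @stored.each_key { |key| self[key] }
    self
  end

  # Follows the branches of frb_ld_fetch: a missing key yields nil,
  # the default value, or the block result; it does not raise KeyError.
  def fetch(key, *rest)
    raise ArgumentError, "first arg must be a symbol" unless key.is_a?(Symbol)
    raise ArgumentError, "too many args, only two allowed: key, default_value" if rest.size > 1
    res = self[key]
    return res unless res.nil?
    return rest[0] if rest.size == 1
    return yield(key) if block_given?
    nil
  end
end

doc = LazyDocSketch.new({ title: "the title", content: "the content" })
doc[:title]                 # materializes only :title
doc.fetch(:author, "n/a")   # falls back to the default value
doc.load                    # materializes the remaining fields
```

One difference from `Hash#fetch` worth noting: with a single argument and no block, the C function returns `nil` for a missing field instead of raising `KeyError`, and the sketch keeps that behavior.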
@@ -58,10 +58,10 @@ typedef struct FrtHash {
58
58
  * used outside of the Hash methods */
59
59
  FrtHashEntry *(*lookup_i)(struct FrtHash *self,
60
60
  register const void *key);
61
- unsigned long (*hash_i)(const void *key);
62
- int (*eq_i)(const void *key1, const void *key2);
63
- void (*free_key_i)(void *p);
64
- void (*free_value_i)(void *p);
61
+ unsigned long (*hash_i)(const void *key);
62
+ int (*eq_i)(const void *key1, const void *key2);
63
+ void (*free_key_i)(void *p);
64
+ void (*free_value_i)(void *p);
65
65
  } FrtHash;
66
66
 
67
67
  /**
@@ -140,8 +140,7 @@ extern FrtHash *frt_h_new(frt_hash_ft hash,
140
140
  * pass NULL in place of this parameter the value will not be destroyed.
141
141
  * @return A newly allocated Hash
142
142
  */
143
- extern FrtHash *frt_h_new_str(frt_free_ft free_key,
144
- frt_free_ft free_value);
143
+ extern FrtHash *frt_h_new_str(frt_free_ft free_key, frt_free_ft free_value);
145
144
 
146
145
  /**
147
146
  * Create a new Hash that uses integers as its keys. The Hash will store all
@@ -258,8 +257,7 @@ extern void *frt_h_rem(FrtHash *self, const void *key, bool del_key);
258
257
  * the existing key so no key was freed
259
258
  * </pre>
260
259
  */
261
- extern FrtHashKeyStatus frt_h_set(FrtHash *self,
262
- const void *key, void *value);
260
+ extern FrtHashKeyStatus frt_h_set(FrtHash *self, const void *key, void *value);
263
261
 
264
262
  /**
265
263
  * Add the value +value+ to the Hash referencing it with key +key+. If
@@ -170,9 +170,9 @@ static char *fn_for_gen_field(char *buf,
170
170
  *
171
171
  ***************************************************************************/
172
172
 
173
- static unsigned long long co_hash(const void *key)
173
+ static unsigned long co_hash(const void *key)
174
174
  {
175
- return (unsigned long long)key;
175
+ return (unsigned long)key;
176
176
  }
177
177
 
178
178
  static int co_eq(const void *key1, const void *key2)
@@ -1163,6 +1163,7 @@ static FrtLazyDocField *lazy_df_new(ID name, const int size, FrtCompressionType
1163
1163
  self->data = FRT_ALLOC_AND_ZERO_N(FrtLazyDocFieldData, size);
1164
1164
  self->compression = compression;
1165
1165
  self->decompressed = false;
1166
+ self->loaded = false;
1166
1167
  return self;
1167
1168
  }
1168
1169
 
@@ -1400,6 +1401,7 @@ char *frt_lazy_df_get_data(FrtLazyDocField *self, int i) {
1400
1401
  frt_is_read_bytes(self->doc->fields_in, (frt_uchar *)text, read_len);
1401
1402
  text[read_len - 1] = '\0';
1402
1403
  }
1404
+ self->loaded = true;
1403
1405
  }
1404
1406
  }
1405
1407
 
@@ -1473,6 +1475,7 @@ static FrtLazyDoc *lazy_doc_new(int size, FrtInStream *fdt_in)
1473
1475
  self->size = size;
1474
1476
  self->fields = FRT_ALLOC_AND_ZERO_N(FrtLazyDocField *, size);
1475
1477
  self->fields_in = frt_is_clone(fdt_in);
1478
+ self->loaded = false;
1476
1479
  return self;
1477
1480
  }
1478
1481
 
@@ -529,7 +529,7 @@ extern FrtTVTerm *frt_tv_get_tv_term(FrtTermVector *tv, const char *term);
529
529
 
530
530
  /* * * FrtLazyDocField * * */
531
531
  typedef struct FrtLazyDocFieldData {
532
- frt_off_t start;
532
+ frt_off_t start;
533
533
  int length;
534
534
  rb_encoding *encoding;
535
535
  FrtCompressionType compression; /* as stored */
@@ -545,6 +545,7 @@ typedef struct FrtLazyDocField {
545
545
  int len; /* length of data elements concatenated */
546
546
  FrtCompressionType compression; /* as configured */
547
547
  bool decompressed;
548
+ bool loaded;
548
549
  } FrtLazyDocField;
549
550
 
550
551
  extern char *frt_lazy_df_get_data(FrtLazyDocField *self, int i);
@@ -556,6 +557,7 @@ struct FrtLazyDoc {
556
557
  int size;
557
558
  FrtLazyDocField **fields;
558
559
  FrtInStream *fields_in;
560
+ bool loaded;
559
561
  };
560
562
 
561
563
  extern void frt_lazy_doc_close(FrtLazyDoc *self);
@@ -29,6 +29,7 @@ VALUE sym_true;
29
29
  VALUE sym_false;
30
30
  VALUE sym_path;
31
31
  VALUE sym_dir;
32
+ VALUE sym_each;
32
33
 
33
34
  /* Modules */
34
35
  VALUE mIsomorfeus;
@@ -272,12 +273,13 @@ void Init_isomorfeus_ferret_ext(void) {
272
273
  id_data = rb_intern("@data");
273
274
 
274
275
  /* Symbols */
275
- sym_yes = ID2SYM(rb_intern("yes"));;
276
- sym_no = ID2SYM(rb_intern("no"));;
277
- sym_true = ID2SYM(rb_intern("true"));;
278
- sym_false = ID2SYM(rb_intern("false"));;
279
- sym_path = ID2SYM(rb_intern("path"));;
280
- sym_dir = ID2SYM(rb_intern("dir"));;
276
+ sym_yes = ID2SYM(rb_intern("yes"));
277
+ sym_no = ID2SYM(rb_intern("no"));
278
+ sym_true = ID2SYM(rb_intern("true"));
279
+ sym_false = ID2SYM(rb_intern("false"));
280
+ sym_path = ID2SYM(rb_intern("path"));
281
+ sym_dir = ID2SYM(rb_intern("dir"));
282
+ sym_each = ID2SYM(rb_intern("each"));
281
283
 
282
284
  mIsomorfeus = rb_define_module("Isomorfeus");
283
285
  mFerret = rb_define_module_under(mIsomorfeus, "Ferret");
@@ -5,6 +5,7 @@ module Isomorfeus
5
5
  # information on how to use this class.
6
6
  class Index
7
7
  include MonitorMixin
8
+ include Enumerable
8
9
  include Isomorfeus::Ferret::Store
9
10
  include Isomorfeus::Ferret::Search
10
11
 
@@ -485,15 +486,11 @@ module Isomorfeus
485
486
  end
486
487
  end
487
488
 
488
- # iterate through all documents in the index. This method preloads the
489
- # documents so you don't need to call #load on the document to load all the
490
- # fields.
491
- def each
489
+ # iterate through all documents in the index.
490
+ def each(&block)
492
491
  @dir.synchronize do
493
492
  ensure_reader_open
494
- (0...@reader.max_doc).each do |i|
495
- yield @reader[i].load unless @reader.deleted?(i)
496
- end
493
+ @reader.each(&block)
497
494
  end
498
495
  end
499
496
 
@@ -679,7 +676,7 @@ module Isomorfeus
679
676
  docs_to_add = []
680
677
  query = do_process_query(query)
681
678
  @searcher.search_each(query, :limit => :all) do |id, score|
682
- document = @searcher[id].load
679
+ document = @searcher[id].to_h
683
680
  if new_val.is_a?(Hash)
684
681
  document.merge!(new_val)
685
682
  else new_val.is_a?(String) or new_val.is_a?(Symbol)
@@ -850,6 +847,12 @@ module Isomorfeus
850
847
  end
851
848
  end
852
849
 
850
+ def to_enum
851
+ @dir.synchronize do
852
+ ensure_reader_open
853
+ @reader.to_enum
854
+ end
855
+ end
853
856
 
854
857
  protected
855
858
  def ensure_writer_open()
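The effect of including `Enumerable` and delegating `each` to the reader can be sketched as follows. This is an illustration under stated assumptions: `ReaderSketch` and `IndexSketch` are hypothetical classes, the documents are plain hashes, and the lock is the object's own monitor, whereas the diff synchronizes on `@dir`.

```ruby
require "monitor"

# Hypothetical stand-in for the index reader; it only iterates an array.
class ReaderSketch
  def initialize(docs)
    @docs = docs
  end

  def each(&block)
    @docs.each(&block)
  end
end

# Hypothetical stand-in for Index, reduced to the iteration path.
class IndexSketch
  include MonitorMixin
  include Enumerable

  def initialize(docs)
    super() # lets MonitorMixin set up the monitor
    @reader = ReaderSketch.new(docs)
  end

  # Same shape as the new Index#each: take the lock, then delegate.
  def each(&block)
    synchronize { @reader.each(&block) }
  end
end

index = IndexSketch.new([{ title: "a" }, { title: "b" }])
titles = index.map { |doc| doc[:title] } # map comes from Enumerable
```

Because `Enumerable` builds on `each`, methods such as `map`, `select`, and `first` become available without further code. Note that the removed comment said the old `each` preloaded every document via `load`; the new version passes along whatever the reader yields.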
@@ -1,5 +1,5 @@
1
1
  module Isomorfeus
2
2
  module Ferret
3
- VERSION = '0.13.12'
3
+ VERSION = '0.14.0'
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: isomorfeus-ferret
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.13.12
4
+ version: 0.14.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jan Biedermann
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2022-05-20 00:00:00.000000000 Z
11
+ date: 2022-05-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: oj
@@ -181,6 +181,7 @@ files:
181
181
  - ext/isomorfeus_ferret_ext/fio_tmpfile.h
182
182
  - ext/isomorfeus_ferret_ext/frb_analysis.c
183
183
  - ext/isomorfeus_ferret_ext/frb_index.c
184
+ - ext/isomorfeus_ferret_ext/frb_lazy_doc.c
184
185
  - ext/isomorfeus_ferret_ext/frb_qparser.c
185
186
  - ext/isomorfeus_ferret_ext/frb_search.c
186
187
  - ext/isomorfeus_ferret_ext/frb_store.c