stretchy-model 0.6.5 → 0.6.6
- checksums.yaml +4 -4
- data/.yardopts +2 -1
- data/README.md +28 -10
- data/Rakefile +56 -0
- data/docs/.nojekyll +0 -0
- data/docs/README.md +147 -0
- data/docs/_coverpage.md +14 -0
- data/docs/_sidebar.md +14 -0
- data/docs/examples/_sidebar.md +15 -0
- data/docs/examples/data_analysis.md +216 -0
- data/docs/examples/semantic_search_with_llm.md +83 -0
- data/docs/examples/simple-ingest-pipeline.md +326 -0
- data/docs/guides/_sidebar.md +14 -0
- data/docs/guides/aggregations.md +142 -0
- data/docs/guides/machine-learning.md +154 -0
- data/docs/guides/models.md +372 -0
- data/docs/guides/pipelines.md +151 -0
- data/docs/guides/querying.md +361 -0
- data/docs/guides/quick-start.md +72 -0
- data/docs/guides/scopes.md +125 -0
- data/docs/index.html +113 -0
- data/docs/stretchy.cover.png +0 -0
- data/docs/stretchy.logo.png +0 -0
- data/docs/styles.css +90 -0
- data/lib/stretchy/attributes/transformers/keyword_transformer.rb +41 -35
- data/lib/stretchy/attributes/type/array.rb +24 -1
- data/lib/stretchy/attributes/type/base.rb +6 -2
- data/lib/stretchy/attributes/type/binary.rb +24 -17
- data/lib/stretchy/attributes/type/boolean.rb +29 -22
- data/lib/stretchy/attributes/type/completion.rb +18 -10
- data/lib/stretchy/attributes/type/constant_keyword.rb +35 -26
- data/lib/stretchy/attributes/type/date_time.rb +28 -17
- data/lib/stretchy/attributes/type/dense_vector.rb +46 -49
- data/lib/stretchy/attributes/type/flattened.rb +28 -19
- data/lib/stretchy/attributes/type/geo_point.rb +21 -12
- data/lib/stretchy/attributes/type/geo_shape.rb +21 -12
- data/lib/stretchy/attributes/type/hash.rb +24 -10
- data/lib/stretchy/attributes/type/histogram.rb +25 -0
- data/lib/stretchy/attributes/type/ip.rb +26 -17
- data/lib/stretchy/attributes/type/join.rb +16 -7
- data/lib/stretchy/attributes/type/keyword.rb +21 -26
- data/lib/stretchy/attributes/type/knn_vector.rb +47 -0
- data/lib/stretchy/attributes/type/match_only_text.rb +22 -1
- data/lib/stretchy/attributes/type/nested.rb +16 -11
- data/lib/stretchy/attributes/type/numeric/base.rb +30 -22
- data/lib/stretchy/attributes/type/numeric/byte.rb +20 -0
- data/lib/stretchy/attributes/type/numeric/double.rb +20 -0
- data/lib/stretchy/attributes/type/numeric/float.rb +20 -0
- data/lib/stretchy/attributes/type/numeric/half_float.rb +20 -0
- data/lib/stretchy/attributes/type/numeric/integer.rb +21 -1
- data/lib/stretchy/attributes/type/numeric/long.rb +20 -0
- data/lib/stretchy/attributes/type/numeric/scaled_float.rb +16 -7
- data/lib/stretchy/attributes/type/numeric/short.rb +20 -0
- data/lib/stretchy/attributes/type/numeric/unsigned_long.rb +21 -1
- data/lib/stretchy/attributes/type/percolator.rb +16 -4
- data/lib/stretchy/attributes/type/point.rb +19 -9
- data/lib/stretchy/attributes/type/range/base.rb +24 -1
- data/lib/stretchy/attributes/type/range/date_range.rb +21 -5
- data/lib/stretchy/attributes/type/range/double_range.rb +20 -4
- data/lib/stretchy/attributes/type/range/float_range.rb +21 -5
- data/lib/stretchy/attributes/type/range/integer_range.rb +20 -4
- data/lib/stretchy/attributes/type/range/ip_range.rb +20 -4
- data/lib/stretchy/attributes/type/range/long_range.rb +20 -4
- data/lib/stretchy/attributes/type/rank_feature.rb +16 -6
- data/lib/stretchy/attributes/type/rank_features.rb +16 -9
- data/lib/stretchy/attributes/type/search_as_you_type.rb +28 -18
- data/lib/stretchy/attributes/type/shape.rb +19 -9
- data/lib/stretchy/attributes/type/sparse_vector.rb +25 -21
- data/lib/stretchy/attributes/type/string.rb +42 -1
- data/lib/stretchy/attributes/type/text.rb +53 -28
- data/lib/stretchy/attributes/type/token_count.rb +21 -11
- data/lib/stretchy/attributes/type/version.rb +16 -6
- data/lib/stretchy/attributes/type/wildcard.rb +36 -25
- data/lib/stretchy/attributes.rb +29 -0
- data/lib/stretchy/delegation/gateway_delegation.rb +78 -0
- data/lib/stretchy/index_setting.rb +94 -0
- data/lib/stretchy/indexing/bulk.rb +75 -3
- data/lib/stretchy/model/callbacks.rb +1 -0
- data/lib/stretchy/model/common.rb +157 -0
- data/lib/stretchy/model/persistence.rb +144 -0
- data/lib/stretchy/model/refreshable.rb +26 -0
- data/lib/stretchy/pipeline.rb +2 -1
- data/lib/stretchy/pipelines/processor.rb +38 -36
- data/lib/stretchy/querying.rb +7 -8
- data/lib/stretchy/record.rb +5 -4
- data/lib/stretchy/relation.rb +229 -28
- data/lib/stretchy/relations/aggregation_methods/aggregation.rb +59 -0
- data/lib/stretchy/relations/aggregation_methods/avg.rb +45 -0
- data/lib/stretchy/relations/aggregation_methods/bucket_script.rb +47 -0
- data/lib/stretchy/relations/aggregation_methods/bucket_selector.rb +47 -0
- data/lib/stretchy/relations/aggregation_methods/bucket_sort.rb +47 -0
- data/lib/stretchy/relations/aggregation_methods/cardinality.rb +47 -0
- data/lib/stretchy/relations/aggregation_methods/children.rb +47 -0
- data/lib/stretchy/relations/aggregation_methods/composite.rb +41 -0
- data/lib/stretchy/relations/aggregation_methods/date_histogram.rb +53 -0
- data/lib/stretchy/relations/aggregation_methods/date_range.rb +53 -0
- data/lib/stretchy/relations/aggregation_methods/extended_stats.rb +48 -0
- data/lib/stretchy/relations/aggregation_methods/filter.rb +47 -0
- data/lib/stretchy/relations/aggregation_methods/filters.rb +47 -0
- data/lib/stretchy/relations/aggregation_methods/geo_bounds.rb +40 -0
- data/lib/stretchy/relations/aggregation_methods/geo_centroid.rb +40 -0
- data/lib/stretchy/relations/aggregation_methods/global.rb +39 -0
- data/lib/stretchy/relations/aggregation_methods/histogram.rb +43 -0
- data/lib/stretchy/relations/aggregation_methods/ip_range.rb +41 -0
- data/lib/stretchy/relations/aggregation_methods/max.rb +40 -0
- data/lib/stretchy/relations/aggregation_methods/min.rb +41 -0
- data/lib/stretchy/relations/aggregation_methods/missing.rb +40 -0
- data/lib/stretchy/relations/aggregation_methods/nested.rb +40 -0
- data/lib/stretchy/relations/aggregation_methods/percentile_ranks.rb +45 -0
- data/lib/stretchy/relations/aggregation_methods/percentiles.rb +45 -0
- data/lib/stretchy/relations/aggregation_methods/range.rb +42 -0
- data/lib/stretchy/relations/aggregation_methods/reverse_nested.rb +40 -0
- data/lib/stretchy/relations/aggregation_methods/sampler.rb +40 -0
- data/lib/stretchy/relations/aggregation_methods/scripted_metric.rb +43 -0
- data/lib/stretchy/relations/aggregation_methods/significant_terms.rb +45 -0
- data/lib/stretchy/relations/aggregation_methods/stats.rb +42 -0
- data/lib/stretchy/relations/aggregation_methods/sum.rb +42 -0
- data/lib/stretchy/relations/aggregation_methods/terms.rb +46 -0
- data/lib/stretchy/relations/aggregation_methods/top_hits.rb +42 -0
- data/lib/stretchy/relations/aggregation_methods/top_metrics.rb +44 -0
- data/lib/stretchy/relations/aggregation_methods/value_count.rb +41 -0
- data/lib/stretchy/relations/aggregation_methods/weighted_avg.rb +42 -0
- data/lib/stretchy/relations/aggregation_methods.rb +20 -749
- data/lib/stretchy/relations/finder_methods.rb +2 -18
- data/lib/stretchy/relations/null_relation.rb +55 -0
- data/lib/stretchy/relations/query_builder.rb +82 -36
- data/lib/stretchy/relations/query_methods/bind.rb +19 -0
- data/lib/stretchy/relations/query_methods/extending.rb +29 -0
- data/lib/stretchy/relations/query_methods/fields.rb +70 -0
- data/lib/stretchy/relations/query_methods/filter_query.rb +53 -0
- data/lib/stretchy/relations/query_methods/has_field.rb +40 -0
- data/lib/stretchy/relations/query_methods/highlight.rb +75 -0
- data/lib/stretchy/relations/query_methods/hybrid.rb +60 -0
- data/lib/stretchy/relations/query_methods/ids.rb +40 -0
- data/lib/stretchy/relations/query_methods/match.rb +52 -0
- data/lib/stretchy/relations/query_methods/must_not.rb +54 -0
- data/lib/stretchy/relations/query_methods/neural.rb +58 -0
- data/lib/stretchy/relations/query_methods/neural_sparse.rb +43 -0
- data/lib/stretchy/relations/query_methods/none.rb +21 -0
- data/lib/stretchy/relations/query_methods/or_filter.rb +21 -0
- data/lib/stretchy/relations/query_methods/order.rb +63 -0
- data/lib/stretchy/relations/query_methods/query_string.rb +44 -0
- data/lib/stretchy/relations/query_methods/regexp.rb +61 -0
- data/lib/stretchy/relations/query_methods/should.rb +51 -0
- data/lib/stretchy/relations/query_methods/size.rb +44 -0
- data/lib/stretchy/relations/query_methods/skip_callbacks.rb +47 -0
- data/lib/stretchy/relations/query_methods/source.rb +59 -0
- data/lib/stretchy/relations/query_methods/where.rb +113 -0
- data/lib/stretchy/relations/query_methods.rb +48 -569
- data/lib/stretchy/relations/scoping/default.rb +136 -0
- data/lib/stretchy/relations/scoping/named.rb +70 -0
- data/lib/stretchy/relations/scoping/scope_registry.rb +36 -0
- data/lib/stretchy/relations/scoping.rb +30 -0
- data/lib/stretchy/relations/search_option_methods.rb +2 -0
- data/lib/stretchy/version.rb +1 -1
- data/lib/stretchy.rb +17 -10
- metadata +111 -17
- data/lib/stretchy/common.rb +0 -38
- data/lib/stretchy/null_relation.rb +0 -53
- data/lib/stretchy/persistence.rb +0 -43
- data/lib/stretchy/refreshable.rb +0 -15
- data/lib/stretchy/scoping/default.rb +0 -134
- data/lib/stretchy/scoping/named.rb +0 -68
- data/lib/stretchy/scoping/scope_registry.rb +0 -34
- data/lib/stretchy/scoping.rb +0 -28
@@ -0,0 +1,372 @@
|
|
1
|
+
# Models
|
2
|
+
|
3
|
+
|
4
|
+
## Creating Stretchy Documents
|
5
|
+
|
6
|
+
Create your models in app/models and subclass the `StretchyModel` class.
|
7
|
+
|
8
|
+
```ruby
|
9
|
+
class Profile < StretchyModel
|
10
|
+
end
|
11
|
+
```
|
12
|
+
|
13
|
+
The index name `profiles` will be inferred from the `Profile` model.
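
As a rough illustration of the naming convention, the sketch below derives an index name from a class name in plain Ruby. `infer_index_name` is a hypothetical helper, not part of stretchy-model, and its pluralization is deliberately naive; the gem itself presumably relies on proper inflection rules.

```ruby
# Hypothetical helper, not part of stretchy-model: derives an index name
# from a class name by snake_casing it and appending a plural "s".
def infer_index_name(class_name)
  snake = class_name
          .gsub(/([a-z\d])([A-Z])/, '\1_\2') # "UserProfile" -> "User_Profile"
          .downcase
  "#{snake}s" # naive pluralization; real inflection rules are more involved
end

puts infer_index_name("Profile")     # prints profiles
puts infer_index_name("UserProfile") # prints user_profiles
```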
|
14
|
+
|
15
|
+
### Overriding Naming Conventions
|
16
|
+
|
17
|
+
If you need a different naming convention for your indexes, you can override the default conventions.
|
18
|
+
|
19
|
+
```ruby
|
20
|
+
class Profile < StretchyModel
|
21
|
+
index_name 'user_profiles'
|
22
|
+
end
|
23
|
+
```
|
24
|
+
|
25
|
+
### Default Sorting
|
26
|
+
|
27
|
+
By default, Stretchy models are sorted by `created_at`. To sort by another field, set `default_sort_key`.
|
28
|
+
|
29
|
+
|
30
|
+
```ruby
|
31
|
+
class Profile < StretchyModel
|
32
|
+
default_sort_key :updated_at
|
33
|
+
end
|
34
|
+
```
|
35
|
+
|
36
|
+
|
37
|
+
## Attributes
|
38
|
+
|
39
|
+
|
40
|
+
Attributes are defined using `Stretchy::Attributes::Type` attributes.
|
41
|
+
|
42
|
+
* `:array` - Used to store multiple values in a single field.
|
43
|
+
* `:binary` - Used to store binary data as a `Base64` encoded string.
|
44
|
+
* `:boolean` - Used to store true or false values.
|
45
|
+
* `:constant_keyword` - Used for fields that will contain the same value across all documents.
|
46
|
+
* `:datetime` - Used to store date and time.
|
47
|
+
* `:flattened` - Used for indexing object hierarchies as a flat list.
|
48
|
+
* `:geo_point` - Used to store geographical locations as latitude/longitude points.
|
49
|
+
* `:geo_shape` - Used to store complex shapes like polygons.
|
50
|
+
* `:histogram` - Used to support aggregations on numerical data.
|
51
|
+
* `:hash` - Used for structured JSON objects. Each field can be of any data type, including another object.
|
52
|
+
* `:ip` - Used to store IP addresses.
|
53
|
+
* `:join` - Used to create parent/child relation within documents.
|
54
|
+
* `:keyword` - Used for exact match searches, sorting, and aggregations. This field is not analyzed.
|
55
|
+
* `:match_only_text` - Used for full-text search.
|
56
|
+
* `:nested` - Used to index arrays of objects as separate documents.
|
57
|
+
* `:percolator` - Used to store queries for matching documents.
|
58
|
+
* `:point` - Used to store points in space.
|
59
|
+
* `:rank_feature` - Used to record a numeric feature to boost hits at query time.
|
60
|
+
* `:rank_features` - Used to record many numeric features to boost hits at query time.
|
61
|
+
* `:text` - Used to store text. This field is analyzed, which means that its value is broken down into separate searchable terms.
|
62
|
+
* `:token_count` - Used to count the number of tokens in a string.
|
63
|
+
* `:dense_vector` - Used to store dense vectors of float values.
|
64
|
+
* `:search_as_you_type` - Used for full-text search, especially for auto-complete functionality.
|
65
|
+
* `:sparse_vector` - Used to store sparse vectors of float values.
|
66
|
+
* `:string` - Used to store text. This field is analyzed, which means that its value is broken down into separate searchable terms. _an alias of `:text`_
|
67
|
+
* `:version` - Used to store version numbers.
|
68
|
+
* `:wildcard` - Used for fields that will contain arbitrary strings.
|
69
|
+
|
70
|
+
#### Numeric Types
|
71
|
+
* `:long` - Used to store long integer numbers.
|
72
|
+
* `:integer` - Used to store integer numbers.
|
73
|
+
* `:short` - Used to store short integer numbers.
|
74
|
+
* `:byte` - Used to store 8-bit signed integers (-128 to 127).
|
75
|
+
* `:double` - Used to store double-precision floating point numbers.
|
76
|
+
* `:float` - Used to store floating point numbers.
|
77
|
+
* `:half_float` - Used to store half-precision floating point numbers.
|
78
|
+
* `:scaled_float` - Used to store floating point numbers that are scaled by a specific factor.
|
79
|
+
* `:unsigned_long` - Used to store 64-bit unsigned integers, which are never negative.
|
80
|
+
|
81
|
+
#### Range Types
|
82
|
+
* `:integer_range` - Used to store ranges of integer numbers.
|
83
|
+
* `:float_range` - Used to store ranges of floating point numbers.
|
84
|
+
* `:long_range` - Used to store ranges of long integer numbers.
|
85
|
+
* `:double_range` - Used to store ranges of double-precision floating point numbers.
|
86
|
+
* `:date_range` - Used to store ranges of dates.
|
87
|
+
* `:ip_range` - Used to store ranges of IP addresses.
|
88
|
+
|
89
|
+
In Elasticsearch, `:string` and `:keyword` are two seemingly similar data types that are used for different purposes.
|
90
|
+
|
91
|
+
A `:string` field is analyzed, which means that its value is broken down into separate searchable terms. For example, the string "quick brown fox" might be broken down into the terms "quick", "brown", and "fox". This makes `:string` fields suitable for full-text search where you want to find partial matches or search for individual words within a field.
|
92
|
+
|
93
|
+
On the other hand, a `:keyword` field is not analyzed and is used for exact match searches, sorting, and aggregations. The entire value of the field is used as a single term. For example, the string "quick brown fox" is a single term. This makes `:keyword` fields suitable for filtering, sorting, and aggregations where you need the exact value of the field.
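
The difference can be pictured with a small plain-Ruby sketch. `text_terms` and `keyword_terms` are hypothetical stand-ins for what happens at index time, assuming a very simple analyzer that lowercases and splits on non-word characters; real analyzers do considerably more.

```ruby
# Simplified stand-ins for how the two field types turn a value into terms.
# Real analyzers do much more (stemming, stop words, and so on).
def text_terms(value)
  value.downcase.split(/\W+/).reject(&:empty?)
end

def keyword_terms(value)
  [value] # the whole value is kept as one term
end

value = "quick brown fox"

p text_terms(value)    # prints ["quick", "brown", "fox"]
p keyword_terms(value) # prints ["quick brown fox"]

# A search for the single word "brown" only matches the analyzed terms.
p text_terms(value).include?("brown")    # prints true
p keyword_terms(value).include?("brown") # prints false
```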
|
94
|
+
|
95
|
+
In Stretchy, when you define a field as `:keyword`, it's like defining a `:string` or `:text` field but with the `.keyword` notation automatically added to the field in queries, aggregations, and filters. This tells Elasticsearch to treat the field as a `:keyword` field and use the entire field value as a single term.
|
96
|
+
|
97
|
+
Avoid using `:keyword` fields for full-text search. Use the `:text` or `:string` type instead.
|
98
|
+
|
99
|
+
|
100
|
+
```ruby
|
101
|
+
class Profile < StretchyModel
|
102
|
+
|
103
|
+
attribute :first_name, :keyword
|
104
|
+
attribute :last_name, :keyword
|
105
|
+
attribute :geo, :hash
|
106
|
+
attribute :bio, :text
|
107
|
+
attribute :age, :integer
|
108
|
+
attribute :score, :float
|
109
|
+
attribute :joined, :datetime
|
110
|
+
attribute :fav_flowers, :array
|
111
|
+
attribute :visible, :boolean, default: true
|
112
|
+
|
113
|
+
end
|
114
|
+
|
115
|
+
```
|
116
|
+
|
117
|
+
>[!TIP]
|
118
|
+
>
|
119
|
+
> `created_at`, `updated_at` and `id` are automatically included
|
120
|
+
|
121
|
+
|
122
|
+
All of the attribute types can receive additional parameters that are used to configure the mapping for the field. These parameters can be used to customize how the data is indexed and searched. For example, you can specify whether a field should be stored, whether it should be indexed, the data type of the field, and more.
|
123
|
+
|
124
|
+
Here's an example of how to use additional parameters with the `:keyword` attribute type:
|
125
|
+
|
126
|
+
```ruby
|
127
|
+
class Product < StretchyModel
|
128
|
+
attribute :tag, :keyword, ignore_above: 256, time_series_dimension: true
|
129
|
+
end
|
130
|
+
```
|
131
|
+
In this example, `ignore_above: 256` means that strings longer than 256 characters will not be indexed, and `time_series_dimension: true` marks the field as a time series dimension.
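
To make the effect of these parameters concrete, here is a plain-Ruby sketch of the mapping fragment such an attribute would be expected to produce. `property_mapping` is a hypothetical helper written for this illustration; how stretchy-model builds its mappings internally is an assumption.

```ruby
require "json"

# Hypothetical helper: merges a type and its extra options into the
# property definition that would appear under "properties" in a mapping.
def property_mapping(name, type, **options)
  { name => { type: type }.merge(options) }
end

mapping = property_mapping(:tag, :keyword, ignore_above: 256, time_series_dimension: true)
puts JSON.pretty_generate({ properties: mapping })
```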
|
132
|
+
|
133
|
+
Refer to the documentation for each attribute type for a full list of available parameters and their meanings.
|
134
|
+
|
135
|
+
When working with `:hash` data types, it's often useful to specify the fields within that object for fine-tuned control over mappings.
|
136
|
+
|
137
|
+
The `:hash` data type is used for structured JSON objects, where each field can be of any data type, including another object. This provides a flexible way to store complex data structures in a single field.
|
138
|
+
|
139
|
+
However, this flexibility can sometimes lead to less than optimal search performance or unexpected search results. For example, if a field within the hash object contains text, it might be analyzed and tokenized in a way that's not suitable for your use case.
|
140
|
+
|
141
|
+
To overcome this, you can specify the `:properties` within the hash object and their data types. This allows you to control how each field is indexed and searched. For example, you can specify that a field should be of type `:keyword` to ensure that it's not analyzed and can be used for exact match searches.
|
142
|
+
|
143
|
+
Here's an example of how to specify fields within a hash object:
|
144
|
+
|
145
|
+
```ruby
|
146
|
+
attribute :metadata, :hash, properties: {
|
147
|
+
title: { type: :text },
|
148
|
+
tags: { type: :keyword },
|
149
|
+
checkins: { type: :array }
|
150
|
+
}
|
151
|
+
```
|
152
|
+
In this example, the metadata attribute is a hash object with fields: title, tags and checkins. The title field is of type `:text`, which means it will be analyzed and can be used for full-text search. The tags field is of type `:keyword`, which means it will not be analyzed and can be used for exact match searches. The checkins field is of type `:array` which means it can contain zero or more values.
|
153
|
+
|
154
|
+
> [!NOTE]
|
155
|
+
>
|
156
|
+
>When adding a field dynamically, the first value in the array determines the field type. All subsequent values must be of the same data type or it must at least be possible to coerce subsequent values to the same data type.
|
157
|
+
>
|
158
|
+
>Arrays with a mixture of data types are not supported: `[ 10, "some string" ]`
|
159
|
+
|
160
|
+
By specifying the fields within the hash object, you have fine-tuned control over the mappings and can optimize the search performance and accuracy for your specific use case.
|
161
|
+
|
162
|
+
### Reading and Writing Data
|
163
|
+
|
164
|
+
|
165
|
+
#### Create
|
166
|
+
|
167
|
+
The `new` method returns a new, unsaved object, while `create` builds the object and saves it to the index.
|
168
|
+
|
169
|
+
For example, given a model `Profile` with attributes `first_name`, `last_name`, and `age`, the `create` method call will create and save a new record to the index:
|
170
|
+
|
171
|
+
```ruby
|
172
|
+
profile = Profile.create(first_name: "Candy", last_name: "Mu", age: 33)
|
173
|
+
```
|
174
|
+
|
175
|
+
Using the `new` method, an object can be instantiated without being saved:
|
176
|
+
|
177
|
+
```ruby
|
178
|
+
profile = Profile.new
|
179
|
+
profile.first_name = "Candy"
|
180
|
+
profile.last_name = "Mu"
|
181
|
+
profile.age = 33
|
182
|
+
```
|
183
|
+
|
184
|
+
Calling `profile.save` will index the record.
|
185
|
+
|
186
|
+
#### Read
|
187
|
+
|
188
|
+
Following the ActiveRecord API, Stretchy models have the familiar methods defined:
|
189
|
+
|
190
|
+
```ruby
|
191
|
+
# return a collection with all profiles
|
192
|
+
profiles = Profile.all
|
193
|
+
```
|
194
|
+
|
195
|
+
```ruby
|
196
|
+
# return the first profile
|
197
|
+
profile = Profile.first
|
198
|
+
```
|
199
|
+
|
200
|
+
```ruby
|
201
|
+
# find all profiles named Lori with an age of 24, sorted by :created_at in descending order
|
202
|
+
profile = Profile.where(first_name: "Lori", age: 24).sort(created_at: :desc)
|
203
|
+
```
|
204
|
+
|
205
|
+
The full [Query Interface](guides/querying) guide goes into further depth on the API available for interacting with Stretchy models.
|
206
|
+
|
207
|
+
|
208
|
+
#### Update
|
209
|
+
|
210
|
+
```ruby
|
211
|
+
profile = Profile.where(first_name: "Candy").first
|
212
|
+
profile.update(first_name: "Lilly")
|
213
|
+
```
|
214
|
+
|
215
|
+
|
216
|
+
#### Delete
|
217
|
+
|
218
|
+
```ruby
|
219
|
+
profile = Profile.where(first_name: "Lori").first
|
220
|
+
profile.destroy
|
221
|
+
```
|
222
|
+
|
223
|
+
### Bulk Operations
|
224
|
+
|
225
|
+
Bulk operations in Elasticsearch are a way to perform multiple operations in a single API call. This can greatly increase the speed of indexing and updating documents in Elasticsearch.
|
226
|
+
|
227
|
+
The bulk API accepts `index`, `update`, `create`, and `delete` operations, and a single call can mix them.
|
228
|
+
|
229
|
+
In the context of Stretchy, the `Model.bulk` method can be used to perform bulk operations. You can pass an array of records to this method, and each record will be processed according to the operation specified in its `to_bulk` method.
|
230
|
+
|
231
|
+
The `to_bulk` method is used to generate the structure for the bulk operation. By default, it performs an index operation, but you can also specify `:delete` or `:update` operations.
|
232
|
+
|
233
|
+
For large datasets, you can use the `Model.bulk_in_batches` method to perform bulk operations in batches. This method divides the records into batches of a specified size and processes each batch separately. This can be more memory-efficient than processing all records at once, especially for large datasets.
|
234
|
+
|
235
|
+
Here's an example of how you can use these methods:
|
236
|
+
|
237
|
+
```ruby
|
238
|
+
Model.bulk(records_as_bulk_operations)
|
239
|
+
```
|
240
|
+
|
241
|
+
#### Bulk helper
|
242
|
+
Generates the structure for the bulk operation:
|
243
|
+
```ruby
|
244
|
+
record.to_bulk # default to_bulk(:index)
|
245
|
+
record.to_bulk(:delete)
|
246
|
+
record.to_bulk(:update)
|
247
|
+
```
|
248
|
+
|
249
|
+
#### In batches
|
250
|
+
Runs bulk operations in batches of the given `size`:
|
251
|
+
```ruby
|
252
|
+
Model.bulk_in_batches(records, size: 100) do |batch|
|
253
|
+
batch.map! { |record| Model.new(record).to_bulk }
|
254
|
+
end
|
255
|
+
```
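
For readers unfamiliar with the Bulk API, the following plain-Ruby sketch shows the general shape of the entries and how batching splits them up. `bulk_entry` is a hypothetical helper, and the index name and records are made up; the exact structure returned by `to_bulk` in stretchy-model may differ.

```ruby
require "json"

# Hypothetical helper mirroring the shape of Bulk API entries. Each entry
# is an action line, optionally followed by a payload line.
def bulk_entry(action, index:, id:, doc: nil)
  meta = { action => { _index: index, _id: id } }
  case action
  when :index  then [meta, doc]
  when :update then [meta, { doc: doc }]
  when :delete then [meta]
  else raise ArgumentError, "unsupported action: #{action}"
  end
end

records = (1..5).map { |i| { id: i.to_s, first_name: "User #{i}" } }

# Process the records two at a time, as a batched bulk call would.
batches = records.each_slice(2).map do |batch|
  batch.flat_map { |r| bulk_entry(:index, index: "profiles", id: r[:id], doc: r) }
end

puts batches.length # prints 3
# The Bulk API expects newline-delimited JSON with a trailing newline.
puts batches.first.map(&:to_json).join("\n") + "\n"
```

Batching keeps each request body small, which is the memory benefit described above: only one slice of records is serialized at a time.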
|
256
|
+
|
257
|
+
|
258
|
+
### Validations
|
259
|
+
|
260
|
+
```ruby
|
261
|
+
class Profile < StretchyModel
|
262
|
+
attribute :first_name, :string
|
263
|
+
validates :first_name, presence: true
|
264
|
+
end
|
265
|
+
```
|
266
|
+
|
267
|
+
```irb
|
268
|
+
profile = Profile.new
|
269
|
+
profile.save
|
270
|
+
#=> First name can't be blank
|
271
|
+
```
|
272
|
+
|
273
|
+
### Callbacks
|
274
|
+
|
275
|
+
* `:before_save`
|
276
|
+
* `:after_save`
|
277
|
+
* `:before_create`
|
278
|
+
* `:after_create`
|
279
|
+
* `:before_update`
|
280
|
+
* `:after_update`
|
281
|
+
* `:before_destroy`
|
282
|
+
* `:after_destroy`
|
283
|
+
|
284
|
+
### Associations
|
285
|
+
Associations can be made between models. While Elasticsearch is not a relational database, it is sometimes useful to have a link between records.
|
286
|
+
|
287
|
+
```ruby
|
288
|
+
class Animal < StretchyModel
|
289
|
+
attribute :name, :keyword
|
290
|
+
attribute :zoo_id, :keyword
|
291
|
+
|
292
|
+
belongs_to :zoo
|
293
|
+
end
|
294
|
+
```
|
295
|
+
|
296
|
+
```ruby
|
297
|
+
class Zoo < StretchyModel
|
298
|
+
has_many :animals
|
299
|
+
end
|
300
|
+
```
|
301
|
+
|
302
|
+
```ruby (console)
|
303
|
+
zoo.animals
|
304
|
+
=> [#<Animal id: 1, name: "Panda", zoo_id: 1, created_at: 2024-03-15T01:03:38.395Z, updated_at: 2024-03-15T01:03:38.395Z>,
|
305
|
+
#<Animal id: 2, name: "Lemur", zoo_id: 1, created_at: 2024-03-15T01:03:38.395Z, updated_at: 2024-03-15T01:03:38.395Z>]
|
306
|
+
|
307
|
+
```
|
308
|
+
|
309
|
+
Associations largely work the same as their Rails counterparts. The following association types are supported:
|
310
|
+
|
311
|
+
* `belongs_to`
|
312
|
+
* `has_many`
|
313
|
+
* `has_one`
|
314
|
+
|
315
|
+
|
316
|
+
>[!WARNING|label:Associations in Elasticsearch]
|
317
|
+
>
|
318
|
+
> Because Elasticsearch is not a relational database, there are no join statements. This means associations will generate additional queries to fetch data.
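
The extra query can be pictured with an in-memory sketch. Everything here is a stand-in: the structs, the sample data, and the `find_animals` lambda are hypothetical and only mimic what a `has_many` lookup has to do, namely run a separate filtered search on the child index using the parent's id.

```ruby
# In-memory stand-ins for two indexes; the names and data are made up.
zoo_class    = Struct.new(:id, :name)
animal_class = Struct.new(:id, :name, :zoo_id)

animals = [
  animal_class.new("1", "Panda", "1"),
  animal_class.new("2", "Lemur", "1"),
  animal_class.new("3", "Otter", "2")
]

# Roughly what a has_many lookup does: a second, filtered query against
# the child collection, keyed on the parent's id.
find_animals = ->(zoo) { animals.select { |animal| animal.zoo_id == zoo.id } }

zoo = zoo_class.new("1", "City Zoo")
p find_animals.call(zoo).map(&:name) # prints ["Panda", "Lemur"]
```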
|
319
|
+
|
320
|
+
|
321
|
+
## Mappings
|
322
|
+
|
323
|
+
In Elasticsearch, mappings define how documents and their fields are stored and indexed. When using the attribute method you can specify additional mapping options as parameters.
|
324
|
+
|
325
|
+
```ruby
|
326
|
+
class Report < StretchyModel
|
327
|
+
attribute :title, :text
|
328
|
+
attribute :created_by, :keyword
|
329
|
+
attribute :body, :text, term_vector: :with_positions_offsets
|
330
|
+
end
|
331
|
+
```
|
332
|
+
The attribute method is used to define the fields of the Report model. The first argument is the field name, the second argument is the field type, and any subsequent arguments are additional mapping options for the field.
|
333
|
+
|
334
|
+
In the case of the body field, `term_vector: :with_positions_offsets` is an additional mapping option. This option configures Elasticsearch to store term vectors for the body field, including position and character offset information. Stored term vectors mainly benefit features that need per-term positions and offsets, such as the fast vector highlighter, at the cost of a larger index.
|
335
|
+
|
336
|
+
So, when you're defining mappings for your Elasticsearch models, you can use the attribute method to specify not only the field names and types, but also any additional mapping options that you need.
|
337
|
+
|
338
|
+
If you're curious and want to dive deeper, check out the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html). It's a treasure trove of information on all the supported options for each field type.
|
339
|
+
|
340
|
+
```ruby
|
341
|
+
Report.mappings
|
342
|
+
=> {
|
343
|
+
"properties" => {
|
344
|
+
"id" => {
|
345
|
+
"type" => "keyword"
|
346
|
+
},
|
347
|
+
"created_at" => {
|
348
|
+
"type" => "date"
|
349
|
+
},
|
350
|
+
"updated_at" => {
|
351
|
+
"type" => "date"
|
352
|
+
},
|
353
|
+
"title" => {
|
354
|
+
"type" => "text"
|
355
|
+
},
|
356
|
+
"created_by" => {
|
357
|
+
"type" => "keyword"
|
358
|
+
},
|
359
|
+
"body" => {
|
360
|
+
"type" => "text",
|
361
|
+
"term_vector" => "with_positions_offsets",
|
362
|
+
"fields" => {
|
363
|
+
"keyword" => {
|
364
|
+
"type" => "keyword",
|
365
|
+
"ignore_above" => 256
|
366
|
+
}
|
367
|
+
}
|
368
|
+
}
|
369
|
+
},
|
370
|
+
:dynamic => true
|
371
|
+
}
|
372
|
+
```
|
@@ -0,0 +1,151 @@
|
|
1
|
+
# Pipelines
|
2
|
+
|
3
|
+
Pipelines follow a specific convention for storing pipeline definitions. This helps us keep our code organized and easy to navigate.
|
4
|
+
|
5
|
+
Generally, pipelines can be stored in `app/pipelines`. However, if you have a mix of ingest and search pipelines it's good practice to break them out into their own namespace.
|
6
|
+
|
7
|
+
For ingest pipelines, we store each pipeline in its own Ruby file inside the `app/pipelines/ingest` directory. The file name matches the pipeline name. For example, the definition for an ingest pipeline named `example_pipeline` would be stored in `app/pipelines/ingest/example_pipeline.rb`.
|
8
|
+
|
9
|
+
Similarly, for search pipelines, we store each pipeline in its own Ruby file inside the `app/pipelines/search` directory. Again, the file name matches the pipeline name. So, a search pipeline named `example_pipeline` would be stored in `app/pipelines/search/example_pipeline.rb`.
|
10
|
+
|
11
|
+
This convention makes it easy to find the definition for a specific pipeline, whether it's an ingest pipeline or a search pipeline.
|
12
|
+
|
13
|
+
- *app/pipelines/ingest/example_ingest_pipeline.rb*
|
14
|
+
- *app/pipelines/search/example_search_pipeline.rb*
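
The convention can be expressed as a small path helper. `pipeline_path` is a hypothetical function written for this illustration and is not provided by stretchy-model.

```ruby
# Hypothetical helper: maps a pipeline kind and name to the file that
# would hold its definition under the convention described above.
def pipeline_path(kind, name)
  raise ArgumentError, "kind must be :ingest or :search" unless %i[ingest search].include?(kind)

  File.join("app", "pipelines", kind.to_s, "#{name}.rb")
end

puts pipeline_path(:ingest, "example_pipeline") # prints app/pipelines/ingest/example_pipeline.rb
puts pipeline_path(:search, "example_pipeline") # prints app/pipelines/search/example_pipeline.rb
```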
|
15
|
+
|
16
|
+
---
|
17
|
+
|
18
|
+
## Defining a Pipeline
|
19
|
+
|
20
|
+
A pipeline in `stretchy-model` is defined as a Ruby class that inherits from `Stretchy::Pipeline`. It has a specific structure and includes several key components:
|
21
|
+
|
22
|
+
- `pipeline_name`: This is the ID of your pipeline. By default, it's inferred from the class name.
|
23
|
+
- `description`: This is a brief description of what your pipeline does. It's a good practice to provide a meaningful description so that others can understand the purpose of your pipeline at a glance.
|
24
|
+
- `processor`: These are the processors that your pipeline will run. A pipeline can have one or more processors. Each processor is defined with a type (like `sparse_encoding`) and a set of options. The processors are run in the order they are defined. Refer to [Ingest Processors](/guides/pipelines?id=ingest-processors) for available options.
|
25
|
+
|
26
|
+
|
27
|
+
Here's an example of a pipeline with these components:
|
28
|
+
|
29
|
+
```ruby
|
30
|
+
class NLPSparsePipeline < Stretchy::Pipeline
|
31
|
+
|
32
|
+
# inferred from the class name by default
|
33
|
+
pipeline_name 'nlp-sparse-pipeline'
|
34
|
+
|
35
|
+
description "Sparse encoding pipeline"
|
36
|
+
|
37
|
+
processor :sparse_encoding,
|
38
|
+
model_id: 'q32Pw02BJ3squ3VZa',
|
39
|
+
field_map: {
|
40
|
+
body: :embedding
|
41
|
+
}
|
42
|
+
end
|
43
|
+
```
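
For orientation, the class above corresponds roughly to the following ingest pipeline request. This is a plain-Ruby sketch of the request body in the format the ingest pipeline API expects; the exact way stretchy-model serializes the definition (for example, how it renders the `field_map` values) is an assumption.

```ruby
require "json"

# The request body that an ingest pipeline with one sparse_encoding
# processor corresponds to. Values are copied from the class above.
pipeline_id = "nlp-sparse-pipeline"
body = {
  description: "Sparse encoding pipeline",
  processors: [
    {
      sparse_encoding: {
        model_id: "q32Pw02BJ3squ3VZa",
        field_map: { body: "embedding" }
      }
    }
  ]
}

puts "PUT /_ingest/pipeline/#{pipeline_id}"
puts JSON.pretty_generate(body)
```

Processors appear in the `processors` array in the order they are declared, which is also the order in which they run.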
|
44
|
+
|
45
|
+
[^1]: https://opensearch.org/docs/latest/ingest-pipelines/
|
46
|
+
|
47
|
+
[^2]: https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html
|
48
|
+
|
49
|
+
|
50
|
+
## Managing Pipelines
|
51
|
+
|
52
|
+
### create!
|
53
|
+
This method creates a new pipeline in Elasticsearch. It uses the pipeline definition provided in the class where it's called. If a pipeline with the same name already exists, it will be overwritten.
|
54
|
+
|
55
|
+
Example:
|
56
|
+
|
57
|
+
```ruby
|
58
|
+
MyPipeline.create!
|
59
|
+
```
|
60
|
+
|
61
|
+
### delete!
|
62
|
+
This method deletes the pipeline from Elasticsearch. Be careful when using this method, as it will permanently remove the pipeline.
|
63
|
+
|
64
|
+
Example:
|
65
|
+
|
66
|
+
```ruby
|
67
|
+
MyPipeline.delete!
|
68
|
+
```
|
69
|
+
|
70
|
+
### exists?
|
71
|
+
This method checks if the pipeline exists in Elasticsearch. It returns true if the pipeline exists, and false otherwise.
|
72
|
+
|
73
|
+
Example:
|
74
|
+
|
75
|
+
```ruby
|
76
|
+
MyPipeline.exists?
|
77
|
+
```
|
78
|
+
|
79
|
+
### find
|
80
|
+
This method retrieves a pipeline from Elasticsearch using its ID. If the pipeline exists, it returns the pipeline definition. If the pipeline doesn't exist, it returns nil.
|
81
|
+
|
82
|
+
|
83
|
+
Example:
|
84
|
+
```ruby
|
85
|
+
# Find another pipeline id
|
86
|
+
MyPipeline.find('another-pipeline-id')
|
87
|
+
|
88
|
+
# Find this pipeline
|
89
|
+
MyPipeline.find
|
90
|
+
```
|
91
|
+
### all
|
92
|
+
This method returns all pipelines that currently exist in Elasticsearch. It's useful when you want to see all your pipelines at once.
|
93
|
+
|
94
|
+
Example:
|
95
|
+
```ruby
|
96
|
+
MyPipeline.all
|
97
|
+
```
|
98
|
+
|
99
|
+
### simulate
|
100
|
+
|
101
|
+
This method is used to simulate the execution of the pipeline on a set of documents. It takes two parameters: `docs`, which is an array of documents to be processed, and `verbose`, which is a boolean that controls whether detailed information about each step is included in the simulation output.
|
102
|
+
|
103
|
+
Example:
|
104
|
+
|
105
|
+
```ruby
|
106
|
+
MyPipeline.simulate([{ '_source' => { 'message' => 'hello world' } }])
|
107
|
+
```
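
The sketch below shows the kind of request a simulate call translates to, assuming the standard ingest simulate endpoint where `verbose` is a query parameter and the documents go in a `docs` array. `simulate_request` and the pipeline id `my-pipeline` are hypothetical and used only for illustration.

```ruby
require "json"

# Hypothetical helper that assembles the pieces of a simulate request.
def simulate_request(pipeline_id, docs, verbose: false)
  path = "/_ingest/pipeline/#{pipeline_id}/_simulate"
  path += "?verbose=true" if verbose
  { method: "POST", path: path, body: { docs: docs } }
end

request = simulate_request(
  "my-pipeline",
  [{ "_source" => { "message" => "hello world" } }],
  verbose: true
)

puts "#{request[:method]} #{request[:path]}" # prints POST /_ingest/pipeline/my-pipeline/_simulate?verbose=true
puts JSON.generate(request[:body])           # prints {"docs":[{"_source":{"message":"hello world"}}]}
```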
|
108
|
+
|
109
|
+
---
|
110
|
+
|
111
|
+
## Ingest Processors
|
112
|
+
- `:append` - Adds one or more values to a field in a document.
|
113
|
+
- `:bytes` - Converts a human-readable byte value to its value in bytes.
|
114
|
+
- `:convert` - Changes the data type of a field in a document.
|
115
|
+
- `:copy` - Copies an entire object in an existing field to another field.
|
116
|
+
- `:csv` - Extracts CSVs and stores them as individual fields in a document.
|
117
|
+
- `:date` - Parses dates from fields and then uses the date or timestamp as the timestamp for a document.
|
118
|
+
- `:date_index_name` - Indexes documents into time-based indexes based on a date or timestamp field in a document.
|
119
|
+
- `:dissect` - Extracts structured fields from a text field using a defined pattern.
|
120
|
+
- `:dot_expander` - Expands a field with dots into an object field.
|
121
|
+
- `:drop` - Drops a document without indexing it or raising any errors.
|
122
|
+
- `:fail` - Raises an exception and stops the execution of a pipeline.
|
123
|
+
- `:foreach` - Allows for another processor to be applied to each element of an array or an object field in a document.
|
124
|
+
- `:geoip` - Adds information about the geographical location of an IP address.
|
125
|
+
- `:geojson-feature` - Indexes GeoJSON data into a geospatial field.
|
126
|
+
- `:grok` - Parses and structures unstructured data using pattern matching.
|
127
|
+
- `:gsub` - Replaces or deletes substrings within a string field of a document.
|
128
|
+
- `:html_strip` - Removes HTML tags from a text field and returns the plain text content.
|
129
|
+
- `:ip2geo` - Adds information about the geographical location of an IPv4 or IPv6 address.
|
130
|
+
- `:join` - Concatenates each element of an array into a single string using a separator character between each element.
|
131
|
+
- `:json` - Converts a JSON string into a structured JSON object.
|
132
|
+
- `:kv` - Automatically parses key-value pairs in a field.
|
133
|
+
- `:lowercase` - Converts text in a specific field to lowercase letters.
|
134
|
+
- `:pipeline` - Runs an inner pipeline.
|
135
|
+
- `:remove` - Removes fields from a document.
|
136
|
+
- `:script` - Runs an inline or stored script on incoming documents.
|
137
|
+
- `:set` - Sets the value of a field to a specified value.
|
138
|
+
- `:sort` - Sorts the elements of an array in ascending or descending order.
|
139
|
+
- `:sparse_encoding` - Generates a sparse vector/token and weights from text fields for neural sparse search using sparse retrieval.
|
140
|
+
- `:split` - Splits a field into an array using a separator character.
|
141
|
+
- `:text_embedding` - Generates vector embeddings from text fields for semantic search.
|
142
|
+
- `:text_image_embedding` - Generates combined vector embeddings from text and image fields for multimodal neural search.
|
143
|
+
- `:trim` - Removes leading and trailing white space from a string field.
|
144
|
+
- `:uppercase` - Converts text in a specific field to uppercase letters.
|
145
|
+
- `:urldecode` - Decodes a string from URL-encoded format.
|
146
|
+
- `:user_agent` - Extracts details from the user agent string that a browser sends with its web requests.
|
147
|
+
|
148
|
+
Please note that this is not an exhaustive list. There are many more ingest processors available in Elasticsearch and OpenSearch. For a complete list and their available options, refer to the official documentation:
|
149
|
+
|
150
|
+
- [Elasticsearch Ingest Processors](https://www.elastic.co/guide/en/elasticsearch/reference/current/processors.html)
|
151
|
+
- [OpenSearch Ingest Processors](https://opensearch.org/docs/latest/ingest-pipelines/processors/index-processors/#supported-processors)
|