stretchy-model 0.6.5 → 0.7.0

Files changed (212)
  1. checksums.yaml +4 -4
  2. data/.yardopts +2 -1
  3. data/README.md +28 -10
  4. data/Rakefile +56 -0
  5. data/docs/.nojekyll +0 -0
  6. data/docs/README.md +147 -0
  7. data/docs/_coverpage.md +14 -0
  8. data/docs/_sidebar.md +15 -0
  9. data/docs/examples/_sidebar.md +15 -0
  10. data/docs/examples/data_analysis.md +216 -0
  11. data/docs/examples/neural_search_with_llm.md +381 -0
  12. data/docs/examples/simple-ingest-pipeline.md +326 -0
  13. data/docs/guides/_sidebar.md +15 -0
  14. data/docs/guides/aggregations.md +142 -0
  15. data/docs/guides/machine-learning.md +154 -0
  16. data/docs/guides/models.md +372 -0
  17. data/docs/guides/pipelines.md +151 -0
  18. data/docs/guides/querying.md +361 -0
  19. data/docs/guides/quick-start.md +72 -0
  20. data/docs/guides/scopes.md +125 -0
  21. data/docs/index.html +113 -0
  22. data/docs/stretchy.cover.png +0 -0
  23. data/docs/stretchy.logo.png +0 -0
  24. data/docs/styles.css +90 -0
  25. data/lib/elasticsearch/api/actions/connector/check_in.rb +64 -0
  26. data/lib/elasticsearch/api/actions/connector/delete.rb +64 -0
  27. data/lib/elasticsearch/api/actions/connector/get.rb +64 -0
  28. data/lib/elasticsearch/api/actions/connector/last_sync.rb +66 -0
  29. data/lib/elasticsearch/api/actions/connector/list.rb +60 -0
  30. data/lib/elasticsearch/api/actions/connector/post.rb +57 -0
  31. data/lib/elasticsearch/api/actions/connector/put.rb +66 -0
  32. data/lib/elasticsearch/api/actions/connector/update_api_key_id.rb +66 -0
  33. data/lib/elasticsearch/api/actions/connector/update_configuration.rb +66 -0
  34. data/lib/elasticsearch/api/actions/connector/update_error.rb +66 -0
  35. data/lib/elasticsearch/api/actions/connector/update_filtering.rb +66 -0
  36. data/lib/elasticsearch/api/actions/connector/update_index_name.rb +66 -0
  37. data/lib/elasticsearch/api/actions/connector/update_name.rb +66 -0
  38. data/lib/elasticsearch/api/actions/connector/update_native.rb +66 -0
  39. data/lib/elasticsearch/api/actions/connector/update_pipeline.rb +66 -0
  40. data/lib/elasticsearch/api/actions/connector/update_scheduling.rb +66 -0
  41. data/lib/elasticsearch/api/actions/connector/update_service_type.rb +66 -0
  42. data/lib/elasticsearch/api/actions/connector/update_status.rb +66 -0
  43. data/lib/elasticsearch/api/namespace/connector.rb +36 -0
  44. data/lib/opensearch/api/actions/machine_learning/connector/delete.rb +42 -0
  45. data/lib/opensearch/api/actions/machine_learning/connector/get.rb +42 -0
  46. data/lib/opensearch/api/actions/machine_learning/connector/list.rb +38 -0
  47. data/lib/opensearch/api/actions/machine_learning/connector/post.rb +35 -0
  48. data/lib/opensearch/api/actions/machine_learning/connector/put.rb +44 -0
  49. data/lib/opensearch/api/actions/machine_learning/models/predict.rb +32 -0
  50. data/lib/opensearch/api/namespace/connector.rb +19 -0
  51. data/lib/stretchy/attributes/transformers/keyword_transformer.rb +41 -35
  52. data/lib/stretchy/attributes/type/array.rb +24 -1
  53. data/lib/stretchy/attributes/type/base.rb +6 -2
  54. data/lib/stretchy/attributes/type/binary.rb +24 -17
  55. data/lib/stretchy/attributes/type/boolean.rb +29 -22
  56. data/lib/stretchy/attributes/type/completion.rb +18 -10
  57. data/lib/stretchy/attributes/type/constant_keyword.rb +35 -26
  58. data/lib/stretchy/attributes/type/date_time.rb +28 -17
  59. data/lib/stretchy/attributes/type/dense_vector.rb +46 -49
  60. data/lib/stretchy/attributes/type/flattened.rb +28 -19
  61. data/lib/stretchy/attributes/type/geo_point.rb +21 -12
  62. data/lib/stretchy/attributes/type/geo_shape.rb +21 -12
  63. data/lib/stretchy/attributes/type/hash.rb +24 -10
  64. data/lib/stretchy/attributes/type/histogram.rb +25 -0
  65. data/lib/stretchy/attributes/type/ip.rb +26 -17
  66. data/lib/stretchy/attributes/type/join.rb +16 -7
  67. data/lib/stretchy/attributes/type/keyword.rb +21 -26
  68. data/lib/stretchy/attributes/type/knn_vector.rb +47 -0
  69. data/lib/stretchy/attributes/type/match_only_text.rb +22 -1
  70. data/lib/stretchy/attributes/type/nested.rb +16 -11
  71. data/lib/stretchy/attributes/type/numeric/base.rb +30 -22
  72. data/lib/stretchy/attributes/type/numeric/byte.rb +20 -0
  73. data/lib/stretchy/attributes/type/numeric/double.rb +20 -0
  74. data/lib/stretchy/attributes/type/numeric/float.rb +20 -0
  75. data/lib/stretchy/attributes/type/numeric/half_float.rb +20 -0
  76. data/lib/stretchy/attributes/type/numeric/integer.rb +21 -1
  77. data/lib/stretchy/attributes/type/numeric/long.rb +20 -0
  78. data/lib/stretchy/attributes/type/numeric/scaled_float.rb +16 -7
  79. data/lib/stretchy/attributes/type/numeric/short.rb +20 -0
  80. data/lib/stretchy/attributes/type/numeric/unsigned_long.rb +21 -1
  81. data/lib/stretchy/attributes/type/percolator.rb +16 -4
  82. data/lib/stretchy/attributes/type/point.rb +19 -9
  83. data/lib/stretchy/attributes/type/range/base.rb +24 -1
  84. data/lib/stretchy/attributes/type/range/date_range.rb +21 -5
  85. data/lib/stretchy/attributes/type/range/double_range.rb +20 -4
  86. data/lib/stretchy/attributes/type/range/float_range.rb +21 -5
  87. data/lib/stretchy/attributes/type/range/integer_range.rb +20 -4
  88. data/lib/stretchy/attributes/type/range/ip_range.rb +20 -4
  89. data/lib/stretchy/attributes/type/range/long_range.rb +20 -4
  90. data/lib/stretchy/attributes/type/rank_feature.rb +16 -6
  91. data/lib/stretchy/attributes/type/rank_features.rb +16 -9
  92. data/lib/stretchy/attributes/type/search_as_you_type.rb +28 -18
  93. data/lib/stretchy/attributes/type/shape.rb +19 -9
  94. data/lib/stretchy/attributes/type/sparse_vector.rb +25 -21
  95. data/lib/stretchy/attributes/type/string.rb +42 -1
  96. data/lib/stretchy/attributes/type/text.rb +53 -28
  97. data/lib/stretchy/attributes/type/token_count.rb +21 -11
  98. data/lib/stretchy/attributes/type/version.rb +16 -6
  99. data/lib/stretchy/attributes/type/wildcard.rb +36 -25
  100. data/lib/stretchy/attributes.rb +29 -0
  101. data/lib/stretchy/delegation/gateway_delegation.rb +78 -0
  102. data/lib/stretchy/index_setting.rb +94 -0
  103. data/lib/stretchy/indexing/bulk.rb +75 -3
  104. data/lib/stretchy/machine_learning/connector.rb +130 -0
  105. data/lib/stretchy/machine_learning/errors.rb +25 -0
  106. data/lib/stretchy/machine_learning/model.rb +162 -109
  107. data/lib/stretchy/machine_learning/registry.rb +19 -0
  108. data/lib/stretchy/model/callbacks.rb +1 -0
  109. data/lib/stretchy/model/common.rb +157 -0
  110. data/lib/stretchy/model/persistence.rb +144 -0
  111. data/lib/stretchy/model/refreshable.rb +26 -0
  112. data/lib/stretchy/open_search_compatibility.rb +2 -0
  113. data/lib/stretchy/pipeline.rb +2 -1
  114. data/lib/stretchy/pipelines/processor.rb +40 -36
  115. data/lib/stretchy/querying.rb +7 -8
  116. data/lib/stretchy/rails/railtie.rb +11 -0
  117. data/lib/stretchy/rails/tasks/connector/create.rake +32 -0
  118. data/lib/stretchy/rails/tasks/connector/delete.rake +27 -0
  119. data/lib/stretchy/rails/tasks/connector/status.rake +31 -0
  120. data/lib/stretchy/rails/tasks/connector/update.rake +32 -0
  121. data/lib/stretchy/rails/tasks/index/create.rake +28 -0
  122. data/lib/stretchy/rails/tasks/index/delete.rake +27 -0
  123. data/lib/stretchy/rails/tasks/index/status.rake +23 -0
  124. data/lib/stretchy/rails/tasks/ml/delete.rake +25 -0
  125. data/lib/stretchy/rails/tasks/ml/deploy.rake +78 -0
  126. data/lib/stretchy/rails/tasks/ml/status.rake +31 -0
  127. data/lib/stretchy/rails/tasks/pipeline/create.rake +27 -0
  128. data/lib/stretchy/rails/tasks/pipeline/delete.rake +26 -0
  129. data/lib/stretchy/rails/tasks/pipeline/status.rake +25 -0
  130. data/lib/stretchy/rails/tasks/status.rake +15 -0
  131. data/lib/stretchy/rails/tasks/stretchy.rake +42 -0
  132. data/lib/stretchy/record.rb +5 -4
  133. data/lib/stretchy/relation.rb +229 -28
  134. data/lib/stretchy/relations/aggregation_methods/aggregation.rb +59 -0
  135. data/lib/stretchy/relations/aggregation_methods/avg.rb +45 -0
  136. data/lib/stretchy/relations/aggregation_methods/bucket_script.rb +47 -0
  137. data/lib/stretchy/relations/aggregation_methods/bucket_selector.rb +47 -0
  138. data/lib/stretchy/relations/aggregation_methods/bucket_sort.rb +47 -0
  139. data/lib/stretchy/relations/aggregation_methods/cardinality.rb +47 -0
  140. data/lib/stretchy/relations/aggregation_methods/children.rb +47 -0
  141. data/lib/stretchy/relations/aggregation_methods/composite.rb +41 -0
  142. data/lib/stretchy/relations/aggregation_methods/date_histogram.rb +53 -0
  143. data/lib/stretchy/relations/aggregation_methods/date_range.rb +53 -0
  144. data/lib/stretchy/relations/aggregation_methods/extended_stats.rb +48 -0
  145. data/lib/stretchy/relations/aggregation_methods/filter.rb +47 -0
  146. data/lib/stretchy/relations/aggregation_methods/filters.rb +47 -0
  147. data/lib/stretchy/relations/aggregation_methods/geo_bounds.rb +40 -0
  148. data/lib/stretchy/relations/aggregation_methods/geo_centroid.rb +40 -0
  149. data/lib/stretchy/relations/aggregation_methods/global.rb +39 -0
  150. data/lib/stretchy/relations/aggregation_methods/histogram.rb +43 -0
  151. data/lib/stretchy/relations/aggregation_methods/ip_range.rb +41 -0
  152. data/lib/stretchy/relations/aggregation_methods/max.rb +40 -0
  153. data/lib/stretchy/relations/aggregation_methods/min.rb +41 -0
  154. data/lib/stretchy/relations/aggregation_methods/missing.rb +40 -0
  155. data/lib/stretchy/relations/aggregation_methods/nested.rb +40 -0
  156. data/lib/stretchy/relations/aggregation_methods/percentile_ranks.rb +45 -0
  157. data/lib/stretchy/relations/aggregation_methods/percentiles.rb +45 -0
  158. data/lib/stretchy/relations/aggregation_methods/range.rb +42 -0
  159. data/lib/stretchy/relations/aggregation_methods/reverse_nested.rb +40 -0
  160. data/lib/stretchy/relations/aggregation_methods/sampler.rb +40 -0
  161. data/lib/stretchy/relations/aggregation_methods/scripted_metric.rb +43 -0
  162. data/lib/stretchy/relations/aggregation_methods/significant_terms.rb +45 -0
  163. data/lib/stretchy/relations/aggregation_methods/stats.rb +42 -0
  164. data/lib/stretchy/relations/aggregation_methods/sum.rb +42 -0
  165. data/lib/stretchy/relations/aggregation_methods/terms.rb +46 -0
  166. data/lib/stretchy/relations/aggregation_methods/top_hits.rb +42 -0
  167. data/lib/stretchy/relations/aggregation_methods/top_metrics.rb +44 -0
  168. data/lib/stretchy/relations/aggregation_methods/value_count.rb +41 -0
  169. data/lib/stretchy/relations/aggregation_methods/weighted_avg.rb +42 -0
  170. data/lib/stretchy/relations/aggregation_methods.rb +20 -749
  171. data/lib/stretchy/relations/finder_methods.rb +2 -18
  172. data/lib/stretchy/relations/null_relation.rb +55 -0
  173. data/lib/stretchy/relations/query_builder.rb +82 -36
  174. data/lib/stretchy/relations/query_methods/bind.rb +19 -0
  175. data/lib/stretchy/relations/query_methods/extending.rb +29 -0
  176. data/lib/stretchy/relations/query_methods/fields.rb +70 -0
  177. data/lib/stretchy/relations/query_methods/filter_query.rb +53 -0
  178. data/lib/stretchy/relations/query_methods/has_field.rb +40 -0
  179. data/lib/stretchy/relations/query_methods/highlight.rb +75 -0
  180. data/lib/stretchy/relations/query_methods/hybrid.rb +60 -0
  181. data/lib/stretchy/relations/query_methods/ids.rb +40 -0
  182. data/lib/stretchy/relations/query_methods/match.rb +52 -0
  183. data/lib/stretchy/relations/query_methods/must_not.rb +54 -0
  184. data/lib/stretchy/relations/query_methods/neural.rb +58 -0
  185. data/lib/stretchy/relations/query_methods/neural_sparse.rb +43 -0
  186. data/lib/stretchy/relations/query_methods/none.rb +21 -0
  187. data/lib/stretchy/relations/query_methods/or_filter.rb +21 -0
  188. data/lib/stretchy/relations/query_methods/order.rb +63 -0
  189. data/lib/stretchy/relations/query_methods/query_string.rb +44 -0
  190. data/lib/stretchy/relations/query_methods/regexp.rb +61 -0
  191. data/lib/stretchy/relations/query_methods/should.rb +51 -0
  192. data/lib/stretchy/relations/query_methods/size.rb +44 -0
  193. data/lib/stretchy/relations/query_methods/skip_callbacks.rb +47 -0
  194. data/lib/stretchy/relations/query_methods/source.rb +59 -0
  195. data/lib/stretchy/relations/query_methods/where.rb +113 -0
  196. data/lib/stretchy/relations/query_methods.rb +48 -569
  197. data/lib/stretchy/relations/scoping/default.rb +136 -0
  198. data/lib/stretchy/relations/scoping/named.rb +70 -0
  199. data/lib/stretchy/relations/scoping/scope_registry.rb +36 -0
  200. data/lib/stretchy/relations/scoping.rb +30 -0
  201. data/lib/stretchy/relations/search_option_methods.rb +2 -0
  202. data/lib/stretchy/version.rb +1 -1
  203. data/lib/stretchy.rb +24 -10
  204. metadata +170 -17
  205. data/lib/stretchy/common.rb +0 -38
  206. data/lib/stretchy/null_relation.rb +0 -53
  207. data/lib/stretchy/persistence.rb +0 -43
  208. data/lib/stretchy/refreshable.rb +0 -15
  209. data/lib/stretchy/scoping/default.rb +0 -134
  210. data/lib/stretchy/scoping/named.rb +0 -68
  211. data/lib/stretchy/scoping/scope_registry.rb +0 -34
  212. data/lib/stretchy/scoping.rb +0 -28
@@ -0,0 +1,372 @@
# Models

## Creating Stretchy Documents

Create your models in `app/models` and subclass the `StretchyModel` class.

```ruby
class Profile < StretchyModel
end
```

The index name `profiles` will be inferred from the `Profile` model.
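
As a rough picture of the convention, the sketch below underscores the class name and appends `s`. `infer_index_name` is a hypothetical helper, and the naive pluralization is an assumption: Stretchy most likely relies on an inflector that also handles irregular plurals.

```ruby
# Hypothetical helper approximating index-name inference:
# underscore the class name, then pluralize naively by appending "s".
# A real inflector (such as ActiveSupport's) also handles irregular plurals.
def infer_index_name(class_name)
  underscored = class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  "#{underscored}s"
end

puts infer_index_name("Profile")     # profiles
puts infer_index_name("UserProfile") # user_profiles
```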

### Overriding Naming Conventions

If you need a different naming convention for your indexes, you can override the default conventions.

```ruby
class Profile < StretchyModel
  index_name 'user_profiles'
end
```

### Default Sorting

By default, Stretchy models are sorted by `created_at`. If you'd like to sort by another field, change it with `default_sort_key`:

```ruby
class Profile < StretchyModel
  default_sort_key :updated_at
end
```

## Attributes

Attributes are declared with the `attribute` method, using one of the types defined in `Stretchy::Attributes::Type`:

* `:array` - Used to store multiple values in a single field.
* `:binary` - Used to store binary data as a `Base64` encoded string.
* `:boolean` - Used to store `true` or `false` values.
* `:constant_keyword` - Used for fields that will contain the same value across all documents.
* `:datetime` - Used to store dates and times.
* `:flattened` - Used to index an entire object hierarchy as a single field.
* `:geo_point` - Used to store geographical locations as latitude/longitude points.
* `:geo_shape` - Used to store complex shapes such as polygons.
* `:histogram` - Used to store pre-aggregated numerical data representing a histogram.
* `:hash` - Used for structured JSON objects. Each field can be of any data type, including another object.
* `:ip` - Used to store IP addresses.
* `:join` - Used to create parent/child relations between documents in the same index.
* `:keyword` - Used for exact match searches, sorting, and aggregations. This field is not analyzed.
* `:match_only_text` - A space-optimized variant of `:text` for full-text search that trades scoring and fast positional queries for a smaller index.
* `:nested` - Used to index arrays of objects as separate documents.
* `:percolator` - Used to store queries that incoming documents are matched against.
* `:point` - Used to store arbitrary Cartesian x/y points.
* `:rank_feature` - Used to record a numeric feature to boost hits at query time.
* `:rank_features` - Used to record many numeric features to boost hits at query time.
* `:text` - Used to store text. This field is analyzed, which means that its value is broken down into separate searchable terms.
* `:token_count` - Used to store the number of tokens in a string.
* `:dense_vector` - Used to store dense vectors of float values.
* `:search_as_you_type` - Used for full-text search, especially for autocomplete functionality.
* `:sparse_vector` - Used to store sparse vectors of float values.
* `:string` - Used to store text. This field is analyzed, which means that its value is broken down into separate searchable terms. _An alias of `:text`._
* `:version` - Used to store software version values.
* `:wildcard` - Used for unstructured, machine-generated strings that are searched with wildcard and regexp queries.

#### Numeric Types
* `:long` - Used to store signed 64-bit integers.
* `:integer` - Used to store signed 32-bit integers.
* `:short` - Used to store signed 16-bit integers.
* `:byte` - Used to store signed 8-bit integers.
* `:double` - Used to store double-precision (64-bit) floating point numbers.
* `:float` - Used to store single-precision (32-bit) floating point numbers.
* `:half_float` - Used to store half-precision (16-bit) floating point numbers.
* `:scaled_float` - Used to store floating point numbers backed by a `long` and scaled by a fixed factor.
* `:unsigned_long` - Used to store unsigned 64-bit integers (non-negative values only).

#### Range Types
* `:integer_range` - Used to store ranges of integer numbers.
* `:float_range` - Used to store ranges of floating point numbers.
* `:long_range` - Used to store ranges of long integer numbers.
* `:double_range` - Used to store ranges of double-precision floating point numbers.
* `:date_range` - Used to store ranges of dates.
* `:ip_range` - Used to store ranges of IP addresses.

In Elasticsearch, `:string` and `:keyword` are two seemingly similar data types that serve different purposes.

A `:string` field is analyzed, which means that its value is broken down into separate searchable terms. For example, the string "quick brown fox" might be broken down into the terms "quick", "brown", and "fox". This makes `:string` fields suitable for full-text search where you want to find partial matches or search for individual words within a field.

On the other hand, a `:keyword` field is not analyzed and is used for exact match searches, sorting, and aggregations. The entire value of the field is used as a single term. For example, the string "quick brown fox" is a single term. This makes `:keyword` fields suitable for filtering, sorting, and aggregations where you need the exact value of the field.

In Stretchy, defining a field as `:keyword` is like defining a `:string` or `:text` field, except that the `.keyword` suffix is added to the field name automatically in queries, aggregations, and filters. This tells Elasticsearch to use the field's `keyword` form, which treats the entire value as a single term.

Avoid using `:keyword` fields for full-text search. Use the `:text` or `:string` type instead.
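
To make the difference concrete, here is a plain-Ruby approximation of the two indexing behaviors. `analyze_text` is a rough, hypothetical stand-in for Elasticsearch's standard analyzer (lowercase, split into word tokens); the real analyzer follows Unicode segmentation rules, so treat this as an illustration only.

```ruby
# Rough approximation of a :text (analyzed) field: lowercase the value
# and split it into word tokens.
def analyze_text(value)
  value.downcase.scan(/\w+/)
end

# A :keyword field is not analyzed: the whole value is a single term.
def analyze_keyword(value)
  [value]
end

# A term-level lookup succeeds only if the search term is exactly one
# of the terms that were indexed for the field.
def term_match?(indexed_terms, search_term)
  indexed_terms.include?(search_term)
end

value = "Quick brown fox"
text_terms    = analyze_text(value)    # ["quick", "brown", "fox"]
keyword_terms = analyze_keyword(value) # ["Quick brown fox"]

puts term_match?(text_terms, "brown")              # true: individual word
puts term_match?(keyword_terms, "brown")           # false: needs the exact value
puts term_match?(keyword_terms, "Quick brown fox") # true
puts term_match?(text_terms, "Quick brown fox")    # false
```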

```ruby
class Profile < StretchyModel
  attribute :first_name, :keyword
  attribute :last_name, :keyword
  attribute :geo, :hash
  attribute :bio, :text
  attribute :age, :integer
  attribute :score, :float
  attribute :joined, :datetime
  attribute :fav_flowers, :array
  attribute :visible, :boolean, default: true
end
```

>[!TIP]
>
> `created_at`, `updated_at`, and `id` are included automatically.

All of the attribute types accept additional parameters that configure the mapping for the field. These parameters customize how the data is indexed and searched: for example, whether a field should be stored, whether it should be indexed, and more.

Here's an example of how to use additional parameters with the `:keyword` attribute type:

```ruby
class Product < StretchyModel
  attribute :tag, :keyword, ignore_above: 256, time_series_dimension: true
end
```

In this example, `ignore_above: 256` means that strings longer than 256 characters will not be indexed, and `time_series_dimension: true` marks the field as a time series dimension.
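
The effect of `ignore_above` can be pictured with a small sketch. `indexed_values` is a hypothetical helper; in Elasticsearch, over-long values are still kept in `_source` but are not indexed, so they cannot be searched or aggregated on. The limit is measured in characters and applied to each array element separately.

```ruby
# Hypothetical illustration of ignore_above: values longer than the
# limit are skipped at index time (they remain in the stored _source).
def indexed_values(values, ignore_above:)
  values.select { |value| value.length <= ignore_above }
end

tags = ["ruby", "elasticsearch", "x" * 300]
p indexed_values(tags, ignore_above: 256) # ["ruby", "elasticsearch"]
```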

Refer to the documentation for each attribute type for a full list of available parameters and their meanings.

When working with `:hash` data types, it's often useful to specify the fields within that object for fine-tuned control over mappings.

The `:hash` data type is used for structured JSON objects, where each field can be of any data type, including another object. This provides a flexible way to store complex data structures in a single field.

However, this flexibility can sometimes lead to suboptimal search performance or unexpected search results. For example, a text field within the hash object might be analyzed and tokenized in a way that's not suitable for your use case.

To overcome this, you can specify the `:properties` of the hash object and their data types. This lets you control how each field is indexed and searched. For example, you can make a field `:keyword` so that it isn't analyzed and can be used for exact match searches.

Here's an example of how to specify fields within a hash object:

```ruby
class Profile < StretchyModel
  attribute :metadata, :hash, properties: {
    title: { type: :text },
    tags: { type: :keyword },
    checkins: { type: :array }
  }
end
```

In this example, the `metadata` attribute is a hash object with three fields: `title`, `tags`, and `checkins`. The `title` field is of type `:text`, so it will be analyzed and can be used for full-text search. The `tags` field is of type `:keyword`, so it will not be analyzed and can be used for exact match searches. The `checkins` field is of type `:array`, so it can contain zero or more values.

> [!NOTE]
>
> When adding a field dynamically, the first value in the array determines the field type. All subsequent values must be of the same data type, or it must at least be possible to coerce them to the same data type.
>
> Arrays with a mixture of data types are not supported: `[ 10, "some string" ]`
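
A minimal sketch of the rule in the note, assuming only integer-typed arrays and using Ruby's `Integer()` as a stand-in for Elasticsearch's numeric coercion: the first element fixes the type, and every later element must be convertible to it. `coercible_array?` is hypothetical.

```ruby
# Hypothetical check mirroring the dynamic-mapping rule for arrays:
# the first value decides the type; later values must coerce to it.
def coercible_array?(values)
  return true if values.empty?
  # This sketch only models arrays whose first value is an Integer.
  return true unless values.first.is_a?(Integer)

  values.drop(1).all? do |value|
    begin
      Integer(value) # raises if the value can't become an integer
      true
    rescue ArgumentError, TypeError
      false
    end
  end
end

p coercible_array?([10, 20])            # true
p coercible_array?([10, "20"])          # true: "20" coerces to 20
p coercible_array?([10, "some string"]) # false
```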

By specifying the fields within the hash object, you gain fine-tuned control over the mappings and can optimize search performance and accuracy for your specific use case.

### Reading and Writing Data

#### Create

The `new` method returns a new object without saving it, while `create` returns the object and saves it to the index.

For example, given a model `Profile` with attributes `first_name`, `last_name`, and `age`, the following `create` call creates a new record and saves it to the index:

```ruby
profile = Profile.create(first_name: "Candy", last_name: "Mu", age: 33)
```

Using the `new` method, an object can be instantiated without being saved:

```ruby
profile = Profile.new
profile.first_name = "Candy"
profile.last_name = "Mu"
profile.age = 33
```

Calling `profile.save` will index the record.
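
The difference between `new` and `create` can be sketched with a tiny in-memory stand-in. `FakeModel` is hypothetical: it only mirrors the contract described above (`create` is `new` followed by `save`), never talks to Elasticsearch, and is not Stretchy's implementation.

```ruby
# Hypothetical in-memory stand-in illustrating new vs. create.
class FakeModel
  INDEX = [] # stands in for the Elasticsearch index

  attr_accessor :attributes

  def initialize(attributes = {})
    @attributes = attributes
    @persisted = false
  end

  # create = new + save
  def self.create(attributes = {})
    record = new(attributes)
    record.save
    record
  end

  def save
    INDEX << attributes unless persisted?
    @persisted = true
  end

  def persisted?
    @persisted
  end
end

draft = FakeModel.new(first_name: "Candy")
p draft.persisted?      # false
p FakeModel::INDEX.size # 0

saved = FakeModel.create(first_name: "Candy", last_name: "Mu", age: 33)
p saved.persisted?      # true
p FakeModel::INDEX.size # 1
```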

#### Read

Following the ActiveRecord API, Stretchy models define the familiar query methods:

```ruby
# return a collection with all profiles
profiles = Profile.all
```

```ruby
# return the first profile
profile = Profile.first
```

```ruby
# find all profiles named Lori with an age of 24, sorted by created_at in descending order
profiles = Profile.where(first_name: "Lori", age: 24).sort(created_at: :desc)
```

The full [Query Interface](guides/querying) guide goes into further depth on the API available for interacting with Stretchy models.
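
For orientation, a query like `where(first_name: "Lori", age: 24).sort(created_at: :desc)` corresponds roughly to the Elasticsearch Query DSL request built below. This is an assumption about the general shape, not the exact body Stretchy generates (for example, Stretchy may target `first_name.keyword` for `:keyword` attributes); `search_body` is a hypothetical helper.

```ruby
require "json"

# Hypothetical helper: builds a Query DSL body with exact-match filters
# and a sort clause, similar in spirit to where(...).sort(...).
def search_body(filters, sort)
  {
    query: {
      bool: {
        # term queries in filter context match exact values without scoring
        filter: filters.map { |field, value| { term: { field => value } } }
      }
    },
    sort: sort.map { |field, direction| { field => { order: direction.to_s } } }
  }
end

body = search_body({ first_name: "Lori", age: 24 }, { created_at: :desc })
puts JSON.pretty_generate(body)
```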

#### Update

```ruby
profile = Profile.where(first_name: "Candy").first
profile.update(first_name: "Lilly")
```

#### Delete

```ruby
profile = Profile.where(first_name: "Lori").first
profile.destroy
```

### Bulk Operations

Bulk operations in Elasticsearch perform multiple operations in a single API call. The Bulk API accepts any mix of `index`, `update`, `create`, and `delete` operations, which can greatly increase the speed of indexing and updating documents.

In Stretchy, the `Model.bulk` method performs bulk operations. Pass it an array of bulk operations, typically generated by each record's `to_bulk` method.

The `to_bulk` method generates the structure for a single bulk operation. By default it produces an `index` operation, but you can also specify `:delete` or `:update`.

For large datasets, you can use the `Model.bulk_in_batches` method to perform bulk operations in batches. This method divides the records into batches of a specified size and processes each batch separately, which can be more memory-efficient than processing all records at once.

Here's an example of how you can use these methods:

```ruby
# an array of bulk operations, e.g. built with record.to_bulk
Model.bulk(records_as_bulk_operations)
```

#### Bulk helper
Generates the structure for a bulk operation:
```ruby
record.to_bulk # same as to_bulk(:index)
record.to_bulk(:delete)
record.to_bulk(:update)
```
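
On the wire, the Bulk API expects newline-delimited JSON: an action line, followed by a source line for `index` and `create`, or a partial document wrapped in `doc` for `update`; `delete` has no source line. The sketch below builds that payload directly. `bulk_payload` is hypothetical and shows the request format, not the value `to_bulk` returns.

```ruby
require "json"

# Hypothetical helper that serializes operations into the Bulk API's
# newline-delimited JSON (NDJSON) format.
def bulk_payload(index, operations)
  lines = operations.flat_map do |op, id, doc|
    action = { op => { _index: index, _id: id } }
    case op
    when :index, :create then [action, doc]
    when :update         then [action, { doc: doc }]
    when :delete         then [action]
    else raise ArgumentError, "unknown bulk operation: #{op}"
    end
  end
  # The Bulk API requires a trailing newline after the last line.
  lines.map { |line| JSON.generate(line) }.join("\n") + "\n"
end

payload = bulk_payload("profiles", [
  [:index,  "1", { first_name: "Candy", age: 33 }],
  [:update, "2", { age: 25 }],
  [:delete, "3", nil]
])
puts payload
```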

#### In batches
Run bulk operations in batches specified by `size`
```ruby
Model.bulk_in_batches(records, size: 100) do |batch|
  batch.map! { |record| Model.new(record).to_bulk }
end
```
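
The batching itself can be pictured as splitting the records into slices of at most `size`, with each slice becoming one bulk request. A minimal plain-Ruby sketch using `Enumerable#each_slice`; `in_batches` is hypothetical, and `Model.bulk_in_batches` may differ in its details.

```ruby
# Hypothetical sketch: split records into batches of at most `size`
# and hand each batch to a block, e.g. to send one bulk request each.
def in_batches(records, size:)
  records.each_slice(size).map { |batch| yield batch }
end

records = (1..250).map { |i| { id: i } }
batch_sizes = in_batches(records, size: 100) { |batch| batch.size }
p batch_sizes # [100, 100, 50]
```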

### Validations

```ruby
class Profile < StretchyModel
  attribute :first_name, :string
  validates :first_name, presence: true
end
```

```irb
profile = Profile.new
profile.save
profile.errors.full_messages
#=> ["First name can't be blank"]
```

### Callbacks

* `:before_save`
* `:after_save`
* `:before_create`
* `:after_create`
* `:before_update`
* `:after_update`
* `:before_destroy`
* `:after_destroy`

### Associations
Associations can be defined between models. Although Elasticsearch is not a relational database, it is sometimes useful to link records.

```ruby
class Animal < StretchyModel
  attribute :name, :keyword
  attribute :zoo_id, :keyword

  belongs_to :zoo
end
```

```ruby
class Zoo < StretchyModel
  has_many :animals
end
```

```ruby (console)
zoo.animals
=> [#<Animal id: 1, name: "Panda", zoo_id: 1, created_at: 2024-03-15T01:03:38.395Z, updated_at: 2024-03-15T01:03:38.395Z>,
 #<Animal id: 2, name: "Lemur", zoo_id: 1, created_at: 2024-03-15T01:03:38.395Z, updated_at: 2024-03-15T01:03:38.395Z>]
```

Associations largely work the same way as their Rails counterparts. The following association types are supported:

* `belongs_to`
* `has_many`
* `has_one`

>[!WARNING|label:Associations in Elasticsearch]
>
> Because Elasticsearch is not a relational database, there are no joins. This means associations generate additional queries to fetch the related data.
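
A small in-memory sketch shows why associations cost extra queries: loading the parents takes one lookup, and each association access adds another. The arrays and the `QueryLog` counter are hypothetical stand-ins for the `zoos` and `animals` indexes, not Stretchy internals.

```ruby
# Hypothetical counter for search requests sent to the cluster.
module QueryLog
  class << self
    attr_accessor :count
  end
  self.count = 0
end

# Hypothetical in-memory stand-ins for the zoos and animals indexes.
ZOOS = [{ id: 1, name: "City Zoo" }, { id: 2, name: "Safari Park" }].freeze
ANIMALS = [
  { id: 1, name: "Panda", zoo_id: 1 },
  { id: 2, name: "Lemur", zoo_id: 1 },
  { id: 3, name: "Lion",  zoo_id: 2 }
].freeze

# Each call models one search request against an index.
def search(index)
  QueryLog.count += 1
  index.select { |doc| yield doc }
end

zoos = search(ZOOS) { true } # like Zoo.all

# No joins: each zoo.animals access is its own query filtered on zoo_id.
zoos.each do |zoo|
  animals = search(ANIMALS) { |animal| animal[:zoo_id] == zoo[:id] }
  puts "#{zoo[:name]}: #{animals.map { |a| a[:name] }.join(', ')}"
end

puts QueryLog.count # 3: one for the zoos, one per zoo for its animals
```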

## Mappings

In Elasticsearch, mappings define how documents and their fields are stored and indexed. When using the `attribute` method, you can pass additional mapping options as parameters.

```ruby
class Report < StretchyModel
  attribute :title, :text
  attribute :created_by, :keyword
  attribute :body, :text, term_vector: :with_positions_offsets
end
```
The `attribute` method defines the fields of the `Report` model. The first argument is the field name, the second is the field type, and any remaining options are additional mapping options for the field.

In the case of the `body` field, `term_vector: :with_positions_offsets` is an additional mapping option. It configures Elasticsearch to store term vectors for the `body` field, including term positions and character offsets. Term vectors with offsets speed up highlighting of large text fields (the fast vector highlighter requires them), at the cost of a larger index.

So when defining mappings for your Stretchy models, use the `attribute` method to specify not only field names and types but also any additional mapping options you need.

If you're curious and want to dive deeper, check out the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html). It's a treasure trove of information on all the supported options for each field type.

```ruby
Report.mappings
=> {
  "properties" => {
    "id" => { "type" => "keyword" },
    "created_at" => { "type" => "date" },
    "updated_at" => { "type" => "date" },
    "title" => { "type" => "text" },
    "created_by" => { "type" => "keyword" },
    "body" => {
      "type" => "text",
      "term_vector" => "with_positions_offsets",
      "fields" => {
        "keyword" => {
          "type" => "keyword",
          "ignore_above" => 256
        }
      }
    }
  },
  :dynamic => true
}
```
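
To see how `attribute` declarations relate to a `properties` hash like the one above, here is a hypothetical builder that turns `[name, type, options]` definitions into mappings. The small type table is an assumption (for instance, that `:datetime` maps to Elasticsearch's `date`, as the `created_at` entry suggests), and the sketch does not reproduce Stretchy's extra defaults such as the `keyword` sub-field.

```ruby
# Hypothetical builder turning attribute definitions into a mappings
# hash. Only a few types are covered, and the type table is an assumption.
ES_TYPES = {
  text: "text",
  keyword: "keyword",
  datetime: "date",
  integer: "integer"
}.freeze

def build_mappings(definitions)
  properties = definitions.to_h do |name, type, options = {}|
    # Symbol option values become strings in the JSON mapping,
    # e.g. :with_positions_offsets -> "with_positions_offsets".
    extra = options.to_h { |key, value| [key.to_s, value.is_a?(Symbol) ? value.to_s : value] }
    [name.to_s, { "type" => ES_TYPES.fetch(type) }.merge(extra)]
  end
  { "properties" => properties }
end

mappings = build_mappings([
  [:title, :text],
  [:created_by, :keyword],
  [:body, :text, { term_vector: :with_positions_offsets }]
])
puts mappings["properties"]["body"] # type "text" plus the term_vector option
```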
@@ -0,0 +1,151 @@
1
+ # Pipelines
2
+
3
+ Pipelines follow a specific convention for storing pipeline definitions. This helps us keep our code organized and easy to navigate.
4
+
5
+ Generally, pipelines can be stored in `app/pipelines`. However, if you have a mix of ingest and search pipelines it's good practice to break them out into their own namespace.
6
+
7
+ For ingest pipelines, we store each pipeline in its own Ruby file inside the `app/pipelines/ingest` directory. The file name matches the pipeline name. For example, the definition for an ingest pipeline named `example_pipeline` would be stored in `app/pipelines/ingest/example_pipeline.rb`.
8
+
9
+ Similarly, for search pipelines, we store each pipeline in its own Ruby file inside the `app/pipelines/search` directory. Again, the file name matches the pipeline name. So, a search pipeline named `example_pipeline` would be stored in `app/pipelines/search/example_pipeline.rb`.
10
+
11
+ This convention makes it easy to find the definition for a specific pipeline, whether it's an ingest pipeline or a search pipeline.
12
+
13
+ - *app/pipelines/ingest/example_ingest_pipeline.rb*
14
+ - *app/pipelines/search/example_search_pipeline.rb*
15
+
16
+ ---
17
+
18
+ ## Defining a Pipeline
19
+
20
+ A pipeline in `stretchy-model` is defined as a Ruby class that inherits from `Stretchy::Pipeline`. It has a specific structure and includes several key components:
21
+
22
+ - `pipeline_name`: This is the ID of your pipeline. By default, it's inferred from the class name.
23
+ - `description`: This is a brief description of what your pipeline does. It's a good practice to provide a meaningful description so that others can understand the purpose of your pipeline at a glance.
24
+ - `processor`: These are the processors that your pipeline will run. A pipeline can have one or more processors. Each processor is defined with a type (like `sparse_encoding`) and a set of options. The processors are run in the order they are defined. Refer to [Ingest Processors](/guides/pipelines?id=ingest-processors) for available options.
25
+
26
+
27
+ Here's an example of a pipeline with these components:
28
+
29
+ ```ruby
30
+ class NLPSparsePipeline < Stretchy::Pipeline
31
+
32
+   # Optional: when omitted, the name is inferred from the class name
33
+   pipeline_name 'nlp-sparse-pipeline'
34
+
35
+   description "Sparse encoding pipeline"
36
+
37
+   processor :sparse_encoding,
38
+     model_id: 'q32Pw02BJ3squ3VZa',
39
+     field_map: {
40
+       body: :embedding
41
+     }
42
+ end
43
+ ```
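In Elasticsearch and OpenSearch, an ingest pipeline is stored as a JSON body with a `description` and an ordered `processors` array. Each entry in that array is keyed by the processor type, and the pipeline ID goes in the request path rather than the body. The stand-in below is hypothetical: `MiniPipeline` is not stretchy-model's implementation. It only mimics the three declarations to show how they could map onto that body, and it assumes processor options are passed through unchanged.

```ruby
require "json"

# Hypothetical stand-in that mimics the class-level DSL; NOT stretchy-model's code.
class MiniPipeline
  class << self
    # Sets or reads the pipeline ID (used in the request path, not the body).
    def pipeline_name(name = nil)
      @pipeline_name = name if name
      @pipeline_name
    end

    # Sets or reads the human-readable description.
    def description(text = nil)
      @description = text if text
      @description
    end

    # Appends a processor; declaration order becomes execution order.
    def processor(type, **options)
      processors << { type => options }
    end

    # Processors declared on this class, memoized per class.
    def processors
      @processors ||= []
    end

    # Body in the shape the ingest pipeline API accepts.
    def to_body
      { description: description, processors: processors }
    end
  end
end

class NLPSparsePipelineSketch < MiniPipeline
  pipeline_name 'nlp-sparse-pipeline'
  description "Sparse encoding pipeline"
  processor :sparse_encoding,
    model_id: 'q32Pw02BJ3squ3VZa',
    field_map: { body: :embedding }
end

puts JSON.pretty_generate(NLPSparsePipelineSketch.to_body)
```

The printed body contains the description and a single `sparse_encoding` entry whose `field_map` sends `body` to `embedding`. A second `processor` call would simply append another entry after it.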
44
+
45
+ [^1]: https://opensearch.org/docs/latest/ingest-pipelines/
46
+
47
+ [^2]: https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html
48
+
49
+
50
+ ## Managing Pipelines
51
+
52
+ ### create!
53
+ This method creates the pipeline in Elasticsearch from the definition in the class it is called on. If a pipeline with the same name already exists, it is overwritten.
54
+
55
+ Example:
56
+
57
+ ```ruby
58
+ MyPipeline.create!
59
+ ```
60
+
61
+ ### delete!
62
+ This method deletes the pipeline from Elasticsearch. Be careful when using this method, as it will permanently remove the pipeline.
63
+
64
+ Example:
65
+
66
+ ```ruby
67
+ MyPipeline.delete!
68
+ ```
69
+
70
+ ### exists?
71
+ This method checks whether the pipeline exists in Elasticsearch. It returns `true` if the pipeline exists and `false` otherwise.
72
+
73
+ Example:
74
+
75
+ ```ruby
76
+ MyPipeline.exists?
77
+ ```
78
+
79
+ ### find
80
+ This method retrieves a pipeline from Elasticsearch by its ID and returns the pipeline definition, or `nil` if no such pipeline exists. Called without an argument, it looks up the pipeline defined by the class itself.
81
+
82
+
83
+ Example:
84
+ ```ruby
85
+ # Find a different pipeline by its ID
86
+ MyPipeline.find('another-pipeline-id')
87
+
88
+ # Find this pipeline
89
+ MyPipeline.find
90
+ ```
91
+ ### all
92
+ This method returns all pipelines that currently exist in Elasticsearch. It's useful when you want to see all your pipelines at once.
93
+
94
+ Example:
95
+ ```ruby
96
+ MyPipeline.all
97
+ ```
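Each of these class methods corresponds to a request against the ingest pipeline REST API, and Elasticsearch and OpenSearch share these paths. The mapping below is a sketch based on those public endpoints. It assumes stretchy-model issues equivalent requests through its client and is not taken from the gem's source. `rest_request_for` is a hypothetical helper, and search pipelines are not covered.

```ruby
# Hypothetical mapping from pipeline class methods to ingest pipeline REST calls.
# Assumption: stretchy-model sends equivalent requests; this is not its source code.
def rest_request_for(action, id = nil)
  case action
  when :create! then ["PUT", "/_ingest/pipeline/#{id}"]    # creates or overwrites
  when :delete! then ["DELETE", "/_ingest/pipeline/#{id}"]
  when :exists? then ["GET", "/_ingest/pipeline/#{id}"]    # found vs. 404 not found
  when :find    then ["GET", "/_ingest/pipeline/#{id}"]
  when :all     then ["GET", "/_ingest/pipeline"]          # every ingest pipeline
  else raise ArgumentError, "unsupported action: #{action.inspect}"
  end
end

%i[create! delete! exists? find].each do |action|
  verb, path = rest_request_for(action, "nlp-sparse-pipeline")
  puts "#{action.to_s.ljust(8)} #{verb} #{path}"
end
puts "all      #{rest_request_for(:all).join(' ')}"
```

`exists?` and `find` hit the same endpoint in this sketch. The difference lies in how the response is interpreted: a boolean in one case, the definition or `nil` in the other.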
98
+
99
+ ### simulate
100
+
101
+ This method is used to simulate the execution of the pipeline on a set of documents. It takes two parameters: `docs`, which is an array of documents to be processed, and `verbose`, which is a boolean that controls whether detailed information about each step is included in the simulation output.
102
+
103
+ Example:
104
+
105
+ ```ruby
106
+ MyPipeline.simulate([{ '_source' => { 'message' => 'hello world' } }])
107
+ ```
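In the simulate API, the documents go in the request body under `docs`, each wrapped in `_source` as in the example above. `verbose` is a query parameter; when it is set, the response reports the result of each processor separately, which helps pinpoint the step that changes or breaks a document. The helper below is a hypothetical sketch of that request shape. It assumes stretchy-model forwards `docs` and `verbose` this way, which this guide does not confirm.

```ruby
require "json"

# Hypothetical helper: builds the simulate request for a pipeline.
# Assumption: docs become the body and verbose becomes a query parameter.
def simulate_request(pipeline_id, docs, verbose: false)
  path = "/_ingest/pipeline/#{pipeline_id}/_simulate"
  path += "?verbose=true" if verbose
  ["POST", path, JSON.generate({ docs: docs })]
end

docs = [
  { "_source" => { "message" => "hello world" } },
  { "_source" => { "message" => "second document" } }
]

verb, path, body = simulate_request("my-pipeline", docs, verbose: true)
puts "#{verb} #{path}"
puts body
```

Simulation does not index anything, so it is a cheap way to check a pipeline change before calling `create!`.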
108
+
109
+ ---
110
+
111
+ ## Ingest Processors
112
+ - `:append` - Adds one or more values to a field in a document.
113
+ - `:bytes` - Converts a human-readable byte value to its value in bytes.
114
+ - `:convert` - Changes the data type of a field in a document.
115
+ - `:copy` - Copies an entire object in an existing field to another field.
116
+ - `:csv` - Extracts CSVs and stores them as individual fields in a document.
117
+ - `:date` - Parses dates from fields and then uses the date or timestamp as the timestamp for a document.
118
+ - `:date_index_name` - Indexes documents into time-based indexes based on a date or timestamp field in a document.
119
+ - `:dissect` - Extracts structured fields from a text field using a defined pattern.
120
+ - `:dot_expander` - Expands a field with dots into an object field.
121
+ - `:drop` - Drops a document without indexing it or raising any errors.
122
+ - `:fail` - Raises an exception and stops the execution of a pipeline.
123
+ - `:foreach` - Allows for another processor to be applied to each element of an array or an object field in a document.
124
+ - `:geoip` - Adds information about the geographical location of an IP address.
125
+ - `:'geojson-feature'` - Indexes GeoJSON data into a geospatial field. The name contains a hyphen, so it must be written as a quoted symbol.
126
+ - `:grok` - Parses and structures unstructured data using pattern matching.
127
+ - `:gsub` - Replaces or deletes substrings within a string field of a document.
128
+ - `:html_strip` - Removes HTML tags from a text field and returns the plain text content.
129
+ - `:ip2geo` - Adds information about the geographical location of an IPv4 or IPv6 address.
130
+ - `:join` - Concatenates each element of an array into a single string using a separator character between each element.
131
+ - `:json` - Converts a JSON string into a structured JSON object.
132
+ - `:kv` - Automatically parses key-value pairs in a field.
133
+ - `:lowercase` - Converts text in a specific field to lowercase letters.
134
+ - `:pipeline` - Runs an inner pipeline.
135
+ - `:remove` - Removes fields from a document.
136
+ - `:script` - Runs an inline or stored script on incoming documents.
137
+ - `:set` - Sets the value of a field to a specified value.
138
+ - `:sort` - Sorts the elements of an array in ascending or descending order.
139
+ - `:sparse_encoding` - Generates a sparse vector/token and weights from text fields for neural sparse search using sparse retrieval.
140
+ - `:split` - Splits a field into an array using a separator character.
141
+ - `:text_embedding` - Generates vector embeddings from text fields for semantic search.
142
+ - `:text_image_embedding` - Generates combined vector embeddings from text and image fields for multimodal neural search.
143
+ - `:trim` - Removes leading and trailing white space from a string field.
144
+ - `:uppercase` - Converts text in a specific field to uppercase letters.
145
+ - `:urldecode` - Decodes a string from URL-encoded format.
146
+ - `:user_agent` - Extracts details from the user agent sent by a browser to its web requests.
147
+
148
+ This list is not exhaustive, and availability differs between engines: some processors, such as `:sparse_encoding` and `:text_embedding`, are specific to OpenSearch. For the complete list and each processor's options, refer to the official documentation:
149
+
150
+ - [Elasticsearch Ingest Processors](https://www.elastic.co/guide/en/elasticsearch/reference/current/processors.html)
151
+ - [OpenSearch Ingest Processors](https://opensearch.org/docs/latest/ingest-pipelines/processors/index-processors/#supported-processors)
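To build intuition for how processors compose, the sketch below emulates three of the processors listed above (`:trim`, `:lowercase`, `:split`) in plain Ruby and runs them in declaration order over one document. It is only a local approximation. The real processors run inside the cluster and accept more options, such as `ignore_missing`. `apply_processors` is a hypothetical helper, and the `processor` declarations in the comment are an assumed usage that follows the DSL example earlier in this guide.

```ruby
# Local, hypothetical emulation of three processors from the list above.
# In stretchy-model the equivalent declarations would presumably be:
#   processor :trim,      field: :tags
#   processor :lowercase, field: :tags
#   processor :split,     field: :tags, separator: ","
# The real processors run inside the cluster; this only mirrors their effect.
PROCESSORS = {
  trim:      ->(value, _opts) { value.strip },
  lowercase: ->(value, _opts) { value.downcase },
  # The split processor's separator is a regular expression, hence Regexp.new.
  split:     ->(value, opts)  { value.split(Regexp.new(opts.fetch(:separator))) }
}.freeze

# Applies each [type, options] step to doc[options[:field]], in order,
# without mutating the input document.
def apply_processors(doc, steps)
  steps.each_with_object(doc.dup) do |(type, opts), result|
    field = opts.fetch(:field)
    result[field] = PROCESSORS.fetch(type).call(result[field], opts)
  end
end

steps = [
  [:trim,      { field: :tags }],
  [:lowercase, { field: :tags }],
  [:split,     { field: :tags, separator: "," }]
]

p apply_processors({ tags: "  Ruby,Elasticsearch,OpenSearch  " }, steps)
```

The `tags` value ends up as `["ruby", "elasticsearch", "opensearch"]`. Because each step sees the output of the previous one, declaration order is part of the pipeline's behavior.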