activerecord-graph-extractor 0.1.0

# S3 Integration

The ActiveRecord Graph Extractor gem provides built-in support for uploading extraction files directly to Amazon S3. This feature is useful for:

- Storing extractions in cloud storage for backup or sharing
- Integrating with data pipelines that consume from S3
- Archiving large extraction files
- Enabling remote access to extraction data

## Installation

The S3 integration requires the `aws-sdk-s3` gem, which is automatically included as a dependency when you install the activerecord-graph-extractor gem.

## Configuration

### AWS Credentials

The S3Client uses the standard AWS SDK credential chain. You can configure credentials in several ways:

1. **Environment Variables** (Recommended for production):
   ```bash
   export AWS_ACCESS_KEY_ID=your_access_key
   export AWS_SECRET_ACCESS_KEY=your_secret_key
   export AWS_REGION=us-east-1
   ```

2. **AWS Credentials File** (`~/.aws/credentials`):
   ```ini
   [default]
   aws_access_key_id = your_access_key
   aws_secret_access_key = your_secret_key
   ```

3. **IAM Roles** (Recommended for EC2/ECS):
   When running on AWS infrastructure, use IAM roles for secure, temporary credentials.

4. **Explicit Credentials** (Not recommended for production):
   ```ruby
   s3_client = ActiveRecordGraphExtractor::S3Client.new(
     bucket_name: 'my-bucket',
     access_key_id: 'your_access_key',
     secret_access_key: 'your_secret_key'
   )
   ```

### Required S3 Permissions

Your AWS credentials need the following S3 permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```

## Usage

### Basic S3 Upload

```ruby
# Extract and upload to S3 in one step
extractor = ActiveRecordGraphExtractor::Extractor.new
result = extractor.extract_and_upload_to_s3(
  order,
  bucket_name: 'my-extraction-bucket',
  s3_key: 'extractions/order_123.json',
  region: 'us-east-1'
)

puts "Uploaded to: #{result['s3_upload'][:url]}"
```

### Using S3Client Directly

```ruby
# Create S3 client
s3_client = ActiveRecordGraphExtractor::S3Client.new(
  bucket_name: 'my-extraction-bucket',
  region: 'us-east-1'
)

# Extract to S3
extractor = ActiveRecordGraphExtractor::Extractor.new
result = extractor.extract_to_s3(order, s3_client, 'extractions/order_123.json')
```

### Auto-Generated S3 Keys

If you don't specify an S3 key, one will be automatically generated with a timestamp:

```ruby
result = extractor.extract_and_upload_to_s3(
  order,
  bucket_name: 'my-extraction-bucket'
)

# S3 key will be something like:
# "activerecord-graph-extractor/2024/01/25/extraction_20240125_143022.json"
```

### Extraction Options

You can pass all the same extraction options when uploading to S3:

```ruby
result = extractor.extract_and_upload_to_s3(
  order,
  bucket_name: 'my-extraction-bucket',
  options: {
    max_depth: 3,
    custom_serializers: {
      'Order' => ->(order) { { id: order.id, status: order.status } }
    }
  }
)
```

## S3Client Methods

The `S3Client` class provides comprehensive S3 operations:

### Upload Files

```ruby
s3_client = ActiveRecordGraphExtractor::S3Client.new(bucket_name: 'my-bucket')

# Upload with custom options
result = s3_client.upload_file(
  'local_file.json',
  'remote/path/file.json',
  server_side_encryption: 'AES256',
  metadata: { 'extracted_by' => 'my_app' }
)
```

### Download Files

```ruby
# Download to a specific path
result = s3_client.download_file('remote/file.json', 'local_file.json')

# Download using the original filename
result = s3_client.download_file('remote/path/file.json')
# Downloads to './file.json'
```

### List Files

```ruby
# List all files
files = s3_client.list_files

# List with prefix filter
files = s3_client.list_files(prefix: 'extractions/2024/')

# Limit results
files = s3_client.list_files(max_keys: 10)
```

### Check File Existence

```ruby
if s3_client.file_exists?('remote/file.json')
  puts "File exists!"
end
```

### Get File Metadata

```ruby
metadata = s3_client.file_metadata('remote/file.json')
puts "Size: #{metadata[:size]} bytes"
puts "Last modified: #{metadata[:last_modified]}"
```

### Generate Presigned URLs

```ruby
# URL valid for 1 hour (default)
url = s3_client.presigned_url('remote/file.json')

# Custom expiration (24 hours)
url = s3_client.presigned_url('remote/file.json', expires_in: 86400)
```

### Delete Files

```ruby
s3_client.delete_file('remote/file.json')
```

## CLI Commands

The gem includes CLI commands for S3 operations:

### Extract to S3

```bash
# Extract and upload to S3
arge extract_to_s3 Order 123 \
  --bucket my-extraction-bucket \
  --key extractions/order_123.json \
  --region us-east-1 \
  --max-depth 3

# Auto-generate S3 key
arge extract_to_s3 Order 123 \
  --bucket my-extraction-bucket \
  --region us-east-1
```

### List S3 Files

```bash
# List all extraction files
arge s3_list --bucket my-extraction-bucket

# List with prefix filter
arge s3_list --bucket my-extraction-bucket --prefix extractions/2024/

# Limit results
arge s3_list --bucket my-extraction-bucket --max-keys 10
```

### Download from S3

```bash
# Download to a specific file
arge s3_download extractions/order_123.json \
  --bucket my-extraction-bucket \
  --output local_order.json

# Download using the original filename
arge s3_download extractions/order_123.json \
  --bucket my-extraction-bucket
```

## Error Handling

The S3 integration includes comprehensive error handling:

```ruby
begin
  result = extractor.extract_and_upload_to_s3(order, bucket_name: 'my-bucket')
rescue ActiveRecordGraphExtractor::S3Error => e
  puts "S3 operation failed: #{e.message}"
rescue ActiveRecordGraphExtractor::ExtractionError => e
  puts "Extraction failed: #{e.message}"
rescue ActiveRecordGraphExtractor::FileError => e
  puts "File operation failed: #{e.message}"
end
```

Common S3 errors:

- `Bucket not found` - the specified bucket doesn't exist
- `Access denied` - insufficient permissions
- `File not found in S3` - attempting to download a non-existent file
- `Failed to upload file to S3` - network or permission issues

## Best Practices

### Security

1. **Use IAM roles** when running on AWS infrastructure
2. **Rotate credentials** regularly
3. **Use least-privilege permissions** - only grant necessary S3 actions
4. **Enable S3 bucket encryption** for sensitive data

### Performance

1. **Use appropriate regions** - choose regions close to your application
2. **Consider S3 storage classes** for archival data (IA, Glacier)
3. **Enable S3 Transfer Acceleration** for global applications
4. **Use multipart uploads** for large files (handled automatically by the AWS SDK)

### Organization

1. **Use consistent S3 key patterns**:
   ```ruby
   # Good: organized by date and model
   "extractions/2024/01/25/orders/order_123.json"

   # Good: include metadata in key
   "extractions/production/orders/2024-01-25/order_123_depth_3.json"
   ```
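
A small helper can keep key construction consistent across callers. The names here (`extraction_key` and its parameters) are illustrative, not part of the gem:

```ruby
require 'date'

# Hypothetical key builder following the second pattern above:
# "extractions/<env>/<models>/<date>/<model>_<id>_depth_<n>.json"
def extraction_key(env:, model:, id:, depth:, date: Date.today)
  name = model.downcase
  "extractions/#{env}/#{name}s/#{date.iso8601}/#{name}_#{id}_depth_#{depth}.json"
end
```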

2. **Set up S3 lifecycle policies** to automatically archive or delete old extractions

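For example, a standard S3 lifecycle configuration (not gem-specific; bucket name and day counts are illustrative) could transition old extractions to cheaper storage and eventually expire them:

```json
{
  "Rules": [
    {
      "ID": "archive-old-extractions",
      "Filter": { "Prefix": "extractions/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```
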
3. **Use S3 bucket notifications** to trigger downstream processing

### Monitoring

1. **Enable S3 access logging** to track usage
2. **Set up CloudWatch metrics** for monitoring
3. **Use S3 inventory** for large-scale file management

## Configuration Examples

### Production Configuration

```ruby
# config/initializers/activerecord_graph_extractor.rb
ActiveRecordGraphExtractor.configure do |config|
  # S3 settings
  config.s3_bucket = ENV['EXTRACTION_S3_BUCKET']
  config.s3_region = ENV['AWS_REGION'] || 'us-east-1'
  config.s3_key_prefix = "extractions/#{Rails.env}"

  # Extraction settings
  config.max_depth = 5
  config.handle_circular_references = true
end
```

### Development Configuration

```ruby
# For development, you might want to use local files instead of S3
ActiveRecordGraphExtractor.configure do |config|
  if Rails.env.development?
    # Use local storage in development
    config.default_output_path = Rails.root.join('tmp', 'extractions')
  else
    # Use S3 in other environments
    config.s3_bucket = ENV['EXTRACTION_S3_BUCKET']
    config.s3_region = ENV['AWS_REGION']
  end
end
```

## Integration with Other Services

### Data Pipelines

```ruby
# Extract and trigger downstream processing
result = extractor.extract_and_upload_to_s3(order, bucket_name: 'pipeline-bucket')

# Notify processing service
ProcessingService.notify_new_extraction(
  s3_url: result['s3_upload'][:url],
  metadata: result['metadata']
)
```

### Backup and Archival

```ruby
# Regular backup job
class ExtractionBackupJob
  def perform
    extractor = ActiveRecordGraphExtractor::Extractor.new
    critical_orders = Order.where(status: 'critical')

    critical_orders.find_each do |order|
      extractor.extract_and_upload_to_s3(
        order,
        bucket_name: 'backup-bucket',
        s3_key: "backups/#{Date.current}/order_#{order.id}.json"
      )
    end
  end
end
```

This S3 integration makes the ActiveRecord Graph Extractor a powerful tool for cloud-native data extraction and processing workflows.