activerecord-graph-extractor 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.rspec +4 -0
- data/CHANGELOG.md +36 -0
- data/Gemfile +17 -0
- data/Gemfile.lock +201 -0
- data/LICENSE +21 -0
- data/README.md +532 -0
- data/Rakefile +36 -0
- data/activerecord-graph-extractor.gemspec +64 -0
- data/docs/dry_run.md +410 -0
- data/docs/examples.md +239 -0
- data/docs/s3_integration.md +381 -0
- data/docs/usage.md +363 -0
- data/examples/dry_run_example.rb +227 -0
- data/examples/s3_example.rb +247 -0
- data/exe/arge +7 -0
- data/lib/activerecord_graph_extractor/cli.rb +627 -0
- data/lib/activerecord_graph_extractor/configuration.rb +98 -0
- data/lib/activerecord_graph_extractor/dependency_resolver.rb +406 -0
- data/lib/activerecord_graph_extractor/dry_run_analyzer.rb +421 -0
- data/lib/activerecord_graph_extractor/errors.rb +33 -0
- data/lib/activerecord_graph_extractor/extractor.rb +182 -0
- data/lib/activerecord_graph_extractor/importer.rb +260 -0
- data/lib/activerecord_graph_extractor/json_serializer.rb +176 -0
- data/lib/activerecord_graph_extractor/primary_key_mapper.rb +57 -0
- data/lib/activerecord_graph_extractor/progress_tracker.rb +202 -0
- data/lib/activerecord_graph_extractor/relationship_analyzer.rb +212 -0
- data/lib/activerecord_graph_extractor/s3_client.rb +170 -0
- data/lib/activerecord_graph_extractor/version.rb +5 -0
- data/lib/activerecord_graph_extractor.rb +34 -0
- data/scripts/verify_installation.rb +192 -0
- metadata +388 -0
@@ -0,0 +1,381 @@
# S3 Integration

The ActiveRecord Graph Extractor gem provides built-in support for uploading extraction files directly to Amazon S3. This feature is useful for:

- Storing extractions in cloud storage for backup or sharing
- Integrating with data pipelines that consume from S3
- Archiving large extraction files
- Enabling remote access to extraction data

## Installation

The S3 integration requires the `aws-sdk-s3` gem, which is automatically included as a dependency when you install the activerecord-graph-extractor gem.
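For reference, a minimal Gemfile entry (assuming Bundler) is all that's needed; `aws-sdk-s3` is pulled in transitively:

```ruby
# Gemfile
gem 'activerecord-graph-extractor' # brings in aws-sdk-s3 as a dependency
```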
## Configuration

### AWS Credentials

The S3Client uses the standard AWS SDK credential chain. You can configure credentials in several ways:

1. **Environment Variables** (Recommended for production):
```bash
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION=us-east-1
```

2. **AWS Credentials File** (`~/.aws/credentials`):
```ini
[default]
aws_access_key_id = your_access_key
aws_secret_access_key = your_secret_key
```

3. **IAM Roles** (Recommended for EC2/ECS):
When running on AWS infrastructure, use IAM roles for secure, temporary credentials.
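For example, on an EC2 instance or ECS task with a role attached, no explicit keys are needed; a minimal sketch relying on the default credential chain:

```ruby
# Credentials are resolved automatically from the instance/task role
s3_client = ActiveRecordGraphExtractor::S3Client.new(
  bucket_name: 'my-extraction-bucket',
  region: 'us-east-1'
)
```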
4. **Explicit Credentials** (Not recommended for production):
```ruby
s3_client = ActiveRecordGraphExtractor::S3Client.new(
  bucket_name: 'my-bucket',
  access_key_id: 'your_access_key',
  secret_access_key: 'your_secret_key'
)
```

### Required S3 Permissions

Your AWS credentials need the following S3 permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```
## Usage

### Basic S3 Upload

```ruby
# Extract and upload to S3 in one step
extractor = ActiveRecordGraphExtractor::Extractor.new
result = extractor.extract_and_upload_to_s3(
  order,
  bucket_name: 'my-extraction-bucket',
  s3_key: 'extractions/order_123.json',
  region: 'us-east-1'
)

puts "Uploaded to: #{result['s3_upload'][:url]}"
```

### Using S3Client Directly

```ruby
# Create S3 client
s3_client = ActiveRecordGraphExtractor::S3Client.new(
  bucket_name: 'my-extraction-bucket',
  region: 'us-east-1'
)

# Extract to S3
extractor = ActiveRecordGraphExtractor::Extractor.new
result = extractor.extract_to_s3(order, s3_client, 'extractions/order_123.json')
```

### Auto-Generated S3 Keys

If you don't specify an S3 key, one will be automatically generated with a timestamp:

```ruby
result = extractor.extract_and_upload_to_s3(
  order,
  bucket_name: 'my-extraction-bucket'
)

# S3 key will be something like:
# "activerecord-graph-extractor/2024/01/25/extraction_20240125_143022.json"
```

### Extraction Options

You can pass all the same extraction options when uploading to S3:

```ruby
result = extractor.extract_and_upload_to_s3(
  order,
  bucket_name: 'my-extraction-bucket',
  options: {
    max_depth: 3,
    custom_serializers: {
      'Order' => ->(order) { { id: order.id, status: order.status } }
    }
  }
)
```
## S3Client Methods

The `S3Client` class provides comprehensive S3 operations:

### Upload Files

```ruby
s3_client = ActiveRecordGraphExtractor::S3Client.new(bucket_name: 'my-bucket')

# Upload with custom options
result = s3_client.upload_file(
  'local_file.json',
  'remote/path/file.json',
  server_side_encryption: 'AES256',
  metadata: { 'extracted_by' => 'my_app' }
)
```

### Download Files

```ruby
# Download to specific path
result = s3_client.download_file('remote/file.json', 'local_file.json')

# Download using original filename
result = s3_client.download_file('remote/path/file.json')
# Downloads to './file.json'
```

### List Files

```ruby
# List all files
files = s3_client.list_files

# List with prefix filter
files = s3_client.list_files(prefix: 'extractions/2024/')

# Limit results
files = s3_client.list_files(max_keys: 10)
```

### Check File Existence

```ruby
if s3_client.file_exists?('remote/file.json')
  puts "File exists!"
end
```

### Get File Metadata

```ruby
metadata = s3_client.file_metadata('remote/file.json')
puts "Size: #{metadata[:size]} bytes"
puts "Last modified: #{metadata[:last_modified]}"
```

### Generate Presigned URLs

```ruby
# URL valid for 1 hour (default)
url = s3_client.presigned_url('remote/file.json')

# Custom expiration (24 hours)
url = s3_client.presigned_url('remote/file.json', expires_in: 86400)
```

### Delete Files

```ruby
s3_client.delete_file('remote/file.json')
```
## CLI Commands

The gem includes CLI commands for S3 operations:

### Extract to S3

```bash
# Extract and upload to S3
arge extract_to_s3 Order 123 \
  --bucket my-extraction-bucket \
  --key extractions/order_123.json \
  --region us-east-1 \
  --max-depth 3

# Auto-generate S3 key
arge extract_to_s3 Order 123 \
  --bucket my-extraction-bucket \
  --region us-east-1
```

### List S3 Files

```bash
# List all extraction files
arge s3_list --bucket my-extraction-bucket

# List with prefix filter
arge s3_list --bucket my-extraction-bucket --prefix extractions/2024/

# Limit results
arge s3_list --bucket my-extraction-bucket --max-keys 10
```

### Download from S3

```bash
# Download to specific file
arge s3_download extractions/order_123.json \
  --bucket my-extraction-bucket \
  --output local_order.json

# Download using original filename
arge s3_download extractions/order_123.json \
  --bucket my-extraction-bucket
```
## Error Handling

The S3 integration includes comprehensive error handling:

```ruby
begin
  result = extractor.extract_and_upload_to_s3(order, bucket_name: 'my-bucket')
rescue ActiveRecordGraphExtractor::S3Error => e
  puts "S3 operation failed: #{e.message}"
rescue ActiveRecordGraphExtractor::ExtractionError => e
  puts "Extraction failed: #{e.message}"
rescue ActiveRecordGraphExtractor::FileError => e
  puts "File operation failed: #{e.message}"
end
```

Common S3 errors:
- `Bucket not found` - The specified bucket doesn't exist
- `Access denied` - Insufficient permissions
- `File not found in S3` - Trying to download a non-existent file
- `Failed to upload file to S3` - Network or permission issues
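Transient failures (throttling, brief network errors) can often simply be retried; a minimal sketch, with an illustrative retry limit:

```ruby
# Retry the upload a few times before giving up (limit is illustrative)
attempts = 0
begin
  result = extractor.extract_and_upload_to_s3(order, bucket_name: 'my-bucket')
rescue ActiveRecordGraphExtractor::S3Error => e
  attempts += 1
  retry if attempts < 3
  raise
end
```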
## Best Practices

### Security

1. **Use IAM Roles** when running on AWS infrastructure
2. **Rotate credentials** regularly
3. **Use least-privilege permissions** - only grant necessary S3 actions
4. **Enable S3 bucket encryption** for sensitive data

### Performance

1. **Use appropriate regions** - choose regions close to your application
2. **Consider S3 storage classes** for archival data (IA, Glacier)
3. **Enable S3 Transfer Acceleration** for global applications
4. **Use multipart uploads** for large files (handled automatically by the AWS SDK)
### Organization

1. **Use consistent S3 key patterns** (see the helper sketch after this list):
```ruby
# Good: organized by date and model
"extractions/2024/01/25/orders/order_123.json"

# Good: include metadata in key
"extractions/production/orders/2024-01-25/order_123_depth_3.json"
```

2. **Set up S3 lifecycle policies** to automatically archive or delete old extractions

3. **Use S3 bucket notifications** to trigger downstream processing
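A hypothetical helper along those lines (the `extraction_key` method and its exact pattern are illustrative, not part of the gem):

```ruby
# Hypothetical helper -- builds keys matching the date/model pattern above
def extraction_key(record, depth:)
  date = Date.current.strftime('%Y/%m/%d')
  model = record.class.name.underscore
  "extractions/#{date}/#{model.pluralize}/#{model}_#{record.id}_depth_#{depth}.json"
end

extractor.extract_and_upload_to_s3(
  order,
  bucket_name: 'my-extraction-bucket',
  s3_key: extraction_key(order, depth: 3)
)
```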
### Monitoring

1. **Enable S3 access logging** to track usage
2. **Set up CloudWatch metrics** for monitoring
3. **Use S3 inventory** for large-scale file management
## Configuration Examples

### Production Configuration

```ruby
# config/initializers/activerecord_graph_extractor.rb
ActiveRecordGraphExtractor.configure do |config|
  # S3 settings
  config.s3_bucket = ENV['EXTRACTION_S3_BUCKET']
  config.s3_region = ENV['AWS_REGION'] || 'us-east-1'
  config.s3_key_prefix = "extractions/#{Rails.env}"

  # Extraction settings
  config.max_depth = 5
  config.handle_circular_references = true
end
```

### Development Configuration

```ruby
# For development, you might want to use local files instead of S3
ActiveRecordGraphExtractor.configure do |config|
  if Rails.env.development?
    # Use local storage in development
    config.default_output_path = Rails.root.join('tmp', 'extractions')
  else
    # Use S3 in other environments
    config.s3_bucket = ENV['EXTRACTION_S3_BUCKET']
    config.s3_region = ENV['AWS_REGION']
  end
end
```
## Integration with Other Services

### Data Pipelines

```ruby
# Extract and trigger downstream processing
extractor = ActiveRecordGraphExtractor::Extractor.new
result = extractor.extract_and_upload_to_s3(order, bucket_name: 'pipeline-bucket')

# Notify processing service
ProcessingService.notify_new_extraction(
  s3_url: result['s3_upload'][:url],
  metadata: result['metadata']
)
```
### Backup and Archival

```ruby
# Regular backup job
class ExtractionBackupJob
  def perform
    extractor = ActiveRecordGraphExtractor::Extractor.new
    critical_orders = Order.where(status: 'critical')

    critical_orders.find_each do |order|
      extractor.extract_and_upload_to_s3(
        order,
        bucket_name: 'backup-bucket',
        s3_key: "backups/#{Date.current}/order_#{order.id}.json"
      )
    end
  end
end
```

This S3 integration makes the ActiveRecord Graph Extractor a powerful tool for cloud-native data extraction and processing workflows.