s3grep 0.1.9 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.ruby-version +1 -1
- data/ARCHITECTURE.md +188 -0
- data/CLAUDE.md +55 -0
- data/Gemfile.lock +24 -17
- data/README.md +125 -5
- data/bin/s3cat +31 -6
- data/bin/s3grep +54 -18
- data/bin/s3info +37 -19
- data/bin/s3report +70 -44
- data/lib/s3grep/directory.rb +11 -13
- data/lib/s3grep/directory_info.rb +2 -0
- data/lib/s3grep/search.rb +109 -21
- data/s3grep.gemspec +6 -3
- metadata +24 -9
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 686a06c854681a05e7b5915cc5819fe710b2b3da842926b6d5acb019de08c1d8
+  data.tar.gz: ebbfd6d9f891ca5a3a7036c90775595357f376385c5573c413fff740d9a7e086
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fd00546c540f5ef8688b9af0e3668366bc496e4eae2120f394aac67400ade50c485ae43b00e4987e03423cc2d53404bafe6fa59f07c122c82d5c9de1de5f6071
+  data.tar.gz: 216072d06695ac0404584cc1273c1e84fc988e61eab9b8700c9d3e4ffcee786f169077e860502a4db2c86aa23aa0a39e8a3ae53c9b9b3d4c98073ac123026b33
```
data/.ruby-version
CHANGED

```diff
@@ -1 +1 @@
-3.
+3.4.2
```
data/ARCHITECTURE.md
ADDED

````markdown
# Architecture

## Overview

s3grep is a Ruby gem providing grep-like functionality for AWS S3 objects. It streams files directly from S3 without downloading them locally, enabling efficient searching of large files.

## Component Diagram

```
┌────────────────────────────────────────────────────────────────┐
│                        CLI Layer (bin/)                        │
├──────────┬──────────┬──────────────┬───────────────────────────┤
│  s3grep  │  s3cat   │    s3info    │         s3report          │
│ (search) │ (stream) │ (dir stats)  │    (bucket inventory)     │
└────┬─────┴────┬─────┴──────┬───────┴─────────────┬─────────────┘
     │          │            │                     │
     ▼          ▼            ▼                     ▼
┌────────────────────────────────────────────────────────────────┐
│                      S3Grep Module (lib/)                      │
├─────────────────────┬───────────────────┬──────────────────────┤
│       Search        │     Directory     │    DirectoryInfo     │
│  (file streaming)   │ (object listing)  │ (stats aggregation)  │
└──────────┬──────────┴─────────┬─────────┴───────────┬──────────┘
           │                    │                     │
           ▼                    ▼                     ▼
┌────────────────────────────────────────────────────────────────┐
│                      AWS SDK (aws-sdk-s3)                      │
│    get_object    │    list_objects    │      list_buckets      │
└────────────────────────────────────────────────────────────────┘
```

## Core Classes

### S3Grep::Search

True streaming S3 object reader with line-by-line regex matching. Uses chunked transfer to avoid loading entire files into memory.

**Responsibilities:**
- Parse S3 URL to extract bucket and key
- Stream object content via `get_object` block form (chunked transfer)
- Buffer partial lines across chunk boundaries
- Auto-detect and decompress .gz files (streaming) and .zip files (buffered)
- Yield matching lines with line numbers
- Enforce size limits to prevent resource exhaustion

**Key Methods:**
- `Search.search(s3_url, client, regex)` - Class method for simple searches
- `Search.detect_compression(s3_url)` - Infers compression from file extension
- `#each_line` - Core streaming iterator, yields lines as they arrive
- `#to_io` - Returns StreamingIO adapter for backward compatibility

**Streaming Implementation:**
- Raw files: Chunks streamed directly, lines extracted from buffer
- Gzip files: Chunks decompressed via `Zlib::Inflate` as they arrive
- ZIP files: Must buffer entire archive (ZIP format requires central directory at EOF)

### S3Grep::Directory

Lists objects in an S3 prefix with optional glob-style filtering.

**Responsibilities:**
- Parse S3 URL to extract bucket and prefix
- Handle pagination (1000 objects per request)
- URL-encode/decode object keys with special characters
- Support regex filtering via `glob` method

**Key Methods:**
- `Directory.glob(s3_url, client, regex)` - List objects matching pattern
- `#each` - Iterate full S3 URLs for all objects
- `#each_content` - Iterate raw S3 object metadata (for DirectoryInfo)
- `#info` - Factory method returning DirectoryInfo

### S3Grep::DirectoryInfo

Aggregates statistics while iterating through directory contents.

**Responsibilities:**
- Count files and total size
- Track newest/oldest files by modification date
- Breakdown counts and sizes by storage class

**Key Methods:**
- `DirectoryInfo.get(directory)` - Process directory and return populated info
- `#last_modified` / `#first_modified` - Timestamp accessors
- `#newest_file` / `#first_file` - Key accessors

## Data Flow

### Search Flow (s3grep)

```
User Input: regex + s3://bucket/key
        │
        ▼
┌──────────────┐
│ Parse S3 URL │
└──────┬───────┘
       │
       ▼
┌──────────────────────┐
│  Detect compression  │
│ (.gz, .zip, or none) │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────────────────────┐
│ aws_s3_client.get_object(block form) │
│    Streams chunks as they arrive     │
└──────────┬───────────────────────────┘
           │
           ▼ (for each chunk)
┌──────────────────────────────────────┐
│       Decompress chunk if gzip       │
│      (Zlib::Inflate streaming)       │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│        Append to line buffer         │
│        Extract complete lines        │
│   Yield matches with line numbers    │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│    Repeat until stream exhausted     │
│   Yield final partial line if any    │
└──────────────────────────────────────┘
```

**Memory Behavior:**
- Raw/Gzip: Only current chunk + line buffer in memory (~64KB typical)
- ZIP: Entire archive buffered (ZIP format limitation)

### Directory Listing Flow (s3info, recursive s3grep)

```
User Input: s3://bucket/prefix/
        │
        ▼
┌──────────────────────┐
│     list_objects     │
│   (max_keys: 1000)   │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│    More results?     │──No──▶ Done
│  (size == max_keys)  │
└──────────┬───────────┘
           │ Yes
           ▼
┌──────────────────────┐
│  list_objects with   │
│  marker = last key   │
└──────────┬───────────┘
           │
           └───────▶ (repeat until exhausted)
```

## AWS Integration

### Authentication
Uses the AWS SDK default credential chain:
1. Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
2. Shared credentials file (`~/.aws/credentials`)
3. IAM instance profile (EC2/ECS)

### Region Handling
- Default client uses `AWS_REGION` or `~/.aws/config`
- `s3report` creates region-specific clients per bucket via `get_bucket_location`

### S3 URL Format
All tools expect: `s3://bucket-name/path/to/prefix`
- Host = bucket name
- Path = object key or prefix (URL-decoded internally)

## Compression Support

| Extension | Library | Streaming | Notes |
|-----------|---------|-----------|-------|
| `.gz` | zlib (stdlib) | ✅ Yes | Zlib::Inflate processes chunks as they arrive |
| `.zip` | rubyzip | ❌ No | ZIP format requires buffering (central directory at EOF) |
| (none) | - | ✅ Yes | Chunks streamed directly |

**Size Limits:**
- `MAX_BYTES_PROCESSED` (100MB default) prevents resource exhaustion
- Configurable via `S3Grep::Search::MAX_BYTES_PROCESSED`
````
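The "buffer partial lines across chunk boundaries" behavior described above is the heart of the streaming design. A minimal self-contained sketch of the technique, mirroring the `extract_lines!` logic that ships in this release (the chunks here are made-up stand-ins for what S3 would deliver):

```ruby
buffer = "".b

# Same approach as Search#extract_lines!: carve complete lines off the front
extract_lines = lambda do |buf, &block|
  while (newline_index = buf.index("\n"))
    block.call(buf.slice!(0, newline_index + 1))
  end
end

chunks = ["first li", "ne\nsecond line\nthi", "rd line"]
chunks.each do |chunk|
  buffer << chunk                      # append raw bytes from the stream
  extract_lines.call(buffer) { |line| print line }
end
print buffer unless buffer.empty?      # trailing line without a newline
# Prints: first line / second line / third line
```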
data/CLAUDE.md
ADDED

````markdown
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

s3grep is a Ruby gem for searching through S3 files without downloading them. It provides CLI tools for grep-like searching, file viewing, and bucket reporting directly on S3 objects.

## Development Commands

```bash
# Install dependencies
bundle install

# Build the gem
gem build s3grep.gemspec

# Install locally for testing
gem install s3grep-*.gem
```

## CLI Tools

- `s3grep` - Search for patterns in S3 files (supports `-i` for case-insensitive, `-r` for recursive, `--include` for file patterns)
- `s3cat` - Stream S3 file contents to stdout
- `s3info` - Get directory statistics (file count, size, storage classes, date ranges) as JSON
- `s3report` - Generate CSV report of all buckets in an AWS account

## Architecture

**Core Classes (lib/s3grep/):**

- `Search` - Streams S3 objects and searches line-by-line with regex. Auto-detects compression (.gz, .zip)
- `Directory` - Lists S3 objects with prefix filtering. Handles pagination via marker-based iteration
- `DirectoryInfo` - Aggregates statistics from Directory iteration (counts, sizes, timestamps by storage class)

**S3 URL Convention:** All tools use `s3://bucket-name/path/to/object` format. The bucket name is parsed from the URL host, and the path becomes the S3 key prefix.

**AWS Authentication:** Uses default AWS SDK credential chain (env vars, ~/.aws/credentials, IAM roles). Region-specific clients are created automatically for cross-region bucket access in s3report.

## Code Commits

Format using angular formatting:
```
<type>(<scope>): <short summary>
```
- **type**: build|ci|docs|feat|fix|perf|refactor|test
- **scope**: The feature or component of the service we're working on
- **summary**: Summary in present tense. Not capitalized. No period at the end.

## Documentation Maintenance

When modifying the codebase, keep documentation in sync:
- **ARCHITECTURE.md** - Update when adding/removing classes, changing component relationships, or altering data flow patterns
- **README.md** - Update when adding new features, changing public APIs, or modifying usage examples
````
data/Gemfile.lock
CHANGED

```diff
@@ -1,29 +1,36 @@
 GEM
   remote: http://rubygems.org/
   specs:
-    aws-eventstream (1.
-    aws-partitions (1.
-    aws-sdk-core (3.
-      aws-eventstream (~> 1, >= 1.0
-      aws-partitions (~> 1, >= 1.
-      aws-sigv4 (~> 1.
-
-
-
-
-    aws-sdk-
-      aws-sdk-core (~> 3, >= 3.
+    aws-eventstream (1.4.0)
+    aws-partitions (1.1211.0)
+    aws-sdk-core (3.241.4)
+      aws-eventstream (~> 1, >= 1.3.0)
+      aws-partitions (~> 1, >= 1.992.0)
+      aws-sigv4 (~> 1.9)
+      base64
+      bigdecimal
+      jmespath (~> 1, >= 1.6.1)
+      logger
+    aws-sdk-kms (1.121.0)
+      aws-sdk-core (~> 3, >= 3.241.4)
+      aws-sigv4 (~> 1.5)
+    aws-sdk-s3 (1.213.0)
+      aws-sdk-core (~> 3, >= 3.241.4)
       aws-sdk-kms (~> 1)
-      aws-sigv4 (~> 1.
-    aws-sigv4 (1.
+      aws-sigv4 (~> 1.5)
+    aws-sigv4 (1.12.1)
       aws-eventstream (~> 1, >= 1.0.2)
-
+    base64 (0.3.0)
+    bigdecimal (4.0.1)
+    jmespath (1.6.2)
+    logger (1.7.0)
 
 PLATFORMS
-
+  arm64-darwin-24
+  ruby
 
 DEPENDENCIES
   aws-sdk-s3
 
 BUNDLED WITH
-   2.
+   2.6.2
```
data/README.md
CHANGED

````diff
@@ -2,13 +2,133 @@
 
 Search through S3 files without downloading them.
 
-
+## Installation
 
-
+```bash
+gem install s3grep
+```
+
+Or add to your Gemfile:
 
-
+```ruby
+gem 's3grep'
 ```
-
+
+## CLI Tools
+
+### s3grep
+
+Search for a pattern in S3 files. Supports gzip and zip compressed files automatically.
+
+```bash
+# Basic search
+s3grep "pattern" s3://bucket-name/path/to/file.csv
+
+# Case-insensitive search
+s3grep -i "pattern" s3://bucket-name/path/to/file.csv
+
+# Recursive search through a directory
+s3grep -r "pattern" s3://bucket-name/path/to/directory/
+
+# Recursive search with file pattern filter
+s3grep -r --include "\.csv$" "pattern" s3://bucket-name/logs/
+
+# Search compressed files (auto-detected)
+s3grep "error" s3://bucket-name/logs/app.log.gz
 ```
 
-
+Output format: `s3://bucket/path/file:line_number content`
+
+### s3cat
+
+Stream S3 file contents to stdout.
+
+```bash
+# Print file contents
+s3cat s3://bucket-name/path/to/file.txt
+
+# Pipe to other commands
+s3cat s3://bucket-name/data.csv | head -20
+
+# Use with standard unix tools
+s3cat s3://bucket-name/users.json | jq '.users[]'
+```
+
+### s3info
+
+Get statistics about an S3 directory as JSON.
+
+```bash
+# Get info for a prefix
+s3info s3://bucket-name/path/to/directory/
+```
+
+Output includes:
+- `bucket` - Bucket name
+- `base_prefix` - S3 prefix path
+- `total_size` - Total bytes across all files
+- `num_files` - File count
+- `last_modified` / `newest_file` - Most recently modified file
+- `first_modified` / `first_file` - Oldest file
+- `num_files_by_storage_class` - File count breakdown by storage class
+- `total_size_by_storage_class` - Size breakdown by storage class
+
+Example output:
+```json
+{
+  "bucket": "my-bucket",
+  "base_prefix": "logs/2024/",
+  "total_size": 1048576000,
+  "num_files": 365,
+  "last_modified": "2024-12-31T23:59:59+00:00",
+  "newest_file": "logs/2024/12/31/app.log",
+  "first_modified": "2024-01-01T00:00:00+00:00",
+  "first_file": "logs/2024/01/01/app.log",
+  "num_files_by_storage_class": {
+    "STANDARD": 100,
+    "STANDARD_IA": 265
+  },
+  "total_size_by_storage_class": {
+    "STANDARD": 500000000,
+    "STANDARD_IA": 548576000
+  }
+}
+```
+
+### s3report
+
+Generate a CSV report of all S3 buckets in your AWS account.
+
+```bash
+s3report
+```
+
+Creates a file named `AWS-S3-Usage-Report-YYYY-MM-DDTHHMMSS.csv` with columns:
+- Bucket
+- Creation Date
+- Total Size
+- Number of Files
+- Last Modified
+- Newest File
+- First Modified
+- First File
+
+## AWS Configuration
+
+Authentication uses the standard AWS SDK credential chain:
+
+1. Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
+2. Shared credentials file (`~/.aws/credentials`)
+3. IAM instance profile (EC2/ECS)
+
+Set your region via `AWS_REGION` environment variable or `~/.aws/config`.
+
+Use `AWS_PROFILE` to select a named profile:
+
+```bash
+AWS_PROFILE=stage s3grep "error" s3://my-bucket/logs/app.log
+```
+
+## License
+
+MIT
````
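The README above covers the CLI only; the same classes are usable from Ruby directly. A hedged sketch based on the call signatures visible elsewhere in this diff (the bucket name and key paths are placeholders):

```ruby
require 's3grep'
require 'aws-sdk-s3'

client = Aws::S3::Client.new(max_attempts: 1) # retries disabled for streaming
regex  = /error/i

# Search one object; compression is auto-detected from the extension
S3Grep::Search.search('s3://my-bucket/logs/app.log.gz', client, regex) do |line_number, line|
  puts "#{line_number}: #{line}"
end

# Enumerate objects under a prefix whose keys match a pattern
S3Grep::Directory.glob('s3://my-bucket/logs/', client, /\.log\z/) do |s3_url|
  puts s3_url
end
```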
data/bin/s3cat
CHANGED

```diff
@@ -1,12 +1,37 @@
 #!/usr/bin/env ruby
 
-require 'optparse'
 require 's3grep'
 require 'aws-sdk-s3'
 
-
-
-
-
-
+# Exit cleanly on broken pipe (e.g., when piping to head)
+Signal.trap("PIPE", "EXIT")
+
+s3_url = ARGV[0]
+
+if s3_url.nil? || s3_url.empty?
+  $stderr.puts "Usage: s3cat s3://bucket/path/to/file"
+  exit 1
+end
+
+unless s3_url.start_with?('s3://')
+  $stderr.puts "Error: Invalid S3 URL format. Expected s3://bucket/path"
+  exit 1
+end
+
+begin
+  # max_attempts: 1 disables retries (required for streaming)
+  aws_s3_client = Aws::S3::Client.new(max_attempts: 1)
+  search = S3Grep::Search.new(s3_url, aws_s3_client)
+  search.to_io.each do |line|
+    print line
+  end
+rescue Errno::EPIPE
+  # Broken pipe (e.g., piping to head) - exit silently
+  exit 0
+rescue Aws::S3::Errors::ServiceError => e
+  $stderr.puts "S3 Error: #{e.message}"
+  exit 1
+rescue => e
+  $stderr.puts "Error: #{e.message}"
+  exit 1
 end
```
data/bin/s3grep
CHANGED

```diff
@@ -4,13 +4,31 @@ require 'optparse'
 require 's3grep'
 require 'aws-sdk-s3'
 
+# Exit cleanly on broken pipe (e.g., when piping to head)
+Signal.trap("PIPE", "EXIT")
+
+# Maximum regex pattern length to prevent ReDoS
+MAX_PATTERN_LENGTH = 1000
+
+def safe_regexp(pattern, options = 0)
+  if pattern.length > MAX_PATTERN_LENGTH
+    $stderr.puts "Error: Pattern too long (max #{MAX_PATTERN_LENGTH} characters)"
+    exit 1
+  end
+  Regexp.new(pattern, options)
+rescue RegexpError => e
+  $stderr.puts "Error: Invalid regular expression: #{e.message}"
+  exit 1
+end
+
 options = {
   ignore_case: false,
   recursive: false,
   file_pattern: /.*/
 }
+
 OptionParser.new do |opts|
-  opts.banner = 'Usage: s3grep [options]'
+  opts.banner = 'Usage: s3grep [options] PATTERN s3://bucket/path'
 
   opts.on('-i', '--ignore-case', 'Ignore case') do
     options[:ignore_case] = true
@@ -21,30 +39,48 @@ OptionParser.new do |opts|
   end
 
   opts.on('--include FILE_PATTERN', 'Include matching files') do |v|
-    options[:file_pattern] =
+    options[:file_pattern] = safe_regexp(v, Regexp::IGNORECASE)
   end
 end.parse!
 
-
-
-
-
-  0
-end
+if ARGV.length < 2
+  $stderr.puts "Usage: s3grep [options] PATTERN s3://bucket/path"
+  exit 1
+end
 
-
+pattern = ARGV[0]
 s3_url = ARGV[1]
 
-
+unless s3_url.start_with?('s3://')
+  $stderr.puts "Error: Invalid S3 URL format. Expected s3://bucket/path"
+  exit 1
+end
+
+regex_options = options[:ignore_case] ? Regexp::IGNORECASE : 0
+regex = safe_regexp(pattern, regex_options)
+
+begin
+  # max_attempts: 1 disables retries (required for streaming)
+  aws_s3_client = Aws::S3::Client.new(max_attempts: 1)
 
-if options[:recursive]
-
-
-
+  if options[:recursive]
+    S3Grep::Directory.glob(s3_url, aws_s3_client, options[:file_pattern]) do |s3_file|
+      S3Grep::Search.search(s3_file, aws_s3_client, regex) do |line_number, line|
+        puts "#{s3_file}:#{line_number} #{line}"
+      end
+    end
+  else
+    S3Grep::Search.search(s3_url, aws_s3_client, regex) do |line_number, line|
+      puts "#{s3_url}:#{line_number} #{line}"
     end
   end
-
-
-
-
+rescue Errno::EPIPE
+  # Broken pipe (e.g., piping to head) - exit silently
+  exit 0
+rescue Aws::S3::Errors::ServiceError => e
+  $stderr.puts "S3 Error: #{e.message}"
+  exit 1
+rescue => e
+  $stderr.puts "Error: #{e.message}"
+  exit 1
 end
```
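For reference, a few illustrative calls showing how the `safe_regexp` guard added above behaves (a sketch; the result comments are what the code implies, not captured output):

```ruby
safe_regexp('error|warn')                  # => /error|warn/
safe_regexp('pattern', Regexp::IGNORECASE) # => /pattern/i
safe_regexp('a' * 2000)                    # too long: prints the error, exits 1
safe_regexp('([')                          # RegexpError rescued: prints the error, exits 1
```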
data/bin/s3info
CHANGED

```diff
@@ -4,26 +4,44 @@ require 'pathname'
 BASE_DIR = Pathname.new(File.expand_path('..', __dir__))
 $LOAD_PATH << "#{BASE_DIR}/lib"
 
-require 'optparse'
 require 's3grep'
 require 'aws-sdk-s3'
 require 'json'
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+s3_url = ARGV[0]
+
+if s3_url.nil? || s3_url.empty?
+  $stderr.puts "Usage: s3info s3://bucket/path"
+  exit 1
+end
+
+unless s3_url.start_with?('s3://')
+  $stderr.puts "Error: Invalid S3 URL format. Expected s3://bucket/path"
+  exit 1
+end
+
+begin
+  aws_s3_client = Aws::S3::Client.new
+  info = S3Grep::Directory.new(s3_url, aws_s3_client).info
+
+  stats = {
+    bucket: info.bucket,
+    base_prefix: info.base_prefix,
+    total_size: info.total_size,
+    num_files: info.num_files,
+    last_modified: info.last_modified,
+    newest_file: info.newest_file,
+    first_modified: info.first_modified,
+    first_file: info.first_file,
+    num_files_by_storage_class: info.num_files_by_storage_class,
+    total_size_by_storage_class: info.total_size_by_storage_class
+  }
+
+  print JSON.pretty_generate(stats) + "\n"
+rescue Aws::S3::Errors::ServiceError => e
+  $stderr.puts "S3 Error: #{e.message}"
+  exit 1
+rescue => e
+  $stderr.puts "Error: #{e.message}"
+  exit 1
+end
```
data/bin/s3report
CHANGED

```diff
@@ -1,56 +1,82 @@
 #!/usr/bin/env ruby
 
-require 'optparse'
 require 's3grep'
 require 'aws-sdk-s3'
 require 'csv'
 
-
-
-
-
-
-
-
-
-
-
-
-
+# Sanitize values to prevent CSV injection attacks
+# Prefixes dangerous characters with a single quote
+def sanitize_csv_value(value)
+  return value unless value.is_a?(String)
+  return value if value.empty?
+
+  # Characters that can trigger formula execution in spreadsheets
+  if %w[= + - @ \t \r].include?(value[0])
+    "'#{value}"
+  else
+    value
+  end
+end
+
+begin
+  bucket_info = {}
+  aws_s3_client = Aws::S3::Client.new
+
+  aws_s3_client.list_buckets[:buckets].each do |bucket|
+    name = bucket[:name]
+    puts name
+
+    begin
+      bucket_location = aws_s3_client.get_bucket_location(bucket: name)
+      aws_s3_client_region_specific =
+        if bucket_location[:location_constraint].nil? || bucket_location[:location_constraint] == ''
+          aws_s3_client
+        else
+          Aws::S3::Client.new(region: bucket_location[:location_constraint])
+        end
+
+      info = S3Grep::Directory.new("s3://#{name}/", aws_s3_client_region_specific).info
+
+      bucket_info[name] = {
+        bucket: info.bucket,
+        creation_date: bucket[:creation_date],
+        total_size: info.total_size,
+        num_files: info.num_files,
+        last_modified: info.last_modified,
+        newest_file: info.newest_file,
+        first_modified: info.first_modified,
+        first_file: info.first_file
+      }
+    rescue Aws::S3::Errors::ServiceError => e
+      $stderr.puts "Warning: Could not access bucket '#{name}': #{e.message}"
     end
+  end
 
-
-
-
-
-
-
-
-
-
-  first_modified: info.first_modified,
-  first_file: info.first_file
+  csv_headers = {
+    bucket: 'Bucket',
+    creation_date: 'Creation Date',
+    total_size: 'Total Size',
+    num_files: 'Number of Files',
+    last_modified: 'Last Modified',
+    newest_file: 'Newest File',
+    first_modified: 'First Modified',
+    first_file: 'First File'
   }
-end
 
-
-
-
-
-
-
-
-  first_modified: 'First Modified',
-  first_file: 'First File'
-}
-
-file = "AWS-S3-Usage-Report-#{Time.now.strftime('%Y-%m-%dT%H%M%S')}.csv"
-CSV.open(file, 'w') do |csv|
-  csv << csv_headers.values
-
-  bucket_info.each_value do |stats|
-    csv << csv_headers.keys.map { |k| stats[k] }
+  file = "AWS-S3-Usage-Report-#{Time.now.strftime('%Y-%m-%dT%H%M%S')}.csv"
+  CSV.open(file, 'w') do |csv|
+    csv << csv_headers.values
+
+    bucket_info.each_value do |stats|
+      csv << csv_headers.keys.map { |k| sanitize_csv_value(stats[k]) }
+    end
   end
-end
 
-puts file
+  puts file
+rescue Aws::S3::Errors::ServiceError => e
+  $stderr.puts "S3 Error: #{e.message}"
+  exit 1
+rescue => e
+  $stderr.puts "Error: #{e.message}"
+  exit 1
+end
```
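For a sense of what the new guard does, a few illustrative inputs and outputs for `sanitize_csv_value` (values are made up; the results follow directly from the code above):

```ruby
sanitize_csv_value('=SUM(A1:A9)')  # => "'=SUM(A1:A9)"  formula neutralized
sanitize_csv_value('+15551234')    # => "'+15551234"
sanitize_csv_value('logs/app.log') # => "logs/app.log"  unchanged
sanitize_csv_value(42)             # => 42              non-strings pass through
```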
data/lib/s3grep/directory.rb
CHANGED

```diff
@@ -12,6 +12,10 @@ module S3Grep
       @aws_s3_client = aws_s3_client
     end
 
+    def uri
+      @uri ||= URI(s3_url)
+    end
+
     def self.glob(s3_url, aws_s3_client, regex, &block)
       new(s3_url, aws_s3_client).glob(regex, &block)
     end
@@ -31,18 +35,14 @@ module S3Grep
     end
 
     def each_content
-      uri = URI(s3_url)
-
       max_keys = 1_000
 
       prefix = CGI.unescape(uri.path[1..-1] || '')
 
       resp = aws_s3_client.list_objects(
-
-
-
-        max_keys: max_keys
-      }
+        bucket: uri.host,
+        prefix: prefix,
+        max_keys: max_keys
       )
 
       resp.contents.each do |content|
@@ -53,12 +53,10 @@ module S3Grep
         marker = resp.contents.last.key
 
         resp = aws_s3_client.list_objects(
-
-
-
-
-          marker: marker
-        }
+          bucket: uri.host,
+          prefix: prefix,
+          max_keys: max_keys,
+          marker: marker
        )
 
        resp.contents.each do |content|
```
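The hunks above only show the changed `list_objects` arguments. For context, here is a sketch of the full marker-based pagination loop they sit in, reconstructed from these fragments and the "repeat until exhausted" flow in ARCHITECTURE.md; the exact loop structure is an assumption, not the verbatim method:

```ruby
def each_content
  max_keys = 1_000
  prefix = CGI.unescape(uri.path[1..-1] || '')

  resp = aws_s3_client.list_objects(bucket: uri.host, prefix: prefix, max_keys: max_keys)
  resp.contents.each { |content| yield content }

  # A full page implies there may be more; re-request from the last key seen
  while resp.contents.size == max_keys
    marker = resp.contents.last.key
    resp = aws_s3_client.list_objects(
      bucket: uri.host, prefix: prefix, max_keys: max_keys, marker: marker
    )
    resp.contents.each { |content| yield content }
  end
end
```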
data/lib/s3grep/search.rb
CHANGED

```diff
@@ -1,5 +1,6 @@
 require 'aws-sdk-s3'
 require 'cgi'
+require 'zlib'
 
 module S3Grep
   class Search
@@ -10,11 +11,11 @@ module S3Grep
     def initialize(s3_url, aws_s3_client, compression = nil)
       @s3_url = s3_url
       @aws_s3_client = aws_s3_client
-      @compression = compression
+      @compression = compression || self.class.detect_compression(s3_url)
     end
 
     def self.search(s3_url, aws_s3_client, regex, &block)
-      new(s3_url, aws_s3_client
+      new(s3_url, aws_s3_client).search(regex, &block)
     end
 
     def self.detect_compression(s3_url)
@@ -24,9 +25,10 @@ module S3Grep
       nil
     end
 
+
     def search(regex)
       line_number = 0
-
+      each_line do |line|
         line_number += 1
         next unless line.match?(regex)
 
@@ -34,28 +36,114 @@ module S3Grep
       end
     end
 
-
-
-
-
-
-
-
-
-
+    # Stream lines from S3 without loading entire file into memory
+    def each_line(&block)
+      if compression == :gzip
+        each_line_gzip(&block)
+      elsif compression == :zip
+        each_line_zip(&block)
+      else
+        each_line_raw(&block)
+      end
     end
 
+    # For backward compatibility - streams content for s3cat
     def to_io
-
+      StreamingIO.new(self)
+    end
 
-
-
-
-
-
-
-
+    def bucket
+      @bucket ||= URI(s3_url).host
+    end
+
+    def key
+      @key ||= CGI.unescape(URI(s3_url).path[1..-1])
+    end
+
+    private
+
+    # Stream raw (uncompressed) content line by line
+    # True streaming - only keeps current chunk + line buffer in memory
+    def each_line_raw(&block)
+      buffer = "".b
+
+      aws_s3_client.get_object(bucket: bucket, key: key) do |chunk|
+        buffer << chunk
+        extract_lines!(buffer, &block)
+      end
+
+      # Yield any remaining content (last line without newline)
+      yield buffer unless buffer.empty?
+    end
+
+    # Stream gzip content line by line
+    # True streaming - decompresses chunks as they arrive from S3
+    def each_line_gzip(&block)
+      buffer = "".b
+      # Zlib::MAX_WBITS + 32 enables automatic gzip/zlib header detection
+      inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
+
+      begin
+        aws_s3_client.get_object(bucket: bucket, key: key) do |chunk|
+          # Decompress this chunk
+          decompressed = inflater.inflate(chunk)
+          buffer << decompressed
+          extract_lines!(buffer, &block)
+        end
+
+        # Finish decompression and process remaining data
+        remaining = inflater.finish
+        buffer << remaining
+        extract_lines!(buffer, &block)
+
+        yield buffer unless buffer.empty?
+      ensure
+        inflater.close
+      end
+    end
+
+    # ZIP files cannot be truly streamed (central directory is at EOF)
+    # We stream the download but must buffer before decompressing
+    def each_line_zip(&block)
+      require 'zip'
+
+      # Stream download into buffer (ZIP format requires full file)
+      body = StringIO.new("".b)
+      aws_s3_client.get_object(bucket: bucket, key: key) do |chunk|
+        body << chunk
+      end
+      body.rewind
+
+      zip = Zip::File.open_buffer(body)
+      entry = zip.entries.first
+      raise IOError, "ZIP archive is empty" if entry.nil?
+
+      buffer = "".b
+      entry.get_input_stream.each do |chunk|
+        buffer << chunk
+        extract_lines!(buffer, &block)
+      end
+
+      yield buffer unless buffer.empty?
+    end
+
+    # Extract complete lines from buffer, yielding each one
+    def extract_lines!(buffer)
+      while (newline_index = buffer.index("\n"))
+        line = buffer.slice!(0, newline_index + 1)
+        yield line
+      end
+    end
+
+    # Adapter class that provides IO-like interface for streaming
+    # Used by s3cat for backward compatibility
+    class StreamingIO
+      def initialize(search)
+        @search = search
+      end
+
+      def each(&block)
+        @search.each_line(&block)
       end
     end
   end
```
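A note on the `Zlib::Inflate.new(Zlib::MAX_WBITS + 32)` call added above: the +32 window-bits offset tells zlib to auto-detect a gzip or zlib header, which is what lets `each_line_gzip` feed raw S3 chunks straight into the inflater. A self-contained demonstration of the same pattern (the 8-byte slices stand in for S3 chunks):

```ruby
require 'zlib'

data = Zlib.gzip("alpha\nbeta\ngamma\n")

inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 32)
out = +''
# Feed the compressed bytes incrementally, as the S3 block form would
data.bytes.each_slice(8) do |slice|
  out << inflater.inflate(slice.pack('C*'))
end
out << inflater.finish  # flush any remaining decompressed data
inflater.close
print out  # alpha / beta / gamma
```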
data/s3grep.gemspec
CHANGED

```diff
@@ -2,10 +2,10 @@
 
 Gem::Specification.new do |s|
   s.name = 's3grep'
-  s.version = '0.
+  s.version = '0.2.0'
   s.licenses = ['MIT']
-  s.summary = 'Search through S3 files'
-  s.description = '
+  s.summary = 'Search through S3 files without downloading them'
+  s.description = 'CLI tools for streaming search (s3grep), viewing (s3cat), and reporting (s3info, s3report) on S3 objects. Supports gzip compression and searches large files with minimal memory usage.'
   s.authors = ['Doug Youch']
   s.email = 'dougyouch@gmail.com'
   s.homepage = 'https://github.com/dougyouch/s3grep'
@@ -13,5 +13,8 @@ Gem::Specification.new do |s|
   s.bindir = 'bin'
   s.executables = s.files.grep(%r{^bin/}) { |f| File.basename(f) }
 
+  s.required_ruby_version = '>= 2.6.0'
+
   s.add_runtime_dependency 'aws-sdk-s3'
+  s.add_runtime_dependency 'rubyzip'
 end
```
metadata
CHANGED

```diff
@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: s3grep
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Doug Youch
-autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2026-02-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: aws-sdk-s3
@@ -24,7 +23,23 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-
+- !ruby/object:Gem::Dependency
+  name: rubyzip
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: CLI tools for streaming search (s3grep), viewing (s3cat), and reporting
+  (s3info, s3report) on S3 objects. Supports gzip compression and searches large files
+  with minimal memory usage.
 email: dougyouch@gmail.com
 executables:
 - s3cat
@@ -37,6 +52,8 @@ files:
 - ".gitignore"
 - ".ruby-gemset"
 - ".ruby-version"
+- ARCHITECTURE.md
+- CLAUDE.md
 - Gemfile
 - Gemfile.lock
 - LICENSE
@@ -55,7 +72,6 @@ homepage: https://github.com/dougyouch/s3grep
 licenses:
 - MIT
 metadata: {}
-post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -63,15 +79,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version:
+      version: 2.6.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.6.2
 specification_version: 4
-summary: Search through S3 files
+summary: Search through S3 files without downloading them
 test_files: []
```