logstash-input-azure_storage_blob 1.0.0-java

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: b820217f7a001f86131e31c24d98c78e96a470593318f90d5fd180470bdf4b68
4
+ data.tar.gz: 94db9edbbf831573d893e54a18441a985af0de7d8097737cbb1c104419639764
5
+ SHA512:
6
+ metadata.gz: 88f88a7545f9c4763d6fa566bdf1aa7408fe2cdfa44aabc65142e935eb83835ccfdab2dc2e68a00995c135932fee638869b65b6fc5dbf6aca50f53f71b0c9351
7
+ data.tar.gz: 48b06fab2c17c6a741c6bbfc20327d0c129a5f22499b32b9e7feac64753fe8f34f85535b9503748ce6c8c1b6c4d10cbb3499dacd0437dda5d54b57c5e9a9890a
data/Gemfile ADDED
@@ -0,0 +1,12 @@
1
+ # AUTOGENERATED BY THE GRADLE SCRIPT. EDITS WILL BE OVERWRITTEN.
2
+ source 'https://rubygems.org'
3
+
4
+ gemspec
5
+
6
+ logstash_path = ENV["LOGSTASH_PATH"] || "../../logstash"
7
+ use_logstash_source = ENV["LOGSTASH_SOURCE"] && ENV["LOGSTASH_SOURCE"].to_s == "1"
8
+
9
+ if Dir.exist?(logstash_path) && use_logstash_source
10
+ gem 'logstash-core', :path => "#{logstash_path}/logstash-core"
11
+ gem 'logstash-core-plugin-api', :path => "#{logstash_path}/logstash-core-plugin-api"
12
+ end
data/README.md ADDED
@@ -0,0 +1,301 @@
1
+ # Logstash Input Plugin for Azure Blob Storage
2
+
3
+ A [Logstash](https://www.elastic.co/logstash) input plugin that reads blobs from [Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/) and streams their content line-by-line into your Logstash pipeline.
4
+
5
+ Built with Microsoft's first-party Java Azure SDKs, this plugin runs natively on the JVM alongside Logstash - no Ruby shims or REST wrappers required. It supports Azure Commercial, Azure Government, and Azure China clouds out of the box.
6
+
7
+ ## Features
8
+
9
+ - **Streaming architecture** - processes blobs line-by-line without loading entire files into memory
10
+ - **Multi-replica safe** - with the `tags` and `container` strategies, blob leases prevent duplicate processing across multiple Logstash instances
11
+ - **Three state-tracking strategies** - choose the right trade-off between permissions, portability, and coordination
12
+ - **Azure Government and sovereign clouds** - first-class support with automatic cloud detection
13
+ - **Flexible authentication** - DefaultAzureCredential (managed identity, workload identity, CLI), connection strings, or shared keys
14
+
15
+ ## Installation
16
+
17
+ ### From RubyGems
18
+
19
+ ```bash
20
+ /usr/share/logstash/bin/logstash-plugin install logstash-input-azure_storage_blob
21
+ ```
22
+
23
+ ### From GitHub Release
24
+
25
+ Download the `.gem` file from [Releases](https://github.com/kfriede/logstash-input-azure_storage_blob/releases), then install it:
26
+
27
+ ```bash
28
+ /usr/share/logstash/bin/logstash-plugin install --no-verify logstash-input-azure_storage_blob-1.0.0-java.gem
29
+ ```
30
+
31
+ ### From Source
32
+
33
+ ```bash
34
+ git clone https://github.com/kfriede/logstash-input-azure_storage_blob.git
35
+ cd logstash-input-azure_storage_blob
36
+ ./gradlew gem
37
+ /usr/share/logstash/bin/logstash-plugin install --no-verify logstash-input-azure_storage_blob-1.0.0-java.gem
38
+ ```
39
+
40
+ ### Verify
41
+
42
+ ```bash
43
+ /usr/share/logstash/bin/logstash-plugin list | grep azure_storage_blob
44
+ ```
45
+
46
+ ## Quick Start
47
+
48
+ The simplest configuration requires only a storage account and container name. Authentication defaults to `DefaultAzureCredential`, which automatically picks up managed identity, workload identity, Azure CLI credentials, or environment variables.
49
+
50
+ ```ruby
51
+ input {
52
+ azure_storage_blob {
53
+ storage_account => "mystorageaccount"
54
+ container => "logs"
55
+ }
56
+ }
57
+
58
+ output {
59
+ stdout { codec => rubydebug }
60
+ }
61
+ ```
62
+
63
+ ## Configuration Reference
64
+
65
+ ### Required Settings
66
+
67
+ | Setting | Type | Description |
68
+ |---------|------|-------------|
69
+ | `storage_account` | string | Azure Storage account name. |
70
+ | `container` | string | Name of the blob container to read from. |
71
+
72
+ ### Authentication
73
+
74
+ | Setting | Type | Default | Description |
75
+ |---------|------|---------|-------------|
76
+ | `auth_method` | string | `"default"` | Authentication method. One of `default`, `connection_string`, or `shared_key`. |
77
+ | `connection_string` | string | `""` | Azure Storage connection string. Required when `auth_method => "connection_string"`. |
78
+ | `storage_key` | string | `""` | Storage account access key. Required when `auth_method => "shared_key"`. |
79
+
80
+ **`auth_method` options:**
81
+
82
+ | Value | Description | When to use |
83
+ |-------|-------------|-------------|
84
+ | `default` | Uses `DefaultAzureCredential`, which tries a chain of credential sources: environment variables first, then workload identity and managed identity, and later developer tools such as the Azure CLI. | Production deployments on Azure (AKS, VMs, App Service). |
85
+ | `connection_string` | Authenticates with a full connection string. | Quick setup, development, or environments without Azure AD. |
86
+ | `shared_key` | Authenticates with the storage account access key. | When you need key-based auth without a full connection string. |
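The tables above imply a simple rule: each non-default `auth_method` needs its matching secret. A minimal Ruby sketch of that rule follows; `validate_auth` and `REQUIRED_SECRET` are hypothetical names, and the plugin itself validates its settings in Java, possibly with different messages.

```ruby
# Hypothetical sketch of the auth_method rules from the tables above;
# the real plugin validates its settings in Java.
AUTH_METHODS = %w[default connection_string shared_key].freeze

# Setting that must be non-empty for each non-default auth_method.
REQUIRED_SECRET = {
  "connection_string" => "connection_string",
  "shared_key" => "storage_key"
}.freeze

def validate_auth(settings)
  auth = settings.fetch("auth_method", "default")
  unless AUTH_METHODS.include?(auth)
    raise ArgumentError, "auth_method must be one of: #{AUTH_METHODS.join(', ')}"
  end

  # "default" needs no secret; an empty string counts as unset.
  required = REQUIRED_SECRET[auth]
  if required && settings.fetch(required, "").to_s.empty?
    raise ArgumentError, "#{required} is required when auth_method => \"#{auth}\""
  end
  auth
end

puts validate_auth("storage_account" => "myaccount", "container" => "logs") # prints default
```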
87
+
88
+ ### Azure Environment
89
+
90
+ | Setting | Type | Default | Description |
91
+ |---------|------|---------|-------------|
92
+ | `cloud` | string | `""` (auto-detect) | Azure cloud environment. One of `AzureCloud`, `AzureUSGovernment`, or `AzureChinaCloud`. When empty, auto-detects from the `AZURE_AUTHORITY_HOST` environment variable. |
93
+ | `blob_endpoint` | string | `""` | Explicit blob service endpoint URL. Overrides the endpoint derived from `cloud` and `storage_account`. Use for private endpoints or the Azurite emulator. |
94
+
95
+ **Cloud endpoints:**
96
+
97
+ | `cloud` value | Storage endpoint | Auth endpoint |
98
+ |---------------|-----------------|---------------|
99
+ | `AzureCloud` | `*.blob.core.windows.net` | `login.microsoftonline.com` |
100
+ | `AzureUSGovernment` | `*.blob.core.usgovcloudapi.net` | `login.microsoftonline.us` |
101
+ | `AzureChinaCloud` | `*.blob.core.chinacloudapi.cn` | `login.chinacloudapi.cn` |
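The endpoint precedence described above (explicit `blob_endpoint`, then `cloud`, then auto-detection from `AZURE_AUTHORITY_HOST`) can be sketched in Ruby as follows. The helper names are hypothetical, and falling back to `AzureCloud` when `AZURE_AUTHORITY_HOST` is unset or unrecognized is an assumption, not documented plugin behavior.

```ruby
# Hypothetical sketch of endpoint resolution; not the plugin's actual code.
STORAGE_SUFFIXES = {
  "AzureCloud"        => "blob.core.windows.net",
  "AzureUSGovernment" => "blob.core.usgovcloudapi.net",
  "AzureChinaCloud"   => "blob.core.chinacloudapi.cn"
}.freeze

AUTHORITY_HOSTS = {
  "login.microsoftonline.com" => "AzureCloud",
  "login.microsoftonline.us"  => "AzureUSGovernment",
  "login.chinacloudapi.cn"    => "AzureChinaCloud"
}.freeze

def detect_cloud(env = ENV)
  # AZURE_AUTHORITY_HOST is usually a URL such as "https://login.microsoftonline.us/".
  host = env["AZURE_AUTHORITY_HOST"].to_s.sub(%r{\Ahttps?://}, "").chomp("/")
  AUTHORITY_HOSTS.fetch(host, "AzureCloud") # assumption: default to commercial Azure
end

def resolve_blob_endpoint(storage_account:, cloud: "", blob_endpoint: "", env: ENV)
  return blob_endpoint unless blob_endpoint.empty? # explicit override wins
  cloud = detect_cloud(env) if cloud.empty?
  suffix = STORAGE_SUFFIXES.fetch(cloud) { raise ArgumentError, "unknown cloud: #{cloud}" }
  "https://#{storage_account}.#{suffix}"
end

puts resolve_blob_endpoint(storage_account: "govlogsstorage", cloud: "AzureUSGovernment")
# prints https://govlogsstorage.blob.core.usgovcloudapi.net
```

With Azurite, `blob_endpoint` would typically be the emulator's path-style URL (for example `http://127.0.0.1:10000/devstoreaccount1`), which this sketch returns unchanged.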
102
+
103
+ ### State Tracking
104
+
105
+ | Setting | Type | Default | Description |
106
+ |---------|------|---------|-------------|
107
+ | `tracking_strategy` | string | `"tags"` | How the plugin tracks which blobs have been processed. One of `tags`, `container`, or `registry`. |
108
+ | `archive_container` | string | `"archive"` | Container where completed blobs are moved. Only used with `container` strategy. |
109
+ | `error_container` | string | `"errors"` | Container where failed blobs are moved. Only used with `container` strategy. |
110
+ | `registry_path` | string | `"/data/registry.db"` | Path to the local SQLite database file. Only used with `registry` strategy. |
111
+
112
+ **`tracking_strategy` options:**
113
+
114
+ | Strategy | How it works | RBAC required | Multi-replica safe |
115
+ |----------|-------------|---------------|-------------------|
116
+ | `tags` | Writes blob index tags (`logstash_status`, etc.) to mark blobs as processing/completed/failed. Uses blob leases for coordination. | Storage Blob Data Owner | Yes |
117
+ | `container` | Moves completed blobs to an archive container and failed blobs to an error container. Uses blob leases for coordination. | Storage Blob Data Contributor | Yes |
118
+ | `registry` | Tracks state in a local SQLite database. No Azure-side state changes. | Storage Blob Data Reader | **No** - single instance only |
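A rough Ruby model of the `tags` strategy may help when reasoning about which blobs get picked up. Only the tag name `logstash_status` comes from the table; the literal values `processing`, `completed`, and `failed`, the helper names, and the rule that a blob is skipped while another replica holds its lease are assumptions for illustration.

```ruby
# Hypothetical model of the `tags` tracking strategy.
TERMINAL_STATUSES = %w[completed failed].freeze

# A blob is eligible if it is not in a terminal state and nobody holds its
# lease (an untagged blob, or a "processing" blob whose owner went away).
def eligible?(tags, lease_held:)
  return false if TERMINAL_STATUSES.include?(tags["logstash_status"])
  !lease_held
end

# Tags written at each stage of processing a blob.
def tags_after(outcome)
  case outcome
  when :started then { "logstash_status" => "processing" }
  when :ok      then { "logstash_status" => "completed" }
  when :error   then { "logstash_status" => "failed" }
  else raise ArgumentError, "unknown outcome: #{outcome.inspect}"
  end
end
```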
119
+
120
+ ### Polling
121
+
122
+ | Setting | Type | Default | Description |
123
+ |---------|------|---------|-------------|
124
+ | `poll_interval` | number | `30` | Seconds to wait between poll cycles. |
125
+ | `prefix` | string | `""` | Only process blobs whose name starts with this prefix. Empty means all blobs. |
126
+ | `blob_batch_size` | number | `10` | Maximum number of blobs to process per poll cycle. |
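One poll cycle roughly combines these three settings. The Ruby sketch below is hypothetical: `list_blobs` and `process` stand in for the plugin's Azure SDK calls, blobs are taken in listing order, and the tracking-strategy checks the real plugin applies before processing a blob are omitted.

```ruby
# Hypothetical sketch of the polling loop; not the plugin's implementation.
def select_batch(blob_names, prefix: "", blob_batch_size: 10)
  # An empty prefix matches every blob name.
  blob_names.select { |name| name.start_with?(prefix) }.first(blob_batch_size)
end

def poll_loop(list_blobs:, process:, prefix: "", blob_batch_size: 10,
              poll_interval: 30, max_cycles: nil)
  cycles = 0
  loop do
    batch = select_batch(list_blobs.call, prefix: prefix, blob_batch_size: blob_batch_size)
    batch.each { |name| process.call(name) }
    cycles += 1
    break if max_cycles && cycles >= max_cycles # lets a caller stop the loop
    sleep poll_interval # wait before the next cycle
  end
end

names = ["prod/a.log", "dev/b.log", "prod/c.log"]
p select_batch(names, prefix: "prod/", blob_batch_size: 1) # prints ["prod/a.log"]
```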
127
+
128
+ ### Processing
129
+
130
+ | Setting | Type | Default | Description |
131
+ |---------|------|---------|-------------|
132
+ | `skip_empty_lines` | boolean | `true` | Whether to skip empty lines when reading blob content. |
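Line-by-line reading with `skip_empty_lines` can be illustrated with a Ruby sketch over any IO object. Two details are assumptions rather than documented behavior: the line number (compare `[@metadata][azure_blob_line_number]`) still counts skipped lines, and a whitespace-only line is not treated as empty.

```ruby
require "stringio"

# Hypothetical sketch: yield each line and its 1-based line number,
# reading incrementally instead of loading the whole blob into memory.
def each_event_line(io, skip_empty_lines: true)
  io.each_line.with_index(1) do |raw, line_number|
    line = raw.chomp # strips "\n" or "\r\n"
    next if skip_empty_lines && line.empty?
    yield line, line_number
  end
end

blob = StringIO.new("first\n\nthird\r\n")
each_event_line(blob) { |line, n| puts "#{n}: #{line}" }
# prints "1: first" then "3: third"
```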
133
+
134
+ ### Lease Coordination
135
+
136
+ These settings apply only to the `tags` and `container` tracking strategies.
137
+
138
+ | Setting | Type | Default | Description |
139
+ |---------|------|---------|-------------|
140
+ | `lease_duration` | number | `30` | Blob lease duration in seconds. Must be between 15 and 60 (Azure requirement). |
141
+ | `lease_renewal` | number | `20` | How often to renew the lease, in seconds. Should be less than `lease_duration`. |
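These two settings can be checked up front. In this hypothetical Ruby sketch, the 15-60 second range is Azure's limit for fixed-duration leases, while rejecting `lease_renewal >= lease_duration` outright (the table only says it *should* be less) is a stricter assumption. The return value is the slack a replica has if one renewal arrives late.

```ruby
# Hypothetical validation of the lease settings; not the plugin's code.
def validate_lease!(lease_duration: 30, lease_renewal: 20)
  unless (15..60).cover?(lease_duration)
    raise ArgumentError, "lease_duration must be between 15 and 60 seconds, got #{lease_duration}"
  end
  unless lease_renewal.positive? && lease_renewal < lease_duration
    raise ArgumentError,
          "lease_renewal (#{lease_renewal}) must be > 0 and < lease_duration (#{lease_duration})"
  end
  # Seconds between a scheduled renewal and the moment the lease would lapse.
  lease_duration - lease_renewal
end

puts validate_lease!(lease_duration: 60, lease_renewal: 40) # prints 20
```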
142
+
143
+ ## Event Metadata
144
+
145
+ Every event produced by this plugin includes the following `@metadata` fields:
146
+
147
+ | Field | Description |
148
+ |-------|-------------|
149
+ | `[@metadata][azure_blob_name]` | Name of the source blob. |
150
+ | `[@metadata][azure_blob_container]` | Name of the source container. |
151
+ | `[@metadata][azure_blob_storage_account]` | Storage account the blob was read from. |
152
+ | `[@metadata][azure_blob_line_number]` | Line number within the blob (1-based). |
153
+ | `[@metadata][azure_blob_last_modified]` | Last-modified timestamp of the blob (ISO 8601). |
154
+
155
+ You can use these in filters and outputs:
156
+
157
+ ```ruby
158
+ filter {
159
+ mutate {
160
+ add_field => {
161
+ "source_blob" => "%{[@metadata][azure_blob_name]}"
162
+ }
163
+ }
164
+ }
165
+ ```
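To make the shape of these fields concrete, here is a hypothetical Ruby helper that builds the `@metadata` hash for one line. The field names come from the table above; storing the line number as an integer and formatting the timestamp in UTC with second precision are assumptions, since the plugin builds these values in Java.

```ruby
require "time"

# Hypothetical helper showing the per-event @metadata fields.
def blob_metadata(account:, container:, blob_name:, line_number:, last_modified:)
  {
    "azure_blob_name"            => blob_name,
    "azure_blob_container"       => container,
    "azure_blob_storage_account" => account,
    "azure_blob_line_number"     => line_number, # 1-based
    "azure_blob_last_modified"   => last_modified.utc.iso8601 # ISO 8601
  }
end

meta = blob_metadata(account: "myaccount", container: "logs", blob_name: "prod/app.log",
                     line_number: 1, last_modified: Time.utc(2026, 2, 6, 12, 0, 0))
puts meta["azure_blob_last_modified"] # prints 2026-02-06T12:00:00Z
```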
166
+
167
+ ## Examples
168
+
169
+ ### Ingest JSON logs from Azure Government
170
+
171
+ ```ruby
172
+ input {
173
+ azure_storage_blob {
174
+ storage_account => "govlogsstorage"
175
+ container => "application-logs"
176
+ cloud => "AzureUSGovernment"
177
+ tracking_strategy => "tags"
178
+ prefix => "prod/"
179
+ poll_interval => 10
180
+ }
181
+ }
182
+
183
+ filter {
184
+ json { source => "message" }
185
+ }
186
+
187
+ output {
188
+ elasticsearch {
189
+ hosts => ["https://elasticsearch:9200"]
190
+ index => "app-logs-%{+YYYY.MM.dd}"
191
+ }
192
+ }
193
+ ```
194
+
195
+ ### Connection string auth with container-based tracking
196
+
197
+ ```ruby
198
+ input {
199
+ azure_storage_blob {
200
+ storage_account => "myaccount"
201
+ container => "incoming"
202
+ auth_method => "connection_string"
203
+ connection_string => "${AZURE_STORAGE_CONNECTION_STRING}"
204
+ tracking_strategy => "container"
205
+ archive_container => "processed"
206
+ error_container => "failed"
207
+ }
208
+ }
209
+
210
+ output {
211
+ stdout { codec => rubydebug }
212
+ }
213
+ ```
214
+
215
+ Blobs are moved from `incoming` to `processed` on success, or to `failed` on error. Create all three containers before starting the pipeline.
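An Azure Storage connection string, such as the value of `AZURE_STORAGE_CONNECTION_STRING` above, is a semicolon-separated list of `Key=Value` pairs. The Ruby sketch below parses one and derives the blob endpoint. The Azure SDK does this parsing itself, so these helpers are illustrative only, and the defaults used for a missing `DefaultEndpointsProtocol` or `EndpointSuffix` are assumptions.

```ruby
# Hypothetical connection-string helpers; the Azure SDK parses these itself.
def parse_connection_string(conn)
  conn.split(";").reject(&:empty?).to_h do |pair|
    key, value = pair.split("=", 2) # AccountKey is base64 and may end in "="
    [key, value]
  end
end

def blob_endpoint_from(parts)
  return parts["BlobEndpoint"] if parts["BlobEndpoint"] # e.g. Azurite
  protocol = parts.fetch("DefaultEndpointsProtocol", "https")
  suffix = parts.fetch("EndpointSuffix", "core.windows.net")
  "#{protocol}://#{parts.fetch('AccountName')}.blob.#{suffix}"
end

conn = "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=abc==;EndpointSuffix=core.usgovcloudapi.net"
parts = parse_connection_string(conn)
puts parts["AccountKey"]       # prints abc==
puts blob_endpoint_from(parts) # prints https://myaccount.blob.core.usgovcloudapi.net
```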
216
+
217
+ ### Lightweight single-instance setup with SQLite registry
218
+
219
+ ```ruby
220
+ input {
221
+ azure_storage_blob {
222
+ storage_account => "myaccount"
223
+ container => "data"
224
+ auth_method => "shared_key"
225
+ storage_key => "${AZURE_STORAGE_KEY}"
226
+ tracking_strategy => "registry"
227
+ registry_path => "/var/lib/logstash/blob-registry.db"
228
+ blob_batch_size => 50
229
+ poll_interval => 60
230
+ }
231
+ }
232
+
233
+ output {
234
+ file {
235
+ path => "/var/log/ingested/%{[@metadata][azure_blob_name]}.log"
236
+ }
237
+ }
238
+ ```
239
+
240
+ The `registry` strategy requires only `Storage Blob Data Reader` permissions and stores state locally. Do not run multiple Logstash instances with this strategy: each instance keeps its own registry, so every instance will process the same blobs.
241
+
242
+ ### Multi-replica deployment on Kubernetes
243
+
244
+ ```ruby
245
+ input {
246
+ azure_storage_blob {
247
+ storage_account => "prodlogs"
248
+ container => "events"
249
+ tracking_strategy => "tags"
250
+ poll_interval => 5
251
+ blob_batch_size => 20
252
+ lease_duration => 60
253
+ lease_renewal => 40
254
+ }
255
+ }
256
+ ```
257
+
258
+ With the `tags` or `container` strategy, you can safely run multiple Logstash replicas against the same container. Blob leases ensure that only one replica processes a given blob at a time. On AKS with workload identity, `DefaultAzureCredential` picks up the workload identity credentials automatically, and the `cloud` setting is auto-detected from the `AZURE_AUTHORITY_HOST` environment variable that the workload identity webhook injects into the pod.
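The lease guarantee can be modeled in a few lines of Ruby. This is a hypothetical in-memory stand-in for Azure blob leases that uses explicit timestamps instead of a clock: a replica can acquire a blob only when no other replica holds an unexpired lease, and renewals succeed only for the current holder. In this model, a replica that stops renewing loses the blob once `lease_duration` elapses, and another replica can then pick it up.

```ruby
# Hypothetical in-memory model of blob-lease coordination between replicas.
class LeaseTable
  Lease = Struct.new(:owner, :expires_at)

  def initialize(lease_duration: 30)
    @lease_duration = lease_duration
    @leases = {}
    @mutex = Mutex.new # replicas may race for the same blob
  end

  # Returns true if `owner` now holds the lease on `blob`.
  def acquire(blob, owner, now:)
    @mutex.synchronize do
      current = @leases[blob]
      return false if current && current.expires_at > now && current.owner != owner
      @leases[blob] = Lease.new(owner, now + @lease_duration)
      true
    end
  end

  # Extends the lease only for the current, unexpired holder.
  def renew(blob, owner, now:)
    @mutex.synchronize do
      current = @leases[blob]
      return false unless current && current.owner == owner && current.expires_at > now
      current.expires_at = now + @lease_duration
      true
    end
  end
end

table = LeaseTable.new(lease_duration: 60)
puts table.acquire("events/a.log", "replica-1", now: 0)   # prints true
puts table.acquire("events/a.log", "replica-2", now: 10)  # prints false (lease held)
puts table.renew("events/a.log", "replica-1", now: 40)    # prints true (expires at 100)
puts table.acquire("events/a.log", "replica-2", now: 101) # prints true (lease lapsed)
```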
259
+
260
+ ### Route events by blob name
261
+
262
+ ```ruby
263
+ input {
264
+ azure_storage_blob {
265
+ storage_account => "myaccount"
266
+ container => "logs"
267
+ }
268
+ }
269
+
270
+ output {
271
+ if [@metadata][azure_blob_name] =~ /^access-logs/ {
272
+ elasticsearch {
273
+ index => "access-logs-%{+YYYY.MM.dd}"
274
+ }
275
+ } else if [@metadata][azure_blob_name] =~ /^error-logs/ {
276
+ elasticsearch {
277
+ index => "error-logs-%{+YYYY.MM.dd}"
278
+ }
279
+ }
280
+ }
281
+ ```
282
+
283
+ ## Building from Source
284
+
285
+ Requires Java 11+ and a local Logstash installation.
286
+
287
+ ```bash
288
+ # Run unit tests
289
+ ./gradlew test
290
+
291
+ # Run integration tests (requires Azurite)
292
+ docker run -d --name azurite -p 10000:10000 mcr.microsoft.com/azure-storage/azurite
293
+ ./gradlew integrationTest
294
+
295
+ # Build the gem
296
+ ./gradlew gem
297
+ ```
298
+
299
+ ## License
300
+
301
+ Apache-2.0
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 1.0.0
data/lib/logstash/inputs/azure_storage_blob.rb ADDED
@@ -0,0 +1,12 @@
1
+ # AUTOGENERATED BY THE GRADLE SCRIPT. EDITS WILL BE OVERWRITTEN.
2
+ # encoding: utf-8
3
+ require "logstash/inputs/base"
4
+ require "logstash/namespace"
5
+ require "logstash-input-azure_storage_blob_jars"
6
+ require "java"
7
+
8
+ class LogStash::Inputs::AzureStorageBlob < LogStash::Inputs::Base
9
+ config_name "azure_storage_blob"
10
+
11
+ def self.javaClass() Java::com.azure.logstash.input.AzureStorageBlob.java_class; end
12
+ end
data/lib/logstash-input-azure_storage_blob_jars.rb ADDED
@@ -0,0 +1,5 @@
1
+ # AUTOGENERATED BY THE GRADLE SCRIPT. EDITS WILL BE OVERWRITTEN.
2
+ # encoding: utf-8
3
+
4
+ require 'jar_dependencies'
5
+ require_jar('com.azure.logstash.input', 'logstash-input-azure_storage_blob', '1.0.0')
data/logstash-input-azure_storage_blob.gemspec ADDED
@@ -0,0 +1,23 @@
1
+ # AUTOGENERATED BY THE GRADLE SCRIPT. EDITS WILL BE OVERWRITTEN.
2
+ Gem::Specification.new do |s|
3
+ s.name = 'logstash-input-azure_storage_blob'
4
+ s.version = ::File.read('VERSION').split('\n').first
5
+ s.licenses = ['Apache-2.0']
6
+ s.summary = 'Logstash input plugin for Azure Blob Storage'
7
+ s.description = 'This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program'
8
+ s.authors = ['Microsoft']
9
+ s.email = ['']
10
+ s.homepage = 'https://github.com/Azure/logstash-input-azure_storage_blob'
11
+ s.platform = 'java'
12
+ s.require_paths = ['lib', 'vendor/jar-dependencies']
13
+
14
+ s.files = Dir["lib/**/*","*.gemspec","*.md","CONTRIBUTORS","Gemfile","LICENSE","NOTICE.TXT", "vendor/jar-dependencies/**/*.jar", "vendor/jar-dependencies/**/*.rb", "VERSION", "docs/**/*"]
15
+
16
+ # Special flag to let us know this is actually a logstash plugin
17
+ s.metadata = { 'logstash_plugin' => 'true', 'logstash_group' => 'input', 'java_plugin' => 'true'}
18
+
19
+ # Gem dependencies
20
+ s.add_runtime_dependency "logstash-core-plugin-api", ">= 1.60", "<= 2.99"
21
+ s.add_runtime_dependency 'jar-dependencies'
22
+ s.add_development_dependency 'logstash-devutils'
23
+ end
metadata ADDED
@@ -0,0 +1,104 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: logstash-input-azure_storage_blob
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: java
6
+ authors:
7
+ - Microsoft
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2026-02-06 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - ">="
17
+ - !ruby/object:Gem::Version
18
+ version: '1.60'
19
+ - - "<="
20
+ - !ruby/object:Gem::Version
21
+ version: '2.99'
22
+ name: logstash-core-plugin-api
23
+ type: :runtime
24
+ prerelease: false
25
+ version_requirements: !ruby/object:Gem::Requirement
26
+ requirements:
27
+ - - ">="
28
+ - !ruby/object:Gem::Version
29
+ version: '1.60'
30
+ - - "<="
31
+ - !ruby/object:Gem::Version
32
+ version: '2.99'
33
+ - !ruby/object:Gem::Dependency
34
+ requirement: !ruby/object:Gem::Requirement
35
+ requirements:
36
+ - - ">="
37
+ - !ruby/object:Gem::Version
38
+ version: '0'
39
+ name: jar-dependencies
40
+ type: :runtime
41
+ prerelease: false
42
+ version_requirements: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
47
+ - !ruby/object:Gem::Dependency
48
+ requirement: !ruby/object:Gem::Requirement
49
+ requirements:
50
+ - - ">="
51
+ - !ruby/object:Gem::Version
52
+ version: '0'
53
+ name: logstash-devutils
54
+ type: :development
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - ">="
59
+ - !ruby/object:Gem::Version
60
+ version: '0'
61
+ description: This gem is a Logstash plugin required to be installed on top of the
62
+ Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This
63
+ gem is not a stand-alone program
64
+ email:
65
+ - ''
66
+ executables: []
67
+ extensions: []
68
+ extra_rdoc_files: []
69
+ files:
70
+ - Gemfile
71
+ - README.md
72
+ - VERSION
73
+ - lib/logstash-input-azure_storage_blob_jars.rb
74
+ - lib/logstash/inputs/azure_storage_blob.rb
75
+ - logstash-input-azure_storage_blob.gemspec
76
+ - vendor/jar-dependencies/com/azure/logstash/input/logstash-input-azure_storage_blob/1.0.0/logstash-input-azure_storage_blob-1.0.0.jar
77
+ homepage: https://github.com/Azure/logstash-input-azure_storage_blob
78
+ licenses:
79
+ - Apache-2.0
80
+ metadata:
81
+ logstash_plugin: 'true'
82
+ logstash_group: input
83
+ java_plugin: 'true'
84
+ post_install_message:
85
+ rdoc_options: []
86
+ require_paths:
87
+ - lib
88
+ - vendor/jar-dependencies
89
+ required_ruby_version: !ruby/object:Gem::Requirement
90
+ requirements:
91
+ - - ">="
92
+ - !ruby/object:Gem::Version
93
+ version: '0'
94
+ required_rubygems_version: !ruby/object:Gem::Requirement
95
+ requirements:
96
+ - - ">="
97
+ - !ruby/object:Gem::Version
98
+ version: '0'
99
+ requirements: []
100
+ rubygems_version: 3.3.26
101
+ signing_key:
102
+ specification_version: 4
103
+ summary: Logstash input plugin for Azure Blob Storage
104
+ test_files: []