re2 2.15.0.rc1-x86-linux-gnu

data/README.md ADDED
@@ -0,0 +1,396 @@
1
+ # re2 - safer regular expressions in Ruby
2
+
3
+ Ruby bindings to [RE2][], a "fast, safe, thread-friendly alternative to
4
+ backtracking regular expression engines like those used in PCRE, Perl, and
5
+ Python".
6
+
7
+ [![Build Status](https://github.com/mudge/re2/actions/workflows/tests.yml/badge.svg?branch=main)](https://github.com/mudge/re2/actions)
8
+
9
+ **Current version:** 2.15.0.rc1
10
+ **Bundled RE2 version:** libre2.11 (2024-07-02)
11
+
12
+ ```ruby
13
+ RE2('h.*o').full_match?("hello") #=> true
14
+ RE2('e').full_match?("hello") #=> false
15
+ RE2('h.*o').partial_match?("hello") #=> true
16
+ RE2('e').partial_match?("hello") #=> true
17
+ RE2('(\w+):(\d+)').full_match("ruby:1234")
18
+ #=> #<RE2::MatchData "ruby:1234" 1:"ruby" 2:"1234">
19
+ ```
20
+
21
+ ## Table of Contents
22
+
23
+ * [Why RE2?](#why-re2)
24
+ * [Usage](#usage)
25
+ * [Compiling regular expressions](#compiling-regular-expressions)
26
+ * [Matching interface](#matching-interface)
27
+ * [Submatch extraction](#submatch-extraction)
28
+ * [Scanning text incrementally](#scanning-text-incrementally)
29
+ * [Searching simultaneously](#searching-simultaneously)
30
+ * [Encoding](#encoding)
31
+ * [Requirements](#requirements)
32
+ * [Native gems](#native-gems)
33
+ * [Verifying the gems](#verifying-the-gems)
34
+ * [Installing the `ruby` platform gem](#installing-the-ruby-platform-gem)
35
+ * [Using system libraries](#using-system-libraries)
36
+ * [Thanks](#thanks)
37
+ * [Contact](#contact)
38
+ * [License](#license)
39
+ * [Dependencies](#dependencies)
40
+
41
+ ## Why RE2?
42
+
43
+ While [recent
44
+ versions](https://www.ruby-lang.org/en/news/2022/12/25/ruby-3-2-0-released/) of
45
+ Ruby have improved defences against [regular expression denial of service
46
+ (ReDoS) attacks](https://en.wikipedia.org/wiki/ReDoS), it is still possible for
47
+ users to craft malicious patterns that take a long time to process by using
48
+ syntactic features such as [back-references, lookaheads and possessive
49
+ quantifiers](https://bugs.ruby-lang.org/issues/19104#note-3). RE2 aims to
50
+ eliminate ReDoS by design:
51
+
52
+ > **_Safety is RE2's raison d'être._**
53
+ >
54
+ > RE2 was designed and implemented with an explicit goal of being able to
55
+ > handle regular expressions from untrusted users without risk. One of its
56
+ > primary guarantees is that the match time is linear in the length of the
57
+ > input string. It was also written with production concerns in mind: the
58
+ > parser, the compiler and the execution engines limit their memory usage by
59
+ > working within a configurable budget – failing gracefully when exhausted –
60
+ > and they avoid stack overflow by eschewing recursion.
61
+
62
+ — [Why RE2?](https://github.com/google/re2/wiki/WhyRE2)
63
+
64
+ ## Usage
65
+
66
+ Install re2 as a dependency:
67
+
68
+ ```ruby
69
+ # In your Gemfile
70
+ gem "re2"
71
+
72
+ # Or without Bundler
73
+ gem install re2
74
+ ```
75
+
76
+ Include in your code:
77
+
78
+ ```ruby
79
+ require "re2"
80
+ ```
81
+
82
+ Full API documentation automatically generated from the latest version is
83
+ available at https://mudge.name/re2/.
84
+
85
+ While re2 uses the same naming scheme as Ruby's built-in regular expression
86
+ library (with [`Regexp`](https://mudge.name/re2/RE2/Regexp.html) and
87
+ [`MatchData`](https://mudge.name/re2/RE2/MatchData.html)), its API is slightly
88
+ different:
89
+
90
+ ### Compiling regular expressions
91
+
92
+ > [!WARNING]
93
+ > RE2's regular expression syntax differs from PCRE and Ruby's built-in
94
+ > [`Regexp`](https://docs.ruby-lang.org/en/3.2/Regexp.html) library, see the
95
+ > [official syntax page](https://github.com/google/re2/wiki/Syntax) for more
96
+ > details.
97
+
98
+ The core class is [`RE2::Regexp`](https://mudge.name/re2/RE2/Regexp.html) which
99
+ takes a regular expression as a string and compiles it internally into an `RE2`
100
+ object. A global function `RE2` is available to concisely compile a new
101
+ `RE2::Regexp`:
102
+
103
+ ```ruby
104
+ re = RE2('(\w+):(\d+)')
105
+ #=> #<RE2::Regexp /(\w+):(\d+)/>
106
+ re.ok? #=> true
107
+
108
+ re = RE2('abc(def')
109
+ re.ok? #=> false
110
+ re.error #=> "missing ): abc(def"
111
+ ```
112
+
113
+ > [!TIP]
114
+ > Note the use of *single quotes* when passing the regular expression as
115
+ > a string to `RE2` so that the backslashes aren't interpreted as escapes.
116
+
117
+ When compiling a regular expression, an optional second argument can be used to change RE2's default options, e.g. stop logging syntax and execution errors to stderr with `log_errors`:
118
+
119
+ ```ruby
120
+ RE2('abc)def', log_errors: false)
121
+ ```
122
+
123
+ See the API documentation for [`RE2::Regexp#initialize`](https://mudge.name/re2/RE2/Regexp.html#initialize-instance_method) for all the available options.
124
+
125
+ ### Matching interface
126
+
127
+ There are two main methods for matching: [`RE2::Regexp#full_match?`](https://mudge.name/re2/RE2/Regexp.html#full_match%3F-instance_method) requires the regular expression to match the entire input text, and [`RE2::Regexp#partial_match?`](https://mudge.name/re2/RE2/Regexp.html#partial_match%3F-instance_method) looks for a match for a substring of the input text, returning a boolean to indicate whether a match was successful or not.
128
+
129
+ ```ruby
130
+ RE2('h.*o').full_match?("hello") #=> true
131
+ RE2('e').full_match?("hello") #=> false
132
+
133
+ RE2('h.*o').partial_match?("hello") #=> true
134
+ RE2('e').partial_match?("hello") #=> true
135
+ ```
136
+
137
+ ### Submatch extraction
138
+
139
+ > [!TIP]
140
+ > Only extract the number of submatches you need as performance is improved
141
+ > with fewer submatches (with the best performance when avoiding submatch
142
+ > extraction altogether).
143
+
144
+ Both matching methods have a second form that can extract submatches as [`RE2::MatchData`](https://mudge.name/re2/RE2/MatchData.html) objects: [`RE2::Regexp#full_match`](https://mudge.name/re2/RE2/Regexp.html#full_match-instance_method) and [`RE2::Regexp#partial_match`](https://mudge.name/re2/RE2/Regexp.html#partial_match-instance_method).
145
+
146
+ ```ruby
147
+ m = RE2('(\w+):(\d+)').full_match("ruby:1234")
148
+ #=> #<RE2::MatchData "ruby:1234" 1:"ruby" 2:"1234">
149
+
150
+ m[0] #=> "ruby:1234"
151
+ m[1] #=> "ruby"
152
+ m[2] #=> "1234"
153
+
154
+ m = RE2('(\w+):(\d+)').full_match("r")
155
+ #=> nil
156
+ ```
157
+
158
+ `RE2::MatchData` supports retrieving submatches by numeric index or by name if present in the regular expression:
159
+
160
+ ```ruby
161
+ m = RE2('(?P<word>\w+):(?P<number>\d+)').full_match("ruby:1234")
162
+ #=> #<RE2::MatchData "ruby:1234" 1:"ruby" 2:"1234">
163
+
164
+ m["word"] #=> "ruby"
165
+ m["number"] #=> "1234"
166
+ ```
167
+
168
+ They can also be used with Ruby's [pattern matching](https://docs.ruby-lang.org/en/3.2/syntax/pattern_matching_rdoc.html):
169
+
170
+ ```ruby
171
+ case RE2('(\w+):(\d+)').full_match("ruby:1234")
172
+ in [word, number]
173
+ puts "Word: #{word}, Number: #{number}"
174
+ else
175
+ puts "No match"
176
+ end
177
+ # Word: ruby, Number: 1234
178
+
179
+ case RE2('(?P<word>\w+):(?P<number>\d+)').full_match("ruby:1234")
180
+ in word:, number:
181
+ puts "Word: #{word}, Number: #{number}"
182
+ else
183
+ puts "No match"
184
+ end
185
+ # Word: ruby, Number: 1234
186
+ ```
187
+
188
+ By default, both `full_match` and `partial_match` will extract all submatches into the `RE2::MatchData` based on the number of capturing groups in the regular expression. This can be changed by passing an optional second argument when matching:
189
+
190
+ ```ruby
191
+ m = RE2('(\w+):(\d+)').full_match("ruby:1234", submatches: 1)
192
+ #=> #<RE2::MatchData "ruby:1234" 1:"ruby">
193
+ ```
194
+
195
+ > [!WARNING]
196
+ > If the regular expression has no capturing groups or you pass `submatches:
197
+ > 0`, the matching method will behave like its `full_match?` or
198
+ > `partial_match?` form and only return `true` or `false` rather than
199
+ > `RE2::MatchData`.
200
+
201
+ ### Scanning text incrementally
202
+
203
+ If you want to repeatedly match regular expressions from the start of some input text, you can use [`RE2::Regexp#scan`](https://mudge.name/re2/RE2/Regexp.html#scan-instance_method) to return an `Enumerable` [`RE2::Scanner`](https://mudge.name/re2/RE2/Scanner.html) object which will lazily consume matches as you iterate over it:
204
+
205
+ ```ruby
206
+ scanner = RE2('(\w+)').scan(" one two three 4")
207
+ scanner.each do |match|
208
+ puts match.inspect
209
+ end
210
+ # ["one"]
211
+ # ["two"]
212
+ # ["three"]
213
+ # ["4"]
214
+ ```
215
+
216
+ ### Searching simultaneously
217
+
218
+ [`RE2::Set`](https://mudge.name/re2/RE2/Set.html) represents a collection of
219
+ regular expressions that can be searched for simultaneously. Calling
220
+ [`RE2::Set#add`](https://mudge.name/re2/RE2/Set.html#add-instance_method) with
221
+ a regular expression will return the integer index at which it is stored within
222
+ the set. After all patterns have been added, the set can be compiled using
223
+ [`RE2::Set#compile`](https://mudge.name/re2/RE2/Set.html#compile-instance_method),
224
+ and then
225
+ [`RE2::Set#match`](https://mudge.name/re2/RE2/Set.html#match-instance_method)
226
+ will return an array containing the indices of all the patterns that matched.
227
+
228
+ ```ruby
229
+ set = RE2::Set.new
230
+ set.add("abc") #=> 0
231
+ set.add("def") #=> 1
232
+ set.add("ghi") #=> 2
233
+ set.compile #=> true
234
+ set.match("abcdefghi") #=> [0, 1, 2]
235
+ set.match("ghidefabc") #=> [2, 1, 0]
236
+ ```
237
+
238
+ ### Encoding
239
+
240
+ > [!WARNING]
241
+ > Note RE2 only supports UTF-8 and ISO-8859-1 encoding so strings will be
242
+ > returned in UTF-8 by default or ISO-8859-1 if the `:utf8` option for the
243
+ > `RE2::Regexp` is set to `false` (any other encoding's behaviour is undefined).
244
+
245
+ For backward compatibility: re2 won't automatically convert string inputs to
246
+ the right encoding so this is the responsibility of the caller, e.g.
247
+
248
+ ```ruby
249
+ # By default, RE2 will process patterns and text as UTF-8
250
+ RE2(non_utf8_pattern.encode("UTF-8")).match(non_utf8_text.encode("UTF-8"))
251
+
252
+ # If the :utf8 option is false, RE2 will process patterns and text as ISO-8859-1
253
+ RE2(non_latin1_pattern.encode("ISO-8859-1"), utf8: false).match(non_latin1_text.encode("ISO-8859-1"))
254
+ ```
255
+
256
+ ## Requirements
257
+
258
+ This gem requires the following to run:
259
+
260
+ * [Ruby](https://www.ruby-lang.org/en/) 3.1 to 3.4.0-rc1
261
+
262
+ It supports the following RE2 ABI versions:
263
+
264
+ * libre2.0 (prior to release 2020-03-02) to libre2.11 (2023-07-01 to 2024-07-02)
265
+
266
+ ### Native gems
267
+
268
+ Where possible, a pre-compiled native gem will be provided for the following platforms:
269
+
270
+ * Linux
271
+ * `aarch64-linux`, `arm-linux`, `x86-linux` and `x86_64-linux` (requires [glibc](https://www.gnu.org/software/libc/) 2.29+, RubyGems 3.3.22+ and Bundler 2.3.21+)
272
+ * [musl](https://musl.libc.org/)-based systems such as [Alpine](https://alpinelinux.org) are supported with Bundler 2.5.6+
273
+ * macOS `x86_64-darwin` and `arm64-darwin`
274
+ * Windows `x64-mingw-ucrt`
275
+
276
+ ### Verifying the gems
277
+
278
+ SHA256 checksums are included in the [release notes](https://github.com/mudge/re2/releases) for each version and can be checked with `sha256sum`, e.g.
279
+
280
+ ```console
281
+ $ gem fetch re2 -v 2.14.0
282
+ Fetching re2-2.14.0-arm64-darwin.gem
283
+ Downloaded re2-2.14.0-arm64-darwin
284
+ $ sha256sum re2-2.14.0-arm64-darwin.gem
285
+ 3c922d54a44ac88499f6391bc2f9740559381deaf7f4e49eef5634cf32efc2ce re2-2.14.0-arm64-darwin.gem
286
+ ```
287
+
288
+ [GPG](https://www.gnupg.org/) signatures are attached to each release (the assets ending in `.sig`) and can be verified if you import [our signing key `0x39AC3530070E0F75`](https://mudge.name/39AC3530070E0F75.asc) (or fetch it from a public keyserver, e.g. `gpg --keyserver keyserver.ubuntu.com --recv-key 0x39AC3530070E0F75`):
289
+
290
+ ```console
291
+ $ gpg --verify re2-2.14.0-arm64-darwin.gem.sig re2-2.14.0-arm64-darwin.gem
292
+ gpg: Signature made Fri 2 Aug 12:39:12 2024 BST
293
+ gpg: using RSA key 702609D9C790F45B577D7BEC39AC3530070E0F75
294
+ gpg: Good signature from "Paul Mucur <mudge@mudge.name>" [unknown]
295
+ gpg: aka "Paul Mucur <paul@ghostcassette.com>" [unknown]
296
+ gpg: WARNING: This key is not certified with a trusted signature!
297
+ gpg: There is no indication that the signature belongs to the owner.
298
+ Primary key fingerprint: 7026 09D9 C790 F45B 577D 7BEC 39AC 3530 070E 0F75
299
+ ```
300
+
301
+ The fingerprint should be as shown above or you can independently verify it with the ones shown in the footer of https://mudge.name.
302
+
303
+ ### Installing the `ruby` platform gem
304
+
305
+ > [!WARNING]
306
+ > We strongly recommend using the native gems where possible to avoid the need
307
+ > for compiling the C++ extension and its dependencies which will take longer
308
+ > and be less reliable.
309
+
310
+ If you wish to compile the gem, you will need to explicitly install the `ruby` platform gem:
311
+
312
+ ```ruby
313
+ # In your Gemfile with Bundler 2.3.18+
314
+ gem "re2", force_ruby_platform: true
315
+
316
+ # With Bundler 2.1+
317
+ bundle config set force_ruby_platform true
318
+
319
+ # With older versions of Bundler
320
+ bundle config force_ruby_platform true
321
+
322
+ # Without Bundler
323
+ gem install re2 --platform=ruby
324
+ ```
325
+
326
+ You will need a full compiler toolchain for compiling Ruby C extensions (see
327
+ [Nokogiri's "The Compiler
328
+ Toolchain"](https://nokogiri.org/tutorials/installing_nokogiri.html#appendix-a-the-compiler-toolchain))
329
+ plus the toolchain required for compiling the vendored version of RE2 and its
330
+ dependency [Abseil][] which includes [CMake](https://cmake.org), a compiler
331
+ with C++14 support such as [clang](http://clang.llvm.org/) 3.4 or
332
+ [gcc](https://gcc.gnu.org/) 5 and a recent version of
333
+ [pkg-config](https://www.freedesktop.org/wiki/Software/pkg-config/). On
334
+ Windows, you'll also need pkgconf 2.1.0+ to avoid [`undefined reference`
335
+ errors](https://github.com/pkgconf/pkgconf/issues/322) when attempting to
336
+ compile Abseil.
337
+
338
+ ### Using system libraries
339
+
340
+ If you already have RE2 installed, you can instruct the gem not to use its own vendored version:
341
+
342
+ ```ruby
343
+ gem install re2 --platform=ruby -- --enable-system-libraries
344
+
345
+ # If RE2 is not installed in /usr/local, /usr, or /opt/homebrew:
346
+ gem install re2 --platform=ruby -- --enable-system-libraries --with-re2-dir=/path/to/re2/prefix
347
+ ```
348
+
349
+ Alternatively, you can set the `RE2_USE_SYSTEM_LIBRARIES` environment variable instead of passing `--enable-system-libraries` to the `gem` command.
350
+
351
+
352
+ ## Thanks
353
+
354
+ * Thanks to [Jason Woods](https://github.com/driskell) who contributed the
355
+ original implementations of `RE2::MatchData#begin` and `RE2::MatchData#end`.
356
+ * Thanks to [Stefano Rivera](https://github.com/stefanor) who first contributed
357
+ C++11 support.
358
+ * Thanks to [Stan Hu](https://github.com/stanhu) for reporting a bug with empty
359
+ patterns and `RE2::Regexp#scan`, contributing support for libre2.11
360
+ (2023-07-01) and for vendoring RE2 and abseil and compiling native gems in
361
+ 2.0.
362
+ * Thanks to [Sebastian Reitenbach](https://github.com/buzzdeee) for reporting
363
+ the deprecation and removal of the `utf8` encoding option in RE2.
364
+ * Thanks to [Sergio Medina](https://github.com/serch) for reporting a bug when
365
+ using `RE2::Scanner#scan` with an invalid regular expression.
366
+ * Thanks to [Pritam Baral](https://github.com/pritambaral) for contributing the
367
+ initial support for `RE2::Set`.
368
+ * Thanks to [Mike Dalessio](https://github.com/flavorjones) for reviewing the
369
+ precompilation of native gems in 2.0.
370
+ * Thanks to [Peter Zhu](https://github.com/peterzhu2118) for
371
+ [ruby_memcheck](https://github.com/Shopify/ruby_memcheck) and helping find
372
+ the memory leaks fixed in 2.1.3.
373
+ * Thanks to [Jean Boussier](https://github.com/byroot) for contributing the
374
+ switch to Ruby's `TypedData` API and the resulting garbage collection
375
+ improvements in 2.4.0.
376
+ * Thanks to [Manuel Jacob](https://github.com/manueljacob) for reporting a bug
377
+ when passing strings with null bytes.
378
+
379
+ ## Contact
380
+
381
+ All issues and suggestions should go to [GitHub Issues](https://github.com/mudge/re2/issues).
382
+
383
+ ## License
384
+
385
+ This library is licensed under the BSD 3-Clause License, see `LICENSE.txt`.
386
+
387
+ Copyright © 2010, Paul Mucur.
388
+
389
+ ### Dependencies
390
+
391
+ The source code of [RE2][] is distributed in the `ruby` platform gem. This code is licensed under the BSD 3-Clause License, see `LICENSE-DEPENDENCIES.txt`.
392
+
393
+ The source code of [Abseil][] is distributed in the `ruby` platform gem. This code is licensed under the Apache License 2.0, see `LICENSE-DEPENDENCIES.txt`.
394
+
395
+ [RE2]: https://github.com/google/re2
396
+ [Abseil]: https://abseil.io
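The README's "Verifying the gems" section checks a downloaded gem against a published SHA256 checksum with `sha256sum`. The same comparison could be sketched in Ruby using only the standard library. The helper name `sha256_matches?` is hypothetical and not part of re2, and the temporary file merely stands in for a fetched `.gem`:

```ruby
require "digest"
require "tempfile"

# Hypothetical helper (not part of re2): true when the file's SHA256 digest
# equals the expected hex string, ignoring case and surrounding whitespace.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Demonstration with a throwaway file standing in for a downloaded gem.
Tempfile.create("example-gem") do |file|
  file.write("not a real gem")
  file.flush

  expected = Digest::SHA256.hexdigest("not a real gem")
  puts sha256_matches?(file.path, expected)  # true
  puts sha256_matches?(file.path, "0" * 64)  # false
end
```

In practice the expected value would be copied from the release notes rather than computed locally as it is here.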
data/Rakefile ADDED
@@ -0,0 +1,94 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'rake/extensiontask'
4
+ require 'rake_compiler_dock'
5
+ require 'rspec/core/rake_task'
6
+
7
+ require_relative 'ext/re2/recipes'
8
+
9
+ re2_gemspec = Gem::Specification.load('re2.gemspec')
10
+ abseil_recipe, re2_recipe = load_recipes
11
+
12
+ # Add Abseil and RE2's latest archives to the gem files. (Note these will be
13
+ # removed from the precompiled native gems.)
14
+ abseil_archive = File.join("ports/archives", File.basename(abseil_recipe.files[0][:url]))
15
+ re2_archive = File.join("ports/archives", File.basename(re2_recipe.files[0][:url]))
16
+
17
+ re2_gemspec.files << abseil_archive
18
+ re2_gemspec.files << re2_archive
19
+
20
+ cross_platforms = %w[
21
+ aarch64-linux-gnu
22
+ aarch64-linux-musl
23
+ arm-linux-gnu
24
+ arm-linux-musl
25
+ arm64-darwin
26
+ x64-mingw-ucrt
27
+ x64-mingw32
28
+ x86-linux-gnu
29
+ x86-linux-musl
30
+ x86-mingw32
31
+ x86_64-darwin
32
+ x86_64-linux-gnu
33
+ x86_64-linux-musl
34
+ ].freeze
35
+
36
+ ENV['RUBY_CC_VERSION'] = %w[3.4.0 3.3.5 3.2.0 3.1.0].join(':')
37
+
38
+ Gem::PackageTask.new(re2_gemspec).define
39
+
40
+ Rake::ExtensionTask.new('re2', re2_gemspec) do |e|
41
+ e.cross_compile = true
42
+ e.cross_config_options << '--enable-cross-build'
43
+ e.config_options << '--disable-system-libraries'
44
+ e.cross_platform = cross_platforms
45
+ e.cross_compiling do |spec|
46
+ spec.files.reject! { |path| File.fnmatch?('ports/*', path) }
47
+ spec.dependencies.reject! { |dep| dep.name == 'mini_portile2' }
48
+ end
49
+ end
50
+
51
+ RSpec::Core::RakeTask.new(:spec)
52
+
53
+ begin
54
+ require 'ruby_memcheck'
55
+ require 'ruby_memcheck/rspec/rake_task'
56
+
57
+ namespace :spec do
58
+ RubyMemcheck::RSpec::RakeTask.new(valgrind: :compile)
59
+ end
60
+ rescue LoadError
61
+ # Only define the spec:valgrind task if ruby_memcheck is installed
62
+ end
63
+
64
+ namespace :gem do
65
+ cross_platforms.each do |platform|
66
+
67
+ # Compile each platform's native gem, packaging up the result. Note we add
68
+ # /usr/local/bin to the PATH as it contains the newest version of CMake in
69
+ # the rake-compiler-dock images.
70
+ desc "Compile and build native gem for #{platform} platform"
71
+ task platform do
72
+ RakeCompilerDock.sh <<~SCRIPT, platform: platform, verbose: true
73
+ gem install bundler --no-document &&
74
+ bundle &&
75
+ bundle exec rake native:#{platform} pkg/#{re2_gemspec.full_name}-#{Gem::Platform.new(platform)}.gem PATH="/usr/local/bin:$PATH"
76
+ SCRIPT
77
+ end
78
+ end
79
+ end
80
+
81
+ # Set up file tasks for Abseil and RE2's archives so they are automatically
82
+ # downloaded when required by the gem task.
83
+ file abseil_archive do
84
+ abseil_recipe.download
85
+ end
86
+
87
+ file re2_archive do
88
+ re2_recipe.download
89
+ end
90
+
91
+ task default: :spec
92
+
93
+ CLEAN.add("lib/**/*.{o,so,bundle}", "pkg")
94
+ CLOBBER.add("ports")
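The `cross_compiling` block in the Rakefile above drops everything under `ports/` from the native gems with `File.fnmatch?('ports/*', path)`. A minimal sketch of how that predicate behaves on nested paths follows; the file list is made up for illustration and is not the gem's real manifest:

```ruby
# Hypothetical file list standing in for spec.files; the archive names are
# invented for this example.
files = [
  "lib/re2.rb",
  "ext/re2/extconf.rb",
  "ports/archives/abseil.tar.gz",
  "ports/archives/re2.tar.gz"
]

# Same predicate as the Rakefile. Without File::FNM_PATHNAME, "*" also
# matches "/", so files nested below ports/ are rejected as well.
native_files = files.reject { |path| File.fnmatch?("ports/*", path) }

puts native_files.inspect
# ["lib/re2.rb", "ext/re2/extconf.rb"]

# With FNM_PATHNAME, "*" stops at "/" and the nested archive would not match.
puts File.fnmatch?("ports/*", "ports/archives/re2.tar.gz", File::FNM_PATHNAME)
# false
```

This is why a single-level glob is enough to strip the vendored archives under `ports/archives`.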
data/dependencies.yml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ libre2:
3
+ version: '2024-07-02'
4
+ sha256: eb2df807c781601c14a260a507a5bb4509be1ee626024cb45acbd57cb9d4032b
5
+ abseil:
6
+ version: '20240722.0'
7
+ sha256: f50e5ac311a81382da7fa75b97310e4b9006474f9560ac46f54a9967f07d4ae3
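As a rough illustration of the shape of `dependencies.yml`, the sketch below parses the content shown in the diff with Ruby's standard YAML library and checks that each checksum looks like a SHA256 hex digest. How the gem itself reads this file is not shown in this diff, so the parsing approach here is an assumption:

```ruby
require "yaml"

# The contents of dependencies.yml as shown in the diff above.
dependencies_yaml = <<~YAML
  ---
  libre2:
    version: '2024-07-02'
    sha256: eb2df807c781601c14a260a507a5bb4509be1ee626024cb45acbd57cb9d4032b
  abseil:
    version: '20240722.0'
    sha256: f50e5ac311a81382da7fa75b97310e4b9006474f9560ac46f54a9967f07d4ae3
YAML

dependencies = YAML.safe_load(dependencies_yaml)

dependencies.each do |name, details|
  # The versions are quoted in the file, so they load as strings rather
  # than being interpreted as a date or a float.
  version = details.fetch("version")
  sha256 = details.fetch("sha256")

  # A SHA256 digest is 64 hexadecimal characters.
  well_formed = sha256.match?(/\A\h{64}\z/)
  puts "#{name} #{version} sha256 well-formed: #{well_formed}"
end
# libre2 2024-07-02 sha256 well-formed: true
# abseil 20240722.0 sha256 well-formed: true
```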