re2 2.15.0.rc1-x86-linux-gnu

data/README.md ADDED
@@ -0,0 +1,396 @@
1
+ # re2 - safer regular expressions in Ruby
2
+
3
+ Ruby bindings to [RE2][], a "fast, safe, thread-friendly alternative to
4
+ backtracking regular expression engines like those used in PCRE, Perl, and
5
+ Python".
6
+
7
+ [![Build Status](https://github.com/mudge/re2/actions/workflows/tests.yml/badge.svg?branch=main)](https://github.com/mudge/re2/actions)
8
+
9
+ **Current version:** 2.15.0.rc1
10
+ **Bundled RE2 version:** libre2.11 (2024-07-02)
11
+
12
+ ```ruby
13
+ RE2('h.*o').full_match?("hello") #=> true
14
+ RE2('e').full_match?("hello") #=> false
15
+ RE2('h.*o').partial_match?("hello") #=> true
16
+ RE2('e').partial_match?("hello") #=> true
17
+ RE2('(\w+):(\d+)').full_match("ruby:1234")
18
+ #=> #<RE2::MatchData "ruby:1234" 1:"ruby" 2:"1234">
19
+ ```
20
+
21
+ ## Table of Contents
22
+
23
+ * [Why RE2?](#why-re2)
24
+ * [Usage](#usage)
25
+ * [Compiling regular expressions](#compiling-regular-expressions)
26
+ * [Matching interface](#matching-interface)
27
+ * [Submatch extraction](#submatch-extraction)
28
+ * [Scanning text incrementally](#scanning-text-incrementally)
29
+ * [Searching simultaneously](#searching-simultaneously)
30
+ * [Encoding](#encoding)
31
+ * [Requirements](#requirements)
32
+ * [Native gems](#native-gems)
33
+ * [Verifying the gems](#verifying-the-gems)
34
+ * [Installing the `ruby` platform gem](#installing-the-ruby-platform-gem)
35
+ * [Using system libraries](#using-system-libraries)
36
+ * [Thanks](#thanks)
37
+ * [Contact](#contact)
38
+ * [License](#license)
39
+ * [Dependencies](#dependencies)
40
+
41
+ ## Why RE2?
42
+
43
+ While [recent
44
+ versions](https://www.ruby-lang.org/en/news/2022/12/25/ruby-3-2-0-released/) of
45
+ Ruby have improved defences against [regular expression denial of service
46
+ (ReDoS) attacks](https://en.wikipedia.org/wiki/ReDoS), it is still possible for
47
+ users to craft malicious patterns that take a long time to process by using
48
+ syntactic features such as [back-references, lookaheads and possessive
49
+ quantifiers](https://bugs.ruby-lang.org/issues/19104#note-3). RE2 aims to
50
+ eliminate ReDoS by design:
51
+
52
+ > **_Safety is RE2's raison d'être._**
53
+ >
54
+ > RE2 was designed and implemented with an explicit goal of being able to
55
+ > handle regular expressions from untrusted users without risk. One of its
56
+ > primary guarantees is that the match time is linear in the length of the
57
+ > input string. It was also written with production concerns in mind: the
58
+ > parser, the compiler and the execution engines limit their memory usage by
59
+ > working within a configurable budget – failing gracefully when exhausted –
60
+ > and they avoid stack overflow by eschewing recursion.
61
+
62
+ — [Why RE2?](https://github.com/google/re2/wiki/WhyRE2)
63
+
64
+ ## Usage
65
+
66
+ Install re2 as a dependency:
67
+
68
+ ```ruby
69
+ # In your Gemfile
70
+ gem "re2"
71
+
72
+ # Or without Bundler
73
+ gem install re2
74
+ ```
75
+
76
+ Include in your code:
77
+
78
+ ```ruby
79
+ require "re2"
80
+ ```
81
+
82
+ Full API documentation automatically generated from the latest version is
83
+ available at https://mudge.name/re2/.
84
+
85
+ While re2 uses the same naming scheme as Ruby's built-in regular expression
86
+ library (with [`Regexp`](https://mudge.name/re2/RE2/Regexp.html) and
87
+ [`MatchData`](https://mudge.name/re2/RE2/MatchData.html)), its API is slightly
88
+ different:
89
+
90
+ ### Compiling regular expressions
91
+
92
+ > [!WARNING]
93
+ > RE2's regular expression syntax differs from PCRE and Ruby's built-in
94
+ > [`Regexp`](https://docs.ruby-lang.org/en/3.2/Regexp.html) library; see the
95
+ > [official syntax page](https://github.com/google/re2/wiki/Syntax) for more
96
+ > details.
97
+
98
+ The core class is [`RE2::Regexp`](https://mudge.name/re2/RE2/Regexp.html) which
99
+ takes a regular expression as a string and compiles it internally into an `RE2`
100
+ object. A global function `RE2` is available to concisely compile a new
101
+ `RE2::Regexp`:
102
+
103
+ ```ruby
104
+ re = RE2('(\w+):(\d+)')
105
+ #=> #<RE2::Regexp /(\w+):(\d+)/>
106
+ re.ok? #=> true
107
+
108
+ re = RE2('abc(def')
109
+ re.ok? #=> false
110
+ re.error #=> "missing ): abc(def"
111
+ ```
112
+
113
+ > [!TIP]
114
+ > Note the use of *single quotes* when passing the regular expression as
115
+ > a string to `RE2` so that the backslashes aren't interpreted as escapes.
116
+
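To make the tip above concrete, here is a small sketch in plain Ruby (no re2 required) of what happens to the backslashes in each quoting style. The variable names are illustrative only.

```ruby
# Single quotes keep the backslashes, so RE2 would receive \w and \d.
single = '(\w+):(\d+)'

# Double quotes treat \w and \d as (unknown) escapes and drop the backslash.
double = "(\w+):(\d+)"

# Double quotes only work if every backslash is itself escaped.
escaped = "(\\w+):(\\d+)"

single.length     # => 11
double            # => "(w+):(d+)"
escaped == single # => true
```

If a pattern has to be built with interpolation, doubling the backslashes as in `escaped` is one way to keep it intact.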
117
+ When compiling a regular expression, an optional second argument can be used to change RE2's default options, e.g. setting `log_errors` to `false` stops RE2 logging syntax and execution errors to stderr:
118
+
119
+ ```ruby
120
+ RE2('abc)def', log_errors: false)
121
+ ```
122
+
123
+ See the API documentation for [`RE2::Regexp#initialize`](https://mudge.name/re2/RE2/Regexp.html#initialize-instance_method) for all the available options.
124
+
125
+ ### Matching interface
126
+
127
+ There are two main methods for matching: [`RE2::Regexp#full_match?`](https://mudge.name/re2/RE2/Regexp.html#full_match%3F-instance_method) requires the regular expression to match the entire input text, while [`RE2::Regexp#partial_match?`](https://mudge.name/re2/RE2/Regexp.html#partial_match%3F-instance_method) looks for a match anywhere within the input text. Both return a boolean indicating whether the match was successful.
128
+
129
+ ```ruby
130
+ RE2('h.*o').full_match?("hello") #=> true
131
+ RE2('e').full_match?("hello") #=> false
132
+
133
+ RE2('h.*o').partial_match?("hello") #=> true
134
+ RE2('e').partial_match?("hello") #=> true
135
+ ```
136
+
137
+ ### Submatch extraction
138
+
139
+ > [!TIP]
140
+ > Only extract the number of submatches you need as performance is improved
141
+ > with fewer submatches (with the best performance when avoiding submatch
142
+ > extraction altogether).
143
+
144
+ Both matching methods have a second form that can extract submatches as [`RE2::MatchData`](https://mudge.name/re2/RE2/MatchData.html) objects: [`RE2::Regexp#full_match`](https://mudge.name/re2/RE2/Regexp.html#full_match-instance_method) and [`RE2::Regexp#partial_match`](https://mudge.name/re2/RE2/Regexp.html#partial_match-instance_method).
145
+
146
+ ```ruby
147
+ m = RE2('(\w+):(\d+)').full_match("ruby:1234")
148
+ #=> #<RE2::MatchData "ruby:1234" 1:"ruby" 2:"1234">
149
+
150
+ m[0] #=> "ruby:1234"
151
+ m[1] #=> "ruby"
152
+ m[2] #=> "1234"
153
+
154
+ m = RE2('(\w+):(\d+)').full_match("r")
155
+ #=> nil
156
+ ```
157
+
158
+ `RE2::MatchData` supports retrieving submatches by numeric index or by name if present in the regular expression:
159
+
160
+ ```ruby
161
+ m = RE2('(?P<word>\w+):(?P<number>\d+)').full_match("ruby:1234")
162
+ #=> #<RE2::MatchData "ruby:1234" 1:"ruby" 2:"1234">
163
+
164
+ m["word"] #=> "ruby"
165
+ m["number"] #=> "1234"
166
+ ```
167
+
168
+ They can also be used with Ruby's [pattern matching](https://docs.ruby-lang.org/en/3.2/syntax/pattern_matching_rdoc.html):
169
+
170
+ ```ruby
171
+ case RE2('(\w+):(\d+)').full_match("ruby:1234")
172
+ in [word, number]
173
+ puts "Word: #{word}, Number: #{number}"
174
+ else
175
+ puts "No match"
176
+ end
177
+ # Word: ruby, Number: 1234
178
+
179
+ case RE2('(?P<word>\w+):(?P<number>\d+)').full_match("ruby:1234")
180
+ in word:, number:
181
+ puts "Word: #{word}, Number: #{number}"
182
+ else
183
+ puts "No match"
184
+ end
185
+ # Word: ruby, Number: 1234
186
+ ```
187
+
188
+ By default, both `full_match` and `partial_match` will extract all submatches into the `RE2::MatchData` based on the number of capturing groups in the regular expression. This can be changed by passing an optional second argument when matching:
189
+
190
+ ```ruby
191
+ m = RE2('(\w+):(\d+)').full_match("ruby:1234", submatches: 1)
192
+ #=> #<RE2::MatchData "ruby:1234" 1:"ruby">
193
+ ```
194
+
195
+ > [!WARNING]
196
+ > If the regular expression has no capturing groups or you pass `submatches:
197
+ > 0`, the matching method will behave like its `full_match?` or
198
+ > `partial_match?` form and only return `true` or `false` rather than
199
+ > `RE2::MatchData`.
200
+
201
+ ### Scanning text incrementally
202
+
203
+ If you want to repeatedly match regular expressions from the start of some input text, you can use [`RE2::Regexp#scan`](https://mudge.name/re2/RE2/Regexp.html#scan-instance_method) to return an `Enumerable` [`RE2::Scanner`](https://mudge.name/re2/RE2/Scanner.html) object which will lazily consume matches as you iterate over it:
204
+
205
+ ```ruby
206
+ scanner = RE2('(\w+)').scan(" one two three 4")
207
+ scanner.each do |match|
208
+ puts match.inspect
209
+ end
210
+ # ["one"]
211
+ # ["two"]
212
+ # ["three"]
213
+ # ["4"]
214
+ ```
215
+
216
+ ### Searching simultaneously
217
+
218
+ [`RE2::Set`](https://mudge.name/re2/RE2/Set.html) represents a collection of
219
+ regular expressions that can be searched for simultaneously. Calling
220
+ [`RE2::Set#add`](https://mudge.name/re2/RE2/Set.html#add-instance_method) with
221
+ a regular expression will return the integer index at which it is stored within
222
+ the set. After all patterns have been added, the set can be compiled using
223
+ [`RE2::Set#compile`](https://mudge.name/re2/RE2/Set.html#compile-instance_method),
224
+ and then
225
+ [`RE2::Set#match`](https://mudge.name/re2/RE2/Set.html#match-instance_method)
226
+ will return an array containing the indices of all the patterns that matched.
227
+
228
+ ```ruby
229
+ set = RE2::Set.new
230
+ set.add("abc") #=> 0
231
+ set.add("def") #=> 1
232
+ set.add("ghi") #=> 2
233
+ set.compile #=> true
234
+ set.match("abcdefghi") #=> [0, 1, 2]
235
+ set.match("ghidefabc") #=> [2, 1, 0]
236
+ ```
237
+
238
+ ### Encoding
239
+
240
+ > [!WARNING]
241
+ > Note RE2 only supports UTF-8 and ISO-8859-1 encoding so strings will be
242
+ > returned in UTF-8 by default or ISO-8859-1 if the `:utf8` option for the
243
+ > `RE2::Regexp` is set to `false` (any other encoding's behaviour is undefined).
244
+
245
+ For backward compatibility, re2 won't automatically convert string inputs to
246
+ the right encoding so this is the responsibility of the caller, e.g.
247
+
248
+ ```ruby
249
+ # By default, RE2 will process patterns and text as UTF-8
250
+ RE2(non_utf8_pattern.encode("UTF-8")).match(non_utf8_text.encode("UTF-8"))
251
+
252
+ # If the :utf8 option is false, RE2 will process patterns and text as ISO-8859-1
253
+ RE2(non_latin1_pattern.encode("ISO-8859-1"), utf8: false).match(non_latin1_text.encode("ISO-8859-1"))
254
+ ```
255
+
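As a minimal sketch of the caller-side conversion described above, the helper below transcodes a string to UTF-8 only when it is not already UTF-8, using Ruby's built-in `String#encode`. The name `to_utf8` is hypothetical and not part of re2. The sample string is an assumption chosen to show the byte-level difference between the two encodings.

```ruby
# Hypothetical helper (not part of re2): return a UTF-8 version of `string`
# suitable for passing to RE2 with its default options.
def to_utf8(string)
  return string if string.encoding == Encoding::UTF_8

  string.encode(Encoding::UTF_8)
end

# "café" as four ISO-8859-1 bytes.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
utf8 = to_utf8(latin1)

latin1.bytesize # => 4
utf8.bytesize   # => 5 (the "é" takes two bytes in UTF-8)
utf8.encoding   # => #<Encoding:UTF-8>
```

Note that `String#encode` raises if the source bytes cannot be converted (for example, binary strings containing high bytes), so callers may want to decide up front how to handle such input.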
256
+ ## Requirements
257
+
258
+ This gem requires the following to run:
259
+
260
+ * [Ruby](https://www.ruby-lang.org/en/) 3.1 to 3.4.0-rc1
261
+
262
+ It supports the following RE2 ABI versions:
263
+
264
+ * libre2.0 (prior to release 2020-03-02) to libre2.11 (2023-07-01 to 2024-07-02)
265
+
266
+ ### Native gems
267
+
268
+ Where possible, a pre-compiled native gem will be provided for the following platforms:
269
+
270
+ * Linux
271
+ * `aarch64-linux`, `arm-linux`, `x86-linux` and `x86_64-linux` (requires [glibc](https://www.gnu.org/software/libc/) 2.29+, RubyGems 3.3.22+ and Bundler 2.3.21+)
272
+ * [musl](https://musl.libc.org/)-based systems such as [Alpine](https://alpinelinux.org) are supported with Bundler 2.5.6+
273
+ * macOS `x86_64-darwin` and `arm64-darwin`
274
+ * Windows `x64-mingw-ucrt`
275
+
276
+ ### Verifying the gems
277
+
278
+ SHA256 checksums are included in the [release notes](https://github.com/mudge/re2/releases) for each version and can be checked with `sha256sum`, e.g.
279
+
280
+ ```console
281
+ $ gem fetch re2 -v 2.14.0
282
+ Fetching re2-2.14.0-arm64-darwin.gem
283
+ Downloaded re2-2.14.0-arm64-darwin
284
+ $ sha256sum re2-2.14.0-arm64-darwin.gem
285
+ 3c922d54a44ac88499f6391bc2f9740559381deaf7f4e49eef5634cf32efc2ce re2-2.14.0-arm64-darwin.gem
286
+ ```
287
+
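Where `sha256sum` is not available, the same comparison can be sketched with Ruby's standard `digest` library. The helper name `sha256_matches?` is hypothetical, and the expected checksum must still come from the release notes.

```ruby
require "digest"

# Hypothetical helper: compare a file's SHA256 digest with a published
# checksum, ignoring letter case and surrounding whitespace.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Usage with the gem fetched above:
# sha256_matches?(
#   "re2-2.14.0-arm64-darwin.gem",
#   "3c922d54a44ac88499f6391bc2f9740559381deaf7f4e49eef5634cf32efc2ce"
# )
```

This only confirms that the file matches the published checksum; it does not replace the signature verification described next.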
288
+ [GPG](https://www.gnupg.org/) signatures are attached to each release (the assets ending in `.sig`) and can be verified if you import [our signing key `0x39AC3530070E0F75`](https://mudge.name/39AC3530070E0F75.asc) (or fetch it from a public keyserver, e.g. `gpg --keyserver keyserver.ubuntu.com --recv-key 0x39AC3530070E0F75`):
289
+
290
+ ```console
291
+ $ gpg --verify re2-2.14.0-arm64-darwin.gem.sig re2-2.14.0-arm64-darwin.gem
292
+ gpg: Signature made Fri 2 Aug 12:39:12 2024 BST
293
+ gpg: using RSA key 702609D9C790F45B577D7BEC39AC3530070E0F75
294
+ gpg: Good signature from "Paul Mucur <mudge@mudge.name>" [unknown]
295
+ gpg: aka "Paul Mucur <paul@ghostcassette.com>" [unknown]
296
+ gpg: WARNING: This key is not certified with a trusted signature!
297
+ gpg: There is no indication that the signature belongs to the owner.
298
+ Primary key fingerprint: 7026 09D9 C790 F45B 577D 7BEC 39AC 3530 070E 0F75
299
+ ```
300
+
301
+ The fingerprint should be as shown above or you can independently verify it with the ones shown in the footer of https://mudge.name.
302
+
303
+ ### Installing the `ruby` platform gem
304
+
305
+ > [!WARNING]
306
+ > We strongly recommend using the native gems where possible to avoid the need
307
+ > for compiling the C++ extension and its dependencies which will take longer
308
+ > and be less reliable.
309
+
310
+ If you wish to compile the gem, you will need to explicitly install the `ruby` platform gem:
311
+
312
+ ```ruby
313
+ # In your Gemfile with Bundler 2.3.18+
314
+ gem "re2", force_ruby_platform: true
315
+
316
+ # With Bundler 2.1+
317
+ bundle config set force_ruby_platform true
318
+
319
+ # With older versions of Bundler
320
+ bundle config force_ruby_platform true
321
+
322
+ # Without Bundler
323
+ gem install re2 --platform=ruby
324
+ ```
325
+
326
+ You will need a full compiler toolchain for compiling Ruby C extensions (see
327
+ [Nokogiri's "The Compiler
328
+ Toolchain"](https://nokogiri.org/tutorials/installing_nokogiri.html#appendix-a-the-compiler-toolchain))
329
+ plus the toolchain required for compiling the vendored version of RE2 and its
330
+ dependency [Abseil][] which includes [CMake](https://cmake.org), a compiler
331
+ with C++14 support such as [clang](http://clang.llvm.org/) 3.4 or
332
+ [gcc](https://gcc.gnu.org/) 5 and a recent version of
333
+ [pkg-config](https://www.freedesktop.org/wiki/Software/pkg-config/). On
334
+ Windows, you'll also need pkgconf 2.1.0+ to avoid [`undefined reference`
335
+ errors](https://github.com/pkgconf/pkgconf/issues/322) when attempting to
336
+ compile Abseil.
337
+
338
+ ### Using system libraries
339
+
340
+ If you already have RE2 installed, you can instruct the gem not to use its own vendored version:
341
+
342
+ ```ruby
343
+ gem install re2 --platform=ruby -- --enable-system-libraries
344
+
345
+ # If RE2 is not installed in /usr/local, /usr, or /opt/homebrew:
346
+ gem install re2 --platform=ruby -- --enable-system-libraries --with-re2-dir=/path/to/re2/prefix
347
+ ```
348
+
349
+ Alternatively, you can set the `RE2_USE_SYSTEM_LIBRARIES` environment variable instead of passing `--enable-system-libraries` to the `gem` command.
350
+
351
+
352
+ ## Thanks
353
+
354
+ * Thanks to [Jason Woods](https://github.com/driskell) who contributed the
355
+ original implementations of `RE2::MatchData#begin` and `RE2::MatchData#end`.
356
+ * Thanks to [Stefano Rivera](https://github.com/stefanor) who first contributed
357
+ C++11 support.
358
+ * Thanks to [Stan Hu](https://github.com/stanhu) for reporting a bug with empty
359
+ patterns and `RE2::Regexp#scan`, contributing support for libre2.11
360
+ (2023-07-01) and for vendoring RE2 and abseil and compiling native gems in
361
+ 2.0.
362
+ * Thanks to [Sebastian Reitenbach](https://github.com/buzzdeee) for reporting
363
+ the deprecation and removal of the `utf8` encoding option in RE2.
364
+ * Thanks to [Sergio Medina](https://github.com/serch) for reporting a bug when
365
+ using `RE2::Scanner#scan` with an invalid regular expression.
366
+ * Thanks to [Pritam Baral](https://github.com/pritambaral) for contributing the
367
+ initial support for `RE2::Set`.
368
+ * Thanks to [Mike Dalessio](https://github.com/flavorjones) for reviewing the
369
+ precompilation of native gems in 2.0.
370
+ * Thanks to [Peter Zhu](https://github.com/peterzhu2118) for
371
+ [ruby_memcheck](https://github.com/Shopify/ruby_memcheck) and helping find
372
+ the memory leaks fixed in 2.1.3.
373
+ * Thanks to [Jean Boussier](https://github.com/byroot) for contributing the
374
+ switch to Ruby's `TypedData` API and the resulting garbage collection
375
+ improvements in 2.4.0.
376
+ * Thanks to [Manuel Jacob](https://github.com/manueljacob) for reporting a bug
377
+ when passing strings with null bytes.
378
+
379
+ ## Contact
380
+
381
+ All issues and suggestions should go to [GitHub Issues](https://github.com/mudge/re2/issues).
382
+
383
+ ## License
384
+
385
+ This library is licensed under the BSD 3-Clause License, see `LICENSE.txt`.
386
+
387
+ Copyright © 2010, Paul Mucur.
388
+
389
+ ### Dependencies
390
+
391
+ The source code of [RE2][] is distributed in the `ruby` platform gem. This code is licensed under the BSD 3-Clause License, see `LICENSE-DEPENDENCIES.txt`.
392
+
393
+ The source code of [Abseil][] is distributed in the `ruby` platform gem. This code is licensed under the Apache License 2.0, see `LICENSE-DEPENDENCIES.txt`.
394
+
395
+ [RE2]: https://github.com/google/re2
396
+ [Abseil]: https://abseil.io
data/Rakefile ADDED
@@ -0,0 +1,94 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'rake/extensiontask'
4
+ require 'rake_compiler_dock'
5
+ require 'rspec/core/rake_task'
6
+
7
+ require_relative 'ext/re2/recipes'
8
+
9
+ re2_gemspec = Gem::Specification.load('re2.gemspec')
10
+ abseil_recipe, re2_recipe = load_recipes
11
+
12
+ # Add Abseil and RE2's latest archives to the gem files. (Note these will be
13
+ # removed from the precompiled native gems.)
14
+ abseil_archive = File.join("ports/archives", File.basename(abseil_recipe.files[0][:url]))
15
+ re2_archive = File.join("ports/archives", File.basename(re2_recipe.files[0][:url]))
16
+
17
+ re2_gemspec.files << abseil_archive
18
+ re2_gemspec.files << re2_archive
19
+
20
+ cross_platforms = %w[
21
+ aarch64-linux-gnu
22
+ aarch64-linux-musl
23
+ arm-linux-gnu
24
+ arm-linux-musl
25
+ arm64-darwin
26
+ x64-mingw-ucrt
27
+ x64-mingw32
28
+ x86-linux-gnu
29
+ x86-linux-musl
30
+ x86-mingw32
31
+ x86_64-darwin
32
+ x86_64-linux-gnu
33
+ x86_64-linux-musl
34
+ ].freeze
35
+
36
+ ENV['RUBY_CC_VERSION'] = %w[3.4.0 3.3.5 3.2.0 3.1.0].join(':')
37
+
38
+ Gem::PackageTask.new(re2_gemspec).define
39
+
40
+ Rake::ExtensionTask.new('re2', re2_gemspec) do |e|
41
+ e.cross_compile = true
42
+ e.cross_config_options << '--enable-cross-build'
43
+ e.config_options << '--disable-system-libraries'
44
+ e.cross_platform = cross_platforms
45
+ e.cross_compiling do |spec|
46
+ spec.files.reject! { |path| File.fnmatch?('ports/*', path) }
47
+ spec.dependencies.reject! { |dep| dep.name == 'mini_portile2' }
48
+ end
49
+ end
50
+
51
+ RSpec::Core::RakeTask.new(:spec)
52
+
53
+ begin
54
+ require 'ruby_memcheck'
55
+ require 'ruby_memcheck/rspec/rake_task'
56
+
57
+ namespace :spec do
58
+ RubyMemcheck::RSpec::RakeTask.new(valgrind: :compile)
59
+ end
60
+ rescue LoadError
61
+ # Only define the spec:valgrind task if ruby_memcheck is installed
62
+ end
63
+
64
+ namespace :gem do
65
+ cross_platforms.each do |platform|
66
+
67
+ # Compile each platform's native gem, packaging up the result. Note we add
68
+ # /usr/local/bin to the PATH as it contains the newest version of CMake in
69
+ # the rake-compiler-dock images.
70
+ desc "Compile and build native gem for #{platform} platform"
71
+ task platform do
72
+ RakeCompilerDock.sh <<~SCRIPT, platform: platform, verbose: true
73
+ gem install bundler --no-document &&
74
+ bundle &&
75
+ bundle exec rake native:#{platform} pkg/#{re2_gemspec.full_name}-#{Gem::Platform.new(platform)}.gem PATH="/usr/local/bin:$PATH"
76
+ SCRIPT
77
+ end
78
+ end
79
+ end
80
+
81
+ # Set up file tasks for Abseil and RE2's archives so they are automatically
82
+ # downloaded when required by the gem task.
83
+ file abseil_archive do
84
+ abseil_recipe.download
85
+ end
86
+
87
+ file re2_archive do
88
+ re2_recipe.download
89
+ end
90
+
91
+ task default: :spec
92
+
93
+ CLEAN.add("lib/**/*.{o,so,bundle}", "pkg")
94
+ CLOBBER.add("ports")
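The `cross_compiling` block in the Rakefile above removes everything under `ports/` from the native gems with `File.fnmatch?('ports/*', path)`. As a small sketch of why that single pattern also covers the nested archives, the example below applies the same call to a hypothetical file list (the paths are made up for illustration and are not the gem's real contents).

```ruby
# Hypothetical file list standing in for the gemspec's `files`.
files = [
  "lib/re2.rb",
  "ext/re2/extconf.rb",
  "ports/archives/re2.tar.gz",
  "ports/archives/abseil.tar.gz"
]

# Without File::FNM_PATHNAME, "*" also matches "/", so "ports/*" matches
# every path beneath ports/, not just its direct children.
files.reject! { |path| File.fnmatch?("ports/*", path) }

files # => ["lib/re2.rb", "ext/re2/extconf.rb"]
```

Had `File::FNM_PATHNAME` been passed, `*` would stop at directory separators and the nested archives would stay in the list.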
data/dependencies.yml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ libre2:
3
+ version: '2024-07-02'
4
+ sha256: eb2df807c781601c14a260a507a5bb4509be1ee626024cb45acbd57cb9d4032b
5
+ abseil:
6
+ version: '20240722.0'
7
+ sha256: f50e5ac311a81382da7fa75b97310e4b9006474f9560ac46f54a9967f07d4ae3
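As a rough sketch of how a manifest like the one above can be read, the example below parses the same YAML with Ruby's standard `yaml` library. How the gem's own `ext/re2/recipes` loads the file is not shown in this diff, so this is an illustration under that assumption rather than the gem's actual code. The manifest is inlined so the sketch is self-contained.

```ruby
require "yaml"

# Parse the manifest into plain Hashes with String keys.
manifest = YAML.safe_load(<<~MANIFEST)
  ---
  libre2:
    version: '2024-07-02'
    sha256: eb2df807c781601c14a260a507a5bb4509be1ee626024cb45acbd57cb9d4032b
  abseil:
    version: '20240722.0'
    sha256: f50e5ac311a81382da7fa75b97310e4b9006474f9560ac46f54a9967f07d4ae3
MANIFEST

# The versions are quoted in the YAML, so they stay Strings rather than
# being converted to a Date or a Float.
manifest.dig("libre2", "version") # => "2024-07-02"
manifest.dig("abseil", "version") # => "20240722.0"

# Print a one-line summary per dependency.
manifest.each do |name, details|
  puts "#{name} #{details.fetch('version')} sha256=#{details.fetch('sha256')}"
end
```

Keeping the checksum next to each version means a downloaded archive can be checked against the manifest before it is compiled.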