strscan 3.1.0 → 3.1.1

@@ -0,0 +1,544 @@
1
+ \Class `StringScanner` supports processing a stored string as a stream;
2
+ this code creates a new `StringScanner` object with string `'foobarbaz'`:
3
+
4
+ ```
5
+ require 'strscan'
6
+ scanner = StringScanner.new('foobarbaz')
7
+ ```
8
+
9
+ ## About the Examples
10
+
11
+ All examples here assume that `StringScanner` has been required:
12
+
13
+ ```
14
+ require 'strscan'
15
+ ```
16
+
17
+ Some examples here assume that these constants are defined:
18
+
19
+ ```
20
+ MULTILINE_TEXT = <<~EOT
21
+ Go placidly amid the noise and haste,
22
+ and remember what peace there may be in silence.
23
+ EOT
24
+
25
+ HIRAGANA_TEXT = 'こんにちは'
26
+
27
+ ENGLISH_TEXT = 'Hello'
28
+ ```
29
+
30
+ Some examples here assume that certain helper methods are defined:
31
+
32
+ - `put_situation(scanner)`:
33
+ Displays the values of the scanner's
34
+ methods #pos, #charpos, #rest, and #rest_size.
35
+ - `put_match_values(scanner)`:
36
+ Displays the scanner's [match values][9].
37
+ - `match_values_cleared?(scanner)`:
38
+ Returns whether the scanner's [match values][9] are cleared.
39
+
40
+ See examples [here][ext/strscan/helper_methods_md.html].
41
+
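The real helper definitions live at the link above and are not reproduced here. As a rough idea of what `put_situation` might look like, here is a minimal sketch that only assumes the four methods named in the list (#pos, #charpos, #rest, #rest_size). The name `situation_lines` is hypothetical, introduced only to keep the formatting separate from the printing, and the exact output layout is a guess.

```ruby
require 'strscan'

# Hypothetical sketch, not the documented helper:
# builds the lines that describe the scanner's current situation.
def situation_lines(scanner)
  [
    '# Situation:',
    "#   pos: #{scanner.pos}",
    "#   charpos: #{scanner.charpos}",
    "#   rest: #{scanner.rest.inspect}",
    "#   rest_size: #{scanner.rest_size}"
  ]
end

# Prints the lines built above, one per line.
def put_situation(scanner)
  puts situation_lines(scanner)
end

put_situation(StringScanner.new('foobarbaz'))
```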
42
+ ## The `StringScanner` \Object
43
+
44
+ This code creates a `StringScanner` object
45
+ (we'll call it simply a _scanner_),
46
+ and shows some of its basic properties:
47
+
48
+ ```
49
+ scanner = StringScanner.new('foobarbaz')
50
+ scanner.string # => "foobarbaz"
51
+ put_situation(scanner)
52
+ # Situation:
53
+ # pos: 0
54
+ # charpos: 0
55
+ # rest: "foobarbaz"
56
+ # rest_size: 9
57
+ ```
58
+
59
+ The scanner has:
60
+
61
+ * A <i>stored string</i>, which is:
62
+
63
+ * Initially set by StringScanner.new(string) to the given `string`
64
+ (`'foobarbaz'` in the example above).
65
+ * Modifiable by methods #string=(new_string) and #concat(more_string).
66
+ * Returned by method #string.
67
+
68
+ More at [Stored String][1] below.
69
+
70
+ * A _position_;
71
+ a zero-based index into the bytes of the stored string (_not_ into its characters):
72
+
73
+ * Initially set by StringScanner.new to `0`.
74
+ * Returned by method #pos.
75
+ * Modifiable explicitly by methods #reset, #terminate, and #pos=(new_pos).
76
+ * Modifiable implicitly (various traversing methods, among others).
77
+
78
+ More at [Byte Position][2] below.
79
+
80
+ * A <i>target substring</i>,
81
+ which is a trailing substring of the stored string;
82
+ it extends from the current position to the end of the stored string:
83
+
84
+ * Initially set by StringScanner.new(string) to the given `string`
85
+ (`'foobarbaz'` in the example above).
86
+ * Returned by method #rest.
87
+ * Modified by any modification to either the stored string or the position.
88
+
89
+ <b>Most importantly</b>:
90
+ the searching and traversing methods operate on the target substring,
91
+ which may be (and often is) less than the entire stored string.
92
+
93
+ More at [Target Substring][3] below.
94
+
95
+ ## Stored \String
96
+
97
+ The <i>stored string</i> is the string stored in the `StringScanner` object.
98
+
99
+ Each of these methods sets, modifies, or returns the stored string:
100
+
101
+ | Method | Effect |
102
+ |----------------------|-------------------------------------------------|
103
+ | ::new(string) | Creates a new scanner for the given string. |
104
+ | #string=(new_string) | Replaces the existing stored string. |
105
+ | #concat(more_string) | Appends a string to the existing stored string. |
106
+ | #string | Returns the stored string. |
107
+
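The four methods in the table can be tried together in a short sketch. The variable names are illustrative only; `.dup` is used because #string returns the stored string itself, so a snapshot is needed to see the value before a later #concat.

```ruby
require 'strscan'

# .dup gives the scanner a mutable string, even if literals are frozen.
scanner = StringScanner.new('foo'.dup)
after_new = scanner.string.dup      # Snapshot: "foo"

scanner.concat('bar')               # Appends to the stored string.
after_concat = scanner.string.dup   # Snapshot: "foobar"

scanner.string = 'baz'              # Replaces the stored string.
after_assign = scanner.string       # "baz"
```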
108
+ ## Positions
109
+
110
+ A `StringScanner` object maintains a zero-based <i>byte position</i>
111
+ and a zero-based <i>character position</i>.
112
+
113
+ Each of these methods explicitly sets positions:
114
+
115
+ | Method | Effect |
116
+ |--------------------------|----------------------------------------------------------|
117
+ | #reset | Sets both positions to zero (beginning of stored string). |
118
+ | #terminate | Sets both positions to the end of the stored string. |
119
+ | #pos=(new_byte_position) | Sets byte position; adjusts character position. |
120
+
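A small sketch of the three methods in the table, using a string of 3-byte characters so that the byte position and character position differ. The variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは')  # Five 3-byte characters.

scanner.pos = 6                      # Byte position set explicitly.
pos_after_set = scanner.pos          # 6
charpos_after_set = scanner.charpos  # 2 (two characters precede byte 6)

scanner.terminate                    # Both positions move to the end.
pos_at_end = scanner.pos             # 15
charpos_at_end = scanner.charpos     # 5

scanner.reset                        # Both positions return to zero.
pos_after_reset = scanner.pos        # 0
charpos_after_reset = scanner.charpos # 0
```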
121
+ ### Byte Position (Position)
122
+
123
+ The byte position (or simply _position_)
124
+ is a zero-based index into the bytes in the scanner's stored string;
125
+ for a new `StringScanner` object, the byte position is zero.
126
+
127
+ When the byte position is:
128
+
129
+ * Zero (at the beginning), the target substring is the entire stored string.
130
+ * Equal to the size of the stored string (at the end),
131
+ the target substring is the empty string `''`.
132
+
133
+ To get or set the byte position:
134
+
135
+ * \#pos: returns the byte position.
136
+ * \#pos=(new_pos): sets the byte position.
137
+
138
+ Many methods use the byte position as the basis for finding matches;
139
+ many others set, increment, or decrement the byte position:
140
+
141
+ ```
142
+ scanner = StringScanner.new('foobar')
143
+ scanner.pos # => 0
144
+ scanner.scan(/foo/) # => "foo" # Match found.
145
+ scanner.pos # => 3 # Byte position incremented.
146
+ scanner.scan(/foo/) # => nil # Match not found.
147
+ scanner.pos # => 3 # Byte position not changed.
148
+ ```
149
+
150
+ Some methods implicitly modify the byte position;
151
+ see:
152
+
153
+ * [Setting the Target Substring][4].
154
+ * [Traversing the Target Substring][5].
155
+
156
+ The values of these methods are derived directly from the values of #pos and #string:
157
+
158
+ - \#charpos: the [character position][7].
159
+ - \#rest: the [target substring][3].
160
+ - \#rest_size: `rest.size`.
161
+
162
+ ### Character Position
163
+
164
+ The character position is a zero-based index into the _characters_
165
+ in the stored string;
166
+ for a new `StringScanner` object, the character position is zero.
167
+
168
+ \Method #charpos returns the character position;
169
+ its value may not be reset explicitly.
170
+
171
+ Some methods change (increment or reset) the character position;
172
+ see:
173
+
174
+ * [Setting the Target Substring][4].
175
+ * [Traversing the Target Substring][5].
176
+
177
+ Example (string includes multi-byte characters):
178
+
179
+ ```
180
+ scanner = StringScanner.new(ENGLISH_TEXT) # Five 1-byte characters.
181
+ scanner.concat(HIRAGANA_TEXT) # Five 3-byte characters
182
+ scanner.string # => "Helloこんにちは" # Twenty bytes in all.
183
+ put_situation(scanner)
184
+ # Situation:
185
+ # pos: 0
186
+ # charpos: 0
187
+ # rest: "Helloこんにちは"
188
+ # rest_size: 20
189
+ scanner.scan(/Hello/) # => "Hello" # Five 1-byte characters.
190
+ put_situation(scanner)
191
+ # Situation:
192
+ # pos: 5
193
+ # charpos: 5
194
+ # rest: "こんにちは"
195
+ # rest_size: 15
196
+ scanner.getch # => "こ" # One 3-byte character.
197
+ put_situation(scanner)
198
+ # Situation:
199
+ # pos: 8
200
+ # charpos: 6
201
+ # rest: "んにちは"
202
+ # rest_size: 12
203
+ ```
204
+
205
+ ## Target Substring
206
+
207
+ The target substring is the part of the [stored string][1]
208
+ that extends from the current [byte position][2] to the end of the stored string;
209
+ it is always either:
210
+
211
+ - The entire stored string (byte position is zero).
212
+ - A trailing substring of the stored string (byte position positive).
213
+
214
+ The target substring is returned by method #rest,
215
+ and its size is returned by method #rest_size.
216
+
217
+ Examples:
218
+
219
+ ```
220
+ scanner = StringScanner.new('foobarbaz')
221
+ put_situation(scanner)
222
+ # Situation:
223
+ # pos: 0
224
+ # charpos: 0
225
+ # rest: "foobarbaz"
226
+ # rest_size: 9
227
+ scanner.pos = 3
228
+ put_situation(scanner)
229
+ # Situation:
230
+ # pos: 3
231
+ # charpos: 3
232
+ # rest: "barbaz"
233
+ # rest_size: 6
234
+ scanner.pos = 9
235
+ put_situation(scanner)
236
+ # Situation:
237
+ # pos: 9
238
+ # charpos: 9
239
+ # rest: ""
240
+ # rest_size: 0
241
+ ```
242
+
243
+ ### Setting the Target Substring
244
+
245
+ The target substring is set whenever:
246
+
247
+ * The [stored string][1] is set (position reset to zero; target substring set to stored string).
248
+ * The [byte position][2] is set (target substring adjusted accordingly).
249
+
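Both cases in the list above can be observed through #rest. This is a brief sketch with illustrative variable names:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')

# Setting the byte position adjusts the target substring.
scanner.pos = 6
rest_after_pos = scanner.rest        # "baz"

# Setting the stored string resets the position to zero,
# so the target substring becomes the whole new string.
scanner.string = 'quux'
pos_after_string = scanner.pos       # 0
rest_after_string = scanner.rest     # "quux"
```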
250
+ ### Querying the Target Substring
251
+
252
+ This table summarizes (details and examples at the links):
253
+
254
+ | Method | Returns |
255
+ |------------|-----------------------------------|
256
+ | #rest | Target substring. |
257
+ | #rest_size | Size (bytes) of target substring. |
258
+
259
+ ### Searching the Target Substring
260
+
261
+ A _search_ method examines the target substring,
262
+ but does not advance the [positions][11]
263
+ or (by implication) shorten the target substring.
264
+
265
+ This table summarizes (details and examples at the links):
266
+
267
+ | Method | Returns | Sets Match Values? |
268
+ |-----------------------|-----------------------------------------------|--------------------|
269
+ | #check(pattern) | Matched leading substring or +nil+. | Yes. |
270
+ | #check_until(pattern) | Matched substring (anywhere) or +nil+. | Yes. |
271
+ | #exist?(pattern) | Matched substring (anywhere) end index or +nil+. | Yes. |
272
+ | #match?(pattern) | Size of matched leading substring or +nil+. | Yes. |
273
+ | #peek(size) | Leading substring of given length (bytes). | No. |
274
+ | #peek_byte | Integer leading byte or +nil+. | No. |
275
+ | #rest | Target substring (from byte position to end). | No. |
276
+
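The following sketch exercises several of the search methods in the table and confirms that the byte position stays where it was. Methods #peek_byte and #rest are omitted here, and the variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')

checked       = scanner.check(/foo/)        # "foo"    (match at beginning)
checked_until = scanner.check_until(/bar/)  # "foobar" (up to end of match)
end_index     = scanner.exist?(/bar/)       # 6        (end of match)
match_size    = scanner.match?(/foo/)       # 3        (size of match)
peeked        = scanner.peek(4)             # "foob"   (no pattern involved)

pos_after_searches = scanner.pos            # 0: nothing was traversed.
```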
277
+ ### Traversing the Target Substring
278
+
279
+ A _traversal_ method examines the target substring,
280
+ and, if successful:
281
+
282
+ - Advances the [positions][11].
283
+ - Shortens the target substring.
284
+
285
+
286
+ This table summarizes (details and examples at links):
287
+
288
+ | Method | Returns | Sets Match Values? |
289
+ |----------------------|------------------------------------------------------|--------------------|
290
+ | #get_byte | Leading byte or +nil+. | No. |
291
+ | #getch | Leading character or +nil+. | No. |
292
+ | #scan(pattern) | Matched leading substring or +nil+. | Yes. |
293
+ | #scan_byte | Integer leading byte or +nil+. | No. |
294
+ | #scan_until(pattern) | Matched substring (anywhere) or +nil+. | Yes. |
295
+ | #skip(pattern) | Matched leading substring size or +nil+. | Yes. |
296
+ | #skip_until(pattern) | Position delta to end-of-matched-substring or +nil+. | Yes. |
297
+ | #unscan | +self+. | No. |
298
+
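As an illustration of the table above, this sketch walks one scanner through `'foobarbaz'` with several traversal methods, recording the byte position after each step. Methods #get_byte and #scan_byte are left out, and the variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
positions = []

scanned = scanner.scan(/foo/)            # "foo"
positions << scanner.pos                 # 3

skipped = scanner.skip(/ba/)             # 2 (size of "ba")
positions << scanner.pos                 # 5

char = scanner.getch                     # "r"
positions << scanner.pos                 # 6

scanned_until = scanner.scan_until(/a/)  # "ba" (from position through match)
positions << scanner.pos                 # 8

scanner.unscan                           # Undoes the most recent match.
positions << scanner.pos                 # 6

delta = scanner.skip_until(/z/)          # 3 (position delta to end of match)
positions << scanner.pos                 # 9

finished = scanner.eos?                  # true
```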
299
+ ## Querying the Scanner
300
+
301
+ Each of these methods queries the scanner object
302
+ without modifying it (details and examples at links):
303
+
304
+ | Method | Returns |
305
+ |---------------------|----------------------------------|
306
+ | #beginning_of_line? | +true+ or +false+. |
307
+ | #charpos | Character position. |
308
+ | #eos? | +true+ or +false+. |
309
+ | #fixed_anchor? | +true+ or +false+. |
310
+ | #inspect | String representation of +self+. |
311
+ | #pos | Byte position. |
312
+ | #rest | Target substring. |
313
+ | #rest_size | Size of target substring. |
314
+ | #string | Stored string. |
315
+
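A few of the boolean queries in the table, shown in a short sketch on a two-line string; the variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new("ab\ncd")

bol_at_start = scanner.beginning_of_line?   # true (position 0)

scanner.scan(/ab/)
bol_mid_line = scanner.beginning_of_line?   # false (just before "\n")

scanner.scan(/\n/)
bol_next_line = scanner.beginning_of_line?  # true (just after "\n")

eos_before = scanner.eos?                   # false ("cd" remains)
scanner.terminate
eos_after = scanner.eos?                    # true

anchor = scanner.fixed_anchor?              # false (the default)
```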
316
+ ## Matching
317
+
318
+ `StringScanner` implements pattern matching via Ruby class [Regexp][6],
319
+ and its matching behaviors are the same as Ruby's
320
+ except for the [fixed-anchor property][10].
321
+
322
+ ### Matcher Methods
323
+
324
+ Each <i>matcher method</i> takes a single argument `pattern`,
325
+ and attempts to find a matching substring in the [target substring][3].
326
+
327
+ | Method | Pattern Type | Matches Target Substring | Success Return | May Update Positions? |
328
+ |--------------|-------------------|--------------------------|--------------------|-----------------------|
329
+ | #check | Regexp or String. | At beginning. | Matched substring. | No. |
330
+ | #check_until | Regexp or String. | Anywhere. | Substring. | No. |
331
+ | #match? | Regexp or String. | At beginning. | Match size. | No. |
332
+ | #exist? | Regexp or String. | Anywhere. | Substring size. | No. |
333
+ | #scan | Regexp or String. | At beginning. | Matched substring. | Yes. |
334
+ | #scan_until | Regexp or String. | Anywhere. | Substring. | Yes. |
335
+ | #skip | Regexp or String. | At beginning. | Match size. | Yes. |
336
+ | #skip_until | Regexp or String. | Anywhere. | Substring size. | Yes. |
337
+
338
+ <br>
339
+
340
+ Which matcher you choose will depend on:
341
+
342
+ - Where you want to find a match:
343
+
344
+ - Only at the beginning of the target substring:
345
+ #check, #match?, #scan, #skip.
346
+ - Anywhere in the target substring:
347
+ #check_until, #exist?, #scan_until, #skip_until.
348
+
349
+ - Whether you want to:
350
+
351
+ - Traverse, by advancing the positions:
352
+ #scan, #scan_until, #skip, #skip_until.
353
+ - Keep the positions unchanged:
354
+ #check, #check_until, #match?, #exist?.
355
+
356
+ - What you want for the return value:
357
+
358
+ - The matched substring: #check, #scan.
359
+ - The substring: #check_until, #scan_until.
360
+ - The match size: #match?, #skip.
361
+ - The substring size: #exist?, #skip_until.
362
+
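To make the "at beginning" versus "anywhere" distinction concrete, this sketch applies the same pattern to a fresh scanner with each of the eight matchers. The pattern `/bar/` does not match at the beginning of `'foobarbaz'`, so only the "anywhere" matchers succeed. The hash names `anchored` and `anywhere` are illustrative.

```ruby
require 'strscan'

text = 'foobarbaz'
pattern = /bar/

# Matchers that look only at the beginning of the target substring.
anchored = %i[check match? scan skip].map do |name|
  [name, StringScanner.new(text).public_send(name, pattern)]
end.to_h

# Matchers that look anywhere in the target substring.
anywhere = %i[check_until exist? scan_until skip_until].map do |name|
  [name, StringScanner.new(text).public_send(name, pattern)]
end.to_h

p anchored  # Every value is nil.
p anywhere  # Substrings are "foobar"; sizes are 6.
```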
363
+ ### Match Values
364
+
365
+ The <i>match values</i> in a `StringScanner` object
366
+ generally contain the results of the most recent attempted match.
367
+
368
+ Each match value may be thought of as:
369
+
370
+ * _Clear_: Initially, or after an unsuccessful match attempt:
371
+ usually, `false`, `nil`, or `{}`.
372
+ * _Set_: After a successful match attempt:
373
+ `true`, string, array, or hash.
374
+
375
+ Each of these methods clears match values:
376
+
377
+ - ::new(string).
378
+ - \#reset.
379
+ - \#terminate.
380
+
381
+ Each of these methods attempts a match based on a pattern,
382
+ and either sets match values (if successful) or clears them (if not):
383
+
384
+ - \#check(pattern)
385
+ - \#check_until(pattern)
386
+ - \#exist?(pattern)
387
+ - \#match?(pattern)
388
+ - \#scan(pattern)
389
+ - \#scan_until(pattern)
390
+ - \#skip(pattern)
391
+ - \#skip_until(pattern)
392
+
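The set-or-clear behavior can be seen with #matched? and #matched alone. This sketch uses #scan for the attempts and #reset for the explicit clearing; the variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')

scanner.scan(/foo/)                    # Successful attempt: values set.
set_flag = scanner.matched?            # true
set_value = scanner.matched            # "foo"

scanner.scan(/nope/)                   # Failed attempt: values cleared.
cleared_flag = scanner.matched?        # false
cleared_value = scanner.matched        # nil

scanner.scan(/bar/)                    # Successful again: values set.
set_again = scanner.matched            # "bar"

scanner.reset                          # Explicitly clears match values.
flag_after_reset = scanner.matched?    # false
```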
393
+ #### Basic Match Values
394
+
395
+ Basic match values are those not related to captures.
396
+
397
+ Each of these methods returns a basic match value:
398
+
399
+ | Method | Return After Match | Return After No Match |
400
+ |-----------------|----------------------------------------|-----------------------|
401
+ | #matched? | +true+. | +false+. |
402
+ | #matched_size | Size of matched substring. | +nil+. |
403
+ | #matched | Matched substring. | +nil+. |
404
+ | #pre_match | Substring preceding matched substring. | +nil+. |
405
+ | #post_match | Substring following matched substring. | +nil+. |
406
+
407
+ <br>
408
+
409
+ See examples below.
410
+
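For readers without the helper methods at hand, the basic match values can also be read directly. A minimal sketch, with illustrative variable names:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.scan_until(/bar/)           # Matches "bar" in the middle.

was_matched = scanner.matched?      # true
size = scanner.matched_size         # 3
matched = scanner.matched           # "bar"
before = scanner.pre_match          # "foo"
after = scanner.post_match          # "baz"
```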
411
+ #### Captured Match Values
412
+
413
+ Captured match values are those related to [captures][16].
414
+
415
+ Each of these methods returns a captured match value:
416
+
417
+ | Method | Return After Match | Return After No Match |
418
+ |-----------------|-----------------------------------------|-----------------------|
419
+ | #size | Count of captured substrings. | +nil+. |
420
+ | #[](n) | <tt>n</tt>th captured substring. | +nil+. |
421
+ | #captures | Array of all captured substrings. | +nil+. |
422
+ | #values_at(*n) | Array of specified captured substrings. | +nil+. |
423
+ | #named_captures | Hash of named captures. | <tt>{}</tt>. |
424
+
425
+ <br>
426
+
427
+ See examples below.
428
+
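The captured match values can likewise be read directly. This sketch uses a date-like string with three unnamed groups; #named_captures is left out, and the variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new('2024-05-17')
scanner.scan(/(\d+)-(\d+)-(\d+)/)

count = scanner.size                 # 4 (whole match plus three groups)
whole = scanner[0]                   # "2024-05-17"
first = scanner[1]                   # "2024"
groups = scanner.captures            # ["2024", "05", "17"]
picked = scanner.values_at(0, 3)     # ["2024-05-17", "17"]
```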
429
+ #### Match Values Examples
430
+
431
+ Successful basic match attempt (no captures):
432
+
433
+ ```
434
+ scanner = StringScanner.new('foobarbaz')
435
+ scanner.exist?(/bar/)
436
+ put_match_values(scanner)
437
+ # Basic match values:
438
+ # matched?: true
439
+ # matched_size: 3
440
+ # pre_match: "foo"
441
+ # matched : "bar"
442
+ # post_match: "baz"
443
+ # Captured match values:
444
+ # size: 1
445
+ # captures: []
446
+ # named_captures: {}
447
+ # values_at: ["bar", nil]
448
+ # []:
449
+ # [0]: "bar"
450
+ # [1]: nil
451
+ ```
452
+
453
+ Failed basic match attempt (no captures):
454
+
455
+ ```
456
+ scanner = StringScanner.new('foobarbaz')
457
+ scanner.exist?(/nope/)
458
+ match_values_cleared?(scanner) # => true
459
+ ```
460
+
461
+ Successful unnamed capture match attempt:
462
+
463
+ ```
464
+ scanner = StringScanner.new('foobarbazbatbam')
465
+ scanner.exist?(/(foo)bar(baz)bat(bam)/)
466
+ put_match_values(scanner)
467
+ # Basic match values:
468
+ # matched?: true
469
+ # matched_size: 15
470
+ # pre_match: ""
471
+ # matched : "foobarbazbatbam"
472
+ # post_match: ""
473
+ # Captured match values:
474
+ # size: 4
475
+ # captures: ["foo", "baz", "bam"]
476
+ # named_captures: {}
477
+ # values_at: ["foobarbazbatbam", "foo", "baz", "bam", nil]
478
+ # []:
479
+ # [0]: "foobarbazbatbam"
480
+ # [1]: "foo"
481
+ # [2]: "baz"
482
+ # [3]: "bam"
483
+ # [4]: nil
484
+ ```
485
+
486
+ Successful named capture match attempt;
487
+ same as unnamed above, except for #named_captures:
488
+
489
+ ```
490
+ scanner = StringScanner.new('foobarbazbatbam')
491
+ scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
492
+ scanner.named_captures # => {"x"=>"foo", "y"=>"baz", "z"=>"bam"}
493
+ ```
494
+
495
+ Failed unnamed capture match attempt:
496
+
497
+ ```
498
+ scanner = StringScanner.new('somestring')
499
+ scanner.exist?(/(foo)bar(baz)bat(bam)/)
500
+ match_values_cleared?(scanner) # => true
501
+ ```
502
+
503
+ Failed named capture match attempt;
504
+ same as unnamed above, except for #named_captures:
505
+
506
+ ```
507
+ scanner = StringScanner.new('somestring')
508
+ scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
509
+ match_values_cleared?(scanner) # => false
510
+ scanner.named_captures # => {"x"=>nil, "y"=>nil, "z"=>nil}
511
+ ```
512
+
513
+ ## Fixed-Anchor Property
514
+
515
+ Pattern matching in `StringScanner` is the same as in Ruby,
516
+ except for its fixed-anchor property,
517
+ which determines the meaning of `'\A'`:
518
+
519
+ * `false` (the default): matches the current byte position.
520
+
521
+ ```
522
+ scanner = StringScanner.new('foobar')
523
+ scanner.scan(/\A./) # => "f"
524
+ scanner.scan(/\A./) # => "o"
525
+ scanner.scan(/\A./) # => "o"
526
+ scanner.scan(/\A./) # => "b"
527
+ ```
528
+
529
+ * `true`: matches the beginning of the stored string;
530
+ never matches unless the byte position is zero:
531
+
532
+ ```
533
+ scanner = StringScanner.new('foobar', fixed_anchor: true)
534
+ scanner.scan(/\A./) # => "f"
535
+ scanner.scan(/\A./) # => nil
536
+ scanner.reset
537
+ scanner.scan(/\A./) # => "f"
538
+ ```
539
+
540
+ The fixed-anchor property is set when the `StringScanner` object is created,
541
+ and may not be modified
542
+ (see StringScanner.new);
543
+ method #fixed_anchor? returns the setting.
544
+