rubyjedi-oga 1.0.3

Files changed (58)
  1. checksums.yaml +7 -0
  2. data/.yardopts +13 -0
  3. data/LICENSE +362 -0
  4. data/README.md +317 -0
  5. data/doc/css/common.css +77 -0
  6. data/doc/css_selectors.md +935 -0
  7. data/doc/manually_creating_documents.md +67 -0
  8. data/doc/migrating_from_nokogiri.md +169 -0
  9. data/doc/xml_namespaces.md +63 -0
  10. data/ext/c/extconf.rb +11 -0
  11. data/ext/c/lexer.c +2595 -0
  12. data/ext/c/lexer.h +16 -0
  13. data/ext/c/lexer.rl +198 -0
  14. data/ext/c/liboga.c +6 -0
  15. data/ext/c/liboga.h +11 -0
  16. data/ext/java/Liboga.java +14 -0
  17. data/ext/java/org/liboga/xml/Lexer.java +1363 -0
  18. data/ext/java/org/liboga/xml/Lexer.rl +223 -0
  19. data/ext/ragel/base_lexer.rl +633 -0
  20. data/lib/oga.rb +57 -0
  21. data/lib/oga/blacklist.rb +40 -0
  22. data/lib/oga/css/lexer.rb +743 -0
  23. data/lib/oga/css/parser.rb +976 -0
  24. data/lib/oga/entity_decoder.rb +21 -0
  25. data/lib/oga/html/entities.rb +2150 -0
  26. data/lib/oga/html/parser.rb +25 -0
  27. data/lib/oga/html/sax_parser.rb +18 -0
  28. data/lib/oga/lru.rb +160 -0
  29. data/lib/oga/oga.rb +57 -0
  30. data/lib/oga/version.rb +3 -0
  31. data/lib/oga/whitelist.rb +20 -0
  32. data/lib/oga/xml/attribute.rb +136 -0
  33. data/lib/oga/xml/cdata.rb +17 -0
  34. data/lib/oga/xml/character_node.rb +37 -0
  35. data/lib/oga/xml/comment.rb +17 -0
  36. data/lib/oga/xml/default_namespace.rb +13 -0
  37. data/lib/oga/xml/doctype.rb +82 -0
  38. data/lib/oga/xml/document.rb +108 -0
  39. data/lib/oga/xml/element.rb +428 -0
  40. data/lib/oga/xml/entities.rb +122 -0
  41. data/lib/oga/xml/html_void_elements.rb +15 -0
  42. data/lib/oga/xml/lexer.rb +550 -0
  43. data/lib/oga/xml/namespace.rb +48 -0
  44. data/lib/oga/xml/node.rb +219 -0
  45. data/lib/oga/xml/node_set.rb +333 -0
  46. data/lib/oga/xml/parser.rb +631 -0
  47. data/lib/oga/xml/processing_instruction.rb +37 -0
  48. data/lib/oga/xml/pull_parser.rb +175 -0
  49. data/lib/oga/xml/querying.rb +56 -0
  50. data/lib/oga/xml/sax_parser.rb +192 -0
  51. data/lib/oga/xml/text.rb +66 -0
  52. data/lib/oga/xml/traversal.rb +50 -0
  53. data/lib/oga/xml/xml_declaration.rb +65 -0
  54. data/lib/oga/xpath/evaluator.rb +1798 -0
  55. data/lib/oga/xpath/lexer.rb +1958 -0
  56. data/lib/oga/xpath/parser.rb +622 -0
  57. data/oga.gemspec +45 -0
  58. metadata +227 -0
@@ -0,0 +1,77 @@
1
+ body
2
+ {
3
+ font-size: 14px;
4
+ line-height: 1.6;
5
+ margin: 0 auto;
6
+ max-width: 960px;
7
+ }
8
+
9
+ p code, dd code, li code
10
+ {
11
+ background: #f9f2f4;
12
+ color: #c7254e;
13
+ border-radius: 4px;
14
+ padding: 2px 4px;
15
+ }
16
+
17
+ pre.code
18
+ {
19
+ font-size: 13px;
20
+ line-height: 1.4;
21
+ overflow: auto;
22
+ }
23
+
24
+ blockquote
25
+ {
26
+ border-left: 5px solid #eee;
27
+ margin: 0px;
28
+ padding-left: 15px;
29
+ }
30
+
31
+ /**
32
+ * YARD uses generic table styles, using a special class means those tables
33
+ * don't get messed up.
34
+ */
35
+ .table
36
+ {
37
+ border: 1px solid #ccc;
38
+ border-right: none;
39
+ border-collapse: separate;
40
+ border-spacing: 0;
41
+ text-align: left;
42
+ }
43
+
44
+ .table.full
45
+ {
46
+ width: 100%;
47
+ }
48
+
49
+ .table .field_name
50
+ {
51
+ min-width: 160px;
52
+ }
53
+
54
+ .table thead tr th.no_sort:first-child
55
+ {
56
+ width: 25px;
57
+ }
58
+
59
+ .table thead tr th, .table tbody tr td
60
+ {
61
+ border-bottom: 1px solid #ccc;
62
+ border-right: 1px solid #ccc;
63
+ min-width: 20px;
64
+ padding: 8px 5px;
65
+ text-align: left;
66
+ vertical-align: top;
67
+ }
68
+
69
+ .table tbody tr:last-child td
70
+ {
71
+ border-bottom: none;
72
+ }
73
+
74
+ .table tr:nth-child(odd) td
75
+ {
76
+ background: #f9f9f9;
77
+ }
@@ -0,0 +1,935 @@
1
+ # CSS Selectors Specification
2
+
3
+ This document acts as an alternative specification to the official W3
4
+ [CSS3 Selectors Specification][w3spec]. This document specifies only the
5
+ selectors supported by Oga itself. Only CSS3 selectors are covered, CSS4 is not
6
+ part of this specification.
7
+
8
+ This document is best viewed in the YARD generated documentation or any other
9
+ Markdown viewer that supports the [Kramdown][kramdown] syntax. Alternatively it
10
+ can be viewed in its raw form.
11
+
12
+ ## Abstract
13
+
14
+ The official W3 specification on CSS selectors is anything but pleasant to read.
15
+ A lack of good examples and unspecified behaviour are just two of many problems.
16
+ This document was written as a reference guide for myself as well as a way for
17
+ others to more easily understand how CSS selectors work.
18
+
19
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
20
+ "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
21
+ interpreted as described in [RFC 2119][rfc-2119].
22
+
23
+ ## Syntax
24
+
25
+ To describe syntax elements of CSS selectors this document uses the same grammar
26
+ as [Ragel][ragel]. For example, an integer would be defined as following:
27
+
28
+ integer = [0-9]+;
29
+
30
+ In turn an integer that can optionally be prefixed by `+` or `-` would be
31
+ defined as following:
32
+
33
+ integer = ('+' | '-')* [0-9]+;
34
+
35
+ A quick and basic crash course of the Ragel grammar:
36
+
37
+ * `*`: zero or more instances of the preceding token(s)
38
+ * `+`: one or more instances of the preceding token(s)
39
+ * `(` and `)`: used for grouping expressions together
40
+ * `^`: inverts a match, thus `^[0-9]` means "anything but a single digit"
41
+ * `"..."` or `'...'`: a literal character, `"x"` would match the literal "x"
42
+ * `|`: the OR operator, `x | y` translates to "x OR y"
43
+ * `[...]`: used to define a sequence, `[0-9]` translates to "0 OR 1 OR 2 OR
44
+ 3..." all the way up to 9
45
+
46
+ Semicolons are used to terminate lines. While not strictly required in this
47
+ specification they are included in order to produce a Ragel syntax compatible
48
+ grammar.
49
+
50
+ See the Ragel documentation for more information on the grammar.
51
+
52
+ ## Terminology
53
+
54
+ local name
55
+ : The name of an element without a namespace. For the element `<strong>` the
56
+ local name is `strong`.
57
+
58
+ namespace prefix
59
+ : The namespace prefix of an element. For the element `<foo:strong>` the
60
+ namespace prefix is `foo`.
61
+
62
+ expression
63
+ : A single or multiple selectors used together to retrieve a set of elements
64
+ from a document.
65
+
66
+ ## Selector Scoping
67
+
68
+ Whenever a selector is used to match an element the selector applies to all
69
+ nodes in the context. For example, the selector `foo` would match all `foo`
70
+ elements at any position in the document. On the other hand, the selector
71
+ `foo bar` only matches any `bar` elements that are a descendant of any `foo`
72
+ element.
73
+
74
+ In XPath the corresponding axis for this is `descendant`. In other words, this
75
+ CSS expression:
76
+
77
+ foo
78
+
79
+ is the same as this XPath expression:
80
+
81
+ descendant::foo
82
+
83
+ In turn this CSS expression:
84
+
85
+ foo bar
86
+
87
+ is the same as this XPath expression:
88
+
89
+ descendant::foo/descendant::bar
90
+
91
+ Note that in the various XPath examples the `descendant` axis is omitted in
92
+ order to enhance readability.
93
+
94
+ ### Syntax
95
+
96
+ A CSS expression is made up of multiple selectors separated by one or more
97
+ spaces. There MUST be at least 1 space between two selectors, there MAY be more
98
+ than one. Multiple spaces do not alter the behaviour of the expression in any
99
+ way.
100
+
101
+ ## Universal Selector
102
+
103
+ W3 chapter: <http://www.w3.org/TR/css3-selectors/#universal-selector>
104
+
105
+ The universal selector `*` (also known as the "wildcard selector") can be used
106
+ to match any element, regardless of its local name or namespace prefix.
107
+
108
+ Example XML:
109
+
110
+ <root>
111
+ <foo></foo>
112
+ <bar></bar>
113
+ </root>
114
+
115
+ CSS:
116
+
117
+ root *
118
+
119
+ This would return a set containing two elements: `<foo>` and `<bar>`.
120
+
121
+ The corresponding XPath is also `*`.
122
+
123
+ ### Syntax
124
+
125
+ The syntax for the universal selector is very simple:
126
+
127
+ universal = '*';
128
+
129
+ ## Element Selector
130
+
131
+ W3 chapter: <http://www.w3.org/TR/css3-selectors/#type-selectors>
132
+
133
+ The element selector (known as "Type selector" in the official W3 specification)
134
+ can be used to match a set of elements by their local name or namespace. The
135
+ selector `foo` is used to match all elements with the local name being set to
136
+ `foo`.
137
+
138
+ Example XML:
139
+
140
+ <root>
141
+ <foo />
142
+ <bar />
143
+ </root>
144
+
145
+ CSS:
146
+
147
+ root foo
148
+
149
+ This would return a set with only the `<foo>` element.
150
+
151
+ This selector can be used in combination with the
152
+ [Universal Selector][universal-selector]. This allows one to select elements
153
+ using both a given local name and namespace. The syntax for this is as
154
+ following:
155
+
156
+ ns-prefix|local-name
157
+
158
+ Here the pipe (`|`) character separates the namespace prefix and the local name.
159
+ Both can either be an identifier or a wildcard. For example, the selector
160
+ `rb|foo` matches all elements with local name `foo` and namespace prefix `rb`.
161
+
162
+ The namespace prefix MAY be left out producing the selector `|local-name`. In
163
+ this case the selector only matches elements _without_ a namespace prefix.
164
+
165
+ If a namespace prefix is given and it's _not_ a wildcard then elements without a
166
+ namespace prefix will _not_ be matched.
167
+
168
+ The corresponding XPath expression for such a selector is
169
+ `ns-prefix:local-name`. For example, `rb|foo` in CSS is the same as `rb:foo` in
170
+ XPath.
171
+
172
+ ### Syntax
173
+
174
+ The syntax for just the local name is as following:
175
+
176
+ identifier = '*' | [a-zA-Z]+ [a-zA-Z\-_0-9]*;
177
+
178
+ The wildcard is put in place to allow a single rule to be used for both names
179
+ and wildcards.
180
+
181
+ The syntax for selecting an element including a namespace prefix is as
182
+ following:
183
+
184
+ ns_plus_local_name = identifier* '|' identifier;
185
+
186
+ This would match `|foo`, `*|foo` and `foo|bar`. In order to match `foo` the
187
+ regular `identifier` rule declared above can be used.
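The two rules above can be tried out with a small Ruby sketch. The names `IDENTIFIER`, `ELEMENT_SELECTOR` and `parse_element_selector` are made up for this illustration and are not part of Oga's API. The sketch assumes the namespace prefix occurs at most once.

```ruby
# identifier = '*' | [a-zA-Z]+ [a-zA-Z\-_0-9]*;
IDENTIFIER = /\*|[a-zA-Z]+[a-zA-Z\-_0-9]*/

# An optional "prefix|" part followed by the local name.
ELEMENT_SELECTOR = /\A(?:(?<prefix>#{IDENTIFIER})?(?<pipe>\|))?(?<local>#{IDENTIFIER})\z/

# Returns [prefix, local_name], or nil when the input is not a valid element
# selector. The prefix is nil when no pipe was given ("foo") and an empty
# string when the pipe was given without a prefix ("|foo").
def parse_element_selector(selector)
  match = ELEMENT_SELECTOR.match(selector)
  return nil unless match

  prefix = match[:pipe] ? match[:prefix].to_s : nil
  [prefix, match[:local]]
end
```

Keeping `nil` and the empty string apart matters because `foo` matches regardless of the prefix while `|foo` only matches elements without one.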
188
+
189
+ ## Class Selector
190
+
191
+ Class selectors can be used to select a set of elements based on the values set
192
+ in the `class` attribute. Class selectors start with a period (`.`) followed by
193
+ an identifier. Multiple class selectors can be chained together, matching only
194
+ elements that have all the specified classes set.
195
+
196
+ As an example, `.foo` can be used to select all elements that have "foo" set in
197
+ the `class` attribute, either as the sole or one of many values. In turn,
198
+ `.foo.bar` matches elements that have both "foo" and "bar" set as the class.
199
+
200
+ Example XML:
201
+
202
+ <root>
203
+ <a class="first" />
204
+ <b class="second" />
205
+ </root>
206
+
207
+ Using the CSS selector `.first` would return a set containing only the `<a>`
208
+ element. Using `.first.second` would return an empty set, as no element has both
209
+ the `first` and `second` classes set.
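The matching rule for chained class selectors can be sketched in plain Ruby. The method name `matches_class_selector?` is hypothetical and only illustrates the rule; it is not how Oga evaluates selectors.

```ruby
# Returns true when every class named in the selector (e.g. ".foo.bar")
# appears in the whitespace separated value of the `class` attribute.
def matches_class_selector?(class_attribute, selector)
  wanted  = selector.split('.').reject(&:empty?)
  present = class_attribute.to_s.split

  wanted.all? { |name| present.include?(name) }
end

matches_class_selector?('first', '.first')               # => true
matches_class_selector?('first', '.first.second')        # => false
matches_class_selector?('first second', '.first.second') # => true
```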
210
+
211
+ ### Syntax
212
+
213
+ identifier = '*' | [a-zA-Z]+ [a-zA-Z\-_0-9]*;
214
+
215
+ # .foo, .foo.bar, .foo.bar.baz, etc
216
+ class = ('.' identifier)+;
217
+
218
+ ## ID Selector
219
+
220
+ The ID selector can be used to match elements where the value of the `id`
221
+ attribute matches whatever is specified in the selector. ID selectors start with
222
+ a hash sign (`#`) followed by an identifier.
223
+
224
+ While technically multiple ID selectors _can_ be chained together, HTML only
225
+ allows elements to have a single ID. As a result doing so is fairly useless.
226
+ Unlike classes IDs are globally unique, no two elements can have the same ID.
227
+
228
+ Example XML:
229
+
230
+ <root>
231
+ <a id="first" />
232
+ <b id="second" />
233
+ </root>
234
+
235
+ Using the CSS selector `#first` would return a set containing only the `<a>`
236
+ node.
237
+
238
+ ### Syntax
239
+
240
+ identifier = '*' | [a-zA-Z]+ [a-zA-Z\-_0-9]*;
241
+
242
+ # #foo, #foo#bar, etc
243
+ id = ('#' identifier)+;
244
+
245
+ ## Attribute Selector
246
+
247
+ W3 chapter: <http://www.w3.org/TR/css3-selectors/#attribute-selectors>
248
+
249
+ Attribute selectors can be used to further narrow down a set of elements based
250
+ on their attribute list. In XPath these selectors are known as "predicates". For
251
+ example, the selector `foo[bar]` matches all `foo` elements that have a `bar`
252
+ attribute, regardless of the value of said attribute.
253
+
254
+ Example XML:
255
+
256
+ <root>
257
+ <foo number="1" />
258
+ <bar />
259
+ </root>
260
+
261
+ CSS:
262
+
263
+ root foo[number]
264
+
265
+ This would return a set containing only the `<foo>` element since the `<bar>`
266
+ element has no attributes.
267
+
268
+ For the CSS expression `foo[number]` the corresponding XPath expression is the
269
+ following:
270
+
271
+ foo[@number]
272
+
273
+ When specifying an attribute you MAY include an operator and a value to match.
274
+ In this case you MUST include an attribute value surrounded by either single or
275
+ double quotes (but not a combination of the two).
276
+
277
+ There are 6 operators available:
278
+
279
+ * `=`: equals operator
280
+ * `~=`: whitespace-in operator
281
+ * `^=`: starts-with operator
282
+ * `$=`: ends-with operator
283
+ * `*=`: contains operator
284
+ * `|=`: hyphen-starts-with operator
285
+
286
+ ### Equals Operator
287
+
288
+ The equals operator matches an element if a given attribute value equals the
289
+ value specified. For example, `foo[number="1"]` matches all `foo` elements that
290
+ have a `number` attribute whose value is _exactly_ "1".
291
+
292
+ Example XML:
293
+
294
+ <root>
295
+ <foo number="1" />
296
+ <foo number="2" />
297
+ </root>
298
+
299
+ CSS:
300
+
301
+ root foo[number="1"]
302
+
303
+ This would return a set containing only the first `<foo>` element.
304
+
305
+ The corresponding XPath expression is quite similar. For `foo[number="1"]` this
306
+ would be:
307
+
308
+ foo[@number="1"]
309
+
310
+ ### Whitespace-in Operator
311
+
312
+ This operator matches an element if the given attribute value consists of
313
+ space separated values of which one is exactly the given value. For example,
314
+ `foo[numbers~="1"]` matches all `foo` elements that have the value `"1"` in the
315
+ `numbers` attribute.
316
+
317
+ Example XML:
318
+
319
+ <root>
320
+ <foo numbers="1 2 3" />
321
+ <foo numbers="4 bar 6" />
322
+ </root>
323
+
324
+ CSS:
325
+
326
+ root foo[numbers~="1"]
327
+
328
+ This would return a set containing only the first `foo` element. On the other
329
+ hand, if one were to use the expression `root foo[numbers~="bar"]` instead then
330
+ only the second `<foo>` element would be matched.
331
+
332
+ The corresponding XPath expression is quite complex, `foo[numbers~="1"]` is
333
+ translated into the following XPath expression:
334
+
335
+ foo[contains(concat(" ", @numbers, " "), concat(" ", "1", " "))]
336
+
337
+ The `concat` calls are used to ensure the expression doesn't match the substring
338
+ of an attribute value and that the expression matches elements of which the
339
+ attribute only has a single value. If `foo[contains(@numbers, ' 1 ')]` were to
340
+ be used then elements such as `<foo numbers="1" />` would not be matched.
341
+
342
+ Software implementing this selector is free to decide how it concatenates
343
+ spaces around the value to match. Both Oga and Nokogiri use an extra call to
344
+ `concat` but the following would be perfectly valid too:
345
+
346
+ foo[contains(concat(" ", @numbers, " "), " 1 ")]
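Why the padding with spaces is needed can be seen in a plain Ruby sketch of the same idea. Both method names are invented for this example; the second one mirrors the unpadded `contains(@numbers, ' 1 ')` form.

```ruby
# Pads both the attribute value and the wanted value with spaces, so the
# wanted value has to appear as a whole, space separated entry.
def whitespace_in?(attribute_value, wanted)
  " #{attribute_value} ".include?(" #{wanted} ")
end

# Without padding the attribute value, single valued attributes are missed.
def naive_whitespace_in?(attribute_value, wanted)
  attribute_value.include?(" #{wanted} ")
end

whitespace_in?('1 2 3', '1')   # => true
whitespace_in?('11 2 3', '1')  # => false, "1" is only a substring of "11"
whitespace_in?('1', '1')       # => true
naive_whitespace_in?('1', '1') # => false
```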
347
+
348
+ ### Starts-with Operator
349
+
350
+ This operator matches elements of which the attribute value starts _exactly_
351
+ with the given value. For example, `foo[numbers^="1"]` would match the element
352
+ `<foo numbers="1 2 3" />` but _not_ the element `<foo numbers="2 3 1" />`.
353
+
354
+ For `foo[numbers^="1"]` the corresponding XPath expression is as following:
355
+
356
+ foo[starts-with(@numbers, "1")]
357
+
358
+ ### Ends-with Operator
359
+
360
+ This operator matches elements of which the attribute value ends _exactly_ with
361
+ the given value. For example, `foo[numbers$="3"]` would match the element `<foo
362
+ numbers="1 2 3" />` but _not_ the element `<foo numbers="2 3 1" />`.
363
+
364
+ The corresponding XPath expression is quite complex due to a lack of a
365
+ `ends-with` function in XPath. Instead one has to resort to using the
366
+ `substring()` function. As such the corresponding XPath expression for
367
+ `foo[bar$="baz"]` is as following:
368
+
369
+ foo[substring(@bar, string-length(@bar) - string-length("baz") + 1, string-length("baz")) = "baz"]
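As a rough sketch, the expression above could be assembled in Ruby as shown here. The method `ends_with_xpath` is hypothetical and not part of Oga. It also assumes the value contains no double quotes, since no escaping is done.

```ruby
# Builds the XPath equivalent of `element[attribute$="value"]` using
# substring() and string-length(), as XPath 1.0 has no ends-with().
def ends_with_xpath(element, attribute, value)
  attr    = "@#{attribute}"
  literal = %("#{value}")

  # Position of the first character of the trailing substring (1-based).
  start = "string-length(#{attr}) - string-length(#{literal}) + 1"

  "#{element}[substring(#{attr}, #{start}, string-length(#{literal})) = #{literal}]"
end

puts ends_with_xpath('foo', 'bar', 'baz')
```

The `+ 1` is there because XPath string positions start at 1 rather than 0.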
370
+
371
+ ### Contains Operator
372
+
373
+ This operator matches elements of which the attribute value contains the given
374
+ value. For example, `foo[bar*="baz"]` would match both `<foo bar="bazzzz" />`
375
+ and `<foo bar="hello baz" />`.
376
+
377
+ For `foo[bar*="baz"]` the corresponding XPath expression is as following:
378
+
379
+ foo[contains(@bar, "baz")]
380
+
381
+ ### Hyphen-starts-with Operator
382
+
383
+ This operator matches elements of which the attribute value is a hyphen
384
+ separated list of values that starts _exactly_ with the given value. For
385
+ example, `foo[numbers|="1"]` matches `<foo numbers="1-2-3" />` but not
386
+ `<foo numbers="2-1-3" />`.
387
+
388
+ For `foo[numbers|="1"]` the corresponding XPath expression is as following:
389
+
390
+ foo[@numbers = "1" or starts-with(@numbers, concat("1", "-"))]
391
+
392
+ Note that this selector will also match elements such as
393
+ `<foo numbers="1- foo bar" />`.
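The same rule written as a plain Ruby predicate, purely as an illustration (`hyphen_starts_with?` is a name invented here):

```ruby
# True when the value is exactly the wanted value, or starts with the wanted
# value followed by a hyphen.
def hyphen_starts_with?(attribute_value, wanted)
  attribute_value == wanted || attribute_value.start_with?("#{wanted}-")
end

hyphen_starts_with?('1-2-3', '1')      # => true
hyphen_starts_with?('2-1-3', '1')      # => false
hyphen_starts_with?('1- foo bar', '1') # => true
```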
394
+
395
+ ### Syntax
396
+
397
+ The syntax of the various attribute selectors can be described as following:
398
+
399
+ # Strings are used for the attribute values
400
+
401
+ dquote = '"';
402
+ squote = "'";
403
+
404
+ string_dquote = dquote ^dquote* dquote;
405
+ string_squote = squote ^squote* squote;
406
+
407
+ string = string_dquote | string_squote;
408
+
409
+ # The `identifier` rule is the same as the one used for matching element
410
+ # names.
411
+ attr_test = identifier '[' space* identifier (space* '=' space* string)* space* ']';
412
+
413
+ Whitespace inside the brackets does not affect the behaviour of the selector.
414
+
415
+ ## Pseudo Classes
416
+
417
+ W3 chapter: <http://www.w3.org/TR/css3-selectors/#structural-pseudos>
418
+
419
+ Pseudo classes can be used to further narrow down elements besides just their
420
+ names and attribute values. In essence they are a combination of XPath function
421
+ calls and axes. Some pseudo classes can take an argument to alter their
422
+ behaviour.
423
+
424
+ Pseudo classes are often applied to element selectors. For example:
425
+
426
+ foo:bar
427
+
428
+ Here `:bar` would be a pseudo class applied to the `foo` element. Some pseudo
429
+ classes (e.g. the `:root` pseudo class) can also be used on their own, for
430
+ example:
431
+
432
+ :root
433
+
434
+ ### :root
435
+
436
+ The `:root` pseudo class selects an element only if it's the top-level element
437
+ in a document.
438
+
439
+ Example XML:
440
+
441
+ <root>
442
+ <foo />
443
+ </root>
444
+
445
+ Using the CSS expression `root foo:root` we'd get an empty set as the `<foo>`
446
+ element is not the root element. On the other hand, `root:root` would return a
447
+ set containing only the `<root>` element.
448
+
449
+ This selector can both be applied to an element selector as well as being used
450
+ on its own.
451
+
452
+ For the selector `foo:root` the corresponding XPath expression is as following:
453
+
454
+ foo[not(parent::*)]
455
+
456
+ For `:root` the XPath expression is:
457
+
458
+ *[not(parent::*)]
459
+
460
+ ### :nth-child(n)
461
+
462
+ The `:nth-child(n)` pseudo class can be used to select a set of elements based
463
+ on their position or an interval, skipping elements that occur in a set before
464
+ the given position or interval.
465
+
466
+ In the form `:nth-child(n)` the identifier `n` is an argument that can be used
467
+ to specify one of the following:
468
+
469
+ 1. A literal node set index
470
+ 2. A node interval used to match every N nodes
471
+ 3. A node interval plus an initial offset
472
+
473
+ The first element in a node set for `:nth-child()` is located at position 1,
474
+ _not_ position 0 (unlike most programming languages). As a result
475
+ `:nth-child(1)` matches the _first_ element, _not_ the second. This can be
476
+ visualized as following:
477
+
478
+ :nth-child(2)
479
+
480
+ 1 2 3 4 5 6
481
+ +---+ +---+ +---+ +---+ +---+ +---+
482
+ | | | X | | | | | | | | |
483
+ +---+ +---+ +---+ +---+ +---+ +---+
484
+
485
+ Besides using a literal index argument you can also use an interval, optionally
486
+ with an offset. This can be used to for example match every 2nd element, or
487
+ every 2nd element starting at element number 4.
488
+
489
+ The syntax of this argument is as following:
490
+
491
+ integer = ('+' | '-')* [0-9]+;
492
+ interval = ('n' | '-n' | integer 'n') integer;
493
+
494
+ Here `interval` would match any of the following:
495
+
496
+ n
497
+ -n
498
+ 2n
499
+ 2n+5
500
+ 2n-5
501
+ -2n+5
502
+ -2n-5
503
+
504
+ Due to `integer` also matching the `+` and `-` it will be part of the same
505
+ token. If this is not desired the following grammar can be used instead:
506
+
507
+ integer = [0-9]+;
508
+ modifier = '+' | '-';
509
+ interval = ('n' | '-n' | modifier* integer 'n') modifier integer;
510
+
511
+ To match every 2nd element you'd use the following:
512
+
513
+ :nth-child(2n)
514
+
515
+ 1 2 3 4 5 6
516
+ +---+ +---+ +---+ +---+ +---+ +---+
517
+ | | | X | | | | X | | | | X |
518
+ +---+ +---+ +---+ +---+ +---+ +---+
519
+
520
+ To match every 2nd element starting at element 1 you'd instead use this:
521
+
522
+ :nth-child(2n+1)
523
+
524
+ 1 2 3 4 5 6
525
+ +---+ +---+ +---+ +---+ +---+ +---+
526
+ | X | | | | X | | | | X | | |
527
+ +---+ +---+ +---+ +---+ +---+ +---+
528
+
529
+ As mentioned the `+1` in the above example is the initial offset. This is
530
+ however _only_ the case if the second number is positive. That means that for
531
+ `:nth-child(2n-2)` the offset is _not_ `-2`. When using a negative offset the
532
+ actual offset first has to be calculated. When using an argument in the form of
533
+ `An-B` we can calculate the actual offset as following:
534
+
535
+ offset = A - (B % A)
536
+
537
+ For example, for the selector `:nth-child(2n-2)` the formula would be:
538
+
539
+ offset = 2 - (2 % 2) # => 2
540
+
541
+ This would result in the selector `:nth-child(2n+2)`.
542
+
543
+ As another example, for the selector `:nth-child(2n-5)` the formula would be:
544
+
545
+ offset = 2 - (5 % 2) # => 1
546
+
547
+ Which would result in the selector `:nth-child(2n+1)`.
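The formula can be checked with a few lines of Ruby. This is only a sketch: `positive_offset` is a name invented here, not an Oga method, and `b` is assumed to be the positive number written after the minus sign in `An-B`.

```ruby
# Turns the negative offset of `An-B` into the equivalent positive offset,
# so that `An-B` can be rewritten as `An+offset`.
def positive_offset(a, b)
  a - (b % a)
end

positive_offset(2, 2) # => 2, so 2n-2 behaves like 2n+2
positive_offset(2, 5) # => 1, so 2n-5 behaves like 2n+1
```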
548
+
549
+ To ease the process of selecting even and uneven elements you can also use
550
+ `even` and `odd` as an argument. Using `:nth-child(even)` is the same as
551
+ `:nth-child(2n)` while using `:nth-child(odd)` in turn is the same as
552
+ `:nth-child(2n+1)`.
553
+
554
+ Using `:nth-child(n)` simply matches all elements in the set. Using
555
+ `:nth-child(-n)` doesn't match any elements, though Oga treats it the same as
556
+ `:nth-child(n)`.
557
+
558
+ Expressions such as `:nth-child(-n-5)` are invalid as both parts of the interval
559
+ (`-n` and `-5`) are negative. However, `:nth-child(-n+5)` is
560
+ perfectly valid and would match the first 5 elements in a set:
561
+
562
+ :nth-child(-n+5)
563
+
564
+ 1 2 3 4 5 6
565
+ +---+ +---+ +---+ +---+ +---+ +---+
566
+ | X | | X | | X | | X | | X | | |
567
+ +---+ +---+ +---+ +---+ +---+ +---+
568
+
569
+
570
+ Using `:nth-child(n+5)` would match all elements starting at element 5:
571
+
572
+ :nth-child(n+5)
573
+
574
+ 1 2 3 4 5 6 7 8 9 10
575
+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
576
+ | | | | | | | | | X | | X | | X | | X | | X | | X |
577
+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
578
+
579
+ To summarize:
580
+
581
+ :nth-child(n) => matches all elements
582
+ :nth-child(-n) => matches nothing, though Oga treats it the same as "n"
583
+ :nth-child(5) => matches element #5
584
+ :nth-child(2n) => matches every 2 elements
585
+ :nth-child(2n+2) => matches every 2 elements, starting at element 2
586
+ :nth-child(2n-2) => matches every 2 elements, starting at element 2
587
+ :nth-child(n+5) => matches all elements, starting at element 5
588
+ :nth-child(-n+5) => matches the first 5 elements
589
+ :nth-child(even) => matches every 2nd element, starting at element 2
590
+ :nth-child(odd) => matches every 2nd element, starting at element 1
591
+
592
+ The corresponding XPath expressions are quite complex and differ based on the
593
+ interval argument used. For the various forms the corresponding XPath
594
+ expressions are as following:
595
+
596
+ :nth-child(n) => *[((count(preceding-sibling::*) + 1) mod 1) = 0]
597
+ :nth-child(-n) => *[((count(preceding-sibling::*) + 1) mod 1) = 0]
598
+ :nth-child(5) => *[count(preceding-sibling::*) = 4]
599
+ :nth-child(2n) => *[((count(preceding-sibling::*) + 1) mod 2) = 0]
600
+ :nth-child(2n+2) => *[(count(preceding-sibling::*) + 1) >= 2 and (((count(preceding-sibling::*) + 1) - 2) mod 2) = 0]
601
+ :nth-child(2n-6) => *[(count(preceding-sibling::*) + 1) >= 2 and (((count(preceding-sibling::*) + 1) - 2) mod 2) = 0]
602
+ :nth-child(n+5) => *[(count(preceding-sibling::*) + 1) >= 5 and (((count(preceding-sibling::*) + 1) - 5) mod 1) = 0]
603
+ :nth-child(-n+6) => *[((count(preceding-sibling::*) + 1) <= 6) and (((count(preceding-sibling::*) + 1) - 6) mod 1) = 0]
604
+ :nth-child(even) => *[((count(preceding-sibling::*) + 1) mod 2) = 0]
605
+ :nth-child(odd) => *[(count(preceding-sibling::*) + 1) >= 1 and (((count(preceding-sibling::*) + 1) - 1) mod 2) = 0]
606
+
607
+ ### :nth-last-child(n)
608
+
609
+ The `:nth-last-child(n)` pseudo class can be used to select a set of elements
610
+ based on their position or an interval, skipping elements that occur in a set
611
+ after the given position or interval.
612
+
613
+ The arguments that can be used by this selector are the same as those mentioned
614
+ in [:nth-child(n)][nth-childn].
615
+
616
+ Because this selector matches in reverse (compared to
617
+ [:nth-child(n)][nth-childn]) using an index such as "1" will match the _last_
618
+ element in a set, not the first one:
619
+
620
+ :nth-last-child(1)
621
+
622
+ 1 2 3 4 5 6
623
+ +---+ +---+ +---+ +---+ +---+ +---+
624
+ | | | | | | | | | | | X | <- matching direction
625
+ +---+ +---+ +---+ +---+ +---+ +---+
626
+
627
+ When using an interval (with or without an offset) the nodes are also matched in
628
+ reverse order. However, matched nodes should be returned in the order they
629
+ appear in the document.
630
+
631
+ For example, the selector `:nth-last-child(2n)` would match as following:
632
+
633
+ :nth-last-child(2n)
634
+
635
+ 1 2 3 4 5 6
636
+ +---+ +---+ +---+ +---+ +---+ +---+
637
+ | X | | | | X | | | | X | | | <- matching direction
638
+ +---+ +---+ +---+ +---+ +---+ +---+
639
+
640
+ The resulting set however would contain the nodes in the order `[1, 3, 5]`
641
+ instead of `[5, 3, 1]`.
642
+
643
+ When using an interval with an initial offset the offset is also applied in
644
+ reverse order. For example, the selector `:nth-last-child(2n+1)` would match as
645
+ following:
646
+
647
+ :nth-last-child(2n+1)
648
+
649
+ 1 2 3 4 5 6
650
+ +---+ +---+ +---+ +---+ +---+ +---+
651
+ | | | X | | | | X | | | | X | <- matching direction
652
+ +---+ +---+ +---+ +---+ +---+ +---+
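The reverse matching shown in the diagrams can be sketched in Ruby by numbering the siblings from the end and then collecting the matches in document order. Both method names are invented for this example, and `n` is assumed to count up from 0.

```ruby
# True when `index == a * n + b` for some n >= 0.
def an_plus_b_match?(a, b, index)
  diff = index - b
  return diff.zero? if a.zero?

  (diff % a).zero? && diff / a >= 0
end

# Returns the 1-based document positions matched by `:nth-last-child(An+B)`
# in a set of `count` siblings. The result is in document order.
def nth_last_child_positions(a, b, count)
  (1..count).select do |position|
    from_end = count - position + 1 # the last sibling is number 1
    an_plus_b_match?(a, b, from_end)
  end
end

nth_last_child_positions(0, 1, 6) # => [6]        (1)
nth_last_child_positions(2, 0, 6) # => [1, 3, 5]  (2n)
nth_last_child_positions(2, 1, 6) # => [2, 4, 6]  (2n+1)
```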
653
+
654
+ The corresponding XPath expressions are similar to those used for
655
+ [:nth-child(n)][nth-childn]:
656
+
657
+ :nth-last-child(n) => *[count(following-sibling::*) = -1]
658
+ :nth-last-child(-n) => *[count(following-sibling::*) = -1]
659
+ :nth-last-child(5) => *[count(following-sibling::*) = 4]
660
+ :nth-last-child(2n) => *[((count(following-sibling::*) + 1) mod 2) = 0]
661
+ :nth-last-child(2n+2) => *[((count(following-sibling::*) + 1) >= 2) and ((((count(following-sibling::*) + 1) - 2) mod 2) = 0)]
662
+ :nth-last-child(2n-6) => *[((count(following-sibling::*) + 1) >= 2) and ((((count(following-sibling::*) + 1) - 2) mod 2) = 0)]
663
+ :nth-last-child(n+5) => *[((count(following-sibling::*) + 1) >= 5) and ((((count(following-sibling::*) + 1) - 5) mod 1) = 0)]
664
+ :nth-last-child(-n+6) => *[((count(following-sibling::*) + 1) <= 6) and ((((count(following-sibling::*) + 1) - 6) mod 1) = 0)]
665
+ :nth-last-child(even) => *[((count(following-sibling::*) + 1) mod 2) = 0]
666
+ :nth-last-child(odd) => *[((count(following-sibling::*) + 1) >= 1) and ((((count(following-sibling::*) + 1) - 1) mod 2) = 0)]
667
+
668
+ ### :nth-of-type(n)
669
+
670
+ The `:nth-of-type(n)` pseudo class can be used to select a set of elements that
671
+ has a set of preceding siblings with the same name. The arguments that can be
672
+ used by this selector are the same as those mentioned in
673
+ [:nth-child(n)][nth-childn].
674
+
675
+ The matching order of this selector is the same as [:nth-child(n)][nth-childn].
676
+
677
+ Example XML:
678
+
679
+ <root>
680
+ <foo />
681
+ <foo />
682
+ <foo />
683
+ <foo />
684
+ <bar />
685
+ </root>
686
+
687
+ Using the CSS expression `root foo:nth-of-type(even)` would return a set
688
+ containing the 2nd and 4th `<foo>` nodes.
689
+
690
+ The corresponding XPath expressions for the various forms of this pseudo class
691
+ are as following:
692
+
693
+ :nth-of-type(n) => *[position() = n]
694
+ :nth-of-type(-n) => *[position() = -n]
695
+ :nth-of-type(5) => *[position() = 5]
696
+ :nth-of-type(2n) => *[(position() mod 2) = 0]
697
+ :nth-of-type(2n+2) => *[(position() >= 2) and (((position() - 2) mod 2) = 0)]
698
+ :nth-of-type(2n-6) => *[(position() >= 2) and (((position() - 2) mod 2) = 0)]
699
+ :nth-of-type(n+5) => *[(position() >= 5) and (((position() - 5) mod 1) = 0)]
700
+ :nth-of-type(-n+6) => *[(position() <= 6) and (((position() - 6) mod 1) = 0)]
701
+ :nth-of-type(even) => *[(position() mod 2) = 0]
702
+ :nth-of-type(odd) => *[(position() >= 1) and (((position() - 1) mod 2) = 0)]
703
+
704
+ ### :nth-last-of-type(n)
705
+
706
+ The `:nth-last-of-type(n)` pseudo class behaves the same as
707
+ [:nth-of-type(n)][nth-of-typen] except that it matches nodes in reverse order
708
+ similar to [:nth-last-child(n)][nth-last-childn]. To clarify, this means
709
+ matching occurs as following:
710
+
711
+
712
+ :nth-last-of-type(1)
713
+
714
+ 1 2 3 4 5 6
715
+ +---+ +---+ +---+ +---+ +---+ +---+
716
+ | | | | | | | | | | | X | <- matching direction
717
+ +---+ +---+ +---+ +---+ +---+ +---+
718
+
719
+ Example XML:
720
+
721
+ <root>
722
+ <foo />
723
+ <foo />
724
+ <foo />
725
+ <foo />
726
+ <bar />
727
+ </root>
728
+
729
+ Using the CSS expression `root foo:nth-last-of-type(even)` would return a set
730
+ containing the 1st and 3rd `<foo>` nodes.
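The difference between counting from the start and counting from the end can be sketched in Ruby using only the sibling names of the example XML. `even_of_type` is a hypothetical helper that covers just the `even` argument; it is not Oga code.

```ruby
# Returns the 1-based document positions of the siblings named `wanted` whose
# index among siblings of the same name is even. With `reverse: true` the
# index is counted from the last sibling of that name instead of the first.
def even_of_type(sibling_names, wanted, reverse: false)
  positions = sibling_names.each_index
                           .select { |i| sibling_names[i] == wanted }
                           .map { |i| i + 1 }

  positions.select.with_index do |_position, i|
    type_index = reverse ? positions.length - i : i + 1
    type_index.even?
  end
end

siblings = %w[foo foo foo foo bar]

even_of_type(siblings, 'foo')                # => [2, 4]
even_of_type(siblings, 'foo', reverse: true) # => [1, 3]
```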
731
+
732
+ The corresponding XPath expressions for the various forms of this pseudo class
733
+ are as following:
734
+
735
+ :nth-last-of-type(n) => *[position() = last() - -1]
736
+ :nth-last-of-type(-n) => *[position() = last() - -1]
737
+ :nth-last-of-type(5) => *[position() = last() - 4]
738
+ :nth-last-of-type(2n) => *[((last() - position()+1) mod 2) = 0]
739
+ :nth-last-of-type(2n+2) => *[((last() - position()+1) >= 2) and ((((last() - position() + 1) - 2) mod 2) = 0)]
740
+ :nth-last-of-type(2n-6) => *[((last() - position()+1) >= 2) and ((((last() - position() + 1) - 2) mod 2) = 0)]
741
+ :nth-last-of-type(n+5) => *[((last() - position()+1) >= 5) and ((((last() - position() + 1) - 5) mod 1) = 0)]
742
+ :nth-last-of-type(-n+6) => *[((last() - position()+1) <= 6) and ((((last() - position() + 1) - 6) mod 1) = 0)]
743
+ :nth-last-of-type(even) => *[((last() - position()+1) mod 2) = 0]
744
+ :nth-last-of-type(odd) => *[((last() - position()+1) >= 1) and ((((last() - position() + 1) - 1) mod 2) = 0)]
745
+
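The reverse counting used by the predicates above can be modelled in plain Ruby. This is a minimal sketch, not Oga's implementation: the method name `nth_last_positions` and its `(count, a, b)` signature are assumptions made for illustration, where `a` and `b` are the coefficients of an `an+b` argument and `count` is the number of same-type siblings.

```ruby
# Returns the 1-based positions (counted from the start) that an
# `:nth-last-of-type(an+b)` argument selects among `count` siblings of one type.
# Hypothetical helper for illustration only.
def nth_last_positions(count, a, b)
  (1..count).select do |position|
    from_end = count - position + 1 # same idea as last() - position() + 1
    diff = from_end - b

    if a.zero?
      diff.zero?
    else
      # from_end = a * n + b must hold for some integer n >= 0
      (diff % a).zero? && (diff / a) >= 0
    end
  end
end

p nth_last_positions(4, 2, 0)  # even  => [1, 3]
p nth_last_positions(4, 2, 1)  # odd   => [2, 4]
p nth_last_positions(6, 0, 1)  # 1     => [6]
p nth_last_positions(8, -1, 6) # -n+6  => [3, 4, 5, 6, 7, 8]
```

With four `<foo>` siblings, the `even` case yields positions 1 and 3, which agrees with the example XML above.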
746
+ ### :first-child
747
+
748
+ The `:first-child` pseudo class can be used to match a node that is the first
749
+ child node of another node (= a node without any preceding nodes).
750
+
751
+ Example XML:
752
+
753
+ <root>
754
+ <foo />
755
+ <bar />
756
+ </root>
757
+
758
+ Using the CSS selector `root :first-child` would return a set containing only
759
+ the `<foo>` node.
760
+
761
+ The corresponding XPath expression for this pseudo class is as follows:
762
+
763
+ :first-child => *[count(preceding-sibling::*) = 0]
764
+
765
+ ### :last-child
766
+
767
+ The `:last-child` pseudo class can be used to match a node that is the last
768
+ child node of another node (= a node without any following nodes).
769
+
770
+ Example XML:
771
+
772
+ <root>
773
+ <foo />
774
+ <bar />
775
+ </root>
776
+
777
+ Using the CSS selector `root :last-child` would return a set containing only
778
+ the `<bar>` node.
779
+
780
+ The corresponding XPath expression for this pseudo class is as follows:
781
+
782
+ :last-child => *[count(following-sibling::*) = 0]
783
+
784
+ ### :first-of-type
785
+
786
+ The `:first-of-type` pseudo class matches an element that is the first sibling of
787
+ its type in the list of elements of its parent element. This selector is the
788
+ same as [:nth-of-type(1)][nth-of-typen].
789
+
790
+ Example XML:
791
+
792
+ <root>
793
+ <a id="1" />
794
+ <a id="2">
795
+ <a id="3" />
796
+ <a id="4" />
797
+ </a>
798
+ </root>
799
+
800
+ Using the CSS selector `root a:first-of-type` would return a node set containing
801
+ nodes `<a id="1">` and `<a id="3">` as both nodes are the first siblings of
802
+ their type.
803
+
804
+ The corresponding XPath for this pseudo class is as follows:
805
+
806
+ a:first-of-type => a[count(preceding-sibling::a) = 0]
807
+
808
+ An alternative way is to use the following XPath:
809
+
810
+ a:first-of-type => //a[position() = 1]
811
+
812
+ This, however, relies on the less efficient `descendant-or-self::node()` step that `//` expands to.
813
+ For querying larger documents it's recommended to use the first form instead.
814
+
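The first XPath form can be read as "keep an `<a>` only when no earlier sibling is also an `<a>`". The sketch below models that check on the example XML using a plain `Struct`. `Node`, `element`, `descendants`, and `first_of_type` are hypothetical names for this illustration and are not part of Oga's API.

```ruby
# Minimal stand-in for an element; not Oga's XML::Element class.
Node = Struct.new(:name, :id, :children)

def element(name, id = nil, children = [])
  Node.new(name, id, children)
end

# Every node below `parent`, in document order.
def descendants(parent)
  parent.children.flat_map { |child| [child] + descendants(child) }
end

# Mirrors a[count(preceding-sibling::a) = 0] for every sibling list under root.
def first_of_type(root, name)
  ([root] + descendants(root)).flat_map do |parent|
    parent.children.select.with_index do |child, index|
      child.name == name &&
        parent.children.first(index).none? { |sibling| sibling.name == name }
    end
  end
end

root = element('root', nil, [
  element('a', '1'),
  element('a', '2', [element('a', '3'), element('a', '4')])
])

p first_of_type(root, 'a').map(&:id) # => ["1", "3"]
```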
815
+ ### :last-of-type
816
+
817
+ The `:last-of-type` pseudo class can be used to match an element that is the last
818
+ sibling of its type in the list of elements of its parent. This selector is the
819
+ same as [:nth-last-of-type(1)][nth-last-of-typen].
820
+
821
+ Example XML:
822
+
823
+ <root>
824
+ <a id="1" />
825
+ <a id="2">
826
+ <a id="3" />
827
+ <a id="4" />
828
+ </a>
829
+ </root>
830
+
831
+ Using the CSS selector `root a:last-of-type` would return a set containing nodes
832
+ `<a id="2">` and `<a id="4">` as both nodes are the last siblings of their type.
833
+
834
+ The corresponding XPath for this pseudo class is as follows:
835
+
836
+ a:last-of-type => a[count(following-sibling::a) = 0]
837
+
838
+ Similar to [:first-of-type][first-of-typen] this XPath can alternatively be
839
+ written as follows:
840
+
841
+ a:last-of-type => //a[position() = last()]
842
+
843
+ ### :only-child
844
+
845
+ The `:only-child` pseudo class can be used to match an element that is the only
846
+ child element of its parent.
847
+
848
+ Example XML:
849
+
850
+ <root>
851
+ <a id="1" />
852
+ <a id="2">
853
+ <a id="3" />
854
+ </a>
855
+ </root>
856
+
857
+ Using the CSS selector `root a:only-child` would return a set containing only
858
+ the `<a id="3">` node.
859
+
860
+ The corresponding XPath for this pseudo class is as follows:
861
+
862
+ a:only-child => a[count(preceding-sibling::*) = 0 and count(following-sibling::*) = 0]
863
+
864
+ ### :only-of-type
865
+
866
+ The `:only-of-type` pseudo class can be used to match an element that is the only
867
+ child element of its type within its parent.
868
+
869
+ Example XML:
870
+
871
+ <root>
872
+ <a id="1" />
873
+ <a id="2">
874
+ <a id="3" />
875
+ <b id="4" />
876
+ </a>
877
+ </root>
878
+
879
+ Using the CSS selector `root a:only-of-type` would return a set containing
880
+ only the `<a id="3">` node due to it being the only `<a>` node in the list of
881
+ elements of its parent.
882
+
883
+ The corresponding XPath for this pseudo class is as follows:
884
+
885
+ a:only-of-type => a[count(preceding-sibling::a) = 0 and count(following-sibling::a) = 0]
886
+
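The difference between `:only-child` and `:only-of-type` is whether all siblings are counted or only siblings with the same name. The following sketch shows both checks side by side on the example XML above. It is a plain-Ruby model under assumed names (`Element`, `each_sibling_list`, `only_of_type`, `only_child`), not the way Oga evaluates these selectors.

```ruby
# Minimal stand-in for an element; hypothetical, not Oga's API.
Element = Struct.new(:name, :id, :children)

# Yields the list of child elements of `parent` and of every descendant.
def each_sibling_list(parent, &block)
  yield parent.children
  parent.children.each { |child| each_sibling_list(child, &block) }
end

# Elements named `name` that have no sibling with the same name.
def only_of_type(root, name)
  matches = []
  each_sibling_list(root) do |siblings|
    same = siblings.select { |sibling| sibling.name == name }
    matches.concat(same) if same.length == 1
  end
  matches
end

# Elements named `name` that have no siblings at all.
def only_child(root, name)
  matches = []
  each_sibling_list(root) do |siblings|
    matches << siblings.first if siblings.length == 1 && siblings.first.name == name
  end
  matches
end

root = Element.new('root', nil, [
  Element.new('a', '1', []),
  Element.new('a', '2', [Element.new('a', '3', []), Element.new('b', '4', [])])
])

p only_of_type(root, 'a').map(&:id) # => ["3"]
p only_child(root, 'a').map(&:id)   # => []
```

In this document `<a id="3">` is the only `<a>` among its siblings, but it is not an only child because `<b id="4">` sits next to it.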
887
+ ### :empty
888
+
889
+ The `:empty` pseudo class can be used to match elements that have no child nodes
890
+ at all.
891
+
892
+ Example XML:
893
+
894
+ <root>
895
+ <a />
896
+ <b>10</b>
897
+ </root>
898
+
899
+ Using the CSS selector `root :empty` would return a set containing only the
900
+ `<a>` node.
901
+
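Because `:empty` requires that an element has no child nodes at all, text content also disqualifies an element. A small model of that rule, assuming children are stored as an array holding either elements or text strings (`Tag` and `empty_elements` are hypothetical names, not Oga classes or methods):

```ruby
# Children may be Tag instances (elements) or Strings (text nodes).
Tag = Struct.new(:name, :children)

# Returns every descendant element of `parent` that has no child nodes.
def empty_elements(parent)
  parent.children.grep(Tag).flat_map do |child|
    (child.children.empty? ? [child] : []) + empty_elements(child)
  end
end

root = Tag.new('root', [Tag.new('a', []), Tag.new('b', ['10'])])

p empty_elements(root).map(&:name) # => ["a"]
```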
902
+ ### Syntax
903
+
904
+ The syntax of the various pseudo classes is as follows:
905
+
906
+ integer = ('+' | '-')* [0-9]+;
907
+
908
+ odd = 'odd';
909
+ even = 'even';
910
+ nth = 'n';
911
+
912
+ pseudo_arg_interval = '-'* integer* nth;
913
+ pseudo_arg_offset = ('+' | '-')* integer;
914
+
915
+ pseudo_arg = odd
916
+ | even
917
+ | '-'* nth
918
+ | integer
919
+ | pseudo_arg_interval
920
+ | pseudo_arg_interval pseudo_arg_offset;
921
+
922
+ # The `identifier` rule is the same as the one used for element names.
923
+ pseudo = ':' identifier ('(' space* pseudo_arg space* ')')*;
924
+
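To illustrate how a `pseudo_arg` maps onto the `an+b` form, the sketch below turns an argument string into an `[a, b]` pair using a regular expression. It is a simplification of the grammar above and not Oga's Ragel-based lexer: it accepts at most one sign per number, and the method name `parse_pseudo_arg` is an assumption made for this example.

```ruby
# Parses a pseudo class argument into [a, b] so that it reads as a*n + b.
# Simplified: repeated signs such as "--n" are rejected here.
def parse_pseudo_arg(arg)
  text = arg.strip

  return [2, 0] if text == 'even'
  return [2, 1] if text == 'odd'

  if (match = text.match(/\A([+-]?\d*)n([+-]\d+)?\z/))
    a = case match[1]
        when '', '+' then 1
        when '-' then -1
        else match[1].to_i
        end
    b = match[2] ? match[2].to_i : 0
    [a, b]
  elsif text.match?(/\A[+-]?\d+\z/)
    [0, text.to_i]
  else
    raise ArgumentError, "unsupported pseudo class argument: #{arg.inspect}"
  end
end

p parse_pseudo_arg('2n+2') # => [2, 2]
p parse_pseudo_arg('-n+6') # => [-1, 6]
p parse_pseudo_arg('2n-6') # => [2, -6]
p parse_pseudo_arg('n')    # => [1, 0]
p parse_pseudo_arg('5')    # => [0, 5]
p parse_pseudo_arg('odd')  # => [2, 1]
```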
925
+ [w3spec]: http://www.w3.org/TR/css3-selectors/
926
+ [rfc-2119]: https://www.ietf.org/rfc/rfc2119.txt
927
+ [kramdown]: http://kramdown.gettalong.org/
928
+ [universal-selector]: #universal-selector
929
+ [ragel]: http://www.colm.net/open-source/ragel/
930
+ [nth-childn]: #nth-childn
931
+ [nth-last-childn]: #nth-last-childn
932
+ [nth-last-of-typen]: #nth-last-of-typen
933
+ [nth-of-typen]: #nth-of-typen
935
+ [first-of-typen]: #first-of-type