oga 0.1.3 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +57 -0
- data/doc/changelog.md +128 -0
- data/doc/css/common.css +5 -4
- data/doc/css_selectors.md +935 -0
- data/doc/manually_creating_documents.md +67 -0
- data/doc/xml_namespaces.md +63 -0
- data/ext/c/lexer.c +745 -628
- data/ext/c/lexer.h +8 -0
- data/ext/c/lexer.rl +44 -7
- data/ext/java/org/liboga/xml/Lexer.java +351 -232
- data/ext/java/org/liboga/xml/Lexer.rl +29 -8
- data/ext/ragel/base_lexer.rl +68 -18
- data/lib/oga.rb +4 -1
- data/lib/oga/css/lexer.rb +743 -0
- data/lib/oga/css/parser.rb +828 -0
- data/lib/oga/version.rb +1 -1
- data/lib/oga/xml/attribute.rb +3 -1
- data/lib/oga/xml/element.rb +15 -1
- data/lib/oga/xml/entities.rb +60 -0
- data/lib/oga/xml/html_void_elements.rb +2 -0
- data/lib/oga/xml/lexer.rb +36 -28
- data/lib/oga/xml/node_set.rb +22 -0
- data/lib/oga/xml/parser.rb +149 -128
- data/lib/oga/xml/querying.rb +24 -0
- data/lib/oga/xml/sax_parser.rb +55 -1
- data/lib/oga/xml/text.rb +6 -1
- data/lib/oga/xpath/evaluator.rb +138 -101
- data/lib/oga/xpath/lexer.rb +1205 -1294
- data/lib/oga/xpath/parser.rb +228 -204
- metadata +9 -4
- data/lib/oga/xpath/node.rb +0 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9abda7194e4d0f181bf8a43c5d5154c965fd1d81
+  data.tar.gz: 8cd27a710c2c761ffd37d9393e53e3b78f53444e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 556c869e33dfe785eda199e42a5d2fe869e269c8522809b81d8febdefe88285c52a3c7b17e22524b031cf5698395c505fd82f9ef8d69db538f9ac7f19f761e47
+  data.tar.gz: 149034fbe883e5e0df5f805aa17b07cf12afc8cccefbc22d9f73e6dcc6ba2b6fbe1ef39706f4945fbab90e9ecdcb208b4d299f939eb428706f724f657ecbc822
data/README.md
CHANGED
@@ -70,6 +70,12 @@ Querying a document using XPath:
 
 document.xpath('string(people/person)') # => "Alice"
 
+Querying a document using CSS:
+
+document = Oga.parse_xml('<people><person>Alice</person></people>')
+
+document.css('people person') # => NodeSet(Element(name: "person" ...))
+
 Modifying a document and serializing it back to XML:
 
 document = Oga.parse_xml('<people><person>Alice</person></people>')
@@ -95,6 +101,7 @@ Querying a document using a namespace:
 * Low memory footprint
 * High performance, if something doesn't perform well enough it's a bug
 * Support for XPath 1.0
+* CSS3 selector support
 * XML namespace support (registering, querying, etc)
 
 ## Requirements
@@ -127,6 +134,53 @@ _not_ thread-safe and should not be done by multiple threads at once.
 It is advised that you do not share parsed documents between threads unless you
 _really_ have to.
 
+## Namespace Support
+
+Oga fully supports parsing/registering XML namespaces as well as querying them
+using XPath. For example, take the following XML:
+
+<root xmlns="http://example.com">
+<bar>bar</bar>
+</root>
+
+If one were to try to query the `bar` element (e.g. using XPath `root/bar`)
+they'd end up with an empty node set. This is due to `<root>` defining an
+alternative default namespace. Instead you can query this element using the
+following XPath:
+
+*[local-name() = "root"]/*[local-name() = "bar"]
+
+Alternatively, if you don't really care where the `<bar>` element is located you
+can use the following:
+
+descendant::*[local-name() = "bar"]
+
+And if you want to specify an explicit namespace URI, you can use this:
+
+descendant::*[local-name() = "bar" and namespace-uri() = "http://example.com"]
+
+Unlike Nokogiri, Oga does _not_ provide a way to create "dynamic" namespaces.
+That is, Nokogiri allows one to query the above document as follows:
+
+document = Nokogiri::XML('<root xmlns="http://example.com"><bar>bar</bar></root>')
+
+document.xpath('x:root/x:bar', :x => 'http://example.com')
+
+Oga does have a small trick you can use to cut down the size of your XPath
+queries. Because Oga assigns the name "xmlns" to default namespaces you can use
+this in your XPath queries:
+
+document = Oga.parse_xml('<root xmlns="http://example.com"><bar>bar</bar></root>')
+
+document.xpath('xmlns:root/xmlns:bar')
+
+When using this you can still restrict the query to the correct namespace URI:
+
+document.xpath('xmlns:root[namespace-uri() = "http://example.com"]/xmlns:bar')
+
+In the future I might add an API to ease this process, although at this time I
+have little interest in providing an API similar to Nokogiri.
+
 ## Documentation
 
 The documentation is best viewed [on the documentation website][doc-website].
@@ -134,6 +188,9 @@ The documentation is best viewed [on the documentation website][doc-website].
 * {file:CONTRIBUTING Contributing}
 * {file:changelog Changelog}
 * {file:migrating\_from\_nokogiri Migrating From Nokogiri}
+* {Oga::XML::Parser XML Parser}
+* {Oga::XML::SaxParser XML SAX Parser}
+* {file:xml\_namespaces XML Namespaces}
 
 ## Native Extension Setup
 
data/doc/changelog.md
CHANGED
@@ -3,6 +3,134 @@
 This document contains details of the various releases and their release dates.
 Dates are in the format `yyyy-mm-dd`.
 
+## 0.2.0 - 2014-11-17
+
+### CSS Selector Support
+
+Probably the biggest feature of this release: support for querying documents
+using CSS selectors. Oga supports a subset of the CSS3 selector specification;
+in particular, the following selectors are supported:
+
+* Element, class and ID selectors
+* Attribute selectors (e.g. `foo[x ~= "y"]`)
+
+The following pseudo classes are supported:
+
+* `:root`
+* `:nth-child(n)`
+* `:nth-last-child(n)`
+* `:nth-of-type(n)`
+* `:nth-last-of-type(n)`
+* `:first-child`
+* `:last-child`
+* `:first-of-type`
+* `:last-of-type`
+* `:only-child`
+* `:only-of-type`
+* `:empty`
+
+You can query documents with CSS selectors using the methods `css` and `at_css`
+on an instance of `Oga::XML::Document` or `Oga::XML::Element`. For example:
+
+document = Oga.parse_xml('<people><person>Alice</person></people>')
+
+document.css('people person') # => NodeSet(Element(name: "person" ...))
+
+The architecture behind this is quite similar to parsing XPath. There's a lexer
+(`Oga::CSS::Lexer`) and a parser (`Oga::CSS::Parser`). Unlike Nokogiri (and
+perhaps other libraries) the parser _does not_ output XPath expressions as a
+String or a CSS-specific AST. Instead it directly emits an XPath AST. This
+allows the resulting AST to be directly evaluated by `Oga::XPath::Evaluator`.
+
+See <https://github.com/YorickPeterse/oga/issues/11> for more information.
+
+### Multi-line Attribute Support
+
+Oga can now lex/parse elements that have attributes with newlines in them.
+Previously this would trigger memory allocation errors.
+
+See <https://github.com/YorickPeterse/oga/issues/58> for more information.
+
+### SAX after_element
+
+The `after_element` method in the SAX parsing API now always takes two
+arguments: the namespace name and element name. Previously this method would
+always receive a single nil value as its argument, which is rather pointless.
+
+See <https://github.com/YorickPeterse/oga/issues/54> for more information.
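The sketch below shows a minimal SAX handler that relies on the two-argument `after_element`. The class name `ClosedElementRecorder` is hypothetical, and passing `nil` for elements without a namespace is an assumption. The callbacks are invoked by hand instead of through `Oga::XML::SaxParser`, so the snippet runs without Oga installed.

```ruby
# Hypothetical SAX handler that records every element that gets closed.
class ClosedElementRecorder
  attr_reader :closed

  def initialize
    @closed = []
  end

  # Since 0.2.0 this callback receives the namespace name and the element
  # name. A nil namespace for unprefixed elements is assumed here.
  def after_element(namespace, name)
    @closed << (namespace ? "#{namespace}:#{name}" : name)
  end
end

# Simulate the callbacks for <x:root><child /></x:root>: the inner element
# is closed before the outer one.
recorder = ClosedElementRecorder.new
recorder.after_element(nil, 'child')
recorder.after_element('x', 'root')

p recorder.closed # => ["child", "x:root"]
```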
+
+### XPath Grouping
+
+XPath expressions can now be grouped together using parentheses. This allows one
+to specify a custom operator precedence.
+
+### Enumerator Parsing Input
+
+Enumerator instances can now be used as input for `Oga.parse_xml` and friends.
+This can be used to download and parse XML files on the fly. For example:
+
+enum = Enumerator.new do |yielder|
+HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
+yielder << chunk
+end
+end
+
+document = Oga.parse_xml(enum)
+
+See <https://github.com/YorickPeterse/oga/issues/48> for more information.
+
+### Removing Attributes
+
+Element attributes can now be removed using `Oga::XML::Element#unset`:
+
+element = Oga::XML::Element.new(:name => 'foo')
+
+element.set('class', 'foo')
+element.unset('class')
+
+### XPath Attributes
+
+XPath predicates are now evaluated for every context node, as opposed to being
+evaluated once for the entire context. This ensures that expressions such as
+`descendant-or-self::node()/foo[1]` are evaluated correctly.
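The difference between the two evaluation strategies can be modelled with plain Ruby arrays. This sketch covers the semantics only and uses a made-up flattened tree of `[parent, name]` pairs. It says nothing about how `Oga::XPath::Evaluator` is implemented.

```ruby
# Made-up flattened tree: each pair is [parent, child element name].
children = [
  %w[a foo], %w[a bar], %w[a foo],
  %w[b bar], %w[b foo]
]

# Pre-0.2.0 style: gather every <foo>, then apply [1] once to the whole set.
evaluated_once = children.select { |_, name| name == 'foo' }.first(1)

# 0.2.0 style: apply foo[1] separately for each context node (each parent).
per_context = children
  .group_by { |parent, _| parent }
  .map { |_, kids| kids.find { |_, name| name == 'foo' } }
  .compact

p evaluated_once # => [["a", "foo"]]
p per_context    # => [["a", "foo"], ["b", "foo"]]
```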
+
+### Available Namespaces
+
+When calling `Oga::XML::Element#available_namespaces` the Hash returned by
+`Oga::XML::Element#namespaces` would be modified in place. This was a bug that
+has been fixed in this release.
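The sketch below reproduces this class of bug using a hypothetical `SketchElement` class, not Oga's real `Oga::XML::Element`. `Hash#merge!` mutates the Hash returned by `#namespaces`, while `Hash#merge` returns a new one.

```ruby
# Hypothetical element class that only models namespace lookup; it is not
# Oga's Oga::XML::Element.
class SketchElement
  attr_reader :namespaces, :parent

  def initialize(namespaces, parent = nil)
    @namespaces = namespaces
    @parent     = parent
  end

  # Buggy variant: merge! writes the parent's namespaces into the very Hash
  # returned by #namespaces, so the element's own declarations change.
  def available_namespaces_buggy
    merged = namespaces
    merged.merge!(parent.available_namespaces_buggy) { |_, own, _| own } if parent
    merged
  end

  # Fixed variant: merge builds a new Hash; the element's own declarations
  # win over inherited ones and #namespaces is left untouched.
  def available_namespaces
    return namespaces.dup unless parent

    parent.available_namespaces.merge(namespaces)
  end
end

root = SketchElement.new({ 'a' => 'urn:a' })

fixed = SketchElement.new({ 'b' => 'urn:b' }, root)
fixed.available_namespaces.keys.sort # => ["a", "b"]
fixed.namespaces.keys                # => ["b"]

buggy = SketchElement.new({ 'b' => 'urn:b' }, root)
buggy.available_namespaces_buggy
buggy.namespaces.keys                # => ["b", "a"]
```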
+
+### NodeSets
+
+NodeSet instances can now be compared with each other using `==`. Previously
+this would always consider two instances to be different from each other due to
+the usage of the default `Object#==` method.
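The sketch below shows content-based equality for a node set, using a hypothetical `SketchNodeSet` class rather than Oga's `Oga::XML::NodeSet`. Without a custom `==`, Ruby falls back to `Object#==`, which compares object identity.

```ruby
# Hypothetical minimal node set; Oga's NodeSet offers far more than this.
class SketchNodeSet
  include Enumerable

  def initialize(nodes = [])
    @nodes = nodes
  end

  def each(&block)
    @nodes.each(&block)
  end

  # Compare by contents (and order) instead of Object#== identity.
  def ==(other)
    other.is_a?(SketchNodeSet) && to_a == other.to_a
  end
end

a = SketchNodeSet.new(%w[foo bar])
b = SketchNodeSet.new(%w[foo bar])

a == b      # => true
a.equal?(b) # => false, still two distinct objects
```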
+
+### XML Entities
+
+XML entities such as `&amp;` and `&lt;` are now encoded/decoded by the lexer as
+well as by string and text nodes.
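The sketch below shows an encode/decode round trip for the five entities predefined by XML 1.0. The helper names `encode_entities` and `decode_entities` are hypothetical, and this is not the content of Oga's `entities.rb`. It only illustrates why decoding should happen in a single pass.

```ruby
# The five entities predefined by XML 1.0, mapped to their characters.
DECODED = {
  '&amp;'  => '&',
  '&lt;'   => '<',
  '&gt;'   => '>',
  '&quot;' => '"',
  '&apos;' => "'"
}.freeze

ENCODED = DECODED.invert.freeze

# One gsub pass: "&amp;lt;" becomes "&lt;", not "<", because the text
# produced by a replacement is never scanned again.
def decode_entities(text)
  text.gsub(/&(?:amp|lt|gt|quot|apos);/, DECODED)
end

# Escapes only &, < and > (enough for text content); quotes are kept as-is.
def encode_entities(text)
  text.gsub(/[&<>]/, ENCODED)
end

decode_entities('a &lt; b &amp;&amp; c') # => "a < b && c"
decode_entities('&amp;lt;')              # => "&lt;"
encode_entities('a < b && c')            # => "a &lt; b &amp;&amp; c"
```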
+
+See <https://github.com/YorickPeterse/oga/issues/49> for more information.
+
+### General
+
+Source lines are no longer included in error messages generated by the XML
+parser. This simplifies the code and removes the need to re-read the input
+(in case of IO/Enumerable inputs).
+
+### XML Lexer Newlines
+
+Newlines in the XML lexer are now counted in native code (C/Java). On MRI and
+JRuby the improvement is quite small, but on Rubinius it's a massive
+improvement. See commit `8db77c0a09bf6c996dd2856a6dbe1ad076b1d30a` for more
+information.
+
+### HTML Void Element Performance
+
+Performance for detecting HTML void elements (e.g. `<br>` and `<link>`) has been
+improved by removing String allocations that were not needed.
+
 ## 0.1.3 - 2014-09-24
 
 This release fixes a problem with serializing attributes using the namespace
data/doc/css/common.css
CHANGED
@@ -6,11 +6,12 @@ body
 max-width: 960px;
 }
 
-p code
+p code, dd code, li code
 {
-background:
-
-
+background: #f9f2f4;
+color: #c7254e;
+border-radius: 4px;
+padding: 2px 4px;
 }
 
 pre.code
data/doc/css_selectors.md
@@ -0,0 +1,935 @@
+# CSS Selectors Specification
+
+This document acts as an alternative specification to the official W3C
+[CSS3 Selectors Specification][w3spec]. This document specifies only the
+selectors supported by Oga itself. Only CSS3 selectors are covered; CSS4 is not
+part of this specification.
+
+This document is best viewed in the YARD generated documentation or any other
+Markdown viewer that supports the [Kramdown][kramdown] syntax. Alternatively it
+can be viewed in its raw form.
+
+## Abstract
+
+The official W3C specification on CSS selectors is anything but pleasant to read.
+A lack of good examples and unspecified behaviour are just two of many problems.
+This document was written as a reference guide for myself as well as a way for
+others to more easily understand how CSS selectors work.
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in [RFC 2119][rfc-2119].
+
+## Syntax
+
+To describe syntax elements of CSS selectors this document uses the same grammar
+as [Ragel][ragel]. For example, an integer would be defined as follows:
+
+integer = [0-9]+;
+
+In turn an integer that can optionally be prefixed by `+` or `-` would be
+defined as follows:
+
+integer = ('+' | '-')* [0-9]+;
+
+A quick and basic crash course of the Ragel grammar:
+
+* `*`: zero or more instances of the preceding token(s)
+* `+`: one or more instances of the preceding token(s)
+* `(` and `)`: used for grouping expressions together
+* `^`: inverts a match, thus `^[0-9]` means "anything but a single digit"
+* `"..."` or `'...'`: a literal string, `"x"` would match the literal "x"
+* `|`: the OR operator, `x | y` translates to "x OR y"
+* `[...]`: used to define a character set, `[0-9]` translates to "0 OR 1 OR 2
+OR 3..." all the way up to 9
+
+Semicolons are used to terminate lines. While not strictly required in this
+specification they are included in order to produce a Ragel syntax compatible
+grammar.
+
+See the Ragel documentation for more information on the grammar.
+
+## Terminology
+
+local name
+: The name of an element without a namespace. For the element `<strong>` the
+local name is `strong`.
+
+namespace prefix
+: The namespace prefix of an element. For the element `<foo:strong>` the
+namespace prefix is `foo`.
+
+expression
+: One or more selectors used together to retrieve a set of elements from a
+document.
+
+## Selector Scoping
+
+Whenever a selector is used to match an element the selector applies to all
+nodes in the context. For example, the selector `foo` would match all `foo`
+elements at any position in the document. On the other hand, the selector
+`foo bar` only matches `bar` elements that are descendants of a `foo`
+element.
+
+In XPath the corresponding axis for this is `descendant`. In other words, this
+CSS expression:
+
+foo
+
+is the same as this XPath expression:
+
+descendant::foo
+
+In turn this CSS expression:
+
+foo bar
+
+is the same as this XPath expression:
+
+descendant::foo/descendant::bar
+
+Note that in the various XPath examples the `descendant` axis is omitted in
+order to enhance readability.
+
+### Syntax
+
+A CSS expression is made up of multiple selectors separated by one or more
+spaces. There MUST be at least one space between two selectors; there MAY be
+more than one. Multiple spaces do not alter the behaviour of the expression in
+any way.
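The mapping from the descendant combinator to the `descendant` axis can be sketched as a plain string translation. The helper `descendant_css_to_xpath` is hypothetical and handles only bare element names. According to the 0.2.0 changelog, Oga itself emits an XPath AST rather than an XPath string.

```ruby
# Hypothetical helper: translates element names joined by whitespace (the
# descendant combinator) into a chain of descendant axis steps.
def descendant_css_to_xpath(css)
  css.strip.split(/\s+/).map { |name| "descendant::#{name}" }.join('/')
end

descendant_css_to_xpath('foo')        # => "descendant::foo"
descendant_css_to_xpath('foo bar')    # => "descendant::foo/descendant::bar"
descendant_css_to_xpath('foo    bar') # => "descendant::foo/descendant::bar"
```

The third call shows the rule stated above: extra spaces between selectors change nothing.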
+
+## Universal Selector
+
+W3C chapter: <http://www.w3.org/TR/css3-selectors/#universal-selector>
+
+The universal selector `*` (also known as the "wildcard selector") can be used
+to match any element, regardless of its local name or namespace prefix.
+
+Example XML:
+
+<root>
+<foo></foo>
+<bar></bar>
+</root>
+
+CSS:
+
+root *
+
+This would return a set containing two elements: `<foo>` and `<bar>`.
+
+The corresponding XPath is also `*`.
+
+### Syntax
+
+The syntax for the universal selector is very simple:
+
+universal = '*';
+
+## Element Selector
+
+W3C chapter: <http://www.w3.org/TR/css3-selectors/#type-selectors>
+
+The element selector (known as "Type selector" in the official W3C specification)
+can be used to match a set of elements by their local name or namespace. The
+selector `foo` is used to match all elements whose local name is
+`foo`.
+
+Example XML:
+
+<root>
+<foo />
+<bar />
+</root>
+
+CSS:
+
+root foo
+
+This would return a set with only the `<foo>` element.
+
+This selector can be used in combination with the
+[Universal Selector][universal-selector]. This allows one to select elements
+using both a given local name and namespace. The syntax for this is as
+follows:
+
+ns-prefix|local-name
+
+Here the pipe (`|`) character separates the namespace prefix and the local name.
+Both can either be an identifier or a wildcard. For example, the selector
+`rb|foo` matches all elements with local name `foo` and namespace prefix `rb`.
+
+The namespace prefix MAY be left out, producing the selector `|local-name`. In
+this case the selector only matches elements _without_ a namespace prefix.
+
+If a namespace prefix is given and it's _not_ a wildcard then elements without a
+namespace prefix will _not_ be matched.
+
+The corresponding XPath expression for such a selector is
+`ns-prefix:local-name`. For example, `rb|foo` in CSS is the same as `rb:foo` in
+XPath.
+
+### Syntax
+
+The syntax for just the local name is as follows:
+
+identifier = '*' | [a-zA-Z]+ [a-zA-Z\-_0-9]*;
+
+The wildcard is included so that a single rule can be used for both names
+and wildcards.
+
+The syntax for selecting an element including a namespace prefix is as
+follows:
+
+ns_plus_local_name = identifier* '|' identifier;
+
+This would match `|foo`, `*|foo` and `foo|bar`. In order to match `foo` the
+regular `identifier` rule declared above can be used.
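The two grammar rules above translate fairly directly into Ruby regular expressions. This sketch assumes that the `*` before `'|'` means "optional", so at most one prefix is accepted. `split_ns_selector` is a hypothetical helper, not part of Oga.

```ruby
# `[a-zA-Z]+ [a-zA-Z\-_0-9]*` accepts the same strings as the shorter
# `[a-zA-Z][a-zA-Z0-9_-]*` used here.
IDENTIFIER         = /\*|[a-zA-Z][a-zA-Z0-9_-]*/
NS_PLUS_LOCAL_NAME = /\A(#{IDENTIFIER})?\|(#{IDENTIFIER})\z/

# Hypothetical helper: returns [prefix, local_name] for "prefix|name" input,
# with a nil prefix for "|name", or nil when there is no pipe at all.
def split_ns_selector(selector)
  match = NS_PLUS_LOCAL_NAME.match(selector)
  match && match.captures
end

split_ns_selector('rb|foo') # => ["rb", "foo"]
split_ns_selector('*|foo')  # => ["*", "foo"]
split_ns_selector('|foo')   # => [nil, "foo"]
split_ns_selector('foo')    # => nil (plain `identifier` instead)
```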
+
+## Class Selector
+
+Class selectors can be used to select a set of elements based on the values set
+in the `class` attribute. Class selectors start with a period (`.`) followed by
+an identifier. Multiple class selectors can be chained together, matching only
+elements that have all the specified classes set.
+
+As an example, `.foo` can be used to select all elements that have "foo" set in
+the `class` attribute, either as the sole or one of many values. In turn,
+`.foo.bar` matches elements that have both "foo" and "bar" set as the class.
+
+Example XML:
+
+<root>
+<a class="first" />
+<b class="second" />
+</root>
+
+Using the CSS selector `.first` would return a set containing only the `<a>`
+element. Using `.first.second` would return an empty set, as neither element
+has both classes set.
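Chained class selectors behave like a subset test on the space-separated `class` value. The sketch below uses a hypothetical `class_selector_match?` helper:

```ruby
# Hypothetical helper: true when every class named in a chained selector
# such as ".foo.bar" appears in the space-separated class attribute.
def class_selector_match?(selector, class_attr)
  wanted  = selector.split('.').reject(&:empty?)
  present = class_attr.to_s.split
  (wanted - present).empty?
end

class_selector_match?('.first', 'first')         # => true
class_selector_match?('.first.second', 'first')  # => false
class_selector_match?('.first.second', 'second') # => false
class_selector_match?('.foo.bar', 'bar baz foo') # => true
```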
+
+### Syntax
+
+identifier = '*' | [a-zA-Z]+ [a-zA-Z\-_0-9]*;
+
+# .foo, .foo.bar, .foo.bar.baz, etc
+class = ('.' identifier)+;
+
+## ID Selector
+
+The ID selector can be used to match elements where the value of the `id`
+attribute matches whatever is specified in the selector. ID selectors start with
+a hash sign (`#`) followed by an identifier.
+
+While technically multiple ID selectors _can_ be chained together, HTML only
+allows elements to have a single ID. As a result doing so is fairly useless.
+Unlike classes, IDs are unique within a document: no two elements can share one.
+
+Example XML:
+
+<root>
+<a id="first" />
+<b id="second" />
+</root>
+
+Using the CSS selector `#first` would return a set containing only the `<a>`
+node.
+
+### Syntax
+
+identifier = '*' | [a-zA-Z]+ [a-zA-Z\-_0-9]*;
+
+# #foo, #foo#bar, #foo#bar#baz, etc
+id = ('#' identifier)+;
+
+## Attribute Selector
+
+W3C chapter: <http://www.w3.org/TR/css3-selectors/#attribute-selectors>
+
+Attribute selectors can be used to further narrow down a set of elements based
+on their attribute list. In XPath these selectors are known as "predicates". For
+example, the selector `foo[bar]` matches all `foo` elements that have a `bar`
+attribute, regardless of the value of said attribute.
+
+Example XML:
+
+<root>
+<foo number="1" />
+<bar />
+</root>
+
+CSS:
+
+root foo[number]
+
+This would return a set containing only the `<foo>` element since the `<bar>`
+element has no attributes.
+
+For the CSS expression `foo[number]` the corresponding XPath expression is the
+following:
+
+foo[@number]
+
+When specifying an attribute you MAY include an operator and a value to match.
+In this case you MUST include an attribute value surrounded by either single or
+double quotes (but not a combination of the two).
+
+There are 6 operators available:
+
+* `=`: equals operator
+* `~=`: whitespace-in operator
+* `^=`: starts-with operator
+* `$=`: ends-with operator
+* `*=`: contains operator
+* `|=`: hyphen-starts-with operator
+
+### Equals Operator
+
+The equals operator matches an element if a given attribute value equals the
+value specified. For example, `foo[number="1"]` matches all `foo` elements that
+have a `number` attribute whose value is _exactly_ "1".
+
+Example XML:
+
+<root>
+<foo number="1" />
+<foo number="2" />
+</root>
+
+CSS:
+
+root foo[number="1"]
+
+This would return a set containing only the first `<foo>` element.
+
+The corresponding XPath expression is quite similar. For `foo[number="1"]` this
+would be:
+
+foo[@number="1"]
+
+### Whitespace-in Operator
+
+This operator matches an element if the given attribute value consists of
+space-separated values of which one is exactly the given value. For example,
+`foo[numbers~="1"]` matches all `foo` elements that have the value `"1"` in the
+`numbers` attribute.
+
+Example XML:
+
+<root>
+<foo numbers="1 2 3" />
+<foo numbers="4 bar 6" />
+</root>
+
+CSS:
+
+root foo[numbers~="1"]
+
+This would return a set containing only the first `<foo>` element. On the other
+hand, if one were to use the expression `root foo[numbers~="bar"]` instead then
+only the second `<foo>` element would be matched.
+
+The corresponding XPath expression is more complex: `foo[numbers~="1"]` is
+translated into the following XPath expression:
+
+foo[contains(concat(" ", @numbers, " "), concat(" ", "1", " "))]
+
+The `concat` calls are used to ensure the expression doesn't match a substring
+of an attribute value and that the expression matches elements of which the
+attribute only has a single value. If `foo[contains(@numbers, ' 1 ')]` were to
+be used then elements such as `<foo numbers="1" />` would not be matched.
+
+Software implementing this selector is free to decide how it concatenates
+spaces around the value to match. Both Oga and Nokogiri use an extra call to
+`concat` but the following would be perfectly valid too:
+
+foo[contains(concat(" ", @numbers, " "), " 1 ")]
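The effect of the padding can be checked with Ruby's `String#include?`, which plays the role of XPath's `contains()` here. The sketch below uses hypothetical helper names:

```ruby
# Ruby counterpart of
#   contains(concat(" ", @numbers, " "), concat(" ", "1", " "))
# Like the XPath version, only literal spaces count as separators.
def whitespace_in?(attr_value, wanted)
  " #{attr_value} ".include?(" #{wanted} ")
end

# Counterpart of contains(@numbers, " 1 "), without padding the value.
def naive_whitespace_in?(attr_value, wanted)
  attr_value.include?(" #{wanted} ")
end

whitespace_in?('1 2 3', '1')     # => true
whitespace_in?('4 bar 6', 'bar') # => true
whitespace_in?('10 20', '1')     # => false (no partial-token match)
whitespace_in?('1', '1')         # => true
naive_whitespace_in?('1', '1')   # => false (the case the padding fixes)
```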
+
+### Starts-with Operator
+
+This operator matches elements whose attribute value starts _exactly_
+with the given value. For example, `foo[numbers^="1"]` would match the element
+`<foo numbers="1 2 3" />` but _not_ the element `<foo numbers="2 3 1" />`.
+
+For `foo[numbers^="1"]` the corresponding XPath expression is as follows:
+
+foo[starts-with(@numbers, "1")]
+
+### Ends-with Operator
+
+This operator matches elements whose attribute value ends _exactly_ with
+the given value. For example, `foo[numbers$="3"]` would match the element
+`<foo numbers="1 2 3" />` but _not_ the element `<foo numbers="2 3 1" />`.
+
+The corresponding XPath expression is more complex because XPath 1.0 has no
+`ends-with` function. Instead one has to resort to using the
+`substring()` function. As such the corresponding XPath expression for
+`foo[bar$="baz"]` is as follows:
+
+foo[substring(@bar, string-length(@bar) - string-length("baz") + 1, string-length("baz")) = "baz"]
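The start offset `string-length(@bar) - string-length("baz") + 1` can be checked by mirroring the expression in Ruby with a small emulation of XPath's 1-based `substring()`. Both helpers are hypothetical and handle integer arguments only, whereas XPath 1.0 also rounds non-integer arguments. In plain Ruby one would simply call `String#end_with?`.

```ruby
# Emulates XPath 1.0 substring(string, start, length) for integer arguments:
# positions are 1-based and the result covers start .. start + length - 1,
# clipped to the string.
def xpath_substring(string, start, length)
  from = [start, 1].max
  to   = start + length # first position *after* the result
  return '' if to <= from

  string[(from - 1)...(to - 1)].to_s
end

# Mirrors: substring(@bar, string-length(@bar) - string-length("baz") + 1,
#                    string-length("baz")) = "baz"
def xpath_ends_with?(value, suffix)
  start = value.length - suffix.length + 1
  xpath_substring(value, start, suffix.length) == suffix
end

xpath_ends_with?('1 2 3', '3') # => true
xpath_ends_with?('2 3 1', '3') # => false
xpath_ends_with?('az', 'baz')  # => false (suffix longer than the value)
```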
+
+### Contains Operator
+
+This operator matches elements whose attribute value contains the given
+value. For example, `foo[bar*="baz"]` would match both `<foo bar="bazzzz" />`
+and `<foo bar="hello baz" />`.
+
+For `foo[bar*="baz"]` the corresponding XPath expression is as follows:
+
+foo[contains(@bar, "baz")]
+
+### Hyphen-starts-with Operator
+
+This operator matches elements whose attribute value is a hyphen-separated
+list of values that starts _exactly_ with the given value. For
+example, `foo[numbers|="1"]` matches `<foo numbers="1-2-3" />` but not
+`<foo numbers="2-1-3" />`.
+
+For `foo[numbers|="1"]` the corresponding XPath expression is as follows:
+
+foo[@numbers = "1" or starts-with(@numbers, concat("1", "-"))]
+
+Note that this selector will also match elements such as
+`<foo numbers="1- foo bar" />`.
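The XPath expression above corresponds to a simple two-branch check. The sketch below uses a hypothetical `hyphen_starts_with?` helper and includes the `"1- foo bar"` edge case:

```ruby
# Ruby counterpart of
#   @numbers = "1" or starts-with(@numbers, concat("1", "-"))
def hyphen_starts_with?(value, wanted)
  value == wanted || value.start_with?("#{wanted}-")
end

hyphen_starts_with?('1-2-3', '1')      # => true
hyphen_starts_with?('2-1-3', '1')      # => false
hyphen_starts_with?('1', '1')          # => true
hyphen_starts_with?('12-3', '1')       # => false
hyphen_starts_with?('1- foo bar', '1') # => true
```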
+
+### Syntax
+
+The syntax of the various attribute selectors can be described as follows:
+
+# Strings are used for the attribute values
+
+dquote = '"';
+squote = "'";
+
+string_dquote = dquote ^dquote* dquote;
+string_squote = squote ^squote* squote;
+
+string = string_dquote | string_squote;
+
+# The `identifier` rule is the same as the one used for matching element
+# names.
+attr_test = identifier '[' space* identifier (space* ('=' | '~=' | '^=' | '$=' | '*=' | '|=') space* string)* space* ']';
+
+Whitespace inside the brackets does not affect the behaviour of the selector.
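For experimenting with the grammar, the sketch below gives a rough Ruby regular expression that mirrors `attr_test`. It assumes the six operators listed earlier and at most one operator/value pair. `ATTR_TEST` and `parse_attr_test` are hypothetical names, and this is not Oga's `Oga::CSS::Lexer`.

```ruby
# Same identifier rule as for element names.
ATTR_IDENT = /\*|[a-zA-Z][a-zA-Z0-9_-]*/

# Rough, hypothetical translation of attr_test: an element, "[", an attribute
# name and optionally one operator plus a single- or double-quoted value.
ATTR_TEST = /
  \A
  (?<element>#{ATTR_IDENT})
  \[ \s*
  (?<attribute>#{ATTR_IDENT})
  (?: \s* (?<operator>[~^$*|]?=) \s*
      (?: "(?<dquoted>[^"]*)" | '(?<squoted>[^']*)' ) )?
  \s* \]
  \z
/x

def parse_attr_test(input)
  m = ATTR_TEST.match(input)
  return nil unless m

  {
    element:   m[:element],
    attribute: m[:attribute],
    operator:  m[:operator],
    value:     m[:dquoted] || m[:squoted]
  }
end

parse_attr_test('foo[number]')[:operator]          # => nil
parse_attr_test('foo[ numbers ~= "1 2" ]')[:value] # => "1 2"
parse_attr_test("foo[lang|='en']")[:operator]      # => "|="
parse_attr_test(%q{foo[x="y']})                    # => nil (mixed quotes)
```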
+
+## Pseudo Classes
+
+W3C chapter: <http://www.w3.org/TR/css3-selectors/#structural-pseudos>
+
+Pseudo classes can be used to further narrow down elements besides just their
+names and attribute values. In essence they are a combination of XPath function
+calls and axes. Some pseudo classes can take an argument to alter their
+behaviour.
+
+Pseudo classes are often applied to element selectors. For example:
+
+foo:bar
+
+Here `:bar` would be a pseudo class applied to the `foo` element. Some pseudo
+classes (e.g. the `:root` pseudo class) can also be used on their own, for
+example:
+
+:root
+
+### :root
+
+The `:root` pseudo class selects an element only if it's the top-level element
+in a document.
+
+Example XML:
+
+<root>
+<foo />
+</root>
+
+Using the CSS expression `root foo:root` we'd get an empty set as the `<foo>`
+element is not the root element. On the other hand, `root:root` would return a
+set containing only the `<root>` element.
+
+This selector can be applied to an element selector as well as being used
+on its own.
+
+For the selector `foo:root` the corresponding XPath expression is as follows:
+
+foo[not(parent::*)]
+
+For `:root` the XPath expression is:
+
+*[not(parent::*)]
+
+### :nth-child(n)
+
+The `:nth-child(n)` pseudo class can be used to select a set of elements based
+on their position or an interval, skipping elements that occur in a set before
+the given position or interval.
+
+In the form `:nth-child(n)` the identifier `n` is an argument that can be used
+to specify one of the following:
+
+1. A literal node set index
+2. A node interval used to match every N nodes
+3. A node interval plus an initial offset
+
+The first element in a node set for `:nth-child()` is located at position 1,
+_not_ position 0 (unlike most programming languages). As a result
+`:nth-child(1)` matches the _first_ element, _not_ the second. This can be
+visualized as follows:
+
+:nth-child(2)
+
+  1     2     3     4     5     6
++---+ +---+ +---+ +---+ +---+ +---+
+|   | | X | |   | |   | |   | |   |
++---+ +---+ +---+ +---+ +---+ +---+
+
+Besides using a literal index argument you can also use an interval, optionally
+with an offset. This can be used to, for example, match every 2nd element, or
+every 2nd element starting at element number 4.
+
+The syntax of this argument is as follows:
+
+integer = ('+' | '-')* [0-9]+;
+interval = ('n' | '-n' | integer 'n') integer*;
+
+Here `interval` would match any of the following:
+
+n
+-n
+2n
+2n+5
+2n-5
+-2n+5
+-2n-5
+
+Because `integer` also matches `+` and `-`, the sign becomes part of the same
+token. If this is not desired the following grammar can be used instead:
+
+integer = [0-9]+;
+modifier = '+' | '-';
+interval = ('n' | '-n' | modifier* integer 'n') (modifier integer)*;
+
+To match every 2nd element you'd use the following:
+
+:nth-child(2n)
+
+  1     2     3     4     5     6
++---+ +---+ +---+ +---+ +---+ +---+
+|   | | X | |   | | X | |   | | X |
++---+ +---+ +---+ +---+ +---+ +---+
+
+To match every 2nd element starting at element 1 you'd instead use this:
+
+:nth-child(2n+1)
+
+  1     2     3     4     5     6
++---+ +---+ +---+ +---+ +---+ +---+
+| X | |   | | X | |   | | X | |   |
++---+ +---+ +---+ +---+ +---+ +---+
|
528
|
+
|
529
|
+
As mentioned the `+1` in the above example is the initial offset. This is
|
530
|
+
however _only_ the case if the second number is positive. That means that for
|
531
|
+
`:nth-child(2n-2)` the offset is _not_ `-2`. When using a negative offset the
|
532
|
+
actual offset first has to be calculated. When using an argument in the form of
|
533
|
+
`An-B` we can calculate the actual offset as following:
|
534
|
+
|
535
|
+
offset = A - (B % A)
|
536
|
+
|
537
|
+
For example, for the selector `:nth-child(2n-2)` the formula would be:
|
538
|
+
|
539
|
+
offset = 2 - (-2 % 2) # => 2
|
540
|
+
|
541
|
+
This would result in the selector `:nth-child(2n+2)`.
|
542
|
+
|
543
|
+
As an another example, for the selector `:nth-child(2n-5)` the formula would be:
|
544
|
+
|
545
|
+
offset = 2 - (-5 % 2) # => 1
|
546
|
+
|
547
|
+
Which would result in the selector `:nth-child(2n+1)`
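
The normalization can be checked with a short plain-Ruby sketch. The method names are hypothetical, not Oga's. The sketch assumes `A` and `B` are passed as positive numbers, as in the form `An-B`. It compares the positions produced by the original argument with those produced by the normalized one. Passing `B` as a negative number is not equivalent in general: Ruby evaluates `3 - (-1 % 3)` to `1`, while `:nth-child(3n-1)` actually starts at element 2.

```ruby
# Offset for an argument of the form An-B, with A and B both positive.
def normalized_offset(a, b)
  a - (b % a)
end

# Positions 1..limit produced by a*n + b for n = 0, 1, 2, ... (a must be > 0).
def expand(a, b, limit)
  (0..(limit + b.abs)).map { |n| a * n + b }.select { |pos| pos.between?(1, limit) }
end

offset = normalized_offset(2, 5)
p offset                                     # => 1
p expand(2, -5, 10) == expand(2, offset, 10) # => true
```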

To ease the process of selecting even and odd elements you can also use `even`
and `odd` as an argument. Using `:nth-child(even)` is the same as
`:nth-child(2n)`, while using `:nth-child(odd)` is the same as
`:nth-child(2n+1)`.

Using `:nth-child(n)` simply matches all elements in the set. Using
`:nth-child(-n)` doesn't match any elements, though Oga treats it the same as
`:nth-child(n)`.

Expressions such as `:nth-child(-n-5)` are invalid as both parts of the interval
(`-n` and `-5`) are negative. However, `:nth-child(-n+5)` is perfectly valid
and would match the first 5 elements in a set:

    :nth-child(-n+5)

      1     2     3     4     5     6
    +---+ +---+ +---+ +---+ +---+ +---+
    | X | | X | | X | | X | | X | |   |
    +---+ +---+ +---+ +---+ +---+ +---+

Using `:nth-child(n+5)` would match all elements starting at element 5:

    :nth-child(n+5)

      1     2     3     4     5     6     7     8     9     10
    +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
    |   | |   | |   | |   | | X | | X | | X | | X | | X | | X |
    +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+

To summarize:

    :nth-child(n)    => matches all elements
    :nth-child(-n)   => matches nothing, though Oga treats it the same as "n"
    :nth-child(5)    => matches element #5
    :nth-child(2n)   => matches every 2nd element
    :nth-child(2n+2) => matches every 2nd element, starting at element 2
    :nth-child(2n-2) => matches every 2nd element, starting at element 2
    :nth-child(n+5)  => matches all elements, starting at element 5
    :nth-child(-n+5) => matches the first 5 elements
    :nth-child(even) => matches every 2nd element, starting at element 2
    :nth-child(odd)  => matches every 2nd element, starting at element 1
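
The table above can be reproduced with a small predicate. This is a plain-Ruby sketch, not Oga's implementation, and `nth_match?` and `matching_positions` are hypothetical names. It follows the definition used by the CSS specification: the element at position `p` matches `An+B` when `A*n + B = p` for some integer `n >= 0`. Because of that it matches nothing for `-n`, unlike Oga, which treats `-n` the same as `n`.

```ruby
# Returns true if the 1-based `position` matches the argument a*n + b, i.e.
# a*n + b == position for some integer n >= 0.
def nth_match?(a, b, position)
  return position == b if a.zero?

  diff = position - b

  # diff must be a multiple of a, and n = diff / a must not be negative.
  (diff % a).zero? && (diff / a) >= 0
end

# Positions out of 1..count matched by a*n + b.
def matching_positions(a, b, count)
  (1..count).select { |pos| nth_match?(a, b, pos) }
end

p matching_positions(2, 1, 6)  # :nth-child(odd)  => [1, 3, 5]
p matching_positions(-1, 5, 6) # :nth-child(-n+5) => [1, 2, 3, 4, 5]
p matching_positions(1, 5, 10) # :nth-child(n+5)  => [5, 6, 7, 8, 9, 10]
```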

The corresponding XPath expressions are quite complex and differ based on the
interval argument used. For the various forms the corresponding XPath
expressions are as follows:

    :nth-child(n)    => *[((count(preceding-sibling::*) + 1) mod 1) = 0]
    :nth-child(-n)   => *[((count(preceding-sibling::*) + 1) mod 1) = 0]
    :nth-child(5)    => *[count(preceding-sibling::*) = 4]
    :nth-child(2n)   => *[((count(preceding-sibling::*) + 1) mod 2) = 0]
    :nth-child(2n+2) => *[(count(preceding-sibling::*) + 1) >= 2 and (((count(preceding-sibling::*) + 1) - 2) mod 2) = 0]
    :nth-child(2n-6) => *[(count(preceding-sibling::*) + 1) >= 2 and (((count(preceding-sibling::*) + 1) - 2) mod 2) = 0]
    :nth-child(n+5)  => *[(count(preceding-sibling::*) + 1) >= 5 and (((count(preceding-sibling::*) + 1) - 5) mod 1) = 0]
    :nth-child(-n+6) => *[((count(preceding-sibling::*) + 1) <= 6) and (((count(preceding-sibling::*) + 1) - 6) mod 1) = 0]
    :nth-child(even) => *[((count(preceding-sibling::*) + 1) mod 2) = 0]
    :nth-child(odd)  => *[(count(preceding-sibling::*) + 1) >= 1 and (((count(preceding-sibling::*) + 1) - 1) mod 2) = 0]
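
The rows of this table follow a small number of patterns. The rough Ruby sketch below builds such a predicate from `a` and `b` in `a*n + b`. It is not Oga's code generator, and `nth_child_xpath` is a hypothetical name. Its parentheses differ slightly from some rows above, but it expresses the same conditions. Negative offsets on a positive interval are normalized with the `A - (B % A)` formula, and `-n` produces the same expression as `n`, as in the table.

```ruby
# Position of the current element among its siblings (1-based), as in the table.
POSITION = '(count(preceding-sibling::*) + 1)'.freeze

# Hypothetical helper, not Oga's API: XPath predicate for :nth-child(a*n + b).
def nth_child_xpath(a, b)
  # Literal index: the element has exactly b - 1 preceding siblings.
  return "*[count(preceding-sibling::*) = #{b - 1}]" if a.zero?

  step = a.abs

  # An-B with a positive interval: turn the negative offset into a positive one.
  b = step - (-b % step) if a.positive? && b.negative?

  # No offset: only the interval matters (this also covers "-n").
  return "*[(#{POSITION} mod #{step}) = 0]" if b.zero?

  # Positive intervals count up from the offset, negative ones count down to it.
  operator = a.positive? ? '>=' : '<='

  "*[(#{POSITION} #{operator} #{b}) and (((#{POSITION} - #{b}) mod #{step}) = 0)]"
end

puts nth_child_xpath(0, 5)                        # *[count(preceding-sibling::*) = 4]
p nth_child_xpath(2, -6) == nth_child_xpath(2, 2) # => true
```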

### :nth-last-child(n)

The `:nth-last-child(n)` pseudo class can be used to select a set of elements
based on their position or an interval, skipping elements that occur in a set
after the given position or interval.

The arguments that can be used by this selector are the same as those mentioned
in [:nth-child(n)][nth-childn].

Because this selector matches in reverse (compared to
[:nth-child(n)][nth-childn]), using an index such as "1" will match the _last_
element in a set, not the first one:

    :nth-last-child(1)

      1     2     3     4     5     6
    +---+ +---+ +---+ +---+ +---+ +---+
    |   | |   | |   | |   | |   | | X | <- matching direction
    +---+ +---+ +---+ +---+ +---+ +---+

When using an interval (with or without an offset) the nodes are also matched in
reverse order. However, matched nodes should be returned in the order they
appear in the document.

For example, the selector `:nth-last-child(2n)` would match as follows:

    :nth-last-child(2n)

      1     2     3     4     5     6
    +---+ +---+ +---+ +---+ +---+ +---+
    | X | |   | | X | |   | | X | |   | <- matching direction
    +---+ +---+ +---+ +---+ +---+ +---+

The resulting set, however, would contain the nodes in the order `[1, 3, 5]`
instead of `[5, 3, 1]`.

When using an interval with an initial offset the offset is also applied in
reverse order. For example, the selector `:nth-last-child(2n+1)` would match as
follows:

    :nth-last-child(2n+1)

      1     2     3     4     5     6
    +---+ +---+ +---+ +---+ +---+ +---+
    |   | | X | |   | | X | |   | | X | <- matching direction
    +---+ +---+ +---+ +---+ +---+ +---+

The corresponding XPath expressions are similar to those used for
[:nth-child(n)][nth-childn]:

    :nth-last-child(n)    => *[((count(following-sibling::*) + 1) mod 1) = 0]
    :nth-last-child(-n)   => *[((count(following-sibling::*) + 1) mod 1) = 0]
    :nth-last-child(5)    => *[count(following-sibling::*) = 4]
    :nth-last-child(2n)   => *[((count(following-sibling::*) + 1) mod 2) = 0]
    :nth-last-child(2n+2) => *[((count(following-sibling::*) + 1) >= 2) and ((((count(following-sibling::*) + 1) - 2) mod 2) = 0)]
    :nth-last-child(2n-6) => *[((count(following-sibling::*) + 1) >= 2) and ((((count(following-sibling::*) + 1) - 2) mod 2) = 0)]
    :nth-last-child(n+5)  => *[((count(following-sibling::*) + 1) >= 5) and ((((count(following-sibling::*) + 1) - 5) mod 1) = 0)]
    :nth-last-child(-n+6) => *[((count(following-sibling::*) + 1) <= 6) and ((((count(following-sibling::*) + 1) - 6) mod 1) = 0)]
    :nth-last-child(even) => *[((count(following-sibling::*) + 1) mod 2) = 0]
    :nth-last-child(odd)  => *[((count(following-sibling::*) + 1) >= 1) and ((((count(following-sibling::*) + 1) - 1) mod 2) = 0)]
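
The reversed matching can be sketched in plain Ruby. Each position is counted from the end while the siblings are still walked in document order, which keeps the result ordered as `[1, 3, 5]`. Both methods are hypothetical helpers rather than Oga code. `nth_match?` checks whether `a*n + b` equals a position for some `n >= 0`, as defined by the CSS specification.

```ruby
# True if a*n + b == position for some integer n >= 0 (CSS An+B semantics).
def nth_match?(a, b, position)
  return position == b if a.zero?

  diff = position - b
  (diff % a).zero? && (diff / a) >= 0
end

# Hypothetical helper: 1-based document positions matched by
# :nth-last-child(a*n + b) among `count` siblings. Each position is counted
# from the end, but iterating 1..count keeps the result in document order.
def nth_last_child_positions(a, b, count)
  (1..count).select { |pos| nth_match?(a, b, count - pos + 1) }
end

p nth_last_child_positions(0, 1, 6) # => [6]
p nth_last_child_positions(2, 0, 6) # => [1, 3, 5]
p nth_last_child_positions(2, 1, 6) # => [2, 4, 6]
```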

### :nth-of-type(n)

The `:nth-of-type(n)` pseudo class can be used to select elements based on
their position among sibling elements with the same name. The arguments that
can be used by this selector are the same as those mentioned in
[:nth-child(n)][nth-childn].

The matching order of this selector is the same as [:nth-child(n)][nth-childn].

Example XML:

    <root>
        <foo />
        <foo />
        <foo />
        <foo />
        <bar />
    </root>

Using the CSS expression `root foo:nth-of-type(even)` would return a set
containing the 2nd and 4th `<foo>` nodes.

The corresponding XPath expressions for the various forms of this pseudo class
are as follows:

    :nth-of-type(n)    => *[(position() mod 1) = 0]
    :nth-of-type(-n)   => *[(position() mod 1) = 0]
    :nth-of-type(5)    => *[position() = 5]
    :nth-of-type(2n)   => *[(position() mod 2) = 0]
    :nth-of-type(2n+2) => *[(position() >= 2) and (((position() - 2) mod 2) = 0)]
    :nth-of-type(2n-6) => *[(position() >= 2) and (((position() - 2) mod 2) = 0)]
    :nth-of-type(n+5)  => *[(position() >= 5) and (((position() - 5) mod 1) = 0)]
    :nth-of-type(-n+6) => *[(position() <= 6) and (((position() - 6) mod 1) = 0)]
    :nth-of-type(even) => *[(position() mod 2) = 0]
    :nth-of-type(odd)  => *[(position() >= 1) and (((position() - 1) mod 2) = 0)]
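
The following is a rough plain-Ruby sketch of this selector. It assumes a parent's children are represented only by their element names, which is a simplification for illustration. `nth_of_type_indexes` and `nth_match?` are hypothetical helpers, not Oga methods. Only children with the requested name are numbered, which is why the `<bar>` element does not affect the result.

```ruby
# True if a*n + b == position for some integer n >= 0 (CSS An+B semantics).
def nth_match?(a, b, position)
  return position == b if a.zero?

  diff = position - b
  (diff % a).zero? && (diff / a) >= 0
end

# Returns the 0-based indexes (into `names`) of the children named `name`
# that match :nth-of-type(a*n + b). Positions are counted only among
# siblings with that name, starting at 1.
def nth_of_type_indexes(names, name, a, b)
  same_name = names.each_index.select { |i| names[i] == name }

  same_name.select.with_index(1) { |_index, position| nth_match?(a, b, position) }
end

children = %w[foo foo foo foo bar]

# foo:nth-of-type(even): the 2nd and 4th <foo> elements.
p nth_of_type_indexes(children, 'foo', 2, 0) # => [1, 3]
```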

### :nth-last-of-type(n)

The `:nth-last-of-type(n)` pseudo class behaves the same as
[:nth-of-type(n)][nth-of-typen] except that it matches nodes in reverse order,
similar to [:nth-last-child(n)][nth-last-childn]. To clarify, this means
matching occurs as follows:

    :nth-last-of-type(1)

      1     2     3     4     5     6
    +---+ +---+ +---+ +---+ +---+ +---+
    |   | |   | |   | |   | |   | | X | <- matching direction
    +---+ +---+ +---+ +---+ +---+ +---+

Example XML:

    <root>
        <foo />
        <foo />
        <foo />
        <foo />
        <bar />
    </root>

Using the CSS expression `root foo:nth-last-of-type(even)` would return a set
containing the 1st and 3rd `<foo>` nodes.

The corresponding XPath expressions for the various forms of this pseudo class
are as follows:

    :nth-last-of-type(n)    => *[((last() - position() + 1) mod 1) = 0]
    :nth-last-of-type(-n)   => *[((last() - position() + 1) mod 1) = 0]
    :nth-last-of-type(5)    => *[position() = last() - 4]
    :nth-last-of-type(2n)   => *[((last() - position() + 1) mod 2) = 0]
    :nth-last-of-type(2n+2) => *[((last() - position() + 1) >= 2) and ((((last() - position() + 1) - 2) mod 2) = 0)]
    :nth-last-of-type(2n-6) => *[((last() - position() + 1) >= 2) and ((((last() - position() + 1) - 2) mod 2) = 0)]
    :nth-last-of-type(n+5)  => *[((last() - position() + 1) >= 5) and ((((last() - position() + 1) - 5) mod 1) = 0)]
    :nth-last-of-type(-n+6) => *[((last() - position() + 1) <= 6) and ((((last() - position() + 1) - 6) mod 1) = 0)]
    :nth-last-of-type(even) => *[((last() - position() + 1) mod 2) = 0]
    :nth-last-of-type(odd)  => *[((last() - position() + 1) >= 1) and ((((last() - position() + 1) - 1) mod 2) = 0)]
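
The same simplified model shows the reversed numbering. A parent's children are again an array of element names, and the helpers are hypothetical, not part of Oga. Positions are counted from the last `<foo>` backwards, while the result stays in document order.

```ruby
# True if a*n + b == position for some integer n >= 0 (CSS An+B semantics).
def nth_match?(a, b, position)
  return position == b if a.zero?

  diff = position - b
  (diff % a).zero? && (diff / a) >= 0
end

# Returns the 0-based indexes (into `names`) of the children named `name`
# that match :nth-last-of-type(a*n + b).
def nth_last_of_type_indexes(names, name, a, b)
  same_name = names.each_index.select { |i| names[i] == name }
  total     = same_name.length

  # Count each position from the end, but keep document order in the result.
  same_name.select.with_index(1) do |_index, position|
    nth_match?(a, b, total - position + 1)
  end
end

children = %w[foo foo foo foo bar]

p nth_last_of_type_indexes(children, 'foo', 2, 0) # => [0, 2]
p nth_last_of_type_indexes(children, 'foo', 0, 1) # => [3]
```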

### :first-child

The `:first-child` pseudo class can be used to match a node that is the first
child element of another node (i.e. an element without any preceding sibling
elements).

Example XML:

    <root>
        <foo />
        <bar />
    </root>

Using the CSS selector `root :first-child` would return a set containing only
the `<foo>` node.

The corresponding XPath expression for this pseudo class is as follows:

    :first-child => *[count(preceding-sibling::*) = 0]

### :last-child

The `:last-child` pseudo class can be used to match a node that is the last
child element of another node (i.e. an element without any following sibling
elements).

Example XML:

    <root>
        <foo />
        <bar />
    </root>

Using the CSS selector `root :last-child` would return a set containing only
the `<bar>` node.

The corresponding XPath expression for this pseudo class is as follows:

    :last-child => *[count(following-sibling::*) = 0]

### :first-of-type

The `:first-of-type` pseudo class matches elements that are the first sibling of
their type in the list of elements of their parent element. This selector is the
same as [:nth-of-type(1)][nth-of-typen].

Example XML:

    <root>
        <a id="1" />
        <a id="2">
            <a id="3" />
            <a id="4" />
        </a>
    </root>

Using the CSS selector `root a:first-of-type` would return a node set containing
nodes `<a id="1">` and `<a id="3">` as both nodes are the first siblings of
their type.

The corresponding XPath for this pseudo class is as follows:

    a:first-of-type => a[count(preceding-sibling::a) = 0]

An alternative is to use the following XPath:

    a:first-of-type => //a[position() = 1]

This, however, relies on the less efficient `descendant-or-self::node()` axis
(which `//` abbreviates). For querying larger documents it's recommended to use
the first form instead.

### :last-of-type

The `:last-of-type` pseudo class can be used to match elements that are the last
sibling of their type in the list of elements of their parent. This selector is
the same as [:nth-last-of-type(1)][nth-last-of-typen].

Example XML:

    <root>
        <a id="1" />
        <a id="2">
            <a id="3" />
            <a id="4" />
        </a>
    </root>

Using the CSS selector `root a:last-of-type` would return a set containing nodes
`<a id="2">` and `<a id="4">` as both nodes are the last siblings of their type.

The corresponding XPath for this pseudo class is as follows:

    a:last-of-type => a[count(following-sibling::a) = 0]

Similar to [:first-of-type][first-of-typen] this XPath can alternatively be
written as follows:

    a:last-of-type => //a[position() = last()]
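
Both XPath forms express "no earlier (or later) sibling with the same name". The following is a rough plain-Ruby sketch of that rule. It again models one parent's children as an array of element names, and the helpers are hypothetical, not Oga methods.

```ruby
# Indexes of children that have no preceding sibling with the same name,
# mirroring a[count(preceding-sibling::a) = 0].
def first_of_type_indexes(names)
  names.each_index.reject { |i| names[0...i].include?(names[i]) }
end

# Indexes of children that have no following sibling with the same name,
# mirroring a[count(following-sibling::a) = 0].
def last_of_type_indexes(names)
  names.each_index.reject { |i| names[(i + 1)..-1].include?(names[i]) }
end

# The children of <a id="2"> in the example: <a id="3"> and <a id="4">.
p first_of_type_indexes(%w[a a])       # => [0]
p last_of_type_indexes(%w[a a])        # => [1]
p first_of_type_indexes(%w[a b a c b]) # => [0, 1, 3]
```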

### :only-child

The `:only-child` pseudo class can be used to match elements that are the only
child element of their parent.

Example XML:

    <root>
        <a id="1" />
        <a id="2">
            <a id="3" />
        </a>
    </root>

Using the CSS selector `root a:only-child` would return a set containing only
the `<a id="3">` node.

The corresponding XPath for this pseudo class is as follows:

    a:only-child => a[count(preceding-sibling::*) = 0 and count(following-sibling::*) = 0]

### :only-of-type

The `:only-of-type` pseudo class can be used to match elements that are the only
child element of their type within their parent.

Example XML:

    <root>
        <a id="1" />
        <a id="2">
            <a id="3" />
            <b id="4" />
        </a>
    </root>

Using the CSS selector `root a:only-of-type` would return a set containing
only the `<a id="3">` node, as it is the only `<a>` node in the list of
elements of its parent.

The corresponding XPath for this pseudo class is as follows:

    a:only-of-type => a[count(preceding-sibling::a) = 0 and count(following-sibling::a) = 0]
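
The two checks differ only in which siblings are counted. `:only-child` counts all element siblings, while `:only-of-type` counts only siblings with the same name. The following plain-Ruby sketch uses the same simplified model of a parent's children as an array of element names, with hypothetical helpers that are not Oga methods.

```ruby
# Indexes of children that are the only child element of their parent.
def only_child_indexes(names)
  names.length == 1 ? [0] : []
end

# Indexes of children whose name occurs exactly once among their siblings,
# mirroring a[count(preceding-sibling::a) = 0 and count(following-sibling::a) = 0].
def only_of_type_indexes(names)
  names.each_index.select { |i| names.count(names[i]) == 1 }
end

p only_child_indexes(%w[a a])   # => []     (children of <root>)
p only_of_type_indexes(%w[a b]) # => [0, 1] (<a id="3"> and <b id="4">)
```

Applying `a:only-of-type` to the children of `<a id="2">` then keeps only index 0, the `<a id="3">` element, since `<b id="4">` has a different name.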

### :empty

The `:empty` pseudo class can be used to match elements that have no child nodes
at all.

Example XML:

    <root>
        <a />
        <b>10</b>
    </root>

Using the CSS selector `root :empty` would return a set containing only the
`<a>` node.

### Syntax

The syntax of the various pseudo classes is as follows:

    integer = ('+' | '-')* [0-9]+;

    odd  = 'odd';
    even = 'even';
    nth  = 'n';

    pseudo_arg_interval = '-'* integer* nth;
    pseudo_arg_offset   = ('+' | '-')* integer;

    pseudo_arg = odd
               | even
               | '-'* nth
               | integer
               | pseudo_arg_interval
               | pseudo_arg_interval pseudo_arg_offset;

    # The `identifier` rule is the same as the one used for element names.
    pseudo = ':' identifier ('(' space* pseudo_arg space* ')')*;

[w3spec]: http://www.w3.org/TR/css3-selectors/
[rfc-2119]: https://www.ietf.org/rfc/rfc2119.txt
[kramdown]: http://kramdown.gettalong.org/
[universal-selector]: #universal-selector
[ragel]: http://www.colm.net/open-source/ragel/
[nth-childn]: #nth-childn
[nth-last-childn]: #nth-last-childn
[nth-of-typen]: #nth-of-typen
[nth-last-of-typen]: #nth-last-of-typen
[first-of-typen]: #first-of-type