makiri 0.3.0 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.github/workflows/conformance.yml +22 -0
- data/.github/workflows/libfuzzer.yml +83 -0
- data/.github/workflows/security.yml +88 -3
- data/.github/workflows/valgrind.yml +138 -0
- data/CHANGELOG.md +127 -2
- data/README.md +95 -77
- data/Rakefile +207 -3
- data/ext/makiri/bridge/ruby_string.c +159 -80
- data/ext/makiri/core/mkr_alloc.c +40 -3
- data/ext/makiri/core/mkr_alloc.h +28 -5
- data/ext/makiri/core/mkr_buf.c +13 -3
- data/ext/makiri/core/mkr_buf.h +80 -5
- data/ext/makiri/core/mkr_core.c +143 -0
- data/ext/makiri/core/mkr_core.h +10 -1
- data/ext/makiri/core/mkr_span.h +186 -0
- data/ext/makiri/core/mkr_utf8.c +101 -0
- data/ext/makiri/core/mkr_utf8.h +88 -0
- data/ext/makiri/{lexbor_compat → dom_adapter}/compat.h +4 -4
- data/ext/makiri/{lexbor_compat → dom_adapter}/compat_internal.h +1 -1
- data/ext/makiri/dom_adapter/cross_import.c +434 -0
- data/ext/makiri/dom_adapter/cross_import.h +35 -0
- data/ext/makiri/{lexbor_compat → dom_adapter}/source_loc.c +14 -16
- data/ext/makiri/{lexbor_compat → dom_adapter}/text_index.c +1 -1
- data/ext/makiri/{lexbor_compat → dom_adapter}/utf8_input.c +5 -78
- data/ext/makiri/extconf.rb +104 -9
- data/ext/makiri/fuzz/Makefile +95 -0
- data/ext/makiri/fuzz/check_fuzzer.cc +4 -0
- data/ext/makiri/fuzz/xml_fuzz.c +24 -0
- data/ext/makiri/fuzz/xpath_fuzz.c +109 -0
- data/ext/makiri/glue/cross_import.h +30 -0
- data/ext/makiri/glue/glue.h +9 -1
- data/ext/makiri/glue/ruby_doc.c +31 -27
- data/ext/makiri/glue/ruby_html_css.c +58 -12
- data/ext/makiri/glue/ruby_html_mutate.c +17 -6
- data/ext/makiri/glue/ruby_html_node.c +4 -33
- data/ext/makiri/glue/ruby_lexbor_css.c +462 -0
- data/ext/makiri/glue/ruby_node.c +53 -0
- data/ext/makiri/glue/ruby_xml.c +228 -17
- data/ext/makiri/glue/ruby_xml_node.c +133 -61
- data/ext/makiri/glue/ruby_xpath.c +20 -5
- data/ext/makiri/makiri.c +48 -0
- data/ext/makiri/makiri.h +5 -0
- data/ext/makiri/xml/mkr_xml.h +7 -3
- data/ext/makiri/xml/mkr_xml_chars.c +89 -97
- data/ext/makiri/xml/mkr_xml_index.c +169 -0
- data/ext/makiri/xml/mkr_xml_index.h +48 -0
- data/ext/makiri/xml/mkr_xml_mutate.c +220 -168
- data/ext/makiri/xml/mkr_xml_mutate.h +24 -0
- data/ext/makiri/xml/mkr_xml_node.c +147 -15
- data/ext/makiri/xml/mkr_xml_node.h +71 -6
- data/ext/makiri/xml/mkr_xml_tree.c +246 -174
- data/ext/makiri/xpath/mkr_css.c +1023 -0
- data/ext/makiri/xpath/mkr_css.h +65 -0
- data/ext/makiri/xpath/mkr_xpath.c +65 -0
- data/ext/makiri/xpath/mkr_xpath.h +18 -1
- data/ext/makiri/xpath/mkr_xpath_eval_body.h +383 -90
- data/ext/makiri/xpath/mkr_xpath_funcs_body.h +249 -231
- data/ext/makiri/xpath/mkr_xpath_internal.h +89 -9
- data/ext/makiri/xpath/mkr_xpath_lex.c +94 -124
- data/ext/makiri/xpath/mkr_xpath_node_access_xml.h +6 -3
- data/ext/makiri/xpath/mkr_xpath_number.c +109 -0
- data/ext/makiri/xpath/mkr_xpath_parse.c +79 -90
- data/ext/makiri/xpath/mkr_xpath_shared.c +40 -24
- data/ext/makiri/xpath/mkr_xpath_value_body.h +50 -24
- data/lib/makiri/cdata_section.rb +1 -3
- data/lib/makiri/comment.rb +1 -3
- data/lib/makiri/document.rb +8 -0
- data/lib/makiri/element.rb +1 -3
- data/lib/makiri/html/document.rb +11 -12
- data/lib/makiri/html/node_methods.rb +0 -1
- data/lib/makiri/node_set.rb +14 -9
- data/lib/makiri/processing_instruction.rb +8 -2
- data/lib/makiri/text.rb +1 -3
- data/lib/makiri/version.rb +1 -1
- data/lib/makiri/xml/builder.rb +271 -0
- data/lib/makiri/xml/node_methods.rb +47 -0
- data/lib/makiri/xpath_context.rb +12 -4
- data/lib/makiri.rb +1 -0
- data/script/check_alloc_failures.rb +266 -0
- data/script/check_c_safety.rb +45 -2
- data/script/check_c_safety_allowlist.yml +27 -5
- data/script/check_leaks.rb +64 -0
- data/script/leaks_harness.rb +71 -0
- data/suppressions/ruby.supp +140 -0
- data/vendor/lexbor/CMakeLists.txt +6 -0
- data/vendor/lexbor/README.md +12 -0
- data/vendor/lexbor/config.cmake +1 -1
- data/vendor/lexbor/source/lexbor/core/base.h +1 -1
- data/vendor/lexbor/source/lexbor/core/config.cmake +9 -1
- data/vendor/lexbor/source/lexbor/css/selectors/pseudo_state.c +2 -3
- data/vendor/lexbor/source/lexbor/css/selectors/state.c +3 -0
- data/vendor/lexbor/source/lexbor/dom/interfaces/element.c +21 -0
- data/vendor/lexbor/source/lexbor/dom/interfaces/element.h +5 -0
- data/vendor/lexbor/source/lexbor/encoding/decode.c +33 -4
- data/vendor/lexbor/source/lexbor/html/base.h +1 -1
- data/vendor/lexbor/source/lexbor/html/interfaces/select_element.c +4 -0
- data/vendor/lexbor/source/lexbor/html/serialize.c +545 -41
- data/vendor/lexbor/source/lexbor/html/serialize.h +2 -1
- data/vendor/lexbor/source/lexbor/html/tokenizer.h +2 -2
- data/vendor/lexbor/source/lexbor/html/tree/insertion_mode/in_body.c +1 -1
- data/vendor/lexbor/source/lexbor/html/tree.c +6 -6
- data/vendor/lexbor/source/lexbor/selectors/selectors.c +12 -3
- data/vendor/lexbor/source/lexbor/url/base.h +1 -1
- data/vendor/lexbor/source/lexbor/url/url.c +5 -2
- data/vendor/lexbor/source/lexbor/url/url.h +9 -0
- data/vendor/lexbor/version +1 -1
- metadata +31 -8
- /data/ext/makiri/{lexbor_compat → dom_adapter}/dom_index.c +0 -0
- /data/ext/makiri/{lexbor_compat → dom_adapter}/post_parse.c +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 27ac120b94ab835caee9bbb50a1cee71b19e339dde2384496db9608e58b3269b
+  data.tar.gz: 27b8ea683abe8854e6c68269413d4858e0f2fedfdd04f04d8fa91130b9b05ac1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 84754fb994af236692bdbc281cb0cba89a8cd6d7c75e2caa4e16ebe9b1efa6c4cbd270409be2957e461db4909bbabc32296ed44e185ccaa8985a0c285f25846c
+  data.tar.gz: c3fba2792720ad30d1bee90343e4ae7877bd9871195620ce667281fffeb36e994b3e715a5f456327aaa4e8de1e80e3c24c7e6a898739c660f4d1c8d5ffa51c60
data/.github/workflows/conformance.yml
CHANGED
@@ -57,6 +57,15 @@ jobs:
       - name: CSS Selectors differential vs Nokogiri::HTML5
         run: bundle exec rake conformance:css
 
+      - name: XML XPath 1.0 differential vs Nokogiri::XML
+        run: bundle exec rake conformance:xpath_xml
+
+      - name: XML CSS-selector differential vs Nokogiri::XML
+        run: bundle exec rake conformance:css_xml
+
+      - name: XML Builder differential vs Nokogiri::XML::Builder
+        run: bundle exec rake conformance:builder
+
   # Nightly: a wide XPath differential sweep across many seeds and a high
   # generated volume, to surface divergences the curated corpus and a single
   # seed miss. Fails fast (bash -e) on the first real divergence.
@@ -92,3 +101,16 @@ jobs:
             XPATH_ARGS="--generate 20000 --seed ${seed}" bundle exec rake conformance:xpath
             echo "::endgroup::"
           done
+
+      # The W3C XML conformance suite fetches its (pinned) test corpus on first
+      # run, so it lives in the nightly rather than on every PR.
+      - name: XML 1.0 well-formedness vs the W3C XML Conformance Test Suite
+        run: bundle exec rake conformance:xmlconf
+
+      # Property-based tree differential: generated documents, Makiri's parsed
+      # tree + canonical output vs Nokogiri::XML. Scalable volume - nightly gets
+      # a much larger batch than the resident in-suite PBT specs.
+      - name: XML property-based tree differential vs Nokogiri::XML
+        run: bundle exec rake conformance:xml_pbt
+        env:
+          PBT_ARGS: "--count 20000"
data/.github/workflows/libfuzzer.yml
ADDED
@@ -0,0 +1,83 @@
+name: libFuzzer
+
+on:
+  # Nightly: libFuzzer runs are coverage-guided and long-running; they complement
+  # the PR-level short fuzz (30s) and the nightly sanitizer fuzz (300s per target).
+  # We run them on a schedule because the coverage signal needs sustained CPU
+  # time to reach deep branches (e.g. DOCTYPE quote/bracket state machine,
+  # reference expansion boundaries).
+  schedule:
+    - cron: "0 3 * * *"
+  workflow_dispatch:
+
+jobs:
+  # Build the libFuzzer harnesses under clang with -fsanitize=fuzzer,address
+  # and run each target for 300s. The corpus is stored as a build artifact so
+  # it can be seeded in subsequent runs (regression asset per the roadmap).
+  libfuzzer-nightly:
+    name: libFuzzer ${{ matrix.target }} (clang)
+    runs-on: ubuntu-latest
+    timeout-minutes: 360
+    strategy:
+      fail-fast: false
+      matrix:
+        target: [xml, xpath]
+
+    steps:
+      - name: Checkout (with vendored Lexbor submodule)
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Ensure cmake is available
+        uses: lukka/get-cmake@latest
+
+      - name: Set up Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: "3.4"
+          bundler-cache: true
+
+      # Build the vendored Lexbor first (plain mode is fine; the harness links
+      # the static archive). The fuzzer binary itself is built with ASan.
+      - name: Compile the extension (builds Lexbor)
+        run: bundle exec rake compile
+
+      - name: Install clang
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y clang
+
+      - name: Build libFuzzer harnesses
+        run: |
+          cd ext/makiri/fuzz
+          make clean
+          make all
+
+      # Restore the previous corpus so the fuzzer starts from the accumulated
+      # regression seeds rather than from scratch.
+      - name: Restore corpus cache
+        uses: actions/cache@v4
+        with:
+          path: ext/makiri/fuzz/corpus/${{ matrix.target }}
+          key: libfuzzer-corpus-${{ matrix.target }}-${{ github.run_id }}
+          restore-keys: |
+            libfuzzer-corpus-${{ matrix.target }}-
+
+      - name: Run libFuzzer ${{ matrix.target }} (300s)
+        run: |
+          mkdir -p ext/makiri/fuzz/corpus/${{ matrix.target }}
+          cd ext/makiri/fuzz
+          ./${{ matrix.target }}_fuzz \
+            -max_total_time=300 \
+            -max_len=4096 \
+            -print_final_stats=1 \
+            corpus/${{ matrix.target }}
+
+      # Save the mutated corpus as an artifact for download / seeding.
+      - name: Upload corpus artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: libfuzzer-corpus-${{ matrix.target }}
+          path: ext/makiri/fuzz/corpus/${{ matrix.target }}
+          retention-days: 7
data/.github/workflows/security.yml
CHANGED
@@ -70,12 +70,71 @@ jobs:
           bundler-cache: true
 
       - name: Run short fuzz under sanitizers
-        run: bundle exec rake fuzz:sanitize FUZZ_ARGS="--
+        run: bundle exec rake fuzz:sanitize FUZZ_ARGS="--time 30"
+
+  # macOS-only malloc-leak gate: ASan everywhere runs with detect_leaks=0 (Ruby
+  # and Lexbor are uninstrumented), so this is the ONLY automated leak check. It
+  # flags per-call leak stacks through the extension, including on rescued
+  # failure paths (see script/check_leaks.rb).
+  security-leaks:
+    name: Malloc-leak gate (macOS leaks)
+    runs-on: macos-latest
+    if: github.event_name != 'schedule'
+    steps:
+      - name: Checkout (with vendored Lexbor submodule)
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Ensure cmake is available
+        uses: lukka/get-cmake@latest
+
+      - name: Set up Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: "3.4"
+          bundler-cache: true
 
+      - name: Run the leak gate
+        run: bundle exec rake leaks
+
+  # OOM-injection sweep: rebuilds with MAKIRI_ALLOC_INJECT=1 and fails each core
+  # C allocation site in turn, gating that every OOM branch fails closed - a
+  # clean exception or a baseline-identical result, never truncated output
+  # (see script/check_alloc_failures.rb).
+  security-alloc-inject:
+    name: OOM-injection sweep
+    runs-on: ubuntu-latest
+    if: github.event_name != 'schedule'
+    steps:
+      - name: Checkout (with vendored Lexbor submodule)
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Ensure cmake is available
+        uses: lukka/get-cmake@latest
+
+      - name: Set up Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: "3.4"
+          bundler-cache: true
+
+      - name: Run the OOM-injection sweep
+        run: bundle exec rake oom
+
+  # Nightly: fuzz EVERY target (the 30s PR fuzz covers only the default xpath
+  # target; the CSS engine reuse and the XML parser/mutator are each their own
+  # documented memory-safety risk, so each gets a full 300s run).
   security-fuzz-nightly:
-    name: Nightly sanitized fuzz
+    name: Nightly sanitized fuzz (${{ matrix.target }})
     runs-on: ubuntu-latest
     if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
+    strategy:
+      fail-fast: false
+      matrix:
+        target: [xpath, css, xml, mutate, xmlcss]
     steps:
       - name: Checkout (with vendored Lexbor submodule)
         uses: actions/checkout@v6
@@ -92,4 +151,30 @@ jobs:
           bundler-cache: true
 
       - name: Run nightly fuzz under sanitizers
-        run: bundle exec rake fuzz:sanitize FUZZ_ARGS="--
+        run: bundle exec rake fuzz:sanitize FUZZ_ARGS="--target ${{ matrix.target }} --time 300"
+
+  # Nightly: the whole spec suite with Lexbor ITSELF built under ASan (mraw
+  # poisoning on), catching intra-arena overflows that a plain ASan build cannot
+  # see - the class the v3.0.0 :lexbor-contains overflow belonged to. Heavy
+  # (full instrumented Lexbor rebuild), so nightly only.
+  security-sanitize-lexbor:
+    name: Nightly instrumented-Lexbor ASan suite
+    runs-on: ubuntu-latest
+    if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
+    steps:
+      - name: Checkout (with vendored Lexbor submodule)
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Ensure cmake is available
+        uses: lukka/get-cmake@latest
+
+      - name: Set up Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: "3.4"
+          bundler-cache: true
+
+      - name: Build Lexbor under ASan and run the suite
+        run: bundle exec rake "sanitize:lexbor"
|
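The OOM-injection sweep described in the security.yml comments (fail each allocation site in turn, accept only a clean exception or a baseline-identical result) can be illustrated with a toy model. This is a minimal sketch, not the contents of script/check_alloc_failures.rb, which this diff does not show; `CountingAllocator`, `InjectedOom`, `join_words`, and `oom_sweep` are hypothetical names invented for the illustration.

```ruby
# Hypothetical illustration only; none of these names come from makiri.
class InjectedOom < StandardError; end

# Counts allocations and fails the Nth one when fail_at is set.
class CountingAllocator
  attr_reader :count

  def initialize(fail_at: nil)
    @fail_at = fail_at
    @count = 0
  end

  def alloc(bytes)
    @count += 1
    raise InjectedOom, "injected failure at allocation #{@count}" if @count == @fail_at
    String.new(capacity: bytes)
  end
end

# Toy operation under test: one allocation for the output, one per word.
def join_words(words, allocator)
  out = allocator.alloc(16)
  words.each do |word|
    piece = allocator.alloc(word.bytesize)
    piece << word
    out << piece
  end
  out
end

# Run once to count allocation sites, then fail each site in turn and
# classify the outcome. Only :truncated would indicate a fail-open bug.
def oom_sweep(words)
  probe = CountingAllocator.new
  baseline = join_words(words, probe)
  (1..probe.count).map do |n|
    begin
      result = join_words(words, CountingAllocator.new(fail_at: n))
      result == baseline ? :baseline_identical : :truncated
    rescue InjectedOom
      :clean_exception
    end
  end
end

outcomes = oom_sweep(%w[a b c])
```

In this toy every injected failure propagates as an exception, so the sweep reports one `:clean_exception` per allocation site. A gate built this way would fail the build on any `:truncated` entry.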
data/.github/workflows/valgrind.yml
ADDED
@@ -0,0 +1,138 @@
+name: Valgrind + GC.compact
+
+on:
+  # Valgrind memcheck ALSO runs on push to main: it is the only check without a
+  # frequency threshold (any "definitely lost" / uninitialised-value use fails,
+  # unlike the PR-level macOS leak gate, which only flags stacks repeated >=30x),
+  # so a leak on a rarely-hit error path slips past the PR gates and would
+  # otherwise surface only on the next nightly. Running it post-merge catches such
+  # regressions within ~30 min without adding ~20 min to every PR. (It is gated to
+  # main only, not pull_request, to keep PR latency low.)
+  #
+  # The GC.stress job stays nightly-only (see its `if:` below): it is heavy and
+  # checks structural properties that do not vary by day-to-day churn.
+  push:
+    branches: [main, master]
+  schedule:
+    - cron: "0 2 * * *"
+  workflow_dispatch:
+
+jobs:
+  # Valgrind memcheck on the full spec suite. Complements ASan by catching
+  # layout-independent errors ASan misses: use of uninitialised values, and
+  # invalid reads/writes that happen to land inside valid malloc regions
+  # (intra-arena overflows). Runs on Linux because Valgrind is x86_64/amd64
+  # Linux only.
+  valgrind-memcheck:
+    name: Valgrind memcheck (Ruby ${{ matrix.ruby }})
+    runs-on: ubuntu-latest
+    timeout-minutes: 360
+    env:
+      BUNDLE_WITH: valgrind
+      # These heavy jobs verify memory discipline (uninit values, intra-arena
+      # overflows, use-after-move), not the property space - so the full
+      # 300-iteration PBT sweep (already run by the normal CI matrix) is
+      # overkill here and, multiplied by Valgrind's 10-50x slowdown, never
+      # finishes. A handful of iterations exercises every C memory path while
+      # keeping the run tractable.
+      PBT_COUNT: "15"
+      CSS_PBT_COUNT: "15"
+    strategy:
+      fail-fast: false
+      matrix:
+        ruby: ["3.4"]
+
+    steps:
+      - name: Checkout (with vendored Lexbor submodule)
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Ensure cmake is available
+        uses: lukka/get-cmake@latest
+
+      - name: Set up Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: ${{ matrix.ruby }}
+          bundler-cache: true
+
+      - name: Compile the extension
+        run: bundle exec rake compile
+
+      - name: Install Valgrind
+        run: sudo apt-get update && sudo apt-get install -y valgrind
+
+      # ruby_memcheck (the `spec:valgrind` rake task) runs the suite under
+      # memcheck. It ships Ruby's Valgrind suppression files itself (matched to
+      # the running Ruby), so there is no longer a ruby.supp to fetch from
+      # ruby/ruby - that path was removed upstream and the fetch step 404'd.
+      - name: Run spec suite under Valgrind (ruby_memcheck)
+        run: bundle exec rake spec:valgrind
+
+  # GC.auto_compact + GC.stress over the GC-sensitive examples. This
+  # structurally tests the borrowed-pointer discipline under the condition that
+  # Ruby Strings actually move (compaction) and that every allocation triggers a
+  # full GC cycle (stress). Failures here are typically use-after-move or stale
+  # pointer bugs in the C extension or bridge layer.
+  #
+  # Scope: only the examples tagged `:gc_compact` (the `memory safety` blocks in
+  # css/xpath/serialize/mutation/source_location/xpath_handler/api_compat2 +
+  # attribute's lazy-index example). Those are the examples written to exercise
+  # the borrowed-pointer paths. `GC_COMPACT_STRESS=1` makes spec_helper set
+  # `GC.auto_compact = true` process-wide and wrap every example in `GC.stress`,
+  # so each allocation inside a tagged example triggers a *compacting* GC - the
+  # strongest form of the use-after-move test. The high-volume churn loops
+  # (parse/drop cycles) scale their iteration count down under stress
+  # (`gc_churn_iters` / `GC_COMPACT_ITERS`) because each stressed iteration is
+  # orders of magnitude heavier; `GC_COMPACT_ITERS` below tunes the total runtime
+  # (~6-9 min on CI at 200). An earlier version forced GC.stress onto the
+  # *entire* suite (~800 examples): it ran 1h40m+ and never finished, while
+  # testing borrowed-pointer discipline on hundreds of examples that have none.
+  # The rest of the suite still runs in ci.yml.
+  #
+  # THREADING is deliberately OFF here. The :threading suite is 8 threads x tens
+  # of iterations; it runs in ci.yml and its GC-sensitive examples opt into
+  # GC.stress themselves, so cross-thread interactions are covered there.
+  gc-compact-stress:
+    # Nightly / on-demand only - not on push (the valgrind job is the post-merge
+    # gate; GC.stress is heavy and structural, so it does not need per-push runs).
+    if: github.event_name != 'push'
+    name: GC.auto_compact + GC.stress (Ruby ${{ matrix.ruby }})
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    env:
+      GC_COMPACT_STRESS: "1"
+      # Per-iteration cost under per-allocation compacting GC is ~1000x normal, so
+      # the churn loops run this many iterations (vs their normal 200-1000). Tunes
+      # the job's runtime; raise for more coverage, lower if it approaches the
+      # timeout.
+      GC_COMPACT_ITERS: "200"
+    strategy:
+      fail-fast: false
+      matrix:
+        ruby: ["3.4"]
+
+    steps:
+      - name: Checkout (with vendored Lexbor submodule)
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Ensure cmake is available
+        uses: lukka/get-cmake@latest
+
+      - name: Set up Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: ${{ matrix.ruby }}
+          bundler-cache: true
+
+      - name: Compile the extension
+        run: bundle exec rake compile
+
+      # GC_COMPACT_STRESS=1 (set in env above) makes spec_helper enable
+      # auto_compact globally and wrap each example in GC.stress; --tag gc_compact
+      # limits the run to the borrowed-pointer examples.
+      - name: Run GC-sensitive examples under GC.auto_compact + GC.stress
+        run: bundle exec rspec --tag gc_compact spec
|
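The valgrind.yml comments say that `GC_COMPACT_STRESS=1` makes spec_helper enable `GC.auto_compact` and wrap each example in `GC.stress`, with churn loops scaled down through `GC_COMPACT_ITERS`. The spec_helper itself is not part of this diff, so the following is only a sketch of how such helpers might look. `enable_auto_compact`, `with_gc_stress`, and `churn_iters` are hypothetical names, and the fallback of 200 iterations simply mirrors the CI value above.

```ruby
# Hypothetical helpers; the project's real spec_helper is not shown in this diff.

# Turn on automatic compaction where the running Ruby supports it.
def enable_auto_compact
  GC.auto_compact = true if GC.respond_to?(:auto_compact=)
rescue NotImplementedError
  # Compaction is unavailable on this platform; GC.stress alone still applies.
end

# Run the block with GC.stress on, restoring the previous setting afterwards
# even if the block raises.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

# Scale a churn loop down under stress: use the normal count unless
# GC_COMPACT_STRESS=1, in which case read GC_COMPACT_ITERS (assumed default 200).
def churn_iters(normal, env = ENV)
  return normal unless env["GC_COMPACT_STRESS"] == "1"

  Integer(env.fetch("GC_COMPACT_ITERS", "200"))
end

joined = with_gc_stress { %w[a b].join }
```

Restoring the previous `GC.stress` value in `ensure` keeps one stressed example from slowing every example that runs after it.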
data/CHANGELOG.md
CHANGED
@@ -5,7 +5,130 @@ All notable changes to this project will be documented in this file.
 The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [
+## [0.5.0] - 2026-06-14
+
+### Fixed
+
+* Use-after-free when an XPath custom-function handler mutated the same
+`XPathContext` (`register_*` / `node=`) mid-`evaluate`: such re-entrant context
+mutation is now refused instead of invalidating the running evaluation's state.
+
+* `Node#name=` now invalidates the element-name index, so a later `//tag` query
+reflects the rename instead of seeing a stale bucket.
+
+* XML processing-instruction targets now follow XML 1.0 §2.6: a PITarget is a
+`Name`, not an NCName, so a colon is permitted (`<?a:b ...?>` parses, and
+`create_processing_instruction("a:b", ...)` succeeds). Only the reserved `xml`
+(any case) is still rejected. Previously a colon in a PI target was rejected as
+not-well-formed, which was stricter than the spec (a PI target is not subject to
+namespace processing).
+
+* Memory leaks of the internal XPath evaluation context on error / edge paths: a
+`Makiri::XML` `#css` / `#xpath` / `#at_xpath` whose selector or expression failed
+the text-input contract leaked the context (it is now verified BEFORE the context
+is allocated), and a context could leak if building the Ruby result raised (it is
+now freed before conversion).
+
+### Added
+
+* `ProcessingInstruction#target` on the XML node (the PI's target name).
+
+* Cross-kind `Document#import_node(node, deep = false)`. `import_node` now
+translates a subtree across representations: `Makiri::XML::Document#import_node`
+(newly added) imports an HTML (Lexbor) node by translating it to the XML node
+representation, and `Makiri::HTML::Document#import_node` likewise translates an
+XML node to HTML. Same-representation imports keep working (HTML to HTML via
+Lexbor, XML to XML via the arena deep/shallow copy). The result is a detached
+copy owned by the target document; the source is untouched. Elements (with
+attributes), text, comment, and processing-instruction nodes translate both
+ways, and an HTML `<template>`'s contents (which HTML keeps in a separate
+fragment) are carried across rather than silently dropped; an XML CDATA section
+has no HTML counterpart, so translating one into an HTML document fails closed
+(`Makiri::Error`). Namespaces are preserved across the translation: HTML->XML
+synthesizes the xmlns declarations needed to reproduce each node's namespace
+(so e.g. an inline `<svg>` stays in the SVG namespace and HTML elements in the
+XHTML namespace), and XML->HTML maps the namespace URI back to a Lexbor
+namespace id, interning any URI (not only the ones Lexbor knows by default) so
+custom namespaces survive too. An HTML-namespaced `<template>`'s content is
+placed in its content fragment (HTMLTemplateElement.content), like a parsed
+template. The other node-argument mutators
+(`add_child`/`before`/`after`/`replace`/`fragment`) still reject a foreign-kind
+node; `import_node` is the one sanctioned crossing point.
+
+* `set_attribute_ns(namespace, qualified_name, value)` and
+`remove_attribute_ns(namespace, local_name)` on `Makiri::XML` elements - the DOM
+setAttributeNS / removeAttributeNS, keyed on the (explicit namespace, local name)
+pair so two attributes with the same qualified name in different namespaces
+coexist (a null/"" namespace is the null namespace).
+
+* `Makiri::Lexbor::CSS.parse_stylesheet(text)`, a thin binding over Lexbor's
+CSS stylesheet parser that returns the parsed rules as plain Ruby primitives
+(`{type: :style, selectors: [{text:, specificity: [a,b,c]}, ...],
+declarations: [{name:, value:, important:}, ...]}` and nested
+`{type: :media, condition:, rules: [...]}`, in source order). Selector
+specificity and value normalization come from Lexbor; `css-syntax-3` error
+recovery means a broken stylesheet yields its valid rules instead of raising.
+Hosts the new `Makiri::Lexbor::*` namespace (the unabstracted lexbor-native
+surface, distinct from the Nokogiri-compatible `Makiri::*`).
+
+## [0.4.0] - 2026-06-12
+
+### Added
+
+* CSS selectors on `Makiri::XML`. `#css` / `#at_css` / `#matches?`, lowered
+to the native XPath engine (case-sensitive, namespace-aware). Covers the
+standard selector set including combinator arguments to `:is`/`:where`/`:not`/
+`:has`, untyped `:*-of-type`, and `:lexbor-contains`. Verified by a differential
+against `Nokogiri::XML` plus property-based tests.
+
+* `Makiri::XML::Builder`, a Nokogiri-compatible DSL for building an XML
+document or subtree from scratch (block / `instance_eval` forms, namespaced
+elements via `xml["prefix"]`, the `tag.class.id!` attribute short-cuts, raw-XML
+`<<`, and `.with`). Verified by a differential against `Nokogiri::XML::Builder`.
+
+### Changed
+
+* The XML declaration emits `encoding="UTF-8"` only when the source declared
+one (or `#to_xml(encoding:)` is passed); built or declaration-less documents
+now serialize to a bare `<?xml version="1.0"?>`, like Nokogiri (the output is
+UTF-8 either way).
+
+* Faster XML queries. A document-rooted `//name` / `css("name")` is served
+from a lazily-built element-name index instead of a full-tree walk (~11x
+Nokogiri on the benchmark feed); name tests resolve their prefix once per step,
+and `at_css` / `at_xpath` short-circuit on prefixed name tests.
+
+* CSS class/ID selectors now match case-sensitively in no-quirks documents
+(case-insensitively only in quirks mode), like browsers and `Nokogiri::HTML5` -
+via an upstreamed Lexbor fix (see below).
+
+* XPath number parsing now follows the XPath 1.0 `Number` grammar exactly and
+is locale-independent, matching libxml2/Nokogiri and browsers. C `strtod`'s
+superset forms are no longer accepted: `1e3` / `0x1A` lex as a Number followed
+by a name (a syntax error as a full expression, where they previously parsed
+as 1000 / 26), `number()` returns NaN for exponent/hex/`+`-signed strings, and
+only XPath whitespace (space/tab/CR/LF, not `\v`/`\f`) is trimmed around the
+coerced value. Valid literals (`5.`, `.5`, `1.5`) are unchanged.
+
+### Security
+
+* Updated the vendored Lexbor (v3.0.0 -> `3a2d595`), which includes two
+CSS-selector fixes we upstreamed - class/ID case-sensitivity follows quirks
+mode, and a prefix-less type selector no longer defaults to the universal
+namespace - plus a heap-overflow fix in its `:lexbor-contains()` parser
+(reached from `Node#css`) and other post-v3.0.0 bugfixes. (An untagged master
+commit, taken deliberately; see CLAUDE.md.)
+
+* Hardened native memory safety. The XML arena is ASan-red-zoned to catch
+intra-arena overflows, the engines are fuzzed under ASan/UBSan, and buffer
+growth is bounded by a hard ceiling.
+
+* Extended the lint-enforced bounded-reader (`mkr_span`) discipline to the
+remaining byte-scanning code: the source-location line table, the XPath
+string-function scanners (now explicitly length-bounded instead of relying on
+the NUL contract), and the number parse above. Fixed a borrowed-RSTRING
+pointer held across a potential GC point in the XML encoding sniffer, and a
+missing NUL-termination guarantee in the libFuzzer XPath harness.
 
 ## [0.3.0] - 2026-06-06
 
|
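The 0.5.0 entry on processing-instruction targets (a PITarget is a `Name`, so a colon is allowed, and only the reserved `xml` in any case is rejected) can be sketched in plain Ruby. This is an illustration of the rule from XML 1.0 §2.6, not makiri's C implementation. `valid_pi_target?` is a hypothetical name, and the character classes are deliberately limited to the ASCII subset of the `Name` production, which is an assumption made to keep the sketch short.

```ruby
# ASCII-only subset of the XML 1.0 Name production (simplifying assumption):
# first char is a letter, "_" or ":", later chars may also be digits, "-" or ".".
ASCII_XML_NAME = /\A[A-Za-z_:][A-Za-z0-9_:.\-]*\z/

# Hypothetical helper: true when the string is usable as a PI target.
def valid_pi_target?(target)
  return false unless ASCII_XML_NAME.match?(target)

  # The reserved target "xml" is rejected in any letter case.
  !target.casecmp?("xml")
end

colon_ok = valid_pi_target?("a:b")
reserved = valid_pi_target?("XmL")
```

Only an exact three-letter match is reserved under this reading, so a target such as `xml-stylesheet` still passes.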
@@ -239,7 +362,9 @@ libxml2 / libxslt dependency at any layer**.
 domxpath, CSS differential vs `Nokogiri::HTML5`). GitHub Actions CI across
 Ruby 3.2–4.0 × Ubuntu/macOS plus a sanitizer job.
 
-[Unreleased]: https://github.com/takahashim/makiri/compare/v0.
+[Unreleased]: https://github.com/takahashim/makiri/compare/v0.5.0...HEAD
+[0.5.0]: https://github.com/takahashim/makiri/compare/v0.4.0...v0.5.0
+[0.4.0]: https://github.com/takahashim/makiri/compare/v0.3.0...v0.4.0
 [0.3.0]: https://github.com/takahashim/makiri/compare/v0.2.0...v0.3.0
 [0.2.0]: https://github.com/takahashim/makiri/compare/v0.1.0...v0.2.0
 [0.1.0]: https://github.com/takahashim/makiri/releases/tag/v0.1.0
|
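The 0.4.0 entry on XPath number parsing describes string-to-number coercion that accepts only the XPath 1.0 `Number` grammar (digits with an optional fraction, or a leading-dot fraction), an optional leading minus, and surrounding space/tab/CR/LF. A minimal Ruby sketch of that rule follows. It is not the code in mkr_xpath_number.c, and `xpath_number` and `XPATH_NUMBER` are hypothetical names chosen for the illustration.

```ruby
# Optional XPath whitespace, optional "-", then Digits ("." Digits?)? or "." Digits.
# Exponents, hex prefixes, a leading "+", and \v or \f are all outside the pattern.
XPATH_NUMBER = /\A[ \t\r\n]*(-?)(?:([0-9]+)(?:\.([0-9]*))?|\.([0-9]+))[ \t\r\n]*\z/

# Hypothetical helper: coerce a string the way the changelog entry describes,
# returning NaN for anything the grammar does not accept.
def xpath_number(str)
  match = XPATH_NUMBER.match(str)
  return Float::NAN unless match

  sign, int, frac_after_int, frac_only = match.captures
  frac = frac_after_int || frac_only
  # Normalize "5." and ".5" to "5.0" and "0.5" before converting.
  int_part = int || "0"
  frac_part = frac.nil? || frac.empty? ? "0" : frac
  Float("#{sign}#{int_part}.#{frac_part}")
end

trailing_dot = xpath_number(" 5. ")
exponent = xpath_number("1e3")
```

Matching the whole string against one anchored pattern avoids locale-sensitive C conversion routines entirely, which is one way to get the locale independence the entry mentions.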