iriq 0.2.0 → 0.30.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +78 -0
- data/CLAUDE.md +128 -41
- data/Gemfile.lock +4 -4
- data/Makefile +80 -23
- data/README.md +225 -347
- data/completions/_iriq +52 -0
- data/completions/iriq.bash +70 -0
- data/docs/ARCHITECTURE.md +223 -0
- data/docs/ROADMAP.md +190 -0
- data/iriq.gemspec +2 -2
- data/lib/iriq/cli.rb +398 -46
- data/lib/iriq/cluster.rb +284 -12
- data/lib/iriq/corpus.rb +318 -36
- data/lib/iriq/cross_host_shape.rb +37 -0
- data/lib/iriq/event.rb +22 -0
- data/lib/iriq/evidence.rb +114 -0
- data/lib/iriq/explanation.rb +1 -1
- data/lib/iriq/normalizer.rb +71 -29
- data/lib/iriq/path_shape.rb +30 -24
- data/lib/iriq/position.rb +75 -0
- data/lib/iriq/position_stats.rb +74 -8
- data/lib/iriq/recognizer.rb +54 -0
- data/lib/iriq/recognizer_proposal.rb +167 -0
- data/lib/iriq/recognizers/date.rb +53 -0
- data/lib/iriq/recognizers/integer.rb +37 -0
- data/lib/iriq/recognizers/uuid.rb +16 -0
- data/lib/iriq/reducer.rb +37 -0
- data/lib/iriq/registrable_domain.rb +56 -0
- data/lib/iriq/segment_classifier.rb +475 -23
- data/lib/iriq/segment_hints.rb +9 -0
- data/lib/iriq/shape.rb +106 -0
- data/lib/iriq/specificity.rb +35 -0
- data/lib/iriq/storage/memory.rb +83 -12
- data/lib/iriq/storage/sqlite.rb +216 -37
- data/lib/iriq/synthesized_recognizer.rb +56 -0
- data/lib/iriq/trace.rb +294 -0
- data/lib/iriq/version.rb +1 -1
- data/lib/iriq.rb +17 -0
- metadata +22 -3
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 598c04e3c1777787ae9e5d1be98e2bc68d441e2020ebe7743bdf8075b20fdaec
|
|
4
|
+
data.tar.gz: 396ad6b0b0acffb76b7bc2b4e31792b02ac65749c6ac769fd70a47ce5d806496
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 16637ff46f4648a2cfc14404ed074d3cedc6fc08cabf0c46a6fa7a39553b8b78020907fde1d9024005cec3a4f2cdb4cb3e999802ec866a942c20c15be6c7af34
|
|
7
|
+
data.tar.gz: 458aa6deba73a571a07801bb3df445a4d08500a55c3d1d8c50b11df0811dbc785270fd1ef908ebbb58c6f53dfb1ecfc7013fe137e216e88bca81c9bc56d21fa4
|
data/CHANGELOG.md
CHANGED
|
@@ -1,3 +1,81 @@
|
|
|
1
|
+
### 0.30.2 (2026-06-23)
|
|
2
|
+
- Piped stdin and `--file` now **stream** the per-IRI sections (`-n`/`-p`/`-c`/`-e`) line by line, flushing each IRI as it's processed — `tail -f access.log | iriq -n` is live and memory stays bounded on huge inputs. Output is byte-identical to before; the aggregate views (deduped URL list, clusters, `--stats`) still read the whole input. Ruby, Go, and Rust.
|
|
3
|
+
|
|
4
|
+
### 0.30.1 (2026-06-21)
|
|
5
|
+
- Batch sections (`--normalize` etc.) are now corpus-informed when `--corpus` is supplied, matching single-input behavior.
|
|
6
|
+
- Added a CLI end-to-end test suite (sections, formats, batch/cluster, subcommands) and a `make check` Rust gate + pre-push hook.
|
|
7
|
+
|
|
8
|
+
### 0.30.0 (2026-06-21)
|
|
9
|
+
- Rust consolidated into a single crate (library + `iriq` binary) with SQLite always on by default — no separate sqlite build.
|
|
10
|
+
- Go moved into the `go/` subdirectory; import path is now `github.com/dpep/iriq/go`.
|
|
11
|
+
|
|
12
|
+
### 0.11.0 (2026-05-27)
|
|
13
|
+
- New classifier types: `:color` (hex form `#fff`/`#ffffff`/`#ffffff80`), `:coordinate` (`lat,lng` pair with plausible-range validation), `:country` (ISO 3166-1 alpha-2, allowlisted), `:base64` (≥16 chars with `+`/`/`/`=` to disambiguate from `:opaque_id`).
|
|
14
|
+
- `SegmentClassifier.color_kind(value)` / `ColorKind(value)` returns `:hex` for hex-shaped colors — placeholder for future named / rgb / hsl support, mirrors the file_kind pattern.
|
|
15
|
+
- Param-name hint map extended: `color`/`bg`/`fg`/`background`/`foreground` → `:color`, `coords`/`coordinates`/`geo`/`location`/`position`/`latlng` → `:coordinate`, `country`/`country_code`/`nation` → `:country`.
|
|
16
|
+
- `-J` is now a short alias for `--ndjson` (combinable: `iriq -nJ < file`).
|
|
17
|
+
- New CLI `-e/--explain` flag — annotated normalization trace. For each path segment / query param, shows the value, type, output (placeholder or canonical value), and notes for every non-obvious transformation (hint suppression for semantic types, currency upcase, IP umbrella collapse, canonical date, param-name lift). JSON via `-e -j` returns the same structure.
|
|
18
|
+
- Library API: `Iriq::Trace.for(input)` (Ruby) / `iriq.Trace(input)` (Go) returns the same trace data structure.
|
|
19
|
+
- Classifier perf: each regex test is now gated on a cheap composition check (`String#include?` / `IndexByte` / `size`) so a literal like `"users"` skips ~20 regex matches instead of walking the full chain. Measured: Ruby normalize +12%, extract +27%; Go CLI wall time -25%.
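The composition gate described in the classifier perf note above can be sketched roughly as follows. This is an illustration only, not iriq's code: `UUID_SKETCH_RE` and `uuid_shaped?` are hypothetical names, and the specific checks (length 36, contains a hyphen) are assumptions chosen for the UUID case.

```ruby
# Hypothetical pattern for illustration; iriq's real constants may differ.
UUID_SKETCH_RE = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

# Run cheap composition checks first so that an obvious literal such as
# "users" never reaches the regex engine.
def uuid_shaped?(value)
  return false unless value.size == 36 && value.include?("-")

  UUID_SKETCH_RE.match?(value)
end

puts uuid_shaped?("users")
puts uuid_shaped?("123e4567-e89b-12d3-a456-426614174000")
```

The same idea applies to any pattern with a required character or fixed length: the gate is a necessary condition for a match, so skipping the regex when the gate fails cannot change the result.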
|
|
20
|
+
|
|
21
|
+
### 0.10.0 (2026-05-27)
|
|
22
|
+
- New classifier type `:file` — `name.ext` shape where `ext` is in a curated allowlist spanning image / document / data / text / web / audio / video / archive / code kinds. `image.png` and `report.pdf` classify as `:file` instead of falling through to `:opaque_id`. The per-extension kind (`:image`, `:document`, etc.) is surfaced via `SegmentClassifier.file_kind(value)` / `FileKindOf(value)` for verbose displays.
|
|
23
|
+
- `Cluster#param_summary` adds `:kind_distribution` for `:file`-typed params — buckets observed values by kind. Best-effort: only reflects values within the tracking cap.
|
|
24
|
+
- New phone format: NANP-style `555-666-7777`, `555.666.7777`, `(555) 666-7777`. Leading area-code + exchange digits constrained to 2-9 so dotted version strings / digit blobs don't shadow. The `+` E.164 form still covers international.
|
|
25
|
+
- Param-name hints — when a value's type is generic (`:literal`, `:opaque_id`, `:slug`), the param name can supply the type. `?phone=unknown` becomes `{phone}` and `?email=tbd` becomes `{email}`. Hint map covers phone/email/locale/currency/url/jwt/mime variations. Specific value types (e.g. `?phone=12345` → `:integer`) still win.
|
|
26
|
+
|
|
27
|
+
### 0.9.0 (2026-05-27)
|
|
28
|
+
- Semantic types (`:version`, `:locale`, `:currency`, `:date`, `:boolean`, `:timestamp`, etc.) now surface as `{type}` placeholders instead of being run through the noun-singularize hint. `/api/v1/status` renders `/api/{version}/status` rather than the misleading `/api/{api_id}/status`. Only ID-shaped types (`:integer`, `:uuid`, `:hash`, `:opaque_id`, `:slug`) keep the `{noun_id}` form.
|
|
29
|
+
- `--normalize` collapses `:ipv4` and `:ipv6` to `{ip}` in placeholder form (previously rendered as `{ipv4}` / `{ipv6}`). The classifier still tracks the specific family; cluster summary keeps the distinct types.
|
|
30
|
+
- `--normalize` canonicalizes currency segments and params to ISO 4217 upper case — `/pricing/usd` → `/pricing/USD`, `?currency=eur` → `?currency=EUR`. Mirrors the existing date canonicalization (`canonical_currencies: true` flag on `PathShape`).
|
|
31
|
+
- `LOCALE_RE` tightened: the region/script portion now caps at 2-4 alphanumeric chars and the language portion is validated against the ISO 639-1 allowlist — `by-locale` no longer wrongly classifies as `:locale`.
|
|
32
|
+
- New classifier types: `:phone` (E.164 — `+` then 7-15 digits with optional separators), `:jwt` (three base64url segments separated by dots), `:mime` (RFC 2046 top-level type + subtype, e.g. `image/png`, `application/vnd.api+json`).
|
|
33
|
+
- New corpus-promoted type `:http_status` — integer positions whose observed range falls inside 100..599 with ≥2 distinct values and ≥5 samples get promoted. Same range-analysis pattern as `:year`.
|
|
34
|
+
- Scheme-less URL detection: query values like `?redirect=foo.com/path` classify as `:url`. Requires a dotted host with a TLD-like ≥2-letter suffix followed by a slash, so `image.png` stays as `:opaque_id`.
|
|
35
|
+
- `Cluster#param_summary` adds two new fields:
|
|
36
|
+
- `:value_distribution` — fractions per tracked value, for `:boolean` and `:enum` positions (e.g. `{ "true" => 0.97, "false" => 0.03 }`). Same data already in `value_counts`, surfaced as ratios.
|
|
37
|
+
- `:subtype_distribution` — int-vs-float split for `:number` positions (e.g. `{ integer: 0.4, float: 0.6 }`).
|
|
38
|
+
- `:boolean` now wins over `:enum` when the dominant type is boolean — a position of pure `true`/`false` stays `:boolean` rather than being demoted to a 2-value enum.
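The `:value_distribution` field described in the 0.9.0 notes above is said to carry the same data as `value_counts`, expressed as ratios. A minimal sketch of that conversion, assuming a plain `Hash` of counts (the method name is hypothetical and this is not the gem's implementation):

```ruby
# Convert raw per-value counts into fractions of the total.
# Returns an empty hash when nothing has been observed.
def value_distribution_sketch(value_counts)
  total = value_counts.values.sum
  return {} if total.zero?

  value_counts.transform_values { |count| count.to_f / total }
end

p value_distribution_sketch({ "true" => 97, "false" => 3 })
```

Whether iriq rounds these fractions is not stated in the changelog, so the sketch leaves them unrounded.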
|
|
39
|
+
|
|
40
|
+
### 0.8.0 (2026-05-27)
|
|
41
|
+
- **Breaking**: `:numeric` umbrella renamed to `:number` (Ruby) / `TypeNumeric` → `TypeNumber` (Go). Same semantics.
|
|
42
|
+
- New classifier types: `:boolean` (`true`/`false`, any case), `:version` (`v1`, `v2.0.1`, `v1.2.3-beta` — requires the `v` prefix), `:locale` (BCP 47-ish full forms like `en-US`/`fr_CA`, plus bare 2-letter language codes from an inline ISO 639-1 allowlist of ~55 entries — `en`, `fr`, `ja`, etc.), `:currency` (3-letter codes from an inline ISO 4217 allowlist of ~35 entries).
|
|
43
|
+
- `:year` is now corpus-only: an `:integer` position whose observed min/max land in 1900..2100 with ≥2 distinct values gets promoted. A single 4-digit integer in isolation classifies as `:integer` — only range analysis across observations is reliable.
|
|
44
|
+
- `PositionStats` now tracks `numeric_min` / `numeric_max` / `numeric_sum` / `numeric_count` for `:integer`/`:float` observations. `Cluster#param_summary` surfaces `min` / `max` / `avg` on any param with numeric observations.
|
|
45
|
+
- Shape-y variable types (`:version`, `:locale`, `:currency`, `:boolean`) now respect the stable-literal rule: a single dominant value at a position (`v1` only across many observations) stays as the literal `v1` instead of being placeholdered as `{version}`. High cardinality at the same position falls back to `{version}` / `{locale}` / etc. as expected.
|
|
46
|
+
- 0/1 booleans still classify as `:integer` individually; the existing `:enum` umbrella catches `?flag=0` / `?flag=1` patterns when they cluster.
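The range-analysis promotion described above for `:year` (and reused for `:http_status` in 0.9.0) can be sketched as a predicate over the integers observed at one position. The method name and keyword arguments are hypothetical; only the thresholds come from the changelog.

```ruby
# True when every observed integer falls inside `range`, with enough
# distinct values and enough samples to trust the pattern.
def range_promotable?(values, range, min_distinct: 2, min_samples: 1)
  return false if values.size < min_samples
  return false if values.uniq.size < min_distinct

  range.cover?(values.min) && range.cover?(values.max)
end

# :year rule: min/max inside 1900..2100 with at least 2 distinct values.
puts range_promotable?([2019, 2021, 2024], 1900..2100)
# :http_status rule: inside 100..599, at least 2 distinct values and 5 samples.
puts range_promotable?([200, 200, 404, 500, 200], 100..599, min_samples: 5)
```

A single 4-digit integer fails the distinct-value check, which matches the note that an isolated value stays `:integer`.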
|
|
47
|
+
|
|
48
|
+
### 0.7.0 (2026-05-27)
|
|
49
|
+
- **Breaking**: `:integer_id` classifier type renamed to `:integer` (Ruby) / `TypeIntegerID` → `TypeInteger` (Go). The "ID" semantics live in the hints layer (which still produces `{user_id}` placeholders); the classifier now reflects pure shape. Update any direct `.classify(...) == :integer_id` checks, dump-file consumers, and persisted corpora — the type symbol changed in `type_counts` and raw shape strings (e.g. `/users/{integer_id}` → `/users/{integer}`).
|
|
50
|
+
- New `:enum` umbrella (corpus-only): when a param has a small bounded set of repeated values (default ≥20 observations, ≤10 distinct, each ≥2 occurrences, ≥95% coverage), `Cluster#param_type` returns `:enum` and `param_summary` includes the value list under `:values`. Normalize output keeps the `{enum}` placeholder — values aren't inlined.
|
|
51
|
+
- `iriq --host=full|registrable|reg|none` CLI flag plumbs `Corpus#host_strategy` from the command line. `reg` is a short alias for `registrable`.
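The `:enum` thresholds listed in the 0.7.0 notes above leave some room for interpretation. The sketch below takes one possible reading: values seen at least twice count as "repeated", there may be at most 10 of them, and together they must cover at least 95% of observations. The method name, keyword names, and this reading are all assumptions, not the gem's actual logic.

```ruby
# One reading of the enum rule, applied to a Hash of value => count.
def enum_like?(value_counts, min_observations: 20, max_distinct: 10,
               min_occurrences: 2, min_coverage: 0.95)
  total = value_counts.values.sum
  return false if total < min_observations

  # Keep only values that repeat often enough to count as enum members.
  repeated = value_counts.select { |_value, count| count >= min_occurrences }
  return false if repeated.empty? || repeated.size > max_distinct

  repeated.values.sum.to_f / total >= min_coverage
end

puts enum_like?({ "asc" => 12, "desc" => 9 })
puts enum_like?({ "asc" => 18, "x1" => 1, "x2" => 1, "x3" => 1 })
```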
|
|
52
|
+
|
|
53
|
+
### 0.6.0 (2026-05-27)
|
|
54
|
+
- New classifier types: `:ipv4`, `:ipv6`, `:url`, `:email` (Ruby) / `TypeIPv4`, `TypeIPv6`, `TypeURL`, `TypeEmail` (Go). Slotted before the generic `:opaque_id` / `:literal` catch-alls so URL params like `?redirect=https://foo.com/...`, `?email=alice@example.com`, `?ip=192.168.1.1`, `?gateway=fe80::1` get distinct types instead of falling through.
|
|
55
|
+
- IPv4 validates octets ≤ 255 — out-of-range dotted-quads fall back to `:opaque_id`.
|
|
56
|
+
- IPv6 accepts the full eight-group form and any compressed form containing `::`. IPv4-mapped variants (`::ffff:192.0.2.1`) are not recognized.
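The IPv4 octet check described in the 0.6.0 notes above can be illustrated with a short standalone predicate. This is a sketch, not iriq's classifier: the method name is hypothetical, and accepting leading zeros (such as `010`) is an assumption the changelog does not address.

```ruby
# True for dotted-quads whose four parts are 1-3 digits each and <= 255.
def ipv4_shaped?(value)
  octets = value.split(".", -1) # -1 keeps trailing empty parts, e.g. "1.2.3."
  return false unless octets.size == 4

  octets.all? { |octet| octet.match?(/\A\d{1,3}\z/) && octet.to_i <= 255 }
end

puts ipv4_shaped?("192.168.1.1")
puts ipv4_shaped?("256.1.1.1")
```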
|
|
57
|
+
|
|
58
|
+
### 0.5.0 (2026-05-27)
|
|
59
|
+
- Float values now classify as `:float` instead of falling through to `:opaque_id` (Ruby `:float` / Go `TypeFloat`). Regex requires digits on both sides of the decimal — `3.14`, `-2.5`, `1.0` match; `.5`, `1.`, `1e10` do not.
|
|
60
|
+
- New `:numeric` umbrella (corpus-only): when a cluster sees both `:integer_id` and `:float` observations at the same param with neither subtype hitting the 80% confidence threshold, the param surfaces as `:numeric` in `param_summary` and renders as `{numeric}` in `Corpus#normalize` output. The classifier itself never returns `:numeric` directly — individual values are always specifically int or float.
|
|
61
|
+
- `Corpus.new(host_strategy: ...)` knob controls how host is keyed into clusters: `:full` (default, unchanged), `:registrable` (strip subdomains, so `api.foo.com` and `app.foo.com` cluster as `foo.com`), `:none` (ignore host, group all observations by shape alone). `:registrable` uses an inline allowlist of ~70 common multi-label TLDs (`co.uk`, `com.au`, `co.jp`, etc.) — niche multi-label suffixes like `.priv.no` will be over-stripped.
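The `:float` rule from the 0.5.0 notes above (digits required on both sides of the decimal point) corresponds to a pattern along these lines. `FLOAT_SKETCH_RE` and `float_shaped?` are hypothetical names for illustration, not the gem's constants.

```ruby
# Optional minus sign, then at least one digit on each side of a single
# decimal point. Exponent forms such as "1e10" are deliberately excluded.
FLOAT_SKETCH_RE = /\A-?\d+\.\d+\z/

def float_shaped?(value)
  FLOAT_SKETCH_RE.match?(value)
end

%w[3.14 -2.5 1.0 .5 1. 1e10].each do |candidate|
  puts "#{candidate.inspect}: #{float_shaped?(candidate)}"
end
```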
|
|
62
|
+
|
|
63
|
+
### 0.4.0 (2026-05-27)
|
|
64
|
+
- Query-param clustering: each `Cluster` now tracks per-param presence, value cardinality, and type via `param_stats`. Surfaced on `cluster.to_h[:params]` (and the JSON cluster view), persisted in both JSON and SQLite backends.
|
|
65
|
+
- `Corpus#normalize` (Ruby) / `Corpus.NormalizeIdentifier` (Go) now include query params, rendered with corpus-informed types when available (falls back to mechanical classification otherwise).
|
|
66
|
+
- New `corpus.params_for(url)` / `Corpus.ParamsFor(url)` — returns the inferred params for the cluster `url` would fall into. Useful for "what params might this URL accept?" tooling.
|
|
67
|
+
- Date detection expanded to include `YYYY/MM/DD` and `YYYYMMDD` (with year/month/day sanity bounds) alongside the existing `YYYY-MM-DD`.
|
|
68
|
+
- `SegmentClassifier.canonical_date(value)` / `CanonicalDate(value)` returns the ISO form for any recognized date.
|
|
69
|
+
- `--normalize` output canonicalizes recognized date values to `YYYY-MM-DD` (path segments and query params). Cluster keys still use `{date}` placeholders so dated routes still group together.
|
|
70
|
+
- `PositionStats::DEFAULT_MAX_VALUES` is now the value cap for `cluster.param_stats[name]` too.
|
|
71
|
+
|
|
72
|
+
### 0.3.0 (2026-05-25)
|
|
73
|
+
- Go: SQLite backend is now opt-in via `-tags sqlite`. Default `go install` and the `iriq` Homebrew formula ship a slim binary (~30% smaller) with JSON corpora only. SQLite users compile with `-tags sqlite` or install `dpep/tools/iriq-sqlite`.
|
|
74
|
+
- Makefile: `release` / `release-sqlite` targets strip debug symbols and use `-trimpath` for reproducible builds.
|
|
75
|
+
- CLI: `iriq --help` reports the active build (slim vs sqlite).
|
|
76
|
+
- Slim build returns a friendly error when a `.db` corpus path is opened, pointing at the iriq-sqlite formula.
|
|
77
|
+
- `PositionStats::DEFAULT_MAX_VALUES` / `DefaultMaxValuesPerPosition` raised from 1000 → 5000. Existing corpora keep whatever cap they were created with (the cap is persisted in the dump / SQLite meta table); only freshly-constructed corpora pick up the new default.
|
|
78
|
+
|
|
1
79
|
### 0.2.0 (2026-05-25)
|
|
2
80
|
- Corpus storage backends: JSON (default) and SQLite, dispatched by file extension
|
|
3
81
|
- Go: `iriq.OpenCorpus(path)`; Ruby: `Iriq::Corpus.open(path)`
|
data/CLAUDE.md
CHANGED
|
@@ -1,34 +1,69 @@
|
|
|
1
1
|
# Iriq development conventions
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
3
|
+
> **⚠️ Behavior changes touch ALL THREE runtimes.** Ruby is the reference; Go
|
|
4
|
+
> + Rust mirror it. Before committing any change to
|
|
5
|
+
> parser/normalizer/extractor/CLI/etc:
|
|
6
|
+
>
|
|
7
|
+
> 1. Update Ruby + specs.
|
|
8
|
+
> 2. `bundle exec ruby script/generate_fixtures.rb` (regenerate JSON parity fixtures).
|
|
9
|
+
> 3. Port the change to Go (the Go module lives in `go/`).
|
|
10
|
+
> 4. `go -C go test ./...` — fixture tests should still pass.
|
|
11
|
+
> 5. `make build && script/cli_parity.sh` — Ruby ↔ Go CLI parity should still pass.
|
|
12
|
+
> 6. Port the change to Rust under `rust/`.
|
|
13
|
+
> 7. `cd rust && cargo test --workspace` — Rust fixture tests should still pass
|
|
14
|
+
> (SQLite is a default feature).
|
|
15
|
+
> 8. `cd rust && cargo build --release --bin iriq && cd .. && script/rust_parity.sh`
|
|
16
|
+
> — Rust ↔ Go CLI parity (covers Ruby transitively).
|
|
17
|
+
> 9. Commit the regenerated fixtures alongside the code change.
|
|
18
|
+
>
|
|
19
|
+
> CI's parity + Rust jobs will fail if any step is skipped. The **Rust gate**
|
|
20
|
+
> (fmt + clippy + tests) is automated — run `make hooks` once to install the
|
|
21
|
+
> committed pre-push hook that runs `make check`. Full multi-runtime pre-push
|
|
22
|
+
> for a behavior change:
|
|
23
|
+
> `bundle exec rspec && go -C go test ./... && script/cli_parity.sh && make check && script/rust_parity.sh`.
|
|
24
|
+
|
|
25
|
+
## Repo layout — Ruby at the root, Go and Rust in subdirs
|
|
26
|
+
|
|
27
|
+
The Ruby gem lives at the repo root (it's the reference implementation and the
|
|
28
|
+
published gem); the two mirror implementations are compartmentalized into
|
|
29
|
+
`go/` and `rust/`. Earlier the Go code was intermixed at the root; it now sits
|
|
30
|
+
in `go/`, symmetric with `rust/`, so the root reads as "Ruby + two ports."
|
|
7
31
|
|
|
8
32
|
```
|
|
9
33
|
iriq/
|
|
10
|
-
lib/ exe/ spec/
|
|
34
|
+
lib/ exe/ spec/ ← Ruby gem (library, CLI, specs) — the reference
|
|
35
|
+
completions/ ← shell-completion scripts shipped by the gem
|
|
11
36
|
iriq.gemspec
|
|
12
37
|
Gemfile
|
|
13
38
|
|
|
14
|
-
go
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
39
|
+
go/ ← Go module github.com/dpep/iriq/go
|
|
40
|
+
go.mod go.sum
|
|
41
|
+
*.go ← Go package `iriq`
|
|
42
|
+
cmd/iriq/ ← Go CLI binary
|
|
43
|
+
completions/ ← Go's own embedded copy (go:embed can't reach ../)
|
|
44
|
+
|
|
45
|
+
rust/ ← Cargo workspace
|
|
46
|
+
Cargo.toml ← workspace root
|
|
47
|
+
iriq/ ← one crate: library + `iriq` CLI binary; inlines completions
|
|
48
|
+
REPORT.md ← Go → Rust port spike notes + perf
|
|
49
|
+
target/ ← Rust build artifacts (gitignored)
|
|
50
|
+
|
|
51
|
+
bin/ ← built Go binary (gitignored)
|
|
52
|
+
script/ ← shared dev scripts (fixture gen, parity, benches)
|
|
53
|
+
spec/fixtures/ ← golden JSON shared by Ruby specs + Go + Rust tests
|
|
54
|
+
.github/workflows/ ← Ruby CI, Go CI, Rust CI, parity CIs
|
|
22
55
|
```
|
|
23
56
|
|
|
24
|
-
|
|
57
|
+
Notes on this layout:
|
|
25
58
|
|
|
26
|
-
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
- The gemspec
|
|
31
|
-
`git ls-files * ':!:spec' ':!:script' ':!:
|
|
59
|
+
- Go's import path is now `github.com/dpep/iriq/go` (the `/go` suffix matches
|
|
60
|
+
the subdir). Consumers import `github.com/dpep/iriq/go`.
|
|
61
|
+
- One version tag (`vX.Y.Z`) serves all three runtimes — Ruby's gemspec, Go's
|
|
62
|
+
module, and Rust's `Cargo.toml` use the same tag stream.
|
|
63
|
+
- The gemspec ships only Ruby + `completions/`, excluding `go/` and `rust/`:
|
|
64
|
+
`git ls-files * ':!:spec' ':!:script' ':!:bin' ':!:rust' ':!:go'`.
|
|
65
|
+
- Completion scripts exist in three places (gem root `completions/`, `go/completions/`
|
|
66
|
+
for `go:embed`, and inlined in the Rust CLI) — keep them in sync like fixtures.
|
|
32
67
|
|
|
33
68
|
## Building
|
|
34
69
|
|
|
@@ -44,24 +79,40 @@ make uninstall # remove from $GOBIN
|
|
|
44
79
|
make clean # remove ./bin/
|
|
45
80
|
make test # go test ./...
|
|
46
81
|
|
|
47
|
-
#
|
|
48
|
-
|
|
82
|
+
# Rust — one crate (library + `iriq` binary), SQLite bundled by default
|
|
83
|
+
cd rust && cargo build --release --bin iriq # → ./rust/target/release/iriq
|
|
84
|
+
cd rust && cargo install --path iriq # install into ~/.cargo/bin
|
|
85
|
+
cd rust && cargo test --workspace
|
|
86
|
+
|
|
87
|
+
# Via Homebrew (builds the Rust CLI from main)
|
|
88
|
+
brew install dpep/tools/iriq
|
|
89
|
+
|
|
90
|
+
# Via crates.io
|
|
91
|
+
cargo install iriq
|
|
49
92
|
```
|
|
50
93
|
|
|
51
|
-
## Keeping
|
|
94
|
+
## Keeping the three runtimes in sync
|
|
52
95
|
|
|
53
|
-
|
|
54
|
-
and behavior.
|
|
96
|
+
Ruby is the **reference implementation**. Go and Rust mirror its public API
|
|
97
|
+
and behavior. Three layers of parity testing keep them aligned:
|
|
55
98
|
|
|
56
99
|
1. **Golden JSON fixtures** (`spec/fixtures/*.json`)
|
|
57
100
|
Generated by `script/generate_fixtures.rb` from the Ruby implementation
|
|
58
|
-
over a curated set of inputs. Go's `fixtures_test.go`
|
|
59
|
-
|
|
101
|
+
over a curated set of inputs. Go's `fixtures_test.go` and Rust's
|
|
102
|
+
`rust/iriq/tests/fixtures.rs` both load each file and assert the same
|
|
103
|
+
outputs.
|
|
60
104
|
|
|
61
|
-
2. **CLI parity harness** (`script/cli_parity.sh`)
|
|
105
|
+
2. **Ruby ↔ Go CLI parity harness** (`script/cli_parity.sh`)
|
|
62
106
|
Runs the same input through `bundle exec exe/iriq` and the Go binary and
|
|
63
107
|
diffs stdout. Lives in CI as the `Ruby ↔ Go parity` job.
|
|
64
108
|
|
|
109
|
+
3. **Rust ↔ Go CLI parity harness** (`script/rust_parity.sh`)
|
|
110
|
+
Same idea — runs every Phase 1 + Phase 2 scenario (single-input,
|
|
111
|
+
pipe-mode, JSON corpus, SQLite corpus, --stats, --reinfer,
|
|
112
|
+
--propose-recognizers, --cross-host-shapes, --host=reg) through the
|
|
113
|
+
Go and Rust binaries and diffs stdout. Lives in CI as the
|
|
114
|
+
`Rust ↔ Go parity` job. Rust transitively inherits Ruby parity via Go.
|
|
115
|
+
|
|
65
116
|
When changing behavior:
|
|
66
117
|
|
|
67
118
|
1. Update the Ruby code + specs first.
|
|
@@ -69,24 +120,54 @@ When changing behavior:
|
|
|
69
120
|
3. Port the change to Go.
|
|
70
121
|
4. `go test ./...` (uses the updated fixtures).
|
|
71
122
|
5. `script/cli_parity.sh` should pass.
|
|
72
|
-
6.
|
|
123
|
+
6. Port the change to Rust under `rust/`.
|
|
124
|
+
7. `cd rust && cargo test --workspace`.
|
|
125
|
+
8. `cd rust && cargo build --release --bin iriq && cd .. && script/rust_parity.sh` should pass.
|
|
126
|
+
9. Commit fixtures with the change — CI will fail if they're stale.
|
|
73
127
|
|
|
74
128
|
## Tests
|
|
75
129
|
|
|
76
130
|
```sh
|
|
77
|
-
bundle exec rspec
|
|
78
|
-
go test ./...
|
|
79
|
-
script/cli_parity.sh
|
|
131
|
+
bundle exec rspec # Ruby suite (305+ examples)
|
|
132
|
+
go test ./... # Go suite (native + fixture parity)
|
|
133
|
+
script/cli_parity.sh # Ruby ↔ Go CLI parity
|
|
134
|
+
cd rust && cargo test --workspace
|
|
135
|
+
cd rust && cargo fmt --check # formatting (CI-gated)
|
|
136
|
+
cd rust && cargo clippy --workspace --all-targets -- -D warnings
|
|
137
|
+
make check # the three Rust checks above, in one shot
|
|
138
|
+
script/rust_parity.sh # Rust ↔ Go CLI parity (~59 scenarios)
|
|
80
139
|
```
|
|
81
140
|
|
|
82
141
|
## Releases
|
|
83
142
|
|
|
84
|
-
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
143
|
+
Versioning is single-stream: one `vX.Y.Z` covers all three runtimes. Bump the
|
|
144
|
+
three version constants **together** — the `--version` parity checks fail
|
|
145
|
+
if they drift:
|
|
146
|
+
|
|
147
|
+
1. `lib/iriq/version.rb` (`VERSION`), `go/version.go` (`Version`), and the two
|
|
148
|
+
`version = "X.Y.Z"` / `pub const VERSION` lines in `rust/iriq/Cargo.toml` and
|
|
149
|
+
`rust/iriq/src/lib.rs` — same string.
|
|
150
|
+
2. `Gemfile.lock` — re-resolve so the pinned `iriq (X.Y.Z)` matches
|
|
151
|
+
(`bundle install`, or it regenerates on the next `bundle exec`). Commit it.
|
|
152
|
+
3. Run `cd rust && cargo update -p iriq` to refresh `Cargo.lock`.
|
|
153
|
+
4. Tag `vX.Y.Z` and push. Go consumers pick it up via
|
|
154
|
+
`go get github.com/dpep/iriq/go@vX.Y.Z`.
|
|
155
|
+
5. `gem push iriq-X.Y.Z.gem` to publish to RubyGems.
|
|
156
|
+
6. `cd rust && cargo publish -p iriq` to publish to crates.io (the crate ships
|
|
157
|
+
both the library and the `iriq` binary).
|
|
158
|
+
|
|
159
|
+
### Keep Homebrew in sync — bump on EVERY version change
|
|
160
|
+
|
|
161
|
+
The tap (`~/code/lib/homebrew-tools`) ships a single `Formula/iriq.rb` that
|
|
162
|
+
builds the Rust CLI (`cargo install --path rust/iriq`) from `branch: "main"`.
|
|
163
|
+
SQLite is on by default (the `iriq` crate's `default` feature set), so there is
|
|
164
|
+
no longer a separate `iriq-sqlite` formula.
|
|
165
|
+
|
|
166
|
+
The formula pins a static `version "X.Y.Z"` label. Because the build tracks
|
|
167
|
+
`main` rather than a tagged tarball, `brew upgrade` only rebuilds when that
|
|
168
|
+
label changes. So on every bump here, update the `version` string in
|
|
169
|
+
`Formula/iriq.rb` to match `version.rb`, then commit + push the tap. Leaving it
|
|
170
|
+
stale means brew users never get the new code even though it's already on `main`.
|
|
90
171
|
|
|
91
172
|
## Corpus storage backends
|
|
92
173
|
|
|
@@ -106,10 +187,14 @@ SQLite file with JSON, etc.).
|
|
|
106
187
|
|
|
107
188
|
The Ruby `sqlite3` gem is loaded lazily (only when a `.db` path is opened),
|
|
108
189
|
keeping the iriq install footprint minimal for users that stick with JSON.
|
|
109
|
-
On the Go side we use `modernc.org/sqlite` (pure Go — no cgo).
|
|
190
|
+
On the Go side we use `modernc.org/sqlite` (pure Go — no cgo). The Rust
|
|
191
|
+
side uses `rusqlite` with the `bundled` feature (statically links C SQLite,
|
|
192
|
+
~3-4 MB binary cost). Schema v4 is shared across all three runtimes — a
|
|
193
|
+
`.db` written by any binary opens cleanly in any other.
|
|
110
194
|
|
|
111
|
-
When adding a new backend, replicate the contract in
|
|
112
|
-
add
|
|
195
|
+
When adding a new backend, replicate the contract in all three languages
|
|
196
|
+
and add parity scenarios in `script/cli_parity.sh`'s `corpus_pair`
|
|
197
|
+
section + `script/rust_parity.sh`'s `corpus_pair`.
|
|
113
198
|
|
|
114
199
|
## What lives where in scripts
|
|
115
200
|
|
|
@@ -117,5 +202,7 @@ add a parity scenario in `script/cli_parity.sh`'s `corpus_pair` section.
|
|
|
117
202
|
- `script/memory.rb` — Ruby-only memory profile.
|
|
118
203
|
- `script/generate_fixtures.rb` — produces `spec/fixtures/*.json` for cross-runtime parity.
|
|
119
204
|
- `script/cli_parity.sh` — Ruby ↔ Go CLI diff.
|
|
205
|
+
- `script/rust_parity.sh` — Rust ↔ Go CLI diff.
|
|
206
|
+
- `script/bench_three_way.sh` — Go vs Rust wall-clock comparison.
|
|
120
207
|
- `script/bench_compare.sh` — Ruby vs Go CLI wall-time comparison.
|
|
121
208
|
- `script/bench_storage.sh` — JSON vs SQLite backend timing (single-process, incremental, concurrent).
|
data/Gemfile.lock
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
PATH
|
|
2
2
|
remote: .
|
|
3
3
|
specs:
|
|
4
|
-
iriq (0.2
|
|
4
|
+
iriq (0.30.2)
|
|
5
5
|
|
|
6
6
|
GEM
|
|
7
7
|
remote: https://rubygems.org/
|
|
@@ -54,7 +54,7 @@ GEM
|
|
|
54
54
|
simplecov_json_formatter (~> 0.1)
|
|
55
55
|
simplecov-html (0.13.2)
|
|
56
56
|
simplecov_json_formatter (0.1.4)
|
|
57
|
-
sqlite3 (2.9.
|
|
57
|
+
sqlite3 (2.9.5)
|
|
58
58
|
mini_portile2 (~> 2.8.0)
|
|
59
59
|
stringio (3.2.0)
|
|
60
60
|
tsort (0.2.0)
|
|
@@ -78,7 +78,7 @@ CHECKSUMS
|
|
|
78
78
|
erb (6.0.4) sha256=38e3803694be357fe2bfe312487c74beaf9fb4e5beb3e22498952fe1645b95d9
|
|
79
79
|
io-console (0.8.2) sha256=d6e3ae7a7cc7574f4b8893b4fca2162e57a825b223a177b7afa236c5ef9814cc
|
|
80
80
|
irb (1.17.0) sha256=168c4ddb93d8a361a045c41d92b2952c7a118fa73f23fe14e55609eb7a863aae
|
|
81
|
-
iriq (0.2
|
|
81
|
+
iriq (0.30.2)
|
|
82
82
|
mini_portile2 (2.8.9) sha256=0cd7c7f824e010c072e33f68bc02d85a00aeb6fce05bb4819c03dfd3c140c289
|
|
83
83
|
pp (0.6.3) sha256=2951d514450b93ccfeb1df7d021cae0da16e0a7f95ee1e2273719669d0ab9df6
|
|
84
84
|
prettyprint (0.2.0) sha256=2bc9e15581a94742064a3cc8b0fb9d45aae3d03a1baa6ef80922627a0766f193
|
|
@@ -95,7 +95,7 @@ CHECKSUMS
|
|
|
95
95
|
simplecov (0.22.0) sha256=fe2622c7834ff23b98066bb0a854284b2729a569ac659f82621fc22ef36213a5
|
|
96
96
|
simplecov-html (0.13.2) sha256=bd0b8e54e7c2d7685927e8d6286466359b6f16b18cb0df47b508e8d73c777246
|
|
97
97
|
simplecov_json_formatter (0.1.4) sha256=529418fbe8de1713ac2b2d612aa3daa56d316975d307244399fa4838c601b428
|
|
98
|
-
sqlite3 (2.9.
|
|
98
|
+
sqlite3 (2.9.5) sha256=04572973a3f943ad50a8adfffc8dd752a5f06e4c3db2026f71838fed8a982606
|
|
99
99
|
stringio (3.2.0) sha256=c37cb2e58b4ffbd33fe5cd948c05934af997b36e0b6ca6fdf43afa234cf222e1
|
|
100
100
|
tsort (0.2.0) sha256=9650a793f6859a43b6641671278f79cfead60ac714148aabe4e3f0060480089f
|
|
101
101
|
|
data/Makefile
CHANGED
|
@@ -1,48 +1,105 @@
|
|
|
1
1
|
# Iriq Go binary — build/install/clean/uninstall helpers.
|
|
2
2
|
#
|
|
3
|
-
# make
|
|
4
|
-
# make build
|
|
5
|
-
# make
|
|
6
|
-
# make
|
|
7
|
-
# make
|
|
8
|
-
# make
|
|
3
|
+
# make - same as `make help`
|
|
4
|
+
# make build - dev build into ./bin/iriq (no SQLite, debug info)
|
|
5
|
+
# make build-sqlite - dev build with SQLite backend included
|
|
6
|
+
# make release - stripped + trimpath build (no SQLite)
|
|
7
|
+
# make release-sqlite - stripped + trimpath build with SQLite
|
|
8
|
+
# make install - go install into $GOBIN
|
|
9
|
+
# make test - go test ./... (both tag states)
|
|
10
|
+
# make clean - remove ./bin/
|
|
11
|
+
# make uninstall - remove the binary from $GOBIN
|
|
12
|
+
#
|
|
13
|
+
# The default build excludes the SQLite backend to keep the binary lean.
|
|
14
|
+
# Pass `-tags sqlite` (or use the *-sqlite targets) to compile it in. The
|
|
15
|
+
# CLI's `--version` output tells you which backends are baked in.
|
|
9
16
|
#
|
|
10
17
|
# Ruby gem build/install is handled by Bundler/RubyGems; see CLAUDE.md.
|
|
11
18
|
|
|
12
|
-
GO
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
19
|
+
GO ?= go
|
|
20
|
+
GO_DIR := go
|
|
21
|
+
BIN_DIR := bin
|
|
22
|
+
BIN := $(BIN_DIR)/iriq
|
|
23
|
+
# Absolute output path: builds run inside $(GO_DIR) via `go -C`, so a
|
|
24
|
+
# relative -o would land under go/. Keep the binary at the repo-root bin/.
|
|
25
|
+
ABS_BIN := $(CURDIR)/$(BIN)
|
|
26
|
+
PKG := ./cmd/iriq
|
|
27
|
+
|
|
28
|
+
# Rust crate lives under rust/; CI gates fmt + clippy + tests there.
|
|
29
|
+
CARGO ?= cargo
|
|
30
|
+
RUST_DIR := rust
|
|
31
|
+
|
|
32
|
+
# Release flags strip the symbol table (-s), debug info (-w), and bake
|
|
33
|
+
# reproducible paths (-trimpath). Drops binary size ~30% with no
|
|
34
|
+
# functional impact; stack-trace function names are gone but file:line
|
|
35
|
+
# resolution still works.
|
|
36
|
+
RELEASE_FLAGS := -ldflags "-s -w" -trimpath
|
|
16
37
|
|
|
17
38
|
# Resolve $GOBIN, falling back to $GOPATH/bin (Go's default install location).
|
|
18
|
-
GOBIN
|
|
39
|
+
GOBIN := $(shell $(GO) env GOBIN)
|
|
19
40
|
ifeq ($(GOBIN),)
|
|
20
|
-
GOBIN
|
|
41
|
+
GOBIN := $(shell $(GO) env GOPATH)/bin
|
|
21
42
|
endif
|
|
22
|
-
INSTALLED
|
|
43
|
+
INSTALLED := $(GOBIN)/iriq
|
|
23
44
|
|
|
24
45
|
.DEFAULT_GOAL := help
|
|
25
|
-
.PHONY: help build install test clean uninstall
|
|
46
|
+
.PHONY: help build build-sqlite release release-sqlite install test clean uninstall check fmt hooks
|
|
26
47
|
|
|
27
48
|
help:
|
|
28
49
|
@echo "Iriq Go targets:"
|
|
29
|
-
@echo " make build
|
|
30
|
-
@echo " make
|
|
31
|
-
@echo " make
|
|
32
|
-
@echo " make
|
|
33
|
-
@echo " make
|
|
50
|
+
@echo " make build slim dev build into $(BIN)"
|
|
51
|
+
@echo " make build-sqlite dev build with SQLite backend"
|
|
52
|
+
@echo " make release stripped slim build into $(BIN)"
|
|
53
|
+
@echo " make release-sqlite stripped build with SQLite backend"
|
|
54
|
+
@echo " make install go install into $(GOBIN)"
|
|
55
|
+
@echo " make test run go test ./... in both tag states"
|
|
56
|
+
@echo " make check Rust gate: cargo fmt --check + clippy + test (run before merging)"
|
|
57
|
+
@echo " make fmt cargo fmt the Rust crate"
|
|
58
|
+
@echo " make hooks enable the committed git hooks (pre-push runs 'make check')"
|
|
59
|
+
@echo " make clean remove $(BIN_DIR)/"
|
|
60
|
+
@echo " make uninstall remove $(INSTALLED)"
|
|
34
61
|
|
|
35
62
|
build:
|
|
36
63
|
@mkdir -p $(BIN_DIR)
|
|
37
|
-
$(GO) build -o $(
|
|
38
|
-
@echo "built $(BIN)"
|
|
64
|
+
$(GO) -C $(GO_DIR) build -o $(ABS_BIN) $(PKG)
|
|
65
|
+
@echo "built $(BIN) (slim, debug)"
|
|
66
|
+
|
|
67
|
+
build-sqlite:
|
|
68
|
+
@mkdir -p $(BIN_DIR)
|
|
69
|
+
$(GO) -C $(GO_DIR) build -tags sqlite -o $(ABS_BIN) $(PKG)
|
|
70
|
+
@echo "built $(BIN) (sqlite, debug)"
|
|
71
|
+
|
|
72
|
+
release:
|
|
73
|
+
@mkdir -p $(BIN_DIR)
|
|
74
|
+
$(GO) -C $(GO_DIR) build $(RELEASE_FLAGS) -o $(ABS_BIN) $(PKG)
|
|
75
|
+
@echo "built $(BIN) (slim, stripped)"
|
|
76
|
+
|
|
77
|
+
release-sqlite:
|
|
78
|
+
@mkdir -p $(BIN_DIR)
|
|
79
|
+
$(GO) -C $(GO_DIR) build -tags sqlite $(RELEASE_FLAGS) -o $(ABS_BIN) $(PKG)
|
|
80
|
+
@echo "built $(BIN) (sqlite, stripped)"
|
|
39
81
|
|
|
40
82
|
install:
|
|
41
|
-
$(GO) install $(PKG)
|
|
83
|
+
$(GO) -C $(GO_DIR) install $(PKG)
|
|
42
84
|
@echo "installed $(INSTALLED)"
|
|
43
85
|
|
|
44
86
|
test:
|
|
45
|
-
$(GO) test ./...
|
|
87
|
+
$(GO) -C $(GO_DIR) test ./...
|
|
88
|
+
$(GO) -C $(GO_DIR) test -tags sqlite ./...
|
|
89
|
+
|
|
90
|
+
# The Rust gate — mirrors CI's Rust job. Run before merging/pushing (the
|
|
91
|
+
# pre-push hook runs this for you once `make hooks` is enabled).
|
|
92
|
+
check:
|
|
93
|
+
cd $(RUST_DIR) && $(CARGO) fmt --check
|
|
94
|
+
cd $(RUST_DIR) && $(CARGO) clippy --workspace --all-targets -- -D warnings
|
|
95
|
+
cd $(RUST_DIR) && $(CARGO) test --workspace
|
|
96
|
+
|
|
97
|
+
fmt:
|
|
98
|
+
cd $(RUST_DIR) && $(CARGO) fmt
|
|
99
|
+
|
|
100
|
+
hooks:
|
|
101
|
+
git config core.hooksPath .githooks
|
|
102
|
+
@echo "git hooks enabled (.githooks) — pre-push now runs 'make check'"
|
|
46
103
|
|
|
47
104
|
clean:
|
|
48
105
|
rm -rf $(BIN_DIR)
|