lda-ruby 0.3.9 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +5 -13
- data/CHANGELOG.md +8 -0
- data/Gemfile +9 -0
- data/README.md +123 -3
- data/VERSION.yml +3 -3
- data/docs/modernization-handoff.md +190 -0
- data/docs/porting-strategy.md +127 -0
- data/docs/precompiled-platform-policy.md +68 -0
- data/docs/release-runbook.md +157 -0
- data/ext/lda-ruby/extconf.rb +10 -6
- data/ext/lda-ruby/lda-inference.c +21 -5
- data/ext/lda-ruby-rust/Cargo.toml +12 -0
- data/ext/lda-ruby-rust/README.md +48 -0
- data/ext/lda-ruby-rust/extconf.rb +123 -0
- data/ext/lda-ruby-rust/src/lib.rs +456 -0
- data/lda-ruby.gemspec +0 -0
- data/lib/lda-ruby/backends/base.rb +129 -0
- data/lib/lda-ruby/backends/native.rb +158 -0
- data/lib/lda-ruby/backends/pure_ruby.rb +613 -0
- data/lib/lda-ruby/backends/rust.rb +226 -0
- data/lib/lda-ruby/backends.rb +58 -0
- data/lib/lda-ruby/corpus/corpus.rb +17 -15
- data/lib/lda-ruby/corpus/data_corpus.rb +2 -2
- data/lib/lda-ruby/corpus/directory_corpus.rb +2 -2
- data/lib/lda-ruby/corpus/text_corpus.rb +2 -2
- data/lib/lda-ruby/document/document.rb +6 -6
- data/lib/lda-ruby/document/text_document.rb +5 -4
- data/lib/lda-ruby/rust_build_policy.rb +21 -0
- data/lib/lda-ruby/version.rb +5 -0
- data/lib/lda-ruby.rb +293 -48
- data/test/backend_compatibility_test.rb +146 -0
- data/test/backends_selection_test.rb +100 -0
- data/test/gemspec_test.rb +27 -0
- data/test/lda_ruby_test.rb +49 -11
- data/test/packaged_gem_smoke_test.rb +33 -0
- data/test/release_scripts_test.rb +54 -0
- data/test/rust_build_policy_test.rb +23 -0
- data/test/simple_pipeline_test.rb +22 -0
- data/test/simple_yaml.rb +1 -7
- data/test/test_helper.rb +5 -6
- metadata +48 -38
- data/Rakefile +0 -61
- data/ext/lda-ruby/Makefile +0 -181
- data/test/data/.gitignore +0 -2
- data/test/simple_test.rb +0 -26
checksums.yaml
CHANGED
@@ -1,15 +1,7 @@
 ---
-
-metadata.gz:
-
-data.tar.gz: !binary |-
-OTNjZjE5MGNmOGI2YzY3YzhlNDRiYTBlNDM5NmUwYmY4Mjc2ZmNkNQ==
+SHA256:
+metadata.gz: d0b97c33082528d2f992c4686e85e2080746938cd898c2a0662dd357979ad650
+data.tar.gz: e4bd8e49f0b3f295b0f3ebfd78fc32db4835f24d93837ad566069b83046538c2
 SHA512:
-metadata.gz:
-
-ZmFjZDZiZWIwZmRiMDJlYjNjYTg5YWQ0N2RlOTY1MTg0YzZjZTc0NGQ0YmI1
-ZDljMjljNTA3ODdlZjBjNjNjMTc0MGRlMjBhODQzZTg3YWM5OWE=
-data.tar.gz: !binary |-
-OGFiMzVhMzY4OTFmZTkzMmM3YjQxMzNkM2JlMjVjOTRjM2ZhNTc1YmUxZTRj
-Y2M3YzZiNzNiYmVkNDI1ZTUxODhkMjhlYTQ4MmRhZjVkZjQxMDhjOTk5Njc4
-OTZhOTM5ZTQ3OTFhY2U1YjRhODQyYzBkZWNlYzhjNzBjMWFlNGU=
+metadata.gz: cebd90cdaca9d030379105509325aac12f457eb9da75f8e2fc8f2d618d1ecbf201dc6b4407ec28e8f95b1dda4a71ba864b37ea16ad38a22b7605f2b7e281f908
+data.tar.gz: fb452cc42435ff9382a39f878e8d7de5e3825e1b800e1cca4c2bec7684294d10fdec960f11cefc3f18a8c6381552e14ff058ecebd50884f91115d65e93880825
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,11 @@
+version 0.4.0
+=============
+
+- Ruby 3.2+/3.3+ modernization release.
+- Added Rust/native/pure backend selection and packaged-gem install policy validation.
+- Added tag-driven release automation and release runbook.
+- Added precompiled platform gem build/publish pipeline (Linux + macOS) and compatibility policy documentation.
+
 version 0.3.9
 =============
 
data/Gemfile
ADDED
data/README.md
CHANGED
@@ -17,6 +17,116 @@ The original C code relied on files for the input and output. We felt it was nec
 
 If you have general questions about Latent Dirichlet Allocation, I urge you to use the [topic models mailing list][topic-models], since the people who monitor that are very knowledgeable. If you encounter bugs specific to lda-ruby, please post an issue on the Github project.
 
+## Development
+
+### Local (Ruby 3.2+)
+
+```bash
+bundle install
+bundle exec rake test
+```
+
+### Docker (recommended for isolated setup)
+
+```bash
+./bin/docker-test
+```
+
+Rust backend runtime checks in Docker:
+
+```bash
+./bin/docker-test-rust
+```
+
+Install policy matrix checks in Docker:
+
+```bash
+./bin/docker-test-install-policies
+```
+
+For an interactive shell inside the dev container:
+
+```bash
+./bin/docker-shell
+```
+
+For an interactive shell with Rust toolchain + bindgen dependencies:
+
+```bash
+./bin/docker-shell-rust
+```
+
+### Build tasks
+
+- `bundle exec rake compile` builds the native extension.
+- `bundle exec rake compile_rust` builds the experimental Rust extension and stages a Ruby-loadable artifact (`lda_ruby_rust.<dlext>`).
+- On macOS, build tasks automatically add the Rust linker flag for Ruby extension `dynamic_lookup`.
+- `bundle exec rake test` rebuilds the extension, then runs tests.
+- `bundle exec rake build` builds the gem package.
+- `bundle exec ruby -Ilib:test test/backend_compatibility_test.rb` runs backend compatibility fixtures.
+- `LDA_RUBY_BACKEND=rust bundle exec ruby -Ilib:test test/backend_compatibility_test.rb` runs parity checks in rust mode.
+- `./bin/benchmark-backends` benchmarks available backends (`pure`, `native`, `rust`) and prints JSON.
+- `./bin/docker-test-install-policies` verifies packaged-gem install behavior for `LDA_RUBY_RUST_BUILD=auto|always|never`, including runtime EM smoke checks.
+- `./bin/test-packaged-gem-fallback` verifies packaged-gem fallback behavior without Cargo (auto/never succeed, always fails) plus runtime smoke checks.
+- `./bin/test-packaged-gem-rust-enabled` verifies packaged-gem behavior with Cargo available (auto/always enable Rust, never disables Rust) plus runtime smoke checks.
+- `./bin/test-packaged-gem-manifest` verifies packaged-gem contents/metadata and rejects leaked build artifacts.
+- `./bin/release-preflight` runs unit tests + packaged-gem validation stack; set `SKIP_DOCKER=1` to skip Docker matrix checks.
+- `./bin/check-version-sync` verifies version parity between `VERSION.yml`, `lib/lda-ruby/version.rb`, and expected release tag.
+- `./bin/release-prepare 0.4.0` updates version/changelog files for a new release version.
+- `./bin/release-artifacts --tag v0.4.0` runs release checks, builds the source gem, and writes SHA256 checksums.
+- `./bin/release-precompiled-artifacts --tag v0.4.0 --platform x86_64-linux --skip-preflight` builds a precompiled platform gem and verifies install/runtime smoke checks.
+- The `--platform` value must match the current host platform.
+
+Benchmark environment variables:
+- `BENCH_RUNS` (default: `3`)
+- `BENCH_START` (default: `seeded`)
+- `BENCH_TOPICS` (default: `8`)
+- `BENCH_MAX_ITER` (default: `20`)
+- `BENCH_EM_MAX_ITER` (default: `40`)
+
+### Install-time Rust build policy
+
+Source installs now run both extension setup scripts (`ext/lda-ruby/extconf.rb` and `ext/lda-ruby-rust/extconf.rb`).
+
+Rust build policy is controlled by `LDA_RUBY_RUST_BUILD`:
+- `auto` (default): build Rust extension if `cargo` is available, otherwise skip.
+- `always`: require Rust extension build and fail install if unavailable.
+- `never`: skip Rust extension build.
+
+Examples:
+- `LDA_RUBY_RUST_BUILD=always gem install lda-ruby`
+- `LDA_RUBY_RUST_BUILD=never bundle exec rake compile`
+
+### Precompiled platform gems
+
+Releases publish a source gem plus precompiled platform gems for:
+- `x86_64-linux`
+- `x86_64-darwin`
+- `arm64-darwin`
+
+On these platforms, installation should not require local C/Rust toolchains.
+Other platforms install from source gem and use the existing install-time fallback policy.
+
+For artifact strategy, compatibility targets, and rollout/deprecation rules, see `docs/precompiled-platform-policy.md`.
+
+### Backend selection
+
+- Default mode is `auto`: Rust backend when available, otherwise native extension, otherwise pure Ruby.
+- Force pure Ruby backend:
+- `Lda::Lda.new(corpus, backend: :pure)`
+- or `LDA_RUBY_BACKEND=pure`
+- Force native backend:
+- `Lda::Lda.new(corpus, backend: :native)`
+- Force Rust backend (when extension is available):
+- `Lda::Lda.new(corpus, backend: :rust)`
+- or `LDA_RUBY_BACKEND=rust`
+
+`em("seeded")` is supported by both native and pure backends for deterministic fixture-oriented runs.
+
+Rust status: the extension hook layer is scaffolded in `ext/lda-ruby-rust`. Current Rust kernels include batched per-iteration corpus inference, batched per-document inference, topic-weights-per-word, topic-term-count accumulation, topic-term normalization/log-beta finalization, gamma-shift convergence reduction, topic-document average log-probability computation, and seeded topic-term initialization when `backend: :rust` is active; remaining model math still delegates to the pure Ruby backend. CI now runs dedicated rust-runtime checks and numeric parity fixtures against the pure backend.
+`compile_rust` and `LDA_RUBY_RUST_BUILD=always` require a Rust toolchain plus Ruby development headers and `libclang`.
+Gem packaging excludes local Rust build artifacts (`ext/lda-ruby-rust/target/**`) so local cargo outputs do not leak into published gems.
+
 ## Resources
 
 + [Blog post about LDA-Ruby][lda-ruby]
@@ -29,9 +139,19 @@ If you have general questions about Latent Dirichlet Allocation, I urge you to u
 Blei, David M., Ng, Andrew Y., and Jordan, Michael I. 2003. Latent dirichlet allocation. Journal of Machine Learning Research. 3 (Mar. 2003), 993-1022 [[pdf][pdf]].
 
 [svmlight]: http://svmlight.joachims.org
-[lda-ruby]: http://mendicantbug.com/2008/11/17/lda-in-ruby/
-[blei]: http://www.cs.princeton.edu/~blei/lda-c/
+[lda-ruby]: http://web.archive.org/web/20120616115448/http://mendicantbug.com/2008/11/17/lda-in-ruby/
+[blei]: http://web.archive.org/web/20161126004857/http://www.cs.princeton.edu/~blei/lda-c/
 [wikipedia]: http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
-[ap-data]: http://www.cs.princeton.edu/~blei/lda-c/ap.tgz
+[ap-data]: http://web.archive.org/web/20160507090044/http://www.cs.princeton.edu/~blei/lda-c/ap.tgz
 [pdf]: http://www.cs.princeton.edu/picasso/mats/BleiNgJordan2003_blei.pdf
 [topic-models]: https://lists.cs.princeton.edu/mailman/listinfo/topic-models
+
+## Modernization
+
+For a Ruby 3.2+/3.3+ porting proposal, see `docs/porting-strategy.md`.
+
+For the latest implementation status and exact resume instructions, see `docs/modernization-handoff.md`.
+
+For release steps and rollback guidance, see `docs/release-runbook.md`.
+
+For precompiled gem strategy and compatibility policy, see `docs/precompiled-platform-policy.md`.
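The README's backend-selection rules can be summarized as a small resolver:

- An explicit `backend:` option or `LDA_RUBY_BACKEND` forces a backend.
- Otherwise `auto` tries Rust, then native, then pure Ruby.

This is a hypothetical Ruby sketch, not lda-ruby's `Lda::Backends.build`. The names `resolve_backend`, `BackendUnavailable`, and the `available:` list are assumptions. The sketch also assumes two behaviors the README does not spell out:

- An explicit `backend:` argument takes precedence over the environment variable.
- Forcing a backend that did not load raises an error instead of falling back.

```ruby
# Hypothetical sketch of the README's backend-selection rules.
# Not lda-ruby's actual Lda::Backends.build implementation.

AUTO_ORDER = %i[rust native pure].freeze

class BackendUnavailable < StandardError; end

# requested: explicit backend (e.g. from `backend: :pure`), or nil.
# env:       environment hash consulted for LDA_RUBY_BACKEND.
# available: backends whose code actually loaded on this host.
def resolve_backend(requested: nil, env: ENV, available: [:pure])
  # Assumption: an explicit argument takes precedence over the env var.
  mode = (requested || env.fetch("LDA_RUBY_BACKEND", "auto")).to_sym

  case mode
  when :auto
    # First available backend in preference order: rust, native, pure.
    AUTO_ORDER.find { |name| available.include?(name) } ||
      raise(BackendUnavailable, "no backend available")
  when *AUTO_ORDER
    # Assumption: a forced backend that did not load is an error, not a fallback.
    raise BackendUnavailable, "#{mode} backend requested but not loaded" unless available.include?(mode)

    mode
  else
    raise ArgumentError, "unknown backend: #{mode.inspect}"
  end
end
```

For example, on a source install where Cargo was missing, `resolve_backend(env: {}, available: %i[native pure])` returns `:native`. That matches the README's "otherwise native extension" step.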
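The install-time `LDA_RUBY_RUST_BUILD` policy described in the README (`auto`, `always`, `never`) reduces to a small decision table. The sketch below is hypothetical. The diff adds `lib/lda-ruby/rust_build_policy.rb` and `ext/lda-ruby-rust/extconf.rb`, but their code is not shown here, and this sketch is not copied from them. Three behaviors are assumptions:

- Policy values are parsed case-insensitively.
- An unset or empty value is treated as `auto`.
- Unknown values are rejected.

```ruby
# Hypothetical sketch of the LDA_RUBY_RUST_BUILD semantics described in the
# README; not copied from lda-ruby's rust_build_policy.rb or extconf.rb.

RUST_BUILD_POLICIES = %w[auto always never].freeze

# Reads the policy, defaulting to "auto" when unset or empty (assumption).
def rust_build_policy(env = ENV)
  value = env.fetch("LDA_RUBY_RUST_BUILD", "").to_s.strip.downcase
  value = "auto" if value.empty?
  unless RUST_BUILD_POLICIES.include?(value)
    raise ArgumentError, "invalid LDA_RUBY_RUST_BUILD value: #{value.inspect}"
  end

  value
end

# True if an executable file named `cargo` is on PATH.
# (Windows would also need to check "cargo.exe"; omitted for brevity.)
def cargo_available?(path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, "cargo")
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Maps a policy to :build or :skip, raising when "always" cannot be honored.
def rust_build_decision(policy, cargo_available:)
  case policy
  when "never" then :skip
  when "auto" then cargo_available ? :build : :skip
  when "always"
    raise "LDA_RUBY_RUST_BUILD=always, but cargo was not found" unless cargo_available

    :build
  else
    raise ArgumentError, "unknown policy: #{policy.inspect}"
  end
end
```

A setup script following this shape would call `rust_build_decision(rust_build_policy, cargo_available: cargo_available?)`. It would skip the Rust build on `:skip` and let the raised error abort the install in the `always` case.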
data/VERSION.yml
CHANGED
data/docs/modernization-handoff.md
ADDED
@@ -0,0 +1,190 @@
+# Modernization Handoff (Resume Guide)
+
+This document is the canonical handoff state for continuing the Ruby 3.2+/3.3+ modernization in a new conversation.
+
+## Snapshot
+
+- Snapshot date: 2026-02-25
+- Active branch: `codex/modernization`
+- Branch head at snapshot: `7f9b101` (`Fix macOS Rust extension linking for precompiled builds`)
+- Repo status at snapshot: clean working tree on `codex/modernization` (in sync with `origin/codex/modernization`)
+- Latest release dry-run validation (GitHub Actions):
+- date: 2026-02-25
+- workflow run: `release.yml` run `22382692416` (`workflow_dispatch`, `publish=false`)
+- result: success (`validate release candidate`, `build release artifacts`, and all `build precompiled artifacts` matrix targets)
+- publish jobs skipped by design (`publish=false`)
+- Open pull request:
+- `codex/modernization` -> `master`
+- PR `#18`: `https://github.com/ealdent/lda-ruby/pull/18`
+- Latest PR CI validation (GitHub Actions):
+- date: 2026-02-25
+- workflow run: `CI` run `22383301379` (trigger: `pull_request` for PR `#18`)
+- result: success (all checks green, including `precompiled gem build` matrix and install policy/rust runtime jobs)
+
+## Project Goal
+
+Modernize `lda-ruby` for Ruby 3.2+/3.3+ with:
+
+- stable Ruby API compatibility
+- high-performance Rust backend for hot paths
+- pure Ruby fallback for portability
+- reliable packaging and release validation
+
+## Current Backend Behavior
+
+- `auto` selection order: `rust` -> `native` -> `pure_ruby`
+- `LDA_RUBY_BACKEND` override supported for `rust`, `native`, and `pure`
+- Rust build policy for source installs: `LDA_RUBY_RUST_BUILD=auto|always|never`
+
+## Phase Status
+
+### Phase 1 (API/tests stabilization)
+
+Status: complete.
+
+Delivered:
+
+- expanded compatibility fixtures
+- CI test coverage for multiple backend modes
+
+### Phase 2 (backend boundary extraction)
+
+Status: complete.
+
+Delivered:
+
+- `Lda::Lda` delegates through backend adapters
+- backend selection normalized through `Lda::Backends.build`
+
+### Phase 3 (pure Ruby reference backend)
+
+Status: complete.
+
+Delivered:
+
+- full pure Ruby backend path with EM/model outputs
+- pure backend compatibility in tests
+
+### Phase 4 (Rust native backend)
+
+Status: mostly complete.
+
+Delivered:
+
+- Rust extension scaffold with magnus/rb_sys
+- Rust kernels for the main hot loops:
+- corpus iteration
+- document inference
+- topic weights
+- topic-term accumulation
+- topic-term finalization (`beta`/`log(beta)`)
+- gamma shift reduction
+- topic-document probability
+- seeded initialization
+- trusted kernel-output fast path enabled in rust mode
+- parity/compatibility test coverage and rust runtime CI
+
+Open in Phase 4:
+
+- optional deeper Rust ownership of orchestration logic (current design still intentionally delegates control flow through Ruby fallback scaffolding)
+
+### Phase 5 (packaging/release)
+
+Status: Phase 5A complete (source-gem release automation), Phase 5B complete for initial Linux/macOS precompiled gems.
+
+Delivered:
+
+- clean gem file filtering (no local cargo/native artifacts)
+- Docker install-policy matrix checks
+- packaged gem runtime checks without Cargo (`bin/test-packaged-gem-fallback`)
+- packaged gem runtime checks with Cargo (`bin/test-packaged-gem-rust-enabled`)
+- packaged gem manifest/metadata gate (`bin/test-packaged-gem-manifest`)
+- single-command local gate (`bin/release-preflight`)
+- version/tag parity guard (`bin/check-version-sync`)
+- deterministic release preparation helper (`bin/release-prepare`)
+- release artifact builder with checksum output (`bin/release-artifacts`)
+- precompiled artifact builder + runtime validator (`bin/release-precompiled-artifacts`)
+- gemspec precompiled variant support (`LDA_RUBY_GEM_VARIANT=precompiled`)
+- precompiled platform compatibility/publish policy (`docs/precompiled-platform-policy.md`)
+- macOS Rust build linker guardrail (`dynamic_lookup`) for precompiled packaging paths
+- tag-driven release workflow (`.github/workflows/release.yml`)
+- maintainer release runbook (`docs/release-runbook.md`)
+- CI jobs for packaged-gem fallback, rust-enabled checks, and manifest checks
+- CI precompiled gem build guardrail job (`precompiled-gem-build`)
+- release workflow matrix for precompiled gems:
+- `x86_64-linux`
+- `x86_64-darwin`
+- `arm64-darwin`
+
+Open in Phase 5:
+
+- optional expansion of precompiled targets (for example Windows and/or musl Linux)
+- tighter post-publish verification/alerting for multi-artifact release runs
+
+## Validation Commands
+
+Core:
+
+- `./bin/docker-test`
+- `./bin/docker-test-rust`
+
+Packaging/release checks:
+
+- `./bin/check-version-sync`
+- `./bin/test-packaged-gem-manifest`
+- `./bin/test-packaged-gem-fallback`
+- `./bin/test-packaged-gem-rust-enabled`
+- `SKIP_DOCKER=1 ./bin/release-preflight`
+- `./bin/release-artifacts --tag v0.4.0`
+- `./bin/release-precompiled-artifacts --tag v0.4.0 --skip-preflight`
+
+Optional full Docker matrix:
+
+- `./bin/docker-test-install-policies`
+
+Performance tracking:
+
+- `./bin/benchmark-backends`
+
+## CI Jobs Expected
+
+- native tests (`test-native`)
+- pure backend tests (`test-pure`)
+- rust runtime tests (`rust-runtime`)
+- Docker install policy matrix (`install-policy-matrix`)
+- packaged gem fallback checks (`packaged-gem-fallback`)
+- packaged gem rust-enabled checks (`packaged-gem-rust-enabled`)
+- packaged gem manifest checks (`packaged-gem-manifest`)
+- precompiled gem build checks (`precompiled-gem-build`)
+- rust scaffold check (`rust-scaffold`)
+- release validation/build/publish pipeline on `v*` tags (`release.yml`)
+
+## Remaining Work Queue
+
+Priority 1:
+
+- decide whether to keep current hybrid rust-kernel architecture or move more orchestration into Rust
+- if moving deeper into Rust, define parity guardrails and benchmark thresholds before refactors
+
+Priority 2:
+
+- evaluate additional precompiled targets (Windows and/or musl Linux)
+- add explicit post-publish verification checks for all uploaded release artifacts
+
+Priority 3:
+
+- define automated alerts/notifications for release artifact publish failures
+
+## Resume Instructions For A New Conversation
+
+1. Check out `codex/modernization`.
+2. Open this file first: `docs/modernization-handoff.md`.
+3. Run `SKIP_DOCKER=1 ./bin/release-preflight`.
+4. Review `docs/release-runbook.md` for release flow/rollback details.
+5. Validate precompiled packaging locally for your host:
+- `./bin/release-precompiled-artifacts --tag "$(./bin/check-version-sync --print-tag)" --skip-preflight`
+6. Continue with `Priority 1` items under "Remaining Work Queue".
+
+If you want the next assistant to continue immediately, use:
+
+"Open `docs/modernization-handoff.md`, validate with `SKIP_DOCKER=1 ./bin/release-preflight`, run `./bin/release-precompiled-artifacts --skip-preflight`, and continue the remaining modernization queue."
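`bin/check-version-sync` is described as checking that three values agree: `VERSION.yml`, `lib/lda-ruby/version.rb`, and the release tag. Its `--print-tag` flag prints the expected tag. A hypothetical sketch of that comparison follows. It rests on three assumptions, since none of these file formats appear in this diff:

- `VERSION.yml` stores `major`/`minor`/`patch` keys, either as strings or as symbol keys such as `:major:`.
- `version.rb` assigns a `VERSION = "x.y.z"` string.
- Tags are `v`-prefixed, as in `--tag v0.4.0`.

```ruby
require "yaml"

# Hypothetical version/tag parity check in the spirit of bin/check-version-sync.
# File formats below are assumptions; they are not shown in this diff.

# Reads "major.minor.patch" from VERSION.yml text with string or symbol keys.
def version_from_yaml(text)
  data = YAML.safe_load(text, permitted_classes: [Symbol])
  %w[major minor patch].map { |key| data.fetch(key) { data.fetch(key.to_sym) } }.join(".")
end

# Extracts the string assigned to VERSION in version.rb text.
def version_from_ruby(text)
  text[/VERSION\s*=\s*["']([^"']+)["']/, 1] || raise(ArgumentError, "VERSION constant not found")
end

# Expected release tag, following the `--tag v0.4.0` form used by the release scripts.
def expected_tag(version)
  "v#{version}"
end

# Returns a list of problems; an empty list means everything is in sync.
def version_sync_problems(yaml_text:, ruby_text:, tag: nil)
  yaml_version = version_from_yaml(yaml_text)
  ruby_version = version_from_ruby(ruby_text)
  problems = []
  if yaml_version != ruby_version
    problems << "VERSION.yml has #{yaml_version}, version.rb has #{ruby_version}"
  end
  if tag && tag != expected_tag(ruby_version)
    problems << "tag #{tag} does not match #{expected_tag(ruby_version)}"
  end
  problems
end
```

A CLI wrapper would read both files, pass the tag under release (for example from the CI ref), and exit non-zero when `version_sync_problems` is non-empty.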
data/docs/porting-strategy.md
ADDED
@@ -0,0 +1,127 @@
+# Ruby 3.2+/3.3+ Porting Strategy (Experimental)
+
+## Recommendation
+
+For this gem, the best long-term path is:
+
+1. **Keep the public Ruby API.**
+2. **Replace the handwritten C extension bindings with Rust + magnus** (Ruby native extension bindings for modern CRuby).
+3. **Ship a pure-Ruby fallback backend** for portability and easier debugging.
+
+This gives a practical balance between maintenance and speed:
+
+- Pure Ruby only: easiest to maintain, but likely much slower for training.
+- Modern C extension rewrite: fast, but still painful to maintain against Ruby internals.
+- Rust extension: native speed with safer memory management and cleaner FFI layer.
+
+## Why this path fits this project
+
+- The current project already has a stable Ruby-facing object model (`Lda::Lda`, corpus/document classes).
+- The expensive parts are numeric loops in inference/training, which benefit from native code.
+- Ruby 3.2+ compatibility is much easier to preserve with a modern binding layer than with legacy C wrapper patterns.
+
+## Proposed architecture
+
+- `Lda::Backends::Rust` (preferred in `auto` mode when extension loads)
+- `Lda::Backends::Native` (fallback in `auto` mode when Rust is unavailable)
+- `Lda::Backends::PureRuby` (always available)
+- `Lda::Lda` delegates heavy operations to the selected backend.
+
+Suggested backend selection:
+
+- `ENV['LDA_RUBY_BACKEND']=pure_ruby` → force Ruby backend.
+- default (`auto`) → try Rust backend, then native backend, then pure Ruby.
+
+## Migration plan
+
+### Current status
+
+Completed in `codex/experiment-ruby3-modernization`:
+
+- Phase 1 baseline test capture expanded with backend compatibility fixtures.
+- Phase 2 backend boundary extraction (`Lda::Lda` now delegates through backend adapters).
+- Phase 3 pure Ruby backend implementation (available as `backend: :pure` or `LDA_RUBY_BACKEND=pure`).
+- CI matrix added for Ruby 3.2/3.3 with native and pure backend jobs.
+- Phase 4 started with Rust extension scaffolding (`ext/lda-ruby-rust`) and backend mode wiring (`backend: :rust` when extension is available).
+- Rust kernels ported so far:
+- batched per-iteration corpus inference
+- batched per-document inference loop (EM inner updates)
+- per-word topic-weight computation
+- topic-term accumulation from per-document `phi`
+- topic-term normalization and log-beta finalization in EM
+- gamma convergence shift reduction between EM iterations
+- topic-document average log-probability computation
+- seeded topic-term initialization
+- Rust runtime CI job added (compile + execute rust backend tests).
+- Rust/Pure numeric parity fixtures added for deterministic seeded runs.
+- `compile_rust` now stages a Ruby-loadable extension artifact to avoid `Init_` symbol mismatch from Cargo's `lib*` output naming.
+- Dockerized rust runtime workflow added for local parity with CI (`Dockerfile.rust`, `bin/docker-test-rust`).
+- Gem packaging now excludes local Rust cargo build artifacts (`target/**`) for clean release builds.
+- Backend benchmark driver added (`bin/benchmark-backends`) to track pure/native/rust runtime deltas.
+- Source install path now has explicit Rust build policy via `LDA_RUBY_RUST_BUILD=auto|always|never`.
+- Docker install-policy matrix script added (`bin/docker-test-install-policies`) to verify source install behavior across environments.
+- CI now runs install-policy matrix checks on Ubuntu.
+- Install-policy matrix now runs packaged-gem runtime smoke checks (auto/pure/native/rust mode selection + EM pipeline) to validate release-time fallback behavior.
+- Cross-OS packaged-gem fallback CI job added (`bin/test-packaged-gem-fallback`) to validate auto/never/always install policy semantics without Cargo.
+- Packaged-gem Rust-enabled CI job added (`bin/test-packaged-gem-rust-enabled`) to validate auto/never/always install policy semantics with Cargo available.
+- Packaged-gem manifest CI job added (`bin/test-packaged-gem-manifest`) to enforce release artifact contents and metadata.
+- Local release preflight command added (`bin/release-preflight`) to run unit + packaged-gem validation checks in one pass.
+- Version/tag sync guard added (`bin/check-version-sync`) to enforce parity between `VERSION.yml`, `lib/lda-ruby/version.rb`, and release tags.
+- Release preparation helper added (`bin/release-prepare`) for deterministic version/changelog updates.
+- Release artifact helper added (`bin/release-artifacts`) to build source gem artifacts with SHA256 checksums.
+- Precompiled platform artifact helper added (`bin/release-precompiled-artifacts`) to build + validate native gems.
+- Tag-driven release workflow added (`.github/workflows/release.yml`) with dry-run support and environment-gated publish jobs.
+- CI precompiled guardrail job added (`precompiled-gem-build`) for Linux/macOS packaging checks.
+- Maintainer release runbook added (`docs/release-runbook.md`) with publish and rollback/yank procedures.
+- Precompiled platform support policy added (`docs/precompiled-platform-policy.md`).
+
+For an up-to-date resume snapshot (phase status + exact remaining queue), see `docs/modernization-handoff.md`.
+
+### Phase 1: Stabilize API and tests
+
+- Capture current behavior with golden tests around:
+- corpus loading
+- EM convergence hooks
+- topic-word output format
+- `top_words`, `top_word_indices`, and `phi` shape
+- Add CI matrix for Ruby 3.2, 3.3, and latest.
+
+### Phase 2: Extract backend boundary
+
+- Introduce backend interface in Ruby.
+- Keep all existing high-level classes and output methods.
+- Route existing calls through one backend object.
+
+### Phase 3: Add pure Ruby reference backend
+
+- Implement a simple, correct (not necessarily fast) Gibbs or variational inference path.
+- Use this backend in tests as the compatibility baseline.
+
+### Phase 4: Add Rust native backend
+
+- Implement performance-critical loops in Rust.
+- Expose only minimal Ruby-facing methods.
+- Verify parity against pure Ruby backend on deterministic fixtures.
+
+### Phase 5: Packaging and release
+
+- Phase 5A (source-gem release automation): complete.
+- Keep source build path available.
+- Phase 5B (precompiled/native gem publishing): complete for initial Linux/macOS targets via `bin/release-precompiled-artifacts` and release workflow matrix builds.
+
+## Tooling suggestions
+
+- Ruby test framework: Minitest (already present).
+- Native extension: `magnus` + `rb_sys`.
+- Performance checks: benchmark script comparing legacy behavior vs pure Ruby vs Rust.
+- CI: GitHub Actions with matrix (`ubuntu`, `macos`) and Ruby 3.2/3.3/latest.
+
+## What not to do first
+
+- Do not start by rewriting all algorithms at once.
+- Do not couple file parsing and inference internals.
+- Do not rely on old Ruby C API macros that changed across versions.
+
+## Decision summary
+
+If the goal is a future-proof gem with acceptable speed and much lower maintenance pain, **use a hybrid model (Rust native backend + pure Ruby fallback)** instead of a full pure-Ruby rewrite or another large handwritten C extension.
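The porting plan calls for verifying Rust output against the pure Ruby backend on deterministic fixtures. One way to express that check is an element-wise comparison of nested numeric arrays with absolute and relative tolerances. This helper (`parity_violations`) is hypothetical. Its default tolerances are illustrative assumptions, not values taken from lda-ruby's `test/backend_compatibility_test.rb`.

```ruby
# Hypothetical parity helper for comparing backend outputs on a deterministic
# fixture (e.g. pure Ruby vs Rust). Tolerances are illustrative assumptions.

# Returns [index_path, detail] pairs for every element that differs; detail is
# the absolute difference, or :shape_mismatch when array shapes disagree.
def parity_violations(reference, candidate, atol: 1e-9, rtol: 1e-6, path: [])
  if reference.is_a?(Array) || candidate.is_a?(Array)
    same_shape = reference.is_a?(Array) && candidate.is_a?(Array) &&
                 reference.size == candidate.size
    return [[path, :shape_mismatch]] unless same_shape

    reference.each_with_index.flat_map do |ref_value, i|
      parity_violations(ref_value, candidate[i], atol: atol, rtol: rtol, path: path + [i])
    end
  else
    diff = (reference - candidate).abs
    # Relative tolerance is scaled by the reference (pure Ruby) value.
    diff <= atol + rtol * reference.abs ? [] : [[path, diff]]
  end
end
```

In a Minitest test, `assert_empty parity_violations(pure_result, rust_result)` would fail with the list of offending index paths. This makes it easier to see which topic/term cell drifted than a single boolean comparison would.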
data/docs/precompiled-platform-policy.md
ADDED
@@ -0,0 +1,68 @@
+# Precompiled Platform Gem Policy (Phase 5B)
+
+This document defines the publish strategy and compatibility policy for `lda-ruby` precompiled gems.
+
+## Artifact Strategy
+
+Each release version publishes a split package set:
+
+- Source gem: `lda-ruby-<version>.gem`
+- Precompiled platform gems:
+- `lda-ruby-<version>-x86_64-linux.gem`
+- `lda-ruby-<version>-x86_64-darwin.gem`
+- `lda-ruby-<version>-arm64-darwin.gem`
+
+The source gem remains the universal fallback. Platform gems are additive and are expected to install without local build tools.
+Precompiled artifacts are built on matching host runners (no cross-compilation in current workflow).
+
+## Compatibility Policy
+
+- Supported Ruby versions: 3.2 and 3.3 (plus future versions validated by CI).
+- Release-blocking precompiled targets:
+- Linux `x86_64-linux`
+- macOS Intel `x86_64-darwin`
+- macOS Apple Silicon `arm64-darwin`
+- Other platforms:
+- Install from source gem.
+- Runtime remains supported through native/pure fallback paths.
+
+Backend behavior expectations:
+
+- Platform gem install:
+- `auto` backend resolves to `rust` by default.
+- `native` and `pure` overrides continue to work.
+- Source gem install:
+- Rust build policy is controlled by `LDA_RUBY_RUST_BUILD=auto|always|never`.
+- If Rust build is skipped/unavailable, `auto` falls back to `native`, then `pure_ruby`.
+
+## Guardrails
+
+Validation must pass before publish:
+
+- `./bin/release-preflight` (source-gem checks).
+- `./bin/release-precompiled-artifacts --platform <target>` for each release-blocking platform.
+
+Release automation requirements:
+
+- `.github/workflows/release.yml` builds source + precompiled artifacts.
+- Release workflow matrix must include all release-blocking precompiled targets.
+- Publish jobs push all built gems and attach checksums to GitHub releases.
+
+Continuous integration guardrail:
+
+- `.github/workflows/ci.yml` runs `release-precompiled-artifacts` for representative Linux/macOS targets on every branch/PR.
+
+## Rollout / Expansion Rules
+
+When adding a new precompiled platform:
+
+1. Add target to release workflow matrix.
+2. Add or update CI coverage for that platform family.
+3. Update this policy and the release runbook support matrix.
+4. Validate a dry-run release with `workflow_dispatch` before shipping.
+
+When deprecating a precompiled platform:
+
+1. Remove platform from release matrix.
+2. Keep source-gem path available unless the overall platform support policy changes.
+3. Document deprecation in `CHANGELOG.md` and release notes.
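The policy's artifact strategy maps each release-blocking platform to a named platform gem, and every other platform falls back to the source gem. Below is a hypothetical Ruby sketch of that mapping. It takes a RubyGems-style platform string and returns the artifact filename a host would be expected to use. Two rules are assumptions not stated in the policy:

- A macOS version suffix (as in `arm64-darwin-23`) is ignored.
- A Linux suffix other than `gnu` (for example `musl`) is treated as an unsupported platform.

```ruby
# Release-blocking precompiled targets listed in the policy document.
PRECOMPILED_TARGETS = %w[x86_64-linux x86_64-darwin arm64-darwin].freeze

# Hypothetical helper: picks the release artifact a host is expected to use,
# given a RubyGems-style platform string such as "arm64-darwin-23".
# Assumptions (not from the policy doc): macOS version suffixes are ignored,
# and Linux hosts with a non-glibc suffix (e.g. "musl") use the source gem.
def expected_artifact(version, host_platform)
  cpu, os, suffix = host_platform.split("-", 3)
  key = "#{cpu}-#{os}"
  glibc_or_not_linux = os != "linux" || suffix.nil? || suffix == "gnu"

  if PRECOMPILED_TARGETS.include?(key) && glibc_or_not_linux
    "lda-ruby-#{version}-#{key}.gem"
  else
    "lda-ruby-#{version}.gem" # source gem: universal fallback
  end
end
```

On a real host, the input could come from `Gem::Platform.local.to_s`. RubyGems itself chooses among published gems with its own platform-matching rules, which this sketch does not reproduce. The sketch is only useful for checking expectations such as "this host should not need a toolchain".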