lda-ruby 0.3.9 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. checksums.yaml +5 -13
  2. data/CHANGELOG.md +8 -0
  3. data/Gemfile +9 -0
  4. data/README.md +123 -3
  5. data/VERSION.yml +3 -3
  6. data/docs/modernization-handoff.md +190 -0
  7. data/docs/porting-strategy.md +127 -0
  8. data/docs/precompiled-platform-policy.md +68 -0
  9. data/docs/release-runbook.md +157 -0
  10. data/ext/lda-ruby/extconf.rb +10 -6
  11. data/ext/lda-ruby/lda-inference.c +21 -5
  12. data/ext/lda-ruby-rust/Cargo.toml +12 -0
  13. data/ext/lda-ruby-rust/README.md +48 -0
  14. data/ext/lda-ruby-rust/extconf.rb +123 -0
  15. data/ext/lda-ruby-rust/src/lib.rs +456 -0
  16. data/lda-ruby.gemspec +0 -0
  17. data/lib/lda-ruby/backends/base.rb +129 -0
  18. data/lib/lda-ruby/backends/native.rb +158 -0
  19. data/lib/lda-ruby/backends/pure_ruby.rb +613 -0
  20. data/lib/lda-ruby/backends/rust.rb +226 -0
  21. data/lib/lda-ruby/backends.rb +58 -0
  22. data/lib/lda-ruby/corpus/corpus.rb +17 -15
  23. data/lib/lda-ruby/corpus/data_corpus.rb +2 -2
  24. data/lib/lda-ruby/corpus/directory_corpus.rb +2 -2
  25. data/lib/lda-ruby/corpus/text_corpus.rb +2 -2
  26. data/lib/lda-ruby/document/document.rb +6 -6
  27. data/lib/lda-ruby/document/text_document.rb +5 -4
  28. data/lib/lda-ruby/rust_build_policy.rb +21 -0
  29. data/lib/lda-ruby/version.rb +5 -0
  30. data/lib/lda-ruby.rb +293 -48
  31. data/test/backend_compatibility_test.rb +146 -0
  32. data/test/backends_selection_test.rb +100 -0
  33. data/test/gemspec_test.rb +27 -0
  34. data/test/lda_ruby_test.rb +49 -11
  35. data/test/packaged_gem_smoke_test.rb +33 -0
  36. data/test/release_scripts_test.rb +54 -0
  37. data/test/rust_build_policy_test.rb +23 -0
  38. data/test/simple_pipeline_test.rb +22 -0
  39. data/test/simple_yaml.rb +1 -7
  40. data/test/test_helper.rb +5 -6
  41. metadata +48 -38
  42. data/Rakefile +0 -61
  43. data/ext/lda-ruby/Makefile +0 -181
  44. data/test/data/.gitignore +0 -2
  45. data/test/simple_test.rb +0 -26
data/docs/release-runbook.md
@@ -0,0 +1,157 @@
+ # Release Runbook (Phase 5A + 5B)
+
+ This runbook defines the maintainer workflow for shipping `lda-ruby` source and precompiled platform gem releases.
+
+ The authoritative platform and support policy is maintained in `docs/precompiled-platform-policy.md`.
+
+ ## Scope
+
+ - Release artifact types:
+   - source gem: `pkg/lda-ruby-<version>.gem`
+   - precompiled gems (current targets are defined in `docs/precompiled-platform-policy.md`)
+ - Release trigger: git tag (`vX.Y.Z`) with matching version files
+ - Publish targets:
+   - RubyGems (`gem push`)
+   - GitHub Releases (gem and checksum attachments)
+
+ ## Prerequisites
+
+ 1. Access:
+    - push/tag rights on `master`
+    - access to GitHub Actions environments for release approvals
+    - RubyGems owner access for `lda-ruby`
+ 2. Local tooling:
+    - Ruby 3.2+ with Bundler
+    - Rust toolchain (`cargo`) for local precompiled-gem build checks
+    - `libclang` available to Rust bindgen
+    - Docker (recommended for reproducible checks)
+ 3. Repository state:
+    - release commit merged to `master`
+    - clean working tree
+    - version files in sync
+
+ ## Required Secrets and Environments
+
+ GitHub repository secret:
+
+ - `RUBYGEMS_API_KEY`: API key with push rights for `lda-ruby`.
+
+ GitHub Actions environment:
+
+ - `release`: protect this environment with required reviewer approval.
+ - Both publish jobs in `.github/workflows/release.yml` are bound to `release`.
+
+ ## Release Preparation
+
+ 1. Prepare and update the release files:
+
+ ```bash
+ ./bin/release-prepare 0.4.0
+ ```
+
+ 2. Review the changes to:
+    - `VERSION.yml`
+    - `lib/lda-ruby/version.rb`
+    - `CHANGELOG.md`
+
+ 3. Run the full release checks locally:
+
+ ```bash
+ SKIP_DOCKER=1 ./bin/release-preflight
+ ./bin/test-packaged-gem-manifest
+ ```
+
+ 4. Validate the local precompiled-gem flow for your current host platform:
+
+ ```bash
+ ./bin/release-precompiled-artifacts --tag v0.4.0 --skip-preflight
+ ```
+
+ Note: `release-precompiled-artifacts` only supports building for the current host platform (no cross-compilation).
+
+ 5. Commit and merge to `master`.
+
+ ## Dry-Run Path (No Publish)
+
+ Use `workflow_dispatch` with `publish=false`.
+
+ Behavior:
+
+ - runs release validation and the artifact build
+ - uploads the source and precompiled `pkg/lda-ruby-*.gem` files and their checksum files as workflow artifacts
+ - does not push to RubyGems
+ - does not create a GitHub release
+
+ Latest verified dry-run reference:
+
+ - date: 2026-02-25
+ - workflow run: `https://github.com/ealdent/lda-ruby/actions/runs/22382692416`
+ - dispatch parameters: `release_tag=v0.4.0`, `publish=false`
+ - result: success across `validate`, `build_artifacts`, and the full `build_precompiled_artifacts` matrix
+
+ Optional local dry-run equivalent:
+
+ ```bash
+ ./bin/release-artifacts --tag v0.4.0
+ ./bin/release-precompiled-artifacts --tag v0.4.0 --skip-preflight
+ ```
+
+ ## Publish Path (Tag-Driven)
+
+ 1. Ensure the release commit is on `master`.
+ 2. Create and push the release tag:
+
+ ```bash
+ git checkout master
+ git pull --ff-only
+ git tag -a v0.4.0 -m "Release v0.4.0"
+ git push origin v0.4.0
+ ```
+
+ 3. Monitor `.github/workflows/release.yml`:
+    - `validate`
+    - `build_artifacts`
+    - `build_precompiled_artifacts` (Linux and macOS matrix)
+    - environment-gated `publish_rubygems`
+    - environment-gated `publish_github_release`
+ 4. Approve the protected `release` environment when prompted.
+ 5. Confirm the published outputs:
+    - RubyGems lists the `lda-ruby` `0.4.0` source gem and all platform gems
+    - GitHub release `v0.4.0` exists with all gem and `.sha256` attachments
+
+ ## Rollback and Recovery
+
+ If publishing fails before the RubyGems push:
+
+ 1. Fix the issue on `master`.
+ 2. Delete and recreate the tag only if the broken tag did not produce public artifacts:
+    - `git tag -d vX.Y.Z`
+    - `git push origin :refs/tags/vX.Y.Z`
+ 3. Re-tag and re-run the release.
+
+ If the RubyGems push succeeds but the GitHub release fails:
+
+ 1. After fixing the problem, re-run only the GitHub release job (`publish_github_release`).
+ 2. Do not re-push the gem for the same version.
+
+ If an incorrect gem is published:
+
+ 1. Yank it from RubyGems:
+
+ ```bash
+ gem yank lda-ruby -v X.Y.Z
+ ```
+
+ 2. Publish a corrective version (for example `X.Y.(Z+1)`); do not reuse yanked version numbers.
+ 3. Update `CHANGELOG.md` and the release notes to document the correction.
+
+ ## Troubleshooting
+
+ - `Could not find 'bundler'`: install the Bundler version pinned in `Gemfile.lock`.
+ - `cargo not found` in Rust-enabled checks: ensure the Rust toolchain is installed or run in Docker.
+ - `libclang` not found while building precompiled gems: install LLVM/libclang and set `LIBCLANG_PATH` if needed.
+ - The Linux `Install Rust bindgen dependencies` step can take several minutes on fresh runners due to apt package-index updates and package installs.
+ - macOS Rust link errors (`symbol(s) not found` for Ruby APIs): ensure the build preserves `-C link-arg=-Wl,-undefined,dynamic_lookup` in `RUSTFLAGS`.
+ - Tag/version mismatch: run `./bin/check-version-sync --tag vX.Y.Z`.
+ - Artifact mismatch during release: rebuild with `./bin/release-artifacts --tag vX.Y.Z`.
+ - Precompiled artifact mismatch: rebuild with `./bin/release-precompiled-artifacts --tag vX.Y.Z --skip-preflight`.
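The runbook depends on the release tag and the version files agreeing. It points to `./bin/check-version-sync --tag vX.Y.Z` for that check. This diff does not include that script's source, so what follows is only a hypothetical Ruby sketch of the invariant the script is described as enforcing. The module name `VersionSyncSketch` and its methods are illustrative. The sketch does not read values out of `VERSION.yml` or `lib/lda-ruby/version.rb`, because their formats are not shown here; the caller passes the declared versions in.

```ruby
# frozen_string_literal: true

# Hypothetical sketch, NOT the contents of bin/check-version-sync: it only
# models the rule "tag vX.Y.Z must match every declared version string".
module VersionSyncSketch
  TAG_PATTERN = /\Av(\d+\.\d+\.\d+)\z/

  module_function

  # "v0.4.0" -> "0.4.0"; anything that is not shaped like vX.Y.Z is rejected.
  def version_from_tag(tag)
    match = TAG_PATTERN.match(tag)
    raise ArgumentError, "expected a tag like vX.Y.Z, got #{tag.inspect}" unless match

    match[1]
  end

  # `sources` maps a label (e.g. a file path) to the version string it declares.
  # Returns one message per disagreeing source; an empty array means "in sync".
  def mismatches(tag, sources)
    expected = version_from_tag(tag)
    sources
      .reject { |_label, version| version == expected }
      .map { |label, version| "#{label} declares #{version}, tag expects #{expected}" }
  end
end

# Example: a stale version.rb is reported, VERSION.yml is not.
# prints: lib/lda-ruby/version.rb declares 0.3.9, tag expects 0.4.0
puts VersionSyncSketch.mismatches(
  "v0.4.0",
  { "VERSION.yml" => "0.4.0", "lib/lda-ruby/version.rb" => "0.3.9" }
)
```

Returning a list of messages instead of a boolean lets a release script print every stale file at once before failing.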
data/ext/lda-ruby/extconf.rb
@@ -1,9 +1,13 @@
- ENV["ARCHFLAGS"] = "-arch #{`uname -p` =~ /powerpc/ ? 'ppc' : 'i386'}"
+ # frozen_string_literal: true
 
- require 'mkmf'
+ require "mkmf"
 
- $CFLAGS << ' -Wall -ggdb -O0'
- $defs.push( "-D USE_RUBY" )
+ extension_name = "lda-ruby/lda"
+ dir_config(extension_name)
 
- dir_config('lda-ruby/lda')
- create_makefile("lda-ruby/lda")
+ $defs << "-DUSE_RUBY"
+ append_cflags("-Wall")
+ append_cflags("-Wextra")
+ append_cflags("-Wno-unused-parameter")
+
+ create_makefile(extension_name)
data/ext/lda-ruby/lda-inference.c
@@ -614,7 +614,25 @@ void run_quiet_em(char* start, corpus* corpus) {
  * * em_convergence
  * * est_alpha
  */
- static VALUE wrap_set_config(VALUE self, VALUE init_alpha, VALUE num_topics, VALUE max_iter, VALUE convergence, VALUE em_max_iter, VALUE em_convergence, VALUE est_alpha) {
+ static VALUE wrap_set_config(int argc, VALUE* argv, VALUE self) {
+ VALUE init_alpha = Qnil;
+ VALUE num_topics = Qnil;
+ VALUE max_iter = Qnil;
+ VALUE convergence = Qnil;
+ VALUE em_max_iter = Qnil;
+ VALUE em_convergence = Qnil;
+ VALUE est_alpha = Qnil;
+
+ rb_check_arity(argc, 5, 7);
+
+ init_alpha = argv[0];
+ num_topics = argv[1];
+ max_iter = argv[2];
+ convergence = argv[3];
+ em_max_iter = argv[4];
+ em_convergence = (argc >= 6) ? argv[5] : rb_float_new(EM_CONVERGED);
+ est_alpha = (argc == 7) ? argv[6] : rb_int_new(ESTIMATE_ALPHA);
+
  INITIAL_ALPHA = NUM2DBL(init_alpha);
  NTOPICS = NUM2INT(num_topics);
  if( NTOPICS < 0 ) { rb_raise(rb_eRuntimeError, "NTOPICS must be greater than 0 - %d", NTOPICS); }
@@ -954,13 +972,11 @@ static VALUE wrap_get_model_settings(VALUE self) {
  }
 
 
- void Init_lda() {
+ void Init_lda(void) {
  corpus_loaded = FALSE;
  model_loaded = FALSE;
  VERBOSE = TRUE;
 
- rb_require("lda-ruby");
-
  rb_cLdaModule = rb_define_module("Lda");
  rb_cLda = rb_define_class_under(rb_cLdaModule, "Lda", rb_cObject);
  rb_cLdaCorpus = rb_define_class_under(rb_cLdaModule, "Corpus", rb_cObject);
@@ -977,7 +993,7 @@ void Init_lda() {
  rb_define_method(rb_cLda, "load_settings", wrap_load_settings, 1);
 
  // method to set all the config options at once
- rb_define_method(rb_cLda, "set_config", wrap_set_config, 5);
+ rb_define_method(rb_cLda, "set_config", wrap_set_config, -1);
 
  // accessor stuff for main settings
  rb_define_method(rb_cLda, "max_iter", wrap_get_max_iter, 0);
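The old `wrap_set_config` declared seven value parameters, yet it was registered with arity 5. The new version fixes that mismatch. It is registered with arity -1, so Ruby passes `argc`/`argv`. `rb_check_arity(argc, 5, 7)` then accepts five to seven arguments. When the caller omits `em_convergence` and `est_alpha`, they are filled from `EM_CONVERGED` and `ESTIMATE_ALPHA`.

The Ruby sketch below models only that calling convention. `set_config_sketch` is a hypothetical name. The two default values are placeholders, because the C macro values are not shown in this diff.

```ruby
# frozen_string_literal: true

# Placeholders only: the real defaults are the C macros EM_CONVERGED and
# ESTIMATE_ALPHA, whose values this diff does not show.
EM_CONVERGED_PLACEHOLDER = 1e-4
ESTIMATE_ALPHA_PLACEHOLDER = 1

# Hypothetical Ruby model of Lda::Lda#set_config after the change: five
# required positional arguments plus two optional ones. Like rb_check_arity,
# Ruby raises ArgumentError for fewer than 5 or more than 7 arguments.
def set_config_sketch(init_alpha, num_topics, max_iter, convergence, em_max_iter,
                      em_convergence = EM_CONVERGED_PLACEHOLDER,
                      est_alpha = ESTIMATE_ALPHA_PLACEHOLDER)
  {
    init_alpha: init_alpha,
    num_topics: num_topics,
    max_iter: max_iter,
    convergence: convergence,
    em_max_iter: em_max_iter,
    em_convergence: em_convergence,
    est_alpha: est_alpha
  }
end

five_args = set_config_sketch(0.3, 20, 20, 1e-6, 100)         # defaults fill the last two
seven_args = set_config_sketch(0.3, 20, 20, 1e-6, 100, 1e-5, 0)
p five_args[:em_convergence], seven_args[:est_alpha]

begin
  set_config_sketch(0.3, 20, 20, 1e-6) # only four arguments
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Existing five-argument callers keep working, and callers that need non-default EM convergence or alpha estimation can now pass the extra arguments.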
data/ext/lda-ruby-rust/Cargo.toml
@@ -0,0 +1,12 @@
+ [package]
+ name = "lda_ruby_rust"
+ version = "0.1.0"
+ edition = "2021"
+ rust-version = "1.74"
+
+ [lib]
+ name = "lda_ruby_rust"
+ crate-type = ["cdylib"]
+
+ [dependencies]
+ magnus = "0.7"
data/ext/lda-ruby-rust/README.md
@@ -0,0 +1,48 @@
+ # Experimental Rust Extension Scaffold
+
+ This directory contains an experimental Rust extension scaffold built with `magnus`.
+
+ Current scope:
+
+ - Defines the `Lda::RustBackend` module in Ruby.
+ - Exposes capability hooks:
+   - `Lda::RustBackend.available?`
+   - `Lda::RustBackend.abi_version`
+   - `Lda::RustBackend.before_em(start, num_docs, num_terms)`
+   - `Lda::RustBackend.topic_weights_for_word(beta, gamma, word_index, min_probability)`
+   - `Lda::RustBackend.accumulate_topic_term_counts(topic_term_counts, phi_d, words, counts)`
+   - `Lda::RustBackend.infer_document(beta, gamma_initial, words, counts, max_iter, convergence, min_probability, init_alpha)`
+   - `Lda::RustBackend.infer_corpus_iteration(beta, document_words, document_counts, max_iter, convergence, min_probability, init_alpha)`
+   - `Lda::RustBackend.normalize_topic_term_counts(topic_term_counts, min_probability)`
+   - `Lda::RustBackend.average_gamma_shift(previous_gamma, current_gamma)`
+   - `Lda::RustBackend.topic_document_probability(phi_tensor, document_counts, num_topics, min_probability)`
+   - `Lda::RustBackend.seeded_topic_term_probabilities(document_words, document_counts, topics, terms, min_probability)`
+
+ Hot-path kernels currently executed in Rust when `backend: :rust` is active:
+ - topic weights for a word across topics
+ - topic-term count accumulation from per-document `phi`
+ - full per-document inference loop (batched inner EM updates)
+ - full per-iteration corpus inference (batched document processing)
+ - topic-term normalization and log-probability finalization for EM beta updates
+ - gamma convergence shift reduction between EM iterations
+ - topic-document average log-probability computation
+ - seeded topic-term initialization
+
+ The remaining numeric LDA kernels are still provided by the pure Ruby backend and will be moved to Rust incrementally.
+
+ ## Local build (optional)
+
+ ```bash
+ cd ext/lda-ruby-rust
+ cargo build --release
+ ```
+
+ Then make the built library available on Ruby's load path under the name `lda_ruby_rust`, so that `require "lda_ruby_rust"` succeeds.
+
+ ## Install-time policy
+
+ During source gem installs, `ext/lda-ruby-rust/extconf.rb` can optionally build this extension.
+
+ - `LDA_RUBY_RUST_BUILD=auto` (default): build when `cargo` is available.
+ - `LDA_RUBY_RUST_BUILD=always`: require a successful Rust build or fail installation.
+ - `LDA_RUBY_RUST_BUILD=never`: always skip Rust build.
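`ext/lda-ruby-rust/extconf.rb` reads the policy through `RustBuildPolicy.resolve` and the constants `ENV_KEY`, `AUTO`, `ALWAYS`, and `NEVER`. These come from `lib/lda-ruby/rust_build_policy.rb`, which this diff lists but does not show.

The sketch below is a hypothetical stand-in for that file. The constant names come from the extconf and the values come from the environment settings listed above. The case-insensitive parsing and the fallback to `auto` for unrecognized values are assumptions. `decision` restates the extconf's pre-build branching as a pure function, so the three modes can be compared directly.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for lib/lda-ruby/rust_build_policy.rb (not shown in
# this diff). Parsing rules and the fallback to AUTO are assumptions.
module RustBuildPolicySketch
  ENV_KEY = "LDA_RUBY_RUST_BUILD"
  AUTO = "auto"
  ALWAYS = "always"
  NEVER = "never"
  VALID = [AUTO, ALWAYS, NEVER].freeze

  module_function

  # Reads the policy from `env` (ENV by default), ignoring case and
  # surrounding whitespace; unset or unrecognized values become AUTO.
  def resolve(env = ENV)
    raw = env.fetch(ENV_KEY, AUTO).to_s.strip.downcase
    VALID.include?(raw) ? raw : AUTO
  end

  # Mirrors the case statement in ext/lda-ruby-rust/extconf.rb before any
  # build is attempted: never -> skip; always -> build, or fail without cargo;
  # auto -> build only when cargo is available. (In auto mode the extconf
  # also downgrades a failed build to a skip via its rescue clause.)
  def decision(policy, cargo_available)
    case policy
    when NEVER then :skip
    when ALWAYS then cargo_available ? :build : :fail
    else cargo_available ? :build : :skip
    end
  end
end

policy = RustBuildPolicySketch.resolve({ "LDA_RUBY_RUST_BUILD" => " Always " })
p policy                                        # prints "always"
p RustBuildPolicySketch.decision(policy, false) # prints :fail
```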
data/ext/lda-ruby-rust/extconf.rb
@@ -0,0 +1,123 @@
+ # frozen_string_literal: true
+
+ require "fileutils"
+ require "rbconfig"
+
+ require_relative "../../lib/lda-ruby/rust_build_policy"
+
+ module Lda
+   module RustExtensionBuild
+     module_function
+
+     def run
+       policy = RustBuildPolicy.resolve
+       puts("Rust extension build policy: #{policy} (#{RustBuildPolicy::ENV_KEY})")
+
+       case policy
+       when RustBuildPolicy::NEVER
+         puts("Skipping Rust extension build (policy=#{RustBuildPolicy::NEVER}).")
+       when RustBuildPolicy::ALWAYS
+         ensure_cargo_available!
+         build_and_stage!
+       else
+         if cargo_available?
+           build_and_stage!
+         else
+           puts("cargo not found; skipping Rust extension build (policy=#{RustBuildPolicy::AUTO}).")
+         end
+       end
+
+       write_noop_makefile
+     rescue StandardError => e
+       if policy == RustBuildPolicy::ALWAYS
+         abort("Rust extension build failed with #{RustBuildPolicy::ENV_KEY}=#{RustBuildPolicy::ALWAYS}: #{e.message}")
+       end
+
+       warn("Rust extension build skipped after error in auto mode: #{e.message}")
+       write_noop_makefile
+     end
+
+     def ensure_cargo_available!
+       return if cargo_available?
+
+       abort("cargo not found in PATH but #{RustBuildPolicy::ENV_KEY}=#{RustBuildPolicy::ALWAYS} was requested.")
+     end
+
+     def cargo_available?
+       cargo = ENV.fetch("CARGO", "cargo")
+       system(cargo, "--version", out: File::NULL, err: File::NULL)
+     end
+
+     def build_and_stage!
+       cargo = ENV.fetch("CARGO", "cargo")
+       Dir.chdir(__dir__) do
+         env = rust_build_env
+         success =
+           if env.empty?
+             system(cargo, "build", "--release")
+           else
+             system(env, cargo, "build", "--release")
+           end
+         success or raise "cargo build --release failed"
+       end
+
+       source = File.join(__dir__, "target", "release", rust_cdylib_filename)
+       raise "Rust extension artifact not found at #{source}" unless File.exist?(source)
+
+       destination = File.expand_path("../../lib/lda_ruby_rust.#{RbConfig::CONFIG.fetch('DLEXT')}", __dir__)
+       FileUtils.cp(source, destination)
+       puts("Staged Rust extension to #{destination}")
+     end
+
+     def rust_cdylib_filename
+       host_os = RbConfig::CONFIG.fetch("host_os")
+       extension =
+         case host_os
+         when /darwin/
+           "dylib"
+         when /mswin|mingw|cygwin/
+           "dll"
+         else
+           "so"
+         end
+
+       "liblda_ruby_rust.#{extension}"
+     end
+
+     def rust_build_env
+       host_os = RbConfig::CONFIG.fetch("host_os")
+       return {} unless host_os.match?(/darwin/)
+
+       dynamic_lookup_flag = "-C link-arg=-Wl,-undefined,dynamic_lookup"
+       existing = ENV.fetch("RUSTFLAGS", "")
+       merged =
+         if existing.include?(dynamic_lookup_flag)
+           existing
+         else
+           [existing, dynamic_lookup_flag].reject(&:empty?).join(" ")
+         end
+
+       { "RUSTFLAGS" => merged }
+     end
+
+     def write_noop_makefile
+       File.write(
+         File.join(__dir__, "Makefile"),
+         <<~MAKEFILE
+           all:
+           \t@echo "Rust extension handled by extconf.rb"
+
+           install:
+           \t@echo "Rust extension handled by extconf.rb"
+
+           clean:
+           \t@true
+
+           distclean: clean
+         MAKEFILE
+       )
+     end
+   end
+ end
+
+ Lda::RustExtensionBuild.run
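`build_and_stage!` copies Cargo's output into `lib/` under a different name. Cargo names the cdylib with a `lib` prefix and a platform shared-library extension, as chosen by `rust_cdylib_filename`. `require "lda_ruby_rust"`, however, looks for `lda_ruby_rust.<DLEXT>`.

The small function below is a hypothetical helper, not part of the gem. It reproduces that mapping with the same `host_os` patterns as the extconf, so both names can be inspected for any platform string.

```ruby
# frozen_string_literal: true

require "rbconfig"

# Hypothetical helper (not in the gem): returns the Cargo artifact name that
# extconf.rb looks for and the file name Ruby will `require`. The host_os
# patterns mirror rust_cdylib_filename in ext/lda-ruby-rust/extconf.rb.
def rust_staging_names(host_os, dlext)
  cargo_ext =
    case host_os
    when /darwin/ then "dylib"
    when /mswin|mingw|cygwin/ then "dll"
    else "so"
    end

  { cargo_artifact: "liblda_ruby_rust.#{cargo_ext}", ruby_library: "lda_ruby_rust.#{dlext}" }
end

# Show the names for the Ruby running this snippet.
p rust_staging_names(RbConfig::CONFIG.fetch("host_os"), RbConfig::CONFIG.fetch("DLEXT"))
```

On macOS, Ruby's `DLEXT` is typically `bundle`, so the copy also renames `liblda_ruby_rust.dylib` to `lda_ruby_rust.bundle`. On Linux, both names end in `.so` and only the `lib` prefix is dropped.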