scout-ai 1.0.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (62)
  1. checksums.yaml +4 -4
  2. data/.vimproject +80 -15
  3. data/README.md +296 -0
  4. data/Rakefile +2 -0
  5. data/VERSION +1 -1
  6. data/doc/Agent.md +279 -0
  7. data/doc/Chat.md +258 -0
  8. data/doc/LLM.md +446 -0
  9. data/doc/Model.md +513 -0
  10. data/doc/RAG.md +129 -0
  11. data/lib/scout/llm/agent/chat.rb +51 -1
  12. data/lib/scout/llm/agent/delegate.rb +39 -0
  13. data/lib/scout/llm/agent/iterate.rb +44 -0
  14. data/lib/scout/llm/agent.rb +42 -21
  15. data/lib/scout/llm/ask.rb +38 -6
  16. data/lib/scout/llm/backends/anthropic.rb +147 -0
  17. data/lib/scout/llm/backends/bedrock.rb +1 -1
  18. data/lib/scout/llm/backends/ollama.rb +23 -29
  19. data/lib/scout/llm/backends/openai.rb +34 -40
  20. data/lib/scout/llm/backends/responses.rb +158 -110
  21. data/lib/scout/llm/chat.rb +250 -94
  22. data/lib/scout/llm/embed.rb +4 -4
  23. data/lib/scout/llm/mcp.rb +28 -0
  24. data/lib/scout/llm/parse.rb +1 -0
  25. data/lib/scout/llm/rag.rb +9 -0
  26. data/lib/scout/llm/tools/call.rb +66 -0
  27. data/lib/scout/llm/tools/knowledge_base.rb +158 -0
  28. data/lib/scout/llm/tools/mcp.rb +59 -0
  29. data/lib/scout/llm/tools/workflow.rb +69 -0
  30. data/lib/scout/llm/tools.rb +58 -143
  31. data/lib/scout-ai.rb +1 -0
  32. data/scout-ai.gemspec +31 -18
  33. data/scout_commands/agent/ask +28 -71
  34. data/scout_commands/documenter +148 -0
  35. data/scout_commands/llm/ask +2 -2
  36. data/scout_commands/llm/server +319 -0
  37. data/share/server/chat.html +138 -0
  38. data/share/server/chat.js +468 -0
  39. data/test/scout/llm/backends/test_anthropic.rb +134 -0
  40. data/test/scout/llm/backends/test_openai.rb +45 -6
  41. data/test/scout/llm/backends/test_responses.rb +124 -0
  42. data/test/scout/llm/test_agent.rb +0 -70
  43. data/test/scout/llm/test_ask.rb +3 -1
  44. data/test/scout/llm/test_chat.rb +43 -1
  45. data/test/scout/llm/test_mcp.rb +29 -0
  46. data/test/scout/llm/tools/test_knowledge_base.rb +22 -0
  47. data/test/scout/llm/tools/test_mcp.rb +11 -0
  48. data/test/scout/llm/tools/test_workflow.rb +39 -0
  49. metadata +56 -17
  50. data/README.rdoc +0 -18
  51. data/python/scout_ai/__pycache__/__init__.cpython-310.pyc +0 -0
  52. data/python/scout_ai/__pycache__/__init__.cpython-311.pyc +0 -0
  53. data/python/scout_ai/__pycache__/huggingface.cpython-310.pyc +0 -0
  54. data/python/scout_ai/__pycache__/huggingface.cpython-311.pyc +0 -0
  55. data/python/scout_ai/__pycache__/util.cpython-310.pyc +0 -0
  56. data/python/scout_ai/__pycache__/util.cpython-311.pyc +0 -0
  57. data/python/scout_ai/atcold/plot_lib.py +0 -141
  58. data/python/scout_ai/atcold/spiral.py +0 -27
  59. data/python/scout_ai/huggingface/train/__pycache__/__init__.cpython-310.pyc +0 -0
  60. data/python/scout_ai/huggingface/train/__pycache__/next_token.cpython-310.pyc +0 -0
  61. data/python/scout_ai/language_model.py +0 -70
  62. /data/{python/scout_ai/atcold/__init__.py → test/scout/llm/tools/test_call.rb} +0 -0
data/doc/Model.md ADDED
@@ -0,0 +1,513 @@
# Model

The Model subsystem in scout-ai provides a small, composable framework to wrap machine‑learning models (pure Ruby, Python/PyTorch, and Hugging Face Transformers) with a consistent API for evaluation, training, feature extraction, post‑processing, and persistence.

It consists of a base class (ScoutModel) and higher-level implementations:
- PythonModel — instantiate and drive Python classes via ScoutPython.
- TorchModel — drive arbitrary PyTorch modules with simple training/eval loops, tensor helpers, and state save/load.
- HuggingfaceModel — convenience wrapper for Transformers models and tokenizers, with specializations:
  - SequenceClassificationModel — text classification.
  - CausalModel — chat/causal generation.
  - NextTokenModel — next-token fine-tuning pipeline.

This document covers the common API, how to customize models with feature extraction and post-processing, saving/loading models and their behavior, and several concrete examples (including how ExTRI2 uses a Hugging Face model inside a Workflow).

---

## Core concepts and base API (ScoutModel)

ScoutModel is the foundation. You create a model object, attach blocks describing how to evaluate, train, extract features, and post-process, and optionally persist both its behavior and state in a directory.

Constructor:
- ScoutModel.new(directory = nil, options = {})
  - directory (optional) — if provided, model behavior/state can be saved and later restored from here.
  - options — free-form hash for your parameters (e.g., hyperparameters). These are persisted to options.json in the directory and merged on restore.

Key responsibilities:
- Provide hooks to set the model’s:
  - init — how to initialize internal state (e.g., load a Python object).
  - eval — how to evaluate one sample.
  - eval_list — how to evaluate a list (batch) of samples (by default dispatches to eval).
  - extract_features / extract_features_list — how to map raw inputs to “features” the model expects.
  - post_process / post_process_list — transform raw predictions/logits to final outputs.
  - train — how to fit with accumulated training data (features and labels).

- Build and hold training data:
  - add(sample, label = nil)
  - add_list(list, labels = nil) — labels may be an Array aligned with list or a Hash mapping sample to label.
  - Internal arrays @features and @labels are filled after feature extraction.

- Persist behavior and state:
  - save — persists options, all behavior blocks (as .rb) and state (see below).
  - restore — loads behavior and options; if the model has a directory, init/load_state are called on demand.

- A directory-bound state file:
  - state_file — shorthand for directory.state; used by implementations to store learned parameters.

Execution helpers (util/run.rb):
- execute(method, *args) — run a stored Proc with arity checks.
- init { ... } / init() — define or execute the initialization method.
- eval(sample=nil) { ... } — define or run the eval method; calls extract_features and post_process around your block as needed.
- eval_list(list=nil) { ... } — define or run the list version; defaults to mapping eval unless you override.
- post_process(result=nil) { ... }, post_process_list(list=nil) { ... } — define or run post-processing.
- train { ... } / train() — define or run training using @features/@labels.
- extract_features(sample=nil) { ... }, extract_features_list(list=nil) { ... } — define or run feature extraction.

Persistence (util/save.rb):
- save — writes options.json; saves each defined Proc to a .rb file beside the state (using method_source); calls save_state if @state exists.
- restore — loads behavior (.rb), options, and sets up init/load_state/save_state blocks.
- save_state { |state_file, state| ... } — define or execute logic to persist the current @state.
- load_state { |state_file| ... } — define or execute logic to restore @state.

Minimal example (pure Ruby)
```ruby
model = ScoutModel.new
model.eval do |sample, list=nil|
  if list
    list.map { |x| x * 2 }
  else
    sample * 2
  end
end

model.eval(1)           # => 2
model.eval_list([1, 2]) # => [2, 4]
```

Persisting behavior/state
```ruby
TmpFile.with_file do |dir|
  model = ScoutModel.new dir, factor: 4
  model.eval { |x, list=nil| list ? list.map { |v| v * @options[:factor] } : x * @options[:factor] }
  model.save

  # Later
  reloaded = ScoutModel.new dir
  reloaded.eval(1)          # => 4
  reloaded.eval_list([1,2]) # => [4,8]
end
```
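
Training works the same way for plain Ruby models: collect samples with add, define a train block, and keep whatever you learn in @state. A minimal sketch, assuming the train block receives the accumulated features and labels (as in the API quick reference below) and that blocks can store learned values in @state, just as the example above keeps settings in @options:

```ruby
model = ScoutModel.new
model.extract_features { |sample| sample } # identity features

# Learn a single scale factor from (x, y) pairs and keep it in @state
model.train do |features, labels|
  ratios = features.zip(labels).map { |x, y| y.to_f / x }
  @state = ratios.sum / ratios.length
end

model.eval { |x, list=nil| list ? list.map { |v| v * @state } : x * @state }

model.add 2.0, 4.0
model.add 5.0, 10.0
model.train

model.eval(3.0)          # => 6.0
model.eval_list([1, 10]) # => [2.0, 20.0]
```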

---

## PythonModel: wrap Python classes

PythonModel specializes ScoutModel to initialize a Python class instance (via ScoutPython) and keep it in @state.

Constructor:
- PythonModel.new(dir, python_class = nil, python_module = :model, options = {})
  - dir — directory holding model.py or any Python package you want on sys.path.
  - python_class/python_module — class and module to import; if python_module omitted, defaults to :model.
  - options — additional keyword arguments passed to the Python class initializer.

Initialization:
- On init, PythonModel adjusts paths, ensures ScoutPython is initialized, and builds an instance:
  - ScoutPython.class_new_obj(python_module, python_class, **options.except(...))

From tests (python/test_base.rb):
```ruby
TmpFile.with_path do |dir|
  dir['model.py'].write <<~PY
    class TestModel:
        def __init__(self, delta):
            self.delta = delta
        def eval(self, x):
            return [e + self.delta for e in x]
  PY

  model = PythonModel.new dir, 'TestModel', :model, delta: 1

  model.eval do |sample, list=nil|
    init unless state
    if list
      state.eval(list) # Python: returns list
    else
      state.eval([sample])[0]
    end
  end

  model.eval(1)          # => 2
  model.eval_list([3,5]) # => [4,6]

  model.save
  model2 = ScoutModel.new dir # generic loader from directory works too
  model2.eval(1) # => 2

  model3 = ScoutModel.new dir, delta: 2
  model3.eval(1) # => 3
end
```

Notes:
- Behavior blocks (eval/extract_features/train/post_process) are still Ruby procs you define; inside, you can call Python methods on state.
- Options are persisted and merged on restore, allowing default hyperparameter overrides.

---

## TorchModel: PyTorch convenience

TorchModel extends PythonModel with a ready-to-use setup for PyTorch nn.Modules, simple training loops, tensor helpers, and state I/O.

Highlights:
- torch helpers (torch/helpers.rb):
  - TorchModel.init_python — imports torch and utility modules once.
  - TorchModel::Tensor — wrapper adding to_ruby/to_ruby!/del for tensor lifecycle management.
  - device(options) / dtype(options) — configure device/dtype from options (e.g., device: 'cuda').
  - tensor(obj, device, dtype) — build a torch.tensor; result responds to .to_ruby / .del.

- Save/Load (torch/load_and_save.rb):
  - TorchModel.save(state_file, state) — saves both architecture (torch.save(model)) and weights (state_dict) into state_file(.architecture).
  - TorchModel.load(state_file, state=nil) — loads architecture and then weights.
  - reset_state — clear current state and remove persisted files.

- Introspection (torch/introspection.rb):
  - get_layer(state, layer_path = nil), get_weights(state, layer_path)
  - freeze_layer(state, layer_path, requires_grad=false) — recursively freezes a submodule.

- Training loop (torch.rb):
  - Provide your nn.Module as state (e.g., via model.state = ScoutPython.torch.nn.Linear.new(1,1)).
  - Set criterion/optimizer or rely on defaults:
    - TorchModel.optimizer(model, training_args) — default SGD(lr: 0.01).
    - TorchModel.criterion(model, training_args) — default MSELoss.
  - options[:training_args] may set epochs, batch_size, learning_rate, etc.

Example (from tests/test_torch.rb)
```ruby
TorchModel.init_python
model = TorchModel.new dir
model.state = ScoutPython.torch.nn.Linear.new(1, 1)
model.criterion = ScoutPython.torch.nn.MSELoss.new()

model.extract_features { |f| [f] }
model.post_process { |v, list| list ? v.map(&:first) : v.first }

# Train y ~ 2x
model.add 5.0, [10.0]
model.add 10.0, [20.0]
model.options[:training_args][:epochs] = 1000
model.train

w = model.get_weights.to_ruby.first.first
# w between 1.8 and 2.2
```

Persist and reuse
```ruby
model.save
reloaded = ScoutModel.new dir
y = reloaded.eval(100.0) # ~ 200
```

Tips:
- Manage tensor memory with Tensor#del after large batch evaluations if needed.
- You can freeze layers by name path ("encoder.layer.0") before training.

---

## HuggingfaceModel: Transformers integration

HuggingfaceModel is a TorchModel specializing initialization and save/load to work with transformers:
- Loads a model and tokenizer via Python functions (python/scout_ai/huggingface/model.py):
  - load_model(task, checkpoint, **kwargs)
  - load_tokenizer(checkpoint, **kwargs)
- Persists using save_pretrained/from_pretrained into directory.state (a directory).

Options normalization:
- fix_options: splits options into:
  - training_args (or via training: …),
  - tokenizer_args (or via tokenizer: …),
  - plus task / checkpoint.
- Any model/tokenizer kwargs not in training_args or tokenizer_args are passed through on load.
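
For instance, grouping options when building a Hugging Face model can be spelled either way; a sketch using 'bert-base-uncased' as an illustrative checkpoint (num_train_epochs is a standard transformers TrainingArguments field):

```ruby
# Explicit option groups
model = HuggingfaceModel.new 'SequenceClassification', 'bert-base-uncased', nil,
  training_args:  { num_train_epochs: 3 },
  tokenizer_args: { model_max_length: 512, truncation: true }

# Equivalent spelling through the training:/tokenizer: aliases
model = HuggingfaceModel.new 'SequenceClassification', 'bert-base-uncased', nil,
  training:  { num_train_epochs: 3 },
  tokenizer: { model_max_length: 512, truncation: true }
```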

Save/Load:
- save_state — model.save_pretrained and tokenizer.save_pretrained into state_file dir.
- load_state — model.from_pretrained and tokenizer.from_pretrained when present.

You typically use one of its specializations:

### SequenceClassificationModel

Purpose: text classification (logits to label).

Behavior:
- eval: calls Python eval_model(model, tokenizer, texts, locate_tokens?) to produce logits (default return_logits = true).
- post_process: argmax across logits, mapping to class labels if provided.

Training:
- train: builds a TSV (text,label), constructs TrainingArguments and uses Trainer/train (python/scout_ai/huggingface/train).
- Accepts optional class_weights to weight CrossEntropy in a custom Trainer.

Example training (from tests)
```ruby
model = SequenceClassificationModel.new 'bert-base-uncased', nil, class_labels: %w(Bad Good)
model.init

10.times do
  model.add "The dog", 'Bad'
  model.add "The cat", 'Good'
end

model.train
model.eval("This is dog") # => "Bad"
model.eval("This is cat") # => "Good"
```

Notes:
- post_process maps argmax index to options[:class_labels]. Raw logits can be left to downstream code by customizing post_process.
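
If you prefer probabilities over hard labels, you can replace the default post-processing with your own block; a sketch that reuses Misc.softmax as the ExTRI2 example below does:

```ruby
# Return softmax probabilities instead of the argmax class label (sketch)
model.post_process do |logits, list|
  list ? logits.map { |row| Misc.softmax(row) } : Misc.softmax(logits)
end
```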

### CausalModel

Purpose: chat/causal generation.

Behavior:
- eval(messages, list=nil): calls Python eval_causal_lm_chat(model, tokenizer, messages, chat_template, chat_template_kwargs, generation_kwargs) to return generated text, using tokenizer.apply_chat_template when available.

Training:
- train(pairs, labels): hooks a basic RLHF pipeline (python/scout_ai/huggingface/rlhf.py) using PPO. You supply:
  - pairs: array of [messages, response] pairs,
  - labels: rewards for each pair.
- After training, it reloads state from disk.

Usage example (test/test_causal.rb):
```ruby
model = CausalModel.new 'mistralai/Mistral-7B-Instruct-v0.3'
model.init
model.eval([
  {role: :system, content: "You are a calculator, just reply with the answer"},
  {role: :user, content: " 1 + 2 ="}
])
# => "3"
```
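
Training follows the same shape: supply [messages, response] pairs along with a reward per pair. A sketch with made-up data and rewards (the PPO details live in python/scout_ai/huggingface/rlhf.py):

```ruby
# Hypothetical preference data: [messages, response] pairs plus one reward per pair
pairs = [
  [[{role: :user, content: "1 + 2 ="}], "3"],
  [[{role: :user, content: "1 + 2 ="}], "I would rather not say"]
]
rewards = [1.0, -1.0]

model.train pairs, rewards # runs the PPO pipeline, then reloads the updated state from disk
```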

### NextTokenModel

Purpose: next-token fine-tuning for Causal LM.

Adds a custom train block that:
- Builds a tokenized dataset from a list of strings.
- Trains with a simple language modeling loop (python/scout_ai/huggingface/train/next_token.py).
- Writes checkpoints under directory/output.

From tests (huggingface/causal/test_next_token.rb):
```ruby
model = NextTokenModel.new model_name, tmp_dir, training_num_train_epochs: 1000, training_learning_rate: 0.1

chat = Chat.setup []
chat.user "say hi"
pp model.eval chat # generation before training

state, tok = model.init
tok.pad_token = tok.eos_token

train_texts = ["say hi, no!", "say hi, hi", ...]
model.add_list train_texts.shuffle
model.train

pp model.eval chat # improved generations
model.save
reloaded = PythonModel.new tmp_dir
pp reloaded.eval chat
```

---

## Feature extraction and post-processing

A key pattern is to keep evaluation logic generic and tailor feature extraction and post‑processing for each task.

- extract_features(sample) and extract_features_list(list) let you shape inputs into the structure your model consumes.
- post_process(result) or post_process_list(list) convert raw outputs to your final format (e.g., argmax to label, logits to softmax).

ExTRI2 workflow example (SequenceClassification)
```ruby
# The tri_sentences task uses a Huggingface SequenceClassification model
tri_model = Rbbt.models[tri_model].find unless File.exist?(tri_model)
model = HuggingfaceModel.new 'SequenceClassification', tri_model, nil,
  tokenizer_args: { model_max_length: 512, truncation: true },
  return_logits: true

# Convert each TSV row into the sequence the model expects
model.extract_features do |_, feature_list|
  feature_list.collect do |text, tf, tg|
    text.sub("[TF]", "<TF>#{tf}</TF>").sub("[TG]", "<TG>#{tg}</TG>")
  end
end

model.init

# Evaluate as a batch (tsv.slice returns [["Text","TF","Gene"], ...])
predictions = model.eval_list tsv.slice(["Text", "TF", "Gene"]).values

# Write classifier output back to TSV
tsv.add_field "Valid score" do
  non_valid, valid = predictions.shift
  begin
    Misc.softmax([valid, non_valid]).first
  rescue
    0
  end
end

tsv.add_field "Valid" do |_, values|
  values.last > 0.5 ? "Valid" : "Non valid"
end
```

Key takeaways:
- Use extract_features to canonicalize input format independent of how your rows are structured.
- Batch evaluation with eval_list on large tables; then write back into TSV columns.
- Persist the model directory to reuse across runs.

---

## Training data management

Collect samples:
- add(sample, label=nil)
- add_list(list, labels=nil)
  - labels may be an Array aligned with list or a Hash mapping sample->label.
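
For example, a small sketch with invented samples and labels showing both label forms:

```ruby
# One sample at a time
model.add "The interface is great", "positive"
model.add "It crashes on startup", "negative"

# In bulk, with labels as a parallel Array or as a Hash keyed by sample
samples = ["Love it", "Too slow"]
model.add_list samples, ["positive", "negative"]
model.add_list samples, "Love it" => "positive", "Too slow" => "negative"
```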

In Torch/HF paths, training consumes @features/@labels after feature extraction:
- SequenceClassificationModel’s train writes a TSV dataset to disk, builds TrainingArguments, tokenizes, and runs transformers.Trainer.
- TorchModel’s train uses a simple loop with SGD and MSELoss by default (override criterion/optimizer if needed).

---

## Persistence and restore

Behavior and state are independent:
- Behavior (Ruby Procs for eval/extract_features/train/etc.) are saved to .rb sibling files in directory; they are reloaded and instance_eval’ed on restore.
- Options are persisted to options.json and merged on restore.
- State depends on implementation:
  - TorchModel: two files — state (weights) and architecture dump (.architecture).
  - HuggingfaceModel: directory with tokenizer+model via save_pretrained.
  - PythonModel: you define save_state/load_state (or rely on higher-level class).

Common methods:
- save — writes options, behavior files, and calls save_state if @state exists.
- restore — loads behavior files and options; state is lazy-initialized by calling init/load_state when used next.
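
For a plain ScoutModel (or a PythonModel whose learned state is ordinary Ruby data), you can define the persistence hooks yourself. A sketch that serializes @state as JSON, which is only one convenient choice:

```ruby
require 'json'

model.save_state do |state_file, state|
  File.write(state_file, state.to_json) # persist learned parameters
end

model.load_state do |state_file|
  @state = JSON.parse(File.read(state_file)) if File.exist?(state_file)
end

model.save # writes options.json, the behavior .rb files, and the state via the block above
```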

---

## Devices, tensors, and memory notes (PyTorch)

- Choose device automatically or pass options: { device: 'cuda' } or { device: 'cpu' }.
- TorchModel::Tensor#to_ruby converts tensors to Ruby arrays via numpy; #to_ruby! also calls .del to free GPU memory (detach, move to CPU, clear grads and storage).
- Freeze layers if fine-tuning only a head: TorchModel.freeze_layer(state, "encoder.layer.0", false).
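
Putting those pieces together, a sketch (the layer path "encoder.layer.0" is illustrative and depends on your module's structure):

```ruby
model = TorchModel.new dir, device: 'cuda' # or device: 'cpu'

# Freeze everything except the head before fine-tuning
TorchModel.freeze_layer model.state, "encoder.layer.0"

# Build a tensor on the configured device, then bring it back to Ruby and release it
t = TorchModel.tensor([1.0, 2.0], model.device, model.dtype)
p t.to_ruby # => [1.0, 2.0]
t.del       # free the underlying storage when you are done
```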

---

## Building your own specializations

You can layer new classes over PythonModel/TorchModel/HuggingfaceModel to produce high-level behaviors:

- Override initialize to:
  - Call super(...) with task/checkpoint/dir/options.
  - Provide eval blocks suited for your task (e.g., locate tokens, decode strategies).
  - Provide post_process/post_process_list.
  - Provide train with your pipeline (tokenization, trainer, or custom loop).
  - Optionally override save_state/load_state.

- Or, stick with a plain ScoutModel and define init/eval/train/… blocks directly—particularly useful for lightweight pure-Ruby or ad‑hoc model logic.
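
A skeleton of what a specialization can look like, shown here over the plain base class so the sketch stays self-contained; real specializations would typically layer over TorchModel or HuggingfaceModel in the same way:

```ruby
# Illustrative subclass: a tiny nearest-neighbor label model
class NearestLabelModel < ScoutModel
  def initialize(directory = nil, options = {})
    super(directory, options)

    extract_features { |sample| sample } # identity features

    train do |features, labels|
      @state = features.zip(labels) # remember the training pairs
    end

    eval do |sample, list = nil|
      samples = list || [sample]
      res = samples.map { |s| @state.min_by { |f, _| (f - s).abs }.last }
      list ? res : res.first
    end
  end
end

model = NearestLabelModel.new
model.add 1.0, "low"
model.add 10.0, "high"
model.train
model.eval(2.0) # => "low"
```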

---

## Patterns and recommendations

- Start simple with ScoutModel for logic prototyping; then move to PythonModel/TorchModel/Hugging Face when integrating Python models.
- Always isolate feature extraction from evaluation to keep eval focused on the lower-level API your model expects.
- Persist: pass a directory when you want to reuse a model and its learned parameters across runs; call save after training.
- For table‑driven workflows, use eval_list and TSV traversal to batch efficiently (see ExTRI2 usage).
- In TorchModel, explicitly set criterion/optimizer where the default (SGD + MSELoss) is not appropriate.

---

## API quick reference

Common (ScoutModel)
- new(directory=nil, options={})
- init { ... } / init() → @state
- eval(sample=nil) { |features| ... } → result
- eval_list(list=nil) { |list| ... } → array of results
- extract_features(sample=nil) { ... }, extract_features_list(list=nil) { ... }
- post_process(result=nil) { ... }, post_process_list(list=nil) { ... }
- train { |features, labels| ... } / train()
- add(sample, label=nil), add_list(list, labels=nil or Hash)
- save / restore
- save_state { |state_file, state| ... }, load_state { |state_file| ... }
- directory, state_file, options

PythonModel
- new(dir, python_class=nil, python_module=:model, options={})
- On init: state is an instance of the Python class.

TorchModel
- state (PyTorch nn.Module)
- criterion, optimizer, device, dtype
- TorchModel.init_python
- TorchModel.tensor(obj, device, dtype) → Tensor wrapper
- TorchModel.save(state_file, state) / TorchModel.load(state_file, state=nil)
- TorchModel.get_layer(state, path), freeze_layer(state, path, requires_grad=false)

HuggingfaceModel
- new(task=nil, checkpoint=nil, dir=nil, options={})
- options: training_args (or training: {}), tokenizer_args (or tokenizer: {})
- save_state/load_state via save_pretrained/from_pretrained

SequenceClassificationModel
- class_labels (optional)
- train(texts, labels)
- eval(text or list of texts) → label(s) or your post_process

CausalModel
- eval(messages) → generated text
- train(pairs, rewards) — RLHF pipeline

NextTokenModel
- train(texts) — next-token fine-tuning loop

---

## CLI

No dedicated “model” CLI commands are shipped in scout-ai. You will typically:
- Invoke models programmatically from Ruby code, or
- Use them inside Workflows (see ExTRI2 below), then drive training/eval via Workflow’s CLI (scout workflow task …).

Refer to the Workflow documentation for CLI usage if you integrate models into tasks.

---

## Example: using a Hugging Face classifier inside a Workflow (ExTRI2)

The ExTRI2 workflow builds sequence classification models to validate TRI sentences and determine Mode of Regulation (MoR). It uses HuggingfaceModel and custom feature extraction to mark [TF]/[TG] mentions:

```ruby
model = HuggingfaceModel.new 'SequenceClassification', tri_model, nil,
  tokenizer_args: { model_max_length: 512, truncation: true },
  return_logits: true

model.extract_features do |_, rows|
  rows.map do |text, tf, tg|
    text.sub("[TF]", "<TF>#{tf}</TF>").sub("[TG]", "<TG>#{tg}</TG>")
  end
end

model.init
predictions = model.eval_list tsv.slice(["Text", "TF", "Gene"]).values

tsv.add_field "Valid score" do
  non_valid, valid = predictions.shift
  Misc.softmax([valid, non_valid]).first rescue 0
end

tsv.add_field "Valid" do |_, row|
  row.last > 0.5 ? "Valid" : "Non valid"
end
```

This pattern—feature extraction tied to the row schema, batch evaluation, then TSV augmentation—is representative of how to fold models into reproducible pipelines.

---

Model provides the minimal structure needed to adapt, persist, and reuse models across Ruby and Python ecosystems, while keeping your training/evaluation logic concise and testable. Use the base hooks for clarity, leverage Torch/HF helpers when needed, and integrate with Workflows to scale out training and inference.
data/doc/RAG.md ADDED
@@ -0,0 +1,129 @@
# RAG (Retrieval-Augmented Generation) module

This document explains how to use the RAG helper provided in Scout (lib/scout/llm/rag.rb).

Audience: AI agents and developers integrating retrieval-augmented flows into other applications.

Overview
--------
LLM::RAG provides a thin helper to build a nearest-neighbor index over embedding vectors using the hnswlib library. It expects an array of fixed-size numeric vectors (Float arrays) and returns an HNSW index that can be queried with another vector to find the nearest neighbors.

The RAG.index method is intentionally small and focused:

- It requires the `hnswlib` Ruby gem at runtime (loaded inside the method).
- It uses L2 (Euclidean) distance by default.
- It sets the index dimension to the length of the first vector and initializes the HNSW index with the number of elements supplied.
- Each vector is added in order; the integer ID stored in the index is the zero-based position in the input array.

Prerequisites
-------------
- Ruby environment with the Scout gem code available.
- The `hnswlib` Ruby gem installed (the method requires it dynamically):

      gem install hnswlib

- An embedding function that produces fixed-length numeric vectors. Scout exposes LLM.embed(...) which delegates to configured backends (OpenAI, Ollama, etc.). Ensure your embedding backend is configured and working.

Basic usage
-----------
The common RAG flow is:

1. Prepare a corpus (array of documents or chunks).
2. Compute embeddings for each document.
3. Build an HNSW index from those embeddings using LLM::RAG.index.
4. For a query, compute its embedding and run a nearest-neighbor search on the index.
5. Map matched neighbor indices back to the original documents.

Example (Ruby)
--------------
This example shows a minimal end-to-end flow using Scout's LLM.embed helper to compute embeddings and LLM::RAG to build and query an index.

```ruby
# `documents` is an array of strings (documents/chunks).
documents = [
  "How to make espresso at home",
  "Machine learning: an introduction",
  "Ruby concurrency primitives and patterns",
  "Cooking guide: baking sourdough"
]

# 1) Compute embeddings for each document.
#    Use whatever embed model/backend you have configured. Pass model: if needed.
embeddings = documents.map do |doc|
  # returns an Array<Float> of fixed length
  LLM.embed(doc, model: 'mxbai-embed-large')
end

# 2) Build the HNSW index
index = LLM::RAG.index(embeddings)

# 3) For a query, compute its embedding
query = "best way to brew espresso"
query_vec = LLM.embed(query, model: 'mxbai-embed-large')

# 4) Run nearest-neighbor search
#    search_knn returns two arrays: node indices and distances/scores
k = 3
nodes, scores = index.search_knn(query_vec, k)

# 5) Map indices back to original documents
results = nodes.map { |i| documents[i] }

puts "Top #{k} results:"
results.each_with_index do |doc, idx|
  puts "#{idx + 1}. #{doc} (score=#{scores[idx]})"
end
```

Notes and best practices
------------------------
- Vector dimensionality: All vectors passed to LLM::RAG.index must have identical length. The code inspects `data.first.length` to determine the index dimension.
- Index IDs: The HNSW index stores integer IDs equal to the input array index. Keep a mapping from those indices to your document IDs/metadata (for instance, an array of document IDs parallel to the embeddings array).
- Persistence: The RAG helper code only constructs and populates the index in memory. The underlying `hnswlib` gem typically offers persistence APIs (save/load). To persist or reload an index, consult the `hnswlib` gem documentation for the correct methods and usage patterns.
- Memory and performance: HNSW indexes keep data in memory and can be large for many vectors. Choose your chunking strategy and max dataset size accordingly.
- Distance metric: The current implementation uses the `'l2'` (Euclidean) space. If your application needs cosine similarity, either normalize vectors before indexing (common practice) or check whether the hnswlib Ruby binding supports a cosine space and adapt accordingly.
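
If you take the normalization route, a small helper is enough; with unit-length vectors, ranking by L2 distance orders neighbors the same way cosine similarity does (a sketch):

```ruby
# Normalize each embedding to unit length before indexing and before querying
def normalize(vec)
  norm = Math.sqrt(vec.sum { |v| v * v })
  norm.zero? ? vec : vec.map { |v| v / norm }
end

index = LLM::RAG.index(embeddings.map { |e| normalize(e) })
nodes, scores = index.search_knn(normalize(query_vec), 3)
```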

Example: utility wrapper
------------------------
Here is a small utility that wraps the typical pattern and returns the top-k documents and scores for a query.

```ruby
# documents: Array of items (strings or objects). If objects, provide a `to_embedding_source` or pass a block to extract text.
# embed_opts: options forwarded to LLM.embed (e.g. model: ...)
def build_rag_index(documents, embed_opts = {})
  # compute embeddings in order
  embeddings = documents.map { |d| LLM.embed(d, embed_opts) }
  index = LLM::RAG.index(embeddings)
  [index, embeddings]
end

def rag_query(index, documents, query, k = 5, embed_opts = {})
  qvec = LLM.embed(query, embed_opts)
  nodes, scores = index.search_knn(qvec, k)
  nodes.zip(scores).map { |i, score| { doc: documents[i], score: score } }
end

# Usage:
#   index, embs = build_rag_index(documents, model: 'mxbai-embed-large')
#   top = rag_query(index, documents, 'how to make coffee', 3, model: 'mxbai-embed-large')
```

Troubleshooting
---------------
- "NoMethodError" or "uninitialized constant Hnswlib": ensure the `hnswlib` gem is installed and available to your Ruby runtime.
- Inconsistent dimensions: If you see errors related to dimension mismatch, confirm every embedding vector has the same length and is numeric.
- Mapping errors: Remember the index IDs correspond to the zero-based position in the `data` array passed to LLM::RAG.index. Keep a parallel array or map to metadata (IDs, titles, etc.).

Further integration
-------------------
- Use chunking for long documents: split long documents into smaller passages, embed each passage, and keep a mapping from passage index to parent document (see the sketch below).
- Use result reranking: after retrieval, you can rerank retrieved documents with more expensive cross-encoders or scoring functions.
- Combine with generative models: feed retrieved passages into an LLM prompt to produce answers grounded in retrieved content.
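
A minimal chunking sketch (word-based windows with overlap; the sizes are arbitrary):

```ruby
# Split documents into overlapping word windows, remembering each chunk's parent document
def chunk_documents(documents, size: 200, overlap: 50)
  chunks, parents = [], []
  documents.each_with_index do |doc, doc_idx|
    words = doc.split
    step = size - overlap
    (0...words.length).step(step) do |start|
      chunks << words[start, size].join(" ")
      parents << doc_idx
    end
  end
  [chunks, parents] # embed and index chunks; parents[i] recovers the source document
end
```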

References
----------
- lib/scout/llm/rag.rb (implementation)
- hnswlib Ruby gem (install and persistence documentation)
- Scout LLM embedding helpers (lib/scout/llm/embed.rb)