toy 0.8.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +1124 -0
- data/LICENSE +21 -0
- data/Makefile +2022 -0
- data/README.md +154 -0
- data/bin/toy +10 -0
- data/lib/toy/compute.rb +135 -0
- data/lib/toy/compute_cuda.rb +104 -0
- data/lib/toy/compute_metal.rb +97 -0
- data/lib/toy/core/cli/describe.rb +188 -0
- data/lib/toy/core/cli/eval.rb +385 -0
- data/lib/toy/core/cli/exit_codes.rb +15 -0
- data/lib/toy/core/cli/fetch.rb +238 -0
- data/lib/toy/core/cli/infer.rb +268 -0
- data/lib/toy/core/cli/install.rb +228 -0
- data/lib/toy/core/cli/list.rb +86 -0
- data/lib/toy/core/cli/manifest.rb +49 -0
- data/lib/toy/core/cli/new.rb +594 -0
- data/lib/toy/core/cli/serve.rb +237 -0
- data/lib/toy/core/cli/train.rb +471 -0
- data/lib/toy/core/cli.rb +165 -0
- data/lib/toy/core/config.rb +64 -0
- data/lib/toy/core/gguf_meta.rb +161 -0
- data/lib/toy/core/model_scan.rb +221 -0
- data/lib/toy/core/run_log.rb +94 -0
- data/lib/toy/core/toy_root.rb +95 -0
- data/lib/toy/dev/toy_card.rb +299 -0
- data/lib/toy/dev/toy_describe_flow.rb +412 -0
- data/lib/toy/dev/toy_logprobs.rb +86 -0
- data/lib/toy/dev/toy_tap.rb +183 -0
- data/lib/toy/dev/toy_token_drift.rb +121 -0
- data/lib/toy/ffi/tinynn.rb +1491 -0
- data/lib/toy/ffi/tinynn_cuda.rb +1124 -0
- data/lib/toy/ffi/tinynn_metal.rb +359 -0
- data/lib/toy/ffi_manifest.rb +84 -0
- data/lib/toy/io/bpe.rb +325 -0
- data/lib/toy/io/gguf_kv.rb +35 -0
- data/lib/toy/io/gguf_load.rb +331 -0
- data/lib/toy/io/loaders/toy_gpt2_loader.rb +70 -0
- data/lib/toy/io/loaders/toy_smollm2_loader.rb +754 -0
- data/lib/toy/io/model_index.rb +206 -0
- data/lib/toy/io/run_bundle.rb +280 -0
- data/lib/toy/io/tokenizer.rb +613 -0
- data/lib/toy/io/toy_corpus_loader.rb +52 -0
- data/lib/toy/io/toy_events.rb +56 -0
- data/lib/toy/io/toy_image_loader.rb +48 -0
- data/lib/toy/llm/adamw.rb +169 -0
- data/lib/toy/llm/archs/llama_arch.rb +233 -0
- data/lib/toy/llm/archs/llama_arch_cuda.rb +237 -0
- data/lib/toy/llm/archs/llama_arch_metal.rb +237 -0
- data/lib/toy/llm/blocks/transformer_block.rb +876 -0
- data/lib/toy/llm/blocks/transformer_block_cuda.rb +880 -0
- data/lib/toy/llm/blocks/transformer_block_metal.rb +880 -0
- data/lib/toy/llm/classify_batch.rb +88 -0
- data/lib/toy/llm/engine/gpt2_fwd_engine.rb +360 -0
- data/lib/toy/llm/engine/gpt2_fwd_engine_cuda.rb +362 -0
- data/lib/toy/llm/engine/gpt2_fwd_engine_metal.rb +362 -0
- data/lib/toy/llm/engine/gpt2_kv_engine.rb +346 -0
- data/lib/toy/llm/engine/gpt2_kv_engine_cuda.rb +348 -0
- data/lib/toy/llm/engine/gpt2_kv_engine_metal.rb +348 -0
- data/lib/toy/llm/engine/gpt2_seq_engine.rb +289 -0
- data/lib/toy/llm/engine/gpt2_seq_engine_cuda.rb +293 -0
- data/lib/toy/llm/engine/gpt2_seq_engine_metal.rb +293 -0
- data/lib/toy/llm/engine/llama_kv_engine.rb +1593 -0
- data/lib/toy/llm/engine/llama_kv_engine_cuda.rb +1526 -0
- data/lib/toy/llm/engine/llama_kv_engine_metal.rb +1526 -0
- data/lib/toy/llm/engine/llama_seq_engine.rb +1233 -0
- data/lib/toy/llm/engine/llama_seq_engine_cuda.rb +1238 -0
- data/lib/toy/llm/engine/llama_seq_engine_metal.rb +1238 -0
- data/lib/toy/llm/engine/vit_tiny_engine.rb +467 -0
- data/lib/toy/llm/labels.rb +142 -0
- data/lib/toy/llm/primitives/gqa.rb +62 -0
- data/lib/toy/llm/primitives/gqa_cuda.rb +66 -0
- data/lib/toy/llm/primitives/gqa_metal.rb +66 -0
- data/lib/toy/llm/primitives/rms_norm.rb +39 -0
- data/lib/toy/llm/primitives/rms_norm_cuda.rb +43 -0
- data/lib/toy/llm/primitives/rms_norm_metal.rb +43 -0
- data/lib/toy/llm/primitives/rope.rb +68 -0
- data/lib/toy/llm/primitives/rope_cuda.rb +72 -0
- data/lib/toy/llm/primitives/rope_metal.rb +72 -0
- data/lib/toy/llm/primitives/swiglu.rb +41 -0
- data/lib/toy/llm/primitives/swiglu_cuda.rb +45 -0
- data/lib/toy/llm/primitives/swiglu_metal.rb +45 -0
- data/lib/toy/llm/recipe_options.rb +71 -0
- data/lib/toy/llm/recipes/from_scratch.rb +105 -0
- data/lib/toy/llm/recipes/from_scratch_cuda.rb +109 -0
- data/lib/toy/llm/recipes/from_scratch_metal.rb +109 -0
- data/lib/toy/llm/recipes/lora.rb +110 -0
- data/lib/toy/llm/recipes/lora_cuda.rb +114 -0
- data/lib/toy/llm/recipes/lora_metal.rb +114 -0
- data/lib/toy/llm/recipes/vit_tiny.rb +75 -0
- data/lib/toy/llm/recipes/warm_start.rb +235 -0
- data/lib/toy/llm/recipes/warm_start_cuda.rb +239 -0
- data/lib/toy/llm/recipes/warm_start_metal.rb +239 -0
- data/lib/toy/llm/training_batch.rb +133 -0
- data/lib/toy/models/arch.rb +253 -0
- data/lib/toy/models/gpt2.rb +311 -0
- data/lib/toy/models/toy_gpt2.rb +177 -0
- data/lib/toy/models/toy_smollm2.rb +393 -0
- data/lib/toy/models/toy_vit.rb +83 -0
- data/lib/toy/models/transformer.rb +1494 -0
- data/lib/toy/models/transformer_lm.rb +298 -0
- data/lib/toy/models/transformer_lm_cuda.rb +159 -0
- data/lib/toy/models/transformer_lm_metal.rb +142 -0
- data/lib/toy/mri.rb +300 -0
- data/lib/toy/run/eval.rb +76 -0
- data/lib/toy/run/eval_cuda.rb +66 -0
- data/lib/toy/run/eval_lmc.rb +334 -0
- data/lib/toy/run/eval_metal.rb +67 -0
- data/lib/toy/run/infer.rb +130 -0
- data/lib/toy/run/infer_cuda.rb +118 -0
- data/lib/toy/run/infer_metal.rb +119 -0
- data/lib/toy/run/infer_trace.rb +37 -0
- data/lib/toy/run/serve.rb +144 -0
- data/lib/toy/run/train.rb +404 -0
- data/lib/toy/run/train_cuda.rb +397 -0
- data/lib/toy/run/train_gpt2.rb +103 -0
- data/lib/toy/run/train_gpt2_cuda.rb +85 -0
- data/lib/toy/run/train_gpt2_metal.rb +85 -0
- data/lib/toy/run/train_lora.rb +207 -0
- data/lib/toy/run/train_lora_cuda.rb +219 -0
- data/lib/toy/run/train_metal.rb +227 -0
- data/lib/toy/run/train_vit.rb +251 -0
- data/lib/toy/serve/openai/embeddings_handler.rb +92 -0
- data/lib/toy/serve/openai/handlers.rb +143 -0
- data/lib/toy/serve/openai/server.rb +159 -0
- data/lib/toy/train/sampler.rb +314 -0
- data/lib/toy/train/toy_chat_template.rb +179 -0
- data/lib/toy/train/toy_drift_grad.rb +176 -0
- data/lib/toy/train/toy_gguf_fuse.rb +428 -0
- data/lib/toy/train/toy_gguf_writer.rb +100 -0
- data/lib/toy/train/toy_lr_schedule.rb +39 -0
- data/lib/toy/train/toy_sample.rb +125 -0
- data/lib/toy/train/toy_trainer.rb +86 -0
- data/lib/toy/train/training.rb +160 -0
- data/lib/toy/version.rb +11 -0
- data/lib/toy.rb +902 -0
- data/prep/progress +118 -0
- data/prep/quietly +64 -0
- data/sig/toy.rbs +397 -0
- data/sig/toy_compute.rbs +450 -0
- data/spinel-ext.json +122 -0
- data/tinynn/Makefile +71 -0
- data/tinynn/tinynn_backend_cuda.c +99 -0
- data/tinynn/tinynn_backend_metal.m +75 -0
- data/tinynn/tinynn_events.c +122 -0
- data/tinynn/tinynn_events.h +83 -0
- data/tinynn/tinynn_ggml.c +2460 -0
- data/tinynn/tinynn_ggml.h +545 -0
- data/tinynn/tinynn_gguf.c +783 -0
- data/tinynn/tinynn_gguf.h +167 -0
- data/tinynn/tinynn_trace.c +180 -0
- data/tinynn/tinynn_trace.h +85 -0
- data/vendor/ggml/AUTHORS +335 -0
- data/vendor/ggml/CMakeLists.txt +505 -0
- data/vendor/ggml/CONTRIBUTING.md +3 -0
- data/vendor/ggml/LICENSE +21 -0
- data/vendor/ggml/README.md +50 -0
- data/vendor/ggml/ci/run.sh +395 -0
- data/vendor/ggml/cmake/FindNCCL.cmake +36 -0
- data/vendor/ggml/cmake/GitVars.cmake +22 -0
- data/vendor/ggml/cmake/common.cmake +50 -0
- data/vendor/ggml/cmake/ggml-config.cmake.in +191 -0
- data/vendor/ggml/docs/gguf.md +828 -0
- data/vendor/ggml/examples/CMakeLists.txt +34 -0
- data/vendor/ggml/examples/common-ggml.cpp +244 -0
- data/vendor/ggml/examples/common-ggml.h +18 -0
- data/vendor/ggml/examples/common.cpp +675 -0
- data/vendor/ggml/examples/common.h +322 -0
- data/vendor/ggml/examples/gpt-2/CMakeLists.txt +32 -0
- data/vendor/ggml/examples/gpt-2/README.md +225 -0
- data/vendor/ggml/examples/gpt-2/convert-cerebras-to-ggml.py +183 -0
- data/vendor/ggml/examples/gpt-2/convert-ckpt-to-ggml.py +159 -0
- data/vendor/ggml/examples/gpt-2/convert-h5-to-ggml.py +195 -0
- data/vendor/ggml/examples/gpt-2/download-ggml-model.sh +69 -0
- data/vendor/ggml/examples/gpt-2/download-model.sh +48 -0
- data/vendor/ggml/examples/gpt-2/main-alloc.cpp +880 -0
- data/vendor/ggml/examples/gpt-2/main-backend.cpp +946 -0
- data/vendor/ggml/examples/gpt-2/main-batched.cpp +1210 -0
- data/vendor/ggml/examples/gpt-2/main-ctx.cpp +840 -0
- data/vendor/ggml/examples/gpt-2/main-sched.cpp +1079 -0
- data/vendor/ggml/examples/gpt-2/quantize.cpp +184 -0
- data/vendor/ggml/examples/gpt-j/CMakeLists.txt +13 -0
- data/vendor/ggml/examples/gpt-j/README.md +239 -0
- data/vendor/ggml/examples/gpt-j/convert-h5-to-ggml.py +173 -0
- data/vendor/ggml/examples/gpt-j/download-ggml-model.sh +69 -0
- data/vendor/ggml/examples/gpt-j/download-model.sh +11 -0
- data/vendor/ggml/examples/gpt-j/main.cpp +755 -0
- data/vendor/ggml/examples/gpt-j/quantize.cpp +182 -0
- data/vendor/ggml/examples/magika/CMakeLists.txt +17 -0
- data/vendor/ggml/examples/magika/README.md +23 -0
- data/vendor/ggml/examples/magika/convert.py +32 -0
- data/vendor/ggml/examples/magika/main.cpp +374 -0
- data/vendor/ggml/examples/mnist/CMakeLists.txt +58 -0
- data/vendor/ggml/examples/mnist/README.md +206 -0
- data/vendor/ggml/examples/mnist/mnist-common.cpp +496 -0
- data/vendor/ggml/examples/mnist/mnist-common.h +166 -0
- data/vendor/ggml/examples/mnist/mnist-eval.cpp +67 -0
- data/vendor/ggml/examples/mnist/mnist-train-cnn.py +91 -0
- data/vendor/ggml/examples/mnist/mnist-train-fc.py +131 -0
- data/vendor/ggml/examples/mnist/mnist-train.cpp +39 -0
- data/vendor/ggml/examples/mnist/server.py +36 -0
- data/vendor/ggml/examples/mnist/web/index.html +178 -0
- data/vendor/ggml/examples/perf-metal/CMakeLists.txt +7 -0
- data/vendor/ggml/examples/perf-metal/perf-metal.cpp +152 -0
- data/vendor/ggml/examples/prompts/dolly-v2.txt +100 -0
- data/vendor/ggml/examples/prompts/gpt-2-chinese.txt +1 -0
- data/vendor/ggml/examples/prompts/gpt-2.txt +100 -0
- data/vendor/ggml/examples/prompts/gpt-j.txt +100 -0
- data/vendor/ggml/examples/prompts/gpt-neox-japanese.txt +1 -0
- data/vendor/ggml/examples/prompts/gpt-neox.txt +100 -0
- data/vendor/ggml/examples/prompts/polyglot-ko.txt +3 -0
- data/vendor/ggml/examples/prompts/replit.txt +100 -0
- data/vendor/ggml/examples/prompts/starcoder.txt +100 -0
- data/vendor/ggml/examples/prompts/test-cases.txt +110 -0
- data/vendor/ggml/examples/prompts/tokenize_huggingface.py +65 -0
- data/vendor/ggml/examples/prompts/whisper.txt +100 -0
- data/vendor/ggml/examples/python/README.md +115 -0
- data/vendor/ggml/examples/python/api.h +14 -0
- data/vendor/ggml/examples/python/example_add_quant.py +25 -0
- data/vendor/ggml/examples/python/example_test_all_quants.py +68 -0
- data/vendor/ggml/examples/python/ggml/__init__.py +58 -0
- data/vendor/ggml/examples/python/ggml/__init__.pyi +2406 -0
- data/vendor/ggml/examples/python/ggml/cffi.py +11 -0
- data/vendor/ggml/examples/python/ggml/ffi/__init__.pyi +7 -0
- data/vendor/ggml/examples/python/ggml/utils.py +182 -0
- data/vendor/ggml/examples/python/regenerate.py +42 -0
- data/vendor/ggml/examples/python/stubs.py +128 -0
- data/vendor/ggml/examples/python/test_tensor.py +258 -0
- data/vendor/ggml/examples/sam/CMakeLists.txt +13 -0
- data/vendor/ggml/examples/sam/README.md +95 -0
- data/vendor/ggml/examples/sam/convert-pth-to-ggml.py +147 -0
- data/vendor/ggml/examples/sam/example.jpg +0 -0
- data/vendor/ggml/examples/sam/sam.cpp +2370 -0
- data/vendor/ggml/examples/simple/CMakeLists.txt +21 -0
- data/vendor/ggml/examples/simple/README.md +61 -0
- data/vendor/ggml/examples/simple/simple-backend.cpp +153 -0
- data/vendor/ggml/examples/simple/simple-ctx.cpp +127 -0
- data/vendor/ggml/examples/stb_image.h +7987 -0
- data/vendor/ggml/examples/stb_image_write.h +1724 -0
- data/vendor/ggml/examples/test-cmake/CMakeLists.txt +10 -0
- data/vendor/ggml/examples/test-cmake/README.md +3 -0
- data/vendor/ggml/examples/test-cmake/test-cmake.cpp +6 -0
- data/vendor/ggml/examples/yolo/CMakeLists.txt +6 -0
- data/vendor/ggml/examples/yolo/README.md +59 -0
- data/vendor/ggml/examples/yolo/convert-yolov3-tiny.py +53 -0
- data/vendor/ggml/examples/yolo/data/coco.names +80 -0
- data/vendor/ggml/examples/yolo/data/labels/100_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/100_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/100_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/100_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/100_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/100_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/100_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/100_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/101_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/101_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/101_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/101_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/101_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/101_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/101_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/101_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/102_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/102_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/102_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/102_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/102_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/102_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/102_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/102_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/103_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/103_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/103_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/103_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/103_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/103_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/103_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/103_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/104_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/104_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/104_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/104_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/104_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/104_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/104_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/104_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/105_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/105_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/105_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/105_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/105_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/105_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/105_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/105_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/106_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/106_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/106_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/106_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/106_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/106_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/106_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/106_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/107_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/107_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/107_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/107_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/107_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/107_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/107_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/107_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/108_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/108_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/108_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/108_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/108_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/108_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/108_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/108_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/109_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/109_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/109_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/109_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/109_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/109_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/109_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/109_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/110_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/110_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/110_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/110_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/110_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/110_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/110_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/110_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/111_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/111_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/111_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/111_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/111_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/111_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/111_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/111_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/112_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/112_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/112_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/112_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/112_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/112_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/112_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/112_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/113_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/113_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/113_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/113_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/113_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/113_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/113_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/113_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/114_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/114_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/114_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/114_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/114_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/114_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/114_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/114_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/115_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/115_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/115_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/115_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/115_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/115_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/115_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/115_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/116_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/116_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/116_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/116_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/116_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/116_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/116_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/116_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/117_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/117_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/117_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/117_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/117_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/117_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/117_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/117_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/118_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/118_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/118_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/118_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/118_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/118_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/118_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/118_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/119_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/119_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/119_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/119_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/119_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/119_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/119_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/119_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/120_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/120_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/120_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/120_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/120_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/120_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/120_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/120_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/121_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/121_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/121_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/121_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/121_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/121_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/121_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/121_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/122_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/122_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/122_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/122_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/122_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/122_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/122_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/122_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/123_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/123_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/123_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/123_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/123_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/123_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/123_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/123_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/124_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/124_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/124_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/124_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/124_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/124_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/124_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/124_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/125_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/125_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/125_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/125_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/125_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/125_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/125_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/125_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/126_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/126_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/126_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/126_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/126_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/126_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/126_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/126_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/32_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/32_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/32_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/32_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/32_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/32_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/32_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/32_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/33_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/33_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/33_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/33_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/33_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/33_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/33_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/33_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/34_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/34_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/34_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/34_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/34_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/34_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/34_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/34_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/35_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/35_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/35_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/35_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/35_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/35_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/35_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/35_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/36_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/36_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/36_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/36_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/36_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/36_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/36_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/36_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/37_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/37_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/37_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/37_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/37_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/37_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/37_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/37_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/38_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/38_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/38_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/38_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/38_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/38_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/38_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/38_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/39_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/39_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/39_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/39_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/39_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/39_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/39_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/39_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/40_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/40_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/40_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/40_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/40_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/40_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/40_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/40_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/41_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/41_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/41_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/41_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/41_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/41_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/41_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/41_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/42_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/42_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/42_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/42_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/42_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/42_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/42_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/42_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/43_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/43_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/43_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/43_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/43_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/43_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/43_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/43_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/44_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/44_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/44_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/44_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/44_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/44_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/44_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/44_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/45_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/45_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/45_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/45_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/45_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/45_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/45_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/45_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/46_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/46_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/46_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/46_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/46_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/46_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/46_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/46_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/47_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/47_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/47_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/47_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/47_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/47_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/47_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/47_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/48_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/48_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/48_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/48_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/48_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/48_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/48_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/48_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/49_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/49_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/49_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/49_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/49_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/49_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/49_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/49_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/50_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/50_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/50_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/50_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/50_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/50_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/50_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/50_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/51_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/51_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/51_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/51_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/51_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/51_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/51_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/51_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/52_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/52_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/52_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/52_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/52_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/52_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/52_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/52_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/53_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/53_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/53_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/53_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/53_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/53_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/53_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/53_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/54_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/54_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/54_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/54_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/54_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/54_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/54_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/54_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/55_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/55_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/55_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/55_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/55_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/55_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/55_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/55_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/56_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/56_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/56_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/56_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/56_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/56_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/56_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/56_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/57_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/57_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/57_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/57_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/57_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/57_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/57_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/57_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/58_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/58_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/58_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/58_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/58_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/58_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/58_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/58_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/59_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/59_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/59_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/59_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/59_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/59_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/59_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/59_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/60_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/60_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/60_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/60_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/60_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/60_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/60_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/60_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/61_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/61_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/61_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/61_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/61_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/61_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/61_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/61_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/62_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/62_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/62_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/62_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/62_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/62_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/62_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/62_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/63_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/63_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/63_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/63_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/63_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/63_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/63_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/63_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/64_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/64_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/64_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/64_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/64_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/64_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/64_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/64_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/65_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/65_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/65_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/65_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/65_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/65_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/65_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/65_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/66_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/66_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/66_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/66_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/66_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/66_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/66_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/66_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/67_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/67_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/67_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/67_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/67_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/67_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/67_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/67_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/68_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/68_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/68_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/68_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/68_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/68_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/68_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/68_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/69_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/69_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/69_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/69_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/69_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/69_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/69_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/69_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/70_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/70_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/70_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/70_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/70_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/70_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/70_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/70_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/71_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/71_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/71_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/71_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/71_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/71_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/71_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/71_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/72_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/72_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/72_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/72_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/72_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/72_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/72_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/72_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/73_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/73_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/73_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/73_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/73_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/73_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/73_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/73_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/74_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/74_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/74_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/74_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/74_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/74_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/74_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/74_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/75_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/75_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/75_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/75_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/75_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/75_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/75_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/75_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/76_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/76_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/76_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/76_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/76_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/76_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/76_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/76_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/77_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/77_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/77_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/77_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/77_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/77_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/77_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/77_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/78_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/78_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/78_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/78_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/78_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/78_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/78_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/78_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/79_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/79_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/79_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/79_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/79_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/79_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/79_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/79_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/80_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/80_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/80_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/80_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/80_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/80_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/80_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/80_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/81_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/81_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/81_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/81_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/81_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/81_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/81_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/81_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/82_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/82_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/82_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/82_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/82_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/82_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/82_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/82_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/83_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/83_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/83_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/83_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/83_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/83_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/83_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/83_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/84_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/84_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/84_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/84_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/84_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/84_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/84_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/84_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/85_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/85_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/85_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/85_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/85_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/85_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/85_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/85_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/86_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/86_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/86_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/86_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/86_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/86_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/86_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/86_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/87_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/87_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/87_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/87_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/87_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/87_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/87_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/87_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/88_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/88_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/88_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/88_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/88_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/88_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/88_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/88_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/89_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/89_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/89_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/89_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/89_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/89_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/89_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/89_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/90_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/90_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/90_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/90_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/90_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/90_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/90_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/90_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/91_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/91_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/91_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/91_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/91_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/91_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/91_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/91_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/92_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/92_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/92_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/92_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/92_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/92_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/92_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/92_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/93_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/93_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/93_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/93_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/93_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/93_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/93_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/93_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/94_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/94_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/94_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/94_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/94_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/94_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/94_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/94_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/95_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/95_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/95_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/95_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/95_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/95_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/95_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/95_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/96_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/96_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/96_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/96_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/96_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/96_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/96_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/96_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/97_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/97_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/97_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/97_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/97_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/97_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/97_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/97_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/98_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/98_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/98_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/98_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/98_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/98_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/98_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/98_7.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/99_0.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/99_1.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/99_2.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/99_3.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/99_4.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/99_5.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/99_6.png +0 -0
- data/vendor/ggml/examples/yolo/data/labels/99_7.png +0 -0
- data/vendor/ggml/examples/yolo/yolo-image.cpp +210 -0
- data/vendor/ggml/examples/yolo/yolo-image.h +39 -0
- data/vendor/ggml/examples/yolo/yolov3-tiny.cpp +661 -0
- data/vendor/ggml/ggml.pc.in +10 -0
- data/vendor/ggml/include/ggml-alloc.h +85 -0
- data/vendor/ggml/include/ggml-backend.h +431 -0
- data/vendor/ggml/include/ggml-blas.h +25 -0
- data/vendor/ggml/include/ggml-cann.h +123 -0
- data/vendor/ggml/include/ggml-cpp.h +39 -0
- data/vendor/ggml/include/ggml-cpu.h +151 -0
- data/vendor/ggml/include/ggml-cuda.h +50 -0
- data/vendor/ggml/include/ggml-hexagon.h +19 -0
- data/vendor/ggml/include/ggml-metal.h +61 -0
- data/vendor/ggml/include/ggml-opencl.h +26 -0
- data/vendor/ggml/include/ggml-openvino.h +37 -0
- data/vendor/ggml/include/ggml-opt.h +256 -0
- data/vendor/ggml/include/ggml-rpc.h +35 -0
- data/vendor/ggml/include/ggml-sycl.h +49 -0
- data/vendor/ggml/include/ggml-virtgpu.h +14 -0
- data/vendor/ggml/include/ggml-vulkan.h +29 -0
- data/vendor/ggml/include/ggml-webgpu.h +19 -0
- data/vendor/ggml/include/ggml-zdnn.h +17 -0
- data/vendor/ggml/include/ggml-zendnn.h +22 -0
- data/vendor/ggml/include/ggml.h +2845 -0
- data/vendor/ggml/include/gguf.h +204 -0
- data/vendor/ggml/requirements.txt +12 -0
- data/vendor/ggml/scripts/gen-authors.sh +9 -0
- data/vendor/ggml/scripts/release.sh +296 -0
- data/vendor/ggml/scripts/sync-llama-am.sh +167 -0
- data/vendor/ggml/scripts/sync-llama.last +1 -0
- data/vendor/ggml/scripts/sync-llama.sh +21 -0
- data/vendor/ggml/scripts/sync-whisper-am.sh +138 -0
- data/vendor/ggml/scripts/sync-whisper.last +1 -0
- data/vendor/ggml/scripts/sync-whisper.sh +17 -0
- data/vendor/ggml/src/CMakeLists.txt +493 -0
- data/vendor/ggml/src/ggml-alloc.c +1248 -0
- data/vendor/ggml/src/ggml-backend-dl.cpp +48 -0
- data/vendor/ggml/src/ggml-backend-dl.h +45 -0
- data/vendor/ggml/src/ggml-backend-impl.h +275 -0
- data/vendor/ggml/src/ggml-backend-meta.cpp +2144 -0
- data/vendor/ggml/src/ggml-backend-reg.cpp +586 -0
- data/vendor/ggml/src/ggml-backend.cpp +2371 -0
- data/vendor/ggml/src/ggml-blas/CMakeLists.txt +101 -0
- data/vendor/ggml/src/ggml-blas/ggml-blas.cpp +522 -0
- data/vendor/ggml/src/ggml-cann/CMakeLists.txt +89 -0
- data/vendor/ggml/src/ggml-cann/acl_tensor.cpp +195 -0
- data/vendor/ggml/src/ggml-cann/acl_tensor.h +349 -0
- data/vendor/ggml/src/ggml-cann/aclnn_ops.cpp +4436 -0
- data/vendor/ggml/src/ggml-cann/aclnn_ops.h +1190 -0
- data/vendor/ggml/src/ggml-cann/common.h +651 -0
- data/vendor/ggml/src/ggml-cann/ggml-cann.cpp +3062 -0
- data/vendor/ggml/src/ggml-common.h +1900 -0
- data/vendor/ggml/src/ggml-cpu/CMakeLists.txt +731 -0
- data/vendor/ggml/src/ggml-cpu/amx/amx.cpp +249 -0
- data/vendor/ggml/src/ggml-cpu/amx/amx.h +8 -0
- data/vendor/ggml/src/ggml-cpu/amx/common.h +115 -0
- data/vendor/ggml/src/ggml-cpu/amx/mmq.cpp +2512 -0
- data/vendor/ggml/src/ggml-cpu/amx/mmq.h +10 -0
- data/vendor/ggml/src/ggml-cpu/arch/arm/cpu-feats.cpp +98 -0
- data/vendor/ggml/src/ggml-cpu/arch/arm/quants.c +4245 -0
- data/vendor/ggml/src/ggml-cpu/arch/arm/repack.cpp +5156 -0
- data/vendor/ggml/src/ggml-cpu/arch/loongarch/quants.c +2158 -0
- data/vendor/ggml/src/ggml-cpu/arch/powerpc/cpu-feats.cpp +82 -0
- data/vendor/ggml/src/ggml-cpu/arch/powerpc/quants.c +2304 -0
- data/vendor/ggml/src/ggml-cpu/arch/riscv/cpu-feats.cpp +38 -0
- data/vendor/ggml/src/ggml-cpu/arch/riscv/quants.c +4553 -0
- data/vendor/ggml/src/ggml-cpu/arch/riscv/repack.cpp +1703 -0
- data/vendor/ggml/src/ggml-cpu/arch/s390/cpu-feats.cpp +50 -0
- data/vendor/ggml/src/ggml-cpu/arch/s390/quants.c +1465 -0
- data/vendor/ggml/src/ggml-cpu/arch/wasm/quants.c +1220 -0
- data/vendor/ggml/src/ggml-cpu/arch/x86/cpu-feats.cpp +327 -0
- data/vendor/ggml/src/ggml-cpu/arch/x86/quants.c +3970 -0
- data/vendor/ggml/src/ggml-cpu/arch/x86/repack.cpp +6407 -0
- data/vendor/ggml/src/ggml-cpu/arch-fallback.h +348 -0
- data/vendor/ggml/src/ggml-cpu/binary-ops.cpp +154 -0
- data/vendor/ggml/src/ggml-cpu/binary-ops.h +16 -0
- data/vendor/ggml/src/ggml-cpu/cmake/FindSIMD.cmake +100 -0
- data/vendor/ggml/src/ggml-cpu/cmake/FindSMTIME.cmake +32 -0
- data/vendor/ggml/src/ggml-cpu/common.h +95 -0
- data/vendor/ggml/src/ggml-cpu/ggml-cpu-impl.h +539 -0
- data/vendor/ggml/src/ggml-cpu/ggml-cpu.c +3835 -0
- data/vendor/ggml/src/ggml-cpu/ggml-cpu.cpp +703 -0
- data/vendor/ggml/src/ggml-cpu/hbm.cpp +55 -0
- data/vendor/ggml/src/ggml-cpu/hbm.h +8 -0
- data/vendor/ggml/src/ggml-cpu/kleidiai/kernels.cpp +939 -0
- data/vendor/ggml/src/ggml-cpu/kleidiai/kernels.h +90 -0
- data/vendor/ggml/src/ggml-cpu/kleidiai/kleidiai.cpp +1513 -0
- data/vendor/ggml/src/ggml-cpu/kleidiai/kleidiai.h +17 -0
- data/vendor/ggml/src/ggml-cpu/llamafile/sgemm.cpp +4051 -0
- data/vendor/ggml/src/ggml-cpu/llamafile/sgemm.h +25 -0
- data/vendor/ggml/src/ggml-cpu/ops.cpp +11373 -0
- data/vendor/ggml/src/ggml-cpu/ops.h +119 -0
- data/vendor/ggml/src/ggml-cpu/quants.c +1288 -0
- data/vendor/ggml/src/ggml-cpu/quants.h +103 -0
- data/vendor/ggml/src/ggml-cpu/repack.cpp +4836 -0
- data/vendor/ggml/src/ggml-cpu/repack.h +245 -0
- data/vendor/ggml/src/ggml-cpu/simd-gemm.h +226 -0
- data/vendor/ggml/src/ggml-cpu/simd-mappings.h +1319 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/ime.cpp +1740 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/ime.h +21 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/ime1_kernels.cpp +1027 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/ime2_kernels.cpp +5768 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/ime_env.cpp +320 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/ime_env.h +55 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/ime_kernels.h +189 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/repack.cpp +1795 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/repack.h +14 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/rvv_kernels.cpp +3178 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/rvv_kernels.h +95 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/spine_barrier.h +34 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/spine_mem_pool.cpp +760 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/spine_mem_pool.h +32 -0
- data/vendor/ggml/src/ggml-cpu/spacemit/spine_tcm.h +409 -0
- data/vendor/ggml/src/ggml-cpu/traits.cpp +36 -0
- data/vendor/ggml/src/ggml-cpu/traits.h +38 -0
- data/vendor/ggml/src/ggml-cpu/unary-ops.cpp +337 -0
- data/vendor/ggml/src/ggml-cpu/unary-ops.h +35 -0
- data/vendor/ggml/src/ggml-cpu/vec.cpp +629 -0
- data/vendor/ggml/src/ggml-cpu/vec.h +1588 -0
- data/vendor/ggml/src/ggml-cuda/CMakeLists.txt +268 -0
- data/vendor/ggml/src/ggml-cuda/acc.cu +61 -0
- data/vendor/ggml/src/ggml-cuda/acc.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/add-id.cu +58 -0
- data/vendor/ggml/src/ggml-cuda/add-id.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/allreduce.cu +971 -0
- data/vendor/ggml/src/ggml-cuda/allreduce.cuh +29 -0
- data/vendor/ggml/src/ggml-cuda/arange.cu +34 -0
- data/vendor/ggml/src/ggml-cuda/arange.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/argmax.cu +91 -0
- data/vendor/ggml/src/ggml-cuda/argmax.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/argsort.cu +266 -0
- data/vendor/ggml/src/ggml-cuda/argsort.cuh +19 -0
- data/vendor/ggml/src/ggml-cuda/binbcast.cu +534 -0
- data/vendor/ggml/src/ggml-cuda/binbcast.cuh +12 -0
- data/vendor/ggml/src/ggml-cuda/clamp.cu +45 -0
- data/vendor/ggml/src/ggml-cuda/clamp.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/common.cuh +1489 -0
- data/vendor/ggml/src/ggml-cuda/concat.cu +204 -0
- data/vendor/ggml/src/ggml-cuda/concat.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/conv-transpose-1d.cu +86 -0
- data/vendor/ggml/src/ggml-cuda/conv-transpose-1d.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/conv2d-dw.cu +161 -0
- data/vendor/ggml/src/ggml-cuda/conv2d-dw.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/conv2d-transpose.cu +115 -0
- data/vendor/ggml/src/ggml-cuda/conv2d-transpose.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/conv2d.cu +166 -0
- data/vendor/ggml/src/ggml-cuda/conv2d.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/convert.cu +892 -0
- data/vendor/ggml/src/ggml-cuda/convert.cuh +66 -0
- data/vendor/ggml/src/ggml-cuda/count-equal.cu +64 -0
- data/vendor/ggml/src/ggml-cuda/count-equal.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/cp-async.cuh +57 -0
- data/vendor/ggml/src/ggml-cuda/cpy-utils.cuh +217 -0
- data/vendor/ggml/src/ggml-cuda/cpy.cu +558 -0
- data/vendor/ggml/src/ggml-cuda/cpy.cuh +7 -0
- data/vendor/ggml/src/ggml-cuda/cross-entropy-loss.cu +177 -0
- data/vendor/ggml/src/ggml-cuda/cross-entropy-loss.cuh +7 -0
- data/vendor/ggml/src/ggml-cuda/cumsum.cu +307 -0
- data/vendor/ggml/src/ggml-cuda/cumsum.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/dequantize.cuh +99 -0
- data/vendor/ggml/src/ggml-cuda/diag.cu +77 -0
- data/vendor/ggml/src/ggml-cuda/diag.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/diagmask.cu +40 -0
- data/vendor/ggml/src/ggml-cuda/diagmask.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/fattn-common.cuh +1212 -0
- data/vendor/ggml/src/ggml-cuda/fattn-mma-f16.cuh +2020 -0
- data/vendor/ggml/src/ggml-cuda/fattn-tile.cu +61 -0
- data/vendor/ggml/src/ggml-cuda/fattn-tile.cuh +1347 -0
- data/vendor/ggml/src/ggml-cuda/fattn-vec.cuh +600 -0
- data/vendor/ggml/src/ggml-cuda/fattn-wmma-f16.cu +696 -0
- data/vendor/ggml/src/ggml-cuda/fattn-wmma-f16.cuh +51 -0
- data/vendor/ggml/src/ggml-cuda/fattn.cu +562 -0
- data/vendor/ggml/src/ggml-cuda/fattn.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/fill.cu +37 -0
- data/vendor/ggml/src/ggml-cuda/fill.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/gated_delta_net.cu +311 -0
- data/vendor/ggml/src/ggml-cuda/gated_delta_net.cuh +4 -0
- data/vendor/ggml/src/ggml-cuda/getrows.cu +300 -0
- data/vendor/ggml/src/ggml-cuda/getrows.cuh +15 -0
- data/vendor/ggml/src/ggml-cuda/ggml-cuda.cu +5684 -0
- data/vendor/ggml/src/ggml-cuda/gla.cu +93 -0
- data/vendor/ggml/src/ggml-cuda/gla.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/im2col.cu +267 -0
- data/vendor/ggml/src/ggml-cuda/im2col.cuh +6 -0
- data/vendor/ggml/src/ggml-cuda/mean.cu +75 -0
- data/vendor/ggml/src/ggml-cuda/mean.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/mma.cuh +1456 -0
- data/vendor/ggml/src/ggml-cuda/mmf.cu +191 -0
- data/vendor/ggml/src/ggml-cuda/mmf.cuh +908 -0
- data/vendor/ggml/src/ggml-cuda/mmid.cu +164 -0
- data/vendor/ggml/src/ggml-cuda/mmid.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/mmq.cu +372 -0
- data/vendor/ggml/src/ggml-cuda/mmq.cuh +4176 -0
- data/vendor/ggml/src/ggml-cuda/mmvf.cu +862 -0
- data/vendor/ggml/src/ggml-cuda/mmvf.cuh +14 -0
- data/vendor/ggml/src/ggml-cuda/mmvq.cu +1161 -0
- data/vendor/ggml/src/ggml-cuda/mmvq.cuh +16 -0
- data/vendor/ggml/src/ggml-cuda/norm.cu +672 -0
- data/vendor/ggml/src/ggml-cuda/norm.cuh +18 -0
- data/vendor/ggml/src/ggml-cuda/opt-step-adamw.cu +78 -0
- data/vendor/ggml/src/ggml-cuda/opt-step-adamw.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/opt-step-sgd.cu +49 -0
- data/vendor/ggml/src/ggml-cuda/opt-step-sgd.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/out-prod.cu +84 -0
- data/vendor/ggml/src/ggml-cuda/out-prod.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/pad.cu +106 -0
- data/vendor/ggml/src/ggml-cuda/pad.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/pad_reflect_1d.cu +91 -0
- data/vendor/ggml/src/ggml-cuda/pad_reflect_1d.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/pool2d.cu +94 -0
- data/vendor/ggml/src/ggml-cuda/pool2d.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/quantize.cu +443 -0
- data/vendor/ggml/src/ggml-cuda/quantize.cuh +41 -0
- data/vendor/ggml/src/ggml-cuda/reduce_rows.cuh +39 -0
- data/vendor/ggml/src/ggml-cuda/roll.cu +67 -0
- data/vendor/ggml/src/ggml-cuda/roll.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/rope.cu +665 -0
- data/vendor/ggml/src/ggml-cuda/rope.cuh +9 -0
- data/vendor/ggml/src/ggml-cuda/scale.cu +34 -0
- data/vendor/ggml/src/ggml-cuda/scale.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/set-rows.cu +330 -0
- data/vendor/ggml/src/ggml-cuda/set-rows.cuh +7 -0
- data/vendor/ggml/src/ggml-cuda/set.cu +39 -0
- data/vendor/ggml/src/ggml-cuda/set.cuh +7 -0
- data/vendor/ggml/src/ggml-cuda/snake.cu +72 -0
- data/vendor/ggml/src/ggml-cuda/snake.cuh +8 -0
- data/vendor/ggml/src/ggml-cuda/softcap.cu +34 -0
- data/vendor/ggml/src/ggml-cuda/softcap.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/softmax.cu +472 -0
- data/vendor/ggml/src/ggml-cuda/softmax.cuh +7 -0
- data/vendor/ggml/src/ggml-cuda/solve_tri.cu +275 -0
- data/vendor/ggml/src/ggml-cuda/solve_tri.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/ssm-conv.cu +197 -0
- data/vendor/ggml/src/ggml-cuda/ssm-conv.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/ssm-scan.cu +342 -0
- data/vendor/ggml/src/ggml-cuda/ssm-scan.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/sum.cu +41 -0
- data/vendor/ggml/src/ggml-cuda/sum.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/sumrows.cu +43 -0
- data/vendor/ggml/src/ggml-cuda/sumrows.cuh +4 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_16.cu +6 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_32.cu +6 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_8.cu +12 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_1.cu +10 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_2.cu +10 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu +12 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_16.cu +6 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_32.cu +6 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu +12 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_8.cu +12 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_32-ncols2_1.cu +10 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_32-ncols2_2.cu +10 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_16.cu +6 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_2.cu +10 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu +12 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_8.cu +12 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_64-ncols2_1.cu +10 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_1.cu +10 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_2.cu +10 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu +12 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_8.cu +12 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq112-dv112.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq128-dv128.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq192-dv128.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq256-dv256.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq320-dv256.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq40-dv40.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq512-dv512.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq576-dv512.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq64-dv64.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq72-dv72.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq80-dv80.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq96-dv96.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-bf16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-f16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q4_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q4_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q5_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q5_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q8_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-f16-bf16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-f16-f16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-f16-q4_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-f16-q4_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-f16-q5_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-f16-q5_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-f16-q8_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-bf16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-f16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-q4_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-q4_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-q5_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-q5_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-q8_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-bf16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-f16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-q4_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-q4_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-q5_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-q5_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-q8_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-bf16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-f16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-q4_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-q4_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-q5_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-q5_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-q8_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-bf16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-f16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-q4_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-q4_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-q5_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-q5_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-q8_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-bf16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-f16.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-q4_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-q4_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-q5_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-q5_1.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-q8_0.cu +7 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/generate_cu_files.py +110 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_1.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_10.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_11.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_12.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_13.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_14.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_15.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_16.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_2.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_3.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_4.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_5.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_6.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_7.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_8.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_9.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-iq1_s.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-iq2_s.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-iq2_xs.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-iq2_xxs.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-iq3_s.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-iq3_xxs.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_nl.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-iq4_xs.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-mxfp4.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-nvfp4.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q1_0.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q2_k.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q3_k.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q4_0.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q4_1.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q4_k.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q5_0.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q5_1.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q5_k.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q6_k.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/template-instances/mmq-instance-q8_0.cu +5 -0
- data/vendor/ggml/src/ggml-cuda/top-k.cu +95 -0
- data/vendor/ggml/src/ggml-cuda/top-k.cuh +3 -0
- data/vendor/ggml/src/ggml-cuda/topk-moe.cu +415 -0
- data/vendor/ggml/src/ggml-cuda/topk-moe.cuh +27 -0
- data/vendor/ggml/src/ggml-cuda/tri.cu +136 -0
- data/vendor/ggml/src/ggml-cuda/tri.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/tsembd.cu +47 -0
- data/vendor/ggml/src/ggml-cuda/tsembd.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/unary.cu +640 -0
- data/vendor/ggml/src/ggml-cuda/unary.cuh +114 -0
- data/vendor/ggml/src/ggml-cuda/upscale.cu +293 -0
- data/vendor/ggml/src/ggml-cuda/upscale.cuh +5 -0
- data/vendor/ggml/src/ggml-cuda/vecdotq.cuh +1317 -0
- data/vendor/ggml/src/ggml-cuda/vendors/cuda.h +28 -0
- data/vendor/ggml/src/ggml-cuda/vendors/hip.h +304 -0
- data/vendor/ggml/src/ggml-cuda/vendors/musa.h +150 -0
- data/vendor/ggml/src/ggml-cuda/wkv.cu +199 -0
- data/vendor/ggml/src/ggml-cuda/wkv.cuh +7 -0
- data/vendor/ggml/src/ggml-hexagon/CMakeLists.txt +118 -0
- data/vendor/ggml/src/ggml-hexagon/ggml-hexagon.cpp +3680 -0
- data/vendor/ggml/src/ggml-hexagon/htp/CMakeLists.txt +78 -0
- data/vendor/ggml/src/ggml-hexagon/htp/act-ops.c +782 -0
- data/vendor/ggml/src/ggml-hexagon/htp/argsort-ops.c +293 -0
- data/vendor/ggml/src/ggml-hexagon/htp/binary-ops.c +872 -0
- data/vendor/ggml/src/ggml-hexagon/htp/cmake-toolchain.cmake +157 -0
- data/vendor/ggml/src/ggml-hexagon/htp/cpy-ops.c +275 -0
- data/vendor/ggml/src/ggml-hexagon/htp/cumsum-ops.c +270 -0
- data/vendor/ggml/src/ggml-hexagon/htp/diag-ops.c +216 -0
- data/vendor/ggml/src/ggml-hexagon/htp/fill-ops.c +123 -0
- data/vendor/ggml/src/ggml-hexagon/htp/flash-attn-ops.c +727 -0
- data/vendor/ggml/src/ggml-hexagon/htp/gated-delta-net-ops.c +955 -0
- data/vendor/ggml/src/ggml-hexagon/htp/get-rows-ops.c +124 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hex-dma.c +63 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hex-dma.h +372 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hex-dump.h +86 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hex-fastdiv.h +37 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hex-utils.h +137 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hmx-flash-attn-ops.c +1841 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c +1785 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hmx-ops.h +71 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hmx-profile.h +34 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hmx-queue.c +158 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hmx-queue.h +134 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hmx-utils.h +200 -0
- data/vendor/ggml/src/ggml-hexagon/htp/htp-ctx.h +111 -0
- data/vendor/ggml/src/ggml-hexagon/htp/htp-ops.h +181 -0
- data/vendor/ggml/src/ggml-hexagon/htp/htp_iface.idl +22 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-arith.h +443 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-base.h +308 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-copy.h +262 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-div.h +291 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-dump.h +129 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-exp.h +216 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-floor.h +100 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-inverse.h +210 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-reduce.h +296 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-repl.h +74 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-scale.h +133 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-sigmoid.h +142 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-sqrt.h +126 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-types.h +36 -0
- data/vendor/ggml/src/ggml-hexagon/htp/hvx-utils.h +19 -0
- data/vendor/ggml/src/ggml-hexagon/htp/main.c +880 -0
- data/vendor/ggml/src/ggml-hexagon/htp/matmul-ops.c +3173 -0
- data/vendor/ggml/src/ggml-hexagon/htp/repeat-ops.c +148 -0
- data/vendor/ggml/src/ggml-hexagon/htp/rope-ops.c +494 -0
- data/vendor/ggml/src/ggml-hexagon/htp/set-rows-ops.c +184 -0
- data/vendor/ggml/src/ggml-hexagon/htp/softmax-ops.c +407 -0
- data/vendor/ggml/src/ggml-hexagon/htp/solve-tri-ops.c +267 -0
- data/vendor/ggml/src/ggml-hexagon/htp/ssm-conv.c +340 -0
- data/vendor/ggml/src/ggml-hexagon/htp/sum-rows-ops.c +128 -0
- data/vendor/ggml/src/ggml-hexagon/htp/unary-ops.c +657 -0
- data/vendor/ggml/src/ggml-hexagon/htp/vtcm-utils.h +16 -0
- data/vendor/ggml/src/ggml-hexagon/htp/worker-pool.c +293 -0
- data/vendor/ggml/src/ggml-hexagon/htp/worker-pool.h +57 -0
- data/vendor/ggml/src/ggml-hexagon/htp-drv.cpp +418 -0
- data/vendor/ggml/src/ggml-hexagon/htp-drv.h +121 -0
- data/vendor/ggml/src/ggml-hexagon/libdl.h +79 -0
- data/vendor/ggml/src/ggml-hexagon/libggml-htp.inf +40 -0
- data/vendor/ggml/src/ggml-hexagon/op-desc.h +153 -0
- data/vendor/ggml/src/ggml-hip/CMakeLists.txt +157 -0
- data/vendor/ggml/src/ggml-impl.h +783 -0
- data/vendor/ggml/src/ggml-metal/CMakeLists.txt +124 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-common.cpp +457 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-common.h +52 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-context.h +41 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-context.m +739 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-device.cpp +2053 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-device.h +296 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-device.m +1829 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-impl.h +1175 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-ops.cpp +4606 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal-ops.h +97 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal.cpp +950 -0
- data/vendor/ggml/src/ggml-metal/ggml-metal.metal +10679 -0
- data/vendor/ggml/src/ggml-musa/CMakeLists.txt +124 -0
- data/vendor/ggml/src/ggml-musa/mudnn.cu +112 -0
- data/vendor/ggml/src/ggml-musa/mudnn.cuh +12 -0
- data/vendor/ggml/src/ggml-opencl/CMakeLists.txt +189 -0
- data/vendor/ggml/src/ggml-opencl/ggml-opencl.cpp +16374 -0
- data/vendor/ggml/src/ggml-opencl/kernels/add.cl +190 -0
- data/vendor/ggml/src/ggml-opencl/kernels/add_id.cl +42 -0
- data/vendor/ggml/src/ggml-opencl/kernels/argsort.cl +86 -0
- data/vendor/ggml/src/ggml-opencl/kernels/clamp.cl +20 -0
- data/vendor/ggml/src/ggml-opencl/kernels/concat.cl +51 -0
- data/vendor/ggml/src/ggml-opencl/kernels/conv2d.cl +185 -0
- data/vendor/ggml/src/ggml-opencl/kernels/conv2d_f16_f32.cl +176 -0
- data/vendor/ggml/src/ggml-opencl/kernels/cpy.cl +229 -0
- data/vendor/ggml/src/ggml-opencl/kernels/cumsum.cl +139 -0
- data/vendor/ggml/src/ggml-opencl/kernels/cvt.cl +1471 -0
- data/vendor/ggml/src/ggml-opencl/kernels/diag.cl +27 -0
- data/vendor/ggml/src/ggml-opencl/kernels/diag_mask_inf.cl +58 -0
- data/vendor/ggml/src/ggml-opencl/kernels/div.cl +138 -0
- data/vendor/ggml/src/ggml-opencl/kernels/embed_kernel.py +26 -0
- data/vendor/ggml/src/ggml-opencl/kernels/exp.cl +125 -0
- data/vendor/ggml/src/ggml-opencl/kernels/expm1.cl +113 -0
- data/vendor/ggml/src/ggml-opencl/kernels/fill.cl +17 -0
- data/vendor/ggml/src/ggml-opencl/kernels/flash_attn_f16.cl +370 -0
- data/vendor/ggml/src/ggml-opencl/kernels/flash_attn_f32.cl +371 -0
- data/vendor/ggml/src/ggml-opencl/kernels/flash_attn_f32_f16.cl +373 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gelu.cl +89 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_moe_mxfp4_f32.cl +162 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_moe_mxfp4_f32_ns.cl +302 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_moe_q4_0_f32_ns.cl +252 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_moe_q4_1_f32_ns.cl +254 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_moe_q5_0_f32_ns.cl +256 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_moe_q5_1_f32_ns.cl +258 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_noshuffle_iq4_nl_f32.cl +150 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_noshuffle_q4_0_f32.cl +139 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_noshuffle_q4_1_f32.cl +132 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_noshuffle_q4_k_f32.cl +172 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_noshuffle_q5_k_f32.cl +176 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_noshuffle_q6_k_f32.cl +140 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_noshuffle_q8_0_f32.cl +129 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemm_xmem_f16_f32_os8.cl +233 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_moe_mxfp4_f32.cl +156 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_moe_mxfp4_f32_ns.cl +161 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_moe_q4_0_f32_ns.cl +116 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_moe_q4_1_f32_ns.cl +119 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_moe_q5_0_f32_ns.cl +119 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_moe_q5_1_f32_ns.cl +121 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_noshuffle_iq4_nl_f32.cl +302 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_0_f32.cl +274 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_0_f32_spec.cl +268 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_1_f32.cl +283 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_k_f32.cl +318 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_noshuffle_q5_k_f32.cl +326 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_noshuffle_q6_k_f32.cl +293 -0
- data/vendor/ggml/src/ggml-opencl/kernels/gemv_noshuffle_q8_0_f32.cl +195 -0
- data/vendor/ggml/src/ggml-opencl/kernels/get_rows.cl +187 -0
- data/vendor/ggml/src/ggml-opencl/kernels/glu.cl +378 -0
- data/vendor/ggml/src/ggml-opencl/kernels/group_norm.cl +121 -0
- data/vendor/ggml/src/ggml-opencl/kernels/im2col_f16.cl +57 -0
- data/vendor/ggml/src/ggml-opencl/kernels/im2col_f32.cl +57 -0
- data/vendor/ggml/src/ggml-opencl/kernels/l2_norm.cl +71 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mean.cl +140 -0
- data/vendor/ggml/src/ggml-opencl/kernels/moe_reorder_b.cl +30 -0
- data/vendor/ggml/src/ggml-opencl/kernels/moe_sort_by_expert.cl +82 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul.cl +152 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mat_f16_f32.cl +130 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_f16_f32_kq_kqv.cl +273 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_f16_f32_l4_lm.cl +146 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_f32_f32_l4_lm.cl +147 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_iq4_nl_f32_l4_lm.cl +171 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_q4_0_f32_l4_lm.cl +163 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_q4_1_f32_l4_lm.cl +165 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_q4_k_f32_l4_lm.cl +179 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_q5_k_f32_l4_lm.cl +192 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_q6_k_f32_l4_lm.cl +158 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mm_q8_0_f32_l4_lm.cl +154 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_f16_f16.cl +118 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_f16_f32.cl +118 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_f16_f32_1row.cl +94 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_f16_f32_l4.cl +84 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_f32_f32.cl +118 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_id_mxfp4_f32.cl +189 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_id_mxfp4_f32_flat.cl +176 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_id_q4_0_f32_8x_flat.cl +283 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_id_q8_0_f32.cl +140 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_id_q8_0_f32_flat.cl +222 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_iq4_nl_f32.cl +164 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_iq4_nl_f32_flat.cl +202 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_mxfp4_f32.cl +144 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_mxfp4_f32_flat.cl +167 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_0_f32.cl +192 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_0_f32_1d_16x_flat.cl +307 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_0_f32_1d_8x_flat.cl +265 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_0_f32_8x_flat.cl +272 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_0_f32_v.cl +254 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_1_f32.cl +219 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_1_f32_flat.cl +229 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_k_f32.cl +180 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q4_k_f32_flat.cl +196 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q5_k_f32.cl +187 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q5_k_f32_flat.cl +203 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q6_k_f32.cl +194 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q6_k_f32_flat.cl +194 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q8_0_f32.cl +125 -0
- data/vendor/ggml/src/ggml-opencl/kernels/mul_mv_q8_0_f32_flat.cl +202 -0
- data/vendor/ggml/src/ggml-opencl/kernels/neg.cl +125 -0
- data/vendor/ggml/src/ggml-opencl/kernels/norm.cl +161 -0
- data/vendor/ggml/src/ggml-opencl/kernels/pad.cl +39 -0
- data/vendor/ggml/src/ggml-opencl/kernels/relu.cl +16 -0
- data/vendor/ggml/src/ggml-opencl/kernels/repeat.cl +38 -0
- data/vendor/ggml/src/ggml-opencl/kernels/rms_norm.cl +190 -0
- data/vendor/ggml/src/ggml-opencl/kernels/rope.cl +747 -0
- data/vendor/ggml/src/ggml-opencl/kernels/scale.cl +27 -0
- data/vendor/ggml/src/ggml-opencl/kernels/set_rows.cl +208 -0
- data/vendor/ggml/src/ggml-opencl/kernels/sigmoid.cl +29 -0
- data/vendor/ggml/src/ggml-opencl/kernels/silu.cl +30 -0
- data/vendor/ggml/src/ggml-opencl/kernels/softmax_4_f16.cl +108 -0
- data/vendor/ggml/src/ggml-opencl/kernels/softmax_4_f32.cl +108 -0
- data/vendor/ggml/src/ggml-opencl/kernels/softmax_f16.cl +107 -0
- data/vendor/ggml/src/ggml-opencl/kernels/softmax_f32.cl +107 -0
- data/vendor/ggml/src/ggml-opencl/kernels/softplus.cl +116 -0
- data/vendor/ggml/src/ggml-opencl/kernels/solve_tri.cl +51 -0
- data/vendor/ggml/src/ggml-opencl/kernels/sqr.cl +53 -0
- data/vendor/ggml/src/ggml-opencl/kernels/sqrt.cl +53 -0
- data/vendor/ggml/src/ggml-opencl/kernels/ssm_conv.cl +77 -0
- data/vendor/ggml/src/ggml-opencl/kernels/sub.cl +138 -0
- data/vendor/ggml/src/ggml-opencl/kernels/sum_rows.cl +140 -0
- data/vendor/ggml/src/ggml-opencl/kernels/tanh.cl +109 -0
- data/vendor/ggml/src/ggml-opencl/kernels/transpose.cl +143 -0
- data/vendor/ggml/src/ggml-opencl/kernels/tri.cl +32 -0
- data/vendor/ggml/src/ggml-opencl/kernels/tsembd.cl +48 -0
- data/vendor/ggml/src/ggml-opencl/kernels/upscale.cl +120 -0
- data/vendor/ggml/src/ggml-openvino/CMakeLists.txt +22 -0
- data/vendor/ggml/src/ggml-openvino/ggml-decoder.cpp +985 -0
- data/vendor/ggml/src/ggml-openvino/ggml-decoder.h +294 -0
- data/vendor/ggml/src/ggml-openvino/ggml-openvino-extra.cpp +380 -0
- data/vendor/ggml/src/ggml-openvino/ggml-openvino-extra.h +182 -0
- data/vendor/ggml/src/ggml-openvino/ggml-openvino.cpp +1132 -0
- data/vendor/ggml/src/ggml-openvino/ggml-quants.cpp +956 -0
- data/vendor/ggml/src/ggml-openvino/ggml-quants.h +153 -0
- data/vendor/ggml/src/ggml-openvino/openvino/decoder.h +74 -0
- data/vendor/ggml/src/ggml-openvino/openvino/frontend.cpp +27 -0
- data/vendor/ggml/src/ggml-openvino/openvino/frontend.h +23 -0
- data/vendor/ggml/src/ggml-openvino/openvino/input_model.cpp +17 -0
- data/vendor/ggml/src/ggml-openvino/openvino/input_model.h +29 -0
- data/vendor/ggml/src/ggml-openvino/openvino/node_context.h +112 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/cont.cpp +48 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/cpy.cpp +21 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/flash_attn_ext.cpp +90 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/get_rows.cpp +69 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/glu_geglu.cpp +61 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/glu_swiglu.cpp +62 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/mulmat.cpp +90 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/permute.cpp +102 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/reshape.cpp +83 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/rms_norm.cpp +46 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/rope.cpp +149 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/scale.cpp +41 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/set_rows.cpp +76 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/softmax.cpp +89 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/transpose.cpp +23 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/unary_gelu.cpp +25 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/unary_silu.cpp +27 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op/view.cpp +53 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op_table.cpp +47 -0
- data/vendor/ggml/src/ggml-openvino/openvino/op_table.h +40 -0
- data/vendor/ggml/src/ggml-openvino/openvino/pass/fuse_to_sdpa.cpp +60 -0
- data/vendor/ggml/src/ggml-openvino/openvino/pass/fuse_to_sdpa.h +17 -0
- data/vendor/ggml/src/ggml-openvino/openvino/pass/mark_decompression_convert_constant_folding.h +29 -0
- data/vendor/ggml/src/ggml-openvino/openvino/pass/squeeze_matmul.cpp +58 -0
- data/vendor/ggml/src/ggml-openvino/openvino/pass/squeeze_matmul.h +17 -0
- data/vendor/ggml/src/ggml-openvino/openvino/rt_info/weightless_caching_attributes.hpp +41 -0
- data/vendor/ggml/src/ggml-openvino/openvino/translate_session.cpp +317 -0
- data/vendor/ggml/src/ggml-openvino/openvino/translate_session.h +28 -0
- data/vendor/ggml/src/ggml-openvino/openvino/utils.cpp +257 -0
- data/vendor/ggml/src/ggml-openvino/openvino/utils.h +86 -0
- data/vendor/ggml/src/ggml-openvino/utils.cpp +880 -0
- data/vendor/ggml/src/ggml-openvino/utils.h +143 -0
- data/vendor/ggml/src/ggml-opt.cpp +1094 -0
- data/vendor/ggml/src/ggml-quants.c +5491 -0
- data/vendor/ggml/src/ggml-quants.h +112 -0
- data/vendor/ggml/src/ggml-rpc/CMakeLists.txt +33 -0
- data/vendor/ggml/src/ggml-rpc/ggml-rpc.cpp +1974 -0
- data/vendor/ggml/src/ggml-rpc/transport.cpp +683 -0
- data/vendor/ggml/src/ggml-rpc/transport.h +34 -0
- data/vendor/ggml/src/ggml-sycl/CMakeLists.txt +207 -0
- data/vendor/ggml/src/ggml-sycl/add-id.cpp +81 -0
- data/vendor/ggml/src/ggml-sycl/add-id.hpp +8 -0
- data/vendor/ggml/src/ggml-sycl/backend.hpp +48 -0
- data/vendor/ggml/src/ggml-sycl/binbcast.cpp +346 -0
- data/vendor/ggml/src/ggml-sycl/binbcast.hpp +39 -0
- data/vendor/ggml/src/ggml-sycl/common.cpp +155 -0
- data/vendor/ggml/src/ggml-sycl/common.hpp +1002 -0
- data/vendor/ggml/src/ggml-sycl/concat.cpp +202 -0
- data/vendor/ggml/src/ggml-sycl/concat.hpp +20 -0
- data/vendor/ggml/src/ggml-sycl/conv.cpp +101 -0
- data/vendor/ggml/src/ggml-sycl/conv.hpp +20 -0
- data/vendor/ggml/src/ggml-sycl/convert.cpp +825 -0
- data/vendor/ggml/src/ggml-sycl/convert.hpp +64 -0
- data/vendor/ggml/src/ggml-sycl/count-equal.cpp +79 -0
- data/vendor/ggml/src/ggml-sycl/count-equal.hpp +9 -0
- data/vendor/ggml/src/ggml-sycl/cpy.cpp +602 -0
- data/vendor/ggml/src/ggml-sycl/cpy.hpp +223 -0
- data/vendor/ggml/src/ggml-sycl/cumsum.cpp +148 -0
- data/vendor/ggml/src/ggml-sycl/cumsum.hpp +5 -0
- data/vendor/ggml/src/ggml-sycl/dequantize.hpp +975 -0
- data/vendor/ggml/src/ggml-sycl/diag.cpp +67 -0
- data/vendor/ggml/src/ggml-sycl/diag.hpp +5 -0
- data/vendor/ggml/src/ggml-sycl/dmmv.cpp +1579 -0
- data/vendor/ggml/src/ggml-sycl/dmmv.hpp +27 -0
- data/vendor/ggml/src/ggml-sycl/dpct/helper.hpp +3774 -0
- data/vendor/ggml/src/ggml-sycl/element_wise.cpp +1124 -0
- data/vendor/ggml/src/ggml-sycl/element_wise.hpp +94 -0
- data/vendor/ggml/src/ggml-sycl/fattn-buffers.cpp +56 -0
- data/vendor/ggml/src/ggml-sycl/fattn-buffers.hpp +63 -0
- data/vendor/ggml/src/ggml-sycl/fattn-common.hpp +1181 -0
- data/vendor/ggml/src/ggml-sycl/fattn-tile.cpp +59 -0
- data/vendor/ggml/src/ggml-sycl/fattn-tile.hpp +1246 -0
- data/vendor/ggml/src/ggml-sycl/fattn-vec.hpp +674 -0
- data/vendor/ggml/src/ggml-sycl/fattn.cpp +227 -0
- data/vendor/ggml/src/ggml-sycl/fattn.hpp +22 -0
- data/vendor/ggml/src/ggml-sycl/fill.cpp +55 -0
- data/vendor/ggml/src/ggml-sycl/fill.hpp +5 -0
- data/vendor/ggml/src/ggml-sycl/gated_delta_net.cpp +307 -0
- data/vendor/ggml/src/ggml-sycl/gated_delta_net.hpp +9 -0
- data/vendor/ggml/src/ggml-sycl/gemm.hpp +93 -0
- data/vendor/ggml/src/ggml-sycl/getrows.cpp +219 -0
- data/vendor/ggml/src/ggml-sycl/getrows.hpp +20 -0
- data/vendor/ggml/src/ggml-sycl/ggml-sycl.cpp +5520 -0
- data/vendor/ggml/src/ggml-sycl/gla.cpp +106 -0
- data/vendor/ggml/src/ggml-sycl/gla.hpp +8 -0
- data/vendor/ggml/src/ggml-sycl/im2col.cpp +400 -0
- data/vendor/ggml/src/ggml-sycl/im2col.hpp +23 -0
- data/vendor/ggml/src/ggml-sycl/mmq.cpp +3030 -0
- data/vendor/ggml/src/ggml-sycl/mmq.hpp +33 -0
- data/vendor/ggml/src/ggml-sycl/mmvq.cpp +1380 -0
- data/vendor/ggml/src/ggml-sycl/mmvq.hpp +43 -0
- data/vendor/ggml/src/ggml-sycl/norm.cpp +656 -0
- data/vendor/ggml/src/ggml-sycl/norm.hpp +28 -0
- data/vendor/ggml/src/ggml-sycl/outprod.cpp +47 -0
- data/vendor/ggml/src/ggml-sycl/outprod.hpp +10 -0
- data/vendor/ggml/src/ggml-sycl/pad.cpp +97 -0
- data/vendor/ggml/src/ggml-sycl/pad.hpp +24 -0
- data/vendor/ggml/src/ggml-sycl/pad_reflect_1d.cpp +100 -0
- data/vendor/ggml/src/ggml-sycl/pad_reflect_1d.hpp +10 -0
- data/vendor/ggml/src/ggml-sycl/presets.hpp +79 -0
- data/vendor/ggml/src/ggml-sycl/quantize.hpp +133 -0
- data/vendor/ggml/src/ggml-sycl/quants.hpp +156 -0
- data/vendor/ggml/src/ggml-sycl/repeat_back.cpp +76 -0
- data/vendor/ggml/src/ggml-sycl/repeat_back.hpp +8 -0
- data/vendor/ggml/src/ggml-sycl/roll.cpp +122 -0
- data/vendor/ggml/src/ggml-sycl/roll.hpp +20 -0
- data/vendor/ggml/src/ggml-sycl/rope.cpp +641 -0
- data/vendor/ggml/src/ggml-sycl/rope.hpp +26 -0
- data/vendor/ggml/src/ggml-sycl/set.cpp +73 -0
- data/vendor/ggml/src/ggml-sycl/set.hpp +5 -0
- data/vendor/ggml/src/ggml-sycl/set_rows.cpp +240 -0
- data/vendor/ggml/src/ggml-sycl/set_rows.hpp +8 -0
- data/vendor/ggml/src/ggml-sycl/softmax.cpp +426 -0
- data/vendor/ggml/src/ggml-sycl/softmax.hpp +24 -0
- data/vendor/ggml/src/ggml-sycl/solve_tri.cpp +172 -0
- data/vendor/ggml/src/ggml-sycl/solve_tri.hpp +8 -0
- data/vendor/ggml/src/ggml-sycl/ssm_conv.cpp +132 -0
- data/vendor/ggml/src/ggml-sycl/ssm_conv.hpp +5 -0
- data/vendor/ggml/src/ggml-sycl/ssm_scan.cpp +156 -0
- data/vendor/ggml/src/ggml-sycl/ssm_scan.hpp +5 -0
- data/vendor/ggml/src/ggml-sycl/sycl_hw.cpp +67 -0
- data/vendor/ggml/src/ggml-sycl/sycl_hw.hpp +38 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq112-dv112.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq128-dv128.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq256-dv256.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq40-dv40.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq512-dv512.cpp +6 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq576-dv512.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq64-dv64.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq72-dv72.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq80-dv80.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-tile-instance-dkq96-dv96.cpp +5 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-f16.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q8_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-f16.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q8_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-f16.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q8_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-f16.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q8_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-f16.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q8_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-f16.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_1.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q8_0.cpp +8 -0
- data/vendor/ggml/src/ggml-sycl/tsembd.cpp +73 -0
- data/vendor/ggml/src/ggml-sycl/tsembd.hpp +20 -0
- data/vendor/ggml/src/ggml-sycl/type.hpp +112 -0
- data/vendor/ggml/src/ggml-sycl/upscale.cpp +410 -0
- data/vendor/ggml/src/ggml-sycl/upscale.hpp +9 -0
- data/vendor/ggml/src/ggml-sycl/vecdotq.hpp +1508 -0
- data/vendor/ggml/src/ggml-sycl/wkv.cpp +293 -0
- data/vendor/ggml/src/ggml-sycl/wkv.hpp +10 -0
- data/vendor/ggml/src/ggml-threading.cpp +12 -0
- data/vendor/ggml/src/ggml-threading.h +14 -0
- data/vendor/ggml/src/ggml-virtgpu/CMakeLists.txt +70 -0
- data/vendor/ggml/src/ggml-virtgpu/apir_cs_ggml-rpc-front.cpp +87 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/CMakeLists.txt +21 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/apir_cs_ggml-rpc-back.cpp +115 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-convert.h +13 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp +102 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp +105 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp +179 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-dispatched-device.cpp +148 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp +51 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-dispatched.gen.h +73 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-dispatched.h +27 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend-virgl-apir.h +32 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/backend.cpp +144 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/shared/api_remoting.h +95 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h +94 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/shared/apir_backend.h +50 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/shared/apir_cs.h +378 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h +232 -0
- data/vendor/ggml/src/ggml-virtgpu/backend/shared/apir_cs_rpc.h +58 -0
- data/vendor/ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp +81 -0
- data/vendor/ggml/src/ggml-virtgpu/ggml-backend-buffer.cpp +123 -0
- data/vendor/ggml/src/ggml-virtgpu/ggml-backend-device.cpp +160 -0
- data/vendor/ggml/src/ggml-virtgpu/ggml-backend-reg.cpp +213 -0
- data/vendor/ggml/src/ggml-virtgpu/ggml-backend.cpp +71 -0
- data/vendor/ggml/src/ggml-virtgpu/ggml-remoting.h +71 -0
- data/vendor/ggml/src/ggml-virtgpu/ggmlremoting_functions.yaml +166 -0
- data/vendor/ggml/src/ggml-virtgpu/include/apir_hw.h +9 -0
- data/vendor/ggml/src/ggml-virtgpu/regenerate_remoting.py +333 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-apir.h +15 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp +58 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-forward-buffer-type.cpp +110 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-forward-buffer.cpp +173 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp +192 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-forward-impl.h +36 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-forward.gen.h +53 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-shm.cpp +99 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-shm.h +23 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-utils.cpp +179 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu-utils.h +86 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu.cpp +545 -0
- data/vendor/ggml/src/ggml-virtgpu/virtgpu.h +115 -0
- data/vendor/ggml/src/ggml-vulkan/CMakeLists.txt +220 -0
- data/vendor/ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in +15 -0
- data/vendor/ggml/src/ggml-vulkan/ggml-vulkan.cpp +17208 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt +31 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/abs.comp +21 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/acc.comp +37 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/add.comp +69 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/add1.comp +28 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/add_id.comp +42 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/arange.comp +20 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/argmax.comp +60 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/argsort.comp +86 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/argsort_large.comp +114 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/ceil.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/clamp.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/concat.comp +41 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/contig_copy.comp +49 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/conv2d_dw.comp +105 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/conv2d_mm.comp +347 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/conv_transpose_1d.comp +98 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/copy.comp +23 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/copy_from_quant.comp +51 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/copy_to_quant.comp +320 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/copy_transpose.comp +67 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/cos.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/count_equal.comp +31 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/count_experts.comp +51 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/cumsum.comp +83 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/cumsum_multipass1.comp +60 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/cumsum_multipass2.comp +66 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_f32.comp +20 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs.glsl +653 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs_cm2.glsl +768 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_head.glsl +13 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq1_m.comp +42 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq1_s.comp +35 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq2_s.comp +44 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq2_xs.comp +43 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq2_xxs.comp +49 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq3_s.comp +40 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq3_xxs.comp +51 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq4_nl.comp +32 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_iq4_xs.comp +34 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_mxfp4.comp +32 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_nvfp4.comp +32 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q1_0.comp +29 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q2_k.comp +34 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q3_k.comp +42 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q4_0.comp +30 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q4_1.comp +32 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q4_k.comp +68 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q5_0.comp +34 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q5_1.comp +35 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q5_k.comp +70 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q6_k.comp +33 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/dequant_q8_0.comp +31 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/diag.comp +28 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/diag_mask_inf.comp +34 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/div.comp +27 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/elu.comp +27 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/exp.comp +20 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/feature-tests/bfloat16.comp +7 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/feature-tests/coopmat.comp +7 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/feature-tests/coopmat2.comp +7 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/feature-tests/integer_dot.comp +7 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/fill.comp +19 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn.comp +756 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_base.glsl +255 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm1.comp +626 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp +427 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_dequant.glsl +123 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_mask_opt.comp +162 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_mmq_funcs.glsl +203 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_split_k_reduce.comp +121 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/floor.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/gated_delta_net.comp +190 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/geglu.comp +13 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/geglu_erf.comp +27 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/geglu_quick.comp +11 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/gelu.comp +25 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/gelu_erf.comp +39 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/gelu_quick.comp +23 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/generic_binary_head.glsl +65 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/generic_head.glsl +11 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/generic_unary_head.glsl +83 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/get_rows.comp +42 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/get_rows_quant.comp +51 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/glu_head.glsl +28 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/glu_main.glsl +39 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/group_norm.comp +66 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/hardsigmoid.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/hardswish.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/im2col.comp +93 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/im2col_3d.comp +124 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/l2_norm.comp +44 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/leaky_relu.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/log.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul.comp +27 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_split_k_reduce.comp +48 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec.comp +169 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_base.glsl +230 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iface.glsl +35 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq1_m.comp +132 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq1_s.comp +95 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq2_s.comp +90 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq2_xs.comp +105 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq2_xxs.comp +87 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq3_s.comp +90 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq3_xxs.comp +88 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_nc.comp +124 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_p021.comp +156 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q2_k.comp +128 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q3_k.comp +132 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q4_k.comp +134 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q5_k.comp +165 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q6_k.comp +130 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vecq.comp +143 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vecq_funcs.glsl +503 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp +464 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp +624 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_funcs.glsl +600 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_id_funcs.glsl +74 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mmq.comp +311 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mmq_funcs.glsl +454 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/mul_mmq_shmem_types.glsl +93 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/multi_add.comp +194 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/neg.comp +20 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/norm.comp +44 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/opt_step_adamw.comp +42 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/opt_step_sgd.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/pad.comp +64 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/pool2d.comp +74 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/quantize_q8_1.comp +127 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/reglu.comp +9 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/relu.comp +21 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/repeat.comp +26 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/repeat_back.comp +37 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rms_norm.comp +150 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rms_norm_back.comp +55 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rms_norm_partials.comp +65 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/roll.comp +46 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rope_funcs.glsl +207 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rope_head.glsl +19 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rope_multi.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rope_neox.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rope_norm.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rope_params.glsl +31 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/rope_vision.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/round.comp +29 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/scale.comp +24 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/sgn.comp +21 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/sigmoid.comp +20 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/silu.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/silu_back.comp +26 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/sin.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/soft_max.comp +195 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/soft_max_back.comp +54 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/soft_max_large1.comp +62 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/soft_max_large2.comp +79 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/soft_max_large3.comp +65 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/soft_max_large_common.glsl +53 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/softplus.comp +23 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/solve_tri.comp +81 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/sqrt.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/square.comp +17 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/ssm_conv.comp +50 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/ssm_scan.comp +124 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/step.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/sub.comp +29 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/sum_rows.comp +47 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/sum_rows.glsl +25 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/swiglu.comp +9 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/swiglu_oai.comp +14 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/tanh.comp +20 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/timestep_embedding.comp +42 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/topk_argsort.comp +118 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/topk_moe.comp +213 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/topk_nary_search.comp +246 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/tri.comp +42 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/trunc.comp +22 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/types.glsl +1846 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/upscale.comp +178 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/utils.glsl +25 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp +1183 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/wkv6.comp +87 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/wkv7.comp +91 -0
- data/vendor/ggml/src/ggml-vulkan/vulkan-shaders/xielu.comp +35 -0
- data/vendor/ggml/src/ggml-webgpu/CMakeLists.txt +80 -0
- data/vendor/ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp +3231 -0
- data/vendor/ggml/src/ggml-webgpu/ggml-webgpu.cpp +4461 -0
- data/vendor/ggml/src/ggml-webgpu/pre_wgsl.hpp +778 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/add_id.wgsl +64 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/argmax.wgsl +72 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/argsort.wgsl +106 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/argsort_merge.wgsl +134 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/binary.wgsl +139 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl +905 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/concat.wgsl +75 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/conv2d.wgsl +165 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/cpy.wgsl +81 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/cumsum.wgsl +66 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py +89 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl +706 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_tile.wgsl +351 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_blk.wgsl +101 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_reduce.wgsl +84 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl +720 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/gated_delta_net.wgsl +132 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl +773 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/glu.wgsl +155 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/im2col.wgsl +101 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/memset.wgsl +40 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl +747 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl +1210 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_id.wgsl +195 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_id_gather.wgsl +55 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_id_vec.wgsl +154 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl +149 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl +200 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl +133 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec_acc.tmpl +1433 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/pad.wgsl +86 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/repeat.wgsl +67 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/rms_norm_mul.wgsl +152 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/rope.wgsl +224 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl +153 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/scale.wgsl +63 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/set.wgsl +109 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/set_rows.wgsl +109 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/soft_max.wgsl +245 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/solve_tri.wgsl +121 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/ssm_conv.wgsl +65 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/ssm_scan.wgsl +193 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/sum_rows.wgsl +55 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl +210 -0
- data/vendor/ggml/src/ggml-webgpu/wgsl-shaders/upscale.wgsl +240 -0
- data/vendor/ggml/src/ggml-zdnn/CMakeLists.txt +36 -0
- data/vendor/ggml/src/ggml-zdnn/common.hpp +59 -0
- data/vendor/ggml/src/ggml-zdnn/ggml-zdnn.cpp +637 -0
- data/vendor/ggml/src/ggml-zdnn/mmf.cpp +80 -0
- data/vendor/ggml/src/ggml-zdnn/mmf.hpp +12 -0
- data/vendor/ggml/src/ggml-zdnn/utils.cpp +79 -0
- data/vendor/ggml/src/ggml-zdnn/utils.hpp +19 -0
- data/vendor/ggml/src/ggml-zendnn/CMakeLists.txt +91 -0
- data/vendor/ggml/src/ggml-zendnn/ggml-zendnn.cpp +669 -0
- data/vendor/ggml/src/ggml.c +7777 -0
- data/vendor/ggml/src/ggml.cpp +26 -0
- data/vendor/ggml/src/gguf.cpp +1556 -0
- data/vendor/ggml/tests/CMakeLists.txt +356 -0
- data/vendor/ggml/tests/test-arange.cpp +100 -0
- data/vendor/ggml/tests/test-backend-ops.cpp +9786 -0
- data/vendor/ggml/tests/test-cont.c +170 -0
- data/vendor/ggml/tests/test-conv-transpose-1d.cpp +691 -0
- data/vendor/ggml/tests/test-conv-transpose.c +248 -0
- data/vendor/ggml/tests/test-conv1d-dw-c1.cpp +243 -0
- data/vendor/ggml/tests/test-conv1d-dw-c2.cpp +243 -0
- data/vendor/ggml/tests/test-conv1d.cpp +289 -0
- data/vendor/ggml/tests/test-conv2d-dw.cpp +153 -0
- data/vendor/ggml/tests/test-conv2d.cpp +391 -0
- data/vendor/ggml/tests/test-customop.c +300 -0
- data/vendor/ggml/tests/test-dup.c +111 -0
- data/vendor/ggml/tests/test-interpolate.cpp +166 -0
- data/vendor/ggml/tests/test-opt.cpp +1003 -0
- data/vendor/ggml/tests/test-pad-reflect-1d.cpp +213 -0
- data/vendor/ggml/tests/test-pool.c +274 -0
- data/vendor/ggml/tests/test-quantize-fns.cpp +196 -0
- data/vendor/ggml/tests/test-quantize-perf.cpp +356 -0
- data/vendor/ggml/tests/test-rel-pos.c +87 -0
- data/vendor/ggml/tests/test-roll.cpp +128 -0
- data/vendor/ggml/tests/test-timestep_embedding.cpp +180 -0
- data/vendor-patches/0001-cuda-buffer_from_ptr.patch +253 -0
- data/vendor-patches/0002-cuda-buffer_from_ptr-reuse-iface.patch +117 -0
- data/vendor-patches/0003-cuda-buffer_from_ptr-copy-mode.patch +128 -0
- data/vendor-patches/0004-cuda-cpy-strided.patch +61 -0
- data/vendor-patches/0005-concat-backward.patch +36 -0
- data/vendor-patches/0006-getrows-back-large-vocab.patch +69 -0
- data/vendor-patches/0007-gpt2-backward-kernels.patch +438 -0
- data/vendor-patches/0008-mul-mat-backward-mixed-precision.patch +50 -0
- data/vendor-patches/0009-sched-unsupported-node-diagnostic.patch +26 -0
- metadata +2161 -0
data/CHANGELOG.md
ADDED
|
@@ -0,0 +1,1124 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
## v0.8.0 — 2026-06-12
|
|
4
|
+
|
|
5
|
+
**The first published version** (RubyGems, gem name graciously transferred
|
|
6
|
+
by Ninoslav Milenović — see README acknowledgments). The release of the
|
|
7
|
+
"first form right" arc (#58): toy is now a publishable dual-use framework —
|
|
8
|
+
a playable CLI and a clean library — with every README claim gate-tested.
|
|
9
|
+
|
|
10
|
+
Highlights of the arc (full trail: issues #58–#87):
|
|
11
|
+
|
|
12
|
+
- **The experimenter API** (#64/#73): `RecipeOptions` named setters replace
|
|
13
|
+
the 8-positional `realize!`; validating `TrainingBatch` (+ objectives) and
|
|
14
|
+
`ClassifyBatch`; `AdamW.for_from_scratch`/`.for_lora` + `hp_for_step`
|
|
15
|
+
(misuse fails loud); `realize_warm!` owns donor plumbing; `Toy::RunBundle`
|
|
16
|
+
writer + `Toy::RunLog` reader (`RunLog.scan("runs")` — best run in one
|
|
17
|
+
line); config profile factories (`SmolLM2Config.tiny/.mid`,
|
|
18
|
+
`ViTTinyConfig.tiny`); ENV-driven scaffolds (one compile, many runs).
|
|
19
|
+
- **Tree truth** (#65/#68): the layered stack is the whole library —
|
|
20
|
+
`lib/` is exactly `toy.rb` + `toy/`; tinynn under `toy/ffi/`, loaders
|
|
21
|
+
under `toy/io/loaders/`, KV decode + the lifted GPT-2 engines under
|
|
22
|
+
`toy/llm/engine/`; docs teach the real six layers (L4 = Engine).
|
|
23
|
+
- **Compile-time devices** (#64/#70): `toy/compute` ·
|
|
24
|
+
`toy/compute_cuda` · `toy/compute_metal` entries; `toy new --lib` ships a
|
|
25
|
+
multi-arch `build.sh`; GPU vendor build-units (opt-in) validated through
|
|
26
|
+
the consumer front door (CUDA byte-exact on GB10; Metal Mac-verified).
|
|
27
|
+
- **MRI dev-runs + the CRuby oracle** (#71): `require "toy/mri"` loads
|
|
28
|
+
everything under plain Ruby (teaching surface works pure-Ruby); with
|
|
29
|
+
`make libtinynn_shared`, the Fiddle arm runs the real compute path —
|
|
30
|
+
**bit-exact vs the Spinel binaries on both Linux/aarch64 and Apple
|
|
31
|
+
Silicon** (train curves and decode ids).
|
|
32
|
+
- **Curated examples + tested quickstart** (#60): 7 narrated examples
|
|
33
|
+
replace the 60-entry sprawl; `make gate-consumer` proves the README
|
|
34
|
+
cold-start (app + `--lib` legs) on Linux and macOS.
|
|
35
|
+
- **Coverage honesty** (#61/#76/#77): 17-model matrix re-verified with
|
|
36
|
+
provenance stamps; Qwen3 CUDA fixed (below); Qwen2.5 CUDA-Q8 footnote
|
|
37
|
+
upgraded; bench-vs-PyTorch legs restored (train 1.064×, infer 1.399×).
|
|
38
|
+
- **sig/*.rbs type roots** (#69): shipped in the gem and consumed at
|
|
39
|
+
compile (`--rbs sig`) — uncalled-param poly-widening defense for toy and
|
|
40
|
+
for every consumer.
|
|
41
|
+
- **lora in `toy/compute`** (#52): the exclusion dissolved with the #64
|
|
42
|
+
reshape; the compute-surface gate carries an uncalled-LoRA tripwire.
|
|
43
|
+
- Upstream: eight Spinel issues filed from this arc (the silent-failure
|
|
44
|
+
cluster spinel-dev#17/#19/#20/#21/#23 among them), two fixed same-day.
|
|
45
|
+
|
|
46
|
+
Itemized entries accumulated during the arc:
|
|
47
|
+
|
|
48
|
+
- **#70: GPU build-units merged into `spinel-ext.json`** (spinelgems#20
|
|
49
|
+
landed: `"default": "disabled"` opt-in + `--with-ext NAME` /
|
|
50
|
+
`SPINEL_EXT_ENABLE`, cmake `build_dir` variants, copy-once shared source
|
|
51
|
+
dirs). The staged `spinel-ext-gpu.json` is retired. CUDA re-validated
|
|
52
|
+
end-to-end through the consumer front door on the GB10 (plain `vendor`
|
|
53
|
+
skips the units; `--with-ext cuda --with-ext cuda-shim` builds
|
|
54
|
+
`build-cuda/` alongside the CPU `build/`; vendored `train_cuda`
|
|
55
|
+
reproduces `prep/fixtures/train_cuda_baseline.txt` byte-for-byte on GPU).
|
|
56
|
+
Metal entries ride along structurally (Mac validation pending, toy#27;
|
|
57
|
+
macOS cmake build-units also need spinelgems#21 tool-side). Findings
|
|
58
|
+
hardened: the gem now ships the generated `lib/toy/llm/*_{cuda,metal}.rb`
|
|
59
|
+
mirrors (`gem-prep` runs `gen-mirrors`) — without them Spinel silently
|
|
60
|
+
compiles the missing requires to nothing and consumer GPU builds
|
|
61
|
+
mis-resolve call sites; the `toy new --lib` `build.sh` fails loud when a
|
|
62
|
+
GPU archive is missing (re-vendor hint) and its README documents the
|
|
63
|
+
enable flags.
|
|
64
|
+
|
|
65
|
+
- **Fixed #76: Qwen3 CUDA F32 degenerate decode.** The hand-written CUDA
|
|
66
|
+
loader (`transformer_lm_cuda.rb`) never wired the detected QK-norm flags
|
|
67
|
+
into the engine — it called `realize_for_mmap` with 5 of 6 args (Spinel
|
|
68
|
+
zero-fills missing call args silently), so Qwen3's per-head Q/K RMS-norms
|
|
69
|
+
were never built on the CUDA graph. Now wired identically to the CPU
|
|
70
|
+
loader; Qwen3-0.6B/1.7B CUDA decode is token-identical to CPU and
|
|
71
|
+
`docs/models.md` rows are restored ✓. Also: `NO_QK_NORM=1` actually
|
|
72
|
+
disables the norm now (it was overwritten by `realize_for_mmap`), and
|
|
73
|
+
loading a QK-norm model through the legacy (non-`toy.ggml_native`)
|
|
74
|
+
copy-load path fails loud instead of decoding silently degenerate.
|
|
75
|
+
|
|
76
|
+
- **The `toy` gem name is ours.** With huge thanks to **Ninoslav Milenović**,
|
|
77
|
+
who graciously transferred ownership of the [`toy`](https://rubygems.org/gems/toy)
|
|
78
|
+
gem on RubyGems. See the README acknowledgments. (Not released yet — name
|
|
79
|
+
secured for the eventual release.)
|
|
80
|
+
- **GPT-2 training foundation.** Two vendored ggml backward kernels
|
|
81
|
+
(`ggml_gelu_back`, `ggml_norm_back`; `vendor-patches/0007`) — finite-diff
|
|
82
|
+
validated, 2 of the 3 gaps in our upstream ggml#1514. A minimal inline GPT-2
|
|
83
|
+
trainer (`make gate-gpt2-min`) proves both kernels train end-to-end
|
|
84
|
+
(CE 3.47 → 0.007). Attention + the full arch are the next increments.
|
|
85
|
+
|
|
86
|
+
## v0.7.0-pre-alpha — 2026-06-02
|
|
87
|
+
|
|
88
|
+
### P2-finish + directory tidy (2026-06-03, branches `p2-retire-monolith` / `p2-dir-tidy`)
|
|
89
|
+
Past P2's accepted ceiling; all bit-identical-gated on CPU+CUDA (Metal
|
|
90
|
+
Mac-verified). Not yet merged to `main`.
|
|
91
|
+
- **Monolith retired.** `lib/llama_seq_forward_ffi.rb` → `lib/toy/llm/engine/llama_seq_engine.rb`
|
|
92
|
+
as `Toy::LLM::Engine::LlamaSeqEngine`; `vit_tiny_forward_ffi.rb` likewise →
|
|
93
|
+
`engine/vit_tiny_engine.rb`. The L1–L4 tree is now the engine.
|
|
94
|
+
- **`full_finetune` lifted** onto `TransformerBlock` (the 4th realize path,
|
|
95
|
+
left inline at the ceiling) + a 6th byte-exact gate (`prep/full_finetune_gate.rb`),
|
|
96
|
+
recorded from the inline path first. `random_init` global-tensor alloc lifted
|
|
97
|
+
onto `LlamaArch`.
|
|
98
|
+
- **Mirrors off-disk.** The 24 `_cuda`/`_metal` twins are generated at build time
|
|
99
|
+
(Makefile static-pattern rules) and gitignored, not committed (−~11k lines);
|
|
100
|
+
`verify-mirrors` regenerates + checks idempotency.
|
|
101
|
+
- **Directory tidy.** Top-level `lib/*.rb` grouped under `lib/toy/`: `io/`
|
|
102
|
+
(tokenizer, gguf, loaders), `models/` (transformer, arch, smollm2, gpt2, vit,
|
|
103
|
+
transformer_lm), `train/` (training, lr_schedule, drift_grad, gguf_writer/fuse,
|
|
104
|
+
sampler), `dev/` (describe_flow, card, tap, logprobs). `toy.rb` (package entry)
|
|
105
|
+
and `tinynn.rb` (FFI bridge) kept at root.
|
|
106
|
+
- Fixed `bench/check.rb` to `mkdir -p bench/build` (fresh-clone build). Filed
|
|
107
|
+
toy#34: `Tokenizer#decode` drops spaces (byte-level `Ġ` not restored;
|
|
108
|
+
pre-existing on `main`).
|
|
109
|
+
|
|
110
|
+
### Deferred queue — full compute surface + GPU + packaging (2026-06-01/02)
|
|
111
|
+
Landed after the initial v0.7 refactor; all bit-identical-gated, all on the
|
|
112
|
+
canonical gx10 box (see gate-portability note below).
|
|
113
|
+
- **`toy train` recipes:** `lora` + `warm-start` CLI dispatch (real recipes,
|
|
114
|
+
byte-exact gates), plus the existing `from-scratch`.
|
|
115
|
+
- **GPU `--device`:** `cuda` for `infer`/`eval` (token-parity vs CPU) and for
|
|
116
|
+
`train {from-scratch,lora,warm-start}` (byte-deterministic CUDA gates +
|
|
117
|
+
checkpoint round-trip). `metal` runtime for `infer`/`eval`/`train`
|
|
118
|
+
(Mac-verified; clean errors off-platform).
|
|
119
|
+
- **train→infer round-trip:** from-scratch/warm-start checkpoints fold the
|
|
120
|
+
projection-lens at write time into a standard fused-llama GGUF that
|
|
121
|
+
`toy infer` loads; `--prompt-ids` + fail-loud on tokenizer-less/out-of-range.
|
|
122
|
+
- **`toy serve` telemetry:** emits toy/v1 `run_start` + per-request
|
|
123
|
+
`eval`/`serve`/`request` events (run_end a known gap — `Tep.run!` SIGSEGVs on
|
|
124
|
+
SIGTERM; tep#175).
|
|
125
|
+
- **`toy eval lmc`:** two-checkpoint linear-mode-connectivity (pinned fixtures).
|
|
126
|
+
- **`toy train vit-tiny`:** first non-llama arch — ViT-tiny from-scratch on the
|
|
127
|
+
committed `data/vit_smoke` corpus; fail-loud on a missing corpus.
|
|
128
|
+
- **Gate portability:** float gates (train loss, eval logprobs) are
|
|
129
|
+
gx10-canonical strict byte-exact; cross-platform they gate the discrete
|
|
130
|
+
invariant + a tolerance (eval logprobs run through Ruby libm). Gate corpora
|
|
131
|
+
are pinned committed fixtures.
|
|
132
|
+
- **Packaging:** MIT `LICENSE`; gemspec `files` hardened with `git ls-files`
|
|
133
|
+
(artifact-free ~432 KB gem); option-(c) build inputs ship for
|
|
134
|
+
`gem install toy && toy install`.
|
|
135
|
+
- **Upstream:** filed tep#168/#172/#175, toy#30 (tep Backend convergence),
|
|
136
|
+
toy#31 (consume tep `~> 0.11`). GPT-2 training deferred (greenfield: ggml has
|
|
137
|
+
no LayerNorm backward).
|
|
138
|
+
|
|
139
|
+
**Headline.** toy becomes a **CLI-driven framework**. The forward path is
|
|
140
|
+
refactored into a five-layer algo stack (primitives → blocks → archs →
|
|
141
|
+
recipes), every layer landed behind a **bit-identical-output gate**; a
|
|
142
|
+
CRuby `bin/toy` CLI ships **9 commands**; tep is now consumed as a normal
|
|
143
|
+
**gem from `main`** (no hand-rolled vendoring); and the docs tree was
|
|
144
|
+
rebuilt clean. `toy --version` and the gemspec now agree on the version
|
|
145
|
+
(was: gem `0.1.0` vs README `v0.6.0-pre-alpha`).
|
|
146
|
+
|
|
147
|
+
### P2 — five-layer refactor (`lib/toy/llm/`), all bit-identical-gated
|
|
148
|
+
- **L1 primitives** `{rope, swiglu, rms_norm, gqa}`, **L2** `transformer_block`,
|
|
149
|
+
**L3** `llama_arch`, **L4 recipes** `{from_scratch, lora, warm_start}`
|
|
150
|
+
(curriculum deferred) extracted from `lib/llama_seq_forward_ffi.rb`.
|
|
151
|
+
Monolith ~1918 → ~1300 lines.
|
|
152
|
+
- **realize-path decomposition:** per-block weight loading for random_init /
|
|
153
|
+
mmap / q8 moved onto the block/arch, behind a **7-way** bit-identical gate
|
|
154
|
+
(CPU + CUDA + GGUF F32 round-trip + GQA-divergent + B>1 + qkv_bias + q8).
|
|
155
|
+
- **New gates:** CUDA forward parity (`smoke_projection_lens_cuda`), GGUF F32
|
|
156
|
+
mmap round-trip (`lib/toy_gguf_fuse.rb` head-fusing writer), and the
|
|
157
|
+
config-variant cascade fixtures. Harnesses under `prep/*_gate.rb`.
|
|
158
|
+
|
|
159
|
+
### P3 — CLI MVP + packaging
|
|
160
|
+
- **`bin/toy`** (CRuby, no Spinel): `new`, `install`, `list`, `describe`,
|
|
161
|
+
`fetch` + global `--manifest` / `--help` / `--version`; `toy.yml`
|
|
162
|
+
(run-id template + algos path). Manifest/help auto-generated from one
|
|
163
|
+
COMMANDS registry.
|
|
164
|
+
- **`describe`** reads `general.architecture` and **declines** non-llama
|
|
165
|
+
arches (gpt2/gemma2/olmoe) instead of mis-rendering a llama Card
|
|
166
|
+
(fail-loud); `list` dedups by canonical path; pure-Ruby GGUF metadata
|
|
167
|
+
reader (`lib/toy/core/gguf_meta.rb`).
|
|
168
|
+
- **Packaging (toy#28 → option c):** the gem ships the backend build inputs;
|
|
169
|
+
`gem install toy && toy install` builds the backend. Verified on Linux
|
|
170
|
+
aarch64 (gx10) + macOS Apple Silicon / Metal; Linux x86_64 pending.
|
|
171
|
+
|
|
172
|
+
### P4 — `infer` / `train` / `eval` / `serve`
|
|
173
|
+
- **Compute bridge:** each command builds a lib-side Spinel runner
|
|
174
|
+
(`lib/toy/run/<cmd>.rb` → `libexec/toy-<cmd>`) via the install machinery
|
|
175
|
+
and shells it with a controlled env; recorded-baseline gates.
|
|
176
|
+
- `infer` (greedy, byte-for-byte vs the old example), `train from-scratch`
|
|
177
|
+
(drives the FromScratch recipe → `runs/<id>/` + events.jsonl + checkpoint),
|
|
178
|
+
`eval` (CE/logprobs), `serve` (`lib/toy/serve/openai/`, OpenAI-compatible
|
|
179
|
+
IDs-in/IDs-out HTTP; tep as build-dep only). `examples/01_inference.rb` and
|
|
180
|
+
`tep_demo/openai_api_llama.rb` retired into the CLI/lib.
|
|
181
|
+
|
|
182
|
+
### Dependencies — tep as a normal gem
|
|
183
|
+
- tep consumed from its **`main` branch via the spinelgems convention**
|
|
184
|
+
(`Gemfile` `git:` → `bundle lock` → `spinel-compat vendor` →
|
|
185
|
+
`require_relative "vendor/spinel/deps"`). The `@TEP_*@` post-vendor
|
|
186
|
+
substitution trick is **retired** — `spinel-compat vendor` wires tep's
|
|
187
|
+
C-exts natively from its shipped `spinel-ext.json` (tep#98). `prep/post_vendor_tep.rb` deleted.
|
|
188
|
+
|
|
189
|
+
### Cleanup + docs
|
|
190
|
+
- Docs rebuilt to a clean tree (`architecture`, `cli`, `authoring`, `gating`,
|
|
191
|
+
`dependencies`, `events`, `roadmap`, `reference/`); ~33 dead docs pruned,
|
|
192
|
+
~16 archived; zero dangling intra-repo links.
|
|
193
|
+
|
|
194
|
+
### Upstream issues filed
|
|
195
|
+
- **matz/spinel#1043** — `Struct.new` accessor names act as global type-merge
|
|
196
|
+
keys (mis-types unrelated code); use hand-written value classes.
|
|
197
|
+
- **spinelgems#8** — `spinel-compat vendor` doesn't wire a split
|
|
198
|
+
`@MOD_CFLAGS@` pkg-config to the source `.o` compile (tep pg → libpq-fe.h
|
|
199
|
+
not found); `SPINEL_EXT_DISABLE=pg` workaround.
|
|
200
|
+
- **ggml#1506** noted — `mul_mat_id` broken for K-quants (use Q8_0 experts).
|
|
201
|
+
|
|
202
|
+
### Deferred
|
|
203
|
+
GPU runners (`--device`), train `lora`/`warm-start`/`curriculum` variants,
|
|
204
|
+
`eval lmc`, ViT, GPT-2 train, the train→infer checkpoint round-trip
|
|
205
|
+
(tensor-naming), the realize fixture-cascade (full_finetune), Tep then Tao
|
|
206
|
+
re-adaptation (until Toy is stable), Linux x86_64 verification.
|
|
207
|
+
|
|
208
|
+
## v0.6.0-pre-alpha — 2026-05-28
|
|
209
|
+
|
|
210
|
+
**Headline.** Tao's two fresh asks shipped same-day (toy#20 per-token
|
|
211
|
+
drift + corpus freq; toy#21 sample events). `/v1/embeddings` lives on
|
|
212
|
+
all 7 `tep_demo/openai_api_*` servers — mean-pooled OpenAI-shape
|
|
213
|
+
response over the dequantize-aware embed_lookup primitive. GH#13
|
|
214
|
+
(ViT-Tiny) closed — 200-step acceptance verified, events.jsonl
|
|
215
|
+
well-formed. GH#14 (Qwen-2.5-1.5B → 410M transfer) ~85% done; missing
|
|
216
|
+
piece is FineWeb-Edu data, filed as toy#22. GH#3 (multi-GPU mode 1)
|
|
217
|
+
got its C-side scaffolding (device-index plumbing in tnn_session_new);
|
|
218
|
+
runtime testing deferred until multi-GPU hardware. GH#9
|
|
219
|
+
(mixed-precision) API foundation shipped (tnn_cast + mp_matmul); full
|
|
220
|
+
implementation blocked on ggml autograd accepting BF16 srcs +
|
|
221
|
+
completing F16 backward sched paths. Spinel landmines #11 + #15
|
|
222
|
+
(File.open + FFI / non-block writes) fixed upstream; cleanup pass
|
|
223
|
+
removed stale workaround comments + memorialized probes as regression
|
|
224
|
+
tests.
|
|
225
|
+
|
|
226
|
+
### Tao asks shipped (same-day)
|
|
227
|
+
|
|
228
|
+
- **Per-token embedding drift + corpus frequency (toy#20).** New
|
|
229
|
+
`lib/toy_token_drift.rb` module; `TOY_TOKEN_DRIFT=N` env knob in
|
|
230
|
+
06_train_from_scratch emits one `drift` event per vocab row per
|
|
231
|
+
N macro-steps with `cos_to_init`, `l2_to_init`, and `freq`
|
|
232
|
+
(training-corpus occurrence). Tao renders this as the freq↔drift
|
|
233
|
+
figure (granite_transfer Pearson r = -0.835 headline).
|
|
234
|
+
- **Sample generation events (toy#21).** New `lib/toy_sample.rb`
|
|
235
|
+
module; `TOY_SAMPLES=N` decodes N completions from training-
|
|
236
|
+
sequence prompts at run end and emits toy/v1 `sample` events
|
|
237
|
+
`{prompt, text, step}`. Uses `tnn_compute` (forward graph only)
|
|
238
|
+
so weights are not mutated by sample emission. Cheap-when-off.
|
|
239
|
+
- **`/v1/embeddings` on every openai_api server** (closes the
|
|
240
|
+
toy-side of Tao's embedding ask). `tep_demo/embeddings_handler.rb`
|
|
241
|
+
(shared) + route registration in all 7 servers
|
|
242
|
+
(`openai_api_smollm2`, `openai_api_qwen25_{0.5,1.5,3,7}b` × {f32,
|
|
243
|
+
q8}). OpenAI-shape response; mean-pool over input token IDs;
|
|
244
|
+
dequantize-aware (works on F32 + Q8 + Q4 weight tables). Smoke
|
|
245
|
+
on SmolLM2-135M returns 576-dim vector; on Qwen-0.5B returns
|
|
246
|
+
896-dim.
|
|
247
|
+
|
|
248
|
+
### Training maturity follow-ups
|
|
249
|
+
|
|
250
|
+
- **GH#7 micro-batching** + **GH#8 LR-scaled grad accumulation**
|
|
251
|
+
shipped (commits 87800fa, 0ae99ac — full details below in
|
|
252
|
+
v0.5.0-pre-alpha). 06_train_from_scratch.rb gains BATCH +
|
|
253
|
+
GRAD_ACCUM env knobs; defaults bit-identical to pre-GH#7.
|
|
254
|
+
- **bench BATCH knob** in `bench/lora_step.rb` — emits per-batch
|
|
255
|
+
metrics (`lora_step_b1_ms`, `lora_step_b4_ms`, etc.). Track how
|
|
256
|
+
step time scales with effective batch. B=4 baseline at SmolLM2-
|
|
257
|
+
135M / gx10 CPU: 104.42 ms (vs 58.13 at B=1 → 1.80× wall for 4×
|
|
258
|
+
effective → 2.22× throughput).
|
|
259
|
+
|
|
260
|
+
### GitHub issue triage / closures
|
|
261
|
+
|
|
262
|
+
- **GH#13 ViT-Tiny — closed.** Primary acceptance verified:
|
|
263
|
+
`examples/07_train_vit_tiny.rb` runs 200 steps, emits 202-line
|
|
264
|
+
events.jsonl in toy/v1 schema, final loss 5.7e-4 ≪ initial 2.30.
|
|
265
|
+
Scope items (arch, timm loader, training driver, image loader)
|
|
266
|
+
all in main. The "E1 reproduces granite_transfer #28" follow-up
|
|
267
|
+
is downstream Tao work.
|
|
268
|
+
- **GH#14 Qwen-2.5-1.5B → 410M transfer — ~85% done.**
|
|
269
|
+
Trainer + projection lens (E2.3) + warm-start donor load
|
|
270
|
+
(`examples/09_warm_start_train.rb`) all working. Final `eval`
|
|
271
|
+
event added at run end (`name:"final"`, matches issue's
|
|
272
|
+
acceptance schema). Qwen-410M invocation pattern documented in
|
|
273
|
+
09's header. Smoke verified loading 233M-float donor embed from
|
|
274
|
+
`data/qwen25-1.5b-f32.gguf`. **Remaining gap is data, not code:**
|
|
275
|
+
FineWeb-Edu pretokenizer filed as toy#22. Once #22 lands +
|
|
276
|
+
TOKENS knob added, a full 10M-token run on the GB10 would close
|
|
277
|
+
the acceptance loop.
|
|
278
|
+
- **GH#10 activation recomputation — closed as blocked on
|
|
279
|
+
upstream ggml.** Investigation found no recompute/checkpoint API
|
|
280
|
+
in current ggml (no matches for "checkpoint", "recompute",
|
|
281
|
+
"rematerial" anywhere in `vendor/ggml/`). Issue body's premise
|
|
282
|
+
("ggml supports this via sched-recompute") doesn't map to a real
|
|
283
|
+
ggml feature. Real implementation paths (vendor patch or
|
|
284
|
+
two-pass approach) are much bigger than the "Ruby-side change to
|
|
285
|
+
realize_for_random_init" the issue suggested. Revisit when toy
|
|
286
|
+
hits the activation-memory wall (1B+ at long context).
|
|
287
|
+
- **GH#9 mixed-precision — API foundation shipped; full impl
|
|
288
|
+
blocked on ggml.** `tnn_cast` FFI binding + `mp_matmul` helper
|
|
289
|
+
in `LlamaSeqForwardFFICache` + `WEIGHT_DTYPE` env in
|
|
290
|
+
06_train_from_scratch. At WEIGHT_DTYPE=0 (default, F32):
|
|
291
|
+
bit-identical to pre-GH#9. At WEIGHT_DTYPE=30 (BF16): fails at
|
|
292
|
+
`ggml.c:7052` (autograd builder rejects BF16 src). At
|
|
293
|
+
WEIGHT_DTYPE=1 (F16): autograd accepts but sched can't place a
|
|
294
|
+
backward op (`ggml-backend.cpp:1242`) on either CPU or CUDA.
|
|
295
|
+
Both walls are upstream-ggml. Forward-only F16 cast works
|
|
296
|
+
(probe verified).
|
|
297
|
+
- **GH#3 multi-GPU mode 1 — C-side scaffolding shipped.**
|
|
298
|
+
`tnn_session_new_on(kind, device)` + `tnn_cuda_get_device_count`
|
|
299
|
+
C entry points; engine cache widened from a scalar to a
|
|
300
|
+
per-device array (`g_engine_cuda[TNN_MAX_CUDA_DEVICES=8]`). The
|
|
301
|
+
device > 0 path is untested at runtime (gx10 = 1 GPU); the
|
|
302
|
+
device = 0 path (= legacy behavior) is bit-equivalent (CUDA
|
|
303
|
+
example_train_from_scratch_cuda step-1 CE unchanged at
|
|
304
|
+
6.490198612213135).
|
|
305
|
+
- **GH#22 (new) — FineWeb-Edu pretokenizer.** Filed; blocks
|
|
306
|
+
GH#14's full acceptance run. ~150 LOC Python: streams
|
|
307
|
+
FineWeb-Edu via uv datasets, tokenizes via Qwen-2.5
|
|
308
|
+
tokenizer, packs i32 binary matching `lib/toy_corpus_loader.rb`.
|
|
309
|
+
- **Status comments on the keep-open issues** (#3, #4, #5, #6,
|
|
310
|
+
#14) — each got a concise current-state note. #5 retains its
|
|
311
|
+
"no action expected" framing; #6 is cross-repo (Tep streaming
|
|
312
|
+
handler); #4/#5 blocked on multi-GPU hardware.
|
|
313
|
+
|
|
314
|
+
### Spinel landmines retired
|
|
315
|
+
|
|
316
|
+
- **#11 — block-form `File.open(r)` + FFI session-init crash —
|
|
317
|
+
FIXED.** Verified on Spinel a03bb49 (commit 39438d8
|
|
318
|
+
"non-block File.open with sp_File handle" modernized File.open
|
|
319
|
+
codegen end-to-end). Probe at `tinynn/probe_file_block_ffi.rb`
|
|
320
|
+
passes. Existing `File.read(path).split("\\n")` sites stay
|
|
321
|
+
because they're the natural primitive; workaround comments
|
|
322
|
+
removed.
|
|
323
|
+
- **#15 — non-block `File.open(path, "w")` silent no-op —
|
|
324
|
+
FIXED** (same commit). Probe at `tinynn/probe_file_nonblock.rb`
|
|
325
|
+
writes "hello from non-block File.open" + reads back; verified
|
|
326
|
+
on disk.
|
|
327
|
+
- **#9, #4, #13 — re-verified active.** Probes
|
|
328
|
+
(`probe_hash_missing_key`, `probe_default_args`,
|
|
329
|
+
`probe_nested_mixed_array`) kept as regression checks for
|
|
330
|
+
future Spinel updates. Hash[missing_key]=0 still returns 0;
|
|
331
|
+
cross-module default-args poison can't be reproduced in
|
|
332
|
+
single-file probes but treat as active; nested-mixed-array
|
|
333
|
+
startup-segfault still active with the same
|
|
334
|
+
`incompatible types ... 'sp_IntArray *' from type 'sp_RbVal'`
|
|
335
|
+
diagnostic.
|
|
336
|
+
|
|
337
|
+
### Tep server consolidation (toy#188)
|
|
338
|
+
|
|
339
|
+
- **7 → 1 server.** Replaced the seven near-duplicate
|
|
340
|
+
`tep_demo/openai_api_{smollm2,qwen25_{0.5,1.5,3,7}b{,_q8}}.rb`
|
|
341
|
+
files with one env-driven `tep_demo/openai_api_llama.rb`.
|
|
342
|
+
`MODEL_PATH` + `MODEL_NAME` env at boot select any
|
|
343
|
+
llama-family GGUF; `MODEL_NAME` auto-derives from the GGUF
|
|
344
|
+
basename when not given. Net: -2128 LOC, +63 LOC.
|
|
345
|
+
- The 7 sources existed because Spinel module-constant
|
|
346
|
+
inference was sketchy on env-driven values when the
|
|
347
|
+
convention was set. With landmines #11/#15 retired this
|
|
348
|
+
release, env-driven constants work cleanly.
|
|
349
|
+
- The original `tep_demo/openai_api.rb` (GPT-2 / DistilGPT2)
|
|
350
|
+
stays as the legacy server with the server-side tokenizer.
|
|
351
|
+
|
|
352
|
+
### DevEx + docs pass
|
|
353
|
+
|
|
354
|
+
- **README** version bumped to v0.6.0-pre-alpha.
|
|
355
|
+
- **examples/README.md** table now covers 06-09 (modern
|
|
356
|
+
from-scratch trainer, ViT-Tiny, LMC, warm-start) and the
|
|
357
|
+
smoke_*.rb wire tests. Serving section points at
|
|
358
|
+
`tep_demo/openai_api_llama` as the canonical HTTP path.
|
|
359
|
+
- **tep_demo/README.md** updated for the 4-server reality;
|
|
360
|
+
`openai_api_llama` quick-start added with all the env knobs
|
|
361
|
+
+ `/v1/embeddings` + `/v1/completions` curl recipes.
|
|
362
|
+
- **events-schema.md** now documents the `sample` event
|
|
363
|
+
(toy#21) and the per-token `drift` variant (token_id + freq,
|
|
364
|
+
toy#20) — both producer + consumer notes.
|
|
365
|
+
- **examples/04_serve_http.rb** status note refreshed: the
|
|
366
|
+
original startup segfault is fixed, but `Tep.run!` exits
|
|
367
|
+
immediately because the file still uses the pre-spinelgems
|
|
368
|
+
vendored Tep. The header now points users at
|
|
369
|
+
`tep_demo/openai_api_llama` as the working serving binary.
|
|
370
|
+
|
|
371
|
+
### Bug fixes
|
|
372
|
+
|
|
373
|
+
- `tep_demo/openai_api_smollm2.rb` defaults restored to
|
|
374
|
+
SmolLM2-135M (file was serving Qwen-0.5B due to a
|
|
375
|
+
copy-paste artifact — `GGUF_PATH` and `MODEL_NAME` pointed at
|
|
376
|
+
qwen25-0.5b despite the filename). File then renamed to
|
|
377
|
+
`openai_api_llama.rb` as part of the toy#188 consolidation.
|
|
378
|
+
|
|
379
|
+
## v0.5.0-pre-alpha — 2026-05-27
|
|
380
|
+
|
|
381
|
+
**Headline.** Tao Tier-3 fully unblocked: trustworthy drift/grad, LMC,
|
|
382
|
+
and activation-CKA all have producer-side support landed. From-scratch
|
|
383
|
+
training runs on CUDA (~10× CPU at 24L × 16-head Qwen-shape). Toy
|
|
384
|
+
checkpoints round-trip through inference. Embedding lookup +
|
|
385
|
+
decode-logprobs primitives shipped for Tep's eventual `/v1/embeddings`
|
|
386
|
+
and `/v1/chat?logprobs=true`. Graph node capacity scales with
|
|
387
|
+
n_layers × n_heads so per-head decomposition doesn't cap us at ~10
|
|
388
|
+
layers anymore.
|
|
389
|
+
|
|
390
|
+
### Training observability + cross-run analysis (Tao Tier-3)
|
|
391
|
+
|
|
392
|
+
- **Semantic tensor names** (#11, #16). PARAM tensors emitted by
|
|
393
|
+
`realize_for_random_init` (from-scratch), `realize_for_mmap` (LoRA),
|
|
394
|
+
and `realize_for_full_finetune` (FFT) now carry llama.cpp-convention
|
|
395
|
+
names: `token_embd.weight`, `blk.N.attn_q.head_H.weight`,
|
|
396
|
+
`…ffn_down.weight`, `…lora_a.weight`, plus matching `.m` / `.v` for
|
|
397
|
+
Adam moments. Drift, grad, and gguf-checkpoint events now have
|
|
398
|
+
stable cross-run identifiers — Tao's `compare` can align.
|
|
399
|
+
- **Session graph capacity is parametric** (#17). Per-head
|
|
400
|
+
decomposition makes node count scale as `O(n_layers × n_heads)`;
|
|
401
|
+
the default 65536 cap overflowed on 24L × 16-head Qwen-shape at
|
|
402
|
+
backward-expand time. `tnn_session_set_graph_capacity` re-allocates
|
|
403
|
+
graphs (auto-grows ctx_buf if needed) + persists across
|
|
404
|
+
`tnn_reset_for_rebuild`. `realize_for_random_init` now sizes
|
|
405
|
+
`cap = n_layers × n_heads × 1000 + 65536` from cfg.
|
|
406
|
+
- **Activation-Gram taps for CKA** (#15). `ToyTap.emit_cka` computes
|
|
407
|
+
`G = Aᵀ·A` (T×T Gram) of an `[d, T]` activation and emits it as a
|
|
408
|
+
`gram` field on tap events. Wired into `build_seq_block` at three
|
|
409
|
+
stable regions per block: `attn_norm`, `ffn_out`,
|
|
410
|
+
`resid_post_block`. Gated by `TOY_CKA=N` (every N steps). Tao's
|
|
411
|
+
`Analyze.linear_cka` already unit-tested on synthetic grams.
|
|
412
|
+
Schema: `gram` field added to `tap` event in
|
|
413
|
+
[`docs/events-schema.md`](docs/events.md).
|
|
414
|
+
- **LMC interpolate-and-eval runner** (#18). `examples/08_lmc.rb`
|
|
415
|
+
takes two toy checkpoints + α-grid; for each α it blends
|
|
416
|
+
`θ_α = (1-α)·θ_A + α·θ_B` per-PARAM (by name, semantic-names
|
|
417
|
+
required), runs forward + CE on a fixed sequence, emits one
|
|
418
|
+
`eval` event per α with `name="lmc"`. Tao's `Analyze.lmc` reads
|
|
419
|
+
these → α-curve → same-basin / disconnected verdict.
|
|
420
|
+
|
|
421
|
+
### From-scratch training: CUDA + larger shapes
|
|
422
|
+
|
|
423
|
+
- **`DEVICE=cuda` for from-scratch training** (#152).
|
|
424
|
+
`prep/gen_cuda_mirror.rb` now mirrors `examples/06_train_from_scratch.rb`
|
|
425
|
+
alongside the FFI libs; `make example_train_from_scratch_cuda`
|
|
426
|
+
produces a CUDA-linked binary. Shell wrapper routes
|
|
427
|
+
`DEVICE=cuda` → CUDA binary. On GB10: SmolLM2-shape ~57 ms/step
|
|
428
|
+
(CPU is multi-s); Qwen-shape (24L × 1024) trains at ~314 ms/step
|
|
429
|
+
vs ~3 s/step on CPU. Math matches CPU to float-roundoff.
|
|
430
|
+
- **Checkpoint reload through the inference path** (#153). Toy
|
|
431
|
+
checkpoints written by `ToyGGUFWriter` are now loadable by the
|
|
432
|
+
standard inference path. `realize_for_mmap` detects per-head
|
|
433
|
+
naming via the `blk.0.attn_q.head_0.weight` sentinel and reads
|
|
434
|
+
each head's own GGUF tensor offset instead of base+stride. Writer
|
|
435
|
+
also flags `toy.ggml_native=true` so `transformer_lm.rb` routes
|
|
436
|
+
through the mmap path. Smoke: train 5 steps → write ckpt →
|
|
437
|
+
`lm.generate` 3 tokens, no NaN, no crash. New
|
|
438
|
+
`tnn_gguf_w_set_bool` primitive for the flag.
|
|
439
|
+
|
|
440
|
+
### Tep `/v1/*` building blocks
|
|
441
|
+
|
|
442
|
+
- **Embedding lookup** (#145). `tnn_embed_lookup_to_doubles`:
|
|
443
|
+
dequantize-aware single-row read from a 2-D tensor whose data
|
|
444
|
+
lives in CPU-readable memory (mmap'd GGUF pages — the common
|
|
445
|
+
case). F32 short-circuits memcpy; Q4/Q5/Q6/Q8/F16 go through
|
|
446
|
+
ggml's per-type `to_float`. Ruby API:
|
|
447
|
+
`ToyLM#embed_lookup(token_ids) → flat Array<Float>` of length
|
|
448
|
+
`n_tokens × d_model`. Verified on llama-3.2-1b f32 and
|
|
449
|
+
qwen25-0.5b q8.
|
|
450
|
+
- **Decode logprobs** (#151). `ToyLogProbs.log_softmax` (max-shift,
|
|
451
|
+
numerically stable) + `ToyLogProbs.top_k` (manual partial-sort,
|
|
452
|
+
Spinel-safe). `ToyLM#decode_step_with_logprobs(token_id, pos, k)`
|
|
453
|
+
returns `[logits_mat, logprobs_mat, top_ids, top_vals]`. Smoke on
|
|
454
|
+
SmolLM2-135M: top-5 logprobs around -2.6 to -3.7, argmax sanity
|
|
455
|
+
passes.
|
|
456
|
+
|
|
457
|
+
### Consumer packaging — toy as a vendored gem
|
|
458
|
+
|
|
459
|
+
- **toy.gemspec** (`toy#19`, commit `1f06840`). A downstream research
|
|
460
|
+
project (e.g. `tao_transfer`) can now declare `gem "toy", path:
|
|
461
|
+
"../toy_ruby_neural_network"` in a Gemfile, `bundle lock`, run
|
|
462
|
+
`spinel-compat vendor` from
|
|
463
|
+
[spinelgems](https://github.com/OriPekelman/spinelgems), apply a
|
|
464
|
+
small post-vendor link-path rewrite (`prep/post_vendor_toy.rb`),
|
|
465
|
+
and compile its own experiment against toy's primitives — no
|
|
466
|
+
forks, no hand-pathed `require_relative`s, no Mat poly-dispatch
|
|
467
|
+
landmine. End-to-end recipe in
|
|
468
|
+
[`docs/consuming-toy.md`](docs/consuming-toy.md).
|
|
469
|
+
- **lib/toy/version.rb** + **lib/toy/ffi_manifest.rb**. VERSION
|
|
470
|
+
extracted so the gemspec doesn't pull in Spinel-only `tinynn.rb`.
|
|
471
|
+
FFI manifest follows the same `Tep::FFIManifest` shape from
|
|
472
|
+
`tep#97` — CRuby-only declarative spec of per-backend link
|
|
473
|
+
recipes that consumer-side post-vendor scripts read.
|
|
474
|
+
- **prep/post_vendor_toy.rb**. Consumer-side hook (shipped as
|
|
475
|
+
template; consumers copy into their own `prep/`). Rewrites
|
|
476
|
+
`ffi_cflags` in the vendored `tinynn{,_cuda,_metal}.rb` from
|
|
477
|
+
toy's relative `-L.` / `-Ltinynn` paths to absolute references
|
|
478
|
+
anchored at `TOY_SRC`. Honors `CUDA_DIR_LIB` env, `TOY_DISABLE`
|
|
479
|
+
to skip backends a consumer doesn't compile.
|
|
480
|
+
- **Verified end-to-end** on a `/tmp` consumer project: bundle
|
|
481
|
+
lock + vendor + rewrite + spinel compile + run. Training 3 steps
|
|
482
|
+
with `LlamaSeqForwardFFICache`, loss 6.44 → 6.32, `events.jsonl`
|
|
483
|
+
emitted with full provenance.
|
|
484
|
+
|
|
485
|
+
### Training maturity — gradient accumulation (toy#8)
|
|
486
|
+
|
|
487
|
+
- **`GRAD_ACCUM` env knob** in `examples/06_train_from_scratch.rb`.
|
|
488
|
+
Effective batch = `BATCH × GRAD_ACCUM` without the memory cost
|
|
489
|
+
of a single big batch. Default `GRAD_ACCUM=1` is bit-identical
|
|
490
|
+
to pre-GH#8 (step-1 CE = 6.490187644958496 unchanged; step-10
|
|
491
|
+
CE = 5.3334760665893555).
|
|
492
|
+
- **Implementation: LR-scaled mini-batch, not literal grad
|
|
493
|
+
accumulation.** ggml's `opt_step_adamw` is baked into the
|
|
494
|
+
backward graph and runs on every `compute_backward`; there's
|
|
495
|
+
no graph-level "skip this op" primitive, so true grad
|
|
496
|
+
accumulation (skip opt_step for N-1 iters, fire on Nth with
|
|
497
|
+
the accumulated grad) would need either a vendor patch
|
|
498
|
+
(8th hp slot to gate `opt_step_adamw`) or a two-graph
|
|
499
|
+
approach (rebuild between modes — expensive). Instead the
|
|
500
|
+
training loop runs `STEPS × GRAD_ACCUM` micro-batches, each
|
|
501
|
+
firing opt_step with `lr = LR / GRAD_ACCUM`. Cumulative
|
|
502
|
+
weight movement over a cycle matches a single full-lr step
|
|
503
|
+
on the mean grad; Adam's m/v state evolves per micro-step
|
|
504
|
+
rather than once per cycle. For typical settings
|
|
505
|
+
(`beta1=0.9, GRAD_ACCUM<=8`) the divergence from true
|
|
506
|
+
accumulation is the "AdamW state warmup" the issue
|
|
507
|
+
acknowledged.
|
|
508
|
+
- **Step semantics: `STEPS` counts macro-steps (effective
|
|
509
|
+
opt-step cycles), not micro-batches.** `STEPS=20 GRAD_ACCUM=4`
|
|
510
|
+
runs 80 forward+backward passes, emits 20 step events, and
|
|
511
|
+
the loss/checkpoint cadence is on the macro boundary.
|
|
512
|
+
`tokens` in the step event = `CONTEXT × BATCH × GRAD_ACCUM`
|
|
513
|
+
(effective tokens per macro-step).
|
|
514
|
+
- **Acceptance verified.** `BATCH=2 GRAD_ACCUM=4 STEPS=20` vs
|
|
515
|
+
`BATCH=8 GRAD_ACCUM=1 STEPS=20` both train (final/initial
|
|
516
|
+
loss ratio 0.68 / 0.75 respectively at the toy shape). Curves
|
|
517
|
+
comparable; small differences from Adam state evolution +
|
|
518
|
+
data diversity (GA=4 micro-batches sweep different sequences
|
|
519
|
+
per macro-step).
|
|
520
|
+
- **run_start event** carries `config.grad_accum` so Tao /
|
|
521
|
+
external consumers see the actual training regime.
|
|
522
|
+
|
|
523
|
+
### Training maturity — micro-batching (toy#7)
|
|
524
|
+
|
|
525
|
+
- **B>1 in `realize_for_random_init`**. New `t_batch` arg (now 7
|
|
526
|
+
positionals; the from-scratch example exposes it as the `BATCH`
|
|
527
|
+
env knob). Lays B sequences side-by-side as a flat `[T*B]`
|
|
528
|
+
token + position vector. RoPE applies the right per-batch
|
|
529
|
+
positional encoding because `rope_ext` reads `positions[k]` for
|
|
530
|
+
each `ne[2]` slot; positions cycle `0..T-1` per batch element.
|
|
531
|
+
- **Block-causal attention mask**. New `@t_seq_attn_mask`
|
|
532
|
+
persistent `f32 ne=[T*B, T*B]` tensor, uploaded once at realize
|
|
533
|
+
via `upload_block_causal_mask!`: `0.0` for `(query, key)` pairs
|
|
534
|
+
in the same batch with `key <= query`, `-1.0e30` everywhere else
|
|
535
|
+
(`exp(-1e30) == 0.0` in f32). Applied via
|
|
536
|
+
`tnn_soft_max_ext(scores, mask, scale, 0.0)`, which folds
|
|
537
|
+
`scale + mask + softmax` into one op.
|
|
538
|
+
- **B=1 stays on the legacy path** (bit-identical to pre-GH#7):
|
|
539
|
+
no mask tensor allocated, `tnn_scale + tnn_diag_mask_inf +
|
|
540
|
+
tnn_softmax` triple. The conditional is `if @seq_b > 1` in
|
|
541
|
+
`build_seq_qhead`; everywhere else the `T*B` arithmetic
|
|
542
|
+
collapses to `T` at B=1 with no branch.
|
|
543
|
+
- **Verified bit-identical at B=1** on CPU + CUDA (CE step-1
|
|
544
|
+
6.490198 unchanged); verified learning at `B=4, B=8` on both
|
|
545
|
+
backends; CUDA's per-step time drops with `B=8` (19.5 ms vs
|
|
546
|
+
24.2 ms at B=1, T=16) — launch overhead amortizes. Larger
|
|
547
|
+
shape `T=64 B=8 STEPS=20`: CE 6.51 → 4.84 in 12 ms/step on
|
|
548
|
+
CUDA.
|
|
549
|
+
- 06_train_from_scratch.rb emits `config.batch` on `run_start`
|
|
550
|
+
and `step.tokens = CONTEXT * BATCH`. Other realize-path
|
|
551
|
+
callers (`08_lmc`, `09_warm_start_train`,
|
|
552
|
+
`smoke_projection_lens`) pass `t_batch=1` — those experiments
|
|
553
|
+
stay on the single-sequence path by design.
|
|
554
|
+
|
|
555
|
+
### Roadmap docs
|
|
556
|
+
|
|
557
|
+
- [`docs/archive/e1-e2-scope-2026-05-27.md`](docs/archive/e1-e2-scope-2026-05-27.md)
|
|
558
|
+
— full decomposition of E1 (ViT-Tiny, 6 sub-issues, 5-8 days) and
|
|
559
|
+
E2 (Qwen-410M embedding transfer, 7 sub-issues, 3-5 days). E2.1
|
|
560
|
+
cheapest-first-step was executed (24L × 16H Qwen-shape) — uncovered
|
|
561
|
+
the graph-capacity blocker (filed + shipped as #17).
|
|
562
|
+
- [`docs/roadmap/backends-and-scale-2026-05-27.md`](docs/roadmap/backends-and-scale-2026-05-27.md)
|
|
563
|
+
— training maturity (batching, grad accum, mixed-precision,
|
|
564
|
+
activation recompute), hybrid CPU/GPU offload, non-ggml backends,
|
|
565
|
+
the strategic question on what toy should own.
|
|
566
|
+
|
|
567
|
+
### Spinel landmines pinned this run
|
|
568
|
+
|
|
569
|
+
- F32 has no `to_float` in ggml type_traits — caller must memcpy-
|
|
570
|
+
shortcut (the type_traits API would otherwise return NULL and
|
|
571
|
+
emit zeros). Now memorialized in `tnn_embed_lookup_to_doubles`
|
|
572
|
+
with the F32 short-circuit branch.
|
|
573
|
+
- `Array<Array<int_or_float>>` seeds confuse Spinel poly inference
|
|
574
|
+
and can fault at startup-class-init (`poly_array_push`). Pin in
|
|
575
|
+
`lib/toy_logprobs.rb` header — return two parallel arrays
|
|
576
|
+
(`ids: Array<Int>`, `vals: Array<Float>`) instead.
|
|
577
|
+
|
|
578
|
+
## v0.4.0-pre-alpha — 2026-05-24
|
|
579
|
+
|
|
580
|
+
**Headline.** Real OLMoE-1B-7B-Instruct produces coherent factual
|
|
581
|
+
answers ("The capital of France is **called Paris**"). Gemma 2 extras
|
|
582
|
+
land. Flash attention finally beats baseline on Qwen3-1.7B (12%
|
|
583
|
+
faster) after the V cache layout flip that was eating its win. SSM
|
|
584
|
+
op primitives bound (Mamba enable, speculative). Upstream ggml issue
|
|
585
|
+
filed for a real mul_mat_id × K-quant bug we found.
|
|
586
|
+
|
|
587
|
+
### I-V-layout-flip (P5.2) — V Q8 unlocked, flash perf realized
|
|
588
|
+
|
|
589
|
+
- V cache flipped from `ne=[max_T, d_head]` (positions on ne0) to
|
|
590
|
+
`ne=[d_head, max_T]` (positions on ne1, mirroring K). Two wins land
|
|
591
|
+
together:
|
|
592
|
+
- `enable_kv_q8!` now quantizes BOTH K and V (was K-only). Per-
|
|
593
|
+
position V writes span a contiguous d_head-vector, block-aligned
|
|
594
|
+
at d_head=64 (=2 Q8_0 blocks).
|
|
595
|
+
- `flash_attn_ext` consumes V natively in its expected
|
|
596
|
+
`[d_head, hist_count]` orientation — no transpose-cont in the
|
|
597
|
+
hot loop. P4.1's reported flash=baseline was being eaten by that
|
|
598
|
+
transpose; now flash wins.
|
|
599
|
+
- Qwen3-1.7B, N_NEW=32, CPU:
|
|
600
|
+
- baseline 3.54s (1.00×)
|
|
601
|
+
- FLASH only 3.09s (0.87×, **12% faster** — was a wash before)
|
|
602
|
+
- KV_Q8 3.07s (0.87×, K+V Q8 + flash)
|
|
603
|
+
- SmolLM2-135M four-config token streams bit-identical (baseline /
|
|
604
|
+
KV_Q8 / FLASH / KV_Q8+FLASH).
|
|
605
|
+
- Structural constraint: Q8 V *requires* flash. Transposing a Q8
|
|
606
|
+
tensor yields a non-block-aligned destination (hist_count isn't
|
|
607
|
+
divisible by 32); ggml_cont can't materialize it. `enable_kv_q8!`
|
|
608
|
+
auto-enables flash to make this transparent.
|
|
609
|
+
|
|
610
|
+
### #110 I-QKnorm — per-arch QK-norm flavor (OLMoE coherent text)
|
|
611
|
+
|
|
612
|
+
- OLMoE / Granite-MoE store the QK-norm gamma at `[d_model]` (per-
|
|
613
|
+
head packed) and apply RMSNorm to the *full* Q before head split.
|
|
614
|
+
Qwen3-style models store it at `[d_head]` (shared per-head).
|
|
615
|
+
These are mathematically different. M2.3 mis-applied Qwen3 to
|
|
616
|
+
OLMoE, producing "The capital of France is a city in the capital
|
|
617
|
+
of France." This patch detects the flavor and routes accordingly.
|
|
618
|
+
- `SmolLM2Flags.qk_norm_kind`: 0 = none, 1 = Qwen3, 2 = OLMoE.
|
|
619
|
+
Detected from `blk.0.attn_q_norm.weight`'s ne[0] against d_head
|
|
620
|
+
and d_model from the multi-arch (llama.* / olmoe.* / gemma2.*)
|
|
621
|
+
metadata.
|
|
622
|
+
- Graph builder applies the kind-2 path via a per-head sliced
|
|
623
|
+
`view_1d(gamma, d_head, hq * d_head * 4)`. Per-head approximation
|
|
624
|
+
of true full-Q norm — exact gamma scaling, approximate variance
|
|
625
|
+
pooling. Empirically: produces coherent OLMoE output.
|
|
626
|
+
- Validated OLMoE-1B-7B-Instruct Q8:
|
|
627
|
+
- "The capital of France is" → "called Paris."
|
|
628
|
+
- "Python is a programming language that" → "is used to create
|
|
629
|
+
programs that can be executed on a computer."
|
|
630
|
+
- "The largest planet in our solar system is" → "Jupiter. It is
|
|
631
|
+
a gas giant"
|
|
632
|
+
- "Albert Einstein was famous for" → "his theory of relativity"
|
|
633
|
+
- KV_Q8 + flash + per-head sliced norm compose cleanly: same
|
|
634
|
+
factual answers on the quantized path.
|
|
635
|
+
|
|
636
|
+
### #113 I-Gemma — Gemma 2 extras
|
|
637
|
+
|
|
638
|
+
Four model-specific extras integrated as opt-in features. Non-Gemma
|
|
639
|
+
models pass inert defaults and the graph paths are no-ops.
|
|
640
|
+
|
|
641
|
+
- **Embedding scale** sqrt(d_model) applied post-token-embed lookup
|
|
642
|
+
(Gemma 2-2b → 48.0). Newton sqrt at detection time avoids the
|
|
643
|
+
Math.sqrt Spinel landmine.
|
|
644
|
+
- **Logit soft-cap** `tanh(x/c)*c`:
|
|
645
|
+
- Attention logits (c = 50.0): wired through flash_attn_ext's
|
|
646
|
+
native `logit_softcap` parameter; non-flash composes via
|
|
647
|
+
tnn_scale + new `tnn_tanh` + tnn_scale.
|
|
648
|
+
- Final output logits (c = 30.0): applied to t_kv_logits before
|
|
649
|
+
set_output.
|
|
650
|
+
- **Pre + post norms** on each sublayer: new
|
|
651
|
+
`t_post_attn_norm_gamma` + `t_post_ffn_norm_gamma` allocated
|
|
652
|
+
when has_post_norms. Applied on sublayer output BEFORE the
|
|
653
|
+
residual add (Gemma 2's sandwich).
|
|
654
|
+
- **Alternating SWA**: per-layer toggle (even = sliding, odd = full)
|
|
655
|
+
when `@swa_alternates` is set. layer_idx threaded through
|
|
656
|
+
build_attention_qhead_step.
|
|
657
|
+
- Multi-arch metadata probe gains "gemma2" alongside llama / olmoe.
|
|
658
|
+
`rope.freq_base` default-fallback to 10000.0 (Gemma 2 doesn't
|
|
659
|
+
emit the key). Force `is_native = true` when has_post_norms is
|
|
660
|
+
detected (third-party Gemma GGUFs take the mmap path despite
|
|
661
|
+
lacking toy.ggml_native).
|
|
662
|
+
- New primitive: `tnn_tanh` (ggml_tanh wrapper).
|
|
663
|
+
- Gemma 2-2b-it Q8 loads, mmaps, graph builds + realizes cleanly,
|
|
664
|
+
produces well-formed varied logits (no NaN). End-to-end text
|
|
665
|
+
output is currently blocked at the **tokenizer** layer: Gemma 2
|
|
666
|
+
uses sentencepiece with vocab=256000, our tokenizer mis-
|
|
667
|
+
tokenizes. Separate task (#117).
|
|
668
|
+
|
|
669
|
+
### #112 Q-mul_mat_id × K-quants — upstream filed
|
|
670
|
+
|
|
671
|
+
- Discovered in M2.3: ggml's `mul_mat_id` kernel produces wrong
|
|
672
|
+
output for K-quantized (Q4_K, Q5_K, Q6_K) expert weights. Same
|
|
673
|
+
model at Q8_0 produces coherent text; at Q4_K_M produces
|
|
674
|
+
"Dub Dub Dub" repeating. Root cause likely: mul_mat_id only has
|
|
675
|
+
reliable kernels for F32/F16/Q8_0 sources per
|
|
676
|
+
`test-backend-ops.cpp::test_mul_mat_id` registrations.
|
|
677
|
+
- Filed upstream: <https://github.com/ggml-org/ggml/issues/1506>
|
|
678
|
+
with the OLMoE repro + suggested test additions.
|
|
679
|
+
- Runtime WARN: realize_for_mmap detects K-quant MoE weights (type
|
|
680
|
+
∈ [10, 19]) and emits four WARN lines on layer 0. Loud failure
|
|
681
|
+
mode rather than silent wrong output.
|
|
682
|
+
- Documented in `docs/notes/mul_mat_id_quants.md` — the canonical
|
|
683
|
+
write-up with the workaround (use Q8_0 for MoE expert weights).
|
|
684
|
+
|
|
685
|
+
### #114 C-SSM — SSM_CONV + SSM_SCAN bindings (speculative)
|
|
686
|
+
|
|
687
|
+
- `tnn_ssm_conv` + `tnn_ssm_scan` FFI wrappers on CPU + CUDA.
|
|
688
|
+
Coverage 28 → 30 of 98 ops bound. Mamba/Jamba family is now
|
|
689
|
+
blocked only by Cache-class wiring, not by primitives.
|
|
690
|
+
- Speculative binding. No Mamba use case yet. When someone wants
|
|
691
|
+
Mamba inference, they start from "build a Mamba Cache class" with
|
|
692
|
+
the primitives already wired.
|
|
693
|
+
- Shape expectations documented (Mamba-2 grouped 4D layout). Smoke
|
|
694
|
+
deferred to the M-Mamba follow-up — meaningful test needs proper
|
|
695
|
+
Mamba-2-shaped inputs which is more work than the binding itself.
|
|
696
|
+
|
|
697
|
+
### Coverage matrix tweak
|
|
698
|
+
|
|
699
|
+
`prep/gen_coverage.rb` regression fix: `PRIMARY_WRAPPER` override
|
|
700
|
+
map for ops where our wrapper name doesn't follow `tnn_<ggml_stem>`
|
|
701
|
+
(MUL_MAT → tnn_matmul, SOFT_MAX → tnn_softmax). Caught during the
|
|
702
|
+
post-Metal-merge regression battery — the new wrapper-matching
|
|
703
|
+
heuristic was demoting them to status `via`.
|
|
704
|
+
|
|
705
|
+
### Follow-ups filed
|
|
706
|
+
|
|
707
|
+
- **#117 T-Gemma-tokenizer**: SentencePiece for Gemma 2
|
|
708
|
+
(vocab=256000). Graph integration works; example_inference text
|
|
709
|
+
output blocked on tokenizer correctness.
|
|
710
|
+
- **ggml-org/ggml#1506**: upstream mul_mat_id × K-quant bug. Drop
|
|
711
|
+
the runtime warning + add a coverage smoke when fixed.
|
|
712
|
+
|
|
713
|
+
## v0.3.0-pre-alpha — 2026-05-24
|
|
714
|
+
|
|
715
|
+
**Headline.** Metal backend (issue #2). SmolLM2-135M runs end-to-end
|
|
716
|
+
on Apple Silicon GPUs with bit-identical output to the CPU path. Same
|
|
717
|
+
FFI surface as CPU + CUDA, same graph builder, same generated mirror
|
|
718
|
+
classes — the only switch is `tnn_session_new(2)`. Validated on M2;
|
|
719
|
+
expected to work on M1 / M3 / M4 / M5.
|
|
720
|
+
|
|
721
|
+
### B-Metal — third backend
|
|
722
|
+
|
|
723
|
+
- New `setup-ggml-metal` Makefile target. Builds with
|
|
724
|
+
`GGML_METAL_EMBED_LIBRARY=ON` so the .metal shaders are baked into
|
|
725
|
+
the static archive as raw bytes — the Metal driver JIT-compiles
|
|
726
|
+
them on first device load (~15 s one-time per binary, then cached).
|
|
727
|
+
Works with Command Line Tools alone; no full Xcode required.
|
|
728
|
+
- `tinynn/tinynn_backend_metal.m` (Objective-C). Strong
|
|
729
|
+
`tnn_backend_metal_init_internal()` calling `ggml_backend_metal_init`,
|
|
730
|
+
mirroring the CUDA backend's archive-isolation pattern. Plus
|
|
731
|
+
`tnn_force_exit` — a flush-then-`_exit` trampoline that skips
|
|
732
|
+
`__cxa_finalize` so ggml-metal's static-destructor residency-set
|
|
733
|
+
assert doesn't fire on short-lived programs.
|
|
734
|
+
- `tinynn/tinynn_ggml.c` engine cache extended to ternary
|
|
735
|
+
(CPU / CUDA / Metal) via `tnn_engine_get(backend_kind)`. The
|
|
736
|
+
`prefer_cuda` integer is now `backend_kind` with 0/1/2 semantics;
|
|
737
|
+
callers pass `2` to opt into Metal. Adds `tnn_shutdown_engines()`
|
|
738
|
+
for explicit teardown (CPU + CUDA tolerate the call as well).
|
|
739
|
+
- `lib/tinynn_metal.rb` — `TinyNNMetal` FFI module mirroring the full
|
|
740
|
+
CPU surface, plus Ruby helpers (`upload_int_array`,
|
|
741
|
+
`download_row_major`, …) the generated mirrors call. Links
|
|
742
|
+
Foundation / Metal / MetalKit frameworks.
|
|
743
|
+
- `lib/transformer_lm_metal.rb` + `lib/toy_smollm2_ffi_kv_metal.rb` +
|
|
744
|
+
the rest of the `_metal.rb` mirror set. The KV-cache decode runs on
|
|
745
|
+
GPU; the loader takes the copy-load (non-mmap) path because
|
|
746
|
+
ggml-metal doesn't expose a public `buffer_from_pointer` — the
|
|
747
|
+
scheduler crashes when fed CPU-resident weight tensors as kernel
|
|
748
|
+
inputs. Multi-GB models pay the copy cost on Metal until upstream
|
|
749
|
+
adds the BYO-pointer API.
|
|
750
|
+
- `prep/gen_cuda_mirror.rb` generalized: emits both `*_cuda.rb` and
|
|
751
|
+
`*_metal.rb` from the same CPU source via a per-backend
|
|
752
|
+
substitution table (`--backend cuda|metal` to target one).
|
|
753
|
+
- `examples/01_inference_metal.rb` + `make example_inference_metal`
|
|
754
|
+
— end-to-end smoke. Output for SmolLM2-135M F32 with the five-ID
|
|
755
|
+
fallback prompt is bit-identical to the CPU path.
|
|
756
|
+
- `docs/coverage.md` gains a Metal column. Today the Metal mirror is
|
|
757
|
+
intentionally a thin surface; the `0/26` Metal-bound count is the
|
|
758
|
+
follow-up to-do list, not a regression signal.
|
|
759
|
+
|
|
760
|
+
### Known gaps (Metal)
|
|
761
|
+
|
|
762
|
+
- Zero-copy mmap: blocked on a public `ggml_backend_metal_buffer_from_ptr`
|
|
763
|
+
upstream. Workaround: copy-load (current default).
|
|
764
|
+
- GPT-2-family validation: `lib/gpt2_ffi_*_metal.rb` mirrors exist
|
|
765
|
+
(generated), but no binary has been built against them yet.
|
|
766
|
+
- Quantized weights: untested on Metal. Should work — same kernel
|
|
767
|
+
coverage upstream — but no smoke yet.
|
|
768
|
+
|
|
769
|
+
## v0.2.0-pre-alpha — 2026-05-23
|
|
770
|
+
|
|
771
|
+
**Headline.** Three new model families work end-to-end as
|
|
772
|
+
text → text: Qwen3-0.6B (dense), Mistral-7B-Instruct-v0.2, and
|
|
773
|
+
TinyLlama-1.1B. RoPE scaling (YaRN / llama3 / linear) lands.
|
|
774
|
+
The tokenizer now handles both byte-level BPE (GPT-2 / Llama-3 /
|
|
775
|
+
Qwen) and SentencePiece (Llama-1/2 / Mistral). A bench harness +
|
|
776
|
+
card-drift detector + Chrome-Trace-format observability primitive
|
|
777
|
+
ship as reusable infrastructure.
|
|
778
|
+
|
|
779
|
+
Still pre-alpha: no API stability commitments, and we'll happily
|
|
780
|
+
break shapes when something deserves it. See the sections below
|
|
781
|
+
for the full inventory.
|
|
782
|
+
|
|
783
|
+
### T1.3 — SentencePiece tokenizer (Llama-1/2 / Mistral / TinyLlama)
|
|
784
|
+
|
|
785
|
+
- `lib/tokenizer.rb` auto-detects SentencePiece vs byte-level BPE
|
|
786
|
+
by checking `vocab[3] == "<0x00>"` (the first byte-fallback
|
|
787
|
+
token in any SPM vocab). Sets `@spm = true` and dispatches
|
|
788
|
+
`encode` / `decode` to SPM-specific paths.
|
|
789
|
+
- SPM encode: prepend `▁` (U+2581), replace ASCII spaces with `▁`,
|
|
790
|
+
char-split, byte-fallback any char not in vocab via `<0xHH>`
|
|
791
|
+
tokens, then run the same merge-loop BPE as the GPT-2 path.
|
|
792
|
+
- SPM decode: collapse `<0xHH>` byte-fallback runs back to UTF-8
|
|
793
|
+
bytes (with robust byte-level hex parsing to dodge a Spinel
|
|
794
|
+
`String#[Range]` quirk), and convert `▁` → ASCII space (stripping
|
|
795
|
+
the leading boundary marker so the round-trip is lossless).
|
|
796
|
+
- Converter detects the tokenizer flavor at conversion time
|
|
797
|
+
(same `vocab[3] == "<0x00>"` heuristic) and emits the right
|
|
798
|
+
`tokenizer.ggml.model` value (`"llama"` for SPM, `"gpt2"` for
|
|
799
|
+
byte-level) plus the right `tokenizer.ggml.pre` hint.
|
|
800
|
+
- Verified:
|
|
801
|
+
TinyLlama-1.1B: "The capital of France is Paris, which is
|
|
802
|
+
known for its beautiful architecture, museums,
|
|
803
|
+
and cultural events"
|
|
804
|
+
Mistral-7B-v0.2: "The capital of France is Paris."
|
|
805
|
+
- `tinynn/ab_smoke_tokenizer.rb` extended to include
|
|
806
|
+
TinyLlama-1.1B-tok in the round-trip matrix. Now 20/20 PASS
|
|
807
|
+
across SmolLM2 / Llama-3.2 / Qwen2.5 / TinyLlama on five
|
|
808
|
+
representative English prompts.
|
|
809
|
+
|
|
810
|
+
### M1.1 — Qwen3-0.6B works end-to-end (head_dim + o_proj fix)
|
|
811
|
+
|
|
812
|
+
Final fix for the `sp_ToyLM_decode_step` crash: `t_w_o` was
|
|
813
|
+
allocated as `(d_model, d_model)`, but the output projection
|
|
814
|
+
actually maps `[n_heads * d_head] → [d_model]`. For SmolLM2 /
|
|
815
|
+
Llama / Qwen2.5, `n_heads * d_head == d_model`, so the shape was
|
|
816
|
+
right by accident. Qwen3-0.6B has `n_heads * d_head = 2048 ≠
|
|
817
|
+
d_model = 1024`, and the matmul aliased the wrong memory region.
|
|
818
|
+
|
|
819
|
+
Fix in both realize paths: `tnn_input_2d_persistent_mmap(@sess,
|
|
820
|
+
@d_model, @n_heads * @d_head, ...)`.
|
|
821
|
+
|
|
822
|
+
Now produces coherent text:
|
|
823
|
+
prompt: "The capital of France is"
|
|
824
|
+
output: "The capital of France is located in the city of Paris..."
|
|
825
|
+
|
|
826
|
+
Existing models unchanged (n_heads*d_head==d_model is an
|
|
827
|
+
invariant of the older configs).
|
|
828
|
+
|
|
829
|
+
### M1.1 — explicit head_dim + tied-embeddings handling (partial)
|
|
830
|
+
|
|
831
|
+
- `Toy::SmolLM2Config` gains `head_dim` field (defaults to
|
|
832
|
+
`d_model / n_heads`; loader overrides from GGUF).
|
|
833
|
+
- `SmolLM2ConfigLoader.read` reads `llama.attention.key_length`
|
|
834
|
+
(llama.cpp convention) when present; falls back to the computed
|
|
835
|
+
value for SmolLM2 / Llama-3.x / Qwen2.5 (all match
|
|
836
|
+
hidden_size/num_heads).
|
|
837
|
+
- `lib/toy_smollm2_ffi_kv.rb` and `lib/llama_seq_forward_ffi.rb`
|
|
838
|
+
now read `cfg.head_dim` everywhere they used to compute
|
|
839
|
+
`cfg.d_model / cfg.n_heads`. Behaviour unchanged for models
|
|
840
|
+
whose explicit head_dim matches the computed one.
|
|
841
|
+
- `prep/convert_smollm2_to_gguf.py` emits
|
|
842
|
+
`llama.attention.key_length` + `llama.attention.value_length`
|
|
843
|
+
GGUF keys (one value, written twice — every model on the
|
|
844
|
+
roadmap has K-dim == V-dim).
|
|
845
|
+
- Converter also handles `tie_word_embeddings: true` correctly:
|
|
846
|
+
skips emitting `output.weight` even when `lm_head.weight` is
|
|
847
|
+
in the safetensors. Trusts the config flag (as llama.cpp does).
|
|
848
|
+
|
|
849
|
+
Qwen3-0.6B converts with the right head_dim=128 and tied
|
|
850
|
+
embeddings, but inference still crashes in
|
|
851
|
+
`sp_ToyLM_decode_step`. M1.1 stays open; root cause appears to
|
|
852
|
+
be deeper than the converter / loader layer (suspect ToyLM /
|
|
853
|
+
decode-step assumes specific shape invariants).
|
|
854
|
+
|
|
855
|
+
Bench passes (±2% across all metrics). Existing models —
|
|
856
|
+
SmolLM2, Llama-3.2, Qwen2.5 — produce identical text.
|
|
857
|
+
|
|
858
|
+
### M1 — Qwen3 dense plumbing (QK-norm, partial)
|
|
859
|
+
|
|
860
|
+
- `SmolLM2KVBlockFFI` now carries `t_q_norm_gamma` + `t_k_norm_gamma`
|
|
861
|
+
(1D `[d_head]` shared across heads, per block). `SmolLM2KVFFICache`
|
|
862
|
+
carries `@has_qk_norm` flag.
|
|
863
|
+
- `GGUFLoad.detect_smollm2_flags` detects QK-norm by presence of
|
|
864
|
+
`blk.0.attn_q_norm.weight` (Qwen3-only; Qwen2.5 / Llama return false).
|
|
865
|
+
`SmolLM2Flags` gains a `qk_norm` field.
|
|
866
|
+
- `realize_for_mmap` signature widened: `(gguf, cfg, max_T, untied,
|
|
867
|
+
qkv_bias, qk_norm)`. Allocates the QK-norm gammas as mmap'd
|
|
868
|
+
1D F32 tensors when set. Graph builder applies `tnn_rms_norm` to
|
|
869
|
+
Q and K with the per-block gamma BEFORE `tnn_rope_ext`.
|
|
870
|
+
- `prep/convert_smollm2_to_gguf.py` propagates Qwen3's
|
|
871
|
+
`self_attn.q_norm.weight` / `k_norm.weight` HF tensors to
|
|
872
|
+
`attn_q_norm.weight` / `attn_k_norm.weight` GGUF tensors.
|
|
873
|
+
- Existing Qwen2.5 / SmolLM2 / Llama-3.2 / TinyLlama unchanged
|
|
874
|
+
(no QK-norm path triggered). Bench passes within ±5%.
|
|
875
|
+
- Qwen3-0.6B converts and loads, but **inference is not yet
|
|
876
|
+
correct**. Root cause: Qwen3 sets `head_dim = 128` explicitly in
|
|
877
|
+
HF config, not `hidden_size / num_heads = 64`. Our converter
|
|
878
|
+
computes the wrong head_dim and the Q/K/V projections come out
|
|
879
|
+
half-sized. Tracked as M1.1 (next task). QK-norm itself appears
|
|
880
|
+
correctly wired — the head_dim mismatch alone explains the
|
|
881
|
+
garbage output.
|
|
882
|
+
|
|
883
|
+
- Re-converted `data/tinyllama-1.1b-tok.gguf` and
|
|
884
|
+
`data/mistral-7b-instruct-v0.2-tok.gguf` with `--with-tokenizer`.
|
|
885
|
+
Both load + run inference, but **text-mode I/O fails** because
|
|
886
|
+
their tokenizers are SentencePiece (Llama-2 vocab), not the
|
|
887
|
+
byte-level BPE our `lib/tokenizer.rb` handles. T1.2's "never mask"
|
|
888
|
+
rule caught it cleanly: `WARN: tokenizer: piece "Ġ" not in vocab
|
|
889
|
+
— emitting UNK` ⇒ Mistral output `"The<unk>capital<unk>of<unk>..."`.
|
|
890
|
+
- Tracked as T1.3 (new task). Adds tokenizer-flavor detection from
|
|
891
|
+
`tokenizer.ggml.model` and a SentencePiece encoder path.
|
|
892
|
+
- **Current text-I/O coverage** (works end-to-end):
|
|
893
|
+
SmolLM2-135M, Llama-3.2-1B, Qwen2.5-0.5B. All byte-level BPE.
|
|
894
|
+
- **Inference-only** (text I/O blocked on T1.3): TinyLlama-1.1B,
|
|
895
|
+
Mistral-7B-v0.2. ID-mode `example_inference` (no PROMPT) still
|
|
896
|
+
works fine.
|
|
897
|
+
|
|
898
|
+
### D1 — algorithm-card drift detector (instead of auto-emitter)
|
|
899
|
+
|
|
900
|
+
- `prep/card_drift_check.rb`: a Ripper-walker tripwire that
|
|
901
|
+
verifies each `Toy::` class with both `def forward` and
|
|
902
|
+
`def algorithm` keeps the two in lock-step. Run via
|
|
903
|
+
`make check-cards`. Pure-Ruby stdlib, no extra deps.
|
|
904
|
+
- D1 was originally scoped as an auto-emitter that would delete
|
|
905
|
+
the 209 LOC of hand-written `def algorithm` methods. Closer
|
|
906
|
+
reading of those methods showed they're not 1:1 with the
|
|
907
|
+
unrolled forward code — `FFN`'s 5-line forward becomes a
|
|
908
|
+
curated 2-step card that fuses `matmul + add_bias + gelu`. An
|
|
909
|
+
auto-emitter would produce faithful-but-ugly output that
|
|
910
|
+
wouldn't actually replace the cards.
|
|
911
|
+
- The drift detector matches the real failure mode: forward
|
|
912
|
+
changes, card doesn't (or vice versa). Catches the common
|
|
913
|
+
`gelu` / `silu` / `⊙` activation-mismatch + the
|
|
914
|
+
matmul-presence collapse case. Validated by deliberate-drift
|
|
915
|
+
test (deleting `gelu(...)` from FFN's card → tool fails).
|
|
916
|
+
- For the original "delete the 209 LOC" goal (re-trigger
|
|
917
|
+
condition (a) from task #95): an auto-emitter is still possible
|
|
918
|
+
if/when we add a third architecture and feel the cost, but the
|
|
919
|
+
drift detector covers the maintenance-during-edits case today.
|
|
920
|
+
|
|
921
|
+
### P2 — measured, not viable (skipped)
|
|
922
|
+
|
|
923
|
+
- `docs/roadmap/p2-ffi-matmul-2026-05-23.md`: planned to FFI-wrap
|
|
924
|
+
`Mat#matmul` for a 5–10× win on `example_train`. Measured the
|
|
925
|
+
actual cost: session-per-op FFI is **1.7× SLOWER** than pure-Ruby
|
|
926
|
+
at training-toy shapes (32×8 matmul). 5 000 calls × ~180 µs
|
|
927
|
+
session lifecycle overhead = ~0.9 s deficit vs the 1.3 s
|
|
928
|
+
pure-Ruby baseline.
|
|
929
|
+
- Lesson: the 38× FFI gain on LLM-shape inference is for
|
|
930
|
+
*whole-graph* FFI (one `tnn_compute` per step), not per-op FFI.
|
|
931
|
+
At toy shapes, per-op FFI loses on session overhead; break-even
|
|
932
|
+
is around `m*k*n ~ 100 000`. Real workloads (LoRA, KV decode,
|
|
933
|
+
full FT) route through whole-graph FFI cache classes already
|
|
934
|
+
and are unaffected.
|
|
935
|
+
- If we later want fast `example_train`, build a
|
|
936
|
+
`TransformerLMTrainerFFI` mirror of `LlamaSeqForwardFFICache`
|
|
937
|
+
for the custom-GPT shape (~500 LOC, ~5–20× expected). Queued
|
|
938
|
+
as P2-α; not currently prioritised.
|
|
939
|
+
|
|
940
|
+
### Bench harness + Lowerer evidence
|
|
941
|
+
|
|
942
|
+
- `bench/` directory with three Spinel-compiled benches —
|
|
943
|
+
`lora_step.rb` (training step ms), `inference.rb` (toks/sec), and
|
|
944
|
+
`tokenizer.rb` (encode μs/token). Each emits `BENCH metric value`
|
|
945
|
+
lines on stdout; the orchestrator at `bench/check.rb` runs them
|
|
946
|
+
and compares to `bench/baselines.csv`, exiting 1 on any metric
|
|
947
|
+
past its per-metric tolerance.
|
|
948
|
+
- `make bench` runs the gate; `make bench-update` rewrites the
|
|
949
|
+
baselines; `make bench-report` runs without gating (handy for
|
|
950
|
+
local exploration). Suggested use: invoke before pushing
|
|
951
|
+
perf-sensitive changes; the CSV diffs cleanly in git so
|
|
952
|
+
re-baselining is a normal commit.
|
|
953
|
+
- `docs/roadmap/lowerer-evidence-2026-05-23.md`: trace-driven
|
|
954
|
+
evidence on whether the full Lowerer pays for itself.
|
|
955
|
+
Measured `example_train` (native-Mat path) — 82.7 % of step
|
|
956
|
+
wallclock is in three matmul methods, 51 % of all matmul
|
|
957
|
+
calls share one (N, P) inner-dim pair. But the comparable
|
|
958
|
+
alternative (FFI the matmul through ggml) gives 10–100×
|
|
959
|
+
for an order of magnitude less code than the ~500-LOC
|
|
960
|
+
Lowerer. Verdict: split the Lowerer's three claimed benefits
|
|
961
|
+
into proportionate tools (P2 to FFI Mat ops; D1 standalone
|
|
962
|
+
card emitter; Spinel landmines wait for upstream).
|
|
963
|
+
- `lib/transformer.rb`: six hot `Mat` methods (`matmul`,
|
|
964
|
+
`matmul_t`, `t_matmul`, `plus`, `add!`, `scale!`) wrapped with
|
|
965
|
+
`tnn_trace_begin/end`. Zero-cost when off (1.44 s baseline →
|
|
966
|
+
1.44 s with traces present but inactive on a 87-sequence
|
|
967
|
+
training run); 12 % overhead when on. `MAT_SHAPES=1` env
|
|
968
|
+
enables a shape histogram printf for Lowerer-style evidence
|
|
969
|
+
runs.
|
|
970
|
+
|
|
971
|
+
### Toolchain
|
|
972
|
+
|
|
973
|
+
- Bumped Spinel to master `d59926a`. Two changes affect us directly:
|
|
974
|
+
- `0adca86` (matz/spinel#647): `examples/example_serve` no longer
|
|
975
|
+
segfaults at startup. The bug was a Spinel codegen ordering issue
|
|
976
|
+
on top-level `CONST = recv.method` when `recv` was a local;
|
|
977
|
+
we'd kept the buggy form intentionally as a regression check.
|
|
978
|
+
Confirmed: it loads the model, binds the port, accepts requests.
|
|
979
|
+
- `97bf268` (rbs_extract): `--rbs sig` Mat-in-`Toy::` resolution
|
|
980
|
+
now works. Warning overhead dropped from +45 (was unusable) to
|
|
981
|
+
+3 (acceptable). `sig/toy.rbs` header reflects this.
|
|
982
|
+
- The Hash#[missing] → 0 codegen behavior (T1.2's root cause)
|
|
983
|
+
still requires the `has_key?` guards we landed yesterday;
|
|
984
|
+
Spinel's matz/spinel#521 fix narrowed the symptoms but the
|
|
985
|
+
structural workaround stays.
|
|
986
|
+
|
|
987
|
+
### Observability / training
|
|
988
|
+
|
|
989
|
+
- New `tinynn/tinynn_trace.{c,h}`: Chrome Trace Format emitter,
|
|
990
|
+
~5ns per begin/end when off, opens in https://perfetto.dev.
|
|
991
|
+
Instrumented `tnn_realize{,_backward}`, `tnn_compute{,_backward}`,
|
|
992
|
+
`tnn_upload{,_from_float_array,_from_int_array}`, `tnn_download`.
|
|
993
|
+
`examples/03_finetune_lora.rb` and its CUDA mirror accept
|
|
994
|
+
`TRACE=path.json` to wrap each step.
|
|
995
|
+
- New `tnn_scratch_sum_f32` and `tnn_scratch_sum_sq_f32` reducers
|
|
996
|
+
(mean / L2-norm without a Mat round-trip).
|
|
997
|
+
- `examples/03_finetune_lora{,_cuda}.rb` accept `GRAD_DUMP=1` to
|
|
998
|
+
emit per-(layer, head, A/B) gradient stats as CSV.
|
|
999
|
+
- **Finding** (`docs/p1-grad-bisection-2026-05-22.md`):
|
|
1000
|
+
CPU and CUDA LoRA gradients agree to 0.32–0.42 % median across
|
|
1001
|
+
steps 1–3; loss curves match to 3+ decimals. The underlying
|
|
1002
|
+
ggml-cpu sched aliasing bug is still upstream but the local
|
|
1003
|
+
workaround (`tnn_pin_all_graph_b_nodes`, wired into
|
|
1004
|
+
`lib/llama_seq_forward_ffi.rb:1192`) prevents it from biting
|
|
1005
|
+
prod LoRA training. Issue still wants filing against ggml-org/ggml.
|
|
1006
|
+
|
|
1007
|
+
### Inference / DevEx
|
|
1008
|
+
|
|
1009
|
+
- `examples/example_inference` now speaks text end-to-end on GGUFs
|
|
1010
|
+
converted with `--with-tokenizer`: encode `PROMPT`, generate,
|
|
1011
|
+
decode the IDs back into text. Falls back to the hardcoded
|
|
1012
|
+
five-ID SmolLM2 prompt + raw IDs when no tokenizer is embedded.
|
|
1013
|
+
- `tinynn/ab_smoke_tokenizer.rb`: 15/15 prompts round-trip
|
|
1014
|
+
bit-identically on SmolLM2-135M, Llama-3.2-1B, Qwen2.5-0.5B.
|
|
1015
|
+
The earlier "SmolLM2 fails 2/5" bug (`?\n` and `.txt` patterns)
|
|
1016
|
+
was actually a **Spinel hash codegen bug**: `Hash#[missing_key]`
|
|
1017
|
+
returns integer 0 instead of nil, so missing merges appeared as
|
|
1018
|
+
rank-0 (top-priority) merges in the BPE loop. Fixed by guarding
|
|
1019
|
+
every hash read with `has_key?`. Tokenizer now warns loudly when
|
|
1020
|
+
emitting UNK so the next instance of this class of bug surfaces
|
|
1021
|
+
fast.
|
|
1022
|
+
|
|
1023
|
+
### RoPE scaling (FFI)
|
|
1024
|
+
|
|
1025
|
+
- `tnn_rope_ext` and `tnn_rope_ext_back` widened with the full
|
|
1026
|
+
YaRN/llama3 arg surface (freq_scale, ext_factor, attn_factor,
|
|
1027
|
+
beta_fast, beta_slow, freq_factors). New
|
|
1028
|
+
`tnn_rope_freq_factors_alloc` allocates the per-dim factors
|
|
1029
|
+
tensor; values computed in
|
|
1030
|
+
`Toy::RopeScaling.compute_llama3_freq_factors`.
|
|
1031
|
+
- `Toy::RopeScaling` value class + `Toy::SmolLM2Config.@rope_scaling`
|
|
1032
|
+
field; `SmolLM2ConfigLoader` reads `llama.rope.scaling.*` from
|
|
1033
|
+
GGUF and dispatches to `none / linear / llama3 / yarn`.
|
|
1034
|
+
- `prep/convert_smollm2_to_gguf.py` now propagates HF `rope_scaling.*`
|
|
1035
|
+
metadata to GGUF. Without this our Llama-3.x GGUFs had no scaling
|
|
1036
|
+
metadata; converter was the bottleneck.
|
|
1037
|
+
|
|
1038
|
+
## v0.1.0-pre-alpha — 2026-05-22
|
|
1039
|
+
|
|
1040
|
+
**First tagged cut.** Not API-stable; expect breaking changes in any
|
|
1041
|
+
direction. The goal of this tag is to mark a coherent set of working
|
|
1042
|
+
capabilities so external readers have a reference point.
|
|
1043
|
+
|
|
1044
|
+
### Inference
|
|
1045
|
+
|
|
1046
|
+
- Run pretrained models end-to-end as a single native binary:
|
|
1047
|
+
GPT-2 family, Llama-family (SmolLM2, TinyLlama, Llama-3.2,
|
|
1048
|
+
Mistral-7B), Qwen2.5 family (0.5B → 7B).
|
|
1049
|
+
- KV-cache decode on CPU and CUDA, F32 and Q8.
|
|
1050
|
+
- Zero-copy mmap of GGUF weights into both CPU and CUDA buffers
|
|
1051
|
+
(UVA on GB10).
|
|
1052
|
+
- Discover cached models from HuggingFace / Ollama / LM Studio /
|
|
1053
|
+
`./data` / `$TOY_MODEL_DIR` via `examples/example_list_models`.
|
|
1054
|
+
|
|
1055
|
+
### Training
|
|
1056
|
+
|
|
1057
|
+
- From-scratch training of small GPTs via `Toy::Trainer`
|
|
1058
|
+
(`examples/example_train` on TinyStories).
|
|
1059
|
+
- LoRA fine-tune on attention Q heads, CPU + CUDA
|
|
1060
|
+
(`examples/example_finetune` / `example_finetune_cuda`).
|
|
1061
|
+
- QLoRA: Q8 base + F32 LoRA adapter, CPU (works through mmap) and
|
|
1062
|
+
CUDA (works through the new `realize_for_q8_copy` path that
|
|
1063
|
+
bypasses the BYO-pointer padding issue).
|
|
1064
|
+
- Full fine-tune on CUDA: every per-block weight + optional
|
|
1065
|
+
embedding/output trainable, up to ~1.5B verified
|
|
1066
|
+
(`demos/smollm2_seq_full_finetune_cuda`).
|
|
1067
|
+
- Sequence-mode forward graph (M3): `T` tokens in, `T` logits out;
|
|
1068
|
+
one forward + backward + opt_step per training step instead of
|
|
1069
|
+
T separate KV-decode rebuilds.
|
|
1070
|
+
|
|
1071
|
+
### Serving
|
|
1072
|
+
|
|
1073
|
+
- OpenAI-compatible HTTP API via Tep+Spinel
|
|
1074
|
+
(`tep_demo/openai_api_smollm2` and family).
|
|
1075
|
+
- Lite HTTP example for direct token-IDs in / token-IDs out
|
|
1076
|
+
(`examples/example_serve` — see Known issues).
|
|
1077
|
+
|
|
1078
|
+
### Infrastructure
|
|
1079
|
+
|
|
1080
|
+
- Phase 0.6 CPU/CUDA mirror dedup: `prep/gen_cuda_mirror.rb`
|
|
1081
|
+
generates `*_cuda.rb` files from their CPU counterparts via
|
|
1082
|
+
mechanical substitution; `make verify-mirrors` catches drift.
|
|
1083
|
+
- Vendored ggml patches (`vendor-patches/`):
|
|
1084
|
+
- `0001-0002` CUDA `buffer_from_ptr` for BYO-pointer mmap.
|
|
1085
|
+
- `0003` BYO copy-mode A/B selector.
|
|
1086
|
+
- `0004` CUDA `cpy` strided-destination fix.
|
|
1087
|
+
- `0005` `GGML_OP_CONCAT` backward.
|
|
1088
|
+
- `0006` chunked `get_rows_back` for vocab > 65535
|
|
1089
|
+
(fixes Qwen-class embedding training on CUDA).
|
|
1090
|
+
|
|
1091
|
+
### Bench reference (GB10, 2026-05-22)
|
|
1092
|
+
|
|
1093
|
+
| Model | CPU tok/s | CUDA tok/s |
|
|
1094
|
+
| ------------------- | --------- | ---------- |
|
|
1095
|
+
| SmolLM2-135M F32 | 88 | 76 |
|
|
1096
|
+
| Qwen2.5-0.5B F32 | 34 | 41 |
|
|
1097
|
+
| Qwen2.5-1.5B F32 | 14 | 20 |
|
|
1098
|
+
| Qwen2.5-7B Q8 | 4.2 | 14 |
|
|
1099
|
+
|
|
1100
|
+
LoRA training step time on SmolLM2-135M (T=4): 108 ms/step.
|
|
1101
|
+
Full fine-tune step time on SmolLM2-135M (T=4): 108 ms/step.
|
|
1102
|
+
|
|
1103
|
+
Full details: `docs/archive/bench-gx10-2026-05-22.md`.
|
|
1104
|
+
|
|
1105
|
+
### Known issues
|
|
1106
|
+
|
|
1107
|
+
- CPU LoRA training requires `tnn_pin_all_graph_b_nodes` to work
|
|
1108
|
+
around a ggml-cpu scheduler aliasing bug on long backward chains
|
|
1109
|
+
(filed upstream as `ggml-org/ggml#1501`). Current cache classes
|
|
1110
|
+
apply the pin transparently.
|
|
1111
|
+
- Full fine-tune memory ceiling: ~3B on a 121 GB unified-memory box
|
|
1112
|
+
(1.5B comfortable, 3B fits, 7B doesn't).
|
|
1113
|
+
|
|
1114
|
+
### Sibling projects
|
|
1115
|
+
|
|
1116
|
+
- [Spinel](https://github.com/matz/spinel) — Ruby AOT compiler
|
|
1117
|
+
matz is building; we live in `~/sites/spinel`. Issues filed
|
|
1118
|
+
during this work: `#644` (Range indexing codegen), `#645`
|
|
1119
|
+
(Optional<Int> narrowing).
|
|
1120
|
+
- [Tep](https://github.com/OriPekelman/tep) — Sinatra-flavoured
|
|
1121
|
+
HTTP framework that compiles to a native binary via Spinel.
|
|
1122
|
+
Issues filed during this work: `#13`, `#16`, `#17`.
|
|
1123
|
+
- Vendored [ggml](https://github.com/ggml-org/ggml). Upstream
|
|
1124
|
+
contributions filed: `#1500` (merged), `#1501` (open).
|