isagellm-core 0.4.0.18__tar.gz → 0.4.0.19__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- isagellm_core-0.4.0.19/PKG-INFO +718 -0
- isagellm_core-0.4.0.19/README.md +684 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/pyproject.toml +1 -1
- isagellm_core-0.4.0.19/src/isagellm_core.egg-info/PKG-INFO +718 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/isagellm_core.egg-info/SOURCES.txt +64 -4
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__init__.py +4 -22
- isagellm_core-0.4.0.19/src/sagellm_core/__init__.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__main__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/config.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/demo.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/engine.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/engine_factory.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/engine_server.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/factory.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/health.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/llm_engine.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/pd_executor.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/plugins.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/runner.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/runtime.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/__pycache__/workload.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/config.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/__init__.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/__pycache__/base.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/__pycache__/beam_search.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/__pycache__/contrastive.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/__pycache__/greedy.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/__pycache__/sampling.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/base.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/beam_search.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/contrastive.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/greedy.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/sampling.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/demo.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/distributed/__init__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/distributed/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/distributed/__pycache__/strategies.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/distributed/strategies.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engine.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engine_core/__init__.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engine_core/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engine_core/__pycache__/engine_core.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engine_core/engine_core.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/__init__.py +47 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/__init__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/__pycache__/base.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/__pycache__/batch.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/__pycache__/metrics.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/__pycache__/scheduler.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/__pycache__/scheduler_kv_bridge.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/base.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/batch.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/metrics.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/policy/__init__.py +6 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/policy/__init__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/policy/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/policy/__pycache__/fcfs.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/policy/__pycache__/priority.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/policy/fcfs.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/policy/priority.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/scheduler.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engine_core/scheduler/scheduler_kv_bridge.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engine_factory.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engine_server.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/__init__.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/__pycache__/ascend.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/__pycache__/cpu.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/__pycache__/embedding.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/__pycache__/hf_cuda.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/__pycache__/pytorch.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/engines/__pycache__/pytorch_engine.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/embedding.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/executor/__init__.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/executor/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/executor/__pycache__/executor_base.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/executor/__pycache__/uniproc_executor.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/executor/executor_base.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/executor/uniproc_executor.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/factory.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/health.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/inputs/__init__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/inputs/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/inputs/__pycache__/processor.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/inputs/__pycache__/tokenizer_utils.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/inputs/processor.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/inputs/tokenizer_utils.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/__init__.py +30 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/__init__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/__pycache__/activation.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/__pycache__/base.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/__pycache__/embedding.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/__pycache__/linear.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/__pycache__/normalization.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/activation.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/base.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/embedding.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/linear.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/layers/normalization.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/llm_engine.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__init__.py +139 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__init__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/base.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/factory.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/gpt2.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/llama.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/mixtral.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/model/__pycache__/model_loader.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/quantization.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/qwen2.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/__pycache__/registry.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/model/__pycache__/weight_utils.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/base.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/factory.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/gpt2.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/llama.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/mixtral.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/model/model_loader.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/quantization.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/qwen2.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/registry.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/__init__.py +54 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/__init__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/__pycache__/base.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/__pycache__/pytorch.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/__pycache__/quantized.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/__pycache__/safetensors.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/base.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/pytorch.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/quantized.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/model/weight_loader/safetensors.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/model/weight_utils.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/observability/__init__.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/observability/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/observability/__pycache__/logger.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/observability/__pycache__/metrics.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/observability/logger.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/observability/metrics.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/pd_executor.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/plugins.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/runner.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/runtime.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/sampling/__init__.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/sampling/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/sampling/__pycache__/params.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/sampling/__pycache__/sampler.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/sampling/params.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/sampling/sampler.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/worker/__init__.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/worker/__pycache__/__init__.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/worker/__pycache__/worker.cpython-311.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/worker/model_runner/__init__.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/worker/model_runner/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/worker/model_runner/__pycache__/model_runner.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.19/src/sagellm_core/worker/model_runner/model_runner.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/worker/worker.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/workload.pyc +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_ci_smoke.py +14 -14
- isagellm_core-0.4.0.19/tests/test_engine.py +165 -0
- isagellm_core-0.4.0.19/tests/test_engine_contract_simplified.py +125 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_engine_server.py +0 -3
- isagellm_core-0.4.0.19/tests/test_llama_model.py +280 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_llm_engine_error_handling.py +1 -1
- isagellm_core-0.4.0.19/tests/test_llm_engine_task2_interfaces.py +171 -0
- isagellm_core-0.4.0.19/tests/test_mixtral_moe.py +309 -0
- isagellm_core-0.4.0.19/tests/test_model_registry.py +164 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_pd_separation.py +27 -31
- isagellm_core-0.4.0.19/tests/test_quantized_loader.py +336 -0
- isagellm_core-0.4.0.19/tests/test_qwen2_model.py +273 -0
- isagellm_core-0.4.0.19/tests/test_scheduler_kv_integration.py +458 -0
- isagellm_core-0.4.0.19/tests/test_weight_loader.py +413 -0
- isagellm_core-0.4.0.18/PKG-INFO +0 -308
- isagellm_core-0.4.0.18/README.md +0 -274
- isagellm_core-0.4.0.18/src/isagellm_core.egg-info/PKG-INFO +0 -308
- isagellm_core-0.4.0.18/src/sagellm_core/__init__.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/__pycache__/base_engine.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/__pycache__/mock_engine.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/distributed/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/engine_core/scheduler/__init__.py +0 -19
- isagellm_core-0.4.0.18/src/sagellm_core/engine_core/scheduler/__init__.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/engine_core/scheduler/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/engine_core/scheduler/__pycache__/scheduler.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/engine_core/scheduler/scheduler.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/engines/__pycache__/mock.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/engines/__pycache__/pytorch_engine.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/inputs/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/model/__init__.py +0 -13
- isagellm_core-0.4.0.18/src/sagellm_core/model/__init__.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/model/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/sampling/__pycache__/__init__.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/worker/model_runner/__pycache__/model_runner.cpython-311.pyc +0 -0
- isagellm_core-0.4.0.18/src/sagellm_core/worker/model_runner/model_runner.pyc +0 -0
- isagellm_core-0.4.0.18/tests/test_engine.py +0 -308
- isagellm_core-0.4.0.18/tests/test_engine_contract_simplified.py +0 -75
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/MANIFEST.in +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/setup.cfg +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/setup.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/isagellm_core.egg-info/dependency_links.txt +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/isagellm_core.egg-info/entry_points.txt +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/isagellm_core.egg-info/requires.txt +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/isagellm_core.egg-info/top_level.txt +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/decoding/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/distributed/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engine_core/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/engines/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/executor/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/inputs/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/observability/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/py.typed +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/sampling/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/worker/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/src/sagellm_core/worker/model_runner/__init__.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_config.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_decoding_strategies.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_e2e_llm_integration.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_engine_behavior_parity.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_llm_engine_contract.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_llm_engine_decoding.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_model_loader.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_observability.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_sampling.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_streaming_pd.py +0 -0
- {isagellm_core-0.4.0.18 → isagellm_core-0.4.0.19}/tests/test_task0_10_workload.py +0 -0
@@ -0,0 +1,718 @@
Metadata-Version: 2.4
Name: isagellm-core
Version: 0.4.0.19
Summary: sageLLM core runtime with PD separation (MVP)
Author: IntelliStream Team
License: Proprietary - IntelliStream
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: isagellm-protocol<0.5.0,>=0.4.0.0
Requires-Dist: isagellm-backend<0.5.0,>=0.4.0.0
Requires-Dist: isagellm-comm<0.5.0,>=0.4.0.0
Requires-Dist: isagellm-kv-cache<0.5.0,>=0.4.0.0
Requires-Dist: fastapi>=0.100.0
Requires-Dist: uvicorn>=0.22.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.35.0
Requires-Dist: accelerate>=0.26.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.0.0; extra == "dev"
Requires-Dist: ruff>=0.8.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-PyYAML>=6.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: isage-pypi-publisher>=0.2.0; extra == "dev"
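
The metadata lists classifiers for Python 3.10 through 3.12 but pins `Requires-Python: ==3.11.*`, and installers act on the latter. As a rough illustration only, the sketch below checks a version against a pin of that one form. The helper name `matches_wildcard_pin` is hypothetical, and real installers implement the full specifier grammar rather than this shortcut.

```python
import sys


def matches_wildcard_pin(version: tuple, pin: str) -> bool:
    """Return True if `version` satisfies a pin of the form '==X.Y.*'.

    Simplified sketch: only the trailing-wildcard equality form is handled.
    """
    if not (pin.startswith("==") and pin.endswith(".*")):
        raise ValueError(f"unsupported pin format: {pin!r}")
    # Strip the leading '==' and the trailing '.*', then compare prefixes.
    prefix = tuple(int(part) for part in pin[2:-2].split("."))
    return tuple(version[: len(prefix)]) == prefix


if __name__ == "__main__":
    pin = "==3.11.*"  # value taken from the metadata above
    current = tuple(sys.version_info[:3])
    print(f"Python {current} satisfies {pin}: {matches_wildcard_pin(current, pin)}")
```

Under this reading, interpreters from the 3.10 and 3.12 series would be rejected even though the classifiers mention them.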

# sagellm-core

[PyPI](https://pypi.org/project/isagellm-core/)
[License](https://github.com/intellistream/sagellm-core/blob/main/LICENSE)
[Ruff](https://github.com/astral-sh/ruff)

**sageLLM Core** is a hardware-agnostic LLM inference engine. It provides a unified inference interface (generate, stream, execute), automatic backend selection (CPU/CUDA/Ascend), a built-in decoding strategy system, and hybrid-mode execution with PD separation.

**Version**: `0.4.0.17` | **Last updated**: 2026-02-02 | **Protocol compliance**: [Protocol v0.1](https://github.com/intellistream/sagellm-docs/blob/main/docs/specs/protocol_v0.1.md)

## 📍 Role and Responsibilities

Position and responsibilities within the overall sageLLM architecture:

```
┌──────────────────────────────────────────────────────────────┐
│                      Application Layer                       │
│           (sagellm-gateway, sagellm-control-plane)           │
└────────────────┬─────────────────────────────────────────────┘
                 │
┌────────────────┴─────────────────────────────────────────────┐
│                sagellm-core (this repository)                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ LLMEngine: hardware-agnostic unified inference entry   │  │
│  │ • generate() / stream() / execute()                    │  │
│  │ • Automatic backend selection (cpu/cuda/ascend)        │  │
│  │ • Continuous batching scheduling                       │  │
│  │ • Decoding strategies (Greedy/Sampling/BeamSearch)     │  │
│  │ • PD-separated hybrid-mode execution                   │  │
│  └────────────────────────────────────────────────────────┘  │
├──────────────────────────────────────────────────────────────┤
│ Core dependencies (L1 layer)                                 │
│ ├─ sagellm-backend: hardware abstraction, device management  │
│ ├─ sagellm-comm: communication hardware, TP/PP communication │
│ ├─ sagellm-kv-cache: KV cache management, eviction policies  │
│ └─ sagellm-protocol: data structures, error definitions      │
└──────────────────────────────────────────────────────────────┘
```

**Responsibility boundaries**:
- ✅ **Core owns**: LLMEngine, scheduling, inference orchestration, decoding strategies
- ✅ **Backend owns**: hardware abstraction, device management, operators/kernels
- ✅ **Comm owns**: communication hardware abstraction, collective operations, topology management
- ✅ **Protocol owns**: globally shared data structures, error codes, ID scheme

## ✨ Core Features

| Feature | Description |
|------|------|
| **Unified inference interface** | `generate()` / `stream()` / `execute()`: non-streaming, streaming, and protocol-compatible calls |
| **Hardware-agnostic** | CPU/CUDA/Ascend, with automatic detection and selection |
| **Decoding strategy system** | Greedy, Sampling, Beam Search, Contrastive Decoding |
| **Continuous Batching** | Dynamic batching to keep the hardware fully utilized |
| **PD-separated execution** | Prefill and Decode phases are separated; hybrid mode is supported |
| **Configuration-driven** | YAML/JSON configuration, validated with Pydantic v2 |
| **HTTP Server** | FastAPI implementation with SSE streaming |
| **CPU-First** | Full support for GPU-less environments, convenient for testing and development |
| **Type-safe** | Complete Python type annotations, Mypy support |
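
To make the "Continuous Batching" row concrete, here is a toy simulation of the general idea: finished requests free their slot immediately and waiting requests are admitted first-come-first-served. It is a sketch of the technique in the abstract, not the scheduler shipped in this package; the function name and its parameters are made up for illustration.

```python
from collections import deque


def simulate_continuous_batching(token_budgets, max_batch_size):
    """Return the number of decode steps needed to finish all requests.

    Each entry in `token_budgets` is how many tokens one request still has
    to generate. Every step produces one token per running request.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    waiting = deque(token_budgets)
    running = []
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free batch slots (FCFS order).
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode step: every running request emits one token.
        running = [remaining - 1 for remaining in running]
        # Requests that produced all their tokens leave the batch at once.
        running = [remaining for remaining in running if remaining > 0]
        steps += 1
    return steps


if __name__ == "__main__":
    print(simulate_continuous_batching([3, 1, 2], max_batch_size=2))
```

With budgets `[3, 1, 2]` and two slots, the third request takes over the slot freed by the second after one step, so everything finishes in 3 steps; running the same input as two fixed batches would take 3 + 2 = 5 steps.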

## 📦 Dependencies

### Core dependencies (installed automatically)

```toml
isagellm-protocol>=0.4.0.0,<0.5.0  # protocol definitions
isagellm-backend>=0.4.0.0,<0.5.0   # hardware abstraction
isagellm-comm>=0.4.0.0,<0.5.0      # communication backend
isagellm-kv-cache>=0.4.0.0,<0.5.0  # KV cache management

# Framework dependencies
pydantic>=2.0.0       # data validation
pyyaml>=6.0.0         # configuration parsing
torch>=2.0.0          # tensor computation
transformers>=4.35.0  # model loading
fastapi>=0.100.0      # HTTP serving
```
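
The sibling packages are all constrained to `>=0.4.0.0,<0.5.0`. The following minimal sketch shows how such a half-open range behaves for purely numeric dotted versions. It is a simplification (real resolvers also handle pre-releases, epochs, and other specifier forms), and both helper names are hypothetical.

```python
def parse_version(text):
    """Split a purely numeric dotted version such as '0.4.0.19' into ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower, upper):
    """Check lower <= version < upper, comparing release segments numerically."""
    parsed = [parse_version(item) for item in (lower, version, upper)]
    # Pad with zeros so that '0.5' and '0.5.0' compare as equal.
    width = max(len(item) for item in parsed)
    low, ver, up = (item + (0,) * (width - len(item)) for item in parsed)
    return low <= ver < up


if __name__ == "__main__":
    for candidate in ("0.4.0.19", "0.5.0", "0.3.9"):
        print(candidate, in_range(candidate, "0.4.0.0", "0.5.0"))
```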

### Dependents

- 🔵 **sagellm-control-plane**: uses Core for request scheduling and load balancing
- 🟡 **sagellm-compression**: builds on Core's model execution layer
- 🟢 **sagellm-gateway**: provides an OpenAI-compatible API

## 🚀 Installation

### Install from PyPI (recommended)

```bash
# Install a pinned release
pip install isagellm-core==0.4.0.17

# Install within a version range
pip install "isagellm-core>=0.4.0.0,<0.5.0"
```

### Local development install

```bash
# Clone the repository
git clone https://github.com/intellistream/sagellm-core.git
cd sagellm-core

# Option 1: one-step install (recommended)
./quickstart.sh

# Option 2: set up the development environment manually
pip install -e ".[dev]"

# Install the pre-commit hooks
pre-commit install
```

### Linking local dependencies (for multi-package development)

```bash
# If you are also developing backend/protocol/comm, install the local checkouts
pip install -e ../sagellm-protocol
pip install -e ../sagellm-backend
pip install -e ../sagellm-comm
pip install -e ../sagellm-kv-cache
pip install -e ".[dev]"
```

### Verify the installation

```bash
# Check the package version
python -c "import sagellm_core; print(sagellm_core.__version__)"

# Run the smoke tests
pytest tests/test_ci_smoke.py -v
```
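
The version check can also be done without importing the package, by querying the installed distribution metadata through the standard library. A small sketch follows; `installed_version` is a hypothetical helper name, and the distribution name `isagellm-core` comes from the metadata above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("isagellm-core")
    print(found if found is not None else "isagellm-core is not installed")
```

This avoids loading heavy imports such as `torch` just to read a version string.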

## 🎯 Quick Start

### 1. Basic inference

```python
import asyncio

from sagellm_core import LLMEngine, LLMEngineConfig

# Create the configuration
config = LLMEngineConfig(
    model_path="sshleifer/tiny-gpt2",  # HuggingFace model name or local path
    backend_type="cpu",  # one of cpu/cuda/ascend; "auto" selects automatically
    max_new_tokens=20,
)

# Initialize the engine
engine = LLMEngine(config)


# Run asynchronously
async def main():
    await engine.start()

    # Non-streaming generation (complete output)
    response = await engine.generate("Hello, world!")
    print(response.output_text)

    # Streaming generation (token by token)
    async for event in engine.stream("Once upon a time"):
        if event.event == "delta":
            print(event.chunk, end="", flush=True)

    await engine.stop()


asyncio.run(main())
```
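
The streaming loop above keeps only events whose `event` field is `"delta"`. The sketch below reproduces that consumption pattern against a stand-in async generator so it can run without a model. `FakeStreamEvent`, `fake_stream`, and the `"start"`/`"end"` event names are assumptions made for illustration; only the `event` and `chunk` fields and the `"delta"` value appear in the example above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeStreamEvent:
    """Hypothetical stand-in exposing only the two fields used above."""
    event: str
    chunk: str = ""


async def fake_stream(prompt: str) -> AsyncIterator[FakeStreamEvent]:
    """Stand-in for engine.stream(): yields a start, some deltas, an end."""
    yield FakeStreamEvent(event="start")
    for word in prompt.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield FakeStreamEvent(event="delta", chunk=word + " ")
    yield FakeStreamEvent(event="end")


async def collect_text(events: AsyncIterator[FakeStreamEvent]) -> str:
    """Concatenate the chunks of all delta events, ignoring other events."""
    parts = []
    async for event in events:
        if event.event == "delta":
            parts.append(event.chunk)
    return "".join(parts)


if __name__ == "__main__":
    print(asyncio.run(collect_text(fake_stream("Once upon a time"))))
```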

### 2. Controlling generation with sampling parameters

```python
import asyncio

from sagellm_core import LLMEngine, LLMEngineConfig
from sagellm_protocol.sampling import SamplingParams, DecodingStrategy


async def main():
    config = LLMEngineConfig(model_path="sshleifer/tiny-gpt2")
    engine = LLMEngine(config)
    await engine.start()

    prompt = "The future of AI is"

    # Deterministic output (greedy)
    response = await engine.generate(
        prompt,
        sampling_params=SamplingParams(
            strategy=DecodingStrategy.GREEDY,
            max_tokens=20,
        ),
    )
    print(f"Greedy: {response.output_text}")

    # Random sampling (controlled by temperature)
    response = await engine.generate(
        prompt,
        sampling_params=SamplingParams(
            strategy=DecodingStrategy.SAMPLING,
            temperature=0.7,
            top_p=0.9,
            max_tokens=20,
        ),
    )
    print(f"Sampling: {response.output_text}")

    await engine.stop()


asyncio.run(main())
```
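
For readers unfamiliar with `temperature` and `top_p`, this is a pure-Python sketch of textbook temperature scaling followed by nucleus (top-p) filtering. It illustrates the general technique only and makes no claim about how this package's sampler is written; treating a non-positive temperature as greedy is an assumption of the sketch, and `top_p_sample` is a hypothetical name.

```python
import math
import random


def top_p_sample(logits, temperature, top_p, rng):
    """Sample a token index using temperature scaling and nucleus filtering."""
    if temperature <= 0.0:
        # Assumption of this sketch: non-positive temperature means greedy.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    probs = [weight / total for weight in weights]
    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for index in order:
        kept.append(index)
        mass += probs[index]
        if mass >= top_p:
            break
    # Draw from the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(top_p_sample([0.1, 2.0, 0.3], temperature=0.7, top_p=0.9, rng=rng))
```

Lower temperatures sharpen the distribution before filtering, and a smaller `top_p` shrinks the candidate set, so both push the output toward the greedy choice.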

### 3. Running the demo from a YAML configuration file

```bash
# Inspect the available configuration
cat examples/config_cpu.yaml

# Run the demo
python -m sagellm_core.demo --config examples/config_cpu.yaml --verbose

# Inspect the output metrics
cat metrics.json
```

### 4. Starting the HTTP server

```bash
# Option 1: command line
sage-engine --host 0.0.0.0 --port 8000

# Option 2: Python API (passed to the Python interpreter, not run by the shell)
python - <<'EOF'
from sagellm_core import engine_server_app
import uvicorn

uvicorn.run(engine_server_app, host="0.0.0.0", port=8000)
EOF
```

### 5. HTTP request examples

```bash
# Non-streaming inference
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt2",
    "prompt": "Hello",
    "max_tokens": 20
  }'

# Streaming inference
curl -X POST http://localhost:8000/v1/completions/stream \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt2",
    "prompt": "Hello",
    "max_tokens": 20,
    "stream": true
  }'
```
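
The non-streaming curl call can be mirrored from Python with the standard library alone. The sketch below only builds the request; sending it needs a running server, and the shape of the response body is not shown in this section, so it is left unparsed. `build_completion_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, max_tokens, model="gpt2"):
    """Build (but do not send) a POST request mirroring the curl call above."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_completion_request("http://localhost:8000", "Hello", 20)
    print(request.get_method(), request.full_url)
    # To send it (requires a running server):
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```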
|
|
295
|
+
|
|
296
|
+
## 📚 API 文档
|
|
297
|
+
|
|
298
|
+
### LLMEngine - 主入口
|
|
299
|
+
|
|
300
|
+
**Initialization**:
|
|
301
|
+
```python
|
|
302
|
+
LLMEngineConfig(
|
|
303
|
+
model_path: str, # Required: HuggingFace model name or local path
|
|
304
|
+
backend_type: str = "auto", # Compute backend: cpu/cuda/ascend/auto
|
|
305
|
+
comm_type: str = "auto", # Communication backend: gloo/nccl/hccl/auto
|
|
306
|
+
max_batch_size: int = 32, # Maximum batch size
|
|
307
|
+
max_model_len: int = 4096, # Maximum sequence length
|
|
308
|
+
max_new_tokens: int = 128, # Maximum number of tokens generated per request
|
|
309
|
+
tensor_parallel_size: int = 1, # Tensor parallelism degree
|
|
310
|
+
pipeline_parallel_size: int = 1, # Pipeline parallelism degree
|
|
311
|
+
dtype: str = "auto", # Data type: float32/float16/bfloat16
|
|
312
|
+
)
|
|
313
|
+
```
|
|
314
|
+
|
|
315
|
+
**Key methods**:
|
|
316
|
+
```python
|
|
317
|
+
async def start() -> None:
|
|
318
|
+
"""Start the engine and load the model."""
|
|
319
|
+
|
|
320
|
+
async def stop() -> None:
|
|
321
|
+
"""Stop the engine and release resources."""
|
|
322
|
+
|
|
323
|
+
async def generate(
|
|
324
|
+
prompt: str | list[int],
|
|
325
|
+
*,
|
|
326
|
+
sampling_params: SamplingParams | None = None,
|
|
327
|
+
max_tokens: int | None = None,
|
|
328
|
+
request_id: str | None = None,
|
|
329
|
+
) -> Response:
|
|
330
|
+
"""Non-streaming inference: returns the complete output."""
|
|
331
|
+
|
|
332
|
+
async def stream(
|
|
333
|
+
prompt_or_request: str | Request,
|
|
334
|
+
*,
|
|
335
|
+
max_tokens: int | None = None,
|
|
336
|
+
request_id: str | None = None,
|
|
337
|
+
) -> AsyncIterator[StreamEvent]:
|
|
338
|
+
"""Streaming inference: yields events token by token."""
|
|
339
|
+
|
|
340
|
+
async def execute(request: Request) -> Response:
|
|
341
|
+
"""Execute a Protocol Request; kept for compatibility with the old interface."""
|
|
342
|
+
```
|
|
343
|
+
|
|
344
|
+
### SamplingParams - Sampling Parameters
|
|
345
|
+
|
|
346
|
+
```python
|
|
347
|
+
from sagellm_protocol.sampling import SamplingParams, DecodingStrategy
|
|
348
|
+
|
|
349
|
+
SamplingParams(
|
|
350
|
+
strategy: DecodingStrategy = DecodingStrategy.GREEDY,
|
|
351
|
+
temperature: float = 0.0, # Higher values give more random output
|
|
352
|
+
top_p: float = 1.0, # Nucleus sampling
|
|
353
|
+
top_k: int = 0, # Top-K sampling
|
|
354
|
+
repetition_penalty: float = 1.0,
|
|
355
|
+
length_penalty: float = 1.0,
|
|
356
|
+
num_beams: int = 1, # Beam search width
|
|
357
|
+
max_tokens: int = 128,
|
|
358
|
+
seed: int | None = None, # For reproducibility
|
|
359
|
+
)
|
|
360
|
+
```
|
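To make the interaction of `temperature`, `top_k`, and `top_p` concrete, here is a small pure-Python sketch of how such parameters are commonly applied to a vector of logits. It is not the sagellm implementation: the function name `sample_token` is hypothetical, and treating `temperature <= 0` as greedy decoding and `top_k = 0` as "no filter" are assumptions based on the defaults listed above.

```python
import math
import random
from typing import Optional


def sample_token(
    logits: list,
    *,
    temperature: float = 0.0,
    top_k: int = 0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a token index from raw logits (illustrative only)."""
    # Assumption: a non-positive temperature means greedy decoding.
    if temperature <= 0.0:
        return max(range(len(logits)), key=logits.__getitem__)

    # Softmax over temperature-scaled logits, shifted for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    # Top-K: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, index in ranked:
        kept.append((prob, index))
        cumulative += prob
        if cumulative >= top_p:
            break

    rng = rng or random.Random()
    indices = [i for _, i in kept]
    probs = [p for p, _ in kept]
    # random.choices renormalizes the remaining weights internally.
    return rng.choices(indices, weights=probs, k=1)[0]
```

Passing a seeded `random.Random` plays the role that `seed` plays in `SamplingParams`: the same seed and logits give the same choice.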
|
361
|
+
|
|
362
|
+
### Other Core APIs
|
|
363
|
+
|
|
364
|
+
```python
|
|
365
|
+
# Config loading (legacy)
|
|
366
|
+
from sagellm_core import load_config
|
|
367
|
+
config = load_config("config.yaml") # Supports YAML/JSON
|
|
368
|
+
|
|
369
|
+
# Backend creation (legacy)
|
|
370
|
+
from sagellm_core import create_backend, BackendConfig
|
|
371
|
+
backend = create_backend(BackendConfig(kind="cpu"))
|
|
372
|
+
|
|
373
|
+
# Factory method (legacy)
|
|
374
|
+
from sagellm_core import EngineFactory
|
|
375
|
+
factory = EngineFactory()
|
|
376
|
+
engine = factory.create("cpu") # Supports auto-discovery
|
|
377
|
+
```
|
|
378
|
+
|
|
379
|
+
## 🏗️ Architecture
|
|
380
|
+
|
|
381
|
+
### Layered Architecture
|
|
382
|
+
|
|
383
|
+
```
|
|
384
|
+
┌────────────────────────────────────┐
|
|
385
|
+
│ LLMEngine (public API)             │  ← user-facing layer
|
|
386
|
+
│ • generate/stream/execute          │
|
|
387
|
+
└────────┬───────────────────────────┘
|
|
388
|
+
         │
|
|
389
|
+
┌────────▼───────────────────────────┐
|
|
390
|
+
│ EngineCore (engine core)           │  ← inference coordination layer
|
|
391
|
+
│ • Scheduler: Continuous Batching   │
|
|
392
|
+
│ • Executor: worker management      │
|
|
393
|
+
│ • KVCacheManager: cache management │
|
|
394
|
+
└────────┬───────────────────────────┘
|
|
395
|
+
         │
|
|
396
|
+
┌────────▼───────────────────────────┐
|
|
397
|
+
│ Worker & ModelRunner               │  ← execution layer
|
|
398
|
+
│ • Forward pass                     │
|
|
399
|
+
│ • TP/PP communication              │
|
|
400
|
+
│ • Hardware resource management     │
|
|
401
|
+
└────────┬───────────────────────────┘
|
|
402
|
+
         │
|
|
403
|
+
    ┌────┴────┬───────────┬────────────┐
|
|
404
|
+
    ▼         ▼           ▼            ▼
|
|
405
|
+
 Backend     Comm     KV-Cache     Protocol
|
|
406
|
+
```
|
|
407
|
+
|
|
408
|
+
### Modules
|
|
409
|
+
|
|
410
|
+
| Module | Path | Responsibility |
|
|
411
|
+
|------|------|------|
|
|
412
|
+
| **llm_engine** | `src/sagellm_core/llm_engine.py` | Unified inference entry point |
|
|
413
|
+
| **engine_core** | `src/sagellm_core/engine_core/` | Scheduling and execution coordination |
|
|
414
|
+
| **scheduler** | `src/sagellm_core/engine_core/scheduler.py` | Continuous Batching |
|
|
415
|
+
| **executor** | `src/sagellm_core/executor/` | Worker management |
|
|
416
|
+
| **worker** | `src/sagellm_core/worker/` | Single-device execution |
|
|
417
|
+
| **decoding** | `src/sagellm_core/decoding/` | 5+ decoding strategies |
|
|
418
|
+
| **runtime** | `src/sagellm_core/runtime.py` | Runtime for PD (Prefill/Decode) separation |
|
|
419
|
+
| **pd_executor** | `src/sagellm_core/pd_executor.py` | Prefill/Decode separation |
|
|
420
|
+
| **engine_server** | `src/sagellm_core/engine_server.py` | HTTP service |
|
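The scheduler row names Continuous Batching as its responsibility. As a rough illustration of the idea only, the toy loop below refills free batch slots before every decode step instead of waiting for the whole batch to finish. `ToyRequest` and `run_continuous_batching` are hypothetical names; the real scheduler in `engine_core/scheduler.py` is not shown in this document and will differ, for example by also accounting for KV-cache capacity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ToyRequest:
    request_id: str
    tokens_left: int  # decode steps still needed (assumed >= 1)


def run_continuous_batching(requests, max_batch_size):
    """Return, for each step, the ids of the requests that shared that step."""
    waiting = deque(requests)
    running = []
    schedule = []
    while waiting or running:
        # Refill free slots before every step; this is the "continuous" part.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        schedule.append([r.request_id for r in running])
        for r in running:
            r.tokens_left -= 1  # stand-in for one forward pass
        # Finished requests leave immediately and free their slot.
        running = [r for r in running if r.tokens_left > 0]
    return schedule
```

In this sketch a short request that finishes early hands its slot to the next waiting request on the following step, which is what keeps the batch full.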
|
421
|
+
|
|
422
|
+
## 🔧 Development Guide
|
|
423
|
+
|
|
424
|
+
### Project Structure
|
|
425
|
+
|
|
426
|
+
```
|
|
427
|
+
sagellm-core/
|
|
428
|
+
├── src/sagellm_core/ # Source code
|
|
429
|
+
│ ├── llm_engine.py # Unified inference engine
|
|
430
|
+
│ ├── engine_core/ # Engine core (scheduling + execution)
|
|
431
|
+
│ ├── executor/ # Worker executor
|
|
432
|
+
│ ├── worker/ # Worker and ModelRunner
|
|
433
|
+
│ ├── decoding/ # Decoding strategies (Greedy/Sampling/...)
|
|
434
|
+
│ ├── engine_server.py # HTTP Server (FastAPI)
|
|
435
|
+
│ ├── config.py # Config classes (legacy)
|
|
436
|
+
│ ├── factory.py # Factory methods (legacy)
|
|
437
|
+
│ ├── runtime.py # PD separation runtime
|
|
438
|
+
│ ├── pd_executor.py # PD separation executor
|
|
439
|
+
│ └── ...
|
|
440
|
+
├── tests/ # Test cases
|
|
441
|
+
│ ├── unit/ # Unit tests
|
|
442
|
+
│ ├── integration/ # Integration tests
|
|
443
|
+
│ ├── e2e/ # End-to-end tests
|
|
444
|
+
│ └── conftest.py # Pytest configuration
|
|
445
|
+
├── examples/ # Example code
|
|
446
|
+
│ ├── config_cpu.yaml # Example CPU config
|
|
447
|
+
│ ├── config_cuda.yaml # Example CUDA config
|
|
448
|
+
│ ├── decoding_strategies_demo.py # Decoding strategies demo
|
|
449
|
+
│ ├── pd_separation_demo.py # PD separation demo
|
|
450
|
+
│ └── ...
|
|
451
|
+
├── docs/ # Documentation
|
|
452
|
+
│ ├── ARCHITECTURE.md # Detailed architecture
|
|
453
|
+
│ ├── DECODING_STRATEGIES.md # Decoding strategies guide
|
|
454
|
+
│ └── ...
|
|
455
|
+
├── pyproject.toml # Project configuration (setuptools)
|
|
456
|
+
├── pytest.ini # Pytest configuration
|
|
457
|
+
├── .pre-commit-config.yaml # Pre-commit hooks
|
|
458
|
+
└── quickstart.sh # Quick install script
|
|
459
|
+
```
|
|
460
|
+
|
|
461
|
+
### Environment Setup
|
|
462
|
+
|
|
463
|
+
```bash
|
|
464
|
+
# Clone the repository and enter it
|
|
465
|
+
git clone https://github.com/intellistream/sagellm-core.git
|
|
466
|
+
cd sagellm-core
|
|
467
|
+
|
|
468
|
+
# Install development dependencies
|
|
469
|
+
pip install -e ".[dev]"
|
|
470
|
+
|
|
471
|
+
# Install git hooks (automatic checks before each commit)
|
|
472
|
+
pre-commit install
|
|
473
|
+
|
|
474
|
+
# Verify the installation
|
|
475
|
+
python -m pytest tests/test_ci_smoke.py -v
|
|
476
|
+
```
|
|
477
|
+
|
|
478
|
+
### Running Tests
|
|
479
|
+
|
|
480
|
+
```bash
|
|
481
|
+
# Run all tests
|
|
482
|
+
pytest tests/ -v
|
|
483
|
+
|
|
484
|
+
# Run a specific test module
|
|
485
|
+
pytest tests/unit/test_config.py -v
|
|
486
|
+
|
|
487
|
+
# Run with a coverage report
|
|
488
|
+
pytest tests/ --cov=sagellm_core --cov-report=html
|
|
489
|
+
|
|
490
|
+
# Run only the tests marked slow (including LLM tests)
|
|
491
|
+
pytest tests/ -v -m slow
|
|
492
|
+
|
|
493
|
+
# Run a single test case
|
|
494
|
+
pytest tests/test_llm_engine.py::test_engine_generate -v
|
|
495
|
+
```
|
|
496
|
+
|
|
497
|
+
### Code Quality Checks
|
|
498
|
+
|
|
499
|
+
```bash
|
|
500
|
+
# Ruff formatting + lint checks
|
|
501
|
+
ruff check . --fix # Automatically fix what can be fixed
|
|
502
|
+
ruff format . # Format the code
|
|
503
|
+
|
|
504
|
+
# Mypy static type checking
|
|
505
|
+
mypy src/
|
|
506
|
+
|
|
507
|
+
# Manually run all pre-commit hooks
|
|
508
|
+
pre-commit run --all-files
|
|
509
|
+
|
|
510
|
+
# Run a specific hook
|
|
511
|
+
pre-commit run ruff --all-files
|
|
512
|
+
pre-commit run mypy --all-files
|
|
513
|
+
```
|
|
514
|
+
|
|
515
|
+
### Git Commit Workflow
|
|
516
|
+
|
|
517
|
+
1. **Create a feature branch**
|
|
518
|
+
```bash
|
|
519
|
+
git checkout -b feature/your-feature-name
|
|
520
|
+
```
|
|
521
|
+
|
|
522
|
+
2. **Commit your code (the hooks run automatically)**
|
|
523
|
+
```bash
|
|
524
|
+
git add .
|
|
525
|
+
git commit -m "feat: add your feature description"
|
|
526
|
+
```
|
|
527
|
+
- If the hooks fail, fix the issues and commit again
|
|
528
|
+
- In an emergency: `git commit --no-verify` (not recommended)
|
|
529
|
+
|
|
530
|
+
3. **Push and open a PR**
|
|
531
|
+
```bash
|
|
532
|
+
git push origin feature/your-feature-name
|
|
533
|
+
```
|
|
534
|
+
|
|
535
|
+
### Common Development Tasks
|
|
536
|
+
|
|
537
|
+
**Adding a new decoding strategy**:
|
|
538
|
+
1. Create a new file under `src/sagellm_core/decoding/`
|
|
539
|
+
2. Subclass `BaseDecodingStrategy`
|
|
540
|
+
3. Implement the `__call__()` method
|
|
541
|
+
4. Export it in `__init__.py`
|
|
542
|
+
5. Add unit tests
|
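The steps for a new decoding strategy can be sketched as follows. The real `BaseDecodingStrategy` interface is not shown in this document, so the base class below is a local stand-in with an assumed `__call__(logits) -> int` signature, and `MinLengthGreedy` is a hypothetical strategy invented for illustration. Check the actual base class in `src/sagellm_core/decoding/` before adapting this.

```python
from abc import ABC, abstractmethod


class BaseDecodingStrategy(ABC):
    """Local stand-in for the real base class; its actual signature may differ."""

    @abstractmethod
    def __call__(self, logits: list) -> int:
        """Return the index of the next token."""


class MinLengthGreedy(BaseDecodingStrategy):
    """Hypothetical strategy: greedy, but never emits EOS before min_length tokens."""

    def __init__(self, eos_token_id: int, min_length: int) -> None:
        self.eos_token_id = eos_token_id
        self.min_length = min_length
        self.generated = 0

    def __call__(self, logits: list) -> int:
        candidates = list(range(len(logits)))
        if self.generated < self.min_length:
            # Mask out EOS until enough tokens have been produced.
            candidates = [i for i in candidates if i != self.eos_token_id]
        choice = max(candidates, key=logits.__getitem__)
        self.generated += 1
        return choice
```

A unit test for step 5 could feed fixed logits and assert that EOS only appears once `min_length` tokens have been generated.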
|
543
|
+
|
|
544
|
+
**Adding support for a new backend**:
|
|
545
|
+
1. Implement a BackendProvider in `sagellm-backend`
|
|
546
|
+
2. Use `get_provider()` in Core for auto-discovery
|
|
547
|
+
3. Add integration tests
|
|
548
|
+
|
|
549
|
+
**Adding a configuration option**:
|
|
550
|
+
1. Modify the Pydantic model in `src/sagellm_core/config.py`
|
|
551
|
+
2. Update the example config files
|
|
552
|
+
3. Update the docs and tests
|
|
553
|
+
|
|
554
|
+
## 📖 Examples
|
|
555
|
+
|
|
556
|
+
### Complete Demo Applications
|
|
557
|
+
|
|
558
|
+
```bash
|
|
559
|
+
# Run the full decoding strategies demo (6 scenarios)
|
|
560
|
+
python examples/decoding_strategies_demo.py
|
|
561
|
+
|
|
562
|
+
# Run the PD separation demo
|
|
563
|
+
python examples/pd_separation_demo.py
|
|
564
|
+
```
|
|
565
|
+
|
|
566
|
+
### CPU-First Testing
|
|
567
|
+
|
|
568
|
+
All tests run on CPU by default (no GPU required):
|
|
569
|
+
|
|
570
|
+
```bash
|
|
571
|
+
# Test LLMEngine
|
|
572
|
+
pytest tests/test_engine.py -v
|
|
573
|
+
|
|
574
|
+
# Test the config system
|
|
575
|
+
pytest tests/test_config.py -v
|
|
576
|
+
|
|
577
|
+
# Test the decoding strategies
|
|
578
|
+
pytest tests/test_decoding_strategies.py -v
|
|
579
|
+
|
|
580
|
+
# Test the E2E flow
|
|
581
|
+
pytest tests/test_llm_engine_contract.py -v
|
|
582
|
+
```
|
|
583
|
+
|
|
584
|
+
### Model Download
|
|
585
|
+
|
|
586
|
+
Use the provided helper script to download the test model:
|
|
587
|
+
|
|
588
|
+
```bash
|
|
589
|
+
# Download tiny-gpt2 (for testing)
|
|
590
|
+
python examples/model_download_helper.py
|
|
591
|
+
|
|
592
|
+
# Or download it manually
|
|
593
|
+
python -c "from transformers import AutoModel; AutoModel.from_pretrained('sshleifer/tiny-gpt2')"
|
|
594
|
+
```
|
|
595
|
+
|
|
596
|
+
## 🔄 Continuous Integration
|
|
597
|
+
|
|
598
|
+
This project uses GitHub Actions for CI/CD:
|
|
599
|
+
|
|
600
|
+
- **Unit tests**: `pytest tests/unit/` runs on every push
|
|
601
|
+
- **Integration tests**: `pytest tests/integration/` runs on every push
|
|
602
|
+
- **Lint checks**: Ruff, Mypy, YAML validation
|
|
603
|
+
- **Coverage**: code coverage is kept above 80%
|
|
604
|
+
|
|
605
|
+
See the CI configuration: [.github/workflows/ci.yml](.github/workflows/ci.yml)
|
|
606
|
+
|
|
607
|
+
## 📋 Version and Changes
|
|
608
|
+
|
|
609
|
+
**Current version**: `0.4.0.17` (Alpha)
|
|
610
|
+
|
|
611
|
+
**Supported Python**: 3.10, 3.11, 3.12
|
|
612
|
+
|
|
613
|
+
**Full changelog**: see [CHANGELOG.md](CHANGELOG.md)
|
|
614
|
+
|
|
615
|
+
**Recent updates** (v0.4.0.17):
|
|
616
|
+
- ✅ Standardized sampling parameters (issue #22): parameter priority system
|
|
617
|
+
- ✅ Expanded decoding strategy tests
|
|
618
|
+
- ✅ Completed integration tests for LLMEngine with the decoding strategies
|
|
619
|
+
- ✅ Decoding strategy usage demo and documentation
|
|
620
|
+
|
|
621
|
+
## 🤝 Contributing
|
|
622
|
+
|
|
623
|
+
Community contributions are welcome! Please follow these steps:
|
|
624
|
+
|
|
625
|
+
1. **Fork** the repository
|
|
626
|
+
2. **Create a feature branch** (`git checkout -b feature/your-feature`)
|
|
627
|
+
3. **Commit your changes** (`git commit -m "feat: description"`)
|
|
628
|
+
4. **Push to the branch** (`git push origin feature/your-feature`)
|
|
629
|
+
5. **Open a Pull Request**
|
|
630
|
+
|
|
631
|
+
### Commit Conventions
|
|
632
|
+
|
|
633
|
+
Use Conventional Commits:
|
|
634
|
+
```
|
|
635
|
+
feat: new feature
|
|
636
|
+
fix: bug fix
|
|
637
|
+
docs: documentation update
|
|
638
|
+
test: test-related changes
|
|
639
|
+
refactor: code refactoring
|
|
640
|
+
perf: performance improvement
|
|
641
|
+
```
|
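A commit message can be checked against the six types listed above with a short script. This is a sketch under assumptions, not the project's actual hook: the names `COMMIT_PATTERN` and `is_conventional` are hypothetical, and accepting an optional `(scope)` follows the general Conventional Commits convention rather than anything stated in this README.

```python
import re

# Types taken from the list above; the optional "(scope)" is an assumption.
COMMIT_PATTERN = re.compile(r"^(feat|fix|docs|test|refactor|perf)(\([\w.-]+\))?: \S.*")


def is_conventional(message: str) -> bool:
    """Check only the first line (the subject) of a commit message."""
    lines = message.splitlines()
    return bool(lines) and COMMIT_PATTERN.match(lines[0]) is not None
```

For example, `is_conventional("feat: add your feature description")` is true, while a subject without a type prefix or without the space after the colon is rejected.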
|
642
|
+
|
|
643
|
+
## 📄 License
|
|
644
|
+
|
|
645
|
+
Proprietary - IntelliStream
|
|
646
|
+
|
|
647
|
+
## 📞 Feedback and Support
|
|
648
|
+
|
|
649
|
+
- 📍 **GitHub Issues**: [Report an issue](https://github.com/intellistream/sagellm-core/issues)
|
|
650
|
+
- 💬 **Discussions**: [Start a discussion](https://github.com/intellistream/sagellm-core/discussions)
|
|
651
|
+
- 📧 **Email**: team@intellistream.ai
|
|
652
|
+
|
|
653
|
+
## Related Resources
|
|
654
|
+
|
|
655
|
+
- 🔗 [Protocol v0.1 documentation](https://github.com/intellistream/sagellm-docs/blob/main/docs/specs/protocol_v0.1.md)
|
|
656
|
+
- 🔗 [sagellm-backend](https://github.com/intellistream/sagellm-backend)
|
|
657
|
+
- 🔗 [sagellm-comm](https://github.com/intellistream/sagellm-comm)
|
|
658
|
+
- 🔗 [sagellm-kv-cache](https://github.com/intellistream/sagellm-kv-cache)
|
|
659
|
+
- 🔗 [Full architecture documentation](docs/ARCHITECTURE.md)
|
|
660
|
+
- 🔗 [Decoding strategies guide](docs/DECODING_STRATEGIES.md)
|
|
661
|
+
|
|
662
|
+
|
|
663
|
+
#### Continuous Integration
|
|
664
|
+
|
|
665
|
+
GitHub Actions automatically runs on each PR:
|
|
666
|
+
- Code linting and formatting checks
|
|
667
|
+
- Tests across Python 3.10, 3.11, 3.12
|
|
668
|
+
- Package build verification
|
|
669
|
+
|
|
670
|
+
### Code Style
|
|
671
|
+
|
|
672
|
+
This project uses:
|
|
673
|
+
- **Ruff** for formatting and linting
|
|
674
|
+
- **Mypy** for type checking
|
|
675
|
+
- **Type hints** are required for all functions
|
|
676
|
+
|
|
677
|
+
For detailed guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md)
|
|
678
|
+
|
|
679
|
+
### Code Checks
|
|
680
|
+
|
|
681
|
+
```bash
|
|
682
|
+
# Format the code
|
|
683
|
+
ruff format .
|
|
684
|
+
|
|
685
|
+
# Lint checks
|
|
686
|
+
ruff check .
|
|
687
|
+
|
|
688
|
+
# Type checking
|
|
689
|
+
mypy src/sagellm_core
|
|
690
|
+
|
|
691
|
+
# Run every check in one go
|
|
692
|
+
pre-commit run --all-files
|
|
693
|
+
```
|
|
694
|
+
|
|
695
|
+
## Dependencies
|
|
696
|
+
|
|
697
|
+
- `pydantic>=2.0.0`: config validation
|
|
698
|
+
- `pyyaml>=6.0.0`: YAML config support
|
|
699
|
+
- `isagellm-protocol>=0.4.0.0,<0.5.0`: protocol definitions
|
|
700
|
+
- `isagellm-backend>=0.4.0.0,<0.5.0`: backend abstraction
|
|
701
|
+
- `isagellm-comm>=0.4.0.0,<0.5.0`: communication backend
|
|
702
|
+
- `isagellm-kv-cache>=0.4.0.0,<0.5.0`: KV cache
|
|
703
|
+
|
|
704
|
+
## Related Packages
|
|
705
|
+
|
|
706
|
+
- `isagellm-protocol` - Protocol definitions (L0)
|
|
707
|
+
- `isagellm-backend` - Backend abstraction layer (L1)
|
|
708
|
+
- `isagellm-comm` - Communication abstraction (L1)
|
|
709
|
+
- `isagellm-kv-cache` - KV Cache management (L1.5)
|
|
710
|
+
- `sagellm-control-plane` - Cross-engine orchestration (L3)
|
|
711
|
+
- `sagellm-gateway` - OpenAI-compatible API (L4)
|
|
712
|
+
|
|
713
|
+
For the complete ecosystem, see [sageLLM organization](https://github.com/intellistream/sagellm)
|
|
714
|
+
|
|
715
|
+
---
|
|
716
|
+
|
|
717
|
+
**Last Updated**: 2026-02-02 | **Status**: Alpha (v0.4.0.17) | **Protocol**: v0.1
|
|
718
|
+
|