@aws/ml-container-creator 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +202 -0
- package/LICENSE-THIRD-PARTY +68620 -0
- package/NOTICE +2 -0
- package/README.md +106 -0
- package/bin/cli.js +365 -0
- package/config/defaults.json +32 -0
- package/config/presets/transformers-djl.json +26 -0
- package/config/presets/transformers-gpu.json +24 -0
- package/config/presets/transformers-lmi.json +27 -0
- package/package.json +129 -0
- package/servers/README.md +419 -0
- package/servers/base-image-picker/catalogs/model-servers.json +1191 -0
- package/servers/base-image-picker/catalogs/python-slim.json +38 -0
- package/servers/base-image-picker/catalogs/triton-backends.json +51 -0
- package/servers/base-image-picker/catalogs/triton.json +38 -0
- package/servers/base-image-picker/index.js +495 -0
- package/servers/base-image-picker/manifest.json +17 -0
- package/servers/base-image-picker/package.json +15 -0
- package/servers/hyperpod-cluster-picker/LICENSE +202 -0
- package/servers/hyperpod-cluster-picker/index.js +424 -0
- package/servers/hyperpod-cluster-picker/manifest.json +14 -0
- package/servers/hyperpod-cluster-picker/package.json +17 -0
- package/servers/instance-recommender/LICENSE +202 -0
- package/servers/instance-recommender/catalogs/instances.json +852 -0
- package/servers/instance-recommender/index.js +284 -0
- package/servers/instance-recommender/manifest.json +16 -0
- package/servers/instance-recommender/package.json +15 -0
- package/servers/lib/LICENSE +202 -0
- package/servers/lib/bedrock-client.js +160 -0
- package/servers/lib/custom-validators.js +46 -0
- package/servers/lib/dynamic-resolver.js +36 -0
- package/servers/lib/package.json +11 -0
- package/servers/lib/schemas/image-catalog.schema.json +185 -0
- package/servers/lib/schemas/instances.schema.json +124 -0
- package/servers/lib/schemas/manifest.schema.json +64 -0
- package/servers/lib/schemas/model-catalog.schema.json +91 -0
- package/servers/lib/schemas/regions.schema.json +26 -0
- package/servers/lib/schemas/triton-backends.schema.json +51 -0
- package/servers/model-picker/catalogs/jumpstart-public.json +66 -0
- package/servers/model-picker/catalogs/popular-diffusors.json +88 -0
- package/servers/model-picker/catalogs/popular-transformers.json +226 -0
- package/servers/model-picker/index.js +1693 -0
- package/servers/model-picker/manifest.json +18 -0
- package/servers/model-picker/package.json +20 -0
- package/servers/region-picker/LICENSE +202 -0
- package/servers/region-picker/catalogs/regions.json +263 -0
- package/servers/region-picker/index.js +230 -0
- package/servers/region-picker/manifest.json +16 -0
- package/servers/region-picker/package.json +15 -0
- package/src/app.js +1007 -0
- package/src/copy-tpl.js +77 -0
- package/src/lib/accelerator-validator.js +39 -0
- package/src/lib/asset-manager.js +385 -0
- package/src/lib/aws-profile-parser.js +181 -0
- package/src/lib/bootstrap-command-handler.js +1647 -0
- package/src/lib/bootstrap-config.js +238 -0
- package/src/lib/ci-register-helpers.js +124 -0
- package/src/lib/ci-report-helpers.js +158 -0
- package/src/lib/ci-stage-helpers.js +268 -0
- package/src/lib/cli-handler.js +529 -0
- package/src/lib/comment-generator.js +544 -0
- package/src/lib/community-reports-validator.js +91 -0
- package/src/lib/config-manager.js +2106 -0
- package/src/lib/configuration-exporter.js +204 -0
- package/src/lib/configuration-manager.js +695 -0
- package/src/lib/configuration-matcher.js +221 -0
- package/src/lib/cpu-validator.js +36 -0
- package/src/lib/cuda-validator.js +57 -0
- package/src/lib/deployment-config-resolver.js +103 -0
- package/src/lib/deployment-entry-schema.js +125 -0
- package/src/lib/deployment-registry.js +598 -0
- package/src/lib/docker-introspection-validator.js +51 -0
- package/src/lib/engine-prefix-resolver.js +60 -0
- package/src/lib/huggingface-client.js +172 -0
- package/src/lib/key-value-parser.js +37 -0
- package/src/lib/known-flags-validator.js +200 -0
- package/src/lib/manifest-cli.js +280 -0
- package/src/lib/mcp-client.js +303 -0
- package/src/lib/mcp-command-handler.js +532 -0
- package/src/lib/neuron-validator.js +80 -0
- package/src/lib/parameter-schema-validator.js +284 -0
- package/src/lib/prompt-runner.js +1349 -0
- package/src/lib/prompts.js +1138 -0
- package/src/lib/registry-command-handler.js +519 -0
- package/src/lib/registry-loader.js +198 -0
- package/src/lib/rocm-validator.js +80 -0
- package/src/lib/schema-validator.js +157 -0
- package/src/lib/sensitive-redactor.js +59 -0
- package/src/lib/template-engine.js +156 -0
- package/src/lib/template-manager.js +341 -0
- package/src/lib/validation-engine.js +314 -0
- package/src/prompt-adapter.js +63 -0
- package/templates/Dockerfile +300 -0
- package/templates/IAM_PERMISSIONS.md +84 -0
- package/templates/MIGRATION.md +488 -0
- package/templates/PROJECT_README.md +439 -0
- package/templates/TEMPLATE_SYSTEM.md +243 -0
- package/templates/buildspec.yml +64 -0
- package/templates/code/chat_template.jinja +1 -0
- package/templates/code/flask/gunicorn_config.py +35 -0
- package/templates/code/flask/wsgi.py +10 -0
- package/templates/code/model_handler.py +387 -0
- package/templates/code/serve +300 -0
- package/templates/code/serve.py +175 -0
- package/templates/code/serving.properties +105 -0
- package/templates/code/start_server.py +39 -0
- package/templates/code/start_server.sh +39 -0
- package/templates/diffusors/Dockerfile +72 -0
- package/templates/diffusors/patch_image_api.py +35 -0
- package/templates/diffusors/serve +115 -0
- package/templates/diffusors/start_server.sh +114 -0
- package/templates/do/.gitkeep +1 -0
- package/templates/do/README.md +541 -0
- package/templates/do/build +83 -0
- package/templates/do/ci +681 -0
- package/templates/do/clean +811 -0
- package/templates/do/config +260 -0
- package/templates/do/deploy +1560 -0
- package/templates/do/export +306 -0
- package/templates/do/logs +319 -0
- package/templates/do/manifest +12 -0
- package/templates/do/push +119 -0
- package/templates/do/register +580 -0
- package/templates/do/run +113 -0
- package/templates/do/submit +417 -0
- package/templates/do/test +1147 -0
- package/templates/hyperpod/configmap.yaml +24 -0
- package/templates/hyperpod/deployment.yaml +71 -0
- package/templates/hyperpod/pvc.yaml +42 -0
- package/templates/hyperpod/service.yaml +17 -0
- package/templates/nginx-diffusors.conf +74 -0
- package/templates/nginx-predictors.conf +47 -0
- package/templates/nginx-tensorrt.conf +74 -0
- package/templates/requirements.txt +61 -0
- package/templates/sample_model/test_inference.py +123 -0
- package/templates/sample_model/train_abalone.py +252 -0
- package/templates/test/test_endpoint.sh +79 -0
- package/templates/test/test_local_image.sh +80 -0
- package/templates/test/test_model_handler.py +180 -0
- package/templates/triton/Dockerfile +128 -0
- package/templates/triton/config.pbtxt +163 -0
- package/templates/triton/model.py +130 -0
- package/templates/triton/requirements.txt +11 -0
+++ package/src/prompt-adapter.js
@@ -0,0 +1,63 @@
+// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+import { select, input, confirm, checkbox, number, Separator } from '@inquirer/prompts'
+
+/**
+ * Maps Yeoman prompt type names to @inquirer/prompts runner functions.
+ */
+const runners = { list: select, select, input, confirm, checkbox, number }
+
+/**
+ * Runs a sequence of Yeoman-style prompt definitions using @inquirer/prompts.
+ *
+ * Handles:
+ * - Type mapping (list → select)
+ * - Conditional prompts via `when` function
+ * - Dynamic `choices`, `default`, and `message` (functions resolved with current answers)
+ * - Separator mapping from Yeoman format to @inquirer/prompts Separator
+ * - Validate function passthrough
+ *
+ * @param {Array<object>} prompts - Array of Yeoman-style prompt definitions
+ * @param {object} [previousAnswers={}] - Answers from prior prompt phases
+ * @param {object} [options={}] - Options for dependency injection
+ * @param {object} [options.runners] - Override prompt runners (useful for testing)
+ * @returns {Promise<object>} Accumulated answers keyed by prompt name
+ */
+export async function runPrompts(prompts, previousAnswers = {}, options = {}) {
+  const promptRunners = options.runners || runners
+  const answers = { ...previousAnswers }
+
+  for (const prompt of prompts) {
+    if (prompt.when && !prompt.when(answers)) continue
+
+    const type = prompt.type === 'list' ? 'select' : prompt.type
+    const runner = promptRunners[type]
+
+    if (!runner) {
+      throw new Error(`Unsupported prompt type: "${prompt.type}"`)
+    }
+
+    const message = typeof prompt.message === 'function'
+      ? prompt.message(answers) : prompt.message
+    const choices = typeof prompt.choices === 'function'
+      ? prompt.choices(answers) : prompt.choices
+    const defaultVal = typeof prompt.default === 'function'
+      ? prompt.default(answers) : prompt.default
+
+    const mappedChoices = choices?.map(c =>
+      c && c.type === 'separator'
+        ? new Separator(c.separator || c.line)
+        : c
+    )
+
+    const config = { message }
+    if (mappedChoices !== undefined) config.choices = mappedChoices
+    if (defaultVal !== undefined) config.default = defaultVal
+    if (prompt.validate) config.validate = prompt.validate
+
+    answers[prompt.name] = await runner(config)
+  }
+
+  return answers
+}
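The adapter above exposes an `options.runners` hook for dependency injection. A minimal sketch of exercising that contract non-interactively: the prompt definitions and stub runners here are hypothetical, and the adapter loop is condensed from the source above rather than imported.

```javascript
// Condensed re-statement of the adapter's loop for a self-contained demo
// (choices/default/validate handling omitted; see the full source above).
async function runPrompts(prompts, previousAnswers = {}, options = {}) {
  const promptRunners = options.runners // stubs are always injected here
  const answers = { ...previousAnswers }
  for (const prompt of prompts) {
    if (prompt.when && !prompt.when(answers)) continue
    // Yeoman 'list' maps to the @inquirer/prompts 'select' runner
    const type = prompt.type === 'list' ? 'select' : prompt.type
    const runner = promptRunners[type]
    if (!runner) throw new Error(`Unsupported prompt type: "${prompt.type}"`)
    // Dynamic messages are resolved against the answers collected so far
    const message = typeof prompt.message === 'function'
      ? prompt.message(answers) : prompt.message
    answers[prompt.name] = await runner({ message })
  }
  return answers
}

// Hypothetical Yeoman-style prompt definitions
const prompts = [
  { type: 'list', name: 'framework', message: 'Which framework?' },
  { type: 'input', name: 'modelName',
    message: answers => `Model name for ${answers.framework}:`,
    when: answers => answers.framework === 'transformers' },
]

// Stub runners stand in for @inquirer/prompts, so no terminal is needed
const stubRunners = {
  select: async () => 'transformers',
  input: async ({ message }) => `got: ${message}`,
}

runPrompts(prompts, {}, { runners: stubRunners })
  .then(answers => console.log(answers))
// { framework: 'transformers', modelName: 'got: Model name for transformers:' }
```

The second prompt fires only because `when` sees the first answer, which is why the adapter accumulates answers as it goes instead of collecting them at the end.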
+++ package/templates/Dockerfile
@@ -0,0 +1,300 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+<% if (comments && comments.acceleratorInfo) { %>
+<%= comments.acceleratorInfo %>
+<% } %>
+
+<% if (comments && comments.validationInfo) { %>
+<%= comments.validationInfo %>
+<% } %>
+
+<% if (framework !== 'transformers') { %>
+FROM <%= baseImage || 'python:3.12-slim' %>
+
+# Set a docker label to name this project, suffixed with the build time
+LABEL project.name="<%= projectName %>-<%= buildTimestamp %>" \
+      project.base-name="<%= projectName %>" \
+      project.build-time="<%= buildTimestamp %>"
+
+# Set a docker label to advertise multi-model support on the container
+LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
+# Set a docker label so the container honors the SAGEMAKER_BIND_TO_PORT environment variable if present
+LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
+
+# Set working directory
+WORKDIR /opt/ml
+
+RUN apt-get update && \
+    apt-get upgrade -y && \
+    apt-get clean
+
+# Install system dependencies
+RUN apt-get install -y --no-install-recommends \
+    build-essential \
+    ca-certificates \
+    curl \
+    git \
+    nginx \
+    && rm -rf /var/lib/apt/lists/* \
+    && apt-get clean
+
+# Install Python dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir pip==24.2 && \
+    pip install --no-cache-dir -r requirements.txt
+
+# Copy model serving code
+COPY code/serve.py code/
+COPY code/start_server.py code/
+COPY code/model_handler.py code/
+
+<% if (modelServer === 'flask') { %>
+COPY code/flask/gunicorn_config.py code/
+COPY code/flask/wsgi.py code/
+<% } %>
+
+
+# Set up SageMaker directories
+RUN mkdir -p /opt/ml/input/data \
+    && mkdir -p /opt/ml/output/data \
+    && mkdir -p /opt/ml/model
+
+COPY nginx-predictors.conf /etc/nginx/nginx.conf
+
+# Model files will be provided at runtime via SageMaker model artifacts
+<% if (includeSampleModel) { %>
+# Copy the generated sample model
+<% if (modelFormat === 'SavedModel') { %>
+COPY sample_model/abalone_model /opt/ml/model/
+<% } else { %>
+COPY sample_model/abalone_model.<%= modelFormat %> /opt/ml/model/
+<% } %>
+# Also copy training script for reference
+COPY sample_model/ /opt/ml/sample_model/
+<% } else { %>
+# COPY your_model_files /opt/ml/model/
+<% } %>
+
+<% if (comments && comments.envVarExplanations && Object.keys(comments.envVarExplanations).length > 0) { %>
+# Environment Variables Configuration
+<% for (const [category, comment] of Object.entries(comments.envVarExplanations)) { %>
+<%= comment %>
+<% } %>
+<% } %>
+
+# Set environment variables for SageMaker
+ENV PYTHONPATH=/opt/ml/code
+ENV SAGEMAKER_BIND_TO_PORT=8080
+
+<% if (orderedEnvVars && orderedEnvVars.length > 0) { %>
+# Additional environment variables from configuration
+<% orderedEnvVars.forEach(({ key, value }) => { %>
+ENV <%= key %>=<%= value %>
+<% }); %>
+<% } %>
+
+# Expose port 8080 for SageMaker inference
+EXPOSE 8080
+
+<% if (comments && comments.troubleshooting) { %>
+<%= comments.troubleshooting %>
+<% } %>
+
+# Set the inference script as the entry point
+RUN chmod +x code/start_server.py
+ENTRYPOINT ["python", "/opt/ml/code/start_server.py"]
+<% } else { %>
+<% if (comments && comments.acceleratorInfo) { %>
+<%= comments.acceleratorInfo %>
+<% } %>
+
+<% if (comments && comments.validationInfo) { %>
+<%= comments.validationInfo %>
+<% } %>
+
+<% if (modelServer === 'vllm') { %>
+# https://github.com/aws-samples/sagemaker-genai-hosting-examples/tree/main/OpenAI/gpt-oss/deploy/docker
+ARG BASE_IMAGE=<%= baseImage || 'vllm/vllm-openai:v0.10.1' %>
+<% } else if (modelServer === 'sglang') { %>
+ARG BASE_IMAGE=<%= baseImage || 'lmsysorg/sglang:v0.5.4.post1' %>
+<% } else if (modelServer === 'tensorrt-llm') { %>
+# TensorRT-LLM requires NVIDIA NGC authentication
+# Before building, authenticate with NGC:
+# 1. Create NGC account: https://ngc.nvidia.com/signup
+# 2. Generate API key: https://ngc.nvidia.com/setup/api-key
+# 3. Login: docker login nvcr.io
+#    Username: $oauthtoken
+#    Password: <your-ngc-api-key>
+# Using a stable release for better SageMaker compatibility
+ARG BASE_IMAGE=<%= baseImage || 'nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc8' %>
+<% } else if (modelServer === 'lmi') { %>
+# AWS Large Model Inference (LMI) Container
+# LMI containers are pre-built by AWS and include DJL Serving with optimized inference libraries
+# Available backends: vLLM, TensorRT-LLM, LMI-Dist (DeepSpeed), Transformers NeuronX
+# Documentation: https://docs.djl.ai/master/docs/serving/serving/docs/lmi/index.html
+ARG BASE_IMAGE=<%= baseImage || '763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.32.0-lmi14.0.0-cu126' %>
+<% } else if (modelServer === 'djl') { %>
+# DJL Serving Container
+# Deep Java Library serving with support for multiple inference backends
+# Documentation: https://djl.ai/
+ARG BASE_IMAGE=<%= baseImage || 'deepjavalibrary/djl-serving:0.36.0-pytorch-gpu' %>
+<% } %>
+
+FROM ${BASE_IMAGE}
+
+<% if (comments && comments.chatTemplate) { %>
+<%= comments.chatTemplate %>
+<% } %>
+
+# Model source metadata
+ENV MODEL_SOURCE="<%= (typeof modelSource !== 'undefined' && modelSource) ? modelSource : 'huggingface' %>"
+<% if (typeof artifactUri !== 'undefined' && artifactUri) { %>
+ENV MODEL_ARTIFACT_URI="<%= artifactUri %>"
+<% } %>
+
+# Set the model name for the transformer model
+<% if (modelServer === 'vllm') { %>
+<% if (typeof modelLoadStrategy !== 'undefined' && modelLoadStrategy === 'build-time' && (typeof modelSource === 'undefined' || !modelSource || modelSource === 'huggingface')) { %>
+ENV VLLM_MODEL="/opt/ml/model"
+<% } else { %>
+ENV VLLM_MODEL="<%= modelName %>"
+<% } %>
+<% if (typeof modelSource !== 'undefined' && modelSource && modelSource !== 'huggingface') { %>
+# Model will be resolved at container startup by the serve script
+<% } %>
+<% } else if (modelServer === 'sglang') { %>
+<% if (typeof modelLoadStrategy !== 'undefined' && modelLoadStrategy === 'build-time' && (typeof modelSource === 'undefined' || !modelSource || modelSource === 'huggingface')) { %>
+ENV SGLANG_MODEL_PATH="/opt/ml/model"
+<% } else { %>
+ENV SGLANG_MODEL_PATH="<%= modelName %>"
+<% } %>
+<% if (typeof modelSource !== 'undefined' && modelSource && modelSource !== 'huggingface') { %>
+# Model will be resolved at container startup by the serve script
+<% } %>
+<% } else if (modelServer === 'tensorrt-llm') { %>
+<% if (typeof modelLoadStrategy !== 'undefined' && modelLoadStrategy === 'build-time' && (typeof modelSource === 'undefined' || !modelSource || modelSource === 'huggingface')) { %>
+ENV TRTLLM_MODEL="/opt/ml/model"
+<% } else { %>
+ENV TRTLLM_MODEL="<%= modelName %>"
+<% } %>
+<% if (typeof modelSource !== 'undefined' && modelSource && modelSource !== 'huggingface') { %>
+# Model will be resolved at container startup by the serve script
+<% } %>
+
+# Disable UCX CUDA transport to avoid symbol lookup errors
+# The UCX CUDA library has compatibility issues with some CUDA versions in SageMaker
+RUN if [ -f /usr/local/ucx/lib/ucx/libuct_cuda.so.0 ]; then \
+      mv /usr/local/ucx/lib/ucx/libuct_cuda.so.0 /usr/local/ucx/lib/ucx/libuct_cuda.so.0.disabled; \
+    fi
+
+ENV UCX_TLS=tcp,self,sm
+ENV UCX_NET_DEVICES=all
+ENV NCCL_IB_DISABLE=1
+ENV NCCL_P2P_DISABLE=1
+<% } else if (modelServer === 'lmi' || modelServer === 'djl') { %>
+# LMI/DJL Configuration
+# Model configuration is done via serving.properties file
+# The model will be loaded from HuggingFace Hub or S3
+ENV HF_MODEL_ID="<%= modelName %>"
+<% if (typeof modelSource !== 'undefined' && modelSource && modelSource !== 'huggingface') { %>
+# Model will be resolved at container startup by the serve script
+<% } %>
+
+# DJL Serving listens on port 8080 by default (SageMaker requirement)
+ENV SERVING_PORT=8080
+<% } %>
+
+<% if (hfToken && (!modelSource || modelSource === 'huggingface')) { %>
+# Set HuggingFace authentication token
+ENV HF_TOKEN="<%= hfToken %>"
+<% } %>
+
+<% if (chatTemplate) { %>
+# Chat template configuration
+# This template formats chat messages for the model
+# Writing to file to avoid shell escaping issues with multi-line Jinja2 templates
+COPY code/chat_template.jinja /opt/ml/chat_template.jinja
+ENV SGLANG_CHAT_TEMPLATE="/opt/ml/chat_template.jinja"
+<% } %>
+
+<% if (comments && comments.envVarExplanations && Object.keys(comments.envVarExplanations).length > 0) { %>
+# Environment Variables Configuration
+<% for (const [category, comment] of Object.entries(comments.envVarExplanations)) { %>
+<%= comment %>
+<% } %>
+<% } %>
+
+<% if (orderedEnvVars && orderedEnvVars.length > 0) { %>
+# Additional environment variables from configuration
+<% orderedEnvVars.forEach(({ key, value }) => { %>
+ENV <%= key %>=<%= value %>
+<% }); %>
+<% } %>
+
+<% if (typeof modelSource !== 'undefined' && modelSource && modelSource !== 'huggingface' && modelServer !== 'lmi' && modelServer !== 'djl') { %>
+# Install AWS CLI for S3 model downloads
+RUN pip install --no-cache-dir awscli
+
+<% } %>
+<% if (typeof modelLoadStrategy !== 'undefined' && modelLoadStrategy === 'build-time') { %>
+# Build-time model download
+# ⚠️ Credentials required during docker build.
+# Build with: DOCKER_BUILDKIT=1 docker build --secret id=aws,src=$HOME/.aws/credentials .
+# Or in CodeBuild: pass AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN as build args.
+<% if (typeof modelSource === 'undefined' || !modelSource || modelSource === 'huggingface') { %>
+ARG HF_TOKEN
+RUN huggingface-cli download <%= modelName %> --local-dir /opt/ml/model
+<% } else if (typeof artifactUri !== 'undefined' && artifactUri) { %>
+ARG AWS_ACCESS_KEY_ID
+ARG AWS_SECRET_ACCESS_KEY
+ARG AWS_SESSION_TOKEN
+ARG AWS_DEFAULT_REGION=<%= (typeof awsRegion !== 'undefined' && awsRegion) ? awsRegion : 'us-east-1' %>
+RUN AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
+    AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
+    AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN} \
+    AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION} \
+    aws s3 sync <%= artifactUri %> /opt/ml/model
+<% } %>
+<% } %>
+<% if (modelServer === 'tensorrt-llm') { %>
+# Install nginx and curl for reverse proxy and health checks
+RUN apt-get update && \
+    apt-get install -y nginx curl && \
+    rm -rf /var/lib/apt/lists/*
+
+# Copy nginx configuration for TensorRT-LLM
+COPY nginx-tensorrt.conf /etc/nginx/nginx.conf
+
+# Copy TensorRT-LLM serve script
+COPY code/serve /usr/bin/serve_trtllm
+RUN chmod +x /usr/bin/serve_trtllm
+
+# Copy startup script
+COPY code/start_server.sh /usr/bin/start_server.sh
+RUN chmod +x /usr/bin/start_server.sh
+
+ENTRYPOINT [ "/usr/bin/start_server.sh" ]
+<% } else if (modelServer === 'lmi' || modelServer === 'djl') { %>
+# Create serving.properties configuration file for LMI/DJL
+RUN mkdir -p /opt/ml/model
+COPY code/serving.properties /opt/ml/model/serving.properties
+
+<% if (comments && comments.troubleshooting) { %>
+<%= comments.troubleshooting %>
+<% } %>
+
+# LMI/DJL containers use their own entrypoint
+# The container will automatically start DJL Serving with the configuration
+<% } else { %>
+COPY code/serve /usr/bin/serve
+RUN chmod 777 /usr/bin/serve
+
+<% if (comments && comments.troubleshooting) { %>
+<%= comments.troubleshooting %>
+<% } %>
+
+ENTRYPOINT [ "/usr/bin/serve" ]
+<% } %>
+
+<% } %>
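The per-server `ENV` branching in the Dockerfile template above applies one rule three times (for `VLLM_MODEL`, `SGLANG_MODEL_PATH`, and `TRTLLM_MODEL`). A plain-JavaScript sketch of that rule, as a hypothetical helper that is not part of the package: a model baked in at build time from the HuggingFace Hub is served from `/opt/ml/model`, while any other combination passes the model name through for resolution at container startup.

```javascript
// Sketch of the template's model-path decision, condensed from the EJS
// conditionals above. Absence of modelSource counts as 'huggingface',
// matching the `typeof modelSource === 'undefined' || !modelSource` checks.
function resolveServedModel(modelName, { modelLoadStrategy, modelSource } = {}) {
  const fromHuggingFace = !modelSource || modelSource === 'huggingface'
  if (modelLoadStrategy === 'build-time' && fromHuggingFace) {
    return '/opt/ml/model' // weights were downloaded into the image at build time
  }
  return modelName // the serve script resolves this at container startup
}

// 'openai/gpt-oss-20b' is an illustrative model ID, not a package default.
console.log(resolveServedModel('openai/gpt-oss-20b',
  { modelLoadStrategy: 'build-time', modelSource: 'huggingface' })) // /opt/ml/model
console.log(resolveServedModel('openai/gpt-oss-20b',
  { modelLoadStrategy: 'runtime', modelSource: 's3' }))             // openai/gpt-oss-20b
```

Keeping this rule identical across servers is what lets the template swap only the environment variable name per `modelServer` branch.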
+++ package/templates/IAM_PERMISSIONS.md
@@ -0,0 +1,84 @@
+# IAM Permissions — <%= projectName %>
+
+## Overview
+
+This project uses three sets of IAM permissions:
+
+1. **SageMaker Execution Role** — created automatically by `bootstrap` via CloudFormation
+2. **CodeBuild Service Role** — created automatically by `./do/submit`
+3. **User/CI Permissions** — your AWS user or CI system needs these to run the do-scripts
+
+## SageMaker Execution Role
+
+The bootstrap command creates an IAM role (`mlcc-sagemaker-execution-role`) with permissions for:
+
+- **SageMaker**: Create, update, delete, and invoke endpoints, endpoint configs, models, and inference components
+- **ECR**: Pull images from the `ml-container-creator` repository
+- **CloudWatch Logs**: Write container logs
+- **S3**: Read model artifacts from `ml-container-creator-*` buckets
+
+The role is defined in the CloudFormation stack template (`config/bootstrap-stack.json`) and updated automatically when you re-run bootstrap after upgrading.
+
+If you use a custom role (`--role-arn`), ensure it has at minimum:
+
+| Permission | Purpose |
+|-----------|---------|
+| `sagemaker:CreateEndpoint`, `CreateEndpointConfig`, `CreateModel`, `CreateInferenceComponent` | Deploy |
+| `sagemaker:DeleteEndpoint`, `DeleteEndpointConfig`, `DeleteModel`, `DeleteInferenceComponent` | Clean up |
+| `sagemaker:DescribeEndpoint`, `DescribeEndpointConfig`, `DescribeModel`, `DescribeInferenceComponent` | Status checks |
+| `sagemaker:InvokeEndpoint`, `InvokeEndpointAsync` | Inference |
+| `sagemaker:UpdateEndpoint`, `UpdateEndpointWeightsAndCapacities`, `UpdateInferenceComponent` | Updates |
+| `ecr:GetAuthorizationToken`, `BatchGetImage`, `GetDownloadUrlForLayer`, `BatchCheckLayerAvailability` | Pull container image |
+| `logs:CreateLogGroup`, `CreateLogStream`, `PutLogEvents` | Container logging |
+| `s3:GetObject`, `s3:ListBucket` on `ml-container-creator-*` | Model artifact access |
+
+Trust policy must allow `sagemaker.amazonaws.com` to assume the role.
+
+## CodeBuild Service Role
+
+Created automatically by `./do/submit` as `<%= codebuildProjectName %>-service-role`. Permissions:
+
+- **CloudWatch Logs**: Write build logs to `/aws/codebuild/<%= codebuildProjectName %>*`
+- **ECR**: Push images to `ml-container-creator` repository
+- **S3**: Read source archives from `codebuild-source-*` buckets
+
+## User/CI Permissions
+
+Your AWS user or CI system needs these permissions to run the do-scripts:
+
+| Script | Permissions Needed |
+|--------|-------------------|
+| `./do/push` | `ecr:GetAuthorizationToken`, `ecr:PutImage`, `ecr:InitiateLayerUpload`, `ecr:UploadLayerPart`, `ecr:CompleteLayerUpload`, `ecr:BatchCheckLayerAvailability` |
+| `./do/submit` | `codebuild:CreateProject`, `codebuild:StartBuild`, `codebuild:BatchGetBuilds`, `iam:CreateRole`, `iam:PutRolePolicy`, `iam:PassRole`, `s3:PutObject`, `s3:CreateBucket` |
+| `./do/deploy` | `sagemaker:CreateEndpointConfig`, `sagemaker:CreateEndpoint`, `sagemaker:CreateInferenceComponent`, `sagemaker:DescribeEndpoint`, `iam:PassRole` |
+| `./do/clean` | `sagemaker:DeleteEndpoint`, `sagemaker:DeleteEndpointConfig`, `sagemaker:DeleteInferenceComponent`, `codebuild:DeleteProject`, `iam:DeleteRole`, `iam:DeleteRolePolicy` |
+| `./do/test` | `sagemaker-runtime:InvokeEndpoint` |
+| `bootstrap` | `cloudformation:*`, `iam:CreateRole`, `iam:PutRolePolicy`, `iam:TagRole`, `ecr:CreateRepository`, `s3:CreateBucket` (and `sts:GetCallerIdentity`) |
+
+<% if (framework === 'transformers' && hfToken) { %>
+## HuggingFace Token Security
+
+This project includes a HuggingFace token baked into the Docker image. Key practices:
+
+- **Use read-only tokens** — never bake write tokens into containers
+- **Rotate regularly** — every 30–90 days, or immediately if compromised
+- **Restrict ECR access** — limit who can pull images containing the token
+- **Consider runtime injection** — pass `HF_TOKEN` as a SageMaker environment variable instead of baking it in (avoids token in image layers, enables rotation without rebuild)
+
+To rotate: generate a new token on [HuggingFace](https://huggingface.co/settings/tokens), rebuild with `./do/submit`, revoke the old token.
+
+If compromised: revoke the token immediately, delete the ECR image (`aws ecr batch-delete-image`), rebuild, and review CloudTrail logs.
+<% } %>
+
+## Security Best Practices
+
+- **Least privilege**: All roles are scoped to specific resources where possible
+- **Resource scoping**: CodeBuild permissions scoped to `<%= codebuildProjectName %>`, SageMaker to `<%= projectName %>*`
+- **Audit**: Enable CloudTrail for IAM, SageMaker, ECR, and CodeBuild events
+- **Separate environments**: Consider per-environment roles (dev/prod)
+
+## References
+
+- [SageMaker Execution Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html)
+- [CodeBuild Service Role](https://docs.aws.amazon.com/codebuild/latest/userguide/setting-up.html#setting-up-service-role)
+- [ECR Permissions](https://docs.aws.amazon.com/AmazonECR/latest/userguide/security_iam_service-with-iam.html)
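As a concrete starting point for the custom-role table above, here is a sketch of what a minimal execution-role identity policy might look like. The statement grouping and `Resource` values are illustrative assumptions; the CloudFormation stack template (`config/bootstrap-stack.json`) is the authoritative definition.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SageMakerLifecycleAndInference",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateEndpoint", "sagemaker:CreateEndpointConfig",
        "sagemaker:CreateModel", "sagemaker:CreateInferenceComponent",
        "sagemaker:DeleteEndpoint", "sagemaker:DeleteEndpointConfig",
        "sagemaker:DeleteModel", "sagemaker:DeleteInferenceComponent",
        "sagemaker:DescribeEndpoint", "sagemaker:DescribeEndpointConfig",
        "sagemaker:DescribeModel", "sagemaker:DescribeInferenceComponent",
        "sagemaker:InvokeEndpoint", "sagemaker:InvokeEndpointAsync",
        "sagemaker:UpdateEndpoint", "sagemaker:UpdateEndpointWeightsAndCapacities",
        "sagemaker:UpdateInferenceComponent"
      ],
      "Resource": "*"
    },
    {
      "Sid": "PullContainerImage",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken", "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer", "ecr:BatchCheckLayerAvailability"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ContainerLogging",
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "*"
    },
    {
      "Sid": "ModelArtifactAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::ml-container-creator-*",
        "arn:aws:s3:::ml-container-creator-*/*"
      ]
    }
  ]
}
```

In practice, scope `Resource` to the specific endpoint, repository, and log-group ARNs as the best-practices section recommends; `ecr:GetAuthorizationToken` is the one action here that requires `"Resource": "*"`.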