aws-cdk-neuronx-patterns 0.0.4 → 0.0.6

package/.jsii CHANGED
@@ -3851,7 +3851,7 @@
  },
  "name": "aws-cdk-neuronx-patterns",
  "readme": {
- "markdown": "# Neuronx patterns Construct Library\n\n> [!WARNING]\n> This library is experimental module.\n\nThis library provides high-level architectural patterns using neuronx (e.g. Inferentia2 and Trainium1). It contains:\n\n- Transformers Neuronx SageMaker Real-time Inference Endpoint\n- Neuronx Compile\n\n## Transformers Neuronx SageMaker Real-time Inference Endpoint\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on SageMaker. You may need to increase your request limit for your AWS account.\n\nBy using the `NeuronxCompile` construct included in this construct library, models published on HuggingFace can be easily deployed to SageMaker Real-time inference. To define using the `NeuronxCompile` construct:\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nconst compile = new NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n});\nnew TransformersNeuronxSageMakerRealtimeInferenceEndpoint(\n this,\n \"RealtimeInference\",\n {\n modelData:\n TransformersNeuronxSageMakerInferenceModelData.fromNeuronxCompile(\n compile,\n ),\n },\n);\n```\n\n### Default inference code\n\nBy default, default inference code is deployed to implement the chat interface. The default inference code takes an object array like [transformers' conversations](https://huggingface.co/docs/transformers/main/en/conversations) and responds to the generated text. The following code is an example using the AWS SDK for JavaScript v3.\n\n```ts\nimport {\n InvokeEndpointCommand,\n SageMakerRuntimeClient,\n} from \"@aws-sdk/client-sagemaker-runtime\";\n\nconst client = new SageMakerRuntimeClient({\n region: \"us-east-1\",\n});\nclient\n .send(\n new InvokeEndpointCommand({\n EndpointName: \"my-endpoint-id\",\n Body: JSON.stringify({\n // Optional. 
You can change answer role.\n role: \"ai\",\n // Require. The messages like conversation.\n messages: [\n {\n role: \"system\",\n content: `You are helpfull assistant.`,\n },\n {\n role: \"user\",\n content:\n \"please answer '1+1=?'. You must answer only answer numeric.\",\n },\n ],\n }),\n ContentType: \"application/json\",\n Accept: \"application/json\",\n }),\n )\n .then((res) => {\n // { generated_text: \"2\" }\n console.log(JSON.parse(res.Body.transformToString()));\n });\n```\n\nTo change your own inference code, you can pass the code source.\n\n```ts\nimport * as s3Deplyment from \"aws-cdk-lib/aws-s3-deployment\";\n\ndeclare const compile: NeuronxCompile;\nnew TransformersNeuronxSageMakerRealtimeInferenceEndpoint(\n this,\n \"RealtimeInference\",\n {\n modelData:\n TransformersNeuronxSageMakerInferenceModelData.fromNeuronxCompile(\n compile,\n s3Deplyment.Source.asset(\"path/to/my/code/directory\"),\n ),\n },\n);\n```\n\n## Neuronx Compile\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on EC2. You may need to increase your request limit for your AWS account.\n\nThis construct compiles models supported by Neuronx and uploads them to the specified S3 bucket. 
To define\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nconst compile = new NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n});\n\n// Get the compiled artifacts from this S3 URL\nnew CfnOutput(this, \"CompiledArtifact\", {\n value: compile.compiledArtifactS3Url,\n});\n```\n\nThis construct assumes the required instance type depending on the number of model parameters.\n\nAfter compiled, you can see like the this file tree in the S3 bucket.\n\n```txt\n{compiledArtifactS3Url}/\n├── model\n│ ├── config.json\n│ ├── tokenizer_config.json\n│ ├── xxx.safetensors\n│ └── xxx.safetensors\n└── compiled\n ├── xxx.neff\n ├── xxx.neff\n └── xxx.neff\n```\n\nThis is NeuronxCompile architecture.\n![NeuronxCompile architecture](./docs/neuronx-compile-architecture.png)\n\n### Spot Instance\n\n> [!WARNING]\n> If you use Spot Instances, check if the request limit for Spot has been increased.\n\nYou can also use Spot Instances.\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nnew NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n spot: true,\n});\n```\n\n### Compile Options\n\nIf you are familiar with Neuronx, you can also specify compilation options to better meet your requirements.\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nnew NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-22b-chat\", {\n parameters: Parameters.billion(22),\n }),\n 
compileOptions: {\n nPositions: 1024,\n quantDtype: QuantDtype.S8,\n optLevel: OptLevel.MODEL_EXECUTION_PERFORMANCE,\n },\n});\n```\n"
+ "markdown": "# Neuronx patterns Construct Library\n\n> [!WARNING]\n> This library is experimental module.\n\nThis library provides high-level architectural patterns using neuronx (e.g. Inferentia2 and Trainium1). It contains:\n\n- Transformers Neuronx SageMaker Real-time Inference Endpoint\n- Neuronx Compile\n\n## Transformers Neuronx SageMaker Real-time Inference Endpoint\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on SageMaker. You may need to increase your request limit for your AWS account.\n\nBy using the `NeuronxCompile` construct included in this construct library, models published on HuggingFace can be easily deployed to SageMaker Real-time inference. To define using the `NeuronxCompile` construct:\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nconst compile = new NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\"),\n});\nnew TransformersNeuronxSageMakerRealtimeInferenceEndpoint(\n this,\n \"RealtimeInference\",\n {\n modelData:\n TransformersNeuronxSageMakerInferenceModelData.fromNeuronxCompile(\n compile,\n ),\n },\n);\n```\n\nThis is TransformersNeuronxSageMakerRealtimeInferenceEndpoint architecture.\n![TransformersNeuronxSageMakerRealtimeInferenceEndpoint architecture](./docs/transformers-neuronx-sagemaker-realtime-inference-architecture.png)\n\n### Default inference code\n\nBy default, default inference code is deployed to implement the chat interface. The default inference code takes an object array like [transformers' conversations](https://huggingface.co/docs/transformers/main/en/conversations) and responds to the generated text. 
The following code is an example using the AWS SDK for JavaScript v3.\n\n```ts\nimport {\n InvokeEndpointCommand,\n SageMakerRuntimeClient,\n} from \"@aws-sdk/client-sagemaker-runtime\";\n\nconst client = new SageMakerRuntimeClient({\n region: \"us-east-1\",\n});\nclient\n .send(\n new InvokeEndpointCommand({\n EndpointName: \"my-endpoint-id\",\n Body: JSON.stringify({\n // Optional. You can change answer role.\n role: \"ai\",\n // Require. The messages like conversation.\n messages: [\n {\n role: \"system\",\n content: `You are helpfull assistant.`,\n },\n {\n role: \"user\",\n content:\n \"please answer '1+1=?'. You must answer only answer numeric.\",\n },\n ],\n }),\n ContentType: \"application/json\",\n Accept: \"application/json\",\n }),\n )\n .then((res) => {\n // { generated_text: \"2\" }\n console.log(JSON.parse(res.Body.transformToString()));\n });\n```\n\nTo use your own inference code, you can pass the code to model data option.\n\n```ts\nimport * as s3Deplyment from \"aws-cdk-lib/aws-s3-deployment\";\n\ndeclare const compile: NeuronxCompile;\nnew TransformersNeuronxSageMakerRealtimeInferenceEndpoint(\n this,\n \"RealtimeInference\",\n {\n modelData:\n TransformersNeuronxSageMakerInferenceModelData.fromNeuronxCompile(\n compile,\n s3Deplyment.Source.asset(\"path/to/my/code/directory\"),\n ),\n },\n);\n```\n\n## Neuronx Compile\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on EC2. You may need to increase your request limit for your AWS account.\n\nThis construct compiles models supported by Neuronx and uploads them to the specified S3 bucket. 
To define\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nconst compile = new NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\"),\n});\n\n// Get the compiled artifacts from this S3 URL\nnew CfnOutput(this, \"CompiledArtifact\", {\n value: compile.compiledArtifactS3Url,\n});\n```\n\nThis construct assumes the required instance type depending on the number of model parameters.\n\nAfter compiled, you can see like the this file tree in the S3 bucket.\n\n```txt\n{compiledArtifactS3Url}/\n├── model\n│ ├── config.json\n│ ├── tokenizer_config.json\n│ ├── xxx.safetensors\n│ └── xxx.safetensors\n└── compiled\n ├── xxx.neff\n ├── xxx.neff\n └── xxx.neff\n```\n\nThis is NeuronxCompile architecture.\n![NeuronxCompile architecture](./docs/neuronx-compile-architecture.png)\n\n### Spot Instance\n\n> [!WARNING]\n> If you use Spot Instances, check if the request limit for Spot has been increased.\n\nYou can also use Spot Instances.\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nnew NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n spot: true,\n});\n```\n\n### Compile Options\n\nIf you are familiar with Neuronx, you can also specify compilation options to better meet your requirements.\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nnew NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-22b-chat\"),\n compileOptions: {\n nPositions: 1024,\n quantDtype: QuantDtype.S8,\n optLevel: 
OptLevel.MODEL_EXECUTION_PERFORMANCE,\n },\n});\n```\n"
  },
  "repository": {
  "type": "git",
@@ -4289,7 +4289,7 @@
  },
  "locationInModule": {
  "filename": "src/model.ts",
- "line": 101
+ "line": 112
  },
  "name": "fromBucket",
  "parameters": [
@@ -4354,6 +4354,7 @@
  "summary": "model basic infromation."
  },
  "name": "options",
+ "optional": true,
  "type": {
  "fqn": "aws-cdk-neuronx-patterns.ModelOptions"
  }
@@ -4376,7 +4377,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/model.ts",
- "line": 105
+ "line": 116
  },
  "name": "modelId",
  "type": {
@@ -4390,7 +4391,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/model.ts",
- "line": 106
+ "line": 117
  },
  "name": "options",
  "type": {
@@ -4404,7 +4405,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/model.ts",
- "line": 107
+ "line": 118
  },
  "name": "bucket",
  "optional": true,
@@ -4419,7 +4420,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/model.ts",
- "line": 108
+ "line": 119
  },
  "name": "prefix",
  "optional": true,
@@ -4809,7 +4810,7 @@
  "kind": "class",
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 22
+ "line": 31
  },
  "methods": [
  {
@@ -4820,7 +4821,7 @@
  },
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 69
+ "line": 96
  },
  "name": "toString",
  "returns": {
@@ -4841,7 +4842,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 44
+ "line": 53
  },
  "name": "INF2_24XLARGE",
  "static": true,
@@ -4858,7 +4859,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 53
+ "line": 62
  },
  "name": "INF2_48XLARGE",
  "static": true,
@@ -4875,7 +4876,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 35
+ "line": 44
  },
  "name": "INF2_8XLARGE",
  "static": true,
@@ -4892,7 +4893,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 26
+ "line": 35
  },
  "name": "INF2_XLARGE",
  "static": true,
@@ -4900,6 +4901,40 @@
  "fqn": "aws-cdk-neuronx-patterns.NeuronxInstanceType"
  }
  },
+ {
+ "const": true,
+ "docs": {
+ "stability": "stable",
+ "summary": "ml.trn1.2xlarge."
+ },
+ "immutable": true,
+ "locationInModule": {
+ "filename": "src/neuronx-instance-type.ts",
+ "line": 71
+ },
+ "name": "TRN1_2XLARGE",
+ "static": true,
+ "type": {
+ "fqn": "aws-cdk-neuronx-patterns.NeuronxInstanceType"
+ }
+ },
+ {
+ "const": true,
+ "docs": {
+ "stability": "stable",
+ "summary": "ml.trn1.32xlarge."
+ },
+ "immutable": true,
+ "locationInModule": {
+ "filename": "src/neuronx-instance-type.ts",
+ "line": 80
+ },
+ "name": "TRN1_32XLARGE",
+ "static": true,
+ "type": {
+ "fqn": "aws-cdk-neuronx-patterns.NeuronxInstanceType"
+ }
+ },
  {
  "docs": {
  "stability": "stable"
@@ -4907,7 +4942,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 63
+ "line": 90
  },
  "name": "acceleratorChips",
  "type": {
@@ -4921,7 +4956,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 60
+ "line": 87
  },
  "name": "instanceType",
  "type": {
@@ -4935,7 +4970,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 62
+ "line": 89
  },
  "name": "memory",
  "type": {
@@ -4949,7 +4984,7 @@
  "immutable": true,
  "locationInModule": {
  "filename": "src/neuronx-instance-type.ts",
- "line": 61
+ "line": 88
  },
  "name": "vCpu",
  "type": {
@@ -5084,6 +5119,87 @@
  "name": "QuantDtype",
  "symbolId": "src/model:QuantDtype"
  },
+ "aws-cdk-neuronx-patterns.TrainiumChips": {
+ "assembly": "aws-cdk-neuronx-patterns",
+ "docs": {
+ "stability": "stable"
+ },
+ "fqn": "aws-cdk-neuronx-patterns.TrainiumChips",
+ "initializer": {
+ "docs": {
+ "stability": "stable"
+ },
+ "locationInModule": {
+ "filename": "src/neuronx-instance-type.ts",
+ "line": 25
+ },
+ "parameters": [
+ {
+ "name": "chips",
+ "type": {
+ "primitive": "number"
+ }
+ }
+ ]
+ },
+ "interfaces": [
+ "aws-cdk-neuronx-patterns.IAcceleratorChips"
+ ],
+ "kind": "class",
+ "locationInModule": {
+ "filename": "src/neuronx-instance-type.ts",
+ "line": 22
+ },
+ "name": "TrainiumChips",
+ "properties": [
+ {
+ "docs": {
+ "stability": "stable"
+ },
+ "immutable": true,
+ "locationInModule": {
+ "filename": "src/neuronx-instance-type.ts",
+ "line": 24
+ },
+ "name": "acceleratorMemory",
+ "overrides": "aws-cdk-neuronx-patterns.IAcceleratorChips",
+ "type": {
+ "fqn": "aws-cdk-lib.Size"
+ }
+ },
+ {
+ "docs": {
+ "stability": "stable"
+ },
+ "immutable": true,
+ "locationInModule": {
+ "filename": "src/neuronx-instance-type.ts",
+ "line": 25
+ },
+ "name": "chips",
+ "overrides": "aws-cdk-neuronx-patterns.IAcceleratorChips",
+ "type": {
+ "primitive": "number"
+ }
+ },
+ {
+ "docs": {
+ "stability": "stable"
+ },
+ "immutable": true,
+ "locationInModule": {
+ "filename": "src/neuronx-instance-type.ts",
+ "line": 23
+ },
+ "name": "neuronxCores",
+ "overrides": "aws-cdk-neuronx-patterns.IAcceleratorChips",
+ "type": {
+ "primitive": "number"
+ }
+ }
+ ],
+ "symbolId": "src/neuronx-instance-type:TrainiumChips"
+ },
  "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData": {
  "assembly": "aws-cdk-neuronx-patterns",
  "docs": {
@@ -5360,7 +5476,7 @@
  },
  "locationInModule": {
  "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
- "line": 287
+ "line": 289
  },
  "name": "grantInvoke",
  "parameters": [
@@ -5569,6 +5685,6 @@
  "symbolId": "src/transformers-neuronx-sagemaker-realtime-inference:TransformersNeuronxSageMakerRealtimeInferenceEndpointProps"
  }
  },
- "version": "0.0.4",
- "fingerprint": "XO2QlDqW3bojQGczF18xCsjcRr+vPLQOWOpbuoVbGs4="
+ "version": "0.0.6",
+ "fingerprint": "LPX68IUEckJ1umM4OsY8VpXPj7Ewo76D+yOrkS5BDK4="
  }
package/API.md CHANGED
@@ -1026,7 +1026,7 @@ model basic infromation.
  ```typescript
  import { Model } from 'aws-cdk-neuronx-patterns'
  
- Model.fromHuggingFace(modelId: string, options: ModelOptions)
+ Model.fromHuggingFace(modelId: string, options?: ModelOptions)
  ```
  
  model informations at HuggingFace.
@@ -1039,7 +1039,7 @@ model id on the HuggingFace.
  
  ---
  
- ###### `options`<sup>Required</sup> <a name="options" id="aws-cdk-neuronx-patterns.Model.fromHuggingFace.parameter.options"></a>
+ ###### `options`<sup>Optional</sup> <a name="options" id="aws-cdk-neuronx-patterns.Model.fromHuggingFace.parameter.options"></a>
  
  - *Type:* <a href="#aws-cdk-neuronx-patterns.ModelOptions">ModelOptions</a>
  
@@ -1177,6 +1177,8 @@ public readonly vCpu: number;
  | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType.property.INF2_48XLARGE">INF2_48XLARGE</a></code> | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType">NeuronxInstanceType</a></code> | ml.inf2.48xlarge. |
  | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType.property.INF2_8XLARGE">INF2_8XLARGE</a></code> | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType">NeuronxInstanceType</a></code> | ml.inf2.8xlarge. |
  | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType.property.INF2_XLARGE">INF2_XLARGE</a></code> | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType">NeuronxInstanceType</a></code> | ml.inf2.xlarge. |
+ | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType.property.TRN1_2XLARGE">TRN1_2XLARGE</a></code> | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType">NeuronxInstanceType</a></code> | ml.trn1.2xlarge. |
+ | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType.property.TRN1_32XLARGE">TRN1_32XLARGE</a></code> | <code><a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType">NeuronxInstanceType</a></code> | ml.trn1.32xlarge. |
  
  ---
  
@@ -1228,6 +1230,30 @@ ml.inf2.xlarge.
  
  ---
  
+ ##### `TRN1_2XLARGE`<sup>Required</sup> <a name="TRN1_2XLARGE" id="aws-cdk-neuronx-patterns.NeuronxInstanceType.property.TRN1_2XLARGE"></a>
+
+ ```typescript
+ public readonly TRN1_2XLARGE: NeuronxInstanceType;
+ ```
+
+ - *Type:* <a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType">NeuronxInstanceType</a>
+
+ ml.trn1.2xlarge.
+
+ ---
+
+ ##### `TRN1_32XLARGE`<sup>Required</sup> <a name="TRN1_32XLARGE" id="aws-cdk-neuronx-patterns.NeuronxInstanceType.property.TRN1_32XLARGE"></a>
+
+ ```typescript
+ public readonly TRN1_32XLARGE: NeuronxInstanceType;
+ ```
+
+ - *Type:* <a href="#aws-cdk-neuronx-patterns.NeuronxInstanceType">NeuronxInstanceType</a>
+
+ ml.trn1.32xlarge.
+
+ ---
+
  ### Parameters <a name="Parameters" id="aws-cdk-neuronx-patterns.Parameters"></a>
  
  Represents the amount of parameters.
@@ -1276,6 +1302,73 @@ number of parameters bilionX.
  
  
  
+ ### TrainiumChips <a name="TrainiumChips" id="aws-cdk-neuronx-patterns.TrainiumChips"></a>
+
+ - *Implements:* <a href="#aws-cdk-neuronx-patterns.IAcceleratorChips">IAcceleratorChips</a>
+
+ #### Initializers <a name="Initializers" id="aws-cdk-neuronx-patterns.TrainiumChips.Initializer"></a>
+
+ ```typescript
+ import { TrainiumChips } from 'aws-cdk-neuronx-patterns'
+
+ new TrainiumChips(chips: number)
+ ```
+
+ | **Name** | **Type** | **Description** |
+ | --- | --- | --- |
+ | <code><a href="#aws-cdk-neuronx-patterns.TrainiumChips.Initializer.parameter.chips">chips</a></code> | <code>number</code> | *No description.* |
+
+ ---
+
+ ##### `chips`<sup>Required</sup> <a name="chips" id="aws-cdk-neuronx-patterns.TrainiumChips.Initializer.parameter.chips"></a>
+
+ - *Type:* number
+
+ ---
+
+
+
+ #### Properties <a name="Properties" id="Properties"></a>
+
+ | **Name** | **Type** | **Description** |
+ | --- | --- | --- |
+ | <code><a href="#aws-cdk-neuronx-patterns.TrainiumChips.property.acceleratorMemory">acceleratorMemory</a></code> | <code>aws-cdk-lib.Size</code> | *No description.* |
+ | <code><a href="#aws-cdk-neuronx-patterns.TrainiumChips.property.chips">chips</a></code> | <code>number</code> | *No description.* |
+ | <code><a href="#aws-cdk-neuronx-patterns.TrainiumChips.property.neuronxCores">neuronxCores</a></code> | <code>number</code> | *No description.* |
+
+ ---
+
+ ##### `acceleratorMemory`<sup>Required</sup> <a name="acceleratorMemory" id="aws-cdk-neuronx-patterns.TrainiumChips.property.acceleratorMemory"></a>
+
+ ```typescript
+ public readonly acceleratorMemory: Size;
+ ```
+
+ - *Type:* aws-cdk-lib.Size
+
+ ---
+
+ ##### `chips`<sup>Required</sup> <a name="chips" id="aws-cdk-neuronx-patterns.TrainiumChips.property.chips"></a>
+
+ ```typescript
+ public readonly chips: number;
+ ```
+
+ - *Type:* number
+
+ ---
+
+ ##### `neuronxCores`<sup>Required</sup> <a name="neuronxCores" id="aws-cdk-neuronx-patterns.TrainiumChips.property.neuronxCores"></a>
+
+ ```typescript
+ public readonly neuronxCores: number;
+ ```
+
+ - *Type:* number
+
+ ---
+
+
  ### TransformersNeuronxSageMakerInferenceModelData <a name="TransformersNeuronxSageMakerInferenceModelData" id="aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData"></a>
  
  
@@ -1456,7 +1549,7 @@ public readonly quantDtype: QuantDtype;
  
  ### IAcceleratorChips <a name="IAcceleratorChips" id="aws-cdk-neuronx-patterns.IAcceleratorChips"></a>
  
- - *Implemented By:* <a href="#aws-cdk-neuronx-patterns.Inferentia2Chips">Inferentia2Chips</a>, <a href="#aws-cdk-neuronx-patterns.IAcceleratorChips">IAcceleratorChips</a>
+ - *Implemented By:* <a href="#aws-cdk-neuronx-patterns.Inferentia2Chips">Inferentia2Chips</a>, <a href="#aws-cdk-neuronx-patterns.TrainiumChips">TrainiumChips</a>, <a href="#aws-cdk-neuronx-patterns.IAcceleratorChips">IAcceleratorChips</a>
  
  
  #### Properties <a name="Properties" id="Properties"></a>
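
The API.md changes above add `TrainiumChips` as a second implementation of `IAcceleratorChips`, with the properties `chips`, `neuronxCores`, and `acceleratorMemory`. The diff does not show how the class derives the last two from `chips`, so the following is only a minimal sketch. The class and interface names ending in `Sketch` are hypothetical. The per-chip figures (2 NeuronCores and 32 GB per Trainium1 chip) are assumptions taken from the public trn1 instance specifications, not from the package source. Memory is a plain number of GB here rather than `aws-cdk-lib`'s `Size`, so the block runs without the CDK installed.

```typescript
// Hypothetical mirror of IAcceleratorChips; memory is a plain GB number
// instead of aws-cdk-lib's Size to keep the sketch dependency-free.
interface AcceleratorChipsSketch {
  readonly chips: number;
  readonly neuronxCores: number;
  readonly acceleratorMemoryGb: number;
}

// Assumed per-chip figures for Trainium1 (not taken from the package source).
const ASSUMED_CORES_PER_CHIP = 2;
const ASSUMED_MEMORY_GB_PER_CHIP = 32;

class TrainiumChipsSketch implements AcceleratorChipsSketch {
  readonly chips: number;
  readonly neuronxCores: number;
  readonly acceleratorMemoryGb: number;

  constructor(chips: number) {
    this.chips = chips;
    // Both derived values scale linearly with the chip count.
    this.neuronxCores = chips * ASSUMED_CORES_PER_CHIP;
    this.acceleratorMemoryGb = chips * ASSUMED_MEMORY_GB_PER_CHIP;
  }
}

// trn1.2xlarge is published with 1 chip, trn1.32xlarge with 16 chips.
const trn1Small = new TrainiumChipsSketch(1);
const trn1Large = new TrainiumChipsSketch(16);
console.log(trn1Small.neuronxCores, trn1Small.acceleratorMemoryGb); // 2 32
console.log(trn1Large.neuronxCores, trn1Large.acceleratorMemoryGb); // 32 512
```

With the real library, the equivalent call is `new TrainiumChips(chips)` as documented in the initializer section above. Whether it uses the same per-chip constants is not confirmed by this diff.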
package/README.md CHANGED
@@ -24,9 +24,7 @@ declare const bucket: s3.Bucket;
  const compile = new NeuronxCompile(this, "NeuronxCompile", {
  vpc,
  bucket,
- model: Model.fromHuggingFace("example/example-7b-chat", {
- parameters: Parameters.billion(7),
- }),
+ model: Model.fromHuggingFace("example/example-7b-chat"),
  });
  new TransformersNeuronxSageMakerRealtimeInferenceEndpoint(
  this,
@@ -40,6 +38,9 @@ new TransformersNeuronxSageMakerRealtimeInferenceEndpoint(
  );
  ```
  
+ This is TransformersNeuronxSageMakerRealtimeInferenceEndpoint architecture.
+ ![TransformersNeuronxSageMakerRealtimeInferenceEndpoint architecture](./docs/transformers-neuronx-sagemaker-realtime-inference-architecture.png)
+
  ### Default inference code
  
  By default, default inference code is deployed to implement the chat interface. The default inference code takes an object array like [transformers' conversations](https://huggingface.co/docs/transformers/main/en/conversations) and responds to the generated text. The following code is an example using the AWS SDK for JavaScript v3.
@@ -83,7 +84,7 @@ client
  });
  ```
  
- To change your own inference code, you can pass the code source.
+ To use your own inference code, you can pass the code to model data option.
  
  ```ts
  import * as s3Deplyment from "aws-cdk-lib/aws-s3-deployment";
@@ -118,9 +119,7 @@ declare const bucket: s3.Bucket;
  const compile = new NeuronxCompile(this, "NeuronxCompile", {
  vpc,
  bucket,
- model: Model.fromHuggingFace("example/example-7b-chat", {
- parameters: Parameters.billion(7),
- }),
+ model: Model.fromHuggingFace("example/example-7b-chat"),
  });
  
  // Get the compiled artifacts from this S3 URL
@@ -185,9 +184,7 @@ declare const bucket: s3.Bucket;
  new NeuronxCompile(this, "NeuronxCompile", {
  vpc,
  bucket,
- model: Model.fromHuggingFace("example/example-22b-chat", {
- parameters: Parameters.billion(22),
- }),
+ model: Model.fromHuggingFace("example/example-22b-chat"),
  compileOptions: {
  nPositions: 1024,
  quantDtype: QuantDtype.S8,
@@ -84,7 +84,7 @@ export declare class Model {
  * @param options model basic infromation
  * @returns model instance
  */
- static fromHuggingFace(modelId: string, options: ModelOptions): Model;
+ static fromHuggingFace(modelId: string, options?: ModelOptions): Model;
  /**
  * model informations at S3 Bucket
  * @param bucket Model stored S3 Bucket
@@ -14,6 +14,12 @@ export declare class Inferentia2Chips implements IAcceleratorChips {
  readonly acceleratorMemory: Size;
  constructor(chips: number);
  }
+ export declare class TrainiumChips implements IAcceleratorChips {
+ readonly chips: number;
+ readonly neuronxCores: number;
+ readonly acceleratorMemory: Size;
+ constructor(chips: number);
+ }
  export declare class NeuronxInstanceType {
  readonly instanceType: ec2.InstanceType;
  readonly vCpu: number;
@@ -35,6 +41,14 @@ export declare class NeuronxInstanceType {
  * ml.inf2.48xlarge
  */
  static readonly INF2_48XLARGE: NeuronxInstanceType;
+ /**
+ * ml.trn1.2xlarge
+ */
+ static readonly TRN1_2XLARGE: NeuronxInstanceType;
+ /**
+ * ml.trn1.32xlarge
+ */
+ static readonly TRN1_32XLARGE: NeuronxInstanceType;
  private constructor();
  /**
  * Return the instance type as a string
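
The README embedded in this package says the `NeuronxCompile` construct "assumes the required instance type depending on the number of model parameters", but the selection logic itself is not part of this diff. The sketch below shows one plausible way such a choice could work and should not be read as the library's actual algorithm. The `pickInstanceSketch` function, the 2-bytes-per-parameter default, and the 1.5x headroom factor are all hypothetical. The accelerator memory figures are assumptions taken from public Inf2 instance specifications.

```typescript
// Hypothetical candidate list; memory values are assumed from public Inf2 specs.
interface InstanceSketch {
  readonly name: string;
  readonly acceleratorMemoryGb: number;
}

const CANDIDATES: InstanceSketch[] = [
  { name: "ml.inf2.xlarge", acceleratorMemoryGb: 32 },
  { name: "ml.inf2.8xlarge", acceleratorMemoryGb: 32 },
  { name: "ml.inf2.24xlarge", acceleratorMemoryGb: 192 },
  { name: "ml.inf2.48xlarge", acceleratorMemoryGb: 384 },
];

// Pick the first (smallest) candidate whose accelerator memory covers the
// estimated weight size. bytesPerParameter = 2 models 16-bit weights and 1
// models int8 storage; headroom is an invented safety factor.
function pickInstanceSketch(
  billionParameters: number,
  bytesPerParameter: number = 2,
  headroom: number = 1.5,
): InstanceSketch {
  // 1 billion parameters at 1 byte each is roughly 1 GB.
  const requiredGb = billionParameters * bytesPerParameter * headroom;
  const found = CANDIDATES.find((c) => c.acceleratorMemoryGb >= requiredGb);
  if (!found) {
    throw new Error(`No candidate instance offers ${requiredGb} GB of accelerator memory.`);
  }
  return found;
}

console.log(pickInstanceSketch(7).name); // ml.inf2.xlarge (needs 21 GB)
console.log(pickInstanceSketch(22).name); // ml.inf2.24xlarge (needs 66 GB)
```

The new `TRN1_2XLARGE` and `TRN1_32XLARGE` constants could be added to such a candidate list in the same way. Whether the library considers them during automatic selection is not shown here.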
package/lib/model.d.ts CHANGED
@@ -84,7 +84,7 @@ export declare class Model {
  * @param options model basic infromation
  * @returns model instance
  */
- static fromHuggingFace(modelId: string, options: ModelOptions): Model;
+ static fromHuggingFace(modelId: string, options?: ModelOptions): Model;
  /**
  * model informations at S3 Bucket
  * @param bucket Model stored S3 Bucket
package/lib/model.js CHANGED
@@ -56,7 +56,7 @@ class Parameters {
  }
  exports.Parameters = Parameters;
  _a = JSII_RTTI_SYMBOL_1;
- Parameters[_a] = { fqn: "aws-cdk-neuronx-patterns.Parameters", version: "0.0.4" };
+ Parameters[_a] = { fqn: "aws-cdk-neuronx-patterns.Parameters", version: "0.0.6" };
  /**
  * Compile target model.
  */
@@ -68,7 +68,15 @@ class Model {
  * @returns model instance
  */
  static fromHuggingFace(modelId, options) {
- return new Model(modelId, options);
+ const inferParameters = modelId.match(/(\d+)b/);
+ if (!options?.parameters && !inferParameters) {
+ throw new Error("The number of parameters cannot be inferred from the model ID. Set optional parameters.");
+ }
+ const parameters = options?.parameters ?? Parameters.billion(parseInt(inferParameters[1]));
+ return new Model(modelId, {
+ ...options,
+ parameters,
+ });
  }
  /**
  * model informations at S3 Bucket
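
The hunk above is the behavioural core of this release: `fromHuggingFace` now infers the parameter count from the model ID with the regular expression `/(\d+)b/` when `options.parameters` is not given. The following standalone sketch reproduces that matching logic so its edge cases can be tried without installing the package. `inferBillionsSketch` and `ModelOptionsSketch` are hypothetical names, and the parameter count is a plain number rather than the library's `Parameters` object, so the undefined check here is a simplification of the original truthiness test.

```typescript
// Hypothetical stand-in for ModelOptions; parameters is in billions.
interface ModelOptionsSketch {
  readonly parameters?: number;
}

// Mirrors the inference in the hunk above: an explicit option wins,
// otherwise the first "<digits>b" run in the model ID is used.
function inferBillionsSketch(modelId: string, options?: ModelOptionsSketch): number {
  const inferred = modelId.match(/(\d+)b/);
  if (options?.parameters === undefined && !inferred) {
    throw new Error(
      "The number of parameters cannot be inferred from the model ID. Set optional parameters.",
    );
  }
  // inferred is non-null here whenever the right-hand side is evaluated.
  return options?.parameters ?? parseInt(inferred![1], 10);
}

console.log(inferBillionsSketch("example/example-7b-chat")); // 7
console.log(inferBillionsSketch("example/example-22b-chat")); // 22
// The pattern only captures the digits directly before "b", so a
// fractional size is read as its fractional digits.
console.log(inferBillionsSketch("example/example-1.5b")); // 5
// The pattern is case-sensitive, so an upper-case "B" needs explicit options.
console.log(inferBillionsSketch("example/example-7B", { parameters: 7 })); // 7
```

Two consequences follow from the pattern as written: model IDs with an upper-case `B` or no size suffix still require `options.parameters`, and IDs with a fractional size such as `1.5b` are inferred as a different number, so passing the option explicitly is the safer choice for those.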
@@ -89,5 +97,5 @@ class Model {
  }
  exports.Model = Model;
  _b = JSII_RTTI_SYMBOL_1;
- Model[_b] = { fqn: "aws-cdk-neuronx-patterns.Model", version: "0.0.4" };
- //# sourceMappingURL=data:application/json;base64,eyJ2ZXJzaW9uIjozLCJmaWxlIjoibW9kZWwuanMiLCJzb3VyY2VSb290IjoiIiwic291cmNlcyI6WyIuLi9zcmMvbW9kZWwudHMiXSwibmFtZXMiOltdLCJtYXBwaW5ncyI6Ijs7Ozs7QUFFQTs7R0FFRztBQUNILElBQVksVUFLWDtBQUxELFdBQVksVUFBVTtJQUNwQjs7T0FFRztJQUNILHVCQUFTLENBQUE7QUFDWCxDQUFDLEVBTFcsVUFBVSwwQkFBVixVQUFVLFFBS3JCO0FBRUQ7O0dBRUc7QUFDSCxJQUFZLFFBYVg7QUFiRCxXQUFZLFFBQVE7SUFDbEI7O09BRUc7SUFDSCw2RUFBMkIsQ0FBQTtJQUMzQjs7T0FFRztJQUNILHVEQUFnQixDQUFBO0lBQ2hCOztPQUVHO0lBQ0gscUZBQStCLENBQUE7QUFDakMsQ0FBQyxFQWJXLFFBQVEsd0JBQVIsUUFBUSxRQWFuQjtBQXdCRDs7R0FFRztBQUNILE1BQWEsVUFBVTtJQUNyQjs7OztPQUlHO0lBQ0gsTUFBTSxDQUFDLE9BQU8sQ0FBQyxVQUFrQjtRQUMvQixPQUFPLElBQUksVUFBVSxDQUFDLFVBQVUsQ0FBQyxDQUFDO0lBQ3BDLENBQUM7SUFDRCxZQUFxQyxPQUFlO1FBQWYsWUFBTyxHQUFQLE9BQU8sQ0FBUTtJQUFHLENBQUM7SUFDeEQ7OztPQUdHO0lBQ0gsUUFBUTtRQUNOLE9BQU8sSUFBSSxDQUFDLE9BQU8sQ0FBQztJQUN0QixDQUFDOztBQWhCSCxnQ0FpQkM7OztBQVFEOztHQUVHO0FBQ0gsTUFBYSxLQUFLO0lBQ2hCOzs7OztPQUtHO0lBQ0gsTUFBTSxDQUFDLGVBQWUsQ0FBQyxPQUFlLEVBQUUsT0FBcUI7UUFDM0QsT0FBTyxJQUFJLEtBQUssQ0FBQyxPQUFPLEVBQUUsT0FBTyxDQUFDLENBQUM7SUFDckMsQ0FBQztJQUNEOzs7Ozs7T0FNRztJQUNILE1BQU0sQ0FBQyxVQUFVLENBQUMsTUFBZSxFQUFFLE1BQWMsRUFBRSxPQUFxQjtRQUN0RSxPQUFPLElBQUksS0FBSyxDQUFDLE1BQU0sQ0FBQyxjQUFjLENBQUMsTUFBTSxDQUFDLEVBQUUsT0FBTyxFQUFFLE1BQU0sRUFBRSxNQUFNLENBQUMsQ0FBQztJQUMzRSxDQUFDO0lBQ0QsWUFDVyxPQUFlLEVBQ2YsT0FBcUIsRUFDckIsTUFBZ0IsRUFDaEIsTUFBZTtRQUhmLFlBQU8sR0FBUCxPQUFPLENBQVE7UUFDZixZQUFPLEdBQVAsT0FBTyxDQUFjO1FBQ3JCLFdBQU0sR0FBTixNQUFNLENBQVU7UUFDaEIsV0FBTSxHQUFOLE1BQU0sQ0FBUztJQUN2QixDQUFDOztBQXpCTixzQkEwQkMiLCJzb3VyY2VzQ29udGVudCI6WyJpbXBvcnQgeyBJQnVja2V0IH0gZnJvbSBcImF3cy1jZGstbGliL2F3cy1zM1wiO1xuXG4vKipcbiAqIFF1YW50IGRhdGEgdHlwZS5cbiAqL1xuZXhwb3J0IGVudW0gUXVhbnREdHlwZSB7XG4gIC8qKlxuICAgKiBpbnQ4IHdlaWdodCBzdG9yYWdlLlxuICAgKi9cbiAgUzggPSBcInM4XCIsXG59XG5cbi8qKlxuICogT3B0aW1pemF0aW9uIGxldmVsLlxuICovXG5leHBvcnQgZW51bSBPcHRMZXZlbCB7XG4gIC8qKlxuICAgKiBlbmFibGVzIHRoZSBjb3JlIHBlcmZvcm1hbmNlIG9wdGltaXphdGlvbnMgaW4gdGhlIGNvbXBpbGVyLCB3aGlsZSBhbHNvIG1pbmlt
aXppbmcgY29tcGlsZSB0aW1lLlxuICAgKi9cbiAgTUlOSU1JWklOR19DT01QSUxFX1RJTUUgPSAxLFxuICAvKipcbiAgICogcHJvdmlkZXMgdGhlIGJlc3QgYmFsYW5jZSBiZXR3ZWVuIG1vZGVsIHBlcmZvcm1hbmNlIGFuZCBjb21waWxlIHRpbWUuXG4gICAqL1xuICBCRVNUX0JBTEFOQ0UgPSAyLFxuICAvKipcbiAgICogbWF5IHByb3ZpZGUgYWRkaXRpb25hbCBtb2RlbCBleGVjdXRpb24gcGVyZm9ybWFuY2UgYnV0IG1heSBpbmN1ciBsb25nZXIgY29tcGlsZSB0aW1lcyBhbmQgaGlnaGVyIGhvc3QgbWVtb3J5IHVzYWdlIGR1cmluZyBtb2RlbCBjb21waWxhdGlvbi5cbiAgICovXG4gIE1PREVMX0VYRUNVVElPTl9QRVJGT1JNQU5DRSA9IDMsXG59XG5cbi8qKlxuICogQ29tcGlsZSBvcHRpb25zLlxuICovXG5leHBvcnQgaW50ZXJmYWNlIENvbXBpbGVPcHRpb25zIHtcbiAgLyoqXG4gICAqIEBkZWZhdWx0IC0gY2FsYyBmcm9tIHBhcmFtZXRlcnMgYW5kIHF1YW50RHR5cGVcbiAgICovXG4gIHJlYWRvbmx5IHRwRGVncmVlPzogbnVtYmVyO1xuICAvKipcbiAgICogQGRlZmF1bHQgLSBObyBxdWFudFxuICAgKi9cbiAgcmVhZG9ubHkgcXVhbnREdHlwZT86IFF1YW50RHR5cGU7XG4gIC8qKlxuICAgKiBAZGVmYXVsdCA0MDk2XG4gICAqL1xuICByZWFkb25seSBuUG9zaXRpb25zPzogbnVtYmVyO1xuICAvKipcbiAgICogQGRlZmF1bHQgT3B0TGV2ZWwuQkVTVF9CQUxBTkNFXG4gICAqL1xuICByZWFkb25seSBvcHRMZXZlbD86IE9wdExldmVsO1xufVxuXG4vKipcbiAqIFJlcHJlc2VudHMgdGhlIGFtb3VudCBvZiBwYXJhbWV0ZXJzLlxuICovXG5leHBvcnQgY2xhc3MgUGFyYW1ldGVycyB7XG4gIC8qKlxuICAgKiBDcmVhdGUgYSBQYXJhbWV0ZXJzIHJlcHJlc2VudGluZyBhbiBhbW91bnQgYmlsaW9uLlxuICAgKiBAcGFyYW0gcGFyYW1ldGVycyBudW1iZXIgb2YgcGFyYW1ldGVycyBiaWxpb25YXG4gICAqIEByZXR1cm5zIHBhcmFtZXRlcnNcbiAgICovXG4gIHN0YXRpYyBiaWxsaW9uKHBhcmFtZXRlcnM6IG51bWJlcikge1xuICAgIHJldHVybiBuZXcgUGFyYW1ldGVycyhwYXJhbWV0ZXJzKTtcbiAgfVxuICBwcml2YXRlIGNvbnN0cnVjdG9yKHByaXZhdGUgcmVhZG9ubHkgYmlsbGlvbjogbnVtYmVyKSB7fVxuICAvKipcbiAgICogUmV0dXJuIHRoaXMgbnVtYmVyIG9mIHBhcmFtZXRlcnMgYXMgYmlsaW9uLlxuICAgKiBAcmV0dXJucyBUaGlzIG51bWJlciBvZiBwYXJhbWV0ZXJzIGFzIGJpbGlvbi5cbiAgICovXG4gIHRvQmlsaW9uKCkge1xuICAgIHJldHVybiB0aGlzLmJpbGxpb247XG4gIH1cbn1cblxuLyoqXG4gKiBDb21waWxlIHRhcmdldCBtb2RlbCBiYXNpYyBpbmZyb21hdGlvblxuICovXG5leHBvcnQgaW50ZXJmYWNlIE1vZGVsT3B0aW9ucyB7XG4gIHJlYWRvbmx5IHBhcmFtZXRlcnM6IFBhcmFtZXRlcnM7XG59XG4vKipcbiAqIENvbXBpbGUgdGFyZ2V0IG1vZGVsLlxuICovXG5leHBvcnQgY2xhc3MgTW9kZWwge1xuICAvKipc
biAgICogbW9kZWwgaW5mb3JtYXRpb25zIGF0IEh1Z2dpbmdGYWNlXG4gICAqIEBwYXJhbSBtb2RlbElkIG1vZGVsIGlkIG9uIHRoZSBIdWdnaW5nRmFjZVxuICAgKiBAcGFyYW0gb3B0aW9ucyBtb2RlbCBiYXNpYyBpbmZyb21hdGlvblxuICAgKiBAcmV0dXJucyBtb2RlbCBpbnN0YW5jZVxuICAgKi9cbiAgc3RhdGljIGZyb21IdWdnaW5nRmFjZShtb2RlbElkOiBzdHJpbmcsIG9wdGlvbnM6IE1vZGVsT3B0aW9ucykge1xuICAgIHJldHVybiBuZXcgTW9kZWwobW9kZWxJZCwgb3B0aW9ucyk7XG4gIH1cbiAgLyoqXG4gICAqIG1vZGVsIGluZm9ybWF0aW9ucyBhdCBTMyBCdWNrZXRcbiAgICogQHBhcmFtIGJ1Y2tldCBNb2RlbCBzdG9yZWQgUzMgQnVja2V0XG4gICAqIEBwYXJhbSBwcmVmaXggTW9kZWwgc3RvcmVkIG9iamVjdHMgcHJlZml4XG4gICAqIEBwYXJhbSBvcHRpb25zIG1vZGVsIGJhc2ljIGluZnJvbWF0aW9uXG4gICAqIEByZXR1cm5zIG1vZGVsIGluc3RhbmNlXG4gICAqL1xuICBzdGF0aWMgZnJvbUJ1Y2tldChidWNrZXQ6IElCdWNrZXQsIHByZWZpeDogc3RyaW5nLCBvcHRpb25zOiBNb2RlbE9wdGlvbnMpIHtcbiAgICByZXR1cm4gbmV3IE1vZGVsKGJ1Y2tldC5zM1VybEZvck9iamVjdChwcmVmaXgpLCBvcHRpb25zLCBidWNrZXQsIHByZWZpeCk7XG4gIH1cbiAgcHJpdmF0ZSBjb25zdHJ1Y3RvcihcbiAgICByZWFkb25seSBtb2RlbElkOiBzdHJpbmcsXG4gICAgcmVhZG9ubHkgb3B0aW9uczogTW9kZWxPcHRpb25zLFxuICAgIHJlYWRvbmx5IGJ1Y2tldD86IElCdWNrZXQsXG4gICAgcmVhZG9ubHkgcHJlZml4Pzogc3RyaW5nLFxuICApIHt9XG59XG4iXX0=
+ Model[_b] = { fqn: "aws-cdk-neuronx-patterns.Model", version: "0.0.6" };
+ //# sourceMappingURL=data:application/json;base64,eyJ2ZXJzaW9uIjozLCJmaWxlIjoibW9kZWwuanMiLCJzb3VyY2VSb290IjoiIiwic291cmNlcyI6WyIuLi9zcmMvbW9kZWwudHMiXSwibmFtZXMiOltdLCJtYXBwaW5ncyI6Ijs7Ozs7QUFFQTs7R0FFRztBQUNILElBQVksVUFLWDtBQUxELFdBQVksVUFBVTtJQUNwQjs7T0FFRztJQUNILHVCQUFTLENBQUE7QUFDWCxDQUFDLEVBTFcsVUFBVSwwQkFBVixVQUFVLFFBS3JCO0FBRUQ7O0dBRUc7QUFDSCxJQUFZLFFBYVg7QUFiRCxXQUFZLFFBQVE7SUFDbEI7O09BRUc7SUFDSCw2RUFBMkIsQ0FBQTtJQUMzQjs7T0FFRztJQUNILHVEQUFnQixDQUFBO0lBQ2hCOztPQUVHO0lBQ0gscUZBQStCLENBQUE7QUFDakMsQ0FBQyxFQWJXLFFBQVEsd0JBQVIsUUFBUSxRQWFuQjtBQXdCRDs7R0FFRztBQUNILE1BQWEsVUFBVTtJQUNyQjs7OztPQUlHO0lBQ0gsTUFBTSxDQUFDLE9BQU8sQ0FBQyxVQUFrQjtRQUMvQixPQUFPLElBQUksVUFBVSxDQUFDLFVBQVUsQ0FBQyxDQUFDO0lBQ3BDLENBQUM7SUFDRCxZQUFxQyxPQUFlO1FBQWYsWUFBTyxHQUFQLE9BQU8sQ0FBUTtJQUFHLENBQUM7SUFDeEQ7OztPQUdHO0lBQ0gsUUFBUTtRQUNOLE9BQU8sSUFBSSxDQUFDLE9BQU8sQ0FBQztJQUN0QixDQUFDOztBQWhCSCxnQ0FpQkM7OztBQVFEOztHQUVHO0FBQ0gsTUFBYSxLQUFLO0lBQ2hCOzs7OztPQUtHO0lBQ0gsTUFBTSxDQUFDLGVBQWUsQ0FBQyxPQUFlLEVBQUUsT0FBc0I7UUFDNUQsTUFBTSxlQUFlLEdBQUcsT0FBTyxDQUFDLEtBQUssQ0FBQyxRQUFRLENBQUMsQ0FBQztRQUNoRCxJQUFJLENBQUMsT0FBTyxFQUFFLFVBQVUsSUFBSSxDQUFDLGVBQWUsRUFBRSxDQUFDO1lBQzdDLE1BQU0sSUFBSSxLQUFLLENBQ2IseUZBQXlGLENBQzFGLENBQUM7UUFDSixDQUFDO1FBQ0QsTUFBTSxVQUFVLEdBQ2QsT0FBTyxFQUFFLFVBQVUsSUFBSSxVQUFVLENBQUMsT0FBTyxDQUFDLFFBQVEsQ0FBQyxlQUFnQixDQUFDLENBQUMsQ0FBQyxDQUFDLENBQUMsQ0FBQztRQUMzRSxPQUFPLElBQUksS0FBSyxDQUFDLE9BQU8sRUFBRTtZQUN4QixHQUFHLE9BQU87WUFDVixVQUFVO1NBQ1gsQ0FBQyxDQUFDO0lBQ0wsQ0FBQztJQUNEOzs7Ozs7T0FNRztJQUNILE1BQU0sQ0FBQyxVQUFVLENBQUMsTUFBZSxFQUFFLE1BQWMsRUFBRSxPQUFxQjtRQUN0RSxPQUFPLElBQUksS0FBSyxDQUFDLE1BQU0sQ0FBQyxjQUFjLENBQUMsTUFBTSxDQUFDLEVBQUUsT0FBTyxFQUFFLE1BQU0sRUFBRSxNQUFNLENBQUMsQ0FBQztJQUMzRSxDQUFDO0lBQ0QsWUFDVyxPQUFlLEVBQ2YsT0FBcUIsRUFDckIsTUFBZ0IsRUFDaEIsTUFBZTtRQUhmLFlBQU8sR0FBUCxPQUFPLENBQVE7UUFDZixZQUFPLEdBQVAsT0FBTyxDQUFjO1FBQ3JCLFdBQU0sR0FBTixNQUFNLENBQVU7UUFDaEIsV0FBTSxHQUFOLE1BQU0sQ0FBUztJQUN2QixDQUFDOztBQXBDTixzQkFxQ0MiLCJzb3VyY2VzQ29udGVudCI6WyJpbXBvcnQgeyBJQnVja2V0IH0g
ZnJvbSBcImF3cy1jZGstbGliL2F3cy1zM1wiO1xuXG4vKipcbiAqIFF1YW50IGRhdGEgdHlwZS5cbiAqL1xuZXhwb3J0IGVudW0gUXVhbnREdHlwZSB7XG4gIC8qKlxuICAgKiBpbnQ4IHdlaWdodCBzdG9yYWdlLlxuICAgKi9cbiAgUzggPSBcInM4XCIsXG59XG5cbi8qKlxuICogT3B0aW1pemF0aW9uIGxldmVsLlxuICovXG5leHBvcnQgZW51bSBPcHRMZXZlbCB7XG4gIC8qKlxuICAgKiBlbmFibGVzIHRoZSBjb3JlIHBlcmZvcm1hbmNlIG9wdGltaXphdGlvbnMgaW4gdGhlIGNvbXBpbGVyLCB3aGlsZSBhbHNvIG1pbmltaXppbmcgY29tcGlsZSB0aW1lLlxuICAgKi9cbiAgTUlOSU1JWklOR19DT01QSUxFX1RJTUUgPSAxLFxuICAvKipcbiAgICogcHJvdmlkZXMgdGhlIGJlc3QgYmFsYW5jZSBiZXR3ZWVuIG1vZGVsIHBlcmZvcm1hbmNlIGFuZCBjb21waWxlIHRpbWUuXG4gICAqL1xuICBCRVNUX0JBTEFOQ0UgPSAyLFxuICAvKipcbiAgICogbWF5IHByb3ZpZGUgYWRkaXRpb25hbCBtb2RlbCBleGVjdXRpb24gcGVyZm9ybWFuY2UgYnV0IG1heSBpbmN1ciBsb25nZXIgY29tcGlsZSB0aW1lcyBhbmQgaGlnaGVyIGhvc3QgbWVtb3J5IHVzYWdlIGR1cmluZyBtb2RlbCBjb21waWxhdGlvbi5cbiAgICovXG4gIE1PREVMX0VYRUNVVElPTl9QRVJGT1JNQU5DRSA9IDMsXG59XG5cbi8qKlxuICogQ29tcGlsZSBvcHRpb25zLlxuICovXG5leHBvcnQgaW50ZXJmYWNlIENvbXBpbGVPcHRpb25zIHtcbiAgLyoqXG4gICAqIEBkZWZhdWx0IC0gY2FsYyBmcm9tIHBhcmFtZXRlcnMgYW5kIHF1YW50RHR5cGVcbiAgICovXG4gIHJlYWRvbmx5IHRwRGVncmVlPzogbnVtYmVyO1xuICAvKipcbiAgICogQGRlZmF1bHQgLSBObyBxdWFudFxuICAgKi9cbiAgcmVhZG9ubHkgcXVhbnREdHlwZT86IFF1YW50RHR5cGU7XG4gIC8qKlxuICAgKiBAZGVmYXVsdCA0MDk2XG4gICAqL1xuICByZWFkb25seSBuUG9zaXRpb25zPzogbnVtYmVyO1xuICAvKipcbiAgICogQGRlZmF1bHQgT3B0TGV2ZWwuQkVTVF9CQUxBTkNFXG4gICAqL1xuICByZWFkb25seSBvcHRMZXZlbD86IE9wdExldmVsO1xufVxuXG4vKipcbiAqIFJlcHJlc2VudHMgdGhlIGFtb3VudCBvZiBwYXJhbWV0ZXJzLlxuICovXG5leHBvcnQgY2xhc3MgUGFyYW1ldGVycyB7XG4gIC8qKlxuICAgKiBDcmVhdGUgYSBQYXJhbWV0ZXJzIHJlcHJlc2VudGluZyBhbiBhbW91bnQgYmlsaW9uLlxuICAgKiBAcGFyYW0gcGFyYW1ldGVycyBudW1iZXIgb2YgcGFyYW1ldGVycyBiaWxpb25YXG4gICAqIEByZXR1cm5zIHBhcmFtZXRlcnNcbiAgICovXG4gIHN0YXRpYyBiaWxsaW9uKHBhcmFtZXRlcnM6IG51bWJlcikge1xuICAgIHJldHVybiBuZXcgUGFyYW1ldGVycyhwYXJhbWV0ZXJzKTtcbiAgfVxuICBwcml2YXRlIGNvbnN0cnVjdG9yKHByaXZhdGUgcmVhZG9ubHkgYmlsbGlvbjogbnVtYmVyKSB7fVxuICAvKipcbiAgICogUmV0dXJuIHRoaXMgbnVtYmVyIG9mIHBhcmFtZXRlcnMgYXMgYmlsaW9uLlxuICAg
KiBAcmV0dXJucyBUaGlzIG51bWJlciBvZiBwYXJhbWV0ZXJzIGFzIGJpbGlvbi5cbiAgICovXG4gIHRvQmlsaW9uKCkge1xuICAgIHJldHVybiB0aGlzLmJpbGxpb247XG4gIH1cbn1cblxuLyoqXG4gKiBDb21waWxlIHRhcmdldCBtb2RlbCBiYXNpYyBpbmZyb21hdGlvblxuICovXG5leHBvcnQgaW50ZXJmYWNlIE1vZGVsT3B0aW9ucyB7XG4gIHJlYWRvbmx5IHBhcmFtZXRlcnM6IFBhcmFtZXRlcnM7XG59XG4vKipcbiAqIENvbXBpbGUgdGFyZ2V0IG1vZGVsLlxuICovXG5leHBvcnQgY2xhc3MgTW9kZWwge1xuICAvKipcbiAgICogbW9kZWwgaW5mb3JtYXRpb25zIGF0IEh1Z2dpbmdGYWNlXG4gICAqIEBwYXJhbSBtb2RlbElkIG1vZGVsIGlkIG9uIHRoZSBIdWdnaW5nRmFjZVxuICAgKiBAcGFyYW0gb3B0aW9ucyBtb2RlbCBiYXNpYyBpbmZyb21hdGlvblxuICAgKiBAcmV0dXJucyBtb2RlbCBpbnN0YW5jZVxuICAgKi9cbiAgc3RhdGljIGZyb21IdWdnaW5nRmFjZShtb2RlbElkOiBzdHJpbmcsIG9wdGlvbnM/OiBNb2RlbE9wdGlvbnMpIHtcbiAgICBjb25zdCBpbmZlclBhcmFtZXRlcnMgPSBtb2RlbElkLm1hdGNoKC8oXFxkKyliLyk7XG4gICAgaWYgKCFvcHRpb25zPy5wYXJhbWV0ZXJzICYmICFpbmZlclBhcmFtZXRlcnMpIHtcbiAgICAgIHRocm93IG5ldyBFcnJvcihcbiAgICAgICAgXCJUaGUgbnVtYmVyIG9mIHBhcmFtZXRlcnMgY2Fubm90IGJlIGluZmVycmVkIGZyb20gdGhlIG1vZGVsIElELiBTZXQgb3B0aW9uYWwgcGFyYW1ldGVycy5cIixcbiAgICAgICk7XG4gICAgfVxuICAgIGNvbnN0IHBhcmFtZXRlcnMgPVxuICAgICAgb3B0aW9ucz8ucGFyYW1ldGVycyA/PyBQYXJhbWV0ZXJzLmJpbGxpb24ocGFyc2VJbnQoaW5mZXJQYXJhbWV0ZXJzIVsxXSkpO1xuICAgIHJldHVybiBuZXcgTW9kZWwobW9kZWxJZCwge1xuICAgICAgLi4ub3B0aW9ucyxcbiAgICAgIHBhcmFtZXRlcnMsXG4gICAgfSk7XG4gIH1cbiAgLyoqXG4gICAqIG1vZGVsIGluZm9ybWF0aW9ucyBhdCBTMyBCdWNrZXRcbiAgICogQHBhcmFtIGJ1Y2tldCBNb2RlbCBzdG9yZWQgUzMgQnVja2V0XG4gICAqIEBwYXJhbSBwcmVmaXggTW9kZWwgc3RvcmVkIG9iamVjdHMgcHJlZml4XG4gICAqIEBwYXJhbSBvcHRpb25zIG1vZGVsIGJhc2ljIGluZnJvbWF0aW9uXG4gICAqIEByZXR1cm5zIG1vZGVsIGluc3RhbmNlXG4gICAqL1xuICBzdGF0aWMgZnJvbUJ1Y2tldChidWNrZXQ6IElCdWNrZXQsIHByZWZpeDogc3RyaW5nLCBvcHRpb25zOiBNb2RlbE9wdGlvbnMpIHtcbiAgICByZXR1cm4gbmV3IE1vZGVsKGJ1Y2tldC5zM1VybEZvck9iamVjdChwcmVmaXgpLCBvcHRpb25zLCBidWNrZXQsIHByZWZpeCk7XG4gIH1cbiAgcHJpdmF0ZSBjb25zdHJ1Y3RvcihcbiAgICByZWFkb25seSBtb2RlbElkOiBzdHJpbmcsXG4gICAgcmVhZG9ubHkgb3B0aW9uczogTW9kZWxPcHRpb25zLFxuICAgIHJlYWRvbmx5IGJ1Y2tldD86IElCdWNrZXQsXG4gICAgcmVhZG9ubHkgcHJl
Zml4Pzogc3RyaW5nLFxuICApIHt9XG59XG4iXX0=
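Note on the model.js change above: decoding the embedded source map suggests that in 0.0.6 `Model.fromHuggingFace` takes an optional `options` argument. When `options.parameters` is not given, it appears to infer the parameter count from the model ID with the pattern `(\d+)b`, and throws if nothing matches. The sketch below re-creates only that inference step as a standalone function. `inferBillions` and `explicitBillions` are hypothetical names used for illustration and are not part of the package.

```typescript
// Hypothetical standalone helper mirroring the inference step that
// appears in the 0.0.6 source map. Not the package's API.
function inferBillions(modelId: string, explicitBillions?: number): number {
  // An explicitly supplied value wins over inference.
  if (explicitBillions !== undefined) {
    return explicitBillions;
  }
  // First run of digits immediately followed by "b", e.g. "7b".
  const match = modelId.match(/(\d+)b/);
  if (!match) {
    throw new Error(
      "The number of parameters cannot be inferred from the model ID. Set optional parameters.",
    );
  }
  return parseInt(match[1], 10);
}

console.log(inferBillions("example/example-7b-chat")); // 7
```

Under this pattern a fractional size such as `0.5b` would match only the digits next to the `b`, so passing the parameter count explicitly is likely the safer choice for such model IDs.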
@@ -169,5 +169,5 @@ class NeuronxCompile extends constructs_1.Construct {
  }
  exports.NeuronxCompile = NeuronxCompile;
  _a = JSII_RTTI_SYMBOL_1;
- NeuronxCompile[_a] = { fqn: "aws-cdk-neuronx-patterns.NeuronxCompile", version: "0.0.4" };
+ NeuronxCompile[_a] = { fqn: "aws-cdk-neuronx-patterns.NeuronxCompile", version: "0.0.6" };
  //# sourceMappingURL=data:application/json;base64,{"version":3,"file":"neuronx-compile.js","sourceRoot":"","sources":["../src/neuronx-compile.ts"],"names":[],"mappings":";;;;;AAAA,+BAA4B;AAC5B,6CAAmE;AACnE,+CAA+C;AAC/C,2CAA2C;AAC3C,iDAAqD;AACrD,iDAA4C;AAC5C,uDAA0E;AAE1E,mEAAwD;AACxD,2CAAuC;AACvC,mCAMiB;AACjB,mEAA8D;AAC9D,6FAAuF;AACvF,yCAA8C;AAgE9C;;GAEG;AACH,MAAa,cAAe,SAAQ,sBAAS;IAe3C,YAAY,KAAgB,EAAE,EAAU,EAAE,KAA0B;QAClE,KAAK,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;QAEjB,IAAI,CAAC,UAAU,GAAG,KAAK,CAAC,KAAK,CAAC,OAAO,CAAC,UAAU,CAAC;QACjD,IAAI,CAAC,wBAAwB,GAAG,KAAK,CAAC,MAAM,CAAC;QAC7C,IAAI,CAAC,UAAU,GAAG,KAAK,CAAC,cAAc,EAAE,UAAU,IAAI,IAAI,CAAC;QAC3D,IAAI,CAAC,UAAU,GAAG,KAAK,CAAC,cAAc,EAAE,UAAU,CAAC;QACnD,IAAI,CAAC,QAAQ,GAAG,KAAK,CAAC,cAAc,EAAE,QAAQ,IAAI,gBAAQ,CAAC,YAAY,CAAC;QACxE,IAAI,CAAC,QAAQ;YACX,KAAK,CAAC,cAAc,EAAE,QAAQ;gBAC9B,IAAA,mBAAY,EAAC,KAAK,CAAC,KAAK,CAAC,OAAO,CAAC,UAAU,EAAE;oBAC3C,UAAU,EAAE,IAAI,CAAC,UAAU;oBAC3B,UAAU,EAAE,IAAI,CAAC,UAAU;iBAC5B,CAAC,CAAC;QACL,MAAM,YAAY,GAChB,KAAK,CAAC,YAAY,IAAI,IAAI,CAAC,4BAA4B,CAAC,IAAI,CAAC,QAAQ,CAAC,CAAC;QACzE,MAAM,cAAc,GAAG,IAAI,GAAG,CAAC,cAAc,CAAC,IAAI,EAAE,gBAAgB,EAAE;YACpE,YAAY,EAAE;gBACZ;oBACE,UAAU,EAAE,WAAW;oBACvB,MAAM,EAAE,GAAG,CAAC,iBAAiB,CAAC,GAAG,CAC/B,KAAK,CAAC,UAAU,EAAE,WAAW,EAAE;wBAC7B,KAAK,CAAC,KAAK,CAAC,OAAO,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,CAAC,CAChD;iBACF;aACF;SACF,CAAC,CAAC;QACH,MAAM,kBAAkB,GAAG,IAAI,KAAK,CAAC,+BAA+B,CAClE,IAAI,EACJ,oBAAoB,EACpB;YACE,GAAG,EAAE,KAAK,CAAC,GAAG;YACd,UAAU,EAAE,KAAK,CAAC,UAAU;YAC5B,aAAa,EAAE,CAAC,YAAY,CAAC,YAAY,CAAC;YAC1C,yBAAyB,EAAE,KAAK;YAChC,MAAM,EAAE;gBACN;oBACE,KAAK,EAAE,IAAI,4DAA2B,CAAC,IAAI,EAAE,aAAa,CAAC;oBAC3D,aAAa;oBACb,SAAS,EAAE,YAAY;iBACxB;aACF;YACD,cAAc;YACd,IAAI,EAAE,KAAK,CAAC,IAAI;SACjB,CACF,CAAC;QAEA,kBAAkB,CAAC,IAAI,CAAC,YACzB,CAAC,mBAAmB,CACnB,yCAAyC,EACzC,cAAc,CAAC,mBAAmB,CACnC,CAAC;QAEF,kBAAI,CAAC,EAAE,CAAC,kBAAkB,CAAC,CAAC,GAAG,CAAC,MAAM,EAAE,wBAAwB,CAAC,CAAC;QAClE,MAAM,QAAQ,GAAG,IAAI,KAAK,CAAC,QAAQ,CAAC,IAAI,EAAE,UAAU,EAAE;YACpD,mBAAmB,EAAE;gBACnB;oBACE,kBAAkB;oB
AClB,KAAK,EAAE,CAAC;iBACT;aACF;YACD,wBAAwB,EAAE;gBACxB;oBACE,KAAK,EAAE,KAAK,CAAC,6BAA6B,CAAC,QAAQ;oBACnD,MAAM,EAAE,KAAK,CAAC,8BAA8B,CAAC,wBAAwB;oBACrE,OAAO,EAAE,sBAAQ,CAAC,OAAO,CAAC,EAAE,CAAC;oBAC7B,MAAM,EAAE,KAAK,CAAC,8BAA8B,CAAC,MAAM;iBACpD;aACF;SACF,CAAC,CAAC;QAEH,MAAM,OAAO,GAAmB,KAAK,CAAC,OAAO,IAAI;YAC/C,KAAK,EAAE,wBAAc,CAAC,SAAS,CAAC,IAAA,WAAI,EAAC,SAAS,EAAE,oBAAoB,CAAC,CAAC;YACtE,cAAc,EAAE,QAAQ;SACzB,CAAC;QACF,IAAI,0BAA0B,GAAG,GAAG,KAAK,CAAC,KAAK,CAAC,OAAO,YAAY,OAAO,CAAC,cAAc,MAAM,IAAI,CAAC,QAAQ,MAAM,IAAI,CAAC,UAAU,OAAO,IAAI,CAAC,QAAQ,EAAE,CAAC;QACxJ,IAAI,IAAI,CAAC,UAAY,EAAE,CAAC;YACtB,0BAA0B,GAAG,GAAG,0BAA0B,SAAS,IAAI,CAAC,UAAU,EAAE,CAAC;QACvF,CAAC;QACD,KAAK,CAAC,MAAM,CAAC,cAAc,CACzB,kBAAkB,CAAC,YAAa,EAChC,GAAG,0BAA0B,IAAI,CAClC,CAAC;QACF,MAAM,eAAe,GAAG,IAAI,KAAK,CAAC,eAAe,CAAC,IAAI,EAAE,iBAAiB,CAAC,CAAC;QAC3E,eAAe,CAAC,UAAU,CACxB,GAAG,KAAK,CAAC,IAAI,CAAC;YACZ,MAAM,EAAE,YAAY,CAAC,gBAAgB,CAAC,KAAK;SAC5C,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC,EAAE,KAAK,EAAE,EAAE,CAAC,CAAC;YACpB,QAAQ,EAAE,cAAc,KAAK,EAAE;YAC/B,aAAa,EAAE,cAAc,KAAK,EAAE;YACpC,WAAW,EAAE;gBACX,KAAK,CAAC,gBAAgB,CAAC,IAAI;gBAC3B,KAAK,CAAC,gBAAgB,CAAC,KAAK;aAC7B;SACF,CAAC,CAAC,CACJ,CAAC;QACF,MAAM,aAAa,GAAG,IAAI,KAAK,CAAC,gBAAgB,CAAC,IAAI,EAAE,eAAe,EAAE;YACtE,SAAS,EAAE,IAAI,KAAK,CAAC,yBAAyB,CAC5C,IAAI,EACJ,qBAAqB,EACrB;gBACE,KAAK,EAAE,OAAO,CAAC,KAAK;gBACpB,oDAAoD;gBACpD,kBAAkB;gBAClB,2DAA2D;gBAC3D,0EAA0E;gBAC1E,4EAA4E;gBAC5E,MAAM,EAAE,kBAAI,CAAC,SAAS,CACpB,IAAI,CAAC,IAAI,CAAC,YAAY,CAAC,MAAM,CAAC,WAAW,EAAE,GAAG,IAAI,CAAC,CACpD;gBACD,GAAG,EAAE,YAAY,CAAC,IAAI;gBACtB,WAAW,EAAE;oBACX,QAAQ,EAAE,KAAK,CAAC,KAAK,CAAC,OAAO;oBAC7B,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,QAAQ,EAAE;oBACnC,WAAW,EAAE,IAAI,CAAC,UAAU,CAAC,QAAQ,EAAE;oBACvC,SAAS,EAAE,IAAI,CAAC,QAAQ,CAAC,QAAQ,EAAE;oBACnC,WAAW,EAAE,IAAI,CAAC,UAAU,EAAE,QAAQ,EAAE,IAAI,EAAE;oBAC9C,eAAe,EAAE,KAAK,CAAC,MAAM,CAAC,cAAc,CAC1C,0BAA0B,CAC3B;iBACF;gBACD,eAAe;aAChB,CACF;SACF,CAAC,CAAC;QAEH,MAAM,iBAAiB,GAAG,IAAI,8BAAiB,CAAC,IAAI,EAAE,mBAAmB,EAAE;YACzE,IAAI,EAAE,iBAAI,CAAC,SAAS,CAAC,IAAA,WA
AI,EAAC,SAAS,EAAE,2BAA2B,CAAC,CAAC;YAClE,OAAO,EAAE,eAAe;YACxB,OAAO,EAAE,oBAAO,CAAC,WAAW;YAC5B,IAAI,EAAE,sCAAsC;SAC7C,CAAC,CAAC;QACH,aAAa,CAAC,cAAc,CAAC,iBAAiB,EAAE,QAAQ,CAAC,CAAC;QAC1D,MAAM,qBAAqB,GAAG,IAAI,8BAAiB,CACjD,IAAI,EACJ,uBAAuB,EACvB;YACE,IAAI,EAAE,iBAAI,CAAC,SAAS,CAAC,IAAA,WAAI,EAAC,SAAS,EAAE,2BAA2B,CAAC,CAAC;YAClE,OAAO,EAAE,kBAAkB;YAC3B,OAAO,EAAE,oBAAO,CAAC,WAAW;YAC5B,IAAI,EAAE,sCAAsC;SAC7C,CACF,CAAC;QACF,eAAK,CAAC,cAAc,CAAC;YACnB,YAAY,EAAE,CAAC,GAAG,CAAC;YACnB,OAAO,EAAE,qBAAqB;YAC9B,OAAO,EAAE,CAAC,oBAAoB,CAAC;SAChC,CAAC,CAAC;QACH,MAAM,QAAQ,GAAG,IAAI,2BAAQ,CAAC,IAAI,EAAE,oBAAoB,EAAE;YACxD,cAAc,EAAE,iBAAiB;YACjC,iBAAiB,EAAE,qBAAqB;YACxC,aAAa,EAAE,sBAAQ,CAAC,OAAO,CAAC,CAAC,CAAC;YAClC,YAAY,EAAE,sBAAQ,CAAC,KAAK,CAAC,CAAC,CAAC;SAChC,CAAC,CAAC;QACH,MAAM,UAAU,GAAG,IAAI,4BAAc,CAAC,IAAI,EAAE,UAAU,EAAE;YACtD,YAAY,EAAE,QAAQ,CAAC,YAAY;YACnC,YAAY,EAAE,wBAAwB;YACtC,UAAU,EAAE;gBACV,gBAAgB,EAAE,aAAa,CAAC,gBAAgB;gBAChD,WAAW,EAAE,QAAQ,CAAC,WAAW;gBACjC,gBAAgB,EAAE,0BAA0B;aAC7C;SACF,CAAC,CAAC;QACH,IAAI,CAAC,wBAAwB,GAAG,UAAU,CAAC,YAAY,CAAC,kBAAkB,CAAC,CAAC;QAC5E,IAAI,CAAC,qBAAqB,GAAG,KAAK,CAAC,MAAM,CAAC,cAAc,CACtD,IAAI,CAAC,wBAAwB,CAC9B,CAAC;IACJ,CAAC;IAEO,4BAA4B,CAAC,QAAgB;QACnD,MAAM,aAAa,GAAG;YACpB,2CAAmB,CAAC,YAAY;YAChC,2CAAmB,CAAC,aAAa;YACjC,2CAAmB,CAAC,aAAa;SAClC,CAAC;QACF,KAAK,MAAM,YAAY,IAAI,aAAa,EAAE,CAAC;YACzC,IAAI,QAAQ,IAAI,YAAY,CAAC,gBAAgB,CAAC,YAAY,EAAE,CAAC;gBAC3D,OAAO,YAAY,CAAC;YACtB,CAAC;QACH,CAAC;QACD,MAAM,IAAI,KAAK,CACb,wEAAwE,CACzE,CAAC;IACJ,CAAC;;AAtMH,wCAuMC","sourcesContent":["import { join } from \"path\";\nimport { CustomResource, Duration, Size, Tags } from \"aws-cdk-lib\";\nimport * as batch from \"aws-cdk-lib/aws-batch\";\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport { ContainerImage } from \"aws-cdk-lib/aws-ecs\";\nimport { Grant } from \"aws-cdk-lib/aws-iam\";\nimport { Code, Runtime, SingletonFunction } from \"aws-cdk-lib/aws-lambda\";\nimport { IBucket } from \"aws-cdk-lib/aws-s3\";\nimport { Provider } from 
\"aws-cdk-lib/custom-resources\";\nimport { Construct } from \"constructs\";\nimport {\n  CompileOptions,\n  Model,\n  OptLevel,\n  Parameters,\n  QuantDtype,\n} from \"./model\";\nimport { NeuronxInstanceType } from \"./neuronx-instance-type\";\nimport { NeuronOptimizedMachineImage } from \"./private/neuron-optimized-machine-image\";\nimport { calcTpDegree } from \"./private/util\";\n\n/**\n * Compile runtime.\n */\nexport interface CompileRuntime {\n  /**\n   * An image of the container where the compile job is executed.\n   */\n  readonly image: ContainerImage;\n  /**\n   * Neuronx version included in container image.\n   */\n  readonly neuronxVersion: string;\n}\n/**\n * Props of NeuronxCompile.\n */\nexport interface NeuronxCompileProps {\n  /**\n   * VPC in which this will launch compile worker instance.\n   */\n  readonly vpc: ec2.IVpc;\n  /**\n   * The bucket to upload compiled artifacts.\n   */\n  readonly bucket: IBucket;\n  /**\n   * The model to be compiled.\n   */\n  readonly model: Model;\n  /**\n   * The instance type of compile worker instance.\n   */\n  readonly instanceType?: NeuronxInstanceType;\n  /**\n   * The root volume of worker instance.\n   * @default - N bilion parameters * 5GiB EBS\n   */\n  readonly volumeSize?: Size;\n  /**\n   * Compile runtime.\n   * @default { neuronxSdkVersion: \"2.19.0\", image: ContainerImage.fromRegistry(\"public.ecr.aws/neuron/pytorch-training-neuronx:2.1.2-neuronx-py310-sdk2.19.0-ubuntu20.04\")}\n   */\n  readonly runtime?: CompileRuntime;\n  /**\n   * Neuronx compile options.\n   * @default - Each properties are set default.\n   */\n  readonly compileOptions?: CompileOptions;\n  /**\n   * Whether or not to use spot instances. 
Spot instances are less expensive EC2 instances that can be reclaimed by EC2 at any time; your job will be given two minutes of notice before reclamation.\n   *\n   * @default false\n   */\n  readonly spot?: boolean;\n  /**\n   * The VPC Subnets this Compute Environment will launch instances in.\n   *\n   * @default - new subnets will be created\n   */\n  readonly vpcSubnets?: ec2.SubnetSelection;\n}\n\n/**\n * Neuronx compile construct. Compile the model to work with Inferentia2 and Trainium1 and upload it to an S3 bucket.\n */\nexport class NeuronxCompile extends Construct {\n  readonly compiledArtifactS3Bucket: IBucket;\n  /**\n   * S3 URL that compiled artifact uploaded.\n   */\n  readonly compiledArtifactS3Url: string;\n  /**\n   * S3 Prefix that compiled artifact uploaded.\n   */\n  readonly compiledArtifactS3Prefix: string;\n  readonly tpDegree: number;\n  readonly quantDtype?: QuantDtype;\n  readonly nPositions: number;\n  readonly optLevel: OptLevel;\n  readonly parameters: Parameters;\n  constructor(scope: Construct, id: string, props: NeuronxCompileProps) {\n    super(scope, id);\n\n    this.parameters = props.model.options.parameters;\n    this.compiledArtifactS3Bucket = props.bucket;\n    this.nPositions = props.compileOptions?.nPositions ?? 4096;\n    this.quantDtype = props.compileOptions?.quantDtype;\n    this.optLevel = props.compileOptions?.optLevel ?? OptLevel.BEST_BALANCE;\n    this.tpDegree =\n      props.compileOptions?.tpDegree ??\n      calcTpDegree(props.model.options.parameters, {\n        nPositions: this.nPositions,\n        quantDtype: this.quantDtype,\n      });\n    const instanceType =\n      props.instanceType ?? 
this.selectInstanceTypeByTpDegree(this.tpDegree);\n    const launchTemplate = new ec2.LaunchTemplate(this, \"LaunchTemplate\", {\n      blockDevices: [\n        {\n          deviceName: \"/dev/xvda\",\n          volume: ec2.BlockDeviceVolume.ebs(\n            props.volumeSize?.toGibibytes() ??\n              props.model.options.parameters.toBilion() * 5,\n          ),\n        },\n      ],\n    });\n    const computeEnvironment = new batch.ManagedEc2EcsComputeEnvironment(\n      this,\n      \"ComputeEnvironment\",\n      {\n        vpc: props.vpc,\n        vpcSubnets: props.vpcSubnets,\n        instanceTypes: [instanceType.instanceType],\n        useOptimalInstanceClasses: false,\n        images: [\n          {\n            image: new NeuronOptimizedMachineImage(this, \"MachinImage\"),\n            // @ts-ignore\n            imageType: \"ECS_AL2023\",\n          },\n        ],\n        launchTemplate,\n        spot: props.spot,\n      },\n    );\n    (\n      computeEnvironment.node.defaultChild as batch.CfnComputeEnvironment\n    ).addPropertyOverride(\n      \"ComputeResources.LaunchTemplate.Version\",\n      launchTemplate.latestVersionNumber,\n    );\n\n    Tags.of(computeEnvironment).add(\"Name\", \"neuronx-compile-worker\");\n    const jobQueue = new batch.JobQueue(this, \"JobQueue\", {\n      computeEnvironments: [\n        {\n          computeEnvironment,\n          order: 1,\n        },\n      ],\n      jobStateTimeLimitActions: [\n        {\n          state: batch.JobStateTimeLimitActionsState.RUNNABLE,\n          reason: batch.JobStateTimeLimitActionsReason.JOB_RESOURCE_REQUIREMENT,\n          maxTime: Duration.minutes(10),\n          action: batch.JobStateTimeLimitActionsAction.CANCEL,\n        },\n      ],\n    });\n\n    const runtime: CompileRuntime = props.runtime ?? 
{\n      image: ContainerImage.fromAsset(join(__dirname, \"../scripts/compile\")),\n      neuronxVersion: \"2.19.1\",\n    };\n    let compiledArtifactPathPrefix = `${props.model.modelId}/neuronx-${runtime.neuronxVersion}/tp${this.tpDegree}-np${this.nPositions}-opt${this.optLevel}`;\n    if (this.quantDtype!!) {\n      compiledArtifactPathPrefix = `${compiledArtifactPathPrefix}-quant${this.quantDtype}`;\n    }\n    props.bucket.grantReadWrite(\n      computeEnvironment.instanceRole!,\n      `${compiledArtifactPathPrefix}/*`,\n    );\n    const linuxParameters = new batch.LinuxParameters(this, \"LinuxParameters\");\n    linuxParameters.addDevices(\n      ...Array.from({\n        length: instanceType.acceleratorChips.chips,\n      }).map((_, index) => ({\n        hostPath: `/dev/neuron${index}`,\n        containerPath: `/dev/neuron${index}`,\n        permissions: [\n          batch.DevicePermission.READ,\n          batch.DevicePermission.WRITE,\n        ],\n      })),\n    );\n    const jobDefinition = new batch.EcsJobDefinition(this, \"JobDefinition\", {\n      container: new batch.EcsEc2ContainerDefinition(\n        this,\n        \"ContainerDefinition\",\n        {\n          image: runtime.image,\n          // The fllowing command was executed on inf2.8xlarge\n          // sh-5.2$ free -b\n          // \t\t\ttotal\t\t\t\t\tused\t\t\tfree\t\t\t\t\tshared\tbuff/cache\tavailable\n          // Mem:\t132265766912\t866320384\t130341785600\t667648\t1057660928\t130529148928\n          // https://docs.aws.amazon.com/batch/latest/userguide/memory-management.html\n          memory: Size.mebibytes(\n            Math.ceil(instanceType.memory.toMebibytes() * 0.95),\n          ),\n          cpu: instanceType.vCpu,\n          environment: {\n            MODEL_ID: props.model.modelId,\n            TP_DEGREE: this.tpDegree.toString(),\n            N_POSITIONS: this.nPositions.toString(),\n            OPT_LEVEL: this.optLevel.toString(),\n            QUANT_DTYPE: 
this.quantDtype?.toString() ?? \"\",\n            ARTIFACT_S3_URL: props.bucket.s3UrlForObject(\n              compiledArtifactPathPrefix,\n            ),\n          },\n          linuxParameters,\n        },\n      ),\n    });\n\n    const jobSubmitFunction = new SingletonFunction(this, \"JobSubmitFunction\", {\n      code: Code.fromAsset(join(__dirname, \"private/await-compile-job\")),\n      handler: \"index.onEvent\",\n      runtime: Runtime.NODEJS_20_X,\n      uuid: \"1361f469-5c92-4c46-9e11-5d1dbf925bac\",\n    });\n    jobDefinition.grantSubmitJob(jobSubmitFunction, jobQueue);\n    const jobMonitoringFunction = new SingletonFunction(\n      this,\n      \"JobMonitoringFunction\",\n      {\n        code: Code.fromAsset(join(__dirname, \"private/await-compile-job\")),\n        handler: \"index.isComplete\",\n        runtime: Runtime.NODEJS_20_X,\n        uuid: \"df16dba8-5f77-480c-a6ad-cfdf74c3de62\",\n      },\n    );\n    Grant.addToPrincipal({\n      resourceArns: [\"*\"],\n      grantee: jobMonitoringFunction,\n      actions: [\"batch:DescribeJobs\"],\n    });\n    const provider = new Provider(this, \"CompileJobProvider\", {\n      onEventHandler: jobSubmitFunction,\n      isCompleteHandler: jobMonitoringFunction,\n      queryInterval: Duration.minutes(1),\n      totalTimeout: Duration.hours(1),\n    });\n    const compileJob = new CustomResource(this, \"Resource\", {\n      serviceToken: provider.serviceToken,\n      resourceType: \"Custom::NeuronxCompile\",\n      properties: {\n        jobDefinitionArn: jobDefinition.jobDefinitionArn,\n        jobQueueArn: jobQueue.jobQueueArn,\n        artifactS3Prefix: compiledArtifactPathPrefix,\n      },\n    });\n    this.compiledArtifactS3Prefix = compileJob.getAttString(\"ArtifactS3Prefix\");\n    this.compiledArtifactS3Url = props.bucket.s3UrlForObject(\n      this.compiledArtifactS3Prefix,\n    );\n  }\n\n  private selectInstanceTypeByTpDegree(tpDegree: number) {\n    const instanceTypes = [\n      
NeuronxInstanceType.INF2_8XLARGE,\n      NeuronxInstanceType.INF2_24XLARGE,\n      NeuronxInstanceType.INF2_48XLARGE,\n    ];\n    for (const instanceType of instanceTypes) {\n      if (tpDegree <= instanceType.acceleratorChips.neuronxCores) {\n        return instanceType;\n      }\n    }\n    throw new Error(\n      \"This model is too large, I can not support this model current version.\",\n    );\n  }\n}\n"]}
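Note on `selectInstanceTypeByTpDegree` in the source above: when no `instanceType` is passed, the construct walks a fixed list of Inferentia2 instance types from smallest to largest and returns the first whose NeuronCore count is at least the tensor-parallel degree. The following is a minimal sketch of that selection, assuming the chip counts shown elsewhere in this diff (1, 6, and 12 chips, two cores per chip). The `Candidate` type and `selectByTpDegree` name are hypothetical, and plain strings stand in for the package's `NeuronxInstanceType` objects.

```typescript
// Hypothetical stand-in for NeuronxInstanceType: name plus chip count.
interface Candidate {
  name: string;
  chips: number;
}

// Ordered from smallest to largest, as in the source.
const candidates: Candidate[] = [
  { name: "inf2.8xlarge", chips: 1 },
  { name: "inf2.24xlarge", chips: 6 },
  { name: "inf2.48xlarge", chips: 12 },
];

function selectByTpDegree(tpDegree: number): string {
  for (const candidate of candidates) {
    // Inferentia2Chips computes neuronxCores as chips * 2.
    const neuronxCores = candidate.chips * 2;
    if (tpDegree <= neuronxCores) {
      return candidate.name;
    }
  }
  throw new Error("No listed instance type has enough NeuronCores.");
}

console.log(selectByTpDegree(8)); // inf2.24xlarge
```

The new TRN1 instance types do not appear in this candidate list, so in this version they would presumably have to be passed explicitly through `instanceType`.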
@@ -14,6 +14,12 @@ export declare class Inferentia2Chips implements IAcceleratorChips {
  readonly acceleratorMemory: Size;
  constructor(chips: number);
  }
+ export declare class TrainiumChips implements IAcceleratorChips {
+ readonly chips: number;
+ readonly neuronxCores: number;
+ readonly acceleratorMemory: Size;
+ constructor(chips: number);
+ }
  export declare class NeuronxInstanceType {
  readonly instanceType: ec2.InstanceType;
  readonly vCpu: number;
@@ -35,6 +41,14 @@ export declare class NeuronxInstanceType {
  * ml.inf2.48xlarge
  */
  static readonly INF2_48XLARGE: NeuronxInstanceType;
+ /**
+ * ml.trn1.2xlarge
+ */
+ static readonly TRN1_2XLARGE: NeuronxInstanceType;
+ /**
+ * ml.trn1.32xlarge
+ */
+ static readonly TRN1_32XLARGE: NeuronxInstanceType;
  private constructor();
  /**
  * Return the instance type as a string
@@ -1,7 +1,7 @@
  "use strict";
- var _a, _b;
+ var _a, _b, _c;
  Object.defineProperty(exports, "__esModule", { value: true });
- exports.NeuronxInstanceType = exports.Inferentia2Chips = void 0;
+ exports.NeuronxInstanceType = exports.TrainiumChips = exports.Inferentia2Chips = void 0;
  const JSII_RTTI_SYMBOL_1 = Symbol.for("jsii.rtti");
  const aws_cdk_lib_1 = require("aws-cdk-lib");
  const ec2 = require("aws-cdk-lib/aws-ec2");
@@ -14,7 +14,17 @@ class Inferentia2Chips {
  }
  exports.Inferentia2Chips = Inferentia2Chips;
  _a = JSII_RTTI_SYMBOL_1;
- Inferentia2Chips[_a] = { fqn: "aws-cdk-neuronx-patterns.Inferentia2Chips", version: "0.0.4" };
+ Inferentia2Chips[_a] = { fqn: "aws-cdk-neuronx-patterns.Inferentia2Chips", version: "0.0.6" };
+ class TrainiumChips {
+ constructor(chips) {
+ this.chips = chips;
+ this.neuronxCores = chips * 2;
+ this.acceleratorMemory = aws_cdk_lib_1.Size.gibibytes(16 * this.neuronxCores);
+ }
+ }
+ exports.TrainiumChips = TrainiumChips;
+ _b = JSII_RTTI_SYMBOL_1;
+ TrainiumChips[_b] = { fqn: "aws-cdk-neuronx-patterns.TrainiumChips", version: "0.0.6" };
  class NeuronxInstanceType {
  constructor(instanceType, vCpu, memory, acceleratorChips) {
  this.instanceType = instanceType;
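Note on the `TrainiumChips` class added in this hunk: it derives `neuronxCores` as `chips * 2` and `acceleratorMemory` as 16 GiB per core. The sketch below reproduces only that arithmetic with plain numbers in place of the CDK `Size` type, for the two chip counts this diff passes to the constructor (1 and 16). `trainiumTotals` is a hypothetical helper name, not part of the package.

```typescript
// Hypothetical helper reproducing the arithmetic in TrainiumChips,
// using plain GiB numbers instead of the CDK Size type.
function trainiumTotals(chips: number): {
  chips: number;
  neuronxCores: number;
  acceleratorMemoryGiB: number;
} {
  const neuronxCores = chips * 2; // two NeuronCores per chip
  const acceleratorMemoryGiB = 16 * neuronxCores; // 16 GiB per core
  return { chips, neuronxCores, acceleratorMemoryGiB };
}

const small = trainiumTotals(1); // chip count used for TRN1_2XLARGE
const large = trainiumTotals(16); // chip count used for TRN1_32XLARGE
console.log(small.neuronxCores, small.acceleratorMemoryGiB); // 2 32
console.log(large.neuronxCores, large.acceleratorMemoryGiB); // 32 512
```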
@@ -31,8 +41,8 @@ class NeuronxInstanceType {
  }
  }
  exports.NeuronxInstanceType = NeuronxInstanceType;
- _b = JSII_RTTI_SYMBOL_1;
- NeuronxInstanceType[_b] = { fqn: "aws-cdk-neuronx-patterns.NeuronxInstanceType", version: "0.0.4" };
+ _c = JSII_RTTI_SYMBOL_1;
+ NeuronxInstanceType[_c] = { fqn: "aws-cdk-neuronx-patterns.NeuronxInstanceType", version: "0.0.6" };
  /**
  * ml.inf2.xlarge
  */
@@ -49,4 +59,12 @@ NeuronxInstanceType.INF2_24XLARGE = new NeuronxInstanceType(ec2.InstanceType.of(
  * ml.inf2.48xlarge
  */
  NeuronxInstanceType.INF2_48XLARGE = new NeuronxInstanceType(ec2.InstanceType.of(ec2.InstanceClass.INF2, ec2.InstanceSize.XLARGE48), 192, aws_cdk_lib_1.Size.gibibytes(768), new Inferentia2Chips(12));
- //# sourceMappingURL=data:application/json;base64,eyJ2ZXJzaW9uIjozLCJmaWxlIjoibmV1cm9ueC1pbnN0YW5jZS10eXBlLmpzIiwic291cmNlUm9vdCI6IiIsInNvdXJjZXMiOlsiLi4vc3JjL25ldXJvbngtaW5zdGFuY2UtdHlwZS50cyJdLCJuYW1lcyI6W10sIm1hcHBpbmdzIjoiOzs7OztBQUFBLDZDQUFtQztBQUNuQywyQ0FBMkM7QUFXM0MsTUFBYSxnQkFBZ0I7SUFHM0IsWUFBcUIsS0FBYTtRQUFiLFVBQUssR0FBTCxLQUFLLENBQVE7UUFDaEMsSUFBSSxDQUFDLFlBQVksR0FBRyxLQUFLLEdBQUcsQ0FBQyxDQUFDO1FBQzlCLElBQUksQ0FBQyxpQkFBaUIsR0FBRyxrQkFBSSxDQUFDLFNBQVMsQ0FBQyxFQUFFLEdBQUcsSUFBSSxDQUFDLFlBQVksQ0FBQyxDQUFDO0lBQ2xFLENBQUM7O0FBTkgsNENBT0M7OztBQUVELE1BQWEsbUJBQW1CO0lBcUM5QixZQUNXLFlBQThCLEVBQzlCLElBQVksRUFDWixNQUFZLEVBQ1osZ0JBQW1DO1FBSG5DLGlCQUFZLEdBQVosWUFBWSxDQUFrQjtRQUM5QixTQUFJLEdBQUosSUFBSSxDQUFRO1FBQ1osV0FBTSxHQUFOLE1BQU0sQ0FBTTtRQUNaLHFCQUFnQixHQUFoQixnQkFBZ0IsQ0FBbUI7SUFDM0MsQ0FBQztJQUNKOzs7T0FHRztJQUNILFFBQVE7UUFDTixPQUFPLE1BQU0sSUFBSSxDQUFDLFlBQVksQ0FBQyxRQUFRLEVBQUUsRUFBRSxDQUFDO0lBQzlDLENBQUM7O0FBakRILGtEQWtEQzs7O0FBakRDOztHQUVHO0FBQ29CLCtCQUFXLEdBQUcsSUFBSSxtQkFBbUIsQ0FDMUQsR0FBRyxDQUFDLFlBQVksQ0FBQyxFQUFFLENBQUMsR0FBRyxDQUFDLGFBQWEsQ0FBQyxJQUFJLEVBQUUsR0FBRyxDQUFDLFlBQVksQ0FBQyxNQUFNLENBQUMsRUFDcEUsQ0FBQyxFQUNELGtCQUFJLENBQUMsU0FBUyxDQUFDLEVBQUUsQ0FBQyxFQUNsQixJQUFJLGdCQUFnQixDQUFDLENBQUMsQ0FBQyxDQUN4QixDQUFDO0FBQ0Y7O0dBRUc7QUFDb0IsZ0NBQVksR0FBRyxJQUFJLG1CQUFtQixDQUMzRCxHQUFHLENBQUMsWUFBWSxDQUFDLEVBQUUsQ0FBQyxHQUFHLENBQUMsYUFBYSxDQUFDLElBQUksRUFBRSxHQUFHLENBQUMsWUFBWSxDQUFDLE9BQU8sQ0FBQyxFQUNyRSxFQUFFLEVBQ0Ysa0JBQUksQ0FBQyxTQUFTLENBQUMsR0FBRyxDQUFDLEVBQ25CLElBQUksZ0JBQWdCLENBQUMsQ0FBQyxDQUFDLENBQ3hCLENBQUM7QUFDRjs7R0FFRztBQUNvQixpQ0FBYSxHQUFHLElBQUksbUJBQW1CLENBQzVELEdBQUcsQ0FBQyxZQUFZLENBQUMsRUFBRSxDQUFDLEdBQUcsQ0FBQyxhQUFhLENBQUMsSUFBSSxFQUFFLEdBQUcsQ0FBQyxZQUFZLENBQUMsUUFBUSxDQUFDLEVBQ3RFLEVBQUUsRUFDRixrQkFBSSxDQUFDLFNBQVMsQ0FBQyxHQUFHLENBQUMsRUFDbkIsSUFBSSxnQkFBZ0IsQ0FBQyxDQUFDLENBQUMsQ0FDeEIsQ0FBQztBQUNGOztHQUVHO0FBQ29CLGlDQUFhLEdBQUcsSUFBSSxtQkFBbUIsQ0FDNUQsR0FBRyxDQUFDLFlBQVksQ0FBQyxFQUFFLENBQUMsR0FBRyxDQUFDLGFBQWEsQ0FBQyxJQUFJLEVBQUUsR0FB
RyxDQUFDLFlBQVksQ0FBQyxRQUFRLENBQUMsRUFDdEUsR0FBRyxFQUNILGtCQUFJLENBQUMsU0FBUyxDQUFDLEdBQUcsQ0FBQyxFQUNuQixJQUFJLGdCQUFnQixDQUFDLEVBQUUsQ0FBQyxDQUN6QixDQUFDIiwic291cmNlc0NvbnRlbnQiOlsiaW1wb3J0IHsgU2l6ZSB9IGZyb20gXCJhd3MtY2RrLWxpYlwiO1xuaW1wb3J0ICogYXMgZWMyIGZyb20gXCJhd3MtY2RrLWxpYi9hd3MtZWMyXCI7XG5cbi8qKlxuICpcbiAqL1xuZXhwb3J0IGludGVyZmFjZSBJQWNjZWxlcmF0b3JDaGlwcyB7XG4gIHJlYWRvbmx5IGNoaXBzOiBudW1iZXI7XG4gIHJlYWRvbmx5IG5ldXJvbnhDb3JlczogbnVtYmVyO1xuICByZWFkb25seSBhY2NlbGVyYXRvck1lbW9yeTogU2l6ZTtcbn1cblxuZXhwb3J0IGNsYXNzIEluZmVyZW50aWEyQ2hpcHMgaW1wbGVtZW50cyBJQWNjZWxlcmF0b3JDaGlwcyB7XG4gIHJlYWRvbmx5IG5ldXJvbnhDb3JlczogbnVtYmVyO1xuICByZWFkb25seSBhY2NlbGVyYXRvck1lbW9yeTogU2l6ZTtcbiAgY29uc3RydWN0b3IocmVhZG9ubHkgY2hpcHM6IG51bWJlcikge1xuICAgIHRoaXMubmV1cm9ueENvcmVzID0gY2hpcHMgKiAyO1xuICAgIHRoaXMuYWNjZWxlcmF0b3JNZW1vcnkgPSBTaXplLmdpYmlieXRlcygxNiAqIHRoaXMubmV1cm9ueENvcmVzKTtcbiAgfVxufVxuXG5leHBvcnQgY2xhc3MgTmV1cm9ueEluc3RhbmNlVHlwZSB7XG4gIC8qKlxuICAgKiBtbC5pbmYyLnhsYXJnZVxuICAgKi9cbiAgcHVibGljIHN0YXRpYyByZWFkb25seSBJTkYyX1hMQVJHRSA9IG5ldyBOZXVyb254SW5zdGFuY2VUeXBlKFxuICAgIGVjMi5JbnN0YW5jZVR5cGUub2YoZWMyLkluc3RhbmNlQ2xhc3MuSU5GMiwgZWMyLkluc3RhbmNlU2l6ZS5YTEFSR0UpLFxuICAgIDQsXG4gICAgU2l6ZS5naWJpYnl0ZXMoMTYpLFxuICAgIG5ldyBJbmZlcmVudGlhMkNoaXBzKDEpLFxuICApO1xuICAvKipcbiAgICogbWwuaW5mMi44eGxhcmdlXG4gICAqL1xuICBwdWJsaWMgc3RhdGljIHJlYWRvbmx5IElORjJfOFhMQVJHRSA9IG5ldyBOZXVyb254SW5zdGFuY2VUeXBlKFxuICAgIGVjMi5JbnN0YW5jZVR5cGUub2YoZWMyLkluc3RhbmNlQ2xhc3MuSU5GMiwgZWMyLkluc3RhbmNlU2l6ZS5YTEFSR0U4KSxcbiAgICAzMixcbiAgICBTaXplLmdpYmlieXRlcygxMjgpLFxuICAgIG5ldyBJbmZlcmVudGlhMkNoaXBzKDEpLFxuICApO1xuICAvKipcbiAgICogbWwuaW5mMi4yNHhsYXJnZVxuICAgKi9cbiAgcHVibGljIHN0YXRpYyByZWFkb25seSBJTkYyXzI0WExBUkdFID0gbmV3IE5ldXJvbnhJbnN0YW5jZVR5cGUoXG4gICAgZWMyLkluc3RhbmNlVHlwZS5vZihlYzIuSW5zdGFuY2VDbGFzcy5JTkYyLCBlYzIuSW5zdGFuY2VTaXplLlhMQVJHRTI0KSxcbiAgICA5NixcbiAgICBTaXplLmdpYmlieXRlcygzODQpLFxuICAgIG5ldyBJbmZlcmVudGlhMkNoaXBzKDYpLFxuICApO1xuICAvKipcbiAgICogbWwuaW5mMi40OHhsYXJnZVxuICAgKi9cbiAgcHVibGljIHN0
YXRpYyByZWFkb25seSBJTkYyXzQ4WExBUkdFID0gbmV3IE5ldXJvbnhJbnN0YW5jZVR5cGUoXG4gICAgZWMyLkluc3RhbmNlVHlwZS5vZihlYzIuSW5zdGFuY2VDbGFzcy5JTkYyLCBlYzIuSW5zdGFuY2VTaXplLlhMQVJHRTQ4KSxcbiAgICAxOTIsXG4gICAgU2l6ZS5naWJpYnl0ZXMoNzY4KSxcbiAgICBuZXcgSW5mZXJlbnRpYTJDaGlwcygxMiksXG4gICk7XG4gIHByaXZhdGUgY29uc3RydWN0b3IoXG4gICAgcmVhZG9ubHkgaW5zdGFuY2VUeXBlOiBlYzIuSW5zdGFuY2VUeXBlLFxuICAgIHJlYWRvbmx5IHZDcHU6IG51bWJlcixcbiAgICByZWFkb25seSBtZW1vcnk6IFNpemUsXG4gICAgcmVhZG9ubHkgYWNjZWxlcmF0b3JDaGlwczogSUFjY2VsZXJhdG9yQ2hpcHMsXG4gICkge31cbiAgLyoqXG4gICAqIFJldHVybiB0aGUgaW5zdGFuY2UgdHlwZSBhcyBhIHN0cmluZ1xuICAgKiBAcmV0dXJucyBUaGUgaW5zdGFuY2UgdHlwZSBhcyBhIHN0cmluZ1xuICAgKi9cbiAgdG9TdHJpbmcoKSB7XG4gICAgcmV0dXJuIGBtbC4ke3RoaXMuaW5zdGFuY2VUeXBlLnRvU3RyaW5nKCl9YDtcbiAgfVxufVxuIl19
62
+ /**
63
+ * ml.trn1.2xlarge
64
+ */
65
+ NeuronxInstanceType.TRN1_2XLARGE = new NeuronxInstanceType(ec2.InstanceType.of(ec2.InstanceClass.TRN1, ec2.InstanceSize.XLARGE2), 8, aws_cdk_lib_1.Size.gibibytes(32), new TrainiumChips(1));
66
+ /**
67
+ * ml.trn1.32xlarge
68
+ */
69
+ NeuronxInstanceType.TRN1_32XLARGE = new NeuronxInstanceType(ec2.InstanceType.of(ec2.InstanceClass.TRN1, ec2.InstanceSize.XLARGE32), 128, aws_cdk_lib_1.Size.gibibytes(512), new TrainiumChips(16));
70
+ //# sourceMappingURL=data:application/json;base64,eyJ2ZXJzaW9uIjozLCJmaWxlIjoibmV1cm9ueC1pbnN0YW5jZS10eXBlLmpzIiwic291cmNlUm9vdCI6IiIsInNvdXJjZXMiOlsiLi4vc3JjL25ldXJvbngtaW5zdGFuY2UtdHlwZS50cyJdLCJuYW1lcyI6W10sIm1hcHBpbmdzIjoiOzs7OztBQUFBLDZDQUFtQztBQUNuQywyQ0FBMkM7QUFXM0MsTUFBYSxnQkFBZ0I7SUFHM0IsWUFBcUIsS0FBYTtRQUFiLFVBQUssR0FBTCxLQUFLLENBQVE7UUFDaEMsSUFBSSxDQUFDLFlBQVksR0FBRyxLQUFLLEdBQUcsQ0FBQyxDQUFDO1FBQzlCLElBQUksQ0FBQyxpQkFBaUIsR0FBRyxrQkFBSSxDQUFDLFNBQVMsQ0FBQyxFQUFFLEdBQUcsSUFBSSxDQUFDLFlBQVksQ0FBQyxDQUFDO0lBQ2xFLENBQUM7O0FBTkgsNENBT0M7OztBQUVELE1BQWEsYUFBYTtJQUd4QixZQUFxQixLQUFhO1FBQWIsVUFBSyxHQUFMLEtBQUssQ0FBUTtRQUNoQyxJQUFJLENBQUMsWUFBWSxHQUFHLEtBQUssR0FBRyxDQUFDLENBQUM7UUFDOUIsSUFBSSxDQUFDLGlCQUFpQixHQUFHLGtCQUFJLENBQUMsU0FBUyxDQUFDLEVBQUUsR0FBRyxJQUFJLENBQUMsWUFBWSxDQUFDLENBQUM7SUFDbEUsQ0FBQzs7QUFOSCxzQ0FPQzs7O0FBRUQsTUFBYSxtQkFBbUI7SUF1RDlCLFlBQ1csWUFBOEIsRUFDOUIsSUFBWSxFQUNaLE1BQVksRUFDWixnQkFBbUM7UUFIbkMsaUJBQVksR0FBWixZQUFZLENBQWtCO1FBQzlCLFNBQUksR0FBSixJQUFJLENBQVE7UUFDWixXQUFNLEdBQU4sTUFBTSxDQUFNO1FBQ1oscUJBQWdCLEdBQWhCLGdCQUFnQixDQUFtQjtJQUMzQyxDQUFDO0lBQ0o7OztPQUdHO0lBQ0gsUUFBUTtRQUNOLE9BQU8sTUFBTSxJQUFJLENBQUMsWUFBWSxDQUFDLFFBQVEsRUFBRSxFQUFFLENBQUM7SUFDOUMsQ0FBQzs7QUFuRUgsa0RBb0VDOzs7QUFuRUM7O0dBRUc7QUFDb0IsK0JBQVcsR0FBRyxJQUFJLG1CQUFtQixDQUMxRCxHQUFHLENBQUMsWUFBWSxDQUFDLEVBQUUsQ0FBQyxHQUFHLENBQUMsYUFBYSxDQUFDLElBQUksRUFBRSxHQUFHLENBQUMsWUFBWSxDQUFDLE1BQU0sQ0FBQyxFQUNwRSxDQUFDLEVBQ0Qsa0JBQUksQ0FBQyxTQUFTLENBQUMsRUFBRSxDQUFDLEVBQ2xCLElBQUksZ0JBQWdCLENBQUMsQ0FBQyxDQUFDLENBQ3hCLENBQUM7QUFDRjs7R0FFRztBQUNvQixnQ0FBWSxHQUFHLElBQUksbUJBQW1CLENBQzNELEdBQUcsQ0FBQyxZQUFZLENBQUMsRUFBRSxDQUFDLEdBQUcsQ0FBQyxhQUFhLENBQUMsSUFBSSxFQUFFLEdBQUcsQ0FBQyxZQUFZLENBQUMsT0FBTyxDQUFDLEVBQ3JFLEVBQUUsRUFDRixrQkFBSSxDQUFDLFNBQVMsQ0FBQyxHQUFHLENBQUMsRUFDbkIsSUFBSSxnQkFBZ0IsQ0FBQyxDQUFDLENBQUMsQ0FDeEIsQ0FBQztBQUNGOztHQUVHO0FBQ29CLGlDQUFhLEdBQUcsSUFBSSxtQkFBbUIsQ0FDNUQsR0FBRyxDQUFDLFlBQVksQ0FBQyxFQUFFLENBQUMsR0FBRyxDQUFDLGFBQWEsQ0FBQyxJQUFJLEVBQUUsR0FBRyxDQUFDLFlBQVksQ0FB
QyxRQUFRLENBQUMsRUFDdEUsRUFBRSxFQUNGLGtCQUFJLENBQUMsU0FBUyxDQUFDLEdBQUcsQ0FBQyxFQUNuQixJQUFJLGdCQUFnQixDQUFDLENBQUMsQ0FBQyxDQUN4QixDQUFDO0FBQ0Y7O0dBRUc7QUFDb0IsaUNBQWEsR0FBRyxJQUFJLG1CQUFtQixDQUM1RCxHQUFHLENBQUMsWUFBWSxDQUFDLEVBQUUsQ0FBQyxHQUFHLENBQUMsYUFBYSxDQUFDLElBQUksRUFBRSxHQUFHLENBQUMsWUFBWSxDQUFDLFFBQVEsQ0FBQyxFQUN0RSxHQUFHLEVBQ0gsa0JBQUksQ0FBQyxTQUFTLENBQUMsR0FBRyxDQUFDLEVBQ25CLElBQUksZ0JBQWdCLENBQUMsRUFBRSxDQUFDLENBQ3pCLENBQUM7QUFDRjs7R0FFRztBQUNvQixnQ0FBWSxHQUFHLElBQUksbUJBQW1CLENBQzNELEdBQUcsQ0FBQyxZQUFZLENBQUMsRUFBRSxDQUFDLEdBQUcsQ0FBQyxhQUFhLENBQUMsSUFBSSxFQUFFLEdBQUcsQ0FBQyxZQUFZLENBQUMsT0FBTyxDQUFDLEVBQ3JFLENBQUMsRUFDRCxrQkFBSSxDQUFDLFNBQVMsQ0FBQyxFQUFFLENBQUMsRUFDbEIsSUFBSSxhQUFhLENBQUMsQ0FBQyxDQUFDLENBQ3JCLENBQUM7QUFDRjs7R0FFRztBQUNvQixpQ0FBYSxHQUFHLElBQUksbUJBQW1CLENBQzVELEdBQUcsQ0FBQyxZQUFZLENBQUMsRUFBRSxDQUFDLEdBQUcsQ0FBQyxhQUFhLENBQUMsSUFBSSxFQUFFLEdBQUcsQ0FBQyxZQUFZLENBQUMsUUFBUSxDQUFDLEVBQ3RFLEdBQUcsRUFDSCxrQkFBSSxDQUFDLFNBQVMsQ0FBQyxHQUFHLENBQUMsRUFDbkIsSUFBSSxhQUFhLENBQUMsRUFBRSxDQUFDLENBQ3RCLENBQUMiLCJzb3VyY2VzQ29udGVudCI6WyJpbXBvcnQgeyBTaXplIH0gZnJvbSBcImF3cy1jZGstbGliXCI7XG5pbXBvcnQgKiBhcyBlYzIgZnJvbSBcImF3cy1jZGstbGliL2F3cy1lYzJcIjtcblxuLyoqXG4gKlxuICovXG5leHBvcnQgaW50ZXJmYWNlIElBY2NlbGVyYXRvckNoaXBzIHtcbiAgcmVhZG9ubHkgY2hpcHM6IG51bWJlcjtcbiAgcmVhZG9ubHkgbmV1cm9ueENvcmVzOiBudW1iZXI7XG4gIHJlYWRvbmx5IGFjY2VsZXJhdG9yTWVtb3J5OiBTaXplO1xufVxuXG5leHBvcnQgY2xhc3MgSW5mZXJlbnRpYTJDaGlwcyBpbXBsZW1lbnRzIElBY2NlbGVyYXRvckNoaXBzIHtcbiAgcmVhZG9ubHkgbmV1cm9ueENvcmVzOiBudW1iZXI7XG4gIHJlYWRvbmx5IGFjY2VsZXJhdG9yTWVtb3J5OiBTaXplO1xuICBjb25zdHJ1Y3RvcihyZWFkb25seSBjaGlwczogbnVtYmVyKSB7XG4gICAgdGhpcy5uZXVyb254Q29yZXMgPSBjaGlwcyAqIDI7XG4gICAgdGhpcy5hY2NlbGVyYXRvck1lbW9yeSA9IFNpemUuZ2liaWJ5dGVzKDE2ICogdGhpcy5uZXVyb254Q29yZXMpO1xuICB9XG59XG5cbmV4cG9ydCBjbGFzcyBUcmFpbml1bUNoaXBzIGltcGxlbWVudHMgSUFjY2VsZXJhdG9yQ2hpcHMge1xuICByZWFkb25seSBuZXVyb254Q29yZXM6IG51bWJlcjtcbiAgcmVhZG9ubHkgYWNjZWxlcmF0b3JNZW1vcnk6IFNpemU7XG4gIGNvbnN0cnVjdG9yKHJlYWRvbmx5IGNoaXBzOiBudW1iZXIp
IHtcbiAgICB0aGlzLm5ldXJvbnhDb3JlcyA9IGNoaXBzICogMjtcbiAgICB0aGlzLmFjY2VsZXJhdG9yTWVtb3J5ID0gU2l6ZS5naWJpYnl0ZXMoMTYgKiB0aGlzLm5ldXJvbnhDb3Jlcyk7XG4gIH1cbn1cblxuZXhwb3J0IGNsYXNzIE5ldXJvbnhJbnN0YW5jZVR5cGUge1xuICAvKipcbiAgICogbWwuaW5mMi54bGFyZ2VcbiAgICovXG4gIHB1YmxpYyBzdGF0aWMgcmVhZG9ubHkgSU5GMl9YTEFSR0UgPSBuZXcgTmV1cm9ueEluc3RhbmNlVHlwZShcbiAgICBlYzIuSW5zdGFuY2VUeXBlLm9mKGVjMi5JbnN0YW5jZUNsYXNzLklORjIsIGVjMi5JbnN0YW5jZVNpemUuWExBUkdFKSxcbiAgICA0LFxuICAgIFNpemUuZ2liaWJ5dGVzKDE2KSxcbiAgICBuZXcgSW5mZXJlbnRpYTJDaGlwcygxKSxcbiAgKTtcbiAgLyoqXG4gICAqIG1sLmluZjIuOHhsYXJnZVxuICAgKi9cbiAgcHVibGljIHN0YXRpYyByZWFkb25seSBJTkYyXzhYTEFSR0UgPSBuZXcgTmV1cm9ueEluc3RhbmNlVHlwZShcbiAgICBlYzIuSW5zdGFuY2VUeXBlLm9mKGVjMi5JbnN0YW5jZUNsYXNzLklORjIsIGVjMi5JbnN0YW5jZVNpemUuWExBUkdFOCksXG4gICAgMzIsXG4gICAgU2l6ZS5naWJpYnl0ZXMoMTI4KSxcbiAgICBuZXcgSW5mZXJlbnRpYTJDaGlwcygxKSxcbiAgKTtcbiAgLyoqXG4gICAqIG1sLmluZjIuMjR4bGFyZ2VcbiAgICovXG4gIHB1YmxpYyBzdGF0aWMgcmVhZG9ubHkgSU5GMl8yNFhMQVJHRSA9IG5ldyBOZXVyb254SW5zdGFuY2VUeXBlKFxuICAgIGVjMi5JbnN0YW5jZVR5cGUub2YoZWMyLkluc3RhbmNlQ2xhc3MuSU5GMiwgZWMyLkluc3RhbmNlU2l6ZS5YTEFSR0UyNCksXG4gICAgOTYsXG4gICAgU2l6ZS5naWJpYnl0ZXMoMzg0KSxcbiAgICBuZXcgSW5mZXJlbnRpYTJDaGlwcyg2KSxcbiAgKTtcbiAgLyoqXG4gICAqIG1sLmluZjIuNDh4bGFyZ2VcbiAgICovXG4gIHB1YmxpYyBzdGF0aWMgcmVhZG9ubHkgSU5GMl80OFhMQVJHRSA9IG5ldyBOZXVyb254SW5zdGFuY2VUeXBlKFxuICAgIGVjMi5JbnN0YW5jZVR5cGUub2YoZWMyLkluc3RhbmNlQ2xhc3MuSU5GMiwgZWMyLkluc3RhbmNlU2l6ZS5YTEFSR0U0OCksXG4gICAgMTkyLFxuICAgIFNpemUuZ2liaWJ5dGVzKDc2OCksXG4gICAgbmV3IEluZmVyZW50aWEyQ2hpcHMoMTIpLFxuICApO1xuICAvKipcbiAgICogbWwudHJuMS4yeGxhcmdlXG4gICAqL1xuICBwdWJsaWMgc3RhdGljIHJlYWRvbmx5IFRSTjFfMlhMQVJHRSA9IG5ldyBOZXVyb254SW5zdGFuY2VUeXBlKFxuICAgIGVjMi5JbnN0YW5jZVR5cGUub2YoZWMyLkluc3RhbmNlQ2xhc3MuVFJOMSwgZWMyLkluc3RhbmNlU2l6ZS5YTEFSR0UyKSxcbiAgICA4LFxuICAgIFNpemUuZ2liaWJ5dGVzKDMyKSxcbiAgICBuZXcgVHJhaW5pdW1DaGlwcygxKSxcbiAgKTtcbiAgLyoqXG4gICAqIG1sLnRybjEuMzJ4bGFyZ2VcbiAgICovXG4gIHB1YmxpYyBzdGF0aWMgcmVhZG9ubHkgVFJOMV8zMlhMQVJHRSA9IG5ldyBOZXVyb254SW5zdGFuY2VUeXBl
KFxuICAgIGVjMi5JbnN0YW5jZVR5cGUub2YoZWMyLkluc3RhbmNlQ2xhc3MuVFJOMSwgZWMyLkluc3RhbmNlU2l6ZS5YTEFSR0UzMiksXG4gICAgMTI4LFxuICAgIFNpemUuZ2liaWJ5dGVzKDUxMiksXG4gICAgbmV3IFRyYWluaXVtQ2hpcHMoMTYpLFxuICApO1xuICBwcml2YXRlIGNvbnN0cnVjdG9yKFxuICAgIHJlYWRvbmx5IGluc3RhbmNlVHlwZTogZWMyLkluc3RhbmNlVHlwZSxcbiAgICByZWFkb25seSB2Q3B1OiBudW1iZXIsXG4gICAgcmVhZG9ubHkgbWVtb3J5OiBTaXplLFxuICAgIHJlYWRvbmx5IGFjY2VsZXJhdG9yQ2hpcHM6IElBY2NlbGVyYXRvckNoaXBzLFxuICApIHt9XG4gIC8qKlxuICAgKiBSZXR1cm4gdGhlIGluc3RhbmNlIHR5cGUgYXMgYSBzdHJpbmdcbiAgICogQHJldHVybnMgVGhlIGluc3RhbmNlIHR5cGUgYXMgYSBzdHJpbmdcbiAgICovXG4gIHRvU3RyaW5nKCkge1xuICAgIHJldHVybiBgbWwuJHt0aGlzLmluc3RhbmNlVHlwZS50b1N0cmluZygpfWA7XG4gIH1cbn1cbiJdfQ==
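Note on the hunk above: the embedded source map for `neuronx-instance-type.js` shows that the new `TrainiumChips` class derives its fields the same way `Inferentia2Chips` does, with two NeuronCores per chip and 16 GiB of accelerator memory per core. The sketch below restates that arithmetic in plain TypeScript. It is an illustration only: `AcceleratorSpec` and `acceleratorSpec` are hypothetical names, and memory is a plain GiB count rather than the `Size` object from `aws-cdk-lib` that the package uses.

```typescript
// Hypothetical stand-in for the package's IAcceleratorChips shape;
// memory is a plain GiB count here instead of an aws-cdk-lib Size.
interface AcceleratorSpec {
  readonly chips: number;
  readonly neuronxCores: number;
  readonly acceleratorMemoryGiB: number;
}

// Mirrors the arithmetic in the TrainiumChips / Inferentia2Chips constructors:
// neuronxCores = chips * 2, acceleratorMemory = 16 GiB * neuronxCores.
function acceleratorSpec(chips: number): AcceleratorSpec {
  const neuronxCores = chips * 2;
  return { chips, neuronxCores, acceleratorMemoryGiB: 16 * neuronxCores };
}

// Chip counts taken from the two constants added in this diff.
const trn1Small = acceleratorSpec(1); // TRN1_2XLARGE
const trn1Large = acceleratorSpec(16); // TRN1_32XLARGE
console.log(trn1Small.neuronxCores, trn1Small.acceleratorMemoryGiB); // 2 32
console.log(trn1Large.neuronxCores, trn1Large.acceleratorMemoryGiB); // 32 512
```

The core count matters downstream because the endpoint construct divides `acceleratorChips.neuronxCores` by the model's `tpDegree` to set `TS_DEFAULT_WORKERS_PER_MODEL`.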
@@ -58,7 +58,7 @@ class TransformersNeuronxSageMakerInferenceModelData {
58
58
  }
59
59
  exports.TransformersNeuronxSageMakerInferenceModelData = TransformersNeuronxSageMakerInferenceModelData;
60
60
  _a = JSII_RTTI_SYMBOL_1;
61
- TransformersNeuronxSageMakerInferenceModelData[_a] = { fqn: "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData", version: "0.0.4" };
61
+ TransformersNeuronxSageMakerInferenceModelData[_a] = { fqn: "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData", version: "0.0.6" };
62
62
  class TransformersNeuronxSageMakerRealtimeInferenceEndpoint extends constructs_1.Construct {
63
63
  constructor(scope, id, props) {
64
64
  super(scope, id);
@@ -111,7 +111,9 @@ class TransformersNeuronxSageMakerRealtimeInferenceEndpoint extends constructs_1
111
111
  (props.modelData.parameters.toBilion() * 2.5 > 512
112
112
  ? 512
113
113
  : props.modelData.parameters.toBilion() * 2.5));
114
- cfnEndpointConfig.addPropertyOverride("ProductionVariants.0.VolumeSizeInGB", volumeSize.toGibibytes());
114
+ cfnEndpointConfig.addPropertyOverride("ProductionVariants.0.VolumeSizeInGB", instanceType.toString().startsWith("ml.trn1")
115
+ ? undefined
116
+ : volumeSize.toGibibytes());
115
117
  const modelDataDownloadTimeout = props.modelDataDownloadTimeout ??
116
118
  inferenceModelDataDownloadTime(volumeSize);
117
119
  cfnEndpointConfig.addPropertyOverride("ProductionVariants.0.ModelDataDownloadTimeoutInSeconds", modelDataDownloadTimeout.toSeconds());
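Note on the hunk above: only the `VolumeSizeInGB` override changes. For instance types whose string form starts with `ml.trn1` the value becomes `undefined`, while the download timeout is still derived from the computed volume size. A minimal sketch of that sizing logic in plain TypeScript follows, combining this hunk with the `inferenceModelDataDownloadTime` helper from the same file. `EndpointSizing`, `endpointSizing`, and the parameter names are hypothetical; the real code works with `Size` and `Duration` objects, also accepts an explicit `modelDataDownloadTimeout`, and writes the results through `addPropertyOverride`. That an `undefined` override leaves the property out of the synthesized template is an assumption here, not something the diff shows.

```typescript
interface EndpointSizing {
  // undefined means no explicit VolumeSizeInGB value is passed for the variant.
  readonly volumeSizeInGB: number | undefined;
  readonly modelDataDownloadTimeoutSeconds: number;
}

function endpointSizing(
  instanceType: string,
  parametersInBillions: number,
  volumeSizeGiB?: number,
): EndpointSizing {
  // Default volume: 2.5 GiB per billion parameters, capped at 512 GiB.
  const volume = volumeSizeGiB ?? Math.min(parametersInBillions * 2.5, 512);
  // Download timeout: 15 seconds per GiB, capped at 3600 seconds.
  const timeout = Math.min(volume * 15, 3600);
  // Same prefix check as startsWith("ml.trn1") in the diff.
  const isTrn1 = instanceType.indexOf("ml.trn1") === 0;
  return {
    volumeSizeInGB: isTrn1 ? undefined : volume,
    modelDataDownloadTimeoutSeconds: timeout,
  };
}

console.log(endpointSizing("ml.inf2.48xlarge", 70));
console.log(endpointSizing("ml.trn1.32xlarge", 70));
```

For a 70-billion-parameter model both calls compute a 175 GiB volume and a 2625-second timeout; the `ml.trn1` call differs only in reporting `volumeSizeInGB` as `undefined`.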
@@ -142,9 +144,9 @@ class TransformersNeuronxSageMakerRealtimeInferenceEndpoint extends constructs_1
142
144
  }
143
145
  exports.TransformersNeuronxSageMakerRealtimeInferenceEndpoint = TransformersNeuronxSageMakerRealtimeInferenceEndpoint;
144
146
  _b = JSII_RTTI_SYMBOL_1;
145
- TransformersNeuronxSageMakerRealtimeInferenceEndpoint[_b] = { fqn: "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerRealtimeInferenceEndpoint", version: "0.0.4" };
147
+ TransformersNeuronxSageMakerRealtimeInferenceEndpoint[_b] = { fqn: "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerRealtimeInferenceEndpoint", version: "0.0.6" };
146
148
  function inferenceModelDataDownloadTime(volumeSize) {
147
149
  const seconds = volumeSize.toGibibytes() * 15;
148
150
  return aws_cdk_lib_1.Duration.seconds(seconds < 3600 ? seconds : 3600);
149
151
  }
150
- //# sourceMappingURL=data:application/json;base64,{"version":3,"file":"transformers-neuronx-sagemaker-realtime-inference.js","sourceRoot":"","sources":["../src/transformers-neuronx-sagemaker-realtime-inference.ts"],"names":[],"mappings":";;;;;AAAA,+BAA4B;AAC5B,0DAA0D;AAC1D,6CAA6C;AAG7C,qEAIuC;AAEvC,2CAAuC;AACvC,mCAA2E;AAE3E,mEAA8D;AAC9D,yCAA8C;AAmC9C,MAAa,8CAA8C;IACzD,MAAM,CAAC,UAAU,CACf,MAAe,EACf,MAAc,EACd,OAAmC;QAEnC,MAAM,UAAU,GAAG,OAAO,CAAC,cAAc,EAAE,UAAU,IAAI,IAAI,CAAC;QAC9D,MAAM,UAAU,GAAG,OAAO,CAAC,cAAc,EAAE,UAAU,CAAC;QACtD,MAAM,QAAQ,GAAG,OAAO,CAAC,cAAc,EAAE,QAAQ,IAAI,gBAAQ,CAAC,YAAY,CAAC;QAC3E,MAAM,QAAQ,GACZ,OAAO,CAAC,cAAc,EAAE,QAAQ;YAChC,IAAA,mBAAY,EAAC,OAAO,CAAC,UAAU,EAAE;gBAC/B,UAAU;gBACV,UAAU;aACX,CAAC,CAAC;QACL,OAAO,IAAI,8CAA8C,CAAC;YACxD,MAAM;YACN,wBAAwB,EAAE,MAAM;YAChC,UAAU;YACV,UAAU;YACV,QAAQ;YACR,QAAQ;YACR,IAAI,EAAE,OAAO,CAAC,IAAI;YAClB,aAAa,EAAE,OAAO,CAAC,aAAa;YACpC,UAAU,EAAE,OAAO,CAAC,UAAU;SAC/B,CAAC,CAAC;IACL,CAAC;IACD,MAAM,CAAC,kBAAkB,CAAC,OAAuB,EAAE,IAAc;QAC/D,OAAO,IAAI,8CAA8C,CAAC;YACxD,GAAG,OAAO;YACV,MAAM,EAAE,OAAO,CAAC,wBAAwB;YACxC,wBAAwB,EAAE,OAAO,CAAC,wBAAwB;YAC1D,IAAI;SACL,CAAC,CAAC;IACL,CAAC;IAYD,YAAoB,OAWnB;QACC,IAAI,CAAC,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;QAC7B,IAAI,CAAC,wBAAwB,GAAG,OAAO,CAAC,wBAAwB,CAAC;QACjE,IAAI,CAAC,IAAI;YACP,OAAO,CAAC,IAAI;gBACZ,0BAAM,CAAC,KAAK,CACV,IAAA,WAAI,EAAC,SAAS,EAAE,gDAAgD,CAAC,CAClE,CAAC;QACJ,IAAI,CAAC,QAAQ,GAAG,OAAO,CAAC,QAAQ,CAAC;QACjC,IAAI,CAAC,UAAU,GAAG,OAAO,CAAC,UAAU,CAAC;QACrC,IAAI,CAAC,UAAU,GAAG,OAAO,CAAC,UAAU,CAAC;QACrC,IAAI,CAAC,QAAQ,GAAG,OAAO,CAAC,QAAQ,CAAC;QACjC,IAAI,CAAC,aAAa,GAAG,OAAO,CAAC,aAAa,CAAC;QAC3C,IAAI,CAAC,oBAAoB,GAAG,OAAO,CAAC,oBAAoB,CAAC;QACzD,IAAI,CAAC,UAAU,GAAG,OAAO,CAAC,UAAU,CAAC;IACvC,CAAC;;AAxEH,wGAyEC;;;AA2CD,MAAa,qDAAsD,SAAQ,sBAAS;IAYlF,YACE,KAAgB,EAChB,EAAU,EACV,KAAiE;QAEjE,KAAK,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;QACjB,MAAM,KAAK,GACT,KAAK,CAAC,KAAK;YACX,SAAS,CAAC,cAAc,CAAC,SAAS,CAChC,IAAA,WAAI,EAAC,SAAS,EAAE,2CAA2C,CAAC,CAC7D,CAAC;QACJ,MAAM,YAAY,GAChB,KAAK,CAAC,YAAY;YAClB,IAAI,CAAC,4
BAA4B,CAAC,KAAK,CAAC,SAAS,CAAC,QAAQ,CAAC,CAAC;QAC9D,MAAM,MAAM,GAAG,IAAI,oCAAgB,CAAC,IAAI,EAAE,gBAAgB,EAAE;YAC1D,iBAAiB,EAAE,KAAK,CAAC,SAAS,CAAC,MAAM;YACzC,OAAO,EAAE,CAAC,KAAK,CAAC,SAAS,CAAC,IAAI,CAAC;YAC/B,oBAAoB,EAAE,IAAA,WAAI,EACxB,KAAK,CAAC,SAAS,CAAC,wBAAwB,EACxC,MAAM,CACP;SACF,CAAC,CAAC;QACH,MAAM,KAAK,GAAG,IAAI,SAAS,CAAC,KAAK,CAAC,IAAI,EAAE,OAAO,EAAE;YAC/C,UAAU,EAAE;gBACV;oBACE,KAAK;oBACL,WAAW,EAAE;wBACX,mBAAmB,EAAE,KAAK,CAAC,SAAS,CAAC,QAAQ,CAAC,QAAQ,EAAE;wBACxD,2BAA2B,EAAE,CAAC,EAAE,GAAG,EAAE,CAAC,CAAC,QAAQ,EAAE;wBACjD,4BAA4B,EAAE,IAAI,CAAC,KAAK,CACtC,YAAY,CAAC,gBAAgB,CAAC,YAAY;4BACxC,KAAK,CAAC,SAAS,CAAC,QAAQ,CAC3B,CAAC,QAAQ,EAAE;wBACZ,KAAK,EAAE,KAAK,CAAC,SAAS,CAAC,aAAa,IAAI,SAAS;wBACjD,iBAAiB,EACf,KAAK,CAAC,SAAS,CAAC,oBAAoB,IAAI,YAAY;wBACtD,SAAS,EAAE,KAAK,CAAC,SAAS,CAAC,QAAQ,CAAC,QAAQ,EAAE;wBAC9C,WAAW,EAAE,KAAK,CAAC,SAAS,CAAC,UAAU,CAAC,QAAQ,EAAE;wBAClD,SAAS,EAAE,KAAK,CAAC,SAAS,CAAC,QAAQ,CAAC,QAAQ,EAAE;wBAC9C,WAAW,EAAE,KAAK,CAAC,SAAS,CAAC,UAAU,EAAE,QAAQ,EAAE,IAAI,EAAE;wBACzD,GAAG,KAAK,CAAC,WAAW;qBACrB;iBACF;aACF;SACF,CAAC,CAAC;QACH,MAAM,QAAQ,GAAG,KAAK,CAAC,IAAI,CAAC,SAAS,CAAC,OAAO,CAAa,CAAC;QAC3D,QAAQ,CAAC,mBAAmB,CAC1B,qDAAqD,EACrD,KAAK,CAAC,SAAS,CAAC,MAAM,CAAC,cAAc,CACnC,KAAK,CAAC,SAAS,CAAC,wBAAwB,CACzC,CACF,CAAC;QACF,QAAQ,CAAC,mBAAmB,CAC1B,0DAA0D,EAC1D,UAAU,CACX,CAAC;QACF,QAAQ,CAAC,mBAAmB,CAC1B,+DAA+D,EAC/D,MAAM,CACP,CAAC;QACF,KAAK,CAAC,SAAS,CAAC,MAAM,CAAC,SAAS,CAAC,KAAK,CAAC,CAAC;QACxC,KAAK,CAAC,IAAI,CAAC,aAAa,CAAC,MAAM,CAAC,CAAC;QACjC,MAAM,cAAc,GAAG,IAAI,SAAS,CAAC,cAAc,CACjD,IAAI,EACJ,gBAAgB,EAChB;YACE,0BAA0B,EAAE;gBAC1B;oBACE,KAAK;oBACL,WAAW,EAAE,gBAAgB;oBAC7B,YAAY,EAAE,SAAS,CAAC,YAAY,CAAC,EAAE,CAAC,YAAY,CAAC,QAAQ,EAAE,CAAC;iBACjE;aACF;SACF,CACF,CAAC;QACF,MAAM,iBAAiB,GAAG,cAAc,CAAC,IAAI,CAAC,SAAS,CACrD,gBAAgB,CACI,CAAC;QACvB,MAAM,UAAU,GAAG,kBAAI,CAAC,SAAS,CAC/B,KAAK,CAAC,UAAU,EAAE,WAAW,EAAE;YAC7B,CAAC,KAAK,CAAC,SAAS,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,GAAG,GAAG,GAAG;gBAChD,CAAC,CAAC,GAAG;gBACL,CAAC,CAAC,KAAK,CAAC,SAAS,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,GAAG,CAAC,CACn
D,CAAC;QACF,iBAAiB,CAAC,mBAAmB,CACnC,qCAAqC,EACrC,UAAU,CAAC,WAAW,EAAE,CACzB,CAAC;QACF,MAAM,wBAAwB,GAC5B,KAAK,CAAC,wBAAwB;YAC9B,8BAA8B,CAAC,UAAU,CAAC,CAAC;QAC7C,iBAAiB,CAAC,mBAAmB,CACnC,wDAAwD,EACxD,wBAAwB,CAAC,SAAS,EAAE,CACrC,CAAC;QACF,iBAAiB,CAAC,mBAAmB,CACnC,kEAAkE,EAClE,CACE,KAAK,CAAC,kCAAkC,IAAI,wBAAwB,CACrE,CAAC,SAAS,EAAE,CACd,CAAC;QACF,MAAM,QAAQ,GAAG,IAAI,SAAS,CAAC,QAAQ,CAAC,IAAI,EAAE,UAAU,EAAE;YACxD,cAAc;SACf,CAAC,CAAC;QACH,IAAI,CAAC,WAAW,GAAG,QAAQ,CAAC,WAAW,CAAC;QACxC,IAAI,CAAC,YAAY,GAAG,QAAQ,CAAC,YAAY,CAAC;QAC1C,IAAI,CAAC,QAAQ,GAAG,QAAQ,CAAC;IAC3B,CAAC;IACD,WAAW,CAAC,OAAmB;QAC7B,OAAO,IAAI,CAAC,QAAQ,CAAC,WAAW,CAAC,OAAO,CAAC,CAAC;IAC5C,CAAC;IAEO,4BAA4B,CAAC,QAAgB;QACnD,MAAM,aAAa,GAAG;YACpB,2CAAmB,CAAC,YAAY;YAChC,2CAAmB,CAAC,aAAa;YACjC,2CAAmB,CAAC,aAAa;SAClC,CAAC;QACF,KAAK,MAAM,YAAY,IAAI,aAAa,EAAE,CAAC;YACzC,IAAI,QAAQ,IAAI,YAAY,CAAC,gBAAgB,CAAC,YAAY,EAAE,CAAC;gBAC3D,OAAO,YAAY,CAAC;YACtB,CAAC;QACH,CAAC;QACD,MAAM,IAAI,KAAK,CACb,wEAAwE,CACzE,CAAC;IACJ,CAAC;;AA1IH,sHA2IC;;;AAED,SAAS,8BAA8B,CAAC,UAAgB;IACtD,MAAM,OAAO,GAAG,UAAU,CAAC,WAAW,EAAE,GAAG,EAAE,CAAC;IAC9C,OAAO,sBAAQ,CAAC,OAAO,CAAC,OAAO,GAAG,IAAI,CAAC,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC;AAC3D,CAAC","sourcesContent":["import { join } from \"path\";\nimport * as sagemaker from \"@aws-cdk/aws-sagemaker-alpha\";\nimport { Duration, Size } from \"aws-cdk-lib\";\nimport { Grant, IGrantable } from \"aws-cdk-lib/aws-iam\";\nimport { IBucket } from \"aws-cdk-lib/aws-s3\";\nimport {\n  BucketDeployment,\n  ISource,\n  Source,\n} from \"aws-cdk-lib/aws-s3-deployment\";\nimport { CfnEndpointConfig, CfnModel } from \"aws-cdk-lib/aws-sagemaker\";\nimport { Construct } from \"constructs\";\nimport { CompileOptions, OptLevel, Parameters, QuantDtype } from \"./model\";\nimport { NeuronxCompile } from \"./neuronx-compile\";\nimport { NeuronxInstanceType } from \"./neuronx-instance-type\";\nimport { calcTpDegree } from \"./private/util\";\n\n/**\n * Precompiled model options.\n */\nexport interface 
CompiledModelOptions {\n  /**\n   * Neuronx compile options.\n   * @default - Each properties are set default.\n   */\n  readonly compileOptions?: CompileOptions;\n  /**\n   * Code used for inference\n   * @default - using the predefined code\n   */\n  readonly code?: ISource;\n  /**\n   * Model ID or saved path\n   * @default \"./model\"\n   */\n  readonly modelIdOrPath?: string;\n  /**\n   * The path where compiled artifacts (i.e. xxx.neff) are stored\n   * @default \"./compiled\"\n   */\n  readonly compiledArtifactPath?: string;\n}\n\nexport interface BucketCompiledModelOptions extends CompiledModelOptions {\n  /**\n   * The number of parameters of model.\n   */\n  readonly parameters: Parameters;\n}\n\nexport class TransformersNeuronxSageMakerInferenceModelData {\n  static fromBucket(\n    bucket: IBucket,\n    prefix: string,\n    options: BucketCompiledModelOptions,\n  ) {\n    const nPositions = options.compileOptions?.nPositions ?? 4096;\n    const quantDtype = options.compileOptions?.quantDtype;\n    const optLevel = options.compileOptions?.optLevel ?? 
OptLevel.BEST_BALANCE;\n    const tpDegree =\n      options.compileOptions?.tpDegree ??\n      calcTpDegree(options.parameters, {\n        nPositions,\n        quantDtype,\n      });\n    return new TransformersNeuronxSageMakerInferenceModelData({\n      bucket,\n      compiledArtifactS3Prefix: prefix,\n      nPositions,\n      quantDtype,\n      optLevel,\n      tpDegree,\n      code: options.code,\n      modelIdOrPath: options.modelIdOrPath,\n      parameters: options.parameters,\n    });\n  }\n  static fromNeuronxCompile(compile: NeuronxCompile, code?: ISource) {\n    return new TransformersNeuronxSageMakerInferenceModelData({\n      ...compile,\n      bucket: compile.compiledArtifactS3Bucket,\n      compiledArtifactS3Prefix: compile.compiledArtifactS3Prefix,\n      code,\n    });\n  }\n  readonly bucket: IBucket;\n  readonly compiledArtifactS3Prefix: string;\n  readonly code: ISource;\n  readonly tpDegree: number;\n  readonly quantDtype?: QuantDtype;\n  readonly nPositions: number;\n  readonly optLevel: OptLevel;\n  readonly modelIdOrPath?: string;\n  readonly compiledArtifactPath?: string;\n  readonly parameters: Parameters;\n\n  private constructor(options: {\n    readonly bucket: IBucket;\n    readonly compiledArtifactS3Prefix: string;\n    readonly tpDegree: number;\n    readonly quantDtype?: QuantDtype;\n    readonly nPositions: number;\n    readonly optLevel: OptLevel;\n    readonly code?: ISource;\n    readonly modelIdOrPath?: string;\n    readonly compiledArtifactPath?: string;\n    readonly parameters: Parameters;\n  }) {\n    this.bucket = options.bucket;\n    this.compiledArtifactS3Prefix = options.compiledArtifactS3Prefix;\n    this.code =\n      options.code ??\n      Source.asset(\n        join(__dirname, \"../scripts/inference/transformers-neuronx/code\"),\n      );\n    this.tpDegree = options.tpDegree;\n    this.quantDtype = options.quantDtype;\n    this.nPositions = options.nPositions;\n    this.optLevel = options.optLevel;\n    
this.modelIdOrPath = options.modelIdOrPath;\n    this.compiledArtifactPath = options.compiledArtifactPath;\n    this.parameters = options.parameters;\n  }\n}\n\nexport interface TransformersNeuronxSageMakerRealtimeInferenceEndpointProps {\n  /**\n   * Model data for SageMaker inference.\n   * The model data requires at least compiled artifacts.\n   */\n  readonly modelData: TransformersNeuronxSageMakerInferenceModelData;\n  /**\n   * An image of the container where the inference job is executed.\n   */\n  readonly image?: sagemaker.ContainerImage;\n  /**\n   * A map of environment variables to pass into the container.\n   * @default - Only the predefined environment variables required to use Neuronx have been set.\n   */\n  readonly environment?: { [key: string]: string };\n  /**\n   * The instance type of compile worker instance.\n   * @default - It is determined automatically according to the number of model parameters and compilation options.\n   */\n  readonly instanceType?: NeuronxInstanceType;\n  /**\n   * The size, of the ML storage volume attached to individual inference instance associated with the production variant.\n   * Currently only Amazon EBS gp2 storage volumes are supported.\n   * @see https://aws.amazon.com/jp/releasenotes/host-instance-storage-volumes-table\n   * @default - 2.5 GB per billion parameter (Max 512 GB)\n   */\n  readonly volumeSize?: Size;\n  /**\n   * The timeout value, to download and extract the model that you want to host from Amazon S3\n   * to the individual inference instance associated with this production variant.\n   * @default - 60 seconds, when `volumeSize` larger than 30GB then 1GB x 15 seconds (max 60 minutes)\n   */\n  readonly modelDataDownloadTimeout?: Duration;\n  /**\n   * The timeout value, for your inference container to pass health check by SageMaker Hosting.\n   * @see https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests\n   * @default 
- 60 seconds, when set the `modelDataDownloadTimeout` then use same value (max 60 minutes)\n   */\n  readonly containerStartupHealthCheckTimeout?: Duration;\n}\n\nexport class TransformersNeuronxSageMakerRealtimeInferenceEndpoint extends Construct {\n  /**\n   * The ARN of the endpoint.\n   * @attribute\n   */\n  readonly endpointArn: string;\n  /**\n   * The name of the endpoint.\n   * @attribute\n   */\n  readonly endpointName: string;\n  private readonly endpoint: sagemaker.Endpoint;\n  constructor(\n    scope: Construct,\n    id: string,\n    props: TransformersNeuronxSageMakerRealtimeInferenceEndpointProps,\n  ) {\n    super(scope, id);\n    const image =\n      props.image ??\n      sagemaker.ContainerImage.fromAsset(\n        join(__dirname, \"../scripts/inference/transformers-neuronx\"),\n      );\n    const instanceType =\n      props.instanceType ??\n      this.selectInstanceTypeByTpDegree(props.modelData.tpDegree);\n    const deploy = new BucketDeployment(this, \"CodeDeployment\", {\n      destinationBucket: props.modelData.bucket,\n      sources: [props.modelData.code],\n      destinationKeyPrefix: join(\n        props.modelData.compiledArtifactS3Prefix,\n        \"code\",\n      ),\n    });\n    const model = new sagemaker.Model(this, \"Model\", {\n      containers: [\n        {\n          image,\n          environment: {\n            NEURON_RT_NUM_CORES: props.modelData.tpDegree.toString(),\n            TS_DEFAULT_RESPONSE_TIMEOUT: (60 * 60).toString(),\n            TS_DEFAULT_WORKERS_PER_MODEL: Math.floor(\n              instanceType.acceleratorChips.neuronxCores /\n                props.modelData.tpDegree,\n            ).toString(),\n            MODEL: props.modelData.modelIdOrPath ?? \"./model\",\n            COMPILED_ARTIFACT:\n              props.modelData.compiledArtifactPath ?? 
\"./compiled\",\n            TP_DEGREE: props.modelData.tpDegree.toString(),\n            N_POSITIONS: props.modelData.nPositions.toString(),\n            OPT_LEVEL: props.modelData.optLevel.toString(),\n            QUANT_DTYPE: props.modelData.quantDtype?.toString() ?? \"\",\n            ...props.environment,\n          },\n        },\n      ],\n    });\n    const cfnModel = model.node.findChild(\"Model\") as CfnModel;\n    cfnModel.addPropertyOverride(\n      \"PrimaryContainer.ModelDataSource.S3DataSource.S3Uri\",\n      props.modelData.bucket.s3UrlForObject(\n        props.modelData.compiledArtifactS3Prefix,\n      ),\n    );\n    cfnModel.addPropertyOverride(\n      \"PrimaryContainer.ModelDataSource.S3DataSource.S3DataType\",\n      \"S3Prefix\",\n    );\n    cfnModel.addPropertyOverride(\n      \"PrimaryContainer.ModelDataSource.S3DataSource.CompressionType\",\n      \"None\",\n    );\n    props.modelData.bucket.grantRead(model);\n    model.node.addDependency(deploy);\n    const endpointConfig = new sagemaker.EndpointConfig(\n      this,\n      \"EndpointConfig\",\n      {\n        instanceProductionVariants: [\n          {\n            model,\n            variantName: \"PrimaryVariant\",\n            instanceType: sagemaker.InstanceType.of(instanceType.toString()),\n          },\n        ],\n      },\n    );\n    const cfnEndpointConfig = endpointConfig.node.findChild(\n      \"EndpointConfig\",\n    ) as CfnEndpointConfig;\n    const volumeSize = Size.gibibytes(\n      props.volumeSize?.toGibibytes() ??\n        (props.modelData.parameters.toBilion() * 2.5 > 512\n          ? 
512\n          : props.modelData.parameters.toBilion() * 2.5),\n    );\n    cfnEndpointConfig.addPropertyOverride(\n      \"ProductionVariants.0.VolumeSizeInGB\",\n      volumeSize.toGibibytes(),\n    );\n    const modelDataDownloadTimeout =\n      props.modelDataDownloadTimeout ??\n      inferenceModelDataDownloadTime(volumeSize);\n    cfnEndpointConfig.addPropertyOverride(\n      \"ProductionVariants.0.ModelDataDownloadTimeoutInSeconds\",\n      modelDataDownloadTimeout.toSeconds(),\n    );\n    cfnEndpointConfig.addPropertyOverride(\n      \"ProductionVariants.0.ContainerStartupHealthCheckTimeoutInSeconds\",\n      (\n        props.containerStartupHealthCheckTimeout ?? modelDataDownloadTimeout\n      ).toSeconds(),\n    );\n    const endpoint = new sagemaker.Endpoint(this, \"Endpoint\", {\n      endpointConfig,\n    });\n    this.endpointArn = endpoint.endpointArn;\n    this.endpointName = endpoint.endpointName;\n    this.endpoint = endpoint;\n  }\n  grantInvoke(grantee: IGrantable): Grant {\n    return this.endpoint.grantInvoke(grantee);\n  }\n\n  private selectInstanceTypeByTpDegree(tpDegree: number) {\n    const instanceTypes = [\n      NeuronxInstanceType.INF2_8XLARGE,\n      NeuronxInstanceType.INF2_24XLARGE,\n      NeuronxInstanceType.INF2_48XLARGE,\n    ];\n    for (const instanceType of instanceTypes) {\n      if (tpDegree <= instanceType.acceleratorChips.neuronxCores) {\n        return instanceType;\n      }\n    }\n    throw new Error(\n      \"This model is too large, I can not support this model current version.\",\n    );\n  }\n}\n\nfunction inferenceModelDataDownloadTime(volumeSize: Size) {\n  const seconds = volumeSize.toGibibytes() * 15;\n  return Duration.seconds(seconds < 3600 ? seconds : 3600);\n}\n"]}
152
+ //# sourceMappingURL=data:application/json;base64,{"version":3,"file":"transformers-neuronx-sagemaker-realtime-inference.js","sourceRoot":"","sources":["../src/transformers-neuronx-sagemaker-realtime-inference.ts"],"names":[],"mappings":";;;;;AAAA,+BAA4B;AAC5B,0DAA0D;AAC1D,6CAA6C;AAG7C,qEAIuC;AAEvC,2CAAuC;AACvC,mCAA2E;AAE3E,mEAA8D;AAC9D,yCAA8C;AAmC9C,MAAa,8CAA8C;IACzD,MAAM,CAAC,UAAU,CACf,MAAe,EACf,MAAc,EACd,OAAmC;QAEnC,MAAM,UAAU,GAAG,OAAO,CAAC,cAAc,EAAE,UAAU,IAAI,IAAI,CAAC;QAC9D,MAAM,UAAU,GAAG,OAAO,CAAC,cAAc,EAAE,UAAU,CAAC;QACtD,MAAM,QAAQ,GAAG,OAAO,CAAC,cAAc,EAAE,QAAQ,IAAI,gBAAQ,CAAC,YAAY,CAAC;QAC3E,MAAM,QAAQ,GACZ,OAAO,CAAC,cAAc,EAAE,QAAQ;YAChC,IAAA,mBAAY,EAAC,OAAO,CAAC,UAAU,EAAE;gBAC/B,UAAU;gBACV,UAAU;aACX,CAAC,CAAC;QACL,OAAO,IAAI,8CAA8C,CAAC;YACxD,MAAM;YACN,wBAAwB,EAAE,MAAM;YAChC,UAAU;YACV,UAAU;YACV,QAAQ;YACR,QAAQ;YACR,IAAI,EAAE,OAAO,CAAC,IAAI;YAClB,aAAa,EAAE,OAAO,CAAC,aAAa;YACpC,UAAU,EAAE,OAAO,CAAC,UAAU;SAC/B,CAAC,CAAC;IACL,CAAC;IACD,MAAM,CAAC,kBAAkB,CAAC,OAAuB,EAAE,IAAc;QAC/D,OAAO,IAAI,8CAA8C,CAAC;YACxD,GAAG,OAAO;YACV,MAAM,EAAE,OAAO,CAAC,wBAAwB;YACxC,wBAAwB,EAAE,OAAO,CAAC,wBAAwB;YAC1D,IAAI;SACL,CAAC,CAAC;IACL,CAAC;IAYD,YAAoB,OAWnB;QACC,IAAI,CAAC,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;QAC7B,IAAI,CAAC,wBAAwB,GAAG,OAAO,CAAC,wBAAwB,CAAC;QACjE,IAAI,CAAC,IAAI;YACP,OAAO,CAAC,IAAI;gBACZ,0BAAM,CAAC,KAAK,CACV,IAAA,WAAI,EAAC,SAAS,EAAE,gDAAgD,CAAC,CAClE,CAAC;QACJ,IAAI,CAAC,QAAQ,GAAG,OAAO,CAAC,QAAQ,CAAC;QACjC,IAAI,CAAC,UAAU,GAAG,OAAO,CAAC,UAAU,CAAC;QACrC,IAAI,CAAC,UAAU,GAAG,OAAO,CAAC,UAAU,CAAC;QACrC,IAAI,CAAC,QAAQ,GAAG,OAAO,CAAC,QAAQ,CAAC;QACjC,IAAI,CAAC,aAAa,GAAG,OAAO,CAAC,aAAa,CAAC;QAC3C,IAAI,CAAC,oBAAoB,GAAG,OAAO,CAAC,oBAAoB,CAAC;QACzD,IAAI,CAAC,UAAU,GAAG,OAAO,CAAC,UAAU,CAAC;IACvC,CAAC;;AAxEH,wGAyEC;;;AA2CD,MAAa,qDAAsD,SAAQ,sBAAS;IAYlF,YACE,KAAgB,EAChB,EAAU,EACV,KAAiE;QAEjE,KAAK,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;QACjB,MAAM,KAAK,GACT,KAAK,CAAC,KAAK;YACX,SAAS,CAAC,cAAc,CAAC,SAAS,CAChC,IAAA,WAAI,EAAC,SAAS,EAAE,2CAA2C,CAAC,CAC7D,CAAC;QACJ,MAAM,YAAY,GAChB,KAAK,CAAC,YAAY;YAClB,IAAI,CAAC,4
BAA4B,CAAC,KAAK,CAAC,SAAS,CAAC,QAAQ,CAAC,CAAC;QAC9D,MAAM,MAAM,GAAG,IAAI,oCAAgB,CAAC,IAAI,EAAE,gBAAgB,EAAE;YAC1D,iBAAiB,EAAE,KAAK,CAAC,SAAS,CAAC,MAAM;YACzC,OAAO,EAAE,CAAC,KAAK,CAAC,SAAS,CAAC,IAAI,CAAC;YAC/B,oBAAoB,EAAE,IAAA,WAAI,EACxB,KAAK,CAAC,SAAS,CAAC,wBAAwB,EACxC,MAAM,CACP;SACF,CAAC,CAAC;QACH,MAAM,KAAK,GAAG,IAAI,SAAS,CAAC,KAAK,CAAC,IAAI,EAAE,OAAO,EAAE;YAC/C,UAAU,EAAE;gBACV;oBACE,KAAK;oBACL,WAAW,EAAE;wBACX,mBAAmB,EAAE,KAAK,CAAC,SAAS,CAAC,QAAQ,CAAC,QAAQ,EAAE;wBACxD,2BAA2B,EAAE,CAAC,EAAE,GAAG,EAAE,CAAC,CAAC,QAAQ,EAAE;wBACjD,4BAA4B,EAAE,IAAI,CAAC,KAAK,CACtC,YAAY,CAAC,gBAAgB,CAAC,YAAY;4BACxC,KAAK,CAAC,SAAS,CAAC,QAAQ,CAC3B,CAAC,QAAQ,EAAE;wBACZ,KAAK,EAAE,KAAK,CAAC,SAAS,CAAC,aAAa,IAAI,SAAS;wBACjD,iBAAiB,EACf,KAAK,CAAC,SAAS,CAAC,oBAAoB,IAAI,YAAY;wBACtD,SAAS,EAAE,KAAK,CAAC,SAAS,CAAC,QAAQ,CAAC,QAAQ,EAAE;wBAC9C,WAAW,EAAE,KAAK,CAAC,SAAS,CAAC,UAAU,CAAC,QAAQ,EAAE;wBAClD,SAAS,EAAE,KAAK,CAAC,SAAS,CAAC,QAAQ,CAAC,QAAQ,EAAE;wBAC9C,WAAW,EAAE,KAAK,CAAC,SAAS,CAAC,UAAU,EAAE,QAAQ,EAAE,IAAI,EAAE;wBACzD,GAAG,KAAK,CAAC,WAAW;qBACrB;iBACF;aACF;SACF,CAAC,CAAC;QACH,MAAM,QAAQ,GAAG,KAAK,CAAC,IAAI,CAAC,SAAS,CAAC,OAAO,CAAa,CAAC;QAC3D,QAAQ,CAAC,mBAAmB,CAC1B,qDAAqD,EACrD,KAAK,CAAC,SAAS,CAAC,MAAM,CAAC,cAAc,CACnC,KAAK,CAAC,SAAS,CAAC,wBAAwB,CACzC,CACF,CAAC;QACF,QAAQ,CAAC,mBAAmB,CAC1B,0DAA0D,EAC1D,UAAU,CACX,CAAC;QACF,QAAQ,CAAC,mBAAmB,CAC1B,+DAA+D,EAC/D,MAAM,CACP,CAAC;QACF,KAAK,CAAC,SAAS,CAAC,MAAM,CAAC,SAAS,CAAC,KAAK,CAAC,CAAC;QACxC,KAAK,CAAC,IAAI,CAAC,aAAa,CAAC,MAAM,CAAC,CAAC;QACjC,MAAM,cAAc,GAAG,IAAI,SAAS,CAAC,cAAc,CACjD,IAAI,EACJ,gBAAgB,EAChB;YACE,0BAA0B,EAAE;gBAC1B;oBACE,KAAK;oBACL,WAAW,EAAE,gBAAgB;oBAC7B,YAAY,EAAE,SAAS,CAAC,YAAY,CAAC,EAAE,CAAC,YAAY,CAAC,QAAQ,EAAE,CAAC;iBACjE;aACF;SACF,CACF,CAAC;QACF,MAAM,iBAAiB,GAAG,cAAc,CAAC,IAAI,CAAC,SAAS,CACrD,gBAAgB,CACI,CAAC;QACvB,MAAM,UAAU,GAAG,kBAAI,CAAC,SAAS,CAC/B,KAAK,CAAC,UAAU,EAAE,WAAW,EAAE;YAC7B,CAAC,KAAK,CAAC,SAAS,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,GAAG,GAAG,GAAG;gBAChD,CAAC,CAAC,GAAG;gBACL,CAAC,CAAC,KAAK,CAAC,SAAS,CAAC,UAAU,CAAC,QAAQ,EAAE,GAAG,GAAG,CAAC,CACn
D,CAAC;QACF,iBAAiB,CAAC,mBAAmB,CACnC,qCAAqC,EACrC,YAAY,CAAC,QAAQ,EAAE,CAAC,UAAU,CAAC,SAAS,CAAC;YAC3C,CAAC,CAAC,SAAS;YACX,CAAC,CAAC,UAAU,CAAC,WAAW,EAAE,CAC7B,CAAC;QACF,MAAM,wBAAwB,GAC5B,KAAK,CAAC,wBAAwB;YAC9B,8BAA8B,CAAC,UAAU,CAAC,CAAC;QAC7C,iBAAiB,CAAC,mBAAmB,CACnC,wDAAwD,EACxD,wBAAwB,CAAC,SAAS,EAAE,CACrC,CAAC;QACF,iBAAiB,CAAC,mBAAmB,CACnC,kEAAkE,EAClE,CACE,KAAK,CAAC,kCAAkC,IAAI,wBAAwB,CACrE,CAAC,SAAS,EAAE,CACd,CAAC;QACF,MAAM,QAAQ,GAAG,IAAI,SAAS,CAAC,QAAQ,CAAC,IAAI,EAAE,UAAU,EAAE;YACxD,cAAc;SACf,CAAC,CAAC;QACH,IAAI,CAAC,WAAW,GAAG,QAAQ,CAAC,WAAW,CAAC;QACxC,IAAI,CAAC,YAAY,GAAG,QAAQ,CAAC,YAAY,CAAC;QAC1C,IAAI,CAAC,QAAQ,GAAG,QAAQ,CAAC;IAC3B,CAAC;IACD,WAAW,CAAC,OAAmB;QAC7B,OAAO,IAAI,CAAC,QAAQ,CAAC,WAAW,CAAC,OAAO,CAAC,CAAC;IAC5C,CAAC;IAEO,4BAA4B,CAAC,QAAgB;QACnD,MAAM,aAAa,GAAG;YACpB,2CAAmB,CAAC,YAAY;YAChC,2CAAmB,CAAC,aAAa;YACjC,2CAAmB,CAAC,aAAa;SAClC,CAAC;QACF,KAAK,MAAM,YAAY,IAAI,aAAa,EAAE,CAAC;YACzC,IAAI,QAAQ,IAAI,YAAY,CAAC,gBAAgB,CAAC,YAAY,EAAE,CAAC;gBAC3D,OAAO,YAAY,CAAC;YACtB,CAAC;QACH,CAAC;QACD,MAAM,IAAI,KAAK,CACb,wEAAwE,CACzE,CAAC;IACJ,CAAC;;AA5IH,sHA6IC;;;AAED,SAAS,8BAA8B,CAAC,UAAgB;IACtD,MAAM,OAAO,GAAG,UAAU,CAAC,WAAW,EAAE,GAAG,EAAE,CAAC;IAC9C,OAAO,sBAAQ,CAAC,OAAO,CAAC,OAAO,GAAG,IAAI,CAAC,CAAC,CAAC,OAAO,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC;AAC3D,CAAC","sourcesContent":["import { join } from \"path\";\nimport * as sagemaker from \"@aws-cdk/aws-sagemaker-alpha\";\nimport { Duration, Size } from \"aws-cdk-lib\";\nimport { Grant, IGrantable } from \"aws-cdk-lib/aws-iam\";\nimport { IBucket } from \"aws-cdk-lib/aws-s3\";\nimport {\n  BucketDeployment,\n  ISource,\n  Source,\n} from \"aws-cdk-lib/aws-s3-deployment\";\nimport { CfnEndpointConfig, CfnModel } from \"aws-cdk-lib/aws-sagemaker\";\nimport { Construct } from \"constructs\";\nimport { CompileOptions, OptLevel, Parameters, QuantDtype } from \"./model\";\nimport { NeuronxCompile } from \"./neuronx-compile\";\nimport { NeuronxInstanceType } from \"./neuronx-instance-type\";\nimport { calcTpDegree } from 
\"./private/util\";\n\n/**\n * Precompiled model options.\n */\nexport interface CompiledModelOptions {\n  /**\n   * Neuronx compile options.\n   * @default - Each properties are set default.\n   */\n  readonly compileOptions?: CompileOptions;\n  /**\n   * Code used for inference\n   * @default - using the predefined code\n   */\n  readonly code?: ISource;\n  /**\n   * Model ID or saved path\n   * @default \"./model\"\n   */\n  readonly modelIdOrPath?: string;\n  /**\n   * The path where compiled artifacts (i.e. xxx.neff) are stored\n   * @default \"./compiled\"\n   */\n  readonly compiledArtifactPath?: string;\n}\n\nexport interface BucketCompiledModelOptions extends CompiledModelOptions {\n  /**\n   * The number of parameters of model.\n   */\n  readonly parameters: Parameters;\n}\n\nexport class TransformersNeuronxSageMakerInferenceModelData {\n  static fromBucket(\n    bucket: IBucket,\n    prefix: string,\n    options: BucketCompiledModelOptions,\n  ) {\n    const nPositions = options.compileOptions?.nPositions ?? 4096;\n    const quantDtype = options.compileOptions?.quantDtype;\n    const optLevel = options.compileOptions?.optLevel ?? 
OptLevel.BEST_BALANCE;\n    const tpDegree =\n      options.compileOptions?.tpDegree ??\n      calcTpDegree(options.parameters, {\n        nPositions,\n        quantDtype,\n      });\n    return new TransformersNeuronxSageMakerInferenceModelData({\n      bucket,\n      compiledArtifactS3Prefix: prefix,\n      nPositions,\n      quantDtype,\n      optLevel,\n      tpDegree,\n      code: options.code,\n      modelIdOrPath: options.modelIdOrPath,\n      parameters: options.parameters,\n    });\n  }\n  static fromNeuronxCompile(compile: NeuronxCompile, code?: ISource) {\n    return new TransformersNeuronxSageMakerInferenceModelData({\n      ...compile,\n      bucket: compile.compiledArtifactS3Bucket,\n      compiledArtifactS3Prefix: compile.compiledArtifactS3Prefix,\n      code,\n    });\n  }\n  readonly bucket: IBucket;\n  readonly compiledArtifactS3Prefix: string;\n  readonly code: ISource;\n  readonly tpDegree: number;\n  readonly quantDtype?: QuantDtype;\n  readonly nPositions: number;\n  readonly optLevel: OptLevel;\n  readonly modelIdOrPath?: string;\n  readonly compiledArtifactPath?: string;\n  readonly parameters: Parameters;\n\n  private constructor(options: {\n    readonly bucket: IBucket;\n    readonly compiledArtifactS3Prefix: string;\n    readonly tpDegree: number;\n    readonly quantDtype?: QuantDtype;\n    readonly nPositions: number;\n    readonly optLevel: OptLevel;\n    readonly code?: ISource;\n    readonly modelIdOrPath?: string;\n    readonly compiledArtifactPath?: string;\n    readonly parameters: Parameters;\n  }) {\n    this.bucket = options.bucket;\n    this.compiledArtifactS3Prefix = options.compiledArtifactS3Prefix;\n    this.code =\n      options.code ??\n      Source.asset(\n        join(__dirname, \"../scripts/inference/transformers-neuronx/code\"),\n      );\n    this.tpDegree = options.tpDegree;\n    this.quantDtype = options.quantDtype;\n    this.nPositions = options.nPositions;\n    this.optLevel = options.optLevel;\n    
this.modelIdOrPath = options.modelIdOrPath;\n    this.compiledArtifactPath = options.compiledArtifactPath;\n    this.parameters = options.parameters;\n  }\n}\n\nexport interface TransformersNeuronxSageMakerRealtimeInferenceEndpointProps {\n  /**\n   * Model data for SageMaker inference.\n   * The model data requires at least compiled artifacts.\n   */\n  readonly modelData: TransformersNeuronxSageMakerInferenceModelData;\n  /**\n   * An image of the container where the inference job is executed.\n   */\n  readonly image?: sagemaker.ContainerImage;\n  /**\n   * A map of environment variables to pass into the container.\n   * @default - Only the predefined environment variables required to use Neuronx have been set.\n   */\n  readonly environment?: { [key: string]: string };\n  /**\n   * The instance type of compile worker instance.\n   * @default - It is determined automatically according to the number of model parameters and compilation options.\n   */\n  readonly instanceType?: NeuronxInstanceType;\n  /**\n   * The size, of the ML storage volume attached to individual inference instance associated with the production variant.\n   * Currently only Amazon EBS gp2 storage volumes are supported.\n   * @see https://aws.amazon.com/jp/releasenotes/host-instance-storage-volumes-table\n   * @default - 2.5 GB per billion parameter (Max 512 GB)\n   */\n  readonly volumeSize?: Size;\n  /**\n   * The timeout value, to download and extract the model that you want to host from Amazon S3\n   * to the individual inference instance associated with this production variant.\n   * @default - 60 seconds, when `volumeSize` larger than 30GB then 1GB x 15 seconds (max 60 minutes)\n   */\n  readonly modelDataDownloadTimeout?: Duration;\n  /**\n   * The timeout value, for your inference container to pass health check by SageMaker Hosting.\n   * @see https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests\n   * @default 
- 60 seconds, when set the `modelDataDownloadTimeout` then use same value (max 60 minutes)\n   */\n  readonly containerStartupHealthCheckTimeout?: Duration;\n}\n\nexport class TransformersNeuronxSageMakerRealtimeInferenceEndpoint extends Construct {\n  /**\n   * The ARN of the endpoint.\n   * @attribute\n   */\n  readonly endpointArn: string;\n  /**\n   * The name of the endpoint.\n   * @attribute\n   */\n  readonly endpointName: string;\n  private readonly endpoint: sagemaker.Endpoint;\n  constructor(\n    scope: Construct,\n    id: string,\n    props: TransformersNeuronxSageMakerRealtimeInferenceEndpointProps,\n  ) {\n    super(scope, id);\n    const image =\n      props.image ??\n      sagemaker.ContainerImage.fromAsset(\n        join(__dirname, \"../scripts/inference/transformers-neuronx\"),\n      );\n    const instanceType =\n      props.instanceType ??\n      this.selectInstanceTypeByTpDegree(props.modelData.tpDegree);\n    const deploy = new BucketDeployment(this, \"CodeDeployment\", {\n      destinationBucket: props.modelData.bucket,\n      sources: [props.modelData.code],\n      destinationKeyPrefix: join(\n        props.modelData.compiledArtifactS3Prefix,\n        \"code\",\n      ),\n    });\n    const model = new sagemaker.Model(this, \"Model\", {\n      containers: [\n        {\n          image,\n          environment: {\n            NEURON_RT_NUM_CORES: props.modelData.tpDegree.toString(),\n            TS_DEFAULT_RESPONSE_TIMEOUT: (60 * 60).toString(),\n            TS_DEFAULT_WORKERS_PER_MODEL: Math.floor(\n              instanceType.acceleratorChips.neuronxCores /\n                props.modelData.tpDegree,\n            ).toString(),\n            MODEL: props.modelData.modelIdOrPath ?? \"./model\",\n            COMPILED_ARTIFACT:\n              props.modelData.compiledArtifactPath ?? 
\"./compiled\",\n            TP_DEGREE: props.modelData.tpDegree.toString(),\n            N_POSITIONS: props.modelData.nPositions.toString(),\n            OPT_LEVEL: props.modelData.optLevel.toString(),\n            QUANT_DTYPE: props.modelData.quantDtype?.toString() ?? \"\",\n            ...props.environment,\n          },\n        },\n      ],\n    });\n    const cfnModel = model.node.findChild(\"Model\") as CfnModel;\n    cfnModel.addPropertyOverride(\n      \"PrimaryContainer.ModelDataSource.S3DataSource.S3Uri\",\n      props.modelData.bucket.s3UrlForObject(\n        props.modelData.compiledArtifactS3Prefix,\n      ),\n    );\n    cfnModel.addPropertyOverride(\n      \"PrimaryContainer.ModelDataSource.S3DataSource.S3DataType\",\n      \"S3Prefix\",\n    );\n    cfnModel.addPropertyOverride(\n      \"PrimaryContainer.ModelDataSource.S3DataSource.CompressionType\",\n      \"None\",\n    );\n    props.modelData.bucket.grantRead(model);\n    model.node.addDependency(deploy);\n    const endpointConfig = new sagemaker.EndpointConfig(\n      this,\n      \"EndpointConfig\",\n      {\n        instanceProductionVariants: [\n          {\n            model,\n            variantName: \"PrimaryVariant\",\n            instanceType: sagemaker.InstanceType.of(instanceType.toString()),\n          },\n        ],\n      },\n    );\n    const cfnEndpointConfig = endpointConfig.node.findChild(\n      \"EndpointConfig\",\n    ) as CfnEndpointConfig;\n    const volumeSize = Size.gibibytes(\n      props.volumeSize?.toGibibytes() ??\n        (props.modelData.parameters.toBilion() * 2.5 > 512\n          ? 512\n          : props.modelData.parameters.toBilion() * 2.5),\n    );\n    cfnEndpointConfig.addPropertyOverride(\n      \"ProductionVariants.0.VolumeSizeInGB\",\n      instanceType.toString().startsWith(\"ml.trn1\")\n        ? 
undefined\n        : volumeSize.toGibibytes(),\n    );\n    const modelDataDownloadTimeout =\n      props.modelDataDownloadTimeout ??\n      inferenceModelDataDownloadTime(volumeSize);\n    cfnEndpointConfig.addPropertyOverride(\n      \"ProductionVariants.0.ModelDataDownloadTimeoutInSeconds\",\n      modelDataDownloadTimeout.toSeconds(),\n    );\n    cfnEndpointConfig.addPropertyOverride(\n      \"ProductionVariants.0.ContainerStartupHealthCheckTimeoutInSeconds\",\n      (\n        props.containerStartupHealthCheckTimeout ?? modelDataDownloadTimeout\n      ).toSeconds(),\n    );\n    const endpoint = new sagemaker.Endpoint(this, \"Endpoint\", {\n      endpointConfig,\n    });\n    this.endpointArn = endpoint.endpointArn;\n    this.endpointName = endpoint.endpointName;\n    this.endpoint = endpoint;\n  }\n  grantInvoke(grantee: IGrantable): Grant {\n    return this.endpoint.grantInvoke(grantee);\n  }\n\n  private selectInstanceTypeByTpDegree(tpDegree: number) {\n    const instanceTypes = [\n      NeuronxInstanceType.INF2_8XLARGE,\n      NeuronxInstanceType.INF2_24XLARGE,\n      NeuronxInstanceType.INF2_48XLARGE,\n    ];\n    for (const instanceType of instanceTypes) {\n      if (tpDegree <= instanceType.acceleratorChips.neuronxCores) {\n        return instanceType;\n      }\n    }\n    throw new Error(\n      \"This model is too large, I can not support this model current version.\",\n    );\n  }\n}\n\nfunction inferenceModelDataDownloadTime(volumeSize: Size) {\n  const seconds = volumeSize.toGibibytes() * 15;\n  return Duration.seconds(seconds < 3600 ? seconds : 3600);\n}\n"]}
package/package.json CHANGED
@@ -91,7 +91,7 @@
  ]
  }
  },
- "version": "0.0.4",
+ "version": "0.0.6",
  "jest": {
  "coverageProvider": "v8",
  "testMatch": [