aws-cdk-neuronx-patterns 0.0.2 → 0.0.4

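The 0.0.4 README recorded in this diff describes a default inference handler with a chat-style interface: the request body carries an optional `role` and a required `messages` array. Below is a minimal TypeScript sketch of building that body, assuming only the shape shown in the README's own example. `ChatMessage`, `ChatRequest`, and `buildChatRequestBody` are hypothetical names used for illustration and are not part of the package.

```typescript
// Hypothetical types mirroring the request shape in the 0.0.4 README example.
// They are illustrative only and are not exported by aws-cdk-neuronx-patterns.
type ChatMessage = { role: string; content: string };

interface ChatRequest {
  role?: string; // optional: role used for the generated answer
  messages: ChatMessage[]; // required: the conversation so far
}

// Serialize a chat request; `role` is left out entirely when not provided.
function buildChatRequestBody(messages: ChatMessage[], role?: string): string {
  const request: ChatRequest =
    role === undefined ? { messages } : { role, messages };
  return JSON.stringify(request);
}

const body = buildChatRequestBody(
  [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is 1+1? Answer with a number only." },
  ],
  "ai",
);
console.log(body);
```

The resulting string would be passed as `Body` to `InvokeEndpointCommand` with `ContentType: "application/json"`, as the README example does.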
package/.jsii CHANGED
@@ -7,6 +7,7 @@
7
7
  ]
8
8
  },
9
9
  "dependencies": {
10
+ "@aws-cdk/aws-sagemaker-alpha": "2.149.0-alpha.0",
10
11
  "aws-cdk-lib": "^2.149.0",
11
12
  "constructs": "^10.0.5"
12
13
  },
@@ -89,6 +90,37 @@
89
90
  }
90
91
  }
91
92
  },
93
+ "@aws-cdk/aws-sagemaker-alpha": {
94
+ "targets": {
95
+ "dotnet": {
96
+ "iconUrl": "https://raw.githubusercontent.com/aws/aws-cdk/main/logo/default-256-dark.png",
97
+ "namespace": "Amazon.CDK.AWS.Sagemaker.Alpha",
98
+ "packageId": "Amazon.CDK.AWS.Sagemaker.Alpha"
99
+ },
100
+ "go": {
101
+ "moduleName": "github.com/aws/aws-cdk-go",
102
+ "packageName": "awscdksagemakeralpha"
103
+ },
104
+ "java": {
105
+ "maven": {
106
+ "artifactId": "sagemaker-alpha",
107
+ "groupId": "software.amazon.awscdk"
108
+ },
109
+ "package": "software.amazon.awscdk.services.sagemaker.alpha"
110
+ },
111
+ "js": {
112
+ "npm": "@aws-cdk/aws-sagemaker-alpha"
113
+ },
114
+ "python": {
115
+ "classifiers": [
116
+ "Framework :: AWS CDK",
117
+ "Framework :: AWS CDK :: 2"
118
+ ],
119
+ "distName": "aws-cdk.aws-sagemaker-alpha",
120
+ "module": "aws_cdk.aws_sagemaker_alpha"
121
+ }
122
+ }
123
+ },
92
124
  "aws-cdk-lib": {
93
125
  "submodules": {
94
126
  "aws-cdk-lib.alexa_ask": {
@@ -3819,7 +3851,7 @@
3819
3851
  },
3820
3852
  "name": "aws-cdk-neuronx-patterns",
3821
3853
  "readme": {
3822
- "markdown": "# Neuronx patterns Construct Library\n\nThis library provides high-level architectural patterns using neuronx (e.g. Inferentia2 and Trainium1). It contains:\n\n- Neuronx Compile\n\n## Neuronx Compile\n\n:::note warn\nThis construct uses an Inferentia2 instance on EC2. You may need to increase your request limit for your AWS account.\n:::\n\nThis construct compiles models supported by Neuronx and uploads them to the specified S3 bucket.\n\nThis is NeuronxCompile architecture.\n![NeuronxCompile architecture](./docs/neuronx-compile-architecture.png)\n\nTo define\n\n```ts\nimport { Vpc } from \"aws-cdk-lib/aws-ec2\";\nimport { Bucket } from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: Vpc;\ndeclare const bucket: Bucket;\nconst compile = new NeuronxCompile(stack, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n});\n\n// Get the compiled artifacts from this S3 URL\nnew CfnOutput(stack, \"CompiledArtifact\", {\n value: compile.compiledArtifactS3Url,\n});\n```\n\nThis construct assumes the required instance type depending on the number of model parameters.\n\nAfter compiled, you can see like the this file tree in the S3 bucket.\n\n```txt\n{compiledArtifactS3Url}/\n├── model\n│ ├── config.json\n│ ├── tokenizer_config.json\n│ ├── xxx.safetensors\n│ └── xxx.safetensors\n└── compiled\n ├── xxx.neff\n ├── xxx.neff\n └── xxx.neff\n```\n\n### Spot Instance\n\n:::note warn\nIf you use Spot Instances, check if the request limit for Spot has been increased.\n:::\n\nYou can also use Spot Instances.\n\n```ts\nimport { Vpc } from \"aws-cdk-lib/aws-ec2\";\nimport { Bucket } from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: Vpc;\ndeclare const bucket: Bucket;\nnew NeuronxCompile(stack, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n spot: true,\n});\n```\n\n### Compile Options\n\nIf you are familiar with Neuronx, you can also specify compilation options to better meet your requirements.\n\n```ts\nimport { Vpc } from \"aws-cdk-lib/aws-ec2\";\nimport { Bucket } from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: Vpc;\ndeclare const bucket: Bucket;\nnew NeuronxCompile(stack, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-22b-chat\", {\n parameters: Parameters.billion(22),\n }),\n compileOptions: {\n nPositions: 1024,\n quantDtype: QuantDtype.S8,\n optLevel: OptLevel.MODEL_EXECUTION_PERFORMANCE,\n },\n});\n```\n"
3854
+ "markdown": "# Neuronx patterns Construct Library\n\n> [!WARNING]\n> This library is experimental module.\n\nThis library provides high-level architectural patterns using neuronx (e.g. Inferentia2 and Trainium1). It contains:\n\n- Transformers Neuronx SageMaker Real-time Inference Endpoint\n- Neuronx Compile\n\n## Transformers Neuronx SageMaker Real-time Inference Endpoint\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on SageMaker. You may need to increase your request limit for your AWS account.\n\nBy using the `NeuronxCompile` construct included in this construct library, models published on HuggingFace can be easily deployed to SageMaker Real-time inference. To define using the `NeuronxCompile` construct:\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nconst compile = new NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n});\nnew TransformersNeuronxSageMakerRealtimeInferenceEndpoint(\n this,\n \"RealtimeInference\",\n {\n modelData:\n TransformersNeuronxSageMakerInferenceModelData.fromNeuronxCompile(\n compile,\n ),\n },\n);\n```\n\n### Default inference code\n\nBy default, default inference code is deployed to implement the chat interface. The default inference code takes an object array like [transformers' conversations](https://huggingface.co/docs/transformers/main/en/conversations) and responds to the generated text. The following code is an example using the AWS SDK for JavaScript v3.\n\n```ts\nimport {\n InvokeEndpointCommand,\n SageMakerRuntimeClient,\n} from \"@aws-sdk/client-sagemaker-runtime\";\n\nconst client = new SageMakerRuntimeClient({\n region: \"us-east-1\",\n});\nclient\n .send(\n new InvokeEndpointCommand({\n EndpointName: \"my-endpoint-id\",\n Body: JSON.stringify({\n // Optional. You can change answer role.\n role: \"ai\",\n // Require. The messages like conversation.\n messages: [\n {\n role: \"system\",\n content: `You are helpfull assistant.`,\n },\n {\n role: \"user\",\n content:\n \"please answer '1+1=?'. You must answer only answer numeric.\",\n },\n ],\n }),\n ContentType: \"application/json\",\n Accept: \"application/json\",\n }),\n )\n .then((res) => {\n // { generated_text: \"2\" }\n console.log(JSON.parse(res.Body.transformToString()));\n });\n```\n\nTo change your own inference code, you can pass the code source.\n\n```ts\nimport * as s3Deplyment from \"aws-cdk-lib/aws-s3-deployment\";\n\ndeclare const compile: NeuronxCompile;\nnew TransformersNeuronxSageMakerRealtimeInferenceEndpoint(\n this,\n \"RealtimeInference\",\n {\n modelData:\n TransformersNeuronxSageMakerInferenceModelData.fromNeuronxCompile(\n compile,\n s3Deplyment.Source.asset(\"path/to/my/code/directory\"),\n ),\n },\n);\n```\n\n## Neuronx Compile\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on EC2. You may need to increase your request limit for your AWS account.\n\nThis construct compiles models supported by Neuronx and uploads them to the specified S3 bucket. To define\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nconst compile = new NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n});\n\n// Get the compiled artifacts from this S3 URL\nnew CfnOutput(this, \"CompiledArtifact\", {\n value: compile.compiledArtifactS3Url,\n});\n```\n\nThis construct assumes the required instance type depending on the number of model parameters.\n\nAfter compiled, you can see like the this file tree in the S3 bucket.\n\n```txt\n{compiledArtifactS3Url}/\n├── model\n│ ├── config.json\n│ ├── tokenizer_config.json\n│ ├── xxx.safetensors\n│ └── xxx.safetensors\n└── compiled\n ├── xxx.neff\n ├── xxx.neff\n └── xxx.neff\n```\n\nThis is NeuronxCompile architecture.\n![NeuronxCompile architecture](./docs/neuronx-compile-architecture.png)\n\n### Spot Instance\n\n> [!WARNING]\n> If you use Spot Instances, check if the request limit for Spot has been increased.\n\nYou can also use Spot Instances.\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nnew NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-7b-chat\", {\n parameters: Parameters.billion(7),\n }),\n spot: true,\n});\n```\n\n### Compile Options\n\nIf you are familiar with Neuronx, you can also specify compilation options to better meet your requirements.\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\nnew NeuronxCompile(this, \"NeuronxCompile\", {\n vpc,\n bucket,\n model: Model.fromHuggingFace(\"example/example-22b-chat\", {\n parameters: Parameters.billion(22),\n }),\n compileOptions: {\n nPositions: 1024,\n quantDtype: QuantDtype.S8,\n optLevel: OptLevel.MODEL_EXECUTION_PERFORMANCE,\n },\n});\n```\n"
3823
3855
  },
3824
3856
  "repository": {
3825
3857
  "type": "git",
@@ -3832,6 +3864,42 @@
3832
3864
  }
3833
3865
  },
3834
3866
  "types": {
3867
+ "aws-cdk-neuronx-patterns.BucketCompiledModelOptions": {
3868
+ "assembly": "aws-cdk-neuronx-patterns",
3869
+ "datatype": true,
3870
+ "docs": {
3871
+ "stability": "stable"
3872
+ },
3873
+ "fqn": "aws-cdk-neuronx-patterns.BucketCompiledModelOptions",
3874
+ "interfaces": [
3875
+ "aws-cdk-neuronx-patterns.CompiledModelOptions"
3876
+ ],
3877
+ "kind": "interface",
3878
+ "locationInModule": {
3879
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
3880
+ "line": 44
3881
+ },
3882
+ "name": "BucketCompiledModelOptions",
3883
+ "properties": [
3884
+ {
3885
+ "abstract": true,
3886
+ "docs": {
3887
+ "stability": "stable",
3888
+ "summary": "The number of parameters of model."
3889
+ },
3890
+ "immutable": true,
3891
+ "locationInModule": {
3892
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
3893
+ "line": 48
3894
+ },
3895
+ "name": "parameters",
3896
+ "type": {
3897
+ "fqn": "aws-cdk-neuronx-patterns.Parameters"
3898
+ }
3899
+ }
3900
+ ],
3901
+ "symbolId": "src/transformers-neuronx-sagemaker-realtime-inference:BucketCompiledModelOptions"
3902
+ },
3835
3903
  "aws-cdk-neuronx-patterns.CompileOptions": {
3836
3904
  "assembly": "aws-cdk-neuronx-patterns",
3837
3905
  "datatype": true,
@@ -3842,21 +3910,21 @@
3842
3910
  "fqn": "aws-cdk-neuronx-patterns.CompileOptions",
3843
3911
  "kind": "interface",
3844
3912
  "locationInModule": {
3845
- "filename": "src/neuronx-compile.ts",
3846
- "line": 60
3913
+ "filename": "src/model.ts",
3914
+ "line": 34
3847
3915
  },
3848
3916
  "name": "CompileOptions",
3849
3917
  "properties": [
3850
3918
  {
3851
3919
  "abstract": true,
3852
3920
  "docs": {
3853
- "default": "4092",
3921
+ "default": "4096",
3854
3922
  "stability": "stable"
3855
3923
  },
3856
3924
  "immutable": true,
3857
3925
  "locationInModule": {
3858
- "filename": "src/neuronx-compile.ts",
3859
- "line": 72
3926
+ "filename": "src/model.ts",
3927
+ "line": 46
3860
3928
  },
3861
3929
  "name": "nPositions",
3862
3930
  "optional": true,
@@ -3872,8 +3940,8 @@
3872
3940
  },
3873
3941
  "immutable": true,
3874
3942
  "locationInModule": {
3875
- "filename": "src/neuronx-compile.ts",
3876
- "line": 76
3943
+ "filename": "src/model.ts",
3944
+ "line": 50
3877
3945
  },
3878
3946
  "name": "optLevel",
3879
3947
  "optional": true,
@@ -3889,8 +3957,8 @@
3889
3957
  },
3890
3958
  "immutable": true,
3891
3959
  "locationInModule": {
3892
- "filename": "src/neuronx-compile.ts",
3893
- "line": 68
3960
+ "filename": "src/model.ts",
3961
+ "line": 42
3894
3962
  },
3895
3963
  "name": "quantDtype",
3896
3964
  "optional": true,
@@ -3906,8 +3974,8 @@
3906
3974
  },
3907
3975
  "immutable": true,
3908
3976
  "locationInModule": {
3909
- "filename": "src/neuronx-compile.ts",
3910
- "line": 64
3977
+ "filename": "src/model.ts",
3978
+ "line": 38
3911
3979
  },
3912
3980
  "name": "tpDegree",
3913
3981
  "optional": true,
@@ -3916,7 +3984,7 @@
3916
3984
  }
3917
3985
  }
3918
3986
  ],
3919
- "symbolId": "src/neuronx-compile:CompileOptions"
3987
+ "symbolId": "src/model:CompileOptions"
3920
3988
  },
3921
3989
  "aws-cdk-neuronx-patterns.CompileRuntime": {
3922
3990
  "assembly": "aws-cdk-neuronx-patterns",
@@ -3929,7 +3997,7 @@
3929
3997
  "kind": "interface",
3930
3998
  "locationInModule": {
3931
3999
  "filename": "src/neuronx-compile.ts",
3932
- "line": 18
4000
+ "line": 25
3933
4001
  },
3934
4002
  "name": "CompileRuntime",
3935
4003
  "properties": [
@@ -3942,7 +4010,7 @@
3942
4010
  "immutable": true,
3943
4011
  "locationInModule": {
3944
4012
  "filename": "src/neuronx-compile.ts",
3945
- "line": 22
4013
+ "line": 29
3946
4014
  },
3947
4015
  "name": "image",
3948
4016
  "type": {
@@ -3958,7 +4026,7 @@
3958
4026
  "immutable": true,
3959
4027
  "locationInModule": {
3960
4028
  "filename": "src/neuronx-compile.ts",
3961
- "line": 26
4029
+ "line": 33
3962
4030
  },
3963
4031
  "name": "neuronxVersion",
3964
4032
  "type": {
@@ -3968,6 +4036,96 @@
3968
4036
  ],
3969
4037
  "symbolId": "src/neuronx-compile:CompileRuntime"
3970
4038
  },
4039
+ "aws-cdk-neuronx-patterns.CompiledModelOptions": {
4040
+ "assembly": "aws-cdk-neuronx-patterns",
4041
+ "datatype": true,
4042
+ "docs": {
4043
+ "stability": "stable",
4044
+ "summary": "Precompiled model options."
4045
+ },
4046
+ "fqn": "aws-cdk-neuronx-patterns.CompiledModelOptions",
4047
+ "kind": "interface",
4048
+ "locationInModule": {
4049
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
4050
+ "line": 21
4051
+ },
4052
+ "name": "CompiledModelOptions",
4053
+ "properties": [
4054
+ {
4055
+ "abstract": true,
4056
+ "docs": {
4057
+ "default": "- using the predefined code",
4058
+ "stability": "stable",
4059
+ "summary": "Code used for inference."
4060
+ },
4061
+ "immutable": true,
4062
+ "locationInModule": {
4063
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
4064
+ "line": 31
4065
+ },
4066
+ "name": "code",
4067
+ "optional": true,
4068
+ "type": {
4069
+ "fqn": "aws-cdk-lib.aws_s3_deployment.ISource"
4070
+ }
4071
+ },
4072
+ {
4073
+ "abstract": true,
4074
+ "docs": {
4075
+ "default": "\"./compiled\"",
4076
+ "stability": "stable",
4077
+ "summary": "The path where compiled artifacts (i.e. xxx.neff) are stored."
4078
+ },
4079
+ "immutable": true,
4080
+ "locationInModule": {
4081
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
4082
+ "line": 41
4083
+ },
4084
+ "name": "compiledArtifactPath",
4085
+ "optional": true,
4086
+ "type": {
4087
+ "primitive": "string"
4088
+ }
4089
+ },
4090
+ {
4091
+ "abstract": true,
4092
+ "docs": {
4093
+ "default": "- Each properties are set default.",
4094
+ "stability": "stable",
4095
+ "summary": "Neuronx compile options."
4096
+ },
4097
+ "immutable": true,
4098
+ "locationInModule": {
4099
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
4100
+ "line": 26
4101
+ },
4102
+ "name": "compileOptions",
4103
+ "optional": true,
4104
+ "type": {
4105
+ "fqn": "aws-cdk-neuronx-patterns.CompileOptions"
4106
+ }
4107
+ },
4108
+ {
4109
+ "abstract": true,
4110
+ "docs": {
4111
+ "default": "\"./model\"",
4112
+ "stability": "stable",
4113
+ "summary": "Model ID or saved path."
4114
+ },
4115
+ "immutable": true,
4116
+ "locationInModule": {
4117
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
4118
+ "line": 36
4119
+ },
4120
+ "name": "modelIdOrPath",
4121
+ "optional": true,
4122
+ "type": {
4123
+ "primitive": "string"
4124
+ }
4125
+ }
4126
+ ],
4127
+ "symbolId": "src/transformers-neuronx-sagemaker-realtime-inference:CompiledModelOptions"
4128
+ },
3971
4129
  "aws-cdk-neuronx-patterns.IAcceleratorChips": {
3972
4130
  "assembly": "aws-cdk-neuronx-patterns",
3973
4131
  "docs": {
@@ -4119,10 +4277,57 @@
4119
4277
  "fqn": "aws-cdk-neuronx-patterns.Model",
4120
4278
  "kind": "class",
4121
4279
  "locationInModule": {
4122
- "filename": "src/neuronx-compile.ts",
4123
- "line": 110
4280
+ "filename": "src/model.ts",
4281
+ "line": 84
4124
4282
  },
4125
4283
  "methods": [
4284
+ {
4285
+ "docs": {
4286
+ "returns": "model instance",
4287
+ "stability": "stable",
4288
+ "summary": "model informations at S3 Bucket."
4289
+ },
4290
+ "locationInModule": {
4291
+ "filename": "src/model.ts",
4292
+ "line": 101
4293
+ },
4294
+ "name": "fromBucket",
4295
+ "parameters": [
4296
+ {
4297
+ "docs": {
4298
+ "summary": "Model stored S3 Bucket."
4299
+ },
4300
+ "name": "bucket",
4301
+ "type": {
4302
+ "fqn": "aws-cdk-lib.aws_s3.IBucket"
4303
+ }
4304
+ },
4305
+ {
4306
+ "docs": {
4307
+ "summary": "Model stored objects prefix."
4308
+ },
4309
+ "name": "prefix",
4310
+ "type": {
4311
+ "primitive": "string"
4312
+ }
4313
+ },
4314
+ {
4315
+ "docs": {
4316
+ "summary": "model basic infromation."
4317
+ },
4318
+ "name": "options",
4319
+ "type": {
4320
+ "fqn": "aws-cdk-neuronx-patterns.ModelOptions"
4321
+ }
4322
+ }
4323
+ ],
4324
+ "returns": {
4325
+ "type": {
4326
+ "fqn": "aws-cdk-neuronx-patterns.Model"
4327
+ }
4328
+ },
4329
+ "static": true
4330
+ },
4126
4331
  {
4127
4332
  "docs": {
4128
4333
  "returns": "model instance",
@@ -4130,8 +4335,8 @@
4130
4335
  "summary": "model informations at HuggingFace."
4131
4336
  },
4132
4337
  "locationInModule": {
4133
- "filename": "src/neuronx-compile.ts",
4134
- "line": 117
4338
+ "filename": "src/model.ts",
4339
+ "line": 91
4135
4340
  },
4136
4341
  "name": "fromHuggingFace",
4137
4342
  "parameters": [
@@ -4170,8 +4375,8 @@
4170
4375
  },
4171
4376
  "immutable": true,
4172
4377
  "locationInModule": {
4173
- "filename": "src/neuronx-compile.ts",
4174
- "line": 121
4378
+ "filename": "src/model.ts",
4379
+ "line": 105
4175
4380
  },
4176
4381
  "name": "modelId",
4177
4382
  "type": {
@@ -4184,16 +4389,46 @@
4184
4389
  },
4185
4390
  "immutable": true,
4186
4391
  "locationInModule": {
4187
- "filename": "src/neuronx-compile.ts",
4188
- "line": 122
4392
+ "filename": "src/model.ts",
4393
+ "line": 106
4189
4394
  },
4190
4395
  "name": "options",
4191
4396
  "type": {
4192
4397
  "fqn": "aws-cdk-neuronx-patterns.ModelOptions"
4193
4398
  }
4399
+ },
4400
+ {
4401
+ "docs": {
4402
+ "stability": "stable"
4403
+ },
4404
+ "immutable": true,
4405
+ "locationInModule": {
4406
+ "filename": "src/model.ts",
4407
+ "line": 107
4408
+ },
4409
+ "name": "bucket",
4410
+ "optional": true,
4411
+ "type": {
4412
+ "fqn": "aws-cdk-lib.aws_s3.IBucket"
4413
+ }
4414
+ },
4415
+ {
4416
+ "docs": {
4417
+ "stability": "stable"
4418
+ },
4419
+ "immutable": true,
4420
+ "locationInModule": {
4421
+ "filename": "src/model.ts",
4422
+ "line": 108
4423
+ },
4424
+ "name": "prefix",
4425
+ "optional": true,
4426
+ "type": {
4427
+ "primitive": "string"
4428
+ }
4194
4429
  }
4195
4430
  ],
4196
- "symbolId": "src/neuronx-compile:Model"
4431
+ "symbolId": "src/model:Model"
4197
4432
  },
4198
4433
  "aws-cdk-neuronx-patterns.ModelOptions": {
4199
4434
  "assembly": "aws-cdk-neuronx-patterns",
@@ -4205,8 +4440,8 @@
4205
4440
  "fqn": "aws-cdk-neuronx-patterns.ModelOptions",
4206
4441
  "kind": "interface",
4207
4442
  "locationInModule": {
4208
- "filename": "src/neuronx-compile.ts",
4209
- "line": 104
4443
+ "filename": "src/model.ts",
4444
+ "line": 78
4210
4445
  },
4211
4446
  "name": "ModelOptions",
4212
4447
  "properties": [
@@ -4217,8 +4452,8 @@
4217
4452
  },
4218
4453
  "immutable": true,
4219
4454
  "locationInModule": {
4220
- "filename": "src/neuronx-compile.ts",
4221
- "line": 105
4455
+ "filename": "src/model.ts",
4456
+ "line": 79
4222
4457
  },
4223
4458
  "name": "parameters",
4224
4459
  "type": {
@@ -4226,7 +4461,7 @@
4226
4461
  }
4227
4462
  }
4228
4463
  ],
4229
- "symbolId": "src/neuronx-compile:ModelOptions"
4464
+ "symbolId": "src/model:ModelOptions"
4230
4465
  },
4231
4466
  "aws-cdk-neuronx-patterns.NeuronxCompile": {
4232
4467
  "assembly": "aws-cdk-neuronx-patterns",
@@ -4243,7 +4478,7 @@
4243
4478
  },
4244
4479
  "locationInModule": {
4245
4480
  "filename": "src/neuronx-compile.ts",
4246
- "line": 182
4481
+ "line": 102
4247
4482
  },
4248
4483
  "parameters": [
4249
4484
  {
@@ -4269,124 +4504,224 @@
4269
4504
  "kind": "class",
4270
4505
  "locationInModule": {
4271
4506
  "filename": "src/neuronx-compile.ts",
4272
- "line": 177
4507
+ "line": 87
4273
4508
  },
4274
4509
  "name": "NeuronxCompile",
4275
4510
  "properties": [
4276
4511
  {
4277
4512
  "docs": {
4278
- "stability": "stable",
4279
- "summary": "S3 URL that compiled artifact uploaded."
4513
+ "stability": "stable"
4280
4514
  },
4281
4515
  "immutable": true,
4282
4516
  "locationInModule": {
4283
4517
  "filename": "src/neuronx-compile.ts",
4284
- "line": 181
4518
+ "line": 88
4285
4519
  },
4286
- "name": "compiledArtifactS3Url",
4520
+ "name": "compiledArtifactS3Bucket",
4287
4521
  "type": {
4288
- "primitive": "string"
4522
+ "fqn": "aws-cdk-lib.aws_s3.IBucket"
4289
4523
  }
4290
- }
4291
- ],
4292
- "symbolId": "src/neuronx-compile:NeuronxCompile"
4293
- },
4294
- "aws-cdk-neuronx-patterns.NeuronxCompileProps": {
4295
- "assembly": "aws-cdk-neuronx-patterns",
4296
- "datatype": true,
4297
- "docs": {
4298
- "stability": "stable",
4299
- "summary": "Props of NeuronxCompile."
4300
- },
4301
- "fqn": "aws-cdk-neuronx-patterns.NeuronxCompileProps",
4302
- "kind": "interface",
4303
- "locationInModule": {
4304
- "filename": "src/neuronx-compile.ts",
4305
- "line": 128
4306
- },
4307
- "name": "NeuronxCompileProps",
4308
- "properties": [
4524
+ },
4309
4525
  {
4310
- "abstract": true,
4311
4526
  "docs": {
4312
4527
  "stability": "stable",
4313
- "summary": "The bucket to upload compiled artifacts."
4528
+ "summary": "S3 Prefix that compiled artifact uploaded."
4314
4529
  },
4315
4530
  "immutable": true,
4316
4531
  "locationInModule": {
4317
4532
  "filename": "src/neuronx-compile.ts",
4318
- "line": 140
4533
+ "line": 96
4319
4534
  },
4320
- "name": "bucket",
4535
+ "name": "compiledArtifactS3Prefix",
4321
4536
  "type": {
4322
- "fqn": "aws-cdk-lib.aws_s3.IBucket"
4537
+ "primitive": "string"
4323
4538
  }
4324
4539
  },
4325
4540
  {
4326
- "abstract": true,
4327
4541
  "docs": {
4328
4542
  "stability": "stable",
4329
- "summary": "The model to be compiled."
4543
+ "summary": "S3 URL that compiled artifact uploaded."
4330
4544
  },
4331
4545
  "immutable": true,
4332
4546
  "locationInModule": {
4333
4547
  "filename": "src/neuronx-compile.ts",
4334
- "line": 144
4548
+ "line": 92
4335
4549
  },
4336
- "name": "model",
4550
+ "name": "compiledArtifactS3Url",
4337
4551
  "type": {
4338
- "fqn": "aws-cdk-neuronx-patterns.Model"
4552
+ "primitive": "string"
4339
4553
  }
4340
4554
  },
4341
4555
  {
4342
- "abstract": true,
4343
4556
  "docs": {
4344
- "stability": "stable",
4345
- "summary": "VPC in which this will launch compile worker instance."
4557
+ "stability": "stable"
4346
4558
  },
4347
4559
  "immutable": true,
4348
4560
  "locationInModule": {
4349
4561
  "filename": "src/neuronx-compile.ts",
4350
- "line": 132
4562
+ "line": 99
4351
4563
  },
4352
- "name": "vpc",
4564
+ "name": "nPositions",
4353
4565
  "type": {
4354
- "fqn": "aws-cdk-lib.aws_ec2.IVpc"
4566
+ "primitive": "number"
4355
4567
  }
4356
4568
  },
4357
4569
  {
4358
- "abstract": true,
4359
4570
  "docs": {
4360
- "default": "- Each properties are set default.",
4361
- "stability": "stable",
4362
- "summary": "Neuronx compile options."
4571
+ "stability": "stable"
4363
4572
  },
4364
4573
  "immutable": true,
4365
4574
  "locationInModule": {
4366
4575
  "filename": "src/neuronx-compile.ts",
4367
- "line": 159
4576
+ "line": 100
4368
4577
  },
4369
- "name": "compileOptions",
4370
- "optional": true,
4578
+ "name": "optLevel",
4371
4579
  "type": {
4372
- "fqn": "aws-cdk-neuronx-patterns.CompileOptions"
4580
+ "fqn": "aws-cdk-neuronx-patterns.OptLevel"
4373
4581
  }
4374
4582
  },
4375
4583
  {
4376
- "abstract": true,
4377
4584
  "docs": {
4378
- "stability": "stable",
4379
- "summary": "The instance type of compile worker instance."
4585
+ "stability": "stable"
4380
4586
  },
4381
4587
  "immutable": true,
4382
4588
  "locationInModule": {
4383
4589
  "filename": "src/neuronx-compile.ts",
4384
- "line": 136
4590
+ "line": 101
4385
4591
  },
4386
- "name": "instanceType",
4387
- "optional": true,
4592
+ "name": "parameters",
4388
4593
  "type": {
4389
- "fqn": "aws-cdk-neuronx-patterns.NeuronxInstanceType"
4594
+ "fqn": "aws-cdk-neuronx-patterns.Parameters"
4595
+ }
4596
+ },
4597
+ {
4598
+ "docs": {
4599
+ "stability": "stable"
4600
+ },
4601
+ "immutable": true,
4602
+ "locationInModule": {
4603
+ "filename": "src/neuronx-compile.ts",
4604
+ "line": 97
4605
+ },
4606
+ "name": "tpDegree",
4607
+ "type": {
4608
+ "primitive": "number"
4609
+ }
4610
+ },
4611
+ {
4612
+ "docs": {
4613
+ "stability": "stable"
4614
+ },
4615
+ "immutable": true,
4616
+ "locationInModule": {
4617
+ "filename": "src/neuronx-compile.ts",
4618
+ "line": 98
4619
+ },
4620
+ "name": "quantDtype",
4621
+ "optional": true,
4622
+ "type": {
4623
+ "fqn": "aws-cdk-neuronx-patterns.QuantDtype"
4624
+ }
4625
+ }
4626
+ ],
4627
+ "symbolId": "src/neuronx-compile:NeuronxCompile"
4628
+ },
4629
+ "aws-cdk-neuronx-patterns.NeuronxCompileProps": {
4630
+ "assembly": "aws-cdk-neuronx-patterns",
4631
+ "datatype": true,
4632
+ "docs": {
4633
+ "stability": "stable",
4634
+ "summary": "Props of NeuronxCompile."
4635
+ },
4636
+ "fqn": "aws-cdk-neuronx-patterns.NeuronxCompileProps",
4637
+ "kind": "interface",
4638
+ "locationInModule": {
4639
+ "filename": "src/neuronx-compile.ts",
4640
+ "line": 38
4641
+ },
4642
+ "name": "NeuronxCompileProps",
4643
+ "properties": [
4644
+ {
4645
+ "abstract": true,
4646
+ "docs": {
4647
+ "stability": "stable",
4648
+ "summary": "The bucket to upload compiled artifacts."
4649
+ },
4650
+ "immutable": true,
4651
+ "locationInModule": {
4652
+ "filename": "src/neuronx-compile.ts",
4653
+ "line": 46
4654
+ },
4655
+ "name": "bucket",
4656
+ "type": {
4657
+ "fqn": "aws-cdk-lib.aws_s3.IBucket"
4658
+ }
4659
+ },
4660
+ {
4661
+ "abstract": true,
4662
+ "docs": {
4663
+ "stability": "stable",
4664
+ "summary": "The model to be compiled."
4665
+ },
4666
+ "immutable": true,
4667
+ "locationInModule": {
4668
+ "filename": "src/neuronx-compile.ts",
4669
+ "line": 50
4670
+ },
4671
+ "name": "model",
4672
+ "type": {
4673
+ "fqn": "aws-cdk-neuronx-patterns.Model"
4674
+ }
4675
+ },
4676
+ {
4677
+ "abstract": true,
4678
+ "docs": {
4679
+ "stability": "stable",
4680
+ "summary": "VPC in which this will launch compile worker instance."
4681
+ },
4682
+ "immutable": true,
4683
+ "locationInModule": {
4684
+ "filename": "src/neuronx-compile.ts",
4685
+ "line": 42
4686
+ },
4687
+ "name": "vpc",
4688
+ "type": {
4689
+ "fqn": "aws-cdk-lib.aws_ec2.IVpc"
4690
+ }
4691
+ },
4692
+ {
4693
+ "abstract": true,
4694
+ "docs": {
4695
+ "default": "- Each properties are set default.",
4696
+ "stability": "stable",
4697
+ "summary": "Neuronx compile options."
4698
+ },
4699
+ "immutable": true,
4700
+ "locationInModule": {
4701
+ "filename": "src/neuronx-compile.ts",
4702
+ "line": 69
4703
+ },
4704
+ "name": "compileOptions",
4705
+ "optional": true,
4706
+ "type": {
4707
+ "fqn": "aws-cdk-neuronx-patterns.CompileOptions"
4708
+ }
4709
+ },
4710
+ {
4711
+ "abstract": true,
4712
+ "docs": {
4713
+ "stability": "stable",
4714
+ "summary": "The instance type of compile worker instance."
4715
+ },
4716
+ "immutable": true,
4717
+ "locationInModule": {
4718
+ "filename": "src/neuronx-compile.ts",
4719
+ "line": 54
4720
+ },
4721
+ "name": "instanceType",
4722
+ "optional": true,
4723
+ "type": {
4724
+ "fqn": "aws-cdk-neuronx-patterns.NeuronxInstanceType"
4390
4725
  }
4391
4726
  },
4392
4727
  {
@@ -4399,7 +4734,7 @@
4399
4734
  "immutable": true,
4400
4735
  "locationInModule": {
4401
4736
  "filename": "src/neuronx-compile.ts",
4402
- "line": 154
4737
+ "line": 64
4403
4738
  },
4404
4739
  "name": "runtime",
4405
4740
  "optional": true,
@@ -4418,7 +4753,7 @@
4418
4753
  "immutable": true,
4419
4754
  "locationInModule": {
4420
4755
  "filename": "src/neuronx-compile.ts",
4421
- "line": 165
4756
+ "line": 75
4422
4757
  },
4423
4758
  "name": "spot",
4424
4759
  "optional": true,
@@ -4436,7 +4771,7 @@
4436
4771
  "immutable": true,
4437
4772
  "locationInModule": {
4438
4773
  "filename": "src/neuronx-compile.ts",
4439
- "line": 149
4774
+ "line": 59
4440
4775
  },
4441
4776
  "name": "volumeSize",
4442
4777
  "optional": true,
@@ -4454,7 +4789,7 @@
4454
4789
  "immutable": true,
4455
4790
  "locationInModule": {
4456
4791
  "filename": "src/neuronx-compile.ts",
4457
- "line": 171
4792
+ "line": 81
4458
4793
  },
4459
4794
  "name": "vpcSubnets",
4460
4795
  "optional": true,
@@ -4633,8 +4968,8 @@
4633
4968
  "fqn": "aws-cdk-neuronx-patterns.OptLevel",
4634
4969
  "kind": "enum",
4635
4970
  "locationInModule": {
4636
- "filename": "src/neuronx-compile.ts",
4637
- "line": 42
4971
+ "filename": "src/model.ts",
4972
+ "line": 16
4638
4973
  },
4639
4974
  "members": [
4640
4975
  {
@@ -4660,7 +4995,7 @@
4660
4995
  }
4661
4996
  ],
4662
4997
  "name": "OptLevel",
4663
- "symbolId": "src/neuronx-compile:OptLevel"
4998
+ "symbolId": "src/model:OptLevel"
4664
4999
  },
4665
5000
  "aws-cdk-neuronx-patterns.Parameters": {
4666
5001
  "assembly": "aws-cdk-neuronx-patterns",
@@ -4671,8 +5006,8 @@
4671
5006
  "fqn": "aws-cdk-neuronx-patterns.Parameters",
4672
5007
  "kind": "class",
4673
5008
  "locationInModule": {
4674
- "filename": "src/neuronx-compile.ts",
4675
- "line": 82
5009
+ "filename": "src/model.ts",
5010
+ "line": 56
4676
5011
  },
4677
5012
  "methods": [
4678
5013
  {
@@ -4682,8 +5017,8 @@
4682
5017
  "summary": "Create a Parameters representing an amount bilion."
4683
5018
  },
4684
5019
  "locationInModule": {
4685
- "filename": "src/neuronx-compile.ts",
4686
- "line": 88
5020
+ "filename": "src/model.ts",
5021
+ "line": 62
4687
5022
  },
4688
5023
  "name": "billion",
4689
5024
  "parameters": [
@@ -4711,8 +5046,8 @@
4711
5046
  "summary": "Return this number of parameters as bilion."
4712
5047
  },
4713
5048
  "locationInModule": {
4714
- "filename": "src/neuronx-compile.ts",
4715
- "line": 96
5049
+ "filename": "src/model.ts",
5050
+ "line": 70
4716
5051
  },
4717
5052
  "name": "toBilion",
4718
5053
  "returns": {
@@ -4723,7 +5058,7 @@
4723
5058
  }
4724
5059
  ],
4725
5060
  "name": "Parameters",
4726
- "symbolId": "src/neuronx-compile:Parameters"
5061
+ "symbolId": "src/model:Parameters"
4727
5062
  },
4728
5063
  "aws-cdk-neuronx-patterns.QuantDtype": {
4729
5064
  "assembly": "aws-cdk-neuronx-patterns",
@@ -4734,8 +5069,8 @@
4734
5069
  "fqn": "aws-cdk-neuronx-patterns.QuantDtype",
4735
5070
  "kind": "enum",
4736
5071
  "locationInModule": {
4737
- "filename": "src/neuronx-compile.ts",
4738
- "line": 32
5072
+ "filename": "src/model.ts",
5073
+ "line": 6
4739
5074
  },
4740
5075
  "members": [
4741
5076
  {
@@ -4747,9 +5082,493 @@
4747
5082
  }
4748
5083
  ],
4749
5084
  "name": "QuantDtype",
4750
- "symbolId": "src/neuronx-compile:QuantDtype"
5085
+ "symbolId": "src/model:QuantDtype"
5086
+ },
5087
+ "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData": {
5088
+ "assembly": "aws-cdk-neuronx-patterns",
5089
+ "docs": {
5090
+ "stability": "stable"
5091
+ },
5092
+ "fqn": "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData",
5093
+ "kind": "class",
5094
+ "locationInModule": {
5095
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5096
+ "line": 51
5097
+ },
5098
+ "methods": [
5099
+ {
5100
+ "docs": {
5101
+ "stability": "stable"
5102
+ },
5103
+ "locationInModule": {
5104
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5105
+ "line": 52
5106
+ },
5107
+ "name": "fromBucket",
5108
+ "parameters": [
5109
+ {
5110
+ "name": "bucket",
5111
+ "type": {
5112
+ "fqn": "aws-cdk-lib.aws_s3.IBucket"
5113
+ }
5114
+ },
5115
+ {
5116
+ "name": "prefix",
5117
+ "type": {
5118
+ "primitive": "string"
5119
+ }
5120
+ },
5121
+ {
5122
+ "name": "options",
5123
+ "type": {
5124
+ "fqn": "aws-cdk-neuronx-patterns.BucketCompiledModelOptions"
5125
+ }
5126
+ }
5127
+ ],
5128
+ "returns": {
5129
+ "type": {
5130
+ "fqn": "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData"
5131
+ }
5132
+ },
5133
+ "static": true
5134
+ },
5135
+ {
5136
+ "docs": {
5137
+ "stability": "stable"
5138
+ },
5139
+ "locationInModule": {
5140
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5141
+ "line": 78
5142
+ },
5143
+ "name": "fromNeuronxCompile",
5144
+ "parameters": [
5145
+ {
5146
+ "name": "compile",
5147
+ "type": {
5148
+ "fqn": "aws-cdk-neuronx-patterns.NeuronxCompile"
5149
+ }
5150
+ },
5151
+ {
5152
+ "name": "code",
5153
+ "optional": true,
5154
+ "type": {
5155
+ "fqn": "aws-cdk-lib.aws_s3_deployment.ISource"
5156
+ }
5157
+ }
5158
+ ],
5159
+ "returns": {
5160
+ "type": {
5161
+ "fqn": "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData"
5162
+ }
5163
+ },
5164
+ "static": true
5165
+ }
5166
+ ],
5167
+ "name": "TransformersNeuronxSageMakerInferenceModelData",
5168
+ "properties": [
5169
+ {
5170
+ "docs": {
5171
+ "stability": "stable"
5172
+ },
5173
+ "immutable": true,
5174
+ "locationInModule": {
5175
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5176
+ "line": 86
5177
+ },
5178
+ "name": "bucket",
5179
+ "type": {
5180
+ "fqn": "aws-cdk-lib.aws_s3.IBucket"
5181
+ }
5182
+ },
5183
+ {
5184
+ "docs": {
5185
+ "stability": "stable"
5186
+ },
5187
+ "immutable": true,
5188
+ "locationInModule": {
5189
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5190
+ "line": 88
5191
+ },
5192
+ "name": "code",
5193
+ "type": {
5194
+ "fqn": "aws-cdk-lib.aws_s3_deployment.ISource"
5195
+ }
5196
+ },
5197
+ {
5198
+ "docs": {
5199
+ "stability": "stable"
5200
+ },
5201
+ "immutable": true,
5202
+ "locationInModule": {
5203
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5204
+ "line": 87
5205
+ },
5206
+ "name": "compiledArtifactS3Prefix",
5207
+ "type": {
5208
+ "primitive": "string"
5209
+ }
5210
+ },
5211
+ {
5212
+ "docs": {
5213
+ "stability": "stable"
5214
+ },
5215
+ "immutable": true,
5216
+ "locationInModule": {
5217
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5218
+ "line": 91
5219
+ },
5220
+ "name": "nPositions",
5221
+ "type": {
5222
+ "primitive": "number"
5223
+ }
5224
+ },
5225
+ {
5226
+ "docs": {
5227
+ "stability": "stable"
5228
+ },
5229
+ "immutable": true,
5230
+ "locationInModule": {
5231
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5232
+ "line": 92
5233
+ },
5234
+ "name": "optLevel",
5235
+ "type": {
5236
+ "fqn": "aws-cdk-neuronx-patterns.OptLevel"
5237
+ }
5238
+ },
5239
+ {
5240
+ "docs": {
5241
+ "stability": "stable"
5242
+ },
5243
+ "immutable": true,
5244
+ "locationInModule": {
5245
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5246
+ "line": 95
5247
+ },
5248
+ "name": "parameters",
5249
+ "type": {
5250
+ "fqn": "aws-cdk-neuronx-patterns.Parameters"
5251
+ }
5252
+ },
5253
+ {
5254
+ "docs": {
5255
+ "stability": "stable"
5256
+ },
5257
+ "immutable": true,
5258
+ "locationInModule": {
5259
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5260
+ "line": 89
5261
+ },
5262
+ "name": "tpDegree",
5263
+ "type": {
5264
+ "primitive": "number"
5265
+ }
5266
+ },
5267
+ {
5268
+ "docs": {
5269
+ "stability": "stable"
5270
+ },
5271
+ "immutable": true,
5272
+ "locationInModule": {
5273
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5274
+ "line": 94
5275
+ },
5276
+ "name": "compiledArtifactPath",
5277
+ "optional": true,
5278
+ "type": {
5279
+ "primitive": "string"
5280
+ }
5281
+ },
5282
+ {
5283
+ "docs": {
5284
+ "stability": "stable"
5285
+ },
5286
+ "immutable": true,
5287
+ "locationInModule": {
5288
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5289
+ "line": 93
5290
+ },
5291
+ "name": "modelIdOrPath",
5292
+ "optional": true,
5293
+ "type": {
5294
+ "primitive": "string"
5295
+ }
5296
+ },
5297
+ {
5298
+ "docs": {
5299
+ "stability": "stable"
5300
+ },
5301
+ "immutable": true,
5302
+ "locationInModule": {
5303
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5304
+ "line": 90
5305
+ },
5306
+ "name": "quantDtype",
5307
+ "optional": true,
5308
+ "type": {
5309
+ "fqn": "aws-cdk-neuronx-patterns.QuantDtype"
5310
+ }
5311
+ }
5312
+ ],
5313
+ "symbolId": "src/transformers-neuronx-sagemaker-realtime-inference:TransformersNeuronxSageMakerInferenceModelData"
5314
+ },
5315
+ "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerRealtimeInferenceEndpoint": {
5316
+ "assembly": "aws-cdk-neuronx-patterns",
5317
+ "base": "constructs.Construct",
5318
+ "docs": {
5319
+ "stability": "stable"
5320
+ },
5321
+ "fqn": "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerRealtimeInferenceEndpoint",
5322
+ "initializer": {
5323
+ "docs": {
5324
+ "stability": "stable"
5325
+ },
5326
+ "locationInModule": {
5327
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5328
+ "line": 179
5329
+ },
5330
+ "parameters": [
5331
+ {
5332
+ "name": "scope",
5333
+ "type": {
5334
+ "fqn": "constructs.Construct"
5335
+ }
5336
+ },
5337
+ {
5338
+ "name": "id",
5339
+ "type": {
5340
+ "primitive": "string"
5341
+ }
5342
+ },
5343
+ {
5344
+ "name": "props",
5345
+ "type": {
5346
+ "fqn": "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerRealtimeInferenceEndpointProps"
5347
+ }
5348
+ }
5349
+ ]
5350
+ },
5351
+ "kind": "class",
5352
+ "locationInModule": {
5353
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5354
+ "line": 167
5355
+ },
5356
+ "methods": [
5357
+ {
5358
+ "docs": {
5359
+ "stability": "stable"
5360
+ },
5361
+ "locationInModule": {
5362
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5363
+ "line": 287
5364
+ },
5365
+ "name": "grantInvoke",
5366
+ "parameters": [
5367
+ {
5368
+ "name": "grantee",
5369
+ "type": {
5370
+ "fqn": "aws-cdk-lib.aws_iam.IGrantable"
5371
+ }
5372
+ }
5373
+ ],
5374
+ "returns": {
5375
+ "type": {
5376
+ "fqn": "aws-cdk-lib.aws_iam.Grant"
5377
+ }
5378
+ }
5379
+ }
5380
+ ],
5381
+ "name": "TransformersNeuronxSageMakerRealtimeInferenceEndpoint",
5382
+ "properties": [
5383
+ {
5384
+ "docs": {
5385
+ "custom": {
5386
+ "attribute": "true"
5387
+ },
5388
+ "stability": "stable",
5389
+ "summary": "The ARN of the endpoint."
5390
+ },
5391
+ "immutable": true,
5392
+ "locationInModule": {
5393
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5394
+ "line": 172
5395
+ },
5396
+ "name": "endpointArn",
5397
+ "type": {
5398
+ "primitive": "string"
5399
+ }
5400
+ },
5401
+ {
5402
+ "docs": {
5403
+ "custom": {
5404
+ "attribute": "true"
5405
+ },
5406
+ "stability": "stable",
5407
+ "summary": "The name of the endpoint."
5408
+ },
5409
+ "immutable": true,
5410
+ "locationInModule": {
5411
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5412
+ "line": 177
5413
+ },
5414
+ "name": "endpointName",
5415
+ "type": {
5416
+ "primitive": "string"
5417
+ }
5418
+ }
5419
+ ],
5420
+ "symbolId": "src/transformers-neuronx-sagemaker-realtime-inference:TransformersNeuronxSageMakerRealtimeInferenceEndpoint"
5421
+ },
5422
+ "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerRealtimeInferenceEndpointProps": {
5423
+ "assembly": "aws-cdk-neuronx-patterns",
5424
+ "datatype": true,
5425
+ "docs": {
5426
+ "stability": "stable"
5427
+ },
5428
+ "fqn": "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerRealtimeInferenceEndpointProps",
5429
+ "kind": "interface",
5430
+ "locationInModule": {
5431
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5432
+ "line": 126
5433
+ },
5434
+ "name": "TransformersNeuronxSageMakerRealtimeInferenceEndpointProps",
5435
+ "properties": [
5436
+ {
5437
+ "abstract": true,
5438
+ "docs": {
5439
+ "remarks": "The model data requires at least compiled artifacts.",
5440
+ "stability": "stable",
5441
+ "summary": "Model data for SageMaker inference."
5442
+ },
5443
+ "immutable": true,
5444
+ "locationInModule": {
5445
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5446
+ "line": 131
5447
+ },
5448
+ "name": "modelData",
5449
+ "type": {
5450
+ "fqn": "aws-cdk-neuronx-patterns.TransformersNeuronxSageMakerInferenceModelData"
5451
+ }
5452
+ },
5453
+ {
5454
+ "abstract": true,
5455
+ "docs": {
5456
+ "default": "- 60 seconds, when set the `modelDataDownloadTimeout` then use same value (max 60 minutes)",
5457
+ "see": "https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-algo-ping-requests",
5458
+ "stability": "stable",
5459
+ "summary": "The timeout value, for your inference container to pass health check by SageMaker Hosting."
5460
+ },
5461
+ "immutable": true,
5462
+ "locationInModule": {
5463
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5464
+ "line": 164
5465
+ },
5466
+ "name": "containerStartupHealthCheckTimeout",
5467
+ "optional": true,
5468
+ "type": {
5469
+ "fqn": "aws-cdk-lib.Duration"
5470
+ }
5471
+ },
5472
+ {
5473
+ "abstract": true,
5474
+ "docs": {
5475
+ "default": "- Only the predefined environment variables required to use Neuronx have been set.",
5476
+ "stability": "stable",
5477
+ "summary": "A map of environment variables to pass into the container."
5478
+ },
5479
+ "immutable": true,
5480
+ "locationInModule": {
5481
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5482
+ "line": 140
5483
+ },
5484
+ "name": "environment",
5485
+ "optional": true,
5486
+ "type": {
5487
+ "collection": {
5488
+ "elementtype": {
5489
+ "primitive": "string"
5490
+ },
5491
+ "kind": "map"
5492
+ }
5493
+ }
5494
+ },
5495
+ {
5496
+ "abstract": true,
5497
+ "docs": {
5498
+ "stability": "stable",
5499
+ "summary": "An image of the container where the inference job is executed."
5500
+ },
5501
+ "immutable": true,
5502
+ "locationInModule": {
5503
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5504
+ "line": 135
5505
+ },
5506
+ "name": "image",
5507
+ "optional": true,
5508
+ "type": {
5509
+ "fqn": "@aws-cdk/aws-sagemaker-alpha.ContainerImage"
5510
+ }
5511
+ },
5512
+ {
5513
+ "abstract": true,
5514
+ "docs": {
5515
+ "default": "- It is determined automatically according to the number of model parameters and compilation options.",
5516
+ "stability": "stable",
5517
+ "summary": "The instance type of compile worker instance."
5518
+ },
5519
+ "immutable": true,
5520
+ "locationInModule": {
5521
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5522
+ "line": 145
5523
+ },
5524
+ "name": "instanceType",
5525
+ "optional": true,
5526
+ "type": {
5527
+ "fqn": "aws-cdk-neuronx-patterns.NeuronxInstanceType"
5528
+ }
5529
+ },
5530
+ {
5531
+ "abstract": true,
5532
+ "docs": {
5533
+ "default": "- 60 seconds, when `volumeSize` larger than 30GB then 1GB x 15 seconds (max 60 minutes)",
5534
+ "stability": "stable",
5535
+ "summary": "The timeout value, to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this production variant."
5536
+ },
5537
+ "immutable": true,
5538
+ "locationInModule": {
5539
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5540
+ "line": 158
5541
+ },
5542
+ "name": "modelDataDownloadTimeout",
5543
+ "optional": true,
5544
+ "type": {
5545
+ "fqn": "aws-cdk-lib.Duration"
5546
+ }
5547
+ },
5548
+ {
5549
+ "abstract": true,
5550
+ "docs": {
5551
+ "default": "- 2.5 GB per billion parameter (Max 512 GB)",
5552
+ "remarks": "Currently only Amazon EBS gp2 storage volumes are supported.",
5553
+ "see": "https://aws.amazon.com/jp/releasenotes/host-instance-storage-volumes-table",
5554
+ "stability": "stable",
5555
+ "summary": "The size, of the ML storage volume attached to individual inference instance associated with the production variant."
5556
+ },
5557
+ "immutable": true,
5558
+ "locationInModule": {
5559
+ "filename": "src/transformers-neuronx-sagemaker-realtime-inference.ts",
5560
+ "line": 152
5561
+ },
5562
+ "name": "volumeSize",
5563
+ "optional": true,
5564
+ "type": {
5565
+ "fqn": "aws-cdk-lib.Size"
5566
+ }
5567
+ }
5568
+ ],
5569
+ "symbolId": "src/transformers-neuronx-sagemaker-realtime-inference:TransformersNeuronxSageMakerRealtimeInferenceEndpointProps"
4751
5570
  }
4752
5571
  },
4753
- "version": "0.0.2",
4754
- "fingerprint": "p6gVUQb1hvNV7Eh7GaBS3XMbGzlWxaqBx9dV/sS1KX0="
5572
+ "version": "0.0.4",
5573
+ "fingerprint": "XO2QlDqW3bojQGczF18xCsjcRr+vPLQOWOpbuoVbGs4="
4755
5574
  }
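The `default` strings added above for `volumeSize`, `modelDataDownloadTimeout`, and `containerStartupHealthCheckTimeout` describe simple arithmetic. The sketch below is one possible reading of those strings, not the package's actual implementation: the helper names are hypothetical, rounding the volume size up to a whole GB is an assumption, and "1GB x 15 seconds" is taken to mean 15 seconds per GB of volume.

```typescript
// Hypothetical helpers illustrating the documented defaults.
// None of these names exist in aws-cdk-neuronx-patterns.

const MAX_VOLUME_GB = 512; // "Max 512 GB"
const MAX_TIMEOUT_SECONDS = 60 * 60; // "max 60 minutes"

/** volumeSize default: 2.5 GB per billion parameters, capped at 512 GB. */
function defaultVolumeSizeGb(billionParameters: number): number {
  // Rounding up to a whole GB is an assumption, not stated in the docs.
  return Math.min(Math.ceil(billionParameters * 2.5), MAX_VOLUME_GB);
}

/** modelDataDownloadTimeout default: 60 s, or 15 s per GB above 30 GB. */
function defaultModelDataDownloadTimeoutSeconds(volumeSizeGb: number): number {
  if (volumeSizeGb <= 30) {
    return 60;
  }
  return Math.min(volumeSizeGb * 15, MAX_TIMEOUT_SECONDS);
}

/** containerStartupHealthCheckTimeout default: follows the download timeout when one is set. */
function defaultHealthCheckTimeoutSeconds(
  modelDataDownloadTimeoutSeconds?: number,
): number {
  if (modelDataDownloadTimeoutSeconds === undefined) {
    return 60;
  }
  return Math.min(modelDataDownloadTimeoutSeconds, MAX_TIMEOUT_SECONDS);
}

// Example: a 70-billion-parameter model under this reading.
const volumeGb = defaultVolumeSizeGb(70);
const downloadSeconds = defaultModelDataDownloadTimeoutSeconds(volumeGb);
const healthCheckSeconds = defaultHealthCheckTimeoutSeconds(downloadSeconds);
console.log({ volumeGb, downloadSeconds, healthCheckSeconds });
```

Under this reading, small models keep the 60-second timeouts, while very large models hit both the 512 GB and the 60-minute caps. Whether the health-check default follows an explicitly set download timeout only, or also the computed one, is not clear from the doc string, so the sketch simply takes the value as an optional argument.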