npm - aws-cdk-neuronx-patterns - Versions diffs - 0.2.0 → 0.3.0 - Mend

aws-cdk-neuronx-patterns 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

package/.jsii CHANGED

@@ -8,7 +8,7 @@
   },
   "dependencies": {
     "@aws-cdk/aws-sagemaker-alpha": "2.240.0-alpha.0",
-    "@cdklabs/deploy-time-build": "^0.0.6",
+    "@cdklabs/deploy-time-build": "^0.1.0",
     "aws-cdk-lib": "^2.240.0",
     "constructs": "^10.5.1"
   },
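The `@cdklabs/deploy-time-build` bump changes which releases the range admits. Under npm's caret semantics the left-most non-zero version component is locked, so `^0.0.6` matches only `0.0.6`, while `^0.1.0` matches `>=0.1.0 <0.2.0`. The following is a minimal sketch of that rule for plain `MAJOR.MINOR.PATCH` strings only. It ignores prerelease tags and build metadata, which the real `semver` package handles, and the helper names are illustrative.

```typescript
type Version = [number, number, number];

// Parse a plain "MAJOR.MINOR.PATCH" string (no prerelease/build suffixes).
function parse(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${v}`);
  }
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of a caret range: bump the left-most non-zero component.
function caretUpperBound([major, minor, patch]: Version): Version {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// True if `version` satisfies `^rangeBase`.
function satisfiesCaret(version: string, rangeBase: string): boolean {
  const v = parse(version);
  const base = parse(rangeBase);
  return compare(v, base) >= 0 && compare(v, caretUpperBound(base)) < 0;
}

console.log(satisfiesCaret("0.0.7", "0.0.6")); // false: ^0.0.6 allows only 0.0.6
console.log(satisfiesCaret("0.1.3", "0.1.0")); // true: ^0.1.0 allows >=0.1.0 <0.2.0
console.log(satisfiesCaret("0.2.0", "0.1.0")); // false
console.log(satisfiesCaret("2.241.0", "2.240.0")); // true: ^2.240.0 allows <3.0.0
```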
@@ -8555,7 +8555,7 @@
     "stability": "stable"
   },
   "homepage": "https://github.com/WinterYukky/aws-cdk-neuronx-patterns.git",
-  "jsiiVersion": "5.9.29 (build f415c53)",
+  "jsiiVersion": "5.9.32 (build ac92fbd)",
   "keywords": [
     "cdk",
     "neuronx"
@@ -8571,7 +8571,7 @@
   },
   "name": "aws-cdk-neuronx-patterns",
   "readme": {
-    "markdown": "# Neuronx patterns Construct Library\n\n> [!WARNING]\n> This library is experimental module.\n\nThis library provides high-level architectural patterns using AWS Neuronx (e.g. Inferentia2 and Trainium1). It contains:\n\n- vLLM with NxD Inference on ALB & ECS on EC2\n- Neuronx Compiler\n\n[日本語版 README はこちら](./README.ja.md)\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [vLLM NxD Inference on ALB & ECS on EC2](#vllm-nxd-inference-on-alb--ecs-on-ec2)\n  - [Architecture](#architecture)\n  - [Basic Usage](#basic-usage)\n  - [Complete Example](#complete-example)\n  - [Using Specific Official AWS Neuron vLLM Image Version](#using-specific-official-aws-neuron-vllm-image-version)\n  - [Using HuggingFace Token with Secrets](#using-huggingface-token-with-secrets)\n- [Neuronx Compiler](#neuronx-compiler)\n  - [Spot Instance](#spot-instance)\n- [API Reference](#api-reference)\n- [Cost Considerations](#cost-considerations)\n- [Troubleshooting](#troubleshooting)\n- [Security Best Practices](#security-best-practices)\n- [License](#license)\n\n## Installation\n\n```bash\n# NPM\nnpm i aws-cdk-neuronx-patterns\n\n# yarn\nyarn add aws-cdk-neuronx-patterns\n\n# PNPM\npnpm i aws-cdk-neuronx-patterns\n```\n\n## Quick Start\n\nHere's a minimal example to deploy a vLLM inference service:\n\n```ts\nimport * as cdk from \"aws-cdk-lib\";\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport {\n  VllmNxdInferenceCompiler,\n  VllmNxdInferenceTaskDefinition,\n  ApplicationLoadBalancedVllmNxDInferenceService,\n  Model,\n} from \"aws-cdk-neuronx-patterns\";\n\nconst app = new cdk.App();\nconst stack = new cdk.Stack(app, \"VllmInferenceStack\");\n\nconst vpc = new ec2.Vpc(stack, \"Vpc\", { maxAzs: 2 });\nconst bucket = new s3.Bucket(stack, \"ModelBucket\");\n\nconst compiler = new VllmNxdInferenceCompiler(stack, \"Compiler\", {\n  vpc,\n  bucket,\n  model: 
Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n});\n\nconst compiledModel = compiler.compile();\nconst taskDefinition = new VllmNxdInferenceTaskDefinition(stack, \"TaskDef\", {\n  compiledModel,\n});\n\nconst service = new ApplicationLoadBalancedVllmNxDInferenceService(\n  stack,\n  \"Service\",\n  { vpc, taskDefinition }\n);\n\nnew cdk.CfnOutput(stack, \"LoadBalancerDNS\", {\n  value: service.loadBalancer.loadBalancerDnsName,\n});\n```\n\n## vLLM NxD Inference on ALB & ECS on EC2\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on EC2. You may need to increase your service quota for Inferentia2 instances in your AWS account via the [Service Quotas console](https://console.aws.amazon.com/servicequotas/).\n\nThis pattern combines `VllmNxdInferenceCompiler` for model compilation and `ApplicationLoadBalancedVllmNxDInferenceService` for deployment. Models published on HuggingFace can be easily compiled and deployed to ECS with Application Load Balancer.\n\n### Architecture\n\n![ApplicationLoadBalancedVllmNxDInferenceService architecture](./docs/application-load-balanced-vllm-nxd-inference-service.png)\n\nThe construct automatically:\n\n- Calculates optimal tensor parallelism based on model size\n- Configures memory footprint for the ECS tasks\n- Sets up the Application Load Balancer with health checks\n- Deploys the compiled model to ECS tasks\n- Configures auto-scaling policies\n\nThe service exposes a REST API endpoint through the Application Load Balancer that can be used to perform inference with the deployed model.\n\n### Basic Usage\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport {\n  VllmNxdInferenceCompiler,\n  VllmNxdInferenceTaskDefinition,\n  ApplicationLoadBalancedVllmNxDInferenceService,\n  Model,\n} from \"aws-cdk-neuronx-patterns\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\n\nconst compiler = new VllmNxdInferenceCompiler(this, \"Compiler\", 
{\n  vpc,\n  bucket,\n  model: Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n});\n\nconst compiledModel = compiler.compile();\nconst taskDefinition = new VllmNxdInferenceTaskDefinition(\n  this,\n  \"TaskDefinition\",\n  {\n    compiledModel,\n  }\n);\n\nconst service = new ApplicationLoadBalancedVllmNxDInferenceService(\n  this,\n  \"Service\",\n  {\n    vpc,\n    taskDefinition,\n  }\n);\n```\n\n### Complete Example\n\nHere's a complete example with VPC and S3 bucket creation, including access from other ECS tasks:\n\n```ts\nimport * as cdk from \"aws-cdk-lib\";\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as ecs from \"aws-cdk-lib/aws-ecs\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport {\n  VllmNxdInferenceCompiler,\n  VllmNxdInferenceTaskDefinition,\n  ApplicationLoadBalancedVllmNxDInferenceService,\n  Model,\n} from \"aws-cdk-neuronx-patterns\";\n\nexport class MyVllmStack extends cdk.Stack {\n  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {\n    super(scope, id, props);\n\n    // Create VPC\n    const vpc = new ec2.Vpc(this, \"Vpc\", {\n      maxAzs: 2,\n      natGateways: 1,\n    });\n\n    // Create S3 bucket for compiled models\n    const bucket = new s3.Bucket(this, \"ModelBucket\", {\n      removalPolicy: cdk.RemovalPolicy.DESTROY,\n      autoDeleteObjects: true,\n    });\n\n    // Compile the model\n    const compiler = new VllmNxdInferenceCompiler(this, \"Compiler\", {\n      vpc,\n      bucket,\n      model: Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n    });\n\n    const compiledModel = compiler.compile();\n\n    // Create task definition\n    const taskDefinition = new VllmNxdInferenceTaskDefinition(\n      this,\n      \"TaskDefinition\",\n      {\n        compiledModel,\n      }\n    );\n\n    // Deploy service with ALB\n    const service = new ApplicationLoadBalancedVllmNxDInferenceService(\n      this,\n      \"Service\",\n      {\n        vpc,\n        
taskDefinition,\n      }\n    );\n\n    // Allow access from other ECS tasks\n    const cluster = new ecs.Cluster(this, \"AppCluster\", { vpc });\n    const appTaskDefinition = new ecs.FargateTaskDefinition(\n      this,\n      \"AppTaskDefinition\"\n    );\n    appTaskDefinition.addContainer(\"app\", {\n      image: ecs.ContainerImage.fromRegistry(\"amazon/amazon-ecs-sample\"),\n      logging: ecs.LogDrivers.awsLogs({ streamPrefix: \"app\" }),\n    });\n\n    const appService = new ecs.FargateService(this, \"AppService\", {\n      cluster,\n      taskDefinition: appTaskDefinition,\n    });\n\n    // Allow application service to access inference service\n    service.service.connections.allowFrom(\n      appService,\n      ec2.Port.tcp(8000),\n      \"Allow access from application service\"\n    );\n\n    // Output the load balancer URL\n    new cdk.CfnOutput(this, \"LoadBalancerURL\", {\n      value: `http://${service.loadBalancer.loadBalancerDnsName}`,\n      description: \"Load Balancer URL for inference endpoint\",\n    });\n  }\n}\n```\n\n### Using Specific Official AWS Neuron vLLM Image Version\n\nThis library supports the official AWS Neuron Deep Learning Containers for vLLM inference. 
You can use the `VllmInferenceNeuronxImage` class to reference these images and `VllmNxdInferenceImage.fromNeuronSdkVersion` to create a compatible image object:\n\n```typescript\nimport { VllmNxdInferenceImage, VllmInferenceNeuronxImage } from \"aws-cdk-neuronx-patterns\";\n\n// Use the official vLLM Neuron Image\nconst vllmImage = VllmNxdInferenceImage.fromNeuronSdkVersion(\n  VllmInferenceNeuronxImage.SDK_2_26_0\n);\n\n// Use with task definition\nconst taskDefinition = new VllmNxdInferenceTaskDefinition(\n  this,\n  \"TaskDefinition\",\n  {\n    compiledModel,\n    image: vllmImage, // Default is using latest official vLLM Neuron Image\n  }\n);\n```\n\n### Using HuggingFace Token with Secrets\n\nWhen working with private or gated models on HuggingFace, you need to provide an authentication token. For security best practices, store your HuggingFace token in AWS Secrets Manager and pass it to both the compiler and inference environments:\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport * as batch from \"aws-cdk-lib/aws-batch\";\nimport { Secret } from \"aws-cdk-lib/aws-secretsmanager\";\nimport {\n  VllmNxdInferenceCompiler,\n  VllmNxdInferenceTaskDefinition,\n  ApplicationLoadBalancedVllmNxDInferenceService,\n  Model,\n} from \"aws-cdk-neuronx-patterns\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\n\n// Reference an existing secret containing your HuggingFace token\nconst hfTokenSecret = Secret.fromSecretNameV2(\n  this,\n  \"HFTokenSecret\",\n  \"my-huggingface-token\"\n);\nconst hfToken = batch.Secret.fromSecretsManager(hfTokenSecret, \"readonlyToken\");\n\n// Pass the secret to the compiler\nconst compiler = new VllmNxdInferenceCompiler(this, \"Compiler\", {\n  vpc,\n  bucket,\n  model: Model.fromHuggingFace(\"meta-llama/Meta-Llama-3-8B\"),\n  vllmArgs: {\n    hfToken, // Pass the HF token secret here\n  },\n});\n\nconst compiledModel = compiler.compile();\nconst taskDefinition 
= new VllmNxdInferenceTaskDefinition(\n  this,\n  \"TaskDefinition\",\n  {\n    compiledModel,\n  }\n);\n\nconst service = new ApplicationLoadBalancedVllmNxDInferenceService(\n  this,\n  \"Service\",\n  {\n    vpc,\n    taskDefinition,\n  }\n);\n```\n\nThe secret will be securely passed as an environment variable to the compilation batch job and the ECS tasks running the inference server.\n\n## Neuronx Compiler\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on EC2. You may need to increase your service quota for Inferentia2 instances in your AWS account.\n\nThis construct compiles models supported by Neuronx and uploads them to the specified S3 bucket. The construct automatically selects the required instance type based on the number of model parameters.\n\n![NeuronxCompiler architecture](./docs/neuronx-compile-architecture.png)\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport { NeuronxCompiler, Model } from \"aws-cdk-neuronx-patterns\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\ndeclare const image: INeuronxContainerImage;\n\nconst compiler = new NeuronxCompiler(this, \"NeuronxCompiler\", {\n  vpc,\n  bucket,\n  model: Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n  artifactS3Prefix: \"my-compiled-artifacts\",\n  image,\n});\n\nconst compiledModel = compiler.compile();\n\n// Get the compiled artifacts from this S3 URL\nnew cdk.CfnOutput(this, \"CompiledArtifact\", {\n  value: compiledModel.s3Url,\n});\n```\n\n### Spot Instance\n\n> [!WARNING]\n> If you use Spot Instances, verify that your service quota for Spot instances has been increased.\n\nYou can reduce costs by using Spot Instances for compilation:\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport { NeuronxCompiler, Model } from \"aws-cdk-neuronx-patterns\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\ndeclare 
const image: INeuronxContainerImage;\n\nnew NeuronxCompiler(this, \"NeuronxCompiler\", {\n  vpc,\n  bucket,\n  model: Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n  artifactS3Prefix: \"my-compiled-artifacts\",\n  image,\n  spot: true, // Enable Spot Instances\n});\n```\n\n## API Reference\n\nFor detailed API documentation, see [API.md](./API.md).\n\n## Cost Considerations\n\n> [!IMPORTANT]\n> This library deploys AWS resources that incur costs:\n> - **Inferentia2 instances** (EC2) - Significant hourly costs\n> - **Application Load Balancer** - Hourly and data processing charges\n> - **NAT Gateway** - Hourly and data processing charges\n> - **S3 storage** - Storage and request charges\n> - **Data transfer** - Charges for data transfer out\n\nFor cost estimates, use the [AWS Pricing Calculator](https://calculator.aws).\n\n**Cost optimization tips:**\n- Use Spot Instances for compilation jobs (can save up to 90%)\n- Delete resources when not in use (`cdk destroy`)\n- Use appropriate instance sizes for your workload\n- Monitor usage with AWS Cost Explorer\n\n## Troubleshooting\n\n### Common Issues\n\n**Issue: \"Service quota exceeded for Inferentia2 instances\"**\n- Solution: Request a quota increase via the [Service Quotas console](https://console.aws.amazon.com/servicequotas/)\n- Navigate to: EC2 → Running On-Demand Inf instances\n\n**Issue: \"Compilation job fails\"**\n- Check AWS Batch job logs in CloudWatch Logs\n- Verify the model exists on HuggingFace\n- Ensure sufficient disk space and memory for the model size\n\n**Issue: \"ECS tasks fail to start\"**\n- Check ECS task logs in CloudWatch\n- Verify S3 bucket permissions\n- Ensure the compiled model exists in S3\n\n**Issue: \"Health check failures\"**\n- Increase health check grace period\n- Verify security group rules allow ALB to reach ECS tasks\n- Check container logs for startup errors\n\n### Debugging\n\nView logs in CloudWatch:\n```bash\n# Batch job logs\naws logs tail /aws/batch/job 
--follow\n\n# ECS task logs\naws logs tail /ecs/vllm-inference --follow\n```\n\n## Security Best Practices\n\n- **Secrets Management**: Always use AWS Secrets Manager for sensitive data (HuggingFace tokens, API keys)\n- **IAM Roles**: Follow the principle of least privilege for IAM roles\n- **VPC Configuration**:\n  - Deploy ECS tasks in private subnets\n  - Use security groups to restrict traffic\n  - Enable VPC Flow Logs for monitoring\n- **S3 Buckets**:\n  - Enable encryption at rest\n  - Use bucket policies to restrict access\n  - Enable versioning for compiled models\n- **ALB**:\n  - Use HTTPS with ACM certificates in production\n  - Enable access logs for auditing\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis library is licensed under the Apache-2.0 License. See the [LICENSE](./LICENSE) file.\n"
+    "markdown": "# Neuronx patterns Construct Library\n\n> [!WARNING]\n> This library is experimental module.\n\nThis library provides high-level architectural patterns using AWS Neuronx (e.g. Inferentia2 and Trainium1). It contains:\n\n- vLLM with NxD Inference on ALB & ECS on EC2\n- Neuronx Compiler\n\n[日本語版 README はこちら](./README.ja.md)\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [vLLM NxD Inference on ALB & ECS on EC2](#vllm-nxd-inference-on-alb--ecs-on-ec2)\n  - [Architecture](#architecture)\n  - [Basic Usage](#basic-usage)\n  - [Complete Example](#complete-example)\n  - [Using Specific Official AWS Neuron vLLM Image Version](#using-specific-official-aws-neuron-vllm-image-version)\n  - [Using HuggingFace Token with Secrets](#using-huggingface-token-with-secrets)\n- [Neuronx Compiler](#neuronx-compiler)\n  - [Spot Instance](#spot-instance)\n- [API Reference](#api-reference)\n- [Cost Considerations](#cost-considerations)\n- [Troubleshooting](#troubleshooting)\n- [Security Best Practices](#security-best-practices)\n- [License](#license)\n\n## Installation\n\n```bash\n# NPM\nnpm i aws-cdk-neuronx-patterns\n\n# yarn\nyarn add aws-cdk-neuronx-patterns\n\n# PNPM\npnpm i aws-cdk-neuronx-patterns\n```\n\n## Quick Start\n\nHere's a minimal example to deploy a vLLM inference service:\n\n```ts\nimport * as cdk from \"aws-cdk-lib\";\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport {\n  VllmNxdInferenceCompiler,\n  VllmNxdInferenceTaskDefinition,\n  ApplicationLoadBalancedVllmNxDInferenceService,\n  Model,\n} from \"aws-cdk-neuronx-patterns\";\n\nconst app = new cdk.App();\nconst stack = new cdk.Stack(app, \"VllmInferenceStack\");\n\nconst vpc = new ec2.Vpc(stack, \"Vpc\", { maxAzs: 2 });\nconst bucket = new s3.Bucket(stack, \"ModelBucket\");\n\nconst compiler = new VllmNxdInferenceCompiler(stack, \"Compiler\", {\n  vpc,\n  bucket,\n  model: 
Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n});\n\nconst compiledModel = compiler.compile();\nconst taskDefinition = new VllmNxdInferenceTaskDefinition(stack, \"TaskDef\", {\n  compiledModel,\n});\n\nconst service = new ApplicationLoadBalancedVllmNxDInferenceService(\n  stack,\n  \"Service\",\n  { vpc, taskDefinition }\n);\n\nnew cdk.CfnOutput(stack, \"LoadBalancerDNS\", {\n  value: service.loadBalancer.loadBalancerDnsName,\n});\n```\n\n## vLLM NxD Inference on ALB & ECS on EC2\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on EC2 for inference. You may need to increase your service quota for Inferentia2 instances in your AWS account via the [Service Quotas console](https://console.aws.amazon.com/servicequotas/).\n\n> [!NOTE]\n> Model compilation is performed on standard (non-Neuron) EC2 instances via cross-compilation, so no Inferentia/Trainium quota is needed for the compilation phase.\n\nThis pattern combines `VllmNxdInferenceCompiler` for model compilation and `ApplicationLoadBalancedVllmNxDInferenceService` for deployment. 
Models published on HuggingFace can be easily compiled and deployed to ECS with Application Load Balancer.\n\n### Architecture\n\n![ApplicationLoadBalancedVllmNxDInferenceService architecture](./docs/application-load-balanced-vllm-nxd-inference-service.png)\n\nThe construct automatically:\n\n- Calculates optimal tensor parallelism based on model size\n- Configures memory footprint for the ECS tasks\n- Sets up the Application Load Balancer with health checks\n- Deploys the compiled model to ECS tasks\n- Configures auto-scaling policies\n\nThe service exposes a REST API endpoint through the Application Load Balancer that can be used to perform inference with the deployed model.\n\n### Basic Usage\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport {\n  VllmNxdInferenceCompiler,\n  VllmNxdInferenceTaskDefinition,\n  ApplicationLoadBalancedVllmNxDInferenceService,\n  Model,\n} from \"aws-cdk-neuronx-patterns\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\n\nconst compiler = new VllmNxdInferenceCompiler(this, \"Compiler\", {\n  vpc,\n  bucket,\n  model: Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n});\n\nconst compiledModel = compiler.compile();\nconst taskDefinition = new VllmNxdInferenceTaskDefinition(\n  this,\n  \"TaskDefinition\",\n  {\n    compiledModel,\n  }\n);\n\nconst service = new ApplicationLoadBalancedVllmNxDInferenceService(\n  this,\n  \"Service\",\n  {\n    vpc,\n    taskDefinition,\n  }\n);\n```\n\n### Complete Example\n\nHere's a complete example with VPC and S3 bucket creation, including access from other ECS tasks:\n\n```ts\nimport * as cdk from \"aws-cdk-lib\";\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as ecs from \"aws-cdk-lib/aws-ecs\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport {\n  VllmNxdInferenceCompiler,\n  VllmNxdInferenceTaskDefinition,\n  ApplicationLoadBalancedVllmNxDInferenceService,\n  Model,\n} from 
\"aws-cdk-neuronx-patterns\";\n\nexport class MyVllmStack extends cdk.Stack {\n  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {\n    super(scope, id, props);\n\n    // Create VPC\n    const vpc = new ec2.Vpc(this, \"Vpc\", {\n      maxAzs: 2,\n      natGateways: 1,\n    });\n\n    // Create S3 bucket for compiled models\n    const bucket = new s3.Bucket(this, \"ModelBucket\", {\n      removalPolicy: cdk.RemovalPolicy.DESTROY,\n      autoDeleteObjects: true,\n    });\n\n    // Compile the model\n    const compiler = new VllmNxdInferenceCompiler(this, \"Compiler\", {\n      vpc,\n      bucket,\n      model: Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n    });\n\n    const compiledModel = compiler.compile();\n\n    // Create task definition\n    const taskDefinition = new VllmNxdInferenceTaskDefinition(\n      this,\n      \"TaskDefinition\",\n      {\n        compiledModel,\n      }\n    );\n\n    // Deploy service with ALB\n    const service = new ApplicationLoadBalancedVllmNxDInferenceService(\n      this,\n      \"Service\",\n      {\n        vpc,\n        taskDefinition,\n      }\n    );\n\n    // Allow access from other ECS tasks\n    const cluster = new ecs.Cluster(this, \"AppCluster\", { vpc });\n    const appTaskDefinition = new ecs.FargateTaskDefinition(\n      this,\n      \"AppTaskDefinition\"\n    );\n    appTaskDefinition.addContainer(\"app\", {\n      image: ecs.ContainerImage.fromRegistry(\"amazon/amazon-ecs-sample\"),\n      logging: ecs.LogDrivers.awsLogs({ streamPrefix: \"app\" }),\n    });\n\n    const appService = new ecs.FargateService(this, \"AppService\", {\n      cluster,\n      taskDefinition: appTaskDefinition,\n    });\n\n    // Allow application service to access inference service\n    service.service.connections.allowFrom(\n      appService,\n      ec2.Port.tcp(8000),\n      \"Allow access from application service\"\n    );\n\n    // Output the load balancer URL\n    new cdk.CfnOutput(this, 
\"LoadBalancerURL\", {\n      value: `http://${service.loadBalancer.loadBalancerDnsName}`,\n      description: \"Load Balancer URL for inference endpoint\",\n    });\n  }\n}\n```\n\n### Using Specific Official AWS Neuron vLLM Image Version\n\nThis library supports the official AWS Neuron Deep Learning Containers for vLLM inference. You can use the `VllmInferenceNeuronxImage` class to reference these images and `VllmNxdInferenceImage.fromNeuronSdkVersion` to create a compatible image object:\n\n```typescript\nimport { VllmNxdInferenceImage, VllmInferenceNeuronxImage } from \"aws-cdk-neuronx-patterns\";\n\n// Use the official vLLM Neuron Image\nconst vllmImage = VllmNxdInferenceImage.fromNeuronSdkVersion(\n  VllmInferenceNeuronxImage.SDK_2_26_0\n);\n\n// Use with task definition\nconst taskDefinition = new VllmNxdInferenceTaskDefinition(\n  this,\n  \"TaskDefinition\",\n  {\n    compiledModel,\n    image: vllmImage, // Default is using latest official vLLM Neuron Image\n  }\n);\n```\n\n### Using HuggingFace Token with Secrets\n\nWhen working with private or gated models on HuggingFace, you need to provide an authentication token. 
For security best practices, store your HuggingFace token in AWS Secrets Manager and pass it to both the compiler and inference environments:\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport * as batch from \"aws-cdk-lib/aws-batch\";\nimport { Secret } from \"aws-cdk-lib/aws-secretsmanager\";\nimport {\n  VllmNxdInferenceCompiler,\n  VllmNxdInferenceTaskDefinition,\n  ApplicationLoadBalancedVllmNxDInferenceService,\n  Model,\n} from \"aws-cdk-neuronx-patterns\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\n\n// Reference an existing secret containing your HuggingFace token\nconst hfTokenSecret = Secret.fromSecretNameV2(\n  this,\n  \"HFTokenSecret\",\n  \"my-huggingface-token\"\n);\nconst hfToken = batch.Secret.fromSecretsManager(hfTokenSecret, \"readonlyToken\");\n\n// Pass the secret to the compiler\nconst compiler = new VllmNxdInferenceCompiler(this, \"Compiler\", {\n  vpc,\n  bucket,\n  model: Model.fromHuggingFace(\"meta-llama/Meta-Llama-3-8B\"),\n  vllmArgs: {\n    hfToken, // Pass the HF token secret here\n  },\n});\n\nconst compiledModel = compiler.compile();\nconst taskDefinition = new VllmNxdInferenceTaskDefinition(\n  this,\n  \"TaskDefinition\",\n  {\n    compiledModel,\n  }\n);\n\nconst service = new ApplicationLoadBalancedVllmNxDInferenceService(\n  this,\n  \"Service\",\n  {\n    vpc,\n    taskDefinition,\n  }\n);\n```\n\nThe secret will be securely passed as an environment variable to the compilation batch job and the ECS tasks running the inference server.\n\n## Neuronx Compiler\n\n> [!WARNING]\n> This construct uses an Inferentia2 instance on EC2. You may need to increase your service quota for Inferentia2 instances in your AWS account.\n\nThis construct compiles models supported by Neuronx and uploads them to the specified S3 bucket. 
The construct automatically selects the required instance type based on the number of model parameters.\n\nThere are two compiler variants:\n\n- **`NeuronxNativeCompiler`** — Compiles on Neuron instances (Inferentia2/Trainium). Requires Neuron device quota.\n- **`NeuronxCrossCompiler`** — Compiles on standard EC2 instances (e.g., `c7i-flex.4xlarge`) without Neuron hardware. Used by `VllmNxdInferenceCompiler` by default.\n\nBoth implement the `INeuronxCompiler` interface and produce compatible artifacts.\n\n![NeuronxCompiler architecture](./docs/neuronx-compile-architecture.png)\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport { NeuronxNativeCompiler, Model } from \"aws-cdk-neuronx-patterns\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\ndeclare const image: INeuronxContainerImage;\n\nconst compiler = new NeuronxNativeCompiler(this, \"NeuronxCompiler\", {\n  vpc,\n  bucket,\n  model: Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n  artifactS3Prefix: \"my-compiled-artifacts\",\n  image,\n});\n\nconst compiledModel = compiler.compile();\n\n// Get the compiled artifacts from this S3 URL\nnew cdk.CfnOutput(this, \"CompiledArtifact\", {\n  value: compiledModel.s3Url,\n});\n```\n\n### Spot Instance\n\n> [!WARNING]\n> If you use Spot Instances, verify that your service quota for Spot instances has been increased.\n\nYou can reduce costs by using Spot Instances for compilation:\n\n```ts\nimport * as ec2 from \"aws-cdk-lib/aws-ec2\";\nimport * as s3 from \"aws-cdk-lib/aws-s3\";\nimport { NeuronxNativeCompiler, Model } from \"aws-cdk-neuronx-patterns\";\n\ndeclare const vpc: ec2.Vpc;\ndeclare const bucket: s3.Bucket;\ndeclare const image: INeuronxContainerImage;\n\nnew NeuronxNativeCompiler(this, \"NeuronxCompiler\", {\n  vpc,\n  bucket,\n  model: Model.fromHuggingFace(\"HuggingFaceTB/SmolLM-135M-Instruct\"),\n  artifactS3Prefix: \"my-compiled-artifacts\",\n  image,\n  spot: 
true, // Enable Spot Instances\n});\n```\n\n## API Reference\n\nFor detailed API documentation, see [API.md](./API.md).\n\n## Cost Considerations\n\n> [!IMPORTANT]\n> This library deploys AWS resources that incur costs:\n> - **Inferentia2 instances** (EC2) - Significant hourly costs\n> - **Application Load Balancer** - Hourly and data processing charges\n> - **NAT Gateway** - Hourly and data processing charges\n> - **S3 storage** - Storage and request charges\n> - **Data transfer** - Charges for data transfer out\n\nFor cost estimates, use the [AWS Pricing Calculator](https://calculator.aws).\n\n**Cost optimization tips:**\n- The `VllmNxdInferenceCompiler` uses cross-compilation on standard EC2 instances by default, avoiding expensive Neuron instances during compilation\n- Use Spot Instances for compilation jobs (can save up to 90%)\n- Delete resources when not in use (`cdk destroy`)\n- Use appropriate instance sizes for your workload\n- Monitor usage with AWS Cost Explorer\n\n## Troubleshooting\n\n### Common Issues\n\n**Issue: \"Service quota exceeded for Inferentia2 instances\"**\n- Solution: Request a quota increase via the [Service Quotas console](https://console.aws.amazon.com/servicequotas/)\n- Navigate to: EC2 → Running On-Demand Inf instances\n\n**Issue: \"Compilation job fails\"**\n- Check AWS Batch job logs in CloudWatch Logs\n- Verify the model exists on HuggingFace\n- Ensure sufficient disk space and memory for the model size\n\n**Issue: \"ECS tasks fail to start\"**\n- Check ECS task logs in CloudWatch\n- Verify S3 bucket permissions\n- Ensure the compiled model exists in S3\n\n**Issue: \"Health check failures\"**\n- Increase health check grace period\n- Verify security group rules allow ALB to reach ECS tasks\n- Check container logs for startup errors\n\n### Debugging\n\nView logs in CloudWatch:\n```bash\n# Batch job logs\naws logs tail /aws/batch/job --follow\n\n# ECS task logs\naws logs tail /ecs/vllm-inference --follow\n```\n\n## Security Best 
Practices\n\n- **Secrets Management**: Always use AWS Secrets Manager for sensitive data (HuggingFace tokens, API keys)\n- **IAM Roles**: Follow the principle of least privilege for IAM roles\n- **VPC Configuration**:\n  - Deploy ECS tasks in private subnets\n  - Use security groups to restrict traffic\n  - Enable VPC Flow Logs for monitoring\n- **S3 Buckets**:\n  - Enable encryption at rest\n  - Use bucket policies to restrict access\n  - Enable versioning for compiled models\n- **ALB**:\n  - Use HTTPS with ACM certificates in production\n  - Enable access logs for auditing\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis library is licensed under the Apache-2.0 License. See the [LICENSE](./LICENSE) file.\n"
   },
   "repository": {
     "type": "git",
@@ -8866,6 +8866,56 @@
       "name": "ChatTemplateContentFormat",
       "symbolId": "src/base/server-engine/vllm-engine/vllm-engine-argments:ChatTemplateContentFormat"
     },
+    "aws-cdk-neuronx-patterns.ComputeEnvironmentResult": {
+      "assembly": "aws-cdk-neuronx-patterns",
+      "datatype": true,
+      "docs": {
+        "stability": "stable",
+        "summary": "Result of creating a compute environment."
+      },
+      "fqn": "aws-cdk-neuronx-patterns.ComputeEnvironmentResult",
+      "kind": "interface",
+      "locationInModule": {
+        "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+        "line": 144
+      },
+      "name": "ComputeEnvironmentResult",
+      "properties": [
+        {
+          "abstract": true,
+          "docs": {
+            "stability": "stable",
+            "summary": "The compute environment."
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 148
+          },
+          "name": "computeEnvironment",
+          "type": {
+            "fqn": "aws-cdk-lib.aws_batch.IComputeEnvironment"
+          }
+        },
+        {
+          "abstract": true,
+          "docs": {
+            "stability": "stable",
+            "summary": "The instance role associated with the compute environment."
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 152
+          },
+          "name": "instanceRole",
+          "type": {
+            "fqn": "aws-cdk-lib.aws_iam.IRole"
+          }
+        }
+      ],
+      "symbolId": "src/base/neuronx-compiler/neuronx-compiler-base:ComputeEnvironmentResult"
+    },
     "aws-cdk-neuronx-patterns.ConfigFormat": {
       "assembly": "aws-cdk-neuronx-patterns",
       "docs": {
@@ -9221,6 +9271,39 @@
       ],
       "symbolId": "src/base/neuronx/neuronx-instance-type:IAcceleratorChips"
     },
+    "aws-cdk-neuronx-patterns.INeuronxCompiler": {
+      "assembly": "aws-cdk-neuronx-patterns",
+      "docs": {
+        "stability": "stable",
+        "summary": "Interface for Neuronx compilers."
+      },
+      "fqn": "aws-cdk-neuronx-patterns.INeuronxCompiler",
+      "kind": "interface",
+      "locationInModule": {
+        "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+        "line": 72
+      },
+      "methods": [
+        {
+          "abstract": true,
+          "docs": {
+            "stability": "stable"
+          },
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 73
+          },
+          "name": "compile",
+          "returns": {
+            "type": {
+              "fqn": "aws-cdk-neuronx-patterns.NeuronxCompiledModel"
+            }
+          }
+        }
+      ],
+      "name": "INeuronxCompiler",
+      "symbolId": "src/base/neuronx-compiler/neuronx-compiler-base:INeuronxCompiler"
+    },
     "aws-cdk-neuronx-patterns.INeuronxContainerImage": {
       "assembly": "aws-cdk-neuronx-patterns",
       "docs": {
@@ -9230,8 +9313,8 @@
       "fqn": "aws-cdk-neuronx-patterns.INeuronxContainerImage",
       "kind": "interface",
       "locationInModule": {
-        "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-        "line": 39
+        "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+        "line": 28
       },
       "name": "INeuronxContainerImage",
       "properties": [
@@ -9243,8 +9326,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 43
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 32
           },
           "name": "image",
           "type": {
@@ -9259,8 +9342,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 47
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 36
           },
           "name": "neuronSdkVersion",
           "type": {
@@ -9268,7 +9351,7 @@
           }
         }
       ],
-      "symbolId": "src/base/neuronx-compiler/neuronx-compiler:INeuronxContainerImage"
+      "symbolId": "src/base/neuronx-compiler/neuronx-compiler-base:INeuronxContainerImage"
     },
     "aws-cdk-neuronx-patterns.INeuronxImage": {
       "assembly": "aws-cdk-neuronx-patterns",
@@ -9340,7 +9423,7 @@
       "kind": "interface",
       "locationInModule": {
         "filename": "src/base/neuronx/neuronx-instance-type.ts",
-        "line": 30
+        "line": 42
       },
       "name": "INeuronxInstanceType",
       "properties": [
@@ -9352,7 +9435,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 35
+            "line": 47
           },
           "name": "acceleratorChips",
           "type": {
@@ -9367,7 +9450,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 32
+            "line": 44
           },
           "name": "instanceType",
           "type": {
@@ -9382,7 +9465,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 34
+            "line": 46
           },
           "name": "memory",
           "type": {
@@ -9397,7 +9480,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 31
+            "line": 43
           },
           "name": "supportedTensorParallelism",
           "type": {
@@ -9417,7 +9500,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 33
+            "line": 45
           },
           "name": "vCpu",
           "type": {
@@ -10969,13 +11052,14 @@
       "assembly": "aws-cdk-neuronx-patterns",
       "datatype": true,
       "docs": {
-        "stability": "stable"
+        "stability": "stable",
+        "summary": "The model compiled by Neuronx compiler."
       },
       "fqn": "aws-cdk-neuronx-patterns.NeuronxCompiledModel",
       "kind": "interface",
       "locationInModule": {
-        "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-        "line": 112
+        "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+        "line": 42
       },
       "name": "NeuronxCompiledModel",
       "properties": [
@@ -10987,8 +11071,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 117
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 50
           },
           "name": "bucket",
           "type": {
@@ -10998,32 +11082,33 @@
         {
           "abstract": true,
           "docs": {
-            "stability": "stable"
+            "stability": "stable",
+            "summary": "The model name."
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 113
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 62
           },
-          "name": "compileTimeInstanceType",
+          "name": "modelName",
           "type": {
-            "fqn": "aws-cdk-neuronx-patterns.INeuronxInstanceType"
+            "primitive": "string"
           }
         },
         {
           "abstract": true,
           "docs": {
             "stability": "stable",
-            "summary": "The model name."
+            "summary": "The recommended Neuron instance type for running inference with this compiled model."
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 129
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 46
           },
-          "name": "modelName",
+          "name": "recommendedInstanceType",
           "type": {
-            "primitive": "string"
+            "fqn": "aws-cdk-neuronx-patterns.INeuronxInstanceType"
           }
         },
         {
@@ -11034,8 +11119,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 125
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 58
           },
           "name": "s3Prefix",
           "type": {
@@ -11050,8 +11135,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 121
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 54
           },
           "name": "s3Uri",
           "type": {
@@ -11061,12 +11146,13 @@
         {
           "abstract": true,
           "docs": {
-            "stability": "stable"
+            "stability": "stable",
+            "summary": "The weight size of the model."
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 130
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 66
           },
           "name": "weightSize",
           "type": {
@@ -11074,24 +11160,25 @@
           }
         }
       ],
-      "symbolId": "src/base/neuronx-compiler/neuronx-compiler:NeuronxCompiledModel"
+      "symbolId": "src/base/neuronx-compiler/neuronx-compiler-base:NeuronxCompiledModel"
     },
-    "aws-cdk-neuronx-patterns.NeuronxCompiler": {
+    "aws-cdk-neuronx-patterns.NeuronxCompilerBase": {
+      "abstract": true,
       "assembly": "aws-cdk-neuronx-patterns",
       "base": "constructs.Construct",
       "docs": {
-        "remarks": "Compile the model to work with Inferentia2 and Trainium1 and upload it to an S3 bucket.",
+        "remarks": "Provides the common orchestration logic (Lambda, CustomResource, WaitCondition)\nwhile subclasses define how to create the Batch compute environment and job definition.",
         "stability": "stable",
-        "summary": "Neuronx compiler construct."
+        "summary": "Abstract base class for Neuronx compilers."
       },
-      "fqn": "aws-cdk-neuronx-patterns.NeuronxCompiler",
+      "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBase",
       "initializer": {
         "docs": {
           "stability": "stable"
         },
         "locationInModule": {
-          "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-          "line": 147
+          "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+          "line": 172
         },
         "parameters": [
           {
@@ -11109,15 +11196,18 @@
           {
             "name": "props",
             "type": {
-              "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerProps"
+              "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
             }
           }
         ]
       },
+      "interfaces": [
+        "aws-cdk-neuronx-patterns.INeuronxCompiler"
+      ],
       "kind": "class",
       "locationInModule": {
-        "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-        "line": 137
+        "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+        "line": 160
       },
       "methods": [
         {
@@ -11125,34 +11215,166 @@
             "stability": "stable"
           },
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 286
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 284
           },
           "name": "compile",
+          "overrides": "aws-cdk-neuronx-patterns.INeuronxCompiler",
           "returns": {
             "type": {
               "fqn": "aws-cdk-neuronx-patterns.NeuronxCompiledModel"
             }
           }
+        },
+        {
+          "abstract": true,
+          "docs": {
+            "remarks": "Subclasses must implement this to provide the appropriate compute environment.",
+            "stability": "stable",
+            "summary": "Create the Batch compute environment."
+          },
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 251
+          },
+          "name": "createComputeEnvironment",
+          "parameters": [
+            {
+              "name": "props",
+              "type": {
+                "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
+              }
+            }
+          ],
+          "protected": true,
+          "returns": {
+            "type": {
+              "fqn": "aws-cdk-neuronx-patterns.ComputeEnvironmentResult"
+            }
+          }
+        },
+        {
+          "abstract": true,
+          "docs": {
+            "remarks": "Subclasses must implement this to provide the appropriate job definition.",
+            "stability": "stable",
+            "summary": "Create the Batch job definition."
+          },
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 259
+          },
+          "name": "createJobDefinition",
+          "parameters": [
+            {
+              "name": "props",
+              "type": {
+                "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
+              }
+            }
+          ],
+          "protected": true,
+          "returns": {
+            "type": {
+              "fqn": "aws-cdk-lib.aws_batch.IJobDefinition"
+            }
+          }
         }
       ],
-      "name": "NeuronxCompiler",
-      "symbolId": "src/base/neuronx-compiler/neuronx-compiler:NeuronxCompiler"
+      "name": "NeuronxCompilerBase",
+      "properties": [
+        {
+          "docs": {
+            "stability": "stable"
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 166
+          },
+          "name": "artifactS3Prefix",
+          "protected": true,
+          "type": {
+            "primitive": "string"
+          }
+        },
+        {
+          "docs": {
+            "stability": "stable"
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 170
+          },
+          "name": "bucket",
+          "protected": true,
+          "type": {
+            "fqn": "aws-cdk-lib.aws_s3.IBucket"
+          }
+        },
+        {
+          "docs": {
+            "stability": "stable"
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 169
+          },
+          "name": "model",
+          "protected": true,
+          "type": {
+            "fqn": "aws-cdk-neuronx-patterns.Model"
+          }
+        },
+        {
+          "docs": {
+            "stability": "stable"
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 168
+          },
+          "name": "neuronxInstanceType",
+          "protected": true,
+          "type": {
+            "fqn": "aws-cdk-neuronx-patterns.INeuronxInstanceType"
+          }
+        },
+        {
+          "docs": {
+            "stability": "stable"
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 167
+          },
+          "name": "weightSize",
+          "protected": true,
+          "type": {
+            "fqn": "aws-cdk-lib.Size"
+          }
+        }
+      ],
+      "symbolId": "src/base/neuronx-compiler/neuronx-compiler-base:NeuronxCompilerBase"
     },
-    "aws-cdk-neuronx-patterns.NeuronxCompilerProps": {
+    "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps": {
       "assembly": "aws-cdk-neuronx-patterns",
       "datatype": true,
       "docs": {
         "stability": "stable",
-        "summary": "Props of NeuronxCompiler."
+        "summary": "Common props for NeuronxCompilerBase."
       },
-      "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerProps",
+      "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps",
       "kind": "interface",
       "locationInModule": {
-        "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-        "line": 53
+        "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+        "line": 79
       },
-      "name": "NeuronxCompilerProps",
+      "name": "NeuronxCompilerBaseProps",
       "properties": [
         {
           "abstract": true,
@@ -11163,8 +11385,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 70
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 96
           },
           "name": "artifactS3Prefix",
           "type": {
@@ -11179,8 +11401,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 61
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 87
           },
           "name": "bucket",
           "type": {
@@ -11195,8 +11417,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 82
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 108
           },
           "name": "image",
           "type": {
@@ -11211,8 +11433,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 78
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 104
           },
           "name": "model",
           "type": {
@@ -11227,8 +11449,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 74
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 100
           },
           "name": "neuronxInstanceType",
           "type": {
@@ -11243,8 +11465,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 57
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 83
           },
           "name": "vpc",
           "type": {
@@ -11254,12 +11476,13 @@
         {
           "abstract": true,
           "docs": {
-            "stability": "stable"
+            "stability": "stable",
+            "summary": "The command to run in the container."
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 83
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 112
           },
           "name": "command",
           "optional": true,
@@ -11282,8 +11505,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 107
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 136
           },
           "name": "environment",
           "optional": true,
@@ -11304,8 +11527,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 65
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 91
           },
           "name": "secrets",
           "optional": true,
@@ -11328,8 +11551,8 @@
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 94
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 123
           },
           "name": "spot",
           "optional": true,
@@ -11340,14 +11563,14 @@
         {
           "abstract": true,
           "docs": {
-            "default": "- N bilion parameters * 5GiB EBS",
+            "default": "- N billion parameters * 5GiB EBS",
             "stability": "stable",
             "summary": "The root volume of worker instance."
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 88
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 117
           },
           "name": "volumeSize",
           "optional": true,
@@ -11358,23 +11581,165 @@
         {
           "abstract": true,
           "docs": {
-            "default": "- new subnets will be created",
+            "default": "- new subnets will be created",
+            "stability": "stable",
+            "summary": "The VPC Subnets this Compute Environment will launch instances in."
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-compiler-base.ts",
+            "line": 129
+          },
+          "name": "vpcSubnets",
+          "optional": true,
+          "type": {
+            "fqn": "aws-cdk-lib.aws_ec2.SubnetSelection"
+          }
+        }
+      ],
+      "symbolId": "src/base/neuronx-compiler/neuronx-compiler-base:NeuronxCompilerBaseProps"
+    },
+    "aws-cdk-neuronx-patterns.NeuronxCrossCompiler": {
+      "assembly": "aws-cdk-neuronx-patterns",
+      "base": "aws-cdk-neuronx-patterns.NeuronxCompilerBase",
+      "docs": {
+        "remarks": "Compile the model on a non-Neuron instance and upload the artifacts to an S3 bucket.\nThis avoids the need for expensive Neuron instances during the compilation phase.\n\nThe compilation uses `vllm serve` which performs model tracing and neuronx-cc compilation\nentirely on CPU. The resulting artifacts are compatible with Neuron instances for inference.",
+        "stability": "stable",
+        "summary": "Neuronx cross-compiler construct."
+      },
+      "fqn": "aws-cdk-neuronx-patterns.NeuronxCrossCompiler",
+      "initializer": {
+        "docs": {
+          "stability": "stable"
+        },
+        "locationInModule": {
+          "filename": "src/base/neuronx-compiler/neuronx-cross-compiler.ts",
+          "line": 38
+        },
+        "parameters": [
+          {
+            "name": "scope",
+            "type": {
+              "fqn": "constructs.Construct"
+            }
+          },
+          {
+            "name": "id",
+            "type": {
+              "primitive": "string"
+            }
+          },
+          {
+            "name": "props",
+            "type": {
+              "fqn": "aws-cdk-neuronx-patterns.NeuronxCrossCompilerProps"
+            }
+          }
+        ]
+      },
+      "kind": "class",
+      "locationInModule": {
+        "filename": "src/base/neuronx-compiler/neuronx-cross-compiler.ts",
+        "line": 37
+      },
+      "methods": [
+        {
+          "docs": {
+            "remarks": "Subclasses must implement this to provide the appropriate compute environment.",
+            "stability": "stable",
+            "summary": "Create the Batch compute environment."
+          },
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-cross-compiler.ts",
+            "line": 42
+          },
+          "name": "createComputeEnvironment",
+          "overrides": "aws-cdk-neuronx-patterns.NeuronxCompilerBase",
+          "parameters": [
+            {
+              "name": "props",
+              "type": {
+                "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
+              }
+            }
+          ],
+          "protected": true,
+          "returns": {
+            "type": {
+              "fqn": "aws-cdk-neuronx-patterns.ComputeEnvironmentResult"
+            }
+          }
+        },
+        {
+          "docs": {
+            "remarks": "Subclasses must implement this to provide the appropriate job definition.",
+            "stability": "stable",
+            "summary": "Create the Batch job definition."
+          },
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-cross-compiler.ts",
+            "line": 97
+          },
+          "name": "createJobDefinition",
+          "overrides": "aws-cdk-neuronx-patterns.NeuronxCompilerBase",
+          "parameters": [
+            {
+              "name": "props",
+              "type": {
+                "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
+              }
+            }
+          ],
+          "protected": true,
+          "returns": {
+            "type": {
+              "fqn": "aws-cdk-lib.aws_batch.IJobDefinition"
+            }
+          }
+        }
+      ],
+      "name": "NeuronxCrossCompiler",
+      "symbolId": "src/base/neuronx-compiler/neuronx-cross-compiler:NeuronxCrossCompiler"
+    },
+    "aws-cdk-neuronx-patterns.NeuronxCrossCompilerProps": {
+      "assembly": "aws-cdk-neuronx-patterns",
+      "datatype": true,
+      "docs": {
+        "stability": "stable",
+        "summary": "Props of NeuronxCrossCompiler."
+      },
+      "fqn": "aws-cdk-neuronx-patterns.NeuronxCrossCompilerProps",
+      "interfaces": [
+        "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
+      ],
+      "kind": "interface",
+      "locationInModule": {
+        "filename": "src/base/neuronx-compiler/neuronx-cross-compiler.ts",
+        "line": 18
+      },
+      "name": "NeuronxCrossCompilerProps",
+      "properties": [
+        {
+          "abstract": true,
+          "docs": {
+            "default": "ec2.InstanceType.of(ec2.InstanceClass.C7I, ec2.InstanceSize.XLARGE4)",
+            "remarks": "This should be a non-Neuron instance type with sufficient memory and CPU\nfor model compilation.",
             "stability": "stable",
-            "summary": "The VPC Subnets this Compute Environment will launch instances in."
+            "summary": "The EC2 instance type to use for cross-compilation."
           },
           "immutable": true,
           "locationInModule": {
-            "filename": "src/base/neuronx-compiler/neuronx-compiler.ts",
-            "line": 100
+            "filename": "src/base/neuronx-compiler/neuronx-cross-compiler.ts",
+            "line": 26
           },
-          "name": "vpcSubnets",
+          "name": "compileInstanceType",
           "optional": true,
           "type": {
-            "fqn": "aws-cdk-lib.aws_ec2.SubnetSelection"
+            "fqn": "aws-cdk-lib.aws_ec2.InstanceType"
           }
         }
       ],
-      "symbolId": "src/base/neuronx-compiler/neuronx-compiler:NeuronxCompilerProps"
+      "symbolId": "src/base/neuronx-compiler/neuronx-cross-compiler:NeuronxCrossCompilerProps"
     },
     "aws-cdk-neuronx-patterns.NeuronxInstanceType": {
       "abstract": true,
@@ -11391,7 +11756,7 @@
       "kind": "class",
       "locationInModule": {
         "filename": "src/base/neuronx/neuronx-instance-type.ts",
-        "line": 105
+        "line": 137
       },
       "name": "NeuronxInstanceType",
       "properties": [
@@ -11404,7 +11769,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 135
+            "line": 167
           },
           "name": "INF2_24XLARGE",
           "static": true,
@@ -11421,7 +11786,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 148
+            "line": 180
           },
           "name": "INF2_48XLARGE",
           "static": true,
@@ -11438,7 +11803,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 122
+            "line": 154
           },
           "name": "INF2_8XLARGE",
           "static": true,
@@ -11455,7 +11820,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 109
+            "line": 141
           },
           "name": "INF2_XLARGE",
           "static": true,
@@ -11472,7 +11837,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 161
+            "line": 193
           },
           "name": "TRN1_2XLARGE",
           "static": true,
@@ -11489,17 +11854,189 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/base/neuronx/neuronx-instance-type.ts",
-            "line": 174
+            "line": 206
           },
           "name": "TRN1_32XLARGE",
           "static": true,
           "type": {
             "fqn": "aws-cdk-neuronx-patterns.INeuronxInstanceType"
           }
+        },
+        {
+          "const": true,
+          "docs": {
+            "stability": "stable",
+            "summary": "trn2.3xlarge."
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx/neuronx-instance-type.ts",
+            "line": 219
+          },
+          "name": "TRN2_3XLARGE",
+          "static": true,
+          "type": {
+            "fqn": "aws-cdk-neuronx-patterns.INeuronxInstanceType"
+          }
+        },
+        {
+          "const": true,
+          "docs": {
+            "stability": "stable",
+            "summary": "trn2.48xlarge."
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx/neuronx-instance-type.ts",
+            "line": 232
+          },
+          "name": "TRN2_48XLARGE",
+          "static": true,
+          "type": {
+            "fqn": "aws-cdk-neuronx-patterns.INeuronxInstanceType"
+          }
+        },
+        {
+          "const": true,
+          "docs": {
+            "stability": "stable",
+            "summary": "trn2u.48xlarge."
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx/neuronx-instance-type.ts",
+            "line": 245
+          },
+          "name": "TRN2U_48XLARGE",
+          "static": true,
+          "type": {
+            "fqn": "aws-cdk-neuronx-patterns.INeuronxInstanceType"
+          }
         }
       ],
       "symbolId": "src/base/neuronx/neuronx-instance-type:NeuronxInstanceType"
     },
+    "aws-cdk-neuronx-patterns.NeuronxNativeCompiler": {
+      "assembly": "aws-cdk-neuronx-patterns",
+      "base": "aws-cdk-neuronx-patterns.NeuronxCompilerBase",
+      "docs": {
+        "remarks": "Compile the model to work with Inferentia2 and Trainium1 and upload it to an S3 bucket.",
+        "stability": "stable",
+        "summary": "Neuronx compiler construct."
+      },
+      "fqn": "aws-cdk-neuronx-patterns.NeuronxNativeCompiler",
+      "initializer": {
+        "docs": {
+          "stability": "stable"
+        },
+        "locationInModule": {
+          "filename": "src/base/neuronx-compiler/neuronx-native-compiler.ts",
+          "line": 32
+        },
+        "parameters": [
+          {
+            "name": "scope",
+            "type": {
+              "fqn": "constructs.Construct"
+            }
+          },
+          {
+            "name": "id",
+            "type": {
+              "primitive": "string"
+            }
+          },
+          {
+            "name": "props",
+            "type": {
+              "fqn": "aws-cdk-neuronx-patterns.NeuronxNativeCompilerProps"
+            }
+          }
+        ]
+      },
+      "kind": "class",
+      "locationInModule": {
+        "filename": "src/base/neuronx-compiler/neuronx-native-compiler.ts",
+        "line": 31
+      },
+      "methods": [
+        {
+          "docs": {
+            "remarks": "Subclasses must implement this to provide the appropriate compute environment.",
+            "stability": "stable",
+            "summary": "Create the Batch compute environment."
+          },
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-native-compiler.ts",
+            "line": 36
+          },
+          "name": "createComputeEnvironment",
+          "overrides": "aws-cdk-neuronx-patterns.NeuronxCompilerBase",
+          "parameters": [
+            {
+              "name": "props",
+              "type": {
+                "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
+              }
+            }
+          ],
+          "protected": true,
+          "returns": {
+            "type": {
+              "fqn": "aws-cdk-neuronx-patterns.ComputeEnvironmentResult"
+            }
+          }
+        },
+        {
+          "docs": {
+            "remarks": "Subclasses must implement this to provide the appropriate job definition.",
+            "stability": "stable",
+            "summary": "Create the Batch job definition."
+          },
+          "locationInModule": {
+            "filename": "src/base/neuronx-compiler/neuronx-native-compiler.ts",
+            "line": 80
+          },
+          "name": "createJobDefinition",
+          "overrides": "aws-cdk-neuronx-patterns.NeuronxCompilerBase",
+          "parameters": [
+            {
+              "name": "props",
+              "type": {
+                "fqn": "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
+              }
+            }
+          ],
+          "protected": true,
+          "returns": {
+            "type": {
+              "fqn": "aws-cdk-lib.aws_batch.IJobDefinition"
+            }
+          }
+        }
+      ],
+      "name": "NeuronxNativeCompiler",
+      "symbolId": "src/base/neuronx-compiler/neuronx-native-compiler:NeuronxNativeCompiler"
+    },
+    "aws-cdk-neuronx-patterns.NeuronxNativeCompilerProps": {
+      "assembly": "aws-cdk-neuronx-patterns",
+      "datatype": true,
+      "docs": {
+        "stability": "stable",
+        "summary": "Props of NeuronxNativeCompiler."
+      },
+      "fqn": "aws-cdk-neuronx-patterns.NeuronxNativeCompilerProps",
+      "interfaces": [
+        "aws-cdk-neuronx-patterns.NeuronxCompilerBaseProps"
+      ],
+      "kind": "interface",
+      "locationInModule": {
+        "filename": "src/base/neuronx-compiler/neuronx-native-compiler.ts",
+        "line": 25
+      },
+      "name": "NeuronxNativeCompilerProps",
+      "symbolId": "src/base/neuronx-compiler/neuronx-native-compiler:NeuronxNativeCompilerProps"
+    },
     "aws-cdk-neuronx-patterns.NeuronxTaskDefinition": {
       "assembly": "aws-cdk-neuronx-patterns",
       "base": "aws-cdk-lib.aws_ecs.Ec2TaskDefinition",
@@ -13285,6 +13822,87 @@
       ],
       "symbolId": "src/base/neuronx/neuronx-instance-type:Trainium1Chips"
     },
+    "aws-cdk-neuronx-patterns.Trainium2Chips": {
+      "assembly": "aws-cdk-neuronx-patterns",
+      "docs": {
+        "stability": "stable"
+      },
+      "fqn": "aws-cdk-neuronx-patterns.Trainium2Chips",
+      "initializer": {
+        "docs": {
+          "stability": "stable"
+        },
+        "locationInModule": {
+          "filename": "src/base/neuronx/neuronx-instance-type.ts",
+          "line": 34
+        },
+        "parameters": [
+          {
+            "name": "chips",
+            "type": {
+              "primitive": "number"
+            }
+          }
+        ]
+      },
+      "interfaces": [
+        "aws-cdk-neuronx-patterns.IAcceleratorChips"
+      ],
+      "kind": "class",
+      "locationInModule": {
+        "filename": "src/base/neuronx/neuronx-instance-type.ts",
+        "line": 31
+      },
+      "name": "Trainium2Chips",
+      "properties": [
+        {
+          "docs": {
+            "stability": "stable"
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx/neuronx-instance-type.ts",
+            "line": 33
+          },
+          "name": "acceleratorMemory",
+          "overrides": "aws-cdk-neuronx-patterns.IAcceleratorChips",
+          "type": {
+            "fqn": "aws-cdk-lib.Size"
+          }
+        },
+        {
+          "docs": {
+            "stability": "stable"
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx/neuronx-instance-type.ts",
+            "line": 34
+          },
+          "name": "chips",
+          "overrides": "aws-cdk-neuronx-patterns.IAcceleratorChips",
+          "type": {
+            "primitive": "number"
+          }
+        },
+        {
+          "docs": {
+            "stability": "stable"
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/base/neuronx/neuronx-instance-type.ts",
+            "line": 32
+          },
+          "name": "neuronxCores",
+          "overrides": "aws-cdk-neuronx-patterns.IAcceleratorChips",
+          "type": {
+            "primitive": "number"
+          }
+        }
+      ],
+      "symbolId": "src/base/neuronx/neuronx-instance-type:Trainium2Chips"
+    },
     "aws-cdk-neuronx-patterns.UvicornLogLevel": {
       "assembly": "aws-cdk-neuronx-patterns",
       "docs": {
@@ -15842,7 +16460,7 @@
         },
         "locationInModule": {
           "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-          "line": 43
+          "line": 76
         },
         "parameters": [
           {
@@ -15869,7 +16487,7 @@
       "kind": "class",
       "locationInModule": {
         "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-        "line": 38
+        "line": 71
       },
       "name": "VllmNxdInferenceCompileImage",
       "properties": [
@@ -15881,7 +16499,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 42
+            "line": 75
           },
           "name": "image",
           "overrides": "aws-cdk-neuronx-patterns.VllmNxdInferenceEcsImageBase",
@@ -15903,7 +16521,7 @@
       "kind": "interface",
       "locationInModule": {
         "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-        "line": 64
+        "line": 97
       },
       "name": "VllmNxdInferenceCompileProps",
       "properties": [
@@ -15916,7 +16534,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 72
+            "line": 105
           },
           "name": "bucket",
           "type": {
@@ -15932,7 +16550,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 80
+            "line": 113
           },
           "name": "model",
           "type": {
@@ -15948,13 +16566,32 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 68
+            "line": 101
           },
           "name": "vpc",
           "type": {
             "fqn": "aws-cdk-lib.aws_ec2.IVpc"
           }
         },
+        {
+          "abstract": true,
+          "docs": {
+            "default": "- Automatically selected based on model size",
+            "remarks": "This should be a non-Neuron instance type with sufficient memory for model compilation.",
+            "stability": "stable",
+            "summary": "The EC2 instance type to use for cross-compilation."
+          },
+          "immutable": true,
+          "locationInModule": {
+            "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
+            "line": 156
+          },
+          "name": "compileInstanceType",
+          "optional": true,
+          "type": {
+            "fqn": "aws-cdk-lib.aws_ec2.InstanceType"
+          }
+        },
         {
           "abstract": true,
           "docs": {
@@ -15966,7 +16603,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 104
+            "line": 137
           },
           "name": "environment",
           "optional": true,
@@ -15989,7 +16626,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 116
+            "line": 149
           },
           "name": "image",
           "optional": true,
@@ -16006,7 +16643,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 76
+            "line": 109
           },
           "name": "neuronxInstanceType",
           "optional": true,
@@ -16025,7 +16662,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 91
+            "line": 124
           },
           "name": "spot",
           "optional": true,
@@ -16043,7 +16680,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 111
+            "line": 144
           },
           "name": "vllmArgs",
           "optional": true,
@@ -16061,7 +16698,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 85
+            "line": 118
           },
           "name": "volumeSize",
           "optional": true,
@@ -16079,7 +16716,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 97
+            "line": 130
           },
           "name": "vpcSubnets",
           "optional": true,
@@ -16104,7 +16741,7 @@
       "kind": "interface",
       "locationInModule": {
         "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-        "line": 122
+        "line": 162
       },
       "name": "VllmNxdInferenceCompiledModel",
       "properties": [
@@ -16117,7 +16754,7 @@
           "immutable": true,
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 126
+            "line": 166
           },
           "name": "vllmArgs",
           "type": {
@@ -16142,7 +16779,7 @@
         },
         "locationInModule": {
           "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-          "line": 136
+          "line": 176
         },
         "parameters": [
           {
@@ -16168,7 +16805,7 @@
       "kind": "class",
       "locationInModule": {
         "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-        "line": 133
+        "line": 173
       },
       "methods": [
         {
@@ -16179,7 +16816,7 @@
           },
           "locationInModule": {
             "filename": "src/vllm-nxd-inference/vllm-nxd-inference-compiler.ts",
-            "line": 251
+            "line": 293
           },
           "name": "compile",
           "returns": {
@@ -17089,6 +17726,6 @@
       "symbolId": "src/base/server-engine/vllm-engine/vllm-engine-argments:VllmTask"
     }
   },
-  "version": "0.2.0",
-  "fingerprint": "y3L7pFC63WvgygoXeexNy9PC78TL0KjiDkaYJ1KI0l8="
+  "version": "0.3.0",
+  "fingerprint": "aK6GSL1zrw/JNphQGwQEHYhOfs7iFqHe5PGaAzykQM0="
 }
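The new `Trainium2Chips` class implements `IAcceleratorChips`. Its constructor takes a single `chips` count, and it exposes three immutable properties: `chips`, `neuronxCores`, and `acceleratorMemory`. The assembly records only these signatures. It does not show how the derived values are computed.

The sketch below is a hypothetical, self-contained model of that shape. It makes three assumptions:

- Each chip has 8 NeuronCores and 96 GiB of device memory, per AWS's published Trainium2 figures. The package might instead count logical NeuronCores.
- A plain GiB number stands in for `aws-cdk-lib`'s `Size`.
- `Trainium2ChipsSketch` and `AcceleratorChipsSketch` are illustrative names and are not exported by `aws-cdk-neuronx-patterns`.

```typescript
// Hypothetical mirror of the IAcceleratorChips shape recorded in the .jsii diff.
// A plain number of GiB stands in for aws-cdk-lib's Size type.
interface AcceleratorChipsSketch {
  readonly chips: number;
  readonly neuronxCores: number;
  readonly acceleratorMemoryGiB: number;
}

// Assumed per-chip figures for Trainium2, taken from AWS's public specs,
// not from this package's source.
const TRN2_CORES_PER_CHIP = 8;
const TRN2_MEMORY_GIB_PER_CHIP = 96;

class Trainium2ChipsSketch implements AcceleratorChipsSketch {
  readonly chips: number;
  readonly neuronxCores: number;
  readonly acceleratorMemoryGiB: number;

  constructor(chips: number) {
    // Reject counts that cannot describe real hardware.
    if (!Number.isInteger(chips) || chips <= 0) {
      throw new Error(`chips must be a positive integer, got ${chips}`);
    }
    this.chips = chips;
    // Derived values scale linearly with the chip count.
    this.neuronxCores = chips * TRN2_CORES_PER_CHIP;
    this.acceleratorMemoryGiB = chips * TRN2_MEMORY_GIB_PER_CHIP;
  }
}

// A trn2.48xlarge-class host carries 16 Trainium2 chips.
const trn2 = new Trainium2ChipsSketch(16);
console.log(trn2.chips, trn2.neuronxCores, trn2.acceleratorMemoryGiB); // 16 128 1536
```

Deriving both values from `chips` matches the recorded constructor signature, which accepts nothing else. The `TRN2U_48XLARGE` static added in this release would plausibly be backed by the same chip model, but the diff does not confirm this.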