npm - omgkit - Versions diffs - 2.20.0 → 2.21.1 - Mend

omgkit 2.20.0 → 2.21.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (73)

package/README.md +125 -10
package/package.json +1 -1
package/plugin/agents/ai-architect-agent.md +282 -0
package/plugin/agents/data-scientist-agent.md +221 -0
package/plugin/agents/experiment-analyst-agent.md +318 -0
package/plugin/agents/ml-engineer-agent.md +165 -0
package/plugin/agents/mlops-engineer-agent.md +324 -0
package/plugin/agents/model-optimizer-agent.md +287 -0
package/plugin/agents/production-engineer-agent.md +360 -0
package/plugin/agents/research-scientist-agent.md +274 -0
package/plugin/commands/omgdata/augment.md +86 -0
package/plugin/commands/omgdata/collect.md +81 -0
package/plugin/commands/omgdata/label.md +83 -0
package/plugin/commands/omgdata/split.md +83 -0
package/plugin/commands/omgdata/validate.md +76 -0
package/plugin/commands/omgdata/version.md +85 -0
package/plugin/commands/omgdeploy/ab.md +94 -0
package/plugin/commands/omgdeploy/cloud.md +89 -0
package/plugin/commands/omgdeploy/edge.md +93 -0
package/plugin/commands/omgdeploy/package.md +91 -0
package/plugin/commands/omgdeploy/serve.md +92 -0
package/plugin/commands/omgfeature/embed.md +93 -0
package/plugin/commands/omgfeature/extract.md +93 -0
package/plugin/commands/omgfeature/select.md +85 -0
package/plugin/commands/omgfeature/store.md +97 -0
package/plugin/commands/omgml/init.md +60 -0
package/plugin/commands/omgml/status.md +82 -0
package/plugin/commands/omgops/drift.md +87 -0
package/plugin/commands/omgops/monitor.md +99 -0
package/plugin/commands/omgops/pipeline.md +102 -0
package/plugin/commands/omgops/registry.md +109 -0
package/plugin/commands/omgops/retrain.md +91 -0
package/plugin/commands/omgoptim/distill.md +90 -0
package/plugin/commands/omgoptim/profile.md +92 -0
package/plugin/commands/omgoptim/prune.md +81 -0
package/plugin/commands/omgoptim/quantize.md +83 -0
package/plugin/commands/omgtrain/baseline.md +78 -0
package/plugin/commands/omgtrain/compare.md +99 -0
package/plugin/commands/omgtrain/evaluate.md +85 -0
package/plugin/commands/omgtrain/train.md +81 -0
package/plugin/commands/omgtrain/tune.md +89 -0
package/plugin/registry.yaml +252 -2
package/plugin/skills/ml-systems/SKILL.md +65 -0
package/plugin/skills/ml-systems/ai-accelerators/SKILL.md +342 -0
package/plugin/skills/ml-systems/data-eng/SKILL.md +126 -0
package/plugin/skills/ml-systems/deep-learning-primer/SKILL.md +143 -0
package/plugin/skills/ml-systems/deployment-paradigms/SKILL.md +148 -0
package/plugin/skills/ml-systems/dnn-architectures/SKILL.md +128 -0
package/plugin/skills/ml-systems/edge-deployment/SKILL.md +366 -0
package/plugin/skills/ml-systems/efficient-ai/SKILL.md +316 -0
package/plugin/skills/ml-systems/feature-engineering/SKILL.md +151 -0
package/plugin/skills/ml-systems/ml-frameworks/SKILL.md +187 -0
package/plugin/skills/ml-systems/ml-serving-optimization/SKILL.md +371 -0
package/plugin/skills/ml-systems/ml-systems-fundamentals/SKILL.md +103 -0
package/plugin/skills/ml-systems/ml-workflow/SKILL.md +162 -0
package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
package/plugin/skills/ml-systems/model-deployment/SKILL.md +350 -0
package/plugin/skills/ml-systems/model-dev/SKILL.md +160 -0
package/plugin/skills/ml-systems/model-optimization/SKILL.md +339 -0
package/plugin/skills/ml-systems/robust-ai/SKILL.md +395 -0
package/plugin/skills/ml-systems/training-data/SKILL.md +152 -0
package/plugin/workflows/ml-systems/data-preparation-workflow.md +276 -0
package/plugin/workflows/ml-systems/edge-deployment-workflow.md +413 -0
package/plugin/workflows/ml-systems/full-ml-lifecycle-workflow.md +405 -0
package/plugin/workflows/ml-systems/hyperparameter-tuning-workflow.md +352 -0
package/plugin/workflows/ml-systems/mlops-pipeline-workflow.md +384 -0
package/plugin/workflows/ml-systems/model-deployment-workflow.md +392 -0
package/plugin/workflows/ml-systems/model-development-workflow.md +218 -0
package/plugin/workflows/ml-systems/model-evaluation-workflow.md +416 -0
package/plugin/workflows/ml-systems/model-optimization-workflow.md +390 -0
package/plugin/workflows/ml-systems/monitoring-drift-workflow.md +446 -0
package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
package/plugin/workflows/ml-systems/training-pipeline-workflow.md +382 -0

package/plugin/registry.yaml CHANGED

@@ -1,9 +1,9 @@
 # OMGKIT Component Registry
 # Single Source of Truth for Agents, Skills, Commands, Workflows, and MCPs
-# Version: 2.20.0
+# Version: 2.21.0
 # Updated: 2026-01-02
-version: "2.20.0"
+version: "2.21.1"
 # =============================================================================
 # OPTIMIZED ALIGNMENT PRINCIPLE (OAP)
@@ -493,6 +493,154 @@ agents:
       - simulation/visualization-scientific
     commands: []
+  # ---------------------------------------------------------------------------
+  # ML SYSTEMS AGENTS (Harvard CS 329S + Chip Huyen)
+  # ---------------------------------------------------------------------------
+  ml-engineer-agent:
+    file: agents/ml-engineer-agent.md
+    description: Full-stack ML engineering for end-to-end ML systems
+    skills:
+      - ml-systems/ml-systems-fundamentals
+      - ml-systems/data-eng
+      - ml-systems/feature-engineering
+      - ml-systems/ml-workflow
+      - ml-systems/model-dev
+      - ml-systems/ml-frameworks
+      - ml-systems/model-deployment
+      - ml-systems/mlops
+    commands:
+      - /omgml:init
+      - /omgml:status
+      - /omgdata:collect
+      - /omgdata:validate
+      - /omgfeature:extract
+      - /omgfeature:select
+      - /omgtrain:train
+      - /omgtrain:evaluate
+      - /omgdeploy:package
+      - /omgdeploy:serve
+      - /omgops:pipeline
+  data-scientist-agent:
+    file: agents/data-scientist-agent.md
+    description: Expert data science for EDA, statistical modeling, and insights
+    skills:
+      - ml-systems/ml-systems-fundamentals
+      - ml-systems/data-eng
+      - ml-systems/training-data
+      - ml-systems/feature-engineering
+      - ml-systems/ml-workflow
+      - ml-systems/model-dev
+    commands:
+      - /omgdata:collect
+      - /omgdata:validate
+      - /omgdata:label
+      - /omgdata:augment
+      - /omgdata:split
+      - /omgfeature:extract
+      - /omgfeature:select
+      - /omgtrain:baseline
+      - /omgtrain:train
+      - /omgtrain:evaluate
+      - /omgtrain:compare
+  mlops-engineer-agent:
+    file: agents/mlops-engineer-agent.md
+    description: MLOps for production ML infrastructure and automation
+    skills:
+      - ml-systems/mlops
+      - ml-systems/robust-ai
+      - ml-systems/model-deployment
+      - ml-systems/ml-serving-optimization
+    commands:
+      - /omgops:pipeline
+      - /omgops:monitor
+      - /omgops:drift
+      - /omgops:retrain
+      - /omgops:registry
+      - /omgdeploy:package
+      - /omgdeploy:serve
+      - /omgdeploy:cloud
+      - /omgdeploy:ab
+  research-scientist-agent:
+    file: agents/research-scientist-agent.md
+    description: AI/ML research for novel approaches and paper implementation
+    skills:
+      - ml-systems/ml-systems-fundamentals
+      - ml-systems/deep-learning-primer
+      - ml-systems/dnn-architectures
+      - ml-systems/ml-workflow
+      - ml-systems/model-dev
+      - ml-systems/ml-frameworks
+    commands:
+      - /omgtrain:train
+      - /omgtrain:tune
+      - /omgtrain:evaluate
+      - /omgtrain:compare
+      - /omgml:status
+  model-optimizer-agent:
+    file: agents/model-optimizer-agent.md
+    description: Model optimization through quantization, pruning, and distillation
+    skills:
+      - ml-systems/efficient-ai
+      - ml-systems/model-optimization
+      - ml-systems/ai-accelerators
+      - ml-systems/ml-serving-optimization
+    commands:
+      - /omgoptim:quantize
+      - /omgoptim:prune
+      - /omgoptim:distill
+      - /omgoptim:profile
+      - /omgtrain:evaluate
+  production-engineer-agent:
+    file: agents/production-engineer-agent.md
+    description: ML production deployment with reliability and scalability
+    skills:
+      - ml-systems/model-deployment
+      - ml-systems/ml-serving-optimization
+      - ml-systems/edge-deployment
+      - ml-systems/robust-ai
+    commands:
+      - /omgdeploy:package
+      - /omgdeploy:serve
+      - /omgdeploy:edge
+      - /omgdeploy:cloud
+      - /omgdeploy:ab
+      - /omgops:monitor
+  ai-architect-agent:
+    file: agents/ai-architect-agent.md
+    description: Senior AI/ML architect for end-to-end ML system design
+    skills:
+      - ml-systems/ml-systems-fundamentals
+      - ml-systems/deployment-paradigms
+      - ml-systems/data-eng
+      - ml-systems/feature-engineering
+      - ml-systems/ml-workflow
+      - ml-systems/model-deployment
+      - ml-systems/mlops
+      - ml-systems/robust-ai
+    commands:
+      - /omgml:init
+      - /omgml:status
+      - /omgops:pipeline
+      - /omgops:registry
+  experiment-analyst-agent:
+    file: agents/experiment-analyst-agent.md
+    description: ML experiment analysis and model comparison
+    skills:
+      - ml-systems/ml-workflow
+      - ml-systems/model-dev
+      - ml-systems/training-data
+    commands:
+      - /omgtrain:evaluate
+      - /omgtrain:compare
+      - /omgml:status
 # =============================================================================
 # SKILL CATEGORIES
 # =============================================================================
@@ -514,6 +662,7 @@ skill_categories:
   - languages
   - methodology
   - microservices
+  - ml-systems        # ML Systems Design (Harvard CS 329S + Chip Huyen)
   - mobile
   - mobile-advanced
   - omega
@@ -540,6 +689,13 @@ command_namespaces:
   - iot       # IoT operations
   - ml        # Machine learning
   - omega     # Omega principles
+  - omgdata   # ML Data Engineering
+  - omgdeploy # ML Model Deployment
+  - omgfeature # ML Feature Engineering
+  - omgml     # ML Project Management
+  - omgops    # ML Operations
+  - omgoptim  # ML Model Optimization
+  - omgtrain  # ML Model Training
   - perf      # Performance
   - planning  # Planning and research
   - platform  # Platform engineering
@@ -772,3 +928,97 @@ workflows:
     agents: [copywriter, researcher]
     skills: []
     commands: [/planning:brainstorm, /planning:research]
+  # ---------------------------------------------------------------------------
+  # ML SYSTEMS WORKFLOWS (Harvard CS 329S + Chip Huyen)
+  # ---------------------------------------------------------------------------
+  ml-systems/model-development-workflow:
+    agents: [data-scientist-agent, research-scientist-agent, experiment-analyst-agent]
+    skills:
+      - ml-systems/ml-systems-fundamentals
+      - ml-systems/ml-workflow
+      - ml-systems/model-dev
+    commands: [/omgml:init, /omgtrain:baseline, /omgtrain:train, /omgtrain:evaluate]
+  ml-systems/data-preparation-workflow:
+    agents: [data-scientist-agent, ml-engineer-agent]
+    skills:
+      - ml-systems/data-eng
+      - ml-systems/training-data
+      - ml-systems/feature-engineering
+    commands: [/omgdata:collect, /omgdata:validate, /omgdata:label, /omgdata:augment, /omgdata:split]
+  ml-systems/training-pipeline-workflow:
+    agents: [ml-engineer-agent, mlops-engineer-agent]
+    skills:
+      - ml-systems/ml-workflow
+      - ml-systems/mlops
+    commands: [/omgops:pipeline, /omgtrain:train, /omgtrain:evaluate, /omgops:registry]
+  ml-systems/hyperparameter-tuning-workflow:
+    agents: [research-scientist-agent, experiment-analyst-agent]
+    skills:
+      - ml-systems/model-dev
+      - ml-systems/ml-frameworks
+    commands: [/omgtrain:tune, /omgtrain:compare]
+  ml-systems/model-evaluation-workflow:
+    agents: [experiment-analyst-agent, data-scientist-agent]
+    skills:
+      - ml-systems/model-dev
+      - ml-systems/training-data
+    commands: [/omgtrain:evaluate, /omgtrain:compare]
+  ml-systems/model-optimization-workflow:
+    agents: [model-optimizer-agent, production-engineer-agent]
+    skills:
+      - ml-systems/efficient-ai
+      - ml-systems/model-optimization
+      - ml-systems/ml-serving-optimization
+    commands: [/omgoptim:profile, /omgoptim:quantize, /omgoptim:prune, /omgoptim:distill]
+  ml-systems/model-deployment-workflow:
+    agents: [production-engineer-agent, mlops-engineer-agent]
+    skills:
+      - ml-systems/model-deployment
+      - ml-systems/ml-serving-optimization
+      - ml-systems/robust-ai
+    commands: [/omgdeploy:package, /omgdeploy:serve, /omgdeploy:cloud, /omgdeploy:ab]
+  ml-systems/edge-deployment-workflow:
+    agents: [model-optimizer-agent, production-engineer-agent]
+    skills:
+      - ml-systems/edge-deployment
+      - ml-systems/efficient-ai
+    commands: [/omgoptim:quantize, /omgdeploy:edge]
+  ml-systems/mlops-pipeline-workflow:
+    agents: [mlops-engineer-agent, production-engineer-agent]
+    skills:
+      - ml-systems/mlops
+      - ml-systems/model-deployment
+    commands: [/omgops:pipeline, /omgops:monitor, /omgops:registry]
+  ml-systems/monitoring-drift-workflow:
+    agents: [mlops-engineer-agent, experiment-analyst-agent]
+    skills:
+      - ml-systems/robust-ai
+      - ml-systems/mlops
+    commands: [/omgops:monitor, /omgops:drift]
+  ml-systems/retraining-workflow:
+    agents: [ml-engineer-agent, mlops-engineer-agent, experiment-analyst-agent]
+    skills:
+      - ml-systems/mlops
+      - ml-systems/ml-workflow
+    commands: [/omgops:retrain, /omgtrain:train, /omgtrain:evaluate]
+  ml-systems/full-ml-lifecycle-workflow:
+    agents: [ai-architect-agent, data-scientist-agent, ml-engineer-agent, research-scientist-agent, model-optimizer-agent, production-engineer-agent, mlops-engineer-agent, experiment-analyst-agent]
+    skills:
+      - ml-systems/ml-systems-fundamentals
+      - ml-systems/data-eng
+      - ml-systems/ml-workflow
+      - ml-systems/model-deployment
+      - ml-systems/mlops
+    commands: [/omgml:init, /omgdata:collect, /omgtrain:train, /omgdeploy:serve, /omgops:monitor]
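
The registry hunk above ties each agent to slash commands, and each command prefix (`omgml`, `omgtrain`, and so on) is declared under `command_namespaces`. A minimal sketch of checking that relationship, assuming the YAML has already been parsed into plain Python objects. The `REGISTRY` excerpt and the helpers `command_namespace` and `undeclared_namespaces` are illustrative names, not part of omgkit:

```python
# Hypothetical excerpt of registry.yaml, already parsed into Python objects.
REGISTRY = {
    "command_namespaces": [
        "omgdata", "omgdeploy", "omgfeature", "omgml",
        "omgops", "omgoptim", "omgtrain",
    ],
    "agents": {
        "experiment-analyst-agent": {
            "commands": ["/omgtrain:evaluate", "/omgtrain:compare", "/omgml:status"],
        },
        "model-optimizer-agent": {
            "commands": [
                "/omgoptim:quantize", "/omgoptim:prune",
                "/omgoptim:distill", "/omgoptim:profile",
                "/omgtrain:evaluate",
            ],
        },
    },
}


def command_namespace(command: str) -> str:
    """Return the namespace of a slash command, e.g. '/omgml:init' -> 'omgml'."""
    return command.lstrip("/").split(":", 1)[0]


def undeclared_namespaces(registry: dict) -> dict:
    """Map each agent to the sorted namespaces it uses that are not declared."""
    declared = set(registry["command_namespaces"])
    problems = {}
    for agent, spec in registry["agents"].items():
        used = {command_namespace(c) for c in spec.get("commands", [])}
        missing = used - declared
        if missing:
            problems[agent] = sorted(missing)
    return problems


# An empty dict means every command prefix is declared.
print(undeclared_namespaces(REGISTRY))
```

The same loop could be pointed at the `workflows` section, since its entries carry a `commands` list of the same shape.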

package/plugin/skills/ml-systems/SKILL.md ADDED

@@ -0,0 +1,65 @@
+---
+name: ml-systems
+description: Machine Learning Systems - comprehensive knowledge for building production ML systems from data engineering through deployment and operations. Based on Harvard ML Systems course and Designing ML Systems by Chip Huyen.
+---
+# ML Systems
+Building production-ready machine learning systems.
+## Overview
+This skill category covers the complete ML system lifecycle:
+1. **Foundations** - Core concepts, architectures, paradigms
+2. **Data Engineering** - Data collection, quality, feature engineering
+3. **Model Development** - Training, evaluation, frameworks
+4. **Performance** - Optimization, acceleration, efficiency
+5. **Deployment** - Serving, edge deployment, scaling
+6. **Operations** - MLOps, monitoring, reliability
+## Categories
+### Foundations
+- `ml-systems-fundamentals` - Core ML systems concepts
+- `deep-learning-primer` - Deep learning foundations
+- `dnn-architectures` - Neural network architectures
+- `deployment-paradigms` - Deployment patterns
+### Data Engineering
+- `data-eng` - Data pipelines and quality
+- `training-data` - Training data management
+- `feature-engineering` - Feature creation and stores
+### Model Development
+- `ml-workflow` - ML development workflow
+- `model-dev` - Model training and selection
+- `ml-frameworks` - Framework best practices
+### Performance
+- `efficient-ai` - Efficiency techniques
+- `model-optimization` - Quantization, pruning, distillation
+- `ai-accelerators` - Hardware acceleration
+### Deployment
+- `model-deployment` - Production deployment
+- `ml-serving-optimization` - Inference and serving optimization
+- `edge-deployment` - Edge and mobile deployment
+### Operations
+- `mlops` - ML operations and lifecycle
+- `robust-ai` - Reliability and robustness
+## Key Principles
+1. **Data-Centric AI** - Focus on data quality over model complexity
+2. **Iterative Development** - Start simple, iterate based on metrics
+3. **Production-First** - Design for deployment from the start
+4. **Monitoring** - Continuous monitoring and improvement
+5. **Reproducibility** - Version everything (data, code, models)
+## References
+- Harvard CS 329S: Machine Learning Systems Design
+- Designing Machine Learning Systems by Chip Huyen
+- MLOps: Continuous Delivery and Automation Pipelines
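
The reproducibility principle listed in this file ("version everything: data, code, models") can be made concrete with a content fingerprint. The sketch below is one possible approach using only the standard library. The function name `run_fingerprint` and its parameters are assumptions for illustration and do not come from omgkit:

```python
import hashlib
import json


def run_fingerprint(data: bytes, code_version: str, hyperparams: dict) -> str:
    """Return a SHA-256 hex digest that changes when data, code, or config changes."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(data).digest())       # dataset contents
    h.update(b"\0")                               # separator between fields
    h.update(code_version.encode("utf-8"))        # e.g. a git commit hash
    h.update(b"\0")
    # sort_keys makes the digest independent of dict insertion order
    h.update(json.dumps(hyperparams, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


fp = run_fingerprint(b"col_a,col_b\n1,2\n", "abc1234", {"lr": 0.001, "epochs": 3})
print(fp[:12])
```

Storing this digest next to a trained model gives a cheap way to tell whether two runs used the same inputs.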

package/plugin/skills/ml-systems/ai-accelerators/SKILL.md ADDED

@@ -0,0 +1,342 @@
+---
+name: ai-accelerators
+description: AI hardware accelerators including GPUs, TPUs, custom silicon, and hardware-aware optimization strategies for ML workloads.
+---
+# AI Accelerators
+Hardware acceleration for ML workloads.
+## Hardware Landscape
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    AI ACCELERATOR TYPES                      │
+├─────────────────────────────────────────────────────────────┤
+│                                                              │
+│  GPU (NVIDIA)         TPU (Google)        NPU/Custom        │
+│  ─────────────        ────────────        ─────────         │
+│  CUDA cores           Systolic array      Apple Neural      │
+│  Tensor cores         BF16 native         Qualcomm Hexagon  │
+│  General purpose      TPU pods            Intel Habana      │
+│  PyTorch/TF native    JAX optimized       AWS Inferentia    │
+│                                                              │
+│  FPGA                 ASIC                Edge Accelerators │
+│  ─────────────        ────────────        ─────────         │
+│  Reconfigurable       Fixed function      Coral Edge TPU    │
+│  Low latency          Maximum perf        Jetson (NVIDIA)   │
+│  Power efficient      High volume         Intel NCS2        │
+│                                                              │
+└─────────────────────────────────────────────────────────────┘
+```
+## GPU Optimization
+### CUDA Memory Management
+```python
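+import torch
+from torch.utils.data import DataLoader
+# Inspect and release cached GPU memory
+torch.cuda.empty_cache()             # return unused cached blocks to the driver
+torch.cuda.memory_allocated()        # bytes currently held by tensors
+torch.cuda.max_memory_allocated()    # peak bytes held by tensors so far
+# Pin host memory for faster host-to-device transfers
+# (`dataset` is any torch.utils.data.Dataset defined elsewhere)
+train_loader = DataLoader(
+    dataset,
+    batch_size=32,
+    pin_memory=True,
+    num_workers=4
+)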
+# Async data transfer
+def async_prefetch(loader, device):
+    stream = torch.cuda.Stream()
+    for batch in loader:
+        with torch.cuda.stream(stream):
+            batch = batch.to(device, non_blocking=True)
+        torch.cuda.current_stream().wait_stream(stream)
+        yield batch
+```
+### Tensor Core Utilization
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+# Ensure tensor core alignment (multiples of 8)
+class TensorCoreOptimized(nn.Module):
+    def __init__(self, in_features, out_features):
+        super().__init__()
+        # Round to multiple of 8 for tensor cores
+        self.in_features = ((in_features + 7) // 8) * 8
+        self.out_features = ((out_features + 7) // 8) * 8
+        self.linear = nn.Linear(self.in_features, self.out_features)
+        self.pad_in = self.in_features - in_features
+        self.orig_out_features = out_features
+    def forward(self, x):
+        if self.pad_in > 0:
+            x = F.pad(x, (0, self.pad_in))
+        # Drop the padded output columns so callers get the requested width
+        return self.linear(x)[..., :self.orig_out_features]
+# Enable TF32 on Ampere+ GPUs
+torch.backends.cuda.matmul.allow_tf32 = True
+torch.backends.cudnn.allow_tf32 = True
+# Run eligible ops in FP16 via autocast (some ops stay in FP32 for stability)
+with torch.autocast(device_type='cuda', dtype=torch.float16):
+    output = model(input)
+```
+### Multi-GPU Strategies
+```python
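+import torch.nn as nn
+from torch.nn.parallel import DistributedDataParallel
+# DataParallel (simple, not recommended for training)
+model = nn.DataParallel(model)
+# DistributedDataParallel (recommended); requires
+# torch.distributed.init_process_group() to have been called first
+model = DistributedDataParallel(model, device_ids=[local_rank])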
+# Model Parallelism (for large models)
+class ModelParallel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.encoder = nn.TransformerEncoder(...).to('cuda:0')
+        self.decoder = nn.TransformerDecoder(...).to('cuda:1')
+    def forward(self, x):
+        x = self.encoder(x.to('cuda:0'))
+        x = self.decoder(x.to('cuda:1'))
+        return x
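+# Pipeline Parallelism
+# Note: Pipe needs the RPC framework initialized (torch.distributed.rpc.init_rpc),
+# and recent PyTorch releases replace this module with torch.distributed.pipelining
+from torch.distributed.pipeline.sync import Pipe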
+model = nn.Sequential(
+    nn.Linear(100, 200).to('cuda:0'),
+    nn.ReLU().to('cuda:0'),
+    nn.Linear(200, 100).to('cuda:1')
+)
+model = Pipe(model, chunks=8)
+```
+## TPU Optimization
+```python
+# JAX/TPU optimized training
+import functools
+import jax
+import jax.numpy as jnp
+import optax
+from flax import linen as nn
+class TPUModel(nn.Module):
+    features: int
+    @nn.compact
+    def __call__(self, x):
+        x = nn.Dense(self.features)(x)
+        x = nn.relu(x)
+        return nn.Dense(10)(x)
+# pmap for data parallelism across TPU cores; axis_name must match the
+# name passed to jax.lax.pmean below
+@functools.partial(jax.pmap, axis_name='batch')
+def train_step(state, batch):
+    def loss_fn(params):
+        logits = state.apply_fn({'params': params}, batch['image'])
+        # softmax_cross_entropy expects one-hot labels
+        loss = jnp.mean(optax.softmax_cross_entropy(logits, batch['label']))
+        return loss
+    grad_fn = jax.value_and_grad(loss_fn)
+    loss, grads = grad_fn(state.params)
+    # Average gradients across devices before the update
+    grads = jax.lax.pmean(grads, axis_name='batch')
+    state = state.apply_gradients(grads=grads)
+    return state, loss
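+# PyTorch/XLA for TPU
+# (`model`, `optimizer`, `criterion`, and `train_loader` are assumed to be
+# standard PyTorch objects defined elsewhere)
+import torch_xla.core.xla_model as xm
+import torch_xla.distributed.parallel_loader as pl
+device = xm.xla_device()
+model = model.to(device)
+loader = pl.ParallelLoader(train_loader, [device]).per_device_loader(device)
+for inputs, targets in loader:
+    optimizer.zero_grad()
+    loss = criterion(model(inputs), targets)
+    loss.backward()
+    xm.optimizer_step(optimizer)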
+```
+## Edge Accelerators
+### NVIDIA Jetson
+```python
+# TensorRT optimization for Jetson
+import tensorrt as trt
+import pycuda.driver as cuda
+def build_engine(onnx_path, precision='fp16'):
+    logger = trt.Logger(trt.Logger.WARNING)
+    builder = trt.Builder(logger)
+    network = builder.create_network(
+        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
+    )
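+    parser = trt.OnnxParser(network, logger)
+    with open(onnx_path, 'rb') as f:
+        # parse() returns False on failure; surface the parser errors
+        if not parser.parse(f.read()):
+            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
+            raise RuntimeError('ONNX parse failed: ' + '; '.join(errors))
+    config = builder.create_builder_config()
+    # On TensorRT 8.4+ prefer config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
+    config.max_workspace_size = 1 << 30  # 1GB
+    if precision == 'fp16':
+        config.set_flag(trt.BuilderFlag.FP16)
+    elif precision == 'int8':
+        config.set_flag(trt.BuilderFlag.INT8)
+        # EntropyCalibrator and calibration_data are user-defined (a calibrator
+        # subclass and its sample inputs) and are not shown here
+        config.int8_calibrator = EntropyCalibrator(calibration_data)
+    # build_engine is removed in TensorRT 10; use build_serialized_network there
+    return builder.build_engine(network, config)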
+# DeepStream for video inference
+# gst-launch-1.0 filesrc location=video.mp4 ! \
+#   decodebin ! nvvideoconvert ! \
+#   nvinfer config-file-path=config.txt ! \
+#   nvdsosd ! nveglglessink
+```
+### Coral Edge TPU
+```python
+from pycoral.utils import edgetpu
+from pycoral.adapters import common, classify
+# Load Edge TPU model
+interpreter = edgetpu.make_interpreter('model_edgetpu.tflite')
+interpreter.allocate_tensors()
+# Inference
+common.set_input(interpreter, image)
+interpreter.invoke()
+classes = classify.get_classes(interpreter, top_k=5)
+# Compile model for Edge TPU
+# edgetpu_compiler model.tflite
+```
+### TFLite for Mobile
+```python
+import tensorflow as tf
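+# Option A: float16 post-training quantization
+converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/')
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
+converter.target_spec.supported_types = [tf.float16]
+fp16_model = converter.convert()
+# Option B: full integer quantization
+# (use a fresh converter; do not combine with the float16 setting above)
+converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/')
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
+# representative_dataset is a user-defined generator yielding sample inputs
+converter.representative_dataset = representative_dataset
+converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
+converter.inference_input_type = tf.uint8
+converter.inference_output_type = tf.uint8
+tflite_model = converter.convert()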
+# Inference
+interpreter = tf.lite.Interpreter(model_content=tflite_model)
+interpreter.allocate_tensors()
+```
+## Hardware-Aware Optimization
+### Auto-Tuning
+```python
+# TVM auto-tuning for specific hardware
+# (`mod` and `params` are a Relay module and its weights, e.g. from relay.frontend.from_onnx)
+import tvm
+from tvm import relay, autotvm
+# Extract tuning tasks
+tasks = autotvm.task.extract_from_program(
+    mod["main"], target="cuda", params=params
+)
+# Tune each task
+for task in tasks:
+    tuner = autotvm.tuner.XGBTuner(task)
+    tuner.tune(
+        n_trial=1000,
+        measure_option=autotvm.measure_option(
+            builder=autotvm.LocalBuilder(),
+            runner=autotvm.LocalRunner(number=10)
+        ),
+        callbacks=[autotvm.callback.log_to_file('tune.log')]
+    )
+# Compile with best configs
+with autotvm.apply_history_best('tune.log'):
+    with tvm.transform.PassContext(opt_level=3):
+        lib = relay.build(mod, target="cuda", params=params)
+```
+### Hardware Selection Matrix
+```python
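+def select_hardware(model_size, latency_req, batch_size, budget):
+    """Suggest candidate hardware for an ML workload.
+    model_size is a parameter count and latency_req is a target in milliseconds.
+    budget is accepted but not used by the rules below.
+    """
+    recommendations = []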
+    if model_size > 10e9:  # >10B params
+        recommendations.append({
+            'hardware': 'Multi-GPU (A100/H100)',
+            'reason': 'Large model requires high memory bandwidth',
+            'cost': 'High'
+        })
+    if latency_req < 10:  # <10ms
+        recommendations.append({
+            'hardware': 'TensorRT + GPU',
+            'reason': 'Low latency requires optimized inference',
+            'cost': 'Medium'
+        })
+    if batch_size == 1 and latency_req < 5:
+        recommendations.append({
+            'hardware': 'Edge TPU / Jetson',
+            'reason': 'Single-sample low-latency inference',
+            'cost': 'Low'
+        })
+    return recommendations
+```
+## Benchmarking
+```python
+import torch
+import torch.utils.benchmark as benchmark
+def benchmark_model(model, input_shape, device='cuda'):
+    x = torch.randn(*input_shape).to(device)
+    model = model.to(device)
+    model.eval()
+    # Warmup (no_grad avoids building autograd graphs)
+    with torch.no_grad():
+        for _ in range(10):
+            model(x)
+    # Benchmark (Timer handles CUDA synchronization)
+    timer = benchmark.Timer(
+        stmt='model(x)',
+        globals={'model': model, 'x': x}
+    )
+    result = timer.blocked_autorange(min_run_time=1)
+    return {
+        'mean_ms': result.mean * 1000,
+        'median_ms': result.median * 1000,
+        'iqr_ms': result.iqr * 1000,
+        'throughput': 1.0 / result.mean  # forward passes per second
+    }
+```
+## Commands
+- `/omgoptim:profile` - Profile on hardware
+- `/omgdeploy:edge` - Edge deployment
+- `/omgdeploy:cloud` - Cloud GPU deployment
+## Best Practices
+1. Profile on target hardware early
+2. Use hardware-specific optimizations
+3. Batch for throughput, stream for latency
+4. Consider power consumption for edge
+5. Test with production data volumes
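
Two pieces of arithmetic in the skill file above are easy to check by hand: rounding a layer width up to a multiple of 8 with `((n + 7) // 8) * 8`, and turning a mean latency into throughput. A sketch in plain Python follows. The helper names `round_up` and `throughput_per_second` are illustrative and do not appear in the package:

```python
def round_up(n: int, multiple: int = 8) -> int:
    """Round n up to the nearest multiple (the skill uses 8 for tensor cores)."""
    return ((n + multiple - 1) // multiple) * multiple


def throughput_per_second(mean_latency_s: float, batch_size: int = 1) -> float:
    """Samples processed per second, given the mean latency of one forward pass."""
    if mean_latency_s <= 0:
        raise ValueError("mean_latency_s must be positive")
    return batch_size / mean_latency_s


for width in (100, 128, 1001):
    # prints: 100 -> 104, 128 -> 128, 1001 -> 1008
    print(width, "->", round_up(width))

# A 4 ms forward pass over a batch of 32 is roughly 8000 samples per second.
print(throughput_per_second(0.004, batch_size=32))
```

With `batch_size=1` the throughput reduces to `1 / mean`, which is what the `throughput` entry in `benchmark_model` computes.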