npm - packwise-skills - Versions diffs - 1.0.0 - Mend

packwise-skills 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

package/.cursorrules +23 -0
package/CLAUDE.md +25 -0
package/README.md +295 -0
package/audit.md +224 -0
package/bin/packwise.js +155 -0
package/package.json +31 -0
package/skill.md +719 -0
package/sub-skills/ai/local-llm.md +183 -0
package/sub-skills/ai/python-ml.md +164 -0
package/sub-skills/backend/go-server.md +184 -0
package/sub-skills/backend/java-spring.md +241 -0
package/sub-skills/backend/node-server.md +164 -0
package/sub-skills/backend/php-laravel.md +175 -0
package/sub-skills/backend/python-server.md +164 -0
package/sub-skills/backend/rust-backend.md +118 -0
package/sub-skills/cli/python-cli.md +236 -0
package/sub-skills/cli/sdk-library.md +497 -0
package/sub-skills/cloud/ci-cd-pipelines.md +350 -0
package/sub-skills/cloud/docker.md +191 -0
package/sub-skills/cloud/kubernetes.md +277 -0
package/sub-skills/cloud/payment-integration.md +307 -0
package/sub-skills/cross-platform/multiplatform.md +252 -0
package/sub-skills/desktop/electron.md +783 -0
package/sub-skills/desktop/game-dev.md +443 -0
package/sub-skills/desktop/native-app.md +123 -0
package/sub-skills/desktop/scenarios.md +443 -0
package/sub-skills/desktop/smart-platforms.md +324 -0
package/sub-skills/desktop/tauri.md +428 -0
package/sub-skills/desktop/vr-ar.md +252 -0
package/sub-skills/desktop/web-to-desktop.md +153 -0
package/sub-skills/embedded/car-infotainment.md +129 -0
package/sub-skills/embedded/esp32.md +184 -0
package/sub-skills/embedded/ros.md +150 -0
package/sub-skills/embedded/stm32.md +160 -0
package/sub-skills/mobile/android.md +322 -0
package/sub-skills/mobile/capacitor.md +232 -0
package/sub-skills/mobile/flutter-mobile.md +138 -0
package/sub-skills/mobile/harmonyos.md +150 -0
package/sub-skills/mobile/ios.md +245 -0
package/sub-skills/mobile/react-native.md +443 -0
package/sub-skills/mobile/wearables.md +230 -0
package/sub-skills/plugins/browser-extension.md +308 -0
package/sub-skills/plugins/jetbrains-plugin.md +226 -0
package/sub-skills/plugins/vscode-extension.md +204 -0
package/sub-skills/security/security-tools.md +174 -0
package/sub-skills/web/monorepo.md +274 -0
package/sub-skills/web/pwa.md +220 -0
package/sub-skills/web/serverless-edge.md +295 -0
package/sub-skills/web/spa.md +266 -0
package/sub-skills/web/ssr.md +228 -0
package/sub-skills/web/wasm.md +243 -0

package/sub-skills/ai/local-llm.md ADDED

@@ -0,0 +1,183 @@
+# Local LLM Application Build Sub-Skill
+Package and deploy local large language model applications (offline AI, private deployment, edge inference).
+**Current versions**: Ollama 0.4+ / llama.cpp b4000+ / vLLM 0.6+ (2025-2026)
+## When to Use
+- Offline AI assistant (no internet required)
+- Privacy-sensitive AI applications (enterprise internal)
+- Edge AI deployment (Jetson, Raspberry Pi, local servers)
+- Cost optimization (avoid API fees)
+- Custom fine-tuned model serving
+## Tech Stack Comparison
+| Framework | Language | GPU Support | Best For | Setup Complexity |
+|-----------|---------|-------------|---------|-----------------|
+| Ollama | Go | CUDA/Metal/ROCm | Simplest local LLM runtime | Lowest |
+| llama.cpp | C++ | CUDA/Metal/Vulkan/ROCm | CPU inference, maximum control | Medium |
+| vLLM | Python | CUDA (primary), ROCm | High-throughput GPU serving | Medium |
+| LM Studio | Desktop app | CUDA/Metal | GUI-based model management | Lowest |
+| text-generation-inference | Rust/Python | CUDA | Production GPU serving (Hugging Face) | High |
+| LocalAI | Go | CUDA/Metal | OpenAI-compatible local API | Low |
+---
+## Ollama (Recommended for Getting Started)
+### Install & Run
+```bash
+# Install
+curl -fsSL https://ollama.ai/install.sh | sh          # Linux
+brew install ollama                                     # macOS
+# Windows: download from ollama.ai
+# Run model
+ollama run llama3.1                    # 8B (default)
+ollama run llama3.1:70b                # 70B (needs ~40GB VRAM)
+ollama run codellama                   # Code-specific model
+ollama run mistral                     # Mistral 7B
+ollama run phi3                        # Microsoft Phi-3
+# API call (OpenAI-compatible); send the body as JSON
+curl http://localhost:11434/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+  "model": "llama3.1",
+  "messages": [{"role": "user", "content": "Hello!"}]
+}'
+# List installed models
+ollama list
+# Pull model without running
+ollama pull llama3.1:70b
+```
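The `curl` call above can also be issued from a script. A minimal sketch using only the Python standard library, assuming Ollama is listening on its default port 11434 as in the example. The helper names `build_chat_request` and `chat` are illustrative and not part of any Ollama SDK, and the reply is assumed to follow the OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Assumption: default local Ollama address, as used in the curl example above.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST request for the OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the OpenAI chat-completions response shape.
    return body["choices"][0]["message"]["content"]
```

Calling `chat("llama3.1", "Hello!")` needs a running server with the model already pulled; building the request separately keeps the payload easy to inspect without a network call.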
+### Docker Deployment
+```yaml
+# docker-compose.yml
+services:
+  ollama:
+    image: ollama/ollama
+    ports: ["11434:11434"]
+    volumes: ["ollama:/root/.ollama"]
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+  open-webui:
+    image: ghcr.io/open-webui/open-webui
+    ports: ["3000:8080"]
+    environment:
+      OLLAMA_BASE_URL: http://ollama:11434
+    depends_on: [ollama]
+volumes:
+  ollama:
+```
+---
+## llama.cpp (Maximum Control)
+### Build & Run
+```bash
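+# Clone and build (recent llama.cpp releases build with CMake; the Makefile build is deprecated)
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+# Configure: pick ONE of the next two lines
+cmake -B build                           # CPU only (Metal is enabled by default on macOS)
+cmake -B build -DGGML_CUDA=ON            # NVIDIA GPU
+cmake --build build --config Release -j
+# Run: -ngl 99 offloads all layers to the GPU, -c 4096 sets the context size
+# (comments cannot follow a trailing backslash, so they are kept on this line)
+./build/bin/llama-server -m models/llama-3.1-8b-q4_k_m.gguf \
+  --host 0.0.0.0 --port 8080 \
+  -ngl 99 \
+  -c 4096
+# Quantize model
+./build/bin/llama-quantize input.gguf output-q4_k_m.gguf Q4_K_M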
+### GGUF Quantization Levels
+| Quant | Size (8B) | Quality | Speed | Use Case |
+|-------|----------|---------|-------|----------|
+| Q2_K | ~3 GB | Low | Fastest | Maximum compression |
+| Q4_K_M | ~5 GB | Good | Fast | **Recommended default** |
+| Q5_K_M | ~6 GB | Better | Good | Quality-sensitive |
+| Q6_K | ~7 GB | Great | Slower | Near-lossless |
+| Q8_0 | ~8 GB | Excellent | Slower | Minimal quality loss |
+| F16 | ~16 GB | Lossless | Slowest | Research/evaluation |
+---
+## vLLM (High-Throughput GPU Serving)
+```bash
+pip install vllm
+# Serve model (OpenAI-compatible API)
+# --tensor-parallel-size is the number of GPUs
+# --quantization awq is optional and needs an AWQ-quantized checkpoint
+vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct \
+  --host 0.0.0.0 --port 8000 \
+  --tensor-parallel-size 1 \
+  --max-model-len 4096 \
+  --quantization awq
+```
+```python
+# Use as Python library
+from vllm import LLM, SamplingParams
+llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
+params = SamplingParams(temperature=0.7, max_tokens=512)
+outputs = llm.generate(["Hello, how are you?"], params)
+print(outputs[0].outputs[0].text)
+```
+---
+## Memory Requirements (Approximate)
+| Model | Q4_K_M (Recommended) | FP16 (Full) | Min GPU VRAM |
+|-------|---------------------|-------------|-------------|
+| 1B | ~1 GB | ~2 GB | 2 GB |
+| 3B | ~2 GB | ~6 GB | 4 GB |
+| 7B/8B | ~5 GB | ~14 GB | 8 GB |
+| 13B | ~8 GB | ~26 GB | 16 GB |
+| 34B | ~20 GB | ~68 GB | 40 GB |
+| 70B | ~40 GB | ~140 GB | 2×40 GB |
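The figures in this table follow from parameter count times bits per weight. A rough sketch of that arithmetic: the 4.85 bits-per-weight value for Q4_K_M is an approximate average assumed here (K-quants mix several block formats), and the function name is illustrative. It covers weights only, so KV cache and runtime overhead come on top.

```python
# Approximate bits per weight. F16 is exact; the Q4_K_M value is an assumed
# average, since K-quants mix several block formats.
BITS_PER_WEIGHT = {"F16": 16.0, "Q4_K_M": 4.85}


def weight_memory_gb(params_billion: float, quant: str) -> float:
    """Estimate memory for the model weights alone, in decimal gigabytes."""
    bits = BITS_PER_WEIGHT[quant]
    total_bytes = params_billion * 1e9 * bits / 8
    return round(total_bytes / 1e9, 1)


for size in (7, 13, 70):
    q4 = weight_memory_gb(size, "Q4_K_M")
    f16 = weight_memory_gb(size, "F16")
    print(f"{size}B: Q4_K_M ~{q4} GB, F16 ~{f16} GB")
```

The "Min GPU VRAM" column is larger than the weight estimate because context (KV cache) and activations also need room.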
+---
+## Hardware Recommendations
+| Use Case | Minimum | Recommended |
+|----------|---------|------------|
+| Chat (7B) | 16GB RAM (CPU) | 8GB VRAM GPU |
+| Chat (13B+) | 32GB RAM or 16GB VRAM | 24GB VRAM (RTX 4090) |
+| Code (7B) | 16GB RAM | 12GB VRAM |
+| Production serving | 24GB VRAM | A100 40GB / H100 |
+| Edge (Raspberry Pi) | 8GB RAM (very slow) | Jetson Orin 16GB |
+| Apple Silicon Mac | 16GB unified | 32GB+ unified (M2/M3 Pro/Max) |
+---
+## Common Pitfalls
+| Issue | Fix |
+|-------|-----|
+| Slow model download | Use Hugging Face mirror (`HF_ENDPOINT`); use `ollama pull` for Ollama |
+| GPU not detected | Check `nvidia-smi`; install NVIDIA Container Toolkit for Docker |
+| Out of memory | Use smaller quantization (Q4_K_M); reduce context length; use CPU offload |
+| Slow response | Use GPU; reduce model size; enable flash attention (`--flash-attn` in llama.cpp) |
+| CORS errors when calling API | Ollama: set `OLLAMA_ORIGINS=*`; llama.cpp: add `--cors *` |
+| Model hallucination | Use system prompt; lower temperature; use RAG for factual accuracy |
+| Docker GPU not working | Install `nvidia-container-toolkit`; restart Docker daemon |
+| Apple Silicon not using GPU | Use Metal-enabled build; Ollama uses Metal by default on macOS |

package/sub-skills/ai/python-ml.md ADDED

@@ -0,0 +1,164 @@
+# Python ML Model Packaging Sub-Skill
+Package, optimize, and serve machine learning models for production.
+**Current versions**: Python 3.12+ / PyTorch 2.x / TensorFlow 2.17+ / ONNX Runtime 1.19+ (2025-2026)
+## When to Use
+- Trained ML model deployed as API service
+- Image recognition / NLP / recommendation system
+- Data analysis pipeline
+- Research model publication and reproducibility
+- Edge ML deployment (mobile/embedded)
+## Deployment Options
+### FastAPI + Uvicorn (Recommended for APIs)
+```python
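+from fastapi import FastAPI
+from pydantic import BaseModel
+from transformers import pipeline
+import torch
+app = FastAPI()
+# Load model once at startup
+device = "cuda" if torch.cuda.is_available() else "cpu"
+classifier = pipeline("text-classification", model="my-model", device=device)
+# Accept the text as a JSON body rather than a query parameter
+class PredictRequest(BaseModel):
+    text: str
+@app.post("/predict")
+def predict(req: PredictRequest):
+    # Plain def: FastAPI runs it in a threadpool, so blocking inference does not stall the event loop
+    return classifier(req.text)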
+@app.get("/health")
+async def health():
+    return {"status": "ok", "device": device}
+```
+### Flask + Gunicorn
+```python
+from flask import Flask, request, jsonify
+import pickle
+app = Flask(__name__)
+# Only unpickle files you trust: pickle can execute arbitrary code on load
+with open("model.pkl", "rb") as f:
+    model = pickle.load(f)
+@app.route("/predict", methods=["POST"])
+def predict():
+    data = request.json
+    result = model.predict([data["features"]])
+    return jsonify({"prediction": result.tolist()})
+```
+### ONNX Runtime (High-Performance Inference)
+```python
+import onnxruntime as ort
+import numpy as np
+# Create ONE of the sessions below, depending on the hardware
+# CPU inference
+session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
+# GPU inference (CUDA)
+session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
+# TensorRT (fastest, NVIDIA GPU only)
+session = ort.InferenceSession("model.onnx", providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"])
+# Placeholder input: the name "input" and the shape must match your model
+input_data = np.zeros((1, 3, 224, 224))
+result = session.run(None, {"input": input_data.astype(np.float32)})
+```
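Which execution providers are usable depends on the installed ONNX Runtime build and the machine, so it can help to filter a preference list against what is actually available. A small sketch: `pick_providers` is a hypothetical helper, and in real use its argument would come from `onnxruntime.get_available_providers()`.

```python
# Preference order from fastest to most portable, matching the examples above.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available: list[str]) -> list[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [name for name in PREFERRED if name in available]
    # Always keep a CPU fallback so session creation has something to use.
    return chosen or ["CPUExecutionProvider"]
```

With ONNX Runtime installed, this would be used as `ort.InferenceSession("model.onnx", providers=pick_providers(ort.get_available_providers()))`.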
+### Model Export
+```python
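+# PyTorch → ONNX
+import torch
+# Loading a fully pickled model needs weights_only=False on PyTorch 2.6+ (trusted files only)
+model = torch.load("model.pt", weights_only=False)
+model.eval()
+dummy_input = torch.randn(1, 3, 224, 224)
+torch.onnx.export(model, dummy_input, "model.onnx", opset_version=17)
+# TensorFlow → SavedModel (here `model` is a tf.keras model, not the PyTorch one above)
+import tensorflow as tf
+# Keras 3 (TensorFlow 2.16+) writes SavedModel via export(); save() expects a .keras or .h5 path
+model.export("saved_model/")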
+# TensorFlow → TFLite (mobile)
+converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
+tflite_model = converter.convert()
+with open("model.tflite", "wb") as f:
+    f.write(tflite_model)
+```
+## Model Formats
+| Format | Framework | Size | Inference Speed | Best For |
+|--------|----------|------|----------------|----------|
+| PyTorch (.pt) | PyTorch | Large | Standard | Research, training |
+| ONNX (.onnx) | Cross-framework | Medium | Fast | Production APIs |
+| SavedModel | TensorFlow | Large | Standard | TF Serving |
+| TFLite (.tflite) | TensorFlow Lite | Small | Fast (mobile) | Mobile/edge |
+| TensorRT | NVIDIA | Medium | Fastest (GPU) | NVIDIA GPU servers |
+| GGUF | llama.cpp | Small | Fast (CPU) | Local LLM inference |
+| CoreML (.mlmodel) | Apple | Medium | Fast (Apple) | iOS/macOS on-device |
+| OpenVINO | Intel | Medium | Fast (Intel CPU) | Intel hardware |
+## Docker (GPU)
+```dockerfile
+FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
+# curl is needed by the HEALTHCHECK below
+RUN apt-get update && apt-get install -y python3 python3-pip curl && rm -rf /var/lib/apt/lists/*
+WORKDIR /app
+COPY requirements.txt .
+RUN pip3 install --no-cache-dir -r requirements.txt
+COPY . .
+RUN groupadd -r appuser && useradd -r -g appuser appuser
+USER appuser
+EXPOSE 8000
+HEALTHCHECK --interval=30s --timeout=3s CMD curl -f http://localhost:8000/health || exit 1
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
+```
+```dockerfile
+# CPU-only (smaller)
+FROM python:3.13-slim
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+RUN groupadd -r appuser && useradd -r -g appuser appuser
+USER appuser
+EXPOSE 8000
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0"]
+```
+## Model Optimization
+| Technique | Description | Typical Savings |
+|-----------|-------------|----------------|
+| Quantization (INT8) | Reduce precision from FP32 to INT8 | 4x smaller, 2-3x faster |
+| Quantization (INT4) | Further reduction | 8x smaller (LLMs) |
+| Pruning | Remove redundant weights | 2-10x smaller |
+| Knowledge Distillation | Train smaller model from larger | Custom |
+| ONNX Optimization | Graph optimization | 10-30% faster |
+| TensorRT | NVIDIA GPU optimization | 2-5x faster |
+```python
+# ONNX quantization
+from onnxruntime.quantization import quantize_dynamic, QuantType
+quantize_dynamic("model.onnx", "model_quant.onnx", weight_type=QuantType.QUInt8)
+```
+## Common Pitfalls
+| Issue | Fix |
+|-------|-----|
+| Model too large for deployment | Quantize (INT8/INT4); use ONNX; prune |
+| GPU out of memory | Reduce batch size; quantize; use gradient checkpointing (training only) |
+| Dependency conflicts | Use Docker isolation; pin exact versions |
+| Slow inference | Convert to ONNX or TensorRT; use GPU; batch requests |
+| CUDA version mismatch | Match PyTorch CUDA version with system CUDA; use Docker |
+| Model loading time too long | Load once at startup; use model server (TorchServe, TF Serving) |
+| Inconsistent results between environments | Fix random seeds; pin all dependency versions |
+| ONNX export fails | Check opset version; use `torch.onnx.export` with `opset_version=17` |
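For the "fix random seeds" advice above, one common pattern is a single helper that seeds every random number generator in use. A minimal sketch, assuming NumPy and PyTorch are optional dependencies; the name `seed_everything` is illustrative.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the RNGs that are present; NumPy and PyTorch are optional."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything(123)
first = random.random()
seed_everything(123)
second = random.random()
print(first == second)  # same seed, same draw
```

Seeding alone may not give bit-identical results on GPUs, since some kernels are nondeterministic; pinning dependency versions, as the table suggests, is still needed.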

package/sub-skills/backend/go-server.md ADDED

@@ -0,0 +1,184 @@
+# Go Backend Build Sub-Skill
+Build Go backend services (Gin, Echo, Fiber, Chi, or the standard library `net/http`).
+**Current version**: Go 1.24+ / 1.25 / 1.26 (2025-2026)
+## When to Use
+- High-performance REST/gRPC API services
+- CLI tools with embedded server
+- Microservices requiring low memory footprint
+- Network services (proxy, gateway, load balancer)
+- Concurrent/parallel processing services
+## Framework Quick Start
+### Gin (Most Popular)
+```go
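+package main
+import (
+    "net/http"
+    "github.com/gin-gonic/gin"
+)
+// getUsers is a placeholder handler; replace the empty list with a real lookup
+func getUsers(c *gin.Context) { c.JSON(http.StatusOK, gin.H{"users": []string{}}) }
+func main() {
+    r := gin.Default()
+    r.GET("/health", func(c *gin.Context) { c.JSON(http.StatusOK, gin.H{"status": "ok"}) })
+    r.GET("/api/users", getUsers)
+    r.Run(":8080")
+}
+```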
+### Echo (Lightweight)
+```go
+package main
+import "github.com/labstack/echo/v4"
+func main() {
+    e := echo.New()
+    e.GET("/health", func(c echo.Context) error { return c.JSON(200, map[string]string{"status": "ok"}) })
+    e.Logger.Fatal(e.Start(":8080"))
+}
+```
+### Fiber (Express-like, fastest)
+```go
+package main
+import "github.com/gofiber/fiber/v3"
+func main() {
+    app := fiber.New()
+    app.Get("/health", func(c fiber.Ctx) error { return c.JSON(map[string]string{"status": "ok"}) })
+    app.Listen(":8080")
+}
+```
+### net/http (Standard Library, Zero Dependencies)
+```go
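+package main
+import (
+    "encoding/json"
+    "log"
+    "net/http"
+)
+func main() {
+    http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
+        w.Header().Set("Content-Type", "application/json")
+        json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
+    })
+    // ListenAndServe only returns on failure, so surface the error
+    log.Fatal(http.ListenAndServe(":8080", nil))
+}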
+```
+### Framework Comparison
+| Framework | Performance | Middleware | Ecosystem | Best For |
+|-----------|------------|-----------|-----------|----------|
+| Gin | High | Rich | Largest | General-purpose APIs |
+| Echo | High | Rich | Large | Clean API design |
+| Fiber | Highest | Rich | Growing | Express-style, max performance |
+| net/http | High | Manual | Stdlib | Zero-dependency, simple APIs |
+| Chi | High | Composable | Moderate | RESTful APIs, middleware chains |
+## Build
+```bash
+# Standard build
+go build -o myapp .
+# Optimized release build
+CGO_ENABLED=0 go build -ldflags="-s -w" -o myapp .
+# With version embedding
+go build -ldflags="-s -w \
+  -X main.version=$(git describe --tags --always) \
+  -X main.buildDate=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
+  -o myapp .
+# Cross-compile
+CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o myapp-linux .
+CGO_ENABLED=0 GOOS=windows GOARCH=amd64 go build -ldflags="-s -w" -o myapp.exe .
+CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 go build -ldflags="-s -w" -o myapp-mac .
+CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build -ldflags="-s -w" -o myapp-mac-intel .
+CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -ldflags="-s -w" -o myapp-linux-arm64 .
+# Build every target into dist/ (commands run one after another)
+GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o dist/myapp-linux-amd64 .
+GOOS=linux GOARCH=arm64 go build -ldflags="-s -w" -o dist/myapp-linux-arm64 .
+GOOS=darwin GOARCH=arm64 go build -ldflags="-s -w" -o dist/myapp-darwin-arm64 .
+GOOS=darwin GOARCH=amd64 go build -ldflags="-s -w" -o dist/myapp-darwin-amd64 .
+GOOS=windows GOARCH=amd64 go build -ldflags="-s -w" -o dist/myapp-windows-amd64.exe .
+```
+## Docker
+```dockerfile
+FROM golang:1.24-alpine AS builder
+WORKDIR /app
+COPY go.* ./
+RUN go mod download
+COPY . .
+RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o myapp .
+FROM alpine:latest
+RUN apk add --no-cache ca-certificates tzdata && \
+    addgroup -S appgroup && adduser -S appuser -G appgroup
+COPY --from=builder /app/myapp /myapp
+USER appuser
+EXPOSE 8080
+HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:8080/health || exit 1
+CMD ["/myapp"]
+```
+## Embed Static Files
+```go
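+package main
+import (
+    "embed"
+    "log"
+    "net/http"
+)
+//go:embed static/*
+var staticFiles embed.FS
+func main() {
+    // Embedded files keep their directory prefix, so they are served under /static/
+    http.Handle("/", http.FileServer(http.FS(staticFiles)))
+    log.Fatal(http.ListenAndServe(":8080", nil))
+}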
+```
+## gRPC Build
+```protobuf
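+// proto/service.proto
+syntax = "proto3";
+package myservice;
+// Required by protoc-gen-go; placeholder path, adjust to your module
+option go_package = "example.com/myapp/proto";
+service MyService {
+    rpc GetUser(GetUserRequest) returns (User);
+}
+// Illustrative message definitions; replace the fields with your own
+message GetUserRequest { string id = 1; }
+message User { string id = 1; string name = 2; }
+```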
+```bash
+# Generate Go code (requires the protoc-gen-go and protoc-gen-go-grpc plugins on PATH)
+protoc --go_out=. --go-grpc_out=. proto/service.proto
+# Build server and client
+go build -o server ./cmd/server
+go build -o client ./cmd/client
+```
+## Common Pitfalls
+| Issue | Fix |
+|-------|-----|
+| CGO dependency fails cross-compile | Use `CGO_ENABLED=0` where possible; otherwise supply a C cross-compiler via `CC` (for example `zig cc`) or build in a Docker image such as `xgo` |
+| Timezone issues in container | Install `tzdata` package in Alpine image |
+| Static files not included | Use `//go:embed` directive (Go 1.16+) |
+| Binary too large | Use `ldflags="-s -w"`; strip debug symbols |
+| `go.sum` mismatch in CI | Run `go mod tidy`; commit both `go.mod` and `go.sum` |
+| Import cycle | Restructure packages; use interfaces to break cycles |
+| Race condition | Run with `-race` flag during testing |
+| Memory leak in long-running | Use `pprof` for profiling; check goroutine leaks |