claude-autopm 2.8.2 → 2.8.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (390)
  1. package/README.md +399 -637
  2. package/package.json +2 -1
  3. package/packages/plugin-ai/LICENSE +21 -0
  4. package/packages/plugin-ai/README.md +316 -0
  5. package/packages/plugin-ai/agents/anthropic-claude-expert.md +579 -0
  6. package/packages/plugin-ai/agents/azure-openai-expert.md +1411 -0
  7. package/packages/plugin-ai/agents/gemini-api-expert.md +880 -0
  8. package/packages/plugin-ai/agents/google-a2a-expert.md +1445 -0
  9. package/packages/plugin-ai/agents/huggingface-expert.md +2131 -0
  10. package/packages/plugin-ai/agents/langchain-expert.md +1427 -0
  11. package/packages/plugin-ai/agents/langgraph-workflow-expert.md +520 -0
  12. package/packages/plugin-ai/agents/openai-python-expert.md +1087 -0
  13. package/packages/plugin-ai/commands/a2a-setup.md +886 -0
  14. package/packages/plugin-ai/commands/ai-model-deployment.md +481 -0
  15. package/packages/plugin-ai/commands/anthropic-optimize.md +793 -0
  16. package/packages/plugin-ai/commands/huggingface-deploy.md +789 -0
  17. package/packages/plugin-ai/commands/langchain-optimize.md +807 -0
  18. package/packages/plugin-ai/commands/llm-optimize.md +348 -0
  19. package/packages/plugin-ai/commands/openai-optimize.md +863 -0
  20. package/packages/plugin-ai/commands/rag-optimize.md +841 -0
  21. package/packages/plugin-ai/commands/rag-setup-scaffold.md +382 -0
  22. package/packages/plugin-ai/package.json +66 -0
  23. package/packages/plugin-ai/plugin.json +519 -0
  24. package/packages/plugin-ai/rules/ai-model-standards.md +449 -0
  25. package/packages/plugin-ai/rules/prompt-engineering-standards.md +509 -0
  26. package/packages/plugin-ai/scripts/examples/huggingface-inference-example.py +145 -0
  27. package/packages/plugin-ai/scripts/examples/langchain-rag-example.py +366 -0
  28. package/packages/plugin-ai/scripts/examples/mlflow-tracking-example.py +224 -0
  29. package/packages/plugin-ai/scripts/examples/openai-chat-example.py +425 -0
  30. package/packages/plugin-cloud/README.md +268 -0
  31. package/packages/plugin-cloud/agents/README.md +55 -0
  32. package/packages/plugin-cloud/agents/aws-cloud-architect.md +521 -0
  33. package/packages/plugin-cloud/agents/azure-cloud-architect.md +436 -0
  34. package/packages/plugin-cloud/agents/gcp-cloud-architect.md +385 -0
  35. package/packages/plugin-cloud/agents/gcp-cloud-functions-engineer.md +306 -0
  36. package/packages/plugin-cloud/agents/gemini-api-expert.md +880 -0
  37. package/packages/plugin-cloud/agents/kubernetes-orchestrator.md +566 -0
  38. package/packages/plugin-cloud/agents/openai-python-expert.md +1087 -0
  39. package/packages/plugin-cloud/agents/terraform-infrastructure-expert.md +454 -0
  40. package/packages/plugin-cloud/commands/cloud-cost-optimize.md +243 -0
  41. package/packages/plugin-cloud/commands/cloud-validate.md +196 -0
  42. package/packages/plugin-cloud/commands/infra-deploy.md +38 -0
  43. package/packages/plugin-cloud/commands/k8s-deploy.md +37 -0
  44. package/packages/plugin-cloud/commands/ssh-security.md +65 -0
  45. package/packages/plugin-cloud/commands/traefik-setup.md +65 -0
  46. package/packages/plugin-cloud/hooks/pre-cloud-deploy.js +456 -0
  47. package/packages/plugin-cloud/package.json +64 -0
  48. package/packages/plugin-cloud/plugin.json +338 -0
  49. package/packages/plugin-cloud/rules/cloud-security-compliance.md +313 -0
  50. package/packages/plugin-cloud/rules/infrastructure-pipeline.md +128 -0
  51. package/packages/plugin-cloud/scripts/examples/aws-validate.sh +30 -0
  52. package/packages/plugin-cloud/scripts/examples/azure-setup.sh +33 -0
  53. package/packages/plugin-cloud/scripts/examples/gcp-setup.sh +39 -0
  54. package/packages/plugin-cloud/scripts/examples/k8s-validate.sh +40 -0
  55. package/packages/plugin-cloud/scripts/examples/terraform-init.sh +26 -0
  56. package/packages/plugin-core/README.md +274 -0
  57. package/packages/plugin-core/agents/core/agent-manager.md +296 -0
  58. package/packages/plugin-core/agents/core/code-analyzer.md +131 -0
  59. package/packages/plugin-core/agents/core/file-analyzer.md +162 -0
  60. package/packages/plugin-core/agents/core/test-runner.md +200 -0
  61. package/packages/plugin-core/commands/code-rabbit.md +128 -0
  62. package/packages/plugin-core/commands/prompt.md +9 -0
  63. package/packages/plugin-core/commands/re-init.md +9 -0
  64. package/packages/plugin-core/hooks/context7-reminder.md +29 -0
  65. package/packages/plugin-core/hooks/enforce-agents.js +125 -0
  66. package/packages/plugin-core/hooks/enforce-agents.sh +35 -0
  67. package/packages/plugin-core/hooks/pre-agent-context7.js +224 -0
  68. package/packages/plugin-core/hooks/pre-command-context7.js +229 -0
  69. package/packages/plugin-core/hooks/strict-enforce-agents.sh +39 -0
  70. package/packages/plugin-core/hooks/test-hook.sh +21 -0
  71. package/packages/plugin-core/hooks/unified-context7-enforcement.sh +38 -0
  72. package/packages/plugin-core/package.json +45 -0
  73. package/packages/plugin-core/plugin.json +387 -0
  74. package/packages/plugin-core/rules/agent-coordination.md +549 -0
  75. package/packages/plugin-core/rules/agent-mandatory.md +170 -0
  76. package/packages/plugin-core/rules/ai-integration-patterns.md +219 -0
  77. package/packages/plugin-core/rules/command-pipelines.md +208 -0
  78. package/packages/plugin-core/rules/context-optimization.md +176 -0
  79. package/packages/plugin-core/rules/context7-enforcement.md +327 -0
  80. package/packages/plugin-core/rules/datetime.md +122 -0
  81. package/packages/plugin-core/rules/definition-of-done.md +272 -0
  82. package/packages/plugin-core/rules/development-environments.md +19 -0
  83. package/packages/plugin-core/rules/development-workflow.md +198 -0
  84. package/packages/plugin-core/rules/framework-path-rules.md +180 -0
  85. package/packages/plugin-core/rules/frontmatter-operations.md +64 -0
  86. package/packages/plugin-core/rules/git-strategy.md +237 -0
  87. package/packages/plugin-core/rules/golden-rules.md +181 -0
  88. package/packages/plugin-core/rules/naming-conventions.md +111 -0
  89. package/packages/plugin-core/rules/no-pr-workflow.md +183 -0
  90. package/packages/plugin-core/rules/performance-guidelines.md +403 -0
  91. package/packages/plugin-core/rules/pipeline-mandatory.md +109 -0
  92. package/packages/plugin-core/rules/security-checklist.md +318 -0
  93. package/packages/plugin-core/rules/standard-patterns.md +197 -0
  94. package/packages/plugin-core/rules/strip-frontmatter.md +85 -0
  95. package/packages/plugin-core/rules/tdd.enforcement.md +103 -0
  96. package/packages/plugin-core/rules/use-ast-grep.md +113 -0
  97. package/packages/plugin-core/scripts/lib/datetime-utils.sh +254 -0
  98. package/packages/plugin-core/scripts/lib/frontmatter-utils.sh +294 -0
  99. package/packages/plugin-core/scripts/lib/github-utils.sh +221 -0
  100. package/packages/plugin-core/scripts/lib/logging-utils.sh +199 -0
  101. package/packages/plugin-core/scripts/lib/validation-utils.sh +339 -0
  102. package/packages/plugin-core/scripts/mcp/add.sh +7 -0
  103. package/packages/plugin-core/scripts/mcp/disable.sh +12 -0
  104. package/packages/plugin-core/scripts/mcp/enable.sh +12 -0
  105. package/packages/plugin-core/scripts/mcp/list.sh +7 -0
  106. package/packages/plugin-core/scripts/mcp/sync.sh +8 -0
  107. package/packages/plugin-data/README.md +315 -0
  108. package/packages/plugin-data/agents/airflow-orchestration-expert.md +158 -0
  109. package/packages/plugin-data/agents/kedro-pipeline-expert.md +304 -0
  110. package/packages/plugin-data/agents/langgraph-workflow-expert.md +530 -0
  111. package/packages/plugin-data/commands/airflow-dag-scaffold.md +413 -0
  112. package/packages/plugin-data/commands/kafka-pipeline-scaffold.md +503 -0
  113. package/packages/plugin-data/package.json +66 -0
  114. package/packages/plugin-data/plugin.json +294 -0
  115. package/packages/plugin-data/rules/data-quality-standards.md +373 -0
  116. package/packages/plugin-data/rules/etl-pipeline-standards.md +255 -0
  117. package/packages/plugin-data/scripts/examples/airflow-dag-example.py +245 -0
  118. package/packages/plugin-data/scripts/examples/dbt-transform-example.sql +238 -0
  119. package/packages/plugin-data/scripts/examples/kafka-streaming-example.py +257 -0
  120. package/packages/plugin-data/scripts/examples/pandas-etl-example.py +332 -0
  121. package/packages/plugin-databases/README.md +330 -0
  122. package/packages/plugin-databases/agents/README.md +50 -0
  123. package/packages/plugin-databases/agents/bigquery-expert.md +401 -0
  124. package/packages/plugin-databases/agents/cosmosdb-expert.md +375 -0
  125. package/packages/plugin-databases/agents/mongodb-expert.md +407 -0
  126. package/packages/plugin-databases/agents/postgresql-expert.md +329 -0
  127. package/packages/plugin-databases/agents/redis-expert.md +74 -0
  128. package/packages/plugin-databases/commands/db-optimize.md +612 -0
  129. package/packages/plugin-databases/package.json +60 -0
  130. package/packages/plugin-databases/plugin.json +237 -0
  131. package/packages/plugin-databases/rules/database-management-strategy.md +146 -0
  132. package/packages/plugin-databases/rules/database-pipeline.md +316 -0
  133. package/packages/plugin-databases/scripts/examples/bigquery-cost-analyze.sh +160 -0
  134. package/packages/plugin-databases/scripts/examples/cosmosdb-ru-optimize.sh +163 -0
  135. package/packages/plugin-databases/scripts/examples/mongodb-shard-check.sh +120 -0
  136. package/packages/plugin-databases/scripts/examples/postgres-index-analyze.sh +95 -0
  137. package/packages/plugin-databases/scripts/examples/redis-cache-stats.sh +121 -0
  138. package/packages/plugin-devops/README.md +367 -0
  139. package/packages/plugin-devops/agents/README.md +52 -0
  140. package/packages/plugin-devops/agents/azure-devops-specialist.md +308 -0
  141. package/packages/plugin-devops/agents/docker-containerization-expert.md +298 -0
  142. package/packages/plugin-devops/agents/github-operations-specialist.md +335 -0
  143. package/packages/plugin-devops/agents/mcp-context-manager.md +319 -0
  144. package/packages/plugin-devops/agents/observability-engineer.md +574 -0
  145. package/packages/plugin-devops/agents/ssh-operations-expert.md +1093 -0
  146. package/packages/plugin-devops/agents/traefik-proxy-expert.md +444 -0
  147. package/packages/plugin-devops/commands/ci-pipeline-create.md +581 -0
  148. package/packages/plugin-devops/commands/docker-optimize.md +493 -0
  149. package/packages/plugin-devops/commands/workflow-create.md +42 -0
  150. package/packages/plugin-devops/hooks/pre-docker-build.js +472 -0
  151. package/packages/plugin-devops/package.json +61 -0
  152. package/packages/plugin-devops/plugin.json +302 -0
  153. package/packages/plugin-devops/rules/ci-cd-kubernetes-strategy.md +25 -0
  154. package/packages/plugin-devops/rules/devops-troubleshooting-playbook.md +450 -0
  155. package/packages/plugin-devops/rules/docker-first-development.md +404 -0
  156. package/packages/plugin-devops/rules/github-operations.md +92 -0
  157. package/packages/plugin-devops/scripts/examples/docker-build-multistage.sh +43 -0
  158. package/packages/plugin-devops/scripts/examples/docker-compose-validate.sh +74 -0
  159. package/packages/plugin-devops/scripts/examples/github-workflow-validate.sh +48 -0
  160. package/packages/plugin-devops/scripts/examples/prometheus-health-check.sh +58 -0
  161. package/packages/plugin-devops/scripts/examples/ssh-key-setup.sh +74 -0
  162. package/packages/plugin-frameworks/README.md +309 -0
  163. package/packages/plugin-frameworks/agents/README.md +64 -0
  164. package/packages/plugin-frameworks/agents/e2e-test-engineer.md +579 -0
  165. package/packages/plugin-frameworks/agents/nats-messaging-expert.md +254 -0
  166. package/packages/plugin-frameworks/agents/react-frontend-engineer.md +393 -0
  167. package/packages/plugin-frameworks/agents/react-ui-expert.md +226 -0
  168. package/packages/plugin-frameworks/agents/tailwindcss-expert.md +1021 -0
  169. package/packages/plugin-frameworks/agents/ux-design-expert.md +244 -0
  170. package/packages/plugin-frameworks/commands/app-scaffold.md +50 -0
  171. package/packages/plugin-frameworks/commands/nextjs-optimize.md +692 -0
  172. package/packages/plugin-frameworks/commands/react-optimize.md +583 -0
  173. package/packages/plugin-frameworks/commands/tailwind-system.md +64 -0
  174. package/packages/plugin-frameworks/package.json +59 -0
  175. package/packages/plugin-frameworks/plugin.json +224 -0
  176. package/packages/plugin-frameworks/rules/performance-guidelines.md +403 -0
  177. package/packages/plugin-frameworks/rules/ui-development-standards.md +281 -0
  178. package/packages/plugin-frameworks/rules/ui-framework-rules.md +151 -0
  179. package/packages/plugin-frameworks/scripts/examples/react-component-perf.sh +34 -0
  180. package/packages/plugin-frameworks/scripts/examples/tailwind-optimize.sh +44 -0
  181. package/packages/plugin-frameworks/scripts/examples/vue-composition-check.sh +41 -0
  182. package/packages/plugin-languages/README.md +333 -0
  183. package/packages/plugin-languages/agents/README.md +50 -0
  184. package/packages/plugin-languages/agents/bash-scripting-expert.md +541 -0
  185. package/packages/plugin-languages/agents/javascript-frontend-engineer.md +197 -0
  186. package/packages/plugin-languages/agents/nodejs-backend-engineer.md +226 -0
  187. package/packages/plugin-languages/agents/python-backend-engineer.md +214 -0
  188. package/packages/plugin-languages/agents/python-backend-expert.md +289 -0
  189. package/packages/plugin-languages/commands/javascript-optimize.md +636 -0
  190. package/packages/plugin-languages/commands/nodejs-api-scaffold.md +341 -0
  191. package/packages/plugin-languages/commands/nodejs-optimize.md +689 -0
  192. package/packages/plugin-languages/commands/python-api-scaffold.md +261 -0
  193. package/packages/plugin-languages/commands/python-optimize.md +593 -0
  194. package/packages/plugin-languages/package.json +65 -0
  195. package/packages/plugin-languages/plugin.json +265 -0
  196. package/packages/plugin-languages/rules/code-quality-standards.md +496 -0
  197. package/packages/plugin-languages/rules/testing-standards.md +768 -0
  198. package/packages/plugin-languages/scripts/examples/bash-production-script.sh +520 -0
  199. package/packages/plugin-languages/scripts/examples/javascript-es6-patterns.js +291 -0
  200. package/packages/plugin-languages/scripts/examples/nodejs-async-iteration.js +360 -0
  201. package/packages/plugin-languages/scripts/examples/python-async-patterns.py +289 -0
  202. package/packages/plugin-languages/scripts/examples/typescript-patterns.ts +432 -0
  203. package/packages/plugin-ml/README.md +430 -0
  204. package/packages/plugin-ml/agents/automl-expert.md +326 -0
  205. package/packages/plugin-ml/agents/computer-vision-expert.md +550 -0
  206. package/packages/plugin-ml/agents/gradient-boosting-expert.md +455 -0
  207. package/packages/plugin-ml/agents/neural-network-architect.md +1228 -0
  208. package/packages/plugin-ml/agents/nlp-transformer-expert.md +584 -0
  209. package/packages/plugin-ml/agents/pytorch-expert.md +412 -0
  210. package/packages/plugin-ml/agents/reinforcement-learning-expert.md +2088 -0
  211. package/packages/plugin-ml/agents/scikit-learn-expert.md +228 -0
  212. package/packages/plugin-ml/agents/tensorflow-keras-expert.md +509 -0
  213. package/packages/plugin-ml/agents/time-series-expert.md +303 -0
  214. package/packages/plugin-ml/commands/ml-automl.md +572 -0
  215. package/packages/plugin-ml/commands/ml-train-optimize.md +657 -0
  216. package/packages/plugin-ml/package.json +52 -0
  217. package/packages/plugin-ml/plugin.json +338 -0
  218. package/packages/plugin-pm/README.md +368 -0
  219. package/packages/plugin-pm/claudeautopm-plugin-pm-2.0.0.tgz +0 -0
  220. package/packages/plugin-pm/commands/azure/COMMANDS.md +107 -0
  221. package/packages/plugin-pm/commands/azure/COMMAND_MAPPING.md +252 -0
  222. package/packages/plugin-pm/commands/azure/INTEGRATION_FIX.md +103 -0
  223. package/packages/plugin-pm/commands/azure/README.md +246 -0
  224. package/packages/plugin-pm/commands/azure/active-work.md +198 -0
  225. package/packages/plugin-pm/commands/azure/aliases.md +143 -0
  226. package/packages/plugin-pm/commands/azure/blocked-items.md +287 -0
  227. package/packages/plugin-pm/commands/azure/clean.md +93 -0
  228. package/packages/plugin-pm/commands/azure/docs-query.md +48 -0
  229. package/packages/plugin-pm/commands/azure/feature-decompose.md +380 -0
  230. package/packages/plugin-pm/commands/azure/feature-list.md +61 -0
  231. package/packages/plugin-pm/commands/azure/feature-new.md +115 -0
  232. package/packages/plugin-pm/commands/azure/feature-show.md +205 -0
  233. package/packages/plugin-pm/commands/azure/feature-start.md +130 -0
  234. package/packages/plugin-pm/commands/azure/fix-integration-example.md +93 -0
  235. package/packages/plugin-pm/commands/azure/help.md +150 -0
  236. package/packages/plugin-pm/commands/azure/import-us.md +269 -0
  237. package/packages/plugin-pm/commands/azure/init.md +211 -0
  238. package/packages/plugin-pm/commands/azure/next-task.md +262 -0
  239. package/packages/plugin-pm/commands/azure/search.md +160 -0
  240. package/packages/plugin-pm/commands/azure/sprint-status.md +235 -0
  241. package/packages/plugin-pm/commands/azure/standup.md +260 -0
  242. package/packages/plugin-pm/commands/azure/sync-all.md +99 -0
  243. package/packages/plugin-pm/commands/azure/task-analyze.md +186 -0
  244. package/packages/plugin-pm/commands/azure/task-close.md +329 -0
  245. package/packages/plugin-pm/commands/azure/task-edit.md +145 -0
  246. package/packages/plugin-pm/commands/azure/task-list.md +263 -0
  247. package/packages/plugin-pm/commands/azure/task-new.md +84 -0
  248. package/packages/plugin-pm/commands/azure/task-reopen.md +79 -0
  249. package/packages/plugin-pm/commands/azure/task-show.md +126 -0
  250. package/packages/plugin-pm/commands/azure/task-start.md +301 -0
  251. package/packages/plugin-pm/commands/azure/task-status.md +65 -0
  252. package/packages/plugin-pm/commands/azure/task-sync.md +67 -0
  253. package/packages/plugin-pm/commands/azure/us-edit.md +164 -0
  254. package/packages/plugin-pm/commands/azure/us-list.md +202 -0
  255. package/packages/plugin-pm/commands/azure/us-new.md +265 -0
  256. package/packages/plugin-pm/commands/azure/us-parse.md +253 -0
  257. package/packages/plugin-pm/commands/azure/us-show.md +188 -0
  258. package/packages/plugin-pm/commands/azure/us-status.md +320 -0
  259. package/packages/plugin-pm/commands/azure/validate.md +86 -0
  260. package/packages/plugin-pm/commands/azure/work-item-sync.md +47 -0
  261. package/packages/plugin-pm/commands/blocked.md +28 -0
  262. package/packages/plugin-pm/commands/clean.md +119 -0
  263. package/packages/plugin-pm/commands/context-create.md +136 -0
  264. package/packages/plugin-pm/commands/context-prime.md +170 -0
  265. package/packages/plugin-pm/commands/context-update.md +292 -0
  266. package/packages/plugin-pm/commands/context.md +28 -0
  267. package/packages/plugin-pm/commands/epic-close.md +86 -0
  268. package/packages/plugin-pm/commands/epic-decompose.md +370 -0
  269. package/packages/plugin-pm/commands/epic-edit.md +83 -0
  270. package/packages/plugin-pm/commands/epic-list.md +30 -0
  271. package/packages/plugin-pm/commands/epic-merge.md +222 -0
  272. package/packages/plugin-pm/commands/epic-oneshot.md +119 -0
  273. package/packages/plugin-pm/commands/epic-refresh.md +119 -0
  274. package/packages/plugin-pm/commands/epic-show.md +28 -0
  275. package/packages/plugin-pm/commands/epic-split.md +120 -0
  276. package/packages/plugin-pm/commands/epic-start.md +195 -0
  277. package/packages/plugin-pm/commands/epic-status.md +28 -0
  278. package/packages/plugin-pm/commands/epic-sync-modular.md +338 -0
  279. package/packages/plugin-pm/commands/epic-sync-original.md +473 -0
  280. package/packages/plugin-pm/commands/epic-sync.md +486 -0
  281. package/packages/plugin-pm/commands/github/workflow-create.md +42 -0
  282. package/packages/plugin-pm/commands/help.md +28 -0
  283. package/packages/plugin-pm/commands/import.md +115 -0
  284. package/packages/plugin-pm/commands/in-progress.md +28 -0
  285. package/packages/plugin-pm/commands/init.md +28 -0
  286. package/packages/plugin-pm/commands/issue-analyze.md +202 -0
  287. package/packages/plugin-pm/commands/issue-close.md +119 -0
  288. package/packages/plugin-pm/commands/issue-edit.md +93 -0
  289. package/packages/plugin-pm/commands/issue-reopen.md +87 -0
  290. package/packages/plugin-pm/commands/issue-show.md +41 -0
  291. package/packages/plugin-pm/commands/issue-start.md +234 -0
  292. package/packages/plugin-pm/commands/issue-status.md +95 -0
  293. package/packages/plugin-pm/commands/issue-sync.md +411 -0
  294. package/packages/plugin-pm/commands/next.md +28 -0
  295. package/packages/plugin-pm/commands/prd-edit.md +82 -0
  296. package/packages/plugin-pm/commands/prd-list.md +28 -0
  297. package/packages/plugin-pm/commands/prd-new.md +55 -0
  298. package/packages/plugin-pm/commands/prd-parse.md +42 -0
  299. package/packages/plugin-pm/commands/prd-status.md +28 -0
  300. package/packages/plugin-pm/commands/search.md +28 -0
  301. package/packages/plugin-pm/commands/standup.md +28 -0
  302. package/packages/plugin-pm/commands/status.md +28 -0
  303. package/packages/plugin-pm/commands/sync.md +99 -0
  304. package/packages/plugin-pm/commands/test-reference-update.md +151 -0
  305. package/packages/plugin-pm/commands/validate.md +28 -0
  306. package/packages/plugin-pm/commands/what-next.md +28 -0
  307. package/packages/plugin-pm/package.json +57 -0
  308. package/packages/plugin-pm/plugin.json +503 -0
  309. package/packages/plugin-pm/scripts/pm/analytics.js +425 -0
  310. package/packages/plugin-pm/scripts/pm/blocked.js +164 -0
  311. package/packages/plugin-pm/scripts/pm/blocked.sh +78 -0
  312. package/packages/plugin-pm/scripts/pm/clean.js +464 -0
  313. package/packages/plugin-pm/scripts/pm/context-create.js +216 -0
  314. package/packages/plugin-pm/scripts/pm/context-prime.js +335 -0
  315. package/packages/plugin-pm/scripts/pm/context-update.js +344 -0
  316. package/packages/plugin-pm/scripts/pm/context.js +338 -0
  317. package/packages/plugin-pm/scripts/pm/epic-close.js +347 -0
  318. package/packages/plugin-pm/scripts/pm/epic-edit.js +382 -0
  319. package/packages/plugin-pm/scripts/pm/epic-list.js +273 -0
  320. package/packages/plugin-pm/scripts/pm/epic-list.sh +109 -0
  321. package/packages/plugin-pm/scripts/pm/epic-show.js +291 -0
  322. package/packages/plugin-pm/scripts/pm/epic-show.sh +105 -0
  323. package/packages/plugin-pm/scripts/pm/epic-split.js +522 -0
  324. package/packages/plugin-pm/scripts/pm/epic-start/epic-start.js +183 -0
  325. package/packages/plugin-pm/scripts/pm/epic-start/epic-start.sh +94 -0
  326. package/packages/plugin-pm/scripts/pm/epic-status.js +291 -0
  327. package/packages/plugin-pm/scripts/pm/epic-status.sh +104 -0
  328. package/packages/plugin-pm/scripts/pm/epic-sync/README.md +208 -0
  329. package/packages/plugin-pm/scripts/pm/epic-sync/create-epic-issue.sh +77 -0
  330. package/packages/plugin-pm/scripts/pm/epic-sync/create-task-issues.sh +86 -0
  331. package/packages/plugin-pm/scripts/pm/epic-sync/update-epic-file.sh +79 -0
  332. package/packages/plugin-pm/scripts/pm/epic-sync/update-references.sh +89 -0
  333. package/packages/plugin-pm/scripts/pm/epic-sync.sh +137 -0
  334. package/packages/plugin-pm/scripts/pm/help.js +92 -0
  335. package/packages/plugin-pm/scripts/pm/help.sh +90 -0
  336. package/packages/plugin-pm/scripts/pm/in-progress.js +178 -0
  337. package/packages/plugin-pm/scripts/pm/in-progress.sh +93 -0
  338. package/packages/plugin-pm/scripts/pm/init.js +321 -0
  339. package/packages/plugin-pm/scripts/pm/init.sh +178 -0
  340. package/packages/plugin-pm/scripts/pm/issue-close.js +232 -0
  341. package/packages/plugin-pm/scripts/pm/issue-edit.js +310 -0
  342. package/packages/plugin-pm/scripts/pm/issue-show.js +272 -0
  343. package/packages/plugin-pm/scripts/pm/issue-start.js +181 -0
  344. package/packages/plugin-pm/scripts/pm/issue-sync/format-comment.sh +468 -0
  345. package/packages/plugin-pm/scripts/pm/issue-sync/gather-updates.sh +460 -0
  346. package/packages/plugin-pm/scripts/pm/issue-sync/post-comment.sh +330 -0
  347. package/packages/plugin-pm/scripts/pm/issue-sync/preflight-validation.sh +348 -0
  348. package/packages/plugin-pm/scripts/pm/issue-sync/update-frontmatter.sh +387 -0
  349. package/packages/plugin-pm/scripts/pm/lib/README.md +85 -0
  350. package/packages/plugin-pm/scripts/pm/lib/epic-discovery.js +119 -0
  351. package/packages/plugin-pm/scripts/pm/lib/logger.js +78 -0
  352. package/packages/plugin-pm/scripts/pm/next.js +189 -0
  353. package/packages/plugin-pm/scripts/pm/next.sh +72 -0
  354. package/packages/plugin-pm/scripts/pm/optimize.js +407 -0
  355. package/packages/plugin-pm/scripts/pm/pr-create.js +337 -0
  356. package/packages/plugin-pm/scripts/pm/pr-list.js +257 -0
  357. package/packages/plugin-pm/scripts/pm/prd-list.js +242 -0
  358. package/packages/plugin-pm/scripts/pm/prd-list.sh +103 -0
  359. package/packages/plugin-pm/scripts/pm/prd-new.js +684 -0
  360. package/packages/plugin-pm/scripts/pm/prd-parse.js +547 -0
  361. package/packages/plugin-pm/scripts/pm/prd-status.js +152 -0
  362. package/packages/plugin-pm/scripts/pm/prd-status.sh +63 -0
  363. package/packages/plugin-pm/scripts/pm/release.js +460 -0
  364. package/packages/plugin-pm/scripts/pm/search.js +192 -0
  365. package/packages/plugin-pm/scripts/pm/search.sh +89 -0
  366. package/packages/plugin-pm/scripts/pm/standup.js +362 -0
  367. package/packages/plugin-pm/scripts/pm/standup.sh +95 -0
  368. package/packages/plugin-pm/scripts/pm/status.js +148 -0
  369. package/packages/plugin-pm/scripts/pm/status.sh +59 -0
  370. package/packages/plugin-pm/scripts/pm/sync-batch.js +337 -0
  371. package/packages/plugin-pm/scripts/pm/sync.js +343 -0
  372. package/packages/plugin-pm/scripts/pm/template-list.js +141 -0
  373. package/packages/plugin-pm/scripts/pm/template-new.js +366 -0
  374. package/packages/plugin-pm/scripts/pm/validate.js +274 -0
  375. package/packages/plugin-pm/scripts/pm/validate.sh +106 -0
  376. package/packages/plugin-pm/scripts/pm/what-next.js +660 -0
  377. package/packages/plugin-testing/README.md +401 -0
  378. package/packages/plugin-testing/agents/frontend-testing-engineer.md +768 -0
  379. package/packages/plugin-testing/commands/jest-optimize.md +800 -0
  380. package/packages/plugin-testing/commands/playwright-optimize.md +887 -0
  381. package/packages/plugin-testing/commands/test-coverage.md +512 -0
  382. package/packages/plugin-testing/commands/test-performance.md +1041 -0
  383. package/packages/plugin-testing/commands/test-setup.md +414 -0
  384. package/packages/plugin-testing/package.json +40 -0
  385. package/packages/plugin-testing/plugin.json +197 -0
  386. package/packages/plugin-testing/rules/test-coverage-requirements.md +581 -0
  387. package/packages/plugin-testing/rules/testing-standards.md +529 -0
  388. package/packages/plugin-testing/scripts/examples/react-testing-example.test.jsx +460 -0
  389. package/packages/plugin-testing/scripts/examples/vitest-config-example.js +352 -0
  390. package/packages/plugin-testing/scripts/examples/vue-testing-example.test.js +586 -0
package/packages/plugin-ai/commands/huggingface-deploy.md (new file)
@@ -0,0 +1,789 @@
---
allowed-tools: Bash, Read, Write, LS
---

# huggingface:deploy

Deploy HuggingFace models to production with Context7-verified quantization, optimization, and inference endpoint strategies.

## Description

Comprehensive HuggingFace model deployment following official best practices:
- Model quantization (GPTQ, AWQ, GGUF)
- Inference optimization (vLLM, TGI, Optimum)
- Deployment strategies (HF Inference Endpoints, SageMaker, local)
- Auto-scaling and load balancing
- Model serving with FastAPI
- Performance monitoring

## Required Documentation Access

**MANDATORY:** Before deployment, query Context7 for HuggingFace best practices:

**Documentation Queries:**
- `mcp://context7/huggingface/transformers` - Transformers library patterns
- `mcp://context7/huggingface/inference-endpoints` - Managed inference deployment
- `mcp://context7/huggingface/quantization` - GPTQ, AWQ, GGUF quantization
- `mcp://context7/huggingface/optimum` - Hardware-optimized inference
- `mcp://context7/huggingface/vllm` - vLLM high-throughput serving
- `mcp://context7/huggingface/tgi` - Text Generation Inference

**Why This is Required:**
- Ensures deployment follows official HuggingFace documentation
- Applies proven quantization techniques
- Validates inference optimization strategies
- Prevents performance bottlenecks
- Optimizes resource usage and costs
- Implements production-ready patterns

## Usage

```bash
/huggingface:deploy [options]
```

## Options

- `--model <model-id>` - HuggingFace model ID (e.g., mistralai/Mistral-7B-v0.1)
- `--quantization <none|gptq|awq|gguf>` - Quantization method (default: none)
- `--backend <vllm|tgi|optimum|transformers>` - Inference backend (default: transformers)
- `--deployment <endpoints|sagemaker|local>` - Deployment target (default: local)
- `--output <file>` - Write deployment config

## Examples

### Full Deployment Pipeline
```bash
/huggingface:deploy --model mistralai/Mistral-7B-v0.1 --quantization gptq --backend vllm
```

### Deploy to HF Inference Endpoints
```bash
/huggingface:deploy --model meta-llama/Llama-3.1-8B --deployment endpoints
```

### Local Deployment with Quantization
```bash
/huggingface:deploy --model TheBloke/Mistral-7B-GPTQ --backend vllm --deployment local
```

### Generate Deployment Config
```bash
/huggingface:deploy --model microsoft/phi-2 --output deploy-config.yaml
```

## Deployment Categories

### 1. Model Quantization (Context7-Verified)

**Pattern from Context7 (/huggingface/transformers):**

#### GPTQ Quantization (4-bit)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Load model with GPTQ quantization
model_id = "TheBloke/Mistral-7B-GPTQ"

quantization_config = GPTQConfig(
    bits=4,
    group_size=128,
    desc_act=False
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate
inputs = tokenizer("What is machine learning?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

**Memory Savings:**
- FP16 model: 14 GB (7B parameters × 2 bytes)
- GPTQ 4-bit: 3.5 GB (7B parameters × 0.5 bytes)
- Reduction: 75% memory savings

**Performance:**
- Speed: ~5% slower than FP16
- Quality: Minimal degradation (<1% perplexity increase)
- Throughput: 4x more models per GPU

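The same arithmetic extends to any parameter count and bit width. A back-of-the-envelope helper (illustrative only; it counts weights and ignores KV cache and activation overhead):

```python
def model_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory for a dense LLM: params × (bits / 8) bytes."""
    return params_billion * 1e9 * (bits / 8) / 1e9

# 7B parameters at FP16, INT8, and 4-bit:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {model_memory_gb(7, bits):.1f} GB")
# 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB
```
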
#### AWQ Quantization (4-bit, optimized)
```python
from transformers import AutoModelForCausalLM, AwqConfig

# AWQ: Activation-aware Weight Quantization
awq_config = AwqConfig(
    bits=4,
    group_size=128,
    zero_point=True,
    version="gemm"  # Optimized GEMM kernels
)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-AWQ",
    device_map="auto",
    quantization_config=awq_config
)

# AWQ is 2-3x faster than GPTQ for the same quality
```

**Performance:**
- Speed: Same as FP16 (optimized kernels)
- Quality: Better than GPTQ (activation-aware)
- Memory: 75% reduction (same as GPTQ)
- Best for: Production inference

#### bitsandbytes INT8 Quantization
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# INT8 quantization with bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device_map="auto",
    quantization_config=bnb_config
)

# Memory: 50% reduction, minimal quality loss
```

**Performance:**
- Memory: 7 GB (50% reduction)
- Speed: 10% slower than FP16
- Quality: <0.5% degradation
- Best for: Fine-tuning and inference

#### GGUF Quantization (CPU inference)
```bash
# Install llama.cpp Python bindings for running GGUF models
pip install llama-cpp-python
```

```python
# Download a pre-quantized GGUF model from the Hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-GGUF",
    filename="mistral-7b.Q4_K_M.gguf"
)

# Load with llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=0  # CPU only (set e.g. 32 to offload layers to GPU)
)

# Generate
output = llm("What is AI?", max_tokens=100)
print(output['choices'][0]['text'])
```

**Performance:**
- CPU inference: 5-10 tokens/sec (4-bit)
- GPU offload: 20-50 tokens/sec
- Memory: 4 GB (CPU)
- Best for: Edge deployment, CPU servers

### 2. vLLM High-Throughput Serving (Context7-Verified)

**Pattern from Context7 (/huggingface/vllm):**

#### vLLM Server Setup
```python
from vllm import LLM, SamplingParams

# Initialize vLLM
llm = LLM(
    model="mistralai/Mistral-7B-v0.1",
    tensor_parallel_size=1,  # Number of GPUs
    dtype="auto",
    max_model_len=4096,
    gpu_memory_utilization=0.9,
    enforce_eager=False,  # Use CUDA graphs for faster inference
    trust_remote_code=True
)

# Sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=256
)

# Batch inference
prompts = [
    "What is machine learning?",
    "Explain quantum computing.",
    "What is Python programming?"
]

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(f"Generated: {generated_text}\n")
```

**Performance:**
- Throughput: 2-10x higher than HF Transformers
- Continuous batching: Automatic request batching
- PagedAttention: Efficient KV cache management
- Multi-GPU: Tensor parallelism support

**Benchmarks (Mistral-7B on A100):**
- HF Transformers: 30 tokens/sec
- vLLM: 200+ tokens/sec (6x faster)
- Memory efficiency: 2x more concurrent requests

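Numbers like these depend heavily on batch size and output length. A rough way to measure throughput yourself (an illustrative harness, not the benchmark that produced the figures above):

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(max_tokens=256, temperature=0.0)
prompts = ["What is machine learning?"] * 16  # batch to exercise continuous batching

start = time.time()
outputs = llm.generate(prompts, params)
elapsed = time.time() - start

# Count generated tokens across the whole batch
total_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{total_tokens / elapsed:.0f} tokens/sec across the batch")
```
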
#### vLLM API Server
```bash
# Start vLLM OpenAI-compatible API server
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-v0.1 \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 1
```

```python
# Test with the OpenAI client pointed at the local server
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy"  # vLLM doesn't require an API key by default
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-v0.1",
    messages=[
        {"role": "user", "content": "What is AI?"}
    ]
)

print(completion.choices[0].message.content)
```

**Benefits:**
- OpenAI-compatible API
- Drop-in replacement for OpenAI clients
- Roughly 10x cheaper than the OpenAI API at sustained volume (self-hosted)
- Full control over the model

### 3. Text Generation Inference (TGI) (Context7-Verified)

**Pattern from Context7 (/huggingface/tgi):**

#### TGI Docker Deployment
```bash
# Run TGI with Docker
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id mistralai/Mistral-7B-v0.1 \
    --num-shard 1 \
    --max-input-length 2048 \
    --max-total-tokens 4096 \
    --quantize gptq

# Test with curl
curl http://localhost:8080/generate \
    -X POST \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 100}}' \
    -H 'Content-Type: application/json'
```

**TGI Features:**
- Continuous batching
- Flash Attention 2
- GPTQ/AWQ quantization
- Token streaming
- Auto-scaling

#### TGI Client (Python)
```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="http://localhost:8080")

# Generate
text = client.text_generation(
    "Explain artificial intelligence",
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    stream=False
)

print(text)

# Streaming
for token in client.text_generation(
    "Write a story about AI",
    max_new_tokens=200,
    stream=True
):
    print(token, end="", flush=True)
```

**Performance (Mistral-7B on A100):**
- Throughput: 150+ tokens/sec
- Latency: <100ms time to first token
- Memory: Optimized with Flash Attention 2
- Best for: Production serving

### 4. HuggingFace Inference Endpoints (Context7-Verified)

**Pattern from Context7 (/huggingface/inference-endpoints):**

#### Deploy to HF Inference Endpoints
```python
from huggingface_hub import create_inference_endpoint

# Create managed endpoint
endpoint = create_inference_endpoint(
    name="mistral-7b-endpoint",
    repository="mistralai/Mistral-7B-v0.1",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    instance_size="x1",  # 1x NVIDIA A10G
    instance_type="nvidia-a10g",
    region="us-east-1",
    vendor="aws",
    account_id="your-account-id",
    min_replica=1,
    max_replica=3,
    revision="main",
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_INPUT_LENGTH": "2048",
            "MAX_TOTAL_TOKENS": "4096"
        }
    }
)

print(f"Endpoint created: {endpoint.name}")

# Wait for deployment; the URL is only populated once the endpoint is running
endpoint.wait()
print(f"URL: {endpoint.url}")

# Test endpoint
from huggingface_hub import InferenceClient

client = InferenceClient(model=endpoint.url, token="hf_xxx")

response = client.text_generation(
    "What is machine learning?",
    max_new_tokens=100
)

print(response)
```

**Pricing (as of 2025):**
- x1 (NVIDIA A10G): $0.60/hour
- x2 (2x A10G): $1.20/hour
- x4 (4x A100): $4.50/hour
- Auto-scaling: Pay only for active replicas

**Benefits:**
- Fully managed infrastructure
- Auto-scaling (1-10 replicas)
- Built-in monitoring
- 99.9% uptime SLA
- Global CDN

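Since endpoints bill per hour while running, it is worth pausing them outside peak hours. A minimal sketch using the `huggingface_hub` endpoint-management helpers (the endpoint name comes from the example above):

```python
from huggingface_hub import get_inference_endpoint

# Look up the endpoint created above and pause it to stop hourly billing
endpoint = get_inference_endpoint("mistral-7b-endpoint", token="hf_xxx")
endpoint.pause()  # paused endpoints are not billed

# Later, bring it back and block until it is serving again
endpoint.resume()
endpoint.wait()
```
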
### 5. Optimum Hardware Acceleration (Context7-Verified)

**Pattern from Context7 (/huggingface/optimum):**

#### ONNX Runtime Optimization
```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Convert to ONNX and optimize
model = ORTModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    export=True,
    provider="CUDAExecutionProvider"  # GPU acceleration
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Inference
inputs = tokenizer("What is AI?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

**Performance:**
- Speed: 2-3x faster than PyTorch
- Memory: 30% reduction
- Cross-platform: CPU, GPU, NPU
- Best for: Edge deployment

#### Intel Neural Compressor
```python
from optimum.intel import INCModelForCausalLM

# Optimize for Intel CPUs
model = INCModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    export=True
)

# INT8 quantization for CPU
from optimum.intel import INCQuantizer

quantizer = INCQuantizer.from_pretrained(model)
# Depending on the optimum-intel version, quantize() may also require a
# quantization_config and a save_directory; check the Optimum Intel docs.
quantized_model = quantizer.quantize()

# 4x faster on Intel CPUs
```

### 6. FastAPI Model Serving (Context7-Verified)

**Pattern from Context7:**

#### Production API Server
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

app = FastAPI()

# Load model once at startup
model_id = "microsoft/phi-2"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

class GenerationRequest(BaseModel):
    prompt: str
    max_tokens: int = 100
    temperature: float = 0.7
    top_p: float = 0.9

class GenerationResponse(BaseModel):
    generated_text: str
    tokens_generated: int

@app.post("/generate", response_model=GenerationResponse)
async def generate(request: GenerationRequest):
    try:
        inputs = tokenizer(request.prompt, return_tensors="pt").to(model.device)

        outputs = model.generate(
            **inputs,
            max_new_tokens=request.max_tokens,
            temperature=request.temperature,
            top_p=request.top_p,
            do_sample=True
        )

        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        tokens_generated = len(outputs[0]) - len(inputs.input_ids[0])

        return GenerationResponse(
            generated_text=generated_text,
            tokens_generated=tokens_generated
        )

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy"}

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```

#### Docker Deployment
```dockerfile
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Install Python
RUN apt-get update && apt-get install -y python3 python3-pip

# Install dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# Copy application
COPY server.py .

# Download model at build time
RUN python3 -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('microsoft/phi-2')"

# Run server
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]
```

```bash
# Build and run
docker build -t hf-model-server .
docker run --gpus all -p 8000:8000 hf-model-server
```

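Once the container is up, the API defined above can be exercised with any HTTP client. A small sketch using `requests` against the `GenerationRequest`/`GenerationResponse` schema from the server:

```python
import requests

# POST a prompt to the /generate route defined in server.py
resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "What is AI?", "max_tokens": 50},
    timeout=60,
)
resp.raise_for_status()

body = resp.json()
print(body["generated_text"], f"({body['tokens_generated']} tokens)")
```
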
### 7. Monitoring and Auto-Scaling (Context7-Verified)

**Pattern from Context7:**

#### Prometheus Metrics
```python
# Extends the FastAPI server from section 6: `app`, `model`, and
# `GenerationRequest` are the objects defined there.
from prometheus_client import Counter, Histogram, start_http_server
import time

# Metrics
request_count = Counter('model_requests_total', 'Total inference requests')
request_duration = Histogram('model_request_duration_seconds', 'Request duration')
tokens_generated = Counter('model_tokens_generated_total', 'Total tokens generated')

@app.post("/generate")
async def generate(request: GenerationRequest):
    request_count.inc()

    start_time = time.time()

    # Generate (same call as in section 6)
    outputs = model.generate(...)

    # Record metrics
    duration = time.time() - start_time
    request_duration.observe(duration)
    tokens_generated.inc(len(outputs[0]))

    return response  # build the GenerationResponse as in section 6

# Start metrics server on its own port
start_http_server(9090)
```

#### Kubernetes Auto-Scaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hf-model-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hf-model-server
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: model_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

Note: `model_requests_per_second` is a custom pods metric, so this HPA only works if a metrics adapter (e.g. prometheus-adapter) exposes the Prometheus counter above through the Kubernetes custom metrics API.

## Deployment Output

```
🚀 HuggingFace Model Deployment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Model: mistralai/Mistral-7B-v0.1
Quantization: GPTQ 4-bit
Backend: vLLM
Deployment: Local

📊 Model Configuration
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Base Model:
- Parameters: 7.2B
- FP16 size: 14 GB
- Context length: 8192 tokens

Quantization:
- Method: GPTQ 4-bit
- Quantized size: 3.5 GB (75% reduction)
- Quality: 99% of FP16 (minimal degradation)
- Speed: 95% of FP16 performance

vLLM Configuration:
- GPUs: 1x NVIDIA A100
- Tensor parallel: 1
- Max model length: 4096
- GPU memory: 90% utilization
- PagedAttention: Enabled

⚡ Performance Benchmarks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Throughput:
- Baseline (HF Transformers): 30 tokens/sec
- vLLM optimized: 200 tokens/sec (6.7x faster)
- Concurrent requests: 20 (vs 5 baseline)

Latency:
- Time to first token: 50ms
- Average token latency: 5ms
- End-to-end (100 tokens): 550ms

Memory:
- Model: 3.5 GB
- KV cache: 2 GB
- Total: 5.5 GB (vs 14 GB baseline)

💰 Cost Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Infrastructure:
- GPU: 1x A100 ($2/hour AWS)
- Monthly cost: $1,440
- Requests: 10M/month (assuming ~1K tokens each)
- Cost per 1K requests: $0.144

vs OpenAI GPT-4o:
- OpenAI: $2.50 per 1M input tokens
- Self-hosted: ~$0.144 per 1M tokens at this volume
- Savings: ~94% (~$2.36 per 1M tokens)

Break-even: ~576K requests/month; below that, the metered API is cheaper

🎯 Deployment Status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Model downloaded and quantized
✅ vLLM server running on port 8000
✅ Health check: http://localhost:8000/health
✅ OpenAI-compatible API: http://localhost:8000/v1
✅ Prometheus metrics: http://localhost:9090/metrics

Next Steps:
1. Test inference: curl http://localhost:8000/v1/completions
2. Load test: python load_test.py
3. Deploy to production: docker push
4. Setup monitoring: prometheus + grafana

Configuration saved to: deploy-config.yaml
```

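A sketch of the arithmetic behind the cost section above (the ~1K tokens-per-request figure is an assumption, not a measurement):

```python
GPU_HOURLY = 2.00           # 1x A100 on AWS, $/hour (from the report above)
HOURS_PER_MONTH = 720
OPENAI_PER_MTOK = 2.50      # GPT-4o input pricing, $ per 1M tokens
TOKENS_PER_REQUEST = 1_000  # assumption used throughout the report

monthly_fixed = GPU_HOURLY * HOURS_PER_MONTH  # $1,440
requests = 10_000_000
self_hosted_per_mtok = monthly_fixed / (requests * TOKENS_PER_REQUEST / 1e6)  # ≈ $0.144
savings = 1 - self_hosted_per_mtok / OPENAI_PER_MTOK                          # ≈ 94%

# Break-even: volume where the fixed GPU bill equals OpenAI's metered cost
openai_per_request = OPENAI_PER_MTOK * TOKENS_PER_REQUEST / 1e6  # $0.0025
break_even_requests = monthly_fixed / openai_per_request         # 576,000/month

print(f"self-hosted: ${self_hosted_per_mtok:.3f}/MTok, savings {savings:.0%}, "
      f"break-even {break_even_requests:,.0f} requests/month")
```
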
## Implementation

This command uses the **@huggingface-expert** agent with deployment expertise:

1. Query Context7 for HuggingFace deployment patterns
2. Select optimal quantization method
3. Configure inference backend
4. Set up deployment infrastructure
5. Implement monitoring
6. Generate deployment config
7. Test and validate

## Best Practices Applied

Based on Context7 documentation from `/huggingface/transformers`:

1. **GPTQ Quantization** - 75% memory savings, minimal quality loss
2. **vLLM Serving** - 6x faster throughput than baseline
3. **PagedAttention** - 2x more concurrent requests
4. **Flash Attention 2** - 2-4x faster attention computation
5. **Continuous Batching** - Automatic request batching
6. **Auto-Scaling** - Scale 1-10 replicas based on load
7. **Monitoring** - Prometheus metrics for observability

## Related Commands

- `/ai:model-deployment` - General model deployment
- `/openai:optimize` - OpenAI API optimization
- `/anthropic:optimize` - Anthropic Claude optimization

## Troubleshooting

### Out of Memory (OOM)
- Use GPTQ/AWQ 4-bit quantization (75% reduction)
- Reduce the `max_model_len` parameter
- Enable CPU offloading for large models
- Use tensor parallelism (multi-GPU); see the sketch below

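A minimal sketch of these mitigations as vLLM constructor arguments (values are illustrative starting points, not tuned recommendations):

```python
from vllm import LLM

# Trade context length and cache headroom for memory, and shard
# the weights across GPUs when one card is not enough.
llm = LLM(
    model="TheBloke/Mistral-7B-GPTQ",
    quantization="gptq",          # load 4-bit weights (model must be pre-quantized)
    max_model_len=2048,           # shorter context => smaller KV cache
    gpu_memory_utilization=0.85,  # leave headroom to avoid OOM spikes
    tensor_parallel_size=2,       # shard across 2 GPUs
)
```
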
### Low Throughput
- Switch to vLLM (6x faster than HF Transformers)
- Enable continuous batching
- Use Flash Attention 2
- Reduce `max_new_tokens`

### High Latency
- Use a smaller model (Phi-2 or Mistral-7B instead of Llama-70B)
- Enable CUDA graphs (vLLM)
- Use AWQ quantization (same speed as FP16)
- Reduce context length

### Quality Degradation
- Use AWQ instead of GPTQ (better quality)
- Try INT8 quantization (bitsandbytes)
- Use a larger model
- Reduce quantization aggressiveness (4-bit → 8-bit)

## Installation

```bash
# Install HuggingFace ecosystem
pip install transformers accelerate

# Install quantization libraries
pip install auto-gptq bitsandbytes

# Install vLLM
pip install vllm

# Install Optimum
pip install optimum[onnxruntime-gpu]

# Install serving
pip install fastapi uvicorn

# Install monitoring
pip install prometheus-client
```

## Version History

- v2.0.0 - Initial Schema v2.0 release with Context7 integration
  - GPTQ/AWQ/GGUF quantization support
  - vLLM high-throughput serving
  - Text Generation Inference (TGI) integration
  - HF Inference Endpoints deployment
  - Optimum hardware acceleration
  - FastAPI production serving
  - Prometheus monitoring