@groupby/ai-dev 0.5.5 → 0.5.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. package/package.json +1 -1
  2. package/teams/OOF/skills/jira-ticket-creator/README.md +22 -0
  3. package/teams/OOF/skills/jira-ticket-creator/SKILL.md +266 -0
  4. package/teams/fhr-ai-team/github/PULL_REQUEST_TEMPLATE/full.md +31 -0
  5. package/teams/fhr-ai-team/github/PULL_REQUEST_TEMPLATE/light.md +7 -0
  6. package/teams/fhr-ai-team/github/copilot-instructions.md +24 -0
  7. package/teams/fhr-ai-team/github/instructions/python.instructions.md +23 -0
  8. package/teams/fhr-ai-team/github/pull_request_template.md +21 -0
  9. package/teams/fhr-ai-team/prompts/brainstorm.md +7 -0
  10. package/teams/fhr-ai-team/prompts/plan-algo-tests.md +7 -0
  11. package/teams/fhr-ai-team/prompts/plan.md +7 -0
  12. package/teams/fhr-ai-team/prompts/pr-description.md +7 -0
  13. package/teams/fhr-ai-team/prompts/test.md +7 -0
  14. package/teams/fhr-ai-team/resources/AGENTS.md +55 -0
  15. package/teams/fhr-ai-team/resources/CLAUDE.md +52 -0
  16. package/teams/fhr-ai-team/resources/README.md +51 -0
  17. package/teams/fhr-ai-team/resources/claude-code-setup.md +60 -0
  18. package/teams/fhr-ai-team/resources/copilot-setup.md +64 -0
  19. package/teams/fhr-ai-team/resources/onboarding.md +179 -0
  20. package/teams/fhr-ai-team/resources/opencode-install.md +29 -0
  21. package/teams/fhr-ai-team/resources/opencode-setup.md +43 -0
  22. package/teams/fhr-ai-team/skills/algo-test-planning/SKILL.md +192 -0
  23. package/teams/fhr-ai-team/skills/algo-test-planning/references/pipeline-registry.md +280 -0
  24. package/teams/fhr-ai-team/skills/brainstorming/SKILL.md +111 -0
  25. package/teams/fhr-ai-team/skills/e2e-testing/SKILL.md +163 -0
  26. package/teams/fhr-ai-team/skills/grill-me/SKILL.md +10 -0
  27. package/teams/fhr-ai-team/skills/ml-tooling-dev/SKILL.md +313 -0
  28. package/teams/fhr-ai-team/skills/ml-tooling-dev/references/kubectl-debug.md +165 -0
  29. package/teams/fhr-ai-team/skills/ml-tooling-dev/references/mongodb-config.md +218 -0
  30. package/teams/fhr-ai-team/skills/ml-tooling-dev/references/pipeline-configs.md +190 -0
  31. package/teams/fhr-ai-team/skills/ml-tooling-dev/references/pipeline-steps.md +182 -0
  32. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/kf_logs.py +203 -0
  33. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/kf_query.py +233 -0
  34. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/kf_wait.py +195 -0
  35. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/mlflow_query.py +252 -0
  36. package/teams/fhr-ai-team/skills/ml-tooling-dev/scripts/mongo_predictor.py +352 -0
  37. package/teams/fhr-ai-team/skills/naming-conventions-reviewer/SKILL.md +230 -0
  38. package/teams/fhr-ai-team/skills/naming-conventions-reviewer/references/dataset-naming.md +190 -0
  39. package/teams/fhr-ai-team/skills/naming-conventions-reviewer/references/domain-vocabulary.md +447 -0
  40. package/teams/fhr-ai-team/skills/naming-conventions-reviewer/references/repo-dependency-graph.md +264 -0
  41. package/teams/fhr-ai-team/skills/planning/SKILL.md +138 -0
  42. package/teams/fhr-ai-team/skills/pr-description/SKILL.md +94 -0
  43. package/teams/snpd/skills/code-review-github/SKILL.md +475 -0
@@ -0,0 +1,60 @@
1
+ # Claude Code Setup
2
+
3
+ ## Installation
4
+
5
+ ### Via Plugin System (Recommended)
6
+
7
+ ```bash
8
+ # Add the marketplace (if not already added)
9
+ /plugin marketplace add Attraqt/ai-agent-skills-marketplace
10
+
11
+ # Install the plugin
12
+ /plugin install ai.pierre@ai-agent-skills-marketplace
13
+ ```
14
+
15
+ ### Manual Install (Local Development)
16
+
17
+ ```bash
18
+ # Clone the repo
19
+ git clone https://github.com/Attraqt/ai.agent-skills.git
20
+
21
+ # Run Claude Code with the plugin directory
22
+ claude --plugin-dir /path/to/ai.agent-skills
23
+ ```
24
+
25
+ ## Available Commands
26
+
27
+ | Command | Description |
28
+ |---------|-------------|
29
+ | `/brainstorm` | Codebase-aware brainstorming with dynamic repo discovery |
30
+ | `/plan` | Implementation planning with mandatory codebase search and code reuse |
31
+ | `/plan-algo-tests` | Interactive 3-stage pipeline test configuration and Kubeflow JSON generation |
32
+ | `/test` | Local pytest, Kubeflow pipeline e2e, or MLflow model validation |
33
+
34
+ ### Usage Examples
35
+
36
+ ```
37
+ /brainstorm adding sparse retrieval to semantic search
38
+ /plan BGE-M3 migration for algo.semantic-search-ml
39
+ /plan-algo-tests
40
+ /test algo.search-ml unit tests
41
+ ```
42
+
43
+ ## Bundled Skills
44
+
45
+ These skills are loaded automatically and invoked based on context:
46
+
47
+ | Skill | Auto-invoked when |
48
+ |-------|-------------------|
49
+ | `ml-tooling-dev` | Working with Kubeflow, MLflow, or MongoDB |
50
+ | `naming-conventions-reviewer` | Writing or reviewing code in any ML repo |
51
+ | `brainstorming` | Designing or exploring features |
52
+ | `planning` | Breaking down implementation tasks |
53
+ | `algo-test-planning` | Configuring pipeline test runs |
54
+ | `e2e-testing` | Running any type of test |
55
+
56
+ ## How Auto-Routing Works
57
+
58
+ The `CLAUDE.md` file at the plugin root defines when each skill should be invoked.
59
+ Claude Code reads this file and applies the appropriate skill based on your request context.
60
+ You can also invoke skills explicitly via slash commands.
@@ -0,0 +1,64 @@
1
+ # GitHub Copilot Setup
2
+
3
+ ## Installation
4
+
5
+ ### Copy Skills to Your Repository
6
+
7
+ ```bash
8
+ # Clone ai.agent-skills
9
+ git clone https://github.com/Attraqt/ai.agent-skills.git /tmp/ai.agent-skills
10
+
11
+ # Copy skills to your project
12
+ mkdir -p .github/skills
13
+ cp -R /tmp/ai.agent-skills/skills/* .github/skills/
14
+ ```
15
+
16
+ ### Add Copilot Instructions
17
+
18
+ Copy the instruction files from this repository into your project:
19
+
20
+ ```bash
21
+ cp /tmp/ai.agent-skills/.github/copilot-instructions.md .github/copilot-instructions.md
22
+ mkdir -p .github/instructions
23
+ cp /tmp/ai.agent-skills/.github/instructions/*.instructions.md .github/instructions/
24
+ ```
25
+
26
+ The instructions are split into two layers:
27
+
28
+ - [`.github/copilot-instructions.md`](../.github/copilot-instructions.md) - Generic rules: Crownpeak AI team conventions and commit message standards. Applied to all reviews.
29
+ - [`.github/instructions/python.instructions.md`](../.github/instructions/python.instructions.md) - Python coding style. Applied only when reviewing `**/*.py` files.
30
+
31
+ You can add more language-specific files (e.g. `typescript.instructions.md`) following the same pattern. See [GitHub docs on path-specific instructions](https://docs.github.com/en/copilot/tutorials/customize-code-review#when-to-use-path-specific-instructions) for details.
32
+
33
+ ### Agent Personas (Optional)
34
+
35
+ Create agent definitions for specialized review:
36
+
37
+ ```bash
38
+ mkdir -p .github/agents
39
+
40
+ # Create a naming reviewer agent
41
+ cat > .github/agents/naming-reviewer.md << 'EOF'
42
+ You are an expert reviewer for naming conventions in Crownpeak/Earlybirds ML repositories.
43
+ Review code changes for naming consistency using the rules in .github/skills/naming-conventions-reviewer/SKILL.md.
44
+ Flag violations with the correct canonical name.
45
+ EOF
46
+ ```
47
+
48
+ Invoke in Copilot Chat:
49
+ ```
50
+ @naming-reviewer Review this PR for naming convention violations
51
+ ```
52
+
53
+ ## Usage Tips
54
+
55
+ 1. **Keep instructions concise.** Copilot works best with focused, summarized rules rather than full skill files.
56
+ 2. **Use agents for review.** The naming-reviewer agent is useful for PR reviews.
57
+ 3. **Reference skills in chat.** When working on pipeline configs, paste relevant content from `skills/ml-tooling-dev/` into Copilot Chat for context.
58
+ 4. **Combine with PR reviews.** Configure Copilot to use the naming-reviewer agent for automated PR checks.
59
+
60
+ ## Limitations
61
+
62
+ - Copilot does not support the interactive AskUserQuestion flow used by the brainstorming and algo-test-planning skills
63
+ - Pipeline-specific skills (ml-tooling-dev, algo-test-planning) work best in Claude Code or OpenCode where they can execute commands
64
+ - For full skill support, use Claude Code or OpenCode
@@ -0,0 +1,179 @@
1
+ # Claude Code — Team Tips & Best Practices
2
+
3
+ > After installing the plugin per [claude-code-setup.md](claude-code-setup.md), read this before your first ticket.
4
+
5
+ ---
6
+
7
+ ## 1. Initial Setup
8
+
9
+ ### GitHub Access
10
+
11
+ Connect Claude Code to GitHub by selecting **individual repositories** — never grant org-wide access. This is a security requirement. If you run into authorization issues, ask Pavel, Julian, or Aurélie for help.
12
+
13
+ ### Atlassian Integration
14
+
15
+ Connecting Claude Code to Atlassian lets you reference Jira ticket IDs directly in your prompts instead of copy-pasting ticket content.
16
+
17
+ ### Essential Skills to Install
18
+
19
+ Before you start working, make sure you have at least these skills available (check with `/` in a session):
20
+
21
+ - **Grill Me** — structured requirements Q&A before implementation
22
+ - **Handoff** — summarizes a session into a spec for the next one
23
+ - **Caveman** — strips pleasantries to save tokens (multiple verbosity levels)
24
+
25
+ If skills don't appear after installation, try these steps in order:
26
+
27
+ 1. Open a new session and type `/` to check if skills are listed
28
+ 2. If not, try reloading skills from the settings
29
+ 3. If still missing, restart the desktop app entirely
30
+ 4. If you installed via CLI but use the desktop app (or vice versa), you may need to reinstall using the method that matches your interface
31
+
32
+ ---
33
+
34
+ ## 2. Workflow: From Ticket to PR
35
+
36
+ ### Step 1 — Read the Ticket
37
+
38
+ Read and understand the ticket yourself first. You need to be able to answer Claude's questions during the Grill Me phase. Don't skip this.
39
+
40
+ ### Step 2 — Grill Me Session
41
+
42
+ Paste the ticket content into a code block (triple backticks), add your instruction and context below it, then invoke `/grill-me`.
43
+
44
+ ````
44
+ ```
45
+ <ticket content here>
46
+ ```
47
+
48
+ I want to implement this ticket within the repository <repo-name>.
49
+ /grill-me
50
+ ````
52
+
53
+ **What Grill Me does:** It injects a prompt that tells Claude to interview you relentlessly about every aspect of the plan, traversing a "design tree" of implementation possibilities until you both reach a shared understanding of what to build.
54
+
55
+ **Why it matters:**
56
+ - Surfaces things you forgot or didn't think about (e.g., deployment strategy, missing endpoints)
57
+ - Reaches an explicit **agreement** with the LLM — so it doesn't guess or go off-track during implementation
58
+ - Without it, Claude still asks questions, but they're less structured and less relevant. You end up spending more tokens correcting course later.
59
+
60
+ **During the Q&A:**
61
+ - Read Claude's recommendations before answering — they're usually good, but not always what you want
62
+ - You can reply with just "B" or "recommendation" when you agree — be token-efficient
63
+ - When Claude asks about something out of scope, say so clearly: "Authentication will be handled on the infrastructure side and should not be implemented in this ticket" — not just "don't include this" (too vague, causes misinterpretation)
64
+ - If Claude asks about something you're unsure of, provide context even if it wasn't in the ticket — this enriches later decision-making
65
+
66
+ **Variant — Grill Me with Docs:** Searches your codebase documentation to ask better-informed questions. Useful when you're not sure how to answer a Grill Me question yourself.
67
+
68
+ ### Step 3 — Handoff to a Fresh Session
69
+
70
+ After Grill Me reaches "ready to implement," use `/handoff`. This generates a markdown specification summarizing the agreed plan. Then start a new session and point Claude at that spec.
71
+
72
+ **Why not just keep going in the same session?**
73
+ - Long sessions bloat context — every message you send includes all previous conversation
74
+ - More context = more token cost + degraded LLM quality ("the more context it has, the dumber it gets")
75
+ - `/handoff` gives you a clean start with only what matters
76
+
77
+ Alternative: `/compact` summarizes the conversation so far and continues in the same session with that shorter context.
78
+
79
+ ### Step 4 — Implementation
80
+
81
+ Point Claude at the handoff spec and let it implement. The desktop app (not CLI) will handle PR creation and avoid pushing directly to develop.
82
+
83
+ ### Step 5 — Review
84
+
85
+ Use `/review` for a general code review of your changes. You can also fine-tune review behavior via `copilot-instructions.md` in the repository.
86
+
87
+ ---
88
+
89
+ ## 3. Prompting Techniques
90
+
91
+ ### Be Terse, but Not Ambiguous
92
+
93
+ Saving tokens is good. Losing meaning is not. "B" is fine when Claude gives you options. "Don't include this" is too vague when scoping out a feature — say what you mean and why.
94
+
95
+ ### Provide Full Stack Context
96
+
97
+ Claude doesn't know your stack unless you tell it. Missing context leads to rework — even a good Grill Me session can miss things if Claude doesn't know about your tooling.
98
+
99
+ At the start of a session (or in your Grill Me prompt), mention:
100
+ - **Frameworks and services** your project depends on (LangFuse, Streamlit, FastAPI, etc.)
101
+ - **Deployment targets** (ArgoCD, Argo App ML, Kubeflow, specific namespaces)
102
+ - **Data sources** (MongoDB collections, MLflow experiment names)
103
+ - **What's out of scope** for this ticket (auth, infra, other tickets handling adjacent work)
104
+
105
+ If you're unsure what context matters, use the **Grill Me with Docs** variant — it reads your codebase docs to fill in gaps.
106
+
107
+ ### Use Code Blocks for Pasted Content
108
+
109
+ Wrap ticket text, specs, or any other pasted content in triple backticks. This keeps the prompt clean and helps Claude distinguish instructions from reference material.
110
+
111
+ ### Use `/rewind` When Things Go Wrong
112
+
113
+ `/rewind` rolls back both the conversation and code changes to an earlier point. It is extremely useful when Claude goes down the wrong path.
114
+
115
+ ---
116
+
117
+ ## 4. Token Efficiency
118
+
119
+ - **Caveman skill** reduces conversational overhead. At higher levels it switches to the most token-efficient language (while still generating code in English).
120
+ - **Smaller tickets = fewer tokens.** Split work into focused subtasks. One task per session is the sweet spot.
121
+ - **Avoid 1M-token context models** — quality degrades well before that limit. Standard context with fresh sessions works better.
122
+ - **Handoff between sessions** instead of accumulating context in one long conversation.
123
+ - **French uses more tokens than English** for the same content, because tokenizers are trained mostly on English text and accented characters often split into extra tokens. If token budget is tight, prompt in English.
124
+
125
+ ---
126
+
127
+ ## 5. Building & Using Skills
128
+
129
+ Skills are more than markdown prompts — they're most powerful when they wrap **scripts** that Claude executes.
130
+
131
+ ### The Pattern
132
+
133
+ 1. **Skill file** (markdown) — describes the workflow, what scripts to use at each step, and provides context (URLs, environment info, conventions)
134
+ 2. **Scripts** (bash/python) — do the actual work without burning tokens on CLI commands
135
+
136
+ ### Example: ML Tooling Dev Skill
137
+
138
+ Manages Kubeflow, MLflow, and MongoDB by providing Claude with pre-built scripts for querying pipelines, reading logs, checking run status, and updating configs. Claude uses the scripts directly instead of figuring out `kubectl` commands from scratch each time.
139
+
140
+ ### Why Scripts Over MCP
141
+
142
+ - **Adapted to your workflow** — generic MCPs don't know your conventions
143
+ - **Token-efficient** — scripts execute directly; MCP adds an LLM layer for each tool call
144
+ - **Documented** — docs in the skill help Claude understand what each script does
145
+
146
+ ### Improving Skills Iteratively
147
+
148
+ When a skill produces errors, feed those errors back to Claude and ask it to enhance the skill. Each iteration reduces token waste and improves reliability.
149
+
150
+ ### Creating Skills from Sessions
151
+
152
+ At the end of a productive session, ask Claude: "Based on this session, use Skill Creator to build a skill that replicates what I just did." Results vary by session complexity, but it's a good starting point.
153
+
154
+ ---
155
+
156
+ ## 6. Multi-Repository Work
157
+
158
+ For projects spanning multiple repos:
159
+
160
+ 1. Keep all project repos in one parent directory
161
+ 2. Add a `CLAUDE.md` at the parent level that summarizes how the projects relate
162
+ 3. Launch Claude from that parent directory
163
+ 4. Tell Claude to look at local repos when it needs cross-project context
164
+
165
+ ---
166
+
167
+ ## 7. Security Reminders
168
+
169
+ - **GitHub:** Select individual repos, not org-wide access
170
+ - **Credentials/tokens:** Don't hardcode secrets in code Claude generates. Use `.env` files and ensure they're in `.gitignore`. Claude Code settings can be configured to ignore sensitive files by default.
171
+ - **Audit trail:** Everything Claude does is under **your** username. Treat its actions as your own — review before merging.
172
+ - **Destructive commands:** Claude is reluctant to run Terragrunt, and cautious with SQL. This is by design. Don't override these guardrails without thinking.
173
+ - **Auto mode:** Be careful — it can push to develop or skip branch creation if you're not watching. Before entering auto mode, always verify: (1) you're on a feature branch, not develop; (2) the remote is set correctly. If something goes wrong, use `git reflog` to find your previous state and `git reset` to recover. Ask for help if unsure.
174
+
175
+ ---
176
+
177
+ ## 8. Learning from Claude
178
+
179
+ Read Claude's traces (`Cmd+O` on macOS) — the full trace of what Claude does is visible and educational. You'll learn bash techniques, see its reasoning process, and spot errors you can feed back to improve skills.
@@ -0,0 +1,29 @@
1
+ # OpenCode Installation
2
+
3
+ ## Setup
4
+
5
+ 1. Clone the repository:
6
+ ```bash
7
+ git clone https://github.com/Attraqt/ai.agent-skills.git
8
+ ```
9
+
10
+ 2. Open the project in OpenCode. The `AGENTS.md` file at the repo root is loaded automatically and provides agent instructions.
11
+
12
+ 3. Skills in the `skills/` directory are discovered automatically via directory convention.
13
+
14
+ ## How It Works
15
+
16
+ - **No slash commands needed.** The agent reads `AGENTS.md` and automatically selects the appropriate skill based on your natural language request.
17
+ - Skills are stored as `skills/<skill-name>/SKILL.md` files with supporting references and scripts.
18
+ - The agent maps your request to the right skill (e.g., "design a feature" triggers brainstorming, "test the pipeline" triggers algo-test-planning).
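To make the directory convention concrete, here is a minimal Python sketch of convention-based skill discovery. It is not OpenCode's implementation. It assumes only the `skills/<skill-name>/SKILL.md` layout described here and a `---`-delimited front-matter block with a `name:` key, as in the algo-test-planning skill. It falls back to the directory name when no `name:` is found. `discover_skills` is a hypothetical helper.

```python
from pathlib import Path


def discover_skills(root: Path) -> dict[str, Path]:
    """Map skill names to their SKILL.md files under root/skills/.

    Illustrative only: assumes the skills/<skill-name>/SKILL.md layout and an
    optional front-matter block opened and closed by '---' lines containing a
    'name:' key. Falls back to the directory name when no name is declared.
    """
    skills: dict[str, Path] = {}
    for skill_file in sorted((root / "skills").glob("*/SKILL.md")):
        name = skill_file.parent.name  # default: the skill's directory name
        lines = skill_file.read_text(encoding="utf-8").splitlines()
        if lines and lines[0].strip() == "---":
            for line in lines[1:]:
                if line.strip() == "---":
                    break  # end of front matter, no name declared
                if line.startswith("name:"):
                    name = line.split(":", 1)[1].strip()
                    break
        skills[name] = skill_file
    return skills


if __name__ == "__main__":
    # Run from the cloned repo root to list every discovered skill
    for skill_name, path in discover_skills(Path(".")).items():
        print(f"{skill_name}: {path}")
```

Running it from the repo root lists each skill name next to its SKILL.md path. This is a quick way to confirm that a copied `skills/` directory has the expected layout.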
19
+
20
+ ## Example Prompts
21
+
22
+ | What you say | Skill invoked |
23
+ |-------------|---------------|
24
+ | "Let's brainstorm how to add image search" | brainstorming |
25
+ | "Plan the implementation for BGE-M3 migration" | planning |
26
+ | "I want to test the semantic search pipeline for Myer" | algo-test-planning |
27
+ | "Run the tests for algo.search-ml" | e2e-testing |
28
+ | "Check the status of my Kubeflow run" | ml-tooling-dev |
29
+ | "Review the naming in this PR" | naming-conventions-reviewer |
@@ -0,0 +1,43 @@
1
+ # OpenCode Setup
2
+
3
+ ## Installation
4
+
5
+ ```bash
6
+ git clone https://github.com/Attraqt/ai.agent-skills.git
7
+ ```
8
+
9
+ Open the cloned directory in OpenCode. No additional setup is required.
10
+
11
+ ## How It Works
12
+
13
+ - `AGENTS.md` at the repo root is loaded automatically and provides agent instructions
14
+ - Skills in the `skills/` directory are discovered via directory convention
15
+ - The agent automatically selects the appropriate skill based on your natural language request
16
+ - No slash commands are needed; the agent detects intent and routes to the right workflow
17
+
18
+ ## Skill Routing
19
+
20
+ | Your request | Skill invoked |
21
+ |-------------|---------------|
22
+ | "Let's brainstorm how to..." | `skills/brainstorming/SKILL.md` |
23
+ | "Plan the implementation for..." | `skills/planning/SKILL.md` |
24
+ | "Test the pipeline for..." | `skills/algo-test-planning/SKILL.md` |
25
+ | "Run the tests for..." | `skills/e2e-testing/SKILL.md` |
26
+ | "Check my Kubeflow run" | `skills/ml-tooling-dev/SKILL.md` |
27
+ | "Review the naming in..." | `skills/naming-conventions-reviewer/SKILL.md` |
28
+
29
+ ## Using in Other Projects
30
+
31
+ To use ai.pierre skills when working in a different project repo:
32
+
33
+ 1. Ensure `AGENTS.md` is copied or symlinked to your project root
34
+ 2. Copy the `skills/` directory (or specific skills you need) into your project
35
+ 3. The agent will auto-discover and apply them
36
+
37
+ Alternatively, reference the skills directory in your OpenCode configuration to load them globally.
38
+
39
+ ## Notes
40
+
41
+ - Skill routing depends on the model consistently following the rules in `AGENTS.md`
42
+ - For best results, keep your requests natural and descriptive
43
+ - The agent will use the `question` tool (OpenCode equivalent of AskUserQuestion) to gather requirements interactively
@@ -0,0 +1,192 @@
1
+ ---
2
+ name: algo-test-planning
3
+ description: >
4
+ Use when the user wants to plan or configure a test for an algo pipeline. Guides through
5
+ pipeline selection, config gathering, and Kubeflow config JSON generation via a 3-stage
6
+ interactive flow using AskUserQuestion. Covers full multi-step pipelines and base
7
+ single-step pipelines.
8
+ ---
9
+
10
+ # Algo Pipeline Test Planning
11
+
12
+ ## Overview
13
+
14
+ This skill guides you through planning and configuring a test run for an ML pipeline.
15
+ It produces a complete Kubeflow config JSON and a test plan with launch and verification steps.
16
+
17
+ All pipeline operations target the **DEV environment only**.
18
+
19
+ ## Stage 1: Pipeline Type Selection
20
+
21
+ Use AskUserQuestion to ask:
22
+
23
+ **Question:** "What type of pipeline do you want to test?"
24
+
25
+ | Option | Description |
26
+ |--------|-------------|
27
+ | Full pipeline | Multi-step end-to-end pipeline (e.g., learning + evaluation + encoding) |
28
+ | Base/single-step pipeline | Single step using a base pipeline template |
29
+
30
+ ---
31
+
32
+ ## Stage 2a: Full Pipeline Selection
33
+
34
+ If the user chose "Full pipeline", use AskUserQuestion to ask which pipeline.
35
+
36
+ Consult `references/pipeline-registry.md` for the complete list grouped by domain.
37
+ Present the most relevant options based on context, or let the user search.
38
+
39
+ Common full pipelines by domain:
40
+
41
+ **Semantic Search:**
42
+ - `semantic_search_learning_with_generated_analytics_pipeline`
43
+ - `semantic_search_item_encoding_pipeline`
44
+
45
+ **Visual Search / CLIP:**
46
+ - `clip_learning_pipeline`
47
+ - `clip_item_encoding_pipeline`
48
+
49
+ **Tagging:**
50
+ - `tagging_learning_pipeline`
51
+ - `transformer_tagging_learning_pipeline`
52
+
53
+ **Image:**
54
+ - `image_encoder_learning_pipeline`
55
+ - `image_classifier_pipeline`
56
+
57
+ **Shop the Look:**
58
+ - `shop_the_look_learning_pipeline`
59
+
60
+ **FM / Recommendations:**
61
+ - `fm_learning_pipeline`
62
+
63
+ **Text Encoder:**
64
+ - `text_encoder_learning_pipeline`
65
+
66
+ Then gather pipeline-specific config fields based on the selected pipeline's requirements
67
+ (see `references/pipeline-registry.md` for required fields per pipeline).
68
+
69
+ ---
70
+
71
+ ## Stage 2b: Base Pipeline Selection
72
+
73
+ If the user chose "Base/single-step pipeline", use AskUserQuestion to ask:
74
+
75
+ **Question:** "Which base pipeline type?"
76
+
77
+ | Option | Description |
78
+ |--------|-------------|
79
+ | `python_batch_pipeline` | Standard Python batch jobs |
80
+ | `large_python_batch_pipeline` | GPU/high-memory Python batch jobs |
81
+ | `scala_batch_pipeline` | Scala-based batch jobs |
82
+ | `spark_scala_batch_pipeline` | Spark Scala batch jobs |
83
+
84
+ Then ask:
85
+ - **Strategy ID** (what job to run, e.g., `semantic-search-learning`, `item-images-single-encoding`)
86
+ - **Docker image name** (e.g., `semantic-search`, `algo-fm-batch`)
87
+ - **Arguments** specific to the strategy (varies by step type)
88
+
89
+ Key rules for base pipelines:
90
+ - `python_batch_pipeline` and `large_python_batch_pipeline` use `batch_config.arguments` for custom params
91
+ - `scala_batch_pipeline` and `spark_scala_batch_pipeline` use `batch_config.custom_params` (NOT `arguments`)
92
+ - GPU jobs must include `gpu_vendor: "nvidia.com/gpu"` and `gpu_accelerator_name: "nvidia-l4"` for L4 nodes
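These rules are easy to get wrong when hand-editing JSON, so a pre-submission check can help. The sketch below is hypothetical: `check_batch_config` is not part of the repo. It also assumes the GPU keys sit in the same `batch_config` dict as `arguments`/`custom_params`, which may not match the real Kubeflow config layout.

```python
PYTHON_PIPELINES = {"python_batch_pipeline", "large_python_batch_pipeline"}
SCALA_PIPELINES = {"scala_batch_pipeline", "spark_scala_batch_pipeline"}
# Values required for L4 GPU nodes, per the rules above
L4_GPU = {"gpu_vendor": "nvidia.com/gpu", "gpu_accelerator_name": "nvidia-l4"}


def check_batch_config(pipeline: str, batch_config: dict, uses_gpu: bool = False) -> list[str]:
    """Return a list of rule violations; an empty list means the config passes.

    Hypothetical helper: assumes GPU keys live inside batch_config.
    """
    problems = []
    if pipeline in PYTHON_PIPELINES:
        if "custom_params" in batch_config:
            problems.append(f"{pipeline} takes custom params in 'arguments', not 'custom_params'")
    elif pipeline in SCALA_PIPELINES:
        if "arguments" in batch_config:
            problems.append(f"{pipeline} takes custom params in 'custom_params', not 'arguments'")
    else:
        problems.append(f"unknown base pipeline: {pipeline}")
    if uses_gpu:
        for key, expected in L4_GPU.items():
            if batch_config.get(key) != expected:
                problems.append(f"GPU job on L4 nodes needs {key}={expected!r}")
    return problems


if __name__ == "__main__":
    # A Scala job mistakenly using 'arguments' is flagged
    print(check_batch_config("scala_batch_pipeline", {"arguments": {"foo": "bar"}}))
```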
93
+
94
+ ---
95
+
96
+ ## Stage 3: Config Gathering and JSON Generation
97
+
98
+ Use AskUserQuestion sequentially for each required input:
99
+
100
+ ### 3.1 Predictor ID
101
+ Ask for the MongoDB ObjectId (e.g., `64f0a12b5856b11b7aa4e71e`).
102
+ This identifies the tenant/predictor whose config will be used.
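A MongoDB ObjectId is 12 bytes, written as 24 hexadecimal characters, so a mistyped ID can be rejected before any query runs. Below is a minimal standard-library sketch. `is_valid_predictor_id` is a hypothetical helper; if pymongo is installed, `bson.ObjectId.is_valid` performs an equivalent check.

```python
import re

# 12 bytes -> 24 hex characters; accept either case
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_valid_predictor_id(value: str) -> bool:
    """Return True if value looks like a MongoDB ObjectId hex string."""
    return _OBJECT_ID_RE.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    print(is_valid_predictor_id("64f0a12b5856b11b7aa4e71e"))  # True
    print(is_valid_predictor_id("64f0a12b5856"))              # False: too short
```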
103
+
104
+ ### 3.2 Experiment Name
105
+ Discover available experiments:
106
+ ```bash
107
+ python3 skills/ml-tooling-dev/scripts/kf_query.py --experiments
108
+ ```
109
+ Or let the user provide one directly.
110
+
111
+ ### 3.3 Strategy ID
112
+ Determine the strategy ID from the selected pipeline or step. See `skills/ml-tooling-dev/references/pipeline-configs.md`
113
+ for the canonical list of strategy IDs.
114
+
115
+ ### 3.4 Image Version
116
+ Verify the version exists in Kubeflow:
117
+ ```bash
118
+ python3 skills/ml-tooling-dev/scripts/kf_query.py --pipeline-versions <pipeline_name>
119
+ ```
120
+ Use the most recent `version_name` from the output (e.g., `"0.1.271"`).
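"Most recent" is safest decided numerically, because as plain strings `"0.1.99"` sorts after `"0.1.271"`. This is a minimal sketch under three assumptions: the `version_name` values from the `kf_query.py` output have already been collected into a list, they are dot-separated integers, and versions increase over time. The helper names are hypothetical.

```python
from typing import Optional


def version_key(name: str) -> tuple[int, ...]:
    """Turn '0.1.271' into (0, 1, 271) so versions compare numerically."""
    return tuple(int(part) for part in name.split("."))


def latest_version(names: list[str]) -> Optional[str]:
    """Return the highest dotted-integer version name, or None if none parse."""
    parsable = []
    for name in names:
        try:
            version_key(name)
        except ValueError:
            continue  # skip names that are not dot-separated integers
        parsable.append(name)
    return max(parsable, key=version_key, default=None)


if __name__ == "__main__":
    versions = ["0.1.99", "0.1.271", "0.1.100"]
    print(max(versions))             # string comparison picks 0.1.99
    print(latest_version(versions))  # numeric comparison picks 0.1.271
```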
121
+
122
+ ### 3.5 Dataset Paths (if applicable)
123
+ Use GCS paths produced by previous pipeline runs. Discover them via:
124
+ ```bash
125
+ python3 skills/ml-tooling-dev/scripts/kf_query.py <previous_run_id>
126
+ ```
127
+ Alternatively, in the Kubeflow UI, open the run, select a succeeded step, and check its Output artifacts tab.
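Before putting paths into the config, a quick format check catches copy-paste slips. This sketch only checks that a path has the `gs://bucket/object` shape. It does not confirm the object exists, which the pre-launch checklist still requires. `split_gcs_uri` is a hypothetical helper, and the bucket name below is made up.

```python
def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/data' into ('bucket', 'path/to/data').

    Format check only: raises ValueError for anything that is not a
    gs:// URI with both a bucket and an object path.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, obj = uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"GCS URI needs both a bucket and an object path: {uri!r}")
    return bucket, obj


if __name__ == "__main__":
    print(split_gcs_uri("gs://example-bucket/runs/123/dataset"))
```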
128
+
129
+ ### 3.6 MLflow Run ID (if applicable)
130
+ For evaluation or encoding steps that need a trained model:
131
+ ```bash
132
+ python3 skills/ml-tooling-dev/scripts/mlflow_query.py model-for-predictor <predictor_id>
133
+ ```
134
+
135
+ ### 3.7 MongoDB Config Check
136
+ Read current training hyperparameters:
137
+ ```bash
138
+ mongosh "mongodb://10.11.96.21:27017/earlybirds" --quiet --eval '
139
+ const doc = db.predictors.findOne({"_id": ObjectId("<PREDICTOR_ID>")});
140
+ print(doc ? JSON.stringify(doc.config.batch, null, 2) : "No predictor found with this ID");
141
+ '
142
+ ```
143
+ Present the current config to the user. Ask if any changes are needed before the test run.
144
+ If changes are needed, generate the `updateOne` command (see `skills/ml-tooling-dev/references/mongodb-config.md`).
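A change to a training hyperparameter amounts to a MongoDB `$set` on a dot-notation path under `config.batch`. The sketch below only builds the `mongosh` command as text, so it can be reviewed before anything runs. The helper name and the `learning_rate` field are hypothetical. `skills/ml-tooling-dev/references/mongodb-config.md` remains the source of truth for field names and the exact command.

```python
import json
import re
import shlex


def build_update_command(predictor_id: str, changes: dict, uri: str) -> str:
    """Build (but do not run) a mongosh updateOne that $sets config.batch.<field>.

    Values are JSON-encoded, so strings, numbers, booleans and lists become
    valid JavaScript literals inside the --eval script.
    """
    if not re.fullmatch(r"[0-9a-fA-F]{24}", predictor_id):
        raise ValueError(f"not a 24-character hex ObjectId: {predictor_id!r}")
    set_doc = {f"config.batch.{field}": value for field, value in changes.items()}
    script = (
        f'db.predictors.updateOne({{"_id": ObjectId("{predictor_id}")}}, '
        f'{{"$set": {json.dumps(set_doc)}}})'
    )
    # shlex.quote protects the script (and URI) from the shell, including
    # any single quotes inside string values
    return f"mongosh {shlex.quote(uri)} --quiet --eval {shlex.quote(script)}"


if __name__ == "__main__":
    print(build_update_command(
        "64f0a12b5856b11b7aa4e71e",
        {"learning_rate": 0.001},  # hypothetical field name
        "mongodb://10.11.96.21:27017/earlybirds",
    ))
```

`shlex.quote` is used instead of hand-wrapping the script in single quotes. This keeps the command valid even when a string value contains an apostrophe.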
145
+
146
+ ### 3.8 Resource Overrides
147
+ Use defaults from `skills/ml-tooling-dev/references/pipeline-configs.md` unless the user specifies:
148
+ - CPU/memory requests and limits
149
+ - GPU type and count
150
+ - Disk size
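Applying overrides amounts to a shallow merge in which only the values the user actually specified replace the defaults. In this sketch the key names and default values are placeholders, not the real defaults from `pipeline-configs.md`. `apply_resource_overrides` is a hypothetical helper.

```python
# Placeholder defaults only: the real values live in
# skills/ml-tooling-dev/references/pipeline-configs.md
DEFAULT_RESOURCES = {
    "cpu_request": "2",
    "cpu_limit": "4",
    "memory_request": "8Gi",
    "memory_limit": "16Gi",
    "gpu_count": 0,
    "disk_size": "50Gi",
}


def apply_resource_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: the defaults, with user-specified values replacing them.

    Keys set to None in overrides count as "not specified". Unknown keys
    raise KeyError so typos (e.g. 'memroy_limit') are not silently ignored.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown resource keys: {sorted(unknown)}")
    merged = dict(defaults)  # copy so the defaults are never mutated
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged


if __name__ == "__main__":
    print(apply_resource_overrides(DEFAULT_RESOURCES, {"memory_limit": "32Gi"}))
```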
151
+
152
+ ---
153
+
154
+ ## Output
155
+
156
+ Generate the following:
157
+
158
+ ### 1. Complete Kubeflow Config JSON
159
+ A ready-to-submit JSON file, saved to `/tmp/<pipeline>-<predictor_id>-test.json`.
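Writing the config to the path pattern above can be sketched as follows. The helper names are hypothetical, and the config content is whatever Stage 3 produced. The function returns a resolved path because the launch command expects an absolute config path.

```python
import json
from pathlib import Path


def config_path(pipeline: str, predictor_id: str, directory: str = "/tmp") -> Path:
    """Build <directory>/<pipeline>-<predictor_id>-test.json (default /tmp)."""
    return Path(directory) / f"{pipeline}-{predictor_id}-test.json"


def save_config(config: dict, pipeline: str, predictor_id: str, directory: str = "/tmp") -> Path:
    """Write config as pretty-printed JSON and return its absolute path."""
    path = config_path(pipeline, predictor_id, directory).resolve()
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder config; in practice this is the JSON assembled in Stage 3
    saved = save_config({"example": True}, "clip_learning_pipeline", "64f0a12b5856b11b7aa4e71e")
    print(saved)
```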
160
+
161
+ ### 2. Pre-Launch Checklist
162
+ - [ ] `version_name` verified via `kf_query.py --pipeline-versions`
163
+ - [ ] MongoDB config confirmed (show current values)
164
+ - [ ] Dataset paths validated (exist in GCS)
165
+ - [ ] Experiment exists in Kubeflow
166
+
167
+ ### 3. Launch Command
168
+ ```bash
169
+ cd attraqt-kubeflow-configs/scripts
170
+ python -m run -c <absolute_path_to_config>
171
+ ```
172
+
173
+ ### 4. Verification Steps
174
+ - Monitor run: `python3 skills/ml-tooling-dev/scripts/kf_query.py <run_id>`
175
+ - Check failed steps: `python3 skills/ml-tooling-dev/scripts/kf_query.py <run_id> --failed`
176
+ - Expected step outcomes for each pipeline step
177
+ - Pod log patterns to watch for
178
+
179
+ ### 5. Failure Recovery
180
+ - Debug failed steps: see `skills/ml-tooling-dev/references/kubectl-debug.md`
181
+ - Common failure patterns and fixes
182
+ - How to re-run individual failed steps
183
+
184
+ ---
185
+
186
+ ## Skill Dependencies
187
+
188
+ This skill invokes `ai.pierre:ml-tooling-dev` for:
189
+ - Config templates and validation
190
+ - Kubeflow/MLflow query commands
191
+ - MongoDB read/update operations
192
+ - kubectl debugging commands