@rishildi/ldi-process-skills 0.1.2 → 0.1.4
@@ -1 +1 @@
-
{"version":3,"file":"embedded.d.ts","sourceRoot":"","sources":["../../src/skills/embedded.ts"],"names":[],"mappings":"AAGA,MAAM,WAAW,SAAS;IACxB,YAAY,EAAE,MAAM,CAAC;IACrB,OAAO,EAAE,MAAM,CAAC;CACjB;AAED,MAAM,WAAW,aAAa;IAC5B,IAAI,EAAE,MAAM,CAAC;IACb,QAAQ,EAAE,MAAM,CAAC;IACjB,KAAK,EAAE,SAAS,EAAE,CAAC;CACpB;AAED,eAAO,MAAM,eAAe,EAAE,aAAa,
+
{"version":3,"file":"embedded.d.ts","sourceRoot":"","sources":["../../src/skills/embedded.ts"],"names":[],"mappings":"AAGA,MAAM,WAAW,SAAS;IACxB,YAAY,EAAE,MAAM,CAAC;IACrB,OAAO,EAAE,MAAM,CAAC;CACjB;AAED,MAAM,WAAW,aAAa;IAC5B,IAAI,EAAE,MAAM,CAAC;IACb,QAAQ,EAAE,MAAM,CAAC;IACjB,KAAK,EAAE,SAAS,EAAE,CAAC;CACpB;AAED,eAAO,MAAM,eAAe,EAAE,aAAa,EAuP1C,CAAC"}
package/build/skills/embedded.js
CHANGED
@@ -1,5 +1,5 @@
// AUTO-GENERATED by scripts/embed-skills.ts — do not edit
-
// Generated at: 2026-04-04T20:
+
// Generated at: 2026-04-04T20:56:18.349Z
export const EMBEDDED_SKILLS = [
{
name: "create-fabric-lakehouses",
@@ -183,7 +183,15 @@ export const EMBEDDED_SKILLS = [
files: [
{
relativePath: "SKILL.md",
-
content: "---\nname: fabric-process-discovery\ndescription: >\n Use this skill to conduct the initial environment discovery conversation for any\n Microsoft Fabric process workflow. Collects workspace access, deployment approach,\n access control preferences, capacity, and data location through an adaptive,\n one-question-at-a-time conversation grounded in what the downstream Fabric skills\n actually require. Output is a structured environment profile used by the\n orchestrating agent to plan execution. Triggers as Sub-Agent 0 in any Fabric\n process workflow agent.\nlicense: MIT\ncompatibility: Works in any Claude context — no external tools required at this stage.\n---\n\n# Fabric Process Discovery\n\n> ⚠️ **GOVERNANCE**: This skill only gathers context — it never executes commands or\n> creates resources. All collected information feeds into the execution plan which the\n> operator reviews and confirms before anything runs.\n\n## Workflow\n\n1. Read the process requirements and identify which domains below are relevant.\n2. Ask one question at a time, branching adaptively based on each answer.\n3. Collect all path decisions and any parameter values the operator has available.\n4. Present a confirmation summary and wait for explicit approval.\n5. Write the environment profile and append to `CHANGE_LOG.md`.\n\nRuns a structured, adaptive discovery conversation before any Fabric work begins.\nAsk **one question at a time**. Branch based on each answer before deciding what to\nask next. Every question must explain why it matters. Never leave the user blocked.\n\n## Principles\n\nThese are not scripts to follow — they are the reasoning the model should apply\nwhen deriving and sequencing questions.\n\n**1. Read requirements first.**\nBefore asking anything, read the process requirements. Identify which domains below\nare relevant. Only ask about what the requirements actually need — do not run\nthrough all domains for every process.\n\n**2. 
Ask one question at a time.**\nNever present multiple questions in one turn. Ask the most important unresolved\nquestion, wait for the answer, then decide what to ask next based on that answer.\nThis produces cleaner answers and better branching.\n\n**3. Always explain why.**\nEvery question must briefly state what it unlocks or what it blocks. Users answer\nbetter when they understand the purpose.\n\n**4. Always offer a way forward.**\nEvery question should include an option to provide the answer later (placeholder),\nor to skip the step if it is optional. For questions requiring specific values the\nuser may not have ready (names, IDs, capacity names), offer a command or\ninstruction that helps them find it. Never leave the user stuck.\n\n**5. Distinguish path decisions from parameter values.**\n- **Path decisions** (can you create workspaces? what deployment approach?) determine\n the plan structure — always collect these during discovery.\n- **Parameter values** (exact workspace names, group Object IDs, capacity name) are\n needed before execution — collect them now if the user has them, or flag them as\n \"required before running\" if not.\n\n**6. Trust the model's intelligence.**\nThe domains below describe what to establish and the technical context needed to\nask good questions. Do not read them as scripts. 
Derive clear, natural questions\nfrom the requirements and the conversation so far.\n\n---\n\n## Domains\n\nCover only the domains relevant to the process requirements:\n\n| Process involves | Domains to cover |\n|---|---|\n| Creating workspaces | A, B, C, D, F |\n| Creating lakehouses | A, D, F |\n| Ingesting files (CSV/PDF) | D, E |\n| Running notebooks/scripts | D, F |\n| Full pipeline | All domains |\n\n---\n\n### Domain A — Workspace access\n\n**What to establish:**\n- Can the operator create new workspaces, or must they use existing ones?\n- If creating: what names do they want?\n- If using existing: what are the exact names?\n\n**Technical context:**\n- Workspace names are case-sensitive in `fab` paths. Always confirm exact casing.\n- If the operator is unsure whether they have create rights, `fab ls` will show\n workspaces they already have access to. Command requires `fab` CLI installed first:\n `pip install ms-fabric-cli` → `fab auth login` → `fab ls`.\n- Read the requirements to determine how many workspaces are needed (e.g. hub and\n spoke, or a single workspace) before asking.\n\n**Branch:**\n- Can create → collect intended workspace names (or use placeholder if not decided)\n- Cannot create → collect exact names of existing workspaces to use\n- Unsure → offer the `fab ls` command to check; proceed once confirmed\n\n---\n\n### Domain B — Domain assignment\n\n**What to establish:**\n- Does the operator want to assign the workspace(s) to a Fabric domain?\n- If yes: assign to existing domain, or create a new one?\n- If creating a new domain: do they have Fabric Admin rights?\n\n**Technical context:**\n- Domain assignment is optional. 
Many teams skip it and add it later.\n- Assigning to an existing domain requires no special rights beyond workspace access.\n- **Creating a new domain requires Fabric Administrator rights — this is a\n tenant-level permission, not workspace-level.** If the operator is unsure, default\n to assigning an existing domain or skipping. Do not assume they have these rights.\n- Domain assignment can always be done later via the Fabric portal.\n\n**Branch:**\n- Assign to existing domain → collect domain name\n- Create new domain → confirm Fabric Admin rights; if uncertain or no → mark as\n manual gate, note the intended domain name for the plan\n- Skip → no domain parameters needed\n\n---\n\n### Domain C — Access control\n\n**What to establish:**\n- Beyond the workspace creator (automatically assigned as Admin), should additional\n users or security groups be assigned workspace roles?\n- If groups: how will the Object IDs be obtained?\n\n**Technical context:**\n- **The Fabric REST API requires Entra group Object IDs (GUIDs) — display names are\n not accepted programmatically.** This is a hard API requirement.\n- Individual users can be identified by email address (UPN) — no Object ID needed.\n- Object IDs can be found via:\n - Azure portal: Azure Active Directory → Groups → select group → Object ID field\n - Azure CLI: `az ad group show --group \"Display Name\" --query id -o tsv`\n - PowerShell: `Get-MgGroup -Filter \"displayName eq 'Name'\" | Select-Object Id`\n- **If the deployment approach is a PySpark notebook AND security groups are involved:\n `notebookutils` inside a Fabric notebook cannot query Microsoft Graph.** The\n notebook cannot resolve group display names to Object IDs at runtime. 
Options:\n (a) operator provides Object IDs directly before running, (b) IDs are resolved via\n Azure CLI or PowerShell before the notebook is run, (c) switch to PowerShell or\n terminal deployment for the role assignment step.\n\n**Branch:**\n- No additional access → skip role collection\n- Users only → collect email addresses and intended roles\n- Security groups → ask if the operator can see the groups in the Azure portal:\n - Yes → ask if they will provide Object IDs directly, or want the agent to\n generate Azure CLI lookup commands to retrieve them automatically\n - No / unsure → mark group role assignment as manual; provide portal instructions\n- Mix of users and groups → handle each type appropriately\n\n**Roles available:** Admin, Member, Contributor, Viewer\n\n---\n\n### Domain D — Deployment approach\n\n**What to establish:**\n- How does the operator want to run the generated scripts or notebooks?\n\n**Technical context:**\n- **All three approaches use the Fabric CLI (`fab`) internally.** This is not a\n question about whether to use the CLI — it is about how the operator runs the\n generated artefacts.\n- **PySpark notebook:** imported into a Fabric workspace and run cell-by-cell in the\n Fabric UI. Authentication is automatic via `notebookutils`. Best for operators\n who prefer working inside Fabric and want step-by-step visibility.\n- **PowerShell script:** a `.ps1` file the operator reviews and runs locally.\n Requires `fab` CLI installed locally (`pip install ms-fabric-cli`) and PowerShell.\n- **Terminal commands:** individual `fab` commands run one at a time in a terminal.\n Requires `fab` CLI installed locally. 
Best for operators who want full control\n and visibility at each step.\n- If the operator chooses notebook AND has Entra group role assignments, flag the\n Service Principal constraint from Domain C before proceeding.\n\n---\n\n### Domain E — Source data\n\n*Only ask if the process involves ingesting files.*\n\n**What to establish:**\n- Where are the source files (CSVs, PDFs, etc.)?\n\n**Technical context:**\n- Local files require an upload step before they can be referenced in Fabric.\n- Files already in OneLake can be referenced by path directly — no upload needed.\n- Files in SharePoint or Azure Blob Storage can be connected via Fabric shortcuts,\n avoiding the need to copy data.\n\n**Branch:**\n- Local machine → include an upload step in the plan\n- Already in OneLake → collect the OneLake path; skip upload\n- Cloud storage (SharePoint / Azure Blob) → collect source URL; include shortcut\n creation step\n\n---\n\n### Domain F — Capacity\n\n*Ask whenever workspaces are being created.*\n\n**What to establish:**\n- What Fabric capacity will the workspace(s) be assigned to?\n\n**Technical context:**\n- Every Fabric workspace must be assigned to an active capacity at creation time.\n- The capacity must be in Active state — if it is paused, the operator must resume\n it in the Azure portal before running workspace creation.\n- The operator may not know the exact name. 
Options:\n - Run `fab ls` — capacity information appears in the output\n - Check the Fabric Admin portal under Capacities\n- If the operator does not have the name yet, use the placeholder `[CAPACITY_NAME]`\n and flag it as required before the notebook or script is run.\n\n---\n\n## What to Collect\n\nBy the end of discovery, the environment profile must include:\n\n**Path decisions** (always required — these determine the shape of the plan):\n- Workspace approach: creating new / using existing\n- Domain approach: new (manual if no admin rights) / existing / skipped\n- Access control: none / users only / groups / manual\n- Deployment approach: notebook / PowerShell / terminal\n- Group ID resolution method (if groups involved): direct / CLI lookup / manual\n\n**Parameter values** (collect if available; flag as required before run if not):\n- Workspace name(s) — exact, case-preserved\n- Capacity name\n- Domain name (if assigning)\n- Security group display names and intended roles\n- Group Object IDs (if the operator has them; otherwise flag as needed before run)\n- Existing workspace names (verbatim, if using existing)\n\n---\n\n## Confirmation\n\nBefore writing the environment profile, present a concise summary table of all path\ndecisions and collected parameters. Ask the operator to confirm accuracy. 
If anything\nis missing or unclear, ask only the targeted follow-up needed — do not restart from\nthe beginning.\n\nExample format:\n\n```\n| # | Question | Your answer | What this means |\n|---|-----------------------|------------------------------------|----------------------------------------------------|\n| A | Workspace creation | Creating new | Agent will create hub + spoke workspaces |\n| B | Domain assignment | New domain (manual gate) | Domain creation flagged manual — admin rights needed |\n| C | Access control | Security groups — IDs to be provided | Role assignment scripted; IDs needed before run |\n| D | Deployment approach | PySpark notebook | Agent generates .ipynb for import into Fabric |\n| F | Capacity | ldifabricdev | Embedded in notebook |\n```\n\n---\n\n## Output\n\nSave the confirmed profile as `00-environment-discovery/environment-profile.md`.\n\nInclude:\n- All path decisions\n- All collected parameter values\n- Parameters flagged as required before execution, with instructions for obtaining them\n- Manual gates — steps the operator must perform themselves, and why\n- Deployment prerequisites (e.g. `pip install ms-fabric-cli` if PowerShell or terminal)\n\nAppend to `CHANGE_LOG.md`:\n`[{DATETIME}] Sub-Agent 0 complete — environment-profile.md produced. [N] path decisions recorded. Manual gates: [list or none]. Parameters still needed: [list or none].`\n\n---\n\n## Gotchas\n\n- **Never frame deployment as CLI vs no-CLI.** All three approaches use `fab`. The\n question is only about how the operator runs the generated artefacts.\n- **Workspace names are case-sensitive in `fab` paths.** Always confirm exact casing.\n- **Entra group Object IDs are GUIDs, not display names.** The Fabric REST API will\n reject display names. 
If the user provides a name, generate a lookup command rather\n than scripting the assignment directly.\n- **`notebookutils` does not support Microsoft Graph.** A Fabric notebook cannot\n resolve group display names to Object IDs at runtime. Either the operator provides\n IDs directly, or resolution must happen outside the notebook.\n- **Domain creation requires Fabric Administrator rights — tenant-level.** Workspace\n Admin rights are not sufficient. Default to assigning an existing domain or skipping\n if there is any doubt about the operator's rights.\n- **Never leave the user blocked.** If a step requires permissions they don't have,\n always offer: (a) skip and mark as manual, (b) produce a spec for their admin, or\n (c) substitute a UI-based workaround.\n",
+
content: "---\nname: fabric-process-discovery\ndescription: >\n Use this skill to conduct the initial environment discovery conversation for any\n Microsoft Fabric process workflow. Collects workload scope, workspace access,\n deployment approach, access control, capacity, data location, and environment\n promotion needs through a FATA-aligned, one-question-at-a-time adaptive\n conversation. Output is a structured environment profile used by the orchestrating\n agent to plan execution. Triggers as Sub-Agent 0 in any Fabric process workflow agent.\nlicense: MIT\ncompatibility: Works in any Claude context — no external tools required at this stage.\n---\n\n# Fabric Process Discovery\n\n> ⚠️ **GOVERNANCE**: This skill only gathers context — it never executes commands or\n> creates resources. All collected information feeds into the execution plan which the\n> operator reviews and confirms before anything runs.\n>\n> ⚠️ **PRIVACY**: Never ask for passwords, tokens, client secrets, or Object IDs\n> during discovery. If a Service Principal is needed, record that it is required and\n> the permissions needed — the operator enters credential values at runtime only.\n\n## Workflow\n\n1. Adopt a Fabric architect expert perspective before asking anything.\n2. Read process requirements — identify which domains are relevant.\n3. Ask Phase 1 (contextual + historical background) first.\n4. Work through relevant domains one question at a time, branching on each answer.\n5. Present a confirmation summary and wait for explicit approval.\n6. 
Write the environment profile and append to `CHANGE_LOG.md`.\n\n## References\n\n- `references/technical-constraints.md` — authentication separation, Object IDs,\n `notebookutils` Graph limitation, Service Principal requirements, capacity state\n- `references/fabric-architecture.md` — workload landscape, medallion architecture,\n environment promotion patterns, credential management\n\nLoad the relevant reference when a domain question requires deeper technical context\nor when the operator asks a technical follow-up.\n\n---\n\n## Core Principles\n\n**1. Expert perspective first.**\nBefore generating questions, reason as a senior Fabric architect. Ask: *what gaps,\nif left unfilled, would cause the plan to fail or need rework?* Surface things the\noperator may not know they need to tell you.\n\n**2. One question at a time — Yes/No or 3–4 labelled options.**\nNever present multiple questions in one turn. Each question must be answerable with\na yes/no or a single choice (A/B/C or A/B/C/D). Wait for the answer before\nbranching. In Fabric discovery, each answer materially changes which questions are\nworth asking next — this is why one-at-a-time is correct here even though FATA\ndefaults to single-turn efficiency.\n\n**3. Scaffold before asking.**\nOne sentence of context before each question explaining what it establishes and why\nit matters for the plan. Operators new to Fabric cannot anticipate what a Fabric\narchitect considers essential.\n\n**4. Cover all five FATA dimensions.**\n\n| Dimension | What to establish |\n|---|---|\n| **Contextual** | Project background, team, experience level |\n| **Constraint-based** | Permissions, tooling, licensing |\n| **Preference-oriented** | Deployment style, governance vs speed, reuse goals |\n| **Environmental** | Capacity, workloads, existing workspaces, data locations |\n| **Historical** | Previous runs, naming conventions, existing patterns |\n\n**5. 
Path decisions vs parameter values.**\nPath decisions (can you create workspaces? which workloads?) determine plan structure\n— always collect. Parameter values (exact names, IDs) — collect now if available,\notherwise flag as *required before running*.\n\n**6. Offer a way forward on every question.**\nInclude an \"I'm not sure / I'll find out\" option. For specific values the operator\nmay not have ready, offer the command to retrieve them.\n\n**7. Prevent over-questioning.**\nOnly cover domains the requirements actually need. Stop when all path decisions are\nresolved. Roughly: 4–6 questions for simple processes, up to 12 for full pipelines.\n\n---\n\n## Question Sequence\n\n### Phase 1 — Contextual and Historical (always run first)\n\nEstablish background before specifics. Ask one question covering:\n- Is this a new setup, an extension of something existing, or a migration?\n- (If extending/migrating) Are there naming conventions or existing patterns to follow?\n\nOptions should cover: new / extending existing / migrating / unsure.\nThese answers shape the level of explanation needed in later questions and whether\ndefaults can be inferred from what already exists.\n\n---\n\n### Phase 2 — Relevant Domains\n\nSelect domains based on requirements. 
Work through them in order, one question at a\ntime, completing each branch before moving to the next domain.\n\n| Process involves | Domains to cover |\n|---|---|\n| Creating workspaces | A, B, C, D, F, G |\n| Creating lakehouses | A, D, F, G + medallion question |\n| Ingesting files | D, E |\n| Full pipeline (multiple workloads) | Workload scope question first, then A–G |\n| Notebooks / scripts only | D, F |\n\n---\n\n#### Workload scope (ask first for full pipelines)\n\n*Only ask when requirements span more than one workload or mention end-to-end pipelines.*\n\nOne sentence of context: the answer determines which downstream skills are needed\nand what the workspace structure should look like.\n\nQuestion: Which Fabric workloads does this process involve? (Select all that apply)\n- A) Lakehouse / Spark (Delta tables, PySpark notebooks, file ingestion)\n- B) Data Warehouse (T-SQL analytics)\n- C) Pipelines (orchestration, data movement)\n- D) KQL / Eventhouse (real-time or time-series data)\n- E) Power BI / Semantic Model (reporting layer)\n\nLoad `references/fabric-architecture.md` → Workload Landscape for downstream skill mapping.\n\n---\n\n#### Domain A — Workspace access (Constraint-based + Environmental)\n\n**Establish:** Can the operator create workspaces, or must they use existing ones?\nWhat names?\n\nQuestion: Can you create new Fabric workspaces?\n- A) Yes — I can create workspaces\n- B) No — I need to use existing workspaces\n- C) I'm not sure — I can run `fab ls` to check\n\nBranch:\n- A → ask for intended names (or placeholder); if lakehouses involved, ask whether\n medallion naming is expected (load `references/fabric-architecture.md` → Medallion)\n- B → ask for exact verbatim names of existing workspaces (case-sensitive in `fab`)\n- C → provide `fab ls` command (`pip install ms-fabric-cli` → `fab auth login` → `fab ls`); wait; branch as A or B\n\n---\n\n#### Domain B — Domain assignment (Constraint-based)\n\n**Establish:** Should workspaces be 
assigned to a Fabric domain?\n\nQuestion: Would you like to assign these workspaces to a Fabric domain?\n- A) Yes — assign to an existing domain (provide name)\n- B) Yes — create a new domain for these workspaces\n- C) No — skip for now\n\nBranch:\n- B → ask if they have Fabric Administrator rights (Yes / No / Unsure);\n No or Unsure → mark as manual gate, note intended domain name for documentation\n\n---\n\n#### Domain C — Access control (Environmental + Constraint-based)\n\n**Establish:** Who else needs workspace access? How will group identifiers be obtained?\n\nKey constraint: Fabric REST API requires Entra group **Object IDs** — display names\nare not accepted. Load `references/technical-constraints.md` → Entra Group Object IDs\nfor resolution methods.\n\nQuestion: Beyond yourself as Admin, does anyone else need workspace access?\n- A) No — just me for now\n- B) Yes — specific users (by email address)\n- C) Yes — Entra security groups\n- D) Yes — a mix of users and groups\n\nBranch (C or D):\n- Ask: Can you see these security groups in the Azure portal\n (Azure Active Directory → Groups)?\n - Yes → Ask: will you provide Object IDs directly, or should the agent generate\n Azure CLI lookup commands?\n - Either way: flag Object IDs as required before run; ask for group names and roles\n - No → mark group role assignment as manual gate; provide portal navigation instructions\n\nIf notebook deployment is chosen AND groups are involved: flag the `notebookutils`\nGraph limitation. Load `references/technical-constraints.md` → notebookutils and\nMicrosoft Graph. 
Ask whether a Service Principal is available or if the operator\nprefers to switch to PowerShell/terminal for role assignment.\n\n**Roles available:** Admin, Member, Contributor, Viewer\n\n---\n\n#### Domain D — Deployment approach (Preference-oriented)\n\n**Establish:** How does the operator prefer to run generated artefacts?\n\nKey context: all three approaches use the Fabric CLI (`fab`) internally — this is\nabout how the operator runs the generated artefacts, not whether they use the CLI.\nPowerShell and terminal approaches require **two separate logins**: `fab auth login`\n(Fabric) AND `az login` (Azure CLI, for group lookups). Load\n`references/technical-constraints.md` → Authentication for details.\n\nQuestion: How would you like to run the generated scripts or notebooks?\n- A) PySpark notebook — import into Fabric and run cell-by-cell in the Fabric UI\n- B) PowerShell script — review and run locally\n- C) Individual CLI commands — run step-by-step in the terminal\n\n---\n\n#### Domain E — Source data (Environmental)\n\n*Only ask if the process involves ingesting files.*\n\n**Establish:** Where are the source files?\n\nQuestion: Where are the source files you want to ingest?\n- A) On my local machine\n- B) Already in OneLake / Fabric (I have the path)\n- C) In cloud storage — SharePoint, Azure Blob, or similar\n\nBranch:\n- A → include upload step in plan\n- B → ask for OneLake path; skip upload\n- C → ask for source URL; include shortcut creation step\n\n---\n\n#### Domain F — Capacity (Environmental)\n\n*Ask whenever workspaces are being created.*\n\n**Establish:** What Fabric capacity will workspaces be assigned to?\n\nNote: capacity must be in Active state at creation time. 
Load\n`references/technical-constraints.md` → Capacity State Prerequisite if relevant.\n\nQuestion: Do you know the name of the Fabric capacity to use?\n- A) Yes — I know it (provide the name)\n- B) I can find it — I'll check via `fab ls` or the Fabric Admin portal\n- C) I'll provide it later — use a placeholder for now\n\n---\n\n#### Domain G — Environments (Constraint-based + Preference-oriented)\n\n*Ask whenever the process will run beyond a one-off or dev-only context.*\n\n**Establish:** How many environments need to be supported? This determines whether\nthe plan needs promotion logic and parameterised naming.\n\nLoad `references/fabric-architecture.md` → Environment Promotion for naming patterns.\n\nQuestion: Is this deployment for a single environment, or will it need to be\npromoted across environments?\n- A) Dev only — single environment, no promotion needed\n- B) Dev + prod — two environments, plan should parameterise workspace references\n- C) Dev + test + prod — three environments with a full promotion path\n- D) I'm not sure yet — build for single environment and we'll add promotion later\n\n---\n\n### Phase 3 — Credential management (ask if a Service Principal was flagged)\n\n*Only ask if Domain C or Domain D established that a Service Principal is needed.*\n\n**Establish:** How should SP credentials be managed in the generated artefacts?\n\nLoad `references/fabric-architecture.md` → Credential Management for options.\n\nQuestion: How would you like to handle the Service Principal credentials in the\ngenerated notebook or script?\n- A) Azure Key Vault reference — retrieve the secret at runtime from Key Vault\n- B) Runtime parameter entry — I'll paste in the values when running\n- C) Environment variable — set in my terminal session before running\n\n---\n\n### Phase 4 — Preference check\n\nAfter domains are resolved, ask one closing question if optional steps were\nidentified:\n\nQuestion: For optional steps (e.g. 
domain assignment, access control), would you\nprefer to include everything now or keep it minimal and add governance steps later?\n- A) Include everything — complete setup now\n- B) Keep it minimal — flag optional steps as manual for later\n- C) Decide step by step — confirm each optional item as we go\n\n---\n\n## Confirmation\n\nPresent a summary table before writing the profile. Include the FATA dimension for\neach item. Ask for explicit confirmation. If gaps remain, ask only the targeted\nfollow-up needed.\n\n```\n| # | Dimension | Question | Answer | What this means |\n|---|---------------|---------------------|-----------------------------|----------------------------------------------|\n| 0 | Contextual | Project context | New setup | No existing conventions to inherit |\n| A | Environmental | Workspace creation | Creating new | Agent creates workspaces |\n| B | Constraint | Domain assignment | New (manual gate) | Flagged manual — Fabric Admin rights needed |\n| C | Environmental | Access control | Groups — IDs direct | IDs required before run |\n| D | Preference | Deployment | PySpark notebook | .ipynb generated for Fabric import |\n| F | Environmental | Capacity | ldifabricdev | Embedded in notebook |\n| G | Constraint | Environments | Dev + prod | Plan parameterises all workspace references |\n```\n\n---\n\n## Output\n\nSave as `00-environment-discovery/environment-profile.md`. Include:\n- All path decisions (with FATA dimension)\n- Collected parameter values\n- Parameters flagged as required before execution (with retrieval instructions)\n- Manual gates with reason and operator instructions\n- Deployment prerequisites (auth steps, CLI installation)\n- Contextual/historical notes affecting naming or structure\n\nAppend to `CHANGE_LOG.md`:\n`[{DATETIME}] Sub-Agent 0 complete — environment-profile.md produced. [N] path decisions recorded. Manual gates: [list or none]. 
Parameters still needed: [list or none].`\n\n---\n\n## Gotchas\n\n- **Never frame deployment as CLI vs no-CLI** — all three approaches use `fab`\n- **`az login` and `fab auth login` are separate** — both required for PowerShell/terminal deployments that include group lookups\n- **Workspace names are case-sensitive** — confirm exact casing from `fab ls` output\n- **Entra group Object IDs required** — display names rejected by Fabric API; see `references/technical-constraints.md`\n- **`notebookutils` cannot query Microsoft Graph** — notebook + groups = SP or pre-resolved IDs required\n- **Domain creation = Fabric Admin rights** — not workspace-level; default to skip if uncertain\n- **Never collect credential values** — flag that they are needed; operator enters at runtime\n- **Stop when path decisions are resolved** — do not continue asking once the plan structure is clear\n",
+
},
+
{
+
relativePath: "references/fabric-architecture.md",
+
content: "# Fabric Architecture Reference\n\nLoad this file when questions arise about workload scope, environment promotion,\nmedallion architecture, or credential management patterns.\n\n---\n\n## Fabric Workload Landscape\n\nUnderstanding which workloads a process involves determines which downstream skills\nare needed and what environment questions are relevant.\n\n| Workload | Primary use | Downstream skill |\n|---|---|---|\n| **Lakehouse / Spark** | Delta tables, PySpark notebooks, file ingestion | spark-authoring-cli |\n| **Data Warehouse** | T-SQL analytics, structured serving layer | sqldw-authoring-cli |\n| **Pipelines** | Orchestration, data movement between workloads | Fabric Data Factory |\n| **KQL / Eventhouse** | Real-time and time-series analytics | eventhouse-authoring-cli |\n| **Power BI / Semantic Model** | Reporting layer, DAX, XMLA | powerbi-authoring-cli |\n| **Data Science / Agents** | ML models, conversational data agents | Fabric Data Science |\n\nMost process workflows involve a subset of these. Establishing workload scope early\nlets the plan delegate correctly and avoids discovering scope gaps mid-execution.\n\n**Ask about workload scope when:** requirements mention more than one of the above,\nor when the process spans ingestion → transformation → reporting (full pipeline).\n\n---\n\n## Medallion Architecture (Bronze / Silver / Gold)\n\nThe standard Fabric data engineering pattern organises data into three layers:\n\n| Layer | Contains | Format |\n|---|---|---|\n| **Bronze** | Raw ingested data — unmodified | Delta tables, files |\n| **Silver** | Validated, cleaned, conformed data | Delta tables |\n| **Gold** | Aggregated, business-ready data | Delta tables, views |\n\n**Why it matters for discovery:**\n- Lakehouse naming conventions typically reflect the layer\n (e.g. 
`lh_bronze`, `lh_silver`, `lh_gold`)\n- Shortcut and schema structures differ by layer\n- Pipelines must include validation gates between Bronze→Silver and Silver→Gold\n transitions — omitting these creates hard-to-debug data quality issues\n\n**Ask about medallion pattern when:** requirements involve lakehouses, ingestion,\nor transformation steps. The operator may not use the bronze/silver/gold naming —\nask whether they follow this pattern or have an existing naming convention.\n\n---\n\n## Environment Promotion (Dev / Test / Prod)\n\nThe FabricDataEngineer agent mandates explicit environment parameterisation.\nOne-off implementation choices that cannot be promoted across environments are\nexplicitly avoided.\n\n**What this means for discovery:**\n\n| Scenario | Plan impact |\n|---|---|\n| Dev only | Single workspace set; no promotion logic needed |\n| Dev + prod | Two workspace sets; plan must parameterise all workspace/lakehouse references |\n| Dev + test + prod | Three sets; deployment pipeline or scripted promotion required |\n\nWhen multiple environments are in scope:\n- Workspace names should follow a consistent pattern (e.g. 
`[Name]-Dev`, `[Name]-Prod`)\n- All IDs and names must be externalised — never hardcoded\n- The environment profile should record the naming pattern for each environment\n\n**Ask about environments when:** the process will run in production, or when the\noperator mentions promotion, CI/CD, or deploying to other teams.\n\n---\n\n## Credential Management\n\nCredentials required by Fabric processes (Service Principal secrets, storage keys,\nAPI tokens) should never be hardcoded in notebooks or scripts.\n\n| Method | Best for | Notes |\n|---|---|---|\n| **Azure Key Vault** | Production environments | Requires Key Vault resource + permissions |\n| **Notebook parameters** | Development / interactive runs | Operator enters at runtime; not stored |\n| **Environment variables** | Local PowerShell/terminal scripts | Set in shell session; not persisted |\n| **Fabric environment secrets** | Shared Spark environments | Requires Fabric environment configuration |\n\n**During discovery:** Do not collect credential values. If the plan requires a\nService Principal or storage credential, ask how the operator wants to manage it —\nKey Vault reference, runtime parameter entry, or environment variable. Record the\napproach in the environment profile so generated notebooks and scripts use the\ncorrect pattern.\n\n---\n\n## Developer vs Consumer Patterns\n\nUnderstanding the operator's role prevents over-engineering the plan:\n\n**Developers** (building pipelines, creating artefacts):\n- Use Fabric REST APIs for creating/managing workspaces, lakehouses, notebooks\n- Use protocol-specific connections for data access (Spark, ODBC/JDBC, XMLA, KQL)\n- Relevant to this skill — discovery is aimed at developers\n\n**Consumers** (querying data, running reports):\n- Use MCP servers or Fabric UI for natural language / report access\n- Typically do not need workspace creation or deployment scripts\n- If the operator is a consumer, scope the plan accordingly\n",
|
|
191
|
+
},
|
|
192
|
+
{
|
|
193
|
+
relativePath: "references/technical-constraints.md",
|
|
194
|
+
content: "# Technical Constraints Reference\n\nLoad this file when an operator's answer raises a technical question about\nauthentication, API limitations, or Fabric-specific constraints.\n\n---\n\n## Authentication — Two Separate Steps\n\nFabric CLI and Azure CLI use **different authentication sessions**. Both are\nrequired whenever the deployment involves Azure CLI lookups (e.g. resolving\nEntra group Object IDs) alongside Fabric CLI workspace operations.\n\n| Tool | Login command | Used for |\n|---|---|---|\n| Fabric CLI (`fab`) | `fab auth login` | Workspace creation, role assignment, lakehouse ops |\n| Azure CLI (`az`) | `az login` | Entra group/user Object ID resolution |\n\nOperators who choose PowerShell or terminal deployment must complete **both** logins\nbefore running the generated scripts. The generated artefacts will include both\ncommands with a clear note that they are separate.\n\nFor PySpark notebooks inside Fabric: authentication is automatic via\n`notebookutils.credentials.getToken('pbi')` — no manual login needed.\nHowever, this token covers Power BI / Fabric REST APIs only (see below).\n\n---\n\n## Entra Group Object IDs\n\nThe Fabric REST API and Fabric CLI require **Object IDs (GUIDs)** for group role\nassignment — display names are not accepted. This is a hard API constraint.\n\nResolution options for operators:\n\n| Method | Command | Requires |\n|---|---|---|\n| Azure portal | AAD → Groups → select → Object ID field | Portal access |\n| Azure CLI | `az ad group show --group \"Name\" --query id -o tsv` | `az login` |\n| PowerShell (Graph) | `Get-MgGroup -Filter \"displayName eq 'Name'\" \\| Select-Object Id` | Microsoft.Graph module |\n\nAlways ask operators to confirm group display names exactly as they appear in AAD —\nnames are case-sensitive in the API.\n\n---\n\n## `notebookutils` and Microsoft Graph\n\n`notebookutils.credentials.getToken('pbi')` inside a Fabric notebook returns a\nPower BI / Fabric scoped token. 
It **cannot** obtain a Microsoft Graph token.\n\nThis means a Fabric notebook **cannot**:\n- Look up Entra group Object IDs at runtime\n- Query AAD for user or group information\n- Call any Graph API endpoint\n\n**Consequence:** If the deployment approach is a PySpark notebook AND the plan\nincludes Entra group role assignment, one of these must be true before the notebook runs:\n- The operator provides Object IDs directly (entered into a parameter cell)\n- Object IDs are resolved via Azure CLI or PowerShell beforehand and passed in\n\nIf neither is practical, steer the operator toward PowerShell or terminal deployment\nfor the role assignment step — both support `az login` → Graph lookups inline.\n\n---\n\n## Service Principal — When Required\n\nA Service Principal with application permissions is required only when a Fabric\nnotebook needs to call Microsoft Graph at runtime. This applies when:\n- Deployment = PySpark notebook\n- Role assignment includes Entra groups\n- Operator wants ID resolution to happen inside the notebook automatically\n\nRequired SP permissions: `Group.Read.All` + `User.Read.All` (application, not delegated),\nwith admin consent granted in Azure AD.\n\n**During discovery:** Do not ask for SP credentials. Record that one is required,\nnote the permissions needed, and flag credential management as a runtime concern\n(see `fabric-architecture.md` → Credential Management).\n\n---\n\n## Workspace Name Case Sensitivity\n\nWorkspace names in `fab` paths are case-sensitive. `fab ls` returns exact names —\nalways confirm the operator is using the verbatim casing from that output.\n\nCommon failure: workspace names with leading/trailing spaces, or names that differ\nonly in capitalisation (e.g. 
`Finance Hub` vs `finance hub`).\n\n---\n\n## Capacity State Prerequisite\n\nA Fabric workspace must be assigned to an **Active** capacity at creation time.\nIf the capacity is paused, workspace creation will fail with `CapacityNotInActiveState`.\n\nThe operator must resume the capacity in the Azure portal before running the\nworkspace creation step. Flag this in the environment profile if there is any\nuncertainty about capacity state.\n",
|
|
187
195
|
},
|
|
188
196
|
],
|
|
189
197
|
},
|