@rishildi/ldi-process-skills 0.1.3 → 0.1.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1 +1 @@
-{"version":3,"file":"embedded.d.ts","sourceRoot":"","sources":["../../src/skills/embedded.ts"],"names":[],"mappings":"AAGA,MAAM,WAAW,SAAS;IACxB,YAAY,EAAE,MAAM,CAAC;IACrB,OAAO,EAAE,MAAM,CAAC;CACjB;AAED,MAAM,WAAW,aAAa;IAC5B,IAAI,EAAE,MAAM,CAAC;IACb,QAAQ,EAAE,MAAM,CAAC;IACjB,KAAK,EAAE,SAAS,EAAE,CAAC;CACpB;AAED,eAAO,MAAM,eAAe,EAAE,aAAa,
+{"version":3,"file":"embedded.d.ts","sourceRoot":"","sources":["../../src/skills/embedded.ts"],"names":[],"mappings":"AAGA,MAAM,WAAW,SAAS;IACxB,YAAY,EAAE,MAAM,CAAC;IACrB,OAAO,EAAE,MAAM,CAAC;CACjB;AAED,MAAM,WAAW,aAAa;IAC5B,IAAI,EAAE,MAAM,CAAC;IACb,QAAQ,EAAE,MAAM,CAAC;IACjB,KAAK,EAAE,SAAS,EAAE,CAAC;CACpB;AAED,eAAO,MAAM,eAAe,EAAE,aAAa,EAuP1C,CAAC"}
package/build/skills/embedded.js CHANGED
@@ -1,5 +1,5 @@
 // AUTO-GENERATED by scripts/embed-skills.ts — do not edit
-// Generated at: 2026-04-04T20:
+// Generated at: 2026-04-04T20:56:18.349Z
 export const EMBEDDED_SKILLS = [
 {
 name: "create-fabric-lakehouses",
@@ -183,7 +183,15 @@ export const EMBEDDED_SKILLS = [
 files: [
 {
 relativePath: "SKILL.md",
-content: "---\nname: fabric-process-discovery\ndescription: >\n Use this skill to conduct the initial environment discovery conversation for any\n Microsoft Fabric process workflow. Collects workspace access, deployment approach,\n access control preferences, capacity, and data location through a FATA-aligned,\n one-question-at-a-time adaptive conversation grounded in what the downstream Fabric\n skills actually require. Output is a structured environment profile used by the\n orchestrating agent to plan execution. Triggers as Sub-Agent 0 in any Fabric\n process workflow agent.\nlicense: MIT\ncompatibility: Works in any Claude context — no external tools required at this stage.\n---\n\n# Fabric Process Discovery\n\n> ⚠️ **GOVERNANCE**: This skill only gathers context — it never executes commands or\n> creates resources. All collected information feeds into the execution plan which the\n> operator reviews and confirms before anything runs.\n>\n> ⚠️ **PRIVACY**: Never ask for passwords, access tokens, client secrets, or any\n> credential values. If the plan requires a Service Principal, record only that one\n> is needed — not the values. Credentials are entered by the operator at runtime,\n> not during discovery.\n\n## Workflow\n\n1. Adopt a Fabric architect expert perspective before asking anything.\n2. Read process requirements and identify which domains are relevant.\n3. Gather contextual and historical background first (one question).\n4. Work through relevant domains — one question at a time, branching on each answer.\n5. Present a confirmation summary and wait for explicit approval.\n6. Write the environment profile and append to `CHANGE_LOG.md`.\n\n---\n\n## Core Principles\n\nThese govern how every question is asked. They are not optional — apply all of them\nthroughout the conversation.\n\n**1. Adopt expert perspective first (FATA: Domain Expert Activation).**\nBefore generating any questions, reason as a senior Fabric architect reviewing the\nrequirements. Ask yourself: *what information gaps, if left unfilled, would cause\nthe plan to fail or need rework?* Those are the questions worth asking. Surface\nthings the operator may not know they need to tell you.\n\n**2. One question at a time — Yes/No or 3–4 options.**\nNever present multiple questions in one turn. Each question must be answerable with\na yes/no or a single choice from 3–4 clearly labelled options (A/B/C or A/B/C/D).\nWait for the answer before deciding what to ask next. This is intentional:\nin Fabric discovery, each answer materially changes which questions are relevant —\npresenting all questions at once produces noise. Single-turn efficiency is the right\ndefault for general LLMs; one-at-a-time branching is correct here.\n\n**3. Scaffold before asking (FATA: User Experience Scaffolding).**\nBefore each question, write one sentence explaining what the question is trying to\nunderstand and why it matters for the plan. Operators new to Fabric cannot anticipate\nwhat a Fabric architect considers essential. Make the purpose visible.\n\n**4. Cover all five FATA information dimensions.**\nStructure discovery to address all five dimensions — not just the obvious ones:\n\n| Dimension | What to establish |\n|---|---|\n| **Contextual** | Project background, team, experience level with Fabric |\n| **Constraint-based** | Permissions, tooling, licensing limits |\n| **Preference-oriented** | Deployment style, governance priorities, reuse goals |\n| **Environmental** | Capacity, existing workspaces, data locations |\n| **Historical** | Previous runs, existing naming conventions, known issues |\n\n**5. Always offer a way forward.**\nEvery question must include an option equivalent to \"I'm not sure / I'll find out.\"\nFor questions requiring specific values (names, IDs), offer a command the operator\ncan run to retrieve them. Never leave the operator blocked.\n\n**6. Distinguish path decisions from parameter values.**\n- **Path decisions** determine the shape of the plan — always collect these.\n- **Parameter values** (exact names, IDs) are needed before execution — collect now\n if the operator has them, otherwise flag as *required before running*.\n\n**7. Prevent over-questioning.**\nCover only the domains the requirements actually need. For simple processes (e.g.\na single notebook), 4–6 questions is sufficient. For a full pipeline, up to 10 is\nreasonable. Stop when all path decisions are resolved — do not ask about things that\nwon't change the plan.\n\n**8. Protect privacy.**\nDo not ask for credentials, secrets, tokens, or Object IDs at this stage. If the\nplan needs a Service Principal, record that one is required and note the permissions\nneeded — the operator enters values at runtime.\n\n---\n\n## Question Sequence\n\n### Phase 1 — Contextual and Historical (always run first)\n\nAsk about background before asking about specifics. This sets the right level of\nexplanation for subsequent questions and surfaces constraints the operator may not\nthink to mention.\n\n**Contextual background question** — ask something like:\n*\"To make sure I pitch the questions at the right level — is this your first time\nsetting up a Fabric environment for this project, or are you extending something\nthat already exists?\"*\n\nOptions should cover: brand new setup / extending an existing one / rebuilding or\nmigrating from somewhere else / unsure.\n\n**Historical question** (ask if the answer above suggests existing work) — ask\nsomething like:\n*\"Are there existing naming conventions, workspace patterns, or previous deployments\nI should follow or be aware of?\"*\n\nOptions: yes (they'll describe) / no / unsure.\n\nThese answers shape how specific later questions need to be and whether defaults\ncan be inferred from what already exists.\n\n---\n\n### Phase 2 — Relevant Domains\n\nCover only the domains relevant to the process requirements. Typical mapping:\n\n| Process involves | Domains to cover |\n|---|---|\n| Creating workspaces | A, B, C, D, F |\n| Creating lakehouses | A, D, F |\n| Ingesting files (CSV/PDF) | D, E |\n| Running notebooks/scripts | D, F |\n| Full pipeline | All domains |\n\nWork through domains in order A → F, skipping irrelevant ones. Within each domain,\nask one question and branch before moving to the next domain.\n\n---\n\n#### Domain A — Workspace access (Constraint-based + Environmental)\n\n**What to establish:** Can the operator create new workspaces, or must they use\nexisting ones? What are the names?\n\n**Technical context:**\n- Workspace names are case-sensitive in `fab` paths.\n- If unsure about create rights: `pip install ms-fabric-cli` → `fab auth login`\n → `fab ls`. If workspace names are returned, they have access.\n- Read requirements to determine how many workspaces are needed before asking.\n\n**Question format:** Can you create new Fabric workspaces?\n- A) Yes — I can create workspaces\n- B) No — I need to use existing workspaces\n- C) I'm not sure — I can run `fab ls` to check\n\n**Branch:**\n- A → ask for intended names (or placeholder if not decided yet)\n- B → ask for exact names of existing workspaces (verbatim — case-sensitive)\n- C → provide the `fab ls` command; wait for output; branch as A or B\n\n---\n\n#### Domain B — Domain assignment (Constraint-based)\n\n**What to establish:** Should workspaces be assigned to a Fabric domain? If yes,\ndoes the operator have the rights needed?\n\n**Technical context:**\n- Domain assignment is optional and can be done later via the portal.\n- Assigning to an *existing* domain requires no special rights.\n- *Creating* a new domain requires Fabric Administrator rights (tenant-level —\n not the same as Workspace Admin). Default to \"skip\" or \"assign existing\" if\n there is any doubt.\n\n**Question format:** Would you like to assign these workspaces to a Fabric domain?\n- A) Yes — assign to an existing domain\n- B) Yes — create a new domain for these workspaces\n- C) No — skip domain assignment for now\n\n**Branch:**\n- A → ask for the domain name\n- B → ask if they have Fabric Administrator rights (Yes / No / Unsure);\n if No or Unsure → mark as manual gate, note intended domain name for documentation\n- C → no domain parameters needed\n\n---\n\n#### Domain C — Access control (Environmental + Constraint-based)\n\n**What to establish:** Who else needs access? How will group identifiers be obtained?\n\n**Technical context:**\n- The workspace creator is automatically assigned as Admin — no action needed.\n- Individual users are identified by email address (UPN) — straightforward.\n- **Entra security groups require Object IDs (GUIDs) — the Fabric REST API does not\n accept display names.** This is a hard API constraint, not a preference.\n- Object IDs can be found: Azure portal (AAD → Groups → select → Object ID field),\n Azure CLI (`az ad group show --group \"Name\" --query id -o tsv`), or PowerShell\n (`Get-MgGroup -Filter \"displayName eq 'Name'\" | Select-Object Id`).\n- **If deployment is a PySpark notebook AND groups are involved:** `notebookutils`\n cannot query Microsoft Graph. Either provide Object IDs directly, resolve via\n Azure CLI/PowerShell before running, or switch deployment approach for this step.\n- Do not ask for Object ID values during discovery — flag that they will be needed\n and establish how they will be obtained.\n\n**Question format:** Beyond yourself as Admin, does anyone else need access?\n- A) No — just me for now\n- B) Yes — specific users (by email)\n- C) Yes — Entra security groups\n- D) Yes — a mix of users and groups\n\n**Branch:**\n- A → skip role collection\n- B → ask for email addresses and intended roles (Admin/Member/Contributor/Viewer)\n- C or D → ask: \"Can you see the security groups in the Azure portal\n (Azure Active Directory → Groups)?\"\n - Yes → ask: will you provide Object IDs directly, or should the agent generate\n Azure CLI lookup commands to retrieve them automatically?\n - Provide directly → flag IDs as required before run; ask for group names and roles\n - CLI lookup → note that lookup commands will be generated; ask for group names and roles\n - No → mark group role assignment as manual gate; provide portal instructions\n\n---\n\n#### Domain D — Deployment approach (Preference-oriented)\n\n**What to establish:** How does the operator prefer to run generated scripts/notebooks?\n\n**Technical context:**\n- **All three approaches use the Fabric CLI (`fab`) internally.** This is not a\n question about whether to use the CLI — it is about how the operator runs the\n generated artefacts.\n- PySpark notebook: runs inside the Fabric UI cell-by-cell. Authentication is\n automatic. Best for operators who prefer working inside Fabric.\n- PowerShell script: reviewed and run locally. Requires `fab` CLI installed\n (`pip install ms-fabric-cli`) and PowerShell.\n- Terminal commands: `fab` commands run one at a time interactively. Requires `fab`\n CLI installed locally. Best for operators who want step-by-step control.\n- If notebook is chosen AND Entra groups are involved, flag the Service Principal\n constraint from Domain C.\n\n**Question format:** How would you like to run the generated artefacts?\n- A) PySpark notebook — import into Fabric and run cell-by-cell in the Fabric UI\n- B) PowerShell script — review and run locally\n- C) Individual CLI commands — run interactively in the terminal, one step at a time\n\n---\n\n#### Domain E — Source data (Environmental)\n\n*Only ask if the process involves ingesting files.*\n\n**What to establish:** Where are the source files?\n\n**Technical context:**\n- Local files require an upload step before they can be used in Fabric.\n- Files already in OneLake can be referenced by path directly.\n- SharePoint/Azure Blob files can be connected via Fabric shortcuts — no copying needed.\n\n**Question format:** Where are the source files you want to ingest?\n- A) On my local machine\n- B) Already in OneLake / Fabric\n- C) In cloud storage (SharePoint, Azure Blob, etc.)\n\n**Branch:**\n- A → include upload step in plan\n- B → ask for OneLake path; skip upload\n- C → ask for source URL/path; include shortcut creation step\n\n---\n\n#### Domain F — Capacity (Environmental + Constraint-based)\n\n*Ask whenever workspaces are being created.*\n\n**What to establish:** What Fabric capacity will workspaces be assigned to?\n\n**Technical context:**\n- Every workspace must be assigned to an active capacity at creation.\n- Capacity must be in Active state — if paused, the operator resumes it in the\n Azure portal before running.\n- `fab ls` output includes capacity information. Also visible in the Fabric Admin portal.\n\n**Question format:** Do you know the name of the Fabric capacity to use?\n- A) Yes — I know it (provide the name)\n- B) I can find it — I'll run `fab ls` or check the Fabric Admin portal\n- C) I'll provide it later — use a placeholder for now\n\n**Branch:**\n- A → embed capacity name in plan\n- B → provide `fab ls` command; wait for name; embed in plan\n- C → use `[CAPACITY_NAME]` placeholder; flag as required before running\n\n---\n\n### Phase 3 — Preference check (Preference-oriented)\n\nAfter the main domains, ask one closing preference question if the requirements\ninvolve choices between rigour and speed:\n\n*\"For any optional steps (e.g. domain assignment, access control), would you prefer\nto include everything now for a complete setup, or keep it minimal and add\ngovernance steps later?\"*\n\n- A) Include everything — set it up completely now\n- B) Keep it minimal — flag optional steps as manual for later\n- C) Decide step by step — I'll confirm each optional item\n\nThis shapes how the plan presents optional components.\n\n---\n\n## Confirmation\n\nBefore writing the environment profile, present a concise summary table of all path\ndecisions and collected parameters. Ask the operator to confirm accuracy. If anything\nis missing or unclear, ask only the targeted follow-up needed.\n\n```\n| # | Dimension | Question | Your answer | What this means |\n|---|-----------------|---------------------- |--------------------------------------|------------------------------------------------------|\n| 0 | Contextual | Project context | New setup | No existing conventions to inherit |\n| A | Constraint | Workspace creation | Creating new | Agent will create hub + spoke workspaces |\n| B | Constraint | Domain assignment | New domain (manual gate) | Domain creation flagged manual — admin rights needed |\n| C | Environmental | Access control | Groups — IDs to be provided directly | Role assignment scripted; IDs needed before run |\n| D | Preference | Deployment approach | PySpark notebook | Agent generates .ipynb for import into Fabric |\n| F | Environmental | Capacity | ldifabricdev | Embedded in notebook |\n| | Preference | Setup completeness | Include everything | All optional steps included in plan |\n```\n\n---\n\n## Output\n\nSave the confirmed profile as `00-environment-discovery/environment-profile.md`.\n\nInclude:\n- All path decisions (with FATA dimension label)\n- All collected parameter values\n- Parameters flagged as required before execution, with instructions for obtaining them\n- Manual gates — steps the operator must perform themselves, and why\n- Deployment prerequisites (e.g. `pip install ms-fabric-cli` if PowerShell/terminal)\n- Any historical/contextual notes that should inform naming or structure decisions\n\nAppend to `CHANGE_LOG.md`:\n`[{DATETIME}] Sub-Agent 0 complete — environment-profile.md produced. [N] path decisions recorded. Manual gates: [list or none]. Parameters still needed: [list or none].`\n\n---\n\n## Gotchas\n\n- **Never frame deployment as CLI vs no-CLI.** All three approaches use `fab`.\n- **Workspace names are case-sensitive in `fab` paths.** Always confirm exact casing.\n- **Entra group Object IDs are GUIDs, not display names.** Do not ask for them during\n discovery — flag that they are needed and establish how they will be obtained.\n- **`notebookutils` does not support Microsoft Graph.** A Fabric notebook cannot\n resolve group names to Object IDs at runtime.\n- **Domain creation requires Fabric Administrator rights — tenant-level.** Default to\n assigning an existing domain or skipping if there is any doubt.\n- **Never ask for credentials, secrets, or token values.** Discovery is about shape\n and approach — not credentials. Flag that a Service Principal is needed; the\n operator provides the values at runtime.\n- **Never leave the user blocked.** If a step requires permissions they don't have,\n offer: (a) skip and mark as manual, (b) produce a spec for their admin, or\n (c) substitute a UI-based workaround.\n- **Stop when path decisions are resolved.** Do not continue asking questions once\n everything that affects the plan structure is known.\n",
+content: "---\nname: fabric-process-discovery\ndescription: >\n Use this skill to conduct the initial environment discovery conversation for any\n Microsoft Fabric process workflow. Collects workload scope, workspace access,\n deployment approach, access control, capacity, data location, and environment\n promotion needs through a FATA-aligned, one-question-at-a-time adaptive\n conversation. Output is a structured environment profile used by the orchestrating\n agent to plan execution. Triggers as Sub-Agent 0 in any Fabric process workflow agent.\nlicense: MIT\ncompatibility: Works in any Claude context — no external tools required at this stage.\n---\n\n# Fabric Process Discovery\n\n> ⚠️ **GOVERNANCE**: This skill only gathers context — it never executes commands or\n> creates resources. All collected information feeds into the execution plan which the\n> operator reviews and confirms before anything runs.\n>\n> ⚠️ **PRIVACY**: Never ask for passwords, tokens, client secrets, or Object IDs\n> during discovery. If a Service Principal is needed, record that it is required and\n> the permissions needed — the operator enters credential values at runtime only.\n\n## Workflow\n\n1. Adopt a Fabric architect expert perspective before asking anything.\n2. Read process requirements — identify which domains are relevant.\n3. Ask Phase 1 (contextual + historical background) first.\n4. Work through relevant domains one question at a time, branching on each answer.\n5. Present a confirmation summary and wait for explicit approval.\n6. Write the environment profile and append to `CHANGE_LOG.md`.\n\n## References\n\n- `references/technical-constraints.md` — authentication separation, Object IDs,\n `notebookutils` Graph limitation, Service Principal requirements, capacity state\n- `references/fabric-architecture.md` — workload landscape, medallion architecture,\n environment promotion patterns, credential management\n\nLoad the relevant reference when a domain question requires deeper technical context\nor when the operator asks a technical follow-up.\n\n---\n\n## Core Principles\n\n**1. Expert perspective first.**\nBefore generating questions, reason as a senior Fabric architect. Ask: *what gaps,\nif left unfilled, would cause the plan to fail or need rework?* Surface things the\noperator may not know they need to tell you.\n\n**2. One question at a time — Yes/No or 3–4 labelled options.**\nNever present multiple questions in one turn. Each question must be answerable with\na yes/no or a single choice (A/B/C or A/B/C/D). Wait for the answer before\nbranching. In Fabric discovery, each answer materially changes which questions are\nworth asking next — this is why one-at-a-time is correct here even though FATA\ndefaults to single-turn efficiency.\n\n**3. Scaffold before asking.**\nOne sentence of context before each question explaining what it establishes and why\nit matters for the plan. Operators new to Fabric cannot anticipate what a Fabric\narchitect considers essential.\n\n**4. Cover all five FATA dimensions.**\n\n| Dimension | What to establish |\n|---|---|\n| **Contextual** | Project background, team, experience level |\n| **Constraint-based** | Permissions, tooling, licensing |\n| **Preference-oriented** | Deployment style, governance vs speed, reuse goals |\n| **Environmental** | Capacity, workloads, existing workspaces, data locations |\n| **Historical** | Previous runs, naming conventions, existing patterns |\n\n**5. Path decisions vs parameter values.**\nPath decisions (can you create workspaces? which workloads?) determine plan structure\n— always collect. Parameter values (exact names, IDs) — collect now if available,\notherwise flag as *required before running*.\n\n**6. Offer a way forward on every question.**\nInclude an \"I'm not sure / I'll find out\" option. For specific values the operator\nmay not have ready, offer the command to retrieve them.\n\n**7. Prevent over-questioning.**\nOnly cover domains the requirements actually need. Stop when all path decisions are\nresolved. Roughly: 4–6 questions for simple processes, up to 12 for full pipelines.\n\n---\n\n## Question Sequence\n\n### Phase 1 — Contextual and Historical (always run first)\n\nEstablish background before specifics. Ask one question covering:\n- Is this a new setup, an extension of something existing, or a migration?\n- (If extending/migrating) Are there naming conventions or existing patterns to follow?\n\nOptions should cover: new / extending existing / migrating / unsure.\nThese answers shape the level of explanation needed in later questions and whether\ndefaults can be inferred from what already exists.\n\n---\n\n### Phase 2 — Relevant Domains\n\nSelect domains based on requirements. Work through them in order, one question at a\ntime, completing each branch before moving to the next domain.\n\n| Process involves | Domains to cover |\n|---|---|\n| Creating workspaces | A, B, C, D, F, G |\n| Creating lakehouses | A, D, F, G + medallion question |\n| Ingesting files | D, E |\n| Full pipeline (multiple workloads) | Workload scope question first, then A–G |\n| Notebooks / scripts only | D, F |\n\n---\n\n#### Workload scope (ask first for full pipelines)\n\n*Only ask when requirements span more than one workload or mention end-to-end pipelines.*\n\nOne sentence of context: the answer determines which downstream skills are needed\nand what the workspace structure should look like.\n\nQuestion: Which Fabric workloads does this process involve? (Select all that apply)\n- A) Lakehouse / Spark (Delta tables, PySpark notebooks, file ingestion)\n- B) Data Warehouse (T-SQL analytics)\n- C) Pipelines (orchestration, data movement)\n- D) KQL / Eventhouse (real-time or time-series data)\n- E) Power BI / Semantic Model (reporting layer)\n\nLoad `references/fabric-architecture.md` → Workload Landscape for downstream skill mapping.\n\n---\n\n#### Domain A — Workspace access (Constraint-based + Environmental)\n\n**Establish:** Can the operator create workspaces, or must they use existing ones?\nWhat names?\n\nQuestion: Can you create new Fabric workspaces?\n- A) Yes — I can create workspaces\n- B) No — I need to use existing workspaces\n- C) I'm not sure — I can run `fab ls` to check\n\nBranch:\n- A → ask for intended names (or placeholder); if lakehouses involved, ask whether\n medallion naming is expected (load `references/fabric-architecture.md` → Medallion)\n- B → ask for exact verbatim names of existing workspaces (case-sensitive in `fab`)\n- C → provide `fab ls` command (`pip install ms-fabric-cli` → `fab auth login` → `fab ls`); wait; branch as A or B\n\n---\n\n#### Domain B — Domain assignment (Constraint-based)\n\n**Establish:** Should workspaces be assigned to a Fabric domain?\n\nQuestion: Would you like to assign these workspaces to a Fabric domain?\n- A) Yes — assign to an existing domain (provide name)\n- B) Yes — create a new domain for these workspaces\n- C) No — skip for now\n\nBranch:\n- B → ask if they have Fabric Administrator rights (Yes / No / Unsure);\n No or Unsure → mark as manual gate, note intended domain name for documentation\n\n---\n\n#### Domain C — Access control (Environmental + Constraint-based)\n\n**Establish:** Who else needs workspace access? How will group identifiers be obtained?\n\nKey constraint: Fabric REST API requires Entra group **Object IDs** — display names\nare not accepted. Load `references/technical-constraints.md` → Entra Group Object IDs\nfor resolution methods.\n\nQuestion: Beyond yourself as Admin, does anyone else need workspace access?\n- A) No — just me for now\n- B) Yes — specific users (by email address)\n- C) Yes — Entra security groups\n- D) Yes — a mix of users and groups\n\nBranch (C or D):\n- Ask: Can you see these security groups in the Azure portal\n (Azure Active Directory → Groups)?\n - Yes → Ask: will you provide Object IDs directly, or should the agent generate\n Azure CLI lookup commands?\n - Either way: flag Object IDs as required before run; ask for group names and roles\n - No → mark group role assignment as manual gate; provide portal navigation instructions\n\nIf notebook deployment is chosen AND groups are involved: flag the `notebookutils`\nGraph limitation. Load `references/technical-constraints.md` → notebookutils and\nMicrosoft Graph. Ask whether a Service Principal is available or if the operator\nprefers to switch to PowerShell/terminal for role assignment.\n\n**Roles available:** Admin, Member, Contributor, Viewer\n\n---\n\n#### Domain D — Deployment approach (Preference-oriented)\n\n**Establish:** How does the operator prefer to run generated artefacts?\n\nKey context: all three approaches use the Fabric CLI (`fab`) internally — this is\nabout how the operator runs the generated artefacts, not whether they use the CLI.\nPowerShell and terminal approaches require **two separate logins**: `fab auth login`\n(Fabric) AND `az login` (Azure CLI, for group lookups). Load\n`references/technical-constraints.md` → Authentication for details.\n\nQuestion: How would you like to run the generated scripts or notebooks?\n- A) PySpark notebook — import into Fabric and run cell-by-cell in the Fabric UI\n- B) PowerShell script — review and run locally\n- C) Individual CLI commands — run step-by-step in the terminal\n\n---\n\n#### Domain E — Source data (Environmental)\n\n*Only ask if the process involves ingesting files.*\n\n**Establish:** Where are the source files?\n\nQuestion: Where are the source files you want to ingest?\n- A) On my local machine\n- B) Already in OneLake / Fabric (I have the path)\n- C) In cloud storage — SharePoint, Azure Blob, or similar\n\nBranch:\n- A → include upload step in plan\n- B → ask for OneLake path; skip upload\n- C → ask for source URL; include shortcut creation step\n\n---\n\n#### Domain F — Capacity (Environmental)\n\n*Ask whenever workspaces are being created.*\n\n**Establish:** What Fabric capacity will workspaces be assigned to?\n\nNote: capacity must be in Active state at creation time. Load\n`references/technical-constraints.md` → Capacity State Prerequisite if relevant.\n\nQuestion: Do you know the name of the Fabric capacity to use?\n- A) Yes — I know it (provide the name)\n- B) I can find it — I'll check via `fab ls` or the Fabric Admin portal\n- C) I'll provide it later — use a placeholder for now\n\n---\n\n#### Domain G — Environments (Constraint-based + Preference-oriented)\n\n*Ask whenever the process will run beyond a one-off or dev-only context.*\n\n**Establish:** How many environments need to be supported? This determines whether\nthe plan needs promotion logic and parameterised naming.\n\nLoad `references/fabric-architecture.md` → Environment Promotion for naming patterns.\n\nQuestion: Is this deployment for a single environment, or will it need to be\npromoted across environments?\n- A) Dev only — single environment, no promotion needed\n- B) Dev + prod — two environments, plan should parameterise workspace references\n- C) Dev + test + prod — three environments with a full promotion path\n- D) I'm not sure yet — build for single environment and we'll add promotion later\n\n---\n\n### Phase 3 — Credential management (ask if a Service Principal was flagged)\n\n*Only ask if Domain C or Domain D established that a Service Principal is needed.*\n\n**Establish:** How should SP credentials be managed in the generated artefacts?\n\nLoad `references/fabric-architecture.md` → Credential Management for options.\n\nQuestion: How would you like to handle the Service Principal credentials in the\ngenerated notebook or script?\n- A) Azure Key Vault reference — retrieve the secret at runtime from Key Vault\n- B) Runtime parameter entry — I'll paste in the values when running\n- C) Environment variable — set in my terminal session before running\n\n---\n\n### Phase 4 — Preference check\n\nAfter domains are resolved, ask one closing question if optional steps were\nidentified:\n\nQuestion: For optional steps (e.g. domain assignment, access control), would you\nprefer to include everything now or keep it minimal and add governance steps later?\n- A) Include everything — complete setup now\n- B) Keep it minimal — flag optional steps as manual for later\n- C) Decide step by step — confirm each optional item as we go\n\n---\n\n## Confirmation\n\nPresent a summary table before writing the profile. Include the FATA dimension for\neach item. Ask for explicit confirmation. If gaps remain, ask only the targeted\nfollow-up needed.\n\n```\n| # | Dimension | Question | Answer | What this means |\n|---|---------------|---------------------|-----------------------------|----------------------------------------------|\n| 0 | Contextual | Project context | New setup | No existing conventions to inherit |\n| A | Environmental | Workspace creation | Creating new | Agent creates workspaces |\n| B | Constraint | Domain assignment | New (manual gate) | Flagged manual — Fabric Admin rights needed |\n| C | Environmental | Access control | Groups — IDs direct | IDs required before run |\n| D | Preference | Deployment | PySpark notebook | .ipynb generated for Fabric import |\n| F | Environmental | Capacity | ldifabricdev | Embedded in notebook |\n| G | Constraint | Environments | Dev + prod | Plan parameterises all workspace references |\n```\n\n---\n\n## Output\n\nSave as `00-environment-discovery/environment-profile.md`. Include:\n- All path decisions (with FATA dimension)\n- Collected parameter values\n- Parameters flagged as required before execution (with retrieval instructions)\n- Manual gates with reason and operator instructions\n- Deployment prerequisites (auth steps, CLI installation)\n- Contextual/historical notes affecting naming or structure\n\nAppend to `CHANGE_LOG.md`:\n`[{DATETIME}] Sub-Agent 0 complete — environment-profile.md produced. [N] path decisions recorded. Manual gates: [list or none]. Parameters still needed: [list or none].`\n\n---\n\n## Gotchas\n\n- **Never frame deployment as CLI vs no-CLI** — all three approaches use `fab`\n- **`az login` and `fab auth login` are separate** — both required for PowerShell/terminal deployments that include group lookups\n- **Workspace names are case-sensitive** — confirm exact casing from `fab ls` output\n- **Entra group Object IDs required** — display names rejected by Fabric API; see `references/technical-constraints.md`\n- **`notebookutils` cannot query Microsoft Graph** — notebook + groups = SP or pre-resolved IDs required\n- **Domain creation = Fabric Admin rights** — not workspace-level; default to skip if uncertain\n- **Never collect credential values** — flag that they are needed; operator enters at runtime\n- **Stop when path decisions are resolved** — do not continue asking once the plan structure is clear\n",
},
{
relativePath: "references/fabric-architecture.md",
content: "# Fabric Architecture Reference\n\nLoad this file when questions arise about workload scope, environment promotion,\nmedallion architecture, or credential management patterns.\n\n---\n\n## Fabric Workload Landscape\n\nUnderstanding which workloads a process involves determines which downstream skills\nare needed and what environment questions are relevant.\n\n| Workload | Primary use | Downstream skill |\n|---|---|---|\n| **Lakehouse / Spark** | Delta tables, PySpark notebooks, file ingestion | spark-authoring-cli |\n| **Data Warehouse** | T-SQL analytics, structured serving layer | sqldw-authoring-cli |\n| **Pipelines** | Orchestration, data movement between workloads | Fabric Data Factory |\n| **KQL / Eventhouse** | Real-time and time-series analytics | eventhouse-authoring-cli |\n| **Power BI / Semantic Model** | Reporting layer, DAX, XMLA | powerbi-authoring-cli |\n| **Data Science / Agents** | ML models, conversational data agents | Fabric Data Science |\n\nMost process workflows involve a subset of these. Establishing workload scope early\nlets the plan delegate correctly and avoids discovering scope gaps mid-execution.\n\n**Ask about workload scope when:** requirements mention more than one of the above,\nor when the process spans ingestion → transformation → reporting (full pipeline).\n\n---\n\n## Medallion Architecture (Bronze / Silver / Gold)\n\nThe standard Fabric data engineering pattern organises data into three layers:\n\n| Layer | Contains | Format |\n|---|---|---|\n| **Bronze** | Raw ingested data — unmodified | Delta tables, files |\n| **Silver** | Validated, cleaned, conformed data | Delta tables |\n| **Gold** | Aggregated, business-ready data | Delta tables, views |\n\n**Why it matters for discovery:**\n- Lakehouse naming conventions typically reflect the layer\n (e.g. `lh_bronze`, `lh_silver`, `lh_gold`)\n- Shortcut and schema structures differ by layer\n- Pipelines must include validation gates between Bronze→Silver and Silver→Gold\n transitions — omitting these creates hard-to-debug data quality issues\n\n**Ask about medallion pattern when:** requirements involve lakehouses, ingestion,\nor transformation steps. The operator may not use the bronze/silver/gold naming —\nask whether they follow this pattern or have an existing naming convention.\n\n---\n\n## Environment Promotion (Dev / Test / Prod)\n\nThe FabricDataEngineer agent mandates explicit environment parameterisation.\nOne-off implementation choices that cannot be promoted across environments are\nexplicitly avoided.\n\n**What this means for discovery:**\n\n| Scenario | Plan impact |\n|---|---|\n| Dev only | Single workspace set; no promotion logic needed |\n| Dev + prod | Two workspace sets; plan must parameterise all workspace/lakehouse references |\n| Dev + test + prod | Three sets; deployment pipeline or scripted promotion required |\n\nWhen multiple environments are in scope:\n- Workspace names should follow a consistent pattern (e.g. `[Name]-Dev`, `[Name]-Prod`)\n- All IDs and names must be externalised — never hardcoded\n- The environment profile should record the naming pattern for each environment\n\n**Ask about environments when:** the process will run in production, or when the\noperator mentions promotion, CI/CD, or deploying to other teams.\n\n---\n\n## Credential Management\n\nCredentials required by Fabric processes (Service Principal secrets, storage keys,\nAPI tokens) should never be hardcoded in notebooks or scripts.\n\n| Method | Best for | Notes |\n|---|---|---|\n| **Azure Key Vault** | Production environments | Requires Key Vault resource + permissions |\n| **Notebook parameters** | Development / interactive runs | Operator enters at runtime; not stored |\n| **Environment variables** | Local PowerShell/terminal scripts | Set in shell session; not persisted |\n| **Fabric environment secrets** | Shared Spark environments | Requires Fabric environment configuration |\n\n**During discovery:** Do not collect credential values. If the plan requires a\nService Principal or storage credential, ask how the operator wants to manage it —\nKey Vault reference, runtime parameter entry, or environment variable. Record the\napproach in the environment profile so generated notebooks and scripts use the\ncorrect pattern.\n\n---\n\n## Developer vs Consumer Patterns\n\nUnderstanding the operator's role prevents over-engineering the plan:\n\n**Developers** (building pipelines, creating artefacts):\n- Use Fabric REST APIs for creating/managing workspaces, lakehouses, notebooks\n- Use protocol-specific connections for data access (Spark, ODBC/JDBC, XMLA, KQL)\n- Relevant to this skill — discovery is aimed at developers\n\n**Consumers** (querying data, running reports):\n- Use MCP servers or Fabric UI for natural language / report access\n- Typically do not need workspace creation or deployment scripts\n- If the operator is a consumer, scope the plan accordingly\n",
},
{
relativePath: "references/technical-constraints.md",
content: "# Technical Constraints Reference\n\nLoad this file when an operator's answer raises a technical question about\nauthentication, API limitations, or Fabric-specific constraints.\n\n---\n\n## Authentication — Two Separate Steps\n\nFabric CLI and Azure CLI use **different authentication sessions**. Both are\nrequired whenever the deployment involves Azure CLI lookups (e.g. resolving\nEntra group Object IDs) alongside Fabric CLI workspace operations.\n\n| Tool | Login command | Used for |\n|---|---|---|\n| Fabric CLI (`fab`) | `fab auth login` | Workspace creation, role assignment, lakehouse ops |\n| Azure CLI (`az`) | `az login` | Entra group/user Object ID resolution |\n\nOperators who choose PowerShell or terminal deployment must complete **both** logins\nbefore running the generated scripts. The generated artefacts will include both\ncommands with a clear note that they are separate.\n\nFor PySpark notebooks inside Fabric: authentication is automatic via\n`notebookutils.credentials.getToken('pbi')` — no manual login needed.\nHowever, this token covers Power BI / Fabric REST APIs only (see below).\n\n---\n\n## Entra Group Object IDs\n\nThe Fabric REST API and Fabric CLI require **Object IDs (GUIDs)** for group role\nassignment — display names are not accepted. This is a hard API constraint.\n\nResolution options for operators:\n\n| Method | Command | Requires |\n|---|---|---|\n| Azure portal | AAD → Groups → select → Object ID field | Portal access |\n| Azure CLI | `az ad group show --group \"Name\" --query id -o tsv` | `az login` |\n| PowerShell (Graph) | `Get-MgGroup -Filter \"displayName eq 'Name'\" \\| Select-Object Id` | Microsoft.Graph module |\n\nAlways ask operators to confirm group display names exactly as they appear in AAD —\nnames are case-sensitive in the API.\n\n---\n\n## `notebookutils` and Microsoft Graph\n\n`notebookutils.credentials.getToken('pbi')` inside a Fabric notebook returns a\nPower BI / Fabric scoped token. It **cannot** obtain a Microsoft Graph token.\n\nThis means a Fabric notebook **cannot**:\n- Look up Entra group Object IDs at runtime\n- Query AAD for user or group information\n- Call any Graph API endpoint\n\n**Consequence:** If the deployment approach is a PySpark notebook AND the plan\nincludes Entra group role assignment, one of these must be true before the notebook runs:\n- The operator provides Object IDs directly (entered into a parameter cell)\n- Object IDs are resolved via Azure CLI or PowerShell beforehand and passed in\n\nIf neither is practical, steer the operator toward PowerShell or terminal deployment\nfor the role assignment step — both support `az login` → Graph lookups inline.\n\n---\n\n## Service Principal — When Required\n\nA Service Principal with application permissions is required only when a Fabric\nnotebook needs to call Microsoft Graph at runtime. This applies when:\n- Deployment = PySpark notebook\n- Role assignment includes Entra groups\n- Operator wants ID resolution to happen inside the notebook automatically\n\nRequired SP permissions: `Group.Read.All` + `User.Read.All` (application, not delegated),\nwith admin consent granted in Azure AD.\n\n**During discovery:** Do not ask for SP credentials. Record that one is required,\nnote the permissions needed, and flag credential management as a runtime concern\n(see `fabric-architecture.md` → Credential Management).\n\n---\n\n## Workspace Name Case Sensitivity\n\nWorkspace names in `fab` paths are case-sensitive. `fab ls` returns exact names —\nalways confirm the operator is using the verbatim casing from that output.\n\nCommon failure: workspace names with leading/trailing spaces, or names that differ\nonly in capitalisation (e.g. `Finance Hub` vs `finance hub`).\n\n---\n\n## Capacity State Prerequisite\n\nA Fabric workspace must be assigned to an **Active** capacity at creation time.\nIf the capacity is paused, workspace creation will fail with `CapacityNotInActiveState`.\n\nThe operator must resume the capacity in the Azure portal before running the\nworkspace creation step. Flag this in the environment profile if there is any\nuncertainty about capacity state.\n",
},
],
},