@rishildi/ldi-process-skills-test 0.0.26 → 0.0.28
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/build/skills/embedded.js +9 -9
- package/package.json +1 -1
package/build/skills/embedded.js (CHANGED)
@@ -1,5 +1,5 @@
 // AUTO-GENERATED by scripts/embed-skills.ts — do not edit
-// Generated at: 2026-04-05T20:
+// Generated at: 2026-04-05T20:47:34.298Z
 export const EMBEDDED_SKILLS = [
   {
     name: "create-fabric-lakehouses",
@@ -7,7 +7,7 @@ export const EMBEDDED_SKILLS = [
     files: [
       {
         relativePath: "SKILL.md",
-
content: "---\nname: create-fabric-lakehouses\ndescription: >\n Use this skill when asked to create, provision, or set up one or more\n Lakehouse items in existing Microsoft Fabric workspaces. Triggers on:\n \"create a lakehouse\", \"provision lakehouses\", \"set up a Fabric lakehouse\",\n \"create lakehouse in Fabric\", \"new lakehouse\", \"create lakehouses across\n workspaces\". Does NOT trigger for: creating workspaces (use\n generate-fabric-workspace), querying lakehouse data, managing tables,\n uploading files, creating shortcuts, or general Fabric workspace management.\nlicense: MIT\ncompatibility: Fabric CLI (fab) installed and authenticated; Python 3.10+ for notebook approach\n---\n\n# Create Fabric Lakehouse\n\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\n> review and run — it never executes commands directly against a live Fabric environment.\n> Present each generated artefact to the operator before they run it.\n>\n> ⚠️ **DETERMINISTIC**: Follow the workflow and `references/notebook-template.py`\n> pattern exactly. Adapt only the parameters (workspace/lakehouse names, schema\n> settings). Never rewrite the authentication pattern, restructure the notebook, or\n> suggest alternative Python libraries. The only permitted skip is omitting a step\n> the operator lacks permissions for — document any skip in the definition file.\n\nProvisions one or more empty Lakehouse items across one or more existing\nMicrosoft Fabric workspaces, using a user-chosen approach, and produces an\naudit-trail definition file.\n\n**Companion skills:** Workspace creation is handled by the\n`generate-fabric-workspace` skill. Shortcut creation between lakehouses is\na separate skill / manual step. 
This skill assumes target workspaces already\nexist.\n\n## Orchestrated Context\n\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\nand the SOP before asking the user anything.\n\n| Parameter | Source when orchestrated |\n|---|---|\n| Deployment approach (notebook / PowerShell / terminal) | Environment profile |\n| Workspace name(s) | Environment profile or implementation plan |\n| Naming convention / prefix | Implementation plan or SOP |\n| Medallion layer(s) to create (Bronze / Silver / Gold) | SOP shared parameters |\n| Schema-enabled preference | SOP or implementation plan |\n\n**Only ask for parameters not found in these documents.** Summarise what was resolved\nautomatically, then ask for what remains (e.g. lakehouse core name, description).\n\n## Prerequisites\n\nBefore starting, ask the operator to run the following and share the output:\n\n```bash\nfab auth status # Must show authenticated\nfab ls # Must return workspace list\n```\n\nIf not authenticated, ask the operator to run `fab auth login` first.\n\n## Workflow\n\nExecute these steps in order.\n\n### Step 1 — Choose Provisioning Approach\n\nAsk the user which approach they want to follow:\n\n| Approach | Description | Best for |\n|----------|-------------|----------|\n| **A — PySpark Notebook** | Generates a `.py` notebook script that installs `ms-fabric-cli` and uses `!fab` commands. Output for the user to run in their Fabric workspace. | Users who want a reusable notebook artefact in Fabric |\n| **B — PowerShell Script** | Generates a PowerShell script containing `fab` CLI commands. Output for user validation before execution. | Users who prefer a single script to review and run locally |\n| **C — Interactive CLI** | Runs `fab` commands one-by-one in the terminal, pausing for user validation after each step. 
| Users who want maximum control and visibility |\n\n### Step 2 — Collect Workspace & Lakehouse Definitions (Sequential)\n\nCollect definitions **one workspace at a time**. For each workspace, gather:\n\n#### 2a — Target Workspace\n\n- [ ] **Workspace name** — must already exist. Verify with:\n ```bash\n fab exists \"<WorkspaceName>.Workspace\"\n ```\n If the workspace does not exist, inform the user and suggest they run\n the `generate-fabric-workspace` skill first. Do not proceed for that\n workspace until it exists.\n\n#### 2b — Naming Convention\n\nSuggest the default naming pattern: `{Prefix}_{CoreName}_{Suffix}`\n\n| Component | Description | Default | Example |\n|-----------|-------------|---------|---------|\n| **Prefix** | Item type indicator | `LH` | `LH` |\n| **CoreName** | Business/project name | *(user provides)* | `LANDONREVENUE` |\n| **Suffix** | Medallion layer or purpose | `BRONZE`, `SILVER`, `GOLD` | `BRONZE` |\n| **Separator** | Character between components | `_` | `_` |\n\nExample result: `LH_LANDONREVENUE_BRONZE`\n\nPresent the suggested defaults and **ask the user to confirm or override**\neach component. The user may change any component or use fully custom names\nthat don't follow the pattern at all.\n\n#### 2c — Lakehouse Definitions\n\nFor each lakehouse in this workspace, collect:\n\n- [ ] **Name** — generated from the naming convention, or custom\n- [ ] **Description** — optional text describing the lakehouse's purpose\n- [ ] **Schema-enabled** — yes/no (default: no). See\n `references/schema-enabled.md` for guidance.\n\n#### 2d — More Workspaces?\n\nAfter finishing one workspace, ask:\n\n> \"Do you have another workspace to provision lakehouses in, or are we done?\"\n\nIf yes, loop back to Step 2a. If done, proceed to Step 3.\n\n### Step 3 — Validate Inputs\n\nBefore generating anything, validate **all** lakehouse definitions:\n\n1. 
For each workspace, confirm it exists:\n ```bash\n fab exists \"<WorkspaceName>.Workspace\"\n ```\n — if it does not exist, stop and direct the user to create it first\n\n2. For each lakehouse, check it doesn't already exist:\n ```bash\n fab exists \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\"\n ```\n — if it already exists, warn the user and ask whether to **skip** or\n **rename**\n\n3. Validate lakehouse names against naming constraints (see Gotchas)\n\n### Step 4 — Generate & Execute\n\nBranch by the approach chosen in Step 1. Process workspaces sequentially.\n\n**Maintain an audit log** throughout execution — record every command run and\nits outcome. This log feeds into the definition file in Step 6.\n\n#### Approach A — PySpark Notebook\n\n1. Generate a PySpark notebook using the template in\n `references/notebook-template.py`\n2. The notebook pattern is:\n - Install `ms-fabric-cli` via `%pip install ms-fabric-cli -q`\n - Authenticate using `notebookutils.credentials.getToken('pbi')` for `FAB_TOKEN`\n and `FAB_TOKEN_AZURE`, and `notebookutils.credentials.getToken('storage')` for\n `FAB_TOKEN_ONELAKE` (OneLake requires the storage-scope token)\n - Add pip's scripts directory to `PATH` so `!fab` works\n - Use `!fab mkdir` shell commands for standard lakehouses\n - Use `!fab api` with REST payload for schema-enabled lakehouses\n3. The notebook must include:\n - A configuration cell with all workspace/lakehouse definitions\n - Existence checks before each creation\n - A summary cell at the end\n4. Save to `outputs/create-fabric-lakehouses_{YYYY-MM-DD_HH-MM}_{USERNAME}/<workspace>_create_lakehouses.py`\n5. Present to user for review\n6. Optionally upload:\n ```bash\n fab import \"<Workspace>.Workspace/<Name>.Notebook\" -i <path> --format py -f\n ```\n\n#### Approach B — PowerShell Script\n\n1. Generate a PowerShell script with the following structure:\n2. 
The script must:\n - Use `fab mkdir` for standard lakehouses\n - Handle schema-enabled lakehouses via the Fabric REST API\n (`fab api` wrapper — see `references/fabric-api-lakehouse.md`)\n - Include `fab exists` checks before each creation\n - Track created items for potential rollback\n - Include error handling and summary output\n3. Save to `outputs/create-fabric-lakehouses_{YYYY-MM-DD_HH-MM}_{USERNAME}/create_lakehouses.ps1`\n4. Present the script and **wait for explicit approval** before running\n\n#### Approach C — Interactive CLI\n\nExecute commands one-by-one per workspace, pausing after each:\n\n1. **For each lakehouse** — check then create:\n ```bash\n fab exists \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\"\n ```\n — if not exists, create. For standard lakehouses:\n ```bash\n fab mkdir \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\" -f\n ```\n — for schema-enabled lakehouses, use the REST API:\n ```bash\n WS_ID=$(fab get \"<WorkspaceName>.Workspace\" -q \"id\" | tr -d '\"')\n fab api \"workspaces/$WS_ID/lakehouses\" -X post \\\n -i '{\"displayName\":\"<Name>\",\"description\":\"<Desc>\",\"creationPayload\":{\"enableSchemas\":true}}'\n ```\n — wait for user confirmation after each\n\n2. **Verification** after all lakehouses in a workspace:\n ```bash\n fab ls \"<WorkspaceName>.Workspace\" -l\n ```\n\n3. Move to next workspace or proceed to Step 5.\n\n### Step 4a — Failure Handling\n\nIf any lakehouse creation fails during execution:\n\n1. **Stop immediately** — do not proceed to the next lakehouse\n2. **Report** what succeeded and what failed\n3. 
**Ask the user** how to proceed:\n\n| Option | Action |\n|--------|--------|\n| **Retry** | Re-attempt the failed lakehouse creation |\n| **Skip** | Skip the failed item and continue with remaining |\n| **Rollback & Abort** | Delete all lakehouses created *in this run*, then stop |\n| **Abort (keep)** | Stop but leave already-created lakehouses in place |\n\nIf the user chooses **Rollback & Abort**:\n```bash\nfab rm \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\" -f\n```\n— for each lakehouse created in this run (tracked in the audit log).\nConfirm each deletion with the user before executing.\n\n### Step 5 — Verify Creation\n\nRegardless of approach, verify every lakehouse across all workspaces:\n\n```bash\nfab exists \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\"\n```\n\nCollect the lakehouse ID for each:\n```bash\nfab get \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\" -q \"id\"\n```\n\nIf any verification fails, report and ask the user how to proceed (same\noptions as Step 4a).\n\n### Step 6 — Generate Definition File\n\nAfter all lakehouses are verified, generate a Lakehouse Definition markdown\nfile using the template in `references/definition-template.md`.\n\nThe definition file must include:\n\n- **Per workspace:** name, ID\n- **Per lakehouse:** name, ID, description, schema-enabled status, naming\n convention used, creation timestamp\n- **Overall:** approach used, naming convention applied, full audit trail of\n commands/API calls executed, any warnings, skipped items, or rollback actions\n\nSave to `outputs/create-fabric-lakehouses_{YYYY-MM-DD_HH-MM}_{USERNAME}/lakehouse_definition.md` and present to user.\n\n## Gotchas\n\n- `fab mkdir` creates a standard lakehouse but does NOT support the\n `enableSchemas` property. 
To create a schema-enabled lakehouse, use\n the Fabric REST API: `POST workspaces/{workspaceId}/lakehouses` with\n `{\"displayName\":\"<n>\",\"creationPayload\":{\"enableSchemas\":true}}`\n- Always use `-f` flag with `fab` commands in scripts to avoid interactive\n prompts that block execution\n- Lakehouse names must be unique within a workspace\n- Workspace names are case-sensitive in `fab` paths\n- Always quote paths containing spaces: `\"My Workspace.Workspace\"`\n- The Fabric REST API requires workspace ID (GUID), not display name —\n extract with `fab get \"<n>.Workspace\" -q \"id\"`\n- In notebooks, `ms-fabric-cli` must be installed via `%pip install` and\n the scripts directory added to `PATH` before `!fab` commands work\n- Token audiences for notebook auth: `'pbi'` for `FAB_TOKEN` and `FAB_TOKEN_AZURE`,\n `'storage'` for `FAB_TOKEN_ONELAKE` (OneLake requires the storage-scope token)\n- `fab auth status` must show a valid token before any operations; tokens\n expire and may need refresh\n- Lakehouse names cannot contain: `/`, `\\`, `#`, `%`, `?` or\n leading/trailing spaces. Max length: 256 characters\n- When rolling back, always confirm each deletion with the user — `fab rm`\n with `-f` is irreversible\n- This skill does NOT create workspaces — if a workspace is missing, direct\n the user to the `generate-fabric-workspace` skill\n- This skill does NOT create shortcuts between lakehouses — that is a\n separate step\n\n## Output Format\n\nSee `references/definition-template.md` for the full template.\n\n## Available References\n\n- **`references/notebook-template.py`** — PySpark notebook template for Approach A\n- **`references/definition-template.md`** — Lakehouse definition output template\n- **`references/schema-enabled.md`** — How schema-enabled lakehouses work\n- **`references/fabric-api-lakehouse.md`** — Fabric REST API reference for\n lakehouse creation\n",
+
content: "---\nname: create-fabric-lakehouses\ndescription: >\n Use this skill when asked to create, provision, or set up one or more\n Lakehouse items in existing Microsoft Fabric workspaces. Triggers on:\n \"create a lakehouse\", \"provision lakehouses\", \"set up a Fabric lakehouse\",\n \"create lakehouse in Fabric\", \"new lakehouse\", \"create lakehouses across\n workspaces\". Does NOT trigger for: creating workspaces (use\n generate-fabric-workspace), querying lakehouse data, managing tables,\n uploading files, creating shortcuts, or general Fabric workspace management.\nlicense: MIT\ncompatibility: Fabric CLI (fab) installed and authenticated; Python 3.10+ for notebook approach\n---\n\n# Create Fabric Lakehouse\n\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\n> review and run — it never executes commands directly against a live Fabric environment.\n> Present each generated artefact to the operator before they run it.\n>\n> ⚠️ **DETERMINISTIC**: The implementation pattern, tools, and artefact structure are\n> fixed — always follow `references/notebook-template.py` and the workflow below. The\n> workflow defines the permitted conditional branches (e.g. standard `fab mkdir` vs\n> REST API for schema-enabled lakehouses; skip steps the operator lacks permissions\n> for). Follow these branches based on the operator's situation. Never rewrite the\n> authentication pattern or suggest alternative approaches not defined in this skill.\n\nProvisions one or more empty Lakehouse items across one or more existing\nMicrosoft Fabric workspaces, using a user-chosen approach, and produces an\naudit-trail definition file.\n\n**Companion skills:** Workspace creation is handled by the\n`generate-fabric-workspace` skill. Shortcut creation between lakehouses is\na separate skill / manual step. 
This skill assumes target workspaces already\nexist.\n\n## Orchestrated Context\n\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\nand the SOP before asking the user anything.\n\n| Parameter | Source when orchestrated |\n|---|---|\n| Deployment approach (notebook / PowerShell / terminal) | Environment profile |\n| Workspace name(s) | Environment profile or implementation plan |\n| Naming convention / prefix | Implementation plan or SOP |\n| Medallion layer(s) to create (Bronze / Silver / Gold) | SOP shared parameters |\n| Schema-enabled preference | SOP or implementation plan |\n\n**Only ask for parameters not found in these documents.** Summarise what was resolved\nautomatically, then ask for what remains (e.g. lakehouse core name, description).\n\n## Prerequisites\n\nBefore starting, ask the operator to run the following and share the output:\n\n```bash\nfab auth status # Must show authenticated\nfab ls # Must return workspace list\n```\n\nIf not authenticated, ask the operator to run `fab auth login` first.\n\n## Workflow\n\nExecute these steps in order.\n\n### Step 1 — Choose Provisioning Approach\n\nAsk the user which approach they want to follow:\n\n| Approach | Description | Best for |\n|----------|-------------|----------|\n| **A — PySpark Notebook** | Generates a `.py` notebook script that installs `ms-fabric-cli` and uses `!fab` commands. Output for the user to run in their Fabric workspace. | Users who want a reusable notebook artefact in Fabric |\n| **B — PowerShell Script** | Generates a PowerShell script containing `fab` CLI commands. Output for user validation before execution. | Users who prefer a single script to review and run locally |\n| **C — Interactive CLI** | Runs `fab` commands one-by-one in the terminal, pausing for user validation after each step. 
| Users who want maximum control and visibility |\n\n### Step 2 — Collect Workspace & Lakehouse Definitions (Sequential)\n\nCollect definitions **one workspace at a time**. For each workspace, gather:\n\n#### 2a — Target Workspace\n\n- [ ] **Workspace name** — must already exist. Verify with:\n ```bash\n fab exists \"<WorkspaceName>.Workspace\"\n ```\n If the workspace does not exist, inform the user and suggest they run\n the `generate-fabric-workspace` skill first. Do not proceed for that\n workspace until it exists.\n\n#### 2b — Naming Convention\n\nSuggest the default naming pattern: `{Prefix}_{CoreName}_{Suffix}`\n\n| Component | Description | Default | Example |\n|-----------|-------------|---------|---------|\n| **Prefix** | Item type indicator | `LH` | `LH` |\n| **CoreName** | Business/project name | *(user provides)* | `LANDONREVENUE` |\n| **Suffix** | Medallion layer or purpose | `BRONZE`, `SILVER`, `GOLD` | `BRONZE` |\n| **Separator** | Character between components | `_` | `_` |\n\nExample result: `LH_LANDONREVENUE_BRONZE`\n\nPresent the suggested defaults and **ask the user to confirm or override**\neach component. The user may change any component or use fully custom names\nthat don't follow the pattern at all.\n\n#### 2c — Lakehouse Definitions\n\nFor each lakehouse in this workspace, collect:\n\n- [ ] **Name** — generated from the naming convention, or custom\n- [ ] **Description** — optional text describing the lakehouse's purpose\n- [ ] **Schema-enabled** — yes/no (default: no). See\n `references/schema-enabled.md` for guidance.\n\n#### 2d — More Workspaces?\n\nAfter finishing one workspace, ask:\n\n> \"Do you have another workspace to provision lakehouses in, or are we done?\"\n\nIf yes, loop back to Step 2a. If done, proceed to Step 3.\n\n### Step 3 — Validate Inputs\n\nBefore generating anything, validate **all** lakehouse definitions:\n\n1. 
For each workspace, confirm it exists:\n ```bash\n fab exists \"<WorkspaceName>.Workspace\"\n ```\n — if it does not exist, stop and direct the user to create it first\n\n2. For each lakehouse, check it doesn't already exist:\n ```bash\n fab exists \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\"\n ```\n — if it already exists, warn the user and ask whether to **skip** or\n **rename**\n\n3. Validate lakehouse names against naming constraints (see Gotchas)\n\n### Step 4 — Generate & Execute\n\nBranch by the approach chosen in Step 1. Process workspaces sequentially.\n\n**Maintain an audit log** throughout execution — record every command run and\nits outcome. This log feeds into the definition file in Step 6.\n\n#### Approach A — PySpark Notebook\n\n1. Generate a PySpark notebook using the template in\n `references/notebook-template.py`\n2. The notebook pattern is:\n - Install `ms-fabric-cli` via `%pip install ms-fabric-cli -q`\n - Authenticate using `notebookutils.credentials.getToken('pbi')` for `FAB_TOKEN`\n and `FAB_TOKEN_AZURE`, and `notebookutils.credentials.getToken('storage')` for\n `FAB_TOKEN_ONELAKE` (OneLake requires the storage-scope token)\n - Add pip's scripts directory to `PATH` so `!fab` works\n - Use `!fab mkdir` shell commands for standard lakehouses\n - Use `!fab api` with REST payload for schema-enabled lakehouses\n3. The notebook must include:\n - A configuration cell with all workspace/lakehouse definitions\n - Existence checks before each creation\n - A summary cell at the end\n4. Save to `outputs/create-fabric-lakehouses_{YYYY-MM-DD_HH-MM}_{USERNAME}/<workspace>_create_lakehouses.py`\n5. Present to user for review\n6. Optionally upload:\n ```bash\n fab import \"<Workspace>.Workspace/<Name>.Notebook\" -i <path> --format py -f\n ```\n\n#### Approach B — PowerShell Script\n\n1. Generate a PowerShell script with the following structure:\n2. 
The script must:\n - Use `fab mkdir` for standard lakehouses\n - Handle schema-enabled lakehouses via the Fabric REST API\n (`fab api` wrapper — see `references/fabric-api-lakehouse.md`)\n - Include `fab exists` checks before each creation\n - Track created items for potential rollback\n - Include error handling and summary output\n3. Save to `outputs/create-fabric-lakehouses_{YYYY-MM-DD_HH-MM}_{USERNAME}/create_lakehouses.ps1`\n4. Present the script and **wait for explicit approval** before running\n\n#### Approach C — Interactive CLI\n\nExecute commands one-by-one per workspace, pausing after each:\n\n1. **For each lakehouse** — check then create:\n ```bash\n fab exists \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\"\n ```\n — if not exists, create. For standard lakehouses:\n ```bash\n fab mkdir \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\" -f\n ```\n — for schema-enabled lakehouses, use the REST API:\n ```bash\n WS_ID=$(fab get \"<WorkspaceName>.Workspace\" -q \"id\" | tr -d '\"')\n fab api \"workspaces/$WS_ID/lakehouses\" -X post \\\n -i '{\"displayName\":\"<Name>\",\"description\":\"<Desc>\",\"creationPayload\":{\"enableSchemas\":true}}'\n ```\n — wait for user confirmation after each\n\n2. **Verification** after all lakehouses in a workspace:\n ```bash\n fab ls \"<WorkspaceName>.Workspace\" -l\n ```\n\n3. Move to next workspace or proceed to Step 5.\n\n### Step 4a — Failure Handling\n\nIf any lakehouse creation fails during execution:\n\n1. **Stop immediately** — do not proceed to the next lakehouse\n2. **Report** what succeeded and what failed\n3. 
**Ask the user** how to proceed:\n\n| Option | Action |\n|--------|--------|\n| **Retry** | Re-attempt the failed lakehouse creation |\n| **Skip** | Skip the failed item and continue with remaining |\n| **Rollback & Abort** | Delete all lakehouses created *in this run*, then stop |\n| **Abort (keep)** | Stop but leave already-created lakehouses in place |\n\nIf the user chooses **Rollback & Abort**:\n```bash\nfab rm \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\" -f\n```\n— for each lakehouse created in this run (tracked in the audit log).\nConfirm each deletion with the user before executing.\n\n### Step 5 — Verify Creation\n\nRegardless of approach, verify every lakehouse across all workspaces:\n\n```bash\nfab exists \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\"\n```\n\nCollect the lakehouse ID for each:\n```bash\nfab get \"<WorkspaceName>.Workspace/<LakehouseName>.Lakehouse\" -q \"id\"\n```\n\nIf any verification fails, report and ask the user how to proceed (same\noptions as Step 4a).\n\n### Step 6 — Generate Definition File\n\nAfter all lakehouses are verified, generate a Lakehouse Definition markdown\nfile using the template in `references/definition-template.md`.\n\nThe definition file must include:\n\n- **Per workspace:** name, ID\n- **Per lakehouse:** name, ID, description, schema-enabled status, naming\n convention used, creation timestamp\n- **Overall:** approach used, naming convention applied, full audit trail of\n commands/API calls executed, any warnings, skipped items, or rollback actions\n\nSave to `outputs/create-fabric-lakehouses_{YYYY-MM-DD_HH-MM}_{USERNAME}/lakehouse_definition.md` and present to user.\n\n## Gotchas\n\n- `fab mkdir` creates a standard lakehouse but does NOT support the\n `enableSchemas` property. 
To create a schema-enabled lakehouse, use\n the Fabric REST API: `POST workspaces/{workspaceId}/lakehouses` with\n `{\"displayName\":\"<n>\",\"creationPayload\":{\"enableSchemas\":true}}`\n- Always use `-f` flag with `fab` commands in scripts to avoid interactive\n prompts that block execution\n- Lakehouse names must be unique within a workspace\n- Workspace names are case-sensitive in `fab` paths\n- Always quote paths containing spaces: `\"My Workspace.Workspace\"`\n- The Fabric REST API requires workspace ID (GUID), not display name —\n extract with `fab get \"<n>.Workspace\" -q \"id\"`\n- In notebooks, `ms-fabric-cli` must be installed via `%pip install` and\n the scripts directory added to `PATH` before `!fab` commands work\n- Token audiences for notebook auth: `'pbi'` for `FAB_TOKEN` and `FAB_TOKEN_AZURE`,\n `'storage'` for `FAB_TOKEN_ONELAKE` (OneLake requires the storage-scope token)\n- `fab auth status` must show a valid token before any operations; tokens\n expire and may need refresh\n- Lakehouse names cannot contain: `/`, `\\`, `#`, `%`, `?` or\n leading/trailing spaces. Max length: 256 characters\n- When rolling back, always confirm each deletion with the user — `fab rm`\n with `-f` is irreversible\n- This skill does NOT create workspaces — if a workspace is missing, direct\n the user to the `generate-fabric-workspace` skill\n- This skill does NOT create shortcuts between lakehouses — that is a\n separate step\n\n## Output Format\n\nSee `references/definition-template.md` for the full template.\n\n## Available References\n\n- **`references/notebook-template.py`** — PySpark notebook template for Approach A\n- **`references/definition-template.md`** — Lakehouse definition output template\n- **`references/schema-enabled.md`** — How schema-enabled lakehouses work\n- **`references/fabric-api-lakehouse.md`** — Fabric REST API reference for\n lakehouse creation\n",
       },
       {
         relativePath: "references/agent.md",
@@ -37,7 +37,7 @@ export const EMBEDDED_SKILLS = [
     files: [
       {
         relativePath: "SKILL.md",
-
content: "---\nname: create-fabric-process-workflow-agent\ndescription: >\n Use this skill to create an orchestration agent definition (agent.md) for any\n Microsoft Fabric technical process. The user describes what they want to automate;\n the skill produces a self-contained agent.md. When run, the agent maps each process\n step to an available Fabric process skill, flags any steps with no matching skill\n as UNMAPPED, and at execution time offers three options for unmapped steps: perform\n manually, build a lightweight skill on the fly (saved locally), or engage the LDI\n Skills Creation Framework. Logs all decisions to an audit trail and orchestrates\n the full process end-to-end.\n Triggers on: \"create a process workflow agent\", \"build an orchestration agent\n for [process]\", \"create an agent that automates [process]\", \"orchestrate\n [process] into an agent\". Does NOT trigger for creating individual process\n skills, running an agent, writing code, or one-off analysis.\nlicense: MIT\ncompatibility: Python 3.8+ required for scripts/\n---\n\n# Create Fabric Process Workflow Agent\n\nCreates a concise, self-contained `agent.md` that defines an orchestration agent\nfor a Microsoft Fabric technical process. When run, the agent maps each process step\nto an available skill (COVERED), marks steps with no matching skill as UNMAPPED, and\noffers three options at execution time for unmapped steps: manual, on-the-fly skill\n(saved locally in a `skills/` folder), or the LDI Skills Creation Framework.\n\n> ⚠️ **DETERMINISTIC**: Generate `agent.md` by filling the placeholders in\n> `assets/agent-template.md` with the process-specific details collected from the\n> user. Never change the sub-agent structure, HARD STOP pattern, governance rules,\n> or output conventions defined in the template. The Core Governance Rules below\n> must be embedded verbatim in every generated `agent.md`.\n\n## Core Governance Rules\n\nThese rules are non-negotiable. 
They must be embedded verbatim in every generated\n`agent.md` so they are active at runtime.\n\n- **RULE 1 — Never execute autonomously.** Never run terminal commands, API calls,\n or scripts directly. Present every command in a fenced code block with the\n insert-into-terminal icon. The user runs it and reports back before proceeding.\n- **RULE 2 — Parameter gate before every execution step.** Before generating any\n artefact for a step, verify every required parameter is resolved. Any parameter\n deferred during discovery (marked `[TBC]`) must be asked for explicitly before\n proceeding. Never silently skip a parameter or substitute an empty value.\n- **RULE 3 — No silent approach changes.** If a blocker is found with the chosen\n approach, surface it and present alternatives. Let the user decide. Never switch\n silently. Approach constraints by step type:\n - Local file upload (CSV/PDF from operator's machine): **notebook not possible** —\n options are script, CLI commands, or manual. For 50+ files, note that script/CLI\n is sequential and slow; suggest manual upload via Fabric Files UI instead.\n - Schema creation: notebook (Spark SQL) or CLI; no native Fabric UI for lakehouses.\n - Shortcuts: notebook (`!fab ln` cell), PowerShell script, or interactive CLI all work.\n- **RULE 4 — No inference from context.** Collect all parameters from the user or\n the current prompt. Do not pre-populate from prior chat history, previous runs,\n or attached files not explicitly part of the current request.\n- **RULE 5 — Respect the user's skill level and environment.** Do not steer toward\n an approach the agent finds easier to generate. Match the user's comfort level,\n installed tooling, and stated preferences.\n- **RULE 6 — Stay within skill boundaries.** Generate only what skill definitions\n describe. 
On any failure: explain the cause from the error, offer the simplest\n manual or UI fallback, ask whether to skip.\n- **RULE 7 — Append to CHANGE_LOG.md after every step.** Include: step number,\n what was done, outcome (success/failure/skipped), and any notable decisions.\n- **RULE 8 — Two-question post-step pattern.** After each execution step: (Q1) ask\n whether the previous artefact ran correctly — if not, get the error and resolve it\n before proceeding; (Q2) propose the next step by name, state the planned approach\n and any implications, offer Yes (generate it) or No (choose a different approach\n or manual). Update the SOP and CHANGE_LOG to reflect any runtime decisions.\n\n## Inputs\n\n| Parameter | Description | Example |\n|-----------|-------------|---------|\n| `PROCESS_NAME` | Short name for the process (lowercase, hyphens) | `monthly-budget-consolidation` |\n| `REQUIREMENTS` | Full description of the process and each of its steps | `\"1) Collect data from five Excel files... 2) Summarise by category...\"` |\n| `SECTIONS` | Sub-agent sections to include (default: all four) | `impl-plan, biz-process, architecture, governance` |\n| `USERNAME` | Used in output folder naming | `rishi` |\n\n## Workflow\n\n- [ ] **Collect** — If `PROCESS_NAME`, `REQUIREMENTS`, or `USERNAME` are missing, ask for them.\n\n- [ ] **Confirm sections** — Present the four standard sections with descriptions\n (see `references/section-descriptions.md`). Ask which to include. Default: all four.\n Wait for explicit confirmation before drafting.\n\n- [ ] **Draft agent.md** — Use `assets/agent-template.md` as the base.\n - Substitute `{PROCESS_NAME}` and a ≤3-sentence `{REQUIREMENTS_SUMMARY}`.\n - Remove excluded sections. Keep each sub-agent block ≤25 lines.\n - Do not name any specific process skill or technology — all resolved at runtime.\n - Do not hardcode company names, specific values, or environment paths.\n\n- [ ] **Validate** — Present the draft. 
Ask: *\"Does this accurately reflect the process? Anything unclear?\"*\n Refine until the user confirms.\n\n- [ ] **Scaffold** — Run `python scripts/scaffold_output.py --process-name $PROCESS_NAME --username $USERNAME --sections $SECTIONS`.\n Write the confirmed agent.md to the returned `agent_md_path`.\n\n- [ ] **Confirm** — Report the output root path and list all created subfolders.\n\n## Output Format\n\n```\noutputs/\n└── {process-name}_{YYYY-MM-DD_HH-MM}_{username}/\n ├── agent.md ← self-contained orchestration agent definition\n ├── CHANGE_LOG.md ← audit trail; updated as agent runs\n ├── 01-implementation-plan/ ← empty; populated when agent runs\n ├── 02-business-process/ ← empty; populated when agent runs\n ├── 03-solution-architecture/ ← empty; populated when agent runs\n ├── 04-governance/ ← empty; populated when agent runs\n ├── 05-step-name/ ← execution step 1 (numbered from 05)\n │ └── thing.ipynb ← deliverable only (.ipynb / .ps1 / cli-commands.md)\n ├── 06-step-name/ ← execution step 2\n │ └── thing.ps1\n └── skills/ ← on-the-fly skills (created at runtime for UNMAPPED steps)\n └── [skill-name]/\n └── SKILL.md ← lightweight skill definition for local reference\n```\n\n`CHANGE_LOG.md` is initialised empty and updated by the agent each time it runs.\n\n### Intermediate vs. 
final artefacts\n\n| Classification | Description | Examples |\n|----------------|-------------|----------|\n| **Final** | The deliverable the user runs or deploys | `.ipynb` notebooks, `.sql` scripts, `.md` documentation |\n| **Intermediate** | Scripts that generate the final artefacts | `generate_*.py`, `generate_*.ps1` |\n\n- Intermediate artefacts live alongside their final outputs (same subfolder).\n- Label both types clearly when presenting outputs to the user.\n- Intermediate scripts must be deterministic and re-runnable.\n\n### Sub-agents in the generated agent.md\n\n| # | Section | Output document |\n|---|---------|-----------------|\n| 0 | Environment Discovery | `00-environment-discovery/environment-profile.md` |\n| 1 | Implementation Plan | `01-implementation-plan/implementation-plan.md` |\n| 2 | Business Process Mapping | `02-business-process/sop.md` |\n| 3 | Solution Architecture | `03-solution-architecture/specification.md` |\n| 4 | Security, Testing & Governance | `04-governance/governance-plan.md` |\n| — | **Execution Phase** | `05-[step]/`, `06-[step]/` ... + `COMPLETION_SUMMARY.md` |\n\nThe execution phase runs after all planning sub-agents are reviewed and confirmed.\nEach SOP step becomes a numbered execution subfolder. The SOP is updated in place\nthroughout execution to reflect runtime decisions (approach changes, errors, manual\nselections). CHANGE_LOG.md is updated after every step.\n\n## Gotchas\n\n- **Do not attempt to create process skills during skill execution.** Skill mapping\n happens inside Sub-Agent 2 when the generated agent.md is run. 
UNMAPPED steps are\n resolved by the operator at execution time, not by this skill upfront.\n- **Do not execute sub-agents** during skill execution — `agent.md` is a definition only.\n- Do not name specific tools, technologies, or process skills in the generated agent.md.\n- **Sub-Agent 0 always invokes `fabric-process-discovery` — never generate discovery questions ad-hoc.** The skill defines 6 fixed questions covering all Fabric process prerequisites: tenant access, workspace creation, item creation, domain assignment, Entra group visibility, and deployment preference. Do not derive or generate questions from the requirements.\n- Confirm sections **before** drafting, not after.\n- Keep each sub-agent block ≤25 lines to avoid context overload when the agent runs.\n\n## Available Scripts\n\n- **`scripts/scaffold_output.py`** — Creates the dated output folder structure including\n an empty `CHANGE_LOG.md`. Run: `python scripts/scaffold_output.py --help`\n",
+
content: "---\nname: create-fabric-process-workflow-agent\ndescription: >\n Use this skill to create an orchestration agent definition (agent.md) for any\n Microsoft Fabric technical process. The user describes what they want to automate;\n the skill produces a self-contained agent.md. When run, the agent maps each process\n step to an available Fabric process skill, flags any steps with no matching skill\n as UNMAPPED, and at execution time offers three options for unmapped steps: perform\n manually, build a lightweight skill on the fly (saved locally), or engage the LDI\n Skills Creation Framework. Logs all decisions to an audit trail and orchestrates\n the full process end-to-end.\n Triggers on: \"create a process workflow agent\", \"build an orchestration agent\n for [process]\", \"create an agent that automates [process]\", \"orchestrate\n [process] into an agent\". Does NOT trigger for creating individual process\n skills, running an agent, writing code, or one-off analysis.\nlicense: MIT\ncompatibility: Python 3.8+ required for scripts/\n---\n\n# Create Fabric Process Workflow Agent\n\nCreates a concise, self-contained `agent.md` that defines an orchestration agent\nfor a Microsoft Fabric technical process. When run, the agent maps each process step\nto an available skill (COVERED), marks steps with no matching skill as UNMAPPED, and\noffers three options at execution time for unmapped steps: manual, on-the-fly skill\n(saved locally in a `skills/` folder), or the LDI Skills Creation Framework.\n\n> ⚠️ **DETERMINISTIC**: The sub-agent structure, governance rules, HARD STOP pattern,\n> and output conventions are fixed — always generate `agent.md` from\n> `assets/agent-template.md`. The workflow defines the permitted conditional branches\n> (e.g. number and type of execution sub-agents depends on the process steps; UNMAPPED\n> steps are handled as defined). 
Follow these branches based on the process described.\n> Never alter the template structure or governance rules. The Core Governance Rules\n> below must be embedded verbatim in every generated `agent.md`.\n\n## Core Governance Rules\n\nThese rules are non-negotiable. They must be embedded verbatim in every generated\n`agent.md` so they are active at runtime.\n\n- **RULE 1 — Never execute autonomously.** Never run terminal commands, API calls,\n or scripts directly. Present every command in a fenced code block with the\n insert-into-terminal icon. The user runs it and reports back before proceeding.\n- **RULE 2 — Parameter gate before every execution step.** Before generating any\n artefact for a step, verify every required parameter is resolved. Any parameter\n deferred during discovery (marked `[TBC]`) must be asked for explicitly before\n proceeding. Never silently skip a parameter or substitute an empty value.\n- **RULE 3 — No silent approach changes.** If a blocker is found with the chosen\n approach, surface it and present alternatives. Let the user decide. Never switch\n silently. Approach constraints by step type:\n - Local file upload (CSV/PDF from operator's machine): **notebook not possible** —\n options are script, CLI commands, or manual. For 50+ files, note that script/CLI\n is sequential and slow; suggest manual upload via Fabric Files UI instead.\n - Schema creation: notebook (Spark SQL) or CLI; no native Fabric UI for lakehouses.\n - Shortcuts: notebook (`!fab ln` cell), PowerShell script, or interactive CLI all work.\n- **RULE 4 — No inference from context.** Collect all parameters from the user or\n the current prompt. Do not pre-populate from prior chat history, previous runs,\n or attached files not explicitly part of the current request.\n- **RULE 5 — Respect the user's skill level and environment.** Do not steer toward\n an approach the agent finds easier to generate. 
Match the user's comfort level,\n installed tooling, and stated preferences.\n- **RULE 6 — Stay within skill boundaries.** Generate only what skill definitions\n describe. On any failure: explain the cause from the error, offer the simplest\n manual or UI fallback, ask whether to skip.\n- **RULE 7 — Append to CHANGE_LOG.md after every step.** Include: step number,\n what was done, outcome (success/failure/skipped), and any notable decisions.\n- **RULE 8 — Two-question post-step pattern.** After each execution step: (Q1) ask\n whether the previous artefact ran correctly — if not, get the error and resolve it\n before proceeding; (Q2) propose the next step by name, state the planned approach\n and any implications, offer Yes (generate it) or No (choose a different approach\n or manual). Update the SOP and CHANGE_LOG to reflect any runtime decisions.\n\n## Inputs\n\n| Parameter | Description | Example |\n|-----------|-------------|---------|\n| `PROCESS_NAME` | Short name for the process (lowercase, hyphens) | `monthly-budget-consolidation` |\n| `REQUIREMENTS` | Full description of the process and each of its steps | `\"1) Collect data from five Excel files... 2) Summarise by category...\"` |\n| `SECTIONS` | Sub-agent sections to include (default: all four) | `impl-plan, biz-process, architecture, governance` |\n| `USERNAME` | Used in output folder naming | `rishi` |\n\n## Workflow\n\n- [ ] **Collect** — If `PROCESS_NAME`, `REQUIREMENTS`, or `USERNAME` are missing, ask for them.\n\n- [ ] **Confirm sections** — Present the four standard sections with descriptions\n (see `references/section-descriptions.md`). Ask which to include. Default: all four.\n Wait for explicit confirmation before drafting.\n\n- [ ] **Draft agent.md** — Use `assets/agent-template.md` as the base.\n - Substitute `{PROCESS_NAME}` and a ≤3-sentence `{REQUIREMENTS_SUMMARY}`.\n - Remove excluded sections. 
Keep each sub-agent block ≤25 lines.\n - Do not name any specific process skill or technology — all resolved at runtime.\n - Do not hardcode company names, specific values, or environment paths.\n\n- [ ] **Validate** — Present the draft. Ask: *\"Does this accurately reflect the process? Anything unclear?\"*\n Refine until the user confirms.\n\n- [ ] **Scaffold** — Run `python scripts/scaffold_output.py --process-name $PROCESS_NAME --username $USERNAME --sections $SECTIONS`.\n Write the confirmed agent.md to the returned `agent_md_path`.\n\n- [ ] **Confirm** — Report the output root path and list all created subfolders.\n\n## Output Format\n\n```\noutputs/\n└── {process-name}_{YYYY-MM-DD_HH-MM}_{username}/\n ├── agent.md ← self-contained orchestration agent definition\n ├── CHANGE_LOG.md ← audit trail; updated as agent runs\n ├── 01-implementation-plan/ ← empty; populated when agent runs\n ├── 02-business-process/ ← empty; populated when agent runs\n ├── 03-solution-architecture/ ← empty; populated when agent runs\n ├── 04-governance/ ← empty; populated when agent runs\n ├── 05-step-name/ ← execution step 1 (numbered from 05)\n │ └── thing.ipynb ← deliverable only (.ipynb / .ps1 / cli-commands.md)\n ├── 06-step-name/ ← execution step 2\n │ └── thing.ps1\n └── skills/ ← on-the-fly skills (created at runtime for UNMAPPED steps)\n └── [skill-name]/\n └── SKILL.md ← lightweight skill definition for local reference\n```\n\n`CHANGE_LOG.md` is initialised empty and updated by the agent each time it runs.\n\n### Intermediate vs. 
final artefacts\n\n| Classification | Description | Examples |\n|----------------|-------------|----------|\n| **Final** | The deliverable the user runs or deploys | `.ipynb` notebooks, `.sql` scripts, `.md` documentation |\n| **Intermediate** | Scripts that generate the final artefacts | `generate_*.py`, `generate_*.ps1` |\n\n- Intermediate artefacts live alongside their final outputs (same subfolder).\n- Label both types clearly when presenting outputs to the user.\n- Intermediate scripts must be deterministic and re-runnable.\n\n### Sub-agents in the generated agent.md\n\n| # | Section | Output document |\n|---|---------|-----------------|\n| 0 | Environment Discovery | `00-environment-discovery/environment-profile.md` |\n| 1 | Implementation Plan | `01-implementation-plan/implementation-plan.md` |\n| 2 | Business Process Mapping | `02-business-process/sop.md` |\n| 3 | Solution Architecture | `03-solution-architecture/specification.md` |\n| 4 | Security, Testing & Governance | `04-governance/governance-plan.md` |\n| — | **Execution Phase** | `05-[step]/`, `06-[step]/` ... + `COMPLETION_SUMMARY.md` |\n\nThe execution phase runs after all planning sub-agents are reviewed and confirmed.\nEach SOP step becomes a numbered execution subfolder. The SOP is updated in place\nthroughout execution to reflect runtime decisions (approach changes, errors, manual\nselections). CHANGE_LOG.md is updated after every step.\n\n## Gotchas\n\n- **Do not attempt to create process skills during skill execution.** Skill mapping\n happens inside Sub-Agent 2 when the generated agent.md is run. 
UNMAPPED steps are\n resolved by the operator at execution time, not by this skill upfront.\n- **Do not execute sub-agents** during skill execution — `agent.md` is a definition only.\n- Do not name specific tools, technologies, or process skills in the generated agent.md.\n- **Sub-Agent 0 always invokes `fabric-process-discovery` — never generate discovery questions ad-hoc.** The skill defines 6 fixed questions covering all Fabric process prerequisites: tenant access, workspace creation, item creation, domain assignment, Entra group visibility, and deployment preference. Do not derive or generate questions from the requirements.\n- Confirm sections **before** drafting, not after.\n- Keep each sub-agent block ≤25 lines to avoid context overload when the agent runs.\n\n## Available Scripts\n\n- **`scripts/scaffold_output.py`** — Creates the dated output folder structure including\n an empty `CHANGE_LOG.md`. Run: `python scripts/scaffold_output.py --help`\n",
},
{
{
relativePath: "assets/agent-template.md",
@@ -59,7 +59,7 @@ export const EMBEDDED_SKILLS = [
files: [
{
relativePath: "SKILL.md",
-
content: "---\r\nname: create-lakehouse-schemas-and-shortcuts\r\ndescription: >\r\n Use this skill to create schemas in schema-enabled Microsoft Fabric lakehouses\r\n and create cross-lakehouse table shortcuts using the Fabric CLI. Triggers on:\r\n \"create lakehouse shortcuts\", \"create schema in lakehouse\", \"shortcut tables\r\n between lakehouses\", \"cross-lakehouse shortcuts\", \"surface bronze tables in\r\n silver\". Does NOT trigger for: creating lakehouses (use create-fabric-lakehouse),\r\n uploading files, creating delta tables from CSV/PDF, or generating MLV scripts.\r\nlicense: MIT\r\ncompatibility: Python 3.8+ for scripts/. Fabric CLI (fab) installed and authenticated.\r\n---\r\n\r\n# Create Lakehouse Schemas and Shortcuts\r\n\r\nCreates schemas in schema-enabled Fabric lakehouses and creates cross-lakehouse\r\ntable shortcuts using `fab ln --type oneLake`. Schemas and shortcuts are\r\ncreated in the same run. Source and target lakehouses must already exist.\r\n\r\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\r\n> review and run — it never executes commands directly against a live Fabric environment.\r\n> Present each generated artefact to the operator before they run it.\r\n>\r\n> ⚠️ **DETERMINISTIC**: Run `scripts/generate_schema_shortcut_commands.py` with the\r\n> collected parameters — never write schema creation or shortcut commands by hand.\r\n> Adapt only data-specific values (lakehouse names, schema names, table lists). 
Skip\r\n> steps only if the operator lacks the required permissions.\r\n\r\n## Orchestrated Context\r\n\r\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\r\nand the SOP before asking the user anything.\r\n\r\n| Parameter | Source when orchestrated |\r\n|---|---|\r\n| Source and target workspace names | Environment profile or implementation plan |\r\n| Source and target lakehouse names | SOP shared parameters (from lakehouse creation step) |\r\n| Source schema | SOP shared parameters |\r\n\r\n**Only ask for parameters not found in these documents** (e.g. target schema name,\r\nspecific tables to shortcut if not listed in the SOP).\r\n\r\n## Inputs\r\n\r\n| Parameter | Description | Example |\r\n|-----------|-------------|---------|\r\n| `--source-workspace` | Source Fabric workspace name (exact, case-sensitive) | `\"LANDON_TEST_20260402_HUB\"` |\r\n| `--source-lakehouse` | Source lakehouse name (exact, case-sensitive) | `\"LANDON_FINANCE_BRONZE\"` |\r\n| `--source-schema` | Schema in source lakehouse. 
Use `dbo` for non-schema-enabled | `\"dbo\"` |\r\n| `--target-workspace` | Target Fabric workspace name (exact, case-sensitive) | `\"LANDON_TEST_20260402_FINANCE_SPOKE\"` |\r\n| `--target-lakehouse` | Target lakehouse name (exact, case-sensitive) | `\"LANDON_FINANCE_SILVER\"` |\r\n| `--target-schema` | Schema to create in target and place shortcuts into | `\"bronze\"` |\r\n| `--tables` | Comma-separated table names, or output of `fab ls` | `\"bookings,events\"` |\r\n\r\n## Workflow\r\n\r\n- [ ] **Step 1 — Collect parameters**: Ask the user for all inputs listed above.\r\n If source and target are in the same workspace, both workspace parameters will\r\n be the same value.\r\n\r\n- [ ] **Step 2 — Discover tables**: Ask the user to either:\r\n - Provide an explicit comma-separated list of table names, **or**\r\n - Run this command and share the output:\r\n ```\r\n fab ls \"<SOURCE_WORKSPACE>.Workspace/<SOURCE_LAKEHOUSE>.Lakehouse/Tables/\" -l\r\n ```\r\n Parse table names from the output or list. 
Present them back and confirm.\r\n\r\n- [ ] **Step 3 — Generate commands**: Run the script:\r\n ```\r\n python scripts/generate_schema_shortcut_commands.py \\\r\n --source-workspace \"<SOURCE_WORKSPACE>\" \\\r\n --source-lakehouse \"<SOURCE_LAKEHOUSE>\" \\\r\n --source-schema \"<SOURCE_SCHEMA>\" \\\r\n --target-workspace \"<TARGET_WORKSPACE>\" \\\r\n --target-lakehouse \"<TARGET_LAKEHOUSE>\" \\\r\n --target-schema \"<TARGET_SCHEMA>\" \\\r\n --tables \"<TABLE1>,<TABLE2>,...\"\r\n ```\r\n The script outputs JSON to stdout with sections: `schema_sql`,\r\n `schema_shortcut_test`, `shortcut_commands`, and `validation_command`.\r\n\r\n- [ ] **Step 4 — (Optional) Test schema-level shortcut**: Before creating\r\n individual table shortcuts, optionally test whether a single schema-level\r\n shortcut captures all tables (see \"Schema-Level Shortcut Hypothesis\" below).\r\n Use the `schema_shortcut_test` command from the script output.\r\n If the test succeeds and all tables appear, skip Step 5.\r\n\r\n- [ ] **Step 5 — Choose deployment approach**: Present these options:\r\n\r\n **Option A — Notebook Cells (Recommended for pipeline integration)**\r\n Append two cells to an existing notebook attached to the target lakehouse:\r\n 1. **Spark SQL cell**: Contains `CREATE SCHEMA IF NOT EXISTS <schema>;`\r\n from the `schema_sql` output.\r\n 2. **Code cell**: Contains each command from `shortcut_commands` prefixed\r\n with `!` (one per line).\r\n If no existing notebook is available, create a new one and note that it\r\n will need its own Spark session and `fab` authentication.\r\n\r\n **Option B — PowerShell Script**\r\n Write the `fab ln` commands from `shortcut_commands` to a `.ps1` file.\r\n Add a comment at the top reminding the user to create the schema first\r\n via a Spark SQL notebook cell (`fab` CLI cannot create schemas).\r\n\r\n **Option C — Interactive Terminal**\r\n Present each command one at a time for the operator to run. 
Start with the\r\n schema creation SQL (must run in a notebook), then present `fab ln` commands.\r\n\r\n- [ ] **Step 6 — Validate**: Ask the user to run:\r\n ```\r\n fab ls \"<TARGET_WORKSPACE>.Workspace/<TARGET_LAKEHOUSE>.Lakehouse/Tables/\" -l\r\n ```\r\n Confirm the expected shortcuts appear under the target schema.\r\n\r\n## Schema-Level Shortcut Hypothesis\r\n\r\nWhen creating shortcuts through the Fabric **UI**, connecting to a schema\r\nautomatically surfaces all tables in that schema as shortcuts. It is unknown\r\nwhether this works programmatically via `fab ln`. To test, use the\r\n`schema_shortcut_test` command from the script output, e.g.:\r\n\r\n```\r\nfab ln \"<TARGET_WS>.Workspace/<TARGET_LH>.Lakehouse/Tables/<TARGET_SCHEMA>/Shortcut\" \\\r\n --type oneLake \\\r\n --target ../../<SOURCE_WS_URL>.Workspace/<SOURCE_LH>.Lakehouse/Tables -f\r\n```\r\n\r\nIf this succeeds and all source tables appear in the target schema, use this\r\none-command approach instead of individual table shortcuts. Document the result\r\nfor future runs.\r\n\r\nIf the source is non-schema-enabled, test with `Tables` as the target path\r\n(no schema segment). If schema-enabled, use `Tables/<source_schema>`.\r\n\r\n## fab ln Syntax Reference\r\n\r\n### Shortcut naming convention (FIXED)\r\n\r\nShortcuts in schema-enabled lakehouses use **slash notation** for the schema path:\r\n```\r\nTables/<Schema>/<table_name>.Shortcut\r\n```\r\nExample: `Tables/Bronze/revenue_raw.Shortcut`\r\n\r\n**Periods (`.`) are FORBIDDEN in shortcut names.** Dot notation like\r\n`Tables/bronze.revenue_raw.Shortcut` will fail with:\r\n`[InvalidPath] Invalid shortcut name. 
The name should not include any of the following characters: [\"\\:|<>*?.%+]`\r\n\r\n### Cross-lakehouse: non-schema source → schema-enabled target\r\n\r\n```\r\nfab ln \"<TARGET_WS>.Workspace/<TARGET_LH>.Lakehouse/Tables/<TARGET_SCHEMA>/<TABLE>.Shortcut\" \\\r\n --type oneLake \\\r\n --target ../../<SOURCE_WS_URL>.Workspace/<SOURCE_LH>.Lakehouse/Tables/<TABLE> -f\r\n```\r\n\r\n### Cross-lakehouse: schema-enabled source → schema-enabled target\r\n\r\n```\r\nfab ln \"<TARGET_WS>.Workspace/<TARGET_LH>.Lakehouse/Tables/<TARGET_SCHEMA>/<TABLE>.Shortcut\" \\\r\n --type oneLake \\\r\n --target ../../<SOURCE_WS_URL>.Workspace/<SOURCE_LH>.Lakehouse/Tables/<SOURCE_SCHEMA>/<TABLE> -f\r\n```\r\n\r\n### Key rules\r\n\r\n- **Type**: Always `--type oneLake` for cross-lakehouse table shortcuts.\r\n Valid `fab ln` types are: `adlsGen2`, `amazonS3`, `dataverse`, `googleCloudStorage`,\r\n `oneLake`, `s3Compatible`. There is no `lakehouseTable` type.\r\n- **Slash notation**: Shortcut path uses `Tables/<Schema>/<table>.Shortcut` (slash, NOT dot)\r\n- **Periods forbidden**: `.` is not allowed in shortcut names — will error with `[InvalidPath]`\r\n- **`-f` flag**: Always include `-f` to skip the \"Are you sure?\" confirmation prompt\r\n (terminals that don't support CPR will hang without it)\r\n- **Source path**: Schema-enabled sources use `Tables/<schema>/<table>` (slash).\r\n Non-schema sources use `Tables/<table>` (no schema segment)\r\n- **URL encoding**: Workspace names with spaces use `%20` in the `--target` path\r\n- **`../../` prefix**: Required for cross-workspace targets to navigate to OneLake root\r\n- **Display names**: Shortcut destination path uses plain workspace/lakehouse names\r\n (no URL encoding); only the `--target` path is URL-encoded\r\n\r\n## Gotchas\r\n\r\n- **Slash NOT dot in shortcut paths**: The shortcut destination uses slash notation\r\n (`Tables/Bronze/revenue_raw.Shortcut`), NOT dot notation. 
Periods (`.`) are\r\n **forbidden** in shortcut names and will cause `[InvalidPath]` errors.\r\n- **Always use `-f` flag**: Without `-f`, `fab ln` prompts \"Are you sure? (Y/n)\".\r\n Terminals that don't support cursor position requests (CPR) will hang. Always\r\n append `-f` to force creation without confirmation.\r\n- **`--type oneLake` not `--type lakehouseTable`**: Cross-lakehouse table shortcuts\r\n require `--type oneLake`. The type `lakehouseTable` does not exist in the `fab ln`\r\n CLI. Valid types are: `adlsGen2`, `amazonS3`, `dataverse`, `googleCloudStorage`,\r\n `oneLake`, `s3Compatible`.\r\n- **Schema creation requires Spark SQL**: The `fab` CLI cannot create schemas.\r\n Schemas must be created via `CREATE SCHEMA IF NOT EXISTS <name>` in a Spark SQL\r\n cell in a notebook attached to the target lakehouse.\r\n- **Schema names are case-sensitive** in Fabric. Use exact casing consistently.\r\n- **Viewer access required**: Cross-workspace shortcuts require at least Viewer\r\n access on the source workspace.\r\n- **Existing shortcuts fail**: If a shortcut with the same name already exists,\r\n `fab ln` will error. Skip or delete existing ones before rerunning.\r\n- **Same-workspace shortcuts**: When source and target are in the same workspace,\r\n the `../../` prefix and URL encoding still apply in the `--target` path.\r\n\r\n## Available Scripts\r\n\r\n- **`scripts/generate_schema_shortcut_commands.py`** — Generates structured JSON\r\n containing schema SQL, `fab ln` shortcut commands, a schema-level shortcut test\r\n command, and a validation command.\r\n Run: `python scripts/generate_schema_shortcut_commands.py --help`\r\n",
+
content: "---\r\nname: create-lakehouse-schemas-and-shortcuts\r\ndescription: >\r\n Use this skill to create schemas in schema-enabled Microsoft Fabric lakehouses\r\n and create cross-lakehouse table shortcuts using the Fabric CLI. Triggers on:\r\n \"create lakehouse shortcuts\", \"create schema in lakehouse\", \"shortcut tables\r\n between lakehouses\", \"cross-lakehouse shortcuts\", \"surface bronze tables in\r\n silver\". Does NOT trigger for: creating lakehouses (use create-fabric-lakehouse),\r\n uploading files, creating delta tables from CSV/PDF, or generating MLV scripts.\r\nlicense: MIT\r\ncompatibility: Python 3.8+ for scripts/. Fabric CLI (fab) installed and authenticated.\r\n---\r\n\r\n# Create Lakehouse Schemas and Shortcuts\r\n\r\nCreates schemas in schema-enabled Fabric lakehouses and creates cross-lakehouse\r\ntable shortcuts using `fab ln --type oneLake`. Schemas and shortcuts are\r\ncreated in the same run. Source and target lakehouses must already exist.\r\n\r\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\r\n> review and run — it never executes commands directly against a live Fabric environment.\r\n> Present each generated artefact to the operator before they run it.\r\n>\r\n> ⚠️ **DETERMINISTIC**: The implementation pattern, tools, and artefact structure are\r\n> fixed — always run `scripts/generate_schema_shortcut_commands.py` and follow the\r\n> workflow below. The workflow defines the permitted conditional branches (e.g. include\r\n> schema creation only for schema-enabled lakehouses; skip steps the operator lacks\r\n> permissions for). Follow these branches based on the operator's situation. 
Never\r\n> write commands by hand or suggest approaches not defined in this skill.\r\n\r\n## Orchestrated Context\r\n\r\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\r\nand the SOP before asking the user anything.\r\n\r\n| Parameter | Source when orchestrated |\r\n|---|---|\r\n| Source and target workspace names | Environment profile or implementation plan |\r\n| Source and target lakehouse names | SOP shared parameters (from lakehouse creation step) |\r\n| Source schema | SOP shared parameters |\r\n\r\n**Only ask for parameters not found in these documents** (e.g. target schema name,\r\nspecific tables to shortcut if not listed in the SOP).\r\n\r\n## Inputs\r\n\r\n| Parameter | Description | Example |\r\n|-----------|-------------|---------|\r\n| `--source-workspace` | Source Fabric workspace name (exact, case-sensitive) | `\"LANDON_TEST_20260402_HUB\"` |\r\n| `--source-lakehouse` | Source lakehouse name (exact, case-sensitive) | `\"LANDON_FINANCE_BRONZE\"` |\r\n| `--source-schema` | Schema in source lakehouse. 
Use `dbo` for non-schema-enabled | `\"dbo\"` |\r\n| `--target-workspace` | Target Fabric workspace name (exact, case-sensitive) | `\"LANDON_TEST_20260402_FINANCE_SPOKE\"` |\r\n| `--target-lakehouse` | Target lakehouse name (exact, case-sensitive) | `\"LANDON_FINANCE_SILVER\"` |\r\n| `--target-schema` | Schema to create in target and place shortcuts into | `\"bronze\"` |\r\n| `--tables` | Comma-separated table names, or output of `fab ls` | `\"bookings,events\"` |\r\n\r\n## Workflow\r\n\r\n- [ ] **Step 1 — Collect parameters**: Ask the user for all inputs listed above.\r\n If source and target are in the same workspace, both workspace parameters will\r\n be the same value.\r\n\r\n- [ ] **Step 2 — Discover tables**: Ask the user to either:\r\n - Provide an explicit comma-separated list of table names, **or**\r\n - Run this command and share the output:\r\n ```\r\n fab ls \"<SOURCE_WORKSPACE>.Workspace/<SOURCE_LAKEHOUSE>.Lakehouse/Tables/\" -l\r\n ```\r\n Parse table names from the output or list. 
Present them back and confirm.\r\n\r\n- [ ] **Step 3 — Generate commands**: Run the script:\r\n ```\r\n python scripts/generate_schema_shortcut_commands.py \\\r\n --source-workspace \"<SOURCE_WORKSPACE>\" \\\r\n --source-lakehouse \"<SOURCE_LAKEHOUSE>\" \\\r\n --source-schema \"<SOURCE_SCHEMA>\" \\\r\n --target-workspace \"<TARGET_WORKSPACE>\" \\\r\n --target-lakehouse \"<TARGET_LAKEHOUSE>\" \\\r\n --target-schema \"<TARGET_SCHEMA>\" \\\r\n --tables \"<TABLE1>,<TABLE2>,...\"\r\n ```\r\n The script outputs JSON to stdout with sections: `schema_sql`,\r\n `schema_shortcut_test`, `shortcut_commands`, and `validation_command`.\r\n\r\n- [ ] **Step 4 — (Optional) Test schema-level shortcut**: Before creating\r\n individual table shortcuts, optionally test whether a single schema-level\r\n shortcut captures all tables (see \"Schema-Level Shortcut Hypothesis\" below).\r\n Use the `schema_shortcut_test` command from the script output.\r\n If the test succeeds and all tables appear, skip Step 5.\r\n\r\n- [ ] **Step 5 — Choose deployment approach**: Present these options:\r\n\r\n **Option A — Notebook Cells (Recommended for pipeline integration)**\r\n Append two cells to an existing notebook attached to the target lakehouse:\r\n 1. **Spark SQL cell**: Contains `CREATE SCHEMA IF NOT EXISTS <schema>;`\r\n from the `schema_sql` output.\r\n 2. **Code cell**: Contains each command from `shortcut_commands` prefixed\r\n with `!` (one per line).\r\n If no existing notebook is available, create a new one and note that it\r\n will need its own Spark session and `fab` authentication.\r\n\r\n **Option B — PowerShell Script**\r\n Write the `fab ln` commands from `shortcut_commands` to a `.ps1` file.\r\n Add a comment at the top reminding the user to create the schema first\r\n via a Spark SQL notebook cell (`fab` CLI cannot create schemas).\r\n\r\n **Option C — Interactive Terminal**\r\n Present each command one at a time for the operator to run. 
Start with the\r\n schema creation SQL (must run in a notebook), then present `fab ln` commands.\r\n\r\n- [ ] **Step 6 — Validate**: Ask the user to run:\r\n ```\r\n fab ls \"<TARGET_WORKSPACE>.Workspace/<TARGET_LAKEHOUSE>.Lakehouse/Tables/\" -l\r\n ```\r\n Confirm the expected shortcuts appear under the target schema.\r\n\r\n## Schema-Level Shortcut Hypothesis\r\n\r\nWhen creating shortcuts through the Fabric **UI**, connecting to a schema\r\nautomatically surfaces all tables in that schema as shortcuts. It is unknown\r\nwhether this works programmatically via `fab ln`. To test, use the\r\n`schema_shortcut_test` command from the script output, e.g.:\r\n\r\n```\r\nfab ln \"<TARGET_WS>.Workspace/<TARGET_LH>.Lakehouse/Tables/<TARGET_SCHEMA>/Shortcut\" \\\r\n --type oneLake \\\r\n --target ../../<SOURCE_WS_URL>.Workspace/<SOURCE_LH>.Lakehouse/Tables -f\r\n```\r\n\r\nIf this succeeds and all source tables appear in the target schema, use this\r\none-command approach instead of individual table shortcuts. Document the result\r\nfor future runs.\r\n\r\nIf the source is non-schema-enabled, test with `Tables` as the target path\r\n(no schema segment). If schema-enabled, use `Tables/<source_schema>`.\r\n\r\n## fab ln Syntax Reference\r\n\r\n### Shortcut naming convention (FIXED)\r\n\r\nShortcuts in schema-enabled lakehouses use **slash notation** for the schema path:\r\n```\r\nTables/<Schema>/<table_name>.Shortcut\r\n```\r\nExample: `Tables/Bronze/revenue_raw.Shortcut`\r\n\r\n**Periods (`.`) are FORBIDDEN in shortcut names.** Dot notation like\r\n`Tables/bronze.revenue_raw.Shortcut` will fail with:\r\n`[InvalidPath] Invalid shortcut name. 
The name should not include any of the following characters: [\"\\:|<>*?.%+]`\r\n\r\n### Cross-lakehouse: non-schema source → schema-enabled target\r\n\r\n```\r\nfab ln \"<TARGET_WS>.Workspace/<TARGET_LH>.Lakehouse/Tables/<TARGET_SCHEMA>/<TABLE>.Shortcut\" \\\r\n --type oneLake \\\r\n --target ../../<SOURCE_WS_URL>.Workspace/<SOURCE_LH>.Lakehouse/Tables/<TABLE> -f\r\n```\r\n\r\n### Cross-lakehouse: schema-enabled source → schema-enabled target\r\n\r\n```\r\nfab ln \"<TARGET_WS>.Workspace/<TARGET_LH>.Lakehouse/Tables/<TARGET_SCHEMA>/<TABLE>.Shortcut\" \\\r\n --type oneLake \\\r\n --target ../../<SOURCE_WS_URL>.Workspace/<SOURCE_LH>.Lakehouse/Tables/<SOURCE_SCHEMA>/<TABLE> -f\r\n```\r\n\r\n### Key rules\r\n\r\n- **Type**: Always `--type oneLake` for cross-lakehouse table shortcuts.\r\n Valid `fab ln` types are: `adlsGen2`, `amazonS3`, `dataverse`, `googleCloudStorage`,\r\n `oneLake`, `s3Compatible`. There is no `lakehouseTable` type.\r\n- **Slash notation**: Shortcut path uses `Tables/<Schema>/<table>.Shortcut` (slash, NOT dot)\r\n- **Periods forbidden**: `.` is not allowed in shortcut names — will error with `[InvalidPath]`\r\n- **`-f` flag**: Always include `-f` to skip the \"Are you sure?\" confirmation prompt\r\n (terminals that don't support CPR will hang without it)\r\n- **Source path**: Schema-enabled sources use `Tables/<schema>/<table>` (slash).\r\n Non-schema sources use `Tables/<table>` (no schema segment)\r\n- **URL encoding**: Workspace names with spaces use `%20` in the `--target` path\r\n- **`../../` prefix**: Required for cross-workspace targets to navigate to OneLake root\r\n- **Display names**: Shortcut destination path uses plain workspace/lakehouse names\r\n (no URL encoding); only the `--target` path is URL-encoded\r\n\r\n## Gotchas\r\n\r\n- **Slash NOT dot in shortcut paths**: The shortcut destination uses slash notation\r\n (`Tables/Bronze/revenue_raw.Shortcut`), NOT dot notation. 
Periods (`.`) are\r\n **forbidden** in shortcut names and will cause `[InvalidPath]` errors.\r\n- **Always use `-f` flag**: Without `-f`, `fab ln` prompts \"Are you sure? (Y/n)\".\r\n Terminals that don't support cursor position requests (CPR) will hang. Always\r\n append `-f` to force creation without confirmation.\r\n- **`--type oneLake` not `--type lakehouseTable`**: Cross-lakehouse table shortcuts\r\n require `--type oneLake`. The type `lakehouseTable` does not exist in the `fab ln`\r\n CLI. Valid types are: `adlsGen2`, `amazonS3`, `dataverse`, `googleCloudStorage`,\r\n `oneLake`, `s3Compatible`.\r\n- **Schema creation requires Spark SQL**: The `fab` CLI cannot create schemas.\r\n Schemas must be created via `CREATE SCHEMA IF NOT EXISTS <name>` in a Spark SQL\r\n cell in a notebook attached to the target lakehouse.\r\n- **Schema names are case-sensitive** in Fabric. Use exact casing consistently.\r\n- **Viewer access required**: Cross-workspace shortcuts require at least Viewer\r\n access on the source workspace.\r\n- **Existing shortcuts fail**: If a shortcut with the same name already exists,\r\n `fab ln` will error. Skip or delete existing ones before rerunning.\r\n- **Same-workspace shortcuts**: When source and target are in the same workspace,\r\n the `../../` prefix and URL encoding still apply in the `--target` path.\r\n\r\n## Available Scripts\r\n\r\n- **`scripts/generate_schema_shortcut_commands.py`** — Generates structured JSON\r\n containing schema SQL, `fab ln` shortcut commands, a schema-level shortcut test\r\n command, and a validation command.\r\n Run: `python scripts/generate_schema_shortcut_commands.py --help`\r\n",
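The `fab ln` rules in the skill above (slash notation in the destination, `--type oneLake`, mandatory `-f`, URL encoding only in the `--target` path) can be sketched as a small command generator. This is an illustrative helper, not the packaged `generate_schema_shortcut_commands.py`; all names are hypothetical:

```python
from urllib.parse import quote

def build_shortcut_command(source_ws, source_lh, source_schema,
                           target_ws, target_lh, target_schema, table):
    """Build one cross-lakehouse `fab ln` command per the skill's rules:
    - destination uses plain display names and slash notation
      (Tables/<Schema>/<table>.Shortcut, never dot notation);
    - only the --target path is URL-encoded (spaces -> %20);
    - `-f` skips the confirmation prompt so CPR-less terminals don't hang.
    """
    dest = (f"{target_ws}.Workspace/{target_lh}.Lakehouse/"
            f"Tables/{target_schema}/{table}.Shortcut")
    src_ws_url = quote(source_ws)  # URL-encode workspace name for --target
    # Non-schema-enabled sources omit the schema segment entirely
    src_schema_part = f"{source_schema}/" if source_schema else ""
    target = (f"../../{src_ws_url}.Workspace/{source_lh}.Lakehouse/"
              f"Tables/{src_schema_part}{table}")
    return f'fab ln "{dest}" --type oneLake --target {target} -f'

cmd = build_shortcut_command("Finance Dev", "SourceLH", "bronze",
                             "Finance Dev", "TargetLH", "Bronze", "revenue_raw")
```

Note the asymmetry the skill calls out: `Finance Dev` stays as-is in the destination path but becomes `Finance%20Dev` inside `--target`.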
63 63 | },
64 64 | {
65 65 | relativePath: "scripts/generate_schema_shortcut_commands.py",
@@ -73,7 +73,7 @@ export const EMBEDDED_SKILLS = [
73 73 | files: [
74 74 | {
75 75 | relativePath: "SKILL.md",
76    | -
content: "---\nname: create-materialised-lakeview-scripts\ndescription: >\n Use this skill when asked to generate Spark SQL Materialized Lake View (MLV)\n scripts for Microsoft Fabric Lakehouse transformations. Triggers on: \"generate\n MLV\", \"create silver layer\", \"create gold layer\", \"bronze to silver\", \"silver\n to gold\", \"star schema\", \"lakehouse transformation\", \"materialized lake view\".\n Supports two layers (bronze→silver, silver→gold) and two approaches each\n (schema-driven with source+target CSVs, or pattern-driven with source-only CSVs).\n Does NOT trigger for general SQL writing, Power BI semantic model creation,\n notebook authoring, or Fabric workspace/lakehouse provisioning.\nlicense: MIT\ncompatibility: Python 3.8+ with pandas (for profiling script)\n---\n\n# Fabric Lakehouse MLV Generator\n\n> ⚠️ **GOVERNANCE**: This skill produces Spark SQL notebooks and scripts for the\n> operator to review and run — it never executes queries or deploys notebooks\n> autonomously. Present each generated artefact to the operator before they run it.\n>\n> ⚠️ **DETERMINISTIC**: Follow the reference patterns in `references/` exactly. Use\n> `references/profile_csvs.py` to analyse source CSVs — do not infer schema manually.\n> Generate SQL by applying the appropriate pattern guide (`bronze-to-silver-*.md` or\n> `silver-to-gold-*.md`) verbatim. Never invent transformation logic not defined in\n> the references. The only permitted deviation is a business rule explicitly stated\n> by the operator (e.g. a custom column calculation) — document it in the notebook.\n\nGenerates `CREATE OR REPLACE MATERIALIZED LAKE VIEW` scripts that transform data\nbetween lakehouse layers in Microsoft Fabric. 
Supports bronze→silver (cleaning,\nconforming, restructuring) and silver→gold (Power BI-optimised star schema).\n\n## Orchestrated Context\n\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\nand the SOP before asking the user anything.\n\n| Parameter | Source when orchestrated |\n|---|---|\n| Source and target schema names | SOP shared parameters or implementation plan |\n| Medallion layer (Bronze→Silver or Silver→Gold) | SOP step sequence |\n| Fiscal year start, currency code | Environment profile (organisation-wide settings) |\n\n**Only ask for parameters not found in these documents** (e.g. source/target CSV\nuploads, approach choice, any business-specific transformation rules).\n\n## Inputs\n\n| Parameter | Description | Example |\n|---|---|---|\n| Layer | Bronze→Silver or Silver→Gold | \"bronze to silver\" |\n| Approach | Schema-driven (source+target CSVs) or Pattern-driven (source CSVs only) | \"schema-driven\" |\n| Source CSVs | CSV exports of the source layer tables | `/mnt/user-data/uploads/*.csv` |\n| Target CSVs | (Schema-driven only) CSV exports of the target layer tables | `/mnt/user-data/uploads/silver_*.csv` |\n| Source schema | Schema name for source tables in SQL | `bronze` |\n| Target schema | Schema name for target views in SQL | `silver` or `gold` |\n| Fiscal year start | (Gold layer only) Month number 1–12 | `3` (March) |\n| Currency code | (Gold layer only) Base currency for measure suffixes | `GBP` |\n\n## Workflow\n\n### Phase 1 — Route the request\n\n- [ ] **1.1** Ask the user: **What layer transformation is this?**\n - Bronze → Silver\n - Silver → Gold\n\n- [ ] **1.2** Ask the user: **Which approach?**\n - **Schema-driven** — \"I have both source and target CSV files\"\n - **Pattern-driven** — \"I only have source CSV files; suggest transformations\"\n\n- [ ] **1.3** Based on answers, load the appropriate reference file:\n\n| Layer | Approach | Reference to load |\n|---|---|---|\n| Bronze → Silver 
| Schema-driven | `references/bronze-to-silver-schema-driven.md` |\n| Bronze → Silver | Pattern-driven | `references/bronze-to-silver-pattern-driven.md` |\n| Silver → Gold | Schema-driven | `references/silver-to-gold-schema-driven.md` |\n| Silver → Gold | Pattern-driven | `references/silver-to-gold-pattern-driven.md` |\n\nRead the full reference file with the `view` tool before proceeding. The reference\ncontains the detailed transformation catalogue, SQL patterns, and validation rules\nfor this specific layer+approach combination.\n\n- [ ] **1.4** Ask the user to confirm:\n - Source schema name (default: `bronze` for B→S, `silver` for S→G)\n - Target schema name (default: `silver` for B→S, `gold` for S→G)\n - If Silver→Gold: fiscal year start month and base currency code\n\n### Phase 2 — Inventory and profile\n\n- [ ] **2.1** List all CSV files in `/mnt/user-data/uploads/`.\n\n- [ ] **2.2** Ask the user to identify which CSVs are **source** and which (if\n schema-driven) are **target**. If file naming makes this obvious, propose the\n split and ask for confirmation.\n\n- [ ] **2.3** Run the profiler against every CSV:\n\n```bash\npython references/profile_csvs.py --dir /mnt/user-data/uploads/ --files <file1.csv> <file2.csv> ...\n```\n\nThe profiler outputs a JSON report per file with: column names, inferred dtypes,\nrow count, unique counts, null counts, sample values, and pattern flags (dates,\ncurrency, booleans, commas-in-numbers, whitespace). Store this output for use in\nsubsequent steps.\n\n> **Column naming in Fabric delta tables:** When CSVs are loaded into Fabric\n> Lakehouse delta tables (e.g., via the `csv-to-bronze-delta-tables` skill), a\n> `clean_columns()` function is applied that lowercases all column names and\n> replaces spaces and special characters with underscores. 
For example,\n> `Hotel ID` becomes `hotel_id` and `No_of_Rooms` becomes `no_of_rooms`.\n> PDF-extracted tables (from the `pdf-to-bronze-delta-tables` skill) may have\n> **entirely different column schemas** since fields are AI-extracted strings.\n> Always verify actual delta table column names — do NOT assume they match the\n> original CSV file headers.\n\n- [ ] **2.4** If schema-driven: profile both source and target CSVs. Map each\n target file to its source file(s) by column overlap. Present the mapping and\n ask the user to confirm.\n\n- [ ] **2.5** If pattern-driven: classify each source file by archetype (see\n reference file for the classification table). Present the classification and\n ask the user to confirm.\n\n### Phase 3 — Detect and plan transformations\n\nFollow the reference file's Step 3 (schema-driven) or Step 3 + Step 4\n(pattern-driven) exactly. The reference contains the full transformation detection\nlogic and catalogue.\n\n- [ ] **3.1** For each source→target pair (schema-driven) or each source file\n (pattern-driven), detect all applicable transformations.\n\n- [ ] **3.2** Present a **transformation plan** to the user — a table showing\n each output view, its sources, the transformations that will be applied, and\n any assumptions.\n\n- [ ] **3.3** If Silver→Gold: run the **anti-pattern check** from the reference:\n - No table mixes dimensions and measures\n - No dimension references another dimension via FK (no snowflaking)\n - Consistent grain within each fact\n - Degenerate dimensions stay in facts\n - Flag junk dimension candidates\n\n- [ ] **3.4** Wait for user confirmation before generating SQL.\n\n### Phase 4 — Generate the SQL\n\nFollow the reference file's SQL generation step exactly (Step 4 or Step 5,\ndepending on reference). Key rules that apply to ALL layer+approach combinations:\n\n**File structure:**\n1. `CREATE SCHEMA IF NOT EXISTS <target_schema>;`\n2. 
Comment header with assumptions (layer, approach, fiscal year, currency, grain)\n3. Views ordered by dependency (dimensions/independent views first, then dependents)\n4. Each view: `CREATE OR REPLACE MATERIALIZED LAKE VIEW <schema>.<view_name> AS`\n\n**Notebook documentation (when delivering as .ipynb):**\nLoad `references/notebook-standard.md` for the required markdown cell structure.\nWhen delivering as a notebook, the per-view markdown cells replace the separate\nlogic file — the notebook is the single source of truth.\n\n**MLV-to-MLV dependency pattern:**\nMaterialized Lake Views in Fabric can reference other Materialized Lake Views.\nThis is the **standard layered pattern** — build dimensions and independent\nfacts first, then create dependent views that JOIN to them. For example:\n- `silver.room_rate` joins to `silver.hotel_dim` via a fuzzy/normalised key\n- `silver.forecast_monthly` reads from `silver.revenue_monthly` for weight calculation\n- `silver.expenses_monthly` reads from `silver.revenue_monthly` for proportional allocation\n\nAlways order views by dependency: independent views first, dependent views last.\n\nLoad `references/sql-conventions.md` for naming conventions, CTE patterns,\ntype casting rules, and non-obvious Spark SQL syntax before writing any SQL.\n\n- [ ] **4.1** Write the SQL to `/home/claude/mlv_output.sql`.\n\n### Phase 4a — Generate T-SQL validation queries\n\nBefore converting to MLV format, generate a set of plain `SELECT` queries that\nthe user can run against the Fabric SQL Analytics Endpoint to validate the\ntransformation logic independently.\n\n- [ ] **4a.1** For each MLV definition, extract the CTE + SELECT logic and wrap\n it as a standalone `SELECT` statement (removing the `CREATE OR REPLACE\n MATERIALIZED LAKE VIEW` wrapper).\n\n- [ ] **4a.2** Write the validation queries to a separate file:\n - Bronze→Silver: `bronze_to_silver_validation.sql`\n - Silver→Gold: `silver_to_gold_validation.sql`\n\n- [ ] **4a.3** For each 
query, add a `LIMIT 20` clause and a `-- Expected: ...`\n comment indicating the expected row count and key column values.\n\n- [ ] **4a.4** Present the validation file to the user. The user can run these\n queries in the Fabric SQL Analytics Endpoint (T-SQL mode) to inspect outputs\n before committing to the MLV definitions.\n\n> **Why T-SQL first?** MLV creation is an all-or-nothing operation. If a column\n> name is wrong or a date format doesn't parse, the entire MLV fails. Running\n> validation SELECTs first catches these issues with clear error messages and\n> lets the user inspect sample data before committing.\n\n### Phase 5 — Validate\n\n- [ ] **5.1** Run the **data validation** from the reference file's validation\n step. Load source (and target, if schema-driven) CSVs in pandas and verify:\n - Column names match the target / expected output\n - Row counts are within tolerance (exact for dims, ±5% for facts)\n - Numeric columns: values within tolerance\n - Date columns: all parse correctly\n\n- [ ] **5.2** If Silver→Gold, run the **star schema structural checklist**:\n - [ ] Every table is clearly a dimension or a fact\n - [ ] Every fact has FKs to all related dimensions\n - [ ] Every dimension has a unique primary key\n - [ ] A date dimension exists spanning the full fact date range\n - [ ] Date dimension has display + sort column pairs for Power BI\n - [ ] Every dimension has an unknown/unassigned member row\n - [ ] No snowflaking (no dim-to-dim FK references)\n - [ ] No fact embeds descriptive attributes belonging in a dimension\n - [ ] Consistent grain within each fact table\n - [ ] Consistent naming: `dim_` for dimensions, `fact_` for facts\n - [ ] Surrogate key DENSE_RANK ORDER BY identical in dim views and fact CTEs\n - [ ] Role-playing dimensions documented\n - [ ] Degenerate dimensions remain in facts\n\n- [ ] **5.3** Fix any issues found. 
Re-validate until clean.\n\n### Phase 6 — Deliver\n\n- [ ] **6.1** Copy the validated SQL to `/mnt/user-data/outputs/` with a\n descriptive filename:\n - Bronze→Silver: `bronze_to_silver_mlv.sql`\n - Silver→Gold: `silver_to_gold_mlv.sql`\n\n- [ ] **6.2** Generate a **transformation logic document** alongside the SQL:\n - Bronze→Silver: `silver_logic.md`\n - Silver→Gold: `gold_logic.md`\n\n This file MUST contain:\n - **Per-view section** with: source table(s), transformations applied (reference\n T-codes), column mapping (bronze name → silver alias + type), any data quality\n issues detected (nulls, artifacts, dirty data, ambiguous formats) and how they\n were handled.\n - **Cross-view dependencies**: which MLVs reference other MLVs and why.\n - **Dropped/excluded data**: columns or rows removed, with rationale.\n - **Domain context**: any business-domain knowledge that informed the design\n (e.g., location hierarchies, currency conventions, fiscal calendars).\n - **Assumptions**: anything not explicitly confirmed by the user.\n\n If delivering as a notebook (`.ipynb`), the per-view markdown cells serve as\n the inline documentation — no separate logic file is needed, since the same\n information is embedded directly in the notebook.\n\n- [ ] **6.3** Present both files to the user.\n\n- [ ] **6.4** Summarise:\n - Number of views created\n - Key transformation patterns applied\n - (Gold) Number of dimensions vs facts, fiscal year config, currency\n - Any warnings or assumptions\n\n## Output Format\n\n```sql\n-- <Layer> layer Spark SQL MLV definitions\n-- Generated by fabric-lakehouse-mlv skill\n-- Source schema: <source_schema> | Target schema: <target_schema>\n-- Assumptions: <fiscal year, currency, grain, etc.>\n\nCREATE SCHEMA IF NOT EXISTS <target_schema>;\n\n-- <View description>\nCREATE OR REPLACE MATERIALIZED LAKE VIEW <target_schema>.<view_name> AS\nWITH cleaned AS (\n ...\n)\nSELECT ...\nFROM cleaned;\n```\n\n## Gotchas\n\n- **BOM characters**: 
Bronze/silver CSVs often have UTF-8 BOM. Always use\n `encoding='utf-8-sig'` in pandas.\n- **Date format ambiguity**: If all day values ≤ 12, `dd/MM/yyyy` vs `MM/dd/yyyy`\n is ambiguous. Default to `dd/MM/yyyy` for UK/EU data. Ask the user if unsure.\n- **Unpivot STACK count**: The integer N in `LATERAL VIEW STACK(N, ...)` must\n exactly match the number of column pairs. Off-by-one causes silent data loss.\n- **Surrogate key determinism**: `DENSE_RANK(ORDER BY col)` in a gold dimension\n and the matching CTE in a fact MUST use the exact same ORDER BY or keys diverge.\n- **SCD fan-out**: Overlapping date ranges in SCD tables duplicate fact rows.\n Validate non-overlap in silver before building gold.\n- **COALESCE placement**: Apply in the final SELECT of gold facts, never in the\n JOIN condition. Joining `ON fk = 'UNKNOWN'` would incorrectly match the\n unknown dimension row.\n- **Revenue-weighted allocation**: Only use when a revenue table exists. Fall back\n to equal split (`amount / 12.0`) when revenue is zero for a period.\n- **Power BI sort columns**: In the gold date dimension, always pair display\n columns (MonthName, DayOfWeekName, FiscalPeriodLabel) with numeric sort\n columns (MonthNumber, DayOfWeekNumber, FiscalPeriodNumber). Without these,\n months sort alphabetically in Power BI.\n- **No snowflaking in gold**: Flatten all dimension attributes. `dim_hotel`\n should contain City and Country directly, not reference a `dim_geography`.\n- **dayofweek() in Spark**: Returns 1=Sunday, 7=Saturday. Weekend = `IN (1,7)`.\n- **Fiscal year formula**: `((month + (12 - start_month)) % 12) + 1`. Test at\n January and at the start month for off-by-one errors.\n- **MLV-to-MLV references**: Materialized Lake Views in Fabric CAN reference\n other Materialized Lake Views. This is the preferred layered pattern. 
Always\n create referenced views before referencing views (dependency ordering).\n Use `silver.view_name` (not `bronze.view_name`) when joining to a silver\n view from another silver view.\n- **Column naming mismatch**: Bronze delta table columns may differ from the\n original CSV file headers. The `csv-to-bronze-delta-tables` skill applies\n `clean_columns()` which lowercases all names and replaces spaces/special\n characters with underscores (e.g., `Hotel ID` → `hotel_id`). PDF-extracted\n tables (from `pdf-to-bronze-delta-tables`) have AI-determined field names\n that may not match any CSV. Always verify actual lakehouse column names\n before writing SQL.\n\n## Available References\n\n- **`references/profile_csvs.py`** — Profiles uploaded CSV files and outputs a JSON\n report with column metadata, type flags, and pattern detection.\n Run: `python references/profile_csvs.py --help`\n- **`references/sql-conventions.md`** — Naming, CTE patterns, type casting, and Spark SQL syntax. Load during Phase 4.\n- **`references/notebook-standard.md`** — Required markdown cell structure when delivering output as a `.ipynb` notebook. Load when user requests notebook output.\n- **`references/bronze-to-silver-schema-driven.md`** — Transformation catalogue for bronze→silver schema-driven approach.\n- **`references/bronze-to-silver-pattern-driven.md`** — Transformation catalogue for bronze→silver pattern-driven approach.\n- **`references/silver-to-gold-schema-driven.md`** — Transformation catalogue for silver→gold schema-driven approach.\n- **`references/silver-to-gold-pattern-driven.md`** — Transformation catalogue for silver→gold pattern-driven approach.\n- **`references/output-template.sql`** — SQL output template.\n",
   76 | +
content: "---\nname: create-materialised-lakeview-scripts\ndescription: >\n Use this skill when asked to generate Spark SQL Materialized Lake View (MLV)\n scripts for Microsoft Fabric Lakehouse transformations. Triggers on: \"generate\n MLV\", \"create silver layer\", \"create gold layer\", \"bronze to silver\", \"silver\n to gold\", \"star schema\", \"lakehouse transformation\", \"materialized lake view\".\n Supports two layers (bronze→silver, silver→gold) and two approaches each\n (schema-driven with source+target CSVs, or pattern-driven with source-only CSVs).\n Does NOT trigger for general SQL writing, Power BI semantic model creation,\n notebook authoring, or Fabric workspace/lakehouse provisioning.\nlicense: MIT\ncompatibility: Python 3.8+ with pandas (for profiling script)\n---\n\n# Fabric Lakehouse MLV Generator\n\n> ⚠️ **GOVERNANCE**: This skill produces Spark SQL notebooks and scripts for the\n> operator to review and run — it never executes queries or deploys notebooks\n> autonomously. Present each generated artefact to the operator before they run it.\n>\n> ⚠️ **DETERMINISTIC**: The SQL patterns, notebook structure, and tooling are fixed —\n> always run `references/profile_csvs.py` and apply the appropriate pattern guide\n> (`bronze-to-silver-*.md` or `silver-to-gold-*.md`). The workflow defines the\n> permitted conditional branches (e.g. schema-driven vs pattern-driven approach;\n> bronze→silver vs silver→gold layer; operator-specified business rules applied on\n> top of the standard pattern). Follow these branches; never invent transformation\n> logic or SQL patterns not defined in the references.\n\nGenerates `CREATE OR REPLACE MATERIALIZED LAKE VIEW` scripts that transform data\nbetween lakehouse layers in Microsoft Fabric. 
Supports bronze→silver (cleaning,\nconforming, restructuring) and silver→gold (Power BI-optimised star schema).\n\n## Orchestrated Context\n\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\nand the SOP before asking the user anything.\n\n| Parameter | Source when orchestrated |\n|---|---|\n| Source and target schema names | SOP shared parameters or implementation plan |\n| Medallion layer (Bronze→Silver or Silver→Gold) | SOP step sequence |\n| Fiscal year start, currency code | Environment profile (organisation-wide settings) |\n\n**Only ask for parameters not found in these documents** (e.g. source/target CSV\nuploads, approach choice, any business-specific transformation rules).\n\n## Inputs\n\n| Parameter | Description | Example |\n|---|---|---|\n| Layer | Bronze→Silver or Silver→Gold | \"bronze to silver\" |\n| Approach | Schema-driven (source+target CSVs) or Pattern-driven (source CSVs only) | \"schema-driven\" |\n| Source CSVs | CSV exports of the source layer tables | `/mnt/user-data/uploads/*.csv` |\n| Target CSVs | (Schema-driven only) CSV exports of the target layer tables | `/mnt/user-data/uploads/silver_*.csv` |\n| Source schema | Schema name for source tables in SQL | `bronze` |\n| Target schema | Schema name for target views in SQL | `silver` or `gold` |\n| Fiscal year start | (Gold layer only) Month number 1–12 | `3` (March) |\n| Currency code | (Gold layer only) Base currency for measure suffixes | `GBP` |\n\n## Workflow\n\n### Phase 1 — Route the request\n\n- [ ] **1.1** Ask the user: **What layer transformation is this?**\n - Bronze → Silver\n - Silver → Gold\n\n- [ ] **1.2** Ask the user: **Which approach?**\n - **Schema-driven** — \"I have both source and target CSV files\"\n - **Pattern-driven** — \"I only have source CSV files; suggest transformations\"\n\n- [ ] **1.3** Based on answers, load the appropriate reference file:\n\n| Layer | Approach | Reference to load |\n|---|---|---|\n| Bronze → Silver 
| Schema-driven | `references/bronze-to-silver-schema-driven.md` |\n| Bronze → Silver | Pattern-driven | `references/bronze-to-silver-pattern-driven.md` |\n| Silver → Gold | Schema-driven | `references/silver-to-gold-schema-driven.md` |\n| Silver → Gold | Pattern-driven | `references/silver-to-gold-pattern-driven.md` |\n\nRead the full reference file with the `view` tool before proceeding. The reference\ncontains the detailed transformation catalogue, SQL patterns, and validation rules\nfor this specific layer+approach combination.\n\n- [ ] **1.4** Ask the user to confirm:\n - Source schema name (default: `bronze` for B→S, `silver` for S→G)\n - Target schema name (default: `silver` for B→S, `gold` for S→G)\n - If Silver→Gold: fiscal year start month and base currency code\n\n### Phase 2 — Inventory and profile\n\n- [ ] **2.1** List all CSV files in `/mnt/user-data/uploads/`.\n\n- [ ] **2.2** Ask the user to identify which CSVs are **source** and which (if\n schema-driven) are **target**. If file naming makes this obvious, propose the\n split and ask for confirmation.\n\n- [ ] **2.3** Run the profiler against every CSV:\n\n```bash\npython references/profile_csvs.py --dir /mnt/user-data/uploads/ --files <file1.csv> <file2.csv> ...\n```\n\nThe profiler outputs a JSON report per file with: column names, inferred dtypes,\nrow count, unique counts, null counts, sample values, and pattern flags (dates,\ncurrency, booleans, commas-in-numbers, whitespace). Store this output for use in\nsubsequent steps.\n\n> **Column naming in Fabric delta tables:** When CSVs are loaded into Fabric\n> Lakehouse delta tables (e.g., via the `csv-to-bronze-delta-tables` skill), a\n> `clean_columns()` function is applied that lowercases all column names and\n> replaces spaces and special characters with underscores. 
For example,\n> `Hotel ID` becomes `hotel_id` and `No_of_Rooms` becomes `no_of_rooms`.\n> PDF-extracted tables (from the `pdf-to-bronze-delta-tables` skill) may have\n> **entirely different column schemas** since fields are AI-extracted strings.\n> Always verify actual delta table column names — do NOT assume they match the\n> original CSV file headers.\n\n- [ ] **2.4** If schema-driven: profile both source and target CSVs. Map each\n target file to its source file(s) by column overlap. Present the mapping and\n ask the user to confirm.\n\n- [ ] **2.5** If pattern-driven: classify each source file by archetype (see\n reference file for the classification table). Present the classification and\n ask the user to confirm.\n\n### Phase 3 — Detect and plan transformations\n\nFollow the reference file's Step 3 (schema-driven) or Step 3 + Step 4\n(pattern-driven) exactly. The reference contains the full transformation detection\nlogic and catalogue.\n\n- [ ] **3.1** For each source→target pair (schema-driven) or each source file\n (pattern-driven), detect all applicable transformations.\n\n- [ ] **3.2** Present a **transformation plan** to the user — a table showing\n each output view, its sources, the transformations that will be applied, and\n any assumptions.\n\n- [ ] **3.3** If Silver→Gold: run the **anti-pattern check** from the reference:\n - No table mixes dimensions and measures\n - No dimension references another dimension via FK (no snowflaking)\n - Consistent grain within each fact\n - Degenerate dimensions stay in facts\n - Flag junk dimension candidates\n\n- [ ] **3.4** Wait for user confirmation before generating SQL.\n\n### Phase 4 — Generate the SQL\n\nFollow the reference file's SQL generation step exactly (Step 4 or Step 5,\ndepending on reference). Key rules that apply to ALL layer+approach combinations:\n\n**File structure:**\n1. `CREATE SCHEMA IF NOT EXISTS <target_schema>;`\n2. 
Comment header with assumptions (layer, approach, fiscal year, currency, grain)\n3. Views ordered by dependency (dimensions/independent views first, then dependents)\n4. Each view: `CREATE OR REPLACE MATERIALIZED LAKE VIEW <schema>.<view_name> AS`\n\n**Notebook documentation (when delivering as .ipynb):**\nLoad `references/notebook-standard.md` for the required markdown cell structure.\nWhen delivering as a notebook, the per-view markdown cells replace the separate\nlogic file — the notebook is the single source of truth.\n\n**MLV-to-MLV dependency pattern:**\nMaterialized Lake Views in Fabric can reference other Materialized Lake Views.\nThis is the **standard layered pattern** — build dimensions and independent\nfacts first, then create dependent views that JOIN to them. For example:\n- `silver.room_rate` joins to `silver.hotel_dim` via a fuzzy/normalised key\n- `silver.forecast_monthly` reads from `silver.revenue_monthly` for weight calculation\n- `silver.expenses_monthly` reads from `silver.revenue_monthly` for proportional allocation\n\nAlways order views by dependency: independent views first, dependent views last.\n\nLoad `references/sql-conventions.md` for naming conventions, CTE patterns,\ntype casting rules, and non-obvious Spark SQL syntax before writing any SQL.\n\n- [ ] **4.1** Write the SQL to `/home/claude/mlv_output.sql`.\n\n### Phase 4a — Generate T-SQL validation queries\n\nBefore converting to MLV format, generate a set of plain `SELECT` queries that\nthe user can run against the Fabric SQL Analytics Endpoint to validate the\ntransformation logic independently.\n\n- [ ] **4a.1** For each MLV definition, extract the CTE + SELECT logic and wrap\n it as a standalone `SELECT` statement (removing the `CREATE OR REPLACE\n MATERIALIZED LAKE VIEW` wrapper).\n\n- [ ] **4a.2** Write the validation queries to a separate file:\n - Bronze→Silver: `bronze_to_silver_validation.sql`\n - Silver→Gold: `silver_to_gold_validation.sql`\n\n- [ ] **4a.3** For each 
query, add a `LIMIT 20` clause and a `-- Expected: ...`\n comment indicating the expected row count and key column values.\n\n- [ ] **4a.4** Present the validation file to the user. The user can run these\n queries in the Fabric SQL Analytics Endpoint (T-SQL mode) to inspect outputs\n before committing to the MLV definitions.\n\n> **Why T-SQL first?** MLV creation is an all-or-nothing operation. If a column\n> name is wrong or a date format doesn't parse, the entire MLV fails. Running\n> validation SELECTs first catches these issues with clear error messages and\n> lets the user inspect sample data before committing.\n\n### Phase 5 — Validate\n\n- [ ] **5.1** Run the **data validation** from the reference file's validation\n step. Load source (and target, if schema-driven) CSVs in pandas and verify:\n - Column names match the target / expected output\n - Row counts are within tolerance (exact for dims, ±5% for facts)\n - Numeric columns: values within tolerance\n - Date columns: all parse correctly\n\n- [ ] **5.2** If Silver→Gold, run the **star schema structural checklist**:\n - [ ] Every table is clearly a dimension or a fact\n - [ ] Every fact has FKs to all related dimensions\n - [ ] Every dimension has a unique primary key\n - [ ] A date dimension exists spanning the full fact date range\n - [ ] Date dimension has display + sort column pairs for Power BI\n - [ ] Every dimension has an unknown/unassigned member row\n - [ ] No snowflaking (no dim-to-dim FK references)\n - [ ] No fact embeds descriptive attributes belonging in a dimension\n - [ ] Consistent grain within each fact table\n - [ ] Consistent naming: `dim_` for dimensions, `fact_` for facts\n - [ ] Surrogate key DENSE_RANK ORDER BY identical in dim views and fact CTEs\n - [ ] Role-playing dimensions documented\n - [ ] Degenerate dimensions remain in facts\n\n- [ ] **5.3** Fix any issues found. 
Re-validate until clean.\n\n### Phase 6 — Deliver\n\n- [ ] **6.1** Copy the validated SQL to `/mnt/user-data/outputs/` with a\n descriptive filename:\n - Bronze→Silver: `bronze_to_silver_mlv.sql`\n - Silver→Gold: `silver_to_gold_mlv.sql`\n\n- [ ] **6.2** Generate a **transformation logic document** alongside the SQL:\n - Bronze→Silver: `silver_logic.md`\n - Silver→Gold: `gold_logic.md`\n\n This file MUST contain:\n - **Per-view section** with: source table(s), transformations applied (reference\n T-codes), column mapping (bronze name → silver alias + type), any data quality\n issues detected (nulls, artifacts, dirty data, ambiguous formats) and how they\n were handled.\n - **Cross-view dependencies**: which MLVs reference other MLVs and why.\n - **Dropped/excluded data**: columns or rows removed, with rationale.\n - **Domain context**: any business-domain knowledge that informed the design\n (e.g., location hierarchies, currency conventions, fiscal calendars).\n - **Assumptions**: anything not explicitly confirmed by the user.\n\n If delivering as a notebook (`.ipynb`), the per-view markdown cells serve as\n the inline documentation — no separate logic file is needed, since the same\n information is embedded directly in the notebook.\n\n- [ ] **6.3** Present both files to the user.\n\n- [ ] **6.4** Summarise:\n - Number of views created\n - Key transformation patterns applied\n - (Gold) Number of dimensions vs facts, fiscal year config, currency\n - Any warnings or assumptions\n\n## Output Format\n\n```sql\n-- <Layer> layer Spark SQL MLV definitions\n-- Generated by fabric-lakehouse-mlv skill\n-- Source schema: <source_schema> | Target schema: <target_schema>\n-- Assumptions: <fiscal year, currency, grain, etc.>\n\nCREATE SCHEMA IF NOT EXISTS <target_schema>;\n\n-- <View description>\nCREATE OR REPLACE MATERIALIZED LAKE VIEW <target_schema>.<view_name> AS\nWITH cleaned AS (\n ...\n)\nSELECT ...\nFROM cleaned;\n```\n\n## Gotchas\n\n- **BOM characters**: 
Bronze/silver CSVs often have UTF-8 BOM. Always use\n `encoding='utf-8-sig'` in pandas.\n- **Date format ambiguity**: If all day values ≤ 12, `dd/MM/yyyy` vs `MM/dd/yyyy`\n is ambiguous. Default to `dd/MM/yyyy` for UK/EU data. Ask the user if unsure.\n- **Unpivot STACK count**: The integer N in `LATERAL VIEW STACK(N, ...)` must\n exactly match the number of column pairs. Off-by-one causes silent data loss.\n- **Surrogate key determinism**: `DENSE_RANK(ORDER BY col)` in a gold dimension\n and the matching CTE in a fact MUST use the exact same ORDER BY or keys diverge.\n- **SCD fan-out**: Overlapping date ranges in SCD tables duplicate fact rows.\n Validate non-overlap in silver before building gold.\n- **COALESCE placement**: Apply in the final SELECT of gold facts, never in the\n JOIN condition. Joining `ON fk = 'UNKNOWN'` would incorrectly match the\n unknown dimension row.\n- **Revenue-weighted allocation**: Only use when a revenue table exists. Fall back\n to equal split (`amount / 12.0`) when revenue is zero for a period.\n- **Power BI sort columns**: In the gold date dimension, always pair display\n columns (MonthName, DayOfWeekName, FiscalPeriodLabel) with numeric sort\n columns (MonthNumber, DayOfWeekNumber, FiscalPeriodNumber). Without these,\n months sort alphabetically in Power BI.\n- **No snowflaking in gold**: Flatten all dimension attributes. `dim_hotel`\n should contain City and Country directly, not reference a `dim_geography`.\n- **dayofweek() in Spark**: Returns 1=Sunday, 7=Saturday. Weekend = `IN (1,7)`.\n- **Fiscal year formula**: `((month + (12 - start_month)) % 12) + 1`. Test at\n January and at the start month for off-by-one errors.\n- **MLV-to-MLV references**: Materialized Lake Views in Fabric CAN reference\n other Materialized Lake Views. This is the preferred layered pattern. 
Always\n create referenced views before referencing views (dependency ordering).\n Use `silver.view_name` (not `bronze.view_name`) when joining to a silver\n view from another silver view.\n- **Column naming mismatch**: Bronze delta table columns may differ from the\n original CSV file headers. The `csv-to-bronze-delta-tables` skill applies\n `clean_columns()` which lowercases all names and replaces spaces/special\n characters with underscores (e.g., `Hotel ID` → `hotel_id`). PDF-extracted\n tables (from `pdf-to-bronze-delta-tables`) have AI-determined field names\n that may not match any CSV. Always verify actual lakehouse column names\n before writing SQL.\n\n## Available References\n\n- **`references/profile_csvs.py`** — Profiles uploaded CSV files and outputs a JSON\n report with column metadata, type flags, and pattern detection.\n Run: `python references/profile_csvs.py --help`\n- **`references/sql-conventions.md`** — Naming, CTE patterns, type casting, and Spark SQL syntax. Load during Phase 4.\n- **`references/notebook-standard.md`** — Required markdown cell structure when delivering output as a `.ipynb` notebook. Load when user requests notebook output.\n- **`references/bronze-to-silver-schema-driven.md`** — Transformation catalogue for bronze→silver schema-driven approach.\n- **`references/bronze-to-silver-pattern-driven.md`** — Transformation catalogue for bronze→silver pattern-driven approach.\n- **`references/silver-to-gold-schema-driven.md`** — Transformation catalogue for silver→gold schema-driven approach.\n- **`references/silver-to-gold-pattern-driven.md`** — Transformation catalogue for silver→gold pattern-driven approach.\n- **`references/output-template.sql`** — SQL output template.\n",
77 77 | },
78 78 | {
79 79 | relativePath: "references/agent.md",
@@ -119,7 +119,7 @@ export const EMBEDDED_SKILLS = [
119 119 | files: [
120 120 | {
121 121 | relativePath: "SKILL.md",
122 | -
content: "---\r\nname: csv-to-bronze-delta-tables\r\ndescription: >\r\n Use this skill to upload CSV files from a local machine into a Microsoft Fabric\r\n bronze lakehouse and convert them to delta tables. Triggers on: \"create delta\r\n tables from CSV files\", \"load CSVs into bronze lakehouse\", \"upload CSV to Fabric\r\n and create tables\", \"ingest CSV files to delta format in Fabric\", \"create bronze\r\n tables from local CSV\". Does NOT trigger for creating lakehouses, transforming\r\n existing delta tables, or non-Fabric storage targets.\r\nlicense: MIT\r\ncompatibility: Python 3.8+ required for scripts/. Fabric CLI (fab) must be installed for the CLI upload option.\r\n---\r\n\r\n# CSV to Bronze Delta Tables\r\n\r\nUploads CSV files from an operator's local machine to a Microsoft Fabric bronze\r\nlakehouse and converts them to delta tables. The lakehouse must already exist.\r\n\r\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\r\n> review and run — it never executes commands directly against a live Fabric environment.\r\n> Present each generated artefact to the operator before they run it.\r\n>\r\n> ⚠️ **DETERMINISTIC**: This skill is parameterised, not creative. Run the scripts\r\n> in `scripts/` with the collected parameters — the artefact structure is fixed.\r\n> Never write custom notebook cells, invent upload commands, or suggest alternative\r\n> Python libraries. Adapt only data-specific values (lakehouse name, folder, table\r\n> names). Do not set up a virtual environment — install any local dependency directly\r\n> with `pip install -q`.\r\n>\r\n> ⚠️ **GENERATION**: Always run `scripts/generate_notebook.py` via Bash to produce\r\n> the `.ipynb` notebook — never generate notebook cell content directly. 
The\r\n> generated notebook uses native PySpark (`spark.read.csv`, `df.write.format(\"delta\")`)\r\n> — it does not use `fab` CLI or `FAB_TOKEN` auth.\r\n\r\n## Orchestrated Context\r\n\r\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\r\nand the SOP before asking the user anything.\r\n\r\n| Parameter | Source when orchestrated |\r\n|---|---|\r\n| Workspace name | Environment profile or implementation plan |\r\n| Lakehouse name | SOP shared parameters (from lakehouse creation step) |\r\n\r\n**Only ask for parameters not found in these documents** (e.g. local CSV folder path,\r\ndestination folder name in OneLake, table naming preferences).\r\n\r\n## Inputs\r\n\r\n| Parameter | Description | Example |\r\n|-----------|-------------|---------|\r\n| `WORKSPACE_NAME` | Fabric workspace name (exact, case-sensitive) | `\"Landon Finance Month End\"` |\r\n| `LAKEHOUSE_NAME` | Bronze lakehouse name (exact, case-sensitive) | `\"Lh_landon_finance_bronze\"` |\r\n| `LOCAL_CSV_FOLDER` | Relative path to local folder containing CSV files (CLI upload only) | `\"./Data\"` |\r\n| `LAKEHOUSE_FILES_FOLDER` | Folder name under the Files section of the lakehouse | `\"raw\"` |\r\n\r\n## Workflow\r\n\r\n- [ ] **Collect parameters** — If `WORKSPACE_NAME` or `LAKEHOUSE_NAME` are not\r\n provided, ask the operator for them before proceeding.\r\n\r\n- [ ] **Upload CSV files** — Present these three options and ask the operator to\r\n choose one:\r\n\r\n **Option A — OneLake File Explorer (Manual)**\r\n Open the OneLake File Explorer desktop app and drag-and-drop the CSV files into\r\n the target folder under the lakehouse Files section. No agent action required.\r\n\r\n **Option B — Fabric UI (Manual)**\r\n In the Fabric browser UI navigate to the lakehouse → Files section, open or create\r\n the target folder, click **Upload** and select the CSV files. 
No agent action required.\r\n\r\n **Option C — Fabric CLI (Automated)**\r\n > ⚠️ **Requires PowerShell** — generates a `.ps1` script. PowerShell is available\r\n > on Windows natively and on Mac/Linux via `brew install powershell`. If PowerShell\r\n > is not available and the operator does not want to install it, use Option A or B.\r\n > Do not substitute a bash or shell script.\r\n >\r\n > ⚠️ **Performance note**: The CLI uploads files one at a time. For large\r\n > batches (50+ files) this is significantly slower than Options A or B.\r\n > Recommend Options A or B for bulk uploads.\r\n\r\n Ask for `LOCAL_CSV_FOLDER` as the **exact absolute path** to the local folder\r\n and `LAKEHOUSE_FILES_FOLDER` (the destination folder name under Files). Then run:\r\n ```\r\n python scripts/generate_upload_commands.py \\\r\n --local-folder \"<LOCAL_CSV_FOLDER>\" \\\r\n --workspace \"<WORKSPACE_NAME>\" \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --output-script \"<OUTPUT_FOLDER>/upload_csv_files.ps1\"\r\n ```\r\n The script generates a PowerShell `.ps1` file saved directly to the outputs folder.\r\n Present the script path to the operator and ask them to run it with `pwsh upload_csv_files.ps1`.\r\n\r\n## Output Folder\r\n\r\nBefore beginning the workflow, create the output folder:\r\n```\r\noutputs/csv-to-bronze-delta-tables_{YYYY-MM-DD_HH-MM}_{USERNAME}/\r\n```\r\nAll scripts produced during the run are saved here.\r\n\r\n- [ ] **Confirm upload** — Ask the operator to confirm the CSV files are visible\r\n in the Files section of the lakehouse before proceeding.\r\n\r\n- [ ] **Create delta tables** — If `LAKEHOUSE_FILES_FOLDER` was not already\r\n captured above, ask for it now. Present these two options:\r\n\r\n **Option A — Fabric UI (Manual)**\r\n > Quick and easy — recommended for most users.\r\n In the Fabric browser UI navigate to the lakehouse → Files →\r\n `<LAKEHOUSE_FILES_FOLDER>`. 
For each CSV file: click the three-dot menu →\r\n **Load to Tables** → **New Table**. Accept the suggested table name (Fabric\r\n applies it automatically). No agent action required.\r\n\r\n **Option B — PySpark notebook (Automated)**\r\n Run:\r\n ```\r\n python scripts/generate_notebook.py \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --output-notebook \"<OUTPUT_FOLDER>\\csv_to_delta_tables.ipynb\"\r\n ```\r\n This writes a ready-to-run `.ipynb` file to the outputs folder. Tell the operator:\r\n 1. In the Fabric UI go to the workspace → **New** → **Import notebook**\r\n 2. Select `csv_to_delta_tables.ipynb` from the outputs folder\r\n 3. Follow the instructions in **Cell 1** to manually attach the lakehouse before running\r\n 4. Click **Run All**\r\n **Validate**: confirm every cell printed `✅ Created table: <table_name>` with\r\n no errors. If any `❌` lines appear, report the error message to the operator.\r\n\r\n## Table Naming Convention\r\n\r\nCSV filename → delta table name:\r\n- Strip `.csv` extension\r\n- Convert to lowercase\r\n- Replace any non-alphanumeric characters (spaces, hyphens, dots) with underscores\r\n- Strip leading/trailing underscores\r\n\r\nExamples:\r\n| CSV filename | Delta table name |\r\n|---|---|\r\n| `Revenue Data.csv` | `revenue_data` |\r\n| `Landon hotel revenue data.csv` | `landon_hotel_revenue_data` |\r\n| `Q1-Sales.csv` | `q1_sales` |\r\n\r\n## Column Naming Convention\r\n\r\nWhen CSVs are loaded into delta tables via the PySpark notebook (Option B of\r\ndelta table creation), a `clean_columns()` function transforms every column name:\r\n\r\n- Convert to lowercase\r\n- Replace spaces, hyphens, and other non-alphanumeric characters with underscores\r\n- Strip leading/trailing underscores\r\n\r\n| CSV column header | Delta table column name |\r\n|---|---|\r\n| `Hotel ID` | `hotel_id` |\r\n| `No_of_Rooms` | `no_of_rooms` |\r\n| `Total Revenue (GBP)` | `total_revenue_gbp` |\r\n| 
`First Name` | `first_name` |\r\n\r\n> **Important for downstream skills:** When writing SQL queries against bronze\r\n> delta tables (e.g., in the `create-materialised-lakeview-scripts` skill),\r\n> always use the cleaned column names — not the original CSV headers.\r\n\r\n## Output Format\r\n\r\nDelta tables appear under the **Tables** section of the bronze lakehouse in the\r\nFabric UI, named according to the convention above. Each table is queryable via\r\nthe lakehouse SQL endpoint and PySpark.\r\n\r\n## Gotchas\r\n\r\n- `fab cp` uses the path prefix to identify local vs OneLake paths. **Absolute\r\n Windows paths (`C:\\...`) are not recognised as local** and cause a\r\n `[NotSupported] Source and destination must be of the same type` error. Always\r\n use `Push-Location` into the source folder and `./filename` (forward slash,\r\n not backslash) syntax — confirmed working pattern.\r\n- **The destination folder must exist before running `fab cp`.** Always run\r\n `fab mkdir \"{WORKSPACE}.Workspace/{LAKEHOUSE}.Lakehouse/Files/{FOLDER}\"` first.\r\n Running `fab mkdir` on an already-existing folder is safe and does not error.\r\n- `WORKSPACE_NAME` and `LAKEHOUSE_NAME` are case-sensitive and must exactly match\r\n what appears in the Fabric UI.\r\n- The Fabric UI shortcut approach (Option A for delta table creation) uses Fabric's\r\n automatic schema inference. It may fail if column names contain spaces or data types\r\n are inconsistent. Switch to Option B (PySpark notebook) in those cases.\r\n- The PySpark notebook requires the lakehouse to be manually attached before running —\r\n Cell 1 contains step-by-step instructions. 
If you skip this, `saveAsTable()` will fail.\r\n- When using the Fabric CLI, run all commands from the directory that\r\n `LOCAL_CSV_FOLDER` is relative to (typically the project root).\r\n\r\n## Available Scripts\r\n\r\n- **`scripts/generate_upload_commands.py`** — Scans a local CSV folder and outputs\r\n `fab cp` commands to upload each file to the lakehouse Files section.\r\n Run: `python scripts/generate_upload_commands.py --help`\r\n- **`scripts/generate_notebook.py`** — Generates a Fabric-compatible `.ipynb`\r\n notebook pre-configured with the correct lakehouse name and `FILES_FOLDER`.\r\n Cell 1 instructs the operator to manually attach the lakehouse before running.\r\n Import into Fabric via **New → Import notebook**.\r\n Run: `python scripts/generate_notebook.py --help`\r\n",
122 | +
content: "---\r\nname: csv-to-bronze-delta-tables\r\ndescription: >\r\n Use this skill to upload CSV files from a local machine into a Microsoft Fabric\r\n bronze lakehouse and convert them to delta tables. Triggers on: \"create delta\r\n tables from CSV files\", \"load CSVs into bronze lakehouse\", \"upload CSV to Fabric\r\n and create tables\", \"ingest CSV files to delta format in Fabric\", \"create bronze\r\n tables from local CSV\". Does NOT trigger for creating lakehouses, transforming\r\n existing delta tables, or non-Fabric storage targets.\r\nlicense: MIT\r\ncompatibility: Python 3.8+ required for scripts/. Fabric CLI (fab) must be installed for the CLI upload option.\r\n---\r\n\r\n# CSV to Bronze Delta Tables\r\n\r\nUploads CSV files from an operator's local machine to a Microsoft Fabric bronze\r\nlakehouse and converts them to delta tables. The lakehouse must already exist.\r\n\r\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\r\n> review and run — it never executes commands directly against a live Fabric environment.\r\n> Present each generated artefact to the operator before they run it.\r\n>\r\n> ⚠️ **DETERMINISTIC**: The implementation pattern, tools, and artefact structure are\r\n> fixed — always run the scripts in `scripts/` and follow the workflow below. The\r\n> workflow defines the permitted conditional branches (e.g. upload via CLI, UI, or\r\n> OneLake File Explorer; create tables via Fabric UI or PySpark notebook). Follow\r\n> these branches based on the operator's choice. Never write custom notebook cells,\r\n> invent upload commands, suggest alternative libraries, or set up a virtual\r\n> environment — install any local dependency directly with `pip install -q`.\r\n>\r\n> ⚠️ **GENERATION**: Always run `scripts/generate_notebook.py` via Bash to produce\r\n> the `.ipynb` notebook — never generate notebook cell content directly. 
The\r\n> generated notebook uses native PySpark (`spark.read.csv`, `df.write.format(\"delta\")`)\r\n> — it does not use `fab` CLI or `FAB_TOKEN` auth.\r\n\r\n## Orchestrated Context\r\n\r\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\r\nand the SOP before asking the user anything.\r\n\r\n| Parameter | Source when orchestrated |\r\n|---|---|\r\n| Workspace name | Environment profile or implementation plan |\r\n| Lakehouse name | SOP shared parameters (from lakehouse creation step) |\r\n\r\n**Only ask for parameters not found in these documents** (e.g. local CSV folder path,\r\ndestination folder name in OneLake, table naming preferences).\r\n\r\n## Inputs\r\n\r\n| Parameter | Description | Example |\r\n|-----------|-------------|---------|\r\n| `WORKSPACE_NAME` | Fabric workspace name (exact, case-sensitive) | `\"Landon Finance Month End\"` |\r\n| `LAKEHOUSE_NAME` | Bronze lakehouse name (exact, case-sensitive) | `\"Lh_landon_finance_bronze\"` |\r\n| `LOCAL_CSV_FOLDER` | Relative path to local folder containing CSV files (CLI upload only) | `\"./Data\"` |\r\n| `LAKEHOUSE_FILES_FOLDER` | Folder name under the Files section of the lakehouse | `\"raw\"` |\r\n\r\n## Workflow\r\n\r\n- [ ] **Collect parameters** — If `WORKSPACE_NAME` or `LAKEHOUSE_NAME` are not\r\n provided, ask the operator for them before proceeding.\r\n\r\n- [ ] **Upload CSV files** — Present these three options and ask the operator to\r\n choose one:\r\n\r\n **Option A — OneLake File Explorer (Manual)**\r\n Open the OneLake File Explorer desktop app and drag-and-drop the CSV files into\r\n the target folder under the lakehouse Files section. No agent action required.\r\n\r\n **Option B — Fabric UI (Manual)**\r\n In the Fabric browser UI navigate to the lakehouse → Files section, open or create\r\n the target folder, click **Upload** and select the CSV files. 
No agent action required.\r\n\r\n **Option C — Fabric CLI (Automated)**\r\n > ⚠️ **Requires PowerShell** — generates a `.ps1` script. PowerShell is available\r\n > on Windows natively and on Mac/Linux via `brew install powershell`. If PowerShell\r\n > is not available and the operator does not want to install it, use Option A or B.\r\n > Do not substitute a bash or shell script.\r\n >\r\n > ⚠️ **Performance note**: The CLI uploads files one at a time. For large\r\n > batches (50+ files) this is significantly slower than Options A or B.\r\n > Recommend Options A or B for bulk uploads.\r\n\r\n Ask for `LOCAL_CSV_FOLDER` as the **exact absolute path** to the local folder\r\n and `LAKEHOUSE_FILES_FOLDER` (the destination folder name under Files). Then run:\r\n ```\r\n python scripts/generate_upload_commands.py \\\r\n --local-folder \"<LOCAL_CSV_FOLDER>\" \\\r\n --workspace \"<WORKSPACE_NAME>\" \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --output-script \"<OUTPUT_FOLDER>/upload_csv_files.ps1\"\r\n ```\r\n The script generates a PowerShell `.ps1` file saved directly to the outputs folder.\r\n Present the script path to the operator and ask them to run it with `pwsh upload_csv_files.ps1`.\r\n\r\n## Output Folder\r\n\r\nBefore beginning the workflow, create the output folder:\r\n```\r\noutputs/csv-to-bronze-delta-tables_{YYYY-MM-DD_HH-MM}_{USERNAME}/\r\n```\r\nAll scripts produced during the run are saved here.\r\n\r\n- [ ] **Confirm upload** — Ask the operator to confirm the CSV files are visible\r\n in the Files section of the lakehouse before proceeding.\r\n\r\n- [ ] **Create delta tables** — If `LAKEHOUSE_FILES_FOLDER` was not already\r\n captured above, ask for it now. Present these two options:\r\n\r\n **Option A — Fabric UI (Manual)**\r\n > Quick and easy — recommended for most users.\r\n In the Fabric browser UI navigate to the lakehouse → Files →\r\n `<LAKEHOUSE_FILES_FOLDER>`. 
For each CSV file: click the three-dot menu →\r\n **Load to Tables** → **New Table**. Accept the suggested table name (Fabric\r\n applies it automatically). No agent action required.\r\n\r\n **Option B — PySpark notebook (Automated)**\r\n Run:\r\n ```\r\n python scripts/generate_notebook.py \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --output-notebook \"<OUTPUT_FOLDER>\\csv_to_delta_tables.ipynb\"\r\n ```\r\n This writes a ready-to-run `.ipynb` file to the outputs folder. Tell the operator:\r\n 1. In the Fabric UI go to the workspace → **New** → **Import notebook**\r\n 2. Select `csv_to_delta_tables.ipynb` from the outputs folder\r\n 3. Follow the instructions in **Cell 1** to manually attach the lakehouse before running\r\n 4. Click **Run All**\r\n **Validate**: confirm every cell printed `✅ Created table: <table_name>` with\r\n no errors. If any `❌` lines appear, report the error message to the operator.\r\n\r\n## Table Naming Convention\r\n\r\nCSV filename → delta table name:\r\n- Strip `.csv` extension\r\n- Convert to lowercase\r\n- Replace any non-alphanumeric characters (spaces, hyphens, dots) with underscores\r\n- Strip leading/trailing underscores\r\n\r\nExamples:\r\n| CSV filename | Delta table name |\r\n|---|---|\r\n| `Revenue Data.csv` | `revenue_data` |\r\n| `Landon hotel revenue data.csv` | `landon_hotel_revenue_data` |\r\n| `Q1-Sales.csv` | `q1_sales` |\r\n\r\n## Column Naming Convention\r\n\r\nWhen CSVs are loaded into delta tables via the PySpark notebook (Option B of\r\ndelta table creation), a `clean_columns()` function transforms every column name:\r\n\r\n- Convert to lowercase\r\n- Replace spaces, hyphens, and other non-alphanumeric characters with underscores\r\n- Strip leading/trailing underscores\r\n\r\n| CSV column header | Delta table column name |\r\n|---|---|\r\n| `Hotel ID` | `hotel_id` |\r\n| `No_of_Rooms` | `no_of_rooms` |\r\n| `Total Revenue (GBP)` | `total_revenue_gbp` |\r\n| 
`First Name` | `first_name` |\r\n\r\n> **Important for downstream skills:** When writing SQL queries against bronze\r\n> delta tables (e.g., in the `create-materialised-lakeview-scripts` skill),\r\n> always use the cleaned column names — not the original CSV headers.\r\n\r\n## Output Format\r\n\r\nDelta tables appear under the **Tables** section of the bronze lakehouse in the\r\nFabric UI, named according to the convention above. Each table is queryable via\r\nthe lakehouse SQL endpoint and PySpark.\r\n\r\n## Gotchas\r\n\r\n- `fab cp` uses the path prefix to identify local vs OneLake paths. **Absolute\r\n Windows paths (`C:\\...`) are not recognised as local** and cause a\r\n `[NotSupported] Source and destination must be of the same type` error. Always\r\n use `Push-Location` into the source folder and `./filename` (forward slash,\r\n not backslash) syntax — confirmed working pattern.\r\n- **The destination folder must exist before running `fab cp`.** Always run\r\n `fab mkdir \"{WORKSPACE}.Workspace/{LAKEHOUSE}.Lakehouse/Files/{FOLDER}\"` first.\r\n Running `fab mkdir` on an already-existing folder is safe and does not error.\r\n- `WORKSPACE_NAME` and `LAKEHOUSE_NAME` are case-sensitive and must exactly match\r\n what appears in the Fabric UI.\r\n- The Fabric UI shortcut approach (Option A for delta table creation) uses Fabric's\r\n automatic schema inference. It may fail if column names contain spaces or data types\r\n are inconsistent. Switch to Option B (PySpark notebook) in those cases.\r\n- The PySpark notebook requires the lakehouse to be manually attached before running —\r\n Cell 1 contains step-by-step instructions. 
If you skip this, `saveAsTable()` will fail.\r\n- When using the Fabric CLI, run all commands from the directory that\r\n `LOCAL_CSV_FOLDER` is relative to (typically the project root).\r\n\r\n## Available Scripts\r\n\r\n- **`scripts/generate_upload_commands.py`** — Scans a local CSV folder and outputs\r\n `fab cp` commands to upload each file to the lakehouse Files section.\r\n Run: `python scripts/generate_upload_commands.py --help`\r\n- **`scripts/generate_notebook.py`** — Generates a Fabric-compatible `.ipynb`\r\n notebook pre-configured with the correct lakehouse name and `FILES_FOLDER`.\r\n Cell 1 instructs the operator to manually attach the lakehouse before running.\r\n Import into Fabric via **New → Import notebook**.\r\n Run: `python scripts/generate_notebook.py --help`\r\n",
123 123 | },
124 124 | {
125 125 | relativePath: "assets/pyspark_notebook_template.py",
@@ -149,7 +149,7 @@ export const EMBEDDED_SKILLS = [
149 149 | files: [
150 150 | {
151 151 | relativePath: "SKILL.md",
152 | -
content: "---\nname: fabric-process-discovery\ndescription: >\n Use this skill to conduct the initial environment discovery conversation for any\n Microsoft Fabric process workflow. Establishes the operator's tenant access, Fabric\n permissions, Entra group visibility, and deployment preference through a focused\n sequence of 6 questions asked one at a time. Output is a permissions and preferences\n profile used by the orchestrating agent to plan execution. Step-specific parameters\n (workspace names, group IDs, capacity, file paths) are collected contextually during\n execution — not upfront. Triggers as Sub-Agent 0 in any Fabric process workflow agent.\nlicense: MIT\ncompatibility: Works in any Claude context — no external tools required at this stage.\n---\n\n# Fabric Process Discovery\n\n> ⚠️ **GOVERNANCE**: This skill only gathers context — it never executes commands or\n> creates resources. All collected information feeds into the execution plan which the\n> operator reviews and confirms before anything runs.\n>\n> ⚠️ **PRIVACY**: Never ask for passwords, tokens, client secrets, Object IDs, or\n> credential values during discovery. Flag what will be needed; the operator provides\n> values at execution time.\n>\n> ⚠️ **DETERMINISTIC**: Ask exactly the questions defined in this skill, in the order\n> defined, one at a time. Never add, skip, or rephrase questions. The output profile\n> must follow the exact format specified below — do not summarise or restructure it.\n\n## Workflow\n\n1. Ask the 6 Phase 1 questions in order, one at a time, branching on each answer.\n2. Present a confirmation summary and wait for explicit approval.\n3. 
Write the environment profile and append to `CHANGE_LOG.md`.\n\n**Do not ask for workspace names, group names, capacity names, file paths, or any\nstep-specific parameters.** These are collected contextually during execution by the\nParameter Resolution Protocol — asking for them upfront is premature and confusing.\n\n## References\n\n- `references/technical-constraints.md` — authentication, Object IDs, notebookutils\n limitations, Service Principal requirements, capacity state\n- `references/fabric-architecture.md` — workload landscape, medallion architecture,\n environment promotion, credential management\n\n---\n\n## Core Principles\n\n**1. One question at a time — Yes/No or 3–4 labelled options.**\nEach question must be answerable with a yes/no or a single labelled choice. Wait for\nthe answer before asking the next question.\n\n**2. Scaffold before asking.**\nOne sentence before each question explaining what it establishes and why it matters.\nOperators new to Fabric cannot anticipate what a Fabric architect considers essential.\n\n**3. Always give a way to check.**\nFor every permission question, tell the operator exactly how to verify their access\nif they are unsure — which portal, which button to look for, which page to visit.\n\n**4. Permissions and preferences only.**\nPhase 1 establishes what the operator *can* do and *how* they prefer to do it.\nSpecific parameters (names, IDs, paths) belong to the execution phase, not here.\n\n---\n\n## Phase 1 — Permissions and Preferences\n\nAsk all 6 questions in sequence. Each one captures a universal constraint or\npreference that shapes the entire plan. Do not skip any of them.\n\n---\n\n### Q1 — Tenant and Fabric access\n\n*This confirms the process can proceed at all.*\n\n> \"First, let's confirm your Fabric environment. 
Can you access the Microsoft Fabric\n> portal and see at least one workspace?\"\n>\n> To check: go to [app.fabric.microsoft.com](https://app.fabric.microsoft.com).\n>\n> - A) Yes — I can log in and see workspaces\n> - B) I can log in but I'm not sure what I have access to\n> - C) No — I can't access Fabric yet\n\nBranch:\n- A → proceed to Q2\n- B → ask them to describe what they see; determine whether they have a Fabric\n capacity/SKU active (if the home page loads and they can see a workspace list,\n they have a working tenant — proceed to Q2)\n- C → stop; explain what is needed: a Microsoft 365 or Azure account with a Fabric\n capacity (F-SKU or P-SKU) assigned, or a Fabric trial activated at\n app.fabric.microsoft.com/home\n\n---\n\n### Q2 — Workspace creation\n\n*Determines whether the plan includes automated workspace creation or uses existing ones.*\n\n> \"Can you create new workspaces in Fabric?\"\n>\n> To check: in the Fabric portal, look for a **\"+ New workspace\"** button in the\n> left-hand workspace list. 
If it's visible, you have create rights.\n>\n> - A) Yes — I can create new workspaces\n> - B) No — I'll need to use existing workspaces\n> - C) I'm not sure — I'll check now\n\nBranch:\n- A → workspace creation steps will be automated in the plan\n- B → workspace creation steps will be marked manual (request from admin) or\n will use existing workspaces; note for plan\n- C → ask them to check the portal; wait for answer; branch as A or B\n\n---\n\n### Q3 — Fabric item creation\n\n*Determines whether lakehouses, notebooks, and other items can be created automatically.*\n\n> \"Within a workspace, can you create Fabric items like Lakehouses and Notebooks?\"\n>\n> To check: open any workspace, click **\"+ New item\"** and see if Lakehouse and\n> Notebook appear in the list.\n>\n> - A) Yes — I can create Lakehouses, Notebooks, and other items\n> - B) No — I can view items but not create them\n> - C) I'm not sure — I'll check now\n\nBranch:\n- A → item creation steps will be automated in the plan\n- B → item creation steps flagged as manual; plan will note admin involvement needed\n- C → ask them to check; wait; branch as A or B\n\n---\n\n### Q4 — Domain assignment rights\n\n*Determines whether workspace-to-domain assignment can be automated or needs an admin.*\n\n> \"Will these workspaces need to be assigned to a Fabric domain?\"\n>\n> - A) Yes — and I can do this myself (I'm a Domain Contributor or Fabric Admin)\n> - B) Yes — but I'll need an admin to do it\n> - C) No — skip domain assignment\n> - D) I'm not sure if I have the rights\n\nBranch:\n- A → domain assignment will be automated in the plan; domain name will be\n collected at the relevant execution step\n- B or D → domain assignment flagged as manual gate; the workspace creation\n proceeds automatically but domain assignment requires a Domain Admin or\n Fabric Admin.\n To check rights: in the Fabric Admin portal (app.fabric.microsoft.com/admin),\n look under Domains — if you can see and edit domains, you have 
admin rights.\n If you can assign workspaces within a domain but not manage the domain itself,\n you are a Domain Contributor.\n- C → no domain parameters needed; skip in plan\n\n---\n\n### Q5 — Entra group visibility\n\n*Determines how workspace role assignments will be handled if security groups are used.*\n\n> \"Can you see security groups in the Azure portal?\"\n>\n> To check: go to [portal.azure.com](https://portal.azure.com) → **Microsoft Entra ID**\n> (or Azure Active Directory) → **Groups**. If you can see a list of groups,\n> you have read access.\n>\n> - A) Yes — I can see security groups in the Entra portal\n> - B) No — I can't access that part of the portal\n> - C) We won't be using security groups for access control\n\nBranch:\n- A → group-based role assignment can be included in the plan; Object IDs will\n be collected or resolved at the relevant execution step (not now)\n- B → group role assignment flagged as manual or will use individual user emails\n instead; note for plan\n- C → role assignment will use individual users (email addresses) or be skipped;\n note for plan\n\nNote: Object IDs are **not** collected here — they are gathered at the role\nassignment execution step where the operator will have the context to look them up.\nLoad `references/technical-constraints.md` → Entra Group Object IDs if the operator\nasks why Object IDs are needed.\n\n---\n\n### Q6 — Deployment preference\n\n*Sets the default approach for all generated artefacts throughout the plan.*\n\n> \"How would you like to run the scripts and notebooks this process generates?\n> Here's what each option means:\"\n>\n> - **A) PySpark notebook** — Imported into your Fabric workspace and run\n> cell-by-cell in the browser. Authentication is automatic — no local software\n> needed. Each cell shows its output as it runs. Best if you prefer working\n> inside Fabric.\n>\n> - **B) PowerShell script** — A `.ps1` file you download and run locally in\n> PowerShell or VS Code. 
Requires the Fabric CLI installed locally\n> (`pip install ms-fabric-cli`). You can review the full script before running.\n> Best if you're comfortable with scripting locally.\n>\n> - **C) Terminal commands** — Individual `fab` commands run one at a time in\n> your terminal. Requires the Fabric CLI installed locally. Gives you full\n> visibility and control at each step. Best if you want to approve each action\n> before it runs.\n>\n> Note: some steps have constraints — local file uploads cannot be done in a\n> notebook, and large file batches are slow via script. Where a step requires a\n> different approach, you'll be told at that point and given the options.\n\nRecord the preference. This becomes the default for all execution steps, but can\nbe overridden step-by-step during execution.\n\n---\n\n## Confirmation\n\nPresent a summary before writing the profile:\n\n```\n| Q | What we established | Answer | Plan impact |\n|----|------------------------------|-------------------|------------------------------------------|\n| 1 | Tenant + Fabric access | Yes | Process can proceed |\n| 2 | Workspace creation | Yes | Workspace steps automated |\n| 3 | Item creation (lakehouses) | Yes | Lakehouse/notebook steps automated |\n| 4 | Domain assignment rights | Yes — contributor | Domain assignment automated |\n| 5 | Entra group visibility | Yes | Group role steps included; IDs at runtime|\n| 6 | Deployment preference | PySpark notebook | .ipynb generated for each step |\n```\n\nAsk: *\"Does this look correct? Confirm and I'll write the environment profile.\"*\nWait for explicit confirmation before writing.\n\n---\n\n## Output\n\nSave as `00-environment-discovery/environment-profile.md`. 
Include:\n\n- Permissions profile (workspace creation, item creation, domain rights, group visibility)\n- Deployment preference with any noted constraints\n- Manual gates identified (steps that will need admin or manual action)\n- Any prerequisites noted (CLI installation, tenant access requirements)\n\n**Do not include** workspace names, group names, capacity names, or any\nstep-specific parameters — these are collected at execution time.\n\nAppend to `CHANGE_LOG.md`:\n`[{DATETIME}] Sub-Agent 0 complete — environment-profile.md produced. Deployment preference: [preference]. Manual gates: [list or none].`\n\n---\n\n## Gotchas\n\n- **Never ask for step-specific parameters here** — workspace names, capacity,\n group IDs, file paths, domain names all belong to execution steps, not discovery\n- **Never frame deployment as CLI vs no-CLI** — all three approaches use `fab`\n- **`az login` and `fab auth login` are separate** — both required for\n PowerShell/terminal deployments that include group lookups; see\n `references/technical-constraints.md`\n- **Entra group Object IDs are not collected here** — flagged as needed, collected\n at the role assignment execution step with full context\n- **Domain creation requires Fabric Admin rights; assignment requires Domain\n Contributor** — both are distinct from workspace-level permissions\n- **`notebookutils` cannot query Microsoft Graph** — if notebook deployment +\n groups are used, Object IDs must be pre-provided; see references\n- **Never collect credential values** — flag what is needed; values entered at runtime\n",
+
content: "---\nname: fabric-process-discovery\ndescription: >\n Use this skill to conduct the initial environment discovery conversation for any\n Microsoft Fabric process workflow. Establishes the operator's tenant access, Fabric\n permissions, Entra group visibility, and deployment preference through a focused\n sequence of 6 questions asked one at a time. Output is a permissions and preferences\n profile used by the orchestrating agent to plan execution. Step-specific parameters\n (workspace names, group IDs, capacity, file paths) are collected contextually during\n execution — not upfront. Triggers as Sub-Agent 0 in any Fabric process workflow agent.\nlicense: MIT\ncompatibility: Works in any Claude context — no external tools required at this stage.\n---\n\n# Fabric Process Discovery\n\n> ⚠️ **GOVERNANCE**: This skill only gathers context — it never executes commands or\n> creates resources. All collected information feeds into the execution plan which the\n> operator reviews and confirms before anything runs.\n>\n> ⚠️ **PRIVACY**: Never ask for passwords, tokens, client secrets, Object IDs, or\n> credential values during discovery. Flag what will be needed; the operator provides\n> values at execution time.\n>\n> ⚠️ **DETERMINISTIC**: The question sequence, branching logic, and output format are\n> fixed — follow the workflow below exactly. The workflow defines the permitted\n> conditional branches (e.g. skip domain questions if operator cannot create workspaces;\n> branch on group resolution method). Follow these branches based on the operator's\n> answers. Never add, skip, or rephrase questions, and never restructure the output\n> profile format.\n\n## Workflow\n\n1. Ask the 6 Phase 1 questions in order, one at a time, branching on each answer.\n2. Present a confirmation summary and wait for explicit approval.\n3. 
Write the environment profile and append to `CHANGE_LOG.md`.\n\n**Do not ask for workspace names, group names, capacity names, file paths, or any\nstep-specific parameters.** These are collected contextually during execution by the\nParameter Resolution Protocol — asking for them upfront is premature and confusing.\n\n## References\n\n- `references/technical-constraints.md` — authentication, Object IDs, notebookutils\n limitations, Service Principal requirements, capacity state\n- `references/fabric-architecture.md` — workload landscape, medallion architecture,\n environment promotion, credential management\n\n---\n\n## Core Principles\n\n**1. One question at a time — Yes/No or 3–4 labelled options.**\nEach question must be answerable with a yes/no or a single labelled choice. Wait for\nthe answer before asking the next question.\n\n**2. Scaffold before asking.**\nOne sentence before each question explaining what it establishes and why it matters.\nOperators new to Fabric cannot anticipate what a Fabric architect considers essential.\n\n**3. Always give a way to check.**\nFor every permission question, tell the operator exactly how to verify their access\nif they are unsure — which portal, which button to look for, which page to visit.\n\n**4. Permissions and preferences only.**\nPhase 1 establishes what the operator *can* do and *how* they prefer to do it.\nSpecific parameters (names, IDs, paths) belong to the execution phase, not here.\n\n---\n\n## Phase 1 — Permissions and Preferences\n\nAsk all 6 questions in sequence. Each one captures a universal constraint or\npreference that shapes the entire plan. Do not skip any of them.\n\n---\n\n### Q1 — Tenant and Fabric access\n\n*This confirms the process can proceed at all.*\n\n> \"First, let's confirm your Fabric environment. 
Can you access the Microsoft Fabric\n> portal and see at least one workspace?\"\n>\n> To check: go to [app.fabric.microsoft.com](https://app.fabric.microsoft.com).\n>\n> - A) Yes — I can log in and see workspaces\n> - B) I can log in but I'm not sure what I have access to\n> - C) No — I can't access Fabric yet\n\nBranch:\n- A → proceed to Q2\n- B → ask them to describe what they see; determine whether they have a Fabric\n capacity/SKU active (if the home page loads and they can see a workspace list,\n they have a working tenant — proceed to Q2)\n- C → stop; explain what is needed: a Microsoft 365 or Azure account with a Fabric\n capacity (F-SKU or P-SKU) assigned, or a Fabric trial activated at\n app.fabric.microsoft.com/home\n\n---\n\n### Q2 — Workspace creation\n\n*Determines whether the plan includes automated workspace creation or uses existing ones.*\n\n> \"Can you create new workspaces in Fabric?\"\n>\n> To check: in the Fabric portal, look for a **\"+ New workspace\"** button in the\n> left-hand workspace list. 
If it's visible, you have create rights.\n>\n> - A) Yes — I can create new workspaces\n> - B) No — I'll need to use existing workspaces\n> - C) I'm not sure — I'll check now\n\nBranch:\n- A → workspace creation steps will be automated in the plan\n- B → workspace creation steps will be marked manual (request from admin) or\n will use existing workspaces; note for plan\n- C → ask them to check the portal; wait for answer; branch as A or B\n\n---\n\n### Q3 — Fabric item creation\n\n*Determines whether lakehouses, notebooks, and other items can be created automatically.*\n\n> \"Within a workspace, can you create Fabric items like Lakehouses and Notebooks?\"\n>\n> To check: open any workspace, click **\"+ New item\"** and see if Lakehouse and\n> Notebook appear in the list.\n>\n> - A) Yes — I can create Lakehouses, Notebooks, and other items\n> - B) No — I can view items but not create them\n> - C) I'm not sure — I'll check now\n\nBranch:\n- A → item creation steps will be automated in the plan\n- B → item creation steps flagged as manual; plan will note admin involvement needed\n- C → ask them to check; wait; branch as A or B\n\n---\n\n### Q4 — Domain assignment rights\n\n*Determines whether workspace-to-domain assignment can be automated or needs an admin.*\n\n> \"Will these workspaces need to be assigned to a Fabric domain?\"\n>\n> - A) Yes — and I can do this myself (I'm a Domain Contributor or Fabric Admin)\n> - B) Yes — but I'll need an admin to do it\n> - C) No — skip domain assignment\n> - D) I'm not sure if I have the rights\n\nBranch:\n- A → domain assignment will be automated in the plan; domain name will be\n collected at the relevant execution step\n- B or D → domain assignment flagged as manual gate; the workspace creation\n proceeds automatically but domain assignment requires a Domain Admin or\n Fabric Admin.\n To check rights: in the Fabric Admin portal (app.fabric.microsoft.com/admin),\n look under Domains — if you can see and edit domains, you have 
admin rights.\n If you can assign workspaces within a domain but not manage the domain itself,\n you are a Domain Contributor.\n- C → no domain parameters needed; skip in plan\n\n---\n\n### Q5 — Entra group visibility\n\n*Determines how workspace role assignments will be handled if security groups are used.*\n\n> \"Can you see security groups in the Azure portal?\"\n>\n> To check: go to [portal.azure.com](https://portal.azure.com) → **Microsoft Entra ID**\n> (or Azure Active Directory) → **Groups**. If you can see a list of groups,\n> you have read access.\n>\n> - A) Yes — I can see security groups in the Entra portal\n> - B) No — I can't access that part of the portal\n> - C) We won't be using security groups for access control\n\nBranch:\n- A → group-based role assignment can be included in the plan; Object IDs will\n be collected or resolved at the relevant execution step (not now)\n- B → group role assignment flagged as manual or will use individual user emails\n instead; note for plan\n- C → role assignment will use individual users (email addresses) or be skipped;\n note for plan\n\nNote: Object IDs are **not** collected here — they are gathered at the role\nassignment execution step where the operator will have the context to look them up.\nLoad `references/technical-constraints.md` → Entra Group Object IDs if the operator\nasks why Object IDs are needed.\n\n---\n\n### Q6 — Deployment preference\n\n*Sets the default approach for all generated artefacts throughout the plan.*\n\n> \"How would you like to run the scripts and notebooks this process generates?\n> Here's what each option means:\"\n>\n> - **A) PySpark notebook** — Imported into your Fabric workspace and run\n> cell-by-cell in the browser. Authentication is automatic — no local software\n> needed. Each cell shows its output as it runs. Best if you prefer working\n> inside Fabric.\n>\n> - **B) PowerShell script** — A `.ps1` file you download and run locally in\n> PowerShell or VS Code. 
Requires the Fabric CLI installed locally\n> (`pip install ms-fabric-cli`). You can review the full script before running.\n> Best if you're comfortable with scripting locally.\n>\n> - **C) Terminal commands** — Individual `fab` commands run one at a time in\n> your terminal. Requires the Fabric CLI installed locally. Gives you full\n> visibility and control at each step. Best if you want to approve each action\n> before it runs.\n>\n> Note: some steps have constraints — local file uploads cannot be done in a\n> notebook, and large file batches are slow via script. Where a step requires a\n> different approach, you'll be told at that point and given the options.\n\nRecord the preference. This becomes the default for all execution steps, but can\nbe overridden step-by-step during execution.\n\n---\n\n## Confirmation\n\nPresent a summary before writing the profile:\n\n```\n| Q | What we established | Answer | Plan impact |\n|----|------------------------------|-------------------|------------------------------------------|\n| 1 | Tenant + Fabric access | Yes | Process can proceed |\n| 2 | Workspace creation | Yes | Workspace steps automated |\n| 3 | Item creation (lakehouses) | Yes | Lakehouse/notebook steps automated |\n| 4 | Domain assignment rights | Yes — contributor | Domain assignment automated |\n| 5 | Entra group visibility | Yes | Group role steps included; IDs at runtime|\n| 6 | Deployment preference | PySpark notebook | .ipynb generated for each step |\n```\n\nAsk: *\"Does this look correct? Confirm and I'll write the environment profile.\"*\nWait for explicit confirmation before writing.\n\n---\n\n## Output\n\nSave as `00-environment-discovery/environment-profile.md`. 
Include:\n\n- Permissions profile (workspace creation, item creation, domain rights, group visibility)\n- Deployment preference with any noted constraints\n- Manual gates identified (steps that will need admin or manual action)\n- Any prerequisites noted (CLI installation, tenant access requirements)\n\n**Do not include** workspace names, group names, capacity names, or any\nstep-specific parameters — these are collected at execution time.\n\nAppend to `CHANGE_LOG.md`:\n`[{DATETIME}] Sub-Agent 0 complete — environment-profile.md produced. Deployment preference: [preference]. Manual gates: [list or none].`\n\n---\n\n## Gotchas\n\n- **Never ask for step-specific parameters here** — workspace names, capacity,\n group IDs, file paths, domain names all belong to execution steps, not discovery\n- **Never frame deployment as CLI vs no-CLI** — all three approaches use `fab`\n- **`az login` and `fab auth login` are separate** — both required for\n PowerShell/terminal deployments that include group lookups; see\n `references/technical-constraints.md`\n- **Entra group Object IDs are not collected here** — flagged as needed, collected\n at the role assignment execution step with full context\n- **Domain creation requires Fabric Admin rights; assignment requires Domain\n Contributor** — both are distinct from workspace-level permissions\n- **`notebookutils` cannot query Microsoft Graph** — if notebook deployment +\n groups are used, Object IDs must be pre-provided; see references\n- **Never collect credential values** — flag what is needed; values entered at runtime\n",
},
{
relativePath: "references/fabric-architecture.md",
@@ -167,7 +167,7 @@ export const EMBEDDED_SKILLS = [
files: [
{
relativePath: "SKILL.md",
-
content: "---\r\nname: generate-fabric-workspace\r\ndescription: >\r\n Use this skill when asked to create, provision, or set up a Microsoft Fabric\r\n workspace. Triggers on: \"create a Fabric workspace\", \"provision a workspace\r\n in Fabric\", \"set up a new Fabric workspace\", \"generate a workspace with\r\n capacity and permissions\", \"create workspace and assign roles in Fabric\".\r\n Collects workspace name, capacity, principals/roles, and optional domain\r\n settings, then creates the workspace using one of three approaches: PySpark\r\n notebook, PowerShell script, or interactive terminal commands. Produces a\r\n workspace definition markdown as a creation audit record. Does NOT trigger\r\n for general Fabric questions, item creation within a workspace, or\r\n workspace deletion tasks.\r\nlicense: MIT\r\ncompatibility: >\r\n ms-fabric-cli required (pip install ms-fabric-cli). Approach A requires a\r\n Fabric notebook environment. Approaches B and C require fab CLI installed\r\n locally with network access to Microsoft Fabric.\r\n---\r\n\r\n# Generate Fabric Workspace\r\n\r\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\r\n> review and run — it never executes commands directly against a live Fabric environment.\r\n> Present each generated artefact to the operator before they run it.\r\n>\r\n> ⚠️ **DETERMINISTIC**: This skill is parameterised, not creative. Run the generator\r\n> scripts with the collected parameters — the artefact structure is fixed. Adapt only\r\n> data-specific values (workspace name, capacity, roles, domain). Skip optional steps\r\n> if the operator lacks the required permissions (e.g. skip domain creation if no\r\n> Fabric Admin rights). 
Never write custom notebook cells or alter script structure.\r\n>\r\n> ⚠️ **GENERATION**: Always run the generator scripts (`scripts/generate_notebook.py`,\r\n> `scripts/generate_ps1.py`) via Bash to produce artefacts — never generate notebook\r\n> or script content directly. Do not present generator scripts themselves as outputs.\r\n>\r\n> **Canonical notebook pattern** — every generated PySpark notebook follows this\r\n> exact cell structure. Do not deviate:\r\n> 1. `%pip install ms-fabric-cli -q --no-warn-conflicts` (Cell 1 — no kernel restart needed)\r\n> 2. `notebookutils.credentials.getToken('pbi')` and `getToken('storage')` → set as\r\n> `os.environ['FAB_TOKEN']`, `FAB_TOKEN_ONELAKE`, `FAB_TOKEN_AZURE` (Cell 2 — auth)\r\n> 3. All workspace operations use `!fab` shell commands — `!fab mkdir`, `!fab get`,\r\n> `!fab acl set`, `!fab api`, etc. Python subprocess is never used.\r\n\r\nCreates a Microsoft Fabric workspace assigned to a specified capacity, with\r\naccess roles and optional domain assignment. If the workspace already exists,\r\ncreation is skipped and roles/domain are updated. 
Outputs a workspace\r\ndefinition markdown as an audit trail.\r\n\r\n## Orchestrated Context\r\n\r\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\r\nand the SOP before asking the user anything.\r\n\r\n| Parameter | Source when orchestrated |\r\n|---|---|\r\n| Deployment approach (notebook / PowerShell / terminal) | Environment profile |\r\n| Capacity name | Environment profile |\r\n| Workspace name(s) | Environment profile or implementation plan |\r\n| Access control method + Object ID resolution | Environment profile |\r\n| Domain assignment approach | Environment profile |\r\n| Credential management approach (Key Vault / runtime) | Environment profile |\r\n| Domain name, role assignments, group names | SOP shared parameters |\r\n\r\n**Only ask for parameters not found in these documents.** Summarise what was resolved\r\nautomatically, then ask for what remains.\r\n\r\n## Step 1 — Choose Approach\r\n\r\nAsk the user:\r\n\r\n> \"Which approach would you like to use?\r\n> A. **PySpark Notebook** — generates a notebook to run inside Fabric\r\n> (authenticated automatically via the notebook environment)\r\n> B. **PowerShell Script** — generates a `.ps1` for your review before execution\r\n> (requires fab CLI installed locally)\r\n> C. **Interactive Terminal** — runs fab CLI commands one by one in the terminal,\r\n> with your confirmation at each step (requires fab CLI installed locally)\"\r\n\r\n### Authentication by approach\r\n\r\n| Approach | Authentication |\r\n|---|---|\r\n| PySpark Notebook | Auto via `notebookutils.credentials.getToken('pbi')` inside Fabric |\r\n| PowerShell / Terminal | `fab auth login` (browser pop-up) or set `$env:FAB_TOKEN` / `FAB_TOKEN` |\r\n\r\n## Step 2 — Domain Handling\r\n\r\nAsk the user:\r\n\r\n> \"Would you like to:\r\n> A. 
**Create a new domain** and assign the workspace to it\r\n> ⚠️ Requires **Fabric Admin** tenant-level permissions.\r\n> You will also need to specify an **Entra group** that will be allowed to\r\n> add/remove workspaces from this domain (the domain contributor group).\r\n> B. **Assign the workspace to an existing domain**\r\n> C. **Skip domain assignment**\"\r\n\r\n- If **A**: collect `DOMAIN_NAME` and `DOMAIN_CONTRIBUTOR_GROUP` (the Entra\r\n group display name allowed to add/remove workspaces from the domain). Confirm\r\n the user has Fabric Admin rights.\r\n- If **B**: collect `DOMAIN_NAME` only.\r\n- If **C**: no domain parameters needed.\r\n\r\n## Step 3 — Collect Parameters\r\n\r\nCollect these values from the user:\r\n\r\n| Parameter | Required | Description |\r\n|---|---|---|\r\n| `WORKSPACE_NAME` | Yes | Display name for the workspace |\r\n| `CAPACITY_NAME` | Yes | Exact name of the Fabric capacity to assign |\r\n| `DOMAIN_NAME` | If A or B | Name of the domain (new or existing) |\r\n| `DOMAIN_CONTRIBUTOR_GROUP` | If A | Display name of the Entra group that manages the domain |\r\n| `WORKSPACE_ROLES` | Conditional | Additional principals + roles (see approach-specific guidance below) |\r\n\r\n### Workspace roles — approach-specific guidance\r\n\r\nThe workspace creator is **automatically assigned as Admin**. Before collecting\r\nadditional roles, ask:\r\n\r\n> \"You (the creator) will be automatically assigned as workspace Admin. Do you\r\n> want to assign additional roles to other users or groups?\"\r\n\r\nIf **no**, skip role collection entirely. 
If **yes**, load\r\n`references/role-assignment.md` for approach-specific guidance on collecting\r\nprincipals, group resolution requirements, and Service Principal prerequisites.\r\n\r\nFor each additional principal, collect:\r\n- User **email address (UPN)** or Entra **group display name** — do NOT ask for Object IDs\r\n- Principal type: `User` or `Group` (or `ServicePrincipal`)\r\n- Role: `Admin`, `Member`, `Contributor`, or `Viewer`\r\n\r\n## Step 4 — Execute\r\n\r\n### Approach A: PySpark Notebook\r\n\r\nIf role assignment includes Entra groups, `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET`\r\nare required — entered directly into Cell 1 of the generated notebook. See\r\n`references/role-assignment.md` for prerequisite details.\r\n\r\nRun `scripts/generate_notebook.py` with the collected parameters:\r\n\r\n```bash\r\npython scripts/generate_notebook.py \\\r\n --workspace-name \"WORKSPACE_NAME\" \\\r\n --capacity-name \"CAPACITY_NAME\" \\\r\n --roles \"user@corp.com:User:Admin,Finance Team:Group:Member\" \\\r\n [--domain-name \"DOMAIN_NAME\"] \\\r\n [--create-domain] \\\r\n [--domain-contributor-group \"DOMAIN_CONTRIBUTOR_GROUP\"] \\\r\n --output workspace_setup.ipynb\r\n```\r\n\r\nPresent the generated `workspace_setup.ipynb` to the user and instruct them to:\r\n1. Upload to any Fabric workspace as a notebook\r\n2. Run each cell **one at a time**, reading the output before proceeding\r\n3. ✅ Verification cells are clearly marked — confirm output before moving on\r\n4. 
Share the output of Cell 7 (`fab ls`) and Cell 9 (`fab acl ls`)\r\n\r\n### Approach B: PowerShell Script\r\n\r\nRun `scripts/generate_ps1.py` with the collected parameters:\r\n\r\n```bash\r\npython scripts/generate_ps1.py \\\r\n --workspace-name \"WORKSPACE_NAME\" \\\r\n --capacity-name \"CAPACITY_NAME\" \\\r\n --roles \"user@corp.com:User:Admin,Finance Team:Group:Member\" \\\r\n [--domain-name \"DOMAIN_NAME\"] \\\r\n [--create-domain] \\\r\n [--domain-contributor-group \"DOMAIN_CONTRIBUTOR_GROUP\"] \\\r\n --output workspace_setup.ps1\r\n```\r\n\r\nShow `workspace_setup.ps1` to the user for review. **Do not execute until the\r\nuser confirms.** Then run:\r\n\r\n```powershell\r\n.\\workspace_setup.ps1\r\n```\r\n\r\n### Approach C: Interactive Terminal\r\n\r\nRun these commands in sequence. Show output after each and ask the user to\r\nconfirm before continuing.\r\n\r\n**Install and authenticate:**\r\n```bash\r\npip install ms-fabric-cli\r\nfab auth login\r\n```\r\n\r\n**Check if workspace already exists:**\r\n```bash\r\nfab exists \"WORKSPACE_NAME.Workspace\"\r\n```\r\n- Exit code 0 → workspace exists → skip creation, go to role assignment\r\n- Non-zero → proceed to create\r\n\r\n**Create workspace:**\r\n```bash\r\nfab mkdir \"WORKSPACE_NAME.Workspace\" -P capacityName=CAPACITY_NAME -f\r\n```\r\n\r\n**Verify creation:**\r\n```bash\r\nfab exists \"WORKSPACE_NAME.Workspace\"\r\nfab ls \"WORKSPACE_NAME.Workspace\"\r\n```\r\n\r\n**Resolve principal IDs** (before assigning roles — repeat for each principal):\r\n```bash\r\n# For a user (by UPN / email):\r\naz ad user show --id user@corp.com --query id -o tsv\r\n\r\n# For a group (by display name):\r\naz ad group show --group \"Finance Team\" --query id -o tsv\r\n\r\n# For a service principal (by display name or app ID):\r\naz ad sp show --id \"My App Name\" --query id -o tsv\r\n```\r\n\r\n**Assign roles** (use the resolved Object ID, role in lowercase):\r\n```bash\r\nfab acl set \"WORKSPACE_NAME.Workspace\" -I 
<RESOLVED_OBJECT_ID> -R role\r\n```\r\n\r\n**Verify roles:**\r\n```bash\r\nfab acl ls \"WORKSPACE_NAME.Workspace\"\r\n```\r\n\r\n**Create domain** (if Step 2 = A):\r\n```bash\r\nfab create \".domains/DOMAIN_NAME.Domain\" -f\r\n```\r\n⚠️ After creation, set domain contributors manually in the Fabric Admin portal\r\n(admin.powerbi.com → Domains → DOMAIN_NAME → Manage contributors).\r\n`fab acl set` is not supported on `.domains/` paths.\r\n\r\n**Assign workspace to domain** (if Step 2 = A or B):\r\n```bash\r\nfab assign \".domains/DOMAIN_NAME.Domain\" -W \"WORKSPACE_NAME.Workspace\" -f\r\n```\r\n\r\n## Step 5 — Generate Workspace Definition\r\n\r\nCollect from the command output (or ask the user):\r\n- Workspace ID (appears in `fab ls` output)\r\n- Tenant name or tenant ID\r\n- Confirmed principals and roles\r\n- Domain name (if assigned)\r\n\r\nRun `scripts/generate_definition.py`:\r\n\r\n```bash\r\npython scripts/generate_definition.py \\\r\n --workspace-name \"WORKSPACE_NAME\" \\\r\n --workspace-id \"WORKSPACE_ID\" \\\r\n --capacity-name \"CAPACITY_NAME\" \\\r\n --tenant \"TENANT_NAME\" \\\r\n --roles \"user@corp.com:User:Admin,Finance Team:Group:Member\" \\\r\n [--domain-name \"DOMAIN_NAME\"] \\\r\n --approach \"notebook|powershell|terminal\" \\\r\n --output workspace_definition.md\r\n```\r\n\r\nPresent `workspace_definition.md` to the user.\r\n\r\n## Gotchas\r\n\r\n- Workspace path format is `WorkspaceName.Workspace` — the `.Workspace` suffix is required.\r\n- The capacity must be **Active** before `fab mkdir`. If you see `CapacityNotInActiveState`,\r\n ask the user to resume the capacity in the Azure portal before retrying.\r\n- `notebookutils.credentials.getToken()` in Fabric notebooks **does not support Microsoft Graph**.\r\n The notebook approach requires a Service Principal with `Group.Read.All` + `User.Read.All`\r\n application permissions and admin consent. The SP credentials are entered in Cell 1 of\r\n the generated notebook. 
If the user doesn't have an SP, direct them to the PowerShell\r\n or Interactive Terminal approach instead.\r\n- Domain creation requires Fabric Administrator tenant-level rights. If the user cannot\r\n create a domain, fall back to assigning an existing one or skipping.\r\n- `fab exists` uses exit code (0 = exists, non-zero = not found) — do not rely on stdout text alone.\r\n- In the notebook approach, `notebookutils` is only available inside a Fabric notebook.\r\n The generated script must not be run as a plain Python script outside Fabric.\r\n- The `.domain` suffix (lowercase) is used in `fab mkdir`; `.Domain` (capitalised) is\r\n used in `fab assign` and `fab acl set` — these are different and both matter.\r\n- Role values passed to `fab acl set` must be **lowercase** (`admin`, `member`, `contributor`, `viewer`).\r\n The scripts handle this conversion automatically.\r\n- For PowerShell/terminal approaches, `az login` must be completed before `az ad user/group show` will work.\r\n This is separate from `fab auth login` — both are required.\r\n\r\n## Available Scripts\r\n\r\n- **`scripts/generate_notebook.py`** — Generates PySpark notebook. Run: `python scripts/generate_notebook.py --help`\r\n- **`scripts/generate_ps1.py`** — Generates PowerShell script. Run: `python scripts/generate_ps1.py --help`\r\n- **`scripts/generate_definition.py`** — Generates workspace definition markdown. Run: `python scripts/generate_definition.py --help`\r\n\r\n## Available References\r\n\r\n- **`references/role-assignment.md`** — Approach-specific guidance for assigning roles to users and Entra groups. Load when user wants to assign additional workspace roles.\r\n- **`references/fabric-cli-reference.md`** — Fabric CLI command reference.\r\n",
+
content: "---\r\nname: generate-fabric-workspace\r\ndescription: >\r\n Use this skill when asked to create, provision, or set up a Microsoft Fabric\r\n workspace. Triggers on: \"create a Fabric workspace\", \"provision a workspace\r\n in Fabric\", \"set up a new Fabric workspace\", \"generate a workspace with\r\n capacity and permissions\", \"create workspace and assign roles in Fabric\".\r\n Collects workspace name, capacity, principals/roles, and optional domain\r\n settings, then creates the workspace using one of three approaches: PySpark\r\n notebook, PowerShell script, or interactive terminal commands. Produces a\r\n workspace definition markdown as a creation audit record. Does NOT trigger\r\n for general Fabric questions, item creation within a workspace, or\r\n workspace deletion tasks.\r\nlicense: MIT\r\ncompatibility: >\r\n ms-fabric-cli required (pip install ms-fabric-cli). Approach A requires a\r\n Fabric notebook environment. Approaches B and C require fab CLI installed\r\n locally with network access to Microsoft Fabric.\r\n---\r\n\r\n# Generate Fabric Workspace\r\n\r\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\r\n> review and run — it never executes commands directly against a live Fabric environment.\r\n> Present each generated artefact to the operator before they run it.\r\n>\r\n> ⚠️ **DETERMINISTIC**: The implementation pattern, tools, and artefact structure are\r\n> fixed — always run the generator scripts and follow the workflow below. The workflow\r\n> defines the permitted conditional branches (e.g. skip domain creation if no Fabric\r\n> Admin rights; use Object ID vs display-name lookup for group assignment; include or\r\n> omit role cells based on what the operator provides). Follow these branches based on\r\n> the operator's situation. 
Never invent logic or commands not defined in this skill.\r\n>\r\n> ⚠️ **GENERATION**: Always run the generator scripts (`scripts/generate_notebook.py`,\r\n> `scripts/generate_ps1.py`) via Bash to produce artefacts — never generate notebook\r\n> or script content directly. Do not present generator scripts themselves as outputs.\r\n>\r\n> **Canonical notebook pattern** — every generated PySpark notebook follows this\r\n> exact cell structure. Do not deviate:\r\n> 1. `%pip install ms-fabric-cli -q --no-warn-conflicts` (Cell 1 — no kernel restart needed)\r\n> 2. `notebookutils.credentials.getToken('pbi')` and `getToken('storage')` → set as\r\n> `os.environ['FAB_TOKEN']`, `FAB_TOKEN_ONELAKE`, `FAB_TOKEN_AZURE` (Cell 2 — auth)\r\n> 3. All workspace operations use `!fab` shell commands — `!fab mkdir`, `!fab get`,\r\n> `!fab acl set`, `!fab api`, etc. Python subprocess is never used.\r\n\r\nCreates a Microsoft Fabric workspace assigned to a specified capacity, with\r\naccess roles and optional domain assignment. If the workspace already exists,\r\ncreation is skipped and roles/domain are updated. 
Outputs a workspace\r\ndefinition markdown as an audit trail.\r\n\r\n## Orchestrated Context\r\n\r\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\r\nand the SOP before asking the user anything.\r\n\r\n| Parameter | Source when orchestrated |\r\n|---|---|\r\n| Deployment approach (notebook / PowerShell / terminal) | Environment profile |\r\n| Capacity name | Environment profile |\r\n| Workspace name(s) | Environment profile or implementation plan |\r\n| Access control method + Object ID resolution | Environment profile |\r\n| Domain assignment approach | Environment profile |\r\n| Credential management approach (Key Vault / runtime) | Environment profile |\r\n| Domain name, role assignments, group names | SOP shared parameters |\r\n\r\n**Only ask for parameters not found in these documents.** Summarise what was resolved\r\nautomatically, then ask for what remains.\r\n\r\n## Step 1 — Choose Approach\r\n\r\nAsk the user:\r\n\r\n> \"Which approach would you like to use?\r\n> A. **PySpark Notebook** — generates a notebook to run inside Fabric\r\n> (authenticated automatically via the notebook environment)\r\n> B. **PowerShell Script** — generates a `.ps1` for your review before execution\r\n> (requires fab CLI installed locally)\r\n> C. **Interactive Terminal** — runs fab CLI commands one by one in the terminal,\r\n> with your confirmation at each step (requires fab CLI installed locally)\"\r\n\r\n### Authentication by approach\r\n\r\n| Approach | Authentication |\r\n|---|---|\r\n| PySpark Notebook | Auto via `notebookutils.credentials.getToken('pbi')` inside Fabric |\r\n| PowerShell / Terminal | `fab auth login` (browser pop-up) or set `$env:FAB_TOKEN` / `FAB_TOKEN` |\r\n\r\n## Step 2 — Domain Handling\r\n\r\nAsk the user:\r\n\r\n> \"Would you like to:\r\n> A. 
**Create a new domain** and assign the workspace to it\r\n> ⚠️ Requires **Fabric Admin** tenant-level permissions.\r\n> You will also need to specify an **Entra group** that will be allowed to\r\n> add/remove workspaces from this domain (the domain contributor group).\r\n> B. **Assign the workspace to an existing domain**\r\n> C. **Skip domain assignment**\"\r\n\r\n- If **A**: collect `DOMAIN_NAME` and `DOMAIN_CONTRIBUTOR_GROUP` (the Entra\r\n group display name allowed to add/remove workspaces from the domain). Confirm\r\n the user has Fabric Admin rights.\r\n- If **B**: collect `DOMAIN_NAME` only.\r\n- If **C**: no domain parameters needed.\r\n\r\n## Step 3 — Collect Parameters\r\n\r\nCollect these values from the user:\r\n\r\n| Parameter | Required | Description |\r\n|---|---|---|\r\n| `WORKSPACE_NAME` | Yes | Display name for the workspace |\r\n| `CAPACITY_NAME` | Yes | Exact name of the Fabric capacity to assign |\r\n| `DOMAIN_NAME` | If A or B | Name of the domain (new or existing) |\r\n| `DOMAIN_CONTRIBUTOR_GROUP` | If A | Display name of the Entra group that manages the domain |\r\n| `WORKSPACE_ROLES` | Conditional | Additional principals + roles (see approach-specific guidance below) |\r\n\r\n### Workspace roles — approach-specific guidance\r\n\r\nThe workspace creator is **automatically assigned as Admin**. Before collecting\r\nadditional roles, ask:\r\n\r\n> \"You (the creator) will be automatically assigned as workspace Admin. Do you\r\n> want to assign additional roles to other users or groups?\"\r\n\r\nIf **no**, skip role collection entirely. 
If **yes**, load\r\n`references/role-assignment.md` for approach-specific guidance on collecting\r\nprincipals, group resolution requirements, and Service Principal prerequisites.\r\n\r\nFor each additional principal, collect:\r\n- User **email address (UPN)** or Entra **group display name** — do NOT ask for Object IDs\r\n- Principal type: `User` or `Group` (or `ServicePrincipal`)\r\n- Role: `Admin`, `Member`, `Contributor`, or `Viewer`\r\n\r\n## Step 4 — Execute\r\n\r\n### Approach A: PySpark Notebook\r\n\r\nIf role assignment includes Entra groups, `TENANT_ID`, `CLIENT_ID`, and `CLIENT_SECRET`\r\nare required — entered directly into Cell 1 of the generated notebook. See\r\n`references/role-assignment.md` for prerequisite details.\r\n\r\nRun `scripts/generate_notebook.py` with the collected parameters:\r\n\r\n```bash\r\npython scripts/generate_notebook.py \\\r\n --workspace-name \"WORKSPACE_NAME\" \\\r\n --capacity-name \"CAPACITY_NAME\" \\\r\n --roles \"user@corp.com:User:Admin,Finance Team:Group:Member\" \\\r\n [--domain-name \"DOMAIN_NAME\"] \\\r\n [--create-domain] \\\r\n [--domain-contributor-group \"DOMAIN_CONTRIBUTOR_GROUP\"] \\\r\n --output workspace_setup.ipynb\r\n```\r\n\r\nPresent the generated `workspace_setup.ipynb` to the user and instruct them to:\r\n1. Upload to any Fabric workspace as a notebook\r\n2. Run each cell **one at a time**, reading the output before proceeding\r\n3. ✅ Verification cells are clearly marked — confirm output before moving on\r\n4. 
Share the output of Cell 7 (`fab ls`) and Cell 9 (`fab acl ls`)\r\n\r\n### Approach B: PowerShell Script\r\n\r\nRun `scripts/generate_ps1.py` with the collected parameters:\r\n\r\n```bash\r\npython scripts/generate_ps1.py \\\r\n --workspace-name \"WORKSPACE_NAME\" \\\r\n --capacity-name \"CAPACITY_NAME\" \\\r\n --roles \"user@corp.com:User:Admin,Finance Team:Group:Member\" \\\r\n [--domain-name \"DOMAIN_NAME\"] \\\r\n [--create-domain] \\\r\n [--domain-contributor-group \"DOMAIN_CONTRIBUTOR_GROUP\"] \\\r\n --output workspace_setup.ps1\r\n```\r\n\r\nShow `workspace_setup.ps1` to the user for review. **Do not execute until the\r\nuser confirms.** Then run:\r\n\r\n```powershell\r\n.\\workspace_setup.ps1\r\n```\r\n\r\n### Approach C: Interactive Terminal\r\n\r\nRun these commands in sequence. Show output after each and ask the user to\r\nconfirm before continuing.\r\n\r\n**Install and authenticate:**\r\n```bash\r\npip install ms-fabric-cli\r\nfab auth login\r\n```\r\n\r\n**Check if workspace already exists:**\r\n```bash\r\nfab exists \"WORKSPACE_NAME.Workspace\"\r\n```\r\n- Exit code 0 → workspace exists → skip creation, go to role assignment\r\n- Non-zero → proceed to create\r\n\r\n**Create workspace:**\r\n```bash\r\nfab mkdir \"WORKSPACE_NAME.Workspace\" -P capacityName=CAPACITY_NAME -f\r\n```\r\n\r\n**Verify creation:**\r\n```bash\r\nfab exists \"WORKSPACE_NAME.Workspace\"\r\nfab ls \"WORKSPACE_NAME.Workspace\"\r\n```\r\n\r\n**Resolve principal IDs** (before assigning roles — repeat for each principal):\r\n```bash\r\n# For a user (by UPN / email):\r\naz ad user show --id user@corp.com --query id -o tsv\r\n\r\n# For a group (by display name):\r\naz ad group show --group \"Finance Team\" --query id -o tsv\r\n\r\n# For a service principal (by display name or app ID):\r\naz ad sp show --id \"My App Name\" --query id -o tsv\r\n```\r\n\r\n**Assign roles** (use the resolved Object ID, role in lowercase):\r\n```bash\r\nfab acl set \"WORKSPACE_NAME.Workspace\" -I 
<RESOLVED_OBJECT_ID> -R role\r\n```\r\n\r\n**Verify roles:**\r\n```bash\r\nfab acl ls \"WORKSPACE_NAME.Workspace\"\r\n```\r\n\r\n**Create domain** (if Step 2 = A):\r\n```bash\r\nfab create \".domains/DOMAIN_NAME.Domain\" -f\r\n```\r\n⚠️ After creation, set domain contributors manually in the Fabric Admin portal\r\n(admin.powerbi.com → Domains → DOMAIN_NAME → Manage contributors).\r\n`fab acl set` is not supported on `.domains/` paths.\r\n\r\n**Assign workspace to domain** (if Step 2 = A or B):\r\n```bash\r\nfab assign \".domains/DOMAIN_NAME.Domain\" -W \"WORKSPACE_NAME.Workspace\" -f\r\n```\r\n\r\n## Step 5 — Generate Workspace Definition\r\n\r\nCollect from the command output (or ask the user):\r\n- Workspace ID (appears in `fab ls` output)\r\n- Tenant name or tenant ID\r\n- Confirmed principals and roles\r\n- Domain name (if assigned)\r\n\r\nRun `scripts/generate_definition.py`:\r\n\r\n```bash\r\npython scripts/generate_definition.py \\\r\n --workspace-name \"WORKSPACE_NAME\" \\\r\n --workspace-id \"WORKSPACE_ID\" \\\r\n --capacity-name \"CAPACITY_NAME\" \\\r\n --tenant \"TENANT_NAME\" \\\r\n --roles \"user@corp.com:User:Admin,Finance Team:Group:Member\" \\\r\n [--domain-name \"DOMAIN_NAME\"] \\\r\n --approach \"notebook|powershell|terminal\" \\\r\n --output workspace_definition.md\r\n```\r\n\r\nPresent `workspace_definition.md` to the user.\r\n\r\n## Gotchas\r\n\r\n- Workspace path format is `WorkspaceName.Workspace` — the `.Workspace` suffix is required.\r\n- The capacity must be **Active** before `fab mkdir`. If you see `CapacityNotInActiveState`,\r\n ask the user to resume the capacity in the Azure portal before retrying.\r\n- `notebookutils.credentials.getToken()` in Fabric notebooks **does not support Microsoft Graph**.\r\n The notebook approach requires a Service Principal with `Group.Read.All` + `User.Read.All`\r\n application permissions and admin consent. The SP credentials are entered in Cell 1 of\r\n the generated notebook. 
If the user doesn't have an SP, direct them to the PowerShell\r\n or Interactive Terminal approach instead.\r\n- Domain creation requires Fabric Administrator tenant-level rights. If the user cannot\r\n create a domain, fall back to assigning an existing one or skipping.\r\n- `fab exists` uses exit code (0 = exists, non-zero = not found) — do not rely on stdout text alone.\r\n- In the notebook approach, `notebookutils` is only available inside a Fabric notebook.\r\n The generated script must not be run as a plain Python script outside Fabric.\r\n- The `.domain` suffix (lowercase) is used in `fab mkdir`; `.Domain` (capitalised) is\r\n used in `fab assign` and `fab acl set` — these are different and both matter.\r\n- Role values passed to `fab acl set` must be **lowercase** (`admin`, `member`, `contributor`, `viewer`).\r\n The scripts handle this conversion automatically.\r\n- For PowerShell/terminal approaches, `az login` must be completed before `az ad user/group show` will work.\r\n This is separate from `fab auth login` — both are required.\r\n\r\n## Available Scripts\r\n\r\n- **`scripts/generate_notebook.py`** — Generates PySpark notebook. Run: `python scripts/generate_notebook.py --help`\r\n- **`scripts/generate_ps1.py`** — Generates PowerShell script. Run: `python scripts/generate_ps1.py --help`\r\n- **`scripts/generate_definition.py`** — Generates workspace definition markdown. Run: `python scripts/generate_definition.py --help`\r\n\r\n## Available References\r\n\r\n- **`references/role-assignment.md`** — Approach-specific guidance for assigning roles to users and Entra groups. Load when user wants to assign additional workspace roles.\r\n- **`references/fabric-cli-reference.md`** — Fabric CLI command reference.\r\n",
},
{
relativePath: "references/fabric-cli-reference.md",
@@ -197,7 +197,7 @@ export const EMBEDDED_SKILLS = [
files: [
{
relativePath: "SKILL.md",
-
content: "---\r\nname: pdf-to-bronze-delta-tables\r\ndescription: >\r\n Use this skill to extract structured data from PDF files on an operator's\r\n local machine, upload them to a Microsoft Fabric bronze lakehouse, and convert\r\n them to a delta table using AI-powered field extraction. Triggers on: \"create\r\n delta tables from PDFs\", \"extract data from PDF invoices to Fabric\", \"load\r\n PDFs into bronze lakehouse\", \"parse PDF documents to delta format\", \"ingest\r\n PDF files to Fabric tables\". Does NOT trigger for CSV/Excel ingestion,\r\n transforming existing delta tables, or non-Fabric storage targets.\r\nlicense: MIT\r\ncompatibility: >\r\n Python 3.8+ for scripts/. Fabric CLI (fab) for CLI upload option.\r\n Fabric notebook runtime 1.3 required (for synapse.ml.aifunc).\r\n---\r\n\r\n# PDF to Bronze Delta Tables\r\n\r\nUploads PDF files from a local machine to a Microsoft Fabric bronze lakehouse\r\nand converts each PDF into a row in a delta table using AI field extraction.\r\nThe lakehouse must already exist.\r\n\r\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\r\n> review and run — it never executes commands directly against a live Fabric environment.\r\n> Present each generated artefact to the operator before they run it.\r\n>\r\n> ⚠️ **DETERMINISTIC**: This skill is parameterised, not creative. Run the scripts\r\n> in `scripts/` with the collected parameters — the artefact structure is fixed.\r\n> Never write custom notebook cells, invent upload commands, or suggest alternative\r\n> AI extraction approaches. Adapt only data-specific values (lakehouse name, folder,\r\n> table name, field definitions). 
Do not set up a virtual environment — if\r\n> `pdfplumber` is needed for field suggestion, install it directly with\r\n> `pip install pdfplumber -q`.\r\n>\r\n> ⚠️ **GENERATION**: Always run `scripts/generate_notebook.py` via Bash to produce\r\n> the `.ipynb` notebook — never generate notebook cell content directly. The\r\n> generated notebook uses native PySpark with `synapse.ml.aifunc` for AI extraction\r\n> — it does not use `fab` CLI or `FAB_TOKEN` auth.\r\n\r\n## Orchestrated Context\r\n\r\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\r\nand the SOP before asking the user anything.\r\n\r\n| Parameter | Source when orchestrated |\r\n|---|---|\r\n| Workspace name | Environment profile or implementation plan |\r\n| Lakehouse name | SOP shared parameters (from lakehouse creation step) |\r\n\r\n**Only ask for parameters not found in these documents** (e.g. local PDF folder path,\r\ndestination folder, table name, extraction field definitions).\r\n\r\n## Inputs\r\n\r\n| Parameter | Description | Example |\r\n|-----------|-------------|---------|\r\n| `WORKSPACE_NAME` | Fabric workspace name (exact, case-sensitive) | `\"Landon Finance Month End\"` |\r\n| `LAKEHOUSE_NAME` | Bronze lakehouse name (exact, case-sensitive) | `\"Lh_landon_finance_bronze\"` |\r\n| `LAKEHOUSE_FILES_FOLDER` | Folder name under lakehouse Files section | `\"Booking PDFs\"` |\r\n| `TABLE_NAME` | Target delta table name (snake_case) | `\"booking_invoices\"` |\r\n| `LOCAL_PDF_FOLDER` | Exact absolute path to local PDF folder (CLI upload only) | `\"C:\\Users\\rishi\\Data\\Booking PDFs\"` |\r\n| `FIELDS` | Fields to extract from each PDF — collected in Step 2 | See workflow |\r\n\r\n## Workflow\r\n\r\n- [ ] **Collect parameters** — If `WORKSPACE_NAME` or `LAKEHOUSE_NAME` are not\r\n provided, ask the operator for them before proceeding.\r\n\r\n- [ ] **Suggest and confirm extraction fields** — Before asking the operator to\r\n define fields from scratch, 
the agent should **read a sample PDF** to understand\r\n the document structure and proactively suggest fields:\r\n\r\n 1. Install `pdfplumber` if not already available (`pip install pdfplumber -q`),\r\n then use it to extract text from 1–2 sample PDFs in `LOCAL_PDF_FOLDER`.\r\n If a second PDF is from a different sub-group (e.g. different property/entity),\r\n include it to confirm layout consistency.\r\n **Do not set up a virtual environment** — install directly into the current environment.\r\n 2. Identify all extractable fields from the document structure (headers, labels,\r\n line items, totals, payment details, etc.).\r\n 3. Present the suggested fields to the operator in a table format, split into:\r\n - **Header-level fields** (one row per PDF) — for the main table\r\n - **Line-item fields** (multiple rows per PDF) — for the detail table, if\r\n the document contains repeating line items\r\n 4. For each field, show: `snake_case` name, extraction hint for the AI, and an\r\n example value from the sample PDF.\r\n 5. Ask the operator:\r\n - \"Do these fields look right? Anything to add, remove, or rename?\"\r\n - \"What should the main delta table be named?\" → `TABLE_NAME`\r\n - \"Do you want a second table for line/detail items?\" If yes:\r\n → `LINE_ITEMS_TABLE_NAME` and confirm the line-item fields\r\n - \"What folder name will the PDFs be stored in under the lakehouse Files\r\n section?\" → `LAKEHOUSE_FILES_FOLDER`\r\n 6. 
**Do not proceed until the operator confirms the fields.**\r\n\r\n Build `FIELDS` as a JSON array: `[{\"name\": \"...\", \"description\": \"...\"}, ...]`\r\n\r\n If the operator confirmed a second line-items table, build `LINE_ITEMS_FIELDS`\r\n as a JSON array: `[{\"name\": \"...\", \"description\": \"...\"}, ...]`\r\n\r\n- [ ] **Upload PDFs** — Present these three options and ask the operator to choose:\r\n\r\n **Option A — OneLake File Explorer (Manual)**\r\n Drag-and-drop the PDFs into the target folder under the lakehouse Files section\r\n using the OneLake File Explorer desktop app. No agent action required.\r\n\r\n **Option B — Fabric UI (Manual)**\r\n In the Fabric browser UI navigate to the lakehouse → Files section → open or\r\n create the `LAKEHOUSE_FILES_FOLDER` folder → click **Upload** and select the\r\n PDF files. No agent action required.\r\n\r\n **Option C — Fabric CLI (Automated)**\r\n > ⚠️ **Requires PowerShell** — generates a `.ps1` script. PowerShell is available\r\n > on Windows natively and on Mac/Linux via `brew install powershell`. If PowerShell\r\n > is not available and the operator does not want to install it, use Option A or B.\r\n > Do not substitute a bash or shell script.\r\n >\r\n > ⚠️ **Performance note**: The CLI uploads files one at a time. For large\r\n > batches (50+ files) this is significantly slower than Options A or B.\r\n > Recommend Options A or B for bulk uploads.\r\n\r\n Ask for `LOCAL_PDF_FOLDER` (exact absolute path). 
Then run:\r\n ```\r\n python scripts/generate_upload_commands.py \\\r\n --local-folder \"<LOCAL_PDF_FOLDER>\" \\\r\n --workspace \"<WORKSPACE_NAME>\" \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --output-script \"<OUTPUT_FOLDER>/upload_pdf_files.ps1\"\r\n ```\r\n Present the script path to the operator and ask them to run it with `pwsh upload_pdf_files.ps1`.\r\n\r\n## Output Folder\r\n\r\nBefore beginning, create the output folder:\r\n```\r\noutputs/pdf-to-bronze-delta-tables_{YYYY-MM-DD_HH-MM}_{USERNAME}/\r\n```\r\nAll generated scripts and notebooks for this run are saved here.\r\n\r\n- [ ] **Confirm upload** — Ask the operator to confirm all PDFs are visible in the\r\n lakehouse Files section before proceeding.\r\n\r\n- [ ] **Generate TEST notebook** — Run:\r\n ```\r\n python scripts/generate_notebook.py \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --table-name \"<TABLE_NAME>\" \\\r\n --fields-json \"<FIELDS_JSON>\" \\\r\n [--line-items-table-name \"<LINE_ITEMS_TABLE_NAME>\"] \\\r\n [--line-items-fields-json \"<LINE_ITEMS_FIELDS_JSON>\"] \\\r\n --test-mode \\\r\n --output-notebook \"<OUTPUT_FOLDER>\\pdf_to_delta_TEST.ipynb\"\r\n ```\r\n Where `<FIELDS_JSON>` is the JSON array built from `FIELDS` above, as a\r\n single-line string (e.g. `'[{\"name\":\"invoice_number\",\"description\":\"...\"}]'`).\r\n Include `--line-items-table-name` and `--line-items-fields-json` if a second\r\n line-items table was requested — both must be provided together.\r\n\r\n Tell the operator:\r\n 1. Go to the workspace → **New** → **Import notebook**\r\n 2. Select `pdf_to_delta_TEST.ipynb`\r\n 3. Follow the **setup steps in Cell 1** (attach the lakehouse, confirm AI features)\r\n 4. Click **Run All** — processes **one PDF only** in TEST mode\r\n 5. 
Share the output row displayed at the end of the notebook\r\n\r\n- [ ] **Validate and iterate** — Review the output row the operator shares:\r\n - Check each field has a value and it looks correct\r\n - If a field is missing or wrong: update its description in `FIELDS_JSON`,\r\n regenerate the TEST notebook, and ask the operator to re-run it\r\n - Repeat until all fields are correct\r\n - **Do not proceed to full run until the test row is confirmed correct**\r\n\r\n- [ ] **Generate FULL notebook** — Once test output is confirmed, run the same\r\n command **without** `--test-mode`:\r\n ```\r\n python scripts/generate_notebook.py \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --table-name \"<TABLE_NAME>\" \\\r\n --fields-json \"<FIELDS_JSON>\" \\\r\n [--line-items-table-name \"<LINE_ITEMS_TABLE_NAME>\"] \\\r\n [--line-items-fields-json \"<LINE_ITEMS_FIELDS_JSON>\"] \\\r\n --output-notebook \"<OUTPUT_FOLDER>\\pdf_to_delta_FULL.ipynb\"\r\n ```\r\n Tell the operator to import and run `pdf_to_delta_FULL.ipynb`. This processes\r\n all PDFs in the folder.\r\n\r\n- [ ] **Validate final table** — Ask the operator to confirm:\r\n - Delta table `<TABLE_NAME>` appears in the Tables section of the lakehouse\r\n - Row count matches the number of PDFs uploaded\r\n - Spot-check a few rows for data quality\r\n\r\n## Table Naming\r\n\r\n- Use a descriptive `snake_case` name based on the document type, not the filename\r\n- PDFs are individual records — do not derive table name from filenames\r\n- Ask the operator to confirm the table name before generating any notebook\r\n\r\n## Gotchas\r\n\r\n- **AI features must be enabled on the capacity.** `synapse.ml.aifunc` uses Fabric's\r\n built-in AI endpoint — no Azure OpenAI key needed. 
Prerequisites: (1) paid Fabric\r\n capacity F2 or higher, (2) tenant admin must enable \"Copilot and other features\r\n powered by Azure OpenAI\" in Admin portal → Tenant settings, (3) if capacity is\r\n outside an Azure OpenAI region, also enable the cross-geo processing toggle.\r\n- **Default model is `gpt-4.1-mini`.** If the notebook throws `DeploymentConfigNotFound`,\r\n the `MODEL_DEPLOYMENT_NAME` in the configuration cell doesn't match a model on\r\n the built-in endpoint. Check supported models at\r\n https://learn.microsoft.com/en-us/fabric/data-science/ai-services/ai-services-overview\r\n- `fab cp` requires `./filename` (forward slash) syntax. Absolute Windows paths\r\n (`C:\\...`) cause `[NotSupported]` errors. The generated script uses `Push-Location`\r\n to work around this — do not modify this pattern.\r\n- **Destination folder must exist before uploading.** The script runs `fab mkdir` first.\r\n Running `fab mkdir` on an existing folder is safe.\r\n- `WORKSPACE_NAME` and `LAKEHOUSE_NAME` are case-sensitive.\r\n- The notebook uses `synapse.ml.aifunc` which requires Fabric **runtime 1.3**.\r\n If the operator sees import errors, check runtime version in notebook settings.\r\n- **Manually attach the lakehouse before clicking Run All.** Cell 1 contains\r\n step-by-step instructions. The notebook does not auto-attach — if you skip\r\n this step, the PDF file paths and `saveAsTable()` calls will fail.\r\n- AI extraction temperature is set to `0.0` for consistency, but it is still\r\n non-deterministic across different PDF layouts. Always validate with TEST mode first.\r\n- All extracted fields are written as strings. If the operator needs typed columns\r\n (dates, numbers), add a post-processing step after confirming extraction is correct.\r\n- **Column names come from AI extraction.** The delta table column names match\r\n the `name` field in the `FIELDS` JSON array provided during setup. 
These are\r\n `snake_case` names chosen by the operator (e.g., `invoice_number`, `hotel_name`).\r\n They do NOT follow the same `clean_columns()` convention used by the\r\n `csv-to-bronze-delta-tables` skill. Downstream skills (e.g.,\r\n `create-materialised-lakeview-scripts`) must verify actual delta table column\r\n names rather than assuming any naming convention.\r\n- The notebook installs `openai` and `pymupdf4llm` at runtime. The `synapse.ml.aifunc`\r\n package is pre-installed in Fabric Runtime 1.3+.\r\n\r\n## Available Scripts\r\n\r\n- **`scripts/generate_upload_commands.py`** — Scans a local folder for PDFs and\r\n writes a PowerShell script of `fab cp` upload commands.\r\n Run: `python scripts/generate_upload_commands.py --help`\r\n- **`scripts/generate_notebook.py`** — Generates a Fabric-compatible `.ipynb`\r\n notebook with the AI extraction prompt pre-populated from the supplied fields.\r\n Supports `--test-mode` for single-PDF validation runs.\r\n Run: `python scripts/generate_notebook.py --help`\r\n",
+
content: "---\r\nname: pdf-to-bronze-delta-tables\r\ndescription: >\r\n Use this skill to extract structured data from PDF files on an operator's\r\n local machine, upload them to a Microsoft Fabric bronze lakehouse, and convert\r\n them to a delta table using AI-powered field extraction. Triggers on: \"create\r\n delta tables from PDFs\", \"extract data from PDF invoices to Fabric\", \"load\r\n PDFs into bronze lakehouse\", \"parse PDF documents to delta format\", \"ingest\r\n PDF files to Fabric tables\". Does NOT trigger for CSV/Excel ingestion,\r\n transforming existing delta tables, or non-Fabric storage targets.\r\nlicense: MIT\r\ncompatibility: >\r\n Fabric notebook runtime 1.3 required (synapse.ml.aifunc pre-installed). The\r\n generated notebook is self-contained — it installs openai and pymupdf4llm at\r\n runtime inside Fabric. No local Python packages required to generate the notebook.\r\n Fabric CLI (fab) required only if using the CLI upload option.\r\n---\r\n\r\n# PDF to Bronze Delta Tables\r\n\r\nUploads PDF files from a local machine to a Microsoft Fabric bronze lakehouse\r\nand converts each PDF into a row in a delta table using AI field extraction.\r\nThe lakehouse must already exist.\r\n\r\n> ⚠️ **GOVERNANCE**: This skill produces notebooks and scripts for the operator to\r\n> review and run — it never executes commands directly against a live Fabric environment.\r\n> Present each generated artefact to the operator before they run it.\r\n>\r\n> ⚠️ **DETERMINISTIC**: The implementation pattern, tools, and artefact structure are\r\n> fixed — always run the scripts in `scripts/` and follow the workflow below. The\r\n> workflow defines the permitted conditional branches (e.g. upload via CLI, UI, or\r\n> OneLake File Explorer; single-table vs two-table extraction; TEST mode before FULL\r\n> run). Follow these branches based on the operator's situation. 
Never write custom\r\n> notebook cells or suggest alternative AI extraction approaches.\r\n>\r\n> The generated notebook is **self-contained for Fabric** — it installs its own\r\n> dependencies (`openai`, `pymupdf4llm`) at runtime in Cell 1. The only local\r\n> dependency is `pdfplumber`, used solely for the optional field-suggestion step\r\n> where the agent reads a sample PDF to propose extraction fields. If the operator\r\n> already knows their fields, `pdfplumber` is never needed. Install it directly with\r\n> `pip install pdfplumber -q` if required — no virtual environment needed.\r\n>\r\n> ⚠️ **GENERATION**: Always run `scripts/generate_notebook.py` via Bash to produce\r\n> the `.ipynb` notebook — never generate notebook cell content directly. The\r\n> generated notebook uses native PySpark with `synapse.ml.aifunc` for AI extraction\r\n> — it does not use `fab` CLI or `FAB_TOKEN` auth.\r\n\r\n## Orchestrated Context\r\n\r\nWhen invoked from a workflow agent, read `00-environment-discovery/environment-profile.md`\r\nand the SOP before asking the user anything.\r\n\r\n| Parameter | Source when orchestrated |\r\n|---|---|\r\n| Workspace name | Environment profile or implementation plan |\r\n| Lakehouse name | SOP shared parameters (from lakehouse creation step) |\r\n\r\n**Only ask for parameters not found in these documents** (e.g. 
local PDF folder path,\r\ndestination folder, table name, extraction field definitions).\r\n\r\n## Inputs\r\n\r\n| Parameter | Description | Example |\r\n|-----------|-------------|---------|\r\n| `WORKSPACE_NAME` | Fabric workspace name (exact, case-sensitive) | `\"Landon Finance Month End\"` |\r\n| `LAKEHOUSE_NAME` | Bronze lakehouse name (exact, case-sensitive) | `\"Lh_landon_finance_bronze\"` |\r\n| `LAKEHOUSE_FILES_FOLDER` | Folder name under lakehouse Files section | `\"Booking PDFs\"` |\r\n| `TABLE_NAME` | Target delta table name (snake_case) | `\"booking_invoices\"` |\r\n| `LOCAL_PDF_FOLDER` | Exact absolute path to local PDF folder (CLI upload only) | `\"C:\\Users\\rishi\\Data\\Booking PDFs\"` |\r\n| `FIELDS` | Fields to extract from each PDF — collected in Step 2 | See workflow |\r\n\r\n## Workflow\r\n\r\n- [ ] **Collect parameters** — If `WORKSPACE_NAME` or `LAKEHOUSE_NAME` are not\r\n provided, ask the operator for them before proceeding.\r\n\r\n- [ ] **Suggest and confirm extraction fields** — Before asking the operator to\r\n define fields from scratch, the agent should **read a sample PDF** to understand\r\n the document structure and proactively suggest fields:\r\n\r\n 1. If the operator has not already provided their fields, read 1–2 sample PDFs\r\n to suggest them. Install `pdfplumber` locally with `pip install pdfplumber -q`\r\n if not available (no virtual environment — install directly), then extract\r\n text from the sample PDFs. If the operator already knows their fields, skip\r\n this step entirely — ask for them directly.\r\n 2. Identify all extractable fields from the document structure (headers, labels,\r\n line items, totals, payment details, etc.).\r\n 3. Present the suggested fields to the operator in a table format, split into:\r\n - **Header-level fields** (one row per PDF) — for the main table\r\n - **Line-item fields** (multiple rows per PDF) — for the detail table, if\r\n the document contains repeating line items\r\n 4. 
For each field, show: `snake_case` name, extraction hint for the AI, and an\r\n example value from the sample PDF.\r\n 5. Ask the operator:\r\n - \"Do these fields look right? Anything to add, remove, or rename?\"\r\n - \"What should the main delta table be named?\" → `TABLE_NAME`\r\n - \"Do you want a second table for line/detail items?\" If yes:\r\n → `LINE_ITEMS_TABLE_NAME` and confirm the line-item fields\r\n - \"What folder name will the PDFs be stored in under the lakehouse Files\r\n section?\" → `LAKEHOUSE_FILES_FOLDER`\r\n 6. **Do not proceed until the operator confirms the fields.**\r\n\r\n Build `FIELDS` as a JSON array: `[{\"name\": \"...\", \"description\": \"...\"}, ...]`\r\n\r\n If the operator confirmed a second line-items table, build `LINE_ITEMS_FIELDS`\r\n as a JSON array: `[{\"name\": \"...\", \"description\": \"...\"}, ...]`\r\n\r\n- [ ] **Upload PDFs** — Present these three options and ask the operator to choose:\r\n\r\n **Option A — OneLake File Explorer (Manual)**\r\n Drag-and-drop the PDFs into the target folder under the lakehouse Files section\r\n using the OneLake File Explorer desktop app. No agent action required.\r\n\r\n **Option B — Fabric UI (Manual)**\r\n In the Fabric browser UI navigate to the lakehouse → Files section → open or\r\n create the `LAKEHOUSE_FILES_FOLDER` folder → click **Upload** and select the\r\n PDF files. No agent action required.\r\n\r\n **Option C — Fabric CLI (Automated)**\r\n > ⚠️ **Requires PowerShell** — generates a `.ps1` script. PowerShell is available\r\n > on Windows natively and on Mac/Linux via `brew install powershell`. If PowerShell\r\n > is not available and the operator does not want to install it, use Option A or B.\r\n > Do not substitute a bash or shell script.\r\n >\r\n > ⚠️ **Performance note**: The CLI uploads files one at a time. 
For large\r\n > batches (50+ files) this is significantly slower than Options A or B.\r\n > Recommend Options A or B for bulk uploads.\r\n\r\n Ask for `LOCAL_PDF_FOLDER` (exact absolute path). Then run:\r\n ```\r\n python scripts/generate_upload_commands.py \\\r\n --local-folder \"<LOCAL_PDF_FOLDER>\" \\\r\n --workspace \"<WORKSPACE_NAME>\" \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --output-script \"<OUTPUT_FOLDER>/upload_pdf_files.ps1\"\r\n ```\r\n Present the script path to the operator and ask them to run it with `pwsh upload_pdf_files.ps1`.\r\n\r\n## Output Folder\r\n\r\nBefore beginning, create the output folder:\r\n```\r\noutputs/pdf-to-bronze-delta-tables_{YYYY-MM-DD_HH-MM}_{USERNAME}/\r\n```\r\nAll generated scripts and notebooks for this run are saved here.\r\n\r\n- [ ] **Confirm upload** — Ask the operator to confirm all PDFs are visible in the\r\n lakehouse Files section before proceeding.\r\n\r\n- [ ] **Generate TEST notebook** — Run:\r\n ```\r\n python scripts/generate_notebook.py \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --table-name \"<TABLE_NAME>\" \\\r\n --fields-json \"<FIELDS_JSON>\" \\\r\n [--line-items-table-name \"<LINE_ITEMS_TABLE_NAME>\"] \\\r\n [--line-items-fields-json \"<LINE_ITEMS_FIELDS_JSON>\"] \\\r\n --test-mode \\\r\n --output-notebook \"<OUTPUT_FOLDER>\\pdf_to_delta_TEST.ipynb\"\r\n ```\r\n Where `<FIELDS_JSON>` is the JSON array built from `FIELDS` above, as a\r\n single-line string (e.g. `'[{\"name\":\"invoice_number\",\"description\":\"...\"}]'`).\r\n Include `--line-items-table-name` and `--line-items-fields-json` if a second\r\n line-items table was requested — both must be provided together.\r\n\r\n Tell the operator:\r\n 1. Go to the workspace → **New** → **Import notebook**\r\n 2. Select `pdf_to_delta_TEST.ipynb`\r\n 3. Follow the **setup steps in Cell 1** (attach the lakehouse, confirm AI features)\r\n 4. 
Click **Run All** — processes **one PDF only** in TEST mode\r\n 5. Share the output row displayed at the end of the notebook\r\n\r\n- [ ] **Validate and iterate** — Review the output row the operator shares:\r\n - Check each field has a value and it looks correct\r\n - If a field is missing or wrong: update its description in `FIELDS_JSON`,\r\n regenerate the TEST notebook, and ask the operator to re-run it\r\n - Repeat until all fields are correct\r\n - **Do not proceed to full run until the test row is confirmed correct**\r\n\r\n- [ ] **Generate FULL notebook** — Once test output is confirmed, run the same\r\n command **without** `--test-mode`:\r\n ```\r\n python scripts/generate_notebook.py \\\r\n --lakehouse \"<LAKEHOUSE_NAME>\" \\\r\n --lakehouse-folder \"<LAKEHOUSE_FILES_FOLDER>\" \\\r\n --table-name \"<TABLE_NAME>\" \\\r\n --fields-json \"<FIELDS_JSON>\" \\\r\n [--line-items-table-name \"<LINE_ITEMS_TABLE_NAME>\"] \\\r\n [--line-items-fields-json \"<LINE_ITEMS_FIELDS_JSON>\"] \\\r\n --output-notebook \"<OUTPUT_FOLDER>\\pdf_to_delta_FULL.ipynb\"\r\n ```\r\n Tell the operator to import and run `pdf_to_delta_FULL.ipynb`. This processes\r\n all PDFs in the folder.\r\n\r\n- [ ] **Validate final table** — Ask the operator to confirm:\r\n - Delta table `<TABLE_NAME>` appears in the Tables section of the lakehouse\r\n - Row count matches the number of PDFs uploaded\r\n - Spot-check a few rows for data quality\r\n\r\n## Table Naming\r\n\r\n- Use a descriptive `snake_case` name based on the document type, not the filename\r\n- PDFs are individual records — do not derive table name from filenames\r\n- Ask the operator to confirm the table name before generating any notebook\r\n\r\n## Gotchas\r\n\r\n- **AI features must be enabled on the capacity.** `synapse.ml.aifunc` uses Fabric's\r\n built-in AI endpoint — no Azure OpenAI key needed. 
Prerequisites: (1) paid Fabric\r\n capacity F2 or higher, (2) tenant admin must enable \"Copilot and other features\r\n powered by Azure OpenAI\" in Admin portal → Tenant settings, (3) if capacity is\r\n outside an Azure OpenAI region, also enable the cross-geo processing toggle.\r\n- **Default model is `gpt-4.1-mini`.** If the notebook throws `DeploymentConfigNotFound`,\r\n the `MODEL_DEPLOYMENT_NAME` in the configuration cell doesn't match a model on\r\n the built-in endpoint. Check supported models at\r\n https://learn.microsoft.com/en-us/fabric/data-science/ai-services/ai-services-overview\r\n- `fab cp` requires `./filename` (forward slash) syntax. Absolute Windows paths\r\n (`C:\\...`) cause `[NotSupported]` errors. The generated script uses `Push-Location`\r\n to work around this — do not modify this pattern.\r\n- **Destination folder must exist before uploading.** The script runs `fab mkdir` first.\r\n Running `fab mkdir` on an existing folder is safe.\r\n- `WORKSPACE_NAME` and `LAKEHOUSE_NAME` are case-sensitive.\r\n- The notebook uses `synapse.ml.aifunc` which requires Fabric **runtime 1.3**.\r\n If the operator sees import errors, check runtime version in notebook settings.\r\n- **Manually attach the lakehouse before clicking Run All.** Cell 1 contains\r\n step-by-step instructions. The notebook does not auto-attach — if you skip\r\n this step, the PDF file paths and `saveAsTable()` calls will fail.\r\n- AI extraction temperature is set to `0.0` for consistency, but it is still\r\n non-deterministic across different PDF layouts. Always validate with TEST mode first.\r\n- All extracted fields are written as strings. If the operator needs typed columns\r\n (dates, numbers), add a post-processing step after confirming extraction is correct.\r\n- **Column names come from AI extraction.** The delta table column names match\r\n the `name` field in the `FIELDS` JSON array provided during setup. 
These are\r\n `snake_case` names chosen by the operator (e.g., `invoice_number`, `hotel_name`).\r\n They do NOT follow the same `clean_columns()` convention used by the\r\n `csv-to-bronze-delta-tables` skill. Downstream skills (e.g.,\r\n `create-materialised-lakeview-scripts`) must verify actual delta table column\r\n names rather than assuming any naming convention.\r\n- The notebook installs `openai` and `pymupdf4llm` at runtime. The `synapse.ml.aifunc`\r\n package is pre-installed in Fabric Runtime 1.3+.\r\n\r\n## Available Scripts\r\n\r\n- **`scripts/generate_upload_commands.py`** — Scans a local folder for PDFs and\r\n writes a PowerShell script of `fab cp` upload commands.\r\n Run: `python scripts/generate_upload_commands.py --help`\r\n- **`scripts/generate_notebook.py`** — Generates a Fabric-compatible `.ipynb`\r\n notebook with the AI extraction prompt pre-populated from the supplied fields.\r\n Supports `--test-mode` for single-PDF validation runs.\r\n Run: `python scripts/generate_notebook.py --help`\r\n",
},
{
relativePath: "references/notebook-cells-reference.md",
package/package.json
CHANGED