PyPI - dwe-core - Versions diffs - 0.1.0a1__tar.gz - Mend

dwe-core 0.1.0a1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9) hide show

dwe_core-0.1.0a1/PKG-INFO ADDED Viewed

@@ -0,0 +1,729 @@
+Metadata-Version: 2.4
+Name: dwe-core
+Version: 0.1.0a1
+Summary: DWE CLI - Data Warehouse Ecosystem Orchestrator
+Requires-Python: >=3.10
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
+Requires-Dist: PyGithub (>=2.1.1)
+Requires-Dist: copier (>=9.0.0)
+Requires-Dist: gitpython (>=3.1.40)
+Requires-Dist: jinja2 (>=3.1.2)
+Requires-Dist: python-gitlab (>=4.4.0)
+Requires-Dist: rich (>=13.7.0)
+Requires-Dist: typer[all] (>=0.9.0)
+Description-Content-Type: text/markdown
+# dwe-core
+The **DWE CLI** (`dwe`) is the orchestration brain of the Data Warehouse Ecosystem. It takes a blank or existing client Git repository and injects a fully working **Adapter** — infrastructure, application config, CI/CD pipelines, and local dev commands — in a single command.
+## How it works
+```
+dwe create-service test_adapter --git-repo https://github.com/client/repo --envs dev --envs prod
+```
+Internally this does:
+```
+1. Clone        GitPython clones the client repo to a temp directory
+2. Hydrate      Copier renders the adapter template into the clone
+3. State        CLI writes dwe-state.json
+4. CI/CD        CLI renders per-environment GitHub Actions / GitLab CI files
+5. Branch       initial-commit branch is created and committed
+6. Env branches dev, prod branches are created from initial-commit
+7. Push         All branches are pushed to the remote
+8. Secrets      GitHub/GitLab API uploads secrets to the repository settings
+```
+The result is a client repo that already has working infrastructure code, a `justfile` with `just up` / `just deploy-prod`, and CI/CD that deploys to the right environment when you push to its branch.
+---
+## Installation
+```bash
+pip install poetry        # if not already installed
+poetry install            # from dwe-core source (creates venv, installs deps)
+# or once published:
+pip install dwe-core
+```
+Verify:
+```bash
+dwe --help
+dwe list-adapters
+```
+---
+## Commands
+### `dwe create-service`
+```
+dwe create-service <adapter_name> \
+  --git-repo <url> \
+  [--envs <name>]...       \   # default: development, main
+  [--secrets <json>]       \   # e.g. '{"AWS_KEY":"abc"}'
+  [--tag <version>]        \   # adapter git tag, e.g. v1.2.0
+  [--token <api-token>]    \   # or set GITHUB_TOKEN / GITLAB_TOKEN
+  [--aws-region <region>]  \
+  [--instance-type <type>] \
+  [--clone-dir <path>]         # default: temp dir
+```
+**Example — full run:**
+```bash
+export GITHUB_TOKEN=ghp_xxxx
+dwe create-service test_adapter \
+  --git-repo https://github.com/acme/data-platform \
+  --envs development \
+  --envs staging \
+  --envs main \
+  --secrets '{"PULUMI_ACCESS_TOKEN":"pul-xxx","AWS_ACCESS_KEY_ID":"AKI...","AWS_SECRET_ACCESS_KEY":"..."}' \
+  --tag v1.0.0 \
+  --aws-region eu-west-1 \
+  --instance-type t3.small
+```
+After this runs, the `data-platform` repo has:
+```
+.github/workflows/
+  deploy-development.yaml
+  deploy-staging.yaml
+  deploy-main.yaml
+blueprint/
+  html/index.html
+  instance-setup.sh
+docker-compose.yml
+docker-compose.prod.yml
+.env.example
+justfile
+infrastructure/
+  __main__.py          <- project_name, instance_type already substituted
+  Pulumi.yaml
+  requirements.txt
+dwe-state.json
+.copier-answers.yml    <- Copier's internal state (enables future updates)
+```
+### `dwe update-service`
+```
+dwe update-service <adapter_name> <local_path> [--tag <version>]
+```
+**Example:**
+```bash
+dwe update-service test_adapter ./data-platform --tag v1.2.0
+```
+Internally:
+1. Reads `dwe-state.json` and validates the adapter name matches
+2. Creates a branch `dwe-update-20260322-1.2.0`
+3. Runs `copier.run_update()` — **smart merge** that preserves your customisations
+4. Updates `dwe-state.json` with the new version
+Review the diff on the branch, then merge into your environment branches to trigger deployments.
+### `dwe list-adapters`
+```bash
+dwe list-adapters
+```
+Shows all adapters registered in `adapters.json`.
+---
+## Adapter Registry (`adapters.json`)
+```json
+{
+  "test_adapter": {
+    "path": "/absolute/path/to/dwe_test_adapter",
+    "type": "local",
+    "description": "Test adapter: AWS EC2 instance via Pulumi"
+  },
+  "superset_adapter": {
+    "url": "https://github.com/hipposys/dwe-superset-adapter",
+    "type": "git",
+    "description": "Apache Superset on ECS"
+  }
+}
+```
+---
+## How to Define a New Adapter
+An adapter is a **real, runnable project** that also serves as a Copier template. The guiding principle:
+> **The adapter must work locally as-is.** A developer should be able to `git clone` the adapter, run `just up`, and have a working service — without running the DWE CLI at all.
+### Step 1: Create the adapter repository
+```bash
+mkdir my_adapter && cd my_adapter
+git init
+```
+### Step 2: Build a working application first
+Build your service as a real project before adding any template variables. For example, if you're building a Superset adapter:
+```bash
+# Make it work locally first
+docker compose up    # verify it runs
+```
+Only once everything works locally do you introduce `{{ variables }}`.
+### Step 3: Directory structure
+```
+my_adapter/
+├── copier.yml                  # Copier config + question definitions
+│
+├── docker-compose.yml          # Real, runnable. Uses ${ENV_VAR:-default} for runtime values.
+├── docker-compose.prod.yml     # Production overrides (restart policy, logging)
+├── .env.example                # Template for secrets — committed; .env is git-ignored
+├── .gitignore
+│
+├── justfile                    # Dev commands (just up, just deploy-prod, just infra-up)
+│
+├── blueprint/                  # Application-level config files
+│   ├── html/                   # or nginx.conf, superset_config.py, etc.
+│   └── instance-setup.sh       # EC2 user-data bootstrap script
+│
+├── infrastructure/             # Pulumi IaC — only files here use .jinja
+│   ├── __main__.py.jinja       # <- .jinja because it embeds {{ project_name }}
+│   ├── Pulumi.yaml.jinja       # <- .jinja because it embeds {{ project_name }}
+│   └── requirements.txt
+│
+└── ci-templates/               # Jinja2 templates rendered by the CLI (not Copier)
+    └── deploy.yaml             # Uses {{ ENV_NAME }}, {{ AWS_REGION }}
+```
+### Step 4: Write `copier.yml`
+`copier.yml` controls how Copier processes the adapter. Key settings:
+```yaml
+_templates_suffix: .jinja    # ONLY files ending in .jinja are treated as templates
+                              # Everything else is copied verbatim
+_exclude:
+  - copier.yml               # Don't copy Copier's own config
+  - ci-templates             # CLI handles this separately
+  - README.md                # Adapter's README is not for client repos
+  - .git
+  - .env                     # Never copy actual secrets
+  - __pycache__
+  - "*.pyc"
+_skip_if_exists:
+  - .env.example             # Preserve user customisations on updates
+# Questions (answered non-interactively by the dwe CLI):
+project_name:
+  type: str
+  help: "Client project name (used for cloud resource naming)"
+adapter_name:
+  type: str
+  default: "my_adapter"
+  when: false    # always set programmatically
+adapter_version:
+  type: str
+  default: "v1.0.0"
+  when: false    # always set programmatically
+environments:
+  type: yaml
+  default: "[development, main]"
+aws_region:
+  type: str
+  default: "us-east-1"
+```
+### Step 5: Decide what needs Jinja2
+Apply this rule: **if the value changes per client, use `{{ variable }}`. If it changes per deployment environment, use a `.env` variable.**
+| File | Approach | Reason |
+|---|---|---|
+| `docker-compose.yml` | `.env` interpolation (`${VAR:-default}`) | Works locally without any substitution; runtime config |
+| `infrastructure/__main__.py` | Jinja2 (`.jinja` extension) | Cloud resource names must be unique per client at provision time |
+| `infrastructure/Pulumi.yaml` | Jinja2 (`.jinja` extension) | Stack name must be unique per client |
+| `justfile` | Verbatim copy (no `.jinja`) | Commands are identical across clients |
+| `blueprint/instance-setup.sh` | Verbatim copy | Generic bootstrap, no client-specific values |
+| `.env.example` | Verbatim copy | Users fill in real values after cloning |
+**Jinja2 syntax in `.jinja` files:**
+```python
+# infrastructure/__main__.py.jinja
+instance = aws.ec2.Instance(
+    "{{ project_name }}-instance",          # <- substituted by Copier
+    instance_type="{{ instance_type }}",
+    ...
+)
+```
+After `dwe create-service` this becomes:
+```python
+instance = aws.ec2.Instance(
+    "acme-data-platform-instance",
+    instance_type="t3.small",
+    ...
+)
+```
+### Step 6: Write `ci-templates/deploy.yaml`
+This is a Jinja2 file rendered by the `dwe` CLI (not by Copier) to generate one workflow file per environment. The CLI uses `{@ @}` as variable delimiters (not `{{ }}`), so GitHub Actions `${{ secrets.X }}` syntax passes through **untouched** — no escaping needed.
+```yaml
+name: Deploy to {@ ENV_NAME @}
+on:
+  push:
+    branches:
+      - {@ ENV_NAME @}
+  pull_request:
+    branches:
+      - {@ ENV_NAME @}
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    environment: {@ ENV_NAME @}
+    steps:
+      - uses: actions/checkout@v4
+      - name: Deploy
+        run: just deploy-prod
+        env:
+          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}    # passes through unchanged
+          AWS_REGION: {@ AWS_REGION @}                           # substituted by dwe CLI
+```
+Available variables: `{@ ENV_NAME @}`, `{@ AWS_REGION @}`.
+### Step 7: Register the adapter
+Add an entry to `dwe-core/adapters.json`:
+**Local (development):**
+```json
+{
+  "my_adapter": {
+    "path": "/absolute/path/to/my_adapter",
+    "type": "local",
+    "description": "My adapter description"
+  }
+}
+```
+**Remote Git (production):**
+```json
+{
+  "my_adapter": {
+    "url": "https://github.com/your-org/my-adapter",
+    "type": "git",
+    "description": "My adapter description"
+  }
+}
+```
+### Step 8: Test the adapter
+**Test locally first (without DWE CLI):**
+```bash
+cd my_adapter
+cp .env.example .env
+just up                    # docker compose up — must work here
+```
+**Test Copier rendering in isolation:**
+```bash
+pip install copier
+copier copy /path/to/my_adapter /tmp/test-output \
+  --data project_name=testproject \
+  --data aws_region=us-east-1 \
+  --defaults --overwrite --trust
+# Inspect the output
+ls /tmp/test-output
+cat /tmp/test-output/infrastructure/Pulumi.yaml    # should have project_name substituted
+cat /tmp/test-output/docker-compose.yml            # should be identical to source
+cd /tmp/test-output && docker compose up           # should still work
+```
+**Test via dwe CLI:**
+```bash
+dwe create-service my_adapter \
+  --git-repo https://github.com/test-org/empty-repo \
+  --envs development \
+  --envs main
+```
+---
+## Adapter Versioning and Updates
+Tag your adapter repository with semantic version tags. The DWE CLI and Copier use these tags for `update-service`:
+```bash
+cd my_adapter
+git add -A && git commit -m "feat: add postgres service"
+git tag v1.1.0
+git push origin v1.1.0
+```
+When a client wants to update:
+```bash
+dwe update-service my_adapter ./client-repo --tag v1.1.0
+```
+Copier reads the source URL from `.copier-answers.yml` in the client repo, checks out `v1.1.0`, and runs a 3-way merge. Files the user has customised are preserved where possible; conflicts surface as standard git merge conflicts.
+**What gets updated:**
+- `infrastructure/` — Pulumi code (Jinja2 re-rendered with new template)
+- `blueprint/` — Application config files
+- `justfile` — Dev commands
+**What is NOT updated (protected):**
+- `.env.example` — skipped if it already exists (`_skip_if_exists` in `copier.yml`)
+- `.copier-answers.yml` — managed by Copier internally
+---
+## State Files
+### `dwe-state.json` (DWE-managed)
+Written by the `dwe` CLI after `copier.run_copy()`. Tracks DWE-specific metadata:
+```json
+{
+  "dwe_version": "1.0.0",
+  "adapter": {
+    "name": "test_adapter",
+    "version": "v1.0.0",
+    "last_update": "2026-03-22"
+  },
+  "environments": ["development", "main"],
+  "infrastructure": "pulumi"
+}
+```
+### `.copier-answers.yml` (Copier-managed)
+Written by Copier. Tracks the template source, version, and question answers. **Do not edit manually.** This is what enables `copier.run_update()` to know where the template came from.
+```yaml
+# Changes here will be overwritten by copier
+_commit: v1.0.0
+_src_path: /path/to/my_adapter
+project_name: acme-data-platform
+aws_region: eu-west-1
+instance_type: t3.small
+```
+Both files coexist. `dwe-state.json` is for DWE tooling; `.copier-answers.yml` is for Copier's update machinery.
+---
+## Developer Workflow After `create-service`
+Once the client repo is hydrated, the full developer loop is:
+**1. Local development (laptop):**
+```bash
+git clone https://github.com/client/data-platform
+cd data-platform
+cp .env.example .env      # fill in local values (no real AWS keys needed)
+just up                   # docker compose up — app is running at localhost:8080
+```
+**2. Provision cloud infrastructure (once):**
+```bash
+# Fill in real AWS keys in .env
+just install-infra         # pip install pulumi pulumi-aws
+just infra-preview         # see what Pulumi will create
+just infra-up              # provision the EC2 instance
+```
+**3. Deploy to EC2 (SSH into the instance, then):**
+```bash
+git clone https://github.com/client/data-platform /srv/app
+cd /srv/app
+cp .env.example .env       # fill in production values
+just deploy-prod           # docker compose -f ... up -d
+```
+**4. CI/CD (automatic after push):**
+Pushing to `development` or `main` triggers the corresponding GitHub Actions workflow. See the [CI/CD Workflow Design](#cicd-workflow-design) section below for the full two-path logic.
+---
+## CI/CD Workflow Design
+The generated CI/CD workflow (`.github/workflows/deploy-{env}.yaml`) implements a **two-path** logic inspired by the Superset production setup. The key insight: infrastructure changes and application changes require completely different responses.
+### The Two Paths
+```
+Push to branch
+       │
+       ▼
+  Detect changes
+  (dorny/paths-filter)
+       │
+       ├─── infrastructure/** changed?
+       │         │
+       │         ├─ Pull Request → pulumi preview  (validate, no apply)
+       │         └─ Push        → pulumi up --yes  (apply infra changes)
+       │
+       └─── docker-compose / blueprint changed?
+                 AND infrastructure NOT changed?
+                         │
+                         └─ Push → SSM: git pull + just deploy-prod
+                                   (redeploy app on the live EC2 instance)
+```
+**Why skip deploy when infra also changed?** The `pulumi up` step re-provisions the EC2 instance itself, which already pulls the latest code via its user-data script. Running the app deploy on top of that would be redundant and potentially racy.
+### Job Summary
+| Job | Trigger | What it does |
+|---|---|---|
+| `pulumi-preview` | PR, `infrastructure/**` changed | Runs `pulumi preview` — shows what *would* change, no side effects |
+| `pulumi-apply` | Push, `infrastructure/**` changed | Runs `pulumi up --yes` — applies infra changes |
+| `deploy-app` | Push, app files changed, infra NOT changed | AWS SSM command: `git pull && just deploy-prod` on live EC2 |
+### Required Secrets
+Set these via `dwe create-service --secrets '{...}'` or manually in GitHub repository settings:
+| Secret | Description |
+|---|---|
+| `AWS_ACCESS_KEY_ID` | AWS credentials for Pulumi and SSM |
+| `AWS_SECRET_ACCESS_KEY` | AWS credentials |
+| `PULUMI_ACCESS_TOKEN` | Pulumi Cloud token |
+| `PULUMI_CONFIG_PASSPHRASE` | Pulumi stack encryption passphrase |
+| `PULUMI_STACK` | Pulumi stack reference, e.g. `myorg/myproject/development` |
+| `EC2_INSTANCE_ID` | Instance ID from `pulumi stack output instance_id`, e.g. `i-0abc1234` |
+### SSM Prerequisites
+The `deploy-app` job uses **AWS Systems Manager (SSM)** instead of SSH — no port 22, no SSH key stored as a secret.
+To enable SSM on the EC2 instance:
+**1. IAM instance profile** — attach a role with these policies to the EC2:
+```json
+{
+  "Effect": "Allow",
+  "Action": [
+    "ssm:UpdateInstanceInformation",
+    "ssmmessages:CreateControlChannel",
+    "ssmmessages:OpenControlChannel",
+    "ec2messages:GetMessages",
+    "ec2messages:SendReply"
+  ],
+  "Resource": "*"
+}
+```
+Or simply attach the AWS managed policy `AmazonSSMManagedInstanceCore`.
+**2. SSM agent** — Amazon Linux 2023 ships with it pre-installed. The `blueprint/instance-setup.sh` bootstrap script ensures it's running:
+```bash
+systemctl enable amazon-ssm-agent
+systemctl start amazon-ssm-agent
+```
+**3. Store the instance ID** — after running `just infra-up`, get the instance ID and store it as a secret:
+```bash
+cd infrastructure && pulumi stack output instance_id
+# → i-0abc1234567890def
+# Add this to GitHub repository secrets as EC2_INSTANCE_ID
+```
+### Example: What Happens on a Typical Push
+**Scenario 1 — you edited `blueprint/html/index.html`:**
+```
+Push to development branch
+  ↓
+detect-changes: infrastructure=false, app=true
+  ↓
+deploy-app runs:
+  aws ssm send-command "git pull && just deploy-prod"
+  polls every 10s until success
+  prints stdout from EC2 instance
+  ↓
+New HTML is live ~30 seconds after push
+```
+**Scenario 2 — you changed `infrastructure/__main__.py.jinja` (e.g. bigger instance type):**
+```
+Push to development branch
+  ↓
+detect-changes: infrastructure=true, app=false
+  ↓
+pulumi-apply runs:
+  pulumi up --yes
+  Pulumi modifies the EC2 instance type in-place (or replaces it)
+  ↓
+Infrastructure updated. New instance pulls latest code via user-data.
+```
+**Scenario 3 — you opened a PR with Pulumi changes:**
+```
+Pull Request to development
+  ↓
+detect-changes: infrastructure=true
+  ↓
+pulumi-preview runs:
+  pulumi preview
+  Output shown in CI logs — no changes applied
+  ↓
+Reviewer can see exactly what Pulumi will do before merging.
+```
+### Adapting for Other Platforms
+The same two-path logic works for GitLab CI. The superset's `.gitlab-ci.yml` uses:
+```yaml
+# Skip deploy if terraform changed
+- if: $CI_COMMIT_BRANCH == "main"
+  changes:
+    - terraform_scalling/**/*
+  when: never
+# Only deploy if docker/compose changed
+- if: $CI_COMMIT_BRANCH == "main"
+  changes:
+    - docker/**/*
+    - docker-compose.yml
+```
+For your adapter's GitLab template, mirror this pattern with `pulumi` instead of `terraform` and `infrastructure/**` instead of `terraform_scalling/**`.
+---
+## Adding a New Environment Later
+Environments are set up at `create-service` time. To add one later:
+```bash
+# Create the branch
+git checkout initial-commit
+git checkout -b staging
+git push origin staging
+# Generate the workflow file
+cp .github/workflows/deploy-development.yaml .github/workflows/deploy-staging.yaml
+# Edit deploy-staging.yaml: change all occurrences of "development" to "staging"
+git add .github/workflows/deploy-staging.yaml
+git commit -m "chore: add staging environment"
+git push
+```
+---
+## Releasing to PyPI
+Two workflows handle the full release lifecycle:
+```
+bump version in pyproject.toml → merge to main
+         │
+         ▼
+  tag-version.yml          triggers on: push to main, pyproject.toml changed
+  reads Poetry version      creates git tag vX.Y.Z automatically
+         │
+         ▼
+  (go to GitHub → Releases → Draft a new release → publish it)
+         │
+         ▼
+  pypi-publish.yml          triggers on: release published
+  poetry build + publish    pushes to PyPI via PYPI_TOKEN
+```
+### One-time setup
+Add `PYPI_TOKEN` to the repository secrets (`Settings → Secrets → Actions`):
+1. Go to **https://pypi.org/manage/account/token/** and create an API token scoped to `dwe-core`
+2. In GitHub: `Settings → Secrets and variables → Actions → New repository secret`
+   - Name: `PYPI_TOKEN`
+   - Value: the token from PyPI (starts with `pypi-`)
+### Release flow
+**Step 1 — bump the version and merge to `main`:**
+```bash
+poetry version patch        # 1.0.0 → 1.0.1
+poetry version minor        # 1.0.0 → 1.1.0
+poetry version major        # 1.0.0 → 2.0.0
+poetry version prerelease   # 1.0.0 → 1.0.1a1
+poetry version 1.2.0        # set explicit version
+git add pyproject.toml
+git commit -m "chore: bump version to $(poetry version -s)"
+git push origin main
+```
+`tag-version.yml` fires on the push, reads the version from `pyproject.toml`, and pushes tag `vX.Y.Z`. No manual tagging needed, and it only runs on `main`.
+**Step 2 — publish the GitHub Release:**
+Go to `github.com/<org>/dwe-core/releases`, click **Draft a new release**, select the tag just created, and click **Publish release**.
+`pypi-publish.yml` fires on the publish event: runs `poetry install`, `poetry build`, then `poetry publish -u __token__ -p $PYPI_TOKEN`.
+---
+## Technical Stack
+| Concern | Library |
+|---|---|
+| CLI framework | [Typer](https://typer.tiangolo.com/) |
+| Template engine | [Copier](https://copier.readthedocs.io/) |
+| Git operations | [GitPython](https://gitpython.readthedocs.io/) |
+| GitHub secrets | [PyGithub](https://pygithub.readthedocs.io/) |
+| GitLab variables | [python-gitlab](https://python-gitlab.readthedocs.io/) |
+| Runtime templating | [Jinja2](https://jinja.palletsprojects.com/) (for CI templates) |
+| Infrastructure | [Pulumi](https://www.pulumi.com/) |
+| Task runner | [Just](https://just.systems/) |